RustConcurrencySystems

Building a Lock-Free SPSC Queue in Rust


An SPSC queue is the simplest lock-free data structure that matters in practice. Audio callbacks, real-time control loops, inter-thread pipelines. Done right: 10–50x faster than a mutex.

The Ring Buffer

use std::cell::UnsafeCell;
use std::mem::{size_of, MaybeUninit};
use std::sync::atomic::AtomicUsize;

pub struct SpscQueue<T, const N: usize> {
    head: AtomicUsize,
    _pad: [u8; 64 - size_of::<AtomicUsize>()], // keep head and tail on separate cache lines
    tail: AtomicUsize,
    buffer: [UnsafeCell<MaybeUninit<T>>; N],
}

Cache line padding prevents false sharing — without it, producer writes to head invalidate the consumer's L1 cache line on every operation.
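An alternative to the manual padding arithmetic (a sketch, not from the post; the name `CacheAligned` is illustrative) is to let an alignment attribute do the separation:

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::AtomicUsize;

// A 64-byte-aligned wrapper: each wrapped index starts on its own
// cache line, and the alignment rounds the size up to 64 bytes too,
// so no other field can share the line.
#[repr(align(64))]
struct CacheAligned(AtomicUsize);

fn main() {
    assert_eq!(align_of::<CacheAligned>(), 64);
    assert_eq!(size_of::<CacheAligned>(), 64);
    println!("ok");
}
```

`crossbeam_utils::CachePadded` provides the same idea off the shelf and picks the alignment per target architecture.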

Push and Pop

pub fn push(&self, value: T) -> bool {
    let head = self.head.load(Ordering::Relaxed);
    let next = (head + 1) % N;
    if next == self.tail.load(Ordering::Acquire) {
        return false; // full: one slot is kept empty to distinguish full from empty
    }
    unsafe { (*self.buffer[head].get()).write(value) };
    self.head.store(next, Ordering::Release);
    true
}

pub fn pop(&self) -> Option<T> {
    let tail = self.tail.load(Ordering::Relaxed);
    if tail == self.head.load(Ordering::Acquire) {
        return None; // empty
    }
    let value = unsafe { (*self.buffer[tail].get()).assume_init_read() };
    self.tail.store((tail + 1) % N, Ordering::Release);
    Some(value)
}

Producer reads tail with Acquire, writes head with Release. Consumer mirrors this. Each thread reads its own index with Relaxed.
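Putting the pieces together, here is a minimal end-to-end sketch. The `new` constructor, the `unsafe impl Sync`, the spin loops, and the `run` harness are my assumptions, not shown in the post, and the padding is omitted for brevity:

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

pub struct SpscQueue<T, const N: usize> {
    head: AtomicUsize,
    tail: AtomicUsize, // cache-line padding omitted for brevity
    buffer: [UnsafeCell<MaybeUninit<T>>; N],
}

// SAFETY: only sound under the SPSC protocol — exactly one thread
// calls `push` and exactly one other thread calls `pop`.
unsafe impl<T: Send, const N: usize> Sync for SpscQueue<T, N> {}

impl<T, const N: usize> SpscQueue<T, N> {
    pub fn new() -> Self {
        Self {
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
            buffer: std::array::from_fn(|_| UnsafeCell::new(MaybeUninit::uninit())),
        }
    }

    pub fn push(&self, value: T) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        let next = (head + 1) % N;
        if next == self.tail.load(Ordering::Acquire) {
            return false; // full
        }
        unsafe { (*self.buffer[head].get()).write(value) };
        self.head.store(next, Ordering::Release);
        true
    }

    pub fn pop(&self) -> Option<T> {
        let tail = self.tail.load(Ordering::Relaxed);
        if tail == self.head.load(Ordering::Acquire) {
            return None; // empty
        }
        let value = unsafe { (*self.buffer[tail].get()).assume_init_read() };
        self.tail.store((tail + 1) % N, Ordering::Release);
        Some(value)
    }
}

// Producer on a scoped thread, consumer on the main thread;
// sums 0..10_000 sent through the queue.
fn run() -> u64 {
    let q: SpscQueue<u64, 1024> = SpscQueue::new();
    let mut sum = 0;
    thread::scope(|s| {
        s.spawn(|| {
            for i in 0..10_000u64 {
                while !q.push(i) {} // spin while full
            }
        });
        let mut received = 0;
        while received < 10_000 {
            if let Some(v) = q.pop() {
                sum += v;
                received += 1;
            }
        }
    });
    sum
}

fn main() {
    // 0 + 1 + … + 9999 = 49_995_000
    assert_eq!(run(), 49_995_000);
    println!("ok");
}
```

A production version also needs a `Drop` impl that pops and drops any unconsumed elements, since `MaybeUninit` never runs destructors on its own.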

Performance on M2

| Method | Throughput | P50 |
|--------|------------|-----|
| Mutex<VecDeque> | 48 M/s | 21 ns |
| crossbeam::channel | 180 M/s | 5.5 ns |
| This SPSC | 620 M/s | 1.6 ns |

No contention by design: two atomic loads per operation, largely hidden by out-of-order execution.

When to Use

Audio callback → processing thread. Sensor interrupt → control loop. Packet demux → per-connection workers. For anything else, reach for crossbeam first.