Building a Lock-Free SPSC Queue in Rust
A single-producer, single-consumer (SPSC) queue is the simplest lock-free data structure that matters in practice. It shows up in audio callbacks, real-time control loops, and inter-thread pipelines. Done right, it can be 10–50x faster than a mutex-guarded queue.
The Ring Buffer
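use std::cell::UnsafeCell;
use std::mem::{size_of, MaybeUninit};
use std::sync::atomic::AtomicUsize;

#[repr(C)] // keep declaration order; the default layout may reorder fields and defeat the padding
pub struct SpscQueue<T, const N: usize> {
    head: AtomicUsize, // next slot to write; stored only by the producer
    _pad: [u8; 64 - size_of::<AtomicUsize>()], // keeps head and tail 64 bytes apart to prevent false sharing
    tail: AtomicUsize, // next slot to read; stored only by the consumer
    buffer: [UnsafeCell<MaybeUninit<T>>; N], // N slots, of which at most N - 1 hold values
}

// SAFETY: sound only if a single thread calls push and a single thread calls pop.
unsafe impl<T: Send, const N: usize> Sync for SpscQueue<T, N> {}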
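Cache-line padding prevents false sharing. Without it, head and tail sit on the same cache line, so every producer write to head invalidates the consumer's cached copy of that line, and every consumer write to tail does the same to the producer.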
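An alternative to a manual padding array is an aligned wrapper type, which lets the compiler do the arithmetic. The sketch below is illustrative only: `CachePadded`, `Indices`, and `index_distance` are hypothetical names, not part of the queue above, and the 64-byte line size is an assumption. Some targets, Apple silicon among them, use 128-byte lines, so the constant may need adjusting there.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::AtomicUsize;

/// Hypothetical helper: rounds its contents up to a 64-byte aligned block.
/// The 64 is an assumed cache-line size, not a universal constant.
#[repr(align(64))]
pub struct CachePadded<T>(pub T);

/// The two queue indices, each in its own aligned block.
pub struct Indices {
    pub head: CachePadded<AtomicUsize>,
    pub tail: CachePadded<AtomicUsize>,
}

/// Distance in bytes between the two indices in memory.
pub fn index_distance(ix: &Indices) -> usize {
    let head = &ix.head as *const _ as usize;
    let tail = &ix.tail as *const _ as usize;
    head.abs_diff(tail)
}

fn main() {
    let ix = Indices {
        head: CachePadded(AtomicUsize::new(0)),
        tail: CachePadded(AtomicUsize::new(0)),
    };
    // The wrapper's size is rounded up to its alignment, so each index
    // occupies a full 64-byte block regardless of field order.
    assert_eq!(align_of::<CachePadded<AtomicUsize>>(), 64);
    assert_eq!(size_of::<CachePadded<AtomicUsize>>(), 64);
    assert!(index_distance(&ix) >= 64);
    println!("head and tail are {} bytes apart", index_distance(&ix));
}
```

Because the wrapper is aligned as well as padded, each index starts on a 64-byte boundary, which a plain byte array between two fields does not guarantee.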
Push and Pop
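use std::sync::atomic::Ordering;

impl<T, const N: usize> SpscQueue<T, N> {
    // Producer side. Returns false if the queue is full.
    pub fn push(&self, value: T) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        let next = (head + 1) % N;
        // One slot always stays empty so that "full" and "empty" differ.
        if next == self.tail.load(Ordering::Acquire) { return false; }
        unsafe { (*self.buffer[head].get()).write(value); }
        self.head.store(next, Ordering::Release); // publish the written slot
        true
    }

    // Consumer side. Returns None if the queue is empty.
    pub fn pop(&self) -> Option<T> {
        let tail = self.tail.load(Ordering::Relaxed);
        if tail == self.head.load(Ordering::Acquire) { return None; }
        let value = unsafe { (*self.buffer[tail].get()).assume_init_read() };
        self.tail.store((tail + 1) % N, Ordering::Release); // hand the slot back to the producer
        Some(value)
    }
}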
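The producer reads tail with Acquire and writes head with Release; the consumer mirrors this. Each thread reads its own index with Relaxed, since no other thread ever writes it. The Release store to head publishes the slot write that precedes it, so a consumer whose Acquire load observes the new head is guaranteed to see the value as well.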
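To see the ordering rules at work end to end, here is a minimal self-contained sketch that pairs the queue with one producer thread and one consumer. The constructor `new`, the `Padded` wrapper, and the `transfer` helper are assumptions added for illustration; the push and pop bodies follow the logic described above.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Assumed helper: gives each index its own 64-byte aligned block.
#[repr(align(64))]
struct Padded(AtomicUsize);

pub struct SpscQueue<T, const N: usize> {
    head: Padded,
    tail: Padded,
    buffer: [UnsafeCell<MaybeUninit<T>>; N],
}

// SAFETY: sound only if one thread calls push and one thread calls pop.
unsafe impl<T: Send, const N: usize> Sync for SpscQueue<T, N> {}

impl<T, const N: usize> SpscQueue<T, N> {
    /// Hypothetical constructor; the article does not show one.
    pub fn new() -> Self {
        Self {
            head: Padded(AtomicUsize::new(0)),
            tail: Padded(AtomicUsize::new(0)),
            buffer: std::array::from_fn(|_| UnsafeCell::new(MaybeUninit::uninit())),
        }
    }

    pub fn push(&self, value: T) -> bool {
        let head = self.head.0.load(Ordering::Relaxed);
        let next = (head + 1) % N;
        if next == self.tail.0.load(Ordering::Acquire) {
            return false; // full: one slot is always left empty
        }
        unsafe { (*self.buffer[head].get()).write(value); }
        self.head.0.store(next, Ordering::Release);
        true
    }

    pub fn pop(&self) -> Option<T> {
        let tail = self.tail.0.load(Ordering::Relaxed);
        if tail == self.head.0.load(Ordering::Acquire) {
            return None; // empty
        }
        let value = unsafe { (*self.buffer[tail].get()).assume_init_read() };
        self.tail.0.store((tail + 1) % N, Ordering::Release);
        Some(value)
    }
}

/// Sends 0..count through a queue and returns what the consumer received.
pub fn transfer(count: u64) -> Vec<u64> {
    let q = Arc::new(SpscQueue::<u64, 1024>::new());
    let producer = {
        let q = Arc::clone(&q);
        thread::spawn(move || {
            for i in 0..count {
                // Retry while the queue is full.
                while !q.push(i) {
                    thread::yield_now();
                }
            }
        })
    };
    let mut out = Vec::with_capacity(count as usize);
    while (out.len() as u64) < count {
        match q.pop() {
            Some(v) => out.push(v),
            None => thread::yield_now(),
        }
    }
    producer.join().unwrap();
    out
}

fn main() {
    // With N = 4 only three values fit before push reports "full".
    let q = SpscQueue::<u32, 4>::new();
    assert!(q.push(1) && q.push(2) && q.push(3));
    assert!(!q.push(4));
    assert_eq!(q.pop(), Some(1));

    // Values arrive in FIFO order across threads.
    let got = transfer(100_000);
    assert!(got.iter().copied().eq(0..100_000u64));
    println!("received {} items in order", got.len());
}
```

Two caveats about this sketch: `push` consumes its argument even when it returns false, which is harmless for `Copy` types like the integers here but would drop other values (returning `Result<(), T>` would hand them back), and there is no `Drop` impl, so items still queued when the queue is dropped are leaked rather than destroyed.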
Performance on M2
| Method | Throughput | P50 latency |
|--------|------------|-------------|
| Mutex<VecDeque> | 48 M/s | 21 ns |
| crossbeam::channel | 180 M/s | 5.5 ns |
| This SPSC | 620 M/s | 1.6 ns |
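No contention by design: each operation performs two atomic loads and one atomic store, with no locks and no read-modify-write instructions, and the load latency is largely hidden behind out-of-order execution.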
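The harness behind these numbers is not shown, so the following is only a guess at how a baseline like the Mutex<VecDeque> row could be measured. `run_mutex_baseline` is a hypothetical name, and the item count and the use of `yield_now` on an empty queue are arbitrary choices. Absolute figures depend heavily on the machine, build flags, and thread placement, so treat any output as indicative at best.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Moves `count` integers from a producer thread to the calling thread
/// through a mutex-guarded VecDeque. Returns the elapsed time and the sum
/// of the received values (used as a correctness check).
pub fn run_mutex_baseline(count: u64) -> (Duration, u64) {
    let q = Arc::new(Mutex::new(VecDeque::<u64>::with_capacity(1024)));
    let start = Instant::now();

    let producer = {
        let q = Arc::clone(&q);
        thread::spawn(move || {
            for i in 0..count {
                q.lock().unwrap().push_back(i);
            }
        })
    };

    let mut received = 0u64;
    let mut sum = 0u64;
    while received < count {
        // The lock guard is a temporary, so it is released at the end of
        // this statement, before the match runs.
        let item = q.lock().unwrap().pop_front();
        match item {
            Some(v) => {
                sum += v;
                received += 1;
            }
            None => thread::yield_now(),
        }
    }
    producer.join().unwrap();
    (start.elapsed(), sum)
}

fn main() {
    let count = 1_000_000u64;
    let (elapsed, sum) = run_mutex_baseline(count);
    // Sum of 0..count, confirming every item arrived exactly once.
    assert_eq!(sum, count * (count - 1) / 2);
    let rate = count as f64 / elapsed.as_secs_f64() / 1e6;
    println!("Mutex<VecDeque>: {rate:.1} M items/s");
}
```

Swapping the locked `VecDeque` for the SPSC queue's push and pop in the two loops gives a like-for-like comparison, provided both variants are built with `--release`.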
When to Use
Audio callback → processing thread. Sensor interrupt → control loop. Packet demux → per-connection workers. For anything else, reach for crossbeam first.