Ticket System: Packed AtomicU64
Each multi-child node has K+1 concurrent events: K child deliveries
and 1 set_total. A single AtomicU64 determines which event is
last — the finalizer. No separate counters, no missed completions.
The problem
graph.visit produces children through a push callback. Workers may
complete and deliver results before the iterator finishes. Two things
happen concurrently:
- Child deliveries: each result written to a slot. Multiple threads, unpredictable order.
- set_total: after `graph.visit` returns, the child count is recorded.
Exactly one thread must detect that ALL events have occurred and perform the finalization.
Why two separate variables fail (Dekker race)
With `delivered: AtomicU32` and `total_known: AtomicBool` as two separate variables:
- Thread A (delivering child K): `delivered.fetch_add(1)`, then checks `total_known.load()` → sees `false` (stale)
- Thread B (set_total): `total_known.store(true)`, then checks `delivered.load()` → sees `K-1` (stale)
Neither sees the complete state. Both exit. The fold hangs.
The solution: packed state
```rust
pub struct FoldChain<H, R> {
    heap: UnsafeCell<H>,
    first: SlotBuf<R>,
    appended: AtomicU32,
    state: AtomicU64,  // low 32: events_done, high 32: total (0 = unknown)
    sweep: AtomicU32,  // bit 31: sweeping gate, bits 0-30: cursor position
    done: AtomicBool,  // finalized
}
```
`state: AtomicU64` packs both counters into one word:

| Bits 63–32 | Bits 31–0 |
|---|---|
| total (0 = unknown) | events_done |
```rust
fn pack_total(total: u32) -> u64 { (total as u64) << 32 }

fn unpack(state: u64) -> (u32, u32) { (state as u32, (state >> 32) as u32) }
```
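A quick round-trip check of the packing scheme (a standalone sketch that repeats the two helpers above): a `set_total(3)` followed by two deliveries unpacks to the expected pair, and the two halves stay independent.

```rust
fn pack_total(total: u32) -> u64 { (total as u64) << 32 }

fn unpack(state: u64) -> (u32, u32) { (state as u32, (state >> 32) as u32) }

fn main() {
    // set_total(3), then two deliveries (+1 each), in one packed word.
    let state = pack_total(3) + 1 + 1;
    assert_eq!(unpack(state), (2, 3)); // (events_done, total)

    // The halves never interfere: a maximal `total` leaves events_done at 0.
    assert_eq!(unpack(pack_total(u32::MAX)), (0, u32::MAX));
    println!("pack/unpack ok");
}
```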
How it works
Each event does a single fetch_add on state:
```rust
pub fn deliver_and_sweep<N>(
    &self,
    slot: SlotRef,
    result: R,
    fold: &impl FoldOps<N, H, R>,
) -> Option<R> {
    let cell = self.slot_at(slot.0);
    // SAFETY: each slot is delivered exactly once (slots come from
    // append_slot + SlotRef, never cloned). The Release store on
    // `filled` publishes the write to whoever sweeps via Acquire.
    unsafe { (*cell.result.get()).write(result); }
    cell.filled.store(true, Ordering::Release);
    let prev = self.state.fetch_add(1, Ordering::Relaxed);
    let (done_before, total) = unpack(prev);
    let am_finalizer = total > 0 && done_before + 1 >= total;
    // … if am_finalizer: sweep the filled slots and fold; otherwise None
}
```
- A delivery adds `1` to the low 32 bits (`events_done`).
- `set_total` adds `pack_total(K)` to the high 32 bits.
fetch_add is a read-modify-write (RMW) — atomically reads the
previous state, modifies it, and writes back. No window for
interleaving. Each event gets a unique snapshot of the previous state.
The finalizer condition on `prev`:
- Delivery: `prev_total > 0 && prev_done + 1 >= prev_total`
- set_total: `prev_done >= total`
State transition examples
Deliveries racing set_total, K = 2 (the last delivery is the finalizer):

| Event | prev (done, total) | new (done, total) | Finalizer? |
|---|---|---|---|
| deliver | (0, 0) | (1, 0) | no — total unknown |
| set_total(2) | (1, 0) | (1, 2) | no — prev_done 1 < 2 |
| deliver | (1, 2) | (2, 2) | yes — done_before + 1 = 2 ≥ 2 |

All deliveries before set_total (set_total is finalizer):

| Event | prev (done, total) | new (done, total) | Finalizer? |
|---|---|---|---|
| deliver | (0, 0) | (1, 0) | no |
| deliver | (1, 0) | (2, 0) | no |
| set_total(2) | (2, 0) | (2, 2) | yes — prev_done 2 ≥ 2 |

In every interleaving, exactly one event transitions the state to
{done ≥ total, total > 0}.
Why Relaxed ordering is correct
The ticket determines WHO finalizes, not data visibility. Slot data
visibility is guaranteed by per-slot filled.store(true, Release) /
filled.load(Acquire) pairs. The sweep reads slots only after
confirming filled == true.
RMW linearization is ordering-independent: fetch_add operations on
a single atomic word are totally ordered by the CPU’s coherence
protocol regardless of the memory ordering specified. Relaxed does
not weaken atomicity — it only relaxes ordering with respect to OTHER
memory locations.
The exactly-one-finalizer proof
Claim: For K children, exactly one of K+1 events identifies itself as the finalizer.
Proof: The K+1 fetch_add operations form a total order (RMW
linearization). Each returns a unique prev. Before set_total
fires, total = 0 in all prev values — no delivery can satisfy
the condition. After set_total fires, each subsequent delivery
increments done. The delivery that pushes done to total is the
first (and only) to satisfy the condition. If all deliveries fire
before set_total, then set_total sees prev_done ≥ total and is
the finalizer. QED.
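The proof can be checked mechanically. This standalone sketch (repeating `pack_total`/`unpack`) replays every linearization of K deliveries plus one set_total for K = 4; by RMW linearization these sequential replays cover all concurrent schedules. It asserts that exactly one event satisfies its finalizer condition in each.

```rust
fn pack_total(total: u32) -> u64 { (total as u64) << 32 }

fn unpack(state: u64) -> (u32, u32) { (state as u32, (state >> 32) as u32) }

fn main() {
    let k: u32 = 4;
    // Every concurrent schedule corresponds to one position of set_total
    // among the K deliveries. Replay all K+1 positions.
    for set_total_pos in 0..=k {
        let mut state: u64 = 0;
        let mut finalizers = 0;
        for event in 0..=k {
            let (done, total) = unpack(state); // the `prev` this event sees
            if event == set_total_pos {
                state += pack_total(k); // set_total's fetch_add
                if done >= k { finalizers += 1; }
            } else {
                state += 1; // a delivery's fetch_add
                if total > 0 && done + 1 >= total { finalizers += 1; }
            }
        }
        assert_eq!(finalizers, 1, "set_total at position {set_total_pos}");
    }
    println!("exactly one finalizer in every interleaving");
}
```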