Queue Strategies

Two work-stealing strategies, selected at compile time via the WorkStealing trait:

/// A work-stealing strategy. Associates typed Store and Handle via GATs.
pub trait WorkStealing: 'static {
    type Spec: Copy + Default + Send + Sync;
    type Store<N: Send + 'static, H: 'static, R: Send + 'static>: Send + Sync;
    type Handle<'a, N: Send + 'static, H: 'static, R: Send + 'static>: TaskOps<N, H, R>
    where Self: 'a;

    fn create_store<N: Send + 'static, H: 'static, R: Send + 'static>(
        spec: &Self::Spec, n_workers: usize,
    ) -> Self::Store<N, H, R>;

    fn reset_store<N: Send + 'static, H: 'static, R: Send + 'static>(
        store: &mut Self::Store<N, H, R>,
    );

    fn handle<'a, N: Send + 'static, H: 'static, R: Send + 'static>(
        store: &'a Self::Store<N, H, R>, worker_idx: usize,
    ) -> Self::Handle<'a, N, H, R>;
}

The three associated types capture the queue lifecycle:

  • Spec: construction-time configuration
  • Store<N, H, R>: per-fold resources (deques, bitmask, etc.)
  • Handle<'a, N, H, R>: per-worker view that borrows from Store
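
A fold drives these three in order: build a Store from the Spec, hand each worker a Handle that borrows it, then reset the Store for reuse. A minimal sketch under those assumptions; drive_fold is an illustrative name rather than the crate's entry point, task execution is elided, and WorkStealing, TaskOps, and FunnelTask are assumed to be in scope:

// Sketch of a fold driving the lifecycle (illustrative, not the crate's API).
fn drive_fold<W, N, H, R>(spec: &W::Spec, n_workers: usize)
where
    W: WorkStealing,
    N: Send + 'static,
    H: 'static,
    R: Send + 'static,
{
    // Spec -> Store: allocate the per-fold resources once.
    let mut store = W::create_store::<N, H, R>(spec, n_workers);

    std::thread::scope(|s| {
        for worker_idx in 0..n_workers {
            let store = &store;
            s.spawn(move || {
                // Store -> Handle: each worker gets a view that borrows the store.
                let handle = W::handle(store, worker_idx);
                // Pull tasks through TaskOps (introduced below); a real worker
                // would also seed work and coordinate termination.
                while let Some(_task) = handle.try_acquire() {
                    // execute the task here
                }
            });
        }
    });

    // Reuse the store for the next fold instead of reallocating.
    W::reset_store(&mut store);
}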

Workers interact through TaskOps:

/// Per-worker task operations. Each WorkStealing::Handle implements this.
pub trait TaskOps<N, H, R> {
    /// Returns None on success, Some(task) if queue full (caller executes inline).
    fn push(&self, task: FunnelTask<N, H, R>) -> Option<FunnelTask<N, H, R>>;
    fn try_acquire(&self) -> Option<FunnelTask<N, H, R>>;
}

push returns Some(task) if the queue is full — the caller executes inline (Cilk overflow protocol). try_acquire encapsulates the strategy’s acquisition policy.
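
A hedged sketch of the worker side of that contract; run_task is a hypothetical placeholder for whatever executes a FunnelTask, and these are not the crate's actual worker functions:

// Spawn path: honor the Cilk overflow protocol when the queue is full.
fn spawn_or_run<N, H, R>(ops: &impl TaskOps<N, H, R>, task: FunnelTask<N, H, R>) {
    if let Some(overflowed) = ops.push(task) {
        // Queue full: the pushing thread runs the task inline instead of
        // blocking or dropping it.
        run_task(overflowed);
    }
}

// Acquire path: run whatever the strategy yields (local pop or steal).
fn drain<N, H, R>(ops: &impl TaskOps<N, H, R>) {
    while let Some(task) = ops.try_acquire() {
        run_task(task);
    }
}

// Hypothetical placeholder for executing a task.
fn run_task<N, H, R>(_task: FunnelTask<N, H, R>) {}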

PerWorker: Chase-Lev deques + bitmask steal

Each worker owns a Chase-Lev deque. Push is LIFO and stays on the worker's own deque (no CAS). Steal uses an AtomicU64 bitmask to find non-empty deques: one atomic load instead of scanning N deques.

Figure: PerWorker store. Each worker i owns deque[i] (LIFO push/pop), with one extra deque for the caller; a push sets bit i in the shared work_available AtomicU64 bitmask (bit i = deque[i] non-empty).

impl<N: Send + 'static, H: 'static, R: Send + 'static> TaskOps<N, H, R>
    for PerWorkerHandle<'_, N, H, R>
{
    fn push(&self, task: FunnelTask<N, H, R>) -> Option<FunnelTask<N, H, R>> {
        match self.my_deque.push(task) {
            Ok(()) => {
                self.work_available.fetch_or(1u64 << self.my_idx, Ordering::Relaxed);
                None
            }
            Err(task) => Some(task),
        }
    }

    fn try_acquire(&self) -> Option<FunnelTask<N, H, R>> {
        // Local deque first — LIFO pop, cache-warm, no contention.
        if let Some(task) = self.my_deque.pop() { return Some(task); }
        // Bitmask-guided steal from other deques.
        let mut bits = self.work_available.load(Ordering::Relaxed);
        bits &= !(1u64 << self.my_idx);
        while bits != 0 {
            let target = bits.trailing_zeros() as usize;
            if let Some(task) = self.all_deques[target].steal() {
                return Some(task);
            }
            self.work_available.fetch_and(!(1u64 << target), Ordering::Relaxed);
            bits &= !(1u64 << target);
        }
        None
    }
}

Push: LIFO to own deque (no CAS), set bit in bitmask.

Try acquire: Pop local deque first (cache-warm, zero contention). If empty, load bitmask, iterate set bits, steal FIFO from first non-empty deque. Clear bit if deque found empty.
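
The steal loop's bit iteration is the standard trailing_zeros walk over a set-bit mask; a standalone illustration with toy values, not crate code:

fn main() {
    // Pretend deques 2, 5 and 7 currently have work.
    let mut bits: u64 = 0b1010_0100;
    while bits != 0 {
        let target = bits.trailing_zeros() as usize; // lowest set bit = next candidate
        println!("would try to steal from deque[{target}]");
        bits &= !(1u64 << target); // drop this candidate and continue
    }
}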

Best for: Trees with moderate-to-high branching. LIFO push + FIFO steal gives depth-first local execution with breadth-first work distribution. The bitmask avoids scanning all N deques.

Shared: single StealQueue

All threads push to one queue. All threads steal from it.

Figure: Shared store. Workers and the caller all push to and steal from a single StealQueue with monotonic indices (push advances bottom, steal advances top).

Push: fetch_add on bottom, write to segment slot.

Steal: CAS on top, read from segment slot. FIFO order.
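
To make the fetch_add/CAS discipline concrete, here is a minimal sketch of a bounded shared queue over plain nonzero usize payloads. It is not the StealQueue implementation: the real queue stores FunnelTasks in growable segments, whereas this sketch has a fixed slab and uses 0 as an "empty slot" sentinel.

use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch only: bounded slab, monotonic indices, nonzero-usize payloads.
struct SharedQueueSketch {
    slots: Vec<AtomicUsize>, // 0 = slot claimed but not yet published
    bottom: AtomicUsize,     // next index to push into
    top: AtomicUsize,        // next index to steal from
}

impl SharedQueueSketch {
    fn new(capacity: usize) -> Self {
        Self {
            slots: (0..capacity).map(|_| AtomicUsize::new(0)).collect(),
            bottom: AtomicUsize::new(0),
            top: AtomicUsize::new(0),
        }
    }

    // Push: claim an index with fetch_add, then publish into its slot.
    // Returning Err mirrors TaskOps::push: the caller runs the task inline.
    fn push(&self, value: usize) -> Result<(), usize> {
        assert_ne!(value, 0, "0 is the empty-slot sentinel in this sketch");
        let idx = self.bottom.fetch_add(1, Ordering::Relaxed);
        if idx >= self.slots.len() {
            return Err(value); // full
        }
        self.slots[idx].store(value, Ordering::Release);
        Ok(())
    }

    // Steal: CAS top forward to claim the oldest index (FIFO), then wait for
    // the pusher to publish that slot if the claim raced ahead of the write.
    fn steal(&self) -> Option<usize> {
        loop {
            let top = self.top.load(Ordering::Acquire);
            if top >= self.bottom.load(Ordering::Acquire) || top >= self.slots.len() {
                return None; // nothing pushed at or beyond `top`
            }
            let claimed = self
                .top
                .compare_exchange(top, top + 1, Ordering::AcqRel, Ordering::Acquire)
                .is_ok();
            if claimed {
                loop {
                    let v = self.slots[top].swap(0, Ordering::Acquire);
                    if v != 0 {
                        return Some(v);
                    }
                    std::hint::spin_loop(); // index claimed, value not published yet
                }
            }
            // Lost the CAS to another stealer; retry with the new top.
        }
    }
}

fn main() {
    let q = SharedQueueSketch::new(8);
    q.push(42).unwrap();
    assert_eq!(q.steal(), Some(42));
    assert_eq!(q.steal(), None);
}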

Best for: Wide trees (bf > 10) and small trees where per-worker deque overhead is disproportionate. No bitmask, no per-deque allocation.

When to use which

Workload                   Strategy    Why
General (bf=4-8)           PerWorker   LIFO locality, bitmask steal
Wide (bf > 10)             Shared      No deque allocation per worker
Deep narrow (bf=2)         PerWorker   DFS spine dominates
Small tree (< 50 nodes)    Shared      Lower fixed overhead
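
Because the strategy is a type parameter, the choice the table describes is made at the call site. A hypothetical instantiation of the drive_fold sketch from earlier, where Node, Heur, and Res stand in for N, H, R and PerWorker/Shared are the strategy types named in this chapter:

// Hypothetical call sites; the Spec defaults come from the Default bound.
drive_fold::<PerWorker, Node, Heur, Res>(&Default::default(), num_workers); // general trees
drive_fold::<Shared, Node, Heur, Res>(&Default::default(), num_workers);    // wide or small trees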