# Queue Strategies

Two work-stealing strategies, selected at compile time via the
`WorkStealing` trait:
```rust
/// A work-stealing strategy. Associates typed Store and Handle via GATs.
pub trait WorkStealing: 'static {
    type Spec: Copy + Default + Send + Sync;
    type Store<N: Send + 'static, H: 'static, R: Send + 'static>: Send + Sync;
    type Handle<'a, N: Send + 'static, H: 'static, R: Send + 'static>: TaskOps<N, H, R>
    where
        Self: 'a;

    fn create_store<N: Send + 'static, H: 'static, R: Send + 'static>(
        spec: &Self::Spec,
        n_workers: usize,
    ) -> Self::Store<N, H, R>;

    fn reset_store<N: Send + 'static, H: 'static, R: Send + 'static>(
        store: &mut Self::Store<N, H, R>,
    );

    fn handle<'a, N: Send + 'static, H: 'static, R: Send + 'static>(
        store: &'a Self::Store<N, H, R>,
        worker_idx: usize,
    ) -> Self::Handle<'a, N, H, R>;
}
```
The three associated types capture the queue lifecycle:

- `Spec`: construction-time configuration
- `Store<N, H, R>`: per-fold resources (deques, bitmask, etc.)
- `Handle<'a, N, H, R>`: per-worker view that borrows from the `Store`

Workers interact through `TaskOps`:
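To make the lifecycle concrete, here is a toy strategy with the generics over `N`/`H`/`R` elided and a deliberately trivial queue; `ToyShared`, `ToyStore`, and `ToyHandle` are hypothetical names for illustration only, not part of the real API:

```rust
use std::sync::Mutex;

// Hypothetical stand-in for FunnelTask<N, H, R>.
struct Task(u32);

// Reduced TaskOps (no N/H/R parameters) for this sketch.
trait TaskOps {
    fn push(&self, task: Task) -> Option<Task>;
    fn try_acquire(&self) -> Option<Task>;
}

// Reduced WorkStealing shape: same Spec / Store / Handle lifecycle.
trait WorkStealing: 'static {
    type Spec: Copy + Default;
    type Store: Send + Sync;
    type Handle<'a>: TaskOps;
    fn create_store(spec: &Self::Spec, n_workers: usize) -> Self::Store;
    fn reset_store(store: &mut Self::Store);
    fn handle(store: &Self::Store, worker_idx: usize) -> Self::Handle<'_>;
}

// Toy strategy: one shared Mutex-protected Vec (illustration only).
struct ToyShared;
struct ToyStore { queue: Mutex<Vec<Task>> }
struct ToyHandle<'a> { queue: &'a Mutex<Vec<Task>> }

impl TaskOps for ToyHandle<'_> {
    fn push(&self, task: Task) -> Option<Task> {
        self.queue.lock().unwrap().push(task);
        None // this toy queue is never full
    }
    fn try_acquire(&self) -> Option<Task> {
        self.queue.lock().unwrap().pop()
    }
}

impl WorkStealing for ToyShared {
    type Spec = ();
    type Store = ToyStore;
    type Handle<'a> = ToyHandle<'a>;
    fn create_store(_spec: &(), _n_workers: usize) -> ToyStore {
        ToyStore { queue: Mutex::new(Vec::new()) }
    }
    fn reset_store(store: &mut ToyStore) {
        // Per-fold reuse: clear resources instead of reallocating.
        store.queue.get_mut().unwrap().clear();
    }
    fn handle(store: &ToyStore, _worker_idx: usize) -> ToyHandle<'_> {
        ToyHandle { queue: &store.queue }
    }
}

fn main() {
    let mut store = ToyShared::create_store(&(), 4); // once per fold
    let h = ToyShared::handle(&store, 0);            // once per worker
    h.push(Task(7));
    assert!(h.try_acquire().is_some());
    ToyShared::reset_store(&mut store);              // reuse for the next fold
}
```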
```rust
/// Per-worker task operations. Each WorkStealing::Handle implements this.
pub trait TaskOps<N, H, R> {
    /// Returns None on success, Some(task) if queue full (caller executes inline).
    fn push(&self, task: FunnelTask<N, H, R>) -> Option<FunnelTask<N, H, R>>;

    fn try_acquire(&self) -> Option<FunnelTask<N, H, R>>;
}
```
`push` returns `Some(task)` when the queue is full, and the caller then
executes the task inline (the Cilk overflow protocol). `try_acquire`
encapsulates the strategy's acquisition policy.
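The overflow protocol can be sketched in isolation. This is a hypothetical caller-side `spawn` helper against a toy bounded queue (names `Bounded` and `spawn` are illustrative, not from the real codebase); the point is only that a rejected task is executed immediately rather than dropped:

```rust
use std::cell::RefCell;

struct Task(u32);

trait TaskOps {
    fn push(&self, task: Task) -> Option<Task>;
    fn try_acquire(&self) -> Option<Task>;
}

// Toy bounded queue so push can actually overflow.
struct Bounded { slots: RefCell<Vec<Task>>, cap: usize }

impl TaskOps for Bounded {
    fn push(&self, task: Task) -> Option<Task> {
        let mut q = self.slots.borrow_mut();
        if q.len() == self.cap {
            return Some(task); // full: hand the task back to the caller
        }
        q.push(task);
        None
    }
    fn try_acquire(&self) -> Option<Task> {
        self.slots.borrow_mut().pop()
    }
}

fn spawn<O: TaskOps>(ops: &O, task: Task, executed_inline: &mut Vec<u32>) {
    // Cilk overflow protocol: if the queue rejects the task, run it here.
    if let Some(rejected) = ops.push(task) {
        executed_inline.push(rejected.0); // stands in for executing the task
    }
}

fn main() {
    let q = Bounded { slots: RefCell::new(Vec::new()), cap: 1 };
    let mut inline = Vec::new();
    spawn(&q, Task(1), &mut inline); // queued
    spawn(&q, Task(2), &mut inline); // queue full, so executed inline
    assert_eq!(inline, vec![2]);
}
```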
## PerWorker: Chase-Lev deques + bitmask steal
Each worker owns a Chase-Lev deque. Push is LIFO (local end, no CAS).
Steal uses an `AtomicU64` bitmask to find non-empty deques: one
atomic load instead of scanning all N deques.
```rust
impl<N: Send + 'static, H: 'static, R: Send + 'static> TaskOps<N, H, R>
    for PerWorkerHandle<'_, N, H, R>
{
    fn push(&self, task: FunnelTask<N, H, R>) -> Option<FunnelTask<N, H, R>> {
        match self.my_deque.push(task) {
            Ok(()) => {
                // Publish that this worker has work: set our bit in the mask.
                self.work_available.fetch_or(1u64 << self.my_idx, Ordering::Relaxed);
                None
            }
            // Deque full: hand the task back so the caller runs it inline.
            Err(task) => Some(task),
        }
    }

    fn try_acquire(&self) -> Option<FunnelTask<N, H, R>> {
        // Local deque first — LIFO pop, cache-warm, no contention.
        if let Some(task) = self.my_deque.pop() {
            return Some(task);
        }
        // Bitmask-guided steal from other workers' deques.
        let mut bits = self.work_available.load(Ordering::Relaxed);
        bits &= !(1u64 << self.my_idx);
        while bits != 0 {
            let target = bits.trailing_zeros() as usize;
            if let Some(task) = self.all_deques[target].steal() {
                return Some(task);
            }
            // Target deque was empty: clear its bit so later scans skip it.
            self.work_available.fetch_and(!(1u64 << target), Ordering::Relaxed);
            bits &= !(1u64 << target);
        }
        None
    }
}
```
**Push:** LIFO onto the worker's own deque (no CAS), then set its bit in the bitmask.

**Try acquire:** pop the local deque first (cache-warm, zero contention). If empty, load the bitmask, iterate over its set bits, and steal FIFO from the first non-empty deque, clearing the bit of any deque found empty.

**Best for:** trees with moderate-to-high branching. LIFO push plus FIFO steal gives depth-first local execution with breadth-first work distribution, and the bitmask avoids scanning all N deques.
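The bitmask scan at the heart of `try_acquire` can be exercised on its own: `trailing_zeros` finds the lowest set bit, i.e. the first worker index with queued work. This standalone sketch (the helper name `set_indices` is made up for illustration) walks the mask the same way:

```rust
// Walk a work-availability bitmask: visit each set bit from lowest to
// highest, skipping our own index, exactly as the steal loop does.
fn set_indices(mut bits: u64, skip: usize) -> Vec<usize> {
    bits &= !(1u64 << skip); // never steal from ourselves
    let mut out = Vec::new();
    while bits != 0 {
        let target = bits.trailing_zeros() as usize; // lowest set bit
        out.push(target);
        bits &= !(1u64 << target); // clear the bit we just visited
    }
    out
}

fn main() {
    // Workers 0, 3, and 5 have work; worker 0 is "us".
    let bits = (1u64 << 0) | (1 << 3) | (1 << 5);
    assert_eq!(set_indices(bits, 0), vec![3, 5]);
}
```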
## Shared: single StealQueue

All threads push to one queue. All threads steal from it.

**Push:** `fetch_add` on `bottom`, then write the task into its segment slot.

**Steal:** CAS on `top`, then read from the segment slot. FIFO order.
**Best for:** wide trees (bf > 10) and small trees, where per-worker deque overhead is disproportionate. No bitmask and no per-deque allocation.
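The push/steal protocol can be sketched with a single fixed-size segment. This is a deliberately simplified model, not the real `StealQueue` (segment growth, memory reclamation, and the real task type are all omitted); tasks are plain non-zero `usize`s, with `0` marking an unwritten slot:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct SharedQueue {
    slots: Vec<AtomicUsize>,
    top: AtomicUsize,    // next slot to steal from (FIFO)
    bottom: AtomicUsize, // next slot to push into
}

impl SharedQueue {
    fn new(cap: usize) -> Self {
        SharedQueue {
            slots: (0..cap).map(|_| AtomicUsize::new(0)).collect(),
            top: AtomicUsize::new(0),
            bottom: AtomicUsize::new(0),
        }
    }

    /// Push: fetch_add on bottom, then write the slot.
    /// Returns the task back if the segment is full (overflow protocol).
    fn push(&self, task: usize) -> Option<usize> {
        let idx = self.bottom.fetch_add(1, Ordering::AcqRel);
        if idx >= self.slots.len() {
            return Some(task); // full: caller executes inline
        }
        self.slots[idx].store(task, Ordering::Release);
        None
    }

    /// Steal: CAS on top, then read the slot. FIFO order.
    fn steal(&self) -> Option<usize> {
        loop {
            let t = self.top.load(Ordering::Acquire);
            if t >= self.bottom.load(Ordering::Acquire).min(self.slots.len()) {
                return None; // nothing published
            }
            // Claim slot t; on success, spin until the pusher's write lands.
            if self
                .top
                .compare_exchange(t, t + 1, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                loop {
                    let v = self.slots[t].load(Ordering::Acquire);
                    if v != 0 {
                        return Some(v);
                    }
                }
            }
        }
    }
}

fn main() {
    let q = SharedQueue::new(4);
    assert!(q.push(10).is_none());
    assert!(q.push(20).is_none());
    assert_eq!(q.steal(), Some(10)); // FIFO: oldest task first
    assert_eq!(q.steal(), Some(20));
    assert_eq!(q.steal(), None);
}
```

Note the division of contention: producers race only on `bottom` (a single `fetch_add`), consumers only on `top` (a CAS loop), so push never pays for steal traffic.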
## When to use which
| Workload | Strategy | Why |
|---|---|---|
| General (bf=4-8) | PerWorker | LIFO locality, bitmask steal |
| Wide (bf > 10) | Shared | No deque allocation per worker |
| Deep narrow (bf=2) | PerWorker | DFS spine dominates |
| Small tree (< 50 nodes) | Shared | Lower fixed overhead |