Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Continuations: CPS Data Types

Three types carry the fold’s state through the CPS pipeline: FunnelTask (the parallelism boundary), Cont (the continuation), and ChainNode (the multi-child accumulator). A fourth, RootCell, is the terminal sink for the final result. Together they replace implicit stack frames with explicit data that can be created on one thread and consumed on another.

FunnelTask

#![allow(unused)]
fn main() {
pub enum FunnelTask<N, H, R> {
    Walk { child: N, cont: Cont<H, R> },
}
}

The unit of parallelism. Stored inline in deque slots (PerWorker) or queue segments (Shared). No heap allocation per task — the enum variant IS the data. N must be Clone + Send (cloned during graph.visit, sent across threads). R must be Send (results are moved across threads via destructive slot reads). H has no bounds — it travels inside Cont::Direct.

Cont

#![allow(unused)]
fn main() {
pub enum Cont<H, R> {
    /// Raw pointer to stack-local RootCell in run_fold.
    /// SAFETY: The scoped pool guarantees all workers complete before
    /// run_fold returns — the RootCell outlives every Cont::Root.
    Root(*const RootCell<R>),
    Direct { heap: H, parent_idx: ContIdx },
    Slot { node: ArenaIdx, slot: SlotRef },
}
}

The defunctionalized continuation. Tells fire_cont what to do with a result:

Cont::Root

Terminal. Created once per fold. When fire_cont receives it, the fold is complete: the result is written to the RootCell and fold_done is signaled. Size: 8 bytes (one raw pointer).

The RootCell lives on run_fold’s stack — no heap allocation. The raw pointer is safe because the scoped pool guarantees all workers complete before run_fold returns.

Cont::Direct

Single-child fast path. The heap value travels WITH the continuation — no ChainNode, no FoldChain, no atomics. parent_idx is a ContIdx(u32) into the ContArena. When fire_cont receives it: accumulate the result into the heap, finalize, take the parent continuation from the arena, continue the loop.

Size: sizeof(H) + 4 bytes.

Cont::Slot

Multi-child delivery. Two u32 indices: node (arena index to the ChainNode) and slot (which position in the FoldChain). When fire_cont receives it: deliver the result to the slot, check the ticket. If this was the last event, sweep/finalize the chain and take the parent continuation. If not, return — another thread will finalize.

Size: 8 bytes (two u32, regardless of H or R).

ChainNode

#![allow(unused)]
fn main() {
pub(crate) struct ChainNode<H, R> {
    pub(crate) chain: FoldChain<H, R>,
    parent_cont: UnsafeCell<Option<Cont<H, R>>>,
}
}

Arena-allocated. Created lazily on child 2 (never for single-child nodes). Contains:

  • chain: the FoldChain — slot cells, heap, ticket state
  • parent_cont: the continuation of the creating node, moved out exactly once by the finalizing thread via take_parent_cont()

Continuation graph

For a tree with root R, child A (2 children: C, D), and child B (1 child: E, leaf):

%3rootCont::Root*const RootCell(stack-local)chainRChainNode(R)FoldChainparent→RootchainR->rootcascadeslotR0Slot{R, 0}slotR0->chainRdeliverslotR1Slot{R, 1}slotR1->chainRdeliverdirectBDirect{heap_B}parent→Slot{R,1}directB->slotR1acc+fincascadeleafCleaf Cfinalize → fire_contleafC->slotR0leafDleaf Dfinalize → fire_contleafEleaf Efinalize → fire_contleafE->directB

Leaf C finalizes, delivers to Slot{R,0}. Leaf E finalizes, fires Direct for B (accumulates + finalizes), delivers to Slot{R,1}. Whichever delivery is last (ticket) sweeps ChainNode(R) and fires Root.

Data ownership

Each CPS type lives in a specific memory region:

%3cluster_stackrun_fold stackcluster_arenaArenas (bump alloc)cluster_dequeDeque / StealQueuecluster_stack2run_fold stack (local)wctxWalkCtxrefs to fold, grapharenas, viewjobJobfn ptr + data ptrcnChainNodeFoldChain + parent_contcaContArenaparent Conts(for bf=1 chains)taskFunnelTask::Walkchild + Cont(inline in slot)task->cnSlot: ArenaIdxtask->caDirect: ContIdxrootRootCellresult + done flag(stack-allocated)task->rootRoot: raw ptr

Deque stores tasks inline. Arena indices are u32 (Copy, no refcount). The CPS pipeline has zero heap allocations on the critical path — RootCell is stack-local, arenas grow lazily via segmented allocation, and tasks are stored inline in deque slots.

Size summary

TypeSizeNotes
Cont::Root8 bytesraw pointer to stack-local RootCell
Cont::Directsizeof(H) + 4heap value + ContIdx(u32)
Cont::Slot8 bytesArenaIdx(u32) + SlotRef(u32)
FunnelTask::Walksizeof(N) + sizeof(Cont) + tagstored inline in deque
ChainNodesizeof(FoldChain) + sizeof(Option<Cont>)arena-allocated
RootCellsizeof(Option<R>) + 1stack-local in run_fold