# Accumulation Strategies
Two ways to fold child results into the parent's heap, selected at compile time via the `AccumulateStrategy` trait. Both preserve child order: `accumulate` is called in slot order regardless of which worker delivered first. This is what allows hylic's non-associative `accumulate` to run correctly in parallel:
```rust
/// Accumulation strategy: how child results flow into the parent's heap.
pub trait AccumulateStrategy: 'static {
    type Spec: Copy + Default + Send + Sync;

    fn deliver<N, H, R>(
        chain: &FoldChain<H, R>, slot: SlotRef, result: R,
        fold: &impl FoldOps<N, H, R>,
    ) -> Option<R>;

    fn set_total<N, H, R>(
        chain: &FoldChain<H, R>,
        fold: &impl FoldOps<N, H, R>,
    ) -> Option<R>;
}
```
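For intuition on why slot order matters at all, here is a tiny self-contained example of a non-associative accumulate (plain integer subtraction, not hylic's API), showing that the fold order changes the answer:

```rust
fn main() {
    // A non-associative accumulate: (a - b) - c != a - (b - c),
    // so the order in which child results are folded changes the result.
    let acc = |a: i64, b: i64| a - b;
    let slot_order = acc(acc(10, 3), 2);  // (10 - 3) - 2 = 5
    let other_order = acc(10, acc(3, 2)); // 10 - (3 - 2) = 9
    assert_ne!(slot_order, other_order);
    // A parallel fold over such an accumulate is deterministic only if
    // the runtime applies deliveries in slot order, not arrival order.
}
```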
Both use the same ticket system for last-event detection. They differ in *when* and *how* the heap is swept.
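The ticket mechanism itself is not shown in the trait above. A minimal sketch of how a relaxed `fetch_add` ticket can identify the last event, assuming the expected delivery count is known up front (the `Tickets` type and its methods are illustrative, not hylic's actual types; the real `set_total` path allows the total to arrive late):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Illustrative ticket counter: each delivery takes a ticket with a
/// relaxed fetch_add; whichever caller draws the final ticket is the
/// last event and is responsible for completion.
struct Tickets {
    state: AtomicU32,
    total: u32,
}

impl Tickets {
    fn new(total: u32) -> Self {
        Tickets { state: AtomicU32::new(0), total }
    }

    /// Returns true iff this call was the last of `total` deliveries.
    fn take(&self) -> bool {
        // fetch_add returns the previous value, so prev + 1 == total
        // identifies exactly one caller as the last event.
        self.state.fetch_add(1, Ordering::Relaxed) + 1 == self.total
    }
}

fn main() {
    let tickets = Arc::new(Tickets::new(8));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let t = Arc::clone(&tickets);
            thread::spawn(move || t.take())
        })
        .collect();
    // Exactly one thread observes itself as the last event.
    let last_events = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&is_last| is_last)
        .count();
    assert_eq!(last_events, 1);
}
```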
## OnArrival: streaming sweep
Each delivery tries to sweep contiguous filled slots immediately.
A CAS gate (bit 31 of `sweep: AtomicU32`) ensures only one thread sweeps at a time. A cursor (bits 0-30) tracks sweep progress.
Per-delivery flow:

- Write the result to its slot, then `filled.store(true, Release)`
- `state.fetch_add(1, Relaxed)` to take a ticket
- Try the CAS gate: if won, sweep contiguous filled slots from the cursor
- If this was the last event (by ticket), spin until the sweep completes
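A sketch of the sweep word's bit layout and gate protocol as described above. The names (`SweepWord`, `try_acquire`, `release`) are illustrative, not hylic's actual identifiers:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Bit 31 of the sweep word is the "someone is sweeping" gate;
/// bits 0-30 hold the cursor (index of the next slot to sweep).
const GATE: u32 = 1 << 31;
const CURSOR_MASK: u32 = GATE - 1;

struct SweepWord(AtomicU32);

impl SweepWord {
    fn new() -> Self {
        SweepWord(AtomicU32::new(0))
    }

    /// Try to win the gate. Returns the current cursor on success.
    /// Losing the CAS just means another thread is already sweeping
    /// and will pick up our slot when it reaches it.
    fn try_acquire(&self) -> Option<u32> {
        let cur = self.0.load(Ordering::Relaxed);
        if cur & GATE != 0 {
            return None; // someone else is sweeping
        }
        self.0
            .compare_exchange(cur, cur | GATE, Ordering::Acquire, Ordering::Relaxed)
            .ok()
            .map(|prev| prev & CURSOR_MASK)
    }

    /// Publish the advanced cursor and drop the gate bit in one store.
    fn release(&self, cursor: u32) {
        self.0.store(cursor & CURSOR_MASK, Ordering::Release);
    }
}

fn main() {
    let w = SweepWord::new();
    let cursor = w.try_acquire().expect("gate was free");
    assert_eq!(cursor, 0);
    assert!(w.try_acquire().is_none()); // gate held, second sweeper bounces
    w.release(5);                       // swept 5 slots, gate reopened
    assert_eq!(w.try_acquire(), Some(5));
}
```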
Advantage: Results accumulate as they arrive — lower latency to completion when children finish in order.
Cost: CAS contention on the sweep gate when multiple threads deliver simultaneously. ~16-28ns per delivery depending on gate outcome.
## OnFinalize: bulk sweep
Deliveries only store + ticket. No sweep, no CAS gate. The last event (determined by the ticket) bulk-sweeps all slots at once.
Per-delivery flow:

- Write the result to its slot, then `filled.store(true, Release)`
- `state.fetch_add(1, Relaxed)` to take a ticket
- If this was the last event: iterate all slots, `filled.load(Acquire)` each, accumulate, finalize
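A runnable approximation of the OnFinalize path, using `Mutex<Option<i64>>` slots so the sketch stays in safe Rust (the real heap presumably uses raw slots). The accumulate here, `acc * 10 + x`, is deliberately non-commutative to show that the bulk sweep visits slots in slot order even when deliveries arrive out of order:

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Safe-Rust stand-in for the fold chain: one (filled, slot) pair
/// per child plus the shared ticket counter.
struct Chain {
    slots: Vec<(AtomicBool, Mutex<Option<i64>>)>,
    state: AtomicU32,
}

impl Chain {
    fn new(n: usize) -> Self {
        Chain {
            slots: (0..n)
                .map(|_| (AtomicBool::new(false), Mutex::new(None)))
                .collect(),
            state: AtomicU32::new(0),
        }
    }

    /// OnFinalize delivery: store + publish + ticket, no sweep gate.
    /// The last event bulk-sweeps every slot in slot order.
    fn deliver(&self, slot: usize, result: i64) -> Option<i64> {
        *self.slots[slot].1.lock().unwrap() = Some(result);
        self.slots[slot].0.store(true, Ordering::Release);
        let ticket = self.state.fetch_add(1, Ordering::Relaxed);
        if ticket + 1 == self.slots.len() as u32 {
            let mut acc = 0i64;
            for (filled, cell) in &self.slots {
                // Spin on any `filled` flag not yet visible (rare).
                while !filled.load(Ordering::Acquire) {
                    std::hint::spin_loop();
                }
                // Destructive read: move the result out of its slot.
                acc = acc * 10 + cell.lock().unwrap().take().unwrap();
            }
            Some(acc)
        } else {
            None
        }
    }
}

fn main() {
    let chain = Arc::new(Chain::new(3));
    // Deliver out of slot order from different threads.
    let handles: Vec<_> = [(2usize, 3i64), (0, 1), (1, 2)]
        .into_iter()
        .map(|(slot, val)| {
            let c = Arc::clone(&chain);
            thread::spawn(move || c.deliver(slot, val))
        })
        .collect();
    let results: Vec<Option<i64>> =
        handles.into_iter().map(|h| h.join().unwrap()).collect();
    // Slot order 1, 2, 3 gives 123; arrival order would have given 312.
    let total = results.into_iter().flatten().next().unwrap();
    assert_eq!(total, 123);
}
```

However the three deliveries interleave, the assertion holds, because the sweep iterates slots, not arrivals.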
Advantage: Zero CAS contention per delivery. Each delivery is ~11ns (store + fetch_add only).
Cost: The finalizer must spin-wait on any slot whose `filled` flag hasn't been published yet (rare, since Release/Acquire propagates quickly), and the sweep runs cold-cache if slots were written by other cores.
## When to use which
| Workload | Strategy | Why |
|---|---|---|
| Init-heavy, graph-heavy | OnArrival | Streaming overlaps accumulation with child computation |
| Balanced, finalize-heavy | OnFinalize | Minimal per-delivery overhead |
| Wide trees (bf > 10) | Either | OnArrival if results arrive in order |
| Deep narrow (bf=2) | OnFinalize | Only 2 slots — sweep trivial |
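The compile-time selection mentioned at the top can be illustrated with the usual zero-sized-type pattern, which is presumably how `AccumulateStrategy` is threaded through the fold. `Strategy`, `run_fold`, and the constant are stand-ins, not hylic's API:

```rust
/// Strategies as zero-sized types: picking one monomorphizes the fold,
/// so there is no runtime branch and no vtable on the delivery path.
trait Strategy {
    const NAME: &'static str;
}

struct OnArrival;
struct OnFinalize;

impl Strategy for OnArrival {
    const NAME: &'static str = "on-arrival";
}
impl Strategy for OnFinalize {
    const NAME: &'static str = "on-finalize";
}

/// Stand-in for a fold runner generic over the strategy.
fn run_fold<S: Strategy>() -> &'static str {
    S::NAME
}

fn main() {
    assert_eq!(run_fold::<OnArrival>(), "on-arrival");
    assert_eq!(run_fold::<OnFinalize>(), "on-finalize");
}
```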
## Memory footprint
Both strategies use destructive reads: results are moved out of their slots during accumulation, then dropped. For result types that own heap memory (`String`, `Vec`, etc.), this means resources are freed progressively as the sweep advances, not held alive until fold completion.
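A minimal illustration of a destructive read via `Option::take`, with a hypothetical `Vec<Option<String>>` standing in for the slot array:

```rust
fn main() {
    // Each slot owns its result until the sweep moves it out.
    let mut slots: Vec<Option<String>> = vec![
        Some("alpha".to_string()),
        Some("beta".to_string()),
    ];
    let mut acc = String::new();
    for slot in slots.iter_mut() {
        let result = slot.take().expect("slot filled"); // move out, leave None
        acc.push_str(&result);
        // `result` is dropped here: its heap buffer is freed as the
        // sweep advances, not at the end of the whole fold.
    }
    assert!(slots.iter().all(|s| s.is_none()));
    assert_eq!(acc, "alphabeta");
}
```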
With OnArrival, the live memory at any point is bounded by the number of delivered-but-not-yet-swept results (typically much smaller than total results). With OnFinalize, all results are delivered before the bulk sweep, so peak memory equals the node’s child count × result size — freed in one pass.
This is relevant for folds over large trees where each result carries significant heap data (parsed documents, artifact records, aggregated datasets). See Infrastructure for the arena allocation model.