
hylic

A Rust library for tree-shaped recursive computation.

hylic splits a recursive computation into three pieces you build independently: a fold that says what to compute at each node, a graph that says how a node yields its children, and an executor that drives the recursion. Each piece can be defined, transformed, and composed on its own.

The library ships with two executors. Fused is the sequential one — a callback-based recursion with no overhead beyond the fold closures themselves. Funnel is the parallel one: a work-stealing engine with three compile-time policy axes (queue topology, accumulation strategy, wake policy), all monomorphised, no runtime dispatch on strategy choice. On the 14-workload Matrix bench Funnel wins 10 rows outright against handrolled Rayon and a scoped pool, and lands within a few percent of the winner on the rest. Numbers, the interactive viewer, and the workload catalogue are on the benchmarks page. Scenarios are synthetic CPU-burn workloads, so absolute milliseconds describe shape and relative ordering rather than any specific production pipeline.

The same fold runs unchanged under either executor — the choice of FUSED versus exec(funnel::Spec::default(n)) is one expression, not one rewrite.

A first example

Consider the classical problem of computing total size on disk. The tree structure corresponds to the directory layout; the fold at each node combines the node’s own size with the results from its children; the executor drives the recursion. Each concern is expressed independently and handed to the executor at the end:

#![allow(unused)]
fn main() {
    #[test]
    fn intro_dir_example() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct Dir { name: String, size: u64, children: Vec<Dir> }

        let graph = treeish(|d: &Dir| d.children.clone());
        let fold = fold(
            |d: &Dir| d.size,
            |heap: &mut u64, child: &u64| *heap += child,
            |heap: &u64| *heap,
        );

        let tree = Dir {
            name: "project".into(), size: 10,
            children: vec![
                Dir { name: "src".into(), size: 200, children: vec![] },
                Dir { name: "docs".into(), size: 50, children: vec![] },
            ],
        };

        // Sequential:
        let total = FUSED.run(&fold, &graph, &tree);
        assert_eq!(total, 260);

        // Parallel — same fold, same graph:
        let total = exec(funnel::Spec::default(4)).run(&fold, &graph, &tree);
        assert_eq!(total, 260);
    }
}

The tree structure need not live inside the data. A Treeish is a function from a node to its children — it can traverse a nested struct, look up indices in a flat array, or resolve references through any external mechanism:

#![allow(unused)]
fn main() {
    #[test]
    fn intro_flat_example() {
        use hylic::prelude::*;

        // Flat adjacency list — nodes are indices, children are looked up
        let children: Vec<Vec<usize>> = vec![
            vec![1, 2],  // node 0 → children 1, 2
            vec![],      // node 1 → leaf
            vec![],      // node 2 → leaf
        ];
        let graph = treeish_visit(move |n: &usize, cb: &mut dyn FnMut(&usize)| {
            for &c in &children[*n] { cb(&c); }
        });
        let fold = fold(|n: &usize| *n as u64, |h: &mut u64, c: &u64| *h += c, |h| h.clone());

        let total = FUSED.run(&fold, &graph, &0);
        assert_eq!(total, 3); // 0 + 1 + 2
    }
}

Architecture

User-defined closures are wrapped into composable types (Fold, Treeish), transformed independently, and handed to an executor. The executor drives a recursion where fold and graph interleave at every node:

[Architecture diagram] fold(init: &N → H, accumulate: &mut H, &R, finalize: &H → R) builds a Fold<N, H, R>; treeish(visit: &N → children) builds a Treeish<N>. Each can be transformed on its own (map · zipmap · contramap · product · wrap_* on folds; filter · contramap · memoize · treemap on graphs) before exec.run(&fold, &graph, &root) → R drives the recursion. At each node: ① fold.init(&node) → heap; ② graph.visit(&node, |child| …); ③ fold.accumulate(&mut heap, &child_r), once per child; ④ fold.finalize(&heap) → R. Two executors implement this cycle: Fused (direct sequential recursion; any domain, any graph) and Funnel<P> (parallel work-stealing; three monomorphized policy axes).

  • N — the node type (a struct, an index, a key — anything)
  • H — the heap: per-node mutable scratch space, created by init, not shared between nodes
  • R — the result: produced by finalize, flows upward to the parent’s accumulate
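The cycle the executor drives can be sketched as a plain recursive function (an illustrative model with simplified &dyn Fn parameters, not the library's actual signatures):

```rust
// Illustrative sketch of the executor's per-node cycle — not hylic's
// generics, just the shape: init, visit children, accumulate each
// child's result, finalize.
fn run<N, H, R>(
    init: &dyn Fn(&N) -> H,
    visit: &dyn Fn(&N) -> Vec<N>,
    accumulate: &dyn Fn(&mut H, &R),
    finalize: &dyn Fn(&H) -> R,
    node: &N,
) -> R {
    let mut heap = init(node); // ① per-node scratch
    for child in visit(node) { // ② yield children
        let r = run(init, visit, accumulate, finalize, &child);
        accumulate(&mut heap, &r); // ③ fold the child's result in
    }
    finalize(&heap) // ④ close out to R
}

fn main() {
    // Sum over a flat adjacency list rooted at node 0.
    let children: Vec<Vec<usize>> = vec![vec![1, 2], vec![], vec![]];
    let total = run(
        &|n: &usize| *n as u64,
        &|n: &usize| children[*n].clone(),
        &|h: &mut u64, r: &u64| *h += r,
        &|h: &u64| *h,
        &0,
    );
    assert_eq!(total, 3);
}
```
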

Any fold and graph can be executed in parallel by switching to the Funnel executor — a work-stealing engine that interleaves unfold and fold without materialising the tree. Three compile-time policy axes — queue topology, accumulation strategy, wake policy — are monomorphised, so there is no runtime dispatch on strategy choice. See Benchmarks for measured results.

Transformations and lifts

Folds and graphs are independently transformable. Each combinator produces a new value — the original is unchanged (for Clone domains) or consumed (for Owned):

Fold<N, H, R> transforms:

  • .map(Fn(&R)→RNew, Fn(&RNew)→R) → Fold<N, H, RNew>
  • .zipmap(Fn(&R)→Extra) → Fold<N, H, (R, Extra)>
  • .contramap(Fn(&NewN)→N) → Fold<NewN, H, R>
  • .product(&Fold<N, H2, R2>) → Fold<N, (H,H2), (R,R2)>
  • .wrap_init(Fn(&N, &dyn Fn(&N)→H)→H) · .wrap_accumulate(Fn(&mut H, &R, &dyn Fn(&mut H, &R))) · .wrap_finalize(Fn(&H, &dyn Fn(&H)→R)→R)

Treeish<N> transforms:

  • .filter(Fn(&N)→bool) → Treeish<N>
  • memoize_treeish(&Treeish<N>) → Treeish<N> (cached for DAGs)
  • .contramap(Fn(&NewN)→N) → Treeish<NewN>
  • .treemap(Fn(&N)→NewN, Fn(&NewN)→N) → Treeish<NewN>

  • N, NewN — original and target node types
  • H — the fold’s per-node heap (unchanged by map/zipmap/contramap)
  • R, RNew, Extra — original, replaced, and augmented result types

All compose freely — see the Fold guide, Graph guide, and Transformations cookbook.
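As a toy model of one of these combinators outside the library, a minimal Fold-like struct (not hylic's API) shows how product pairs two folds so they run in a single traversal:

```rust
// Toy model of the `product` combinator — not hylic's API. Two folds over
// the same node type fuse into one pass by pairing their heaps and results.
struct MiniFold<N, H, R> {
    init: Box<dyn Fn(&N) -> H>,
    accumulate: Box<dyn Fn(&mut H, &R)>,
    finalize: Box<dyn Fn(&H) -> R>,
}

impl<N, H, R> MiniFold<N, H, R> {
    // Drive the fold over a tree given by a children function.
    fn run(&self, children: &dyn Fn(&N) -> Vec<N>, node: &N) -> R {
        let mut h = (self.init)(node);
        for c in children(node) {
            let r = self.run(children, &c);
            (self.accumulate)(&mut h, &r);
        }
        (self.finalize)(&h)
    }
}

fn product<N, H1, R1, H2, R2>(
    a: MiniFold<N, H1, R1>,
    b: MiniFold<N, H2, R2>,
) -> MiniFold<N, (H1, H2), (R1, R2)>
where
    N: 'static, H1: 'static, R1: 'static, H2: 'static, R2: 'static,
{
    let MiniFold { init: ia, accumulate: aa, finalize: fa } = a;
    let MiniFold { init: ib, accumulate: ab, finalize: fb } = b;
    MiniFold {
        init: Box::new(move |n: &N| (ia(n), ib(n))),
        accumulate: Box::new(move |h: &mut (H1, H2), r: &(R1, R2)| {
            aa(&mut h.0, &r.0);
            ab(&mut h.1, &r.1);
        }),
        finalize: Box::new(move |h: &(H1, H2)| (fa(&h.0), fb(&h.1))),
    }
}

// Sum of node values and node count, computed together in one traversal.
fn sum_and_count(children: &dyn Fn(&u64) -> Vec<u64>, root: u64) -> (u64, u64) {
    let sum = MiniFold::<u64, u64, u64> {
        init: Box::new(|n| *n),
        accumulate: Box::new(|h, r| *h += r),
        finalize: Box::new(|h| *h),
    };
    let count = MiniFold::<u64, u64, u64> {
        init: Box::new(|_| 1),
        accumulate: Box::new(|h, r| *h += r),
        finalize: Box::new(|h| *h),
    };
    product(sum, count).run(children, &root)
}

fn main() {
    // Root 10 with leaves 200 and 50.
    let kids = |n: &u64| -> Vec<u64> { if *n == 10 { vec![200, 50] } else { vec![] } };
    assert_eq!(sum_and_count(&kids, 10), (260, 3));
}
```
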

A lift goes further — it transforms both fold and treeish in sync into a different type domain via the Lift trait. The Explainer records the full computation trace at every node (histomorphism).

SeedPipeline handles a common case: the tree is discovered lazily from seed references rather than known upfront. The user provides a seed edge function (Edgy<N, Seed>) and a grow function (Fn(&Seed) -> N); the pipeline constructs the treeish, handles the entry transition, and runs the fold. Internally it uses a lift (SeedLift), and the SeedNode<N> row type is hidden behind sugar-time Node/EntryRoot dispatch.

Cookbook

The Cookbook contains worked examples with snapshot-tested output: expression evaluation, module resolution, configuration inheritance, filesystem summary, cycle detection, parallel execution.

Where to start

The Quick Start walks through constructing and running a fold. The recursive pattern explains the underlying decomposition.

Further reading

  • Meijer, Fokkinga, Paterson. Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire. (1991) — the original recursion schemes paper.
  • Milewski. Monoidal Catamorphisms (2020) — a different algebra factorization. See comparison.
  • Kmett. recursion-schemes — Haskell reference implementation.
  • Malick. recursion.wtf — practical recursion schemes in Rust.

Quick Start

A complete fold — definition, tree structure, sequential execution — is one prelude line and three closures:

#![allow(unused)]
fn main() {
    #[test]
    fn intro_dir_example() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct Dir { name: String, size: u64, children: Vec<Dir> }

        let graph = treeish(|d: &Dir| d.children.clone());
        let fold = fold(
            |d: &Dir| d.size,
            |heap: &mut u64, child: &u64| *heap += child,
            |heap: &u64| *heap,
        );

        let tree = Dir {
            name: "project".into(), size: 10,
            children: vec![
                Dir { name: "src".into(), size: 200, children: vec![] },
                Dir { name: "docs".into(), size: 50, children: vec![] },
            ],
        };

        // Sequential:
        let total = FUSED.run(&fold, &graph, &tree);
        assert_eq!(total, 260);

        // Parallel — same fold, same graph:
        let total = exec(funnel::Spec::default(4)).run(&fold, &graph, &tree);
        assert_eq!(total, 260);
    }
}

fold(...) builds a Shared-domain Fold<Dir, u64, u64> from three closures: init produces a per-node heap from a &Dir, accumulate folds each child’s result into the heap, and finalize extracts the result. treeish(...) wraps a children function as a Treeish<Dir>. FUSED is the sequential executor constant — callback-based recursion, no overhead beyond the fold closures.

The Funnel executor swaps in without touching the fold or graph:

#![allow(unused)]
fn main() {
    #[test]
    fn quickstart_funnel() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct Dir { name: String, size: u64, children: Vec<Dir> }

        let graph = treeish(|d: &Dir| d.children.clone());
        let fold = fold(
            |d: &Dir| d.size,
            |heap: &mut u64, child: &u64| *heap += child,
            |heap: &u64| *heap,
        );

        let tree = Dir {
            name: "root".into(), size: 10,
            children: vec![
                Dir { name: "a".into(), size: 5, children: vec![] },
                Dir { name: "b".into(), size: 3, children: vec![] },
            ],
        };

        let total = exec(funnel::Spec::default(4)).run(&fold, &graph, &tree);
        assert_eq!(total, 18);
    }
}

Spec::default(n) picks the Robust preset over n worker threads; see Funnel policies for the alternatives.

For repeated folds, pool creation amortises in a session scope:

#![allow(unused)]
fn main() {
    #[test]
    fn quickstart_session() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct Dir { name: String, size: u64, children: Vec<Dir> }

        let graph = treeish(|d: &Dir| d.children.clone());
        let fold = fold(
            |d: &Dir| d.size,
            |heap: &mut u64, child: &u64| *heap += child,
            |heap: &u64| *heap,
        );

        let tree = Dir {
            name: "root".into(), size: 10,
            children: vec![
                Dir { name: "a".into(), size: 5, children: vec![] },
            ],
        };

        exec(funnel::Spec::default(4)).session(|s| {
            let r1 = s.run(&fold, &graph, &tree);
            let r2 = s.run(&fold, &graph, &tree);
            assert_eq!(r1, r2);
        });
    }
}

The same fold over flat data

The tree need not live inside the data. The same summation fold runs over a Vec<Vec<usize>> adjacency list, where nodes are integer indices:

#![allow(unused)]
fn main() {
    #[test]
    fn intro_flat_example() {
        use hylic::prelude::*;

        // Flat adjacency list — nodes are indices, children are looked up
        let children: Vec<Vec<usize>> = vec![
            vec![1, 2],  // node 0 → children 1, 2
            vec![],      // node 1 → leaf
            vec![],      // node 2 → leaf
        ];
        let graph = treeish_visit(move |n: &usize, cb: &mut dyn FnMut(&usize)| {
            for &c in &children[*n] { cb(&c); }
        });
        let fold = fold(|n: &usize| *n as u64, |h: &mut u64, c: &u64| *h += c, |h| h.clone());

        let total = FUSED.run(&fold, &graph, &0);
        assert_eq!(total, 3); // 0 + 1 + 2
    }
}

Only the node type and the Treeish change — the fold logic is identical. This separation is the foundation of hylic’s composability.

Pivoting between the two

The two formulations describe the same shape of computation in different node types. Fold::contramap_n lets you take a fold written for one and run it over the other, without rewriting any of the fold’s closures.

Going Dir-fold → flat: synthesise a minimal Dir per index — only the fields the fold actually reads need to exist. The graph is swapped on the executor’s side.

#![allow(unused)]
fn main() {
    #[test]
    fn pivot_dir_fold_to_flat() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct Dir { size: u64, children: Vec<Dir> }

        // The Dir-fold reads d.size and nothing else.
        let dir_fold: Fold<Dir, u64, u64> = fold(
            |d: &Dir| d.size,
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );

        // Flat data for the same logical tree.
        let sizes: Vec<u64>      = vec![10, 200, 50];
        let adj: Vec<Vec<usize>> = vec![vec![1, 2], vec![], vec![]];

        // Pivot: contramap_n synthesises a minimal Dir from each index — only
        // the fields the fold reads need to exist. The fold's closures are
        // unchanged; the graph is the index-based one.
        let flat_fold:  Fold<usize, u64, u64> =
            dir_fold.contramap_n(move |i: &usize| Dir { size: sizes[*i], children: vec![] });
        let flat_graph: Treeish<usize> =
            treeish_visit(move |i: &usize, cb: &mut dyn FnMut(&usize)| {
                for &c in &adj[*i] { cb(&c); }
            });

        let total: u64 = FUSED.run(&flat_fold, &flat_graph, &0);
        assert_eq!(total, 260);
    }
}

The mirror direction projects each Dir to the index the flat-fold expects:

#![allow(unused)]
fn main() {
    #[test]
    fn pivot_flat_fold_to_dir() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct Dir { id: usize, children: Vec<Dir> }

        // The flat-fold reads sizes[*i] from a captured array.
        let sizes: Vec<u64> = vec![10, 200, 50];
        let flat_fold: Fold<usize, u64, u64> = fold(
            move |i: &usize| sizes[*i],
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );

        // The same logical tree as a struct.
        let root = Dir { id: 0, children: vec![
            Dir { id: 1, children: vec![] },
            Dir { id: 2, children: vec![] },
        ]};

        // Pivot: contramap_n projects each Dir to the index the fold expects.
        // The fold's closures are unchanged; the graph walks struct children.
        let dir_fold:  Fold<Dir, u64, u64> = flat_fold.contramap_n(|d: &Dir| d.id);
        let dir_graph: Treeish<Dir>        = treeish(|d: &Dir| d.children.clone());

        let total: u64 = FUSED.run(&dir_fold, &dir_graph, &root);
        assert_eq!(total, 260);
    }
}

In both directions the original fold’s closures pass through unchanged; the only transformation is contramap_n on the input axis. The graph is chosen at the call site to match.

Further reading

Glossary

One-line definitions of the core terms, with pointers to where they’re developed in depth. Link to this page from anywhere a term appears without its definition.

Fold

The algebra over a recursion — a triple of closures init: &N → H, accumulate: &mut H, &R, finalize: &H → R. Given an input node N, a fold says how to produce a per-node scratch state, how to fold each child’s result into it, and how to close it out into a final R. See Fold guide.

Graph / Treeish<N>

A function from a node to its children. The type is Treeish<N>; “graph” is the informal name for the concept. A Treeish is how the recursion finds the next level; the executor and fold never see the tree structure directly, only what the Treeish yields. See Graph guide.

Heap (H)

The per-node working state inside a fold. Produced by init, mutated by accumulate as each child’s R arrives, consumed by finalize. Not shared across nodes; each node gets its own. Also written as the type variable H in Fold signatures.

R (result)

The type returned from finalize at each node, and the type that flows upward into the parent’s accumulate. At the root, the executor hands back an R.

Domain (Shared / Local / Owned)

How hylic stores closures inside folds, graphs, and grow functions. Shared uses Arc<dyn Fn + Send + Sync> (cheap clone, parallel-safe); Local uses Rc<dyn Fn> (cheap clone, single-thread, non-Send captures); Owned uses Box<dyn Fn> (one-shot, consumed on use). See The three domains.
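The three disciplines in miniature, as a plain-Rust sketch rather than hylic's types:

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::sync::Arc;

// Sketch of the three closure-storage disciplines — not hylic's types.
fn demo() -> (u64, u64, u64) {
    // Shared: Arc<dyn Fn + Send + Sync> clones cheaply and crosses threads.
    let shared: Arc<dyn Fn(&u64) -> u64 + Send + Sync> = Arc::new(|n| n + 1);
    let s2 = Arc::clone(&shared);
    let from_thread = std::thread::spawn(move || s2(&1)).join().unwrap();

    // Local: Rc<dyn Fn> clones cheaply but may hold non-Send captures.
    let cell = Rc::new(Cell::new(0u64)); // Rc<Cell<_>> is not Send
    let local: Rc<dyn Fn(&u64)> = {
        let cell = Rc::clone(&cell);
        Rc::new(move |n| cell.set(cell.get() + n))
    };
    local(&5);

    // Owned: Box<dyn Fn>, one-shot by convention — the API consumes it.
    let owned: Box<dyn Fn(&u64) -> u64> = Box::new(|n| n * 2);
    let doubled = owned(&3);
    drop(owned); // handed over; gone after use

    (from_thread, cell.get(), doubled)
}

fn main() {
    assert_eq!(demo(), (2, 5, 6));
}
```
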

Executor (Fused / Funnel)

The runtime that drives the recursion. Fused is a direct sequential callback recursion — one thread, no work queue. Funnel is a parallel work-stealing engine running over a compile-time policy (queue topology, accumulation strategy, wake policy). Both implement the Executor<N, R, D, G> trait. See Choosing an executor.

Lift

A transformation that rewrites a (grow, treeish, fold) triple into another one with possibly different types. Implemented by the Lift trait; the library ships four atoms (IdentityLift, ComposedLift, ShapeLift, SeedLift). See Lifts.

ComposedLift<L1, L2>

The binary composition atom — two lifts chained so L2’s inputs equal L1’s outputs. Every multi-sugar call builds a right-associated ComposedLift tree so the compiler can verify every junction at build time.

ShapeLift

The universal “rewrite one or more fold phases” lift used by most Stage-2 sugars (wrap_init, zipmap, filter_edges, …).

SeedLift

The finishing lift that closes a SeedPipeline’s grow axis. Domain-parametric over ShapeCapable: a single SeedLift<D, N, Seed, H> struct, with per-domain Lift impls because the fold-construction closures’ Send+Sync discipline differs by domain. Not something user code constructs directly — seed-rooted Stage2Pipeline::run assembles it at call time from grow + user-supplied root_seeds + entry_heap and composes it as the first lift of the run-time chain.

SeedNode<N>

Sealed row type with two library-internal variants: the synthetic EntryRoot (a seed-closed chain’s root row) and a resolved Node(N). User code inspects via is_entry_root, as_node, map_node. See SeedNode.

SeedExplainerResult<N, H, R>

N-typed projection of a seed-closed explainer result. The EntryRoot row is promoted into top-level fields (entry_initial_heap, entry_working_heap, orig_result); each root subtree becomes an ExplainerResult<N, H, R> — no SeedNode<N> appears in the user-visible shape. Obtained via raw.into() (or SeedExplainerResult::from(raw)).

Pipeline

A typestate-chained builder over lifts. SeedPipeline and TreeishPipeline are Stage 1 (base slots); Stage2Pipeline<Base, L> is the unified Stage-2 form (base + lift chain), distinguished by its Base. OwnedPipeline is a one-shot variant. Every pipeline ultimately resolves to a (treeish, fold) pair handed to an executor. See Pipelines.

Stage2Pipeline<Base, L>

The single Stage-2 type. Base is a Stage-1 pipeline implementing Stage2Base; L is a Lift chain. Treeish-rooted pipelines (Stage2Pipeline<TreeishPipeline<…>, L>) and seed-rooted pipelines (Stage2Pipeline<SeedPipeline<…>, L>) compose Stage-2 sugars uniformly via Wrap dispatch.

Wrap / Stage2Base

Wrap is the dispatch trait that maps a Stage-2 sugar’s user-facing &N parameter type to the chain-tip’s actual type — Identity when the chain runs over N, SeedWrap when it runs over SeedNode<N>. Stage2Base is the trait implemented by Stage-1 pipelines that can root a Stage2Pipeline; it carries the associated Wrap implementation.

ShapeCapable::EntryHeap<H>

The per-domain GAT giving a domain its Fn() -> H storage discipline. Arc<dyn Fn() -> H + Send + Sync> on Shared, Rc<dyn Fn() -> H> on Local. Used by SeedLift for the EntryRoot init thunk in place of a hand-rolled domain discriminator enum.

Sugar

A pipeline method that delegates to .then_lift(...) with a library lift — wrap_init, zipmap, filter_edges, explain, etc. The sugar traits are split across stages and domains: SeedSugars* and TreeishSugars* for Stage 1 reshape; Stage2SugarsShared/Stage2SugarsLocal for Stage 2 (one trait per domain, blanket-implemented across both Bases). See Sugars.

CPS (continuation-passing style)

Used in two places with different meanings, both internal machinery:

  • In Lift::apply, the trait takes a continuation so a lift can transform the triple and then call through to the user’s executor rather than returning a value. This enables composition without an intermediate materialisation.
  • In the Funnel executor, the recursion is defunctionalised into Cont::Root / Cont::Direct / Cont::Slot variants so workers run a loop { match cont { … } } rather than nesting calls. See CPS walk.

Users of the library don’t need to think about CPS to use it; these sections are optional reading.
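Defunctionalisation in miniature: a sketch (not Funnel's actual Cont or scheduler) where pending work is an enum value and one loop { match … } replaces nested calls:

```rust
// Miniature defunctionalisation — not Funnel's actual Cont variants.
// Pending work is data (an enum), not nested stack frames; a single
// loop drives the whole computation.
enum Cont {
    Expand(usize),         // node still to be unfolded
    Combine(usize, usize), // (node, child count) waiting on child results
}

fn tree_sum(children: &[Vec<usize>], root: usize) -> u64 {
    let mut work = vec![Cont::Expand(root)];
    let mut vals: Vec<u64> = Vec::new(); // explicit value stack for child results
    while let Some(cont) = work.pop() {
        match cont {
            Cont::Expand(n) => {
                // Push the combine *under* the children so it runs only
                // after all of them have produced a value.
                work.push(Cont::Combine(n, children[n].len()));
                for &c in &children[n] {
                    work.push(Cont::Expand(c));
                }
            }
            Cont::Combine(n, k) => {
                let mut heap = n as u64; // init
                for _ in 0..k {
                    heap += vals.pop().unwrap(); // accumulate
                }
                vals.push(heap); // finalize
            }
        }
    }
    vals.pop().unwrap()
}

fn main() {
    let children = vec![vec![1, 2], vec![], vec![]];
    assert_eq!(tree_sum(&children, 0), 3);
}
```
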

Variance

Whether a type’s role allows covariant, contravariant, or invariant transformation. N is covariant in grow, invariant in graph, contravariant in fold’s init; H and R are invariant. This is why the methods have the names they do (map for covariant, contramap for contravariant, *_bi for invariant/bijective). See Transforms and variance.
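A toy illustration of contravariance on the node axis, using a minimal Fold-like struct rather than hylic's API: pre-composing a projection &M → N in front of init turns a fold over N into a fold over M, and nothing else changes.

```rust
// Toy model of node-axis contravariance — not hylic's API.
struct MiniFold<N, H, R> {
    init: Box<dyn Fn(&N) -> H>,
    accumulate: Box<dyn Fn(&mut H, &R)>,
    finalize: Box<dyn Fn(&H) -> R>,
}

impl<N: 'static, H: 'static, R: 'static> MiniFold<N, H, R> {
    // Contravariant: consumes a projection *into* N, producing a fold *over* M.
    fn contramap<M: 'static>(self, f: impl Fn(&M) -> N + 'static) -> MiniFold<M, H, R> {
        let init = self.init;
        MiniFold {
            init: Box::new(move |m| init(&f(m))),
            accumulate: self.accumulate, // untouched: H and R stay fixed
            finalize: self.finalize,
        }
    }
}

// A summing fold written for u64 sizes, pivoted to usize indices.
fn sum_over_indices(sizes: Vec<u64>) -> MiniFold<usize, u64, u64> {
    let sum = MiniFold::<u64, u64, u64> {
        init: Box::new(|n| *n),
        accumulate: Box::new(|h, r| *h += r),
        finalize: Box::new(|h| *h),
    };
    sum.contramap(move |i: &usize| sizes[*i])
}

fn main() {
    let f = sum_over_indices(vec![10, 200, 50]);
    let mut heap = (f.init)(&0);     // node 0's own size: 10
    (f.accumulate)(&mut heap, &200); // child results arrive
    (f.accumulate)(&mut heap, &50);
    assert_eq!((f.finalize)(&heap), 260);
}
```
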

Grow

The Seed → N closure in a SeedPipeline — resolves a reference into a full node. Only SeedPipeline has a grow slot; a TreeishPipeline skips it (nodes are already materialised).

Benchmark results

Wall-clock means from criterion across four harnesses. The Overhead harness pits Fused against handrolled single-threaded recursions to measure framework cost. The Matrix harness runs Funnel’s 16 policy variants alongside handrolled Rayon and a scoped pool, all parallel, across 14 workload scenarios. Module simulation runs a synthetic dependency-graph resolver — the workload that originally motivated the library. Quick is a small subset of the Matrix grid, used to track changes during development.

What the numbers say

Sequential first. Fused lands within ±20% of hand.seq on every row of the Overhead bench, faster on 8 of 11. The spread against real.seq (a plain fn f(&T) -> R with no hylic types in sight) is within ±16%. The library’s fold/treeish indirection is, in this regime, on the order of compiler-level noise rather than an integer multiple. The plausible reason is a uniform per-node shape that monomorphises predictably plus closures held inside Fold and Treeish that the compiler can inline through; whatever the cause, the practical statement is parity, not dominance.

The parallel picture is more interesting. A Funnel variant is the row winner on 10 of 14 Matrix workloads. On the remaining 4 the row winner is handrolled and the nearest Funnel variant lies within a few percent. No single policy preset wins across the grid: shallow-wide workloads prefer Shared queues with OnArrival accumulation; deep-narrow prefer PerWorker with OnFinalize; the wake axis can move a row by 10–30% on its own. The 14-row table below is most useful read row-by-row — for any one workload, the policy that wins tells you something about the workload’s shape.

The Module-simulation harness probes the same trade-off. On the four _fast rows (large-dense, large-sparse, small-dense, small-sparse) Funnel variants win — different policy axes per row, unsurprisingly given the Matrix story. On the _slow rows, where per-node work dominates and scheduling ceases to matter, the runners cluster.

These properties of Funnel are statements about the source, not inferences from the benchmarks:

  • Policies are monomorphised: Funnel<P> is generic, the entire walk specialises per policy, and there is no runtime dispatch on strategy.
  • Continuations are defunctionalised: Cont<H, R> is a three-variant enum (Root, Direct, Slot); the inner loop is match cont inside a loop, with no Box<dyn FnOnce> per step.
  • Continuations and fold chains live in arenas: ChainNode<H, R> in a scoped Arena, Cont<H, R> in a ContArena, both released in bulk at the end of the pool’s lifetime; no per-node malloc/free.
  • Under the OnArrival accumulation policy, each child result is folded into its parent’s heap on arrival via P::Accumulate::deliver, and the slot is freed; OnFinalize buffers until siblings are complete and then drains.
  • The walk references the user’s fold and treeish by &'a _, with the lifetime tied to the pool’s with(...) scope; user closures are not cloned into worker queues.
  • Queue topology is a compile-time choice — per-worker deques (local push, remote steal) or a single shared FIFO — and selection is per workload rather than universal.
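The two accumulation strategies can be modelled in a few lines (a simplified sketch, not the engine's code):

```rust
// Sketch of the two accumulation disciplines — not Funnel's implementation.
// OnArrival folds each child result into the parent heap the moment it
// lands, so the child's slot can be freed immediately; OnFinalize buffers
// sibling results and drains them once the set is complete.
fn on_arrival(init: u64, arriving: impl Iterator<Item = u64>) -> u64 {
    let mut heap = init;
    for r in arriving {
        heap += r; // deliver immediately; no per-child storage survives
    }
    heap
}

fn on_finalize(init: u64, arriving: impl Iterator<Item = u64>) -> u64 {
    let buffered: Vec<u64> = arriving.collect(); // held until siblings finish
    let mut heap = init;
    for r in buffered {
        heap += r; // drained in one pass at finalize time
    }
    heap
}

fn main() {
    // For an order-insensitive accumulate the two disciplines agree; they
    // differ in when the parent heap is touched and how long results live.
    let results = [200u64, 50];
    assert_eq!(on_arrival(10, results.iter().copied()), 260);
    assert_eq!(on_finalize(10, results.iter().copied()), 260);
}
```
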

See the Funnel deep-dive for the walk, ticket system, and arena details, and Policies and presets for the policy traits.

Interactive: Funnel axes viewer

The Matrix bench output filtered by policy axis, marginalised on demand, with cell-level deviations from real.rayon.


Overhead

make -C hylic-benchmark bench-overhead

The Overhead table also lists several parallel runners (real.rayon, hylic-rayon, hand.rayon, hylic-parref+rayon, hylic-eager+rayon) for cross-reference. They are not the denominators for sequential-overhead statements; a parallel runner beating a sequential one says that multiple cores are faster than one, not that the framework is slow.

For a framework-vs-handrolled comparison in the parallel regime, hylic-rayon versus real.rayon is the apples-to-apples pair: within ±15% on most rows, with a worst-case +33% on parse-lt_sm. That is a real framework tax on the parallel path; whether it’s tolerable depends on the choice between Funnel and a Rayon-backed executor.

Matrix

make -C hylic-benchmark bench-matrix

Each cell shows the wall-clock mean and the +X% deviation from the row’s fastest entry; the row winner is marked (best). Reading a few rows together brings out the policy-axis story.

wide_sm (200 nodes, branching 20): funnel.pw.arrv.push = 6.2ms (best), 20% ahead of both hand.pool and hand.rayon at 7.5ms. Wide fan-out plus immediate OnArrival delivery and per-worker deques keeps the push cheap and drains the child heap as siblings complete.

graph-hv_sm (heavy edge-discovery, modelling a dependency resolver): funnel.sh.fin.k2 = 16.4ms (best), 2% ahead of hand.rayon at 16.8ms. Dropping the wake frequency to every second child amortises the edge-discovery cost better than the handrolled approaches.

The 4 rows where handrolled wins are bal_sm, io_sm, graph-io_sm, noop_sm. On bal_sm, hand.rayon = 16.1ms versus funnel.sh.fin.push = 17.0ms (+6%). noop_sm is the zero-work cell — dominated by per-node bookkeeping, absolute times sub-millisecond, percentage deltas distort. The framework cost is most visible there and unavoidable for any tree-shaped recursive parallelisation.

Module simulation

make -C hylic-benchmark bench-modsim

Eight workloads on two axes — sparse vs dense graph, fast vs slow per-node work. On the four _fast rows, Funnel variants take three of four winners (funnel.pw.fin.push = 1.0ms on large-dense_fast, funnel.pw.arrv.push = 1.0ms on large-sparse_fast, funnel.sh.arrv.push = 0.3ms on small-sparse_fast); the fourth, small-dense_fast, sits near 0.3ms across runners. For dependency-graph-shaped workloads with cheap per-node work — the common case for a module resolver — Funnel is the faster choice. Where per-node work dominates, scheduler choice ceases to matter and the runners converge.

Quick

make bench-quick-light

Five runners — real.rayon plus four Funnel variants covering both queue axes (PerWorker, Shared) and both accumulation axes (OnArrival, OnFinalize), all with EveryK<4> wake. Nine scenarios chosen for variation: noop, hash, parse-lt, parse-hv, aggr, xform, bal, wide, graph-hv. Near-parity scenarios (io, deep, fin, graph-io, lg-dense) are excluded.

The -ab variants run the same bench across multiple git revisions of hylic, archiving each run with a timestamp. Further revisions can be added by appending label=gitref to the makefile target.

Workload scenarios

Each scenario is a TreeSpec (node count, branching factor) and a WorkSpec (per-phase CPU burn amounts plus an optional I/O spin-wait). busy_work is the deterministic u64 LCG loop inside black_box; spin_wait_us is a wall-clock busy-wait. The scenarios are synthetic — the intent is to cover a shape space (shallow-wide, deep-narrow, accumulate-heavy, finalize-heavy, I/O-bound, graph-discovery-heavy) rather than reproduce any specific production workload.

#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/support/scenario.rs:scenario_catalog}}
}
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/support/work.rs:work_spec}}
}
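As a sketch of the burn loop's shape — assuming a standard 64-bit LCG (Knuth's MMIX constants, which are an assumption here, not the benchmark's) and std::hint::black_box; the actual parameters live in the included work.rs:

```rust
use std::hint::black_box;

// Sketch of a deterministic busy-work loop: a 64-bit LCG stepped inside
// black_box so the compiler cannot fold the work away. The multiplier and
// increment here are Knuth's MMIX constants, chosen for illustration.
fn busy_work(iters: u64, seed: u64) -> u64 {
    let mut x = seed;
    for _ in 0..iters {
        // LCG step: x ← a·x + c (mod 2^64)
        x = black_box(
            x.wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407),
        );
    }
    x
}

fn main() {
    // Deterministic: the same (iters, seed) always yields the same value,
    // and distinct seeds stay distinct (each step is a bijection mod 2^64).
    assert_eq!(busy_work(1_000, 42), busy_work(1_000, 42));
    assert_ne!(busy_work(1_000, 42), busy_work(1_000, 43));
}
```
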

Funnel policy variants

#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/support/executor_set.rs:funnel_specs}}
}

See Funnel policies for the meaning of each axis, the rationale, and guidance on selecting a preset.

Text tables

Overhead
workload                          hand-pool        hand-rayon          hand-seq hylic-eager+fused hylic-eager+rayon       hylic-fused hylic-fused-local hylic-fused-owned hylic-parref+fused hylic-parref+rayon       hylic-rayon        real-rayon          real-seq
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
aggr_sm                       19.5ms (+68%)     14.2ms (+23%)    50.1ms (+334%)     21.7ms (+88%)     20.1ms (+74%)    52.3ms (+353%)    52.0ms (+350%)    55.7ms (+382%)     15.6ms (+35%)      12.6ms (+9%)    11.5ms  (best)     15.9ms (+38%)    47.9ms (+315%)
bal_sm                        25.8ms (+69%)    15.3ms  (best)    92.3ms (+505%)    62.8ms (+311%)     28.5ms (+87%)    89.0ms (+483%)    88.2ms (+478%)    88.7ms (+481%)    59.9ms (+292%)     21.2ms (+39%)     21.0ms (+38%)     20.7ms (+36%)    76.9ms (+404%)
deep_sm                       10.0ms (+33%)      9.6ms (+28%)    34.9ms (+365%)    32.3ms (+330%)     13.3ms (+78%)    34.6ms (+361%)    36.8ms (+391%)    35.0ms (+367%)    31.9ms (+325%)      9.5ms (+26%)      8.6ms (+15%)     7.5ms  (best)    31.0ms (+313%)
fin_sm                        15.1ms (+60%)     9.5ms  (best)    45.1ms (+377%)     14.8ms (+56%)     15.7ms (+66%)    44.6ms (+371%)    47.9ms (+407%)    43.9ms (+364%)     9.5ms  (best)     10.5ms (+11%)      10.2ms (+7%)     11.0ms (+17%)    43.0ms (+355%)
hash_sm                        1.6ms (+69%)      1.2ms (+23%)     5.9ms (+520%)     4.5ms (+377%)      1.6ms (+66%)     4.2ms (+339%)     4.5ms (+375%)     4.5ms (+374%)     4.2ms (+340%)      1.3ms (+37%)       1.0ms (+5%)     1.0ms  (best)     4.6ms (+381%)
io_sm                         10.9ms (+43%)       7.6ms (+1%)    42.4ms (+460%)    42.7ms (+464%)       7.9ms (+4%)    42.3ms (+458%)    42.6ms (+462%)    42.5ms (+461%)    42.6ms (+462%)     7.6ms  (best)       7.6ms (+1%)     7.6ms  (best)    42.0ms (+455%)
lg-dense_sm                   29.1ms (+49%)    19.5ms  (best)    91.1ms (+368%)    74.8ms (+284%)     22.7ms (+17%)   102.8ms (+428%)    87.2ms (+348%)    85.7ms (+340%)    73.0ms (+275%)     22.9ms (+17%)      20.3ms (+4%)      20.5ms (+5%)    96.4ms (+395%)
noop_sm                       0.1ms (+7762%)     0.0ms (+2148%)      0.0ms (+36%)     0.1ms (+13599%)     0.2ms (+20791%)     0.0ms (+336%)     0.0ms (+867%)     0.0ms (+623%)     0.1ms (+12240%)     0.1ms (+8872%)     0.0ms (+2548%)     0.0ms (+2697%)     0.0ms  (best)
parse-hv_sm                   37.6ms (+56%)    24.2ms  (best)   102.4ms (+323%)   119.6ms (+395%)      24.8ms (+3%)   119.7ms (+395%)   114.0ms (+372%)   124.1ms (+413%)   106.6ms (+341%)      24.6ms (+2%)     27.2ms (+13%)      24.6ms (+2%)   111.5ms (+361%)
parse-lt_sm                  10.5ms (+101%)      7.2ms (+39%)    28.9ms (+455%)    26.8ms (+414%)      7.7ms (+48%)    26.2ms (+403%)    29.0ms (+458%)    26.2ms (+403%)    27.2ms (+422%)      6.9ms (+33%)      5.7ms (+10%)     5.2ms  (best)    30.6ms (+488%)
wide_sm                        9.6ms (+40%)     6.9ms  (best)    38.6ms (+461%)    31.6ms (+359%)     10.0ms (+45%)    35.6ms (+418%)    35.6ms (+417%)    35.6ms (+417%)    30.9ms (+349%)     10.2ms (+49%)      8.9ms (+30%)      9.4ms (+37%)    31.3ms (+355%)
xform_sm                      17.3ms (+66%)     11.7ms (+13%)    53.9ms (+419%)     19.5ms (+88%)     16.3ms (+57%)    43.6ms (+320%)    50.1ms (+383%)    50.0ms (+382%)     13.1ms (+27%)     12.4ms (+19%)    10.4ms  (best)     11.6ms (+12%)    50.9ms (+390%)

(Matrix and Module-simulation text tables refresh after the next make bench-matrix / make bench-modsim run.)

Benchmark source

Overhead harness
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/bench_overhead.rs}}
}
Matrix harness
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/bench_matrix.rs}}
}
Module simulation harness
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/bench_modsim.rs}}
}
Runner matrix construction
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/support/runners.rs}}
}
Handrolled baselines
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/support/baselines.rs}}
}
Funnel policy specs
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/support/executor_set.rs}}
}

Correctness

Performance numbers are uninformative without correctness. The Funnel executor has a unit and integration suite under hylic/src/exec/variant/funnel/tests/ covering the API, parity with the Fused baseline, and deterministic results across all policy variants. An interleaving stress harness in tests/interleaving.rs and tests/stress.rs exercises the scheduler under aggressive steal patterns. Every benchmark harness asserts that the computed R matches a reference Fused run (PreparedScenario::expected) before timing begins; a policy variant producing a faster-but-incorrect answer would never reach the tables above.

The recursive pattern

Recursive tree computations — regardless of domain — share a single underlying structure. hylic makes that structure explicit, names its parts, and allows each part to be transformed independently.

One function

The entire computation, taken directly from the sequential executor:

#![allow(unused)]
fn main() {
fn recurse<N, H, R>(
    fold: &impl FoldOps<N, H, R>,
    graph: &impl TreeOps<N>,
    node: &N,
) -> R {
    let mut heap = fold.init(node);
    graph.visit(node, &mut |child: &N| {
        let r = recurse(fold, graph, child);
        fold.accumulate(&mut heap, &r);
    });
    fold.finalize(&heap)
}
}

At each node:

  1. init — construct a heap H from the node
  2. visit children — for each child, recurse and accumulate the result
  3. finalize — produce the node’s result R from the heap

Every tree fold — Fibonacci, dependency resolution, filesystem aggregation, AST evaluation — is this function instantiated with different choices for init, accumulate, finalize, and child structure.

(Diagram — one node's execution: init(node) → H; for each child, recurse(childᵢ) → R followed by accumulate(&mut H, &R); finally finalize(&H) → R.)

Three pieces

The function above takes three things as parameters. hylic gives each a name and a type:

Treeish — the tree structure. Given a node, visit its children:

#![allow(unused)]
fn main() {
/// A `NodeT → EdgeT*` function wrapped as a clonable Arc-backed
/// struct. When `NodeT = EdgeT` the type is typically named
/// [`crate::graph::Treeish`].
pub struct Edgy<NodeT, EdgeT> {
    impl_visit: Arc<dyn Fn(&NodeT, &mut dyn FnMut(&EdgeT)) + Send + Sync>,
}
}

Treeish<N> is an alias for Edgy<N, N> — an edge function where nodes and edges are the same type:

#![allow(unused)]
fn main() {
pub type Treeish<Node> = Edgy<Node, Node>;
}

A Treeish is constructed from a function from node to children:

#![allow(unused)]
fn main() {
    #[test]
    fn treeish_constructor() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct Dir { name: String, size: u64, children: Vec<Dir> }

        let graph: Treeish<Dir> = treeish(|d: &Dir| d.children.clone());
        let root = Dir { name: "root".into(), size: 10, children: vec![] };
        assert_eq!(graph.apply(&root).len(), 0);
    }
}

The callback-based signature Fn(&N, &mut dyn FnMut(&N)) avoids any allocation per visit. The treeish() constructor wraps a Vec-returning function into this form.

The node type N may be anything — a nested struct, an integer index into an adjacency list, a string key into a map, or a reference resolved through I/O. The structure resides in the treeish function rather than in the data.

Fold — the computation. In the Shared domain, three closures behind Arc:

#![allow(unused)]
fn main() {
pub struct Fold<N, H, R> {
    pub(crate) impl_init: Arc<dyn Fn(&N) -> H + Send + Sync>,
    pub(crate) impl_accumulate: Arc<dyn Fn(&mut H, &R) + Send + Sync>,
    pub(crate) impl_finalize: Arc<dyn Fn(&H) -> R + Send + Sync>,
}
}

Other domains use Rc (Local) or Box (Owned) — same operations, different boxing. The fold type doesn’t carry the domain; the executor does.

  • init: &N → H — create per-node working state from the node
  • accumulate: &mut H, &R — fold one child’s result into the heap
  • finalize: &H → R — close the bracket, produce the node’s result

H and R are distinct types: H is mutable working state (the open bracket), R is the immutable result flowing to the parent (the closed bracket). See The N-H-R algebra factorization for the theoretical basis. Many folds have H = R, in which case finalize is just an identity extraction from the heap:

#![allow(unused)]
fn main() {
    #[test]
    fn identity_finalize_fold_example() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct Dir { name: String, size: u64, children: Vec<Dir> }

        let graph: Treeish<Dir>      = treeish(|d: &Dir| d.children.clone());
        let sum:   Fold<Dir, u64, u64> = fold(
            |d: &Dir| d.size,
            |heap: &mut u64, child: &u64| *heap += child,
            |h: &u64| *h,
        );

        let tree = Dir {
            name: "root".into(), size: 10,
            children: vec![
                Dir { name: "a".into(), size: 5, children: vec![] },
                Dir { name: "b".into(), size: 3, children: vec![] },
            ],
        };
        assert_eq!(FUSED.run(&sum, &graph, &tree), 18);
    }
}

Executor — the strategy. Controls HOW the recursion runs:

#![allow(unused)]
fn main() {
    #[test]
    fn exec_usage() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct N { val: u64, children: Vec<N> }

        let graph: Treeish<N> = treeish(|n: &N| n.children.clone());
        let f:     Fold<N, u64, u64> = fold(
            |n: &N| n.val,
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );
        let root = N { val: 1, children: vec![N { val: 2, children: vec![] }] };

        // Sequential:
        let r1: u64 = FUSED.run(&f, &graph, &root);
        // Parallel — same fold, same graph:
        let r2: u64 = exec(funnel::Spec::default(4)).run(&f, &graph, &root);
        assert_eq!(r1, r2);
    }
}

Two executors are provided:

Executor                                        Traversal                      Domains
FUSED (Shared) / local::FUSED / owned::FUSED    Direct sequential recursion    all
Funnel                                          Parallel work-stealing         Shared

Both implement the Executor<N, R, D, G> trait, parameterised by a domain and graph type. See Executor architecture for details.

The separation

(Diagram — the separation: Treeish<N> supplies the structure and Fold<N, H, R> the computation; both feed the executor (e.g. FUSED) as graph and algebra; the Domain (Shared / Local / Owned) determines boxing; run produces R.)

The fold carries no knowledge of the tree; the tree carries no knowledge of the fold; the executor connects them. The domain determines how closures are stored — the fold and treeish do not record this, the executor does.

Every computation in hylic reduces to executor.run(&fold, &treeish, &root). When the tree is discovered lazily (seeds resolved on demand), SeedPipeline constructs the treeish from a seed edge function together with a grow, and delegates to executor.run internally.

The operations traits

The executor’s recursion engine doesn’t know about Arc, Rc, or Box. It takes &impl FoldOps<N, H, R> and &impl TreeOps<N> — pure operation traits:

#![allow(unused)]
fn main() {
/// The three fold operations, independent of storage.
pub trait FoldOps<N, H, R> {
    /// Construct a fresh per-node heap from a node reference.
    fn init(&self, node: &N) -> H;
    /// Fold one child result into the heap in place.
    fn accumulate(&self, heap: &mut H, result: &R);
    /// Close out the heap into the node's final result.
    fn finalize(&self, heap: &H) -> R;
}
}
#![allow(unused)]
fn main() {
/// Tree traversal operations, independent of storage.
pub trait TreeOps<N> {
    /// Visit children of `node` via callback. Zero allocation.
    fn visit(&self, node: &N, cb: &mut dyn FnMut(&N));

    /// Collect children to Vec. Default: collect via visit.
    fn apply(&self, node: &N) -> Vec<N> where N: Clone {
        let mut v = Vec::new();
        self.visit(node, &mut |child| v.push(child.clone()));
        v
    }
}
}

The standard Fold<N, H, R> and Treeish<N> implement these traits. So do local::Fold, owned::Fold, and any user-defined struct with the right methods. The executor is generic over these traits — when called with a concrete struct, the compiler inlines completely.

See Domain system for how domains connect operations to storage.

The three domains

The underlying question

A recursion is, at its heart, five closures — a fold’s init, accumulate, and finalize, a graph’s edge function, and (in seed pipelines) a grow. hylic retains these closures across the duration of a run and hands them to executors, lifts, and user code. A single question organises the design:

How shall dyn Fn(&N) -> H be stored?

Rust offers three practical answers:

Storage     Clone?                   Send + Sync?
Arc<dyn>    cheap (refcount bump)    yes, if the closure is
Rc<dyn>     cheap (refcount bump)    no (single-threaded)
Box<dyn>    not Clone                possible, but consumed on use

Each choice is a compromise. Arc pays an atomic instruction on every clone in exchange for the ability to cross thread boundaries. Rc uses a plain counter — faster single-threaded, incompatible with multi-threading. Box avoids any counter but forces transformation pipelines to consume the closure on each rewrite.

Every closure in a recursion must agree on the choice. hylic therefore selects once at the top level and propagates the choice through the entire pipeline; that selection is what is called a domain.

Three choices, three types
(Diagram: Shared stores Arc<dyn Fn + Send + Sync>; Local stores Rc<dyn Fn>; Owned stores Box<dyn Fn>.)

  • Shared stores closures behind Arc with Send + Sync bounds. The atomic clone grants access to the parallel Funnel executor and makes every pipeline Clone.
  • Local stores closures behind Rc (no Send bound). Clones remain cheap and the pipeline interfaces are unchanged, but execution is confined to a single thread. In return, captures may include Rc<_>, RefCell<_>, or any non-Send type.
  • Owned stores closures in Box. Clones and sharing are both absent; each stage of a pipeline consumes its predecessor. Appropriate for one-shot computations that should avoid reference-counting overhead entirely.

Shared is the conservative default and serves most code. Local is the escape hatch for non-Send captures. Owned is the minimalist one-shot variant.

The Domain trait

The three choices are encoded as marker types implementing the Domain<N> trait:

#![allow(unused)]
fn main() {
pub trait Domain<N: 'static>: 'static {
    type Fold<H: 'static, R: 'static>: FoldOps<N, H, R>;
    type Graph<E: 'static> where E: 'static;
    type Grow<Seed: 'static, NOut: 'static>;

    /// Construct a fold from three closures. Uniform Send+Sync
    /// bound; each domain sheds Send+Sync at storage time if it
    /// doesn't need it.
    fn make_fold<H: 'static, R: 'static>(
        init: impl Fn(&N) -> H + Send + Sync + 'static,
        acc:  impl Fn(&mut H, &R) + Send + Sync + 'static,
        fin:  impl Fn(&H) -> R + Send + Sync + 'static,
    ) -> Self::Fold<H, R>;

    /// Construct a grow closure from a Fn. Uniform Send+Sync bound.
    fn make_grow<Seed: 'static, NOut: 'static>(
        f: impl Fn(&Seed) -> NOut + Send + Sync + 'static,
    ) -> Self::Grow<Seed, NOut>;

    /// Invoke a stored grow closure.
    fn invoke_grow<Seed: 'static, NOut: 'static>(
        g: &Self::Grow<Seed, NOut>,
        s: &Seed,
    ) -> NOut;

    /// Construct a graph (Edgy) closure. Uniform Send+Sync bound.
    fn make_graph<E: 'static>(
        visit: impl Fn(&N, &mut dyn FnMut(&E)) + Send + Sync + 'static,
    ) -> Self::Graph<E>;
}
}

A Domain<N> implementation specifies:

  • the concrete Fold<H, R> type in use (closure storage lives inside Fold),
  • the concrete Graph type,
  • the concrete Grow<Seed, N> type (for seed pipelines),
  • constructor methods (make_fold, make_graph, make_grow) that build each of the above generically.

Code generic over D: Domain<N> constructs any of the three without knowing whether the underlying storage is Arc, Rc, or Box; the constructor methods handle the distinction.

Constructors across domains

Each domain exposes the same construction surface, distinguished only by its bounds:

#![allow(unused)]
fn main() {
    #[test]
    fn domains_three_folds() {
        // Shared: closures must be Send + Sync (they go into Arc).
        let _shared = hylic::domain::shared::fold(
            |n: &u64| *n,                     // init
            |h: &mut u64, c: &u64| *h += c,   // accumulate
            |h: &u64| *h,                     // finalize
        );

        // Local: closures can capture Rc / RefCell.
        use std::cell::RefCell;
        use std::rc::Rc;
        let state = Rc::new(RefCell::new(0u32));
        let state_for_init = state.clone();
        let _local = hylic::domain::local::fold(
            move |n: &u64| { *state_for_init.borrow_mut() += 1; *n },
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );

        // Owned: one-shot construction; not Clone.
        let _owned = hylic::domain::owned::fold(
            |n: &u64| *n,
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );
    }
}

The Shared constructor requires Fn + Send + Sync + 'static for every closure; Local requires Fn + 'static; Owned shares Local’s bounds but returns a Box-backed struct. The signatures are aligned so that generic code compiles without modification across domains; the bounds differ so that each domain accepts only those closures it is able to store.

The Fold struct, three times

Because storage differs, each domain ships its own Fold:

#![allow(unused)]
fn main() {
pub struct Fold<N, H, R> {
    pub(crate) impl_init: Arc<dyn Fn(&N) -> H + Send + Sync>,
    pub(crate) impl_accumulate: Arc<dyn Fn(&mut H, &R) + Send + Sync>,
    pub(crate) impl_finalize: Arc<dyn Fn(&H) -> R + Send + Sync>,
}
}

Local and Owned share the same shape, with Rc and Box substituted for Arc. The three are not interchangeable at the type level: the Fused executor reads whichever concrete D::Fold<H, R> the pipeline provides, and crossing domain boundaries requires an explicit conversion that the library does not supply — the expected discipline is to select a single domain per computation.

Parallelism

The parallel Funnel executor requires ShareableLift, a capability that reduces to D = Shared together with Send + Sync on every payload (N, H, R). Local and Owned cannot run in parallel by construction: their storage types do not cross thread boundaries, and the ShareableLift bound does not hold.

The converse is not true — a Shared pipeline runs without issue under Fused. The price of choosing Shared is one atomic operation per closure clone, and nothing more.

Picking one (decision tree)
(Decision tree: Need to run in parallel? yes → Shared (Arc). If not, do closures capture Rc / RefCell / non-Send types? yes → Local (Rc). If not, is the computation one-shot with zero refcount overhead wanted? yes → Owned (Box); no → Shared, the default.)

In short: Shared by default, Local for non-Send captures, Owned for the one-shot minimal case.

For library authors

Prefer code generic over D: Domain<N>. The three domain markers are not interchangeable at runtime, but almost the whole of hylic compiles once and operates across all three. Select a concrete domain only where its specific capability is required (D = Shared for parallelism; D = Owned for consume-on-use).

Transforms and variance

Where a type axis appears inside a fold or graph determines how it may be transformed. The chapter opens with three examples whose argument shapes differ in an informative way; the following section traces those differences to the notion of variance. After that, the library’s naming — map versus contramap versus _bi — ceases to look like convention and becomes something that can be read off the types.

Three transforms, three shapes

map on R — covariant, a single function. Given a Fold<N, H, u64> summing filesystem sizes, producing a Fold<N, H, String> that formats the sum requires only a forward function u64 → String:

fold.map(|n: &u64| format!("{n} bytes"))

contramap_n — contravariant, a single function in the opposite direction. Obtaining a Fold<String, H, R> from an existing Fold<PathBuf, H, R> requires bridging String → PathBuf, since the existing init consumes &PathBuf and must continue to do so:

fold.contramap_n(|s: &String| PathBuf::from(s))

map_r_bi — invariant, a pair of functions. Changing the result type of an existing fold to a different representation requires both directions, because R is accumulated into (a parent receives its children’s R) as well as emitted (finalize returns R); a one-way function cannot carry values through both roles:

fold.map_r_bi(
    /* forward  */ |n: &u64| format!("{n}"),
    /* backward */ |s: &String| s.parse().unwrap(),
)

Three argument shapes: one forward function (covariant), one reverse function (contravariant), a pair (invariant). The names track the shape.

Why the three shapes?

Each axis occupies a specific position within the slots:

Grow<Seed, N>:        fn(&Seed) -> N            ← N is an output
Graph<N>:             fn(&N, &mut FnMut(&N))    ← N is both
Fold<N, H, R>::init:  fn(&N) -> H               ← N is an input
Fold<N, H, R>::acc:   fn(&mut H, &R)            ← H both, R input
Fold<N, H, R>::fin:   fn(&H) -> R               ← H input, R output

An axis that appears only in output position is covariant: a forward function suffices to rewrite the produced value. Hence map.

An axis that appears only in input position is contravariant: adapting the axis requires a function in the opposite direction, so the existing consumer continues to receive values it understands. Hence contramap.

An axis that appears in both positions is invariant: no single function bridges consumption and production together, so both directions must be supplied. Hence the _bi suffix.

The three positions
(Diagram — axis N in three positions: Grow output (covariant) → map; Fold::init input (contravariant) → contramap_n; Graph visit (invariant) → map_endpoints. Axes H and R are invariant: H (init out / acc in+out / fin in) → wrap_init / phases_lift; R (acc in / fin out) → map_r_bi(fwd, bwd).)

N occupies all three positions across different slots — output in Grow, input in Fold, both in Graph. A single “N transform” would therefore apply in different directions depending on the slot; the library instead exposes a per-slot transform on each side, or a coordinated Lift that rewrites N across all three slots at once.

H and R live only inside Fold, but each appears in both positions there (H is init-output / acc-in+out / fin-input; R is acc-input / fin-output). Both are invariant; changing either requires a bijection.

Method surface, derived

With the variance pinned, the catalogue follows automatically.

On a Fold<N, H, R>:

  • contramap_n(f: N' → N) — contravariant change of N. One arg.
  • map_r_bi(fwd, bwd) — invariant change of R. Two args.
  • wrap_init(w), wrap_accumulate(w), wrap_finalize(w) — invariant decorators on H and R. They don’t change the axes they touch; they intercept the existing functions.
  • zipmap(m) — a covariant extension: pair the existing R with an extra value derived from it. R becomes (R, Extra), forward only; the new R’s first component is the old R, so “going back” is structurally free (|p: &(R, Extra)| &p.0).
  • product(other) — binary: run two folds in lockstep, carrier (H1, H2), (R1, R2).

On an Edgy<N, E>:

  • map(f: E → E') — functor over edges (covariant on E).
  • contramap(f: N' → N) — contravariant on N.
  • filter(pred) — edge predicate.
  • contramap_or_emit(f) — contramap with an escape hatch emitting edges directly (used in fallible graph construction).

Treeish<N> = Edgy<N, N> is what executors consume — the specialisation where node type equals edge type.

The primitive the Edgy sugars wrap:

#![allow(unused)]
fn main() {
    pub fn map<F, NewEdgeT: 'static>(&self, transform: F) -> Edgy<NodeT, NewEdgeT>
    where F: Fn(&EdgeT) -> NewEdgeT + Send + Sync + 'static,
    {
        self.map_endpoints(move |inner| {
            Arc::new(move |n: &NodeT, cb: &mut dyn FnMut(&NewEdgeT)| {
                inner(n, &mut |e: &EdgeT| cb(&transform(e)))
            })
        })
    }
}
#![allow(unused)]
fn main() {
    pub fn contramap<F, NewNodeT: 'static>(&self, transform: F) -> Edgy<NewNodeT, EdgeT>
    where F: Fn(&NewNodeT) -> NodeT + Send + Sync + 'static,
    {
        self.map_endpoints(move |inner| {
            Arc::new(move |n: &NewNodeT, cb: &mut dyn FnMut(&EdgeT)| {
                inner(&transform(n), cb)
            })
        })
    }
}

Naming convention, recovered

From the above:

Suffix                              When
none (map, filter, wrap_*)          covariant or decorator-only
contramap, contramap_<axis>         contravariant; one function
_bi (map_r_bi, map_n_bi_lift, …)    invariant; bijection required
_or_emit                            contramap with a direct-emit escape

Names mark the variance, so the shape of the arguments is predictable from the identifier alone.

What this chapter does NOT cover

All the operations above change one axis of one structure (Fold OR Graph). Changing N across BOTH structures in sync — or building a new transform that wraps the whole (Grow, Graph, Fold) triple and composes with others — is the job of a Lift. Every library lift internally reduces to one of these single-axis transforms or a coordinated set of them (e.g. n_lift changes N across all three slots at once).

Category-theoretic framing (brief)

The catamorphism’s algebra is F R → R. hylic factors this through H: init creates H from N, accumulate folds child Rs into H, finalize projects H → R. The carrier between nodes is R; H is internal to each node’s bracket. A lift is an algebra morphism — it maps the carrier types (MapR, and internally the heap type MapH) while preserving the fold structure. See The N-H-R algebra factorization.

Lifts — cross-axis transforms

The problem

The transforms in the previous chapter act on a single structure — a Fold or a Graph, not both. Some rewrites, however, must touch both in a coordinated manner: a change of node type that the Graph produces and the Fold consumes; a filter that drops edges and must therefore also leave the Fold structurally consistent with what remains; a per-node trace that wraps the Fold’s output and composes with whatever other transforms are already in place.

A Lift is the object that performs such cross-axis rewrites. It operates on the full triple carried by a pipeline, not on one slot in isolation, and composes with other lifts to form a chain.

Why three axes

Besides Fold<N, H, R> and Graph<N>, there’s a third slot: Grow<Seed, N> — a closure that resolves a Seed into an N. Most users never build a Grow by hand; SeedPipeline constructs one from grow: Seed → N. But because lifts compose and some pipelines carry a Grow, the trait has to account for it.

Three slots, not two — conceptually, at least: in the trait signature below the Grow slot travels separately, and only the seed finishing lift consumes one. A lift that doesn’t care about Grow (say, a fold-wrapper) passes it unchanged. A lift that does care (the N-change lifts; SeedLift) rewrites it in concert with the other slots.

The trait

#![allow(unused)]
fn main() {
/// Domain-generic transformer over the `(treeish, fold)` pair.
///
/// A `Lift` rewrites the graph side and/or the fold side, possibly
/// changing their carrier types, and hands the result to a
/// continuation. The caller's continuation-return type `T` flows
/// through, so the chain of output types stays inferred across
/// composition (`ComposedLift<L1, L2>`).
///
/// Grow is deliberately absent from this signature. Only the Seed
/// finishing lift ([`SeedLift`](super::SeedLift)) needs a grow
/// input; it is composed internally by
/// `hylic_pipeline::PipelineExecSeed::run` and does not travel as
/// a 3-slot signature through the `Lift` trait.
///
/// See [Lifts](https://hylic.balcony.codes/concepts/lifts.html).
pub trait Lift<D, N, H, R>
where D: Domain<N> + Domain<Self::N2>,
      N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
{
    /// Output node type after the lift has been applied.
    type N2:   Clone + 'static;
    /// Output heap type after the lift has been applied.
    type MapH: Clone + 'static;
    /// Output result type after the lift has been applied.
    type MapR: Clone + 'static;

    /// Apply the lift to `(treeish, fold)` and invoke `cont` with
    /// the transformed pair.
    fn apply<T>(
        &self,
        treeish: <D as Domain<N>>::Graph<N>,
        fold:    <D as Domain<N>>::Fold<H, R>,
        cont: impl FnOnce(
            <D as Domain<Self::N2>>::Graph<Self::N2>,
            <D as Domain<Self::N2>>::Fold<Self::MapH, Self::MapR>,
        ) -> T,
    ) -> T;
}
}

Three associated output types (N2, MapH, MapR) and a single apply method. As a type-level arrow:

L : (Grow<Seed, N>, Graph<N>, Fold<N, H, R>)
  → (Grow<Seed, L::N2>, Graph<L::N2>, Fold<L::N2, L::MapH, L::MapR>)

Quick start

Most users never interact with the Lift trait directly; the pipeline sugars are the usual surface, and each sugar delegates to a library lift. A small example demonstrates what a lift changes at the value level:

#![allow(unused)]
fn main() {
    #[test]
    fn bare_lift_wrap_init() {
        use hylic::prelude::*;

        let t:   Treeish<u64>      = treeish(|n: &u64| if *n > 0 { vec![*n - 1] } else { vec![] });
        let fld: Fold<u64, u64, u64> = fold(|n: &u64| *n, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h);

        // Wrap init to add +1 at each node.
        let wi = Shared::wrap_init_lift::<u64, u64, u64, _>(|n, orig| orig(n) + 1);
        let r:  u64 = wi.run_on(&FUSED, t, fld, &3u64);
        // Tree 3→2→1→0: 4 nodes, each +1 → 4 extra → 6 + 4 = 10.
        assert_eq!(r, 10);
    }
}

wrap_init_lift accepts a closure that intercepts every call to init. The pipeline’s R is unchanged; only the per-node init closure is wrapped. The remaining sugars follow the same pattern: select an axis, supply the transformation as a closure, obtain a new pipeline that differs only along that axis.

The Library catalogue below lists the axes touched by each library lift.

Four atoms

Every library lift is an instance of one of four types. The sugars compose these atoms without requiring any of them to be constructed by hand; this section names the parts so that they are recognisable in compiler errors and in custom-lift implementations.

IdentityLift — pass-through. Used as the seed of a lift chain when a Stage-1 pipeline transitions to Stage 2 via .lift().

#![allow(unused)]
fn main() {
/// The pass-through lift — the unit of lift composition. Leaves
/// every slot unchanged.
pub struct IdentityLift;
}

ComposedLift<L1, L2> — sequential composition. L1 runs first; L2 takes L1’s outputs as its inputs.

#![allow(unused)]
fn main() {
/// Sequential composition of two lifts. `L1` runs first; `L2`
/// takes `L1`'s outputs as its inputs. The outer lift's `apply`
/// drives this composition.
#[must_use]
pub struct ComposedLift<L1, L2> {
    pub(crate) inner: L1,
    pub(crate) outer: L2,
}
(Diagram: (N, H, R) —L1::apply→ (L1::N2, L1::MapH, L1::MapR) —L2::apply→ (L2::N2, L2::MapH, L2::MapR).)



The type-level bound L2: Lift<D, L1::N2, L1::MapH, L1::MapR> enforces the connection. A mistake here surfaces as a compile error at the composition site.

ShapeLift<D, N, H, R, N2, H2, R2> — the universal library lift. Stores one per-domain xform per slot (treeish, fold) and applies them in sequence.

#![allow(unused)]
fn main() {
/// The universal library `Lift` — stores one xform per slot
/// (treeish, fold) and applies them during `apply`. Every library
/// lift except `SeedLift` is a `ShapeLift` with appropriate xforms.
#[must_use]
pub struct ShapeLift<D, N, H, R, N2, H2, R2>
where D: ShapeCapable<N> + Domain<N2>,
      N:  Clone + 'static, H:  Clone + 'static, R:  Clone + 'static,
      N2: Clone + 'static, H2: Clone + 'static, R2: Clone + 'static,
{
    pub(crate) treeish_xform: D::TreeishXform<N2>,
    pub(crate) fold_xform:    D::FoldXform<H, R, N2, H2, R2>,
}
}

Every concrete library lift except SeedLift is a ShapeLift with appropriate xforms. wrap_init_lift only rewrites Fold’s init phase; filter_edges_lift only rewrites Graph’s visit; n_lift rewrites all slots; explainer_lift rewrites only Fold (but changes MapH and MapR to the explainer’s wrapper types).

SeedLift<D, N, Seed, H> — a finishing lift that closes a SeedPipeline by turning the (grow, seeds_from_node, fold) triple into a runnable (treeish, fold) pair rooted at an EntryRoot variant. Domain-parametric over ShapeCapable (Shared + Local impls; per-domain because the fold-construction closures’ Send+Sync discipline differs by domain). Assembled at run time inside the seed-rooted Stage2Pipeline::run(...) from the base’s grow plus the caller-supplied root_seeds and entry_heap, then composed as the first lift of the run-time chain.

#![allow(unused)]
fn main() {
/// The finishing lift that closes a `SeedPipeline`'s grow axis.
/// Composes entry-seed dispatch on top of a `(grow, seeds, fold)`
/// triple and produces a treeish over `SeedNode<N>`. Not
/// user-constructed; assembled internally by
/// `Stage2Pipeline::run` at call time.
///
/// Domain-parametric: storage of the entry-seeds graph and the
/// entry-heap thunk is per-domain via `<D as Domain<()>>::Graph<Seed>`
/// and `<D as ShapeCapable<N>>::EntryHeap<H>`. No hand-rolled
/// domain discriminator.
#[must_use]
pub struct SeedLift<D, N, Seed, H>
where D: ShapeCapable<N> + Domain<()>,
      N: 'static, Seed: 'static, H: 'static,
{
    pub(crate) grow:          <D as Domain<N>>::Grow<Seed, N>,
    pub(crate) entry_seeds:   <D as Domain<()>>::Graph<Seed>,
    pub(crate) entry_heap_fn: <D as ShapeCapable<N>>::EntryHeap<H>,
    _m: PhantomData<fn() -> (D, N, Seed, H)>,
}
}

Its N2 is SeedNode<N> — a sealed row type whose variants are library-internal; user code inspects via is_entry_root(), as_node(), map_node(f). Two inhabitants: the synthetic EntryRoot (root fan-out over entry seeds) and a resolved Node(N). SeedLift builds a Treeish<SeedNode<N>> that dispatches on variant: EntryRoot visits the entry seeds via grow, Node visits the user’s treeish.

For an N-typed view of a seed-closed .explain() result, convert the raw ExplainerResult<SeedNode<N>, H, R> to SeedExplainerResult via raw.into() — see seed explainer result.

Bare application

Any Lift is usable without a pipeline. LiftBare is a blanket trait:

#![allow(unused)]
fn main() {
/// Blanket trait extending any [`Lift`] with direct application to
/// a bare `(treeish, fold)` pair. Implemented automatically; users
/// call `.apply_bare(...)` or `.run_on(...)` without a pipeline.
pub trait LiftBare<D, N, H, R>: Lift<D, N, H, R>
where D: ShapeCapable<N> + Domain<Self::N2>,
      N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
      Self::N2:   Clone + 'static,
      Self::MapH: Clone + 'static,
      Self::MapR: Clone + 'static,
{
    /// Apply this lift to a bare (treeish, fold) pair; return the
    /// transformed pair.
    fn apply_bare(
        &self,
        treeish: <D as Domain<N>>::Graph<N>,
        fold:    <D as Domain<N>>::Fold<H, R>,
    ) -> (<D as Domain<Self::N2>>::Graph<Self::N2>,
          <D as Domain<Self::N2>>::Fold<Self::MapH, Self::MapR>)
    {
        self.apply(treeish, fold, |t, f| (t, f))
    }

    /// Apply this lift and run the result under the given executor.
    fn run_on<E>(
        &self,
        exec:    &E,
        treeish: <D as Domain<N>>::Graph<N>,
        fold:    <D as Domain<N>>::Fold<H, R>,
        root:    &Self::N2,
    ) -> Self::MapR
    where
        E: Executor<
            Self::N2, Self::MapR, D,
            <D as Domain<Self::N2>>::Graph<Self::N2>,
        >,
        <D as Domain<Self::N2>>::Graph<Self::N2>: TreeOps<Self::N2>,
    {
        let (t, f) = self.apply_bare(treeish, fold);
        exec.run(&f, &t, root)
    }
}
}

See Bare lift application in the Pipelines overview for the rationale and the panic-grow trick that lets LiftBare skip the grow slot.

Per-domain capability

Not every domain supports ShapeLift. A domain has to declare what it can store as a per-slot xform:

#![allow(unused)]
fn main() {
#[allow(missing_docs)] // associated types/methods are implementation plumbing for ShapeLift
pub trait ShapeCapable<N: 'static>: Domain<N> {
    type GrowXform<N2: 'static>: Clone + 'static;
    type TreeishXform<N2: 'static>: Clone + 'static;
    type FoldXform<H, R, N2, H2, R2>: Clone + 'static
    where H: 'static, R: 'static, N2: 'static, H2: 'static, R2: 'static;

    fn apply_grow_xform<Seed: 'static, N2: 'static>(
        t: &Self::GrowXform<N2>,
        g: <Self as Domain<N>>::Grow<Seed, N>,
    ) -> <Self as Domain<N2>>::Grow<Seed, N2>
    where Self: Domain<N2>;

    fn apply_treeish_xform<N2: 'static>(
        t: &Self::TreeishXform<N2>,
        g: <Self as Domain<N>>::Graph<N>,
    ) -> <Self as Domain<N2>>::Graph<N2>
    where Self: Domain<N2>;

    fn apply_fold_xform<H, R, N2, H2, R2>(
        t: &Self::FoldXform<H, R, N2, H2, R2>,
        f: <Self as Domain<N>>::Fold<H, R>,
    ) -> <Self as Domain<N2>>::Fold<H2, R2>
    where Self: Domain<N2>,
          H: 'static, R: 'static, N2: 'static, H2: 'static, R2: 'static;

    fn identity_grow_xform() -> Self::GrowXform<N>
    where N: Clone;

    fn identity_treeish_xform() -> Self::TreeishXform<N>
    where N: Clone;

    fn identity_fold_xform<H: 'static, R: 'static>() -> Self::FoldXform<H, R, N, H, R>;

    /// Compose a `grow: Seed → N` with a `seeds: Graph<Seed>` to
    /// produce the fused `Graph<N>` (treeish). Needed by
    /// `SeedPipeline::with_constructed` which yields a treeish over
    /// N to the executor.
    fn fuse_grow_with_seeds<Seed: 'static>(
        grow:  <Self as Domain<N>>::Grow<Seed, N>,
        seeds: <Self as Domain<N>>::Graph<Seed>,
    ) -> <Self as Domain<N>>::Graph<N>
    where Seed: Clone;

    /// Storage type for `SeedLift`'s entry-heap thunk: a
    /// `Fn() -> H` whose backing pointer matches the domain's
    /// closure-storage discipline (Arc on Shared, Rc on Local).
    /// Used in place of a hand-rolled domain discriminator enum.
    type EntryHeap<H: 'static>: Clone + 'static;
}

}

Shared and Local are ShapeCapable, each storing its xforms with its own pointer type (Arc vs Rc) and closure bounds (Send + Sync vs none). Owned is not ShapeCapable: Box<dyn Fn> is not Clone, so xforms can’t be applied to produce a new owned fold. Owned pipelines have no Stage-2 surface.

Parallel vs sequential

Two blanket markers gate which executors a lift can feed:

  • PureLift<D, N, H, R> — any Lift + Clone + 'static with Clone outputs. Sufficient for the sequential executor Fused.
  • ShareableLift<D, N, H, R> — adds Send + Sync on everything. Required for the parallel Funnel executor.

You don’t implement these; the compiler picks them up via blanket impls in ops::lift::capability. If your lift (or your data) doesn’t meet the parallel bounds, calling .run(&funnel_exec, ...) is a compile error — there’s no silent fallback.
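The blanket-impl mechanism the paragraph describes can be sketched in plain Rust, independently of hylic's actual trait definitions: the marker trait is never implemented by hand, a blanket impl derives it from the bounds, and an API demanding the marker rejects non-conforming types at compile time. `ShareableMarker` and `run_parallel` below are hypothetical names for illustration.

```rust
// Blanket-marker pattern: derived automatically, never hand-implemented.
trait ShareableMarker: Send + Sync + Clone + 'static {}
impl<T: Send + Sync + Clone + 'static> ShareableMarker for T {}

// Stand-in for a parallel entry point: only marker-satisfying types run.
fn run_parallel<T: ShareableMarker>(t: T) -> T {
    t
}

fn main() {
    use std::sync::Arc;
    // Arc<u64> is Send + Sync + Clone, so the blanket impl applies.
    let out = run_parallel(Arc::new(7u64));
    assert_eq!(*out, 7);
    // `run_parallel(std::rc::Rc::new(7u64))` would be a compile error:
    // Rc is neither Send nor Sync, so the marker is never derived.
}
```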

Library catalogue

Each ShapeCapable domain exposes a set of constructors that return a ShapeLift shaped for the transformation. For Shared:

| Constructor | What it changes |
|---|---|
| Shared::wrap_init_lift(w) | intercept init at every node |
| Shared::wrap_accumulate_lift(w) | intercept accumulate |
| Shared::wrap_finalize_lift(w) | intercept finalize |
| Shared::zipmap_lift(m) | extend R: R → (R, Extra) |
| Shared::map_r_bi_lift(fwd, bwd) | change R (bijection required; R is invariant) |
| Shared::filter_edges_lift(pred) | drop edges matching a predicate |
| Shared::wrap_visit_lift(w) | intercept graph visit |
| Shared::memoize_by_lift(key) | memoise subtree results by key |
| Shared::map_n_bi_lift(co, contra) | change N (bijection; N is invariant across slots) |
| Shared::n_lift(ln, bt, fc) | change N with per-slot coordination |
| Shared::explainer_lift() | wrap fold with per-node trace recording |
| Shared::explainer_describe_lift(fmt, emit) | streaming trace; MapR = R |
| Shared::phases_lift(mi, ma, mf) | rewrite all three Fold phases (primitive) |
| Shared::treeish_lift(mt) | rewrite the graph (primitive) |

Local mirrors the set (except explainer_describe_lift), with Rc storage and no Send + Sync bounds.

The last two (phases_lift, treeish_lift) are the primitives: the per-axis sugars all delegate to one of them. n_lift is the primitive for coordinated N-change; map_n_bi_lift is the bijective special case.

Appendix: why the trait takes a continuation

This section is relevant only to writing a custom Lift implementation; it explains the signature rather than the everyday use of lifts.

A direct signature would return the transformed triple from apply. The return type of such a form is (Grow<D, Seed, N2>, Graph<D, N2>, Fold<D, N2, H2, R2>), each component a domain-associated GAT and each axis an associated type of the lift. After three chained lifts, that return type admits no nameable alias.

Continuation-passing style — “CPS” in the source and in some comments — avoids this. The caller supplies apply with a closure (the continuation cont), which apply invokes with the transformed triple. Because the continuation’s return type propagates outward, Rust’s type inference threads every intermediate through end-to-end, and no intermediate requires a nameable alias.

Consequently, every pipeline’s .run(...) reduces to a single descent through the lift chain via nested apply calls, each closing over the next. The chain is constructed at the type level, evaluated once at the value level, and the executor ultimately sees only the final (treeish, fold) pair.
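The same trick can be demonstrated in a few lines of std-only Rust, independent of hylic's API. Each "lift" below transforms a closure; because the transformed closure's type is unnameable, the lift hands it to a caller-supplied continuation instead of returning it, and chaining lifts nests continuations with no intermediate alias. `lift_add_one` is a hypothetical name for illustration.

```rust
// A lift in CPS: transform `base`, then pass the result to `cont`.
// The transformed closure is exposed as `&dyn Fn`, never named.
fn lift_add_one<C, Out>(base: impl Fn(u64) -> u64, cont: C) -> Out
where
    C: FnOnce(&dyn Fn(u64) -> u64) -> Out,
{
    let lifted = move |x| base(x) + 1;
    cont(&lifted)
}

fn main() {
    let base = |x: u64| x * 2;
    // Two chained lifts; evaluation is one descent through nested closures.
    let result = lift_add_one(base, |f1| lift_add_one(f1, |f2| f2(10)));
    assert_eq!(result, 22); // (10 * 2 + 1) + 1
}
```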

Fold: shaping the computation

A Fold<N, H, R> is defined by three phases: init, accumulate, and finalize. Each phase is a closure stored in the boxing strategy of the domain in use — Arc for Shared, Rc for Local, Box for Owned. Each phase may be transformed independently.

Named-closures-first pattern

Closures should be extracted and named before being passed to the constructor:

#![allow(unused)]
fn main() {
    #[test]
    fn named_closures_pattern() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct N { val: u64, children: Vec<N> }

        // Named closures — reusable across domains.
        let init = |n: &N| n.val;
        let acc  = |h: &mut u64, c: &u64| *h += c;
        let fin  = |h: &u64| *h;

        // A free `fn` makes the visit function's full type visible at the
        // binding site; capture-free, reusable across domains.
        fn children(n: &N, cb: &mut dyn FnMut(&N)) {
            for c in &n.children { cb(c); }
        }

        let f:     Fold<N, u64, u64> = fold(init, acc, fin);
        let graph: Treeish<N>        = treeish_visit(children);
        let root = N { val: 1, children: vec![N { val: 2, children: vec![] }] };

        assert_eq!(FUSED.run(&f, &graph, &root), 3);
    }
}

This form allows closures to be reused across domains and read without nesting.

Phase transformations

Wrap individual phases without changing the fold’s types:

(Diagram: wrap_init(f), wrap_accumulate(f), and wrap_finalize(f) each intercept one phase of a Fold<N, H, R>, leaving its types unchanged.)

wrap_init — adding side effects at initialisation

#![allow(unused)]
fn main() {
    #[test]
    fn fold_wrap_init() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct Dir { name: String, size: u64, children: Vec<Dir> }

        let graph: Treeish<Dir> = treeish(|d: &Dir| d.children.clone());
        let f:     Fold<Dir, u64, u64> = fold(
            |d: &Dir| d.size,
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );

        let logged: Fold<Dir, u64, u64> = f.wrap_init(
            // A real wrapper would add a side effect (e.g. logging)
            // before delegating to the original init.
            |d: &Dir, orig: &dyn Fn(&Dir) -> u64| orig(d),
        );

        let tree = Dir { name: "r".into(), size: 10, children: vec![] };
        assert_eq!(FUSED.run(&logged, &graph, &tree), 10);
    }
}

The wrapper receives the node and the original init as a callable reference. The closure may invoke it, modify its result, add side effects, or bypass it entirely. The mechanism is available in all three domains.
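The wrap contract is easy to state outside the library. The sketch below (std-only Rust; `wrap_init` here is a hypothetical standalone function, not hylic's method) shows the shape: the wrapper takes the node and the original phase as `&dyn Fn`, and may delegate, rewrite the result, or bypass the original entirely.

```rust
// Compose a wrapper around an init phase: Fn(&N) -> H stays Fn(&N) -> H.
fn wrap_init<N, H>(
    orig: impl Fn(&N) -> H,
    wrapper: impl Fn(&N, &dyn Fn(&N) -> H) -> H,
) -> impl Fn(&N) -> H {
    move |n| wrapper(n, &orig)
}

fn main() {
    let init = |n: &u64| *n;

    // Delegate, then modify the original result.
    let doubled = wrap_init(init, |n, orig| orig(n) * 2);
    assert_eq!(doubled(&21), 42);

    // Bypass the original entirely.
    let constant = wrap_init(init, |_n, _orig| 7u64);
    assert_eq!(constant(&100), 7);
}
```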

Result-type transformations

Change what the fold produces:

(Diagram: map(fwd, back) takes Fold<N, H, R> to Fold<N, H, R2>, changing R; zipmap(f) takes it to Fold<N, H, (R, Extra)>, augmenting R.)

zipmap — augmenting the result

#![allow(unused)]
fn main() {
    #[test]
    fn fold_zipmap() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct N { val: u64, children: Vec<N> }

        let graph: Treeish<N> = treeish(|n: &N| n.children.clone());
        let f:     Fold<N, u64, u64> = fold(
            |n: &N| n.val,
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );

        let with_flag: Fold<N, u64, (u64, bool)> = f.zipmap(|r: &u64| *r > 5);
        let root = N { val: 1, children: vec![
            N { val: 3, children: vec![] },
            N { val: 4, children: vec![] },
        ]};
        let (total, over_five): (u64, bool) = FUSED.run(&with_flag, &graph, &root);
        assert_eq!(total, 8);
        assert!(over_five);
    }
}

zipmap is the most common transformation: additional computed data is attached to the result without altering the fold’s core logic.

Node-type transformations

contramap — changing the input type

#![allow(unused)]
fn main() {
    #[test]
    fn fold_contramap() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct N { val: u64, children: Vec<N> }

        let f: Fold<N, u64, u64> = fold(
            |n: &N| n.val,
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );

        // Change node type: String → N.
        let by_name: Fold<String, u64, u64> =
            f.contramap_n(|s: &String| N { val: s.len() as u64, children: vec![] });
        let graph: Treeish<String> =
            treeish_visit(|_: &String, _cb: &mut dyn FnMut(&String)| {});

        let result: u64 = FUSED.run(&by_name, &graph, &"hello".to_string());
        assert_eq!(result, 5);
    }
}

Only init consumes the node directly. contramap wraps init to transform the input; accumulate and finalize are left unchanged. See also Transforms and variance for the variance story that dictates the argument shape.
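Since only init touches the node, the transformation reduces to precomposition. A minimal std-only sketch (not hylic's signatures; `contramap_init` is a hypothetical helper) of what happens to the init phase alone:

```rust
// Precompose a projection M -> N so an N-init accepts M nodes.
// accumulate and finalize never see the node, so they need no change.
fn contramap_init<M, N, H>(
    init: impl Fn(&N) -> H,
    proj: impl Fn(&M) -> N,
) -> impl Fn(&M) -> H {
    move |m| init(&proj(m))
}

fn main() {
    // An init over numeric nodes...
    let init = |n: &u64| *n * 10;
    // ...contramapped to accept strings by projecting their length.
    let by_len = contramap_init(init, |s: &String| s.len() as u64);
    assert_eq!(by_len(&"hello".to_string()), 50);
}
```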

Composition

product — two folds, one traversal

#![allow(unused)]
fn main() {
    #[test]
    fn fold_product() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct Dir { name: String, size: u64, children: Vec<Dir> }

        let graph: Treeish<Dir> = treeish(|d: &Dir| d.children.clone());
        let size_fold: Fold<Dir, u64, u64> = fold(
            |d: &Dir| d.size,
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );

        // A second fold computing maximum depth, run in the same traversal.
        let depth_fold: Fold<Dir, usize, usize> = fold(
            |_: &Dir| 1usize,
            |h: &mut usize, c: &usize| *h = (*h).max(*c + 1),
            |h: &usize| *h,
        );
        let both: Fold<Dir, (u64, usize), (u64, usize)> = size_fold.product(&depth_fold);
        let tree = Dir {
            name: "r".into(), size: 10,
            children: vec![Dir { name: "a".into(), size: 5, children: vec![] }],
        };
        let (total_size, max_depth) = FUSED.run(&both, &graph, &tree);
        assert_eq!(total_size, 15);
        assert_eq!(max_depth, 2);
    }
}

The categorical product: each fold maintains its own heap, observes its own child results, and produces its own output. One traversal yields two results; no node is visited twice.
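What the product buys can be hand-rolled in a few lines: one recursive descent drives both folds, each owning one component of a paired heap. The sketch below is std-only Rust with an illustrative node shape, not hylic's machinery.

```rust
#[derive(Clone)]
struct Dir {
    size: u64,
    children: Vec<Dir>,
}

// One descent, two heaps: total size and max depth computed together.
fn run_product(d: &Dir) -> (u64, usize) {
    // Paired inits: (size of this node, depth 1).
    let mut heap = (d.size, 1usize);
    for c in &d.children {
        let (child_size, child_depth) = run_product(c);
        heap.0 += child_size;                 // size accumulate
        heap.1 = heap.1.max(child_depth + 1); // depth accumulate
    }
    heap // both finalizers are the identity here
}

fn main() {
    let tree = Dir {
        size: 10,
        children: vec![Dir { size: 5, children: vec![] }],
    };
    assert_eq!(run_product(&tree), (15, 2));
}
```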

Domain parity

All three domains support the same transformation surface:

| Method | Shared | Local | Owned | Effect |
|---|---|---|---|---|
| wrap_init | &self | &self | self | intercept init phase |
| wrap_accumulate | &self | &self | self | intercept accumulate phase |
| wrap_finalize | &self | &self | self | intercept finalize phase |
| map | &self | &self | self | change result type R → R2 |
| zipmap | &self | &self | self | augment result (R, Extra) |
| contramap | &self | &self | self | change node type N → N2 |
| product | &self | &self | self | two folds, one traversal |

Shared and Local borrow &self, so the original fold is preserved; Owned consumes self, moving the original into the result. All three delegate to the same domain-independent combinator functions in fold/combinators.rs; auto-trait propagation ensures that Send + Sync flows correctly for Shared.

All domains also expose .init(), .accumulate(), and .finalize() as direct methods, in addition to the FoldOps trait implementation.

See The three domains for guidance on when to select which domain.

Working example

#![allow(unused)]
fn main() {
//! Transformations: features as standalone functions that match the contract.
//!
//! One domain, one base fold, one base graph. Each feature is a named
//! function — it IS the concern, separated and reusable. Plugging it
//! in is a single method call on the existing construct.

#[cfg(test)]
mod tests {
    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};
    use hylic::prelude::*;
    use hylic::prelude::memoize_treeish_by;
    use insta::assert_snapshot;

    // ── Domain ──────────────────────────────────────────────

    #[derive(Clone, Debug)]
    struct Task {
        name: String,
        cost_ms: u64,
        deps: Vec<String>,
    }

    struct Registry(HashMap<String, Task>);

    impl Registry {
        fn new(tasks: &[(&str, u64, &[&str])]) -> Self {
            Registry(tasks.iter().map(|(name, cost, deps)| {
                (name.to_string(), Task {
                    name: name.to_string(),
                    cost_ms: *cost,
                    deps: deps.iter().map(|d| d.to_string()).collect(),
                })
            }).collect())
        }
        fn get(&self, name: &str) -> Option<&Task> { self.0.get(name) }
    }

    // ── Shared setup ────────────────────────────────────────

    fn setup() -> (Treeish<Task>, Task) {
        let reg = Registry::new(&[
            ("app",       50,  &["compile", "link"]),
            ("compile",   200, &["parse", "typecheck"]),
            ("parse",     100, &[]),
            ("typecheck", 300, &[]),
            ("link",      150, &[]),
        ]);
        let map = reg.0.clone();
        let g: Treeish<Task> = treeish(move |task: &Task| {
            task.deps.iter().filter_map(|d| map.get(d).cloned()).collect()
        });
        let root = reg.get("app").unwrap().clone();
        (g, root)
    }

    fn base_fold() -> Fold<Task, u64, u64> {
        fold(
            |t: &Task| t.cost_ms,
            |heap: &mut u64, child: &u64| *heap += child,
            |h: &u64| *h,
        )
    }

    // ── Fold phase wrappers ─────────────────────────────────
    //
    // Each is a standalone closure matching the wrap contract:
    //   wrap_init:       Fn(&N, &dyn Fn(&N) -> H) -> H
    //   wrap_accumulate: Fn(&mut H, &R, &dyn Fn(&mut H, &R))
    //   wrap_finalize:   Fn(&H, &dyn Fn(&H) -> R) -> R

    /// Hooks into init: called once per node, before children.
    /// Logs the task name, then delegates to the original init.
    fn visit_logger(sink: Arc<Mutex<Vec<String>>>)
        -> impl Fn(&Task, &dyn Fn(&Task) -> u64) -> u64
    {
        move |task: &Task, orig: &dyn Fn(&Task) -> u64| {
            sink.lock().unwrap().push(task.name.clone());
            orig(task)
        }
    }

    /// Hooks into accumulate: conditionally skips small children.
    /// By not calling orig, the child result is never folded in.
    fn skip_small_children(threshold: u64)
        -> impl Fn(&mut u64, &u64, &dyn Fn(&mut u64, &u64))
    {
        move |heap: &mut u64, child: &u64, orig: &dyn Fn(&mut u64, &u64)| {
            if *child >= threshold { orig(heap, child); }
        }
    }

    /// Hooks into finalize: clamps the result.
    fn clamp_at(max: u64)
        -> impl Fn(&u64, &dyn Fn(&u64) -> u64) -> u64
    {
        move |heap: &u64, orig: &dyn Fn(&u64) -> u64| orig(heap).min(max)
    }

    /// zipmap contract: a plain Fn(&R) -> Extra. No wrapping needed —
    /// the function itself IS the feature. zipmap calls it per node,
    /// pairing the original result with the derived value: R → (R, Extra).
    fn classify(total: &u64) -> &'static str {
        match *total {
            t if t >= 500 => "critical",
            t if t >= 200 => "heavy",
            _ => "light",
        }
    }

    // ── Graph transformations ───────────────────────────────

    fn only_costly_deps(g: &Treeish<Task>, min_cost: u64) -> Treeish<Task> {
        let inner = g.clone();
        treeish(move |task: &Task| {
            inner.at(task)
                .filter(|child: &Task| child.cost_ms >= min_cost)
                .collect_vec()
        })
    }

    // ── Tests ───────────────────────────────────────────────

    #[test]
    fn test_visit_logger() {
        let (graph, root) = setup();
        let visited = Arc::new(Mutex::new(Vec::new()));
        let fold = base_fold().wrap_init(visit_logger(visited.clone()));

        let total = FUSED.run(&fold, &graph, &root);
        let names: Vec<String> = visited.lock().unwrap().clone();
        assert_eq!(total, 800);
        assert_snapshot!("visit_logger", format!(
            "total={total}, visited: {}", names.join(" → ")
        ));
    }

    #[test]
    fn test_skip_small_children() {
        let (graph, root) = setup();
        let fold = base_fold().wrap_accumulate(skip_small_children(200));
        let total = FUSED.run(&fold, &graph, &root);
        // app(50) + compile(200+typecheck 300) = 550; parse(100) and link(150) skipped
        assert_eq!(total, 550);
        assert_snapshot!("skip_small", format!("total={total} (small children skipped)"));
    }

    #[test]
    fn test_clamp_at() {
        let (graph, root) = setup();
        let fold = base_fold().wrap_finalize(clamp_at(500));
        let total = FUSED.run(&fold, &graph, &root);
        // compile=min(600,500)=500, link=150, app=min(50+500+150,500)=500
        assert_eq!(total, 500);
        assert_snapshot!("clamp_at", format!("total={total} (clamped at 500)"));
    }

    #[test]
    fn test_classify() {
        let (graph, root) = setup();
        let (total, category) = FUSED.run(&base_fold().zipmap(classify), &graph, &root);
        assert_eq!(total, 800);
        assert_eq!(category, "critical");
        assert_snapshot!("classify", format!("total={total}, category={category}"));
    }

    #[test]
    fn test_only_costly_deps() {
        let (graph, root) = setup();
        let filtered = only_costly_deps(&graph, 150);
        let total = FUSED.run(&base_fold(), &filtered, &root);
        // parse(100) pruned: app(50)+compile(200)+typecheck(300)+link(150) = 700
        assert_eq!(total, 700);
        assert_snapshot!("only_costly", format!("total={total} (deps with cost < 150 pruned)"));
    }

    #[test]
    fn test_memoize_diamond() {
        let reg = Registry::new(&[
            ("app", 10, &["compile", "link"]),
            ("compile", 50, &["stdlib"]),
            ("link", 30, &["stdlib"]),
            ("stdlib", 200, &[]),
        ]);
        let visit_count = Arc::new(Mutex::new(0u32));
        let vc = visit_count.clone();
        let map = reg.0.clone();
        let graph = treeish(move |task: &Task| {
            *vc.lock().unwrap() += 1;
            task.deps.iter().filter_map(|d| map.get(d).cloned()).collect()
        });
        let root = reg.get("app").unwrap().clone();

        let total = FUSED.run(&base_fold(), &graph, &root);
        let raw_visits = *visit_count.lock().unwrap();

        *visit_count.lock().unwrap() = 0;
        let cached = memoize_treeish_by(&graph, |t: &Task| t.name.clone());
        let total_memo = FUSED.run(&base_fold(), &cached, &root);
        let memo_visits = *visit_count.lock().unwrap();

        assert_eq!((total, raw_visits), (490, 5));
        assert_eq!((total_memo, memo_visits), (490, 4));
        assert_snapshot!("memoize", format!(
            "raw: total={total} visits={raw_visits}, memo: total={total_memo} visits={memo_visits}"
        ));
    }

    #[test]
    fn test_composed_pipeline() {
        let (graph, root) = setup();
        let visited = Arc::new(Mutex::new(Vec::new()));
        let pipeline = base_fold()
            .wrap_init(visit_logger(visited.clone()))
            .wrap_finalize(clamp_at(500))
            .zipmap(classify);

        let (total, category) = FUSED.run(&pipeline, &graph, &root);
        let names: Vec<String> = visited.lock().unwrap().clone();
        assert_eq!(total, 500);
        assert_eq!(category, "critical");
        assert_snapshot!("composed", format!(
            "total={total} [{category}], visited: {}", names.join(" → ")
        ));
    }
}
}

Graph: controlling traversal

The graph — Treeish<N>, or the more general Edgy<N, E> — is a function from a node to its children, and determines what is visited during the fold. The node type N may be any type: a struct, an integer index, a string key, a database identifier. The structure of the tree resides in the function, not in the data.

Constructors

Three means of creating a Treeish<N>:

#![allow(unused)]
fn main() {
    #[test]
    fn treeish_constructors() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct Node { value: u64, children: Vec<Node> }

        let root = Node { value: 1, children: vec![Node { value: 2, children: vec![] }] };

        // Callback-based (zero allocation per visit):
        let g1: Treeish<Node> = treeish_visit(|n: &Node, cb: &mut dyn FnMut(&Node)| {
            for child in &n.children { cb(child); }
        });

        // Vec-returning (allocates per visit):
        let g2: Treeish<Node> = treeish(|n: &Node| n.children.clone());

        // Slice accessor (borrows, zero allocation):
        let g3: Treeish<Node> = treeish_from(|n: &Node| n.children.as_slice());

        assert_eq!(g1.apply(&root).len(), 1);
        assert_eq!(g2.apply(&root).len(), 1);
        assert_eq!(g3.apply(&root).len(), 1);

        // Flat data — nodes are indices, children from adjacency list:
        let adj: Vec<Vec<usize>> = vec![vec![1, 2], vec![], vec![]];
        let g4: Treeish<usize> = treeish_visit(move |n: &usize, cb: &mut dyn FnMut(&usize)| {
            for &c in &adj[*n] { cb(&c); }
        });
        assert_eq!(g4.apply(&0).len(), 2);
    }
}

treeish_visit is the most general form; its callback receives each child without the allocation of a Vec. treeish wraps a Vec-returning function for convenience, and treeish_from extracts a slice reference from a field.

For non-nested data — adjacency lists, maps, external lookups — treeish_visit is the appropriate constructor:

// Adjacency list: nodes are indices
let adj: Vec<Vec<usize>> = vec![vec![1, 2], vec![3], vec![], vec![]];
let graph = treeish_visit(move |n: &usize, cb: &mut dyn FnMut(&usize)| {
    for &c in &adj[*n] { cb(&c); }
});

// HashMap-backed graph: nodes are string keys
let edges: HashMap<String, Vec<String>> = /* ... */;
let graph = treeish_visit(move |n: &String, cb: &mut dyn FnMut(&String)| {
    if let Some(children) = edges.get(n) {
        for c in children { cb(c); }
    }
});

For a runnable adjacency-list example, see intro_flat_example.

Edge transformations

The Edgy<N, E> type generalises Treeish<N> by allowing edges and nodes to be different types. Combinators transform the edge type or the node type:

(Diagram: map(f) takes Edgy<N, E> to Edgy<N, E2>, transforming edges; contramap(f) takes it to Edgy<N2, E>, changing the node type; filter(pred) keeps Edgy<N, E> but prunes edges.)

filter — pruning children

#![allow(unused)]
fn main() {
    #[test]
    fn graph_filter() {
        use hylic::prelude::*;

        #[derive(Clone)]
        struct Node { value: u64, children: Vec<Node> }

        let graph: Treeish<Node> = treeish(|n: &Node| n.children.clone());
        let f:     Fold<Node, u64, u64> = fold(
            |n: &Node| n.value,
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );

        let root = Node { value: 1, children: vec![
            Node { value: 10, children: vec![] },
            Node { value: 2, children: vec![] },
        ]};

        // Only visit children with value > 5.
        let pruned: Treeish<Node> = graph.filter(|child: &Node| child.value > 5);
        let result: u64 = FUSED.run(&f, &pruned, &root);
        assert_eq!(result, 11); // 1 + 10 (skipped 2)
    }
}

The fold sees fewer children without any awareness that pruning has occurred.

Caching: memoize_treeish

For DAGs in which the same node is reachable from multiple parents, memoize_treeish caches the child enumeration:

#![allow(unused)]
fn main() {
    #[test]
    fn memoize_example() {
        use hylic::prelude::*;
        use std::sync::atomic::{AtomicUsize, Ordering};
        use std::sync::Arc;

        let call_count = Arc::new(AtomicUsize::new(0));
        let cc = call_count.clone();

        let graph: Treeish<u64> = treeish(move |n: &u64| -> Vec<u64> {
            cc.fetch_add(1, Ordering::Relaxed);
            if *n == 0 { vec![] } else { vec![n - 1] }
        });
        let f: Fold<u64, u64, u64> = fold(
            |n: &u64| *n,
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );

        let cached: Treeish<u64> = memoize_treeish(&graph);
        let _ = FUSED.run(&f, &cached, &3u64);
        let first_count = call_count.load(Ordering::Relaxed);

        // Second run hits the cache; no new calls into `graph`.
        let _ = FUSED.run(&f, &cached, &3u64);
        let second_count = call_count.load(Ordering::Relaxed);
        assert_eq!(first_count, second_count);
    }
}

The first visit to a key computes and caches its children; subsequent visits return the cached result.
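The caching discipline can be sketched without hylic in a single-threaded variant (the real combinator follows the domain's storage rules; `memoized` is a hypothetical helper): the raw child function runs at most once per key.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Wrap a child-enumeration function with a per-key cache.
fn memoized(
    raw: impl Fn(u64) -> Vec<u64>,
) -> impl Fn(u64) -> Vec<u64> {
    let cache: RefCell<HashMap<u64, Vec<u64>>> = RefCell::new(HashMap::new());
    move |n| {
        if let Some(hit) = cache.borrow().get(&n) {
            return hit.clone(); // subsequent visits: cached result
        }
        let children = raw(n); // first visit: compute and cache
        cache.borrow_mut().insert(n, children.clone());
        children
    }
}

fn main() {
    use std::cell::Cell;
    let calls = Cell::new(0usize);
    let graph = memoized(|n| {
        calls.set(calls.get() + 1);
        if n == 0 { vec![] } else { vec![n - 1] }
    });
    let _ = (graph(3), graph(2), graph(3), graph(2));
    assert_eq!(calls.get(), 2); // keys 3 and 2 each computed once
}
```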

Visit combinator

Edgy::at(node) returns a Visit<T, F> — a push-based iterator exposing map, filter, fold, count, and collect_vec. All combinators are callback-based internally.

Execution: choosing the strategy

The executor governs how the tree recursion is carried out. The fold and graph determine what is computed at each node and how children are found; the executor determines traversal order, parallelism, and resource lifecycle. Substituting one executor for another changes performance characteristics without modifying the fold or the graph.

The interface

Both sequential and parallel execution use the same .run() method on Exec<D, S>. The method is inherent; no trait import is required:

#![allow(unused)]
fn main() {
use hylic::prelude::*;

// Sequential:
FUSED.run(&fold, &graph, &root);

// Parallel:
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
}

The domain D is fixed by the executor instance or the exec() call. The type parameters N, H, R, and the graph type G are inferred from the arguments.

Built-in executors

Two executors are provided. The choice between them is straightforward:

(Decision diagram: Parallelism needed? No: Fused, sequential recursion, all domains, all graph types. Yes: Funnel, parallel work-stealing, Shared domain, Send + Sync graphs.)

| Executor | Domain | Graph requirement | Characteristics |
|---|---|---|---|
| Fused | all | any TreeOps<N> | Sequential direct recursion (single thread) |
| Funnel | Shared | TreeOps<N> + Send + Sync | Parallel work-stealing across a scoped thread pool |

Fused operates on all domains and all graph types because it borrows everything on a single thread. Funnel requires Send + Sync on the graph because it shares the graph reference across a scoped thread pool.

Using the Funnel executor

The Funnel executor supports three usage tiers, trading convenience for control over resource lifetime:

One-shot — the pool is created and destroyed per call:

#![allow(unused)]
fn main() {
use hylic::prelude::*;
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
}

Session scope — the pool is shared across multiple folds:

#![allow(unused)]
fn main() {
exec(funnel::Spec::default(8)).session(|s| {
    s.run(&fold, &graph, &root);
    s.run(&fold, &graph, &root);
});
}

Explicit attach — the caller manages the pool directly:

#![allow(unused)]
fn main() {
funnel::Pool::with(8, |pool| {
    exec(funnel::Spec::default(8)).attach(pool).run(&fold, &graph, &root);
});
}

See Policies and presets for workload-specific configuration.

Defining a project-wide executor

For projects that use a fixed Funnel configuration, a common pattern is to define the executor once and reference it throughout:

use hylic::prelude::*;

type MyPolicy = funnel::policy::Policy<
    funnel::queue::PerWorker,
    funnel::accumulate::OnArrival,
    funnel::wake::EveryK<4>,
>;

pub fn project_exec() -> hylic::exec::Exec<Shared, funnel::Spec<MyPolicy>> {
    let nw = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    exec(
        funnel::Spec::default(nw)
            .with_accumulate::<funnel::accumulate::OnArrival>(
                funnel::accumulate::on_arrival::OnArrivalSpec)
            .with_wake::<funnel::wake::EveryK<4>>(
                funnel::wake::every_k::EveryKSpec)
    )
}

Call sites then use crate::project_exec().run(&fold, &graph, &root) without naming the policy type.

Lift integration

Lifts operate on the Shared domain. The Explainer is the canonical example — composed onto a fold, it captures every node’s intermediate state into an ExplainerResult<N, H, R>:

#![allow(unused)]
fn main() {
    #[test]
    fn explainer_usage() {
        use hylic_pipeline::prelude::*;

        #[derive(Clone)]
        struct N { val: u64, children: Vec<N> }

        let f: Fold<N, u64, u64> = fold(
            |n: &N| n.val,
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );
        let root = N { val: 1, children: vec![N { val: 2, children: vec![] }] };

        let trace: ExplainerResult<N, u64, u64> =
            TreeishPipeline::new(treeish(|n: &N| n.children.clone()), &f)
                .lift()
                .then_lift(Shared::explainer_lift::<N, u64, u64>())
                .run_from_node(&FUSED, &root);
        assert_eq!(trace.orig_result, 3);
    }
}

See Lifts for the Explainer and other lift patterns, and Pipeline overview for the chainable .explain() sugar that wraps this.

Further reading

Pipelines — overview

hylic-pipeline is a typestate builder over hylic’s lift primitives. Three pipeline types sit behind the same builder surface, distinguished by what they hold:

| Pipeline | Slots | When to use |
| --- | --- | --- |
| SeedPipeline<D, N, Seed, H, R> | grow, seeds_from_node, fold | Tree is discovered lazily through a Seed → N resolver. Run from a forest of entry seeds. |
| TreeishPipeline<D, N, H, R> | treeish, fold | Children are enumerable directly from the node (N → N*). Run from a known root &N. |
| OwnedPipeline<N, H, R> | treeish, fold (Owned domain) | One-shot, by-value, no Clone. Run consumes self. |

Each pipeline is Stage 1: it stores its base slots and exposes per-shape reshape sugars (e.g. filter_seeds, map_node_bi, wrap_grow). Calling .lift() flips it into Stage 2, where every method composes a lift onto the chain held in Stage2Pipeline<Base, L>. Stage2Pipeline is one type parameterised over which Stage-1 base is wrapped; the sugar trait body covers both bases through Wrap dispatch.

(Diagram: SeedPipeline<D, N, Seed, H, R> and TreeishPipeline<D, N, H, R> sit in Stage 1 (per-shape reshape). .lift() moves each into Stage 2 (chain over Wrap::Of<N>): Stage2Pipeline<SeedPipeline<…>, L> chains over SeedNode<N>, Stage2Pipeline<TreeishPipeline<…>, L> chains over N. Both loop on .then_lift(l) plus the Stage-2 sugars, then hand off to an executor (Fused / Funnel): .run(&exec, root_seeds, entry_heap) / .run_from_slice(&exec, &[seed], heap) on the seed side, .run_from_node(&exec, &root) on the treeish side, and .run_from_node_once(&exec, &root) for OwnedPipeline<N, H, R>.)

Run methods are owned by the pipeline that defines them: SeedPipeline::run / run_from_slice in Stage 1 — SeedPipeline; PipelineExec::run_from_node in Stage 1 — TreeishPipeline; PipelineExecOnce::run_from_node_once in OwnedPipeline. Stage2Pipeline inherits run from its Stage-1 base.

Stage 1 — SeedPipeline

A SeedPipeline carries three base slots — a coalgebra plus an algebra:

#![allow(unused)]
fn main() {
/// Stage-1 typestate pipeline with three base slots: `grow`,
/// `seeds_from_node`, and `fold`. Used when the tree is discovered
/// lazily from `Seed` references.
#[must_use]
pub struct SeedPipeline<D, N, Seed, H, R>
where D: Domain<N>,
      N: 'static, Seed: 'static, H: 'static, R: 'static,
{
    pub(crate) grow:            <D as Domain<N>>::Grow<Seed, N>,
    pub(crate) seeds_from_node: <D as Domain<N>>::Graph<Seed>,
    pub(crate) fold:            <D as Domain<N>>::Fold<H, R>,
}
}
  • grow: Seed → N — resolves a reference (a Seed) into a full node (N).
  • seeds_from_node: Edgy<N, Seed> — given a resolved node, enumerates the references it points to.
  • fold: Fold<N, H, R> — the algebra over resolved nodes.

The pipeline operates lazily on demand: given an entry seed at run time, it grows the tree by alternating grow and seeds_from_node until each branch terminates at a leaf.

(Diagram: entry_seeds ["app"] → seed "app" → grow → node App with deps ["db", "auth"] → seeds_from_node → seeds "db" and "auth" → grow → node Db (deps []) and node Auth (deps ["db"]) → seeds_from_node → seed "db" → grow → node Db again.)
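Stripped of the library, the grow / seeds_from_node alternation is plain mutual recursion between a resolver and an edge enumerator. A minimal hand-rolled sketch (the registry shape and `total_cost` are illustrative names, not hylic API):

```rust
use std::collections::HashMap;

// Registry maps a seed (module name) to its node: (cost, dependency seeds).
fn total_cost(reg: &HashMap<String, (u64, Vec<String>)>, seed: &str) -> u64 {
    let (cost, deps) = &reg[seed];        // grow: Seed → N
    let mut heap = *cost;                 // init
    for dep in deps {                     // seeds_from_node: N → Seed*
        heap += total_cost(reg, dep);     // recurse, then accumulate
    }
    heap                                  // finalize
}

fn main() {
    let mut reg = HashMap::new();
    reg.insert("app".to_string(), (1, vec!["db".to_string(), "auth".to_string()]));
    reg.insert("db".to_string(), (2, vec![]));
    reg.insert("auth".to_string(), (4, vec!["db".to_string()]));
    // "db" is reached twice (via app and via auth) and counted twice:
    println!("{}", total_cost(&reg, "app")); // 1 + 2 + (4 + 2) = 9
}
```

Note the double-count of "db": without a memoisation layer, shared seeds resolve to independent subtrees, exactly as in the diagram above.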

When to pick this over TreeishPipeline

Use SeedPipeline when the dependency graph speaks a different language from the nodes — file paths, module names, URLs, anything that must be resolved into a full data structure before its children can be examined. When the nodes themselves already enumerate their children directly (N → N*), TreeishPipeline is simpler: no grow slot.

Constructing one

#![allow(unused)]
fn main() {
    #[test]
    fn pipeline_overview_seed() {
        use hylic_pipeline::prelude::*;
        use std::collections::HashMap;
        use std::sync::Arc;

        #[derive(Clone)]
        struct Mod { cost: u64, deps: Vec<String> }
        let reg: Arc<HashMap<String, Mod>> = Arc::new({
            let mut m = HashMap::new();
            m.insert("app".into(), Mod { cost: 1, deps: vec!["db".into()] });
            m.insert("db".into(),  Mod { cost: 2, deps: vec![] });
            m
        });
        let reg_grow = reg.clone();

        let sp: SeedPipeline<Shared, Mod, String, u64, u64> = SeedPipeline::new(
            move |s: &String| reg_grow.get(s).cloned().unwrap(),
            edgy_visit(|n: &Mod, cb: &mut dyn FnMut(&String)| {
                for d in &n.deps { cb(d); }
            }),
            &fold(|n: &Mod| n.cost, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h),
        );

        let r: u64 = sp
            .filter_seeds(|s: &String| !s.starts_with('_'))
            .run_from_slice(&FUSED, &["app".to_string()], 0u64);

        // Reachable modules: app (cost 1) + db (cost 2) = 3.
        assert_eq!(r, 3);
    }
}

Stage-1 reshape

A SeedPipeline can be reshaped without lifting — the result is still a SeedPipeline of (possibly different) type parameters. The SeedSugarsShared trait provides the surface; SeedSugarsLocal mirrors it for the Local domain. Both come into scope via use hylic_pipeline::prelude::*;.

| method | changes |
| --- | --- |
| filter_seeds(pred) | Seed set narrowed; types preserved |
| wrap_grow(w) | intercepts every grow; types preserved |
| map_node_bi(co, contra) | changes N to N2 via bijection |
| map_seed_bi(to, from) | changes Seed to Seed2 via bijection |

Transitioning to Stage 2

Stage-2 sugars are not available on SeedPipeline directly — an explicit .lift() is required. (TreeishPipeline auto-lifts; SeedPipeline does not, because the Stage-2 chain operates over SeedNode<N> rather than N, and an implicit transition would surface that asymmetry in error messages.)

let lsp = pipeline
    .lift()                              // → Stage2Pipeline<SeedPipeline<…>, IdentityLift>
    .wrap_init(|n: &N, orig| orig(n) + 1)
    .zipmap(|r: &R| classify(r));        // chain extends; tip R becomes (R, classification)

After .lift(), the chain operates on SeedNode<N> — but every Stage-2 sugar’s user closure types at &N. The SeedNode row is sealed and auto-dispatched; see SeedNode<N> for the row’s shape and the rare cases where it surfaces in a chain-tip type, and Wrap dispatch for how the sugar trait reaches both Bases through one body.

Running

Two equivalent surfaces:

  • Direct on SeedPipeline: .run(exec, entry_seeds, entry_heap) and .run_from_slice(exec, &[seeds], entry_heap) are inherent on SeedPipeline<D, …> itself. They forward through self.clone().lift() internally; an ergonomic shorthand for the common case where no Stage-2 sugars are chained.
  • On Stage2Pipeline<SeedPipeline<…>, L> — same method names, same arguments. Used after .lift() plus any chain of Stage-2 sugars.
// Entry seeds as a slice (convenience), no sugars — direct on SeedPipeline:
let r: u64 = pipeline.run_from_slice(&FUSED, &["app".to_string()], 0u64);

// Entry seeds as a general Edgy<(), Seed>, no sugars:
let entry = edgy_visit(|_: &(), cb| cb(&"app".to_string()));
let r: u64 = pipeline.run(&FUSED, entry, 0u64);

// With Stage-2 sugars — `.lift()` is the explicit transition:
let r: u64 = pipeline
    .lift()
    .wrap_init(|n: &Mod, orig: &dyn Fn(&Mod) -> u64| orig(n) + 1)
    .run_from_slice(&FUSED, &["app".to_string()], 0u64);

The last argument is the initial heap at the synthetic root level — what the top-level accumulator starts with before any seed’s result is folded in. It is always the base H type; the chain’s own MapH is reached internally as the sugars promote from H outward.
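The root-level accumulation can be sketched standalone, under the semantics described above (`run_from_slice` here is a hand-rolled stand-in, not the pipeline method):

```rust
// Each entry seed's subtree result is folded into the entry heap with the
// base accumulate; the synthetic root's finalize produces the overall R.
fn run_from_slice(seeds: &[u64], entry_heap: u64) -> u64 {
    let fold_one = |s: &u64| *s * 10;  // stands in for one seed's subtree result
    let mut heap = entry_heap;         // heap at the synthetic root level
    for s in seeds {
        heap += fold_one(s);           // accumulate each seed's R
    }
    heap                               // finalize (identity here)
}

fn main() {
    println!("{}", run_from_slice(&[1, 2], 100)); // 100 + 10 + 20 = 130
}
```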

.lift() itself is preserved as the explicit Stage-1 → Stage-2 transition. The shorthand on SeedPipeline exists to elide the empty-.lift() ceremony at call sites that do not chain sugars; when sugars are involved, .lift() makes the row-type transition (chain input becoming SeedNode<N>) traceable to a single line in the user’s source.

Full example

#![allow(unused)]
fn main() {
    #[test]
    fn seed_pipeline_example() {
        use hylic_pipeline::prelude::*;
        use std::collections::HashMap;

        // The "registry" — flat data, not a tree.
        let mut modules: HashMap<String, Vec<String>> = HashMap::new();
        modules.insert("app".into(),  vec!["db".into(), "auth".into()]);
        modules.insert("db".into(),   vec![]);
        modules.insert("auth".into(), vec!["db".into()]);

        // Edge function: given a module name, produce its dependency seeds.
        let reg = modules.clone();
        let seeds_from_node: Edgy<String, String> =
            edgy_visit(move |name: &String, cb: &mut dyn FnMut(&String)| {
                if let Some(deps) = reg.get(name) {
                    for dep in deps { cb(dep); }
                }
            });

        // Fold: collect every reachable name.
        let f: Fold<String, Vec<String>, Vec<String>> = fold(
            |name: &String| vec![name.clone()],
            |heap: &mut Vec<String>, child: &Vec<String>| heap.extend(child.iter().cloned()),
            |heap: &Vec<String>| heap.clone(),
        );

        let pipeline: SeedPipeline<Shared, String, String, Vec<String>, Vec<String>> =
            SeedPipeline::new(|seed: &String| seed.clone(), seeds_from_node, &f);

        let result: Vec<String> = pipeline.run_from_slice(
            &FUSED,
            &["app".to_string()],
            Vec::new(),
        );
        assert!(result.contains(&"app".to_string()));
        assert!(result.contains(&"auth".to_string()));
    }
}

Stage 1 — TreeishPipeline

#![allow(unused)]
fn main() {
/// Stage-1 typestate pipeline with two base slots: `treeish`
/// (graph) and `fold`. Used when children are directly enumerable
/// from nodes of the same type (`N → N*`).
#[must_use]
pub struct TreeishPipeline<D, N, H, R>
where D: Domain<N>,
      N: 'static, H: 'static, R: 'static,
{
    pub(crate) treeish: <D as Domain<N>>::Graph<N>,
    pub(crate) fold:    <D as Domain<N>>::Fold<H, R>,
}
}

Two slots:

  • treeish: <D as Domain<N>>::Graph<N> — direct child enumeration, N → N*.
  • fold: <D as Domain<N>>::Fold<H, R> — the algebra over N.

No grow step, no entry seeds. Execution starts from a &N root supplied to the executor.

Constructors

#![allow(unused)]
fn main() {
// Shared domain.
TreeishPipeline::<Shared, _, _, _>::new(
    treeish_arc,         // hylic::graph::Treeish<N>
    &fold,               // &shared::Fold<N, H, R>
);

// Local domain — note the `_local` suffix; Rust's inherent-method
// resolution can't disambiguate two `new`s on the same struct that
// differ only in the domain marker.
TreeishPipeline::<Local, _, _, _>::new_local(
    treeish_local,       // local::Edgy<N, N>
    fold_local,          // local::Fold<N, H, R>
);

// Domain-generic.
TreeishPipeline::<D, _, _, _>::from_slots(treeish, fold);
}

Stage-1 reshape

One sugar — there’s no grow axis to reshape and no seeds to filter:

| method | output |
| --- | --- |
| map_node_bi(co, contra) | TreeishPipeline<D, N2, H, R> |

Provided by TreeishSugarsShared (Local mirror: TreeishSugarsLocal); see Sugars.

Stage 2

Two ways to enter:

  • Explicit: tree_pipeline.lift() returns Stage2Pipeline<TreeishPipeline<D, N, H, R>, IdentityLift>.
  • Auto-lift: every Stage-2 sugar is also callable directly on TreeishPipeline. tree_pipeline.wrap_init(w) is shorthand for tree_pipeline.lift().wrap_init(w).
#![allow(unused)]
fn main() {
    #[test]
    fn treeish_pipeline_chain() {
        use hylic_pipeline::prelude::*;

        #[derive(Clone)]
        struct Node { value: u64, children: Vec<Node> }
        let root = Node {
            value: 1,
            children: vec![
                Node { value: 2, children: vec![] },
                Node { value: 3, children: vec![] },
            ],
        };

        let tp: TreeishPipeline<Shared, Node, u64, u64> = TreeishPipeline::new(
            treeish(|n: &Node| n.children.clone()),
            &fold(|n: &Node| n.value, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h),
        );

        let r: (u64, bool) = tp
            .wrap_init(|n: &Node, orig: &dyn Fn(&Node) -> u64| orig(n) + 1)
            .zipmap(|r: &u64| *r > 5)
            .run_from_node(&FUSED, &root);
        assert_eq!(r, (9, true));
    }
}

The chain’s input N stays at the user’s N (no wrap layer); the Wrap impl is Identity.

Running

#![allow(unused)]
fn main() {
let r = pipeline.run_from_node(&FUSED, &root);
}

PipelineExec::run_from_node(&exec, &root) is a blanket method on every TreeishSource. The first init runs on the supplied root. Returns the chain-tip R — the base fold’s R when no Stage-2 sugars are composed, otherwise whatever the rightmost lift produces.

Stage2Pipeline<TreeishPipeline<…>, L> inherits the same method through its TreeishSource impl; the call shape is identical.

Worked example

#![allow(unused)]
fn main() {
    #[test]
    fn treeish_pipeline_ctor() {
        use hylic_pipeline::prelude::*;

        #[derive(Clone)]
        struct Node { value: u64, children: Vec<Node> }
        let root = Node { value: 7, children: vec![] };

        let tp: TreeishPipeline<Shared, Node, u64, u64> = TreeishPipeline::new(
            treeish(|n: &Node| n.children.clone()),
            &fold(
                |n: &Node| n.value,
                |h: &mut u64, c: &u64| *h += c,
                |h: &u64| *h,
            ),
        );
        assert_eq!(tp.run_from_node(&FUSED, &root), 7);
    }
}

Stage 2 — Stage2Pipeline

#![allow(unused)]
fn main() {
/// Stage-2 typestate pipeline. Wraps a Stage-1 base with a lift chain.
/// The chain's input N is `<Base::Wrap as Wrap>::Of<UN>` — see the
/// `Stage2Base` and `Wrap` traits in this module.
#[must_use]
pub struct Stage2Pipeline<Base, L = IdentityLift> {
    pub(crate) base:     Base,
    pub(crate) pre_lift: L,
}
}

base is a Stage-1 pipeline. pre_lift is one lift value, but typically a ComposedLift<L1, L2> tree built up through .then_lift calls and Stage-2 sugars. Each sugar appends one node to the tree.

The chain’s input N is determined by the Base via the Wrap projection on Stage2Base:

  • Base = TreeishPipeline<D, N, H, R>Wrap::Of<N> = N.
  • Base = SeedPipeline<D, N, Seed, H, R>Wrap::Of<N> = SeedNode<N>. SeedLift is composed at the chain head when .run is called; every stored lift in pre_lift sees SeedNode<N> as its input.

Type evolution

After three sugars on a TreeishPipeline<Shared, u64, u64, u64>:

(Diagram: TreeishPipeline<Shared, u64, u64, u64> becomes, after .wrap_init(…), Stage2Pipeline<TreeishPipeline<..>, ComposedLift<Identity, WrapInit>>; after .zipmap(…), Stage2Pipeline<TreeishPipeline<..>, ComposedLift<ComposedLift<Identity, WrapInit>, ZipMap>>; and after .filter_edges(…), Stage2Pipeline<TreeishPipeline<..>, ComposedLift<ComposedLift<ComposedLift<Identity, WrapInit>, ZipMap>, FilterEdges>>.)

Each sugar wraps the previous chain in one more ComposedLift layer. The base is unchanged. The whole chain monomorphises and inlines together; there is no per-lift dispatch at runtime.
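The zero-cost layering can be sketched independently of hylic. In the toy version below (`Lift`, `Composed`, `AddOne` are illustrative names, not the library's types), each composition step nests one more generic struct, and the whole chain is a single concrete monomorphised type with static dispatch throughout:

```rust
trait Lift {
    fn apply(&self, r: u64) -> u64;
}

struct Identity;
impl Lift for Identity {
    fn apply(&self, r: u64) -> u64 { r }
}

struct AddOne;
impl Lift for AddOne {
    fn apply(&self, r: u64) -> u64 { r + 1 }
}

// Composition nests generically: inner lift first, then outer.
struct Composed<A, B>(A, B);
impl<A: Lift, B: Lift> Lift for Composed<A, B> {
    fn apply(&self, r: u64) -> u64 { self.1.apply(self.0.apply(r)) }
}

fn main() {
    // Each "sugar" wraps the previous chain in one more Composed layer;
    // the type is Composed<Composed<Identity, AddOne>, AddOne>.
    let chain = Composed(Composed(Identity, AddOne), AddOne);
    println!("{}", chain.apply(40)); // 42
}
```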

Entering Stage 2

#![allow(unused)]
fn main() {
let lp  = tree_pipeline.lift();   // Stage2Pipeline<TreeishPipeline<..>, IdentityLift>
let lsp = seed_pipeline.lift();   // Stage2Pipeline<SeedPipeline<..>,    IdentityLift>
}

TreeishPipeline also auto-lifts: tree_pipeline.wrap_init(w) calls .lift() internally. SeedPipeline does not — .lift() must be written explicitly.

Compositional primitives

then_lift — append

#![allow(unused)]
fn main() {
    /// Post-compose `outer` onto the chain. Pure struct construction;
    /// no bounds. The composition's *meaningfulness* is enforced where
    /// the chain is consumed (`.run_*`, `TreeishSource`).
    pub fn then_lift<L2>(
        self,
        outer: L2,
    ) -> Stage2Pipeline<Base, ComposedLift<L, L2>> {
        Stage2Pipeline {
            base:     self.base,
            pre_lift: ComposedLift::compose(self.pre_lift, outer),
        }
    }
}

L2’s inputs must match the chain tip’s outputs. The new tip becomes (L2::N2, L2::MapH, L2::MapR). Available on every Stage2Pipeline<Base, L>.

then_lift is unconstrained at the struct-method level (pure construction). Validity is enforced where the chain is consumed — the .run* methods and the TreeishSource impl.

before_lift — prepend (treeish-rooted only)

#![allow(unused)]
fn main() {
    /// Pre-compose a type-preserving lift `first` before the chain.
    /// `first`'s output (N, H, R) must equal the base's input.
    /// For non-type-preserving pre-adaptation, use the variance-aware
    /// sugars (`map_node_bi`, `map_r_bi`, `n_lift`, `phases_lift`).
    ///
    /// Available only for treeish-rooted pipelines: seed-rooted
    /// chains have `SeedLift` composed at `.run` time as the natural
    /// chain head, leaving no meaningful "before" position.
    pub fn before_lift<L0>(self, first: L0)
        -> Stage2Pipeline<TreeishPipeline<D, N, H, R>, ComposedLift<L0, L>>
    where L0: Lift<D, N, H, R>,
          D: Domain<L0::N2>,
    {
        Stage2Pipeline { base: self.base, pre_lift: ComposedLift::compose(first, self.pre_lift) }
    }
}

Pre-compose L0 at the head of the chain. L0 must be type-preserving — its outputs must equal the Base’s inputs — which restricts L0 to lifts that don’t change (N, H, R) (filter_edges_lift, wrap_visit_lift, memoize_by_lift are the practical choices).

Available only on Stage2Pipeline<TreeishPipeline<…>, L>.

Sugars

Stage-2 sugars all delegate to then_lift after building a ShapeLift through Wrap dispatch. The user’s closures type at &UN regardless of Base; the seed-rooted case adapts via a SeedNode::Node(_)-peeling adapter inside the Wrap impl. Full catalogue: Sugars. Type-level mechanism: Wrap dispatch.

#![allow(unused)]
fn main() {
    #[test]
    fn lifted_sugar_chain() {
        use hylic_pipeline::prelude::*;

        let tp: TreeishPipeline<Shared, u64, u64, u64> = TreeishPipeline::new(
            treeish(|n: &u64| if *n > 0 { vec![*n - 1] } else { vec![] }),
            &fold(|n: &u64| *n, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h),
        );

        let r: String = tp
            .wrap_init(|n: &u64, orig: &dyn Fn(&u64) -> u64| orig(n) + 1)
            .zipmap(|r: &u64| *r > 5)
            .filter_edges(|n: &u64| *n != 0)
            .map_r_bi(
                |r: &(u64, bool)| format!("{}:{}", r.0, r.1),
                |s: &String| {
                    let (a, b) = s.split_once(':').unwrap();
                    (a.parse().unwrap(), b == "true")
                },
            )
            .run_from_node(&FUSED, &3u64);

        // filter_edges drops the 0-step: tree visits 3→2→1, three nodes.
        // wrap_init adds +1 each → values 4, 3, 2; sum = 9. zipmap > 5 → true.
        assert_eq!(r, "9:true");
    }
}

Running

Stage2Pipeline inherits run from its Stage-1 base. Treeish-rooted: .run_from_node(&exec, &root) — see TreeishPipeline. Seed-rooted: .run(&exec, root_seeds, entry_heap) and .run_from_slice(&exec, &[seed], entry_heap) — see SeedPipeline. The call shape is unchanged across stages.

Sugars — the chainable surface

Every transform users reach for at Stage 1 or Stage 2 is a trait method. Each method picks an axis, builds the right library lift, and either reshapes the Stage-1 slots in place (Stage 1) or appends the lift to the chain via then_lift (Stage 2).

Trait surfaces

| Where you are | Sugars in scope |
| --- | --- |
| SeedPipeline<Shared, …> | SeedSugarsShared |
| SeedPipeline<Local, …> | SeedSugarsLocal |
| TreeishPipeline<Shared, …> | TreeishSugarsShared |
| TreeishPipeline<Local, …> | TreeishSugarsLocal |
| Stage2Pipeline<Base, L> (Shared, any Base) | Stage2SugarsShared |
| Stage2Pipeline<Base, L> (Local, any Base) | Stage2SugarsLocal |

use hylic_pipeline::prelude::*; brings them all in scope.

Shared and Local: same names, different bounds

Method names are identical across domains. Only the closure storage and bounds differ:

// Shared: parallel-safe; closures must be Send + Sync.
let r = shared_pipe.wrap_init(w).zipmap(m).run(...);

// Local: same call shape, captures may be non-Send.
let r = local_pipe.wrap_init(w).zipmap(m).run(...);

Stage 2: one trait covers both Bases

Stage2SugarsShared is one trait, blanket-implemented on every Stage2Pipeline<Base, L>. The treeish-rooted vs seed-rooted dispatch happens inside the lift-construction call, not at the trait level. Every Stage-2 sugar body is one line:

#![allow(unused)]
fn main() {
    fn wrap_init<W>(self, w: W) -> Self::With<ShapeLift<Shared,
        <<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>, H, R,
        <<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>, H, R>>
    where
        <Self::Base as Stage2Base>::Wrap: WrapShared,
        <<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>: Clone + Send + Sync + 'static,
        W: Fn(&UN, &dyn Fn(&UN) -> H) -> H + Send + Sync + 'static,
    {
        self.then_lift(<<Self::Base as Stage2Base>::Wrap as WrapShared>::build_wrap_init::<UN, H, R, _>(w))
    }
}

<<Self::Base as Stage2Base>::Wrap as WrapShared>::build_wrap_init is the dispatch. Identity (treeish-rooted) calls Shared::wrap_init_lift directly; SeedWrap (seed-rooted) wraps the user’s closure with a SeedNode::Node(_)-peeling adapter, then calls the same Shared::wrap_init_lift. Both produce a ShapeLift; both forward to then_lift. From the user’s perspective the closure types at &UN either way. See Wrap dispatch for the full mechanics.

Stage 1: per-Base reshape sugars

Stage-1 reshape rewrites the base slots in place and returns a fresh Stage-1 pipeline of (possibly different) type parameters:

#![allow(unused)]
fn main() {
pub trait SeedSugarsShared<N, Seed, H, R>: Sized
where N: Clone + 'static, Seed: Clone + 'static,
      H: Clone + 'static, R: Clone + 'static,
{
    fn filter_seeds<P>(self, pred: P) -> SeedPipeline<Shared, N, Seed, H, R>
    where P: Fn(&Seed) -> bool + Send + Sync + 'static;

    fn wrap_grow<W>(self, wrapper: W) -> SeedPipeline<Shared, N, Seed, H, R>
    where W: Fn(&Seed, &dyn Fn(&Seed) -> N) -> N + Send + Sync + 'static;

    fn map_node_bi<N2, Co, Contra>(self, co: Co, contra: Contra)
        -> SeedPipeline<Shared, N2, Seed, H, R>
    where N2: Clone + 'static,
          Co:     Fn(&N) -> N2 + Send + Sync + 'static,
          Contra: Fn(&N2) -> N + Send + Sync + 'static;

    fn map_seed_bi<Seed2, ToNew, FromNew>(self, to_new: ToNew, from_new: FromNew)
        -> SeedPipeline<Shared, N, Seed2, H, R>
    where Seed2: Clone + 'static,
          ToNew:   Fn(&Seed) -> Seed2 + Send + Sync + 'static,
          FromNew: Fn(&Seed2) -> Seed + Send + Sync + 'static;
}

}

Stage-2 sugars are not in scope until .lift() (or the TreeishPipeline auto-lift) has produced a Stage2Pipeline.

Catalogue

Stage 1 — SeedSugarsShared / SeedSugarsLocal

Operates on SeedPipeline<D, N, Seed, H, R>:

| method | output shape |
| --- | --- |
| filter_seeds(pred) | SeedPipeline<D, N, Seed, H, R> |
| wrap_grow(w) | SeedPipeline<D, N, Seed, H, R> |
| map_node_bi(co, contra) | SeedPipeline<D, N2, Seed, H, R> |
| map_seed_bi(to, from) | SeedPipeline<D, N, Seed2, H, R> |

Stage 1 — TreeishSugarsShared / TreeishSugarsLocal

Operates on TreeishPipeline<D, N, H, R>:

| method | output shape |
| --- | --- |
| map_node_bi(co, contra) | TreeishPipeline<D, N2, H, R> |

Stage 2 — Stage2SugarsShared / Stage2SugarsLocal

Operates on Stage2Pipeline<Base, L> (and on TreeishPipeline via auto-lift). User closures type at &UN; the chain’s actual N is UN (treeish-rooted) or SeedNode<UN> (seed-rooted), bridged by Wrap.

| method | what the lift does |
| --- | --- |
| wrap_init(w) | intercept init at every node |
| wrap_accumulate(w) | intercept accumulate |
| wrap_finalize(w) | intercept finalize |
| zipmap(m) | extend R: R → (R, Extra) |
| map_r_bi(fwd, bwd) | change R bijectively |
| filter_edges(pred) | drop edges from the graph |
| wrap_visit(w) | intercept graph visit |
| memoize_by(key) | memoise subtree results by key |
| map_n_bi(co, contra) | change N bijectively (chain-tip) |
| explain() | wrap fold with per-node trace recording |
| explain_describe(fmt, emit) | streaming trace; chain-tip R unchanged (Shared only) |

The Stage-1 reshape map_node_bi and the Stage-2 sugar map_n_bi share a purpose (change N) but are distinct operations. Stage 1 rewrites the base slots in place; Stage 2 composes a ShapeLift onto the chain. Use Stage 2 when the N change must sit on top of earlier sugars.

Where wrap_init’s second argument comes from

Every wrap_* user closure receives an orig: &dyn Fn(...) -> ... parameter alongside the node. orig is the prior fold’s corresponding phase, exposed as a value so the sugar body can compose with it: |n, orig| orig(n) + 1. Lifts are, at the type level, natural transformations between fold algebras; a phase mapper takes the prior phase as input and produces the new phase. See the type-level deep dive.
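A toy rendering of that phase-mapper idea (illustrative names, not hylic's Lift trait): the wrapper receives the prior init phase as a plain value and returns the new phase.

```rust
// A wrap_init-style lift maps the prior init phase to a new init phase;
// the user closure composes with `orig` rather than replacing it.
fn wrap_init_phase<N: 'static, H: 'static>(
    orig: Box<dyn Fn(&N) -> H>,
    w: impl Fn(&N, &dyn Fn(&N) -> H) -> H + 'static,
) -> Box<dyn Fn(&N) -> H> {
    Box::new(move |n| w(n, orig.as_ref()))
}

fn main() {
    let base: Box<dyn Fn(&u64) -> u64> = Box::new(|n| *n);
    let wrapped = wrap_init_phase(base, |n: &u64, orig: &dyn Fn(&u64) -> u64| orig(n) + 1);
    println!("{}", wrapped(&41)); // 42
}
```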

Wrap dispatch — how Stage-2 sugars reach both Bases

Stage2Pipeline<Base, L> is one struct. Its sugar surface (Stage2SugarsShared and Stage2SugarsLocal) is one trait per domain. Yet its chain L operates over different node types depending on the Base:

  • Stage2Pipeline<TreeishPipeline<…>, L> — chain runs over the user’s N.
  • Stage2Pipeline<SeedPipeline<…>, L> — chain runs over SeedNode<N>, because SeedLift prepends the synthetic EntryRoot at run time.

A user-facing closure types at &N. The chain expects &<wrapped N>. Bridging the two is the job of Wrap.

The trait

#![allow(unused)]
fn main() {
/// Type-level dispatch for the chain's input N. Each
/// [`Stage2Base`](super::Stage2Base) declares which `Wrap` it uses;
/// `WrapShared` / `WrapLocal` impls carry the per-domain lift
/// construction.
pub trait Wrap {
    /// The wrapped node type for a given user-facing N.
    type Of<UN: Clone + 'static>: Clone + 'static;
}
}

Two impls:

  • Identity::Of<UN> = UN — used by TreeishPipeline-rooted chains.
  • SeedWrap::Of<UN> = SeedNode<UN> — used by SeedPipeline-rooted chains.

Stage2Base declares which Wrap a Base uses:

#![allow(unused)]
fn main() {
/// A Stage-1 pipeline that can drive a Stage-2 chain. Carries the
/// `Wrap` selection plus the run-time machinery (pre-lift, root
/// reference, run-input shape).
///
/// Inherits `TreeishSource` so the `(treeish<N>, fold<N, H, R>)` pair is
/// yielded through one canonical path; `with_treeish` is the single
/// place per-base storage shapes are read.
///
/// `PreLift` is intentionally unbounded at the trait level. The
/// `Stage2Pipeline::run` impl adds the `Lift<…, N2 = <Wrap>::Of<N>>`
/// bound at use time; that keeps the supertrait surface free of the
/// `Domain<<Wrap>::Of<N>>` obligation that would otherwise propagate
/// through every site naming `Stage2Base`.
pub trait Stage2Base: TreeishSource + Sized {
    /// Type-level dispatcher for the chain's input N.
    /// `Identity` → `Of<UN> = UN` (treeish-rooted).
    /// `SeedWrap` → `Of<UN> = SeedNode<UN>` (seed-rooted).
    type Wrap: Wrap;

    /// The user-facing N (the type user lambdas type at). Equal to
    /// `Self::N` for every shipped base; kept distinct for
    /// documentation symmetry with the sugar surface, which threads
    /// `UN` as a method-level parameter.
    type UserN: Clone + 'static;

    /// What `.run(...)` accepts as its second argument. Parameterised
    /// by `CurN`, the user-facing N at the chain tip (i.e. after any
    /// `map_n_bi` lifts; `CurN = Self::N` if the chain doesn't change
    /// the user N).
    ///
    /// `Identity`-Wrap bases: `&'i CurN` (a borrowed post-chain root).
    /// `SeedWrap` bases: an owned `(seeds, entry_heap)` pair (the
    /// `CurN` parameter is unused at the value level — `EntryRoot` is
    /// constructible at any inner type).
    type RunInputs<'i, CurN: Clone + 'static>;

    /// The lift composed at the head of the run-time chain.
    /// `IdentityLift` for treeish-rooted, `SeedLift` for seed-rooted.
    /// Pre-lift transforms `(treeish<N>, fold<N,H,R>)` into
    /// `(treeish<Wrap::Of<N>>, fold<Wrap::Of<N>, H, R>)` without
    /// touching H or R.
    ///
    /// Unbounded at the trait level — see the trait-level note.
    /// The `Stage2Pipeline::run` impl adds
    /// `Self::PreLift: Lift<…, N2 = <Wrap>::Of<N>, MapH = H, MapR = R>`
    /// at use time.
    type PreLift;

    /// Build the pre-lift from inputs (consuming the parts of inputs
    /// the lift captures), then yield it together with the executor's
    /// post-chain root reference to the continuation.
    ///
    /// The continuation receives the pre-lift by value (consumed when
    /// applied to the (treeish, fold) pair) and the root by reference,
    /// at the post-chain type `<Self::Wrap as Wrap>::Of<CurN>`. The
    /// reference is valid for the entire duration of `cont`.
    ///
    /// `Identity` case: pre-lift is `IdentityLift`; the root is the
    /// `&CurN` extracted from `inputs`.
    /// `SeedWrap` case: pre-lift is `SeedLift::from_*_grow(...)`,
    /// consuming `inputs.0` (entry seeds) and `inputs.1` (entry heap);
    /// the root is `&SeedNode::entry_root::<CurN>()`, constructed
    /// locally in this frame and alive for `cont`'s lifetime.
    fn provide_run_essentials<CurN: Clone + 'static, T>(
        &self,
        inputs: Self::RunInputs<'_, CurN>,
        cont: impl FnOnce(Self::PreLift,
                          &<Self::Wrap as Wrap>::Of<CurN>) -> T,
    ) -> T;
}
}

So in the type system: <<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN> is the chain’s input N — equal to UN for treeish-rooted, equal to SeedNode<UN> for seed-rooted. This two-hop projection appears verbatim in every Stage-2 sugar’s signature.

Per-domain build subtraits

Wrap is type-only: it fixes a type family, not how to construct lifts. The constructors live on per-domain subtraits:

#![allow(unused)]
fn main() {
    fn build_wrap_init<UN, H, R, W>(w: W)
        -> ShapeLift<Shared, Self::Of<UN>, H, R, Self::Of<UN>, H, R>
    where
        UN: Clone + Send + Sync + 'static,
        H:  Clone + Send + Sync + 'static,
        R:  Clone + Send + Sync + 'static,
        Self::Of<UN>: Clone + Send + Sync + 'static,
        W: Fn(&UN, &dyn Fn(&UN) -> H) -> H + Send + Sync + 'static;
}

(Plus one method per Stage-2 sugar; see stage2/wrap/shared.rs for the full set, and stage2/wrap/local.rs for the Local mirror.)

The split is forced by the Send + Sync axis: Shared user closures must be Send + Sync (Arc storage; parallel executors); Local must not require it (Rc storage; supports non-Send captured state). WrapShared/WrapLocal are how that single asymmetry is expressed without macros.
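A minimal sketch of that storage split, using illustrative structs rather than hylic's actual fold types: the Arc-stored closure can cross a thread boundary, while the Rc-stored one may capture non-Send state.

```rust
use std::rc::Rc;
use std::sync::Arc;

// Shared side: Send + Sync closures, Arc storage, usable from worker threads.
struct SharedFold { init: Arc<dyn Fn(&u64) -> u64 + Send + Sync> }
// Local side: Rc storage, no Send bound, admits non-Send captures.
struct LocalFold  { init: Rc<dyn Fn(&u64) -> u64> }

fn main() {
    let shared = SharedFold { init: Arc::new(|n| n + 1) };
    // A shared fold can be handed to another thread:
    let f = shared.init.clone();
    let h = std::thread::spawn(move || f(&41));
    println!("{}", h.join().unwrap()); // 42

    // A local fold may capture non-Send state (here an Rc):
    let counter = Rc::new(5u64);
    let local = LocalFold { init: Rc::new(move |n| n + *counter) };
    println!("{}", (local.init)(&37)); // 42
}
```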

Identity: pass-through

#![allow(unused)]
fn main() {
impl WrapShared for Identity {
    fn build_wrap_init<UN, H, R, W>(w: W)
        -> ShapeLift<Shared, UN, H, R, UN, H, R>
    where
        UN: Clone + Send + Sync + 'static,
        H:  Clone + Send + Sync + 'static,
        R:  Clone + Send + Sync + 'static,
        W: Fn(&UN, &dyn Fn(&UN) -> H) -> H + Send + Sync + 'static,
    {
        Shared::wrap_init_lift::<UN, H, R, _>(w)   // pass-through
    }
}
}

User closure goes straight to Shared::wrap_init_lift. Of<UN> = UN, so no adaptation is needed.

SeedWrap: peel Node, pass EntryRoot

#![allow(unused)]
fn main() {
impl WrapShared for SeedWrap {
    fn build_wrap_init<UN, H, R, W>(w: W)
        -> ShapeLift<Shared, SeedNode<UN>, H, R, SeedNode<UN>, H, R>
    where
        UN: Clone + Send + Sync + 'static,
        H:  Clone + Send + Sync + 'static,
        R:  Clone + Send + Sync + 'static,
        W: Fn(&UN, &dyn Fn(&UN) -> H) -> H + Send + Sync + 'static,
    {
        let user = Arc::new(w);
        // Adapter for the SeedNode<UN>-typed chain: peel Node(_), pass EntryRoot.
        let lifted = move |ln: &SeedNode<UN>,
                           orig: &dyn Fn(&SeedNode<UN>) -> H| -> H
        {
            match sn_int::inner(ln) {
                SeedNodeInner::Node(n) => {
                    let user = user.clone();
                    user(n, &|inner: &UN| orig(&sn_int::node(inner.clone())))
                }
                SeedNodeInner::EntryRoot => orig(ln),
            }
        };
        Shared::wrap_init_lift::<SeedNode<UN>, H, R, _>(lifted)
    }
}

The user types Fn(&UN, …) -> H. The chain expects Fn(&SeedNode<UN>, …) -> H. The body adapts: when the row is Node(n), call the user’s closure with &n; when it’s EntryRoot, call through to the chain’s orig continuation directly (the user closure has nothing to do with the synthetic root).

The same pattern recurs for every N-aware sugar: build_filter_edges, build_memoize_by, build_wrap_visit, build_map_n_bi. Sugars without &N in their signature (wrap_accumulate, wrap_finalize, zipmap, map_r_bi, explain) need no peeling — both impls forward unchanged.
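The peeling adapter can be reproduced in isolation. Below is a standalone sketch, assuming nothing from the library: Row, peel, and the closure shapes are hypothetical stand-ins for the SeedNode machinery above.

```rust
// A synthetic variant is added around N; `peel` lifts a closure typed at &N
// to one typed at &Row<N>, routing the synthetic row to `orig` untouched.
enum Row<N> {
    EntryRoot,
    Node(N),
}

fn peel<N: Clone, H>(
    user: impl Fn(&N, &dyn Fn(&N) -> H) -> H,
) -> impl Fn(&Row<N>, &dyn Fn(&Row<N>) -> H) -> H {
    move |row, orig| match row {
        // Real node: peel the layer, re-wrap before calling the continuation.
        Row::Node(n) => user(n, &|inner: &N| orig(&Row::Node(inner.clone()))),
        // Synthetic row: the user closure has nothing to say; pass through.
        Row::EntryRoot => orig(row),
    }
}

fn main() {
    let lifted = peel(|n: &u32, orig: &dyn Fn(&u32) -> u32| orig(n) + 1);
    let base = |r: &Row<u32>| if let Row::Node(n) = r { *n } else { 0 };
    assert_eq!(lifted(&Row::Node(41), &base), 42); // peeled, user closure ran
    assert_eq!(lifted(&Row::EntryRoot, &base), 0); // routed past the user
}
```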

How the sugar trait forwards

A representative Stage2SugarsShared body — the unified surface that covers both Base shapes:

#![allow(unused)]
fn main() {
    fn wrap_init<W>(self, w: W) -> Self::With<ShapeLift<Shared,
        <<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>, H, R,
        <<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>, H, R>>
    where
        <Self::Base as Stage2Base>::Wrap: WrapShared,
        <<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>: Clone + Send + Sync + 'static,
        W: Fn(&UN, &dyn Fn(&UN) -> H) -> H + Send + Sync + 'static,
    {
        self.then_lift(<<Self::Base as Stage2Base>::Wrap as WrapShared>::build_wrap_init::<UN, H, R, _>(w))
    }
}

The body is one line. The surrounding where clauses repeat the projection chain so Rust’s solver can verify each junction; that’s where the verbosity sits. See the type-level deep dive for why the projection has to be spelled out symmetrically here.

What the user sees

Nothing of the above. From the call site:

seed_pipeline
    .lift()
    .wrap_init(|n: &N, orig| orig(n) + 1)   // typed at &N, not &SeedNode<N>
    .filter_edges(|n: &N| !is_excluded(n))
    .run_from_slice(&exec, &seeds, h0);

Wrap dispatch is invisible. The user picks a Base; the trait routes through the right impl; closures stay typed at the user’s N. Switching Base shape — say, building the same chain over a TreeishPipeline — costs no code at the sugar layer.

SeedNode<N> — the seed-rooted row type

#![allow(unused)]
fn main() {
/// Opaque row type in a seed-closed chain's treeish. Values are
/// either the synthetic `EntryRoot` row (seed fan-out) or a resolved
/// `Node(N)`. User code inspects via [`is_entry_root`](Self::is_entry_root),
/// [`as_node`](Self::as_node), [`into_node`](Self::into_node), and
/// [`map_node`](Self::map_node); the variants are sealed.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct SeedNode<N> {
    // Exposed `pub` (not `pub(crate)`) so the doc-hidden
    // `seed_node_internal` module can re-export it for
    // `hylic-pipeline`'s dispatch. User code should treat this field
    // as opaque and use `is_entry_root` / `as_node` / `map_node`.
    #[doc(hidden)]
    pub inner: SeedNodeInner<N>,
}

/// Library-internal variant carrier for `SeedNode<N>`. Exposed
/// `pub` only to make crate-external re-export through the
/// `seed_node_internal` doc-hidden module possible. User code
/// should never name this directly.
#[doc(hidden)]
#[derive(Clone, PartialEq, Eq, Hash)]
pub enum SeedNodeInner<N> {
    EntryRoot,
    Node(N),
}
}

SeedNode<N> is the chain’s input node type once SeedLift has fired in a seed-rooted Stage2Pipeline. Two inhabitants:

  • Node(N) — a real grown node from the user’s seed graph.
  • EntryRoot — the synthetic forest root above the entry seeds.

Variants are sealed; pattern-matching is not exposed to user code. Inspection is through accessor methods:

method                               returns
sn.is_entry_root()                   bool
sn.as_node()                         Option<&N>
sn.into_node()                       Option<N>
sn.map_node(f: FnOnce(&N) -> M)      SeedNode<M> — Node mapped, EntryRoot preserved

Inside Stage-2 sugar bodies, SeedNode<N> never appears; user closures type at &N and the row is peeled (or routed past) by Wrap dispatch.
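For intuition, the accessor semantics can be mirrored by a plain, unsealed enum. This is a standalone sketch of the contract, not the library's sealed definition:

```rust
// Open stand-in for SeedNode<N>: same accessor surface, no sealing.
#[derive(Clone, Debug, PartialEq)]
enum SeedNode<N> {
    EntryRoot,
    Node(N),
}

impl<N> SeedNode<N> {
    fn is_entry_root(&self) -> bool {
        matches!(self, SeedNode::EntryRoot)
    }
    fn as_node(&self) -> Option<&N> {
        match self { SeedNode::Node(n) => Some(n), SeedNode::EntryRoot => None }
    }
    fn into_node(self) -> Option<N> {
        match self { SeedNode::Node(n) => Some(n), SeedNode::EntryRoot => None }
    }
    // Node is mapped; EntryRoot is preserved unchanged.
    fn map_node<M>(&self, f: impl FnOnce(&N) -> M) -> SeedNode<M> {
        match self {
            SeedNode::Node(n) => SeedNode::Node(f(n)),
            SeedNode::EntryRoot => SeedNode::EntryRoot,
        }
    }
}

fn main() {
    let n = SeedNode::Node(21u32);
    assert!(!n.is_entry_root());
    assert_eq!(n.as_node(), Some(&21));
    assert_eq!(n.map_node(|x| x * 2), SeedNode::Node(42));
    assert_eq!(SeedNode::Node(7u8).into_node(), Some(7));
    assert_eq!(SeedNode::<u32>::EntryRoot.map_node(|x| x + 1), SeedNode::EntryRoot);
}
```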

Where the row surfaces

A lift whose output type mentions the chain’s N can carry SeedNode<N> to the chain tip. The explainer is the canonical case:

let raw: ExplainerResult<SeedNode<N>, H, R> = pipeline
    .lift()
    .explain()
    .run_from_slice(&exec, &seeds, h0);

ExplainerResult’s first type parameter is the per-node node type (the heap.node field), which on a seed-rooted chain is SeedNode<N>.

For walks over the trace (formatting, post-fact analysis), project to an N-typed view via SeedExplainerResult::from:

let sealed: SeedExplainerResult<N, H, R> = raw.into();
// sealed.entry_initial_heap, sealed.entry_working_heap, sealed.orig_result
//   — the EntryRoot row, promoted out of the tree as fields.
// sealed.roots: Vec<ExplainerResult<N, H, R>>
//   — per-seed subtrees, every node now plain N.

The conversion is total: every node below the EntryRoot row is unwrapped, and SeedNode<N> no longer appears in the user-visible shape.

Tree shape

        EntryRoot
        ├── Node(grow(seed_0))
        ├── Node(grow(seed_1))
        └── …

SeedLift produces this tree at run time from the entry seeds and the user’s grow. Each Node(n) below has the user’s seeds_from_node + grow as its own children-producing function.

One-shot — OwnedPipeline

#![allow(unused)]
fn main() {
/// One-shot pipeline over the `Owned` domain. Not `Clone`; runs
/// via [`crate::source::PipelineExecOnce::run_from_node_once`],
/// which consumes `self`.
#[must_use]
pub struct OwnedPipeline<N, H, R>
where N: 'static, H: 'static, R: 'static,
{
    pub(crate) treeish: Edgy<N, N>,
    pub(crate) fold:    Fold<N, H, R>,
}
}

Two slots, like TreeishPipeline, but stored in the Owned domain — closures are Box<dyn Fn>, not Clone, not Send + Sync. Runs once and is consumed.

Constructor

#![allow(unused)]
fn main() {
let pipeline = OwnedPipeline::new(
    treeish,    // owned::Edgy<N, N>
    fold,       // owned::Fold<N, H, R>
);
}

Running

#![allow(unused)]
fn main() {
let r = pipeline.run_from_node_once(&FUSED, &root);
// pipeline is consumed.
}

run_from_node_once is the by-value method on PipelineExecOnce, the consuming counterpart of PipelineExec::run_from_node. Owned does not implement ShapeCapable, so Stage-2 sugars are not available — there is no chain to compose.

Worked example

#![allow(unused)]
fn main() {
    #[test]
    fn owned_pipeline_example() {
        use hylic_pipeline::{OwnedPipeline, PipelineExecOnce};
        use hylic::domain::owned as odom;

        let graph = odom::edgy::treeish(|n: &u64|
            if *n > 0 { vec![*n - 1] } else { vec![] });
        let fld = odom::fold(
            |n: &u64| *n,
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );

        let r: u64 = OwnedPipeline::new(graph, fld)
            .run_from_node_once(&odom::FUSED, &5u64);
        // 5+4+3+2+1+0 = 15.
        assert_eq!(r, 15);
    }
}

Writing a custom Lift

Most transformations compose out of the library catalogue and the sugar traits. A custom Lift impl is the right tool when the transformation carries cross-node state, requires per-variant dispatch on the input N, or is itself an execution strategy.

apply has one job: produce the three output slots and hand them to a continuation cont(grow', treeish', fold'). Everything about the impl follows from four decisions about how those slots relate to the input ones.

Four decisions

1. Output types.

type N2   = ???;
type MapH = ???;
type MapR = ???;

Mirror the input on axes the lift does not change. Where an axis changes, declare the new type. MapH and MapR are typically wrappers — the explainer, for instance, wraps MapR into ExplainerResult<N, H, R>.

2. Treatment of the input grow.

Three options: pass through unchanged, wrap with an N-conversion when N changes, or synthesise a fresh grow (the SeedLift case, where the chain head closes the grow axis). Most custom lifts pass through.

3. Treatment of the input treeish.

Pass through, filter, wrap in a visit-intercepting closure, or rebuild entirely.

4. Treatment of the input fold.

Clone it once per phase closure. Build a new Fold<D::N2, MapH, MapR> whose init, accumulate, and finalize delegate to the original through the captured clones.

Worked example

NoteVisits increments a shared counter every time init runs. No type changes; grow and treeish pass through; fold gets a wrapped init.

#![allow(unused)]
fn main() {
    #[test]
    fn custom_lift_note_visits() {
        use std::sync::{Arc, Mutex};
        use hylic::domain::Shared;
        use hylic::domain::shared::fold::{self as sfold, Fold};
        use hylic::graph::Treeish;
        use hylic::ops::Lift;

        /// Counts init calls into a shared counter.
        #[derive(Clone)]
        struct NoteVisits {
            counter: Arc<Mutex<u64>>,
        }

        impl<N, H, R> Lift<Shared, N, H, R> for NoteVisits
        where N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
        {
            type N2   = N;
            type MapH = H;
            type MapR = R;

            fn apply<T>(
                &self,
                treeish: Treeish<N>,
                fold:    Fold<N, H, R>,
                cont: impl FnOnce(
                    Treeish<N>,
                    Fold<N, H, R>,
                ) -> T,
            ) -> T
            {
                let fold_for_init = fold.clone();
                let fold_for_acc  = fold.clone();
                let fold_for_fin  = fold;
                let counter       = self.counter.clone();
                let wrapped: Fold<N, H, R> = sfold::fold(
                    move |n: &N| { *counter.lock().unwrap() += 1; fold_for_init.init(n) },
                    move |h: &mut H, r: &R| fold_for_acc.accumulate(h, r),
                    move |h: &H| fold_for_fin.finalize(h),
                );
                cont(treeish, wrapped)
            }
        }

        use hylic::ops::LiftBare;
        use hylic::prelude::{treeish, fold, FUSED};

        let counter = Arc::new(Mutex::new(0u64));
        let lift    = NoteVisits { counter: counter.clone() };
        let t       = treeish(|n: &u64| if *n > 0 { vec![*n - 1] } else { vec![] });
        let f       = fold(|n: &u64| *n, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h);
        let r: u64  = lift.run_on(&FUSED, t, f, &3u64);
        assert_eq!(r, 6);                            // 3 + 2 + 1 + 0
        assert_eq!(*counter.lock().unwrap(), 4);     // four init calls
    }
}

Apply via LiftBare::run_on or compose into a pipeline:

#![allow(unused)]
fn main() {
use hylic_pipeline::prelude::*;
let r = my_treeish_pipeline.lift()
    .then_lift(NoteVisits { counter })
    .run_from_node(&FUSED, &root);
}

When ShapeLift is sufficient

If the transformation is “rewrite one of the three slots” — which it is most of the time — one of the per-axis primitives or the universal ShapeLift does the job.

Primitive                                           When
Shared::phases_lift(mi, ma, mf)                     rewrite all three Fold phases
Shared::treeish_lift(mt)                            rewrite the graph
Shared::n_lift(lift_node, build_treeish, contra)    coordinated N-change across all slots
Shared::wrap_init_lift(w)                           wrap init
Shared::zipmap_lift(m)                              extend R
Shared::filter_edges_lift(p)                        drop edges from the graph

(Local mirrors are alongside.) NoteVisits above is expressible as Shared::wrap_init_lift(|n, orig| { counter.bump(); orig(n) }); the custom impl was shown to illustrate the trait structure.

Capability bounds

  • PureLift — Clone + 'static on the lift, and Clone on every output type. Required for the sequential Fused executor.
  • ShareableLift — adds Send + Sync + 'static on the lift and on every payload. Required for the parallel Funnel executor.

Both are blanket markers; the compiler selects them when the bounds are met. To run under Funnel, the lift struct itself must be Clone + Send + Sync + 'static, and every captured field must satisfy the same.

The Exec Pattern

Every executor in hylic has the same type-level structure. Two traits — Executor (computation) and ExecutorSpec (lifecycle) — and one wrapper — Exec<D, S> — compose into a uniform API where every executor, regardless of whether it needs resources, presents the same interface to the user.

The core idea

A Spec is a defunctionalized executor — pure data that fully describes a computation strategy. It is Copy: small, moveable, transformable. Calling .run() on a Spec refunctionalizes it: turns the data back into computation.

For executors that need resources (thread pools, arenas), .run() internally creates the resource, binds it, runs the fold, and destroys the resource. For executors that need nothing (Fused), the same .run() just runs.

#![allow(unused)]
fn main() {
use hylic::prelude::*;

// Sequential — no resource needed:
FUSED.run(&fold, &graph, &root);

// Parallel — resource created + destroyed internally:
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
}

The call shape is identical. Resource management is an internal concern of each executor.

The trait pair

#![allow(unused)]
fn main() {
/// Lifecycle: resource management + session creation.
/// Only Specs implement this. Sessions are the output.
pub trait ExecutorSpec: Copy {
    /// Borrowed resource attached to a session (for example, a
    /// thread-pool reference).
    type Resource<'r> where Self: 'r;
    /// The session type produced by `attach`.
    type Session<'s>: 's where Self: 's;
    /// Bind the spec to a borrowed resource, returning a session.
    fn attach(self, resource: Self::Resource<'_>) -> Self::Session<'_>;
    /// Construct an owned session scoped to `f` and run `f` against it.
    fn with_session<R>(&self, f: impl for<'s> FnOnce(&Self::Session<'s>) -> R) -> R;
}
}

ExecutorSpec is the lifecycle trait. Two GATs define each executor’s world:

  • Resource<'r>: what the executor needs. A thread pool (&'r Pool) for Funnel. () for Fused.
  • Session<'s>: the bound executor — Spec + resource, ready to run folds. Borrows the resource at lifetime 's.

Two methods connect them:

  • attach(self, resource): partial application. Consumes the Spec (it’s Copy — the caller keeps their copy), fixes the resource, produces a Session. This is the explicit path.
  • with_session(&self, f): the scoped path. Creates the resource internally, attaches, calls f with the session, cleans up. Copies the Spec (since attach consumes and with_session borrows &self).
#![allow(unused)]
fn main() {
/// Run a fold on a tree. Both Specs and Sessions implement this.
///
/// The fold is domain-specific (`D::Fold<H, R>`). The graph type G
/// is a trait-level parameter — each executor impl declares its own
/// bounds on G (e.g. Fused accepts any TreeOps, Funnel requires
/// Send+Sync). The compiler checks G at the call site.
pub trait Executor<N: 'static, R: 'static, D: Domain<N>, G: TreeOps<N> + 'static> {
    /// Run the given `fold` over the `graph` starting at `root` and
    /// return the fold's final result for the root.
    fn run<H: 'static>(&self, fold: &D::Fold<H, R>, graph: &G, root: &N) -> R;
}
}

Executor is the computation trait. Both Specs and Sessions implement it:

  • Spec::run: routes through self.with_session(|s| s.run(...)) — creates the resource, runs, destroys
  • Session::run: direct dispatch — the resource is already bound

Exec<D, S>

#![allow(unused)]
fn main() {
/// User-facing executor wrapper tying a domain `D` to an executor
/// strategy `S`. Both Specs and Sessions appear inside `Exec`.
pub struct Exec<D, S>(pub(crate) S, PhantomData<D>);

#[allow(missing_docs)] // trivial constructor/accessor pair
impl<D, S> Exec<D, S> {
    pub const fn new(inner: S) -> Self { Exec(inner, PhantomData) }
    pub fn into_inner(self) -> S { self.0 }
}

impl<D, S: Clone> Clone for Exec<D, S> {
    fn clone(&self) -> Self { Exec::new(self.0.clone()) }
}
impl<D, S: Copy> Copy for Exec<D, S> {}
}

The user-facing wrapper. D is the domain (determines fold/graph types via GATs). S is the strategy — a Spec or a Session. Exec is repr(transparent) over S and derives Copy when S is Copy.

Two method blocks:

#![allow(unused)]
fn main() {
    /// Run the inner strategy as an [`Executor`]. Inferred over `N`,
    /// `H`, `R`, and `G` from the arguments.
    pub fn run<N: 'static, H: 'static, R: 'static, G: TreeOps<N> + 'static>(
        &self, fold: &<D as Domain<N>>::Fold<H, R>, graph: &G, root: &N,
    ) -> R
    where D: Domain<N>, S: Executor<N, R, D, G>
    {
        Executor::<N, R, D, G>::run(&self.0, fold, graph, root)
    }
}

Block A (.run()): available on ALL Exec where S: Executor. This is the one way to execute. Works on Specs and Sessions alike.

#![allow(unused)]
fn main() {
impl<D, S: ExecutorSpec> Exec<D, S> {
    /// Construct a session bound to an owned resource, pass it to
    /// `f` by value (wrapping a borrowed session inside a fresh
    /// `Exec<D, &Session>`), and return `f`'s result. The session
    /// is dropped at the end of the scope.
    pub fn session<R>(
        &self,
        f: impl for<'s> FnOnce(Exec<D, &S::Session<'s>>) -> R,
    ) -> R {
        self.0.with_session(|session| f(Exec::new(session)))
    }

    /// Bind the spec to a borrowed resource, returning a session as
    /// an `Exec`.
    pub fn attach(self, resource: S::Resource<'_>) -> Exec<D, S::Session<'_>> {
        Exec::new(self.0.attach(resource))
    }
}
}

Block B (.session(), .attach()): available only on Spec-level Exec where S: ExecutorSpec. These are the resource-management surface:

  • .session(|s| ...): borrows the Spec, creates the resource in a scope, passes the session-level Exec to the closure. Multiple .run() calls inside share the resource.
  • .attach(resource): consumes the Spec (partial application), returns a session-level Exec bound to the resource. One expression — no intermediate bindings needed because Specs are Copy.

The three usage tiers

Executors can be used at three levels of resource control:

    Spec (data; Copy, transformable)
    ├── .run()               one-shot: creates + destroys the resource (simplest)
    ├── .session(|s| { … })  scoped multi-run: the resource lives for the closure (amortized)
    └── .attach(resource)    explicit binding: returns a session-level Exec (manual control)

One-shot — the common case. Each .run() manages resources internally:

#![allow(unused)]
fn main() {
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
}

Session scope — amortized multi-run. The resource (thread pool) is created once, shared across folds:

#![allow(unused)]
fn main() {
exec(funnel::Spec::default(8)).session(|s| {
    s.run(&fold1, &graph1, &root1);
    s.run(&fold2, &graph2, &root2);
});
}

Explicit attach — manual resource management. You provide the resource; the Spec binds to it:

#![allow(unused)]
fn main() {
funnel::Pool::with(8, |pool| {
    exec(funnel::Spec::default(8)).attach(pool).run(&fold, &graph, &root);
});
}

For zero-resource executors (Fused), all three tiers compile but .session() and .attach(()) are identity — the compiler optimizes them away.

The impl table

Every executor fills the same shape:

Type               Resource    Session       Executor::run
fused::Spec        ()          Self          direct recursion
funnel::Spec<P>    &Pool       Session<P>    routes through with_session
funnel::Session    —           —             direct dispatch (run_fold)

Sessions do NOT implement ExecutorSpec — they’re the output of attach, not a Spec themselves.

Domain constants

Fused is a zero-sized Spec exposed as a domain-bound const:

pub const FUSED: Exec<Shared, fused::Spec> = Exec::new(fused::Spec);

FUSED is Copy. .run() calls Executor::run on fused::Spec directly (it implements both traits). No resource, no session — the Spec IS the session.

Generic-over-executor code

The Executor trait is the single generic bound. The graph type G is a trait-level parameter — each executor impl declares its own bounds on G:

fn measure<G: TreeOps<NodeId> + 'static, S: Executor<NodeId, u64, Shared, G>>(
    exec: &Exec<Shared, S>, fold: &shared::Fold<NodeId, u64, u64>, graph: &G, root: &NodeId,
) -> u64 {
    exec.run(fold, graph, root)
}

This works for Exec<Shared, fused::Spec>, Exec<Shared, funnel::Spec<P>>, and Exec<Shared, funnel::Session<'_, P>> — all through the same bound, the same .run(), the same call site.

How a new executor fits in

Adding a new executor requires implementing two traits:

    MySpec (Copy) — Resource<'r> = …, Session<'s> = …
    ├── attach(self, resource)     explicit: partial application → MySession
    ├── with_session               scoped: create resource, attach, scope, cleanup → MySession
    └── impl Executor for MySpec   one-shot: self.with_session(|s| s.run(…))
    MySession — impl Executor: direct dispatch

  1. Define MySpec (Copy) and MySession<'s>
  2. Implement ExecutorSpec on MySpec — define Resource, Session, attach, with_session
  3. Implement Executor on MySession — the direct dispatch
  4. Implement Executor on MySpec — route through with_session
  5. Users: shared::exec(MySpec { ... }).run(...) (or the local/ owned equivalent) — same shape as every other executor

The framework provides .run(), .session(), .attach() for free via the Exec<D, S> wrapper.

Domain integration

The domain system lets executors accept folds without knowing their concrete storage. The Domain trait maps a marker type to a concrete Fold type via a GAT. The graph type is a separate concern — the Executor trait accepts any G: TreeOps<N>, with per-executor bounds checked at the call site.

The Domain trait

The Domain trait provides a single associated type — the fold:

#![allow(unused)]
fn main() {
pub trait Domain<N: 'static>: 'static {
    type Fold<H: 'static, R: 'static>: FoldOps<N, H, R>;
    type Graph<E: 'static> where E: 'static;
    type Grow<Seed: 'static, NOut: 'static>;

    /// Construct a fold from three closures. Uniform Send+Sync
    /// bound; each domain sheds Send+Sync at storage time if it
    /// doesn't need it.
    fn make_fold<H: 'static, R: 'static>(
        init: impl Fn(&N) -> H + Send + Sync + 'static,
        acc:  impl Fn(&mut H, &R) + Send + Sync + 'static,
        fin:  impl Fn(&H) -> R + Send + Sync + 'static,
    ) -> Self::Fold<H, R>;

    /// Construct a grow closure from a Fn. Uniform Send+Sync bound.
    fn make_grow<Seed: 'static, NOut: 'static>(
        f: impl Fn(&Seed) -> NOut + Send + Sync + 'static,
    ) -> Self::Grow<Seed, NOut>;

    /// Invoke a stored grow closure.
    fn invoke_grow<Seed: 'static, NOut: 'static>(
        g: &Self::Grow<Seed, NOut>,
        s: &Seed,
    ) -> NOut;

    /// Construct a graph (Edgy) closure. Uniform Send+Sync bound.
    fn make_graph<E: 'static>(
        visit: impl Fn(&N, &mut dyn FnMut(&E)) + Send + Sync + 'static,
    ) -> Self::Graph<E>;
}
}

Each domain marker (Shared, Local, Owned) implements this trait with a different closure boxing strategy:

Domain    Fold<H, R> storage             Send+Sync
Shared    Arc<dyn Fn + Send + Sync>      yes
Local     Rc<dyn Fn>                     no
Owned     Box<dyn Fn>                    no

Graph types are domain-independent. Treeish<N> and Edgy<N, E> in hylic::graph are always Arc-based (they need Clone for graph composition). Any type implementing TreeOps<N> can serve as a graph, including user-defined structs with no boxing at all.
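As a concrete illustration of a boxing-free graph, here is a standalone sketch: TreeOps below is a local stand-in for the real trait (whose exact signature may differ), and Adjacency is a hypothetical user struct.

```rust
// A visit-based graph interface: push children to a callback, one at a time.
trait TreeOps<N> {
    fn visit(&self, node: &N, each_child: &mut dyn FnMut(&N));
}

// Adjacency stored directly in a struct: no Arc, no Box, no closures.
struct Adjacency {
    children: Vec<Vec<usize>>,
}

impl TreeOps<usize> for Adjacency {
    fn visit(&self, node: &usize, each_child: &mut dyn FnMut(&usize)) {
        for c in &self.children[*node] {
            each_child(c);
        }
    }
}

// A generic consumer: any TreeOps impl works, boxed or not.
fn count_nodes(g: &impl TreeOps<usize>, root: &usize) -> usize {
    let mut n = 1;
    g.visit(root, &mut |c| n += count_nodes(g, c));
    n
}

fn main() {
    // 0 → {1, 2}, 1 → {3}; 2 and 3 are leaves.
    let g = Adjacency { children: vec![vec![1, 2], vec![3], vec![], vec![]] };
    assert_eq!(count_nodes(&g, &0), 4);
}
```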

The Executor trait

The executor trait has four type parameters: N (node), R (result), D (domain), and G (graph):

#![allow(unused)]
fn main() {
/// Run a fold on a tree. Both Specs and Sessions implement this.
///
/// The fold is domain-specific (`D::Fold<H, R>`). The graph type G
/// is a trait-level parameter — each executor impl declares its own
/// bounds on G (e.g. Fused accepts any TreeOps, Funnel requires
/// Send+Sync). The compiler checks G at the call site.
pub trait Executor<N: 'static, R: 'static, D: Domain<N>, G: TreeOps<N> + 'static> {
    /// Run the given `fold` over the `graph` starting at `root` and
    /// return the fold's final result for the root.
    fn run<H: 'static>(&self, fold: &D::Fold<H, R>, graph: &G, root: &N) -> R;
}
}

The domain D determines the fold type (D::Fold<H, R>). The graph type G is constrained per executor implementation. This separation means the fold’s boxing strategy and the graph’s storage are independent choices.

The type resolution at a call site proceeds as follows:

    Exec<Shared, fused::Spec>  (= shared::FUSED)
    ├── D = Shared                               fixed by the const's type
    ├── G: TreeOps<N>                            inferred from the graph argument
    ├── N, H, R                                  inferred from the fold + root arguments
    └── D::Fold<H, R> = shared::Fold<N, H, R>    GAT resolution

The compiler checks that G satisfies the executor’s requirements. For Fused, any TreeOps<N> suffices. For Funnel, G must also be Send + Sync (the graph reference is shared across a scoped pool). If the graph type does not satisfy the executor’s bounds, the call site produces a compile error.

Why D is on the executor, not the fold

Fold<N, H, R> carries no domain parameter — the domain lives on the executor: Exec<D, S>. This resolves a type inference problem: GATs are not injective (D::Fold<H, R> does not uniquely identify D), so the compiler cannot infer D from a fold argument alone. With D fixed by the executor constant or exec() call, everything resolves statically.
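The non-injectivity is easy to reproduce in miniature. A standalone sketch with toy Domain, Shared, and Local types, not the library's definitions:

```rust
trait Domain: 'static {
    type Fold<H: 'static>;
}

#[allow(dead_code)]
struct Shared;
#[allow(dead_code)]
struct Local;

// Two different domains, one identical projection.
impl Domain for Shared { type Fold<H: 'static> = Vec<H>; }
impl Domain for Local  { type Fold<H: 'static> = Vec<H>; }

// D appears only under a projection, so it cannot be inferred from `fold`.
fn run<D: Domain, H: 'static>(_fold: &D::Fold<H>) -> &'static str {
    std::any::type_name::<D>()
}

fn main() {
    let fold: Vec<u32> = Vec::new();
    // `run(&fold)` is ambiguous: both Shared and Local project to Vec<u32>.
    // D has to be fixed elsewhere; here, by naming it explicitly.
    assert!(run::<Shared, u32>(&fold).ends_with("Shared"));
}
```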

Domain compatibility

          Shared    Local    Owned
Fused     yes       yes      yes
Funnel    yes       —        —

Fused supports all domains because it borrows both fold and graph on a single thread. Funnel requires N: Clone + Send and R: Send on the fold’s types, which the Shared domain satisfies. The graph must additionally be Send + Sync.

The FoldOps trait

Executors do not call fold methods through the concrete domain type. They operate through the FoldOps<N, H, R> trait, which all domain Fold types implement:

    FoldOps<N, H, R>  (init / accumulate / finalize)
    ├── shared::Fold  (Arc)
    ├── local::Fold   (Rc)
    └── owned::Fold   (Box)

The executor’s recursion engine takes &impl FoldOps<N, H, R> — fully monomorphized for the concrete fold type, with no runtime dispatch beyond the closure’s own vtable.
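The engine shape can be sketched standalone. FoldOps here is a minimal local mirror (the real trait's exact signatures may differ), and the recursion is the Fused-style sequential case:

```rust
// The per-node bracket, factored as init / accumulate / finalize.
trait FoldOps<N, H, R> {
    fn init(&self, n: &N) -> H;
    fn accumulate(&self, h: &mut H, r: &R);
    fn finalize(&self, h: &H) -> R;
}

// A concrete fold with no boxed closures at all.
struct Sum;
impl FoldOps<u64, u64, u64> for Sum {
    fn init(&self, n: &u64) -> u64 { *n }
    fn accumulate(&self, h: &mut u64, r: &u64) { *h += r }
    fn finalize(&self, h: &u64) -> u64 { *h }
}

// The engine takes `&impl FoldOps`: monomorphized per concrete fold type.
fn run<N, H, R>(
    fold: &impl FoldOps<N, H, R>,
    children: &impl Fn(&N) -> Vec<N>,
    root: &N,
) -> R {
    let mut h = fold.init(root);
    for c in children(root) {
        let r = run(fold, children, &c);
        fold.accumulate(&mut h, &r);
    }
    fold.finalize(&h)
}

fn main() {
    let kids = |n: &u64| if *n > 0 { vec![*n - 1] } else { vec![] };
    assert_eq!(run(&Sum, &kids, &5), 15); // 5+4+3+2+1+0
}
```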

Policy Traits: Zero-Cost Configuration

Funnel’s three behavioral axes (queue, accumulation, wake) are each a trait with an associated Spec type. The FunnelPolicy bundle combines them into one type parameter. This pattern — Spec → Store/State → Handle, resolved at compile time — is the general recipe for adding zero-overhead configuration axes to any executor.

This page describes the pattern generically. For the concrete implementations (Chase-Lev deques, streaming sweep, etc.), see the Funnel section.

Specs as data

Every Spec in hylic is Copy — a small value type that fully describes configuration. This follows from the defunctionalization principle: Specs are data, not behavior. Combining Specs via axis transformations produces new Specs. Attaching a resource to a Spec produces a Session. Running a Spec creates the resource internally.

The policy sub-specs (PerWorkerSpec, OnFinalizeSpec, EveryKSpec, etc.) are all Copy + Default + Send + Sync. Most are ZSTs. The funnel Spec<P> composes them and is itself Copy (~40 bytes of usizes and ZSTs).

The Spec → Store → Handle pattern

Each axis follows the same three-phase lifecycle:

    Trait (e.g. WorkStealing) — associated types Spec, Store, Handle
    Spec: construction config (Copy + Default)
      └─ create_store() → Store<N, H, R>: per-fold resources (Send + Sync)
           └─ handle() → Handle<'a, N, H, R>: per-worker view (borrows the Store)

Three associated types capture the lifecycle:

  1. Spec — construction-time configuration. Carried in the executor’s Spec<P>. Small, Copy, Default.
  2. Store — per-fold resources created from the Spec. Owned by the fold’s stack frame. Send+Sync (shared across workers).
  3. Handle — per-worker view that borrows from the Store. Has the actual push/pop/steal methods.

All three use GATs to carry the task’s generic parameters without boxing.

Concrete example: WorkStealing

#![allow(unused)]
fn main() {
/// A work-stealing strategy. Associates typed Store and Handle via GATs.
pub trait WorkStealing: 'static {
    type Spec: Copy + Default + Send + Sync;
    type Store<N: Send + 'static, H: 'static, R: Send + 'static>: Send + Sync;
    type Handle<'a, N: Send + 'static, H: 'static, R: Send + 'static>: TaskOps<N, H, R>
    where Self: 'a;

    fn create_store<N: Send + 'static, H: 'static, R: Send + 'static>(
        spec: &Self::Spec, n_workers: usize,
    ) -> Self::Store<N, H, R>;

    fn reset_store<N: Send + 'static, H: 'static, R: Send + 'static>(
        store: &mut Self::Store<N, H, R>,
    );

    fn handle<'a, N: Send + 'static, H: 'static, R: Send + 'static>(
        store: &'a Self::Store<N, H, R>, worker_idx: usize,
    ) -> Self::Handle<'a, N, H, R>;
}
}

Two implementations:

          PerWorker                                     Shared
Spec      PerWorkerSpec { deque_capacity } (Copy)       SharedSpec (ZST, Copy)
Store     Vec<WorkerDeque> + AtomicU64 bitmask          StealQueue
Handle    refs to own deque + all deques + bitmask      ref to queue

Bundling: FunnelPolicy

Three independent axes combined into one type parameter:

#![allow(unused)]
fn main() {
/// Bundles queue topology, accumulation strategy, and wake policy.
/// One type parameter on the executor replaces three.
pub trait FunnelPolicy: 'static {
    type Queue: WorkStealing;
    type Accumulate: AccumulateStrategy;
    type Wake: WakeStrategy;
}
}
#![allow(unused)]
fn main() {
/// Generic policy: any combination of axes. Named presets are type aliases over this.
pub struct Policy<
    Q: WorkStealing = queue::PerWorker,
    A: AccumulateStrategy = accumulate::OnFinalize,
    W: WakeStrategy = wake::EveryPush,
>(PhantomData<(Q, A, W)>);

impl<Q: WorkStealing, A: AccumulateStrategy, W: WakeStrategy> FunnelPolicy for Policy<Q, A, W> {
    type Queue = Q;
    type Accumulate = A;
    type Wake = W;
}
}

Policy<Q, A, W> is the generic implementor. Named presets are type aliases. The funnel Spec<P> carries each axis’s sub-spec:

#![allow(unused)]
fn main() {
pub struct Spec<P: FunnelPolicy = policy::Default> {
    /// Pool size for `.run()` and `.session()`. Not consulted when
    /// attaching to an explicit pool via `.attach()`.
    pub default_pool_size: usize,
    pub queue: <P::Queue as WorkStealing>::Spec,
    pub accumulate: <P::Accumulate as AccumulateStrategy>::Spec,
    pub wake: <P::Wake as WakeStrategy>::Spec,
}
}

Named presets as transformations

Every named preset is a transformation of Spec::default(n). Default values live in ONE place — the default() constructor. Presets compose axis builders on top:

// WideLight = default + Shared queue + OnArrival accumulation
fn for_wide_light(n: usize) -> Spec<WideLight> {
    Spec::default(n)
        .with_queue::<Shared>(SharedSpec)
        .with_accumulate::<OnArrival>(OnArrivalSpec)
}

The axis builders (with_queue, with_accumulate, with_wake) are typestate transformations — they change the Policy type parameter, producing a new Spec type.
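The typestate move is small enough to show in isolation. A standalone sketch with hypothetical marker types (PerWorker, SharedQ), not the library's:

```rust
use std::marker::PhantomData;

// Type-level queue markers; never constructed as values.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct PerWorker;
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct SharedQ;

#[derive(Clone, Copy)]
struct Spec<Q> {
    pool: usize,
    _q: PhantomData<Q>,
}

impl<Q> Spec<Q> {
    // Typestate transformation: same data, a DIFFERENT policy type parameter.
    fn with_queue<Q2>(self) -> Spec<Q2> {
        Spec { pool: self.pool, _q: PhantomData }
    }
}

fn main() {
    let s: Spec<PerWorker> = Spec { pool: 8, _q: PhantomData };
    let s2: Spec<SharedQ> = s.with_queue::<SharedQ>();
    assert_eq!(s2.pool, 8); // the value survives the type change
}
```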

How monomorphization flows

The type parameter propagates from Spec to every call site:

    Spec<WideLight>  =  Spec<Policy<Shared, OnArrival, EveryPush>>
    └── .run()  routes through with_session
        ├── P::Queue::create_store()      = SharedStore
        │     └── P::Queue::handle()      = SharedHandle
        │           └── handle.push(task) = SharedHandle::push (direct call)
        │                 └── P::Wake::should_notify() = EveryPush::should_notify (returns true)
        └── P::Accumulate::deliver()      = OnArrival::deliver (direct call)

From Spec<WideLight> to the innermost push/deliver/notify — every call is resolved at compile time. No vtable, no trait object, no indirect call.

The const generic optimization

Wake strategies like EveryK<K> use a const generic for the notification interval. The modulus count % K compiles to a bitmask when K is a power of 2 — the compiler sees the constant and optimizes.
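A minimal sketch of the idea (EveryK here is a standalone stand-in, not the library's type):

```rust
// Notification interval as a const generic: K is baked into the type.
struct EveryK<const K: u64>;

impl<const K: u64> EveryK<K> {
    fn should_notify(count: u64) -> bool {
        // For K a power of two the compiler lowers this to `count & (K - 1) == 0`.
        count % K == 0
    }
}

fn main() {
    assert!(EveryK::<8>::should_notify(16));
    assert!(!EveryK::<8>::should_notify(13));
}
```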

Applying the pattern to new axes

To add a fourth axis (e.g., steal ordering):

  1. Define a trait: pub trait StealOrder: 'static { type Spec: Copy + Default + Send + Sync; ... }
  2. Add implementations: struct Fifo;, struct Lifo;
  3. Add to FunnelPolicy: type Steal: StealOrder;
  4. Update Policy<Q, A, W, St> and named presets
  5. Thread through Spec<P> and run_fold

The call chain monomorphizes automatically. No runtime cost for the new axis.
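
Steps 1 through 4 can be sketched self-contained (Fifo, Lifo, NoSpec, and the stubbed () axes are illustrative, not hylic's items); giving the new parameter a default keeps every existing preset alias compiling unchanged:

```rust
use std::marker::PhantomData;

// Step 1: the new axis trait, mirroring the shape of the existing axes.
pub trait StealOrder: 'static {
    type Spec: Copy + Default + Send + Sync;
}

// Step 2: implementations (illustrative names).
pub struct Fifo;
pub struct Lifo;

#[derive(Copy, Clone, Default)]
pub struct NoSpec;

impl StealOrder for Fifo { type Spec = NoSpec; }
impl StealOrder for Lifo { type Spec = NoSpec; }

// Steps 3-4: Policy grows a defaulted parameter, so a three-argument
// `Policy<..>` still means Fifo and old aliases keep compiling.
// The other axes are stubbed as `()` here.
pub struct Policy<Q, A, W, St: StealOrder = Fifo>(PhantomData<(Q, A, W, St)>);

// Existing preset: unchanged spelling, now implicitly Fifo.
pub type Robust = Policy<(), (), ()>;
// New preset opting into the new axis.
pub type LifoRobust = Policy<(), (), (), Lifo>;

fn main() {
    // The axis choice is purely compile-time: both aliases are zero-sized.
    assert_eq!(std::mem::size_of::<Robust>(), 0);
    assert_eq!(std::mem::size_of::<LifoRobust>(), 0);
}
```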

Funnel: Parallel Fused Hylomorphism

The funnel executor parallelizes a fused hylomorphism — an unfold (tree traversal) composed with a fold (bottom-up accumulation) where the intermediate tree is never materialized. Children are discovered one at a time through a push-based callback, processed concurrently across worker threads, and their results flow back to the parent through defunctionalized continuations.

What a fused hylomorphism is

A hylomorphism composes an unfold (anamorphism) with a fold (catamorphism). The unfold generates a tree structure from a seed; the fold consumes it bottom-up. When fused, the two interleave: each node is produced, its children recursively processed, and their results accumulated — without materializing the tree.

In hylic terms: a Treeish<N> (the coalgebra) exposes visit(&node, |child| ...) and a Fold<N, H, R> (the algebra, factored as init/accumulate/finalize) provides the per-node bracket. The executor calls visit to discover children, recursively processes each, accumulates their R results into the parent’s H heap, and finalizes. The intermediate tree is never materialized as a data structure.
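
As a concrete sketch, here is the fused recursion in sequential miniature (Node, visit, and hylo_sum are illustrative stand-ins, not hylic's API): the unfold exists only as a callback, yet the fold sees every node and no tree is ever built.

```rust
// Seed type: a node "is" its value plus the rule generating its children.
#[derive(Clone)]
struct Node { value: u64, fanout: u64 }

// Coalgebra: push each child to the callback, one at a time.
fn visit(n: &Node, f: &mut impl FnMut(Node)) {
    for i in 0..n.fanout {
        f(Node { value: n.value + i + 1, fanout: n.fanout / 2 });
    }
}

// Fused hylomorphism: unfold and fold interleaved, no tree materialized.
// init = node.value, accumulate = +, finalize = identity.
fn hylo_sum(n: &Node) -> u64 {
    let mut heap = n.value;       // init
    visit(n, &mut |child| {
        heap += hylo_sum(&child); // recurse, then accumulate
    });
    heap                          // finalize
}

fn main() {
    let root = Node { value: 1, fanout: 2 };
    assert_eq!(hylo_sum(&root), 13); // 1 + (2+3) + (3+4)
}
```

The funnel's job is to run the recursive calls inside that closure on other threads, which is exactly why the implicit stack frame must become explicit continuation data.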

The funnel parallelizes this: children beyond the first are pushed to a work-stealing queue. Worker threads steal and process subtrees concurrently. Results flow back through continuations to the parent’s accumulator. The challenge is coordinating the fold — detecting when all children are done, accumulating their results, and cascading upward — without locks, without allocation on the critical path.

Design values

Four properties define the funnel’s design:

(Diagram: iterator-based traversal → fully parallel → fused unfold+fold → zero-alloc hot path.)

  1. Iterator-based traversal. The graph exposes visit(&node, |child| ...). Children arrive one at a time. There is no children(&node) -> Vec<N>.

  2. Fully parallel. Each child beyond the first is pushed to a work-stealing queue. Workers steal and process subtrees concurrently. The first child is walked inline — zero queue overhead on the DFS spine.

  3. Fused unfold+fold. Results accumulate into the parent as they arrive (streaming) or in bulk by the last thread (finalize). The tree is never materialized.

  4. Zero allocation on the hot path. Tasks are enum variants stored inline in deque slots. Multi-child accumulators are arena-allocated. Single-child nodes carry their heap inside the continuation. No Box<dyn FnOnce>, no Arc per task.

Where funnel sits

|Executor|Parallelism|Unfold/fold fusion|Task repr|Allocation|
|--------|-----------|------------------|---------|----------|
|Fused|none|fully fused|stack frames|zero|
|Funnel|CPS + work-stealing|fully fused|FunnelTask enum|arenas|

Fused is the sequential baseline — zero overhead, callback-based recursion on a single thread. Funnel preserves the fused property while adding parallelism through CPS (continuation-passing style) and work-stealing queues. The fold/graph are unchanged between the two — only the executor differs.

Both use the same Exec<D, S> type-level pattern. Funnel’s policy system is an instance of the generic Spec → Store → Handle pattern for zero-cost executor configuration.

Module map

The funnel’s code is organized into four clusters:

(Diagram: module map.)

  • cps/: walk.rs (walk_cps, fire_cont), cont.rs (FunnelTask, Cont, ChainNode), chain.rs (FoldChain, ticket system), worker.rs (WorkerCtx, worker_loop)
  • dispatch/: mod.rs (run_fold), pool.rs (Pool, Job, PoolState, dispatch()), view.rs (FoldView)
  • policy/: mod.rs (FunnelPolicy, Policy<Q,A,W>), queue/ (WorkStealing, TaskOps; per_worker.rs, shared.rs), accumulate/ (AccumulateStrategy), wake/ (WakeStrategy)
  • infra/: arena.rs (Arena<T>), cont_arena.rs (ContArena<T>), deque.rs (WorkerDeque<T>), segmented_slab.rs (SegmentedSlab<T>), eventcount.rs (EventCount)
  • top-level mod.rs: Spec<P>, Session<P>

Three behavioral axes

The funnel is parameterized along three independent axes, all resolved at compile time through the FunnelPolicy trait:

#![allow(unused)]
fn main() {
/// Bundles queue topology, accumulation strategy, and wake policy.
/// One type parameter on the executor replaces three.
pub trait FunnelPolicy: 'static {
    type Queue: WorkStealing;
    type Accumulate: AccumulateStrategy;
    type Wake: WakeStrategy;
}
}

Each axis is a trait with its own Spec, Store/State, and implementations. The Policy<Q, A, W> struct bundles any combination:

#![allow(unused)]
fn main() {
/// Generic policy: any combination of axes. Named presets are type aliases over this.
pub struct Policy<
    Q: WorkStealing = queue::PerWorker,
    A: AccumulateStrategy = accumulate::OnFinalize,
    W: WakeStrategy = wake::EveryPush,
>(PhantomData<(Q, A, W)>);

impl<Q: WorkStealing, A: AccumulateStrategy, W: WakeStrategy> FunnelPolicy for Policy<Q, A, W> {
    type Queue = Q;
    type Accumulate = A;
    type Wake = W;
}
}

Named presets are type aliases:

#![allow(unused)]
fn main() {
// ── Named presets (type aliases) ─────────────────
//
// Each is a concrete instantiation of Policy<Q, A, W>.
// `Default` aliases the robust all-rounder; other names describe
// the workload shape they're tuned for.

/// PerWorker + OnFinalize + EveryPush. The robust all-rounder.
pub type Robust = Policy;

/// The default policy. Alias for Robust.
pub type Default = Robust;

/// Same axes as Default. Distinguished by Spec configuration (larger arenas).
pub type GraphHeavy = Robust;

/// Shared + OnArrival + EveryPush. Wide trees (bf=20+).
pub type WideLight = Policy<queue::Shared, accumulate::OnArrival>;

/// PerWorker + OnFinalize + OncePerBatch. Overhead-sensitive (noop-like).
pub type LowOverhead = Policy<queue::PerWorker, accumulate::OnFinalize, wake::OncePerBatch>;

/// PerWorker + OnArrival + EveryPush. Streaming sweep with per-worker deques.
pub type PerWorkerArrival = Policy<queue::PerWorker, accumulate::OnArrival>;

/// Shared + OnFinalize + EveryPush.
pub type SharedDefault = Policy<queue::Shared>;

/// PerWorker + OnFinalize + EveryK<4>. Balanced wakeups for heavy workloads.
pub type HighThroughput = Policy<queue::PerWorker, accumulate::OnFinalize, wake::EveryK<4>>;

/// Shared + OnArrival + OncePerBatch.
pub type StreamingWide = Policy<queue::Shared, accumulate::OnArrival, wake::OncePerBatch>;

/// PerWorker + OnFinalize + EveryK<2>. For deep narrow trees (bf=2).
pub type DeepNarrow = Policy<queue::PerWorker, accumulate::OnFinalize, wake::EveryK<2>>;
}

(Diagram: FunnelPolicy's three axes and their implementations. Queue: PerWorker (Chase-Lev + bitmask), Shared (StealQueue). Accumulate: OnArrival (streaming sweep), OnFinalize (bulk sweep). Wake: EveryPush, OncePerBatch, EveryK<K>.)



See Policies for the full decision guide and benchmark-informed recommendations.

Reading order

|Page|What you learn|
|----|--------------|
|[CPS walk](cps_walk.md)|The downward pass: how nodes are processed and tasks created|
|[Continuations](continuations.md)|`FunnelTask`, `Cont`, `ChainNode`, `RootCell` — the CPS data types|
|[Cascade](cascade.md)|`fire_cont`: the trampolined upward pass|
|[Ticket system](ticket_system.md)|Packed `AtomicU64` for exactly-one-finalizer detection|
|[Pool and dispatch](pool_dispatch.md)|Thread pool, `Job` struct, the `dispatch()` CPS lifecycle|
|[Queue strategies](queue_strategies.md)|PerWorker (Chase-Lev + bitmask) vs Shared (StealQueue)|
|[Accumulation](accumulation.md)|OnArrival (streaming sweep) vs OnFinalize (bulk)|
|[Policies](policies.md)|`FunnelPolicy` GAT, three axes, named presets, decision guide|
|[Infrastructure](infrastructure.md)|Arena, ContArena, WorkerDeque, EventCount|
|[Testing](testing.md)|Correctness, stress, interleaving proof|

Policies: Configuration and Presets

The funnel’s behavior is fully determined by three compile-time axes bundled into FunnelPolicy:

#![allow(unused)]
fn main() {
/// Bundles queue topology, accumulation strategy, and wake policy.
/// One type parameter on the executor replaces three.
pub trait FunnelPolicy: 'static {
    type Queue: WorkStealing;
    type Accumulate: AccumulateStrategy;
    type Wake: WakeStrategy;
}
}

The Spec

#![allow(unused)]
fn main() {
pub struct Spec<P: FunnelPolicy = policy::Default> {
    /// Pool size for `.run()` and `.session()`. Not consulted when
    /// attaching to an explicit pool via `.attach()`.
    pub default_pool_size: usize,
    pub queue: <P::Queue as WorkStealing>::Spec,
    pub accumulate: <P::Accumulate as AccumulateStrategy>::Spec,
    pub wake: <P::Wake as WakeStrategy>::Spec,
}
}

Each axis contributes its Spec type. default_pool_size sets the thread count for one-shot execution. Arenas grow lazily via segmented allocation — no capacity configuration.

Named presets

#![allow(unused)]
fn main() {
// ── Named presets (type aliases) ─────────────────
//
// Each is a concrete instantiation of Policy<Q, A, W>.
// `Default` aliases the robust all-rounder; other names describe
// the workload shape they're tuned for.

/// PerWorker + OnFinalize + EveryPush. The robust all-rounder.
pub type Robust = Policy;

/// The default policy. Alias for Robust.
pub type Default = Robust;

/// Same axes as Default. Distinguished by Spec configuration (larger arenas).
pub type GraphHeavy = Robust;

/// Shared + OnArrival + EveryPush. Wide trees (bf=20+).
pub type WideLight = Policy<queue::Shared, accumulate::OnArrival>;

/// PerWorker + OnFinalize + OncePerBatch. Overhead-sensitive (noop-like).
pub type LowOverhead = Policy<queue::PerWorker, accumulate::OnFinalize, wake::OncePerBatch>;

/// PerWorker + OnArrival + EveryPush. Streaming sweep with per-worker deques.
pub type PerWorkerArrival = Policy<queue::PerWorker, accumulate::OnArrival>;

/// Shared + OnFinalize + EveryPush.
pub type SharedDefault = Policy<queue::Shared>;

/// PerWorker + OnFinalize + EveryK<4>. Balanced wakeups for heavy workloads.
pub type HighThroughput = Policy<queue::PerWorker, accumulate::OnFinalize, wake::EveryK<4>>;

/// Shared + OnArrival + OncePerBatch.
pub type StreamingWide = Policy<queue::Shared, accumulate::OnArrival, wake::OncePerBatch>;

/// PerWorker + OnFinalize + EveryK<2>. For deep narrow trees (bf=2).
pub type DeepNarrow = Policy<queue::PerWorker, accumulate::OnFinalize, wake::EveryK<2>>;
}

Ten names map to eight distinct monomorphizations:

|Preset|Queue|Accumulate|Wake|Use case|
|------|-----|----------|----|--------|
|Default / Robust|PerWorker|OnFinalize|EveryPush|All-rounder|
|GraphHeavy|(same as Robust)|||Large trees (alias for Robust)|
|WideLight|Shared|OnArrival|EveryPush|bf > 10|
|LowOverhead|PerWorker|OnFinalize|OncePerBatch|Noop-sensitive|
|PerWorkerArrival|PerWorker|OnArrival|EveryPush|Streaming + deques|
|SharedDefault|Shared|OnFinalize|EveryPush|Shared baseline|
|HighThroughput|PerWorker|OnFinalize|EveryK<4>|Heavy balanced|
|StreamingWide|Shared|OnArrival|OncePerBatch|Known +11% fold-hv|
|DeepNarrow|PerWorker|OnFinalize|EveryK<2>|bf=2 chains|

Decision guide

Start from the tree shape, then refine by work distribution:

(Diagram: decision flow.)

  • Wide tree (bf > 10) → WideLight
  • Deep narrow tree (bf = 2) → DeepNarrow
  • Medium shape, balanced or finalize-heavy work → Default (Robust)
  • Medium shape, init-heavy or graph-heavy work → PerWorkerArrival

When unsure, use Spec::default(n) — the Robust preset has zero regressions on any benchmarked workload.

The three usage tiers

#![allow(unused)]
fn main() {
use hylic::prelude::*;

// One-shot: creates pool, runs, joins
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);

// Session scope: pool lives for the closure, multiple folds share it
exec(funnel::Spec::default(8)).session(|s| {
    s.run(&fold1, &graph1, &root1);
    s.run(&fold2, &graph2, &root2);
});

// Explicit attach: manual pool management
funnel::Pool::with(8, |pool| {
    exec(funnel::Spec::default(8)).attach(pool).run(&fold, &graph, &root);
});
}

See The Exec pattern for the type-level design behind these tiers.

Wake strategies

#![allow(unused)]
fn main() {
/// Wake strategy: when to notify idle workers of pushed tasks.
///
/// `State` is per-worker mutable state (embedded in WorkerCtx as
/// `Cell<State>`). Created once via `init_state`, reset per visit batch.
pub trait WakeStrategy: 'static {
    type Spec: Copy + Default + Send + Sync;
    type State: Copy;

    fn init_state(spec: &Self::Spec) -> Self::State;

    /// Called after each successful push.
    /// Returns true if the caller should wake an idle worker.
    fn should_notify(state: &mut Self::State) -> bool;

    /// Called before each graph.visit batch.
    fn reset(state: &mut Self::State);
}
}
|Strategy|Behavior|Per-worker state|
|--------|--------|----------------|
|EveryPush|Notify on every push()|(none)|
|OncePerBatch|First push per graph.visit only|bool|
|EveryK<K>|Every K-th push (K is const generic)|u32 counter|

EveryK<K> uses a const generic — the modulus compiles to a bitmask when K is a power of 2.
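
A sketch of how an EveryK-style strategy can satisfy this trait (a standalone reconstruction from the trait printed above, not hylic's source; the exact counting details are assumptions):

```rust
// The trait as printed above, reproduced so the sketch is self-contained.
pub trait WakeStrategy: 'static {
    type Spec: Copy + Default + Send + Sync;
    type State: Copy;
    fn init_state(spec: &Self::Spec) -> Self::State;
    fn should_notify(state: &mut Self::State) -> bool;
    fn reset(state: &mut Self::State);
}

/// Notify on every K-th push. K is a const generic, so `% K` is a
/// compile-time constant (a bitmask when K is a power of two).
pub struct EveryK<const K: u32>;

impl<const K: u32> WakeStrategy for EveryK<K> {
    type Spec = ();
    type State = u32; // pushes since the last reset

    fn init_state(_: &()) -> u32 { 0 }

    fn should_notify(count: &mut u32) -> bool {
        *count += 1;
        *count % K == 0
    }

    fn reset(count: &mut u32) { *count = 0; }
}

fn main() {
    let mut s = <EveryK<4> as WakeStrategy>::init_state(&());
    let fires: Vec<bool> = (0..8).map(|_| EveryK::<4>::should_notify(&mut s)).collect();
    // Fires on the 4th and 8th push.
    assert_eq!(fires, [false, false, false, true, false, false, false, true]);
}
```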

Zero-cost monomorphization

The entire call chain is generic over P: FunnelPolicy. The compiler generates separate code per policy — WorkerCtx, worker_loop, walk_cps, fire_cont, push_task are all monomorphized. No vtable, no trait object, no indirect call. Each push and try_acquire is a direct, inlinable function call.

CPS Walk: The Downward Pass

walk_cps is the core of the funnel executor. It processes one node at a time: initializes the fold heap, iterates children through the graph’s push-based visitor, and branches on the child count. It is a void function — results flow through continuations, not return values. This is what makes cross-thread result delivery possible without blocking.

The algorithm

#![allow(unused)]
fn main() {
pub(crate) fn walk_cps<N, H, R, F, G, P: FunnelPolicy>(
    wctx: &WorkerCtx<N, H, R, F, G, P>,
    mut node: N,
    mut cont: Cont<H, R>,
) where
    F: FoldOps<N, H, R> + 'static,
    G: TreeOps<N> + 'static,
    N: Clone + Send + 'static,
    H: 'static,
    R: Send + 'static,
{
    let ctx = wctx.ctx;
    loop {
        let fold = ctx.fold_ref();
        let graph = ctx.graph_ref();
        let chain_arena = ctx.chain_arena();
        let cont_arena = ctx.cont_arena();
        let heap = fold.init(&node);

        let mut child_count = 0u32;
        let mut first_child: Option<N> = None;
        let mut chain_idx: Option<super::super::infra::arena::ArenaIdx> = None;
        let mut heap_opt = Some(heap);
        let mut cont_opt = Some(cont);

        wctx.reset_wake();
        graph.visit(&node, &mut |child: &N| {
            child_count += 1;
            if child_count == 1 {
                first_child = Some(child.clone());
            } else {
                if child_count == 2 {
                    let cn = ChainNode::new(heap_opt.take().unwrap(), cont_opt.take().unwrap());
                    let idx = chain_arena.alloc(cn);
                    // SAFETY: idx was just returned by chain_arena.alloc
                    // (or a prior iteration within this visit closure) and
                    // the arena lives for the pool duration.
                    let node_ref = unsafe { chain_arena.get(idx) };
                    node_ref.chain.append_slot();
                    chain_idx = Some(idx);
                }
                let idx = chain_idx.unwrap();
                // SAFETY: idx was just returned by chain_arena.alloc
                // (or a prior iteration within this visit closure) and
                // the arena lives for the pool duration.
                let node_ref = unsafe { chain_arena.get(idx) };
                let slot = node_ref.chain.append_slot();
                wctx.push_task(FunnelTask::Walk {
                    child: child.clone(),
                    cont: Cont::Slot { node: idx, slot },
                });
            }
        });

        match child_count {
            0 => {
                let heap = heap_opt.take().unwrap();
                let cont = cont_opt.take().unwrap();
                let result = fold.finalize(&heap);
                fire_cont::<N, H, R, F, G, P>(ctx, cont, result);
                return;
            }
            1 => {
                let child = first_child.unwrap();
                let heap = heap_opt.take().unwrap();
                let parent_cont = cont_opt.take().unwrap();
                let parent_idx = cont_arena.alloc(parent_cont);
                node = child;
                cont = Cont::Direct { heap, parent_idx };
            }
            _ => {
                let idx = chain_idx.unwrap();
                // SAFETY: idx came from chain_arena.alloc above.
                let cn = unsafe { chain_arena.get(idx) };
                let fold = ctx.fold_ref();
                let set_total_result = P::Accumulate::set_total(&cn.chain, fold);
                if let Some(finalized) = set_total_result {
                    let parent = cn.take_parent_cont();
                    fire_cont::<N, H, R, F, G, P>(ctx, parent, finalized);
                    return;
                }
                let child = first_child.unwrap();
                node = child;
                cont = Cont::Slot { node: idx, slot: SlotRef(0) };
            }
        }
    }
}
}

The function takes (wctx, node, cont):

  • wctx: per-worker context (queue handle + wake state)
  • node: the graph node to process
  • cont: what to do with this node’s result

It loops (trampolined for the inline child case), processing one node per iteration.

Child-count branching

After graph.visit returns, the child count determines the control flow:

(Diagram: control flow after fold.init(&node) and graph.visit(callback).)

  • 0 children (leaf): finalize → fire_cont(cont, result)
  • 1 child: Cont::Direct { heap }; loop continues with the child
  • 2+ children: ChainNode + FoldChain, set_total; loop continues with child₀
  • during visit, children 1..K: push_task(Walk { child, Slot { i } })

Leaf (0 children): Finalize the heap and call fire_cont with the original continuation. This is the base case — the upward cascade begins here.

Single child (1): No ChainNode needed. The heap moves into a Cont::Direct, the parent continuation is stored in the ContArena, and the loop continues with the child. Zero queue interaction, zero atomic operations.

Multi-child (2+): A ChainNode is allocated in the arena (lazily, on child 2 — not child 1). Children 1..K are pushed as FunnelTask::Walk to the queue. Then set_total records the child count in the ticket system. The loop continues with child 0 (inline walk).

First-child inlining

Child 0 is ALWAYS walked inline — a continuation of the current thread’s DFS spine, with zero queue overhead. Siblings are pushed to the queue for workers to steal. This gives every active thread a guaranteed DFS path from its entry point to a leaf:

(Diagram: root's child c0 is walked inline by Thread 0 while c1 and c2 are stolen by Threads 1 and 2; within c0, child c00 is again inline and c01 is stolen.)

Inline walks cost nothing at the queue; pushed siblings are picked up by stealing workers. Thread 0 walks root → c0 → c00 → … → leaf without touching the queue at any level. This is structurally equivalent to Cilk’s continuation-stealing, inverted: we push sibling tasks (child stealing) instead of stealing the parent’s continuation.

Three compounding effects make this critical:

  • Zero-queue spine. For depth D, one thread processes D nodes with no push/pop overhead (~20-50ns saved per level).
  • Cache warmth. ChainNodes allocated on the way down are in L1 cache on the way up via fire_cont.
  • Reduced contention. One fewer task per level competing for deque access.

Defunctionalization

Tasks are data, not closures:

#![allow(unused)]
fn main() {
pub enum FunnelTask<N, H, R> {
    Walk { child: N, cont: Cont<H, R> },
}
}

FunnelTask::Walk pairs a child node with its continuation — plain data stored inline in deque slots. No Box<dyn FnOnce>, no closure capture, no vtable. The execute_task function is the apply:

#![allow(unused)]
fn main() {
pub(crate) fn execute_task<N, H, R, F, G, P: FunnelPolicy>(
    wctx: &WorkerCtx<N, H, R, F, G, P>,
    task: FunnelTask<N, H, R>,
) where
    F: FoldOps<N, H, R> + 'static,
    G: TreeOps<N> + 'static,
    N: Clone + Send + 'static,
    H: 'static,
    R: Send + 'static,
{
    match task {
        FunnelTask::Walk { child, cont } => walk_cps(wctx, child, cont),
    }
}
}

This is the Reynolds/Danvy defunctionalization transformation applied to parallel work items.
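
The transformation in miniature (illustrative code, not hylic's): each closure that would have been heap-allocated becomes an enum variant carrying its captured data, and a single apply function replaces indirect calls.

```rust
// Before defunctionalization, each work item would be a Box<dyn FnOnce() -> u32>:
// one heap allocation plus an indirect call per task.
//
// After: the captures become plain enum fields, and apply() is a direct match.
enum Task {
    Square(u32),       // was: move || n * n
    AddBoth(u32, u32), // was: move || a + b
}

fn apply(task: Task) -> u32 {
    match task {
        Task::Square(n) => n * n,
        Task::AddBoth(a, b) => a + b,
    }
}

fn main() {
    // Tasks are plain data: storable inline in a deque, sendable by value.
    let queue = vec![Task::Square(3), Task::AddBoth(4, 5)];
    let results: Vec<u32> = queue.into_iter().map(apply).collect();
    assert_eq!(results, [9, 9]);
}
```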

Streaming submission

Children are pushed to the queue during graph.visit, not after. Workers can steal siblings while the parent is still discovering more children. append_slot is called per child inside the callback; set_total is called after graph.visit returns. Between these two events, workers may deliver results to already-appended slots. The ticket system handles this race.

Task submission and wake

#![allow(unused)]
fn main() {
    pub(crate) fn push_task(&self, task: FunnelTask<N, H, R>) {
        if let Some(overflow) = self.handle.push(task) {
            execute_task(self, overflow);
            return;
        }
        let mut state = self.wake_state.get();
        if P::Wake::should_notify(&mut state) {
            self.view().notify_idle();
        }
        self.wake_state.set(state);
    }
}

push goes through the policy’s queue handle. If the queue is full, the task is executed inline (Cilk overflow protocol). Otherwise, the wake strategy decides whether to notify a parked worker.

Worked example

A sum fold over tree R(A(D,E), B, C) where D, E, B, C are leaves. Thread 0 is the caller; threads 1-2 are workers.

(Diagram: execution timeline.)

  • Thread 0: walk_cps(R, Root): init heap_R; visit finds A = child₀, pushes Walk{B, Slot{R,1}} and Walk{C, Slot{R,2}}; set_total(3). Walks A inline via Slot{R,0}: D = child₀, pushes E; set_total(2). Walks D: leaf → fire_cont(Slot{A,0}) → deliver, not last → returns to the help loop.
  • Thread 1: steals Walk{B, Slot{R,1}}: leaf → fire_cont → deliver, not last. Steals Walk{E, Slot{A,1}}: leaf → fire_cont → LAST for A → sweep → fire_cont(Slot{R,0}) → not last for R.
  • Thread 2: steals Walk{C, Slot{R,2}}: leaf → fire_cont → LAST for R → sweep → fire_cont(Root) → fold_done = true.

  • Thread 0 walks the left spine (R→A→D) inline
  • Thread 1 steals B, then E — becomes finalizer for A, cascades A’s result to R
  • Thread 2 steals C — becomes finalizer for R, fires Cont::Root
  • The fold completes when any thread fires Cont::Root


Continuations: CPS Data Types

Three types carry the fold’s state through the CPS pipeline: FunnelTask (the parallelism boundary), Cont (the continuation), and ChainNode (the multi-child accumulator). A fourth, RootCell, is the terminal sink for the final result. Together they replace implicit stack frames with explicit data that can be created on one thread and consumed on another.

FunnelTask

#![allow(unused)]
fn main() {
pub enum FunnelTask<N, H, R> {
    Walk { child: N, cont: Cont<H, R> },
}
}

The unit of parallelism. Stored inline in deque slots (PerWorker) or queue segments (Shared). No heap allocation per task — the enum variant IS the data. N must be Clone + Send (cloned during graph.visit, sent across threads). R must be Send (results are moved across threads via destructive slot reads). H has no bounds — it travels inside Cont::Direct.

Cont

#![allow(unused)]
fn main() {
pub enum Cont<H, R> {
    /// Raw pointer to stack-local RootCell in run_fold.
    /// SAFETY: The scoped pool guarantees all workers complete before
    /// run_fold returns — the RootCell outlives every Cont::Root.
    Root(*const RootCell<R>),
    Direct { heap: H, parent_idx: ContIdx },
    Slot { node: ArenaIdx, slot: SlotRef },
}
}

The defunctionalized continuation. Tells fire_cont what to do with a result:

Cont::Root

Terminal. Created once per fold. When fire_cont receives it, the fold is complete: the result is written to the RootCell and fold_done is signaled. Size: 8 bytes (one raw pointer).

The RootCell lives on run_fold’s stack — no heap allocation. The raw pointer is safe because the scoped pool guarantees all workers complete before run_fold returns.

Cont::Direct

Single-child fast path. The heap value travels WITH the continuation — no ChainNode, no FoldChain, no atomics. parent_idx is a ContIdx(u32) into the ContArena. When fire_cont receives it: accumulate the result into the heap, finalize, take the parent continuation from the arena, continue the loop.

Size: sizeof(H) + 4 bytes.

Cont::Slot

Multi-child delivery. Two u32 indices: node (arena index to the ChainNode) and slot (which position in the FoldChain). When fire_cont receives it: deliver the result to the slot, check the ticket. If this was the last event, sweep/finalize the chain and take the parent continuation. If not, return — another thread will finalize.

Size: 8 bytes (two u32, regardless of H or R).

ChainNode

#![allow(unused)]
fn main() {
pub(crate) struct ChainNode<H, R> {
    pub(crate) chain: FoldChain<H, R>,
    parent_cont: UnsafeCell<Option<Cont<H, R>>>,
}
}

Arena-allocated. Created lazily on child 2 (never for single-child nodes). Contains:

  • chain: the FoldChain — slot cells, heap, ticket state
  • parent_cont: the continuation of the creating node, moved out exactly once by the finalizing thread via take_parent_cont()

Continuation graph

For a tree with root R, child A (2 children: C, D), and child B (1 child: E, leaf):

(Diagram: Cont::Root holds a *const RootCell (stack-local). ChainNode(R) holds a FoldChain whose parent is Root. Slot{R,0} and Slot{R,1} deliver into ChainNode(R). Direct{heap_B} has parent Slot{R,1} and accumulates, finalizes, and cascades into it. Leaves C and D finalize and fire into Slot{R,0} and Slot{R,1}; leaf E finalizes and fires into Direct{heap_B}.)

Leaf C finalizes, delivers to Slot{R,0}. Leaf E finalizes, fires Direct for B (accumulates + finalizes), delivers to Slot{R,1}. Whichever delivery is last (ticket) sweeps ChainNode(R) and fires Root.

Data ownership

Each CPS type lives in a specific memory region:

(Diagram: memory regions. The run_fold stack holds the WalkCtx (refs to fold, graph, arenas, view), the Job (fn ptr + data ptr), and the RootCell (result + done flag). The bump-allocated arenas hold ChainNodes (FoldChain + parent_cont) and the ContArena of parent Conts for bf=1 chains. FunnelTask::Walk lives inline in deque or steal-queue slots and references a ChainNode by ArenaIdx (Slot), the ContArena by ContIdx (Direct), or the RootCell by raw pointer (Root).)

Deque stores tasks inline. Arena indices are u32 (Copy, no refcount). The CPS pipeline has zero heap allocations on the critical path — RootCell is stack-local, arenas grow lazily via segmented allocation, and tasks are stored inline in deque slots.

Size summary

| Type | Size | Notes |
|---|---|---|
| Cont::Root | 8 bytes | raw pointer to stack-local RootCell |
| Cont::Direct | sizeof(H) + 4 | heap value + ContIdx(u32) |
| Cont::Slot | 8 bytes | ArenaIdx(u32) + SlotRef(u32) |
| FunnelTask::Walk | sizeof(N) + sizeof(Cont) + tag | stored inline in deque |
| ChainNode | sizeof(FoldChain) + sizeof(Option<Cont>) | arena-allocated |
| RootCell | sizeof(Option<R>) + 1 | stack-local in run_fold |

Cascade: The Trampolined Upward Pass

When a child completes, fire_cont delivers its result and cascades upward through the continuation chain. It is a loop, not recursion — zero stack growth. One thread can cascade from leaf to root without touching the queue.

This is the dual of walk_cps: walk descends, fire_cont ascends. Together they form a single DFS round-trip — down and back up — on one thread for the inline spine, handing off across threads at multi-child boundaries.

The function

#![allow(unused)]
fn main() {
pub(crate) fn fire_cont<N, H, R, F, G, P: FunnelPolicy>(
    ctx: &WalkCtx<'_, F, G, H, R, P>,
    mut cont: Cont<H, R>,
    mut result: R,
) where
    F: FoldOps<N, H, R> + 'static,
    G: TreeOps<N> + 'static,
    N: Clone + Send + 'static,
    H: 'static,
    R: Send + 'static,
{
    loop {
        match cont {
            Cont::Root(cell_ptr) => {
                // SAFETY: cell_ptr points to stack-local RootCell in run_fold.
                // The scoped pool guarantees this thread finishes before run_fold returns.
                let cell = unsafe { &*cell_ptr };
                cell.set(result);
                let view = ctx.view_ref();
                view.fold_done.store(true, Ordering::Release);
                view.event().notify_all();
                return;
            }
            Cont::Direct { mut heap, parent_idx } => {
                let fold = ctx.fold_ref();
                fold.accumulate(&mut heap, &result);
                result = fold.finalize(&heap);
                // SAFETY: parent_idx came from cont_arena.alloc in
                // walk_cps; Direct conts are consumed exactly once
                // (the child's fire_cont path), so this take is the
                // sole reader.
                cont = unsafe { ctx.cont_arena().take(parent_idx) };
            }
            Cont::Slot { node: node_idx, slot } => {
                let arena = ctx.chain_arena();
                // SAFETY: node_idx came from chain_arena.alloc in
                // walk_cps and outlives every Cont::Slot that holds it
                // (arena is freed by run_fold after all workers join).
                let node = unsafe { arena.get(node_idx) };
                let fold = ctx.fold_ref();
                let delivered = P::Accumulate::deliver(&node.chain, slot, result, fold);
                match delivered {
                    Some(finalized) => {
                        cont = node.take_parent_cont();
                        result = finalized;
                    }
                    None => return,
                }
            }
        }
    }
}
}

Three continuation variants, three behaviors, one loop.

Per-variant behavior

Cont::Root — terminal

The fold is complete. Write the result to the RootCell, set fold_done, notify all parked workers. Cost: ~5ns. One cell write, one atomic store, one futex wake.

Cont::Direct — single-child fast path

Accumulate the child result into the heap. Finalize. Take the parent continuation from the ContArena. Continue the loop. No atomics, no synchronization — pure sequential speed. The heap was moved INTO the continuation by walk_cps. The entire single-child spine collapses into loop iterations at ~2ns overhead + user work each.

Cont::Slot — multi-child delivery

Deliver the result to the FoldChain slot via P::Accumulate::deliver(). The ticket system determines if this thread is the last to arrive. If yes: sweep or finalize the chain, take the parent continuation, continue cascading. If no: return — this thread’s cascade is done.

This is where parallelism meets sequentiality. Multiple threads race to deliver. Exactly one wins. The winner cascades; the losers return to the help loop to steal more work.

The cascade as a round-trip

The walk and cascade form a symmetric pair — down through task creation, up through result delivery:

(Diagram: a leaf finalizes its heap and fires result R upward; two Cont::Direct levels collapse at ~2ns + work each; a Cont::Slot level does a delivery plus ticket check; Cont::Root writes the cell and sets fold_done.)

A leaf fires upward. Direct levels collapse at sequential speed. The Slot level requires atomic delivery and a ticket check. Root stores the result and signals completion.

Parallel interleaving

The cascade runs WITHOUT touching the queue. One thread goes up while other threads simultaneously go down:

(Diagram: Thread 0 walks R→A→D down the DFS spine, cascades D→A via Direct, delivers to R, is not last, and returns. Thread 1 steals B, delivers B to R (not last); steals E, delivers E to A, is LAST for A, cascades A→R, is LAST for R, and fires Root.)

Thread 0’s delivery to R and Thread 1’s delivery to R are concurrent atomic operations on R’s FoldChain. The ticket determines which thread cascades past R.

Cache warmth

The same thread that walks DOWN the DFS spine walks back UP via fire_cont. ChainNodes allocated on the way down are in L1 cache on the way up — no cross-core transfer. This is a structural consequence of first-child inlining: the allocating thread is the reading thread.

Compile-time accumulation dispatch

In the Cont::Slot arm, the accumulation strategy is resolved at compile time:

let delivered = P::Accumulate::deliver(&node.chain, slot, result, fold);

P::Accumulate is an associated type on FunnelPolicy, resolved via monomorphization. No runtime branch — the compiler inlines deliver_and_sweep (OnArrival) or deliver_and_finalize (OnFinalize) directly. See Accumulation for the two strategies.

Ticket System: Packed AtomicU64

Each multi-child node has K+1 concurrent events: K child deliveries and 1 set_total. A single AtomicU64 determines which event is last — the finalizer. No separate counters, no missed completions.

The problem

graph.visit produces children through a push callback. Workers may complete and deliver results before the iterator finishes. Two things happen concurrently:

  1. Child deliveries: each result written to a slot. Multiple threads, unpredictable order.
  2. set_total: after graph.visit returns, the child count is recorded.

Exactly one thread must detect that ALL events have occurred and perform the finalization.

Why two separate variables fail (Dekker race)

With delivered: AtomicU32 and total_known: AtomicBool separately:

  • Thread A (delivering child K): delivered.fetch_add(1), then checks total_known.load() → sees false (stale)
  • Thread B (set_total): total_known.store(true), then checks delivered.load() → sees K-1 (stale)

Neither sees the complete state. Both exit. The fold hangs.

The solution: packed state

#![allow(unused)]
fn main() {
pub struct FoldChain<H, R> {
    heap: UnsafeCell<H>,
    first: SlotBuf<R>,
    appended: AtomicU32,
    state: AtomicU64,       // low32: events_done, high32: total (0=unknown)
    sweep: AtomicU32,       // bit31: sweeping gate, bits 0-30: cursor position
    done: AtomicBool,       // finalized
}
}

state: AtomicU64 packs both counters into one word:

| Bits 63–32 | Bits 31–0 |
|---|---|
| total (0 = unknown) | events_done |
#![allow(unused)]
fn main() {
fn pack_total(total: u32) -> u64 { (total as u64) << 32 }
fn unpack(state: u64) -> (u32, u32) { (state as u32, (state >> 32) as u32) }
}

How it works

Each event does a single fetch_add on state:

#![allow(unused)]
fn main() {
    pub fn deliver_and_sweep<N>(&self, slot: SlotRef, result: R, fold: &impl FoldOps<N, H, R>) -> Option<R> {
        let cell = self.slot_at(slot.0);
        // SAFETY: each slot is delivered exactly once (slots come from
        // append_slot + SlotRef, never cloned). The Release store on
        // `filled` publishes the write to whoever sweeps via Acquire.
        unsafe { (*cell.result.get()).write(result); }
        cell.filled.store(true, Ordering::Release);

        let prev = self.state.fetch_add(1, Ordering::Relaxed);
        let (done_before, total) = unpack(prev);
        let am_finalizer = total > 0 && done_before + 1 >= total;
        // … sweep contiguous filled slots and finalize when am_finalizer;
        // return None when another thread will finalize (elided)
    }
}

Delivery adds 1 to the low 32 bits (events_done). set_total adds pack_total(K) to the high 32 bits.

fetch_add is a read-modify-write (RMW) — atomically reads the previous state, modifies it, and writes back. No window for interleaving. Each event gets a unique snapshot of the previous state.

The finalizer condition on prev:

  • Delivery: prev_total > 0 && prev_done + 1 >= prev_total
  • set_total: prev_done >= total

State transition examples

Some deliveries land before set_total (a delivery is the finalizer):

done:0 total:0 → deliver(0) → done:1 total:0 → deliver(1) → done:2 total:0 → set_total(3) → done:2 total:3 → deliver(2) → done:3 total:3 (FINALIZER)

All deliveries before set_total (set_total is finalizer):

done:0 total:0 → deliver(0) → done:1 total:0 → deliver(1) → done:2 total:0 → deliver(2) → done:3 total:0 → set_total(3) → done:3 total:3 (FINALIZER)

In every interleaving, exactly one event transitions the state to {done ≥ total, total > 0}.

Why Relaxed ordering is correct

The ticket determines WHO finalizes, not data visibility. Slot data visibility is guaranteed by per-slot filled.store(true, Release) / filled.load(Acquire) pairs. The sweep reads slots only after confirming filled == true.

RMW linearization is ordering-independent: fetch_add operations on a single atomic word are totally ordered by the CPU’s coherence protocol regardless of the memory ordering specified. Relaxed does not weaken atomicity — it only relaxes ordering with respect to OTHER memory locations.

The exactly-one-finalizer proof

Claim: For K children, exactly one of K+1 events identifies itself as the finalizer.

Proof: The K+1 fetch_add operations form a total order (RMW linearization). Each returns a unique prev. Before set_total fires, total = 0 in all prev values — no delivery can satisfy the condition. After set_total fires, each subsequent delivery increments done. The delivery that pushes done to total is the first (and only) to satisfy the condition. If all deliveries fire before set_total, then set_total sees prev_done ≥ total and is the finalizer. QED.

Pool and Dispatch

The pool provides persistent threads. The executor provides the work. A thin Job struct (two words) bridges them. The dispatch function encapsulates the full lifecycle: publish → body → seal → latch.

PoolState

#![allow(unused)]
fn main() {
pub(crate) struct PoolState {
    pub shutdown: AtomicBool,
    pub job_ptr: AtomicPtr<()>,
    pub wake: EventCount,
    /// Threads currently between loading job_ptr and returning from
    /// the job call. dispatch waits for this to reach 0 before returning.
    pub in_job: AtomicU32,
    pub n_threads: usize,
    pub dispatch_lock: Mutex<()>,
}
}

  • job_ptr: points to a stack-local Job during dispatch, null otherwise
  • in_job: threads currently in the job-handling region (the latch counter)
  • wake: futex-based EventCount for thread parking
  • dispatch_lock: serializes folds (one fold at a time per pool)

Job

#![allow(unused)]
fn main() {
#[repr(C)]
pub(crate) struct Job {
    pub call: unsafe fn(*const (), usize),
    pub data: *const (),
}
}

call is a monomorphized worker_entry::<N, H, R, F, G, P> — a concrete function pointer, not vtable dispatch. data points to a stack-local FoldState. Two words, no allocation.

The dispatch lifecycle

#![allow(unused)]
fn main() {
// CPS lifecycle: publish → body → seal → latch.
// The body just does fold work and returns a result.
// All pool-thread synchronization is dispatch's responsibility.
pub(crate) fn dispatch<R>(state: &PoolState, job: &Job, body: impl FnOnce() -> R) -> R {
    let _guard = state.dispatch_lock.lock().unwrap();

    // Publish: make job visible to workers
    state.job_ptr.store(job as *const Job as *mut (), Ordering::Release);
    state.wake.notify_all();

    // Body: caller participates in the fold
    let result = body();

    // Seal: prevent new workers from entering
    state.job_ptr.store(std::ptr::null_mut(), Ordering::Release);

    // Latch: wait for all workers to leave the job region in pool_thread.
    // in_job brackets the entire load-job_ptr → call-worker_entry → return
    // sequence, so in_job==0 guarantees no thread holds a reference to
    // the stack-local Job or FoldState.
    let mut spins = 0u32;
    while state.in_job.load(Ordering::Acquire) > 0 {
        spins += 1;
        if spins > 5_000_000 {
            panic!("dispatch latch: {} threads still in job region",
                state.in_job.load(Ordering::Relaxed));
        }
        std::hint::spin_loop();
    }

    result
}
(Diagram: dispatch lifecycle: Publish → Body → Seal → Latch → Return result.)



  1. Publish: store the Job pointer, wake all threads
  2. Body: the caller participates in the fold (walk root, help loop)
  3. Seal: clear job_ptr — no new threads can enter
  4. Latch: spin until in_job == 0 — all threads have left the job-handling region
  5. Return: the Job and FoldState on the stack are safe to drop

The body knows nothing about pool lifecycle — it’s pure fold logic.
All synchronization is dispatch’s responsibility.

Pool thread

````rust
fn pool_thread(state: &PoolState, thread_idx: usize) {
    let mut last_epoch = 0u32;
    loop {
        loop {
            let token = state.wake.prepare();
            if state.shutdown.load(Ordering::Acquire) { return; }
            if token.epoch() > last_epoch {
                last_epoch = token.epoch();
                break;
            }
            state.wake.wait(token);
        }
        // in_job MUST be incremented BEFORE loading job_ptr.
        // This closes the TOCTOU gap: the body cannot return (destroying
        // the Job/FoldState on the stack) while any thread is between
        // loading job_ptr and finishing the job call.
        state.in_job.fetch_add(1, Ordering::Acquire);
        let ptr = state.job_ptr.load(Ordering::Acquire);
        if !ptr.is_null() {
            // SAFETY: non-null ptr was published by dispatch, which
            // holds the dispatch_lock and does not seal (nor drop the
            // Job) until `in_job` returns to zero. We incremented
            // `in_job` before loading ptr, so the seal cannot have
            // happened yet — the referent is live.
            let job = unsafe { &*(ptr as *const Job) };
            // SAFETY: `job.call` is the worker_entry function for the
            // matching FoldState; `job.data` is a `*const FoldState<…>`
            // cast erased at the Job boundary. The caller (dispatch)
            // guarantees the FoldState is live for the duration of
            // this call via the same `in_job` latch.
            unsafe { (job.call)(job.data, thread_idx); }
        }
        state.in_job.fetch_sub(1, Ordering::Release);
    }
}
````

The critical ordering: in_job increment happens before job_ptr load. This closes the TOCTOU gap:

(Diagram: worker loop: epoch wait (futex park) → in_job++ (Acquire) → job_ptr.load(Acquire) → if non-null, (job.call)(data, idx) → in_job-- (Release) → back to wait.)

Without this ordering, a thread could load job_ptr (valid), then the body returns and destroys the stack, then the thread dereferences the destroyed pointer → SIGSEGV. The in_job counter makes the thread visible to the latch before it touches the pointer.

run_fold

#![allow(unused)]
fn main() {
pub(crate) fn run_fold<N, H, R, F, G, P: FunnelPolicy>(
    fold: &F, graph: &G, root: &N,
    pool_state: &PoolState, spec: &Spec<P>,
) -> R
where
    F: FoldOps<N, H, R> + 'static, G: TreeOps<N> + 'static,
    N: Clone + Send + 'static, H: 'static, R: Send + 'static,
{
    let store = P::Queue::create_store(&spec.queue, pool_state.n_threads);
    let chain_arena = Arena::<ChainNode<H, R>>::new();
    let cont_arena = ContArena::<Cont<H, R>>::new();
    let root_cell = RootCell::new();

    let view = FoldView {
        pool_state,
        fold_done: AtomicBool::new(false),
        idle_count: AtomicU32::new(0),
        n_workers: pool_state.n_threads,
    };

    let ctx = WalkCtx {
        fold,
        graph,
        view: &view,
        chain_arena: &chain_arena,
        cont_arena: &cont_arena,
        _policy: std::marker::PhantomData,
    };

    let state = FoldState::<N, H, R, F, G, P> {
        ctx: &ctx,
        store: &store,
    };
    // The ONE unsafe boundary: erase typed FoldState to *const () for the Job.
    let job = Job {
        call: worker_entry::<N, H, R, F, G, P>,
        data: &state as *const FoldState<N, H, R, F, G, P> as *const (),
    };

    dispatch(pool_state, &job, || {
        let caller_idx = view.n_workers;
        let handle = P::Queue::handle(&store, caller_idx);
        let wake_state = Cell::new(P::Wake::init_state(&spec.wake));
        let wctx = WorkerCtx::<N, H, R, F, G, P> { ctx: &ctx, handle, wake_state };

        walk_cps(&wctx, root.clone(), Cont::Root(&root_cell as *const RootCell<R>));

        let mut spins = 0u64;
        while !root_cell.is_done() {
            if let Some(task) = wctx.handle.try_acquire() {
                execute_task(&wctx, task);
                spins = 0;
            } else {
                spins += 1;
                if spins > 10_000_000 {
                    panic!("run_fold hung: root_done={}", root_cell.is_done());
                }
                std::hint::spin_loop();
            }
        }

        root_cell.take()
    })
}
}

Creates per-fold state (store, arenas, root cell, view, context), erases it to *const () for the Job, and delegates to dispatch. The body walks the root and help-loops until root_cell.is_done().

Scoped pool

Pool::with(n, |pool| ...) uses std::thread::scope — threads are joined when the closure returns. No leaked threads, no lifetime footguns.

The pool is the executor’s Resource (defined by the Resource GAT on ExecutorSpec). It can be provided explicitly via .attach(), or created internally by .run() / .session():

#![allow(unused)]
fn main() {
use hylic::prelude::*;

// One-shot: pool created + destroyed per fold
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);

// Session: pool shared across folds
exec(funnel::Spec::default(8)).session(|s| {
    s.run(&fold, &graph, &root);
    s.run(&fold, &graph, &root);
});

// Explicit attach: manual pool, multiple policies
funnel::Pool::with(8, |pool| {
    let pw = exec(funnel::Spec::default(8)).attach(pool);
    let sh = exec(funnel::Spec::for_wide_light(8)).attach(pool);
    pw.run(&fold, &graph, &root);
    sh.run(&fold, &graph, &root);
});
}

Thread spawn/join cost is paid once per pool scope. Each .run() allocates working memory fresh — only threads are shared.

Queue Strategies

Two work-stealing strategies, selected at compile time via the WorkStealing trait:

#![allow(unused)]
fn main() {
/// A work-stealing strategy. Associates typed Store and Handle via GATs.
pub trait WorkStealing: 'static {
    type Spec: Copy + Default + Send + Sync;
    type Store<N: Send + 'static, H: 'static, R: Send + 'static>: Send + Sync;
    type Handle<'a, N: Send + 'static, H: 'static, R: Send + 'static>: TaskOps<N, H, R>
    where Self: 'a;

    fn create_store<N: Send + 'static, H: 'static, R: Send + 'static>(
        spec: &Self::Spec, n_workers: usize,
    ) -> Self::Store<N, H, R>;

    fn reset_store<N: Send + 'static, H: 'static, R: Send + 'static>(
        store: &mut Self::Store<N, H, R>,
    );

    fn handle<'a, N: Send + 'static, H: 'static, R: Send + 'static>(
        store: &'a Self::Store<N, H, R>, worker_idx: usize,
    ) -> Self::Handle<'a, N, H, R>;
}
}

The three associated types capture the queue lifecycle:

  • Spec: construction-time configuration
  • Store<N, H, R>: per-fold resources (deques, bitmask, etc.)
  • Handle<'a, N, H, R>: per-worker view that borrows from Store

Workers interact through TaskOps:

#![allow(unused)]
fn main() {
/// Per-worker task operations. Each WorkStealing::Handle implements this.
pub trait TaskOps<N, H, R> {
    /// Returns None on success, Some(task) if queue full (caller executes inline).
    fn push(&self, task: FunnelTask<N, H, R>) -> Option<FunnelTask<N, H, R>>;
    fn try_acquire(&self) -> Option<FunnelTask<N, H, R>>;
}
}

push returns Some(task) if the queue is full — the caller executes inline (Cilk overflow protocol). try_acquire encapsulates the strategy’s acquisition policy.

PerWorker: Chase-Lev deques + bitmask steal

Each worker owns a Chase-Lev deque. Push is LIFO (local, no atomic). Steal uses an AtomicU64 bitmask to find non-empty deques — one atomic load instead of scanning N deques.

(Diagram: PerWorkerStore: deque[0]..deque[N-1], one Chase-Lev deque per worker with LIFO push/pop, plus deque[N] for the caller. Each push sets bit i in the shared work_available AtomicU64; bit i set means deque[i] is non-empty.)

#![allow(unused)]
fn main() {
impl<N: Send + 'static, H: 'static, R: Send + 'static> TaskOps<N, H, R>
    for PerWorkerHandle<'_, N, H, R>
{
    fn push(&self, task: FunnelTask<N, H, R>) -> Option<FunnelTask<N, H, R>> {
        match self.my_deque.push(task) {
            Ok(()) => {
                self.work_available.fetch_or(1u64 << self.my_idx, Ordering::Relaxed);
                None
            }
            Err(task) => Some(task),
        }
    }

    fn try_acquire(&self) -> Option<FunnelTask<N, H, R>> {
        // Local deque first — LIFO pop, cache-warm, no contention.
        if let Some(task) = self.my_deque.pop() { return Some(task); }
        // Bitmask-guided steal from other deques.
        let mut bits = self.work_available.load(Ordering::Relaxed);
        bits &= !(1u64 << self.my_idx);
        while bits != 0 {
            let target = bits.trailing_zeros() as usize;
            if let Some(task) = self.all_deques[target].steal() {
                return Some(task);
            }
            self.work_available.fetch_and(!(1u64 << target), Ordering::Relaxed);
            bits &= !(1u64 << target);
        }
        None
    }
}
}

Push: LIFO to own deque (no CAS), set bit in bitmask.

Try acquire: Pop local deque first (cache-warm, zero contention). If empty, load bitmask, iterate set bits, steal FIFO from first non-empty deque. Clear bit if deque found empty.

Best for: Trees with moderate-to-high branching. LIFO push + FIFO steal gives depth-first local execution with breadth-first work distribution. The bitmask avoids scanning all N deques.

Shared: single StealQueue

All threads push to one queue. All threads steal from it.

(Diagram: all workers and the caller push to and steal from a single StealQueue; push advances bottom, steal advances top, and both indices grow monotonically.)

Push: fetch_add on bottom, write to segment slot.

Steal: CAS on top, read from segment slot. FIFO order.

Best for: Wide trees (bf > 10) and small trees where per-worker deque overhead is disproportionate. No bitmask, no per-deque allocation.

When to use which

| Workload | Strategy | Why |
|---|---|---|
| General (bf = 4-8) | PerWorker | LIFO locality, bitmask steal |
| Wide (bf > 10) | Shared | no deque allocation per worker |
| Deep narrow (bf = 2) | PerWorker | DFS spine dominates |
| Small tree (< 50 nodes) | Shared | lower fixed overhead |

Accumulation Strategies

Two ways to fold child results into the parent’s heap, selected at compile time via the AccumulateStrategy trait. Both preserve child order — accumulate is called in slot order regardless of which worker delivered first. This is what allows hylic’s non-associative accumulate to run correctly in parallel:

#![allow(unused)]
fn main() {
/// Accumulation strategy: how child results flow into the parent's heap.
pub trait AccumulateStrategy: 'static {
    type Spec: Copy + Default + Send + Sync;

    fn deliver<N, H, R>(
        chain: &FoldChain<H, R>, slot: SlotRef, result: R,
        fold: &impl FoldOps<N, H, R>,
    ) -> Option<R>;

    fn set_total<N, H, R>(
        chain: &FoldChain<H, R>,
        fold: &impl FoldOps<N, H, R>,
    ) -> Option<R>;
}
}

Both use the same ticket system for last-event detection. They differ in when and how the heap is swept.
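A ticket scheme of this kind can be sketched as follows, assuming one plausible encoding — events_done in the low 32 bits of a single AtomicU64, the child total in the high 32 bits. hylic's actual layout may differ; the point is that exactly one event (a delivery or the total being published, whichever lands last) observes completion:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative ticket-based last-event detection (encoding assumed):
// low 32 bits count deliveries, high 32 bits hold the child total.
const TOTAL_SHIFT: u32 = 32;

/// A child result arrived. Returns true iff this was the last event.
fn deliver(state: &AtomicU64) -> bool {
    let prev = state.fetch_add(1, Ordering::AcqRel); // take a ticket
    let done = (prev & 0xFFFF_FFFF) + 1;
    let total = prev >> TOTAL_SHIFT;
    total != 0 && done == total // last iff total is known and all delivered
}

/// The walk finished discovering children. Returns true iff this was the last event.
fn set_total(state: &AtomicU64, total: u64) -> bool {
    let prev = state.fetch_add(total << TOTAL_SHIFT, Ordering::AcqRel);
    (prev & 0xFFFF_FFFF) == total // all deliveries already happened
}

fn main() {
    let s = AtomicU64::new(0);
    assert!(!set_total(&s, 2)); // total known first: not last
    assert!(!deliver(&s));      // 1 of 2
    assert!(deliver(&s));       // 2 of 2: last event, may finalize
}
```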

OnArrival: streaming sweep

Each delivery tries to sweep contiguous filled slots immediately. A CAS gate (bit 31 of sweep: AtomicU32) ensures only one thread sweeps at a time. A cursor (bits 0-30) tracks sweep progress.

(Diagram: as slots fill, the sweep word (AtomicU32: gate in bit 31, cursor in bits 0-30) accumulates contiguous filled slots into the heap H.)

Per-delivery flow:

  1. Write result to slot, filled.store(true, Release)
  2. state.fetch_add(1, Relaxed) — take ticket
  3. Try CAS gate: if won, sweep contiguous filled slots from cursor
  4. If this was the last event (ticket), spin until sweep completes

Advantage: Results accumulate as they arrive — lower latency to completion when children finish in order.

Cost: CAS contention on the sweep gate when multiple threads deliver simultaneously. ~16-28ns per delivery depending on gate outcome.
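The gate-plus-cursor word can be sketched like this. The bit layout follows the description above, but the helper itself is illustrative, not hylic's code — accumulation is elided to a comment:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Illustrative OnArrival sweep word: bit 31 is the "someone is sweeping"
// gate, bits 0-30 are the cursor of the next slot to sweep.
const GATE: u32 = 1 << 31;
const CURSOR_MASK: u32 = GATE - 1;

/// Try to win the gate and sweep contiguous filled slots.
/// Returns the new cursor if this thread swept, None if another thread holds the gate.
fn try_sweep(sweep: &AtomicU32, filled: &[bool]) -> Option<u32> {
    let cur = sweep.load(Ordering::Acquire);
    if cur & GATE != 0 {
        return None; // another thread is sweeping
    }
    // Take the gate without moving the cursor.
    if sweep.compare_exchange(cur, cur | GATE, Ordering::AcqRel, Ordering::Relaxed).is_err() {
        return None; // lost the race
    }
    let mut cursor = cur & CURSOR_MASK;
    while (cursor as usize) < filled.len() && filled[cursor as usize] {
        // ... move the result out of slot `cursor` and accumulate it here ...
        cursor += 1;
    }
    // Publish the new cursor and release the gate in one store.
    sweep.store(cursor, Ordering::Release);
    Some(cursor)
}

fn main() {
    let s = AtomicU32::new(0);
    assert_eq!(try_sweep(&s, &[true, true, false, true]), Some(2)); // stops at the gap
    assert_eq!(try_sweep(&s, &[true, true, true, true]), Some(4));  // resumes at cursor 2
}
```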

OnFinalize: bulk sweep

Deliveries only store + ticket. No sweep, no CAS gate. The last event (determined by the ticket) bulk-sweeps all slots at once.

(Diagram: deliveries only store into slots and take a ticket from state (AtomicU64: events_done / total); the last event bulk-accumulates all slots into the heap H.)

Per-delivery flow:

  1. Write result to slot, filled.store(true, Release)
  2. state.fetch_add(1, Relaxed) — take ticket
  3. If last event: iterate all slots, filled.load(Acquire) each, accumulate, finalize

Advantage: Zero CAS contention per delivery. Each delivery is ~11ns (store + fetch_add only).

Cost: The finalizer must spin-wait on any slot whose filled hasn’t been published yet (rare — Release/Acquire propagates quickly). Cold-cache sweep if slots were written by other cores.
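The bulk sweep can be sketched under the stated assumptions — filled flags published with Release, the finalizer spin-waiting on stragglers, results moved out destructively. Names and types are illustrative:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Illustrative OnFinalize bulk sweep: the last event walks every slot in
// order, spin-waits on any `filled` flag whose Release store hasn't
// propagated yet, then moves each result out (destructive read).
fn bulk_sweep(filled: &[AtomicBool], slots: &mut [Option<u64>], heap: &mut u64) {
    for (i, flag) in filled.iter().enumerate() {
        // The ticket guarantees the result WILL appear; wait for visibility.
        while !flag.load(Ordering::Acquire) {
            std::hint::spin_loop();
        }
        let result = slots[i].take().expect("slot published"); // move out, slot freed
        *heap += result; // accumulate in slot order
    }
}

fn main() {
    let filled: Vec<AtomicBool> = (0..3).map(|_| AtomicBool::new(true)).collect();
    let mut slots = vec![Some(1u64), Some(2), Some(3)];
    let mut heap = 0u64;
    bulk_sweep(&filled, &mut slots, &mut heap);
    assert_eq!(heap, 6);
    assert!(slots.iter().all(|s| s.is_none())); // every slot was drained
}
```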

When to use which

| Workload | Strategy | Why |
|---|---|---|
| Init-heavy, graph-heavy | OnArrival | Streaming pipelines computation |
| Balanced, finalize-heavy | OnFinalize | Minimal per-delivery overhead |
| Wide trees (bf > 10) | Either | OnArrival if results arrive in order |
| Deep narrow (bf=2) | OnFinalize | Only 2 slots — sweep trivial |

Memory footprint

Both strategies use destructive reads — results are moved out of their slots during accumulation, then dropped. For result types that own heap memory (String, Vec, etc.), this means resources are freed progressively as the sweep advances, not held alive until fold completion.

With OnArrival, the live memory at any point is bounded by the number of delivered-but-not-yet-swept results (typically much smaller than total results). With OnFinalize, all results are delivered before the bulk sweep, so peak memory equals the node’s child count × result size — freed in one pass.

This is relevant for folds over large trees where each result carries significant heap data (parsed documents, artifact records, aggregated datasets). See Infrastructure for the arena allocation model.

Infrastructure

Five supporting types underpin the funnel executor: a segmented bump allocator (SegmentedSlab) and the two arenas it backs (Arena and ContArena), a Chase-Lev deque (WorkerDeque), and a futex-based parking primitive (EventCount). All are created per fold in run_fold and dropped at fold completion.

SegmentedSlab<T> — the allocation foundation

Arena and ContArena share a common backing store: SegmentedSlab<T>. It grows lazily in 64-slot segments, never invalidating existing references. This is the key invariant that makes it safe under the CPS walk where alloc() and get() interleave with live references during recursive child discovery.

The design follows the same AtomicPtr CAS pattern used by the StealQueue’s SegmentTable:

(Diagram: a segment table whose entries are Acquire-loaded pointers to 64-slot segments; entries stay null until a segment is first needed.)

On alloc():

  1. next.fetch_add(1, Relaxed) — atomic bump, returns linear index
  2. Decompose index: segment = idx >> 6, offset = idx & 0x3F
  3. ensure_segment(seg) — Acquire load; if null, allocate + CAS
  4. Write value to slot

On get(idx) / take(idx):

  1. Acquire load the segment pointer (L1 cache hit on hot path)
  2. Index into the segment’s slot array

Segment allocation races (multiple threads hit a new segment simultaneously) are resolved by the CAS: one thread installs its segment, losers free theirs. Exactly one segment is installed per position.
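The install race can be sketched with std's AtomicPtr. The 64-slot segment size follows the text; the table shape and function name are illustrative stand-ins:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Illustrative lazy segment install: decompose a linear index,
// Acquire-load the segment pointer, and on null allocate + CAS.
// Losers of the race free their candidate and use the winner's.
const SEG_SIZE: usize = 64;

fn ensure_segment(table: &[AtomicPtr<[u64; SEG_SIZE]>], idx: usize) -> *mut [u64; SEG_SIZE] {
    let seg = idx >> 6;       // idx / 64: which segment
    let _offset = idx & 0x3F; // idx % 64: slot within the segment
    let slot = &table[seg];
    let p = slot.load(Ordering::Acquire);
    if !p.is_null() {
        return p; // hot path: one Acquire load
    }
    // Cold path: allocate a candidate segment and race to install it.
    let fresh = Box::into_raw(Box::new([0u64; SEG_SIZE]));
    match slot.compare_exchange(ptr::null_mut(), fresh, Ordering::AcqRel, Ordering::Acquire) {
        Ok(_) => fresh,
        Err(winner) => {
            drop(unsafe { Box::from_raw(fresh) }); // lost: free our copy
            winner
        }
    }
}

fn main() {
    let table: Vec<AtomicPtr<[u64; SEG_SIZE]>> =
        (0..2).map(|_| AtomicPtr::new(ptr::null_mut())).collect();
    let p1 = ensure_segment(&table, 70); // lands in segment 1
    assert_eq!(p1, ensure_segment(&table, 70)); // second call reuses it
    assert!(table[0].load(Ordering::Acquire).is_null()); // segment 0 never touched
    drop(unsafe { Box::from_raw(p1) }); // cleanup for this demo
}
```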

Memory profile: A fold over a 200-node tree with bf=8 allocates ~2 segments (128 slots) instead of pre-allocating 4096. Initial overhead: 32KB for the null-pointer table. Growth: one heap allocation per 64 elements. Maximum capacity: 262,144 elements.

Arena<T>

Thin wrapper over SegmentedSlab<T> for values written once and read many times. Used for ChainNode<H, R> — one slot per multi-child node.

  • alloc(value) → ArenaIdx: delegates to SegmentedSlab. One fetch_add(1, Relaxed) + one segment pointer load.
  • get(idx) → &T: segment pointer load + slot index. The returned reference is stable — subsequent allocs never invalidate it.
  • Drop: iterates all allocated slots and drops each value, then frees all segments.

ArenaIdx is u32, Copy — a plain integer index. No refcount. Passing an index across threads costs 4 bytes.

ContArena<T>

Same segmented design as Arena, but with move-out semantics:

  • alloc(value) → ContIdx: identical to Arena.
  • take(idx) → T: moves the value OUT of the slot. Called exactly once per slot during fire_cont’s Cont::Direct handling.

Drop frees segment memory only — every allocated slot was already take()n during the upward cascade. If the fold panics mid-execution, slots leak (no tracking bitset). This is accepted: panic during fold is not recoverable.

Used for parent continuations in single-child chains.

WorkerDeque<T>

Chase-Lev work-stealing deque. Fixed-capacity ring buffer with power-of-2 masking.

The deque provides per-worker local task storage for the PerWorker queue strategy:

(Diagram: the owner thread pushes and pops LIFO at bottom; other threads steal FIFO at top.)

  • Owner: LIFO push/pop from bottom (no atomics in fast path)
  • Stealers: FIFO steal from top (CAS for contention)
  • ManuallyDrop<T> wrapping prevents double-free on speculative reads between pop and steal
  • Cache-padded: bottom and top on separate 128-byte lines

If the deque is full, push returns the task to the caller, which executes it inline (Cilk overflow protocol). This makes the fixed capacity a performance knob, not a correctness hazard.
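A minimal sketch of that overflow protocol — a Vec stands in for the ring buffer, and the names are illustrative:

```rust
// Illustrative Cilk-style overflow protocol: a full deque hands the task
// back to the caller, which executes it inline instead of blocking or
// reallocating. Fixed capacity thus bounds memory without losing work.
struct BoundedDeque {
    buf: Vec<u64>,
    cap: usize,
}

impl BoundedDeque {
    /// Push a task; on overflow, return it to the caller.
    fn push(&mut self, task: u64) -> Option<u64> {
        if self.buf.len() == self.cap {
            Some(task)
        } else {
            self.buf.push(task);
            None
        }
    }
}

fn spawn_or_run(deque: &mut BoundedDeque, task: u64, run: &mut impl FnMut(u64)) {
    if let Some(overflow) = deque.push(task) {
        run(overflow); // deque full: execute inline
    }
}

fn main() {
    let mut d = BoundedDeque { buf: Vec::new(), cap: 1 };
    let mut ran_inline = Vec::new();
    spawn_or_run(&mut d, 1, &mut |t| ran_inline.push(t)); // queued
    spawn_or_run(&mut d, 2, &mut |t| ran_inline.push(t)); // overflow: runs inline
    assert_eq!(d.buf, vec![1]);
    assert_eq!(ran_inline, vec![2]);
}
```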

EventCount

Lock-free thread parking via atomic epoch + futex. Used for pool thread parking and idle worker notification.

The protocol prevents lost wakeups structurally:

(Diagram: prepare() snapshots the epoch; the caller checks its condition (fold done? task available?); wait(token) futex-sleeps only if the epoch is unchanged; notify_one/notify_all bumps the epoch (Release) and wakes, and the woken thread re-prepares.)

  • prepare() → Token: snapshot the epoch (Acquire)
  • wait(token): futex sleep if epoch unchanged since prepare()
  • notify_one() / notify_all(): bump epoch (Release) + wake

If a notification fires between prepare() and wait(), the epoch has changed and wait returns immediately — no lost wakeup.
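The protocol can be sketched with a plain atomic epoch. This stand-in spins where the real primitive would futex-sleep; the structure of prepare/wait/notify is the same:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Minimal EventCount sketch (spin-based stand-in for the futex). The
// epoch snapshot is what prevents lost wakeups: a notify between
// prepare() and wait() changes the epoch, so wait() returns immediately
// instead of sleeping.
struct EventCount {
    epoch: AtomicU64,
}

impl EventCount {
    fn prepare(&self) -> u64 {
        self.epoch.load(Ordering::Acquire) // token = epoch snapshot
    }
    fn wait(&self, token: u64) {
        // Sleep only while the epoch is unchanged since prepare().
        while self.epoch.load(Ordering::Acquire) == token {
            std::hint::spin_loop(); // real impl: futex_wait(&epoch, token)
        }
    }
    fn notify_all(&self) {
        self.epoch.fetch_add(1, Ordering::Release); // real impl: also futex_wake
    }
}

fn main() {
    let ec = EventCount { epoch: AtomicU64::new(0) };
    let token = ec.prepare();
    ec.notify_all(); // notification lands before wait()
    ec.wait(token);  // epoch changed: returns immediately, no lost wakeup
    assert_eq!(ec.prepare(), 1);
}
```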

Why arenas, not per-node allocation

  • Stable references: segmented layout means alloc() never invalidates existing pointers. Vec-backed growth would require reallocation, breaking live references in walk_cps.
  • No refcounting: ArenaIdx is Copy, 4 bytes. The equivalent with per-node allocation would be Arc<ChainNode> at ~10-15ns per clone/drop.
  • Lazy growth: memory usage is proportional to actual tree size, not a pre-configured maximum. No capacity configuration needed.
  • Bulk cleanup: arena drop iterates allocated slots + frees segments. No per-node free list interaction.

Streaming sweep and memory footprint

With the OnArrival accumulation strategy, child results are moved out of their slots during the sweep (destructive read). Each result is borrowed for fold.accumulate, then dropped. This means heap resources owned by result values (Strings, Vecs, etc.) are freed progressively as the sweep advances — not held alive until fold completion. See Accumulation strategies for the sweep mechanics.

Testing

The funnel test suite covers three dimensions: correctness, stress, and the hylomorphism property. All tests run for both queue strategies (PerWorker and Shared) via policy-generic test helpers.

Correctness

Verify that funnel produces the same result as the sequential Fused executor across all named policy presets:

  • Default, SharedDefault, WideLight, LowOverhead, PerWorkerArrival
  • Tree sizes: 60 nodes (bf=4), 200 nodes (bf=6, bf=20)
  • Zero workers (all work done by the caller thread)
  • Adjacency-list trees (callback-based treeish_visit)
  • Wide-tree stress (500 iterations, pool reused)

Stress

High iteration counts to catch timing-sensitive races:

  • 1500 runs per policy on a reused pool
  • Pool lifecycle: 5000 create/destroy cycles
  • Mixed policy: 50k iterations switching between PerWorker and Shared on the same pool (mimics criterion benchmark pattern)
  • 20k noop iterations each: Shared + OnFinalize and Shared + OnArrival at criterion warmup intensity
  • Interleaved policies: 12.5k iterations alternating four policies on one pool

These tests exercise the dispatchin_job latch protocol under the exact conditions that previously triggered SIGSEGV (high-iteration noop folds with rapid pool reuse).

Interleaving proof

The hylomorphism property: fold interleaves with traversal across subtrees. While one subtree is being visited (walk down), another subtree’s results are being accumulated (cascade up).

The test uses a lock-free TraceLog (2048-entry bounded log, atomic sequence counter) to record visit and accumulate operations with thread IDs and subtree tags. After 20 attempts on an 85-node tree, the test asserts that cross-subtree interleaving occurred — proving that the fold is not merely parallel but genuinely fused.

Fibonacci

Recursive Fibonacci via tree fold — the simplest possible example.

#![allow(unused)]
fn main() {
//! Fibonacci via tree fold — the simplest hylic example.
//! The node type is `i32` — not a struct with children.
//! The treeish computes children from the value: fib(n) → [fib(n-1), fib(n-2)].

#[cfg(test)]
mod tests {
    use hylic::prelude::*;
    use insta::assert_snapshot;


    /// A Fibonacci node: just the number n.
    /// Branches into n-1 and n-2 until reaching base cases 0 or 1.
    #[derive(Clone)]
    struct FibNode(u64);

    #[test]
    fn fibonacci() {
        // Children of fib(n) are fib(n-1) and fib(n-2); fib(0) and fib(1) are leaves.
        let graph: Treeish<FibNode> = treeish(|n: &FibNode| {
            if n.0 <= 1 { vec![] }
            else { vec![FibNode(n.0 - 1), FibNode(n.0 - 2)] }
        });

        // init: leaves seed the heap with n; inner nodes seed with 0.
        // accumulate: each child's result is summed into the heap.
        // finalize: identity (H = R = u64).
        let fib: Fold<FibNode, u64, u64> = fold(
            |n: &FibNode| if n.0 <= 1 { n.0 } else { 0 },
            |heap: &mut u64, child: &u64| *heap += child,
            |h: &u64| *h,
        );

        let result: u64 = FUSED.run(&fib, &graph, &FibNode(10));
        assert_eq!(result, 55);

        assert_snapshot!("fib10", format!("fib(10) = {result}"));
    }
}
}

Output:

fib(10) = 55

Expression evaluation

Evaluate an AST bottom-up. vec_fold gives finalize access to the node and all child results — needed when different node types combine children differently.

#![allow(unused)]
fn main() {
//! Expression evaluation — AST fold with heterogeneous node types.

#[cfg(test)]
mod tests {
    use hylic::prelude::vec_fold::{vec_fold, VecHeap};
    use hylic::prelude::*;
    use insta::assert_snapshot;


    /// An arithmetic expression tree.
    /// Each variant defines both its meaning and its children.
    #[derive(Clone)]
    enum Expr {
        Num(f64),
        Add(Box<Expr>, Box<Expr>),
        Mul(Box<Expr>, Box<Expr>),
        Neg(Box<Expr>),
    }

    /// Convenience constructors for readable test data.
    fn num(v: f64) -> Expr { Expr::Num(v) }
    fn add(a: Expr, b: Expr) -> Expr { Expr::Add(Box::new(a), Box::new(b)) }
    fn mul(a: Expr, b: Expr) -> Expr { Expr::Mul(Box::new(a), Box::new(b)) }
    fn neg(a: Expr) -> Expr { Expr::Neg(Box::new(a)) }

    #[test]
    fn evaluate_expression() {
        let expr = mul(add(num(3.0), num(4.0)), neg(num(2.0)));

        let graph: Treeish<Expr> = treeish_visit(|e: &Expr, cb: &mut dyn FnMut(&Expr)| {
            match e {
                Expr::Num(_) => {}
                Expr::Add(a, b) | Expr::Mul(a, b) => { cb(a); cb(b); }
                Expr::Neg(a) => { cb(a); }
            }
        });

        // vec_fold collects children before finalize, so each variant can
        // combine its results differently (sum / product / negate).
        let eval: Fold<Expr, VecHeap<Expr, f64>, f64> = vec_fold(
            |heap: &VecHeap<Expr, f64>| match &heap.node {
                Expr::Num(v)        => *v,
                Expr::Add(_, _)     => heap.childresults.iter().sum(),
                Expr::Mul(_, _)     => heap.childresults.iter().product(),
                Expr::Neg(_)        => -heap.childresults[0],
            },
        );

        let result: f64 = FUSED.run(&eval, &graph, &expr);
        assert_eq!(result, -14.0);

        assert_snapshot!("expr_eval", format!("(3 + 4) * -(2) = {result}"));
    }
}
}

Output:

(3 + 4) * -(2) = -14

Filesystem summary

Aggregate file sizes, counts, and directory depth in one pass. The heap is a structured Summary — multiple metrics accumulated simultaneously.

#![allow(unused)]
fn main() {
//! Filesystem tree summary — structured heap accumulating multiple metrics.

#[cfg(test)]
mod tests {
    use hylic::prelude::*;
    use insta::assert_snapshot;


    /// A filesystem entry: either a file (leaf) or a directory (branch).
    #[derive(Clone)]
    #[allow(dead_code)]
    enum FsEntry {
        File { name: String, size: u64 },
        Dir { name: String, children: Vec<FsEntry> },
    }

    impl FsEntry {
        fn file(name: &str, size: u64) -> Self {
            FsEntry::File { name: name.into(), size }
        }
        fn dir(name: &str, ch: Vec<FsEntry>) -> Self {
            FsEntry::Dir { name: name.into(), children: ch }
        }
    }

    /// Accumulates size, file count, and directory count in one pass.
    #[derive(Clone, Debug, PartialEq)]
    struct Summary {
        total_size: u64,
        file_count: usize,
        dir_count: usize,
    }

    #[test]
    fn summarize_filesystem() {
        let tree = FsEntry::dir("project", vec![
            FsEntry::file("README.md", 1200),
            FsEntry::dir("src", vec![
                FsEntry::file("main.rs", 5000),
                FsEntry::file("lib.rs", 3000),
                FsEntry::dir("utils", vec![
                    FsEntry::file("helpers.rs", 800),
                ]),
            ]),
            FsEntry::file("Cargo.toml", 400),
        ]);

        // Files are implicit leaves; only directories produce children.
        let graph: Treeish<FsEntry> = treeish_visit(|entry: &FsEntry, cb: &mut dyn FnMut(&FsEntry)| {
            if let FsEntry::Dir { children, .. } = entry {
                for child in children { cb(child); }
            }
        });

        // Structured heap — multiple metrics tracked in one pass. H = R = Summary.
        let summarize: Fold<FsEntry, Summary, Summary> = fold(
            |entry: &FsEntry| match entry {
                FsEntry::File { size, .. } =>
                    Summary { total_size: *size, file_count: 1, dir_count: 0 },
                FsEntry::Dir { .. } =>
                    Summary { total_size: 0, file_count: 0, dir_count: 1 },
            },
            |heap: &mut Summary, child: &Summary| {
                heap.total_size += child.total_size;
                heap.file_count += child.file_count;
                heap.dir_count += child.dir_count;
            },
            |h: &Summary| h.clone(),
        );

        let result: Summary = FUSED.run(&summarize, &graph, &tree);
        assert_eq!(result, Summary {
            total_size: 10400, file_count: 5, dir_count: 3,
        });

        assert_snapshot!("fs_summary", format!(
            "project/: {} bytes, {} files, {} dirs",
            result.total_size, result.file_count, result.dir_count,
        ));
    }
}
}

Output:

project/: 10400 bytes, 5 files, 3 dirs

Cycle detection

Detect cycles in a dependency graph. Cycle state lives in the node type (ancestor set), not the fold — the Treeish decides structure, the Fold just collects.

#![allow(unused)]
fn main() {
//! Cycle detection in a dependency graph.
//! Demonstrates: treeish over a graph with potential cycles,
//! fold that tracks visited nodes to detect re-entry.

#[cfg(test)]
mod tests {
    use std::collections::{HashMap, HashSet};
    use hylic::prelude::*;
    use insta::assert_snapshot;


    /// A dependency graph defined as adjacency lists.
    /// Nodes are string IDs, edges are dependencies.
    #[derive(Clone)]
    struct DepGraph {
        edges: HashMap<String, Vec<String>>,
    }

    impl DepGraph {
        fn new(edges: &[(&str, &[&str])]) -> Self {
            DepGraph {
                edges: edges.iter()
                    .map(|(k, v)| (k.to_string(), v.iter().map(|s| s.to_string()).collect()))
                    .collect(),
            }
        }
    }

    /// A node in the traversal: carries the current ID and
    /// the set of ancestors on this path (for cycle detection).
    #[derive(Clone)]
    struct DepNode {
        id: String,
        ancestors: HashSet<String>,
    }

    impl DepNode {
        fn root(id: &str) -> Self {
            DepNode { id: id.to_string(), ancestors: HashSet::new() }
        }
        fn child(&self, id: &str) -> Self {
            let mut ancestors = self.ancestors.clone();
            ancestors.insert(self.id.clone());
            DepNode { id: id.to_string(), ancestors }
        }
        fn is_cycle(&self) -> bool {
            self.ancestors.contains(&self.id)
        }
    }

    /// Result of cycle analysis for a subtree.
    #[derive(Clone, Debug)]
    struct CycleResult {
        cycles: Vec<String>,
        visited: usize,
    }

    #[test]
    fn detect_cycles() {
        let graph_data = DepGraph::new(&[
            ("A", &["B", "C"]),
            ("B", &["D"]),
            ("C", &["D", "A"]),  // C → A creates a cycle
            ("D", &[]),
        ]);

        // Cycle state lives in the node type — DepNode carries its ancestor
        // set. When a node sees itself in that set, the treeish stops by
        // returning no children.
        let graph: Treeish<DepNode> = treeish(move |node: &DepNode| {
            if node.is_cycle() { return vec![]; }
            graph_data.edges.get(&node.id)
                .map(|deps| deps.iter().map(|d| node.child(d)).collect())
                .unwrap_or_default()
        });

        let detect: Fold<DepNode, CycleResult, CycleResult> = fold(
            |node: &DepNode| CycleResult {
                cycles:  if node.is_cycle() { vec![node.id.clone()] } else { vec![] },
                visited: 1,
            },
            |heap: &mut CycleResult, child: &CycleResult| {
                heap.cycles.extend(child.cycles.iter().cloned());
                heap.visited += child.visited;
            },
            |h: &CycleResult| h.clone(),
        );

        let result: CycleResult = FUSED.run(&detect, &graph, &DepNode::root("A"));

        assert_eq!(result.cycles, vec!["A"]);  // C → A cycle detected
        assert_eq!(result.visited, 6);          // A, B, C, D, D, A(cycle)

        assert_snapshot!("cycles", format!(
            "cycles: {:?}, visited: {}", result.cycles, result.visited
        ));
    }
}
}

Output:

cycles: ["A"], visited: 6

Configuration inheritance

Overlay configuration scopes bottom-up. or_insert in accumulate gives parent-wins semantics — init runs before accumulate, so the parent’s values are already in the map.

#![allow(unused)]
fn main() {
//! Configuration inheritance with overlay/merge.
//! Demonstrates: a fold where the heap IS a config map,
//! and children's configs overlay the parent's defaults.

#[cfg(test)]
mod tests {
    use std::collections::BTreeMap;
    use hylic::prelude::*;
    use insta::assert_snapshot;


    /// A configuration scope. Each scope has its own key-value overrides
    /// and child scopes that inherit and can further override.
    #[derive(Clone, Debug)]
    struct ConfigScope {
        name: String,
        overrides: BTreeMap<String, String>,
        children: Vec<ConfigScope>,
    }

    impl ConfigScope {
        fn new(name: &str, overrides: &[(&str, &str)], children: Vec<ConfigScope>) -> Self {
            ConfigScope {
                name: name.into(),
                overrides: overrides.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
                children,
            }
        }
        fn leaf(name: &str, overrides: &[(&str, &str)]) -> Self {
            Self::new(name, overrides, vec![])
        }
    }

    /// Resolved configuration: the merged key-value map for a scope,
    /// collecting all overrides from the scope and its descendants.
    #[derive(Clone, Debug, PartialEq)]
    struct ResolvedConfig {
        scope: String,
        merged: BTreeMap<String, String>,
    }

    #[test]
    fn config_overlay() {
        let root = ConfigScope::new("global", &[
            ("color", "blue"),
            ("font_size", "12"),
            ("theme", "light"),
        ], vec![
            ConfigScope::new("production", &[
                ("theme", "dark"),
                ("debug", "false"),
            ], vec![
                ConfigScope::leaf("production.api", &[
                    ("font_size", "14"),
                    ("rate_limit", "1000"),
                ]),
            ]),
            ConfigScope::leaf("development", &[
                ("debug", "true"),
                ("theme", "light"),
            ]),
        ]);

        let graph: Treeish<ConfigScope> =
            treeish_from(|scope: &ConfigScope| scope.children.as_slice());

        // init seeds the heap with the scope's own overrides; init runs
        // before accumulate, so parent values win — child entries only
        // fill in keys the parent hasn't set.
        let resolve: Fold<ConfigScope, ResolvedConfig, ResolvedConfig> = fold(
            |scope: &ConfigScope| ResolvedConfig {
                scope:  scope.name.clone(),
                merged: scope.overrides.clone(),
            },
            |heap: &mut ResolvedConfig, child: &ResolvedConfig| {
                for (k, v) in &child.merged {
                    heap.merged.entry(k.clone()).or_insert_with(|| v.clone());
                }
            },
            |h: &ResolvedConfig| h.clone(),
        );

        let result: ResolvedConfig = FUSED.run(&resolve, &graph, &root);

        // Global scope sees all keys from all descendants,
        // but its own values win for "color", "font_size", "theme".
        assert_eq!(result.merged.get("color").unwrap(), "blue");
        assert_eq!(result.merged.get("theme").unwrap(), "light");  // parent wins
        assert_eq!(result.merged.get("debug").unwrap(), "false");  // production's value
        assert_eq!(result.merged.get("rate_limit").unwrap(), "1000");

        let display: Vec<String> = result.merged.iter()
            .map(|(k, v)| format!("{k}={v}")).collect();
        assert_snapshot!("config", display.join(", "));
    }
}
}

Output:

color=blue, debug=false, font_size=12, rate_limit=1000, theme=light

Parallel execution

hylic provides two built-in executors. FUSED runs the fold sequentially through callback-based recursion. The Funnel executor parallelizes the same fold across a scoped thread pool. Both are invoked through the same .run() method.

Sequential: FUSED

Callback-based recursion on a single thread, with no overhead beyond the fold closures themselves:

#![allow(unused)]
fn main() {
use hylic::prelude::*;
FUSED.run(&fold, &graph, &root);
}

Parallel: Funnel

The Funnel executor preserves the fused property — children are discovered through graph.visit and processed concurrently. No intermediate tree is built.

One-shot

#![allow(unused)]
fn main() {
use hylic::prelude::*;
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
}

Spec::default(n) uses the Robust policy preset. .run() creates a scoped thread pool internally, runs the fold, and joins before returning.

Session scope

For repeated folds, amortize pool creation:

#![allow(unused)]
fn main() {
exec(funnel::Spec::default(8)).session(|s| {
    s.run(&fold1, &graph1, &root1);
    s.run(&fold2, &graph2, &root2);
});
}

The pool lives for the closure. Each .run() inside is cheap.

Explicit attach

Provide the pool yourself:

#![allow(unused)]
fn main() {
funnel::Pool::with(8, |pool| {
    let pw = exec(funnel::Spec::default(8)).attach(pool);
    let sh = exec(funnel::Spec::for_wide_light(8)).attach(pool);
    pw.run(&fold, &graph, &root);
    sh.run(&fold, &graph, &root);
});
}

Different policies can share a pool — each .attach() consumes a (Copy) Spec and binds it to the pool, producing a session-level executor.

Policy variants

| Preset | Best for |
|---|---|
| Spec::default(n) | General purpose |
| Spec::for_wide_light(n) | Wide trees (bf > 10) |
| Spec::for_deep_narrow(n) | Deep chains (bf = 2) |
| Spec::for_low_overhead(n) | Overhead-sensitive |
| Spec::for_high_throughput(n) | Heavy balanced |

See Funnel policies for the full decision guide and The Exec pattern for the type-level design behind .run(), .session(), and .attach().

External parallel options

One additional strategy lives in a sibling crate:

  • Rayon (hylic-benchmark): par_iter-based fork-join

Working example

This example uses a flat adjacency list — nodes are integer indices, children are looked up by index. The same fold runs sequentially (Fused) and in parallel (Funnel) with identical results.

#![allow(unused)]
fn main() {
//! Parallel execution: Fused vs Funnel over flat data.
//! Demonstrates: adjacency-list graph, identical results across
//! policies, session scope, explicit pool attach.

#[cfg(test)]
mod tests {
    use hylic::prelude::*;
    use hylic::exec::funnel;
    use insta::assert_snapshot;

    /// Build a tree as a flat adjacency list + value array.
    /// Node 0 is the root with 6 children; each child has 3 leaves.
    fn build_tree() -> (Vec<Vec<usize>>, Vec<u64>) {
        let mut adj: Vec<Vec<usize>> = Vec::new();
        let mut vals: Vec<u64> = Vec::new();

        // root (node 0)
        adj.push((1..=6).collect());
        vals.push(1);

        // 6 branches (nodes 1-6), each with 3 leaves
        let mut next_leaf = 7;
        for i in 0..6 {
            let children: Vec<usize> = (next_leaf..next_leaf + 3).collect();
            adj.push(children);
            vals.push(i as u64 * 10);
            next_leaf += 3;
        }

        // 18 leaves (nodes 7-24)
        for i in 0..6 {
            for j in 0..3u64 {
                adj.push(vec![]);
                vals.push(i as u64 * 10 + j);
            }
        }

        (adj, vals)
    }

    #[test]
    fn parallel_strategies() {
        let (adj, vals) = build_tree();

        // The treeish looks up children by index; no nested structs.
        let adj_for_graph = adj.clone();
        let graph: Treeish<usize> = treeish_visit(move |n: &usize, cb: &mut dyn FnMut(&usize)| {
            for &c in &adj_for_graph[*n] { cb(&c); }
        });

        let vals_for_fold = vals.clone();
        let sum: Fold<usize, u64, u64> = fold(
            move |n: &usize| vals_for_fold[*n],
            |heap: &mut u64, child: &u64| *heap += child,
            |heap: &u64| *heap,
        );

        // Sequential baseline
        let expected = FUSED.run(&sum, &graph, &0usize);

        // One-shot: .run() creates + destroys pool internally
        let r_default = exec(funnel::Spec::default(4)).run(&sum, &graph, &0usize);
        assert_eq!(r_default, expected);

        // Different policy: wide-light
        let r_wide = exec(funnel::Spec::for_wide_light(4)).run(&sum, &graph, &0usize);
        assert_eq!(r_wide, expected);

        // Session scope: pool shared across folds
        exec(funnel::Spec::default(4)).session(|s| {
            assert_eq!(s.run(&sum, &graph, &0usize), expected);
            assert_eq!(s.run(&sum, &graph, &0usize), expected);
        });

        // Explicit attach: manual pool, multiple policies
        funnel::Pool::with(4, |pool| {
            let pw = exec(funnel::Spec::default(4)).attach(pool);
            let sh = exec(funnel::Spec::for_wide_light(4)).attach(pool);
            assert_eq!(pw.run(&sum, &graph, &0usize), expected);
            assert_eq!(sh.run(&sum, &graph, &0usize), expected);
        });

        assert_snapshot!("parallel", format!(
            "sum = {expected}, verified: fused, funnel(one-shot), funnel(wide), session, attach"
        ));
    }
}
}

Output:

sum = 619, verified: fused, funnel(one-shot), funnel(wide), session, attach

Zero-cost performance

The closure-based API (Fold from a domain module, plus Treeish) is the ergonomic default. For performance-critical paths, the graph side admits user-defined TreeOps implementations whose visit method monomorphises directly. The fold side does not — the executor signature pins the fold type to D::Fold<H, R> (the closure-based wrapper). The two ops traits are nevertheless the right vocabulary for thinking about per-node cost.

The overhead budget

Per node with K children, the fused executor makes these calls through the closure-based API:

| Call site | Count | Dispatch |
|---|---|---|
| fold.init(node) | 1 | dyn Fn via Arc/Rc/Box |
| graph.visit(node, cb) | 1 | dyn Fn via Arc/Rc/Box |
| cb(child) inside visit | K | &mut dyn FnMut callback |
| fold.accumulate(heap, &r) | K | dyn Fn via Arc/Rc/Box |
| fold.finalize(heap) | 1 | dyn Fn via Arc/Rc/Box |
| Total | 3 + 2K | |

Measured: ~0.47 ns per indirect call (well-predicted by the branch predictor). On a noop workload (bf=8, 200 nodes): ~1.8 µs above hand-written recursion. On any real workload (>10 µs/node) the overhead drops below the noise floor.
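The arithmetic behind those numbers is easy to check: at bf=8 each node makes 3 + 2·8 = 19 indirect calls, so 200 nodes make 3800, and at ~0.47 ns per call that is ~1.8 µs:

```rust
// Checking the overhead estimate from the text: 3 + 2K indirect calls
// per node, ~0.47 ns per well-predicted indirect call.
fn dispatch_overhead_ns(k: u64, nodes: u64, ns_per_call: f64) -> f64 {
    ((3 + 2 * k) * nodes) as f64 * ns_per_call
}

fn main() {
    // bf=8, 200 nodes: (3 + 16) * 200 = 3800 calls; 3800 * 0.47 ns ≈ 1.79 µs,
    // consistent with the ~1.8 µs measured above hand-written recursion.
    let us = dispatch_overhead_ns(8, 200, 0.47) / 1000.0;
    assert!((us - 1.8).abs() < 0.05);
}
```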

Eliminating graph dispatch: implement TreeOps

The executor’s graph parameter is generic — G: TreeOps<N> — so any concrete impl is monomorphised at the call site:

#![allow(unused)]
fn main() {
    #[test]
    fn zero_cost_treeops() {
        use hylic::prelude::*;
        use hylic::ops::TreeOps;

        #[derive(Clone)]
        struct TreeNode { id: usize, value: u64 }

        struct AdjGraph {
            adj:   Vec<Vec<usize>>,
            nodes: Vec<TreeNode>,
        }
        impl TreeOps<TreeNode> for AdjGraph {
            fn visit(&self, node: &TreeNode, cb: &mut dyn FnMut(&TreeNode)) {
                for &child_id in &self.adj[node.id] {
                    cb(&self.nodes[child_id]);
                }
            }
        }

        let graph = AdjGraph {
            adj: vec![vec![1, 2], vec![], vec![]],
            nodes: vec![
                TreeNode { id: 0, value: 1 },
                TreeNode { id: 1, value: 2 },
                TreeNode { id: 2, value: 3 },
            ],
        };

        let f: Fold<TreeNode, u64, u64> = fold(
            |n: &TreeNode| n.value,
            |h: &mut u64, r: &u64| *h += r,
            |h: &u64| *h,
        );
        let root = graph.nodes[0].clone();
        let total: u64 = FUSED.run(&f, &graph, &root);
        assert_eq!(total, 6);
    }
}

AdjGraph::visit is a direct, inlinable loop. Only the callback cb: &mut dyn FnMut is still indirect — K calls per node. The closure-based fold is still in the picture because executors take &D::Fold<H, R>; replacing it requires a custom executor (below).

The shape of the trait

FoldOps and TreeOps are the operation traits any user code can target:

pub trait FoldOps<N, H, R> {
    fn init(&self, node: &N) -> H;
    fn accumulate(&self, heap: &mut H, result: &R);
    fn finalize(&self, heap: &H) -> R;
}

pub trait TreeOps<N> {
    fn visit(&self, node: &N, cb: &mut dyn FnMut(&N));
}

The closure-based domain folds (shared::Fold, local::Fold, owned::Fold) implement FoldOps by delegating to their stored closures. A user-defined FoldOps struct is callable from any custom executor that drives the recursion through the trait — bypassing the closure layer entirely. The shipped Fused executor’s inner loop is exactly this:

#![allow(unused)]
fn main() {
fn recurse<N, H, R>(
    fold: &impl FoldOps<N, H, R>,
    graph: &impl TreeOps<N>,
    node: &N,
) -> R {
    let mut heap = fold.init(node);
    graph.visit(node, &mut |child: &N| {
        let r = recurse(fold, graph, child);
        fold.accumulate(&mut heap, &r);
    });
    fold.finalize(&heap)
}
}
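Put together, the trait-driven path needs nothing from the closure layer. The sketch below is fully standalone: the two traits and the recursion are copied from this page, while `Node`, `SumFold`, and `VecTree` are illustrative names, not hylic types.

```rust
// Standalone sketch of the fully monomorphised path: ops traits, the
// fused-style recursion, and concrete (unboxed) implementations.

pub trait FoldOps<N, H, R> {
    fn init(&self, node: &N) -> H;
    fn accumulate(&self, heap: &mut H, result: &R);
    fn finalize(&self, heap: &H) -> R;
}

pub trait TreeOps<N> {
    fn visit(&self, node: &N, cb: &mut dyn FnMut(&N));
}

fn recurse<N, H, R>(
    fold: &impl FoldOps<N, H, R>,
    graph: &impl TreeOps<N>,
    node: &N,
) -> R {
    let mut heap = fold.init(node);
    graph.visit(node, &mut |child: &N| {
        let r = recurse(fold, graph, child);
        fold.accumulate(&mut heap, &r);
    });
    fold.finalize(&heap)
}

struct Node { value: u64, children: Vec<Node> }

// Concrete fold: sums node values. Every phase call is direct.
struct SumFold;
impl FoldOps<Node, u64, u64> for SumFold {
    fn init(&self, n: &Node) -> u64 { n.value }
    fn accumulate(&self, h: &mut u64, r: &u64) { *h += r; }
    fn finalize(&self, h: &u64) -> u64 { *h }
}

// Concrete graph: children stored inline on the node.
struct VecTree;
impl TreeOps<Node> for VecTree {
    fn visit(&self, n: &Node, cb: &mut dyn FnMut(&Node)) {
        for c in &n.children { cb(c); }
    }
}

fn main() {
    let root = Node { value: 1, children: vec![
        Node { value: 2, children: vec![] },
        Node { value: 3, children: vec![Node { value: 4, children: vec![] }] },
    ]};
    assert_eq!(recurse(&SumFold, &VecTree, &root), 10);
}
```

Both arguments are `&impl`, so only the `cb` callback remains an indirect call, matching the "custom executor" row of the budget table.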

When the budget matters

Path                                     Per-node overhead     When to use
Closure-based Fold + Treeish             3+2K indirect calls   Default — combinators, lifts, sugars
Closure-based Fold + custom TreeOps      K+1 indirect calls    Adjacency lists or graph types where the visit path is the hot side
Custom executor over FoldOps + TreeOps   K indirect calls      Maximum control; sacrifices the lift / pipeline machinery for one specific shape

Why LTO doesn’t help

LLVM cannot devirtualise Rust dyn Fn calls. Rust does not emit the !vcall_visibility metadata that LLVM’s whole-program devirtualisation would need. Neither thin LTO nor fat LTO changes this. The trait-based path is the only reliable way to eliminate dispatch.

See Benchmarks for the measured comparison across all execution modes.

Transformations

This chapter treats features as standalone functions that match the transformation contract. One domain, one base fold, one base graph. Each feature is a named function, defined separately and plugged in with a single method call.

The phase-wrapping contract — each wrapper receives the original phase as a callable reference:

  • wrap_init: Fn(&N, &dyn Fn(&N) -> H) -> H
  • wrap_accumulate: Fn(&mut H, &R, &dyn Fn(&mut H, &R))
  • wrap_finalize: Fn(&H, &dyn Fn(&H) -> R) -> R
#![allow(unused)]
fn main() {
//! Transformations: features as standalone functions that match the contract.
//!
//! One domain, one base fold, one base graph. Each feature is a named
//! function — it IS the concern, separated and reusable. Plugging it
//! in is a single method call on the existing construct.

#[cfg(test)]
mod tests {
    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};
    use hylic::prelude::*;
    use hylic::prelude::memoize_treeish_by;
    use insta::assert_snapshot;

    // ── Domain ──────────────────────────────────────────────

    #[derive(Clone, Debug)]
    struct Task {
        name: String,
        cost_ms: u64,
        deps: Vec<String>,
    }

    struct Registry(HashMap<String, Task>);

    impl Registry {
        fn new(tasks: &[(&str, u64, &[&str])]) -> Self {
            Registry(tasks.iter().map(|(name, cost, deps)| {
                (name.to_string(), Task {
                    name: name.to_string(),
                    cost_ms: *cost,
                    deps: deps.iter().map(|d| d.to_string()).collect(),
                })
            }).collect())
        }
        fn get(&self, name: &str) -> Option<&Task> { self.0.get(name) }
    }

    // ── Shared setup ────────────────────────────────────────

    fn setup() -> (Treeish<Task>, Task) {
        let reg = Registry::new(&[
            ("app",       50,  &["compile", "link"]),
            ("compile",   200, &["parse", "typecheck"]),
            ("parse",     100, &[]),
            ("typecheck", 300, &[]),
            ("link",      150, &[]),
        ]);
        let map = reg.0.clone();
        let g: Treeish<Task> = treeish(move |task: &Task| {
            task.deps.iter().filter_map(|d| map.get(d).cloned()).collect()
        });
        let root = reg.get("app").unwrap().clone();
        (g, root)
    }

    fn base_fold() -> Fold<Task, u64, u64> {
        fold(
            |t: &Task| t.cost_ms,
            |heap: &mut u64, child: &u64| *heap += child,
            |h: &u64| *h,
        )
    }

    // ── Fold phase wrappers ─────────────────────────────────
    //
    // Each is a standalone closure matching the wrap contract:
    //   wrap_init:       Fn(&N, &dyn Fn(&N) -> H) -> H
    //   wrap_accumulate: Fn(&mut H, &R, &dyn Fn(&mut H, &R))
    //   wrap_finalize:   Fn(&H, &dyn Fn(&H) -> R) -> R

    /// Hooks into init: called once per node, before children.
    /// Logs the task name, then delegates to the original init.
    fn visit_logger(sink: Arc<Mutex<Vec<String>>>)
        -> impl Fn(&Task, &dyn Fn(&Task) -> u64) -> u64
    {
        move |task: &Task, orig: &dyn Fn(&Task) -> u64| {
            sink.lock().unwrap().push(task.name.clone());
            orig(task)
        }
    }

    /// Hooks into accumulate: conditionally skips small children.
    /// By not calling orig, the child result is never folded in.
    fn skip_small_children(threshold: u64)
        -> impl Fn(&mut u64, &u64, &dyn Fn(&mut u64, &u64))
    {
        move |heap: &mut u64, child: &u64, orig: &dyn Fn(&mut u64, &u64)| {
            if *child >= threshold { orig(heap, child); }
        }
    }

    /// Hooks into finalize: clamps the result.
    fn clamp_at(max: u64)
        -> impl Fn(&u64, &dyn Fn(&u64) -> u64) -> u64
    {
        move |heap: &u64, orig: &dyn Fn(&u64) -> u64| orig(heap).min(max)
    }

    /// zipmap contract: a plain Fn(&R) -> Extra. No wrapping needed —
    /// the function itself IS the feature. zipmap calls it per node,
    /// pairing the original result with the derived value: R → (R, Extra).
    fn classify(total: &u64) -> &'static str {
        match *total {
            t if t >= 500 => "critical",
            t if t >= 200 => "heavy",
            _ => "light",
        }
    }

    // ── Graph transformations ───────────────────────────────

    fn only_costly_deps(g: &Treeish<Task>, min_cost: u64) -> Treeish<Task> {
        let inner = g.clone();
        treeish(move |task: &Task| {
            inner.at(task)
                .filter(|child: &Task| child.cost_ms >= min_cost)
                .collect_vec()
        })
    }

    // ── Tests ───────────────────────────────────────────────

    #[test]
    fn test_visit_logger() {
        let (graph, root) = setup();
        let visited = Arc::new(Mutex::new(Vec::new()));
        let fold = base_fold().wrap_init(visit_logger(visited.clone()));

        let total = FUSED.run(&fold, &graph, &root);
        let names: Vec<String> = visited.lock().unwrap().clone();
        assert_eq!(total, 800);
        assert_snapshot!("visit_logger", format!(
            "total={total}, visited: {}", names.join(" → ")
        ));
    }

    #[test]
    fn test_skip_small_children() {
        let (graph, root) = setup();
        let fold = base_fold().wrap_accumulate(skip_small_children(200));
        let total = FUSED.run(&fold, &graph, &root);
        // app(50) + compile(200+typecheck 300) = 550; parse(100) and link(150) skipped
        assert_eq!(total, 550);
        assert_snapshot!("skip_small", format!("total={total} (small children skipped)"));
    }

    #[test]
    fn test_clamp_at() {
        let (graph, root) = setup();
        let fold = base_fold().wrap_finalize(clamp_at(500));
        let total = FUSED.run(&fold, &graph, &root);
        // compile=min(600,500)=500, link=150, app=min(50+500+150,500)=500
        assert_eq!(total, 500);
        assert_snapshot!("clamp_at", format!("total={total} (clamped at 500)"));
    }

    #[test]
    fn test_classify() {
        let (graph, root) = setup();
        let (total, category) = FUSED.run(&base_fold().zipmap(classify), &graph, &root);
        assert_eq!(total, 800);
        assert_eq!(category, "critical");
        assert_snapshot!("classify", format!("total={total}, category={category}"));
    }

    #[test]
    fn test_only_costly_deps() {
        let (graph, root) = setup();
        let filtered = only_costly_deps(&graph, 150);
        let total = FUSED.run(&base_fold(), &filtered, &root);
        // parse(100) pruned: app(50)+compile(200)+typecheck(300)+link(150) = 700
        assert_eq!(total, 700);
        assert_snapshot!("only_costly", format!("total={total} (deps with cost < 150 pruned)"));
    }

    #[test]
    fn test_memoize_diamond() {
        let reg = Registry::new(&[
            ("app", 10, &["compile", "link"]),
            ("compile", 50, &["stdlib"]),
            ("link", 30, &["stdlib"]),
            ("stdlib", 200, &[]),
        ]);
        let visit_count = Arc::new(Mutex::new(0u32));
        let vc = visit_count.clone();
        let map = reg.0.clone();
        let graph = treeish(move |task: &Task| {
            *vc.lock().unwrap() += 1;
            task.deps.iter().filter_map(|d| map.get(d).cloned()).collect()
        });
        let root = reg.get("app").unwrap().clone();

        let total = FUSED.run(&base_fold(), &graph, &root);
        let raw_visits = *visit_count.lock().unwrap();

        *visit_count.lock().unwrap() = 0;
        let cached = memoize_treeish_by(&graph, |t: &Task| t.name.clone());
        let total_memo = FUSED.run(&base_fold(), &cached, &root);
        let memo_visits = *visit_count.lock().unwrap();

        assert_eq!((total, raw_visits), (490, 5));
        assert_eq!((total_memo, memo_visits), (490, 4));
        assert_snapshot!("memoize", format!(
            "raw: total={total} visits={raw_visits}, memo: total={total_memo} visits={memo_visits}"
        ));
    }

    #[test]
    fn test_composed_pipeline() {
        let (graph, root) = setup();
        let visited = Arc::new(Mutex::new(Vec::new()));
        let pipeline = base_fold()
            .wrap_init(visit_logger(visited.clone()))
            .wrap_finalize(clamp_at(500))
            .zipmap(classify);

        let (total, category) = FUSED.run(&pipeline, &graph, &root);
        let names: Vec<String> = visited.lock().unwrap().clone();
        assert_eq!(total, 500);
        assert_eq!(category, "critical");
        assert_snapshot!("composed", format!(
            "total={total} [{category}], visited: {}", names.join(" → ")
        ));
    }
}
}

Outputs:

total=800, visited: app → compile → parse → typecheck → link
total=550 (small children skipped)
total=500 (clamped at 500)
total=800, category=critical
total=700 (deps with cost < 150 pruned)
raw: total=490 visits=5, memo: total=490 visits=4
total=500 [critical], visited: app → compile → parse → typecheck → link
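The zipmap contract used by test_classify can be modeled in isolation. This is a sketch of the pairing semantics only; `zipmap_model` is an illustrative helper, not hylic's implementation.

```rust
// Model of the zipmap contract: derive Extra from R, then pair them,
// turning R into (R, Extra) without touching the fold's phases.
fn zipmap_model<R, E>(r: R, f: impl Fn(&R) -> E) -> (R, E) {
    let extra = f(&r);
    (r, extra)
}

fn classify(total: &u64) -> &'static str {
    match *total {
        t if t >= 500 => "critical",
        t if t >= 200 => "heavy",
        _ => "light",
    }
}

fn main() {
    // The pipeline's total of 800 classifies as "critical".
    assert_eq!(zipmap_model(800u64, classify), (800, "critical"));
}
```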

Module resolution

Lazy dependency resolution via SeedPipeline. A grow function resolves dependency references (seeds) into modules (nodes), which may themselves have dependencies. Error handling uses Either<Error, Valid> — error nodes are leaves with no children.

See Seed-based lazy discovery for the SeedPipeline API and its internal mechanics.

#![allow(unused)]
fn main() {
//! Minified module resolution — the pattern that motivated hylic.
//! Demonstrates: SeedPipeline for lazy dependency discovery,
//! error handling via Either, and seeds_for_fallible.

#[cfg(test)]
mod tests {
    use std::collections::HashMap;
    use either::Either;

    use hylic_pipeline::prelude::*;
    use insta::assert_snapshot;


    /// A module has a name and declares dependencies on other modules.
    #[derive(Clone, Debug)]
    struct Module {
        name: String,
        deps: Vec<String>,
    }

    /// A module registry: maps names to module definitions.
    struct Registry(HashMap<String, Module>);

    impl Registry {
        fn new(modules: &[(&str, &[&str])]) -> Self {
            Registry(modules.iter().map(|(name, deps)| {
                (name.to_string(), Module {
                    name: name.to_string(),
                    deps: deps.iter().map(|s| s.to_string()).collect(),
                })
            }).collect())
        }
    }

    /// Error when a module can't be found.
    #[derive(Clone, Debug)]
    struct ResolveError(String);

    /// Resolution result: either an error or a list of resolved module names.
    #[derive(Clone, Debug)]
    struct Resolved {
        modules: Vec<String>,
        errors: Vec<String>,
    }

    #[test]
    fn resolve_modules() {
        let registry = Registry::new(&[
            ("app",    &["logging", "config", "ghost"]),
            ("logging", &["utils"]),
            ("config", &["utils"]),
            ("utils",  &[]),
            // "ghost" is not in the registry — will produce an error
        ]);

        // Node = Either<ResolveError, Module>. `seeds_for_fallible` adapts a
        // valid-side edge function so errors produce no seeds.
        let seeds_from_node: Edgy<Either<ResolveError, Module>, String> =
            seeds_for_fallible(edgy(move |module: &Module| module.deps.clone()));

        // grow: dependency name → Either<Error, Module>.
        let grow = {
            let reg = registry;
            move |dep_name: &String| -> Either<ResolveError, Module> {
                match reg.0.get(dep_name) {
                    Some(m) => Either::Right(m.clone()),
                    None    => Either::Left(ResolveError(format!("not found: {}", dep_name))),
                }
            }
        };

        let collect: Fold<Either<ResolveError, Module>, Resolved, Resolved> = fold(
            |node: &Either<ResolveError, Module>| match node {
                Either::Right(m) => Resolved { modules: vec![m.name.clone()], errors: vec![] },
                Either::Left(e)  => Resolved { modules: vec![], errors: vec![e.0.clone()] },
            },
            |heap: &mut Resolved, child: &Resolved| {
                heap.modules.extend(child.modules.iter().cloned());
                heap.errors.extend(child.errors.iter().cloned());
            },
            |h: &Resolved| h.clone(),
        );

        let pipeline: SeedPipeline<Shared, Either<ResolveError, Module>, String, Resolved, Resolved> =
            SeedPipeline::new(grow, seeds_from_node, &collect);

        let result: Resolved = pipeline.run_from_slice(
            &FUSED,
            &["app".to_string()],
            Resolved { modules: vec![], errors: vec![] },
        );

        assert!(result.modules.contains(&"utils".to_string()));
        assert!(result.modules.contains(&"app".to_string()));
        assert!(result.errors.contains(&"not found: ghost".to_string()));

        assert_snapshot!("resolution", format!(
            "resolved: [{}], errors: [{}]",
            result.modules.join(", "),
            result.errors.join(", "),
        ));
    }
}
}

Output:

resolved: [app, logging, utils, config, utils], errors: [not found: ghost]

Case study — Explainer

explainer_lift is a ShapeLift constructor that wraps a fold with per-node trace recording. It’s a useful case study because it changes H and R (not N), composes as a post-lift, and produces a result type that lets callers inspect the full computation tree.

What it does

#![allow(unused)]
fn main() {
    pub fn explainer_lift<N, H, R>()
        -> ShapeLift<Shared, N, H, R,
                     N,
                     ExplainerHeap<N, H, ExplainerResult<N, H, R>>,
                     ExplainerResult<N, H, R>>
    where N: Clone + Send + Sync + 'static,
          H: Clone + Send + Sync + 'static,
          R: Clone + Send + Sync + 'static,
    {
        let fold_xform: <Shared as ShapeCapable<N>>::FoldXform<
            H, R, N,
            ExplainerHeap<N, H, ExplainerResult<N, H, R>>,
            ExplainerResult<N, H, R>,
        > = Arc::new(move |f: Fold<N, H, R>| {
            let f1 = f.clone();
            let f2 = f.clone();
            let f3 = f;
            sfold::fold(
                move |n: &N| ExplainerHeap::new(n.clone(), f1.init(n)),
                move |heap: &mut ExplainerHeap<N, H, ExplainerResult<N, H, R>>,
                      child: &ExplainerResult<N, H, R>| {
                    f2.accumulate(&mut heap.working_heap, &child.orig_result);
                    heap.transitions.push(ExplainerStep {
                        incoming_result: child.clone(),
                        resulting_heap:  heap.working_heap.clone(),
                    });
                },
                move |heap: &ExplainerHeap<N, H, ExplainerResult<N, H, R>>| ExplainerResult {
                    orig_result: f3.finalize(&heap.working_heap),
                    heap:        heap.clone(),
                },
            )
        });
        ShapeLift::new(
            <Shared as ShapeCapable<N>>::identity_treeish_xform(),
            fold_xform,
        )
    }
}

The lift wraps:

  • H becomes ExplainerHeap<N, H, ExplainerResult<N, H, R>>: the original H plus a vector of per-child transitions recorded during accumulate.
  • R becomes ExplainerResult<N, H, R>: the original result plus the full heap (so callers can walk the trace tree).

Every node’s finalize produces both the original R and the recorded history.

Usage

Via the sugar method .explain() on any Stage-2 pipeline — a treeish-rooted Stage2Pipeline, a seed-rooted Stage2Pipeline, or a TreeishPipeline via auto-lift. A SeedPipeline requires an explicit .lift() first:

#![allow(unused)]
fn main() {
    #[test]
    fn explainer_usage() {
        use hylic_pipeline::prelude::*;

        #[derive(Clone)]
        struct N { val: u64, children: Vec<N> }

        let f: Fold<N, u64, u64> = fold(
            |n: &N| n.val,
            |h: &mut u64, c: &u64| *h += c,
            |h: &u64| *h,
        );
        let root = N { val: 1, children: vec![N { val: 2, children: vec![] }] };

        let trace: ExplainerResult<N, u64, u64> =
            TreeishPipeline::new(treeish(|n: &N| n.children.clone()), &f)
                .lift()
                .then_lift(Shared::explainer_lift::<N, u64, u64>())
                .run_from_node(&FUSED, &root);
        assert_eq!(trace.orig_result, 3);
    }
}

The return type is ExplainerResult<N', H, R> where N' is the chain’s current node type — N on a treeish-rooted chain, but SeedNode<N> on a seed-rooted chain (since the seed chain’s node type is SeedNode<N> from .lift() onward). Access .orig_result for the original computation’s output:

#![allow(unused)]
fn main() {
    #[test]
    fn explainer_orig_result() {
        use hylic_pipeline::prelude::*;

        #[derive(Clone)]
        struct Node { v: u64, ch: Vec<Node> }
        let root = Node { v: 3, ch: vec![
            Node { v: 2, ch: vec![] },
            Node { v: 1, ch: vec![] },
        ]};

        let tp: TreeishPipeline<Shared, Node, u64, u64> = TreeishPipeline::new(
            treeish(|n: &Node| n.ch.clone()),
            &fold(|n: &Node| n.v, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h),
        );

        let trace: ExplainerResult<Node, u64, u64> = tp
            .explain()
            .run_from_node(&FUSED, &root);
        // Sum = 3 + 2 + 1 = 6.
        assert_eq!(trace.orig_result, 6);
        // Every non-leaf records its child-accumulations.
        assert!(!trace.heap.transitions.is_empty());
    }
}

Sealed view on the seed path

For an N-typed view of the trace that hides SeedNode entirely, project via the standard From conversion:

use hylic::prelude::SeedExplainerResult;

let raw: ExplainerResult<SeedNode<N>, H, R> =
    pipeline.lift().explain().run_from_slice(&FUSED, &seeds, h0);
let sealed: SeedExplainerResult<N, H, R> = raw.into();

// sealed.entry_initial_heap, entry_working_heap, orig_result — EntryRoot row promoted out
// sealed.roots: Vec<ExplainerResult<N, H, R>>                — per-seed subtrees

Use raw when you need to keep composing lifts on top of .explain() (the chain type is what matters); use sealed when you want an N-typed view for formatting or assertions — the library’s invariant guarantees every below-root node is a Node(n), so the unwrap is total.

Composing with other lifts

Because explain() is just a then_lift(Shared::explainer_lift()), it composes:

let r = pipeline
    .wrap_init(|n, orig| orig(n) * 2)   // first lift
    .explain()                           // records the wrap_init results
    .zipmap(|r| r.orig_result > 100);    // inspect .orig_result

Order matters: lifts run bottom-up (the first .wrap_init runs innermost; .explain sees its results; .zipmap sees the ExplainerResult).
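That bottom-up ordering can be modeled with plain closure nesting, independent of hylic's types; `composed` and its locals are illustrative names, not library API.

```rust
// Model of bottom-up lift composition: each wrapper closes over the
// previous phase, so the wrapper applied first runs innermost.
fn composed(n: &u64) -> u64 {
    let base = |n: &u64| *n;                  // original init
    let doubled = move |n: &u64| base(n) * 2; // first lift (innermost)
    let logged = move |n: &u64| {             // second lift observes doubled's result
        let h = doubled(n);
        println!("init({n}) = {h}");
        h
    };
    logged(n)
}

fn main() {
    assert_eq!(composed(&21), 42);
}
```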

Streaming variant

Shared::explainer_describe_lift(fmt, emit) emits formatted trace lines per node via a callback and leaves MapR = R unchanged:

use hylic::prelude::*;
let _ = Shared::explainer_describe_lift::<Node, u64, u64, _, _>(
    trace_fold_compact::<Node, u64, u64>,
    |line: &str| eprintln!("[trace] {line}"),
);

A Local mirror of the streaming variant is deferred (blocked on the Send + Sync bound in the formatter); the plain explainer_lift is available for Local.

Import patterns

Two preludes cover everything most users need:

  • hylic::prelude::* — core: domain markers (Shared / Local / Owned), Shared-default Fold and Treeish constructors, executor helpers (FUSED, exec), every lift atom (Lift, IdentityLift, ComposedLift, ShapeLift, SeedLift, LiftBare, SeedNode), and explainer/format helpers.
  • hylic_pipeline::prelude::* — re-exports the core prelude plus pipeline typestates (SeedPipeline, TreeishPipeline, Stage2Pipeline, OwnedPipeline), source traits (TreeishSource, PipelineExec, PipelineExecOnce, PipelineSourceOnce), and the sugar trait families (SeedSugars*, TreeishSugars*, Stage2Sugars*).

A complete program — fold + graph + executor — needs exactly one prelude line:

#![allow(unused)]
fn main() {
use hylic::prelude::*;

let fold  = fold(|n: &i32| *n as u64,
                 |h: &mut u64, c: &u64| *h += c,
                 |h: &u64| *h);
let graph = treeish(|n: &i32| if *n > 1 { vec![n - 1, n - 2] } else { vec![] });
let total = FUSED.run(&fold, &graph, &5);
}

FUSED is the sequential executor, available as a const on the Shared domain. fold and treeish are the Shared-default constructors — for Local or Owned, take the per-domain path (below).
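For intuition about what that snippet computes, here is the equivalent hand-written recursion, a standalone sketch that does not depend on hylic:

```rust
// Hand-written equivalent of the fold/graph pair above: a node n > 1
// yields children n-1 and n-2, and the fold sums node values over the
// whole (unmemoized) call tree.
fn total(n: i32) -> u64 {
    let mut h = n as u64;
    if n > 1 {
        h += total(n - 1) + total(n - 2);
    }
    h
}

fn main() {
    // total(5) = 5 + total(4) + total(3) = 5 + 14 + 7 = 26
    assert_eq!(total(5), 26);
}
```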

Switching domains

For Local or Owned construction, address the domain module directly. The closures don’t change; only the constructor and the executor binding do:

#![allow(unused)]
fn main() {
    #[test]
    fn domain_switching() {
        use hylic::domain::{shared as sdom, local as ldom, owned as odom};
        use hylic::graph::treeish_visit;

        #[derive(Clone)]
        struct N { val: u64, children: Vec<N> }

        // Same closures used to build a fold in each domain.
        let init = |n: &N| n.val;
        let acc  = |h: &mut u64, c: &u64| *h += c;
        let fin  = |h: &u64| *h;
        fn children(n: &N, cb: &mut dyn FnMut(&N)) {
            for c in &n.children { cb(c); }
        }

        let root = N { val: 1, children: vec![N { val: 2, children: vec![] }] };

        // Shared (Arc):
        let r1: u64 = sdom::FUSED.run(
            &sdom::fold(init, acc, fin),
            &treeish_visit(children),
            &root,
        );
        // Local (Rc):
        let r2: u64 = ldom::FUSED.run(
            &ldom::fold(init, acc, fin),
            &treeish_visit(children),
            &root,
        );
        // Owned (Box):
        let r3: u64 = odom::FUSED.run(
            &odom::fold(init, acc, fin),
            &treeish_visit(children),
            &root,
        );

        assert_eq!(r1, r2);
        assert_eq!(r2, r3);
    }
}

Parallel execution

Funnel comes in through the prelude as the funnel module:

#![allow(unused)]
fn main() {
use hylic::prelude::*;
let total = exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
}

Spec presets (default, for_wide_light, for_deep_narrow, …) are documented in Funnel policies. For amortised pool reuse across many folds, use .session(|s| s.run(...)).

Pipeline programs

Pipelines layer on the same imports — switch to the pipeline prelude:

#![allow(unused)]
fn main() {
use hylic_pipeline::prelude::*;
}

That single line brings the core prelude with it; users do not import hylic::prelude separately. From there, every Stage-1 constructor (SeedPipeline::new, TreeishPipeline::new, OwnedPipeline::new) and every sugar (.lift(), .then_lift(…), .zipmap(…), .wrap_init(…), .explain(), .run(…)) is in scope.

A full pipeline example is at the end of Pipelines — overview.

When you need bare module paths

The preludes cover normal usage. The bare module paths are useful for:

  • Generic code over executors or operations

    #![allow(unused)]
    fn main() {
    use hylic::ops::{FoldOps, TreeOps};
    use hylic::exec::Executor;
    }
  • Per-domain primitives (e.g. when you keep hylic::prelude and want Local constructors visible at the same names): import the domain module under an alias —

    #![allow(unused)]
    fn main() {
    use hylic::domain::local as ldom;
    let lf = ldom::fold(|n: &i32| *n as u64,
                        |h: &mut u64, c: &u64| *h += c,
                        |h: &u64| *h);
    ldom::FUSED.run(&lf, &graph, &root);
    }
  • Helpers under hylic::prelude::* that come in via the wildcard but are sometimes worth naming explicitly: Traced, memoize_treeish, VecFold, vec_fold, the explainer trace formatters, TreeFormatCfg. Reach for the qualified path when the call site benefits from the extra signal at a glance.

Module map
  • hylic::prelude: fold, treeish, FUSED, exec, funnel, SeedNode, lift atoms, explainer.
  • hylic::ops: FoldOps, TreeOps; not re-exported by the prelude (generic code only).
  • hylic::exec: Executor, Exec, fused, funnel; the prelude re-exports the public surface.
  • hylic::domain::{shared,local,owned}: Fold, edgy, FUSED, exec; the prelude defaults to Shared.
  • hylic_pipeline::prelude: hylic::prelude plus the pipeline types and sugars (re-exports the core prelude).

Concept map

How the pieces fit together.

The three axes

hylic is built on three orthogonal axes. Each can be chosen independently:

  • WHAT to compute (operations): FoldOps (init / accumulate / finalize) and TreeOps (visit, yielding children).
  • HOW closures are boxed (domain): Shared (Arc<dyn Fn + Send + Sync>), Local (Rc<dyn Fn>), Owned (Box<dyn Fn>).
  • HOW to traverse (executor): Fused (zero-overhead callback) or Funnel (CPS work-stealing, Shared only).

Operations define the computation. Domain determines boxing overhead. Executor determines traversal strategy. Any combination works (subject to domain compatibility).

Type landscape

  • Operations traits (ops::): FoldOps<N, H, R> (init / accumulate / finalize) and TreeOps<N> (visit / apply).
  • Domain types (one per domain): shared::Fold<N,H,R> (Arc closures), local::Fold<N,H,R> (Rc closures), and owned::Fold<N,H,R> (Box closures) all impl FoldOps; a user struct can impl FoldOps directly (zero boxing). graph::Treeish<N> (Arc closures) and user structs impl TreeOps.
  • Lifts (prelude::): Lift<N, N2> (lift_fold / lift_treeish / lift_root); the Explainer (computation trace) is one impl.
  • Executors (exec::): Exec<D, S> is the sole user-facing type; it drives &impl FoldOps and &impl TreeOps, with S = fused::Spec (all domains) or S = funnel::Spec<P> (Shared domain, work-stealing).

How a user navigates

  1. Bring the prelude in: use hylic::prelude::*;
  2. Build fold + graph: fold(...) and treeish(...).
  3. Run: FUSED.run(&fold, &graph, &root) — or, for parallel, exec(funnel::Spec::default(8)).run(...).

The prelude defaults to the Shared domain — fold(…) / treeish(…) / FUSED resolve to the Shared constructors. For Local or Owned, take the per-domain path; see Import patterns.

Domain compatibility matrix

                         Shared   Local   Owned
Fused                    yes      yes     yes
Funnel                   yes
Explainer                yes      yes
Pipeline (Stage 1 + 2)   yes      yes
OwnedPipeline                             yes

Fused supports all domains (borrows, never clones). Funnel requires N: Clone + Send, R: Send, G: Send + Sync — the Shared domain provides these. The Stage-1/Stage-2 pipeline surface is mirrored across Shared and Local (SeedSugarsShared / SeedSugarsLocal, Stage2SugarsShared / Stage2SugarsLocal, …); the Owned domain gets the dedicated one-shot OwnedPipeline instead, since its closures consume on first use and cannot back a Clone-able chain.

Zero-boxing path

For maximum performance, skip the domain system entirely. Implement FoldOps and TreeOps on concrete structs — the compiler monomorphises every call, eliminating the dyn Fn indirections. See Zero-cost performance for the worked walkthrough.

Domain system

A domain controls how fold closures are stored — the boxing strategy that determines the refcount overhead, thread-safety, and transformation semantics of a Fold<N, H, R>. Graph types are domain-independent and live in a separate module (hylic::graph).

The three domains

The three built-in domains form a spectrum from maximum capability to minimum overhead:

Shared: Arc<dyn Fn + Send + Sync> (Clone, Send, Sync)
  -> Local: Rc<dyn Fn> (Clone, !Send) — drops the atomic refcount
  -> Owned: Box<dyn Fn> (!Clone, !Send) — drops the refcount entirely

Domain   Fold storage                 Clone   Send+Sync   Fold transforms   Executors
Shared   Arc<dyn Fn + Send + Sync>    yes     yes         borrow (&self)    Fused, Funnel
Local    Rc<dyn Fn>                   yes     no          borrow (&self)    Fused
Owned    Box<dyn Fn>                  no      no          move (self)       Fused

The domain affects only the fold. Graph types (Treeish, Edgy, Graph) are always Arc-based because graph composition requires Clone. The executor accepts any graph type that implements TreeOps<N> — the graph’s storage is checked at the call site, not through the domain.

Module structure

Each domain module provides fold constructors and executor bindings. Graph types are in a separate public module:

domain/
  shared/fold.rs    Fold (Arc) + fold(), exec(), FUSED
  local/mod.rs      Fold (Rc)  + fold(), exec(), FUSED
  owned/mod.rs      Fold (Box) + fold(), exec(), FUSED

graph/
  edgy.rs           Edgy<N,E>, Treeish<N> (Arc) + combinators
  compose.rs        Graph

A typical program imports the prelude — every domain marker, the Shared-default constructors, and the graph constructors come with it:

#![allow(unused)]
fn main() {
use hylic::prelude::*;
}

For Local or Owned construction, address the per-domain module directly (hylic::domain::local, hylic::domain::owned) — see Import patterns.

The Domain trait

The Domain trait provides a single associated type — the concrete Fold type for each domain:

#![allow(unused)]
fn main() {
pub trait Domain<N: 'static>: 'static {
    type Fold<H: 'static, R: 'static>: FoldOps<N, H, R>;
    type Graph<E: 'static> where E: 'static;
    type Grow<Seed: 'static, NOut: 'static>;

    /// Construct a fold from three closures. Uniform Send+Sync
    /// bound; each domain sheds Send+Sync at storage time if it
    /// doesn't need it.
    fn make_fold<H: 'static, R: 'static>(
        init: impl Fn(&N) -> H + Send + Sync + 'static,
        acc:  impl Fn(&mut H, &R) + Send + Sync + 'static,
        fin:  impl Fn(&H) -> R + Send + Sync + 'static,
    ) -> Self::Fold<H, R>;

    /// Construct a grow closure from a Fn. Uniform Send+Sync bound.
    fn make_grow<Seed: 'static, NOut: 'static>(
        f: impl Fn(&Seed) -> NOut + Send + Sync + 'static,
    ) -> Self::Grow<Seed, NOut>;

    /// Invoke a stored grow closure.
    fn invoke_grow<Seed: 'static, NOut: 'static>(
        g: &Self::Grow<Seed, NOut>,
        s: &Seed,
    ) -> NOut;

    /// Construct a graph (Edgy) closure. Uniform Send+Sync bound.
    fn make_graph<E: 'static>(
        visit: impl Fn(&N, &mut dyn FnMut(&E)) + Send + Sync + 'static,
    ) -> Self::Graph<E>;
}
}

The Executor trait is parameterized by D: Domain<N>, so the compiler resolves D::Fold<H, R> to the concrete fold type at monomorphization time. The graph type is a separate type parameter G: TreeOps<N> on the Executor trait, constrained per executor implementation (Fused accepts any G; Funnel requires G: Send+Sync).

#![allow(unused)]
fn main() {
/// Run a fold on a tree. Both Specs and Sessions implement this.
///
/// The fold is domain-specific (`D::Fold<H, R>`). The graph type G
/// is a trait-level parameter — each executor impl declares its own
/// bounds on G (e.g. Fused accepts any TreeOps, Funnel requires
/// Send+Sync). The compiler checks G at the call site.
pub trait Executor<N: 'static, R: 'static, D: Domain<N>, G: TreeOps<N> + 'static> {
    /// Run the given `fold` over the `graph` starting at `root` and
    /// return the fold's final result for the root.
    fn run<H: 'static>(&self, fold: &D::Fold<H, R>, graph: &G, root: &N) -> R;
}
}

FoldOps and TreeOps

The operations traits provide the universal interface that executors program against:

[Diagram: FoldOps<N, H, R> (init / accumulate / finalize) is implemented by shared::Fold (Arc), local::Fold (Rc), owned::Fold (Box), or a user struct; TreeOps<N> (visit / apply) is implemented by graph::Treeish (Arc) or a user struct.]

Any type implementing init/accumulate/finalize is a fold. Any type implementing visit is a graph. The executor’s recursion engine operates on these traits, not on concrete types.

Why the domain is on the executor

Fold<N, H, R> has no domain parameter — the domain is a type parameter on the executor: Exec<D, S>. This resolves a type inference problem: GATs are not injective (D::Fold<H, R> does not uniquely identify D), so placing the domain on the fold would prevent the compiler from inferring the domain from the argument types. With D on the executor, each constant (shared::FUSED, local::FUSED, owned::FUSED) or <domain>::exec(...) call fixes D, and the compiler resolves everything statically. See Domain integration.

Choosing a domain

Shared is the default choice. It supports parallel execution (Funnel requires Send+Sync folds), lift integration (Explainer operates on Shared folds), and non-destructive fold transformations (the original fold is preserved after map/contramap/product).

Local provides the same transformation API with lighter refcounting (Rc vs Arc — non-atomic vs atomic increment). It works with the Fused executor for single-threaded computation.

Owned eliminates refcounting entirely. Fold transformations consume the original (move semantics). Useful for measuring the framework’s raw overhead in benchmarks, or for single-use folds where the original is not needed after transformation.

All three domains provide the same fold combinator surface: wrap_init, wrap_accumulate, wrap_finalize, map, zipmap, contramap, product. The difference is in the calling convention (borrow vs move) and the auto-traits (Send+Sync vs neither).

Implementation notes

Technical specifics of how hylic stores closures, traverses graphs, and erases types across the lift family.

Closure storage

The three functions in a Fold<N, H, R> (init, accumulate, finalize) are stored as type-erased closures behind Arc:

#![allow(unused)]
fn main() {
pub struct Fold<N, H, R> {
    pub(crate) impl_init: Arc<dyn Fn(&N) -> H + Send + Sync>,
    pub(crate) impl_accumulate: Arc<dyn Fn(&mut H, &R) + Send + Sync>,
    pub(crate) impl_finalize: Arc<dyn Fn(&H) -> R + Send + Sync>,
}
}

Type erasure (dyn Fn) means every Fold produced by map/zipmap shares the concrete type Fold<N, H, R>, so combinators compose without per-lift type explosion.

Arc is required because Fold is Clone (the lift layer clones it once per phase closure). Box<dyn Fn> is not Clone; Arc<dyn Fn> increments a refcount.

The Local domain uses Rc<dyn Fn> (lighter refcount, single-threaded). The Owned domain uses Box<dyn Fn> (no refcount, no Clone, single-shot).

Fold, Edgy, Graph, SeedPipeline, and related types implement Clone by hand. A derived Clone would constrain the type parameters to Clone, a bound the contained Arc/Edgy/Fold fields do not need.
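A minimal sketch of the storage pattern described above (MiniFold and sum_fold are illustrative stand-ins, not hylic's actual definitions): Arc'd type-erased closures with a hand-written Clone that only bumps refcounts and places no Clone bound on N, H, R.

```rust
use std::sync::Arc;

// Stand-in for the Arc-backed fold shape: three type-erased closures.
struct MiniFold<N, H, R> {
    init: Arc<dyn Fn(&N) -> H + Send + Sync>,
    acc: Arc<dyn Fn(&mut H, &R) + Send + Sync>,
    fin: Arc<dyn Fn(&H) -> R + Send + Sync>,
}

impl<N, H, R> Clone for MiniFold<N, H, R> {
    fn clone(&self) -> Self {
        // A derived Clone would demand N: Clone, H: Clone, R: Clone;
        // cloning the Arcs needs no such bounds.
        MiniFold {
            init: Arc::clone(&self.init),
            acc: Arc::clone(&self.acc),
            fin: Arc::clone(&self.fin),
        }
    }
}

fn sum_fold() -> MiniFold<u64, u64, u64> {
    MiniFold {
        init: Arc::new(|n| *n),
        acc: Arc::new(|h, r| *h += *r),
        fin: Arc::new(|h| *h),
    }
}

fn main() {
    let f = sum_fold();
    let g = f.clone(); // refcount bump, not a closure copy
    let mut h = (f.init)(&10);
    (g.acc)(&mut h, &5);
    assert_eq!((f.fin)(&h), 15);
}
```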

Graph traversal

Edgy<N, E> (and Treeish<N> = Edgy<N, N>) stores a callback-based visit closure:

#![allow(unused)]
fn main() {
/// A `NodeT → EdgeT*` function wrapped as a clonable Arc-backed
/// struct. When `NodeT = EdgeT` the type is typically named
/// [`crate::graph::Treeish`].
pub struct Edgy<NodeT, EdgeT> {
    impl_visit: Arc<dyn Fn(&NodeT, &mut dyn FnMut(&EdgeT)) + Send + Sync>,
}
}

The signature is Fn(&N, &mut dyn FnMut(&E)). Children are visited by reference; no allocation per traversal. When a Vec is needed (parallel iteration, for instance), apply() collects via the callback. Edgy::at(node) returns a Visit — a zero-allocation push-based iterator with map, filter, fold, collect_vec.
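The callback-visit shape can be sketched without any hylic types (Node, visit, and child_values here are illustrative stand-ins): children are pushed by reference into a borrowed FnMut rather than returned as an allocated collection.

```rust
// Stand-in node type with materialized children.
struct Node { value: u32, children: Vec<Node> }

// The Fn(&N, &mut dyn FnMut(&E)) shape: no per-traversal Vec.
fn visit(n: &Node, f: &mut dyn FnMut(&Node)) {
    for c in &n.children {
        f(c); // visited by reference
    }
}

// When a Vec is genuinely needed, collecting via the callback is a
// one-liner (the analogue of apply()).
fn child_values(n: &Node) -> Vec<u32> {
    let mut out = Vec::new();
    visit(n, &mut |c| out.push(c.value));
    out
}

fn main() {
    let t = Node {
        value: 1,
        children: vec![
            Node { value: 2, children: vec![] },
            Node { value: 3, children: vec![] },
        ],
    };
    assert_eq!(child_values(&t), vec![2, 3]);
}
```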

The Lift trait

Lift<N, N2> has two GATs: MapH<H, R> and MapR<H, R>. H and R are method-level parameters on lift_fold<H, R>, not trait-level parameters, so they’re inferred from the fold at each call site. The trait is a bifunctor on the (H, R) pair.

Concrete lifts implement Lift directly as structs. Explainer is a unit struct; SeedLift carries a grow function and is used internally by SeedPipeline. Automatic composition is provided by a blanket ComposedLift impl — no per-lift boilerplate.

ConstructFold: domain-generic fold construction

ConstructFold<N> constructs a D::Fold<H, R> from three closures, generic over D. Each domain implements it with its own storage strategy: Shared wraps in Arc, Local in Rc.

Shared’s fold constructor requires closures to be Send + Sync, but the trait signature is uniform across domains. make_fold is therefore an unsafe fn with a documented contract — for the Shared impl, callers must pass closures that are actually Send + Sync. The Shared impl uses AssertSend<T> (an unsafe-marked Send+Sync wrapper) with method-call capture (.get()) to satisfy the compiler.

The method-call pattern matters under Rust 2021 precise captures: (wrapper.0)(n) captures the inner field (and bypasses the Send assertion); wrapper.get()(n) captures the whole wrapper.

Reserved for downstream lift implementations that need domain-generic fold construction without going through the typestate pipeline.

Module visibility

graph/ is pub — it holds the domain-independent graph types (Edgy, Treeish, Graph) that every other module imports. fold/ is pub(crate) and contains domain-independent combinator functions used by the per-domain Fold implementations.

Each domain owns its Fold type in domain/{shared,local,owned}/fold.rs. exec/ and ops/ are pub: exec for executors (Executor, Exec, fused, funnel); ops for the operations traits (FoldOps, TreeOps) and the lift atoms (Lift, ShapeLift, SeedLift, …).

The prelude module

Types in prelude/ are built on the core but optional to use:

  • VecFold / VecHeap — convenience fold that collects all children before finalizing.
  • Explainer — computation tracing as a Lift.
  • TreeFormatCfg — tree-to-string formatting.
  • Traced — path tracking for tree nodes.
  • memoize_treeish — graph-level caching for DAGs.
  • seeds_for_fallible — fallible seed pattern for Either<Error, Valid> graphs.

Sibling crates

The following subsystems live in sibling crates and are documented in their own source:

  • hylic-benchmark — Rayon executor, Sequential executor, benchmark scenarios and runners.
  • hylic-pipeline — typestate builder over hylic’s lift primitives. See Pipelines.

Theory notes

hylic implements patterns from the theory of recursion schemes, adapted for Rust’s type system. This page maps hylic’s types to their formal names.

Catamorphism

A catamorphism is a bottom-up fold over a recursive structure. The algebra is F R → R — given one layer of structure with children already folded to R, produce R. The carrier type R is the result at every subtree.

hylic factors this algebra into three steps through an intermediate type H:

F R → R  =  init(&N) → H, accumulate(&mut H, &R) per child, finalize(&H) → R

H is mutable working state internal to each node. R is the immutable result that flows between nodes. The bracket (init opens H, finalize closes to R) makes the invariant boundary explicit. See The N-H-R algebra factorization for the comparison with Milewski’s monoidal decomposition and the equivalence under associative ⊕.
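The factorization can be written out without any hylic machinery (Dir and cata are illustrative stand-ins, echoing the book's directory-size example): H is the mutable running total opened by init, each child's R is folded in by accumulate, and finalize closes H to the node's own R.

```rust
// Stand-in node type: a directory with its own size and children.
struct Dir { size: u64, children: Vec<Dir> }

fn init(d: &Dir) -> u64 { d.size }                 // &N -> H
fn accumulate(h: &mut u64, r: &u64) { *h += *r }   // (&mut H, &R)
fn finalize(h: &u64) -> u64 { *h }                 // &H -> R

// The catamorphism: F R -> R, expressed through the three steps.
fn cata(d: &Dir) -> u64 {
    let mut h = init(d);
    for c in &d.children {
        let r = cata(c);        // child already folded to R
        accumulate(&mut h, &r);
    }
    finalize(&h)
}

fn main() {
    let tree = Dir { size: 10, children: vec![
        Dir { size: 5, children: vec![] },
        Dir { size: 3, children: vec![Dir { size: 2, children: vec![] }] },
    ]};
    assert_eq!(cata(&tree), 20);
}
```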

Hylomorphism

When the tree structure is not materialized but discovered on demand (via a Treeish backed by lazy child discovery), the unfold (anamorphism) and fold (catamorphism) fuse — the tree exists only as a call stack, never as a data structure. This is a hylomorphism.

In hylic, every Exec::run() call is a hylomorphism: the executor receives a coalgebra (Treeish<N>, which produces children on demand) and an algebra (FoldOps<N, H, R>, which consumes them), and fuses both in a single recursive pass. (N, Treeish<N>) is hylic’s runtime equivalent of the type-level Fix (f a) — the pair describes a root and a way to get children, recursively, without materializing the tree.

The Funnel executor parallelizes the hylomorphism using CPS and defunctionalized tasks.
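A toy fused unfold+fold, with no hylic types involved: the "tree" of integer ranges is never materialized. Each call splits its range (the coalgebra) and sums the recursive results (the algebra); the tree exists only as the call stack.

```rust
// Hylomorphism sketch: sum of lo..=hi by range splitting. No range
// node is ever stored; discovery and folding happen in one pass.
fn hylo(lo: u64, hi: u64) -> u64 {
    if lo == hi {
        return lo; // leaf: nothing left to unfold
    }
    let mid = lo + (hi - lo) / 2;
    // Unfold into two children, fold their results immediately.
    hylo(lo, mid) + hylo(mid + 1, hi)
}

fn main() {
    assert_eq!(hylo(1, 100), 5050);
}
```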

Anamorphism (seed-based discovery)

An anamorphism builds recursive structure from a seed. SeedPipeline encapsulates this: given a seed edge function (Edgy<N, Seed>) and a grow function (Fn(&Seed) → N), it constructs the treeish by composing seeds_from_node.map(grow) and handles the entry transition. Internally, SeedLift implements Lift to express the SeedNode<N> indirection as a fold transformation.

Histomorphism (fold with history)

The Explainer records the full computation trace at every node — initial heap, each child result folded in, and the final result. This corresponds to a histomorphism: a catamorphism where each node has access to the full computation history of its subtree.

The Explainer’s output (ExplainerResult) is analogous to the cofree comonad annotation. It is expressed as a Lift — a fold transformation that changes the carrier types (H → ExplainerHeap, R → ExplainerResult). The original R is accessible via ExplainerResult::orig_result.

Algebra morphism (Lift)

Lift<N, N2> maps one fold algebra into another. It transforms the carrier types through two GATs (MapH<H, R>, MapR<H, R>) and can change the node type (N → N2) by extending the tree structure with new constructors.

The SeedLift extends the tree with an entry-root constructor (SeedNode<N>: EntryRoot, Node) — the EntryRoot row’s children are the per-seed grown nodes. The Explainer enriches the heap with trace data without changing the node type. Both are algebra morphisms: they transform the F R → R algebra into a different F' R' → R' algebra over a richer domain.

lift::run_lifted applies the three trait methods (lift_treeish, lift_fold, lift_root), runs the lifted computation, and returns MapR<H, R>.

Externalized tree structure

Classical recursion schemes encode tree structure via fixed points of functors (Fix F). The functor F defines one layer of shape (leaf, binary node, n-ary node), and Fix F is the recursive nesting.

hylic externalizes this as a runtime function: Treeish<N> is Fn(&N, &mut dyn FnMut(&N)). The node type N carries identity, not structure — the same N can be traversed by different treeish functions, and the same fold works with any tree shape. This trades compile-time structural guarantees for the orthogonal decomposition of fold, graph, and executor.

The pair (N, Treeish<N>) corresponds to a coalgebra N → F N — the treeish IS the coalgebra, producing one layer of children on demand. Combined with the fold algebra, the executor performs a fused hylomorphism.

Operations traits and domain abstraction

FoldOps<N, H, R> and TreeOps<N> abstract the fold and graph operations from their storage. The standard types (Fold, Treeish) store closures behind Arc (the Shared domain). Alternative implementations can use Rc (Local), Box (Owned), or concrete structs (zero-boxing). The executor’s recursion engine takes &impl FoldOps + &impl TreeOps — fully generic over the storage, monomorphized to zero overhead for concrete types.

The Domain trait with GATs maps the marker type (Shared, Local, Owned) to concrete fold types. Graph types are domain-independent — always Arc-based, always Send + Sync. The domain controls only how fold closures are stored.

Further reading

  • Meijer, Fokkinga, Paterson. Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire. (1991) — the original recursion schemes paper.
  • Milewski. Monoidal Catamorphisms (2020) — a different algebra factorization. See comparison.
  • Gonzalez. foldl — the left-fold-with-extraction pattern.
  • Kmett. recursion-schemes — Haskell reference implementation.
  • Malick. recursion.wtf — practical recursion schemes in Rust.

The N-H-R algebra factorization

A catamorphism’s algebra collapses one layer of recursive structure. The standard formulation is a single morphism F R → R. Both hylic and Milewski’s monoidal catamorphism factor this morphism into composable steps. They factor it differently. This page establishes the precise relationship between the two and shows when one can be derived from the other.

The two formulations

              hylic                       Milewski
Extract       init: &N → H                s: a → m (scatter)
Combine       acc: &mut H, &R             ⊕: m × m → m (monoid)
Output        fin: &H → R (every node)    g: m → b (root only)
Working type  H (unconstrained)           m (associative, with identity)
Carrier       R                           m

In hylic, the carrier is R. Every subtree produces R. In Milewski, the carrier is m. Every subtree produces m, and a separate function g converts to the output type b once at the root.

The bracket

At each node, init opens mutable working state H, accumulate folds each child’s R into it, and finalize closes it to R. The heap H never crosses node boundaries. Only R flows between nodes.

[Diagram: each child runs init(Nᵢ) → H and fin(H) → R (a leaf closes its bracket immediately); the parent runs init(N₀) → H, then acc(&mut H, &R₁) and acc(&mut H, &R₂) as the child results arrive, then fin(H) → R.]

Green is H-world (mutable working state). Blue is R (immutable result). The green-to-blue transition at each node is the finalize step, the bracket closing.

The node type N seeds the heap but is not part of the algebra. It is the node’s identity; the recursive structure lives in Treeish<N>, not in N. The pair (N, Treeish<N>) is hylic’s runtime equivalent of Milewski’s type-level Fix (f a).

The bracket separates mutable working state from immutable results. H can be a growable Vec while R is a frozen Arc<[T]>, for example. Without the bracket, the user would either accumulate into Arc (expensive reallocation on every push) or return Vec as the result (wrong invariant for the parent, which expects immutable data). The Rust type system reinforces this: &mut H is single-owner and never shared, while R can be Send and cross thread boundaries. The Funnel executor exploits this directly. R values are delivered across threads via slot delivery; H stays on the sweeping thread. For single-child nodes, the bracket is carried as a direct continuation with no allocation and no atomic. Each phase can be wrapped independently via wrap_init, wrap_accumulate, wrap_finalize.
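The H-vs-R split from the paragraph above, made concrete (a sketch, not hylic code): H is a growable Vec private to one node, while R is a frozen Arc<[u64]> that can be handed to the parent and shared across threads.

```rust
use std::sync::Arc;

fn init(n: &u64) -> Vec<u64> { vec![*n] }            // open mutable H

fn accumulate(h: &mut Vec<u64>, r: &Arc<[u64]>) {    // fold a child's R in
    h.extend_from_slice(r);
}

fn finalize(h: &[u64]) -> Arc<[u64]> {               // freeze H into R
    h.to_vec().into()
}

fn main() {
    let mut h = init(&1);
    let child_r: Arc<[u64]> = vec![2, 3].into(); // frozen child result
    accumulate(&mut h, &child_r);
    let r = finalize(&h);
    assert_eq!(&r[..], &[1, 2, 3]);
}
```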

The monoidal form

In Milewski’s decomposition, the working type m is a monoid (associative binary operation ⊕ with identity ε). A Fold(s, g) pairs a scatter function s: a → m with a gather function g: m → b. An MAlgebra provides the structural combination rule, combining one layer of the functor using only ⊕ and ε.

The catamorphism cat malg (Fold s g) = g ∘ cata (malg ∘ bimap s id) produces m at every node. g converts to b once at the root.

[Diagram: each node computes s(aᵢ) → m; sibling results combine via ⊕ as they flow up (m ⊕ m₁, m ⊕ m₂); a single g(m) → b step occurs at the root.]

Compare the two diagrams. In the bracket form, every node has a green-to-blue transition (per-node finalize). In the monoidal form, green m flows uniformly and the single blue step occurs at the root.

Relationship

Claim. Milewski’s monoidal catamorphism is a special case of hylic’s N-H-R fold.

Proof. Given a Milewski fold with monoid (m, ⊕, ε), scatter s: a → m, and gather g: m → b, construct the hylic fold:

H = R = m,   init = s,   acc = ⊕,   fin = identity

At each node, hylic computes acc(acc(init(n), r₁), r₂) = s(n) ⊕ r₁ ⊕ r₂. This is the value Milewski’s catamorphism produces at every node. The user applies g to the root result to obtain b. ∎
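The embedding can be checked on a concrete monoid (a sketch with stand-in names): take (m, +, 0) with scatter s = identity. The constructed hylic fold computes s(n) + r₁ + … + rₖ at each node, exactly the value the monoidal catamorphism assigns there.

```rust
fn s(n: &u64) -> u64 { *n }                 // scatter, reused as init
fn acc(h: &mut u64, r: &u64) { *h += *r }   // the monoid op ⊕, as acc
fn fin(h: &u64) -> u64 { *h }               // identity finalize

// What the constructed fold computes at one node, given its
// already-folded child results.
fn node_value(n: u64, child_results: &[u64]) -> u64 {
    let mut h = s(&n);                      // H = R = m
    for r in child_results {
        acc(&mut h, r);
    }
    fin(&h)                                 // s(n) ⊕ r1 ⊕ ... ⊕ rk
}

fn main() {
    assert_eq!(node_value(3, &[4, 5]), 12); // s(3) + 4 + 5
}
```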

Conditions for the converse. A hylic fold is expressible as a Milewski monoidal catamorphism when:

  • H = R and fin = identity
  • acc is a monoid (associative with identity element)

These make (H, acc, ε) a monoid. The correspondence is then m = H, s = init, ⊕ = acc, g = identity.

Without these conditions, hylic’s fold is strictly more general. It admits non-associative accumulation and distinct working/result types.

Examples

Folds that satisfy the monoid conditions:

  • Sum. H = R = u64, acc = +, fin = id. Addition with identity 0.
  • Extend. H = R = Vec<T>, acc = extend, fin = clone. Concatenation with identity vec![]. The filesystem summary uses this.
  • Union. H = R = HashSet<K>, acc = union, fin = clone. Associative and commutative.

Folds that do not:

  • Child count. acc((s,c), r) = (s+r, c+1). The count tracks immediate children, not descendants. Not associative: (h₁⊕h₂)⊕h₃ yields c+2 while h₁⊕(h₂⊕h₃) yields c+1.
  • Bracketed formatting. fin(h) = format!("[{}]", h). Here H ≠ R and regrouping changes the nesting: [a[b]][c] ≠ [a][b[c]].
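The child-count example can be made concrete (a sketch): regrouping changes the count because acc counts one child per application, not one per descendant.

```rust
// H = R = (sum, immediate child count).
fn acc(h: (u64, u64), r: (u64, u64)) -> (u64, u64) {
    (h.0 + r.0, h.1 + 1) // add the child's sum, count one more child
}

fn main() {
    let (h1, h2, h3) = ((1, 0), (2, 0), (4, 0));
    let left = acc(acc(h1, h2), h3);  // (7, 2)
    let right = acc(h1, acc(h2, h3)); // (7, 1)
    assert_eq!(left.0, right.0);      // the sums agree
    assert_ne!(left.1, right.1);      // the counts do not: not associative
}
```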

Associativity and parallel accumulation

A monoid’s associativity allows the executor to contract adjacent sibling results in any grouping. If children b and c have completed but a has not, b ⊕ c can proceed without waiting for a. When a eventually completes, it combines with the already-contracted result. For n children, this reduces the accumulation depth from O(n) to O(log n).
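The grouping freedom associativity buys can be shown directly (a sketch with + as the associative op): balanced pairwise contraction has O(log n) depth, the left-to-right sweep has O(n) depth, and both groupings agree.

```rust
// O(log n)-deep contraction: combine halves pairwise.
fn balanced(xs: &[u64]) -> u64 {
    match xs.len() {
        0 => 0, // identity element
        1 => xs[0],
        n => balanced(&xs[..n / 2]) + balanced(&xs[n / 2..]),
    }
}

// O(n)-deep left-to-right accumulation.
fn sequential(xs: &[u64]) -> u64 {
    xs.iter().fold(0, |h, r| h + r)
}

fn main() {
    let xs: Vec<u64> = (1..=8).collect();
    // For an associative op, the grouping does not change the result.
    assert_eq!(balanced(&xs), sequential(&xs));
}
```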

hylic’s Funnel executor does not perform this contraction. It parallelizes subtree computation (children run concurrently on different workers) and accumulates their results left-to-right as the sweep cursor advances. This is a design choice: sequential accumulation enables progressive memory freeing, where each child’s R is consumed and dropped as the cursor passes. It also means the executor imposes no algebraic requirements on acc. It is up to the user to supply an appropriate accumulate function, and up to the executor to decide how results are folded into H.

A lift can recover O(log n) depth when needed: by transforming the tree structure to insert balanced reduction nodes, the contraction becomes a property of the tree shape rather than the algebra.

The general structure

In algebraic terms, acc: H × R → H is an action of R on H. When H = R and acc is a monoid, this is a monoid acting on itself, which is Milewski’s formulation. In general, it is an R-module: R acts on a distinct type H through acc, with fin: H → R as the projection. A monoid is a module over itself; a module is not necessarily a monoid.

hylic’s API does not distinguish between these cases. The user writes init, acc, fin. The executor runs them with sequential accumulation and parallel subtree computation via CPS work-stealing.

Composability

hylic’s fold combinators (product, map, zipmap, wrap_*) and graph combinators (filter, memoize, contramap) achieve the same practical composability as Milewski’s Functor/Applicative on Fold. Lifts transform both fold and treeish in sync, changing the carrier types through GATs. The SeedPipeline uses a lift internally to bridge coalgebra and algebra when they speak different types.

Bridging coalgebra and algebra: SeedPipeline

A hylomorphism fuses a coalgebra (produce children) with an algebra (fold results). Both operate on the same type N. In practice, the dependency structure often speaks a different type. A module resolver starts with module names (seeds), not parsed modules (nodes). A grow function resolves one into the other.

The user provides:

grow:            Fn(&Seed) → N           resolve a reference
seeds_from_node: N → Seed*              a node's dependency references
fold:            FoldOps<N, H, R>        the algebra, defined over N

In hylic, N → Seed* is Edgy<N, Seed>, the general edge function. N → N* is Treeish<N>, the special case where node and edge types match.

The coalgebra produces Seed. The algebra consumes N. The morphism grow: Seed → N bridges them. SeedPipeline reconciles this through two combinator chains.

Chain 1: coalgebra composition. Close N → Seed* into N → N* via .map(grow):

seeds_from_node: Edgy<N, Seed>             N → Seed*
    .map(grow)                             Seed → N
= treeish:       Edgy<N, N>               N → N*  (= Treeish<N>)

In code: the (grow, seeds_from_node) pair is fused internally at run time via Shared::fuse_grow_with_seeds, producing the Treeish<N> that drives traversal past the entry. The underlying combinator is Edgy::map — see hylic/src/graph/edgy.rs.
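Chain 1 can be sketched at the closure level with stand-in types (Seed, Module, and a trivial grow resolver, none of them hylic's API): composing the edge function N → Seed* with grow: &Seed → N yields the N → N* function that drives traversal.

```rust
type Seed = String;

// Stand-in node type: a module naming its dependencies.
struct Module { name: String, deps: Vec<Seed> }

// N -> Seed*: the edge function over seeds.
fn seeds_from_node(m: &Module, f: &mut dyn FnMut(&Seed)) {
    for d in &m.deps {
        f(d);
    }
}

// Seed -> N: a trivial stand-in resolver; a real one would load
// and parse the module.
fn grow(s: &Seed) -> Module {
    Module { name: s.clone(), deps: vec![] }
}

// The composition: visit each seed, grow it, hand on the node.
// This is the closure-level analogue of seeds_from_node.map(grow).
fn treeish(m: &Module, f: &mut dyn FnMut(&Module)) {
    seeds_from_node(m, &mut |s| f(&grow(s)));
}

fn main() {
    let root = Module {
        name: "app".into(),
        deps: vec!["core".into(), "util".into()],
    };
    let mut names = Vec::new();
    treeish(&root, &mut |child| names.push(child.name.clone()));
    assert_eq!(names, vec!["core".to_string(), "util".to_string()]);
}
```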

Chain 2: entry lifting. The SeedLift constructs a Treeish<SeedNode<N>> with two variants: Node(n) visits the original treeish (wrapping children as Node), and EntryRoot fans out the entry seeds by running grow(seed) on each and wrapping the result as Node.

The relevant struct and its Lift impl:

#![allow(unused)]
fn main() {
/// The finishing lift that closes a `SeedPipeline`'s grow axis.
/// Composes entry-seed dispatch on top of a `(grow, seeds, fold)`
/// triple and produces a treeish over `SeedNode<N>`. Not
/// user-constructed; assembled internally by
/// `Stage2Pipeline::run` at call time.
///
/// Domain-parametric: storage of the entry-seeds graph and the
/// entry-heap thunk is per-domain via `<D as Domain<()>>::Graph<Seed>`
/// and `<D as ShapeCapable<N>>::EntryHeap<H>`. No hand-rolled
/// domain discriminator.
#[must_use]
pub struct SeedLift<D, N, Seed, H>
where D: ShapeCapable<N> + Domain<()>,
      N: 'static, Seed: 'static, H: 'static,
{
    pub(crate) grow:          <D as Domain<N>>::Grow<Seed, N>,
    pub(crate) entry_seeds:   <D as Domain<()>>::Graph<Seed>,
    pub(crate) entry_heap_fn: <D as ShapeCapable<N>>::EntryHeap<H>,
    _m: PhantomData<fn() -> (D, N, Seed, H)>,
}
}
#![allow(unused)]
fn main() {
/// Opaque row type in a seed-closed chain's treeish. Values are
/// either the synthetic `EntryRoot` row (seed fan-out) or a resolved
/// `Node(N)`. User code inspects via [`is_entry_root`](Self::is_entry_root),
/// [`as_node`](Self::as_node), [`into_node`](Self::into_node), and
/// [`map_node`](Self::map_node); the variants are sealed.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct SeedNode<N> {
    // Exposed `pub` (not `pub(crate)`) so the doc-hidden
    // `seed_node_internal` module can re-export it for
    // `hylic-pipeline`'s dispatch. User code should treat this field
    // as opaque and use `is_entry_root` / `as_node` / `map_node`.
    #[doc(hidden)]
    pub inner: SeedNodeInner<N>,
}

/// Library-internal variant carrier for `SeedNode<N>`. Exposed
/// `pub` only to make crate-external re-export through the
/// `seed_node_internal` doc-hidden module possible. User code
/// should never name this directly.
#[doc(hidden)]
#[derive(Clone, PartialEq, Eq, Hash)]
pub enum SeedNodeInner<N> {
    EntryRoot,
    Node(N),
}
}

Node(n) delegates to the inner treeish. EntryRoot has no children of its own in the treeish — its children come from the entry seeds provided at run time, grown inline at the EntryRoot visit.

[Diagram: the user provides seeds_from_node (N → Seed*), grow (Fn(&Seed) → N), and fold (FoldOps<N, H, R>). The pipeline constructs Treeish<N> = seeds_from_node.map(grow). SeedLift extends this to Treeish<SeedNode<N>> (per-variant dispatch, via lift_treeish) and Fold<SeedNode<N>, H, R> (via lift_fold). At entry, exec.run starts at EntryRoot; each entry seed s becomes Node(grow(s)); from there the original treeish and fold drive all further traversal.]

After the EntryRoot → Node(grow(s)) transition, the original coalgebra and algebra drive all further recursion. The SeedNode<N> row and the composed treeish are internal to the pipeline.

Entry seeds are supplied at run time via Edgy<(), Seed> passed to pipeline.run(exec, entry_seeds, initial_heap), or via pipeline.run_from_slice(exec, &[seed1, seed2], initial_heap). The pipeline itself stores no entry concerns — only grow, seeds_from_node, and the fold.

Further reading

  • Milewski. Monoidal Catamorphisms (2020).
  • Gonzalez. foldl — the left-fold-with-extraction type.
  • Meijer, Fokkinga, Paterson. Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire (1991).

Pipeline transformability

This chapter explains how pipelines and Stage-2 pipelines compose, why the seed-rooted Stage-2 form has its own sugar surface even though it shares the Stage2Pipeline<Base, L> type, and how the transformation layers fit together.

The core mechanics are in scope for every user; the two appendices at the end are flagged interested user only and cover the hard design problems that were worked through to arrive at the current shape. Most readers can stop at The two stages at a glance.

The two stages at a glance

A pipeline has a Stage 1 (base slots — coalgebraic form) and a Stage 2 (stacked lifts — algebraic transformation form). A .lift() call moves a pipeline from Stage 1 to Stage 2.

[Diagram: Stage 1 holds the base slots: SeedPipeline<D, N, Seed, H, R> (grow + seeds_from_node + fold) and TreeishPipeline<D, N, H, R> (treeish + fold). .lift() (or auto-lift via sugar, on the treeish side) moves each into Stage 2: Stage2Pipeline<SeedPipeline<…>, L> with chain L over SeedNode<N>, and Stage2Pipeline<TreeishPipeline<…>, L> with chain L over N. Stage-2 pipelines extend via .then_lift / sugars and finish at an Executor: .run(&exec, seeds, h0) for the seed-rooted form, .run_from_node(&exec, &root) for the treeish-rooted form.]

Two transformation vocabularies:

  • Stage 1 (reshape): rewrites base slots in place. A SeedPipeline’s .filter_seeds, .wrap_grow, .map_node_bi, .map_seed_bi all produce another SeedPipeline of (possibly different) type parameters. Cheap; no lift chain involved.

  • Stage 2 (lift-chain compose): appends a library lift to the chain. .wrap_init, .zipmap, .map_r_bi, .map_n_bi, .filter_edges, .memoize_by, .explain, .wrap_accumulate, .wrap_finalize — each one delegates internally to then_lift(Domain::xxx_lift(...)).

The same sugar name may appear at both stages with different semantics. map_node_bi at Stage 1 is a reshape (new Stage-1 pipeline); map_n_bi at Stage 2 composes a ShapeLift onto the chain. Distinct names make the stage unambiguous at the call site.

The Lift trait — the single transformation primitive

Every Stage-2 sugar ultimately builds a value implementing Lift<D, N, H, R>:

#![allow(unused)]
fn main() {
/// Domain-generic transformer over the `(treeish, fold)` pair.
///
/// A `Lift` rewrites the graph side and/or the fold side, possibly
/// changing their carrier types, and hands the result to a
/// continuation. The caller's continuation-return type `T` flows
/// through, so the chain of output types stays inferred across
/// composition (`ComposedLift<L1, L2>`).
///
/// Grow is deliberately absent from this signature. Only the Seed
/// finishing lift ([`SeedLift`](super::SeedLift)) needs a grow
/// input; it is composed internally by
/// `hylic_pipeline::PipelineExecSeed::run` and does not travel as
/// a 3-slot signature through the `Lift` trait.
///
/// See [Lifts](https://hylic.balcony.codes/concepts/lifts.html).
pub trait Lift<D, N, H, R>
where D: Domain<N> + Domain<Self::N2>,
      N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
{
    /// Output node type after the lift has been applied.
    type N2:   Clone + 'static;
    /// Output heap type after the lift has been applied.
    type MapH: Clone + 'static;
    /// Output result type after the lift has been applied.
    type MapR: Clone + 'static;

    /// Apply the lift to `(treeish, fold)` and invoke `cont` with
    /// the transformed pair.
    fn apply<T>(
        &self,
        treeish: <D as Domain<N>>::Graph<N>,
        fold:    <D as Domain<N>>::Fold<H, R>,
        cont: impl FnOnce(
            <D as Domain<Self::N2>>::Graph<Self::N2>,
            <D as Domain<Self::N2>>::Fold<Self::MapH, Self::MapR>,
        ) -> T,
    ) -> T;
}
}

A Lift takes a (treeish, fold) pair over (N, H, R) and produces another over (N2, MapH, MapR). The library ships four atoms:

  • IdentityLift — pass-through; the seed of every chain.
  • ComposedLift<L1, L2> — sequential composition, L1’s outputs feeding L2’s inputs.
  • ShapeLift — the universal store-three-xforms lift that every library sugar instantiates (one per axis: treeish, fold, plus N-type).
  • SeedLift — the finishing lift that closes the seed axis (assembled at run time; not user-constructed in the common path).

A chain is a right-associated tree of ComposedLift values rooted at IdentityLift; every .then_lift(L) or sugar call wraps the current tip in ComposedLift<current, L>. Each lift specifies how it rewrites the pair via its apply method, which uses CPS so the caller’s continuation-return type threads through composition.
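A toy version of the CPS-composition idea (MiniLift, Double, Show, and Composed are illustrative stand-ins, much simpler than hylic's Lift trait): each mini-lift rewrites a value and hands it to a continuation, so composing two lifts nests the continuations while the caller's return type T threads through unchanged.

```rust
// One input type, one output type, CPS apply.
trait MiniLift<A> {
    type Out;
    fn apply<T>(&self, a: A, cont: impl FnOnce(Self::Out) -> T) -> T;
}

struct Double; // u64 -> u64
impl MiniLift<u64> for Double {
    type Out = u64;
    fn apply<T>(&self, a: u64, cont: impl FnOnce(u64) -> T) -> T {
        cont(a * 2)
    }
}

struct Show; // u64 -> String (the carrier type changes mid-chain)
impl MiniLift<u64> for Show {
    type Out = String;
    fn apply<T>(&self, a: u64, cont: impl FnOnce(String) -> T) -> T {
        cont(a.to_string())
    }
}

// The analogue of ComposedLift<L1, L2>: L1's output feeds L2, and
// the continuation-return type T flows through both.
struct Composed<L1, L2>(L1, L2);
impl<A, L1, L2> MiniLift<A> for Composed<L1, L2>
where
    L1: MiniLift<A>,
    L2: MiniLift<L1::Out>,
{
    type Out = L2::Out;
    fn apply<T>(&self, a: A, cont: impl FnOnce(Self::Out) -> T) -> T {
        self.0.apply(a, |mid| self.1.apply(mid, cont))
    }
}

fn main() {
    let chain = Composed(Double, Show);
    assert_eq!(chain.apply(21, |s| s), "42".to_string());
}
```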

See Lifts — cross-axis transforms for the full catalogue and the atom-level reference.

One Stage-2 type, two Base configurations

Stage 2 has a single pipeline type — Stage2Pipeline<Base, L> — but its sugar surface bifurcates by Base because the chain L operates over a different node type in each configuration.

Treeish-rooted: Stage2Pipeline<TreeishPipeline<…>, L>

When the base is a TreeishPipeline (or another treeish-rooted Stage2Pipeline being extended), the chain L: Lift<D, N, H, R> operates over the base’s N directly. All sugars come from the blanket trait Stage2SugarsShared / Stage2SugarsLocal.

Seed-rooted: Stage2Pipeline<SeedPipeline<…>, L>

When the base is a SeedPipeline, SeedLift is assembled at .run() time from the base’s grow plus the caller-supplied root_seeds and entry_heap, and composed as the first lift in the run-time chain. SeedLift’s output node type is SeedNode<N>, so every lift in the stored chain L operates over SeedNode<N> — not the user’s N.

Even though the chain runs over SeedNode<N>, Stage-2 sugars on this form take user closures over &N. The translation is done by Wrap dispatch: one Stage2SugarsShared / Stage2SugarsLocal trait covers both Bases, peeling &SeedNode<N> to &N on the seed-rooted side and acting as a pass-through on the treeish-rooted side. EntryRoot defaults are encoded in the per-sugar peel routines (EntryRoot passes through wrap_init, filter_edges always admits its children, etc.) — see the table later in this chapter for the full set.

The seed-pipeline-unification (one cycle ago) collapsed the two historical struct types LiftedPipeline and LiftedSeedPipeline into the single Stage2Pipeline<Base, L>, and reduced the parallel-but-separate sugar catalogues into one Wrap-dispatched trait per domain. The old type names no longer exist.

The follow-on stage2-base-run-unification cycle did the same operation on the run path: a single Stage2Pipeline::run_with_inputs body lives in stage2/run/mod.rs, generic over Base: Stage2Base. The base contributes a RunInputs<'i, CurN> GAT, a PreLift lift, and a provide_run_essentials callback that builds the pre-chain lift from inputs and yields the descend-root reference at the post-chain type. The two former per-domain run files (stage2/run/seed_shared.rs, stage2/run/seed_local.rs) were retired; the per-domain residue lives in seed/stage2_base_*.rs, where the Domain<N>::Grow<Seed, N> GAT non-normalisation forces a per-domain pinning of the SeedLift constructor.

Run composition: how .run() actually produces a result

For a treeish-rooted Stage2Pipeline<TreeishPipeline<…>, L>:

1. Base = TreeishPipeline: yield (treeish, fold) over (N, H, R).
2. Chain L applied: (treeish, fold) over (N, H, R)
                  → (treeish', fold') over (N2, MapH, MapR).
3. Executor: run(&fold', &treeish', &root) → MapR.

For a seed-rooted Stage2Pipeline<SeedPipeline<…>, L>:

1. Base = SeedPipeline: hold (grow, seeds_from_node, fold) over (N, Seed, H, R).
   At .run time the user adds (root_seeds, entry_heap: H).

2. Fuse: treeish_base = seeds_from_node.map(grow)   — over N.

3. SeedLift::apply (the first, innermost lift):
   (treeish_base, fold_base) over (N, H, R)
   → (treeish_lifted, fold_lifted) over (SeedNode<N>, H, R).

4. Chain L applied: (treeish_lifted, fold_lifted) over (SeedNode<N>, H, R)
                  → (treeish', fold') over (SeedNode<N2>, MapH, MapR).

5. Executor: run(&fold', &treeish', &SeedNode::entry_root()) → MapR.

The critical design point in step 3 is that SeedLift is applied inside the chain, not as an outer wrap. That’s what lets the user chain L transform the fold and treeish in ways that would otherwise depend on the seed-closing step having already happened. See Appendix A for the history of this choice.

Transformation variance at a glance

Each axis in the (N, H, R) triple has a distinct variance:

Axis | Role in fold                | Variance                                  | Sugar
N    | init(&N) → H (contra)       | invariant (used in both fold and graph)   | map_node_bi (S1) / map_n_bi (S2) — bijection required
H    | &mut H (internal)           | invariant (never crosses node boundaries) | wrap_init, wrap_accumulate, wrap_finalize
R    | finalize(&H) → R; &R in acc | invariant (appears in and out)            | zipmap, map_r_bi

Invariance on all three is why every N- and R-change sugar requires both a forward and a backward closure. See Transforms and variance for the categorical picture.

Where the abstractions stop

One deliberate asymmetry remains in the current surface:

Auto-lift on TreeishPipeline but not on SeedPipeline. tp.wrap_init(w) works directly (the Stage-1 → Stage-2 transition is implicit); sp.wrap_init(w) is a compile error — write sp.lift().wrap_init(w) explicitly. The asymmetry is intentional: a treeish-rooted Stage-2 chain operates over the same N as the base, so the lift is invisible to the user; a seed-rooted Stage-2 chain operates over SeedNode<N>, and hiding that transition would not hide its consequences — the row type would still surface in any lift whose output type mentions N (the explainer is the canonical example). Forcing .lift() makes the row type’s appearance traceable to a single line in the user’s source.

The historical second asymmetry — a parallel inherent-methods catalogue on the seed-rooted form — was eliminated by the seed-pipeline-unification. Stage-2 sugars are now one trait per domain (Stage2SugarsShared / Stage2SugarsLocal), blanket implemented over both Bases, with Wrap doing the per-Base type projection. See Wrap dispatch.


Appendix A — interested user only: seed pipeline, lifting, and run composition

This section is intentionally deeper than the preceding narrative and is safe to skip. It exists for readers who want to know why the current shape is as it is.

The problem the SeedPipeline solves

A hylomorphism fuses a coalgebra (N → children) with an algebra (children → result). In practice the dependency graph often speaks a different type than the algebra: module paths, URLs, database keys — a Seed that must be grown to an N before the algebra can inspect it. A SeedPipeline carries the triple (grow: Seed → N, seeds_from_node: N → Seed*, fold: N → H → R) and fuses seed-axis and node-axis into one treeish at run time.

The fusion is:

seeds_from_node:  N → Seed*            via Edgy<N, Seed>
    .map(grow):   Seed → N            via the domain's grow xform
=>  treeish:      N → N*               via Edgy<N, N> ≡ Treeish<N>

The result is a Treeish<N> that the executor can walk. But before walking, execution needs a root. The user supplies entry seeds (root_seeds: Edgy<(), Seed>) and an initial heap (entry_heap: H); these must be turned into (a) a starting node the executor can descend from, and (b) a top-level accumulation protocol for the children’s results.
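The fusion above can be sketched with plain closures standing in for the library’s Edgy and grow types. Everything here (Module, deps_of, the String seed) is an illustrative stand-in under assumed shapes, not hylic API:

```rust
use std::sync::Arc;

// Stand-in types: a module identified by a path (the Seed) that must
// be grown into a Module (the N) before the algebra can inspect it.
type Seed = String;

#[derive(Clone)]
struct Module {
    path: String,
    deps: Vec<Seed>, // dependency paths, not yet loaded
}

// Toy dependency table standing in for a real loader.
fn deps_of(path: &str) -> Vec<Seed> {
    match path {
        "root" => vec!["a".into(), "b".into()],
        _ => vec![],
    }
}

// grow : Seed → N
fn make_grow() -> Arc<dyn Fn(&Seed) -> Module> {
    Arc::new(|s: &Seed| Module { path: s.clone(), deps: deps_of(s) })
}

// seeds_from_node : N → Seed*
fn seeds_from_node(m: &Module) -> Vec<Seed> {
    m.deps.clone()
}

// Fusion: treeish = seeds_from_node.map(grow), i.e. N → N*.
fn fuse(grow: Arc<dyn Fn(&Seed) -> Module>) -> impl Fn(&Module) -> Vec<Module> {
    move |m: &Module| seeds_from_node(m).iter().map(|s| grow(s)).collect()
}
```

The fused closure is exactly the N → N* edge function an executor can walk; the seed axis has disappeared from its signature.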

The EntryRoot-as-node compromise

The executor’s run method takes a single root: run(fold, treeish, &N) → R. To handle a forest of entry seeds under a single-root executor, the library invents a synthetic root row:

#![allow(unused)]
fn main() {
/// Opaque row type in a seed-closed chain's treeish. Values are
/// either the synthetic `EntryRoot` row (seed fan-out) or a resolved
/// `Node(N)`. User code inspects via [`is_entry_root`](Self::is_entry_root),
/// [`as_node`](Self::as_node), [`into_node`](Self::into_node), and
/// [`map_node`](Self::map_node); the variants are sealed.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct SeedNode<N> {
    // Exposed `pub` (not `pub(crate)`) so the doc-hidden
    // `seed_node_internal` module can re-export it for
    // `hylic-pipeline`'s dispatch. User code should treat this field
    // as opaque and use `is_entry_root` / `as_node` / `map_node`.
    #[doc(hidden)]
    pub inner: SeedNodeInner<N>,
}

/// Library-internal variant carrier for `SeedNode<N>`. Exposed
/// `pub` only to make crate-external re-export through the
/// `seed_node_internal` doc-hidden module possible. User code
/// should never name this directly.
#[doc(hidden)]
#[derive(Clone, PartialEq, Eq, Hash)]
pub enum SeedNodeInner<N> {
    EntryRoot,
    Node(N),
}
}

SeedLift wraps the treeish so that:

  • SeedNode::EntryRoot.visit fans out to SeedNode::Node(grow(s)) for each entry seed.
  • SeedNode::Node(n).visit delegates to the base treeish.

And wraps the fold so that:

  • init(SeedNode::EntryRoot) = entry_heap_fn() — returns the user’s entry_heap.
  • init(SeedNode::Node(n)) = base.init(n).
  • accumulate / finalize are uniform.

The executor then begins at &SeedNode::entry_root() and walks normally. At the value level the EntryRoot row participates in the fold like any other node — it receives children’s R via accumulate, has its own finalize, and produces the final R.

This is a compromise, not the most principled shape: a native-forest executor (one that accepts run_forest(fold, treeish, roots: &[N], initial_heap: H) → R directly) would eliminate SeedNode<N> entirely from the chain’s node type and strip the leak from user-visible result types. The refactor cost is significant (touches the Executor trait, every executor impl, and the accumulation protocol) and was deferred. See Sealed SeedNode for how the current shape is sealed at the user surface.
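A minimal sketch of the dispatch just described, using an open enum in place of the library’s sealed SeedNode and plain closures for grow, the base treeish, and the base init (all names here are stand-ins):

```rust
// Open-enum stand-in for the sealed SeedNode<N>.
#[derive(Clone, PartialEq, Debug)]
enum SeedNode<N> {
    EntryRoot,
    Node(N),
}

// Lifted treeish: EntryRoot fans out to the grown entry seeds;
// Node delegates to the base edge function. Seed = u32 for the sketch.
fn lifted_children<N: Clone>(
    node: &SeedNode<N>,
    entry_seeds: &[u32],
    grow: &dyn Fn(&u32) -> N,
    base: &dyn Fn(&N) -> Vec<N>,
) -> Vec<SeedNode<N>> {
    match node {
        SeedNode::EntryRoot => entry_seeds.iter().map(|s| SeedNode::Node(grow(s))).collect(),
        SeedNode::Node(n) => base(n).into_iter().map(SeedNode::Node).collect(),
    }
}

// Lifted init: EntryRoot yields the caller-supplied entry heap;
// Node delegates to the base init.
fn lifted_init<N, H: Clone>(
    node: &SeedNode<N>,
    entry_heap: &H,
    base_init: &dyn Fn(&N) -> H,
) -> H {
    match node {
        SeedNode::EntryRoot => entry_heap.clone(),
        SeedNode::Node(n) => base_init(n),
    }
}
```

accumulate and finalize need no dispatch at all, which is why the EntryRoot row can participate in the fold like any other node.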

Why SeedLift is composed first (not last)

The library considered two architectures for where SeedLift sits relative to the stored user chain L.

Option A (rejected). SeedLift as the outermost lift, wrapped around the user’s chain. The user’s chain operates over plain N; SeedLift wraps the result to introduce SeedNode<N> at the outside.

Under this arrangement, an N-changing lift inside the user’s chain produces an N2 that SeedLift would then have to wrap its grow output in. Because the Lift trait cannot, in general, surface both a forward and a backward map over N (N is invariant), Option A would force L::N2 = Base::N — the chain could not change N at all. Supporting N-change on the seed path would have required breaking that restriction.

Option B (shipped). SeedLift as the innermost lift, composed first at run time. The user’s chain operates over SeedNode<N> from .lift() onward. N-changing lifts inside the chain change SeedNode<N> → SeedNode<N2>, which is natural — the chain sees EntryRoot as part of the structure and any N-transform that preserves EntryRoot works.

The cost Option B pays is the SeedNode<N> leak into chain-tip types (visible in ExplainerResult<SeedNode<N>, H, R>). That cost is bounded: the sugar layer hides the variant in user closures (EntryRoot is auto-routed), and SeedExplainerResult (via From) projects the trace to N-typed when the user wants a sealed view.

Why SeedLift is assembled at run time, not at .lift()

SeedLift needs three ingredients: grow (from the base), root_seeds (from the caller of .run), and entry_heap (from the caller of .run). Only the first is available at .lift() time. The library has two places to put the root_seeds and entry_heap:

(a) as parameters to .lift(), turning it into a real constructor — which then requires knowing the entry data before any Stage-2 sugars are composed.

(b) as parameters to .run(), letting Stage-2 chains be composed and reused across different (seeds, h0) inputs.

The library chose (b): a seed-rooted Stage2Pipeline is a reusable computation; seeds + initial heap vary per call. This lets patterns like:

let lsp = pipe.lift().wrap_init(w).zipmap(m);
let r1 = lsp.run_from_slice(&exec, &seeds1, h0);
let r2 = lsp.run_from_slice(&exec, &seeds2, h0);

work without reconstructing the chain.

The default semantics of EntryRoot-dispatch in sugars

Each user-closure sugar on the seed-rooted Stage2Pipeline makes a per-sugar decision about EntryRoot:

Sugar                          | EntryRoot behaviour
wrap_init(w)                   | EntryRoot bypasses w; original init (returns entry_heap) runs
wrap_accumulate(w)             | applied uniformly (no N-signature to dispatch on)
wrap_finalize(w)               | applied uniformly
filter_edges(p)                | EntryRoot always admits its children; p(&CurN) applied to Nodes
memoize_by(k)                  | EntryRoot uncached (keyed None); Nodes keyed Some(k(n))
zipmap(m) / map_r_bi(fwd, bwd) | applied uniformly
map_n_bi(co, contra)           | EntryRoot → EntryRoot; Node(n) ↔ Node(f(n))
explain()                      | EntryRoot is a fold row with its own trace

Users who need different defaults — e.g., filter_edges that excludes EntryRoot’s fan-out — use .then_lift(Domain::xxx_lift::<SeedNode<CurN>, …>(pred)) directly. The sugars hide the common case; the raw surface remains available for specialisation.


Appendix B — interested user only: the typing front

Why Lift::apply uses CPS

A direct apply signature would return the transformed pair (Graph<D, N2>, Fold<D, N2, H2, R2>). Composition stacks these returns:

fn apply(self) -> (Graph<D, N_k>, Fold<D, N_k, H_k, R_k>)

After three chained lifts, N_k, H_k, R_k are all associated types of a ComposedLift<ComposedLift<ComposedLift<…>, …>, …>. No single named alias exists for the final return type before the whole chain is constructed, and Rust’s inference cannot thread the unnamed types through composition.

The CPS form threads the caller’s T outward:

fn apply<T>(
    &self,
    treeish: Graph<D, N>,
    fold:    Fold<D, N, H, R>,
    cont: impl FnOnce(Graph<D, N2>, Fold<D, N2, H2, R2>) -> T,
) -> T;

Whatever type the caller’s closure produces at the innermost apply (usually the executor’s MapR) flows back through every enclosing apply call. Rust infers the intermediate pair types at each junction without needing a named alias.
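The shape reproduces standalone. In this sketch (simplified to a single value rather than a (treeish, fold) pair; the trait and lift names are stand-ins, not hylic API), the intermediate type between two composed lifts is never named at the top level; the caller’s T threads through both layers:

```rust
// CPS-shaped transformer: apply hands its output to a continuation.
trait CpsLift<A> {
    type Out;
    fn apply<T>(&self, input: A, cont: impl FnOnce(Self::Out) -> T) -> T;
}

struct AddOne;
impl CpsLift<i32> for AddOne {
    type Out = i32;
    fn apply<T>(&self, input: i32, cont: impl FnOnce(i32) -> T) -> T {
        cont(input + 1)
    }
}

struct Stringify;
impl CpsLift<i32> for Stringify {
    type Out = String;
    fn apply<T>(&self, input: i32, cont: impl FnOnce(String) -> T) -> T {
        cont(input.to_string())
    }
}

// Composition: the intermediate L1::Out is inferred at the junction
// inside the body; the caller's T flows through both applies.
struct Composed<L1, L2>(L1, L2);
impl<A, L1, L2> CpsLift<A> for Composed<L1, L2>
where
    L1: CpsLift<A>,
    L2: CpsLift<L1::Out>,
{
    type Out = L2::Out;
    fn apply<T>(&self, input: A, cont: impl FnOnce(L2::Out) -> T) -> T {
        self.0.apply(input, |mid| self.1.apply(mid, cont))
    }
}
```

A direct-return version of the same composition would need to spell out L1::Out in its signature; the CPS form leaves it to local inference.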

GAT normalisation helpers

The Domain<N> trait exposes three GATs:

type Fold<H, R>;
type Graph<E>;
type Grow<Seed, NOut>;

For Shared: Grow<Seed, N> = Arc<dyn Fn(&Seed) -> N + Send + Sync>. For Local: Grow<Seed, N> = Rc<dyn Fn(&Seed) -> N>.

Inside a generic impl body parameterised over some other type (say N), Rust’s trait solver does not reduce <Shared as Domain<N>>::Grow<Seed, N> to Arc<dyn Fn…>. The types compare equal by definition, but the solver doesn’t do the reduction unless Self = Shared is pinned in the enclosing scope.

The library works around this via free helper functions that pin the domain:

fn shared_grow_as_arc<Seed, NOut>(
    g: <Shared as Domain<NOut>>::Grow<Seed, NOut>,
) -> Arc<dyn Fn(&Seed) -> NOut + Send + Sync> { g }

Inside this function’s body, Self = Shared is the only option and the GAT normalises. The function is a no-op at runtime (the same value, passed through with its type pinned), but it makes the type checker happy in a context that otherwise fails to reduce the projection.

stage2/run/gat_helpers.rs collects the helpers (a Shared set and a Local set), one per GAT crossing point.
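The helper shape reproduces with a stand-in trait (a one-parameter Grow GAT rather than the library’s two-parameter one; Domain, Shared, and the helper name are assumptions for the sketch):

```rust
use std::sync::Arc;

// Stand-in for the library's Domain trait: one GAT per domain.
trait Domain<N> {
    type Grow<Seed: 'static>;
}

struct Shared;

impl<N: 'static> Domain<N> for Shared {
    type Grow<Seed: 'static> = Arc<dyn Fn(&Seed) -> N + Send + Sync>;
}

// Inside this body Self = Shared is pinned, so the projection
// normalises to the concrete Arc type. Identity at runtime.
fn shared_grow_as_arc<Seed: 'static, NOut: 'static>(
    g: <Shared as Domain<NOut>>::Grow<Seed>,
) -> Arc<dyn Fn(&Seed) -> NOut + Send + Sync> {
    g
}
```

The body is just `g`: the two types are definitionally equal once the impl is pinned, and the function exists only to force that pinning.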

The trait-twin Shared/Local pattern

Every sugar file has a _shared.rs and a _local.rs version. Bodies are line-for-line identical; only Arc vs Rc, and Send + Sync vs nothing, differ.

Why SeedNode<N> cannot be fully hidden in chain-tip types

Lift’s associated type N2 is the chain’s output node type, which appears in every type parameter of every chain-tip result the user sees. On the seed path, SeedLift sets N2 = SeedNode<N>; any later lift that preserves N preserves SeedNode<N>; any later lift that changes N via map_n_bi produces SeedNode<N2>. Concretely: ExplainerResult’s first type parameter — the per-node “heap.node” field — ends up being SeedNode<N>.

Hiding this would require either:

  1. A projection pre-baked into each lift’s output (each lift carries a “strip SeedNode” step). This would require the library to know at composition time whether the input was a SeedPipeline — a structural property the Lift trait doesn’t expose.

  2. A seal at the value-variant level (SeedNode’s variants become non-matchable). The current design does this: variants are pub(crate), user code inspects via is_entry_root, as_node, map_node. The type name still appears in chain-tip result types, but the enum-nature is sealed.

The library ships (2) plus a projection helper (SeedExplainerResult::from) for users who want the type-name seal on the explainer result. (1) would be cleaner but requires trait-level machinery that’s not currently on the roadmap.

Unified sugar catalogue across both bases

The chain-bound mismatch between Lift<D, SeedNode<N>, H, R> (seed-rooted) and Lift<D, N, H, R> (treeish-rooted) is bridged by a Wrap dispatch trait:

  • Wrap is a type-only trait with a GAT Of<UN>. Two impls: Identity::Of<UN> = UN (treeish-rooted), SeedWrap::Of<UN> = SeedNode<UN> (seed-rooted).
  • Stage2Base is implemented by every Stage-1 base; it carries type Wrap: Wrap so each base names its own projection.
  • Stage2SugarsShared / Stage2SugarsLocal are the unified sugar traits, blanket-implemented on every Stage2Pipeline<Base, L> where Base: Stage2Base and L: Lift<D, <Base::Wrap as Wrap>::Of<UN>, …>. Each sugar has one canonical body; the per-Base build closure is provided by a WrapShared / WrapLocal impl that peels &SeedNode<UN> to &UN on the seed-rooted side.

See Wrap dispatch for the implementation.

Constraints carried by the typing

  • The CPS shape of Lift::apply is what makes the chain composable in Rust without HKT-style trait parameters.
  • stage2/run/gat_helpers.rs carries one helper per GAT crossing point per domain. Zero-cost at runtime; refreshes whenever rustc’s GAT normalisation gets stronger.
  • The _shared.rs / _local.rs sugar twin files keep both domains readable without macros; the bodies stay line-for-line identical.
  • SeedNode<N> is sealed but visible in chain-tip types whose lift output mentions the chain’s N. The SeedExplainerResult::from projection lifts the row out of the explainer’s tip type for callers that prefer not to see it.

The type-level landscape

This chapter is the design-level account of how the library’s functional concepts sit on top of Rust’s type system. It assumes familiarity with the Lift, Wrap dispatch, and SeedNode<N> chapters — and an interest in why those shapes are what they are, beyond just what they do.

The library is a type-level functional kernel. Every transformation — fold-phase rewrite, axis change, seed-closure — is a categorical construct (natural transformation, type-level function, indexed family) encoded in Rust’s syntax. Where the encoding works smoothly the library reads like ordinary Rust; where it doesn’t, the friction shows up as verbose projection chains, deliberate (bi-suffixed) bidirectionality, or the per-domain split. Each of those is structural, not stylistic.

GATs are higher-order functions

The first principle (lifted directly from Crichton, “GATs are HOFs”): a Generic Associated Type is a type-level function. A GAT type Of<X>; on a trait T is a function X ↦ T::Of<X>, where the function’s body is the impl’s expansion.

In hylic, the canonical GAT is Wrap::Of:

#![allow(unused)]
fn main() {
/// Type-level dispatch for the chain's input N. Each
/// [`Stage2Base`](super::Stage2Base) declares which `Wrap` it uses;
/// `WrapShared` / `WrapLocal` impls carry the per-domain lift
/// construction.
pub trait Wrap {
    /// The wrapped node type for a given user-facing N.
    type Of<UN: Clone + 'static>: Clone + 'static;
}
}

Wrap is a trait with a single GAT. Two impls give two functions:

Identity::Of  : Type → Type   ≡  λUN. UN              -- identity
SeedWrap::Of  : Type → Type   ≡  λUN. SeedNode[UN]    -- one-tag wrap

These are not “associated types you happen to read off an impl”; they are first-class type-level functions. The library uses them where a Haskell library would use f a quantified over f, or a Scala 3 library would use [F[_]]. In Rust, the type lambda is encoded as a trait with a GAT, and applied via the projection <W as Wrap>::Of<UN>.
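A standalone sketch of the encoding, with simplified stand-ins for the library’s two impls (the SeedNode here is a plain struct, not the sealed type):

```rust
// A trait with one GAT is a type-level function Type → Type.
trait Wrap {
    type Of<UN: Clone + 'static>: Clone + 'static;
}

struct Identity;
impl Wrap for Identity {
    type Of<UN: Clone + 'static> = UN; // λUN. UN
}

// Stand-in for the sealed SeedNode<N>.
#[derive(Clone)]
struct SeedNode<N> {
    node: Option<N>,
}

struct SeedWrap;
impl Wrap for SeedWrap {
    type Of<UN: Clone + 'static> = SeedNode<UN>; // λUN. SeedNode<UN>
}

// Generic code applies the type-level function by projecting the GAT;
// the GAT's declared Clone bound is what makes `.clone()` typecheck here.
fn roundtrip<W: Wrap>(v: W::Of<u32>) -> W::Of<u32> {
    v.clone()
}
```

roundtrip is the same source text for both impls; choosing W chooses which function Of denotes, which is exactly the quantification over f that a higher-kinded language would write directly.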

Lift as a triple of natural transformations

A Fold<N, H, R> has three phases:

init        :  N → H
accumulate  :  (H, R) → ()       (mutates H)
finalize    :  H → R

A Lift transforms one fold algebra into another. Per the categorical intuition, that is a triple of natural transformations — one per phase. The general primitive Shared::phases_lift exposes exactly that structure: it takes three phase mappers, each a function that takes the prior fold’s phase as a value and returns the new fold’s phase:

init_mapper  :  (N → H)              → (N₂ → H₂)
acc_mapper   :  ((H, R) → ())        → ((H₂, R₂) → ())
fin_mapper   :  (H → R)              → (H₂ → R₂)

Compare with the wrap_init user closure:

W : (N, prior_init) → H        -- uncurried form of the init_mapper

prior_init is the prior fold’s init phase, handed to the closure as a first-class value; uncurrying init_mapper into this shape reads “give me a node and the prior function, get back a value” instead of “give me a function, get back a function.” The user’s orig argument is not a callback; it is the prior phase itself, which composition needs as input. Drop the parameter and you no longer have a phase mapper; you have a phase replacement. Lift composition would stop being categorical.

This is the answer to “why does every wrap_* sugar take an orig argument I sometimes don’t use.” The closure is a phase mapper. The user not consulting orig is an identity-mapped composition — the no-op natural transformation at that phase, which is structurally fine but textually looks redundant.

Why CPS in Lift::apply

A direct apply signature would return the transformed pair:

apply : (Graph[N], Fold[N, H, R])
      → (Graph[N₂], Fold[N₂, H₂, R₂])

Each component is a domain-associated GAT and each axis is an associated type of the lift. After three composed lifts the return type involves three nested levels of associated-type projection, and no single name in the language admits the result without spelling all of them out. Rust’s type inference does not span that distance.

CPS — continuation-passing style — sidesteps the unnameable.

#![allow(unused)]
fn main() {
/// Domain-generic transformer over the `(treeish, fold)` pair.
///
/// A `Lift` rewrites the graph side and/or the fold side, possibly
/// changing their carrier types, and hands the result to a
/// continuation. The caller's continuation-return type `T` flows
/// through, so the chain of output types stays inferred across
/// composition (`ComposedLift<L1, L2>`).
///
/// Grow is deliberately absent from this signature. Only the Seed
/// finishing lift ([`SeedLift`](super::SeedLift)) needs a grow
/// input; it is composed internally by
/// `hylic_pipeline::PipelineExecSeed::run` and does not travel as
/// a 3-slot signature through the `Lift` trait.
///
/// See [Lifts](https://hylic.balcony.codes/concepts/lifts.html).
pub trait Lift<D, N, H, R>
where D: Domain<N> + Domain<Self::N2>,
      N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
{
    /// Output node type after the lift has been applied.
    type N2:   Clone + 'static;
    /// Output heap type after the lift has been applied.
    type MapH: Clone + 'static;
    /// Output result type after the lift has been applied.
    type MapR: Clone + 'static;

    /// Apply the lift to `(treeish, fold)` and invoke `cont` with
    /// the transformed pair.
    fn apply<T>(
        &self,
        treeish: <D as Domain<N>>::Graph<N>,
        fold:    <D as Domain<N>>::Fold<H, R>,
        cont: impl FnOnce(
            <D as Domain<Self::N2>>::Graph<Self::N2>,
            <D as Domain<Self::N2>>::Fold<Self::MapH, Self::MapR>,
        ) -> T,
    ) -> T;
}
}

apply takes a continuation cont: impl FnOnce(pair) -> T. The continuation’s return type T flows out unchanged. Inference threads each intermediate pair through the continuation locally; nothing has to be named at the top level. The composition reads as nested closure calls; the executor’s final T propagates outward through every intermediate apply.

This is the same trick categorically: instead of returning a value of some object in a category, take a hom-set element (a morphism out of that object) and apply it. Rust’s impl Trait argument is a way of saying “a morphism from this object to something”; the something is whatever shows up in the call chain.

The two-hop projection

Stage2Pipeline<Base, L> is one struct. Base is Stage2Base:

#![allow(unused)]
fn main() {
/// A Stage-1 pipeline that can drive a Stage-2 chain. Carries the
/// `Wrap` selection plus the run-time machinery (pre-lift, root
/// reference, run-input shape).
///
/// Inherits `TreeishSource` so the `(treeish<N>, fold<N, H, R>)` pair is
/// yielded through one canonical path; `with_treeish` is the single
/// place per-base storage shapes are read.
///
/// `PreLift` is intentionally unbounded at the trait level. The
/// `Stage2Pipeline::run` impl adds the `Lift<…, N2 = <Wrap>::Of<N>>`
/// bound at use time; that keeps the supertrait surface free of the
/// `Domain<<Wrap>::Of<N>>` obligation that would otherwise propagate
/// through every site naming `Stage2Base`.
pub trait Stage2Base: TreeishSource + Sized {
    /// Type-level dispatcher for the chain's input N.
    /// `Identity` → `Of<UN> = UN` (treeish-rooted).
    /// `SeedWrap` → `Of<UN> = SeedNode<UN>` (seed-rooted).
    type Wrap: Wrap;

    /// The user-facing N (the type user lambdas type at). Equal to
    /// `Self::N` for every shipped base; kept distinct for
    /// documentation symmetry with the sugar surface, which threads
    /// `UN` as a method-level parameter.
    type UserN: Clone + 'static;

    /// What `.run(...)` accepts as its second argument. Parameterised
    /// by `CurN`, the user-facing N at the chain tip (i.e. after any
    /// `map_n_bi` lifts; `CurN = Self::N` if the chain doesn't change
    /// the user N).
    ///
    /// `Identity`-Wrap bases: `&'i CurN` (a borrowed post-chain root).
    /// `SeedWrap` bases: an owned `(seeds, entry_heap)` pair (the
    /// `CurN` parameter is unused at the value level — `EntryRoot` is
    /// constructible at any inner type).
    type RunInputs<'i, CurN: Clone + 'static>;

    /// The lift composed at the head of the run-time chain.
    /// `IdentityLift` for treeish-rooted, `SeedLift` for seed-rooted.
    /// Pre-lift transforms `(treeish<N>, fold<N,H,R>)` into
    /// `(treeish<Wrap::Of<N>>, fold<Wrap::Of<N>, H, R>)` without
    /// touching H or R.
    ///
    /// Unbounded at the trait level — see the trait-level note.
    /// The `Stage2Pipeline::run` impl adds
    /// `Self::PreLift: Lift<…, N2 = <Wrap>::Of<N>, MapH = H, MapR = R>`
    /// at use time.
    type PreLift;

    /// Build the pre-lift from inputs (consuming the parts of inputs
    /// the lift captures), then yield it together with the executor's
    /// post-chain root reference to the continuation.
    ///
    /// The continuation receives the pre-lift by value (consumed when
    /// applied to the (treeish, fold) pair) and the root by reference,
    /// at the post-chain type `<Self::Wrap as Wrap>::Of<CurN>`. The
    /// reference is valid for the entire duration of `cont`.
    ///
    /// `Identity` case: pre-lift is `IdentityLift`; the root is the
    /// `&CurN` extracted from `inputs`.
    /// `SeedWrap` case: pre-lift is `SeedLift::from_*_grow(...)`,
    /// consuming `inputs.0` (entry seeds) and `inputs.1` (entry heap);
    /// the root is `&SeedNode::entry_root::<CurN>()`, constructed
    /// locally in this frame and alive for `cont`'s lifetime.
    fn provide_run_essentials<CurN: Clone + 'static, T>(
        &self,
        inputs: Self::RunInputs<'_, CurN>,
        cont: impl FnOnce(Self::PreLift,
                          &<Self::Wrap as Wrap>::Of<CurN>) -> T,
    ) -> T;
}
}

The chain’s input N is <Base::Wrap as Wrap>::Of<UN> — a two-hop projection: first project Base::Wrap (a type), then project that type through the Wrap::Of GAT at parameter UN. The full path:

<<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>
   |__________ ___________|     |__ __|     |
              v                    v        v
        find Base from        find that     apply
        Self's Stage2Base     type's Wrap   Wrap::Of
        impl                  impl          at UN

For Self::Base = TreeishPipeline<…>, the chain unfolds to Identity::Of<UN> = UN. For Self::Base = SeedPipeline<…>, it unfolds to SeedWrap::Of<UN> = SeedNode<UN>. Both reduce; both are a single projection chain; both work in every position the library uses (method-return type, where-clause, GAT projection).

What broke the first attempt

The original Phase-4 attempt had the sugar trait body call into a helper trait whose return type was <Self as Helper<UN>>::N, while the sugar method’s declared return type was the full <Base::Wrap as Wrap>::Of<UN> projection. Both reduced to the same concrete type for any specific impl, but the paths through the type system differed. The trait solver does not bridge two extensionally equal but syntactically distinct projections inside a default body.

The fix: don’t bridge. Have the sugar body call directly through the projection that already names the chain’s input N. The same projection sits in the return-type slot, the where-clause, and the build-method call. Rust’s solver verifies syntactic equality in each position; no reduction across distinct projection chains is ever required.

Variance is structural, not friction

Every axis-change sugar with _bi in its name (map_n_bi, map_r_bi, map_node_bi, map_seed_bi) takes a pair (co, contra). That is not Rust-specific verbosity. N, H, R are all invariant in a fold algebra — N appears in init’s argument (contravariant) and in Graph<N>’s child output (covariant); R appears in finalize’s output and in accumulate’s child input. An invariant type can only be transformed by an isomorphism; an isomorphism in types is a pair of arrows.

Scala 3 needs the pair too. So does Haskell. The library’s choice is to expose the pair explicitly, named at the call site, rather than hide it behind an Iso or Bijection typeclass. The “extra” closure is the structural witness that the transform is an iso. In an invariant world, that witness can’t be elided.
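A sketch of the R-axis case with boxed closures standing in for the fold (the Fold struct and map_r_bi here are illustrative stand-ins, not the library’s types). Children now report R2, so the backward arrow feeds accumulate while the forward arrow rewrites finalize:

```rust
// Stand-in fold algebra: three boxed phases.
struct Fold<H, R> {
    init: Box<dyn Fn() -> H>,
    accumulate: Box<dyn Fn(&mut H, &R)>,
    finalize: Box<dyn Fn(&H) -> R>,
}

// Changing R needs both arrows of the iso: R flows OUT of finalize
// (use fwd) and back IN through accumulate (use bwd).
fn map_r_bi<H: 'static, R: 'static, R2: 'static>(
    f: Fold<H, R>,
    fwd: impl Fn(&R) -> R2 + 'static,
    bwd: impl Fn(&R2) -> R + 'static,
) -> Fold<H, R2> {
    let fin = f.finalize;
    let acc = f.accumulate;
    Fold {
        init: f.init,
        // children's R2 results are mapped back to R for the base acc
        accumulate: Box::new(move |h: &mut H, r2: &R2| acc(h, &bwd(r2))),
        // the node's own result goes out as R2
        finalize: Box::new(move |h: &H| fwd(&fin(h))),
    }
}
```

Delete either closure and one of the two occurrences of R in the algebra has no transform to apply; the pair is forced by the signature, not by API style.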

Send + Sync as a per-domain axis

Domains differ on closure storage and bound:

Domain | Closure cell  | Bound on user closures
Shared | Arc<dyn Fn …> | Send + Sync + 'static
Local  | Rc<dyn Fn …>  | 'static
Owned  | Box<dyn Fn …> | 'static (and one-shot)

The asymmetry is real: Shared parallel executors share the fold across threads, so the fold’s closures have to be Send + Sync; Local’s Rc storage actively forbids Send + Sync on captured state, allowing things like Rc<RefCell<…>> that the Shared form rejects.

Send + Sync cannot be expressed as a uniform parameterisation of one trait without macros (the bound is on a concrete closure type, not a projection-able shape). Sugars and the build dispatcher are therefore split per domain: WrapShared/WrapLocal, Stage2SugarsShared/Stage2SugarsLocal, SeedSugarsShared/SeedSugarsLocal. The trait bodies read identically line for line; only the bound differs.

Bounds at consumption, not construction

Stage2Pipeline::then_lift is unconstrained at the struct level — pure construction:

#![allow(unused)]
fn main() {
    /// Post-compose `outer` onto the chain. Pure struct construction;
    /// no bounds. The composition's *meaningfulness* is enforced where
    /// the chain is consumed (`.run_*`, `TreeishSource`).
    pub fn then_lift<L2>(
        self,
        outer: L2,
    ) -> Stage2Pipeline<Base, ComposedLift<L, L2>> {
        Stage2Pipeline {
            base:     self.base,
            pre_lift: ComposedLift::compose(self.pre_lift, outer),
        }
    }
}

A pipeline whose chain wouldn’t actually .run is structurally typeable. The compile-time check happens at consumption: .run_* and the TreeishSource impl carry the chain-validity bounds. Construction is a builder; validity is a runner concern. Imposing chain bounds at every .then_lift would force every intermediate composition to be runnable, which loses the “construct freely, validate at the consumption boundary” pattern that lets chained sugars compose without each one having to fully type-prove the chain so far.
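The pattern reduces to a small standalone sketch: composition is a bound-free constructor, and only the consuming method carries the trait bound (Step, Chain, and Add are stand-ins, not hylic types):

```rust
// Stand-in for the chain-validity bound.
trait Step {
    fn step(&self, x: i32) -> i32;
}

// A pair of steps runs left-to-right, mirroring ComposedLift.
impl<A: Step, B: Step> Step for (A, B) {
    fn step(&self, x: i32) -> i32 {
        self.1.step(self.0.step(x))
    }
}

struct Add(i32);
impl Step for Add {
    fn step(&self, x: i32) -> i32 {
        x + self.0
    }
}

struct Chain<L>(L);

impl<L> Chain<L> {
    // Pure construction: no bounds, any L2 composes structurally.
    fn then<L2>(self, next: L2) -> Chain<(L, L2)> {
        Chain((self.0, next))
    }
}

// The bound appears only where the chain is consumed.
impl<L: Step> Chain<L> {
    fn run(&self, x: i32) -> i32 {
        self.0.step(x)
    }
}

// Composes fine; calling `.run` on a chain containing it would not compile.
struct NotAStep;
```

Intermediate chains never have to prove runnability, so a non-runnable composition is still a perfectly good value to keep building on.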

What this buys at runtime

Nothing has runtime overhead. The sugar trait monomorphises into a chain of ComposedLift types; the type tree records every junction; inlining flattens the chain into a single tree walk that produces one (treeish, fold) pair. The executor never sees the chain — only its collapsed result. Wrap dispatch resolves at compile time per instantiation. The verbose projections in error messages are the price of carrying that information through the type system; the compiled binary has none of it.

What remains as friction

Three things, all structural:

  1. No first-class higher-kinded types. Wrap::Of is the closest approximation. Verbose two-hop projections are the cost.

  2. No type-level pattern matching (no Scala 3 match types). Rust cannot decompose SeedNode<UN> into UN at the type level. If a trait’s bound says L::N2 = SeedNode<UN>, UN must be supplied from elsewhere (e.g., a closure argument’s inferred type) — Rust will not invert the constructor.

  3. No macros. The Shared/Local mirror could be one file with per-domain bound sugar, but the codebase declines macro-generated trait bodies. The duplication is documented and accepted.

What is not friction: bidirectional axis transforms (universal), orig callbacks in wrap_init (structural natural-transformation shape), CPS in apply (only way to thread unnameable returns). Those are the right shapes; Rust just exposes them at the level the abstraction needs.

wrap_init as a phase mapper

The wrap_init family deserves a closer read because the user closure looks like a callback-with-fallthrough but is actually an uncurried phase mapper. The sugar’s user signature is

W : (N, prior_init: dyn Fn N → H) → H

uncurried from the underlying

init_mapper : (N → H) → (N → H)
            ≡ Fn(prior_init) → Fn(N → H)        -- curried: a function returning a function
            ≡ Fn(N, prior_init) → H             -- uncurried; same content

The sugar’s body in the library realises this:

#![allow(unused)]
fn main() {
    pub fn wrap_init_lift<N, H, R, W>(wrapper: W) -> ShapeLift<Shared, N, H, R, N, H, R>
    where
        N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
        W: Fn(&N, &dyn Fn(&N) -> H) -> H + Send + Sync + 'static,
    {
        let w = Arc::new(wrapper);
        // init_mapper: (N → H) → (N → H). Curries the user's W with the prior init.
        let mi = move |old: Arc<dyn Fn(&N) -> H + Send + Sync>|
                   -> Arc<dyn Fn(&N) -> H + Send + Sync> {
            let w = w.clone();
            Arc::new(move |n: &N| w(n, &*old))
        };
        Shared::phases_lift::<N, H, R, H, R, _, _, _>(
            mi,
            Shared::identity_acc_mapper::<H, R>(),
            Shared::identity_fin_mapper::<H, R>(),
        )
    }
}

Reading the body: take the user wrapper w, return a function from the prior init old to a new init that, on each n, calls w(n, &*old). The closure passed in is the prior phase, exposed as a value. That is the structural definition of a phase mapper. The “intercept” framing is incidental; the categorical content is compose this layer’s natural transformation with the prior layer’s.

The same shape recurs, with appropriate types, in wrap_accumulate_lift and wrap_finalize_lift. The general primitive they all collapse to is Shared::phases_lift — three phase mappers, one per phase, each taking the prior phase and producing the next.
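Stripped of the library’s Lift machinery, the wrap-and-compose move fits in a few lines (Init and wrap_init here are stand-ins over plain Arc’d closures, not hylic’s types):

```rust
use std::sync::Arc;

// Stand-in init phase: N → H.
type Init<N, H> = Arc<dyn Fn(&N) -> H>;

// Uncurried phase-mapper form: the user closure receives the node
// AND the prior phase as a value; applying it to `prior` yields the
// next phase. Composition is just repeated application.
fn wrap_init<N: 'static, H: 'static>(
    w: impl Fn(&N, &dyn Fn(&N) -> H) -> H + 'static,
    prior: Init<N, H>,
) -> Init<N, H> {
    Arc::new(move |n: &N| w(n, &*prior))
}
```

Layering two wrappers shows the composition order: the outermost wrapper sees the already-wrapped phase as its prior, exactly as in the library’s init_mapper chain.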

Reading list