hylic
A Rust library for tree-shaped recursive computation.
hylic splits a recursive computation into three pieces you build independently: a fold that says what to compute at each node, a graph that says how a node yields its children, and an executor that drives the recursion. Each piece can be defined, transformed, and composed on its own.
The library ships with two executors. Fused is the sequential
one — a callback-based recursion with no overhead beyond the
fold closures themselves. Funnel is the parallel one: a
work-stealing engine with three compile-time policy axes (queue
topology, accumulation strategy, wake policy), all monomorphised,
no runtime dispatch on strategy choice. On the 14-workload Matrix
bench Funnel wins 10 rows outright against handrolled Rayon
and a scoped pool, and lands within a few percent of the winner
on the rest. Numbers, the
interactive viewer,
and the workload catalogue are on the
benchmarks page. Scenarios are
synthetic CPU-burn workloads, so absolute milliseconds describe
shape and relative ordering rather than any specific production
pipeline.
The same fold runs unchanged under either executor — the choice
of FUSED versus exec(funnel::Spec::default(n)) is one
expression, not one rewrite.
A first example
Consider the classical problem of computing total size on disk. The tree structure corresponds to the directory layout; the fold at each node combines the node’s own size with the results from its children; the executor drives the recursion. Each concern is expressed independently and handed to the executor at the end:
#![allow(unused)]
fn main() {
#[test]
fn intro_dir_example() {
use hylic::prelude::*;
#[derive(Clone)]
struct Dir { name: String, size: u64, children: Vec<Dir> }
let graph = treeish(|d: &Dir| d.children.clone());
let fold = fold(
|d: &Dir| d.size,
|heap: &mut u64, child: &u64| *heap += child,
|heap: &u64| *heap,
);
let tree = Dir {
name: "project".into(), size: 10,
children: vec![
Dir { name: "src".into(), size: 200, children: vec![] },
Dir { name: "docs".into(), size: 50, children: vec![] },
],
};
// Sequential:
let total = FUSED.run(&fold, &graph, &tree);
assert_eq!(total, 260);
// Parallel — same fold, same graph:
let total = exec(funnel::Spec::default(4)).run(&fold, &graph, &tree);
assert_eq!(total, 260);
}
}
The tree structure need not live inside the data. A Treeish is a
function from a node to its children — it can traverse a nested
struct, look up indices in a flat array, or resolve references
through any external mechanism:
#![allow(unused)]
fn main() {
#[test]
fn intro_flat_example() {
use hylic::prelude::*;
// Flat adjacency list — nodes are indices, children are looked up
let children: Vec<Vec<usize>> = vec![
vec![1, 2], // node 0 → children 1, 2
vec![], // node 1 → leaf
vec![], // node 2 → leaf
];
let graph = treeish_visit(move |n: &usize, cb: &mut dyn FnMut(&usize)| {
for &c in &children[*n] { cb(&c); }
});
let fold = fold(|n: &usize| *n as u64, |h: &mut u64, c: &u64| *h += c, |h| h.clone());
let total = FUSED.run(&fold, &graph, &0);
assert_eq!(total, 3); // 0 + 1 + 2
}
}
Architecture
User-defined closures are wrapped into composable types (Fold, Treeish), transformed independently, and handed to an executor. The executor drives a recursion where fold and graph interleave at every node:
- N — the node type (a struct, an index, a key — anything)
- H — the heap: per-node mutable scratch space, created by init, not shared between nodes
- R — the result: produced by finalize, flows upward to the parent's accumulate
Any fold and graph can be executed in parallel by switching to the Funnel executor — a work-stealing engine that interleaves unfold and fold without materialising the tree. Three compile-time policy axes — queue topology, accumulation strategy, wake policy — are monomorphised, so there is no runtime dispatch on strategy choice. See Benchmarks for measured results.
Transformations and lifts
Folds and graphs are independently transformable. Each combinator produces a new value — the original is unchanged (for Clone domains) or consumed (for Owned):
- N, NewN — original and target node types
- H — the fold's per-node heap (unchanged by map/zipmap/contramap)
- R, RNew, Extra — original, replaced, and augmented result types
All compose freely — see the Fold guide, Graph guide, and Transformations cookbook.
A lift goes further — it transforms both fold
and treeish in sync into a different type domain via the
Lift trait. The
Explainer
records the full computation trace at every node (histomorphism).
SeedPipeline handles a common case:
the tree is discovered lazily from seed references rather than
known upfront. The user provides a seed edge function
(Edgy<N, Seed>) and a grow function (Fn(&Seed) -> N); the
pipeline constructs the treeish, handles the entry transition, and
runs the fold. Internally it uses a lift (SeedLift), and the
SeedNode<N> row type is hidden behind sugar-time Node/EntryRoot
dispatch.
Cookbook
The Cookbook contains worked examples with snapshot-tested output: expression evaluation, module resolution, configuration inheritance, filesystem summary, cycle detection, parallel execution.
Where to start
The Quick Start walks through constructing and running a fold. The recursive pattern explains the underlying decomposition.
Further reading
- Meijer, Fokkinga, Paterson. Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire. (1991) — the original recursion schemes paper.
- Milewski. Monoidal Catamorphisms (2020) — a different algebra factorization. See comparison.
- Kmett. recursion-schemes — Haskell reference implementation.
- Malick. recursion.wtf — practical recursion schemes in Rust.
Quick Start
A complete fold — definition, tree structure, sequential execution — is one prelude line and three closures:
#![allow(unused)]
fn main() {
#[test]
fn intro_dir_example() {
use hylic::prelude::*;
#[derive(Clone)]
struct Dir { name: String, size: u64, children: Vec<Dir> }
let graph = treeish(|d: &Dir| d.children.clone());
let fold = fold(
|d: &Dir| d.size,
|heap: &mut u64, child: &u64| *heap += child,
|heap: &u64| *heap,
);
let tree = Dir {
name: "project".into(), size: 10,
children: vec![
Dir { name: "src".into(), size: 200, children: vec![] },
Dir { name: "docs".into(), size: 50, children: vec![] },
],
};
// Sequential:
let total = FUSED.run(&fold, &graph, &tree);
assert_eq!(total, 260);
// Parallel — same fold, same graph:
let total = exec(funnel::Spec::default(4)).run(&fold, &graph, &tree);
assert_eq!(total, 260);
}
}
fold(...) builds a Shared-domain Fold<Dir, u64, u64> from three
closures: init produces a per-node heap from a &Dir,
accumulate folds each child’s result into the heap, and
finalize extracts the result. treeish(...) wraps a children
function as a Treeish<Dir>. FUSED is the sequential executor
constant — callback-based recursion, no overhead beyond the fold
closures.
The Funnel executor swaps in without touching the fold or graph:
#![allow(unused)]
fn main() {
#[test]
fn quickstart_funnel() {
use hylic::prelude::*;
#[derive(Clone)]
struct Dir { name: String, size: u64, children: Vec<Dir> }
let graph = treeish(|d: &Dir| d.children.clone());
let fold = fold(
|d: &Dir| d.size,
|heap: &mut u64, child: &u64| *heap += child,
|heap: &u64| *heap,
);
let tree = Dir {
name: "root".into(), size: 10,
children: vec![
Dir { name: "a".into(), size: 5, children: vec![] },
Dir { name: "b".into(), size: 3, children: vec![] },
],
};
let total = exec(funnel::Spec::default(4)).run(&fold, &graph, &tree);
assert_eq!(total, 18);
}
}
Spec::default(n) picks the Robust preset over n worker threads;
see Funnel policies for the alternatives.
For repeated folds, pool creation amortises in a session scope:
#![allow(unused)]
fn main() {
#[test]
fn quickstart_session() {
use hylic::prelude::*;
#[derive(Clone)]
struct Dir { name: String, size: u64, children: Vec<Dir> }
let graph = treeish(|d: &Dir| d.children.clone());
let fold = fold(
|d: &Dir| d.size,
|heap: &mut u64, child: &u64| *heap += child,
|heap: &u64| *heap,
);
let tree = Dir {
name: "root".into(), size: 10,
children: vec![
Dir { name: "a".into(), size: 5, children: vec![] },
],
};
exec(funnel::Spec::default(4)).session(|s| {
let r1 = s.run(&fold, &graph, &tree);
let r2 = s.run(&fold, &graph, &tree);
assert_eq!(r1, r2);
});
}
}
The same fold over flat data
The tree need not live inside the data. The same summation fold
runs over a Vec<Vec<usize>> adjacency list, where nodes are
integer indices:
#![allow(unused)]
fn main() {
#[test]
fn intro_flat_example() {
use hylic::prelude::*;
// Flat adjacency list — nodes are indices, children are looked up
let children: Vec<Vec<usize>> = vec![
vec![1, 2], // node 0 → children 1, 2
vec![], // node 1 → leaf
vec![], // node 2 → leaf
];
let graph = treeish_visit(move |n: &usize, cb: &mut dyn FnMut(&usize)| {
for &c in &children[*n] { cb(&c); }
});
let fold = fold(|n: &usize| *n as u64, |h: &mut u64, c: &u64| *h += c, |h| h.clone());
let total = FUSED.run(&fold, &graph, &0);
assert_eq!(total, 3); // 0 + 1 + 2
}
}
Only the node type and the Treeish change — the fold logic is identical. This separation is the foundation of hylic’s composability.
Pivoting between the two
The two formulations describe the same shape of computation in
different node types. Fold::contramap_n lets you take a fold
written for one and run it over the other, without rewriting any
of the fold’s closures.
Going Dir-fold → flat: synthesise a minimal Dir per index — only
the fields the fold actually reads need to exist. The graph is
swapped on the executor’s side.
#![allow(unused)]
fn main() {
#[test]
fn pivot_dir_fold_to_flat() {
use hylic::prelude::*;
#[derive(Clone)]
struct Dir { size: u64, children: Vec<Dir> }
// The Dir-fold reads d.size and nothing else.
let dir_fold: Fold<Dir, u64, u64> = fold(
|d: &Dir| d.size,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
// Flat data for the same logical tree.
let sizes: Vec<u64> = vec![10, 200, 50];
let adj: Vec<Vec<usize>> = vec![vec![1, 2], vec![], vec![]];
// Pivot: contramap_n synthesises a minimal Dir from each index — only
// the fields the fold reads need to exist. The fold's closures are
// unchanged; the graph is the index-based one.
let flat_fold: Fold<usize, u64, u64> =
dir_fold.contramap_n(move |i: &usize| Dir { size: sizes[*i], children: vec![] });
let flat_graph: Treeish<usize> =
treeish_visit(move |i: &usize, cb: &mut dyn FnMut(&usize)| {
for &c in &adj[*i] { cb(&c); }
});
let total: u64 = FUSED.run(&flat_fold, &flat_graph, &0);
assert_eq!(total, 260);
}
}
The mirror direction projects each Dir to the index the
flat-fold expects:
#![allow(unused)]
fn main() {
#[test]
fn pivot_flat_fold_to_dir() {
use hylic::prelude::*;
#[derive(Clone)]
struct Dir { id: usize, children: Vec<Dir> }
// The flat-fold reads sizes[*i] from a captured array.
let sizes: Vec<u64> = vec![10, 200, 50];
let flat_fold: Fold<usize, u64, u64> = fold(
move |i: &usize| sizes[*i],
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
// The same logical tree as a struct.
let root = Dir { id: 0, children: vec![
Dir { id: 1, children: vec![] },
Dir { id: 2, children: vec![] },
]};
// Pivot: contramap_n projects each Dir to the index the fold expects.
// The fold's closures are unchanged; the graph walks struct children.
let dir_fold: Fold<Dir, u64, u64> = flat_fold.contramap_n(|d: &Dir| d.id);
let dir_graph: Treeish<Dir> = treeish(|d: &Dir| d.children.clone());
let total: u64 = FUSED.run(&dir_fold, &dir_graph, &root);
assert_eq!(total, 260);
}
}
In both directions the original fold’s closures pass through
unchanged; the only transformation is contramap_n on the input
axis. The graph is chosen at the call site to match.
Further reading
- The recursive pattern — the decomposition that makes this work
- Fold guide — transformations: map, contramap, product, phase wrapping
- Graph guide — filtering, contramap, memoization
- Funnel executor — the parallel work-stealing engine
- Cookbook — worked examples
Glossary
One-line definitions of the core terms, with pointers to where they’re developed in depth. Link to this page from anywhere a term appears without its definition.
Fold
The algebra over a recursion — a triple of closures init: &N → H,
accumulate: &mut H, &R, finalize: &H → R. Given an input node
N, a fold says how to produce a per-node scratch state, how to
fold each child’s result into it, and how to close it out into a
final R. See Fold guide.
Graph / Treeish<N>
A function from a node to its children. The type is
Treeish<N>; “graph” is the informal name for the concept. A
Treeish is how the recursion finds the next level; the executor
and fold never see the tree structure directly, only what the
Treeish yields. See Graph guide.
Heap (H)
The per-node working state inside a fold. Produced by init,
mutated by accumulate as each child’s R arrives, consumed by
finalize. Not shared across nodes; each node gets its own. Also
written as the type variable H in Fold signatures.
R (result)
The type returned from finalize at each node, and the type that
flows upward into the parent’s accumulate. At the root, the
executor hands back an R.
Domain (Shared / Local / Owned)
How hylic stores closures inside folds, graphs, and grow
functions. Shared uses Arc<dyn Fn + Send + Sync> (cheap clone,
parallel-safe); Local uses Rc<dyn Fn> (cheap clone,
single-thread, non-Send captures); Owned uses Box<dyn Fn>
(one-shot, consumed on use). See
The three domains.
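The Arc/Rc/Box distinction is ordinary Rust; a minimal standalone sketch (illustrative helper names, not hylic's types) shows the three storage disciplines and why they differ:

```rust
use std::rc::Rc;
use std::sync::Arc;

// Shared: Arc<dyn Fn + Send + Sync> — cheap clone, may cross threads.
fn shared_closure() -> Arc<dyn Fn(u64) -> u64 + Send + Sync> {
    Arc::new(|x| x + 1)
}

// Local: Rc<dyn Fn> — cheap clone, single thread; may capture !Send data.
fn local_closure() -> Rc<dyn Fn(u64) -> u64> {
    let non_send = Rc::new(10u64); // an Rc capture makes the closure !Send
    Rc::new(move |x| x + *non_send)
}

// Owned: Box<dyn FnOnce> — no clone, consumed on use.
fn owned_closure() -> Box<dyn FnOnce(u64) -> u64> {
    let buf = vec![1u64, 2, 3];
    Box::new(move |x| x + buf.into_iter().sum::<u64>()) // moves `buf` out
}

fn main() {
    let s = shared_closure();
    let s2 = s.clone(); // Arc clone: two handles, one closure
    assert_eq!(s(1), 2);
    assert_eq!(s2(1), 2);
    assert_eq!(local_closure()(5), 15);
    let o = owned_closure();
    assert_eq!(o(4), 10); // FnOnce: `o` is gone after this call
}
```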
Executor (Fused / Funnel)
The runtime that drives the recursion. Fused is a direct
sequential callback recursion — one thread, no work queue.
Funnel is a parallel work-stealing engine running over a
compile-time policy (queue topology, accumulation strategy, wake
policy). Both implement the Executor<N, R, D, G> trait. See
Choosing an executor.
Lift
A transformation that rewrites a (grow, treeish, fold) triple
into another one with possibly different types. Implemented by
the Lift trait; the library ships four atoms (IdentityLift,
ComposedLift, ShapeLift, SeedLift). See
Lifts.
ComposedLift<L1, L2>
The binary composition atom — two lifts chained so L2’s inputs
equal L1’s outputs. Every multi-sugar call builds a
right-associated ComposedLift tree so the compiler can verify
every junction at build time.
ShapeLift
The universal “rewrite one or more fold phases” lift used by
most Stage-2 sugars (wrap_init, zipmap, filter_edges, …).
SeedLift
The finishing lift that closes a SeedPipeline’s grow axis.
Domain-parametric over ShapeCapable: a single
SeedLift<D, N, Seed, H> struct, with per-domain Lift impls
because the fold-construction closures’ Send+Sync discipline
differs by domain. Not something user code constructs directly —
seed-rooted Stage2Pipeline::run assembles it at call time from
grow + user-supplied root_seeds + entry_heap and composes
it as the first lift of the run-time chain.
SeedNode<N>
Sealed row type with two library-internal variants: the
synthetic EntryRoot (a seed-closed chain’s root row) and a
resolved Node(N). User code inspects via is_entry_root,
as_node, map_node. See SeedNode.
SeedExplainerResult<N, H, R>
N-typed projection of a seed-closed explainer result. The
EntryRoot row is promoted into top-level fields
(entry_initial_heap, entry_working_heap, orig_result);
each root subtree becomes an ExplainerResult<N, H, R> — no
SeedNode<N> appears in the user-visible shape. Obtained via
raw.into() (or SeedExplainerResult::from(raw)).
Pipeline
A typestate-chained builder over lifts. SeedPipeline and
TreeishPipeline are Stage 1 (base slots);
Stage2Pipeline<Base, L> is the unified Stage-2 form (base +
lift chain), distinguished by its Base. OwnedPipeline is a
one-shot variant. Every pipeline ultimately resolves to a
(treeish, fold) pair handed to an executor. See
Pipelines.
Stage2Pipeline<Base, L>
The single Stage-2 type. Base is a Stage-1 pipeline implementing
Stage2Base; L is a Lift chain. Treeish-rooted pipelines
(Stage2Pipeline<TreeishPipeline<…>, L>) and seed-rooted
pipelines (Stage2Pipeline<SeedPipeline<…>, L>) compose Stage-2
sugars uniformly via Wrap dispatch.
Wrap / Stage2Base
Wrap is the dispatch trait that maps a Stage-2 sugar’s user-facing
&N parameter type to the chain-tip’s actual type — Identity
when the chain runs over N, SeedWrap when it runs over
SeedNode<N>. Stage2Base is the trait implemented by Stage-1
pipelines that can root a Stage2Pipeline; it carries the
associated Wrap implementation.
ShapeCapable::EntryHeap<H>
The per-domain GAT giving a domain its Fn() -> H storage
discipline. Arc<dyn Fn() -> H + Send + Sync> on Shared,
Rc<dyn Fn() -> H> on Local. Used by SeedLift for the
EntryRoot init thunk in place of a hand-rolled domain
discriminator enum.
Sugar
A pipeline method that delegates to .then_lift(...) with a
library lift — wrap_init, zipmap, filter_edges, explain,
etc. The sugar traits are split across stages and domains:
SeedSugars* and TreeishSugars* for Stage 1 reshape;
Stage2SugarsShared/Stage2SugarsLocal for Stage 2 (one trait
per domain, blanket-implemented across both Bases). See
Sugars.
CPS (continuation-passing style)
Used in two places with different meanings, both internal machinery:
- In Lift::apply, the trait takes a continuation so a lift can transform the triple and then call through to the user's executor rather than returning a value. This enables composition without an intermediate materialisation.
- In the Funnel executor, the recursion is defunctionalised into Cont::Root / Cont::Direct / Cont::Slot variants so workers run a loop { match cont { … } } rather than nesting calls. See CPS walk.
Users of the library don’t need to think about CPS to use it; these sections are optional reading.
Variance
Whether a type’s role allows covariant, contravariant, or
invariant transformation. N is covariant in grow, invariant in
graph, contravariant in fold’s init; H and R are invariant.
This is why the methods have the names they do
(map for covariant, contramap for contravariant, *_bi for
invariant/bijective). See Transforms and variance.
Grow
The Seed → N closure in a SeedPipeline — resolves a reference
into a full node. Only SeedPipeline has a grow slot; a
TreeishPipeline skips it (nodes are already materialised).
Benchmark results
Wall-clock means from criterion across four harnesses. The
Overhead harness pits Fused against handrolled
single-threaded recursions to measure framework cost. The
Matrix harness puts Funnel across its 16 policy variants
alongside Rayon and a scoped pool, all parallel, across 14
workload scenarios. Module simulation runs a synthetic
dependency-graph resolver — the workload that originally
motivated the library. Quick is a small subset of the
Matrix grid, used to track changes during development.
What the numbers say
Sequential first. Fused lands within ±20% of hand.seq
on every non-noop row of the Overhead bench, faster on 8 of 11.
spread against real.seq (a plain fn f(&T) -> R with no
hylic types in sight) is within ±16%. The library’s
fold/treeish indirection is, in this regime, on the order of
compiler-level noise rather than an integer multiple. The
plausible reason is a uniform per-node shape that monomorphises
predictably plus closures held inside Fold and Treeish
that the compiler can inline through; whatever the cause, the
practical statement is parity, not dominance.
The parallel picture is more interesting. A Funnel variant
is the row winner on 10 of 14 Matrix workloads. On the
remaining 4 the row winner is handrolled and the nearest
Funnel variant lies within a few percent. No single policy
preset wins across the grid: shallow-wide workloads prefer
Shared queues with OnArrival accumulation; deep-narrow
prefer PerWorker with OnFinalize; the wake axis can move a
row by 10–30% on its own. The 14-row table below is most
useful read row-by-row — for any one workload, the policy that
wins tells you something about the workload’s shape.
The Module-simulation harness probes the same trade-off. On the
four _fast rows (large-dense, large-sparse, small-dense,
small-sparse) Funnel variants take three of the four wins —
different policy axes per row, unsurprisingly given the Matrix
story. On the _slow
rows, where per-node work dominates and scheduling ceases to
matter, the runners cluster.
These properties of Funnel are statements about the source,
not inferences from the benchmarks. Policies are monomorphised
(Funnel<P> is generic, the entire walk specialises per
policy, no runtime dispatch on strategy). Continuations are
defunctionalised — Cont<H, R> is a three-variant enum
(Root, Direct, Slot); the inner loop is match cont in
a loop, no Box<dyn FnOnce> per step. Continuations and
fold chains live in arenas (ChainNode<H, R> in a scoped
Arena, Cont<H, R> in a ContArena, both released in bulk
at the end of the pool’s lifetime; no per-node malloc/free).
Under the OnArrival accumulation policy, each child result
is folded into its parent’s heap on arrival via
P::Accumulate::deliver, and the slot is freed; OnFinalize
buffers until siblings are complete and then drains. The walk
references the user’s fold and treeish by &'a _, with the
lifetime tied to the pool’s with(...) scope; user closures
are not cloned into worker queues. Queue topology is a
compile-time choice — per-worker deques (local push, remote
steal) or a single shared FIFO — and selection is per workload
rather than universal.
See the Funnel deep-dive for the walk, ticket system, and arena details, and Policies and presets for the policy traits.
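The defunctionalisation claim above can be made concrete without any of Funnel's actual machinery. The following is a minimal single-threaded sketch (the `Task` enum and names are illustrative, not Funnel's `Cont` type or its arenas): pending work is reified as enum values driven by a `loop { match … }` over an explicit stack, with each child result delivered into its parent's open heap slot on arrival — no boxed closure per step, no native call-stack recursion.

```rust
// Hypothetical sketch of defunctionalised tree folding; not hylic's API.
struct Node {
    value: u64,
    children: Vec<Node>,
}

// The "continuation" enum: each variant is one kind of pending work.
enum Task<'a> {
    Enter(&'a Node), // visit a node: open a heap slot, schedule children
    Exit(u64),       // children done: fold own value, deliver to parent
}

fn sum(root: &Node) -> u64 {
    let mut tasks = vec![Task::Enter(root)];
    let mut results: Vec<u64> = Vec::new(); // stack of open per-node heaps
    while let Some(task) = tasks.pop() {
        match task {
            Task::Enter(n) => {
                // Schedule the fold step first so it runs after the children.
                tasks.push(Task::Exit(n.value));
                results.push(0); // open this node's heap slot
                for c in &n.children {
                    tasks.push(Task::Enter(c));
                }
            }
            Task::Exit(own) => {
                // Close the slot: own value plus accumulated child results.
                let acc = results.pop().unwrap();
                let r = own + acc;
                if let Some(parent) = results.last_mut() {
                    *parent += r; // on-arrival delivery into the parent heap
                }
                if results.is_empty() {
                    return r; // root slot closed: done
                }
            }
        }
    }
    unreachable!("task stack drained without closing the root slot")
}

fn main() {
    let tree = Node {
        value: 10,
        children: vec![
            Node { value: 5, children: vec![] },
            Node { value: 3, children: vec![] },
        ],
    };
    assert_eq!(sum(&tree), 18);
}
```

In Funnel the same idea is spread across worker threads, with slots and continuations arena-allocated; the sketch keeps only the loop-and-enum shape.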
Interactive: Funnel axes viewer
The Matrix bench output filtered by policy axis, marginalised
on demand, with cell-level deviations from real.rayon.
Overhead
make -C hylic-benchmark bench-overhead
The Overhead table also lists several parallel runners
(real.rayon, hylic-rayon, hand.rayon, hylic-parref+rayon,
hylic-eager+rayon) for cross-reference. They are not the
denominators for sequential-overhead statements; a parallel
runner beating a sequential one says that multiple cores are
faster than one, not that the framework is slow.
For a framework-vs-handrolled comparison in the parallel
regime, hylic-rayon versus real.rayon is the
apples-to-apples pair: within ±15% on most rows, with a
worst-case +33% on parse-lt_sm. That is a real framework tax
on the parallel path; whether it’s tolerable depends on the
choice between Funnel and a Rayon-backed executor.
Matrix
make -C hylic-benchmark bench-matrix
Each cell shows the wall-clock mean and the +X% deviation
from the row’s fastest entry; the row winner is marked
(best). Reading a few rows together brings out the
policy-axis story.
wide_sm (200 nodes, branching 20):
funnel.pw.arrv.push = 6.2ms (best), 20% ahead of both
hand.pool and hand.rayon at 7.5ms. Wide fan-out plus
immediate OnArrival delivery and per-worker deques keeps the
push cheap and drains the child heap as siblings complete.
graph-hv_sm (heavy edge-discovery, modelling a dependency
resolver): funnel.sh.fin.k2 = 16.4ms (best), 2% ahead of
hand.rayon at 16.8ms. Dropping the wake frequency to every
second child amortises the edge-discovery cost better than the
handrolled approaches.
The 4 rows where handrolled wins are bal_sm, io_sm,
graph-io_sm, noop_sm. On bal_sm, hand.rayon = 16.1ms
versus funnel.sh.fin.push = 17.0ms (+6%). noop_sm is the
zero-work cell — dominated by per-node bookkeeping, absolute
times sub-millisecond, percentage deltas distort. The
framework cost is most visible there and unavoidable for any
tree-shaped recursive parallelisation.
Module simulation
make -C hylic-benchmark bench-modsim
Eight workloads on two axes — sparse vs dense graph, fast vs
slow per-node work. On the four _fast rows, Funnel
variants take three of four winners (funnel.pw.fin.push = 1.0ms on large-dense_fast, funnel.pw.arrv.push = 1.0ms
on large-sparse_fast, funnel.sh.arrv.push = 0.3ms on
small-sparse_fast); the fourth, small-dense_fast, sits
near 0.3ms across runners. For dependency-graph-shaped
workloads with cheap per-node work — the common case for a
module resolver — Funnel is the faster choice. Where
per-node work dominates, scheduler choice ceases to matter and
the runners converge.
Quick
make bench-quick-light
Five runners — real.rayon plus four Funnel variants
covering both queue axes (PerWorker, Shared) and both
accumulation axes (OnArrival, OnFinalize), all with EveryK<4>
wake. Nine scenarios chosen for variation: noop, hash,
parse-lt, parse-hv, aggr, xform, bal, wide,
graph-hv. Near-parity scenarios (io, deep, fin,
graph-io, lg-dense) are excluded.
The -ab variants run the same bench across multiple git
revisions of hylic, archiving each run with a timestamp.
Further revisions can be added by appending label=gitref to
the makefile target.
Workload scenarios
Each scenario is a TreeSpec (node count, branching factor)
and a WorkSpec (per-phase CPU burn amounts plus an optional
I/O spin-wait). busy_work is the deterministic u64 LCG
loop inside black_box; spin_wait_us is a wall-clock
busy-wait. The scenarios are synthetic — the intent is to
cover a shape space (shallow-wide, deep-narrow,
accumulate-heavy, finalize-heavy, I/O-bound, graph-discovery-
heavy) rather than reproduce any specific production workload.
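The actual burn primitives live in the benchmark support crate (included below); as a hedged illustration of the shape they describe, a deterministic u64 LCG inside `black_box` and a wall-clock busy-wait look roughly like this (constants and names here are illustrative, not the harness's own):

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Deterministic CPU burn: a linear congruential generator stepped in a
// loop. black_box keeps the optimiser from folding the loop away, so the
// same iteration count always costs the same work and yields the same value.
fn busy_work(iters: u64) -> u64 {
    let mut x: u64 = 0x9E37_79B9_7F4A_7C15; // arbitrary fixed seed
    for _ in 0..iters {
        // Knuth's MMIX LCG constants (illustrative choice).
        x = black_box(
            x.wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407),
        );
    }
    x
}

// Wall-clock busy-wait: spins until the deadline passes, holding the core.
fn spin_wait_us(us: u64) {
    let deadline = Instant::now() + Duration::from_micros(us);
    while Instant::now() < deadline {
        std::hint::spin_loop();
    }
}

fn main() {
    // Deterministic: same iteration count, same result.
    assert_eq!(busy_work(1000), busy_work(1000));
    let t0 = Instant::now();
    spin_wait_us(200);
    assert!(t0.elapsed() >= Duration::from_micros(200));
}
```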
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/support/scenario.rs:scenario_catalog}}
}
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/support/work.rs:work_spec}}
}
Funnel policy variants
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/support/executor_set.rs:funnel_specs}}
}
See Funnel policies for the meaning of each axis, the rationale, and guidance on selecting a preset.
Text tables
Overhead
workload hand-pool hand-rayon hand-seq hylic-eager+fused hylic-eager+rayon hylic-fused hylic-fused-local hylic-fused-owned hylic-parref+fused hylic-parref+rayon hylic-rayon real-rayon real-seq
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
aggr_sm 19.5ms (+68%) 14.2ms (+23%) 50.1ms (+334%) 21.7ms (+88%) 20.1ms (+74%) 52.3ms (+353%) 52.0ms (+350%) 55.7ms (+382%) 15.6ms (+35%) 12.6ms (+9%) 11.5ms (best) 15.9ms (+38%) 47.9ms (+315%)
bal_sm 25.8ms (+69%) 15.3ms (best) 92.3ms (+505%) 62.8ms (+311%) 28.5ms (+87%) 89.0ms (+483%) 88.2ms (+478%) 88.7ms (+481%) 59.9ms (+292%) 21.2ms (+39%) 21.0ms (+38%) 20.7ms (+36%) 76.9ms (+404%)
deep_sm 10.0ms (+33%) 9.6ms (+28%) 34.9ms (+365%) 32.3ms (+330%) 13.3ms (+78%) 34.6ms (+361%) 36.8ms (+391%) 35.0ms (+367%) 31.9ms (+325%) 9.5ms (+26%) 8.6ms (+15%) 7.5ms (best) 31.0ms (+313%)
fin_sm 15.1ms (+60%) 9.5ms (best) 45.1ms (+377%) 14.8ms (+56%) 15.7ms (+66%) 44.6ms (+371%) 47.9ms (+407%) 43.9ms (+364%) 9.5ms (best) 10.5ms (+11%) 10.2ms (+7%) 11.0ms (+17%) 43.0ms (+355%)
hash_sm 1.6ms (+69%) 1.2ms (+23%) 5.9ms (+520%) 4.5ms (+377%) 1.6ms (+66%) 4.2ms (+339%) 4.5ms (+375%) 4.5ms (+374%) 4.2ms (+340%) 1.3ms (+37%) 1.0ms (+5%) 1.0ms (best) 4.6ms (+381%)
io_sm 10.9ms (+43%) 7.6ms (+1%) 42.4ms (+460%) 42.7ms (+464%) 7.9ms (+4%) 42.3ms (+458%) 42.6ms (+462%) 42.5ms (+461%) 42.6ms (+462%) 7.6ms (best) 7.6ms (+1%) 7.6ms (best) 42.0ms (+455%)
lg-dense_sm 29.1ms (+49%) 19.5ms (best) 91.1ms (+368%) 74.8ms (+284%) 22.7ms (+17%) 102.8ms (+428%) 87.2ms (+348%) 85.7ms (+340%) 73.0ms (+275%) 22.9ms (+17%) 20.3ms (+4%) 20.5ms (+5%) 96.4ms (+395%)
noop_sm 0.1ms (+7762%) 0.0ms (+2148%) 0.0ms (+36%) 0.1ms (+13599%) 0.2ms (+20791%) 0.0ms (+336%) 0.0ms (+867%) 0.0ms (+623%) 0.1ms (+12240%) 0.1ms (+8872%) 0.0ms (+2548%) 0.0ms (+2697%) 0.0ms (best)
parse-hv_sm 37.6ms (+56%) 24.2ms (best) 102.4ms (+323%) 119.6ms (+395%) 24.8ms (+3%) 119.7ms (+395%) 114.0ms (+372%) 124.1ms (+413%) 106.6ms (+341%) 24.6ms (+2%) 27.2ms (+13%) 24.6ms (+2%) 111.5ms (+361%)
parse-lt_sm 10.5ms (+101%) 7.2ms (+39%) 28.9ms (+455%) 26.8ms (+414%) 7.7ms (+48%) 26.2ms (+403%) 29.0ms (+458%) 26.2ms (+403%) 27.2ms (+422%) 6.9ms (+33%) 5.7ms (+10%) 5.2ms (best) 30.6ms (+488%)
wide_sm 9.6ms (+40%) 6.9ms (best) 38.6ms (+461%) 31.6ms (+359%) 10.0ms (+45%) 35.6ms (+418%) 35.6ms (+417%) 35.6ms (+417%) 30.9ms (+349%) 10.2ms (+49%) 8.9ms (+30%) 9.4ms (+37%) 31.3ms (+355%)
xform_sm 17.3ms (+66%) 11.7ms (+13%) 53.9ms (+419%) 19.5ms (+88%) 16.3ms (+57%) 43.6ms (+320%) 50.1ms (+383%) 50.0ms (+382%) 13.1ms (+27%) 12.4ms (+19%) 10.4ms (best) 11.6ms (+12%) 50.9ms (+390%)
(Matrix and Module-simulation text tables refresh after the
next make bench-matrix / make bench-modsim run.)
Benchmark source
Overhead harness
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/bench_overhead.rs}}
}
Matrix harness
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/bench_matrix.rs}}
}
Module simulation harness
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/bench_modsim.rs}}
}
Runner matrix construction
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/support/runners.rs}}
}
Handrolled baselines
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/support/baselines.rs}}
}
Funnel policy specs
#![allow(unused)]
fn main() {
{{#include ../../../../hylic-benchmark/benches/support/executor_set.rs}}
}
Correctness
Performance numbers are uninformative without correctness. The
Funnel executor has a unit and integration suite under
hylic/src/exec/variant/funnel/tests/ covering the API, parity
with the Fused baseline, and deterministic results across all
policy variants. An interleaving stress harness in
tests/interleaving.rs and tests/stress.rs exercises the
scheduler under aggressive steal patterns. Every benchmark
harness asserts that the computed R matches a reference Fused
run (PreparedScenario::expected) before timing begins; a
policy variant producing a faster-but-incorrect answer would
never reach the tables above.
The recursive pattern
Recursive tree computations — regardless of domain — share a single underlying structure. hylic makes that structure explicit, names its parts, and allows each part to be transformed independently.
One function
The entire computation, taken directly from the sequential executor:
#![allow(unused)]
fn main() {
fn recurse<N, H, R>(
fold: &impl FoldOps<N, H, R>,
graph: &impl TreeOps<N>,
node: &N,
) -> R {
let mut heap = fold.init(node);
graph.visit(node, &mut |child: &N| {
let r = recurse(fold, graph, child);
fold.accumulate(&mut heap, &r);
});
fold.finalize(&heap)
}
}
At each node:
- init — construct a heap H from the node
- visit children — for each child, recurse and accumulate the result
- finalize — produce the node's result R from the heap
Every tree fold — Fibonacci, dependency resolution, filesystem
aggregation, AST evaluation — is this function instantiated with
different choices for init, accumulate, finalize, and child
structure.
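To see the claim concretely, here is the same function restated standalone with the trait bounds replaced by plain closure parameters (illustrative signatures, not hylic's FoldOps/TreeOps traits), instantiated as the Fibonacci fold mentioned above — init marks leaves, the child structure is n → {n−1, n−2}, accumulate sums, finalize extracts:

```rust
// Standalone restatement of the recursive pattern; not hylic's API.
fn recurse<N: Copy, H, R>(
    init: &impl Fn(N) -> H,
    children: &impl Fn(N) -> Vec<N>,
    accumulate: &impl Fn(&mut H, &R),
    finalize: &impl Fn(&H) -> R,
    node: N,
) -> R {
    let mut heap = init(node);
    for child in children(node) {
        let r = recurse(init, children, accumulate, finalize, child);
        accumulate(&mut heap, &r);
    }
    finalize(&heap)
}

// Fibonacci as a tree fold: the recursion tree of n has fib(n) leaves.
fn fib(n: u64) -> u64 {
    recurse(
        &|n: u64| if n < 2 { 1 } else { 0 }, // init: leaves contribute 1
        &|n: u64| if n < 2 { vec![] } else { vec![n - 1, n - 2] },
        &|h: &mut u64, r: &u64| *h += r, // accumulate: sum child results
        &|h: &u64| *h,                   // finalize: identity extraction
        n,
    )
}

fn main() {
    // fib with fib(0) = fib(1) = 1
    let seq: Vec<u64> = (0..8).map(fib).collect();
    assert_eq!(seq, vec![1, 1, 2, 3, 5, 8, 13, 21]);
}
```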
Three pieces
The function above takes three things as parameters. hylic gives each a name and a type:
Treeish — the tree structure. Given a node, visit its children:
#![allow(unused)]
fn main() {
/// A `NodeT → EdgeT*` function wrapped as a clonable Arc-backed
/// struct. When `NodeT = EdgeT` the type is typically named
/// [`crate::graph::Treeish`].
pub struct Edgy<NodeT, EdgeT> {
impl_visit: Arc<dyn Fn(&NodeT, &mut dyn FnMut(&EdgeT)) + Send + Sync>,
}
}
Treeish<N> is an alias for Edgy<N, N> — an edge function where
nodes and edges are the same type:
#![allow(unused)]
fn main() {
pub type Treeish<Node> = Edgy<Node, Node>;
}
A Treeish is constructed from a function from node to children:
#![allow(unused)]
fn main() {
#[test]
fn treeish_constructor() {
use hylic::prelude::*;
#[derive(Clone)]
struct Dir { name: String, size: u64, children: Vec<Dir> }
let graph: Treeish<Dir> = treeish(|d: &Dir| d.children.clone());
let root = Dir { name: "root".into(), size: 10, children: vec![] };
assert_eq!(graph.apply(&root).len(), 0);
}
}
The callback-based signature Fn(&N, &mut dyn FnMut(&N)) avoids
any allocation per visit. The treeish() constructor wraps a
Vec-returning function into this form.
The node type N may be anything — a nested struct, an integer
index into an adjacency list, a string key into a map, or a
reference resolved through I/O. The structure resides in the
treeish function rather than in the data.
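For instance, here is a sketch (plain closures, not hylic's types) in which the node type is a bare index and the whole tree lives in an adjacency list that the edge function closes over:

```rust
// The node type is a bare usize; the structure lives entirely in the
// adjacency list the edge function closes over, not in the nodes.
fn collect_children(adj: &[Vec<usize>], root: usize) -> Vec<usize> {
    // Callback form of the edge function, as in the Treeish signature.
    let visit = |n: &usize, cb: &mut dyn FnMut(&usize)| {
        for child in &adj[*n] { cb(child) }
    };
    let mut seen = Vec::new();
    visit(&root, &mut |c| seen.push(*c));
    seen
}

fn main() {
    let adj = vec![
        vec![1, 2], // node 0 → children 1 and 2
        vec![3],    // node 1 → child 3
        vec![],     // node 2
        vec![],     // node 3
    ];
    assert_eq!(collect_children(&adj, 0), vec![1, 2]);
    println!("{:?}", collect_children(&adj, 0));
}
```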
Fold — the computation. In the Shared domain, three closures behind Arc:
#![allow(unused)]
fn main() {
pub struct Fold<N, H, R> {
pub(crate) impl_init: Arc<dyn Fn(&N) -> H + Send + Sync>,
pub(crate) impl_accumulate: Arc<dyn Fn(&mut H, &R) + Send + Sync>,
pub(crate) impl_finalize: Arc<dyn Fn(&H) -> R + Send + Sync>,
}
}
Other domains use Rc (Local) or Box (Owned) — same operations, different boxing. The fold type doesn’t carry the domain; the executor does.
- init: &N → H — create per-node working state from the node
- accumulate: &mut H, &R — fold one child’s result into the heap
- finalize: &H → R — close the bracket, produce the node’s result
H and R are distinct types: H is mutable working state (the
open bracket), R is the immutable result flowing to the parent (the
closed bracket). See
The N-H-R algebra factorization for the
theoretical basis. Many folds have H = R, in which case finalize
is just an identity extraction from the heap:
#![allow(unused)]
fn main() {
#[test]
fn identity_finalize_fold_example() {
use hylic::prelude::*;
#[derive(Clone)]
struct Dir { name: String, size: u64, children: Vec<Dir> }
let graph: Treeish<Dir> = treeish(|d: &Dir| d.children.clone());
let sum: Fold<Dir, u64, u64> = fold(
|d: &Dir| d.size,
|heap: &mut u64, child: &u64| *heap += child,
|h: &u64| *h,
);
let tree = Dir {
name: "root".into(), size: 10,
children: vec![
Dir { name: "a".into(), size: 5, children: vec![] },
Dir { name: "b".into(), size: 3, children: vec![] },
],
};
assert_eq!(FUSED.run(&sum, &graph, &tree), 18);
}
}
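For contrast, here is a plain-Rust sketch of an H ≠ R fold, written without hylic's API: the heap is the open bracket (a Vec of child depths), and finalize closes it to a single number:

```rust
// H = Vec<u64> (the open bracket), R = u64 (the closed one).
struct Tree { children: Vec<Tree> }

fn depth(node: &Tree) -> u64 {
    let mut heap: Vec<u64> = Vec::new();        // init: fresh heap
    for child in &node.children {
        heap.push(depth(child));                // accumulate child results
    }
    1 + heap.iter().copied().max().unwrap_or(0) // finalize: H → R
}

fn main() {
    let t = Tree { children: vec![
        Tree { children: vec![Tree { children: vec![] }] },
        Tree { children: vec![] },
    ]};
    assert_eq!(depth(&t), 3);
    println!("depth = {}", depth(&t));
}
```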
Executor — the strategy. Controls HOW the recursion runs:
#![allow(unused)]
fn main() {
#[test]
fn exec_usage() {
use hylic::prelude::*;
#[derive(Clone)]
struct N { val: u64, children: Vec<N> }
let graph: Treeish<N> = treeish(|n: &N| n.children.clone());
let f: Fold<N, u64, u64> = fold(
|n: &N| n.val,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
let root = N { val: 1, children: vec![N { val: 2, children: vec![] }] };
// Sequential:
let r1: u64 = FUSED.run(&f, &graph, &root);
// Parallel — same fold, same graph:
let r2: u64 = exec(funnel::Spec::default(4)).run(&f, &graph, &root);
assert_eq!(r1, r2);
}
}
Two executors are provided:
| Executor | Traversal | Domains |
|---|---|---|
| FUSED (Shared) / local::FUSED / owned::FUSED | Direct sequential recursion | all |
| Funnel | Parallel work-stealing | Shared |
Both implement the Executor<N, R, D, G> trait, parameterised by
a domain and graph type. See
Executor architecture for
details.
The separation
The fold carries no knowledge of the tree; the tree carries no knowledge of the fold; the executor connects them. The domain determines how closures are stored — the fold and treeish do not record this, the executor does.
Every computation in hylic reduces to
executor.run(&fold, &treeish, &root). When the tree is
discovered lazily (seeds resolved on demand),
SeedPipeline constructs the treeish
from a seed edge function together with a grow, and delegates
to executor.run internally.
The operations traits
The executor’s recursion engine doesn’t know about Arc, Rc, or Box.
It takes &impl FoldOps<N, H, R> and &impl TreeOps<N> — pure
operation traits:
#![allow(unused)]
fn main() {
/// The three fold operations, independent of storage.
pub trait FoldOps<N, H, R> {
/// Construct a fresh per-node heap from a node reference.
fn init(&self, node: &N) -> H;
/// Fold one child result into the heap in place.
fn accumulate(&self, heap: &mut H, result: &R);
/// Close out the heap into the node's final result.
fn finalize(&self, heap: &H) -> R;
}
}
#![allow(unused)]
fn main() {
/// Tree traversal operations, independent of storage.
pub trait TreeOps<N> {
/// Visit children of `node` via callback. Zero allocation.
fn visit(&self, node: &N, cb: &mut dyn FnMut(&N));
/// Collect children to Vec. Default: collect via visit.
fn apply(&self, node: &N) -> Vec<N> where N: Clone {
let mut v = Vec::new();
self.visit(node, &mut |child| v.push(child.clone()));
v
}
}
}
The standard Fold<N, H, R> and Treeish<N> implement these traits.
So do local::Fold, owned::Fold, and any user-defined struct with
the right methods. The executor is generic over these traits — when
called with a concrete struct, the compiler inlines completely.
See Domain system for how domains connect operations to storage.
The three domains
The underlying question
A recursion is, at its heart, five closures — a fold’s init,
accumulate, and finalize, a graph’s edge function, and (in
seed pipelines) a grow. hylic retains these closures across
the duration of a run and hands them to executors, lifts, and
user code. A single question organises the design:
How shall dyn Fn(&N) -> H be stored?
Rust offers three practical answers:
| Storage | Clone? | Send + Sync? |
|---|---|---|
| Arc<dyn> | cheap (refcount bump) | yes, if the closure is |
| Rc<dyn> | cheap (refcount bump) | no (single-threaded) |
| Box<dyn> | not Clone | possible, but consumed on use |
Each choice is a compromise. Arc pays an atomic instruction on
every clone in exchange for the ability to cross thread
boundaries. Rc uses a plain counter — faster single-threaded,
incompatible with multi-threading. Box avoids any counter but
forces transformation pipelines to consume the closure on each
rewrite.
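The trade-off can be seen directly in plain Rust, independent of hylic (the constructor names here are illustrative, not the library's):

```rust
use std::rc::Rc;
use std::sync::Arc;

// The same closure under the three storage disciplines.
fn shared_init() -> Arc<dyn Fn(&u64) -> u64 + Send + Sync> { Arc::new(|n| n + 1) }
fn local_init() -> Rc<dyn Fn(&u64) -> u64> { Rc::new(|n| n + 1) }
fn owned_init() -> Box<dyn Fn(&u64) -> u64> { Box::new(|n| n + 1) }

fn main() {
    let shared = shared_init();
    let shared2 = Arc::clone(&shared); // atomic refcount bump; may cross threads
    let local = local_init();
    let local2 = Rc::clone(&local);    // plain counter bump; one thread only
    let owned = owned_init();          // no counter; not Clone, moved on use
    assert_eq!(shared2(&41), 42);
    assert_eq!(local2(&41), 42);
    assert_eq!(owned(&41), 42);
    println!("ok");
}
```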
Every closure in a recursion must agree on the choice. hylic therefore selects once at the top level and propagates the choice through the entire pipeline; that selection is what is called a domain.
Three choices, three types
- Shared stores closures behind Arc with Send + Sync bounds. The atomic clone grants access to the parallel Funnel executor and makes every pipeline Clone.
- Local stores closures behind Rc (no Send bound). Clones remain cheap and the pipeline interfaces are unchanged, but execution is confined to a single thread. In return, captures may include Rc<_>, RefCell<_>, or any non-Send type.
- Owned stores closures in Box. Clones and sharing are both absent; each stage of a pipeline consumes its predecessor. Appropriate for one-shot computations that should avoid reference-counting overhead entirely.
Shared is the conservative default and serves most code. Local
is the escape hatch for non-Send captures. Owned is the
minimalist one-shot variant.
The Domain trait
The three choices are encoded as marker types implementing the
Domain<N> trait:
#![allow(unused)]
fn main() {
pub trait Domain<N: 'static>: 'static {
type Fold<H: 'static, R: 'static>: FoldOps<N, H, R>;
type Graph<E: 'static> where E: 'static;
type Grow<Seed: 'static, NOut: 'static>;
/// Construct a fold from three closures. Uniform Send+Sync
/// bound; each domain sheds Send+Sync at storage time if it
/// doesn't need it.
fn make_fold<H: 'static, R: 'static>(
init: impl Fn(&N) -> H + Send + Sync + 'static,
acc: impl Fn(&mut H, &R) + Send + Sync + 'static,
fin: impl Fn(&H) -> R + Send + Sync + 'static,
) -> Self::Fold<H, R>;
/// Construct a grow closure from a Fn. Uniform Send+Sync bound.
fn make_grow<Seed: 'static, NOut: 'static>(
f: impl Fn(&Seed) -> NOut + Send + Sync + 'static,
) -> Self::Grow<Seed, NOut>;
/// Invoke a stored grow closure.
fn invoke_grow<Seed: 'static, NOut: 'static>(
g: &Self::Grow<Seed, NOut>,
s: &Seed,
) -> NOut;
/// Construct a graph (Edgy) closure. Uniform Send+Sync bound.
fn make_graph<E: 'static>(
visit: impl Fn(&N, &mut dyn FnMut(&E)) + Send + Sync + 'static,
) -> Self::Graph<E>;
}
}
A Domain<N> implementation specifies:
- the concrete Fold<H, R> type in use (closure storage lives inside Fold),
- the concrete Graph type,
- the concrete Grow<Seed, N> type (for seed pipelines),
- constructor methods (make_fold, make_graph, make_grow) that build each of the above generically.
Code generic over D: Domain<N> constructs any of the three
without knowing whether the underlying storage is Arc, Rc,
or Box; the constructor methods handle the distinction.
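A miniature of that pattern in standalone Rust (trait and type names are local to this sketch, not hylic's): one trait picks the storage, and generic code constructs through it with the uniform Send + Sync bound that each impl may shed at storage time:

```rust
use std::rc::Rc;
use std::sync::Arc;

// A toy domain trait: the associated type chooses the storage.
trait MiniDomain {
    type Stored;
    fn make<F: Fn(u64) -> u64 + Send + Sync + 'static>(f: F) -> Self::Stored;
    fn call(s: &Self::Stored, n: u64) -> u64;
}

struct SharedD;
impl MiniDomain for SharedD {
    type Stored = Arc<dyn Fn(u64) -> u64 + Send + Sync>;
    fn make<F: Fn(u64) -> u64 + Send + Sync + 'static>(f: F) -> Self::Stored { Arc::new(f) }
    fn call(s: &Self::Stored, n: u64) -> u64 { s(n) }
}

struct LocalD;
impl MiniDomain for LocalD {
    // Sheds the Send + Sync bound at storage time: Rc keeps only `Fn`.
    type Stored = Rc<dyn Fn(u64) -> u64>;
    fn make<F: Fn(u64) -> u64 + Send + Sync + 'static>(f: F) -> Self::Stored { Rc::new(f) }
    fn call(s: &Self::Stored, n: u64) -> u64 { s(n) }
}

// Generic construction: never mentions Arc or Rc.
fn build<D: MiniDomain>() -> D::Stored {
    D::make(|n| n + 1)
}

fn main() {
    assert_eq!(SharedD::call(&build::<SharedD>(), 41), 42);
    assert_eq!(LocalD::call(&build::<LocalD>(), 41), 42);
    println!("ok");
}
```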
Constructors across domains
Each domain exposes the same construction surface, distinguished only by its bounds:
#![allow(unused)]
fn main() {
#[test]
fn domains_three_folds() {
// Shared: closures must be Send + Sync (they go into Arc).
let _shared = hylic::domain::shared::fold(
|n: &u64| *n, // init
|h: &mut u64, c: &u64| *h += c, // accumulate
|h: &u64| *h, // finalize
);
// Local: closures can capture Rc / RefCell.
use std::cell::RefCell;
use std::rc::Rc;
let state = Rc::new(RefCell::new(0u32));
let state_for_init = state.clone();
let _local = hylic::domain::local::fold(
move |n: &u64| { *state_for_init.borrow_mut() += 1; *n },
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
// Owned: one-shot construction; not Clone.
let _owned = hylic::domain::owned::fold(
|n: &u64| *n,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
}
}
The Shared constructor requires Fn + Send + Sync + 'static
for every closure; Local requires Fn + 'static; Owned shares
Local’s bounds but returns a Box-backed struct. The
signatures are aligned so that generic code compiles without
modification across domains; the bounds differ so that each
domain accepts only those closures it is able to store.
The Fold struct, three times
Because storage differs, each domain ships its own Fold:
#![allow(unused)]
fn main() {
pub struct Fold<N, H, R> {
pub(crate) impl_init: Arc<dyn Fn(&N) -> H + Send + Sync>,
pub(crate) impl_accumulate: Arc<dyn Fn(&mut H, &R) + Send + Sync>,
pub(crate) impl_finalize: Arc<dyn Fn(&H) -> R + Send + Sync>,
}
}
Local and Owned share the same shape, with Rc and Box
substituted for Arc. The three are not interchangeable at the
type level: the Fused executor reads whichever concrete
D::Fold<H, R> the pipeline provides, and crossing domain
boundaries requires an explicit conversion that the library does
not supply — the expected discipline is to select a single domain
per computation.
Parallelism
The parallel Funnel executor requires
ShareableLift, a capability that reduces to
D = Shared together with Send + Sync on every payload
(N, H, R). Local and Owned cannot run in parallel by
construction: their storage types do not cross thread boundaries,
and the ShareableLift bound does not hold.
The converse is not true — a Shared pipeline runs without issue
under Fused. The price of choosing Shared is one atomic
operation per closure clone, and nothing more.
Picking one (decision tree)
In short: Shared by default, Local for non-Send captures,
Owned for the one-shot minimal case.
For library authors
Prefer code generic over D: Domain<N>. The three domain
markers are not interchangeable at runtime, but almost the whole
of hylic compiles once and operates across all three. Select a
concrete domain only where its specific capability is required
(D = Shared for parallelism; D = Owned for consume-on-use).
Transforms and variance
Where a type axis appears inside a fold or graph determines how
it may be transformed. The chapter opens with three examples
whose argument shapes differ in an informative way; the following
section traces those differences to the notion of variance. After
that, the library’s naming — map versus contramap versus
_bi — ceases to look like convention and becomes something that
can be read off the types.
Three transforms, three shapes
map on R — covariant, a single function. Given a
Fold<N, H, u64> summing filesystem sizes, producing a
Fold<N, H, String> that formats the sum requires only a forward
function u64 → String:
fold.map(|n: &u64| format!("{n} bytes"))
contramap_n — contravariant, a single function in the
opposite direction. Adapting a Fold<String, H, R> to obtain a
Fold<PathBuf, H, R> requires bridging PathBuf → String, since
the existing init consumes &String and must continue to do so:
fold.contramap_n(|pb: &PathBuf| pb.display().to_string())
map_r_bi — invariant, a pair of functions. Changing the
result type of an existing fold to a different representation
requires both directions, because R is accumulated into (a
parent receives its children’s R) as well as emitted (finalize
returns R); a one-way function cannot carry values through
both roles:
fold.map_r_bi(
/* forward */ |n: &u64| format!("{n}"),
/* backward */ |s: &String| s.parse().unwrap(),
)
Three argument shapes: one forward function (covariant), one reverse function (contravariant), a pair (invariant). The names track the shape.
Why the three shapes?
Each axis occupies a specific position within the slots:
Grow<Seed, N>: fn(&Seed) -> N ← N is an output
Graph<N>: fn(&N, &mut FnMut(&N)) ← N is both
Fold<N, H, R>::init: fn(&N) -> H ← N is an input
Fold<N, H, R>::acc: fn(&mut H, &R) ← H both, R input
Fold<N, H, R>::fin: fn(&H) -> R ← H input, R output
An axis that appears only in output position is covariant:
a forward function suffices to rewrite the produced value.
Hence map.
An axis that appears only in input position is
contravariant: adapting the axis requires a function in the
opposite direction, so the existing consumer continues to
receive values it understands. Hence contramap.
An axis that appears in both positions is invariant: no
single function bridges consumption and production together, so
both directions must be supplied. Hence the _bi suffix.
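The three shapes can be demonstrated with bare closures, outside hylic entirely (the function names here are hypothetical, not library API):

```rust
// Covariant: the axis is output-only, so one forward function suffices.
fn map_output(produce: impl Fn(&u32) -> u64 + 'static) -> impl Fn(&u32) -> String {
    move |n| format!("{} bytes", produce(n))
}

// Contravariant: the axis is input-only, so the adapter runs the other way.
fn contramap_input(consume: impl Fn(&u64) -> u64 + 'static) -> impl Fn(&String) -> u64 {
    move |s| consume(&(s.len() as u64))
}

// Invariant: the axis is both input and output, so both directions are needed.
fn map_bi(step: impl Fn(&u64) -> u64 + 'static) -> impl Fn(&String) -> String {
    move |s| {
        let back: u64 = s.parse().unwrap(); // backward: String → u64
        format!("{}", step(&back))          // forward:  u64 → String
    }
}

fn main() {
    let mapped = map_output(|n| *n as u64 * 2);
    let contramapped = contramap_input(|n| n + 1);
    let bi = map_bi(|n| n + 1);
    assert_eq!(mapped(&21), "42 bytes");
    assert_eq!(contramapped(&"abc".to_string()), 4);
    assert_eq!(bi(&"41".to_string()), "42");
    println!("ok");
}
```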
The three positions
N occupies all three positions across different slots — output
in Grow, input in Fold, both in Graph. A single “N transform”
would therefore apply in different directions depending on the
slot; the library instead exposes a per-slot transform on each
side, or a coordinated Lift that rewrites N across
all three slots at once.
H and R live only inside Fold, but each appears in both
positions there (H is init-output / acc-in+out / fin-input; R is
acc-input / fin-output). Both are invariant; changing either
requires a bijection.
Method surface, derived
With the variance pinned, the catalogue follows automatically.
On a Fold<N, H, R>:
- contramap_n(f: N' → N) — contravariant change of N. One arg.
- map_r_bi(fwd, bwd) — invariant change of R. Two args.
- wrap_init(w), wrap_accumulate(w), wrap_finalize(w) — invariant decorators on H and R. They don’t change the axes they touch; they intercept the existing functions.
- zipmap(m) — a covariant extension: pair the existing R with an extra value derived from it. R changes R → (R, Extra), forward only; the new R’s first component is the old R, so “going back” is structurally free (|p: &(R, Extra)| &p.0).
- product(other) — binary: run two folds in lockstep, carrier (H1, H2), (R1, R2).
On an Edgy<N, E>:
- map(f: E → E') — functor over edges (covariant on E).
- contramap(f: N' → N) — contravariant on N.
- filter(pred) — edge predicate.
- contramap_or_emit(f) — contramap with an escape hatch emitting edges directly (used in fallible graph construction).
Treeish<N> = Edgy<N, N> is what executors consume — the
specialisation where node type equals edge type.
The primitive the Edgy sugars wrap:
#![allow(unused)]
fn main() {
pub fn map<F, NewEdgeT: 'static>(&self, transform: F) -> Edgy<NodeT, NewEdgeT>
where F: Fn(&EdgeT) -> NewEdgeT + Send + Sync + 'static,
{
self.map_endpoints(move |inner| {
Arc::new(move |n: &NodeT, cb: &mut dyn FnMut(&NewEdgeT)| {
inner(n, &mut |e: &EdgeT| cb(&transform(e)))
})
})
}
}
#![allow(unused)]
fn main() {
pub fn contramap<F, NewNodeT: 'static>(&self, transform: F) -> Edgy<NewNodeT, EdgeT>
where F: Fn(&NewNodeT) -> NodeT + Send + Sync + 'static,
{
self.map_endpoints(move |inner| {
Arc::new(move |n: &NewNodeT, cb: &mut dyn FnMut(&EdgeT)| {
inner(&transform(n), cb)
})
})
}
}
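The callback-rewriting trick both primitives rely on works with plain functions too; this standalone sketch (free functions, not hylic's Edgy) shows the same two interpositions:

```rust
// Inner edge function: the children of n are 0..n.
fn inner(n: &u32, cb: &mut dyn FnMut(&u32)) {
    for c in 0..*n { cb(&c) }
}

// map: interpose on the callback so it sees transformed edges.
fn mapped(n: &u32, cb: &mut dyn FnMut(&String)) {
    inner(n, &mut |e: &u32| cb(&format!("edge-{e}")))
}

// contramap: translate the node before the inner function sees it.
fn contramapped(s: &String, cb: &mut dyn FnMut(&u32)) {
    inner(&(s.len() as u32), cb)
}

fn main() {
    let mut edges = Vec::new();
    mapped(&2, &mut |e: &String| edges.push(e.clone()));
    assert_eq!(edges, vec!["edge-0", "edge-1"]);
    let mut count = 0;
    contramapped(&"abc".to_string(), &mut |_| count += 1);
    assert_eq!(count, 3);
    println!("ok");
}
```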
Naming convention, recovered
From the above:
| Suffix | When |
|---|---|
| none (map, filter, wrap_*) | covariant or decorator-only |
| contramap, contramap_<axis> | contravariant; one function |
| _bi (map_r_bi, map_n_bi_lift, …) | invariant; bijection required |
| _or_emit | contramap with a direct-emit escape |
Names mark the variance, so the shape of the arguments is predictable from the identifier alone.
What this chapter does NOT cover
All the operations above change one axis of one structure
(Fold OR Graph). Changing N across BOTH structures in sync — or
building a new transform that wraps the whole (Grow, Graph, Fold)
triple and composes with others — is the job of a
Lift. Every library lift internally reduces to
one of these single-axis transforms or a coordinated set of them
(e.g. n_lift changes N across all three slots at once).
Category-theoretic framing (brief)
The catamorphism’s algebra is F R → R. hylic factors this through
H: init creates H from N, accumulate folds child Rs into H,
finalize projects H → R. The carrier between nodes is R; H is
internal to each node’s bracket. A lift is an algebra morphism —
it maps the carrier types (MapR, and internally the heap type
MapH) while preserving the fold structure. See
The N-H-R algebra factorization.
Lifts — cross-axis transforms
The problem
The transforms in the previous chapter act on a single structure — a Fold or a Graph, not both. Some rewrites, however, must touch both in a coordinated manner: a change of node type that the Graph produces and the Fold consumes; a filter that drops edges and must therefore also leave the Fold structurally consistent with what remains; a per-node trace that wraps the Fold’s output and composes with whatever other transforms are already in place.
A Lift is the object that performs such cross-axis rewrites. It operates on the full triple carried by a pipeline, not on one slot in isolation, and composes with other lifts to form a chain.
Why three axes
Besides Fold<N, H, R> and Graph<N>, there’s a third slot:
Grow<Seed, N> — a closure that resolves a Seed into an N.
Most users never build a Grow by hand; SeedPipeline
constructs one from grow: Seed → N. But because lifts compose
and some pipelines carry a Grow, the trait has to account for it.
Three slots conceptually, not two. The Lift trait signature
itself carries only the (treeish, fold) pair — the grow slot
travels outside it, as the trait’s doc comment explains — but
composition still has to account for all three. A lift that
doesn’t care about Grow (say, a fold-wrapper) passes it
unchanged. A lift that does care (the N-change lifts; SeedLift)
rewrites it in concert with the other slots.
The trait
#![allow(unused)]
fn main() {
/// Domain-generic transformer over the `(treeish, fold)` pair.
///
/// A `Lift` rewrites the graph side and/or the fold side, possibly
/// changing their carrier types, and hands the result to a
/// continuation. The caller's continuation-return type `T` flows
/// through, so the chain of output types stays inferred across
/// composition (`ComposedLift<L1, L2>`).
///
/// Grow is deliberately absent from this signature. Only the Seed
/// finishing lift ([`SeedLift`](super::SeedLift)) needs a grow
/// input; it is composed internally by
/// `hylic_pipeline::PipelineExecSeed::run` and does not travel as
/// a 3-slot signature through the `Lift` trait.
///
/// See [Lifts](https://hylic.balcony.codes/concepts/lifts.html).
pub trait Lift<D, N, H, R>
where D: Domain<N> + Domain<Self::N2>,
N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
{
/// Output node type after the lift has been applied.
type N2: Clone + 'static;
/// Output heap type after the lift has been applied.
type MapH: Clone + 'static;
/// Output result type after the lift has been applied.
type MapR: Clone + 'static;
/// Apply the lift to `(treeish, fold)` and invoke `cont` with
/// the transformed pair.
fn apply<T>(
&self,
treeish: <D as Domain<N>>::Graph<N>,
fold: <D as Domain<N>>::Fold<H, R>,
cont: impl FnOnce(
<D as Domain<Self::N2>>::Graph<Self::N2>,
<D as Domain<Self::N2>>::Fold<Self::MapH, Self::MapR>,
) -> T,
) -> T;
}
}
Three associated output types (N2, MapH, MapR) and a single
apply method. As a type-level arrow over the pair the trait
carries (lifts that touch a grow slot rewrite it in concert,
outside this signature):
L : (Graph<N>, Fold<N, H, R>)
→ (Graph<L::N2>, Fold<L::N2, L::MapH, L::MapR>)
Quick start
Most users never interact with the Lift trait directly; the
pipeline sugars are the usual surface, and each sugar delegates
to a library lift. A small example demonstrates what a lift
changes at the value level:
#![allow(unused)]
fn main() {
#[test]
fn bare_lift_wrap_init() {
use hylic::prelude::*;
let t: Treeish<u64> = treeish(|n: &u64| if *n > 0 { vec![*n - 1] } else { vec![] });
let fld: Fold<u64, u64, u64> = fold(|n: &u64| *n, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h);
// Wrap init to add +1 at each node.
let wi = Shared::wrap_init_lift::<u64, u64, u64, _>(|n, orig| orig(n) + 1);
let r: u64 = wi.run_on(&FUSED, t, fld, &3u64);
// Tree 3→2→1→0: 4 nodes, each +1 → 4 extra → 6 + 4 = 10.
assert_eq!(r, 10);
}
}
wrap_init_lift accepts a closure that intercepts every call to
init. The pipeline’s R is unchanged; only the per-node init
closure is wrapped. The remaining sugars follow the same pattern:
select an axis, supply the transformation as a closure, obtain a
new pipeline that differs only along that axis.
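The wrap shape itself, detached from lifts and pipelines, is just a higher-order function that receives the node and the original closure; a plain-Rust sketch (names hypothetical):

```rust
// The wrapper sees the node and the original closure, and decides
// how to combine them; the wrapped closure keeps the original type.
fn wrap_init(
    orig: impl Fn(&u64) -> u64 + 'static,
    wrapper: impl Fn(&u64, &dyn Fn(&u64) -> u64) -> u64 + 'static,
) -> impl Fn(&u64) -> u64 {
    move |n| wrapper(n, &orig)
}

fn main() {
    let init = |n: &u64| *n;
    let wrapped = wrap_init(init, |n, orig| orig(n) + 1);
    // Sum the chain 3 → 2 → 1 → 0 through the wrapped init:
    let total: u64 = (0..=3u64).map(|n| wrapped(&n)).sum();
    assert_eq!(total, 10); // 6 from the node values, plus 4 nodes × 1
    println!("total = {total}");
}
```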
The Library catalogue below lists the axes touched by each library lift.
Four atoms
Every library lift is an instance of one of four types. The sugars compose these atoms without requiring any of them to be constructed by hand; this section names the parts so that they are recognisable in compiler errors and in custom-lift implementations.
IdentityLift — pass-through. Used as the seed of a lift chain
when a Stage-1 pipeline transitions to Stage 2 via .lift().
#![allow(unused)]
fn main() {
/// The pass-through lift — the unit of lift composition. Leaves
/// every slot unchanged.
pub struct IdentityLift;
}
ComposedLift<L1, L2> — sequential composition. L1 runs
first; L2 takes L1’s outputs as its inputs.
#![allow(unused)]
fn main() {
/// Sequential composition of two lifts. `L1` runs first; `L2`
/// takes `L1`'s outputs as its inputs. The outer lift's `apply`
/// drives this composition.
#[must_use]
pub struct ComposedLift<L1, L2> {
pub(crate) inner: L1,
pub(crate) outer: L2,
}
}
(N, H, R) → L1::apply → (L1::N2, L1::MapH, L1::MapR) → L2::apply → (L2::N2, L2::MapH, L2::MapR)
The type-level bound L2: Lift<D, L1::N2, L1::MapH, L1::MapR>
enforces the connection. A mistake here surfaces as a compile
error at the composition site.
ShapeLift<D, N, H, R, N2, H2, R2> — the universal library
lift. Stores one per-domain xform per slot (treeish, fold) and
applies them during apply.
#![allow(unused)]
fn main() {
/// The universal library `Lift` — stores one xform per slot
/// (treeish, fold) and applies them during `apply`. Every library
/// lift except `SeedLift` is a `ShapeLift` with appropriate xforms.
#[must_use]
pub struct ShapeLift<D, N, H, R, N2, H2, R2>
where D: ShapeCapable<N> + Domain<N2>,
N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
N2: Clone + 'static, H2: Clone + 'static, R2: Clone + 'static,
{
pub(crate) treeish_xform: D::TreeishXform<N2>,
pub(crate) fold_xform: D::FoldXform<H, R, N2, H2, R2>,
}
}
Every concrete library lift is a ShapeLift with appropriate
xforms. wrap_init_lift only rewrites Fold’s init phase;
filter_edges_lift only rewrites Graph’s visit; n_lift rewrites
all three; explainer_lift rewrites only Fold (but changes
MapH and MapR to the explainer’s wrapper types).
SeedLift<D, N, Seed, H> — a finishing lift that closes a
SeedPipeline by turning the (grow, seeds_from_node, fold) triple
into a runnable (treeish, fold) pair rooted at an EntryRoot
variant. Domain-parametric over ShapeCapable (Shared + Local
impls; per-domain because the fold-construction closures’
Send+Sync discipline differs by domain). Assembled at run time
inside the seed-rooted Stage2Pipeline::run(...) from the base’s
grow plus the caller-supplied root_seeds and entry_heap,
then composed as the first lift of the run-time chain.
#![allow(unused)]
fn main() {
/// The finishing lift that closes a `SeedPipeline`'s grow axis.
/// Composes entry-seed dispatch on top of a `(grow, seeds, fold)`
/// triple and produces a treeish over `SeedNode<N>`. Not
/// user-constructed; assembled internally by
/// `Stage2Pipeline::run` at call time.
///
/// Domain-parametric: storage of the entry-seeds graph and the
/// entry-heap thunk is per-domain via `<D as Domain<()>>::Graph<Seed>`
/// and `<D as ShapeCapable<N>>::EntryHeap<H>`. No hand-rolled
/// domain discriminator.
#[must_use]
pub struct SeedLift<D, N, Seed, H>
where D: ShapeCapable<N> + Domain<()>,
N: 'static, Seed: 'static, H: 'static,
{
pub(crate) grow: <D as Domain<N>>::Grow<Seed, N>,
pub(crate) entry_seeds: <D as Domain<()>>::Graph<Seed>,
pub(crate) entry_heap_fn: <D as ShapeCapable<N>>::EntryHeap<H>,
_m: PhantomData<fn() -> (D, N, Seed, H)>,
}
}
Its N2 is SeedNode<N> — a sealed row type whose variants are
library-internal; user code inspects via is_entry_root(),
as_node(), map_node(f). Two inhabitants: the synthetic
EntryRoot (root fan-out over entry seeds) and a resolved Node(N).
SeedLift builds a Treeish<SeedNode<N>> that dispatches on
variant: EntryRoot visits the entry seeds via grow, Node visits
the user’s treeish.
For an N-typed view of a seed-closed .explain() result, convert
the raw ExplainerResult<SeedNode<N>, H, R> to SeedExplainerResult
via raw.into() — see
seed explainer result.
Bare application
Any Lift is usable without a pipeline. LiftBare is a blanket
trait:
#![allow(unused)]
fn main() {
/// Blanket trait extending any [`Lift`] with direct application to
/// a bare `(treeish, fold)` pair. Implemented automatically; users
/// call `.apply_bare(...)` or `.run_on(...)` without a pipeline.
pub trait LiftBare<D, N, H, R>: Lift<D, N, H, R>
where D: ShapeCapable<N> + Domain<Self::N2>,
N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
Self::N2: Clone + 'static,
Self::MapH: Clone + 'static,
Self::MapR: Clone + 'static,
{
/// Apply this lift to a bare (treeish, fold) pair; return the
/// transformed pair.
fn apply_bare(
&self,
treeish: <D as Domain<N>>::Graph<N>,
fold: <D as Domain<N>>::Fold<H, R>,
) -> (<D as Domain<Self::N2>>::Graph<Self::N2>,
<D as Domain<Self::N2>>::Fold<Self::MapH, Self::MapR>)
{
self.apply(treeish, fold, |t, f| (t, f))
}
/// Apply this lift and run the result under the given executor.
fn run_on<E>(
&self,
exec: &E,
treeish: <D as Domain<N>>::Graph<N>,
fold: <D as Domain<N>>::Fold<H, R>,
root: &Self::N2,
) -> Self::MapR
where
E: Executor<
Self::N2, Self::MapR, D,
<D as Domain<Self::N2>>::Graph<Self::N2>,
>,
<D as Domain<Self::N2>>::Graph<Self::N2>: TreeOps<Self::N2>,
{
let (t, f) = self.apply_bare(treeish, fold);
exec.run(&f, &t, root)
}
}
}
See Bare lift application
in the Pipelines overview for the rationale and the panic-grow
trick that lets LiftBare skip the grow slot.
Per-domain capability
Not every domain supports ShapeLift. A domain has to declare
what it can store as a per-slot xform:
#![allow(unused)]
fn main() {
#[allow(missing_docs)] // associated types/methods are implementation plumbing for ShapeLift
pub trait ShapeCapable<N: 'static>: Domain<N> {
type GrowXform<N2: 'static>: Clone + 'static;
type TreeishXform<N2: 'static>: Clone + 'static;
type FoldXform<H, R, N2, H2, R2>: Clone + 'static
where H: 'static, R: 'static, N2: 'static, H2: 'static, R2: 'static;
fn apply_grow_xform<Seed: 'static, N2: 'static>(
t: &Self::GrowXform<N2>,
g: <Self as Domain<N>>::Grow<Seed, N>,
) -> <Self as Domain<N2>>::Grow<Seed, N2>
where Self: Domain<N2>;
fn apply_treeish_xform<N2: 'static>(
t: &Self::TreeishXform<N2>,
g: <Self as Domain<N>>::Graph<N>,
) -> <Self as Domain<N2>>::Graph<N2>
where Self: Domain<N2>;
fn apply_fold_xform<H, R, N2, H2, R2>(
t: &Self::FoldXform<H, R, N2, H2, R2>,
f: <Self as Domain<N>>::Fold<H, R>,
) -> <Self as Domain<N2>>::Fold<H2, R2>
where Self: Domain<N2>,
H: 'static, R: 'static, N2: 'static, H2: 'static, R2: 'static;
fn identity_grow_xform() -> Self::GrowXform<N>
where N: Clone;
fn identity_treeish_xform() -> Self::TreeishXform<N>
where N: Clone;
fn identity_fold_xform<H: 'static, R: 'static>() -> Self::FoldXform<H, R, N, H, R>;
/// Compose a `grow: Seed → N` with a `seeds: Graph<Seed>` to
/// produce the fused `Graph<N>` (treeish). Needed by
/// `SeedPipeline::with_constructed` which yields a treeish over
/// N to the executor.
fn fuse_grow_with_seeds<Seed: 'static>(
grow: <Self as Domain<N>>::Grow<Seed, N>,
seeds: <Self as Domain<N>>::Graph<Seed>,
) -> <Self as Domain<N>>::Graph<N>
where Seed: Clone;
/// Storage type for `SeedLift`'s entry-heap thunk: a
/// `Fn() -> H` whose backing pointer matches the domain's
/// closure-storage discipline (Arc on Shared, Rc on Local).
/// Used in place of a hand-rolled domain discriminator enum.
type EntryHeap<H: 'static>: Clone + 'static;
}
}
Shared and Local are ShapeCapable — each storage uses its
own pointer type (Arc vs Rc) and closure bounds (Send + Sync
vs none). Owned is not ShapeCapable: Box<dyn Fn> is not
Clone, so xforms can’t be applied to produce a new owned fold.
Owned pipelines have no Stage-2 surface.
Parallel vs sequential
Two blanket markers gate which executors a lift can feed:
- PureLift<D, N, H, R> — any Lift + Clone + 'static with Clone outputs. Sufficient for the sequential executor Fused.
- ShareableLift<D, N, H, R> — adds Send + Sync on everything. Required for the parallel Funnel executor.
You don’t implement these; the compiler picks them up via blanket
impls in ops::lift::capability.
If your lift (or your data) doesn’t meet the parallel bounds,
calling .run(&funnel_exec, ...) is a compile error — there’s
no silent fallback.
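The gating mechanism is ordinary trait bounds; a minimal standalone sketch of the idea (not hylic's actual traits):

```rust
use std::sync::Arc;

// A bound of Send + Sync gates access at compile time; there is no
// runtime fallback path.
fn run_parallel<F: Fn(u64) -> u64 + Send + Sync>(f: F) -> u64 {
    f(21) * 2
}

fn main() {
    let shared: Arc<dyn Fn(u64) -> u64 + Send + Sync> = Arc::new(|n| n);
    assert_eq!(run_parallel(move |n| shared(n)), 42);
    // An Rc-backed closure in the same position is rejected at compile
    // time: `Rc<dyn Fn(u64) -> u64>` is neither Send nor Sync.
    println!("ok");
}
```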
Library catalogue
Each ShapeCapable domain exposes a set of constructors that
return a ShapeLift shaped for the transformation. For Shared:
| Constructor | What it changes |
|---|---|
| Shared::wrap_init_lift(w) | intercept init at every node |
| Shared::wrap_accumulate_lift(w) | intercept accumulate |
| Shared::wrap_finalize_lift(w) | intercept finalize |
| Shared::zipmap_lift(m) | extend R: R → (R, Extra) |
| Shared::map_r_bi_lift(fwd, bwd) | change R (bijection required; R is invariant) |
| Shared::filter_edges_lift(pred) | drop edges matching a predicate |
| Shared::wrap_visit_lift(w) | intercept graph visit |
| Shared::memoize_by_lift(key) | memoise subtree results by key |
| Shared::map_n_bi_lift(co, contra) | change N (bijection; N is invariant across slots) |
| Shared::n_lift(ln, bt, fc) | change N with per-slot coordination |
| Shared::explainer_lift() | wrap fold with per-node trace recording |
| Shared::explainer_describe_lift(fmt, emit) | streaming trace; MapR = R |
| Shared::phases_lift(mi, ma, mf) | rewrite all three Fold phases (primitive) |
| Shared::treeish_lift(mt) | rewrite the graph (primitive) |
Local mirrors the set (except explainer_describe_lift), with
Rc storage and no Send + Sync bounds.
The last two (phases_lift, treeish_lift) are the primitives:
the per-axis sugars all delegate to one of them. n_lift is the
primitive for coordinated N-change; map_n_bi_lift is the
bijective special case.
Appendix: why the trait takes a continuation
This section is relevant only to writing a custom Lift
implementation; it explains the signature rather than the
everyday use of lifts.
A direct signature would return the transformed triple from
apply. The return type of such a form is
(Grow<D, Seed, N2>, Graph<D, N2>, Fold<D, N2, H2, R2>), each
component a domain-associated GAT and each axis an associated
type of the lift. Following three chained lifts, the return type
admits no nameable alias.
Continuation-passing style — “CPS” in the source and in some
comments — avoids this. The caller supplies apply with a
closure (the continuation cont), which apply invokes with
the transformed triple. Because the continuation’s return type
propagates outward, Rust’s type inference threads every
intermediate through end-to-end, and no intermediate requires a
nameable alias.
Consequently, every pipeline’s .run(...) reduces to a single
descent through the lift chain via nested apply calls, each
closing over the next. The chain is constructed at the type
level, evaluated once at the value level, and the executor
ultimately sees only the final (treeish, fold) pair.
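The shape can be seen in miniature with a free function — apply here is a standalone sketch, not hylic's Lift::apply:

```rust
// Miniature of the CPS shape: instead of returning the transformed
// value, `apply` hands it to a continuation. The intermediate type
// exists only inside the continuation that receives it, so no alias
// is ever needed. (`apply` is a sketch, not hylic's Lift::apply.)
fn apply<A, B, R>(x: A, xform: impl FnOnce(A) -> B, cont: impl FnOnce(B) -> R) -> R {
    cont(xform(x))
}

fn main() {
    // Two chained transforms: u64 → u64 → (u64, bool). Inference
    // threads each intermediate outward through the nested closures,
    // and only the outermost return type is ever written down.
    let result = apply(3u64, |n| n * 2, |doubled| {
        apply(doubled, |n| (n, n > 5), |pair| pair)
    });
    assert_eq!(result, (6, true));
}
```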
Fold: shaping the computation
A Fold<N, H, R> is defined by three phases: init,
accumulate, and finalize. Each phase is a closure stored in
the boxing strategy of the domain in
use — Arc for Shared, Rc for Local, Box for Owned. Each
phase may be transformed independently.
Named-closures-first pattern
Closures should be extracted and named before being passed to the constructor:
#![allow(unused)]
fn main() {
#[test]
fn named_closures_pattern() {
use hylic::prelude::*;
#[derive(Clone)]
struct N { val: u64, children: Vec<N> }
// Named closures — reusable across domains.
let init = |n: &N| n.val;
let acc = |h: &mut u64, c: &u64| *h += c;
let fin = |h: &u64| *h;
// A free `fn` makes the visit function's full type visible at the
// binding site; capture-free, reusable across domains.
fn children(n: &N, cb: &mut dyn FnMut(&N)) {
for c in &n.children { cb(c); }
}
let f: Fold<N, u64, u64> = fold(init, acc, fin);
let graph: Treeish<N> = treeish_visit(children);
let root = N { val: 1, children: vec![N { val: 2, children: vec![] }] };
assert_eq!(FUSED.run(&f, &graph, &root), 3);
}
}
This form allows closures to be reused across domains and read without nesting.
Phase transformations
Wrap individual phases without changing the fold’s types:
wrap_init — adding side effects at initialisation
#![allow(unused)]
fn main() {
#[test]
fn fold_wrap_init() {
use hylic::prelude::*;
#[derive(Clone)]
struct Dir { name: String, size: u64, children: Vec<Dir> }
let graph: Treeish<Dir> = treeish(|d: &Dir| d.children.clone());
let f: Fold<Dir, u64, u64> = fold(
|d: &Dir| d.size,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
let logged: Fold<Dir, u64, u64> = f.wrap_init(
|d: &Dir, orig: &dyn Fn(&Dir) -> u64| orig(d),
);
let tree = Dir { name: "r".into(), size: 10, children: vec![] };
assert_eq!(FUSED.run(&logged, &graph, &tree), 10);
}
}
The wrapper receives the node and the original init as a
callable reference. The closure may invoke it, modify its result,
add side effects, or bypass it entirely. The mechanism is
available in all three domains.
Result-type transformations
Change what the fold produces:
zipmap — augmenting the result
#![allow(unused)]
fn main() {
#[test]
fn fold_zipmap() {
use hylic::prelude::*;
#[derive(Clone)]
struct N { val: u64, children: Vec<N> }
let graph: Treeish<N> = treeish(|n: &N| n.children.clone());
let f: Fold<N, u64, u64> = fold(
|n: &N| n.val,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
let with_flag: Fold<N, u64, (u64, bool)> = f.zipmap(|r: &u64| *r > 5);
let root = N { val: 1, children: vec![
N { val: 3, children: vec![] },
N { val: 4, children: vec![] },
]};
let (total, over_five): (u64, bool) = FUSED.run(&with_flag, &graph, &root);
assert_eq!(total, 8);
assert!(over_five);
}
}
zipmap is the most common transformation: additional computed
data is attached to the result without altering the fold’s core
logic.
Node-type transformations
contramap — changing the input type
#![allow(unused)]
fn main() {
#[test]
fn fold_contramap() {
use hylic::prelude::*;
#[derive(Clone)]
struct N { val: u64, children: Vec<N> }
let f: Fold<N, u64, u64> = fold(
|n: &N| n.val,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
// Change node type: String → N.
let by_name: Fold<String, u64, u64> =
f.contramap_n(|s: &String| N { val: s.len() as u64, children: vec![] });
let graph: Treeish<String> =
treeish_visit(|_: &String, _cb: &mut dyn FnMut(&String)| {});
let result: u64 = FUSED.run(&by_name, &graph, &"hello".to_string());
assert_eq!(result, 5);
}
}
Only init consumes the node directly. contramap wraps init
to transform the input; accumulate and finalize are left
unchanged. See also
Transforms and variance
for the variance story that dictates the argument shape.
Composition
product — two folds, one traversal
#![allow(unused)]
fn main() {
#[test]
fn fold_product() {
use hylic::prelude::*;
#[derive(Clone)]
struct Dir { name: String, size: u64, children: Vec<Dir> }
let graph: Treeish<Dir> = treeish(|d: &Dir| d.children.clone());
let size_fold: Fold<Dir, u64, u64> = fold(
|d: &Dir| d.size,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
let both: Fold<Dir, (u64, usize), (u64, usize)> = size_fold.product(&depth_fold());
let tree = Dir {
name: "r".into(), size: 10,
children: vec![Dir { name: "a".into(), size: 5, children: vec![] }],
};
let (total_size, max_depth) = FUSED.run(&both, &graph, &tree);
assert_eq!(total_size, 15);
assert_eq!(max_depth, 2);
}
}
The categorical product: each fold maintains its own heap, observes its own child results, and produces its own output. One traversal yields two results; no node is visited twice.
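What product builds can be hand-rolled to see the shape: one descent, two heaps, each fold's accumulate applied side by side. A sketch only — hylic derives this pairing from the two component folds:

```rust
// Hand-rolled equivalent of `product`: a single recursive descent in
// which each fold keeps its own heap and its own accumulate step.
// Sketch only — hylic generates this pairing from the two folds.
struct Node { val: u64, children: Vec<Node> }

/// One traversal, two results: (subtree sum, subtree depth).
fn sum_and_depth(n: &Node) -> (u64, usize) {
    let mut sum = n.val; // heap of the size fold
    let mut depth = 1;   // heap of the depth fold
    for c in &n.children {
        let (cs, cd) = sum_and_depth(c); // each child visited once
        sum += cs;                        // accumulate of fold 1
        depth = depth.max(cd + 1);        // accumulate of fold 2
    }
    (sum, depth) // both finalize steps are identity here
}

fn main() {
    let tree = Node {
        val: 10,
        children: vec![Node { val: 5, children: vec![] }],
    };
    assert_eq!(sum_and_depth(&tree), (15, 2));
}
```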
Domain parity
All three domains support the same transformation surface:
| Method | Shared | Local | Owned | Effect |
|---|---|---|---|---|
| wrap_init | &self | &self | self | intercept init phase |
| wrap_accumulate | &self | &self | self | intercept accumulate phase |
| wrap_finalize | &self | &self | self | intercept finalize phase |
| map | &self | &self | self | change result type R → R2 |
| zipmap | &self | &self | self | augment result (R, Extra) |
| contramap | &self | &self | self | change node type N → N2 |
| product | &self | &self | self | two folds, one traversal |
Shared and Local borrow &self, so the original fold is
preserved; Owned consumes self, moving the original into the
result. All three delegate to the same domain-independent
combinator functions in fold/combinators.rs; auto-trait
propagation ensures that Send + Sync flows correctly for
Shared.
All domains also expose .init(), .accumulate(), and
.finalize() as direct methods, in addition to the FoldOps
trait implementation.
See The three domains for guidance on when to select which domain.
Working example
#![allow(unused)]
fn main() {
//! Transformations: features as standalone functions that match the contract.
//!
//! One domain, one base fold, one base graph. Each feature is a named
//! function — it IS the concern, separated and reusable. Plugging it
//! in is a single method call on the existing construct.
#[cfg(test)]
mod tests {
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use hylic::prelude::*;
use hylic::prelude::memoize_treeish_by;
use insta::assert_snapshot;
// ── Domain ──────────────────────────────────────────────
#[derive(Clone, Debug)]
struct Task {
name: String,
cost_ms: u64,
deps: Vec<String>,
}
struct Registry(HashMap<String, Task>);
impl Registry {
fn new(tasks: &[(&str, u64, &[&str])]) -> Self {
Registry(tasks.iter().map(|(name, cost, deps)| {
(name.to_string(), Task {
name: name.to_string(),
cost_ms: *cost,
deps: deps.iter().map(|d| d.to_string()).collect(),
})
}).collect())
}
fn get(&self, name: &str) -> Option<&Task> { self.0.get(name) }
}
// ── Shared setup ────────────────────────────────────────
fn setup() -> (Treeish<Task>, Task) {
let reg = Registry::new(&[
("app", 50, &["compile", "link"]),
("compile", 200, &["parse", "typecheck"]),
("parse", 100, &[]),
("typecheck", 300, &[]),
("link", 150, &[]),
]);
let map = reg.0.clone();
let g: Treeish<Task> = treeish(move |task: &Task| {
task.deps.iter().filter_map(|d| map.get(d).cloned()).collect()
});
let root = reg.get("app").unwrap().clone();
(g, root)
}
fn base_fold() -> Fold<Task, u64, u64> {
fold(
|t: &Task| t.cost_ms,
|heap: &mut u64, child: &u64| *heap += child,
|h: &u64| *h,
)
}
// ── Fold phase wrappers ─────────────────────────────────
//
// Each is a standalone closure matching the wrap contract:
// wrap_init: Fn(&N, &dyn Fn(&N) -> H) -> H
// wrap_accumulate: Fn(&mut H, &R, &dyn Fn(&mut H, &R))
// wrap_finalize: Fn(&H, &dyn Fn(&H) -> R) -> R
/// Hooks into init: called once per node, before children.
/// Logs the task name, then delegates to the original init.
fn visit_logger(sink: Arc<Mutex<Vec<String>>>)
-> impl Fn(&Task, &dyn Fn(&Task) -> u64) -> u64
{
move |task: &Task, orig: &dyn Fn(&Task) -> u64| {
sink.lock().unwrap().push(task.name.clone());
orig(task)
}
}
/// Hooks into accumulate: conditionally skips small children.
/// By not calling orig, the child result is never folded in.
fn skip_small_children(threshold: u64)
-> impl Fn(&mut u64, &u64, &dyn Fn(&mut u64, &u64))
{
move |heap: &mut u64, child: &u64, orig: &dyn Fn(&mut u64, &u64)| {
if *child >= threshold { orig(heap, child); }
}
}
/// Hooks into finalize: clamps the result.
fn clamp_at(max: u64)
-> impl Fn(&u64, &dyn Fn(&u64) -> u64) -> u64
{
move |heap: &u64, orig: &dyn Fn(&u64) -> u64| orig(heap).min(max)
}
/// zipmap contract: a plain Fn(&R) -> Extra. No wrapping needed —
/// the function itself IS the feature. zipmap calls it per node,
/// pairing the original result with the derived value: R → (R, Extra).
fn classify(total: &u64) -> &'static str {
match *total {
t if t >= 500 => "critical",
t if t >= 200 => "heavy",
_ => "light",
}
}
// ── Graph transformations ───────────────────────────────
fn only_costly_deps(g: &Treeish<Task>, min_cost: u64) -> Treeish<Task> {
let inner = g.clone();
treeish(move |task: &Task| {
inner.at(task)
.filter(|child: &Task| child.cost_ms >= min_cost)
.collect_vec()
})
}
// ── Tests ───────────────────────────────────────────────
#[test]
fn test_visit_logger() {
let (graph, root) = setup();
let visited = Arc::new(Mutex::new(Vec::new()));
let fold = base_fold().wrap_init(visit_logger(visited.clone()));
let total = FUSED.run(&fold, &graph, &root);
let names: Vec<String> = visited.lock().unwrap().clone();
assert_eq!(total, 800);
assert_snapshot!("visit_logger", format!(
"total={total}, visited: {}", names.join(" → ")
));
}
#[test]
fn test_skip_small_children() {
let (graph, root) = setup();
let fold = base_fold().wrap_accumulate(skip_small_children(200));
let total = FUSED.run(&fold, &graph, &root);
// app(50) + compile(200+typecheck 300) = 550; parse(100) and link(150) skipped
assert_eq!(total, 550);
assert_snapshot!("skip_small", format!("total={total} (small children skipped)"));
}
#[test]
fn test_clamp_at() {
let (graph, root) = setup();
let fold = base_fold().wrap_finalize(clamp_at(500));
let total = FUSED.run(&fold, &graph, &root);
// compile=min(600,500)=500, link=150, app=min(50+500+150,500)=500
assert_eq!(total, 500);
assert_snapshot!("clamp_at", format!("total={total} (clamped at 500)"));
}
#[test]
fn test_classify() {
let (graph, root) = setup();
let (total, category) = FUSED.run(&base_fold().zipmap(classify), &graph, &root);
assert_eq!(total, 800);
assert_eq!(category, "critical");
assert_snapshot!("classify", format!("total={total}, category={category}"));
}
#[test]
fn test_only_costly_deps() {
let (graph, root) = setup();
let filtered = only_costly_deps(&graph, 150);
let total = FUSED.run(&base_fold(), &filtered, &root);
// parse(100) pruned: app(50)+compile(200)+typecheck(300)+link(150) = 700
assert_eq!(total, 700);
assert_snapshot!("only_costly", format!("total={total} (deps with cost < 150 pruned)"));
}
#[test]
fn test_memoize_diamond() {
let reg = Registry::new(&[
("app", 10, &["compile", "link"]),
("compile", 50, &["stdlib"]),
("link", 30, &["stdlib"]),
("stdlib", 200, &[]),
]);
let visit_count = Arc::new(Mutex::new(0u32));
let vc = visit_count.clone();
let map = reg.0.clone();
let graph = treeish(move |task: &Task| {
*vc.lock().unwrap() += 1;
task.deps.iter().filter_map(|d| map.get(d).cloned()).collect()
});
let root = reg.get("app").unwrap().clone();
let total = FUSED.run(&base_fold(), &graph, &root);
let raw_visits = *visit_count.lock().unwrap();
*visit_count.lock().unwrap() = 0;
let cached = memoize_treeish_by(&graph, |t: &Task| t.name.clone());
let total_memo = FUSED.run(&base_fold(), &cached, &root);
let memo_visits = *visit_count.lock().unwrap();
assert_eq!((total, raw_visits), (490, 5));
assert_eq!((total_memo, memo_visits), (490, 4));
assert_snapshot!("memoize", format!(
"raw: total={total} visits={raw_visits}, memo: total={total_memo} visits={memo_visits}"
));
}
#[test]
fn test_composed_pipeline() {
let (graph, root) = setup();
let visited = Arc::new(Mutex::new(Vec::new()));
let pipeline = base_fold()
.wrap_init(visit_logger(visited.clone()))
.wrap_finalize(clamp_at(500))
.zipmap(classify);
let (total, category) = FUSED.run(&pipeline, &graph, &root);
let names: Vec<String> = visited.lock().unwrap().clone();
assert_eq!(total, 500);
assert_eq!(category, "critical");
assert_snapshot!("composed", format!(
"total={total} [{category}], visited: {}", names.join(" → ")
));
}
}
}
Graph: controlling traversal
The graph — Treeish<N>, or the more general Edgy<N, E> — is a
function from a node to its children, and determines what is
visited during the fold. The node type N may be any type: a
struct, an integer index, a string key, a database identifier.
The structure of the tree resides in the function, not in the
data.
Constructors
Three means of creating a Treeish<N>:
#![allow(unused)]
fn main() {
#[test]
fn treeish_constructors() {
use hylic::prelude::*;
#[derive(Clone)]
struct Node { value: u64, children: Vec<Node> }
let root = Node { value: 1, children: vec![Node { value: 2, children: vec![] }] };
// Callback-based (zero allocation per visit):
let g1: Treeish<Node> = treeish_visit(|n: &Node, cb: &mut dyn FnMut(&Node)| {
for child in &n.children { cb(child); }
});
// Vec-returning (allocates per visit):
let g2: Treeish<Node> = treeish(|n: &Node| n.children.clone());
// Slice accessor (borrows, zero allocation):
let g3: Treeish<Node> = treeish_from(|n: &Node| n.children.as_slice());
assert_eq!(g1.apply(&root).len(), 1);
assert_eq!(g2.apply(&root).len(), 1);
assert_eq!(g3.apply(&root).len(), 1);
// Flat data — nodes are indices, children from adjacency list:
let adj: Vec<Vec<usize>> = vec![vec![1, 2], vec![], vec![]];
let g4: Treeish<usize> = treeish_visit(move |n: &usize, cb: &mut dyn FnMut(&usize)| {
for &c in &adj[*n] { cb(&c); }
});
assert_eq!(g4.apply(&0).len(), 2);
}
}
treeish_visit is the most general form; its callback receives
each child without the allocation of a Vec. treeish wraps a
Vec-returning function for convenience, and treeish_from
extracts a slice reference from a field.
For non-nested data — adjacency lists, maps, external lookups —
treeish_visit is the appropriate constructor:
// Adjacency list: nodes are indices
let adj: Vec<Vec<usize>> = vec![vec![1, 2], vec![3], vec![], vec![]];
let graph = treeish_visit(move |n: &usize, cb: &mut dyn FnMut(&usize)| {
for &c in &adj[*n] { cb(&c); }
});
// HashMap-backed graph: nodes are string keys
let edges: HashMap<String, Vec<String>> = /* ... */;
let graph = treeish_visit(move |n: &String, cb: &mut dyn FnMut(&String)| {
if let Some(children) = edges.get(n) {
for c in children { cb(c); }
}
});
For a runnable adjacency-list example, see
intro_flat_example.
Edge transformations
The Edgy<N, E> type generalises Treeish<N> by allowing edges
and nodes to be different types. Combinators transform the edge
type or the node type:
filter — pruning children
#![allow(unused)]
fn main() {
#[test]
fn graph_filter() {
use hylic::prelude::*;
#[derive(Clone)]
struct Node { value: u64, children: Vec<Node> }
let graph: Treeish<Node> = treeish(|n: &Node| n.children.clone());
let f: Fold<Node, u64, u64> = fold(
|n: &Node| n.value,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
let root = Node { value: 1, children: vec![
Node { value: 10, children: vec![] },
Node { value: 2, children: vec![] },
]};
// Only visit children with value > 5.
let pruned: Treeish<Node> = graph.filter(|child: &Node| child.value > 5);
let result: u64 = FUSED.run(&f, &pruned, &root);
assert_eq!(result, 11); // 1 + 10 (skipped 2)
}
}
The fold sees fewer children without any awareness that pruning has occurred.
Caching: memoize_treeish
For DAGs in which the same node is reachable from multiple
parents, memoize_treeish caches the child enumeration:
#![allow(unused)]
fn main() {
#[test]
fn memoize_example() {
use hylic::prelude::*;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
let call_count = Arc::new(AtomicUsize::new(0));
let cc = call_count.clone();
let graph: Treeish<u64> = treeish(move |n: &u64| -> Vec<u64> {
cc.fetch_add(1, Ordering::Relaxed);
if *n == 0 { vec![] } else { vec![n - 1] }
});
let f: Fold<u64, u64, u64> = fold(
|n: &u64| *n,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
let cached: Treeish<u64> = memoize_treeish(&graph);
let _ = FUSED.run(&f, &cached, &3u64);
let first_count = call_count.load(Ordering::Relaxed);
// Second run hits the cache; no new calls into `graph`.
let _ = FUSED.run(&f, &cached, &3u64);
let second_count = call_count.load(Ordering::Relaxed);
assert_eq!(first_count, second_count);
}
}
The first visit to a key computes and caches its children; subsequent visits return the cached result.
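The caching discipline can be sketched without the library — a per-key cache in front of the child function. The name memoize_children is ours; hylic's memoize_treeish wraps the graph analogously:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Sketch of per-key memoised child enumeration. `memoize_children`
// is our name; hylic's memoize_treeish wraps the graph analogously.
fn memoize_children<F>(f: F) -> impl Fn(u64) -> Vec<u64>
where
    F: Fn(u64) -> Vec<u64>,
{
    let cache: RefCell<HashMap<u64, Vec<u64>>> = RefCell::new(HashMap::new());
    move |n| {
        if let Some(hit) = cache.borrow().get(&n) {
            return hit.clone(); // later visits: cache hit, no recompute
        }
        let v = f(n); // first visit: compute and store
        cache.borrow_mut().insert(n, v.clone());
        v
    }
}

fn main() {
    let calls = RefCell::new(0usize);
    let cached = memoize_children(|n| {
        *calls.borrow_mut() += 1;
        if n == 0 { vec![] } else { vec![n - 1] }
    });
    cached(3);
    cached(3); // no new call into the underlying function
    assert_eq!(*calls.borrow(), 1);
}
```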
Visit combinator
Edgy::at(node) returns a Visit<T, F> — a push-based iterator
exposing map, filter, fold, count, and collect_vec. All
combinators are callback-based internally.
Execution: choosing the strategy
The executor governs how the tree recursion is carried out. The fold and graph determine what is computed at each node and how children are found; the executor determines traversal order, parallelism, and resource lifecycle. Substituting one executor for another changes performance characteristics without modifying the fold or the graph.
The interface
Both sequential and parallel execution use the same .run()
method on Exec<D, S>. The method is inherent; no trait import
is required:
#![allow(unused)]
fn main() {
use hylic::prelude::*;
// Sequential:
FUSED.run(&fold, &graph, &root);
// Parallel:
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
}
The domain D is fixed by the executor instance or the
exec() call. The type parameters N, H, R, and the graph
type G are inferred from the arguments.
Built-in executors
Two executors are provided. The choice between them is straightforward:
| Executor | Domain | Graph requirement | Characteristics |
|---|---|---|---|
| Fused | all | any TreeOps<N> | Sequential direct recursion (single thread) |
| Funnel | Shared | TreeOps<N> + Send + Sync | Parallel work-stealing across a scoped thread pool |
Fused operates on all domains and all graph types because it
borrows everything on a single thread. Funnel requires
Send + Sync on the graph because it shares the graph reference
across a scoped thread pool.
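The bound exists because the graph reference genuinely crosses threads. A minimal sketch of the sharing with std::thread::scope — hylic's engine adds work stealing on top of this, but the Sync requirement comes from exactly this pattern:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Walks one subtree through a shared child function, summing node
/// values. The `Sync` bound is what lets `&children` cross threads.
fn subtree_sum(children: &(impl Fn(u64) -> Vec<u64> + Sync), root: u64) -> u64 {
    let mut sum = 0;
    let mut stack = vec![root];
    while let Some(n) = stack.pop() {
        sum += n;
        stack.extend(children(n));
    }
    sum
}

fn main() {
    // The "graph": a capture-free (hence Sync) closure.
    let children = |n: u64| if n == 0 { vec![] } else { vec![n - 1] };
    let total = AtomicU64::new(0);
    std::thread::scope(|s| {
        for root in [3u64, 4u64] {
            let (children, total) = (&children, &total);
            s.spawn(move || {
                total.fetch_add(subtree_sum(children, root), Ordering::Relaxed);
            });
        }
    });
    // 3+2+1+0 = 6 and 4+3+2+1+0 = 10.
    assert_eq!(total.load(Ordering::Relaxed), 16);
}
```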
Using the Funnel executor
The Funnel executor supports three usage tiers, trading
convenience for control over resource lifetime:
One-shot — the pool is created and destroyed per call:
#![allow(unused)]
fn main() {
use hylic::prelude::*;
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
}
Session scope — the pool is shared across multiple folds:
#![allow(unused)]
fn main() {
exec(funnel::Spec::default(8)).session(|s| {
s.run(&fold, &graph, &root);
s.run(&fold, &graph, &root);
});
}
Explicit attach — the caller manages the pool directly:
#![allow(unused)]
fn main() {
funnel::Pool::with(8, |pool| {
exec(funnel::Spec::default(8)).attach(pool).run(&fold, &graph, &root);
});
}
See Policies and presets for workload-specific configuration.
Defining a project-wide executor
For projects that use a fixed Funnel configuration, a common pattern is to define the executor once and reference it throughout:
use hylic::prelude::*;
type MyPolicy = funnel::policy::Policy<
funnel::queue::PerWorker,
funnel::accumulate::OnArrival,
funnel::wake::EveryK<4>,
>;
pub fn project_exec() -> hylic::exec::Exec<Shared, funnel::Spec<MyPolicy>> {
let nw = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
exec(
funnel::Spec::default(nw)
.with_accumulate::<funnel::accumulate::OnArrival>(
funnel::accumulate::on_arrival::OnArrivalSpec)
.with_wake::<funnel::wake::EveryK<4>>(
funnel::wake::every_k::EveryKSpec)
)
}
Call sites then use crate::project_exec().run(&fold, &graph, &root)
without naming the policy type.
Lift integration
Lifts operate on the Shared domain. The Explainer is the canonical
example — composed onto a fold, it captures every node’s
intermediate state into an ExplainerResult<N, H, R>:
#![allow(unused)]
fn main() {
#[test]
fn explainer_usage() {
use hylic_pipeline::prelude::*;
#[derive(Clone)]
struct N { val: u64, children: Vec<N> }
let f: Fold<N, u64, u64> = fold(
|n: &N| n.val,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
let root = N { val: 1, children: vec![N { val: 2, children: vec![] }] };
let trace: ExplainerResult<N, u64, u64> =
TreeishPipeline::new(treeish(|n: &N| n.children.clone()), &f)
.lift()
.then_lift(Shared::explainer_lift::<N, u64, u64>())
.run_from_node(&FUSED, &root);
assert_eq!(trace.orig_result, 3);
}
}
See Lifts for the Explainer and other
lift patterns, and Pipeline overview
for the chainable .explain() sugar that wraps this.
Further reading
- The Exec pattern — the type-level design behind Spec, Session, and Exec.
- Policy traits — how Funnel’s three behavioural axes compose.
- The three domains — how the domain parameter selects fold storage.
Pipelines — overview
hylic-pipeline is a typestate builder over hylic’s lift
primitives. Three pipeline types sit behind the same builder
surface, distinguished by what they hold:
| Pipeline | Slots | When to use |
|---|---|---|
| SeedPipeline<D, N, Seed, H, R> | grow, seeds_from_node, fold | Tree is discovered lazily through a Seed → N resolver. Run from a forest of entry seeds. |
| TreeishPipeline<D, N, H, R> | treeish, fold | Children are enumerable directly from the node (N → N*). Run from a known root &N. |
| OwnedPipeline<N, H, R> | treeish, fold (Owned domain) | One-shot, by-value, no Clone. Run consumes self. |
Each pipeline is Stage 1: it stores its base slots and
exposes per-shape reshape sugars (e.g. filter_seeds,
map_node_bi, wrap_grow). Calling .lift() flips it into
Stage 2, where every method composes a lift onto the chain
held in Stage2Pipeline<Base, L>. Stage2Pipeline is one type
parameterised over which Stage-1 base is wrapped; the sugar
trait body covers both bases through Wrap dispatch.
Run methods are owned by the pipeline that defines them:
SeedPipeline::run / run_from_slice in
Stage 1 — SeedPipeline;
PipelineExec::run_from_node in
Stage 1 — TreeishPipeline;
PipelineExecOnce::run_from_node_once in
OwnedPipeline. Stage2Pipeline inherits run from
its Stage-1 base.
Stage 1 — SeedPipeline
A SeedPipeline carries three base slots — a coalgebra plus an algebra:
#![allow(unused)]
fn main() {
/// Stage-1 typestate pipeline with three base slots: `grow`,
/// `seeds_from_node`, and `fold`. Used when the tree is discovered
/// lazily from `Seed` references.
#[must_use]
pub struct SeedPipeline<D, N, Seed, H, R>
where D: Domain<N>,
N: 'static, Seed: 'static, H: 'static, R: 'static,
{
pub(crate) grow: <D as Domain<N>>::Grow<Seed, N>,
pub(crate) seeds_from_node: <D as Domain<N>>::Graph<Seed>,
pub(crate) fold: <D as Domain<N>>::Fold<H, R>,
}
}
- grow: Seed → N — resolves a reference (a Seed) into a full node (N).
- seeds_from_node: Edgy<N, Seed> — given a resolved node, enumerates the references it points to.
- fold: Fold<N, H, R> — the algebra over resolved nodes.
The pipeline operates lazily on demand: given an entry seed at run time,
it grows the tree by alternating grow and seeds_from_node until each
branch terminates at a leaf.
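That alternation can be written out directly as a recursion over a toy registry — eval is our name for what the executor performs, not a hylic function:

```rust
use std::collections::HashMap;

// Sketch of the SeedPipeline evaluation order: grow (Seed → N), then
// seeds_from_node (N → Seed*), recursing until leaves and folding the
// results upward. `eval` is our name for what the executor performs.
fn eval(registry: &HashMap<&str, (u64, Vec<&'static str>)>, seed: &str) -> u64 {
    let (cost, deps) = &registry[seed]; // grow: resolve the seed
    let mut heap = *cost;               // init on the resolved node
    for dep in deps {                   // seeds_from_node: child seeds
        heap += eval(registry, dep);    // recurse, then accumulate
    }
    heap                                // finalize
}

fn main() {
    let mut reg = HashMap::new();
    reg.insert("app", (1u64, vec!["db"]));
    reg.insert("db", (2u64, vec![]));
    assert_eq!(eval(&reg, "app"), 3);
}
```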
When to pick this over TreeishPipeline
Use SeedPipeline when the dependency graph speaks a different language
from the nodes — file paths, module names, URLs, anything that must be
resolved into a full data structure before its children can be examined.
When the nodes themselves already enumerate their children directly
(N → N*), TreeishPipeline is simpler: no grow slot.
Constructing one
#![allow(unused)]
fn main() {
#[test]
fn pipeline_overview_seed() {
use hylic_pipeline::prelude::*;
use std::collections::HashMap;
use std::sync::Arc;
#[derive(Clone)]
struct Mod { cost: u64, deps: Vec<String> }
let reg: Arc<HashMap<String, Mod>> = Arc::new({
let mut m = HashMap::new();
m.insert("app".into(), Mod { cost: 1, deps: vec!["db".into()] });
m.insert("db".into(), Mod { cost: 2, deps: vec![] });
m
});
let reg_grow = reg.clone();
let sp: SeedPipeline<Shared, Mod, String, u64, u64> = SeedPipeline::new(
move |s: &String| reg_grow.get(s).cloned().unwrap(),
edgy_visit(|n: &Mod, cb: &mut dyn FnMut(&String)| {
for d in &n.deps { cb(d); }
}),
&fold(|n: &Mod| n.cost, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h),
);
let r: u64 = sp
.filter_seeds(|s: &String| !s.starts_with('_'))
.run_from_slice(&FUSED, &["app".to_string()], 0u64);
// Reachable modules: app (cost 1) + db (cost 2) = 3.
assert_eq!(r, 3);
}
}
Stage-1 reshape
A SeedPipeline can be reshaped without lifting — the result is still a
SeedPipeline of (possibly different) type parameters. The
SeedSugarsShared
trait provides the surface; SeedSugarsLocal mirrors it for the Local
domain. Both come into scope via use hylic_pipeline::prelude::*;.
| method | changes |
|---|---|
| filter_seeds(pred) | Seed set narrowed; types preserved |
| wrap_grow(w) | intercepts every grow; types preserved |
| map_node_bi(co, contra) | changes N to N2 via bijection |
| map_seed_bi(to, from) | changes Seed to Seed2 via bijection |
Transitioning to Stage 2
Stage-2 sugars are not available on SeedPipeline directly — an
explicit .lift() is required. (TreeishPipeline auto-lifts; SeedPipeline
does not, because the Stage-2 chain operates over SeedNode<N> rather
than N, and an implicit transition would surface that asymmetry in
error messages.)
let lsp = pipeline
.lift() // → Stage2Pipeline<SeedPipeline<…>, IdentityLift>
.wrap_init(|n: &N, orig| orig(n) + 1)
.zipmap(|r: &R| classify(r)); // chain extends; tip R becomes (R, classification)
After .lift(), the chain operates on SeedNode<N> — but every Stage-2
sugar’s user closure types at &N. The SeedNode row is sealed and
auto-dispatched; see SeedNode<N> for the row’s shape
and the rare cases where it surfaces in a chain-tip type, and
Wrap dispatch for how the sugar trait reaches both
Bases through one body.
Running
Two equivalent surfaces:
- Direct on SeedPipeline — .run(exec, entry_seeds, entry_heap) and .run_from_slice(exec, &[seeds], entry_heap) are inherent on SeedPipeline<D, …> itself. They forward through self.clone().lift() internally; ergonomic shorthand for the common case where no Stage-2 sugars are chained.
- On Stage2Pipeline<SeedPipeline<…>, L> — same method names, same arguments. Used after .lift() plus any chain of Stage-2 sugars.
// Entry seeds as a slice (convenience), no sugars — direct on SeedPipeline:
let r: u64 = pipeline.run_from_slice(&FUSED, &["app".to_string()], 0u64);
// Entry seeds as a general Edgy<(), Seed>, no sugars:
let entry = edgy_visit(|_: &(), cb| cb(&"app".to_string()));
let r: u64 = pipeline.run(&FUSED, entry, 0u64);
// With Stage-2 sugars — `.lift()` is the explicit transition:
let r: u64 = pipeline
.lift()
.wrap_init(|n: &Mod, orig: &dyn Fn(&Mod) -> u64| orig(n) + 1)
.run_from_slice(&FUSED, &["app".to_string()], 0u64);
The last argument is the initial heap at the synthetic root level —
what the top-level accumulator starts with before any seed’s result is
folded in. It is always the base H type; the chain’s own MapH
is reached internally as the sugars promote from H outward.
.lift() itself is preserved as the explicit Stage-1 → Stage-2
transition. The shorthand on SeedPipeline exists to elide the
empty-.lift() ceremony at call sites that do not chain sugars; when
sugars are involved, .lift() makes the row-type transition (chain
input becoming SeedNode<N>) traceable to a single line in the
user’s source.
Full example
#![allow(unused)]
fn main() {
#[test]
fn seed_pipeline_example() {
use hylic_pipeline::prelude::*;
use std::collections::HashMap;
// The "registry" — flat data, not a tree.
let mut modules: HashMap<String, Vec<String>> = HashMap::new();
modules.insert("app".into(), vec!["db".into(), "auth".into()]);
modules.insert("db".into(), vec![]);
modules.insert("auth".into(), vec!["db".into()]);
// Edge function: given a module name, produce its dependency seeds.
let reg = modules.clone();
let seeds_from_node: Edgy<String, String> =
edgy_visit(move |name: &String, cb: &mut dyn FnMut(&String)| {
if let Some(deps) = reg.get(name) {
for dep in deps { cb(dep); }
}
});
// Fold: collect every reachable name.
let f: Fold<String, Vec<String>, Vec<String>> = fold(
|name: &String| vec![name.clone()],
|heap: &mut Vec<String>, child: &Vec<String>| heap.extend(child.iter().cloned()),
|heap: &Vec<String>| heap.clone(),
);
let pipeline: SeedPipeline<Shared, String, String, Vec<String>, Vec<String>> =
SeedPipeline::new(|seed: &String| seed.clone(), seeds_from_node, &f);
let result: Vec<String> = pipeline.run_from_slice(
&FUSED,
&["app".to_string()],
Vec::new(),
);
assert!(result.contains(&"app".to_string()));
assert!(result.contains(&"auth".to_string()));
}
}
Stage 1 — TreeishPipeline
#![allow(unused)]
fn main() {
/// Stage-1 typestate pipeline with two base slots: `treeish`
/// (graph) and `fold`. Used when children are directly enumerable
/// from nodes of the same type (`N → N*`).
#[must_use]
pub struct TreeishPipeline<D, N, H, R>
where D: Domain<N>,
N: 'static, H: 'static, R: 'static,
{
pub(crate) treeish: <D as Domain<N>>::Graph<N>,
pub(crate) fold: <D as Domain<N>>::Fold<H, R>,
}
}
Two slots:
- treeish: <D as Domain<N>>::Graph<N> — direct child enumeration, N → N*.
- fold: <D as Domain<N>>::Fold<H, R> — the algebra over N.
No grow step, no entry seeds. Execution starts from a &N
root supplied to the executor.
Constructors
#![allow(unused)]
fn main() {
// Shared domain.
TreeishPipeline::<Shared, _, _, _>::new(
treeish_arc, // hylic::graph::Treeish<N>
&fold, // &shared::Fold<N, H, R>
);
// Local domain — note the `_local` suffix; Rust's inherent-method
// resolution can't disambiguate two `new`s on the same struct that
// differ only in the domain marker.
TreeishPipeline::<Local, _, _, _>::new_local(
treeish_local, // local::Edgy<N, N>
fold_local, // local::Fold<N, H, R>
);
// Domain-generic.
TreeishPipeline::<D, _, _, _>::from_slots(treeish, fold);
}
Stage-1 reshape
One sugar — there’s no grow axis to reshape and no seeds to
filter:
| method | output |
|---|---|
| map_node_bi(co, contra) | TreeishPipeline<D, N2, H, R> |
Provided by TreeishSugarsShared (Local mirror:
TreeishSugarsLocal); see Sugars.
Stage 2
Two ways to enter:
- Explicit: tree_pipeline.lift() returns Stage2Pipeline<TreeishPipeline<D, N, H, R>, IdentityLift>.
- Auto-lift: every Stage-2 sugar is also callable directly on TreeishPipeline. tree_pipeline.wrap_init(w) is shorthand for tree_pipeline.lift().wrap_init(w).
#![allow(unused)]
fn main() {
#[test]
fn treeish_pipeline_chain() {
use hylic_pipeline::prelude::*;
#[derive(Clone)]
struct Node { value: u64, children: Vec<Node> }
let root = Node {
value: 1,
children: vec![
Node { value: 2, children: vec![] },
Node { value: 3, children: vec![] },
],
};
let tp: TreeishPipeline<Shared, Node, u64, u64> = TreeishPipeline::new(
treeish(|n: &Node| n.children.clone()),
&fold(|n: &Node| n.value, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h),
);
let r: (u64, bool) = tp
.wrap_init(|n: &Node, orig: &dyn Fn(&Node) -> u64| orig(n) + 1)
.zipmap(|r: &u64| *r > 5)
.run_from_node(&FUSED, &root);
assert_eq!(r, (9, true));
}
}
The chain’s input N stays at the user’s N (no wrap layer);
the Wrap impl is Identity.
Running
#![allow(unused)]
fn main() {
let r = pipeline.run_from_node(&FUSED, &root);
}
PipelineExec::run_from_node(&exec, &root) is a blanket method
on every TreeishSource. The first init runs on the supplied
root. Returns the chain-tip R — the base fold’s R when no
Stage-2 sugars are composed, otherwise whatever the rightmost
lift produces.
Stage2Pipeline<TreeishPipeline<…>, L> inherits the same
method through its TreeishSource impl; the call shape is
identical.
Worked example
#![allow(unused)]
fn main() {
#[test]
fn treeish_pipeline_ctor() {
use hylic_pipeline::prelude::*;
#[derive(Clone)]
struct Node { value: u64, children: Vec<Node> }
let root = Node { value: 7, children: vec![] };
let tp: TreeishPipeline<Shared, Node, u64, u64> = TreeishPipeline::new(
treeish(|n: &Node| n.children.clone()),
&fold(
|n: &Node| n.value,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
),
);
assert_eq!(tp.run_from_node(&FUSED, &root), 7);
}
}
Stage 2 — Stage2Pipeline
#![allow(unused)]
fn main() {
/// Stage-2 typestate pipeline. Wraps a Stage-1 base with a lift chain.
/// The chain's input N is `<Base::Wrap as Wrap>::Of<UN>` — see the
/// `Stage2Base` and `Wrap` traits in this module.
#[must_use]
pub struct Stage2Pipeline<Base, L = IdentityLift> {
pub(crate) base: Base,
pub(crate) pre_lift: L,
}
}
base is a Stage-1 pipeline. pre_lift is one lift value, but
typically a ComposedLift<L1, L2> tree built up through
.then_lift calls and Stage-2 sugars. Each sugar appends one
node to the tree.
The chain’s input N is determined by the Base via the
Wrap projection on
Stage2Base:
- Base = TreeishPipeline<D, N, H, R> — Wrap::Of<N> = N.
- Base = SeedPipeline<D, N, Seed, H, R> — Wrap::Of<N> = SeedNode<N>. SeedLift is composed at the chain head when .run is called; every stored lift in pre_lift sees SeedNode<N> as its input.
Type evolution
After three sugars on a TreeishPipeline<Shared, u64, u64, u64>, the pre_lift is a three-deep ComposedLift tree.
Each sugar wraps the previous chain in one more ComposedLift
layer. The base is unchanged. The whole chain monomorphises and
inlines together; there is no per-lift dispatch at runtime.
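The shape of that nesting can be shown with a minimal standalone sketch. The simplified Lift trait here, and the AddOne and Double lifts, are illustrative stand-ins only (hylic's real Lift also transforms N and H):

```rust
// Each "sugar" wraps the chain in one more ComposedLift layer.
// The nesting is a plain struct tower, so every apply call is a
// direct, inlinable call with no runtime dispatch.
trait Lift {
    fn apply(&self, r: u64) -> u64;
}

struct IdentityLift;
impl Lift for IdentityLift {
    fn apply(&self, r: u64) -> u64 { r }
}

struct AddOne;
impl Lift for AddOne {
    fn apply(&self, r: u64) -> u64 { r + 1 }
}

struct Double;
impl Lift for Double {
    fn apply(&self, r: u64) -> u64 { r * 2 }
}

struct ComposedLift<A, B>(A, B);
impl<A: Lift, B: Lift> Lift for ComposedLift<A, B> {
    // Inner chain runs first, then the appended lift.
    fn apply(&self, r: u64) -> u64 { self.1.apply(self.0.apply(r)) }
}

fn main() {
    // Type after two sugars: ComposedLift<ComposedLift<IdentityLift, AddOne>, Double>.
    let chain = ComposedLift(ComposedLift(IdentityLift, AddOne), Double);
    assert_eq!(chain.apply(3), 8); // (3 + 1) * 2
}
```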
Entering Stage 2
#![allow(unused)]
fn main() {
let lp = tree_pipeline.lift(); // Stage2Pipeline<TreeishPipeline<..>, IdentityLift>
let lsp = seed_pipeline.lift(); // Stage2Pipeline<SeedPipeline<..>, IdentityLift>
}
TreeishPipeline also auto-lifts: tree_pipeline.wrap_init(w)
calls .lift() internally. SeedPipeline does not — .lift()
must be written explicitly.
Compositional primitives
then_lift — append
#![allow(unused)]
fn main() {
/// Post-compose `outer` onto the chain. Pure struct construction;
/// no bounds. The composition's *meaningfulness* is enforced where
/// the chain is consumed (`.run_*`, `TreeishSource`).
pub fn then_lift<L2>(
self,
outer: L2,
) -> Stage2Pipeline<Base, ComposedLift<L, L2>> {
Stage2Pipeline {
base: self.base,
pre_lift: ComposedLift::compose(self.pre_lift, outer),
}
}
}
L2’s inputs must match the chain tip’s outputs. The new tip
becomes (L2::N2, L2::MapH, L2::MapR). Available on every
Stage2Pipeline<Base, L>.
then_lift is unconstrained at the struct-method level (pure
construction). Validity is enforced where the chain is consumed
— the .run* methods and the TreeishSource impl.
before_lift — prepend (treeish-rooted only)
#![allow(unused)]
fn main() {
/// Pre-compose a type-preserving lift `first` before the chain.
/// `first`'s output (N, H, R) must equal the base's input.
/// For non-type-preserving pre-adaptation, use the variance-aware
/// sugars (`map_node_bi`, `map_r_bi`, `n_lift`, `phases_lift`).
///
/// Available only for treeish-rooted pipelines: seed-rooted
/// chains have `SeedLift` composed at `.run` time as the natural
/// chain head, leaving no meaningful "before" position.
pub fn before_lift<L0>(self, first: L0)
-> Stage2Pipeline<TreeishPipeline<D, N, H, R>, ComposedLift<L0, L>>
where L0: Lift<D, N, H, R>,
D: Domain<L0::N2>,
{
Stage2Pipeline { base: self.base, pre_lift: ComposedLift::compose(first, self.pre_lift) }
}
}
Pre-compose L0 at the head of the chain. L0 must be
type-preserving — its outputs must equal the Base’s inputs —
which restricts L0 to lifts that don’t change (N, H, R)
(filter_edges_lift, wrap_visit_lift, memoize_by_lift are
the practical choices).
Available only on Stage2Pipeline<TreeishPipeline<…>, L>.
Sugars
Stage-2 sugars all delegate to then_lift after building a
ShapeLift through Wrap dispatch. The user’s closures type at
&UN regardless of Base; the seed-rooted case adapts via a
SeedNode::Node(_)-peeling adapter inside the Wrap impl. Full
catalogue: Sugars. Type-level mechanism:
Wrap dispatch.
#![allow(unused)]
fn main() {
#[test]
fn lifted_sugar_chain() {
use hylic_pipeline::prelude::*;
let tp: TreeishPipeline<Shared, u64, u64, u64> = TreeishPipeline::new(
treeish(|n: &u64| if *n > 0 { vec![*n - 1] } else { vec![] }),
&fold(|n: &u64| *n, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h),
);
let r: String = tp
.wrap_init(|n: &u64, orig: &dyn Fn(&u64) -> u64| orig(n) + 1)
.zipmap(|r: &u64| *r > 5)
.filter_edges(|n: &u64| *n != 0)
.map_r_bi(
|r: &(u64, bool)| format!("{}:{}", r.0, r.1),
|s: &String| {
let (a, b) = s.split_once(':').unwrap();
(a.parse().unwrap(), b == "true")
},
)
.run_from_node(&FUSED, &3u64);
// filter_edges drops the 0-step: tree visits 3→2→1, three nodes.
// wrap_init adds +1 each → values 4, 3, 2; sum = 9. zipmap > 5 → true.
assert_eq!(r, "9:true");
}
}
Running
Stage2Pipeline inherits run from its Stage-1 base.
Treeish-rooted: .run_from_node(&exec, &root) — see
TreeishPipeline. Seed-rooted:
.run(&exec, root_seeds, entry_heap) and
.run_from_slice(&exec, &[seed], entry_heap) — see
SeedPipeline. The call shape is unchanged
across stages.
Sugars — the chainable surface
Every transform users reach for at Stage 1 or Stage 2 is a
trait method. Each method picks an axis, builds the right
library lift, and either reshapes the
Stage-1 slots in place (Stage 1) or appends the lift to the
chain via then_lift (Stage 2).
Trait surfaces
| Where you are | Sugars in scope |
|---|---|
SeedPipeline<Shared, …> | SeedSugarsShared |
SeedPipeline<Local, …> | SeedSugarsLocal |
TreeishPipeline<Shared, …> | TreeishSugarsShared |
TreeishPipeline<Local, …> | TreeishSugarsLocal |
Stage2Pipeline<Base, L> (Shared, any Base) | Stage2SugarsShared |
Stage2Pipeline<Base, L> (Local, any Base) | Stage2SugarsLocal |
use hylic_pipeline::prelude::*; brings them all in scope.
Shared and Local: same names, different bounds
Method names are identical across domains. Only the closure storage and bounds differ:
// Shared: parallel-safe; closures must be Send + Sync.
let r = shared_pipe.wrap_init(w).zipmap(m).run(...);
// Local: same call shape, captures may be non-Send.
let r = local_pipe.wrap_init(w).zipmap(m).run(...);
Stage 2: one trait covers both Bases
Stage2SugarsShared is one trait, blanket-implemented on every
Stage2Pipeline<Base, L>. The treeish-rooted vs seed-rooted
dispatch happens inside the lift-construction call, not at the
trait level. Every Stage-2 sugar body is one line:
#![allow(unused)]
fn main() {
fn wrap_init<W>(self, w: W) -> Self::With<ShapeLift<Shared,
<<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>, H, R,
<<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>, H, R>>
where
<Self::Base as Stage2Base>::Wrap: WrapShared,
<<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>: Clone + Send + Sync + 'static,
W: Fn(&UN, &dyn Fn(&UN) -> H) -> H + Send + Sync + 'static,
{
self.then_lift(<<Self::Base as Stage2Base>::Wrap as WrapShared>::build_wrap_init::<UN, H, R, _>(w))
}
}
<<Self::Base as Stage2Base>::Wrap as WrapShared>::build_wrap_init
is the dispatch. Identity (treeish-rooted) calls
Shared::wrap_init_lift directly; SeedWrap (seed-rooted)
wraps the user’s closure with a SeedNode::Node(_)-peeling
adapter, then calls the same Shared::wrap_init_lift. Both
produce a ShapeLift; both forward to then_lift. From the
user’s perspective the closure types at &UN either way. See
Wrap dispatch for the full mechanics.
Stage 1: per-Base reshape sugars
Stage-1 reshape rewrites the base slots in place and returns a fresh Stage-1 pipeline of (possibly different) type parameters:
#![allow(unused)]
fn main() {
pub trait SeedSugarsShared<N, Seed, H, R>: Sized
where N: Clone + 'static, Seed: Clone + 'static,
H: Clone + 'static, R: Clone + 'static,
{
fn filter_seeds<P>(self, pred: P) -> SeedPipeline<Shared, N, Seed, H, R>
where P: Fn(&Seed) -> bool + Send + Sync + 'static;
fn wrap_grow<W>(self, wrapper: W) -> SeedPipeline<Shared, N, Seed, H, R>
where W: Fn(&Seed, &dyn Fn(&Seed) -> N) -> N + Send + Sync + 'static;
fn map_node_bi<N2, Co, Contra>(self, co: Co, contra: Contra)
-> SeedPipeline<Shared, N2, Seed, H, R>
where N2: Clone + 'static,
Co: Fn(&N) -> N2 + Send + Sync + 'static,
Contra: Fn(&N2) -> N + Send + Sync + 'static;
fn map_seed_bi<Seed2, ToNew, FromNew>(self, to_new: ToNew, from_new: FromNew)
-> SeedPipeline<Shared, N, Seed2, H, R>
where Seed2: Clone + 'static,
ToNew: Fn(&Seed) -> Seed2 + Send + Sync + 'static,
FromNew: Fn(&Seed2) -> Seed + Send + Sync + 'static;
}
}
Stage-2 sugars are not in scope until .lift() (or the
TreeishPipeline auto-lift) has produced a Stage2Pipeline.
Catalogue
Stage 1 — SeedSugarsShared / SeedSugarsLocal
Operates on SeedPipeline<D, N, Seed, H, R>:
| method | output shape |
|---|---|
filter_seeds(pred) | SeedPipeline<D, N, Seed, H, R> |
wrap_grow(w) | SeedPipeline<D, N, Seed, H, R> |
map_node_bi(co, contra) | SeedPipeline<D, N2, Seed, H, R> |
map_seed_bi(to, from) | SeedPipeline<D, N, Seed2, H, R> |
Stage 1 — TreeishSugarsShared / TreeishSugarsLocal
Operates on TreeishPipeline<D, N, H, R>:
| method | output shape |
|---|---|
map_node_bi(co, contra) | TreeishPipeline<D, N2, H, R> |
Stage 2 — Stage2SugarsShared / Stage2SugarsLocal
Operates on Stage2Pipeline<Base, L> (and on TreeishPipeline
via auto-lift). User closures type at &UN; the chain’s actual
N is UN (treeish-rooted) or SeedNode<UN> (seed-rooted),
bridged by Wrap.
| method | what the lift does |
|---|---|
wrap_init(w) | intercept init at every node |
wrap_accumulate(w) | intercept accumulate |
wrap_finalize(w) | intercept finalize |
zipmap(m) | extend R: R → (R, Extra) |
map_r_bi(fwd, bwd) | change R bijectively |
filter_edges(pred) | drop edges from the graph |
wrap_visit(w) | intercept graph visit |
memoize_by(key) | memoise subtree results by key |
map_n_bi(co, contra) | change N bijectively (chain-tip) |
explain() | wrap fold with per-node trace recording |
explain_describe(fmt, emit) | streaming trace; chain-tip R unchanged (Shared only) |
The Stage-1 reshape map_node_bi and the Stage-2 sugar
map_n_bi share a purpose (change N) but are distinct
operations. Stage 1 rewrites the base slots in place; Stage 2
composes a ShapeLift onto the chain. Use Stage 2 when the N
change must sit on top of earlier sugars.
Where wrap_init’s second argument comes from
Every wrap_* user closure receives an orig: &dyn Fn(...) -> ...
parameter alongside the node. orig is the prior fold’s
corresponding phase, exposed as a value so the sugar body can
compose with it: |n, orig| orig(n) + 1. Lifts are, at the
type level, natural transformations between fold algebras; a
phase mapper takes the prior phase as input and produces the
new phase. See
the type-level deep dive.
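The mechanism can be seen in a minimal standalone sketch (simplified signatures, not hylic's actual wrap_init machinery): the wrapper receives the prior phase as a &dyn Fn value and composes with it.

```rust
// Build a new init phase from an old one: the user closure `w`
// sees the prior phase as `orig` and decides how (and whether)
// to call it.
fn wrap_init<N, H>(
    orig: impl Fn(&N) -> H + 'static,
    w: impl Fn(&N, &dyn Fn(&N) -> H) -> H + 'static,
) -> impl Fn(&N) -> H {
    move |n| w(n, &|m: &N| orig(m))
}

fn main() {
    let base = |n: &u64| *n;
    let wrapped = wrap_init(base, |n, orig| orig(n) + 1);
    assert_eq!(wrapped(&41), 42);
}
```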
Wrap dispatch — how Stage-2 sugars reach both Bases
Stage2Pipeline<Base, L> is one struct. Its sugar surface
(Stage2SugarsShared and Stage2SugarsLocal) is one trait per domain.
Yet its chain L operates over different node types depending on the
Base:
- Stage2Pipeline<TreeishPipeline<…>, L> — chain runs over the user’s N.
- Stage2Pipeline<SeedPipeline<…>, L> — chain runs over SeedNode<N>, because SeedLift prepends the synthetic EntryRoot at run time.
A user-facing closure types at &N. The chain expects &<wrapped N>.
Bridging the two is the job of Wrap.
The trait
#![allow(unused)]
fn main() {
/// Type-level dispatch for the chain's input N. Each
/// [`Stage2Base`](super::Stage2Base) declares which `Wrap` it uses;
/// `WrapShared` / `WrapLocal` impls carry the per-domain lift
/// construction.
pub trait Wrap {
/// The wrapped node type for a given user-facing N.
type Of<UN: Clone + 'static>: Clone + 'static;
}
}
Two impls:
- Identity::Of<UN> = UN — used by TreeishPipeline-rooted chains.
- SeedWrap::Of<UN> = SeedNode<UN> — used by SeedPipeline-rooted chains.
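A minimal standalone sketch of the type family (illustrative definitions; hylic's real SeedNode is sealed rather than a public-variant enum):

```rust
// A generic associated type (GAT) projects the chain's node type
// from the user-facing node type.
trait Wrap {
    type Of<UN: Clone + 'static>: Clone + 'static;
}

struct Identity;
struct SeedWrap;

#[derive(Clone, Debug, PartialEq)]
enum SeedNode<N> {
    EntryRoot,
    Node(N),
}

impl Wrap for Identity {
    type Of<UN: Clone + 'static> = UN;
}
impl Wrap for SeedWrap {
    type Of<UN: Clone + 'static> = SeedNode<UN>;
}

// Functions can name the projection, just as the sugar signatures do.
fn wrap_identity(n: u64) -> <Identity as Wrap>::Of<u64> { n }
fn wrap_seed(n: u64) -> <SeedWrap as Wrap>::Of<u64> { SeedNode::Node(n) }

fn main() {
    assert_eq!(wrap_identity(7), 7);
    assert_eq!(wrap_seed(7), SeedNode::Node(7));
    assert!(SeedNode::<u64>::EntryRoot != wrap_seed(7));
}
```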
Stage2Base declares which Wrap a Base uses:
#![allow(unused)]
fn main() {
/// A Stage-1 pipeline that can drive a Stage-2 chain. Carries the
/// `Wrap` selection plus the run-time machinery (pre-lift, root
/// reference, run-input shape).
///
/// Inherits `TreeishSource` so the `(treeish<N>, fold<N, H, R>)` pair is
/// yielded through one canonical path; `with_treeish` is the single
/// place per-base storage shapes are read.
///
/// `PreLift` is intentionally unbounded at the trait level. The
/// `Stage2Pipeline::run` impl adds the `Lift<…, N2 = <Wrap>::Of<N>>`
/// bound at use time; that keeps the supertrait surface free of the
/// `Domain<<Wrap>::Of<N>>` obligation that would otherwise propagate
/// through every site naming `Stage2Base`.
pub trait Stage2Base: TreeishSource + Sized {
/// Type-level dispatcher for the chain's input N.
/// `Identity` → `Of<UN> = UN` (treeish-rooted).
/// `SeedWrap` → `Of<UN> = SeedNode<UN>` (seed-rooted).
type Wrap: Wrap;
/// The user-facing N (the type user lambdas type at). Equal to
/// `Self::N` for every shipped base; kept distinct for
/// documentation symmetry with the sugar surface, which threads
/// `UN` as a method-level parameter.
type UserN: Clone + 'static;
/// What `.run(...)` accepts as its second argument. Parameterised
/// by `CurN`, the user-facing N at the chain tip (i.e. after any
/// `map_n_bi` lifts; `CurN = Self::N` if the chain doesn't change
/// the user N).
///
/// `Identity`-Wrap bases: `&'i CurN` (a borrowed post-chain root).
/// `SeedWrap` bases: an owned `(seeds, entry_heap)` pair (the
/// `CurN` parameter is unused at the value level — `EntryRoot` is
/// constructible at any inner type).
type RunInputs<'i, CurN: Clone + 'static>;
/// The lift composed at the head of the run-time chain.
/// `IdentityLift` for treeish-rooted, `SeedLift` for seed-rooted.
/// Pre-lift transforms `(treeish<N>, fold<N,H,R>)` into
/// `(treeish<Wrap::Of<N>>, fold<Wrap::Of<N>, H, R>)` without
/// touching H or R.
///
/// Unbounded at the trait level — see the trait-level note.
/// The `Stage2Pipeline::run` impl adds
/// `Self::PreLift: Lift<…, N2 = <Wrap>::Of<N>, MapH = H, MapR = R>`
/// at use time.
type PreLift;
/// Build the pre-lift from inputs (consuming the parts of inputs
/// the lift captures), then yield it together with the executor's
/// post-chain root reference to the continuation.
///
/// The continuation receives the pre-lift by value (consumed when
/// applied to the (treeish, fold) pair) and the root by reference,
/// at the post-chain type `<Self::Wrap as Wrap>::Of<CurN>`. The
/// reference is valid for the entire duration of `cont`.
///
/// `Identity` case: pre-lift is `IdentityLift`; the root is the
/// `&CurN` extracted from `inputs`.
/// `SeedWrap` case: pre-lift is `SeedLift::from_*_grow(...)`,
/// consuming `inputs.0` (entry seeds) and `inputs.1` (entry heap);
/// the root is `&SeedNode::entry_root::<CurN>()`, constructed
/// locally in this frame and alive for `cont`'s lifetime.
fn provide_run_essentials<CurN: Clone + 'static, T>(
&self,
inputs: Self::RunInputs<'_, CurN>,
cont: impl FnOnce(Self::PreLift,
&<Self::Wrap as Wrap>::Of<CurN>) -> T,
) -> T;
}
}
So in the type system: <<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>
is the chain’s input N — equal to UN for treeish-rooted, equal to
SeedNode<UN> for seed-rooted. This two-hop projection appears verbatim
in every Stage-2 sugar’s signature.
Per-domain build subtraits
Wrap is type-only: it fixes a type family, not how to construct lifts.
The constructors live on per-domain subtraits:
#![allow(unused)]
fn main() {
fn build_wrap_init<UN, H, R, W>(w: W)
-> ShapeLift<Shared, Self::Of<UN>, H, R, Self::Of<UN>, H, R>
where
UN: Clone + Send + Sync + 'static,
H: Clone + Send + Sync + 'static,
R: Clone + Send + Sync + 'static,
Self::Of<UN>: Clone + Send + Sync + 'static,
W: Fn(&UN, &dyn Fn(&UN) -> H) -> H + Send + Sync + 'static;
}
(Plus one method per Stage-2 sugar; see
stage2/wrap/shared.rs
for the full set, and
stage2/wrap/local.rs
for the Local mirror.)
The split is forced by the Send + Sync axis: Shared user closures must
be Send + Sync (Arc storage; parallel executors); Local must not require
it (Rc storage; supports non-Send captured state). WrapShared/WrapLocal
are how that single asymmetry is expressed without macros.
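The asymmetry can be pictured with a standalone sketch (simplified storage types, not hylic's actual Fold internals): the same construction surface, with a different pointer and different bounds per domain.

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::sync::Arc;

// Shared storage: Arc'd closures, Send + Sync required.
struct SharedInit<N, H>(Arc<dyn Fn(&N) -> H + Send + Sync>);

// Local storage: Rc'd closures, no Send + Sync obligation.
struct LocalInit<N, H>(Rc<dyn Fn(&N) -> H>);

fn shared_init<N, H>(f: impl Fn(&N) -> H + Send + Sync + 'static) -> SharedInit<N, H> {
    SharedInit(Arc::new(f))
}

fn local_init<N, H>(f: impl Fn(&N) -> H + 'static) -> LocalInit<N, H> {
    LocalInit(Rc::new(f))
}

fn main() {
    let s = shared_init(|n: &u64| *n * 2);
    assert_eq!((s.0)(&3), 6);

    // A Local closure may capture non-Send state, e.g. an Rc<Cell<_>> counter.
    let counter = Rc::new(Cell::new(0u64));
    let c = counter.clone();
    let l = local_init(move |n: &u64| {
        c.set(c.get() + 1);
        *n + 1
    });
    assert_eq!((l.0)(&3), 4);
    assert_eq!(counter.get(), 1);
    // Passing that same closure to shared_init would not compile: Rc is not Send.
}
```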
Identity: pass-through
#![allow(unused)]
fn main() {
impl WrapShared for Identity {
fn build_wrap_init<UN, H, R, W>(w: W)
-> ShapeLift<Shared, UN, H, R, UN, H, R>
where
UN: Clone + Send + Sync + 'static,
H: Clone + Send + Sync + 'static,
R: Clone + Send + Sync + 'static,
W: Fn(&UN, &dyn Fn(&UN) -> H) -> H + Send + Sync + 'static,
{
Shared::wrap_init_lift::<UN, H, R, _>(w) // pass-through
}
}
User closure goes straight to Shared::wrap_init_lift. Of<UN> = UN, so
no adaptation is needed.
SeedWrap: peel Node, pass EntryRoot
#![allow(unused)]
fn main() {
impl WrapShared for SeedWrap {
fn build_wrap_init<UN, H, R, W>(w: W)
-> ShapeLift<Shared, SeedNode<UN>, H, R, SeedNode<UN>, H, R>
where
UN: Clone + Send + Sync + 'static,
H: Clone + Send + Sync + 'static,
R: Clone + Send + Sync + 'static,
W: Fn(&UN, &dyn Fn(&UN) -> H) -> H + Send + Sync + 'static,
{
let user = Arc::new(w);
// Adapter for the SeedNode<UN>-typed chain: peel Node(_), pass EntryRoot.
let lifted = move |ln: &SeedNode<UN>,
orig: &dyn Fn(&SeedNode<UN>) -> H| -> H
{
match sn_int::inner(ln) {
SeedNodeInner::Node(n) => {
let user = user.clone();
user(n, &|inner: &UN| orig(&sn_int::node(inner.clone())))
}
SeedNodeInner::EntryRoot => orig(ln),
}
};
Shared::wrap_init_lift::<SeedNode<UN>, H, R, _>(lifted)
}
}
The user types Fn(&UN, …) -> H. The chain expects
Fn(&SeedNode<UN>, …) -> H. The body adapts: when the row is
Node(n), call the user’s closure with &n; when it’s EntryRoot, call
through to the chain’s orig continuation directly (the user closure has
nothing to do with the synthetic root).
The same pattern recurs for every N-aware sugar:
build_filter_edges, build_memoize_by, build_wrap_visit,
build_map_n_bi. Sugars without &N in their signature
(wrap_accumulate, wrap_finalize, zipmap, map_r_bi, explain) need
no peeling — both impls forward unchanged.
How the sugar trait forwards
A representative Stage2SugarsShared body — the unified surface that
covers both Base shapes:
#![allow(unused)]
fn main() {
fn wrap_init<W>(self, w: W) -> Self::With<ShapeLift<Shared,
<<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>, H, R,
<<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>, H, R>>
where
<Self::Base as Stage2Base>::Wrap: WrapShared,
<<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>: Clone + Send + Sync + 'static,
W: Fn(&UN, &dyn Fn(&UN) -> H) -> H + Send + Sync + 'static,
{
self.then_lift(<<Self::Base as Stage2Base>::Wrap as WrapShared>::build_wrap_init::<UN, H, R, _>(w))
}
}
The body is one line. The surrounding where clauses repeat the projection
chain so Rust’s solver can verify each junction; that’s where the verbosity
sits. See the type-level deep dive for why the
projection has to be spelled out symmetrically here.
What the user sees
Nothing of the above. From the call site:
seed_pipeline
.lift()
.wrap_init(|n: &N, orig| orig(n) + 1) // typed at &N, not &SeedNode<N>
.filter_edges(|n: &N| !is_excluded(n))
.run_from_slice(&exec, &seeds, h0);
Wrap dispatch is invisible. The user picks a Base; the trait routes
through the right impl; closures stay typed at the user’s N. Switching
Base shape — say, building the same chain over a TreeishPipeline —
costs no code at the sugar layer.
SeedNode<N> — the seed-rooted row type
#![allow(unused)]
fn main() {
/// Opaque row type in a seed-closed chain's treeish. Values are
/// either the synthetic `EntryRoot` row (seed fan-out) or a resolved
/// `Node(N)`. User code inspects via [`is_entry_root`](Self::is_entry_root),
/// [`as_node`](Self::as_node), [`into_node`](Self::into_node), and
/// [`map_node`](Self::map_node); the variants are sealed.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct SeedNode<N> {
// Exposed `pub` (not `pub(crate)`) so the doc-hidden
// `seed_node_internal` module can re-export it for
// `hylic-pipeline`'s dispatch. User code should treat this field
// as opaque and use `is_entry_root` / `as_node` / `map_node`.
#[doc(hidden)]
pub inner: SeedNodeInner<N>,
}
/// Library-internal variant carrier for `SeedNode<N>`. Exposed
/// `pub` only to make crate-external re-export through the
/// `seed_node_internal` doc-hidden module possible. User code
/// should never name this directly.
#[doc(hidden)]
#[derive(Clone, PartialEq, Eq, Hash)]
pub enum SeedNodeInner<N> {
EntryRoot,
Node(N),
}
}
SeedNode<N> is the chain’s input node type once SeedLift
has fired in a seed-rooted Stage2Pipeline. Two inhabitants:
- Node(N) — a real grown node from the user’s seed graph.
- EntryRoot — the synthetic forest root above the entry seeds.
Variants are sealed; pattern-matching is not exposed to user code. Inspection is through accessor methods:
| method | returns |
|---|---|
sn.is_entry_root() | bool |
sn.as_node() | Option<&N> |
sn.into_node() | Option<N> |
sn.map_node(f: FnOnce(&N) -> M) | SeedNode<M> — Node mapped, EntryRoot preserved |
Inside Stage-2 sugar bodies, SeedNode<N> never appears; user
closures type at &N and the row is peeled (or routed past)
by Wrap dispatch.
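The sealed-row pattern itself can be sketched standalone (illustrative code, not hylic's actual definitions): the variant enum stays private to a module, and user code goes through accessors.

```rust
mod row {
    // Private variant carrier: callers outside this module cannot
    // pattern-match on it.
    #[derive(Clone)]
    enum Inner<N> { EntryRoot, Node(N) }

    #[derive(Clone)]
    pub struct SeedNode<N> { inner: Inner<N> }

    impl<N> SeedNode<N> {
        pub fn entry_root() -> Self { SeedNode { inner: Inner::EntryRoot } }
        pub fn node(n: N) -> Self { SeedNode { inner: Inner::Node(n) } }
        pub fn is_entry_root(&self) -> bool { matches!(self.inner, Inner::EntryRoot) }
        pub fn as_node(&self) -> Option<&N> {
            match &self.inner { Inner::Node(n) => Some(n), Inner::EntryRoot => None }
        }
        // Map the Node payload; EntryRoot passes through unchanged.
        pub fn map_node<M>(&self, f: impl FnOnce(&N) -> M) -> SeedNode<M> {
            match &self.inner {
                Inner::Node(n) => SeedNode::node(f(n)),
                Inner::EntryRoot => SeedNode::entry_root(),
            }
        }
    }
}

fn main() {
    use row::SeedNode;
    let n = SeedNode::node(41u64);
    assert!(!n.is_entry_root());
    assert_eq!(n.as_node(), Some(&41));
    assert_eq!(n.map_node(|v| v + 1).as_node(), Some(&42));
    assert!(SeedNode::<u64>::entry_root().is_entry_root());
}
```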
Where the row surfaces
A lift whose output type mentions the chain’s N can carry
SeedNode<N> to the chain tip. The explainer is the canonical
case:
let raw: ExplainerResult<SeedNode<N>, H, R> = pipeline
.lift()
.explain()
.run_from_slice(&exec, &seeds, h0);
ExplainerResult’s first type parameter is the node type recorded
per trace row (heap.node), which on a seed-rooted chain is SeedNode<N>.
For walks over the trace (formatting, post-fact analysis),
project to an N-typed view via
SeedExplainerResult::from:
let sealed: SeedExplainerResult<N, H, R> = raw.into();
// sealed.entry_initial_heap, sealed.entry_working_heap, sealed.orig_result
// — the EntryRoot row, promoted out of the tree as fields.
// sealed.roots: Vec<ExplainerResult<N, H, R>>
// — per-seed subtrees, every node now plain N.
The conversion is total: every node below the EntryRoot row is
unwrapped, and SeedNode<N> no longer appears in the
user-visible shape.
Tree shape
EntryRoot
├── Node(grow(seed_0))
├── Node(grow(seed_1))
└── …
SeedLift produces this tree at run time from the entry seeds
and the user’s grow. Each Node(n) below has the user’s
seeds_from_node + grow as its own children-producing function.
One-shot — OwnedPipeline
#![allow(unused)]
fn main() {
/// One-shot pipeline over the `Owned` domain. Not `Clone`; runs
/// via [`crate::source::PipelineExecOnce::run_from_node_once`],
/// which consumes `self`.
#[must_use]
pub struct OwnedPipeline<N, H, R>
where N: 'static, H: 'static, R: 'static,
{
pub(crate) treeish: Edgy<N, N>,
pub(crate) fold: Fold<N, H, R>,
}
}
Two slots, like TreeishPipeline, but stored in the Owned
domain — closures are Box<dyn Fn>, not Clone, not
Send + Sync. Runs once and is consumed.
Constructor
#![allow(unused)]
fn main() {
let pipeline = OwnedPipeline::new(
treeish, // owned::Edgy<N, N>
fold, // owned::Fold<N, H, R>
);
}
Running
#![allow(unused)]
fn main() {
let r = pipeline.run_from_node_once(&FUSED, &root);
// pipeline is consumed.
}
run_from_node_once is the by-value method on
PipelineExecOnce, the consuming counterpart of
PipelineExec::run_from_node. Owned does not implement
ShapeCapable, so Stage-2 sugars are not available — there is
no chain to compose.
Worked example
#![allow(unused)]
fn main() {
#[test]
fn owned_pipeline_example() {
use hylic_pipeline::{OwnedPipeline, PipelineExecOnce};
use hylic::domain::owned as odom;
let graph = odom::edgy::treeish(|n: &u64|
if *n > 0 { vec![*n - 1] } else { vec![] });
let fld = odom::fold(
|n: &u64| *n,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
let r: u64 = OwnedPipeline::new(graph, fld)
.run_from_node_once(&odom::FUSED, &5u64);
// 5+4+3+2+1+0 = 15.
assert_eq!(r, 15);
}
}
Writing a custom Lift
Most transformations compose out of the library catalogue and
the sugar traits. A custom Lift impl is the right tool when
the transformation carries cross-node state, requires
per-variant dispatch on the input N, or is itself an execution
strategy.
apply has one job: produce the three output slots and hand
them to a continuation cont(grow', treeish', fold'). Everything
about the impl follows from four decisions about how those
slots relate to the input ones.
Four decisions
1. Output types.
type N2 = ???;
type MapH = ???;
type MapR = ???;
Mirror the input on axes the lift does not change. Where an
axis changes, declare the new type. MapH and MapR are
typically wrappers — the explainer, for instance, wraps MapR
into ExplainerResult<N, H, R>.
2. Treatment of the input grow.
Three options: pass through unchanged, wrap with an N-conversion
when N changes, or synthesise a fresh grow (the SeedLift
case, where the chain head closes the grow axis). Most custom
lifts pass through.
3. Treatment of the input treeish.
Pass through, filter, wrap in a visit-intercepting closure, or rebuild entirely.
4. Treatment of the input fold.
Clone it once per phase closure. Build a new
Fold<D::N2, MapH, MapR> whose init, accumulate, and
finalize delegate to the original through the captured clones.
Worked example
NoteVisits increments a shared counter every time init
runs. No type changes; grow and treeish pass through; fold
gets a wrapped init.
#![allow(unused)]
fn main() {
#[test]
fn custom_lift_note_visits() {
use std::sync::{Arc, Mutex};
use hylic::domain::Shared;
use hylic::domain::shared::fold::{self as sfold, Fold};
use hylic::graph::Treeish;
use hylic::ops::Lift;
/// Counts init calls into a shared counter.
#[derive(Clone)]
struct NoteVisits {
counter: Arc<Mutex<u64>>,
}
impl<N, H, R> Lift<Shared, N, H, R> for NoteVisits
where N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
{
type N2 = N;
type MapH = H;
type MapR = R;
fn apply<T>(
&self,
treeish: Treeish<N>,
fold: Fold<N, H, R>,
cont: impl FnOnce(
Treeish<N>,
Fold<N, H, R>,
) -> T,
) -> T
{
let fold_for_init = fold.clone();
let fold_for_acc = fold.clone();
let fold_for_fin = fold;
let counter = self.counter.clone();
let wrapped: Fold<N, H, R> = sfold::fold(
move |n: &N| { *counter.lock().unwrap() += 1; fold_for_init.init(n) },
move |h: &mut H, r: &R| fold_for_acc.accumulate(h, r),
move |h: &H| fold_for_fin.finalize(h),
);
cont(treeish, wrapped)
}
}
use hylic::ops::LiftBare;
use hylic::prelude::{treeish, fold, FUSED};
let counter = Arc::new(Mutex::new(0u64));
let lift = NoteVisits { counter: counter.clone() };
let t = treeish(|n: &u64| if *n > 0 { vec![*n - 1] } else { vec![] });
let f = fold(|n: &u64| *n, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h);
let r: u64 = lift.run_on(&FUSED, t, f, &3u64);
assert_eq!(r, 6); // 3 + 2 + 1 + 0
assert_eq!(*counter.lock().unwrap(), 4); // four init calls
}
}
Apply via LiftBare::run_on or compose into a pipeline:
#![allow(unused)]
fn main() {
use hylic_pipeline::prelude::*;
let r = my_treeish_pipeline.lift()
.then_lift(NoteVisits { counter })
.run_from_node(&FUSED, &root);
}
When ShapeLift is sufficient
If the transformation is “rewrite one of the three slots” —
which it is most of the time — one of the per-axis primitives
or the universal ShapeLift does the job.
| Primitive | When |
|---|---|
Shared::phases_lift(mi, ma, mf) | rewrite all three Fold phases |
Shared::treeish_lift(mt) | rewrite the graph |
Shared::n_lift(lift_node, build_treeish, contra) | coordinated N-change across all slots |
Shared::wrap_init_lift(w) | wrap init |
Shared::zipmap_lift(m) | extend R |
Shared::filter_edges_lift(p) | drop edges from the graph |
(Local mirrors are alongside.) NoteVisits above is
expressible as
Shared::wrap_init_lift(|n, orig| { counter.bump(); orig(n) });
the custom impl was shown to illustrate the trait structure.
Capability bounds
- PureLift — Clone + 'static on the lift, Clone on every output type. Required for the sequential Fused executor.
- ShareableLift — adds Send + Sync + 'static on the lift and on every payload. Required for the parallel Funnel executor.
Both are blanket markers; the compiler selects them when the
bounds are met. To run under Funnel, the lift struct itself
must be Clone + Send + Sync + 'static, and every captured
field must satisfy the same.
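A standalone sketch of the blanket-marker idea (simplified bounds; not hylic's real PureLift/ShareableLift definitions) shows how the compiler selects a marker whenever the bounds hold:

```rust
use std::rc::Rc;

// Blanket markers: any type meeting the bounds gets the trait for free.
trait PureLift: Clone + 'static {}
impl<T: Clone + 'static> PureLift for T {}

trait ShareableLift: PureLift + Send + Sync {}
impl<T: Clone + Send + Sync + 'static> ShareableLift for T {}

fn runs_under_fused<L: PureLift>(_lift: &L) -> bool { true }
fn runs_under_funnel<L: ShareableLift>(_lift: &L) -> bool { true }

#[derive(Clone)]
struct LocalState(Rc<u64>); // Clone, but Rc is not Send

#[derive(Clone)]
struct SharedState(u64); // Clone + Send + Sync

fn main() {
    assert!(runs_under_fused(&LocalState(Rc::new(1))));
    assert!(runs_under_fused(&SharedState(1)));
    assert!(runs_under_funnel(&SharedState(1)));
    // runs_under_funnel(&LocalState(..)) would not compile: Rc<u64> is not Send.
}
```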
The Exec Pattern
Every executor in hylic has the same type-level structure. Two
traits — Executor (computation) and ExecutorSpec (lifecycle) —
and one wrapper — Exec<D, S> — compose into a uniform API where
every executor, regardless of whether it needs resources, presents
the same interface to the user.
The core idea
A Spec is a defunctionalized executor — pure data that fully
describes a computation strategy. It is Copy: small, moveable,
transformable. Calling .run() on a Spec refunctionalizes it: turns
the data back into computation.
For executors that need resources (thread pools, arenas), .run()
internally creates the resource, binds it, runs the fold, and
destroys the resource. For executors that need nothing (Fused), the
same .run() just runs.
#![allow(unused)]
fn main() {
use hylic::prelude::*;
// Sequential — no resource needed:
FUSED.run(&fold, &graph, &root);
// Parallel — resource created + destroyed internally:
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
}
The call shape is identical. Resource management is an internal concern of each executor.
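The lifecycle can be sketched standalone (Pool, Session, and these signatures are illustrative stand-ins, not hylic's types): the Spec is Copy data, and run() creates, uses, and drops the resource internally.

```rust
// Defunctionalized strategy: pure data describing the computation.
#[derive(Clone, Copy)]
struct Spec { workers: usize }

// Stand-in resource (a real executor would hold a thread pool here).
struct Pool { workers: usize }

// Spec bound to a borrowed resource, ready to run.
struct Session<'r> { pool: &'r Pool }

impl Spec {
    // Explicit path: fix the resource, produce a session.
    fn attach<'r>(self, pool: &'r Pool) -> Session<'r> { Session { pool } }

    // Scoped path: create the resource, attach, run `f`, clean up.
    fn with_session<R>(&self, f: impl FnOnce(&Session<'_>) -> R) -> R {
        let pool = Pool { workers: self.workers }; // create resource
        let out = f(&self.attach(&pool));          // run against session
        drop(pool);                                // destroy resource
        out
    }

    // Refunctionalize: turn the description back into computation.
    fn run(&self, xs: &[u64]) -> u64 {
        self.with_session(|s| s.run(xs))
    }
}

impl Session<'_> {
    fn run(&self, xs: &[u64]) -> u64 {
        let _ = self.pool.workers; // resource is already bound
        xs.iter().sum()
    }
}

fn main() {
    let spec = Spec { workers: 8 };
    assert_eq!(spec.run(&[1, 2, 3]), 6);
    assert_eq!(spec.run(&[10]), 10); // spec stays usable: it is Copy
}
```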
The trait pair
#![allow(unused)]
fn main() {
/// Lifecycle: resource management + session creation.
/// Only Specs implement this. Sessions are the output.
pub trait ExecutorSpec: Copy {
/// Borrowed resource attached to a session (for example, a
/// thread-pool reference).
type Resource<'r> where Self: 'r;
/// The session type produced by `attach`.
type Session<'s>: 's where Self: 's;
/// Bind the spec to a borrowed resource, returning a session.
fn attach(self, resource: Self::Resource<'_>) -> Self::Session<'_>;
/// Construct an owned session scoped to `f` and run `f` against it.
fn with_session<R>(&self, f: impl for<'s> FnOnce(&Self::Session<'s>) -> R) -> R;
}
}
ExecutorSpec is the lifecycle trait. Two GATs define each executor’s
world:
- Resource<'r>: what the executor needs. A thread pool (&'r Pool) for Funnel; () for Fused.
- Session<'s>: the bound executor — Spec + resource, ready to run folds. Borrows the resource at lifetime 's.
Two methods connect them:
- attach(self, resource): partial application. Consumes the Spec (it’s Copy — the caller keeps their copy), fixes the resource, produces a Session. This is the explicit path.
- with_session(&self, f): the scoped path. Creates the resource internally, attaches, calls f with the session, cleans up. Copies the Spec (since attach consumes and with_session borrows &self).
#![allow(unused)]
fn main() {
/// Run a fold on a tree. Both Specs and Sessions implement this.
///
/// The fold is domain-specific (`D::Fold<H, R>`). The graph type G
/// is a trait-level parameter — each executor impl declares its own
/// bounds on G (e.g. Fused accepts any TreeOps, Funnel requires
/// Send+Sync). The compiler checks G at the call site.
pub trait Executor<N: 'static, R: 'static, D: Domain<N>, G: TreeOps<N> + 'static> {
/// Run the given `fold` over the `graph` starting at `root` and
/// return the fold's final result for the root.
fn run<H: 'static>(&self, fold: &D::Fold<H, R>, graph: &G, root: &N) -> R;
}
}
Executor is the computation trait. Both Specs and Sessions
implement it:
- Spec::run: routes through self.with_session(|s| s.run(...)) — creates the resource, runs, destroys.
- Session::run: direct dispatch — the resource is already bound.
Exec<D, S>
#![allow(unused)]
fn main() {
/// User-facing executor wrapper tying a domain `D` to an executor
/// strategy `S`. Both Specs and Sessions appear inside `Exec`.
pub struct Exec<D, S>(pub(crate) S, PhantomData<D>);
#[allow(missing_docs)] // trivial constructor/accessor pair
impl<D, S> Exec<D, S> {
pub const fn new(inner: S) -> Self { Exec(inner, PhantomData) }
pub fn into_inner(self) -> S { self.0 }
}
impl<D, S: Clone> Clone for Exec<D, S> {
fn clone(&self) -> Self { Exec::new(self.0.clone()) }
}
impl<D, S: Copy> Copy for Exec<D, S> {}
}
The user-facing wrapper. D is the domain (determines fold/graph
types via GATs). S is the strategy — a Spec or a Session.
Exec is a thin wrapper over S (the PhantomData field is zero-sized) and
implements Clone and Copy whenever S does.
Two method blocks:
#![allow(unused)]
fn main() {
/// Run the inner strategy as an [`Executor`]. Inferred over `N`,
/// `H`, `R`, and `G` from the arguments.
pub fn run<N: 'static, H: 'static, R: 'static, G: TreeOps<N> + 'static>(
&self, fold: &<D as Domain<N>>::Fold<H, R>, graph: &G, root: &N,
) -> R
where D: Domain<N>, S: Executor<N, R, D, G>
{
Executor::<N, R, D, G>::run(&self.0, fold, graph, root)
}
}
Block A (.run()): available on ALL Exec where S: Executor.
This is the one way to execute. Works on Specs and Sessions alike.
#![allow(unused)]
fn main() {
impl<D, S: ExecutorSpec> Exec<D, S> {
/// Construct a session bound to an owned resource, pass it to
/// `f` by value (wrapping a borrowed session inside a fresh
/// `Exec<D, &Session>`), and return `f`'s result. The session
/// is dropped at the end of the scope.
pub fn session<R>(
&self,
f: impl for<'s> FnOnce(Exec<D, &S::Session<'s>>) -> R,
) -> R {
self.0.with_session(|session| f(Exec::new(session)))
}
/// Bind the spec to a borrowed resource, returning a session as
/// an `Exec`.
pub fn attach(self, resource: S::Resource<'_>) -> Exec<D, S::Session<'_>> {
Exec::new(self.0.attach(resource))
}
}
}
Block B (.session(), .attach()): available only on Spec-level
Exec where S: ExecutorSpec. These are the resource-management
surface:
- .session(|s| ...): borrows the Spec, creates the resource in a scope, passes the session-level Exec to the closure. Multiple .run() calls inside share the resource.
- .attach(resource): consumes the Spec (partial application), returns a session-level Exec bound to the resource. One expression — no intermediate bindings needed because Specs are Copy.
The three usage tiers
Executors can be used at three levels of resource control:
One-shot — the common case. Each .run() manages resources
internally:
#![allow(unused)]
fn main() {
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
}
Session scope — amortized multi-run. The resource (thread pool) is created once, shared across folds:
#![allow(unused)]
fn main() {
exec(funnel::Spec::default(8)).session(|s| {
s.run(&fold1, &graph1, &root1);
s.run(&fold2, &graph2, &root2);
});
}
Explicit attach — manual resource management. You provide the resource; the Spec binds to it:
#![allow(unused)]
fn main() {
funnel::Pool::with(8, |pool| {
exec(funnel::Spec::default(8)).attach(pool).run(&fold, &graph, &root);
});
}
For zero-resource executors (Fused), all three tiers compile but
.session() and .attach(()) are identity — the compiler optimizes
them away.
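The relationship between the one-shot and explicit tiers is easiest to see in a toy model. Everything below (ToySpec, ToyPool, ToySession) is a hypothetical stand-in, not the hylic API — a sketch of how attach and with_session relate:

```rust
struct ToyPool { threads: usize }
impl ToyPool {
    fn new(n: usize) -> Self { ToyPool { threads: n } }
}

#[derive(Clone, Copy)]
struct ToySpec { pool_size: usize }

struct ToySession<'r> { pool: &'r ToyPool }

impl ToySpec {
    // Explicit path: bind a borrowed resource, return a session.
    fn attach<'r>(self, pool: &'r ToyPool) -> ToySession<'r> {
        ToySession { pool }
    }
    // Scoped path: create the resource, attach, run `f`, drop everything.
    fn with_session<R>(&self, f: impl for<'s> FnOnce(&ToySession<'s>) -> R) -> R {
        let pool = ToyPool::new(self.pool_size);
        let session = self.attach(&pool);
        f(&session)
    }
}

impl<'r> ToySession<'r> {
    fn run(&self) -> usize { self.pool.threads } // stand-in for a fold run
}

fn main() {
    let spec = ToySpec { pool_size: 8 };
    // One-shot: the resource lives only for this call.
    let a = spec.with_session(|s| s.run());
    // Explicit: the caller owns the resource; attach borrows it.
    let pool = ToyPool::new(8);
    let b = spec.attach(&pool).run();
    assert_eq!(a, b);
}
```

Note how with_session is definable in terms of attach: the scoped path is just the explicit path with resource creation folded in.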
The impl table
Every executor fills the same shape:
| Type | Resource | Session | Executor::run |
|---|---|---|---|
| fused::Spec | () | Self | direct recursion |
| funnel::Spec<P> | &Pool | Session<P> | routes through with_session |
| funnel::Session | — | — | direct dispatch::run_fold |
Sessions do NOT implement ExecutorSpec — they are the output of
attach, not Specs themselves.
Domain constants
Fused is a zero-sized Spec exposed as a domain-bound const:
pub const FUSED: Exec<Shared, fused::Spec> = Exec::new(fused::Spec);
FUSED is Copy. .run() calls Executor::run on fused::Spec
directly (it implements both traits). No resource, no session — the
Spec IS the session.
Generic-over-executor code
The Executor trait is the single generic bound. The graph type G
is a trait-level parameter — each executor impl declares its own
bounds on G:
fn measure<G: TreeOps<NodeId> + 'static, S: Executor<NodeId, u64, Shared, G>>(
exec: &Exec<Shared, S>, fold: &shared::Fold<NodeId, u64, u64>, graph: &G, root: &NodeId,
) -> u64 {
exec.run(fold, graph, root)
}
This works for Exec<Shared, fused::Spec>, Exec<Shared, funnel::Spec<P>>,
and Exec<Shared, funnel::Session<'_, P>> — all through the same
bound, the same .run(), the same call site.
How a new executor fits in
Adding a new executor requires implementing two traits:
- Define MySpec (Copy) and MySession<'s>
- Implement ExecutorSpec on MySpec — define Resource, Session, attach, with_session
- Implement Executor on MySession — the direct dispatch
- Implement Executor on MySpec — route through with_session
- Users call shared::exec(MySpec { ... }).run(...) (or the local/owned equivalent) — same shape as every other executor
The framework provides .run(), .session(), .attach() for free
via the Exec<D, S> wrapper.
Domain integration
The domain system lets executors accept folds without knowing their
concrete storage. The Domain trait maps a marker type to a concrete
Fold type via a GAT. The graph type is a separate concern — the
Executor trait accepts any G: TreeOps<N>, with per-executor
bounds checked at the call site.
The Domain trait
The Domain trait provides a single associated type — the fold:
#![allow(unused)]
fn main() {
pub trait Domain<N: 'static>: 'static {
type Fold<H: 'static, R: 'static>: FoldOps<N, H, R>;
type Graph<E: 'static> where E: 'static;
type Grow<Seed: 'static, NOut: 'static>;
/// Construct a fold from three closures. Uniform Send+Sync
/// bound; each domain sheds Send+Sync at storage time if it
/// doesn't need it.
fn make_fold<H: 'static, R: 'static>(
init: impl Fn(&N) -> H + Send + Sync + 'static,
acc: impl Fn(&mut H, &R) + Send + Sync + 'static,
fin: impl Fn(&H) -> R + Send + Sync + 'static,
) -> Self::Fold<H, R>;
/// Construct a grow closure from a Fn. Uniform Send+Sync bound.
fn make_grow<Seed: 'static, NOut: 'static>(
f: impl Fn(&Seed) -> NOut + Send + Sync + 'static,
) -> Self::Grow<Seed, NOut>;
/// Invoke a stored grow closure.
fn invoke_grow<Seed: 'static, NOut: 'static>(
g: &Self::Grow<Seed, NOut>,
s: &Seed,
) -> NOut;
/// Construct a graph (Edgy) closure. Uniform Send+Sync bound.
fn make_graph<E: 'static>(
visit: impl Fn(&N, &mut dyn FnMut(&E)) + Send + Sync + 'static,
) -> Self::Graph<E>;
}
}
Each domain marker (Shared, Local, Owned) implements this
trait with a different closure boxing strategy:
| Domain | Fold<H, R> storage | Send+Sync |
|---|---|---|
| Shared | Arc<dyn Fn + Send + Sync> | yes |
| Local | Rc<dyn Fn> | no |
| Owned | Box<dyn Fn> | no |
Graph types are domain-independent. Treeish<N> and Edgy<N, E>
in hylic::graph are always Arc-based (they need Clone for graph
composition). Any type implementing TreeOps<N> can serve as a
graph, including user-defined structs with no boxing at all.
The Executor trait
The executor trait has four type parameters: N (node), R
(result), D (domain), and G (graph):
#![allow(unused)]
fn main() {
/// Run a fold on a tree. Both Specs and Sessions implement this.
///
/// The fold is domain-specific (`D::Fold<H, R>`). The graph type G
/// is a trait-level parameter — each executor impl declares its own
/// bounds on G (e.g. Fused accepts any TreeOps, Funnel requires
/// Send+Sync). The compiler checks G at the call site.
pub trait Executor<N: 'static, R: 'static, D: Domain<N>, G: TreeOps<N> + 'static> {
/// Run the given `fold` over the `graph` starting at `root` and
/// return the fold's final result for the root.
fn run<H: 'static>(&self, fold: &D::Fold<H, R>, graph: &G, root: &N) -> R;
}
}
The domain D determines the fold type (D::Fold<H, R>). The graph
type G is constrained per executor implementation. This separation
means the fold’s boxing strategy and the graph’s storage are
independent choices.
At the call site, the compiler checks that G satisfies the
executor’s requirements.
For Fused, any TreeOps<N> suffices. For Funnel, G must also be
Send + Sync (the graph reference is shared across a scoped pool).
If the graph type does not satisfy the executor’s bounds, the call
site produces a compile error.
Why D is on the executor, not the fold
Fold<N, H, R> carries no domain parameter — the domain lives on
the executor: Exec<D, S>. This resolves a type inference problem:
GATs are not injective (D::Fold<H, R> does not uniquely identify
D), so the compiler cannot infer D from a fold argument alone.
With D fixed by the executor constant or exec() call, everything
resolves statically.
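The inference problem can be reproduced in a few lines. Everything here is a deliberately simplified, hypothetical reduction (a plain associated type instead of a GAT, markers that merely echo the real names), not hylic code:

```rust
use std::marker::PhantomData;

trait Domain { type Fold; }

struct Shared;
struct Local;
// Two domains map to the SAME fold type. Associated types are not
// injective, so a fold value alone cannot tell the compiler which
// domain it came from.
impl Domain for Shared { type Fold = fn(u64) -> u64; }
impl Domain for Local  { type Fold = fn(u64) -> u64; }

struct Exec<D>(PhantomData<D>);

impl<D: Domain> Exec<D> {
    fn run(&self, fold: &D::Fold, x: u64) -> u64
    where
        D::Fold: Fn(u64) -> u64,
    {
        fold(x)
    }
}

fn main() {
    let fold: fn(u64) -> u64 = |x| x + 1;
    // A free function `run::<D>(&fold, ...)` could not infer D here;
    // fixing D on the executor value resolves everything statically.
    let exec: Exec<Shared> = Exec(PhantomData);
    assert_eq!(exec.run(&fold, 41), 42);
}
```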
Domain compatibility
| | Shared | Local | Owned |
|---|---|---|---|
| Fused | yes | yes | yes |
| Funnel | yes | — | — |
Fused supports all domains because it borrows both fold and graph
on a single thread. Funnel requires N: Clone + Send and R: Send
on the fold’s types, which the Shared domain satisfies. The graph
must additionally be Send + Sync.
The FoldOps trait
Executors do not call fold methods through the concrete domain type.
They operate through the FoldOps<N, H, R> trait, which all domain
Fold types implement:
The executor’s recursion engine takes &impl FoldOps<N, H, R> —
fully monomorphized for the concrete fold type, with no runtime
dispatch beyond the closure’s own vtable.
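A sketch of that surface, assuming the init/accumulate/finalize factoring described above (the trait and type names here follow the text, but the exact hylic signatures may differ):

```rust
// Assumed shape of the FoldOps surface: one method per phase of the
// init/accumulate/finalize bracket.
trait FoldOps<N, H, R> {
    fn init(&self, node: &N) -> H;
    fn accumulate(&self, heap: &mut H, child: &R);
    fn finalize(&self, heap: &H) -> R;
}

// A fold stored as three closures, as the domain Fold types do
// (hypothetical storage; the real domains box with Arc/Rc/Box).
struct ClosureFold<I, A, F>(I, A, F);

impl<N, H, R, I, A, F> FoldOps<N, H, R> for ClosureFold<I, A, F>
where
    I: Fn(&N) -> H,
    A: Fn(&mut H, &R),
    F: Fn(&H) -> R,
{
    fn init(&self, node: &N) -> H { (self.0)(node) }
    fn accumulate(&self, heap: &mut H, child: &R) { (self.1)(heap, child) }
    fn finalize(&self, heap: &H) -> R { (self.2)(heap) }
}

// An engine generic over `&impl FoldOps` monomorphizes per fold type:
// no dispatch beyond the stored closures' own indirection.
fn run_leaf<N, H, R>(fold: &impl FoldOps<N, H, R>, node: &N) -> R {
    let heap = fold.init(node);
    fold.finalize(&heap)
}

fn main() {
    let fold = ClosureFold(|n: &u64| *n, |h: &mut u64, r: &u64| *h += r, |h: &u64| *h);
    assert_eq!(run_leaf(&fold, &7), 7);
}
```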
Policy Traits: Zero-Cost Configuration
Funnel’s three behavioral axes (queue, accumulation, wake) are each
a trait with an associated Spec type. The FunnelPolicy bundle
combines them into one type parameter. This pattern —
Spec → Store/State → Handle, resolved at compile time — is the
general recipe for adding zero-overhead configuration axes to any
executor.
This page describes the pattern generically. For the concrete implementations (Chase-Lev deques, streaming sweep, etc.), see the Funnel section.
Specs as data
Every Spec in hylic is Copy — a small value type that fully
describes configuration. This follows from the
defunctionalization principle: Specs are data,
not behavior. Combining Specs via axis transformations produces new
Specs. Attaching a resource to a Spec produces a Session. Running
a Spec creates the resource internally.
The policy sub-specs (PerWorkerSpec, OnFinalizeSpec, EveryKSpec,
etc.) are all Copy + Default + Send + Sync. Most are ZSTs. The
funnel Spec<P> composes them and is itself Copy (~40 bytes of
usizes and ZSTs).
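The "Specs are data" claim can be sketched directly. The field and sub-spec names below are illustrative stand-ins shaped like the ones above, not the exact hylic definitions:

```rust
// A sub-spec carrying real configuration data.
#[derive(Clone, Copy, Default)]
struct PerWorkerSpec { deque_capacity: usize }

// Zero-sized sub-specs: pure type-level choices.
#[derive(Clone, Copy, Default)]
struct OnFinalizeSpec;
#[derive(Clone, Copy, Default)]
struct EveryPushSpec;

// The composed Spec stays a small Copy value: only the usizes
// contribute to its size, the ZSTs vanish.
#[derive(Clone, Copy)]
struct Spec {
    default_pool_size: usize,
    queue: PerWorkerSpec,
    accumulate: OnFinalizeSpec,
    wake: EveryPushSpec,
}

fn main() {
    // Two usizes of real data; the ZST axes add nothing.
    assert_eq!(std::mem::size_of::<Spec>(), 2 * std::mem::size_of::<usize>());
    let a = Spec {
        default_pool_size: 8,
        queue: PerWorkerSpec::default(),
        accumulate: OnFinalizeSpec,
        wake: EveryPushSpec,
    };
    let b = a; // Copy: `a` stays usable after the move
    assert_eq!(a.default_pool_size, b.default_pool_size);
}
```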
The Spec → Store → Handle pattern
Each axis follows the same three-phase lifecycle, captured by three
associated types:
- Spec — construction-time configuration. Carried in the executor’s Spec<P>. Small, Copy, Default.
- Store — per-fold resources created from the Spec. Owned by the fold’s stack frame. Send+Sync (shared across workers).
- Handle — per-worker view that borrows from the Store. Has the actual push/pop/steal methods.
All three use GATs to carry the task’s generic parameters without boxing.
Concrete example: WorkStealing
#![allow(unused)]
fn main() {
/// A work-stealing strategy. Associates typed Store and Handle via GATs.
pub trait WorkStealing: 'static {
type Spec: Copy + Default + Send + Sync;
type Store<N: Send + 'static, H: 'static, R: Send + 'static>: Send + Sync;
type Handle<'a, N: Send + 'static, H: 'static, R: Send + 'static>: TaskOps<N, H, R>
where Self: 'a;
fn create_store<N: Send + 'static, H: 'static, R: Send + 'static>(
spec: &Self::Spec, n_workers: usize,
) -> Self::Store<N, H, R>;
fn reset_store<N: Send + 'static, H: 'static, R: Send + 'static>(
store: &mut Self::Store<N, H, R>,
);
fn handle<'a, N: Send + 'static, H: 'static, R: Send + 'static>(
store: &'a Self::Store<N, H, R>, worker_idx: usize,
) -> Self::Handle<'a, N, H, R>;
}
}
Two implementations:
| | PerWorker | Shared |
|---|---|---|
| Spec | PerWorkerSpec { deque_capacity } (Copy) | SharedSpec (ZST, Copy) |
| Store | Vec<WorkerDeque> + AtomicU64 bitmask | StealQueue |
| Handle | refs to own deque + all deques + bitmask | ref to queue |
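The same three-phase shape works for any axis. Here is a toy instance (hypothetical Axis/Counting names; the real WorkStealing trait additionally carries N/H/R through GATs), kept minimal to show the Spec → Store → Handle flow:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// A minimal axis trait: Spec configures, Store owns, Handle borrows.
trait Axis: 'static {
    type Spec: Copy + Default;
    type Store: Send + Sync;
    type Handle<'a> where Self: 'a;
    fn create_store(spec: &Self::Spec, n_workers: usize) -> Self::Store;
    fn handle(store: &Self::Store, worker_idx: usize) -> Self::Handle<'_>;
}

struct Counting;

#[derive(Clone, Copy, Default)]
struct CountingSpec;

impl Axis for Counting {
    type Spec = CountingSpec;
    // Store: one slot per worker, shared across all workers.
    type Store = Vec<AtomicU64>;
    // Handle: a per-worker view borrowing its slot from the Store.
    type Handle<'a> = &'a AtomicU64 where Self: 'a;

    fn create_store(_spec: &CountingSpec, n_workers: usize) -> Self::Store {
        (0..n_workers).map(|_| AtomicU64::new(0)).collect()
    }
    fn handle(store: &Self::Store, worker_idx: usize) -> Self::Handle<'_> {
        &store[worker_idx]
    }
}

fn main() {
    let store = Counting::create_store(&CountingSpec, 4); // per-fold
    Counting::handle(&store, 2).fetch_add(5, Ordering::Relaxed); // per-worker
    assert_eq!(store[2].load(Ordering::Relaxed), 5);
}
```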
Bundling: FunnelPolicy
Three independent axes combined into one type parameter:
#![allow(unused)]
fn main() {
/// Bundles queue topology, accumulation strategy, and wake policy.
/// One type parameter on the executor replaces three.
pub trait FunnelPolicy: 'static {
type Queue: WorkStealing;
type Accumulate: AccumulateStrategy;
type Wake: WakeStrategy;
}
}
#![allow(unused)]
fn main() {
/// Generic policy: any combination of axes. Named presets are type aliases over this.
pub struct Policy<
Q: WorkStealing = queue::PerWorker,
A: AccumulateStrategy = accumulate::OnFinalize,
W: WakeStrategy = wake::EveryPush,
>(PhantomData<(Q, A, W)>);
impl<Q: WorkStealing, A: AccumulateStrategy, W: WakeStrategy> FunnelPolicy for Policy<Q, A, W> {
type Queue = Q;
type Accumulate = A;
type Wake = W;
}
}
Policy<Q, A, W> is the generic implementor. Named presets are type
aliases. The funnel Spec<P> carries each axis’s sub-spec:
#![allow(unused)]
fn main() {
pub struct Spec<P: FunnelPolicy = policy::Default> {
/// Pool size for `.run()` and `.session()`. Not consulted when
/// attaching to an explicit pool via `.attach()`.
pub default_pool_size: usize,
pub queue: <P::Queue as WorkStealing>::Spec,
pub accumulate: <P::Accumulate as AccumulateStrategy>::Spec,
pub wake: <P::Wake as WakeStrategy>::Spec,
}
}
Named presets as transformations
Every named preset is a transformation of Spec::default(n).
Default values live in ONE place — the default() constructor.
Presets compose axis builders on top:
// WideLight = default + Shared queue + OnArrival accumulation
fn for_wide_light(n: usize) -> Spec<WideLight> {
Spec::default(n)
.with_queue::<Shared>(SharedSpec)
.with_accumulate::<OnArrival>(OnArrivalSpec)
}
The axis builders (with_queue, with_accumulate, with_wake)
are typestate transformations — they change the Policy type parameter,
producing a new Spec type.
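The typestate move can be sketched in isolation. This is a simplified model (the real with_queue also takes the new axis’s sub-spec as an argument; names here are illustrative):

```rust
use std::marker::PhantomData;

#[derive(Clone, Copy)]
struct PerWorker;
#[derive(Clone, Copy)]
struct Shared;

// Spec carries its policy choice as a type parameter.
#[derive(Clone, Copy)]
struct Spec<Q> { pool_size: usize, _q: PhantomData<Q> }

impl<Q> Spec<Q> {
    // Typestate transformation: the RETURN type has a different
    // parameter, so the policy changes at compile time.
    fn with_queue<Q2>(self) -> Spec<Q2> {
        // Non-axis fields carry over unchanged.
        Spec { pool_size: self.pool_size, _q: PhantomData }
    }
}

fn default_spec(n: usize) -> Spec<PerWorker> {
    Spec { pool_size: n, _q: PhantomData }
}

fn main() {
    // Spec<PerWorker> in, Spec<Shared> out — one expression.
    let wide: Spec<Shared> = default_spec(8).with_queue::<Shared>();
    assert_eq!(wide.pool_size, 8);
}
```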
How monomorphization flows
The type parameter propagates from Spec to every call site:
From Spec<WideLight> to the innermost push/deliver/notify — every
call is resolved at compile time. No vtable, no trait object, no
indirect call.
The const generic optimization
Wake strategies like EveryK<K> use a const generic for the
notification interval. The modulus count % K compiles to a bitmask
when K is a power of 2 — the compiler sees the constant and
optimizes.
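A minimal sketch of that shape (should_wake is an illustrative method name, not necessarily the real one): with K a const generic, the modulus is a compile-time constant, and for power-of-two K it is equivalent to a mask.

```rust
struct EveryK<const K: u64>;

impl<const K: u64> EveryK<K> {
    fn should_wake(count: u64) -> bool {
        // K is a compile-time constant; for power-of-two K the
        // compiler can lower this to `count & (K - 1) == 0`.
        count % K == 0
    }
}

fn main() {
    for count in 0..100u64 {
        // The bitmask form agrees because 4 is a power of two.
        assert_eq!(EveryK::<4>::should_wake(count), (count & 3) == 0);
    }
}
```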
Applying the pattern to new axes
To add a fourth axis (e.g., steal ordering):
- Define a trait: pub trait StealOrder: 'static { type Spec: Copy + Default + Send + Sync; ... }
- Add implementations: struct Fifo;, struct Lifo;
- Add to FunnelPolicy: type Steal: StealOrder;
- Update Policy<Q, A, W, St> and named presets
- Thread through Spec<P> and run_fold
The call chain monomorphizes automatically. No runtime cost for the new axis.
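A sketch of what the first two steps might look like. StealOrder, Fifo, and Lifo are the recipe’s own hypothetical names, not existing hylic items, and the steal method is simplified to operate on a plain VecDeque:

```rust
use std::collections::VecDeque;

// The hypothetical fourth axis: which end of a victim's queue to take.
trait StealOrder: 'static {
    type Spec: Copy + Default;
    fn steal(spec: &Self::Spec, q: &mut VecDeque<u64>) -> Option<u64>;
}

struct Fifo;
struct Lifo;
#[derive(Clone, Copy, Default)]
struct FifoSpec;
#[derive(Clone, Copy, Default)]
struct LifoSpec;

impl StealOrder for Fifo {
    type Spec = FifoSpec;
    fn steal(_: &FifoSpec, q: &mut VecDeque<u64>) -> Option<u64> {
        q.pop_front() // oldest task first
    }
}
impl StealOrder for Lifo {
    type Spec = LifoSpec;
    fn steal(_: &LifoSpec, q: &mut VecDeque<u64>) -> Option<u64> {
        q.pop_back() // newest task first
    }
}

fn main() {
    let mut q: VecDeque<u64> = (1..=3).collect();
    assert_eq!(Fifo::steal(&FifoSpec, &mut q), Some(1));
    assert_eq!(Lifo::steal(&LifoSpec, &mut q), Some(3));
}
```

Once the axis is threaded through FunnelPolicy and Spec<P>, each choice monomorphizes like the existing three.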
Funnel: Parallel Fused Hylomorphism
The funnel executor parallelizes a fused hylomorphism — an unfold (tree traversal) composed with a fold (bottom-up accumulation) where the intermediate tree is never materialized. Children are discovered one at a time through a push-based callback, processed concurrently across worker threads, and their results flow back to the parent through defunctionalized continuations.
What a fused hylomorphism is
A hylomorphism composes an unfold (anamorphism) with a fold (catamorphism). The unfold generates a tree structure from a seed; the fold consumes it bottom-up. When fused, the two interleave: each node is produced, its children recursively processed, and their results accumulated — without materializing the tree.
In hylic terms: a Treeish<N> (the coalgebra) exposes
visit(&node, |child| ...) and a Fold<N, H, R> (the algebra,
factored as init/accumulate/finalize)
provides the per-node bracket. The executor calls visit to
discover children, recursively processes each, accumulates their R
results into the parent’s H heap, and finalizes. The intermediate
tree is never materialized as a data structure.
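The sequential form of this fusion fits in one function. A minimal sketch (an assumed simplification of what Fused does, with the fold passed as three bare closures rather than a Fold value):

```rust
// Fused hylomorphism: `visit` unfolds children via a callback; the
// init/accumulate/finalize bracket folds them. No tree is ever built.
fn fused_hylo<N, H, R>(
    visit: &impl Fn(&N, &mut dyn FnMut(&N)),
    init: &impl Fn(&N) -> H,
    acc: &impl Fn(&mut H, &R),
    fin: &impl Fn(&H) -> R,
    node: &N,
) -> R {
    let mut heap = init(node);
    visit(node, &mut |child| {
        // Recurse as each child is discovered; accumulate immediately.
        let r = fused_hylo(visit, init, acc, fin, child);
        acc(&mut heap, &r);
    });
    fin(&heap)
}

fn main() {
    // Unfold: each n >= 2 yields children n-1 and n-2 (the naive
    // Fibonacci call tree, generated from the seed).
    let visit = |n: &u64, f: &mut dyn FnMut(&u64)| {
        if *n >= 2 { f(&(n - 1)); f(&(n - 2)); }
    };
    // Fold: count nodes of the (never materialized) call tree.
    let nodes = fused_hylo(&visit, &|_| 1u64, &|h, r| *h += r, &|h| *h, &10);
    assert_eq!(nodes, 177);
}
```

Funnel keeps exactly this structure but replaces the inner recursive call with a task push plus a continuation carrying the parent’s accumulator.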
The funnel parallelizes this: children beyond the first are pushed to a work-stealing queue. Worker threads steal and process subtrees concurrently. Results flow back through continuations to the parent’s accumulator. The challenge is coordinating the fold — detecting when all children are done, accumulating their results, and cascading upward — without locks, without allocation on the critical path.
Design values
Four properties define the funnel’s design:
- Iterator-based traversal. The graph exposes visit(&node, |child| ...). Children arrive one at a time. There is no children(&node) -> Vec<N>.
- Fully parallel. Each child beyond the first is pushed to a work-stealing queue. Workers steal and process subtrees concurrently. The first child is walked inline — zero queue overhead on the DFS spine.
- Fused unfold+fold. Results accumulate into the parent as they arrive (streaming) or in bulk by the last thread (finalize). The tree is never materialized.
- Zero allocation on the hot path. Tasks are enum variants stored inline in deque slots. Multi-child accumulators are arena-allocated. Single-child nodes carry their heap inside the continuation. No Box<dyn FnOnce>, no Arc per task.
Where funnel sits
| Executor | Parallelism | Unfold/fold fusion | Task repr | Allocation |
|---|---|---|---|---|
| Fused | none | fully fused | stack frames | zero |
| Funnel | CPS + work-stealing | fully fused | FunnelTask enum | arenas |
Fused is the sequential baseline — zero overhead, callback-based recursion on a single thread. Funnel preserves the fused property while adding parallelism through CPS (continuation-passing style) and work-stealing queues. The fold/graph are unchanged between the two — only the executor differs.
Both use the same Exec<D, S>
type-level pattern. Funnel’s policy system is an instance of the
generic Spec → Store → Handle
pattern for zero-cost executor configuration.
Module map
The funnel’s code is organized into four clusters.
Three behavioral axes
The funnel is parameterized along three independent axes, all
resolved at compile time through the FunnelPolicy trait:
#![allow(unused)]
fn main() {
/// Bundles queue topology, accumulation strategy, and wake policy.
/// One type parameter on the executor replaces three.
pub trait FunnelPolicy: 'static {
type Queue: WorkStealing;
type Accumulate: AccumulateStrategy;
type Wake: WakeStrategy;
}
}
Each axis is a trait with its own Spec, Store/State, and
implementations. The Policy<Q, A, W> struct bundles any combination:
#![allow(unused)]
fn main() {
/// Generic policy: any combination of axes. Named presets are type aliases over this.
pub struct Policy<
Q: WorkStealing = queue::PerWorker,
A: AccumulateStrategy = accumulate::OnFinalize,
W: WakeStrategy = wake::EveryPush,
>(PhantomData<(Q, A, W)>);
impl<Q: WorkStealing, A: AccumulateStrategy, W: WakeStrategy> FunnelPolicy for Policy<Q, A, W> {
type Queue = Q;
type Accumulate = A;
type Wake = W;
}
}
Named presets are type aliases:
#![allow(unused)]
fn main() {
// ── Named presets (type aliases) ─────────────────
//
// Each is a concrete instantiation of Policy<Q, A, W>.
// `Default` aliases the robust all-rounder; other names describe
// the workload shape they're tuned for.
/// PerWorker + OnFinalize + EveryPush. The robust all-rounder.
pub type Robust = Policy;
/// The default policy. Alias for Robust.
pub type Default = Robust;
/// Same axes as Default. Distinguished by Spec configuration (larger arenas).
pub type GraphHeavy = Robust;
/// Shared + OnArrival + EveryPush. Wide trees (bf=20+).
pub type WideLight = Policy<queue::Shared, accumulate::OnArrival>;
/// PerWorker + OnFinalize + OncePerBatch. Overhead-sensitive (noop-like).
pub type LowOverhead = Policy<queue::PerWorker, accumulate::OnFinalize, wake::OncePerBatch>;
/// PerWorker + OnArrival + EveryPush. Streaming sweep with per-worker deques.
pub type PerWorkerArrival = Policy<queue::PerWorker, accumulate::OnArrival>;
/// Shared + OnFinalize + EveryPush.
pub type SharedDefault = Policy<queue::Shared>;
/// PerWorker + OnFinalize + EveryK<4>. Balanced wakeups for heavy workloads.
pub type HighThroughput = Policy<queue::PerWorker, accumulate::OnFinalize, wake::EveryK<4>>;
/// Shared + OnArrival + OncePerBatch.
pub type StreamingWide = Policy<queue::Shared, accumulate::OnArrival, wake::OncePerBatch>;
/// PerWorker + OnFinalize + EveryK<2>. For deep narrow trees (bf=2).
pub type DeepNarrow = Policy<queue::PerWorker, accumulate::OnFinalize, wake::EveryK<2>>;
}
[Diagram: FunnelPolicy (one type parameter) fans out to its three axes — Queue (WorkStealing: PerWorker = Chase-Lev + bitmask, Shared = StealQueue), Accumulate (AccumulateStrategy: OnArrival = streaming sweep, OnFinalize = bulk sweep), and Wake (WakeStrategy: EveryPush, OncePerBatch, EveryK<K>).]
See [Policies](policies.md) for the full decision guide and
benchmark-informed recommendations.
Reading order
|Page|What you learn|
|----|--------------|
|[CPS walk](cps_walk.md)|The downward pass: how nodes are processed and tasks created|
|[Continuations](continuations.md)|`FunnelTask`, `Cont`, `ChainNode`, `RootCell` — the CPS data types|
|[Cascade](cascade.md)|`fire_cont`: the trampolined upward pass|
|[Ticket system](ticket_system.md)|Packed `AtomicU64` for exactly-one-finalizer detection|
|[Pool and dispatch](pool_dispatch.md)|Thread pool, `Job` struct, the `dispatch()` CPS lifecycle|
|[Queue strategies](queue_strategies.md)|PerWorker (Chase-Lev + bitmask) vs Shared (StealQueue)|
|[Accumulation](accumulation.md)|OnArrival (streaming sweep) vs OnFinalize (bulk)|
|[Policies](policies.md)|`FunnelPolicy` GAT, three axes, named presets, decision guide|
|[Infrastructure](infrastructure.md)|Arena, ContArena, WorkerDeque, EventCount|
|[Testing](testing.md)|Correctness, stress, interleaving proof|
Policies: Configuration and Presets
The funnel’s behavior is fully determined by three compile-time axes
bundled into FunnelPolicy:
#![allow(unused)]
fn main() {
/// Bundles queue topology, accumulation strategy, and wake policy.
/// One type parameter on the executor replaces three.
pub trait FunnelPolicy: 'static {
type Queue: WorkStealing;
type Accumulate: AccumulateStrategy;
type Wake: WakeStrategy;
}
}
The Spec
#![allow(unused)]
fn main() {
pub struct Spec<P: FunnelPolicy = policy::Default> {
/// Pool size for `.run()` and `.session()`. Not consulted when
/// attaching to an explicit pool via `.attach()`.
pub default_pool_size: usize,
pub queue: <P::Queue as WorkStealing>::Spec,
pub accumulate: <P::Accumulate as AccumulateStrategy>::Spec,
pub wake: <P::Wake as WakeStrategy>::Spec,
}
}
Each axis contributes its Spec type. default_pool_size sets the
thread count for one-shot execution. Arenas grow lazily via
segmented allocation — no capacity configuration.
Named presets
#![allow(unused)]
fn main() {
// ── Named presets (type aliases) ─────────────────
//
// Each is a concrete instantiation of Policy<Q, A, W>.
// `Default` aliases the robust all-rounder; other names describe
// the workload shape they're tuned for.
/// PerWorker + OnFinalize + EveryPush. The robust all-rounder.
pub type Robust = Policy;
/// The default policy. Alias for Robust.
pub type Default = Robust;
/// Same axes as Default. Distinguished by Spec configuration (larger arenas).
pub type GraphHeavy = Robust;
/// Shared + OnArrival + EveryPush. Wide trees (bf=20+).
pub type WideLight = Policy<queue::Shared, accumulate::OnArrival>;
/// PerWorker + OnFinalize + OncePerBatch. Overhead-sensitive (noop-like).
pub type LowOverhead = Policy<queue::PerWorker, accumulate::OnFinalize, wake::OncePerBatch>;
/// PerWorker + OnArrival + EveryPush. Streaming sweep with per-worker deques.
pub type PerWorkerArrival = Policy<queue::PerWorker, accumulate::OnArrival>;
/// Shared + OnFinalize + EveryPush.
pub type SharedDefault = Policy<queue::Shared>;
/// PerWorker + OnFinalize + EveryK<4>. Balanced wakeups for heavy workloads.
pub type HighThroughput = Policy<queue::PerWorker, accumulate::OnFinalize, wake::EveryK<4>>;
/// Shared + OnArrival + OncePerBatch.
pub type StreamingWide = Policy<queue::Shared, accumulate::OnArrival, wake::OncePerBatch>;
/// PerWorker + OnFinalize + EveryK<2>. For deep narrow trees (bf=2).
pub type DeepNarrow = Policy<queue::PerWorker, accumulate::OnFinalize, wake::EveryK<2>>;
}
Ten names map to eight distinct monomorphizations:
| Preset | Queue | Accumulate | Wake | Use case |
|---|---|---|---|---|
| Default / Robust | PerWorker | OnFinalize | EveryPush | All-rounder |
| GraphHeavy | PerWorker | OnFinalize | EveryPush | Large trees (alias for Robust) |
| WideLight | Shared | OnArrival | EveryPush | bf > 10 |
| LowOverhead | PerWorker | OnFinalize | OncePerBatch | Noop-sensitive |
| PerWorkerArrival | PerWorker | OnArrival | EveryPush | Streaming + deques |
| SharedDefault | Shared | OnFinalize | EveryPush | Shared baseline |
| HighThroughput | PerWorker | OnFinalize | EveryK<4> | Heavy balanced |
| StreamingWide | Shared | OnArrival | OncePerBatch | Known +11% fold-hv |
| DeepNarrow | PerWorker | OnFinalize | EveryK<2> | bf=2 chains |
Decision guide
Start from the tree shape, then refine by work distribution:
When unsure, use Spec::default(n) — the Robust preset has zero
regressions on any benchmarked workload.
The three usage tiers
#![allow(unused)]
fn main() {
use hylic::prelude::*;
// One-shot: creates pool, runs, joins
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
// Session scope: pool lives for the closure, multiple folds share it
exec(funnel::Spec::default(8)).session(|s| {
s.run(&fold1, &graph1, &root1);
s.run(&fold2, &graph2, &root2);
});
// Explicit attach: manual pool management
funnel::Pool::with(8, |pool| {
exec(funnel::Spec::default(8)).attach(pool).run(&fold, &graph, &root);
});
}
See The Exec pattern for the type-level design behind these tiers.
Wake strategies
#![allow(unused)]
fn main() {
/// Wake strategy: when to notify idle workers of pushed tasks.
///
/// `State` is per-worker mutable state (embedded in WorkerCtx as
/// `Cell<State>`). Created once via `init_state`, reset per visit batch.
pub trait WakeStrategy: 'static {
type Spec: Copy + Default + Send + Sync;
type State: Copy;
fn init_state(spec: &Self::Spec) -> Self::State;
/// Called after each successful push.
/// Returns true if the caller should wake an idle worker.
fn should_notify(state: &mut Self::State) -> bool;
/// Called before each graph.visit batch.
fn reset(state: &mut Self::State);
}
}
| Strategy | Behavior | Per-worker state |
|---|---|---|
| EveryPush | Notify on every push | () (none) |
| OncePerBatch | First push per graph.visit only | bool |
| EveryK<K> | Every K-th push (K is const generic) | u32 counter |
EveryK<K> uses a const generic — the modulus compiles to a bitmask
when K is a power of 2.
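The const-generic pattern can be illustrated stand-alone — this `EveryK` is a local sketch, not hylic's type, and the exact firing schedule is an assumption:

```rust
// Illustrative EveryK-style counter. With K a power of two, the
// compiler lowers `% K` to `& (K - 1)` — no division at runtime.
struct EveryK<const K: u32> { count: u32 }

impl<const K: u32> EveryK<K> {
    fn new() -> Self { Self { count: 0 } }

    // Called after each push; true on every K-th push.
    fn should_notify(&mut self) -> bool {
        self.count += 1;
        self.count % K == 0
    }

    // Called before each visit batch.
    fn reset(&mut self) { self.count = 0; }
}

fn main() {
    let mut w = EveryK::<4>::new();
    let fired: Vec<bool> = (0..8).map(|_| w.should_notify()).collect();
    // Fires on the 4th and 8th push.
    assert_eq!(fired, vec![false, false, false, true, false, false, false, true]);
    w.reset();
    assert!(!w.should_notify());
    println!("ok");
}
```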
Zero-cost monomorphization
The entire call chain is generic over P: FunnelPolicy. The compiler
generates separate code per policy — WorkerCtx, worker_loop,
walk_cps, fire_cont, push_task are all monomorphized. No vtable,
no trait object, no indirect call. Each push and try_acquire is a
direct, inlinable function call.
CPS Walk: The Downward Pass
walk_cps is the core of the funnel executor. It processes one node
at a time: initializes the fold heap, iterates children through the
graph’s push-based visitor, and branches on the child count. It is a
void function — results flow through continuations, not return
values. This is what makes cross-thread result delivery possible
without blocking.
The algorithm
#![allow(unused)]
fn main() {
pub(crate) fn walk_cps<N, H, R, F, G, P: FunnelPolicy>(
wctx: &WorkerCtx<N, H, R, F, G, P>,
mut node: N,
mut cont: Cont<H, R>,
) where
F: FoldOps<N, H, R> + 'static,
G: TreeOps<N> + 'static,
N: Clone + Send + 'static,
H: 'static,
R: Send + 'static,
{
let ctx = wctx.ctx;
loop {
let fold = ctx.fold_ref();
let graph = ctx.graph_ref();
let chain_arena = ctx.chain_arena();
let cont_arena = ctx.cont_arena();
let heap = fold.init(&node);
let mut child_count = 0u32;
let mut first_child: Option<N> = None;
let mut chain_idx: Option<super::super::infra::arena::ArenaIdx> = None;
let mut heap_opt = Some(heap);
let mut cont_opt = Some(cont);
wctx.reset_wake();
graph.visit(&node, &mut |child: &N| {
child_count += 1;
if child_count == 1 {
first_child = Some(child.clone());
} else {
if child_count == 2 {
let cn = ChainNode::new(heap_opt.take().unwrap(), cont_opt.take().unwrap());
let idx = chain_arena.alloc(cn);
// SAFETY: idx was just returned by chain_arena.alloc
// (or a prior iteration within this visit closure) and
// the arena lives for the pool duration.
let node_ref = unsafe { chain_arena.get(idx) };
node_ref.chain.append_slot();
chain_idx = Some(idx);
}
let idx = chain_idx.unwrap();
// SAFETY: idx was just returned by chain_arena.alloc
// (or a prior iteration within this visit closure) and
// the arena lives for the pool duration.
let node_ref = unsafe { chain_arena.get(idx) };
let slot = node_ref.chain.append_slot();
wctx.push_task(FunnelTask::Walk {
child: child.clone(),
cont: Cont::Slot { node: idx, slot },
});
}
});
match child_count {
0 => {
let heap = heap_opt.take().unwrap();
let cont = cont_opt.take().unwrap();
let result = fold.finalize(&heap);
fire_cont::<N, H, R, F, G, P>(ctx, cont, result);
return;
}
1 => {
let child = first_child.unwrap();
let heap = heap_opt.take().unwrap();
let parent_cont = cont_opt.take().unwrap();
let parent_idx = cont_arena.alloc(parent_cont);
node = child;
cont = Cont::Direct { heap, parent_idx };
}
_ => {
let idx = chain_idx.unwrap();
// SAFETY: idx came from chain_arena.alloc above.
let cn = unsafe { chain_arena.get(idx) };
let fold = ctx.fold_ref();
let set_total_result = P::Accumulate::set_total(&cn.chain, fold);
if let Some(finalized) = set_total_result {
let parent = cn.take_parent_cont();
fire_cont::<N, H, R, F, G, P>(ctx, parent, finalized);
return;
}
let child = first_child.unwrap();
node = child;
cont = Cont::Slot { node: idx, slot: SlotRef(0) };
}
}
}
}
}
The function takes (wctx, node, cont):
- wctx: per-worker context (queue handle + wake state)
- node: the graph node to process
- cont: what to do with this node’s result
It loops (trampolined for the inline child case), processing one node per iteration.
Child-count branching
After graph.visit returns, the child count determines the control flow:
Leaf (0 children): Finalize the heap and call
fire_cont with the original continuation. This is
the base case — the upward cascade begins here.
Single child (1): No ChainNode needed. The heap moves into a
Cont::Direct, the parent continuation is stored in the ContArena,
and the loop continues with the child. Zero queue interaction, zero
atomic operations.
Multi-child (2+): A ChainNode is allocated in the arena
(lazily, on child 2 — not child 1). Children 1..K are pushed as
FunnelTask::Walk to the queue. Then set_total records the child
count in the ticket system. The loop continues
with child 0 (inline walk).
First-child inlining
Child 0 is ALWAYS walked inline — a continuation of the current thread’s DFS spine, with zero queue overhead. Siblings are pushed to the queue for workers to steal. This gives every active thread a guaranteed DFS path from its entry point to a leaf:
Red edges = inline walks (zero queue cost). Dashed = queue submissions. Thread 0 walks root → c0 → c00 → … → leaf without touching the queue at any level. This is structurally equivalent to Cilk’s continuation-stealing, inverted: we push sibling tasks (child stealing) instead of stealing the parent’s continuation.
Three compounding effects make this critical:
- Zero-queue spine. For depth D, one thread processes D nodes with no push/pop overhead (~20-50ns saved per level).
- Cache warmth. ChainNodes allocated on the way down are in L1 cache on the way up via fire_cont.
- Reduced contention. One fewer task per level competing for deque access.
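A sequential model makes the invariant visible — local types, not hylic's executor; the queue stands in for the work-stealing deques:

```rust
use std::collections::VecDeque;

struct Node { value: u64, children: Vec<Node> }

// The current "thread" always descends into child 0 inline; siblings
// 1..K go to the queue. One DFS spine therefore pays zero queue cost.
fn walk(root: &Node) -> (u64, usize) {
    let mut queue: VecDeque<&Node> = VecDeque::new();
    let mut sum = 0u64;
    let mut pushes = 0usize;
    let mut current = Some(root);
    while let Some(mut node) = current.take() {
        loop {
            sum += node.value;
            for sibling in node.children.iter().skip(1) {
                queue.push_back(sibling); // sibling submission
                pushes += 1;
            }
            match node.children.first() {
                Some(first) => node = first, // inline walk: no queue touch
                None => break,
            }
        }
        current = queue.pop_front(); // "steal" the next pushed task
    }
    (sum, pushes)
}

fn main() {
    // bf=1 chain: the entire walk is the inline spine — zero pushes.
    let chain = Node { value: 1, children: vec![
        Node { value: 2, children: vec![
            Node { value: 3, children: vec![] },
        ] },
    ] };
    assert_eq!(walk(&chain), (6, 0));
    // Root with three leaf children: child 0 inline, two siblings pushed.
    let wide = Node {
        value: 1,
        children: (2..=4).map(|v| Node { value: v, children: vec![] }).collect(),
    };
    assert_eq!(walk(&wide), (10, 2));
    println!("ok");
}
```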
Defunctionalization
Tasks are data, not closures:
#![allow(unused)]
fn main() {
pub enum FunnelTask<N, H, R> {
Walk { child: N, cont: Cont<H, R> },
}
}
FunnelTask::Walk pairs a child node with its continuation — plain
data stored inline in deque slots. No Box<dyn FnOnce>, no closure
capture, no vtable. The execute_task function is the apply:
#![allow(unused)]
fn main() {
pub(crate) fn execute_task<N, H, R, F, G, P: FunnelPolicy>(
wctx: &WorkerCtx<N, H, R, F, G, P>,
task: FunnelTask<N, H, R>,
) where
F: FoldOps<N, H, R> + 'static,
G: TreeOps<N> + 'static,
N: Clone + Send + 'static,
H: 'static,
R: Send + 'static,
{
match task {
FunnelTask::Walk { child, cont } => walk_cps(wctx, child, cont),
}
}
}
This is the Reynolds/Danvy defunctionalization transformation applied to parallel work items.
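The transformation in miniature — a stand-alone sketch, not hylic's types: each pending action is an enum variant instead of a boxed closure, and one `apply` function is the single dispatch point.

```rust
// Minimal Reynolds-style defunctionalization. The closure-based
// equivalent would be Vec<Box<dyn Fn(i64) -> i64>> — one heap
// allocation and one vtable per task. The enum is plain inline data.
enum Task {
    Add(i64),
    Mul(i64),
}

// The "apply" function: the only place behavior is attached to data.
fn apply(task: Task, acc: i64) -> i64 {
    match task {
        Task::Add(n) => acc + n,
        Task::Mul(n) => acc * n,
    }
}

fn main() {
    let tasks = vec![Task::Add(2), Task::Mul(10), Task::Add(1)];
    let result = tasks.into_iter().fold(1, |acc, t| apply(t, acc));
    assert_eq!(result, 31); // ((1 + 2) * 10) + 1
    println!("{result}");
}
```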
Streaming submission
Children are pushed to the queue during graph.visit, not after.
Workers can steal siblings while the parent is still discovering
more children. append_slot is called per child inside the callback;
set_total is called after graph.visit returns. Between these two
events, workers may deliver results to already-appended slots. The
ticket system handles this race.
Task submission and wake
#![allow(unused)]
fn main() {
pub(crate) fn push_task(&self, task: FunnelTask<N, H, R>) {
if let Some(overflow) = self.handle.push(task) {
execute_task(self, overflow);
return;
}
let mut state = self.wake_state.get();
if P::Wake::should_notify(&mut state) {
self.view().notify_idle();
}
self.wake_state.set(state);
}
}
push goes through the policy’s queue handle. If the queue is
full, the task is executed inline (Cilk overflow protocol). Otherwise,
the wake strategy decides whether to notify a parked worker.
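The overflow half of that contract can be modeled with a bounded queue whose `push` hands the task back instead of growing or blocking — local types for illustration, not hylic's queue handle:

```rust
use std::collections::VecDeque;

struct BoundedQueue<T> { buf: VecDeque<T>, cap: usize }

impl<T> BoundedQueue<T> {
    // Returns Some(task) when full; the caller must run it itself.
    fn push(&mut self, task: T) -> Option<T> {
        if self.buf.len() == self.cap {
            Some(task)
        } else {
            self.buf.push_back(task);
            None
        }
    }
}

fn main() {
    let mut q = BoundedQueue { buf: VecDeque::new(), cap: 2 };
    let mut ran_inline = Vec::new();
    for task in 1..=4 {
        if let Some(overflow) = q.push(task) {
            ran_inline.push(overflow); // Cilk-style: execute inline
        }
    }
    assert_eq!(q.buf.iter().copied().collect::<Vec<_>>(), vec![1, 2]);
    assert_eq!(ran_inline, vec![3, 4]);
    println!("ok");
}
```

The key property: no task is ever dropped and no push ever blocks — overflow degrades gracefully into sequential execution.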
Worked example
A sum fold over tree R(A(D,E), B, C) where D, E, B, C are leaves.
Thread 0 is the caller; threads 1-2 are workers.
- Thread 0 walks the left spine (R→A→D) inline
- Thread 1 steals B, then E — becomes finalizer for A, cascades A’s result to R
- Thread 2 steals C — becomes finalizer for R, fires Cont::Root
- The fold completes when any thread fires Cont::Root
Cross-references
- Continuations — Cont, FunnelTask, ChainNode
- Cascade — fire_cont: the trampolined upward pass
- Ticket system — how set_total determines the finalizer
- Queue strategies — how push_task dispatches to PerWorker or Shared
Continuations: CPS Data Types
Three types carry the fold’s state through the CPS pipeline:
FunnelTask (the parallelism boundary), Cont (the continuation),
and ChainNode (the multi-child accumulator). A fourth, RootCell,
is the terminal sink for the final result. Together they replace
implicit stack frames with explicit data that can be created on one
thread and consumed on another.
FunnelTask
#![allow(unused)]
fn main() {
pub enum FunnelTask<N, H, R> {
Walk { child: N, cont: Cont<H, R> },
}
}
The unit of parallelism. Stored inline in deque slots (PerWorker)
or queue segments (Shared). No heap allocation per task — the enum
variant IS the data. N must be Clone + Send (cloned during
graph.visit, sent across threads). R must be Send (results are
moved across threads via destructive slot reads). H has no bounds
— it travels inside Cont::Direct.
Cont
#![allow(unused)]
fn main() {
pub enum Cont<H, R> {
/// Raw pointer to stack-local RootCell in run_fold.
/// SAFETY: The scoped pool guarantees all workers complete before
/// run_fold returns — the RootCell outlives every Cont::Root.
Root(*const RootCell<R>),
Direct { heap: H, parent_idx: ContIdx },
Slot { node: ArenaIdx, slot: SlotRef },
}
}
The defunctionalized continuation. Tells fire_cont
what to do with a result:
Cont::Root
Terminal. Created once per fold. When fire_cont receives it, the
fold is complete: the result is written to the RootCell and
fold_done is signaled. Size: 8 bytes (one raw pointer).
The RootCell lives on run_fold’s stack — no heap allocation.
The raw pointer is safe because the scoped pool guarantees all
workers complete before run_fold returns.
Cont::Direct
Single-child fast path. The heap value travels WITH the continuation
— no ChainNode, no FoldChain, no atomics. parent_idx is a
ContIdx(u32) into the ContArena. When fire_cont receives it:
accumulate the result into the heap, finalize, take the parent
continuation from the arena, continue the loop.
Size: sizeof(H) + 4 bytes.
Cont::Slot
Multi-child delivery. Two u32 indices: node (arena index to the
ChainNode) and slot (which position in the FoldChain). When
fire_cont receives it: deliver the result to the slot, check the
ticket. If this was the last event, sweep/finalize
the chain and take the parent continuation. If not, return — another
thread will finalize.
Size: 8 bytes (two u32, regardless of H or R).
ChainNode
#![allow(unused)]
fn main() {
pub(crate) struct ChainNode<H, R> {
pub(crate) chain: FoldChain<H, R>,
parent_cont: UnsafeCell<Option<Cont<H, R>>>,
}
}
Arena-allocated. Created lazily on child 2 (never for single-child nodes). Contains:
- chain: the FoldChain — slot cells, heap, ticket state
- parent_cont: the continuation of the creating node, moved out exactly once by the finalizing thread via take_parent_cont()
Continuation graph
For a tree with root R, child A (2 children: C, D), and child B (1 child: E, leaf):
Leaf C finalizes, delivers to Slot{R,0}. Leaf E finalizes, fires Direct for B (accumulates + finalizes), delivers to Slot{R,1}. Whichever delivery is last (ticket) sweeps ChainNode(R) and fires Root.
Data ownership
Each CPS type lives in a specific memory region:
Deque stores tasks inline. Arena indices are u32 (Copy, no
refcount). The CPS pipeline has zero heap allocations on the
critical path — RootCell is stack-local, arenas grow lazily via
segmented allocation, and tasks are stored
inline in deque slots.
Size summary
| Type | Size | Notes |
|---|---|---|
| Cont::Root | 8 bytes | raw pointer to stack-local RootCell |
| Cont::Direct | sizeof(H) + 4 | heap value + ContIdx(u32) |
| Cont::Slot | 8 bytes | ArenaIdx(u32) + SlotRef(u32) |
| FunnelTask::Walk | sizeof(N) + sizeof(Cont) + tag | stored inline in deque |
| ChainNode | sizeof(FoldChain) + sizeof(Option<Cont>) | arena-allocated |
| RootCell | sizeof(Option<R>) + 1 | stack-local in run_fold |
Cascade: The Trampolined Upward Pass
When a child completes, fire_cont delivers its result and cascades
upward through the continuation chain. It is a loop, not
recursion — zero stack growth. One thread can cascade from leaf to
root without touching the queue.
This is the dual of walk_cps: walk descends,
fire_cont ascends. Together they form a single DFS round-trip —
down and back up — on one thread for the inline spine, handing off
across threads at multi-child boundaries.
The function
#![allow(unused)]
fn main() {
pub(crate) fn fire_cont<N, H, R, F, G, P: FunnelPolicy>(
ctx: &WalkCtx<'_, F, G, H, R, P>,
mut cont: Cont<H, R>,
mut result: R,
) where
F: FoldOps<N, H, R> + 'static,
G: TreeOps<N> + 'static,
N: Clone + Send + 'static,
H: 'static,
R: Send + 'static,
{
loop {
match cont {
Cont::Root(cell_ptr) => {
// SAFETY: cell_ptr points to stack-local RootCell in run_fold.
// The scoped pool guarantees this thread finishes before run_fold returns.
let cell = unsafe { &*cell_ptr };
cell.set(result);
let view = ctx.view_ref();
view.fold_done.store(true, Ordering::Release);
view.event().notify_all();
return;
}
Cont::Direct { mut heap, parent_idx } => {
let fold = ctx.fold_ref();
fold.accumulate(&mut heap, &result);
result = fold.finalize(&heap);
// SAFETY: parent_idx came from cont_arena.alloc in
// walk_cps; Direct conts are consumed exactly once
// (the child's fire_cont path), so this take is the
// sole reader.
cont = unsafe { ctx.cont_arena().take(parent_idx) };
}
Cont::Slot { node: node_idx, slot } => {
let arena = ctx.chain_arena();
// SAFETY: node_idx came from chain_arena.alloc in
// walk_cps and outlives every Cont::Slot that holds it
// (arena is freed by run_fold after all workers join).
let node = unsafe { arena.get(node_idx) };
let fold = ctx.fold_ref();
let delivered = P::Accumulate::deliver(&node.chain, slot, result, fold);
match delivered {
Some(finalized) => {
cont = node.take_parent_cont();
result = finalized;
}
None => return,
}
}
}
}
}
}
Three continuation variants, three behaviors, one loop.
Per-variant behavior
Cont::Root — terminal
The fold is complete. Write the result to the RootCell, set
fold_done, notify all parked workers. Cost: ~5ns. One cell write,
one atomic store, one futex wake.
Cont::Direct — single-child fast path
Accumulate the child result into the heap. Finalize. Take the parent
continuation from the ContArena. Continue the loop. No atomics, no
synchronization — pure sequential speed. The heap was moved INTO the
continuation by walk_cps. The entire single-child spine collapses
into loop iterations at ~2ns overhead + user work each.
Cont::Slot — multi-child delivery
Deliver the result to the FoldChain slot via
P::Accumulate::deliver(). The ticket system
determines if this thread is the last to arrive. If yes: sweep or
finalize the chain, take the parent continuation, continue cascading.
If no: return — this thread’s cascade is done.
This is where parallelism meets sequentiality. Multiple threads race to deliver. Exactly one wins. The winner cascades; the losers return to the help loop to steal more work.
The cascade as a round-trip
The walk and cascade form a symmetric pair — down through task creation, up through result delivery:
A leaf fires upward. Direct levels collapse at sequential speed. The Slot level requires atomic delivery and a ticket check. Root stores the result and signals completion.
Parallel interleaving
The cascade runs WITHOUT touching the queue. One thread goes up while other threads simultaneously go down:
Thread 0’s delivery to R and Thread 1’s delivery to R are concurrent
atomic operations on R’s FoldChain. The ticket
determines which thread cascades past R.
Cache warmth
The same thread that walks DOWN the DFS spine walks back UP via
fire_cont. ChainNodes allocated on the way down are in L1 cache
on the way up — no cross-core transfer. This is a structural
consequence of first-child inlining:
the allocating thread is the reading thread.
Compile-time accumulation dispatch
In the Cont::Slot arm, the accumulation strategy is resolved at
compile time:
let delivered = P::Accumulate::deliver(&node.chain, slot, result, fold);
P::Accumulate is an associated type on FunnelPolicy, resolved
via monomorphization. No runtime branch — the compiler inlines
deliver_and_sweep (OnArrival) or deliver_and_finalize (OnFinalize)
directly. See Accumulation for the two strategies.
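The mechanism reduces to ordinary associated-type dispatch — a stand-alone sketch with local names, not hylic's traits:

```rust
// OnArrival folds each result as it lands; OnFinalize buffers and
// folds in one bulk pass. The policy selects one at compile time.
trait Accumulate {
    type State: Default;
    fn deliver(state: &mut Self::State, x: i64);
    fn finish(state: Self::State) -> i64;
}

struct OnArrival;
impl Accumulate for OnArrival {
    type State = i64;
    fn deliver(state: &mut i64, x: i64) { *state += x; }
    fn finish(state: i64) -> i64 { state }
}

struct OnFinalize;
impl Accumulate for OnFinalize {
    type State = Vec<i64>;
    fn deliver(state: &mut Vec<i64>, x: i64) { state.push(x); }
    fn finish(state: Vec<i64>) -> i64 { state.into_iter().sum() }
}

trait Policy { type Accumulate: Accumulate; }
struct Streaming;
impl Policy for Streaming { type Accumulate = OnArrival; }
struct Bulk;
impl Policy for Bulk { type Accumulate = OnFinalize; }

// Monomorphized per P: deliver/finish are direct, inlinable calls —
// no dyn Trait, no vtable, no runtime branch on strategy.
fn run<P: Policy>(values: &[i64]) -> i64 {
    let mut state: <P::Accumulate as Accumulate>::State = Default::default();
    for &v in values {
        <P::Accumulate as Accumulate>::deliver(&mut state, v);
    }
    <P::Accumulate as Accumulate>::finish(state)
}

fn main() {
    assert_eq!(run::<Streaming>(&[1, 2, 3]), 6);
    assert_eq!(run::<Bulk>(&[1, 2, 3]), 6);
    println!("ok");
}
```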
Ticket System: Packed AtomicU64
Each multi-child node has K+1 concurrent events: K child deliveries
and 1 set_total. A single AtomicU64 determines which event is
last — the finalizer. No separate counters, no missed completions.
The problem
graph.visit produces children through a push callback. Workers may
complete and deliver results before the iterator finishes. Two things
happen concurrently:
- Child deliveries: each result written to a slot. Multiple threads, unpredictable order.
- set_total: after graph.visit returns, the child count is recorded.
Exactly one thread must detect that ALL events have occurred and perform the finalization.
Why two separate variables fail (Dekker race)
With delivered: AtomicU32 and total_known: AtomicBool separately:
- Thread A (delivering child K): delivered.fetch_add(1), then checks total_known.load() → sees false (stale)
- Thread B (set_total): total_known.store(true), then checks delivered.load() → sees K-1 (stale)
Neither sees the complete state. Both exit. The fold hangs.
The solution: packed state
#![allow(unused)]
fn main() {
pub struct FoldChain<H, R> {
heap: UnsafeCell<H>,
first: SlotBuf<R>,
appended: AtomicU32,
state: AtomicU64, // low32: events_done, high32: total (0=unknown)
sweep: AtomicU32, // bit31: sweeping gate, bits 0-30: cursor position
done: AtomicBool, // finalized
}
}
state: AtomicU64 packs both counters into one word:
| Bits 63–32 | Bits 31–0 |
|---|---|
| total (0 = unknown) | events_done |
#![allow(unused)]
fn main() {
fn pack_total(total: u32) -> u64 { (total as u64) << 32 }
fn unpack(state: u64) -> (u32, u32) { (state as u32, (state >> 32) as u32) }
}
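A runnable walk-through of the state word as events land, using the two helpers above — two deliveries, then set_total(3), then the last delivery:

```rust
fn pack_total(total: u32) -> u64 { (total as u64) << 32 }
fn unpack(state: u64) -> (u32, u32) { (state as u32, (state >> 32) as u32) }

fn main() {
    let mut state = 0u64;
    state += 1; // delivery 1
    state += 1; // delivery 2
    assert_eq!(unpack(state), (2, 0)); // total unknown: nobody can finalize yet
    state += pack_total(3); // set_total(K = 3)
    assert_eq!(unpack(state), (2, 3)); // total known, one delivery outstanding
    state += 1; // delivery 3 — its prev snapshot satisfies done + 1 >= total
    assert_eq!(unpack(state), (3, 3));
    println!("ok");
}
```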
How it works
Each event does a single fetch_add on state:
#![allow(unused)]
fn main() {
pub fn deliver_and_sweep<N>(&self, slot: SlotRef, result: R, fold: &impl FoldOps<N, H, R>) -> Option<R> {
let cell = self.slot_at(slot.0);
// SAFETY: each slot is delivered exactly once (slots come from
// append_slot + SlotRef, never cloned). The Release store on
// `filled` publishes the write to whoever sweeps via Acquire.
unsafe { (*cell.result.get()).write(result); }
cell.filled.store(true, Ordering::Release);
let prev = self.state.fetch_add(1, Ordering::Relaxed);
let (done_before, total) = unpack(prev);
let am_finalizer = total > 0 && done_before + 1 >= total;
// … sweep / finalize path elided …
}
}
Delivery adds 1 to the low 32 bits (events_done).
set_total adds pack_total(K) to the high 32 bits.
fetch_add is a read-modify-write (RMW) — atomically reads the
previous state, modifies it, and writes back. No window for
interleaving. Each event gets a unique snapshot of the previous state.
The finalizer condition on prev:
- Delivery: prev_total > 0 && prev_done + 1 >= prev_total
- set_total: prev_done >= total
State transition examples
Two representative interleavings: deliveries may all land before set_total (set_total is the finalizer), or set_total may land first (the last delivery is the finalizer).
In every interleaving, exactly one event transitions the state to
{done ≥ total, total > 0}.
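Because fetch_add linearizes, an interleaving is just a permutation of the K+1 events applied to one word. This illustrative check enumerates every permutation for K=3 and confirms exactly one finalizer each time:

```rust
fn pack_total(total: u64) -> u64 { total << 32 }
fn unpack(state: u64) -> (u64, u64) { (state & 0xFFFF_FFFF, state >> 32) }

fn permutations(xs: &[u64]) -> Vec<Vec<u64>> {
    if xs.len() <= 1 { return vec![xs.to_vec()]; }
    let mut out = Vec::new();
    for i in 0..xs.len() {
        let mut rest = xs.to_vec();
        let x = rest.remove(i);
        for mut p in permutations(&rest) {
            p.insert(0, x);
            out.push(p);
        }
    }
    out
}

fn main() {
    let k = 3u64;
    // Event 0 = set_total; events 1..=K = deliveries.
    let events: Vec<u64> = (0..=k).collect();
    for perm in permutations(&events) {
        let mut state = 0u64;
        let mut finalizers = 0;
        for &e in &perm {
            let prev = state; // fetch_add returns the previous state
            if e == 0 {
                state += pack_total(k);
                let (done, _) = unpack(prev);
                if done >= k { finalizers += 1; } // set_total condition
            } else {
                state += 1;
                let (done, total) = unpack(prev);
                if total > 0 && done + 1 >= total { finalizers += 1; } // delivery condition
            }
        }
        assert_eq!(finalizers, 1, "interleaving {:?}", perm);
    }
    println!("ok");
}
```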
Why Relaxed ordering is correct
The ticket determines WHO finalizes, not data visibility. Slot data
visibility is guaranteed by per-slot filled.store(true, Release) /
filled.load(Acquire) pairs. The sweep reads slots only after
confirming filled == true.
RMW linearization is ordering-independent: fetch_add operations on
a single atomic word are totally ordered by the CPU’s coherence
protocol regardless of the memory ordering specified. Relaxed does
not weaken atomicity — it only relaxes ordering with respect to OTHER
memory locations.
The exactly-one-finalizer proof
Claim: For K children, exactly one of K+1 events identifies itself as the finalizer.
Proof: The K+1 fetch_add operations form a total order (RMW
linearization). Each returns a unique prev. Before set_total
fires, total = 0 in all prev values — no delivery can satisfy
the condition. After set_total fires, each subsequent delivery
increments done. The delivery that pushes done to total is the
first (and only) to satisfy the condition. If all deliveries fire
before set_total, then set_total sees prev_done ≥ total and is
the finalizer. QED.
Pool and Dispatch
The pool provides persistent threads. The executor provides the work.
A thin Job struct (two words) bridges them. The dispatch function
encapsulates the full lifecycle: publish → body → seal → latch.
PoolState
#![allow(unused)]
fn main() {
pub(crate) struct PoolState {
pub shutdown: AtomicBool,
pub job_ptr: AtomicPtr<()>,
pub wake: EventCount,
/// Threads currently between loading job_ptr and returning from
/// the job call. dispatch waits for this to reach 0 before returning.
pub in_job: AtomicU32,
pub n_threads: usize,
pub dispatch_lock: Mutex<()>,
}
}
- job_ptr: points to a stack-local Job during dispatch, null otherwise
- in_job: threads currently in the job-handling region (the latch counter)
- wake: futex-based EventCount for thread parking
- dispatch_lock: serializes folds (one fold at a time per pool)
Job
#![allow(unused)]
fn main() {
#[repr(C)]
pub(crate) struct Job {
pub call: unsafe fn(*const (), usize),
pub data: *const (),
}
}
call is a monomorphized worker_entry::<N, H, R, F, G, P> —
a concrete function pointer, not vtable dispatch. data points to
a stack-local FoldState. Two words, no allocation.
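The erasure pattern in isolation — local types standing in for hylic's `FoldState` and `worker_entry`, not the library's actual definitions:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Stand-in for the typed per-fold state.
struct FoldState { total: AtomicU64 }

// Two words: a concrete fn pointer plus an erased data pointer.
#[repr(C)]
struct Job {
    call: unsafe fn(*const (), usize),
    data: *const (),
}

// A concrete entry point, erased to a plain fn pointer in the Job.
// SAFETY contract: `data` must point to a live FoldState.
unsafe fn worker_entry(data: *const (), _thread_idx: usize) {
    let state = unsafe { &*(data as *const FoldState) };
    state.total.fetch_add(1, Ordering::Relaxed);
}

fn main() {
    let state = FoldState { total: AtomicU64::new(0) };
    let job = Job {
        call: worker_entry,
        data: &state as *const FoldState as *const (),
    };
    // What a pool thread does after loading a non-null job_ptr.
    // SAFETY: `state` is live on this stack frame for the whole call.
    unsafe { (job.call)(job.data, 0) };
    assert_eq!(state.total.load(Ordering::Relaxed), 1);
    println!("ok");
}
```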
The dispatch lifecycle
#![allow(unused)]
fn main() {
// CPS lifecycle: publish → body → seal → latch.
// The body just does fold work and returns a result.
// All pool-thread synchronization is dispatch's responsibility.
pub(crate) fn dispatch<R>(state: &PoolState, job: &Job, body: impl FnOnce() -> R) -> R {
let _guard = state.dispatch_lock.lock().unwrap();
// Publish: make job visible to workers
state.job_ptr.store(job as *const Job as *mut (), Ordering::Release);
state.wake.notify_all();
// Body: caller participates in the fold
let result = body();
// Seal: prevent new workers from entering
state.job_ptr.store(std::ptr::null_mut(), Ordering::Release);
// Latch: wait for all workers to leave the job region in pool_thread.
// in_job brackets the entire load-job_ptr → call-worker_entry → return
// sequence, so in_job==0 guarantees no thread holds a reference to
// the stack-local Job or FoldState.
let mut spins = 0u32;
while state.in_job.load(Ordering::Acquire) > 0 {
spins += 1;
if spins > 5_000_000 {
panic!("dispatch latch: {} threads still in job region",
state.in_job.load(Ordering::Relaxed));
}
std::hint::spin_loop();
}
result
}
}
1. Publish: store the Job pointer, wake all threads
2. Body: the caller participates in the fold (walk root, help loop)
3. Seal: clear job_ptr — no new threads can enter
4. Latch: spin until in_job == 0 — all threads have left the job-handling region
5. Return: the Job and FoldState on the stack are safe to drop
The body knows nothing about pool lifecycle — it’s pure fold logic.
All synchronization is dispatch’s responsibility.
Pool thread
#![allow(unused)]
fn main() {
fn pool_thread(state: &PoolState, thread_idx: usize) {
let mut last_epoch = 0u32;
loop {
loop {
let token = state.wake.prepare();
if state.shutdown.load(Ordering::Acquire) { return; }
if token.epoch() > last_epoch {
last_epoch = token.epoch();
break;
}
state.wake.wait(token);
}
// in_job MUST be incremented BEFORE loading job_ptr.
// This closes the TOCTOU gap: the body cannot return (destroying
// the Job/FoldState on the stack) while any thread is between
// loading job_ptr and finishing the job call.
state.in_job.fetch_add(1, Ordering::Acquire);
let ptr = state.job_ptr.load(Ordering::Acquire);
if !ptr.is_null() {
// SAFETY: non-null ptr was published by dispatch, which
// holds the dispatch_lock and does not seal (nor drop the
// Job) until `in_job` returns to zero. We incremented
// `in_job` before loading ptr, so the seal cannot have
// happened yet — the referent is live.
let job = unsafe { &*(ptr as *const Job) };
// SAFETY: `job.call` is the worker_entry function for the
// matching FoldState; `job.data` is a `*const FoldState<…>`
// cast erased at the Job boundary. The caller (dispatch)
// guarantees the FoldState is live for the duration of
// this call via the same `in_job` latch.
unsafe { (job.call)(job.data, thread_idx); }
}
state.in_job.fetch_sub(1, Ordering::Release);
}
}
}
The critical ordering: the `in_job` increment happens before the
`job_ptr` load. Without this ordering, a thread could load `job_ptr`
while it is still valid, the body could then return and destroy the
stack, and the thread would dereference the destroyed pointer →
SIGSEGV. Incrementing `in_job` first makes the thread visible to the
latch before it touches the pointer.
# run_fold
#![allow(unused)]
fn main() {
pub(crate) fn run_fold<N, H, R, F, G, P: FunnelPolicy>(
fold: &F, graph: &G, root: &N,
pool_state: &PoolState, spec: &Spec<P>,
) -> R
where
F: FoldOps<N, H, R> + 'static, G: TreeOps<N> + 'static,
N: Clone + Send + 'static, H: 'static, R: Send + 'static,
{
let store = P::Queue::create_store(&spec.queue, pool_state.n_threads);
let chain_arena = Arena::<ChainNode<H, R>>::new();
let cont_arena = ContArena::<Cont<H, R>>::new();
let root_cell = RootCell::new();
let view = FoldView {
pool_state,
fold_done: AtomicBool::new(false),
idle_count: AtomicU32::new(0),
n_workers: pool_state.n_threads,
};
let ctx = WalkCtx {
fold,
graph,
view: &view,
chain_arena: &chain_arena,
cont_arena: &cont_arena,
_policy: std::marker::PhantomData,
};
let state = FoldState::<N, H, R, F, G, P> {
ctx: &ctx,
store: &store,
};
// The ONE unsafe boundary: erase typed FoldState to *const () for the Job.
let job = Job {
call: worker_entry::<N, H, R, F, G, P>,
data: &state as *const FoldState<N, H, R, F, G, P> as *const (),
};
dispatch(pool_state, &job, || {
let caller_idx = view.n_workers;
let handle = P::Queue::handle(&store, caller_idx);
let wake_state = Cell::new(P::Wake::init_state(&spec.wake));
let wctx = WorkerCtx::<N, H, R, F, G, P> { ctx: &ctx, handle, wake_state };
walk_cps(&wctx, root.clone(), Cont::Root(&root_cell as *const RootCell<R>));
let mut spins = 0u64;
while !root_cell.is_done() {
if let Some(task) = wctx.handle.try_acquire() {
execute_task(&wctx, task);
spins = 0;
} else {
spins += 1;
if spins > 10_000_000 {
panic!("run_fold hung: root_done={}", root_cell.is_done());
}
std::hint::spin_loop();
}
}
root_cell.take()
})
}
}
Creates per-fold state (store, arenas, root cell, view, context),
erases it to *const () for the Job, and delegates to dispatch.
The body walks the root and help-loops until root_cell.is_done().
Scoped pool
Pool::with(n, |pool| ...) uses std::thread::scope — threads
are joined when the closure returns. No leaked threads, no lifetime
footguns.
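The guarantee the pool leans on can be shown with `std::thread::scope` directly: spawned threads may borrow local data precisely because the scope joins them before returning.

```rust
// Minimal sketch of the scoped-thread guarantee: std::thread::scope joins
// every spawned thread before returning, so borrowed data cannot escape.
fn main() {
    let data = vec![1u64, 2, 3, 4];
    let sum = std::thread::scope(|s| {
        let (left, right) = data.split_at(2);
        // The spawned thread borrows `left` — legal only because the scope
        // joins the thread before `data` can be dropped.
        let handle = s.spawn(|| left.iter().sum::<u64>());
        let right_sum: u64 = right.iter().sum();
        handle.join().unwrap() + right_sum
    }); // all threads joined here
    assert_eq!(sum, 10);
}
```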
The pool is the executor’s Resource (defined by the Resource
GAT on ExecutorSpec). It can be provided explicitly via .attach(),
or created internally by .run() / .session():
#![allow(unused)]
fn main() {
use hylic::prelude::*;
// One-shot: pool created + destroyed per fold
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
// Session: pool shared across folds
exec(funnel::Spec::default(8)).session(|s| {
s.run(&fold, &graph, &root);
s.run(&fold, &graph, &root);
});
// Explicit attach: manual pool, multiple policies
funnel::Pool::with(8, |pool| {
let pw = exec(funnel::Spec::default(8)).attach(pool);
let sh = exec(funnel::Spec::for_wide_light(8)).attach(pool);
pw.run(&fold, &graph, &root);
sh.run(&fold, &graph, &root);
});
}
Thread spawn/join cost is paid once per pool scope. Each .run()
allocates working memory fresh — only threads are shared.
Queue Strategies
Two work-stealing strategies, selected at compile time via the
WorkStealing trait:
#![allow(unused)]
fn main() {
/// A work-stealing strategy. Associates typed Store and Handle via GATs.
pub trait WorkStealing: 'static {
type Spec: Copy + Default + Send + Sync;
type Store<N: Send + 'static, H: 'static, R: Send + 'static>: Send + Sync;
type Handle<'a, N: Send + 'static, H: 'static, R: Send + 'static>: TaskOps<N, H, R>
where Self: 'a;
fn create_store<N: Send + 'static, H: 'static, R: Send + 'static>(
spec: &Self::Spec, n_workers: usize,
) -> Self::Store<N, H, R>;
fn reset_store<N: Send + 'static, H: 'static, R: Send + 'static>(
store: &mut Self::Store<N, H, R>,
);
fn handle<'a, N: Send + 'static, H: 'static, R: Send + 'static>(
store: &'a Self::Store<N, H, R>, worker_idx: usize,
) -> Self::Handle<'a, N, H, R>;
}
}
The three associated types capture the queue lifecycle:
- `Spec`: construction-time configuration
- `Store<N, H, R>`: per-fold resources (deques, bitmask, etc.)
- `Handle<'a, N, H, R>`: per-worker view that borrows from `Store`
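The Store/Handle borrowing shape can be shown in miniature. `ToyStrategy` below is a hypothetical stand-in for `WorkStealing`, stripped down to the GAT lifecycle:

```rust
// Hypothetical miniature of the Store/Handle GAT lifecycle; not the
// library's WorkStealing trait.
trait ToyStrategy: 'static {
    type Store;
    type Handle<'a> where Self: 'a;
    fn create_store(n: usize) -> Self::Store;
    fn handle(store: &Self::Store, idx: usize) -> Self::Handle<'_>;
}

struct Counters;

impl ToyStrategy for Counters {
    type Store = Vec<u32>;
    type Handle<'a> = &'a u32 where Self: 'a;
    fn create_store(n: usize) -> Vec<u32> { vec![7; n] }
    fn handle(store: &Vec<u32>, idx: usize) -> &u32 { &store[idx] }
}

fn main() {
    let store = Counters::create_store(4); // per-fold: lives for one run
    let h = Counters::handle(&store, 2);   // per-worker view borrows from Store
    assert_eq!(*h, 7);
}
```

The lifetime on `Handle<'a>` is what pins every per-worker view to its per-fold `Store` at compile time.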
Workers interact through TaskOps:
#![allow(unused)]
fn main() {
/// Per-worker task operations. Each WorkStealing::Handle implements this.
pub trait TaskOps<N, H, R> {
/// Returns None on success, Some(task) if queue full (caller executes inline).
fn push(&self, task: FunnelTask<N, H, R>) -> Option<FunnelTask<N, H, R>>;
fn try_acquire(&self) -> Option<FunnelTask<N, H, R>>;
}
}
push returns Some(task) if the queue is full — the caller
executes inline (Cilk overflow protocol). try_acquire encapsulates
the strategy’s acquisition policy.
PerWorker: Chase-Lev deques + bitmask steal
Each worker owns a Chase-Lev deque. Push is LIFO (local, no atomic).
Steal uses an AtomicU64 bitmask to find non-empty deques — one
atomic load instead of scanning N deques.
#![allow(unused)]
fn main() {
impl<N: Send + 'static, H: 'static, R: Send + 'static> TaskOps<N, H, R>
for PerWorkerHandle<'_, N, H, R>
{
fn push(&self, task: FunnelTask<N, H, R>) -> Option<FunnelTask<N, H, R>> {
match self.my_deque.push(task) {
Ok(()) => {
self.work_available.fetch_or(1u64 << self.my_idx, Ordering::Relaxed);
None
}
Err(task) => Some(task),
}
}
fn try_acquire(&self) -> Option<FunnelTask<N, H, R>> {
// Local deque first — LIFO pop, cache-warm, no contention.
if let Some(task) = self.my_deque.pop() { return Some(task); }
// Bitmask-guided steal from other deques.
let mut bits = self.work_available.load(Ordering::Relaxed);
bits &= !(1u64 << self.my_idx);
while bits != 0 {
let target = bits.trailing_zeros() as usize;
if let Some(task) = self.all_deques[target].steal() {
return Some(task);
}
self.work_available.fetch_and(!(1u64 << target), Ordering::Relaxed);
bits &= !(1u64 << target);
}
None
}
}
}
Push: LIFO to own deque (no CAS), set bit in bitmask.
Try acquire: Pop local deque first (cache-warm, zero contention). If empty, load bitmask, iterate set bits, steal FIFO from first non-empty deque. Clear bit if deque found empty.
Best for: Trees with moderate-to-high branching. LIFO push + FIFO steal gives depth-first local execution with breadth-first work distribution. The bitmask avoids scanning all N deques.
Shared: single StealQueue
All threads push to one queue. All threads steal from it.
Push: fetch_add on bottom, write to segment slot.
Steal: CAS on top, read from segment slot. FIFO order.
Best for: Wide trees (bf > 10) and small trees where per-worker deque overhead is disproportionate. No bitmask, no per-deque allocation.
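The claim protocol can be sketched in a toy form: `fetch_add` on `bottom` claims a push slot, CAS on `top` claims a steal slot. `ToyShared` below is not the real `StealQueue` (which uses segments and publication flags); the slots here sit behind a `Mutex` purely to keep the sketch safe, and it assumes pushes stay within capacity.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

// Toy model of the Shared strategy's claim protocol (not the real StealQueue).
struct ToyShared<T> {
    bottom: AtomicU64,
    top: AtomicU64,
    slots: Mutex<Vec<Option<T>>>,
}

impl<T> ToyShared<T> {
    fn new(cap: usize) -> Self {
        ToyShared {
            bottom: AtomicU64::new(0),
            top: AtomicU64::new(0),
            slots: Mutex::new((0..cap).map(|_| None).collect()),
        }
    }
    fn push(&self, v: T) {
        // fetch_add claims a unique slot index for this producer.
        let idx = self.bottom.fetch_add(1, Ordering::AcqRel) as usize;
        self.slots.lock().unwrap()[idx] = Some(v);
    }
    fn steal(&self) -> Option<T> {
        loop {
            let t = self.top.load(Ordering::Acquire);
            if t >= self.bottom.load(Ordering::Acquire) { return None; }
            // CAS claims the slot; losers retry on the next index.
            if self.top.compare_exchange(t, t + 1, Ordering::AcqRel, Ordering::Acquire).is_ok() {
                return self.slots.lock().unwrap()[t as usize].take();
            }
        }
    }
}

fn main() {
    let q = ToyShared::new(8);
    q.push(1); q.push(2); q.push(3);
    assert_eq!(q.steal(), Some(1)); // FIFO: first pushed, first stolen
    assert_eq!(q.steal(), Some(2));
}
```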
When to use which
| Workload | Strategy | Why |
|---|---|---|
| General (bf=4-8) | PerWorker | LIFO locality, bitmask steal |
| Wide (bf > 10) | Shared | No deque allocation per worker |
| Deep narrow (bf=2) | PerWorker | DFS spine dominates |
| Small tree (< 50 nodes) | Shared | Lower fixed overhead |
Accumulation Strategies
Two ways to fold child results into the parent’s heap, selected at
compile time via the AccumulateStrategy trait. Both preserve child
order — accumulate is called in slot order regardless of which worker
delivered first. This is what allows hylic’s
non-associative accumulate
to run correctly in parallel:
#![allow(unused)]
fn main() {
/// Accumulation strategy: how child results flow into the parent's heap.
pub trait AccumulateStrategy: 'static {
type Spec: Copy + Default + Send + Sync;
fn deliver<N, H, R>(
chain: &FoldChain<H, R>, slot: SlotRef, result: R,
fold: &impl FoldOps<N, H, R>,
) -> Option<R>;
fn set_total<N, H, R>(
chain: &FoldChain<H, R>,
fold: &impl FoldOps<N, H, R>,
) -> Option<R>;
}
}
Both use the same ticket system for last-event detection; they differ in *when* and *how* the heap is swept.
OnArrival: streaming sweep
Each delivery tries to sweep contiguous filled slots immediately.
A CAS gate (bit 31 of sweep: AtomicU32) ensures only one thread
sweeps at a time. A cursor (bits 0-30) tracks sweep progress.
Per-delivery flow:
- Write result to slot, `filled.store(true, Release)`
- `state.fetch_add(1, Relaxed)` — take ticket
- Try CAS gate: if won, sweep contiguous filled slots from cursor
- If this was the last event (by ticket), spin until sweep completes
Advantage: Results accumulate as they arrive — lower latency to completion when children finish in order.
Cost: CAS contention on the sweep gate when multiple threads deliver simultaneously. ~16-28ns per delivery depending on gate outcome.
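The gate word can be sketched per the layout described above (bit 31 as the gate, bits 0-30 as the cursor); the constant and transitions here follow the text, not the library source:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Assumed layout from the description: bit 31 = CAS gate, bits 0-30 = cursor.
const GATE: u32 = 1 << 31;

fn main() {
    let sweep = AtomicU32::new(5); // cursor at slot 5, gate free
    let cur = sweep.load(Ordering::Acquire);
    // Only one thread can win this CAS and enter the sweep.
    let won = sweep
        .compare_exchange(cur, cur | GATE, Ordering::AcqRel, Ordering::Acquire)
        .is_ok();
    assert!(won);
    // A second contender loses: the observed value now has the gate bit set.
    let lost = sweep
        .compare_exchange(5, 5 | GATE, Ordering::AcqRel, Ordering::Acquire)
        .is_err();
    assert!(lost);
    // The winner sweeps, then publishes the advanced cursor with the gate cleared.
    sweep.store(7, Ordering::Release);
    assert_eq!(sweep.load(Ordering::Acquire), 7);
}
```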
OnFinalize: bulk sweep
Deliveries only store + ticket. No sweep, no CAS gate. The last event (determined by the ticket) bulk-sweeps all slots at once.
Per-delivery flow:
- Write result to slot, `filled.store(true, Release)`
- `state.fetch_add(1, Relaxed)` — take ticket
- If last event: iterate all slots, `filled.load(Acquire)` each, accumulate, finalize
Advantage: Zero CAS contention per delivery. Each delivery is ~11ns (store + fetch_add only).
Cost: The finalizer must spin-wait on any slot whose filled
hasn’t been published yet (rare — Release/Acquire propagates
quickly). Cold-cache sweep if slots were written by other cores.
When to use which
| Workload | Strategy | Why |
|---|---|---|
| Init-heavy, graph-heavy | OnArrival | Streaming pipelines computation |
| Balanced, finalize-heavy | OnFinalize | Minimal per-delivery overhead |
| Wide trees (bf > 10) | Either | OnArrival if results arrive in order |
| Deep narrow (bf=2) | OnFinalize | Only 2 slots — sweep trivial |
Memory footprint
Both strategies use destructive reads — results are moved out of their slots during accumulation, then dropped. For result types that own heap memory (String, Vec, etc.), this means resources are freed progressively as the sweep advances, not held alive until fold completion.
With OnArrival, the live memory at any point is bounded by the number of delivered-but-not-yet-swept results (typically much smaller than total results). With OnFinalize, all results are delivered before the bulk sweep, so peak memory equals the node’s child count × result size — freed in one pass.
This is relevant for folds over large trees where each result carries significant heap data (parsed documents, artifact records, aggregated datasets). See Infrastructure for the arena allocation model.
Infrastructure
Four supporting types underpin the funnel executor: a segmented
bump allocator (backing both Arena and ContArena), a Chase-Lev
deque, and a futex-based parking primitive. All are created per fold
in run_fold and dropped at fold completion.
SegmentedSlab<T> — the allocation foundation
Arena and ContArena share a common backing store: SegmentedSlab<T>.
It grows lazily in 64-slot segments, never invalidating existing
references. This is the key invariant that makes it safe under the
CPS walk where alloc() and get() interleave
with live references during recursive child discovery.
The design follows the same AtomicPtr CAS pattern used by the
StealQueue’s SegmentTable:
On `alloc()`:
- `next.fetch_add(1, Relaxed)` — atomic bump, returns linear index
- Decompose index: segment = `idx >> 6`, offset = `idx & 0x3F`
- `ensure_segment(seg)` — Acquire load; if null, allocate + CAS
- Write value to slot
On get(idx) / take(idx):
- Acquire load the segment pointer (L1 cache hit on hot path)
- Index into the segment’s slot array
Segment allocation races (multiple threads hit a new segment simultaneously) are resolved by the CAS: one thread installs its segment, losers free theirs. Exactly one segment is installed per position.
Memory profile: A fold over a 200-node tree with bf=8 allocates ~2 segments (128 slots) instead of pre-allocating 4096. Initial overhead: 32KB for the null-pointer table. Growth: one heap allocation per 64 elements. Maximum capacity: 262,144 elements.
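The index decomposition and capacity figures above are plain arithmetic, checked here:

```rust
// 64-slot segments: segment = idx >> 6, offset = idx & 0x3F.
fn main() {
    let idx: u32 = 130;
    assert_eq!((idx >> 6, idx & 0x3F), (2, 2)); // slot 130 = segment 2, offset 2
    // 4096 segment pointers × 64 slots each gives the stated maximum capacity.
    assert_eq!(4096 * 64, 262_144);
    // The pointer table itself: 4096 × 8 bytes = 32 KB initial overhead.
    assert_eq!(4096 * 8, 32 * 1024);
}
```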
Arena<T>
Thin wrapper over SegmentedSlab<T> for values written once and
read many times. Used for ChainNode<H, R> —
one slot per multi-child node.
- `alloc(value) → ArenaIdx`: delegates to `SegmentedSlab`. One `fetch_add(1, Relaxed)` + one segment pointer load.
- `get(idx) → &T`: segment pointer load + slot index. The returned reference is stable — subsequent allocs never invalidate it.
- Drop: iterates all allocated slots and drops each value, then frees all segments.
ArenaIdx is u32, Copy — a plain integer index. No refcount.
Passing an index across threads costs 4 bytes.
ContArena<T>
Same segmented design as Arena, but with move-out semantics:
- `alloc(value) → ContIdx`: identical to `Arena`.
- `take(idx) → T`: moves the value OUT of the slot. Called exactly once per slot during `fire_cont`'s `Cont::Direct` handling.
Drop frees segment memory only — every allocated slot was already
take()n during the upward cascade. If the fold panics mid-execution,
slots leak (no tracking bitset). This is accepted: panic during fold
is not recoverable.
Used for parent continuations in single-child chains.
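The move-out semantics can be sketched with `Option::take`, which models a slot that yields its value exactly once:

```rust
// Move-out semantics in miniature: after take(), the slot is empty and
// Drop has nothing left to free — no double free is possible.
fn main() {
    let mut slot: Option<String> = Some("continuation".to_string());
    let taken = slot.take(); // the one take() during the upward cascade
    assert_eq!(taken.as_deref(), Some("continuation"));
    assert!(slot.is_none()); // a second take() yields None
}
```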
WorkerDeque<T>
Chase-Lev work-stealing deque. Fixed-capacity ring buffer with power-of-2 masking.
The deque provides per-worker local task storage for the PerWorker queue strategy:
- Owner: LIFO push/pop from bottom (no atomics in fast path)
- Stealers: FIFO steal from top (CAS for contention)
- `ManuallyDrop<T>` wrapping prevents double-free on speculative reads between pop and steal
- Cache-padded: `bottom` and `top` on separate 128-byte lines
If the deque is full, push returns the task to the caller, which
executes it inline (Cilk overflow protocol). This makes the fixed
capacity a performance knob, not a correctness hazard.
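The overflow protocol can be shown in miniature: a push that fails hands the task back, and the caller runs it inline instead of blocking or reallocating. `push_or_inline` is an illustrative helper, not a library function:

```rust
// Cilk-style overflow in miniature: on a full queue, the would-be producer
// becomes the consumer.
fn push_or_inline<T>(queue: &mut Vec<T>, cap: usize, task: T, run: impl FnOnce(T)) {
    if queue.len() < cap {
        queue.push(task); // fits: normal enqueue
    } else {
        run(task); // full: execute inline
    }
}

fn main() {
    let mut q: Vec<i32> = Vec::new();
    let mut ran_inline = 0;
    push_or_inline(&mut q, 1, 10, |_| ran_inline += 1);
    push_or_inline(&mut q, 1, 20, |_| ran_inline += 1);
    assert_eq!((q.len(), ran_inline), (1, 1)); // second task ran inline
}
```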
EventCount
Lock-free thread parking via atomic epoch + futex. Used for pool thread parking and idle worker notification.
The protocol prevents lost wakeups structurally:
- `prepare() → Token`: snapshot the epoch (Acquire)
- `wait(token)`: futex sleep if the epoch is unchanged since `prepare()`
- `notify_one()` / `notify_all()`: bump epoch (Release) + wake
If a notification fires between prepare() and wait(), the epoch
has changed and wait returns immediately — no lost wakeup.
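The epoch check can be modeled in isolation. `ToyEventCount` below captures only the epoch half of the protocol; the real `EventCount` pairs it with a futex sleep/wake:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Toy model of the epoch check; `would_sleep` stands in for the condition
// under which wait() actually issues the futex sleep.
struct ToyEventCount {
    epoch: AtomicU32,
}

impl ToyEventCount {
    fn prepare(&self) -> u32 {
        self.epoch.load(Ordering::Acquire) // snapshot
    }
    fn notify(&self) {
        self.epoch.fetch_add(1, Ordering::Release); // bump (+ futex wake)
    }
    fn would_sleep(&self, token: u32) -> bool {
        self.epoch.load(Ordering::Acquire) == token
    }
}

fn main() {
    let ec = ToyEventCount { epoch: AtomicU32::new(0) };
    let token = ec.prepare();
    ec.notify(); // notification lands between prepare() and wait()
    assert!(!ec.would_sleep(token)); // epoch changed: wait returns immediately
}
```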
Why arenas, not per-node allocation
- Stable references: the segmented layout means `alloc()` never invalidates existing pointers. `Vec`-backed growth would require reallocation, breaking live references in `walk_cps`.
- No refcounting: `ArenaIdx` is `Copy`, 4 bytes. The equivalent with per-node allocation would be `Arc<ChainNode>` at ~10-15 ns per clone/drop.
- Lazy growth: memory usage is proportional to actual tree size, not a pre-configured maximum. No capacity configuration needed.
- Bulk cleanup: arena drop iterates allocated slots + frees segments. No per-node free-list interaction.
Streaming sweep and memory footprint
With the OnArrival accumulation strategy, child
results are moved out of their slots during the sweep (destructive
read). Each result is borrowed for fold.accumulate, then dropped.
This means heap resources owned by result values (Strings, Vecs,
etc.) are freed progressively as the sweep advances — not held
alive until fold completion. See
Accumulation strategies for the sweep mechanics.
Testing
The funnel test suite covers three dimensions: correctness, stress, and the hylomorphism property. All tests run for both queue strategies (PerWorker and Shared) via policy-generic test helpers.
Correctness
Verify that funnel produces the same result as the sequential Fused
executor across all named policy presets:
- Default, SharedDefault, WideLight, LowOverhead, PerWorkerArrival
- Tree sizes: 60 nodes (bf=4), 200 nodes (bf=6, bf=20)
- Zero workers (all work done by the caller thread)
- Adjacency-list trees (callback-based `treeish_visit`)
- Wide-tree stress (500 iterations, pool reused)
Stress
High iteration counts to catch timing-sensitive races:
- 1500 runs per policy on a reused pool
- Pool lifecycle: 5000 create/destroy cycles
- Mixed policy: 50k iterations switching between PerWorker and Shared on the same pool (mimics criterion benchmark pattern)
- 20k noop iterations each: Shared + OnFinalize and Shared + OnArrival at criterion warmup intensity
- Interleaved policies: 12.5k iterations alternating four policies on one pool
These tests exercise the dispatch → in_job latch protocol
under the exact conditions that previously triggered SIGSEGV
(high-iteration noop folds with rapid pool reuse).
Interleaving proof
The hylomorphism property: fold interleaves with traversal across subtrees. While one subtree is being visited (walk down), another subtree’s results are being accumulated (cascade up).
The test uses a lock-free TraceLog (2048-entry bounded log, atomic
sequence counter) to record visit and accumulate operations with
thread IDs and subtree tags. After 20 attempts on an 85-node tree,
the test asserts that cross-subtree interleaving occurred — proving
that the fold is not merely parallel but genuinely fused.
Fibonacci
Recursive Fibonacci via tree fold — the simplest possible example.
#![allow(unused)]
fn main() {
//! Fibonacci via tree fold — the simplest hylic example.
//! The node type is `i32` — not a struct with children.
//! The treeish computes children from the value: fib(n) → [fib(n-1), fib(n-2)].
#[cfg(test)]
mod tests {
use hylic::prelude::*;
use insta::assert_snapshot;
/// A Fibonacci node: just the number n.
/// Branches into n-1 and n-2 until reaching base cases 0 or 1.
#[derive(Clone)]
struct FibNode(u64);
#[test]
fn fibonacci() {
// Children of fib(n) are fib(n-1) and fib(n-2); fib(0) and fib(1) are leaves.
let graph: Treeish<FibNode> = treeish(|n: &FibNode| {
if n.0 <= 1 { vec![] }
else { vec![FibNode(n.0 - 1), FibNode(n.0 - 2)] }
});
// init: leaves seed the heap with n; inner nodes seed with 0.
// accumulate: each child's result is summed into the heap.
// finalize: identity (H = R = u64).
let fib: Fold<FibNode, u64, u64> = fold(
|n: &FibNode| if n.0 <= 1 { n.0 } else { 0 },
|heap: &mut u64, child: &u64| *heap += child,
|h: &u64| *h,
);
let result: u64 = FUSED.run(&fib, &graph, &FibNode(10));
assert_eq!(result, 55);
assert_snapshot!("fib10", format!("fib(10) = {result}"));
}
}
}
Output:
fib(10) = 55
Expression evaluation
Evaluate an AST bottom-up. vec_fold gives finalize access to
the node and all child results — needed when different node types
combine children differently.
#![allow(unused)]
fn main() {
//! Expression evaluation — AST fold with heterogeneous node types.
#[cfg(test)]
mod tests {
use hylic::prelude::vec_fold::{vec_fold, VecHeap};
use hylic::prelude::*;
use insta::assert_snapshot;
/// An arithmetic expression tree.
/// Each variant defines both its meaning and its children.
#[derive(Clone)]
enum Expr {
Num(f64),
Add(Box<Expr>, Box<Expr>),
Mul(Box<Expr>, Box<Expr>),
Neg(Box<Expr>),
}
/// Convenience constructors for readable test data.
fn num(v: f64) -> Expr { Expr::Num(v) }
fn add(a: Expr, b: Expr) -> Expr { Expr::Add(Box::new(a), Box::new(b)) }
fn mul(a: Expr, b: Expr) -> Expr { Expr::Mul(Box::new(a), Box::new(b)) }
fn neg(a: Expr) -> Expr { Expr::Neg(Box::new(a)) }
#[test]
fn evaluate_expression() {
let expr = mul(add(num(3.0), num(4.0)), neg(num(2.0)));
let graph: Treeish<Expr> = treeish_visit(|e: &Expr, cb: &mut dyn FnMut(&Expr)| {
match e {
Expr::Num(_) => {}
Expr::Add(a, b) | Expr::Mul(a, b) => { cb(a); cb(b); }
Expr::Neg(a) => { cb(a); }
}
});
// vec_fold collects children before finalize, so each variant can
// combine its results differently (sum / product / negate).
let eval: Fold<Expr, VecHeap<Expr, f64>, f64> = vec_fold(
|heap: &VecHeap<Expr, f64>| match &heap.node {
Expr::Num(v) => *v,
Expr::Add(_, _) => heap.childresults.iter().sum(),
Expr::Mul(_, _) => heap.childresults.iter().product(),
Expr::Neg(_) => -heap.childresults[0],
},
);
let result: f64 = FUSED.run(&eval, &graph, &expr);
assert_eq!(result, -14.0);
assert_snapshot!("expr_eval", format!("(3 + 4) * -(2) = {result}"));
}
}
}
Output:
(3 + 4) * -(2) = -14
Filesystem summary
Aggregate file sizes, counts, and directory depth in one pass.
The heap is a structured Summary — multiple metrics accumulated
simultaneously.
#![allow(unused)]
fn main() {
//! Filesystem tree summary — structured heap accumulating multiple metrics.
#[cfg(test)]
mod tests {
use hylic::prelude::*;
use insta::assert_snapshot;
/// A filesystem entry: either a file (leaf) or a directory (branch).
#[derive(Clone)]
#[allow(dead_code)]
enum FsEntry {
File { name: String, size: u64 },
Dir { name: String, children: Vec<FsEntry> },
}
impl FsEntry {
fn file(name: &str, size: u64) -> Self {
FsEntry::File { name: name.into(), size }
}
fn dir(name: &str, ch: Vec<FsEntry>) -> Self {
FsEntry::Dir { name: name.into(), children: ch }
}
}
/// Accumulates size, file count, and directory count in one pass.
#[derive(Clone, Debug, PartialEq)]
struct Summary {
total_size: u64,
file_count: usize,
dir_count: usize,
}
#[test]
fn summarize_filesystem() {
let tree = FsEntry::dir("project", vec![
FsEntry::file("README.md", 1200),
FsEntry::dir("src", vec![
FsEntry::file("main.rs", 5000),
FsEntry::file("lib.rs", 3000),
FsEntry::dir("utils", vec![
FsEntry::file("helpers.rs", 800),
]),
]),
FsEntry::file("Cargo.toml", 400),
]);
// Files are implicit leaves; only directories produce children.
let graph: Treeish<FsEntry> = treeish_visit(|entry: &FsEntry, cb: &mut dyn FnMut(&FsEntry)| {
if let FsEntry::Dir { children, .. } = entry {
for child in children { cb(child); }
}
});
// Structured heap — multiple metrics tracked in one pass. H = R = Summary.
let summarize: Fold<FsEntry, Summary, Summary> = fold(
|entry: &FsEntry| match entry {
FsEntry::File { size, .. } =>
Summary { total_size: *size, file_count: 1, dir_count: 0 },
FsEntry::Dir { .. } =>
Summary { total_size: 0, file_count: 0, dir_count: 1 },
},
|heap: &mut Summary, child: &Summary| {
heap.total_size += child.total_size;
heap.file_count += child.file_count;
heap.dir_count += child.dir_count;
},
|h: &Summary| h.clone(),
);
let result: Summary = FUSED.run(&summarize, &graph, &tree);
assert_eq!(result, Summary {
total_size: 10400, file_count: 5, dir_count: 3,
});
assert_snapshot!("fs_summary", format!(
"project/: {} bytes, {} files, {} dirs",
result.total_size, result.file_count, result.dir_count,
));
}
}
}
Output:
project/: 10400 bytes, 5 files, 3 dirs
Cycle detection
Detect cycles in a dependency graph. Cycle state lives in the node type (ancestor set), not the fold — the Treeish decides structure, the Fold just collects.
#![allow(unused)]
fn main() {
//! Cycle detection in a dependency graph.
//! Demonstrates: treeish over a graph with potential cycles,
//! fold that tracks visited nodes to detect re-entry.
#[cfg(test)]
mod tests {
use std::collections::{HashMap, HashSet};
use hylic::prelude::*;
use insta::assert_snapshot;
/// A dependency graph defined as adjacency lists.
/// Nodes are string IDs, edges are dependencies.
#[derive(Clone)]
struct DepGraph {
edges: HashMap<String, Vec<String>>,
}
impl DepGraph {
fn new(edges: &[(&str, &[&str])]) -> Self {
DepGraph {
edges: edges.iter()
.map(|(k, v)| (k.to_string(), v.iter().map(|s| s.to_string()).collect()))
.collect(),
}
}
}
/// A node in the traversal: carries the current ID and
/// the set of ancestors on this path (for cycle detection).
#[derive(Clone)]
struct DepNode {
id: String,
ancestors: HashSet<String>,
}
impl DepNode {
fn root(id: &str) -> Self {
DepNode { id: id.to_string(), ancestors: HashSet::new() }
}
fn child(&self, id: &str) -> Self {
let mut ancestors = self.ancestors.clone();
ancestors.insert(self.id.clone());
DepNode { id: id.to_string(), ancestors }
}
fn is_cycle(&self) -> bool {
self.ancestors.contains(&self.id)
}
}
/// Result of cycle analysis for a subtree.
#[derive(Clone, Debug)]
struct CycleResult {
cycles: Vec<String>,
visited: usize,
}
#[test]
fn detect_cycles() {
let graph_data = DepGraph::new(&[
("A", &["B", "C"]),
("B", &["D"]),
("C", &["D", "A"]), // C → A creates a cycle
("D", &[]),
]);
// Cycle state lives in the node type — DepNode carries its ancestor
// set. When a node sees itself in that set, the treeish stops by
// returning no children.
let graph: Treeish<DepNode> = treeish(move |node: &DepNode| {
if node.is_cycle() { return vec![]; }
graph_data.edges.get(&node.id)
.map(|deps| deps.iter().map(|d| node.child(d)).collect())
.unwrap_or_default()
});
let detect: Fold<DepNode, CycleResult, CycleResult> = fold(
|node: &DepNode| CycleResult {
cycles: if node.is_cycle() { vec![node.id.clone()] } else { vec![] },
visited: 1,
},
|heap: &mut CycleResult, child: &CycleResult| {
heap.cycles.extend(child.cycles.iter().cloned());
heap.visited += child.visited;
},
|h: &CycleResult| h.clone(),
);
let result: CycleResult = FUSED.run(&detect, &graph, &DepNode::root("A"));
assert_eq!(result.cycles, vec!["A"]); // C → A cycle detected
assert_eq!(result.visited, 6); // A, B, C, D, D, A(cycle)
assert_snapshot!("cycles", format!(
"cycles: {:?}, visited: {}", result.cycles, result.visited
));
}
}
}
Output:
cycles: ["A"], visited: 6
Configuration inheritance
Overlay configuration scopes bottom-up. `or_insert_with` in accumulate
gives parent-wins semantics — init runs before accumulate, so the
parent's values are already in the map and child entries only fill in
missing keys.
#![allow(unused)]
fn main() {
//! Configuration inheritance with overlay/merge.
//! Demonstrates: a fold where the heap IS a config map,
//! and children's configs overlay the parent's defaults.
#[cfg(test)]
mod tests {
use std::collections::BTreeMap;
use hylic::prelude::*;
use insta::assert_snapshot;
/// A configuration scope. Each scope has its own key-value overrides
/// and child scopes that inherit and can further override.
#[derive(Clone, Debug)]
struct ConfigScope {
name: String,
overrides: BTreeMap<String, String>,
children: Vec<ConfigScope>,
}
impl ConfigScope {
fn new(name: &str, overrides: &[(&str, &str)], children: Vec<ConfigScope>) -> Self {
ConfigScope {
name: name.into(),
overrides: overrides.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
children,
}
}
fn leaf(name: &str, overrides: &[(&str, &str)]) -> Self {
Self::new(name, overrides, vec![])
}
}
/// Resolved configuration: the merged key-value map for a scope,
/// collecting all overrides from the scope and its descendants.
#[derive(Clone, Debug, PartialEq)]
struct ResolvedConfig {
scope: String,
merged: BTreeMap<String, String>,
}
#[test]
fn config_overlay() {
let root = ConfigScope::new("global", &[
("color", "blue"),
("font_size", "12"),
("theme", "light"),
], vec![
ConfigScope::new("production", &[
("theme", "dark"),
("debug", "false"),
], vec![
ConfigScope::leaf("production.api", &[
("font_size", "14"),
("rate_limit", "1000"),
]),
]),
ConfigScope::leaf("development", &[
("debug", "true"),
("theme", "light"),
]),
]);
let graph: Treeish<ConfigScope> =
treeish_from(|scope: &ConfigScope| scope.children.as_slice());
// init seeds the heap with the scope's own overrides; init runs
// before accumulate, so parent values win — child entries only
// fill in keys the parent hasn't set.
let resolve: Fold<ConfigScope, ResolvedConfig, ResolvedConfig> = fold(
|scope: &ConfigScope| ResolvedConfig {
scope: scope.name.clone(),
merged: scope.overrides.clone(),
},
|heap: &mut ResolvedConfig, child: &ResolvedConfig| {
for (k, v) in &child.merged {
heap.merged.entry(k.clone()).or_insert_with(|| v.clone());
}
},
|h: &ResolvedConfig| h.clone(),
);
let result: ResolvedConfig = FUSED.run(&resolve, &graph, &root);
// Global scope sees all keys from all descendants,
// but its own values win for "color", "font_size", "theme".
assert_eq!(result.merged.get("color").unwrap(), "blue");
assert_eq!(result.merged.get("theme").unwrap(), "light"); // parent wins
assert_eq!(result.merged.get("debug").unwrap(), "false"); // production's value
assert_eq!(result.merged.get("rate_limit").unwrap(), "1000");
let display: Vec<String> = result.merged.iter()
.map(|(k, v)| format!("{k}={v}")).collect();
assert_snapshot!("config", display.join(", "));
}
}
}
Output:
color=blue, debug=false, font_size=12, rate_limit=1000, theme=light
Parallel execution
hylic provides two built-in executors. FUSED runs the fold
sequentially through callback-based recursion. The Funnel executor
parallelizes the same fold across a scoped thread pool. Both are
invoked through the same .run() method.
Sequential: FUSED
Callback-based recursion on a single thread, with no overhead beyond the fold closures themselves:
#![allow(unused)]
fn main() {
use hylic::prelude::*;
FUSED.run(&fold, &graph, &root);
}
Parallel: Funnel
The Funnel executor preserves the fused property — children are
discovered through graph.visit and processed concurrently. No
intermediate tree is built.
One-shot
#![allow(unused)]
fn main() {
use hylic::prelude::*;
exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
}
Spec::default(n) uses the Robust policy preset. .run() creates
a scoped thread pool internally, runs the fold, and joins before
returning.
Session scope
For repeated folds, amortize pool creation:
#![allow(unused)]
fn main() {
exec(funnel::Spec::default(8)).session(|s| {
s.run(&fold1, &graph1, &root1);
s.run(&fold2, &graph2, &root2);
});
}
The pool lives for the closure. Each .run() inside is cheap.
Explicit attach
Provide the pool yourself:
#![allow(unused)]
fn main() {
funnel::Pool::with(8, |pool| {
let pw = exec(funnel::Spec::default(8)).attach(pool);
let sh = exec(funnel::Spec::for_wide_light(8)).attach(pool);
pw.run(&fold, &graph, &root);
sh.run(&fold, &graph, &root);
});
}
Different policies can share a pool — each .attach() consumes a
(Copy) Spec and binds it to the pool, producing a session-level
executor.
Policy variants
| Preset | Best for |
|---|---|
| `Spec::default(n)` | General purpose |
| `Spec::for_wide_light(n)` | Wide trees (bf > 10) |
| `Spec::for_deep_narrow(n)` | Deep chains (bf = 2) |
| `Spec::for_low_overhead(n)` | Overhead-sensitive |
| `Spec::for_high_throughput(n)` | Heavy balanced |
See Funnel policies for the full decision
guide and The Exec pattern for
the type-level design behind .run(), .session(), and .attach().
External parallel options
One additional strategy lives in a sibling crate:
- Rayon (hylic-benchmark):
par_iter-based fork-join
Working example
This example uses a flat adjacency list — nodes are integer indices, children are looked up by index. The same fold runs sequentially (Fused) and in parallel (Funnel) with identical results.
#![allow(unused)]
fn main() {
//! Parallel execution: Fused vs Funnel over flat data.
//! Demonstrates: adjacency-list graph, identical results across
//! policies, session scope, explicit pool attach.
#[cfg(test)]
mod tests {
use hylic::prelude::*;
use hylic::exec::funnel;
use insta::assert_snapshot;
/// Build a tree as a flat adjacency list + value array.
/// Node 0 is the root with 6 children; each child has 3 leaves.
fn build_tree() -> (Vec<Vec<usize>>, Vec<u64>) {
let mut adj: Vec<Vec<usize>> = Vec::new();
let mut vals: Vec<u64> = Vec::new();
// root (node 0)
adj.push((1..=6).collect());
vals.push(1);
// 6 branches (nodes 1-6), each with 3 leaves
let mut next_leaf = 7;
for i in 0..6 {
let children: Vec<usize> = (next_leaf..next_leaf + 3).collect();
adj.push(children);
vals.push(i as u64 * 10);
next_leaf += 3;
}
// 18 leaves (nodes 7-24)
for i in 0..6 {
for j in 0..3u64 {
adj.push(vec![]);
vals.push(i as u64 * 10 + j);
}
}
(adj, vals)
}
#[test]
fn parallel_strategies() {
let (adj, vals) = build_tree();
// The treeish looks up children by index; no nested structs.
let adj_for_graph = adj.clone();
let graph: Treeish<usize> = treeish_visit(move |n: &usize, cb: &mut dyn FnMut(&usize)| {
for &c in &adj_for_graph[*n] { cb(&c); }
});
let vals_for_fold = vals.clone();
let sum: Fold<usize, u64, u64> = fold(
move |n: &usize| vals_for_fold[*n],
|heap: &mut u64, child: &u64| *heap += child,
|heap: &u64| *heap,
);
// Sequential baseline
let expected = FUSED.run(&sum, &graph, &0usize);
// One-shot: .run() creates + destroys pool internally
let r_default = exec(funnel::Spec::default(4)).run(&sum, &graph, &0usize);
assert_eq!(r_default, expected);
// Different policy: wide-light
let r_wide = exec(funnel::Spec::for_wide_light(4)).run(&sum, &graph, &0usize);
assert_eq!(r_wide, expected);
// Session scope: pool shared across folds
exec(funnel::Spec::default(4)).session(|s| {
assert_eq!(s.run(&sum, &graph, &0usize), expected);
assert_eq!(s.run(&sum, &graph, &0usize), expected);
});
// Explicit attach: manual pool, multiple policies
funnel::Pool::with(4, |pool| {
let pw = exec(funnel::Spec::default(4)).attach(pool);
let sh = exec(funnel::Spec::for_wide_light(4)).attach(pool);
assert_eq!(pw.run(&sum, &graph, &0usize), expected);
assert_eq!(sh.run(&sum, &graph, &0usize), expected);
});
assert_snapshot!("parallel", format!(
"sum = {expected}, verified: fused, funnel(one-shot), funnel(wide), session, attach"
));
}
}
}
Output:
sum = 619, verified: fused, funnel(one-shot), funnel(wide), session, attach
Zero-cost performance
The closure-based API (Fold from a domain module, plus Treeish)
is the ergonomic default. For performance-critical paths, the
graph side admits user-defined TreeOps implementations whose
visit method monomorphises directly. The fold side does not — the
executor signature pins the fold type to D::Fold<H, R> (the
closure-based wrapper). The two ops traits are nevertheless the
right vocabulary for thinking about per-node cost.
The overhead budget
Per node with K children, the fused executor makes these calls through the closure-based API:
| Call site | Count | Dispatch |
|---|---|---|
| fold.init(node) | 1 | dyn Fn via Arc/Rc/Box |
| graph.visit(node, cb) | 1 | dyn Fn via Arc/Rc/Box |
| cb(child) inside visit | K | &mut dyn FnMut callback |
| fold.accumulate(heap, &r) | K | dyn Fn via Arc/Rc/Box |
| fold.finalize(heap) | 1 | dyn Fn via Arc/Rc/Box |
| Total | 3+2K | |
Measured: ~0.47 ns per indirect call (well-predicted by the branch predictor). On a noop workload (bf=8, 200 nodes): ~1.8 µs above hand-written recursion. On any real workload (>10 µs/node) the overhead drops below the noise floor.
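As a sanity check, the budget and the measured numbers line up — a quick arithmetic sketch using only the figures quoted above (the ~0.47 ns/call and the bf=8, 200-node noop workload are the text's measurements, not re-measured here):

```rust
// 3 + 2K indirect calls per node with K children, from the table above.
fn indirect_calls(k: u64) -> u64 {
    3 + 2 * k
}

fn main() {
    let k = 8; // branching factor of the noop benchmark
    let calls = indirect_calls(k);
    assert_eq!(calls, 19);
    // 19 calls at ~0.47 ns each is ~8.9 ns of dispatch per node;
    // over 200 nodes that is ~1.8 µs — the quoted overhead.
    let per_node_ns = calls as f64 * 0.47;
    let total_us = per_node_ns * 200.0 / 1000.0;
    println!("{calls} calls, ~{per_node_ns:.1} ns/node, ~{total_us:.1} µs total");
}
```

The per-node dispatch cost scales linearly in the branching factor, which is why the overhead only matters on very cheap nodes.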
Eliminating graph dispatch: implement TreeOps
The executor’s graph parameter is generic — G: TreeOps<N> —
so any concrete impl is monomorphised at the call site:
#![allow(unused)]
fn main() {
#[test]
fn zero_cost_treeops() {
use hylic::prelude::*;
use hylic::ops::TreeOps;
#[derive(Clone)]
struct TreeNode { id: usize, value: u64 }
struct AdjGraph {
adj: Vec<Vec<usize>>,
nodes: Vec<TreeNode>,
}
impl TreeOps<TreeNode> for AdjGraph {
fn visit(&self, node: &TreeNode, cb: &mut dyn FnMut(&TreeNode)) {
for &child_id in &self.adj[node.id] {
cb(&self.nodes[child_id]);
}
}
}
let graph = AdjGraph {
adj: vec![vec![1, 2], vec![], vec![]],
nodes: vec![
TreeNode { id: 0, value: 1 },
TreeNode { id: 1, value: 2 },
TreeNode { id: 2, value: 3 },
],
};
let f: Fold<TreeNode, u64, u64> = fold(
|n: &TreeNode| n.value,
|h: &mut u64, r: &u64| *h += r,
|h: &u64| *h,
);
let root = graph.nodes[0].clone();
let total: u64 = FUSED.run(&f, &graph, &root);
assert_eq!(total, 6);
}
}
AdjGraph::visit is a direct, inlinable loop. Only the callback
cb: &mut dyn FnMut is still indirect — K calls per node. The
closure-based fold is still in the picture because executors take
&D::Fold<H, R>; replacing it requires a custom executor (below).
The shape of the trait
FoldOps and TreeOps are the operation traits any user code
can target:
pub trait FoldOps<N, H, R> {
fn init(&self, node: &N) -> H;
fn accumulate(&self, heap: &mut H, result: &R);
fn finalize(&self, heap: &H) -> R;
}
pub trait TreeOps<N> {
fn visit(&self, node: &N, cb: &mut dyn FnMut(&N));
}
The closure-based domain folds (shared::Fold, local::Fold,
owned::Fold) implement FoldOps by delegating to their stored
closures. A user-defined FoldOps struct is callable from any
custom executor that drives the recursion through the trait —
bypassing the closure layer entirely. The shipped Fused
executor’s inner loop is exactly this:
#![allow(unused)]
fn main() {
fn recurse<N, H, R>(
fold: &impl FoldOps<N, H, R>,
graph: &impl TreeOps<N>,
node: &N,
) -> R {
let mut heap = fold.init(node);
graph.visit(node, &mut |child: &N| {
let r = recurse(fold, graph, child);
fold.accumulate(&mut heap, &r);
});
fold.finalize(&heap)
}
}
When the budget matters
| Path | Per-node overhead | When to use |
|---|---|---|
| Closure-based Fold + Treeish | 3+2K indirect calls | Default — combinators, lifts, sugars |
| Closure-based Fold + custom TreeOps | K+1 indirect calls | Adjacency lists or graph types where the visit path is the hot side |
| Custom executor over FoldOps + TreeOps | K indirect calls | Maximum control; sacrifices the lift / pipeline machinery for one specific shape |
Why LTO doesn’t help
LLVM cannot devirtualise Rust dyn Fn calls. Rust does not emit
the !vcall_visibility metadata that LLVM’s whole-program
devirtualisation would need. Neither thin LTO nor fat LTO changes
this. The trait-based path is the only reliable way to eliminate
dispatch.
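For a standalone view of that trait-based path, here is a minimal sketch (independent of hylic) pairing the two ops traits as printed above with concrete implementations, driven by the same recursion shape as the Fused inner loop. The SumFold and Countdown types are illustrative, not library types; every call except the cb callback is direct and inlinable:

```rust
// The two ops traits, as printed in "The shape of the trait".
pub trait FoldOps<N, H, R> {
    fn init(&self, node: &N) -> H;
    fn accumulate(&self, heap: &mut H, result: &R);
    fn finalize(&self, heap: &H) -> R;
}

pub trait TreeOps<N> {
    fn visit(&self, node: &N, cb: &mut dyn FnMut(&N));
}

// Concrete fold: sum node values. No closures, no Arc — monomorphised.
struct SumFold;
impl FoldOps<u64, u64, u64> for SumFold {
    fn init(&self, node: &u64) -> u64 { *node }
    fn accumulate(&self, heap: &mut u64, result: &u64) { *heap += result; }
    fn finalize(&self, heap: &u64) -> u64 { *heap }
}

// Concrete graph: n yields a single child n-1, down to 0.
struct Countdown;
impl TreeOps<u64> for Countdown {
    fn visit(&self, node: &u64, cb: &mut dyn FnMut(&u64)) {
        if *node > 0 { cb(&(node - 1)); }
    }
}

// The Fused inner loop, as printed above.
fn recurse<N, H, R>(
    fold: &impl FoldOps<N, H, R>,
    graph: &impl TreeOps<N>,
    node: &N,
) -> R {
    let mut heap = fold.init(node);
    graph.visit(node, &mut |child: &N| {
        let r = recurse(fold, graph, child);
        fold.accumulate(&mut heap, &r);
    });
    fold.finalize(&heap)
}

fn main() {
    // 3 + 2 + 1 + 0 = 6
    assert_eq!(recurse(&SumFold, &Countdown, &3u64), 6);
}
```

Only the &mut dyn FnMut callback remains indirect, matching the K-calls row in the table above.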
See Benchmarks for the measured comparison across all execution modes.
Transformations
Features as standalone functions matching the transformation contract. One domain, one base fold, one base graph. Each feature is a named function — defined separately, plugged in with a single method call.
The phase-wrapping contract — each wrapper receives the original phase as a callable reference:
- wrap_init: Fn(&N, &dyn Fn(&N) -> H) -> H
- wrap_accumulate: Fn(&mut H, &R, &dyn Fn(&mut H, &R))
- wrap_finalize: Fn(&H, &dyn Fn(&H) -> R) -> R
#![allow(unused)]
fn main() {
//! Transformations: features as standalone functions that match the contract.
//!
//! One domain, one base fold, one base graph. Each feature is a named
//! function — it IS the concern, separated and reusable. Plugging it
//! in is a single method call on the existing construct.
#[cfg(test)]
mod tests {
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use hylic::prelude::*;
use hylic::prelude::memoize_treeish_by;
use insta::assert_snapshot;
// ── Domain ──────────────────────────────────────────────
#[derive(Clone, Debug)]
struct Task {
name: String,
cost_ms: u64,
deps: Vec<String>,
}
struct Registry(HashMap<String, Task>);
impl Registry {
fn new(tasks: &[(&str, u64, &[&str])]) -> Self {
Registry(tasks.iter().map(|(name, cost, deps)| {
(name.to_string(), Task {
name: name.to_string(),
cost_ms: *cost,
deps: deps.iter().map(|d| d.to_string()).collect(),
})
}).collect())
}
fn get(&self, name: &str) -> Option<&Task> { self.0.get(name) }
}
// ── Shared setup ────────────────────────────────────────
fn setup() -> (Treeish<Task>, Task) {
let reg = Registry::new(&[
("app", 50, &["compile", "link"]),
("compile", 200, &["parse", "typecheck"]),
("parse", 100, &[]),
("typecheck", 300, &[]),
("link", 150, &[]),
]);
let map = reg.0.clone();
let g: Treeish<Task> = treeish(move |task: &Task| {
task.deps.iter().filter_map(|d| map.get(d).cloned()).collect()
});
let root = reg.get("app").unwrap().clone();
(g, root)
}
fn base_fold() -> Fold<Task, u64, u64> {
fold(
|t: &Task| t.cost_ms,
|heap: &mut u64, child: &u64| *heap += child,
|h: &u64| *h,
)
}
// ── Fold phase wrappers ─────────────────────────────────
//
// Each is a standalone closure matching the wrap contract:
// wrap_init: Fn(&N, &dyn Fn(&N) -> H) -> H
// wrap_accumulate: Fn(&mut H, &R, &dyn Fn(&mut H, &R))
// wrap_finalize: Fn(&H, &dyn Fn(&H) -> R) -> R
/// Hooks into init: called once per node, before children.
/// Logs the task name, then delegates to the original init.
fn visit_logger(sink: Arc<Mutex<Vec<String>>>)
-> impl Fn(&Task, &dyn Fn(&Task) -> u64) -> u64
{
move |task: &Task, orig: &dyn Fn(&Task) -> u64| {
sink.lock().unwrap().push(task.name.clone());
orig(task)
}
}
/// Hooks into accumulate: conditionally skips small children.
/// By not calling orig, the child result is never folded in.
fn skip_small_children(threshold: u64)
-> impl Fn(&mut u64, &u64, &dyn Fn(&mut u64, &u64))
{
move |heap: &mut u64, child: &u64, orig: &dyn Fn(&mut u64, &u64)| {
if *child >= threshold { orig(heap, child); }
}
}
/// Hooks into finalize: clamps the result.
fn clamp_at(max: u64)
-> impl Fn(&u64, &dyn Fn(&u64) -> u64) -> u64
{
move |heap: &u64, orig: &dyn Fn(&u64) -> u64| orig(heap).min(max)
}
/// zipmap contract: a plain Fn(&R) -> Extra. No wrapping needed —
/// the function itself IS the feature. zipmap calls it per node,
/// pairing the original result with the derived value: R → (R, Extra).
fn classify(total: &u64) -> &'static str {
match *total {
t if t >= 500 => "critical",
t if t >= 200 => "heavy",
_ => "light",
}
}
// ── Graph transformations ───────────────────────────────
fn only_costly_deps(g: &Treeish<Task>, min_cost: u64) -> Treeish<Task> {
let inner = g.clone();
treeish(move |task: &Task| {
inner.at(task)
.filter(|child: &Task| child.cost_ms >= min_cost)
.collect_vec()
})
}
// ── Tests ───────────────────────────────────────────────
#[test]
fn test_visit_logger() {
let (graph, root) = setup();
let visited = Arc::new(Mutex::new(Vec::new()));
let fold = base_fold().wrap_init(visit_logger(visited.clone()));
let total = FUSED.run(&fold, &graph, &root);
let names: Vec<String> = visited.lock().unwrap().clone();
assert_eq!(total, 800);
assert_snapshot!("visit_logger", format!(
"total={total}, visited: {}", names.join(" → ")
));
}
#[test]
fn test_skip_small_children() {
let (graph, root) = setup();
let fold = base_fold().wrap_accumulate(skip_small_children(200));
let total = FUSED.run(&fold, &graph, &root);
// app(50) + compile(200+typecheck 300) = 550; parse(100) and link(150) skipped
assert_eq!(total, 550);
assert_snapshot!("skip_small", format!("total={total} (small children skipped)"));
}
#[test]
fn test_clamp_at() {
let (graph, root) = setup();
let fold = base_fold().wrap_finalize(clamp_at(500));
let total = FUSED.run(&fold, &graph, &root);
// compile=min(600,500)=500, link=150, app=min(50+500+150,500)=500
assert_eq!(total, 500);
assert_snapshot!("clamp_at", format!("total={total} (clamped at 500)"));
}
#[test]
fn test_classify() {
let (graph, root) = setup();
let (total, category) = FUSED.run(&base_fold().zipmap(classify), &graph, &root);
assert_eq!(total, 800);
assert_eq!(category, "critical");
assert_snapshot!("classify", format!("total={total}, category={category}"));
}
#[test]
fn test_only_costly_deps() {
let (graph, root) = setup();
let filtered = only_costly_deps(&graph, 150);
let total = FUSED.run(&base_fold(), &filtered, &root);
// parse(100) pruned: app(50)+compile(200)+typecheck(300)+link(150) = 700
assert_eq!(total, 700);
assert_snapshot!("only_costly", format!("total={total} (deps with cost < 150 pruned)"));
}
#[test]
fn test_memoize_diamond() {
let reg = Registry::new(&[
("app", 10, &["compile", "link"]),
("compile", 50, &["stdlib"]),
("link", 30, &["stdlib"]),
("stdlib", 200, &[]),
]);
let visit_count = Arc::new(Mutex::new(0u32));
let vc = visit_count.clone();
let map = reg.0.clone();
let graph = treeish(move |task: &Task| {
*vc.lock().unwrap() += 1;
task.deps.iter().filter_map(|d| map.get(d).cloned()).collect()
});
let root = reg.get("app").unwrap().clone();
let total = FUSED.run(&base_fold(), &graph, &root);
let raw_visits = *visit_count.lock().unwrap();
*visit_count.lock().unwrap() = 0;
let cached = memoize_treeish_by(&graph, |t: &Task| t.name.clone());
let total_memo = FUSED.run(&base_fold(), &cached, &root);
let memo_visits = *visit_count.lock().unwrap();
assert_eq!((total, raw_visits), (490, 5));
assert_eq!((total_memo, memo_visits), (490, 4));
assert_snapshot!("memoize", format!(
"raw: total={total} visits={raw_visits}, memo: total={total_memo} visits={memo_visits}"
));
}
#[test]
fn test_composed_pipeline() {
let (graph, root) = setup();
let visited = Arc::new(Mutex::new(Vec::new()));
let pipeline = base_fold()
.wrap_init(visit_logger(visited.clone()))
.wrap_finalize(clamp_at(500))
.zipmap(classify);
let (total, category) = FUSED.run(&pipeline, &graph, &root);
let names: Vec<String> = visited.lock().unwrap().clone();
assert_eq!(total, 500);
assert_eq!(category, "critical");
assert_snapshot!("composed", format!(
"total={total} [{category}], visited: {}", names.join(" → ")
));
}
}
}
Outputs:
total=800, visited: app → compile → parse → typecheck → link
total=550 (small children skipped)
total=500 (clamped at 500)
total=800, category=critical
total=700 (deps with cost < 150 pruned)
raw: total=490 visits=5, memo: total=490 visits=4
total=500 [critical], visited: app → compile → parse → typecheck → link
Module resolution
Lazy dependency resolution via SeedPipeline. A grow function
resolves dependency references (seeds) into modules (nodes), which
may themselves have dependencies. Error handling uses
Either<Error, Valid> — error nodes are leaves with no children.
See Seed-based lazy discovery for
the SeedPipeline API and its internal mechanics.
#![allow(unused)]
fn main() {
//! Minified module resolution — the pattern that motivated hylic.
//! Demonstrates: SeedPipeline for lazy dependency discovery,
//! error handling via Either, and seeds_for_fallible.
#[cfg(test)]
mod tests {
use std::collections::HashMap;
use either::Either;
use hylic_pipeline::prelude::*;
use insta::assert_snapshot;
/// A module has a name and declares dependencies on other modules.
#[derive(Clone, Debug)]
struct Module {
name: String,
deps: Vec<String>,
}
/// A module registry: maps names to module definitions.
struct Registry(HashMap<String, Module>);
impl Registry {
fn new(modules: &[(&str, &[&str])]) -> Self {
Registry(modules.iter().map(|(name, deps)| {
(name.to_string(), Module {
name: name.to_string(),
deps: deps.iter().map(|s| s.to_string()).collect(),
})
}).collect())
}
}
/// Error when a module can't be found.
#[derive(Clone, Debug)]
struct ResolveError(String);
/// Resolution result: either an error or a list of resolved module names.
#[derive(Clone, Debug)]
struct Resolved {
modules: Vec<String>,
errors: Vec<String>,
}
#[test]
fn resolve_modules() {
let registry = Registry::new(&[
("app", &["logging", "config", "ghost"]),
("logging", &["utils"]),
("config", &["utils"]),
("utils", &[]),
// "ghost" is not in the registry — will produce an error
]);
// Node = Either<ResolveError, Module>. `seeds_for_fallible` adapts a
// valid-side edge function so errors produce no seeds.
let seeds_from_node: Edgy<Either<ResolveError, Module>, String> =
seeds_for_fallible(edgy(move |module: &Module| module.deps.clone()));
// grow: dependency name → Either<Error, Module>.
let grow = {
let reg = registry;
move |dep_name: &String| -> Either<ResolveError, Module> {
match reg.0.get(dep_name) {
Some(m) => Either::Right(m.clone()),
None => Either::Left(ResolveError(format!("not found: {}", dep_name))),
}
}
};
let collect: Fold<Either<ResolveError, Module>, Resolved, Resolved> = fold(
|node: &Either<ResolveError, Module>| match node {
Either::Right(m) => Resolved { modules: vec![m.name.clone()], errors: vec![] },
Either::Left(e) => Resolved { modules: vec![], errors: vec![e.0.clone()] },
},
|heap: &mut Resolved, child: &Resolved| {
heap.modules.extend(child.modules.iter().cloned());
heap.errors.extend(child.errors.iter().cloned());
},
|h: &Resolved| h.clone(),
);
let pipeline: SeedPipeline<Shared, Either<ResolveError, Module>, String, Resolved, Resolved> =
SeedPipeline::new(grow, seeds_from_node, &collect);
let result: Resolved = pipeline.run_from_slice(
&FUSED,
&["app".to_string()],
Resolved { modules: vec![], errors: vec![] },
);
assert!(result.modules.contains(&"utils".to_string()));
assert!(result.modules.contains(&"app".to_string()));
assert!(result.errors.contains(&"not found: ghost".to_string()));
assert_snapshot!("resolution", format!(
"resolved: [{}], errors: [{}]",
result.modules.join(", "),
result.errors.join(", "),
));
}
}
}
Output:
resolved: [app, logging, utils, config, utils], errors: [not found: ghost]
Case study — Explainer
explainer_lift is a ShapeLift constructor that wraps a fold
with per-node trace recording. It’s a useful case study because
it changes H and R (not N), composes as a post-lift, and
produces a result type that lets callers inspect the full
computation tree.
What it does
#![allow(unused)]
fn main() {
pub fn explainer_lift<N, H, R>()
-> ShapeLift<Shared, N, H, R,
N,
ExplainerHeap<N, H, ExplainerResult<N, H, R>>,
ExplainerResult<N, H, R>>
where N: Clone + Send + Sync + 'static,
H: Clone + Send + Sync + 'static,
R: Clone + Send + Sync + 'static,
{
let fold_xform: <Shared as ShapeCapable<N>>::FoldXform<
H, R, N,
ExplainerHeap<N, H, ExplainerResult<N, H, R>>,
ExplainerResult<N, H, R>,
> = Arc::new(move |f: Fold<N, H, R>| {
let f1 = f.clone();
let f2 = f.clone();
let f3 = f;
sfold::fold(
move |n: &N| ExplainerHeap::new(n.clone(), f1.init(n)),
move |heap: &mut ExplainerHeap<N, H, ExplainerResult<N, H, R>>,
child: &ExplainerResult<N, H, R>| {
f2.accumulate(&mut heap.working_heap, &child.orig_result);
heap.transitions.push(ExplainerStep {
incoming_result: child.clone(),
resulting_heap: heap.working_heap.clone(),
});
},
move |heap: &ExplainerHeap<N, H, ExplainerResult<N, H, R>>| ExplainerResult {
orig_result: f3.finalize(&heap.working_heap),
heap: heap.clone(),
},
)
});
ShapeLift::new(
<Shared as ShapeCapable<N>>::identity_treeish_xform(),
fold_xform,
)
}
}
The lift wraps:
- H becomes ExplainerHeap<N, H, ExplainerResult<N, H, R>>: the original H plus a vector of per-child transitions recorded during accumulate.
- R becomes ExplainerResult<N, H, R>: the original result plus the full heap (so callers can walk the trace tree).
Every node’s finalize produces both the original R and the recorded history.
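The mechanism can be seen in miniature without any of the library's types — a hand-rolled sketch (the Traced struct is hypothetical, a much-simplified stand-in for ExplainerHeap/ExplainerResult) of a sum fold whose result carries its accumulation history:

```rust
struct N { val: u64, children: Vec<N> }

// Stand-in for the wrapped result: original value plus the running
// heap recorded after each child accumulates (the "transitions").
struct Traced {
    result: u64,
    transitions: Vec<u64>,
}

fn traced_sum(n: &N) -> Traced {
    let mut heap = n.val;
    let mut transitions = Vec::new();
    for c in &n.children {
        let child = traced_sum(c);
        heap += child.result;       // the original accumulate
        transitions.push(heap);     // record the state it produced
    }
    Traced { result: heap, transitions }
}

fn main() {
    let root = N { val: 1, children: vec![N { val: 2, children: vec![] }] };
    let t = traced_sum(&root);
    assert_eq!(t.result, 3);        // same total as the plain fold
    assert_eq!(t.transitions, [3]); // one accumulate step recorded at the root
}
```

The real lift additionally nests each child's full trace inside the parent's transitions, which is what lets callers walk the whole computation tree.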
Usage
Via the sugar method .explain() on any Stage-2 pipeline — a
treeish-rooted Stage2Pipeline, a seed-rooted Stage2Pipeline,
or a TreeishPipeline via auto-lift. A SeedPipeline requires
an explicit .lift() first:
#![allow(unused)]
fn main() {
#[test]
fn explainer_usage() {
use hylic_pipeline::prelude::*;
#[derive(Clone)]
struct N { val: u64, children: Vec<N> }
let f: Fold<N, u64, u64> = fold(
|n: &N| n.val,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h,
);
let root = N { val: 1, children: vec![N { val: 2, children: vec![] }] };
let trace: ExplainerResult<N, u64, u64> =
TreeishPipeline::new(treeish(|n: &N| n.children.clone()), &f)
.lift()
.then_lift(Shared::explainer_lift::<N, u64, u64>())
.run_from_node(&FUSED, &root);
assert_eq!(trace.orig_result, 3);
}
}
The return type is ExplainerResult<N', H, R> where N' is the
chain’s current node type — N on a treeish-rooted chain, but
SeedNode<N> on a seed-rooted chain (since the seed chain’s
node type is SeedNode<N> from .lift() onward). Access
.orig_result for the original computation’s output:
#![allow(unused)]
fn main() {
#[test]
fn explainer_orig_result() {
use hylic_pipeline::prelude::*;
#[derive(Clone)]
struct Node { v: u64, ch: Vec<Node> }
let root = Node { v: 3, ch: vec![
Node { v: 2, ch: vec![] },
Node { v: 1, ch: vec![] },
]};
let tp: TreeishPipeline<Shared, Node, u64, u64> = TreeishPipeline::new(
treeish(|n: &Node| n.ch.clone()),
&fold(|n: &Node| n.v, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h),
);
let trace: ExplainerResult<Node, u64, u64> = tp
.explain()
.run_from_node(&FUSED, &root);
// Sum = 3 + 2 + 1 = 6.
assert_eq!(trace.orig_result, 6);
// Every non-leaf records its child-accumulations.
assert!(!trace.heap.transitions.is_empty());
}
}
Sealed view on the seed path
For an N-typed view of the trace that hides SeedNode entirely,
project via the standard From conversion:
use hylic::prelude::SeedExplainerResult;
let raw: ExplainerResult<SeedNode<N>, H, R> =
pipeline.lift().explain().run_from_slice(&FUSED, &seeds, h0);
let sealed: SeedExplainerResult<N, H, R> = raw.into();
// sealed.entry_initial_heap, entry_working_heap, orig_result — EntryRoot row promoted out
// sealed.roots: Vec<ExplainerResult<N, H, R>> — per-seed subtrees
Use raw when you need to keep composing lifts on top of
.explain() (the chain type is what matters); use sealed when
you want an N-typed view for formatting or assertions — the
library’s invariant guarantees every below-root node is a
Node(n), so the unwrap is total.
Composing with other lifts
Because explain() is just a then_lift(Shared::explainer_lift()),
it composes:
let r = pipeline
.wrap_init(|n, orig| orig(n) * 2) // first lift
.explain() // records the wrap_init results
.zipmap(|r| r.orig_result > 100); // inspect .orig_result
Order matters: lifts run bottom-up (the first .wrap_init runs
innermost; .explain sees its results; .zipmap sees the
ExplainerResult).
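The same ordering holds on plain closures — a sketch of bottom-up composition (not the lift machinery itself), where the first wrap sits innermost and each later wrap sees the previous one's output:

```rust
fn base(x: u64) -> u64 { x + 1 }

// First wrap: runs innermost, directly over base.
fn doubled(x: u64) -> u64 { base(x) * 2 }

// Second wrap: sees doubled's output, analogous to .explain()
// observing the wrap_init'd fold beneath it.
fn clamped(x: u64) -> u64 { doubled(x).min(10) }

fn main() {
    assert_eq!(clamped(3), 8);  // (3 + 1) * 2
    assert_eq!(clamped(9), 10); // (9 + 1) * 2 = 20, then clamped
}
```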
Streaming variant
Shared::explainer_describe_lift(fmt, emit) emits formatted
trace lines per node via a callback and leaves MapR = R
unchanged:
use hylic::prelude::*;
let _ = Shared::explainer_describe_lift::<Node, u64, u64, _, _>(
trace_fold_compact::<Node, u64, u64>,
|line: &str| eprintln!("[trace] {line}"),
);
A Local mirror of the streaming variant is deferred (blocked on the
formatter's Send + Sync bound); the plain explainer_lift remains
available for Local.
Import patterns
Two preludes cover everything most users need:
- hylic::prelude::* — core: domain markers (Shared/Local/Owned), Shared-default Fold and Treeish constructors, executor helpers (FUSED, exec), every lift atom (Lift, IdentityLift, ComposedLift, ShapeLift, SeedLift, LiftBare, SeedNode), and explainer/format helpers.
- hylic_pipeline::prelude::* — re-exports the core prelude plus pipeline typestates (SeedPipeline, TreeishPipeline, Stage2Pipeline, OwnedPipeline), source traits (TreeishSource, PipelineExec, PipelineExecOnce, PipelineSourceOnce), and the sugar trait families (SeedSugars*, TreeishSugars*, Stage2Sugars*).
A complete program — fold + graph + executor — needs exactly one prelude line:
#![allow(unused)]
fn main() {
use hylic::prelude::*;
let fold = fold(|n: &i32| *n as u64,
|h: &mut u64, c: &u64| *h += c,
|h: &u64| *h);
let graph = treeish(|n: &i32| if *n > 1 { vec![n - 1, n - 2] } else { vec![] });
let total = FUSED.run(&fold, &graph, &5);
}
FUSED is the sequential executor, available as a const on the
Shared domain. fold and treeish are the Shared-default
constructors — for Local or Owned, take the per-domain path
(below).
Switching domains
For Local or Owned construction, address the domain module
directly. The closures don’t change; only the constructor and the
executor binding do:
#![allow(unused)]
fn main() {
#[test]
fn domain_switching() {
use hylic::domain::{shared as sdom, local as ldom, owned as odom};
use hylic::graph::treeish_visit;
#[derive(Clone)]
struct N { val: u64, children: Vec<N> }
// Same closures used to build a fold in each domain.
let init = |n: &N| n.val;
let acc = |h: &mut u64, c: &u64| *h += c;
let fin = |h: &u64| *h;
fn children(n: &N, cb: &mut dyn FnMut(&N)) {
for c in &n.children { cb(c); }
}
let root = N { val: 1, children: vec![N { val: 2, children: vec![] }] };
// Shared (Arc):
let r1: u64 = sdom::FUSED.run(
&sdom::fold(init, acc, fin),
&treeish_visit(children),
&root,
);
// Local (Rc):
let r2: u64 = ldom::FUSED.run(
&ldom::fold(init, acc, fin),
&treeish_visit(children),
&root,
);
// Owned (Box):
let r3: u64 = odom::FUSED.run(
&odom::fold(init, acc, fin),
&treeish_visit(children),
&root,
);
assert_eq!(r1, r2);
assert_eq!(r2, r3);
}
}
Parallel execution
Funnel comes in through the prelude as the funnel module:
#![allow(unused)]
fn main() {
use hylic::prelude::*;
let total = exec(funnel::Spec::default(8)).run(&fold, &graph, &root);
}
Spec presets (default, for_wide_light, for_deep_narrow, …)
are documented in Funnel policies. For
amortised pool reuse across many folds, use
.session(|s| s.run(...)).
Pipeline programs
Pipelines layer on the same imports — switch to the pipeline prelude:
#![allow(unused)]
fn main() {
use hylic_pipeline::prelude::*;
}
That single line brings the core prelude with it; users do not
import hylic::prelude separately. From there, every Stage-1
constructor (SeedPipeline::new, TreeishPipeline::new,
OwnedPipeline::new) and every sugar (.lift(), .then_lift(…),
.zipmap(…), .wrap_init(…), .explain(), .run(…)) is in
scope.
A full pipeline example is at the end of Pipelines — overview.
When you need bare module paths
The preludes cover normal usage. The bare module paths are useful for
- Generic code over executors or operations:
#![allow(unused)]
fn main() {
use hylic::ops::{FoldOps, TreeOps};
use hylic::exec::Executor;
}
- Per-domain primitives (e.g. when you keep hylic::prelude and want Local constructors visible at the same names): import the domain module under an alias —
#![allow(unused)]
fn main() {
use hylic::domain::local as ldom;
let lf = ldom::fold(|n: &i32| *n as u64, |h: &mut u64, c: &u64| *h += c, |h: &u64| *h);
ldom::FUSED.run(&lf, &graph, &root);
}
- Helpers under hylic::prelude::* that come in via the wildcard but are sometimes worth naming explicitly — Traced, memoize_treeish, VecFold, vec_fold, the explainer trace formatters, TreeFormatCfg. Reach for the qualified path when the call site benefits from the extra signal at a glance.
Module map
Concept map
How the pieces fit together.
The three axes
hylic is built on three orthogonal axes. Each can be chosen independently:
Operations define the computation. Domain determines boxing overhead. Executor determines traversal strategy. Any combination works (subject to domain compatibility).
Type landscape
How a user navigates
The prelude defaults to the Shared domain — fold(…) /
treeish(…) / FUSED resolve to the Shared constructors. For
Local or Owned, take the per-domain path; see
Import patterns.
Domain compatibility matrix
| | Shared | Local | Owned |
|---|---|---|---|
| Fused | yes | yes | yes |
| Funnel | yes | — | — |
| Explainer | yes | yes | — |
| Pipeline (Stage 1 + 2) | yes | yes | — |
| OwnedPipeline | — | — | yes |
Fused supports all domains (borrows, never clones). Funnel requires
N: Clone + Send, R: Send, G: Send + Sync — the Shared domain
provides these. The Stage-1/Stage-2 pipeline surface is mirrored
across Shared and Local (SeedSugarsShared / SeedSugarsLocal,
Stage2SugarsShared / Stage2SugarsLocal, …); the Owned domain
gets the dedicated one-shot OwnedPipeline instead, since its
closures consume on first use and cannot back a Clone-able
chain.
Zero-boxing path
For maximum performance, skip the domain system entirely.
Implement FoldOps and TreeOps on concrete structs — the
compiler monomorphises every call, eliminating the dyn Fn
indirections. See
Zero-cost performance
for the worked walkthrough.
Domain system
A domain controls how fold closures are stored — the boxing strategy
that determines the refcount overhead, thread-safety, and
transformation semantics of a Fold<N, H, R>. Graph types are
domain-independent and live in a separate module (hylic::graph).
The three domains
The three built-in domains form a spectrum from maximum capability to minimum overhead:
| Domain | Fold storage | Clone | Send+Sync | Fold transforms | Executors |
|---|---|---|---|---|---|
| Shared | Arc<dyn Fn + Send + Sync> | yes | yes | borrow (&self) | Fused, Funnel |
| Local | Rc<dyn Fn> | yes | no | borrow (&self) | Fused |
| Owned | Box<dyn Fn> | no | no | move (self) | Fused |
The domain affects only the fold. Graph types (Treeish, Edgy,
Graph) are always Arc-based because graph composition requires
Clone. The executor accepts any graph type that implements
TreeOps<N> — the graph’s storage is checked at the call site, not
through the domain.
Module structure
Each domain module provides fold constructors and executor bindings. Graph types are in a separate public module:
domain/
shared/fold.rs Fold (Arc) + fold(), exec(), FUSED
local/mod.rs Fold (Rc) + fold(), exec(), FUSED
owned/mod.rs Fold (Box) + fold(), exec(), FUSED
graph/
edgy.rs Edgy<N,E>, Treeish<N> (Arc) + combinators
compose.rs Graph
A typical program imports the prelude — every domain marker, the Shared-default constructors, and the graph constructors come with it:
#![allow(unused)]
fn main() {
use hylic::prelude::*;
}
For Local or Owned construction, address the per-domain module
directly (hylic::domain::local, hylic::domain::owned) — see
Import patterns.
The Domain trait
The Domain trait provides a single associated type — the concrete
Fold type for each domain:
#![allow(unused)]
fn main() {
pub trait Domain<N: 'static>: 'static {
type Fold<H: 'static, R: 'static>: FoldOps<N, H, R>;
type Graph<E: 'static> where E: 'static;
type Grow<Seed: 'static, NOut: 'static>;
/// Construct a fold from three closures. Uniform Send+Sync
/// bound; each domain sheds Send+Sync at storage time if it
/// doesn't need it.
fn make_fold<H: 'static, R: 'static>(
init: impl Fn(&N) -> H + Send + Sync + 'static,
acc: impl Fn(&mut H, &R) + Send + Sync + 'static,
fin: impl Fn(&H) -> R + Send + Sync + 'static,
) -> Self::Fold<H, R>;
/// Construct a grow closure from a Fn. Uniform Send+Sync bound.
fn make_grow<Seed: 'static, NOut: 'static>(
f: impl Fn(&Seed) -> NOut + Send + Sync + 'static,
) -> Self::Grow<Seed, NOut>;
/// Invoke a stored grow closure.
fn invoke_grow<Seed: 'static, NOut: 'static>(
g: &Self::Grow<Seed, NOut>,
s: &Seed,
) -> NOut;
/// Construct a graph (Edgy) closure. Uniform Send+Sync bound.
fn make_graph<E: 'static>(
visit: impl Fn(&N, &mut dyn FnMut(&E)) + Send + Sync + 'static,
) -> Self::Graph<E>;
}
}
The Executor trait is parameterized by D: Domain<N>, so the
compiler resolves D::Fold<H, R> to the concrete fold type at
monomorphization time. The graph type is a separate type parameter
G: TreeOps<N> on the Executor trait, constrained per executor
implementation (Fused accepts any G; Funnel requires G: Send+Sync).
#![allow(unused)]
fn main() {
/// Run a fold on a tree. Both Specs and Sessions implement this.
///
/// The fold is domain-specific (`D::Fold<H, R>`). The graph type G
/// is a trait-level parameter — each executor impl declares its own
/// bounds on G (e.g. Fused accepts any TreeOps, Funnel requires
/// Send+Sync). The compiler checks G at the call site.
pub trait Executor<N: 'static, R: 'static, D: Domain<N>, G: TreeOps<N> + 'static> {
/// Run the given `fold` over the `graph` starting at `root` and
/// return the fold's final result for the root.
fn run<H: 'static>(&self, fold: &D::Fold<H, R>, graph: &G, root: &N) -> R;
}
}
FoldOps and TreeOps
The operations traits provide the universal interface that executors program against:
Any type implementing init/accumulate/finalize is a fold. Any
type implementing visit is a graph. The executor’s recursion engine
operates on these traits, not on concrete types.
Why the domain is on the executor
Fold<N, H, R> has no domain parameter — the domain is a type
parameter on the executor: Exec<D, S>. This resolves a type
inference problem: GATs are not injective (D::Fold<H, R> does not
uniquely identify D), so placing the domain on the fold would
prevent the compiler from inferring the domain from the argument
types. With D on the executor, each constant (shared::FUSED,
local::FUSED, owned::FUSED) or <domain>::exec(...) call fixes
D, and the compiler resolves everything statically. See
Domain integration.
Choosing a domain
Shared is the default choice. It supports parallel execution (Funnel requires Send+Sync folds), lift integration (Explainer operates on Shared folds), and non-destructive fold transformations (the original fold is preserved after map/contramap/product).
Local provides the same transformation API with lighter refcounting (Rc vs Arc — non-atomic vs atomic increment). It works with the Fused executor for single-threaded computation.
Owned eliminates refcounting entirely. Fold transformations consume the original (move semantics). Useful for measuring the framework’s raw overhead in benchmarks, or for single-use folds where the original is not needed after transformation.
All three domains provide the same fold combinator surface:
wrap_init, wrap_accumulate, wrap_finalize, map, zipmap,
contramap, product. The difference is in the calling convention
(borrow vs move) and the auto-traits (Send+Sync vs neither).
Implementation notes
Technical specifics of how hylic stores closures, traverses graphs, and erases types across the lift family.
Closure storage
The three functions in a Fold<N, H, R> (init, accumulate,
finalize) are stored as type-erased closures behind Arc:
#![allow(unused)]
fn main() {
pub struct Fold<N, H, R> {
pub(crate) impl_init: Arc<dyn Fn(&N) -> H + Send + Sync>,
pub(crate) impl_accumulate: Arc<dyn Fn(&mut H, &R) + Send + Sync>,
pub(crate) impl_finalize: Arc<dyn Fn(&H) -> R + Send + Sync>,
}
}
Type erasure (dyn Fn) means every Fold produced by
map/zipmap shares the concrete type Fold<N, H, R>, so
combinators compose without per-lift type explosion.
Arc is required because Fold is Clone (the lift layer
clones it once per phase closure). Box<dyn Fn> is not Clone;
Arc<dyn Fn> increments a refcount.
The Local domain uses Rc<dyn Fn> (lighter refcount,
single-threaded). The Owned domain uses Box<dyn Fn> (no
refcount, no Clone, single-shot).
Fold, Edgy, Graph, SeedPipeline, and related types
implement Clone by hand. A derived Clone would constrain
type parameters to Clone, which the contained Arc/Edgy/
Fold already cover without that bound.
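The Arc-versus-derive point can be seen in a few lines of plain std Rust. This is a hypothetical `MiniFold`, not hylic's actual struct — it keeps only an `init` slot, but shows why a hand-written `Clone` imposes no bounds on the type parameters:

```rust
use std::sync::Arc;

// Hypothetical mini-fold mirroring the storage described above;
// illustrative only, not hylic's actual type.
struct MiniFold<N, H> {
    init: Arc<dyn Fn(&N) -> H + Send + Sync>,
}

// Hand-written Clone: bumps the Arc refcount only, so neither N nor H
// needs a Clone bound (a derived Clone would demand `N: Clone, H: Clone`).
impl<N, H> Clone for MiniFold<N, H> {
    fn clone(&self) -> Self {
        MiniFold { init: Arc::clone(&self.init) }
    }
}

fn main() {
    struct NotClone(u64); // node type that is deliberately not Clone
    let fold = MiniFold { init: Arc::new(|n: &NotClone| n.0 * 2) };
    let copy = fold.clone(); // clones the Arc, not the closure
    assert_eq!((copy.init)(&NotClone(21)), 42);
}
```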
Graph traversal
Edgy<N, E> (and Treeish<N> = Edgy<N, N>) stores a
callback-based visit closure:
#![allow(unused)]
fn main() {
/// A `NodeT → EdgeT*` function wrapped as a clonable Arc-backed
/// struct. When `NodeT = EdgeT` the type is typically named
/// [`crate::graph::Treeish`].
pub struct Edgy<NodeT, EdgeT> {
impl_visit: Arc<dyn Fn(&NodeT, &mut dyn FnMut(&EdgeT)) + Send + Sync>,
}
}
The signature is Fn(&N, &mut dyn FnMut(&E)). Children are
visited by reference; no allocation per traversal. When a Vec
is needed (parallel iteration, for instance), apply()
collects via the callback. Edgy::at(node) returns a Visit —
a zero-allocation push-based iterator with map, filter,
fold, collect_vec.
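A minimal sketch of the callback signature in plain std Rust (assumed shapes, not hylic's `Edgy`): children are handed to a `FnMut` sink by reference, so walking one layer allocates nothing.

```rust
// Walk one layer of children via the callback; no Vec is built.
fn count_children<N>(visit: &dyn Fn(&N, &mut dyn FnMut(&N)), node: &N) -> usize {
    let mut n = 0;
    visit(node, &mut |_child| n += 1);
    n
}

fn main() {
    struct Dir { children: Vec<Dir> }
    // The visit closure borrows children in place.
    let visit = |d: &Dir, f: &mut dyn FnMut(&Dir)| {
        for c in &d.children { f(c) }
    };
    let root = Dir {
        children: vec![Dir { children: vec![] }, Dir { children: vec![] }],
    };
    assert_eq!(count_children(&visit, &root), 2);
}
```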
The Lift trait
Lift<N, N2> has two GATs: MapH<H, R> and MapR<H, R>.
H and R are method-level parameters on lift_fold<H, R>,
not trait-level parameters, so they’re inferred from the fold
at each call site. The trait is a bifunctor on the (H, R)
pair.
Concrete lifts implement Lift directly as structs.
Explainer is a unit struct; SeedLift carries a grow
function and is used internally by SeedPipeline. Automatic
composition is provided by a blanket ComposedLift impl — no
per-lift boilerplate.
ConstructFold: domain-generic fold construction
ConstructFold<N> constructs a D::Fold<H, R> from three
closures, generic over D. Each domain implements it with its
own storage strategy: Shared wraps in Arc, Local in Rc.
Shared’s fold constructor requires closures to be Send + Sync,
but the trait signature is uniform across domains. make_fold
is therefore an unsafe fn with a documented contract — for the
Shared impl, callers must pass closures that are actually
Send + Sync. The Shared impl uses AssertSend<T> (an
unsafe-marked Send+Sync wrapper) with method-call capture
(.get()) to satisfy the compiler.
The method-call pattern matters under Rust 2021 precise
captures: (wrapper.0)(n) captures the inner field (and
bypasses the Send assertion); wrapper.get()(n) captures the
whole wrapper.
Reserved for downstream lift implementations that need domain-generic fold construction without going through the typestate pipeline.
Module visibility
graph/ is pub — it holds the domain-independent graph
types (Edgy, Treeish, Graph) that every other module
imports. fold/ is pub(crate) and contains
domain-independent combinator functions used by the per-domain
Fold implementations.
Each domain owns its Fold type in
domain/{shared,local,owned}/fold.rs. exec/ and ops/ are
pub — exec for executors (Executor, Exec, fused,
funnel); ops for the operations traits (FoldOps,
TreeOps) and the lift atoms (Lift, ShapeLift,
SeedLift, …).
The prelude module
Types in prelude/ are built on the core but optional to use:
- VecFold / VecHeap — convenience fold that collects all children before finalizing.
- Explainer — computation tracing as a Lift.
- TreeFormatCfg — tree-to-string formatting.
- Traced — path tracking for tree nodes.
- memoize_treeish — graph-level caching for DAGs.
- seeds_for_fallible — fallible seed pattern for Either<Error, Valid> graphs.
Sibling crates
The following subsystems live in sibling crates and are documented in their own source:
- hylic-benchmark — Rayon executor, Sequential executor, benchmark scenarios and runners.
- hylic-pipeline — typestate builder over
hylic’s lift primitives. See Pipelines.
Theory notes
hylic implements patterns from the theory of recursion schemes, adapted for Rust’s type system. This page maps hylic’s types to their formal names.
Catamorphism
A catamorphism is a bottom-up fold over a recursive structure. The
algebra is F R → R — given one layer of structure with children
already folded to R, produce R. The carrier type R is the
result at every subtree.
hylic factors this algebra into three steps through an intermediate
type H:
F R → R = init(&N) → H, accumulate(&mut H, &R) per child, finalize(&H) → R
H is mutable working state internal to each node. R is the
immutable result that flows between nodes. The bracket
(init opens H, finalize closes to R) makes the invariant
boundary explicit. See
The N-H-R algebra factorization for the comparison
with Milewski’s monoidal decomposition and the equivalence under
associative ⊕.
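The factorization can be exercised directly with a hand-rolled recursion over a toy tree — the same three closures an executor would drive, here under assumed names in plain std Rust:

```rust
// A toy node: its own size plus children, as in the intro example.
struct Node { size: u64, children: Vec<Node> }

// Hand-rolled N-H-R catamorphism (illustrative, not hylic's engine).
fn cata(
    n: &Node,
    init: &impl Fn(&Node) -> u64, // &N -> H: open the bracket
    acc: &impl Fn(&mut u64, &u64), // &mut H, &R: fold one child result in
    fin: &impl Fn(&u64) -> u64,    // &H -> R: close the bracket
) -> u64 {
    let mut heap = init(n);
    for c in &n.children {
        let r = cata(c, init, acc, fin); // child result R
        acc(&mut heap, &r);              // R flows into H, H stays local
    }
    fin(&heap)
}

fn main() {
    let t = Node { size: 10, children: vec![
        Node { size: 200, children: vec![] },
        Node { size: 32, children: vec![] },
    ]};
    let total = cata(&t, &|n| n.size, &|h, r| *h += r, &|h| *h);
    assert_eq!(total, 242);
}
```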
Hylomorphism
When the tree structure is not materialized but discovered on demand
(via a Treeish backed by lazy child discovery), the unfold
(anamorphism) and fold (catamorphism) fuse — the tree exists only as
a call stack, never as a data structure. This is a hylomorphism.
In hylic, every Exec::run() call is a hylomorphism: the executor
receives a coalgebra (Treeish<N>, which produces children on
demand) and an algebra (FoldOps<N, H, R>, which consumes them),
and fuses both in a single recursive pass. (N, Treeish<N>) is
hylic’s runtime equivalent of the type-level Fix (f a) — the pair
describes a root and a way to get children, recursively, without
materializing the tree.
The Funnel executor parallelizes the hylomorphism using CPS and defunctionalized tasks.
Anamorphism (seed-based discovery)
An anamorphism builds recursive structure from a seed.
SeedPipeline encapsulates this:
given a seed edge function (Edgy<N, Seed>) and a grow function
(Fn(&Seed) → N), it constructs the treeish by composing
seeds_from_node.map(grow) and handles the entry transition.
Internally, SeedLift implements Lift to express the
SeedNode<N> indirection as a fold transformation.
Histomorphism (fold with history)
The Explainer records the full computation trace at every node —
initial heap, each child result folded in, and the final result.
This corresponds to a histomorphism: a catamorphism where each node
has access to the full computation history of its subtree.
The Explainer’s output (ExplainerResult) is analogous to the
cofree comonad annotation. It is expressed as a
Lift — a fold transformation that changes the
carrier types (H → ExplainerHeap, R → ExplainerResult). The
original R is accessible via ExplainerResult::orig_result.
Algebra morphism (Lift)
Lift<N, N2> maps one fold algebra into another. It
transforms the carrier types through two GATs (MapH<H, R>,
MapR<H, R>) and can change the node type (N → N2) by extending
the tree structure with new constructors.
The SeedLift extends the tree with an entry-root constructor
(SeedNode<N>: EntryRoot, Node) — the EntryRoot row’s children
are the per-seed grown nodes. The Explainer enriches the heap with trace data
without changing the node type. Both are algebra morphisms: they
transform the F R → R algebra into a different F' R' → R'
algebra over a richer domain.
lift::run_lifted applies the three trait methods (lift_treeish,
lift_fold, lift_root), runs the lifted computation, and returns
MapR<H, R>.
Externalized tree structure
Classical recursion schemes encode tree structure via fixed points
of functors (Fix F). The functor F defines one layer of shape
(leaf, binary node, n-ary node), and Fix F is the recursive
nesting.
hylic externalizes this as a runtime function: Treeish<N> is
Fn(&N, &mut dyn FnMut(&N)). The node type N carries identity,
not structure — the same N can be traversed by different treeish
functions, and the same fold works with any tree shape. This
trades compile-time structural guarantees for the orthogonal
decomposition of fold, graph, and executor.
The pair (N, Treeish<N>) corresponds to a coalgebra N → F N —
the treeish IS the coalgebra, producing one layer of children on
demand. Combined with the fold algebra, the executor performs a
fused hylomorphism.
Operations traits and domain abstraction
FoldOps<N, H, R> and TreeOps<N> abstract the fold and graph
operations from their storage. The standard types (Fold, Treeish)
store closures behind Arc (the Shared domain). Alternative
implementations can use Rc (Local), Box (Owned), or concrete structs
(zero-boxing). The executor’s recursion engine takes &impl FoldOps + &impl TreeOps — fully generic over the storage, monomorphized to
zero overhead for concrete types.
The Domain trait with GATs maps the marker type (Shared, Local,
Owned) to concrete fold types. Graph types are domain-independent —
always Arc-based, always Send + Sync. The domain controls only
how fold closures are stored.
Further reading
- Meijer, Fokkinga, Paterson. Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire. (1991) — the original recursion schemes paper.
- Milewski. Monoidal Catamorphisms (2020) — a different algebra factorization. See comparison.
- Gonzalez. foldl — the left-fold-with-extraction pattern.
- Kmett. recursion-schemes — Haskell reference implementation.
- Malick. recursion.wtf — practical recursion schemes in Rust.
The N-H-R algebra factorization
A catamorphism’s algebra collapses one layer of recursive structure.
The standard formulation is a single morphism F R → R. Both hylic
and Milewski’s monoidal catamorphism
factor this morphism into composable steps. They factor it
differently. This page establishes the precise relationship between
the two and shows when one can be derived from the other.
The two formulations
| | hylic | Milewski |
|---|---|---|
| Extract | init: &N → H | s: a → m (scatter) |
| Combine | acc: &mut H, &R | ⊕: m × m → m (monoid) |
| Output | fin: &H → R (every node) | g: m → b (root only) |
| Working type | H (unconstrained) | m (associative, with identity) |
| Carrier | R | m |
In hylic, the carrier is R. Every subtree produces R. In
Milewski, the carrier is m. Every subtree produces m, and a
separate function g converts to the output type b once at the
root.
The bracket
At each node, init opens mutable working state H, accumulate
folds each child’s R into it, and finalize closes it to R.
The heap H never crosses node boundaries. Only R flows between
nodes.
Green is H-world (mutable working state). Blue is R (immutable
result). The green-to-blue transition at each node is the finalize
step, the bracket closing.
The node type N seeds the heap but is not part of the algebra. It
is the node’s identity; the recursive structure lives in
Treeish<N>, not in N. The pair
(N, Treeish<N>) is hylic’s runtime equivalent of Milewski’s
type-level Fix (f a).
The bracket separates mutable working state from immutable results.
H can be a growable Vec while R is a frozen Arc<[T]>, for
example. Without the bracket, the user would either accumulate into
Arc (expensive reallocation on every push) or return Vec as the
result (wrong invariant for the parent, which expects immutable
data). The Rust type system reinforces this: &mut H is
single-owner and never shared, while R can be Send and cross
thread boundaries. The Funnel executor
exploits this directly. R values are delivered across threads via
slot delivery; H stays on the
sweeping thread. For single-child nodes, the bracket is carried as
a direct continuation with no
allocation and no atomic. Each phase can be wrapped
independently via wrap_init,
wrap_accumulate, wrap_finalize.
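A std-only sketch of that H ≠ R case (names assumed): the heap is a growable `Vec<u32>`, the result a frozen `Arc<[u32]>`, and finalize is the freezing step at the bracket close.

```rust
use std::sync::Arc;

// Assumed toy carrier types: mutable working state vs immutable result.
type Heap = Vec<u32>; // H: growable, single-owner, &mut only
type Res = Arc<[u32]>; // R: frozen, cheap to share across threads

fn init(label: &u32) -> Heap { vec![*label] }                  // &N -> H
fn accumulate(h: &mut Heap, r: &Res) { h.extend_from_slice(r) } // &mut H, &R
fn finalize(h: &Heap) -> Res { h.clone().into() }               // freeze: H -> R

fn main() {
    let mut heap = init(&1);
    let child: Res = finalize(&vec![2, 3]); // a child's already-frozen result
    accumulate(&mut heap, &child);          // Vec push: no Arc reallocation
    assert_eq!(&finalize(&heap)[..], &[1, 2, 3]);
}
```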
The monoidal form
In Milewski’s decomposition, the working type m is a monoid
(associative binary operation ⊕ with identity ε). A Fold(s, g)
pairs a scatter function s: a → m with a gather function
g: m → b. An MAlgebra provides the structural combination rule,
combining one layer of the functor using only ⊕ and ε.
The catamorphism cat malg (Fold s g) = g ∘ cata (malg ∘ bimap s id):
the inner cata produces m at every node, and g converts to b once
at the root.
Compare the two diagrams. In the bracket form, every node has a
green-to-blue transition (per-node finalize). In the monoidal form,
green m flows uniformly and the single blue step occurs at the
root.
Relationship
Claim. Milewski’s monoidal catamorphism is a special case of hylic’s N-H-R fold.
Proof. Given a Milewski fold with monoid (m, ⊕, ε), scatter
s: a → m, and gather g: m → b, construct the hylic fold:
H = R = m, init = s, acc = ⊕, fin = identity
At each node, hylic computes acc(acc(init(n), r₁), r₂)
= s(n) ⊕ r₁ ⊕ r₂. This is the value Milewski’s catamorphism
produces at every node. The user applies g to the root result to
obtain b. ∎
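The construction can be checked on the sum monoid in a few lines (illustrative names, plain std Rust):

```rust
// H = R = m with (u64, +, 0); the three hylic phases are:
fn scatter(n: &u64) -> u64 { *n }             // s: a -> m, used as init
fn acc(h: &mut u64, r: &u64) { *h += r }      // ⊕ folded in-place
fn fin(h: &u64) -> u64 { *h }                 // identity

fn main() {
    // Node with value 5 and already-folded children r1 = 3, r2 = 4:
    let mut h = scatter(&5);
    acc(&mut h, &3);
    acc(&mut h, &4);
    // hylic's acc(acc(init(n), r1), r2) equals Milewski's s(n) ⊕ r1 ⊕ r2.
    assert_eq!(fin(&h), 5 + 3 + 4);
}
```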
Conditions for the converse. A hylic fold is expressible as a Milewski monoidal catamorphism when:
- H = R
- fin = identity
- acc is a monoid (associative with identity element)
These make (H, acc, ε) a monoid. The correspondence is then
m = H, s = init, ⊕ = acc, g = identity.
Without these conditions, hylic’s fold is strictly more general. It admits non-associative accumulation and distinct working/result types.
Examples
Folds that satisfy the monoid conditions:
- Sum. H = R = u64, acc = +, fin = id. Addition with identity 0.
- Extend. H = R = Vec<T>, acc = extend, fin = clone. Concatenation with identity vec![]. The filesystem summary uses this.
- Union. H = R = HashSet<K>, acc = union, fin = clone. Associative and commutative.
Folds that do not:
- Child count. acc((s,c), r) = (s+r, c+1). The count tracks immediate children, not descendants. Not associative: (h₁⊕h₂)⊕h₃ yields c+2 while h₁⊕(h₂⊕h₃) yields c+1.
- Bracketed formatting. fin(h) = format!("[{}]", h). Here H ≠ R and regrouping changes the nesting: [a[b]][c] ≠ [a][b[c]].
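The child-count failure is easy to see by writing the combination as a binary op on (sum, count) pairs — the right operand's own count is discarded, which breaks associativity:

```rust
// "Fold the right operand into the left": sum the sums, count one child.
// Illustrative encoding of the child-count accumulate described above.
fn op(a: (u64, u64), b: (u64, u64)) -> (u64, u64) {
    (a.0 + b.0, a.1 + 1)
}

fn main() {
    let (h1, h2, h3) = ((1, 0), (2, 0), (3, 0));
    let left = op(op(h1, h2), h3);  // groups two children under h1
    let right = op(h1, op(h2, h3)); // groups only one
    assert_eq!(left, (6, 2));
    assert_eq!(right, (6, 1));
    assert_ne!(left, right); // regrouping changes the count component
}
```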
Associativity and parallel accumulation
A monoid’s associativity allows the executor to contract adjacent
sibling results in any grouping. If children b and c have completed
but a has not, b ⊕ c can proceed without waiting for a. When a
eventually completes, it combines with the already-contracted result.
For n children, this reduces the accumulation depth from O(n) to
O(log n).
hylic’s Funnel executor does not perform
this contraction. It parallelizes subtree computation (children run
concurrently on different workers) and accumulates their results
left-to-right as the sweep cursor
advances. This is a design choice: sequential accumulation enables
progressive memory freeing, where each child’s R is consumed and
dropped as the cursor passes. It also means the executor imposes no
algebraic requirements on acc. It is up to the user to supply an
appropriate accumulate function, and up to the executor to decide
how results are folded into H.
A lift can recover O(log n) depth when needed: by transforming the tree structure to insert balanced reduction nodes, the contraction becomes a property of the tree shape rather than the algebra.
The general structure
In algebraic terms, acc: H × R → H is an action of R on H.
When H = R and acc is a monoid, this is a monoid acting on
itself, which is Milewski’s formulation. In general, it is an
R-module: R acts on a distinct type H through acc, with
fin: H → R as the projection. A monoid is a module over itself;
a module is not necessarily a monoid.
hylic’s API does not distinguish between these cases. The user
writes init, acc, fin. The executor runs them with sequential
accumulation and parallel subtree computation via
CPS work-stealing.
Composability
hylic’s fold combinators
(product,
map, zipmap,
wrap_*) and graph combinators
(filter,
memoize,
contramap) achieve the same practical
composability as Milewski’s Functor/Applicative on Fold.
Lifts transform both fold and treeish in
sync, changing the carrier types through GATs. The
SeedPipeline uses a lift internally
to bridge coalgebra and algebra when they speak different types.
Bridging coalgebra and algebra: SeedPipeline
A hylomorphism fuses a coalgebra (produce children) with an algebra
(fold results). Both operate on the same type N. In practice, the
dependency structure often speaks a different type. A module
resolver starts with module names (seeds), not parsed modules
(nodes). A grow function resolves one into the other.
The user provides:
grow: Fn(&Seed) → N resolve a reference
seeds_from_node: N → Seed* a node's dependency references
fold: FoldOps<N, H, R> the algebra, defined over N
In hylic, N → Seed* is Edgy<N, Seed>, the general edge
function. N → N* is Treeish<N>, the special case where node and
edge types match.
The coalgebra produces Seed. The algebra consumes N. The
morphism grow: Seed → N bridges them.
SeedPipeline reconciles this
through two combinator chains.
Chain 1: coalgebra composition. Close N → Seed* into
N → N* via .map(grow):
seeds_from_node: Edgy<N, Seed> N → Seed*
.map(grow) Seed → N
= treeish: Edgy<N, N> N → N* (= Treeish<N>)
In code: the (grow, seeds_from_node) pair is fused internally
at run time via Shared::fuse_grow_with_seeds, producing the
Treeish<N> that drives traversal past the entry. The underlying
combinator is Edgy::map — see
hylic/src/graph/edgy.rs.
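Chain 1 can be sketched in plain Rust with functions standing in for Edgy (all names and shapes assumed): mapping grow over each produced seed closes N → Seed* into N → N*.

```rust
type Seed = String;

#[derive(Clone, Debug, PartialEq)]
struct Module { name: String, deps: Vec<Seed> }

// grow: resolve a Seed into a node (toy resolver).
fn grow(s: &Seed) -> Module {
    Module { name: s.clone(), deps: vec![] }
}

// seeds_from_node: N -> Seed*, in callback form.
fn seeds_from_node(m: &Module, f: &mut dyn FnMut(&Seed)) {
    for s in &m.deps { f(s) }
}

// The composed treeish: N -> N*, built by growing each seed in place.
fn treeish(m: &Module, f: &mut dyn FnMut(&Module)) {
    seeds_from_node(m, &mut |seed| f(&grow(seed)))
}

fn main() {
    let root = Module { name: "root".into(), deps: vec!["a".into(), "b".into()] };
    let mut names = vec![];
    treeish(&root, &mut |child| names.push(child.name.clone()));
    assert_eq!(names, ["a", "b"]);
}
```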
Chain 2: entry lifting. The SeedLift constructs a
Treeish<SeedNode<N>> with two variants: Node(n) visits the
original treeish (wrapping children as Node), and EntryRoot
fans out the entry seeds by running grow(seed) on each and
wrapping the result as Node.
The relevant struct and its Lift impl:
#![allow(unused)]
fn main() {
/// The finishing lift that closes a `SeedPipeline`'s grow axis.
/// Composes entry-seed dispatch on top of a `(grow, seeds, fold)`
/// triple and produces a treeish over `SeedNode<N>`. Not
/// user-constructed; assembled internally by
/// `Stage2Pipeline::run` at call time.
///
/// Domain-parametric: storage of the entry-seeds graph and the
/// entry-heap thunk is per-domain via `<D as Domain<()>>::Graph<Seed>`
/// and `<D as ShapeCapable<N>>::EntryHeap<H>`. No hand-rolled
/// domain discriminator.
#[must_use]
pub struct SeedLift<D, N, Seed, H>
where D: ShapeCapable<N> + Domain<()>,
N: 'static, Seed: 'static, H: 'static,
{
pub(crate) grow: <D as Domain<N>>::Grow<Seed, N>,
pub(crate) entry_seeds: <D as Domain<()>>::Graph<Seed>,
pub(crate) entry_heap_fn: <D as ShapeCapable<N>>::EntryHeap<H>,
_m: PhantomData<fn() -> (D, N, Seed, H)>,
}
}
#![allow(unused)]
fn main() {
/// Opaque row type in a seed-closed chain's treeish. Values are
/// either the synthetic `EntryRoot` row (seed fan-out) or a resolved
/// `Node(N)`. User code inspects via [`is_entry_root`](Self::is_entry_root),
/// [`as_node`](Self::as_node), [`into_node`](Self::into_node), and
/// [`map_node`](Self::map_node); the variants are sealed.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct SeedNode<N> {
// Exposed `pub` (not `pub(crate)`) so the doc-hidden
// `seed_node_internal` module can re-export it for
// `hylic-pipeline`'s dispatch. User code should treat this field
// as opaque and use `is_entry_root` / `as_node` / `map_node`.
#[doc(hidden)]
pub inner: SeedNodeInner<N>,
}
/// Library-internal variant carrier for `SeedNode<N>`. Exposed
/// `pub` only to make crate-external re-export through the
/// `seed_node_internal` doc-hidden module possible. User code
/// should never name this directly.
#[doc(hidden)]
#[derive(Clone, PartialEq, Eq, Hash)]
pub enum SeedNodeInner<N> {
EntryRoot,
Node(N),
}
}
Node(n) delegates to the inner treeish. EntryRoot has no
children of its own in the treeish — its children come from the
entry seeds provided at run time, grown inline at the EntryRoot
visit.
After the EntryRoot → Node(grow(s)) transition, the original
coalgebra and algebra drive all further recursion. The
SeedNode<N> row and the composed treeish are internal to the
pipeline.
Entry seeds are supplied at run time via Edgy<(), Seed> passed to
pipeline.run(exec, entry_seeds, initial_heap), or via
pipeline.run_from_slice(exec, &[seed1, seed2], initial_heap).
The pipeline itself stores no entry concerns — only grow,
seeds_from_node, and the fold.
Further reading
- Milewski. Monoidal Catamorphisms (2020).
- Gonzalez. foldl — the left-fold-with-extraction type.
- Meijer, Fokkinga, Paterson. Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire (1991).
Pipeline transformability
This chapter explains how pipelines and Stage-2 pipelines compose,
why the seed-rooted Stage-2 form has its own sugar surface even
though it shares the Stage2Pipeline<Base, L> type, and how the
transformation layers fit together.
The core mechanics are in scope for every user; the two appendices at the end are flagged interested user only and cover the hard design problems that were worked through to arrive at the current shape. Most readers can stop at The two stages at a glance.
The two stages at a glance
A pipeline has a Stage 1 (base slots — coalgebraic form) and
a Stage 2 (stacked lifts — algebraic transformation form). A
.lift() call moves a pipeline from Stage 1 to Stage 2.
Two transformation vocabularies:
- Stage 1 (reshape): rewrites base slots in place. A SeedPipeline’s .filter_seeds, .wrap_grow, .map_node_bi, .map_seed_bi all produce another SeedPipeline of (possibly different) type parameters. Cheap; no lift chain involved.
- Stage 2 (lift-chain compose): appends a library lift to the chain. .wrap_init, .zipmap, .map_r_bi, .map_n_bi, .filter_edges, .memoize_by, .explain, .wrap_accumulate, .wrap_finalize — each one delegates internally to then_lift(Domain::xxx_lift(...)).
The same sugar name may appear at both stages with different
semantics. map_node_bi at Stage 1 is a reshape (new Stage-1
pipeline); map_n_bi at Stage 2 composes a ShapeLift onto the
chain. Distinct names make the stage unambiguous at the call
site.
The Lift trait — the single transformation primitive
Every Stage-2 sugar ultimately builds a value implementing
Lift<D, N, H, R>:
#![allow(unused)]
fn main() {
/// Domain-generic transformer over the `(treeish, fold)` pair.
///
/// A `Lift` rewrites the graph side and/or the fold side, possibly
/// changing their carrier types, and hands the result to a
/// continuation. The caller's continuation-return type `T` flows
/// through, so the chain of output types stays inferred across
/// composition (`ComposedLift<L1, L2>`).
///
/// Grow is deliberately absent from this signature. Only the Seed
/// finishing lift ([`SeedLift`](super::SeedLift)) needs a grow
/// input; it is composed internally by
/// `hylic_pipeline::PipelineExecSeed::run` and does not travel as
/// a 3-slot signature through the `Lift` trait.
///
/// See [Lifts](https://hylic.balcony.codes/concepts/lifts.html).
pub trait Lift<D, N, H, R>
where D: Domain<N> + Domain<Self::N2>,
N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
{
/// Output node type after the lift has been applied.
type N2: Clone + 'static;
/// Output heap type after the lift has been applied.
type MapH: Clone + 'static;
/// Output result type after the lift has been applied.
type MapR: Clone + 'static;
/// Apply the lift to `(treeish, fold)` and invoke `cont` with
/// the transformed pair.
fn apply<T>(
&self,
treeish: <D as Domain<N>>::Graph<N>,
fold: <D as Domain<N>>::Fold<H, R>,
cont: impl FnOnce(
<D as Domain<Self::N2>>::Graph<Self::N2>,
<D as Domain<Self::N2>>::Fold<Self::MapH, Self::MapR>,
) -> T,
) -> T;
}
}
A Lift takes a (treeish, fold) pair over (N, H, R) and
produces another over (N2, MapH, MapR). The library ships four
atoms:
- IdentityLift — pass-through; the seed of every chain.
- ComposedLift<L1, L2> — sequential composition, L1’s outputs feeding L2’s inputs.
- ShapeLift — the universal store-three-xforms lift that every library sugar instantiates (one per axis: treeish, fold, plus N-type).
- SeedLift — the finishing lift that closes the seed axis (assembled at run time; not user-constructed in the common path).
A chain is a right-associated tree of ComposedLift values
rooted at IdentityLift; every .then_lift(L) or sugar call
wraps the current tip in ComposedLift<current, L>. Each lift
specifies how it rewrites the pair via its apply method, which
uses CPS so the caller’s
continuation-return type threads through composition.
See Lifts — cross-axis transforms for the full catalogue and the atom-level reference.
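The CPS shape of apply can be illustrated with two toy lifts in plain Rust (hypothetical names, the continuation-return type fixed to String for brevity): composing them is just nesting the continuations, so the carrier type change of the first feeds the input of the second.

```rust
// Toy "lift": rewrite the value, hand the result to a continuation.
fn lift1(x: i32, cont: &mut dyn FnMut(i32) -> String) -> String {
    cont(x + 1) // carrier stays i32
}

fn lift2(x: i32, cont: &mut dyn FnMut(String) -> String) -> String {
    cont(format!("<{x}>")) // carrier changes: i32 -> String
}

// ComposedLift in miniature: lift1's output feeds lift2's input
// by nesting the continuations.
fn composed(x: i32, cont: &mut dyn FnMut(String) -> String) -> String {
    lift1(x, &mut |y| lift2(y, &mut *cont))
}

fn main() {
    assert_eq!(composed(41, &mut |s| s), "<42>");
}
```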
One Stage-2 type, two Base configurations
Stage 2 has a single pipeline type — Stage2Pipeline<Base, L> —
but its sugar surface bifurcates by Base because the chain L
operates over a different node type in each configuration.
Treeish-rooted: Stage2Pipeline<TreeishPipeline<…>, L>
When the base is a TreeishPipeline (or another treeish-rooted
Stage2Pipeline being extended), the chain L: Lift<D, N, H, R>
operates over the base’s N directly. All sugars come from the
blanket trait Stage2SugarsShared / Stage2SugarsLocal.
Seed-rooted: Stage2Pipeline<SeedPipeline<…>, L>
When the base is a SeedPipeline, SeedLift is assembled at
.run() time from the base’s grow plus the caller-supplied
root_seeds and entry_heap, and composed as the first lift
in the run-time chain. SeedLift’s output node type is
SeedNode<N>, so every lift in the stored chain L operates over
SeedNode<N> — not the user’s N.
Even though the chain runs over SeedNode<N>, Stage-2 sugars on
this form take user closures over &N. The translation is done
by Wrap dispatch: one
Stage2SugarsShared / Stage2SugarsLocal trait covers both Bases,
peeling &SeedNode<N> to &N on the seed-rooted side and acting
as a pass-through on the treeish-rooted side. EntryRoot defaults
are encoded in the per-sugar peel routines (EntryRoot passes
through wrap_init, filter_edges always admits its children,
etc.) — see the table later in this chapter for the full set.
The seed-pipeline-unification (one cycle ago) collapsed the two
historical struct types LiftedPipeline and LiftedSeedPipeline
into the single Stage2Pipeline<Base, L>, and reduced the
parallel-but-separate sugar catalogues into one Wrap-dispatched
trait per domain. The old type names no longer exist.
The follow-on stage2-base-run-unification cycle did the same
operation on the run path: a single
Stage2Pipeline::run_with_inputs body lives in
stage2/run/mod.rs, generic over Base: Stage2Base. The base
contributes a RunInputs<'i, CurN> GAT, a PreLift lift, and a
provide_run_essentials callback that builds the pre-chain lift
from inputs and yields the descend-root reference at the
post-chain type. The two former per-domain run files
(stage2/run/seed_shared.rs, stage2/run/seed_local.rs) were
retired; the per-domain residue lives in seed/stage2_base_*.rs,
where the Domain<N>::Grow<Seed, N> GAT non-normalisation forces
a per-domain pinning of the SeedLift constructor.
Run composition: how .run() actually produces a result
For a treeish-rooted Stage2Pipeline<TreeishPipeline<…>, L>:
1. Base = TreeishPipeline: yield (treeish, fold) over (N, H, R).
2. Chain L applied: (treeish, fold) over (N, H, R)
→ (treeish', fold') over (N2, MapH, MapR).
3. Executor: run(&fold', &treeish', &root) → MapR.
For a seed-rooted Stage2Pipeline<SeedPipeline<…>, L>:
1. Base = SeedPipeline: hold (grow, seeds_from_node, fold) over (N, Seed, H, R).
At .run time the user adds (root_seeds, entry_heap: H).
2. Fuse: treeish_base = seeds_from_node.map(grow) — over N.
3. SeedLift::apply (the first, innermost lift):
(treeish_base, fold_base) over (N, H, R)
→ (treeish_lifted, fold_lifted) over (SeedNode<N>, H, R).
4. Chain L applied: (treeish_lifted, fold_lifted) over (SeedNode<N>, H, R)
→ (treeish', fold') over (SeedNode<N2>, MapH, MapR).
5. Executor: run(&fold', &treeish', &SeedNode::entry_root()) → MapR.
The critical design point in step 3 is that SeedLift is applied
inside the chain, not as an outer wrap. That’s what lets the
user chain L transform the fold and treeish in ways that would
otherwise depend on the seed-closing step having already
happened. See Appendix A
for the history of this choice.
Transformation variance at a glance
Each axis in the (N, H, R) triple has a distinct variance:
| Axis | Role in fold | Variance | Sugar |
|---|---|---|---|
| N | init(&N) → H (contra) | invariant (used in both fold and graph) | map_node_bi (S1) / map_n_bi (S2) — bijection required |
| H | &mut H (internal) | invariant (never crosses node boundaries) | wrap_init, wrap_accumulate, wrap_finalize |
| R | finalize(&H) → R; &R in acc | invariant (appears in and out) | zipmap, map_r_bi |
Invariance on all three is why every N- and R-change sugar requires both a forward and a backward closure. See Transforms and variance for the categorical picture.
Where the abstractions stop
One deliberate asymmetry remains in the current surface:
Auto-lift on TreeishPipeline but not on SeedPipeline.
tp.wrap_init(w) works directly (the Stage-1 → Stage-2
transition is implicit); sp.wrap_init(w) is a compile error —
write sp.lift().wrap_init(w) explicitly. The asymmetry is
intentional: a treeish-rooted Stage-2 chain operates over the
same N as the base, so the lift is invisible to the user; a
seed-rooted Stage-2 chain operates over SeedNode<N>, and
silencing that transition would surface in any lift whose output
type mentions N (the explainer being the canonical example).
Forcing .lift() makes the row type’s appearance traceable to a
single line in the user’s source.
The historical second asymmetry — a parallel inherent-methods
catalogue on the seed-rooted form — was eliminated by the
seed-pipeline-unification. Stage-2 sugars are now one trait per
domain (Stage2SugarsShared / Stage2SugarsLocal), blanket
implemented over both Bases, with Wrap doing the per-Base type
projection. See Wrap dispatch.
Appendix A — interested user only: seed pipeline, lifting, and run composition
This section is intentionally deeper than the preceding narrative and is safe to skip. It exists for readers who want to know why the current shape is as it is.
The problem the SeedPipeline solves
A hylomorphism fuses a coalgebra (N → children) with an
algebra (children → result). In practice the dependency graph
often speaks a different type than the algebra: module paths,
URLs, database keys — a Seed that must be grown to an N
before the algebra can inspect it. A SeedPipeline carries the
triple (grow: Seed → N, seeds_from_node: N → Seed*, fold: N → H → R)
and fuses seed-axis and node-axis into one treeish at run time.
The fusion is:
seeds_from_node: N → Seed* via Edgy<N, Seed>
.map(grow): Seed → N via the domain's grow xform
=> treeish: N → N* via Edgy<N, N> ≡ Treeish<N>
The result is a Treeish<N> that the executor can walk. But
before walking, execution needs a root. The user supplies
entry seeds (root_seeds: Edgy<(), Seed>) and an initial heap
(entry_heap: H); these must be turned into (a) a starting node
the executor can descend from, and (b) a top-level accumulation
protocol for the children’s results.
The EntryRoot-as-node compromise
The executor’s run method takes a single root: run(fold, treeish, &N) → R. To handle a forest of entry seeds under a
single-root executor, the library invents a synthetic root row:
#![allow(unused)]
fn main() {
/// Opaque row type in a seed-closed chain's treeish. Values are
/// either the synthetic `EntryRoot` row (seed fan-out) or a resolved
/// `Node(N)`. User code inspects via [`is_entry_root`](Self::is_entry_root),
/// [`as_node`](Self::as_node), [`into_node`](Self::into_node), and
/// [`map_node`](Self::map_node); the variants are sealed.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct SeedNode<N> {
// Exposed `pub` (not `pub(crate)`) so the doc-hidden
// `seed_node_internal` module can re-export it for
// `hylic-pipeline`'s dispatch. User code should treat this field
// as opaque and use `is_entry_root` / `as_node` / `map_node`.
#[doc(hidden)]
pub inner: SeedNodeInner<N>,
}
/// Library-internal variant carrier for `SeedNode<N>`. Exposed
/// `pub` only to make crate-external re-export through the
/// `seed_node_internal` doc-hidden module possible. User code
/// should never name this directly.
#[doc(hidden)]
#[derive(Clone, PartialEq, Eq, Hash)]
pub enum SeedNodeInner<N> {
EntryRoot,
Node(N),
}
}
SeedLift wraps the treeish so that:
- SeedNode::EntryRoot.visit fans out to SeedNode::Node(grow(s)) for each entry seed.
- SeedNode::Node(n).visit delegates to the base treeish.
And wraps the fold so that:
- init(SeedNode::EntryRoot) = entry_heap_fn() — returns the user’s entry_heap.
- init(SeedNode::Node(n)) = base.init(n).
- accumulate / finalize are uniform.
The executor then begins at &SeedNode::entry_root() and walks
normally. At the value level the EntryRoot row participates in
the fold like any other node — it receives children’s R via
accumulate, has its own finalize, and produces the final R.
This is a compromise, not the most principled shape: a
native-forest executor (one that accepts run_forest(fold, treeish, roots: &[N], initial_heap: H) → R directly) would
eliminate SeedNode<N> entirely from the chain’s node type and
strip the leak from user-visible result types. The refactor
cost is significant (touches the Executor trait, every
executor impl, and the accumulation protocol) and was deferred.
See Sealed SeedNode for
how the current shape is sealed at the user surface.
Why SeedLift is composed first (not last)
The library considered two architectures for where SeedLift
sits relative to the stored user chain L.
Option A (rejected). SeedLift as the outermost lift,
wrapped around the user’s chain. The user’s chain operates over
plain N; SeedLift wraps the result to introduce
SeedNode<N> at the outside.
Under this arrangement, an N-changing lift inside the user's chain
would produce an N2 that would then have to be re-introduced as the
inner node type in which SeedLift wraps its grow-output.
Because the Lift trait cannot, in general, surface both a
forward and backward map over N (variance is already invariant
on N), Option A would force L::N2 = Base::N — the chain could
not change N. For N-change to work on the seed path, this
invariance had to be broken.
Option B (shipped). SeedLift as the innermost lift,
composed first at run time. The user’s chain operates over
SeedNode<N> from .lift() onward. N-changing lifts inside
the chain change SeedNode<N> → SeedNode<N2>, which is
natural — the chain sees EntryRoot as part of the structure and
any N-transform that preserves EntryRoot works.
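A sketch of why this is natural, assuming a toy `SeedNode` with matchable variants (the shipped type is sealed; `map_node` here is an illustrative free function): an N-transform lifts to `SeedNode<N> → SeedNode<N2>` by mapping `Node` and fixing `EntryRoot`.

```rust
// Toy stand-in, not the library's sealed type.
#[derive(Debug, PartialEq)]
enum SeedNode<N> {
    EntryRoot,
    Node(N),
}

// An N-transform on the seed path: maps the payload, preserves structure.
fn map_node<N, N2>(s: SeedNode<N>, f: impl FnOnce(N) -> N2) -> SeedNode<N2> {
    match s {
        SeedNode::EntryRoot => SeedNode::EntryRoot, // EntryRoot is part of the structure
        SeedNode::Node(n) => SeedNode::Node(f(n)),
    }
}

fn main() {
    let a: SeedNode<u32> = SeedNode::Node(7);
    assert_eq!(map_node(a, |n| n.to_string()), SeedNode::Node("7".to_string()));

    // EntryRoot passes through any N-change untouched.
    let r: SeedNode<String> = map_node(SeedNode::<u32>::EntryRoot, |n| n.to_string());
    assert_eq!(r, SeedNode::EntryRoot);
}
```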
The cost Option B pays is the SeedNode<N> leak into
chain-tip types (visible in ExplainerResult<SeedNode<N>, H, R>). That cost is bounded: the sugar layer hides the variant in
user closures (EntryRoot is auto-routed), and
SeedExplainerResult (via From) projects the trace to N-typed
when the user wants a sealed view.
Why SeedLift is assembled at run time, not at .lift()
SeedLift needs three ingredients: grow (from the base),
root_seeds (from the caller of .run), and entry_heap
(from the caller of .run). Only the first is available at
.lift() time. The library has two places to put the
root_seeds and entry_heap:
(a) as parameters to .lift(), turning it into a real
constructor — which then requires knowing the entry data
before any Stage-2 sugars are composed.
(b) as parameters to .run(), letting Stage-2 chains be
composed and reused across different (seeds, h0) inputs.
The library chose (b): a seed-rooted Stage2Pipeline is a
reusable computation; seeds + initial heap vary per call. This
lets patterns like:
let lsp = pipe.lift().wrap_init(w).zipmap(m);
let r1 = lsp.run_from_slice(&exec, &seeds1, h0);
let r2 = lsp.run_from_slice(&exec, &seeds2, h0);
work without reconstructing the chain.
The default semantics of EntryRoot-dispatch in sugars
Each user-closure sugar on the seed-rooted Stage2Pipeline makes a
per-sugar decision about EntryRoot:
| Sugar | EntryRoot behaviour |
|---|---|
| `wrap_init(w)` | EntryRoot bypasses `w`; original init (returns `entry_heap`) runs |
| `wrap_accumulate(w)` | applied uniformly (no N-signature to dispatch on) |
| `wrap_finalize(w)` | applied uniformly |
| `filter_edges(p)` | EntryRoot always admits its children; `p(&CurN)` applied to Nodes |
| `memoize_by(k)` | EntryRoot uncached (keyed `None`); Nodes keyed `Some(k(n))` |
| `zipmap(m)` / `map_r_bi(fwd, bwd)` | applied uniformly |
| `map_n_bi(co, contra)` | EntryRoot → EntryRoot; `Node(n)` ↔ `Node(f(n))` |
| `explain()` | EntryRoot is a fold row with its own trace |
Users who need different defaults — e.g., filter_edges that
excludes EntryRoot’s fan-out — use
.then_lift(Domain::xxx_lift::<SeedNode<CurN>, …>(pred))
directly. The sugars hide the common case; the raw surface
remains available for specialisation.
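The `filter_edges` row in the table above can be pictured as predicate lifting. The following is illustrative code, not the library's implementation; `lift_pred` is a hypothetical name:

```rust
// Toy stand-in for the sealed type.
enum SeedNode<N> {
    EntryRoot,
    Node(N),
}

// Lift a user predicate over plain N into one over SeedNode<N> that
// always admits EntryRoot's fan-out, so seeds are never filtered out.
fn lift_pred<N>(p: impl Fn(&N) -> bool) -> impl Fn(&SeedNode<N>) -> bool {
    move |s| match s {
        SeedNode::EntryRoot => true, // sugar default: admit the seed fan-out
        SeedNode::Node(n) => p(n),   // user closure only ever sees &N
    }
}

fn main() {
    let q = lift_pred(|n: &u32| *n % 2 == 0);
    assert!(q(&SeedNode::EntryRoot));
    assert!(q(&SeedNode::Node(4)));
    assert!(!q(&SeedNode::Node(3)));
}
```

The raw `.then_lift` surface corresponds to supplying a predicate over `SeedNode<CurN>` directly, without this wrapping.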
Appendix B — for the interested reader: the typing front
Why Lift::apply uses CPS
A direct apply signature would return the transformed pair
(Graph<D, N2>, Fold<D, N2, H2, R2>). Composition stacks these
returns:
fn apply(self) -> (Graph<D, N_k>, Fold<D, N_k, H_k, R_k>)
After three chained lifts, N_k, H_k, R_k are all
associated types of a ComposedLift<ComposedLift<ComposedLift<…>, …>, …>. No single named alias exists for the final return type
before the whole chain is constructed, and Rust’s inference
cannot thread the unnamed types through composition.
The CPS form threads the caller’s T outward:
fn apply<T>(
&self,
treeish: Graph<D, N>,
fold: Fold<D, N, H, R>,
cont: impl FnOnce(Graph<D, N2>, Fold<D, N2, H2, R2>) -> T,
) -> T;
Whatever type the caller’s closure produces at the innermost
apply (usually the executor’s MapR) flows back through every
enclosing apply call. Rust infers the intermediate pair types
at each junction without needing a named alias.
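A stripped-down, self-contained sketch of the CPS shape — a one-input `Lift` rather than the library's `(treeish, fold)` pair, with illustrative stage types:

```rust
// apply hands the transformed value to a continuation instead of
// returning it, so intermediate types never need a top-level name.
trait Lift<In> {
    type Out;
    fn apply<T>(&self, x: In, cont: impl FnOnce(Self::Out) -> T) -> T;
}

struct ToLen;  // String -> usize
struct Double; // usize -> usize

impl Lift<String> for ToLen {
    type Out = usize;
    fn apply<T>(&self, x: String, cont: impl FnOnce(usize) -> T) -> T {
        cont(x.len())
    }
}
impl Lift<usize> for Double {
    type Out = usize;
    fn apply<T>(&self, x: usize, cont: impl FnOnce(usize) -> T) -> T {
        cont(x * 2)
    }
}

// Composition is a nested closure call; the caller's T flows back
// through both applies, and `mid`'s type is inferred locally.
fn composed<T>(x: String, cont: impl FnOnce(usize) -> T) -> T {
    ToLen.apply(x, |mid| Double.apply(mid, cont))
}

fn main() {
    let r = composed("hylic".to_string(), |n| n + 1);
    assert_eq!(r, 11); // len 5 -> doubled 10 -> continuation adds 1
}
```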
GAT normalisation helpers
The Domain<N> trait exposes three GATs:
type Fold<H, R>;
type Graph<E>;
type Grow<Seed, NOut>;
For Shared: Grow<Seed, N> = Arc<dyn Fn(&Seed) -> N + Send + Sync>. For Local: Grow<Seed, N> = Rc<dyn Fn(&Seed) -> N>.
Inside a generic impl body parameterised over some other
type (say N), Rust’s trait solver does not reduce
<Shared as Domain<N>>::Grow<Seed, N> to Arc<dyn Fn…>. The
types compare equal by definition, but the solver doesn’t do the
reduction unless Self = Shared is pinned in the enclosing
scope.
The library works around this via free helper functions that pin the domain:
fn shared_grow_as_arc<Seed, NOut>(
g: <Shared as Domain<NOut>>::Grow<Seed, NOut>,
) -> Arc<dyn Fn(&Seed) -> NOut + Send + Sync> { g }
Inside this function’s body, Self = Shared is the only option
and the GAT normalises. The function is a no-op at runtime (the
same value, type-coerced in), but makes the type checker happy
in a context that was otherwise about to time out.
stage2/run/gat_helpers.rs collects the helpers (a Shared set
and a Local set), one per GAT crossing point.
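A compilable miniature of the pinning trick, with the trait cut down to one GAT; the bounds and the single-parameter `Grow` are illustrative simplifications of the library's shape:

```rust
use std::sync::Arc;

// One-GAT reconstruction of the Domain shape.
trait Domain<N> {
    type Grow<Seed: 'static>;
}

struct Shared;

impl<N: 'static> Domain<N> for Shared {
    type Grow<Seed: 'static> = Arc<dyn Fn(&Seed) -> N + Send + Sync>;
}

// In this body Self = Shared is the only option, so the projection
// normalises to the concrete Arc type. Identity at runtime.
fn shared_grow_as_arc<Seed: 'static, NOut: 'static>(
    g: <Shared as Domain<NOut>>::Grow<Seed>,
) -> Arc<dyn Fn(&Seed) -> NOut + Send + Sync> {
    g
}

fn main() {
    let grow: <Shared as Domain<u64>>::Grow<u32> = Arc::new(|s: &u32| *s as u64 + 1);
    let arc = shared_grow_as_arc::<u32, u64>(grow);
    assert_eq!(arc(&41u32), 42);
}
```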
The trait-twin Shared/Local pattern
Every sugar file has a _shared.rs and a _local.rs version.
Bodies are line-for-line identical; only Arc vs Rc, and
Send + Sync vs nothing, differ.
Why SeedNode<N> cannot be fully hidden in chain-tip types
Lift’s associated type N2 is the chain’s output node type,
which appears in every type parameter of every chain-tip result
the user sees. On the seed path, SeedLift sets N2 = SeedNode<N>; any later lift that preserves N preserves
SeedNode<N>; any later lift that changes N via map_n_bi
produces SeedNode<N2>. Concretely: ExplainerResult’s first
type parameter — the per-node “heap.node” field — ends up being
SeedNode<N>.
Hiding this would require either:
1. A projection pre-baked into each lift's output (each lift carries a "strip `SeedNode`" step). This would require the library to know at composition time whether the input was a `SeedPipeline` — a structural property the `Lift` trait doesn't expose.
2. A seal at the value-variant level (`SeedNode`'s variants become non-matchable). The current design does this: variants are `pub(crate)`, user code inspects via `is_entry_root`, `as_node`, `map_node`. The type name still appears in chain-tip result types, but the enum-nature is sealed.
The library ships (2) plus a projection helper
(SeedExplainerResult::from) for users who want the type-name
seal on the explainer result. (1) would be cleaner but requires
trait-level machinery that’s not currently on the roadmap.
Unified sugar catalogue across both bases
The chain-bound mismatch between Lift<D, SeedNode<N>, H, R>
(seed-rooted) and Lift<D, N, H, R> (treeish-rooted) is bridged
by a Wrap dispatch trait:
- `Wrap` is a type-only trait with a GAT `Of<UN>`. Two impls: `Identity::Of<UN> = UN` (treeish-rooted), `SeedWrap::Of<UN> = SeedNode<UN>` (seed-rooted).
- `Stage2Base` is implemented by every Stage-1 base; it carries `type Wrap: Wrap` so each base names its own projection.
- `Stage2SugarsShared` / `Stage2SugarsLocal` are the unified sugar traits, blanket-implemented on every `Stage2Pipeline<Base, L>` where `Base: Stage2Base` and `L: Lift<D, <Base::Wrap as Wrap>::Of<UN>, …>`. Each sugar has one canonical body; the per-Base build closure is provided by a `WrapShared` / `WrapLocal` impl that peels `&SeedNode<UN>` to `&UN` on the seed-rooted side.
See Wrap dispatch for the implementation.
Constraints carried by the typing
- The CPS shape of `Lift::apply` is what makes the chain composable in Rust without HKT-style trait parameters.
- `stage2/run/gat_helpers.rs` carries one helper per GAT crossing point per domain. Zero-cost at runtime; refreshes whenever rustc's GAT normalisation gets stronger.
- The `_shared.rs` / `_local.rs` sugar twin files keep both domains readable without macros; the bodies stay line-for-line identical.
- `SeedNode<N>` is sealed but visible in chain-tip types whose lift output mentions the chain's N. The `SeedExplainerResult::from` projection lifts the row out of the explainer's tip type for callers that prefer not to see it.
The type-level landscape
This chapter is the design-level account of how the library’s
functional concepts sit on top of Rust’s type system. It assumes
familiarity with the Lift, Wrap
dispatch, and
SeedNode<N> chapters — and an interest in
why those shapes are what they are, beyond just what they do.
The library is a type-level functional kernel. Every transformation —
fold-phase rewrite, axis change, seed-closure — is a categorical
construct (natural transformation, type-level function, indexed family)
encoded in Rust’s syntax. Where the encoding works smoothly the library
reads like ordinary Rust; where it doesn’t, the friction shows up as
verbose projection chains, deliberate (bi-suffixed) bidirectionality,
or the per-domain split. Each of those is structural, not stylistic.
GATs are higher-order functions
The first principle (lifted directly from
Crichton, “GATs are HOFs”):
a Generic Associated Type is a type-level function. A GAT
type Of<X>; on a trait T is a function X ↦ T::Of<X>, where the
function’s body is the impl’s expansion.
In hylic, the canonical GAT is Wrap::Of:
#![allow(unused)]
fn main() {
/// Type-level dispatch for the chain's input N. Each
/// [`Stage2Base`](super::Stage2Base) declares which `Wrap` it uses;
/// `WrapShared` / `WrapLocal` impls carry the per-domain lift
/// construction.
pub trait Wrap {
/// The wrapped node type for a given user-facing N.
type Of<UN: Clone + 'static>: Clone + 'static;
}
}
Wrap is a trait with a single GAT. Two impls give two functions:
Identity::Of : Type → Type ≡ λUN. UN -- identity
SeedWrap::Of : Type → Type ≡ λUN. SeedNode[UN] -- one-tag wrap
These are not “associated types you happen to read off an impl”; they
are first-class type-level functions. The library uses them where a
Haskell library would use f a quantified over f, or a Scala 3
library would use [F[_]]. In Rust, the type lambda is encoded as a
trait with a GAT, and applied via the projection <W as Wrap>::Of<UN>.
Lift as a triple of natural transformations
A Fold<N, H, R> has three phases:
init : N → H
accumulate : (H, R) → () (mutates H)
finalize : H → R
A Lift transforms one fold algebra into another. Per the categorical
intuition, that is a triple of natural transformations — one per
phase. The general primitive Shared::phases_lift exposes exactly
that structure: it takes three phase mappers, each a function that
takes the prior fold’s phase as a value and returns the new fold’s
phase:
init_mapper : (N → H) → (N₂ → H₂)
acc_mapper : ((H, R) → ()) → ((H₂, R₂) → ())
fin_mapper : (H → R) → (H₂ → R₂)
Compare with the wrap_init user closure:
W : (N, prior_init) → H -- curried form of the init_mapper
prior_init is the prior fold’s init phase, currying init_mapper
into a friendlier shape: instead of “give me a function, get a
function,” “give me a node and a function-on-nodes, get a value.” The
user’s orig argument is not a callback; it is the prior phase as a
first-class value, which composition needs as input. Drop the
parameter and you no longer have a phase mapper; you have a phase
replacement. Lift composition would stop being categorical.
This is the answer to “why does every wrap_* sugar take an orig
argument I sometimes don’t use.” The closure is a phase mapper. The
user not consulting orig is an identity-mapped composition — the
no-op natural transformation at that phase, which is structurally fine
but textually looks redundant.
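A value-level sketch of the curried mapper, with `Init` as a hypothetical boxed init phase (the library's real signatures live in `wrap_init_lift`):

```rust
// A toy init phase: N = u32, H = u64.
type Init = Box<dyn Fn(&u32) -> u64>;

// The phase mapper in curried form: the user closure `w` receives the
// node and the prior phase as a first-class value.
fn map_init(old: Init, w: impl Fn(&u32, &dyn Fn(&u32) -> u64) -> u64 + 'static) -> Init {
    Box::new(move |n| w(n, &*old))
}

fn main() {
    let base: Init = Box::new(|n| *n as u64);

    // A mapper that consults the prior phase: double whatever it produced.
    let doubled = map_init(base, |n, orig| orig(n) * 2);
    assert_eq!(doubled(&21), 42);

    // Ignoring `orig` still typechecks as a mapper, but it is a phase
    // replacement in effect: the prior phase no longer contributes.
    let replaced = map_init(doubled, |_n, _orig| 7);
    assert_eq!(replaced(&21), 7);
}
```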
Why CPS in Lift::apply
A direct apply signature would return the transformed triple:
apply : (Grow[Seed, N], Graph[N], Fold[N, H, R])
→ (Grow[Seed, N₂], Graph[N₂], Fold[N₂, H₂, R₂])
Each component is a domain-associated GAT and each axis is an associated type of the lift. After three composed lifts the return type involves three nested levels of associated-type projection, and no single name in the language admits the result without spelling all of them out. Rust’s type inference does not span that distance.
CPS — continuation-passing style — sidesteps the unnameable.
#![allow(unused)]
fn main() {
/// Domain-generic transformer over the `(treeish, fold)` pair.
///
/// A `Lift` rewrites the graph side and/or the fold side, possibly
/// changing their carrier types, and hands the result to a
/// continuation. The caller's continuation-return type `T` flows
/// through, so the chain of output types stays inferred across
/// composition (`ComposedLift<L1, L2>`).
///
/// Grow is deliberately absent from this signature. Only the Seed
/// finishing lift ([`SeedLift`](super::SeedLift)) needs a grow
/// input; it is composed internally by
/// `hylic_pipeline::PipelineExecSeed::run` and does not travel as
/// a 3-slot signature through the `Lift` trait.
///
/// See [Lifts](https://hylic.balcony.codes/concepts/lifts.html).
pub trait Lift<D, N, H, R>
where D: Domain<N> + Domain<Self::N2>,
N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
{
/// Output node type after the lift has been applied.
type N2: Clone + 'static;
/// Output heap type after the lift has been applied.
type MapH: Clone + 'static;
/// Output result type after the lift has been applied.
type MapR: Clone + 'static;
/// Apply the lift to `(treeish, fold)` and invoke `cont` with
/// the transformed pair.
fn apply<T>(
&self,
treeish: <D as Domain<N>>::Graph<N>,
fold: <D as Domain<N>>::Fold<H, R>,
cont: impl FnOnce(
<D as Domain<Self::N2>>::Graph<Self::N2>,
<D as Domain<Self::N2>>::Fold<Self::MapH, Self::MapR>,
) -> T,
) -> T;
}
}
apply takes a continuation cont: impl FnOnce(triple) -> T. The
continuation’s return type T flows out unchanged. Inference threads
each intermediate triple through the continuation locally; nothing has
to be named at the top level. The composition reads as nested closure
calls; the executor’s final T propagates outward through every
intermediate apply.
This is the same trick categorically: instead of returning a value of
some object in a category, take a hom-set element (a morphism out of
that object) and apply it. Rust’s impl Trait argument is a way of
saying “a morphism from this object to something”; the something
is whatever shows up in the call chain.
The two-hop projection
Stage2Pipeline<Base, L> is one struct. Base is Stage2Base:
#![allow(unused)]
fn main() {
/// A Stage-1 pipeline that can drive a Stage-2 chain. Carries the
/// `Wrap` selection plus the run-time machinery (pre-lift, root
/// reference, run-input shape).
///
/// Inherits `TreeishSource` so the `(treeish<N>, fold<N, H, R>)` pair is
/// yielded through one canonical path; `with_treeish` is the single
/// place per-base storage shapes are read.
///
/// `PreLift` is intentionally unbounded at the trait level. The
/// `Stage2Pipeline::run` impl adds the `Lift<…, N2 = <Wrap>::Of<N>>`
/// bound at use time; that keeps the supertrait surface free of the
/// `Domain<<Wrap>::Of<N>>` obligation that would otherwise propagate
/// through every site naming `Stage2Base`.
pub trait Stage2Base: TreeishSource + Sized {
/// Type-level dispatcher for the chain's input N.
/// `Identity` → `Of<UN> = UN` (treeish-rooted).
/// `SeedWrap` → `Of<UN> = SeedNode<UN>` (seed-rooted).
type Wrap: Wrap;
/// The user-facing N (the type user lambdas type at). Equal to
/// `Self::N` for every shipped base; kept distinct for
/// documentation symmetry with the sugar surface, which threads
/// `UN` as a method-level parameter.
type UserN: Clone + 'static;
/// What `.run(...)` accepts as its second argument. Parameterised
/// by `CurN`, the user-facing N at the chain tip (i.e. after any
/// `map_n_bi` lifts; `CurN = Self::N` if the chain doesn't change
/// the user N).
///
/// `Identity`-Wrap bases: `&'i CurN` (a borrowed post-chain root).
/// `SeedWrap` bases: an owned `(seeds, entry_heap)` pair (the
/// `CurN` parameter is unused at the value level — `EntryRoot` is
/// constructible at any inner type).
type RunInputs<'i, CurN: Clone + 'static>;
/// The lift composed at the head of the run-time chain.
/// `IdentityLift` for treeish-rooted, `SeedLift` for seed-rooted.
/// Pre-lift transforms `(treeish<N>, fold<N,H,R>)` into
/// `(treeish<Wrap::Of<N>>, fold<Wrap::Of<N>, H, R>)` without
/// touching H or R.
///
/// Unbounded at the trait level — see the trait-level note.
/// The `Stage2Pipeline::run` impl adds
/// `Self::PreLift: Lift<…, N2 = <Wrap>::Of<N>, MapH = H, MapR = R>`
/// at use time.
type PreLift;
/// Build the pre-lift from inputs (consuming the parts of inputs
/// the lift captures), then yield it together with the executor's
/// post-chain root reference to the continuation.
///
/// The continuation receives the pre-lift by value (consumed when
/// applied to the (treeish, fold) pair) and the root by reference,
/// at the post-chain type `<Self::Wrap as Wrap>::Of<CurN>`. The
/// reference is valid for the entire duration of `cont`.
///
/// `Identity` case: pre-lift is `IdentityLift`; the root is the
/// `&CurN` extracted from `inputs`.
/// `SeedWrap` case: pre-lift is `SeedLift::from_*_grow(...)`,
/// consuming `inputs.0` (entry seeds) and `inputs.1` (entry heap);
/// the root is `&SeedNode::entry_root::<CurN>()`, constructed
/// locally in this frame and alive for `cont`'s lifetime.
fn provide_run_essentials<CurN: Clone + 'static, T>(
&self,
inputs: Self::RunInputs<'_, CurN>,
cont: impl FnOnce(Self::PreLift,
&<Self::Wrap as Wrap>::Of<CurN>) -> T,
) -> T;
}
}
The chain’s input N is <Base::Wrap as Wrap>::Of<UN> — a two-hop
projection: first project Base::Wrap (a type), then project that type
through the Wrap::Of GAT at parameter UN. The full path:
<<Self::Base as Stage2Base>::Wrap as Wrap>::Of<UN>
|__________ ___________| |__ __| |
v v v
find Base from find that apply
Self's Stage2Base type's Wrap Wrap::Of
impl impl at UN
For Self::Base = TreeishPipeline<…>, the chain unfolds to
Identity::Of<UN> = UN. For Self::Base = SeedPipeline<…>, it unfolds
to SeedWrap::Of<UN> = SeedNode<UN>. Both reduce; both are a single
projection chain; both work in every position the library uses
(method-return type, where-clause, GAT projection).
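The two-hop projection can be reproduced in miniature; everything here besides the `Wrap` and `Stage2Base` names is an illustrative stand-in:

```rust
trait Wrap {
    type Of<UN>;
}

#[allow(dead_code)]
enum SeedNode<N> {
    EntryRoot,
    Node(N),
}

struct Identity;
struct SeedWrap;

impl Wrap for Identity {
    type Of<UN> = UN; // λUN. UN
}
impl Wrap for SeedWrap {
    type Of<UN> = SeedNode<UN>; // λUN. SeedNode<UN>
}

trait Stage2Base {
    type Wrap: Wrap;
}

struct TreeishLike;
struct SeedLike;
impl Stage2Base for TreeishLike { type Wrap = Identity; }
impl Stage2Base for SeedLike { type Wrap = SeedWrap; }

// The full path: project Base::Wrap, then apply Wrap::Of at UN.
fn chain_input<B: Stage2Base>(
    x: <<B as Stage2Base>::Wrap as Wrap>::Of<u32>,
) -> <<B as Stage2Base>::Wrap as Wrap>::Of<u32> {
    x
}

fn main() {
    // Both projections reduce once the base is concrete.
    let a: u32 = chain_input::<TreeishLike>(7);
    let b = chain_input::<SeedLike>(SeedNode::Node(7));
    assert_eq!(a, 7);
    assert!(matches!(b, SeedNode::Node(7)));
}
```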
What broke the first attempt
The original Phase-4 attempt had the sugar trait body call into a
helper trait whose return type was <Self as Helper<UN>>::N, while
the sugar method’s declared return type was the full
<<Base::Wrap as Wrap>::Of<UN>> projection. Both reduced to the same
concrete type for any specific impl, but the paths through the type
system differed. The trait solver does not bridge two extensionally
equal but syntactically distinct projections inside a default body.
The fix: don’t bridge. Have the sugar body call directly through the projection that already names the chain’s input N. The same projection sits in the return-type slot, the where-clause, and the build-method call. Rust’s solver verifies syntactic equality in each position; no reduction across distinct projection chains is ever required.
Variance is structural, not friction
Every axis-change sugar with _bi in its name (map_n_bi,
map_r_bi, map_node_bi, map_seed_bi) takes a pair (co, contra).
That is not Rust-specific verbosity. N, H, R are all invariant
in a fold algebra — N appears in init’s argument (contravariant)
and in Graph<N>’s child output (covariant); R appears in
finalize’s output and in accumulate’s child input. An invariant
type can only be transformed by an isomorphism; an isomorphism in
types is a pair of arrows.
Scala 3 needs the pair too. So does Haskell. The library’s choice is
to expose the pair explicitly, named at the call site, rather than
hide it behind an Iso or Bijection typeclass. The “extra” closure
is the structural witness that the transform is an iso. In an
invariant world, that witness can’t be elided.
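A sketch of the iso requirement, using a toy boxed `Fold` rather than the library's domain-associated types: because R is produced by `finalize` and consumed by `accumulate`, transforming it needs `fwd` on the output side and `bwd` on the input side.

```rust
// Toy fold: R appears covariantly (finalize's output) and
// contravariantly (accumulate's child input).
struct Fold<H: 'static, R: 'static> {
    accumulate: Box<dyn Fn(&mut H, &R)>,
    finalize: Box<dyn Fn(&H) -> R>,
}

// Changing R -> R2 needs both arrows: `fwd` rewraps finalize's output,
// `bwd` unwraps a child result before the old accumulate sees it.
fn map_r_bi<H: 'static, R: 'static, R2: 'static>(
    f: Fold<H, R>,
    fwd: impl Fn(R) -> R2 + 'static,
    bwd: impl Fn(&R2) -> R + 'static,
) -> Fold<H, R2> {
    let Fold { accumulate, finalize } = f;
    Fold {
        accumulate: Box::new(move |h, r2| accumulate(h, &bwd(r2))),
        finalize: Box::new(move |h| fwd(finalize(h))),
    }
}

fn main() {
    let sum: Fold<u64, u64> = Fold {
        accumulate: Box::new(|h, r| *h += r),
        finalize: Box::new(|h| *h),
    };
    // R changes from u64 to String via the (to_string, parse) iso pair.
    let stringly = map_r_bi(sum, |r: u64| r.to_string(), |s: &String| s.parse().unwrap());
    let mut h = 40u64;
    (stringly.accumulate)(&mut h, &"2".to_string());
    assert_eq!((stringly.finalize)(&h), "42");
}
```

Dropping either arrow breaks one of the two occurrences of R, which is the structural reason the `_bi` sugars cannot take a single closure.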
Send + Sync as a per-domain axis
Domains differ on closure storage and bound:
| Domain | Closure cell | Bound on user closures |
|---|---|---|
| `Shared` | `Arc<dyn Fn …>` | `Send + Sync + 'static` |
| `Local` | `Rc<dyn Fn …>` | `'static` |
| `Owned` | `Box<dyn Fn …>` | `'static` (and one-shot) |
The asymmetry is real: Shared parallel executors share the fold
across threads, so the fold’s closures have to be Send + Sync;
Local’s Rc storage actively forbids Send + Sync on captured
state, allowing things like Rc<RefCell<…>> that the Shared form
rejects.
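A small illustration of what the `Local` bound admits — a sketch using plain std types, not the library's storage shapes:

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    // !Send + !Sync captured state: fine behind the Local-style Rc cell.
    let counter = Rc::new(RefCell::new(0u32));
    let c = counter.clone();
    let local_step: Rc<dyn Fn(u32)> = Rc::new(move |x| *c.borrow_mut() += x);

    // The Shared-style cell would reject the same capture at compile time:
    // let shared: std::sync::Arc<dyn Fn(u32) + Send + Sync> =
    //     std::sync::Arc::new(move |x| *c.borrow_mut() += x); // error: Rc is !Send

    local_step(20);
    local_step(22);
    assert_eq!(*counter.borrow(), 42);
}
```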
Send + Sync cannot be expressed as a uniform parameterisation of one
trait without macros (the bound is on a concrete closure type, not a
projection-able shape). Sugars and the build dispatcher are therefore
split per domain: WrapShared/WrapLocal,
Stage2SugarsShared/Stage2SugarsLocal,
SeedSugarsShared/SeedSugarsLocal. The trait bodies read identically
line for line; only the bound differs.
Bounds at consumption, not construction
Stage2Pipeline::then_lift is unconstrained at the struct level —
pure construction:
#![allow(unused)]
fn main() {
/// Post-compose `outer` onto the chain. Pure struct construction;
/// no bounds. The composition's *meaningfulness* is enforced where
/// the chain is consumed (`.run_*`, `TreeishSource`).
pub fn then_lift<L2>(
self,
outer: L2,
) -> Stage2Pipeline<Base, ComposedLift<L, L2>> {
Stage2Pipeline {
base: self.base,
pre_lift: ComposedLift::compose(self.pre_lift, outer),
}
}
}
A pipeline whose chain wouldn’t actually .run is structurally
typeable. The compile-time check happens at consumption: .run_* and
the TreeishSource impl carry the chain-validity bounds. Construction
is a builder; validity is a runner concern. Imposing chain bounds at
every .then_lift would force every
intermediate composition to be runnable, which loses the “construct
freely, validate at the consumption boundary” pattern that lets
chained sugars compose without each one having to fully type-prove
the chain so far.
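The pattern in miniature — a toy two-stage pipeline, not the library's types: construction is unbounded, and the validity bound appears only on `run`.

```rust
struct Pipeline<S>(S);

impl<S> Pipeline<S> {
    // Pure struct construction; no bounds, any stage type composes.
    fn then<S2>(self, next: S2) -> Pipeline<(S, S2)> {
        Pipeline((self.0, next))
    }
}

// The chain-validity bound lives at the consumption site.
impl<S1: Fn(u32) -> u32, S2: Fn(u32) -> u32> Pipeline<(S1, S2)> {
    fn run(&self, x: u32) -> u32 {
        (self.0 .1)((self.0 .0)(x))
    }
}

fn main() {
    // Composing with a non-callable stage still typechecks...
    let _unrunnable = Pipeline(|x: u32| x + 1).then("not a function");

    // ...but only a chain meeting the bounds has `.run`.
    let p = Pipeline(|x: u32| x + 1).then(|x: u32| x * 2);
    assert_eq!(p.run(20), 42);
}
```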
What this buys at runtime
Nothing has runtime overhead. The sugar trait monomorphises into a
chain of ComposedLift types; the type tree records every junction;
inlining flattens the chain into a single tree walk that produces one
(treeish, fold) pair. The executor never sees the chain — only its
collapsed result. Wrap dispatch resolves at compile time per
instantiation. The verbose projections in error messages are the price
of carrying that information through the type system; the compiled
binary has none of it.
What remains as friction
Three things, all structural:
- No first-class higher-kinded types. `Wrap::Of` is the closest approximation. Verbose two-hop projections are the cost.
- No type-level pattern matching (no Scala 3 match types). Rust cannot decompose `SeedNode<UN>` into `UN` at the type level. If a trait's bound says `L::N2 = SeedNode<UN>`, `UN` must be supplied from elsewhere (e.g., a closure argument's inferred type) — Rust will not invert the constructor.
- No macros. The Shared/Local mirror could be one file with per-domain bound sugar, but the codebase declines macro-generated trait bodies. The duplication is documented and accepted.
What is not friction: bidirectional axis transforms (universal),
orig callbacks in wrap_init (structural natural-transformation
shape), CPS in apply (only way to thread unnameable returns). Those
are the right shapes; Rust just exposes them at the level the
abstraction needs.
wrap_init as a phase mapper
The wrap_init family deserves a closer read because the user closure
looks like a callback-with-fallthrough but is actually a curried
phase mapper. The sugar’s user signature is
W : (N, prior_init: dyn Fn N → H) → H
curried from the underlying
init_mapper : (N → H) → (N → H)
≡ Fn(prior_init) → Fn(N → H) -- uncurried
≡ Fn(N, prior_init) → H -- curried; same content
The sugar’s body in the library realises this:
#![allow(unused)]
fn main() {
pub fn wrap_init_lift<N, H, R, W>(wrapper: W) -> ShapeLift<Shared, N, H, R, N, H, R>
where
N: Clone + 'static, H: Clone + 'static, R: Clone + 'static,
W: Fn(&N, &dyn Fn(&N) -> H) -> H + Send + Sync + 'static,
{
let w = Arc::new(wrapper);
// init_mapper: (N → H) → (N → H). Curries the user's W with the prior init.
let mi = move |old: Arc<dyn Fn(&N) -> H + Send + Sync>|
-> Arc<dyn Fn(&N) -> H + Send + Sync> {
let w = w.clone();
Arc::new(move |n: &N| w(n, &*old))
};
Shared::phases_lift::<N, H, R, H, R, _, _, _>(
mi,
Shared::identity_acc_mapper::<H, R>(),
Shared::identity_fin_mapper::<H, R>(),
)
}
}
Reading the body: take the user wrapper w, return a function from
the prior init old to a new init that, on each n, calls
w(n, &*old). The closure-passed-in is the prior phase, exposed as a
value. That is the structural definition of a phase mapper. The
“intercept” framing is incidental; the categorical content is
compose this layer’s natural transformation with the prior layer’s.
The same shape recurs, with appropriate types, in
wrap_accumulate_lift and wrap_finalize_lift. The general primitive
they all collapse to is Shared::phases_lift — three phase mappers,
one per phase, each taking the prior phase and producing the next.
Reading list
- Crichton, “GATs are HOFs” for the GAT framing.
- Lifts for the trait shape and the four atoms.
- Wrap dispatch for the surface where the type-level machinery lands at the user’s call site.