Deterministic execution substrate

Deterministic execution for AI‑generated research.

BLISP lets stochastic agents propose computations while a typed execution layer grounds, canonicalizes, executes, hashes, and replays them deterministically. Operations without discovery evidence are rejected before execution. Every result is replayable by hash.

BLISP does not try to make LLMs deterministic; it makes the execution boundary deterministic.


Prompt → Agent Proposal → Grounding Gate → Canonical Execution → 8-Layer Provenance → Replayable Result

23.3% → 10.0%: valid-but-unwarranted executions, reduced by the grounding gate
100% → 0%: unwarranted executions on undiscoverable prompts
50/50: replay runs that produced bit-identical execution hashes
<14 ms: grounding overhead per request

The problem

AI agents can reason. They cannot be trusted to execute unchecked.

Large language models propose computational pipelines from natural-language prompts. The operations they select may be structurally valid but semantically unwarranted—the operation exists in the system, but the user's request does not justify it. Schema validation catches malformed output. It does not catch valid-but-wrong execution.

Example: valid-but-unwarranted execution

User request

“Build a momentum strategy on equity futures, ranked by Sharpe ratio.”

Agent proposal

Family: MOM_REV (mean-reversion)
Metric: SRP (Sharpe)

Both are valid capabilities. Schema validation passes. The pipeline executes—and produces the opposite computational signal.

The output is correct in form and exactly wrong in substance. Constrained decoding restricts the model to the full set of valid names—all 36 family×metric pairs—but not to the per-prompt discovered subset. The grounding gate restricts to discovered names only.

System design

The missing boundary between proposal and execution.

BLISP interposes a mandatory admissibility boundary—the grounding gate—between stochastic reasoning and deterministic execution. Above the boundary, agents propose. Below it, everything is deterministic, typed, and content-addressed.

01 Registry

A live capability registry (244 operations, 4 strategy families, 9 metrics): operations, families, signal blocks, and recipes. Each entry is hashed over semantic, algebraic, and implementation layers.

02 Discovery

Given natural-language terms, the system matches against the live registry using a four-tier cascade: exact, alias, tag, keyword. Unresolved terms cannot reach execution.
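The cascade can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BLISP's actual implementation: the `Entry` layout, the `discover` function, and the two sample registry entries are hypothetical, and the real tiers may match more loosely than the simple lookups shown here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    name: str                          # canonical capability name
    aliases: frozenset = frozenset()   # alternative phrasings
    tags: frozenset = frozenset()      # coarse topical labels
    description: str = ""              # free text, used for keyword matching

# Hypothetical registry contents, for illustration only.
REGISTRY = [
    Entry("dlog", frozenset({"log returns"}), frozenset({"returns"}),
          "first difference of log prices"),
    Entry("SRP", frozenset({"sharpe", "sharpe ratio"}), frozenset({"metric"}),
          "risk-adjusted return score"),
]

def discover(term: str):
    """Return (name, tier) from the first tier that matches, else None."""
    t = term.strip().lower()
    tiers = [
        ("exact", lambda e: e.name.lower() == t),
        ("alias", lambda e: t in e.aliases),
        ("tag", lambda e: t in e.tags),
        ("keyword", lambda e: t in e.description.lower().split()),
    ]
    # Tiers are tried in strict order, so an exact match always wins.
    for tier, matches in tiers:
        for entry in REGISTRY:
            if matches(entry):
                return entry.name, tier
    return None  # unresolved: the term contributes no evidence
```

A term that returns `None` produces no discovery evidence, so nothing it might have suggested can later pass the grounding gate.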

03 Grounding Gate

A deterministic function that checks whether every capability name in the agent's proposal has evidence in the discovery result. Names lacking evidence are rejected.
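As a rough sketch, the gate reduces to a set-membership check over the proposal's capability names. The function name `grounding_gate`, the proposal layout, and the family name `MOM_TS` are assumptions made for this illustration; `MOM_REV` and `SRP` come from the momentum example above.

```python
def grounding_gate(proposal: dict, discovered: set) -> tuple:
    """Admit the proposal only if every capability name has discovery evidence."""
    names = sorted(proposal.values())              # sorted for a stable result
    rejected = [n for n in names if n not in discovered]
    return (len(rejected) == 0, rejected)

# Evidence discovered from "momentum strategy ... ranked by Sharpe ratio".
# MOM_TS is a hypothetical name for a momentum family.
discovered = {"MOM_TS", "SRP"}

# The agent proposes mean-reversion: a valid name, but one with no evidence.
admitted, rejected = grounding_gate(
    {"family": "MOM_REV", "metric": "SRP"}, discovered
)
```

Here `admitted` is `False` and `rejected` is `["MOM_REV"]`: the proposal passes schema validation but never reaches execution.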

04 Specification

Admitted proposals become typed specification records with family, metric, parameter ranges, and data source. Parameter ranges expand into a morphism grid via Cartesian product.
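The Cartesian expansion can be illustrated with `itertools.product`. The record layout and the parameter names `lookback` and `holding` are hypothetical; only the idea of expanding parameter ranges into a grid comes from the description above.

```python
from itertools import product

def expand_grid(spec: dict) -> list:
    """Expand parameter ranges into one concrete configuration per combination."""
    keys = sorted(spec["params"])          # fixed key order keeps the grid deterministic
    values = [spec["params"][k] for k in keys]
    return [
        {"family": spec["family"], "metric": spec["metric"], **dict(zip(keys, combo))}
        for combo in product(*values)
    ]

spec = {
    "family": "MOM_REV",
    "metric": "SRP",
    "params": {"lookback": [20, 60, 120], "holding": [5, 10]},  # hypothetical ranges
}
grid = expand_grid(spec)  # 3 lookbacks x 2 holding periods = 6 grid points
```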

05 Canonicalization

Expressions are parsed, normalized, canonicalized, planned, and optimized through a six-stage typed compilation pipeline. Surface syntax differences collapse to canonical identity.
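To make "surface syntax differences collapse to canonical identity" concrete, here is a toy sketch that applies a single normalization rule: operands of commutative operators are put in a fixed order before hashing. The nested-tuple expression format, the `COMMUTATIVE` set, and both function names are assumptions for illustration; the actual pipeline has more stages than this one rule.

```python
import hashlib
import json

COMMUTATIVE = {"add", "mul"}  # hypothetical operator names

def canonicalize(expr):
    """Recursively order the operands of commutative operators."""
    if not isinstance(expr, (list, tuple)):
        return expr                                   # leaf: a name or a constant
    op, *args = expr
    args = [canonicalize(a) for a in args]
    if op in COMMUTATIVE:
        args.sort(key=lambda a: json.dumps(a))        # any fixed total order works
    return [op, *args]

def identity_hash(expr) -> str:
    """Hash the canonical form, so equivalent spellings share one identity."""
    payload = json.dumps(canonicalize(expr), separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

a = identity_hash(("add", "x", ("mul", "a", "b")))
b = identity_hash(("add", ("mul", "b", "a"), "x"))    # same computation, reordered
```

The two hashes are equal, while a non-commutative operator such as a hypothetical `sub` keeps its operand order and therefore its distinct identity.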

06 Execution

Each admissible morphism executes through a typed deterministic execution engine. Deterministic: same input, same registry, same output. No randomness below the boundary.

07 Provenance

Every execution produces an 8-layer hash decomposing provenance into registry, request, morphisms, plans, artifacts, score, selection, and data. Fault localization without re-execution.
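One plausible way to build such a record is to hash each layer separately and then hash the ordered sub-hashes into a root. The sketch below assumes SHA-256 over canonical JSON and a `{"layers", "root"}` record shape; those choices, and the function names, are illustrative assumptions rather than BLISP's documented scheme. Only the eight layer names are taken from the text.

```python
import hashlib
import json

LAYERS = ("registry", "request", "morphisms", "plans",
          "artifacts", "score", "selection", "data")

def _digest(obj) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal content hashes equally.
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def provenance_record(layers: dict) -> dict:
    """Hash each layer on its own, then combine the sub-hashes into a root hash."""
    missing = [name for name in LAYERS if name not in layers]
    if missing:
        raise ValueError(f"missing layers: {missing}")
    sub = {name: _digest(layers[name]) for name in LAYERS}
    return {"layers": sub, "root": _digest([sub[name] for name in LAYERS])}
```

Because the root is derived only from the sub-hashes, two records can be compared layer by layer without touching the underlying artifacts.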

08 Replay

Identical grounded requests against identical data and registry produce bit-identical hashes. Compare two hashes to verify replay. Compare sub-hashes to localize divergence.
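Divergence localization then amounts to a per-layer comparison. In this sketch the function name is hypothetical, the records cover only four of the eight layers for brevity, and the short hash strings are made-up placeholders rather than real digests.

```python
def localize_divergence(a: dict, b: dict) -> list:
    """Return the names of layers whose sub-hashes differ, in sorted order."""
    if a.keys() != b.keys():
        raise ValueError("records must cover the same layers")
    return sorted(name for name in a if a[name] != b[name])

# Placeholder sub-hashes for two runs of the same grounded request.
run_1 = {"registry": "9f2c", "request": "a41b", "data": "77e0", "score": "c3d9"}
run_2 = {"registry": "9f2c", "request": "a41b", "data": "5b1a", "score": "08fe"}

diverged = localize_divergence(run_1, run_2)
```

`diverged` is `["data", "score"]`: the registry and request match, so the difference in score traces back to a change in the input data, and no re-execution was needed to see that.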

Design principle

Description/identity separation.

Each capability is hashed over three layers: semantic properties, algebraic type signature, and implementation details. A fourth layer—discovery metadata (aliases, tags, descriptions)—is explicitly excluded from the identity hash. Adding an alias like “log returns” → dlog changes what agents can discover; it does not change what dlog computes. The registry can improve discoverability without invalidating any prior execution hash.
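A small sketch shows why adding an alias leaves identity untouched: the hash is computed over the three identity layers only. The dictionary layout, the field values for `dlog`, and the function name are assumptions made for illustration.

```python
import hashlib
import json

IDENTITY_LAYERS = ("semantic", "algebraic", "implementation")

def capability_identity(cap: dict) -> str:
    """Hash only the identity layers; discovery metadata is deliberately ignored."""
    identity = {k: cap[k] for k in IDENTITY_LAYERS}
    payload = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hypothetical registry entry for dlog.
dlog = {
    "semantic": {"kind": "transform", "meaning": "log difference"},
    "algebraic": "Series[float] -> Series[float]",
    "implementation": {"version": 1},
    "discovery": {"aliases": [], "tags": ["returns"]},
}
before = capability_identity(dlog)

# Improve discoverability: add the alias "log returns".
dlog_with_alias = {**dlog, "discovery": {"aliases": ["log returns"], "tags": ["returns"]}}
after = capability_identity(dlog_with_alias)
```

`before` and `after` are identical, whereas changing any of the three identity layers would produce a new hash.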

Research Program

Eleven papers

The scientific backbone.

BLISP is built on an eleven-paper research program that formalizes the execution semantics, computation identity, provenance structure, semantic coordinates, and behavioral geometry of AI-generated computation. Paper 11 maps the emerging “verified AI actions” landscape and presents the first implemented system for runtime semantic verification of AI tool selection.

Program DOI: 10.5281/zenodo.20459958

Paper 1

The Grounding Gate

A mandatory admissibility boundary between stochastic AI reasoning and deterministic execution. Proposals whose capability names lack evidence from the user's terms are rejected before execution.

F3 (valid-but-unwarranted) rate: 23.3% → 10.0% (p = 0.027)
Undiscoverable: 100% → 0%

DOI: 10.5281/zenodo.20817087

Paper 2

Canonical Execution Semantics

A typed specification space, canonicalization pipeline, and content-addressed hashing scheme that provides execution identity independent of surface syntax.

278 → 235 canonical ops · 1,200 LLM generations
50/50 bit-identical replays

DOI: 10.5281/zenodo.20457255

Paper 3

Execution Categories

Stochastic prompt variation defines an equivalence relation on the execution space. Prompts that produce the same canonical execution form a quotient class. Execution fibers bundle equivalent proposals.

Congruence · quotient category · fiber projection

DOI: 10.5281/zenodo.20457403

Paper 4

Provenance Algebra

Every execution produces a decomposable provenance record. Sub-hash comparison localizes divergence without re-execution. Drift detection isolates which semantic layer changed.

Compositional provenance · divergence localization · partial replay

DOI: 10.5281/zenodo.20457667

Paper 5

Execution Fibers

Under stochastic prompt variation, many distinct proposals collapse into few execution identities. Synonym perturbations stay intra-fiber. Metric/family substitutions produce clean inter-fiber transitions.

2,200 proposals · synonym ρ = 0.985
metric/family ρ = 0.000 · σ = 1.000

DOI: 10.5281/zenodo.20457990

Paper 6

The Semantic Structure of Execution

A single 7-valued coordinate (DependencyClass) predicts four independent optimizer behaviors with 99.6% accuracy. The coordinate is a predictive object, not a label.

243/244 predictions · z = 13.0 · p < 10⁻³⁸

DOI: 10.5281/zenodo.20612709

Paper 7

Semantic Coordinates as Predictive Objects

Frozen taxonomy generalizes to 25 unseen operations at 100% accuracy. Ablation confirms the coordinate is minimal. Random baselines achieve chance.

100/100 holdout · MI explains 96.7–100% of entropy

DOI: 10.5281/zenodo.20706294

Paper 8

Cross-System Transfer

Frozen dependency-shape taxonomy predicts execution behavior in Polars and DuckDB at 91.1% combined accuracy. Zero errors from taxonomy assignments.

180 predictions · buffering 96.7% in both systems

DOI: 10.5281/zenodo.20706086

Paper 9

Agent Convergence

Independent model families reconstruct structurally equivalent execution-identity primitives under task pressure. 7/8 primitives converge above 0.90.

55 runs · 3 model families · 3 domains · ~178k tokens/run

DOI: 10.5281/zenodo.20706156

Paper 10

When Data-Hash Caching Fails

Safe compositional caching requires the cache key to induce a congruence on the computation algebra. Data-hash keying violates this; identity-hash keying satisfies it.

97 false hits (DataHash) · 0 false hits (IdentityHash)
Theorem 1: a cache key is correct iff it induces a congruence

DOI: 10.5281/zenodo.20815342
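The simplest form of this failure can be reproduced in a few lines. The sketch assumes that "data-hash keying" means keying the cache on the input data alone and that "identity-hash keying" also includes the computation's identity; it does not reproduce Paper 10's congruence argument or its experiment, and all names here are hypothetical.

```python
import hashlib
import json

def h(obj) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()

def run_cached(cache: dict, key: str, fn, data):
    """Compute fn(data) once per key and reuse the stored value afterwards."""
    if key not in cache:
        cache[key] = fn(data)
    return cache[key]

data = [1.0, 2.0, 4.0]
ops = {"mean": lambda xs: sum(xs) / len(xs), "max": max}

data_cache, identity_cache = {}, {}

# Keyed on the data alone: the second operation hits the first one's entry.
data_keyed = {name: run_cached(data_cache, h(data), fn, data)
              for name, fn in ops.items()}

# Keyed on (operation identity, data): each computation gets its own entry.
identity_keyed = {name: run_cached(identity_cache, h([name, data]), fn, data)
                  for name, fn in ops.items()}
```

With data-only keys, the request for `max` silently returns the cached mean, which is a false hit; with identity keys it returns `4.0`.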

Paper 11

Verified AI Actions

The first implemented system for runtime semantic verification of AI tool selection. Three verification layers, a runtime grounding wall, and the architecture requirements for a cross-framework verification protocol.

Wrong-tool: 23.3% → 10.0% (p = 0.027)
9 property tests · 1,600/1,600 total pass

DOI: 10.5281/zenodo.20816935

Why it matters

Different audiences, one execution problem.

For AI research

Agents need execution substrates, not just tool APIs.

Tool-augmented LLMs select tools directly with no admission gate between selection and execution. A valid but wrong tool call produces a silent failure. The grounding gate makes tool admission evidence-based and deterministic.

For research

Computations must be replayable, comparable, and attributable.

Two researchers running the same grounded request against the same data and registry get bit-identical results. When results differ, 8-layer sub-hash comparison localizes the divergence to a specific semantic layer without re-execution.

For finance

Systematic research needs deterministic provenance from prompt to portfolio.

Strategy families, scoring metrics, and parameter grids are content-addressed. Every research pipeline has a verifiable execution fingerprint. Six months later, the hash still validates.

For infrastructure

BLISP turns agent outputs into typed, admissible, content-addressed executions.

The execution layer is domain-independent. Finance is the first package. The architecture—discovery, grounding, canonicalization, provenance—applies to any domain where AI-generated pipelines must be validated before execution.

Infrastructure thesis

Why this can become infrastructure.

Agentic AI increases the volume of generated computations. Most will be plausible. Not all will be warranted.
Regulated and scientific domains cannot execute black-box proposals. Auditable provenance is a requirement, not a feature.
BLISP sits between LLMs and execution engines—an admission and provenance layer that neither side provides alone.
Initial wedge: quantitative research and reproducible computational workflows where execution identity already matters.
Long-term position: a deterministic execution substrate for AI agents across domains that require typed, verifiable, replayable computation.

BLISP does not make the model truthful. It prevents unwarranted proposals from silently becoming executions. The model reasons stochastically. The execution layer operates deterministically. The boundary between them is the contribution.

Formal structure

The execution pipeline, formally.

E_R ⟶ Γ ⟶ B_R/∼_R ⟶ κ_R ⟶ ε_R ⟶ P_R

E_R: Stochastic proposal space—all agent-generated proposals
Γ: Grounding gate—rejects proposals without discovery evidence
B_R/∼_R: Execution identity—equivalence classes under canonicalization
κ_R: Canonical representative—one expression per equivalence class
ε_R: Deterministic execution—same canonical input, same output
P_R: 8-layer provenance record—decomposable, content-addressed

Stochastic prompt variation generates many elements of E_R. The grounding gate Γ admits only proposals with discovery evidence. Canonicalization collapses admitted proposals into equivalence classes B_R/∼_R, each with a unique canonical representative κ_R. Execution ε_R is a function on canonical representatives—deterministic by construction. The provenance record P_R decomposes the full execution into 8 semantic layers for audit and fault localization.
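The first half of the pipeline, from E_R through Γ to the classes of B_R/∼_R, can be sketched as a filter followed by a grouping. Everything concrete here is an assumption for illustration: the proposal layout, the `lookbacks` parameter, the family name `CARRY`, and the choice of "sorted, de-duplicated parameters" as the canonical representative.

```python
from collections import defaultdict

def admitted(p: dict, discovered: set) -> bool:
    """Gamma: admit only proposals whose names all have discovery evidence."""
    return p["family"] in discovered and p["metric"] in discovered

def canonical(p: dict) -> tuple:
    """kappa_R: one representative per class (order and duplicates are surface syntax)."""
    return (p["family"], p["metric"], tuple(sorted(set(p["lookbacks"]))))

def fibers(proposals: list, discovered: set) -> dict:
    """Group the indices of admitted proposals by canonical representative."""
    classes = defaultdict(list)
    for i, p in enumerate(proposals):
        if admitted(p, discovered):
            classes[canonical(p)].append(i)
    return dict(classes)

proposals = [  # four elements of E_R
    {"family": "MOM_REV", "metric": "SRP", "lookbacks": [20, 60]},
    {"family": "MOM_REV", "metric": "SRP", "lookbacks": [60, 20]},
    {"family": "MOM_REV", "metric": "SRP", "lookbacks": [20, 60, 20]},
    {"family": "CARRY", "metric": "SRP", "lookbacks": [20, 60]},  # hypothetical family
]
result = fibers(proposals, discovered={"MOM_REV", "SRP"})
```

Three distinct proposals collapse into a single class, and the fourth is rejected at the gate, so only one canonical representative would go on to execution ε_R.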