BLISP Research Program

Program Abstract

Overview

AI systems increasingly generate computation rather than having humans write it directly. When the generator is stochastic, the execution system must determine which proposals are admissible, which surface forms are equivalent, whether results can be replayed, and where two executions diverge. This program develops a formal and empirical framework for these problems.

Paper 1 establishes the admissibility boundary: a grounding gate that rejects valid-but-unwarranted operations before execution. Paper 2 formalizes the canonical execution boundary: typed specifications, a canonicalization pipeline, 8-layer provenance hashing, and description/identity separation. Paper 3 proves that the operational equivalence is a congruence, enabling a quotient category that gives precise meaning to deterministic execution identity. Paper 4 defines provenance as a semantic factorization with a dependency-indexed composition law, enabling divergence localization and partial replay. Paper 5 measures the empirical fiber structure of 2,200 stochastic proposals under controlled perturbation, demonstrating that surface-form variation is absorbed while provenance-level changes create clean transitions.

Paper 10 proves that safe compositional caching requires the cache key to induce a congruence on the computation algebra, and demonstrates empirically that data-hash keying violates this condition (97 false hits) while identity-hash keying satisfies it (0 false hits). This is the theoretical anchor: it characterizes why computation identity works.

Papers 6–7 investigate the semantic structure of operations themselves: a single 7-valued coordinate (DependencyClass) predicts four independent optimizer behaviors at 99.6% accuracy and generalizes to unseen operations at 100%. Paper 8 tests whether this structure transfers to independently-developed systems: the frozen taxonomy predicts execution behavior in Polars and DuckDB at 91.1% combined accuracy, with zero errors from incorrect dependency-shape assignments. Paper 9 asks whether agents reconstruct structurally equivalent execution-identity primitives under task pressure: across three domains and three model families, 7/8 primitives converge above 0.90.

Paper 11 maps the emerging “verified AI actions” landscape into three layers—post-hoc audit, policy gates, and semantic verification—and presents the first implemented and empirically evaluated system for runtime semantic verification of AI tool selection.

All constructions are operational, registry-relative, and grounded in a running system (BLISP) evaluated in systematic trading research. The architecture is domain-independent; the evaluation is not.

Papers

All Eleven Papers

PAPER 1

The Grounding Gate: Admissibility and Replay Guarantees for AI-Driven Research

doi:10.5281/zenodo.20817087

AI systems that generate computational pipelines from natural language may propose operations that are structurally valid but semantically unwarranted. This paper presents a grounding gate: a mandatory admissibility boundary between AI-proposed operations and deterministic execution. The system discovers which capabilities match the user's terms by querying a live registry (236 capabilities) and rejects proposals whose names lack discovery evidence. In an evaluation on 30 prompts, unwarranted execution fell from 23.3% to 10.0% (Fisher exact p = 0.027). Replay produces bit-identical hashes across 50 runs, and grounding overhead stays under 14 ms.

Prerequisites: None
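
The grounding gate described for Paper 1 can be illustrated with a minimal Python sketch. The registry contents, the `discover` matching rule, and every name below are hypothetical and are not BLISP's actual API. The point is only the shape of the check: a proposed operation is admitted when discovery over the user's terms produced it, so an operation that exists in the registry but was never asked for is rejected before execution.

```python
from dataclasses import dataclass

# Hypothetical registry: capability name -> terms under which it is discoverable.
REGISTRY = {
    "sharpe_ratio": {"sharpe", "risk-adjusted return"},
    "max_drawdown": {"drawdown", "peak-to-trough"},
    "rolling_mean": {"moving average", "rolling mean"},
}


def discover(user_terms):
    """Return the capability names whose discovery terms match any user term."""
    terms = {t.lower() for t in user_terms}
    return {name for name, keys in REGISTRY.items() if keys & terms}


@dataclass(frozen=True)
class Verdict:
    admitted: tuple
    rejected: tuple


def grounding_gate(user_terms, proposed_ops):
    """Admit only proposed operations that have discovery evidence."""
    evidence = discover(user_terms)
    admitted = tuple(op for op in proposed_ops if op in evidence)
    rejected = tuple(op for op in proposed_ops if op not in evidence)
    return Verdict(admitted, rejected)


verdict = grounding_gate(
    ["sharpe", "drawdown"],
    ["sharpe_ratio", "rolling_mean", "max_drawdown"],
)
```

Here `rolling_mean` is a valid registry entry but has no discovery evidence for the terms given, so the gate rejects it and admits the other two.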

PAPER 2

Canonical Execution Semantics for Stochastic Program Generators

doi:10.5281/zenodo.20457255

When the generator of computation is stochastic, independently generated programs that represent the same intended computation arrive in different surface forms. This paper presents the canonical execution boundary: an architectural invariant beyond which stochasticity does not propagate. Four mechanisms enforce the boundary: typed specifications, a canonicalization pipeline (278 surface forms to 235 canonical operations), 8-layer execution hashing, and description/identity separation. The evaluation covers 1,200 stochastic LLM generations and tests 50-run replay determinism and provenance stability under registry evolution.

Prerequisites: Paper 1
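
As an illustration of canonicalization and description/identity separation in Paper 2, the following Python sketch hashes only the canonical form of a specification. The alias table, the spec layout, and the function names are assumptions made for this example. The paper's actual pipeline and its 8-layer hash are not reproduced here.

```python
import hashlib
import json

# Hypothetical alias table: surface name -> canonical operation name.
ALIASES = {
    "sma": "rolling_mean",
    "moving_average": "rolling_mean",
    "rolling_mean": "rolling_mean",
}


def canonicalize(spec):
    """Resolve aliases and order arguments so equivalent surface forms coincide."""
    op = ALIASES[spec["op"].lower()]
    args = {key.lower(): value for key, value in spec["args"].items()}
    return {"op": op, "args": dict(sorted(args.items()))}


def identity_hash(spec):
    """Hash the canonical form only; the free-text description is excluded."""
    payload = json.dumps(canonicalize(spec), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = {"op": "SMA", "args": {"window": 20, "column": "close"},
     "description": "20-day average"}
b = {"op": "moving_average", "args": {"column": "close", "window": 20},
     "description": "smooth the close"}
c = {"op": "sma", "args": {"window": 50, "column": "close"},
     "description": "20-day average"}
```

Specifications `a` and `b` differ in operation spelling, argument order, and description, yet share one identity hash. Specification `c` shares a description with `a` but changes a parameter, so its hash differs.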

PAPER 3

Execution Categories for Stochastic Program Generators: Quotient Semantics for Deterministic Executable Identity

doi:10.5281/zenodo.20457403

The operational equivalence generated by the system's rewrite rules (alias resolution, argument-order normalization, canonical form selection) forms a congruence: equivalent subexpressions remain equivalent under arbitrary well-typed pipeline composition. This is the central formal result of the program. The resulting quotient category gives precise meaning to deterministic execution identity. Content-addressed hashing serves as a computable operational witness of quotient membership. A projection connects stochastic proposals to their execution classes, with fibers measuring collapse from surface diversity to canonical identity.

Prerequisites: Papers 1-2
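
The congruence property in Paper 3 can be stated compactly. The notation below is assumed for illustration (the paper's own symbols may differ): $\sim$ is the operational equivalence generated by the rewrite rules and $;$ is pipeline composition. This is the standard definition of a congruence on composable arrows, together with the quotient composition it licenses.

```latex
% Assumed notation: \sim = operational equivalence, ; = pipeline composition.
\[
  f \sim f' \;\wedge\; g \sim g'
  \;\Longrightarrow\;
  f \mathbin{;} g \;\sim\; f' \mathbin{;} g'
  \qquad \text{whenever both composites are well typed,}
\]
\[
  [f] \mathbin{;} [g] \;:=\; [\, f \mathbin{;} g \,]
  \qquad \text{is then well defined on equivalence classes.}
\]
```

The second line is what makes the quotient category possible: composition of classes does not depend on which representatives are chosen.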

PAPER 4

Provenance Algebra for Deterministic AI Execution: Replay Semantics for Stochastic Program Generators

doi:10.5281/zenodo.20457667

Provenance for deterministic execution systems is not metadata but a semantic factorization of execution identity. A provenance map decomposes each execution equivalence class into an 8-layer hash record with declared dependencies. A dependency-indexed composition law establishes that pipeline provenance is determined by stage provenance and the declared dependency map. This enables replay equivalence by hash comparison, divergence localization to specific semantic layers, partial replay of only changed layers, and provenance-preserving registry evolution where discovery aliases are invisible at all eight layers.

Prerequisites: Papers 1-3
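
Divergence localization by hash comparison, as described for Paper 4, might look like the sketch below. The four layer names and the dependency map are placeholders chosen for the example, since the eight layers are not enumerated on this page. Each provenance record is modeled as a plain mapping from layer name to hash.

```python
# Placeholder layer names in dependency order (not the paper's eight layers).
LAYERS = ("data", "operation", "parameters", "result")

# Hypothetical declared dependency map: each layer lists the layers it depends on.
DEPENDS_ON = {
    "data": (),
    "operation": (),
    "parameters": ("operation",),
    "result": ("data", "parameters"),
}


def diverging_layers(record_a, record_b):
    """Layers whose hashes differ between two provenance records."""
    return [layer for layer in LAYERS if record_a[layer] != record_b[layer]]


def root_divergence(changed):
    """Changed layers none of whose declared dependencies changed."""
    changed_set = set(changed)
    return [layer for layer in changed
            if not changed_set.intersection(DEPENDS_ON[layer])]


run_a = {"data": "d1", "operation": "o1", "parameters": "p1", "result": "r1"}
run_b = {"data": "d1", "operation": "o1", "parameters": "p2", "result": "r2"}

changed = diverging_layers(run_a, run_b)  # candidate layers for partial replay
origin = root_divergence(changed)         # where the divergence starts
```

In this example the two runs differ at `parameters` and `result`, and the divergence is localized to `parameters` because the `result` layer declares a dependency on it.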

PAPER 5

Proposal Collapse and Execution Fibers in Stochastic Program Generation

doi:10.5281/zenodo.20457990

Two distinct kinds of variation emerge when stochastic generators propose executable specifications: surface-form variation (absorbed by canonicalization, intra-fiber) and execution ambiguity (changing execution identity, inter-fiber). Across 2,200 proposals with controlled perturbations, synonym rewording stays within fibers (rho = 0.985), while metric and family substitutions produce zero same-fiber mass (rho = 0.000) with perfect per-variant stability (sigma = 1.000). The execution adjacency graph is sparse (density = 0.095, 10 connected components). The key finding is that provenance-level changes create clean, stable transitions between execution classes, not noisy instability.

Prerequisites: Papers 1-4
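
One plausible reading of the fiber statistics in Paper 5 is sketched below. This is an interpretation, not the paper's definition: rho is taken to be the fraction of perturbed proposals whose execution identity equals the baseline's, and sigma the fraction that land in the single most common execution class. The class labels are made up for the example.

```python
from collections import Counter


def same_fiber_mass(baseline_id, variant_ids):
    """Fraction of variants whose execution identity equals the baseline's."""
    return sum(1 for v in variant_ids if v == baseline_id) / len(variant_ids)


def modal_stability(variant_ids):
    """Fraction of variants that land in the single most common class."""
    return Counter(variant_ids).most_common(1)[0][1] / len(variant_ids)


baseline = "class_A"
synonym_variants = ["class_A"] * 9 + ["class_B"]  # rewording mostly stays put
metric_variants = ["class_C"] * 10                # substitution moves every proposal

rho_synonym = same_fiber_mass(baseline, synonym_variants)
rho_metric = same_fiber_mass(baseline, metric_variants)
sigma_metric = modal_stability(metric_variants)
```

Under this reading, a substitution with zero same-fiber mass and stability of one corresponds to every perturbed proposal moving to the same new execution class, which is the clean transition the abstract describes.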

PAPER 6

The Semantic Structure of Execution: An Empirical Study of Predictive Coordinates in Computational Operations

doi:10.5281/zenodo.20612709

A single 7-valued coordinate (DependencyClass) classifies operations by data-dependency shape and predicts four independent optimizer behaviors—fusion eligibility, window semantics, pipeline position, and state management—with 99.6% accuracy (243/244 behavior predictions, z = 13.0, p < 10⁻³⁸ vs random baseline). The coordinate is not a descriptive label; it is a predictive object that determines execution behavior from semantic structure alone.

Prerequisites: Papers 1-5

PAPER 7

Semantic Coordinates as Predictive Objects in Time-Series Computation

doi:10.5281/zenodo.20706294

A frozen taxonomy trained on 61 operations generalizes to 25 unseen operations at 100% accuracy (100/100 holdout predictions) with zero recalibration. Coordinate ablation confirms that the full coordinate is minimal—removing any single dimension degrades prediction. Random baselines with equivalent cardinality achieve chance accuracy. The result establishes semantic coordinates as predictive objects: they predict optimizer behavior, not merely describe it.

Prerequisites: Papers 1-6

PAPER 8

Dependency Shape Predicts Execution Behavior Across Independent Data Processing Systems

doi:10.5281/zenodo.20706086

A frozen 8-valued dependency-shape taxonomy, built without inspecting either target system, predicts three execution behaviors (streaming, buffering, warmup) in Polars (Rust, morsel-driven) and DuckDB (C++, push-based). Buffering predictions reach 96.7% accuracy in both systems. Combined accuracy across 180 predictions is 91.1%, with zero errors from incorrect dependency-shape assignments. All errors trace to architectural choices and API conventions, not to the taxonomy itself.

Prerequisites: Papers 1-7

PAPER 9

Agents Reconstruct Execution Identity Algebra Under Task Pressure

doi:10.5281/zenodo.20706156

Independent frontier model families (Anthropic, OpenAI, Google), working on independent domains (finance, SQL, build/CI), reconstruct structurally equivalent execution-identity primitives under task pressure. Nine question tiers of increasing difficulty elicit eight primitives: normalization, canonical identity, equivalence classes, grouping, composite rewriting, replay mappings, computation DAGs, and policy checking. 7/8 primitives converge above 0.90 across 55 runs. Reconstruction is convergent, staged, and expensive (~178,000 tokens per reconstruction). A reference implementation materializes the same eight primitives as persistent, composable, domain-portable infrastructure at zero marginal query cost.

Prerequisites: Papers 1-8

PAPER 10

When Data-Hash Caching Fails: False Hits in Parameterized Pipeline Search

doi:10.5281/zenodo.20815342

Sub-expression caching keyed on data hashes produces silent false hits when the same intermediate data flows through differently parameterized pipeline branches. In a 515-comparison experiment across 9 strategy families, data-hash keying produces 97 false hits (18.8%). Identity-hash keying (MOR_HSH/SRH_HSH) produces zero. The paper proves that safe compositional caching requires the cache key to induce a congruence—an equivalence preserved under composition—on the computation algebra. Data-hash equivalence is not a congruence; canonical equivalence is. This is the theoretical anchor of the program: it characterizes why computation identity works and what breaks without it.

Prerequisites: Papers 2-3 (canonical execution, congruence)
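
The false-hit mechanism in Paper 10 can be reproduced in miniature. The Python sketch below uses two made-up operations and two made-up key functions. It does not model the paper's MOR_HSH/SRH_HSH scheme or its 515-comparison experiment, only the difference between keying on the data alone and keying on the computation together with its input.

```python
import hashlib


def digest(*parts):
    """Stable SHA-256 over the string forms of the parts."""
    return hashlib.sha256("|".join(map(str, parts)).encode("utf-8")).hexdigest()


# Two hypothetical parameterized operations applied to the same intermediate data.
OPS = {
    "scale": lambda xs, k: [x * k for x in xs],
    "shift": lambda xs, k: [x + k for x in xs],
}


def data_hash_key(op, param, data):
    # Keys on the input data only and ignores which computation is applied.
    return digest(data)


def identity_hash_key(op, param, data):
    # Keys on the computation (operation and parameter) together with its input.
    return digest(op, param, digest(data))


def count_false_hits(key_fn, branches, data):
    """Run branches through a cache and count hits that return a wrong result."""
    cache, false_hits = {}, 0
    for op, param in branches:
        truth = OPS[op](data, param)  # recomputed only to audit the cache
        key = key_fn(op, param, data)
        if key in cache and cache[key] != truth:
            false_hits += 1
        cache.setdefault(key, truth)
    return false_hits


branches = [("scale", 2), ("scale", 3), ("shift", 2), ("scale", 2)]
data = (1, 2, 3)
print(count_false_hits(data_hash_key, branches, data))      # 2
print(count_false_hits(identity_hash_key, branches, data))  # 0
```

With data-hash keying, every branch after the first collides on the same key, and two of those hits return a result computed by a different branch. With identity keying, the only hit is the repeated `("scale", 2)` branch, which is a legitimate one.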

PAPER 11

Verified AI Actions: Closing the Pre-Action Legitimacy Gap

doi:10.5281/zenodo.20816935

Every production tool-calling protocol treats tool selection as an assertion sufficient for execution. This paper identifies a verification gap and maps the emerging landscape into three layers: post-hoc audit (deployed), policy gates (emerging), and semantic verification (this work). We present the first implemented and empirically evaluated system for runtime semantic verification of AI tool selection, using content-addressed behavioral identity and a runtime grounding wall. In 180 trials, semantic verification reduced uncaught wrong-tool selection from 23.3% to 10.0% (Fisher exact p = 0.027). A 10-system comparison table distinguishes integrity verification (hashing what a tool is) from behavioral verification (hashing what a tool does).

Prerequisites: Papers 1, 10 (grounding gate, false hits)
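
The significance test reported for Paper 11 can be checked with a short two-sided Fisher exact test in Python. The per-arm counts are an assumption: this page gives only 180 trials and the rates 23.3% and 10.0%, so the sketch assumes an even 90/90 split, which gives 21 and 9 uncaught wrong-tool selections. Under that assumed split the computed p-value should land close to the reported 0.027.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table at least as unlikely as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-9))


# Assumed counts: 21 of 90 unconstrained trials vs 9 of 90 grounded trials.
p_value = fisher_exact_two_sided(21, 69, 9, 81)
```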

Experiment Data

Paper	Dataset	Artifact / Command
Paper 1	30-prompt evaluation (5 categories, 4 families, 9 metrics)	prompts_30.json
Paper 2	1,200 LLM generations (30 prompts x 4 temps x 10 reps), replay CSV, provenance CSV	experiment-data.tar.gz
Paper 3	Theoretical paper, no experiment data	--
Paper 4	Theoretical paper, no experiment data	--
Paper 5	2,200 proposals (1,200 baseline + 1,000 perturbations), fiber stats, adjacency graph	experiment-data.tar.gz
Paper 6	61-operation taxonomy, 4 optimizer behavior predictions, conditional MI analysis, holdout data	cargo test
Paper 7	25-operation holdout generalization, coordinate ablation, random baseline comparison	cargo test
Paper 8	30 operations × 2 systems × 3 behaviors (180 predictions), Polars + DuckDB	reproduce.sh
Paper 9	55 runs across 3 model families × 3 domains × 9 question tiers, ~178k tokens per run	reproduce.sh
Paper 10	515 comparisons across 9 strategy families, 3 cache modes (NONE/DAT/SRH), false-hit detection	cargo test
Paper 11	180 trials (grounded vs unconstrained), 9 grounding wall property tests, 10-system comparison	cargo test

Individual Papers

#	Paper	Type	Pages	DOI suffix (10.5281/zenodo.)	Release
1	The Grounding Gate	Empirical	13	20817087	v1
2	Canonical Execution Semantics	Empirical	23	20457255	v1
3	Execution Categories	Formal	14	20457403	v1
4	Provenance Algebra	Formal	15	20457667	v1
5	Execution Fibers	Empirical	12	20457990	v1
6	The Semantic Structure of Execution	Empirical	17	20612709	v1
7	Semantic Coordinates as Predictive Objects	Empirical	14	20706294	v1
8	Dependency Shape Predicts Execution Behavior	Empirical	17	20706086	v1
9	Agents Reconstruct Execution Identity Algebra	Empirical	17	20706156	v1
10	When Data-Hash Caching Fails	Formal + Empirical		20815342	v1
11	Verified AI Actions	Position + Empirical	6	20816935	v1

#	BibTeX Key	DOI
1	dionysopoulos2026grounding	10.5281/zenodo.20817087
2	dionysopoulos2026canonical	10.5281/zenodo.20457255
3	dionysopoulos2026categories	10.5281/zenodo.20457403
4	dionysopoulos2026provenance	10.5281/zenodo.20457667
5	dionysopoulos2026fibers	10.5281/zenodo.20457990
6	dionysopoulos2026semantic	10.5281/zenodo.20612709
7	dionysopoulos2026predictive	10.5281/zenodo.20706294
8	dionysopoulos2026transfer	10.5281/zenodo.20706086
9	dionysopoulos2026convergence	10.5281/zenodo.20706156
10	dionysopoulos2026falsehits	10.5281/zenodo.20815342
11	dionysopoulos2026verifiedactions	10.5281/zenodo.20816935