Canonical Reference

BLISP Research Program

An eleven-paper research program on admissibility, deterministic execution, computation identity, semantic coordinates, cross-system transfer, agent convergence, and verified AI actions.

Author: Thomas Dionysopoulos · 11 papers · all published with DOIs

Program DOI: 10.5281/zenodo.20459958

Program Abstract

Overview

AI systems increasingly generate computation rather than having humans write it directly. When the generator is stochastic, the execution system must determine which proposals are admissible, which surface forms are equivalent, whether results can be replayed, and where two executions diverge. This program develops a formal and empirical framework for these problems.

Paper 1 establishes the admissibility boundary: a grounding gate that rejects valid-but-unwarranted operations before execution. Paper 2 formalizes the canonical execution boundary: typed specifications, a canonicalization pipeline, 8-layer provenance hashing, and description/identity separation. Paper 3 proves that the operational equivalence is a congruence, enabling a quotient category that gives precise meaning to deterministic execution identity. Paper 4 defines provenance as a semantic factorization with a dependency-indexed composition law, enabling divergence localization and partial replay. Paper 5 measures the empirical fiber structure of 2,200 stochastic proposals under controlled perturbation, demonstrating that surface-form variation is absorbed while provenance-level changes create clean transitions.

Paper 10 proves that safe compositional caching requires the cache key to induce a congruence on the computation algebra, and demonstrates empirically that data-hash keying violates this condition (97 false hits) while identity-hash keying satisfies it (0 false hits). This is the theoretical anchor: it characterizes why computation identity works.

Papers 6–7 investigate the semantic structure of operations themselves: a single 7-valued coordinate (DependencyClass) predicts four independent optimizer behaviors at 99.6% accuracy and generalizes to unseen operations at 100%. Paper 8 tests whether this structure transfers to independently-developed systems: the frozen taxonomy predicts execution behavior in Polars and DuckDB at 91.1% combined accuracy, with zero errors from incorrect dependency-shape assignments. Paper 9 asks whether agents reconstruct structurally equivalent execution-identity primitives under task pressure: across three domains and three model families, 7/8 primitives converge above 0.90.

Paper 11 maps the emerging “verified AI actions” landscape into three layers—post-hoc audit, policy gates, and semantic verification—and presents the first implemented and empirically evaluated system for runtime semantic verification of AI tool selection.

All constructions are operational, registry-relative, and grounded in a running system (BLISP) evaluated in systematic trading research. The architecture is domain-independent; the evaluation is not.

Start Here

Reading Paths by Audience

Researchers

Read Paper 1 (grounding gate), then Paper 2 (execution semantics). Papers 3–5 for formal quotient semantics, provenance, and fibers. Papers 6–7 for semantic coordinates as predictive objects. Paper 8 for cross-system transfer. Paper 9 for agent convergence.

Engineers

Start with Paper 1 for the architecture and grounding gate. Paper 2 describes the canonicalization pipeline and 8-layer hashing you would implement. Paper 6 shows how a single coordinate predicts optimizer behavior. Paper 8 demonstrates cross-system portability.

AI Practitioners

Paper 1 addresses valid-but-unwarranted execution in LLM tool use. Paper 5 measures how LLM-generated proposals behave under controlled perturbation. Paper 9 shows that independent agents reconstruct structurally equivalent execution-identity primitives under task pressure.

Investors

Paper 1 establishes the core value proposition. Paper 8 shows empirically that the taxonomy transfers across systems. Paper 9 demonstrates convergent reconstruction by agents, suggesting the structure is natural and worth materializing. The papers portal provides the complete picture.

If you read only one paper

Read Paper 1: The Grounding Gate. It introduces the core problem (valid-but-unwarranted execution), the grounding gate architecture, and the empirical evaluation. No prerequisites. 13 pages.

Structure

Dependency Graph

Papers 1–9 form a linear chain: each depends on all preceding papers. Papers 10 and 11 branch off the chain: Paper 10 depends on Papers 2–3, and Paper 11 depends on Papers 1 and 10.

Paper 1: Grounding Gate (empirical) --> Paper 2: Canonical Exec (empirical) --> Paper 3: Categories (formal) --> Paper 4: Provenance (formal) --> Paper 5: Fibers (empirical)
Paper 6: Semantic Structure (empirical) --> Paper 7: Predictive Objects (empirical) --> Paper 8: Cross-System (empirical) --> Paper 9: Convergence (empirical)
Paper 10: False Hits (formal + empirical) --> Paper 11: Verified Actions (position + empirical)

Empirical papers contain experiments and data. Formal papers contain definitions, propositions, and proofs.
Green = foundation (Papers 1–5). Cyan = semantic coordinates → transfer → convergence (Papers 6–9). Violet = identity applications (Papers 10–11).
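
The prerequisite lists on the paper cards can be turned into a valid reading order mechanically. The following is a minimal sketch, assuming the prerequisites exactly as stated on the cards; the variable names are illustrative and not part of BLISP.

```python
from graphlib import TopologicalSorter

# Prerequisites as listed on the paper cards: paper -> papers it depends on.
# Papers 1-9: each depends on all preceding papers.
PREREQUISITES = {n: set(range(1, n)) for n in range(1, 10)}
PREREQUISITES[10] = {2, 3}   # canonical execution, congruence
PREREQUISITES[11] = {1, 10}  # grounding gate, false hits

# static_order() yields every paper after all of its prerequisites.
reading_order = list(TopologicalSorter(PREREQUISITES).static_order())
print(reading_order)
```

Any order produced this way starts with Paper 1; where Papers 10 and 11 fall relative to Papers 4–9 is not fixed by the prerequisites.
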
Papers

All Eleven Papers

PAPER 1
The Grounding Gate: Admissibility and Replay Guarantees for AI-Driven Research

AI systems that generate computational pipelines from natural language may propose operations that are structurally valid but semantically unwarranted. This paper presents a grounding gate: a mandatory admissibility boundary between AI-proposed operations and deterministic execution. The system discovers which capabilities match the user's terms by querying a live registry (236 capabilities) and rejects proposals whose names lack discovery evidence. Evaluated on 30 prompts: unwarranted execution reduced from 23.3% to 10.0% (Fisher exact p = 0.027). Replay produces bit-identical hashes across 50 runs. Grounding overhead under 14 ms.

Prerequisites: None
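
The gate described above can be illustrated with a toy example. This is a minimal sketch, not the BLISP implementation: the registry contents, the keyword-overlap discovery rule, and the names `Proposal`, `discover`, and `grounding_gate` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """An AI-proposed operation (hypothetical shape, for illustration only)."""
    operation: str


def discover(user_terms, registry):
    """Return capability names whose registered keywords overlap the user's terms."""
    return {name for name, keywords in registry.items() if keywords & user_terms}


def grounding_gate(proposals, user_terms, registry):
    """Split proposals into (admitted, rejected).

    A proposal is admitted only if its operation name was discovered from the
    user's own terms; merely existing in the registry is not enough.
    """
    evidence = discover(user_terms, registry)
    admitted = [p for p in proposals if p.operation in evidence]
    rejected = [p for p in proposals if p.operation not in evidence]
    return admitted, rejected


# Toy registry: capability name -> discovery keywords (invented for this sketch).
REGISTRY = {
    "rolling_mean": {"moving", "average", "rolling", "mean"},
    "sharpe_ratio": {"sharpe", "risk-adjusted"},
    "max_drawdown": {"drawdown"},
}

proposals = [Proposal("rolling_mean"), Proposal("max_drawdown")]
admitted, rejected = grounding_gate(proposals, {"moving", "average"}, REGISTRY)
print("admitted:", admitted)
print("rejected:", rejected)
```

Here `max_drawdown` is a valid registry entry, but nothing in the user's terms warrants it, so the gate rejects it before execution.
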
PAPER 2
Canonical Execution Semantics for Stochastic Program Generators

When the generator of computation is stochastic, independently generated programs that represent the same intended computation arrive in different surface forms. This paper presents the canonical execution boundary: an architectural invariant beyond which stochasticity does not propagate. Four mechanisms enforce the boundary: typed specifications, a canonicalization pipeline (278 surface forms to 235 canonical operations), 8-layer execution hashing, and description/identity separation. Evaluated on 1,200 stochastic LLM generations with 50-run replay determinism and provenance stability under registry evolution.

Prerequisites: Paper 1
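
To make the idea of collapsing surface forms into one execution identity concrete, here is a minimal sketch. It assumes a hypothetical alias table, a single-operation spec shape, and a single SHA-256 hash; the paper's actual pipeline and its 8-layer hashing are not reproduced here.

```python
import hashlib
import json

# Hypothetical alias table: surface name -> canonical operation name.
ALIASES = {"sma": "rolling_mean", "moving_average": "rolling_mean"}


def canonicalize(spec):
    """Resolve aliases and normalize argument order for one operation spec."""
    op = ALIASES.get(spec["op"], spec["op"])
    args = {key: spec["args"][key] for key in sorted(spec["args"])}
    return {"op": op, "args": args}


def identity_hash(spec):
    """Content-addressed hash of the canonical form (SHA-256 is an assumption)."""
    payload = json.dumps(canonicalize(spec), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two surface forms of the same computation, and one different computation.
a = {"op": "sma", "args": {"window": 20, "column": "close"}}
b = {"op": "moving_average", "args": {"column": "close", "window": 20}}
c = {"op": "sma", "args": {"window": 50, "column": "close"}}

print(identity_hash(a) == identity_hash(b))  # same identity
print(identity_hash(a) == identity_hash(c))  # different identity
```

Specs `a` and `b` differ only in alias and argument order, so they hash identically; `c` changes a parameter and therefore gets a different identity.
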
PAPER 3
Execution Categories for Stochastic Program Generators: Quotient Semantics for Deterministic Executable Identity

The operational equivalence generated by the system's rewrite rules (alias resolution, argument-order normalization, canonical form selection) forms a congruence: equivalent subexpressions remain equivalent under arbitrary well-typed pipeline composition. This is the central formal result of the program. The resulting quotient category gives precise meaning to deterministic execution identity. Content-addressed hashing serves as a computable operational witness of quotient membership. A projection connects stochastic proposals to their execution classes, with fibers measuring collapse from surface diversity to canonical identity.

Prerequisites: Papers 1-2
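
The congruence property can be stated compactly. The following is the standard formulation, with notation chosen here for illustration (\approx for the operational equivalence, \circ for well-typed pipeline composition) rather than taken from the paper:

```latex
% Congruence: equivalence is preserved under composition.
f \approx f' \ \text{and}\ g \approx g' \;\implies\; g \circ f \approx g' \circ f'

% Consequently, composition of equivalence classes is well defined,
% which is what makes the quotient category possible:
[g] \circ [f] := [g \circ f]
```
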
PAPER 4
Provenance Algebra for Deterministic AI Execution: Replay Semantics for Stochastic Program Generators

Provenance for deterministic execution systems is not metadata but a semantic factorization of execution identity. A provenance map decomposes each execution equivalence class into an 8-layer hash record with declared dependencies. A dependency-indexed composition law establishes that pipeline provenance is determined by stage provenance and the declared dependency map. This enables replay equivalence by hash comparison, divergence localization to specific semantic layers, partial replay of only changed layers, and provenance-preserving registry evolution where discovery aliases are invisible at all eight layers.

Prerequisites: Papers 1-3
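
Divergence localization by hash comparison can be sketched as follows. The source states that there are eight layers but does not name them on this page, so this sketch uses three made-up layer names and omits the declared dependency map; `provenance_record` and `diverging_layers` are hypothetical helpers.

```python
import hashlib

# Hypothetical layer names, for illustration only.
LAYERS = ("data", "operation", "parameters")


def provenance_record(layers):
    """Hash each layer's content separately to form a layered record."""
    return {
        name: hashlib.sha256(layers[name].encode("utf-8")).hexdigest()
        for name in LAYERS
    }


def diverging_layers(left, right):
    """Return the layers whose hashes differ; an empty list means replay-equivalent."""
    return [name for name in LAYERS if left[name] != right[name]]


run_a = provenance_record(
    {"data": "prices-2024", "operation": "rolling_mean", "parameters": "window=20"}
)
run_b = provenance_record(
    {"data": "prices-2024", "operation": "rolling_mean", "parameters": "window=50"}
)
replay = provenance_record(
    {"data": "prices-2024", "operation": "rolling_mean", "parameters": "window=20"}
)

print(diverging_layers(run_a, run_b))
print(diverging_layers(run_a, replay))
```

Comparing records layer by layer shows where two executions diverge, which is what would let a system replay only the changed layers.
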
PAPER 5
Proposal Collapse and Execution Fibers in Stochastic Program Generation

Two distinct kinds of variation emerge when stochastic generators propose executable specifications: surface-form variation (absorbed by canonicalization, intra-fiber) and execution ambiguity (changing execution identity, inter-fiber). Across 2,200 proposals with controlled perturbations: synonym rewording stays within fibers (rho = 0.985), metric and family substitutions produce zero same-fiber mass (rho = 0.000) with perfect per-variant stability (sigma = 1.000). The execution adjacency graph is sparse (density = 0.095, 10 connected components). The key finding is that provenance-level changes create clean, stable transitions between execution classes, not noisy instability.

Prerequisites: Papers 1-4
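
The distinction between intra-fiber and inter-fiber variation can be illustrated with a toy projection. This is a sketch under stated assumptions: the alias table, the string-based proposal format, and `same_fiber_fraction` are invented here, and the fraction is only in the spirit of the reported rho, whose exact definition is given in the paper rather than on this page.

```python
from collections import defaultdict

# Hypothetical synonyms absorbed by canonicalization.
ALIASES = {"avg": "mean", "average": "mean"}


def execution_class(proposal):
    """Project a surface proposal onto a canonical class key (toy rule)."""
    op, _, arg = proposal.strip().lower().partition(" ")
    return f"{ALIASES.get(op, op)} {arg}"


def fibers(proposals):
    """Group proposals by execution class; each group is one fiber."""
    groups = defaultdict(list)
    for proposal in proposals:
        groups[execution_class(proposal)].append(proposal)
    return dict(groups)


def same_fiber_fraction(baseline, variants):
    """Fraction of variants that land in the baseline's fiber."""
    target = execution_class(baseline)
    return sum(execution_class(v) == target for v in variants) / len(variants)


synonyms = ["avg close", "AVERAGE close", "Mean close"]
substitutions = ["median close", "mean volume"]

print(same_fiber_fraction("mean close", synonyms))
print(same_fiber_fraction("mean close", substitutions))
print(fibers(["mean close", "avg close", "median close"]))
```

Synonym rewording stays inside the baseline's fiber, while substituting the metric or the operation lands in a different fiber every time.
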
PAPER 6
The Semantic Structure of Execution: An Empirical Study of Predictive Coordinates in Computational Operations

A single 7-valued coordinate (DependencyClass) classifies operations by data-dependency shape and predicts four independent optimizer behaviors (fusion eligibility, window semantics, pipeline position, and state management) with 99.6% accuracy (243/244 behavior predictions, z = 13.0, p < 10^-38 vs random baseline). The coordinate is not a descriptive label; it is a predictive object that determines execution behavior from semantic structure alone.

Prerequisites: Papers 1-5
PAPER 7
Semantic Coordinates as Predictive Objects in Time-Series Computation

A frozen taxonomy trained on 61 operations generalizes to 25 unseen operations at 100% accuracy (100/100 holdout predictions) with zero recalibration. Coordinate ablation confirms that the full coordinate is minimal—removing any single dimension degrades prediction. Random baselines with equivalent cardinality achieve chance accuracy. The result establishes semantic coordinates as predictive objects: they predict optimizer behavior, not merely describe it.

Prerequisites: Papers 1-6
PAPER 8
Dependency Shape Predicts Execution Behavior Across Independent Data Processing Systems

A frozen 8-valued dependency-shape taxonomy, built without inspecting either target system, predicts three execution behaviors (streaming, buffering, warmup) in Polars (Rust, morsel-driven) and DuckDB (C++, push-based). Buffering predictions reach 96.7% accuracy in both systems. Combined accuracy across 180 predictions is 91.1%, with zero errors from incorrect dependency-shape assignments. All errors trace to architectural choices and API conventions, not to the taxonomy itself.

Prerequisites: Papers 1-7
PAPER 9
Agents Reconstruct Execution Identity Algebra Under Task Pressure

Independent frontier model families (Anthropic, OpenAI, Google), working on independent domains (finance, SQL, build/CI), reconstruct structurally equivalent execution-identity primitives under task pressure. Nine question tiers of increasing difficulty elicit eight primitives: normalization, canonical identity, equivalence classes, grouping, composite rewriting, replay mappings, computation DAGs, and policy checking. 7/8 primitives converge above 0.90 across 55 runs. Reconstruction is convergent, staged, and expensive (~178,000 tokens per reconstruction). A reference implementation materializes the same eight primitives as persistent, composable, domain-portable infrastructure at zero marginal query cost.

Prerequisites: Papers 1-8
PAPER 10
When Data-Hash Caching Fails: False Hits in Parameterized Pipeline Search

Sub-expression caching keyed on data hashes produces silent false hits when the same intermediate data flows through differently parameterized pipeline branches. In a 515-comparison experiment across 9 strategy families, data-hash keying produces 97 false hits (18.8%). Identity-hash keying (MOR_HSH/SRH_HSH) produces zero. The paper proves that safe compositional caching requires the cache key to induce a congruence—an equivalence preserved under composition—on the computation algebra. Data-hash equivalence is not a congruence; canonical equivalence is. This is the theoretical anchor of the program: it characterizes why computation identity works and what breaks without it.

Prerequisites: Papers 2-3 (canonical execution, congruence)
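
The failure mode can be reproduced in a few lines. This is a minimal sketch of one plausible reading of the two keying schemes, not the paper's experiment: the `clip` operation, the key functions, and the cache helper are all assumptions for illustration.

```python
import hashlib
import json


def digest(value):
    """Stable SHA-256 digest of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def clip(data, limit):
    """Toy parameterized operation: cap every value at `limit`."""
    return [min(x, limit) for x in data]


def data_key(data, params):
    """Key on the input data only; parameters are ignored."""
    return digest(data)


def identity_key(data, params):
    """Key on the whole computation: operation, parameters, and input."""
    return digest({"op": "clip", "params": params, "input": digest(data)})


def count_false_hits(key_fn, data, branches):
    """Run each branch through a cache and count hits that return a wrong result."""
    cache = {}
    false_hits = 0
    for params in branches:
        expected = clip(data, **params)
        key = key_fn(data, params)
        if key in cache:
            if cache[key] != expected:
                false_hits += 1
        else:
            cache[key] = expected
    return false_hits


intermediate = [1, 2, 3]
branches = [{"limit": 2}, {"limit": 3}]  # same data, different parameters

data_false_hits = count_false_hits(data_key, intermediate, branches)
identity_false_hits = count_false_hits(identity_key, intermediate, branches)
print(data_false_hits, identity_false_hits)
```

With data-hash keying, the second branch silently receives the first branch's result because both see the same intermediate data; keying on the full computation keeps the two branches apart.
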
PAPER 11
Verified AI Actions: Closing the Pre-Action Legitimacy Gap

Every production tool-calling protocol treats tool selection as an assertion sufficient for execution. This paper identifies a verification gap and maps the emerging landscape into three layers: post-hoc audit (deployed), policy gates (emerging), and semantic verification (this work). We present the first implemented and empirically evaluated system for runtime semantic verification of AI tool selection, using content-addressed behavioral identity and a runtime grounding wall. In 180 trials, semantic verification reduced uncaught wrong-tool selection from 23.3% to 10.0% (Fisher exact p = 0.027). A 10-system comparison table distinguishes integrity verification (hashing what a tool is) from behavioral verification (hashing what a tool does).

Prerequisites: Papers 1, 10 (grounding gate, false hits)
Reference

Reading Order and Artifacts

# | Paper | Type | Pages | DOI | Release
1 | The Grounding Gate | Empirical | 13 | 10.5281/zenodo.20817087 | v1
2 | Canonical Execution Semantics | Empirical | 23 | 10.5281/zenodo.20457255 | v1
3 | Execution Categories | Formal | 14 | 10.5281/zenodo.20457403 | v1
4 | Provenance Algebra | Formal | 15 | 10.5281/zenodo.20457667 | v1
5 | Execution Fibers | Empirical | 12 | 10.5281/zenodo.20457990 | v1
6 | The Semantic Structure of Execution | Empirical | 17 | 10.5281/zenodo.20612709 | v1
7 | Semantic Coordinates as Predictive Objects | Empirical | 14 | 10.5281/zenodo.20706294 | v1
8 | Dependency Shape Predicts Execution Behavior | Empirical | 17 | 10.5281/zenodo.20706086 | v1
9 | Cross-Family Convergence | Empirical | 17 | 10.5281/zenodo.20706156 | v1
10 | When Data-Hash Caching Fails | Formal + Empirical | | 10.5281/zenodo.20815342 | v1
11 | Verified AI Actions | Position + Empirical | 6 | 10.5281/zenodo.20816935 | v1

11 papers. All published as open-access working papers under CC-BY-4.0. Each GitHub release contains PDF, LaTeX source, experiment data (where applicable), verification scripts, CITATION.cff, and .zenodo.json.

Reproducibility

Experiment Data

Nine of the eleven papers include computational experiments with published datasets.

Paper | Dataset | Artifact / command
Paper 1 | 30-prompt evaluation (5 categories, 4 families, 9 metrics) | prompts_30.json
Paper 2 | 1,200 LLM generations (30 prompts x 4 temps x 10 reps), replay CSV, provenance CSV | experiment-data.tar.gz
Paper 3 | Theoretical paper, no experiment data | --
Paper 4 | Theoretical paper, no experiment data | --
Paper 5 | 2,200 proposals (1,200 baseline + 1,000 perturbations), fiber stats, adjacency graph | experiment-data.tar.gz
Paper 6 | 61-operation taxonomy, 4 optimizer behavior predictions, conditional MI analysis, holdout data | cargo test
Paper 7 | 25-operation holdout generalization, coordinate ablation, random baseline comparison | cargo test
Paper 8 | 30 operations × 2 systems × 3 behaviors (180 predictions), Polars + DuckDB | reproduce.sh
Paper 9 | 55 runs across 3 model families × 3 domains × 9 question tiers, ~178k tokens per run | reproduce.sh
Paper 10 | 515 comparisons across 9 strategy families, 3 cache modes (NONE/DAT/SRH), false-hit detection | cargo test
Paper 11 | 180 trials (grounded vs unconstrained), 9 grounding wall property tests, 10-system comparison | cargo test

All datasets are included in their respective GitHub releases. Verification scripts are provided for each paper.

Citation

How to Cite

Research Program

To reference the program as a whole:

Dionysopoulos, T. (2026). BLISP Research Program: Admissibility, Deterministic Execution, Provenance, and Capability-Grounded AI Systems. Zenodo. https://doi.org/10.5281/zenodo.20459958

@misc{blisp2026program,
  title        = {BLISP Research Program: Admissibility, Deterministic Execution,
                  Provenance, and Capability-Grounded AI Systems},
  author       = {Dionysopoulos, Thomas},
  year         = {2026},
  doi          = {10.5281/zenodo.20459958},
  publisher    = {Zenodo},
  url          = {https://doi.org/10.5281/zenodo.20459958},
  note         = {11-paper program; all papers published with DOIs}
}

Individual Papers

# | BibTeX Key | DOI
1 | dionysopoulos2026grounding | 10.5281/zenodo.20817087
2 | dionysopoulos2026canonical | 10.5281/zenodo.20457255
3 | dionysopoulos2026categories | 10.5281/zenodo.20457403
4 | dionysopoulos2026provenance | 10.5281/zenodo.20457667
5 | dionysopoulos2026fibers | 10.5281/zenodo.20457990
6 | dionysopoulos2026semantic | 10.5281/zenodo.20612709
7 | dionysopoulos2026predictive | 10.5281/zenodo.20706294
8 | dionysopoulos2026transfer | 10.5281/zenodo.20706086
9 | dionysopoulos2026convergence | 10.5281/zenodo.20706156
10 | dionysopoulos2026falsehits | 10.5281/zenodo.20815342
11 | dionysopoulos2026verifiedactions | 10.5281/zenodo.20816935

Full BibTeX entries with DOI fields are available on each paper card.