MPRD and the Algorithmic CEO
MPRD turns "model proposes, rules decide" into a checkable control architecture. The algorithmic CEO proposes within bounded menus, while policy gates, invariant rails, and verification decide what executes.
This tutorial maps one concrete system to the architecture from Tutorial 5: "propose, then gate with evidence".
The system is MPRD (Model Proposes, Rules Decide). The proposer is an Algorithmic CEO: a deterministic, IO-free controller that searches a pre-computed decision graph. The gate is policy evaluation. In MPRD, the policy artifact is a Tau specification stored on Tau Net. Policy Algebra is the authoring and certification rail used to build and validate gate logic. The witness loop is an invariant rail with counterexample minimization.
The central claim is narrow and testable: MPRD separates candidate generation from execution authority. Better proposers can expand search quality, but they do not gain permission to execute without passing formal gates.
Source scope: the public MPRD repository at github.com/TheDarkLightX/MPRD.
Mental pictures to keep
- A CEO as a planner with zero execution keys
- A menu graph as a safe decision lattice: traversal is limited to pre-computed edges
- Invariant rails as guardrails on a mountain road: the road is narrow on purpose
- Policy as the gate, represented as Tau specs on Tau Net
- The neuro-symbolic loop from Tutorial 5, realized: propose, gate, witness, refine
Part I: one proposal, one gate, one outcome
The fastest way to understand MPRD is to watch one decision epoch.
- A bounded planner proposes one action.
- A policy gate checks whether that action is allowed in the current state.
- The executor runs only approved actions.
- The invariant rail checks the result and stores a replayable witness if something breaks.
That is the system story in plain English: a planner with no execution keys, a guarded single door, and a replayable audit trail.
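The epoch above can be sketched end to end. This is a minimal, self-contained illustration in Rust; the types, bounds, and function names here are hypothetical stand-ins, not the repository's API:

```rust
// Illustrative sketch of one MPRD decision epoch: propose, gate, execute.
// All names and bounds here are hypothetical, for teaching only.

#[derive(Clone, Copy, PartialEq, Debug)]
enum Verdict { Allow, Deny }

#[derive(Clone, Copy, Debug)]
struct State { burn_bps: u32 }

#[derive(Clone, Copy, Debug)]
struct Action { burn_delta: i32 }

// Proposer: bounded, no execution authority -- it only returns a candidate.
fn propose(s: &State) -> Action {
    if s.burn_bps < 9_500 { Action { burn_delta: 100 } } else { Action { burn_delta: 0 } }
}

// Gate: deterministic policy check; fail-closed outside the valid range.
fn gate(s: &State, a: &Action, emergency_freeze: bool) -> Verdict {
    let next = s.burn_bps as i64 + a.burn_delta as i64;
    if emergency_freeze || next < 5_000 || next > 9_500 { Verdict::Deny } else { Verdict::Allow }
}

// Executor: the single guarded path; runs only on Allow.
fn epoch(s: State, emergency_freeze: bool) -> State {
    let a = propose(&s);
    match gate(&s, &a, emergency_freeze) {
        Verdict::Allow => State { burn_bps: (s.burn_bps as i64 + a.burn_delta as i64) as u32 },
        Verdict::Deny => s, // denied => skipped, state unchanged
    }
}

fn main() {
    let s = State { burn_bps: 7_000 };
    assert_eq!(epoch(s, false).burn_bps, 7_100); // allowed step executes
    assert_eq!(epoch(s, true).burn_bps, 7_000);  // veto => no mutation
    println!("epoch sketch ok");
}
```

Note that `propose` never touches state: the only mutation site is inside the `Allow` arm of `epoch`, which is the single guarded door.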
One decision epoch in MPRD
Three-layer architecture
The architecture separates into three layers with distinct trust profiles:
| Layer | Role | Trust |
|---|---|---|
| Proposer | Generates candidate actions | Untrusted, bounded by interface |
| Governor (Policy on Tau Net) | Evaluates candidates against policy | Trusted, bounded, deterministic |
| Executor | Performs only approved actions | Guarded, single-path |
The separation matters because it makes the safety claim local. Safety depends on the governor and execution boundary, not on the proposer internals. A better proposer can improve search quality, but it does not gain permission to execute by becoming cleverer.
One concrete action trace
Suppose the current tokenomics state is healthy enough that the planner wants to increase burn allocation by one safe step.
- The Algorithmic CEO searches the current menu graph neighborhood and returns one ActionId.
- The policy gate evaluates that action against the current Tau policy on Tau Net.
- If a veto condition such as emergency_freeze is true, the action is denied and execution is skipped.
- If the policy returns allow, the executor performs the action through the single guarded path.
- The invariant rail checks the resulting state. If something fails, the system stores a concrete counterexample trace.
The important teaching point is not the tokenomics detail. It is the control split. The proposer suggests. The governor authorizes. The rail judges the aftermath.
The core invariant in plain language
The top-level safety claim is simple:
Every executed action must already have passed policy.
Everything else on this page is a more precise version of that sentence.
Why the architecture exists
The metaphor behind “Algorithmic CEO” is governance separation, not human-style intelligence. Candidate generation can be abundant. Permission to mutate economic state stays scarce.
Assumption A (explicit): model inference and search remain cheaper than governance failures. If Assumption A holds, design effort should concentrate on policy gates, execution boundaries, and invariant rails, not on granting proposers direct authority.
This connects directly to two earlier tutorials:
- Tutorial 4 (world models): a learned system that updates at runtime needs formal rails to prevent drift. If the model can change itself, then something outside the model must define what counts as “out of bounds”. MPRD is one answer to that design problem.
- Tutorial 5, Part VIII (neuro-symbolic gates): the abstract pattern is “propose, gate, refine”. MPRD is the concrete instantiation of that pattern for economic governance.
Exact logic form in MPRD artifacts
What to look for in the next formulas: each one has the same skeleton, a trigger condition and a required consequence. If the trigger is true and the consequence is false on some trace, that trace is the witness.
In the MPRD repository, the main safety claim appears in execution form as:
\[\forall p, s, a.\; \text{ExecCalled}(p, s, a) \Rightarrow Allowed(p, s, a)\]
Here, $p$ is the policy artifact (the Tau policy specification stored on Tau Net), $s$ is the state snapshot, and $a$ is the candidate action.
The lifecycle spec then adds two always-invariants:
\[\Box(\text{verdict} = \text{Denied} \Rightarrow \text{exec} = \text{skipped})\]
\[\Box(\text{exec} \in \{\text{success}, \text{failed}\} \Rightarrow \text{proof} = \text{verified})\]
In the production ZK profile, the execution gate is phrased as:
\[\text{Execute}(b) \Rightarrow \text{ValidDecision}(b, r) = \text{true}\]
with selector correctness scoped as:
\[\text{Sel}(\text{policy}, \text{state}, \text{candidates}) = \text{action} \Rightarrow \text{action} \in \text{candidates} \;\land\; Allowed(\text{policy}, \text{state}, \text{action})\]
Plain-text mirror of the same invariants:
InvSafety: ExecCalled(p,s,a) => Allowed(p,s,a)
InvDeniedImpliesSkipped: verdict = Denied => exec = skipped
InvExecRequiresVerified: exec in {success, failed} => proof = verified
InvExecuteRequiresValidDecision: Execute(bundle) => ValidDecision(bundle, registry_state) = true
InvSelectorContract: Sel(policy, state, candidates) = action
=> action in candidates and Allowed(policy, state, action)
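The two lifecycle invariants are directly checkable over an execution log. A minimal sketch, assuming a hypothetical record shape (the real artifacts live in the lifecycle spec, not in a Rust struct like this):

```rust
// Hypothetical trace record for checking InvDeniedImpliesSkipped and
// InvExecRequiresVerified over a log of decision epochs.

#[derive(Clone, Copy, PartialEq)]
enum Verdict { Allowed, Denied }
#[derive(Clone, Copy, PartialEq)]
enum Exec { Skipped, Success, Failed }
#[derive(Clone, Copy, PartialEq)]
enum Proof { Pending, Verified }

struct Record { verdict: Verdict, exec: Exec, proof: Proof }

// verdict = Denied => exec = skipped
fn inv_denied_implies_skipped(r: &Record) -> bool {
    r.verdict != Verdict::Denied || r.exec == Exec::Skipped
}

// exec in {success, failed} => proof = verified
fn inv_exec_requires_verified(r: &Record) -> bool {
    !(r.exec == Exec::Success || r.exec == Exec::Failed) || r.proof == Proof::Verified
}

// Returns the index of the first violating record: a concrete witness.
fn check_trace(trace: &[Record]) -> Option<usize> {
    trace.iter().position(|r| {
        !inv_denied_implies_skipped(r) || !inv_exec_requires_verified(r)
    })
}

fn main() {
    let good = Record { verdict: Verdict::Allowed, exec: Exec::Success, proof: Proof::Verified };
    let bad = Record { verdict: Verdict::Denied, exec: Exec::Success, proof: Proof::Verified };
    assert_eq!(check_trace(&[good, bad]), Some(1));
    println!("witness at index 1");
}
```

Each invariant is written as trigger `=>` consequence, encoded as `!trigger || consequence`; a trace index where that fails is exactly the witness the next sections discuss.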
Reading guide for the invariants
Each invariant has the same logical shape: trigger condition => required consequence.
| Notation | Read as | Role in this section |
|---|---|---|
| ∀ p, s, a | for every policy, state, action | universal scope |
| ⇒ | implies, if … then | obligation |
| ∧ | both must hold | conjunction |
| □ | always, at every step | temporal safety |
| ∈ | belongs to set | status membership |
Name legend:
- $p$ = policy (Tau specification on Tau Net).
- $s$ = state snapshot.
- $a$ = candidate action.
- $\text{exec}$ = execution status.
- $\text{proof}$ = proof status.
- $\text{verdict}$ = policy result ($\text{Allowed}$ or $\text{Denied}$ in the lifecycle model).
Invariant-by-invariant interpretation:
- InvSafety: if execution is called for (p,s,a), then the action must be policy-allowed. Pass example: an action executes, and Allowed(p,s,a) is true. Violation example: an action executes while Allowed(p,s,a) is false.
- InvDeniedImpliesSkipped: if the policy verdict is denied, execution must stay skipped. Pass example: verdict is denied, exec = skipped. Violation example: verdict is denied, but exec = success.
- InvExecRequiresVerified: if execution ends in success or failed, proof must already be verified. Pass example: exec = success, proof = verified. Violation example: exec = failed, proof = pending.
- InvExecuteRequiresValidDecision: if the executor runs a bundle, registry-bound decision verification must be true. Pass example: bundle executes only after ValidDecision(bundle, registry_state) = true. Violation example: bundle executes while registry checks fail.
- InvSelectorContract: the selected action must come from the candidate set and be policy-allowed. Pass example: selector returns a listed action with Allowed = true. Violation example: selector returns an action outside candidates, or one with Allowed = false.
The core invariant is not a guideline. It is enforced through five architectural constraints:
- Proposer isolation. Models generate candidates without execution capability.
- Single execution path. All actions funnel through one guarded channel.
- Token requirement. Actions execute only with valid governance tokens.
- Deterministic minting. Tokens issue exclusively when rules permit.
- Verifiable attestation. Proofs enable third-party validation without trusting the proposer.
Part II: the Algorithmic CEO (the bounded proposer)
The Algorithmic CEO is a deterministic, IO-free controller. It is a bounded local-search optimizer over a pre-computed menu graph.
The word “CEO” is metaphorical. This is not general intelligence. It is a bounded optimizer that searches a finite graph under formal constraints. The metaphor is: a CEO who can suggest moves but cannot act without board approval, and who can only suggest moves that are edges in a pre-approved decision lattice.
CEO loop in one recurrence
At each step $t$, the proposer computes:
\[\text{target}_t = \underset{n \in N_h(x_t)}{\operatorname{argmax}}\; \text{Score}(n, \theta_t)\] \[\text{action}_t = \text{StepTowards}(x_t, \text{target}_t)\]where $x_t$ is current state, $N_h(x_t)$ is the bounded neighborhood of radius $h$, and $\theta_t$ is objective configuration.
Execution is then gated by policy:
\[x_{t+1} = \begin{cases} \text{Apply}(x_t, \text{action}_t) & \text{if } Allowed(p_t, x_t, \text{action}_t) \\ x_t & \text{otherwise} \end{cases}\]This is the key control split: the CEO computes proposals, policy grants authority.
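The recurrence can be sketched in one dimension. Everything here (the neighborhood, the toy objective, the policy bounds) is illustrative; the point is the shape: argmax over a bounded neighborhood, one step toward the target, gated before application:

```rust
// One-dimensional sketch of the CEO recurrence: pick the best target in a
// bounded neighborhood, emit a single step toward it, and let policy gate
// the update. All names and the objective are illustrative.

fn neighborhood(x: i64, h: i64) -> Vec<i64> {
    (x - h..=x + h).collect() // bounded candidate set N_h(x)
}

fn score(n: i64) -> i64 {
    -(n - 70) * (n - 70) // toy concave objective peaking at 70
}

fn step_towards(x: i64, target: i64) -> i64 {
    x + (target - x).signum() // one unit step: a single valid "edge"
}

fn allowed(x_next: i64) -> bool {
    (50..=95).contains(&x_next) // toy policy bounds
}

fn step(x: i64, h: i64) -> i64 {
    let target = neighborhood(x, h)
        .into_iter()
        .max_by_key(|n| (score(*n), -*n)) // deterministic tie-break: smaller n wins ties
        .unwrap();
    let action = step_towards(x, target);
    if allowed(action) { action } else { x } // policy grants authority, not the CEO
}

fn main() {
    assert_eq!(step(60, 4), 61); // moves one unit toward the peak at 70
    assert_eq!(step(70, 4), 70); // at the optimum, the step is a NOOP
    println!("recurrence sketch ok");
}
```

The `if allowed(...)` branch is the $x_{t+1}$ case split above: a denied action leaves $x_t$ unchanged.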
CEO interface in practice
Per decision epoch, the Algorithmic CEO is a deterministic function:
- Inputs: current state snapshot, fixed menu graph, objective configuration, horizon bound.
- Output: one ActionId (or NOOP) that corresponds to a valid graph edge.
- No authority: no direct state mutation and no bypass path around policy evaluation.
This interface is what makes proposer-swapping an engineering object instead of a governance hazard. Different proposers can be swapped in, but every proposer must emit the same bounded action type and pass the same policy gate.
Why this is an algorithmic upgrade, within a bounded scope
Within the pre-computed menu graph, the Algorithmic CEO is not one trick. It is a stack of algorithmic choices that make bounded autonomy practical:
- Decision-space compilation. Continuous controls are compiled into a finite lattice of valid states, so invalid states are unrepresentable.
- Geometry shortcut with proof. A Lean-checked result shows $L^\infty$ distance equals shortest-path length in the menu graph, removing the need for runtime graph search.
- Deterministic total ordering. Candidate choice is fully deterministic (score, then distance, then key), making runs replayable and auditable.
- Numerically safe scoring. Saturating arithmetic and hard constraint sentinels make objective evaluation fail-bounded.
- Scalable planning upgrades. The simplex extension adds partial-order and symmetry reductions, so larger action spaces remain searchable under fixed budgets.
The net effect is scoped but important: inside that bounded graph, the CEO behaves like a deterministic search component, not like a black-box agent with open-ended authority.
The menu graph
The decision space is pre-computed as a directed graph.
Nodes are valid setpoint configurations. In the v6 tokenomics, each node is a MenuNode consisting of three bounded parameters:
- BurnPct: burn surplus allocation (5,000–9,500 bps, in 100 bps steps, yielding 46 unit values)
- AuctionPct: auction surplus allocation (500–5,000 bps, in 100 bps steps, yielding 46 unit values)
- DripStep: drip rate (5–100 bps, in 5 bps steps, yielding 20 unit values)
A split cap constraint enforces burn_bps + auction_bps ≤ 10,000. Nodes that violate the cap are never created.
Edges are feasible one-step transitions. Each step changes at most one unit in each dimension (burn, auction, drip), giving 27 possible deltas per node (3 choices per dimension: step up, step down, or hold). An ActionId encodes each delta as an index from 0 to 26, with ActionId(13) as NOOP.
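The 27-delta encoding can be made concrete with a few lines of base-3 arithmetic. The dimension ordering (burn, auction, drip) is an assumption for illustration; only the counts (27 deltas, NOOP at index 13) come from the text:

```rust
// Sketch of the 27-delta encoding: each dimension steps -1, 0, or +1,
// packed base-3 into an index 0..=26, with 13 (= hold, hold, hold) as NOOP.
// The (burn, auction, drip) digit order is an assumption for illustration.

fn encode(d_burn: i8, d_auction: i8, d_drip: i8) -> u8 {
    let t = |d: i8| (d + 1) as u8; // map -1, 0, +1 -> 0, 1, 2
    t(d_burn) * 9 + t(d_auction) * 3 + t(d_drip)
}

fn decode(id: u8) -> (i8, i8, i8) {
    let f = |x: u8| x as i8 - 1;
    (f(id / 9), f((id / 3) % 3), f(id % 3))
}

fn main() {
    assert_eq!(encode(0, 0, 0), 13); // hold everything = NOOP
    assert_eq!(decode(13), (0, 0, 0));
    for id in 0u8..27 {
        let (a, b, c) = decode(id);
        assert_eq!(encode(a, b, c), id); // round-trips across all 27 deltas
    }
    println!("27 deltas, NOOP = 13");
}
```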
The graph itself is the safety boundary. You cannot propose a move that is not an edge. The MenuGraph::generate() method enumerates all valid lattice points, sorts them canonically by key, and builds the full adjacency matrix. The canonical hash uses domain-separated SHA-256 (MPRD_CEO_MENU_GRAPH_V1) for deterministic audit trails.
This is a reformulation move in the sense of Tutorial 5, Part III: a continuous parameter space has been decomposed into a discrete, searchable lattice. The decomposition makes invalid states unrepresentable and search bounded.
Toy picture: imagine only two knobs, burn and auction, each with three legal settings. The menu graph is then a tiny lattice of legal points. A proposal is not “jump anywhere.” It is “take one legal edge from the current node.”
Greedy algorithm
The GreedyCeo implements a single-step greedy planner:
- Enumerate reachable nodes within an $L^\infty$ ball of radius horizon (max 8, giving at most $(2h+1)^3 = 4{,}913$ candidates).
- Score each candidate using a deterministic objective function.
- Select the best target with deterministic tie-breaking: higher score → smaller $L^\infty$ distance → smaller $L^1$ distance → smaller key.
- Return one safe step toward the target using step_towards_key.
The $L^\infty$ distance equals the exact shortest-path length in the safe-menu graph (proved in the Lean artifact internal/specs/mprd_ceo_menu_shortest_path_proofs.lean). This means the planner can navigate without running a full graph search: sign-directed one-step movement reaches the target in exactly $d_\infty$ steps.
Pseudocode view
INPUT: state x, objective config θ, horizon h
R := neighborhood(x, h) // bounded candidate set
best := argmax_total_order(R, Score(·, θ))
return step_towards_key(x, best.key) // one valid edge
The important point is not only optimization quality. The point is that every operation in this loop is bounded, deterministic, and auditable.
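The four-level tie-break is what makes selection a total order and runs replayable. A sketch of that comparator, with an illustrative candidate shape (the real `GreedyCeo` works over graph keys, not this struct):

```rust
use std::cmp::Ordering;

// Sketch of the greedy CEO's deterministic total order over candidates:
// higher score, then smaller L-infinity distance, then smaller L1 distance,
// then smaller key. The Candidate struct is illustrative.

#[derive(Debug, PartialEq)]
struct Candidate { score: i64, d_inf: u32, d_l1: u32, key: u64 }

fn better(a: &Candidate, b: &Candidate) -> Ordering {
    b.score.cmp(&a.score)            // higher score wins
        .then(a.d_inf.cmp(&b.d_inf)) // then closer in L-infinity
        .then(a.d_l1.cmp(&b.d_l1))   // then closer in L1
        .then(a.key.cmp(&b.key))     // then smaller canonical key
}

fn pick<'a>(cands: &'a [Candidate]) -> &'a Candidate {
    cands.iter().min_by(|a, b| better(a, b)).unwrap() // "least" = best
}

fn main() {
    let cands = [
        Candidate { score: 10, d_inf: 3, d_l1: 5, key: 7 },
        Candidate { score: 10, d_inf: 2, d_l1: 6, key: 9 }, // wins on d_inf
        Candidate { score: 9,  d_inf: 1, d_l1: 1, key: 1 },
    ];
    assert_eq!(pick(&cands).key, 9);
    println!("tie-break ok");
}
```

Because every level of the chain is a total order on integers, no two distinct candidates can compare equal, which is exactly what replayability requires.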
Multi-objective evaluation
Three objective regimes are available:
- ProfitUtility: scores based on cashflow, auction revenue, and burn, penalized by risk and churn.
- OpiFirst: prioritizes OPI (operator profitability index) health, with a revenue floor constraint.
- Hybrid: weighted combination of the two, where profit_weight_bps + opi_weight_bps = 10,000.
All scoring uses saturating arithmetic (add_i64_saturating, sub_i64_saturating, mul_bps_i64), so overflow produces bounded values rather than panics or wraps. Reserve floor violations and revenue floor violations produce CONSTRAINT_VIOLATION_SCORE = i64::MIN / 4, a sentinel that deterministically ranks any violating configuration below all feasible alternatives.
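The fail-bounded style can be sketched in a few lines. Only the names mirror the helpers mentioned above; the function bodies and the toy objective are assumptions:

```rust
// Sketch of fail-bounded scoring: saturating arithmetic plus a constraint
// sentinel that deterministically ranks violators below all feasible scores.
// Bodies are illustrative; only the names mirror the text.

const CONSTRAINT_VIOLATION_SCORE: i64 = i64::MIN / 4;

fn sub_i64_saturating(a: i64, b: i64) -> i64 {
    a.saturating_sub(b) // clamps at i64::MIN / i64::MAX instead of wrapping
}

fn score(cashflow: i64, risk_penalty: i64, reserve_ok: bool) -> i64 {
    if !reserve_ok {
        // Sentinel: below every feasible score, yet far enough from
        // i64::MIN that later additions cannot wrap around.
        return CONSTRAINT_VIOLATION_SCORE;
    }
    sub_i64_saturating(cashflow, risk_penalty)
}

fn main() {
    assert_eq!(score(1_000, 200, true), 800);
    assert_eq!(score(i64::MAX, -1, true), i64::MAX); // saturates, no panic
    assert!(score(0, 0, false) < score(-1_000_000, 0, true)); // violators rank last
    println!("fail-bounded scoring ok");
}
```

The choice of `i64::MIN / 4` rather than `i64::MIN` matters: a sentinel at the exact minimum would wrap on the first subsequent subtraction, while a quarter-minimum stays safely below any feasible score and survives further saturating arithmetic.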
Objective configuration changes are gated by a cooldown: can_update(now, cooldown_epochs) rejects updates until enough epochs have elapsed since the last change. This prevents oscillation.
The connection to Tutorial 5, Part III is that the menu graph decomposes a continuous parameter space into a discrete lattice, and the objective function turns a multi-criteria optimization into a single deterministic score. Both are canonicalization moves that reduce the effective search space.
Part III: policy gate, Tau storage, and Policy Algebra rail
In MPRD, the gate is the policy itself. The canonical policy artifact is a Tau specification stored on Tau Net. Every state transition on a trust boundary requires explicit, deterministic, fail-closed policy evaluation before execution.
Policy Algebra is the rail around that gate: it structures policy logic, canonicalizes it, and supports certification workflows (including Tau emission and semantic checks).
AST
The policy expression language is a small, bounded tree:
PolicyExpr ::= True | False
| Atom(name)
| Not(child)
| All(children) -- conjunction
| Any(children) -- disjunction
| Threshold(k, children)
| DenyIf(name) -- absorbing veto
Bounds are enforced by PolicyLimits: max 64 children per node, max 1,024 nodes per tree, max 64-character atom names. These are safety rails against denial-of-service and non-termination, not tokenomics parameters.
DenyIf as absorbing veto
DenyIf is the central safety mechanism. It is evaluated in a separate first phase, before the rest of the tree:
- Phase 1: Extract all DenyIf atoms from the entire tree. For each, check the signal. If any DenyIf atom is true (the dangerous condition holds) or missing (the signal is absent), the entire policy evaluates to DenyVeto. No further evaluation occurs.
- Phase 2: If all DenyIf atoms are false (the dangerous conditions are confirmed absent), evaluate the rest of the tree normally.
This is fail-closed by design. Missing data does not default to “allow”. It defaults to “veto”.
Tiny example: if the rest of the tree says “OPI healthy” and “cooldown elapsed”, but emergency_freeze is true, or the signal is missing, the whole policy stops right there with DenyVeto. The system does not keep arguing with the veto.
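The two-phase, fail-closed evaluation can be sketched with a reduced AST (only Atom, All, and DenyIf; the real language also has Any, Not, and Threshold, and returns Neutral internally):

```rust
use std::collections::HashMap;

// Minimal sketch of two-phase, fail-closed policy evaluation over a
// reduced version of the documented AST. Illustrative, not the real API.

enum Expr {
    Atom(&'static str),
    All(Vec<Expr>),
    DenyIf(&'static str),
}

#[derive(Debug, PartialEq)]
enum Outcome { Allow, DenySoft, DenyVeto }

fn collect_deny_ifs(e: &Expr, out: &mut Vec<&'static str>) {
    match e {
        Expr::DenyIf(name) => out.push(*name),
        Expr::All(children) => children.iter().for_each(|c| collect_deny_ifs(c, out)),
        Expr::Atom(_) => {}
    }
}

fn eval(e: &Expr, signals: &HashMap<&str, bool>) -> Outcome {
    // Phase 1: any DenyIf that is true OR missing vetoes everything.
    let mut vetoes = Vec::new();
    collect_deny_ifs(e, &mut vetoes);
    for name in vetoes {
        if signals.get(name).copied().unwrap_or(true) {
            return Outcome::DenyVeto; // fail-closed: missing signal counts as dangerous
        }
    }
    // Phase 2: normal evaluation; DenyIf atoms are neutral here.
    fn holds(e: &Expr, signals: &HashMap<&str, bool>) -> bool {
        match e {
            Expr::Atom(name) => signals.get(name).copied().unwrap_or(false),
            Expr::All(children) => children.iter().all(|c| holds(c, signals)),
            Expr::DenyIf(_) => true, // already handled in Phase 1
        }
    }
    if holds(e, signals) { Outcome::Allow } else { Outcome::DenySoft }
}

fn main() {
    let gate = Expr::All(vec![
        Expr::Atom("opi_healthy"),
        Expr::DenyIf("emergency_freeze"),
    ]);
    let mut s = HashMap::from([("opi_healthy", true), ("emergency_freeze", false)]);
    assert_eq!(eval(&gate, &s), Outcome::Allow);
    s.insert("emergency_freeze", true);
    assert_eq!(eval(&gate, &s), Outcome::DenyVeto); // veto absorbs everything
    s.remove("emergency_freeze");
    assert_eq!(eval(&gate, &s), Outcome::DenyVeto); // missing signal also vetoes
    println!("fail-closed gate ok");
}
```

Note the asymmetry in the two `unwrap_or` defaults: a missing veto signal defaults to true (veto), while a missing ordinary atom defaults to false (deny). Missing data never widens permissions.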
4-valued outcomes
Policy evaluation produces one of four outcomes, not two:
| Outcome | Meaning |
|---|---|
| Allow | All conditions met, action permitted |
| DenySoft | Normal denial, conditions not met |
| DenyVeto | Absorbing veto, DenyIf fired or signal missing |
| Neutral | No opinion (used internally for DenyIf atoms in Phase 2) |
This is not standard boolean algebra. Classical laws can fail in the presence of DenyIf:
- Idempotence can fail: All([DenyIf("x"), DenyIf("x")]) is not the same as DenyIf("x") in all contexts, because the tree structure affects Phase 1 collection.
- Excluded middle can fail: Any([Atom("x"), Not(Atom("x"))]) need not be Allow if a DenyIf elsewhere fires first.
- Non-contradiction can fail: the 4-valued output means a policy can be neither Allow nor DenySoft; it can be DenyVeto or Neutral.
This is intentional. Safety overrides algebraic convenience. The evaluation semantics are designed so that no composition of Allow-producing sub-policies can override a DenyVeto.
Canonicalization and hashing
The CanonicalPolicy type normalizes policy expressions into a stable canonical form:
- Flatten nested All/Any (associativity).
- Remove identity elements (True in All, False in Any).
- Eliminate contradictions and complements (only when no DenyIf is present, safety-preserving).
- Absorb redundant sub-expressions.
- Sort children by deny_if_rank then by encoded bytes, giving a total canonical order.
- Deduplicate children in All/Any; preserve multiplicity in Threshold.
The canonical form is then serialized and hashed with domain separation. Two policies that are semantically equivalent under the 4-valued semantics will have the same canonical hash. This enables deterministic audit trails: if a policy changes, the hash changes, and a diff with concrete counterexample inputs can be produced automatically.
This is canonicalization in the sense of Tutorial 5, Part III: pick one representative per equivalence class, so comparison and search work on unique representatives instead of redundant variants.
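A stripped-down sketch of the flatten-sort-deduplicate idea, on an AST reduced to Atom and All. The real pipeline additionally handles Any, Threshold, and DenyIf ranking, and hashes the canonical bytes with domain-separated SHA-256; this sketch only compares canonical forms directly:

```rust
// Minimal canonicalization sketch: flatten nested Alls, sort, and
// deduplicate so that two equivalent policies get one representative.
// Illustrative only; the real canonicalizer covers the full AST.

#[derive(Debug, PartialEq, Clone)]
enum Expr {
    Atom(String),
    All(Vec<Expr>),
}

fn canonicalize(e: &Expr) -> Expr {
    match e {
        Expr::Atom(a) => Expr::Atom(a.clone()),
        Expr::All(children) => {
            let mut flat: Vec<String> = Vec::new();
            for c in children {
                match canonicalize(c) {
                    Expr::Atom(a) => flat.push(a),
                    Expr::All(inner) => {
                        // Flatten: All(All(xs), y) == All(xs..., y).
                        for g in inner {
                            if let Expr::Atom(a) = g { flat.push(a) }
                        }
                    }
                }
            }
            flat.sort();  // total canonical order
            flat.dedup(); // idempotence for All
            Expr::All(flat.into_iter().map(Expr::Atom).collect())
        }
    }
}

fn main() {
    let a = Expr::All(vec![
        Expr::Atom("y".into()),
        Expr::All(vec![Expr::Atom("x".into()), Expr::Atom("y".into())]),
    ]);
    let b = Expr::All(vec![Expr::Atom("x".into()), Expr::Atom("y".into())]);
    assert_eq!(canonicalize(&a), canonicalize(&b)); // same representative
    println!("canonical forms match");
}
```

Once equivalent expressions collapse to a single representative, hashing the representative gives the semantic-hash property described above: equal meaning, equal hash.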
Concrete example: the CEO gate
Before the Algorithmic CEO can change tokenomics parameters, its proposed action must pass through a policy gate. A typical gate policy might look like:
All([
Atom("opi_healthy"), -- OPI above threshold
Atom("reserve_sufficient"), -- reserve covers floor
Atom("cooldown_elapsed"), -- enough epochs since last change
DenyIf("emergency_freeze") -- absolute veto if emergency detected
])
If emergency_freeze is true, or if its signal is missing, the entire policy evaluates to DenyVeto, regardless of whether OPI is healthy, reserves are sufficient, and cooldown has elapsed. The DenyIf absorbs everything.
The PolicyGateV6 trait defines the interface:
pub trait PolicyGateV6 {
fn check(&self, eng: &TokenomicsV6, action: &ActionV6) -> Result<()>;
}
An Err result means the action is denied. The executor will not proceed.
Part IV: invariant rails and counterexample minimization
The invariant rail
The invariant rail runs after every state transition. It checks three categories of properties:
1. No mutation on error. If an action returns Err, the state hash must not change. If it did, the engine violated its own contract.
2. Tokenomics invariants. After every successful action, TokenomicsV6::check_invariants_v1 validates engine-level properties (balance conservation, bound consistency, etc.).
3. Transition-level conservation. Specific action outcomes must satisfy conservation laws. For example, after SettleOpsPayroll:
\[p_{\text{total}} + c_{\text{reserve}} = o_{\text{pool}}\]
Here $p_{\text{total}}$ is the payout total, $c_{\text{reserve}}$ is the carry sent back to reserve, and $o_{\text{pool}}$ is the original operations-payroll pool.
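As a one-function sketch (the outcome struct and field names are illustrative), the conservation check is an exact equality with checked arithmetic, so an overflow is itself a violation rather than a silent wrap:

```rust
// Sketch of the transition-level conservation check after a payroll
// settlement: payout plus reserve carry must exactly equal the original
// pool. The struct and field names are illustrative.

struct PayrollOutcome { p_total: u64, c_reserve: u64, o_pool: u64 }

fn conserves(o: &PayrollOutcome) -> bool {
    match o.p_total.checked_add(o.c_reserve) {
        Some(sum) => sum == o.o_pool,
        None => false, // overflow counts as a violation, not a wrap
    }
}

fn main() {
    assert!(conserves(&PayrollOutcome { p_total: 900, c_reserve: 100, o_pool: 1_000 }));
    assert!(!conserves(&PayrollOutcome { p_total: 900, c_reserve: 150, o_pool: 1_000 }));
    println!("conservation ok");
}
```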
If any invariant fails, the rail returns an InvariantCounterexampleV6 containing the violation ID, the step index, the state hash, and the full action trace.
This connects to the counterexample analysis from Tutorial 1, Part IV: a single concrete trace that violates an “always” claim is the most informative kind of failure evidence.
Counterexample minimization
When a violation is found, the trace may be longer than necessary. The minimize_counterexample_v1 function reduces it to a minimal failing trace using deterministic delta-debugging:
- Verify the provided trace reproduces the violation.
- Initialize chunk size $n = 2$.
- Loop: divide the action sequence into $n$ chunks. For each chunk, try removing it. If the reduced trace still triggers the same invariant ID:
- Accept the reduction.
- Reset chunk size toward 2.
- Restart.
- If no chunk removal preserves the violation, double $n$ (up to sequence length).
- Return the minimized counterexample.
The result is a minimal witness: the shortest sub-trace that triggers the specific invariant violation. This is the “minimized witness” concept from Tutorial 1, Part V: a counterexample becomes more useful when it carries less irrelevant context.
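The chunk-removal loop can be sketched generically. Here the `fails` predicate stands in for replaying a trace against the invariant rail; the function name and shape are illustrative, not the repository's `minimize_counterexample_v1` signature:

```rust
// Sketch of deterministic delta-debugging over an action trace: remove
// chunks while the reduced trace still triggers the same violation.
// The `fails` closure stands in for replay against the invariant rail.

fn minimize<T: Clone>(trace: &[T], fails: &dyn Fn(&[T]) -> bool) -> Vec<T> {
    assert!(fails(trace), "trace must reproduce the violation");
    let mut current: Vec<T> = trace.to_vec();
    let mut n = 2usize;
    while current.len() >= 2 {
        let chunk = (current.len() + n - 1) / n; // ceil(len / n)
        let mut reduced = false;
        for start in (0..current.len()).step_by(chunk) {
            let end = (start + chunk).min(current.len());
            let mut candidate = current.clone();
            candidate.drain(start..end); // try removing this chunk
            if !candidate.is_empty() && fails(&candidate) {
                current = candidate; // accept the reduction
                n = 2;               // reset granularity and restart
                reduced = true;
                break;
            }
        }
        if !reduced {
            if n >= current.len() { break; } // finest granularity exhausted
            n = (n * 2).min(current.len());  // refine: smaller chunks
        }
    }
    current
}

fn main() {
    // Violation fires whenever action 3 appears before action 7.
    let fails = |t: &[u32]| {
        let i = t.iter().position(|&x| x == 3);
        let j = t.iter().rposition(|&x| x == 7);
        matches!((i, j), (Some(a), Some(b)) if a < b)
    };
    let trace = vec![1, 2, 3, 4, 5, 6, 7, 8];
    let min = minimize(&trace, &fails);
    assert!(fails(&min));
    assert_eq!(min, vec![3, 7]); // minimal witness
    println!("minimized to {:?}", min);
}
```

Determinism matters here as much as minimality: with a fixed chunk order and reset rule, the same violation always minimizes to the same witness, so witnesses are comparable across runs.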
Testing strategy: LTLf and bounded model checking
The invariant rail is complemented by temporal property checking using LTLf (Linear Temporal Logic over finite traces) with bounded model checking (BMC).
The ltlf module implements a minimal LTLf monitor with:
- Formula constructors: always(φ), eventually(φ), until(φ, ψ), release(φ, ψ), next(φ), weak_next(φ).
- Progression semantics: progress(f, valuation) advances a formula by one step.
- BMC: bmc_find_violation(spec, initial, max_steps, step) performs explicit-state search for counterexample traces.
- Best-effort synthesis: synthesize_bounded(spec, initial, max_steps, actions, step) implements a two-player game between the system and an adversarial environment, producing Guaranteed, Possible, or Impossible verdicts.
Concrete temporal properties tested against the tokenomics engine include:
- Settlement ordering: settlements must precede certain epoch-close operations.
- No mutation on error: temporal invariant version of the per-step check.
- Epoch-close forbids open actions: after settlements close an epoch, drip and service actions must fail.
The synthesis mode distinguishes between deterministic environments (where the system controls all choices) and adversarial environments (where some steps may fail non-deterministically). A Guaranteed verdict means the spec holds against all environment choices. A Possible verdict means the spec holds along some path but not all.
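Progression is the key mechanism, and a tiny fragment shows the idea. This sketch covers only the "always p" shape (the real ltlf module has the full constructor set above); names and types are illustrative:

```rust
// Tiny LTLf progression sketch for the "always p" fragment: progressing
// always(p) by one step yields either the same formula (keep watching) or
// False (a violation at the current step). Illustrative, not the real API.

#[derive(Clone, Debug, PartialEq)]
enum Ltlf {
    True,
    False,
    Atom(&'static str),
    Always(Box<Ltlf>),
}

fn progress(f: &Ltlf, holds: &dyn Fn(&str) -> bool) -> Ltlf {
    match f {
        Ltlf::True => Ltlf::True,
        Ltlf::False => Ltlf::False,
        Ltlf::Atom(p) => if holds(p) { Ltlf::True } else { Ltlf::False },
        Ltlf::Always(inner) => {
            // Always(f) progresses to f(now) AND Always(f); for this
            // fragment that collapses to "still watching" or False.
            match progress(inner, holds) {
                Ltlf::False => Ltlf::False,
                _ => f.clone(),
            }
        }
    }
}

// Monitor a finite trace; return the index of the first violating step.
fn monitor(mut f: Ltlf, trace: &[bool]) -> Option<usize> {
    for (i, &v) in trace.iter().enumerate() {
        f = progress(&f, &|_: &str| v);
        if f == Ltlf::False { return Some(i); }
    }
    None
}

fn main() {
    let spec = Ltlf::Always(Box::new(Ltlf::Atom("no_mutation_on_error")));
    assert_eq!(monitor(spec.clone(), &[true, true, true]), None);
    assert_eq!(monitor(spec, &[true, false, true]), Some(1)); // witness step
    println!("ltlf monitor ok");
}
```

BMC is then this monitor run over enumerated traces: any trace where the formula progresses to False is a counterexample, and the step index is the witness.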
This connects to Tutorial 3 (Tau specifications): Tau specs can state stepwise constraints and temporal obligations. The beginner tutorial focuses on the common stepwise fragment, and the temporal checker here searches for violations of the larger temporal claim.
Part V: simplex planning (bounded-horizon search with symmetry)
The simplex CEO extends the greedy CEO to multi-step planning over a generalized k-way allocation.
k-way simplex
In the simplex model, allocations are a bounded vector of $k$ non-negative integers, each capped individually, optionally constrained to sum to a constant $T$. Actions are transfers: move one unit from bucket $i$ to bucket $j$, subject to:
- Source has at least 1 unit: x[src] > 0
- Destination is below cap: x[dst] < cap[dst]
If a transfer is disabled (preconditions not met), it acts as a no-op. The sum is preserved by construction: every transfer subtracts 1 from one bucket and adds 1 to another. This is correct by construction (CBC): invalid states are unrepresentable because the transfer semantics make it impossible to violate the sum constraint or the per-bucket bounds.
For $k$ buckets with total $T$, the number of distinct states is the stars-and-bars count $\binom{T+k-1}{k-1}$, further reduced by per-bucket caps. For 3–4 buckets with moderate totals, this is manageable. Beyond that, coarser granularity or symmetry reduction is needed.
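Both the transfer semantics and the state count are easy to make concrete. The helper names below are illustrative; the stars-and-bars formula and the "disabled transfer is a no-op" rule come from the text:

```rust
// Sketch of k-way transfer semantics: move one unit from src to dst, acting
// as a no-op when preconditions fail, so the sum and caps hold by
// construction. Also: the stars-and-bars state count for k buckets.

fn transfer(x: &mut Vec<u32>, cap: &[u32], src: usize, dst: usize) -> bool {
    if src == dst || x[src] == 0 || x[dst] >= cap[dst] {
        return false; // disabled transfer acts as a no-op
    }
    x[src] -= 1;
    x[dst] += 1;
    true
}

// C(t + k - 1, k - 1): distinct states for k buckets summing to t,
// ignoring per-bucket caps (which only shrink this number).
fn state_count(t: u64, k: u64) -> u64 {
    let (n, r) = (t + k - 1, k - 1);
    let mut acc: u64 = 1;
    for i in 1..=r {
        acc = acc * (n - r + i) / i; // each intermediate is itself a binomial
    }
    acc
}

fn main() {
    let cap = [100, 100, 100];
    let mut x = vec![40, 30, 30];
    let total: u32 = x.iter().sum();
    assert!(transfer(&mut x, &cap, 0, 1));
    assert_eq!(x, vec![39, 31, 30]);
    assert_eq!(x.iter().sum::<u32>(), total); // sum invariant by construction
    assert!(!transfer(&mut x, &cap, 2, 2));   // disabled: no-op
    assert_eq!(state_count(10_000, 3), 50_015_001); // 3 buckets, T = 10,000 bps
    println!("simplex transfer ok");
}
```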
Three planning modes
The simplex planner supports three modes, each a reformulation insight:
TracePor (Mazurkiewicz-style partial-order reduction):
Two transfers commute if executing them in either order produces the same state. The POR oracle stable_enabled_ineq gives a closed-form sufficient condition:
- If two transfers share a source, we need at least 2 units there so both can fire independently.
- If two transfers share a destination, we need at least 2 slack (cap minus current) so both can fit.
When transfers commute, only one ordering needs to be explored. TracePor canonicalizes traces by certified swaps: given a trace of transfers, rebuild it by inserting each new action at the canonical position (determined by the independence oracle). The canonical trace represents an equivalence class of all orderings that produce the same final state.
This is canonicalization from Tutorial 5, Part III applied to action sequences: many execution orders that reach the same state are collapsed to one canonical representative, and search only visits representatives.
Toy example: if one transfer moves one unit from bucket A to B and another moves one unit from C to D, and those endpoints do not interfere, then doing A->B first or C->D first lands in the same state. Search should pay for that situation once, not twice.
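Commutation itself has a brute-force definition that any symbolic oracle must agree with: apply the two transfers in both orders from the same state and compare. A sketch (types illustrative):

```rust
// Sketch of the commutation check behind partial-order reduction: two
// transfers are independent at a state when either order yields the same
// state. A closed-form oracle certifies this symbolically; the brute-force
// version below is what such an oracle must agree with.

#[derive(Clone, Copy, PartialEq, Debug)]
struct Transfer { src: usize, dst: usize }

fn apply(mut x: Vec<u32>, cap: &[u32], t: Transfer) -> Vec<u32> {
    if t.src != t.dst && x[t.src] > 0 && x[t.dst] < cap[t.dst] {
        x[t.src] -= 1;
        x[t.dst] += 1;
    }
    x // disabled transfers are no-ops
}

fn commute(x: &[u32], cap: &[u32], a: Transfer, b: Transfer) -> bool {
    let ab = apply(apply(x.to_vec(), cap, a), cap, b);
    let ba = apply(apply(x.to_vec(), cap, b), cap, a);
    ab == ba
}

fn main() {
    let cap = [10, 10, 10, 10];
    // Disjoint endpoints: order never matters.
    assert!(commute(&[5, 5, 5, 5], &cap,
        Transfer { src: 0, dst: 1 }, Transfer { src: 2, dst: 3 }));
    // Shared source with only 1 unit: order matters (the loser is starved),
    // which is why the oracle demands at least 2 units at a shared source.
    assert!(!commute(&[1, 5, 5, 5], &cap,
        Transfer { src: 0, dst: 1 }, Transfer { src: 0, dst: 2 }));
    println!("commutation check ok");
}
```

The second assertion is exactly the shared-source condition from the oracle: with only one unit available, the two transfers are not independent, so both orderings must be explored.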
StateSymmetry (quotient by symmetry classes):
When buckets are interchangeable (same cap, same objective weight, same role), the state space can be quotiented by permutations within each class. The symmetry_key function computes a canonical key:
- Group buckets by (cap, weight).
- Within each group, sort the bucket values.
- The sorted group vectors form the canonical key.
Two states that differ only by a permutation of interchangeable buckets will have the same key. Search visits one representative per equivalence class.
This is the quotient search from Tutorial 5, Part III: instead of searching raw states, search equivalence classes.
Toy example: if bucket 1 and bucket 2 have the same cap, weight, and role, then states [4,1,7] and [1,4,7] are different raw arrays but the same planning situation. Symmetry reduction keeps one representative and discards the duplicate paperwork.
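The three steps above fit in one function. A sketch, with an assumed signature (the real `symmetry_key` may key on more than `(cap, weight)`):

```rust
use std::collections::BTreeMap;

// Sketch of the symmetry quotient: group bucket values by (cap, weight),
// sort within each group, and use the grouped, sorted vectors as the key.
// States differing only by a permutation of interchangeable buckets match.

fn symmetry_key(values: &[u32], cap: &[u32], weight: &[u32]) -> Vec<((u32, u32), Vec<u32>)> {
    let mut groups: BTreeMap<(u32, u32), Vec<u32>> = BTreeMap::new();
    for i in 0..values.len() {
        groups.entry((cap[i], weight[i])).or_default().push(values[i]);
    }
    groups
        .into_iter()
        .map(|(k, mut v)| { v.sort(); (k, v) }) // canonical order inside each class
        .collect()
}

fn main() {
    let cap = [10, 10, 20];
    let weight = [1, 1, 3];
    // Buckets 0 and 1 are interchangeable; bucket 2 is its own class.
    let a = symmetry_key(&[4, 1, 7], &cap, &weight);
    let b = symmetry_key(&[1, 4, 7], &cap, &weight);
    let c = symmetry_key(&[7, 1, 4], &cap, &weight);
    assert_eq!(a, b); // same equivalence class: one representative
    assert_ne!(a, c); // bucket 2 differs, so genuinely different states
    println!("one representative per class");
}
```

Using a `BTreeMap` rather than a hash map is the small design choice that matters: iteration order over classes is itself canonical, so the key is deterministic across runs.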
AmplePorDfsC2 (depth-first with ample-set POR and C2 cycle proviso):
This mode uses a more aggressive POR strategy with depth-first search:
- At each state, compute an ample set: a subset of enabled actions sufficient to represent all behaviors (under the POR independence assumptions).
- Use a DFS stack to detect cycles and apply the C2 proviso: if a cycle would be created, expand the ample set to full to avoid missing reachable states.
- A safety visibility contract prevents reduction at states where any enabled action would change the linear objective score, ensuring the optimization criterion is fully visible to the planner.
Five ample-set strategies are implemented (None, MinOnly, MinPlusDependentsOfMin, MinPlusDependencyClosure, DfsC2), with exhaustive bounded counterexample mining to validate each.
Assumption hygiene: POR soundness is bounded, not universal
The simplex ample-set module explicitly states: "This module makes no global soundness claim by default. Any pruning rule must be validated by (a) bounded exhaustive checks and/or (b) Lean proofs."
Tests include dedicated counterexample miners that compare POR-reduced reachability against full reachability on small instances. Some strategies (MinOnly, MinPlusDependencyClosure) have known counterexamples and are not promoted.
Bounded exploration
All planning modes are bounded by two parameters:
- Horizon: maximum depth of the search.
- Expansion budget: maximum number of states expanded.
The planner terminates deterministically when either bound is reached. Tie-breaking for the best plan follows the same total order as the greedy CEO: score descending → depth ascending → first-action key ascending → target state lexicographic ascending.
Combinatorics and the connection to VC dimension
For $k$ buckets with total $T$, the unreduced state count is $\binom{T+k-1}{k-1}$. For 3 buckets with $T = 10{,}000$ bps, that is $\binom{10002}{2} = 50{,}015{,}001$ states, tractable with POR but large enough to motivate symmetry reduction. For 4 buckets, it grows to $\binom{10003}{3} \approx 1.67 \times 10^{11}$, which requires coarser granularity or aggressive quotienting.
This connects to the hypothesis space analysis from Tutorial 5, Part III: the effective size of the search space determines how much evidence (search budget) is needed to find a good plan, and reformulation techniques (POR, symmetry) reduce that effective size without losing reachable optima.
Part VI: the link back (MPRD as the neuro-symbolic loop, realized)
Tutorial 5, Part VIII defined the neuro-symbolic gate as an architecture:
A model proposes candidates. A formal checker decides which candidates are allowed, and returns counterexamples when a candidate is wrong.
MPRD is this pattern instantiated for economic governance:
| Loop role | MPRD component |
|---|---|
| Proposer | Algorithmic CEO (greedy/simplex planner over menu graph) |
| Gate | Policy evaluation (Tau policy on Tau Net, fail-closed) |
| Witness | Invariant rail violation (minimized counterexample trace) |
| Refinement | PID controllers adjusting setpoints toward objectives |
The safety controller as Simplex architecture
The SafetyController implements a constrained PID loop:
- Normal operation: given setpoints (target burn, auction, drip) from the CEO or external input, compute a one-step action toward the target, constrained to valid graph transitions.
- Fallback: if the state is corrupted or the graph version mismatches, fall back to NOOP: do nothing rather than do something wrong.
- Split cap enforcement: after PID computation, enforce_split_cap_preserve_burn ensures the resulting allocation respects the global constraint, favoring burn preservation.
This is a Simplex-style architecture (in the control-theory sense): a normal controller proposes, a safety controller constrains the proposal to the valid operating region, and a fallback ensures graceful degradation.
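The split-cap step can be sketched as a projection onto the valid region. The function body here is an assumption (clamp auction first, preserve burn); only the name and the 10,000 bps cap come from the text:

```rust
// Sketch of a split-cap projection that favors burn preservation: if the
// PID output violates burn + auction <= 10,000 bps, keep burn and give
// auction only the remaining headroom. The body is an assumed policy.

const SPLIT_CAP_BPS: u32 = 10_000;

fn enforce_split_cap_preserve_burn(burn_bps: u32, auction_bps: u32) -> (u32, u32) {
    if burn_bps + auction_bps <= SPLIT_CAP_BPS {
        return (burn_bps, auction_bps); // already valid: no change
    }
    let burn = burn_bps.min(SPLIT_CAP_BPS); // preserve burn up to the cap
    (burn, SPLIT_CAP_BPS - burn)            // auction gets the headroom
}

fn main() {
    assert_eq!(enforce_split_cap_preserve_burn(6_000, 3_000), (6_000, 3_000));
    assert_eq!(enforce_split_cap_preserve_burn(8_000, 4_000), (8_000, 2_000));
    println!("split cap enforced");
}
```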
What MPRD adds beyond the abstract pattern
The abstract “propose, gate, witness” loop from Tutorial 5 is a template. MPRD fills it in with specific engineering choices:
1. The decision space is pre-computed. The menu graph is generated once and its canonical hash is fixed. The CEO searches within this graph, not over an unbounded space. This is the difference between “propose anything and let the gate filter” and “propose only from a pre-validated lattice and let the gate confirm”.
2. Policy is owned by Tau Net, not by operators. The separation is not just proposer-vs-gate but also governance-of-policy-vs-governance-of-operations. Policy changes go through their own governance path (committee quorum, timelock, escalation), distinct from the CEO’s operating path.
3. Proofs are machine-checked. Lean 4 artifacts prove properties of the menu graph geometry and simplex POR. Kani harnesses verify Rust kernel invariants. ROBDD certification verifies policy algebra semantics. This is not “we tested it”, it is “a proof assistant checked the claim”.
4. Everything is deterministic and IO-free at the kernel. The CEO, the policy evaluator, and the invariant rail perform no I/O, no randomness, and no floating-point arithmetic. Audit trails are reproducible from first principles: given the same inputs, the same outputs are guaranteed.
Assumption hygiene: what the CEO is and is not
The Algorithmic CEO is not general intelligence. It is a bounded optimizer searching a pre-computed graph under formal constraints. The safety claim is scoped and explicit: actions that pass through the governor satisfy the policy at evaluation time. The system does not claim that the policy is complete (it might miss hazards), that the objective is correct (it might optimize the wrong thing), or that external inputs are trustworthy (they might be manipulated).
These are explicitly documented as assumptions A1–A4 in the MPRD repository. The engineering response is not to wish the assumptions away, but to stress-test them: policy diffs with counterexamples, oracle lockstep checks, symmetry class validation, and signed state attestations.
Part VII: what is proved, what is empirical, what is exploratory
MPRD maintains explicit boundaries between three evidence categories:
Machine-checked formal claims:
- Lean theorem bundles for CEO simplex POR (commutation conditions, swap-equivalence invariance, trace canonicalization invariance, reachability equivalence under canonicalization).
- Lean artifact for v6 menu shortest-path geometry ($L^\infty$ distance equals shortest-path length).
- Kani harnesses for Rust verified kernels (no-mutation-on-error, conservation laws, anti-replay).
- ROBDD certification for policy algebra (semantic hashing, semantic diff with concrete counterexamples).
One precision: the simplex POR Lean artifact explicitly notes a disjoint-endpoint commutation lemma as an axiom. This is a named remaining proof obligation, not a gap swept under the rug.
Deterministic empirical rails:
- Bounded exhaustive sweeps comparing POR-reduced reachability against full reachability on small instances.
- Counterexample miners that actively search for strategy failures.
- CI-style scripts (check_ceo_simplex_rail.sh) that verify oracle definitions match formal predicates.
Exploratory modeling:
- Simulation tools (e.g., ceo_simulation.py) explicitly marked as “IDEA-ONLY”.
- Strategy comparison notebooks for studying controller behavior under scenarios.
Keeping these categories separate prevents a failure mode discussed in Tutorial 5, Part VII (the Lighthill report): treating simulation success as a proof.
References
- MPRD repository: https://github.com/TheDarkLightX/MPRD
- MPRD README (core invariant and architecture): https://github.com/TheDarkLightX/MPRD/blob/main/README.md
- Algorithmic CEO menu modes: https://github.com/TheDarkLightX/MPRD/blob/main/docs/CEO_MENU_MODES.md
- CEO simplex POR integration: https://github.com/TheDarkLightX/MPRD/blob/main/docs/CEO_SIMPLEX_POR_INTEGRATION.md
- Policy algebra: https://github.com/TheDarkLightX/MPRD/blob/main/docs/POLICY_ALGEBRA.md
- Policy certification rail: https://github.com/TheDarkLightX/MPRD/blob/main/docs/POLICY_CERTIFICATION.md
- Production ZK and registry-bound verification: https://github.com/TheDarkLightX/MPRD/blob/main/docs/PRODUCTION_ZK.md
- Tutorial 1: Approximate state tracking, counterexamples, CEGIS, minimized witnesses
- Tutorial 3: Tau Language, executable policy semantics
- Tutorial 4: World models and continuous learning, adaptation versus drift, formal rails for learning systems
- Tutorial 5: Reformulation and compression, neuro-symbolic gates, canonicalization, decomposition, VC dimension