Ordered Patch Theory

Appendix T-4: MDL / Parsimony Comparison

Anders Jarevåg

v2.0.0 — April 2, 2026 | DOI: 10.5281/zenodo.19300777

Original Task T-4: MDL / Parsimony Comparison Problem: The live preprint claims parsimony over standard physics by treating physical laws as macroscopic compression algorithms, but does not provide a formal MDL comparison. Deliverable: Comparative MDL analysis of OPT versus benchmark physics model classes under explicit coding conventions.

Closure status: PARTIAL REPAIR (v3.6.10, Phase 4c). Previously marked CLOSED at v2.0; REOPENED at v3.6.0; v3.6.10 executes the partial repair. The two formal errors (sign error in §4 / T-4b dominance bound; cancellation error in §7 / T-4e proof) are now fixed; the three interpretive overclaims (Solomonoff dominance does not give a finite advantage; OPT does not get the data likelihood for free; K_0 \approx 36 bits is not credible) are now acknowledged and the corresponding theorem claims withdrawn or weakened. The structural-correspondence intuition — that OPT shifts the locus of explanatory burden from law-enumeration to law-selection — is preserved; the quantitative parsimony advantage claim (\sim -1714 bits) is not preserved.

v3.6.10 repair summary:

  1. §4 (T-4b dominance bound) — sign corrected. The identity -\log \xi^F(y) = -\log \xi(y) + \log \xi(\mathcal{O}) now appears with the correct sign (\log \xi(\mathcal{O}) \le 0 — a reduction, not a penalty). The boxed Solomonoff dominance bound becomes L_T(\text{OPT}) \le L_T(\nu) + K_0 + K(\nu) - \log \xi(\mathcal{O}) + O(1), with the event-cost interpretation made explicit (the apparent -\log \xi(\mathcal{O}) term cancels the conditional-likelihood gain if \mathcal{O} is encoded as a separate event, restoring the bare dominance bound -\log \xi(y) \le -\log \nu(y) + K(\nu) + O(1)).

  2. §6 (T-4d) — demoted from “permanent advantage” to “asymptotic dominance.” The earlier L_T(\text{OPT}) - L_T(\nu) \to K_0 - K(\nu) “permanent constant-bit advantage” is withdrawn. The replacement T-4d (v3.6.10) states only the asymptotic dominance bound: L_T(\text{OPT}) - L_T(\nu) \le K(\nu) + O(1) asymptotically (modulo the typicality condition); this is upper bound only, not equality. OPT cannot do worse than benchmark by more than a complexity constant; it does not necessarily achieve K(\nu) - K_0 savings.

  3. §7 (T-4e finite-T condition) — cancellation error fixed; event-cost double-count removed (2026-06 review). The corrected exact condition K_{\text{IC}} > K_0 - K_{\text{laws}} + [-\log \xi^F(y) - (-\log P_{\text{SP}}(y))] is now in place, with no separate -\log \xi(\mathcal{O}) term: the conditional-code identity absorbs the event cost into -\log \xi^F(y), and the earlier v3.6.10 box that carried an extra -\log \xi(\mathcal{O}) term double-counted it (its “if and only if” held only as “if”). The -K_{\text{laws}} term is real and load-bearing — under standard K_{\text{laws}}(\text{SM+GR}) \sim 1450 bits and K_0 \approx 36 bits, the threshold is trivially satisfied (modulo the sign-indeterminate log-likelihood deficit term, which is uncomputed and can dominate) but the structural content is correspondingly much weaker than the earlier “300 > 36” presentation suggested. The corrected T-4e is preserved as an existence statement: under the corrected book-keeping, some finite-T inequality of the right form can be written — but its quantitative content does not support the earlier headline claim.

  4. §5.1 (T-4c IC compression) — absorption-vs-elimination annotation now in main flow. Conjecture T-4c is retained as conjectural, with the explicit understanding that absorbing laws + IC into the prior moves complexity from the model-code half of the two-part code to the data-likelihood half — it does not eliminate complexity. The hypothetical K(\text{IC} \mid \text{OPT}) \approx 0 regime depends on the load-bearing typicality assumption (Stability Filter making observer-streams typical under the universal prior); this assumption is not established and is itself the substantive open problem.

  5. §3 (T-4a K_0 \approx 36 bits) — annotated as not credible for full-theory complexity. The selector rule is not just R_{\text{req}} \le B; it presupposes definitions of observer-compatible stream, Markov blanket, predictive rate, distortion metric, bottleneck architecture, Stability Filter, rendering, codec relation, self-model conditions, maintenance dynamics, synthetic-observer thresholds. The bare inequality might be short; the actual OPT model class is not. The K_0 \approx 36 figure is now flagged as a bound on the bare selector rule only, not on the full theory; the full-theory complexity remains an open question.

  6. §8 (Comparative MDL table) — restated honestly. The earlier table is withdrawn in its quantitative form; replaced with a structural table showing what each model commits and where OPT’s structural shift sits, without claiming a specific bit-count advantage.

  7. §10 closure summary — repair-status item-by-item. Each pre-audit deliverable is tagged with v3.6.10 status (repaired / weakened / withdrawn / preserved-conditional).

Surviving content (v3.6.10). The structural-correspondence intuition — that OPT shifts explanatory burden from law-enumeration to law-selection — survives the audit. The §1–§3 coding conventions and benchmark classes are sound. The conditional dominance bound (corrected T-4b) is sound. What is withdrawn is only the quantitative MDL ranking — the claim that OPT achieves a specific bit-count advantage over \mathcal{M}_1.

Implications for the main paper: §5.1 / §6.6 / §8.13 citations to T-4d / T-4e can be updated from “pending T-4 repair” to “T-4d / T-4e v3.6.10 partial repair” — the existence claims (some structural shift; some conditional finite-T inequality of the right form) are now supported; the quantitative advantage claims are not. T-5 (constants recovery) is unblocked at v3.6.10 to the extent it depended on T-4 being internally consistent; T-5 still depends on T-2 (partially repaired v3.6.3).

Original closure status and the v3.6.0 audit annotations are preserved inline below for traceability. The replaced proofs (§4 v3.6.10, §6 v3.6.10, §7 v3.6.10) appear in place of the broken originals; the audit annotations referenced in the v3.6.0 closure block remain visible as historical commentary inside the replaced proofs.


§1. Fixing the MDL Coding Conventions

MDL comparisons are meaningless without explicit, fixed coding conventions. The preprint’s §5.1 notes this requirement but defers it. We fix conventions here following Rissanen (1978) [12] and the two-part MDL framework of Li & Vitányi (2008) [27].

1.1 The Two-Part Code Length

For a hypothesis class \mathcal{M} and observation sequence y_{1:T} \in \{0,1\}^*, the two-part MDL code length is:

L_T(\mathcal{M}) = K(\mathcal{M}) + L(y_{1:T} \mid \mathcal{M}) \tag{preprint §5.1, Eq. 13}

where K(\mathcal{M}) is the prefix Kolmogorov complexity of the hypothesis — the length of the shortest self-delimiting program on a fixed universal Turing machine (UTM) that outputs a complete description of \mathcal{M} — and L(y_{1:T} \mid \mathcal{M}) is the negative log-likelihood of the data under \mathcal{M}’s best predictive model:

L(y_{1:T} \mid \mathcal{M}) = -\log_2 P_\mathcal{M}(y_{1:T})

For deterministic theories (laws + IC uniquely determine observations), L(y_{1:T} \mid \mathcal{M}) = 0 when y is consistent with the theory and L = \infty otherwise. All logarithms are base 2; all code lengths in bits.

1.2 The Universal Machine

We fix a single optimal UTM \mathcal{U} throughout. All Kolmogorov complexities are relative to \mathcal{U}; results change by at most \mathcal{O}(1) bits under a different choice of UTM. The Solomonoff measure \xi is defined relative to \mathcal{U} (preprint Eq. 1). This fixes the convention for all subsequent comparisons.

1.3 Scope of y_{1:T}

We compare models on the domain each was designed to predict: the observer’s conscious stream y_{1:T} = z_{0:T} (the sequence of compressed latent states, C_{\max} bits per second over T seconds). Standard physics is evaluated on the same domain by reducing its predictions to the observer-compatible stream via coarse-graining. Both theories are asked to account for exactly the same observations.


§2. Benchmark Model Classes

Three benchmark classes are fixed. Each is assigned an explicit K(\mathcal{M}) estimate under our UTM convention. Precise numerical values are order-of-magnitude estimates; the structural results in §§3–7 depend only on the ordering, not the exact values.

2.1 \mathcal{M}_1 — Standard Model + General Relativity

The most predictively accurate physical theory currently available. Its description requires three components:

K(\mathcal{M}_1) = K_{\text{struct}} + K_{\text{param}} \approx 1750 \text{ bits}

K(\text{IC} \mid \mathcal{M}_1) \approx 300 \text{ bits (inflationary)}

2.2 \mathcal{M}_2 — Generic Renormalisable QFT

The class of all renormalisable quantum field theories in \leq 4 spacetime dimensions. This class contains \mathcal{M}_1 as one member. Because gauge group and particle content must also be specified:

K(\mathcal{M}_2) \gg K(\mathcal{M}_1) \gg 1750 \text{ bits}

\mathcal{M}_2 is included as the foil for OPT’s claim that laws are selected, not enumerated. While the MDL comparison with \mathcal{M}_2 is trivially won by any finite sub-class (including \mathcal{M}_1) because K(\mathcal{M}_2) is unbounded, its inclusion serves formally to demonstrate the infinite scale of the parameter selection problem that the Stability Filter natively collapses.

2.3 \mathcal{M}_3 — Boltzmann Brain / Thermal Fluctuation

Standard physics with maximally simple initial conditions: a thermal (maximum-entropy) state at the Planck scale. Laws are identical to \mathcal{M}_1; initial conditions are trivially simple:

K(\mathcal{M}_3) \approx K(\mathcal{M}_1) \approx 1750 \text{ bits}, \qquad K(\text{IC} \mid \mathcal{M}_3) \approx 10 \text{ bits}

However, the log-likelihood of observing an ordered conscious stream y_{1:T} under \mathcal{M}_3 is astronomically small: L(y_{1:T} \mid \mathcal{M}_3) \approx K(y_{1:T}) \gg T \cdot C_{\max}. \mathcal{M}_3 thus has negligible IC cost but a catastrophic likelihood cost, and is included to show that OPT’s MDL advantage is not achieved by the same trick.


§3. OPT’s Code Length — Theorem T-4a

The MDL code length for OPT decomposes as:

L_T(\text{OPT}) = K(\xi, \text{Filter}) + L(y_{1:T} \mid \xi, \text{Filter}) = K_0 + \left(-\log \xi^{\text{Filter}}(y_{1:T})\right)

where \xi^{\text{Filter}} is the Solomonoff measure \xi conditioned on the observer-compatible class \mathcal{O} (streams satisfying R_{\text{req}} \leq B_{\max}), and K_0 = K(\xi, \text{Filter}) is the description length of the selector rule.

Theorem T-4a (Meta-Rule Complexity Bound). K(\xi, \text{Filter}) = K_0 = \mathcal{O}(1) bits. Specifically:

K_0 \leq K(\mathcal{U}) + K(C_{\max}) + K(\Delta t) + c

where K(\mathcal{U}) is the complexity of the UTM, K(C_{\max}) = \mathcal{O}(\log C_{\max}) bits encodes the bandwidth threshold to experimental precision, K(\Delta t) = \mathcal{O}(\log \Delta t) encodes the update window, and c is a small universal constant.

Proof. The Solomonoff measure \xi is uniquely determined by the fixed UTM \mathcal{U}, so K(\xi \mid \mathcal{U}) = \mathcal{O}(1). The Stability Filter requires two parameters: C_{\max} and \Delta t, each measured to \sim 4 significant figures, so K(C_{\max}, \Delta t) \leq 2 \times (4 \times \log_2 10) \approx 26 bits. The condition R_{\text{req}} \leq B_{\max} is a single inequality in fixed notation: \sim 10 bits. Total: K_0 \leq K(\mathcal{U}) + 36 bits.

To absorb K(\mathcal{U}) fairly, we must assume an “epistemically neutral” UTM — meaning a reference machine whose built-in instruction set encodes no physical theory preferentially (i.e. a basic combinator or Brainfuck-equivalent geometry, completely agnostic to physics). Under such an unbiased machine, maintaining K(\xi, \text{Filter}) \approx 36 bits while standardizing K(\mathcal{M}_1) \approx 1750 bits is valid. We acknowledge this specifically leaves the absolute bit-count vulnerable to an \mathcal{O}(1) constant scaling if the UTM is changed, meaning the 36 vs 1750 calculation is inherently relative. The structurally honest mathematical statement here is the rank ordering (K_0 \ll K(\mathcal{M}_1)), asserting a robust structural advantage independent of the precise numeric constant. \blacksquare

Comparison: Excluding the shared UTM overhead, K_0 \approx 36 bits vs. K(\mathcal{M}_1) \approx 1750 bits. The OPT selector rule is shorter than the Standard Model description by K(\mathcal{M}_1) - K_0 \approx 1714 bits. This is the structural parsimony advantage claimed in §5 of the preprint — now with an explicit bit count.

v3.6.10 caveat (annotation). K_0 \approx 36 bits bounds the bare selector rule only — the inequality R_{\text{req}} \le B_{\max} plus its two parameters — not the full OPT model class, whose conceptual scaffolding (observer-compatible stream, Markov blanket, predictive rate, distortion metric, bottleneck architecture, Stability Filter, rendering, codec relation, self-model conditions, maintenance dynamics, synthetic-observer thresholds) remains unscored; K_0 \approx 36 is therefore not credible as full-theory complexity. The \approx 1714-bit comparison above is accordingly withdrawn as a quantitative claim; it survives only as the rank-ordering correspondence K_0 \ll K(\mathcal{M}_1) — the structural shift from law-enumeration to law-selection. See the front repair summary (item 5), §8, and §10.


§4. The Solomonoff Dominance Bound — Theorem T-4b (v3.6.10 sign-corrected)

Theorem T-4b (Solomonoff Dominance Bound, v3.6.10). For any computable physics measure \nu (including \mathcal{M}_1, \mathcal{M}_2, \mathcal{M}_3) with K(\nu) < \infty, and for any data stream y_{1:T} \in \mathcal{O}:

\boxed{\; L_T(\text{OPT}) \;\le\; L_T(\nu) \,+\, K_0 \,+\, K(\nu) \,-\, \log \xi(\mathcal{O}) \,+\, O(1) \;}

where \log \xi(\mathcal{O}) \le 0 (since \xi(\mathcal{O}) \le 1). Equivalently, writing the event-encoding cost as -\log \xi(\mathcal{O}) \ge 0:

L_T(\text{OPT}) \;\le\; L_T(\nu) \,+\, K_0 \,+\, K(\nu) \,+\, \bigl[\,-\log \xi(\mathcal{O})\,\bigr] \,+\, O(1).

The term -\log \xi(\mathcal{O}) is the event-encoding cost of declaring “the stream lies in \mathcal{O}” as a separate event in the code; it cancels the conditional-likelihood gain (see derivation below).

Proof (v3.6.10). From the definition of the Solomonoff measure (preprint Eq. 1), with w_\nu \asymp 2^{-K(\nu)}:

\xi(y_{1:T}) \;\geq\; w_\nu \cdot \nu(y_{1:T}) \;\geq\; 2^{-K(\nu)} \cdot \nu(y_{1:T})

Taking negative logarithms gives the bare Solomonoff dominance bound:

-\log \xi(y_{1:T}) \;\leq\; -\log \nu(y_{1:T}) + K(\nu) + O(1) \tag{4.1}

When transitioning from the universal measure \xi to the conditional measure on \mathcal{O}, the correct conditional-code identity is:

\boxed{\; -\log \xi^F(y) \;=\; -\log \xi(y) \,+\, \log \xi(\mathcal{O}) \;}

with \log \xi(\mathcal{O}) \le 0. This is a reduction in code length, not a penalty: conditioning on a known event makes the conditional measure assign higher probability to events inside the conditioning set, lowering their code length. Substituting into L_T(\text{OPT}) = K_0 - \log \xi^F(y_{1:T}):

L_T(\text{OPT}) \;=\; K_0 \,-\, \log \xi(y_{1:T}) \,+\, \log \xi(\mathcal{O}) \;\le\; K_0 \,+\, K(\nu) \,-\, \log \nu(y_{1:T}) \,+\, \log \xi(\mathcal{O}) \,+\, O(1)

=\; L_T(\nu) \,+\, K_0 \,+\, K(\nu) \,+\, \log \xi(\mathcal{O}) \,+\, O(1)

The boxed dominance bound writes -\log \xi(\mathcal{O}) (the event-cost form, \ge 0) rather than +\log \xi(\mathcal{O}) (the reduction form, \le 0), to make the bound a sum of non-negative terms on the RHS. The two are not algebraically identical: the event-cost form weakens the derived bound by the non-negative slack -2 \log \xi(\mathcal{O}) (the tight form replaces -\log \xi(\mathcal{O}) with +\log \xi(\mathcal{O})). The boxed bound remains true as a deliberate weakening, and makes the structural reading transparent — OPT’s code length is at most benchmark + selector-rule cost + benchmark-complexity cost + event-encoding cost. \blacksquare

Event-encoding identity (v3.6.10 — structural reading). When \mathcal{O} is encoded explicitly as a separate event in the code, the event cost -\log \xi(\mathcal{O}) \ge 0 enters with the opposite sign from the conditional-likelihood gain \log \xi(\mathcal{O}) \le 0. The two cancel exactly:

\bigl[-\log \xi(\mathcal{O})\bigr] \,+\, \bigl[-\log \xi(y) + \log \xi(\mathcal{O})\bigr] \;=\; -\log \xi(y)

i.e., encoding y as “first encode the event \mathcal{O} contains y; then encode y given \mathcal{O}” has the same total code length as encoding y directly under \xi. This is a Kraft-inequality-style consistency check; the boxed dominance bound is correct in either reading, but only the corrected sign makes the structural decomposition coherent.

Important caveat (preserved through v3.6.10). Theorem T-4b does not show that OPT outperforms any benchmark. It shows that OPT cannot do worse than benchmark \nu by more than K_0 + K(\nu) - \log \xi(\mathcal{O}) + O(1) bits. The bound is upper only; it does not establish any positive parsimony advantage for OPT. The v3.6.0 audit annotation that the appendix’s downstream use of \log(1/\xi(\mathcal{O})) as an “added penalty” propagated the sign error into T-4d / T-4e / §8 is correct — and the §6 / §7 / §8 chain is repaired below to use the corrected sign.

v3.6.0 audit annotation (preserved as historical commentary). The original proof (pre-v3.6.10) wrote -\log \xi^{\text{Filter}}(y) = -\log \xi(y) + \log(1/\xi(\mathcal{O})) with the wrong sign, leading to a normalisation “penalty” K'_0 = K_0 + \log(1/\xi(\mathcal{O})) rather than the correct K'_0 = K_0 - \log \xi(\mathcal{O}) event-cost form. The two have the same absolute value but the structural meaning was inverted. The v3.6.10 corrected proof above replaces the broken derivation; the original error is retained as a footnote for traceability.


§5. The Initial Conditions Compression — Theorem T-4c

The structural source of OPT’s MDL advantage is the compression of initial conditions. In standard physics, the laws and the initial conditions are separate objects that must both be described. In OPT, the initial conditions are absorbed into the prior: the Solomonoff measure already assigns highest weight to the simplest observer-compatible streams, making a separate IC specification redundant.

5.1 The IC Redundancy Argument

Under standard physics (\mathcal{M}_1), the full MDL code for a deterministic theory is:

L_T(\text{SP}) = K_{\text{laws}} + K(\text{IC} \mid \text{laws}) + 0 \qquad \text{[deterministic: } -\log P = 0 \text{ if consistent]}

The IC term K(\text{IC} \mid \text{laws}) is the description length of the specific initial conditions given the laws — it is not derivable from the laws themselves. This is the locus of fine-tuning.

Under OPT, the two-part code is:

L_T(\text{OPT}) = K_0 + \left(-\log \xi^{\text{Filter}}(y_{1:T})\right)

The term -\log \xi^{\text{Filter}}(y_{1:T}) encodes the specific stream given the meta-rule. The Solomonoff prior already incorporates a universal model of physics: -\log \xi(y) \approx K(y). The OPT encoding never needs to separately pay for the IC.

Conjecture T-4c (IC Compression Heuristic Bound). Define the IC compression advantage:

\Delta_{\text{IC}} = K(\text{IC} \mid \text{SP laws}) - K(\text{IC} \mid \text{OPT})

We argue the following heuristic bound:

\boxed{L_T(\text{OPT}) \leq L_T(\text{SP}) - \Delta_{\text{IC}} + K_0 + \mathcal{O}(1)}

where K(\text{IC} \mid \text{OPT}) := K(\text{IC} \mid \xi, \text{Filter}, \text{codec}) is the residual description length of the initial conditions given OPT’s full model. \Delta_{\text{IC}} \geq 0, with equality iff the Stability Filter provides no additional compression of the IC beyond what the laws already give.

Argument. Starting from the full two-part code for SP and applying Solomonoff dominance (absorbing the normalisation constants into an \mathcal{O}(1) UTM bounding term):

L_T(\text{OPT}) \leq K_0 + K(\text{laws}) + K(\text{IC} \mid \text{laws}) - \log P_{\text{SP}}(y) + \mathcal{O}(1)

Rearranging and substituting L_T(\text{SP}) = K_{\text{laws}} + K(\text{IC} \mid \text{laws}) (deterministic theory):

L_T(\text{OPT}) \leq L_T(\text{SP}) + K_0 + \mathcal{O}(1)

Within OPT, -\log \xi^{\text{Filter}}(y_{1:T}) need not individually encode IC: the Filter selects from the Solomonoff prior, which compresses IC inherently via length weightings. AIT subadditivity guarantees K(\text{IC} \mid x, f(x)) \leq K(\text{IC} \mid x) + \mathcal{O}(1). If we postulate that the OPT selection rule bounds as a tighter descriptive string than simply declaring the raw laws (which is the core wager of the framework, not a mathematical derivative proof), then the residual encoded K(\text{IC} \mid \text{OPT}) cannot exceed K(\text{IC} \mid \text{laws}) significantly. Yielding heuristically \Delta_{\text{IC}} \geq 0.

By substitution: L_T(\text{OPT}) \leq L_T(\text{SP}) - \Delta_{\text{IC}} + K_0 + \mathcal{O}(1). \blacksquare

v3.6.0 ABSORPTION-VS-ELIMINATION ANNOTATION (pending repair). The argument above implicitly treats “the Filter selects from the Solomonoff prior, which compresses IC inherently” as if absorption into the prior eliminates the IC term. It does not. The universal prior \xi does absorb laws + IC into the model-code half of the two-part code, but the specific observed stream y_{1:T} still has to be encoded in the likelihood term -\log \xi^F(y_{1:T}). If the shortest program generating y is “standard physics + initial conditions + observer extraction,” the Solomonoff code pays for that inside -\log \xi(y) — it moves the IC cost from the model side to the data-likelihood side, but does not delete it. The boxed inequality L_T(\text{OPT}) \le L_T(\text{SP}) - \Delta_{\text{IC}} + K_0 + \mathcal{O}(1) overcounts the OPT advantage by treating the IC absorption as elimination rather than relocation. The corrected statement should make explicit that any saving on the model-code side has a corresponding cost on the data-likelihood side; the residual structural advantage (if any) depends on the Stability Filter making the observer-stream typical under the universal prior, which is the load-bearing typicality assumption noted in the §10 closure summary. Repair of T-4 must redo the §5 / §6 / §7 / §8 chain making the model-code vs. data-likelihood book-keeping explicit.

Remark. We hypothesize the anthropic compression K(\text{IC} \mid \text{OPT}) \approx 0 operates in the limit where the Stability Filter is highly constraining, mapping mathematically to uniquely observer-compatible states. This is a motivated physical proposition rather than an algorithmically proven uniqueness bound.


§6. Asymptotic Dominance Bound — Theorem T-4d (v3.6.10 demoted from “permanent advantage”)

Theorem T-4d (Asymptotic Dominance Bound, v3.6.10). For every fixed computable physics measure \nu with K(\nu) < \infty, and for any data stream y_{1:T} \in \mathcal{O} that is also \nu-typical:

\boxed{\; L_T(\text{OPT}) \,-\, L_T(\nu) \;\le\; K(\nu) \,+\, K_0 \,-\, \log \xi(\mathcal{O}) \,+\, O(1) \quad \text{as } T \to \infty \;}

This is an asymptotic upper bound, not equality. OPT cannot do worse than benchmark \nu by more than K(\nu) + K_0 - \log \xi(\mathcal{O}) + O(1) bits. The bound does not establish that OPT achieves a fixed K_0 - K(\nu) saving over \nu; it establishes only that OPT does not lose more than K(\nu) + K_0 - \log \xi(\mathcal{O}) + O(1) bits.

Proof (v3.6.10). From T-4b (v3.6.10), L_T(\text{OPT}) \le L_T(\nu) + K_0 + K(\nu) - \log \xi(\mathcal{O}) + O(1). Rearranging,

L_T(\text{OPT}) - L_T(\nu) \;\le\; K_0 + K(\nu) - \log \xi(\mathcal{O}) + O(1).

This holds at every finite T and passes to the asymptotic limit unchanged. \blacksquare

What T-4d does not establish (v3.6.10 explicit). The earlier draft of T-4d claimed L_T(\text{OPT}) - L_T(\nu) \to K_0 - K(\nu) — i.e., that OPT achieves a permanent constant-bit advantage of approximately 1714 bits over \mathcal{M}_1. This claim does not follow from Solomonoff dominance alone. Solomonoff dominance gives an upper bound up to a complexity constant (-\log \xi(y) \le -\log \nu(y) + K(\nu) + O(1)), not equality. The asymptotic per-bit log-loss convergence -T^{-1} \log \xi(y_{1:T}) \to H(\nu) that the earlier proof invoked is valid for \nu-typical sequences, but it controls the per-bit rate, not the total-bit advantage; the additive complexity constant K(\nu) + O(1) is exactly the gap that prevents the bound from collapsing to equality. The v3.6.10 corrected statement therefore retains the direction of T-4b (OPT does not lose more than a complexity constant) without claiming the magnitude of any saving.

The earlier T-4d’s “permanent fixed-bit advantage of \sim -1714 bits” is withdrawn at v3.6.10.

v3.6.0 audit annotation (preserved as historical commentary). The original T-4d proof exploited Solomonoff convergence on \nu-typical sequences to claim asymptotic equality L_T(\text{OPT}) - L_T(\nu) \to K_0 - K(\nu). The convergence is valid; the inference to equality is not. The v3.6.10 corrected statement above replaces the broken claim with the correct dominance-bound-only reading.

What survives (v3.6.10). The structural claim — that OPT’s formulation shifts the locus of explanatory burden from law-enumeration (K(\nu) bits per benchmark) to law-selection (K_0 bits for the selector rule) — survives the audit as a qualitative correspondence. The framework’s wager is that the typicality assumption (Stability Filter producing \nu-typical streams) plus the IC compression conjecture (T-4c) makes the dominance bound tight in practice; this wager is empirical and conjectural rather than mathematically established, and is what T-4 sets up rather than what T-4 proves. The earlier appendix overstated this wager as an established result; v3.6.10 honestly downgrades the headline claim while preserving the structural intuition.


§7. The Finite-T Conditional Inequality — Theorem T-4e (v3.6.10 cancellation-fixed, weakened)

For streams of finite length, the MDL comparison requires honest book-keeping of which side of the two-part code the law-complexity term appears on.

Theorem T-4e (Finite-T Conditional Inequality, v3.6.10). OPT achieves a strict finite-T MDL advantage over \mathcal{M}_1 — that is, L_T(\text{OPT}) < L_T(\mathcal{M}_1) — if and only if:

\boxed{\; K(\text{IC} \mid \text{SP laws}) \;>\; K_0 \,-\, K_{\text{laws}} \,+\, \bigl[\,-\log \xi^F(y) \,-\, (-\log P_{\text{SP}}(y))\,\bigr] \;}

No separate -\log \xi(\mathcal{O}) term appears (2026-06 correction). The conditional-code identity -\log \xi^F(y) = -\log \xi(y) + \log \xi(\mathcal{O}) already absorbs the event cost into the deficit term; the earlier v3.6.10 box carried an additional -\log \xi(\mathcal{O}) term, double-counting the event cost and inflating the RHS by |\log \xi(\mathcal{O})| — under which the stated “if and only if” held only as “if”. The form above is the exact biconditional.

Note the -K_{\text{laws}} term (v3.6.10 cancellation fix). The earlier draft of T-4e omitted this term via a fake cancellation [K_{\text{laws}} - K_{\text{laws}}] that does not exist on the OPT side of the inequality. The -K_{\text{laws}} term is real, load-bearing, and dominates the RHS at realistic scales: under K_{\text{laws}}(\text{SM+GR}) \sim 1450 bits and K_0 \approx 36 bits, the threshold becomes K(\text{IC} \mid \text{laws}) > -1414 + \text{deficit}, which is trivially satisfied for almost any positive IC complexity — modulo the sign-indeterminate log-likelihood deficit term, which is uncomputed and can dominate.

The triviality of the satisfaction condition is precisely what makes T-4e much weaker as a structural claim than the earlier K(\text{IC}) > 36 presentation suggested.

Proof (v3.6.10 cancellation-fixed). Direct manipulation of the two-part code lengths, with the sign correction from §4 (v3.6.10):

L_T(\text{OPT}) \;<\; L_T(\text{SP})

Substituting the corrected L_T(\text{OPT}) = K_0 + [-\log \xi(y) + \log \xi(\mathcal{O})] (sign-corrected per §4 v3.6.10) and L_T(\text{SP}) = K_{\text{laws}} + K(\text{IC} \mid \text{laws}) - \log P_{\text{SP}}(y):

K_0 \,-\, \log \xi(y) \,+\, \log \xi(\mathcal{O}) \;<\; K_{\text{laws}} \,+\, K(\text{IC} \mid \text{laws}) \,-\, \log P_{\text{SP}}(y)

Honest rearrangement (no fake cancellation):

K(\text{IC} \mid \text{laws}) \;>\; K_0 \,-\, K_{\text{laws}} \,+\, \log \xi(\mathcal{O}) \,+\, \bigl[\,-\log \xi(y) \,-\, (-\log P_{\text{SP}}(y))\,\bigr]

By the conditional-code identity -\log \xi^F(y) = -\log \xi(y) + \log \xi(\mathcal{O}) (§4 v3.6.10), the \log \xi(\mathcal{O}) term combines with -\log \xi(y) into the conditioned likelihood, leaving no separate \xi(\mathcal{O}) term:

K(\text{IC} \mid \text{laws}) \;>\; K_0 \,-\, K_{\text{laws}} \,+\, \bigl[\,-\log \xi^F(y) \,-\, (-\log P_{\text{SP}}(y))\,\bigr]

This is the boxed inequality; every step is an equivalence, so the condition is exact (if and only if). The -K_{\text{laws}} term is present and cannot be cancelled because K_{\text{laws}} appears only on the SP side of the original inequality, not on the OPT side. \blacksquare

What the cancellation-fixed T-4e actually says (v3.6.10 honesty). Under realistic K_{\text{laws}} (the laws of physics cost order \sim 10^3 bits), the RHS of the corrected inequality is negative or near-zero (dominated by -K_{\text{laws}}); the condition K(\text{IC}) > \text{negative number} is trivially satisfied by any positive IC complexity — modulo the sign-indeterminate log-likelihood deficit term, which is uncomputed and can dominate. T-4e therefore establishes that some finite-T MDL inequality of the right form holds — but the quantitative content is much weaker than the earlier “K(\text{IC}) > K_0 \approx 36 bits, holds by substantial margin” reading. The corrected T-4e is an existence statement rather than a strength statement.

Recovery of a stronger T-4e (open). A meaningful quantitative T-4e advantage would require either (i) decomposing -\log \xi^F(y) into a component that already includes the law-cost K_{\text{laws}} (which would require Solomonoff’s universal prior to be shown to charge K_{\text{laws}} “for free” on observer streams — a non-trivial typicality assumption), or (ii) restating T-4e in terms of codec-MDL rather than prior-MDL (the codec-based formulation does not pay K_{\text{laws}} on the OPT side; this is the path §9.2 already gestures at via the P-2 codec-identification constraint). Neither approach is established at v3.6.10; T-4e is left in the corrected weaker form.

v3.6.0 audit annotation (preserved as historical commentary). The original T-4e proof contained a fake cancellation [K_{\text{laws}} - K_{\text{laws}}] that obscured the real -K_{\text{laws}} dependence on the OPT side. The v3.6.10 corrected proof above replaces the broken derivation; the cancellation error is retained in the historical record by the inline K_laws - K_laws annotation in the §7.1 evaluation block below.

7.1 Evaluating the Condition for Standard Cosmology (v3.6.10 honestly evaluated)

Under the inflationary encoding (the most favourable case for SP):

The v3.6.10 corrected condition:

K(\text{IC} \mid \text{SP laws}) \;>\; \underbrace{K_0}_{\sim 36} \;-\; \underbrace{K_{\text{laws}}}_{\sim 1450} \;+\; \underbrace{[\,-\log \xi^F(y) - (-\log P_{\text{SP}}(y))\,]}_{\text{sign indeterminate}}

\Longrightarrow \quad K(\text{IC}) \;>\; -1414 \;+\; \text{deficit}

Whenever the deficit term stays below \sim 1414 bits, the RHS is negative or near-zero, and the threshold is trivially satisfied by any positive IC complexity (including K(\text{IC}) = 1 bit). The condition fails only in one regime: a catastrophic log-likelihood deficit on the specific stream, [-\log \xi^F(y) - (-\log P_{\text{SP}}(y))] \gtrsim 1414 bits — which cannot be ruled out, since the deficit is sign-indeterminate and uncomputed.

What this honestly establishes (v3.6.10). OPT achieves a finite-T MDL inequality of the right form, but the inequality is far too loose to support a quantitative “OPT beats SM+GR by \sim 1714 bits” headline claim. The structural shift from law-enumeration to law-selection is real; the quantitative MDL ranking is not established by T-4e as written.

The earlier “K(\text{IC} \mid \text{SP laws}) > K_0, i.e., 300 > 36” presentation is withdrawn at v3.6.10 because it relied on the fake K_{\text{laws}} cancellation.


§8. The Comparative MDL Table — Structural Form (v3.6.10 restated)

v3.6.10 RESTATEMENT. The earlier quantitative MDL table (preserved below as the v3.6.0 withdrawn-pending-repair table for historical traceability) claimed an OPT total of \sim 36 bits versus SM+GR’s \sim 2050 bits. That table is withdrawn at v3.6.10 because three errors compounded in OPT’s row: (i) absorption-vs-elimination — laws + IC absorbed into the prior \xi are not eliminated, they reappear in -\log \xi^F(y); (ii) sign error — the conditional likelihood was written with the wrong sign; (iii) K_0 \approx 36 — the selector-rule complexity is much larger than the bare inequality R_{\text{req}} \le B once the full conceptual scaffolding is counted. The v3.6.10 restatement is structural rather than quantitative: it shows where each model commits in the two-part code, not what the total bit-counts are.

v3.6.10 structural comparison table.

Model Model-code commitments Data-likelihood commitments Structural posture
\mathcal{M}_1 — SM + GR K_{\text{laws}}(\text{SM+GR}) \approx 1450 bits (gauge group, parameters, Lorentz, GR symmetry) + K(\text{IC} \mid \text{laws}) \approx 300 bits (inflationary) -\log P_{\text{SP}}(y) \approx 0 for y deterministically consistent with laws + IC Enumerates laws explicitly; IC paid as separate fine-tuning
\mathcal{M}_3 — Boltzmann K_{\text{laws}} \approx 1450 + K(\text{IC}) \approx 10 bits (thermal max-entropy) -\log P(y) \gg 0 for any ordered conscious stream (catastrophic likelihood) Same laws; IC trivially short but likelihood catastrophic
\mathcal{M}_{\text{OPT}} (v3.6.10 corrected) K_0 (selector rule — bare inequality \sim 36 bits; full conceptual scaffolding much larger; see §3 v3.6.10) + -\log \xi(\mathcal{O}) (event cost, \ge 0, uncomputed) -\log \xi^F(y) = -\log \xi(y) + \log \xi(\mathcal{O}) — by Solomonoff dominance -\log \xi(y) \le -\log \nu(y) + K(\nu) + O(1)includes the law-cost component if \nu is the SM+GR codec Selector rule + universal prior absorbs laws into model-code; relocates IC + law costs into the data-likelihood term; structural shift is real, quantitative advantage is conditional

What the v3.6.10 table establishes.

The structural commitments of each model are now visible:

What the v3.6.10 table does not establish.

The earlier headline claim that OPT achieves a \sim 1714-bit total MDL advantage over \mathcal{M}_1 is withdrawn. Two reasons:

  1. The relocation (laws + IC) → (-\log \xi^F(y)) is not eliminated; it moves cost between the model-code half and the data-likelihood half, but the total need not shrink.
  2. The typicality assumption — that the Stability Filter restriction to \mathcal{O} makes observer-streams \nu-typical under the universal prior so that Solomonoff convergence applies — is itself the substantive open problem. Without that assumption, the dominance bound is only a bound, not an advantage.

What survives (v3.6.10). The structural-correspondence claim — that OPT shifts the locus of explanatory burden from law-enumeration to law-selection — survives. The shift is real and visible in the table above. What is withdrawn is the quantitative advantage; the structural shift is preserved as a qualitative posture.


v3.6.0 withdrawn-pending-repair table (preserved as historical commentary):

Model K(\mathcal{M}) (bits) K(\text{IC}\mid\mathcal{M}) (bits) -\log P(y\mid\mathcal{M}) L_T total MDL rank
\mathcal{M}_1 — SM + GR \sim 1750 \sim 300 (inflationary) \sim 0 (deterministic) \sim 2050 2nd (inflationary)
\mathcal{M}_3 — Boltzmann \sim 1750 \sim 10 \gg 0 (rare stream) \gg 1760 Last (catastrophic likelihood)
\mathcal{M}_{\text{OPT}} — OPT (withdrawn at v3.6.0; replaced v3.6.10) \sim 36 (not credible — selector-rule presupposes much more) \sim 0 (absorption ≠ elimination) \sim 0 (double-counts laws) \sim 36 (invalid) 1st (unsupported)

Retained for traceability of the v3.6.0 audit; the v3.6.10 structural table above replaces this for current claims.


§9. Limits of the Comparison

9.1 K(y \mid \text{Filter}) is Not Computable

The OPT code length K_0 + K(y \mid \text{Filter}) = K_0 - \log \xi^{\text{Filter}}(y) contains a term that is not computable in the Turing sense (the halting problem prevents computing \xi exactly). In practice, OPT’s predictions must be approximated by a finite codec K_\theta — which is standard physics. This means that for predictive purposes, OPT reduces to the best computable codec available. The MDL advantage of OPT over SP is therefore a structural advantage (in the description of the selector rule) rather than an operational advantage in making novel predictions.

This is not a flaw — it is the correct formal content of the preprint’s claim: “OPT shifts part of the explanatory burden from law-enumeration to law-selection.” The shift is real as a structural correspondence — the rank-ordering K_0 \ll K(\mathcal{M}_1), with no quantitative bit figure claimed (see the §3 v3.6.10 caveat) — but it does not generate new predictive content over and above what the codec already provides.

9.2 The Codec Identification Problem

The OPT codec K_\theta is the specific computable measure from \mathcal{M} that the Stability Filter selects. T-4 does not determine which measure this is — that identification requires T-5 (constants recovery) and the full physical unification programme. Until K_\theta is explicitly identified with SM + GR, the MDL comparison is conditional on this identification. The formal bound L_T(\text{OPT}) \leq L_T(\text{SP}) + K_0 guarantees that OPT cannot do worse than SP, but does not guarantee it does better in finite time unless the IC condition of T-4e is met — whose corrected form is an existence statement only (§7 v3.6.10); no quantitative finite-T advantage is established.

Constraint from P-2. Appendix P-2 (Hilbert Space Embedding via Quantum Error Correction) establishes that, under local noise, the codec must satisfy QECC structure — its internal representation must constitute a quantum error-correcting code with specific parameters (n, k, d). This narrows the codec identification problem: K_\theta is no longer an arbitrary computable measure, but one whose predictive states carry the error-correcting geometry of a Hilbert space. This constraint is upstream of T-5’s constants recovery programme and may provide additional selection criteria for identifying K_\theta with the Standard Model.


§10. Closure Summary (v3.6.10 partial-repair status)

v3.6.10 STATUS: PARTIAL REPAIR. The two formal errors (§4 sign, §7 cancellation) are now fixed; the three interpretive overclaims (Solomonoff dominance giving finite advantage; OPT getting data likelihood “for free”; K_0 \approx 36 bits as full-theory complexity) are now acknowledged and the corresponding theorem claims withdrawn or weakened. The structural-correspondence intuition — that OPT shifts the locus of explanatory burden from law-enumeration to law-selection — survives. The quantitative advantage claims (\sim -1714 bits) do not survive.

T-4 Deliverables — v3.6.10 partial-repair status (item-by-item):

  1. Coding conventions fixed (§1). PRESERVED. Two-part MDL, prefix Kolmogorov complexity relative to fixed UTM, data domain y_{1:T} = z_{0:T}. No v3.6.10 changes.

  2. Benchmark classes fixed (§2). PRESERVED. \mathcal{M}_1 (SM+GR), \mathcal{M}_2 (generic QFT), \mathcal{M}_3 (Boltzmann). The benchmarks themselves are not the unfair side of the table; the OPT row was. v3.6.10’s restated §8 table fixes that.

  3. T-4a (Meta-rule complexity). WEAKENED. K_0 \approx 36 bits is preserved as a bound on the bare selector rule only. The full-theory complexity (selector rule + conceptual scaffolding: observer-compatible stream, Markov blanket, predictive rate, distortion metric, bottleneck architecture, Stability Filter, rendering, codec, self-model, maintenance, synthetic-observer thresholds) is much larger and remains an open question. The rank-ordering claim (K_0 \ll K(\mathcal{M}_1)) survives only at the structural-correspondence level, not as a quantitative result.

  4. T-4b (Solomonoff dominance). REPAIRED. The sign error is fixed. The corrected dominance bound L_T(\text{OPT}) \le L_T(\nu) + K_0 + K(\nu) - \log \xi(\mathcal{O}) + O(1) is now in §4 v3.6.10. The bound establishes only that OPT cannot do worse than benchmark by more than a complexity constant; it does not establish that OPT does better.

  5. Conjecture T-4c (IC compression heuristic bound). PRESERVED AS CONJECTURE. The conjecture \Delta_{\text{IC}} \ge 0 is retained as conjectural, with the absorption-vs-elimination distinction in §5.1 now load-bearing in the main flow. The hypothetical K(\text{IC} \mid \text{OPT}) \approx 0 regime is acknowledged to depend on the unestablished typicality assumption.

  6. T-4d (Constant-bit model advantage). DEMOTED. The earlier “permanent constant-bit advantage of \sim -1714 bits” claim is withdrawn. Replaced by §6 v3.6.10 asymptotic dominance bound: L_T(\text{OPT}) - L_T(\nu) \le K(\nu) + K_0 - \log \xi(\mathcal{O}) + O(1) as T \to \infty — upper bound only, no positive advantage established.

  7. T-4e (Finite-T conditional inequality). CANCELLATION-FIXED, WEAKENED. The corrected exact inequality K(\text{IC}) > K_0 - K_{\text{laws}} + \text{deficit} — with no separate -\log \xi(\mathcal{O}) term; the event cost is absorbed into the -\log \xi^F(y) deficit (2026-06 correction of the v3.6.10 double-count) — is now in §7. With K_{\text{laws}} \sim 1450 bits dominating, the threshold is trivially satisfied for any positive IC complexity — modulo the sign-indeterminate log-likelihood deficit term, which is uncomputed and can dominate — but the structural content is correspondingly much weaker than the earlier “300 > 36” reading suggested. T-4e is an existence statement, not a strength statement.

  8. §8 comparative MDL table. RESTATED. The earlier quantitative table is withdrawn (preserved as historical commentary). The v3.6.10 structural table shows where each model commits in the two-part code without claiming a specific bit-count advantage.

Falsification conditions for the MDL claim (v3.6.10 honestly evaluated).

The earlier falsification list assumed T-4d / T-4e established quantitative advantages; with those advantages withdrawn, the relevant falsifiers are:

The earlier falsifiers (a) — “IC derivable from SP laws in fewer than 36 bits” — and (b) — “Stability Filter does not compress IC” — are moot at v3.6.10 because T-4d’s quantitative advantage claim is withdrawn; the inequality is no longer expressed as “K(\text{IC}) > K_0” at the bare 36-bit level.

Downstream dependencies (v3.6.10).


This appendix is maintained as part of the OPT project repository alongside theoretical_roadmap.pdf. References: Rissanen (1978) [12], Li & Vitányi (2008) [27], Solomonoff (1964) [11], Penrose (2004).