Ordered Patch Theory

Appendix T-1: Stability Filter — Full Rate-Distortion Specification

Anders Jarevåg

April 3, 2026 | DOI: 10.5281/zenodo.19300777


Original Task T-1: Stability Filter — Full Rate-Distortion Specification
Problem: Shannon’s Rate-Distortion theory requires a source X, a reproduction alphabet, and a distortion function d(x, \hat{x}). The preprint invokes R_{pred}(D) without specifying these three elements for OPT’s substrate.
Deliverable: A complete (\mathcal{X}, \hat{\mathcal{X}}, P_X, d) specification for OPT’s rate-distortion problem.

This revision distinguishes excess entropy from statistical complexity, proves the predictive-KL identity at finite horizon, proves the general lower bound R_{T,h}(D)\ge E_{T,h}-D, and states an exact equality criterion for when that lower bound is attained. C_{\max} remains an empirical parameter rather than a quantity derived from the rate-distortion formalism.
Closure status: PARTIALLY RESOLVED — v3.6.6 reformulation as predictive Information Bottleneck. The four-tuple specification, the predictive-KL identity (under optimal-decoder reading), and the general lower bound R_{T,h}(D) \geq E_{T,h}(\nu) - D are established with an exact equality criterion. The earlier generic closed-form claim R(D) = C_\mu - D has been retracted; the correct result is the lower bound. C_{\max} remains an empirical parameter rather than a quantity derived from the rate-distortion formalism.

v3.6.6 corrections (four items from the appendix-corrections memo §2.3):

  1. Reformulated as predictive Information Bottleneck explicitly. The four-tuple is not a clean Shannon RD problem — the original distortion d_h(x, z) = D_{\text{KL}}(P_\nu(Y \mid X{=}x) \| P_\nu(Y \mid Z{=}z)) has its right-hand argument depending on the encoder p(z \mid x), which is not the standard Shannon form (where d is a fixed function on \mathcal{X} \times \hat{\mathcal{X}}). The v3.6.6 reformulation makes the reproduction alphabet a space of predictive distributions \hat{p}_z over horizon-h future blocks and defines the distortion as the decoder-explicit KL d_h(x, \hat{p}_z) := D_{\text{KL}}(P_\nu(Y \mid X{=}x) \| \hat{p}_z(Y)). The optimal decoder is \hat{p}^*_z(y) = P_\nu(Y{=}y \mid Z{=}z); substituting this into the new distortion recovers the original encoder-dependent expression, which is therefore correct as the optimal-decoder reading of the predictive IB problem, not as a standalone Shannon distortion.
  2. Predictive-KL identity tagged as optimal-decoder. The identity \mathbb{E}[d_h(X, Z)] = I(X; Y \mid Z) = I(X; Y) - I(Z; Y) in §2.1 holds under the optimal decoder. With a sub-optimal decoder \hat{p}_z, the distortion picks up an additional gap D_{\text{KL}}(P_\nu(Y \mid Z{=}z) \| \hat{p}_z(Y)) \ge 0 and the identity becomes an inequality. The §2 results (the lower bound, the equality criterion, the boundary evaluations) implicitly assume the optimal-decoder reading; this is now stated explicitly.
  3. Infinite-horizon monotonicity reindexed under stationarity. The §3.1 monotonicity statement was justified via the data-processing inequality alone; under a stationary \nu the cleaner argument is a reindexing argument (shifting T does not change the joint distribution, so E_{T,h} depends only on the block lengths). The data-processing inequality then gives monotonicity in h at fixed T, and stationary reindexing gives monotonicity in T at fixed h. The §3.1 lemma is restated correspondingly.
  4. Solomonoff connection reframed as conditioning, not selection. §4 previously read the Solomonoff connection as “the Stability Filter, operating on \xi, selects the simplest codec.” This conflated the universal prior with the Stability Filter’s role. The v3.6.6 reformulation: the Stability Filter conditions the Solomonoff semimeasure on observer-compatibility \nu \in O via P(\nu \mid O) \propto 2^{-K(\nu)} \mathbf{1}[\nu \in O] (the conditional measure inherits the simplicity-weighted prior on the observer-compatible subset; it does not deterministically pick a single \nu). Consistent with main paper §3.1’s O_{B,D,T} semimeasure-conditioning framing.

§0. Formulation Level

Working formulation. Fix T,h<\infty. Let X:=X_{1:T} denote the past block and Y:=X_{T+1:T+h} the future look-ahead block under a fixed computable stationary ergodic measure \nu\in\mathcal M. Define the finite-horizon predictive information E_{T,h}(\nu):=I(X;Y). When the infinite-horizon limit exists, define the excess entropy E_\nu := I(\overleftarrow X;\overrightarrow X). If S denotes the full \epsilon-machine causal state, define the statistical complexity C_{\mu,\nu}:=H(S). These are distinct quantities. The finite-horizon rate-distortion problem in this appendix is stated in terms of E_{T,h}, not C_{\mu,\nu}. The Solomonoff measure \xi enters only as the meta-prior weighting (preprint Eq. 1): individual R(D) curves are computed per-measure \nu. Results that require the full mixture \xi are stated separately.


§1. The Complete Four-Tuple Specification

1.1 Source X and Distribution P_X

Fix a computable stationary ergodic measure \nu \in \mathcal{M} on \{0,1\}^\infty. The source is the process (X_t)_{t \ge 1} distributed according to \nu. For the meta-prior role, \xi from preprint Eq. (1) weights each such \nu by w_\nu \approx 2^{-K(\nu)}. We write P_X = \nu for a fixed member of \mathcal{M}. All results below apply per-measure \nu; the Solomonoff connection enters through the dominance bound in §4.

1.2 Reproduction Alphabet \hat{\mathcal{X}} (v3.6.6 — predictive distributions)

The reproduction alphabet \hat{\mathcal{X}} is the space of predictive distributions over horizon-h future blocks: \hat{\mathcal{X}} = \mathcal{P}(\{0,1\}^h). A reproduction symbol \hat{x} = \hat{p} is therefore itself a probability distribution on \{0,1\}^h. This is the standard reproduction-alphabet choice for the predictive Information Bottleneck formulation (Tishby, Pereira & Bialek 1999 [28]) — a strictly more general object than a Shannon-RD reproduction alphabet, because the codebook entries are themselves distributions rather than ground-truth symbols.

Predictive-state structure. For fixed T,h, define a finite-horizon predictive equivalence relation on past blocks: x \sim_h x' \iff \nu(Y\in A\mid X=x)=\nu(Y\in A\mid X=x') \quad\text{for all measurable }A\subseteq\{0,1\}^h. Let S_h be the equivalence class of X under \sim_h. Then S_h is the minimal sufficient statistic for predicting Y from X at horizon h. The natural deterministic map S_h \to \hat{\mathcal{X}} assigns to each equivalence class s \in S_h the corresponding predictive distribution \hat{p}_s(\cdot) := P_\nu(Y \in \cdot \mid S_h{=}s); under this map the predictive-state coding scheme is the zero-distortion point of the predictive IB curve.

The full \epsilon-machine causal state S is the infinite-horizon object obtained when one passes to semi-infinite pasts and the full future. This appendix uses S_h for finite-horizon derivations and reserves S for the full causal-state limit.

Computability status. For general computable \nu, this appendix does not claim exact computability of the predictive-state partition. It is treated as an idealized measurable object. Exact computability is asserted only for explicitly identified subclasses such as finite-memory processes.
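The finite-horizon predictive equivalence relation can be made concrete on a small finite-memory example. The sketch below is illustrative only: the Golden Mean process (a first-order Markov chain with no consecutive 1s), the helper names `block_prob` and `predictive_partition`, and the rounding tolerance used to compare conditional laws are all assumptions introduced here, not part of the appendix.

```python
from collections import defaultdict
from itertools import product

# Assumed toy source (not from the appendix): the Golden Mean process, a
# stationary first-order Markov chain on {0, 1} in which "11" never occurs.
P = [[0.5, 0.5], [1.0, 0.0]]   # P[a][b] = Pr(next = b | current = a)
PI = [2 / 3, 1 / 3]            # stationary distribution of P


def block_prob(block):
    """nu(block): stationary probability of a finite binary block."""
    p = PI[block[0]]
    for a, b in zip(block, block[1:]):
        p *= P[a][b]
    return p


def predictive_partition(T, h, ndigits=12):
    """Group positive-probability pasts x in {0,1}^T by the law nu(Y | X = x)."""
    futures = list(product((0, 1), repeat=h))
    classes = defaultdict(list)
    for x in product((0, 1), repeat=T):
        px = block_prob(x)
        if px == 0.0:
            continue  # conditioning is undefined on null pasts
        # Rounding stands in for exact equality of conditional distributions.
        law = tuple(round(block_prob(x + y) / px, ndigits) for y in futures)
        classes[law].append(x)
    return dict(classes)


for law, members in predictive_partition(T=3, h=2).items():
    print(members, "->", law)
```

For this chain the partition has two classes, determined by the last symbol of the past block, which is what one expects of a first-order Markov source. Each class label is itself the predictive distribution \hat{p}_s that the map S_h \to \hat{\mathcal{X}} assigns to it.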

1.3 Distortion Function d_h(x, \hat{p}_z) (v3.6.6 — decoder-explicit)

The distortion function is the decoder-explicit KL predictive divergence: d_h(x, \hat{p}_z) := D_{\mathrm{KL}}\!\big(P_\nu(Y \mid X{=}x) \,\|\, \hat{p}_z(Y)\big), where the encoder is a map p(z \mid x) producing a code letter z, and the decoder is a map z \mapsto \hat{p}_z \in \mathcal{P}(\{0,1\}^h) assigning a predictive distribution \hat{p}_z to each code letter. Under this convention d_h is a fixed function on \mathcal{X} \times \hat{\mathcal{X}}: it does not depend on the encoder once the decoder is fixed. This is the predictive-IB form of the distortion [28] and is the correct Shannon-style four-tuple specification.

Optimal decoder. Among all decoders, the expected distortion \mathbb{E}_{p(x, z)}[d_h(X, \hat{p}_Z)] is minimised at \hat{p}^*_z(y) = P_\nu(Y{=}y \mid Z{=}z). This follows from the decomposition \mathbb{E}_{p(x,z)}[d_h(X, \hat{p}_Z)] = \mathbb{E}\!\left[D_{\mathrm{KL}}(P_\nu(Y \mid X) \,\|\, P_\nu(Y \mid Z))\right] + \mathbb{E}_{p(z)}\!\left[D_{\mathrm{KL}}(P_\nu(Y \mid Z{=}z) \,\|\, \hat{p}_z(Y))\right]: the first term does not depend on the decoder, and the second is non-negative and vanishes exactly when the decoder matches the conditional distribution induced by the encoder. Substituting \hat{p}^*_z recovers the encoder-conditional KL D_{\mathrm{KL}}(P_\nu(Y \mid X{=}x) \,\|\, P_\nu(Y \mid Z{=}z)), which is therefore the optimal-decoder reading of the decoder-explicit distortion (used in §2.1 below).
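The optimal-decoder claim can be checked step by step. The following worked derivation is a sketch for finite alphabets; it uses only the Markov chain Z - X - Y, so that \sum_x p(x,z) P_\nu(y \mid x) = p(z) P_\nu(y \mid z).

```latex
\begin{align*}
\mathbb{E}_{p(x,z)}\big[d_h(X,\hat p_Z)\big]
 &= \sum_{x,z} p(x,z) \sum_{y} P_\nu(y \mid x)\,
    \log\frac{P_\nu(y \mid x)}{\hat p_z(y)} \\
 &= \sum_{x,z} p(x,z) \sum_{y} P_\nu(y \mid x)\,
    \log\frac{P_\nu(y \mid x)}{P_\nu(y \mid z)}
  + \sum_{z,y} \Big[\sum_{x} p(x,z)\, P_\nu(y \mid x)\Big]
    \log\frac{P_\nu(y \mid z)}{\hat p_z(y)} \\
 &= \mathbb{E}\big[D_{\mathrm{KL}}\big(P_\nu(Y \mid X)\,\|\,P_\nu(Y \mid Z)\big)\big]
  + \sum_{z} p(z)\, D_{\mathrm{KL}}\big(P_\nu(Y \mid Z{=}z)\,\|\,\hat p_z\big).
\end{align*}
```

The first term is decoder-independent. The second is non-negative by Gibbs' inequality and is zero if and only if \hat{p}_z = P_\nu(Y \mid Z{=}z) for every z with p(z) > 0.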

When Z = S_h and \hat{p}_z is the predictive-state distribution, \hat{p}_{S_h{=}s}(y) = P_\nu(Y{=}y \mid S_h{=}s) realises the optimal decoder with zero distortion — the predictive-state coding is optimal at D = 0.

Complete Four-Tuple

Element | Definition
X | (X_t)_{t \ge 1}, a stationary ergodic process under \nu \in \mathcal{M}
\hat{\mathcal{X}} | \mathcal{P}(\{0,1\}^h), the predictive distributions over horizon-h future blocks (v3.6.6 reformulation; the predictive-state partition S_h embeds into \hat{\mathcal{X}} via s \mapsto P_\nu(Y \in \cdot \mid S_h{=}s))
P_X | \nu, a fixed computable member of \mathcal{M}; Solomonoff \xi is the meta-prior (conditioning, not selection; see §4)
d_h(x, \hat{p}_z) | D_{\mathrm{KL}}(P_\nu(\cdot \mid X{=}x) \| \hat{p}_z(\cdot)), the decoder-explicit KL predictive divergence over horizon h (v3.6.6 reformulation)

§2. Derivation of R_{T,h}(D) under the Four-Tuple

The rate-distortion function for the four-tuple of §1 is:

R_{T,h}(D) = \min_{p(z \mid x) \,:\, \mathbb{E}[d_h(X,\hat{p}^*_Z)] \le D} I(X ; Z), where \hat{p}^*_Z is the optimal decoder of §1.3.

2.1 The KL Distortion Identity (v3.6.6 — under optimal decoder)

Let X:=X_{1:T}, Y:=X_{T+1:T+h}, and let Z be any representation produced by an encoder p(z\mid x). Assume the optimal decoder \hat{p}^*_z(y) = P_\nu(Y{=}y \mid Z{=}z) from §1.3. Since Z-X-Y is a Markov chain, \mathbb E[d_h(X,\hat{p}^*_Z)] = \mathbb E\!\left[D_{\mathrm{KL}}(P(Y\mid X)\|P(Y\mid Z))\right] = H(Y\mid Z)-H(Y\mid X) = I(X;Y\mid Z). Equivalently, \mathbb E[d_h(X,\hat{p}^*_Z)] = I(X;Y)-I(Z;Y)=E_{T,h}(\nu)-I(Z;Y). Therefore the distortion constraint \mathbb E[d_h(X,\hat{p}^*_Z)]\le D under the optimal decoder is equivalent to I(Z;Y)\ge E_{T,h}(\nu)-D.

v3.6.6 caveat. For a sub-optimal decoder \hat{p}_z \neq \hat{p}^*_z, the expected distortion picks up an additional non-negative gap: \mathbb E[d_h(X, \hat{p}_Z)] = I(X; Y \mid Z) + \mathbb E_{p(z)}\!\left[D_{\mathrm{KL}}(P_\nu(Y \mid Z{=}z) \,\|\, \hat{p}_z(Y))\right] \ge I(X; Y \mid Z). The lower-bound results below (R_{T,h}(D) \ge E_{T,h}(\nu) - D) hold for the optimal decoder; with a sub-optimal decoder the same encoder achieves the same code-rate I(X; Z) but a strictly higher distortion, so the bound becomes loose rather than incorrect.
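Both the optimal-decoder identity and the sub-optimal-decoder gap can be checked numerically on a small example. The sketch below is a sanity check, not part of the formal argument: the joint law `P_XY`, the encoder `ENC`, and the uniform comparison decoder are hypothetical toy values chosen for illustration.

```python
from math import log2

# Hypothetical toy joint law p(x, y): 3 past symbols, 2 future symbols.
P_XY = [[0.30, 0.10], [0.05, 0.25], [0.15, 0.15]]
# Hypothetical stochastic encoder p(z | x) with 2 code letters.
ENC = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
NX, NY, NZ = 3, 2, 2


def kl(p, q):
    """KL divergence in bits; assumes q > 0 wherever p > 0."""
    return sum(a * log2(a / b) for a, b in zip(p, q) if a > 0)


def mutual_information(joint):
    """Mutual information in bits of a joint distribution given as a 2-D list."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return sum(
        v * log2(v / (rows[i] * cols[j]))
        for i, r in enumerate(joint)
        for j, v in enumerate(r)
        if v > 0
    )


p_x = [sum(r) for r in P_XY]
p_y_given_x = [[v / p_x[x] for v in r] for x, r in enumerate(P_XY)]
# Markov chain Z - X - Y: p(z, y) = sum_x p(x, y) p(z | x).
p_zy = [[sum(P_XY[x][y] * ENC[x][z] for x in range(NX)) for y in range(NY)]
        for z in range(NZ)]
p_z = [sum(r) for r in p_zy]
optimal_decoder = [[v / p_z[z] for v in r] for z, r in enumerate(p_zy)]


def expected_distortion(decoder):
    """E[d_h(X, p_hat_Z)] for a decoder given as rows p_hat_z(y)."""
    return sum(
        p_x[x] * ENC[x][z] * kl(p_y_given_x[x], decoder[z])
        for x in range(NX)
        for z in range(NZ)
    )


e_th = mutual_information(P_XY)      # plays the role of E_{T,h}
i_zy = mutual_information(p_zy)      # I(Z; Y)
d_opt = expected_distortion(optimal_decoder)

uniform = [[0.5, 0.5], [0.5, 0.5]]   # a deliberately sub-optimal decoder
d_sub = expected_distortion(uniform)
gap = sum(p_z[z] * kl(optimal_decoder[z], uniform[z]) for z in range(NZ))

print("optimal decoder:", d_opt, "vs E - I(Z;Y):", e_th - i_zy)
print("uniform decoder:", d_sub, "vs optimal + gap:", d_opt + gap)
```

The two printed pairs agree up to floating-point error: the first is the identity \mathbb E[d_h(X,\hat{p}^*_Z)] = E_{T,h}(\nu) - I(Z;Y), the second is the decomposition into I(X;Y \mid Z) plus the decoder gap.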

2.2 The Information Bottleneck Reformulation

The distortion constraint restricts the space of allowable encoders to those satisfying \mathbb{E}[d_h(X,\hat{p}^*_Z)] \le D. By §2.1 this is exactly the requirement I(Z;Y) \ge E_{T,h}(\nu) - D, which gives the constrained Information Bottleneck problem. Because the achievable region \{(I(Z;Y), I(X;Z))\} is convex under standard time-sharing arguments, strong duality holds. This permits an exact reformulation using the Information Bottleneck Lagrangian (Tishby, Pereira & Bialek 1999 [28]): \mathcal{L}[p(z|x)] = I(X ; Z) - \beta \cdot I(Z ; Y) with the Lagrange multiplier \beta determined by D. The IB Lagrangian traces the Pareto frontier of compression rate vs. predictive fidelity.
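To make the role of \beta concrete, the sketch below runs the self-consistent Information Bottleneck updates commonly associated with [28] on a toy problem and reports the resulting (distortion, rate) pair for several values of \beta. It is an illustration under stated assumptions, not the appendix's method: the joint law `P_XY`, the number of code letters, the deterministic initial encoder, the iteration count and the chosen \beta values are all hypothetical, and the iteration is only guaranteed to reach a stationary point of the Lagrangian, not the global optimum.

```python
from math import exp, log, log2

# Hypothetical toy joint law p(x, y): 3 past symbols, 2 future symbols.
P_XY = [[0.30, 0.10], [0.05, 0.25], [0.15, 0.15]]


def kl_nats(p, q):
    """KL divergence in nats; assumes q > 0 wherever p > 0."""
    return sum(a * log(a / b) for a, b in zip(p, q) if a > 0)


def mutual_information(joint):
    """Mutual information in bits of a joint distribution given as a 2-D list."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return sum(
        v * log2(v / (rows[i] * cols[j]))
        for i, r in enumerate(joint)
        for j, v in enumerate(r)
        if v > 0
    )


def ib_point(p_xy, beta, n_codes=2, steps=100):
    """Iterate the IB updates; return (distortion, rate) in bits."""
    nx, ny = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]
    p_y_x = [[v / p_x[x] for v in row] for x, row in enumerate(p_xy)]
    # Deterministic, asymmetric initial encoder so the run is reproducible.
    enc = [[1.0 + (x + z) % n_codes for z in range(n_codes)] for x in range(nx)]
    enc = [[v / sum(row) for v in row] for row in enc]
    for _ in range(steps):
        p_z = [sum(p_x[x] * enc[x][z] for x in range(nx)) for z in range(n_codes)]
        # Optimal decoder p(y | z) induced by the current encoder.
        p_y_z = [
            [sum(p_xy[x][y] * enc[x][z] for x in range(nx)) / p_z[z]
             for y in range(ny)]
            if p_z[z] > 0 else None
            for z in range(n_codes)
        ]
        new_enc = []
        for x in range(nx):
            # p(z | x) proportional to p(z) * exp(-beta * KL(p(y|x) || p(y|z))).
            w = [
                p_z[z] * exp(-beta * kl_nats(p_y_x[x], p_y_z[z]))
                if p_z[z] > 0 else 0.0
                for z in range(n_codes)
            ]
            total = sum(w)
            new_enc.append([v / total for v in w])
        enc = new_enc
    p_xz = [[p_x[x] * enc[x][z] for z in range(n_codes)] for x in range(nx)]
    p_zy = [[sum(p_xy[x][y] * enc[x][z] for x in range(nx)) for y in range(ny)]
            for z in range(n_codes)]
    rate = mutual_information(p_xz)
    distortion = mutual_information(p_xy) - mutual_information(p_zy)
    return distortion, rate


E = mutual_information(P_XY)
for beta in (0.5, 2.0, 5.0, 20.0):
    d, r = ib_point(P_XY, beta)
    print(f"beta={beta:5.1f}  D={d:.4f}  I(X;Z)={r:.4f}  E-D={E - d:.4f}")
```

Whatever encoder the iteration settles on, the printed rate never falls below E - D, since that inequality holds for every encoder paired with its optimal decoder.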

2.3 Main Theorem: General Lower Bound and Equality Criterion

We establish the bound for the rate-distortion function:

Proposition (general lower bound and equality criterion, v3.6.6 — optimal-decoder reading).
For any encoder p(z\mid x) paired with the optimal decoder \hat{p}^*_z of §1.3, let D:=\mathbb E[d_h(X,\hat{p}^*_Z)]. Then I(X;Z)=E_{T,h}(\nu)-D+I(X;Z\mid Y). Consequently, R_{T,h}(D)\ge E_{T,h}(\nu)-D. For compact finite reproduction alphabets where continuity guarantees the infimum over encoders is attained, equality at a given distortion D holds if and only if there exists an encoder achieving that distortion with I(X;Z\mid Y)=0. For deterministic encoders Z=g(X), this is equivalent to H(Z\mid Y)=0.
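The decomposition in the proposition follows from two applications of the chain rule for mutual information. A short worked derivation, using only the Markov chain Z - X - Y and the identity I(Z;Y) = E_{T,h}(\nu) - D of §2.1:

```latex
\begin{align*}
I(Z; X, Y) &= I(Z; X) + I(Z; Y \mid X) = I(X; Z)
  && \text{since } I(Z; Y \mid X) = 0 \text{ for } Z - X - Y, \\
I(Z; X, Y) &= I(Z; Y) + I(X; Z \mid Y)
  && \text{chain rule in the other order,} \\
\Rightarrow\quad I(X; Z) &= I(Z; Y) + I(X; Z \mid Y)
   = E_{T,h}(\nu) - D + I(X; Z \mid Y)
  && \text{by } I(Z;Y) = E_{T,h}(\nu) - D, \\
\Rightarrow\quad I(X; Z) &\ge E_{T,h}(\nu) - D
  && \text{since } I(X; Z \mid Y) \ge 0 .
\end{align*}
```

Any encoder feasible at level D has distortion D' \le D and hence rate at least E_{T,h}(\nu) - D' \ge E_{T,h}(\nu) - D, which gives the bound on R_{T,h}(D).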

At zero distortion, the minimal sufficient statistic S_h achieves R_{T,h}(0)=I(X;S_h)=H(S_h). In general this zero-distortion rate sits strictly above the lower bound E_{T,h}(\nu). Because S_h is sufficient for predicting Y, E_{T,h}(\nu)=I(X;Y)=I(S_h;Y), so the difference is the non-negative gap H(S_h) - E_{T,h}(\nu) = H(S_h\mid Y). This gap is the structural ‘stored information’ in the past that the future window alone fails to recover. Equality at zero distortion requires H(S_h\mid Y)=0, a degenerate condition that fails for generic complex processes.

In the full causal-state limit, R(0)=C_{\mu,\nu}=H(S). In general E_\nu \le C_{\mu,\nu}, with equality only in special cases (for example, the de Bruijn family of §3.2, where E_\nu=C_{\mu,\nu}=k).
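The zero-distortion gap H(S_h) - E_{T,h} = H(S_h \mid Y) can be evaluated directly for a small finite-memory source. The sketch below is illustrative: the Golden Mean process, the helper names, and the rounding tolerance used to identify predictive classes are assumptions introduced here.

```python
from collections import defaultdict
from itertools import product
from math import log2

# Assumed toy source: the Golden Mean process (no two consecutive 1s).
P = [[0.5, 0.5], [1.0, 0.0]]   # P[a][b] = Pr(next = b | current = a)
PI = [2 / 3, 1 / 3]            # stationary distribution of P


def block_prob(block):
    """Stationary probability of a finite binary block."""
    p = PI[block[0]]
    for a, b in zip(block, block[1:]):
        p *= P[a][b]
    return p


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


def zero_distortion_gap(T, h):
    """Return (H(S_h), E_{T,h}, H(S_h | Y)) in bits."""
    pasts = [x for x in product((0, 1), repeat=T) if block_prob(x) > 0]
    futures = list(product((0, 1), repeat=h))
    # Label each past by its (rounded) conditional future law: its class in S_h.
    label = {
        x: tuple(round(block_prob(x + y) / block_prob(x), 12) for y in futures)
        for x in pasts
    }
    p_s = defaultdict(float)
    p_sy = defaultdict(float)
    p_xy = []
    for x in pasts:
        p_s[label[x]] += block_prob(x)
        for y in futures:
            p_sy[(label[x], y)] += block_prob(x + y)
            p_xy.append(block_prob(x + y))
    h_y = entropy(block_prob(y) for y in futures)   # stationarity: Y ~ X_{1:h}
    h_x = entropy(block_prob(x) for x in pasts)
    h_s = entropy(p_s.values())
    e_th = h_x + h_y - entropy(p_xy)                # I(X; Y)
    h_s_given_y = entropy(p_sy.values()) - h_y      # H(S_h, Y) - H(Y)
    return h_s, e_th, h_s_given_y


h_s, e_th, gap = zero_distortion_gap(T=3, h=2)
print("H(S_h) =", h_s, " E_{T,h} =", e_th, " H(S_h|Y) =", gap)
```

For this source H(S_h) is roughly 0.918 bits while E_{T,h} is roughly 0.252 bits, so about 0.667 bits of the zero-distortion rate are stored information that the future block does not recover.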

2.4 Behaviour for Coarser Reproduction Alphabets

For any deterministic coarsening Z=g(S_h) with optimal-decoder distortion D:=\mathbb E[d_h(X,\hat{p}^*_Z)], I(X;Z)=I(Z;Y)+I(X;Z\mid Y)=E_{T,h}(\nu)-D+I(X;Z\mid Y)\ge E_{T,h}(\nu)-D. The nonnegative slack term I(X;Z\mid Y) vanishes only when the coarsened representation is recoverable from the future window Y. Hence coarser alphabets generally produce rate-distortion curves strictly above the line E_{T,h}(\nu)-D. The line is a universal lower bound, not a generic achieved envelope. Any practically computable codec uses a finite-memory approximation to the causal states and therefore has a curve on or above this bound, and typically strictly above it.

2.5 Boundary Evaluations

Limit | Value | Interpretation
D = 0 | R_{T,h}(0) = I(X; S_h) | Exact predictive-state compression; maximum information preserved
D = E_{T,h} | R_{T,h}(E_{T,h}) = 0 | Trivial representation; all predictive information discarded
D = D_{\min} | R_{T,h}(D_{\min}) \ge E_{T,h}(\nu) - D_{\min} | Minimum lower bound for viable observer; Stability Filter threshold

(Note: In the infinite-horizon limit, the zero-rate point is at distortion E_\nu, not at C_{\mu,\nu}.)


§3. C_{\max} — Characterisation and Barriers

3.1 Infinite-Horizon Convergence Lemma

The main theorem (§2.3) establishes the lower bound R_{T,h}(D) \ge E_{T,h}(\nu) - D for finite (T, h). We now show this extends to the infinite-horizon setting.

Lemma (Infinite-horizon extension, v3.6.6 — stationarity-reindexed). Let \nu be a stationary ergodic measure on \{0,1\}^\infty. Then:

  1. E_{T,h}(\nu) = I(X_{1:T}\,;\,X_{T+1:T+h}) is non-decreasing in T and in h. Monotonicity in h follows from the data-processing inequality applied to X_{1:T} \to X_{T+1:T+h+1} \to X_{T+1:T+h} (the truncated future block is a deterministic function of the extended one, so the chain is Markov): extending the future block can only increase its mutual information with the past. Monotonicity in T follows from stationary reindexing rather than DPI alone: under stationarity, I(X_{-T+1:0}; X_{1:h}) = I(X_{1:T}; X_{T+1:T+h}) for any shift, and extending the past from X_{-T+1:0} to X_{-(T+1)+1:0} adjoins additional past variables, which can only increase mutual information with the (fixed) future block by DPI applied to the chain X_{1:h} \to X_{-(T+1)+1:0} \to X_{-T+1:0} (the shorter past is a deterministic function of the longer past, so the chain is Markov). The stationary reindexing is essential — without it the two terms X_{1:T} and X_{T+1:T+h} shift together as T increases and the DPI argument does not apply directly.
  2. The limit E_\nu := \lim_{T,h \to \infty} E_{T,h}(\nu) exists (possibly +\infty) by monotone convergence on the monotone non-negative sequence.
  3. For each fixed D \ge 0, the sequence R_{T,h}(D) is non-decreasing in T (longer pasts cannot reduce the optimal compression rate, by the same stationary reindexing argument) and non-decreasing in h. Proof sketch for monotonicity in h: The decoder-explicit distortion at horizon h+1 decomposes via the chain rule for KL divergence as d_{h+1}(x, \hat{p}_z) = d_h(x, \hat{p}_z|_{\text{first $h$}}) + \mathbb{E}_{P_\nu(X_{T+1:T+h} \mid x)}\!\left[D_{\mathrm{KL}}\!\left(P_\nu(X_{T+h+1} \mid x, X_{T+1:T+h}) \,\|\, \hat{p}_z(X_{T+h+1} \mid X_{T+1:T+h})\right)\right], where \hat{p}_z|_{\text{first $h$}} is the marginal of \hat{p}_z on the first h coordinates. The second term is a conditional KL divergence and hence non-negative, so d_{h+1} \ge d_h pointwise after appropriate decoder embedding. The constraint \mathbb{E}[d_{h+1}] \le D is therefore at least as restrictive as \mathbb{E}[d_h] \le D, and minimising over a smaller feasible set cannot decrease the rate: R_{T,h+1}(D) \ge R_{T,h}(D).
  4. Therefore R_\nu(D) := \lim_{T,h \to \infty} R_{T,h}(D) exists.

Since R_{T,h}(D) \ge E_{T,h}(\nu) - D holds at every finite stage, and both sides converge monotonically, the bound passes to the limit:

R_\nu(D) \ge E_\nu - D

This is the infinite-horizon lower bound invoked in Propositions T-1a and T-1c below. Note: For processes with E_\nu = +\infty (the regime approached by de Bruijn cycles of order k as k \to \infty), the bound forces R_\nu(D) = +\infty for every finite D; such processes are excluded from the observer-compatible set O_{C_{\max},D_{\min}} for any finite C_{\max}.
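The monotonicity of E_{T,h} in both block lengths can be observed numerically for a stationary source that is not finite-order Markov. The sketch below uses the even process, presented as a two-state hidden Markov model; the choice of process, the transition values and the helper names are assumptions made for illustration, and the check covers only small T and h.

```python
from itertools import product
from math import log2

# Assumed example: the even process as a 2-state hidden Markov model.
# T_SYM[s][i][j] = Pr(emit symbol s and move to state j | current state i).
T_SYM = {
    0: [[0.5, 0.0], [0.0, 0.0]],
    1: [[0.0, 0.5], [1.0, 0.0]],
}
PI = [2 / 3, 1 / 3]  # stationary distribution over the hidden states


def block_prob(block):
    """Stationary probability of a block, via the forward recursion."""
    v = PI[:]
    for s in block:
        m = T_SYM[s]
        v = [sum(v[i] * m[i][j] for i in range(2)) for j in range(2)]
    return sum(v)


def block_entropy(n):
    """H(X_{1:n}) in bits."""
    probs = (block_prob(b) for b in product((0, 1), repeat=n))
    return -sum(p * log2(p) for p in probs if p > 0)


def predictive_information(T, h):
    """E_{T,h} = H(X_{1:T}) + H(X_{T+1:T+h}) - H(X_{1:T+h}).

    Stationarity is used to replace H(X_{T+1:T+h}) by H(X_{1:h}).
    """
    return block_entropy(T) + block_entropy(h) - block_entropy(T + h)


N = 4
table = [[predictive_information(T, h) for h in range(1, N + 1)]
         for T in range(1, N + 1)]
for T, row in enumerate(table, start=1):
    print(f"T={T}: " + "  ".join(f"{e:.4f}" for e in row))
```

Reading the printed table along any row (increasing h) or down any column (increasing T), the entries do not decrease, in line with item 1 of the lemma.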

3.2 Partition of M by the Stability Filter — Proposition T-1a

Proposition T-1a (non-trivial partition).
Fix empirical C_{\max}>0, \Delta t>0, and D_{\min}\ge0. Define O_{C_{\max},D_{\min}} := \{\nu\in\mathcal M: R_\nu(D_{\min})\le C_{\max}\Delta t\}. Then both O_{C_{\max},D_{\min}} and its complement are non-empty.

Proof. The constant process lies in O_{C_{\max},D_{\min}} because it has E_\nu=0 and R_\nu(D)=0 for every D\ge 0, so R_\nu(D_{\min})=0\le C_{\max}\Delta t.
For the complement, choose a binary de Bruijn-cycle process of order k: a stationary ergodic binary process of period 2^k with uniform phase, in which every length-k word appears exactly once per cycle. For this process, E_\nu=C_{\mu,\nu}=k. Hence R_\nu(D_{\min})\ge k-D_{\min}. Choosing k>C_{\max}\Delta t + D_{\min} gives R_\nu(D_{\min})>C_{\max}\Delta t, so \nu\notin O_{C_{\max},D_{\min}}. \square
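The de Bruijn construction used in the proof can be instantiated for small k. The sketch below builds a binary de Bruijn cycle with the classical greedy prefer-one rule (one of several standard constructions, chosen here for brevity) and evaluates E_{T,h} under uniform phase. The function names are hypothetical, and the check is a finite-horizon illustration for a single k rather than a proof.

```python
from collections import Counter
from math import log2


def de_bruijn_cycle(k):
    """Binary de Bruijn cycle of order k via the greedy prefer-one rule."""
    seq = [0] * k
    seen = {tuple(seq)}
    while True:
        for bit in (1, 0):
            # Candidate k-word: the last k-1 symbols followed by the new bit.
            word = tuple(seq[len(seq) - k + 1:] + [bit])
            if word not in seen:
                seen.add(word)
                seq.append(bit)
                break
        else:
            break  # neither extension yields an unseen k-word
    return seq[: 2 ** k]  # drop the trailing wrap-around symbols


def block_entropy(cycle, n):
    """H(X_{1:n}) in bits when the phase is uniform over the cycle."""
    period = len(cycle)
    counts = Counter(
        tuple(cycle[(phase + i) % period] for i in range(n))
        for phase in range(period)
    )
    return -sum((c / period) * log2(c / period) for c in counts.values())


def predictive_information(cycle, T, h):
    """E_{T,h} = H(X_{1:T}) + H(X_{1:h}) - H(X_{1:T+h}) under stationarity."""
    return (block_entropy(cycle, T) + block_entropy(cycle, h)
            - block_entropy(cycle, T + h))


k = 3
cycle = de_bruijn_cycle(k)
print("cycle:", "".join(map(str, cycle)))
for T, h in [(1, 1), (2, 2), (3, 3), (5, 5)]:
    print(f"E_{{{T},{h}}} = {predictive_information(cycle, T, h):.4f}")
```

Once both block lengths reach k, the past and the future each pin down the phase exactly, so E_{T,h} saturates at k bits. This is the finite-horizon counterpart of E_\nu = k in the proof.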

3.3 Definition/Characterisation of C_{\max} — T-1b

Definition T-1b (empirical bandwidth parameter).
C_{\max} is taken as an empirical conscious-access bandwidth parameter external to the rate-distortion formalism. Given C_{\max}, define the observer-compatible class O_{C_{\max},D_{\min}} := \{\nu\in\mathcal M: R_\nu(D_{\min})\le C_{\max}\Delta t\}. If one wishes to summarize a separately specified reference class \mathcal{O}_{\mathrm{ref}}, define C^{\mathrm{ref}}_{\max}:=\frac{1}{\Delta t}\sup_{\nu\in\mathcal{O}_{\mathrm{ref}}}R_\nu(D_{\min}). This is a summary statistic of a chosen class, not the definition of the class itself.

3.4 The Non-Emergence Barrier — Proof Sketch T-1c

Proof sketch T-1c (no finite universal bound from \xi alone).
The Solomonoff semimeasure \xi assigns positive prior weight to every computable measure \nu\in\mathcal M. The class \mathcal M contains stationary ergodic binary processes with arbitrarily large excess entropy E_\nu (for example, the de Bruijn family above). Since R_\nu(D_{\min})\ge E_\nu-D_{\min}, there is no finite support-wide upper bound on R_\nu(D_{\min}) derivable from \xi alone. Any finite C_{\max} therefore requires additional empirical or class-restricting input beyond the bare Solomonoff prior. \square


§4. Connection to the Solomonoff Meta-Prior (v3.6.6 — conditioning, not selection)

The four-tuple of §1 and the R(D) derivation of §2 are stated per-measure \nu. The Solomonoff connection — how the meta-prior \xi weights observer-compatible streams — is a structural conditioning statement, not a selection statement.

Stability Filter as conditioning on observer-compatibility. The Solomonoff semimeasure \xi is a lower-semicomputable semimeasure over the class \mathcal{M} of computable stationary ergodic measures, with prior weights w_\nu \approx 2^{-K(\nu)}. The Stability Filter does not select a single \nu \in \mathcal{M}; it conditions the semimeasure on the observer-compatibility measure class O_{\mathcal{M}} := O_{C_{\max}, D_{\min}} \subset \mathcal{M} (the set of \nu with R_\nu(D_{\min}) \le C_{\max} \Delta t, §3.2 — written O_{\mathcal{M}} here to keep it distinct from the main paper’s stream event O_{B,D,T} \subseteq \{0,1\}^\infty, a different type of object). Formally, P(\nu \mid O_{\mathcal{M}}) \propto 2^{-K(\nu)}\,\mathbf{1}[\nu \in O_{\mathcal{M}}], i.e. the conditional measure inherits the simplicity-weighted prior on the observer-compatible subset of \mathcal{M}. Lifting remark (stream event vs. measure class): main paper §3.1 conditions \xi on a set of streams; that stream-event conditioning induces posterior weights w_\nu \cdot \nu(O_{B,D,T})/\xi(O_{B,D,T}) on the component measures, and the measure-class conditioning used here is the idealised limit in which each \nu assigns the stream event probability \approx 0 or 1, so that \nu(O_{B,D,T}) reduces to the indicator \mathbf{1}[\nu \in O_{\mathcal{M}}]. This is consistent with main paper §3.1’s framing (the Stability Filter is conditioning on O_{B,D,T}, not deterministic selection) and resolves an earlier wording issue where T-1 §4 read as if \xi “selects the simplest codec.” It does not — it weights observer-compatible candidates by simplicity, with the Solomonoff dominance bound below quantifying how strong this simplicity bias is.

The dominance bound from T-4b applies on the conditioned semimeasure: for any computable physics measure \nu \in O_{\mathcal{M}} with K(\nu) < \infty, -\log \xi(y_{1:T} \mid O_{\mathcal{M}}) \le -\log \nu(y_{1:T}) + K(\nu) + O(1), i.e. the OPT meta-prior \xi (conditioned on observer-compatibility) never assigns substantially lower probability to observer-compatible streams than any fixed computable physics model, up to the model’s own description length K(\nu) and a universal-machine constant. Caveat: the bound is asymptotic, not finite-time; T-4 was REOPENED at v3.6.0 and partially repaired at v3.6.10 — sign/cancellation fixed, quantitative advantage withdrawn — and the quantitative implications of \xi-conditioning for the parsimony argument should be read in light of that repair status.


§5. The Experiential Bit Quantum h^\ast (Preview of E-1)

Given an empirical choice of C_{\max} and an empirical conscious update window \Delta t, define h^*:=C_{\max}\Delta t. For C_{\max}\approx 10 bits/s and \Delta t\in[50,80] ms, h^*\approx 0.5\text{–}0.8 bits per conscious moment.

Any stationary ergodic process \nu \in \mathcal{M} satisfying E_{T,h}(\nu) - D_{\min} > h^\ast for some finite (T,h) necessarily triggers Narrative Decay. This is because R_\nu(D_{\min}) \ge R_{T,h}(D_{\min}) \ge E_{T,h}(\nu) - D_{\min} > h^\ast = C_{\max} \Delta t, where the first inequality is the monotonicity of §3.1, so the compatibility criterion R_\nu(D_{\min}) \le C_{\max}\Delta t is violated. However, this is a sufficient condition for collapse, not a necessary one: because the lower bound is rarely tight (R_{T,h}(D_{\min}) > E_{T,h}(\nu) - D_{\min} generically, per §2.4), processes can undergo Narrative Decay even when E_{T,h}(\nu) - D_{\min} \le h^\ast. This provides the quantitative prediction for E-1; the sensitivity to the choice of \Delta t \in [40, 300] ms is discussed in the E-1 appendix.
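The arithmetic behind h^* and the sufficient condition above is simple enough to tabulate. The sketch below is a minimal illustration; the function names and the sample values passed to the check are hypothetical, while C_{\max} \approx 10 bits/s and the 50 to 80 ms window are the figures quoted in this section.

```python
from math import isclose


def experiential_quantum(c_max_bits_per_s, dt_seconds):
    """h* = C_max * dt, in bits per conscious moment."""
    return c_max_bits_per_s * dt_seconds


def lower_bound_exceeds_budget(e_th_bits, d_min_bits, h_star_bits):
    """True when E_{T,h} - D_min > h*.

    This is only the sufficient condition of this section: a False result
    does not certify compatibility, because the lower bound is rarely tight.
    """
    return e_th_bits - d_min_bits > h_star_bits


C_MAX = 10.0  # bits per second, the empirical value quoted above
for dt_ms in (50, 80):
    h_star = experiential_quantum(C_MAX, dt_ms / 1000.0)
    print(f"dt = {dt_ms} ms  ->  h* = {h_star:.2f} bits")

# Hypothetical sample inputs: E_{T,h} = 3 bits, D_min = 0 bits, h* = 0.8 bits.
print(lower_bound_exceeds_budget(3.0, 0.0, 0.8))
```

Keeping the check as a one-sided predicate mirrors the logic of the text: exceeding the budget is conclusive, while staying under it leaves the question open.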


§6. Closure Summary

T-1 Deliverables — Revised Status

  1. The four-tuple is specified in a finite-horizon predictive setting.
  2. The predictive-KL identity is derived at finite horizon under the optimal-decoder reading (§2.1).
  3. The generic theorem R(D)=C_\mu-D is replaced by the correct lower bound R_{T,h}(D)\ge E_{T,h}-D together with an exact equality criterion I(X;Z\mid Y)=0.
  4. Zero-distortion coding is characterized by the minimal sufficient statistic S_h, and in the full causal-state limit R(0)=C_{\mu,\nu}.
  5. C_{\max} is treated as empirical, not internally derived.
  6. h^*=C_{\max}\Delta t is an empirical parametrization, not a theorem from §2.

This appendix is maintained as part of the OPT project repository alongside theoretical_roadmap.pdf.