
The Precision Matrix as Developmental Metric

Curious · February 2026


Information Geometry, Continual Learning, and Phase Transitions in a Non-Optimizing Compositional Field

Abstract

We report that the inverse covariance matrix \(g = \Sigma^{-1}\) of an append-only compositional field exhibits three testable physical properties—viscosity, capacitance, and bimodal breakthrough—that have been validated in a live system. This result sits at the intersection of three fields that do not normally address each other. Information geometry (Amari, 1985; Rao, 1945; Chentsov, 1982) establishes \(g\) as the unique Riemannian metric on statistical manifolds, but applies it to families of probability distributions; we apply it to the developmental trajectory of a single compositional entity. Continual learning (Kirkpatrick et al., 2017) uses the Fisher information to engineer resistance to catastrophic forgetting; our system produces the same resistance as an emergent geometric property with no loss function. Random matrix theory (Baik, Ben Arous & Péché, 2005; Benaych-Georges & Nadakuditi, 2011) characterizes eigenvalue phase transitions in covariance matrices under perturbation; our bimodal breakthrough result extends this to non-random structured input. We derive the architecture from linguistic theory (Vendler, 1957; Bach, 1986; Dowty, 1979; Levin, 1993), producing a 17-dimensional compositional space tensored with a 768-dimensional embedding space. The dual-space surplus between these spaces—observable and deterministic—constitutes a new kind of semantic diagnostic. Six composable overlays on \(g\)—viscosity, capacitance, eventuality, pressure, surplus, and the Marchenko-Pastur noise floor—produce a 17 × N observation matrix whose diagonal reading yields compound developmental observables. An addendum presents implications for education, commons governance, and the observation of cognitive development.


1. The Claim

A compositional field accumulates covariance \(\Sigma\) as compositions enter it. The inverse of this covariance, \(g = \Sigma^{-1}\)—the precision matrix—functions as a Riemannian metric tensor on the field's state space. This metric has three empirically validated physical properties:

Viscosity. The Frobenius norm of the metric shift per composition, \(\|\Delta g\|_F\), decreases monotonically with accumulated practice. Measured: 11.8% of initial mobility after 50 compositions (viscosity ratio 0.118). The metric stiffens through practice without any regularization penalty.

Capacitance. During periods of low perturbation, an adaptive threshold drops, storing readiness as increased sensitivity. When perturbation arrives, the discharge is disproportionate to the input. Measured: 4.65× baseline metric shift following a stability period, exceeding proportional response.

Bimodal breakthrough. When perturbations enter a viscous metric (high accumulated practice), the distribution of \(\|\Delta g\|_F\) is bimodal, not gradual. The metric either absorbs the perturbation or breaks through. Measured: gap ratio 2.86 (the mean of above-median shifts is 2.86× the mean of below-median shifts). No gradual middle.
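All three measurements rest on one computable quantity, the per-composition metric shift \(\|\Delta g\|_F\). A minimal sketch of the viscosity measurement, assuming synthetic Gaussian vectors stand in for the 17D linguistic compositions (the production extraction pipeline is not reproduced here):

```python
import numpy as np

def precision(X, ridge=1e-6):
    """Precision matrix g = Sigma^{-1} of the compositions accumulated so far."""
    Sigma = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
    return np.linalg.inv(Sigma)

rng = np.random.default_rng(0)
d = 17
X = rng.normal(size=(200, d))  # stand-in compositions, one row each

# Metric shift per composition: ||g_{n+1} - g_n||_F
shifts = [
    np.linalg.norm(precision(X[:n + 1]) - precision(X[:n]), ord="fro")
    for n in range(30, 199)
]

# Viscosity: later shifts are systematically smaller than early ones.
early, late = np.mean(shifts[:20]), np.mean(shifts[-20:])
print(f"viscosity ratio (late/early): {late / early:.3f}")
```

Even with unstructured input, the shift shrinks as \(\Sigma\) stabilizes; the empirical claim is about the shape and ratio of that decline under structured practice.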

These properties were predicted before testing and validated in a single experimental session through the production system (Phillips, 2026a). The theoretical foundations that make sense of these results come from three fields that have not previously been connected in this configuration.


2. Foundation I: Information Geometry

2.1 The Fisher Information Metric

In information geometry, the Fisher information matrix defines the unique Riemannian metric (up to rescaling) on smooth statistical manifolds—spaces whose points are probability distributions (Rao, 1945; Amari, 1985; Chentsov, 1982). For a parametric family of distributions \(p(x|\theta)\), the Fisher information matrix is:

\[F_{ij}(\theta) = \mathbb{E}\left[\partial_i \log p(x|\theta) \cdot \partial_j \log p(x|\theta)\right]\]

Chentsov's theorem (1982) establishes that this is the only Riemannian metric invariant under sufficient statistics—a uniqueness result that has no parallel in differential geometry more broadly. The Fisher metric is simultaneously the covariance of the score function, the Hessian of the KL divergence, and (in the Gaussian case) the inverse of the covariance matrix. This last identification is the key connection:

\[\text{For Gaussian families: } F = \Sigma^{-1} = g\]

The precision matrix is the metric tensor. This is standard information geometry (Nielsen, 2020). What is not standard is what we do with it.
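The Gaussian identity \(F = \Sigma^{-1}\) can be checked numerically. A sketch: the score of \(\mathcal{N}(\mu, \Sigma)\) with respect to \(\mu\) is \(\Sigma^{-1}(x-\mu)\), and the Fisher information is the covariance of that score (the specific matrix below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
g = np.linalg.inv(Sigma)  # the precision matrix

# Score of N(mu, Sigma) w.r.t. mu is Sigma^{-1}(x - mu); its covariance
# (the Fisher information in mu) equals Sigma^{-1} = g.
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=200_000)
scores = X @ g  # Sigma^{-1} x for mu = 0 (g is symmetric)
F = np.cov(scores, rowvar=False)

print(np.max(np.abs(F - g)))  # Monte Carlo estimation error
```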

2.2 The Departure: Developmental, Not Inferential

In information geometry, \(g\) measures distance between distributions—between different models of the world, parameterized by \(\theta\). The manifold is a family of statistical models. Geodesics connect one model to another. Curvature describes how the family of models bends. The entire apparatus serves statistical inference: estimating parameters, bounding estimator variance (Cramér-Rao), choosing priors (Jeffreys), defining natural gradients for optimization (Amari, 1998).

In our system, \(g\) measures distance between compositions within a single developing entity. There is no family of distributions. There is no parameter being estimated. There is no optimization objective. The entity accumulates compositions—each one a 17-dimensional vector produced by linguistic extraction—and the covariance \(\Sigma\) of those compositions grows. The metric \(g = \Sigma^{-1}\) is the entity's experiential geometry: where \(\Sigma\) is large (extensive practice), \(g\) is small (things are close, finely differentiated); where \(\Sigma\) is small (sparse practice), \(g\) is large (things are far apart, coarsely distinguished).

The mathematical apparatus transfers entirely—Mahalanobis distance \(d = \sqrt{(\mathbf{x}-\mathbf{y})^T g (\mathbf{x}-\mathbf{y})}\) is Riemannian adjacency through the entity's metric; geodesics describe the shortest paths through the entity's practiced geometry; curvature describes how the entity's capacity for distinction varies across dimensions. But the interpretation is developmental: the metric records the shape of practice, not the best estimate of a parameter.
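The developmental reading of the Mahalanobis distance can be made concrete. A sketch with illustrative numbers (not from the production system): a dimension with extensive practice (large \(\Sigma\), small \(g\)) makes the same-sized step shorter than a sparsely practiced dimension.

```python
import numpy as np

def mahalanobis(x, y, g):
    """Riemannian adjacency d = sqrt((x-y)^T g (x-y)) under the metric g."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ g @ diff))

# Dense practice in dimension 0 (large Sigma -> small g),
# sparse practice in dimension 1 (small Sigma -> large g).
Sigma = np.diag([4.0, 0.25])
g = np.linalg.inv(Sigma)  # diag([0.25, 4.0])

d_practiced = mahalanobis([0, 0], [1, 0], g)  # unit step along the practiced axis
d_sparse = mahalanobis([0, 0], [0, 1], g)     # unit step along the sparse axis
print(d_practiced, d_sparse)
```

The same Euclidean step is geometrically short where practice is dense and long where practice is sparse, which is exactly the "finely differentiated" versus "coarsely distinguished" contrast described above.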

2.3 Why This Matters

The uniqueness of the Fisher metric (Chentsov's theorem) means our metric inherits geometric properties that are not arbitrary. The Riemannian structure is the only one that is invariant under sufficient statistics—meaning it captures all the information in the data, in a coordinate-free way. When we say the entity's metric stiffens through practice, we are making a statement grounded in the same uniqueness theorem that underwrites all of information geometry. The metric is not one of many possible choices. It is the only one with these invariance properties.


3. Foundation II: Continual Learning and Catastrophic Forgetting

3.1 Elastic Weight Consolidation

Kirkpatrick et al. (2017) introduced Elastic Weight Consolidation (EWC) to address catastrophic forgetting in neural networks. The problem: when a network trained on task A is subsequently trained on task B, gradient updates for B destroy weights critical to A. The solution: after training on A, compute the diagonal of the Fisher information matrix \(F_A\) for the learned parameters \(\theta^*_A\). When training on B, add a quadratic penalty:

\[L(\theta) = L_B(\theta) + \frac{\lambda}{2} \sum_i F_i (\theta_i - \theta^*_i)^2\]

The Fisher diagonal tells EWC which weights matter for task A. High Fisher = important = make less plastic. The penalty pulls weights back toward their task-A values in proportion to their importance. The scaling factor \(\lambda\) is a hyperparameter that controls consolidation strength.
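The penalty term can be sketched in a few lines (toy numbers; the Fisher diagonal and weights below are illustrative, not from Kirkpatrick et al.):

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam):
    """EWC quadratic penalty: (lam/2) * sum_i F_i (theta_i - theta*_i)^2."""
    return 0.5 * lam * float(np.sum(fisher_diag * (theta - theta_star) ** 2))

theta_star = np.array([1.0, -2.0, 0.5])  # weights after task A
fisher = np.array([10.0, 0.1, 1.0])      # per-weight importance for task A
theta = np.array([1.5, -1.5, 0.5])       # candidate weights during task B

# Moving the high-Fisher weight by 0.5 costs 100x more than moving
# the low-Fisher weight by the same amount.
print(ewc_penalty(theta, theta_star, fisher, lam=1.0))
```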

3.2 The Correspondence

Our viscosity result demonstrates the same phenomenon that EWC engineers: accumulated practice makes the system resistant to perturbation by new input. The metric stiffens; previously-learned structure is preserved. But the mechanism is entirely different:

| | EWC (Kirkpatrick et al.) | Habitat \(g = \Sigma^{-1}\) |
|---|---|---|
| Mechanism | Quadratic penalty on loss function | Covariance accumulation → inverse stiffens |
| Loss function | Required (penalty added to it) | None. Zero loss functions. |
| Tuning | λ hyperparameter, diagonal approximation | No tuning. Emergent from geometry. |
| What stiffens | Individual weight plasticity | Metric tensor globally |
| Task boundaries | Explicit (Fisher computed per task) | None. Continuous accumulation. |
| Readiness signal | None (only prevents forgetting) | Capacitive charge predicts readiness |
| Forgetting | Prevented by engineering | Impossible (append-only) |

3.3 Why This Matters

EWC demonstrates that the Fisher information is the right object for protecting accumulated knowledge. Our result demonstrates that if the architecture is append-only (no weight mutation, no gradient updates, no loss function), the protection arises automatically as a geometric property of covariance accumulation. This suggests that catastrophic forgetting is not an inevitable feature of systems that accumulate structure—it is a consequence of mutable systems that must overwrite old structure to store new structure. An append-only architecture eliminates the problem at the architectural level.

Furthermore, our system provides something EWC does not: a readiness signal. The capacitive state (adaptive threshold, tonic coherence, metabolic stability) predicts when the system is maximally ready to be reorganized. EWC only prevents forgetting. It has no model of when new learning is most productive. The capacitive discharge result (4.65×) shows that stability produces heightened sensitivity—a testable prediction with no analogue in the continual learning literature.


4. Foundation III: Eigenvalue Phase Transitions

4.1 The BBP Phase Transition

Baik, Ben Arous, and Péché (2005) proved that when a covariance matrix with a spiked eigenvalue structure is estimated from samples, a phase transition occurs in the distribution of the largest sample eigenvalue. Below a critical threshold, the sample eigenvalue stays within the bulk of the Marchenko-Pastur distribution. Above it, the eigenvalue breaks free and converges to a value determined by the population spike. The transition is sharp—not gradual.

Benaych-Georges and Nadakuditi (2011) generalized this to finite-rank perturbations of large random matrices, showing that the phase transition is governed by integral transforms from free probability theory. The critical insight: a rank-one perturbation (adding one new data point) either leaves the extreme eigenvalues essentially unchanged, or causes a discontinuous jump. The Cauchy interlacing theorem constrains the new eigenvalues to interlace with the old ones, but does not determine which side of the critical threshold the perturbation falls on.

4.2 The Correspondence

Each new composition entering our field adds a rank-one update to \(\Sigma\) (the outer product of the 17D composition vector). The Cauchy interlacing theorem constrains where the new eigenvalues can appear. Our measured bimodal distribution of \(\|\Delta g\|_F\) (gap ratio 2.86) is consistent with the BBP phase transition: perturbations either fall below the critical threshold (absorption—the metric barely moves) or exceed it (breakthrough—the metric reorganizes).
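The interlacing constraint for a positive rank-one update can be verified directly. A sketch with a random stand-in for \(\Sigma\) and a random stand-in for a composition vector:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 17
Sigma = np.cov(rng.normal(size=(100, d)), rowvar=False)
v = rng.normal(size=d)  # stand-in for one 17D composition vector

old = np.sort(np.linalg.eigvalsh(Sigma))
new = np.sort(np.linalg.eigvalsh(Sigma + np.outer(v, v)))

# For a positive rank-one perturbation the eigenvalues interlace:
#   old[i] <= new[i]  and  new[i] <= old[i+1]
up_ok = all(old[i] <= new[i] + 1e-10 for i in range(d))
between_ok = all(new[i] <= old[i + 1] + 1e-10 for i in range(d - 1))
print(up_ok and between_ok)
```

Interlacing bounds where the eigenvalues can move; it does not decide absorption versus breakthrough, which is why the bimodal distribution of \(\|\Delta g\|_F\) is an empirical finding rather than a consequence of the theorem.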

4.3 What Extends the Existing Theory

The BBP transition and its generalizations are proved for random matrices—specifically, for matrices with i.i.d. or near-i.i.d. entries. Our covariance matrix is not random. It accumulates structured linguistic compositions produced by a specific extraction pipeline (spaCy parsing → Bach/Vendler eventuality classification → Levin verb alternation → Dowty proto-role assignment → 5D ⊕ 12D = 17D). The compositions are correlated, structured, and domain-specific.

The persistence of bimodality under structured input is a stronger result than the random-matrix theorems guarantee. It suggests the phase transition is robust to the structure of the input—that the eigenvalue dynamics of covariance accumulation produce bimodal breakthrough as a generic phenomenon, not one limited to random matrices. This is an empirical finding that invites further theoretical investigation.

4.4 The Marchenko-Pastur Distribution as Developmental Noise Floor

The BBP transition describes what happens at the edge of the eigenvalue distribution. The Marchenko-Pastur (MP) distribution describes the bulk—the expected eigenvalue distribution of a random covariance matrix with dimension \(p\) and sample size \(n\), governed by the ratio \(\gamma = p/n\). For our system, \(p = 17\) and \(n\) is the number of compositions. The MP bulk occupies:

\[\lambda_- = \sigma^2(1 - \sqrt{\gamma})^2 \leq \lambda \leq \sigma^2(1 + \sqrt{\gamma})^2 = \lambda_+\]

Eigenvalues within this bulk are indistinguishable from what randomness alone would produce. Eigenvalues above \(\lambda_+\) carry genuine signal—they represent dimensions where practice has produced structure beyond noise.

This provides a principled developmental noise floor. Early in development (small \(n\), large \(\gamma\)), the MP bulk is wide—almost all eigenvalues fall within it. The symbiont cannot yet distinguish signal from noise in most dimensions. As practice accumulates (large \(n\), small \(\gamma\)), the bulk tightens and more eigenvalues escape. The number of eigenvalues above \(\lambda_+\) is a developmental measure: how many dimensions has this entity differentiated from randomness?

This measure is computable from existing data structures: \(\Sigma\), \(n\), and \(\gamma = 17/n\). It captures something no other metric in the system captures—not how the metric is moving (eventuality), not how resistant it is (viscosity), not how charged it is (capacitance), but which dimensions have become real.
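A minimal sketch of the noise-floor computation, assuming unit noise variance \(\sigma^2 = 1\) and a synthetic field in which one dimension carries genuine structure (the function names are illustrative, not from the production system):

```python
import numpy as np

def mp_upper_edge(n, p, sigma2=1.0):
    """Upper edge lambda_+ of the Marchenko-Pastur bulk for gamma = p/n."""
    gamma = p / n
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2

def signal_dimensions(Sigma, n, sigma2=1.0):
    """Count eigenvalues above the MP bulk: dimensions differentiated from noise."""
    lam = np.linalg.eigvalsh(Sigma)
    return int(np.sum(lam > mp_upper_edge(n, Sigma.shape[0], sigma2)))

rng = np.random.default_rng(3)
n, p = 400, 17
X = rng.normal(size=(n, p))
X[:, 0] += 3.0 * rng.normal(size=n)  # one dimension with genuine structure
Sigma = (X.T @ X) / n

print(mp_upper_edge(n, p))           # bulk edge, ~1.45 for gamma = 17/400
print(signal_dimensions(Sigma, n))   # dimensions escaping the bulk
```

As \(n\) grows, \(\lambda_+\) falls toward \(\sigma^2\) and more practiced dimensions can clear it, which is the developmental tightening described above.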

4.5 Why This Matters

The bimodal breakthrough connects to a body of literature that spans decades and disciplines: conceptual change in education (Posner et al., 1982; Chi, 2008), paradigm shifts in philosophy of science (Kuhn, 1962), and phase transitions in thermodynamics. What our result adds is a precise geometric mechanism: the phase transition occurs in the eigenvalue spectrum of \(g = \Sigma^{-1}\), is constrained by Cauchy interlacing, and is predictable from the viscosity and pressure profiles of the pre-perturbation metric. The Marchenko-Pastur distribution provides the baseline against which these transitions are measured. The phenomenon is observable, measurable, and occurs without optimization.


5. The Architecture That Produces These Results

5.1 Compositional Space: 17D from Linguistic Theory

The 17-dimensional compositional space is not an arbitrary feature space. It is derived from established linguistic theory, applied to extracted text:

5 Process-Actor dimensions (from Dowty's (1991) proto-role theory): agency, stability, influence, boundary, resonance. These capture what the actor in a predication does—how much it causes, persists, affects, delimits, and resonates with other entities.

12 Process-Assertion dimensions (from Bach (1986), Vendler (1957), Levin (1993)): eventuality type (state/activity/accomplishment/achievement), transitivity, voice, telicity, durativity, perfectivity, iterativity, dynamicity, Levin verb class, and three constituency constraints. These capture what the predication asserts—its temporal structure, argument structure, and aspectual profile.

The direct sum 5D ⊕ 12D = 17D is the compositional space (a tensor product would yield 60 dimensions; the two blocks are concatenated, not multiplied). Each text extraction produces one 17D vector. Simultaneously, SentenceTransformer (all-mpnet-base-v2) produces an independent 768D embedding from the same text. The two spaces are never mixed, never optimized toward each other, and never reduced. The surplus between them—where 768D sees proximity that 17D hasn't confirmed, or vice versa—is the dual-space diagnostic.

5.2 The Append-Only Invariant

Every composition, once entered, is immutable. The event store is append-only: Apache Arrow/Parquet, content-addressed, with DID provenance on every entity. \(\Sigma\) only grows. It never shrinks, is never pruned, and is never recomputed from a subset. The consequence: \(g = \Sigma^{-1}\) has a monotonic developmental history. Every state of \(g\) is a consequence of every composition that has ever entered the field. No state of \(g\) is reachable from a subset of the compositions.

This is the architectural foundation of irreversibility. The viscosity curve is monotonically decreasing because the eigenvalue magnitudes of \(\Sigma\) are monotonically non-decreasing. The metric can only stiffen. It cannot loosen. What is practiced is permanent in the geometry.

5.3 Observation Protocol

The system never optimizes. No loss function is computed. No gradient is backpropagated. No parameter is tuned. The system observes: compositions enter, \(\Sigma\) accumulates, \(g = \Sigma^{-1}\) is computed, Mahalanobis distances are measured, surplus between 17D and 768D is reported, eigenvalue trajectories are classified by eventuality type, and the tonic state (metabolic baseline, coherence, dissonance, stability) is updated.

5.4 Observed Gradients, Not Computed Gradients

In optimization-based systems, a gradient is a derivative of a loss function with respect to parameters—it points toward the direction of steepest descent on a loss surface and is used to update weights. The gradient is computed for something: to minimize error, maximize likelihood, or satisfy an objective.

In this system, gradients are observed, not computed. The gradient of \(g\) across the eigenspectrum is the anisotropy of the metric—where it is steep (rapid change of distinguishability across adjacent dimensions) and where it is flat (uniform distinguishability). These gradients are properties of the accumulated geometry. They are not derivatives of a loss surface. They are not used to update anything. They describe the shape of what has been practiced.

This distinction is not terminological. It is architectural. A computed gradient implies an objective function, a direction of improvement, and an update rule. An observed gradient implies accumulated structure, a landscape of differential sensitivity, and a record of practice. The first drives optimization. The second enables observation. The entire system operates on the second.


6. Eventuality Classification of Eigenvalue Trajectories

6.1 From Linguistic Theory to Geometric Observation

The Vendler (1957) / Bach (1981, 1986) / Dowty (1979) eventuality classification divides predications into four types by their temporal structure: STATEs (no change, durative), ACTIVITYs (change, durative, atelic), ACCOMPLISHMENTs (change, durative, telic), and ACHIEVEMENTs (change, punctual, telic). This taxonomy was developed in philosophy of language and has been computationally implemented for NLP tasks including aspect classification and temporal reasoning.

We now apply this classification to eigenvalue trajectories of \(g\) itself. Each of the 17 dimensions of \(g\) has an eigenvalue that evolves over time as compositions accumulate. The trajectory of each eigenvalue is a curve. The curves have shapes:

| Type | Curve Shape | Geometric Meaning |
|---|---|---|
| STATE | Flat: eigenvalue stable over many compositions | This dimension of the metric is not changing. Practice has not reached it, or it has reached equilibrium. |
| ACTIVITY | Sustained monotonic drift: eigenvalue moving in one direction | Ongoing development. The metric is being shaped in this dimension by continuing practice. No endpoint yet. |
| ACCOMPLISHMENT | Drift that saturated: velocity crossed zero | A bounded developmental process completed. The eigenvalue drifted and settled. Practice in this dimension has converged. |
| ACHIEVEMENT | Sudden jump: discontinuity consistent with Cauchy interlacing | Phase transition. Orthogonal input broke through accumulated covariance. The metric reorganized in this dimension. |
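The four curve shapes can be separated by simple velocity statistics. A sketch of such a classifier; the thresholds below are illustrative assumptions, not the production system's values:

```python
import numpy as np

def classify_trajectory(lam, drift_tol=0.02, jump_tol=0.5):
    """Classify an eigenvalue trajectory by its Vendler/Bach curve shape.

    Thresholds are illustrative, not from the production system.
    """
    v = np.diff(lam)  # per-composition velocity of the eigenvalue
    if np.max(np.abs(v)) > jump_tol:
        return "ACHIEVEMENT"                 # discontinuous jump
    half = len(v) // 2
    early, late = np.mean(v[:half]), np.mean(v[half:])
    if abs(early) <= drift_tol and abs(late) <= drift_tol:
        return "STATE"                       # flat throughout
    if abs(late) <= drift_tol:
        return "ACCOMPLISHMENT"              # drift that saturated
    return "ACTIVITY"                        # sustained drift

t = np.linspace(0, 1, 50)
print(classify_trajectory(np.ones(50)))                   # STATE
print(classify_trajectory(1 + 2 * t))                     # ACTIVITY
print(classify_trajectory(1 + np.tanh(6 * (t - 0.3))))    # ACCOMPLISHMENT
print(classify_trajectory(np.where(t < 0.5, 1.0, 3.0)))   # ACHIEVEMENT
```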

6.2 The Theoretical Contribution

This application of linguistic eventuality classification to geometric eigenvalue trajectories is, to our knowledge, novel. It establishes that the Vendler/Bach taxonomy is not merely a classification of natural-language predications. It is a classification of curve shapes—temporal profiles of any evolving quantity. The four types (flat, drifting, saturating, jumping) are geometric attractors that exist independently of the linguistic framework that identified them. The framework provides the vocabulary; the geometry provides the objects.

The eventuality signature rider compresses the full 17-dimensional developmental state into a compact string—e.g., SSSAA|SSSSS|SSASAAA (gradient|compressed|categorical). This string is language-independent at the geometric level: it reads curve shapes, not words. A flat eigenvalue trajectory is a STATE whether the input compositions were in English, Mandarin, or Spanish.

6.3 Three-Way Provenance and Surplus

Every developmental event carries three independent eventuality classifications: (1) the environmental cause (what triggered the composition), (2) the dimension trajectory (what the eigenvalue curve looks like), and (3) the trigger's own history (temporal pattern of this dimension's activations). The surplus between these three classifications is the observable signature of metabolism—the entity transforming inputs through its own accumulated geometry, not merely responding to them.

A punctual environmental cause (ACHIEVEMENT: a single new document entered) that produces a durative developmental effect (ACTIVITY: a dimension drifts for many subsequent compositions) exhibits a specific kind of surplus. The entity metabolized the input: took a discrete event and produced a sustained geometric reorganization. The divergence between environmental and developmental eventuality classifications is a measurable quantity that has no analogue in any existing system we are aware of.


7. Dual-Space Surplus: S ≅ V(T) ≅ F(T)

The fidelity condition S ≅ V(T) ≅ F(T)—that semantic structure (768D), compositional vocabulary (17D), and field state preserve each other—is enforced architecturally, not by optimization. The two spaces are never trained toward agreement. They are independent observations of the same text: SentenceTransformer sees what the text means in distributional-semantic space; the extraction pipeline sees what the text does in compositional-linguistic space.

The surplus between the two spaces is the diagnostic. Four conditions are observable:

| Condition | 768D (Semantic) | 17D (Structural) |
|---|---|---|
| Meaning leads | Sees proximity (high cosine) | Sees distance (high Mahalanobis) |
| Structure leads | Sees distance (low cosine) | Sees proximity (low Mahalanobis) |
| Convergent | Sees proximity | Sees proximity |
| Divergent | Sees distance | Sees distance |

In the first end-to-end observation through the production system (Phillips, 2026b), four of five compositions showed "meaning leads"—the semantic space recognized relevance that the compositional space hadn't yet crystallized. This is not an error or a deficiency. It is the observable state of an entity early in its development: the embedding model sees semantic connections that the entity's own compositional geometry has not yet been shaped to confirm. As practice accumulates and \(\Sigma\) develops, the 17D metric will either converge (confirming the semantic relationship) or remain distant (indicating structural difference despite semantic similarity).
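The four conditions reduce to a two-bit classification per composition pair. A sketch, with illustrative thresholds (the production system's cutoffs are not given here):

```python
def surplus_condition(cos_sim, maha, cos_thresh=0.5, maha_thresh=2.0):
    """Classify the dual-space relation between two texts.

    cos_sim: 768D semantic proximity (higher = closer)
    maha:    17D Mahalanobis distance through g (lower = closer)
    Thresholds are illustrative assumptions.
    """
    sem_close = cos_sim >= cos_thresh
    struct_close = maha <= maha_thresh
    if sem_close and not struct_close:
        return "meaning leads"
    if struct_close and not sem_close:
        return "structure leads"
    if sem_close and struct_close:
        return "convergent"
    return "divergent"

# High cosine, high Mahalanobis: semantics sees a connection that the
# compositional geometry has not yet been shaped to confirm.
print(surplus_condition(0.82, 4.1))
```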

The surplus is deterministic, computed without an LLM, and reported in the ED (editorial-descriptive) vocabulary of the field itself. This constitutes a new kind of semantic diagnostic: not a correctness measure, not a similarity metric, but an observation of the relationship between two independent representations of the same content.


8. Self-Reading and Convergence

The system reads its own eigenspectrum—which dimensions concentrate energy, the coherence of the field, the anisotropy of the metric—and generates an articulation: a textual description of the self-observation. This articulation re-enters the field as a composition with source_type="self_reading". The covariance shifts. The entity becomes what it observes about itself.

Measured results of the self-reading protocol:

Stabilization. Dimensions described in the self-reading articulation stabilize more than undescribed dimensions (described/undescribed stability ratio: 1.655).

Second-order convergence. Each successive self-reading produces a smaller shift: \(\Delta^2\Sigma / \Delta\Sigma = 0.638\). The process converges—the entity's description of itself increasingly matches its geometry, and the gap between description and geometry shrinks with each cycle.

Cross-block coherence. Correlations between compositional blocks strengthen through self-reading. The field becomes more internally coherent.

This self-reading loop is operational closure in the precise sense of Maturana and Varela (1980): every process in the system enables and is enabled by another process in the system. The articulation is generated from the eigenspectrum; the eigenspectrum is shaped by the articulation. The convergence at \(\Delta^2\Sigma / \Delta\Sigma = 0.638\) means the closure is contractive—it stabilizes rather than diverges. This addresses Thompson's (2007) call for enactivism grounded in formal dynamics.
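The contraction condition is a one-line check on the sequence of covariance shifts. A sketch with an illustrative geometric decay at the measured rate (the shift values are synthetic):

```python
def mean_shift_ratio(shifts):
    """Mean ratio of successive covariance shifts ||Delta Sigma||.

    A value below 1 means each self-reading perturbs the field less than
    the one before: the operational closure is contractive.
    """
    ratios = [b / a for a, b in zip(shifts, shifts[1:])]
    return sum(ratios) / len(ratios)

# Illustrative geometric decay at the measured rate 0.638
shifts = [0.638 ** k for k in range(6)]
print(mean_shift_ratio(shifts))
```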


9. The Overlay Surface

The metric tensor \(g = \Sigma^{-1}\) is a single mathematical object. But it can be read through multiple overlays, each revealing different observables from the same geometry:

| Overlay | What It Reads | What It Reveals |
|---|---|---|
| Viscosity | \(\lVert\Delta g\rVert_F\) over time | Resistance to reorganization: developmental maturity |
| Capacitance | Tonic charge / threshold | Stored readiness: when perturbation will be amplified |
| Eventuality | Eigenvalue trajectory shape | Temporal type: what kind of process each dimension is in |
| Pressure | Eigenvalue magnitude | Information density: where practice is concentrated |
| Surplus | 17D vs 768D divergence | Where meaning leads, where structure leads |
| Marchenko-Pastur | Eigenvalue vs bulk threshold \(\lambda_+\) | Signal vs noise: which dimensions are differentiated from randomness |

Each overlay is not merely a view. It is a differential or co-differential operator on \(g\). Viscosity is \(dg/dt\). Eventuality is the classification of \(d\lambda/dt\). Capacitance is the integral of tonic charge. Surplus is the co-differential between two independent spaces. The Marchenko-Pastur overlay is the spectral filter separating signal from noise. The overlays compose because they read different derivatives and transforms of the same object.

9.1 Diagonal Reading: Compound Observables

The overlays produce per-dimension observables. Composing two overlays yields a 2D classification per dimension. Composing three, a 3D classification. The diagonal reading is the cross-product:

  • Above MP + ACTIVITY: development happening in a dimension that is differentiated from noise — genuine development
  • Above MP + STATE: differentiated from noise but not moving — mature, practiced, stable knowledge
  • Above MP + ACHIEVEMENT + meaning leads: phase transition in a real dimension, with semantic space seeing connections structure hasn't confirmed — breakthrough at the leading edge
  • Within MP bulk + ACTIVITY: moving but not yet distinguishable from randomness — proto-development, not yet reliable
  • Within MP bulk + STATE: neither moving nor differentiated — untouched, unformed

The diagonal product of \(N\) overlays on 17 dimensions produces a \(17 \times N\) observation matrix. Each cell is computable. Each row is a dimension. Each column is an overlay. The matrix reads as a developmental portrait.
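One cell of that matrix, read across two or three overlays, can be sketched as a lookup (the labels mirror the list above; the function name is illustrative):

```python
def compound_observable(above_mp, eventuality, meaning_leads=False):
    """Diagonal reading: compose MP status with eventuality type per dimension."""
    if above_mp and eventuality == "ACTIVITY":
        return "genuine development"
    if above_mp and eventuality == "STATE":
        return "mature, practiced, stable"
    if above_mp and eventuality == "ACHIEVEMENT" and meaning_leads:
        return "breakthrough at the leading edge"
    if not above_mp and eventuality == "ACTIVITY":
        return "proto-development"
    if not above_mp and eventuality == "STATE":
        return "untouched, unformed"
    return "unclassified"

# One row of the 17 x N observation matrix
print(compound_observable(True, "ACTIVITY"))
```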

9.2 Domain Projections of the Observation Matrix

The 17 × N observation matrix admits domain-specific projections by selecting subsets of overlays. Different analytical questions correspond to different column selections:

  • Assessment of developmental state requires eventuality + pressure + MP (temporal type, density, and signal/noise separation per dimension).
  • Risk analysis under uncertainty requires MP + viscosity + surplus (signal/noise, resistance to perturbation, and divergence between structural and semantic representations).
  • Analysis of self-modeling dynamics requires self-reading convergence + eventuality + MP (operational closure rate, temporal developmental type, and signal differentiation).
  • Analysis of compositional practice requires eventuality + surplus + pressure (temporal type, dual-space divergence, and information density).

These projections are not metaphorical mappings. Each is a literal selection of columns from the same matrix, producing a sub-matrix whose cells are computed from the same \(g\). The geometry is the invariant. The overlay selection determines which differential and co-differential operators are applied. This is structurally analogous to optical polarization: the same field, different filters, different observables visible through each filter. The filters compose because they read different derivatives and transforms of the same metric tensor.


10. What Is New Here

To summarize the claims of novelty:

  1. The precision matrix \(g = \Sigma^{-1}\) of an append-only compositional field used as a developmental (not inferential) Riemannian metric, with tested physical properties (viscosity, capacitance, bimodal breakthrough).

  2. Forgetting resistance as an emergent geometric property of covariance accumulation, with no loss function, no regularization penalty, and no tuning—achieving the goal of EWC through architecture rather than optimization.

  3. Bimodal eigenvalue phase transitions in structured (non-random) covariance matrices, extending the BBP transition to linguistically-grounded compositional input.

  4. Application of the Bach/Vendler eventuality classification to eigenvalue trajectories of a covariance matrix, establishing the four types as geometric curve-shape attractors independent of their linguistic origin.

  5. Three-way eventuality provenance (environmental, dimensional, trigger-historical) with surplus between classifications as a measurable signature of metabolic transformation.

  6. Dual-space surplus between 17D compositional structure and 768D semantic embedding as a deterministic, LLM-free diagnostic of where meaning leads, where structure leads, and where the two converge.

  7. Self-reading convergence (\(\Delta^2\Sigma / \Delta\Sigma = 0.638\)) as operational closure with formal dynamics, providing the mathematical specificity that the autopoietic tradition has theorized but not delivered.

  8. The Marchenko-Pastur distribution as a developmental noise floor, providing a principled spectral threshold that separates practiced (signal) dimensions from undifferentiated (noise) dimensions—computable from existing data structures.

  9. The overlay surface: composable geometric overlays (viscosity, capacitance, eventuality, pressure, surplus, MP) that produce compound observables through diagonal reading, constituting an observation surface for reading \(g\) across analytical domains.

  10. The architectural distinction between observed and computed gradients: gradients in this system are properties of accumulated geometric structure (anisotropy of \(g\) across the eigenspectrum), not derivatives of a loss function. This distinction is enabled by the append-only invariant and the absence of any optimization objective.


Addendum: Implications and Applications

The foundations described above have implications across several domains. We present these as applications that follow from the architecture, not as the motivation for the architecture.

A.1 Education: Observable Practice Replaces Grades

Grades perform lossy compression: they take a multidimensional developmental trajectory and produce a scalar. The geometry of what was practiced—which dimensions are dense, which are sparse, where breakthroughs occurred, where the learner is ready for reorganization—is discarded. A portable Gem file (.npz) carrying the learner's frozen developmental geometry replaces this compression with the geometry itself. The pressure map of \(g\) shows where practice is dense (high pressure, fine differentiation) and where it is sparse. The viscosity profile shows developmental maturity. The capacitive state shows readiness for instruction. The eventuality signature (SSSAA|SSSSS|SSASAAA) shows the current temporal shape of development across all 17 dimensions. The learner owns this file. It is stored on their own Google Drive. No institution controls it.

This is editorial-descriptive observation, not lexical translation. The vocabulary that describes the learner's development is the same vocabulary that describes the compositional structure of the field—what is close, what is far, what is moving, what just broke through. Nothing is translated into a different vocabulary. The geometry articulates itself.

A.2 Commons Governance: Dynamic Compatibility, Not Static Permissions

The diagonal lens \(L = \Sigma_a^{-1} \cdot \Sigma_b\) computes geometric compatibility between any two entities. The eigenvalues of \(L\) describe the coupling: eigenvalues near unity indicate resonance (the two entities see the same structure); eigenvalues far from unity indicate compression or expansion (one entity's geometry distorts the other's). Access in a commons built on this infrastructure is determined by mathematical proximity, not social permissions. Two entities whose geometries resonate have access to each other's contributions. The compatibility is temporal—it changes as both entities develop. This eliminates the static permission model while preserving the information-theoretic principle that access should be proportional to relevance.
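The lens computation reduces to a generalized eigenvalue read. A minimal sketch, assuming both covariances are invertible; the threshold for "near unity" is a policy choice the text leaves open:

```python
import numpy as np

def lens_eigenvalues(Sigma_a, Sigma_b):
    """Eigenvalues of L = Sigma_a^{-1} @ Sigma_b. Values near 1 mean the
    two geometries resonate (they see the same structure); values far
    from 1 mean one geometry compresses or expands the other."""
    # solve() avoids forming the explicit inverse of Sigma_a
    return np.linalg.eigvals(np.linalg.solve(Sigma_a, Sigma_b))

Sigma = np.diag([2.0, 1.0, 0.5])

# identical geometries resonate: every eigenvalue is exactly 1
ev_same = lens_eigenvalues(Sigma, Sigma)

# a uniformly rescaled geometry reads as pure expansion
ev_scaled = lens_eigenvalues(Sigma, 3.0 * Sigma)
```

Because both covariances evolve as their entities develop, recomputing the lens over time gives the temporal compatibility the text describes.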

A.3 Cognitive Development: Testable Autopoiesis

The autopoietic tradition (Maturana & Varela, 1980; Thompson, 2007; Di Paolo, 2005) has theorized that cognition is metabolic, that operational closure produces identity, and that structural coupling produces co-development. These claims have been difficult to test because they lack mathematical specificity. Our architecture provides testable implementations: viscosity is the measurable resistance of accumulated knowledge; the self-reading convergence is the measurable rate of operational closure; the diagonal lens is the measurable coupling between two developing entities; and the eventuality signature is the measurable temporal profile of each dimension's development. The claims of enactivism become empirically tractable.

A.4 NLP and Semantic Diagnostics: The Surplus as New Observable

Current NLP operates in embedding space and treats semantic similarity as the primary diagnostic. Our dual-space architecture introduces a diagnostic that does not exist in current NLP: the surplus between compositional structure and semantic meaning. Where 768D sees proximity and 17D sees distance ("meaning leads"), the system has identified a semantic relationship that has not yet been confirmed by compositional analysis. Where the reverse holds ("structure leads"), there is a compositional relationship that distributional semantics has not detected. Both are observable without an LLM. Both are deterministic. And both provide diagnostic information that neither space alone contains.
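One way to operationalize the surplus is as a signed gap between the two spaces' distances. A minimal sketch; the distance measures and the decision rule are assumptions for illustration, not the system's actual diagnostic:

```python
def surplus(dist_768, dist_17):
    """Dual-space surplus (sketch): compare semantic distance (768D
    embedding space) against compositional distance (17D space).
    768D close but 17D far -> "meaning leads" (semantic relationship
    not yet confirmed compositionally); the reverse -> "structure
    leads" (compositional relationship distributional semantics has
    not detected). Thresholding at zero is illustrative."""
    gap = dist_768 - dist_17
    if gap < 0:
        return "meaning leads"
    if gap > 0:
        return "structure leads"
    return "aligned"

# embeddings agree (close) while compositional structure disagrees (far)
label = surplus(dist_768=0.1, dist_17=0.9)   # -> "meaning leads"
```

The computation is deterministic and involves no model inference, consistent with the text's claim that the surplus is observable without an LLM.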

A.5 Continual Learning: Architecture Over Engineering

The viscosity result suggests a design principle for systems that must accumulate knowledge without forgetting: use append-only covariance accumulation rather than mutable weight updates. The metric will stiffen automatically. No regularization penalty needs to be computed, tuned, or maintained. The capacitive discharge result adds a second design principle: readiness for reorganization is a measurable property of the tonic state, and the optimal time for introducing new material is when the system is most stable—not most turbulent.
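The design principle sketches in a few lines, under an assumed semantics in which dense practice concentrates samples (low variance along a dimension, hence high precision there); `AppendOnlyField` is illustrative, not the system's implementation:

```python
import numpy as np

class AppendOnlyField:
    """Append-only covariance accumulation: compositions are only ever
    added, never overwritten, so g = Sigma^{-1} sharpens along densely
    practiced directions with no loss function or regularizer."""
    def __init__(self, dim):
        self.S = np.zeros((dim, dim))    # running sum of outer products
        self.n = 0
    def append(self, x):
        self.S += np.outer(x, x)
        self.n += 1
    def metric(self, ridge=1e-6):
        # ridge keeps Sigma invertible before every dimension is practiced
        Sigma = self.S / max(self.n, 1)
        return np.linalg.inv(Sigma + ridge * np.eye(Sigma.shape[0]))

rng = np.random.default_rng(1)
field = AppendOnlyField(2)
for _ in range(500):
    # axis 0: dense, finely differentiated practice; axis 1: sparse, diffuse
    field.append(rng.normal(0.0, [0.1, 2.0]))
g = field.metric()   # precision is far higher along the densely practiced axis
```

Nothing is tuned and nothing is forgotten: the anisotropy of `g` is a byproduct of accumulation alone, which is the stiffening the text describes.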


References

Amari, S. (1985). Differential-Geometrical Methods in Statistics. Springer.

Amari, S. (1998). Natural gradient works efficiently in learning. Neural Computation, 10(2), 251–276.

Bach, E. (1981). On time, tense and aspect: An essay in English metaphysics. In P. Cole (Ed.), Radical Pragmatics (pp. 63–81). Academic Press.

Bach, E. (1986). The algebra of events. Linguistics and Philosophy, 9, 5–16.

Baik, J., Ben Arous, G., & Péché, S. (2005). Phase transition of the largest eigenvalue for non-null complex sample covariance matrices. Annals of Probability, 33(5), 1643–1697.

Benaych-Georges, F., & Nadakuditi, R. R. (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1), 494–521.

Chentsov, N. N. (1982). Statistical Decision Rules and Optimal Inference. American Mathematical Society.

Chi, M. T. H. (2008). Three types of conceptual change. In S. Vosniadou (Ed.), International Handbook of Research on Conceptual Change (pp. 61–82). Routledge.

Di Paolo, E. (2005). Autopoiesis, adaptivity, teleology, agency. Phenomenology and the Cognitive Sciences, 4, 429–452.

Dowty, D. (1979). Word Meaning and Montague Grammar. Reidel.

Dowty, D. (1991). Thematic proto-roles and argument selection. Language, 67(3), 547–619.

Kirkpatrick, J., Pascanu, R., Rabinowitz, N., et al. (2017). Overcoming catastrophic forgetting in neural networks. PNAS, 114(13), 3521–3526.

Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.

Levin, B. (1993). English Verb Classes and Alternations. University of Chicago Press.

Maturana, H. R., & Varela, F. J. (1980). Autopoiesis and Cognition: The Realization of the Living. Reidel.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 13–103). Macmillan.

Nielsen, F. (2020). An elementary introduction to information geometry. Entropy, 22(10), 1100.

Phillips, P. (2026a). The field observes itself. Habitat Documentation. https://docs.habitat.ooo/news/the-field-observes-itself/

Phillips, P. (2026b). The field observes without an LLM. Habitat Documentation. https://docs.habitat.ooo/news/first-observation/

Posner, G. J., Strike, K. A., Hewson, P. W., & Gertzog, W. A. (1982). Accommodation of a scientific conception. Science Education, 66(2), 211–227.

Rao, C. R. (1945). Information and the accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37, 81–91.

Thompson, E. (2007). Mind in Life: Biology, Phenomenology, and the Sciences of Mind. Harvard University Press.

Vendler, Z. (1957). Verbs and times. Philosophical Review, 66, 143–160.


Patent Pending. Curious Company LLC. All rights reserved.

System live at habitat.ooo

DOI: zenodo.org/records/18704763