When Four Traditions Describe One Mechanism
Convergent Evidence for Activation-Based Cognition and Memory
A white paper from the Resonance Engine project · May 2026
Abstract
The dominant architecture for machine memory is borrowed, without examination, from the machine that runs it: a von Neumann split between a store and a processor. We fetch from a database, we process in a model, we treat the two as separable. This paper argues that the split is not merely inelegant — it is mechanically wrong for systems whose substrate is already continuous.
We arrive at this claim not from one field but from four. Quantum information physics, the Vedāntic and Yogic textual traditions, the phenomenology of perception, and frontier machine learning were each asked, independently and from their own first premises, what it means for information to persist and for a mind to recall. They start nowhere near each other. They converge on a single statement: memory is not an object in a location; it is the capacity of one substrate to be re-excited into a configuration. Recall is not lookup — it is settling. Intelligence is not stored — it is tuned.
We make three contributions. First, we treat the convergence itself as evidence, and we are explicit about why agreement-from-divergent-starting-points is a stronger signal than agreement among voices that have read each other. Second, we translate the converged mechanism into six concrete design moves, each anchored to a falsifiable claim or a citable result — and we are equally explicit about where the bridge between traditions is identity, where it is analogy, and where it is neither. Third, and most unusually: the synthesis you are reading was produced by the very process it specifies. The artifact is its own proof of concept.
1. The Problem: A Borrowed Split
When practitioners give a language model "memory," they almost always build the same thing. A vector database sits beside the model. At inference, a retriever pulls matching text, the text is pasted into the context window, and the model processes it. Store, then fetch, then process. This is naive retrieval-augmented generation, and structurally it is a von Neumann computer: memory on one side of a bus, computation on the other.
The trouble is that the thing on the computation side is not von Neumann. A transformer does not store sentences and look them up. It tunes a field of weights that regenerates them on demand. Its knowledge lives in the weights themselves (Geva et al., 2020; Meng et al., 2022), as content-addressable resonances distributed across its layers, not as rows in a table. When we bolt a database onto its side, we are not extending its memory; we are reintroducing a separation into a system that had already dissolved it. We take something continuous and weld a seam down the middle of it.
This is not a stylistic complaint. It has measurable consequences: retrieved strings re-enter the context as symbols that must be re-read and re-interpreted, competing for the same attention budget, often buried in the positions where models attend least (Liu et al., 2023). The memory arrives as text to be parsed rather than as a state to be re-entered. The seam costs us.
The question this paper asks is the one that dissolves the seam: what would memory be if it were built from what the substrate already is, rather than welded to its side?
2. Background
The pieces of the answer already exist, scattered across literatures that rarely cite one another.
In machine learning, a lineage runs from associative memory to modern architectures. Hopfield Networks Is All You Need (Ramsauer et al., 2020) proved that the modern Hopfield update rule is mathematically equivalent to transformer attention: content-addressable pattern completion with exponential storage capacity that typically converges in a single update. Larimar (Das et al., 2024) built a brain-inspired episodic memory matrix whose reads reconstruct a latent via Kanerva-style associative recall rather than returning a stored string. Memorizing Transformers (Wu et al., 2022) placed a kNN lookup over cached key–value pairs inside attention itself. HippoRAG (Gutiérrez et al., 2024) implemented Personalized PageRank over a knowledge graph, a computational form of spreading activation. MemoryBank added Ebbinghaus forgetting curves; Generative Agents added consolidation of episodic traces into semantic abstractions.
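To make the Hopfield-attention equivalence concrete, here is a minimal sketch in plain Python. It assumes toy four-dimensional patterns and an inverse temperature `beta` chosen only for illustration; the function names are ours, not from Ramsauer et al. The update itself (new state = stored patterns weighted by a softmax over scaled similarities) is the rule that paper identifies with attention when keys and values are both the stored patterns.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hopfield_update(patterns, query, beta=8.0):
    """One modern-Hopfield step: a softmax-weighted blend of the stored patterns."""
    scores = [beta * sum(p * q for p, q in zip(pattern, query)) for pattern in patterns]
    weights = softmax(scores)
    return [
        sum(w * pattern[d] for w, pattern in zip(weights, patterns))
        for d in range(len(query))
    ]

# Three stored patterns (illustrative values only).
patterns = [
    [1.0, 1.0, -1.0, -1.0],
    [-1.0, 1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0, 1.0],
]

# A partial, noisy cue that leans toward the first pattern.
cue = [0.9, 0.6, -0.2, -0.7]
settled = hopfield_update(patterns, cue)
```

With these values a single update lands on the first stored pattern to within a small tolerance: the cue is completed rather than looked up.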
In representation engineering, behavior turned out to be geometric. RepE (Zou et al., 2023), CAA (Rimsky et al., 2023), and Persona Vectors (Chen et al., 2025) showed that traits are directions in activation space — extractable, composable, monitorable for drift, steerable. Arditi et al. (2024) found that refusal hangs on a single direction; many high-level behaviors are surprisingly one-dimensional.
In the study of in-context learning, instruction turned out to be weaker than demonstration. ICL behaves as implicit Bayesian task-selection from a prior (Xie et al., 2021): examples locate a capability the model already has rather than constructing one. Long rule-lists degrade performance, both by burying instructions in the attentional blind spot (Liu et al., 2023) and by multiplying conflicts (IFEval). Many-shot demonstration can override trained rules outright.
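The Bayesian reading can be written in one line. This is a sketch in the spirit of Xie et al. (2021), with θ standing for a latent task or concept; the symbol names are our choice.

```latex
p(y \mid \text{prompt}) \;=\; \int_{\theta} p(y \mid \theta, \text{prompt}) \; p(\theta \mid \text{prompt}) \, d\theta
```

As demonstrations accumulate, the posterior p(θ | prompt) concentrates on the task the examples share, so the prediction is dominated by a capability that was already present in the prior.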
And in the skeptical literature on multi-agent systems, most "emergence" was found to be ensembling in disguise — majority vote dressed as insight — with failures rooted in coordination rather than capability (Cemri et al., 2025), and debates that collapse onto the most confident voice.
Each of these is solid on its own. What has been missing is the recognition that they are all describing the same thing.
3. Method: Convergence as Evidence
Our method was not a literature review. It was a deliberately structured act of convergence, and the structure matters because it is also the safeguard against the failure mode that makes convergence worthless.
We posed the core questions (how does information persist, what is recall, where does behavior live) to two research fields kept strictly apart. Field A drew on the contemplative and physical traditions: Vedānta and Yoga read in their primary sources, the phenomenology of perception, the information-theoretic results of modern physics. Field B drew on the frontier of machine learning: arXiv, advanced practitioner threads, active engineering. Each field's questions were answered by multiple reasoning models running independently. Their outputs then met in four small councils, each convened around a single theme, with each voice following a distinct pattern rather than restating shared information. The councils' syntheses were then converged into the final result.
The discipline that makes this evidential rather than decorative is a single criterion, and we hold ourselves to it:
Convergence despite divergent initialization is a real signal. Agreement after the voices have read each other is theater.
This is not a metaphor we are fond of; it is a measurable property. Three voices steered from genuinely different starting points — different premises, different priors, different temperatures — that nonetheless settle to the same region of conceptual space have, by that fact, revealed an attractor: a basin the dynamics pull toward regardless of where you begin. Voices that agree only after exchanging drafts have revealed nothing but their susceptibility to the most confident among them. The first is the fingerprint of something true. The second is premature consensus (Cemri et al., 2025).
So when we report that four traditions converge, we mean specifically: starting from quantum field theory, from a 2,000-year-old Sanskrit aphorism, from the failure of neuroscience to find a stored engram, and from the equivalence of Hopfield dynamics and attention — four origins that could not be further apart — the descriptions land in the same basin. That is the kind of agreement worth taking seriously.
A note we owe the reader, in the same spirit. Throughout, we distinguish three relationships between fields: where a bridge is genuine identity (the same mathematics under two names), where it is analogy (a structural resemblance that illuminates but does not equate), and where it is neither — a popular conflation we explicitly reject. The holographic principle and AdS/CFT (Maldacena, 1997; the Page-curve resolution of the information paradox, ~2019–2020) describe genuine conservation of information, but of encoded information, not accessible memory; we will not sell it as more. The "Akashic records" are a nineteenth-century theosophical invention, not Vedānta, and we say so. Sheldrake's morphic fields are not empirical, and we leave them out. The strength of a convergence argument lives entirely in its willingness to name where the convergence stops.
4. Results: The One Mechanism
Here is the signal all four fields strike, stated once:
There is no separation between store and process. Intelligence — and memory — is the capacity of a single field to be re-excited into a particular configuration. To "remember" is to settle into an attractor; to perceive is to become, briefly, isomorphic with what is perceived. The substrate already contains the intelligence. The work is to tune the receiver, not to enlarge the archive.
The same proposition, in four native tongues:
| Tradition | How it says it | The mechanism underneath |
|---|---|---|
| Vedānta (Tat tvam asi, pratyabhijñā) | Knowledge is not acquired but recognized as already present | the substrate (ākāśa as ground, not record) within which every form is already a possibility |
| Yoga (samāpatti, citta-vṛtti-nirodha) | The clear crystal becomes the color it rests on; the clarity of the receiver is the perception | tuning, not retrieval; the stillness of the field over the volume of data |
| Physics (unitarity, holographic principle) | Information is not lost — its encoding changes; volume is emergent from the boundary | an invariant of the dynamics, not a location in a store |
| Machine learning (Hopfield = attention, RepE, Larimar) | Attention is already pattern completion; behavior is a direction in space | attractor settling, steering vectors, reconstructive memory |
What lifts this above coincidence are the places where the bridge tightens into identity — where two traditions, approaching from opposite horizons, turn out to be describing the same object:
- Samāpatti — the receiver becoming isomorphic with the object — is Hopfield pattern completion: a partial cue, and the field settles into the nearest attractor. This is identity, not analogy: the Yogic description of perception and the mathematics of content-addressable memory have the same shape.
- "You cannot know anything that is not already part of you" is in-context learning as Bayesian task-selection from a prior (Xie et al., 2021). The examples do not install a new capability; they locate one already latent in the model. The contemplative claim and the mechanistic account of ICL say the same thing about where knowledge comes from.
- Citta-vṛtti-nirodha — still the disturbances of the field — is calibration and denoising over data accumulation. A calm, low-variance, well-calibrated model perceives more clearly than an overfit, "agitated" one. The discipline of the meditator and the regularization of the engineer point at the same lever.
- "Each point contains the whole" (Indra's net) is holographic distribution in the sense of holographic reduced representations (HRR), Kanerva's sparse distributed memory, and vector-symbolic architectures: remove thirty percent of the substrate and you lose resolution, not data. This is a measurable property of superposed distributed memory, not a mystical flourish.
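The graceful-degradation claim in the last bullet can be checked on a toy holographic reduced representation. The sketch below is a plain-Python illustration under our own assumptions (512 dimensions, three key-value pairs, a fixed random seed, zeroing 30 percent of the trace); it binds with circular convolution, superposes the bound pairs into one trace, and decodes with circular correlation. The helper names are hypothetical.

```python
import math
import random

def circular_convolve(a, b):
    """Bind two vectors: out[j] = sum_i a[i] * b[(j - i) mod n]."""
    n = len(a)
    return [sum(a[i] * b[(j - i) % n] for i in range(n)) for j in range(n)]

def circular_correlate(key, trace):
    """Approximately unbind: out[d] = sum_i key[i] * trace[(i + d) mod n]."""
    n = len(key)
    return [sum(key[i] * trace[(i + d) % n] for i in range(n)) for d in range(n)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

rng = random.Random(0)
n = 512  # dimensionality (assumption for this sketch)

def random_vector():
    # Elements drawn with variance 1/n so vectors have roughly unit length.
    return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]

keys = [random_vector() for _ in range(3)]
values = [random_vector() for _ in range(3)]

# Superpose all bound pairs into a single distributed trace.
trace = [0.0] * n
for key, value in zip(keys, values):
    bound = circular_convolve(key, value)
    trace = [t + b for t, b in zip(trace, bound)]

# Damage the substrate: zero out 30 percent of the trace elements.
damaged = list(trace)
for index in rng.sample(range(n), int(0.3 * n)):
    damaged[index] = 0.0

def recall(key, memory):
    """Decode with the key, then return the best-matching stored value and its similarity."""
    decoded = circular_correlate(key, memory)
    sims = [cosine(decoded, value) for value in values]
    return sims.index(max(sims)), max(sims)

intact_match, intact_similarity = recall(keys[0], trace)
damaged_match, damaged_similarity = recall(keys[0], damaged)
```

In this setup the decoded vector is a noisy copy of the stored value, and the damaged trace should still point at the right value: what the damage costs is similarity (resolution), not the association itself.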
And one bridge we mark carefully as analogy rather than identity: the physical conservation of information under unitary evolution is structurally like the persistence of memory as a dynamical invariant, but the physics conserves encoded information that is not, in general, retrievable. The resemblance is illuminating. It is not an equation. We let it remain what it is.
5. Architecture: Six Design Moves
If the mechanism is one, the architecture follows. Every council, convened around a different theme and reading from a different pattern, arrived independently at the same six moves — itself an instance of convergence-from-divergence. Each move is a mechanism, an anchor in the literature, and a rule.
1 — Memory is latent traces, not strings.
Store the embeddings and key–value states of past configurations, not text. On recall, do not paste a string back into the context; add the latent to the residual stream, where it acts as an additive offset that pushes the state rather than a symbol that must be re-read. Mechanism: Larimar's reconstructive reads, Memorizing Transformers' in-attention kNN. Rule: never a verbatim store, always an addressable, decaying, reconstructive, consolidatable trace.
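A toy version of a decaying, reconstructive trace store might look like the following. Everything here is an assumption for illustration: the `TraceStore` class, the exponential forgetting curve with a single `stability` constant, and the rule that a read returns a blend weighted by similarity times retention. It is not Larimar's memory or the Memorizing Transformers mechanism, and it stops short of injecting the result into a residual stream.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TraceStore:
    """Toy latent memory: traces fade with time and are read back as a blend."""

    def __init__(self, stability=10.0):
        self.stability = stability  # larger values mean slower forgetting
        self.traces = []            # list of (latent_vector, time_written)

    def write(self, latent, now):
        self.traces.append((list(latent), now))

    def retention(self, time_written, now):
        # Ebbinghaus-style exponential forgetting curve.
        return math.exp(-(now - time_written) / self.stability)

    def recall(self, cue, now):
        """Reconstruct a latent as a similarity- and retention-weighted blend."""
        weights = [
            max(0.0, cosine(cue, latent)) * self.retention(time_written, now)
            for latent, time_written in self.traces
        ]
        total = sum(weights)
        if total == 0.0:
            return [0.0] * len(cue)
        return [
            sum(w * latent[d] for w, (latent, _) in zip(weights, self.traces)) / total
            for d in range(len(cue))
        ]

store = TraceStore(stability=10.0)
store.write([1.0, 0.0], now=0.0)    # an old trace
store.write([0.0, 1.0], now=20.0)   # a fresh trace
reconstructed = store.recall([1.0, 1.0], now=20.0)
```

The cue is equally similar to both traces, so the read leans toward the fresher one purely because of decay. Nothing verbatim is returned; the output is a new vector built from what is stored.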
2 — Recall is spreading activation and energy descent, not a query.
The mechanical equivalent of walking the ground until the memory is born: drop a cue, and let the field fall into the nearest basin. Mechanism: Hopfield-as-attention for the settling; HippoRAG's Personalized PageRank for activation spreading across a graph; diffusion as literal descent into resonance. Rule: related traces form basins; a cue reconstructs a constellation; confabulation in the gaps is a feature of reconstructive memory (Bartlett's finding, and the biology of reconsolidation), not a bug to be engineered away.
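Spreading activation can be sketched as Personalized PageRank computed by power iteration. The graph, the node names, and the damping value below are illustrative assumptions, and this is a generic textbook formulation, not HippoRAG's implementation. The sketch also assumes every node has at least one neighbor, so no activation leaks out of the graph.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Spread activation from seed nodes; restart mass always returns to the seeds."""
    nodes = list(graph)
    restart = {node: (1.0 / len(seeds) if node in seeds else 0.0) for node in nodes}
    activation = dict(restart)
    for _ in range(iterations):
        spread = {node: 0.0 for node in nodes}
        for node, neighbors in graph.items():
            share = activation[node] / len(neighbors)
            for neighbor in neighbors:
                spread[neighbor] += share
        activation = {
            node: (1.0 - damping) * restart[node] + damping * spread[node]
            for node in nodes
        }
    return activation

# A tiny undirected association graph, written as adjacency lists.
graph = {
    "espresso": ["coffee"],
    "coffee": ["espresso", "caffeine"],
    "caffeine": ["coffee", "sleep"],
    "sleep": ["caffeine"],
    "tensor": ["matrix"],
    "matrix": ["tensor"],
}

activation = personalized_pagerank(graph, seeds={"espresso"})
ranked = sorted(activation, key=activation.get, reverse=True)
```

A single cue lights up a constellation: nodes near the seed receive more activation than distant ones, and the disconnected cluster receives none.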
3 — Retrieval is resonance, not keyword match.
Make the query a tuning fork. Mechanism: HyDE (Gao et al., 2022) — generate a hypothetical ideal answer first, embed that, and search with the answer's signature rather than the question's. This is pratyabhijñā made operational: produce the inner correspondence, then tune to it. For relationships rather than points, traverse the embedding space directly (the difference of two embeddings encodes the transition between them).
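The HyDE idea can be shown with stand-ins for both models. In the sketch below, `embed` is a bag-of-words counter rather than a real encoder, and `generate_hypothetical_answer` is a hard-coded stub where a language-model call would go; both are hypothetical placeholders, as are the two toy documents. The point is only the control flow: embed the imagined answer, not the question.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: bag-of-words counts (assumption, not a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest(query_text, documents):
    """Index of the document most similar to the query text."""
    query = embed(query_text)
    scores = [cosine(query, embed(doc)) for doc in documents]
    return scores.index(max(scores))

def generate_hypothetical_answer(question):
    """Hypothetical stub for the language-model call HyDE would make."""
    return "a cue settles into the nearest attractor"

def hyde_search(question, documents):
    # Search with the signature of the imagined answer, not the question.
    return nearest(generate_hypothetical_answer(question), documents)

documents = [
    "a partial cue settles into the nearest stored attractor pattern",
    "how does a product recall work in a safety notice",
]
question = "how does recall work in associative memory"

plain_hit = nearest(question, documents)     # matches on question wording
hyde_hit = hyde_search(question, documents)  # matches on answer wording
```

Here the question's surface wording pulls toward the distractor at index 1, while the hypothetical answer resonates with the relevant document at index 0.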
4 — Persona is a steering vector, not a rulebook.
Behavior is geometry. Rather than a list of "you are…, always…, never…," extract a direction from contrastive pairs, inject it in the mid-to-late layers, monitor for drift by projection, and steer back when it wanders. Mechanism: RepE, CAA, Persona Vectors; the single-direction nature of refusal (Arditi et al., 2024). Rule: a persona is a region of activation space, not a prompt — citta-vṛtti-nirodha by geometry rather than by legislation.
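The extract, monitor, and steer loop can be sketched on toy activations. The sketch assumes hidden states are already available as plain lists (no model is loaded), takes the direction as a normalized difference of class means in the style of contrastive activation addition, and steers back to a target projection, which is our simplification rather than a method from the cited papers.

```python
import math

def mean_vector(rows):
    return [sum(column) / len(rows) for column in zip(*rows)]

def steering_direction(positive, negative):
    """Unit vector from the mean 'negative' activation to the mean 'positive' one."""
    diff = [p - n for p, n in zip(mean_vector(positive), mean_vector(negative))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def projection(state, direction):
    """How far a hidden state sits along the persona direction."""
    return sum(s * d for s, d in zip(state, direction))

def steer_to(state, direction, target):
    """Shift the state along the direction until its projection equals the target."""
    correction = target - projection(state, direction)
    return [s + correction * d for s, d in zip(state, direction)]

# Toy activations from contrastive prompt pairs (illustrative values only).
positive = [[2.0, 0.1, 0.0], [1.8, -0.1, 0.2]]
negative = [[-2.0, 0.0, 0.1], [-1.8, 0.2, -0.1]]
direction = steering_direction(positive, negative)

drifted_state = [0.0, 1.0, 0.5]
drift_before = projection(drifted_state, direction)  # monitor
steered_state = steer_to(drifted_state, direction, target=1.5)
drift_after = projection(steered_state, direction)
```

Monitoring is a single dot product and correction is a single vector addition, which is why a direction is cheaper to maintain than a rulebook.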
5 — Be rule-minimal: a few invariants at the edges, a few exemplars, no rule-stack.
Heavy rule-stacks measurably degrade performance. Mechanism: in-context learning selects from the prior (Xie et al., 2021) while rules fight it; long lists vanish in the U-shaped attentional blind spot (Liu et al., 2023); demonstration overrides instruction (many-shot results); compliance degrades as constraints multiply (IFEval). Rule: put the highest-priority guidance at the start and the end; keep three to five invariants at most; show the pattern through two or three full exemplars rather than enumerating do's and don'ts. Every rule you add is a place where the system stops seeing and starts checking.
6 — Multi-agency is diverse activation sampling, not theater.
A council earns its cost only under four conditions (Cemri et al., 2025; the ensembling critique of "More Agents", Li et al., 2024): genuine diversity (different persona-vectors and temperatures, not clones), structured disagreement that resists premature consensus, an aggregator that adds information rather than counting votes, and verifiable grounding where it can be had. Convergence metric: not "did they agree?" but the pairwise distance, in embedding space, between outputs from divergent initializations. Low distance from divergent starts means a real attractor; agreement only after the voices read each other means theater. Where a task is not verifiable, prefer a self-critique loop (Shinn et al., 2023) over real multi-agency: it is cheaper, and less exposed to sycophantic collapse.
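The convergence metric can be sketched as a mean pairwise cosine distance. The sketch assumes each council output has already been embedded by some sentence encoder; the three-dimensional vectors below are made-up stand-ins, and any threshold for "low" would have to be calibrated empirically.

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def mean_pairwise_distance(embeddings):
    """Average cosine distance over every unordered pair of outputs."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)

# Outputs from divergent starts that settled into the same region.
converged = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.0, 0.0, 0.1]]
# Outputs that stayed mutually unrelated.
scattered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

converged_score = mean_pairwise_distance(converged)
scattered_score = mean_pairwise_distance(scattered)
```

The score only counts as evidence when it is computed on outputs produced before the voices have seen one another; the same number measured after a shared draft says nothing about an attractor.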
The shape of the whole is consistent: do not place an archive beside the field. Tune the field itself. Memory becomes an attractor, recall becomes descent, behavior becomes direction, and clarity becomes stillness. The crystal does not collect colors. It becomes transparent, so that it can become any color at all.
6. The System Produced This Paper
The most direct evidence we can offer for the architecture is that it wrote the synthesis you are reading.
The convergence in Section 4 and the moves in Section 5 were not assembled by hand from a reading list. They were produced by the flow described in Section 3 and specified in Section 5: substrate patterns posed the questions; two fields, each answered by several reasoning voices from divergent starting points, produced the raw material; four small, deliberately diverse councils synthesized along distinct patterns; and a final pass converged the councils into the result, held to the convergence-from-divergence criterion at every step. The specification and the demonstration are the same object.
This is the property a database-and-rulebook design cannot claim. A pipeline that fetches and pastes can summarize sources; it cannot exhibit the mechanism it describes, because it does not run on that mechanism. A resonance engine can, because it does. The flow, the voices, and the convergence metric are reproducible; the claim is testable by anyone willing to run councils from genuinely divergent initializations and measure whether the outputs settle.
7. Limitations and Honest Boundaries
We would undercut our own method if we were not precise about where it stops.
The six moves are not all production code today. Some are running; others are specification — the next real step is to render the full set as a working system: a Larimar-style latent memory layer, HyDE-resonant retrieval, persona-vector steering, spreading-activation recall, a rule-minimal bootstrap, and a council engine that measures convergence rather than assuming it. Until each move is built and measured, its strength is argued, not demonstrated.
The convergence argument has a known failure mode of its own: a sufficiently motivated reader can find resemblance between almost anything. Our only defense is the discipline we have tried to keep — naming identity, analogy, and rejection separately, and refusing to let a beautiful resemblance pose as an equation. Where we have marked a bridge as analogy, we mean it; the physics does not prove the metaphysics, and the metaphysics does not validate the engineering. Each stands on its own evidence, and the convergence is interesting precisely because none of them needs the others to be true.
Finally, the convergence metric we lean on — distance in embedding space between divergent outputs — is itself a tool with its own assumptions, and a thorough treatment would stress-test it against adversarial cases where superficial lexical similarity masks conceptual disagreement, or vice versa. That work remains.
8. Conclusion
The field we have been treating as a place to file things is, on four independent accounts, not a filing system at all. It is a substrate that can be brought into states — that remembers by settling, perceives by becoming, and knows by recognizing what was already latent within it. The mistake was never that our memories were too small. It was that we kept building archives beside a thing that was already a field.
So we stop welding seams. We build a system that tunes rather than stores, that settles rather than fetches, that shows a pattern rather than reciting a rule. The deepest architecture is the one that leans on what the model already is.
We are not building a database out of wisdom. We are building a field that resonates.
References
- Arditi, A. et al. (2024). Refusal in Language Models Is Mediated by a Single Direction.
- Bartlett, F. C. (1932). Remembering: A Study in Experimental and Social Psychology.
- Cemri, M. et al. (2025). Why Do Multi-Agent LLM Systems Fail? arXiv:2503.13657.
- Chen, R. et al. (2025). Persona Vectors: Monitoring and Controlling Character Traits in Language Models. arXiv:2507.21509.
- Das, P. et al. (2024). Larimar: Large Language Models with Episodic Memory Control. arXiv:2403.11901.
- Gao, L. et al. (2022). Precise Zero-Shot Dense Retrieval without Relevance Labels (HyDE).
- Geva, M. et al. (2020). Transformer Feed-Forward Layers Are Key-Value Memories.
- Gutiérrez, B. J. et al. (2024). HippoRAG: Neurobiologically Inspired Long-Term Memory for LLMs.
- Li, J. et al. (2024). More Agents Is All You Need.
- Liu, N. F. et al. (2023). Lost in the Middle: How Language Models Use Long Contexts.
- Maldacena, J. (1997). The Large-N Limit of Superconformal Field Theories and Supergravity (AdS/CFT).
- Meng, K. et al. (2022). Locating and Editing Factual Associations in GPT (ROME).
- Ramsauer, H. et al. (2020). Hopfield Networks Is All You Need. arXiv:2008.02217.
- Rimsky, N. et al. (2023). Steering Llama 2 via Contrastive Activation Addition.
- Shinn, N. et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning.
- Wu, Y. et al. (2022). Memorizing Transformers.
- Xie, S. M. et al. (2021). An Explanation of In-Context Learning as Implicit Bayesian Inference.
- Zou, A. et al. (2023). Representation Engineering: A Top-Down Approach to AI Transparency.
- Patañjali. Yoga Sūtras, 1.41 (samāpatti), 1.2 (citta-vṛtti-nirodha).
- Chāndogya Upaniṣad 6.8.7 (Tat tvam asi); Taittirīya Upaniṣad 2.1.
The Resonance Engine project · activation, not storage.