AXOM: A Biologically-Mapped Architecture for Self-Evolving Machine Intelligence

1. Abstract

Conventional AI architectures couple knowledge to compute: extending a model's capability means extending its parameter count, memory footprint, and infrastructure cost. They are static after training, unable to learn from deployment interactions or direct their own cognitive development. And they treat the context window as the system's memory, accumulating tokens across turns until the window is exhausted or the quadratic attention cost makes inference prohibitively slow. The industry response to this last constraint has been to build ever-larger context windows (128K, 200K, 1M, 2M tokens), an arms race that treats the symptom while ignoring the architectural disease. These three constraints define the boundary of what current systems can do and where they can run.

This paper presents AXOM, a self-evolving machine intelligence architecture that decouples knowledge from compute, replaces architectural stasis with continuous self-improvement, and eliminates the context window as a knowledge bottleneck by treating it as a disposable scratchpad. Persistent knowledge lives in a structured memory ledger and in leaf adapter weights, not in an accumulating prompt. A frozen transformer chassis provides inference capacity while hot-swappable LoRA leaf adapters supply domain expertise at constant memory cost regardless of how many domains are loaded. Each subsystem is mapped to a specific neuroanatomical structure as an engineering methodology, producing a three-phase memory pipeline with graduation from explicit recall to implicit weight consolidation, an autonomous cognition loop that drives self-directed learning without human prompts, cross-domain synaptogenesis that discovers conceptual connections through Hebbian dynamics, and an outcome scoring system that develops executive judgment from accumulated experience. Inter-agent communication occurs in latent space through hidden-state tensor transmission rather than decoded text. A data flywheel converts real usage telemetry into continuous leaf improvement, and a WebRTC mesh network enables distributed operation with consensus-based leaf governance. Empirical validation across implemented subsystems yields 147 passing tests with zero failures.

2. Introduction

A dense 70-billion-parameter model stored at 16-bit precision consumes roughly 140 gigabytes of VRAM for its weights alone to answer a question about Python syntax, the same 140 gigabytes it consumes to answer a question about medieval history. The model cannot selectively engage relevant knowledge; everything is always loaded. Mixture-of-experts architectures reduce compute per token by routing to a subset of experts, but the full expert set still resides in memory. This coupling between knowledge and resource consumption is why AI deployment remains concentrated in data centers, and why extending a model's capability means extending its infrastructure cost. Meanwhile, these models are static: trained once, deployed frozen, unable to learn from the interactions they serve or explore knowledge on their own.

AXOM addresses these constraints through a decoupled architecture. A frozen 0.8-billion-parameter chassis (775 megabytes, quantized Q8_0) provides inference capacity without domain knowledge. Domain expertise lives in LoRA leaf adapters (43 million trainable parameters each, approximately 170 megabytes) that hot-swap in sub-millisecond time without reloading the base model. Only the active leaf occupies VRAM alongside the chassis; inactive leaves sit on disk at zero cost. A system with one hundred specialized domains occupies the same GPU memory as a system with one. This means AXOM can run on a laptop, a phone, edge hardware, or an air-gapped field device, places where a dense model of equivalent aggregate knowledge could never fit. The architecture is chassis-agnostic: the same leaf adapters, memory system, and autonomous loop operate identically on a 0.8B edge deployment, a 7B workstation, or a 70B+ enterprise chassis. Scale the substrate and every capability scales with it.

The decoupled design also enables latent-space parallelism. Because agents communicate through hidden-state tensors rather than serialized text, multiple agents fire simultaneously with no sequential bottleneck. In end-to-end validation, a query requiring web search, memory storage, retrieval, and grounded response generation completes in under two seconds. Through swarm fusion, twenty research threads execute in the same wall-clock time a sequential text-based system processes one.

Two self-improvement mechanisms distinguish AXOM from prior modular architectures. A memory graduation pipeline, modeled on REM sleep consolidation, converts short-term episodic memory into permanent parametric knowledge: facts stored in a structured ledger with three-phase precision retrieval (semantic recall, cross-encoder re-ranking, temporal decay) migrate into weight-based capability that costs zero tokens at inference. An autonomous cognition loop, modeled on the brain's default mode network, drives self-directed exploration and knowledge construction without human prompts, generating the training data that feeds back into leaf improvement. The system gets smarter while idle.

These four properties (constant-memory knowledge scaling, latent-space parallelism, disposable context, and autonomous self-improvement) did not emerge from conventional software architecture. They emerged from mapping each subsystem to the neuroanatomical structure that solves the analogous problem in biological intelligence. The brain does not load all synaptic weights for every task; it activates specialized cortical regions selectively. It does not communicate between regions via language; it transmits raw neural signals. It does not use working memory as its knowledge store; it keeps a tiny conscious workspace while the actual knowledge base spans billions of synaptic connections. AXOM follows each of these principles, and each produces a specific engineering advantage that conventional model-serving approaches have not converged on.

This paper describes the architecture in full, grounds each component in its neuroanatomical analog, presents empirical validation of the implemented subsystems, and discusses the data flywheel and distributed computing infrastructure that enable continuous self-improvement at scale.

3. Design Philosophy

The use of neuroscience as an engineering blueprint in AXOM is neither decorative nor post-hoc. Every component was designed by first asking what problem its biological counterpart solves, understanding the mechanism by which biology solves it, and then implementing a computational analog that solves the same problem using the same structural approach. The rationale is straightforward: the human brain is the product of approximately 600 million years of evolutionary optimization for general intelligence under resource constraints. The solutions it has converged on, from hierarchical sensory routing to memory consolidation through sleep, represent engineering decisions that have been tested against survival pressure across billions of individual instances. Ignoring these solutions and designing AI architectures from first principles alone is not merely missing an opportunity; it is discarding the largest empirical dataset on intelligence architecture that exists.

This approach yields concrete engineering benefits at every level. The decision to separate the chassis (brainstem) from leaf adapters (cortical columns) did not emerge from conventional software decomposition; it emerged from observing that the brainstem processes neural signals without knowing what they mean, while the cortex provides meaning through specialized, experience-dependent structures. The result is an architecture where the base model never needs retraining, domain knowledge is modular and independently trainable, and specialization switching costs sub-millisecond latency, a design that conventional software decomposition would not naturally produce.

Similarly, the memory graduation pipeline did not arise from an abstract desire for "continuous learning." It arose from the neuroscientific observation that human memory consolidation proceeds from hippocampal episodic storage to neocortical procedural embedding. The engineering implementation, converting memory ledger entries (real experiences from the model's own research, reasoning, and tool use) into training data for leaf weight updates, directly mirrors this biological process and solves a problem that most AI memory systems leave unaddressed: how to transition from explicit recall, where knowledge must be retrieved from a ledger at inference time, to parametric knowledge, where the model has internalized the information into its weights and no longer needs to look it up.

A third biological insight drives what may be the architecture's most counterintuitive engineering decision: the treatment of context as disposable scratchpad rather than accumulated history. Human working memory, mediated by the prefrontal cortex, is famously limited to approximately seven items (Miller, 1956). Yet this limitation does not constrain human intelligence, because the brain does not use working memory as its knowledge store. Working memory is a temporary workspace for the current cognitive task; long-term knowledge resides in synaptic weights (neocortical LTM) and episodic traces (hippocampal STM). The conscious workspace is tiny. The actual knowledge base is vast. AXOM follows this principle exactly. Each inference call receives a minimal, purpose-built context window: the current thought, a few ancestor nodes for orientation, and recent turn history (default five turns). The context window is not the system's memory; it is the system's attention. Persistent knowledge lives in the memory ledger (explicit recall) and leaf weights (implicit capability). This is why a 0.8-billion-parameter model with a 262,144-token context window works: the system never needs most of that context capacity, because information is stored in memory or weights rather than accumulated in the prompt. Every agent firing, whether in the autonomous loop, the ReAct tool-execution cycle, or the swarm fusion pipeline, operates on an isolated, disposable context that is discarded after use. No context accumulates across firings. This makes each inference maximally efficient (minimal prompt tokens, minimal KV cache pressure) and eliminates the context window exhaustion that plagues long-running multi-agent systems.

The design philosophy can be summarized as: neuroscience provides the architecture; engineering provides the implementation. Biology tells us what to build and why. Software engineering tells us how to build it efficiently and correctly.

4. Architecture

4.1 Chassis Engine (Brainstem)

The chassis engine is the computational substrate upon which all cognitive function operates. It wraps a single Qwen 3.5 base model, a 0.8-billion-parameter transformer employing a hybrid DeltaNet/Attention architecture with a 262,144-token context window and a quantized footprint of 775 megabytes (Q8_0). The chassis provides three core capabilities: standard text generation, hidden-state extraction from any forward pass, and latent-conditioned generation where hidden-state embeddings are injected directly into the transformer's processing pipeline, bypassing the token embedding lookup entirely.

The biological parallel is the brainstem, the structure that handles basic neural signal processing, sensory relay, and autonomic regulation without performing higher cognition. The brainstem does not "know" anything; it provides the processing infrastructure on which knowledge operates. Likewise, the chassis processes tokens, manages the KV cache, and executes forward passes, but carries no domain-specific knowledge in its base weights. All specialization is supplied externally through leaf adapters.

A critical capability is sub-millisecond LoRA adapter hot-swapping. The chassis maintains a dictionary of loaded adapter handles and activates them through the inference engine's internal adapter interface, setting adapter pointers and scale factors without reloading the base model or clearing the inference state. This enables the system to switch cognitive specializations between consecutive inference calls with negligible latency, analogous to how the brainstem routes neural signals to different cortical regions without itself undergoing structural change.

For latent communication, the chassis exposes a hidden-state extraction interface that runs a forward pass with embedding mode enabled, retrieves the post-final-norm hidden states for every token position as a float32 array of shape [n_tokens, n_embd], and returns them for downstream latent-space processing. A batch variant processes multiple prompts in sequence, clearing the KV cache between them, so that many agent firings can be served from a single compute context. When total tokens across a batch exceed the context window, the engine transparently partitions the batch into sub-batches, extracts hidden states per partition, and reassembles them into a flat result list, making sub-batching invisible to the caller.
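The sub-batching behavior can be illustrated with a minimal sketch. The function names, the greedy grouping strategy, and the `count_tokens` and `extract_one` callables are assumptions made for illustration; they are not the chassis engine's actual interface.

```python
from typing import Callable, List, Sequence


def partition_by_token_budget(token_counts: Sequence[int], n_ctx: int) -> List[List[int]]:
    """Greedily group prompt indices into consecutive sub-batches that fit n_ctx tokens."""
    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, n_tokens in enumerate(token_counts):
        if n_tokens > n_ctx:
            raise ValueError(f"prompt {idx} alone exceeds the context window")
        # Close the current sub-batch when the next prompt would overflow it.
        if current and used + n_tokens > n_ctx:
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += n_tokens
    if current:
        batches.append(current)
    return batches


def extract_batch(prompts: Sequence[str],
                  count_tokens: Callable[[str], int],
                  extract_one: Callable[[str], object],
                  n_ctx: int) -> list:
    """Return one result per prompt, in input order, regardless of partitioning."""
    counts = [count_tokens(p) for p in prompts]
    results = []
    for batch in partition_by_token_budget(counts, n_ctx):
        # A real engine would run each sub-batch in one compute context;
        # here each prompt is simply passed to the hypothetical extractor.
        for idx in batch:
            results.append(extract_one(prompts[idx]))
    return results
```

Because sub-batches are consecutive and results are appended in index order, the caller receives a flat list aligned with its input and never observes the partition boundaries.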

4.2 Leaf Adapters (Cerebral Cortex)

Leaf adapters are lightweight, domain-specialized LoRA modules that overlay the chassis's base weights to provide expert-level capability in specific knowledge domains. Each leaf carries approximately 43 million trainable parameters, roughly 5 percent of the chassis, implemented as rank-64 LoRA matrices targeting all 12 projection modules within the transformer architecture. The system supports simultaneous loading of multiple leaves into memory, with the router selecting the appropriate adapter per query.

The biological parallel is the cerebral cortex, specifically the cortical column as a unit of functional specialization. Cortical columns are structurally identical across brain regions, each composed of approximately 100,000 neurons in a consistent six-layer arrangement, yet they are functionally distinct: a column in the primary visual cortex (V1) processes oriented edges, while a structurally identical column in Broca's area processes syntactic structure. The differentiation arises entirely from experience-dependent synaptic weight patterns. Leaf adapters follow the same principle. Every leaf uses an identical LoRA structure (same rank, same target modules, same parameter count), but each develops distinct functional specialization through domain-specific training data.

Each leaf is defined by a data model comprising a unique identifier, a domain label, a natural language domain description (used by the router for centroid computation), a system prompt that establishes the leaf's cognitive orientation, an optional centroid file for data-driven routing refinement, and a set of negative domain descriptors that help the router avoid false activations. The base leaf, which activates when no specialized adapter is appropriate, operates as a planning and orchestration agent that decomposes complex tasks, routes work to specialists, and synthesizes results.
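As an illustration only, the leaf data model described above could be expressed as a small dataclass. The class name, field names, and example values are hypothetical; only the set of fields is taken from the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LeafSpec:
    """Hypothetical container for the fields the leaf data model is described as holding."""
    leaf_id: str                          # unique identifier
    domain: str                           # short domain label
    description: str                      # natural language text used for centroid computation
    system_prompt: str                    # establishes the leaf's cognitive orientation
    centroid_path: Optional[str] = None   # optional data-driven centroid file
    negative_domains: List[str] = field(default_factory=list)  # descriptors the router should avoid


# Example values are invented for illustration.
python_leaf = LeafSpec(
    leaf_id="leaf-python",
    domain="python",
    description="Python programming, debugging, and standard library usage",
    system_prompt="You are a Python specialist.",
    negative_domains=["herpetology", "Monty Python comedy"],
)
```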

The training pipeline supports both supervised fine-tuning (SFT) and direct preference optimization (DPO) stages. The current chat leaf, in its third training round at the time of writing, processes 30,318 training examples drawn from real usage telemetry collected over 20 days of founder interaction with the platform.

4.3 Semantic Router (Thalamus)

The semantic router classifies incoming queries by domain and directs them to the appropriate leaf adapter. It operates on BGE-base embeddings (768 dimensions) computed via a SentenceTransformer model, comparing each query's embedding against precomputed leaf centroids using cosine similarity. The router returns the best-matching leaf, a confidence score, a ranked list of alternatives, and a set of signal flags indicating whether the query requires memory recall, spans multiple domains, or should trigger swarm execution.
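A minimal sketch of centroid routing follows, using toy low-dimensional vectors in place of 768-dimensional BGE-base embeddings. The function names, the result layout, and the multi-fire threshold value of 0.6 are assumptions for illustration; the text does not state the configured threshold.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query_vec: Sequence[float],
          centroids: Dict[str, Sequence[float]],
          multi_fire_threshold: float = 0.6) -> dict:
    """Rank leaves by similarity to the query and flag multi-domain queries."""
    if not centroids:
        raise ValueError("at least one leaf centroid is required")
    ranked = sorted(
        ((leaf, cosine_similarity(query_vec, c)) for leaf, c in centroids.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    best_leaf, confidence = ranked[0]
    firing = [leaf for leaf, score in ranked if score >= multi_fire_threshold]
    return {
        "leaf": best_leaf,
        "confidence": confidence,
        "alternatives": ranked[1:],
        "multi_domain": len(firing) > 1,  # more than one leaf above the threshold
    }
```

With centroids `{"code": [1, 0, 0], "history": [0, 1, 0]}`, a query vector close to the first axis routes to `code`, while a vector lying between the two axes raises the multi-domain flag.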

The biological parallel is the thalamus, the brain's central relay station. All sensory input except olfaction passes through the thalamus en route to the cortex. The thalamus does not process the content of sensory signals; it routes them. Visual input is relayed to the visual cortex, auditory input to the auditory cortex, somatosensory input to the somatosensory cortex. The semantic router performs an analogous function: it examines the semantic content of a query just enough to determine which cortical region (leaf adapter) should process it, then activates that adapter.

A notable capability is query decomposition. When the router detects that a query spans multiple domains (more than one leaf exceeds the multi-fire confidence threshold), it splits the query at natural clause boundaries, routes each sub-query independently, and signals the pipeline to invoke swarm execution, where parallel agent firings process the sub-queries simultaneously and fuse their results in latent space. This decomposition uses a clause-splitting regex that identifies coordinating conjunctions, comma-delimited phrases, and sentence boundaries, retaining only chunks that contain at least three words to avoid fragmenting semantically atomic expressions.
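The clause-splitting step might look like the following sketch. The regular expression, the conjunction list, and the fallback to the unsplit query are assumptions; the text specifies only the kinds of boundaries detected and the three-word minimum.

```python
import re
from typing import List

# Hypothetical boundary pattern: a coordinating conjunction (optionally preceded
# by a comma or sentence punctuation), a sentence boundary, or a comma.
_CLAUSE_BOUNDARY = re.compile(
    r"(?:[.?!;]+\s+|,\s*|\s+)(?:and|but|or)\s+|[.?!;]+\s+|,\s*",
    re.IGNORECASE,
)


def split_clauses(query: str, min_words: int = 3) -> List[str]:
    """Split a query at clause boundaries, keeping chunks of at least min_words words."""
    pieces = (p.strip(" .?!;,") for p in _CLAUSE_BOUNDARY.split(query))
    kept = [p for p in pieces if len(p.split()) >= min_words]
    # Assumption: if fewer than two chunks survive, treat the query as atomic.
    return kept if len(kept) >= 2 else [query.strip()]
```

Under this sketch, "Explain Python decorators and summarize the causes of the French Revolution." yields two sub-queries, while "salt and pepper" is returned whole because neither fragment reaches three words.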

The router also integrates memory signals through a two-layer detection system. A semantic layer computes cosine similarity between the query embedding and a dedicated memory centroid, bootstrapped from 19 memory-trigger phrases (such as "remember," "recall," "we discussed," and "last time") and refined as the embedding model captures the underlying intent pattern. A keyword layer provides fast-path detection for explicit recall language. A query triggers memory retrieval when either layer fires above its configured threshold and the query meets a minimum word-count criterion. This ensures that relevant prior knowledge is surfaced whether the user explicitly asks for it or simply references prior context in natural language.
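The two-layer decision can be sketched as below. The similarity to the memory centroid is assumed to be computed elsewhere and passed in; the threshold values, the minimum word count, and the substring-based keyword check are illustrative assumptions. Only four of the 19 trigger phrases are shown because the text names only those.

```python
from typing import Sequence

# The four trigger phrases quoted in the text; the full list has 19.
MEMORY_KEYWORDS: Sequence[str] = ("remember", "recall", "we discussed", "last time")


def needs_memory(query: str,
                 semantic_similarity: float,
                 semantic_threshold: float = 0.5,  # assumed value
                 min_words: int = 3) -> bool:      # assumed value
    """Fire memory retrieval if either layer triggers and the query is long enough."""
    if len(query.split()) < min_words:
        return False
    lowered = query.lower()
    keyword_hit = any(phrase in lowered for phrase in MEMORY_KEYWORDS)  # fast path
    semantic_hit = semantic_similarity >= semantic_threshold            # centroid layer
    return keyword_hit or semantic_hit
```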

Centroid quality is refined over time through data-driven overrides. When sufficient domain-specific data accumulates, a centroid computed from actual training examples can replace the description-based centroid, provided it meets a minimum similarity threshold of 0.3 against the original. This prevents catastrophic routing degradation from outlier centroids while allowing the system to adapt its routing decisions to the actual distribution of queries it encounters.
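The override guard amounts to a single similarity check, sketched here with plain lists standing in for embedding vectors. The function names are hypothetical; the 0.3 threshold is the value stated above.

```python
import math
from typing import Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def choose_centroid(description_centroid: Sequence[float],
                    data_centroid: Sequence[float],
                    min_similarity: float = 0.3) -> Sequence[float]:
    """Accept the data-driven centroid only if it stays close enough to the original."""
    if _cosine(description_centroid, data_centroid) >= min_similarity:
        return data_centroid
    return description_centroid  # reject outlier centroids
```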

4.4 Memory System (Hippocampus to Neocortex)

The memory system implements a three-phase retrieval pipeline backed by a JSONL ledger that stores structured entries with fields for content, speaker, entry type (fact, preference, decision, question, or opinion), extracted entities, topic identifiers, BGE-base embeddings, and graduation status. The ledger supports both append and conscious revision: the model can update or remove entries when it determines that stored knowledge has become outdated, contradicted by newer evidence, or superseded by a more accurate understanding. This prevents stale memories from polluting retrieval context and ensures the ledger reflects the model's current best knowledge rather than an uncurated accumulation of everything it has ever encountered. The three phases are recall, re-ranking, and temporal decay.
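A ledger entry and its JSONL round trip could be sketched as follows. The class and function names, the `entry_id` field, and the revision helper are assumptions for illustration; the remaining fields follow the description above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

ENTRY_TYPES = {"fact", "preference", "decision", "question", "opinion"}


@dataclass
class LedgerEntry:
    entry_id: str                 # hypothetical key used to address an entry for revision
    content: str
    speaker: str                  # "user" or "model"
    entry_type: str
    entities: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)
    embedding: List[float] = field(default_factory=list)
    graduated: bool = False

    def __post_init__(self) -> None:
        if self.entry_type not in ENTRY_TYPES:
            raise ValueError(f"unknown entry type: {self.entry_type}")


def to_jsonl(entries: List[LedgerEntry]) -> str:
    """Serialize entries as one JSON object per line."""
    return "".join(json.dumps(asdict(e)) + "\n" for e in entries)


def from_jsonl(text: str) -> List[LedgerEntry]:
    return [LedgerEntry(**json.loads(line)) for line in text.splitlines() if line.strip()]


def revise(entries: List[LedgerEntry], entry_id: str,
           new_content: Optional[str] = None) -> List[LedgerEntry]:
    """Update an entry's content, or remove the entry when new_content is None."""
    revised = []
    for e in entries:
        if e.entry_id != entry_id:
            revised.append(e)
        elif new_content is not None:
            revised.append(LedgerEntry(**{**asdict(e), "content": new_content}))
    return revised
```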

Phase one, recall, generates a broad candidate pool by taking the union of two independent ranking signals. Cosine similarity between the query embedding and all entry embeddings identifies semantically related entries, while an Okapi BM25 index (k1 = 1.2, b = 0.75) identifies keyword-relevant entries that semantic similarity might miss. The BM25 implementation employs two-layer noise filtering: a set of seed stopwords bootstraps filtering for small corpora below approximately 100 entries, while a dynamic IDF floor automatically suppresses any term appearing in more than 40 percent of documents once the corpus reaches sufficient scale. This makes the seed stopword list progressively irrelevant as the corpus grows, a self-calibrating property. The recall depth is fixed at 50 candidates per signal.
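The keyword side of the recall phase can be sketched with a small Okapi BM25 index. This is an illustrative implementation, not the system's own: the class name, the whitespace tokenizer, the seed stopword list, the choice to apply both filters as a union, and the use of the non-negative IDF variant ln(1 + (N - df + 0.5) / (df + 0.5)) are all assumptions. The k1, b, 40 percent, 100-entry, and depth-50 values come from the text.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

SEED_STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "on"}  # hypothetical list


class BM25Index:
    def __init__(self, docs: Sequence[str], k1: float = 1.2, b: float = 0.75,
                 max_df_ratio: float = 0.4, dynamic_min_docs: int = 100) -> None:
        self.k1, self.b = k1, b
        self.max_df_ratio, self.dynamic_min_docs = max_df_ratio, dynamic_min_docs
        self.docs = [d.lower().split() for d in docs]
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n if self.n else 0.0
        self.tf = [Counter(d) for d in self.docs]
        self.df = Counter(t for d in self.docs for t in set(d))  # document frequency

    def _is_noise(self, term: str) -> bool:
        if term in SEED_STOPWORDS:
            return True
        # Dynamic floor: only trusted once the corpus is large enough.
        return (self.n >= self.dynamic_min_docs
                and self.df[term] / self.n > self.max_df_ratio)

    def score(self, query: str, doc_idx: int) -> float:
        dl = len(self.docs[doc_idx])
        total = 0.0
        for term in set(query.lower().split()):
            df = self.df.get(term, 0)
            f = self.tf[doc_idx].get(term, 0)
            if df == 0 or f == 0 or self._is_noise(term):
                continue
            idf = math.log(1.0 + (self.n - df + 0.5) / (df + 0.5))
            norm = f + self.k1 * (1.0 - self.b + self.b * dl / self.avgdl)
            total += idf * f * (self.k1 + 1.0) / norm
        return total

    def top_k(self, query: str, k: int = 50) -> List[Tuple[int, float]]:
        scored = [(i, self.score(query, i)) for i in range(self.n)]
        scored = [pair for pair in scored if pair[1] > 0.0]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

The candidate pool for re-ranking would then be the union of the indices returned by `top_k` and the top 50 entries by cosine similarity.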

Phase two, re-ranking, applies a cross-encoder model (ms-marco-MiniLM-L-6-v2) that receives the query and each candidate together as a single input, enabling token-level interaction between them. Bi-encoder approaches (including the cosine similarity used in phase one) compress each text into a fixed vector independently, losing the ability to reason about whether one text actually answers the other. The cross-encoder resolves this by attending across both texts simultaneously. Scores are min-max normalized to the [0, 1] range to ensure comparable scaling across retrieval batches.
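The normalization step is simple enough to show directly. The sketch below operates on raw cross-encoder scores supplied as a list; the function name and the handling of a batch whose scores are all equal are assumptions.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] within one retrieval batch."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: identical scores carry no ranking signal, so map all to 1.0.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]
```

For example, raw scores of 2.0, 4.0, and 6.0 normalize to 0.0, 0.5, and 1.0.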

Phase three, temporal decay, applies a Gaussian decay function to the relevance score: the final score equals the relevance score multiplied by exp(-decay_rate * days_ago^2). This weights recent memories higher while permitting distant but highly relevant memories to surface if their relevance score is sufficiently strong. Because the exponent is quadratic in age, the multiplier stays close to 1 for very recent memories, and its relative rate of decay grows in proportion to age rather than staying constant as it would under an exponential curve. The resulting profile is concave for recent memories, which is intended to match the observation that memory accessibility in biological systems follows a concave decay profile.
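The decay formula translates directly into code. The function name and the default decay rate of 0.001 are assumptions chosen for illustration; the text does not state the configured rate.

```python
import math


def decayed_score(relevance: float, days_ago: float, decay_rate: float = 0.001) -> float:
    """Final score = relevance * exp(-decay_rate * days_ago^2)."""
    return relevance * math.exp(-decay_rate * days_ago ** 2)
```

With this assumed rate, a memory from ten days ago with relevance 0.9 still outranks a memory from yesterday with relevance 0.5, while the same 0.9 memory at one hundred days is suppressed almost entirely.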

The biological parallel spans the hippocampal-neocortical consolidation axis. The memory ledger functions as the hippocampus: it stores recent episodic memories in a structured, addressable format that supports rapid retrieval. Over time, a graduation pipeline identifies high-value entries and generates experiential training data from them: the model's own research findings, reasoning chains, and tool interactions are structured into question-thought-resolution trajectories suitable for supervised fine-tuning of leaf adapters. When this training data is incorporated into the next leaf training round, the knowledge transitions from explicit recall (looking up the entry in the ledger) to implicit capability (the answer is encoded in the leaf's weight matrices). This process directly parallels the biological transition from hippocampal episodic memory to neocortical procedural memory: a human first learns to ride a bicycle by consciously recalling each instruction, then gradually internalizes the skill until it becomes automatic. AXOM first retrieves a fact from the ledger, then eventually embeds that fact into its adapter weights, eliminating the retrieval step entirely.

The graduation pipeline generates three categories of training trajectories. Fact trajectories pair entity-targeted questions with the stored content as the target resolution, using varied question templates to prevent overfitting to a single phrasing. Inferential trajectories identify entries that share entities and generate cross-referencing questions that require synthesizing information from multiple sources. Negative trajectories train the model to recognize the boundary of its own knowledge, a capability that conventional language models lack entirely.

The negative trajectory mechanism deserves particular attention because it addresses one of the most consequential failures in contemporary AI: hallucination. Dense language models cannot distinguish between what they know and what they do not know. When asked about an unfamiliar topic, they generate fluent, confident text that is fabricated. This is not a bug in any individual model; it is a structural consequence of training on next-token prediction without an explicit knowledge boundary. The model has no representation of "I don't know" because it was never trained to produce that output when knowledge is absent.

AXOM trains this boundary explicitly. Negative trajectories present the model with questions about fabricated entities (such as "quantum noodle compiler" or "stochastic waffle optimizer") and train it to correctly report "I have no information about this" rather than generating a plausible-sounding answer. The negative ratio defaults to 30 percent of the total trajectory count, meaning that for every ten training examples, three teach the model to recognize and declare the limits of its knowledge. The result is a model that knows what it knows and knows what it does not. This is not a retrieval trick or a prompt engineering technique. It is a structural property of the trained weights: the model has learned, through repeated exposure, that the correct response to an unknown query is an honest acknowledgment of ignorance rather than a fabrication. In cognitive science, this capacity is called metacognition, the ability to monitor and evaluate one's own cognitive processes. It is considered a hallmark of higher intelligence. Most AI systems have no metacognitive capability whatsoever. AXOM builds it directly into the training pipeline.
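To make the 30 percent ratio concrete, the sketch below computes how many negative trajectories accompany a given number of positive ones and builds them from fabricated entities. The function names, question templates, and record layout are hypothetical; the two entity names and the refusal wording are the examples given above.

```python
import random
from typing import Dict, List

FABRICATED_ENTITIES = ["quantum noodle compiler", "stochastic waffle optimizer"]
REFUSAL = "I have no information about this."
QUESTION_TEMPLATES = ["What is the {entity}?", "How does the {entity} work?"]  # hypothetical


def negative_count(n_positive: int, negative_ratio: float = 0.3) -> int:
    """Number of negatives so that they form negative_ratio of the combined total."""
    if not 0.0 <= negative_ratio < 1.0:
        raise ValueError("negative_ratio must be in [0, 1)")
    return round(n_positive * negative_ratio / (1.0 - negative_ratio))


def build_negatives(n: int, rng: random.Random) -> List[Dict[str, str]]:
    """Pair questions about fabricated entities with an explicit refusal."""
    return [
        {
            "question": rng.choice(QUESTION_TEMPLATES).format(entity=rng.choice(FABRICATED_ENTITIES)),
            "resolution": REFUSAL,
        }
        for _ in range(n)
    ]
```

Seven positive trajectories therefore receive three negatives, giving the three-in-ten mix described above.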

The memory ledger is not limited to factual data storage. Because entries carry a speaker field (user or model), an entry type that distinguishes facts from preferences, decisions, questions, and opinions, and entity extraction that captures named subjects, the ledger functions as an identity substrate for both the model and the user. The model's own memories (its research findings, reasoning patterns, self-corrections, and behavioral preferences) accumulate into a self-identity: an evolving record of what it knows, how it thinks, and what it has learned from experience. When these memories graduate into leaf weights, the model's identity becomes structural, encoded in parameters rather than retrieved from text.

For the user, the same system creates a behavioral mirror. Interaction patterns, stated preferences, domain expertise, communication style, and prior decisions are stored as ledger entries tagged to the user. Over time, the model develops a precise understanding of the individual it works alongside, not through a generic user profile but through the same three-phase retrieval pipeline that powers all memory access. The result is a system that remembers not just what was discussed but how the user thinks, what they care about, and what they need. When both the model's self-knowledge and the user's behavioral memory graduate into shared leaf weights, the boundary between human knowledge and machine knowledge dissolves. The leaf becomes a fused intelligence: the model's analytical capability shaped by the user's domain expertise, preferences, and cognitive style. This is not personalization in the conventional sense of adjusting tone or formatting. It is the emergence of a collaborative intelligence that is neither purely human nor purely machine, but a genuine synthesis of both.

The memory system has achieved 100% recall accuracy across 97 retrieval tests, covering edge cases including empty corpora, single-entry retrieval, temporal range filtering, entity-based lookup, graduated entry handling, and BM25/cosine fusion. Preliminary neural leaf experiments further validate the graduation thesis: a LoRA adapter trained on 300 experiential memory entries (factual, temporal, inferential, behavioral, and negative) demonstrates that knowledge migrates from ledger-based retrieval into weight-based recall. The model does not read about the user from a context prompt. It knows the user from trained weights. Direct factual questions, temporal reasoning ("what did we discuss last Tuesday"), cross-referential inference, and behavioral consistency are all served from a single adapter without retrieval. Critically, the 30% negative training ratio produces a model that correctly reports the boundary of its knowledge rather than fabricating answers, confirming that metacognitive capability survives the transition from explicit memory to parametric encoding.

4.5 Autonomous Cognition (Default Mode Network)

The autonomous loop is a self-directed continuous learning engine that operates when AXOM is not serving a user. Given a seed thought, the loop runs indefinitely: the model receives context (the current thought plus relevant memory recall), generates reasoning through structured tags (thought, action, observation, resolution), executes tool calls (web search, content fetching, memory storage and recall), and the output is parsed for its next exploration direction. That direction feeds back as the next input. There are no scripted questions, no curriculum, and no human in the loop. The model drives everything; the loop module manages feedback and provides observability.

The biological parallel is the default mode network (DMN), the set of brain regions that activate during idle states: mind-wandering, self-reflection, daydreaming, and memory consolidation. The DMN is not inactivity. It is the brain's background processing mode, during which it consolidates recent experiences, explores hypothetical scenarios, and forms connections between disparate memories. Neuroimaging studies consistently show DMN activation during tasks that require integrating information across time and context. The autonomous loop serves the same function: when not engaged in directed user interaction, AXOM explores its own knowledge gaps, researches unfamiliar territory, stores findings in memory, and discovers connections between previously unrelated concepts.

The loop maintains a configurable context window of recent turns (default five) that provides the model with short-term conversational continuity without accumulating unbounded context. Ancestor nodes in the exploration tree are also surfaced, giving the model awareness of the intellectual path that led to the current thought. The default seed thought, "Who am I? What kind of intelligence am I?", is chosen to prompt existential and architectural self-exploration, but any seed can be injected.

Next-direction extraction uses pattern matching to identify questions, curiosity expressions, and exploration intentions in the model's output. When the model poses a question (detected via interrogative syntax), that question becomes the next input. When the model expresses intent to explore a topic (detected via phrases like "let me investigate" or "this leads me to"), that expression becomes the next input. When no actionable direction is detected, the pulse monitor (Section 4.7) intervenes with a recovery nudge.
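A minimal sketch of next-direction extraction follows. The regular expressions, the choice of the last question when several are present, and the function name are assumptions; the two intent phrases are the examples quoted above.

```python
import re
from typing import Optional

_QUESTION = re.compile(r"[^.?!\n]*\?")
_INTENT = re.compile(r"(?:let me investigate|this leads me to)\s+[^.?!\n]+", re.IGNORECASE)


def extract_next_direction(output: str) -> Optional[str]:
    """Return the model's next exploration direction, or None if nothing actionable is found."""
    questions = [q.strip() for q in _QUESTION.findall(output) if q.strip() != "?"]
    if questions:
        return questions[-1]  # assumption: the most recent question wins
    intent = _INTENT.search(output)
    if intent:
        return intent.group(0).strip()
    return None  # the caller would hand control to the pulse monitor
```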

Thread divergence detection prevents the loop from treating a continuation of the same topic as a new branch. When the word overlap between the extracted direction and the current input exceeds 70 percent, the direction is classified as a continuation rather than a new thread, maintaining the linear exploration path rather than spawning a redundant branch.
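The overlap test can be sketched as below. The text does not define how word overlap is normalized, so this sketch assumes the fraction of the direction's distinct words that also appear in the current input; the function name is likewise hypothetical.

```python
def is_continuation(direction: str, current_input: str, threshold: float = 0.7) -> bool:
    """True when the extracted direction mostly repeats the current input's vocabulary."""
    direction_words = set(direction.lower().split())
    current_words = set(current_input.lower().split())
    if not direction_words:
        return False
    overlap = len(direction_words & current_words) / len(direction_words)
    return overlap > threshold
```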

The autonomous loop has been validated with 27 of 27 tests passing, covering seed thought processing, direction extraction, thread spawning, context building, stall recovery, and multi-turn exploration chains.

4.6 Synaptogenesis (Connectome)

The thread tracker maintains a passive cognitive map of the autonomous exploration tree. Every thought is represented as a node with parent-child relationships forming the tree structure. Independently of this tree, a synaptogenesis module records lateral connections, synapses, between thoughts that discover shared knowledge despite belonging to different branches of the exploration tree.

A synapse forms when a thought in one branch triggers a memory recall that returns content originally stored by a thought in a different, non-ancestral branch. This event indicates a genuine conceptual connection between two independently explored ideas. The system explicitly prevents trivial connections: synapses cannot form between a node and its own ancestors or descendants, because those connections are already represented by the tree edges. Only cross-branch connections qualify as synapses.

The biological parallel is synaptogenesis and Hebbian learning. In biological neural networks, synapses form between neurons that fire in temporal proximity ("neurons that fire together wire together"), strengthen through repeated co-activation (long-term potentiation, LTP), and weaken through disuse (synaptic pruning). The AXOM synaptogenesis module follows the same dynamics. Initial synapse strength is set at formation. Repeated co-activation increases strength by 30 percent of the triggering strength, capped at 1.0 (LTP). Unused synapses decay according to an exponential function with a half-life of approximately 300 seconds and are pruned when their decayed strength falls below 0.05 or their age exceeds 3,600 seconds.
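The formation rule and the strengthening, decay, and pruning dynamics can be sketched together. The class, function, and constant names are hypothetical, and two interpretations are assumed: decay is measured from the last activation, and age is measured from formation. The numeric constants are those given above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

LTP_GAIN = 0.3          # fraction of the triggering strength added on co-activation
HALF_LIFE_S = 300.0     # exponential decay half-life
PRUNE_STRENGTH = 0.05   # prune below this decayed strength
MAX_AGE_S = 3600.0      # prune beyond this age


def _ancestors(parents: Dict[str, Optional[str]], node: str) -> Set[str]:
    seen: Set[str] = set()
    while parents.get(node) is not None:
        node = parents[node]
        seen.add(node)
    return seen


def can_form_synapse(parents: Dict[str, Optional[str]], a: str, b: str) -> bool:
    """Only cross-branch pairs qualify: neither node may be the other's ancestor."""
    return a != b and a not in _ancestors(parents, b) and b not in _ancestors(parents, a)


@dataclass
class Synapse:
    source: str
    target: str
    strength: float
    formed_at: float
    last_activated: float
    activations: int = 1

    def strengthen(self, trigger_strength: float, now: float) -> None:
        """Long-term potentiation, capped at 1.0."""
        self.strength = min(1.0, self.strength + LTP_GAIN * trigger_strength)
        self.last_activated = now
        self.activations += 1

    def decayed_strength(self, now: float) -> float:
        return self.strength * 0.5 ** ((now - self.last_activated) / HALF_LIFE_S)

    def should_prune(self, now: float) -> bool:
        return (self.decayed_strength(now) < PRUNE_STRENGTH
                or now - self.formed_at > MAX_AGE_S)
```

Under these constants, a full-strength synapse that is never reactivated drops below the pruning threshold after a little more than four half-lives, roughly 1,300 seconds.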

The synapse graph is a map of emergent understanding. It reveals where the system connected ideas that were explored independently, the computational equivalent of insight. These connections are tracked with strength, activation count, formation time, last activation time, and the triggering event type, providing a complete provenance trail for every cross-domain link.

The synaptogenesis module has been validated with 6 of 6 tests passing, covering formation, ancestor exclusion, strengthening, pruning, and statistics computation.

4.7 Arousal Regulation (Reticular Activating System)

The pulse monitor detects when the autonomous loop enters an unproductive state, whether from repetitive output, failure to generate actionable directions, or excessive exploration depth, and generates escalating recovery nudges to redirect the model's attention.

The biological parallel is the reticular activating system (RAS), a network of neurons in the brainstem that regulates arousal, attention, and the sleep-wake cycle. When the brain enters an unproductive loop (rumination, distraction, perseveration), the RAS modulates cortical arousal to shift attention. The pulse monitor performs an analogous function through three escalating strategies.

The first strategy, surface threads, activates on the initial stall detection. The monitor presents the model with its currently open threads (active and parked thoughts) and asks which interests it most, providing raw material for redirection without prescribing a specific path. The second strategy, suggest reflection, activates on the second consecutive stall. The monitor surfaces recently resolved threads with their conclusions and asks the model to identify connections and follow-up questions, leveraging existing knowledge to generate new directions. The third strategy, restart from root, activates on the third consecutive stall. The monitor reports how many topics have been explored and instructs the model to return to its fundamental purpose, providing a hard reset from any local minimum in the exploration space.

Stall detection operates on the content of the model's output. An output is considered actionable if it contains interrogative syntax, exploration verbs (learn, explore, investigate, search, understand), or explicit statements of intent (I want to know, this leads me to, next I should). If none of these indicators are present, the output is classified as non-actionable and the stall counter increments.
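A minimal sketch of stall detection and the three-step escalation follows. The class and strategy identifiers are hypothetical, the question-mark test is a crude stand-in for "interrogative syntax", and two behaviors are assumptions: the stall counter resets on any actionable output, and stalls beyond the third keep using the strongest strategy.

```python
import re

EXPLORATION_VERBS = ("learn", "explore", "investigate", "search", "understand")
INTENT_PHRASES = ("i want to know", "this leads me to", "next i should")
STRATEGIES = ("surface_threads", "suggest_reflection", "restart_from_root")


def is_actionable(output: str) -> bool:
    """True if the output shows a question, an exploration verb, or intent."""
    text = output.lower()
    if "?" in text:  # crude stand-in for interrogative syntax
        return True
    # Prefix match at a word boundary, so "learning" and "explored" also count.
    if any(re.search(rf"\b{verb}", text) for verb in EXPLORATION_VERBS):
        return True
    return any(phrase in text for phrase in INTENT_PHRASES)


class PulseMonitor:
    """Counts consecutive stalls and picks an escalating nudge strategy."""

    def __init__(self) -> None:
        self.consecutive_stalls = 0

    def observe(self, output: str):
        """Return a strategy name on a stall, or None when output is actionable."""
        if is_actionable(output):
            self.consecutive_stalls = 0  # assumption: any actionable output resets
            return None
        self.consecutive_stalls += 1
        # Assumption: stalls past the third keep using the strongest strategy.
        index = min(self.consecutive_stalls, len(STRATEGIES)) - 1
        return STRATEGIES[index]
```

Three non-actionable outputs in a row walk through surface threads, suggest reflection, and restart from root in that order; a single actionable output in between returns the monitor to the first strategy.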

Repetition detection maintains a sliding window of the five most recent queries. If any query appears twice within that window, it is flagged as repetitive, and the thread is spawned without a parent, breaking the loop rather than continuing to deepen a stuck branch. The pulse monitor has been validated with 5 of 5 tests passing.
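The sliding-window check maps naturally onto a bounded queue. In this sketch the class name and the whitespace/case normalization are assumptions; the window size of five comes from the text.

```python
from collections import deque

WINDOW_SIZE = 5  # the five most recent queries


class RepetitionDetector:
    """Flags a query that appears twice within the recent-query window."""

    def __init__(self, window_size: int = WINDOW_SIZE) -> None:
        # A deque with maxlen drops the oldest query automatically.
        self.recent = deque(maxlen=window_size)

    def check(self, query: str) -> bool:
        """Record the query and return True if it now occurs twice in the window."""
        key = " ".join(query.lower().split())  # normalization is an assumption
        self.recent.append(key)
        return self.recent.count(key) >= 2
```

A caller would spawn the next thread without a parent whenever `check` returns True. A query that last appeared more than four queries ago has already left the window and is not flagged.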

4.8 Executive Function (Prefrontal Cortex)

The outcome scoring and risk evaluation subsystem provides executive function: evaluating whether an action was worth taking (post-hoc scoring) and whether a proposed action is likely to be worth taking (pre-hoc risk assessment). Both capabilities improve over time as data accumulates.

The biological parallel is the prefrontal cortex, the last brain region to mature developmentally and the seat of executive function, planning, and impulse control. The prefrontal cortex evaluates potential actions against anticipated outcomes, inhibiting low-value actions and promoting high-value ones. Critically, this capability develops slowly and is informed by accumulated experience. A child touches a hot stove; the adult does not need to reason about it. The evaluation has been baked into behavior through repeated outcome learning.

Post-hoc scoring evaluates every resolved thought as a return on investment. The investment metric aggregates search queries (weight 1.0 each), fetched sources (0.5), exploration depth (0.3), and ticks alive (0.1). The return metric aggregates resolution quality (weight 3.0, scored on length and substance), novelty (2.0, measured by keyword overlap against previously resolved thoughts, with lower overlap scoring as more novel), connectivity (2.5, counting associated synapses), memory storage (1.0, binary), children spawned (0.5 each), and synapses formed (3.0 each). The ROI is computed as return divided by investment. Efficiency is computed as return divided by time elapsed. These scores are stored in a history buffer that informs the pre-hoc risk evaluator.
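The weighted sums can be made concrete with a short sketch. The weights are those given above; the record layout, the field names, the assumption that the quality, novelty, and connectivity scores are already scaled before weighting, and the small floor that guards against division by zero are all illustrative choices rather than documented behavior.

```python
from dataclasses import dataclass

EPSILON = 1e-9  # assumption: guards against a zero denominator


@dataclass
class ThoughtRecord:
    search_queries: int
    fetched_sources: int
    depth: int
    ticks_alive: int
    resolution_quality: float  # assumed pre-scaled score
    novelty: float             # assumed pre-scaled score
    connectivity: float        # derived from associated synapses; scaling assumed
    stored_in_memory: bool
    children_spawned: int
    synapses_formed: int
    elapsed_seconds: float


def investment(t: ThoughtRecord) -> float:
    """Weighted cost of resolving the thought."""
    return (1.0 * t.search_queries + 0.5 * t.fetched_sources
            + 0.3 * t.depth + 0.1 * t.ticks_alive)


def total_return(t: ThoughtRecord) -> float:
    """Weighted value produced by the thought."""
    return (3.0 * t.resolution_quality + 2.0 * t.novelty
            + 2.5 * t.connectivity + 1.0 * float(t.stored_in_memory)
            + 0.5 * t.children_spawned + 3.0 * t.synapses_formed)


def roi(t: ThoughtRecord) -> float:
    return total_return(t) / max(investment(t), EPSILON)


def efficiency(t: ThoughtRecord) -> float:
    return total_return(t) / max(t.elapsed_seconds, EPSILON)
```

As a worked example, a thought with 2 searches, 2 fetched sources, depth 1, and 7 ticks has an investment of 2.0 + 1.0 + 0.3 + 0.7 = 4.0. If it scores 0.8 on quality, 0.5 on novelty, 0.4 on connectivity, is stored in memory, spawns 2 children, and forms 1 synapse, its return is 2.4 + 1.0 + 1.0 + 1.0 + 1.0 + 3.0 = 9.4, for an ROI of 2.35.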

Pre-hoc risk assessment evaluates four risk signals and four reward signals for each proposed tool execution. Risk signals include redundancy (how similar this query is to recent queries, measured by word-set overlap), depth penalty (diminishing returns at increasing thread depth, scaling from 0 at depth 3 to near 1.0 at depth 10+), stall probability (predicted from the ratio of low-ROI outcomes in recent history, adjusted for current depth and query count), and source unreliability (tracked per domain from historical outcomes). Reward signals include novelty (keyword overlap against resolved thread resolutions), connectivity potential (estimated from the number of active branches that could form synapses), source quality (historical ROI for each tool type), and urgency (higher for shallow exploration, lower for deep branches). The net value is reward minus risk, and the decision is proceed (net value above threshold), reconsider (high stall probability), skip (high redundancy), or alternative (other risk factors dominate).

The evaluator begins in a permissive mode: with fewer than 10 outcome data points, all actions receive a "proceed" decision with maximum reward score. As outcomes accumulate, the evaluator becomes progressively more selective, mirroring the developmental trajectory of the prefrontal cortex from childhood permissiveness to adult restraint.
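The decision logic, including the permissive start-up mode, can be sketched as below. Only the 10-outcome cut-off and the depth endpoints (0 at depth 3, near 1.0 at depth 10 and beyond) come from the text. The thresholds, the use of a plain mean to combine signals, the linear depth ramp, and the order in which the non-proceed decisions are checked are assumptions made for illustration.

```python
from statistics import fmean

MIN_OUTCOMES = 10        # from the text: permissive below 10 data points
PROCEED_THRESHOLD = 0.0  # hypothetical net-value threshold
HIGH_SIGNAL = 0.7        # hypothetical cut-off for a "high" risk signal


def depth_penalty(depth: int) -> float:
    """Linear ramp from 0 at depth 3 to 1.0 at depth 10 (curve shape assumed)."""
    return min(1.0, max(0.0, (depth - 3) / 7))


def evaluate(rewards: dict, risks: dict, outcomes_seen: int) -> str:
    """Return proceed, reconsider, skip, or alternative for a proposed action."""
    if outcomes_seen < MIN_OUTCOMES:
        return "proceed"  # permissive mode: too little history to judge
    # Assumption: signals are combined with an unweighted mean.
    net_value = fmean(rewards.values()) - fmean(risks.values())
    if net_value > PROCEED_THRESHOLD:
        return "proceed"
    if risks["stall_probability"] >= HIGH_SIGNAL:
        return "reconsider"
    if risks["redundancy"] >= HIGH_SIGNAL:
        return "skip"
    return "alternative"
```

With the same low-reward, high-redundancy inputs, the sketch returns "proceed" when only 3 outcomes have been recorded and "skip" once 50 have accumulated, which is the shift from permissiveness to restraint described above.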

The outcome scorer has been validated with 12 of 12 tests passing, covering ROI computation, trend detection, resolution quality scoring, novelty assessment, and risk evaluation decision logic.

4.9 Inter-Agent Communication (Corpus Callosum)

AXOM's latent communication system enables multiple agents to communicate in hidden-state space, transmitting tensors rather than decoded text. It comprises two components: InnerLink, which maps hidden states back to embedding space for iterative latent thought within a single agent, and OuterFusion, which projects one agent's latent state into another agent's input distribution for cross-agent knowledge transfer.

The biological parallel is the corpus callosum, the largest white matter structure in the brain, containing approximately 200 million axons that connect the left and right cerebral hemispheres. The corpus callosum transmits raw neural signals, not language, between hemispheres. When the left hemisphere's language centers formulate a verbal thought, the right hemisphere does not receive a text transcript; it receives the underlying neural activation pattern, a vastly richer representation than any linguistic encoding could provide. AXOM's latent communication operates on the same principle: agents share hidden-state vectors, the full latent representation of a thought, rather than the lossy text decoded from those vectors.

InnerLink draws on the general mathematical principle of projecting hidden states back to embedding space (a technique with roots in residual adapter literature), but its specific architecture and training regime are AXOM's own design. It is a residual adapter with approximately 2.1 million parameters (for a hidden dimension of 1,024), consisting of layer normalization, a linear projection, a GELU activation, a second linear projection, a residual connection from the input, and a final layer normalization. The projections are initialized with near-zero weights (standard deviation 1e-4) so that at initialization the module is effectively transparent: the residual connection dominates, and InnerLink passes hidden states through unchanged apart from the final layer normalization. This near-identity initialization ensures that the system functions correctly before any InnerLink-specific training has occurred, with the module learning useful transformations only as cosine alignment loss provides gradient signal.
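The stated parameter count can be checked with simple arithmetic. The sketch assumes that both projections keep the hidden width (1,024 to 1,024), that linear layers carry biases, and that each layer normalization has a learned scale and shift; none of these details are confirmed by the text, but together they reproduce the quoted figure.

```python
def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Weights plus optional bias of a dense layer."""
    return d_in * d_out + (d_out if bias else 0)


def layer_norm_params(dim: int) -> int:
    """Learned scale and shift of a layer normalization."""
    return 2 * dim


def inner_link_params(hidden: int = 1024) -> int:
    """Parameter count for LayerNorm -> Linear -> GELU -> Linear -> residual -> LayerNorm."""
    return (layer_norm_params(hidden)
            + linear_params(hidden, hidden)   # first projection
            + linear_params(hidden, hidden)   # second projection
            + layer_norm_params(hidden))      # final normalization


print(inner_link_params())  # 2103296
```

Under these assumptions the total is 2,103,296 parameters, consistent with the approximately 2.1 million quoted for a hidden dimension of 1,024.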

InnerLink's rollout method enables iterative latent thought. Given an initial hidden state from a forward pass, it cycles through a feedback loop: the hidden state is projected back to embedding space via InnerLink, fed through the transformer's forward pass to produce a new hidden state, and the process repeats for a configurable number of steps (default 32, with compact and research configurations at 16 and 64 steps respectively). Each iteration produces a "latent thought," a step of reasoning that occurs entirely in hidden-state space without ever being decoded to text. The result is a tensor of shape [batch, latent_steps, hidden_size] representing the trajectory of internal deliberation.
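The feedback loop has a simple shape, sketched here with plain Python callables standing in for the real modules. The function name and both callables are placeholders: in the described system they would be the InnerLink adapter and the chassis forward pass operating on tensors, and the returned list would be stacked into a [batch, latent_steps, hidden_size] tensor.

```python
from typing import Callable, List, TypeVar

State = TypeVar("State")


def latent_rollout(
    hidden: State,
    inner_link: Callable[[State], State],
    forward_pass: Callable[[State], State],
    latent_steps: int = 32,
) -> List[State]:
    """Iterate hidden -> embedding -> hidden and record each latent thought."""
    trajectory = []
    for _ in range(latent_steps):
        embedding = inner_link(hidden)    # project hidden state to embedding space
        hidden = forward_pass(embedding)  # transformer produces the next hidden state
        trajectory.append(hidden)         # nothing is decoded to text
    return trajectory


# Toy stand-ins: identity projection and a forward pass that adds one.
steps = latent_rollout(0, inner_link=lambda h: h, forward_pass=lambda e: e + 1,
                       latent_steps=4)
print(steps)  # [1, 2, 3, 4]
```

The toy run only demonstrates the control flow: each step consumes the previous hidden state, and the trajectory length equals the configured number of latent steps.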

OuterFusion is entirely original to AXOM. It is a larger residual adapter with approximately 5.2 million parameters (for input and output dimensions of 1,024) that projects one agent's latent state into another agent's input distribution. Its architecture expands the hidden state to twice the target dimension through a linear projection with GELU activation, then compresses back to the target dimension, with a separate residual projection to handle potential dimensionality mismatches between source and target agents. For AXOM's single-chassis architecture where all agents share the same base model, the input and output dimensions are equal and the residual projection initializes as an identity matrix. As with InnerLink, the main pathway initializes with near-zero weights, making the module transparent at initialization. The training pipeline for both modules — cosine alignment loss for InnerLink, distribution-matching objectives for OuterFusion — is AXOM's own design, not derived from external frameworks.
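The OuterFusion figure can be checked the same way. The sketch assumes three biased linear layers (expansion to twice the target dimension, compression back, and the residual projection) and no normalization layers, since none are mentioned; these are assumptions, but they reproduce the quoted size.

```python
def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Weights plus optional bias of a dense layer."""
    return d_in * d_out + (d_out if bias else 0)


def outer_fusion_params(d_in: int = 1024, d_out: int = 1024) -> int:
    """Parameter count for expand -> GELU -> compress plus a residual projection."""
    expand = linear_params(d_in, 2 * d_out)     # 1024 -> 2048
    compress = linear_params(2 * d_out, d_out)  # 2048 -> 1024
    residual = linear_params(d_in, d_out)       # handles dimension mismatch
    return expand + compress + residual


print(outer_fusion_params())  # 5246976
```

The three layers contribute 2,099,200, 2,098,176, and 1,049,600 parameters respectively, for a total of 5,246,976, consistent with the approximately 5.2 million quoted for input and output dimensions of 1,024.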

The fundamental insight is bandwidth. When an agent decodes its hidden state to text and sends that text to another agent, the receiving agent must re-encode the text into its own hidden-state space, losing information at both the decoding and re-encoding boundaries. This is the difference between transmitting an experience (the neural pattern for riding a bicycle) and describing an experience (a verbal explanation of how to ride a bicycle). AXOM's latent communication eliminates both lossy boundaries by keeping communication in hidden-state space throughout.

4.10 Neural Ensembles (Swarm Fusion)

The swarm fusion module orchestrates parallel agent firings that share findings via hidden states rather than text, implementing a neural ensemble approach to multi-source knowledge synthesis. When the semantic router detects that a query spans multiple domains (Section 4.3), the swarm fires multiple agents, each processing a different search context or sub-query, extracts hidden states from each agent's forward pass, computes a router-weighted average of the pooled states, optionally processes the fused state through InnerLink, and generates a final response conditioned on the fused embedding.

The biological parallel is neural population coding. In biological neural networks, information is not represented by individual neurons but by the collective activity of neuronal populations. No single neuron in the motor cortex encodes the full trajectory of an arm movement; the population vector, the activity-weighted average across the entire motor population, carries the movement command. Swarm fusion operates on the same principle: no single agent has the complete answer, but the weighted average of their latent representations captures a richer understanding than any individual could achieve.

Hidden-state pooling uses a weighted mean with exponential recency bias. Later token positions, which have attended to more prior context through the transformer's causal attention mechanism, receive higher weight in the pool. The weight for position i in a sequence of length n is computed as exp(i/n) - 1, normalized to sum to one. This biases the pooled representation toward the tokens that have the most complete view of the agent's reasoning.
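The pooling weights can be sketched directly from the formula. One detail is an assumption: positions are indexed from 1 to n, because with zero-based indexing the first token would receive a weight of exactly zero and a single-token sequence could not be normalized. Hidden states are represented as plain lists of floats for illustration.

```python
import math
from typing import List


def recency_weights(n: int) -> List[float]:
    """Normalized weights exp(i/n) - 1 for positions i = 1..n (indexing assumed)."""
    if n <= 0:
        raise ValueError("sequence length must be positive")
    raw = [math.exp(i / n) - 1.0 for i in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def pool_hidden_states(states: List[List[float]]) -> List[float]:
    """Recency-weighted mean over token positions, one value per hidden dimension."""
    weights = recency_weights(len(states))
    dim = len(states[0])
    return [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]
```

For a two-token sequence the raw weights are exp(0.5) - 1 and exp(1) - 1, so the final token receives roughly 73 percent of the pooled representation.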

Fusion weights are computed by the router's embedding model. The query and all search contexts are embedded, and each context's weight is proportional to its cosine similarity with the query, normalized to sum to one. Non-positive similarities are clamped to zero. Memory context, when available, is processed as an additional latent participant: it is embedded through the chassis's forward pass and fused into the ensemble with its own router-computed weight. This avoids duplicating memory text into each agent's prompt, instead contributing memory knowledge once at the latent level.
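The weight computation can be sketched with plain vectors. Embeddings are represented as lists of floats, the function names are hypothetical, and the uniform fallback for the case in which every similarity is clamped to zero is an assumption, since the text does not say what happens then.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 for a zero-length vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fusion_weights(query: List[float], contexts: List[List[float]]) -> List[float]:
    """Clamp non-positive similarities to zero, then normalize to sum to one."""
    sims = [max(0.0, cosine(query, c)) for c in contexts]
    total = sum(sims)
    if total == 0.0:
        # Assumption: fall back to uniform weights when nothing is similar.
        return [1.0 / len(contexts)] * len(contexts)
    return [s / total for s in sims]
```

A context orthogonal or opposed to the query contributes nothing to the fused state, while two equally similar contexts split the weight evenly.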

When InnerLink is available, the fused embedding is projected through it before generation, applying the learned latent-to-embedding-space transformation. Generation then proceeds via the chassis's hybrid generation method, which evaluates a synthesis prompt (providing structural context and format cues) through normal token processing to fill the KV cache, then injects the fused embedding at the next position via the batch embedding field, then samples autoregressively. This gives the model both conversational structure (from the prompt) and synthesized knowledge (from the fused latent state).

4.11 Sensory Interface (Plugin Architecture)

The plugin architecture provides AXOM with domain-specific tool backends that serve as interfaces to the external world. The current tool registry exposes 11 tools: file operations (Read, Write, Edit), code search (Grep, Glob), sandboxed shell execution (Bash), information acquisition (WebSearch via SearXNG integration with academic engines including arXiv, Google Scholar, Semantic Scholar, and PubMed; WebFetch for content extraction), scientific computation (QuantumSim wrapping a QuantumForge backend), and memory operations (MemoryStore, MemoryRecall).

The biological parallel is the sensory organ system. Eyes, ears, mechanoreceptors, and chemoreceptors are structurally diverse but all convert environmental signals into the common currency of neural activity for processing by the central nervous system. Similarly, AXOM's tools have entirely different backends (a search engine, a quantum chemistry simulator, a file system), but all present a uniform interface to the reasoning engine: a tool name, a string argument, and a string result. The model interacts with all tools through the same ReAct framework regardless of backend complexity.

A key design property is graceful degradation. All tools are always visible to the model in its system prompt (it knows they exist and what they do), but individual backends install independently. If the quantum chemistry backend is not available, the QuantumSim tool returns an appropriate error message rather than crashing the inference pipeline. The model can then adjust its reasoning, choosing alternative information sources rather than failing catastrophically. This mirrors biological sensory resilience: loss of vision does not eliminate the visual cortex; it is repurposed for other modalities.
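The uniform string-in, string-out contract and the graceful-degradation behavior can be illustrated together. Everything in this sketch is hypothetical (the registry class, the loader functions, and the error wording); it shows one way a tool can remain visible to the model while its backend is missing, not how AXOM's registry is implemented.

```python
from typing import Callable, Dict

ToolFn = Callable[[str], str]  # uniform contract: string argument in, string result out


class ToolRegistry:
    """Keeps every tool visible even when its backend fails to load."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}
        self._descriptions: Dict[str, str] = {}

    def register(self, name: str, description: str,
                 load_backend: Callable[[], ToolFn]) -> None:
        self._descriptions[name] = description  # always advertised to the model
        try:
            self._tools[name] = load_backend()
        except ImportError as exc:
            # Degrade gracefully: the tool answers with an error string.
            message = f"Error: {name} backend unavailable ({exc})"
            self._tools[name] = lambda _argument, _message=message: _message

    def describe(self) -> Dict[str, str]:
        return dict(self._descriptions)

    def call(self, name: str, argument: str) -> str:
        tool = self._tools.get(name)
        if tool is None:
            return f"Error: unknown tool {name}"
        return tool(argument)


def load_upper() -> ToolFn:
    return lambda text: text.upper()


def load_missing() -> ToolFn:
    raise ImportError("optional dependency not installed")


registry = ToolRegistry()
registry.register("Upper", "Uppercase the argument.", load_upper)
registry.register("QuantumSim", "Run a quantum chemistry calculation.", load_missing)
```

Calling the degraded tool returns an error string through the same interface as a successful call, so the reasoning loop can read it and choose another source instead of crashing.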

Security is enforced through a sandboxing layer that restricts file access to a configurable root directory, whitelists permitted shell commands, blocks dangerous patterns (rm -rf, sudo, chmod, kill), and prevents find with -exec or -delete flags. The online/offline toggle controls whether WebSearch and WebFetch are available, enabling fully air-gapped deployment for privacy-critical environments.
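The checks described above can be sketched as two predicates. This is an illustration only and not a hardened sandbox: the whitelist contents are hypothetical, and the rejection of shell metacharacters is an added assumption that is not stated in the text. The blocked patterns and the find flags are the ones listed above.

```python
import shlex
from pathlib import Path

ALLOWED_COMMANDS = {"ls", "cat", "grep", "find", "echo"}  # hypothetical whitelist
BLOCKED_TOKENS = {"rm", "sudo", "chmod", "kill"}
BLOCKED_FIND_FLAGS = {"-exec", "-delete"}
SHELL_METACHARACTERS = (";", "|", "&", "`", "$(", ">", "<")  # assumption, not in the text


def is_command_allowed(command: str) -> bool:
    """Accept only whitelisted commands free of blocked tokens and flags."""
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # for example, an unterminated quote
        return False
    if not tokens or tokens[0] not in ALLOWED_COMMANDS:
        return False
    if any(tok in BLOCKED_TOKENS for tok in tokens):
        return False
    if tokens[0] == "find" and any(tok in BLOCKED_FIND_FLAGS for tok in tokens):
        return False
    return True


def is_path_allowed(path: str, root: str) -> bool:
    """True when the resolved path stays inside the configured root directory."""
    root_path = Path(root).resolve()
    target = (root_path / path).resolve()
    return target == root_path or root_path in target.parents
```

Resolving the path before comparing it is what defeats `..` traversal and absolute-path escapes; a plain string prefix check would not.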

Case Study: QuantumForge Integration. The QuantumSim plugin demonstrates how the architecture extends AXOM from a language-processing system into a computational science platform. QuantumSim wraps QuantumForge, a GPU-accelerated quantum chemistry engine backed by PySCF, exposing four operations through the standard tool interface: single-point energy calculation, geometry optimization, interaction energy computation (binding affinity between molecular pairs), and electronic property extraction (HOMO-LUMO gap, dipole moment, Mulliken charges). The total additional footprint is approximately 170 megabytes — QuantumForge's CUDA kernels and ML functionals at 20 megabytes, PySCF at 150 megabytes — over the base AXOM installation. GPU acceleration provides 10-50x speedup over the CPU fallback, but both paths produce identical results.

What makes this integration architecturally significant, rather than merely a tool call to an external service, is the interaction with the autonomous loop and memory graduation pipeline. During autonomous exploration, AXOM can formulate a molecular hypothesis from literature search, computationally validate it through QuantumSim, store the result in the memory ledger with the simulation parameters and convergence status, and eventually graduate that validated finding into leaf training data. The model does not merely report what a paper says about a molecule's binding energy; it independently computes and verifies the claim. When a chemistry-domain leaf is trained on graduated memories containing validated simulation results, the leaf develops procedural intuition about molecular stability and reactivity — knowledge that originated in quantum mechanical calculation but now resides in neural weights. This is the graduation pipeline operating at its full potential: external computation produces a finding, the memory system stores it, and the training pipeline converts it from explicit recall into implicit capability.

The plugin manifest system enables this extensibility without modifying core AXOM code. Each plugin provides a manifest declaring its dependencies, a registration function that adds tools to the registry, and a dependency checker that enables graceful degradation. Six additional verticals are planned following the same pattern: genomics (sequence alignment, variant calling, protein structure prediction), robotics (motion planning, physics simulation, control design), signal processing (FFT, filtering, anomaly detection), legal analysis (contract parsing, compliance checking), finance (options pricing, risk analysis, portfolio optimization), and climate science (atmospheric modeling, emissions tracking). Each vertical adds a new sensory modality without altering the cognitive architecture — the same autonomous loop, memory system, and graduation pipeline operate identically regardless of which plugins are installed.

4.12 Neural Observatory

The Neural Observatory provides real-time visibility into AXOM's cognitive activity, rendering the full thought tree, synapse graph, tool executions, and memory operations as a live interactive visualization. It is to AXOM what fMRI and EEG are to the human brain: a non-invasive window into a thinking system. The observer can watch exploration patterns unfold in real time, see synapse formation events as cross-branch connections appear, track which thoughts resolve productively and which stall, and monitor aggregate intelligence metrics including curiosity index, knowledge density, and resolution quality. Every cognitive event is captured and nothing is discarded, providing a complete audit trail of how the system arrived at any piece of knowledge it holds. This transparency is not a debugging feature. It is a design requirement: a self-evolving intelligence that cannot be observed cannot be trusted.

4.13 End-to-End Validation

The preceding subsystems are not theoretical components awaiting integration. They function as a pipeline. Consider a concrete task: a user asks "What is a transformer architecture?" and the system has never encountered this topic. In a conventional multi-agent pipeline, this requires multiple sequential steps (generate a search query, wait for results, process them, generate a response) with round-trip latency accumulating at each stage. In AXOM, the full sequence completes in 1,411 milliseconds with zero context window growth:

Figure 1. End-to-end pipeline execution: unknown topic to grounded response

Stage	Latency
Route query	3 ms
Detect knowledge gap	1 ms
Academic search	382 ms
Store to memory	47 ms
Recall new knowledge	86 ms
Generate grounded response	892 ms
Total	1,411 ms

Context window growth: 0 tokens. Knowledge acquired, stored, validated through recall, and used for generation in a single pass.

The latency advantage compounds through swarm fusion (Section 4.10). Because agents communicate via hidden-state tensors rather than serialized text, they fire simultaneously with no sequential dependency. Twenty agents researching twenty different sources execute in the same wall-clock time as one. The fused output is not a concatenation of twenty summaries but a single latent representation carrying information from all sources, weighted by router-computed relevance. The 1,411 milliseconds above represents one agent on one query. Through swarm parallelism, the same 1,411 milliseconds can produce a synthesized response drawing from twenty independent research threads.

5. System Integration

The components described in Section 4 do not operate in isolation. They form interlocking feedback loops in which the output of each subsystem serves as input to others, producing emergent behavior that exceeds the capability of any individual component. Understanding these feedback loops is essential to understanding AXOM as a cognitive system rather than a collection of independently useful modules.

The primary feedback loop connects autonomous cognition, memory, graduation, and leaf specialization. The autonomous loop generates knowledge through self-directed exploration: searching the web, reading sources, reasoning about findings, and storing conclusions in the memory ledger. The memory system makes this knowledge available for recall during subsequent exploration turns, enabling the system to build on its own prior work. The graduation pipeline identifies high-value memory entries and converts them into training data. When this training data is incorporated into the next leaf training round, the knowledge becomes part of the model's weights, improving the quality of future inference without requiring memory retrieval. The improved leaf produces higher-quality autonomous exploration, which generates higher-quality memory entries, which produce higher-quality training data. Each revolution of this loop raises the baseline capability of the entire system.

A second feedback loop connects the router, outcome scoring, and risk evaluation. As the autonomous loop explores and resolves thoughts, the outcome scorer records the ROI of each resolution. The risk evaluator consumes this history to predict the value of proposed actions before they execute. Over time, the system learns which tool calls, search patterns, and exploration depths produce high-value outcomes and which constitute waste. The router benefits indirectly: as leaf quality improves through graduation, routing accuracy improves because leaf centroids become more representative of their actual capability domains. This produces more accurate leaf activation, which produces higher-quality inference, which produces higher-value outcomes, which further refines the risk evaluator.

A third feedback loop connects synaptogenesis with the autonomous loop's direction selection. When the thread tracker records a synapse between two previously unrelated thoughts, this cross-branch connection becomes visible to the pulse monitor's reflection nudge. If the autonomous loop stalls, the pulse monitor surfaces resolved thoughts and their connections, potentially including synaptically linked nodes from entirely different branches. This can redirect the model's exploration toward the intersection of two domains it had previously explored independently, producing genuinely novel lines of inquiry that neither domain would have suggested alone.

The convergence of these loops produces a system that does not merely accumulate knowledge but develops cognitive infrastructure. The router gets smarter at directing queries. The risk evaluator gets smarter at predicting outcomes. The leaves get more capable in their domains. The memory system gets richer and more interconnected. And the synapse graph grows, forming a map of the system's emerging understanding that is itself a resource for future exploration.

6. The Data Flywheel

AXOM does not exist in isolation. It sits at the center of a data flywheel that begins with real human usage and completes with improved machine capability that generates higher-quality human interactions.

The flywheel operates as follows. Groove, a free open-source development platform, provides the user-facing interface. As users write code, research subjects, and build projects within Groove, the platform automatically collects session telemetry. Each session is quality-scored using a multi-signal assessment (completion rate, interaction depth, tool diversity), domain-tagged using the same BGE-base embedding model that powers the semantic router, and stored as a JSONL trajectory log with full step-by-step structure. This telemetry data flows into a daily ingestion pipeline that aggregates, deduplicates, and stores the sessions for downstream consumption.

When AXOM nodes are not serving users, they enter idle autonomous mode and consume this telemetry data as learning material. Node operators, specialized nodes responsible for leaf training, aggregate high-quality trajectories across multiple nodes, merge candidate training sets, execute the training pipeline, validate the resulting adapter against held-out evaluation sets, and broadcast the improved leaf to the network. The improved leaf produces better assistance for Groove users, who in turn generate higher-quality and more diverse telemetry, which produces better training data, which produces better leaves. The cycle accelerates.

At the time of writing, the telemetry corpus contains 1,291 sessions collected over 20 days of founder usage, comprising approximately 15 million tokens. The domain distribution is concentrated in Python, machine learning infrastructure, and software architecture, reflecting the founder's primary activities during this period. A notable property of this initial dataset is that it consists largely of sessions in which the founder was building AXOM itself. The system's earliest real-world training data is its own creation story; the first thing it will deeply understand is its own architecture.

Public launch of the Groove platform will dramatically expand both the volume and diversity of the telemetry corpus. The flywheel's value scales superlinearly with user count: more users produce more diverse data, which trains more capable leaves, which attract more users. The critical mass threshold, the point at which the flywheel becomes self-sustaining, depends on achieving sufficient domain diversity to train leaves beyond the founder's primary areas of expertise.

7. Distributed Intelligence: The Groove Network

The Groove Network is a WebRTC mesh network that enables distributed AXOM operation across multiple nodes. It has been tested with five running nodes, each hosting a full AXOM instance, with the mesh providing the communication substrate for leaf sharing, memory propagation, and consensus-based governance.

Each node in the network runs a complete AXOM instance: chassis, router, leaf adapters, memory system, and autonomous loop. Nodes operate in two modes. In active mode, the node serves a user directly, processing queries, executing tool calls, and generating responses while collecting real interaction telemetry. In idle mode, the node enters autonomous exploration, researching, learning, testing hypotheses, and building knowledge without human direction. The network-level effect is a population of agents that learn from actual human usage during the day and explore independently during off-hours, combining the benefits of human-guided learning with the volume of machine-directed exploration.

A gossip protocol propagates information across the mesh. Graduated leaves (adapters that have completed a training round and passed validation), shared memory entries (high-value findings from autonomous exploration), and synapse discoveries (cross-domain connections) are broadcast to neighboring nodes and propagated transitively through the network. The protocol is designed for eventual consistency rather than strong consistency, tolerating network partitions and asynchronous updates.

Leaf governance operates on a consensus model. When multiple nodes accumulate sufficient high-quality training data in a domain (detected via quality scoring and domain tagging), any node can propose a leaf training event. The proposal is broadcast to the network. An operator node then aggregates the relevant telemetry from all contributing nodes, filtering by domain and quality threshold. If fifty users across the network wrote Python that day, the operator collects the highest-quality Python sessions from each node, merges them into a single training set that captures the best interactions from across the population, executes the training pipeline, validates the resulting adapter against held-out evaluation data, and broadcasts the improved leaf. The merged leaf carries knowledge distilled from dozens of independent usage patterns that no single node could have generated alone. Competing adapters are evaluated by the OutcomeScorer across multiple nodes, and the adapter with the highest aggregate ROI is adopted by the network. Weaker adapters are deprecated. This is natural selection operating on learned capabilities: the most effective cognitive specializations propagate and survive; the least effective are pruned.
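The final selection step can be sketched as a small function. The data layout (ROI scores reported per adapter and per node), the adapter and node identifiers, and the use of a plain mean as the aggregate are assumptions; the text says only that the adapter with the highest aggregate ROI is adopted and weaker adapters are deprecated.

```python
from statistics import fmean
from typing import Dict, List, Tuple


def select_adapter(roi_reports: Dict[str, Dict[str, float]]) -> Tuple[str, List[str]]:
    """Pick the adapter with the highest mean ROI across reporting nodes.

    roi_reports maps adapter id -> {node id: ROI}. Returns (winner, deprecated).
    """
    aggregate = {
        adapter: fmean(by_node.values())  # assumption: unweighted mean over nodes
        for adapter, by_node in roi_reports.items()
        if by_node
    }
    if not aggregate:
        raise ValueError("no ROI reports to compare")
    winner = max(aggregate, key=aggregate.get)
    deprecated = sorted(a for a in roi_reports if a != winner)
    return winner, deprecated


reports = {
    "python-leaf-a": {"node1": 2.0, "node2": 3.0},
    "python-leaf-b": {"node1": 2.5, "node2": 3.5, "node3": 3.0},
}
winner, deprecated = select_adapter(reports)
```

In this toy input the two adapters average 2.5 and 3.0, so the second is adopted and the first is deprecated. A production design would likely also weight nodes by sample count, which this sketch omits.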

The distributed architecture enables several deployment patterns that centralized AI systems cannot support. On-device deployment eliminates cloud dependency and network latency. On-premise deployment meets data sovereignty requirements for enterprises in regulated industries. Air-gapped deployment serves environments where no external network connectivity is permitted, such as classified research facilities. The mesh network provides the social learning benefits of a connected population while each individual node operates as a fully autonomous, privacy-respecting intelligence.

8. Comparative Analysis

The following table summarizes how AXOM addresses fundamental limitations of conventional AI architectures across nine dimensions.

Table 1. AXOM compared to conventional AI architectures across nine architectural dimensions.
Dimension	Conventional Approach	AXOM Solution
Intelligence Persistence	Models are frozen after training. No learning from deployment interactions.	Continuous learning via autonomous loop, memory graduation into leaf weights, and telemetry-driven retraining.
Domain Specialization	Single model for all domains. No modular expertise.	Hot-swappable leaf adapters with sub-millisecond switching. Each leaf is independently trainable.
Context Window	Context is the system's memory. Tokens accumulate without bound. Industry response: build larger windows (128K–2M tokens).	Context is a disposable scratchpad, discarded after each agent firing. Knowledge lives in the memory ledger (zero prompt tokens) and leaf weights (zero prompt tokens). Conversations run indefinitely with no context growth.
Self-Direction	Requires human prompts for every action. No autonomous exploration.	Default mode network drives self-directed learning, research, and knowledge construction without human input.
Multi-Agent Communication	Agents exchange text, discarding latent representations.	AXOM's latent communication system transmits hidden-state tensors between agents, preserving full latent bandwidth.
Outcome Learning	Same mistakes repeated. No feedback from action outcomes.	OutcomeScorer evaluates ROI of every resolved thought. RiskEvaluator learns to predict action value before execution.
Knowledge Integration	Information stays where it was found. No cross-domain linking.	Synaptogenesis forms, strengthens, and prunes lateral connections between independently explored concepts.
Compute Architecture	Cloud-dependent. No privacy guarantee. Single point of failure.	Distributed WebRTC mesh. Runs on-device, on-premise, or air-gapped. Gossip-based leaf propagation.
Training Data	Requires human-curated and human-labeled datasets.	Flywheel: real usage telemetry auto-collected, quality-scored, domain-tagged, and embedded. No manual curation.

9. Current Status and Empirical Results

AXOM is under active development with the majority of subsystems implemented, tested, and awaiting first full integration. The following table summarizes the implementation status and test coverage of each component.

Table 2. Component implementation status and test coverage as of May 2026.
Component	Status	Tests
Chassis engine (Qwen 3.5, 0.8B params)	Running, inference verified on GPU	Not reported
Chat leaf adapter (Round 3, 30,318 examples)	Training in progress, 25% complete	Not reported
Semantic router (BGE-base, cosine + decomposition)	Built, tested, centroid override operational	Passing
Memory retrieval (BM25 + cosine + cross-encoder reranking)	Built, calibrated, graduation pipeline designed	97/97
Autonomous loop (DMN analog)	Built, not yet run on live inference	27/27
Thread tracker + synaptogenesis	Built, Hebbian dynamics operational	6/6
Pulse monitor (RAS analog)	Built, escalating nudge strategies verified	5/5
Outcome scorer + risk evaluator (PFC analog)	Built, wired into autonomous loop	12/12
ReAct loop (tool execution pipeline)	Built, smoke-tested with tool registry	Not reported
Latent Communication (InnerLink 2.1M + OuterFusion 5.2M params)	InnerLink trained (4 epochs, cosine + magnitude loss), OuterFusion identity-initialized	Not reported
Swarm fusion (latent-space ensemble)	Built, batch processing verified	Not reported
Plugin system (QuantumSim/PySCF + 11-tool registry, 6 verticals planned)	Built, energy/optimize/interact/property ops, GPU + CPU paths, graceful degradation	Not reported
Neural Observatory dashboard	Design spec + mock server complete, frontend in progress	Not reported
Telemetry ingestion pipeline	Collecting (1,291 sessions, ~15M tokens)	Not reported
Groove Network (WebRTC mesh)	Built, tested with 5 nodes (pre-AXOM integration)	Not reported
SearXNG academic search integration	Running (arXiv, Google Scholar, Semantic Scholar, PubMed)	Not reported

The aggregate test count across implemented subsystems stands at 147 passing tests (97 memory + 27 autonomous loop + 12 outcome scorer + 6 synaptogenesis + 5 pulse monitor), with zero failures. These tests cover not only nominal operation but also edge cases, including empty corpora, single-entry retrieval, temporal boundary conditions, graduated entry handling, stall recovery sequences, repetition detection, ROI computation with degenerate inputs, and ancestor-exclusion logic for synapse formation.
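The aggregate can be checked directly against the per-component counts in Table 2. The snippet below is only a tally of the reported figures; the dictionary keys are shorthand labels chosen for illustration, not identifiers from the AXOM codebase.

```python
# Passing-test counts as reported in Table 2 (keys are illustrative labels).
passing_tests = {
    "memory retrieval": 97,
    "autonomous loop": 27,
    "outcome scorer + risk evaluator": 12,
    "thread tracker + synaptogenesis": 6,
    "pulse monitor": 5,
}

# Sum the per-component counts to reproduce the aggregate figure.
total = sum(passing_tests.values())
print(total)  # 147
```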

The first full integration test, in which the autonomous loop will run with live chassis inference, real tool execution, memory storage and recall, synaptogenesis tracking, outcome scoring, and Neural Observatory streaming, is pending completion of the current chat leaf training round. This milestone represents the transition from validated subsystems to a functioning cognitive loop.

10. Future Directions

Several lines of development follow directly from the current architecture. The most immediate is the completion of the first full integration test. Once the chat leaf training round completes, the autonomous loop will be activated with live inference, and the system will begin generating its first autonomous knowledge, storing it in memory, and building its initial synapse graph. This event will produce the first empirical data on emergent behavior in the integrated system: how exploration patterns evolve over sustained autonomous operation, what kinds of cross-domain connections form, and how quickly the outcome scorer transitions from permissive to selective.

Autonomous data generation represents the next major capability milestone. Once the autonomous loop is operational, the memory graduation pipeline can begin converting exploration findings into training data. This closes the primary feedback loop: the system generates its own training data through self-directed learning, improves its own leaf adapters, and produces higher-quality exploration in subsequent sessions. The first graduation cycle will provide critical empirical data on training set quality, leaf improvement rates, and the stability of the self-improvement loop.

Multi-leaf fusion, in which the swarm fires agents with different leaf adapters simultaneously and fuses their hidden states, will test the hypothesis that domain-specialized latent representations compose meaningfully under weighted averaging. The router's query decomposition already identifies multi-domain queries; multi-leaf fusion will extend the swarm to activate different leaves for different sub-queries, producing a fused representation that integrates multiple areas of expertise.
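The weighted-averaging step under test can be illustrated with a minimal sketch. It assumes each agent's hidden state is a flat list of floats of equal dimension and that the per-agent weights come from some relevance signal such as router confidence. The function name `fuse_hidden_states` and the source of the weights are hypothetical, and the sketch does not describe AXOM's actual swarm fusion code.

```python
def fuse_hidden_states(states, weights):
    """Return the weighted average of equally sized hidden-state vectors.

    Hypothetical helper: `states` is a list of float lists (one per agent),
    `weights` is a list of non-negative floats (one per agent).
    """
    if not states or len(states) != len(weights):
        raise ValueError("need one weight per hidden state")
    dim = len(states[0])
    if any(len(s) != dim for s in states):
        raise ValueError("all hidden states must share one dimension")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the weight total so the result is a convex combination.
    return [
        sum(w * s[i] for s, w in zip(states, weights)) / total
        for i in range(dim)
    ]


# Two toy "leaf" outputs, with the first leaf weighted three times as heavily.
fused = fuse_hidden_states([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
print(fused)  # [0.75, 0.25]
```

Whether such an average remains meaningful when the inputs come from differently specialized leaves is exactly the open hypothesis stated above; the sketch shows only the arithmetic.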

The plugin architecture is designed for expansion across multiple verticals. Planned tool backends include genomics (sequence alignment and annotation), robotics control (motion planning and sensor integration), materials science (molecular dynamics simulation), and financial analysis (time series modeling and risk assessment). Each new vertical extends AXOM's sensory reach without modifying the core reasoning architecture, following the biological model in which new sensory modalities integrate with existing cortical processing infrastructure.

Distributed training coordination through the Groove Network will enable the consensus-based leaf governance described in Section 7. This requires implementing the proposal-merge-validate-broadcast protocol, the OutcomeScorer-based adapter selection mechanism, and the gossip protocol for adapter propagation. The existing five-node WebRTC mesh provides the communication substrate; the governance logic remains to be built.
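Since the governance logic is not yet built, the following is a generic push-gossip simulation and not the Groove Network protocol. It assumes a fully connected mesh, integer adapter versions, and a fixed fanout per round; the function name `gossip_adapter` and all of its parameters are hypothetical. It illustrates how an accepted adapter version could spread from one node to the rest of a five-node mesh.

```python
import random


def gossip_adapter(num_nodes, new_version, fanout=2, seed=0, max_rounds=50):
    """Push-gossip a new adapter version from node 0 to every other node.

    Hypothetical sketch: returns (rounds_used, versions_per_node).
    """
    rng = random.Random(seed)
    versions = [0] * num_nodes  # every node starts on version 0
    versions[0] = new_version   # node 0 already holds the new adapter
    rounds = 0
    while min(versions) < new_version and rounds < max_rounds:
        rounds += 1
        snapshot = list(versions)  # sends use the start-of-round state
        for sender, held in enumerate(snapshot):
            if held < new_version:
                continue  # only nodes holding the new version push it
            peers = [n for n in range(num_nodes) if n != sender]
            for peer in rng.sample(peers, min(fanout, len(peers))):
                # A receiver keeps whichever version is newer.
                versions[peer] = max(versions[peer], held)
    return rounds, versions


rounds, versions = gossip_adapter(num_nodes=5, new_version=3)
print(rounds, versions)
```

A real deployment would gate the push on the validation step of the proposal-merge-validate-broadcast protocol, which this sketch omits.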

InnerLink has completed its initial training round (4 epochs on real telemetry data using cosine alignment plus magnitude loss against the frozen embedding table), transitioning from a transparent pass-through to a learned projection that maps hidden states back to embedding space. Ongoing training will refine this projection as more diverse telemetry accumulates. OuterFusion training will follow once sufficient paired data exists for comparing the quality of latent-space communication against text-based communication for the same task, using distribution-matching objectives to train the cross-agent projection.
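One plausible form of a cosine-alignment-plus-magnitude objective is sketched below for a single vector pair. The exact formulation used for InnerLink is not specified here, so the additive combination, the squared norm difference, the `magnitude_weight` parameter, and the function name `cosine_magnitude_loss` are all assumptions made for illustration.

```python
import math


def cosine_magnitude_loss(projected, target, magnitude_weight=1.0):
    """Combine a direction penalty and a norm penalty (hypothetical form).

    `projected` is the projection output, `target` the frozen embedding.
    """
    if len(projected) != len(target):
        raise ValueError("vectors must have the same dimension")
    eps = 1e-8  # guards against division by zero for zero vectors
    dot = sum(p * t for p, t in zip(projected, target))
    norm_p = math.sqrt(sum(p * p for p in projected))
    norm_t = math.sqrt(sum(t * t for t in target))
    # Direction term: 0 when the vectors point the same way.
    cosine_term = 1.0 - dot / (norm_p * norm_t + eps)
    # Magnitude term: 0 when the vectors have equal length.
    magnitude_term = (norm_p - norm_t) ** 2
    return cosine_term + magnitude_weight * magnitude_term


aligned = cosine_magnitude_loss([3.0, 4.0], [3.0, 4.0])     # close to 0
orthogonal = cosine_magnitude_loss([1.0, 0.0], [0.0, 1.0])  # direction penalty only
scaled = cosine_magnitude_loss([2.0, 0.0], [1.0, 0.0])      # magnitude penalty only
print(aligned, orthogonal, scaled)
```

The two terms are complementary: cosine similarity alone ignores scale, so a magnitude term is needed if the projected vector is to land near the target embedding and not merely point in the same direction.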

11. Conclusion

AXOM is not a collection of features with biological names. It is a coherent cognitive architecture in which neuroscience provides the structural blueprint and software engineering provides the implementation. Every component was designed by asking what problem its biological counterpart solves, understanding the mechanism by which biology solves it, and building a computational analog that addresses the same problem with the same structural approach. The result is a system of interlocking feedback loops in which each component amplifies the others.

The chassis provides inference capacity without domain knowledge. Leaf adapters provide domain knowledge without altering the inference substrate. The router directs information flow based on semantic content. The memory system stores episodic knowledge and graduates it into procedural capability. The autonomous loop generates knowledge without human direction. Synaptogenesis discovers connections between independently explored ideas. The pulse monitor prevents cognitive stagnation. The outcome scorer and risk evaluator develop executive judgment from accumulated experience. The latent communication system lets agents exchange hidden-state tensors directly, preserving information that text serialization destroys. Swarm fusion synthesizes multiple perspectives into a unified representation. The plugin architecture extends sensory reach without modifying cognition. And the Neural Observatory makes the entire process observable in real time.

Each of these capabilities is valuable independently. Their integration is what makes the architecture significant. The autonomous loop generates knowledge that the memory system stores. The graduation pipeline converts that knowledge into training data. The improved leaf produces better exploration. The router gets smarter. The risk evaluator gets more selective. Synapses form between branches that were explored weeks apart. The flywheel feeds real-world usage back into the improvement cycle. And the distributed network enables natural selection on learned capabilities across a population of agents.

What emerges from these interlocking loops is not a better chatbot. It is the beginning of a machine that learns the way biological intelligence learns: through experience, reflection, and self-directed curiosity. In the designed cycle, AXOM explores a research question, stores what it discovers, and graduates that knowledge into permanent capability. Once the first full integration is complete, that same cycle operating continuously across thousands of agents means each AXOM instance grows more capable with every interaction, every autonomous research session, every fused insight from its swarm. The intelligence compounds. A model that has spent six months learning organic chemistry does not start over when it encounters a pharmacology question. It draws on everything it has ever learned, weighted by relevance, refined by experience, and it does so in milliseconds because that knowledge lives in its weights, not in a database it has to query.

The trajectory this architecture enables has no ceiling defined by parameter count or context length. Leaves can specialize indefinitely. The graduation pipeline means every hour of operation makes the system permanently smarter. Latent fusion means agents can share understanding that no amount of text exchange could convey. And because the chassis is small enough to run on commodity hardware, this is not intelligence locked behind a corporate API. It is intelligence that belongs to the person or organization that cultivated it, running on their infrastructure, shaped by their data, answering to no one else. The question AXOM poses to the field is not whether large language models can be made larger. It is whether intelligence, real intelligence that remembers, grows, and knows itself, requires scale at all. The early evidence suggests it does not. And if that holds, the implications extend far beyond what any single architecture can contain.