1. Abstract
This paper presents AXOM, a self-evolving machine intelligence architecture in which each computational subsystem is mapped to a specific neuroanatomical structure, not as metaphor but as an engineering methodology. Built on a 0.8-billion-parameter hybrid DeltaNet/Attention transformer chassis, AXOM employs hot-swappable LoRA leaf adapters for domain specialization, a three-phase memory retrieval pipeline with temporal decay and graduation from explicit recall to implicit weight consolidation, and an autonomous cognition loop that drives self-directed learning without human prompts. Inter-agent communication occurs in latent space through a trainable projection system that transmits hidden-state tensors rather than decoded text, preserving information bandwidth that text serialization destroys. A synaptogenesis module passively tracks cross-branch conceptual connections discovered during autonomous exploration, forming, strengthening, and pruning lateral links according to Hebbian dynamics. An outcome scoring system evaluates the return on investment of every resolved thought, feeding an executive risk evaluator that learns which actions produce value and which constitute waste. The architecture is supported by a data flywheel: real usage telemetry from a free open-source platform is quality-scored, domain-tagged, and embedded, then consumed by idle AXOM nodes for continuous leaf improvement. A WebRTC mesh network enables distributed operation with consensus-based leaf governance and natural selection on learned capabilities. Empirical validation across the implemented subsystems yields 97 of 97 memory retrieval tests, 27 of 27 autonomous loop tests, and 12 of 12 outcome scorer tests passing. The architecture constitutes a coherent, biologically grounded system in which each component amplifies the others through feedback loops that produce emergent cognitive behavior.
2. Introduction
Contemporary artificial intelligence systems, despite their remarkable generative capabilities, share a fundamental architectural limitation: they are static. A large language model trained on a fixed corpus and deployed behind an API endpoint represents frozen intelligence. It cannot learn from its interactions, specialize to new domains without retraining, or direct its own cognitive development. Multi-agent frameworks address some of these constraints by orchestrating multiple model instances, but they rely on text serialization for inter-agent communication, a lossy channel that discards the rich latent representations computed internally during inference.
AXOM approaches this problem from a different direction. Rather than layering increasingly complex orchestration atop architecturally undifferentiated model instances, AXOM asks what the most successful intelligence architecture known to science, the mammalian brain, actually does, and then builds computational analogs to those mechanisms. The result is not a collection of features with biological names. It is an integrated system in which the choice of neuroanatomical mapping at each level produces specific engineering advantages that would not have been discovered through conventional software architecture alone.
The architecture rests on a 0.8-billion-parameter transformer chassis that provides raw inference capacity without domain knowledge, analogous to the brainstem providing signal processing capacity without cognitive content. Domain expertise is supplied by lightweight LoRA adapters, each carrying approximately 43 million trainable parameters, that can be activated in sub-millisecond time, analogous to cortical column specialization. A semantic router directs incoming queries to the appropriate adapter, just as the thalamus relays sensory input to the corresponding cortical region. Memory operates on a hippocampal-neocortical consolidation model: recent episodic memories reside in a structured ledger (short-term memory), while a graduation pipeline mines high-value memories into leaf training data, literally converting explicit recall into implicit capability baked into model weights (long-term memory). When the system is not serving a user, an autonomous cognition loop, modeled on the brain's default mode network, drives self-directed exploration, research, and knowledge construction without human prompts or scripted question banks.
Two architectural innovations distinguish AXOM from prior work in modular and multi-agent AI. First, AXOM's latent communication system enables agents to communicate in hidden-state space rather than through text, transmitting the full latent representation of a thought rather than its lossy decoded form. Second, the combination of autonomous exploration with synaptogenesis, a module that tracks genuine cross-domain conceptual connections discovered during self-directed learning, produces emergent insight that no individual component could generate in isolation.
This paper describes the architecture in full, grounds each component in its neuroanatomical analog, presents empirical validation of the implemented subsystems, and discusses the data flywheel and distributed computing infrastructure that enable continuous self-improvement at scale.
3. Design Philosophy
The use of neuroscience as an engineering blueprint in AXOM is neither decorative nor post-hoc. Every component was designed by first asking what problem its biological counterpart solves, understanding the mechanism by which biology solves it, and then implementing a computational analog that solves the same problem using the same structural approach. The rationale is straightforward: the human brain is the product of approximately 600 million years of evolutionary optimization for general intelligence under resource constraints. The solutions it has converged on, from hierarchical sensory routing to memory consolidation through sleep, represent engineering decisions that have been tested against survival pressure across billions of individual instances. Ignoring these solutions and designing AI architectures from first principles alone is not merely missing an opportunity; it is discarding the largest empirical dataset on intelligence architecture that exists.
This approach yields concrete engineering benefits at every level. The decision to separate the chassis (brainstem) from leaf adapters (cortical columns) did not emerge from conventional software decomposition; it emerged from observing that the brainstem processes neural signals without knowing what they mean, while the cortex provides meaning through specialized, experience-dependent structures. The result is an architecture where the base model never needs retraining, domain knowledge is modular and independently trainable, and specialization switching incurs sub-millisecond latency, a design unlikely to emerge from conventional model-serving considerations alone.
Similarly, the memory graduation pipeline did not arise from an abstract desire for "continuous learning." It arose from the neuroscientific observation that human memory consolidation proceeds from hippocampal episodic storage to neocortical procedural embedding. The engineering implementation, converting memory ledger entries into synthetic training data for leaf weight updates, directly mirrors this biological process and solves a problem that most AI memory systems leave unaddressed: how to transition from retrieval-augmented generation, where knowledge sits in an external store, to parametric knowledge, where the model has internalized the information and no longer needs to look it up.
A third biological insight drives what may be the architecture's most counterintuitive engineering decision: the treatment of context as disposable scratchpad rather than accumulated history. Human working memory, mediated by the prefrontal cortex, is famously limited to approximately seven items (Miller, 1956). Yet this limitation does not constrain human intelligence, because the brain does not use working memory as its knowledge store. Working memory is a temporary workspace for the current cognitive task; long-term knowledge resides in synaptic weights (neocortical LTM) and episodic traces (hippocampal STM). The conscious workspace is tiny. The actual knowledge base is vast. AXOM follows this principle exactly. Each inference call receives a minimal, purpose-built context window: the current thought, a few ancestor nodes for orientation, and recent turn history (default five turns). The context window is not the system's memory; it is the system's attention. Persistent knowledge lives in the memory ledger (explicit recall) and leaf weights (implicit capability). This is why a 0.8-billion-parameter model with a 262,144-token context window works: the system never needs most of that context capacity, because information is stored in memory or weights rather than accumulated in the prompt. Every agent firing, whether in the autonomous loop, the ReAct tool-execution cycle, or the swarm fusion pipeline, operates on an isolated, disposable context that is discarded after use. No context accumulates across firings. This makes each inference maximally efficient (minimal prompt tokens, minimal KV cache pressure) and eliminates the context window exhaustion that plagues long-running multi-agent systems.
The design philosophy can be summarized as: neuroscience provides the architecture; engineering provides the implementation. Biology tells us what to build and why. Software engineering tells us how to build it efficiently and correctly.
4. Architecture
4.1 Chassis Engine (Brainstem)
The chassis engine is the computational substrate upon which all cognitive function operates. It wraps a single Qwen 3.5 base model, a 0.8-billion-parameter transformer employing a hybrid DeltaNet/Attention architecture with a 262,144-token context window and a quantized footprint of 775 megabytes (Q8_0). The chassis provides three core capabilities: standard text generation, hidden-state extraction from any forward pass, and latent-conditioned generation where hidden-state embeddings are injected directly into the transformer's processing pipeline, bypassing the token embedding lookup entirely.
The biological parallel is the brainstem, the structure that handles basic neural signal processing, sensory relay, and autonomic regulation without performing higher cognition. The brainstem does not "know" anything; it provides the processing infrastructure on which knowledge operates. Likewise, the chassis processes tokens, manages the KV cache, and executes forward passes, but carries no domain-specific knowledge in its base weights. All specialization is supplied externally through leaf adapters.
A critical capability is sub-millisecond LoRA adapter hot-swapping. The chassis maintains a dictionary of loaded adapter handles and activates them via the llama.cpp adapter API, setting adapter pointers and scale factors without reloading the base model or clearing the inference state. This enables the system to switch cognitive specializations between consecutive inference calls with negligible latency, analogous to how the brainstem routes neural signals to different cortical regions without itself undergoing structural change.
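The bookkeeping side of hot-swapping can be sketched as follows. This is a minimal illustration, not the production code: the real chassis calls the llama.cpp adapter API to set adapter pointers and scale factors on the loaded model, which is mocked here by a plain registry, and the class and method names are hypothetical.

```python
class AdapterRegistry:
    """Tracks loaded LoRA adapter handles and which one is active.

    Sketch of the chassis-side bookkeeping only; in the real system,
    activation calls into the llama.cpp adapter API rather than merely
    recording state.
    """

    def __init__(self):
        self._handles = {}  # leaf_id -> opaque adapter handle
        self.active = None  # (leaf_id, scale) currently applied

    def load(self, leaf_id, handle):
        # Loading happens once; later activation is just a pointer swap,
        # with no base-model reload and no inference-state reset.
        self._handles[leaf_id] = handle

    def activate(self, leaf_id, scale=1.0):
        # Record which adapter (and scale) applies to the next forward pass.
        if leaf_id not in self._handles:
            raise KeyError(f"leaf '{leaf_id}' not loaded")
        self.active = (leaf_id, scale)
        return self._handles[leaf_id]
```

Because activation touches only a handle and a scale factor, switching cognitive specializations between consecutive inference calls stays far below the cost of any model reload.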
For latent communication, the chassis exposes a hidden-state extraction interface that runs a forward pass with embedding mode enabled, retrieves the post-final-norm hidden states for every token position as a float32 array of shape [n_tokens, n_embd], and returns them for downstream latent-space processing. A batch variant processes multiple prompts in sequence with KV cache clearing between them, enabling efficient parallel agent execution within a single GPU context. When total tokens across a batch exceed the context window, the engine transparently partitions into sub-batches, extracts hidden states per partition, and reassembles them into a flat result list, making sub-batching invisible to the caller.
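The transparent sub-batching step can be illustrated with a greedy partitioner. This is a sketch under stated assumptions: the function name and the greedy first-fit strategy are ours, and the production engine may partition differently.

```python
def partition_batch(token_counts, context_window):
    """Greedily split a batch of prompts into sub-batches whose total
    token count fits the context window. Returns lists of prompt indices
    in original order, so per-partition results can be reassembled into
    a flat list that hides the sub-batching from the caller."""
    batches, current, total = [], [], 0
    for i, n in enumerate(token_counts):
        if n > context_window:
            raise ValueError(f"prompt {i} alone exceeds the context window")
        if current and total + n > context_window:
            batches.append(current)
            current, total = [], 0
        current.append(i)
        total += n
    if current:
        batches.append(current)
    return batches
```

For example, three prompts of 100, 200, and 300 tokens against a 450-token window yield the partitions `[[0, 1], [2]]`; the engine clears the KV cache between partitions and concatenates the extracted hidden states.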
4.2 Leaf Adapters (Cerebral Cortex)
Leaf adapters are lightweight, domain-specialized LoRA modules that overlay the chassis's base weights to provide expert-level capability in specific knowledge domains. Each leaf carries approximately 43 million trainable parameters, roughly 5 percent of the chassis, implemented as rank-64 LoRA matrices targeting all 12 projection modules within the transformer architecture. The system supports simultaneous loading of multiple leaves into GPU memory, with the router selecting the appropriate adapter per query.
The biological parallel is the cerebral cortex, specifically the cortical column as a unit of functional specialization. Cortical columns are structurally identical across brain regions, each composed of approximately 100,000 neurons in a consistent six-layer arrangement, yet they are functionally distinct: a column in the primary visual cortex (V1) processes oriented edges, while a structurally identical column in Broca's area processes syntactic structure. The differentiation arises entirely from experience-dependent synaptic weight patterns. Leaf adapters follow the same principle. Every leaf uses an identical LoRA structure (same rank, same target modules, same parameter count), but each develops distinct functional specialization through domain-specific training data.
Each leaf is defined by a data model comprising a unique identifier, a domain label, a natural language domain description (used by the router for centroid computation), a system prompt that establishes the leaf's cognitive orientation, an optional centroid file for data-driven routing refinement, and a set of negative domain descriptors that help the router avoid false activations. The base leaf, which activates when no specialized adapter is appropriate, operates as a planning and orchestration agent that decomposes complex tasks, routes work to specialists, and synthesizes results.
The training pipeline supports both supervised fine-tuning (SFT) and direct preference optimization (DPO) stages. The current chat leaf, in its third training round at the time of writing, processes 30,318 training examples drawn from real usage telemetry collected over 20 days of founder interaction with the platform.
4.3 Semantic Router (Thalamus)
The semantic router classifies incoming queries by domain and directs them to the appropriate leaf adapter. It operates on BGE-base embeddings (768 dimensions) computed via a SentenceTransformer model, comparing each query's embedding against precomputed leaf centroids using cosine similarity. The router returns the best-matching leaf, a confidence score, a ranked list of alternatives, and a set of signal flags indicating whether the query requires memory recall, spans multiple domains, or should trigger swarm execution.
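The core routing step reduces to a cosine ranking against precomputed centroids. A minimal sketch follows; the multi-fire threshold value here is illustrative, since the production setting is not stated in the text.

```python
import math

def _cos(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def route(query_emb, centroids, multi_fire_threshold=0.35):
    """Rank leaf centroids by cosine similarity to the query embedding.

    `centroids` maps leaf_id -> centroid vector. Returns the best leaf,
    its confidence, the ranked alternatives, and a multi-domain flag that
    fires when more than one leaf clears the threshold."""
    scores = {leaf: _cos(query_emb, c) for leaf, c in centroids.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, confidence = ranked[0]
    multi_domain = sum(s >= multi_fire_threshold for _, s in ranked) > 1
    return best, confidence, ranked[1:], multi_domain
```

In the real router the vectors are 768-dimensional BGE-base embeddings, and the returned flags additionally carry memory and swarm signals.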
The biological parallel is the thalamus, the brain's central relay station. All sensory input except olfaction passes through the thalamus en route to the cortex. The thalamus does not process the content of sensory signals; it routes them. Visual input is relayed to the visual cortex, auditory input to the auditory cortex, somatosensory input to the somatosensory cortex. The semantic router performs an identical function: it examines the semantic content of a query just enough to determine which cortical region (leaf adapter) should process it, then activates that adapter.
A notable capability is query decomposition. When the router detects that a query spans multiple domains (more than one leaf exceeds the multi-fire confidence threshold), it splits the query at natural clause boundaries, routes each sub-query independently, and signals the pipeline to invoke swarm execution, where parallel agent firings process the sub-queries simultaneously and fuse their results in latent space. This decomposition uses a clause-splitting regex that identifies coordinating conjunctions, comma-delimited phrases, and sentence boundaries, retaining only chunks that contain at least three words to avoid fragmenting semantically atomic expressions.
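The splitting rule can be sketched as below. The exact production regex is not given in the text, so this pattern, breaking on sentence boundaries, commas, and coordinating conjunctions, is illustrative.

```python
import re

# Illustrative clause-splitting pattern: sentence terminators, commas,
# and whitespace-delimited coordinating conjunctions.
_SPLIT = re.compile(r"[.;?!]|,\s+|\s+(?:and|but|or|so)\s+", re.IGNORECASE)

def decompose(query, min_words=3):
    """Split a multi-domain query at natural clause boundaries, keeping
    only chunks of at least `min_words` words to avoid fragmenting
    semantically atomic expressions."""
    chunks = [c.strip() for c in _SPLIT.split(query)]
    return [c for c in chunks if len(c.split()) >= min_words]
```

Each surviving chunk is then routed independently, and the pipeline invokes swarm execution when more than one sub-query remains.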
The router also integrates memory signals. A dedicated centroid built from 19 memory-trigger phrases (such as "remember," "recall," "we discussed," and "last time") detects when a query references prior conversation context. When the memory signal score exceeds the configured threshold and the query meets a minimum word-count criterion, the router flags the query for memory retrieval, ensuring that relevant prior knowledge is surfaced without requiring the user to explicitly request it.
Centroid quality is refined over time through data-driven overrides. When sufficient domain-specific data accumulates, a centroid computed from actual training examples can replace the description-based centroid, provided it meets a minimum similarity threshold of 0.3 against the original. This prevents catastrophic routing degradation from outlier centroids while allowing the system to adapt its routing decisions to the actual distribution of queries it encounters.
4.4 Memory System (Hippocampus to Neocortex)
The memory system implements a three-phase retrieval pipeline backed by an append-only JSONL ledger that stores structured entries with fields for content, speaker, entry type (fact, preference, decision, question, or opinion), extracted entities, topic identifiers, BGE-base embeddings, and graduation status. The three phases are recall, re-ranking, and temporal decay.
Phase one, recall, generates a broad candidate pool by taking the union of two independent ranking signals. Cosine similarity between the query embedding and all entry embeddings identifies semantically related entries, while an Okapi BM25 index (k1 = 1.2, b = 0.75) identifies keyword-relevant entries that semantic similarity might miss. The BM25 implementation employs two-layer noise filtering: a set of seed stopwords bootstraps filtering for small corpora below approximately 100 entries, while a dynamic IDF floor automatically suppresses any term appearing in more than 40 percent of documents once the corpus reaches sufficient scale. This makes the seed stopword list progressively irrelevant as the corpus grows, a self-calibrating property. The recall depth is fixed at 50 candidates per signal.
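The two-layer noise filter can be sketched as follows. The seed stopword list shown is illustrative; the 100-entry bootstrap boundary and the 40 percent document-frequency floor follow the text.

```python
from collections import Counter

# Illustrative seed stopwords; the production list may differ.
SEED_STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is"}

def noisy_terms(tokenized_docs, small_corpus=100, idf_floor_df=0.4):
    """Two-layer BM25 noise filter: seed stopwords carry small corpora;
    once the corpus is large enough, any term appearing in more than
    `idf_floor_df` of documents is suppressed regardless of the seed
    list, making the seed list progressively irrelevant."""
    n = len(tokenized_docs)
    if n < small_corpus:
        return set(SEED_STOPWORDS)
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))
    return {term for term, count in df.items() if count / n > idf_floor_df}
```

The self-calibrating property is visible here: a domain-specific term that becomes ubiquitous in the ledger is suppressed automatically, with no stopword list maintenance.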
Phase two, re-ranking, applies a cross-encoder model (ms-marco-MiniLM-L-6-v2) that receives the query and each candidate together as a single input, enabling token-level interaction between them. Bi-encoder approaches (including the cosine similarity used in phase one) compress each text into a fixed vector independently, losing the ability to reason about whether one text actually answers the other. The cross-encoder resolves this by attending across both texts simultaneously. Scores are min-max normalized to the [0, 1] range to ensure comparable scaling across retrieval batches.
Phase three, temporal decay, applies a Gaussian decay function to the relevance score: the final score equals the relevance score multiplied by exp(-decay_rate * days_ago^2). This weights recent memories higher while permitting distant but highly relevant memories to surface if their relevance score is sufficiently strong. The quadratic exponent ensures that decay accelerates for older memories rather than following a linear or exponential curve, matching the empirical observation that memory accessibility in biological systems follows a concave decay profile.
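Phases two and three reduce to a few lines each. A sketch follows; the decay rate shown is illustrative, since the production value is not stated.

```python
import math

def min_max_normalize(scores):
    """Phase two post-processing: scale cross-encoder scores to [0, 1]
    so rankings are comparable across retrieval batches."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def temporal_decay(relevance, days_ago, decay_rate=0.01):
    """Phase three: Gaussian decay, final = relevance * exp(-rate * days^2).
    The quadratic exponent makes decay accelerate with age, while a
    sufficiently relevant old memory can still surface."""
    return relevance * math.exp(-decay_rate * days_ago ** 2)
```

With `decay_rate=0.01`, a ten-day-old memory retains roughly 37 percent of its relevance score, while a fresh memory is untouched.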
The biological parallel spans the hippocampal-neocortical consolidation axis. The memory ledger functions as the hippocampus: it stores recent episodic memories in a structured, addressable format that supports rapid retrieval. Over time, a graduation pipeline identifies high-value entries and generates synthetic training data from them, producing question-thought-resolution trajectories suitable for supervised fine-tuning of leaf adapters. When this training data is incorporated into the next leaf training round, the knowledge transitions from explicit recall (looking up the entry in the ledger) to implicit capability (the answer is encoded in the leaf's weight matrices). This process directly parallels the biological transition from hippocampal episodic memory to neocortical procedural memory: a human first learns to ride a bicycle by consciously recalling each instruction, then gradually internalizes the skill until it becomes automatic. AXOM first retrieves a fact from the ledger, then eventually embeds that fact into its adapter weights, eliminating the retrieval step entirely.
The graduation pipeline generates three categories of training trajectories. Fact trajectories pair entity-targeted questions with the stored content as the target resolution, using varied question templates to prevent overfitting to a single phrasing. Inferential trajectories identify entries that share entities and generate cross-referencing questions that require synthesizing information from multiple sources. Negative trajectories use fabricated entities (such as "quantum noodle compiler" or "stochastic waffle optimizer") to train the model to correctly report the absence of knowledge rather than hallucinating. The negative ratio is configurable, defaulting to 30 percent of the total trajectory count.
The memory system has been validated with 97 of 97 retrieval tests passing, covering edge cases including empty corpora, single-entry retrieval, temporal range filtering, entity-based lookup, graduated entry handling, and BM25/cosine fusion accuracy.
4.5 Autonomous Cognition (Default Mode Network)
The autonomous loop is a self-directed continuous learning engine that operates when AXOM is not serving a user. Given a seed thought, the loop runs indefinitely: the model receives context (the current thought plus relevant memory recall), generates reasoning through structured tags (thought, action, observation, resolution), and executes tool calls (web search, content fetching, memory storage and recall); its output is then parsed for the next exploration direction, which feeds back as the next input. There are no scripted questions, no curriculum, and no human in the loop. The model drives everything; the loop module manages feedback and provides observability.
The biological parallel is the default mode network (DMN), the set of brain regions that activate during idle states: mind-wandering, self-reflection, daydreaming, and memory consolidation. The DMN is not inactivity. It is the brain's background processing mode, during which it consolidates recent experiences, explores hypothetical scenarios, and forms connections between disparate memories. Neuroimaging studies consistently show DMN activation during tasks that require integrating information across time and context. The autonomous loop serves the same function: when not engaged in directed user interaction, AXOM explores its own knowledge gaps, researches unfamiliar territory, stores findings in memory, and discovers connections between previously unrelated concepts.
The loop maintains a configurable context window of recent turns (default five) that provides the model with short-term conversational continuity without accumulating unbounded context. Ancestor nodes in the exploration tree are also surfaced, giving the model awareness of the intellectual path that led to the current thought. Critically, each turn's context is assembled fresh from these components and discarded after inference completes. The system does not maintain a growing conversation buffer. A thought at tick 500 receives the same-sized context as a thought at tick 5: the current question, three ancestor nodes, and five recent turns. All knowledge acquired across the preceding 495 ticks persists in the memory ledger and, after graduation, in leaf weights, not in the prompt. This scratchpad architecture means the autonomous loop can run indefinitely without context window pressure. The 262,144-token context capacity of the chassis is available to each individual turn for deep reasoning, not consumed by accumulated history. The default seed thought, "Who am I? What kind of intelligence am I?", is chosen to prompt existential and architectural self-exploration, but any seed can be injected.
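The fixed-size context assembly can be sketched as follows; the field labels and function name are illustrative.

```python
def build_context(current_thought, ancestors, recent_turns,
                  max_ancestors=3, max_turns=5):
    """Assemble the disposable per-turn context: the current thought,
    a few ancestor nodes for orientation, and recent turn history.
    Nothing else is carried over; the string is discarded after the
    inference call completes."""
    lines = [f"[thought] {current_thought}"]
    lines += [f"[ancestor] {a}" for a in ancestors[-max_ancestors:]]
    lines += [f"[turn] {t}" for t in recent_turns[-max_turns:]]
    return "\n".join(lines)
```

The key property is that the context size is bounded by the caps, not by the loop's lifetime: a turn at tick 500 produces the same number of context lines as a turn at tick 9, no matter how long the ancestor chain or turn history has grown.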
Next-direction extraction uses pattern matching to identify questions, curiosity expressions, and exploration intentions in the model's output. When the model poses a question (detected via interrogative syntax), that question becomes the next input. When the model expresses intent to explore a topic (detected via phrases like "let me investigate" or "this leads me to"), that expression becomes the next input. When no actionable direction is detected, the pulse monitor (Section 4.7) intervenes with a recovery nudge.
Thread divergence detection prevents the loop from treating a continuation of the same topic as a new branch. When the word overlap between the extracted direction and the current input exceeds 70 percent, the direction is classified as a continuation rather than a new thread, maintaining the linear exploration path rather than spawning a redundant branch.
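A minimal sketch of the divergence check, assuming overlap is measured relative to the extracted direction's word set (the text does not specify the denominator):

```python
def is_continuation(direction, current_input, threshold=0.7):
    """Classify an extracted direction as a continuation of the current
    thread (overlap above threshold) or a genuinely new branch."""
    d_words = set(direction.lower().split())
    if not d_words:
        return True  # nothing new to branch on
    c_words = set(current_input.lower().split())
    overlap = len(d_words & c_words) / len(d_words)
    return overlap > threshold
```

Continuations extend the linear exploration path in place; only directions that fall below the overlap threshold spawn a new node in the exploration tree.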
The autonomous loop has been validated with 27 of 27 tests passing, covering seed thought processing, direction extraction, thread spawning, context building, stall recovery, and multi-turn exploration chains.
4.6 Synaptogenesis (Connectome)
The thread tracker maintains a passive cognitive map of the autonomous exploration tree. Every thought is represented as a node with parent-child relationships forming the tree structure. Independently of this tree, a synaptogenesis module records lateral connections, synapses, between thoughts that discover shared knowledge despite belonging to different branches of the exploration tree.
A synapse forms when a thought in one branch triggers a memory recall that returns content originally stored by a thought in a different, non-ancestral branch. This event indicates a genuine conceptual connection between two independently explored ideas. The system explicitly prevents trivial connections: synapses cannot form between a node and its own ancestors or descendants, because those connections are already represented by the tree edges. Only cross-branch connections qualify as synapses.
The biological parallel is synaptogenesis and Hebbian learning. In biological neural networks, synapses form between neurons that fire in temporal proximity ("neurons that fire together wire together"), strengthen through repeated co-activation (long-term potentiation, LTP), and weaken through disuse (synaptic pruning). The AXOM synaptogenesis module follows the same dynamics. Initial synapse strength is set at formation. Repeated co-activation increases strength by 30 percent of the triggering strength, capped at 1.0 (LTP). Unused synapses decay according to an exponential function with a half-life of approximately 300 seconds and are pruned when their decayed strength falls below 0.05 or their age exceeds 3,600 seconds.
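The Hebbian dynamics can be sketched directly from the constants above, assuming half-life-style exponential decay as described ("a half-life of approximately 300 seconds"); the class shape is illustrative.

```python
class Synapse:
    """Lateral cross-branch link with Hebbian strengthen/decay/prune."""
    HALF_LIFE = 300.0      # seconds until strength halves without activation
    PRUNE_STRENGTH = 0.05  # decayed strength below this -> prune
    MAX_AGE = 3600.0       # seconds; older synapses are pruned outright
    LTP_FRACTION = 0.3     # fraction of triggering strength added on re-fire

    def __init__(self, strength, now=0.0):
        self.strength = strength
        self.formed_at = now
        self.last_activated = now

    def decayed_strength(self, now):
        dt = now - self.last_activated
        return self.strength * 0.5 ** (dt / self.HALF_LIFE)

    def activate(self, triggering_strength, now):
        # Long-term potentiation: repeated co-activation adds 30% of the
        # triggering strength, capped at 1.0.
        self.strength = min(1.0, self.decayed_strength(now)
                            + self.LTP_FRACTION * triggering_strength)
        self.last_activated = now

    def should_prune(self, now):
        return (self.decayed_strength(now) < self.PRUNE_STRENGTH
                or now - self.formed_at > self.MAX_AGE)
```

A synapse formed at strength 0.5 and never re-activated decays to 0.25 after one half-life and is pruned either when it falls below 0.05 or when it exceeds an hour of age.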
The synapse graph is a map of emergent understanding. It reveals where the system connected ideas that were explored independently, the computational equivalent of insight. These connections are tracked with strength, activation count, formation time, last activation time, and the triggering event type, providing a complete provenance trail for every cross-domain link.
The synaptogenesis module has been validated with 6 dedicated tests passing, covering formation, ancestor exclusion, strengthening, pruning, and statistics computation.
4.7 Arousal Regulation (Reticular Activating System)
The pulse monitor detects when the autonomous loop enters an unproductive state, whether from repetitive output, failure to generate actionable directions, or excessive exploration depth, and generates escalating recovery nudges to redirect the model's attention.
The biological parallel is the reticular activating system (RAS), a network of neurons in the brainstem that regulates arousal, attention, and the sleep-wake cycle. When the brain enters an unproductive loop (rumination, distraction, perseveration), the RAS modulates cortical arousal to shift attention. The pulse monitor performs an analogous function through three escalating strategies.
The first strategy, surface threads, activates on the initial stall detection. The monitor presents the model with its currently open threads (active and parked thoughts) and asks which interests it most, providing raw material for redirection without prescribing a specific path. The second strategy, suggest reflection, activates on the second consecutive stall. The monitor surfaces recently resolved threads with their conclusions and asks the model to identify connections and follow-up questions, leveraging existing knowledge to generate new directions. The third strategy, restart from root, activates on the third consecutive stall. The monitor reports how many topics have been explored and instructs the model to return to its fundamental purpose, providing a hard reset from any local minimum in the exploration space.
Stall detection operates on the content of the model's output. An output is considered actionable if it contains interrogative syntax, exploration verbs (learn, explore, investigate, search, understand), or explicit statements of intent (I want to know, this leads me to, next I should). If none of these indicators are present, the output is classified as non-actionable and the stall counter increments.
Repetition detection maintains a sliding window of the five most recent queries. If any query appears twice within that window, it is flagged as repetitive, and the thread is spawned without a parent, breaking the loop rather than continuing to deepen a stuck branch. The pulse monitor has been validated with 5 of 5 tests passing.
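Both detection mechanisms are simple to sketch. In this illustration, interrogative detection is reduced to a question-mark check, a deliberate simplification of the syntax-based detection described above, and the verb and phrase lists are abbreviated.

```python
import re
from collections import deque

EXPLORATION_VERBS = {"learn", "explore", "investigate", "search", "understand"}
INTENT_PHRASES = ("i want to know", "this leads me to", "next i should")

def is_actionable(output):
    """An output is actionable if it asks a question, uses an exploration
    verb, or states explicit intent; otherwise the stall counter increments."""
    text = output.lower()
    if "?" in text:
        return True
    if any(p in text for p in INTENT_PHRASES):
        return True
    words = set(re.findall(r"[a-z]+", text))
    return bool(words & EXPLORATION_VERBS)

class RepetitionDetector:
    """Sliding window over the five most recent queries; a query seen
    again within the window is flagged as repetitive."""
    def __init__(self, window=5):
        self.recent = deque(maxlen=window)

    def check(self, query):
        repeated = query in self.recent
        self.recent.append(query)
        return repeated
```

A flagged repetition causes the next thread to spawn without a parent, breaking the cycle instead of deepening a stuck branch.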
4.8 Executive Function (Prefrontal Cortex)
The outcome scoring and risk evaluation subsystem provides executive function: evaluating whether an action was worth taking (post-hoc scoring) and whether a proposed action is likely to be worth taking (pre-hoc risk assessment). Both capabilities improve over time as data accumulates.
The biological parallel is the prefrontal cortex, the last brain region to mature developmentally and the seat of executive function, planning, and impulse control. The prefrontal cortex evaluates potential actions against anticipated outcomes, inhibiting low-value actions and promoting high-value ones. Critically, this capability develops slowly and is informed by accumulated experience. A child touches a hot stove; the adult does not need to reason about it. The evaluation has been baked into behavior through repeated outcome learning.
Post-hoc scoring evaluates every resolved thought as a return on investment. The investment metric aggregates search queries (weight 1.0 each), fetched sources (0.5), exploration depth (0.3), and ticks alive (0.1). The return metric aggregates resolution quality (weight 3.0, scored on length and substance), novelty (2.0, measured as keyword overlap against previously resolved thoughts), connectivity (2.5, counting associated synapses), memory storage (1.0 binary), children spawned (0.5 each), and synapses formed (3.0 each). The ROI is computed as return divided by investment. Efficiency is computed as return divided by time elapsed. These scores are stored in a history buffer that informs the pre-hoc risk evaluator.
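The computation follows directly from the listed weights. A sketch, assuming signal values such as quality and novelty are pre-normalized to [0, 1]:

```python
def score_outcome(searches, sources, depth, ticks,
                  quality, novelty, connectivity,
                  stored_memory, children, synapses, elapsed_s):
    """Post-hoc ROI for a resolved thought, using the weights described
    in the text: ROI = return / investment, efficiency = return / time."""
    investment = (searches * 1.0 + sources * 0.5
                  + depth * 0.3 + ticks * 0.1)
    ret = (quality * 3.0 + novelty * 2.0 + connectivity * 2.5
           + (1.0 if stored_memory else 0.0)
           + children * 0.5 + synapses * 3.0)
    roi = ret / investment if investment > 0 else ret
    efficiency = ret / elapsed_s if elapsed_s > 0 else ret
    return roi, efficiency
```

A thought that cost two searches, two sources, depth 1, and five ticks (investment 3.8) and produced a quality-1.0, half-novel resolution with one child and a memory write (return 5.5) scores an ROI of about 1.45.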
Pre-hoc risk assessment evaluates four risk signals and four reward signals for each proposed tool execution. Risk signals include redundancy (how similar this query is to recent queries, measured by word-set overlap), depth penalty (diminishing returns at increasing thread depth, scaling from 0 at depth 3 to near 1.0 at depth 10+), stall probability (predicted from the ratio of low-ROI outcomes in recent history, adjusted for current depth and query count), and source unreliability (tracked per domain from historical outcomes). Reward signals include novelty (keyword overlap against resolved thread resolutions), connectivity potential (estimated from the number of active branches that could form synapses), source quality (historical ROI for each tool type), and urgency (higher for shallow exploration, lower for deep branches). The net value is reward minus risk, and the decision is proceed (net value above threshold), reconsider (high stall probability), skip (high redundancy), or alternative (other risk factors dominate).
The evaluator begins in a permissive mode: with fewer than 10 outcome data points, all actions receive a "proceed" decision with maximum reward score. As outcomes accumulate, the evaluator becomes progressively more selective, mirroring the developmental trajectory of the prefrontal cortex from childhood permissiveness to adult restraint.
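The decision logic described above can be condensed into a small dispatcher. The numeric thresholds below (0.8 for redundancy, 0.7 for stall probability) are assumptions for illustration; the paper specifies the decision categories and the permissive bootstrap, not the cutoffs.

```python
def evaluate_action(risk: dict, reward: dict, n_outcomes: int,
                    threshold: float = 0.0) -> str:
    """Pre-hoc decision for a proposed tool execution.
    Thresholds are illustrative, not the implementation's values."""
    if n_outcomes < 10:                              # permissive bootstrap:
        return "proceed"                             # too little history to judge
    if risk.get("redundancy", 0.0) > 0.8:            # query too similar to recent ones
        return "skip"
    if risk.get("stall_probability", 0.0) > 0.7:     # likely to stall at this depth
        return "reconsider"
    net = sum(reward.values()) - sum(risk.values())  # net value = reward - risk
    return "proceed" if net > threshold else "alternative"
```

With accumulated history, a high-novelty action with modest risk proceeds, while a low-reward action falls through to the "alternative" branch.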
The outcome scorer has been validated with 12 of 12 tests passing, covering ROI computation, trend detection, resolution quality scoring, novelty assessment, and risk evaluation decision logic.
4.9 Inter-Agent Communication (Corpus Callosum)
AXOM's latent communication system enables multiple agents to share knowledge through hidden-state tensors rather than decoded text. The system comprises two original modules: InnerLink, which maps hidden states back to embedding space for iterative latent thought within a single agent, and OuterFusion, which projects one agent's latent state into another agent's input distribution for cross-agent knowledge transfer.
InnerLink draws on the general mathematical principle of projecting transformer hidden states back to embedding space to enable recurrent-style processing within a feedforward architecture. The concept of hidden-to-embedding projection for latent iteration has appeared in several research contexts; AXOM's contribution is the specific residual adapter architecture, the identity initialization strategy, and the integration into a multi-agent swarm with router-weighted fusion. OuterFusion is entirely original to AXOM: no existing system projects latent states across agents using a learned residual MLP with dimensionality-aware initialization. The training pipeline for both modules, including the cosine alignment loss for InnerLink and the reconstruction objective for OuterFusion, is AXOM's own design.
The biological parallel is the corpus callosum, the largest white matter structure in the brain, containing approximately 200 million axons that connect the left and right cerebral hemispheres. The corpus callosum transmits raw neural signals, not language, between hemispheres. When the left hemisphere's language centers formulate a verbal thought, the right hemisphere does not receive a text transcript; it receives the underlying neural activation pattern, a vastly richer representation than any linguistic encoding could provide. AXOM's latent communication operates on the same principle: agents share hidden-state vectors, the full latent representation of a thought, rather than the lossy text decoded from those vectors.
InnerLink is a residual adapter with approximately 2.1 million parameters (for a hidden dimension of 1,024). Its architecture consists of layer normalization, a linear projection, a GELU activation, a second linear projection, a residual connection from the input, and a final layer normalization. The projections are initialized with near-zero weights (standard deviation 1e-4) so that at initialization, the module is effectively transparent: the residual connection dominates, and InnerLink passes hidden states through unchanged. This identity initialization ensures that the system functions correctly before any InnerLink-specific training has occurred, with the module learning useful transformations only as cosine alignment loss provides gradient signal.
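A minimal numpy sketch of the described adapter follows, assuming the stated layer ordering and initialization; the real module is presumably implemented against the chassis's tensor framework. At this initialization the two projections contribute a negligible perturbation, so the output is dominated by the residual path, and the weight count matches the quoted figure of approximately 2.1 million parameters.

```python
import numpy as np

H = 1024                                  # hidden dimension from the text
rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Projections initialized near zero (std 1e-4): the residual path dominates.
W1 = rng.normal(0.0, 1e-4, size=(H, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 1e-4, size=(H, H)); b2 = np.zeros(H)
# Parameter count: 2 * (1024*1024 + 1024) = 2,099,200, i.e. ~2.1M.

def inner_link(h):
    """LayerNorm -> Linear -> GELU -> Linear, residual add, final LayerNorm."""
    x = gelu(layer_norm(h) @ W1 + b1) @ W2 + b2
    return layer_norm(h + x)
```

At initialization the learned pathway perturbs the input by roughly 1e-5 per element, so the module is effectively transparent up to the final normalization, exactly the property that lets the system run before any InnerLink training has occurred.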
InnerLink's rollout method enables iterative latent thought. Given an initial hidden state from a forward pass, it cycles through a feedback loop: the hidden state is projected back to embedding space via InnerLink, fed through the transformer's forward pass to produce a new hidden state, and the process repeats for a configurable number of steps (default 32, with compact and research configurations at 16 and 64 steps respectively). Each iteration produces a "latent thought," a step of reasoning that occurs entirely in hidden-state space without ever being decoded to text. The result is a tensor of shape [batch, latent_steps, hidden_size] representing the trajectory of internal deliberation.
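The rollout loop itself is compact. In the sketch below, `forward` stands in for the chassis forward pass and `inner_link` for the trained projection; both are caller-supplied stubs here, and the toy lambdas exist only to exercise the loop's shape.

```python
import numpy as np

def rollout(h0, forward, inner_link, steps=32):
    """Iterate latent thought: hidden state -> embedding -> new hidden state.

    `forward` and `inner_link` are stand-ins for the chassis forward pass
    and the InnerLink projection respectively.
    """
    h, trajectory = h0, []
    for _ in range(steps):
        emb = inner_link(h)       # project hidden state back to embedding space
        h = forward(emb)          # one transformer pass on the projected embedding
        trajectory.append(h)
    # Result shape: [batch, latent_steps, hidden_size]
    return np.stack(trajectory, axis=1)

# Toy stand-ins just to exercise the loop.
traj = rollout(np.zeros((2, 8)), forward=lambda e: e + 1.0,
               inner_link=lambda h: h, steps=4)
```

Each element of the middle axis is one "latent thought," a reasoning step that never touches the token vocabulary.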
OuterFusion is a larger residual adapter with approximately 5.2 million parameters (for input and output dimensions of 1,024), designed and built entirely within the AXOM architecture. Its architecture expands the hidden state to twice the target dimension through a linear projection with GELU activation, then compresses back to the target dimension, with a separate residual projection to handle potential dimensionality mismatches between source and target agents. For AXOM's single-chassis architecture where all agents share the same base model, the input and output dimensions are equal and the residual projection initializes as an identity matrix. As with InnerLink, the main pathway initializes with near-zero weights, making the module transparent at initialization. This design allows the swarm to operate correctly from the first inference (latent states pass through unchanged) while the modules learn useful transformations over time.
The fundamental insight driving this architecture is bandwidth. When an agent decodes its hidden state to text and sends that text to another agent, the receiving agent must re-encode the text into its own hidden-state space, losing information at both the decoding and re-encoding boundaries. This is the difference between transmitting an experience (the neural pattern for riding a bicycle) and describing an experience (a verbal explanation of how to ride a bicycle). AXOM's latent communication eliminates both lossy boundaries by keeping inter-agent transfer in hidden-state space throughout.
4.10 Neural Ensembles (Swarm Fusion)
The swarm fusion module orchestrates parallel agent firings that share findings via hidden states rather than text, implementing a neural ensemble approach to multi-source knowledge synthesis. When the semantic router detects that a query spans multiple domains (Section 4.3), the swarm fires multiple agents, each processing a different search context or sub-query, extracts hidden states from each agent's forward pass, computes a router-weighted average of the pooled states, optionally processes the fused state through InnerLink, and generates a final response conditioned on the fused embedding.
The biological parallel is neural population coding. In biological neural networks, information is not represented by individual neurons but by the collective activity of neuronal populations. No single neuron in the motor cortex encodes the full trajectory of an arm movement; the population vector, the activity-weighted average across the entire motor population, carries the movement command. Swarm fusion operates on the same principle: no single agent has the complete answer, but the weighted average of their latent representations captures a richer understanding than any individual could achieve.
The scratchpad context architecture (Section 3) is especially important in swarm fusion. Each agent receives only its assigned search context as a prompt, processes it in a single forward pass, and is immediately discarded. No agent accumulates context from the other agents' work. The synthesis happens entirely in latent space via the weighted hidden-state average, not through an ever-growing shared prompt. This means a five-agent swarm consumes the same per-agent context as a single-agent inference, making the ensemble computationally tractable on a 0.8-billion-parameter chassis that would be overwhelmed by the concatenated text of five agent transcripts in a conventional multi-agent framework.
Hidden-state pooling uses a weighted mean with exponential recency bias. Later token positions, which have attended to more prior context through the transformer's causal attention mechanism, receive higher weight in the pool. The weight for position i in a sequence of length n is computed as exp(i/n) - 1, normalized to sum to one. This biases the pooled representation toward the tokens that have the most complete view of the agent's reasoning.
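The weighting scheme can be stated directly from the formula. The text leaves the indexing convention open; this sketch assumes 0-indexed positions, under which the first token receives weight zero.

```python
import math

def pool_weights(n: int) -> list[float]:
    """Exponential recency weights: w_i = exp(i/n) - 1, normalized to sum to 1.
    Positions are assumed 0-indexed."""
    raw = [math.exp(i / n) - 1.0 for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]

weights = pool_weights(8)
```

The weights increase monotonically with position, so the pooled hidden state is dominated by the final tokens, the ones whose causal attention window covers the entire sequence.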
Fusion weights are computed by the router's embedding model. The query and all search contexts are embedded, and each context's weight is proportional to its cosine similarity with the query, normalized to sum to one. Non-positive similarities are clamped to zero. Memory context, when available, is processed as an additional latent participant: it is embedded through the chassis's forward pass and fused into the ensemble with its own router-computed weight. This avoids duplicating memory text into each agent's prompt, instead contributing memory knowledge once at the latent level.
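The weight computation is a clamped, normalized cosine similarity. A pure-Python sketch, with a uniform fallback for the degenerate all-zero case (the fallback behavior is an assumption; the paper does not specify it):

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def fusion_weights(query_emb, context_embs):
    """Weight each context by its cosine similarity to the query;
    clamp non-positive similarities to zero and normalize to sum to one."""
    sims = [max(cosine(query_emb, c), 0.0) for c in context_embs]
    total = sum(sims)
    if total == 0.0:   # assumption: fall back to uniform when all are clamped
        return [1.0 / len(context_embs)] * len(context_embs)
    return [s / total for s in sims]
```

A context aligned with the query takes nearly all the mass; an orthogonal or opposed context contributes nothing to the fused state.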
When InnerLink is available, the fused embedding is projected through it before generation, applying the learned latent-to-embedding-space transformation. Generation then proceeds via the chassis's hybrid generation method, which evaluates a synthesis prompt (providing structural context and format cues) through normal token processing to fill the KV cache, then injects the fused embedding at the next position via the batch embedding field, then samples autoregressively. This gives the model both conversational structure (from the prompt) and synthesized knowledge (from the fused latent state).
4.11 Sensory Interface (Plugin Architecture)
The plugin architecture provides AXOM with domain-specific tool backends that serve as interfaces to the external world. The current tool registry exposes 11 tools: file operations (Read, Write, Edit), code search (Grep, Glob), sandboxed shell execution (Bash), information acquisition (WebSearch via SearXNG integration with academic engines including arXiv, Google Scholar, Semantic Scholar, and PubMed; WebFetch for content extraction), scientific computation (QuantumSim wrapping a QuantumForge backend), and memory operations (MemoryStore, MemoryRecall).
The biological parallel is the sensory organ system. Eyes, ears, mechanoreceptors, and chemoreceptors are structurally diverse but all convert environmental signals into the common currency of neural activity for processing by the central nervous system. Similarly, AXOM's tools have entirely different backends (a search engine, a quantum chemistry simulator, a file system), but all present a uniform interface to the reasoning engine: a tool name, a string argument, and a string result. The model interacts with all tools through the same ReAct framework regardless of backend complexity.
A key design property is graceful degradation. All tools are always visible to the model in its system prompt (it knows they exist and what they do), but individual backends install independently. If the quantum chemistry backend is not available, the QuantumSim tool returns an appropriate error message rather than crashing the inference pipeline. The model can then adjust its reasoning, choosing alternative information sources rather than failing catastrophically. This mirrors biological sensory resilience: loss of vision does not eliminate the visual cortex; it is repurposed for other modalities.
Security is enforced through a sandboxing layer that restricts file access to a configurable root directory, whitelists permitted shell commands, blocks dangerous patterns (rm -rf, sudo, chmod, kill), and prevents find with -exec or -delete flags. The online/offline toggle controls whether WebSearch and WebFetch are available, enabling fully air-gapped deployment for privacy-critical environments.
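A simplified version of the command gate combines a blocklist of dangerous patterns with a whitelist check on the command head. The specific whitelist entries and regular expressions below are illustrative, not the deployed policy.

```python
import re
import shlex

# Illustrative whitelist and blocked patterns; not the deployed policy.
ALLOWED = {"ls", "cat", "grep", "python", "git", "find"}
BLOCKED = [r"\brm\s+-rf\b", r"\bsudo\b", r"\bchmod\b", r"\bkill\b",
           r"\bfind\b.*\s-(exec|delete)\b"]

def is_permitted(cmd: str) -> bool:
    """Reject commands matching a blocked pattern or whose first word
    is not on the whitelist."""
    if any(re.search(p, cmd) for p in BLOCKED):
        return False
    try:
        head = shlex.split(cmd)[0]
    except (ValueError, IndexError):   # unparseable or empty command
        return False
    return head in ALLOWED
```

Note that `find` is whitelisted but its destructive flags are still caught by the pattern check, matching the paper's distinction between the tool and its dangerous modes.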
Case Study: QuantumForge Integration. The QuantumSim plugin demonstrates how the architecture extends AXOM from a language-processing system into a computational science platform. QuantumSim wraps QuantumForge, a GPU-accelerated quantum chemistry engine backed by PySCF, exposing four operations through the standard tool interface: single-point energy calculation, geometry optimization, interaction energy computation (binding affinity between molecular pairs), and electronic property extraction (HOMO-LUMO gap, dipole moment, Mulliken charges). The total additional footprint is approximately 170 megabytes — QuantumForge's CUDA kernels and ML functionals at 20 megabytes, PySCF at 150 megabytes — over the base AXOM installation. GPU acceleration provides 10-50x speedup over the CPU fallback, but both paths produce identical results.
What makes this integration architecturally significant, rather than merely a tool call to an external service, is the interaction with the autonomous loop and memory graduation pipeline. During autonomous exploration, AXOM can formulate a molecular hypothesis from literature search, computationally validate it through QuantumSim, store the result in the memory ledger with the simulation parameters and convergence status, and eventually graduate that validated finding into leaf training data. The model does not merely report what a paper says about a molecule's binding energy; it independently computes and verifies the claim. When a chemistry-domain leaf is trained on graduated memories containing validated simulation results, the leaf develops procedural intuition about molecular stability and reactivity — knowledge that originated in quantum mechanical calculation but now resides in neural weights. This is the graduation pipeline operating at its full potential: external computation produces a finding, the memory system stores it, and the training pipeline converts it from explicit recall into implicit capability.
The plugin manifest system enables this extensibility without modifying core AXOM code. Each plugin provides a manifest declaring its dependencies, a registration function that adds tools to the registry, and a dependency checker that enables graceful degradation. Six additional verticals are planned following the same pattern: genomics (sequence alignment, variant calling, protein structure prediction), robotics (motion planning, physics simulation, control design), signal processing (FFT, filtering, anomaly detection), legal analysis (contract parsing, compliance checking), finance (options pricing, risk analysis, portfolio optimization), and climate science (atmospheric modeling, emissions tracking). Each vertical adds a new sensory modality without altering the cognitive architecture — the same autonomous loop, memory system, and graduation pipeline operate identically regardless of which plugins are installed.
4.12 Neural Observatory
The Neural Observatory is a real-time cognitive visualization system that renders AXOM's internal activity as an interactive graph. It consumes events from the autonomous loop's event bus via WebSocket and presents them as a force-directed D3 graph with two distinct edge types: tree edges (white, representing parent-child thought relationships) and synaptic edges (electric blue, representing cross-branch conceptual connections discovered through synaptogenesis).
The biological parallel is neuroimaging. The Neural Observatory is to AXOM what functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) are to the human brain: a non-invasive window into cognitive activity. Node pulsing indicates active processing. Synapse formation events appear as new blue edges, representing insight moments. Depth visualization reveals exploration patterns, showing whether the system is diving deep into a single topic or branching broadly across domains.
The event system that feeds the observatory defines 23 event types organized across seven categories: thought lifecycle (spawned, active, resolved, parked, stalled), tool execution (start, end), memory operations (stored, recalled), loop lifecycle (tick, stall, nudge, started, stopped), agent and inference events (inference start, inference end, agent fired, agent returned), latent space events (latent fusion, hidden state), and synaptogenesis events (synapse formed, synapse strengthened, synapse pruned). Every event carries a timestamp, an optional thought identifier for graph placement, and a typed payload with event-specific data. Events are append-only: nothing is lost, and the full cognitive history is available for retrospective analysis.
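The event record and its append-only container can be sketched minimally as follows; the event-type strings and field names are hypothetical stand-ins for the actual schema.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)           # immutable: events are never edited in place
class CognitiveEvent:
    event_type: str               # e.g. "thought.resolved" (naming is hypothetical)
    payload: dict                 # typed, event-specific data
    thought_id: Optional[str] = None   # optional: places the event on the graph
    timestamp: float = field(default_factory=time.time)

class EventLog:
    """Append-only log: nothing is removed, so the full cognitive
    history remains available for retrospective analysis."""
    def __init__(self):
        self._events: list[CognitiveEvent] = []

    def append(self, ev: CognitiveEvent) -> None:
        self._events.append(ev)

    def history(self) -> tuple:
        return tuple(self._events)   # immutable snapshot for consumers
```

Returning a tuple rather than the internal list keeps consumers (such as the WebSocket streamer) from mutating the record.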
The server architecture uses FastAPI with WebSocket endpoints for real-time event streaming and REST endpoints for state queries, supporting both a live mode (connected to a running autonomous loop) and a mock mode (replaying representative event sequences for development and demonstration). The dashboard additionally computes aggregate intelligence metrics including a composite IQ score, a curiosity index derived from branching behavior, and a knowledge density metric based on resolution quality and synapse formation rates.
5. System Integration
The components described in Section 4 do not operate in isolation. They form interlocking feedback loops in which the output of each subsystem serves as input to others, producing emergent behavior that exceeds the capability of any individual component. Understanding these feedback loops is essential to understanding AXOM as a cognitive system rather than a collection of independently useful modules.
The primary feedback loop connects autonomous cognition, memory, graduation, and leaf specialization. The autonomous loop generates knowledge through self-directed exploration: searching the web, reading sources, reasoning about findings, and storing conclusions in the memory ledger. The memory system makes this knowledge available for recall during subsequent exploration turns, enabling the system to build on its own prior work. The graduation pipeline identifies high-value memory entries and converts them into training data. When this training data is incorporated into the next leaf training round, the knowledge becomes part of the model's weights, improving the quality of future inference without requiring memory retrieval. The improved leaf produces higher-quality autonomous exploration, which generates higher-quality memory entries, which produce higher-quality training data. Each revolution of this loop raises the baseline capability of the entire system.
A second feedback loop connects the router, outcome scoring, and risk evaluation. As the autonomous loop explores and resolves thoughts, the outcome scorer records the ROI of each resolution. The risk evaluator consumes this history to predict the value of proposed actions before they execute. Over time, the system learns which tool calls, search patterns, and exploration depths produce high-value outcomes and which constitute waste. The router benefits indirectly: as leaf quality improves through graduation, routing accuracy improves because leaf centroids become more representative of their actual capability domains. This produces more accurate leaf activation, which produces higher-quality inference, which produces higher-value outcomes, which further refines the risk evaluator.
A third feedback loop connects synaptogenesis with the autonomous loop's direction selection. When the thread tracker records a synapse between two previously unrelated thoughts, this cross-branch connection becomes visible to the pulse monitor's reflection nudge. If the autonomous loop stalls, the pulse monitor surfaces resolved thoughts and their connections, potentially including synaptically linked nodes from entirely different branches. This can redirect the model's exploration toward the intersection of two domains it had previously explored independently, producing genuinely novel lines of inquiry that neither domain would have suggested alone.
The convergence of these loops produces a system that does not merely accumulate knowledge but develops cognitive infrastructure. The router gets smarter at directing queries. The risk evaluator gets smarter at predicting outcomes. The leaves get more capable in their domains. The memory system gets richer and more interconnected. And the synapse graph grows, forming a map of the system's emerging understanding that is itself a resource for future exploration.
6. The Data Flywheel
AXOM does not exist in isolation. It sits at the center of a data flywheel that begins with real human usage and completes with improved machine capability that generates higher-quality human interactions.
The flywheel operates as follows. Groove, a free open-source development platform, provides the user-facing interface. As users write code, research subjects, and build projects within Groove, the platform automatically collects session telemetry. Each session is quality-scored using a multi-signal assessment (completion rate, interaction depth, tool diversity), domain-tagged using the same BGE-base embedding model that powers the semantic router, and stored as a JSONL trajectory log with full step-by-step structure. This telemetry data flows into a daily ingestion pipeline that aggregates, deduplicates, and stores the sessions for downstream consumption.
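The multi-signal quality score might take the form of a weighted blend of the three named signals. The weights below are purely hypothetical; the paper names the signals but not how they are combined.

```python
def session_quality(completion_rate: float, interaction_depth: float,
                    tool_diversity: float,
                    weights: tuple = (0.5, 0.3, 0.2)) -> float:
    """Blend the three signals named in the text into a [0, 1] score.
    Each signal is assumed pre-normalized to [0, 1]; weights are hypothetical."""
    signals = (completion_rate, interaction_depth, tool_diversity)
    return sum(w * s for w, s in zip(weights, signals))
```

A session that completes fully, runs deep, and exercises many tools scores near 1.0 and becomes a strong graduation candidate; a shallow abandoned session scores near zero.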
When AXOM nodes are not serving users, they enter idle autonomous mode and consume this telemetry data as learning material. Node operators, specialized nodes responsible for leaf training, aggregate high-quality trajectories across multiple nodes, merge candidate training sets, execute the training pipeline, validate the resulting adapter against held-out evaluation sets, and broadcast the improved leaf to the network. The improved leaf produces better assistance for Groove users, who in turn generate higher-quality and more diverse telemetry, which produces better training data, which produces better leaves. The cycle accelerates.
At the time of writing, the telemetry corpus contains 1,291 sessions collected over 20 days of founder usage, comprising approximately 15 million tokens. The domain distribution is concentrated in Python, machine learning infrastructure, and software architecture, reflecting the founder's primary activities during this period. A notable property of this initial dataset is that it consists largely of sessions in which the founder was building AXOM itself. The system's earliest real-world training data is its own creation story; the first thing it will deeply understand is its own architecture.
Public launch of the Groove platform will dramatically expand both the volume and diversity of the telemetry corpus. The flywheel's value scales superlinearly with user count: more users produce more diverse data, which trains more capable leaves, which attract more users. The critical mass threshold, the point at which the flywheel becomes self-sustaining, depends on achieving sufficient domain diversity to train leaves beyond the founder's primary areas of expertise.
7. Distributed Intelligence: The Groove Network
The Groove Network is a WebRTC mesh network that enables distributed AXOM operation across multiple nodes. Originally built for local Mixture-of-Experts model coordination and tested with five running nodes, the network has been redesigned to host full AXOM instances at each node, with the mesh providing the communication substrate for leaf sharing, memory propagation, and consensus-based governance.
Each node in the network runs a complete AXOM instance: chassis, router, leaf adapters, memory system, and autonomous loop. Nodes operate in two modes. In active mode, the node serves a user directly, processing queries, executing tool calls, and generating responses while collecting real interaction telemetry. In idle mode, the node enters autonomous exploration, researching, learning, testing hypotheses, and building knowledge without human direction. The network-level effect is a population of agents that learn from actual human usage during the day and explore independently during off-hours, combining the benefits of human-guided learning with the volume of machine-directed exploration.
A gossip protocol propagates information across the mesh. Graduated leaves (adapters that have completed a training round and passed validation), shared memory entries (high-value findings from autonomous exploration), and synapse discoveries (cross-domain connections) are broadcast to neighboring nodes and propagated transitively through the network. The protocol is designed for eventual consistency rather than strong consistency, tolerating network partitions and asynchronous updates.
Leaf governance operates on a consensus model. When multiple nodes accumulate sufficient high-quality training data in a domain (detected via quality scoring and domain tagging), any node can propose a leaf training event. The proposal is broadcast to the network, and if multiple nodes have registered candidate adapters with scores above a configurable threshold, an operator node merges the training data, executes the training pipeline, validates the result, and broadcasts the new adapter. Competing adapters are evaluated by the OutcomeScorer across multiple nodes, and the adapter with the highest aggregate ROI is adopted by the network. Weaker adapters are deprecated. This is natural selection operating on learned capabilities: the most effective cognitive specializations propagate and survive; the least effective are pruned.
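The selection step at the end of this process reduces to choosing the candidate with the highest aggregate ROI above threshold. The sketch below uses a simple mean across nodes as the aggregate; the actual aggregation rule is not specified in the text.

```python
from typing import Optional

def select_adapter(candidates: dict[str, list[float]],
                   threshold: float) -> Optional[str]:
    """Pick the adapter with the highest mean ROI across nodes, or None
    if no candidate clears the threshold. Mean aggregation is an assumption."""
    best, best_roi = None, threshold
    for adapter_id, rois in candidates.items():
        agg = sum(rois) / len(rois)     # aggregate ROI across reporting nodes
        if agg > best_roi:
            best, best_roi = adapter_id, agg
    return best
```

Candidates that never clear the threshold are simply never adopted, which is the deprecation ("pruning") half of the natural-selection dynamic.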
The distributed architecture enables several deployment patterns that centralized AI systems cannot support. On-device deployment eliminates cloud dependency and network latency. On-premise deployment meets data sovereignty requirements for enterprises in regulated industries. Air-gapped deployment serves environments where no external network connectivity is permitted, such as classified research facilities. The mesh network provides the social learning benefits of a connected population while each individual node operates as a fully autonomous, privacy-respecting intelligence.
8. Comparative Analysis
The following table summarizes how AXOM addresses fundamental limitations of conventional AI architectures across eight dimensions.
| Dimension | Conventional Approach | AXOM Solution |
|---|---|---|
| Intelligence Persistence | Models are frozen after training. No learning from deployment interactions. | Continuous learning via autonomous loop, memory graduation into leaf weights, and telemetry-driven retraining. |
| Domain Specialization | Single model for all domains. No modular expertise. | Hot-swappable leaf adapters with sub-millisecond switching. Each leaf is independently trainable. |
| Self-Direction | Requires human prompts for every action. No autonomous exploration. | Default mode network drives self-directed learning, research, and knowledge construction without human input. |
| Multi-Agent Communication | Agents exchange text, discarding latent representations. | AXOM's latent communication system transmits hidden-state tensors between agents, preserving full latent bandwidth. |
| Outcome Learning | Same mistakes repeated. No feedback from action outcomes. | OutcomeScorer evaluates ROI of every resolved thought. RiskEvaluator learns to predict action value before execution. |
| Knowledge Integration | Information stays where it was found. No cross-domain linking. | Synaptogenesis forms, strengthens, and prunes lateral connections between independently explored concepts. |
| Compute Architecture | Cloud-dependent. No privacy guarantee. Single point of failure. | Distributed WebRTC mesh. Runs on-device, on-premise, or air-gapped. Gossip-based leaf propagation. |
| Training Data | Requires human-curated and human-labeled datasets. | Flywheel: real usage telemetry auto-collected, quality-scored, domain-tagged, and embedded. No manual curation. |
9. Current Status and Empirical Results
AXOM is under active development with the majority of subsystems implemented, tested, and awaiting first full integration. The following table summarizes the implementation status and test coverage of each component.
| Component | Status | Tests |
|---|---|---|
| Chassis engine (Qwen 3.5, 0.8B params) | Running, inference verified on GPU | |
| Chat leaf adapter (Round 3, 30,318 examples) | Training in progress, 25% complete | |
| Semantic router (BGE-base, cosine + decomposition) | Built, tested, centroid override operational | Passing |
| Memory retrieval (BM25 + cosine + cross-encoder reranking) | Built, calibrated, graduation pipeline designed | 97/97 |
| Autonomous loop (DMN analog) | Built, not yet run on live inference | 27/27 |
| Thread tracker + synaptogenesis | Built, Hebbian dynamics operational | 6/6 |
| Pulse monitor (RAS analog) | Built, escalating nudge strategies verified | 5/5 |
| Outcome scorer + risk evaluator (PFC analog) | Built, wired into autonomous loop | 12/12 |
| ReAct loop (tool execution pipeline) | Built, smoke-tested with tool registry | |
| Latent Communication (InnerLink 2.1M + OuterFusion 5.2M params) | Built, identity-initialized, awaiting training | |
| Swarm fusion (latent-space ensemble) | Built, batch processing verified | |
| Plugin system (QuantumSim/PySCF + 11 tool registry, 6 verticals planned) | Built, energy/optimize/interact/property ops, GPU + CPU paths, graceful degradation | |
| Neural Observatory dashboard | Design spec + mock server complete, frontend in progress | |
| Telemetry ingestion pipeline | Collecting (1,291 sessions, ~15M tokens) | |
| Groove Network (WebRTC mesh) | Built, tested with 5 nodes (pre-AXOM integration) | |
| SearXNG academic search integration | Running (arXiv, Google Scholar, Semantic Scholar, PubMed) | |
The aggregate test count across implemented subsystems stands at 147 passing tests (97 memory + 27 autonomous loop + 12 outcome scorer + 6 synaptogenesis + 5 pulse monitor), with zero failures. These tests cover not only nominal operation but edge cases including empty corpora, single-entry retrieval, temporal boundary conditions, graduated entry handling, stall recovery sequences, repetition detection, ROI computation with degenerate inputs, and ancestor-exclusion logic for synapse formation.
The first full integration test, in which the autonomous loop will run with live chassis inference, real tool execution, memory storage and recall, synaptogenesis tracking, outcome scoring, and Neural Observatory streaming, is pending completion of the current chat leaf training round. This milestone represents the transition from validated subsystems to a functioning cognitive loop.
10. Future Directions
Several lines of development follow directly from the current architecture. The most immediate is the completion of the first full integration test. Once the chat leaf training round completes, the autonomous loop will be activated with live inference, and the system will begin generating its first autonomous knowledge, storing it in memory, and building its initial synapse graph. This event will produce the first empirical data on emergent behavior in the integrated system: how exploration patterns evolve over sustained autonomous operation, what kinds of cross-domain connections form, and how quickly the outcome scorer transitions from permissive to selective.
Autonomous data generation represents the next major capability milestone. Once the autonomous loop is operational, the memory graduation pipeline can begin converting exploration findings into training data. This closes the primary feedback loop: the system generates its own training data through self-directed learning, improves its own leaf adapters, and produces higher-quality exploration in subsequent sessions. The first graduation cycle will provide critical empirical data on training set quality, leaf improvement rates, and the stability of the self-improvement loop.
Multi-leaf fusion, in which the swarm fires agents with different leaf adapters simultaneously and fuses their hidden states, will test the hypothesis that domain-specialized latent representations compose meaningfully under weighted averaging. The router's query decomposition already identifies multi-domain queries; multi-leaf fusion will extend the swarm to activate different leaves for different sub-queries, producing a fused representation that integrates multiple areas of expertise.
The plugin architecture is designed for expansion across multiple verticals. Planned tool backends include genomics (sequence alignment and annotation), robotics control (motion planning and sensor integration), materials science (molecular dynamics simulation), and financial analysis (time series modeling and risk assessment). Each new vertical extends AXOM's sensory reach without modifying the core reasoning architecture, following the biological model in which new sensory modalities integrate with existing cortical processing infrastructure.
An intermediate step toward full distributed intelligence is dual-instance collaborative learning: two AXOM instances sharing the same memory store, event bus, and thread tracker, but maintaining separate conversation contexts and engaging in Socratic dialogue. One instance poses questions; the other researches and responds; roles alternate. This eliminates the stall recovery problem organically, because a second intelligence asking "but what about X?" is a qualitatively richer signal than the pulse monitor's canned nudge strategies. The risk of echo-chamber convergence (two copies of the same weights reinforcing shared blind spots) can be mitigated by assigning different leaf adapters or exploration strategies to each instance, producing cognitive diversity from the same chassis. At 775 megabytes per chassis instance, two simultaneous AXOM nodes fit comfortably within the memory budget of consumer-grade GPUs.
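The dual-instance loop can be sketched as follows. The `Instance` class and its `ask`/`respond` methods are hypothetical stand-ins for live inference; what the sketch shows is the structural claim above: a shared memory store, separate conversation contexts, different leaf assignments for cognitive diversity, and role alternation each turn.

```python
# Toy sketch of the dual-instance Socratic loop: two instances share a
# memory store but alternate asker/responder roles. Class and method
# names are hypothetical, not AXOM's actual API.
class Instance:
    def __init__(self, name, leaf):
        self.name, self.leaf = name, leaf
    def ask(self, topic):
        return f"[{self.name}/{self.leaf}] But what about {topic}?"
    def respond(self, question, memory):
        answer = f"[{self.name}/{self.leaf}] researching: {question}"
        memory.append(answer)   # shared store: both instances see it
        return answer

shared_memory = []
a = Instance("axom-a", leaf="biology")   # different leaf adapters give
b = Instance("axom-b", leaf="systems")   # cognitive diversity
asker, responder = a, b
for topic in ["protein folding", "feedback control"]:
    question = asker.ask(topic)
    responder.respond(question, shared_memory)
    asker, responder = responder, asker  # roles alternate each turn
```

The role swap at the end of each turn is what distinguishes this from a fixed teacher-student setup: both instances accumulate asking and answering experience.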
Distributed training coordination through the Groove Network will enable the consensus-based leaf governance described in Section 7. This requires implementing the proposal-merge-validate-broadcast protocol, the OutcomeScorer-based adapter selection mechanism, and the gossip protocol for adapter propagation. The existing five-node WebRTC mesh provides the communication substrate; the governance logic remains to be built.
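The validate step of that protocol can be sketched as a simple majority vote, with each peer approving a proposed adapter only if it outperforms the peer's current one under its own outcome scoring. The function names, scores, and the bare-majority rule below are illustrative assumptions; the real governance logic, as noted, remains to be built.

```python
# Hedged sketch of the proposal -> validate -> broadcast decision for
# leaf governance on a five-node mesh. Names, scores, and the simple
# majority rule are illustrative, not a specified protocol.
def validate(peer_scores, proposal_score, margin=0.0):
    """Each peer approves if the proposed adapter beats its current one."""
    return [proposal_score > s + margin for s in peer_scores]

def govern(peer_scores, proposal_score):
    votes = validate(peer_scores, proposal_score)
    approved = sum(votes) * 2 > len(votes)   # strict majority
    return "broadcast" if approved else "reject"

# Peers' current adapter scores vs. a proposed adapter scoring 0.66:
decision = govern([0.61, 0.58, 0.70, 0.55, 0.64], proposal_score=0.66)
# decision -> "broadcast" (4 of 5 peers approve)
```

In the full design the approval signal would come from each node's OutcomeScorer rather than a scalar comparison, and the gossip protocol would handle propagation after the broadcast decision.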
A natural lifecycle for leaf adapters is anticipated. The current chat leaf was trained on synthetic data generated by a teacher model to bootstrap conversational capability, tool-calling patterns, and behavioral norms. As the autonomous loop and telemetry flywheel accumulate real-world interaction data, a "neural leaf" trained on AXOM's own generated trajectories will emerge. Once this neural leaf reaches parity with the synthetic chat leaf (approximately 30,000 genuine turns), the chat leaf's router weight will decline toward zero as the neural leaf covers the same capabilities with an authentic cognitive voice rather than an imitated one. The chat leaf will eventually be removed entirely, with a small residual fraction (approximately 5 percent) of structural training data retained solely to preserve output formatting conventions. This transition from synthetic bootstrap to experiential intelligence mirrors human cognitive development: early learning is heavily scaffolded by caregivers and teachers, but adult cognition operates primarily on lived experience.
Finally, InnerLink and OuterFusion training will be initiated once sufficient paired data exists: specifically, cases where the system can compare the quality of latent-space communication against text-based communication for the same task. A cosine alignment loss for InnerLink and a reconstruction loss for OuterFusion will train these modules to perform useful transformations rather than transparent pass-throughs, potentially unlocking the full bandwidth advantage of latent-space inter-agent communication.
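The cosine alignment loss mentioned above has a simple form: if InnerLink's projected vector should point in the same direction as a target hidden state, the loss is one minus the cosine similarity of the two vectors. The framework-free sketch below illustrates the objective, not AXOM's training code.

```python
# Cosine alignment loss: 1 - cos(projected, target). Zero when the
# vectors point the same way, 1 when orthogonal, 2 when opposed.
import math

def cosine_alignment_loss(projected, target):
    dot = sum(p * t for p, t in zip(projected, target))
    norm_p = math.sqrt(sum(p * p for p in projected))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_p * norm_t)

aligned = cosine_alignment_loss([1.0, 2.0], [2.0, 4.0])     # same direction
orthogonal = cosine_alignment_loss([1.0, 0.0], [0.0, 1.0])  # 90 degrees
# aligned is ~0.0, orthogonal is 1.0
```

Because the loss depends only on direction, it leaves the projection free to rescale magnitudes, which is one reason a reconstruction loss is the natural complement on the OuterFusion side.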
11. Conclusion
AXOM is not a collection of features with biological names. It is a coherent cognitive architecture in which neuroscience provides the structural blueprint and software engineering provides the implementation. Every component was designed by asking what problem its biological counterpart solves, understanding the mechanism by which biology solves it, and building a computational analog that addresses the same problem with the same structural approach. The result is a system of interlocking feedback loops in which each component amplifies the others.
The chassis provides inference capacity without domain knowledge. Leaf adapters provide domain knowledge without altering the inference substrate. The router directs information flow based on semantic content. The memory system stores episodic knowledge and graduates it into procedural capability. The autonomous loop generates knowledge without human direction. Synaptogenesis discovers connections between independently explored ideas. The pulse monitor prevents cognitive stagnation. The outcome scorer and risk evaluator develop executive judgment from accumulated experience. The latent communication system transmits hidden-state tensors between agents, preserving information that text serialization destroys. Swarm fusion synthesizes multiple perspectives into a unified representation. The plugin architecture extends sensory reach without modifying cognition. And the Neural Observatory makes the entire process observable in real time.
Each of these capabilities is valuable independently. Their integration is what makes the architecture significant. The autonomous loop generates knowledge that the memory system stores. The graduation pipeline converts that knowledge into training data. The improved leaf produces better exploration. The router gets smarter. The risk evaluator gets more selective. Synapses form between branches that were explored weeks apart. The flywheel feeds real-world usage back into the improvement cycle. And the distributed network enables natural selection on learned capabilities across a population of agents.
The architecture is modeled on the most successful intelligence architecture known: the human brain. The brain solves problems that AI has struggled with for decades, not because it uses different hardware, but because its organizational structure creates emergent capability from the interaction of specialized, relatively simple subsystems. AXOM applies this insight directly: each subsystem is simple enough to implement and test in isolation (147 passing tests and zero failures across the implemented components), but the integrated system produces behavior that no individual subsystem could generate alone. Self-directed exploration that discovers cross-domain connections, stores them in memory, graduates them into weight updates, and uses the improved capability to explore more effectively is not a feature. It is an emergent property of the architecture.
The path from validated subsystems to self-evolving intelligence is not a leap; it is a series of concrete engineering milestones. The first full integration test, the first autonomous graduation cycle, the first multi-leaf fusion, the first distributed leaf governance event. Each milestone is well-defined, testable, and within reach. The architecture does not require theoretical breakthroughs to proceed. It requires engineering execution.