OpenMythos is an open-source, theoretical reconstruction of the Claude Mythos architecture, developed independently and based solely on publicly available research and speculation. The repository aims to provide a practical implementation of what are believed to be the core architectural innovations behind Claude Mythos, centered on a Recurrent-Depth Transformer (RDT) design. This design diverges from traditional transformers by recycling a subset of layers multiple times within a single forward pass, enabling deeper reasoning without increasing the parameter count.
The model architecture is divided into three stages: Prelude, Recurrent Block, and Coda. The Prelude and Coda consist of standard transformer layers, each run once, while the Recurrent Block is applied repeatedly, up to a configurable number of iterations. This looped structure lets the model perform iterative reasoning, akin to multi-step chain-of-thought, but entirely in latent space and without emitting intermediate tokens. The input signal is injected at every loop iteration, so the original context remains influential throughout the recurrent process.
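A minimal PyTorch sketch of this layout, assuming illustrative module names and a simple concatenate-and-project injection (the actual OpenMythos injection scheme and layer internals may differ):

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """Stand-in transformer layer: pre-norm MLP only, to keep the sketch short."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.mlp(self.norm(h))


class RecurrentDepthLM(nn.Module):
    """Sketch of the Prelude -> Recurrent Block -> Coda layout."""

    def __init__(self, d_model: int, n_prelude: int, n_recurrent: int,
                 n_coda: int, max_loops: int):
        super().__init__()
        self.prelude = nn.ModuleList(Block(d_model) for _ in range(n_prelude))
        self.recurrent = nn.ModuleList(Block(d_model) for _ in range(n_recurrent))
        self.coda = nn.ModuleList(Block(d_model) for _ in range(n_coda))
        self.max_loops = max_loops
        # Learned mixing map used to re-inject the prelude output each loop.
        self.inject = nn.Linear(2 * d_model, d_model, bias=False)

    def forward(self, h: torch.Tensor, num_loops: int | None = None) -> torch.Tensor:
        num_loops = num_loops or self.max_loops
        for layer in self.prelude:               # run once
            h = layer(h)
        e = h                                    # original context, re-injected below
        s = torch.zeros_like(h)                  # latent state refined across loops
        for _ in range(num_loops):               # weight-tied iterations
            s = self.inject(torch.cat([s, e], dim=-1))  # input signal every loop
            for layer in self.recurrent:         # same parameters each iteration
                s = layer(s)
        for layer in self.coda:                  # run once, final hidden states
            s = layer(s)
        return s
```

Because the recurrent layers are weight-tied, raising `num_loops` deepens the computation without adding a single parameter.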
OpenMythos offers flexible attention mechanisms, allowing users to switch between Grouped Query Attention (GQA) and Multi-Latent Attention (MLA). GQA reduces memory requirements by using fewer key-value heads than query heads and supports Flash Attention 2 for efficient computation when available. MLA shrinks the key-value cache by compressing it into a low-rank latent, splitting each head's dimensions into a RoPE-carrying positional component and a compressed non-positional one. The feed-forward layers are implemented as a sparse Mixture of Experts (MoE) with both routed and shared experts, which, combined with the variable loop depth, enables compute-adaptive, depth-variable reasoning. This MoE structure is suspected to be a key factor in Mythos's ability to handle diverse domains efficiently.
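A hedged sketch of the routed-plus-shared expert pattern, with hypothetical class and parameter names (the repository's actual router, load-balancing losses, and expert shapes are not shown here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    """Sketch of an MoE feed-forward layer with routed and shared experts."""

    def __init__(self, d_model: int, d_ff: int, n_routed: int,
                 n_shared: int, top_k: int):
        super().__init__()
        make_expert = lambda: nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
        self.routed = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))
        self.router = nn.Linear(d_model, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Shared experts process every token; routed experts only the top-k.
        out = torch.zeros_like(h)
        for expert in self.shared:
            out = out + expert(h)
        scores = F.softmax(self.router(h), dim=-1)      # (B, T, n_routed)
        weights, idx = scores.topk(self.top_k, dim=-1)  # per-token choices
        # (Top-k weights are often renormalized; omitted for brevity.)
        for k in range(self.top_k):
            for e, expert in enumerate(self.routed):
                mask = idx[..., k] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(h[mask])
        return out
```

The shared experts give every token a dense pathway, while the router's top-k selection keeps per-token compute roughly constant regardless of the total expert count.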
The repository provides pre-configured model variants ranging from 1 billion to 1 trillion parameters, each with detailed specifications for dimensionality, expert count, loop iterations, context length, and output capacity. Training scripts support both single- and multi-GPU setups and are designed for large-scale datasets such as FineWeb-Edu. The training process uses AdamW optimization, sharded streaming datasets, and precision settings tailored to modern GPUs, with a learning-rate schedule that combines linear warmup and cosine decay.
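The warmup-plus-cosine schedule can be written in a few lines; the function name and hyperparameter values below are illustrative, not the repository's actual training configuration:

```python
import math


def lr_at_step(step: int, max_lr: float, warmup_steps: int,
               total_steps: int, min_lr: float = 0.0) -> float:
    """Linear warmup to max_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * min(progress, 1.0)))


# Usage with AdamW (assumed settings; base lr of 1.0 lets the lambda
# return the absolute learning rate):
# optimizer = torch.optim.AdamW(model.parameters(), lr=1.0)
# scheduler = torch.optim.lr_scheduler.LambdaLR(
#     optimizer, lambda s: lr_at_step(s, max_lr=3e-4, warmup_steps=2000,
#                                     total_steps=100_000))
```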
A central hypothesis explored in OpenMythos is that the looped transformer architecture enables systematic generalization and depth extrapolation (running more loop iterations at inference than were used in training), allowing the model to solve problems that require multi-hop reasoning and compositional logic. The repository discusses theoretical aspects such as training stability, achieved by constraining the injection parameters so that the recurrent update's spectral radius stays below one, and the potential use of loop-index embeddings to differentiate computational phases across iterations. It also addresses challenges such as the memorization-reasoning tradeoff, overthinking caused by excessive loops, and specializing the reused parameters per iteration via LoRA adaptation.
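One way to realize the spectral-radius constraint is to spectrally normalize the injection weight and scale it by a fixed factor below one; since the spectral norm upper-bounds the spectral radius, the looped update stays contractive. The module below is an assumed construction, not OpenMythos's documented mechanism:

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm


class ContractiveInjection(nn.Module):
    """Sketch of a stability constraint on the recurrent injection map.

    spectral_norm rescales the weight to unit spectral norm; multiplying by a
    fixed gamma < 1 keeps the map contractive, so the looped update cannot
    amplify the latent state without bound. gamma and the module name are
    assumptions for illustration.
    """

    def __init__(self, d_model: int, gamma: float = 0.9):
        super().__init__()
        assert 0.0 < gamma < 1.0
        self.proj = spectral_norm(nn.Linear(d_model, d_model, bias=False))
        self.gamma = gamma

    def forward(self, s: torch.Tensor, e: torch.Tensor) -> torch.Tensor:
        # ||gamma * W||_2 < 1 bounds the recurrence's sensitivity to its own
        # state, while the fresh input e is added back in every loop.
        return self.gamma * self.proj(s) + e
```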
OpenMythos is designed for researchers and practitioners interested in advanced transformer architectures, offering comprehensive documentation, API references, and guidance on scaling laws for looped models. The project emphasizes parameter efficiency, adaptive computation, and robust training, positioning itself as a valuable resource for exploring the next generation of deep learning models inspired by Claude Mythos.