OpenMythos is an open-source, theoretical reconstruction of the Claude Mythos architecture, developed independently and based solely on publicly available research and speculation. The repository aims to provide a practical implementation of what are believed to be the core architectural innovations behind Claude Mythos, centered on a Recurrent-Depth Transformer (RDT) design. This design diverges from traditional transformers by recycling a subset of layers multiple times within a single forward pass, enabling deeper reasoning without increasing the parameter count.
The model architecture is divided into three stages: Prelude, Recurrent Block, and Coda. The Prelude and Coda consist of standard transformer layers, each run once, while the Recurrent Block is applied repeatedly, up to a configurable number of iterations. This looped structure lets the model perform iterative reasoning, akin to multi-step chain-of-thought, but entirely in latent space and without emitting intermediate tokens. The input signal is injected at every loop iteration, so the original context remains influential throughout the recurrent process.
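A minimal PyTorch sketch of this layout, assuming illustrative module names and a simple concatenate-and-project injection (the actual OpenMythos injection scheme and layer internals may differ):

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """Stand-in transformer layer: pre-norm MLP only, to keep the sketch short."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.mlp(self.norm(h))


class RecurrentDepthLM(nn.Module):
    """Sketch of the Prelude -> Recurrent Block -> Coda layout."""

    def __init__(self, d_model: int, n_prelude: int, n_recurrent: int,
                 n_coda: int, max_loops: int):
        super().__init__()
        self.prelude = nn.ModuleList(Block(d_model) for _ in range(n_prelude))
        self.recurrent = nn.ModuleList(Block(d_model) for _ in range(n_recurrent))
        self.coda = nn.ModuleList(Block(d_model) for _ in range(n_coda))
        self.max_loops = max_loops
        # Learned mixing map used to re-inject the prelude output each loop.
        self.inject = nn.Linear(2 * d_model, d_model, bias=False)

    def forward(self, h: torch.Tensor, num_loops: int | None = None) -> torch.Tensor:
        num_loops = num_loops or self.max_loops
        for layer in self.prelude:               # run once
            h = layer(h)
        e = h                                    # original context, re-injected below
        s = torch.zeros_like(h)                  # latent state refined across loops
        for _ in range(num_loops):               # weight-tied iterations
            s = self.inject(torch.cat([s, e], dim=-1))  # input signal every loop
            for layer in self.recurrent:         # same parameters each iteration
                s = layer(s)
        for layer in self.coda:                  # run once, final hidden states
            s = layer(s)
        return s
```

Because the recurrent layers are weight-tied, raising `num_loops` deepens the computation without adding a single parameter.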
OpenMythos offers flexible attention mechanisms, allowing users to switch between Grouped Query Attention (GQA) and Multi-Latent Attention (MLA). GQA reduces memory requirements by using fewer key-value heads than query heads and supports Flash Attention 2 for efficient computation when available. MLA shrinks the key-value cache by compressing it into a low-rank latent, splitting each head's dimensions into a RoPE-carrying positional component and a compressed non-positional one. The feed-forward layers are implemented as a sparse Mixture of Experts (MoE) with both routed and shared experts, which, combined with the variable loop depth, enables compute-adaptive, depth-variable reasoning. This MoE structure is suspected to be a key factor in Mythos's ability to handle diverse domains efficiently.
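A hedged sketch of the routed-plus-shared expert pattern, with hypothetical class and parameter names (the repository's actual router, load-balancing losses, and expert shapes are not shown here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    """Sketch of an MoE feed-forward layer with routed and shared experts."""

    def __init__(self, d_model: int, d_ff: int, n_routed: int,
                 n_shared: int, top_k: int):
        super().__init__()
        make_expert = lambda: nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
        self.routed = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))
        self.router = nn.Linear(d_model, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Shared experts process every token; routed experts only the top-k.
        out = torch.zeros_like(h)
        for expert in self.shared:
            out = out + expert(h)
        scores = F.softmax(self.router(h), dim=-1)      # (B, T, n_routed)
        weights, idx = scores.topk(self.top_k, dim=-1)  # per-token choices
        # (Top-k weights are often renormalized; omitted for brevity.)
        for k in range(self.top_k):
            for e, expert in enumerate(self.routed):
                mask = idx[..., k] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(h[mask])
        return out
```

The shared experts give every token a dense pathway, while the router's top-k selection keeps per-token compute roughly constant regardless of the total expert count.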
The repository provides pre-configured model variants ranging from 1 billion to 1 trillion parameters, each with detailed specifications for dimensionality, expert count, loop iterations, context length, and output capacity. Training scripts support both single- and multi-GPU setups and are designed for large-scale datasets such as FineWeb-Edu. The training process uses AdamW optimization, sharded streaming datasets, and precision settings tailored to modern GPUs, with a learning-rate schedule that combines linear warmup and cosine decay.
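The warmup-plus-cosine schedule can be written in a few lines; the function name and hyperparameter values below are illustrative, not the repository's actual training configuration:

```python
import math


def lr_at_step(step: int, max_lr: float, warmup_steps: int,
               total_steps: int, min_lr: float = 0.0) -> float:
    """Linear warmup to max_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * min(progress, 1.0)))


# Usage with AdamW (assumed settings; base lr of 1.0 lets the lambda
# return the absolute learning rate):
# optimizer = torch.optim.AdamW(model.parameters(), lr=1.0)
# scheduler = torch.optim.lr_scheduler.LambdaLR(
#     optimizer, lambda s: lr_at_step(s, max_lr=3e-4, warmup_steps=2000,
#                                     total_steps=100_000))
```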
A central hypothesis explored in OpenMythos is that the looped transformer architecture enables systematic generalization and depth extrapolation (running more loop iterations at inference than were used in training), allowing the model to solve problems that require multi-hop reasoning and compositional logic. The repository discusses theoretical aspects such as training stability, achieved by constraining the injection parameters so that the recurrent update's spectral radius stays below one, and the potential use of loop-index embeddings to differentiate computational phases across iterations. It also addresses challenges such as the memorization-reasoning tradeoff, overthinking caused by excessive loops, and specializing the reused parameters per iteration via LoRA adaptation.
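One way to realize the spectral-radius constraint is to spectrally normalize the injection weight and scale it by a fixed factor below one; since the spectral norm upper-bounds the spectral radius, the looped update stays contractive. The module below is an assumed construction, not OpenMythos's documented mechanism:

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm


class ContractiveInjection(nn.Module):
    """Sketch of a stability constraint on the recurrent injection map.

    spectral_norm rescales the weight to unit spectral norm; multiplying by a
    fixed gamma < 1 keeps the map contractive, so the looped update cannot
    amplify the latent state without bound. gamma and the module name are
    assumptions for illustration.
    """

    def __init__(self, d_model: int, gamma: float = 0.9):
        super().__init__()
        assert 0.0 < gamma < 1.0
        self.proj = spectral_norm(nn.Linear(d_model, d_model, bias=False))
        self.gamma = gamma

    def forward(self, s: torch.Tensor, e: torch.Tensor) -> torch.Tensor:
        # ||gamma * W||_2 < 1 bounds the recurrence's sensitivity to its own
        # state, while the fresh input e is added back in every loop.
        return self.gamma * self.proj(s) + e
```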
OpenMythos is designed for researchers and practitioners interested in advanced transformer architectures, offering comprehensive documentation, API references, and guidance on scaling laws for looped models. The project emphasizes parameter efficiency, adaptive computation, and robust training, positioning itself as a valuable resource for exploring the next generation of deep learning models inspired by Claude Mythos.