Description: The best ChatGPT that $100 can buy.
View karpathy/nanochat on GitHub ↗
The `nanochat` repository, created by Andrej Karpathy, provides a concise, educational implementation of a complete ChatGPT-style pipeline: a small Transformer-based language model together with the code to tokenize data, train it, and chat with it. The project's primary goal is to demystify the inner workings of large language models (LLMs) by offering a minimal, understandable, and runnable code base that can be trained end to end on a modest budget (the titular $100 of compute). It's designed to be a learning tool, allowing users to grasp the fundamental concepts behind these powerful models without the complexity of large-scale production implementations.
The core of `nanochat` is a token-level, GPT-style Transformer. The model processes text as a sequence of tokens produced by a byte-pair-encoding (BPE) tokenizer, learning patterns and relationships between tokens in order to predict the next token in a sequence. Working with subword tokens rather than raw characters or whole words keeps the vocabulary compact while still capturing meaningful units of text, and it keeps the training process transparent. The repository includes code for tokenization, data loading, model definition, training, and sampling (generating text).
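The encode/decode contract of a tokenizer can be sketched in a few lines. This is a toy character-level vocabulary standing in for nanochat's learned BPE tokenizer, just to show the interface: text in, integer IDs out, and a lossless round trip back.

```python
# Toy tokenizer sketch: map text to integer IDs and back.
# nanochat uses a learned BPE tokenizer; this character-level
# stand-in illustrates the same encode/decode contract.
text = "hello chat"
vocab = sorted(set(text))                      # unique symbols in the corpus
stoi = {ch: i for i, ch in enumerate(vocab)}   # string -> integer ID
itos = {i: ch for ch, i in stoi.items()}       # integer ID -> string

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

ids = encode("hello")
assert decode(ids) == "hello"  # round-trip is lossless
```

A real BPE tokenizer differs only in that its vocabulary entries are learned multi-character fragments rather than single characters; the interface the rest of the pipeline sees is identical.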
The data loading component handles the input text: a large corpus of web text for pretraining and smaller conversational datasets for finetuning. The tokenizer converts the text into numerical form, mapping each subword token in its vocabulary to an integer ID. This numerical representation is crucial for feeding the data into the neural network. The model definition outlines the structure of the Transformer, which consists of a token embedding layer, a stack of Transformer blocks (causal self-attention followed by a feed-forward MLP), and a linear head for outputting probabilities over the vocabulary. The attention layers are the heart of the model, allowing each position to attend to earlier tokens and use this context to predict the next token.
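The model structure described above can be sketched as a tiny PyTorch module. This is a minimal illustration, not nanochat's actual configuration: the sizes are toy values, and a single attention block stands in for the full stack.

```python
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    """Minimal sketch of a GPT-style model: token embedding, one causal
    self-attention block with an MLP, and a linear head over the vocab.
    Sizes here are illustrative, not nanochat's real hyperparameters."""
    def __init__(self, vocab_size=64, n_embd=32, n_head=4, block_size=16):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)   # token IDs -> vectors
        self.pos_emb = nn.Embedding(block_size, n_embd)   # position information
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
                                 nn.Linear(4 * n_embd, n_embd))
        self.head = nn.Linear(n_embd, vocab_size)         # logits over vocab

    def forward(self, idx):
        B, T = idx.shape
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T))
        # Causal mask: True entries are *blocked*, so no position
        # can attend to a later (future) position.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(x, x, x, attn_mask=mask)
        x = x + a            # residual connection around attention
        x = x + self.mlp(x)  # residual connection around the MLP
        return self.head(x)  # per-position logits over the vocabulary

model = TinyGPT()
logits = model(torch.zeros(1, 8, dtype=torch.long))  # batch of 1, 8 tokens
assert logits.shape == (1, 8, 64)  # one logit vector per position
```

The causal mask is what makes this a language model: each output position is computed only from tokens at or before it, so the same forward pass can be trained to predict every next token in the sequence at once.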
The training process involves feeding batches of token IDs into the model, calculating the cross-entropy loss (a measure of how well the model's next-token predictions match the actual tokens), and updating the model's parameters using backpropagation and a gradient-based optimizer. The repository implements this loop in stages: pretraining on raw text, then finetuning on conversational data so the model behaves like a chat assistant. Training is iterative, with the model learning to improve its predictions over many optimization steps.
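One step of that loop can be sketched as follows. This is a deliberately tiny stand-in model (an embedding plus a linear head) so the mechanics are visible; nanochat's real loop uses the full Transformer, batching, and learning-rate schedules, and AdamW here is a representative optimizer choice, not a claim about the repo's exact setup.

```python
import torch
import torch.nn.functional as F

# Sketch of next-token-prediction training on a toy model.
torch.manual_seed(0)
vocab_size, n_embd = 16, 8
emb = torch.nn.Embedding(vocab_size, n_embd)
head = torch.nn.Linear(n_embd, vocab_size)
opt = torch.optim.AdamW(list(emb.parameters()) + list(head.parameters()),
                        lr=1e-2)

tokens = torch.tensor([[3, 7, 2, 9, 1]])        # one tiny "document"
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: predict next

first_loss = None
for step in range(50):
    logits = head(emb(inputs))                   # (B, T, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size),
                           targets.reshape(-1))
    if first_loss is None:
        first_loss = loss.item()
    opt.zero_grad()
    loss.backward()   # backpropagation
    opt.step()        # parameter update

# The model memorizes this tiny sequence, so the loss falls.
assert loss.item() < first_loss
```

The shift-by-one between `inputs` and `targets` is the whole trick of language-model training: every position in the batch simultaneously serves as a training example for predicting its successor.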
Finally, the sampling component allows the user to generate new text from the trained model. Given a starting sequence of tokens (the prompt), the model outputs a probability distribution over the vocabulary for the next token. A token is then selected from this distribution, either greedily via argmax or stochastically, often with temperature scaling and top-k truncation, and appended to the sequence; repeating this loop autoregressively produces a response that resembles the training data.
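The selection step can be sketched as a small function. This shows the common temperature-plus-top-k recipe in general terms; it is not nanochat's exact sampling code.

```python
import torch

def sample_next(logits, temperature=1.0, top_k=None):
    """Pick the next token ID from a 1-D logits vector.
    Sketch of a typical autoregressive sampling step: temperature
    scaling, optional top-k truncation, then a draw from the
    resulting distribution."""
    logits = logits / temperature          # <1.0 sharpens, >1.0 flattens
    if top_k is not None:
        v, _ = torch.topk(logits, top_k)   # k largest logits, descending
        logits = logits.clone()
        logits[logits < v[-1]] = float("-inf")  # drop everything else
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

logits = torch.tensor([0.1, 5.0, 0.2, 0.3])
tok = sample_next(logits, temperature=0.5, top_k=1)
assert tok == 1  # top_k=1 keeps only the argmax, i.e. greedy decoding
```

In a full generation loop, the sampled token is appended to the context and the model is run again, one token per iteration, until an end-of-sequence token or a length limit is reached.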
`nanochat` is valuable because it provides a clear and accessible entry point for understanding the core concepts of LLMs. It allows users to experiment with different hyperparameters, architectures, and datasets, gaining a deeper understanding of how these models work. It's a practical example of how to build a basic language model from scratch, making it an excellent resource for students, researchers, and anyone interested in learning about the fundamentals of natural language processing and deep learning.