Description: Ongoing research training transformer models at scale
View nvidia/megatron-lm on GitHub ↗
The NVIDIA Megatron-LM repository is a research-focused project for training large transformer models at scale. It provides a suite of GPU-optimized tools and libraries for distributed training, enabling researchers and engineers to build and experiment with state-of-the-art language models. The repository is structured around two core components: Megatron-LM and Megatron Core.
Megatron-LM serves as a reference implementation, offering pre-configured training scripts and examples built on the foundational Megatron Core library. This makes it well suited to research teams and newcomers to distributed training, providing an accessible entry point for rapid experimentation with different model architectures and training configurations.
Megatron Core, by contrast, is a modular, composable library that provides the fundamental building blocks for constructing custom training pipelines: GPU-optimized transformer components, advanced parallelism strategies (Tensor, Pipeline, Data, Expert, Sequence, and Context Parallelism), mixed-precision training (FP16, BF16, FP8, FP4), and a range of pre-built model architectures. This makes Megatron Core suitable for framework developers and machine learning engineers who need to build highly customized, optimized training workflows.
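To make the tensor-parallelism idea concrete, here is a minimal NumPy sketch (not Megatron Core's actual API) of a column-parallel linear layer: the weight matrix's output dimension is sharded across ranks, each rank computes a partial matmul, and an all-gather reassembles the full activation.

```python
import numpy as np

# Conceptual sketch of tensor (column) parallelism; in Megatron the
# shards live on different GPUs and the concatenate is an all-gather.
def column_parallel_linear(x, weight, num_ranks):
    # Split the weight's output dimension across ranks.
    shards = np.split(weight, num_ranks, axis=1)
    # Each rank computes its partial output independently...
    partials = [x @ w_shard for w_shard in shards]
    # ...and an all-gather (here: a concatenate) rebuilds the output.
    return np.concatenate(partials, axis=-1)

x = np.random.randn(4, 8)   # (batch, hidden)
w = np.random.randn(8, 16)  # (hidden, output)
out_parallel = column_parallel_linear(x, w, num_ranks=4)
out_serial = x @ w
assert np.allclose(out_parallel, out_serial)
```

The sharded computation is mathematically identical to the unsharded one; the engineering work in Megatron Core lies in overlapping the communication with computation and fusing it with adjacent kernels.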
A key feature of Megatron-LM is its focus on performance and scalability. The repository incorporates numerous optimizations to maximize GPU utilization and minimize communication overhead, allowing for efficient training of models with billions of parameters across thousands of GPUs. Benchmarking results demonstrate impressive model FLOP utilization (MFU), reaching up to 47% on H100 clusters. This is achieved through techniques like fine-grained overlapping of communication and computation, and the use of efficient communication primitives. The repository also provides strong and weak scaling results, showcasing the ability to maintain high performance as the model size and the number of GPUs increase.
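MFU compares achieved training throughput against the hardware's theoretical peak. A common back-of-the-envelope estimate uses roughly 6 FLOPs per parameter per token for a dense transformer's forward and backward pass; the numbers below are illustrative assumptions, not measured Megatron results.

```python
def model_flop_utilization(num_params, tokens_per_sec, peak_flops_per_sec):
    """MFU via the common ~6 * N FLOPs-per-token approximation for
    dense transformer training (forward + backward pass)."""
    achieved = 6 * num_params * tokens_per_sec
    return achieved / peak_flops_per_sec

# Hypothetical scenario: a 70B-parameter model at 1.1M tokens/s on
# 1024 H100 GPUs (~989 TFLOP/s peak dense BF16 each).
mfu = model_flop_utilization(
    num_params=70e9,
    tokens_per_sec=1.1e6,
    peak_flops_per_sec=1024 * 989e12,
)
print(f"{mfu:.1%}")  # → 45.6%
```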
The repository also includes the Megatron Bridge, which provides bidirectional checkpoint conversion between Hugging Face and Megatron formats. This allows for seamless interoperability with the broader ecosystem of pre-trained models and tools, simplifying the process of integrating Megatron-trained models into existing workflows.
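Much of what a checkpoint bridge does is systematic renaming and resharding of parameters between the two formats. The sketch below illustrates only the key-renaming step; the mapping patterns are hypothetical simplifications, not Megatron Bridge's actual conversion tables.

```python
import re

# Hypothetical HF-style -> Megatron-style key patterns, for illustration.
HF_TO_MEGATRON = [
    (r"model\.layers\.(\d+)\.self_attn\.o_proj\.weight",
     r"decoder.layers.\1.self_attention.linear_proj.weight"),
    (r"model\.embed_tokens\.weight",
     r"embedding.word_embeddings.weight"),
]

def convert_keys(state_dict):
    """Rename Hugging Face-style keys to Megatron-style keys."""
    out = {}
    for key, tensor in state_dict.items():
        for pattern, repl in HF_TO_MEGATRON:
            new_key, n = re.subn(pattern, repl, key)
            if n:
                key = new_key
                break
        out[key] = tensor
    return out

ckpt = {"model.layers.0.self_attn.o_proj.weight": "tensor..."}
print(convert_keys(ckpt))
# {'decoder.layers.0.self_attention.linear_proj.weight': 'tensor...'}
```

A real bridge must additionally split or merge tensors to match the parallelism layout (e.g. fused QKV weights and tensor-parallel shards), which is where most of the complexity lives.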
The project is actively developed, with a focus on incorporating the latest advancements in large language model training. Recent updates include support for dynamic context parallelism, which improves training speed for variable-length sequences, and integration of features such as YaRN RoPE scaling and custom activation functions. The roadmap includes continued Mixture of Experts (MoE) enhancements, FP8 optimizations, and performance improvements on the latest Blackwell hardware.
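For intuition on RoPE scaling, here is a minimal NumPy sketch of rotary position embeddings with simple position interpolation (dividing positions by a scale factor to extend context). YaRN itself is more elaborate, interpolating per-frequency and rescaling attention, so this is a simplified stand-in, not Megatron's implementation.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Rotation angles per (position, frequency) pair; scale > 1
    compresses positions, stretching the usable context window."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return np.outer(np.asarray(positions) / scale, inv_freq)

def apply_rope(x, positions, base=10000.0, scale=1.0):
    # x: (seq, dim) with dim even; rotate (even, odd) channel pairs.
    angles = rope_angles(positions, x.shape[-1], base, scale)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

x = np.random.randn(16, 64)
y = apply_rope(x, np.arange(16), scale=4.0)  # 4x context extension
```

Because RoPE only rotates channel pairs, it preserves vector norms, which is one reason it composes cleanly with attention.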
The repository is well-documented and actively encourages community contributions. It provides comprehensive documentation, a detailed contributing guide, and a dedicated issue tracker for bug reports and feature requests. This open approach fosters collaboration and ensures that the project remains at the forefront of large language model training research. The project's structure is organized with clear directories for core components, examples, tools, tests, and documentation, making it easy to navigate and understand. Overall, NVIDIA's Megatron-LM is a valuable resource for researchers and engineers working on large language models, providing a powerful and efficient platform for training and experimentation.