megatron-lm
by
bigcode-project

Description: Ongoing research training transformer models at scale

View bigcode-project/megatron-lm on GitHub ↗

Summary Information

Updated 1 hour ago
Added to GitGenius on February 29th, 2024
Created on October 7th, 2022
Open Issues/Pull Requests: 30 (+0)
Number of forks: 51
Total Stargazers: 395 (+0)
Total Subscribers: 7 (+0)
Detailed Description

The Megatron-LM repository maintained by the BigCode Project is an open-source fork of NVIDIA's Megatron-LM focused on training large transformer language models at scale. It builds on prior work in deep learning and natural language processing to support models that process extensive datasets efficiently. The core strength of Megatron-LM is its ability to scale models to billions of parameters, enabling them to capture complex patterns and nuances within data.

A key contribution of the Megatron-LM project is its support for model parallelism: a large model is partitioned across multiple GPUs so that training is not bottlenecked by the memory or compute limits of a single device. The repository provides tools and frameworks that simplify this partitioning, enabling researchers to train models with billions of parameters on distributed computing resources.
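To make the partitioning idea concrete, here is a minimal sketch (not Megatron-LM's actual API) of the column-parallel linear layer at the heart of tensor model parallelism: the weight matrix is split column-wise across devices, each device computes a partial output, and the shards are concatenated. Plain numpy arrays stand in for GPUs; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def column_parallel_linear(x, weight, n_devices):
    """Illustrative sketch: split `weight` into column shards (one per
    hypothetical device), compute partial outputs, and concatenate them."""
    shards = np.array_split(weight, n_devices, axis=1)     # one column shard per device
    partials = [x @ w_shard for w_shard in shards]         # done in parallel on real hardware
    return np.concatenate(partials, axis=1)                # gather shards into the full output

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))        # batch of 8, hidden size 16
weight = rng.standard_normal((16, 32))  # full, unsharded weight

out_parallel = column_parallel_linear(x, weight, n_devices=4)
out_reference = x @ weight              # single-device result for comparison
assert np.allclose(out_parallel, out_reference)
```

Because each shard only sees a slice of the columns, no device ever holds the full weight matrix, which is what allows models larger than a single GPU's memory.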

The architecture of Megatron-LM is designed to optimize performance during both the training and inference phases. It leverages techniques such as pipeline parallelism and tensor model parallelism to maximize throughput and reduce communication overhead between computational nodes. These methods are crucial for maintaining high efficiency when working with extensive neural networks, ensuring that computations can proceed without significant delays.
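The throughput gain from pipeline parallelism can be sketched with a simple timing model (a GPipe-style schedule, not Megatron-LM's actual scheduler): a batch is split into micro-batches that flow through the pipeline stages, so stages overlap work instead of idling. The one-step-per-stage timing assumption is an illustrative simplification.

```python
def pipeline_steps(n_stages, n_microbatches):
    """Time steps to push all micro-batches through the pipeline,
    assuming each stage takes exactly one step per micro-batch."""
    return n_stages + n_microbatches - 1

def bubble_fraction(n_stages, n_microbatches):
    """Fraction of stage-time slots left idle (the 'pipeline bubble')."""
    total_slots = n_stages * pipeline_steps(n_stages, n_microbatches)
    useful_slots = n_stages * n_microbatches
    return 1 - useful_slots / total_slots

# With a fixed number of stages, more micro-batches shrink the bubble,
# which is why Megatron-style training splits each batch finely.
few = bubble_fraction(4, 4)    # 4 stages, 4 micro-batches
many = bubble_fraction(4, 32)  # same stages, 32 micro-batches
assert many < few
```

In this model the bubble fraction works out to (stages - 1) / (micro-batches + stages - 1), so the idle share vanishes as the micro-batch count grows, at the cost of smaller per-step work on each device.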

Furthermore, Megatron-LM is not just a tool for research but also serves as an educational resource. The repository includes comprehensive documentation and tutorials aimed at helping developers understand the intricacies of building and scaling large models. These resources are invaluable for those looking to explore the capabilities of transformer-based architectures or develop custom language models tailored to specific applications.

Overall, Megatron-LM stands out due to its focus on scalability and efficiency, making it a pivotal resource in the field of natural language processing. By lowering the barriers to training large-scale models, it empowers researchers and developers to push the boundaries of what is possible with AI-driven text analysis and generation. As such, Megatron-LM continues to play a significant role in advancing both theoretical research and practical applications within machine learning communities worldwide.
