deepspeed
by
deepspeedai

Description: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

View deepspeedai/deepspeed on GitHub ↗

Summary Information

Updated 2 hours ago
Added to GitGenius on December 13th, 2023
Created on January 23rd, 2020
Open Issues/Pull Requests: 1,276 (+0)
Number of forks: 4,726
Total Stargazers: 41,659 (+1)
Total Subscribers: 356 (+0)
Detailed Description

The DeepSpeed GitHub repository, maintained by the deepspeedai organization (the project was originally developed at Microsoft), is designed to accelerate deep learning training. It provides an advanced library that optimizes both memory usage and computational efficiency during the training of large models on distributed systems. The primary goal of DeepSpeed is to enable faster scaling of machine learning workloads across multiple GPUs or nodes in a cluster.

DeepSpeed's architecture focuses on three main components: ZeRO (Zero Redundancy Optimizer), pipeline parallelism, and model sharding. ZeRO partitions optimizer states, gradients, and parameters across data parallel processes, which significantly reduces memory consumption per GPU without sacrificing performance. This approach allows for training models that would otherwise not fit in the memory of a single device or even multiple devices under traditional configurations.
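The memory saving behind ZeRO can be illustrated with a minimal sketch. This is a conceptual toy, not DeepSpeed's actual implementation: optimizer state for a set of parameters is partitioned across data-parallel ranks so each rank stores only its own shard.

```python
# Conceptual sketch of ZeRO-style partitioning (illustrative only, not
# DeepSpeed's real implementation): optimizer states for P parameters
# are split across N data-parallel ranks, so each rank holds ~P/N states
# instead of a full replica.

def partition(params, num_ranks):
    """Round-robin assignment of parameter indices to ranks."""
    shards = [[] for _ in range(num_ranks)]
    for i, p in enumerate(params):
        shards[i % num_ranks].append(p)
    return shards

params = list(range(8))        # stand-ins for 8 parameter tensors
shards = partition(params, 4)  # 4 data-parallel ranks
# Each rank now owns optimizer state for only 2 of the 8 parameters,
# a 4x reduction in per-GPU optimizer memory in this toy setting.
```

In real ZeRO, each rank updates only its shard and the results are gathered (or kept partitioned, at higher ZeRO stages) via collective communication.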

Pipeline parallelism is another key feature of DeepSpeed, enabling model layers to be split and processed on different GPUs or machines. This division enables simultaneous execution of different parts of the model, further speeding up the training process by optimizing hardware utilization and reducing bottlenecks that typically arise in sequential processing.
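The core idea of pipeline parallelism, splitting a model's layers into contiguous stages, can be sketched as follows. This is a simplified illustration of the partitioning step only (DeepSpeed's `PipelineModule` additionally handles micro-batch scheduling and inter-stage communication):

```python
# Hypothetical sketch: split a model's layers into contiguous,
# near-equal stages, one per device, so different stages can work on
# different micro-batches at the same time.

def split_into_stages(layers, num_stages):
    """Partition layers into num_stages contiguous, balanced chunks."""
    per_stage, remainder = divmod(len(layers), num_stages)
    stages, start = [], 0
    for s in range(num_stages):
        # Early stages absorb the remainder, one extra layer each.
        end = start + per_stage + (1 if s < remainder else 0)
        stages.append(layers[start:end])
        start = end
    return stages

layers = [f"layer_{i}" for i in range(10)]
stages = split_into_stages(layers, 4)  # stage sizes: [3, 3, 2, 2]
```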

Model sharding complements these techniques by distributing individual sub-models across devices. By doing so, DeepSpeed can train even larger models than would otherwise be possible with a single machine's resources. This approach not only increases the feasible model size but also allows for more complex architectures without being constrained by per-device memory limitations.

DeepSpeed integrates seamlessly with PyTorch, the deep learning framework it is built on, making it accessible to researchers and developers who use that platform. It offers near plug-and-play functionality so that users can leverage its advanced capabilities without needing to significantly alter their existing codebase.
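In practice, most DeepSpeed features are enabled through a JSON-style configuration rather than code changes. The sketch below shows such a configuration as a Python dict; the field names follow DeepSpeed's config schema, but the specific values are illustrative, not recommendations:

```python
# Example DeepSpeed configuration (a sketch; values are illustrative).
ds_config = {
    "train_batch_size": 32,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},          # mixed-precision training
    "zero_optimization": {
        "stage": 2,                     # ZeRO stage 2: partition optimizer
    },                                  # states and gradients across ranks
}
```

In a training script, this dict (or an equivalent JSON file) is passed to `deepspeed.initialize(...)`, which wraps an ordinary PyTorch model and optimizer and returns a drop-in engine for the training loop.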

Beyond performance improvements, DeepSpeed also provides features aimed at enhancing reproducibility and stability in training processes. Its robust fault tolerance mechanisms ensure that training sessions are resilient to failures, which is crucial when working with large-scale distributed systems.

The repository includes comprehensive documentation, tutorials, and examples to help users get started and make the most of its features. The community around DeepSpeed actively contributes by reporting issues, suggesting improvements, and providing support through forums and discussions.

In summary, DeepSpeed is a sophisticated tool for scaling deep learning models efficiently across distributed systems. Its innovative strategies in memory management, parallel processing, and integration with existing frameworks make it an invaluable resource for researchers aiming to push the boundaries of what's possible in AI model training.
