torchtitan
by
pytorch

Description: A PyTorch native platform for training generative AI models

View pytorch/torchtitan on GitHub ↗

Summary Information

Updated 16 minutes ago
Added to GitGenius on May 13th, 2025
Created on December 13th, 2023
Open Issues/Pull Requests: 407 (+1)
Number of forks: 719
Total Stargazers: 5,085 (+0)
Total Subscribers: 56 (+0)
Detailed Description

TorchTitan is a PyTorch native platform for large-scale pretraining of generative AI models, developed under the PyTorch organization. Rather than building a framework on top of PyTorch, it demonstrates the latest PyTorch distributed training capabilities (DTensor, DeviceMesh, FSDP2, torch.compile, and related features) in a clean, minimal, and extensible codebase. The goal is to serve both as a production-grade pretraining stack for models such as Llama 3 and as a reference implementation that researchers and practitioners can read, modify, and scale from a handful of GPUs to thousands.

The project focuses on composable, multi-dimensional parallelism. Data parallelism is provided through FSDP2, a per-parameter-sharded rewrite of Fully Sharded Data Parallel, and can be combined with tensor parallelism (including sequence parallelism), pipeline parallelism, and context parallelism for long sequences, all expressed on top of PyTorch's DTensor and DeviceMesh abstractions. On top of these, TorchTitan integrates torch.compile for compiler-level optimization, Float8 training on supported hardware, and flexible activation checkpointing, so techniques can be stacked and traded off against each other rather than chosen in isolation.

A significant aspect of TorchTitan is its modularity and ease of adoption. Training runs are described by TOML configuration files, with fields overridable from the command line, and launched with torchrun; models are defined in plain PyTorch, and the parallelism techniques are applied to them with minimal changes to model code. The project also ships production-oriented features such as distributed checkpointing (DCP), which supports saving and resuming runs. However, it's important to note that not *all* model architectures are covered out of the box; the codebase is deliberately kept small, and coverage expands based on community needs.
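A configuration sketch in the style of torchtitan's TOML train configs illustrates the idea; the section and field names below are illustrative and may differ between versions, so the repo's shipped config files should be treated as the source of truth:

```toml
# Illustrative sketch of a torchtitan-style train config (field names
# are assumptions; consult the repo's own train configs for the schema).
[model]
name = "llama3"
flavor = "8B"

[training]
seq_len = 8192
steps = 1000

[parallelism]
data_parallel_shard_degree = -1   # e.g. -1: use all remaining ranks for FSDP
tensor_parallel_degree = 1
pipeline_parallel_degree = 1
```

A run is then typically launched through torchrun, which the repository wraps in a helper script; any individual field can also be overridden as a command-line flag instead of editing the file.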

The repository documents performance and loss-convergence results for its supported models, demonstrating how the parallelism and compilation techniques compose at various GPU scales. These results are updated as new optimizations land. The training loop also emits metrics such as throughput, model FLOPS utilization (MFU), and memory usage, which users can log to monitoring tools to profile their own runs and identify bottlenecks.

Finally, TorchTitan is an open-source project actively maintained by PyTorch developers and a community of contributors. The GitHub repository serves as the central hub for contributions, bug reports, and discussions, and welcomes new model definitions, parallelism improvements, and documentation. It's a valuable resource for anyone looking to pretrain large generative models with native PyTorch, offering a practical and readable path to scaled-out training.

