stable-audio-tools
by
Stability-AI

Description: Generative models for conditional audio generation

Summary Information

Updated 20 minutes ago

Added to GitGenius on December 16th, 2025

Created on May 23rd, 2023

Open Issues & Pull Requests: 128 (+0)

Number of forks: 469

Total Stargazers: 3,809 (+0)

Total Subscribers: 53 (+0)

Issue Activity (beta)

Open issues: 87

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 556 days

Stale 30+ days: 85

Stale 90+ days: 83

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

No label distribution available yet.

Most active issues this week

No issue events were indexed in the last 7 days.

Repository Insights (GitGenius)

Median issue/PR response: 3.2 days

Mean response time: 56.4 days

90th percentile: 179.6 days

Tracked items: 103

Most active contributors

zqevans - 27 events, 18 issues
nateraw - 21 events, 16 issues
sskalnik - 19 events, 12 issues
BingliangLi - 17 events, 6 issues
fred-dev - 16 events, 9 issues

Detailed Description

Stable Audio Tools is a Python-based repository maintained by Stability AI that provides training and inference code for generative audio models capable of conditional audio generation. The project is classified across multiple domains including audio generation, sound synthesis, machine learning, deep learning, generative AI, audio processing, music production, sound design, AI models, and audio utilities. The repository requires PyTorch 2.5 or later to support Flash Attention and Flex Attention capabilities, with development conducted in Python 3.10. Dependency management is handled through uv, a fast and reproducible package manager.

The repository includes a Gradio-based interface for testing trained models, allowing users to interact with pre-trained checkpoints from Hugging Face without extensive setup. The interface supports multiple configuration options including loading pre-trained models by name, specifying local model configurations and checkpoint paths, replacing pretransform components for decoder testing, creating publicly shareable links, and setting login credentials. Users can also optionally convert model weights to half-precision for reduced memory requirements.
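The options listed above map naturally onto a command line. The following is a minimal sketch of how such a launch command might be assembled; the script name `run_gradio.py`, every flag name, and the helper `build_gradio_command` are assumptions made for illustration and should be checked against the repository's README before use.

```python
def build_gradio_command(pretrained_name=None, model_config=None, ckpt_path=None,
                         pretransform_ckpt_path=None, share=False,
                         username=None, password=None, model_half=False):
    """Return an argv list for launching the demo UI.

    The script name and every flag name here are assumptions for illustration.
    """
    has_local_model = model_config is not None and ckpt_path is not None
    if pretrained_name is None and not has_local_model:
        raise ValueError("pass pretrained_name, or both model_config and ckpt_path")

    cmd = ["python3", "./run_gradio.py"]  # assumed entry-point script
    if pretrained_name is not None:
        # Load a pre-trained model by its Hugging Face name.
        cmd += ["--pretrained-name", pretrained_name]
    else:
        # Load a local model from its config file and checkpoint.
        cmd += ["--model-config", model_config, "--ckpt-path", ckpt_path]
    if pretransform_ckpt_path is not None:
        # Swap in a different pretransform, e.g. to test a decoder.
        cmd += ["--pretransform-ckpt-path", pretransform_ckpt_path]
    if share:
        cmd.append("--share")  # publicly shareable link
    if username is not None and password is not None:
        cmd += ["--username", username, "--password", password]
    if model_half:
        cmd.append("--model-half")  # half-precision weights
    return cmd


command = build_gradio_command(
    pretrained_name="stabilityai/stable-audio-open-1.0",  # example model name
    model_half=True,
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root.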

Training functionality is built on PyTorch Lightning to facilitate multi-GPU and multi-node distributed training. The training pipeline requires both model configuration files and dataset configuration files, along with a Weights and Biases account for logging training outputs and demonstrations. The codebase implements a training wrapper system where models are wrapped in PyTorch Lightning modules during training to include discriminators, EMA model copies, and optimizer states. An unwrap_model.py script removes these training-specific components to create inference-ready checkpoints, which are required for inference scripts, using models as pretransforms, and fine-tuning with modified configurations.
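To make the idea of unwrapping concrete, here is a simplified sketch of what stripping training-only state from a checkpoint could look like. It is not the logic of unwrap_model.py: the checkpoint layout, the `model.` key prefix, and the helper name `unwrap_checkpoint` are all hypothetical, and plain floats stand in for tensors so the example runs without PyTorch.

```python
def unwrap_checkpoint(wrapped, model_prefix="model."):
    """Keep only the weights stored under `model_prefix` (a hypothetical layout).

    Everything else (discriminator weights, EMA copies, optimizer state)
    is dropped, and the prefix is removed from the surviving keys.
    """
    state_dict = wrapped["state_dict"]
    unwrapped = {
        key[len(model_prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(model_prefix)
    }
    return {"state_dict": unwrapped}


# Toy training checkpoint; the floats stand in for weight tensors.
training_ckpt = {
    "state_dict": {
        "model.encoder.weight": 0.1,
        "model.decoder.weight": 0.2,
        "model_ema.encoder.weight": 0.11,  # EMA copy, training only
        "discriminator.weight": 0.3,       # discriminator, training only
    },
    "optimizer_states": [{"lr": 1e-4}],    # optimizer state, training only
}

inference_ckpt = unwrap_checkpoint(training_ckpt)
print(sorted(inference_ckpt["state_dict"]))
```

A real unwrapping step may prefer the EMA weights over the raw ones and may differ per model type; the sketch only shows the filtering idea.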

The repository supports fine-tuning through two approaches: resuming from a wrapped model checkpoint using the --ckpt-path flag, or starting a fresh training run from a pre-trained unwrapped model via the --pretrained-ckpt-path flag. Training is configured through command-line flags that control checkpoint frequency, batch size, GPU allocation across one or more nodes, gradient accumulation, the distributed training strategy (including DeepSpeed ZeRO Stage 2), floating-point precision, the number of data loader workers, and the random seed used for deterministic training.
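Several of these settings combine to determine how many samples contribute to each optimizer step. The sketch below works through that arithmetic, assuming the batch size is specified per GPU; that assumption, the parameter names, and the helper `effective_batch_size` are illustrative rather than taken from the codebase.

```python
def effective_batch_size(batch_size, num_gpus=1, num_nodes=1, accum_batches=1):
    """Samples per optimizer step, assuming `batch_size` is per GPU."""
    for name, value in [("batch_size", batch_size), ("num_gpus", num_gpus),
                        ("num_nodes", num_nodes), ("accum_batches", accum_batches)]:
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    # Each GPU on each node processes `batch_size` samples per forward pass,
    # and gradients are accumulated over `accum_batches` passes before a step.
    return batch_size * num_gpus * num_nodes * accum_batches


# 8 samples per GPU, 4 GPUs per node, 2 nodes, 2 accumulation steps.
total = effective_batch_size(8, num_gpus=4, num_nodes=2, accum_batches=2)
print(total)  # 128
```

Keeping this product constant when changing hardware (for example, halving the GPU count while doubling gradient accumulation) is one common way to keep runs comparable.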

Model and dataset configuration relies on JSON files defining hyperparameters, training settings, and dataset information. Model configurations specify the model type from options including autoencoder, diffusion_uncond, diffusion_cond, diffusion_cond_inpaint, diffusion_autoencoder, and lm, along with audio properties like sample size, sample rate, and channel count. Dataset configuration supports both local audio file directories and WebDataset datasets stored in Amazon S3.
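As an illustration of the top-level fields described above, the following sketch parses a small model configuration and checks it. The model type values come from the list above; the JSON key names (`model_type`, `sample_size`, `sample_rate`, `audio_channels`), the example values, and the helper `check_model_config` are assumptions and may not match the repository's actual schema.

```python
import json

# Model types named in the description above.
MODEL_TYPES = {
    "autoencoder", "diffusion_uncond", "diffusion_cond",
    "diffusion_cond_inpaint", "diffusion_autoencoder", "lm",
}


def check_model_config(config):
    """Return a list of problems found in the top-level fields (empty if none)."""
    problems = []
    if config.get("model_type") not in MODEL_TYPES:
        problems.append(f"unknown model_type: {config.get('model_type')!r}")
    # Key names below are assumed, not confirmed against the repository.
    for key in ("sample_size", "sample_rate", "audio_channels"):
        value = config.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be a positive integer, got {value!r}")
    return problems


example_config = json.loads("""
{
    "model_type": "diffusion_cond",
    "sample_size": 65536,
    "sample_rate": 44100,
    "audio_channels": 2
}
""")

print(check_model_config(example_config))
```

A check like this can catch a misspelled model type or a missing audio property before a long training job is launched.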

According to GitGenius activity tracking, the repository has a median issue and pull request response latency of 76.5 hours (about 3.2 days) across 103 tracked items and a mean latency of 1354.3 hours (about 56.4 days). The gap between the two suggests that a small number of very slow responses pull the mean up. The most active tracked contributors are zqevans (27 events), nateraw (21 events), and sskalnik (19 events). GitGenius also reports contributors shared with Microsoft's VS Code and TypeScript repositories and with the Rust language project, which suggests some overlap with those larger open-source communities.
