RVC-Boss/GPT-SoVITS

Description: 1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Summary Information

Updated 23 minutes ago

Added to GitGenius on February 26th, 2026

Created on January 14th, 2024

Open Issues & Pull Requests: 873 (+0)

Number of forks: 6,511

Total Stargazers: 59,736 (+0)

Total Subscribers: 274 (+0)

Issue Activity (beta)

Open issues: 692

New in 7 days: 3

Closed in 7 days: 2

Avg open age: 459 days

Stale 30+ days: 683

Stale 90+ days: 667

Recent activity

Opened in 7 days: 2

Closed in 7 days: 2

Comments in 7 days: 0

Events in 7 days: 2

Top labels

In follow-up (192)
todolist (36)
bug (13)
高亮 ("highlight") (4)
enhancement (2)
good first issue (2)
question (1)

Most active issues this week

#2282 SoVITS model fails to train while GPT trains fine; errors persist (5070 GPU, 14600KF CPU) - 2 events / 1 comment
#2802 Line 301 of install.sh is missing -e on echo, so the color escape sequences fail - 2 events / 1 comment
#2805 Great project - 2 events / 0 comments
#2605 Suggestion: add special annotation support for exhaling, breathing sounds, and laughter - 1 event / 1 comment
#2798 Windows single-GPU training crash (ProcessExitedException 3221225620) - 1 event / 0 comments

Repository Insights (GitGenius)

Median issue/PR response: 12.1 hours

Mean response time: 60.3 days

90th percentile: 254.4 days

Tracked items: 1,506

Most active contributors

RVC-Boss - 1,494 events, 728 issues
XXXXRT666 - 575 events, 357 issues
KamioRinn - 143 events, 91 issues
ChasonJiang - 93 events, 47 issues
SapphireLab - 87 events, 72 issues

Related by overlapping contributors

Detailed Description

GPT-SoVITS is a Python-based text-to-speech and voice cloning system that lets users build high-quality TTS models from very little voice data. Its central claim is that one minute of voice data is enough to train an effective model, which makes it practical for few-shot voice cloning. The project provides a WebUI that integrates the tools needed for the whole workflow, from raw audio to trained models.

The system supports two primary inference modes. Zero-shot TTS converts text to speech immediately from a five-second vocal sample, with no training. Few-shot TTS fine-tunes on one minute of training data to improve voice similarity and realism. The reported real-time factors (synthesis time divided by the duration of the generated audio) are 0.028 on an NVIDIA 4060Ti and 0.014 on a 4090. The quoted figure of approximately 3.36 seconds to synthesize 1400 words of text matches the 4090 rate for about four minutes of audio (0.014 × 240 s = 3.36 s).
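The relationship between the real-time factor and the quoted inference time can be checked with a few lines of arithmetic. The sketch below assumes that 1400 words correspond to about four minutes (240 seconds) of speech; that duration is an assumption chosen to be consistent with the 3.36 second figure, and the function name is illustrative only.

```python
def inference_seconds(rtf: float, audio_seconds: float) -> float:
    """Synthesis time implied by a real-time factor.

    RTF is defined as synthesis time divided by the duration of the
    generated audio, so synthesis time = RTF * audio duration.
    """
    return rtf * audio_seconds


# Assumption: 1400 words of text yield roughly 240 s of speech.
AUDIO_SECONDS = 240.0

# RTF values as reported in the description above.
REPORTED_RTF = {"4060Ti": 0.028, "4090": 0.014}

for gpu, rtf in REPORTED_RTF.items():
    print(f"{gpu}: {inference_seconds(rtf, AUDIO_SECONDS):.2f} s")
```

Under that assumption the 4090 figure reproduces the quoted 3.36 seconds, and the 4060Ti would take about twice as long.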

The system is cross-lingual: it supports inference in English, Japanese, Korean, Cantonese, and Chinese, even when the training dataset is in a different language from the target. The WebUI includes integrated tools for separating vocals from accompaniment, automatically segmenting the training set, Chinese ASR with punctuation restoration, and text labeling. These tools help users prepare training datasets and build GPT and SoVITS models without extensive technical expertise.

The repository draws a large volume of issue traffic. GitGenius tracks 1,506 issues and pull requests. The median response time is 12.1 hours, but the mean is 60.3 days and the 90th percentile is 254.4 days, so half of tracked items receive a response within about 12 hours while a long tail waits months. Of the 692 open issues, 683 have been inactive for 30 days or more. The primary contributor, RVC-Boss, has logged 1,494 tracked events, followed by XXXXRT666 with 575 and KamioRinn with 143. The most common issue labels are "In follow-up" (192 items), "todolist" (36), and "bug" (13).

The project supports multiple deployment environments including Windows, Linux, and macOS, with tested configurations spanning Python 3.9 through 3.11 and PyTorch versions from 2.2.2 to 2.8.0dev. Docker support is available with both full and lightweight image variants. Windows users can download an integrated package for simplified installation, while users in China have access to localized download mirrors and cloud-based deployment options through AutoDL.
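As a minimal sketch, a script could check whether the running interpreter falls inside the tested Python range mentioned above before attempting an install. The bounds come from this description only; the constant and function names are hypothetical and are not part of the project.

```python
import sys
from typing import Tuple

# Bounds taken from the tested range described above (Python 3.9 to 3.11).
# These are illustrative constants, not values read from the project itself.
TESTED_MIN: Tuple[int, int] = (3, 9)
TESTED_MAX: Tuple[int, int] = (3, 11)


def in_tested_range(version: Tuple[int, int]) -> bool:
    """Return True if (major, minor) lies within the tested range, inclusive."""
    return TESTED_MIN <= version <= TESTED_MAX


current = (sys.version_info[0], sys.version_info[1])
status = "inside" if in_tested_range(current) else "outside"
print(f"Python {current[0]}.{current[1]} is {status} the tested range")
```

A version outside the range may still work; the check only reports whether the configuration matches one that the description lists as tested.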

Version 2 introduced significant enhancements including Korean and Cantonese language support, an optimized text frontend, and extended pre-trained models trained on 5000 hours of data compared to the original 2000 hours. The system includes support for UVR5 models for advanced audio separation and reverberation removal, with flexibility to use different model architectures including roformer variants.

GitGenius classifies the repository under Voice Synthesis, Text-to-Speech, Voice Conversion, Singing Voice, Speech Generation, Voice Cloning, Deep Learning, Audio Processing, Generative AI, and Speech Models. Its contributors overlap with those of large open-source projects, including Microsoft's VS Code and TypeScript repositories and the Rust language project.
