Description: A generative speech model for daily dialogue.
View 2noise/chattts on GitHub ↗
ChatTTS is an open-source project aiming to create a highly realistic and controllable text-to-speech (TTS) system, specifically designed for conversational AI applications. It distinguishes itself by focusing on *expressive* speech, going beyond simple text-to-audio conversion toward natural-sounding prosody, emotion, and speaking style. The core innovation lies in its use of a diffusion model conditioned on both text and acoustic features, allowing fine-grained control over the generated speech.
At its heart, ChatTTS leverages a non-autoregressive diffusion probabilistic model. Unlike traditional autoregressive TTS models that generate audio sequentially, diffusion models start with random noise and iteratively refine it into coherent speech based on the provided conditions. This approach offers several advantages, including faster inference and the potential for higher audio quality. The model is conditioned on both the input text (via a text encoder) and a set of acoustic features extracted from reference audio. These acoustic features, such as pitch, energy, and duration, are crucial for controlling the characteristics of the synthesized speech.
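The iterative refinement described above can be sketched in a few lines. The following is a minimal, self-contained toy of a DDPM-style reverse process, where an *oracle* noise predictor stands in for the learned, text-and-acoustics-conditioned network; the step count, noise schedule, and frame size are illustrative assumptions, not values from the repository.

```python
import numpy as np

# Toy sketch of diffusion-based refinement: start from pure noise and
# iteratively denoise toward a conditioning target. The "denoiser" below is
# an oracle stand-in for the learned model; all hyperparameters are assumed.

rng = np.random.default_rng(0)

T = 50                                   # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.05, T)       # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# Stand-in for one "clean" acoustic frame the conditioning points at.
target = np.sin(np.linspace(0, 4 * np.pi, 128))

def denoiser(x_t, t, cond):
    """Oracle noise predictor: returns the noise separating x_t from the
    conditioning target. A real model learns this from text + acoustic features."""
    return (x_t - np.sqrt(alpha_bars[t]) * cond) / np.sqrt(1.0 - alpha_bars[t])

# Reverse process: refine random noise step by step toward the condition.
x = rng.standard_normal(128)
for t in reversed(range(T)):
    eps = denoiser(x, t, target)
    # DDPM posterior mean for x_{t-1}
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        x = x + np.sqrt(betas[t]) * rng.standard_normal(128)

print(np.abs(x - target).mean())  # near-zero residual: noise refined into the target
```

With a learned denoiser in place of the oracle, the same loop would generate novel speech frames that merely *match the style* of the conditioning features rather than reproducing a known target.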
A key component of ChatTTS is its emphasis on *referential encoding*. This means the system can learn to mimic the speaking style of a specific speaker from a relatively small amount of reference audio. Users provide a short recording (typically a few seconds to a minute) of the desired voice, and the model extracts acoustic features from this recording. These features are then used to guide the diffusion process, resulting in synthesized speech that closely resembles the reference speaker's voice and prosody. This capability is particularly valuable for creating personalized voice assistants or for applications where maintaining a consistent voice identity is important.
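To make the reference-conditioning step concrete, here is an illustrative feature extractor over a short clip: per-frame RMS energy, a crude autocorrelation-based pitch estimate, and total duration. The framing parameters and feature names are assumptions for the sketch, not the project's actual front end.

```python
import numpy as np

# Illustrative extraction of simple acoustic features (energy, pitch, duration)
# from a reference recording. Frame/hop sizes and the pitch search band
# (50-400 Hz) are assumed values, not taken from the repository.

def extract_reference_features(wave: np.ndarray, sr: int,
                               frame: int = 1024, hop: int = 256) -> dict:
    n_frames = 1 + (len(wave) - frame) // hop
    energy = np.empty(n_frames)
    pitch = np.empty(n_frames)
    for i in range(n_frames):
        seg = wave[i * hop: i * hop + frame]
        energy[i] = np.sqrt(np.mean(seg ** 2))            # per-frame RMS energy
        # Crude F0: lag of the autocorrelation peak within the search band.
        ac = np.correlate(seg, seg, mode="full")[frame - 1:]
        lo, hi = sr // 400, sr // 50
        pitch[i] = sr / (lo + np.argmax(ac[lo:hi]))
    return {"energy": energy, "pitch_hz": pitch, "duration_s": len(wave) / sr}

# A 1-second, 220 Hz sine stands in for the reference recording.
sr = 16000
t = np.arange(sr) / sr
feats = extract_reference_features(np.sin(2 * np.pi * 220 * t), sr)
print(feats["duration_s"], float(np.median(feats["pitch_hz"])))
```

In the real system, feature vectors like these (or learned embeddings playing the same role) would be fed to the diffusion model as conditioning, steering the generated speech toward the reference speaker's pitch range, loudness contour, and pacing.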
The repository provides pre-trained models, training scripts, and inference code, making it relatively accessible for researchers and developers. It supports both zero-shot TTS (generating speech without a reference speaker) and reference-based TTS. The training process involves several stages, including acoustic feature extraction, text encoding, and diffusion model training. The project utilizes a combination of publicly available datasets for training, including LibriSpeech and VCTK, and provides instructions for preparing custom datasets.
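The distinction between the two inference modes can be sketched as a single entry point with an optional reference clip. Every function and parameter name below is a hypothetical illustration of the workflow, not the repository's actual API; stub encoders stand in for the real networks.

```python
import numpy as np

# Hypothetical sketch of zero-shot vs reference-based TTS dispatch.
# All names (synthesize, encode_text, extract_speaker, DEFAULT_SPEAKER)
# are illustrative assumptions, not the project's real interface.

DEFAULT_SPEAKER = {"pitch_hz": 160.0, "energy": 0.1}   # stand-in average voice

def encode_text(text: str) -> np.ndarray:
    # Stand-in text encoder: byte values as a toy embedding sequence.
    return np.frombuffer(text.encode("utf-8"), dtype=np.uint8).astype(np.float32)

def extract_speaker(reference_wave: np.ndarray) -> dict:
    # Stand-in referential encoder: summary statistics of the clip.
    return {"pitch_hz": 160.0 + 40.0 * float(reference_wave.std()),
            "energy": float(np.sqrt(np.mean(reference_wave ** 2)))}

def synthesize(text: str, reference_wave=None) -> dict:
    # Zero-shot mode falls back to default speaker conditioning.
    cond = extract_speaker(reference_wave) if reference_wave is not None \
           else DEFAULT_SPEAKER
    tokens = encode_text(text)
    # The diffusion decoder would run here; we return the conditioning instead.
    return {"n_tokens": len(tokens), **cond}

zero_shot = synthesize("Hello there")
ref = np.random.default_rng(1).standard_normal(16000)
cloned = synthesize("Hello there", reference_wave=ref)
print(zero_shot["pitch_hz"], cloned["pitch_hz"])
```

The design point this illustrates is that both modes share one decoder; only the source of the speaker conditioning changes, which is what makes the reference-based path cheap to add on top of a zero-shot model.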
ChatTTS is still under active development, but it already demonstrates promising results in speech quality and controllability. The project's GitHub repository includes detailed documentation, examples, and a growing community of contributors. Future directions include improving the robustness of the reference encoding, expanding the range of supported languages, and further enhancing the expressiveness and naturalness of the generated speech. The project's open-source nature encourages collaboration and innovation in conversational AI and TTS.