VibeVoice by microsoft

Description: Open-Source Frontier Voice AI

Summary Information

Updated 1 hour ago

Added to GitGenius on December 6th, 2025

Created on August 25th, 2025

Open Issues & Pull Requests: 176 (+0)

Number of forks: 5,592

Total Stargazers: 50,015 (+0)

Total Subscribers: 250 (+0)

Issue Activity (beta)

Open issues: 125

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 76 days

Stale 30+ days: 117

Stale 90+ days: 81

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

bug (1)
compatibility (1)

Most active issues this week

No issue events were indexed in the last 7 days.

Repository Insights (GitGenius)

Median issue/PR response: 6.8 hours

Mean response time: 5.2 days

90th percentile: 14.1 days

Tracked items: 203

Most active contributors

YaoyaoChang - 113 events, 64 issues
pengzhiliang - 100 events, 57 issues
MSLDCherryPick - 25 events, 21 issues
wenhui0924 - 23 events, 16 issues
donglixp - 18 events, 14 issues

Detailed Description

VibeVoice is Microsoft's open-source frontier voice AI framework, providing a family of models for both text-to-speech (TTS) and automatic speech recognition (ASR). The repository is written in Python and is positioned as a research platform for collaboration in the speech synthesis community. Across 203 tracked items, the median response time to issues and pull requests is 6.8 hours, but the mean is 5.2 days and the 90th percentile is 14.1 days, so most items get a quick reply while a minority wait much longer. The most active contributors are YaoyaoChang (113 recorded events), pengzhiliang (100 events), and MSLDCherryPick (25 events).

The technical foundation of VibeVoice centers on continuous speech tokenizers operating at an ultra-low frame rate of 7.5 Hz, which preserve audio fidelity while significantly boosting computational efficiency for processing long sequences. The framework employs a next-token diffusion approach that leverages a Large Language Model to understand textual context and dialogue flow, combined with a diffusion head to generate high-fidelity acoustic details. This architecture enables the models to handle substantially longer audio sequences than conventional approaches.
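To make the efficiency argument concrete, the sketch below counts how many tokenizer frames a long recording produces at 7.5 Hz. The 50 Hz comparison rate is an assumption chosen only for contrast and is not a figure from the project; `frames_for` is a hypothetical helper, not part of the VibeVoice codebase.

```python
# Back-of-the-envelope frame counts for a continuous speech tokenizer.
# FRAME_RATE_HZ comes from the description above; COMPARISON_RATE_HZ is an
# assumed value used only to illustrate the difference in sequence length.
FRAME_RATE_HZ = 7.5
COMPARISON_RATE_HZ = 50.0  # assumption, not a VibeVoice figure


def frames_for(minutes: float, rate_hz: float) -> int:
    """Return the number of frames produced for `minutes` of audio."""
    return round(minutes * 60 * rate_hz)


if __name__ == "__main__":
    for minutes in (60, 90):
        low = frames_for(minutes, FRAME_RATE_HZ)
        high = frames_for(minutes, COMPARISON_RATE_HZ)
        print(f"{minutes} min: {low} frames at {FRAME_RATE_HZ} Hz, "
              f"{high} frames at {COMPARISON_RATE_HZ} Hz "
              f"({high / low:.2f}x longer)")
```

At 7.5 Hz an hour of audio is 27,000 frames, which is a sequence length a language model can attend over in one pass; the sequence grows linearly with the frame rate.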

The repository contains three primary models.

VibeVoice-ASR is a 7-billion parameter unified speech-to-text model that processes up to 60 minutes of long-form audio in a single pass and generates structured transcriptions containing speaker identification, timestamps, and content. It supports over 50 languages natively and offers customized hotwords to improve accuracy on domain-specific terms. The ASR model was integrated into the Hugging Face Transformers library in March 2026, and finetuning code is available.

VibeVoice-TTS is a 1.5-billion parameter long-form, multi-speaker text-to-speech model that synthesizes conversational speech up to 90 minutes in length with up to four distinct speakers in a single conversation. This model was accepted as an oral presentation at ICLR 2026 and supports English, Chinese, and other languages.

VibeVoice-Realtime is a lightweight 0.5-billion parameter real-time streaming TTS model designed for deployment scenarios. It reaches a first audible latency of approximately 300 milliseconds, accepts streaming text input, and supports robust long-form generation up to 10 minutes.
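The structured transcription that the ASR description mentions (speaker, timestamps, content) can be pictured as a list of segments. The sketch below is illustrative only: the `Segment` class, its field names, and the sample data are hypothetical and do not reflect the model's actual output schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Segment:
    """One transcribed span. Field names are hypothetical, not the real schema."""
    speaker: str
    start_s: float
    end_s: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a non-negative offset in seconds as HH:MM:SS (fraction truncated)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render(segments: Iterable[Segment]) -> str:
    """Render segments in chronological order, one line per segment."""
    ordered: List[Segment] = sorted(segments, key=lambda s: s.start_s)
    return "\n".join(
        f"[{format_timestamp(s.start_s)} - {format_timestamp(s.end_s)}] "
        f"{s.speaker}: {s.text}"
        for s in ordered
    )


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample = [
        Segment("Speaker 2", 4.2, 7.9, "Thanks, glad to be here."),
        Segment("Speaker 1", 0.0, 4.2, "Welcome to the show."),
    ]
    print(render(sample))
```

Keeping speaker, time range, and text together in one record makes it straightforward to sort, filter by speaker, or export to a subtitle format.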

The repository is tagged with topics including emotional speech, speech synthesis, voice generation, diffusion models, emotion control, speech datasets, text-to-speech, audio generation, and deep learning. It shares contributors with other major open-source repositories, including comfy-org/comfyui, ggml-org/llama.cpp, and huggingface/transformers, which suggests ties to the broader AI ecosystem. Documentation, interactive Colab notebooks, and playground environments are provided for experimentation. Technical reports are available through OpenReview and arXiv, with the ASR report at arxiv.org/pdf/2601.18184. The top issue labels currently indexed are bug and compatibility, each applied to one issue.
