mlx-audio by Blaizzy

Description: A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple...

Summary Information

Updated 19 minutes ago

Added to GitGenius on February 1st, 2026

Created on November 27th, 2024

Open Issues & Pull Requests: 81 (+0)

Number of forks: 663

Total Stargazers: 7,522 (+0)

Total Subscribers: 52 (+0)

Issue Activity (beta)

Open issues: 65

New in 7 days: 2

Closed in 7 days: 6

Avg open age: 174 days

Stale 30+ days: 64

Stale 90+ days: 48

Recent activity

Opened in 7 days: 2

Closed in 7 days: 6

Comments in 7 days: 7

Events in 7 days: 13

Top labels

bug (9)
enhancement (5)
documentation (1)
duplicate (1)
good first issue (1)
question (1)

Most active issues this week

#816 Long audio (>5min) causes Metal OOM in Nemotron ASR — log_mel_spectrogram and create_chunked_limited_mask process full input at once - 3 events / 2 comments
#818 Server: all Qwen3-TTS models fail to load — 'str' object has no attribute '__module__' (reproduced in 3 clean environments) - 3 events / 2 comments
#780 feat(stt): SRT/VTT cues are one-per-chunk and too long for subtitles - 2 events / 1 comment
#813 MLX Kokoro TTS produces NaN / silent audio on Apple Silicon — all voices, all languages - 2 events / 1 comment
#815 Kokoro TTS produces NaN + astronomically large audio on all languages (MLX numerical instability) - 2 events / 1 comment

Repository Insights (GitGenius)

Median issue/PR response: 8.3 hours

Mean response time: 8.5 days

90th percentile: 19.9 days

Tracked items: 258

Most active contributors

Blaizzy - 499 events, 182 issues
lucasnewman - 115 events, 67 issues
chigkim - 46 events, 9 issues
rudrankriyam - 20 events, 7 issues
ivanfioravanti - 14 events, 8 issues

Related by overlapping contributors

Detailed Description

MLX-Audio is a Python-based audio processing library built on Apple's MLX framework, designed to deliver fast and efficient speech synthesis, recognition, and conversion specifically optimized for Apple Silicon devices. The library provides implementations of text-to-speech (TTS), speech-to-text (STT), and speech-to-speech (STS) capabilities, making it a comprehensive solution for audio tasks on M-series chips.

The repository supports an extensive collection of models across all three primary audio domains. For text-to-speech, it includes over twenty different model architectures ranging from lightweight options like KittenTTS and Soprano to advanced multimodal systems like Ming Omni TTS and KugelAudio. These TTS models support anywhere from single languages to over 600 languages, with features including voice cloning, style control, and adjustable speech speed. The speech-to-text implementations feature models from major organizations including OpenAI's Whisper, Alibaba's Qwen3-ASR, NVIDIA's Parakeet and Nemotron systems, and Meta's massively multilingual MMS supporting over 1000 languages. STT models in the library offer capabilities like speaker diarization, word-level alignment, streaming inference, and language identification.

The library provides multiple interfaces for users. A command-line interface allows straightforward audio generation and processing with options for streaming output and audio joining. A Python API enables programmatic access to all functionality. The repository includes an interactive web interface with 3D audio visualization and an OpenAI-compatible REST API for integration into existing systems. Installation is available through pip or uv, with separate options for command-line tools, full development environments, and web interface support.
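As a rough illustration of what "OpenAI-compatible" means for a client, the sketch below builds (but does not send) a request in the shape of OpenAI's `/v1/audio/speech` route, using only the Python standard library. The host, port, model id, and voice name are all placeholders chosen for illustration; whether the mlx-audio server exposes exactly this route and these fields is an assumption that should be checked against the project's own documentation.

```python
import json
import urllib.request

# Assumption: a local server at this address mirrors OpenAI's speech route.
BASE_URL = "http://localhost:8000"  # hypothetical host and port


def build_speech_request(text, model, voice, base_url=BASE_URL):
    """Build (but do not send) a POST request in the OpenAI speech format."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_speech_request(
    text="Hello from Apple Silicon.",
    model="example-tts-model",  # placeholder, not a real model id
    voice="example-voice",      # placeholder voice name
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the response body would then be expected to contain the generated audio bytes, assuming the server follows the OpenAI convention.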

Performance optimization is central to MLX-Audio's design. The library supports quantization at multiple bit depths including 3-bit, 4-bit, 6-bit, and 8-bit formats to reduce model size and improve inference speed on Apple Silicon. A Swift package is available for iOS and macOS integration, extending the library's reach to Apple's native platforms.
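To give a sense of how bit depth affects model size, here is a back-of-the-envelope estimate of weight storage at the four listed bit depths. It assumes group-wise quantization in which every group of 64 weights also stores a 16-bit scale and a 16-bit bias; that overhead figure and the 100-million-parameter model are assumptions for illustration, not measurements of any model in the repository.

```python
def quantized_size_mb(num_params, bits, group_size=64, overhead_bits_per_group=32):
    """Estimate weight storage in megabytes for group-wise quantization.

    Assumption: each group of `group_size` weights carries
    `overhead_bits_per_group` extra bits (for example a 16-bit scale
    plus a 16-bit bias).
    """
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return num_params * bits_per_weight / 8 / 1e6


NUM_PARAMS = 100_000_000  # hypothetical model size

# 16-bit weights with no quantization overhead, for comparison.
baseline_mb = NUM_PARAMS * 16 / 8 / 1e6

for bits in (3, 4, 6, 8):
    size_mb = quantized_size_mb(NUM_PARAMS, bits)
    print(f"{bits}-bit: {size_mb:.2f} MB ({size_mb / baseline_mb:.0%} of 16-bit)")
```

Under these assumptions the 4-bit variant needs a little over a quarter of the 16-bit storage; real checkpoints will differ because some layers are usually left unquantized.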

According to GitGenius activity tracking, the repository is actively maintained: the median response time on issues and pull requests is 8.3 hours across 258 tracked items. The mean is much higher at 8.5 days, which suggests that a minority of items wait far longer for a response. Bug reports are the most common label with nine tracked items, followed by enhancement requests with five. The primary contributor, Blaizzy, has logged 499 events; lucasnewman and chigkim follow with 115 and 46 events respectively. The repository shares contributors with several other projects, including ollama/ollama, unslothai/unsloth, and ggml-org/llama.cpp, which points to overlap with the broader machine learning inference community.
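A median measured in hours alongside a mean measured in days is what a skewed distribution looks like: most items get a quick reply and a few wait for weeks. The sketch below shows the effect with made-up latencies; the numbers are hypothetical and are not taken from the repository's data.

```python
import statistics

# Hypothetical response latencies in hours: most are fast, two are very slow.
latencies_hours = [1, 2, 3, 5, 8, 8, 9, 12, 480, 1200]

median_hours = statistics.median(latencies_hours)
mean_hours = statistics.mean(latencies_hours)

print(f"median: {median_hours} hours")
print(f"mean:   {mean_hours} hours ({mean_hours / 24:.1f} days)")
```

Two slow items out of ten are enough to pull the mean to about a week while the median stays at eight hours, which is why the median is the better indicator of a typical response.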
