MLX Port for 24-Language Voice-Clone TTS Reduces Model Size by 73%
Quantized MLX weights for rednote-hilab/dots.tts-soar, a 24-language zero-shot voice-clone TTS model, are now available for Apple Silicon. The int4 weights reduce the model size by 73%, with no measured regression in transcription accuracy or voice similarity on a multilingual check, enabling efficient local deployment on Metal-compatible hardware. The MLX port and quantization code are released under an Apache-2.0 license.
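The headline figure can be sanity-checked from the checkpoint sizes reported for this release (about 9 GB before quantization and about 2.4 GB for int4). The short sketch below only does that arithmetic; the function name is illustrative and not part of any released tooling.

```python
def size_reduction(original_gb: float, quantized_gb: float) -> float:
    """Return the fractional size reduction, e.g. 0.73 for a 73% smaller file."""
    if original_gb <= 0:
        raise ValueError("original size must be positive")
    return 1.0 - quantized_gb / original_gb


# Approximate sizes in GB, as reported for this release.
ORIGINAL_GB = 9.0
INT4_GB = 2.4

reduction = size_reduction(ORIGINAL_GB, INT4_GB)
print(f"int4 is {reduction:.0%} smaller than the original checkpoint")
# prints "int4 is 73% smaller than the original checkpoint"
```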
Key Takeaways
- `dots.tts-soar`, a 24-language zero-shot voice-clone Text-to-Speech (TTS) model, now has quantized MLX weights.
- The int4 weights reduce the model size by 73% to approximately 2.4 GB, down from 9 GB.
- The MLX port and quantization code are released under an Apache-2.0 license and can be used with the `dots-tts-mlx` runtime.
- A multilingual quality check (EN/DE/ES/FR plus Hindi) showed no regression in transcription accuracy or voice similarity for the int4 and int8 variants compared to bf16.
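The article does not say which metrics were used for the quality check. As a rough sketch of how such a comparison could be set up, the code below assumes word error rate (WER) as the measure of transcription accuracy and cosine similarity between speaker embeddings as the measure of voice similarity. Both choices, the helper names, and the tolerance parameter are assumptions for illustration, not the authors' method.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def regressed(baseline: float, candidate: float, tolerance: float,
              higher_is_better: bool) -> bool:
    """True if the candidate is worse than the baseline by more than tolerance."""
    if higher_is_better:
        return candidate < baseline - tolerance
    return candidate > baseline + tolerance
```

In this setup, each quantized variant would be scored on the same prompts as the bf16 baseline, with `regressed(..., higher_is_better=False)` applied to WER and `higher_is_better=True` applied to similarity.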
Why It Matters
Shrinking a 24-language voice-clone TTS model from 9 GB to roughly 2.4 GB lowers the memory and storage needed to run it locally on Apple Silicon. This lowers the barrier to entry for high-quality, multilingual AI voice generation in applications such as dubbing, content creation, and accessibility tools. It remains to be seen how far the smaller footprint translates into wider adoption and new use cases for on-device TTS.
Additional Context
The MLX port of `dots.tts-soar` is part of a growing trend toward local AI inference on Apple Silicon, and several other projects also use MLX for speech synthesis and processing. For instance, `appautomaton/mlx-speech` provides local speech synthesis on Apple Silicon for TTS, voice cloning, and dialogue, supporting models such as MossTTSLocal and VibeVoice (GitHub, March 2026). Similarly, `louiscoetzee/mlx-tts-studio` is a native macOS app for high-quality TTS using Qwen3-TTS models, offering voice cloning and design features entirely on-device (GitHub, February 2026). The `mlx-tts-server` project offers an OpenAI-compatible TTS server for Apple Silicon, powered by Qwen3-TTS and MLX, which points to a move toward standardized API access for local models (PyPI, March 2026). Together, these projects reflect increased developer focus on the on-device AI capabilities of Apple hardware, which reduces reliance on cloud-based solutions and may improve privacy and latency for real-time streaming applications.
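To illustrate what "OpenAI-compatible" means in practice, the sketch below builds a speech request in the shape of OpenAI's `/v1/audio/speech` endpoint (a JSON body with `model`, `input`, and `voice`). It assumes a local server mirrors that path. The host, port, model name, and voice name are placeholders, not values documented for `mlx-tts-server`. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str, model: str,
                         voice: str) -> urllib.request.Request:
    """Build a POST request shaped like OpenAI's text-to-speech endpoint."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: adjust to whatever the local server actually exposes.
req = build_speech_request(
    base_url="http://localhost:8000",
    text="Hello from a local model.",
    model="placeholder-model",
    voice="placeholder-voice",
)

# Sending it would look like this (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     audio_bytes = resp.read()
```

Because the request shape matches the hosted API, a client written against that API would in principle only need its base URL changed to target a local model.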
Read full article at huggingface.co