OpenMOSS Expands MOSS-TTS Family with Nano Model, Enhanced SoundEffects
OpenMOSS and MOSI.AI have expanded the MOSS-TTS family of open-source speech and sound generation models with new releases. Key additions include MOSS-SoundEffect-v2.0 for 48 kHz bilingual sound effects and MOSS-TTS-Nano, a 100M-parameter model for multilingual voice cloning on CPU cores, alongside a MOSS-TTS `llama.cpp` implementation for PyTorch-free inference.
Key Takeaways
- MOSS-TTS-Nano, a 100M-parameter model, supports multilingual voice cloning and 48 kHz stereo audio on CPU cores.
- MOSS-SoundEffect-v2.0 generates 48 kHz bilingual sound effects up to 30 seconds using a DiT backbone with Flow Matching.
- MOSS-TTS-v1.5 improves multilingual synthesis through language tags, improves voice cloning stability, and adds explicit pause control.
- PyTorch-free inference is now supported via `llama.cpp` and ONNX Runtime, enabling lightweight on-device deployment.
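To put the figures above in perspective, here is a rough back-of-envelope sketch of what 48 kHz output and a 100M-parameter model mean in bytes. The sample rate, the 30-second limit, and the parameter count come from the takeaways; the 16-bit PCM sample width, the bytes-per-weight values, and the helper names `pcm_bytes` and `weight_bytes` are assumptions made for illustration, not details confirmed by the release.

```python
# Back-of-envelope sizing for the figures quoted in the takeaways.
# Assumptions (not from the release): 16-bit PCM samples, and weights
# stored at 4 bytes (fp32) or 2 bytes (fp16) per parameter.

SAMPLE_RATE_HZ = 48_000      # 48 kHz output, as stated in the takeaways
MAX_CLIP_SECONDS = 30        # MOSS-SoundEffect-v2.0 upper bound
NANO_PARAMS = 100_000_000    # MOSS-TTS-Nano parameter count


def pcm_bytes(seconds, channels, sample_rate=SAMPLE_RATE_HZ, bytes_per_sample=2):
    """Size of uncompressed PCM audio (hypothetical helper)."""
    return int(seconds * sample_rate) * channels * bytes_per_sample


def weight_bytes(num_params, bytes_per_param):
    """Raw storage needed for model weights alone (hypothetical helper)."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    # One second of 48 kHz stereo audio at 16 bits per sample.
    print("stereo bytes per second:", pcm_bytes(1, channels=2))
    # A full-length 30-second clip, shown for both channel layouts.
    print("30 s mono clip bytes:   ", pcm_bytes(MAX_CLIP_SECONDS, channels=1))
    print("30 s stereo clip bytes: ", pcm_bytes(MAX_CLIP_SECONDS, channels=2))
    # Weight storage for a 100M-parameter model at two precisions.
    print("fp32 weight bytes:      ", weight_bytes(NANO_PARAMS, 4))
    print("fp16 weight bytes:      ", weight_bytes(NANO_PARAMS, 2))
```

Under these assumptions, the weights of a 100M-parameter model fit in a few hundred megabytes, which is consistent with the CPU-only positioning, though actual memory use depends on the runtime and any quantization applied.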
Why It Matters
Because MOSS-TTS-Nano runs on CPU, voice cloning becomes practical on edge devices and in cost-sensitive applications that cannot budget for GPUs. Broader open-source capability may also put pressure on commercial TTS providers to revisit their pricing and feature sets. Streaming platforms and content creators could use these models to produce more localized and diverse audio with less compute, which may lower operating costs. Next, watch for real-world deployments of the CPU-friendly models and for responses from commercial competitors.