
Mistral AI has released Voxtral TTS as a free, open-weight enterprise voice model, positioning it against closed leaders such as ElevenLabs with lower-cost, fully controllable deployment across servers, edge devices, and smartphones.
Mistral AI has launched Voxtral TTS, which it describes as the first frontier-quality, open-weight enterprise text-to-speech model, releasing the full weights for free download in a direct challenge to closed voice AI leaders such as ElevenLabs, OpenAI, Google Cloud, and IBM.
The open-weight release is the model's strongest differentiator: enterprises can run it on their own servers, deploy it on smartphones, keep all audio data in-house, and avoid third-party APIs entirely. This enterprise-owned approach strengthens compliance, data sovereignty, and deployment control for regulated sectors including finance, healthcare, and government.
Strategically, Voxtral TTS completes Mistral’s end-to-end speech pipeline, adding the output layer to its existing stack of Voxtral Transcribe, LLMs, Forge, AI Studio, and Compute, enabling full speech-to-speech enterprise agents without external dependencies.
Mistral also positions the launch as a direct challenge to subscription-based voice services, claiming a 62.8% listener preference over ElevenLabs Flash v2.5 and a 69.9% preference on voice customisation, alongside lower latency and infrastructure cost.
Built on a 3.4B-parameter decoder with a 390M-parameter acoustic transformer and a 300M-parameter neural codec, the model runs in roughly 3 GB of RAM when quantised, supports nine languages, delivers a 90 ms time-to-first-audio, and generates speech six times faster than real time. The launch sharpens the industry shift towards open-weight, enterprise-controlled voice AI at scale.
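For readers sizing a deployment, the quoted figures can be turned into rough arithmetic. The Python sketch below is a back-of-envelope estimate, not anything published by Mistral. It assumes the three component sizes are additive, counts weights only (no activations, caches, or runtime overhead), and treats "six times faster than real time" as a constant speed-up. The function names and the bit-widths tried are illustrative choices.

```python
# Back-of-envelope sizing from the figures quoted in the article.
# Assumption: the three components are separate and their sizes add up.
PARAMS = {
    "decoder": 3.4e9,
    "acoustic_transformer": 390e6,
    "neural_codec": 300e6,
}


def total_params(parts=PARAMS):
    """Total parameter count across all components."""
    return sum(parts.values())


def weight_gb(n_params, bits_per_weight):
    """Weights-only footprint in decimal gigabytes (no runtime overhead)."""
    return n_params * bits_per_weight / 8 / 1e9


def synthesis_seconds(audio_seconds, speedup=6.0, first_audio_s=0.090):
    """Return (time to first audio, time to generate the whole clip).

    Assumes a constant speed-up over real time; real throughput will vary
    with hardware, batch size, and text length.
    """
    return first_audio_s, audio_seconds / speedup


if __name__ == "__main__":
    n = total_params()
    print(f"total parameters: {n / 1e9:.2f}B")
    for bits in (16, 8, 4):  # illustrative precisions, not confirmed by Mistral
        print(f"{bits:>2}-bit weights: {weight_gb(n, bits):.2f} GB")
    first, total = synthesis_seconds(60)
    print(f"60 s clip: first audio after {first * 1000:.0f} ms, done in {total:.1f} s")
```

Under these assumptions the weights alone would occupy about 2 GB at 4 bits per weight and about 4 GB at 8 bits, so the quoted figure of roughly 3 GB is plausible for a low-bit quantisation plus runtime overhead. The article does not state which precision Mistral used.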