Hume AI Launches Open Source TADA TTS Model Supporting Long-Context Speech

Hume AI Releases Open Source TADA TTS Model With New Dual-Alignment Architecture

Hume AI has open sourced its TTS model TADA, introducing a new text-acoustic alignment architecture that generates speech more than five times faster than comparable systems while supporting long-form audio with up to 700 seconds of context.

Hume AI has released TADA, its first open source text-to-speech (TTS) model, making both the models and source code publicly available for researchers, developers, and companies building voice-enabled applications.

TADA introduces a Text-Acoustic Dual Alignment tokenization architecture that aligns text tokens directly with audio tokens. This design enables more accurate speech synthesis while significantly reducing common TTS errors.
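The article does not publish implementation details, but the core idea of pairing text tokens with the audio tokens they correspond to can be illustrated with a small, purely hypothetical sketch (the function name, token values, and per-token frame counts below are illustrative, not Hume AI's actual code):

```python
# Hypothetical sketch of text-acoustic alignment: each text token is paired
# with the contiguous span of audio tokens it is responsible for producing.
# This illustrates the general concept only, not TADA's implementation.
def align(text_tokens, audio_tokens, frames_per_token):
    """Pair each text token with its slice of audio tokens."""
    aligned, pos = [], 0
    for tok, n in zip(text_tokens, frames_per_token):
        aligned.append((tok, audio_tokens[pos:pos + n]))
        pos += n
    return aligned

# Two text tokens, five audio tokens, with an assumed 2/3 frame split.
pairs = align(["hel", "lo"], [101, 102, 103, 104, 105], [2, 3])
# -> [("hel", [101, 102]), ("lo", [103, 104, 105])]
```

An explicit mapping like this is one way such a design could reduce the skipped or repeated words that plague purely autoregressive TTS, since every text token is accounted for in the audio stream.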

Performance is a key highlight. The system generates speech more than five times faster than comparable LLM-based TTS systems while maintaining high accuracy and nearly eliminating content errors, such as skipped or repeated words, in generated speech.
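Speed claims like this are commonly expressed via the real-time factor (RTF): compute time divided by the duration of the audio produced, where values below 1.0 mean faster than real time. A minimal sketch of the metric, with example numbers that are illustrative rather than Hume AI's published figures:

```python
# Real-time factor (RTF), a standard way to compare TTS generation speed.
# RTF < 1.0 means the system synthesizes speech faster than real time.
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    return generation_seconds / audio_seconds

# Example: producing 10 s of audio in 1.5 s of compute.
rtf = real_time_factor(1.5, 10.0)  # 0.15, i.e. ~6.7x faster than real time
```

Under this metric, a system "five times faster" than a baseline has one fifth of the baseline's RTF at the same audio quality.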

The model also supports long-form speech generation with up to 700 seconds of audio context, a capability that goes well beyond typical TTS systems, which often struggle with limited context windows, heavy memory requirements, and issues such as hallucinated or missing speech segments.
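One common way a bounded audio context is maintained during long-form generation is a rolling window: only the most recent N seconds of audio tokens condition the next step. The sketch below is a hypothetical illustration of that pattern; the frame rate and window size are assumptions, and the article does not state how TADA manages its context internally.

```python
# Hypothetical rolling-context sketch for long-form synthesis: keep only the
# most recent 700 s of audio tokens as conditioning. The 50 tokens/s frame
# rate is an assumed example value, not a published TADA parameter.
FRAME_RATE = 50            # assumed audio tokens per second
MAX_CONTEXT_SECONDS = 700  # context window from the article

def trim_context(audio_tokens):
    """Drop tokens that fall outside the context window."""
    max_tokens = FRAME_RATE * MAX_CONTEXT_SECONDS
    return audio_tokens[-max_tokens:]

# 800 s of tokens at 50 tok/s is trimmed back to the last 700 s.
ctx = trim_context(list(range(40_000)))
# len(ctx) == 35_000 tokens, i.e. 700 s of audio
```

Keeping the window bounded caps memory use regardless of total output length, which is what makes multi-minute generation tractable where unbounded-context systems run out of memory or begin to hallucinate.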

TADA is designed for on-device deployment, enabling lower latency, improved privacy, and reduced reliance on cloud infrastructure. The release includes both English and multilingual models, expanding its potential for global voice applications.

Early technical evaluations indicate high speaker similarity and strong naturalness scores, positioning the system as a potential alternative to existing commercial and research TTS solutions.

Initial reactions from developers and AI experts suggest the architecture could reshape voice synthesis, particularly for regulated industries and resource-constrained environments.

Hume AI focuses on building voice AI research infrastructure for AI organisations and research labs, aiming to advance reliable and efficient voice generation technologies through open collaboration.
