Microsoft Releases VibeVoice AI To Generate 90-Minute Multi-Speaker Podcasts

VibeVoice Brings Open Source Long-Form Text-to-Speech to Developers

Microsoft launches VibeVoice, an open-source AI tool that generates multi-speaker podcasts up to 90 minutes long, offering developers and creators an alternative to proprietary TTS systems.

Microsoft has unveiled VibeVoice, an open-source AI text-to-speech tool that can turn a script into an audio recording up to 90 minutes long with up to four distinct speakers.

VibeVoice stands out from other AI-powered Text-to-Speech (TTS) systems by preserving audio fidelity, speaker consistency, and natural turn-taking across extended audio segments. These features make it suitable for producing expressive, long-form, multi-speaker conversational formats such as podcasts.

According to Microsoft’s official description:

“VibeVoice employs a next-token diffusion framework, leveraging a Large Language Model (LLM) to understand textual context and dialogue flow, and a diffusion head to generate high-fidelity acoustic details.”

The tool requires a script to function, which can be written manually or generated using another AI system such as ChatGPT. Once provided with content, VibeVoice can generate long-form conversational audio that maintains flow and speaker identity throughout.
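
As a rough illustration of what preparing such a script could involve, the Python sketch below turns a list of dialogue turns into plain text and enforces the four-speaker limit mentioned above. The function name `build_script` and the "Speaker N: text" line layout are assumptions made for this example; they are not confirmed here as VibeVoice's required input format, so check the project's documentation before relying on them.

```python
from typing import List, Tuple

# The article states that VibeVoice supports up to four distinct speakers.
MAX_SPEAKERS = 4


def build_script(turns: List[Tuple[str, str]]) -> str:
    """Format (speaker, text) turns as one 'Speaker N: text' line per turn.

    The line layout is an illustrative assumption, not a documented format.
    """
    # Record speakers in order of first appearance so numbering is stable.
    speakers: List[str] = []
    for speaker, _ in turns:
        if speaker not in speakers:
            speakers.append(speaker)

    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"script has {len(speakers)} speakers; at most {MAX_SPEAKERS} are supported"
        )

    lines = []
    for speaker, text in turns:
        label = f"Speaker {speakers.index(speaker) + 1}"
        lines.append(f"{label}: {text.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        ("Host", "Welcome back to the show."),
        ("Guest", "Thanks for having me."),
        ("Host", "Let's get started."),
    ]
    print(build_script(demo))
```

Numbering speakers by first appearance keeps each voice tied to the same label for the whole script, which matters for a tool whose selling point is consistent speaker identity over long recordings.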

VibeVoice is available in multiple model sizes:

  • 1.5 billion parameters (requires ~7GB VRAM, compatible with modern GPUs).
  • 7 billion parameters (requires ~18GB VRAM, offering higher quality).
  • 0.5 billion parameters (in development, optimised for real-time audio generation).
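
To make the VRAM figures above concrete, here is a minimal Python sketch that picks the largest released variant fitting a given GPU. The names `Variant` and `pick_variant` and the 1GB headroom default are hypothetical choices for this example; the VRAM numbers are the approximate ones quoted in the list, and the 0.5 billion parameter model is left out because no requirement is given for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    approx_vram_gb: float  # approximate requirement quoted in the article


# Only the two released sizes with stated VRAM figures are listed.
VARIANTS = [
    Variant("1.5B", 7.0),
    Variant("7B", 18.0),
]


def pick_variant(available_vram_gb: float, headroom_gb: float = 1.0) -> Optional[Variant]:
    """Return the largest variant that fits, or None if none does.

    headroom_gb is an assumed safety margin for other GPU memory use.
    """
    usable = available_vram_gb - headroom_gb
    fitting = [v for v in VARIANTS if v.approx_vram_gb <= usable]
    if not fitting:
        return None
    return max(fitting, key=lambda v: v.approx_vram_gb)


if __name__ == "__main__":
    for vram in (6, 8, 24):
        choice = pick_variant(vram)
        print(vram, "GB ->", choice.name if choice else "no variant fits")
```

Because the quoted requirements are approximate, treating them as hard thresholds is a simplification; actual memory use can vary with audio length and runtime settings.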

The tool is accessible via a live demo and is downloadable from both GitHub and Hugging Face, highlighting Microsoft’s commitment to open-source development. This availability allows researchers, developers, and creators to test, customise, and build upon the system.

While VibeVoice delivers impressive voice quality and conversational flow, its output still sounds noticeably AI-generated. Nonetheless, the release positions VibeVoice as a scalable, adaptable, and open-source alternative to proprietary long-form audio generation tools.
