New Open Source Emotional AI Text to Speech and Finishing Up Audiobook Maker

Introduction

Hello YouTube! Today, I'm excited to share my weekly updates on the Audiobook Maker and a fascinating new emotional AI Text-to-Speech (TTS) software that I recently discovered. The Audiobook Maker is nearing the finish line, with a release expected early next week so that users can test it out.

Audiobook Maker Updates

We're still in the coding phase, and the functionality is progressing smoothly. The Audiobook Maker now supports multiple speakers, and you can switch between them easily. The interface is shown full-sized here for better visibility. You can load an existing audiobook and see the speakers available for it. The voice for each speaker is adjustable, and I can even change a sentence's color to make it easier to see which voice it is assigned to.

One of the exciting features is the Regeneration Mode. This allows users to select specific sentences to regenerate with different voices. Once you have made your selection, you run a command to continue generating the audiobook. Progress is displayed in the terminal as it runs, so you can follow what the tool is doing.
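To make the idea concrete, here is a minimal Python sketch of how sentence-level regeneration could be modeled. It is not the Audiobook Maker's actual code: the `Sentence` class, the function names, and the `fake_synthesize` stand-in for a real TTS call are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Sentence:
    """One sentence of the book (hypothetical data model)."""
    text: str
    speaker: str               # name of the voice assigned to this sentence
    audio_path: str = ""       # empty until audio has been generated
    regenerate: bool = False   # set when the user marks the sentence


def mark_for_regeneration(sentences: List[Sentence],
                          indices: Iterable[int],
                          new_speaker: str) -> None:
    """Assign a different voice to the selected sentences and flag them."""
    for i in indices:
        sentences[i].speaker = new_speaker
        sentences[i].regenerate = True


def run_generation(sentences: List[Sentence],
                   synthesize: Callable[[str, str, int], str]) -> int:
    """Generate audio only for sentences that are new or flagged.

    Returns the number of sentences that were (re)generated.
    """
    processed = 0
    for index, sentence in enumerate(sentences):
        if sentence.audio_path and not sentence.regenerate:
            continue  # finished sentence, keep the existing audio
        sentence.audio_path = synthesize(sentence.text, sentence.speaker, index)
        sentence.regenerate = False
        processed += 1
    return processed


def fake_synthesize(text: str, speaker: str, index: int) -> str:
    """Stand-in for a real TTS engine call; only builds a file name."""
    return f"{index:04d}_{speaker}.wav"


if __name__ == "__main__":
    book = [
        Sentence("It was a dark night.", "narrator"),
        Sentence("Who is there?", "narrator"),
    ]
    run_generation(book, fake_synthesize)      # first pass: every sentence
    mark_for_regeneration(book, [1], "alice")  # give sentence 1 a new voice
    run_generation(book, fake_synthesize)      # second pass: only sentence 1
    print([s.audio_path for s in book])
```

The point of the sketch is that a second run skips every sentence that already has audio and is not flagged, so continuing generation after a selection only touches the sentences you chose.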

Before the release next week, I just need to finalize some settings related to RVC (Retrieval-based Voice Conversion, referred to as S2S, or Speech-to-Speech). As it stands, the first version will use only the Tortoise engine. I aim to include StyleTTS 2 as well, but whether it makes it in depends on how much I get done over the weekend. XTTS is on my list too, but I have no timeline for it because I'm still learning how to implement it effectively.

For those keen on audiobook creation, or interested specifically in Tortoise TTS, this tool is designed to meet those needs. You can follow its progress on my GitHub repository. Installation instructions will be added to the version 3 branch before it is pushed to the master branch.

New Emotional TTS Research

In addition to the Audiobook Maker updates, I stumbled upon an innovative emotional text-to-speech tool called EmoKnob. This open-source framework lets users control the emotional tone of generated speech using large pre-trained models. The rationale behind the technology is that, because these models are trained on large and rich datasets, they give us a better handle on the emotional output of generated audio.

I explored some demos and was impressed by how EmoKnob operates. It appears to convert speaker voices and emotional samples into vectors, which then shape how the generated audio sounds. I listened to various samples, both the originals and versions with emotional overlays like anger and sadness. Some variations were effective, though the emotional accuracy isn't perfect yet. Still, the advances in this area of TTS are promising.
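The vector idea can be illustrated with a small Python sketch. This is my own simplified reading of the demos, not EmoKnob's actual code or API: the function names are hypothetical, and the two-dimensional lists are toy values standing in for the embeddings a real voice cloning model would produce. The sketch averages the difference between emotional and neutral sample embeddings to get an emotion direction, then shifts a speaker embedding along that direction by an adjustable strength.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(vector: Sequence[float]) -> Vector:
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in vector]


def emotion_direction(emotional: Sequence[Sequence[float]],
                      neutral: Sequence[Sequence[float]]) -> Vector:
    """Average (emotional - neutral) over paired samples, then normalize."""
    if not emotional or len(emotional) != len(neutral):
        raise ValueError("need the same non-zero number of paired samples")
    dim = len(emotional[0])
    diffs = [[e - n for e, n in zip(emo, neu)]
             for emo, neu in zip(emotional, neutral)]
    mean = [sum(d[i] for d in diffs) / len(diffs) for i in range(dim)]
    return normalize(mean)


def apply_emotion(speaker: Sequence[float],
                  direction: Sequence[float],
                  strength: float) -> Vector:
    """Shift a speaker embedding along the emotion direction."""
    return [s + strength * d for s, d in zip(speaker, direction)]


if __name__ == "__main__":
    # Toy embeddings; real ones would have hundreds of dimensions.
    angry_samples = [[1.0, 2.0], [1.0, 4.0]]
    neutral_samples = [[1.0, 0.0], [1.0, 0.0]]
    direction = emotion_direction(angry_samples, neutral_samples)
    speaker = [0.5, 0.5]
    print(apply_emotion(speaker, direction, strength=0.4))
```

In this picture the `strength` value plays the role of the "knob": zero leaves the voice unchanged, and larger values push the voice further toward the sampled emotion.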

Conclusion

That sums up today's updates! The Audiobook Maker will be available soon for members of the channel's package tier. As a follow-up, I plan to launch a series that simplifies installing different TTS engines. Stick around for more exciting updates and findings!

Keywords

Emotional AI
Text-to-Speech
Audiobook Maker
Regeneration Mode
Tortoise TTS
EmoKnob
RVC
Speaker Voice

FAQ

Q: When will the Audiobook Maker be released?
A: The Audiobook Maker is expected to be released early next week.

Q: What features are included in the Audiobook Maker?
A: The key features include support for multiple speakers, a regeneration mode for sentence-specific voice alterations, and customizable speaker voices.

Q: What is EmoKnob?
A: EmoKnob is an emotional AI TTS framework that allows users to control emotional output in generated speech using pre-trained models.

Q: Can I find the Audiobook Maker code online?
A: Yes, the progress of the Audiobook Maker can be tracked on GitHub.

Q: Will there be tutorials for installing different TTS engines?
A: Yes, I plan to create a series of videos to help streamline the installation process for various TTS engines.