Updating the Best Local AI Audiobook Maker Application

Introduction

In this week's update, I'll cover the personal projects I've been focusing on and give an overview of ongoing development on the audiobook maker application. Because I have several projects in progress at once, I have paused my research into text-to-speech (TTS) alternatives, but I'm excited to share progress on the tools below.

AI Voice Cloning Repository

The first project is the AI voice cloning repository, commonly referred to as Tortoise. Most of the issues I've encountered relate to Docker and Linux compatibility. I plan to fix the current Docker issues so that the requirements install correctly. After that, as a solo developer, I will focus mainly on Windows, my primary environment, and will no longer maintain Docker or Linux support myself. If you have the expertise to address these problems, pull requests with fixes are welcome.

Style TTS Web UI

Next, the Style TTS Web UI is gaining traction, and many users are enjoying it. If you run into problems, please report them in the repository's issues tab. I will address them as I can, though my responses may be slower because of my other projects. John Singleton has contributed a new feature: paragraph generation within the web UI, which concatenates the generated audio samples into a single continuous piece of audio.
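To illustrate the idea behind paragraph generation, here is a minimal sketch of joining per-sentence clips into one track. It is not the code from the web UI: the function name `concatenate_clips`, the 24000 Hz sample rate, and the 0.25 second gap are assumptions chosen for the example, and the clips are placeholder 16-bit PCM data rather than real TTS output.

```python
import array


def concatenate_clips(clips, sample_rate=24000, gap_seconds=0.25):
    """Join 16-bit PCM clips into one track with a short silence between them.

    Hypothetical helper for illustration; sample_rate and gap_seconds are assumed values.
    """
    gap = array.array("h", [0] * int(sample_rate * gap_seconds))
    combined = array.array("h")
    for index, clip in enumerate(clips):
        if index > 0:
            combined.extend(gap)  # silence only between clips, not before the first
        combined.extend(clip)
    return combined


# Placeholder clips standing in for per-sentence audio from a TTS engine.
sentence_clips = [
    array.array("h", [100] * 4800),
    array.array("h", [-100] * 2400),
]
paragraph_audio = concatenate_clips(sentence_clips, sample_rate=24000, gap_seconds=0.25)
```

The gap keeps sentences from running into each other. A real implementation would also need to make sure every clip shares the same sample rate before joining.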

Beatrice Trainer Web UI

The Beatrice Trainer Web UI appears to be running smoothly for most users. I have been discussing a particular issue with Tommy Vault and will produce follow-up videos comparing RVC (Retrieval-based Voice Conversion) with Beatrice. Beatrice is functional, but its sound quality falls short of RVC, which typically sounds better at the cost of some added latency. The comparison should shed more light on the strengths and weaknesses of each tool.

Audiobook Maker Updates

Shifting focus to the Audiobook Maker, I'm currently updating version 3 of the application and will merge those changes into the main branch. One notable change is restructuring the app around the model-view-controller (MVC) pattern, which separates data, interface, and control logic and should make new features easier to implement.
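As a rough picture of what an MVC split can look like for this kind of app, here is a small sketch. None of these class names come from the Audiobook Maker itself: `AudiobookModel`, `ConsoleView`, `AudiobookController`, and `fake_synthesize` are all hypothetical, and the sentence splitting is deliberately naive.

```python
from dataclasses import dataclass, field


@dataclass
class AudiobookModel:
    """Model: holds the book state and knows nothing about the UI."""
    sentences: list = field(default_factory=list)
    generated: dict = field(default_factory=dict)

    def load_text(self, text):
        # Naive split on periods, good enough for the illustration.
        self.sentences = [s.strip() for s in text.split(".") if s.strip()]
        self.generated = {}

    def mark_generated(self, index, audio_path):
        self.generated[index] = audio_path


class ConsoleView:
    """View: only renders. A GUI view could expose the same method."""

    def __init__(self):
        self.lines = []

    def show_progress(self, done, total):
        line = f"{done}/{total} sentences generated"
        self.lines.append(line)
        print(line)


class AudiobookController:
    """Controller: reacts to user actions and coordinates model and view."""

    def __init__(self, model, view, synthesize):
        self.model = model
        self.view = view
        self.synthesize = synthesize  # any callable (index, sentence) -> path

    def generate_all(self):
        for index, sentence in enumerate(self.model.sentences):
            path = self.synthesize(index, sentence)
            self.model.mark_generated(index, path)
            self.view.show_progress(len(self.model.generated), len(self.model.sentences))


def fake_synthesize(index, sentence):
    # Stand-in for a real TTS call; returns the path a clip would be saved to.
    return f"sentence_{index}.wav"


model = AudiobookModel()
model.load_text("It was a dark night. The wind howled.")
view = ConsoleView()
AudiobookController(model, view, fake_synthesize).generate_all()
```

Because the controller only receives a `synthesize` callable, a different engine or view can be swapped in without touching the model, which is the main practical benefit of the split.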

I'm integrating several TTS engines, including Style TTS 2 and Tortoise. A checkbox will also let users generate samples using RVC. Beyond that, I'd like to add the ability to assign different speakers to specific lines of dialogue, which would enhance the storytelling experience. I will consult ChatGPT to check that these implementations are feasible.
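One way to support several engines plus an optional RVC pass is a shared interface and a registry, sketched below. This is an assumption about how it could be structured, not the app's actual design: `TTSEngine`, `EchoEngine`, `ENGINES`, `generate_sample`, and the `rvc_convert` callable are hypothetical, and the engines here return placeholder bytes instead of real audio.

```python
from typing import Callable, Dict, Optional, Protocol


class TTSEngine(Protocol):
    """Interface every engine adapter is assumed to implement."""

    def synthesize(self, text: str) -> bytes: ...


class EchoEngine:
    """Placeholder engine that returns tagged bytes instead of real audio."""

    def __init__(self, name: str) -> None:
        self.name = name

    def synthesize(self, text: str) -> bytes:
        return f"{self.name}:{text}".encode("utf-8")


# Hypothetical registry; real adapters for Style TTS 2 and Tortoise would go here.
ENGINES: Dict[str, TTSEngine] = {
    "styletts2": EchoEngine("styletts2"),
    "tortoise": EchoEngine("tortoise"),
}


def generate_sample(
    text: str,
    engine_name: str,
    use_rvc: bool = False,
    rvc_convert: Optional[Callable[[bytes], bytes]] = None,
) -> bytes:
    """Synthesize text, then optionally pass the audio through a voice converter."""
    audio = ENGINES[engine_name].synthesize(text)
    if use_rvc:  # mirrors the state of the RVC checkbox
        if rvc_convert is None:
            raise ValueError("use_rvc is set but no rvc_convert callable was provided")
        audio = rvc_convert(audio)
    return audio


plain = generate_sample("Hi", "tortoise")
converted = generate_sample(
    "Hi", "styletts2", use_rvc=True, rvc_convert=lambda audio: audio + b"|rvc"
)
```

Keeping the RVC step as a post-processing callable means it can be applied to the output of any engine, which matches the idea of a single checkbox in the UI.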

Additionally, I've experimented with using large language models (LLMs) to separate speakers. This could automatically identify the narrator and the characters in a text, providing a structured way to produce audiobooks with a distinct voice for each speaker.
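The sketch below shows one possible shape for that structured output. It assumes the LLM is prompted to return a JSON array of objects with `speaker` and `text` fields. That format, the `Segment` class, the helper functions, and the voice names are all hypothetical, and the LLM call itself is replaced by a hard-coded example response.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    text: str


def parse_segments(llm_response):
    """Parse a JSON array of {"speaker", "text"} objects into Segment records."""
    items = json.loads(llm_response)
    return [Segment(item["speaker"], item["text"]) for item in items]


def assign_voices(segments, voice_map, default_voice="narrator_voice"):
    """Pair each segment with a voice name, falling back to a default voice."""
    return [(voice_map.get(seg.speaker, default_voice), seg.text) for seg in segments]


# Hard-coded stand-in for what an LLM might return (assumed format).
example_response = json.dumps([
    {"speaker": "narrator", "text": "She opened the door."},
    {"speaker": "Anna", "text": "Who is there?"},
    {"speaker": "narrator", "text": "Nobody answered."},
])

segments = parse_segments(example_response)
script = assign_voices(segments, {"Anna": "voice_a"})
```

Each `(voice, text)` pair in `script` could then be sent to a TTS engine in order. In practice the LLM output would need validation, since a model may return malformed JSON or inconsistent speaker names.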

Future Aspirations

While I've made considerable progress, implementing these features will take time. On another note, I may return to an old project, "Vivy," the AI VTuber, and refactor its code for better organization. It was one of my first projects, so the code is currently disorganized, but with the latest AI models available, I am optimistic about reviving it.

In conclusion, I appreciate the support from my channel members, and I welcome any feedback on these projects. Thank you for following along, and stay tuned for future updates!

Keywords

AI voice cloning
TTS (Text-To-Speech)
Docker
Linux
Style TTS Web UI
Beatrice Trainer Web UI
Audiobook Maker
MVC architecture
RVC (Retrieval-based Voice Conversion)
LLM (Large Language Model)

FAQ

Q1: What is the AI Voice Cloning repository?
A1: The AI Voice Cloning repository, commonly referred to as Tortoise, lets users create AI-generated voice clones.

Q2: What is the Style TTS Web UI?
A2: The Style TTS Web UI is a text-to-speech application that allows users to convert written text into styled audio samples.

Q3: What features will be available in the Audiobook Maker?
A3: Planned features for the Audiobook Maker include speaker selection for dialogue, integration of several TTS engines (including Style TTS 2 and Tortoise), an RVC option for generated samples, and LLM-based separation of narrator and character lines.

Q4: How can I report an issue with the projects?
A4: You can report issues through the issues tab in the respective project repositories. Responses may be slow because of other ongoing projects.

Q5: What’s the difference between Beatrice and RVC?
A5: Beatrice is functional and processes audio more quickly, but it typically lacks the audio quality of RVC. RVC usually sounds better, at the cost of some latency.