TTS with AI Cloned Voices for Audiobooks, Narration, etc. - Set-up and Installation

I have created an AI audiobook maker and narrator using various AI voice tools that I’ve explored on my channel. In this article, I’ll provide a detailed guide on installation and share a quick demo to get you started.

Features and Demo

For the demo, I have the AI audiobook maker open with a processed audiobook loaded. Useful features include:

Generating audiobooks from text files
Exporting to a single audio file
Regenerating audio for specific sentences

For instance, the sentence containing "penniless" did not sound right initially, but regenerating its audio corrected it.
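
Per-sentence regeneration implies that the text is split into sentences and that each sentence keeps its own audio clip. Here is a minimal sketch of that idea, not the app's actual code: the regex split and the `sentence_0000.wav` naming scheme are my assumptions for illustration.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def audio_name(index: int) -> str:
    """Hypothetical naming scheme: one WAV file per sentence index."""
    return f"sentence_{index:04d}.wav"


text = "He arrived penniless. Nobody noticed! Did it matter?"
for i, sentence in enumerate(split_sentences(text)):
    # Regenerating one sentence would only overwrite its own file.
    print(audio_name(i), "->", sentence)
```

With one file per sentence, fixing a bad reading only means re-running generation for that single index.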

Prerequisites

Hardware:

Nvidia graphics card (10 series or newer, with at least 6GB of VRAM recommended)
AMD and Mac are not supported yet; support is planned for the future

Software:

CUDA: Ensure that you have the latest CUDA version installed.
Python: Python version 3.10 recommended.
Git: Necessary for cloning repositories.
VS Code: While optional, it’s highly recommended for better code management.
Tortoise TTS: Necessary for text-to-speech.
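
Before installing anything else, it can help to confirm the basics are visible from your terminal. This is a small sketch using only the standard library; it assumes `git` and `nvidia-smi` (which ships with the Nvidia driver) should be on PATH, and it does not check CUDA or Tortoise TTS.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report whether the recommended Python and the needed tools are present."""
    return {
        "python 3.10": sys.version_info[:2] == (3, 10),
        "git": shutil.which("git") is not None,
        # nvidia-smi on PATH suggests an Nvidia driver is installed.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'found' if ok else 'not found'}")
```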

Installation Steps

First, ensure you have the following software:

Python 3.10 (make sure to add to PATH while installing)
Git
VS Code
Tortoise TTS: installation is detailed in my YouTube tutorial or on the GitHub installation wiki.

Clone the audiobook maker from GitHub:

git clone https://github.com/YourGitHub/audiobook_maker.git
cd audiobook_maker

Set up a virtual environment:

python -m venv venv
.\venv\Scripts\activate
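
If you are unsure whether the activation worked, a quick check from Python can tell you. This sketch relies on the standard behavior that `sys.prefix` differs from `sys.base_prefix` inside a venv.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```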

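Install PyTorch and the remaining dependencies (if PyTorch does not detect your GPU afterwards, you may have received a CPU-only build; in that case use the CUDA install command from the PyTorch website):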

pip install torch torchvision torchaudio
pip install -r requirements.txt
pip install --index-url https://pypi.org/simple/ rvc
pip install ffmpeg-python
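
After the installs finish, it is worth checking that PyTorch can actually see the GPU. The sketch below is one way to do that; the 6 GB threshold comes from the hardware recommendation in this guide, and the half-gigabyte tolerance is my own assumption because cards usually report slightly less than their nominal size.

```python
import importlib.util

MIN_VRAM_GB = 6  # recommended minimum from the prerequisites


def cuda_report() -> str:
    """Describe whether torch is installed and can use a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    if not torch.cuda.is_available():
        return "torch is installed, but CUDA is not available (possibly a CPU-only build)"
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    # Allow a small tolerance: a "6GB" card reports a bit under 6 GiB.
    status = "ok" if vram_gb >= MIN_VRAM_GB - 0.5 else "below the recommended minimum"
    return f"{props.name}: {vram_gb:.1f} GB VRAM ({status})"


print(cuda_report())
```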

Ensure additional files are in place:

rmvpe.pt and hubert_base.pt in the parent directory
azas.pth voice model in the voice_models directory

Download ffmpeg full build and place ffmpeg.exe and ffprobe.exe in the parent directory.
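
A missing model file is an easy mistake to make here, so a small check can save some debugging. This is a sketch only: it assumes that "parent directory" means the project root you pass in and that `voice_models` sits directly under it.

```python
from pathlib import Path

# File names taken from the steps above; the layout is an assumption.
REQUIRED_FILES = [
    "rmvpe.pt",
    "hubert_base.pt",
    "voice_models/azas.pth",
    "ffmpeg.exe",
    "ffprobe.exe",
]


def missing_files(root: Path) -> list[str]:
    """Return the required files that are not present under root."""
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


missing = missing_files(Path("."))
print("all files in place" if not missing else f"missing: {missing}")
```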

Configuration and Running

Open the project in VS Code and select the virtual environment as the Python interpreter. Then restart the language server:

Ctrl + Shift + P -> Python: Restart Language Server

Edit torts.yaml:

voice: Mel
samples: 4
iterations: 32
temperature: 0.8
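
A typo in these settings is easy to miss, so here is a hedged sketch of a sanity check. It parses flat `key: value` lines by hand to stay within the standard library (a real project would more likely use a YAML library), and the validation rules are my assumptions, not limits documented by the app.

```python
def parse_flat_config(text: str) -> dict[str, str]:
    """Parse simple 'key: value' lines; this is not a full YAML parser."""
    config: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config


def config_problems(config: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means it looks fine."""
    problems = []
    if not config.get("voice"):
        problems.append("voice is missing")
    for key in ("samples", "iterations"):
        value = config.get(key, "")
        if not value.isdigit() or int(value) < 1:
            problems.append(f"{key} must be a positive integer")
    try:
        float(config.get("temperature", ""))
    except ValueError:
        problems.append("temperature must be a number")
    return problems


example = """\
voice: Mel
samples: 4
iterations: 32
temperature: 0.8
"""
print(config_problems(parse_flat_config(example)))
```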

Launch Tortoise:

Locate and run start.bat
Open the Local URL shown when Tortoise starts
Configure result folder and autoregressive model
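
If the page does not load, you can check from Python whether anything is answering at that address. This is a sketch under an assumption: the address `http://127.0.0.1:7860` below is only a placeholder (it is the usual Gradio default), so use whatever local URL your console actually shows.

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


# Placeholder address; replace it with the local URL from your console.
print("Tortoise web UI reachable:", is_up("http://127.0.0.1:7860"))
```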

Start the audiobook app:

python audiobook_app_2.0.py

Create a text file with your content and select it in the GUI. The interface allows for:

Playing generated audio
Regenerating sentences
Exporting as an audiobook file
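
To give an idea of what exporting to a single file involves, here is a minimal sketch that joins per-sentence WAV clips with Python's built-in `wave` module. It is not the app's export code (the app may well use ffmpeg for this), and it assumes every clip shares the same channel count, sample width, and sample rate.

```python
import wave
from pathlib import Path


def concat_wavs(parts: list[Path], out_path: Path) -> None:
    """Concatenate WAV files with identical audio format into one file."""
    if not parts:
        raise ValueError("no input files")
    with wave.open(str(parts[0]), "rb") as first:
        params = first.getparams()
    with wave.open(str(out_path), "wb") as out:
        # The frame count in the header is corrected when the file closes.
        out.setparams(params)
        for part in parts:
            with wave.open(str(part), "rb") as src:
                # Compare channels, sample width, and frame rate.
                if src.getparams()[:3] != params[:3]:
                    raise ValueError(f"{part} has a different audio format")
                out.writeframes(src.readframes(src.getnframes()))
```

You would call it with the clips in reading order, for example `concat_wavs(sorted(Path("clips").glob("*.wav")), Path("book.wav"))`, where the `clips` folder is a hypothetical location.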

The app also handles interrupted sessions: you can continue generation from where it stopped.
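
Resuming can be pictured as finding which sentences still lack an audio clip and generating only those. The sketch below illustrates that idea under my own assumptions: one `sentence_0000.wav`-style file per sentence in a single folder, which may not match how the app stores its progress.

```python
from pathlib import Path


def pending_indices(sentences: list[str], audio_dir: Path) -> list[int]:
    """Return indices of sentences that have no audio file yet."""
    return [
        i
        for i in range(len(sentences))
        # Hypothetical naming scheme: sentence_0000.wav, sentence_0001.wav, ...
        if not (audio_dir / f"sentence_{i:04d}.wav").is_file()
    ]


sentences = ["First sentence.", "Second sentence.", "Third sentence."]
todo = pending_indices(sentences, Path("audiobook_output"))
print(f"{len(todo)} of {len(sentences)} sentences still need audio")
```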

Future Enhancements

Simplifying the installation process
Adding support for AMD and Mac
Supporting multiple languages for text-to-speech
Improving usability and performance

If you encounter any issues, reach out, and I’ll try to assist as much as possible.

Keywords

AI audiobook maker
AI voice tools
Nvidia graphics card
CUDA
Python 3.10
Git
VS Code
Tortoise TTS
PyTorch
Virtual Environment
ffmpeg

FAQ

Q1: What hardware do I need? A: An Nvidia graphics card, preferably from the 10 series upward with at least 6GB of VRAM.

Q2: Can I use this tool on AMD or Mac? A: Currently, it’s limited to Nvidia GPUs, but plans are underway to support AMD and Mac in the future.

Q3: What software do I need to install? A: You need to install CUDA, Python 3.10, Git, VS Code, Tortoise TTS, PyTorch, and ffmpeg.

Q4: Where can I download the necessary files and dependencies? A: The installation steps above list the required software and files. Tortoise TTS installation is covered in my YouTube tutorial and on the GitHub installation wiki.

Q5: Can I regenerate specific sentences in the audiobook? A: Yes, the tool allows for regenerating audio for specific sentences if the initial generation is not satisfactory.

Q6: How do I handle interrupted audiobook generation sessions? A: You can load the existing audiobook and continue generation from where it left off using the application’s built-in functionality.