TTS with AI Cloned Voices for Audiobooks, Narration, etc. - Set-up and Installation
I have created an AI audiobook maker and narrator using various AI voice tools that I’ve explored on my channel. In this article, I’ll provide a detailed guide on installation and share a quick demo to get you started.
Features and Demo
For the demo, I have the AI audiobook maker open with an already processed audiobook loaded. Useful features include:
- Generating audiobooks from text files
- Exporting to a single audio file
- Regenerating audio for specific sentences
For instance, the word "penniless" did not sound right in the first pass; regenerating the audio for that sentence corrected it.
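Regeneration works on single sentences, so the text has to be broken into sentence-sized units at some point. The sketch below shows one simple way to do that; the function name `split_sentences` and the regular expression are illustrative assumptions, not the app's actual code.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    # Collapse newlines and repeated spaces so wrapped lines stay in one sentence.
    normalized = " ".join(text.split())
    if not normalized:
        return []
    # The lookbehind keeps the closing punctuation attached to its sentence.
    return re.split(r"(?<=[.!?])\s+", normalized)


if __name__ == "__main__":
    sample = "He arrived penniless. Nobody noticed!\nWould anyone help?"
    for index, sentence in enumerate(split_sentences(sample)):
        print(index, sentence)
```

A splitter this simple will also break after abbreviations such as "Dr.", so treat it as a starting point only.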
Prerequisites
Hardware:
- Nvidia graphics card (10 series or newer, with at least 6 GB of VRAM recommended)
- AMD and Mac are not supported yet; I plan to add support in the future
Software:
- CUDA: Ensure that you have the latest CUDA version installed.
- Python: Version 3.10 is recommended.
- Git: Necessary for cloning repositories.
- VS Code: While optional, it’s highly recommended for better code management.
- Tortoise TTS: Necessary for text-to-speech.
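Before installing anything else, it can help to confirm that the basics are reachable from the command line. The following is a minimal sketch that uses only the Python standard library; the tool names it looks for (`git`, `nvidia-smi`) are my assumptions about what is normally on PATH once Git and the Nvidia driver are installed.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 10)  # version recommended in the list above
REQUIRED_TOOLS = ("git", "nvidia-smi")  # assumed executable names


def python_ok(version=sys.version_info, required=REQUIRED_PYTHON) -> bool:
    """True when the interpreter's major.minor equals the recommended version."""
    return tuple(version[:2]) == required


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    status = "ok" if python_ok() else f"found {sys.version.split()[0]}"
    print("Python 3.10:", status)
    for tool in missing_tools():
        print("Not on PATH:", tool)
```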
Installation Steps
First, ensure you have the following software:
- Python 3.10 (make sure to add to PATH while installing)
- Git
- VS Code
- Tortoise TTS (installation is covered in my YouTube tutorial and on the GitHub installation wiki)
Clone the audiobook maker from GitHub:
git clone https://github.com/YourGitHub/audiobook_maker.git
cd audiobook_maker
Set up a virtual environment:
python -m venv venv
.\venv\Scripts\activate
Install PyTorch and the remaining dependencies:
pip install torch torchvision torchaudio
pip install -r requirements.txt
pip install --index-url https://pypi.org/simple/ rvc
pip install ffmpeg-python
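Once the packages are installed, a quick check that PyTorch can actually see the GPU saves time later. The sketch below reports the name and VRAM of the first CUDA device; the helper name `describe_gpu` is hypothetical. If it reports no GPU even though the driver is installed, one possible cause is a CPU-only PyTorch build, which the plain pip command can pull on some platforms.

```python
def describe_gpu():
    """Return name and VRAM of the first CUDA device, or None if unavailable."""
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(0)
    return {"name": props.name, "vram_gb": round(props.total_memory / 1024**3, 1)}


if __name__ == "__main__":
    gpu = describe_gpu()
    if gpu is None:
        print("No CUDA GPU visible to PyTorch (or PyTorch is not installed).")
    else:
        print(f"{gpu['name']}: {gpu['vram_gb']} GB VRAM")
```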
Ensure the additional files are in place:
- rmvpe.pt and hubert_base.pt in the parent directory
- azas.pth voice model in the voice_models directory
Download the ffmpeg full build and place ffmpeg.exe and ffprobe.exe in the parent directory.
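Missing model files are an easy mistake to make at this step, so a small check script can be useful. This sketch assumes the file names as I read them from the list above and takes the directory to inspect as a parameter, since where the "parent directory" sits depends on your layout. The helper names are hypothetical.

```python
from pathlib import Path

# File names as read from the list above; adjust them if your setup differs.
REQUIRED_FILES = ("rmvpe.pt", "hubert_base.pt", "ffmpeg.exe", "ffprobe.exe")


def missing_files(base_dir, required=REQUIRED_FILES) -> list[str]:
    """Return the required files that are not present in base_dir."""
    base = Path(base_dir)
    return [name for name in required if not (base / name).is_file()]


def voice_models(base_dir, folder="voice_models") -> list[str]:
    """List the .pth voice models found in the voice_models folder."""
    models_dir = Path(base_dir) / folder
    if not models_dir.is_dir():
        return []
    return sorted(path.name for path in models_dir.glob("*.pth"))


if __name__ == "__main__":
    print("Missing files:", missing_files("."))
    print("Voice models:", voice_models("."))
```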
Configuration and Running
Open the project in VS Code and change the interpreter to the virtual environment. Restart the language server:
Ctrl + Shift + P -> Python: Restart Language Server
Edit torts.yaml:
voice: Mel
samples: 4
iterations: 32
temperature: 0.8
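A typo in this file is easy to miss, so here is a rough sanity check for the four settings shown above. It is only a sketch: the parser handles flat "key: value" lines and is not a full YAML parser, and the validation rules (positive integers, numeric temperature) are my assumptions rather than limits documented by the app.

```python
def _convert(value: str):
    """Turn a string into int or float when possible, else leave it as text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_flat_config(text: str) -> dict:
    """Parse flat 'key: value' lines; comments after '#' are ignored."""
    config = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        config[key] = _convert(value)
    return config


def check_settings(config: dict) -> list[str]:
    """Return a list of problems found in the settings (empty when all is well)."""
    problems = []
    if not isinstance(config.get("voice"), str) or not config.get("voice"):
        problems.append("voice must be a non-empty name")
    for key in ("samples", "iterations"):
        if not isinstance(config.get(key), int) or config[key] < 1:
            problems.append(f"{key} must be a positive integer")
    if not isinstance(config.get("temperature"), (int, float)):
        problems.append("temperature must be a number")
    return problems


if __name__ == "__main__":
    example = "voice: Mel\nsamples: 4\niterations: 32\ntemperature: 0.8\n"
    print(check_settings(parse_flat_config(example)))
```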
Launch Tortoise:
- Locate and run start.bat
- Open the Local URL shown in the console
- Configure the result folder and the autoregressive model
Start the audiobook app:
python audiobook_app_2.0.py
Create a text file with your content and select it in the GUI. The interface allows for:
- Playing generated audio
- Regenerating sentences
- Exporting as an audiobook file
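To give a feel for what exporting involves, the sketch below joins per-sentence WAV clips into one file. It uses Python's standard-library `wave` module instead of ffmpeg, so it is a stand-in for illustration and not necessarily how the app performs the export. The function name `concatenate_wavs` is hypothetical, and all clips are assumed to share the same format.

```python
import wave


def concatenate_wavs(input_paths, output_path):
    """Join WAV files that share channel count, sample width and sample rate."""
    paths = [str(path) for path in input_paths]
    if not paths:
        raise ValueError("no input files")

    # First pass: confirm that every clip uses the same audio format.
    formats = []
    for path in paths:
        with wave.open(path, "rb") as clip:
            formats.append(
                (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
            )
    if len(set(formats)) != 1:
        raise ValueError("clips differ in channels, sample width or sample rate")
    channels, width, rate = formats[0]

    # Second pass: copy the frames of each clip into the output, in order.
    with wave.open(str(output_path), "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(width)
        out.setframerate(rate)
        for path in paths:
            with wave.open(path, "rb") as clip:
                out.writeframes(clip.readframes(clip.getnframes()))
```

Calling `concatenate_wavs(["0.wav", "1.wav"], "book.wav")` would write the two clips back to back into book.wav.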
The app also handles interrupted sessions: you can load an existing audiobook and continue generation from where it stopped.
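One simple way to think about resuming is to check which sentences already have an audio file and generate only the rest. The sketch below illustrates that idea; the file naming scheme (`sentence_0000.wav` and so on) and the function name `pending_sentences` are hypothetical and not taken from the app.

```python
from pathlib import Path


def pending_sentences(sentence_count: int, audio_dir) -> list[int]:
    """Return the indexes of sentences that have no audio file yet."""
    folder = Path(audio_dir)
    # Hypothetical naming scheme: sentence_0000.wav, sentence_0001.wav, ...
    return [
        index
        for index in range(sentence_count)
        if not (folder / f"sentence_{index:04d}.wav").is_file()
    ]


if __name__ == "__main__":
    # Example: a 10-sentence book whose clips live in ./audio
    print(pending_sentences(10, "audio"))
```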
Future Enhancements
- Simplifying the installation process
- Adding support for AMD and Mac
- Supporting multiple languages for text-to-speech
- Improving usability and performance
If you encounter any issues, reach out, and I’ll try to assist as much as possible.
Keywords
- AI audiobook maker
- AI voice tools
- Nvidia graphics card
- CUDA
- Python 3.10
- Git
- VS Code
- Tortoise TTS
- PyTorch
- Virtual Environment
- ffmpeg
FAQ
Q1: What hardware do I need? A: An Nvidia graphics card, preferably from the 10 series upward with at least 6GB of VRAM.
Q2: Can I use this tool on AMD or Mac? A: Currently, it’s limited to Nvidia GPUs, but plans are underway to support AMD and Mac in the future.
Q3: What software do I need to install? A: You need to install CUDA, Python 3.10, Git, Tortoise TTS, PyTorch, and ffmpeg. VS Code is optional but highly recommended.
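Q4: Where can I download the necessary files and dependencies? A: The guide above lists every required tool and file. The audiobook maker itself is cloned from GitHub, and the Tortoise TTS installation is covered in my YouTube tutorial and on the GitHub installation wiki.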
Q5: Can I regenerate specific sentences in the audiobook? A: Yes, the tool allows for regenerating audio for specific sentences if the initial generation is not satisfactory.
Q6: How do I handle interrupted audiobook generation sessions? A: You can load the existing audiobook and continue generation from where it left off using the application’s built-in functionality.