Local AI Voice Cloning with Tortoise TTS - 2024 Installation
Introduction
Today we will walk through an updated tutorial on installing Tortoise TTS locally. If you are not familiar with Tortoise TTS, we will start with a quick demo before the installation: here is audio generated for a few sentences with an already trained voice model. Let's take a listen:
"The men gazed at him with scorn and contempt as Subaru appraised them. They appeared to be in their mid-20s. Their clothes were filthy, and their faces reflected their inner evil. They weren't subhuman, but they couldn't be called decent humans either. Damn. A compulsory event facing the grinning men, Subaru wiped his face and stood up in a panic."
To install Tortoise TTS locally, you need 7-Zip and an NVIDIA GPU; this package will not work without an NVIDIA GPU. Start by downloading and installing 7-Zip from its official website. Then head over to my GitHub repository, which contains the AI voice cloning files.
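If you are not sure whether your machine has a usable NVIDIA GPU, one quick check is to ask nvidia-smi, the command-line tool that ships with the NVIDIA driver. The Python sketch below is one way to do that; it assumes Python 3.9 or newer is installed and only reports what nvidia-smi prints, so a missing driver also shows up as "no GPU".

```python
import shutil
import subprocess


def list_nvidia_gpus() -> list[str]:
    """Return GPU names (and memory) reported by nvidia-smi, or [] if none are found.

    Assumes the NVIDIA driver is installed, since nvidia-smi ships with it.
    """
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return []  # no driver tools on PATH, so most likely no usable NVIDIA GPU
    try:
        result = subprocess.run(
            [smi, "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # nvidia-smi exists but failed, e.g. driver problem
    # One line per GPU, e.g. "<GPU name>, <memory> MiB"
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if gpus:
        for gpu in gpus:
            print("Found:", gpu)
    else:
        print("No NVIDIA GPU detected; this Tortoise TTS package will not run.")
```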
Open the repository, go to its Releases section, and follow the link there to download the Tortoise TTS package, which is hosted on Hugging Face. Because it bundles the models, the package is large (around 20 to 22 GB) and may take a few hours to download.
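To judge whether "a few hours" applies to your connection, and whether your drive can hold the package, the arithmetic is simple: size in gigabytes times 8,000 gives megabits, and dividing by your connection speed in Mbit/s gives seconds. A small sketch of that calculation plus a free-space check follows; the 2x margin for keeping both the archive and the extracted folder is a rough assumption, not a figure from the package.

```python
import shutil


def download_hours(size_gb: float, speed_mbps: float) -> float:
    """Estimate download time in hours.

    size_gb is in decimal gigabytes (10**9 bytes); speed_mbps is in megabits per second.
    """
    megabits = size_gb * 8000  # 1 GB = 8,000 megabits
    return megabits / speed_mbps / 3600


def has_room(path: str, size_gb: float, factor: float = 2.0) -> bool:
    """True if the drive holding `path` has at least factor * size_gb free.

    factor=2.0 is an assumed margin for the archive plus its extracted copy.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= size_gb * factor


if __name__ == "__main__":
    for speed in (25, 100, 500):
        print(f"22 GB at {speed} Mbit/s: about {download_hours(22, speed):.1f} h")
    print("Enough free space on this drive:", has_room(".", 22))
```

At 25 Mbit/s, for example, 22 GB works out to 7,040 seconds, or just under two hours.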
After the download finishes, extract the archive with 7-Zip and move the extracted folder to wherever you want to keep it. Inside the folder is a file called "start.bat"; double-click it to launch Tortoise TTS. A terminal window opens and displays a local URL. Open that URL in your browser (Chrome is preferred) by holding Ctrl and left-clicking the link, or type it into the address bar yourself.
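If you prefer the command line to the 7-Zip right-click menu, 7-Zip's 7z program can extract the archive straight into the folder where you want to keep it, which saves the cut-and-paste step. A minimal sketch, assuming 7-Zip is either on your PATH or in its default Windows location (C:\Program Files\7-Zip); find_7z, extract_command, and extract are hypothetical helpers, and the archive and destination paths are your own.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# Default 7-Zip location on Windows; an assumption, change it if you installed elsewhere.
DEFAULT_7Z = Path(r"C:\Program Files\7-Zip\7z.exe")


def find_7z() -> Optional[str]:
    """Return the path to the 7z executable, or None if it cannot be found."""
    on_path = shutil.which("7z")
    if on_path:
        return on_path
    return str(DEFAULT_7Z) if DEFAULT_7Z.exists() else None


def extract_command(seven_zip: str, archive: str, dest: str) -> list[str]:
    # x = extract keeping the folder structure, -o<dir> = output folder (no space after -o),
    # -y = answer "yes" to any overwrite prompts
    return [seven_zip, "x", archive, f"-o{dest}", "-y"]


def extract(archive: str, dest: str) -> None:
    """Extract `archive` into `dest`, raising if 7-Zip is missing or extraction fails."""
    seven_zip = find_7z()
    if seven_zip is None:
        raise FileNotFoundError("7-Zip not found; install it or add 7z to your PATH")
    subprocess.run(extract_command(seven_zip, archive, dest), check=True)
```

Call extract() with the path of the downloaded archive and the folder you want to keep Tortoise TTS in, then run start.bat from that folder as described above.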
This opens the Tortoise TTS web interface, which is ready for generation. The packaged version includes HiFi-GAN and DeepSpeed to improve performance, and you can enable or disable each of them in the settings. After changing these settings, click "Reload TTS" so they take effect.
To generate audio, go to the "Generate" tab and select a voice model (e.g., "random"). Enter the text you want spoken and click "Generate." The audio sample is generated within a few seconds.
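The demo passage above runs to several sentences. If you want to generate a longer passage in pieces, for example so you can redo just one sentence that came out wrong, a small helper can split the text at sentence boundaries into chunks you paste into the Generate tab one at a time. This is a sketch under the assumption that shorter pieces are easier to review and regenerate; the 200-character limit is arbitrary, and split_into_chunks is a hypothetical helper, not part of Tortoise TTS.

```python
import re

# Arbitrary limit for illustration; not a documented Tortoise TTS setting.
MAX_CHARS = 200


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text at sentence ends, then pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after ".", "!" or "?" when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    demo = ("The men gazed at him with scorn and contempt as Subaru appraised them. "
            "They appeared to be in their mid-20s. Their clothes were filthy.")
    for i, chunk in enumerate(split_into_chunks(demo, max_chars=80), start=1):
        print(i, chunk)
```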
Tortoise TTS can also clone voices. Create a new folder inside the "voices" directory, name it after the voice you want to clone, and place audio clips of that voice in the folder. Refresh the voice list in the interface, select the new voice, and click "Generate" to create samples based on those clips.
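Setting up the voice folder is just file copying, so it can be scripted. The sketch below assumes the "voices" folder sits directly inside the extracted Tortoise TTS folder and that your clips are WAV files; add_voice is a hypothetical helper, not part of the package. The demo at the bottom runs in a throwaway temporary folder with empty placeholder files, so it does not touch your install.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed clip format; extend this set if your install accepts other audio formats.
AUDIO_EXTENSIONS = {".wav"}


def add_voice(install_dir: str, voice_name: str, clip_paths: list[str]) -> Path:
    """Copy audio clips into <install_dir>/voices/<voice_name>/ and return that folder."""
    voice_dir = Path(install_dir) / "voices" / voice_name
    voice_dir.mkdir(parents=True, exist_ok=True)
    for clip in map(Path, clip_paths):
        if clip.suffix.lower() not in AUDIO_EXTENSIONS:
            print(f"Skipping {clip.name}: unsupported extension")
            continue
        shutil.copy2(clip, voice_dir / clip.name)  # copy2 also keeps file timestamps
    return voice_dir


if __name__ == "__main__":
    # Throwaway demo with placeholder files (not real audio).
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "clip1.wav").write_bytes(b"")
        (Path(tmp) / "notes.txt").write_text("not audio")
        clips = [str(Path(tmp) / "clip1.wav"), str(Path(tmp) / "notes.txt")]
        folder = add_voice(tmp, "my_voice", clips)
        print(sorted(p.name for p in folder.iterdir()))
```

After running it against your real install folder, refresh the voice list in the interface and the new voice should appear.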
Curating a dataset and training (fine-tuning) a voice model are not covered in this tutorial; other resources explain those steps. When preparing a dataset, you can isolate the vocals from your source audio with tools such as Ultimate Vocal Remover.
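While curating a dataset, it can help to check that your clips have sensible lengths. The sketch below lists WAV clips that are unusually short or long, using Python's built-in wave module, which only reads uncompressed PCM WAV files. The 1 to 12 second limits are placeholder values chosen for illustration, not a recommendation from this tutorial; substitute whatever your training guide suggests.

```python
import tempfile
import wave
from pathlib import Path

# Hypothetical limits for illustration only.
MIN_SECONDS = 1.0
MAX_SECONDS = 12.0


def wav_duration(path: Path) -> float:
    """Length of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def flag_clips(folder: str) -> list[tuple[str, float]]:
    """Return (file name, seconds) for WAV clips outside MIN_SECONDS..MAX_SECONDS."""
    flagged = []
    for path in sorted(Path(folder).glob("*.wav")):
        seconds = wav_duration(path)
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            flagged.append((path.name, round(seconds, 2)))
    return flagged


def write_silence(path: Path, seconds: float, rate: int = 22050) -> None:
    """Write a mono 16-bit silent WAV (used only for the demo below)."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_silence(Path(tmp) / "short.wav", 0.5)
        write_silence(Path(tmp) / "ok.wav", 5.0)
        print(flag_clips(tmp))  # [('short.wav', 0.5)]
```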
After training a voice model, you may want to delete unnecessary files to save disk space. It is also recommended to create a backup of your trained model by copying it to another folder. Select the fine-tuned model in the Tortoise TTS interface and click "Reload TTS" to use it for inference.
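The backup step can be scripted too. This sketch copies a model folder into a timestamped backup folder and reports folder sizes in gigabytes, which helps when deciding what to clean up. The tutorial does not name the folder where fine-tuned models are stored, so the paths are placeholders, and the sketch deliberately deletes nothing; decide for yourself which files are safe to remove.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def folder_size_gb(folder: str) -> float:
    """Total size of all files under folder, in decimal GB."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()) / 1e9


def backup_folder(source: str, backup_root: str) -> Path:
    """Copy source into backup_root/<name>_<timestamp> and return the new path."""
    src = Path(source)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = Path(backup_root) / f"{src.name}_{stamp}"
    # copytree creates missing parent folders and fails if target exists, so nothing is overwritten
    shutil.copytree(src, target)
    return target


if __name__ == "__main__":
    # Throwaway demo; point these at your real model folder and backup drive instead.
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp) / "my_voice"
        model_dir.mkdir()
        (model_dir / "model.pth").write_bytes(b"\0" * 1024)
        copy = backup_folder(str(model_dir), str(Path(tmp) / "backups"))
        print(copy.name, f"{folder_size_gb(str(copy)):.6f} GB")
```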
These are the basic steps for installing and using Tortoise TTS locally. For more advanced training techniques and tips, refer to other resources or feel free to ask any questions in the comments section. Enjoy experimenting with Tortoise TTS and creating your own AI voices!
Keywords
Local AI Voice Cloning, Tortoise TTS, Installation, Voice Models, Audio Generation, Voice Cloning, Data Curation, Training Dataset, Inference, Fine-tuning, Backup, Disk Space
FAQ
Q: Can I install Tortoise TTS without an NVIDIA GPU? A: No. This package requires an NVIDIA GPU and will not run without one.
Q: How long does the package download take? A: The package download may take several hours due to its large size.
Q: Can I adjust the settings to enable or disable HiFi-GAN and DeepSpeed? A: Yes, you can enable or disable these features in the settings. Click "Reload TTS" afterwards for the change to take effect.
Q: Is data curation and training covered in this tutorial? A: No, this tutorial focuses on the installation and basic usage of Tortoise TTS. Data curation and training are separate topics.
Q: How do I generate audio with a cloned voice? A: Create a new folder in the "voices" directory, copy audio clips of the desired voice into it, refresh the voice list, and select the new voice in the Tortoise TTS interface before clicking "Generate."