    TTS with AI Cloned Voices for Audiobooks, Narration, etc. - Set-up and Installation

    I have created an AI audiobook maker and narrator using various AI voice tools that I’ve explored on my channel. In this article, I’ll provide a detailed guide on installation and share a quick demo to get you started.


    Features and Demo

    I have an AI audiobook maker open and a processed audiobook ready. Useful features include:

    • Generating audiobooks from text files
    • Exporting to a single audio file
    • Regenerating audio for specific sentences

    For instance, the word "penniless" did not sound right in the initial generation; regenerating the audio for that sentence corrected it.
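
One way to picture per-sentence regeneration is to keep one audio file per sentence, so that a single bad sentence can be redone without touching the rest. The sketch below is only an illustration under that assumption: the splitter is deliberately naive, the `sentence_0000.wav` naming scheme is hypothetical, and `synthesize` stands in for whatever text-to-speech call the app actually makes.

```python
import re
from pathlib import Path


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def audio_path(out_dir, index):
    """Hypothetical naming scheme: one WAV per sentence, numbered by position."""
    return Path(out_dir) / f"sentence_{index:04d}.wav"


def regenerate(sentences, index, out_dir, synthesize):
    """Re-run synthesis for one sentence, overwriting only that sentence's file.

    `synthesize(text, path)` is a placeholder for the real TTS call.
    """
    target = audio_path(out_dir, index)
    synthesize(sentences[index], target)
    return target
```

With this layout, fixing a mispronounced sentence means calling `regenerate` with that sentence's index; every other file stays as it was.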


    Prerequisites

    Hardware:

    • Nvidia graphics card (10 series or newer, with at least 6GB of VRAM recommended)
    • AMD and Mac are not supported yet; I plan to add support in the future

    Software:

    1. CUDA: Ensure that you have the latest CUDA version installed.
    2. Python: Python version 3.10 recommended.
    3. Git: Necessary for cloning repositories.
    4. VS Code: While optional, it’s highly recommended for better code management.
    5. Tortoise TTS: Necessary for text-to-speech.
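
Before going further, it can save time to check the basics from a terminal. The script below is a rough sketch using only the standard library. It assumes that finding `nvidia-smi` on PATH is a reasonable hint that the Nvidia driver is installed; it does not verify CUDA itself or Tortoise TTS.

```python
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite name to True or False."""
    return {
        # The guide recommends Python 3.10.
        "python_3_10": sys.version_info[:2] == (3, 10),
        # Git must be on PATH to clone the repositories.
        "git": shutil.which("git") is not None,
        # nvidia-smi ships with the Nvidia driver (assumption: its presence
        # is used here only as a hint that a driver is installed).
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```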

    Installation Steps

    First, ensure you have the following software:

    1. Python 3.10 (make sure to add it to PATH during installation)
    2. Git
    3. VS Code
    4. Tortoise TTS (installation is covered in my YouTube tutorial and on the GitHub installation wiki)

    Clone the audiobook maker from GitHub:

    git clone https://github.com/YourGitHub/audiobook_maker.git
    cd audiobook_maker
    

    Set up and activate a virtual environment (the activation command below is for Windows):

    python -m venv venv
    .\venv\Scripts\activate
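
If you are unsure whether the environment is actually active before installing packages, a small check like the following can help. It relies on the standard behaviour that `sys.prefix` differs from `sys.base_prefix` inside a venv; the function name is my own.

```python
import sys


def in_virtual_environment():
    """True when the interpreter runs inside an environment made by `python -m venv`."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtual_environment())
    print("interpreter:", sys.executable)
```

If the interpreter path printed does not point inside the `venv` folder, activate the environment again before running `pip`.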
    

    Install PyTorch and the remaining dependencies:

    pip install torch torchvision torchaudio
    pip install -r requirements.txt
    pip install --index-url https://pypi.org/simple/ rvc
    pip install ffmpeg-python
    

    Ensure additional files are in place:

    • rmvpe.pt and hubert_base.pt in the parent directory
    • azas.pth voice model in the voice_models directory

    Download ffmpeg full build and place ffmpeg.exe and ffprobe.exe in the parent directory.
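
Missing model files are an easy mistake to make at this step, so a quick existence check can be useful. The file names below come from the list above; the assumption that they all sit under one base directory (with `voice_models` as a subfolder of it) is mine, so adjust the path you pass in to match your layout.

```python
from pathlib import Path

# File names are taken from the guide; their exact location is an assumption.
BASE_FILES = ["rmvpe.pt", "hubert_base.pt", "ffmpeg.exe", "ffprobe.exe"]
VOICE_MODEL = Path("voice_models") / "azas.pth"


def missing_files(base_dir):
    """Return the expected files (relative paths) that do not exist under base_dir."""
    base = Path(base_dir)
    expected = [base / name for name in BASE_FILES] + [base / VOICE_MODEL]
    return [p.relative_to(base).as_posix() for p in expected if not p.is_file()]


if __name__ == "__main__":
    for name in missing_files("."):
        print("missing:", name)
```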


    Configuration and Running

    Open the project in VS Code and select the virtual environment as the Python interpreter, then restart the language server:

    Ctrl + Shift + P -> Python: Restart Language Server
    

    Edit torts.yaml:

    voice: Mel
    samples: 4
    iterations: 32
    temperature: 0.8
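
To sanity-check the values after editing, the settings can be read back with a few lines of Python. This is a sketch that assumes the file is flat `key: value` pairs like the excerpt above; it is not a general YAML parser, and the function name is hypothetical. For anything nested, a real YAML library would be the better choice.

```python
def parse_flat_settings(text):
    """Parse 'key: value' lines; numbers become int or float, the rest stay strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not a key/value pair.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        for cast in (int, float):
            try:
                settings[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            # Neither int nor float: keep the raw string.
            settings[key] = value
    return settings


example = """
voice: Mel
samples: 4
iterations: 32
temperature: 0.8
"""
print(parse_flat_settings(example))
```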
    

    Launch Tortoise:

    1. Locate and run start.bat
    2. Open the Local URL it reports in your browser
    3. Configure the result folder and the autoregressive model

    Start the audiobook app:

    python audiobook_app_2.0.py
    

    Create a text file with your content and select it in the GUI. The interface allows for:

    • Playing generated audio
    • Regenerating sentences
    • Exporting as an audiobook file
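
To give an idea of what exporting involves, here is a minimal sketch that joins per-sentence WAV files into one file using Python's standard `wave` module. It assumes all clips share the same channel count, sample width, and sample rate, and it is not necessarily how the app itself exports (which may well rely on ffmpeg).

```python
import wave


def concatenate_wavs(paths, output_path):
    """Join WAV files that share one audio format into a single file, in order."""
    if not paths:
        raise ValueError("no input files given")
    params = None
    with wave.open(str(output_path), "wb") as out:
        for path in paths:
            with wave.open(str(path), "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if params is None:
                    # The first clip decides the output format.
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(src.readframes(src.getnframes()))
```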

    The app also handles interrupted sessions: you can load an existing audiobook and continue generation from where it stopped.
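
Conceptually, resuming comes down to working out which sentences still lack audio. The sketch below assumes one WAV file per sentence with a hypothetical `sentence_0000.wav` naming scheme; the app's actual bookkeeping may differ.

```python
from pathlib import Path


def pending_indices(sentence_count, out_dir):
    """Indices whose audio file is missing, i.e. what still needs generating."""
    out = Path(out_dir)
    return [
        i for i in range(sentence_count)
        if not (out / f"sentence_{i:04d}.wav").is_file()
    ]
```

A resumed run would then loop over `pending_indices(...)` only, leaving already generated sentences untouched.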


    Future Enhancements

    • Simplifying the installation process
    • Adding support for AMD and Mac
    • Supporting multiple languages for text-to-speech
    • Enhancements for better usability and performance

    If you encounter any issues, reach out, and I’ll try to assist as much as possible.


    FAQ

    Q1: What hardware do I need? A: An Nvidia graphics card, preferably from the 10 series upward with at least 6GB of VRAM.

    Q2: Can I use this tool on AMD or Mac? A: Currently, it’s limited to Nvidia GPUs, but plans are underway to support AMD and Mac in the future.

    Q3: What software do I need to install? A: You need to install CUDA, Python 3.10, Git, VS Code, Tortoise TTS, PyTorch, and ffmpeg.

    Q4: Where can I download the necessary files and dependencies? A: Links to download all required software and files are provided in the detailed guide above.

    Q5: Can I regenerate specific sentences in the audiobook? A: Yes, the tool allows for regenerating audio for specific sentences if the initial generation is not satisfactory.

    Q6: How do I handle interrupted audiobook generation sessions? A: You can load the existing audiobook and continue generation from where it left off using the application’s built-in functionality.
