Audiobook Maker Demo | RCV

The Audiobook Maker tab provides a user-friendly interface for creating high-quality audiobooks. This article will guide you through the process of using various drop-downs and sliders to customize and control the audio output to your liking.

Model Selection

First, navigate to the Model tab to select a trained model for your voice synthesis. For this example, we will choose a female voice model named Azuro, which was trained earlier in the Training tab.

TTS Speaker Dropdown

Next, go to the TTS Speaker dropdown. This list includes all default devices in your system as well as library options. Select the “IND speaker” from the list.

Pitch Extraction Method

The pitch extraction method used here is RMPE (robust and efficient pitch extraction), known for its high accuracy and performance in extracting pitch from audio signals.

Slider Controls

Index Rate: Controls how strongly the model’s voice features are applied to the output. A value of 0 means none of Azuro’s features are applied; a value of 1 applies them fully. Set the index rate to 1 for the closest match to the cloned voice.
Pitch Slider: Adjusts the pitch. Since both the model and TTS speaker in this example are female voices, keep the pitch slider at its default setting of 0.
Speed: Controls the pace of the voice. Lower values result in a slower, more precise voice; for this example, set the slider to -10.
Protect Slider: Regulates the protection level, influencing the stability and clarity of speech synthesis. Higher protection levels can help maintain clarity and prevent distortions. Adjust the slider to balance stability and naturalness.
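
The behavior of the Index Rate slider described above can be pictured as a blend between two sets of voice features. The sketch below is only an illustration under stated assumptions: the article does not show how the tool works internally, so the linear interpolation, the function name blend_features, and the short lists standing in for feature vectors are all hypothetical.

```python
def blend_features(source, model, index_rate):
    """Linearly mix source features with the model's features.

    Assumption: the slider acts as a linear interpolation weight.
    index_rate = 0 keeps the source features unchanged;
    index_rate = 1 replaces them with the model's features.
    """
    if not 0.0 <= index_rate <= 1.0:
        raise ValueError("index_rate must be between 0 and 1")
    if len(source) != len(model):
        raise ValueError("feature vectors must have the same length")
    return [index_rate * m + (1.0 - index_rate) * s
            for s, m in zip(source, model)]


# Hypothetical feature values, for illustration only.
source = [0.0, 2.0, 4.0]
model = [1.0, 1.0, 1.0]

print(blend_features(source, model, 0.0))  # same values as source
print(blend_features(source, model, 1.0))  # same values as model
print(blend_features(source, model, 0.5))  # halfway between the two
```

Under this assumption, intermediate values give a voice that sits between the TTS speaker and the trained model.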

Input Text

Enter the main text in the text box. For example, select a Japanese voice model while the TTS speaker is set to Hindi, then press the Convert button to hear the cross-language capabilities. Once the conversion finishes, you can play back the synthesized speech.

Model and Speaker Adjustment

For another example, change the model to a male voice such as Obama and set the speaker to a female voice. Because the model's voice is lower than the speaker's, set the pitch slider to -12 to lower the pitch accordingly. Press the Convert button to synthesize the speech.
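
To get a feel for what a pitch setting of -12 means, the sketch below converts a shift into a frequency ratio. It assumes the slider is measured in semitones, which the article does not state; under that assumption, -12 is one octave down and halves every frequency. The function name semitones_to_ratio and the 220 Hz example are illustrative only.

```python
def semitones_to_ratio(semitones):
    """Return the frequency ratio for a pitch shift given in semitones.

    Assumption: the pitch slider uses semitone units. In equal
    temperament, each semitone multiplies frequency by 2 ** (1 / 12).
    """
    return 2.0 ** (semitones / 12.0)


ratio = semitones_to_ratio(-12)
print(ratio)  # 0.5, one octave down

# A hypothetical 220 Hz speaking pitch would drop to 110 Hz.
print(220.0 * ratio)
```

A setting of 0 gives a ratio of 1, which leaves the pitch unchanged, matching the earlier example where both voices were female.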

Background Music Selection

Choose a background music track and use the volume slider to control the music level. A higher value on the slider means lower music volume and vice versa. Set it to around 9 for optimal mixing. Press the Combine button to attach the background music to the cloned voice.
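
The Combine step can be sketched as adding an attenuated music signal to the voice signal. The article only says that a higher slider value means quieter music; treating the slider as attenuation in decibels is an assumption made here for illustration, as are the function name mix_with_music and the use of plain lists of samples in the range -1.0 to 1.0.

```python
def mix_with_music(voice, music, attenuation):
    """Mix voice samples with background music.

    Assumption: `attenuation` is in decibels, so a higher value
    lowers the music volume. Samples are floats in [-1.0, 1.0].
    Music shorter than the voice is padded with silence.
    """
    gain = 10.0 ** (-attenuation / 20.0)  # dB to linear gain
    mixed = []
    for i, v in enumerate(voice):
        m = music[i] if i < len(music) else 0.0
        # Clip the sum so it stays inside the valid sample range.
        mixed.append(max(-1.0, min(1.0, v + gain * m)))
    return mixed


# Hypothetical samples, for illustration only.
voice = [0.5, -0.5, 0.25]
music = [0.2, 0.2]

quiet = mix_with_music(voice, music, 9)
loud = mix_with_music(voice, music, 3)
print(quiet)
print(loud)
```

With these assumptions, the music contributes less to the mix at a setting of 9 than at 3, which matches the inverse relationship between slider value and music volume described above.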

The final output is then displayed for playback, and you can enjoy a speech synthesis example of Obama reading a short story in Hindi.

Keywords

Audiobook Maker
Model selection
TTS Speaker
RMPE
Index rate
Pitch slider
Speed control
Protect slider
Synthesized speech
Background music

FAQ

Q1: What is the purpose of the Audiobook Maker tab?
A1: The Audiobook Maker tab is designed for creating high-quality audiobooks by customizing and controlling the audio output with various tools and settings.

Q2: How do I select a trained voice model?
A2: Navigate to the Model tab and select a pre-trained voice model, such as Azuro, for voice synthesis.

Q3: What does the TTS Speaker dropdown include?
A3: The TTS Speaker dropdown lists all default devices available on your system, along with additional library options for speech synthesis.

Q4: What is RMPE?
A4: RMPE stands for Robust and efficient pitch extraction, a method known for its high accuracy in retrieving pitch from audio signals.

Q5: How do I use the index rate slider?
A5: The index rate slider adjusts the level of detail from the sample features. Setting it to 1 ensures a perfect clone of the model’s voice.

Q6: When should I adjust the pitch slider?
A6: Adjust the pitch slider when the speaker's voice and the model's voice have different natural pitches, such as when using a male voice model with a female speaker.

Q7: How can I control the speed of the synthesized speech?
A7: Use the speed slider; lower values slow down the speech for more precision, while higher values make it faster.

Q8: What is the role of the protect slider?
A8: The protect slider helps regulate the stability and clarity of speech synthesis, balancing between naturalness and maintaining clarity in challenging conditions.

Q9: How do I add background music to my audiobook?
A9: Select a background music track, adjust the volume slider, and press the Combine button to mix the music with the cloned voice.

Q10: Can I work with different languages?
A10: Yes, the tool supports cross-language synthesis, so the voice model and the TTS speaker can use different languages, such as a Japanese voice model with a Hindi speaker.