
    Get better sounding AI voice output from Elevenlabs.

    Introduction

    Transforming text to speech that sounds almost lifelike isn't just a dream anymore thanks to Elevenlabs. This detailed guide will walk you through the various settings, sliders, voice selections, prompting techniques, and more to help you master Elevenlabs' text-to-speech capabilities.

    Voice Selection

    Selecting the right voice is like picking the right human actor. If you need a fast-talking, punchy voice, opting for someone like Morgan Freeman wouldn't make much sense. Similarly, when browsing the Elevenlabs voice library or creating one in the Voice Lab, ensure the sample clip matches the style of your project.

    Choosing the Right Model

    Elevenlabs Multilingual V2

    • Languages: 29
    • Features: Very stable, accurate, handles accents well, and offers language diversity.

    Elevenlabs Multilingual V1

    • Languages: 9
    • Notes: Experimental model, less accurate; avoid unless necessary.

    Elevenlabs English V1

    • Languages: English only
    • Notes: Fastest but least accurate; trained on a smaller data set.

    Elevenlabs Turbo V2

    • Languages: English only
    • Features: Fast generations, but lacks a style slider and may not be as accurate as Multilingual V2.

    For most projects, Multilingual V2 is your best bet. It's stable, natural, and accurate.
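
    As a rough illustration, the model comparison above can be written down as a small lookup table plus a rule of thumb. This is only a sketch: the keys are display names rather than API identifiers, `pick_model` is a hypothetical helper, and preferring Turbo V2 when speed matters most is an assumption drawn from the notes above.

```python
# Facts copied from the model list above.
# Keys are display names, NOT API model identifiers.
MODELS = {
    "Multilingual V2": {"languages": 29, "english_only": False},
    "Multilingual V1": {"languages": 9, "english_only": False},
    "English V1": {"languages": 1, "english_only": True},
    "Turbo V2": {"languages": 1, "english_only": True},
}


def pick_model(needs_non_english: bool, speed_over_accuracy: bool = False) -> str:
    """Return a display name following this guide's rule of thumb (hypothetical helper)."""
    if needs_non_english:
        # Multilingual V2 is the stable multilingual choice; V1 is experimental.
        return "Multilingual V2"
    if speed_over_accuracy:
        # Assumption: Turbo V2 trades some accuracy (and the style slider) for speed.
        return "Turbo V2"
    # The guide's default recommendation for most projects.
    return "Multilingual V2"


if __name__ == "__main__":
    choice = pick_model(needs_non_english=False, speed_over_accuracy=True)
    print(choice, MODELS[choice])
```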

    Setting Sliders

    Stability Slider

    • Lower: More emotional range but can lead to odd performances and overly fast speech.
    • Higher: More stable voice but can become monotonous.
    • Starting Point: The default setting, or a value between 40 and 50.

    Similarity Slider

    • Lower: Less like the original voice.
    • Higher: More like the original voice but can include artifacts.
    • Starting Point: 75-80 is a good setting.

    Style Exaggeration

    • Zero: Style exaggeration off.
    • Higher: Emphasizes the style of the original voice but can decrease stability.

    Speaker Boost

    • Checkbox: Increases similarity to the original recording but slows down generation.

    Generation is non-deterministic: even with identical settings, each run produces a slightly different result. The sweet spot for many is 40-50 for stability and 75-80 for similarity.
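
    If you drive the service from a script rather than the web sliders, the same settings are usually passed as numbers between 0 and 1. The sketch below converts slider percentages into such a payload. The key names (`stability`, `similarity_boost`, `style`, `use_speaker_boost`) are assumed to match the voice settings object in the Elevenlabs API, so check them against the current API reference. `build_voice_settings` is a hypothetical helper, and its defaults reflect this guide's suggested starting points.

```python
def slider_to_unit(percent: float) -> float:
    """Convert a 0-100 slider position into a 0.0-1.0 value."""
    if not 0 <= percent <= 100:
        raise ValueError(f"slider value must be between 0 and 100, got {percent}")
    return round(percent / 100, 2)


def build_voice_settings(
    stability: float = 45,      # guide's suggested range: 40-50
    similarity: float = 75,     # guide's suggested range: 75-80
    style: float = 0,           # 0 turns style exaggeration off
    speaker_boost: bool = False,
) -> dict:
    """Build a settings payload (key names are an assumption, see lead-in)."""
    return {
        "stability": slider_to_unit(stability),
        "similarity_boost": slider_to_unit(similarity),
        "style": slider_to_unit(style),
        "use_speaker_boost": speaker_boost,
    }


if __name__ == "__main__":
    print(build_voice_settings())
    print(build_voice_settings(stability=40, similarity=80, style=20))
```

    Keeping the conversion in one place makes it easier to try several stability and similarity combinations and compare the takes.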

    Prompting

    Adding Pauses

    • Programmatic Syntax: <break time="1.5s"/> adds a 1.5-second pause.
    • Dashes: Use em dashes or multiple dashes.
    • Ellipses: Three dots for hesitation e.g., "I... guess so."
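
    When a script is assembled programmatically, the break syntax above can be generated instead of typed by hand. A minimal sketch, in which `break_tag` and `join_with_pauses` are hypothetical helper names:

```python
def break_tag(seconds: float) -> str:
    """Return a pause tag such as <break time="1.5s"/>."""
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    # :g drops a trailing ".0", so 2.0 becomes "2" and 1.5 stays "1.5".
    return f'<break time="{seconds:g}s"/>'


def join_with_pauses(sentences: list[str], seconds: float = 1.5) -> str:
    """Join sentences, placing the same pause between each pair."""
    separator = f" {break_tag(seconds)} "
    return separator.join(s.strip() for s in sentences)


if __name__ == "__main__":
    print(join_with_pauses(["Welcome back.", "Let's get started."], 1.5))
```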

    Pronunciation

    • Programmatic Syntax: Use SSML with IPA or CMU ARPAbet (complex).
    • Phonetic Spelling: Fun and flexible. E.g., respell "samurai" as "samoorai" or a similar variant, and keep whichever sounds right.
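
    For the programmatic route, the usual SSML form is a phoneme tag wrapped around the word. The sketch below builds one. The alphabet values `ipa` and `cmu-arpabet` and the sample ARPAbet transcription are assumptions for illustration, so confirm both the accepted values and which models honor the tag in the Elevenlabs documentation. `phoneme_tag` is a hypothetical helper.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed alphabet identifiers; verify against the service documentation.
ALPHABETS = {"ipa", "cmu-arpabet"}


def phoneme_tag(word: str, pronunciation: str, alphabet: str = "cmu-arpabet") -> str:
    """Wrap a word in an SSML phoneme tag carrying its pronunciation."""
    if alphabet not in ALPHABETS:
        raise ValueError(f"unknown alphabet: {alphabet}")
    # quoteattr adds the surrounding quotes and escapes special characters.
    return (
        f'<phoneme alphabet="{alphabet}" ph={quoteattr(pronunciation)}>'
        f"{escape(word)}</phoneme>"
    )


if __name__ == "__main__":
    # Illustrative ARPAbet transcription; check it against a pronouncing dictionary.
    print(phoneme_tag("Madison", "M AE1 D IH0 S AH0 N"))
```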

    Emotion

    • Contextual Cues: Write the text like a book, including cues such as "he said angrily."
    • Punctuation: Commas, periods, exclamation marks, and question marks help guide intonation.
    • Caps Lock: Emphasizing words or sentences with all caps often works.
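
    If your lines come from a spreadsheet or script file, the contextual cue and caps lock tricks above can be applied in bulk. A small sketch with hypothetical helpers `with_cue` and `emphasize`. It only prepares the text, and whether the voice responds as hoped still depends on the take.

```python
import re


def with_cue(line: str, speaker: str = "he", cue: str = "angrily") -> str:
    """Wrap a line in book-style narration, e.g. "..." he said angrily."""
    return f'"{line.strip()}" {speaker} said {cue}.'


def emphasize(text: str, words: list[str]) -> str:
    """Upper-case whole-word matches of the given words (case-insensitive)."""
    if not words:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: m.group(0).upper(), text)


if __name__ == "__main__":
    line = emphasize("I told you not to touch that!", ["not"])
    print(with_cue(line, speaker="she", cue="angrily"))
```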

    Pacing

    • Avoid Multi-Clipping: Submit one sample file with natural pauses.
    • Editing Software: Use tools like Descript for creating one clean file.
    • Write Descriptively: Add textual cues for the desired pacing e.g., "he said slowly."

    Combining these tips with the sliders can help you get the optimal voice. Lowering the similarity slider when using prompts can make the AI more flexible.

    Additional Tips and Tricks

    Keep generating until you get a take you like. Think of it as working with a human actor: if the first take doesn't work, try again until it does.
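
    In a script, "keep generating" is simply a bounded retry loop. The sketch below assumes you supply two functions of your own: `generate`, which requests one take, and `accept`, which decides whether the take is good enough (for example, by asking you to listen). Both are hypothetical, and no real API is called here.

```python
from typing import Callable, Optional


def first_good_take(
    generate: Callable[[], bytes],
    accept: Callable[[bytes], bool],
    max_takes: int = 5,
) -> Optional[bytes]:
    """Request takes until one is accepted or the budget runs out."""
    for _ in range(max_takes):
        take = generate()
        if accept(take):
            return take
    # No take was accepted within the budget.
    return None


if __name__ == "__main__":
    # Stand-in generator that yields a different "take" on every call.
    takes = iter([b"take-1", b"take-2", b"take-3"])
    result = first_good_take(lambda: next(takes), lambda t: t == b"take-2")
    print(result)
```

    Capping the number of takes keeps an unattended script from spending credits indefinitely on a line that may need a rewritten prompt instead.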


    Keywords

    • Text-to-speech
    • Elevenlabs
    • Voice selection
    • Multilingual V2
    • Stability slider
    • Similarity slider
    • Style exaggeration
    • Speaker boost
    • Pauses
    • Pronunciation
    • Emotion
    • Pacing

    FAQ

    Q: Which Elevenlabs model should I use for the best overall performance?

    • A: Multilingual V2 is generally the best option for its stability, accuracy, and wide language support.

    Q: How can I ensure that the generated speech has the right emotional tone?

    • A: You can write your script with emotional cues and adjust the stability slider for more or less emotional range. Adding punctuation and using descriptive text can also help guide the AI.

    Q: How can I add pauses in the generated speech?

    • A: Use programmatic syntax like <break time="1.5s"/>, or try adding dashes, em dashes, or ellipses for brief pauses.

    Q: What should I do if the AI pronounces a word incorrectly?

    • A: You can use phonetic spelling to adjust pronunciation or employ SSML tags with IPA or CMU ARPAbet for precise control.

    Q: Why does my cloned voice sound too fast?

    • A: This could be due to submitting multiple sample clips without pauses. Try merging your samples into a single file with natural gaps.

    Q: Are the Elevenlabs settings deterministic?

    • A: No, each generation will be slightly different. Use higher stability settings and keep generating until you get the desired result.

    Q: How can I reduce unwanted background noise in my cloned voice?

    • A: Ensure your original recordings are as clean as possible, free of background noise, sibilance, or electronic interference.

    Q: Can I use Elevenlabs for free?

    • A: Yes, there is a free tier available to test out these features and tips.
