AI Podcast Generator with cloned voice. Inspired by Google’s NotebookLM and Illuminate.
Introduction
Happy Friday! Today, we're diving into an exciting topic: the AI podcast generator I built, which uses cloned voices for an engaging listening experience. This project was inspired by Google's NotebookLM, a research tool that can generate interactive podcasts from various content types. The topic narrowly won a vote on LinkedIn, and I'm eager to share how it all works.
Introduction to the AI Podcast Generator
The idea behind my podcast generator is to create a two-person conversation based on text input. It could be an article, book, or any textual content you want to explore. The tool I built has similarities with Google's NotebookLM, which allows for creating immersive podcasts through voice conversation. You simply paste your chosen text into the generator, and it produces an audio-based discussion.
How it Works
- **Input Source:** You provide the source text, such as an article, a book excerpt, or other written content.
- **AI Model:** The generator uses a large language model (specifically, I used Gemini) to create a back-and-forth conversation between two characters.
- **Voice Cloning:** The audio is generated using two different speech synthesis tools. Google Text-to-Speech is used for one voice, while I cloned my own voice using ElevenLabs for the other character.
- **Final Merge:** The generated audio clips are merged into a final podcast file.
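The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the project: the speaker names `Host` and `Guest`, the `VOICE_BACKENDS` mapping, and the `synthesize_stub` function are all hypothetical, and the stub writes silence where a real build would call Google Text-to-Speech or ElevenLabs. Only the Python standard library is used, so the clips here are WAV data.

```python
import io
import wave

# Hypothetical mapping from speaker name to the voice backend used for it.
VOICE_BACKENDS = {"Host": "google_tts", "Guest": "cloned_voice"}


def parse_turns(script):
    """Split 'Speaker: text' lines into (speaker, text) pairs.

    Lines without a known speaker name before the first colon are skipped.
    """
    turns = []
    for line in script.splitlines():
        speaker, sep, text = line.partition(":")
        speaker, text = speaker.strip(), text.strip()
        if sep and speaker in VOICE_BACKENDS and text:
            turns.append((speaker, text))
    return turns


def synthesize_stub(text, backend, rate=16000):
    """Stand-in for a real TTS call; `backend` is unused here.

    Returns mono 16-bit WAV bytes containing 50 ms of silence per word.
    """
    n_frames = (rate // 20) * len(text.split())
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def merge_wav(clips):
    """Concatenate WAV clips that share one audio format into a single WAV."""
    if not clips:
        raise ValueError("need at least one clip to merge")
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        for i, clip in enumerate(clips):
            with wave.open(io.BytesIO(clip), "rb") as r:
                if i == 0:
                    # Copy channels, sample width, and rate from the first clip.
                    w.setparams(r.getparams())
                w.writeframes(r.readframes(r.getnframes()))
    return out.getvalue()


if __name__ == "__main__":
    script = "Host: Welcome to the show.\nGuest: Um, glad to be here!"
    clips = [synthesize_stub(text, VOICE_BACKENDS[speaker])
             for speaker, text in parse_turns(script)]
    with open("podcast.wav", "wb") as f:
        f.write(merge_wav(clips))
```

Keeping the synthesis step behind one function per turn means either voice can be swapped without touching the parsing or merging code.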
The crux of this project is the prompt we're sending to Gemini. The prompt directs how the conversation should flow, integrating emotions and natural language fillers to make the dialogue more human-like. While Google’s NotebookLM offers limited customization, my AI podcast generator provides full control over the voice and prompt.
Tools and Technology Used
The primary tools I used for this project include:
- **Google Gemini:** For generating conversation based on the text input.
- **Google Text-to-Speech:** For converting textual dialogues into natural-sounding audio.
- **ElevenLabs:** For cloning my voice, allowing for a unique and personalized touch.
- **Python:** To script the entire generation process.
Promise of Fuller Customization
One of my primary goals for this project was full control over the voices and the prompt used to generate the podcast. That makes my generator more flexible than NotebookLM while keeping the engaging two-person conversation format.
The Prompt: The Heart of the Podcast
The effectiveness of the conversation lies heavily in how it is structured through the prompt we provide. For instance, we can specify character names, emotional tone, and more. It's tailored to create a rich and engaging conversation that mimics how a natural dialogue would occur.
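To make this concrete, here is a small sketch of how such a prompt could be assembled in Python. The function name `build_prompt`, its parameters, the default character names, and the exact wording of the rules are all assumptions for illustration; the actual prompt used in the project is not reproduced here.

```python
def build_prompt(source_text, host="Alex", guest="Sam",
                 tone="curious and friendly", max_turns=20):
    """Assemble an instruction prompt for a two-person podcast script.

    All rule wording below is illustrative, not the project's real prompt.
    """
    rules = [
        f"Write a podcast conversation between two speakers, {host} and {guest}.",
        f"Keep the tone {tone}.",
        "Use natural fillers such as 'um', 'well', and 'you know' sparingly.",
        f"Use at most {max_turns} turns and stay faithful to the source text.",
        f"Format every turn on its own line as '{host}: ...' or '{guest}: ...'.",
    ]
    return "\n".join(rules) + "\n\nSOURCE TEXT:\n" + source_text.strip()


if __name__ == "__main__":
    print(build_prompt("Large language models sometimes hallucinate facts."))
```

Asking for a fixed `Name: text` line format is a design choice that makes the model's reply easy to split into turns before sending each one to its voice.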
Implementation and Generation
The code I wrote is available on GitHub, and you can run it yourself. The Streamlit application I developed provides a user-friendly interface for generating podcasts. Simply input your text, let it process, and listen to the audio output.
Example Podcast
To illustrate the process, I generated a podcast based on one of my articles discussing the hallucination tendencies in large language models (LLMs). The end result was a 6-minute podcast that featured an engaging conversation dynamically exploring the topic while utilizing my cloned voice and a female voice provided by Google.
Conclusion
Overall, the experience of building this AI podcast generator was enjoyable and fulfilling. With the combination of voice cloning and AI-generated dialogue, the possibilities for creating engaging audio content are exciting and plentiful.
Now you can easily generate your own podcasts, or listen to articles by converting them into audio with an interactive twist.
Keywords
AI podcast generator, voice cloning, Google NotebookLM, speech synthesis, Gemini, audio content, interactive learning, technology, ElevenLabs.
FAQ
Q1: What is an AI podcast generator?
An AI podcast generator is a tool that creates audio podcasts from textual inputs by mimicking a conversation between two people using artificial intelligence.
Q2: How does voice cloning work in this project?
Voice cloning is achieved through a service such as ElevenLabs, which lets you upload samples of your voice and generates a digital clone that can speak any text input in your voice.
Q3: What technologies do you use?
The project uses Google Gemini for generating dialogue, Google Text-to-Speech for converting text to audio, and ElevenLabs for voice cloning.
Q4: Can I customize the conversation?
Yes, you can customize the conversation by adjusting the prompts sent to Gemini, allowing you to control the tone, length, and emotional content of the podcast.
Q5: Where can I access the generated podcasts?
The generated podcasts can be shared on platforms like Spotify or accessed via the output files saved in the project.