Topview Logo
  • Create viral videos with
    GPT-4o + Ads library
    Use GPT-4o to edit video empowered by Youtube & Tiktok & Facebook ads library. Turns your links or media assets into viral videos in one click.
    Try it free
    gpt video

    Build a vocal AI assistant using ChatGPT and Python using speech recognition

    blog thumbnail

    Introduction

    Creating a vocal AI assistant that communicates via speech is simpler than you might think. In this article, we'll walk through building a basic AI assistant using ChatGPT and Python, all with fewer than 80 lines of code. We'll deploy speech recognition to transcribe spoken words, ChatGPT for generating responses, and text-to-speech (TTS) to return vocal replies.

    Requirements

    Libraries

    We'll need the following Python libraries:

    • openai: Interface for ChatGPT
    • speech_recognition: For recognizing spoken language
    • pyttsx3: For TTS conversion
    • threading and time: Core Python libraries for managing threads and timeouts
    pip install openai
    pip install SpeechRecognition
    pip install pyttsx3
    

    Keys

    To interact with ChatGPT, you’ll require an API key from OpenAI.

    Setup

    Speech Recognition

    To recognize speech, we'll use the speech_recognition library. Below is an outline of how to set up and use the library to listen to a user's voice and transcribe it into text:

    import speech_recognition as sr
    
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Chat is ready, say something!")
        audio = r.listen(source)
    
    try:
        text = r.recognize_google(audio)
        print("You said: " + text)
    except sr.UnknownValueError:
        print("Sorry, could not understand the audio")
    except sr.RequestError:
        print("Could not request results; check your network connection")
    

    ChatGPT Integration

    Next, integrate the openai library to interface with ChatGPT.

    import openai
    
    openai.api_key = 'YOUR_API_KEY'
    
    response = openai.Completion.create(
        engine="text-davinci-003",
        [prompt="Hello ChatGPT](https://www.topview.ai/blog/detail/Funny-ChatGPT-Conversations)!",
        max_tokens=150,
        n=1,
        stop=None,
        temperature=0.5
    )
    
    reply = response.choices[0].text.strip()
    print(reply)
    

    Text-to-Speech

    Utilize the pyttsx3 library to convert ChatGPT's text responses into speech:

    import pyttsx3
    
    engine = pyttsx3.init()
    engine.say(reply)
    engine.runAndWait()
    

    Putting it All Together

    Finally, we glue everything together with threading to ensure smooth execution.

    import threading
    
    def generate_response(text):
        response = openai.Completion.create(
            engine="text-davinci-003",
            prompt=text,
            max_tokens=150,
            n=1,
            stop=None,
            temperature=0.5
        )
        return response.choices[0].text.strip()
    
    def speak(text):
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()
    
    while True:
        r = sr.Recognizer()
        with sr.Microphone() as source:
            print("Listening...")
            audio = r.listen(source)
            try:
                text = r.recognize_google(audio)
                print(f"User said: (text)")
                if "stop" in text.lower():
                    break
                response = generate_response(text)
                print(f"ChatGPT: (response)")
                speak(response)
            except:
                print("Sorry, I couldn't understand that.")
    

    Conclusion and Future Improvements

    We've created a basic vocal AI assistant using ChatGPT and Python. However, there is ample room for improvement:

    1. Human-like voice: Enhance the TTS output to sound more natural.
    2. User Interface: Implement a graphical user interface using libraries such as Tkinter.

    Feel free to explore and enrich this assistant further!

    Keywords

    FAQ

    1. What libraries are necessary for building a vocal AI assistant with ChatGPT?

    You'll need openai, speech_recognition, pyttsx3, threading, and time.

    2. How do you install the OpenAI library in Python?

    Use the command pip install openai.

    3. Where can I get the API key for ChatGPT?

    You can get an API key by signing up or signing in at OpenAI’s official website. The key can be found in the API section of your account.

    4. How do I make the AI assistant stop?

    You can instruct the AI to listen for a specific keyword like “stop” to break the listening loop.

    5. How do I customize the voice and speed of the text-to-speech engine?

    You can adjust the voice type and speed using engine.setProperty('voice', voice_id) and engine.setProperty('rate', rate) functions in pyttsx3.

    One more thing

    In addition to the incredible tools mentioned above, for those looking to elevate their video creation process even further, Topview.ai stands out as a revolutionary online AI video editor.

    TopView.ai provides two powerful tools to help you make ads video in one click.

    Materials to Video: you can upload your raw footage or pictures, TopView.ai will edit video based on media you uploaded for you.

    Link to Video: you can paste an E-Commerce product link, TopView.ai will generate a video for you.

    You may also like