ad
ad

Build a vocal AI assistant using ChatGPT and Python using speech recognition

Science & Technology


Introduction

Creating a vocal AI assistant that communicates via speech is simpler than you might think. In this article, we'll walk through building a basic AI assistant using ChatGPT and Python, all with fewer than 80 lines of code. We'll deploy speech recognition to transcribe spoken words, ChatGPT for generating responses, and text-to-speech (TTS) to return vocal replies.

Requirements

Libraries

We'll need the following Python libraries:

  • openai: Interface for ChatGPT
  • speech_recognition: For recognizing spoken language
  • pyttsx3: For TTS conversion
  • threading and time: Core Python libraries for managing threads and timeouts
pip install openai
pip install SpeechRecognition
pip install pyttsx3

Keys

To interact with ChatGPT, you’ll require an API key from OpenAI.

Setup

Speech Recognition

To recognize speech, we'll use the speech_recognition library. Below is an outline of how to set up and use the library to listen to a user's voice and transcribe it into text:

import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Chat is ready, say something!")
    audio = r.listen(source)

try:
    text = r.recognize_google(audio)
    print("You said: " + text)
except sr.UnknownValueError:
    print("Sorry, could not understand the audio")
except sr.RequestError:
    print("Could not request results; check your network connection")

ChatGPT Integration

Next, integrate the openai library to interface with ChatGPT.

import openai

openai.api_key = 'YOUR_API_KEY'

response = openai.Completion.create(
    engine="text-davinci-003",
    [prompt="Hello ChatGPT](https://www.topview.ai/blog/detail/Funny-ChatGPT-Conversations)!",
    max_tokens=150,
    n=1,
    stop=None,
    temperature=0.5
)

reply = response.choices[0].text.strip()
print(reply)

Text-to-Speech

Utilize the pyttsx3 library to convert ChatGPT's text responses into speech:

import pyttsx3

engine = pyttsx3.init()
engine.say(reply)
engine.runAndWait()

Putting it All Together

Finally, we glue everything together with threading to ensure smooth execution.

import threading

def generate_response(text):
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=text,
        max_tokens=150,
        n=1,
        stop=None,
        temperature=0.5
    )
    return response.choices[0].text.strip()

def speak(text):
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

while True:
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        audio = r.listen(source)
        try:
            text = r.recognize_google(audio)
            print(f"User said: (text)")
            if "stop" in text.lower():
                break
            response = generate_response(text)
            print(f"ChatGPT: (response)")
            speak(response)
        except:
            print("Sorry, I couldn't understand that.")

Conclusion and Future Improvements

We've created a basic vocal AI assistant using ChatGPT and Python. However, there is ample room for improvement:

  1. Human-like voice: Enhance the TTS output to sound more natural.
  2. User Interface: Implement a graphical user interface using libraries such as Tkinter.

Feel free to explore and enrich this assistant further!

Keywords

FAQ

1. What libraries are necessary for building a vocal AI assistant with ChatGPT?

You'll need openai, speech_recognition, pyttsx3, threading, and time.

2. How do you install the OpenAI library in Python?

Use the command pip install openai.

3. Where can I get the API key for ChatGPT?

You can get an API key by signing up or signing in at OpenAI’s official website. The key can be found in the API section of your account.

4. How do I make the AI assistant stop?

You can instruct the AI to listen for a specific keyword like “stop” to break the listening loop.

5. How do I customize the voice and speed of the text-to-speech engine?

You can adjust the voice type and speed using engine.setProperty('voice', voice_id) and engine.setProperty('rate', rate) functions in pyttsx3.