Build a vocal AI assistant using ChatGPT and Python using speech recognition
Science & Technology
Introduction
Creating a vocal AI assistant that communicates via speech is simpler than you might think. In this article, we'll walk through building a basic AI assistant using ChatGPT and Python, all with fewer than 80 lines of code. We'll deploy speech recognition to transcribe spoken words, ChatGPT for generating responses, and text-to-speech (TTS) to return vocal replies.
Requirements
Libraries
We'll need the following Python libraries:
- openai: Interface for ChatGPT
- speech_recognition: For recognizing spoken language
- pyttsx3: For TTS conversion
- threading and time: Core Python libraries for managing threads and timeouts
pip install openai
pip install SpeechRecognition
pip install pyttsx3
Keys
To interact with ChatGPT, you’ll require an API key from OpenAI.
Setup
Speech Recognition
To recognize speech, we'll use the speech_recognition
library. Below is an outline of how to set up and use the library to listen to a user's voice and transcribe it into text:
import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
print("Chat is ready, say something!")
audio = r.listen(source)
try:
text = r.recognize_google(audio)
print("You said: " + text)
except sr.UnknownValueError:
print("Sorry, could not understand the audio")
except sr.RequestError:
print("Could not request results; check your network connection")
ChatGPT Integration
Next, integrate the openai
library to interface with ChatGPT.
import openai
openai.api_key = 'YOUR_API_KEY'
response = openai.Completion.create(
engine="text-davinci-003",
[prompt="Hello ChatGPT](https://www.topview.ai/blog/detail/Funny-ChatGPT-Conversations)!",
max_tokens=150,
n=1,
stop=None,
temperature=0.5
)
reply = response.choices[0].text.strip()
print(reply)
Text-to-Speech
Utilize the pyttsx3
library to convert ChatGPT's text responses into speech:
import pyttsx3
engine = pyttsx3.init()
engine.say(reply)
engine.runAndWait()
Putting it All Together
Finally, we glue everything together with threading to ensure smooth execution.
import threading
def generate_response(text):
response = openai.Completion.create(
engine="text-davinci-003",
prompt=text,
max_tokens=150,
n=1,
stop=None,
temperature=0.5
)
return response.choices[0].text.strip()
def speak(text):
engine = pyttsx3.init()
engine.say(text)
engine.runAndWait()
while True:
r = sr.Recognizer()
with sr.Microphone() as source:
print("Listening...")
audio = r.listen(source)
try:
text = r.recognize_google(audio)
print(f"User said: (text)")
if "stop" in text.lower():
break
response = generate_response(text)
print(f"ChatGPT: (response)")
speak(response)
except:
print("Sorry, I couldn't understand that.")
Conclusion and Future Improvements
We've created a basic vocal AI assistant using ChatGPT and Python. However, there is ample room for improvement:
- Human-like voice: Enhance the TTS output to sound more natural.
- User Interface: Implement a graphical user interface using libraries such as Tkinter.
Feel free to explore and enrich this assistant further!
Keywords
- Vocal AI Assistant
- ChatGPT
- Speech Recognition
- Text-to-Speech
- Python
- OpenAI API
FAQ
1. What libraries are necessary for building a vocal AI assistant with ChatGPT?
You'll need openai
, speech_recognition
, pyttsx3
, threading
, and time
.
2. How do you install the OpenAI library in Python?
Use the command pip install openai
.
3. Where can I get the API key for ChatGPT?
You can get an API key by signing up or signing in at OpenAI’s official website. The key can be found in the API section of your account.
4. How do I make the AI assistant stop?
You can instruct the AI to listen for a specific keyword like “stop” to break the listening loop.
5. How do I customize the voice and speed of the text-to-speech engine?
You can adjust the voice type and speed using engine.setProperty('voice', voice_id)
and engine.setProperty('rate', rate)
functions in pyttsx3
.