
    I Built an Interactive AI Talking Avatar



    Introduction

    Hello everyone and welcome to my channel. I'm Rob, and today I have an exciting project to share with you. We'll be diving into the world of artificial intelligence as I take you through the process of building my very own talking AI Avatar using OpenAI's GPT API, Microsoft Azure's Cognitive Services API, and a combination of web technologies like JavaScript, Three.js, and Node.js. We'll take conversations with AI one step further. Thanks for tuning in. I'm super excited to show you what I've been working on, so let's dive right in!

    I found this project on GitHub, which served as the perfect starting point for what we're about to create. It let users type a message in a text box, and an avatar read it back via speech. I took it a step further by integrating OpenAI's GPT API, enabling our avatar to respond intelligently to your questions.

    Speech to Text Integration

    First, I integrated the Web Speech API, which transcribes your speech to text. You can ask a question using your voice, and the app will convert it into written text.
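
As a rough illustration of this step, the sketch below wires a Web Speech API recognizer to a callback that receives the final transcript. The function name `createRecognizer`, the `en-US` language setting, and the injected constructor are assumptions made for this example, not the project's actual code. Injecting the constructor simply lets the same function be exercised outside a browser.

```javascript
// Sketch only: configures a Web Speech API recognizer and reports final text.
// `SpeechRecognitionCtor` is injected; in a browser you would pass
// window.SpeechRecognition || window.webkitSpeechRecognition.
function createRecognizer(SpeechRecognitionCtor, onFinalTranscript) {
  const recognition = new SpeechRecognitionCtor();
  recognition.lang = "en-US";          // assumed language
  recognition.interimResults = false;  // only deliver final results
  recognition.continuous = false;      // stop after one utterance

  recognition.onresult = (event) => {
    // Collect the top alternative of each final result in this event.
    let transcript = "";
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (result.isFinal) {
        transcript += result[0].transcript;
      }
    }
    if (transcript.trim() !== "") {
      onFinalTranscript(transcript.trim());
    }
  };

  return recognition;
}
```

In a browser, calling `start()` on the returned object would begin listening, and the callback could then forward the text to the backend.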

    OpenAI's GPT API

    Next, the transcribed text is sent to OpenAI's GPT API, which analyzes the question and generates a response. Note that I'm using the GPT-3.5 Turbo model.

    Chat Completion Stream

    But here's the exciting part: I'm using the chat completion stream option. What does that mean? Well, with the stream option turned on, the response is delivered in chunks or streams, instead of waiting for the whole response to get generated before sending it back to the client. This clever feature significantly reduces the waiting time for users, making the conversation with our AI Avatar more fluid and dynamic. As the model generates a part of the response, it starts sending it back to the client, allowing the avatar to begin its animation even before the full response is complete.
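
To make the streaming idea concrete, here is a minimal sketch of how a client might consume the stream. It assumes the response arrives as server-sent-event lines of the form `data: {...}` with the text in `choices[0].delta.content`, terminated by `data: [DONE]`. The helper names `parseStreamChunk` and `createSentenceBuffer` are hypothetical, and splitting on sentence boundaries before handing text to speech synthesis is one possible approach, not necessarily the one used in this project.

```javascript
// Extracts text deltas from raw server-sent-event text. Returns the deltas
// plus any unfinished trailing line, which the caller should prepend to the
// next network chunk.
function parseStreamChunk(buffer) {
  const lines = buffer.split("\n");
  const rest = lines.pop(); // last piece may be an incomplete line
  const deltas = [];
  let done = false;
  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank lines
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (delta) deltas.push(delta);
  }
  return { deltas, rest, done };
}

// Accumulates deltas and hands over complete sentences, so speech synthesis
// can start before the whole answer has arrived.
function createSentenceBuffer(onSentence) {
  let pending = "";
  return {
    push(delta) {
      pending += delta;
      let match;
      // Treat ., ! or ? followed by whitespace as the end of a sentence.
      while ((match = pending.match(/^(.*?[.!?])\s+/s)) !== null) {
        onSentence(match[1].trim());
        pending = pending.slice(match[0].length);
      }
    },
    // Call once the stream is done to emit whatever text is left.
    flush() {
      if (pending.trim() !== "") onSentence(pending.trim());
      pending = "";
    },
  };
}
```

Each sentence passed to `onSentence` could be sent on for speech synthesis right away, which is what lets the avatar start talking while the rest of the answer is still being generated.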

    Azure Cognitive Services API

    To make the conversation even more engaging, we then pass the response to the Azure Cognitive Services API. It analyzes the text, identifies the phonemes, generates viseme data, converts the text to speech, and sends it back to the frontend client.
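
The sketch below shows one way the frontend might turn that timing data into something an animation loop can query. It assumes each event carries an `audioOffset` in 100-nanosecond ticks and a numeric `visemeId`, which is how the Azure Speech SDK reports viseme events, and that ID 0 stands for silence. The function names are hypothetical, and the mapping from a viseme ID to a mouth shape on the 3D model depends on the model, so it is left out.

```javascript
const TICKS_PER_MS = 10000; // audio offsets are reported in 100 ns ticks

// Turns raw viseme events ({ audioOffset, visemeId }) into a timeline
// sorted by time in milliseconds.
function buildVisemeTimeline(events) {
  return events
    .map((e) => ({ timeMs: e.audioOffset / TICKS_PER_MS, visemeId: e.visemeId }))
    .sort((a, b) => a.timeMs - b.timeMs);
}

// Returns the viseme that should be showing at playback time `tMs`:
// the latest keyframe at or before tMs, or 0 (silence) before the first one.
function visemeAt(timeline, tMs) {
  let current = 0;
  for (const frame of timeline) {
    if (frame.timeMs > tMs) break;
    current = frame.visemeId;
  }
  return current;
}
```

On every rendered frame, the animation loop could call `visemeAt` with the audio element's current playback time and set the matching mouth shape on the avatar.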

    Continuous Conversation

    What's even more amazing is that the conversation is continuous. The app retains the context of the conversation, making it feel like you're talking to a real person. It's a fantastic way to explore the capabilities of AI and have a meaningful interaction.
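
A common way to get this behavior with a chat model is to resend the earlier messages with every new request, since the model itself keeps no memory between calls. The sketch below assumes that approach. The `createConversation` helper, the system prompt text, and the `maxMessages` cap are illustrative choices, not details taken from the project.

```javascript
// Keeps the running message list that is sent with every request.
// `maxMessages` caps how much history is resent (assumed value).
function createConversation(systemPrompt, maxMessages = 20) {
  const system = { role: "system", content: systemPrompt };
  const history = []; // user and assistant messages in order

  return {
    addUser(text) {
      history.push({ role: "user", content: text });
    },
    addAssistant(text) {
      history.push({ role: "assistant", content: text });
    },
    // Messages to send: the system prompt plus the most recent history.
    messages() {
      return [system, ...history.slice(-maxMessages)];
    },
  };
}
```

Because the earlier question and answer travel along with each request, a follow-up such as "Is he still playing?" can be resolved to the person named in a previous turn.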

    Check it out: the avatar on the screen animates its mouth movements in sync with the spoken response, giving it a realistic, lifelike feel.

    Example Interaction

    User: What are the rules of basketball?

    Avatar:

    The rules of basketball can vary slightly depending on the league or organization, but here are some of the basic rules:
    - **Team Composition:** A basketball game is played between two teams, with each team consisting of five players on the court at a time.
    - **Objective:** The objective of the game is to score points by shooting the ball into the opponent's basket while preventing the opposing team from scoring in your own basket.
    - **Dribbling:** The player in possession of the ball must continuously dribble the ball while moving. If they stop dribbling, it results in a violation called a travel or double dribble.
    - **Shooting:** Players can shoot the ball into the opponent's basket to score points. Different shots have different point values, such as two points for a regular field goal and three points for a shot made beyond the three-point line.
    - **Fouls:** Physical contact that hinders an opponent's progress or gives an unfair advantage is considered a foul. Common fouls include pushing, holding, or striking an opponent. A player who commits a foul may be penalized with free throws or the opposing team gaining possession of the ball.
    - **Time and Scoring:** A basketball game is divided into quarters or halves with a specific amount of time allotted for each period. The team with the most points at the end of the game wins. It's important to refer to the specific rules of the league or organization you are playing under for more detailed information.
    

    User: Who won the 1991 NBA Finals?

    Avatar: The Chicago Bulls won the 1991 NBA Finals, defeating the Los Angeles Lakers.

    User: Who was the MVP?

    Avatar: The MVP (Most Valuable Player) of the 1991 NBA Finals was Michael Jordan of the Chicago Bulls.

    User: Is he still playing?

    Avatar: No, Michael Jordan retired from professional basketball for the final time in 2003. He is currently the owner of the Charlotte Hornets NBA team.

    User: Does he have kids?

    Avatar: Yes, Michael Jordan has three children. He has two sons named Jeffrey and Marcus and a daughter named Jasmine.


    There you have it, our talking AI Avatar app is up and running! It's incredible how we can leverage these technologies to create interactive conversational experiences.

    Future Plans

    I have some exciting plans to enhance this talking AI Avatar app further. I'm going to add functionality to be able to change the avatar and the background UI. I'm also planning to port this to an iOS app in the future. So stay tuned for further updates and exciting additions.

    Don't forget to like, subscribe, and hit that notification bell to stay updated on more exciting content.

    Thanks for watching and happy coding!



    FAQ

    1. Q: What APIs are used to build the AI Avatar?

      • A: The project utilizes OpenAI's GPT API and Microsoft Azure’s Cognitive Services API.
    2. Q: What technologies are used for the frontend and backend implementation?

      • A: The frontend uses JavaScript and Three.js, and the backend runs on Node.js.
    3. Q: What model of GPT is being used in this project?

      • A: We are using the GPT-3.5 Turbo model.
    4. Q: How does the chat completion stream option improve performance?

      • A: The chat completion stream option delivers responses in chunks, making the avatar's responses more fluid and dynamic by starting the animation before the entire response is generated.
    5. Q: Can the app retain the context of a conversation?

      • A: Yes, the app retains the context, making interactions feel continuous and natural.
    6. Q: What are some future enhancements planned for the AI Avatar?

      • A: Future enhancements include changing the avatar and the background UI, and porting the app to iOS.
    7. Q: How does the avatar sync its mouth movements with the speech?

      • A: The Azure Cognitive Services API analyzes the text and returns viseme data along with the synthesized speech. The frontend uses that data to animate the avatar's mouth in sync with the audio.
