Use OpenAI's new Assistant API with your files
In this article, I explore how to use OpenAI's Assistant API to build an assistant that answers questions from files you provide. This step-by-step guide shows how I fed transcripts of my YouTube videos into an assistant. While the process is quick and relatively easy, there are some clear limitations to be aware of.
Getting Started
The first step in creating your custom assistant is to head to platform.openai.com, log in, and navigate to the Playground. Here, you will select 'Assistants' and press 'Create.'
- Naming your Assistant:
  - I named mine 'key codes GPT.'
- Setting the Prompt:
  - The system prompt shapes the assistant's behavior. My initial prompt was:
    You are key codes GPT, an assistant with knowledge about the YouTube channel key codes. - Rule 1: Only use the provided context (to limit hallucination). - Rule 2: Always address the user with "coder." - Rule 3: Always ask anybody to like and subscribe. - Rule 4: End any message with "have a lot of fun coders."
- Model Selection:
  - I selected the gpt-4-turbo-preview model and saved the settings.
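If you'd rather script this than click through the Playground, the same setup takes a few lines with the official openai Python package (1.x). A minimal sketch; the name, prompt, and model mirror the Playground settings above:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create the assistant programmatically, mirroring the Playground settings.
assistant = client.beta.assistants.create(
    name="key codes GPT",
    instructions=(
        "You are key codes GPT, an assistant with knowledge about the "
        "YouTube channel key codes. "
        "- Rule 1: Only use the provided context (to limit hallucination). "
        "- Rule 2: Always address the user with 'coder.' "
        "- Rule 3: Always ask anybody to like and subscribe. "
        "- Rule 4: End any message with 'have a lot of fun coders.'"
    ),
    model="gpt-4-turbo-preview",
)
print(assistant.id)  # keep this ID; later updates and runs need it
```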
Initial Testing
With the initial setup done, let's test the assistant:
- Hello Interaction:
  - "Hello coder! I'm an assistant here to provide information about the YouTube channel key codes. Don't forget to like and subscribe for more coding content. Have a lot of fun coders!"
- Query About Video Content:
  - When asked about specific video details, the assistant couldn't answer: it had no access to the actual video content yet.
Integrating Video Transcripts
To give the assistant accurate information, I extracted the audio from my videos and converted it into transcripts with the Whisper API, adding metadata such as the YouTube URL and video title. I then organized this data into different formats, namely plain text and JSON.
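The exact script isn't reproduced here, so below is a minimal sketch of the transcription step. It assumes the audio files have already been extracted into a local `audio/` folder and that titles and URLs are kept in a hand-maintained list (the entries shown are placeholders):

```python
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()

# Placeholder metadata; in practice this came from the channel's video list,
# paired with the audio files in a known order.
videos = [
    {"title": "Why you should use typing in Python",
     "url": "https://www.youtube.com/watch?v=..."},
]

for video, audio_path in zip(videos, sorted(Path("audio").glob("*.mp3"))):
    with audio_path.open("rb") as audio_file:
        # Whisper returns the full transcript as plain text.
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    video["transcript"] = transcription.text

# Persist everything as one JSON file; a plain-text export works the same way.
Path("transcripts.json").write_text(json.dumps(videos, indent=2))
```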
Uploading and Testing Files
Text File Upload:
- I uploaded a text file containing titles, URLs, and transcripts to activate the retrieval function.
- The assistant could now reference video content:
  "Can you tell me if key codes has a video about typing?"
  "Yes, key codes has a video titled 'Why you should use typing in Python.'"
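The upload can also be scripted. A minimal sketch against the original (v1) Assistants beta, where files were attached via `file_ids` together with the retrieval tool (the later v2 API replaced this with vector stores); the assistant ID is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# Upload the transcript file for use by assistants.
transcript_file = client.files.create(
    file=open("transcripts.txt", "rb"),
    purpose="assistants",
)

# Attach the file and enable retrieval so the assistant can search it.
client.beta.assistants.update(
    "asst_...",  # placeholder: your assistant's ID
    tools=[{"type": "retrieval"}],
    file_ids=[transcript_file.id],
)
```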
JSON File Upload:
- I tested both minified ('un-beautified') and pretty-printed JSON files, but both produced inaccurate responses and failed to retrieve working URLs reliably.
Managing Large Data Volumes
To enhance performance, I split transcript data into individual JSON files for each video and re-uploaded everything. This method addressed some of the token limit issues, but wasn't a perfect solution.
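Assuming the combined file is a JSON list with one object per video, as produced by the transcription sketch above, the split is straightforward:

```python
import json
from pathlib import Path

videos = json.loads(Path("transcripts.json").read_text())

out_dir = Path("per_video")
out_dir.mkdir(exist_ok=True)

for i, video in enumerate(videos):
    # One small file per video keeps each document well under the size limits
    # and gives retrieval more focused chunks to search.
    (out_dir / f"video_{i:03d}.json").write_text(json.dumps(video, indent=2))
```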
Optimizing Data Format
Switching from JSON to a simplified text format unexpectedly improved the results:
- Text Summary File:
  - I condensed each video into a two-sentence summary and collected all summaries in a single, clearly structured text file (sketched below).
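The article doesn't reproduce the exact layout, but the idea is one clearly delimited record per video; something along these lines (titles, URLs, and summaries are placeholders):

```text
Title: Why you should use typing in Python
URL: https://www.youtube.com/watch?v=...
Summary: <two sentences describing the video>
---
Title: <next video title>
URL: <next video URL>
Summary: <two sentences describing the video>
```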
Prompt Refinement
Refining the system prompt helped produce cleaner, more accurate responses:
- I updated the instructions so the model consults the appropriate file first when answering (illustrated below).
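The refined prompt isn't quoted verbatim in the article, so the wording of the new first rule below is my guess at the intent; updating an existing assistant's instructions through the API looks like this:

```python
from openai import OpenAI

client = OpenAI()

client.beta.assistants.update(
    "asst_...",  # placeholder: your assistant's ID
    instructions=(
        "You are key codes GPT, an assistant with knowledge about the "
        "YouTube channel key codes. "
        "- Rule 1: Answer from the attached summary file first; consult the "
        "transcript files only if the summary is not enough. "  # my guess
        "- Rule 2: Always address the user with 'coder.' "
        "- Rule 3: Always ask anybody to like and subscribe. "
        "- Rule 4: End any message with 'have a lot of fun coders.'"
    ),
)
```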
Final Testing
Testing different tasks, such as listing all videos, summarizing content, and linking to the correct video, now showed better results:
- Summary & Listing:
- Summaries and listings of video content were mostly accurate.
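To run such tests outside the Playground, you create a thread, post a user message, start a run, and poll until it completes. A minimal sketch with the v1 beta endpoints (the assistant ID is a placeholder):

```python
import time

from openai import OpenAI

client = OpenAI()
ASSISTANT_ID = "asst_..."  # placeholder: your assistant's ID

# A thread holds the conversation; runs execute the assistant on it.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="List all key codes videos with their URLs.",
)
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=ASSISTANT_ID,
)

# Runs are asynchronous; poll until the run leaves the active states.
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# Messages are returned newest first, so data[0] is the assistant's reply.
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```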
Conclusion
While the Assistant API is relatively easy to start with, fine-tuning it requires an understanding of prompt engineering and managing context limits. The results are promising yet highlight the model's limitations with data volume and format handling.
Keywords
- OpenAI Assistant API
- YouTube content
- Whisper API
- JSON files
- Python script
- Transcript files
- GPT-4 Turbo model
- Context limit
- Prompt engineering
FAQ
Q: What is the first step to use OpenAI's Assistant API?
A: Go to platform.openai.com, log in, and navigate to the Playground to create your assistant.

Q: Why couldn't the assistant initially provide video content?
A: The assistant had no access to the actual YouTube content and required uploaded files for accurate information.

Q: How were the video transcripts produced?
A: A Python script converted the videos' audio into text via the Whisper API and saved the transcripts together with metadata.

Q: What formats were tested for integrating data into the assistant?
A: Both plain text and JSON files were tested.

Q: What was a major challenge during integration?
A: Managing the token limit, which got in the way of summarizing and listing all video content.

Q: How was the limited performance addressed?
A: Splitting the transcripts into individual JSON files per video and refining the prompt brought moderate improvements.

Q: Does the assistant correctly retrieve video URLs?
A: Results were mixed, improving with proper text formatting and prompt adjustments.