ADVANCED Python AI Agent Tutorial - Using RAG
Introduction
Get ready, because this tutorial is going to get really cool really fast! I'll show you how to build an artificial intelligence (AI) agent that can use a set of tools we provide it with. We write the tools, hand them to the AI, and it automatically decides which one is best for each request. This is super slick, pretty easy to build, and whether you're a beginner or an intermediate programmer, you should be able to follow along.
Demo Overview
In this demo, you're going to see an AI agent that can answer questions about population and demographic data using something known as RAG (Retrieval-Augmented Generation). RAG involves providing extra data to the model so that it can reason over that data rather than relying only on its training data, which might be out of date.
You'll see interactions with a population CSV file and a PDF about Canada. The CSV contains structured data about population density and changes, while the PDF provides detailed information about Canada. The model can switch between these data sources or use both to answer questions, demonstrating its capability to dynamically select and use the correct data source.
Moreover, the agent can also take notes upon request, storing information in a notes.txt file, showcasing how it can interact with and utilize different tools provided.
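Before diving into LlamaIndex, the core RAG idea can be sketched in plain Python: retrieve the snippet most relevant to a query, then prepend it to the prompt the model sees. The documents and the word-overlap scoring below are made up for illustration; real systems use vector embeddings instead.

```python
# Minimal RAG sketch: retrieve the best-matching snippet by word overlap,
# then build an augmented prompt for the model. Illustrative only.
documents = [
    "Canada has a population of about 38 million people.",
    "The notes file stores short reminders for the user.",
]

def retrieve(query, docs):
    # Score each document by how many query words it contains
    words = set(query.lower().split())
    return max(docs, key=lambda d: len(words & set(d.lower().split())))

def build_prompt(query, docs):
    # Augment the prompt with the retrieved context
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}"

prompt = build_prompt("What is the population of Canada?", documents)
print(prompt)
```

The model then answers from the supplied context instead of its (possibly stale) training data; that is all "retrieval-augmented" means.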
Llama Index Introduction
To achieve this, we'll use Llama Index, a free open-source package that allows ingestion of various data types, whether structured, unstructured, or semi-structured. Llama Index helps in creating an interface where we can easily query the data, thus extending the capabilities of our AI agent.
Setting Up
1. Create a Virtual Environment:
python3 -m venv ai
2. Activate the Environment:
For macOS/Linux:
source ai/bin/activate
For Windows (in PowerShell):
.\ai\Scripts\Activate.ps1
3. Install Required Packages:
pip install llama-index pypdf pandas python-dotenv
4. Download Data:
- Population CSV: Kaggle Data
- Canada PDF: Wikipedia PDF download tool
5. Place the data files in a directory named data, and create a notes.txt file there for storing notes. Also create a .env file to store your OpenAI API key:
OPENAI_API_KEY=your_api_key_here
Building the Agent
We'll divide the functionality into separate components to make it easier to manage and understand.
1. Load and Query CSV Data
- Main File Setup (main.py):

from dotenv import load_dotenv
import os
import pandas as pd
from llama_index.query_engine import PandasQueryEngine

# Load the OpenAI API key from the .env file
load_dotenv()

# Read the population CSV into a pandas DataFrame
population_path = os.path.join("data", "population.csv")
population_df = pd.read_csv(population_path)

# Query engine that turns natural-language questions into pandas code
population_query_engine = PandasQueryEngine(df=population_df, verbose=True)
- Prompt Template (prompts.py):

from llama_index import PromptTemplate

instruction_str = "Convert the query to executable Python code using pandas..."
# PromptTemplate takes the template string as its first positional argument
new_prompt = PromptTemplate(instruction_str)
2. Note Taking Functionality
- Note Engine (note_engine.py):

from llama_index.tools import FunctionTool

def save_note_to_file(note):
    # Append the note text to data/notes.txt
    with open("data/notes.txt", "a") as f:
        f.write(f"{note}\n")
    return "Note saved!"

note_saver = FunctionTool.from_defaults(
    fn=save_note_to_file,
    name="note_saver",
    description="This tool can save a text-based note to a file for the user.",
)
3. Read Unstructured PDF Data
- PDF Handling (pdf.py):

import os
from llama_index import StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.readers import PDFReader

def get_index(documents, index_name):
    # Build and persist the index on the first run; reload it afterwards
    if not os.path.exists(index_name):
        index = VectorStoreIndex.from_documents(documents)
        index.storage_context.persist(persist_dir=index_name)
    else:
        index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=index_name)
        )
    return index

pdf_path = os.path.join("data", "canada.pdf")
canada_docs = PDFReader().load_data(file=pdf_path)
canada_index = get_index(canada_docs, "canada")
4. Combining Everything and Creating the Agent
- Main File Update:
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.agent import ReActAgent
from llama_index.llms import OpenAI
from note_engine import note_saver
from pdf import canada_index

# note_saver is already a FunctionTool; wrap the two query engines as tools
tools = [
    note_saver,
    QueryEngineTool(
        query_engine=population_query_engine,
        metadata=ToolMetadata(
            name="population_data",
            description="Queries population and demographic data.",
        ),
    ),
    QueryEngineTool(
        query_engine=canada_index.as_query_engine(),
        metadata=ToolMetadata(
            name="canada_data",
            description="Provides detailed information about Canada.",
        ),
    ),
]

agent_llm = OpenAI(model="gpt-3.5-turbo")
ai_agent = ReActAgent.from_tools(tools, llm=agent_llm, verbose=True)

while True:
    u_input = input("Enter your question (or 'q' to quit): ")
    if u_input.lower() == "q":
        break
    response = ai_agent.query(u_input)
    print(response)
Conclusion
This step-by-step guide shows you how to build a powerful AI agent capable of employing different tools to provide accurate and relevant responses. By leveraging Llama Index, you can manage both structured and unstructured data easily, and even extend the functionality of your agent by defining custom tools.
Keywords
- AI Agent
- Llama Index
- Query Engine
- PDF Reader
- Vector Store Index
- Python
- OpenAI
FAQ
1. What is Retrieval-Augmented Generation (RAG)?
RAG involves providing a model with extra data so it can generate reasoned outputs based on that data, rather than relying solely on its training data.
2. How does the AI agent decide which tool to use?
The AI agent automatically decides the best tool based on the context and the query provided by the user, leveraging the integrated tools defined in the system.
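A toy, keyword-based stand-in can illustrate the idea (this is not LlamaIndex's actual mechanism, which delegates the choice to the LLM via a ReAct prompt; the tool names and descriptions below are just the ones from this tutorial):

```python
# Toy tool selection: pick the tool whose description shares the most words
# with the user's query. A real ReAct agent asks the LLM to choose a tool
# by reasoning over these same names and descriptions.
tools = {
    "note_saver": "save a text based note to a file",
    "population_data": "world population and demographic data",
    "canada_data": "detailed information about canada",
}

def pick_tool(query):
    words = set(query.lower().split())
    return max(tools, key=lambda name: len(words & set(tools[name].split())))

print(pick_tool("what is the population of india"))  # population_data
print(pick_tool("save a note"))                      # note_saver
```

This is why clear, distinct tool descriptions matter: they are the only signal the agent has when deciding which tool to invoke.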
3. Can the AI agent save notes?
Yes, you can ask the AI agent to save notes. It utilizes a specific function designated for note-taking.
4. What is a Vector Store Index?
A Vector Store Index is used to handle unstructured data; it creates multi-dimensional embeddings of the data objects, enabling quick and efficient querying and retrieval based on semantic similarity.
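The similarity search behind a vector index can be sketched with cosine similarity over toy embeddings. The 3-dimensional vectors below are made up for illustration; real indexes use model-generated embeddings with hundreds or thousands of dimensions.

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 = same direction, 0.0 = unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for two documents and a query
doc_canada = [0.9, 0.1, 0.2]
doc_notes = [0.1, 0.8, 0.3]
query = [0.85, 0.15, 0.25]

# The query vector points in nearly the same direction as the Canada document
best = max([("canada", doc_canada), ("notes", doc_notes)],
           key=lambda item: cosine(query, item[1]))
print(best[0])  # canada
```

A vector index stores many such embeddings and, given a query embedding, returns the nearest ones, which are then fed to the model as context.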
5. How can I include more data sources?
You can include more data sources by creating new query engines or defining specific readers, and then adding them as tools to the agent using Llama Index.
6. Do I need an OpenAI API key for this tutorial?
Yes, you need to obtain an API key from OpenAI to utilize the models for generating and querying data.