Chat with Multiple PDFs | LangChain App Tutorial in Python (Free LLMs and Embeddings)
Good morning, everyone, and welcome to this new video tutorial, where I'll show you how to build a chatbot that lets you chat with multiple PDFs from your computer at once. Let's dive in.
How It Works
The finished application lets users upload several PDFs, process them, and then ask questions about their contents. For this example, I uploaded the Constitution and the Bill of Rights. During processing, the text of the documents is embedded and stored in a vector store, so users can ask questions such as "What are the three branches of the United States government?" and get answers based on the uploaded PDFs.
Setting Up the Environment
Creating a Virtual Environment:
python -m venv myenv
source myenv/bin/activate
Installing Dependencies:
pip install streamlit
pip install pypdf2
pip install langchain
pip install python-dotenv
pip install faiss-cpu
pip install openai
pip install huggingface_hub
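python-dotenv is in the dependency list, presumably so that API keys can live in a `.env` file rather than in the source code (its `load_dotenv()` function copies the entries of that file into the process environment). The sketch below uses only the standard library and shows one way to read the key once it is in the environment. The helper name `get_api_key` and the variable name `OPENAI_API_KEY` are assumptions for illustration, not something the tutorial defines.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None, name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a readable message."""
    # Default to the real process environment; a dict can be passed in for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to a .env file")
    return key


# Example with an explicit mapping instead of the real environment
demo_key = get_api_key({"OPENAI_API_KEY": " sk-demo "})
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later from inside the chain.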
Graphical User Interface (GUI)
To build the GUI, we make use of Streamlit, a powerful tool for creating web apps in Python.
Setting Page Configuration:
import streamlit as st

st.set_page_config(page_title="Chat with Multiple PDFs", page_icon=":books:")
Adding a Header and Sidebar:
st.header("Chat with Multiple PDFs :books:")
query = st.text_input("Ask a question about your documents here")

with st.sidebar:
    st.subheader("Your Documents")
    pdf_docs = st.file_uploader("Upload your PDFs here and click on 'Process'", accept_multiple_files=True)
    process = st.button("Process")
Backend Logic
Processing PDF Documents
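from PyPDF2 import PdfReader

def get_pdf_text(pdf_docs):
    # Concatenate the extracted text of every page of every uploaded PDF
    text = ""
    for pdf in pdf_docs:
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            # Guard against pages that yield no text (for example, scanned images)
            text += page.extract_text() or ""
    return text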
Splitting Text into Chunks
from langchain.text_splitter import CharacterTextSplitter

def get_text_chunks(text):
    text_splitter = CharacterTextSplitter(separator="\n", chunk_size=1000, chunk_overlap=200, length_function=len)
    chunks = text_splitter.split_text(text)
    return chunks
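To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch in plain Python. It is not how `CharacterTextSplitter` works internally (that class first splits on the separator and then merges the pieces); it only illustrates the idea that each chunk repeats the tail of the previous one, so a sentence cut at a boundary still appears whole in one of the chunks. The function name `sliding_chunks` is hypothetical.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Cut text into fixed-size windows; each window repeats the end of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Small numbers so the overlap is easy to see
demo_chunks = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
# demo_chunks == ["abcdefghij", "ghijklmnop", "mnopqrstuv", "stuvwxyz"]
```

A larger overlap gives the retriever more context around chunk boundaries, at the cost of embedding more text overall.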
Creating Vector Store with OpenAI Embeddings
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

def get_vector_store(chunks):
    embeddings = OpenAIEmbeddings()
    vector_store = FAISS.from_texts(chunks, embeddings)
    return vector_store
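Conceptually, a vector store keeps one vector per chunk and answers a query by returning the chunks whose vectors are closest to the query's vector. The toy sketch below shows that idea with the standard library only. It is an illustration under loud assumptions: word counts stand in for real embeddings, cosine similarity stands in for whatever metric the FAISS index uses, and `ToyVectorStore` is a hypothetical class, not part of LangChain or FAISS.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: word counts (real embeddings are dense float vectors from a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, chunks: List[str]) -> None:
        # Embed every chunk once, up front, as the processing step does
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query: str, k: int = 1) -> List[str]:
        # Embed the query the same way, then rank chunks by similarity
        q = embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(q, entry[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


store = ToyVectorStore([
    "The legislative branch makes the laws.",
    "Streamlit turns Python scripts into web apps.",
    "The judicial branch interprets the laws.",
])
top = store.search("Which branch interprets laws?", k=1)
```

The retrieved chunks are what the language model later receives as context for its answer.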
Creating a Conversational Chain
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

def create_conversation_chain(vector_store):
    # Keep the running chat history so follow-up questions have context
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    # gpt-3.5-turbo is a chat model, so it is loaded through ChatOpenAI
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", openai_api_key="YOUR_OPENAI_API_KEY")
    conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=vector_store.as_retriever(), memory=memory)
    return conversation_chain
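The memory object is what lets the chain handle follow-up questions: every turn is appended to a history that is made available on the next call. The sketch below mimics that behavior with plain Python. `BufferMemory`, `ask`, and `fake_llm` are hypothetical stand-ins written for this illustration; they are not LangChain classes and they skip the retrieval step entirely.

```python
from typing import Callable, Dict, List


class BufferMemory:
    """Stand-in for a conversation buffer: it keeps every message in order."""

    def __init__(self) -> None:
        self.messages: List[Dict[str, str]] = []

    def add_turn(self, question: str, answer: str) -> None:
        self.messages.append({"role": "human", "content": question})
        self.messages.append({"role": "ai", "content": answer})


def ask(question: str, memory: BufferMemory, answer_fn: Callable[[str, List[Dict[str, str]]], str]) -> Dict:
    """Answer with access to the earlier messages, then record the new turn."""
    answer = answer_fn(question, list(memory.messages))
    memory.add_turn(question, answer)
    return {"question": question, "answer": answer, "chat_history": list(memory.messages)}


def fake_llm(question: str, history: List[Dict[str, str]]) -> str:
    # Placeholder for a real model call; it only reports how much context it received
    return f"(answer using {len(history)} earlier messages)"


memory = BufferMemory()
first = ask("What are the three branches?", memory, fake_llm)
second = ask("Which one makes the laws?", memory, fake_llm)
```

After two questions the history holds four messages, alternating between the user and the bot, which is the shape the display code relies on.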
Running the Application
if __name__ == "__main__":
    if process:
        with st.spinner("Processing"):
            raw_text = get_pdf_text(pdf_docs)
            chunked_text = get_text_chunks(raw_text)
            vector_store = get_vector_store(chunked_text)
            conversation = create_conversation_chain(vector_store)
            # session_state survives Streamlit's reruns of the script
            st.session_state.conversation = conversation
    # Only answer once the documents have been processed
    if query and "conversation" in st.session_state:
        user_message = query
        response = st.session_state.conversation({'question': user_message})
        st.write(response['chat_history'])
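The reason the chain is stored in `st.session_state` is that Streamlit reruns the whole script from top to bottom on every interaction, so ordinary variables are recreated each time. The following sketch simulates two reruns with a plain dictionary playing the role of the session state. The function `run_script` and its return strings are made up for this illustration and do not call Streamlit.

```python
from typing import Any, Dict


def run_script(session_state: Dict[str, Any], process_clicked: bool, query: str) -> str:
    """One simulated rerun: local variables start fresh, session_state carries over."""
    conversation = None  # a plain local like this is lost when the script reruns
    if process_clicked:
        conversation = "chain built from uploaded PDFs"
        session_state["conversation"] = conversation
    if query:
        if "conversation" not in session_state:
            return "Please process your documents first."
        return f"answering {query!r} with {session_state['conversation']}"
    return "idle"


state: Dict[str, Any] = {}
r1 = run_script(state, process_clicked=True, query="")                     # user clicks Process
r2 = run_script(state, process_clicked=False, query="What is Article I?")  # later rerun with a question
```

On the second rerun the local `conversation` is `None` again, but the copy saved in the state dictionary is still there to answer the question.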
Displaying Chat Messages with HTML Templates
import streamlit as st
CSS = """
<style>
/* Your CSS code */
</style>
"""
USER_TEMPLATE = """
<div class="chat_message">
<div class="user">
<img src="https://user_image_url.com" alt="User">
(message)
</div>
</div>
"""
BOT_TEMPLATE = """
<div class="chat_message">
<div class="bot">
<img src="https://bot_image_url.com" alt="Bot">
(message)
</div>
</div>
"""
st.write(CSS, unsafe_allow_html=True)
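# user_message is the question typed by the user; bot_response is the chain's reply (response['answer'])
st.write(USER_TEMPLATE.replace("(message)", user_message), unsafe_allow_html=True)
st.write(BOT_TEMPLATE.replace("(message)", bot_response), unsafe_allow_html=True)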
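To show a whole conversation rather than a single exchange, one option is to loop over the history and alternate between the two templates. The sketch below assumes the history is a flat list of strings in which even positions are user messages and odd positions are bot replies (with LangChain message objects you would read each message's `content` attribute instead). The shortened templates and the `render_history` helper are illustrative, and escaping with `html.escape` is an added precaution, not something the tutorial does.

```python
import html
from typing import List

# Shortened versions of the templates, without the avatar images
USER_TEMPLATE = '<div class="chat_message"><div class="user">(message)</div></div>'
BOT_TEMPLATE = '<div class="chat_message"><div class="bot">(message)</div></div>'


def render_history(messages: List[str]) -> List[str]:
    """Return one HTML snippet per message, alternating user and bot templates."""
    rendered = []
    for i, message in enumerate(messages):
        template = USER_TEMPLATE if i % 2 == 0 else BOT_TEMPLATE
        # Escape the text so characters like < or & are shown literally, not parsed as HTML
        rendered.append(template.replace("(message)", html.escape(message)))
    return rendered


blocks = render_history(["Is 1 < 2?", "Yes, it is."])
```

Each snippet in `blocks` could then be passed to `st.write(..., unsafe_allow_html=True)` in order.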
Conclusion
Congratulations on following along to the end! You've successfully built a sophisticated chatbot that can manage multiple PDF documents and provide intelligent responses based on their contents.
Don't forget to subscribe and leave any questions in the comments.
Keywords
- Chatbot
- LangChain
- Streamlit
- Python
- OpenAI
- HuggingFace
- Vector Store
- Embeddings
- Conversational Chain
FAQ
What libraries do I need to install for this project?
- You will need streamlit, pypdf2, langchain, python-dotenv, faiss-cpu, openai, and huggingface_hub.
How do I split the text into manageable chunks?
- Use the CharacterTextSplitter class from LangChain.
Which OpenAI function creates the embeddings for vector storage?
- Use OpenAIEmbeddings from LangChain for embedding the chunks of text.
Can I use HuggingFace models instead of OpenAI?
- Yes, you can use HuggingFace models like google/flan-t5-base and integrate them in a similar way.
How do you make variables persistent in Streamlit?
- Use st.session_state to keep variables persistent throughout the session.
Is there a way to use free models for this project?
- Yes, you can use one of HuggingFace's Instructor embedding models to create embeddings for free.
What if my embeddings process is too slow?
- Consider running the embedding model on a GPU, or use a cloud-hosted embedding service such as the OpenAI or HuggingFace API.