    Langchain: PDF Chat App (GUI) | ChatGPT for Your PDF FILES | Step-by-Step Tutorial

    Introduction

    In today's tutorial, we're going to create a PDF chat application that allows users to upload a PDF document and interactively query its contents. By the end of this tutorial, you'll have the foundational knowledge to develop your own application similar to popular platforms like ChatPDF.

    Overview of the Application Architecture

    Our application combines a front-end built using Streamlit and a back-end powered by OpenAI's API. The following is a breakdown of the architecture:

    • Uploading PDF Files: Users can drag and drop their PDF documents into the interface.
    • Processing PDF Content: The application reads the PDF, extracts its text, and splits it into smaller chunks so that each piece fits comfortably within the language model's context window.
    • Creating Embeddings: Each chunk is converted into a numerical vector (an embedding) so that chunks can be compared by meaning rather than by exact wording.
    • Querying: When the user asks a question, the application finds the chunks whose embeddings are most similar to the question's embedding and passes them, together with the question, to the language model to generate an answer.

    We'll be using several libraries in this project: Streamlit for the user interface, PyPDF2 for reading PDF files, and Langchain (with a FAISS vector store and OpenAI models) for splitting text, managing embeddings, and querying.

    Setting Up the Environment

    1. Python Environment: Create a dedicated virtual environment so this project's dependencies stay isolated. In your terminal, run:
      conda create -n pdf_chat python=3.8
      conda activate pdf_chat
      pip install -r requirements.txt
      
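    2. Installing Required Libraries: The requirements.txt file should list streamlit, PyPDF2, langchain, and openai, plus python-dotenv (for load_dotenv), faiss-cpu (required by the FAISS vector store), and tiktoken (used by OpenAIEmbeddings).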
    3. Loading Environment Variables: Store your OpenAI API key in a .env file, and load it using dotenv.
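
    A missing or misspelled key usually only shows up later, as an authentication error on the first API call. The OpenAI client and LangChain's OpenAI classes read the key from the OPENAI_API_KEY environment variable by default, so a small check right after load_dotenv() can fail early with a clearer message. This is a minimal sketch: require_openai_key is a hypothetical helper (not part of python-dotenv or LangChain), and the .env file is assumed to contain a line such as OPENAI_API_KEY=your-key-here.

```python
import os
from typing import Mapping


def require_openai_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key, or raise a readable error if it is missing.

    require_openai_key is a hypothetical helper, not part of python-dotenv or LangChain.
    """
    # Treat an empty or whitespace-only value the same as a missing one
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add a line like OPENAI_API_KEY=your-key-here "
            "to your .env file and call load_dotenv() before this check."
        )
    return key
```

    In the app, call require_openai_key() on the line after load_dotenv(). Because the helper accepts any mapping, passing a plain dict makes it easy to test without touching the real environment.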

    Building the Application with Streamlit

    Step 1: Setting Up the User Interface

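    We'll start by importing the libraries and setting up the skeleton of the Streamlit app. Save the script as, for example, app.py and launch it with streamlit run app.py. The imports follow the classic LangChain package layout (langchain.embeddings, langchain.vectorstores, and so on); recent LangChain releases move these classes into the separate langchain_community and langchain_openai packages, so either pin an older langchain version or adjust the import paths.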

    import streamlit as st
    import PyPDF2
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.llms import OpenAI
    from dotenv import load_dotenv
    
    # Load OPENAI_API_KEY from the .env file into the process environment
    load_dotenv()
    
    def main():
        st.title("PDF Chat App")
        st.sidebar.header("Upload Your PDF")
        # The code from the following steps goes inside main()
    
    if __name__ == "__main__":
        main()
    

    Step 2: Uploading the PDF

    Streamlit allows us to create a file uploader where users can drag and drop their PDFs. We read and extract the text from each page of the PDF:

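    pdf = st.file_uploader("Upload your PDF", type="pdf")
    if pdf is not None:
        # PdfReader replaces PdfFileReader/getNumPages/getPage, which no longer work in PyPDF2 3.x
        pdf_reader = PyPDF2.PdfReader(pdf)
        text = ""
        for page in pdf_reader.pages:
            # Image-only pages yield no text; the newline keeps words on adjacent pages apart
            text += (page.extract_text() or "") + "\n"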
    

    Step 3: Splitting Text into Chunks

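    Next, we split the extracted text into overlapping chunks. Smaller chunks keep each piece well within the LLM's context window and make retrieval more precise, and the 200-character overlap means text near a chunk boundary also appears, with its surrounding context, in the neighbouring chunk. Both sizes are measured in characters by default.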

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = text_splitter.split_text(text)
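
    To make chunk_size and chunk_overlap concrete, the plain-Python sketch below cuts text into fixed-size character windows that overlap. It is a simplification, not LangChain's algorithm: RecursiveCharacterTextSplitter first tries to break on paragraph breaks, then line breaks, then spaces, so its chunk boundaries and chunk count will differ. The function name sliding_window_chunks is hypothetical.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

    With the settings above, a 10,000-character document yields 13 windows in this sketch, each starting 800 characters (chunk_size minus chunk_overlap) after the previous one; the last window holds only the final 400 characters.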
    

    Step 4: Creating Vector Store

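    Next, we convert each chunk into an embedding with OpenAI's embeddings API and store the vectors in an in-memory FAISS index. Note that Streamlit reruns the whole script on every interaction, so this step calls the API again each time (for example, whenever a question is submitted) unless you cache the result, e.g. with st.cache_resource, or save the index to disk: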

    embeddings = OpenAIEmbeddings()
    vector_store = FAISS.from_texts(chunks, embeddings)
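
    An embedding model maps each chunk to a fixed-length list of numbers. The toy function below does the same with word counts over a small fixed vocabulary, purely to show the shape of the data a vector store holds. It is a stand-in under loud assumptions: real OpenAI embeddings have far more dimensions and capture meaning, so that "car" and "automobile" land close together, which word counts cannot do. The toy_embed function, the vocabulary, and the sample chunks are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Sequence


def toy_embed(text: str, vocabulary: Sequence[str]) -> List[float]:
    """Map text to a unit-length vector of word counts (illustration only)."""
    # Count lowercase word tokens in the text
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # One dimension per vocabulary word
    vector = [float(counts[word]) for word in vocabulary]
    # Normalise to length 1 so dot products between vectors equal cosine similarity
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


vocabulary = ["invoice", "payment", "refund", "shipping"]
chunks = [
    "Payment is due 30 days after the invoice date.",
    "A refund is issued to the original payment method.",
    "Shipping takes three to five business days.",
]
# Each chunk becomes one vector, stored alongside the text it came from
store = [(toy_embed(chunk, vocabulary), chunk) for chunk in chunks]
```

    The first two chunks share the word "payment", so their vectors have a dot product of 0.5, while the shipping chunk shares no vocabulary word with either and scores 0. Keeping each vector paired with its source text is essentially what LangChain's FAISS wrapper does: an index of vectors plus a lookup from each vector back to its chunk.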
    

    Step 5: Query Input by Users

    Once the index exists, we read the user's question and retrieve the chunks most relevant to it:

    query = st.text_input("Ask a question about your document:")
    if query:
        docs = vector_store.similarity_search(query)
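
    Under the hood, similarity_search embeds the question with the same embedding model and returns the stored chunks whose vectors lie closest to it (four by default; pass k to change this). The sketch below performs that nearest-neighbour step by brute force with cosine similarity over hand-written 3-dimensional vectors that stand in for real embeddings. It is only an illustration: FAISS does this search far more efficiently, and LangChain's FAISS wrapper ranks by Euclidean distance by default, which gives the same order as cosine similarity when the vectors have unit length, as OpenAI embeddings do. The names cosine and top_k_similar are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query_vec: Sequence[float],
    store: Sequence[Tuple[Sequence[float], str]],
    k: int = 4,
) -> List[str]:
    """Return the k stored texts most similar to the query vector (brute force)."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


# Hand-written 3-dimensional vectors standing in for real embeddings
store = [
    ([0.9, 0.1, 0.0], "Chunk about payment terms"),
    ([0.1, 0.9, 0.0], "Chunk about refunds"),
    ([0.0, 0.2, 0.9], "Chunk about shipping"),
]
query_vec = [0.8, 0.3, 0.0]  # pretend embedding of "When do I have to pay?"
print(top_k_similar(query_vec, store, k=2))
# ['Chunk about payment terms', 'Chunk about refunds']
```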
    

    Step 6: Interacting with the Language Model

    Finally, we pass the retrieved chunks and the question to an OpenAI chat model through LangChain's question-answering chain and display the generated answer:

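    from langchain.chat_models import ChatOpenAI
    from langchain.chains.question_answering import load_qa_chain
    
    llm = ChatOpenAI(model_name="gpt-3.5-turbo")  # chat model, so ChatOpenAI rather than OpenAI
    chain = load_qa_chain(llm, chain_type="stuff")  # "stuff" puts all chunks into one prompt
    response = chain.run(input_documents=docs, question=query)  # the key is input_documents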
    st.write(response)
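
    The "stuff" chain type is the simplest way to combine retrieved chunks: it concatenates all of them into one prompt together with the question and makes a single model call. The sketch below builds such a prompt by hand to show the idea. The template wording and the build_stuff_prompt name are hypothetical, and LangChain's built-in prompt may be worded differently. Because everything goes into one prompt, the retrieved chunks plus the question must fit in the model's context window, which is one reason for the chunking in Step 3.

```python
from typing import Sequence

# Hypothetical prompt wording; LangChain's built-in "stuff" QA prompt may differ
TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(chunks: Sequence[str], question: str, separator: str = "\n\n") -> str:
    """Place every retrieved chunk into one prompt, as the "stuff" strategy does."""
    context = separator.join(chunk.strip() for chunk in chunks)
    return TEMPLATE.format(context=context, question=question.strip())


prompt = build_stuff_prompt(
    ["Payment is due 30 days after the invoice date.", "Invoices are sent by email."],
    "When is payment due?",
)
print(prompt)
```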
    

    Conclusion

    That's it! You've built a PDF chat application. By combining Streamlit, OpenAI's API, and Langchain, you can now chat with your documents. To take it further, try different embedding or chat models, tune the chunk size and overlap, or polish the look of your Streamlit app.


    Keywords

    • Langchain
    • PDF Chat App
    • Streamlit
    • OpenAI API
    • PDF Processing
    • Text Embeddings
    • Semantic Search

    FAQ

    • Q: What is Langchain?
      A: Langchain is a library designed to help developers create applications that can interact with language models, providing tools for embeddings, vector storage, and chains.

    • Q: How can I upload my PDF in the app?
      A: You can simply drag and drop your PDF file into the designated area in the app's user interface.

    • Q: What is the significance of chunking text?
      A: Chunking text helps manage the context window of LLMs since they have limits on the amount of data they can process in one go.

    • Q: How do I obtain the OpenAI API key?
      A: You can create an API key by signing up on the OpenAI platform and accessing your account settings.

    • Q: What does the embedding process involve?
      A: Embedding is the transformation of text into numerical vectors, allowing the application to perform semantic searches to find relevant information.
