How to Build Knowledge Graphs With LLMs (Python Tutorial)

Introduction

In this tutorial, we build knowledge graphs using large language models (LLMs) and a graph database. We use a language model to identify entities and relationships in unstructured data and turn them into a knowledge graph, then show how to query the graph database through a language model chat interface. This approach offers several advantages over the vector search methods commonly used in similar applications.

Overview

In the previous video, we showed a demo of how graph databases and large language models can work together. A language model identified entities and relationships in unstructured data, and those entities and relationships were used to generate a knowledge graph in a graph database. The video received positive feedback, and viewers asked for more detail about the code behind the demo. In response, we created a more detailed video that walks through the code step by step and explains how it works.

Environment Setup

Before diving into the code, let's briefly cover the environment used in this tutorial. We work in a Jupyter notebook, which lets us run the code step by step with annotations. The data used to generate the knowledge graph is stored in JSON and Markdown files. You will also need an API key for the OpenAI API and credentials for the Neo4j graph database. Install the Python libraries listed in the requirements file before running the notebook.
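
As a minimal sketch of the credentials step, the function below reads the settings from environment variables and fails early if any are missing. The variable names are assumptions chosen for illustration, not names taken from the original notebook.

```python
import os

# Hypothetical variable names; adjust them to match your own setup.
REQUIRED_VARS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling load_settings() at the top of the notebook surfaces a missing key before any API call is made.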

Extracting Entities and Relationships

Our first step is to extract entities and relationships from the data using the language model. We have a function called "extract_entities_relationships" that takes a folder of files and a prompt template as inputs. This function iterates through each file, applies the prompt template, and calls the OpenAI API to generate the entities and relationships. The results are stored in a JSON object. We provide prompt templates for different types of files, such as project briefs, people profiles, and Slack messages. These templates define the entities, relationships, and desired output structure for each file type.
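
The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's actual code: the third parameter, complete, is a hypothetical callable that sends a prompt to the model and returns the reply text (it stands in for the OpenAI API call), and the $ctext placeholder in the template is likewise an assumed convention.

```python
import json
from pathlib import Path
from string import Template


def extract_entities_relationships(folder, prompt_template, complete):
    """Apply the prompt to every file in `folder` and collect the parsed replies.

    `complete` is a hypothetical callable (prompt text -> reply text) that
    stands in for the OpenAI API call.
    """
    results = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        # Assumed convention: the template marks the insertion point with $ctext.
        prompt = Template(prompt_template).safe_substitute(ctext=text)
        reply = complete(prompt)
        try:
            results.append(json.loads(reply))
        except json.JSONDecodeError:
            # Model replies are not guaranteed to be valid JSON; skip bad ones.
            print(f"Skipping {path.name}: reply was not valid JSON")
    return results
```

Passing the model call in as a parameter keeps the loop easy to test with a stub and independent of any particular client library version.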

Generating Cypher Statements

Once the entities and relationships are extracted, we can generate the Cypher statements that create nodes and relationships in the Neo4j graph database. (Cypher is Neo4j's query language.) We have a function called "generate_cipher" that takes the JSON object as input and constructs one statement for each node and each relationship it contains. The resulting statements are written to a text file, from which they can be executed against the Neo4j database.
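
To make the idea concrete, here is a minimal sketch of turning one extracted JSON object into Cypher statements. The input shape (an "entities" list with "label" and "id" keys, and a "relationships" list with "source", "type", and "target" keys) and the function name build_cypher_statements are assumptions for illustration; the original notebook's format may differ. MERGE is used instead of CREATE so that running the statements twice does not duplicate nodes.

```python
import json
import re

# Labels, relationship types, and property keys are restricted to plain identifiers.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def cypher_literal(value):
    """Quote a value as a Cypher literal (assumes JSON-style escaping suffices)."""
    return json.dumps(value)


def check_identifier(name):
    if not IDENTIFIER.match(name):
        raise ValueError(f"Unsafe identifier: {name!r}")
    return name


def build_cypher_statements(data):
    """Return MERGE statements for the nodes and relationships in `data`."""
    statements = []
    for entity in data.get("entities", []):
        label = check_identifier(entity["label"])
        node_id = cypher_literal(entity["id"])
        statement = f"MERGE (n:{label} {{id: {node_id}}})"
        props = {k: v for k, v in entity.items() if k not in ("label", "id")}
        if props:
            assignments = ", ".join(
                f"n.{check_identifier(k)} = {cypher_literal(v)}" for k, v in props.items()
            )
            statement += f" SET {assignments}"
        statements.append(statement)
    for rel in data.get("relationships", []):
        rel_type = check_identifier(rel["type"])
        source = cypher_literal(rel["source"])
        target = cypher_literal(rel["target"])
        statements.append(
            f"MATCH (a {{id: {source}}}), (b {{id: {target}}}) "
            f"MERGE (a)-[:{rel_type}]->(b)"
        )
    return statements
```

Validating identifiers matters here because labels and relationship types come from model output and are interpolated directly into the query text.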

Executing Cypher Statements

In the final step, a pipeline function orchestrates the entire process. It takes a list of folders and their corresponding prompt templates as input, then runs the steps in sequence: extracting entities and relationships, generating Cypher statements, and executing those statements against the Neo4j database. The result is a knowledge graph built from the unstructured data.
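
The orchestration can be sketched as a small function that wires the three steps together. All names here are hypothetical: extract, generate, and execute are callables supplied by the caller, so the sketch makes no claim about how the original notebook structures its pipeline.

```python
def run_pipeline(sources, extract, generate, execute):
    """Run extract -> generate -> execute for each (folder, prompt_template) pair.

    extract(folder, prompt_template) yields extracted documents,
    generate(document) yields Cypher statements, and
    execute(statement) runs one statement against the database.
    Returns the number of statements executed successfully.
    """
    executed = 0
    for folder, prompt_template in sources:
        for document in extract(folder, prompt_template):
            for statement in generate(document):
                try:
                    execute(statement)
                    executed += 1
                except Exception as exc:
                    # One failing statement should not abort the whole run.
                    print(f"Failed: {statement} ({exc})")
    return executed
```

With the official neo4j Python driver, execute could be a thin wrapper around session.run(statement), where the session comes from GraphDatabase.driver(uri, auth=(user, password)).session().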

Keywords

Knowledge Graphs
Large Language Models
Graph Databases
Entities
Relationships
Unstructured Data
Neo4j
Cypher Statements
OpenAI API
Jupyter Notebook

FAQ

What is a knowledge graph?
- A knowledge graph is a structured representation of information that captures entities (e.g., people, places, concepts) and their relationships.
What are large language models?
- Large language models (LLMs) are AI models trained on vast amounts of text data, enabling them to generate human-like language.
What is a graph database?
- A graph database is a database system that uses graph structures to store, represent, and query data.
How do LLMs help in building knowledge graphs?
- LLMs can identify entities and relationships in unstructured data, enabling the generation of knowledge graphs.
What are Cypher statements in Neo4j?
- Cypher is Neo4j's graph query language. Cypher statements are used to create, update, and query data in the database.

This article provides a step-by-step guide to building knowledge graphs using large language models and graph databases. We cover extracting entities and relationships from unstructured data, generating Cypher statements, and executing them in a Neo4j graph database. The tutorial also highlights what LLMs add compared with traditional vector search approaches.