Knowledge Graph using LangChain

Introduction

In today’s discussion, we will explore how to create knowledge graphs using LangChain. A knowledge graph turns unstructured data into a structured form: entities become nodes, and the relationships between them become edges. Knowledge graphs have gained popularity in fields like biology and have proven particularly useful alongside graph databases and analytics tools such as Neo4j. This tutorial demonstrates how generative AI can help build knowledge graphs from text or other unstructured datasets.

Prerequisites

To get started, you need to install several libraries:

langchain-experimental
langchain-community
langchain
networkx
langchain-google-genai (if using the free API by Google)
langchain-core
json-repair
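
Before moving on, it can help to confirm that the core packages are importable. The following is a minimal sketch using only the standard library. The import names in REQUIRED_MODULES are assumptions about how each distribution exposes its top-level module, and missing_modules is a hypothetical helper, not part of LangChain.

```python
from importlib.util import find_spec

# Assumed top-level import names for some of the packages listed above.
REQUIRED_MODULES = [
    "langchain",
    "langchain_core",
    "langchain_community",
    "langchain_experimental",
    "networkx",
]


def missing_modules(names):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

An empty result means every listed module was found; anything else is a list of names to install.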

Importing Necessary Libraries

After installing the libraries, import the essential packages, including LLMGraphTransformer.

from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_google_genai import ChatGoogleGenerativeAI
import networkx as nx

Setting Up Your LLM

Configure your LLM with your API key. For instance, if you are using Google's free API, set it up accordingly.

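api_key = 'your_google_api_key'
# ChatGoogleGenerativeAI comes from the langchain_google_genai package.
# The model name is an assumption; use any Gemini chat model your key can access.
llm = ChatGoogleGenerativeAI(model='gemini-1.5-flash', google_api_key=api_key)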
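
Hardcoding a key in a script makes it easy to leak through version control. A minimal sketch of reading it from the environment instead is shown here; the variable name GOOGLE_API_KEY and the helper load_api_key are assumptions made for illustration, not something the tutorial prescribes.

```python
import os


def load_api_key(env_var="GOOGLE_API_KEY"):
    """Read an API key from the environment and fail early if it is absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this in place, api_key = load_api_key() can replace the literal string, and a missing key is reported before any request is sent.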

Processing Text Data

Consider a sample text about Marie Curie. This unstructured text will be converted into a structured format.

text = "Marie Curie was a Polish and naturalized-French physicist and chemist who conducted pioneering research on radioactivity."

The goal is to convert this text into a format where relationships and nodes are represented clearly:

Example: Marie Curie -> Polish Nationality
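
To make the target format concrete, here is a small sketch of how such facts could be held as (source, relationship, target) triples. The Triple type and the two triples are written by hand from the sample sentence for illustration; they are not output from an LLM.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of a knowledge graph: source node, relationship, target node."""
    source: str
    relation: str
    target: str


# Hand-written from the sample sentence, not produced by an LLM.
triples = [
    Triple("Marie Curie", "NATIONALITY", "Poland"),
    Triple("Marie Curie", "NATIONALITY", "France"),
]

for t in triples:
    print(f"{t.source} -[{t.relation}]-> {t.target}")
```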

Loading Text into a Document

Wrap the text in a Document object and create the LLMGraphTransformer with the LLM configured earlier.

doc = Document(page_content=text)
transformer = LLMGraphTransformer(llm=llm)

Setting Up Nodes and Relationships

Identify the nodes and possible relationships within the text. In this example, nodes are ‘country’ and ‘person,’ while relationships might include ‘nationality,’ ‘located in,’ ‘worked at,’ ‘spouse,’ and ‘mother.’ LLMGraphTransformer takes these lists as the allowed_nodes and allowed_relationships constructor arguments, so the transformer is built with them.

nodes = ['country', 'person']
relationships = ['nationality', 'located in', 'worked at', 'spouse', 'mother']
transformer = LLMGraphTransformer(llm=llm, allowed_nodes=nodes, allowed_relationships=relationships)
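
One way to picture what these lists do is as a schema check on candidate triples. The sketch below is independent of LangChain and is not how the library implements the constraint; filter_triples, node_types, and the sample data are hypothetical and written by hand.

```python
def filter_triples(triples, node_types, allowed_nodes, allowed_relationships):
    """Keep triples whose endpoints and relation fit the allowed schema.

    triples: iterable of (source, relation, target) tuples.
    node_types: dict mapping each node id to its type label.
    """
    allowed_node_set = {n.lower() for n in allowed_nodes}
    allowed_rel_set = {r.lower() for r in allowed_relationships}
    kept = []
    for source, relation, target in triples:
        endpoints_ok = all(
            node_types.get(node, "").lower() in allowed_node_set
            for node in (source, target)
        )
        if endpoints_ok and relation.lower() in allowed_rel_set:
            kept.append((source, relation, target))
    return kept


# Hand-written sample data for illustration only.
node_types = {"Marie Curie": "person", "Poland": "country", "Radioactivity": "topic"}
triples = [
    ("Marie Curie", "nationality", "Poland"),
    ("Marie Curie", "researched", "Radioactivity"),
]
print(filter_triples(triples, node_types, ["country", "person"], ["nationality"]))
```

Only the nationality triple survives, because 'topic' is not an allowed node type and 'researched' is not an allowed relationship.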

Creating the Knowledge Graph

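Generate the graph by calling convert_to_graph_documents with a list of documents. It returns one GraphDocument per input document, each holding the extracted nodes and relationships.

knowledge_graph = transformer.convert_to_graph_documents([doc])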
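
The extraction result is often easier to inspect once flattened into triples. This sketch uses small stand-in classes so that it runs without an API key. It assumes the result is a list of objects exposing a relationships list whose items have source, target, and type attributes; Node, Relationship, GraphDoc, and to_triples are hypothetical names defined here, not imports from LangChain.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str


@dataclass
class GraphDoc:
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


def to_triples(graph_docs):
    """Flatten graph documents into (source id, relationship type, target id)."""
    return [
        (rel.source.id, rel.type, rel.target.id)
        for graph_doc in graph_docs
        for rel in graph_doc.relationships
    ]


# Hand-built stand-in for one extraction result.
marie = Node("Marie Curie", "Person")
poland = Node("Poland", "Country")
sample = [GraphDoc([marie, poland], [Relationship(marie, poland, "NATIONALITY")])]
print(to_triples(sample))
```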

Converting the Graph to CSV

For better readability or to train machine learning models, convert the graph into a CSV format.

import csv
# knowledge_graph is a list of GraphDocument objects, each with a relationships list
with open('knowledge_graph.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Source ID', 'Target ID', 'Relationship'])
    for graph_doc in knowledge_graph:
        for rel in graph_doc.relationships:
            writer.writerow([rel.source.id, rel.target.id, rel.type])
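
Once the edge list is on disk, it can be read back for downstream use. The following sketch groups edges by source node using only the standard library; read_edges is a hypothetical helper, and the in-memory sample stands in for a file with the same three-column header.

```python
import csv
import io
from collections import defaultdict


def read_edges(csv_file):
    """Group (relationship, target) pairs by source from an edge-list CSV."""
    adjacency = defaultdict(list)
    for row in csv.DictReader(csv_file):
        adjacency[row["Source ID"]].append((row["Relationship"], row["Target ID"]))
    return dict(adjacency)


# In-memory sample with the same header as the CSV file above.
sample_csv = io.StringIO(
    "Source ID,Target ID,Relationship\n"
    "Marie Curie,Poland,NATIONALITY\n"
    "Marie Curie,France,NATIONALITY\n"
)
print(read_edges(sample_csv))
```

For a real file, pass the handle from open('knowledge_graph.csv', newline='') instead of the StringIO object.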

Extracting Relationships Without Labels

If you don't provide labels for entities and relationships, the LLM will identify potential entities and relationships on its own, albeit possibly less accurately.

# Omit allowed_nodes and allowed_relationships so the LLM chooses its own labels
open_transformer = LLMGraphTransformer(llm=llm)
anonymous_graph = open_transformer.convert_to_graph_documents([doc])
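
Unconstrained extraction can produce several spellings of the same relationship. A possible way to see which labels dominate, and to decide what to put in an allowed list later, is to normalize and count them. This is a sketch under assumptions: normalize_label and relationship_counts are hypothetical helpers, and the triples are hand-written examples of inconsistent labels rather than real model output.

```python
import re
from collections import Counter


def normalize_label(label):
    """Convert a free-form label such as 'born in' to 'BORN_IN'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_").upper()


def relationship_counts(triples):
    """Count normalized relationship labels across (source, relation, target) triples."""
    return Counter(normalize_label(relation) for _, relation, _ in triples)


# Hand-written examples of inconsistent labels, not real model output.
triples = [
    ("Marie Curie", "born in", "Poland"),
    ("Marie Curie", "Born-In", "Warsaw"),
    ("Marie Curie", "researched", "Radioactivity"),
]
print(relationship_counts(triples).most_common())
```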

Conclusion

Creating a knowledge graph converts unstructured data into a structured format useful for various applications, including machine learning workflows. Experiment with smaller datasets before scaling up, both to control costs and to check the quality of the extracted graph.

Keywords

LangChain
Knowledge Graph
Unstructured Data
Generative AI
Graph Analytics
LLMGraphTransformer
API Integration

FAQ

What is a knowledge graph?

A knowledge graph is a structured representation of information where entities are nodes, and relationships between them are edges.

Why use LangChain for creating knowledge graphs?

LangChain automates the extraction of structured data from unstructured text by prompting an LLM through LLMGraphTransformer, making the process efficient and scalable.

What libraries are required for this process?

You need langchain-experimental, langchain-community, langchain, networkx, langchain-google-genai, langchain-core, and json-repair.

How do you convert text data into a knowledge graph?

Wrap your text in a Document, specify the allowed nodes and relationships, and use LLMGraphTransformer to generate the knowledge graph.

Can you create a knowledge graph without specifying nodes and relationships?

Yes, the LLM can attempt to identify entities and relationships on its own if you don’t specify them, though specificity improves accuracy.

How do you handle the cost of using LLMs for knowledge graphs?

Start with smaller datasets to gauge performance and cost before scaling up to larger datasets, especially when using paid APIs.
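
A pilot run of this kind can be sketched as follows: pick a reproducible sample of texts and get a rough size estimate before sending anything to the API. Both helpers are hypothetical, and the four-characters-per-token figure is only a rule-of-thumb assumption; actual tokenization and pricing depend on the provider.

```python
import random


def sample_documents(texts, k, seed=0):
    """Pick up to k texts reproducibly for a pilot run."""
    rng = random.Random(seed)
    texts = list(texts)
    return rng.sample(texts, min(k, len(texts)))


def estimate_tokens(texts, chars_per_token=4):
    """Rough token estimate; chars_per_token is a heuristic, not a tokenizer."""
    return sum(len(t) for t in texts) // chars_per_token
```

Comparing estimate_tokens for the pilot sample against the full dataset gives a first idea of how costs might scale.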