LlamaIndex

LlamaIndex is a popular LLM orchestration framework with a clean architecture and a focus on data structures and models. It integrates many LLMs as well as vector stores and other indexes, and it contains tooling for document loading (loader hub) and advanced RAG patterns.

LlamaIndex provides many detailed examples of GenAI application development in its blog and documentation.

The Neo4j integration covers both the vector store as well as query generation from natural language and knowledge graph construction.

Functionality Includes

Neo4jVector

The Neo4j Vector integration supports a number of operations:

  • create vector from LlamaIndex documents

  • query vector

  • query vector with an additional graph retrieval Cypher query (see the retrieval_query sketch below)

  • construct vector instance from existing graph data

  • hybrid search

  • metadata filtering

%pip install llama-index-llms-openai
%pip install llama-index-vector-stores-neo4jvector
%pip install llama-index-embeddings-openai
%pip install neo4j

import os
import openai
from llama_index.vector_stores.neo4jvector import Neo4jVectorStore
from llama_index.core import StorageContext, VectorStoreIndex, SimpleDirectoryReader

os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"  # replace with your actual key
openai.api_key = os.environ["OPENAI_API_KEY"]
username = "neo4j"
password = "pleaseletmein"
url = "bolt://localhost:7687"
embed_dim = 1536

neo4j_vector = Neo4jVectorStore(username, password, url, embed_dim)
# load the example documents from the local directory
documents = SimpleDirectoryReader("./data/paul_graham").load_data()

storage_context = StorageContext.from_defaults(vector_store=neo4j_vector)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

query_engine = index.as_query_engine()
response = query_engine.query("What happened at interleaf?")
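
The operations list also mentions querying the vector store with an additional graph retrieval Cypher query. A minimal sketch, modeled on the LlamaIndex Neo4j vector demo (the hard-coded text is illustration only; a real query would traverse the graph around node):

# a custom retrieval query runs after the similarity search; `node` and
# `score` are in scope, and it should return text, score, id, and metadata
retrieval_query = (
    "RETURN 'Interleaf hired Tomaz' AS text, score, node.id AS id, "
    "{author: 'Tomaz', _node_type: node._node_type, "
    "_node_content: node._node_content} AS metadata"
)
neo4j_vector_retrieval = Neo4jVectorStore(
    username, password, url, embed_dim, retrieval_query=retrieval_query
)
index = VectorStoreIndex.from_vector_store(neo4j_vector_retrieval)
response = index.as_query_engine().query("What happened at interleaf?")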

Hybrid search combines vector search with full-text search, re-ranking and de-duplicating the combined results.

neo4j_vector_hybrid = Neo4jVectorStore(
    username, password, url, embed_dim, hybrid_search=True
)

storage_context = StorageContext.from_defaults(
    vector_store=neo4j_vector_hybrid
)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
query_engine = index.as_query_engine()
response = query_engine.query("What happened at interleaf?")

Metadata filtering

Metadata filtering enhances vector search by allowing searches to be refined based on specific node properties. This integrated approach ensures more precise and relevant search results by leveraging both the vector similarities and the contextual attributes of the nodes.
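
The filter example below matches on a theme metadata property, so it assumes the indexed nodes carry that property. A minimal sketch of indexing such nodes (the texts and themes are made-up illustration data):

from llama_index.core.schema import TextNode

nodes = [
    TextNode(
        text="Inception is a film about infiltrating dreams.",
        metadata={"theme": "Fiction"},
    ),
    TextNode(
        text="A short history of the printing press.",
        metadata={"theme": "History"},
    ),
]
# index the nodes so the metadata is stored alongside the embeddings
index = VectorStoreIndex(nodes, storage_context=storage_context)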

from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)

filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="theme", operator=FilterOperator.EQ, value="Fiction"
        ),
    ]
)

retriever = index.as_retriever(filters=filters)
retriever.retrieve("What is inception about?")

Neo4jPropertyGraphStore

The Neo4j Property Graph Store integration is a wrapper for the Neo4j Python driver. It allows querying and updating the Neo4j database in a simplified manner from LlamaIndex. Many integrations allow you to use the Neo4j Property Graph Store as a source of data for LlamaIndex.
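
For example, you can run parameterized Cypher directly through the store. A minimal sketch, assuming a local Neo4j instance with the credentials shown:

from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="password",
    url="bolt://localhost:7687",
)
# run a parameterized Cypher statement through the wrapped driver
result = graph_store.structured_query(
    "MATCH (n) WHERE n.name = $name RETURN n.name AS name",
    param_map={"name": "Interleaf"},
)
print(result)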

Property graph index

The property graph index can be used to extract a graph representation of information from text and construct a knowledge graph from it. The graph information can then be retrieved in a RAG application for more accurate responses.

%pip install llama-index llama-index-graph-stores-neo4j

from llama_index.core import SimpleDirectoryReader
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="password",
    url="bolt://localhost:7687",
)
# Extract graph from documents
index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0)
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)

# Define retriever
retriever = index.as_retriever(
    include_text=False,  # whether to include source chunk text in returned nodes (default True)
)
results = retriever.retrieve("What happened at Interleaf and Viaweb?")
for record in results:
    print(record.text)

# Question answering
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))

Property graph construction modules

LlamaIndex features multiple graph construction modules. Property graph construction in LlamaIndex works by applying a series of kg_extractors to each text chunk and attaching the extracted entities and relations as metadata to each llama-index node. You can use as many extractors as you like, and they will all be applied. Learn more about them in the documentation.
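
For instance, a minimal sketch combining two stock extractors, reusing documents and graph_store from the snippet above (ImplicitPathExtractor derives relations from existing node relationships without LLM calls):

from llama_index.core.indices.property_graph import (
    SimpleLLMPathExtractor,
    ImplicitPathExtractor,
)

index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[
        # free-form (subject, predicate, object) triples extracted by an LLM
        SimpleLLMPathExtractor(llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0)),
        # relations implied by existing node relationships
        ImplicitPathExtractor(),
    ],
    property_graph_store=graph_store,
)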

Here is an example of graph construction using a predefined schema.

%pip install llama-index llama-index-graph-stores-neo4j

from typing import Literal
from llama_index.core import SimpleDirectoryReader
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

# best practice to use upper-case
entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

# define which entities can have which relations
validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}

kg_extractor = SchemaLLMPathExtractor(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    # if false, allows for values outside of the schema
    # useful for using the schema as a suggestion
    strict=True,
)
graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="password",
    url="bolt://localhost:7687",
)
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[kg_extractor],
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    property_graph_store=graph_store,
    show_progress=True,
)

Property graph querying modules

Labeled property graphs can be queried in several ways to retrieve nodes and paths. And in LlamaIndex, you can combine several node retrieval methods at once! Learn more about which ones are available in the documentation.
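
For example, a minimal sketch combining two built-in sub-retrievers over the index created earlier (LLMSynonymRetriever expands the query into keywords and synonyms and matches them against the graph):

from llama_index.core.retrievers import (
    VectorContextRetriever,
    LLMSynonymRetriever,
)

sub_retrievers = [
    # vector similarity over graph nodes, including neighboring triples
    VectorContextRetriever(index.property_graph_store, include_text=False),
    # keyword/synonym expansion of the query matched against node names
    LLMSynonymRetriever(index.property_graph_store, include_text=False),
]
retriever = index.as_retriever(sub_retrievers=sub_retrievers)
nodes = retriever.retrieve("What happened at Interleaf?")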

You can also define a custom graph retriever as shown below.

from llama_index.core.retrievers import (
    CustomPGRetriever,
    VectorContextRetriever,
    TextToCypherRetriever,
)
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.graph_stores import PropertyGraphStore
from llama_index.core.vector_stores.types import VectorStore
from llama_index.core.embeddings import BaseEmbedding
from llama_index.core.prompts import PromptTemplate
from llama_index.core.llms import LLM
from llama_index.llms.openai import OpenAI
# requires: pip install llama-index-postprocessor-cohere-rerank
from llama_index.postprocessor.cohere_rerank import CohereRerank


from typing import Optional, Any, Union


class MyCustomRetriever(CustomPGRetriever):
    """Custom retriever with cohere reranking."""

    # note: subclasses of CustomPGRetriever implement `init` (not `__init__`);
    # the base class forwards its constructor kwargs here
    def init(
        self,
        ## vector context retriever params
        embed_model: Optional[BaseEmbedding] = None,
        vector_store: Optional[VectorStore] = None,
        similarity_top_k: int = 4,
        path_depth: int = 1,
        ## text-to-cypher params
        llm: Optional[LLM] = None,
        text_to_cypher_template: Optional[Union[PromptTemplate, str]] = None,
        ## cohere reranker params
        cohere_api_key: Optional[str] = None,
        cohere_top_n: int = 2,
        **kwargs: Any,
    ) -> None:
        """Uses any kwargs passed in from class constructor."""

        self.vector_retriever = VectorContextRetriever(
            self.graph_store,
            include_text=self.include_text,
            embed_model=embed_model,
            vector_store=vector_store,
            similarity_top_k=similarity_top_k,
            path_depth=path_depth,
        )

        self.cypher_retriever = TextToCypherRetriever(
            self.graph_store,
            llm=llm,
            text_to_cypher_template=text_to_cypher_template
            ## NOTE: you can attach other parameters here if you'd like
        )

        self.reranker = CohereRerank(
            api_key=cohere_api_key, top_n=cohere_top_n
        )

    def custom_retrieve(self, query_str: str) -> str:
        """Define custom retriever with reranking.

        Could return `str`, `TextNode`, `NodeWithScore`, or a list of those.
        """
        nodes_1 = self.vector_retriever.retrieve(query_str)
        nodes_2 = self.cypher_retriever.retrieve(query_str)
        reranked_nodes = self.reranker.postprocess_nodes(
            nodes_1 + nodes_2, query_str=query_str
        )

        ## join the reranked node contents into a single context string
        final_text = "\n\n".join(
            [n.get_content(metadata_mode="llm") for n in reranked_nodes]
        )

        return final_text

custom_sub_retriever = MyCustomRetriever(
    index.property_graph_store,
    include_text=True,
    vector_store=index.vector_store,
    cohere_api_key="...",
)

llm = OpenAI(model="gpt-3.5-turbo")  # LLM used to synthesize the final answer
query_engine = RetrieverQueryEngine.from_args(
    index.as_retriever(sub_retrievers=[custom_sub_retriever]), llm=llm
)

response = query_engine.query("Did the author like programming?")
print(str(response))

Neo4j Query Engine Pack

This Neo4j Query Engine LlamaPack creates a Neo4j query engine and executes its query function. The pack offers the option of creating multiple types of query engines, listed below; a short usage sketch follows the list:

  • Knowledge graph vector-based entity retrieval (default if no query engine type option is provided)

  • Knowledge graph keyword-based entity retrieval

  • Knowledge graph hybrid entity retrieval

  • Raw vector index retrieval

  • Custom combo query engine (vector similarity + KG entity retrieval)

  • KnowledgeGraphQueryEngine

  • KnowledgeGraphRAGRetriever
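
A minimal usage sketch, assuming docs holds previously loaded documents; the constructor arguments and the optional Neo4jQueryEngineType enum follow the pack's README:

from llama_index.core.llama_pack import download_llama_pack

# downloads the pack's source and installs its dependencies
Neo4jQueryEnginePack = download_llama_pack("Neo4jQueryEnginePack", "./neo4j_pack")

neo4j_pack = Neo4jQueryEnginePack(
    username="neo4j",
    password="password",
    url="bolt://localhost:7687",
    database="neo4j",
    docs=docs,
    # optionally select an engine type, e.g.
    # query_engine_type=Neo4jQueryEngineType.KG_HYBRID
)
response = neo4j_pack.run("What happened at Interleaf?")
print(str(response))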