Talk to Your Graph¶
Note
This feature is currently still experimental.
This document provides a quick start guide for setting up your first Talk to Your Graph configuration.
What is Talk to Your Graph?¶
Talk to Your Graph is a chatbot based on OpenAI’s GPT-4 model that lets you query your own knowledge graphs in natural language. You can access it through the Workbench menu.
The bot uses an instance of the ChatGPT Retrieval Connector (which in turn uses the ChatGPT Retrieval Plugin) to request additional information when it doesn’t know how to answer your question. The information in the connector is created from the RDF data stored in a GraphDB repository.
Note
You need to provide your own API key for OpenAI’s API. The key must have access to the GPT-4 model. See Configuring Your Use of GPT Models for how to configure your API key with GraphDB. Talk to Your Graph always uses the GPT-4 model regardless of the model setting in GraphDB’s configuration.
Getting started¶
This section describes common requirements and instructions for the two examples below.
Requirements¶
GraphDB with ChatGPT Retrieval Connector and Talk to Your Graph.
Weaviate – other vector databases may work as well, but the examples here use Weaviate.
The ChatGPT Retrieval Plugin.
Installation Overview¶
Installing and running Talk to Your Graph consists of the following steps:
Use Docker to install the Weaviate vector database.
Install the poetry package manager and install the ChatGPT Retrieval Plugin inside of a poetry virtual environment.
Set the tokens and other environment variables necessary for these components to communicate back and forth.
The functionality is demonstrated with two datasets that follow similar but not identical steps:
Run the plugin.
Load data into a repository and create an instance of the ChatGPT Retrieval connector.
Then, you’ll be ready to enter natural language queries about the data.
Installing Weaviate¶
Weaviate is an open-source vector database. The easiest way to get a Weaviate instance running is using Docker.
To do so, create a docker-compose.yml file in a directory of your choice with the following contents:
---
version: '3.4'
services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:1.21.2
    ports:
      - 8080:8080
    volumes:
      - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: ''
      CLUSTER_HOSTNAME: 'node1'
volumes:
  weaviate_data:
...
Go to that directory in the shell and run:
docker compose up -d
This will instantiate a new Docker container with Weaviate and run it in the background. (For more information on running Weaviate with Docker, refer to the Docker Compose page in the Weaviate documentation.) Note that the Weaviate instance allows unauthenticated access from localhost and is not accessible outside of the machine where it is running.
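Before moving on, you can confirm that the instance is up. The sketch below, using only the Python standard library, polls Weaviate's readiness endpoint (/v1/.well-known/ready), which answers with HTTP 200 once the database is ready to serve requests; the URL assumes the default port mapping from the docker-compose.yml above:

```python
import urllib.error
import urllib.request


def weaviate_ready(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if Weaviate's readiness probe answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/.well-known/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: not ready
        return False


print(weaviate_ready())  # True once the container is up
```

If this returns False, check the container logs with docker compose logs weaviate.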
Installing the ChatGPT Retrieval Plugin¶
Clone Ontotext’s fork of the ChatGPT Retrieval Plugin. It contains some important fixes and improvements described in the ONTOTEXT.md file of the fork.
You can follow the generic instructions in the project’s README or these simpler ones.
Installing¶
Install Python 3.10, if not already installed.
Navigate to the cloned repository directory:
cd /path/to/chatgpt-retrieval-plugin
Install the poetry package manager:
pip install poetry
Create a new virtual environment with Python 3.10:
poetry env use python3.10
Activate the virtual environment:
poetry shell
Install app dependencies:
poetry install
Tip
This works with Python 3.11 as well. You may get some warnings about deprecated options in pyproject.toml, but they can be safely ignored.
Note
When running poetry install, you may get an error about installing the Python package psycopg2 (an interface to PostgreSQL) if a specific PostgreSQL build tool (pg_config) is not installed. Since we do not need PostgreSQL, you can safely ignore this error.
Generate a Bearer Token¶
You will need to generate a Bearer token. This is a secret token used to authenticate requests to the plugin API. You can generate one using any tool or method you prefer. For example, go to https://jwt.io/#decoded-jwt and paste this into the payload field on the right:
{
  "sub": "1234567890",
  "name": "Test",
  "iat": 1694775299
}
The token will appear in the token field on the left and will look like this:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IlRlc3QiLCJpYXQiOjE2OTQ3NzUyOTl9.DC7JjYLShRJ7r73btG6tWwPM-VNWEf41Vk1nxFkt0v0
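The same token can also be produced programmatically. Below is a minimal sketch using only the Python standard library; the signing secret "your-256-bit-secret" is jwt.io's default and purely an assumption here — the plugin only compares the configured token string, so any sufficiently random value you use consistently will work:

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    # Base64url encoding without padding, as used in JWTs
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_jwt(payload: dict, secret: str) -> str:
    """Build an HS256-signed JWT in compact serialization."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


token = make_jwt(
    {"sub": "1234567890", "name": "Test", "iat": 1694775299},
    "your-256-bit-secret",  # assumption: jwt.io's default secret
)
print(token)
```

The header and payload segments of the output match the token shown above; the signature segment depends on the secret you choose.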
Star Wars example¶
This example uses data describing Star Wars movies, characters, planets and vehicles.
Run the ChatGPT Retrieval Plugin¶
Download the run-poetry-starwars.sh file and store it in the same location as the git-cloned copy of the ChatGPT Retrieval Plugin.
Open the file in a text editor and replace the values for BEARER_TOKEN (with the value from the Generate a Bearer Token step) and OPENAI_API_KEY (with your actual OpenAI key):
# Authentication token to access the plugin, replace with actual value
export BEARER_TOKEN="<your-bearer-token>"
# Your OpenAI API KEY, replace with actual value
export OPENAI_API_KEY="<your-openai-api-key>"
Run chmod +x run-poetry-starwars.sh to make the file executable.
Now execute the file to run the plugin:
./run-poetry-starwars.sh
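For orientation, a run script like this typically also exports the datastore settings the plugin reads; the variable names below follow the ChatGPT Retrieval Plugin README, but the exact contents of the downloaded script may differ:

```shell
# Values you must edit (see above)
export BEARER_TOKEN="<your-bearer-token>"
export OPENAI_API_KEY="<your-openai-api-key>"

# Which vector database backend the plugin should use
export DATASTORE="weaviate"
# Where the Weaviate instance from the previous section listens
export WEAVIATE_URL="http://localhost:8080"
# Collection (class) in which to store the documents, e.g. STARWARS
export WEAVIATE_CLASS="STARWARS"

# Start the plugin (listens on port 8000 by default)
poetry run start
```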
If everything is successful you should see the following messages:
... Connecting to weaviate instance at http://localhost:8080 with credential type NoneType
... Creating collection STARWARS with properties {'text', 'chunk_id', 'source_id', 'document_id', 'source', 'author', 'created_at', 'url'}
... Application startup complete.
The plugin instance will be accessible at http://localhost:8000.
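You can also exercise the plugin directly over HTTP. The sketch below uses only the Python standard library; the POST /query endpoint and the request shape (a queries array with query and top_k fields) follow the ChatGPT Retrieval Plugin README, and the function returns None when the plugin is not reachable:

```python
import json
import urllib.error
import urllib.request


def query_plugin(question: str, bearer_token: str,
                 base_url: str = "http://localhost:8000", top_k: int = 3):
    """POST a question to the plugin's /query endpoint.

    Returns the parsed JSON response, or None if the plugin is unreachable.
    """
    body = json.dumps({"queries": [{"query": question, "top_k": top_k}]}).encode()
    req = urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError):
        return None
```

Once data has been loaded through the connector (next section), a call such as query_plugin("who is luke", bearer_token) returns the matching document chunks.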
Creating a ChatGPT Retrieval Connector Instance¶
Now it’s time to connect GraphDB to the ChatGPT Retrieval Plugin by using the ChatGPT Retrieval connector.
First, create a repository with a name of your choice (for example, starwars).
Then, download the starwars-data.ttl dataset and import it into your new repository.
Download the create-retrieval-starwars.rq file.
Open the file in a text editor and replace the Bearer token placeholder under “retrievalBearerToken” with the actual token you created for the plugin:
...
INSERT DATA {
  retr-index:starwars retr:createConnector '''
{
    "retrievalUrl": "http://localhost:8000",
    "retrievalBearerToken": "<your-bearer-token>",
...
Paste the contents of this file into the GraphDB Workbench SPARQL editor and click Run.
If successful, this will create an instance of the ChatGPT Retrieval connector called “starwars”. You should be able to see the instance in the Workbench.
You can also verify the basic operation of the connector instance with the following SPARQL query:
PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>
SELECT * {
    [] a retr-index:starwars ;
       retr:query "who is luke" ;
       retr:entities ?entity .
}
The query should return the following entities, corresponding to Luke Skywalker, the planet Tatooine and the X-wing starfighter:
<https://swapi.co/resource/human/1>
<https://swapi.co/resource/planet/1>
<https://swapi.co/resource/starship/12>
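If you run the same query against GraphDB's SPARQL endpoint with the Accept: application/sparql-results+json header, the bindings arrive in the standard SPARQL 1.1 JSON results format. Below is a small sketch of extracting the entity IRIs from such a response; the sample document is constructed here from the three results shown above:

```python
import json

# Sample response in the standard SPARQL 1.1 JSON results format,
# constructed from the entities shown above
raw = json.dumps({
    "head": {"vars": ["entity"]},
    "results": {"bindings": [
        {"entity": {"type": "uri", "value": "https://swapi.co/resource/human/1"}},
        {"entity": {"type": "uri", "value": "https://swapi.co/resource/planet/1"}},
        {"entity": {"type": "uri", "value": "https://swapi.co/resource/starship/12"}},
    ]},
})

# Each binding maps variable names to typed values; we keep only the IRIs
entities = [b["entity"]["value"] for b in json.loads(raw)["results"]["bindings"]]
print(entities)
```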
Talking to Your Graph¶
At this point you’re set to use the Talk to Your Graph feature. On the Workbench, open Talk to Your Graph and ask some questions about the Star Wars dataset:
Acme Employees and Products example¶
Another dataset and corresponding connector definition lets you explore the Talk to Your Graph feature with more business-oriented data. It contains data about the employees and products of Acme, an IT company that makes software for the animation industry.
Run the ChatGPT Retrieval Plugin¶
Download the run-poetry-acme.sh file and store it in the same location as the git-cloned copy of the ChatGPT Retrieval Plugin.
Open the file in a text editor and replace the values for BEARER_TOKEN (with the value from the Generate a Bearer Token step) and OPENAI_API_KEY (with your actual OpenAI key):
# Authentication token to access the plugin, replace with actual value
export BEARER_TOKEN="<your-bearer-token>"
# Your OpenAI API KEY, replace with actual value
export OPENAI_API_KEY="<your-openai-api-key>"
Run chmod +x run-poetry-acme.sh to make the file executable.
Now execute the file to run the plugin:
./run-poetry-acme.sh
If everything is successful you should see the following messages:
... Connecting to weaviate instance at http://localhost:8080 with credential type NoneType
... Creating collection ACME with properties {'text', 'chunk_id', 'source_id', 'document_id', 'source', 'author', 'created_at', 'url'}
... Application startup complete.
The plugin instance will be accessible at http://localhost:8001.
Note
The run file for the Acme example runs the plugin at port 8001, while the Star Wars one runs it at port 8000. This means that you can safely run the two instances of the plugin at the same time.
Creating a ChatGPT Retrieval Connector Instance¶
Now it’s time to connect GraphDB to the ChatGPT Retrieval Plugin by using the ChatGPT Retrieval connector.
First, create a repository with a name of your choice (for example, acme).
Then, download the acme-data.ttl dataset and import it into your new repository.
Download the create-retrieval-acme.rq file.
Open the file in a text editor and replace the Bearer token placeholder under “retrievalBearerToken” with the actual token you created for the plugin:
...
INSERT DATA {
  retr-index:acme retr:createConnector '''
{
    "retrievalUrl": "http://localhost:8001",
    "retrievalBearerToken": "<your-bearer-token>",
...
Paste the contents of this file into the GraphDB Workbench SPARQL editor and click Run.
If successful, this will create an instance of the ChatGPT Retrieval connector called “acme”. You should be able to see the instance in the Workbench.
Setting up Talk to Your Graph¶
Unlike the Star Wars example, this works best if you change the default Talk to Your Graph settings. Open Talk to Your Graph, then click the settings icon in the upper right of the “Talk to Your Graph” query and response list.
Paste these ground truths in the corresponding field:
The data contains employees and products of Acme, an IT company that makes software for the animation industry.
The company website is https://acme.example.com.
Today is {today}.
Change the “Number of top results” setting to 10.
The Settings form should look like this:
Some sample queries to get you started:
Who are Acme’s software developers?
Who are the company founders?
What audio products does Acme sell?
Any question about a particular employee of Acme.