ChatGPT Retrieval GraphDB Connector

Overview and features

The ChatGPT Retrieval GraphDB Connector provides a means to convert an RDF model to a text representation and synchronize it to the chatgpt-retrieval-plugin, which in turn will convert the text document to embedded vectors and index it into its configured vector database. This is an experimental feature that is mostly meant to be used together with the Talk to Your Graph functionality but can also be used independently to query the vector database.

Note

GraphDB supports full-text search options as well.

The Connectors provide synchronization at the entity level, where an entity is defined as having a unique identifier (an IRI) and a set of properties and property values. In terms of RDF, this corresponds to a set of triples that have the same subject. In addition to simple properties (defined by a single triple), the Connectors support property chains. A property chain is defined as a sequence of triples where each triple’s object is the subject of the following triple.

The main features of the connector are:

  • Maintenance of an index that is always in sync with the data stored in GraphDB

  • Multiple independent instances per repository

  • The entities for synchronization are defined by:

    • A list of fields (on the ChatGPT Retrieval plugin side) and property chains (on the GraphDB side) whose values will be synchronized

    • A list of rdf:type values of the entities for synchronization

    • A list of languages for synchronization (the default is all languages)

    • Additional filtering by property and value

  • Text search via ChatGPT Retrieval plugin queries

  • Chunk and metadata extraction from search results

  • Paging of results using OFFSET and LIMIT

Each feature is described in detail below.

Usage

All interactions with the ChatGPT Retrieval GraphDB Connector are done through SPARQL queries.

There are three types of SPARQL queries:

  • INSERT for creating, updating, and deleting connector instances

  • SELECT for listing connector instances and querying their configuration parameters

  • INSERT/SELECT for storing and querying data as part of the normal GraphDB data workflow

In general, this corresponds to INSERT that adds or modifies data, and to SELECT that queries existing data.

Each connector implementation defines its own IRI prefix to distinguish it from other connectors. For the ChatGPT Retrieval GraphDB Connector, this is http://www.ontotext.com/connectors/retrieval#. Each command or predicate executed by the connector uses this prefix, e.g., http://www.ontotext.com/connectors/retrieval#createConnector to create a connector instance for ChatGPT Retrieval.

Individual instances of a connector are distinguished by unique names that are also IRIs. They have their own prefix to avoid clashing with any of the command predicates. For ChatGPT Retrieval, the instance prefix is http://www.ontotext.com/connectors/retrieval/instance#.

Warning

Changing the ChatGPT Retrieval Plugin URL will not reindex the data automatically. If you need the data to be reindexed, you can repair the connector instance.

Sample data

All examples use the following sample data that describes five fictitious wines: Yoyowine, Franvino, Noirette, Blanquito, and Rozova, as well as the grape varieties required to make these wines. The minimum required ruleset level in GraphDB is RDFS.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix wine: <http://www.ontotext.com/example/wine#> .

wine:RedWine rdfs:subClassOf wine:Wine ;
   rdfs:label "Red Wine" .
wine:WhiteWine rdfs:subClassOf wine:Wine ;
   rdfs:label "White Wine" .
wine:RoseWine rdfs:subClassOf wine:Wine ;
   rdfs:label "Rose Wine" .

wine:Merlo
    rdf:type wine:Grape ;
    rdfs:label "Merlo" .

wine:CabernetSauvignon
    rdf:type wine:Grape ;
    rdfs:label "Cabernet Sauvignon" .

wine:CabernetFranc
    rdf:type wine:Grape ;
    rdfs:label "Cabernet Franc" .

wine:PinotNoir
    rdf:type wine:Grape ;
    rdfs:label "Pinot Noir" .

wine:Chardonnay
    rdf:type wine:Grape ;
    rdfs:label "Chardonnay" .

wine:Yoyowine
    rdf:type wine:RedWine ;
    wine:madeFromGrape wine:CabernetSauvignon ;
    wine:hasSugar "dry" ;
    wine:hasYear "2013"^^xsd:integer ;
    wine:hasWinery "Semantinos" .

wine:Franvino
    rdf:type wine:RedWine ;
    wine:madeFromGrape wine:Merlo ;
    wine:madeFromGrape wine:CabernetFranc ;
    wine:hasSugar "dry" ;
    wine:hasYear "2012"^^xsd:integer ;
    wine:hasWinery "Semantinos" .

wine:Noirette
    rdf:type wine:RedWine ;
    wine:madeFromGrape wine:PinotNoir ;
    wine:hasSugar "medium" ;
    wine:hasYear "2012"^^xsd:integer ;
    wine:hasWinery "In vino veritas" .

wine:Blanquito
    rdf:type wine:WhiteWine ;
    wine:madeFromGrape wine:Chardonnay ;
    wine:hasSugar "dry" ;
    wine:hasYear "2012"^^xsd:integer ;
    wine:hasWinery "In vino veritas" .

wine:Rozova
    rdf:type wine:RoseWine ;
    wine:madeFromGrape wine:PinotNoir ;
    wine:hasSugar "medium" ;
    wine:hasYear "2013"^^xsd:integer ;
    wine:hasWinery "In vino veritas" .

Setup and maintenance

Prerequisites

You need a running instance of the ChatGPT Retrieval plugin.

Creating a connector instance

Creating a connector instance is done by sending a SPARQL query with the following configuration data:

  • the name of the connector instance (e.g., my_index);

  • a ChatGPT Retrieval instance to synchronize to;

  • classes to synchronize;

  • properties to synchronize.

The configuration data has to be provided as a JSON string representation and passed together with the create command.

You can create connectors via a Workbench dialog or by using a SPARQL update query (create command).

When you create a connector through the Workbench, whichever of the two methods you use, a pop-up shows the connector creation progress.

Using the Workbench

Warning

The Workbench Connector management view does not support the creation of nested fields. While the example in Using the create command below has the nested field “metadata”, it is missing from the screenshots shown here. Having the “metadata” field is not essential for most of the query examples further below.

  1. Go to Setup ‣ Connectors.

  2. Click New Connector in the tab of the respective Connector type you want to create.

  3. Fill out the configuration form.

    _images/create-connector-retrieve1.png _images/create-connector-retrieve2.png _images/create-connector-retrieve3.png _images/create-connector-retrieve4.png _images/create-connector-retrieve5.png _images/create-connector-retrieve6.png
  4. Execute the CREATE statement that will be generated from the data entered on the form by clicking OK. Alternatively, you can view its SPARQL query by clicking View SPARQL Query, and then copy it to execute it manually or integrate it in automation scripts.

Using the create command

The create command is triggered by a SPARQL INSERT with the createConnector predicate. The following example creates a connector instance called my_index, which synchronizes the wines from the sample data above.

To be able to use newlines and quotes in the block of JSON being passed without the need for escaping, here we use SPARQL’s multi-line string delimiter consisting of 3 apostrophes: '''...'''. You can also use 3 quotes instead: """...""".

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

INSERT DATA {
    retr-index:my_index retr:createConnector '''
{
  "retrievalUrl": "http://localhost:8000",
  "retrievalBearerToken": "<replace-this-with-actual-value>",
  "types": [
    "http://www.ontotext.com/example/wine#Wine"
  ],
  "fields": [
    {
      "fieldName": "subject",
      "propertyChain": [
        "localName()"
      ]
    },
    {
      "fieldName": "metadata",
      "propertyChain": [
        "$self"
      ],
      "objectFields": [
        {
          "fieldName": "author",
          "propertyChain": [
            "http://www.ontotext.com/example/wine#hasWinery"
          ]
        }
      ]
    },
    {
      "fieldName": "type",
      "propertyChain": [
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        "localName()"
      ],
      "fieldTextPrefix": "is a",
      "valueFilter": "isExplicit($this)"
    },
    {
      "fieldName": "grape",
      "propertyChain": [
        "http://www.ontotext.com/example/wine#madeFromGrape",
        "http://www.w3.org/2000/01/rdf-schema#label"
      ],
      "fieldTextPrefix": "made from {}"
    },
    {
      "fieldName": "sugar",
      "propertyChain": [
        "http://www.ontotext.com/example/wine#hasSugar"
      ]
    },
    {
      "fieldName": "year",
      "propertyChain": [
        "http://www.ontotext.com/example/wine#hasYear"
      ]
    }
  ]
}
''' .
}

The above command creates a new ChatGPT Retrieval connector instance that connects to the ChatGPT Retrieval instance accessible at the http://localhost:8000 URL.

The "types" key defines the RDF type of the entities to synchronize. In the example, it is only entities of the type http://www.ontotext.com/example/wine#Wine (and its subtypes if RDFS or higher-level reasoning is enabled). The "fields" key defines the mapping from RDF to ChatGPT Retrieval. The basic building block is the property chain, i.e., a sequence of RDF properties where the object of each property is the subject of the following property. In the example, three pieces of information are mapped: the grape the wines are made of, the sugar content, and the year. Each chain is assigned a short and convenient field name: “grape”, “sugar”, and “year”.

The field grape is an example of a property chain composed of more than one property. First, we take the wine’s madeFromGrape property, the object of which is an instance of the type Grape, and then we take the rdfs:label of this instance. The fields sugar and year are both composed of a single property that links the value directly to the wine.

The type field uses a property chain whose last element is localName(). This indicates that values mapped to that field should consist of the local name of the IRI value instead of the entire IRI — in this case, the name of the class that the resource belongs to.

Two helper field mappings used by the example above are:

  • subject: maps the values used to construct the beginning of each text document. It uses the localName() construct just like the type field. In this connector definition, it’s simply the name of each wine.

  • metadata: provides the metadata for the ChatGPT Retrieval Plugin document. In this case, we populate the metadata “author” field with the winery that makes each wine.

The defined fields and the values gathered from the RDF statements that match the definition are used to construct a natural language text document following a series of steps described in Text document assembly.

For example, using the create connector command above, the text document for the Franvino wine data above looks like this:

Franvino:
- is a RedWine.
- made from grape Merlo.
- made from grape Cabernet Franc.
- has sugar dry.
- has year 2012.

All documents are sent to the ChatGPT Retrieval Plugin in the format expected by it. The JSON also illustrates how the “author” field of the metadata is populated:

{
  "documents" : [ {
    "metadata" : {
      "author" : "Semantinos"
    },
    "id" : "http://www.ontotext.com/example/wine#Yoyowine",
    "text" : "Yoyowine:\n- is a RedWine.\n- made from grape Cabernet Sauvignon.\n- has sugar dry.\n- has year 2013.\n"
  }, {
    "metadata" : {
      "author" : "Semantinos"
    },
    "id" : "http://www.ontotext.com/example/wine#Franvino",
    "text" : "Franvino:\n- is a RedWine.\n- made from grape Merlo.\n- made from grape Cabernet Franc.\n- has sugar dry.\n- has year 2012.\n"
  }, {
    "metadata" : {
      "author" : "In vino veritas"
    },
    "id" : "http://www.ontotext.com/example/wine#Noirette",
    "text" : "Noirette:\n- is a RedWine.\n- made from grape Pinot Noir.\n- has sugar medium.\n- has year 2012.\n"
  }, {
    "metadata" : {
      "author" : "In vino veritas"
    },
    "id" : "http://www.ontotext.com/example/wine#Blanquito",
    "text" : "Blanquito:\n- is a WhiteWine.\n- made from grape Chardonnay.\n- has sugar dry.\n- has year 2012.\n"
  }, {
    "metadata" : {
      "author" : "In vino veritas"
    },
    "id" : "http://www.ontotext.com/example/wine#Rozova",
    "text" : "Rozova:\n- is a RoseWine.\n- made from grape Pinot Noir.\n- has sugar medium.\n- has year 2013.\n"
  } ]
}

Working with a secured ChatGPT Retrieval Plugin

GraphDB can access a secured ChatGPT Retrieval Plugin instance if you pass the retrievalBearerToken parameter in the connector configuration.

Instead of supplying the token as part of the connector instance configuration, you can also implement a custom authenticator class using the GraphDB Java API and set it via the authenticationConfiguratorClass option. See these connector authenticator examples for more information and example projects that implement such a custom class.

See the List of creation parameters for more information.

Dropping a connector instance

Dropping a connector instance removes all references to its external store from GraphDB as well as the ChatGPT Retrieval index associated with it.

The drop command is triggered by a SPARQL INSERT with the dropConnector predicate, where the name of the connector instance must be in the subject position. For example, the following update removes the connector my_index:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

INSERT DATA {
  retr-index:my_index retr:dropConnector [] .
}

You can also force drop a connector in case a normal delete does not work. The force delete will remove the connector even if part of the operation fails. Go to Setup ‣ Connectors where you will see the already existing connectors that you have created. Click the delete icon, and check Force delete in the dialog box.

_images/connectors-force-delete.png

Retrieving the create options for a connector instance

You can view the options string that was used to create a particular connector instance with the following query:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

SELECT ?createString {
  retr-index:my_index retr:listOptionValues ?createString .
}

Listing available connector instances

In the Connectors management view

Existing Connector instances are shown above the New Connector button. Click the name of an instance to view its configuration and SPARQL query, or click the repair / delete icons to perform these operations. Click the copy icon to copy the connector definition query to your clipboard.

_images/view-existing-connectors-retrieve.png

With a SPARQL query

Listing connector instances returns all previously created instances. It is a SELECT query with the listConnectors predicate:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>

SELECT ?cntUri ?cntStr {
  ?cntUri retr:listConnectors ?cntStr .
}

?cntUri is bound to the prefixed IRI of the connector instance that was used during creation, e.g., http://www.ontotext.com/connectors/retrieval/instance#my_index, while ?cntStr is bound to a string, representing the part after the prefix, e.g., "my_index".

Instance status check

The internal state of each connector instance can be queried using a SELECT query and the connectorStatus predicate:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>

SELECT ?cntUri ?cntStatus {
  ?cntUri retr:connectorStatus ?cntStatus .
}

?cntUri is bound to the prefixed IRI of the connector instance, while ?cntStatus is bound to a string representation of the status of the connector represented by this IRI. The status is key-value based.

Working with data

Adding, updating and deleting data

From the user's point of view, all synchronization happens transparently without using any additional predicates or naming a specific store explicitly, i.e., you simply execute standard SPARQL INSERT/DELETE queries. This is achieved by intercepting all changes in the plugin and determining which ChatGPT Retrieval Plugin documents need to be updated.
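
As a minimal sketch of this transparent synchronization, the two updates below add and then remove a wine using nothing but standard SPARQL. The wine wine:Novavino and its values are hypothetical and not part of the sample data. The sketch assumes that the my_index connector from above exists and that RDFS inference is enabled, so that instances of wine:RedWine are also treated as wine:Wine.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wine: <http://www.ontotext.com/example/wine#>

# Add a hypothetical wine; no connector predicates are involved.
INSERT DATA {
    wine:Novavino
        rdf:type wine:RedWine ;
        wine:madeFromGrape wine:Merlo ;
        wine:hasSugar "dry" ;
        wine:hasYear "2014"^^xsd:integer ;
        wine:hasWinery "Semantinos" .
} ;

# Remove all statements about the hypothetical wine again.
DELETE WHERE {
    wine:Novavino ?p ?o .
}
```

Running the two operations as separate requests makes it easier to check the index between them, since executed together they leave the repository unchanged.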

Simple queries

Once a connector instance has been created, it is possible to query data from it through SPARQL. For each matching ChatGPT Retrieval Plugin document, the connector instance returns the document subject. In its simplest form, querying is achieved by using a SELECT and providing the ChatGPT Retrieval Plugin query as the object of the retr:query predicate:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

SELECT * {
    [] a retr-index:my_index ;
        retr:query "cabernet" ;
        retr:entities ?entity .
}

The result binds ?entity to the wines whose vectors are close to “cabernet”. These could be, for example, :Franvino and :Yoyowine (made from Cabernet grapes), but also :Noirette (semantically close since it is a red wine as well). The exact result depends on the vector model and the specific vector database used by the ChatGPT Retrieval Plugin.

The query works in three steps:

  1. Get a query instance of the requested connector instance by using the RDF notation "X a Y" (= X rdf:type Y), where X is a variable and Y is a connector instance IRI. X is bound to a query instance of the connector instance.

  2. Assign a query to the query instance by using the system predicate retr:query.

  3. Request the matching entities through the retr:entities predicate.

It is also possible to provide per-query search options by using one or more option predicates. The option predicates are described in detail below. You can also retrieve information about the matching chunks of text as well as the metadata of the matching documents; see Chunk and metadata extraction for more details.

Raw queries

To access a ChatGPT Retrieval Plugin query parameter that is not exposed through a special predicate, use a raw query. Instead of providing plain text as the object of retr:query, specify a raw ChatGPT Retrieval Plugin query in JSON. For example, to filter by the “author” field in the metadata:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

SELECT ?entity {
  ?search a retr-index:my_index ;
        retr:query '''
            {
              "queries": [
                {
                  "query": "cabernet",
                  "filter": {
                    "author": "Semantinos"
                  }
                }
              ]
            }
        ''' ;
    retr:entities ?entity .
}

This query returns only wines whose winery (“author” field in the metadata) is “Semantinos”.

Combining ChatGPT Retrieval results with GraphDB data

The bound ?entity can be used in other SPARQL triples in order to build complex queries that join to or fetch additional data from GraphDB, for example, to see the actual grapes in the matching wines as well as the year they were made:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>
PREFIX wine: <http://www.ontotext.com/example/wine#>

SELECT ?entity ?grape ?year {
  ?search a retr-index:my_index ;
      retr:query "cabernet" ;
      retr:entities ?entity .
  ?entity wine:madeFromGrape ?grape .
  ?entity wine:hasYear ?year
}

The result may look like this:

_images/connectors-combining-results-with-gdb-data.png

Note

:Franvino is returned twice because it is made from two different grapes, both of which are returned.
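
As a further sketch of the same idea, the bound ?entity can also be restricted with ordinary SPARQL constructs. The query below keeps only matching wines from 2012 and returns their winery. It assumes the my_index connector and the sample data from above; the FILTER is evaluated by GraphDB on the repository data and is not part of the vector search.

```sparql
PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>
PREFIX wine: <http://www.ontotext.com/example/wine#>

SELECT ?entity ?winery ?year {
  ?search a retr-index:my_index ;
      retr:query "cabernet" ;
      retr:entities ?entity .
  ?entity wine:hasWinery ?winery ;
      wine:hasYear ?year .
  # Standard SPARQL filter on the xsd:integer year stored in GraphDB
  FILTER (?year = 2012)
}
```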

Entity match score

It is possible to access the match score returned by the ChatGPT Retrieval Plugin with the score predicate. Higher scores mean more relevance. As each entity has its own score, the predicate should come at the entity level. For example:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

SELECT ?entity ?score {
  ?search a retr-index:my_index ;
      retr:query "grape:cabernet" ;
      retr:entities ?entity .
  ?entity retr:score ?score
}

The result looks like this but the actual score might be different as it depends on the specific vector database used by the ChatGPT Retrieval Plugin:

_images/connectors-entity-match-score.png
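
If the results should be ranked explicitly, the score can be combined with a standard SPARQL ORDER BY. This is only a sketch: it assumes that ?score is bound to a numeric literal that sorts as expected, which the text above does not state explicitly.

```sparql
PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

SELECT ?entity ?score {
  ?search a retr-index:my_index ;
      retr:query "cabernet" ;
      retr:entities ?entity .
  ?entity retr:score ?score
}
# Most relevant matches first (higher score means more relevance)
ORDER BY DESC(?score)
```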

Limit

Limit (but not offset) is supported on the ChatGPT Retrieval Plugin side of the query. This is achieved through the predicate limit. Consider this example in which a limit of 1 is specified:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

SELECT ?entity {
  ?search a retr-index:my_index ;
      retr:query "cabernet" ;
      retr:limit 1 ;
      retr:entities ?entity .
}

The result contains a single wine, Franvino:

_images/connectors-limit-and-offset.png

Note

The specific order in which GraphDB returns the results depends on how the ChatGPT Retrieval Plugin returns the matches according to their match score.

Chunk and metadata extraction

The ChatGPT Retrieval Plugin can return information about the matching chunks of text as well as the metadata of the matching documents. This information is accessed through the dedicated predicate retr:snippets. It binds a blank node that in turn provides the actual bits of information via the predicates retr:snippetField and retr:snippetText. The retr:snippets predicate must be attached to the entity, as each entity has a different set of snippets. For example, in a search for Cabernet:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

SELECT ?entity ?snippetField ?snippetText {
  ?search a retr-index:my_index ;
      retr:query "cabernet" ;
      retr:entities ?entity .
  ?entity retr:snippets ?snippet .
  ?snippet retr:snippetField ?snippetField ;
      retr:snippetText ?snippetText .
}

The query returns the three wines that are semantically (vector) related to “Cabernet” as well as the respective matching chunks and metadata:

_images/connectors-snippet-extraction-retrieve.png

Note

The actual results might be different as this depends on the specific vector database used by the ChatGPT Retrieval Plugin.

List of creation parameters

The creation parameters define how a connector instance is created by the retr:createConnector predicate. Some are required and some are optional. All parameters are provided together in a JSON object, where the parameter names are the object keys. Parameter values may be simple JSON values such as a string or a boolean, or they can be lists or objects.

All of the creation parameters can also be set conveniently from the Create Connector user interface without any knowledge of JSON.

readonly (boolean), optional, read-only mode

A read-only connector will index all existing data in the repository at creation time, but, unlike non-read-only connectors, it will:

  • Not react to updates. Changes will not be synced to the connector.

  • Not keep any extra structures (such as the internal Lucene index for tracking updates to chains)

The only way to index changes in data after the connector has been created is to repair (or drop/recreate) the connector.

importGraph (boolean), optional, specifies that the RDF data from which to create the connector is in a special virtual graph

Used to create a connector instance from temporary RDF data inserted in the same transaction. It requires read-only mode and creates a connector whose data will come from statements inserted into a special virtual graph instead of data contained in the repository. The virtual graph is retr:graph, where the prefix retr: is as defined before. The data have to be inserted into this graph before the connector create statement is executed.

Both the insertion into the special graph and the create statement must be in the same transaction. In the GraphDB Workbench, this can be done by pasting them one after another in the SPARQL editor and putting a semicolon at the end of the first INSERT.

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
INSERT {
    GRAPH retr:graph {
        ...
    }
} WHERE {
        ...
};
PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>
INSERT DATA {
    retr-index:my_index retr:createConnector '''
{
  "readonly": true,
  "importGraph": true,
  "fields": [],
  "languages": [],
  "types": []
}
''' .
}

importFile (string), optional, an RDF file with data from which to create the connector

Creates a connector whose data will come from an RDF file on the file system instead of data contained in the repository. The value must be the full path to the RDF file. This functionality requires read-only mode.
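
A create command using importFile might look like the sketch below. The instance name file_index and the path /path/to/wines.ttl are hypothetical placeholders, and the file is assumed to contain the sample wine data. The concrete wine classes are listed explicitly because it is not stated above whether inference applies to data imported from a file.

```sparql
PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

# importFile requires read-only mode, hence "readonly": true.
INSERT DATA {
    retr-index:file_index retr:createConnector '''
{
  "retrievalUrl": "http://localhost:8000",
  "readonly": true,
  "importFile": "/path/to/wines.ttl",
  "types": [
    "http://www.ontotext.com/example/wine#RedWine",
    "http://www.ontotext.com/example/wine#WhiteWine",
    "http://www.ontotext.com/example/wine#RoseWine"
  ],
  "fields": [
    {
      "fieldName": "subject",
      "propertyChain": [
        "localName()"
      ]
    },
    {
      "fieldName": "sugar",
      "propertyChain": [
        "http://www.ontotext.com/example/wine#hasSugar"
      ]
    }
  ]
}
''' .
}
```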

detectFields (boolean), optional, detects fields

This mode introduces automatic field detection when creating a connector. You can omit specifying fields in JSON. Instead, you will get automatic fields: each corresponds to a single predicate, and its field name is the same as the predicate.

In this mode, specifying types is optional too. If types are not provided, then all types will be indexed. This mode requires importGraph or importFile.

Once the connector is created, you can inspect the detected fields in the Connector management section of the Workbench.
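
The following sketch combines detectFields with importGraph. It copies all statements about wines into the virtual graph retr:graph and then creates a read-only connector without listing types or fields. The instance name detected_index is hypothetical, and leaving out the "types" and "fields" keys entirely is an assumption based on the description above.

```sparql
PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX wine: <http://www.ontotext.com/example/wine#>
# Step 1: copy the statements to index into the special virtual graph.
INSERT {
    GRAPH retr:graph {
        ?wine ?p ?o .
    }
} WHERE {
    ?wine a wine:Wine ;
        ?p ?o .
};
PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>
# Step 2: create the connector in the same transaction, letting it detect the fields.
INSERT DATA {
    retr-index:detected_index retr:createConnector '''
{
  "retrievalUrl": "http://localhost:8000",
  "readonly": true,
  "importGraph": true,
  "detectFields": true
}
''' .
}
```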

retrievalUrl (string), required, the ChatGPT Retrieval Plugin instance to sync to

As the ChatGPT Retrieval Plugin is a third-party service, you have to specify the URL where it is running. The value has the form http://hostname.domain:port; https:// is allowed too. There is no default value. It can be updated at runtime without having to rebuild the index.

retrievalBearerToken (string), optional, the Bearer token to use for authentication with the ChatGPT Retrieval plugin

No default value. Can be updated at runtime without having to rebuild the index.

bulkUpdateBatchSize (integer), optional, controls the maximum number of documents sent per bulk request

Default value is 1,000. Can be updated at runtime without having to rebuild the index.

authenticationConfiguratorClass, optional, provides custom authentication behavior

See Working with a secured ChatGPT Retrieval Plugin.

types (list of IRIs), required, specifies the types of entities to sync

The RDF types of entities to sync are specified as a list of IRIs. At least one type IRI is required.

Use the pseudo-IRI $any to sync entities that have at least one RDF type.

Use the pseudo-IRI $untyped to sync entities regardless of whether they have any RDF type. See also the examples in General full-text search with the connectors.

languages (list of strings), optional, valid languages for literals

RDF data is often multilingual, but you may want to map only some of the languages represented in the literal values. To do so, specify a list of language ranges, which are matched against the language tags of literals according to RFC 4647 Section 3.3.1, Basic Filtering. In addition, an empty range can be used to include literals that have no language tag. All existing literals whose language tags match one of the ranges are mapped.
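
As an illustration, the sketch below restricts a connector to English and German literals plus literals without a language tag. The instance name lang_index is hypothetical. Since the labels in the sample data carry no language tags, only the empty range "" would actually match there.

```sparql
PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

# "en" and "de" are language ranges; "" admits literals without a language tag.
INSERT DATA {
    retr-index:lang_index retr:createConnector '''
{
  "retrievalUrl": "http://localhost:8000",
  "types": [
    "http://www.ontotext.com/example/wine#Wine"
  ],
  "languages": ["en", "de", ""],
  "fields": [
    {
      "fieldName": "subject",
      "propertyChain": [
        "localName()"
      ]
    },
    {
      "fieldName": "grape",
      "propertyChain": [
        "http://www.ontotext.com/example/wine#madeFromGrape",
        "http://www.w3.org/2000/01/rdf-schema#label"
      ]
    }
  ]
}
''' .
}
```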

fields (list of field objects), required, defines the mapping from RDF to ChatGPT Retrieval documents

The fields specify exactly which parts of each entity will be synchronized as well as the specific details on the connector side. The field is the smallest synchronization unit and it maps a property chain from GraphDB to a field in ChatGPT Retrieval. The fields are specified as a list of field objects. At least one field object is required. Each field object has further keys that specify details.

fieldName (string), required, the name of the field in ChatGPT Retrieval documents

The name of the field that defines the mapping on the connector side. It is specified by the key fieldName with a string value. The field names are used to construct text documents, so we recommend using meaningful field names that are easy for a human, and hence for the GPT-4 model, to understand.

fieldTextPrefix (string), optional, specifies a template for constructing the field name in text documents, "has {}" by default

If the value contains {} it will be replaced by the normalized field name. Field names are normalized by converting from Java camel-case notation to separate words. For example, the field name “dateOfBirth” will be normalized to “date of birth”.
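
The sketch below shows both variants of the template: one with the {} placeholder and one without. The instance name prefix_index and the field name sugarContent are hypothetical choices made for illustration.

```sparql
PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

# sugarContent uses a template with {}; year uses a fixed prefix without {}.
INSERT DATA {
    retr-index:prefix_index retr:createConnector '''
{
  "retrievalUrl": "http://localhost:8000",
  "types": [
    "http://www.ontotext.com/example/wine#Wine"
  ],
  "fields": [
    {
      "fieldName": "subject",
      "propertyChain": [
        "localName()"
      ]
    },
    {
      "fieldName": "sugarContent",
      "propertyChain": [
        "http://www.ontotext.com/example/wine#hasSugar"
      ],
      "fieldTextPrefix": "with {}"
    },
    {
      "fieldName": "year",
      "propertyChain": [
        "http://www.ontotext.com/example/wine#hasYear"
      ],
      "fieldTextPrefix": "produced in"
    }
  ]
}
''' .
}
```

Following the normalization rule above, sugarContent would be expected to expand to "sugar content", giving a line such as "with sugar content dry". The prefix without {} would be used as is, in the same way as the "is a" prefix of the type field in the create example.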

fieldNameTransform (one of none, predicate or predicate.localName), optional, none by default

Defines an optional transformation of the field name. Although fieldName is always required, it is ignored if fieldNameTransform is predicate or predicate.localName.

  • none: The field name is supplied via the fieldName option.

  • predicate: The field name is equal to the full IRI of the last predicate of the chain, e.g., if the last predicate was http://www.w3.org/2000/01/rdf-schema#label, then the field name will be http://www.w3.org/2000/01/rdf-schema#label too.

  • predicate.localName: The field name is derived from the local name of the IRI of the last predicate of the chain, e.g., if the last predicate was http://www.w3.org/2000/01/rdf-schema#comment, then the field name will be comment.

See Indexing all literals in distinct fields for an example.

propertyChain (list of IRI), required, defines the property chain to reach the value

The property chain defines the mapping on the GraphDB side. A property chain is defined as a sequence of triples where the entity IRI is the subject of the first triple, its object is the subject of the next triple, etc. In this model, a property chain with a single element corresponds to a direct property defined by a single triple. Property chains are specified as a list of IRIs where at least one IRI must be provided.

The IRI of the document will be synchronized to the special field document_id in the ChatGPT Retrieval Plugin. You may use it to query the ChatGPT Retrieval Plugin directly and to retrieve the matching entity IRI.

See Multiple property chains per field to define a field whose values are populated from more than one property chain.

See Indexing language tags to define a field whose values are populated with the language tags of literals.

See Indexing the IRI of an entity to define a field whose values are populated with the IRI of the indexed entity.

See Wildcard literal indexing to define a field whose values are populated with literals regardless of their predicate.

valueFilter (string), optional, specifies the value filter for the field

See also Entity filtering.

documentFilter (string), optional, specifies the nested document filter for the field

Only for fields that define nested documents. See also Entity filtering.

defaultValue (string), optional, specifies a default value for the field

The defaultValue option specifies the value to use for the field when the property chain has no matching values in GraphDB. It can be a plain literal, a literal with a datatype (xsd: prefix supported), a literal with language, or an IRI. If it is not set, the field has no default value.

indexed (boolean), optional, default true

Indexed fields are used to construct the text document for the ChatGPT Retrieval Plugin and this is controlled by the Boolean option indexed. True by default. Non-indexed fields can be used to perform filtering without affecting the document contents.

Setting this to false on the special metadata field or any of its nested fields will be ignored.

multivalued (boolean), optional, default true

RDF properties and synchronized fields may have more than one value. If multivalued is set to true, all values will be used for the text document. If set to false, only a single value will be synchronized. True by default.

objectFields (objects array), optional, nested object mapping

Provide a mapping for the nested object’s fields. At present, nested objects can be defined only for the metadata of the ChatGPT Retrieval plugin.

valueFilter (string), optional, specifies the top-level value filter for the document

See also Entity filtering.

documentFilter (string), optional, specifies the top-level document filter for the document

See also Entity filtering.

Updating parameters at runtime

As mentioned above, the following connector parameters can be updated at runtime without having to rebuild the index:

  • retrievalUrl

  • retrievalBearerToken

  • bulkUpdateBatchSize

This can be done by executing the following SPARQL update, here with an example for changing the Bearer token:

PREFIX conn:<http://www.ontotext.com/connectors/retrieval#>
PREFIX inst:<http://www.ontotext.com/connectors/retrieval/instance#>
INSERT DATA {
inst:proper_index conn:updateConnector '''
    {
        "retrievalBearerToken": "<my-token>"
    }
''' .
}
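The update above can also be generated programmatically. The following Python sketch builds the same SPARQL update string from a dictionary. The helper name build_update_connector and the check against the three runtime-updatable parameters are illustrative assumptions and not part of GraphDB; sending the update to a repository is left out.

```python
import json

# Parameters that the documentation lists as updatable at runtime.
RUNTIME_PARAMS = {"retrievalUrl", "retrievalBearerToken", "bulkUpdateBatchSize"}

PREFIXES = (
    "PREFIX conn:<http://www.ontotext.com/connectors/retrieval#>\n"
    "PREFIX inst:<http://www.ontotext.com/connectors/retrieval/instance#>\n"
)


def build_update_connector(instance_name, params):
    """Return a SPARQL update string for conn:updateConnector (hypothetical helper)."""
    unknown = set(params) - RUNTIME_PARAMS
    if unknown:
        raise ValueError(f"not updatable at runtime: {sorted(unknown)}")
    # Backslashes produced by JSON escaping must be doubled inside a SPARQL string.
    body = json.dumps(params, indent=4).replace("\\", "\\\\")
    return (
        f"{PREFIXES}INSERT DATA {{\n"
        f"inst:{instance_name} conn:updateConnector '''\n{body}\n''' .\n}}"
    )


update = build_update_connector("proper_index", {"retrievalBearerToken": "<my-token>"})
print(update)
```

Rejecting unknown keys early is a design choice of this sketch; it avoids sending an update that would require rebuilding the index.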

Special field definitions

Helper field mappings

The ChatGPT Retrieval connector reserves some field names for fields that are treated differently when constructing a text document.

  • subject: the value of this field will be at the top of the text document followed by a colon without mentioning the field name.

  • text: the value of this field will be appended at the end of the document without mentioning the field name.

  • metadata: the value of this field will be used to construct the metadata of the document and will not be used to generate the resulting text. Note that the nested fields that are contained in it must follow the ChatGPT Retrieval Plugin metadata schema. The value for the created_at field of the metadata schema must be an xsd:dateTime. If the literal used to construct it is a valid xsd:date, it will be padded to make it an xsd:dateTime value by appending “T00:00:00Z” to conform to the requirement.
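The padding rule for created_at can be illustrated with a short Python sketch. The function name to_created_at is hypothetical, and the sketch assumes the simplest case only: a date in the form YYYY-MM-DD without a timezone.

```python
import re

# Plain xsd:date lexical form without a timezone, e.g. 2023-05-01 (assumption).
XSD_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def to_created_at(lexical):
    """Pad an xsd:date lexical value to an xsd:dateTime; leave other values as-is."""
    if XSD_DATE.fullmatch(lexical):
        return lexical + "T00:00:00Z"
    return lexical


print(to_created_at("2023-05-01"))            # 2023-05-01T00:00:00Z
print(to_created_at("2023-05-01T12:30:00Z"))  # 2023-05-01T12:30:00Z
```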

Nested objects

Nested objects are connector documents that are used as values in the main document. They are defined with the objectFields option.

In the example under Using the create command earlier, objectFields is used to map author as a nested object within the metadata field.

Multiple property chains per field

Sometimes, you have to work with data models that define the same concept (in terms of what you want to index in ChatGPT Retrieval) with more than one property chain, e.g., the concept of “name” could be defined as a single canonical name, multiple historical names and some unofficial names. If you want to index these together as a single field in ChatGPT Retrieval, you can define this as a multiple property chains field.

Fields with multiple property chains are defined as a set of separate virtual fields that will be merged into a single physical field when indexed. Virtual fields are distinguished by the suffix $xyz, where xyz is any alphanumeric sequence of convenience. For example, we can define the fields name$1 and name$2 like this:

...
  "fields": [
    {
      "fieldName": "name$1",
      "propertyChain": [
        "http://www.ontotext.com/example#canonicalName"
      ]
    },
    {
      "fieldName": "name$2",
      "propertyChain": [
        "http://www.ontotext.com/example#historicalName"
      ]
      ...
    },
...

The values of the fields name$1 and name$2 will be merged and synchronized to the field name in ChatGPT Retrieval.

Note

You cannot mix suffixed and unsuffixed fields with the same name, e.g., if you defined myField$new and myField$old, you cannot have a field called just myField.
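The merging of virtual fields into a physical field, together with the restriction on mixing suffixed and unsuffixed names, can be sketched in Python as follows. The function merge_virtual_fields and the sample values are hypothetical; the connector's actual implementation may differ.

```python
from collections import defaultdict


def merge_virtual_fields(values_by_field):
    """Merge values of virtual fields (name$1, name$2, ...) into physical fields."""
    # Physical names that are defined through at least one suffixed virtual field.
    suffixed = {name.split("$", 1)[0] for name in values_by_field if "$" in name}
    merged = defaultdict(list)
    for name, values in values_by_field.items():
        if "$" not in name and name in suffixed:
            raise ValueError(f"cannot mix suffixed and unsuffixed field: {name}")
        merged[name.split("$", 1)[0]].extend(values)
    return dict(merged)


doc = merge_virtual_fields({
    "name$1": ["Canonical Name"],
    "name$2": ["Old Name A", "Old Name B"],
    "city": ["London"],
})
print(doc)
```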

Filters and fields with multiple property chains

Filters can be used with fields defined with multiple property chains. Both the physical field values and the individual virtual field values are available:

  • Physical fields are specified without the suffix, e.g., ?myField

  • Virtual fields are specified with the suffix, e.g., ?myField$2 or ?myField$alt.

Note

Physical fields cannot be combined with parent() as their values come from different property chains. If you really need to filter the same parent level, you can rewrite parent(?myField) in (<urn:x>, <urn:y>) as parent(?myField$1) in (<urn:x>, <urn:y>) || parent(?myField$2) in (<urn:x>, <urn:y>) || parent(?myField$3) ... and surround it with parentheses if it is a part of a bigger expression.

Indexing language tags

The language tag of an RDF literal can be indexed by specifying a property chain, where the last element is the pseudo-IRI lang(). The property preceding lang() must lead to a literal value. For example:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

INSERT DATA {
  retr-index:my_index retr:createConnector '''
    {
      "retrievalUrl": "http://localhost:8000",
      "retrievalBearerToken": "<replace-this-with-actual-value>",
      "types": ["http://www.ontotext.com/example#gadget"],
      "fields": [
         {
           "fieldName": "name",
           "propertyChain": [
             "http://www.ontotext.com/example#name"
           ]
         },
         {
           "fieldName": "nameLanguage",
           "propertyChain": [
             "http://www.ontotext.com/example#name",
             "lang()"
           ]
         }
      ]
    }
  ''' .
}

The above connector will index the language tag of each literal value of the property http://www.ontotext.com/example#name into the field nameLanguage.

Indexing named graphs

The named graph of a given value can be indexed by ending a property chain with the special pseudo-IRI graph(). Indexing the named graph of the value instead of the value itself allows searching by named graph.

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

INSERT DATA {
  retr-index:my_index retr:createConnector '''
    {
      "retrievalUrl": "http://localhost:8000",
      "retrievalBearerToken": "<replace-this-with-actual-value>",
      "types": ["http://www.ontotext.com/example#gadget"],
      "fields": [
         {
           "fieldName": "name",
           "propertyChain": [
             "http://www.ontotext.com/example#name"
           ]
         },
         {
           "fieldName": "nameGraph",
           "propertyChain": [
             "http://www.ontotext.com/example#name",
             "graph()"
           ]
         }
      ]
    }
  ''' .
}

The above connector will index the named graph of each value of the property http://www.ontotext.com/example#name into the field nameGraph.

Indexing local names

The local name of a given IRI value can be indexed by ending a property chain with the special pseudo-IRI localName(). Indexing the local name instead of the full IRI is convenient when the local name is a human-readable, meaningful string.

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

INSERT DATA {
  retr-index:my_index retr:createConnector '''
    {
      "retrievalUrl": "http://localhost:8000",
      "retrievalBearerToken": "<replace-this-with-actual-value>",
      "types": ["http://www.ontotext.com/example#gadget"],
      "fields": [
         {
           "fieldName": "name",
           "propertyChain": [
             "http://www.ontotext.com/example#name"
           ]
         },
         {
           "fieldName": "feature",
           "propertyChain": [
             "http://www.ontotext.com/example#feature",
             "localName()"
           ]
         }
      ]
    }
  ''' .
}

The above connector will index the local name of each IRI value of the property http://www.ontotext.com/example#feature into the field feature.
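To make the notion of a local name concrete, the Python sketch below splits an IRI using one common convention: the part after the first '#', otherwise after the last '/', otherwise after the last ':'. Whether localName() applies exactly this rule is an assumption, and the sample IRIs other than the example namespace are hypothetical.

```python
def local_name(iri):
    """Return the part after the first '#', else the last '/', else the last ':'."""
    index = iri.find("#")
    if index < 0:
        index = iri.rfind("/")
    if index < 0:
        index = iri.rfind(":")
    # If no separator is found, index is -1 and the whole IRI is returned.
    return iri[index + 1:]


print(local_name("http://www.ontotext.com/example#touchscreen"))  # touchscreen
print(local_name("http://example.com/features/bluetooth"))        # bluetooth
```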

Wildcard literal indexing

In this mode, the last element of a property chain is a wildcard that will match any predicate that leads to a literal value. Use the special pseudo-IRI $literal as the last element of the property chain to activate it.

Note

Currently, it really means any literal, including literals with data types.

For example:

{
    "fields" : [ {
        "propertyChain" : [ "$literal" ],
        "fieldName" : "name"
    }, {
        "propertyChain" : [ "http://example.com/description", "$literal" ],
        "fieldName" : "description"
    }
    ...
}

See Indexing all literals for a detailed example.

Indexing the IRI of an entity

Sometimes you may need the IRI of each entity (e.g., http://www.ontotext.com/example/wine#Franvino from our small example dataset) indexed as a regular field. This can be achieved by specifying a property chain with a single property referring to the pseudo-IRI $self. For example:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

INSERT DATA {
    retr-index:my_index retr:createConnector '''
{
  "retrievalUrl": "http://localhost:8000",
  "retrievalBearerToken": "<replace-this-with-actual-value>",
  "types": [
    "http://www.ontotext.com/example/wine#Wine"
  ],
  "fields": [
    {
      "fieldName": "entityId",
      "propertyChain": [
        "$self"
      ]
    },
    {
      "fieldName": "grape",
      "propertyChain": [
        "http://www.ontotext.com/example/wine#madeFromGrape",
        "http://www.w3.org/2000/01/rdf-schema#label"
      ]
    }
  ]
}
''' .
}

The above connector will index the IRI of each wine into the field entityId.

Note

Note that GraphDB will also use the IRI of each entity as the ID of each document in ChatGPT Retrieval, which is represented by the field id.

Entity filtering

The ChatGPT Retrieval connector supports four kinds of entity filters used to fine-tune the set of entities and individual values for the configured fields, based on the field value. Entities and field values are synchronized to ChatGPT Retrieval if, and only if, they pass the filter. The filters are similar to a FILTER() inside a SPARQL query but not exactly the same. In them, each configured field can be referred to by prefixing it with a ?, much like referring to a variable in SPARQL. Entity filter examples are provided at the end of this section.

Types of filters

Top-level value filter

The top-level value filter is specified via valueFilter. It is evaluated prior to anything else when only the document ID is known and it may not refer to any field names but only to the special field $this that contains the current document ID. Failing to pass this filter removes the entire document early in the indexing process and it can be used to introduce more restrictions similar to the built-in filtering by type via the types property.

Top-level document filter

The top-level document filter is specified via documentFilter. This filter is evaluated last when all of the document has been collected and it decides whether to include the document in the index. It can be used to enforce global document restrictions, e.g., certain fields are required or a document needs to be indexed only if a certain field value meets specific conditions.

Per-field value filter

The per-field value filter is specified via valueFilter inside the field definition of the field whose values are to be filtered. The filter is evaluated while collecting the data for the field when each field value becomes available.

The variable that contains the field value is $this. Other field names can be used to filter the current field’s value based on the value of another field, e.g., $this > ?age will compare the current field value to the value of the field age (see also Two-variable filtering). Failing to pass the filter will remove the current field value.

On nested documents, the per-field value filter can be used to remove the entire nested document early in the indexing process, e.g., by checking the type of the nested document via next hop with rdf:type.

Nested document filter

The nested document filter is specified via documentFilter inside the field definition of the field that defines the root of a nested document. The filter is evaluated after the entire nested document has been collected. Failing to pass this filter removes the entire nested document.

Inside a nested document filter, the field names are within the context of the nested document and not within the context of the top-level document. For example, if we have a field children that defines a nested document, and we use a filter like ?age < "10"^^xsd:int, we will be referring to the field children.age. We can use the prefix $outer. one or more times to refer to field values from the outer document (from the viewpoint of the nested document). For example, $outer.age > "25"^^xsd:int will refer to the age field that is a sibling of the children field.

Other than the above differences, the nested document filter is equivalent to the top-level document filter from the viewpoint of the nested document.

Filter operators

The filter operators are used to test if the value of a given field satisfies a certain condition.

Field comparisons are done on the original RDF values before they are converted to textual representation.

Operator

Meaning

?var in (value1, value2, ...)

Tests if the field var’s value is one of the specified values. Unlike the similar SPARQL operator, values are compared strictly, i.e., for literals to match, their datatype must be exactly the same (similar to how SPARQL sameTerm works). Values that do not match are treated as if they were not present in the repository.

Example:
?status in ("active", "new")

?var not in (value1, value2, ...)

The negated version of the in-operator.

Example:
?status not in ("archived")

bound(?var)

Tests if the field var has a valid value. This can be used to make the field compulsory.

Example:
bound(?name)

isExplicit(?var)

Tests if the field var’s value came from an explicit statement. This will use the last element of the property chain. If you need to assert the explicit status of a previous element in the property chain, use parent(?var) as many times as needed.

Example:
isExplicit(?name)

?var = value (equal to)
?var != value (not equal to)
?var > value (greater than)
?var >= value (greater than or equal to)
?var < value (less than)
?var <= value (less than or equal to)

RDF value comparison operators that compare RDF values similarly to the equivalent SPARQL operators. The field var’s value will be compared to the specified RDF value. When comparing RDF values that are literals, their datatypes must be compatible, e.g., xsd:integer and xsd:long but not xsd:string and xsd:date. Values that do not match are treated as if they were not present in the repository.

Examples:
Given that height’s value is "150"^^xsd:int and dateOfBirth’s value is "1989-12-31"^^xsd:date, then:
?height = "150"^^xsd:int is true
?height = "150"^^xsd:long is true
?height = "150" is false

?height != "151"^^xsd:int is true
?height != "150" is true

?height > "150"^^xsd:int is false
?height >= "150"^^xsd:int is true
?dateOfBirth < "1990-01-01"^^xsd:date is true

regex(?var, "pattern")

or

regex(?var, "pattern", "i")

Tests if the field var’s value matches the given regular expression pattern.
If the “i” flag option is present, this indicates that the match operates in case-insensitive mode.
Values that do not match are treated as if they were not present in the repository.
Example:
regex(?name, "^mrs?", "i")

expr1 || expr2

or

expr1 or expr2

Logical disjunction of expressions expr1 and expr2.

Examples:
bound(?name) || bound(?company)
bound(?name) or bound(?company)

expr1 && expr2

or

expr1 and expr2

Logical conjunction of expressions expr1 and expr2.

Examples:
bound(?status) && ?status in ("active", "new")
bound(?status) and ?status in ("active", "new")

!expr

Logical negation of expression expr.

Example:
!bound(?company)

( expr )

Grouping of expressions

Example:
(bound(?name) or bound(?company)) && bound(?address)
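The difference between the strict in operator and the comparison operators in the table above can be modeled with a small Python sketch. The Literal class, the helper functions, and the set of datatypes treated as compatible are simplifying assumptions made for illustration; they do not reproduce the connector's full comparison rules.

```python
from dataclasses import dataclass
from decimal import Decimal

# Simplified assumption: only these datatypes are treated as mutually compatible.
NUMERIC = {"xsd:int", "xsd:integer", "xsd:long", "xsd:decimal"}


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = "xsd:string"


def strict_in(value, candidates):
    """Model of `?var in (...)`: lexical form and datatype must both be identical."""
    return value in candidates


def equals(value, other):
    """Model of `?var = value`: numeric literals compare by value across datatypes."""
    if value.datatype in NUMERIC and other.datatype in NUMERIC:
        return Decimal(value.lexical) == Decimal(other.lexical)
    return value == other


height = Literal("150", "xsd:int")
print(equals(height, Literal("150", "xsd:long")))       # True
print(equals(height, Literal("150")))                   # False
print(strict_in(height, [Literal("150", "xsd:long")]))  # False
```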

Filter modifiers

In addition to the operators, there are some constructions that can be used to write filters based not on the values of a field but on values related to them:

Accessing the previous element in the chain

The construction parent(?var) is used for going to a previous level in a property chain. It can be applied recursively as many times as needed, e.g., parent(parent(parent(?var))) goes back in the chain three times. The effective value of parent(?var) can be used with the in or not in operator like this: parent(?company) in (<urn:a>, <urn:b>), or in the bound operator like this: bound(parent(?var)).

Accessing an element beyond the chain

The construction ?var -> uri (alternatively, ?var o uri or just ?var uri) is used to access additional values that are accessible through the property uri. In essence, this construction corresponds to the triple pattern ?value uri ?effectiveValue, where ?value is a value bound by the field var. The effective value of ?var -> uri can be used with the in or not in operator like this: ?company -> rdf:type in (<urn:c>, <urn:d>). It can be combined with parent() like this: parent(?company) -> rdf:type in (<urn:c>, <urn:d>). The same construction can be applied to the bound operator like this: bound(?company -> <urn:hasBranch>), or even combined with parent() like this: bound(parent(?company) -> <urn:hasGroup>).

The IRI parameter can be a full IRI within < > or the special string rdf:type (alternatively, just type), which will be expanded to http://www.w3.org/1999/02/22-rdf-syntax-ns#type.

Filtering by RDF graph

The construction graph(?var) is used for accessing the RDF graph of a field’s value. A typical use case is to sync only explicit values: graph(?a) not in (<http://www.ontotext.com/implicit>) but using isExplicit(?a) is the recommended way.

The construction can be combined with parent() like this: graph(parent(?a)) in (<urn:a>).

Filtering by language tags

The construction lang(?var) is used for accessing the language tag of a field’s value (only RDF literals can have a language tag). The typical use case is to sync only values written in a given language: lang(?a) in ("de", "it", "no"). The construction can be combined with parent() and an element beyond the chain like this: lang(parent(?a) -> <http://www.w3.org/2000/01/rdf-schema#label>) in ("en", "bg"). Literal values without language tags can be filtered by using an empty tag: "".

Current context variable $this

The special field variable $this (and not ?this, ?$this, $?this) is used to refer to the current context. In the top-level value filter and the top-level document filter, it refers to the document. In the per-field value filter, it refers to the currently filtered field value. In the nested document filter, it refers to the nested document.

ALL() quantifier

In the context of document-level filtering, a match is true if at least one of potentially many field values matches, e.g., ?location = <urn:Europe> would return true if the document contains { "location": ["<urn:Asia>", "<urn:Europe>"] }.

In addition to this, you can also use the ALL() quantifier when you need all values to match, e.g., ALL(?location) = <urn:Europe> would not match with the above document because <urn:Asia> does not match.
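The two matching modes can be compared side by side in a short Python sketch. The function names are hypothetical, and the behavior for a field with no values at all is not modeled here.

```python
def any_match(values, predicate):
    """Default document-level semantics: true if at least one value matches."""
    return any(predicate(value) for value in values)


def all_match(values, predicate):
    """ALL() semantics: true only if every value matches."""
    return all(predicate(value) for value in values)


def is_europe(value):
    return value == "<urn:Europe>"


document = {"location": ["<urn:Asia>", "<urn:Europe>"]}

print(any_match(document["location"], is_europe))  # True
print(all_match(document["location"], is_europe))  # False
```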

Entity filters and default values

Entity filters can be combined with default values in order to get more flexible behavior.

If a field has no values in the RDF database, the defaultValue is used. But if a field has some values, defaultValue is NOT used, even if all values are filtered out. See an example in Basic entity filter.

A typical use-case for an entity filter is having soft deletes, i.e., instead of deleting an entity, it is marked as deleted by the presence of a specific value for a given property.

Two-variable filtering

Besides comparing a field value to one or more constants or running an existential check on the field value, some use cases also require comparing the field value to the value of another field in order to produce the desired result. GraphDB solves this by supporting two-variable filtering in the per-field value filter, the top-level document filter, and the nested document filter.

Note

This type of filtering is not possible in the top-level value filter because the only variable that is available there is $this.

In the top-level document filter and the nested document filter, there are no restrictions as all values are available at the time of evaluation.

In the per-field value filter, two-variable filtering will reorder the defined fields such that values for other fields are already available when the current field’s filter is evaluated. For example, let’s say we defined a filter $this > ?salary for the field price. This will force the connector to process the field salary first, apply its per-field value filter if any, and only then start collecting and filtering the values for the field price.

Cyclic dependencies will be detected and reported as an invalid filter. For example, if in addition to the above we define a per-field value filter ?price > "1000"^^xsd:int for the field salary, a cyclic dependency will be detected as both price and salary will require the other field being indexed first.
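The reordering and cycle detection described above amount to a topological sort over the fields. The Python sketch below illustrates the idea with the standard library's graphlib module (Python 3.9 or later); the function field_processing_order is hypothetical and says nothing about how the connector implements this internally.

```python
from graphlib import CycleError, TopologicalSorter


def field_processing_order(dependencies):
    """Order fields so that every field comes after the fields its filter refers to.

    `dependencies` maps a field name to the set of other fields used in its
    per-field value filter.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as error:
        # The second element of args holds the detected cycle.
        raise ValueError(f"invalid filter, cyclic dependency: {error.args[1]}") from error


# The filter "$this > ?salary" on price makes price depend on salary.
print(field_processing_order({"price": {"salary"}, "salary": set()}))  # ['salary', 'price']
```

Passing {"price": {"salary"}, "salary": {"price"}} to the same function raises ValueError, which mirrors the invalid filter reported for the cyclic case.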

Basic entity filter example

Given the following RDF data:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix example: <http://www.ontotext.com/example#> .

# the entity below will be synchronized because it has a matching value for city: ?city in ("London")
example:alpha
    rdf:type example:gadget ;
    example:name "John Synced" ;
    example:city "London" .

# the entity below will not be synchronized because it lacks the property completely: bound(?city)
example:beta
    rdf:type example:gadget ;
    example:name "Peter Syncfree" .

# the entity below will not be synchronized because it has a different city value:
# ?city in ("London") will remove the value "Liverpool" so bound(?city) will be false
example:gamma
    rdf:type example:gadget ;
    example:name "Mary Syncless" ;
    example:city "Liverpool" .

If you create a connector instance such as:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

INSERT DATA {
  retr-index:my_index retr:createConnector '''
    {
      "retrievalUrl": "http://localhost:8000",
      "retrievalBearerToken": "<replace-this-with-actual-value>",
      "types": ["http://www.ontotext.com/example#gadget"],
      "fields": [
         {
           "fieldName": "name",
           "propertyChain": ["http://www.ontotext.com/example#name"]
         },
         {
           "fieldName": "city",
           "propertyChain": ["http://www.ontotext.com/example#city"],
           "valueFilter": "$this = \\"London\\""
         }
      ],
      "documentFilter": "bound(?city)"
    }
  ''' .
}

The entity :beta is not synchronized as it has no value for city.

To handle such cases, you can modify the connector configuration to specify a default value for city:

...
         {
           "fieldName": "city",
           "propertyChain": ["http://www.ontotext.com/example#city"],
           "defaultValue": "London"
         }
...
}

The default value is used for the entity :beta as it has no value for city in the repository. As the value is “London”, the entity is synchronized.
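The interaction between the default value, the per-field value filter, and the document filter can be traced with a small Python model of the three entities above. It assumes that the valueFilter from the first configuration is kept alongside defaultValue and that the default value also passes through the filter; the function names are hypothetical.

```python
def sync_city(repository_values, default_value=None):
    """Return the values of the city field after defaulting and filtering."""
    values = list(repository_values)
    # The default value is used only when the repository has no values at all.
    if not values and default_value is not None:
        values = [default_value]
    # Per-field value filter: $this = "London"
    return [value for value in values if value == "London"]


def is_synchronized(repository_values, default_value=None):
    """Document filter bound(?city): at least one value must survive."""
    return bool(sync_city(repository_values, default_value))


entities = {"alpha": ["London"], "beta": [], "gamma": ["Liverpool"]}
for name, cities in entities.items():
    print(name, is_synchronized(cities), is_synchronized(cities, "London"))
```

In this model, gamma stays unsynchronized even with a default value, because it has a value in the repository that is filtered out, so the default is never applied.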

Advanced entity filter example

Sometimes, data represented in RDF does not map directly to a non-RDF model. For example, if you have news articles and they can be tagged with different concepts (locations, persons, events, etc.), one possible way to model this is a single property :taggedWith. Consider the following RDF data:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix example2: <http://www.ontotext.com/example2#> .

example2:Berlin
    rdf:type example2:Location ;
    rdfs:label "Berlin" .

example2:Mozart
    rdf:type example2:Person ;
    rdfs:label "Wolfgang Amadeus Mozart" .

example2:Einstein
    rdf:type example2:Person ;
    rdfs:label "Albert Einstein" .

example2:Cannes-FF
    rdf:type example2:Event ;
    rdfs:label "Cannes Film Festival" .

example2:Article1
    rdf:type example2:Article ;
    rdfs:comment "An article about a film about Einstein's life while he was a professor in Berlin." ;
    example2:taggedWith example2:Berlin ;
    example2:taggedWith example2:Einstein ;
    example2:taggedWith example2:Cannes-FF .

example2:Article2
    rdf:type example2:Article ;
    rdfs:comment "An article about Berlin." ;
    example2:taggedWith example2:Berlin .

example2:Article3
    rdf:type example2:Article ;
    rdfs:comment "An article about Mozart's life." ;
    example2:taggedWith example2:Mozart .

example2:Article4
    rdf:type example2:Article ;
    rdfs:comment "An article about classical music in Berlin." ;
    example2:taggedWith example2:Berlin ;
    example2:taggedWith example2:Mozart .

example2:Article5
    rdf:type example2:Article ;
    rdfs:comment "A boring article that has no tags." .

example2:Article6
    rdf:type example2:Article ;
    rdfs:comment "An article about the Cannes Film Festival in 2013." ;
    example2:taggedWith example2:Cannes-FF .

Assume you want to map this data to the ChatGPT Retrieval Plugin, so that the property example2:taggedWith x is mapped to separate fields taggedWithPerson and taggedWithLocation, according to the type of x (whereas we are not interested in Events). You can map taggedWith twice to different fields and then use an entity filter to get the desired values:

PREFIX retr: <http://www.ontotext.com/connectors/retrieval#>
PREFIX retr-index: <http://www.ontotext.com/connectors/retrieval/instance#>

INSERT DATA {
  retr-index:my_index retr:createConnector '''
    {
      "retrievalUrl": "http://localhost:8000",
      "retrievalBearerToken": "<replace-this-with-actual-value>",
      "types": ["http://www.ontotext.com/example2#Article"],
      "fields": [
         {
            "fieldName": "comment",
            "propertyChain": ["http://www.w3.org/2000/01/rdf-schema#comment"]
         },
         {
           "fieldName": "taggedWithPerson",
           "propertyChain": ["http://www.ontotext.com/example2#taggedWith"],
           "valueFilter": "$this -> type = <http://www.ontotext.com/example2#Person>"
         },
         {
           "fieldName": "taggedWithLocation",
           "propertyChain": ["http://www.ontotext.com/example2#taggedWith"],
           "valueFilter": "$this -> type = <http://www.ontotext.com/example2#Location>"
         }
      ]
    }
  ''' .
}

Note

type is the short way to write <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>.

The six articles in the RDF data above will be mapped as such:

  • :Article1: taggedWithPerson is :Einstein, taggedWithLocation is :Berlin. :taggedWith has the values :Einstein, :Berlin and :Cannes-FF. The filter leaves only the correct values in the respective fields. The value :Cannes-FF is ignored as it does not match the filter.

  • :Article2: taggedWithPerson is empty, taggedWithLocation is :Berlin. :taggedWith has the value :Berlin. After the filter is applied, only taggedWithLocation is populated.

  • :Article3: taggedWithPerson is :Mozart, taggedWithLocation is empty. :taggedWith has the value :Mozart. After the filter is applied, only taggedWithPerson is populated.

  • :Article4: taggedWithPerson is :Mozart, taggedWithLocation is :Berlin. :taggedWith has the values :Berlin and :Mozart. The filter leaves only the correct values in the respective fields.

  • :Article5: both fields are empty. :taggedWith has no values. The filter is not relevant.

  • :Article6: both fields are empty. :taggedWith has the value :Cannes-FF. The filter removes it as it does not match.
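The mapping can be reproduced with a short Python model of the two value filters. The dictionaries restate the example RDF data; the function filter_by_type is a hypothetical stand-in for the filter $this -> type = <...> and does not query GraphDB.

```python
# rdf:type of each tag, taken from the example data above.
TAG_TYPES = {
    "Berlin": "Location",
    "Mozart": "Person",
    "Einstein": "Person",
    "Cannes-FF": "Event",
}

ARTICLE_TAGS = {
    "Article1": ["Berlin", "Einstein", "Cannes-FF"],
    "Article2": ["Berlin"],
    "Article3": ["Mozart"],
    "Article4": ["Berlin", "Mozart"],
    "Article5": [],
    "Article6": ["Cannes-FF"],
}


def filter_by_type(tags, wanted_type):
    """Keep only the tags whose rdf:type equals the wanted type."""
    return [tag for tag in tags if TAG_TYPES[tag] == wanted_type]


for article, tags in ARTICLE_TAGS.items():
    print(article, filter_by_type(tags, "Person"), filter_by_type(tags, "Location"))
```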

Text document assembly

The natural language text document to pass to ChatGPT is assembled from the defined connector fields with the following steps:

  1. The subject field value, followed by a colon and a new line, starts a series of statements.

  2. Regular fields (not Helper field mappings) are appended next with their values in the format:

    - has <normalized field name> <value>.

    Nested fields are represented as - has <normalized field name>: followed by each field within the nested field on a new line with additional indentation.

    The has <normalized field name> part is configurable via the connector’s fieldTextPrefix option.

  3. Finally, each value of the text field is appended on a new line.

A generic text document will look like this:

<subject-field-value>:
- has <field-name1> <value>.
- has <field-name1> <value>.
- has <field-name2> <value>.
- has <nested-field-name1>:
    - has <inner-field-name1> <value>;
    - has <inner-field-name2> <value>.
...

<text-field-value1>

<text-field-value2>

...

See the wines connector example for a text document constructed from actual data.
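The assembly steps can be sketched in Python as follows. This is a simplified model under stated assumptions: nested fields and field name normalization are not modeled, the default "has" prefix is hard-coded, a blank line separates the statements from the text values as in the generic layout above, and the sample values are hypothetical.

```python
def assemble_text_document(subject, fields, texts):
    """Assemble a text document from a subject, regular field values, and text values.

    `fields` is a list of (field name, value) pairs; nested fields are not modeled.
    """
    # Step 1: the subject value followed by a colon starts the statements.
    lines = [f"{subject}:"]
    # Step 2: one statement per regular field value.
    lines += [f"- has {name} {value}." for name, value in fields]
    document = "\n".join(lines)
    # Step 3: each value of the text field is appended after the statements.
    for text in texts:
        document += "\n\n" + text
    return document


print(assemble_text_document(
    "Franvino",
    [("grape", "Merlot"), ("grape", "Cabernet Franc")],
    ["A hypothetical tasting note."],
))
```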

Overview of connector predicates

The following diagram shows a summary of all predicates that can administer (create, drop, check status) connector instances or issue queries and retrieve results. It can be used as a quick reference of what a particular predicate needs to be attached to. For example, to retrieve entities, you need to use :entities on a search instance and to retrieve snippets (chunks and metadata), you need to use :snippets on an entity. Variables that are bound as a result of a query are shown in green, blank helper nodes are shown in blue, literals in red, and IRIs in orange. The predicates are represented by labeled arrows.

scale 0.85
left to right direction

skinparam activity {
  BackgroundColor<<BNode>> #D1E0FF
  BackgroundColor<<Var>> #D1FFD1
  BackgroundColor<<IRI>> #FFCC80
  BackgroundColor #FFE3E3
}

partition "Instance level" {
  "instance IRI" <<IRI>> -->[:createConnector] "JSON params"
  "instance IRI" -->[:dropConnector] "blank node\n or dummy value"
  "instance IRI" -->[:repairConnector] "blank node\n or dummy value"
  "instance IRI" -->[:connectorStatus] "?status" <<Var>>
  "_:search" <<BNode>> -->[rdf:type] "instance IRI"
}

partition "Search level: query and options" {
  "_:search" -->[:query] "query value"
  "_:search" -->[:limit] "limit value"
}

partition "Search level: results" {
  "_:search" -->[:entities] "?entity" <<Var>>
}

partition "Entity level" {
  "?entity" -->[:score] "?score" <<Var>>
  "?entity" -->[:snippets] "_:snippet" <<BNode>>
}

partition "Snippet level" {
  "_:snippet" -->[:snippetField] "?snippetField" <<Var>>
  "_:snippet" -->[:snippetText] "?snippetText" <<Var>>
}

Caveats

Order of control

Even though SPARQL per se is not sensitive to the order of triple patterns, the ChatGPT Retrieval GraphDB Connector expects to receive certain predicates before others so that queries can be executed properly. In particular, predicates that specify the query or query options need to come before any predicates that fetch results.

The diagram in Overview of connector predicates provides a quick overview of the predicates.