The Bio2KG Registry is a repository of dataset descriptions, including preferred CURIE prefixes, base URIs, identifier regex patterns and HTML resolvers for biomedical datasets.
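To illustrate how a prefix, base URI and identifier pattern fit together, here is a minimal Python sketch that expands a CURIE into a full URI. The RegistryEntry class, its field names and the example entry are assumptions made up for this illustration; they are not the registry's actual schema or data.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical shape of one dataset description (field names are assumptions)."""
    preferred_prefix: str
    base_uri: str
    id_pattern: str  # regex that a local identifier must match in full


def expand_curie(curie: str, entries: dict[str, RegistryEntry]) -> str:
    """Turn 'prefix:local_id' into a full URI, validating the local identifier."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    entry = entries.get(prefix.lower())
    if entry is None:
        raise KeyError(f"unknown prefix: {prefix!r}")
    if re.fullmatch(entry.id_pattern, local_id) is None:
        raise ValueError(f"{local_id!r} does not match {entry.id_pattern!r}")
    return entry.base_uri + local_id


# Made-up entry for illustration only; not copied from the registry.
EXAMPLE_ENTRIES = {
    "example": RegistryEntry("example", "http://example.org/dataset/", r"\d+"),
}
```

With this made-up entry, expand_curie("example:123", EXAMPLE_ENTRIES) returns http://example.org/dataset/123, while an identifier such as example:abc is rejected because it does not match the pattern.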
The registry is constructed from a manually curated spreadsheet.
- Extract data from the Life Science Registry spreadsheet on Google Docs
- Load to ElasticSearch (deployed with the docker-compose.yml file)
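The two steps above can be sketched roughly as follows, assuming the spreadsheet has been exported as CSV text. This is not the actual script in the etl folder: the function name and the Title column are hypothetical, and only the prefixes index name and the Preferred Prefix field are taken from the search example in this document. The output follows the newline-delimited format expected by the ElasticSearch _bulk endpoint.

```python
import csv
import io
import json


def rows_to_bulk_payload(csv_text: str, index: str = "prefixes") -> str:
    """Convert CSV text into an ElasticSearch _bulk request body (NDJSON)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only non-empty string cells so blanks are not indexed.
        doc = {
            key: value.strip()
            for key, value in row.items()
            if isinstance(value, str) and value.strip()
        }
        if not doc:
            continue  # skip rows with no usable cells
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

The resulting string could then be sent to the _bulk endpoint of the local ElasticSearch instance with the Content-Type header set to application/x-ndjson.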
Access the Bio2KG registry web application and API endpoints:
- Search website: https://registry.bio2kg.org
- GraphQL API: https://registry.bio2kg.org/graphql
- ElasticSearch API: https://elastic.registry.bio2kg.org/_search
Search with cURL:
curl -XGET --header 'Content-Type: application/json' https://elastic.registry.bio2kg.org/prefixes/_search -d '{
"query" : {
"match" : { "Preferred Prefix": "bio" }
}
}'
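The same search can be sketched in Python with only the standard library. The function names are hypothetical; the URL, the GET method and the query body mirror the cURL command above, and the response parsing assumes the standard ElasticSearch hits layout.

```python
import json
import urllib.request

SEARCH_URL = "https://elastic.registry.bio2kg.org/prefixes/_search"


def build_search_request(prefix: str, url: str = SEARCH_URL) -> urllib.request.Request:
    """Build the same match query as the cURL command, without sending it."""
    body = {"query": {"match": {"Preferred Prefix": prefix}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",  # mirrors curl -XGET with a request body
    )


def search(prefix: str) -> list[dict]:
    """Send the query and return the source document of each hit."""
    with urllib.request.urlopen(build_search_request(prefix), timeout=10) as response:
        result = json.load(response)
    return [hit["_source"] for hit in result["hits"]["hits"]]
```

Calling search("bio") sends the request to the public endpoint and returns the matching registry entries as a list of dictionaries.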
The process to prepare the ElasticSearch index for the Life Science Registry runs in a docker container defined in the etl folder.
To update a running docker-compose stack:
docker-compose run update-pipeline
- Prepare the permissions for the shared volume so that ElasticSearch data persists:
sudo mkdir -p /data/bio2kg/registry/elasticsearch
sudo chmod g+rwx -R /data/bio2kg/registry/elasticsearch
sudo chgrp 1000 -R /data/bio2kg/registry/elasticsearch
sudo chown 1000 -R /data/bio2kg/registry/elasticsearch
- Start the stack with docker-compose:
docker-compose up -d
- Search UI frontend on http://localhost:3000
- GraphQL server on http://localhost:4000/graphql
- ElasticSearch on http://localhost:9200
🔁 If you want to update the ElasticSearch endpoint data without stopping the stack, you can run this:
docker-compose run update-pipeline
The stack deploys:
- An ElasticSearch instance behind an nginx proxy that lets anyone access the /_search endpoint but prevents editing (configuration defined in the elasticsearch folder), on http://localhost:9200
- A NodeJS server using Searchkit and NextJS, defined in the website folder, on http://localhost:3000
  - Apollo GraphQL endpoint serving data from ElasticSearch on /graphql
  - A React website using SearchKit to search the data on the base URL (/)
  - 🚧 In development: a Sofa API to publish an OpenAPI endpoint based on the GraphQL endpoint
    - API on /api
    - Swagger UI on /apidocs
You can also start just the website with yarn, without docker; it will get its data from the production ElasticSearch:
cd website
yarn
yarn dev
Start the stack with production config, using nginx-proxy to route the services:
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
Install the Linked Data Platform in the Virtuoso triplestore (running via docker-compose):
./prepare_virtuoso_ldp.sh
🔁 If you want to update the ElasticSearch endpoint data without stopping the stack, you can run this:
docker-compose run update-pipeline
To add a new field to the Bio2KG registry, check the following files:
- In the etl folder Python script: get the field from the spreadsheet, process it, and add the field value to ElasticSearch
- In website/pages/api/graphql.tsx: add the field in the entry fields to register this field in the GraphQL server query
- In website/components/index.tsx: add the field to the GraphQL query used by the UI to retrieve data
- In website/components/searchkit/Hits.tsx: add the field to the UI
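As a rough illustration of the first step (the ETL side only), the sketch below shows one way a spreadsheet column could be mapped to an index field. It is not the actual code in the etl folder: FIELD_MAP, row_to_document and the Keywords column are hypothetical, and the real script may structure this differently.

```python
from typing import Callable

# Hypothetical mapping: spreadsheet column -> (index field, cleanup function).
# Under this design, adding a field to the ETL step means adding one entry here.
FIELD_MAP: dict[str, tuple[str, Callable[[str], object]]] = {
    "Preferred Prefix": ("Preferred Prefix", str.strip),
    # Made-up multi-valued column, split on commas into a list of strings.
    "Keywords": (
        "Keywords",
        lambda cell: [k.strip() for k in cell.split(",") if k.strip()],
    ),
}


def row_to_document(row: dict[str, str]) -> dict[str, object]:
    """Build the document to index from one spreadsheet row."""
    doc: dict[str, object] = {}
    for column, (field, clean) in FIELD_MAP.items():
        cell = row.get(column, "")
        # Skip missing or blank cells instead of indexing empty values.
        if cell and cell.strip():
            doc[field] = clean(cell)
    return doc
```

Columns that are not listed in the mapping are ignored, so a new spreadsheet column only reaches ElasticSearch once it has been registered; the GraphQL and UI files listed above still need their own changes.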