Bio2KG Registry and APIs

The Bio2KG Registry is a repository of dataset descriptions for biomedical datasets, including preferred CURIE prefixes, base URIs, identifier regex patterns, and HTML resolvers. The web application exposes GraphQL, REST, and Elasticsearch DSL APIs.

The registry is constructed from a manually curated spreadsheet.

  1. Extract the data from the Life Science Registry spreadsheet on Google Docs
  2. Load it into ElasticSearch (deployed with the docker-compose.yml file), as sketched below
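A minimal sketch of this extract-and-load step, assuming a local CSV export of the spreadsheet and the prefixes index name used by the public search endpoint; the actual pipeline lives in the etl folder and may differ:

import csv
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")

def registry_rows(path):
    # Each spreadsheet row becomes one document in the "prefixes" index.
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield {"_index": "prefixes", "_source": row}

# life_science_registry.csv is a hypothetical local export of the Google spreadsheet.
bulk(es, registry_rows("life_science_registry.csv"))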

Access the Bio2KG registry web application and API endpoints:

Search with cURL:

curl -XGET --header 'Content-Type: application/json' https://elastic.registry.bio2kg.org/prefixes/_search -d '{
    "query" : {
        "match" : { "Preferred Prefix": "bio" }
    }
}'
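The same search can be sent from Python with the requests library (not part of this repository); a minimal sketch:

import requests

# Same match query as the cURL example above.
resp = requests.get(
    "https://elastic.registry.bio2kg.org/prefixes/_search",
    json={"query": {"match": {"Preferred Prefix": "bio"}}},
)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"].get("Preferred Prefix"))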

Update the Life Science Registry 🐍

The process that prepares the ElasticSearch index for the Life Science Registry runs in a Docker container defined in the etl folder.

To update a running docker-compose stack:

docker-compose run update-pipeline

Deploy with Docker 🐳

Locally for development

  1. Prepare the permissions for the shared volume so the ElasticSearch data persists:
sudo mkdir -p /data/bio2kg/registry/elasticsearch
sudo chmod g+rwx -R /data/bio2kg/registry/elasticsearch
sudo chgrp 1000 -R /data/bio2kg/registry/elasticsearch
sudo chown 1000 -R /data/bio2kg/registry/elasticsearch
  2. Start the stack with docker-compose:
docker-compose up -d

🔁 If you want to update the ElasticSearch endpoint data without stopping the stack, you can run this:

docker-compose run update-pipeline

The stack deploys:

  • An ElasticSearch instance behind an nginx proxy that lets anyone access the /_search endpoint but prevents editing (see the check sketched after this list); the configuration is defined in the elasticsearch folder, exposed on http://localhost:9200
  • A NodeJS server using Searchkit and NextJS, defined in the website folder, on http://localhost:3000
    • Apollo GraphQL endpoint serving data from ElasticSearch on /graphql
    • A React website using SearchKit to search the data on the base URL (/)
    • 🚧 In development: a Sofa API to publish an OpenAPI endpoint based on the GraphQL endpoint
      • API on /api
      • Swagger UI on /apidocs
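Once the stack is up, the read-only behaviour of the proxy can be checked from Python; a sketch assuming the default local port above (the exact status code returned for rejected writes depends on the nginx configuration):

import requests

base = "http://localhost:9200/prefixes"

# Reading through /_search is allowed by the nginx proxy...
print(requests.get(f"{base}/_search", json={"query": {"match_all": {}}}).status_code)

# ...while write requests should be blocked by the proxy.
print(requests.post(f"{base}/_doc", json={"Preferred Prefix": "test"}).status_code)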

You can also start just the website with yarn, without Docker; it will get its data from the production ElasticSearch:

cd website
yarn
yarn dev

Deploy in production

Start the stack with production config, using nginx-proxy to route the services:

docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d

Install the Linked Data Platform in the Virtuoso triplestore (running via docker-compose):

./prepare_virtuoso_ldp.sh

🔁 If you want to update the ElasticSearch endpoint data without stopping the stack, you can run this:

docker-compose run update-pipeline

Add a field in the registry

To add a new field to the Bio2KG registry, check the following files:

  1. In the Python script in the etl folder: get the field from the spreadsheet, process it, and add its value to ElasticSearch (see the sketch below)
  2. In website/pages/api/graphql.tsx: add the field to the entry fields to register it in the GraphQL server query
  3. In website/components/index.tsx: add the field to the GraphQL query used by the UI to retrieve data.
  4. In website/components/searchkit/Hits.tsx: add the field to the UI

⚠️ Make sure the id you use for the field is exactly the same everywhere (it is case-sensitive)!
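As an illustration of step 1, a hedged sketch of the kind of change needed in the etl script; the field id Example Field and the build_document helper are hypothetical, so adapt them to the actual code:

# The field id must match, character for character, the id used in
# website/pages/api/graphql.tsx, website/components/index.tsx and
# website/components/searchkit/Hits.tsx.
FIELD_ID = "Example Field"

def build_document(row):
    doc = {
        "Preferred Prefix": row.get("Preferred Prefix", "").strip(),
        # ... existing fields ...
    }
    # New field: read it from the spreadsheet row, process it, and include it
    # in the document that gets indexed into ElasticSearch.
    doc[FIELD_ID] = row.get(FIELD_ID, "").strip()
    return doc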
