PLoS Comput Biol. 2018 Jan 29;14(1):e1005968.
doi: 10.1371/journal.pcbi.1005968. eCollection 2018 Jan.

Reactome graph database: Efficient access to complex pathway data



Antonio Fabregat et al. PLoS Comput Biol.

Abstract

Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high-quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency, because queries that traverse highly interconnected data perform poorly in relational databases. The same data stored in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j), as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data through object-oriented queries, and also supports more complex queries that take advantage of the new graph-based data storage. By adopting graph database technology, we are providing a high-performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
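Because the ContentService is an ordinary REST API, any HTTP client can call it. The Python sketch below uses only the standard library and assumes the public base URL https://reactome.org/ContentService and a `/data/query/{id}` lookup path; both are assumptions to verify against the service's own documentation, and the identifier in the usage line is hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL and lookup path; check both against the ContentService documentation.
CONTENT_SERVICE = "https://reactome.org/ContentService"


def query_url(identifier: str) -> str:
    """Build the URL that looks up one database object by its identifier."""
    return f"{CONTENT_SERVICE}/data/query/{urllib.parse.quote(identifier, safe='')}"


def fetch_object(identifier: str, timeout: float = 10.0) -> dict:
    """Fetch one object as JSON (needs network access)."""
    request = urllib.request.Request(
        query_url(identifier), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical identifier; only the URL is printed, nothing is fetched.
    print(query_url("R-HSA-123"))
```

Keeping URL construction separate from the network call makes the request easy to inspect or log before any data is fetched.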


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. A simplified example in which reactions contain only reactants and products, represented by the class PhysicalEntity.
(a) In the relational use case, two junction tables are required to model these many-to-many relationships. (b) SQL query used to retrieve the input and output entities of a given reaction; two join operations are needed per junction table. (c) The same reaction modelled as a graph. The reaction (green node) has named outgoing relationships to its input and output entities (purple nodes). (d) The same query written in Cypher, in a shorter and more intuitive form.
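The contrast drawn in Fig 1 can be reproduced in a few lines of Python. This is a minimal sketch, not the Reactome schema: the table names (Reaction_input, Reaction_output), the toy data and the in-memory map standing in for the graph are all hypothetical. The SQL follows the pattern of panel (b), with two joins per junction table, while the graph version simply reads the reaction's named relationships, as the Cypher of panel (d) does.

```python
import sqlite3

# Fig 1a: one table per class plus two junction tables for the many-to-many
# input and output relationships (hypothetical names and toy data).
SCHEMA = """
CREATE TABLE Reaction (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE PhysicalEntity (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Reaction_input (reaction_id INTEGER, entity_id INTEGER);
CREATE TABLE Reaction_output (reaction_id INTEGER, entity_id INTEGER);
INSERT INTO Reaction VALUES (1, 'R1');
INSERT INTO PhysicalEntity VALUES (10, 'A'), (11, 'B'), (12, 'C');
INSERT INTO Reaction_input VALUES (1, 10), (1, 11);
INSERT INTO Reaction_output VALUES (1, 12);
"""

# Fig 1b pattern: each junction table costs two joins (Reaction -> junction -> entity).
SQL = """
SELECT 'input', pe.name FROM Reaction r
  JOIN Reaction_input ri ON ri.reaction_id = r.id
  JOIN PhysicalEntity pe ON pe.id = ri.entity_id
 WHERE r.name = :name
UNION ALL
SELECT 'output', pe.name FROM Reaction r
  JOIN Reaction_output ro ON ro.reaction_id = r.id
  JOIN PhysicalEntity pe ON pe.id = ro.entity_id
 WHERE r.name = :name
"""


def relational_query(con: sqlite3.Connection, name: str) -> list[tuple[str, str]]:
    """Return sorted (relationship, entity name) pairs via the join query."""
    return sorted(con.execute(SQL, {"name": name}).fetchall())


# Fig 1c: each reaction node keeps its named outgoing relationships, so no
# junction tables are needed. Roughly equivalent Cypher (labels illustrative):
#   MATCH (r:Reaction {name: $name})-[rel:input|output]->(pe:PhysicalEntity)
#   RETURN type(rel), pe.name
GRAPH = {"R1": {"input": ["A", "B"], "output": ["C"]}}


def graph_query(name: str) -> list[tuple[str, str]]:
    """Return sorted (relationship, entity name) pairs by reading the node's edges."""
    return sorted((rel, pe) for rel, targets in GRAPH[name].items() for pe in targets)


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    print(relational_query(con, "R1"))
    print(graph_query("R1"))
```

Both functions return the same pairs; the difference is that the relational version must rebuild each relationship through joins at query time, whereas the graph version follows relationships already stored with the node.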
Fig 2. Representation of the content migration.
The example shows a Reaction class reduced to its inputs, outputs, catalyst and regulators. A model class instance is converted to a graph database node in which (1) slots with primitive value types become node properties and (2) slots holding instances of another class become relationships.
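The two conversion rules of Fig 2 can be sketched as follows, assuming hypothetical dataclasses in place of Reactome's Domain Model classes. Slots holding primitive values (or lists of them) become node properties, and slots holding other model instances become relationships named after the slot. Unset or empty slots are skipped here; how the real batch importer treats them, or relationship attributes such as stoichiometry, is not covered by this sketch.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional

PRIMITIVES = (str, int, float, bool)


# Hypothetical, trimmed-down model classes echoing Fig 2.
@dataclass
class PhysicalEntity:
    db_id: int
    name: str


@dataclass
class Reaction:
    db_id: int
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    catalyst: Optional[Any] = None
    regulators: list = field(default_factory=list)


def is_primitive(value: Any) -> bool:
    """True for a primitive value or a list made only of primitive values."""
    if isinstance(value, list):
        return all(isinstance(v, PRIMITIVES) for v in value)
    return isinstance(value, PRIMITIVES)


def to_graph(obj: Any) -> tuple[dict, list[tuple[int, str, int]]]:
    """Convert one model instance into a node and its outgoing relationships."""
    properties, relationships = {}, []
    for slot in fields(obj):
        value = getattr(obj, slot.name)
        if value is None or value == []:
            continue  # unset slots yield neither a property nor a relationship
        if is_primitive(value):
            properties[slot.name] = value  # rule (1): slot -> node property
        else:
            targets = value if isinstance(value, list) else [value]
            for target in targets:
                # rule (2): slot -> relationship named after the slot
                relationships.append((obj.db_id, slot.name, target.db_id))
    node = {"label": type(obj).__name__, "properties": properties}
    return node, relationships


if __name__ == "__main__":
    a, b, c, e = (PhysicalEntity(i, n) for i, n in [(1, "A"), (2, "B"), (3, "C"), (4, "E")])
    print(to_graph(Reaction(10, "R1", inputs=[a, b], outputs=[c], catalyst=e)))
```

Deriving the split from the class definition, rather than hand-writing it per class, is what lets one generic importer handle every class in a model.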
Fig 3. A schematic diagram of the new ecosystem.
The relational database is converted to a graph database by the batch importer, which relies on the Domain Model. Spring Data Neo4j and AspectJ are the two main pillars of the graph-core, which also rests on the Domain Model. Users access services, or use tools that call the graph-core directly as a library; this library eliminates boilerplate code for data retrieval and provides a data persistence mechanism. Finally, export tools use Cypher to generate flat mapping files.
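The last step of Fig 3, generating flat mapping files, amounts to turning query results into delimited rows. The sketch below assumes the Cypher query has already returned (pathway, participant identifiers) pairs, represented here by a hypothetical dictionary; the identifiers and the column layout (identifier, pathway stable ID, pathway name) are illustrative and not necessarily the format of Reactome's published files.

```python
import csv
import io

# Hypothetical query result: (pathway stable ID, pathway name) -> identifiers of
# participating molecules. In the Fig 3 pipeline a Cypher query would supply this.
PARTICIPANTS = {
    ("R-HSA-0001", "Pathway A"): ["ACC1", "ACC2", "ACC1"],
    ("R-HSA-0002", "Pathway B"): ["ACC1"],
}


def export_mapping(participants: dict) -> str:
    """Write one tab-separated row per (identifier, pathway) pair, duplicates removed."""
    rows = sorted(
        {
            (identifier, st_id, name)
            for (st_id, name), identifiers in participants.items()
            for identifier in identifiers
        }
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(export_mapping(PARTICIPANTS), end="")
```

Sorting and de-duplicating the rows keeps successive exports diff-friendly, which is convenient when flat files are regenerated for each release.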
Fig 4. Examples of frequent use cases that can be answered using Cypher queries.
a) Retrieving the participating molecules for “Interleukin-4 and 13 signalling” pathway. b) Retrieving the pathways in which CCR5 participates.
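Both questions in Fig 4 are variable-length traversals: (a) follows relationships downwards from a pathway to its molecules, and (b) follows the same relationships in reverse from a molecule to its pathways. The Python sketch below runs a breadth-first search over a hypothetical miniature graph. The relationship names hasEvent, input, output and hasComponent are assumed for illustration, and a real query would also follow catalysts, regulators and set members. A roughly equivalent, simplified Cypher pattern is given in a comment.

```python
from collections import defaultdict, deque

# Hypothetical miniature graph as (source, relationship type, target) triples.
# Roughly equivalent Cypher for the two queries (simplified, labels assumed):
#   (a) MATCH (p:Pathway {displayName: $name})-[:hasEvent*]->(:ReactionLikeEvent)
#             -[:input|output|hasComponent*]->(pe:PhysicalEntity) RETURN DISTINCT pe
#   (b) MATCH (p:Pathway)-[:hasEvent*]->(:ReactionLikeEvent)
#             -[:input|output|hasComponent*]->(:PhysicalEntity {displayName: $molecule})
#       RETURN DISTINCT p
EDGES = [
    ("PathwayX", "hasEvent", "Reaction1"),
    ("PathwayX", "hasEvent", "SubPathway"),
    ("SubPathway", "hasEvent", "Reaction2"),
    ("Reaction1", "input", "ComplexAB"),
    ("Reaction1", "output", "C"),
    ("Reaction2", "input", "CCR5"),
    ("ComplexAB", "hasComponent", "A"),
    ("ComplexAB", "hasComponent", "B"),
]
PATHWAYS = {"PathwayX", "SubPathway"}
MOLECULES = {"A", "B", "C", "CCR5"}
FOLLOW = {"hasEvent", "input", "output", "hasComponent"}

# Adjacency lists in both directions, so either query is a plain traversal.
OUTGOING: dict[str, list[tuple[str, str]]] = defaultdict(list)
INCOMING: dict[str, list[tuple[str, str]]] = defaultdict(list)
for source, rel, target in EDGES:
    OUTGOING[source].append((rel, target))
    INCOMING[target].append((rel, source))


def reachable(start, adjacency, rel_types):
    """Breadth-first search along the given relationship types, any path length."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for rel, neighbour in adjacency.get(node, []):
            if rel in rel_types and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def participating_molecules(pathway):
    """Query (a): molecules reachable downwards from a pathway."""
    return reachable(pathway, OUTGOING, FOLLOW) & MOLECULES


def pathways_for(molecule):
    """Query (b): pathways reachable upwards from a molecule."""
    return reachable(molecule, INCOMING, FOLLOW) & PATHWAYS


if __name__ == "__main__":
    print(sorted(participating_molecules("PathwayX")))
    print(sorted(pathways_for("CCR5")))
```

Storing adjacency in both directions is what makes query (b) as cheap as query (a); Neo4j relationships can likewise be traversed in either direction.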
Fig 5. Comparison of the response and elapsed time for one user sequentially retrieving 5,000 reaction instances from the graph and relational databases (blue and orange, respectively).
The graph database software ecosystem achieved a 93% average improvement in performance compared to that of the relational database.
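A figure such as the 93% reported for Fig 5 is a relative change in response time. The caption does not say whether it is computed from the two means or averaged over per-query reductions, so the sketch below offers both; the timing values in the usage lines are hypothetical and only illustrate the arithmetic, reduction = 100 × (1 − mean(graph) / mean(relational)).

```python
from statistics import mean


def reduction_of_means(baseline: list[float], improved: list[float]) -> float:
    """Percentage drop of the mean time: 100 * (1 - mean(improved) / mean(baseline))."""
    return 100.0 * (1.0 - mean(improved) / mean(baseline))


def mean_of_reductions(baseline: list[float], improved: list[float]) -> float:
    """Average of per-query percentage reductions for paired measurements."""
    if len(baseline) != len(improved):
        raise ValueError("measurements must be paired")
    return mean(100.0 * (1.0 - new / old) for old, new in zip(baseline, improved))


if __name__ == "__main__":
    relational_ms = [100.0, 120.0, 80.0]  # hypothetical timings
    graph_ms = [6.0, 8.0, 7.0]            # hypothetical timings
    print(f"{reduction_of_means(relational_ms, graph_ms):.1f}%")
    print(f"{mean_of_reductions(relational_ms, graph_ms):.1f}%")
```

The two definitions generally give different numbers, so a reported percentage is only comparable across studies when the definition is known.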
Fig 6. Response time versus an increasing number of users simultaneously performing queries for 5,000 reaction instances.
Starting with one user and scaling up to 20 concurrent users, the performance of the relational database drops, while the graph database maintains a low response time and good throughput as the number of active threads increases.
Fig 7. Throughput measured in transactions per second, versus the number of users concurrently performing queries for 5,000 reaction instances in Homo sapiens.
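Figs 6 and 7 report two quantities for a fixed workload under increasing concurrency: mean response time per request and throughput in transactions per second. A generic harness for measuring both is sketched below, assuming a `query` callable that performs one retrieval; `fake_query` is a hypothetical stand-in that only sleeps. This is not the benchmarking tool used for the figures, and Python threads only approximate independent concurrent clients.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(query, users: int, requests_per_user: int) -> tuple[float, float]:
    """Run `users` concurrent workers, each issuing requests one after another.

    Returns (mean response time in seconds, throughput in transactions per second).
    """

    def worker(user_index: int) -> list[float]:
        latencies = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            query()
            latencies.append(time.perf_counter() - start)
        return latencies

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(worker, range(users)))
    elapsed = time.perf_counter() - wall_start
    latencies = [t for user in per_user for t in user]
    return sum(latencies) / len(latencies), len(latencies) / elapsed


def fake_query() -> None:
    time.sleep(0.001)  # hypothetical stand-in for retrieving one reaction instance


if __name__ == "__main__":
    for users in (1, 5, 10, 20):
        mean_rt, tps = run_load(fake_query, users, requests_per_user=20)
        print(f"{users:>2} users: mean response {mean_rt * 1000:.2f} ms, {tps:.0f} tx/s")
```

Throughput is computed from wall-clock time for the whole run rather than from summed latencies, so it rises with concurrency only while the backend keeps per-request response times low, which is the pattern contrasted in Figs 6 and 7.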
Fig 8. The Reactome graph database in numbers.


