BMC Bioinformatics. 2017 Mar 2;18(1):142.
doi: 10.1186/s12859-017-1559-2.

Reactome pathway analysis: a high-performance in-memory approach

Antonio Fabregat et al. BMC Bioinformatics.

Abstract

Background: Reactome aims to provide bioinformatics tools for visualisation, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Pathway analysis methods have a broad range of applications in physiological and biomedical research; from a performance point of view, one of their main problems is the constantly increasing size of the data samples.

Results: Here, we present a new high-performance in-memory implementation of the well-established over-representation analysis method. To achieve this, the method is divided into four steps and, for each of them, specific data structures are used to improve performance and minimise the memory footprint. The first step, finding out whether an identifier in the user's sample corresponds to an entity in Reactome, is addressed using a radix tree as a lookup table. The second step, modelling the proteins, chemicals, their orthologues in other species and their composition in complexes and sets, is addressed with a graph. The third and fourth steps, which aggregate the results and calculate the statistics, are solved with a double-linked tree.
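The first step can be illustrated with a minimal radix tree sketch in Python. This is not the Reactome implementation: the class names, the `insert`/`lookup` methods and the `"entity:..."` payloads are assumptions made for illustration, and only the identifiers themselves are taken from Fig. 1.

```python
from os.path import commonprefix  # longest common prefix of a list of strings


class RadixNode:
    def __init__(self):
        self.children = {}  # edge label (substring) -> RadixNode
        self.value = None   # payload if an identifier ends here, else None


class RadixTree:
    """Hypothetical compressed-trie lookup table (illustrative only)."""

    def __init__(self):
        self.root = RadixNode()

    def insert(self, key, value):
        node = self.root
        while key:
            for label in list(node.children):
                shared = commonprefix([label, key])
                if not shared:
                    continue
                child = node.children.pop(label)
                if shared != label:
                    # Split the edge: keep the shared part, push the rest down.
                    mid = RadixNode()
                    mid.children[label[len(shared):]] = child
                    child = mid
                node.children[shared] = child
                node, key = child, key[len(shared):]
                break
            else:
                # No edge shares a prefix with the remaining key: add a leaf.
                leaf = RadixNode()
                node.children[key] = leaf
                node, key = leaf, ""
        node.value = value

    def lookup(self, key):
        node = self.root
        while key:
            for label, child in node.children.items():
                if key.startswith(label):
                    node, key = child, key[len(label):]
                    break
            else:
                return None  # no edge matches: identifier is unknown
        return node.value


# Identifiers from Fig. 1; the payloads are placeholders, not Reactome data.
identifiers = [
    "P60484", "P60467", "P60468", "P29172", "P11087", "P11086",
    "P10639", "P10636", "P10635", "P10622", "P10620", "P12939",
    "P12938", "P12931", "P05480", "P05386", "PTEN",
]
tree = RadixTree()
for ident in identifiers:
    tree.insert(ident, "entity:" + ident)

print(tree.lookup("P60484"))  # a stored identifier
print(tree.lookup("P604"))    # a shared prefix only, not an identifier
```

Because identifiers with a common prefix share edges, the cost of a lookup depends on the length of the identifier rather than on the number of identifiers stored.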

Conclusion: Through the use of highly optimised, in-memory data structures and algorithms, Reactome has achieved a stable, high-performance pathway analysis service, enabling the analysis of genome-wide datasets within seconds and allowing interactive exploration and analysis of high-throughput data. The proposed pathway analysis approach is available on the Reactome production web site, either via the AnalysisService for programmatic access or via the user submission interface integrated into the PathwayBrowser. Reactome is an open data and open source project and all of its source code, including the code described here, is available in the AnalysisTools repository in the Reactome GitHub (https://github.com/reactome/).

Keywords: Data structures; Over-representation analysis; Pathway analysis.

Figures

Fig. 1
Radix tree representation for the identifiers P60484, P60467, P60468, P29172, P11087, P11086, P10639, P10636, P10635, P10622, P10620, P12939, P12938, P12931, P05480, P05386, PTEN
Fig. 2
Graph representation where P nodes are proteins, C nodes are complexes, S nodes are sets, and primed nodes are their counterparts in other species. a One-species graph. b Relation between two species. c Base node content
Fig. 3
Double-linked tree to represent the event hierarchy in Reactome. The root node defines the species and its children represent the different pathways and sub-pathways in Reactome. Each node contains the pathway identifier, the name, the total number of curated entities and the number of entities found in the user’s sample
Fig. 4
Representation of two analysis use cases joining the different data structures. In red, an analysis performed using the projection to human. In green, an analysis performed without projection
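The aggregation and statistics steps carried by the double-linked tree of Fig. 3 can be sketched as follows. This is a sketch under stated assumptions, not the Reactome code: the `PathwayNode` class, the pathway identifiers and all counts are hypothetical, found entities are kept in a set so that an entity shared by two sub-pathways is counted once in their parent, and a hypergeometric tail is assumed as the over-representation statistic because the abstract does not say which test is used.

```python
from math import comb


class PathwayNode:
    """Hypothetical tree node: links to both its parent and its children."""

    def __init__(self, pathway_id, name, total, parent=None):
        self.pathway_id = pathway_id
        self.name = name
        self.total = total      # curated entities in this pathway
        self.found = set()      # entities from the user's sample seen here
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def add_hit(self, entity):
        # Follow the parent links so every ancestor also records the hit.
        node = self
        while node is not None:
            node.found.add(entity)
            node = node.parent

    def walk(self):
        # Follow the child links to visit the whole hierarchy.
        yield self
        for child in self.children:
            yield from child.walk()


def hypergeom_tail(found, sample_size, total, universe):
    """P(X >= found) for sample_size draws from universe entities, of which
    total belong to the pathway (assumed test, see the note above)."""
    upper = min(sample_size, total)
    ways = sum(
        comb(total, i) * comb(universe - total, sample_size - i)
        for i in range(found, upper + 1)
    )
    return ways / comb(universe, sample_size)


# Hypothetical hierarchy: a species root, one pathway, two sub-pathways.
root = PathwayNode("SPECIES", "Homo sapiens", total=10)
pathway_a = PathwayNode("PW-A", "Pathway A", total=4, parent=root)
sub_a1 = PathwayNode("PW-A1", "Sub-pathway A1", total=2, parent=pathway_a)
sub_a2 = PathwayNode("PW-A2", "Sub-pathway A2", total=3, parent=pathway_a)

# A sample of three identifiers: P1 maps to both sub-pathways, P2 to one,
# and the third identifier maps to no pathway at all.
sample_size = 3
sub_a1.add_hit("P1")
sub_a2.add_hit("P1")
sub_a2.add_hit("P2")

for node in root.walk():
    print(node.pathway_id, len(node.found), "of", node.total)

p_value = hypergeom_tail(len(pathway_a.found), sample_size,
                         pathway_a.total, root.total)
print(p_value)
```

The parent links make the aggregation a walk from the matched pathway up to the root, while the child links allow the results to be reported top-down; both directions are needed, which is presumably why a double-linked structure is used.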
