Reactome pathway analysis: a high-performance in-memory approach
- PMID: 28249561
- PMCID: PMC5333408
- DOI: 10.1186/s12859-017-1559-2
Abstract
Background: Reactome aims to provide bioinformatics tools for visualisation, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Pathway analysis methods have a broad range of applications in physiological and biomedical research; from a performance point of view, one of the main problems for these methods is the constantly increasing size of the data samples.
Results: Here, we present a new high-performance in-memory implementation of the well-established over-representation analysis method. To achieve this, the over-representation analysis method is divided into four steps and, for each of them, specific data structures are used to improve performance and minimise the memory footprint. The first step, finding out whether an identifier in the user's sample corresponds to an entity in Reactome, is addressed using a radix tree as a lookup table. The second step, modelling the proteins, chemicals, their orthologues in other species and their composition in complexes and sets, is addressed with a graph. The third and fourth steps, which aggregate the results and calculate the statistics, are solved with a double-linked tree.
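The four steps can be illustrated with a minimal Python sketch. It is not the Reactome implementation: a plain character trie stands in for the radix tree, a dictionary stands in for the entity graph, and a hypergeometric tail is assumed as the statistic because the abstract does not name the test. All class names, function names, identifiers and pathway names below are hypothetical.

```python
from math import comb


class TrieNode:
    """One node of a character trie; `entity` is set where an identifier ends."""
    def __init__(self):
        self.children = {}
        self.entity = None


class IdentifierTrie:
    """Plain (uncompressed) trie standing in for the radix tree lookup table."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, identifier, entity):
        node = self.root
        for ch in identifier:
            node = node.children.setdefault(ch, TrieNode())
        node.entity = entity

    def lookup(self, identifier):
        node = self.root
        for ch in identifier:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.entity


class PathwayNode:
    """Tree node that links to both its parent and its children."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.entities = set()  # entities annotated here or in any descendant
        self.found = set()     # sample entities that hit this node or a descendant
        if parent is not None:
            parent.children.append(self)

    def _propagate(self, attribute, entity):
        # Follow the parent links so every ancestor sees the entity too.
        node = self
        while node is not None:
            getattr(node, attribute).add(entity)
            node = node.parent

    def annotate(self, entity):
        self._propagate("entities", entity)

    def hit(self, entity):
        self._propagate("found", entity)


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n entities are drawn from N, of which K are in the pathway."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def analyse(sample, trie, location, root):
    """Steps 1 to 4: look up identifiers, locate entities, aggregate, score."""
    mapped = {e for e in (trie.lookup(i) for i in sample) if e is not None}
    for entity in mapped:
        location[entity].hit(entity)
    total, drawn = len(root.entities), len(mapped)
    return {p.name: hypergeom_tail(len(p.found), drawn, len(p.entities), total)
            for p in root.children}


# Toy data (hypothetical): two pathways under one root, five entities.
root = PathwayNode("root")
signalling = PathwayNode("signalling", root)
metabolism = PathwayNode("metabolism", root)
trie = IdentifierTrie()
location = {}  # entity -> pathway; a stand-in for the graph of step two
for ident, entity, pathway in [
    ("P00001", "A", signalling), ("P00002", "B", signalling),
    ("P00003", "C", metabolism), ("P00004", "D", metabolism),
    ("P00005", "E", metabolism),
]:
    trie.insert(ident, entity)
    pathway.annotate(entity)
    location[entity] = pathway
```

Calling `analyse(["P00001", "P00002", "UNKNOWN"], trie, location, root)` returns one score per top-level pathway, with the unknown identifier silently dropped at the lookup step. Because each tree node keeps a parent link, a hit on a leaf pathway is counted in every ancestor without a separate traversal, which is the kind of aggregation the double-linked tree is described as supporting.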
Conclusion: Through the use of highly optimised, in-memory data structures and algorithms, Reactome has achieved a stable, high-performance pathway analysis service, enabling the analysis of genome-wide datasets within seconds and allowing interactive exploration and analysis of high-throughput data. The proposed pathway analysis approach is available on the Reactome production website, either via the AnalysisService for programmatic access or via the user submission interface integrated into the PathwayBrowser. Reactome is an open data and open source project, and all of its source code, including the code described here, is available in the AnalysisTools repository in the Reactome GitHub (https://github.com/reactome/).
Keywords: Data structures; Over-representation analysis; Pathway analysis.