Genome Res. 2019 Jul;29(7):1152-1163.
doi: 10.1101/gr.243212.118. Epub 2019 Jun 24.

OMA standalone: orthology inference among public and custom genomes and transcriptomes

Adrian M Altenhoff et al. Genome Res. 2019 Jul.

Abstract

Genomes and transcriptomes are now typically sequenced by individual laboratories but analyzing them often remains challenging. One essential step in many analyses lies in identifying orthologs (corresponding genes across multiple species), but this is far from trivial. The Orthologous MAtrix (OMA) database is a leading resource for identifying orthologs among publicly available, complete genomes. Here, we describe the OMA pipeline available as a standalone program for Linux and Mac. When run on a cluster, it has native support for the LSF, SGE, PBS Pro, and Slurm job schedulers and can scale up to thousands of parallel processes. Another key feature of OMA standalone is that users can combine their own data with existing public data by exporting genomes and precomputed alignments from the OMA database, which currently contains over 2100 complete genomes. We compare OMA standalone to other methods in the context of phylogenetic tree inference, by inferring a phylogeny of Lophotrochozoa, a challenging clade within the protostomes. We also discuss other potential applications of OMA standalone, including identifying gene families having undergone duplications/losses in specific clades, and identifying potential drug targets in nonmodel organisms. OMA standalone is available under the permissive open source Mozilla Public License Version 2.0.

Figures

Figure 1.
Conceptual overview of the OMA standalone software. Dotted arrows indicate alternative steps (reference species tree either specified as input or inferred from the data). The species tree inference step infers a distance tree but can be bypassed by supplying a reference tree.
Figure 2.
Resource measurements for various data sets of increasing sizes as total number of protein sequences. The data sets have been sampled from the public OMA Browser to maintain a constant composition of 20% fungi, 10% archaea, 10% plants, 20% metazoan, and 40% bacteria genomes. (Left) Runtime of the all-against-all phase (orange) on a single CPU, and the inference of the orthologous pairs and various groups (green). (Right) Peak memory usage of OMA standalone in gigabytes (GB).
Figure 3.
Comparison of amount of orthologous data inferred by the different pipelines. (A) OMA and OrthoFinder infer more orthologous groups than other methods, whereas the groups inferred by HaMStR are considerably larger on average than for the other methods. (B) The resulting supermatrix has most sites for OMA, whether the minimum site occupancy threshold is 40% or 50%, and most sites for HaMStR at the 60% cutoff (used for phylogenomic reconstruction) and 70% cutoff.
Figure 4.
Comparison of trees obtained using PhyloBayes with the CAT-GTR-G4 model from the different orthology methods. OMA tree is in congruence with published results (see main text). Branches that are at odds with the literature are in red; otherwise they are displayed in gray (posterior probability < 0.95) or else in black. Only posterior probabilities below one are displayed. Please note that the PhyloBayes tree computed from HaMStR data did not converge after 900,000 CPU hours and thus should be interpreted with caution.
Figure 5.
Accuracy of trees reconstructed with varying number of orthologous groups, on the lophotrochozoan data set, using IQ-TREE with a WAG + I model. Each point is obtained by averaging over results obtained from 50 random group subsets of varying size, drawn without replacement. Even if all methods are downsampled to have the same number of groups, trees obtained from OMA are consistently among the most accurate ones (measured in terms of the Robinson-Foulds distance to a partially resolved reference tree) (see Methods). Error bars depict one standard error on each side.
Figure 6.
Model tree based on the literature (see Methods).
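
The subsampling procedure described in the Figure 5 caption (50 random subsets of orthologous groups per subset size, drawn without replacement, summarized as a mean with one standard error) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `subsample_summary`, its parameters, and the `score` callback are hypothetical. In the paper's setting, `score` would stand in for inferring a tree from the subset and measuring its Robinson-Foulds distance to the reference tree; here a toy function is used instead.

```python
import math
import random
import statistics


def subsample_summary(groups, subset_size, score, n_replicates=50, seed=0):
    """Return (mean, standard error) of score() over random subsets of groups.

    Hypothetical helper: `score` is a placeholder for any per-subset measurement,
    such as a tree distance computed by an external tool.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_replicates):
        # random.sample draws without replacement within each subset
        subset = rng.sample(groups, subset_size)
        scores.append(score(subset))
    mean = statistics.mean(scores)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(scores) / math.sqrt(n_replicates)
    return mean, sem


if __name__ == "__main__":
    # Toy data: 200 group identifiers and a stand-in score (mean identifier value)
    toy_groups = list(range(200))
    for size in (10, 50, 100):
        mean, sem = subsample_summary(toy_groups, size, lambda s: sum(s) / len(s))
        print(f"subset size {size}: mean={mean:.2f}, standard error={sem:.2f}")
```

Fixing the seed keeps the sketch reproducible; error bars of one standard error on each side would then span `mean - sem` to `mean + sem`.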
