PLoS One. 2013;8(1):e53786.
doi: 10.1371/journal.pone.0053786. Epub 2013 Jan 14.

Inferring hierarchical orthologous groups from orthologous gene pairs

Adrian M Altenhoff et al. PLoS One. 2013.

Abstract

Hierarchical orthologous groups are defined as sets of genes that have descended from a single common ancestor within a taxonomic range of interest. Identifying such groups is useful in a wide range of contexts, including inference of gene function, study of gene evolution dynamics and comparative genomics. Hierarchical orthologous groups can be derived from reconciled gene/species trees, but because this is a computationally costly procedure, many phylogenomic databases instead work on the basis of pairwise gene comparisons (the "graph-based" approach). To our knowledge, there is only one published algorithm for graph-based hierarchical group inference, but both its theoretical justification and its performance in practice remain largely uncharacterised. We establish a formal correspondence between the orthology graph and hierarchical orthologous groups. Based on that, we devise GETHOGs ("Graph-based Efficient Technique for Hierarchical Orthologous Groups"), a novel algorithm that infers hierarchical groups directly from the orthology graph, without requiring either gene tree inference or gene/species tree reconciliation. We show that GETHOGs correctly reconstructs hierarchical orthologous groups when applied to perfect input, and we provide several extensions with stringency parameters to deal with imperfect input data. We demonstrate its competitiveness using both simulated and empirical data. GETHOGs is implemented as part of the freely available OMA standalone package (http://omabrowser.org/standalone). Furthermore, hierarchical groups inferred by GETHOGs ("OMA HOGs") on >1,000 genomes can be interactively queried via the OMA browser (http://omabrowser.org).

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Hierarchical orthologous groups and their relationship to the orthology graph and the underlying gene and species trees.
In this example, the hierarchical groups for the taxonomic range formula image are drawn in orange. By definition, these groups correspond to the sets of leaves descending from the speciation nodes coloured in orange in the gene tree.
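The tree-based definition in this caption can be made concrete with a small sketch. It assumes, hypothetically, that every internal node of a reconciled gene tree is annotated with its event type ("speciation" or "duplication") and the species-tree taxon it maps to; the hierarchical groups for a taxonomic range are then the leaf sets below the speciation nodes mapped to that range. The `Node` class, the function names and the toy human/mouse/frog family are illustrative and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical reconciled gene-tree node.

    Leaves carry a gene name; internal nodes carry the event type
    ("speciation" or "duplication") and the species-tree taxon they map to.
    """
    gene: Optional[str] = None
    event: Optional[str] = None
    taxon: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def leaf_genes(node: Node) -> Set[str]:
    """Collect the gene names of all leaves at or below `node`."""
    if not node.children:
        return {node.gene}
    genes: Set[str] = set()
    for child in node.children:
        genes |= leaf_genes(child)
    return genes


def hogs_from_gene_tree(root: Node, taxon: str) -> List[Set[str]]:
    """Leaf sets of the speciation nodes that map to `taxon`."""
    groups = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.event == "speciation" and node.taxon == taxon:
            groups.append(leaf_genes(node))
            # Descendants of this node map to narrower taxa, so stop here.
            continue
        stack.extend(node.children)
    return groups


def leaf(name: str) -> Node:
    return Node(gene=name)


# Toy family: a duplication predating the tetrapod split, followed by
# speciation at Tetrapoda (frog vs. mammals) and at Mammalia (human vs. mouse).
tree = Node(event="duplication", taxon="Tetrapoda", children=[
    Node(event="speciation", taxon="Tetrapoda", children=[
        Node(event="speciation", taxon="Mammalia",
             children=[leaf("HUMAN1"), leaf("MOUSE1")]),
        leaf("FROG1"),
    ]),
    Node(event="speciation", taxon="Tetrapoda", children=[
        Node(event="speciation", taxon="Mammalia",
             children=[leaf("HUMAN2"), leaf("MOUSE2")]),
        leaf("FROG2"),
    ]),
])

print(hogs_from_gene_tree(tree, "Tetrapoda"))
print(hogs_from_gene_tree(tree, "Mammalia"))
```

At the Tetrapoda level the sketch returns the two three-gene groups {HUMAN1, MOUSE1, FROG1} and {HUMAN2, MOUSE2, FROG2}; at the narrower Mammalia level the groups shrink to the two human/mouse pairs, illustrating the nested ("hierarchical") nature of the groups.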
Figure 2. Illustration of Lemma 1: the taxonomic range induces a set of speciation nodes (left) and the associated hierarchical orthologous groups (centre).
Likewise, it induces an orthology subgraph with a set of connected components formula image (right). Lemma 1 establishes the one-to-one correspondence between formula image and formula image, which we prove by viewing it as the composition of the one-to-one correspondences formula image and formula image.
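Lemma 1 suggests a direct graph-based procedure: restrict the orthology graph to genes from species within the taxonomic range and take its connected components. The sketch below does this with a union-find structure. It assumes error-free orthology input (the setting in which GETHOGs is said to reconstruct the groups correctly), keeps genes without any in-range ortholog as singleton groups, and uses hypothetical function and gene names; it is not the GETHOGs implementation, which additionally copes with imperfect input.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def hogs_from_orthology_graph(
    edges: Iterable[Tuple[str, str]],
    species_of: Dict[str, str],
    taxonomic_range: Set[str],
) -> List[Set[str]]:
    """Connected components of the orthology subgraph induced by a taxonomic range."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Vertices: every gene whose species lies in the taxonomic range.
    for gene, species in species_of.items():
        if species in taxonomic_range:
            parent[gene] = gene

    # Edges: predicted orthologous pairs with both genes inside the range.
    for a, b in edges:
        if a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    components: Dict[str, Set[str]] = defaultdict(set)
    for gene in parent:
        components[find(gene)].add(gene)
    return list(components.values())


species_of = {"HUMAN1": "human", "MOUSE1": "mouse", "FROG1": "frog",
              "HUMAN2": "human", "MOUSE2": "mouse", "FROG2": "frog"}
# Only orthologous pairs are edges; e.g. HUMAN1-MOUSE2 are paralogs and absent.
orthologs = [("HUMAN1", "MOUSE1"), ("HUMAN1", "FROG1"), ("MOUSE1", "FROG1"),
             ("HUMAN2", "MOUSE2"), ("HUMAN2", "FROG2"), ("MOUSE2", "FROG2")]

print(hogs_from_orthology_graph(orthologs, species_of, {"human", "mouse", "frog"}))
print(hogs_from_orthology_graph(orthologs, species_of, {"human", "mouse"}))
```

For this toy family of two tetrapod paralogs, the full range yields two three-gene components and the mammal-only range yields the two human/mouse pairs, with no gene tree or reconciliation involved.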
Figure 3. Example of an orthology graph.
An example orthology graph from the OMA database in which two false-positive predictions merge two otherwise well-defined orthologous groups. At the level of vertebrates, the NOX family forms 4 distinct orthologous groups. Because of two spurious predictions, the NOX1 and NOX2 clusters become weakly connected. The minimum cut algorithm splits them, as only two edges need to be cut.
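The splitting step in this caption can be sketched as a recursive global minimum cut. As a stand-in we use the Stoer-Wagner algorithm on an unweighted graph; the caption does not say which min-cut variant or stopping rule OMA uses, so the rule here (split while the cut has fewer edges than `ratio` times the size of the smaller side), the function names and the toy four-gene clusters are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def stoer_wagner(nodes: List[str], edges: Iterable[Tuple[str, str]]) -> Tuple[float, Set[str]]:
    """Global minimum cut of an undirected graph with unit edge weights.

    Returns the cut weight and the vertex set on one side of the cut.
    Assumes at least two vertices.
    """
    w: Dict[str, Dict[str, float]] = {u: {} for u in nodes}
    for a, b in edges:
        w[a][b] = w[a].get(b, 0) + 1
        w[b][a] = w[b].get(a, 0) + 1
    members = {u: {u} for u in nodes}  # original vertices inside each super-vertex
    active = list(nodes)
    best_cut, best_side = float("inf"), set()
    while len(active) > 1:
        # Maximum-adjacency ordering starting from an arbitrary vertex.
        start = active[0]
        order = [start]
        conn = {u: w[start].get(u, 0) for u in active if u != start}
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            order.append(nxt)
            for u in conn:
                conn[u] += w[nxt].get(u, 0)
        s, t = order[-2], order[-1]
        # Separating the last-added vertex t from the rest is a candidate cut.
        if cut_of_phase < best_cut:
            best_cut, best_side = cut_of_phase, set(members[t])
        # Merge t into s and drop t.
        members[s] |= members[t]
        for u, weight in w[t].items():
            if u != s:
                w[s][u] = w[s].get(u, 0) + weight
                w[u][s] = w[u].get(s, 0) + weight
                del w[u][t]
        w[s].pop(t, None)
        del w[t]
        active.remove(t)
    return best_cut, best_side


def split_weak_components(nodes: Iterable[str], edges: List[Tuple[str, str]],
                          ratio: float = 1.0) -> List[Set[str]]:
    """Recursively split a component whose minimum cut is small relative to
    its smaller side (hypothetical stringency rule controlled by `ratio`)."""
    nodes = set(nodes)
    if len(nodes) < 2:
        return [nodes]
    sub_edges = [(a, b) for a, b in edges if a in nodes and b in nodes]
    cut, side = stoer_wagner(sorted(nodes), sub_edges)
    other = nodes - side
    if cut >= ratio * min(len(side), len(other)):
        return [nodes]  # well connected: keep as one group
    return (split_weak_components(side, sub_edges, ratio)
            + split_weak_components(other, sub_edges, ratio))


def clique(genes: List[str]) -> List[Tuple[str, str]]:
    return [(a, b) for i, a in enumerate(genes) for b in genes[i + 1:]]


# Two fully connected toy clusters joined by two spurious edges.
nox1 = [f"NOX1_{i}" for i in range(4)]
nox2 = [f"NOX2_{i}" for i in range(4)]
edges = clique(nox1) + clique(nox2) + [("NOX1_0", "NOX2_0"), ("NOX1_1", "NOX2_1")]
print(split_weak_components(nox1 + nox2, edges))
```

The minimum cut of the merged component consists of the two spurious edges, which is small compared with either four-gene side, so the component is split; each resulting clique has a minimum cut of 3 edges against a smaller side of 1 gene and is kept intact.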
Figure 4. Validation on simulated data: precision-recall plots of COCO-CL, LOFT and the algorithm introduced here (GETHOGs) on two datasets of 30 simulated genomes (200 genes each).
Each dataset shows rates averaged over 4 independent runs of genome simulation with fixed parameters. The two datasets differ essentially in their gene duplication rates (see the Methods section for details). As a point of reference, we also show the performance of pairwise orthologs inferred in OMA (OMA Pairwise). The colour gradient corresponds to different formula image parameter values for GETHOGs and bootstrap values for COCO-CL.
Figure 5. Validation on empirical data: precision-recall plot of our newly proposed GETHOGs, COCO-CL, LOFT, EggNOG and OrthoDB on orthologous and paralogous gene relationships for the 3 gene families (3,783 relationships in total) analysed in Boeckmann et al.
Predictions for GETHOGs and COCO-CL are computed using the default parameters (formula image and bootstrap formula image, respectively). The points for EggNOG and OrthoDB are taken from the original analysis (Reference [9], Table 2).
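The precision-recall points in this figure summarise agreement with a reference set of labelled gene relationships. As an illustration only, the sketch below assumes that predictions and reference are both given as unordered gene pairs labelled "ortholog" or "paralog", and that only predicted pairs present in the reference are assessed; this assessment protocol is our assumption, not one spelled out in the caption, and the gene names are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Pair = FrozenSet[str]


def precision_recall(predicted: Dict[Pair, str],
                     reference: Dict[Pair, str]) -> Tuple[float, float]:
    """Precision and recall of labelled gene-pair predictions.

    Both dicts map an unordered gene pair to "ortholog" or "paralog".
    Only predicted pairs that also appear in the reference are assessed.
    """
    assessed = [pair for pair in predicted if pair in reference]
    true_positives = sum(1 for pair in assessed if predicted[pair] == reference[pair])
    # Convention: report 0.0 when there is nothing to divide by.
    precision = true_positives / len(assessed) if assessed else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def pair(a: str, b: str) -> Pair:
    return frozenset((a, b))


reference = {pair("A", "B"): "ortholog", pair("A", "C"): "paralog",
             pair("B", "C"): "paralog", pair("C", "D"): "ortholog"}
predicted = {pair("A", "B"): "ortholog", pair("C", "A"): "ortholog",
             pair("D", "C"): "ortholog", pair("X", "Y"): "ortholog"}

p, r = precision_recall(predicted, reference)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.500
```

Here three predicted pairs fall inside the reference and two of them carry the correct label, giving precision 2/3; two of the four reference relationships are recovered, giving recall 1/2. Sweeping a method's stringency parameter and recomputing both values traces out curves like those in Figures 4 and 5.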

References

    1. Fitch WM (1970) Distinguishing homologous from analogous proteins. Syst Zool 19: 99–113.
    2. Kristensen DM, Wolf YI, Mushegian AR, Koonin EV (2011) Computational methods for Gene Orthology inference. Briefings in Bioinformatics 12: 379–391.
    3. Altenhoff AM, Dessimoz C (2012) Inferring orthology and paralogy. In: Anisimova M, editor. Evolutionary Genomics. Clifton, NJ, USA: Methods in Molecular Biology. pp. 259–279.
    4. Overbeek R, Fonstein M, D’Souza M, Pusch GD, Maltsev N (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 96: 2896–2901.
    5. Wall DP, Fraser HB, Hirsh AE (2003) Detecting putative orthologs. Bioinformatics 19: 1710–1711.

Grants and funding

This work was supported by an ETH Independent Investigators’ Research Award to CD and GHG, and an SNSF advanced researcher fellowship to CD (#136461). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
