ISME Commun. 2024 Mar 27;4(1):ycae042.
doi: 10.1093/ismeco/ycae042. eCollection 2024 Jan.

Quantifying microbial guilds

Juan Rivas-Santisteban et al. ISME Commun.

Abstract

The ecological role of microorganisms is of utmost importance due to their multiple interactions with the environment. However, assessing the contribution of individual taxonomic groups has proven difficult despite the availability of high throughput data, hindering our understanding of such complex systems. Here, we propose a quantitative definition of guild that is readily applicable to metagenomic data. Our framework focuses on the functional character of protein sequences, as well as their diversifying nature. First, we discriminate functional sequences from the whole sequence space corresponding to a gene annotation to then quantify their contribution to the guild composition across environments. In addition, we identify and distinguish functional implementations, which are sequence spaces that have different ways of carrying out the function. In contrast, we found that orthology delineation did not consistently align with ecologically (or functionally) distinct implementations of the function. We demonstrate the value of our approach with two case studies: the ammonia oxidation and polyamine uptake guilds from the Malaspina circumnavigation cruise, revealing novel ecological dynamics of the latter in marine ecosystems. Thus, the quantification of guilds helps us to assess the functional role of different taxonomic groups with profound implications on the study of microbial communities.

Keywords: ammonia oxidation; functional ecology; microbial ecology; microbial guilds; phylogenetics; polyamine uptake.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Protein sequence similarity cannot fully anticipate functional activity. Notes: Any particular function can be performed by a plethora of different protein sequences, generated by adaptive evolution and drift. Top: traditionally, automatic annotation considers any given protein sequence below a dissimilarity threshold as functional. Although the threshold can be adjusted, there are several pitfalls: (i) the threshold value is prone to errors; (ii) the functional space may display discontinuities or gaps; and (iii) proteins beyond the dissimilarity threshold may be able to perform the function. Bottom: we exemplify these scenarios with real protein sequences related to the α-subunit of the ammonia monooxygenase (amoA). We show: AOA (archaeal) and AOB (bacterial) amoA sequences, which are well within the similarity threshold; pmoA, which mainly oxidizes formula image-alkanes and usually does not carry out the function, but falls within the threshold; hapE, which can oxidize a wide range of acetophenone derivatives but not ammonia, and is beyond the threshold; and CTP-synthase-1, which oxidizes L-glutamine to form a nucleic acid precursor but can also oxidize ammonia, thus performing the molecular function, yet falls well beyond any reasonable sequence similarity to AOA-amoA.
Figure 2
Guild quantification method. Notes: (A) Example showing how abundance, observed diversity, and univocity are calculated. A tree is constructed with reference sequences (lines, far left). Different clusters are (formula image) or are not (formula image) assigned to the function based on available evidence. Metagenomic sequences (stars) are placed in the reference tree. Sequence formula image does not fall within the known formula image clusters and is discarded. The remaining sequences fall in two different clusters (implementations formula image in yellow and formula image in pink). A given taxon may have different implementations and different taxa may hold the same. Next, abundance of reads for each cluster (column A, arbitrary abundances are shown as examples) and observed diversity (column formula image, counting the different unique sequences in each cluster) are calculated. Real metagenomic data quantification is available in Table S4. (B) Calculation of expected diversity, formula image. In general, the more abundant a gene is, the larger the diversity of sequences. For each gene, these variables show a log–log relationship. When real data from a metagenome are placed in the graph, some show either higher or lower diversity than expected from their abundance. By calculating formula image, we reward the former and penalize the latter.
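The expected-diversity step described for panel B can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the symbols in the caption were lost in extraction, so the sketch assumes an ordinary least-squares fit in log10 space and assumes that the reward or penalty is the ratio of observed to expected diversity. All function names are hypothetical.

```python
import math


def fit_loglog(abundances, diversities):
    """Least-squares fit of log10(diversity) = a + b * log10(abundance).

    Assumption: a plain linear fit in log-log space stands in for the
    abundance/diversity relationship mentioned in the caption.
    """
    xs = [math.log10(x) for x in abundances]
    ys = [math.log10(y) for y in diversities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def expected_diversity(abundance, intercept, slope):
    """Diversity predicted by the fitted line for a given abundance."""
    return 10 ** (intercept + slope * math.log10(abundance))


def diversity_ratio(abundance, observed, intercept, slope):
    """Observed over expected diversity (assumed form of the correction).

    Values above 1 reward clusters that are more diverse than expected;
    values below 1 penalize clusters that are less diverse.
    """
    return observed / expected_diversity(abundance, intercept, slope)
```

Under these assumptions, a cluster whose observed diversity is twice the fitted expectation gets a ratio of 2, and one with half the expectation gets 0.5.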
Figure 3
Distinguishing functional from non-functional sequences improves the assessment of the ammonia oxidation guild. Notes: (A) Reconstructed phylogeny of amoA, used as a reference tree to classify ammonia-oxidizing capable sequences. The tree contains 135 sequences with strong functional evidence based on either biochemical or physiological features, or inferred by homology to quality sequences (see Methods). For clarity, we only highlight clusters of sequences where metagenomic Malaspina samples have been successfully placed (full clustering in Fig. S4). Among those, we distinguish sequence clusters with proven ammonium oxidation function (shaded green) from sequences with a broader substrate spectrum and sequences with evidence of being non-functional for ammonia oxidation (shaded orange). Note that AOA-amoA form a single cluster, but AOB-amoA comprises two dissimilar clusters. Evidence of function was gathered from various sources, although the main ones are marked in pink and blue in the outer circle. The size of red circles is proportional to the number of reads. (B) Log representation of abundance values (TPM) of the amoA classified queries found in Malaspina metagenomes (red circles from panel A). We excluded 31% of unique sequences (1.01% of total TPM) corresponding to the non-univocal tree clusters, keeping a conservative criterion.
Figure 4
Polyamine uptake implementations revealed by environmental cues are non-synonymous with phylogeny. Notes: (A) A phylogenetic reconstruction of potF, a polyamine-binding subunit of an ABC transporter. Due to the very scarce experimental evidence available, we here determine the different implementations as sequence spaces adapted to specific environmental conditions (see Methods, and Tables S1 and S2). We gathered environmental data for 478 tree locations representing 321 organisms (green tags). We found six nodes enriched by environmental preferences (formula image randomizations, one-tailed formula image-values formula image), which included most sequences. Thus, the tree acts as a metagenomic query classifier. (B) To find whether the previous clusters grouped taxonomically related species, we tested the divergence between last common ancestors (LCAs) in the trees of potF and a pair of phylogenetic markers: ribosomal 16S RNA and rplB genes. The average phylogenetic distances between common groups of organisms (nodes in the tree) in the 16S RNA (formula image-axis) are compared to those in the rplB gene (black dots) or to those in the potF gene (colored dots). The size of the dot represents the number of children leaves compared. The two phylogenetic markers follow a straight line as expected. The potF sequences, however, significantly depart from this linearity. The null model is equivalent to the neutral variation expected from phylogeny and was fitted with a bivariate regression (mean and formula image, black solid and dotted lines). Red edges to the circles highlight the nodes that define the environmental clusters in panel A, which are beyond the taxonomy-related null model. The red arrow shows an example that corresponds with Oceanobacter kriegii and Thalassobius gelatinovorus, where their potF sequences are more similar than what is expected by their taxonomic markers.
Figure 5
Our guild approach identifies important ecological patterns in the polyamine uptake guild in contrast to the traditional approach. Notes: (A) The output of a standard automatic metagenomic analysis. The results have been displayed as histograms and as radial plots with stacked log values to improve the visualization of taxa spanning different orders of magnitude. (B) Our guild approach considering only abundance. (C) The impact coefficients, which also weigh sequence richness, shown across three environments (columns). What in the automatic annotation was a single bar has been split into several sectors, each of them corresponding to one of the implementations as identified in Fig. 4A. We only show taxa with the largest contributions; the rest are grouped under the “Others” tag (black). The contribution to the function fluctuates in both taxonomic identity and implementation preference with non-trivial relationships with depth. For example, UBA11654 sp001629325 (striped blue, red arrows) contributes in the epipelagic with two different implementations, cIII and cIIa, while it only contributes through cIII in the mesopelagic, and disappears in the bathypelagic. Moreover, Sulfitobacter sp. and P. atlanticus (black arrows) were only visible when considering expected richness. Note how easy it is to observe distinct functional trends for each taxon, in contrast with traditional approaches (panel A). The incertae implementation represents the absence of k values in the undefined sequence spaces of the reference tree.
Figure 6
Changes with depth in the importance of the polyamine uptake guild. An advantage of using the impact coefficients to determine the structure of a guild is that we can visualize the functional contribution in a variety of ways. For example, here we look at the fold changes in the contribution between different ocean layers for different implementations. First, the formula image values of each implementation are added for each layer (formula image, formula image, formula image). Then the values for two different layers are divided (formula image, formula image). It is easily observed which implementations depend the most on depth; in this case, formula image and formula image, sequence spaces putatively adapted to a wide pH range. Interestingly, the sharp changes in these two implementations correspond with the area of the water column with the largest shift in pH toward acidity in the oxycline (Fig. S8).
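The two-step calculation in this caption (sum the per-implementation values within each layer, then divide the totals of two layers) can be sketched as follows. This is a hedged illustration only: the record layout, the function names, and the direction of the division (deeper layer over shallower layer) are assumptions, since the caption's symbols were lost in extraction.

```python
from collections import defaultdict


def layer_totals(records):
    """Sum impact values per (implementation, layer).

    `records` is an iterable of (layer, implementation, value) tuples;
    this layout is hypothetical, chosen only for the example.
    """
    totals = defaultdict(float)
    for layer, implementation, value in records:
        totals[(implementation, layer)] += value
    return dict(totals)


def fold_change(totals, implementation, shallow_layer, deep_layer):
    """Ratio of the deep-layer total to the shallow-layer total.

    Returns None when the shallow-layer total is zero or missing,
    because the ratio is undefined in that case.
    """
    shallow = totals.get((implementation, shallow_layer), 0.0)
    deep = totals.get((implementation, deep_layer), 0.0)
    if shallow == 0:
        return None
    return deep / shallow
```

For instance, with illustrative (made-up) totals of 4.0 in the epipelagic and 1.0 in the mesopelagic for one implementation, `fold_change` returns 0.25 for that pair of layers.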
