Review
Nat Rev Microbiol. 2015 Jun;13(6):360-72.
doi: 10.1038/nrmicro3451. Epub 2015 Apr 27.

Sequencing and beyond: integrating molecular 'omics' for microbial community profiling

Eric A Franzosa et al. Nat Rev Microbiol. 2015 Jun.

Abstract

High-throughput DNA sequencing has proven invaluable for investigating diverse environmental and host-associated microbial communities. In this Review, we discuss emerging strategies for microbial community analysis that complement and expand traditional metagenomic profiling. These include novel DNA sequencing strategies for identifying strain-level microbial variation and community temporal dynamics; measuring multiple 'omic' data types that better capture community functional activity, such as transcriptomics, proteomics and metabolomics; and combining multiple forms of omic data in an integrated framework. We highlight studies in which the 'multi-omics' approach has led to improved mechanistic models of microbial community structure and function.

Figures

Figure 1. Optimizing experimental design
a. Whole metagenome shotgun (WMS) sequencing studies face trade-offs between the number of subjects considered, the number of samples per subject, and the sequencing depth per sample achievable with a fixed sequencing budget (here, six “units” of WMS sequencing). Greater sequencing depth facilitates identification of rare species and rare variants of abundant species (such as SNPs); considering more subjects improves statistical power in case-control studies; considering multiple samples per subject is critical for time course analysis; and combining DNA and RNA (meta'omic) sequencing reveals differences between the functional potential and the functional activity of the microbial communities present in different individuals. b. Combining the lower cost and decreased resolution of amplicon sequencing with the higher cost and increased resolution of WMS sequencing (here, one “unit” of WMS sequencing and four “units” of amplicon sequencing are considered to have equivalent costs) enables richer experimental designs. For example, two-stage study designs begin by surveying a large number of individuals using amplicon sequencing and then follow up with a subset of samples using WMS sequencing (selected based on individuals that are representative of the group or those that represent the extreme cases within the group). Similarly, time course studies can combine amplicon sequencing, which is used to survey a large number of time points, with WMS sequencing, which is applied to analyze a subset of time points (such as the first and last) in greater detail. Although depicted here in the context of sequencing-based assays in humans, these considerations are equally applicable to environmental samples and to a variety of high-throughput functional screens, including metaproteomics and metabolomics.
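
The budget trade-off described in this caption can be illustrated with a small enumeration. The sketch below is not taken from the review: it assumes that one "unit" buys one sample at a baseline depth, that depth scales in whole units, and that the whole budget is spent. The function names `enumerate_designs` and `two_stage_design` are hypothetical.

```python
from itertools import product

def enumerate_designs(budget_units=6):
    """List every (subjects, samples_per_subject, depth_units_per_sample)
    combination that spends exactly the whole budget.

    Assumption: cost = subjects * samples * depth, in whole WMS units.
    """
    designs = []
    for subjects, samples, depth in product(range(1, budget_units + 1), repeat=3):
        if subjects * samples * depth == budget_units:
            designs.append((subjects, samples, depth))
    return designs

def two_stage_design(budget_units, wms_followups, amplicon_per_wms=4):
    """Number of amplicon samples affordable after reserving one WMS unit
    for each follow-up sample (caption's rate: 1 WMS unit = 4 amplicon units)."""
    remaining = budget_units - wms_followups
    if remaining < 0:
        raise ValueError("WMS follow-ups exceed the budget")
    return remaining * amplicon_per_wms

if __name__ == "__main__":
    for design in enumerate_designs(6):
        print("subjects=%d samples=%d depth=%d" % design)
    print("amplicon samples with 2 WMS follow-ups:", two_stage_design(6, 2))
```

Under these assumptions, a six-unit budget can go entirely to subjects (6, 1, 1), entirely to depth (1, 1, 6), or to any mix in between. Reserving two units for WMS follow-up leaves room for sixteen amplicon samples.
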
Figure 2. Profiling strain-level variation in microbial communities
a. Mapping paired-end sequencing reads to microbial reference genomes reveals not only the genomes that are present in a community, but also differences between the isolates of particular species and the reference isolate. In this example, most positions have 4x coverage, represented by 4 paired-end sequencing reads stacked above (mapped to) each position in the reference genomes. Gene deletion events can be detected with relatively low coverage of the reference genome; deleted genes (in orange) recruit no reads from the sample and are flanked by paired reads (orange paired reads). Higher coverage facilitates differentiating between sequencing error and true nucleotide-level strain variation. Such variation includes fixed differences (in which the sample is consistently different from the reference at some site) and single nucleotide polymorphisms (SNPs; in which a site occurs in two or more states in the sample). Paired reads that do not map together (red and blue reads) indicate additional structural variation, including the insertion of genomic material not found in the reference by mechanisms such as horizontal gene transfer (HGT). b. Assembling paired-end reads into larger genomic fragments, called contigs, facilitates detection of strain variation in the absence of a reference genome. For example, analyzing contigs from three environmental isolates of a microbial species can reveal novel genomic arrangements and HGT events. Metagenomic assembly also allows the comparison of reference contigs (in this case, t = 0) to paired-end reads obtained at different time points during temporal analysis (such as t = 6 months or t = 1 year), which enables the identification of emerging SNPs. c. Mapping reads to reference genomes reveals patterns of gene presence and absence, which is a form of strain variation. Here, two individuals sampled at two time points (t = 0 and t = 1 year) are distinguished by the presence and absence of genes in species A. The blue genes are stably present in individual 1 and stably absent in individual 2, whereas the red genes are stably present in individual 2 and stably absent in individual 1.
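
The distinction in part a between fixed differences, SNPs and sequencing error can be sketched as a per-site classification of the bases piled up over one reference position. This is a toy illustration rather than the method of any particular variant caller. The thresholds `min_coverage` and `min_allele_fraction` and both function names are assumptions chosen for the example.

```python
from collections import Counter

def classify_site(ref_base, observed_bases, min_coverage=4, min_allele_fraction=0.2):
    """Classify one reference position from the bases of the reads covering it.

    Assumption: an allele seen in fewer than `min_allele_fraction` of reads
    is treated as sequencing error and ignored.
    """
    coverage = len(observed_bases)
    if coverage < min_coverage:
        return "low_coverage"
    counts = Counter(observed_bases)
    supported = {base for base, n in counts.items()
                 if n / coverage >= min_allele_fraction}
    if not supported:
        return "ambiguous"            # no allele reaches the support threshold
    if len(supported) >= 2:
        return "snp"                  # site occurs in two or more states
    if supported == {ref_base}:
        return "matches_reference"
    return "fixed_difference"         # sample consistently differs from reference

def deleted_genes(read_counts_by_gene):
    """Genes that recruit no reads are candidate deletions (part a, orange genes)."""
    return sorted(gene for gene, n in read_counts_by_gene.items() if n == 0)

if __name__ == "__main__":
    print(classify_site("A", "AAAA"))        # all reads agree with the reference
    print(classify_site("A", "GGGG"))        # all reads carry the same alternative base
    print(classify_site("A", "AAGG"))        # two well-supported states
    print(deleted_genes({"geneA": 12, "geneB": 0}))
```

Raising `min_coverage` makes the error filter more reliable, which matches the caption's point that higher coverage helps separate sequencing error from true variation.
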
Figure 3. Relating the metatranscriptome and metagenome in the human gut
In this example, shotgun RNA and DNA sequence data from gut microbiome samples of 8 healthy individuals were functionally profiled with HUMAnN. Each panel illustrates a gene or functional module for which functional activity (expression level) deviated strongly from functional potential (metagenomic abundance). The median gene or transcript abundance is plotted for functions involving more than one gene; DNA and RNA values from the same individual are connected. tetA (an antibiotic resistance determinant), methanogenesis (an important metabolic pathway among gut archaeal species), the bacterial ribosome, and groEL (a bacterial chaperone protein) were strongly over-expressed, as their RNA abundance consistently exceeded their DNA abundance. Hence, on average, genes involved in these functions were producing many transcript copies, suggesting that they were highly active in the human gut (for example, that bacterial ribosomal subunits were being continuously synthesized). Conversely, spore coat protein H (a gene involved in response to nutrient starvation) and synthesis of the amino acid tryptophan were strongly under-expressed (DNA abundance consistently exceeded RNA abundance). Reduced transcription from these microbial functions suggests that they were down-regulated in the healthy human gut, likely due to the high bioavailability of nutrients (including tryptophan) derived from the host's diet. Transcription of bacterial ribosomal proteins and groEL was highly variable across individuals relative to their strong metagenomic conservation (see inset panels), which is consistent with a pattern of subject-specific transcriptional regulation. Such inferences would not be possible if microbial community RNA or DNA sequence data were considered in isolation.
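
The over- and under-expression calls in this caption rest on comparing paired RNA and DNA abundances within each individual. A minimal sketch of that comparison follows, assuming both inputs are positive relative abundances for one function, listed in the same subject order. The function name `expression_call` and the input values are hypothetical, and the sketch does not reproduce HUMAnN's output format or the study's data.

```python
import math
from statistics import median

def expression_call(dna_abundance, rna_abundance):
    """Compare paired DNA and RNA abundances for one function.

    Returns (call, median log2(RNA/DNA)). A function is called over- or
    under-expressed only if the direction is the same in every individual,
    mirroring the caption's "consistently exceeded" wording.
    """
    if not dna_abundance or len(dna_abundance) != len(rna_abundance):
        raise ValueError("need equal-length, non-empty paired measurements")
    if any(v <= 0 for v in list(dna_abundance) + list(rna_abundance)):
        raise ValueError("abundances must be positive to take a log ratio")
    log_ratios = [math.log2(r / d) for d, r in zip(dna_abundance, rna_abundance)]
    if all(x > 0 for x in log_ratios):
        call = "over-expressed"
    elif all(x < 0 for x in log_ratios):
        call = "under-expressed"
    else:
        call = "inconsistent"
    return call, median(log_ratios)

if __name__ == "__main__":
    # Hypothetical abundances for two individuals, not data from the study.
    print(expression_call([1.0, 2.0], [4.0, 8.0]))
    print(expression_call([4.0, 8.0], [1.0, 2.0]))
```

Because the ratio is taken within each individual, a function that is simply more abundant in one person's metagenome does not register as more active. This is why the caption notes that neither data type suffices alone.
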
Figure 4. Integrating multi'omic data for deeper biological insights
a. To facilitate integrated analysis of a microbiome sample, distinct multi'omic data types are often associated with microbial genes or gene families that act as a shared point-of-reference. These genes may be taken from a reference database or directly assembled from the sample. Metagenomic, metatranscriptomic, and metaproteomic sequence data (such as paired-end reads or protein fragments identified by mass spectrometry) are then directly mapped to these genes based on sequence homology, which yields information about the copy numbers and activities of genes. Metabolites (identified by mass spectrometry) can be mapped to a subset of the genes by taking advantage of known relationships between enzyme-coding genes and their products, thus providing an additional, independent measure of gene activity. There are several motivations and advantages to performing multi'omic data integration. For example, in the absence of DNA data, measures of functional activity are confounded with community functional potential. Therefore, transcript abundance can be normalized by gene copy number; this removes the confounding effect and highlights over-, under-, or non-expressed functions (part b). Individually weak but consistent signals (from different assays and/or studies) provide stronger collective support for a hypothesis. Here, a hypothetical microbial function is more abundant at the DNA, RNA, and protein levels in case samples relative to control samples (part c). Data integration also enables descriptive modeling. For example, combining data from proteomics and metabolomics analyses can reveal whether a pathway formed by different enzymes (in this case X, Y, and Z, which metabolize substrates 1, 2 and 3, respectively) is inactive or active (part d).
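
The idea in part c, that individually weak but consistent signals add up, can be made concrete with a p-value combination. The caption does not name a method, so the sketch below substitutes Fisher's method purely as an illustration. It assumes each assay yields a one-sided p-value for "more abundant in cases" and that the tests are independent, which DNA, RNA and protein measurements from the same samples generally are not. The function name `fisher_combined_p` and the example p-values are hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function has the closed form
        exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    so no external statistics library is needed.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i      # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total

if __name__ == "__main__":
    # Hypothetical p-values from DNA, RNA and protein comparisons.
    print(fisher_combined_p([0.1, 0.1, 0.1]))
```

Under the independence assumption, three assays that each fall short of a 0.05 threshold at p = 0.1 give a combined p-value of about 0.03. Correlated assays would need a method that accounts for their dependence.
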
