Metabolómica Livro 300pgs
Published by InTech
Janeza Trdine 9, 51000 Rijeka, Croatia
As for readers, this license allows users to download, copy and build upon published
chapters even for commercial purposes, as long as the author and publisher are properly
credited, which ensures maximum dissemination and a wider impact of our publications.
Notice
Statements and opinions expressed in the chapters are those of the individual contributors
and not necessarily those of the editors or publisher. No responsibility is accepted for the
accuracy of information contained in the published chapters. The publisher assumes no
responsibility for any damage or injury to persons or property arising out of the use of any
materials, instructions, methods or ideas contained in the book.
Contents
Preface
Preface
Metabolomics is a new scientific field that has developed at an accelerating pace
over the last decade, as demonstrated by the increasing number of publications
in scientific journals across all fields of biological research. These developments are mainly
driven by increasingly robust and sensitive analytical instrumentation that allows the
analysis and quantification of thousands of metabolites from any biological system.
Together with sophisticated computational methodology and
statistical approaches, the vast amount of data generated by this instrumentation can be
analysed and mined, aiding biological and biochemical interpretation. In particular, once
experimental metabolomics data can be integrated with other 'omics-type data, such as
those from genomics and proteomics analyses, the path is paved for a better holistic
understanding of the biological system under investigation.
This book provides the reader with summaries of the state of the art of the
technologies and methodologies, especially in data analysis and interpretation
approaches, and gives insights into exciting applications of metabolomics in
human health studies, safety assessments, and plant and microbial research.
Dr. Ute Roessner
School of Botany, The University of Melbourne, Victoria,
Australia
Part 1
1. Introduction
Since the mid-1950s, when the pioneering work of Earle and colleagues (1954) enabled routine
cell culture, mammalian cell culture has been used in the large-scale production of
recombinant proteins and monoclonal antibodies. Mammalian cell lines are preferred as
production hosts for many pharmaceuticals, since complex post-translational modifications
of the produced proteins (especially glycosylation) are generally not properly performed by
microbial systems (Lake-Ee Quek et al., 2010).
Warburg described that, under batch conditions, mammalian cells display an inefficient
metabolic phenotype characterized by high rates of glucose to lactate conversion (Warburg,
1956) together with partial oxidation of glutamine to ammonia and non-essential amino
acids (Fitzpatrick et al., 1993; Jenkins et al., 1992; Ljunggren and Haggstrom, 1992; Ozturk
and Palsson, 1991). The accumulation of fermentation by-products reduces the
culture density and product titer that can be achieved (Martinelle et al., 1998).
In order to increase cell productivity, a common optimization approach is to grow cells
to moderately high density in fed-batch and then deliberately induce a prolonged, productive
stationary phase. While optimization of this perturbed batch strategy is responsible for the
increase in monoclonal antibody titers seen over the past decades, it has a number of
shortcomings, including:
a. the strategy has to be refined for each new cell line,
b. the ultimate metabolic phenotype during prolonged stationary phase varies between
cell lines and it is not always possible to achieve the most productive phenotypes for a
given strain,
c. volumetric productivity remains relatively low due to moderate cell density (Lake-Ee
Quek et al., 2010).
Other approaches involve changing cell phenotypes through metabolic
engineering, or changing culture media conditions, in order to manipulate the cellular
metabolic behavior.
Although transcriptomics and proteomics have been explored extensively for mammalian
cell engineering (Korke et al., 2002; Seow et al., 2001; Seth et al., 2007; Smales et al., 2004; de
la Luz et al., 2007, 2008), these tools fall short of generating direct measurements of the
physiological state of the cell. It is essential to combine these techniques with metabolic flux
4 Metabolomics
3. Experimental design
3.1 Cell culture growth, stimulation
The differences in optimized cell culture growth conditions present another major concern
for cell line metabolomics. This is particularly an issue in studies involving comparative
analysis of several different cell types, whose growth media might contain different levels of glucose,
glutamine and lactate, as well as other nutrients and additives, which will probably lead to
differences in the metabolome of the cells. If possible, it is recommended to use the same
growth medium for all cell lines in the study to reduce variance in metabolic profile that can
be caused by the medium (Cuperlovic-Culf et al., 2010). The standard supplementation of cell
culture medium with serum of animal origin can add another level of complexity to the
optimization of cell growth conditions. Variations in serum can lead to contamination with
exogenous metabolites and alterations of endogenous cell metabolites.
In order to minimize the influence of different cell culture conditions on the final
metabolomic results, proper experimental designs are crucial. Nonetheless, more effort is required in
the future to determine the metabolic differences caused by various growth
conditions, cell culture age and/or passage number for different cell lines.
Several reviews have dealt with the application of NMR and MS in metabolomics (Ala-
Korpela, 2008; Detmer et al., 2007; Griffin, 2003). NMR is a non-invasive, non-destructive,
highly discriminatory and fast method that can analyze rather crude samples. NMR
spectroscopy can be performed without extensive sample preprocessing and separation and
provides several different experimental protocols optimized for mixture analysis and
molecular formula or structure determination. The results of NMR measurements have
proven highly replicable across centers and instruments (Viant et al., 2005). NMR can
provide measurements for different types and sizes of both polar and non-polar molecules
through analysis of different spectral windows. In addition, NMR instruments are highly
versatile and, with only minor changes in probes, users can obtain spectral information for
different nuclei (1H, 13C, 15N, and 31P among others) in solution or solid samples and even in
vivo (Griffin, 2003). NMR is also the only method used in metabolomics that currently
enables direct measurements of molecular diffusion, interactions and chemical exchange.
Several databases and methods are being developed that enable metabolite identification
and quantification from NMR spectra (Table 1). The major problem with NMR technology
as applied to metabolomics is its low sensitivity, which limits the majority of currently
available instruments to measurement of fewer than 100 metabolites.
The role of MS in metabolomic research is constantly expanding, whether the focus is on
profiling (targeted analysis) or pattern-based analysis (Hollywood et al., 2006). Recent
technological advances in separation science, ion sources and mass analyzers have
considerably increased the sensitivity, selectivity, specificity and speed of metabolite
detection and identification by MS. There are five important considerations that need to be
dealt with in any global metabolite analysis by MS:
1. The efficient and unbiased extraction of metabolites from the sample matrix
2. Separation or fractionation of the analytes by chromatography
3. Ionization of the analyte metabolite
4. Detection of mass signals
5. Analyte identification
mass resolution analysis capability with mass accuracy approaching single digit ppm.
Recently developed instruments also allow rapid polarity switching between positive and
negative mode within a single run, reducing the need for multiple runs and cost per sample
(Gummer et al., 2009).
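As a small illustration of what a mass accuracy in the single-digit ppm range means in practice, the sketch below computes the relative mass error of a measured m/z against a theoretical value. The function names, the example ion and the measured value are illustrative assumptions, not data from this chapter.

```python
def mass_error_ppm(measured_mz: float, theoretical_mz: float) -> float:
    """Signed relative mass error in parts per million (ppm)."""
    if theoretical_mz <= 0:
        raise ValueError("theoretical m/z must be positive")
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float,
                     tol_ppm: float = 5.0) -> bool:
    """True if the measured m/z matches the candidate within tol_ppm."""
    return abs(mass_error_ppm(measured_mz, theoretical_mz)) <= tol_ppm


# Illustrative values (assumption): protonated glucose, [M+H]+,
# theoretical m/z of about 181.0707, and a hypothetical reading of 181.0712.
error = mass_error_ppm(181.0712, 181.0707)   # about 2.8 ppm
match = within_tolerance(181.0712, 181.0707)
```

A tolerance of 5 ppm is used here only as a plausible default; the appropriate window depends on the instrument and its calibration.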
Analysis                                 NMR         MS
High throughput (metabolites)            No          Medium
High throughput (samples); automation    Yes         No
Quantitative                             Yes         Yes
Availability in clinic                   No          No
Equipment cost                           High        High
Maintenance cost                         Medium      High
Per sample cost                          Low         High
Required technical skills                Yes         Yes
Sensitivity                              Medium      High
Reproducibility                          High        Low
Data analysis automation                 Yes         Yes
Identification of new metabolites        Difficult   Possible
Chemical exchange analysis               Yes         No
In vivo measurement                      Possible    Impossible
Table 2. Comparison of characteristics of major experimental methods for metabolomic
analysis
LC-MS linear quadrupole, triple quadrupole (QQQ), QTrap and ion trap mass analyzers have
also been utilized for global and targeted metabolomics, but may be limited by mass accuracy
and mass resolution in identifying metabolites. However, triple quadrupole and
QTrap mass analyzers operated in various selective ion scanning modes can detect specific
metabolites or metabolite classes with high sensitivity and are particularly useful for targeted
metabolomic analysis.
CE-MS offers a complementary approach to LC-MS for analyzing anions, cations, and
neutral particles in a single run. Metabolites can be analyzed directly without derivatization,
and the resolution and sensitivity of CE are very high. However, CE is less
frequently used for metabolomic analyses than LC-MS.
Vibrational spectroscopies are relatively insensitive, but FTIR allows for high throughput
screening of biological samples in an unbiased fashion. As with NMR, water signals pose a
problem and must be subtracted electronically, or attenuated total reflectance may be used.
Compared with the other methods FTIR is one of the least sensitive, but its lack of bias towards
particular compounds and its ability to analyse large numbers of samples in a day make it a plausible
method for screening purposes (Khoo and Al-Rubeai, 2007).
LC-MS-based instruments can be operated in direct infusion mode with no chromatographic
separation for measurement of the total mass spectrum for the mixture. The infusion can be
performed with either the LC autosampler or with an offline syringe pump. Ion trap, TOF,
Q-TOF, Orbitrap and FT-ICR-MS mass analyzers have been used with this mode of sample
delivery. This approach relies totally on the mass analyser to resolve isobaric metabolites
such as leucine and isoleucine. The key advantage of direct infusion analysis is the potential
for automated high throughput sample analysis with both low and high mass resolution
mass analyzers.
Metabolomics and Mammalian Cell Culture 11
Another beneficial experimental method for cell culture metabolomics analysis involves
stable isotope labeling followed by either MS or NMR measurement. This approach enables
pathway tracing, easier metabolite assignment and metabolic flux measurements. Isotopic
labeling has previously enabled detailed determination of pathways leading to the
production of specific metabolites and the development of highly accurate mathematical
models of these pathways (Hollywood et al., 2006).
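One quantity commonly derived from stable isotope labeling data is the fractional enrichment of a metabolite, calculated from its mass isotopomer distribution. The sketch below is a minimal illustration under stated assumptions: the function name and the example distribution are hypothetical, and the input is assumed to be already corrected for natural isotope abundance.

```python
def fractional_enrichment(mid):
    """Mean fraction of labelled atoms in a metabolite.

    mid[i] is the abundance of the M+i isotopologue, i = 0..n, where n is
    the number of atoms that can carry the label (e.g. carbons for 13C).
    Abundances need not be normalised; they are divided by their sum.
    """
    n = len(mid) - 1
    total = sum(mid)
    if n < 1 or total <= 0:
        raise ValueError("need at least M+0 and M+1 with positive total abundance")
    return sum(i * a for i, a in enumerate(mid)) / (n * total)


# Hypothetical distribution for a three-carbon metabolite such as lactate:
# 50% M+0, 10% M+1, 10% M+2, 30% M+3.
lactate_mid = [0.5, 0.1, 0.1, 0.3]
enrichment = fractional_enrichment(lactate_mid)   # (0.1 + 0.2 + 0.9) / 3 = 0.4
```

Comparing this value between a labeled substrate and its downstream products gives a first indication of how much of a product pool derives from that substrate.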
Table 3. Cellular growth parameters from host NS0 and recombinant NS0 cell lines adapted
and non-adapted to protein-free medium
Fig. 3. Metabolic model of the adaptation of the NS0 cell line to protein-free medium obtained
from proteomic and metabolite analyses. Red: under-expressed pathways. Blue:
over-expressed pathways.
With the aim of calculating the relative rates of glycolysis and glutaminolysis, the intracellular
concentrations of lactate and glucose were determined in the batch culture and the
ratio between lactate production and glucose consumption (qL/qG) was calculated
(Figure 4). These results indicated that the lactate produced depends on both glycolysis and
glutaminolysis. Taking into account the proteomic and metabolic results, we have proposed a
metabolic mechanism in which glucose is used for precursor synthesis, while
the cells obtain their energy from glutamine degradation.
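The qL/qG ratio mentioned above can be estimated from two sampling points of a batch culture. The sketch below is one simple way of doing so, not necessarily the procedure used in this study: it assumes specific rates are obtained by dividing the concentration change by the trapezoidal integral of viable cell density, and all names and numbers are illustrative.

```python
def specific_rate(c_start, c_end, x_start, x_end, dt_h):
    """Specific rate over one sampling interval.

    c_*  : metabolite concentration (mM) at the start and end of the interval
    x_*  : viable cell density (1e6 cells/mL) at the start and end
    dt_h : interval length in hours
    Positive values mean net production, negative values net consumption.
    """
    ivcd = 0.5 * (x_start + x_end) * dt_h   # integral of viable cell density
    if ivcd <= 0:
        raise ValueError("cell density integral must be positive")
    return (c_end - c_start) / ivcd


def lactate_glucose_ratio(glc, lac, cells, dt_h):
    """qL/qG from (start, end) tuples of glucose, lactate and cell density."""
    q_glc = -specific_rate(glc[0], glc[1], cells[0], cells[1], dt_h)  # consumption
    q_lac = specific_rate(lac[0], lac[1], cells[0], cells[1], dt_h)   # production
    if q_glc <= 0:
        raise ValueError("no net glucose consumption in this interval")
    return q_lac / q_glc


# Hypothetical interval: glucose 20 -> 15 mM, lactate 2 -> 10 mM,
# cells 0.5 -> 1.0 (1e6 cells/mL) over 24 h.
ratio = lactate_glucose_ratio((20.0, 15.0), (2.0, 10.0), (0.5, 1.0), 24.0)  # 1.6
```

Because glycolysis yields at most two lactate per glucose, a ratio close to or above 2 mol/mol would suggest that part of the lactate originates from another source such as glutamine.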
We used flux balance analysis (FBA) to compare the results obtained with an
empirical metabolic network with the experimental results. In this study we used a reported
metabolic network (Ma and Zeng, 2003) with changes in the cholesterol reactions, in which the
cholesterol synthesis pathway was eliminated for non-adapted NS0. The comparison
between the adapted and non-adapted metabolic networks showed changes in carbohydrate and
lipid metabolism, very similar to our previous experimental results. We also analyzed the
metabolites that influence cellular growth when they are not present in the medium.
Glycine, tryptophan, phenylalanine, adenine, palmitic acid, glutamic acid, methionine and
asparagine are related to the increase of cellular biomass (data not shown).
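The principle of this kind of analysis can be sketched on a toy network. The example below is not the Ma and Zeng (2003) network or the model used in this study; the three metabolites, five reactions and all bounds are invented for illustration, and SciPy's linprog is assumed as the linear programming solver. FBA maximizes a biomass flux subject to steady-state mass balances (S v = 0) and flux bounds, and removing a nutrient from the medium is mimicked by setting its uptake bound to zero.

```python
import numpy as np
from scipy.optimize import linprog

# Toy stoichiometric matrix (rows: metabolites A, B, C; columns: reactions).
# v0: uptake of A      v1: uptake of C      v2: A -> B
# v3: biomass (B + C -> biomass)            v4: secretion of B
S = np.array([
    [1, 0, -1,  0,  0],   # A
    [0, 0,  1, -1, -1],   # B
    [0, 1,  0, -1,  0],   # C
], dtype=float)
BIOMASS = 3  # column index of the biomass reaction


def max_biomass(bounds):
    """Maximise the biomass flux subject to S v = 0 and the given flux bounds."""
    c = np.zeros(S.shape[1])
    c[BIOMASS] = -1.0                      # linprog minimises, so negate
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[BIOMASS]


# Hypothetical medium: uptake of A limited to 10, uptake of C limited to 5.
full_medium = [(0, 10), (0, 5), (0, 1000), (0, 1000), (0, 1000)]
without_c = list(full_medium)
without_c[1] = (0, 0)                      # nutrient C absent from the medium

growth_full = max_biomass(full_medium)     # limited by C uptake
growth_without_c = max_biomass(without_c)  # no growth without C
```

In this toy case C behaves like an essential medium component: growth drops to zero when it is removed, which is the type of result used to flag metabolites that influence cellular growth.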
Fig. 4. Glucose consumption and lactate production during a batch culture of a myeloma cell
line in the presence and absence of serum. During the culture period, samples were taken
periodically for off-line analysis and the concentrations of metabolites in the medium were determined. The
ratio between lactate production and glucose consumption is representative of the
cellular metabolic state, especially of the efficiency of glycolysis.
5. Conclusion
Data integration is not limited to flux data. Systems biology encompasses a holistic
approach to the study of biology and the objective is to simultaneously monitor all
biological processes operating as an integrated system. The use of the data obtained from
studies with different “omics” techniques is not simple. In addition, a single gene may code
for isoenzymes reacting with multiple metabolite substrates. The difficulty in determining
the timing of different events, that is, transcription and protein activity, also contributes to
the difficulty in integrating data. Hence, in order for metabolomics to be used in systems
biology, novel strategies will need to be created. One step forward in such an integration
process is the functional assignments between protein/gene and metabolite within a system
of interest. This can be done by creating models where basic biochemical pathways are
modelled using static data (Khoo and Al-Rubeai, 2007). Second, time-dependent
concentrations of other types of components (transcriptomics and/or proteomics) will then
be incorporated, followed by the reconstruction of the model with statistical data.
In contrast with previous results, changes in metabolic rates and biosynthetic machinery
with respect to the presence or absence of serum in the culture medium were observed in this
study. The analysis was performed in two different ways. First, using iTRAQ reagents,
proteins with differential expression levels in two myeloma cell lines cultured in
serum-supplemented and serum-free medium were detected. These proteins belong to major
pathways related to glycolysis, protein synthesis and membrane transport. These
results are in accordance with previous results obtained using 2DE and the study of a
revertant
6. References
Earle W.R. et al. (1954) Certain factors limiting the size of the tissue culture and the
development of mass cultures. New York Academy of Sciences 58, 1000-1011
Lake-Ee Quek et al. (2010) Metabolic flux analysis in mammalian cell culture. Metabolic
Engineering 12, 161-171
Warburg O. (1956) On the origin of cancer cells. Science 123, 309-314
Fitzpatrick L. et al. (1993) Glucose and glutamine metabolism of a murine B-lymphocyte
hybridoma grown in batch culture. Applied Biochemistry and Biotechnology 43, 93-116
Jenkins H.A. et al. (1992) Characterization of glutamine metabolism in two related murine
hybridomas. Journal of Biotechnology 23, 167-182
Ljunggren J. and Haggstrom L. (1992) Glutamine limited fed-batch culture reduces the
overflow metabolism of amino acids in myeloma cells. Cytotechnology 8, 45-56
Ozturk S.S. and Palsson B.O. (1991) Growth, metabolic, and antibody production kinetics of
hybridoma cell culture: 1 Analysis of data from controlled batch reactors.
Biotechnology Progress 7, 471-480
Martinelle K. et al. (1998) Elevated glutamate dehydrogenase flux in glucose-deprived
hybridoma and myeloma cells: evidence from H-1/N-15 NMR. Biotechnology and
Bioengineering 60, 508-517
Korke R. et al. (2002) Genomic and proteomic perspectives in cell culture engineering.
Journal of Biotechnology 94, 73-92
Seow T.K. et al. (2001) Proteomic investigation of metabolic shift in mammalian cell culture.
Biotechnology Progress 17, 1137-1144
Seth G. et al. (2007) Molecular portrait of high productivity in recombinant NS0 cells.
Biotechnology and Bioengineering 97, 933-951
Smales C.M. et al. (2004) Comparative proteomic analysis of GS-NS0 murine myeloma cell
lines with varying recombinant monoclonal antibody production rate. Biotechnology
and Bioengineering 88, 474-488
de la Luz K.R. et al. (2007) Proteomic analysis of the adaptation of the host NS0 myeloma
cell line to a protein-free medium. Biotecnología Aplicada 24, 215-223
de la Luz K.R. et al. (2008) Metabolic and proteomic study of NS0 myeloma cell line
following the adaptation to protein-free medium. Journal of Proteomics 71, 133-147
Oliver S.G. et al. (1998) Systematic functional analysis of the yeast genome. Trends in
Biotechnology 16, 373-378
Beecher C.W. (2002) Metabolomics: A new "omics" technology. American Genomics -
Proteomics Technology
Griffin J.L. (2003) Metabonomics: NMR spectroscopy and pattern recognition analysis of
body fluids and tissues for characterization of xenobiotic toxicity and disease
diagnosis. Current Opinion in Chemical Biology 7, 648-654
Barnes L. et al. (2000) Advances in animal cell recombinant protein production: GS-NS0
expression system. Cytotechnology 32, 109-123
Jenkins H. et al. (1992) Characterisation of glutamine metabolism in two related murine
hybridomas. Journal of Biotechnology 23, 167-182
Gummer J. et al. (2009) Use of mass spectrometry for metabolite profiling and metabolomics.
Australian Biochemist 40, 5-8
Sinacore M. et al. (2000) Adaptation of mammalian cells to growth in serum-free media.
Molecular Biotechnology 15, 249-257
Spens E. and Haggstrom L. (2005) Defined protein and animal component-free NS0
fed-batch culture. Biotechnology and Bioengineering 98, 6
Jung Y. et al. (2006) Identifying differentially expressed genes in meta-analysis via Bayesian
model-based clustering. Biomolecules Journal 48, 435-450
Seth G. et al. (2005) Large-scale gene expression analysis of cholesterol dependence in NS0
cells. Biotechnology and Bioengineering 90, 552-567
Ma H. and Zeng A. (2003) Reconstruction of metabolic networks from genome data and
analysis of their global structure for various organisms. Bioinformatics 19, 270-277
2
1. Introduction
The baker’s yeast Saccharomyces cerevisiae and its beneficial properties were recognized
very early by human beings. It was used in the making of alcoholic beverages, bread
and cake long before the term biotechnology was coined. In addition to its great
importance in the food industry, S. cerevisiae strains are nowadays applied in many other fields,
for example in the production of biofuels from corn or sugar-containing crops, in the
biosorption of heavy metals from sewage, in pharmaceuticals, or in the production of precursor
compounds for the synthesis of pharmaceuticals or fine chemicals. As a consequence, S.
cerevisiae has developed into one of the most important and best investigated microbial cell
factories for industrial (white) biotechnology. Furthermore, S. cerevisiae is an important
model organism used to elucidate the underlying molecular mechanisms that are
involved in complex diseases (cancer or diabetes) and metabolic disorders (Castrillo and
Oliver 2005; Castrillo and Oliver 2006; Nielsen and Jewett 2008). Other important features of
S. cerevisiae that have led to its multifaceted applicability in industry and R&D are its
GRAS (generally recognized as safe) status and the fact that cells are very easy to cultivate and are
readily available.
The physiology of S. cerevisiae under various environmental conditions has been
investigated intensively over the last 140 years (Racker 1974). The baker’s yeast exhibits
some very interesting physiological features that render it unique among
microorganisms. It grows nearly equally fast under aerobic and anaerobic conditions with
glucose as the sole carbon source (Nissen et al. 2000a; Visser et al. 1990). Under aerobic
conditions and at glucose concentrations above 100 mg/L, biomass formation is
accompanied by the production of ethanol as a consequence of overflow metabolism at
the pyruvate node (Crabtree effect; Crabtree 1928). After depletion of glucose, the
ethanol initially formed by overflow metabolism is further converted into biomass
under aerobic conditions (diauxie). Under anaerobic conditions about 90% of glucose
carbon is converted into ethanol and CO2. The rate of glucose utilization and the specific
ethanol yield are higher under anaerobic conditions than the sugar conversion
rate and ethanol yield under aerobic conditions (Pasteur effect; Racker 1974). It can
reduce a number of keto-compounds to the corresponding chiral alcohols that represent
interesting precursors for pharmaceuticals (Csuk 1991). It can grow as a diploid as well as
a haploid, which greatly facilitates genetic manipulation and permits high-throughput
genetic engineering.
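To put the anaerobic conversion of glucose into ethanol and CO2 described above into numbers, the following sketch works through the stoichiometry C6H12O6 -> 2 C2H5OH + 2 CO2. It is a back-of-the-envelope illustration: the function name is hypothetical, and treating the "about 90%" of glucose carbon as the fraction of glucose that is fermented is a simplifying assumption.

```python
# Molar masses in g/mol (rounded standard values).
M_GLUCOSE = 180.16
M_ETHANOL = 46.07

# C6H12O6 -> 2 C2H5OH + 2 CO2: maximum ethanol yield per gram of glucose.
THEORETICAL_YIELD = 2 * M_ETHANOL / M_GLUCOSE   # about 0.511 g/g

# Four of the six glucose carbons end up in ethanol, two in CO2.
CARBON_FRACTION_IN_ETHANOL = 4 / 6


def ethanol_from_glucose(glucose_g: float, fermented_fraction: float = 0.9) -> float:
    """Ethanol (g) formed when the given fraction of glucose is fermented."""
    if not 0.0 <= fermented_fraction <= 1.0:
        raise ValueError("fermented_fraction must be between 0 and 1")
    return glucose_g * fermented_fraction * THEORETICAL_YIELD


# Assumed example: 100 g of glucose, 90% of it fermented to ethanol and CO2.
ethanol_g = ethanol_from_glucose(100.0)
```

The remaining carbon is available for biomass and by-products such as glycerol, which is why practical ethanol yields stay below the theoretical 0.511 g/g.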
Considering the enormous early interest in studying and understanding the physiology of
S. cerevisiae long before modern omics techniques were developed, it is not very
surprising that the baker’s yeast genome was the first within the domain of
eukaryotes to be completely sequenced. Genomic and biological information about S.
cerevisiae molecular biology is comprehensively collected at the Saccharomyces Genome
Database (SGD, http://www.yeastgenome.org/). Driven by the knowledge of the
complete genomic sequence and by the steadily increasing availability of tools developed
for genetic engineering, S. cerevisiae became a key workhorse and the representative
eukaryotic model organism in every modern discipline within the biosciences, such as
molecular and cell biology, functional genomics, systems biology, or metabolic and
synthetic engineering. Today’s genetic work with S. cerevisiae cells is greatly facilitated by
a wide spectrum of established yeast molecular biology tool kits and the
availability of many wild-type and mutant strain (e.g. knock-out strain) collections, as
well as plasmid collections containing S. cerevisiae ORFs, gene deletion markers,
promoter sets and many more, offered by commercial sources such as EUROSCARF
(http://web.uni-frankfurt.de/fb15/mikro/euroscarf/index.html), Open Biosystems
(http://www.openbiosystems.com/Products/) or Addgene (http://www.addgene.org/).
The commercial establishment of genetic manipulation techniques paved the way for S.
cerevisiae to be exploited in the field of metabolic engineering. Various novel recombinant
designer strains capable of either selective formation of one desired product or of producing
heterologous compounds or endogenous products from new resources (waste or renewable
materials) have emerged in recent decades. Metabolic engineering efforts based on S. cerevisiae
are comprehensively summarized elsewhere and the interested reader is referred to Bettiga
et al. (2010) and Nevoigt (2008). A collection of engineered substrate utilization and heterologous
or homologous product formation pathways is given in Table 1.
The underlying engineering principles can be broken down into four
strategies, as depicted in Figure 1, panels A-D. Choosing the appropriate engineering
approach is the most important step in designing novel cellular properties and
aims at identifying the reaction(s) or even entire pathways that are suited to the
anticipated metabolic engineering objective. Relevant reaction(s) and associated gene(s) can
be extracted by thorough screening of literature data (US National Library of Medicine
(http://www.ncbi.nlm.nih.gov/pubmed), SciFinder (https://scifinder.cas.org/) or Web of
Knowledge (http://wokinfo.com/)) and online databases (KEGG, the Kyoto Encyclopedia
of Genes and Genomes (http://www.genome.jp/kegg/), the enzyme database BRENDA
(http://www.brenda-enzymes.org/), or the SIB Bioinformatics Resource Portal ExPASy
(http://www.expasy.ch/)).
To increase the probability of engineering success, identified targets can be subjected to in
silico modeling by employing mathematical models such as restricted flux balance analysis
(FBA) based on a genome-scale stoichiometric network to verify their compatibility with the
underlying metabolic network (Cvijovic et al. 2010; Selvarajoo et al. 2010).
Quantitative Metabolomics and Its Application in
Metabolic Engineering of Microbial Cell Factories Exemplified by the Baker’s Yeast 21
Fig. 1. Typical metabolic engineering principles based on rational design (panels A – D) are
linked to a suggested experimental work-flow to unravel limiting metabolic sites. Panels A-
D refer to enabling of substrate utilization (A) or product formation (B), preventing side
product formation by deletion and/or overexpression of an endogenous enzyme (C),
increasing selectivity of a substrate promiscuous enzyme (D); Substrate A, intermediate B,
product P and enzymes E new to the network are indicated in grey. Overexpression of an
endogenous enzyme is depicted by a grey e. Knock outs are indicated by grey x’s. Subscripts
of rate constants v given as numbers and small letters refer to fluxes based on stoichiometry
(solid arrows) and individual reaction rates of enzymes (dotted arrows), respectively.
Directing the carbon flux towards P can be achieved by deletion of the respective gene(s),
overexpression of enzymes participating in the productive branch, or replacement of the
corresponding activity by a less regulated or more selective one. Furthermore, unbalanced
carbon usage between reaction partners participating in this pathway and/or in the
recycling of, for example, cofactors can result in accumulation and release of a pathway
intermediate (Krahulec et al. 2009; Krahulec et al. 2010). In this case, fine-tuning of all
activities involved, based for example on a metabolic control analysis (MCA) or kinetic
modeling analysis, is required to minimize or even completely prevent by-product
accumulation (Parachin et al. 2011).
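The idea behind a metabolic control analysis can be illustrated on a minimal two-enzyme pathway S -> X -> P. The sketch below is not taken from the cited studies: the rate laws (a reversible linear first step and an irreversible linear second step), all parameter values and the function names are assumptions. It estimates the flux control coefficient of each enzyme, defined as the relative change in steady-state flux per relative change in enzyme level, by a small finite perturbation.

```python
import math


def steady_state_flux(e1, e2, k1=1.0, k2=4.0, keq=2.0, s=10.0):
    """Steady-state flux of S -> X -> P with assumed linear kinetics.

    v1 = e1 * k1 * (s - x / keq)   (reversible first step)
    v2 = e2 * k2 * x               (irreversible second step)
    Setting v1 = v2 gives the intermediate level x and the pathway flux.
    """
    a = e1 * k1
    b = e2 * k2
    x = a * s / (b + a / keq)
    return b * x


def flux_control_coefficients(e1=1.0, e2=1.0, h=1e-3):
    """Estimate C1 and C2 as d ln J / d ln e_i by central differences."""
    def log_slope(flux_up, flux_down):
        return (math.log(flux_up) - math.log(flux_down)) / (
            math.log(1 + h) - math.log(1 - h))

    c1 = log_slope(steady_state_flux(e1 * (1 + h), e2),
                   steady_state_flux(e1 * (1 - h), e2))
    c2 = log_slope(steady_state_flux(e1, e2 * (1 + h)),
                   steady_state_flux(e1, e2 * (1 - h)))
    return c1, c2


c1, c2 = flux_control_coefficients()
```

With these assumed parameters most of the flux control lies with the first enzyme, and the two coefficients add up to one, as expected from the summation theorem of MCA. An enzyme with a coefficient near zero is a poor overexpression target, which is the kind of guidance such an analysis provides for fine-tuning activities.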
Aside from rational design, stochastic methods based on inverse metabolic engineering have
been developed for S. cerevisiae to identify key target reactions and associated gene
sequences enabling the desired new cellular property (Bailey et al. 2002; Bengtsson et al.
2008; Bro et al. 2005; Hong et al. 2010; Jin et al. 2005; Lee et al. 2010). In contrast, methods
that aim to induce a cellular property that is hard to capture by in silico design because of
the highly intricate metabolic relations that have to be satisfied, such as growth, an increased
substrate conversion rate or enhanced resistance to environmental stress,
rely on the ability of cells to adapt to a given environmental stress through evolution (Cakar et al.
2009; Cakar et al. 2005; Garcia Sanchez et al. 2010; Kuyper et al. 2005; Sonderegger and Sauer
2003; Wisselink et al. 2009).
In the course of establishing systems biology various high-throughput omics techniques such
as transcriptomics, proteomics, fluxomics and others have been developed with the objective
to comprehensively analyze cellular physiology at all molecular levels (DNA, RNA, protein,
flux, and metabolite). Data-driven analysis is often exploited to unravel novel interrelations at
the various molecular levels or to obtain a more insightful (quantitative) understanding of
cellular processes. It is obvious that metabolic engineering can greatly benefit from the
integration of omics techniques in the design of improved microbial cell factories (Nielsen and
Jewett 2008). The various omics tools have helped to increase understanding about how cells
regulate, communicate and adapt to different environmental conditions.
Depending on the metabolic engineering objective, the appropriate omics tool or a
combination of tools should be selected after due consideration. For example, transcriptome analysis
provides a holistic image of mRNA patterns and levels but does not tell us anything
about metabolic fluxes. Optimization of the flux towards a specific metabolite, however,
represents one of the major goals in metabolic engineering. Metabolic flux analyses based on
stoichiometric models or 13C-isotopomer analysis (provided that cells can grow under the
environmental conditions used) are useful tools in this respect (Nielsen and Jewett 2008). To
understand the underlying mechanistic relationships between the flux through a particular
pathway and the enzymes forming the pathway, which provide the relevant information for
strain design, detailed knowledge about enzyme-metabolite interactions is required.
Consequently, quantitative information about the metabolites involved, together with detailed
knowledge of the kinetic properties of the participating enzymes, is mandatory. Within the omics
family, metabolomics is the youngest member. This is basically due to the fact that
metabolites vary greatly in their physico-chemical properties (polarity, acidity, reactivity,
and stability) and are present in a large dynamic concentration range, which makes it almost
impossible to record the entire metabolome on a single analytical platform. Another
challenge is the generation of reliable and representative metabolite data from
biological samples. Cell-wall leakage, instabilities and losses of metabolites throughout the
sample work-up, or strong matrix effects in the MS analysis are a few of the many causes
that impair metabolite data and, as a consequence, distort molecular mechanistic
interpretation. Nevertheless, in recent years much progress has been made thanks to the
enormous efforts of the yeast research community to overcome these obstacles. Protocols for
unbiased sample work-up and different analytical platforms are available today that can
cover more than 100 compounds quantitatively.
This review presents currently accepted protocols and techniques that enable the acquisition of
absolute quantitative metabolite data from S. cerevisiae cells. The second part focuses on how
quantitative metabolite data can help in the development of improved microbial cell factories.
However, before going into the details, some definitions of terms used in metabolome
analysis should be recalled (Nielsen 2007). Metabolite profiling targets the qualitative
or semi-quantitative analysis of specified metabolites or groups of metabolites. In contrast, in
metabolite target analysis selected metabolites are quantified. If the entire metabolome or a
fraction of it is addressed (or as many metabolites as possible), qualitatively or quantitatively,
we speak of metabolomics or quantitative metabolomics, respectively.
found that the volume ratio of sample to quenching solution affects the quenching quality and
a ratio of at least 1 to 5 was suggested (Canelas et al. 2008a).
Today two methods, cold methanol and cold glycerol-saline, have achieved wide
acceptance within the metabolomics community (Canelas et al. 2008a; Villas-Bôas and
Bruheim 2007). See Fig. 2 for details. In addition to the quenching temperature and time
or the volume ratio between sample and quenching solution, the time between
quenching and separation by centrifugation and the centrifugation time can influence
metabolic activity and metabolite leakage significantly. Harvesting by rapid sampling is
often coupled to quenching. Rapid sampling is especially important in continuous
steady-state cultivations and for pulse experiments in which changes of intracellular
metabolite pools induced by a certain environmental impulse are analyzed at the
sub-second time scale. Ingenious devices have been developed in recent years that enable
rapid sampling and quenching simultaneously at the millisecond scale. The various
manual, semiautomatic and fully automatic rapid sampling techniques and their pros
and cons have been comprehensively discussed, and the reader is referred to Reuss et al.
(2007), van Gulik (2010) and Villas-Bôas (2007a). A disadvantage in this context is that most of
these devices are not available on the market and are therefore not accessible to the scientific
community. Commercial accessibility, however, would be of great importance in the
context of comparability, reproducibility and standardization in quantitative
metabolomics studies. In batch cultivations, manual transfer of the cell suspension to the
quenching solution by using a pipette or a syringe is widely accepted, as environmental
conditions might not change significantly during the sample transfer (3 to 6 s)
(Villas-Bôas 2007b). These assumptions might hold for anaerobic or microaerobic conditions but
should be reconsidered in the case of aerobic cultivations, for which, for example, the O2/CO2
ratio may vary and induce changes (oxygen limitation) in cell metabolism during
sampling.
It is obvious from this example, which considers just four metabolites, that an ideal extraction
procedure that extracts the complete set of metabolites without any losses
may not exist, which calls for a compromise in the selection of conditions used for metabolite
extraction. This “inconvenience”, however, can be circumvented, provided that subsequent
metabolite detection is based on mass spectrometry, by the addition of an aliquot of
U-13C-labeled internal standard (IS) compounds to the biomass subsequent to quenching or prior to
metabolite extraction (Büscher et al. 2009; Canelas et al. 2009; Klimacek et al. 2010; Mashego
et al. 2004; Wu et al. 2005). Metabolite losses due to incomplete quantitative work-up of
samples can be addressed by application of one selected IS compound. However,
metabolite-specific instabilities, matrix effects, ion suppression, non-linear responses and day-to-day
variations can only be identified and appropriately corrected for by the addition of a
U-13C-labeled IS. A representative mixture of labeled metabolites can be easily prepared from S.
cerevisiae wild-type and/or mutant cells cultivated in standard mineral medium
supplemented with a U-13C-labeled substrate (glucose, fructose, galactose, …) under the
cultivation condition selected, using the respective appropriate quenching and extraction
procedures. Internal referencing with an IS whose metabolite composition is
representative of the cellular state to be studied should always be considered,
because intracellular metabolite levels can vary considerably depending on the
cultivation conditions used or on the cellular alterations introduced by pathway engineering
(Klimacek et al. 2010).
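
The internal-standard principle described above can be illustrated with a minimal Python sketch. It assumes that the same aliquot of U-13C-labeled extract is added to every calibration standard and every sample, and that the 12C/13C peak-area ratio depends linearly on the amount of unlabeled metabolite. The function names, the linear calibration model and all numbers are illustrative assumptions and are not taken from the cited studies.

```python
def fit_calibration(amounts, ratios):
    """Least-squares line: ratio = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(area_12c, area_13c, slope, intercept):
    """Amount of unlabeled metabolite from the 12C/13C peak-area ratio."""
    ratio = area_12c / area_13c
    return (ratio - intercept) / slope


# Hypothetical calibration: nmol of unlabeled standard vs. measured area ratio
cal_amounts = [0.5, 1.0, 2.0, 4.0]
cal_ratios = [0.25, 0.50, 1.00, 2.00]
slope, intercept = fit_calibration(cal_amounts, cal_ratios)

# Hypothetical sample measured without and with 50% ion suppression.
# Both isotopologues are suppressed equally, so the ratio is unchanged.
amount_clean = quantify(8.0e5, 1.0e6, slope, intercept)
amount_suppressed = quantify(4.0e5, 0.5e6, slope, intercept)
print(amount_clean, amount_suppressed)
```

Because the labeled and unlabeled forms of a metabolite experience the same losses and the same suppression in this model, only their ratio enters the calculation, which is why the two hypothetical samples return the same amount.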
Quantitative Metabolomics and Its Application in
Metabolic Engineering of Microbial Cell Factories Exemplified by the Baker’s Yeast 27
Various extraction protocols differing in extraction solvent (acids, bases, ethanol or
methanol, organic solvents), buffered or non-buffered solutions, pH, temperature, etc. have
been tested (Villas-Bôas 2007b), evaluated and verified for S. cerevisiae cells in terms of
metabolite coverage, efficacy and recovery (Klimacek et al. 2010) in recent years (Canelas et
al. 2009; Villas-Bôas et al. 2005b). Today three extraction procedures have achieved some
acceptance and are used within the yeast research community (see Fig. 2 for
details): boiling ethanol (BE; pioneered for S. cerevisiae cells by Gonzalez et al. (1997)),
chloroform-methanol (CM; pioneered for S. cerevisiae cells by de Koning and van Dam
(1992)) and, to some extent, freeze-thawing in methanol (FTM; pioneered for S. cerevisiae cells
by Villas-Bôas et al. (2005b)), for which, however, controversial results with respect to its
applicability are present in the literature. While Villas-Bôas et al. (2005b) found the extraction
performance of FTM sufficient, Canelas et al. (2009) concluded that FTM cannot
effectively prevent metabolite conversion throughout the extraction process and therefore
considered FTM not appropriate for metabolite extraction. Differences in evaluation criteria
and growth conditions were used as a basis to explain the different outcomes. It should be
noted, however, that Canelas et al. (2009) investigated metabolite extraction performance
for S. cerevisiae cells grown under two different physiological conditions (glucose
limitation and glucose saturation; a bioreactor coupled to a rapid sampling device was
used), and used identical U-13C-labeled compounds as IS and three different analytical methods
for quantification of a broad range of compounds. In contrast, quality and metabolite recovery
of FTM were judged by Villas-Bôas et al. (2005b) using an IS mixture
composed of compounds each representative of a substance class analyzed. The mixture
was added to the quenched cells prior to extraction. Cells were cultivated under aerobic
conditions in shake flasks and metabolites were quantified by an established GC-MS method after
metabolite derivatization with methyl chloroformate. Brief descriptions of the
respective protocols are compiled in Fig. 2. A broad spectrum of compounds covering a wide range
of chemical properties such as acidity, polarity, size and responsiveness can be
addressed with each of these extraction protocols. Details on the extraction-method-specific
compound coverage can be found in (Buescher et al. 2010; Büscher et
al. 2009; Canelas et al. 2008a; Canelas et al. 2009; Klimacek et al. 2010; Villas-Bôas and
Bruheim 2007; Villas-Bôas et al. 2005b).
adapted to high ethanol concentrations displayed an altered cell size (Dinh et al. 2008). Likewise,
overexpressing mannitol-1-phosphate dehydrogenase (M1PDH) in S. cerevisiae to produce
mannitol from glucose caused a substantial increase in cell size (Costenoble et al.
2003).
As in all eukaryotic organisms, S. cerevisiae metabolism is compartmentalized (cytosol,
mitochondrion, vacuoles), which poses a problem for the accurate determination of
concentrations of relevant intracellular metabolites. Current techniques for extracting
metabolites and isolating organelles do not allow for absolute separation from the cytosol
without altering the respective metabolite composition and pattern. Indirect strategies based
on metabolic engineering or on fundamental thermodynamic principles have been
developed to address this obstacle and have given preliminary, semi-quantitative insights
into the distribution of metabolites between cytosol and mitochondrion.
Functional expression of M1PDH from E. coli in S. cerevisiae was used as an indicator reaction to
determine the cytosolic free NAD to NADH ratio (Canelas et al. 2008b). M1PDH catalyzes the
reversible NAD(H)-dependent interconversion of fructose 6-P (F6P) and mannitol 1-P (M1P).
This reaction is directly connected to the central carbon metabolism and represents a dead-end
reaction in the metabolism of yeast under the conditions applied in that study. Based on the
assumption that the M1PDH reaction is at equilibrium, the authors were able to calculate
the NAD/NADH ratio from the equilibrium constant and the intracellular concentrations of
F6P and M1P. Data were verified by thermodynamic analysis. The cytosolic
NAD/NADH ratio was found to be ~10-fold higher than the ratio based on
the whole cell. Under anaerobic conditions, however, mannitol is formed from M1P, implying
that this approach is not yet universally applicable (Costenoble et al. 2003).
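
The indicator-reaction idea can be written down in a few lines of Python. The sketch below assumes that the reaction is written as F6P + NADH → M1P + NAD, so that the apparent equilibrium constant at the intracellular pH is K'eq = [M1P][NAD]/([F6P][NADH]). The function name and all numerical inputs are hypothetical and serve only to show the arithmetic; they are not values from Canelas et al. (2008b).

```python
def nad_nadh_ratio(k_eq, c_f6p, c_m1p):
    """Free NAD/NADH ratio from an indicator reaction at equilibrium.

    Assumes K'eq = ([M1P] * [NAD]) / ([F6P] * [NADH]); rearranging gives
    [NAD]/[NADH] = K'eq * [F6P] / [M1P].
    Concentrations may be in any unit as long as both use the same one.
    """
    if c_f6p <= 0 or c_m1p <= 0:
        raise ValueError("concentrations must be positive")
    return k_eq * c_f6p / c_m1p


# Hypothetical inputs: apparent equilibrium constant and intracellular
# concentrations of the two sugar phosphates (same unit for both)
ratio = nad_nadh_ratio(k_eq=200.0, c_f6p=0.5, c_m1p=1.0)
print(ratio)
```

The estimate is only as good as the equilibrium assumption and the value chosen for K'eq, which depends on pH, ionic strength and temperature.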
A different approach based on network-embedded thermodynamic analysis, later termed
anNET (Zamboni et al. 2008), was used by Kümmel et al. (2006) to resolve
feasible intracompartmental concentration ranges from cell-averaged metabolome data.
Although these first results are promising, there is ample room for the development of
novel strategies combined with appropriate experimental techniques that enable precise
compartment-specific quantification.
So how do metabolomics and, more specifically, quantitative metabolomics come into play? As
described above, the composition of intracellular metabolites together with their levels represents
a direct signature of the physiological state of the cells investigated. Comparison of metabolite
profiles of wild-type and mutant strain(s) has often been used to identify target reactions limiting the
conversion rate (Hasunuma et al. 2011; Kahar et al. 2011; Klimacek et al. 2010; Kötter and
Ciriacy 1993; Wisselink et al. 2010; Zaldivar et al. 2002) or to extract the metabolite pattern
representative of the new phenotype (Canelas et al. 2008b; Devantier et al. 2005; Ding et al.
2010; Hou et al. 2009; Kamei et al. 2011; MacKenzie et al. 2008; Pereira et al. 2011; Raamsdonk et
al. 2001; Ralser et al. 2007; Thorsen et al. 2007; Usaite et al. 2009; Villas-Bôas et al. 2005a;
Villas-Bôas et al. 2005c; Yoshida et al. 2008). Even apparently silent phenotypes of S. cerevisiae single
deletion mutants can be traced back to the underlying mutation on the basis of the
resulting metabolome (Raamsdonk et al. 2001). However, the rate at which a compound’s
carbon skeleton is channeled through a certain pathway is directly linked to the level of active
enzymes present and their affinity for the participating reactants, as well as to the fundamental
thermodynamic laws governing the reactions involved. Consequently, knowledge of intracellular
metabolite concentrations and enzyme activities, combined with thermodynamic and enzyme
kinetic analysis, can provide novel and valuable insights into the kinetic organization of the
engineered pathway or even the associated metabolic network, which eventually exposes key
regulatory or flux-limiting sites. In contrast to the holistic approach usually found in
systems biology, pathway analysis in metabolically engineered cells can in most
cases be reduced to the components involved in the new pathway and those connecting this pathway to
central carbon metabolism (Parachin et al. 2011).
$$A + B \rightleftharpoons C + D \qquad (1)$$
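The chemical equilibrium constant (Keq) associated with this reaction can be defined
according to the law of mass action as

$$K_{eq} = \left( \frac{c_C \, c_D}{c_A \, c_B} \right)_{eq} \qquad (2)$$

and is related to the standard reaction Gibbs energy ($\Delta_r G^0$) by

$$\Delta_r G^0 = \sum_{i=1}^{N_s} \nu_i \, \Delta_f G_i^0 = -RT \ln K_{eq} \qquad (3)$$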
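in which $\nu_i$ and $\Delta_f G_i^0$ correspond to the stoichiometric coefficient of reactant i and to the
standard Gibbs energy of formation of species i at a specified T, P and ionic
strength, respectively. R and T denote the general gas constant (8.314 J/mol/K) and absolute
temperature in Kelvin (K), respectively. The Gibbs energy of formation of a reactant i ($\Delta_f G_i$)
at concentration $c_i$ is further defined by

$$\Delta_f G_i = \Delta_f G_i^0 + RT \ln c_i \qquad (4)$$

so that the reaction Gibbs energy ($\Delta_r G$) becomes

$$\Delta_r G = \sum_{i=1}^{N_s} \nu_i \, \Delta_f G_i^0 + RT \sum_{i=1}^{N_s} \nu_i \ln c_i = \Delta_r G^0 + RT \ln Q \qquad (5)$$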
Q in Equation (5) denotes the reaction quotient cC cD / (cA cB), which is also known as
the “mass action ratio” and is frequently abbreviated as Γ. Rearranging Equation (4)
yields
strength and temperature is calculated with respect to the underlying stoichiometry of the
reaction. Values for K’eq are usually applied in the context of network analysis. Importantly,
all reactants and the enzyme must be of the highest purity and must be stable throughout the
incubation. The composition of the reaction mixture at equilibrium must not change during
the subsequent component analysis.
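
As a numerical illustration of Equations (3) and (5), the following Python sketch computes the standard and the actual reaction Gibbs energy for the model reaction A + B ⇌ C + D. It assumes that activities can be replaced by concentrations; for this stoichiometry the concentration units cancel in Q. The function names, the value of Keq, the temperature and the concentrations are hypothetical.

```python
import math

R = 8.314  # general gas constant in J/mol/K, as given in the text


def standard_reaction_gibbs(k_eq, temperature):
    """Standard reaction Gibbs energy from Keq, in J/mol: -RT ln(Keq)."""
    return -R * temperature * math.log(k_eq)


def reaction_gibbs(k_eq, c_a, c_b, c_c, c_d, temperature):
    """Reaction Gibbs energy for A + B <-> C + D, in J/mol."""
    q = (c_c * c_d) / (c_a * c_b)  # reaction quotient (mass action ratio)
    return standard_reaction_gibbs(k_eq, temperature) + R * temperature * math.log(q)


T = 303.15    # hypothetical cultivation temperature in K (30 degrees C)
K_EQ = 10.0   # hypothetical equilibrium constant

dg0 = standard_reaction_gibbs(K_EQ, T)
# Hypothetical intracellular concentrations in mol/L; here Q = 0.1 < Keq
dg = reaction_gibbs(K_EQ, c_a=1e-3, c_b=1e-3, c_c=1e-3, c_d=1e-4, temperature=T)
# At Q = Keq the reaction is at equilibrium and the Gibbs energy vanishes
dg_eq = reaction_gibbs(K_EQ, c_a=1e-3, c_b=1e-3, c_c=1e-2, c_d=1e-3, temperature=T)
print(dg0, dg, dg_eq)
```

A negative value indicates that the reaction can proceed in the forward direction under the given concentrations, and a value close to zero indicates a reaction operating near equilibrium.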
When the $\Delta_f G_i^0$ for all reactants participating in the investigated reaction system are known,
the respective values for Keq can be calculated (Alberty 1991). Values for standard Gibbs
energies of formation $\Delta_f G_i^0$ have been tabulated for a number of compounds (Alberty 2003).
Computer programs are available with which one can calculate transformed standard Gibbs
energies of formation ($\Delta_f G_i'^0$) from $\Delta_f G_i^0$ for a specified pH and ionic strength (Alberty 2003;
Zamboni et al. 2008). In addition, all possible dissociation forms of a compound are
lumped into a single reactant in $\Delta_f G_i'^0$. The respective transformed Gibbs energy of
formation for a reactant at a certain concentration ($\Delta_f G_i'$) is described by
$$\sum_{i=1}^{L} FCC_i^J \, \varepsilon_{c_j}^{i} = 0 \qquad (8)$$
Equation (8) implies that large elasticities (the concentration of a reactant exerts large control on
the enzyme reaction rate) are associated with small FCCs (the overall flux through the
pathway is not very dependent on the enzyme activity) and vice versa. Consequently,
knowledge of only one of these coefficients is required. The FCC can be determined by
varying the amount of active enzyme, for example by expressing this particular enzyme at
different expression levels, and measuring the change in flux through the pathway. The level
of enzyme present can be judged by determining its specific activity in cell-free extracts. If
the specific activity (µmol/mgprotein/min) of the isolated enzyme, the cellular protein
content (mgprotein/gCDW) and the specific cell volume are known, then the molar
concentration of this enzyme can be calculated. Enzyme levels can also be determined by
proteomics techniques and to some extent extrapolated from transcriptome data.
Alternatively, intracellular enzyme concentrations can be determined with suitable antibodies
that selectively bind to the target enzyme, or by fusing an
indicator peptide or protein tag to the enzyme to be analyzed, which permits quantification
either directly in vivo (GFP) or in vitro after separation of the target enzyme by affinity
chromatography (e.g. His-tag, Strep-tag). If rate equations and associated kinetic parameters
of all enzymes involved and at least their relative activity levels are known, mathematical
models can be applied to estimate FCCs and elasticities. Rate equations based on the
steady-state or rapid equilibrium
assumption for enzyme-catalyzed reactions are comprehensively summarized in (Segel
1993). However, kinetic parameters typically used in this approach are determined by in
vitro assays, and it has been shown that data generated in vitro can differ significantly from
those observed in vivo (Aragón and Sánchez 1985; Mauch et al. 2000; Reuss et al. 2007). This
discrepancy is most likely due to incomplete knowledge of the cellular composition and
associated enzyme-metabolite interactions as well as the difficulty of analyzing enzymes under
in vivo-like conditions at the lab bench. A promising step towards the generation of
more reliable in vivo-like in vitro enzyme kinetic data was reported recently (van Eunen et
al. 2010). The authors suggested measuring enzyme activities in a buffered reaction mixture
that simulates the intracellular medium composition of S. cerevisiae.
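
Two of the calculations mentioned above can be sketched in Python. The first function converts specific activities into a molar enzyme concentration; besides the quantities named in the text it needs the molar mass of the enzyme, which is an added assumption of this sketch. The second function approximates an FCC as the slope of ln(flux) versus ln(enzyme level) over a set of expression levels, which is only a finite-range estimate of the strictly local coefficient d ln J / d ln E. All function names and numbers are hypothetical.

```python
import math


def enzyme_molar_concentration(act_extract, act_pure, protein_content,
                               cell_volume, molar_mass):
    """Intracellular enzyme concentration in mol/L.

    act_extract     specific activity in cell-free extract (umol/min/mg protein)
    act_pure        specific activity of the isolated enzyme (umol/min/mg enzyme)
    protein_content cellular protein content (mg protein/g CDW)
    cell_volume     specific cell volume (mL/g CDW)
    molar_mass      enzyme molar mass in g/mol (assumed input of this sketch)
    """
    mg_enzyme_per_g_cdw = act_extract / act_pure * protein_content
    mmol_enzyme_per_g_cdw = mg_enzyme_per_g_cdw / molar_mass  # g/mol = mg/mmol
    return mmol_enzyme_per_g_cdw / cell_volume  # mmol/mL equals mol/L


def estimate_fcc(enzyme_levels, fluxes):
    """Least-squares slope of ln(flux) against ln(enzyme level)."""
    xs = [math.log(e) for e in enzyme_levels]
    ys = [math.log(j) for j in fluxes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical enzyme: 0.5 U/mg in extract, 50 U/mg pure, 400 mg protein/g CDW,
# 2 mL/g CDW cell volume, 40 kDa
conc = enzyme_molar_concentration(0.5, 50.0, 400.0, 2.0, 40000.0)

# Hypothetical titration: relative enzyme levels and the pathway flux measured
# at each level (synthetic data generated with a log-log slope of 0.3)
levels = [0.5, 1.0, 2.0, 4.0]
fluxes = [2.0 * e ** 0.3 for e in levels]
fcc = estimate_fcc(levels, fluxes)
print(conc, fcc)
```

With real data the log-log relationship is usually curved, so the slope should be evaluated over a narrow range around the expression level of interest.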
Using the non-linear lin-log formulation developed for metabolic network analysis by
Visser and Heijnen (2003), FCCs and elasticities can be estimated without any prior knowledge of
enzyme-specific kinetic parameters (Visser and Heijnen 2002). In this approach the reaction
rate is a nonlinear function of metabolite concentrations and is proportional to enzyme
levels. Lin-log kinetics is suited to the analysis of large perturbations of the system.
Furthermore, statistical evaluation of parameter estimates is simplified (Wu et al. 2004).
Nevertheless, it is an approximation that matches fundamental enzyme-reactant properties
only within a certain range of perturbation (Wu et al. 2004).
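
The structure of a lin-log rate law can be sketched in Python as follows. The sketch assumes the commonly used reference-state form v = J0 · (e/e0) · (1 + Σ εj · ln(cj/c0j)), in which J0, e0 and c0j are the flux, enzyme level and metabolite concentrations at a reference steady state and εj are the elasticities at that state. The function name and all numbers are hypothetical.

```python
import math


def linlog_rate(j_ref, e_rel, elasticities, conc_rel):
    """Lin-log reaction rate relative to a reference steady state.

    j_ref        flux at the reference state
    e_rel        enzyme level relative to the reference (e/e0)
    elasticities elasticity of the rate towards each metabolite
    conc_rel     metabolite concentrations relative to the reference (c/c0)
    """
    log_term = sum(eps * math.log(c) for eps, c in zip(elasticities, conc_rel))
    return j_ref * e_rel * (1.0 + log_term)


# Hypothetical enzyme with one activating substrate (elasticity 0.5)
# and one inhibiting product (elasticity -0.2)
eps = [0.5, -0.2]

v_reference = linlog_rate(1.2, 1.0, eps, [1.0, 1.0])      # reference state
v_perturbed = linlog_rate(1.2, 2.0, eps, [math.e, 1.0])   # 2x enzyme, e-fold substrate
print(v_reference, v_perturbed)
```

The rate is linear in the enzyme level and in the elasticities, which is what simplifies parameter estimation, while the logarithm keeps the dependence on concentrations non-linear.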
correlation with by-product formation in the form of xylitol, the reaction product of XR and
the substrate of XDH in the subsequent reaction (van Maris et al. 2007). Usage of 0-52%
NADPH by XR was compatible with a genome-scale metabolic network (Krahulec et
al. 2010). Detailed kinetic analysis from in vitro studies showed that XR almost
exclusively utilizes NADPH in terms of catalytic efficiency (kcat/Km,coenzyme) and the
selectivity parameter Rsel (kcat/(Ki,coenzymeKm,xylose)), which are widely used as marker
parameters for coenzyme discrimination. Based on these data alone, even a rough estimate of the
coenzyme usage of XR in the cell is not possible. To solve the coenzyme usage riddle of XR,
we determined intracellular concentrations of NADH and NADPH (Klimacek et al. 2010)
and integrated this information together with the relevant kinetic parameters (Petschacher
et al. 2005) into the mechanistically appropriate enzyme kinetic rate expression (Banta et al.
2002; Petschacher and Nidetzky 2005). A balanced coenzyme usage, fully in line with the
physiology observed, was obtained for XR (Klimacek et al. 2010). Information about the
correct flux partitioning of a particular substrate-promiscuous enzyme can be implemented as
a further constraint in a stoichiometric network to increase the reliability of flux distributions.
Furthermore, this approach was successfully applied to a series of wild-type and
mutant forms of XR to reliably predict xylitol formation (Krahulec et al. 2011).
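
Why intracellular coenzyme concentrations matter can be shown with a deliberately simplified Python sketch. It does not reproduce the mechanistic rate expression used in the cited work; instead it assumes simple competition of two alternative coenzymes for one enzyme, for which the ratio of the two rates equals the ratio of the products of specificity constant and concentration. The function name, the specificity constants and the concentrations are hypothetical.

```python
def nadph_usage_fraction(eff_nadph, eff_nadh, c_nadph, c_nadh):
    """Fraction of turnover that uses NADPH under simple substrate competition.

    eff_nadph, eff_nadh  specificity constants (kcat/Km) for each coenzyme
    c_nadph, c_nadh      free intracellular coenzyme concentrations
    Both specificity constants and both concentrations must share units.
    """
    rate_nadph = eff_nadph * c_nadph
    rate_nadh = eff_nadh * c_nadh
    return rate_nadph / (rate_nadph + rate_nadh)


# Hypothetical enzyme with a 20-fold in vitro preference for NADPH,
# operating in a cell that contains 10-fold more NADH than NADPH
fraction = nadph_usage_fraction(eff_nadph=20.0, eff_nadh=1.0,
                                c_nadph=0.1, c_nadh=1.0)
# The same enzyme at equal coenzyme concentrations
fraction_equal = nadph_usage_fraction(20.0, 1.0, 1.0, 1.0)
print(fraction, fraction_equal)
```

In this toy case a strong in vitro preference is partly offset by the concentration ratio in the cell, which illustrates why kinetic constants alone do not predict coenzyme usage in vivo.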
4. Conclusions
Quantitative metabolomics is especially suited to help identify key sites limiting an
engineered metabolic route, either within the created pathway or apart from it.
State-of-the-art protocols for sample work-up and LC-MS and GC-MS analysis permit absolute
quantification of metabolites from S. cerevisiae cells, provided that a U-13C-labeled IS is
applied. Quantitative data in turn are indispensable for reliable pathway and network
analysis in the form of thermodynamic analysis, MCA or kinetic modelling. In
combination with other omics techniques, quantitative metabolomics represents a powerful
tool to create designer microbial cell factories exhibiting improved or novel phenotypes.
5. Acknowledgement
Financial support from the Austrian Science Fund FWF (J2698) is gratefully acknowledged.
6. References
Abbott DA, Zelle RM, Pronk JT, van Maris AJ. 2009. Metabolic engineering of Saccharomyces
cerevisiae for production of carboxylic acids: current status and challenges. FEMS
Yeast Res 9(8):1123-36.
Alberty RA. 1991. Equilibrium compositions of solutions of biochemical species and heats of
biochemical reactions. Proc Natl Acad Sci U S A 88(8):3268-71.
Alberty RA. 2003. Thermodynamics of biochemical reactions. Alberty RA, editor. New
Jersey: John Wiley & Sons, Inc.
Alivisatos SG, Ungar F, Abraham G. 1964. Non-Enzymatic Interactions of Reduced
Coenzyme I with Inorganic Phosphate and Certain Other Anions. Nature 203:973-5.
Aragón JJ, Sánchez V. 1985. Enzyme concentration affects the allosteric behavior of yeast
phosphofructokinase. Biochem Biophys Res Commun 131(2):849-55.
Asadollahi MA, Maury J, Patil KR, Schalk M, Clark A, Nielsen J. 2009. Enhancing
sesquiterpene production in Saccharomyces cerevisiae through in silico driven
metabolic engineering. Metab Eng 11(6):328-34.
Asadollahi MA, Maury J, Schalk M, Clark A, Nielsen J. 2010. Enhancement of farnesyl
diphosphate pool as direct precursor of sesquiterpenes through metabolic
engineering of the mevalonate pathway in Saccharomyces cerevisiae. Biotechnol
Bioeng 106(1):86-96.
Bailey JE, Sburlati A, Hatzimanikatis V, Lee K, Renner WA, Tsai PS. 2002. Inverse metabolic
engineering: a strategy for directed genetic engineering of useful phenotypes.
Biotechnol Bioeng 79(5):568-79.
Banta S, Boston M, Jarnagin A, Anderson S. 2002. Mathematical modeling of in vitro
enzymatic production of 2-Keto-L-gulonic acid using NAD(H) or NADP(H) as
cofactors. Metab Eng 4(4):273-84.
Bengtsson O, Jeppsson M, Sonderegger M, Parachin NS, Sauer U, Hahn-Hagerdal B, Gorwa-
Grauslund MF. 2008. Identification of common traits in improved xylose-growing
Saccharomyces cerevisiae for inverse metabolic engineering. Yeast 25(11):835-47.
Bettiga M, Gorwa-Grauslund MF, Hahn-Hägerdal B. 2010. Metabolic engineering in yeast.
In: Smolke CD, editor. The metabolic pathway engineering handbook.
Fundamentals. Boca Raton: CRC Press. p 1 - 48.
Brauer MJ, Huttenhower C, Airoldi EM, Rosenstein R, Matese JC, Gresham D, Boer VM,
Troyanskaya OG, Botstein D. 2008. Coordination of growth rate, cell cycle, stress
response, and metabolic activity in yeast. Mol Biol Cell 19(1):352-67.
Bro C, Knudsen S, Regenberg B, Olsson L, Nielsen J. 2005. Improvement of galactose uptake
in Saccharomyces cerevisiae through overexpression of phosphoglucomutase:
example of transcript analysis as a tool in inverse metabolic engineering. Appl
Environ Microbiol 71(11):6465-72.
Bro C, Regenberg B, Förster J, Nielsen J. 2006. In silico aided metabolic engineering of
Saccharomyces cerevisiae for improved bioethanol production. Metab Eng 8(2):102-11.
Buescher JM, Moco S, Sauer U, Zamboni N. 2010. Ultrahigh performance liquid
chromatography-tandem mass spectrometry method for fast and robust
quantification of anionic and aromatic metabolites. Anal Chem 82(11):4403-12.
Büscher JM, Czernik D, Ewald JC, Sauer U, Zamboni N. 2009. Cross-platform comparison of
methods for quantitative metabolomics of primary metabolism. Anal Chem
81(6):2135-43.
Cakar ZP, Alkim C, Turanli B, Tokman N, Akman S, Sarikaya M, Tamerler C, Benbadis L,
François JM. 2009. Isolation of cobalt hyper-resistant mutants of Saccharomyces
cerevisiae by in vivo evolutionary engineering approach. J Biotechnol 143(2):130-8.
Cakar ZP, Seker UO, Tamerler C, Sonderegger M, Sauer U. 2005. Evolutionary engineering
of multiple-stress resistant Saccharomyces cerevisiae. FEMS Yeast Res 5(6-7):569-78.
Canelas AB, Ras C, Ten Pierick A, van Dam JC, Heijnen JJ, van Gulik W. 2008a. Leakage-free
rapid quenching technique for yeast metabolomics. Metabolomics 4:226-239.
Canelas AB, Ras C, ten Pierick A, van Gulik WM, Heijnen JJ. 2011. An in vivo data-driven
framework for classification and quantification of enzyme kinetics and
determination of apparent thermodynamic data. Metab Eng 13(3):294-306.
Canelas AB, ten Pierick A, Ras C, Seifar RM, van Dam JC, van Gulik WM, Heijnen JJ. 2009.
Quantitative evaluation of intracellular metabolite extraction techniques for yeast
metabolomics. Anal Chem 81(17):7379-89.
Canelas AB, van Gulik WM, Heijnen JJ. 2008b. Determination of the cytosolic free
NAD/NADH ratio in Saccharomyces cerevisiae under steady-state and highly
dynamic conditions. Biotechnol Bioeng 100(4):734-43.
Castrillo JI, Oliver SG. 2005. Towards integrative functional genomics using yeast as a
reference model. In: Vaidyanathan S, Harrigan GG, Goodacre R, editors.
Metabolome analysis: Strategies for systems biology. Berlin: Springer-Verlag. p 9 -
30.
Castrillo JI, Oliver SG. 2006. Metabolomics and systems biology in Saccharomyces
cerevisiae. In: Brown AJP, editor. Fungal Genomics. Berlin: Springer-Verlag. p 3 - 7.
Chaykin S. 1967. Nicotinamide coenzymes. Annu Rev Biochem 36:149-70.
Cipollina C, van den Brink J, Daran-Lapujade P, Pronk JT, Vai M, de Winde JH. 2008.
Revisiting the role of yeast Sfp1 in ribosome biogenesis and cell size control: a
chemostat study. Microbiology 154(Pt 1):337-46.
Ciriacy M, Breitenbach I. 1979. Physiological effects of seven different blocks in glycolysis in
Saccharomyces cerevisiae. J Bacteriol 139(1):152-60.
Costenoble R, Adler L, Niklasson C, Lidén G. 2003. Engineering of the metabolism of
Saccharomyces cerevisiae for anaerobic production of mannitol. FEMS Yeast Res
3(1):17-25.
Crabtree B, Newsholme EA, Reppas NB. 1997. Principles of regulation and control in
biochemistry: a paradigmatic, flux-oriented approach. In: Hoffman JF, Jamieson JD,
editors. Handbook of Physiology. New York, U.S.A.: Oxford University Press. p
117-180.
Crabtree HG. 1928. The carbohydrate metabolism of certain pathological overgrowths.
Biochem J 22(5):1289-98.
Csuk R. 1991. Baker's yeast mediated transformations on organic chemistry. Chem Rev 91:49
- 97.
Cvijovic M, Bordel S, Nielsen J. 2010. Mathematical models of cell factories: moving towards
the core of industrial biotechnology. Microb Biotechnol 4(5):572-84.
de Koning W, van Dam K. 1992. A method for the determination of changes of glycolytic
metabolites in yeast on a subsecond time scale using extraction at neutral pH. Anal
Biochem 204(1):118-23.
Dejong JM, Liu Y, Bollon AP, Long RM, Jennewein S, Williams D, Croteau RB. 2006. Genetic
engineering of taxol biosynthetic genes in Saccharomyces cerevisiae. Biotechnol
Bioeng 93(2):212-24.
den Hollander JA, Ugurbil K, Brown TR, Shulman RG. 1981. Phosphorus-31 nuclear
magnetic resonance studies of the effect of oxygen upon glycolysis in yeast.
Biochemistry 20(20):5871-80.
Devantier R, Scheithauer B, Villas-Bôas SG, Pedersen S, Olsson L. 2005. Metabolite profiling
for analysis of yeast stress response during very high gravity ethanol
fermentations. Biotechnol Bioeng 90(6):703-14.
Ding M-Z, Zhou X, Yuan Y-J. 2010. Metabolome profiling reveals adaptive evolution of
Saccharomyces cerevisiae during repeated vacuum fermentations. Metabolomics 6:42
- 55.
Dinh TN, Nagahisa K, Hirasawa T, Furusawa C, Shimizu H. 2008. Adaptation of
Saccharomyces cerevisiae cells to high ethanol concentration and changes in fatty acid
composition of membrane and cell size. PLoS One 3(7):e2623.
MacKenzie DA, Defernez M, Dunn WB, Brown M, Fuller LJ, de Herrera SR, Günther A,
James SA, Eagles J, Philo M and others. 2008. Relatedness of medically important
strains of Saccharomyces cerevisiae as revealed by phylogenetics and metabolomics.
Yeast 25(7):501-12.
Madsen KM, Udatha GD, Semba S, Otero JM, Koetter P, Nielsen J, Ebizuka Y, Kushiro T,
Panagiotou G. 2011. Linking genotype and phenotype of Saccharomyces cerevisiae
strains reveals metabolic engineering targets and leads to triterpene hyper-
producers. PLoS One 6(3):e14763.
Mashego MR, Wu L, Van Dam JC, Ras C, Vinke JL, Van Winden WA, Van Gulik WM,
Heijnen JJ. 2004. MIRACLE: mass isotopomer ratio analysis of U-13C-labeled
extracts. A new method for accurate quantification of changes in concentrations of
intracellular metabolites. Biotechnol Bioeng 85(6):620-8.
Mauch K, Vaseghi S, Reuss M. 2000. Quantitative analysis of metabolic signalling pathways
in Saccharomyces cerevisiae. In: Schügerl K, Bellgardt KH, editors. Bioreaction
engineering. Berlin: Springer Verlag. p 435 - 477.
Mountain HA, Sudbery PE. 1990. The relationship of growth rate and catabolite repression with
WHI2 expression and cell size in Saccharomyces cerevisiae. J Gen Microbiol 136(4):733-7.
Mutka SC, Bondi SM, Carney JR, Da Silva NA, Kealey JT. 2006. Metabolic pathway
engineering for complex polyketide biosynthesis in Saccharomyces cerevisiae. FEMS
Yeast Res 6(1):40-7.
Navon G, Shulman RG, Yamane T, Eccleshall TR, Lam KB, Baronofsky JJ, Marmur J. 1979.
Phosphorus-31 nuclear magnetic resonance studies of wild-type and glycolytic
pathway mutants of Saccharomyces cerevisiae. Biochemistry 18(21):4487-99.
Nevoigt E. 2008. Progress in metabolic engineering of Saccharomyces cerevisiae. Microbiol Mol
Biol Rev 72(3):379-412.
Nidetzky B, Helmer H, Klimacek M, Lunzer R, Mayer G. 2003. Characterization of
recombinant xylitol dehydrogenase from Galactocandida mastotermitis expressed in
Escherichia coli. Chem. Biol. Interact. 143-144:533-542.
Nielsen J. 2007. Metabolomics in functional genomics and systems biology. In: Villas-Bôas
SG, Roessner U, Hansen MAE, Smedsgaard J, Nielsen J, editors. Metabolome
analysis: An Introduction. Hoboken: John Wiley & Sons, Inc. p 3 - 14.
Nielsen J, Jewett MC. 2008. Impact of systems biology on metabolic engineering of
Saccharomyces cerevisiae. FEMS Yeast Res 8(1):122-31.
Nissen TL, Hamann CW, Kielland-Brandt MC, Nielsen J, Villadsen J. 2000a. Anaerobic and
aerobic batch cultivations of Saccharomyces cerevisiae mutants impaired in glycerol
synthesis. Yeast 16(5):463-74.
Nissen TL, Kielland-Brandt MC, Nielsen J, Villadsen J. 2000b. Optimization of ethanol
production in Saccharomyces cerevisiae by metabolic engineering of the ammonium
assimilation. Metab Eng 2(1):69-77.
Parachin NS, Bergdahl B, van Niel EW, Gorwa-Grauslund MF. 2011. Kinetic modelling
reveals current limitations in the production of ethanol from xylose by recombinant
Saccharomyces cerevisiae. Metab Eng in press.
Pereira FB, Guimarães PMR, Teixeira JA, Domingues L. 2011. Robust industrial
Saccharomyces cerevisiae strains for very high gravity bio-ethanol fermentations. J
Biosci Bioeng 112(2):130 - 136.
Petschacher B, Leitgeb S, Kavanagh KL, Wilson DK, Nidetzky B. 2005. The coenzyme
specificity of Candida tenuis xylose reductase (AKR2B5) explored by site-directed
mutagenesis and X-ray crystallography. Biochem. J. 385(Pt 1):75-83.
Petschacher B, Nidetzky B. 2005. Engineering Candida tenuis Xylose reductase for improved
utilization of NADH: antagonistic effects of multiple side chain replacements and
performance of site-directed mutants under simulated in vivo conditions. Appl.
Environ. Microbiol. 71(10):6390-6393.
Petschacher B, Nidetzky B. 2008. Altering the coenzyme preference of xylose reductase to
favor utilization of NADH enhances ethanol yield from xylose in a metabolically
engineered strain of Saccharomyces cerevisiae. Microb Cell Fact 7:9.
Pirkov I, Albers E, Norbeck J, Larsson C. 2008. Ethylene production by metabolic
engineering of the yeast Saccharomyces cerevisiae. Metab Eng 10(5):276-80.
Porro D, Brambilla L, Alberghina L. 2003. Glucose metabolism and cell size in continuous
cultures of Saccharomyces cerevisiae. FEMS Microbiol Lett 229(2):165-71.
Raab AM, Gebhardt G, Bolotina N, Weuster-Botz D, Lang C. 2010. Metabolic engineering of
Saccharomyces cerevisiae for the biotechnological production of succinic acid. Metab
Eng 12(6):518-25.
Raamsdonk LM, Teusink B, Broadhurst D, Zhang N, Hayes A, Walsh MC, Berden JA,
Brindle KM, Kell DB, Rowland JJ and others. 2001. A functional genomics strategy
that uses metabolome data to reveal the phenotype of silent mutations. Nat
Biotechnol 19(1):45-50.
Racker E. 1974. History of the Pasteur effect and its pathobiology. Mol Cell Biochem 5(1-
2):17-23.
Ralser M, Wamelink MM, Kowald A, Gerisch B, Heeren G, Struys EA, Klipp E, Jakobs C,
Breitenbach M, Lehrach H and others. 2007. Dynamic rerouting of the carbohydrate
flux is key to counteracting oxidative stress. J Biol 6(4):10.
Reuss M, Aguilera-Vázques L, Mauch K. 2007. Reconstruction of dynamic network models
from metabolite measurements. In: Nielsen J, Jewett MC, editors. Metabolomics.
Berlin: Springer-Verlag. p 97 - 127.
Rizzi M, Baltes M, Theobald U, Reuss M. 1997. In vivo analysis of metabolic dynamics in
Saccharomyces cerevisiae: II. Mathematical model. Biotechnol Bioeng 55(4):592-608.
Rocha I, Maia P, Evangelista P, Vilaca P, Soares S, Pinto JP, Nielsen J, Patil KR, Ferreira EC,
Rocha M. OptFlux: an open-source software platform for in silico metabolic
engineering. BMC Syst Biol 4:45.
Segel IH. 1993. Enzyme Kinetics - Behavior and analysis of rapid equilibrium and steady-
state enzyme systems. New York: Wiley interscience.
Selvarajoo K, Arjunan SNV, Tomita M. 2010. In silico models for metabolic systems
engineering. In: Smolke CD, editor. The metabolic pathway engineering handbook.
Tools and applications. Boca Raton: CRC Press. p 1 - 22.
Shanks JV, Bailey JE. 1988. Estimation of intracellular sugar phosphate concentrations in
Saccharomyces cerevisiae using 31P nuclear magnetic resonance spectroscopy.
Biotechnol Bioeng 32(9):1138-52.
Sonderegger M, Sauer U. 2003. Evolutionary engineering of Saccharomyces cerevisiae for
anaerobic growth on xylose. Appl Environ Microbiol 69(4):1990-8.
Steen EJ, Chan R, Prasad N, Myers S, Petzold CJ, Redding A, Ouellet M, Keasling JD. 2008.
Metabolic engineering of Saccharomyces cerevisiae for the production of n-butanol.
Microb Cell Fact 7:36.
Steinle A, Bergander K, Steinbüchel A. 2009. Metabolic engineering of Saccharomyces
cerevisiae for production of novel cyanophycins with an extended range of
constituent amino acids. Appl Environ Microbiol 75(11):3437-46.
Quantitative Metabolomics and Its Application in
Metabolic Engineering of Microbial Cell Factories Exemplified by the Baker’s Yeast 43
Villas-Bôas SG, Bruheim P. 2007. Cold glycerol-saline: the promising quenching solution for
accurate intracellular metabolite analysis of microbial cells. Anal Biochem
370(1):87-97.
Villas-Bôas SG, Delicado DG, Åkesson M, Nielsen J. 2003. Simultaneous analysis of amino
and nonamino organic acids as methyl chloroformate derivatives using gas
chromatography-mass spectrometry. Anal Biochem 322(1):134-8.
Villas-Bôas SG, Hojer-Pedersen J, Åkesson M, Smedsgaard J, Nielsen J. 2005b. Global
metabolite analysis of yeast: evaluation of sample preparation methods. Yeast
22(14):1155-69.
Villas-Bôas SG, Moxley JF, Åkesson M, Stephanopoulos G, Nielsen J. 2005c. High-
throughput metabolic state analysis: the missing link in integrated functional
genomics of yeasts. Biochem J 388(Pt 2):669-77.
Visser D, Heijnen JJ. 2002. The mathematics of metabolic control analysis revisited. Metab
Eng 4(2):114-23.
Visser D, Heijnen JJ. 2003. Dynamic simulation and metabolic re-design of a branched
pathway using linlog kinetics. Metab Eng 5(3):164-76.
Visser W, Scheffers WA, Batenburg-van der Vegte WH, van Dijken JP. 1990. Oxygen
requirements of yeasts. Appl Environ Microbiol 56(12):3785-92.
Wang L, Birol I, Hatzimanikatis V. 2004. Metabolic control analysis under uncertainty:
framework development and case studies. Biophys J 87(6):3750-63.
Wisselink HW, Cipollina C, Oud B, Crimi B, Heijnen JJ, Pronk JT, van Maris AJ. 2010.
Metabolome, transcriptome and metabolic flux analysis of arabinose fermentation
by engineered Saccharomyces cerevisiae. Metab Eng 12(6):537-51.
Wisselink HW, Toirkens MJ, del Rosario Franco Berriel M, Winkler AA, van Dijken JP, Pronk
JT, van Maris AJ. 2007. Engineering of Saccharomyces cerevisiae for efficient anaerobic
alcoholic fermentation of L-arabinose. Appl Environ Microbiol 73(15):4881-91.
Wisselink HW, Toirkens MJ, Wu Q, Pronk JT, van Maris AJ. 2009. Novel evolutionary
engineering approach for accelerated utilization of glucose, xylose, and arabinose
mixtures by engineered Saccharomyces cerevisiae strains. Appl Environ Microbiol
75(4):907-14.
Wu L, Mashego MR, van Dam JC, Proell AM, Vinke JL, Ras C, van Winden WA, van Gulik
WM, Heijnen JJ. 2005. Quantitative analysis of the microbial metabolome by isotope
dilution mass spectrometry using uniformly 13C-labeled cell extracts as internal
standards. Anal Biochem 336(2):164-71.
Wu L, Wang W, van Winden WA, van Gulik WM, Heijnen JJ. 2004. A new framework for
the estimation of control parameters in metabolic pathways using lin-log kinetics.
Eur J Biochem 271(16):3348-59.
Yoshida S, Imoto J, Minato T, Oouchi R, Sugihara M, Imai T, Ishiguro T, Mizutani S, Tomita
M, Soga T and others. 2008. Development of bottom-fermenting Saccharomyces
strains that produce high SO2 levels, using integrated metabolome and
transcriptome analysis. Appl Environ Microbiol 74(9):2787-96.
Zaldivar J, Borges A, Johansson B, Smits HP, Villas-Boas SG, Nielsen J, Olsson L. 2002.
Fermentation performance and intracellular metabolite patterns in laboratory and
industrial xylose-fermenting Saccharomyces cerevisiae. Appl Microbiol Biotechnol
59(4-5):436-42.
Zamboni N, Kümmel A, Heinemann M. 2008. anNET: a tool for network-embedded
thermodynamic analysis of quantitative metabolome data. BMC Bioinformatics
9:199.
Part 2
1. Introduction
As metabolomics becomes an increasingly major component of modern biological research,
steps must be taken to preserve and make maximal use of the ever-increasing torrents of
new data entering the public domain. While this task is by no means unique to the field of
metabolomics, the complexity, heterogeneity and large sizes of metabolomics datasets make
the development of effective metabolomics bioinformatics tools particularly challenging.
Despite these challenges, metabolomics specialists have recently been making rapid
progress in this area. A wide range of powerful web-based tools designed to facilitate the
systematic online storage, processing, dissemination and biological interpretation of
technically and biologically diverse metabolomics datasets have now emerged and are
rapidly becoming cornerstones of advancement in biological science.
Web-based tools for metabolomics perform a wide variety of functions. These can be
divided into several broad categories, including:
1. Storage and dissemination of technical, biological, and physicochemical reference data
for metabolites
2. Processing of raw instrument data to generate [metabolite x sample] data matrices
suitable for statistical and multivariate data-analysis
3. Database storage and querying of pre-processed relative and/or absolute metabolite
level data
4. Statistical and multivariate analysis of pre-processed relative and/or absolute
metabolite level data
5. Aiding biological interpretation of metabolomics results by integration of biological
knowledge such as known biomarkers or metabolic pathway information.
While some tools are broader in scope than others and some tools can essentially fully
service the data-processing requirements of certain metabolomics approaches, it is
important to note that no single tool is currently capable of fulfilling every requirement of
every metabolomics researcher. This chapter will review the current state of development in
the area of web-based informatics tools for metabolomics and explain how currently
available tools can be used to accelerate scientific discovery. It will then attempt to predict
future developments in the area of metabolomics web-tool development and advise new
metabolomics researchers on strategies to maximise their own benefit from these
developments.
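The [metabolite x sample] data matrix named in category 2 above can be sketched as follows. This is only an illustration of the data structure: the record layout (sample, metabolite, level), the sample and metabolite names, the values and the function name `build_matrix` are all assumptions made for the example and do not correspond to the output of any particular tool.

```python
def build_matrix(records, missing=None):
    """Pivot a list of (sample, metabolite, level) records into a matrix.

    Rows are metabolites and columns are samples, both sorted by name.
    Combinations that were never measured are filled with `missing`.
    """
    samples = sorted({sample for sample, _, _ in records})
    metabolites = sorted({metabolite for _, metabolite, _ in records})
    levels = {(metabolite, sample): level for sample, metabolite, level in records}
    matrix = [
        [levels.get((metabolite, sample), missing) for sample in samples]
        for metabolite in metabolites
    ]
    return metabolites, samples, matrix


# Hypothetical pre-processed peak records (relative metabolite levels).
records = [
    ("leaf_1", "alanine", 1.2),
    ("leaf_1", "citrate", 0.8),
    ("leaf_2", "alanine", 1.5),
]
metabolites, samples, matrix = build_matrix(records)
```

Keeping missing values explicit (here as `None`) rather than silently writing zeros leaves the choice of imputation strategy to the later statistical analysis step.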
48 Metabolomics
often have been observed in nature before and have a common name, but finding this out
can be challenging if one does not know where to start. This is where InChI codes and
comprehensive InChI-enabled cheminformatic databases become indispensable (Wohlgemuth
et al., 2010).
“InChI” is an abbreviation for “International Chemical Identifier”, a system of expressing
chemical structures as compact strings of text suitable for efficiently and unambiguously
conveying chemical structures across text-based systems such as web search engines. The
InChI system was developed by the International Union of Pure and Applied Chemistry
(IUPAC) and the National Institute of Standards and Technology (NIST). Each unique
chemical structure can be converted into its own unique InChI code and vice versa¹. There
is a range of freely-available software tools that allow one to draw a chemical structure and
obtain its InChI code or enter an InChI code and have its structure drawn automatically (see
Table 1 for examples). All the major metabolite information databases tag their entries with
InChI codes, so if one is uncertain of the name of a target metabolite, the best approach is to
generate its InChI code and search with that. Some cheminformatic databases provide web-
based structure drawing tools allowing users to effectively generate an InChI code and
search with it in a single step. One of the advantages of using an unambiguous structural
identifier such as InChI to search a database is that if no hits are obtained, one may fairly
safely conclude that the target molecule was not in the database². When a hit is obtained,
however, the returned information may include common name(s) for the molecule that can
aid in subsequent literature searches. For anyone building metabolite databases or
supplying supplementary tables of metabolite data for publication, annotation of these data
with InChI codes is highly recommended (Wohlgemuth et al., 2010). Online tools for
generating InChI codes from structures or other identifiers are listed in Table 1. A
particularly useful tool for metabolomics researchers is the Chemical Translation Service
provided by the lab of Oliver Fiehn (Wohlgemuth et al., 2010) since this tool is capable of
batch translations of miscellaneous metabolite identifiers and synonyms to standard InChI
codes and other common identifiers.
1 There is one caveat to this statement. The only truly non-ambiguous InChI codes are called “Standard”
InChI (often abbreviated to “StdInChI” - these always begin with the string “InChI=1S/”). If building a
metabolomics database, it is advisable to use only standard InChI codes.
2 Some metabolite databases were built prior to the release of Standard InChI and have been annotated
using non-standard InChI codes (always beginning with “InChI=1/”). It is always a good idea to check
which InChI type a database uses before searching it with an InChI code.
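The prefix rule given in the two footnotes above lends itself to a simple automated check before a database is searched or populated. The following is a minimal sketch that relies only on the prefixes stated in the footnotes ("InChI=1S/" for Standard InChI and "InChI=1/" for non-standard InChI); the function name `inchi_kind` is an assumption, and the example strings for ethanol are included purely for illustration.

```python
def inchi_kind(identifier: str) -> str:
    """Classify an identifier by its InChI prefix.

    Returns "standard", "non-standard" or "not-inchi".
    Only the prefix is inspected; the structure layers are not validated.
    """
    text = identifier.strip()
    if text.startswith("InChI=1S/"):
        return "standard"
    if text.startswith("InChI=1/"):
        return "non-standard"
    return "not-inchi"


# Ethanol written with each prefix, plus a plain name for comparison.
examples = [
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3",
    "ethanol",
]
kinds = [inchi_kind(example) for example in examples]
```

A check of this kind only guards against mixing InChI types; converting between structures and InChI codes still requires one of the dedicated tools referred to in Table 1.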
into accurately and systematically defined ‘chemical ontologies’ that can be used in
practically useful ways is a non-trivial task. Despite this, a number of metabolite-related
databases have begun developing and/or employing hierarchical systems of compound
classification, allowing users to browse lists of metabolites via classification trees
(ontologies). Examples of databases employing compound ontologies or hierarchical
compound taxonomies for annotation of metabolite information include PubChem, ChEBI
(Degtyarenko et al., 2008), the BioCyc family of metabolic pathway databases (Caspi et al.,
2010), the Human Metabolome Database (HMDB) (Wishart et al., 2007) and
MetabolomeExpress (Carroll et al., 2010). The ChEBI compound ontology is by far the most
advanced and comprehensive ontology for biological small molecules and is downloadable
in open formats from the ChEBI website. Its adoption is recommended in the development
of new metabolomics databases.
Table 1. Recommended online tools for generating unambiguous InChI structural identifier
strings from structures, names or other identifiers
2.5.1 ChemSpider
Description: A freely-accessible collection of compound data from across the web with a
very versatile search engine.
Scope: all chemicals – not just metabolites
Semantic content: Many synonyms, identifiers and external database IDs and link-outs
Physicochemical content: Masses, formula, experimental melting point, physical state,
appearance, stability, storage compatibility, safety. A substantial amount of additional
predicted data.
Biological content: Links to MeSH
Analytical content: Some compounds have spectra
Noteworthy tools: Search by physicochemical properties
Modes of access: search (by synonym, InChI, SMILES, CAS, structure), API. Limited to 5000
structures per day.
Strengths: Enormous index of chemicals that is widely linked to external online resources –
a good starting point if looking for information on a particular chemical.
Limitations: Broad focus means extracting desired subsets of information can be difficult.
Cannot be downloaded. Results only returned in HTML - not spreadsheet format. Limited
to 5000 structures per day.
URL: http://www.chemspider.com/
Analytical content: many compounds have LC/MS, GC/MS and/or NMR spectra obtained
under standardised conditions
Noteworthy tools: Versatile ‘Data Extractor’. Searching based on spectral properties
Modes of access: browse, search, complex query with data extractor, download
Strengths: Comprehensive, may be freely downloaded in entirety. Human focus is good for
human metabolomics researchers.
Limitations: Important fields are empty for some very common metabolites. Being limited
to human metabolites limits utility for other research areas. Downloadable flat-file format
requires parsing in order to be usable in spreadsheets or local databases.
Reference: (Wishart et al., 2007)
URL: http://www.hmdb.ca/
2.5.4 PubChem
Description: A freely available general dictionary of chemicals.
Online Metabolomics Databases and Pipelines 53
2.6.3 Reactome
Description: An interactive collection of curated, peer-reviewed metabolic pathways with
cross-referencing of reactions and pathways between organisms. Pathways are displayed
via an intuitive GUI but may be downloaded in a variety of open formats.
Species: A variety of species. Most comprehensive for human.
Metabolic pathway content: hierarchically organised curated and peer-reviewed metabolic
pathways; reactions; reaction-gene associations
Noteworthy features: Interactive pathway viewer
Modes of access: browse, search and download
Strengths: Peer-reviewed, user-friendly, different subcellular metabolite pools are treated as
separate entities
Limitations: Reaction-centric. Not much information about metabolites and does not
provide any tools for overlaying metabolite expression data.
Reference: (Croft et al., 2011)
URL: http://www.reactome.org
2.6.4 KappaView
Description: A web-based tool allowing users to overlay metabolite- and gene-expression
responses and correlations onto custom pathway diagrams or onto a collection of neat,
simple and interactive metabolic pathway diagrams.
Species: A variety of species.
Metabolism-related content: hierarchically organised curated metabolic pathways;
reactions; reaction-gene associations
Noteworthy features: Gene and metabolite expression overlay
Modes of access: browse, search and download
Strengths: User-friendly; neat/simple diagrams; may be integrated into third party websites
using a flexible API; can also overlay metabolite-metabolite, gene-gene and metabolite-gene
correlations
Limitations: Does not support InChI
Reference: (Sakurai et al., 2010)
URL: http://kpv.kazusa.or.jp/kpv4/
2.6.6 KNApSAcK
Description: A comprehensive species-metabolite relationship database for plants.
Although not strictly a metabolic pathway database, this database is useful for identifying
plant species that contain a certain chemical or identifying chemicals that have been
reported in a particular plant species or higher level taxon.
Species: Plants
Metabolism-related content: References to literature reporting the presence of compounds
in different plant species. Chemical structures. Masses.
Noteworthy features: References to literature.
Modes of access: browse, search
Strengths: Contains information on many plant-specific specialised metabolites.
Limitations: Data itself is not downloadable.
Reference: (Shinbo et al., 2006)
URL: http://kanaya.naist.jp/KNApSAcK/
3.5 Reference data for liquid chromatography-MS, MS/MS and MSn
While the low cost and operational simplicity of GC/MS have led it to become the most
widely employed analytical platform in metabolomics, an increasing number of laboratories
are adopting complementary techniques based on liquid chromatography (LC)- and direct
infusion (DI)/MS methods that employ different ionisation techniques and more advanced
mass-spectrometers capable of MS, MS/MS, MS3 and MSn modes of analysis together with
much higher mass accuracy and resolution than is provided by most standard GC-MS
systems. In the paragraphs below, the various types of non-GC/MS, MS-based
metabolomics techniques such as LC/MS, DI/MS and capillary electrophoresis (CE)/MS
including tandem MS and MSn methods will be referred to collectively as “LC/MS”
techniques.
While GC/MS metabolomics is dominated almost entirely by electron impact ionisation (EI)
methods using the industry-standardised ionisation energy of 70eV, yielding highly-
reproducible fragmentation spectra between different GC/MS instruments, such broad
standardisation has not occurred for LC/MS. For LC/MS, the enormous diversity of mass-
spectrometer types, combined with a lack of highly-developed LC ‘retention-index’ systems
presents significant challenges to the creation of standardized MSRI reference libraries,
analogous to those available for GC/MS, capable of unambiguous cross-laboratory peak
identification for LC/MS.
The simplest type of online reference data for LC/MS metabolomics are the accurate,
monoisotopic masses and molecular formulas of metabolites and, in some cases, their stable-
isotope-labelled isotopomers. The data-processing packages provided with MS instruments
capable of high-accuracy mass measurements generally allow users to create custom
libraries of accurate masses and/or molecular formulas (for improved match scoring based
on the shapes of isotopic envelopes) for target analytes to assist with peak identification.
Although accurate masses or molecular formulas alone are not sufficient to unambiguously
identify metabolite signals (due to the high frequency of structural isomers across nature),
using these data in a rational manner can often provide valuable clues about the possible
identities of peaks.
A good way of reducing (but not eliminating) ambiguity in accurate mass-based assignments
is to build a separate accurate mass library for each biological system under investigation
and to include in each library only those metabolites for which literature evidence exists to
support their presence in that organism. An easy way of doing this is to use the advanced
query tool provided with each of the BioCyc family of metabolic pathway databases (of
which there are many). While the metabolite sets thus obtained may not be complete, this is
a fast way of obtaining a good quality starting set.
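The accurate mass library approach described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the matching procedure of any vendor package: the element masses are standard monoisotopic values, but the three-compound library, the observed m/z of 181.0707, the 5 ppm tolerance and the function names are invented for the example, and only [M+H]+ ions and simple formulas without brackets are handled.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
    "P": 30.973762,
    "S": 31.972071,
}
PROTON = 1.00727647  # mass of a proton (Da), used for [M+H]+ ions


def monoisotopic_mass(formula: str) -> float:
    """Sum element masses for a simple formula such as 'C6H12O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def candidates(observed_mz, library, ppm=5.0):
    """Return names in `library` whose [M+H]+ m/z is within `ppm` of observed_mz."""
    hits = []
    for name, formula in library.items():
        theoretical = monoisotopic_mass(formula) + PROTON
        error_ppm = (observed_mz - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= ppm:
            hits.append(name)
    return sorted(hits)


# Hypothetical organism-specific library: metabolite name -> molecular formula.
library = {
    "glucose": "C6H12O6",
    "fructose": "C6H12O6",
    "citrate": "C6H8O7",
}
hits = candidates(181.0707, library)
```

In this example the query matches both glucose and fructose, which share a molecular formula. This is the isomer ambiguity noted above: restricting the library to metabolites known to occur in the organism reduces the number of candidates but cannot separate them.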
Another approach for reducing ambiguity in LC/MS peak identifications is to use MS/MS
spectral similarity as a scoring parameter to complement accurate-mass MS based
assignments (see Matsuda et al., 2009 and Matsuda et al., 2010 for good examples). The major
online sources of MS/MS spectra for metabolites are MassBank (Horai et al., 2010), METLIN
(Smith et al., 2005), ReSpect for Phytochemicals (http://spectra.psc.riken.jp/menta.cgi/
index) and the HMDB (Wishart et al., 2007). These databases each have different strengths
and limitations which will be outlined shortly. With the notable exception of ReSpect for
Phytochemicals, a drawback that these databases share is a lack of support for bulk
downloading of spectra. That said, MassBank does provide a powerful API to partially
overcome the need for bulk download while the METLIN website currently reports that an
API is in development.
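To make the idea of MS/MS spectral similarity as a scoring parameter concrete, the sketch below computes a cosine similarity between two binned fragment spectra. Cosine similarity is a common generic choice and is used here as an assumption; it is not claimed to be the scoring function used by MassBank, METLIN or the studies cited above. The peak lists, the bin width of 0.01 and the function names are invented for the illustration.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Map (m/z, intensity) pairs to integer m/z bins, summing intensities per bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = no shared peaks)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented fragment spectra as (m/z, intensity) pairs.
reference = [(85.03, 40.0), (127.04, 100.0), (145.05, 25.0)]
query_same = [(mz, 2.0 * intensity) for mz, intensity in reference]
query_other = [(60.02, 10.0), (99.01, 80.0)]

score_same = cosine_similarity(reference, query_same)
score_other = cosine_similarity(reference, query_other)
```

Because the score depends only on relative intensities, a query with the same fragments at twice the absolute intensity still scores close to 1, whereas a query sharing no fragment bins scores 0. Spectra acquired at different collision energies or on different instrument types may score poorly even for the same compound, so such a score complements rather than replaces accurate mass evidence.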
3.6 The need for chromatographic retention data in LC/MS reference databases
It is important to note that, for high-confidence peak identifications that meet minimum
reporting standards outlined by the Metabolomics Standards Initiative (MSI) (Sansone et al.,
2007), it is necessary to support peak identifications with an additional, orthogonal
identification parameter. In the case of LC/MS, where chromatography is used, this
parameter is generally retention time or relative retention time agreement with an authentic
standard. Unfortunately, there appear to be few if any LC/MS reference databases that
provide retention time or relative retention time information. Absolute retention times vary
from instrument to instrument and from column to column (even between columns of the
same make and model), and are therefore considered to be of limited use for high-
confidence inter-laboratory peak identification. However, relative retention times (or
retention indices), where the retention time of each peak is expressed relative to one or two
other peaks in the same chromatogram, are far more stable (Tarasova et al., 2009) and may
provide an avenue to the compilation of LC-MS reference libraries capable of providing
MSI-compliant peak identifications by combining accurate mass MS or MS/MS spectra with
meaningful and highly reproducible retention index (RI) properties. Complementary to this
approach would be the further development of RI-prediction models that can accurately
predict the LC retention indices of metabolites based on their structures (Hagiwara et al.,
2010).
It is important to note that sufficient RI reproducibility may only be achievable with certain
simple types of stationary and mobile phase combinations whereby a single stationary
phase interaction mechanism (e.g. hydrophobic interactions in C18 reversed-phase
chromatography or hydrogen-bonding interactions in silanol based normal phase
chromatography) applies to all analytes. In separations over mixed-mode stationary phases
where multiple interaction mechanisms occur, there is more potential for variations in
chromatographic conditions to differentially affect different peaks, thus changing their
relative retention times. Public databases of “Accurate Mass / Retention Time (AMT) tags”
are playing increasingly important roles in peptide identification in LC-MS proteomics
(Hagiwara et al., 2010). A similar trend is to be expected in metabolomics.
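The difference between absolute and relative retention can be illustrated with a short calculation. The sketch below expresses a peak either relative to one reference peak or, by linear interpolation, relative to the two marker peaks that bracket it. The marker retention times, the index values assigned to the markers (100, 200, 300) and the function names are assumptions chosen for the example; they do not represent an established LC retention index scale.

```python
def relative_retention_time(rt, reference_rt):
    """Retention time expressed relative to a single reference peak."""
    return rt / reference_rt


def retention_index(rt, markers):
    """Linearly interpolate an index between the two markers bracketing `rt`.

    `markers` is a list of (retention_time, index_value) pairs sorted by time.
    """
    for (t_lo, ri_lo), (t_hi, ri_hi) in zip(markers, markers[1:]):
        if t_lo <= rt <= t_hi:
            return ri_lo + (ri_hi - ri_lo) * (rt - t_lo) / (t_hi - t_lo)
    raise ValueError("retention time outside the marker range")


# Hypothetical runs: on system B every peak elutes 1.5 times later than on system A.
markers_a = [(2.0, 100), (6.0, 200), (12.0, 300)]
markers_b = [(3.0, 100), (9.0, 200), (18.0, 300)]

ri_a = retention_index(4.0, markers_a)  # analyte at 4.0 min on system A
ri_b = retention_index(6.0, markers_b)  # same analyte at 6.0 min on system B
```

In this idealised case the absolute retention times differ (4.0 versus 6.0 min) while the index is the same on both systems. Real shifts are rarely perfectly proportional, particularly on mixed-mode stationary phases as discussed above, so agreement between laboratories will only be approximate.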
Modes of access: browse, search and download individual spectra via web interface. Bulk
FTP download of raw spectra via BMRB FTP site.
Strengths: Enormous resource for NMR metabolomics. Includes a wide range of metabolites
including those that don’t occur in humans (e.g. plant-specific metabolites). Spectral
matching tools provide batch-processing capability.
Limitations: No support for bulk download of metabolite information based on complex
query
Reference: (Cui et al., 2008)
URL: http://mmcd.nmrfam.wisc.edu
3.7.3 METLIN
Description: A repository for metabolite information and tandem mass spectrometry data.
Species: Not formally species-constrained but is fairly human-centric
Reference data: Accurate masses of >44000 metabolites. >28000 high-resolution
Quadrupole/Time-Of-Flight (Q/TOF) MS/MS spectra for ~5000 metabolites. Multiple
collision energies.
Noteworthy features: Batch searching of mzXML MS/MS files against the database.
Integration with XCMS LC/MS data-processing pipeline. Neutral loss search.
Modes of access: Search only. API in development
Strengths: A large set of standardized NMR and GC/MS spectra help new labs to quickly
set up metabolite profiling platforms.
Limitations: No bulk-download (must be purchased from instrument manufacturer).
Reference: (Smith et al., 2005)
URL: http://metlin.scripps.edu
3.7.4 MassBank
Description: A repository for mass-spectra of pure compounds. Features a unique design
involving a centralised interface but a distributed network of data servers providing the
mass-spectra.
Species: Not species constrained. Not limited to biological metabolites.
Reference data: >29000 mass spectra from a wide range of instrument types including, but
not limited to, GC/MS, LC/MS and LC-MS/MS.
Noteworthy features: Batch searching of MS/MS files against the database. Neutral loss
search. Most sophisticated and powerful spectral search and visualisation capabilities of all
available mass-spectral repositories.
Modes of access: Search, browse and API.
Strengths: Many spectra, powerful search capabilities.
Limitations: No bulk-download. However, individual spectra may be downloaded in text
format.
Reference: (Horai et al., 2010)
URL: http://www.massbank.jp/
3.7.7 MoTo DB
Description: A liquid chromatography-mass spectrometry-based metabolome database for
tomato
Species: Tomato (Solanum lycopersicum)
Reference data: Masses, retention times, UV/Vis properties and MS/MS fragment
information for a range of metabolites reported to occur in tomato plants.
Noteworthy features: Includes retention times.
Modes of access: Search only.
Strengths: Provides literature references to support peak annotations.
Limitations: Very limited search capability. No browse capability. No download.
Reference: (Moco et al., 2006)
URL: http://appliedbioinformatics.wur.nl/moto/
3.7.9 MetabolomeExpress
Description: An interactive database of downloadable MSRI libraries, raw and processed
GC/MS metabolite profiling datasets and a database of metabolic phenotypes observed in
any organism using any analytical technique. Includes a complete GC/MS data processing
pipeline and cross-study data mining tools.
Species: Not formally species-constrained but current content is plant-centric.
Reference data: A number of GC/MS MSRI libraries are downloadable from the website. Golm
Metabolome Database MSRI libraries are provided for use within the data processing pipeline.
Noteworthy features: Members may independently upload their own MSRI libraries for
interactive dissemination and use within the GC/MS data-processing pipeline.
Modes of access: browse and FTP
Strengths: Libraries free for download. Provides a built-in GC/MS data processing
pipeline.
Limitations: No API. No search.
Reference: (Carroll et al., 2010)
URL: http://www.metabolome-express.org
towards the needs of metabolomics research. Thanks to the availability of these packages and
the availability of standardised analytical reference libraries, it is now quite feasible for
researchers with limited experience to conduct detailed processing and analysis of their
instrumental datasets with little more than a fast internet connection, an up-to-date web-
browser and, in some cases, an FTP-client program for uploading data. This section will
provide an overview of the types of data processing pipelines that are currently accessible
online and compare the most powerful examples in more detail.
have seen a strong increase in the number of metabolomics labs sharing primary datasets
from their own websites and even the emergence of centralized metabolomics data
repositories allowing arbitrary labs to share their datasets publicly without even having to
set up their own website. These groups that have been voluntarily driving the free and open
dissemination of primary metabolomics data should be commended! The following sections
will highlight the data sharing efforts that have been made by individual groups within the
metabolomics community and describe the centralized metabolomics data repositories that
are currently in operation and/or development.
5.2.4 SetupX
Description: A study design database for GC/MS metabolomics experiments.
Species: Plant species
Reference data: Provides raw and processed GC/MS data for download together with
metadata.
Noteworthy features: Metabolite detections are searchable by species and species are
searchable by metabolite detections.
Modes of access: Search, browse and download.
Strengths: One of very few sites to archive and disseminate raw chromatograms.
Experimental datasets may be downloaded as single zipped files.
Limitations: Enormous sizes of zipped experimental dataset files mean that download
errors frequently occur during long downloads. No quantitative information is provided
with metabolite detections and there is no way to compare the results of different
experiments.
Reference: (Scholz and Fiehn, 2007)
URL: http://fiehnlab.ucdavis.edu:8080/m1/
5.2.5 PlantMetabolomics.org
Description: A database of processed, large-scale metabolic phenotype information
obtained from an array of different Arabidopsis thaliana T-DNA insertion mutants.
5.2.6 Mery-B
Description: A repository for plant metabolomics datasets including experimental metadata,
processed data and raw data for NMR experiments.
Species: Plants.
Reference data: Provides NMR-based metabolite quantification data for a variety of tissues
from a variety of species grown under a variety of conditions. Based on ~1000 spectra.
Chemical shift peak assignment information is provided.
Noteworthy features: Interactive raw data viewers for 1D NMR and GC/MS data.
Modes of access: Search and browse.
Strengths: Contains data from a range of peer-reviewed publications and references to
literature are clearly presented. Raw NMR spectra and GC chromatograms are available for
visualisation. All experimental protocols are provided.
Limitations: Tools for statistical analysis are not yet functional. Data are not downloadable
for offline analysis. Analytical reference libraries are not provided. Peak assignments are not
seamlessly integrated into the raw data viewer. No direct links between statistical results
and raw data visualisation. Interface is not very intuitive.
Reference: (Ferry-Dumazet et al., 2011)
URL: http://www.cbib.u-bordeaux2.fr/MERYB/home/home.php
5.2.7 MetabolomeExpress
Description: An interactive, centralized metabolomics data repository for metabolomics
data from all organisms and all analytical platforms that provides a variety of cross-study
data-mining tools for analysis of metabolic phenotypes. Processed data may be uploaded in
a simple tab-delimited format. Alternatively, raw GC/MS data may be uploaded and
processed online using the integrated data-processing pipeline before being imported into
the data repository.
Species: Not formally species-constrained but current content is plant-centric. Data from
other systems is currently being gathered from the literature.
Reference data: MSRI libraries, GC/MS chromatograms, processed results, metadata in
systematic formats. Database currently includes >12000 publicly available metabolite
response statistics representing >100 metabolic phenotypes from 8 species under 22 different
experiments in 16 different peer-reviewed publications.
Noteworthy features: Members may independently upload their own MSRI libraries for
interactive dissemination and use within the GC/MS data-processing pipeline. Provides
tools for cross-study meta-analysis and database-driven phenotype recognition by pattern
matching.
Modes of access: browse and FTP
Strengths: All public data free for download. Provides a built-in GC/MS data processing
pipeline. Allows cross-study analysis. Processed metabolite response statistics are
transparently linked to underlying raw data in an interactive raw data viewer.
Limitations: No API. No search. Raw data processing pipeline needs to be extended to
support analytical platforms other than GC/MS. Does not provide as many multivariate
analysis and classification tools as other web-based metabolomics data-processing systems.
Reference: (Carroll et al., 2010)
URL: http://www.metabolome-express.org
6. Conclusion
The field of metabolomics informatics is moving very rapidly. New data-processing tools
and new data repositories will continue to emerge. As they do, it will become increasingly
important to standardize universal data exchange formats that allow the free flow of data
between compliant databases. Similarly important will be the development of user-friendly
metadata capture tools that make systematic annotation of datasets as painless as possible
for biologists. These
developments will require the development of new ontologies and/or the extension of
existing ontologies that do not cover all of the terms required to describe metabolomics
experiments. The efficient sharing and mining of well-annotated and well-quality-controlled
metabolomics data across the internet will undoubtedly lead to many important discoveries
in the future.
7. References
Bais P, Moon SM, He K, Leitao R, Dreher K, Walk T, Sucaet Y, Barkan L, Wohlgemuth G,
Roth MR, Wurtele ES, Dixon P, Fiehn O, Lange BM, Shulaev V, Sumner LW, Welti
R, Nikolau BJ, Rhee SY, Dickerson JA (2010) PlantMetabolomics.org: a web portal
for plant metabolomics experiments. Plant physiology 152: 1807-1816
Biswas A, Mynampati KC, Umashankar S, Reuben S, Parab G, Rao R, Kannan VS, Swarup S
(2010) MetDAT: a modular and workflow-based free online pipeline for mass
spectrometry data processing, analysis and interpretation. Bioinformatics 26: 2639-
2640
Carroll AJ, Badger MR, Harvey Millar A (2010) The MetabolomeExpress Project: enabling
web-based processing, analysis and transparent dissemination of GC/MS
metabolomics datasets. BMC Bioinformatics 11: 376
Caspi R, Altman T, Dale JM, Dreher K, Fulcher CA, Gilham F, Kaipa P, Karthikeyan AS,
Kothari A, Krummenacker M, Latendresse M, Mueller LA, Paley S, Popescu L,
Pujar A, Shearer AG, Zhang P, Karp PD (2010) The MetaCyc database of metabolic
pathways and enzymes and the BioCyc collection of pathway/genome databases.
Nucleic acids research 38: D473-479
Cochrane G, Karsch-Mizrachi I, Nakamura Y (2011) The International Nucleotide Sequence
Database Collaboration. Nucleic acids research 39: D15-D18
Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, Caudy M, Garapati P, Gopinath
G, Jassal B, Jupe S, Kalatskaya I, Mahajan S, May B, Ndegwa N, Schmidt E,
Shamovsky V, Yung C, Birney E, Hermjakob H, D'Eustachio P, Stein L (2011)
Reactome: a database of reactions, pathways and biological processes. Nucleic
acids research 39: D691-697
Cui Q, Lewis IA, Hegeman AD, Anderson ME, Li J, Schulte CF, Westler WM, Eghbalnia HR,
Sussman MR, Markley JL (2008) Metabolite identification via the Madison
Metabolomics Consortium Database. Nature biotechnology 26: 162-164
Degtyarenko K, de Matos P, Ennis M, Hastings J, Zbinden M, McNaught A, Alcantara R,
Darsow M, Guedj M, Ashburner M (2008) ChEBI: a database and ontology for
chemical entities of biological interest. Nucleic acids research 36: D344-350
Ferry-Dumazet H, Gil L, Deborde C, Moing A, Bernillon S, Rolin D, Nikolski M, de Daruvar
A, Jacob D (2011) MeRy-B: a web knowledgebase for the storage, visualization,
analysis and annotation of plant NMR metabolomic profiles. BMC plant biology 11:
104
Hagiwara T, Saito S, Ujiie Y, Imai K, Kakuta M, Kadota K, Terada T, Sumikoshi K, Shimizu
K, Nishi T (2010) HPLC Retention time prediction for metabolome analysis.
Bioinformation 5: 255-258
Horai H, Arita M, Kanaya S, Nihei Y, Ikeda T, Suwa K, Ojima Y, Tanaka K, Tanaka S,
Aoshima K, Oda Y, Kakazu Y, Kusano M, Tohge T, Matsuda F, Sawada Y, Hirai
MY, Nakanishi H, Ikeda K, Akimoto N, Maoka T, Takahashi H, Ara T, Sakurai N,
Suzuki H, Shibata D, Neumann S, Iida T, Funatsu K, Matsuura F, Soga T, Taguchi
R, Saito K, Nishioka T (2010) MassBank: a public repository for sharing mass
spectral data for life sciences. Journal of mass spectrometry : JMS 45: 703-714
Iijima Y, Nakamura Y, Ogata Y, Tanaka Ki, Sakurai N, Suda K, Suzuki T, Suzuki H, Okazaki
K, Kitayama M, Kanaya S, Aoki K, Shibata D (2008) Metabolite annotations based
on the integration of mass spectral information. The Plant Journal 54: 949-962
Kaminuma E, Mashima J, Kodama Y, Gojobori T, Ogasawara O, Okubo K, Takagi T,
Nakamura Y (2010) DDBJ launches a new archive database with analytical tools for
next-generation sequence data. Nucleic acids research 38: D33-D38
Smith CA, Want EJ, O'Maille G, Abagyan R, Siuzdak G (2006) XCMS: processing mass
spectrometry data for metabolite profiling using nonlinear peak alignment,
matching, and identification. Analytical chemistry 78: 779-787
Tarasova IA, Guryca V, Pridatchenko ML, Gorshkov AV, Kieffer-Jaquinod S, Evreinov VV,
Masselon CD, Gorshkov MV (2009) Standardization of retention time data for AMT
tag proteomics database generation. Journal of Chromatography B 877: 433-440
Ulrich EL, Akutsu H, Doreleijers JF, Harano Y, Ioannidis YE, Lin J, Livny M, Mading S,
Maziuk D, Miller Z, Nakatani E, Schulte CF, Tolmie DE, Kent Wenger R, Yao H,
Markley JL (2008) BioMagResBank. Nucleic acids research 36: D402-408
Wishart DS, Tzur D, Knox C, Eisner R, Guo AC, Young N, Cheng D, Jewell K, Arndt D,
Sawhney S, Fung C, Nikolai L, Lewis M, Coutouly MA, Forsythe I, Tang P,
Shrivastava S, Jeroncic K, Stothard P, Amegbey G, Block D, Hau DD, Wagner J,
Miniaci J, Clements M, Gebremedhin M, Guo N, Zhang Y, Duggan GE, Macinnis
GD, Weljie AM, Dowlatabadi R, Bamforth F, Clive D, Greiner R, Li L, Marrie T,
Sykes BD, Vogel HJ, Querengesser L (2007) HMDB: the Human Metabolome
Database. Nucleic acids research 35: D521-526
Wohlgemuth G, Haldiya PK, Willighagen E, Kind T, Fiehn O (2010) The Chemical
Translation Service--a web-based tool to improve standardization of metabolomic
reports. Bioinformatics 26: 2647-2648
Xia J, Psychogios N, Young N, Wishart DS (2009) MetaboAnalyst: a web server for
metabolomic data analysis and interpretation. Nucleic acids research 37: W652-660
4
Generic Software Frameworks for GC-MS Based Metabolomics
1. Introduction
Metabolomics has seen a rapid development of new technologies, methodologies, and
data analysis procedures during the past decade. The development of fast gas- and
liquid-chromatography devices coupled to sensitive mass-spectrometers, supplemented by
the unprecedented precision of nuclear magnetic resonance for structure elucidation of
small molecules, together with the public availability of database resources associated with
metabolites and metabolic pathways, has enabled researchers to approach the metabolome
of organisms in a high-throughput fashion. Other "omics" technologies have a longer history
in high-throughput, such as next generation sequencing for genomics, RNA microarrays for
transcriptomics, and mass spectrometry methods for proteomics. All of these together give
researchers a unique opportunity to study and combine multi-omics aspects, forming the
discipline of "Systems Biology" in order to study organisms at multiple scales simultaneously.
Like all other "omics" technologies, metabolomics data acquisition is becoming more reliable
and less costly, while at the same time throughput is increased. Modern time-of-flight (TOF)
mass spectrometers are capable of acquiring full scan mass spectra at a rate of 500 Hz from 50
to 750 m/z and with a mass accuracy <5 ppm with external calibration (Neumann & Böcker,
2010). At the opposite extreme of machinery, Fourier-transform ion-cyclotron-resonance
(FTICR) mass spectrometers coupled to liquid chromatography for sample separation reach
an unprecedented mass accuracy of <1 ppm m/z and very high mass resolution (Miura
et al., 2010). These features are key requirements for successful and unique identification of
metabolites. Coupled to chromatographic separation devices, these machines create datasets
ranging in size from a few hundred megabytes to several gigabytes per run. While this is not
a severe limitation for small scale experiments, it may pose a significant burden on projects
that aim at studying the metabolome or specific metabolites of a large number of specimens
and replicates, for example in medical research studies or in routine diagnostics applications
tailored to the metabolome of a specific species (Wishart et al., 2009).
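To make the ppm figures above concrete, the following minimal sketch converts between a relative mass error in ppm and an absolute m/z tolerance. The function names and the example values are illustrative assumptions and do not come from any of the cited tools.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def mz_tolerance(mz, ppm):
    """Absolute m/z half-window that corresponds to a ppm tolerance at a given m/z."""
    return mz * ppm * 1e-6


if __name__ == "__main__":
    # Hypothetical ion expected at m/z 500.0000, measured at m/z 500.0020
    print(ppm_error(500.0020, 500.0000))  # roughly 4 ppm
    print(mz_tolerance(500.0, 5.0))       # roughly 0.0025 m/z
```

At m/z 500, a 5 ppm tolerance therefore corresponds to a window of only a few thousandths of a mass unit, which is why high mass accuracy narrows the list of candidate metabolites.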
Thus, there is a need for sophisticated methods that can treat these datasets efficiently in terms
of computational resources and which are able to extract, process, and compare the relevant
information from these datasets. Many such methods have been published; however, there is
a high degree of fragmentation concerning the availability and accessibility of these methods,
which makes it hard to integrate them into a lab’s workflow.
The aim of this work is to discuss the necessary and desirable features of a software framework
for metabolomics data preprocessing based on gas-chromatography (GC) and comprehensive
workflows of the two application pipelines to give the reader a thorough understanding of
the methods used by ChromA and ChromA4D.
Finally, we discuss the current state of the presented Open Source frameworks and give an
outlook into the future of software frameworks and data standards for metabolomics.
Fig. 1. A typical workflow for a metabolomics experiment. Steps shown in orange (solid
border) are usually handled within the bioinformatics domain, while the steps shown in
green (dashed border) often involve co-work with scientists from other disciplines.
recently as the successor to the latter two, mzML (Deutsch, 2008; Martens et al., 2010). All of
these formats include well-defined data structures for meta-information necessary to interpret
data in the right context, such as detector type, chromatographic protocol, detector potential
and other details about the separation and acquisition of the data. Furthermore, they explicitly
model chromatograms and mass spectra, with varying degrees of detail.
NetCDF is the oldest and probably most widely used format today. It is routinely
exported even by older instruments, which ensures backwards compatibility with them. It is
a general-purpose binary format, with a header that describes the structure of the data
contained in the file, grouped into variables and indexed by dimensions. In recent years,
efforts were made to establish open formats for data exchange based on a defined grammar
in extensible markup language (XML) with extendable controlled vocabularies, to allow
new technologies to be easily incorporated into the file format without breaking backwards
compatibility. Additionally, XML formats are human readable which narrows the technology
gap. mzXML was the first approach to establish such a format. It has been superseded
by mzData and, more recently, mzML was designed as a super-set of both, incorporating
extensibility through the use of an indexed controlled vocabulary. This allows mzML to be
adapted to technologies like GCxGC-MS without having to change its definition, although
its origins are in the proteomics domain. One drawback of XML-based formats is often
claimed to be their considerably larger space requirements when compared to the supposedly
more compact binary data representations. Recent advances in mzML address this issue by
compressing spectral data using gzip compression.
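The effect of compressing spectral arrays can be illustrated with a minimal sketch. It assumes that numeric arrays are packed as little-endian 64-bit floats, compressed with zlib (the deflate algorithm that also underlies gzip) and base64-encoded for embedding in XML. The function names are hypothetical and no real mzML parser is involved.

```python
import base64
import struct
import zlib


def encode_array(values, compress=True):
    """Pack floats as little-endian doubles, optionally compress, then base64-encode."""
    raw = struct.pack("<%dd" % len(values), *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, compressed=True):
    """Inverse of encode_array: base64-decode, optionally decompress, unpack doubles."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


if __name__ == "__main__":
    # A sparse intensity array: mostly zeros with a few signals (made-up values)
    intensities = [0.0] * 900 + [125.0, 980.0, 310.0] + [0.0] * 97
    plain = encode_array(intensities, compress=False)
    packed = encode_array(intensities, compress=True)
    print(len(plain), len(packed))  # the compressed text is much shorter
```

Sparse arrays with long runs of identical values compress particularly well, so the size penalty of a text-based container is largely offset for this kind of data.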
The data is continuously stored in a vendor-dependent native format during sample
processing on a GC-MS machine. Along with the mass spectral information, like ion mass (or
equivalents) and abundance, the acquisition time of each mass spectrum is recorded. Usually,
the vendor software includes methods for data conversion into one of the aforementioned
formats. However, especially when a high degree of automation is desired, it may be
beneficial to directly access the data in their native format. This avoids the need to run
the vendor’s proprietary software manually for every data conversion task. Both the
ProteoWizard framework (Kessner et al., 2008) and the Trans Proteomic Pipeline (Deutsch
et al., 2010) include multiple vendor-specific libraries for that use case.
2.2 Preprocessing
Raw mass spectrometry data is usually represented in sparse formats, only recording those
masses whose intensities exceed a user-defined threshold. This thresholding is usually
applied within the vendor’s proprietary software and may lead to artificial gaps within the
data. Thus, the first step in preprocessing involves the binning of mass spectra over time
into bins of defined size in the m/z dimension, followed by interpolation of missing values.
After binning, the data is stored as a rectangular array of values, with the first dimension
representing time, the second dimension representing the approximate bin mass values, and
the third dimension representing the intensity corresponding to each measured ion. This
process is also often described as resampling (Lange et al., 2007).
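The binning step can be sketched as follows. This is a minimal illustration that assumes each scan is given as a pair of mass and intensity lists and that intensities falling into the same bin are summed; interpolation of missing values is left out. All names are hypothetical.

```python
def bin_spectrum(masses, intensities, mz_min, mz_max, bin_width=1.0):
    """Sum the intensities of one sparse spectrum into fixed-width m/z bins."""
    n_bins = int((mz_max - mz_min) / bin_width) + 1
    row = [0.0] * n_bins
    for mz, intensity in zip(masses, intensities):
        if mz_min <= mz <= mz_max:
            # Assign the ion to the nearest bin centre
            idx = min(int((mz - mz_min) / bin_width + 0.5), n_bins - 1)
            row[idx] += intensity
    return row


def build_profile_matrix(scans, mz_min, mz_max, bin_width=1.0):
    """scans: list of (masses, intensities) tuples ordered by acquisition time.

    Returns a rectangular matrix indexed as matrix[scan][mass_bin].
    """
    return [bin_spectrum(m, i, mz_min, mz_max, bin_width) for m, i in scans]


if __name__ == "__main__":
    scans = [([50.1, 51.9, 52.0], [10.0, 5.0, 7.0]), ([50.0, 53.2], [3.0, 4.0])]
    for row in build_profile_matrix(scans, 50.0, 54.0):
        print(row)
```

Each column of the resulting matrix is an extracted ion trace over time, and each row is a resampled mass spectrum.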
Depending on various instrumental parameters, the raw exported data may require additional
processing. The most commonly reported methods for smoothing are the Savitzky-Golay
filter (Savitzky & Golay, 1964), LOESS regression (Smith et al., 2006) and variants of local
averaging, for example by a windowed moving average filter. These methods can also be
applied to interpolate values where gaps are present in the original data. The top-hat filter
(Bertsch et al., 2008; Lange et al., 2007) is used to remove a varying baseline from the signal.
More refined methods use signal decomposition and reconstruction methods, such as Fourier
transform and continuous wavelet transform (CWT) (Du et al., 2006; Fredriksson et al., 2009;
Tautenhahn et al., 2008) in order to remove noise and baseline contributions from the signal
and simultaneously find peaks.
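As a minimal sketch of two of the simpler filters mentioned above, the following code implements a windowed moving average and a top-hat filter on a one-dimensional signal. The top-hat filter is written here as the signal minus its morphological opening (erosion followed by dilation); the window sizes and function names are illustrative assumptions.

```python
def moving_average(signal, window=3):
    """Smooth a signal with a centred moving average (window shrinks at the edges)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def _erode(signal, half):
    return [min(signal[max(0, i - half): i + half + 1]) for i in range(len(signal))]


def _dilate(signal, half):
    return [max(signal[max(0, i - half): i + half + 1]) for i in range(len(signal))]


def top_hat(signal, window):
    """Remove a slowly varying baseline: signal minus its morphological opening."""
    half = window // 2
    opened = _dilate(_erode(signal, half), half)
    return [s - o for s, o in zip(signal, opened)]


if __name__ == "__main__":
    trace = [10, 10, 10, 10, 30, 50, 30, 10, 10, 10, 10]  # peak on a constant baseline
    print(top_hat(trace, 7))
```

The window of the top-hat filter has to be wider than the widest expected peak, otherwise parts of the peak itself are treated as baseline and removed.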
2.4 Alignment
The alignment problem in metabolomics and proteomics stems from the analytical
methods used. These produce sampled sensor readings acquired over time in fixed or
programmed intervals, usually called chromatograms. The sensor readings can be one- or
multidimensional. In the first case, detectors like ultra violet and visible light absorbance
detectors (UV/VIS) or flame ionization detectors (FID) measure the signal response as
one-dimensional features, e.g. as the absorbance spectrum or electrical potential, respectively.
Multi-dimensional detectors like mass spectrometers record a large number of features
simultaneously, e.g. mass and ion count. The task is then to find corresponding and
non-corresponding features between different sample acquisitions. This correspondence
problem is a term used by Åberg et al. (2009) which describes the actual purpose of alignment,
namely to find true correspondences between related analytical signals over a number of
sample acquisitions. For GC-MS- and LC-MS-based data, a number of different methods have
been developed, some of which are described in more detail by Castillo, Gopalacharyulu,
Yetukuri & Orešič (2011) and Åberg et al. (2009). Here, we will concentrate on those methods
that have been reported to be applicable to GC-MS. In principle, alignment algorithms can
be classified into two main categories: peak- and signal-based methods. Methods of the first
type start with a defined set of peaks, which are present in most or all samples that are to be
aligned before determining the best correspondences of the peaks between samples in order to
then derive a time correction function. Krebs et al. (2006) locate landmark peaks in the TIC and
then select pairs of those peaks with a high correlation between their mass spectra in order to
fit an interpolating spline between a reference chromatogram and the to-be-aligned one. The
method of Robinson et al. (2007) is inspired by multiple sequence alignment algorithms and
uses dynamic programming to progressively align peak lists without requiring an explicit
reference chromatogram. Other methods, like that of Chae et al. (2008) perform piecewise,
block-oriented matching of peaks, either on the TIC, on selected masses, or on the complete
mass spectra. Time correction is applied after the peak assignments between the reference
chromatogram and the others have been calculated. Signal-based methods include recent
variants of correlation optimized warping (Smilde & Horvatovich, 2008), parametric time
warping (Christin et al., 2010) and dynamic time warping (Christin et al., 2010; Clifford et al.,
2009; Hoffmann & Stoye, 2009; Prince & Marcotte, 2006) and usually consider the complete
chromatogram for comparison. However, attempts are made to reduce the computational
burden associated with a complete pairwise comparison of mass spectra by partitioning the
chromatograms into similar regions (Hoffmann & Stoye, 2009), or by selecting a representative
subset of mass traces (Christin et al., 2010). Another distinction in alignment algorithms is the
requirement of an explicit reference for alignment. Some methods apply clustering techniques
to select one chromatogram that is most similar to all others (Hoffmann & Stoye, 2009; Smilde
& Horvatovich, 2008), while other methods choose such a reference based on the number of
features contained in a chromatogram (Lange et al., 2007) or by manual user choice (Chae
et al., 2008; Clifford et al., 2009). For high-throughput applications, alignments should be fast
to calculate and reference selection should be automatic. Thus, a sampling method for time
correction has recently been reported by Pluskal et al. (2010) for LC-MS. A comparison of these
methods is given in the same publication.
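The principle behind the signal-based methods can be illustrated with a textbook dynamic time warping sketch on two one-dimensional traces, for example TICs. This is not the implementation of any of the cited methods: it uses the absolute intensity difference as local cost, applies no search space constraints, and all names are hypothetical.

```python
def dtw(a, b):
    """Return (total cost, warping path) for two 1-D signals a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Trace back from the end, always following the cheapest predecessor
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    reference = [0, 0, 1, 5, 1, 0]
    shifted = [0, 1, 5, 1, 0, 0]   # same peak, eluting one scan earlier
    total, path = dtw(reference, shifted)
    print(total, path)
```

The quadratic time and memory requirements of the full cost matrix are the reason why the methods cited above restrict the comparison to selected regions or mass traces.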
biological context. This task is usually not handled by the frameworks described in this
chapter. Many web-based analysis tools, such as MetaboAnalyst (Xia & Wishart, 2011),
MetabolomeExpress (Carroll et al., 2010), or MeltDB (Neuweger et al., 2008), put the data
into a larger context by providing name- or id-based mapping of the experimentally determined
metabolite concentrations onto biochemical pathways. MeltDB also allows
association of the metabolomics data with other results for the same subjects under study or
with results from other "omics" experiments on the same target subjects, but this is beyond
the scope of the frameworks presented herein.
3.1 XCMS
XCMS (Smith et al., 2006) is a very mature framework and has seen constant development
during the last five years. It is mainly designed for LC-MS applications; however, its binning,
peak finding and alignment are also applicable to GC-MS data. XCMS is implemented in
the GNU R programming language, the de-facto standard for Open Source statistics. Since
GNU R is an interpreted scripting language, it is easy to write custom scripts that realize
additional functionality of the typical GC-MS workflow described above. XCMS is part of
the Bioconductor package collection, which offers many computational methods for various
"omics" technologies. Further statistical methods are available from GNU R.
XCMS supports input in NetCDF, mzXML, mzData and, more recently, mzML format. This
allows XCMS to be used with virtually any chromatography-mass spectrometry data, since
vendor software supports conversion to at least one of those formats. XCMS uses the xcmsRaw
object as its primary tabular data structure for each binned data file. The xcmsSet object is then
used to represent peaks and peak groups and is used by its peak alignment and diffreport
features.
The peak finding methods in XCMS are quite different from each other. For data with normal
or low mass resolution and accuracy, the matched filter peak finder (Smith et al., 2006)
is usually sensitive enough. It uses a Gaussian peak template function with user defined
width and signal-to-noise criteria to locate peaks on individual binned extracted ion current
(EIC) traces over the complete time range of the binned chromatogram. The other method,
CentWave (Tautenhahn et al., 2008) is based on a continuous wavelet transform on areas of
interest within the raw data matrix. Both peak finding methods report peak boundaries and
integrated areas for raw data and for the data reconstructed from the peak finder’s signal
response values.
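The idea of matched filtering can be sketched in a few lines. The example below is written in Python rather than GNU R and does not reproduce the XCMS implementation: it correlates an EIC with a zero-mean Gaussian template and reports local maxima of the filter response above a threshold. The template width, the threshold and all function names are assumptions made for illustration.

```python
import math


def gaussian_template(sigma, half_width):
    """Gaussian peak template, shifted to zero mean so a flat baseline gives no response."""
    g = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half_width, half_width + 1)]
    mean = sum(g) / len(g)
    return [v - mean for v in g]


def matched_filter(eic, template):
    """Correlate the trace with the template; edge values are repeated at the borders."""
    half = len(template) // 2
    response = []
    for i in range(len(eic)):
        acc = 0.0
        for k, w in enumerate(template):
            j = min(max(i + k - half, 0), len(eic) - 1)
            acc += w * eic[j]
        response.append(acc)
    return response


def find_peaks(eic, sigma=2.0, threshold=0.0):
    """Scan indices where the filter response has a local maximum above the threshold."""
    resp = matched_filter(eic, gaussian_template(sigma, int(3 * sigma)))
    return [i for i in range(1, len(resp) - 1)
            if resp[i] > threshold and resp[i] >= resp[i - 1] and resp[i] > resp[i + 1]]


if __name__ == "__main__":
    # Synthetic EIC: Gaussian peak centred at scan 15 on a constant baseline of 5
    eic = [5.0] * 10 + [5.0 + 100.0 * math.exp(-0.5 * ((i - 5) / 2.0) ** 2)
                        for i in range(11)] + [5.0] * 10
    print(find_peaks(eic, sigma=2.0, threshold=1.0))
```

In practice the threshold would be derived from a noise estimate so that it acts as a signal-to-noise criterion rather than as a fixed value.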
Initially designed for LC-MS, XCMS does not have a method to group co-eluting peaks into
peak groups, as is a requirement in GC-MS methods using electron ionization. However,
CAMERA (Tautenhahn et al., 2007) shows how XCMS can be used as a basis in order to create
a derived application, in this case for ion annotation between samples.
Peak alignment in XCMS is performed using local LOESS regression between peak groups
with very similar m/z and retention time behaviour and good support within each sample
group. This allows a simultaneous alignment and retention time correction of all peaks. The
other available method is based on the Obi-Warp dynamic time warping (Prince & Marcotte,
2006) algorithm and is capable of correcting large non-linear retention time distortions. It uses
the peak set with the highest number of features as alignment reference, which is comparable
to the approach used by Lange et al. (2007). However, it is much more computationally
demanding than the LOESS-based alignment.
XCMS’s diffreport generates a summary report of significant analyte differences between two
sample sets. It uses Welch’s two-sample t-statistic to calculate p-values for each analyte group.
ANOVA may be used for more than two sample sets.
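For reference, Welch's statistic for one analyte group can be computed as in the following sketch. It is a plain Python illustration, not the XCMS code, and it returns only the t-statistic and the Welch-Satterthwaite degrees of freedom; obtaining the p-value additionally requires the cumulative distribution function of Student's t-distribution, which is omitted here. The intensity values in the example are made up.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of the mean of a
    vb = variance(b) / len(b)   # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    control = [10.0, 12.0, 14.0]          # hypothetical peak areas, sample set 1
    treated = [20.0, 22.0, 24.0, 26.0]    # hypothetical peak areas, sample set 2
    print(welch_t(control, treated))
```

Unlike Student's t-test, this variant does not assume equal variances in the two sample sets, which makes it a reasonable default for groups of different size.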
A number of different visualizations are also available, both for raw and processed data. These
include TIC plots, EIC plots, analyte group plots for grouped features, and chromatogram (rt,
m/z, intensity) surface plots.
XCMS can use GNU R’s Rmpi infrastructure to execute arbitrary function calls, such as profile
generation and peak finding, in parallel on a local cluster of computers.
3.2 PyMS
PyMS (Callaghan et al., 2010; Isaac et al., 2009) is a programming framework for GC-MS
metabolomics based on the Python programming language. It can therefore use a large
number of scientific libraries which are accessible via the SciPy and NumPy packages (SciPy,
2011). Since Python is a scripting language, it allows rapid prototyping, comparable to
GNU R. However, Python’s syntax may be more familiar for programmers with a background
in object-oriented programming languages.
The downloadable version of PyMS currently only supports NetCDF among the more recent
open data exchange formats. Nonetheless, it is the only framework in this comparison with
support for the JCAMP GC-MS file format.
PyMS provides dedicated data structures for chromatograms, allowing efficient access to
EICs, mass spectra, and peak data.
In order to find peaks, PyMS also builds a rectangular profile matrix with the dimensions
time, m/z and intensity. By using slightly shifted binning boundaries, it
reduces the chance of false assignments of ion signals to neighboring bins when binning is
performed with unit precision (bin width of 1 m/z). PyMS offers the moving average and
the Savitzky-Golay (Savitzky & Golay, 1964) filters for signal smoothing of EICs within the
profile matrix. Baseline correction can be performed by the top-hat filter (Lange et al., 2007).
The actual peak finding is based on the method described by Biller & Biemann (1974) and
involves the matching of local peak maxima co-eluting within a defined window. Peaks are
integrated for all co-eluting masses, starting from a peak apex to both sides and ending if the
increase in area falls below a given threshold.
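The core idea of matching co-eluting local maxima can be sketched as follows. This is a simplified illustration in the spirit of Biller & Biemann (1974), not the PyMS implementation: a scan is reported as a peak apex if at least a given number of mass traces reach a strict local maximum in exactly that scan. The window size, the minimum ion count and all names are assumptions.

```python
def local_maxima(trace, window):
    """Indices holding the unique, non-zero maximum within +/- window // 2 scans."""
    half = window // 2
    apexes = []
    for i, value in enumerate(trace):
        segment = trace[max(0, i - half): i + half + 1]
        if value > 0 and value == max(segment) and segment.count(value) == 1:
            apexes.append(i)
    return apexes


def coeluting_apexes(matrix, window=3, min_ions=2):
    """matrix[scan][mass_bin] -> {apex_scan: [mass_bin, ...]} for co-eluting maxima."""
    n_scans, n_bins = len(matrix), len(matrix[0])
    hits = {}
    for b in range(n_bins):
        eic = [matrix[s][b] for s in range(n_scans)]
        for apex in local_maxima(eic, window):
            hits.setdefault(apex, []).append(b)
    return {s: bins for s, bins in sorted(hits.items()) if len(bins) >= min_ions}


if __name__ == "__main__":
    profile = [
        [0, 0, 1],
        [1, 2, 0],
        [5, 9, 0],   # mass bins 0 and 1 reach their apex together in scan 2
        [1, 2, 0],
        [0, 0, 0],
        [0, 0, 4],   # isolated single-ion maximum, rejected with min_ions=2
        [0, 0, 0],
    ]
    print(coeluting_apexes(profile))
```

Requiring several ions to peak together exploits the fact that the fragment ions of one compound elute simultaneously, which suppresses maxima caused by noise in a single mass trace.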
Peak alignment in PyMS is realized by the method introduced by Robinson et al. (2007). It
is related to progressive multiple sequence alignment methods and is based on a generic
dynamic programming algorithm for peak lists. It proceeds by first aligning peak lists within
sample groups, before aligning the aligned peak lists of different groups, until all groups have
been aligned.
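A pairwise building block for such a progressive scheme can be sketched with dynamic programming. The example below is a simplification and not the algorithm of Robinson et al. (2007): it maximizes the summed similarity of matched peaks without an explicit gap penalty, and it defines similarity as the cosine of the mass spectra multiplied by a Gaussian retention time term. Peaks are given as (retention time, spectrum) tuples; all names and parameter values are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def similarity(p, q, rt_tol):
    """Spectral cosine, down-weighted by the retention time difference."""
    return cosine(p[1], q[1]) * math.exp(-((p[0] - q[0]) ** 2) / (2.0 * rt_tol ** 2))


def align_peak_lists(a, b, rt_tol=2.0, min_sim=0.5):
    """Return index pairs (i, j) of matched peaks, preserving elution order."""
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(score[i - 1][j], score[i][j - 1])
            s = similarity(a[i - 1], b[j - 1], rt_tol)
            if s >= min_sim:
                best = max(best, score[i - 1][j - 1] + s)
            score[i][j] = best
    pairs, i, j = [], n, m
    while i > 0 and j > 0:   # trace back through the score matrix
        s = similarity(a[i - 1], b[j - 1], rt_tol)
        if s >= min_sim and score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


if __name__ == "__main__":
    run1 = [(10.0, [1, 0, 0]), (20.0, [0, 1, 0]), (30.0, [0, 0, 1])]
    run2 = [(10.5, [1, 0, 0]), (30.4, [0, 0, 1])]
    print(align_peak_lists(run1, run2))
```

In a progressive setting, the matched pairs of two peak lists would be merged into a consensus list that is then aligned against the next list or group.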
Visualizations of chromatogram TICs, EICs, peaks and mass spectra are available and are
displayed to the user in an interactive plot panel.
For high-throughput applications, PyMS can be used together with MPI to parallelize tasks
within a local cluster of computers.
3.3 Maltcms
The Maltcms framework allows users to set up and configure individual processing components
for various types of computational analyses of metabolomics data. The framework is
implemented in Java and is modular, using the service provider pattern for maximal
decoupling of interface and implementation, so that it can be extended in functionality at
runtime.
Maltcms can read data from files in NetCDF, mzXML, mzData or mzML format. It uses a
pipeline paradigm to model the typical preprocessing workflow in metabolomics, where each
processing step can define dependencies on previous steps. This allows automatic pipeline
validation and ensures that a user cannot define an invalid pipeline. The workflow itself is
serialized to XML format, keeping track of all resources created during pipeline execution.
Using a custom post-processor, users can define which results of the pipeline should be
archived.
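Dependency-based pipeline validation can be illustrated with a short sketch. It is written in Python for brevity and does not use the Maltcms API: each step is described by a hypothetical tuple of its name, the variables it requires and the variables it provides, and validation simply checks that every requirement is satisfied by the input data or by an earlier step.

```python
def validate_pipeline(steps, initial=()):
    """Check an ordered list of (name, requires, provides) steps.

    Returns the set of variables available after the last step and raises
    ValueError as soon as a step depends on something that is not yet available.
    """
    available = set(initial)
    for name, requires, provides in steps:
        missing = set(requires) - available
        if missing:
            raise ValueError("step %r lacks inputs: %s"
                             % (name, ", ".join(sorted(missing))))
        available |= set(provides)
    return available


if __name__ == "__main__":
    # Step and variable names are made up for this example
    pipeline = [
        ("binning", {"raw_scans"}, {"profile_matrix"}),
        ("peak_finding", {"profile_matrix"}, {"peak_list"}),
        ("alignment", {"peak_list"}, {"aligned_peaks"}),
    ]
    print(sorted(validate_pipeline(pipeline, initial={"raw_scans"})))
```

Because the check only needs the declared dependencies, it can run before any data is processed, so that configuration errors surface immediately rather than hours into a batch run.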
Maltcms uses a generalization of the ANDI-MS data schema internally and a data provider
interface with corresponding implementations to perform the mapping from any proprietary
data format to an internal data object model. This allows efficient access to individual mass
spectra and other data available in the raw-data files. Additionally, developers need no special
knowledge of any supported file format, since all data can be accessed generically. Results
from previous processing steps are referenced in the data model to allow both shadowing of
data, e.g. creating a processing result variable with the same name as an already existing
variable, and aggregation of processing results. Thus, all previous processing results are
transparently accessible for downstream elements of a processing pipeline, unless they have
been shadowed.
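The combination of shadowing and aggregation described above behaves much like a stack of layered lookups. The following Python sketch mimics this behaviour with `collections.ChainMap`; it is an analogy rather than the Maltcms data model, and the variable names are only examples in the style of ANDI-MS.

```python
from collections import ChainMap


def layered_view(*layers):
    """Combine processing results, given from oldest to newest.

    Lookups return the most recent definition of a variable, so newer results
    shadow older ones, while all other variables remain visible.
    """
    return ChainMap(*reversed(layers))


if __name__ == "__main__":
    raw = {"total_intensity": [1, 5, 2], "scan_acquisition_time": [0.0, 0.5, 1.0]}
    smoothed = {"total_intensity": [2, 3, 2]}   # shadows the raw TIC
    peaks = {"peak_index": [1]}                 # adds a new variable

    view = layered_view(raw, smoothed, peaks)
    print(view["total_intensity"])         # smoothed values
    print(view["scan_acquisition_time"])   # still taken from the raw layer
    print(sorted(view))                    # aggregated variable names
```

The original values are not overwritten; they stay available in the underlying layer, which keeps every intermediate result reproducible.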
Primary storage of processing results is performed on a per-chromatogram basis in the binary
NetCDF file format. Since metabolomics experiments create large amounts of data, a focus is
put on efficient data structures, data access, and scalability of the framework.
Embedding Maltcms in existing workflows or interfacing with other software is also possible,
as alignments, peak-lists and other feature data can be exported as comma separated value
files or in specific XML-based formats, which are well-defined by custom schemas.
To exploit the potential of modern multi-core CPUs and distributed computing networks,
Maltcms supports multi-threaded execution on a local machine or within a grid of connected
computers using an OpenGrid infrastructure (e.g. Oracle Grid Engine or Globus Toolkit
(Foster, 2005)) or a manually connected network of machines via remote method invocation
(RMI).
The framework is accompanied by many libraries for different purposes, such as the
JFreeChart library for 2D-plotting or, for BLAS compatible linear algebra, math and statistics
implementations, the Colt and commons-math libraries. Building upon the base library Cross,
which defines the commonly available interfaces and default implementations, Maltcms
provides the domain dependent data structures and specializations for processing of
chromatographic data.
Table 1. Overview of available Open Source software frameworks for GC-MS based
metabolomics. a: Part of Bioconductor 2.8
3.3.1 ChromA
ChromA is a configuration of Maltcms that includes preprocessing, in the form of mass
binning, time-scale alignment and annotation of signal peaks found within the data, as
well as visualizations of unaligned and aligned data from GC-MS and LC-MS experiments.
The user may supply mandatory alignment anchors as CSV files to the pipeline and a
database location for tentative metabolite identification. Further downstream processing can
be performed either on the retention time-corrected chromatograms in NetCDF format, or on
the corresponding peak tables in either CSV format or XML format.
Peaks can be imported from other tools by providing them to ChromA in CSV format,
with one peak per row giving at least its scan index. Alternatively, ChromA has a fast
peak finder that locates peaks based on derivatives of the smoothed and baseline-corrected
TIC, using a moving average filter followed by top-hat filter baseline subtraction, with a
predefined minimum peak-width. Peak alignment is based on a star-wise or tree-based
application of an enhanced variant of pairwise dynamic time warping (DTW) (Hoffmann &
Stoye, 2009). To reduce both runtime and space requirements, conserved signals throughout
the data are identified, constraining the search space of DTW to a precomputed closed
polygon. The alignment anchors can be augmented or overwritten by user-defined anchors,
such as previously identified compounds, characteristic mass or MS/MS identifications.
Then, the candidates are paired by means of a bidirectional best-hits (BBH) criterion, which
can compare different aspects of the candidates for similarity. Paired anchors are extended to
k-cliques with configurable k, which help to determine the conservation or absence of signals
across measurements, especially with respect to replicate groups. Tentative identification
of peaks against a database using their mass spectra is possible using the MetaboliteDB
module. This module provides access to mass-spectral databases in msp-compatible format,
for example the Golm Metabolite Database or the NIST EI-MS database.
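Tentative identification by spectral matching can be sketched as follows. The example is written in Python, does not use the MetaboliteDB module, and assumes cosine similarity between sparse spectra as the matching score; the library entries, the compound names and the score threshold are invented for illustration.

```python
import math


def cosine_similarity(s1, s2):
    """Cosine similarity of two sparse spectra given as {m/z: intensity} dicts."""
    dot = sum(s1[mz] * s2[mz] for mz in set(s1) & set(s2))
    n1 = math.sqrt(sum(v * v for v in s1.values()))
    n2 = math.sqrt(sum(v * v for v in s2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def best_match(query, library, min_score=0.8):
    """Return (name, score) of the best library hit, or (None, score) below min_score.

    The library must contain at least one entry.
    """
    score, name = max((cosine_similarity(query, spec), name)
                      for name, spec in library.items())
    return (name, score) if score >= min_score else (None, score)


if __name__ == "__main__":
    # Hypothetical reference spectra, e.g. parsed from an msp-compatible file
    library = {
        "compound A": {73: 100.0, 147: 50.0},
        "compound B": {57: 100.0, 71: 80.0},
    }
    query = {73: 90.0, 147: 55.0, 75: 5.0}
    print(best_match(query, library))
```

A hit obtained this way remains tentative, since structurally related compounds can produce very similar electron ionization spectra; retention index information is commonly used as an additional filter.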
ChromA visualizes alignment results including paired anchors in bird's-eye view or as a
simultaneous overlay plot of the TIC. Additionally, absolute and relative differential charts
are provided, which allow easy spotting of quantitative differences.
Peak tables are exported in CSV format, including peak apex positions, area under curve, peak
intensity and possibly tentative database identifications. Additionally, information about the
matched and aligned peak groups is saved in CSV format.
of the frameworks Guineu (Castillo, Mattila, Miettinen, Orešič & Hyötyläinen, 2011) and
ChromA4D (Maltcms, 2011).
Table 3. Feature comparison of Open Source software frameworks for GCxGC-MS based
metabolomics
4.1 Guineu
Guineu is a recently published graphical user interface and application for the comparative
analysis of GCxGC-MS data (Castillo, Mattila, Miettinen, Orešič & Hyötyläinen, 2011). It
currently reads LECO ChromaTOF software’s peak list output after smoothing, baseline
correction, peak finding, deconvolution, database search and retention index (RI) calculation
have been performed within ChromaTOF.
The peak lists are aligned pairwise using the score alignment algorithm, which requires
user-defined retention time windows for both separation dimensions. Additionally, the
one-dimensional retention index (RI) of each peak is used within the score calculation. Finally,
a threshold for mass spectral similarity is needed in order to create putative peak groups.
Additional peak lists are added incrementally to an already aligned path, based on the
individual peaks’ score against those peaks that are already contained within the path.
Guineu provides different filters to remove peaks by name, group occurrence count, or other
features from the ChromaTOF peak table. In order to identify compound classes, the Golm
metabolite database (GMD) substructure search is used. Peak areas can be extracted from
ChromaTOF using the TIC, or using extracted, informative or unique masses. Peak area
normalization is available relative to multiple user-defined standard compounds.
After peak list processing, Guineu produces an output table for
all aligned peaks, containing information on the original analyte annotation as given by
ChromaTOF, peak areas, average retention times in both dimensions together with the average
RI and further chemical information on the functional group and substructure prediction as
given by the GMD. It is also possible to link the peak data to KEGG and PubChem via the CAS
annotation, if it is available for the reported analyte.
For statistical analysis of the peak data, Guineu provides fold change- and t-tests, principal
component analysis (PCA), analysis of variance (ANOVA) and other methods.
Guineu’s statistical analysis methods provide different plots of the data sets, e.g. for showing
the principal components of variation within the data sets after analysis with PCA.
4.2 ChromA4D
For the comparison of comprehensive two-dimensional gas chromatography-mass
spectrometry (GCxGC-MS) data, ChromA4D accepts NetCDF files as input. Additionally,
the user needs to provide the total runtime on the second orthogonal column (modulation
time) to calculate the second retention dimension information from the raw data files. For
tentative metabolite identification, the location of a database can be given by the user.
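The role of the modulation time can be illustrated with a short sketch that splits linear acquisition times into first- and second-dimension retention times. It is a Python illustration, not ChromA4D code, and it assumes that the first scan coincides with the start of a modulation cycle; the function name is hypothetical.

```python
def to_2d_retention(scan_times, modulation_time):
    """Map acquisition times (s) to (first-dimension, second-dimension) retention times."""
    t0 = scan_times[0]
    result = []
    for t in scan_times:
        n_mod = int((t - t0) // modulation_time)   # index of the modulation cycle
        rt1 = t0 + n_mod * modulation_time         # start time of that cycle
        rt2 = (t - t0) - n_mod * modulation_time   # time elapsed within the cycle
        result.append((rt1, rt2))
    return result


if __name__ == "__main__":
    times = [0.0, 1.0, 4.0, 5.0, 6.0, 11.0]
    print(to_2d_retention(times, modulation_time=5.0))
```

All scans that share the same first-dimension value form one column of the two-dimensional chromatogram, which is why an incorrect modulation time leads to a visibly sheared image.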
ChromA4D reports the located peaks, their respective integrated TIC areas, their best
matching corresponding peaks in other chromatograms, as well as a tentative identification
for each peak. Furthermore, all peaks are exported together with their mass spectra to MSP
format, which allows for downstream processing and re-analysis with AMDIS and other
tools. The exported MSP files may be used to define a custom database of reference spectra
for subsequent analyses.
Peak areas are found by a modified seeded region growing algorithm. All local maxima of the
TIC representation that exceed a threshold are selected as initial seeds. Then, the peak area
is determined by using the distance of the seed mass spectrum to all neighbor mass spectra
as a measure of the peak’s coherence. The area is extended until the distance exceeds a given
threshold. No information about the expected peak shape is needed. The peak integration
is based on the sum of TICs of the peak area. An identification of the area’s average or apex
mass spectrum or the seed mass spectrum is again possible using the MetaboliteDB module.
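The peak finding steps described above can be sketched on a small grid of mass spectra. This is a simplified illustration, not the ChromA4D implementation: the names grow_peaks, min_tic and max_distance, the 4-neighbourhood and the use of a cosine distance are assumptions of this sketch, and neither a maximum peak size nor a rule for resolving overlapping regions is included.

```python
import math
from collections import deque


def cosine_distance(a, b):
    """1 - cosine similarity of two intensity vectors on a common m/z grid."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def neighbours(r, c, rows, cols):
    """Yield the 4-connected neighbours of cell (r, c) that lie inside the grid."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < rows and 0 <= c + dc < cols:
            yield r + dr, c + dc


def grow_peaks(spectra, min_tic, max_distance):
    """Return a list of (seed, region, integrated TIC) for a grid of spectra."""
    rows, cols = len(spectra), len(spectra[0])
    tic = [[sum(s) for s in row] for row in spectra]
    peaks = []
    for r in range(rows):
        for c in range(cols):
            # Seeds are local TIC maxima above the intensity threshold.
            is_max = all(tic[r][c] > tic[nr][nc]
                         for nr, nc in neighbours(r, c, rows, cols))
            if not (is_max and tic[r][c] > min_tic):
                continue
            region, queue = {(r, c)}, deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for cell in neighbours(cr, cc, rows, cols):
                    nr, nc = cell
                    # Extend the area while spectra stay close to the seed spectrum.
                    if cell not in region and cosine_distance(
                            spectra[r][c], spectra[nr][nc]) <= max_distance:
                        region.add(cell)
                        queue.append(cell)
            area = sum(tic[pr][pc] for pr, pc in region)  # sum of TICs in the area
            peaks.append(((r, c), region, area))
    return peaks


# Toy grid (1 x 5) with two compounds that have orthogonal two-channel spectra.
spectra = [[[1, 0], [4, 0], [1, 0], [0, 3], [0, 1]]]
peaks = grow_peaks(spectra, min_tic=2, max_distance=0.1)
```

In the toy grid, the two seeds grow into separate regions because the spectra of the neighbouring compound are dissimilar to each seed spectrum; no assumption about the peak shape enters the procedure.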
To represent the similarities and differences between different chromatograms, bidirectional
best hits are used to find co-occurring peaks. These are located by using a distance that
exponentially penalizes differences in the first and second retention times of the peaks to be
compared. To avoid a full computation of all pairs of peaks, only those peaks within a defined
window of retention times based on the standard deviation of the exponential time penalty
function are evaluated.
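The bidirectional best hit criterion can be sketched as below. The exact penalty function used by ChromA4D is not given here, so the sketch assumes a Gaussian-shaped score on the retention time differences; the function names, the window width of three standard deviations and the example peaks are likewise hypothetical, and mass spectral similarity is left out for brevity.

```python
import math


def similarity(p, q, sigma1, sigma2):
    """Gaussian-shaped score in (0, 1]; 1 means identical retention times."""
    d1, d2 = p[0] - q[0], p[1] - q[1]
    return (math.exp(-d1 * d1 / (2 * sigma1 ** 2))
            * math.exp(-d2 * d2 / (2 * sigma2 ** 2)))


def best_hits(queries, targets, sigma1, sigma2, width=3.0):
    """Index of the best-scoring target for each query, or None if no target
    lies inside the retention time window."""
    hits = []
    for p in queries:
        best, best_score = None, 0.0
        for j, q in enumerate(targets):
            if (abs(p[0] - q[0]) > width * sigma1
                    or abs(p[1] - q[1]) > width * sigma2):
                continue  # outside the retention time window: skip the pair
            score = similarity(p, q, sigma1, sigma2)
            if score > best_score:
                best, best_score = j, score
        hits.append(best)
    return hits


def bidirectional_best_hits(peaks_a, peaks_b, sigma1, sigma2):
    """Pairs (i, j) where j is the best hit of i and i is the best hit of j."""
    forward = best_hits(peaks_a, peaks_b, sigma1, sigma2)
    backward = best_hits(peaks_b, peaks_a, sigma1, sigma2)
    return [(i, j) for i, j in enumerate(forward)
            if j is not None and backward[j] == i]


# Hypothetical peaks as (first retention time, second retention time) in seconds.
chrom_a = [(100.0, 1.0), (200.0, 2.0), (300.0, 3.0)]
chrom_b = [(102.0, 1.1), (199.0, 2.0), (205.0, 2.1), (500.0, 4.0)]
pairs = bidirectional_best_hits(chrom_a, chrom_b, sigma1=5.0, sigma2=0.2)
```

The peak at (205.0, 2.1) also prefers the second peak of the first chromatogram, but it is not that peak's own best hit, so it remains unpaired.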
5. Application examples
The following examples for GC-MS and GCxGC-MS are based on the Maltcms framework,
using the ChromA and ChromA4D configurations described in the previous sections. In order
to run them, the most recent version of Maltcms needs to be downloaded and unzipped to a local
folder on a computer. Additionally, Maltcms requires a Java runtime environment, version 6
or newer, to be installed. If these requirements are met, one needs to start a command prompt
and change to the folder containing the unzipped Maltcms.
(a) Overlay of unaligned data sets, extracted from the middle section within a time range of 1100 to 1700 seconds.
(b) Overlay with highlighted peak areas (without n-alkanes) after peak finding and integration. Zoomed in to provide more detail.
the other samples. The acquired data were exported to ANDI-MS (NetCDF) format before
ChromA was applied. The default ChromA pipeline chroma.properties was run from the
unzipped Maltcms directory with the following command (issued on a single line of input):
-i points to the directory containing the input data, -o points to the directory where output
should be placed, -f can be a comma separated list of filenames or, as in this case, a wildcard
expression, matching all files in the input directory having a file name ending with .CDF.
The final argument indicated by -c is the path to the configuration file used for definition
of the pipeline and its commands. An overlay of the raw TICs of the samples is depicted in
Figure 2(a). The default ChromA pipeline configuration creates a profile matrix with nominal
mass bin width. Then, the TIC peaks are located separately within each sample data file and
are integrated (Figure 2(b)). The peak apex mass spectra are then used in the next step in
order to build a multiple peak alignment between all peaks of all samples by finding large
cliques, or clusters of peaks exhibiting similar retention time behaviour and having highly
similar mass spectra. This coarse alignment could already be used to calculate a polynomial
fit, correcting retention time shift for all peaks. However, the ChromA pipeline uses the
peak clusters in order to constrain a dynamic time warping (DTW) alignment in the next
step, which is calculated between all pairs of samples. The resulting distances are used to
determine the reference sample with the lowest sum of distances to all remaining samples.
Those are then aligned to the reference using the warp map obtained from the pairwise
DTW calculations. The pairwise DTW distances can easily be used for a hierarchical cluster
analysis. Similar samples should be grouped into the same cluster, while dissimilar samples
should be grouped into different clusters. Figure 3 shows the results of applying a complete
linkage clustering algorithm provided by GNU R to the pairwise distance matrix. It is clearly
visible that the samples are grouped correctly, without incorporation of any external group
assignment. Thus, this method can be used for quality control of multiple sample acquisitions,
when the clustering results are compared against a pre-defined number of sample groups.
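The selection of the reference sample from pairwise DTW distances can be sketched as follows. This is a simplified illustration only: it runs an unconstrained DTW on plain TIC traces, whereas the ChromA pipeline described above constrains the alignment with peak clusters. All function names and the example traces are hypothetical.

```python
def dtw_distance(x, y):
    """Classic dynamic time warping cost between two 1D signals,
    normalised by the sum of the signal lengths."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat y[j - 1]
                                    cost[i][j - 1],      # repeat x[i - 1]
                                    cost[i - 1][j - 1])  # advance both signals
    return cost[n][m] / (n + m)


def pairwise_distances(signals):
    """DTW distance for every ordered pair of sample names."""
    names = sorted(signals)
    return {(a, b): dtw_distance(signals[a], signals[b])
            for a in names for b in names}


def select_reference(signals):
    """Sample with the lowest sum of distances to all other samples."""
    dist = pairwise_distances(signals)
    names = sorted(signals)
    return min(names, key=lambda a: sum(dist[(a, b)] for b in names))


# Hypothetical TIC traces: shifted peaks with different apex intensities.
tics = {
    "s1": [0, 1, 4, 1, 0, 0, 0],
    "s2": [0, 0, 1, 5, 1, 0, 0],
    "s3": [0, 0, 0, 1, 7, 1, 0],
}
reference = select_reference(tics)
```

The retention time shift between the traces is absorbed by the warping, so only the intensity differences contribute to the distances. The same distance table could be passed on to a hierarchical clustering routine.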
[Fig. 3: dendrogram of the complete linkage clustering of as.dist(1 - tbl); height axis from 0.00 to 0.05; leaves: mix1-MD.cdf, n-alkanes-AD.cdf, n-alkanes-MD.cdf, mix1-1-AD.cdf, mix1-2-AD.cdf.]
(Agilent, Santa Clara, CA, USA). The inlet temperature was set to 275 °C. An Rtx-5ms (Restek,
Bellefonte, PA, USA) capillary column with a length of 30 m, a diameter of 0.25 mm
and a film thickness of 0.25 μm was used as the primary column. The secondary column was a BPX-50
(SGE, Ringwood, Victoria, Australia) capillary column with a length of 2 m, a diameter of
0.1 mm and a film thickness of 0.1 μm. The temperature program of the primary oven was set to
the following conditions: 70 °C for 2 min, 4 °C/min to 180 °C, 2 °C/min to 230 °C, 4 °C/min
to 325 °C, hold for 3 min. This program resulted in a total runtime of about 70 min for each
sample. The secondary oven was programmed with an offset of 15 °C to the primary oven
temperature. The thermal modulator was set to an offset of 30 °C relative to the primary oven and to a
modulation time of 5 seconds with a hot pulse time of 0.4 seconds. The mass spectrometer ion
source temperature was set to 200 °C and the ionization was performed at -70 eV. The detector
voltage was set to 1600 V and the stored mass range was 50-750 m/z with an acquisition rate
of 200 spectra/second.
(a) 2D-TIC plot before filters were applied. Long tailing peaks are visible within the vertical dimension. Additionally, high frequency noise is present in the raw exported data, which is barely visible at this resolution.
(b) 2D-TIC plot after application of a moving median filter with window size 3 for smoothing of high-frequency noise and successive application of a top hat filter with a window size of 301 for baseline removal in order to reduce false positive peak finding results.
Fig. 4. Visualizations of Standard-Mix1-1 before and after signal filtering with the ChromA4D processing pipeline.
(a) 2D-TIC plot of Standard-Mix1-1 after peak finding and integration with seeded region growing based on the cosine mass spectral similarity with a fusion threshold of 0.99. Peak areas were limited to contain at most 100 points.
(b) Differential plot of the two Standard-Mix1 samples after DTW alignment based on vertical TIC slices. Yellow color indicates similar amounts of total ion intensity in both samples. Green shows a surplus in Standard-Mix1-1, while red shows a surplus in Standard-Mix1-2.
Fig. 5. Visualizations of Standard-Mix1-1 after peak finding and of Standard-Mix1-1 and Standard-Mix1-2 after alignment with DTW.
The raw acquired samples in LECO’s proprietary ELU format were exported to NetCDF
format using the LECO ChromaTOF software v.4.22 (LECO, St. Joseph, MI, USA). Initial
attempts to export the full, raw data failed with a crash beyond a NetCDF file size of 4 GB.
Thus, we resampled the data with ChromaTOF to 100 Hz (resampling factor 2) and exported them
with automatic signal smoothing and a baseline offset correction value of 1, which resulted
in file sizes of around 3 GB per sample. The samples presented in this section are named
"Standard-Mix1-1" and "Standard-Mix1-2" and were measured on different days (Nov. 29th,
2008 and Dec. 12th, 2008).
The default ChromA4D pipeline for peak finding was called from within the unzipped
Maltcms directory (issued on a single line of input):
The pipeline first preprocesses the data by applying a median filter followed by a top hat
filter in order to remove high- and low-frequency noise contributions (Figures 4(a) and 4(b)).
ChromA4D then uses a variant of seeded region growing in order to extend peak seeds, which
are found as local maxima of the 2D-TIC. These initial seeds are then extended until the mass
spectral similarity of the seed and the next evaluated candidate drops below a user-defined
threshold, or until the peak area reaches its maximum, pre-defined size (Figure 5(a)). After
peak area integration, the pipeline clusters peaks between samples based on their mass
spectral similarity and retention time behaviour in both dimensions to form peak cliques
(not shown) as multiple peak alignments, which are then exported into CSV format for
further downstream processing. Another possible application shown in Figure 5(b) is the
visualization of pairwise GCxGC-MS alignments using DTW on the vertical 2D-TIC slices,
which can be useful for qualitative comparisons.
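The two preprocessing filters can be sketched on a one-dimensional trace as follows. This is a plain illustration of a moving median and a morphological (white) top hat filter, not the ChromA4D code; the helper names and the toy trace are hypothetical, and the window sizes are much smaller than the values of 3 and 301 given in the caption of Fig. 4.

```python
from statistics import median


def _window(signal, i, size):
    """Values in a window of the given size centred on index i (clipped at the ends)."""
    half = size // 2
    return signal[max(0, i - half): i + half + 1]


def median_filter(signal, size=3):
    """Moving median; suppresses isolated high-frequency spikes."""
    return [median(_window(signal, i, size)) for i in range(len(signal))]


def top_hat(signal, size):
    """White top hat: signal minus its morphological opening.
    The opening (erosion, then dilation) follows the slowly varying baseline."""
    eroded = [min(_window(signal, i, size)) for i in range(len(signal))]
    opened = [max(_window(eroded, i, size)) for i in range(len(signal))]
    return [s - o for s, o in zip(signal, opened)]


# Toy trace: baseline offset of 10, a one-point spike and a broader peak.
raw = [10, 10, 30, 10, 10, 10, 12, 18, 18, 12, 10, 10, 10]
smoothed = median_filter(raw, size=3)
filtered = top_hat(smoothed, size=5)
```

The one-point spike is removed by the median filter, while the broader peak survives and is placed on a zero baseline by the top hat filter. The top hat window has to be wider than the widest peak of interest, otherwise the peak itself would be treated as baseline.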
7. Acknowledgements
We would like to thank Manuela Meyer at the Center for Biotechnology (CeBiTec), Bielefeld
University, Germany, for the preparation of the GC-MS samples used in this work.
We furthermore thank Denise Schöfbeck and Rainer Schumacher, Center for Analytical
Chemistry, IFA Tulln, University of Natural Resources and Life Sciences, Vienna, Austria,
for measuring those samples and for kindly providing us with the datasets. Furthermore, we
would like to express our gratitude to Anja Döbbe and Olaf Kruse, Algae BioTech Group at the
CeBiTec, for providing the GCxGC-MS samples and datasets. We finally thank our students
Mathias Wilhelm for his work on ChromA4D and Kai-Bernd Stadermann for his work on the
RMI-based remote execution framework.
8. References
Åberg, K., Alm, E. & Torgrip, R. (2009). The correspondence problem for metabonomics
datasets, Analytical and Bioanalytical Chemistry 394(1): 151–162.
Ahmad, I., Suits, F., Hoekman, B., Swertz, M. A., Byelas, H., Dijkstra, M., Hooft, R., Katsubo,
D., van Breukelen, B., Bischoff, R. & Horvatovich, P. (2011). A high-throughput
processing service for retention time alignment of complex proteomics and
metabolomics LC-MS data, Bioinformatics 27(8): 1176–1178.
Babushok, V. I., Linstrom, P. J., Reed, J. J., Zenkevich, I. G., Brown, R. L., Mallard, W. G. & Stein,
S. E. (2007). Development of a database of gas chromatographic retention properties
of organic compounds., Journal of Chromatography A 1157(1-2): 414–421.
Berk, M., Ebbels, T. & Montana, G. (2011). A statistical framework for biomarker discovery in
metabolomic time course data, Bioinformatics 27(14): 1979–1985.
Bertsch, A., Hildebrandt, A., Hussong, R. & Zerck, A. (2008). OpenMS - An open-source
software framework for mass spectrometry., BMC Bioinformatics 9(1): 163.
Biller, J. E. & Biemann, K. (1974). Reconstructed Mass Spectra, A Novel Approach for
the Utilization of Gas Chromatograph—Mass Spectrometer Data, Analytical Letters
7(7): 515–528.
Callaghan, S., De Souza, D., Tull, D., Roessner, U., Bacic, A., McConville, M. & Likić, V. (2010).
Application and comparative study of PyMS Python toolkit for processing of gas
chromatography-mass spectrometry (GC-MS) data, 2nd Australasian Symposium on
Metabolomics, Melbourne 2010.
Carroll, A. J., Badger, M. R. & Harvey Millar, A. (2010). The MetabolomeExpress Project:
enabling web-based processing, analysis and transparent dissemination of GC/MS
metabolomics datasets, BMC Bioinformatics 11(1): 376.
Castillo, S., Gopalacharyulu, P., Yetukuri, L. & Orešič, M. (2011). Algorithms and tools for the
preprocessing of LC–MS metabolomics data, Chemometrics and Intelligent Laboratory
Systems 108(1): 23–32.
Castillo, S., Mattila, I., Miettinen, J., Orešič, M. & Hyötyläinen, T. (2011). Data Analysis Tool
for Comprehensive Two-Dimensional Gas Chromatography/Time-of-Flight Mass
Spectrometry, Analytical Chemistry 83(8): 3058–3067.
Chae, M., Reis, R. & Thaden, J. J. (2008). An iterative block-shifting approach to retention
time alignment that preserves the shape and area of gas chromatography-mass
spectrometry peaks, BMC Bioinformatics 9(Suppl 9): S15.
Christin, C., Hoefsloot, H. C. J., Smilde, A. K., Suits, F., Bischoff, R. & Horvatovich, P. L.
(2010). Time Alignment Algorithms Based on Selected Mass Traces for Complex
LC-MS Data, Journal of Proteome Research 9(3): 1483–1495.
Clifford, D., Stone, G., Montoliu, I., Rezzi, S., Martin, F.-P., Guy, P., Bruce, S. & Kochhar,
S. (2009). Alignment Using Variable Penalty Dynamic Time Warping, Analytical
Chemistry 81(3): 1000–1007.
Deutsch, E. (2008). mzML: a single, unifying data format for mass spectrometer output.,
Proteomics 8(14): 2776–2777.
Deutsch, E. W., Mendoza, L., Shteynberg, D., Farrah, T., Lam, H., Tasman, N., Sun, Z., Nilsson,
E., Pratt, B., Prazen, B., Eng, J. K., Martin, D. B., Nesvizhskii, A. I. & Aebersold, R.
(2010). A guided tour of the Trans-Proteomic Pipeline, Proteomics 10(6): 1150–1159.
Doebbe, A., Keck, M., Russa, M. L., Mussgnug, J. H., Hankamer, B., Tekce, E., Niehaus, K.
& Kruse, O. (2010). The Interplay of Proton, Electron, and Metabolite Supply for
Photosynthetic H2 Production in Chlamydomonas reinhardtii, Journal of Biological
Chemistry 285(39): 30247–30260.
Du, P., Kibbe, W. A. & Lin, S. M. (2006). Improved peak detection in mass spectrum by
incorporating continuous wavelet transform-based pattern matching, Bioinformatics
22(17): 2059–2065.
Foster, I. T. (2005). Globus Toolkit Version 4: Software for Service-Oriented Systems., in H. Jin,
D. A. Reed & W. Jiang (eds), IFIP International Conference on Network and Parallel
Computing, Springer, pp. 2–13.
Fredriksson, M. J., Petersson, P., Axelsson, B.-O. & Bylund, D. (2009). An automatic peak
finding method for LC-MS data using Gaussian second derivative filtering., Journal
of Separation Science 32(22): 3906–3918.
GNU R (2011).
URL: http://www.r-project.org/
Hoffmann, N. & Stoye, J. (2009). ChromA: signal-based retention time alignment for
chromatography-mass spectrometry data, Bioinformatics 25(16): 2080–2081.
Hull, D., Wolstencroft, K., Stevens, R., Goble, C., Pocock, M. R., Li, P. & Oinn, T. (2006).
Taverna: a tool for building and running workflows of services, Nucleic Acids Research
34(suppl 2): W729–W732.
Hummel, J., Selbig, J., Walther, D. & Kopka, J. (2007). The Golm Metabolome Database:
a database for GC-MS based metabolite profiling, in J. Nielsen & M. Jewett (eds),
Metabolomics, Springer Berlin / Heidelberg, pp. 75–95.
HUPO PSI (2011).
URL: http://www.psidev.info/
Isaac, A., Lee, L., Keen, W., Erwin, T., Wang, Q., De Souza, D., Roessner, U., Pyke, J., Kotagiri,
R., Wettenhall, R., McConville, M., Bacic, A. & Likić, V. (2009). PyMS: A Python
toolkit for processing of gas chromatography-mass spectrometry data, Bioinformatics
Australia Conference, Melbourne 2009.
JAVA (2011).
URL: http://www.java.com/en/
Kankainen, M., Gopalacharyulu, P., Holm, L. & Orešič, M. (2011). MPEA–metabolite pathway
enrichment analysis, Bioinformatics 27(13): 1878–1879.
Kastenmüller, G., Römisch-Margl, W., Wägele, B., Altmaier, E. & Suhre, K. (2011).
metaP-Server: A Web-Based Metabolomics Data Analysis Tool, Journal of Biomedicine
and Biotechnology 2011: 1–8.
Kessner, D., Chambers, M., Burke, R., Agus, D. & Mallick, P. (2008). ProteoWizard:
open source software for rapid proteomics tools development., Bioinformatics
24(21): 2534–2536.
Kim, S., Fang, A., Wang, B., Jeong, J. & Zhang, X. (2011). An Optimal Peak Alignment For
Comprehensive Two-Dimensional Gas Chromatography Mass Spectrometry Using
Mixture Similarity Measure, Bioinformatics 27(12): 1660–1666.
Krebs, M. D., Tingley, R. D., Zeskind, J. E., Holmboe, M. E., Kang, J.-M. & Davis, C. E.
(2006). Alignment of gas chromatography-mass spectrometry data by landmark
selection from complex chemical mixtures, Chemometrics and Intelligent Laboratory
Systems 81(1): 74–81.
Lange, E., Gropl, C., Schulz-Trieglaff, O., Huber, C. & Reinert, K. (2007). A geometric approach
for the alignment of liquid chromatography mass spectrometry data, Bioinformatics
23(13): i273–i281.
Linke, B., Giegerich, R. & Goesmann, A. (2011). Conveyor: a workflow engine for
bioinformatic analyses, Bioinformatics 27(7): 903–911.
Lommen, A. (2009). MetAlign: Interface-Driven, Versatile Metabolomics Tool for
Hyphenated Full-Scan Mass Spectrometry Data Preprocessing, Analytical Chemistry
81(8): 3079–3086.
Maltcms (2011).
URL: http://maltcms.sourceforge.net
Martens, L., Chambers, M., Sturm, M., Kessner, D., Levander, F., Shofstahl, J., Tang,
W. H., Rompp, A., Neumann, S., Pizarro, A. D., Montecchi-Palazzi, L., Tasman, N.,
Coleman, M., Reisinger, F., Souda, P., Hermjakob, H., Binz, P. A. & Deutsch, E. W.
(2010). mzML–a Community Standard for Mass Spectrometry Data, Molecular and
Cellular Proteomics 10(1): R110.000133–R110.000133.
Matthews, L. (2000). ASTM Protocols for Analytical Data Interchange, 5(5): 60–61.
Miura, D., Tsuji, Y., Takahashi, K., Wariishi, H. & Saito, K. (2010). A strategy
for the determination of the elemental composition by fourier transform ion
cyclotron resonance mass spectrometry based on isotopic peak ratios., Technical
Report 13, Innovation Center for Medical Redox Navigation, Kyushu University, 3-1-1
Maidashi, Higashi-ku, Fukuoka 12-8582, Japan.
Neumann, S. & Böcker, S. (2010). Computational mass spectrometry for metabolomics:
Identification of metabolites and small molecules, Analytical and Bioanalytical
Chemistry 398(7-8): 2779–2788.
Neuweger, H., Albaum, S. P., Niehaus, K., Stoye, J. & Goesmann, A. (2008). MeltDB: a
software platform for the analysis and integration of metabolomics experiment data,
Bioinformatics 24(23): 2726–2732.
Neuweger, H., Persicke, M., Albaum, S. P., Bekel, T., Dondrup, M., Hüser, A. T., Winnebald,
J., Schneider, J., Kalinowski, J. & Goesmann, A. (2009). Visualizing post genomics
data-sets on customized pathway maps by ProMeTra-aeration-dependent gene
expression and metabolism of Corynebacterium glutamicum as an example., BMC
Systems Biology 3: 82.
Oh, C., Huang, X., Regnier, F. E., Buck, C. & Zhang, X. (2008). Comprehensive
two-dimensional gas chromatography/time-of-flight mass spectrometry peak
sorting algorithm, Journal of Chromatography A 1179(2): 205–215.
Oliver, S. G., Paton, N. W. & Taylor, C. F. (2004). A common open representation of mass
spectrometry data and its application to proteomics research, Nature Biotechnology
22(11): 1459–1466. 10.1038/nbt1031.
Orchard, S., Hermjakob, H., Taylor, C. F., Potthast, F., Jones, P., Zhu, W., Julian, R. K. &
Apweiler, R. (2005). Second proteomics standards initiative spring workshop., Expert
review of proteomics, EMBL Outstation - European Bioinformatics Institute, Wellcome
Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. pp. 287–289.
Pierce, K. M., Hoggard, J. C., Hope, J. L., Rainey, P. M., Hoofnagle, A. N., Jack, R. M.,
Wright, B. W. & Synovec, R. E. (2006). Fisher Ratio Method Applied to Third-Order
Separation Data To Identify Significant Chemical Components of Metabolite Extracts,
Analytical Chemistry 78(14): 5068–5075.
Pierce, K. M., Wood, L. F., Wright, B. W. & Synovec, R. E. (2005). A comprehensive
two-dimensional retention time alignment algorithm to enhance chemometric
analysis of comprehensive two-dimensional separation data, Analytical Chemistry
77(23): 7735–7743.
Pluskal, T., Castillo, S., Villar-Briones, A. & Orešič, M. (2010). MZmine 2: Modular
framework for processing, visualizing, and analyzing mass spectrometry-based
molecular profile data, BMC Bioinformatics 11(1): 395.
Prince, J. & Marcotte, E. (2006). Chromatographic alignment of ESI-LC-MS proteomics
data sets by ordered bijective interpolated warping, Analytical Chemistry
78(17): 6140–6152.
Python (2008).
URL: http://www.python.org/download/releases/2.5.2/
Rew, R. & Davis, G. (1990). NetCDF: an interface for scientific data access, Computer Graphics
and Applications, IEEE 10(4): 76–82.
Robinson, M. D., De Souza, D. P., Saunders, E. C., Mcconville, M. J., Speed, T. P. & Likić,
V. A. (2007). A dynamic programming approach for the alignment of signal peaks in
1. Introduction
Revolutionary improvements in high-throughput DNA sequencing technologies have made
it possible to measure genes, mRNA, proteins and metabolites, as well as their interactions, at a
global level. In the past decades, significant efforts have been made to improve the analytical technologies
for measuring mRNA, proteins and metabolites. These efforts
have led to the emergence of several new ‘omics’ research fields: transcriptomics,
proteomics, metabolomics, interactomics and so on (Singh & Nagaraj, 2006; Fiehn 2007; Lin
& Qian, 2007; Kandpal et al., 2009; Ishii & Tomita, 2009). Among them, metabolomics is an
approach to obtain a comprehensive evaluation of metabolites in cells. Compared with
transcriptomics and proteomics approaches, metabolomics can achieve large-scale
quantitative and qualitative measurements of cellular metabolites, which can thus provide
high-resolution biochemical and functional information on an organism.
Due to the chemical complexity of cellular metabolites, it is generally accepted that no single
analytical technique can provide a comprehensive view of all metabolites, so
multiple technologies are generally employed (Dunn & Ellis, 2005; Villas-Boas Silas et al.,
2005; Hollywood et al., 2006; Dettmer et al., 2007; Lenz & Wilson, 2007; Seger & Sturm,
2007). The selection of the most suitable technology is typically a compromise between
speed, chemical selectivity, and instrumental sensitivity. Tools such as nuclear magnetic
resonance spectroscopy (NMR) are rapid, highly selective and non-destructive, but have
relatively low sensitivity. Other tools such as capillary electrophoresis (CE) coupled to
laser-induced fluorescence detection are highly sensitive, but have limited chemical
selectivity (Ramautar et al., 2006). So far, mass spectrometry (MS) measurement following
chromatographic separation offers the best combination of sensitivity and selectivity (Dunn
& Ellis, 2005; Bedair & Sumner, 2008). Mass-selective detection provides highly specific
chemical information, including molecular mass and/or characteristic fragment ions, that is
directly related to the chemical structure of molecules. This information can be utilized for
compound identification through spectral matching with data compiled in libraries for
authentic compounds or used for de novo structural elucidation. Further, chemically selective
MS information can be obtained from extremely small quantities of metabolites, at the pmol
and fmol level for many primary and secondary metabolites. Different technologies, either
individual or integrated, can be employed for different study aims, depending on the required metabolite
identification, detection speed, throughput and sensitivity. In this chapter, we will first
review current MS technologies that have been incorporated into many metabolomics
research programs as well as some of the emerging MS technologies that hold additional
promise for the future advancement of metabolomics.
In contrast to classical biochemical approaches that typically focus on a single metabolite,
a single metabolic reaction or their kinetic properties, metabolomics involves the collection of
large amounts of quantitative data on a broad series of metabolites in an attempt to gain an
overall understanding of metabolism and/or metabolic dynamics associated with
conditions of interest, such as disease or drug exposure. Generally speaking, metabolomics
data share a great deal of similarity with transcriptomics data: both types of data matrices
are large and feature rich, and both face the challenge of a limited sample size combined with
a high-dimensional feature space. Thus, in many cases the robust data processing algorithms
originally developed for transcriptomic analysis have been adapted directly for
metabolomic analysis. However, the challenges with metabolomic data can be unique, and
may require new methodologies supported by a detailed knowledge of cheminformatics,
bioinformatics, optimization, dynamic systems theory and statistics. In recent years, many
computational methods have been developed specifically for metabolomic data, ranging
from metabolic network analysis to feature selection and data mining. In this
chapter, we will introduce each of these methods and their relevant applications, and will
also discuss the associated computational challenges.
(Mas et al., 2007). Negative ESI LCQ ion-trap MS was reported as an effective method for the
characterization of plant extracts with well-defined clusters in comparison to positive-ion
ESI and 1H-NMR profiling (Mattoli et al., 2006).
Gas chromatography-mass spectrometry (GC-MS) has been a very useful technology for
volatile and thermally stable polar and nonpolar metabolites (Tanaka et al., 1980).
Metabolite identification or confirmation is performed by retention time or index
comparisons with pure compounds and mass spectral interpretation or comparison using
retention index/mass spectral library databases (Wagner et al., 2003). Metabolites can be
classified into two classes: volatile metabolites not requiring chemical derivatisation
(Yassaa et al., 2001; Mallouchos et al., 2002; Deng et al., 2004) and non-volatile metabolites
requiring chemical derivatisation (Roessner et al., 2000). GC-MS based metabolic profiling
has been used to compare four Arabidopsis genotypes, showing that each genotype
exhibited a different metabolite profile (Birkemeyer et al., 2003), and to compare
transgenic tomato plants overexpressing hexokinase (Roessner-Tunali et al., 2003). Silent
phenotypes of potatoes have been distinguished from their parental background by
employing metabolic profiling (Weckwerth et al., 2004). The same approach has recently
been employed in microbial metabolomics to study the effect of different growth
conditions on Corynebacterium glutamicum (Strelkov et al., 2004).
The application of liquid chromatography coupled to mass spectrometry (LC-MS) in
metabolomics has been growing over the past few years (Wittmann et al., 2004). As a
universal separation technique that can be tailored for the targeted analysis of specific
metabolite groups or utilized in a broader non-targeted manner, LC offers the additional
benefit of analyte recovery by fraction collection and/or concentration, which is
difficult with GC separation. In addition, LC-MS operates at lower analysis temperatures than
GC-MS, which enables the analysis of heat-labile metabolites. LC-MS analysis does not
involve sample derivatization, which simplifies sample preparation and improves the
identification of metabolites. The major disadvantage of LC-MS relative to GC-MS is the lack
of transferable LC-MS libraries for metabolite identifications, although some efforts have
been initiated to construct in-house LC-MS or LC-MS-MS libraries for automated metabolite
identifications (Noteborn et al., 2000). Two-dimensional LC has also been utilized to increase
the peak separation capacity (Aharoni et al., 2002). Recent LC-MS metabolite-profiling
examples include the identification of flavonoids and isoflavonoids in Medicago truncatula
(Daykin et al., 2002), the revelation of novel pathways by studying the differential and
elicitor-specific responses in phenylpropanoid and isoflavonoid biosynthesis in Medicago
truncatula cell cultures (Farag et al., 2008), and the investigation of small polar-metabolite
responses to salt stress in Arabidopsis thaliana (Lindon et al., 2000). LC-MS has also been used
in the non-targeted analysis of endogenous metabolites in an unbiased manner (Rashed et
al., 1997; De Vos et al., 2007).
Capillary electrophoresis mass spectrometry (CE-MS) is a powerful separation technique for
charged metabolites (Ramautar et al., 2006; Monton et al., 2007). CE has superior separation
efficiencies compared to LC due to the plug-flow profile generated by the electroosmotic
flow (EOF) as compared to the parabolic flow in LC. Capillary zone electrophoresis (CZE)
has been the major CE mode used for CE-MS analysis of metabolites, due to the simplicity of
the running buffer. Simultaneous separation of charged and neutral metabolites can be
achieved using other CE modes (e.g., micellar electrokinetic chromatography (MEKC) or
and with metabolite amounts as low as the amol to fmol range (Schmid et al., 2010), 10^6 times lower than in
typical population-based metabolomics. While amplification of DNA/RNA and highly
sensitive fluorescence measurements can be employed in single-cell genomics,
transcriptomics and proteomics, no similar technique is available for single-cell
metabolomics. ii) Sample processing for a single cell is extremely challenging. Even though
detection limits for metabolites using MS can be as low as the fmol to amol range
(Amantonico et al., 2008), transferring a cell or its contents to the mass
spectrometer, conserving the original metabolome, and separating metabolites from cell
debris, proteins and salts remain critical steps.
In recent years, several approaches have been established for MS-based single-cell
metabolomics (Figure 1) (see review by Heinemann & Zenobi, 2011): i) Sampling of the cell
contents with a micropipette, followed by injection into a mass spectrometer using a
nano-electrospray ionization (nano-ESI) source (Masujima 2009). This approach is probably
only suitable for very large cells and can only measure a few cells per hour. ii) Sample
preparation on a microfluidic chip, followed by deposition on a sample plate for
(matrix-assisted) laser desorption/ionization (MALDI or LDI) mass spectrometry (Lu et al., 2006;
Mellors et al., 2008; Amantonico et al., 2008, 2010; Holmes et al., 2009). Once a complete
setup is realized, it has the potential to generate high-throughput data in an automated
way. iii) Cell arraying: single cells are deposited on a sample plate for LDI or MALDI
that is covered by a solvent-repelling layer; application of a MALDI matrix in an organic solvent
then lyses the cells and extracts the compounds of interest for analysis by MALDI. This
approach is a true high-throughput operation because the sample arraying can be
automated, and thus the speed of the MS instrument is the only limiting factor (1000s of
cells/hour) (Urban et al., 2010). iv) Imaging mass spectrometry: many modern mass
spectrometers have imaging capabilities, with a spatial resolution of typically ~50 µm
(MALDI or LDI) and ~1 µm (secondary ion mass spectrometry, SIMS), at relatively fast
acquisition speed (Fletcher, 2009). With SIMS, the distribution of ions such as Na+, K+ and
Ca2+, as well as of cationized cholesterol and lipids present at cell surfaces, can be imaged
(Fletcher 2009). However, so far the data generated through this approach have been less
quantitative (Heinemann & Zenobi, 2011).
Fig. 1. Schematics of the four MS-based approaches for single cell metabolomics.
(Heinemann & Zenobi, 2011)
Computational Methods to Interpret and Integrate Metabolomic Data 105
Only very few metabolites can be analyzed directly in single cells by autofluorescence
(Amantonico et al., 2010); however, by incorporating fluorescent tags or probes, researchers
have been able to detect more metabolites. For example, Fehr et al. (2002) developed a
protein based nanosensor for detection of maltose uptake by living yeast cells. In another
study, a genetically encoded fluorescent sensor was expressed in living cells for detecting
adenylate nucleotides (Berg et al., 2009). However, it is still debatable whether the formation
of these foreign complexes between sensor and metabolites in cells causes damage and
alters the physiological status of the cells.
The high sensitivity of electrochemical detectors to electroactive species makes them suitable
for targeted studies of metabolites in single-cell analysis. A range of microscale
electrochemical methods have been introduced to monitor various physiological processes
(Huang & Kennedy, 1995). For example, release of metabolites, such as catecholamines and
oxygen, can be readily measured electrochemically (Cannon et al., 2000). Specific methods
targeting particular metabolic pathways in single cells have been in use for a long time,
including autoradiography of cells preincubated with radioisotopically labeled compounds
(Fliermans & Schmidt, 1975).
Other methods used in single cells include single-cell spectroscopy in conjunction with
image analysis for glycogen metabolism in yeast cells (Cahill et al., 2000), an enzyme-catalyzed
luminescence method for dopamine release from a mammalian nerve cell (Shinohara &
Wang, 2007), synchrotron Fourier transform infrared spectromicroscopy for ethanol
formation in single living cells of unicellular algae (Goff et al., 2009), Raman spectroscopy
(Schuster et al., 2000; Buckmaster et al., 2009; Hermelink et al., 2009) for detecting nucleic
acid bases and amino acids in single cells, and nuclear magnetic resonance (NMR) for structural
characterization of organic compounds, including metabolites (Beckonert et al., 2007; Motta
et al., 2010). A brief summary and comparison of the various metabolomics techniques discussed
above is given in Table 1.
Table 1 (fragment). DIMS | high-throughput, simple sample preparation | nondiscrimination of isomeric compounds, susceptible to ionization
Biomarker discovery: Like other “omics” studies, the primary objective of many metabolomics
studies is to find biomarkers that discriminate between matched “case” and “control”
samples, i.e., metabolites whose levels are altered under different physiological
conditions. In pharmaceutical research, metabolomics studies have been used to discover
biomarkers for different diseases, safety markers, and drug mechanisms. However,
given the large number of metabolites studied simultaneously and the usually small sample
size, it is very common to find metabolites that appear persuasive but are in fact spurious. Thus,
it is of crucial importance to control the rate of false positives (Broadhurst & Kell, 2006). To
tackle this problem, many statistical methods have been developed under the heading of
large-scale hypothesis testing (Benjamini & Hochberg, 1995; Storey 2002; Efron 2003, 2004a, 2004b,
2007a, 2007b, 2008; Storey & Tibshirani, 2003; Reiner et al., 2003; Xie et al., 2005). In classical
hypothesis testing, the fundamental problem is to control the type I error, the probability that a
finding is declared significant when it actually arose by chance. The chance of making at least
one type I error increases with the number of hypotheses tested simultaneously. The well-known
and widely used strategy to control the overall type I error rate is the Bonferroni correction, in
which the critical value for each individual test is obtained by dividing the
significance level by the number of hypotheses considered. For example, in metabolomics, if
the search for discriminating biomarkers is performed using 500 metabolites and the
acceptable chance of rejecting at least one true null hypothesis is 0.05, then the Bonferroni-corrected
critical value for rejecting an individual null hypothesis for a metabolite is 0.05/500
= 0.0001. The Bonferroni correction is conservative in the sense that it limits type I errors at
the cost of increasing the potential for type II errors (false negatives) (Broadhurst & Kell,
2006).
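The Bonferroni rule described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example p-values are hypothetical and do not come from any of the cited studies.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test critical value that keeps the family-wise error rate <= alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def bonferroni_select(p_values, alpha=0.05):
    """Indices of the hypotheses rejected under the Bonferroni correction."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p <= cutoff]


# 500 metabolites tested at an overall significance level of 0.05, as in the text
cutoff_500 = bonferroni_threshold(0.05, 500)

# Hypothetical p-values for three metabolites (per-test cutoff is 0.05 / 3)
selected = bonferroni_select([0.001, 0.02, 0.2], alpha=0.05)
```

With three tests, only the first metabolite survives the correction, although the second would have passed an uncorrected threshold of 0.05.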
A widely accepted error measure in the microarray literature for large-scale hypothesis testing
is the false discovery rate (FDR), the expected proportion of false positives among all discoveries.
The FDR-controlling procedure proposed by Benjamini and Hochberg (1995)
has been recognized as a breakthrough and has been widely applied or adapted by statistical
researchers (Efron, 2004a). Most of the literature assumes that the theoretical null distribution
is known in advance. However, Efron argued that in large-scale hypothesis testing, as in
‘omics’ studies, the theoretical null often fails for reasons such as correlations among proteins or
genes, unknown confounding factors, or systematic bias (Efron, 2004b, 2007a, 2008). Thus, it
is more appropriate to estimate the distribution of the null statistics from the data in order to
make more meaningful discoveries. Translated to the setting of a metabolomics study,
Efron’s concept aims to find a subset of metabolites that behave very differently from the
majority of the metabolites. Efron’s idea has received significant attention from the
research field. Applications to metabolite biomarker discovery have not yet been reported but
are certainly expected.
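For readers who want to see how FDR control works mechanically, the following is a minimal Python sketch of the Benjamini-Hochberg step-up rule: sort the p-values, find the largest rank k whose p-value is at most (k/m)q, and reject the k smallest. The function name and the p-values are illustrative assumptions; Efron's empirical-null approach is not covered by this sketch.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject the k_max smallest p-values, even if some of them missed
    # their own rank-specific threshold (this is the "step-up" property)
    return sorted(order[:k_max])


# Hypothetical p-values for five metabolites
rejected = benjamini_hochberg([0.035, 0.001, 0.038, 0.009, 0.9], q=0.05)
```

In this example four of the five hypotheses are rejected, whereas a Bonferroni cutoff of 0.05/5 = 0.01 would keep only the two smallest p-values; this is the sense in which FDR control is less conservative.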
Metabolomics-based biomarker discovery has been reported. In one example for invasive
ovarian carcinomas and ovarian borderline tumors, a differential analysis of 291 detected
metabolites in sixty-six invasive ovarian carcinomas and nine borderline tumors of the
ovary revealed 51 metabolites that differed significantly between borderline tumors
and carcinomas with the FDR controlled at 7.8% (Denkert et al., 2006). For Onchocerca volvulus,
LC-MS analysis of an African sample set of 73 serum and plasma samples revealed
a set of 14 biomarkers that showed excellent discrimination between Onchocerca volvulus-positive
and -negative individuals (Denery et al., 2010). Controlling the FDR at 54% using
Storey’s q-value approach (Storey, 2002) resulted in 194 features selected from a total of 2350
mass features. Among the 194 features, the authors selected the top 14 features for
investigation.
Data clustering and visualization: Clustering or unsupervised modeling is useful for class
discovery and provides information on data similarity: metabolomic samples clustered or
grouped together can be objectively considered to be similar.
Principal components analysis (PCA) is probably the most widely used unsupervised
approach to data mining and visualization. PCA is a multivariate technique that transforms
the data into a coordinate system in which each new axis (called a principal
component, PC) is a linear combination of the original variables. PCs are orthogonal, so
each dimension relates, in a mathematical sense, to different data characteristics and sources
of variability (Enot et al., 2008). As a dimension reduction tool, PCA is very useful for
metabolomics data visualization and subsequent clustering. However, PCA may not work
well for metabolomics data in which the differences between groups are minor and obscured
by other covariates.
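To make the idea of a principal component as a linear combination of the original variables concrete, the sketch below computes the first PC of a small data matrix by power iteration on the sample covariance matrix. It uses only the Python standard library; the function name and the toy data are assumptions made for illustration, and a production analysis would normally rely on an established numerical library.

```python
import math


def first_principal_component(samples, n_iter=200):
    """Leading PC (unit vector) and sample scores for a list of equal-length rows."""
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # provided the starting vector is not orthogonal to it
    vec = [1.0] * d
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Scores: projection of each centered sample onto the PC
    scores = [sum(r[j] * vec[j] for j in range(d)) for r in centered]
    return vec, scores


# Toy data: two "metabolites" whose levels vary together along the line y = 2x
pc1, scores = first_principal_component([[0, 0], [1, 2], [2, 4], [3, 6]])
```

Here the first PC points along (1, 2)/sqrt(5), so a single score per sample captures all of the variation in the two original variables.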
The self-organizing map (SOM) is another visualization tool for high-dimensional data
(Kohonen, 1998, 2001). The SOM describes a mapping from a higher-dimensional input
space to a lower-dimensional map space. To place a vector from data space
onto the map, the node whose weight vector is closest to that vector is
found. Once the closest node is located, it is assigned the values of the vector taken from
the data space. The SOM places similar input data in adjacent nodes and therefore forms
a semantic map in which similar samples are mapped close together and dissimilar samples
far apart. One disadvantage of the SOM is that the final map depends on the order in which the
training data are presented. The batch-learning version of the algorithm (Kohonen,
2001) overcomes this problem and yields reproducible maps. SOM has been applied to
metabolic profiling for clustering blood plasma (Kaartinen et al., 1998) and NMR spectra of
breast cancer tissues (Beckonert et al., 2003b). More recently, Kouskoumvekaki et al. (2008)
applied SOM to identify similarities among the metabolic profiles of different filamentous
fungi. Meinicke et al. (2008) proposed a one-dimensional SOM for metabolite-based
clustering and visualization of marker candidates. In a case study on the wound response of
Arabidopsis thaliana, they showed how the clustering and visualization capabilities of SOM
can be utilized to identify relevant groups of biomarkers.
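The placement step of a SOM can be sketched as follows. This is a simplified online update for a one-dimensional map, written under several assumptions: the nodes lie on a line, the neighbourhood is a simple rectangular window of width `radius`, and the names `best_matching_unit` and `som_update` are hypothetical. Setting `learning_rate=1.0` and `radius=0` reproduces the plain assignment described in the text.

```python
def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to sample x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i]))


def som_update(weights, x, learning_rate=0.5, radius=1):
    """One online update of a one-dimensional SOM; modifies `weights` in place."""
    bmu = best_matching_unit(weights, x)
    for i, w in enumerate(weights):
        if abs(i - bmu) <= radius:  # node lies in the neighbourhood of the BMU
            weights[i] = [wi + learning_rate * (xi - wi) for wi, xi in zip(w, x)]
    return bmu


# Three map nodes and one hypothetical two-variable sample
weights = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
bmu = som_update(weights, [0.9, 1.2], learning_rate=0.5, radius=0)
```

Because neighbouring nodes are pulled towards the same samples when `radius` is larger than zero, similar inputs end up in adjacent nodes, which is what gives the map its topology-preserving character.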
Hierarchical cluster analysis (HCA), a popular unsupervised learning method, clusters
the data to form a tree diagram, or dendrogram, which shows the relationships between
samples (Ebbels, 2007). The algorithm begins by computing the distances between all pairs
of samples. Initially each cluster consists of a single sample. The algorithm then proceeds
iteratively, merging the two closest clusters at each step, until all samples are members of a
single cluster. The final structure of the
resulting clusters depends on the choice of distance function, or “linkage”, between two
clusters as well as on a similarity cut-off. The most popular choices of linkage are centroid,
average, single (nearest neighbor) and complete (farthest neighbor) linkage. Centroid
linkage defines the inter-cluster distance as the distance between the cluster centroids. To
determine cluster membership, one must decide on a similarity cut-off, which breaks the
dendrogram into a number of separate clusters. As an example, Beckonert et al. (2003) used
the HCA method to explore a set of toxicology studies. HCA allowed interpretation of the
data in terms of the magnitude and site of toxicological effect, and helped to explain
misclassifications by other methods.
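A compact sketch of agglomerative clustering is given below. As a simplification, the dendrogram is cut by asking for a fixed number of clusters rather than by a similarity cut-off, and only single, complete and average linkage are implemented; the function name and the sample coordinates are hypothetical.

```python
import math


def hierarchical_clusters(points, n_clusters, linkage="single"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start with singletons

    def cluster_distance(a, b):
        dists = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":      # nearest neighbor
            return min(dists)
        if linkage == "complete":    # farthest neighbor
            return max(dists)
        return sum(dists) / len(dists)  # any other value: average linkage

    while len(clusters) > n_clusters:
        pairs = [(cluster_distance(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Five hypothetical samples described by two variables each
samples = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
groups = hierarchical_clusters(samples, n_clusters=3, linkage="single")
```

Recomputing all pairwise distances at every step keeps the code short but is slow for large data sets; dedicated implementations update a distance matrix instead.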
k-means clustering is a method of cluster analysis that aims to partition n observations
into k clusters in which each observation belongs to the cluster with the nearest mean. In
k-means clustering, the Euclidean distance is used as the distance metric and the variance is used as
a measure of cluster scatter. The number of clusters k is an input parameter. When
performing k-means, it is important to run diagnostic checks to determine the number of
clusters in the data set. Thus, k-means is often used in conjunction with other clustering and
visualization methods (Ebbels, 2007).
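The alternation between assigning samples to the nearest mean and recomputing the means (often called Lloyd's algorithm) can be sketched as follows. The initial centroids are passed in explicitly so that the example is deterministic; the function name and the toy data are assumptions.

```python
import math


def k_means(points, centroids, n_iter=100):
    """Lloyd's algorithm; `centroids` holds the k initial cluster means."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest mean
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each mean moves to the centroid of its members
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the mean of an empty cluster
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids


# Four hypothetical samples forming two well-separated groups
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, means = k_means(data, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The result depends on k and on the starting centroids, which is one reason why the diagnostic checks mentioned above matter in practice.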
Fuzzy k-means (also called fuzzy c-means) is an extension of the k-means clustering technique
based on fuzzy logic. While k-means discovers hard clusters (a point belongs to only one
cluster), fuzzy k-means is a more statistically formalized method that discovers soft clusters,
in which a point can belong to more than one cluster with a certain degree of membership.
The membership of each sample in each cluster is calculated and represented by a
value between 0 and 1, rather than by exactly 0 or 1 as in hard clustering. Cuperlovic-Culf et al.
(2009) presented the application of the fuzzy k-means clustering method to the classification of
samples based on metabolomics 1D 1H-NMR fingerprints. The sample classification was
performed on NMR spectra of cancer cell line extracts and of urine samples of type 2
diabetes patients and animal models. The fuzzy k-means clustering method allowed more
accurate sample classification in both datasets than the other tested methods, including
PCA, HCA and k-means clustering. Li et al. (2009) applied fuzzy k-means to cluster three
gene types of Escherichia coli on the basis of their metabolic profiles and obtained better
results than with PCA. With optimized parameters, fuzzy k-means was able to
reveal the main phenotype changes and individual characteristics of the three gene types of E. coli,
while PCA failed to model the metabolite data.
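The membership values can be illustrated with the standard fuzzy c-means membership formula, in which the membership of a sample in cluster i is 1 / sum_k (d_i / d_k)^(2/(m-1)), where d_i is the distance to centroid i and m > 1 is the fuzzifier. The sketch below evaluates this formula for fixed centroids only; the function name, the centroids and the choice m = 2 are illustrative assumptions, and the iterative centroid update of the full algorithm is omitted.

```python
import math


def fuzzy_memberships(point, centroids, fuzzifier=2.0):
    """Membership of `point` in each cluster, given fixed cluster centroids."""
    dists = [math.dist(point, c) for c in centroids]
    if 0.0 in dists:
        # The point coincides with one or more centroids: full membership there
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (fuzzifier - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# A sample at distance 1 from the first centroid and 3 from the second
memberships = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

The memberships always sum to one, and a sample midway between two centroids receives 0.5 for each, which a hard k-means assignment cannot express.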
Clustering of metabolomics data can be hampered by noise originating from biological
variation, physical sampling error and analytical error. Bootstrap aggregating (bagging) is a
resampling technique that can deal with noise and improves accuracy. Hageman et al.
(2006) demonstrated the application of bagged clustering to metabolomics data. It was
argued that bagged k-means should be favored over ordinary k-means clustering
when dealing with noisy metabolomics data.
In practice, it is common to combine dimensionality reduction and clustering methods. For
example, first, a sample-based principal component analysis (PCA) is performed to compute
a subset of principal components. Then the metabolite-specific PCA loadings of these
components are used for metabolite-based clustering using k-means or hierarchical methods
(Pohjanen et al., 2006).
Classification and prediction: While the purpose of clustering is to group similar data together,
classification aims at finding a rule that discriminates the classes in an optimal way and at
selecting the subset of features that are most discriminative or predictive. In contrast to
clustering applications, in a classification problem the class labels and the number of classes
are known a priori for a subset of the data (the training samples). Once the rule, or classifier,
has been determined using the training dataset, it can be used to predict the class label (such
as diseased or not) of a test sample.
The k-Nearest Neighbour (kNN) rule for classification may be the simplest of all supervised
classification approaches (Ebbels, 2007). Different from other supervised learning methods,
the training phase of kNN consists of only storing the training samples and the
corresponding class labels. In the classification (prediction) phase, a test sample is classified
by assigning the label which is most frequent among the k nearest training samples. The
method requires only the choice of k, the number of neighbors to be considered when
making the classification. Greater values of k reduce the effect of noise on the classification,
but make boundaries between classes less distinct. A limitation of this classification method
is that the classes with the more frequent examples tend to dominate the prediction of a test
sample. Usually, k is chosen through a cross-validation procedure. kNN has often been used
as a comparator for other methods in the literature (Beckonert et al., 2003; Baumgartner et al.,
2004).
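Because kNN has no real training phase, a working sketch is very short. The version below uses Euclidean distance and a simple majority vote; the function name, the class labels and the toy training set are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# "Training" only stores the samples and their class labels
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["control", "control", "control", "case", "case", "case"]

prediction = knn_predict(train_x, train_y, query=(0.5, 0.5), k=3)
```

For two-class problems an odd k avoids tied votes; with ties, this sketch simply returns the label that was encountered first among the nearest neighbours.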
Partial least squares discriminant analysis (PLS-DA) is a regression extension of PCA that
takes advantage of class label information to maximize the separation between groups of
observations. PLS-DA models the relationship between the class affiliation matrix (Y) and the
feature matrix (X), and is a generalization of multiple linear regression. It determines a
set of latent variables that explain as much as possible of the covariance between the two
matrices. PLS-DA can deal with incomplete datasets and multicollinearity. The
output of PLS-DA is a score matrix, which can be plotted in the same way as in PCA, and a
predictor matrix containing the estimated class affiliations (Ciosek et al., 2005; Trygg &
Lundstedt, 2007). The orthogonal PLS (OPLS) method (Cloarec et al., 2005; Trygg &
Lundstedt, 2007) is a recent modification of the PLS method. The main idea of OPLS is to
separate the systematic variation in X into two parts, one that is linearly related to Y and one
that is orthogonal to Y. The OPLS method provides predictions similar to those of PLS.
However, the interpretation of the models is improved because the structured noise is
modeled separately from the variation common to X and Y. Analogous to PLS, when the Y
matrix is the class affiliation, the corresponding analysis is named OPLS-DA. Cloarec et al.
(2005) illustrated the applicability of the method in combination with statistical total
correlation spectroscopy to 1H NMR spectra of urine from a metabonomic study of a model
of insulin resistance based on the administration of a carbohydrate diet to three different
mouse strains. Tapp and Kemsley (2009) recently discussed similarities and differences
between PLS-DA and OPLS-DA with a focus on the usage of OPLS in the analytical
chemistry literature. They concluded that the two methods are very similar and that neither
outperforms the other, and that the discrepancies reported in the literature must be due to
differences in implementation details or to some otherwise ‘‘unfair’’ comparison between
the methods.
An artificial neural network (ANN) is a widely used non-linear data modeling tool (Bishop,
1995). An ANN is a computational model inspired by the structural and functional
aspects of biological neural networks. It consists of a layered network of nodes with
simple linear or sigmoid activation functions. The most widely used type of ANN is the
multilayer perceptron (MLP), which has at least three layers: an input layer, one
or more hidden layers and an output layer. The most attractive feature of an MLP is its capacity
to approximate any continuous function to arbitrary precision, given a sufficient number of
nodes with sigmoid-type activation functions in the hidden layers. ANNs have been applied to
the classification of tumor cells by different researchers (Maxwell et al., 1998; Ott et al., 2003).
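The forward pass of a three-layer perceptron can be written down directly. The sketch below uses hand-picked, purely hypothetical weights that make the network compute the exclusive-or of two binary inputs, a classic example of a class boundary that no single linear unit can represent; in a real application the weights would be learned from training data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a perceptron with one hidden layer and one sigmoid output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(hidden_weights, hidden_biases)]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical weights: hidden node 1 acts like OR, hidden node 2 like AND,
# and the output node fires when OR is on but AND is off (exclusive-or)
HIDDEN_WEIGHTS = [[20.0, 20.0], [20.0, 20.0]]
HIDDEN_BIASES = [-10.0, -30.0]
OUTPUT_WEIGHTS = [20.0, -20.0]
OUTPUT_BIAS = -10.0


def xor_net(x):
    return mlp_forward(x, HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)
```

Thresholding the output at 0.5 gives class 1 for the inputs (0, 1) and (1, 0) and class 0 for (0, 0) and (1, 1).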
$$\sum_i n_i m_i = M,$$
where $M$ is the observed mass and $m_i$ is the mass of the $i$th atom. This Diophantine equation is
the basis of much of the mass spectrometry software used to obtain elemental compositions.
Usually, it has many integer solutions. Among all the mathematical solutions, we
then seek the integers $n_i$ that are chemically feasible, considering chemical
contextual information such as valence rules, double-bond equivalents or exact mass
(Meija, 2006). Even with very high mass accuracy (<1 ppm), many chemically possible
formulae can be obtained in higher mass regions. To further reduce the number of potential
elemental compositions, it is necessary to use isotope abundance patterns (Kind & Fiehn,
2006).
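The search for integer solutions can be sketched as a brute-force enumeration. The code below is an illustration only: it is restricted to the elements C, H, N and O, uses rounded monoisotopic masses, applies a mass tolerance in ppm, and performs none of the chemical feasibility checks (valence rules, isotope patterns) mentioned above. The function name and the default tolerance are assumptions.

```python
# Monoisotopic masses (in u) of the most abundant isotopes, rounded
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def candidate_formulas(target_mass, tolerance_ppm=5.0, elements=("C", "H", "N", "O")):
    """Enumerate non-negative integer compositions whose mass matches target_mass."""
    tol = target_mass * tolerance_ppm * 1e-6
    results = []

    def search(position, remaining, counts):
        if position == len(elements):
            if abs(remaining) <= tol:
                results.append(dict(zip(elements, counts)))
            return
        mass = MONOISOTOPIC_MASS[elements[position]]
        max_count = int((remaining + tol) // mass)  # negative if mass is overshot
        for n in range(max_count + 1):
            search(position + 1, remaining - n * mass, counts + [n])

    search(0, target_mass, [])
    return results


# Neutral monoisotopic mass of glucose, C6H12O6
candidates = candidate_formulas(180.06339, tolerance_ppm=5.0)
```

The returned list contains C6H12O6 and may contain further compositions that are mathematically valid; filtering those is where the chemical constraints and isotope abundance patterns come in.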
The identification of small metabolites has been seen as one of the bottlenecks in
interpreting metabolomics data. Neumann & Bocker (2010) provided a review focusing on
computational methods for electrospray ionization (ESI) mass spectrometry. One of the
most common methods for identifying compounds using mass spectrometry is
comparison with spectra of authentic standards. The Metabolomics Standards Initiative
(MSI) has defined several confidence levels for the identification of non-novel chemical
compounds, ranging from level 1 for a rigorous identification to level 4 for unidentified
signals (Sumner et al., 2007). The difference between levels 1 and 2 is that the former requires
comparison with authentic standards based on in-house data measured under identical
analytical conditions, whereas the latter allows the use of literature or external databases.
Level 1 or level 2 identifications are based on a comparison of “exact mass and isotope
pattern”. Even with the most exact mass and isotope pattern, the identification is
limited to the elemental composition, and many known metabolites in databases such as
KEGG or PubChem share the same sum formula. For all other MSI levels, the
“identification” usually reduces to an annotation with a lower level of confidence (Neumann
& Bocker, 2010).
When a reference spectrum is used, a similarity or distance function is needed for selecting
database entries. The most basic similarity functions are those based on counting the
number of matching peaks between a query spectrum and each of the database spectra. For
this, both spectra can be considered as binary vectors with 0’s and 1’s for “peak absent” and
“peak present”, respectively (Neumann & Bocker, 2010). Common distance functions on
binary vectors are the Hamming distance (counting any difference) or the Jaccard coefficient
(the fraction of matching peaks). Besides counting matches, other measures also consider
their actual mass and intensity, such as the Euclidean distance, the probability-based
matching (PBM), the normalized dot product (NDP), and a modified cosine distance for the
database search of EI spectra (Stein, 1994).
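The simplest of these similarity functions are easy to state in code. The sketch below assumes that spectra are given as lists of (m/z, intensity) pairs and that peaks are binned to integer m/z values, which is a simplification. The normalized dot product is implemented here in its plain cosine form; published variants may additionally weight peaks by mass or square the ratio, so this should not be read as the exact function of any cited work.

```python
import math


def binarize(peaks, n_bins):
    """0/1 vector marking which integer m/z bins contain a peak."""
    present = {int(round(mz)) for mz, _ in peaks}
    return [1 if b in present else 0 for b in range(n_bins)]


def hamming_distance(a, b):
    """Number of positions at which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def jaccard_coefficient(a, b):
    """Fraction of peaks present in either spectrum that are present in both."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def normalized_dot_product(peaks_a, peaks_b):
    """Cosine similarity of peak intensities on integer m/z bins."""
    a = {int(round(mz)): inten for mz, inten in peaks_a}
    b = {int(round(mz)): inten for mz, inten in peaks_b}
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical query and library spectra as (m/z, intensity) pairs
query = [(41, 100), (55, 50), (69, 10)]
reference = [(41, 100), (55, 50), (83, 20)]
q_bits, r_bits = binarize(query, 100), binarize(reference, 100)
```

For these two spectra the Hamming distance is 2 and the Jaccard coefficient is 0.5, while the intensity-aware dot product is close to 1 because the two mismatched peaks are weak.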
Oberacher et al. (2009) proposed and optimized a search function for tandem mass
spectrometry (MS/MS) spectra based on a combination of relative and absolute match
probabilities, which combines the principles of peak counting and summed intensities of
matching peaks. The X-Rank algorithm (Mylonas et al., 2009) for matching MS/MS spectra is
based on probability calculations. It sorts the peak intensities of a spectrum and then establishes
a correlation between two sorted spectra. X-Rank computes the probability that a rank from
an experimental spectrum matches a rank from a reference library spectrum. The approach
requires training on a representative dataset: in a training step, characteristic parameter
values are generated for a given data set. Identification of small compounds is still
challenging, especially for compounds that have not been recorded in any library or
structure database. Methods for these tasks are highly sought after.
3.2 Computational methods for metabolic pathway analysis using metabolomic data
Metabolomics data provide a series of snapshots of cellular metabolism, which can be
combined with metabolic flux data for further analysis. Metabolic pathways are the true
functional units of metabolic systems (Schilling et al., 2000b). Finding biochemically
plausible pathways between metabolites in a metabolic network is a central problem in
computational metabolic modeling. Mathematical modeling approaches to metabolic
regulation analysis involve different levels of details and complexities ranging from detailed
kinetic models, stoichiometric analysis, structural kinetic models, to large scale topological
network analysis (Steuer, 2007).
Detailed kinetic models of metabolic pathways, based on explicit enzyme-kinetic rate
equations, are a bottom-up approach towards more comprehensive large-scale dynamic
models. They allow the most detailed quantitative evaluation of the dynamics of metabolic
systems, which is very important for improving the understanding of metabolic regulation
and control. Metabolic control analysis (MCA) is the culminating mathematical theory
built on kinetic models; it describes the control and regulatory properties of metabolic
systems (Heinrich & Schuster, 1996; Fell, 1997).
A metabolic network is a collection of enzyme-catalyzed reactions and transport processes
that serve to dissipate substrate metabolites and generate final metabolites. The dynamics of
a metabolic system can be described by a set of ordinary differential equations:
$$\frac{dX_i}{dt} = \sum_j S_{ij}\, v_j(X, k), \quad \text{or in matrix form} \quad \frac{dX}{dt} = S\, v(X, k), \tag{1}$$
where $X_i$ represents the concentration of the $i$th metabolite and $S_{ij}$ stands for the
stoichiometric coefficient of metabolite $i$ in the $j$th reaction. $v_j(X, k)$ corresponds to the
flux through the $j$th reaction. The vector $v(X, k)$ consists of nonlinear enzyme-kinetic rate
functions, which depend on the concentrations $X$ and a set of kinetic parameters $k$. Given
the functional form of $v(X, k)$, the set of kinetic parameters $k$ and an initial state $X(0)$, the
differential equations can be solved numerically to obtain the time-dependent behavior of
all metabolites under consideration. The stoichiometric matrix $S$ is an $m$ by $n$ matrix, where
$m$ corresponds to the number of metabolites and $n$
is the total number of fluxes taking place in the network.
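As an illustration of solving Eq. (1) numerically, the sketch below integrates a two-metabolite toy pathway with the explicit Euler method. Everything specific in it is an assumption made for the example: the pathway, the rate laws (constant influx, Michaelis-Menten conversion, first-order efflux), the parameter values and the function names. A real model would normally use an adaptive ODE solver.

```python
def simulate(S, rate_fn, x0, k, dt=0.01, n_steps=10000):
    """Integrate dX/dt = S v(X, k) with the explicit Euler method."""
    x = list(x0)
    for _ in range(n_steps):
        v = rate_fn(x, k)
        dxdt = [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]
        x = [x_i + dt * d_i for x_i, d_i in zip(x, dxdt)]
    return x


# Hypothetical toy pathway: -> X1 -> X2 ->
# Rows are metabolites (m = 2), columns are reactions (n = 3)
S = [[1, -1, 0],
     [0, 1, -1]]


def toy_rates(x, k):
    """Constant influx, Michaelis-Menten conversion, first-order efflux."""
    influx, v_max, k_m, k_out = k
    return [influx, v_max * x[0] / (k_m + x[0]), k_out * x[1]]


final_state = simulate(S, toy_rates, x0=[0.0, 0.0], k=(1.0, 2.0, 1.0, 0.5))
```

With these parameters the concentrations settle at X1 = 1 and X2 = 2, the state in which all three fluxes equal the influx of 1.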
The stoichiometric analysis approach takes advantage of the structural nature of metabolic
systems. Knowledge of the stoichiometry puts constraints on the feasible flux distributions,
which can be utilized to model the functional capabilities of metabolic networks (Varma &
Palsson 1994; Edwards & Palsson, 2000; Stelling et al., 2002; Famili et al., 2003; Price et al.,
2003).
The pathway structure should be an invariant property of the network, along with its
stoichiometry. At steady state, the set of ordinary differential equations reduces to a set of
linear equations:
$$0 = S\, v(X^0, k) = S\, v^0, \tag{2}$$
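The steady-state condition of Eq. (2) is a purely linear constraint on the flux vector and can be checked without any kinetic information. The following sketch tests whether a candidate flux vector balances every metabolite; the stoichiometric matrix is that of a hypothetical linear pathway (-> X1 -> X2 ->), and the function name is an assumption.

```python
def is_steady_state(S, v, tol=1e-9):
    """True if the flux vector v satisfies S v = 0 for every metabolite."""
    return all(abs(sum(s_ij * v_j for s_ij, v_j in zip(row, v))) <= tol
               for row in S)


# Hypothetical linear pathway: rows are metabolites, columns are reactions
S = [[1, -1, 0],
     [0, 1, -1]]

balanced = is_steady_state(S, [1.0, 1.0, 1.0])    # production equals consumption
unbalanced = is_steady_state(S, [1.0, 2.0, 1.0])  # X1 is drained, X2 accumulates
```

All flux vectors that pass this test form the admissible steady-state flux space on which the constraint-based methods operate.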
Flux balance analysis (FBA) is a computational approach to reducing the admissible flux
space (Schilling et al., 1999; Edwards & Palsson, 2000). FBA optimizes an objective function
such as maximal biomass yield or maximal energy production over the steady-state flux
space, and has resulted in many applications (Papin et al., 2004; Almaas et al., 2004;
Stephanopoulos et al., 2004). Using FBA, in silico studies of the systemic properties of the
Haemophilus influenzae and E. coli (Edwards & Palsson, 2000) metabolic networks have been
completed. Schilling et al. (2000a) used FBA to explore the metabolic capabilities and
predicted functions of a subsystem of E. coli under various substrate conditions.
A metabolic network can be decomposed into distinct pathways, termed elementary flux
modes (EFMs). An EFM is a minimal set of reactions capable of working together at
steady state, and the set of EFMs is unique for a given metabolic network. Another closely
related concept is that of extreme pathways, which are a subset of the elementary modes
(Klamt & Stelling, 2003). All feasible flux vectors can be described as non-negative linear
combinations of EFMs. The concept of EFMs
has resulted in a vast number of applications for metabolic network analysis (Stelling et al.,
2002; Schuster et al., 2002; Klamt & Schuster, 2002; Klamt & Gilles, 2004; Klamt et al., 2006).
For medium-sized metabolic networks, software packages have been developed for the
computation of elementary flux modes (Hoops et al., 2006; Klamt et al., 2007). Owing to the
combinatorial explosion of the number of elementary vectors, this approach becomes
computationally intractable for genome-scale networks. To develop an analysis approach that is
computationally feasible even for genome-scale networks, Urbanczik & Wagner (2005)
proposed to focus on the conversion cone, a projection of the flux cone that describes the
interaction of the metabolism with its external chemical environment. The method for
calculating the elementary vectors of this cone was applied to study the metabolism of
Saccharomyces cerevisiae.
Stoichiometric analysis does not incorporate dynamic properties into the description of the
system. Steuer (2006, 2007) proposed a structural kinetic modeling approach that augments
stoichiometric analysis with kinetic properties. The idea is to
use a local linear approximation of an explicit kinetic model to capture the dynamic response to
perturbations, the stability of a metabolic state, and the transition to oscillatory
behavior. The local linear approximation is obtained from a Taylor series expansion of the
metabolic system. The linear term of the expansion involves the derivative of the kinetic rate
equations with respect to the metabolite concentrations X at a given state, which usually
requires knowledge of the enzyme-kinetic rate equations. Even in the absence of enzyme-kinetic
information, it is still possible to specify the structure of the linear term. The structural
kinetic modeling approach thus allows quantitative conclusions about the possible dynamics of
the system, based on only a minimal amount of additional information.
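The local linear approximation can be written out explicitly. The following is a sketch in the notation of Eqs. (1) and (2); the symbols $\delta X$ for a small deviation from the steady state $X^0$ and $J$ for the Jacobian matrix are introduced here for illustration. Expanding $v(X, k)$ to first order around $X^0$ and using $S\, v(X^0, k) = 0$ gives

```latex
\frac{d\,\delta X}{dt}
  = S\, v(X^0 + \delta X, k)
  \approx \underbrace{S\, v(X^0, k)}_{=\,0}
    + S \left.\frac{\partial v(X, k)}{\partial X}\right|_{X = X^0} \delta X
  = J\, \delta X,
\qquad
J = S \left.\frac{\partial v(X, k)}{\partial X}\right|_{X = X^0},
\qquad
\delta X = X - X^0 .
```

The steady state is locally stable when all eigenvalues of $J$ have negative real parts, and a pair of complex eigenvalues crossing the imaginary axis indicates a possible transition to oscillatory behavior. The stoichiometric matrix $S$ fixes which entries of $J$ can be non-zero, which is why part of the structure of the linear term is known even without explicit rate equations.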
The extension of detailed kinetic models to whole-cell models faces some
fundamental difficulties, including the absence of comprehensive measured kinetic
parameter values, the observed inconsistency in the available kinetic data, and the
computational complexity of such models. Traditionally, kinetic models are constructed
using rate equations derived to describe conditions in vitro and thus rely on
kinetic parameters measured in vitro. However, the conditions under which in vitro experiments
are performed are often very different from those inside the cell. Thus, in vitro kinetic rates
and parameters describe enzymatic behaviors that may not truly represent
the physiological kinetic behavior observed in the cell. Several methods have been proposed
2009; van den Berg et al., 2009). In scenario 2, PCA was used directly to identify patterns in
metabolomic data integrated with other “omics” data. For example, it was applied to
integrated metabolomic and proteomic data to reveal clustering of the two genotypes
(Weckwerth et al., 2004). While PCA can also be used for exploring polynomial relationships
and for multivariate outlier detection, this method is restricted to linear relationships.
In both correlation analysis and PCA, all variables play the same role and are
interchangeable; the methods are used to explore associations between factors. In other
analyses, some factors (independent variables X) are used to explain or predict the variable
of main interest (dependent variable Y). For example, PLS is a statistical method that models
Y as a linear function of X. Rather than using all independent variables as
regressors in a multivariate regression analysis, PLS regresses Y on a small number of
components that, like the principal components in PCA, are linear combinations of the X
variables (Garthwaite 1994). The method has
been used to model metabolomic variables as a function of transcriptome profiles
(Pir et al., 2006). The analysis allowed discrimination between the effects that the growth
media, dilution rate and deletion of specific genes had on the transcriptomic and metabolomic
profiles (Pir et al. 2006). The method was also used to relate quantifiable phenotypes of
interest, such as protease activity or productivity, to the concentrations of each of the metabolites
determined (Braaksma et al., 2011). The analysis revealed that various sugar derivatives
correlated with glucoamylase activity.
As an extension of PLS, Le Cao and colleagues proposed a sparse PLS approach that combines
integration and variable (e.g., gene) selection in one step (Le Cao et al., 2008,
2009). In this approach, PLS is penalized by the sum of the absolute values of the
coefficients through the least absolute shrinkage and selection operator (LASSO) (Tibshirani
1996), which automatically eliminates variables (e.g., genes) with negligible effects. This
model selection approach, together with the smoothly clipped absolute deviation approach
(Fan & Li, 2001), is effective in analyzing sparse data (e.g., when only a few genes have
significant effects).
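How an absolute-value penalty eliminates variables can be seen from the soft-thresholding operator, which is the closed-form LASSO solution for a single coefficient when the predictors are orthonormal. The sketch below applies it to a hypothetical vector of loadings; it illustrates the shrinkage principle only and is not the sparse PLS algorithm of Le Cao et al. The function names and numbers are assumptions.

```python
def soft_threshold(value, penalty):
    """LASSO shrinkage: move `value` towards zero by `penalty`, stopping at zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def sparsify(loadings, penalty):
    """Apply soft-thresholding to every coefficient of a loading vector."""
    return [soft_threshold(w, penalty) for w in loadings]


# Hypothetical loadings for five variables; small ones are set exactly to zero
sparse_loadings = sparsify([0.9, -0.05, 0.3, -0.6, 0.02], penalty=0.1)
```

Coefficients smaller in magnitude than the penalty become exactly zero, so the corresponding variables drop out of the model, while the remaining coefficients are shrunk by a constant amount.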
The methods discussed so far in this section, including Pearson correlation, PCA, and
PLS, all explore linear relationships. Kinetic models and artificial neural networks, on the
other hand, can be more appropriate when nonlinearity occurs. In an analysis
integrating metabolomics and pharmacokinetics (or nutrikinetics), Van Velzen et al. (2009)
fitted a one-compartment nutrikinetic model with first-order excretion, a lag time, and a
baseline function to the time courses of selected biomarkers identified from
metabolomic data. A kinetic model was also used to describe the relationship between enzyme
kinetics and intracellular metabolites through a two-substrate Michaelis–Menten equation
with competitive substrate inhibition or competitive product inhibition (Schroer et al., 2009).
The kinetic constants were estimated by nonlinear regression of initial rate measurements.
Martense & Vanrolleghem (2010) summarized a few other modeling approaches. Compared
with data-driven unsupervised analysis, mathematical modeling may provide a meaningful
relationship that supports better understanding. However, because such models are generally
based on approximations under restrictive assumptions, a simple model may not precisely
describe the complex biological system.
Artificial neural networks are another way to cope with nonlinearity. A batch-learning
self-organizing network was utilized to classify the metabolomes and the transcriptomes
according to their time-dependent patterns of change (Kanaya et al., 2001); the results
showed that the metabolomes and transcriptomes regulated by the same mechanism tended
to be clustered together (Hirai et al., 2004, 2005). The batch-learning self-organizing
network is an artificial neural network that is trained using unsupervised learning to produce a
low-dimensional representation of the input space of the training samples. Network-embedded
thermodynamic analysis (NET analysis) has been presented as a framework for
mechanistic and model-based analysis of metabolite data. By coupling the data to an
operating metabolic network via the second law of thermodynamics and the metabolites’
Gibbs energies of formation, NET analysis allows functional principles to be inferred from
quantitative metabolite data; for example, it identifies reactions that are subject to active
allosteric or genetic regulation, as exemplified with quantitative metabolomic data from E.
coli and S. cerevisiae (Kummel et al., 2006). A network typically yields a graph
representing the global relationships. In a review article, Feist et al. (2006) classified studies
using networks into three categories: studies that use a reconstruction to examine topological
network properties, studies that use a reconstruction in constraint-based modeling for
quantitative or qualitative analyses, and studies that are purely data driven. Some of the
networks’ mathematical frameworks are based on graph theory, which provides a visual
presentation of the complex biological system. However, when more features are involved,
the network approach often becomes too complicated to provide a clear picture.
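The idea behind the self-organizing network mentioned above, namely unsupervised training that maps samples onto a low-dimensional grid of nodes, can be sketched in a few lines of Python. This is a minimal one-dimensional batch SOM written for illustration only; the node count, the neighbourhood schedule, the initialization and the four toy time profiles are assumptions and do not reproduce the batch-learning implementation used in the cited studies.

```python
import math


def bmu(weights, x):
    """Index of the best-matching unit (closest node) for sample x."""
    dists = [sum((w - v) ** 2 for w, v in zip(node, x)) for node in weights]
    return dists.index(min(dists))


def batch_som(samples, n_nodes=3, epochs=10, sigma0=1.0):
    """Train a 1-D batch SOM; return (node weights, BMU index per sample)."""
    first, last = samples[0], samples[-1]
    # Deterministic start: nodes spaced evenly between first and last sample.
    weights = [[f0 + k / (n_nodes - 1) * (f1 - f0) for f0, f1 in zip(first, last)]
               for k in range(n_nodes)]
    for epoch in range(epochs):
        # Neighbourhood radius shrinks linearly towards a small positive floor.
        sigma = max(sigma0 * (1.0 - epoch / epochs), 0.1)
        # Batch step 1: assign every sample to its best-matching node.
        hits = [bmu(weights, x) for x in samples]
        # Batch step 2: each node becomes a neighbourhood-weighted mean.
        for k in range(n_nodes):
            h = [math.exp(-((j - k) ** 2) / (2.0 * sigma ** 2)) for j in hits]
            total = sum(h)
            weights[k] = [sum(hi * x[d] for hi, x in zip(h, samples)) / total
                          for d in range(len(first))]
    return weights, [bmu(weights, x) for x in samples]


# Hypothetical time profiles (three time points each), scaled to [0, 1].
profiles = [
    [0.0, 0.5, 1.0],  # metabolite rising over time
    [0.1, 0.5, 0.9],  # transcript rising over time
    [1.0, 0.5, 0.0],  # metabolite falling over time
    [0.9, 0.5, 0.1],  # transcript falling over time
]
weights, assignment = batch_som(profiles)
print(assignment)
```

In this toy case the rising metabolite and the rising transcript land on the same node, while the falling pair lands on a different one, which is the kind of co-clustering by time-dependent pattern described in the text.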
Bayesian graphical modeling approaches infer biological regulatory networks by integrating
expression levels of different types. Specific sequence/structure information can be
incorporated into the prior probability models. Webb-Robertson et al. (2009) presented a
Bayesian approach to integration that uses posterior probabilities to assign class memberships
to samples using individual and multiple data sources; these probabilities are based on
lower-level likelihood functions derived from standard statistical learning algorithms. The approach
was demonstrated by integrating two proteomic datasets and one metabolic dataset from
microbial infections of mice; the results showed that integrating the different datasets
improved classification accuracy to ~89%, compared with ~83% for the best individual dataset.
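A minimal Python sketch of how posterior class probabilities from several data sources might be combined is given below. It assumes that the sources are conditionally independent given the class, so that their likelihoods simply multiply; this combination rule, the function name and all numbers are illustrative assumptions, not the published method of Webb-Robertson et al. (2009).

```python
def integrate(prior, source_likelihoods):
    """Posterior class probabilities from a prior and one likelihood table
    per data source, ASSUMING the sources are independent given the class."""
    unnormalised = {}
    for cls, p_cls in prior.items():
        joint = p_cls
        for likelihood in source_likelihoods:
            joint *= likelihood[cls]
        unnormalised[cls] = joint
    total = sum(unnormalised.values())
    return {cls: joint / total for cls, joint in unnormalised.items()}


# Hypothetical likelihoods of one sample under each class, per data source.
prior = {"infected": 0.5, "control": 0.5}
proteomic = {"infected": 0.6, "control": 0.4}
metabolic = {"infected": 0.7, "control": 0.3}

single = integrate(prior, [proteomic])               # one data source
combined = integrate(prior, [proteomic, metabolic])  # integrated sources
predicted = max(combined, key=combined.get)          # class assignment
print(single, combined, predicted)
```

When the sources agree, the integrated posterior is sharper than either individual one, which gives an intuition for why integration can improve classification accuracy.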
Integrative interpretations of data from different “omics”, including metabolomics, are still
at an early stage of development. More thoughtful interpretation methods that are capable of
revealing biology at a system level are yet to come. Collaboration between mathematicians,
statisticians, bioinformaticians and experimental biologists will be the key to the success of
these efforts.
5. Final remarks
Although comprehensive coverage of the metabolome in cells is not yet possible, significant
advances in the large-scale profiling of metabolites have been achieved in recent years,
and these analyses have offered unique insight into the metabolic and regulatory networks
of cells. In this chapter, we first reviewed some of the widely used and emerging
technologies for metabolomics analysis, and then focused on recent progress in developing
computational methodologies to improve biological interpretation of high-throughput
metabolomic data. In addition, we presented some mathematical, statistical and bioinformatics
methods that have been utilized for the integration of metabolomics data with other types of
“omics” datasets, and showed how this integrative analysis has improved our interpretation of
biological systems.
Computational Methods to Interpret and Integrate Metabolomic Data 119
6. Acknowledgements
The research of J. Wang and W. Zhang of Tianjin University was supported by the National
Basic Research Program of China (National “973” program, project No. 2011CBA00803 and
No. 2012CB721101) and the National Natural Science Foundation of China (Project No.
31170043).
7. Disclaimer
Feng Li is currently working for the Office of Biostatistics, Food and Drug Administration.
Views expressed in this paper are the author's professional opinions and do not necessarily
represent the official positions of the U.S. Food and Drug Administration.
8. References
Aharoni, A., C. H. R. De Vos, H. A. Verhoeven, C. A. Maliepaard, G. Kruppa, R. Bino & D. B.
Goodenowe (2002) Nontargeted metabolome analysis by use of Fourier transform
ion cyclotron mass spectrometry. OMICS A Journal of Integrative Biology, 6, 217-234.
Almaas, E., B. Kovacs, T. Vicsek, Z. N. Oltvai & A. L. Barabasi (2004) Global organization of
metabolic fluxes in the bacterium Escherichia coli. Nature, 427, 839-843.
Amantonico, A., P. L. Urban, J. Y. Oh & R. Zenobi (2009) Interfacing microfluidics and laser
desorption/Ionization mass spectrometry by continuous deposition for application
in single cell analysis. Chimia, 63, 185-188.
Amantonico, A., J. Y. Oh, J. Sobek, M. Heinemann & R. Zenobi (2008) Mass spectrometric
method for analyzing metabolites in yeast with single cell sensitivity. Angewandte
Chemie-International Edition, 47, 5382-5385.
Amantonico, A., P. L. Urban & R. Zenobi (2010) Analytical techniques for single-cell
metabolomics: state of the art and trends. Analytical and Bioanalytical Chemistry, 398,
2493-2504.
Anderle, M., S. Roy, H. Lin, C. Becker & K. Joho (2004) Quantifying reproducibility for
differential proteomics: noise analysis for protein liquid chromatography-mass
spectrometry of human serum. Bioinformatics, 20, 3575-3582.
Baumgartner, C., C. Bohm, D. Baumgartner, G. Marini, K. Weinberger, B. Olgemoller, B.
Liebl & A. A. Roscher (2004) Supervised machine learning techniques for the
classification of metabolic disorders in newborns. Bioinformatics, 20, 2985-96.
Beckonert, O., M. E. Bollard, T. M. D. Ebbels, H. C. Keun, H. Antti, E. Holmes, J. C. Lindon &
J. K. Nicholson (2003) NMR-based metabonomic toxicity classification: hierarchical
cluster analysis and k-nearest-neighbour approaches. Analytica Chimica Acta, 490, 3-
15.
Beckonert, O., H. C. Keun, T. M. D. Ebbels, J. G. Bundy, E. Holmes, J. C. Lindon & J. K.
Nicholson (2007) Metabolic profiling, metabolomic and metabonomic procedures
for NMR spectroscopy of urine, plasma, serum and tissue extracts. Nature Protocols,
2, 2692-2703.
Bedair, M. & L. W. Sumner (2008) Current and emerging mass-spectrometry technologies
for metabolomics. Trac-Trends in Analytical Chemistry, 27, 238-250.
Benjamini, Y. & Y. Hochberg (1995) Controlling the False Discovery Rate: A Practical and
Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series
B (Methodological), 57, 289-300.
Berg, J., Y. P. Hung & G. Yellen (2009) A genetically encoded fluorescent reporter of
ATP:ADP ratio. Nature Methods, 6, 161-166.
Birkemeyer, C., A. Kolasa & J. Kopka (2003) Comprehensive chemical derivatization for gas
chromatography-mass spectrometry-based multi-targeted profiling of the major
phytohormones. Journal of Chromatography A, 993, 89-102.
Bishop, C. M. 1995. Neural Networks for Pattern Recognition. Oxford, UK: Clarendon Press.
Braaksma, M., S. Bijlsma, L. Coulier, P. J. Punt & M. J. van der Werf (2011) Metabolomics as
a tool for target identification in strain improvement: the influence of phenotype
definition. Microbiology, 157, 147-59.
Breitling, R., A. R. Pitt & M. P. Barrett (2006) Precision mapping of the metabolome. Trends
in Biotechnology, 24, 543-548.
Broadhurst, D. & D. Kell (2006) Statistical strategies for avoiding false discoveries in
metabolomics and related experiments. Metabolomics, 2, 171-196.
Buckmaster, R., F. Asphahani, M. Thein, J. Xu & M. Zhang (2009) Detection of drug-induced
cellular changes using confocal Raman spectroscopy on patterned single-cell
biosensors. Analyst, 134, 1440-1446.
Cahill, G., P. K. Walsh & D. Donnelly (2000) Determination of yeast glycogen content by
individual cell spectroscopy using image analysis. Biotechnology and Bioengineering,
69, 312-322.
Cannon, D. M., N. Winograd & A. G. Ewing (2000) Quantitative chemical analysis of single
cells. Annual Review of Biophysics and Biomolecular Structure, 29, 239-263.
Chen, H. W., A. Venter & R. G. Cooks (2006) Extractive electrospray ionization for direct
analysis of undiluted urine, milk and other complex mixtures without sample
preparation. Chemical Communications, 2042-2044.
Ciosek, P., Z. Brzozka, W. Wroblewski, E. Martinelli, C. Di Natale & A. D'Amico. 2005.
Direct and two-stage data analysis procedures based on PCA, PLS-DA and ANN
for ISE-based electronic tongue-Effect of supervised feature extraction. In Talanta,
590-6. England.
Clarke, B. L. 1980. Stability of Complex Reaction Networks. In Advances in Chemical Physics,
1-215. John Wiley & Sons, Inc.
--- (1988) Stoichiometric network analysis. Cell Biophysics, 12, 237-53.
Cloarec, O., M. E. Dumas, A. Craig, R. H. Barton, J. Trygg, J. Hudson, C. Blancher, D.
Gauguier, J. C. Lindon, E. Holmes & J. Nicholson (2005) Statistical total correlation
spectroscopy: an exploratory approach for latent biomarker identification from
metabolic 1H NMR data sets. Analytical Chemistry, 77, 1282-9.
Cooks, R. G., Z. Ouyang, Z. Takats & J. M. Wiseman (2006) Ambient mass spectrometry.
Science, 311, 1566-1570.
Cristianini, N. & J. Shawe-Taylor. 2000. An Introduction to Support Vector Machines. Cambridge
University Press.
Cuperlovic-Culf, M., N. Belacel, A. S. Culf, I. C. Chute, R. J. Ouellette, I. W. Burton, T. K.
Karakach & J. A. Walter (2009) NMR metabolic analysis of samples using fuzzy K-
means clustering. Magnetic Resonance in Chemistry, 47 Suppl 1, S96-104.
Daykin, C. A., P. J. D. Foxall, S. C. Connor, J. C. Lindon & J. K. Nicholson (2002) The
comparison of plasma deproteinization methods for the detection of low-
Enot, D. P., W. Lin, M. Beckmann, D. Parker, D. P. Overy & J. Draper (2008) Preprocessing,
classification modeling and feature selection using flow injection electrospray mass
spectrometry metabolite fingerprint data. Nature Protocols, 3, 446-70.
Famili, I., J. Forster, J. Nielsen & B. O. Palsson (2003) Saccharomyces cerevisiae phenotypes can
be predicted by using constraint-based analysis of a genome-scale reconstructed
metabolic network. Proceedings of the National Academy of Sciences of the United
States of America, 100, 13134-9.
Fan, J. Q. & R. Z. Li (2001) Variable selection via nonconcave penalized likelihood and its
oracle properties. Journal of the American Statistical Association, 96, 1348-1360.
Farag, M. A., D. V. Huhman, R. A. Dixon & L. W. Sumner (2008) Metabolomics reveals
novel pathways and differential mechanistic and elicitor-specific responses in
phenylpropanoid and isoflavonoid biosynthesis in Medicago truncatula cell
cultures. Plant Physiology, 146, 387-402.
Fehr, M., W. B. Frommer & S. Lalonde (2002) Visualization of maltose uptake in living yeast
cells by fluorescent nanosensors. Proceedings of the National Academy of Sciences of the
United States of America, 99, 9846-9851.
Feist, A. M., J. C. Scholten, B. O. Palsson, F. J. Brockman & T. Ideker (2006) Modeling
methanogenesis with a genome-scale metabolic reconstruction of Methanosarcina
barkeri. Molecular Systems Biology, 2, 2006 0004.
Fell, D. 1997. Understanding the Control of Metabolism. Portland Press.
Fiehn, O. (2001) Combining genomics, metabolome analysis, and biochemical modelling to
understand metabolic networks. Comparative and Functional Genomics, 2, 155-168.
Fiehn, O. 2007. Cellular Metabolomics: The Quest for Pathway Structure. In The Handbook of
Metabonomics and Metabolomics, eds. J. C. Lindon, J. K. Nicholson & E. Holmes, 35-
54. Elsevier.
Fletcher, J. S. (2009) Cellular imaging with secondary ion mass spectrometry. Analyst, 134,
2204-2215.
Fliermans, C. B. & E. L. Schmidt (1975) Autoradiography and immunofluorescence
combined for autecological study of single cell activity with Nitrobacter as a model
system. Applied Microbiology, 30, 676-684.
Fraser, P. D., E. M. A. Enfissi, M. Goodfellow, T. Eguchi & P. M. Bramley (2007) Metabolite
profiling of plant carotenoids using the matrix-assisted laser desorption ionization
time-of-flight mass spectrometry. Plant Journal, 49, 552-564.
Furey, T. S., N. Cristianini, N. Duffy, D. W. Bednarski, M. Schummer & D. Haussler (2000)
Support vector machine classification and validation of cancer tissue samples using
microarray expression data. Bioinformatics, 16, 906-914.
Garthwaite, P. H. (1994) An Interpretation of Partial Least-Squares. Journal of the American
Statistical Association, 89, 122-127.
Gmati, D., J. K. Chen & M. Jolicoeur (2005) Development of a small-scale bioreactor:
Application to in vivo NMR measurement. Biotechnology and Bioengineering, 89, 138-
147.
Goff, K. L., L. Quaroni & K. E. Wilson (2009) Measurement of metabolite formation in single
living cells of Chlamydomonas reinhardtii using synchrotron Fourier-Transform
Infrared spectromicroscopy. Analyst, 134, 2216-2219.
Graf, T. & M. Stadtfeld (2008) Heterogeneity of Embryonic and Adult Stem Cells. Cell Stem
Cell, 3, 480-483.
Grivet, J.-P. & A.-M. Delort (2009) NMR for microbiology: In vivo and in situ applications.
Progress in Nuclear Magnetic Resonance Spectroscopy, 54, 1-53.
Gu, H., H. Chen, Z. Pan, A. U. Jackson, N. Talaty, B. Xi, C. Kissinger, C. Duda, D. Mann, D.
Raftery & R. G. Cooks (2007) Monitoring diet effects via biofluids and their
implications for metabolomics studies. Analytical Chemistry, 79, 89-97.
Guyon, I., J. Weston, S. Barnhill & V. Vapnik (2002) Gene Selection for Cancer Classification
using Support Vector Machines. Machine Learning, 46, 389-422.
Gygi, S. P., Y. Rochon, B. R. Franza & R. Aebersold (1999) Correlation between protein and
mRNA abundance in yeast. Molecular and Cellular Biology, 19, 1720-30.
Hageman, J. A., R. A. van den Berg, J. A. Westerhuis, H. C. J. Hoefsloot & A. K. Smilde
(2006) Bagged K-Means Clustering of Metabolome Data. Critical Reviews in
Analytical Chemistry, 36, 211-220.
Heinemann, M. & R. Zenobi (2011) Single cell metabolomics. Current Opinion in
Biotechnology, 22, 26-31.
Heinrich, R. & S. Schuster. 1996. The Regulation of Cellular Systems. Chapman and Hall.
Hermelink, A., A. Brauer, P. Lasch & D. Naumann (2009) Phenotypic heterogeneity within
microbial populations at the single-cell level investigated by confocal Raman
microspectroscopy. Analyst, 134, 1149-1153.
Hirai, M. Y., M. Klein, Y. Fujikawa, M. Yano, D. B. Goodenowe, Y. Yamazaki, S. Kanaya, Y.
Nakamura, M. Kitayama, H. Suzuki, N. Sakurai, D. Shibata, J. Tokuhisa, M.
Reichelt, J. Gershenzon, J. Papenbrock & K. Saito (2005) Elucidation of gene-to-gene
and metabolite-to-gene networks in Arabidopsis by integration of metabolomics
and transcriptomics. Journal of Biological Chemistry, 280, 25590-5.
Hirai, M. Y., M. Yano, D. B. Goodenowe, S. Kanaya, T. Kimura, M. Awazuhara, M. Arita, T.
Fujiwara & K. Saito (2004) Integration of transcriptomics and metabolomics for
understanding of global responses to nutritional stresses in Arabidopsis thaliana.
Proceedings of the National Academy of Sciences of the United States of America, 101,
10205-10.
Hollywood, K., D. R. Brison & R. Goodacre (2006) Metabolomics: Current technologies and
future trends. Proteomics, 6, 4716-4723.
Holmes, D., D. Pettigrew, C. H. Reccius, J. D. Gwyer, C. van Berkel, J. Holloway, D. E.
Davies & H. Morgan (2009) Leukocyte analysis and differentiation using high
speed microfluidic single cell impedance cytometry. Lab on a Chip, 9, 2881-2889.
Hoops, S., S. Sahle, R. Gauges, C. Lee, J. Pahle, N. Simus, M. Singhal, L. Xu, P. Mendes & U.
Kummer (2006) COPASI--a COmplex PAthway SImulator. Bioinformatics, 22, 3067-
74.
Hsieh, Y., R. Casale, E. Fukuda, J. W. Chen, I. Knemeyer, J. Wingate, R. Morrison & W.
Korfmacher (2006) Matrix-assisted laser desorption/ionization imaging mass
spectrometry for direct measurement of clozapine in rat brain tissue. Rapid
Communications in Mass Spectrometry, 20, 965-972.
Huang, L. & R. T. Kennedy (1995) Exploring single-cell dynamics using chemically-modified
microelectrodes. Trends in Analytical Chemistry, 14, 158-164.
Irish, J. M., N. Kotecha & G. P. Nolan (2006) Innovation - Mapping normal and cancer cell
signalling networks: towards single-cell proteomics. Nature Reviews Cancer, 6, 146-
155.
Ishii, N. & M. Tomita. 2009. Multi-Omics Data-Driven Systems Biology of E. coli.
Jones, J. J., S. Borgmann, C. L. Wilkins & R. M. O'Brien (2006) Characterizing the
phospholipid profiles in mammalian tissues by MALDI FTMS. Analytical Chemistry,
78, 3062-3071.
Jewett, M., M. Hansen & J. Nielsen. 2007a. Data acquisition, analysis, and mining:
Integrative tools for discerning. In Metabolomics: A Powerful Tool in Systems Biology,
eds. J. Nielsen & M. C. Jewett, 159-187. Springer.
---. 2007b. Data acquisition, analysis, and mining: Integrative tools for discerning. In
Metabolomics, A Powerful Tool in Systems Biology, eds. J. Nielsen & M. C. Jewett, 159-
187. Springer.
Jonsson, P., S. J. Bruce, T. Moritz, J. Trygg, M. Sjostrom, R. Plumb, J. Granger, E. Maibaum, J.
K. Nicholson, E. Holmes & H. Antti (2005) Extraction, interpretation and validation
of information for comparing samples in metabolic LC/MS data sets. Analyst, 130,
701-7.
Kaartinen, J., Y. Hiltunen, P. T. Kovanen & M. Ala-Korpela (1998) Application of self-
organizing maps for the detection and classification of human blood plasma
lipoprotein lipid profiles on the basis of 1H NMR spectroscopy data. NMR in
Biomedicine, 11, 168-176.
Kandpal, R. P., B. Saviola & J. Felton (2009) The era of 'omics unlimited. Biotechniques, 46,
351.
Kanaya, S., M. Kinouchi, T. Abe, Y. Kudo, Y. Yamada, T. Nishi, H. Mori & T. Ikemura (2001)
Analysis of codon usage diversity of bacterial genes with a self-organizing map
(SOM): characterization of horizontally transferred genes with emphasis on the E.
coli O157 genome. Gene, 276, 89-99.
Khatib-Shahidi, S., M. Andersson, J. L. Herman, T. A. Gillespie & R. M. Caprioli (2006)
Direct molecular analysis of whole-body animal tissue sections by imaging MALDI
mass spectrometry. Analytical Chemistry, 78, 6448-6456.
Kind, T. & O. Fiehn (2006) Metabolomic database annotations via query of elemental
compositions: mass accuracy is insufficient even at less than 1 ppm. BMC
Bioinformatics, 7, 234.
Klamt, S. & E. D. Gilles (2004) Minimal cut sets in biochemical reaction networks.
Bioinformatics, 20, 226-34.
Klamt, S., J. Saez-Rodriguez & E. D. Gilles (2007) Structural and functional analysis of
cellular networks with CellNetAnalyzer. BMC Systems Biology, 1, 2.
Klamt, S., J. Saez-Rodriguez, J. A. Lindquist, L. Simeoni & E. D. Gilles (2006) A methodology
for the structural and functional analysis of signaling and regulatory networks.
BMC Bioinformatics, 7, 56.
Klamt, S. & S. Schuster (2002) Calculating as many fluxes as possible in underdetermined
metabolic networks. Mol Biol Rep, 29, 243-8.
Klamt, S. & J. Stelling (2003) Two approaches for metabolic pathway analysis? Trends in
Biotechnology, 21, 64-9.
Kohonen, T. (1998) The self-organizing map. Neurocomputing, 21, 1-6.
---. 2001. Self-Organizing Maps. Springer.
Kouskoumvekaki, I., Z. Yang, S. O. Jonsdottir, L. Olsson & G. Panagiotou (2008)
Identification of biomarkers for genotyping Aspergilli using non-linear methods for
clustering and classification. BMC Bioinformatics, 9, 59.
Kummel, A., S. Panke & M. Heinemann (2006) Putative regulatory sites unraveled by
network-embedded thermodynamic analysis of metabolome data. Molecular
Systems Biology, 2, 2006 0034.
Le Cao, K. A., P. G. Martin, C. Robert-Granie & P. Besse (2009) Sparse canonical methods for
biological data integration: application to a cross-platform study. BMC
Bioinformatics, 10, 34.
Le Cao, K. A., D. Rossouw, C. Robert-Granie & P. Besse (2008) A sparse PLS for variable
selection when integrating omics data. Statistical and Applied Genetics and Molecular
Biology, 7, Article 35.
Lenz, E. M. & I. D. Wilson (2007) Analytical strategies in metabonomics. Journal of Proteome
Research, 6, 443-458.
Lei, F. & S. B. Jorgensen. 2001. Estimation of kinetic parameters in a structured yeast model
using regularisation. In Journal of Biotechnology, 223-37. Netherlands.
Lerouxel, O., T. S. Choo, M. Seveno, B. Usadel, L. Faye, P. Lerouge & M. Pauly (2002) Rapid
structural phenotyping of plant cell wall mutants by enzymatic oligosaccharide
fingerprinting. Plant Physiology, 130, 1754-1763.
Li, X., X. Lu, J. Tian, P. Gao, H. Kong & G. Xu (2009) Application of fuzzy c-means clustering
in data analysis of metabolomics. Analytical Chemistry, 81, 4468-75.
Lin, J. & J. Qian (2007) Systems biology approach to integrative comparative genomics.
Expert Review of Proteomics, 4, 107-119.
Lindon, J. C., E. Holmes & J. K. Nicholson (2003) So what's the deal with metabonomics?
Metabonomics measures the fingerprint of biochemical perturbations caused by
disease, drugs, and toxins. Analytical Chemistry, 75, 384A-391A.
Lindon, J. C., J. K. Nicholson & I. D. Wilson (2000) Directly coupled HPLC-NMR and HPLC-
NMR-MS in pharmaceutical research and development. Journal of Chromatography
B, 748, 233-258.
Listgarten, J. & A. Emili. 2005. Statistical and computational methods for comparative
proteomic profiling using liquid chromatography-tandem mass spectrometry. In
Molecular and Cellular Proteomics, 419-34. United States.
Lu, K.-Y., A. M. Wo, Y.-J. Lo, K.-C. Chen, C.-M. Lin & C.-R. Yang (2006) Three dimensional
electrode array for cell lysis via electroporation. Biosensors & Bioelectronics, 22, 568-
574.
Mahadevan, S., S. L. Shah, T. J. Marrie & C. M. Slupsky (2008) Analysis of metabolomic data
using support vector machines. Analytical Chemistry, 80, 7562-70.
Mallouchos, A., M. Komaitis, A. Koutinas & M. Kanellaki (2002) Investigation of volatiles
evolution during the alcoholic fermentation of grape must using free and
immobilized cells with the help of solid phase microextraction (SPME) headspace
sampling. Journal of Agricultural and Food Chemistry, 50, 3840-3848.
Mas, S., S. G. Villas-Boas, M. E. Hansen, M. Akesson & J. Nielsen (2007) A comparison of
direct infusion MS and GC-MS for metabolic footprinting of yeast mutants.
Biotechnology and Bioengineering, 96, 1014-1022.
Masujima, T. (2009) Live single-cell mass spectrometry. Analytical Sciences, 25, 953-960.
Mattoli, L., F. Cangi, A. Maidecchi, C. Ghiara, E. Ragazzi, M. Tubaro, L. Stella, F. Tisato & P.
Traldi (2006) Metabolomic fingerprinting of plant extracts. Journal of Mass
Spectrometry, 41, 1534-1545.
Maxwell, R. J., I. Martinez-Perez, S. Cerdan, M. E. Cabanas, C. Arus, A. Moreno, A.
Capdevila, E. Ferrer, F. Bartomeus, A. Aparicio, G. Conesa, J. M. Roda, F. Carceller,
J. M. Pascual, S. L. Howells, R. Mazucco & J. R. Griffiths (1998) Pattern recognition
analysis of 1H NMR spectra from perchloric acid extracts of human brain tumor
biopsies. Magnetic Resonance in Medicine, 39, 869-77.
Mellors, J. S., V. Gorbounov, R. S. Ramsey & J. M. Ramsey (2008) Fully integrated glass
microfluidic device for performing high-efficiency capillary electrophoresis and
electrospray ionization mass spectrometry. Analytical Chemistry, 80, 6881-6887.
Schuster, S., T. Pfeiffer, F. Moldenhauer, I. Koch & T. Dandekar (2002) Exploring the
pathway structure of metabolism: decomposition into subnetworks and application
to Mycoplasma pneumoniae. Bioinformatics, 18, 351-361.
Seger, C. & S. Sturm (2007) Analytical aspects of plant metabolite profiling platforms:
Current standings and future aims. Journal of Proteome Research, 6, 480-497.
Segre, D., J. Zucker, J. Katz, X. Lin, P. D'Haeseleer, W. P. Rindone, P. Kharchenko, D. H.
Nguyen, M. A. Wright & G. M. Church (2003) From annotated genomes to
metabolic flux models and kinetic parameter fitting. OMICS, 7, 301-16.
Shinohara, H. & F. Wang (2007) Real-time detection of dopamine released from a nerve
model cell by an enzyme-catalyzed luminescence method and its application to
drug assessment. Analytical Sciences, 23, 81-84.
Singh, O. V. & N. S. Nagaraj (2006) Transcriptomics, proteomics and interactomics: unique
approaches to track the insights of bioremediation. Briefings in Functional Genomics
& Proteomics, 4, 355-362.
Soga, T., Y. Ohashi, Y. Ueno, H. Naraoka, M. Tomita & T. Nishioka (2003) Quantitative
metabolome analysis using capillary electrophoresis mass spectrometry. Journal of
Proteome Research, 2, 488-494.
Southam, A. D., T. G. Payne, H. J. Cooper, T. N. Arvanitis & M. R. Viant (2007) Dynamic
range and mass accuracy of wide-scan direct infusion nanoelectrospray
Fourier transform ion cyclotron resonance mass spectrometry-based metabolomics
increased by the spectral stitching method. Analytical Chemistry, 79, 4595-4602.
Stein, S. E. (1994) Estimating probabilities of correct identification from results of mass
spectral library searches. Journal of the American Society for Mass Spectrometry, 5, 316-
323.
Stelling, J., S. Klamt, K. Bettenbrock, S. Schuster & E. D. Gilles (2002) Metabolic network
structure determines key aspects of functionality and regulation. Nature, 420, 190-
193.
Stephanopoulos, G., H. Alper & J. Moxley (2004) Exploiting biological complexity for strain
improvement through systems biology. Nature Biotechnology, 22, 1261-1267.
Steuer, R. 2006. Review: on the analysis and interpretation of correlations in metabolomic
data. In Briefings in Bioinformatics, 151-8. England.
---. 2007. Computational approaches to the topology, stability and dynamics of metabolic
networks. In Phytochemistry, 2139-51. United States.
Storey, J. D. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical
Society: Series B (Statistical Methodology), 64, 479-498.
Storey, J. D. & R. Tibshirani (2003) Statistical significance for genomewide studies.
Proceedings of the National Academy of Sciences of the United States of America, 100,
9440-5.
Strelkov, S., M. von Elstermann & D. Schomburg (2004) Comprehensive analysis of
metabolites in Corynebacterium glutamicum by gas chromatography/mass
spectrometry. Biological Chemistry, 385, 853-861.
Sturm, S., C. Seger & H. Stuppner (2007) Analysis of Central European Corydalis species by
nonaqueous capillary electrophoresis-electrospray ion trap mass spectrometry.
Journal of Chromatography A, 1159, 42-50.
Suits, F., J. Lepre, P. Du, R. Bischoff & P. Horvatovich (2008) Two-dimensional method for
time aligning liquid chromatography-mass spectrometry data. Analytical Chemistry,
80, 3095-104.
1. Introduction
Metabolism represents a junction system in the biological body, receiving cumulative signals
from upstream (genome, transcriptome, proteome) and downstream (environment) systems.
This intermediate position makes the metabolic system very sensitive to internal
and external signals, which underlies its regulatory role in physiological homeostasis and
adaptive responses to endogenous and exogenous factors. These factors have initiation,
modulation or pressure effects on biological organisms and species, which adapt, resist
or react through different types of metabolism based on different synthesis and regulation
levels of metabolites (Wilson, 2009).
These characteristics give metabolism a flexibility that may be described by means of four
variability criteria (Fig. 1a): presence-absence of metabolites, concentration levels, relative
levels or ratios between metabolites, and metabolic profiles characterizing different structural
and functional states in biosystems through the levels of different metabolites. Presence-absence of
metabolites is a qualitative criterion that concerns metabolites that are stimulated by particular
internal biological states (species physiology, disease, stress, etc.) or external governing factors
(climate, threat, etc.). Beyond this binary aspect of metabolic responses, an increase or decrease in
the concentration levels of some metabolites can be a sensitive response to the degree of an
implicit governing factor (e.g. wounding, toxin or pollutant exposure levels, etc.). More precise
information on the association between a biological state and a governing factor can be extracted from
concentration ratios between sensitive metabolites. Complex situations integrating many
interacting factors can be characterized by metabolic profiles in which the levels of several
metabolites increase or decrease compared with control conditions or neutral situations.
The four metabolic variability criteria can be used separately or in association to provide reliable
pictures of the different metabolic phenotypes of a biological system; such pictures characterizing
the metabolic phenotypes are called metabotypes (or chemotypes) (Fig. 1a). Identification of
relationships between metabotypes and quantitative or qualitative control factors leads to the
concept of metabolic markers. Metabolic markers can be used to anticipate (predict), alert and
control responses and states of different components in biological systems. These components
include cells, biofluids, tissues, organisms and biological populations or species.
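The four variability criteria can be made concrete with a small Python sketch. The function names, the detection limit and the concentration values for metabolites A to D are hypothetical and serve only to show how each criterion could be computed from one sample's concentration table.

```python
def presence_absence(sample, detection_limit=0.0):
    """Criterion 1: which metabolites are detected (qualitative, binary)."""
    return {name: conc > detection_limit for name, conc in sample.items()}


def ratio(sample, numerator, denominator):
    """Criterion 3: relative level of one metabolite with respect to another."""
    return sample[numerator] / sample[denominator]


def profile(sample):
    """Criterion 4: each metabolite's share of the total measured pool."""
    total = sum(sample.values())
    return {name: conc / total for name, conc in sample.items()}


# Criterion 2 is the concentration table itself (hypothetical values, in µg/L).
sample = {"A": 12.0, "B": 0.0, "C": 4.0, "D": 8.0}

detected = presence_absence(sample)
a_to_c = ratio(sample, "A", "C")
shares = profile(sample)
print(detected, a_to_c, shares)
```

Comparing such outputs between samples taken under different conditions is one simple way to look for the metabotype differences described above.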
[Fig. 1 (caption not recovered). Panel (a): metabotype criteria illustrated with metabolites A, B, C and D: presence-absence of metabolites, concentration levels (µg/L; major and minor metabolites), relative levels (ratios; e.g. 40% B, 80% B, 100% B) and profiling of metabolite sets. Fields shown: physiology, nutrition, ecology, clinics.]
2. Physiological metabotypes
2.1 Metabotypes based on metabolites’ occurrences
Presence-absence of metabolites is an efficient metabolomic parameter to characterize
different biological species or varieties known to represent different physiological systems
(Fig. 2): for instance, benzoic acid is metabolized almost entirely to hippuric acid by
primates, rodents and rabbits (mammals) (Fig. 2a). However, it is excreted unchanged and
as glucuronide by insects, birds and reptiles (Jones, 1982). Similarly, the excretion of
phenylacetic acid (Fig. 2b) as the parent compound or as glutamine, glycine or taurine
conjugates is species-dependent (Robertson et al., 2002): for instance, it is excreted as
phenylacetyl glutamine in humans and phenylacetylglycine in rats.
Comparison of 1H NMR spectra from control B6C3F1 mouse urines with those of control SD
rat urines revealed the presence of guanidinoacetic acid and trimethylamine in mouse but
their absence in rat (Bollard et al., 2005).
[Fig. 2 panels. (a) Urinary excretion of benzoic acid: as hippuric acid in primates, rodents and rabbits; as benzoic acid and glucuronyl benzoic acid in insects, birds and reptiles. (b) Urinary excretion of phenylacetic acid: as phenylacetylglutamine in humans and as phenylacetylglycine in rats.]
Fig. 2. Metabolic derivatives of benzoic acid (a) and phenylacetic acid (b) excreted in urine of
human and animal species and from which specific metabotypes can be defined.
Metabotype Concept: Flexibility, Usefulness and Meaning in Different Biological Populations 135
[Fig. 3, panels (j) and (k) (caption not recovered): urinary betaine and urinary trimethylamine-N-oxide, both marked as lower.]
Differences in the concentrations of stool short-chain fatty acids (SCFA) between lean,
overweight and obese human subjects have been shown to be important (Schwiertz et al.,
2010): the mean total SCFA concentration in fecal samples of obese volunteers was
more than 20% higher than that of lean volunteers. The highest increase was seen for
propionate (Fig. 3e), at 41%.
Age is known to result in numerous physiological changes that are reflected by physical
and metabolic changes:
Previous studies have shown that ageing rats decrease their excretion of citrate and
2-oxoglutarate while increasing their taurine and creatinine output (Fig. 3f-i) (Bell et al.,
1991). The increase in creatinine with age (up to 6 months) may be associated with the
increased output from the muscle of larger rats as well as an age-related increase in the
glomerular filtration rate (the rat kidney becomes fully functional at 3 months).
Moreover, in young rats (1 month or less), the urinary excretion of trimethylglycine (betaine)
and trimethylamine-N-oxide was higher than in older rats (Fig. 3j, k). The increased level of
betaine in the urine of young rats may be due to high choline levels (Bell et al., 1991). Betaine
(or N-trimethylglycine) results from the oxidation of choline; it represents a reservoir of methyl
groups and plays the role of methyl donor in the synthesis of methionine in mammals.
In the plant world, secondary metabolites show significant variations in relation to species,
age and maturity level: for instance, of two birch species, Silver Birch (Betula pendula)
does not emit sesquiterpenes (SQT), while Downy Birch (B. pubescens) does (Hakola et al.,
2001). Moreover, older B. pubescens trees emitted greater quantities and higher proportions
of SQT than younger ones. In the common snapdragon, Antirrhinum majus, the emission of
methyl benzoate (MeBA) increases during the pollination period (Dudareva et al., 2000); this may
serve as a guide for bees to find their way inside the flower. After pollination, emission of
MeBA decreases dramatically.
SCFA are produced by the intestinal microbiota, most of whose bacteria belong to the phyla Firmicutes, Bacteroidetes, Actinobacteria, Proteobacteria and Verrucomicrobia (Zoetendal et al., 2008). Some phyla are characterized by a high production of particular metabolites: for instance, the Bacteroidetes phylum produces high levels of acetate and propionate, whereas several members of the Firmicutes phylum produce large amounts of butyrate (Maslowski et al., 2009).
[Figure graphic not reproduced; caption not recovered. Recoverable panel labels: (a) urinary ratio of trimethylamine-N-oxide and trimethylamine (labels: out of menstruation, menstruation; increase, decrease); (b) fecal ratio of propionate and butyrate (label: obese subjects); (c) urinary ratio of 6β-hydroxycortisol and cortisol (labels: increase, decrease; male Caucasians).]
[Fig. 5 graphic not reproduced. Two panels, "Metabolomic profile of Alpk:ApfCD mouse strain" and "Metabolomic profile of C57BL10J mouse strain", each with bars for creatinine, trimethylamine, 2-oxoglutarate, citrate, dimethylamine, trimethylamine-N-oxide, guanidinoacetic acid and taurine.]
Fig. 5. Metabolomic profiles representing two genetic strains of mice on the basis of relative levels of several metabolites. Bar heights indicate relatively higher or lower concentrations depending on the mouse strain (Gavaghan et al., 2000).
Metabolic phenotype analysis has been applied to characterize particular laboratory animal varieties, including "germfree" (GF) specimens. GF is the highest quality level of laboratory animals, in which no microorganisms are detectable, in contrast to the animals commonly known as "SPF", which are merely free of specific pathogens. Germfree animals are especially useful in research on genetic engineering, cancer, normal intestinal flora, immunology and nutrition.
Aqueous extract profiles of gut tissues from GF mice were markedly different from those of conventional mice (Claus et al., 2008) (Fig. 6):
i. The metabolite profile of the duodenum from GF mice was mainly characterized by higher levels of tauro-conjugated bile acids (TCBAs) and alanine and lower levels of glycerophosphocholine (GPC) (Fig. 6a) when compared with conventional mice.
ii. The jejunal tissue of the GF group had higher levels of creatine and TCBAs and lower levels of tyrosine (Fig. 6b).
iii. The ileum of GF mice was characterized by a higher level of TCBAs and lower levels of glutamate, fumarate, lactate, phosphocholine and alanine when compared with the ileum from conventional mice (Fig. 6c).
iv. The metabolic profile of the colon from GF mice revealed a high level of a complex carbohydrate identified as raffinose, and lower levels of lactate, creatine, 5-aminovalerate, propionate, glutamine, myo-inositol, scyllo-inositol, GPC, phosphocholine, choline, formate, uracil and fumarate (Fig. 6d) (Monero and Arus, 1996).
Metabolic profiles of the ileum and particularly the colon in GF mice were markedly more affected than those of the duodenum and jejunum. This reflects the higher microbial loads found in the ileum and colon (Dunne, 2001). The lower levels of choline and its phosphorylated derivatives, GPC and phosphocholine (Fig. 6a, c-d), were reported to be likely due to a disturbance of the colonocyte membrane in GF mice (Claus et al., 2008). The accumulation of the trisaccharide raffinose may also be a consequence of this disruption. In GF animals, raffinose seems to be able to cross the epithelial membrane and accumulate in colonocytes, where it induces a rise in osmotic pressure. This phenomenon provokes a well-known signaling cascade that leads to the release of the mobile osmolytes GPC, myo-inositol and scyllo-inositol.
Beyond static analysis, kinetic metabolic profiling is applied in chronobiology and pharmacokinetics in relation to intrinsic or extrinsic factors (e.g. diurnal variations):
In the SD rat, 1H-NMR profiles of urine samples collected during the day showed lower levels of hippurate, taurine and creatinine, together with elevated levels of glucose, succinate, dimethylglycine, glycine, creatine and betaine, compared with urine collected during the night (Bollard et al., 2001).
Male rats secrete growth hormone in an "on-off" episodic rhythm, with periods between pulses when the hormone is undetectable. Growth hormone secretion in the female rat is "continuous", since the hormone is always detectable (Czerniak, 2001).
In women, plasma cortisol stimulated by synacthen (synthetic ACTH) showed kinetic profiles that depended on the level of obesity (Fig. 7) (Semmar et al., 2005a): the secretion and elimination of cortisol were most rapid and highest in the most obese subjects, followed by intermediate obese and then non-obese subjects.
[Fig. 6 graphic not reproduced. Panels: (a) duodenum, (b) jejunum, (c) ileum, (d) colon. Metabolite labels recoverable from the figure: raffinose, creatine, glutamine, phosphocholine, choline, 5-aminovalerate, fumarate, lactate, propionate, uracil, myo-inositol, scyllo-inositol.]
Fig. 6. Metabolomic profiling of different gut tissues of germ free (GF) mice (Monero and
Arus, 1996). Bar heights are indicative of relatively higher or lower concentrations in GF
compared with conventional mice (details are given in text).
[Fig. 7 graphic not reproduced. Axes: time (min), 0 to 60; plasma cortisol concentration (ng/mL), 50 to 250. Curves are labelled (1), (2) and (3).]
Fig. 7. Kinetic profiles of plasma cortisol concentrations in three populations of women with different body weight levels, highlighting a compensatory process between secretion and elimination in relation to obesity level. (1), (2) and (3) correspond to non-obese, intermediate obese and extreme obese populations, respectively (Semmar et al., 2005).
3. Dietary metabotypes
3.1 Metabotypes based on occurrence of metabolites
Fruits and vegetables can generally be characterized by the occurrence of specific or abundant secondary metabolites belonging to the flavonoids and terpenoids. For instance, within the flavonoid class, flavonols, flavones, flavanones, isoflavones, flavanols and anthocyanins are widely present in onions, parsley, citrus fruits, leguminous plants, green tea and blackberry, respectively (Fig. 8) (Majewska et al., 2011; Holden et al., 2005; Kaufman et al., 1997).
Among the flavonols, quercetin is widely present in the plant world. It occurs in different glycosidic forms, quercetin-3-rhamnoglucoside (rutin) being one of the most widespread (Fig. 9d). The different quercetin glycosides have been found to be good markers in food quality control: in onions, quercetin is bound to one or two glucose units to give quercetin-4'-glucoside and quercetin-3,4'-diglucoside (Fig. 9c); apples and berries, however, are characterized by the occurrence of quercetin-3-galactoside and quercetin-3-arabinoside, respectively (Fig. 9b, a) (Kühnau, 1976; Zheng et al., 2003).
[Fig. 8 graphic not reproduced. Recoverable plant labels: lemons, grapefruits, tomatoes; celery, chicory, lettuce.]
Fig. 8. Metabolic characterization of different dietary plants on the basis of high occurrences
of produced flavonoid classes in their tissues.
[Fig. 9 graphic not reproduced. Structures shown: quercetin (positions 3, 5, 7 and 4' numbered), quercetin-3-arabinoside, quercetin-3-galactoside, quercetin-4'-glucoside, quercetin-3,4'-glucoside and quercetin-3-(rhamnosyl-6→1-glucoside) (rutin).]
Fig. 9. Metabolic characterization of some dietary plants on the basis of abundant quercetin
glycosides in their tissues.
Flavanones are flavonoids that are particularly abundant in citrus and vary qualitatively with fruit type (Mouly et al., 1998; Kawaii et al., 1999; Gattuso et al., 2007): lemon (Citrus limon) can be distinguished by its production of eriocitrin and hesperidin (Fig. 10a), whereas in grapefruit naringin predominates, accompanied by narirutin (Fig. 10c). In oranges (Citrus sinensis) and mandarins (Citrus reticulata), hesperidin is the major flavanone, accompanied by narirutin (Fig. 10b).
[Fig. 10 graphic not reproduced. Structures shown: the aglycones eriodictyol, hesperetin and naringenin, and the glycosides eriocitrin, hesperidin, narirutin and naringin.]
Fig. 10. Metabolomic distinction between different Citrus species based on major flavanone
glycosides in fruit juices.
[Figure graphic not reproduced; caption not recovered. Recoverable labels: Input; Output; 270 g onion; 275 µmol flavonol glucosides; Q-(4'-O-glucoside); Q-3,4'-O-glucoside; others; Source; Content; Prediction of dietary source (Diet control).]
In human subjects, plasma quercetin was found to be a good marker of dietary intake because its concentration increases with the ingested dose (Radtke et al., 2002). In a strictly controlled dietary intervention study, 77 healthy human subjects consumed either 170 or 850 g of fruits, vegetables and berries daily. Quercetin intake was calculated to be 3 and 24 mg/d on the respective diets. The mean ± SD plasma quercetin concentration was 78 ± 56 nmol/L during the habitual diet; it decreased to 70% of this value during the low-vegetable diet and increased to 170% during the high-vegetable diet (Freese et al., 2002) (Fig. 12).
[Fig. 12 graphic not reproduced. Recoverable labels: 170 g (fruits, vegetable, berries); 3 mg quercetin intake; 55 nmol/L plasma quercetin.]
Fig. 12. Increase in plasma quercetin concentration (nmol/L) in relation to diet mass (g) and
its total quercetin content (mg) in human population (Freese et al., 2002).
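The percentages reported by Freese et al. (2002) can be converted into approximate absolute concentrations. The following is a minimal sketch, assuming that the 70% and 170% figures are both relative to the habitual-diet mean of 78 nmol/L; the function name is illustrative and does not come from the source.

```python
def scaled_concentration(baseline_nmol_l: float, percent_of_baseline: float) -> float:
    """Return the concentration that corresponds to a percentage of a baseline."""
    return baseline_nmol_l * percent_of_baseline / 100.0


# Mean plasma quercetin during the habitual diet (Freese et al., 2002).
HABITUAL_MEAN = 78.0  # nmol/L

# Assumption: both percentages refer to the habitual-diet mean.
low_diet = scaled_concentration(HABITUAL_MEAN, 70.0)    # low-vegetable diet
high_diet = scaled_concentration(HABITUAL_MEAN, 170.0)  # high-vegetable diet

print(f"low-vegetable diet:  {low_diet:.1f} nmol/L")
print(f"high-vegetable diet: {high_diet:.1f} nmol/L")
```

Under this assumption the low-vegetable value is about 55 nmol/L, which agrees with the plasma concentration shown in Fig. 12 for the 170 g diet, and the high-vegetable value is about 133 nmol/L.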
Ingestion of 200 mL of coffee has been reported to increase the plasma conjugated caffeic
acid level in human subjects (Nardini et al., 2002).
Following consumption of both green and black tea, human urine samples showed significant increases in the levels of hippuric acid and 4-hydroxyphenylacetic acid (Mulder et al., 2005).
Excretion of creatinine in the urine of rats, per unit of skeletal muscle mass, was found to be
promoted by food deprivation (Rikimaru et al., 1989).
[Fig. 13 graphic not reproduced. Panels: (a) daidzein analyzed in feces; (b) daidzein analyzed in urine. Structures shown: daidzein and genistein.]
Fig. 13. Metabolomic classification of human subjects into two excretion levels of isoflavones (daidzein and genistein) according to the percentages of faecal and urinary isoflavones relative to the dietary dose. Low and high excreters had low and high excretion percentages of isoflavones, respectively.
[Fig. 14 graphic not reproduced. Recoverable labels: intake of strawberries (250 g) and walnuts (35 g); amounts of 190 mg, 422 mg and 191 mg; ellagitannins; hydrolysis; glucuronidation; urine; 16.6 %, 3.4 % and 2.8 % of ingested ellagitannins.]
Fig. 14. Characterization of three tannin-rich diets by the excretion percentages of a urinary
metabolite (urolithin B-3-O-glucuronide) referred to the ingested dose of ellagitannins
(Cerda et al., 2005).
[Fig. 15 graphic not reproduced. Recoverable labels: R-(+)-enantiomer and S-(−)-enantiomer; conventional and organic orange juice, grapefruit juice and tomato juice.]
Fig. 15. Metabolomic characterization of different fruit juices by their chiral flavanone
profiles.
Human subjects who drank two cups of coffee three times at 4-h intervals had urinary profiles containing ferulic, isoferulic, dihydroferulic, 3-methoxy-4-hydroxybenzoic, hippuric and 3-hydroxyhippuric acids (Rechner et al., 2001).
After ingestion of 270 g of lightly fried onion by human subjects, plasma and urine samples
collected over 24h showed very different metabolic profiles of concentrations (Fig. 16)
(Mullen et al., 2006):
Fig. 16. Metabolic profiles of conjugated quercetin metabolites in plasma (a) and urine (b) following ingestion of lightly fried onion (270 g) by healthy human subjects (Mullen et al., 2006).
The main plasma metabolite, quercetin-3’-O-sulfate, was excreted only in trace quantities in
urine while isorhamnetin-O-glucuronide and quercetin-O-diglucuronide that were minor
components in plasma were major urinary metabolites (Fig. 16). Several other metabolites,
including quercetin-3’-O-glucuronide and isorhamnetin-4’-O-glucuronide, which were
present in trace quantities or absent from plasma were excreted in urine in substantial
amounts.
In two separate human studies, following the consumption of 200 g of strawberries (Mullen et al., 2008) or 200 g of blackberries (Felgines et al., 2005), the urinary contents were
Fig. 17. Variations in the concentrations of short-chain fatty acids (SCFA) and lactate in stool samples of obese humans in relation to diets with three carbohydrate levels (Duncan et al., 2007). (a), (b) and (c) correspond to high-, medium- and low-carbohydrate diets, respectively.
A previous study showed that SD rats deprived of water for 48 h had elevated levels of creatinine and depleted levels of taurine, hippurate, 2-oxoglutarate, succinate and citrate (Clausing and Gottschalk, 1989). Water deprivation has a direct effect on osmoregulation, implying variations in the levels of osmoregulators such as taurine.
4. Clinical metabotypes
Pathologies are known to induce changes in the concentrations, regulation ratios and overall profiles of different metabolites, which can be used to diagnose or characterize different diseases. Some examples are given to illustrate the value of different metabolomic criteria in clinical cases.
[Figure graphic not reproduced; caption not recovered. Recoverable labels: (a) 2-bromoethanamine, propyleneimine, Mastomys natalensis (mouse), kidney disturbance, taurine; (b) hydrazine, SD rat, B6C3F1 mouse, liver (steatosis), creatine, TCA intermediate, 2-aminoadipate; (c) carbon tetrachloride, thioacetamide, allyl alcohol, rats, reduced liver function, taurine, creatine.]
developing high-grade prostate cancer for men with low serum cholesterol levels (Platz et al., 2009). Conversely, high levels of cholesterol in metastatic bone tissue (>70 mg/g tissue) were found to be indicative of prostate cancer, compared with normal bone tissue (50-60 mg cholesterol/g tissue) and with bone metastases from other cancers (<70 mg/g) (Thysell et al., 2010).
Fig. 19. Metabolomic profile distinguishing breast tumor cells (MDA-MB-435) from normal cells (MCF-10A), based on the ratios of metabolite concentrations MDA-MB-435/MCF-10A. Up-regulated metabolites are indicated by ratios >1, and down-regulated ones by ratios <1.
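The ratio criterion used in Fig. 19 can be sketched in a few lines. This is only an illustration under stated assumptions: the metabolite names and concentrations are hypothetical placeholders rather than data from the source, and the function names are chosen here for clarity.

```python
def concentration_ratios(tumor: dict[str, float], normal: dict[str, float]) -> dict[str, float]:
    """Ratio tumor/normal for every metabolite measured in both cell lines."""
    ratios = {}
    for metabolite, tumor_level in tumor.items():
        normal_level = normal.get(metabolite)
        if normal_level is None or normal_level <= 0:
            continue  # the ratio is undefined without a positive reference level
        ratios[metabolite] = tumor_level / normal_level
    return ratios


def regulation(ratio: float) -> str:
    """Label a ratio following the caption: >1 means higher in tumor cells."""
    if ratio > 1:
        return "higher in tumor cells"
    if ratio < 1:
        return "lower in tumor cells"
    return "unchanged"


# Hypothetical concentrations (arbitrary units); not data from the source.
tumor_cells = {"metabolite_A": 6.0, "metabolite_B": 1.5, "metabolite_C": 2.0}
normal_cells = {"metabolite_A": 2.0, "metabolite_B": 3.0, "metabolite_C": 2.0}

ratios = concentration_ratios(tumor_cells, normal_cells)
labels = {name: regulation(value) for name, value in ratios.items()}
for name in ratios:
    print(f"{name}: ratio {ratios[name]:.2f} -> {labels[name]}")
```

In practice a tolerance band around 1 would probably be needed before calling a metabolite up- or down-regulated, since measured ratios are rarely exactly 1; the strict comparison is kept here only to mirror the caption.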
[Fig. 20 graphic not reproduced. Recoverable labels: O-C link; C-C link; flavonol-3,7-O-diglycoside; flavonol-3-O-tetraglycoside; (c) Fabaceae; (e) Asteraceae; subfamily level: Lamioideae, Nepetoideae; rosmarinic acid (R.A.), no R.A.; tribe level: Saturejeae, other tribes; no HMF.]
Fig. 20. Phytochemical characterization of different plant taxa on the basis of specific or abundant phenolic compounds in their tissues.
Apart from the type of link between aglycone and sugar, the degree of glycosylation provides a good chemotaxonomic criterion for the general characterization of plant families. For instance, leaf tissues of Liliaceae species (monocotyledons) were characterized by the presence of di- and tri-O-glycosides of flavonoids, with monoglycosides occurring rarely (Williams, 1975). Flavonol 3,7-diglycosides seem to be common constituents of the Liliaceae (Fig. 20b) (Budzianowski, 1991; Williams, 1975). The family Fabaceae (dicotyledons) has been shown to produce multiglycosylated flavonols (Fig. 20c) (Semmar, 2010).
Within the Lamiaceae family (dicotyledons), the presence or absence of rosmarinic acid (Fig. 20d) has been shown to be an excellent chemotaxonomic marker because it is present in the subfamily Nepetoideae and absent from the subfamily Lamioideae (Janicsák et al., 1999; Harborne, 1966b). Members (tribes and genera) of these two subfamilies have been phytochemically characterized by the presence or absence of hydroxyl and methyl groups substituted on the A-ring of flavone aglycones. For instance, the presence of 5,7-dihydroxy-6-methoxyflavones with a substituted B-ring is characteristic of the subfamily Nepetoideae (Fig. 20d), particularly of Salvia, Rosmarinus and Ocimum species (Tomás-Barberán and Wollenweber, 1990); in Lamioideae, a 5,7-dihydroxy-6-methyl ether flavone has been found in the genus Scutellaria, but with an unsubstituted B-ring (Fig. 20d). Moreover, in the subfamily Nepetoideae, the genera Thymus, Satureja, Micromeria, Acinos, Calamintha, Origanum and Mentha were characterized by the production of 5,6-dihydroxy-7,8-dimethoxyflavone (Tomás-Barberán and Wollenweber, 1990); all these genera belong to the tribe Saturejeae (Fig. 20d).
Among the dicotyledons, the family Asteraceae has been characterized by the production of aurones and of quercetagetin, a flavonol found almost exclusively in this family (Fig. 20e) (Iwashina, 2000).
In Tulipa (Liliaceae), the flower colors are fundamentally defined by the anthocyanidin type:
orange and flesh pink colors are linked to pelargonidin, black-red and red-orange are due to
cyanidin, and black-blue-violet-purple are governed by delphinidin (Fig. 21) (Shibata and
Ishikura, 1960; Torskangerpoll et al., 2005).
Moreover, in tulips, the shade of the flower tepals was shown to depend on certain chemical substitutions of anthocyanins: substitution of anthocyanins by aromatic acyl groups has been reported to be responsible for a bluing effect (Torskangerpoll et al., 2005). Also, combinations of cyanidin and pelargonidin with carotenoids generally induced attractive red and orange colours (van Eijk et al., 1987).
Fig. 21. Variation of tulip flower colours in relation to occurrence and abundance of
anthocyanidin type.
SQT levels have also been found to be biomarkers of diurnal and seasonal rhythmicity: β-caryophyllene (β-Car) emissions from various Citrus varieties were found to increase during the morning, generally peaking around noon (Ciccioli et al., 1999). In potatoes, emitted SQT increased steadily throughout the day and peaked in the afternoon (Agelopoulos et al., 2000). In Finnish Scots pine, β-Car emissions exhibited significant seasonal variability, with maximum emissions observed during the summer months (Tarvainen et al., 2005).
Volatile terpenes (isoprenes, monoterpenes and sesquiterpenes) were found to be good markers of water stress in some plants. In Pinus halepensis (Aleppo pine), water stress induced monoterpene emissions by the leaves (Ormeño et al., 2007a). In young orange trees, severe drought reduced β-caryophyllene emissions to 6% of pre-drought levels, but emissions were unaffected by mild drought conditions (Hansen and Seufert, 1999).
Several studies have reported relationships between metabolic variability in plants and soil composition: higher concentrations of aluminium in soil resulted in an increase in the phenolic compounds exuded by the roots of maize (Kidd et al., 2001). An aluminium-resistant variety of maize exuded a 15-fold higher level of flavonoids when pre-treated with silicon than when no such pre-treatment was applied. In Scots pine (Pinus sylvestris L.), trees exposed to nickel had higher concentrations of condensed tannins than controls (Roitto et al., 2005). Calcareous soils stimulated emissions of -humulene from Aleppo pine, whereas siliceous soils favored -humulene and -bourbonene from Rock Rose (Ormeño et al., 2007b).
In the plant world, emissions of some volatile compounds were found to be positively correlated with biotic disturbance such as parasitism or herbivory: in Black Sage, SQT emissions increase significantly under infestation by aphids (Arey et al., 1995). In corn seedlings, SQT emissions increase in response to caterpillar feeding, and it has been demonstrated that such emissions attract wasps that parasitize the caterpillars (Turlings et al., 1995).
Fig. 22. Characterization of different plant species by the percentages (%) of some volatile sesquiterpenes (SQT) (β-caryophyllene, -humulene, -farnesene, germacrene D) relative to the overall SQT emissions.
SQT ratios showed that plant species can be characterized by the dominance or relatively high levels of some SQT: sunflower, hornbeam and citrus species are highly productive of β-caryophyllene (β-Car) (>90%). The SQT pool of gray pine seems to be dominated by high relative levels of -farnesene (77%); corn shows a wide inter-individual variation in -farnesene, ranging from 0 to 70%. Marsh elder can be characterized by germacrene D, which represents 48 to 54% of the overall emitted SQT. Trembling aspen seems to emit relatively more -humulene, with high inter-individual variability (3-36%). Apart from -humulene, -humulene (not presented) is also frequent in nature and was reported to represent 55-57% of SQT in red and white pines (Duhl et al., 2008).
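The percentages quoted above are shares of each compound in the total SQT emission. The following is a minimal sketch of that calculation, assuming emission rates expressed in a common unit; the emission values and the names "compound X" and "compound Y" are hypothetical and serve only as an illustration.

```python
def relative_percentages(emissions: dict[str, float]) -> dict[str, float]:
    """Express each compound as a percentage of the total emission."""
    total = sum(emissions.values())
    if total <= 0:
        raise ValueError("total emission must be positive")
    return {name: 100.0 * rate / total for name, rate in emissions.items()}


def dominant_compound(emissions: dict[str, float]) -> tuple[str, float]:
    """Return the compound with the largest share and that share in percent."""
    shares = relative_percentages(emissions)
    name = max(shares, key=shares.get)
    return name, shares[name]


# Hypothetical emission rates (arbitrary units); not data from the source.
plant_emissions = {"germacrene D": 10.2, "compound X": 6.0, "compound Y": 3.8}

shares = relative_percentages(plant_emissions)
top_name, top_share = dominant_compound(plant_emissions)
print(f"dominant SQT: {top_name} ({top_share:.0f}% of total SQT)")
```

With these made-up rates, germacrene D accounts for about 51% of the total, a share that would fall within the 48 to 54% range reported for marsh elder.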
In tulips (Liliaceae), flower colors gain further variability governed by the relative levels of mixed anthocyanins (Shibata and Ishikura, 1960): cultivars with "magenta nuances" showed an anthocyanin content in which the relative amount of cyanidin-3-rutinoside increased at the expense of delphinidin-3-rutinoside. Garden varieties with a blue nuance (black, black-purple, fade-sky, violet and purple) have a relatively high content of the delphinidin type in their tepals (i.e. delphinidin accounted for more than 50% of the total anthocyanin content). Orange-colored tepals were to a large extent correlated with high relative amounts of pelargonidin derivatives at the expense of the two other aglycone types.
Apart from the chemotaxonomic characterization of plants, metabolic ratios were analysed
in relation to different environmental conditions to characterize adaptive responses of
biological species:
[Fig. 23 graphic not reproduced. Recoverable labels: dihydrokaempferol, kaempferol and quercetin; enzyme abbreviations FS, FH, GT, MT, AT, ApT, RhT and AcT; compound numbers 1 to 14; Acyl. DiG. Rh.; Chemotypes I, II, III and IV.]
Fig. 23. Four chemotypes of Astragalus caprinus (Fabaceae) based on different metabolic
regulations between flavonol glycosides pathways. FS: Flavonol Synthase; FH: Flavonol
Hydroxylase; Acyl. Dig. Rh.: Acylated Diglycosyl of Rhamnazin or Rhamnocitrin
6. References
Arey, J., Crowley, D. E., Crowley, M., Resketo, M., and Lester, J. (1995). Hydrocarbon
emissions from natural vegetation in California’s South Coast Air Basin. Atmos.
Environ., Vol. 29, pp. 2977–2988, 13522310
Agelopoulos, N. G., Chamberlain, K. & Pickett, J. A. (2000). Factors affecting volatile
emissions of intact Potato plants, Solanum tuberosum: variability of quantities and
stability of ratios. J. Chem. Ecol., Vol. 26, pp. 497–511, 0098-0331.
Barderas, M.G., Laborde, C.M., Posada, M., de la Cuesta, F., Zubiri, I., Vivanco, F. &
Alvarez-Llamas, G. (2011). Metabolomic Profiling for Identification of Novel
Potential Biomarkers in Cardiovascular Diseases. J. Biomed. Biotechnol. 2011, ID
790132, pp. 1-9, 11107243
Bell, J.D., Sadler, P.J., Morris, V.C. & Levander, O.A. (1991). Effect of aging and diet on
proton NMR spectra of rat urine. Magn. Reson. Med., Vol. 17, No. 2, pp. 414–422,
15222594
Bianchini, F., Hall, J., Donato, F. & Cadet, J. (1996). Monitoring urinary excretion of 5-
hydroxymethyluracil for assessment of oxidative DNA damage and repair.
Biomarkers, Vol. 1, pp. 178-184, 1354-750X.
Bishop, J.H. & Green, R. (1980). Effects of pregnancy on glucose handling by rat kidneys. J.
Physiol., Vol. 307, pp. 491–502, 00223751
Bollard, M.E., Holmes, E., Lindon, J.C., Mitchell, S.C., Branstetter, D., Zhang, W. &
Nicholson, J.K. (2001). Investigations into biochemical changes due to diurnal variation
and estrus cycle in female rats using high resolution 1H NMR spectroscopy of
urine and pattern recognition. Anal. Biochem., Vol. 295, pp. 194–202, 00032697
Bollard, M.E., Keun, H., Ebbels, T., Beckonert, O., Antti, H., Lindon, J.C., Holmes, E. &
Nicholson, J.K. (2005). Comparative metabonomics of differential species toxicity of
hydrazine in the rat and mouse. Toxicol. Appl. Pharmac., Vol. 204, No 2, pp. 135-151,
0041008X
Budzianowski, J. (1991). Six flavonol glucuronides from Tulipa gesneriana. Phytochemistry
30, 1679-1682, 0031-9422.
Cerda, B., Tomas-Barberan, F.A. & Espin, J.C. (2005). Metabolism of antioxidant and
chemopreventive ellagitannins from strawberries, raspberries, walnuts, and oak-
aged wine in humans: identification of biomarkers and individual variability. J.
Agric. Food Chem., Vol. 53, pp. 227–235, 00218561
Ciccioli, P., Brancaleoni, E., Frattoni, M., Di Palo, V., Valentini, R., Tirone, G., Seufert, G.,
Bertin, N., Hansen, U., Csiky, O., Lenz, R. & Sharma, M. (1999). Emission of
reactive terpene compounds from orange orchards and their removal by within-
canopy processes, J. Geophys. Res., Vol. 104, pp. 8077–8094, 01480227
Claus, S.P., Tsang, T.M., Wang, Y., Cloarec, O., Skordi, E., Martin, F.P., Rezzi, S., Ross, A.,
Kochhar, S., Holmes, E. & Nicholson, J.K. (2008). Systemic multicompartmental
effects of the gut microbiome on mouse metabolic phenotypes. Molecular Systems
Biology, Vol. 4, No. 219, pp. 1-14, 17444292
Clausing, P. & Gottschalk, M. (1989). Effects of drinking water acidification, restriction of
water supply and individual cageing on parameters of toxicological studies in rats.
Z. Versuchstierkd., Vol. 32, No 3, pp. 129–134, 00443697
Czerniak, R. (2001). Gender-based differences in pharmacokinetics in laboratory animal
models. Int. J. Toxicol., Vol. 20, No. 3, pp. 161–163, 10915818
Davison, J.M. & Hytten, F.E. (1975). The effect of pregnancy on the renal handling of
glucose. Br. J. Obstet. Gynaecol., Vol. 82, No 5, pp. 374–381, 03065456
Dudareva, N., Murfitt, L.M., Mann, C.J., Gorenstein, N., Kolosova, N., Kish, C.M., Bonham,
C. & Wood, K. (2000). Developmental regulation of methyl benzoate biosynthesis
and emission in snapdragon flowers. Plant Cell, Vol. 12, pp. 949-961, 10404651
Duhl, T.R., Helmig, D. & Guenther, A. (2008). Sesquiterpene emissions from vegetation: a
review. Biogeosciences, Vol. 5, pp. 761–777, 17264170
Duncan, S.H., Belenguer, A., Holtrop, G., Johnstone, A.M., Flint, H.J. & Lobley, G.E. (2007).
Reduced Dietary Intake of Carbohydrates by Obese Subjects Results in Decreased
Concentrations of Butyrate and Butyrate-Producing Bacteria in Feces. Appl. Environ.
Microbiol.,Vol. 73, No. 4, pp. 1073-1078, 00992240
Dunne, C. (2001). Adaptation of bacteria to the intestinal niche: probiotics and gut disorder.
Inflamm. Bowel. Dis., Vol. 7, pp. 136–145, 10780998
Felgines, C., Talavéra, S., Texier, O., Gil-Izquierdo, A., Lamaison, J.-L. & Rémésy, C. (2005).
Blackberry anthocyanins are mainly recovered from urine as methylated and
glucuronidated conjugates in humans. J. Agric. Food Chem., Vol. 53, pp. 7721–7727,
00218561
Freese, R., Alfthan, G., Jauhiainen, M., Basu, S., Erlund, I., Salminen, I., Aro, A. & Mutanen,
M. (2002). High intakes of vegetables, berries, and apples combined with a high
intake of linoleic or oleic acid only slightly affect markers of lipid peroxidation and
lipoprotein metabolism in healthy subjects. Am. J. Clin. Nutr., Vol. 76, pp. 950-960,
00029165
Gaillard, Y., Vayssette, F., Balland, A., Pépin, G. (1999). Gas chromatographic-tandem mass
spectrometric determination of anabolic steroids and their esters in hair:
application in doping control and meat quality control. J. Chromatogr. B, Vol. 735,
pp. 189-205, 15700232
Gattuso, G., Barreca, D., Gargiulli, C., Leuzzi, U. & Caristi, C. (2007). Flavonoid Composition
of Citrus Juices. Molecules, Vol. 12, pp. 1641-1673, 14203049
Gavaghan, C.L., Holmes, E., Lenz, E., Wilson, I.D. & Nicholson, J.K. (2000). An NMR-based
metabonomic approach to investigate the biochemical consequences of genetic
strain differences: application to the C57BL10J and Alpk:ApfCD mouse. FEBS Lett.,
Vol. 484, No. 3, pp. 169–174, 00145793
Gouinguené, S. P. & Turlings, T. C. J. (2002). The effects of abiotic factors on induced volatile
emissions in corn plants. Plant Phys., Vol. 129, pp. 1296–1307, 00320889
Groopman, J.D., Wogan, G.N., Roebuck, B.D. & Kensler, T.W. (1994). Molecular biomarkers
for aflatoxins and their application to human cancer prevention. Cancer Res., Vol. 54,
pp. 1907-1911, 00085472
Hakola, H., Laurila, T., Lindfors, V., Hellén, H., Gaman, A., & Rinne, J. (2001). Variation of
the VOC emission rates of birch species during the growing season. Boreal Environ.
Res., Vol. 6, pp. 237–249, 12396095
Hamilton, H.E., Wallace, J.E., Shimek E.L. Jr, Land, P., Harris, S.C. & Christenson J.G. (1977).
Cocaine and benzoylecgonine excretion in humans. J. Forensic Sci., Vol. 22, No. 4,
pp. 697-707, 00221198
Hansen, U. & Seufert, G. (1999). Terpenoid emission from Citrus sinensis (L.) OSBECK under
drought stress. Phys. Chem. Earth (B), Vol. 24, pp. 681–687, 14747065
Harborne, J.B. (1966b). Caffeic acid Ester distribution in higher plants. Z. Naturforsch. 21b,
604-605.
Hemminki, K. & Vodicka, P. (1995). Styrene: from characterization of DNA adducts to
application in styrene-exposed lamination workers. Toxicol. Lett., Vol. 77, pp. 153-
161, 03784274
Hodgkinson, A. (1962). Citric acid excretion in normal adults and in patients with renal
calculus. Clin. Sci., Vol. 23, pp. 203–205, 01435221
Holden, J.M., Bhagwat, S.A., Haytowitz, D.B., Gebhardt, S.E., Dwyer, J.T., Peterson, J.,
Beecher, G.R., Eldridge, A.L. & Balentine, D. (2005). Development of a database of
critically evaluated flavonoids data: application of USDA’s data quality evaluation
system. J. Food Comp. Anal., Vol. 18, pp. 829–844, 08891575
Holmes, E., Bonner, F.W. & Nicholson, J.K. (1995). Comparative studies on the
nephrotoxicity of 2-bromoethanamine hydrobromide in the Fischer 344 rat and the
multimammate desert mouse (Mastomys natalensis). Arch. Toxicol., Vol. 70, No 2, pp.
89–95, 03405761
Holmes, E., Bonner, F.W. & Nicholson, J.K. (1996). Comparative biochemical effects of low
doses of mercury II chloride in the F344 rat and the Multimammate mouse
(Mastomys natalensis). Comp. Biochem. Physiol., Vol. 114C, No. 1, pp. 7–15, 15320456
Holmes, E., Bonner, F.W. & Nicholson, J.K. (1997). 1H NMR spectroscopic and
histopathological studies on propyleneimine-induced renal papillary necrosis in
the rat and the multimammate desert mouse (Mastomys natalensis). Comp. Biochem.
Physiol. C. Pharmac. Toxicol. Endocrinol., Vol. 116, No. 2, pp. 125–134, 07428413
Holmes, E., Nicholls, A.W., Lindon, J.C., Ramos, S., Spraul, M., Neidig, P., Connor, S.C.,
Connelly, J., Damnent, S.J., Haselden, J. & Nicholson, J.K. (1998). Development of a
model for classification of toxin-induced lesions using 1H NMR spectroscopy of
urine combined with pattern recognition. NMR Biomed., Vol. 11, No. 4–5, pp. 235–
244, 09523480
Holmes, E., Nicholls, A.W., Lindon, J.C., Connor, S.C., Connelly, J.C., Haselden, J.N.,
Damment, S.J., Spraul, M., Neidig, P. & Nicholson, J.K. (2000). Chemometric
models for toxicity classification based on NMR spectra of biofluids. Chem. Res.
Toxicol., Vol. 13, No. 6, pp. 471–478, 0893228X
Holmes, E., Wilson, I.D. & Nicholson, J.K. (2008). Metabolic Phenotyping in Health and
Disease. Cell, Vol. 134, pp. 714-717, 00928674
Hugget, R.J., Kimerle, R.A., Mehrle, P.M. & Bergman, H.L. (1992). Biomarkers. Biochemical,
Physiological and Histological Markers of Anthropogenic Stress. Lewis, Chelsea,
MI. ISBN: 978-0873715058.
Iwashina, T. (2000). The structure and distribution of the flavonoids in plants. J. Plant Res.,
Vol. 113, pp. 287-299, 09189440
162 Metabolomics
Janicsák, G., Máthé, I., Miklóssy-Vári, V. & Blunden, G. (1999). Comparative studies of the
rosmarinic and caffeic acid contents of Lamiaceae species. Biochem. Syst. Ecol., Vol.
27, pp. 733-738, 03051978
Jones, A.R. (1982). Some observations on the urinary excretion of glycine conjugates by
laboratory animals. Xenobiotica, Vol. 12, pp. 387–395, 00498254
Joossen, J.V. (1988). Mechanisms of hypercholesterolemia and atherosclerosis. Acta Cardiol.
Suppl. Vol. 29, pp. 63–83
Kaufman, P.B., Duke, J.A., Brielmann, H., Boik, J. & Hoyt, J.E. (1997). A Comparative Survey
of Leguminous Plants as Sources of the Isoflavones, Genistein and Daidzein:
Implications for Human Nutrition and Health. The Journal of Alternative and
Complementary Medicine, Vol. 3, No. 1, pp. 7-12, 10755535
Kawaii, S., Tomono, Y., Katase, E., Ogawa, K. & Yano, M. (1999). Quantitation of flavonoid
constituents in citrus fruits. J. Agric. Food Chem., Vol. 47, No. 9, pp. 3565-3571,
00218561
Kidd, P.S., Llugany, M., Poschenrieder, C., Gunsé, B & Barceló, J. (2001). The role of root
exudates in aluminium resistance and silicon-induced amelioration of aluminium
toxicity in three variety of maize (Zea mays L.). J Exp Bot, Vol. 52, pp. 1959-1967,
00220957
Kintz, P., Cirimele, V., Jeanneau, T. & Ludes, B. (1999). Identification of testosterone and
testosterone esters in human hair. J. Anal. Toxicol., Vol. 23, pp. 352-356, 01464760
Konishi, H., Tanaka, K., Minouchi, T. & Yamaji, A. (2004). Urinary 6β-hydroxycortisol/17-
hydroxycorticosteroids ratio as a measure of hepatic CYP3A4 capacity after
enzyme induction. Ann Clin Biochem, Vol. 41, pp. 335-337, 00045632
Krol, M., Gray, G.R., Hurry, V.M., Oquist, G., Malek, L. & Huner N.P.A. (1995). Low-
temperature stress and photoperiod affect an increased tolerance to photoinhibition
in Pinus banksiana seedlings. Can. J. Bot., Vol. 73, pp. 1119-1127, 00084826
Kühnau, J. (1976). The flavonoids. A class of semi-essential food components: their role in
human nutrition. World Rev. Nutr. Diet., Vol. 24, pp. 117-191, 00842230
Leyva A., Jarillo J. A., Salinas J. & Martinez-Zapater J.M. (1995). Low temperature induces
the accumulation of phenylalanine ammonia-lyase and chalcone synthase mRNAs
of Arabidopsis thaliana in a light-dependent manner. Plant Physiol., Vol. 108, pp. 39-
46, 00320889
Liggins, J., Bluck, L.J., Runswick, S., Atkinson, C., Coward, W.A. & Bingham, S.A. (2000).
Daidzein and genistein contents of vegetables. Br. J. Nutr., Vol. 84, pp. 717- 25,
00071145
Logue, B.A., Kirschten, N.P., Petrikovics I., Moser, M.A., Rockwood, G.A., Baskin, S.I. (2005).
Determination of the cyanide metabolite 2-aminothiazoline-4-carboxylic acid in
urine and plasma by gas chromatography-mass spectrometry. J Chromatogr B, Vol.
819, No. 2, pp. 237-244, 15700232
Lutz, U., Bittner, N., Ufer, M. & Lutz, W. K. (2010). Quantification of cortisol and 6 beta-
hydroxycortisol in human urine by LC MS/MS, and gender-specific evaluation of
the metabolic ratio as biomarker of CYP3A activity. J. Chromatogr. B, Vol. 878, No. 1,
pp. 97–101, 15700232
Majewska, M. Skrzycki, M., Podsiad, M. & Czeczot, H. (2011). Evaluation of antioxidant
potential of flavonoids: an in vitro study. Acta Poloniae Pharmaceutica. Drug Research,
Vol. 68, No 4, pp. 611-615, 00016837
Metabotype Concept: Flexibility, Usefulness and Meaning in Different Biological Populations 163
Maslowski, K.M., Vieira, A.T., Ng, A., Kranich, J., Sierro, F., Yu, D., Schilter, H.C., Rolph,
M.S., Mackay, F., Artis, D., Xavier, R.J., Teixeira, M.M. & Mackay, C.R. (2009).
Regulation of inflammatory responses by gut microbiota and chemoattractant
receptor GPR43. Nature, Vol. 461, pp. 1282-1286, 00280836
Mayr, M., Yusuf, S., Weir, G., Chung, Y.L., Mayr, U., Yin, X., Ladroue, C., Madhu, B.,
Roberts, N., De Souza, A., Fredericks, S., Stubbs, M., Griffiths, J.R., Jahangiri, M.,
Xu, Q. & Camm, A.J. (2008). Combined metabolomic and proteomic analysis of
human atrial fibrillation. Journal of the American College of Cardiology, Vol. 51, No. 5,
pp. 585–594, 07351097
Kim, Y.S., & Milner, J.A. (2011). Bioactive Food Components and Cancer-Specific
Metabonomic Profiles. Journal of Biomedicine and Biotechnology ID 721213, pp. 1-9,
11107243
Mazurek, S. (2011). Pyruvate kinase type M2: a key regulator of the metabolic budget system
in tumor cells. Int J Biochem Cell Biol., Vol. 43, No. 7, pp. 969-980, 0020711X
Moreno, A. & Arus, C. (1996). Quantitative and qualitative characterization of 1H NMR
spectra of colon tumors, normal mucosa and their perchloric acid extracts:
decreased levels of myo-inositol in tumours can be detected in intact biopsies. NMR
Biomed., Vol. 9, pp. 33–45, 09523480
Mouly, P., Gaydou, E. M. & Auffray, A. (1998). Simultaneous separation of flavanone
glycosides and polymethoxylated flavones in citrus juices using liquid
chromatography. J. Chromatogr., Vol. 800, No. 2, pp. 171–179, 0021-9673.
Mulder, T.P., Rietveld, A.G. & van Amelsvoort, J.M. (2005). Consumption of both black tea
and green tea results in an increase in the excretion of hippuric acid into urine. Am.
J. Clin. Nutr., Vol. 81, pp. 256S–260S, 00029165
Mullen, W., Edwards, C.A. & Crozier, A. (2006). Absorption, excretion and metabolic
profiling of methyl-, glucuronyl-, glucosyl and sulpho-conjugates of quercetin in
human plasma and urine after ingestion of onions. Br. J. Nutr., Vol. 96, pp. 107–116,
00071145
Mullen, W., Edwards, C.A., Serafini, M., Crozier, A. (2008). Bioavailability of pelargonidin-3-
O-glucoside and its metabolites in humans following the ingestion of strawberries
with and without cream. J. Agric. Food Chem., Vol. 56, pp. 713–719, 00218561
Nardini, M.E., Cirillo, F., Natella, C. & Scaccini, C. (2002). Absorption of phenolic acids in
humans after coffee consumption. J. Agric. Food Chem., Vol. 50, pp. 5735–5741,
00218561
Oren-Shamir, M. & Levi-Nissim, A. (1997). Temperature effect on the leaf pigmentation of
Cotinus coggygria 'Royal Purple'. J. Hor. Sci. Biotech., Vol. 72, pp. 425-432. 14620316
Ormeño, E., Mévy, J.P., Vila, B., Bousquet-Mélou, A., Greff, S., Bonin, G. & Fernandez, C.
(2007a). Water deficit stress induces different monoterpene and sesquiterpene
emission changes in Mediterranean species. Relationship between terpene
emissions and plant water potential. Chemosphere, Vol. 67, pp. 276-284, 00456535
Ormeño, E., Fernandez, C., Bousquet-Mélou, A., Greff, S., Morin, E., Robles, C., Vila, B. &
Bonin, G. (2007b). Monoterpene and sesquiterpene emissions of three
Mediterranean species through calcareous and siliceous soils in natural conditions,
Atmos. Environ., Vol. 41, pp. 629-639, 13522310
164 Metabolomics
Phipps, A.N., Wright, B., Stewart, J. & Wilson, I.D. (1997). Use of proton NMR for
determining changes in metabolite excretion profiles induced by dietary changes in
the rat. Pharmacy and Pharmacology Communications, Vol. 3, pp. 143–146, 20427158
Platz, E.A., Till C., Goodman, P.J., Parnes, H.L., Figg, W.D., Albanes,D., Neuhouser, M.L.,
Klein, E.A., Thompson, I.M., Jr & Kristal, A.R. (2009). Men with low serum
cholesterol have a lower risk of high-grade prostate cancer in the placebo arm of
the prostate cancer prevention trial. Cancer Epidemiol Biomarkers Prev, Vol. 18, pp.
2807–2813, 10559965
Prevost, V., Likhachev, A.J., Loktionova, N.A., Bartsch, H., Wild, C.P., Kazanova, O.I.,
Arkhipov, A.I., Gershanovich, M.L. & Shuker, D.E.G. (1996). DNA base adducts in
urine and white blood cells of cancer patients receiving combination
chemotherapies which include N-methyl-N-nitrosourea. Biomarkers, Vol. 1, pp. 244-
251, 1354750X
Radtke, J., Linseisen, J. & Wolfram, G. (2002). Fasting plasma concentrations of selected
flavonoids as markers of their ordinary dietary intake. Eur. J. Nutr., Vol. 41, pp.203-
209, 14366207
Raina, K., Serkova, N. J. & Agarwal, R. (2009). Silibinin feeding alters the metabolic profile
in TRAMP prostatic tumors:1hnmrs-based metabolomics study. Cancer Research,
Vol. 69, No. 9, pp. 3731–3735, 00085472
Rechner, A.R., Spencer, J.P.E., Kuhnle, G., Hahn, U. & Rice-Evans, C.A. (2001). Novel
biomarkers of the metabolism of caffeic acid derivatives in vivo. Free Rad. Biol.
Med., Vol. 30, pp. 1213–1222, 08915849
Rikimaru, T., Oozeki, T., Ichikawa, M., Ebisawa, H. & Fujita, Y. ( 1989). Comparisons of
urinary creatinine, skeletal muscle mass and indices of muscle protein catabolism
in rats fed ad libitum, with restricted food intake and deprived of food. J. Nutr. Sci.
Vitaminol., Vol. 35, No. 3, pp. 190–209, 03014800
Robertson, D.G., Reily, M.D., Lindon, J.C., Holmes, E. & Nicholson, J.K. (2002).
Metabonomic technology as a tool for rapid throughput in vivo toxicity screening.
In Comprehensive Toxicology Vol. XIV: Molecular and Cellular Toxicology, Heuvel
JPV, Perdew GH, Mattes WB & Greenlee WF (Eds). Elsevier (2002). pp. 583-610,
9780080468846, Amsterdam
Roitto, M., Rautio, P., Julkunen-Tiitto, R., Kukkola, E. & Huttunen, S. (2005). Changes in the
concentrations of phenolics and photosynthates in Scots pine (Pinus sylvestris L.)
seedlings exposed to nickel and copper. Environmental Pollution, Vol. 137, pp. 603-
609, 02697491
Schwiertz, A., Taras, D., Schäfer, K., Beijer, S., Bos, N.A., Donus, C. & Hardt, P.D. (2010).
Microbiota and SCFA in lean and overweight healthy subjects. Obesity, Vol. 18, pp.
190-195, 19307381
Semmar, N., Bruguerolle, B., Boullu-Ciocca, S. & Simon, N (2005a). Cluster Analysis: An
Alternative Method for Covariate Selection in population Pharmacokinetic
Modeling. Journal of Pharmacokinetics and Pharmacodynamics, Vol. 32, No. 3-4, pp.
333-358, 1567567X
Semmar, N., Jay, M., Farman, M. and Chemli, R. (2005b). Chemotaxonomic analysis of
Astragalus caprinus (Fabaceae) based on the flavonic patterns. Biochem. Syst. Ecol.,
Vol. 33, pp. 187-200, 03051978
Metabotype Concept: Flexibility, Usefulness and Meaning in Different Biological Populations 165
Semmar, N., Jay M. & Nouira S. (2007). A new approach to graphical and numerical analysis
of links between plant chemotaxonomy and secondary metabolism from HPLC
data smoothed by a simplex mixture design. Chemoecology, Vol. 17, No. 3, pp. 139-
156, 09377409
Semmar, N., Nouira, S. & Farman, M. (2008). Variability and Ecological Significance of
Secondary Metabolites in Terrestrial Plant World. In: Handbook of Nature
Conservation, Aronoff, J.B., pp. 1-89. Nova Science Publishers, 9781606929933, NY
Semmar, N. (2010). Chemotaxonomical Analysis of Herbaceous Plants Based on Phenolic and
Terpenic patterns: flexible tools to survey biodiversity in grassland. Nova Science
Publishers, 9781616687892, NY.
Shibata, M. & Ishikura, N. (1960). Paper chromatographic survey of anthocyanin in tulip-
flowers. I. Jap. J. Bot., Vol. 17, pp. 230-238
Stanley, E.G. (2002). 1H NMR spectroscopic and chemometric studies on endogenous
physiological variation in rats. Ph.D. thesis, university of London, pp. 43–67
Tarvainen, V., Hakola, H., Hellén, H., Bäck, J., Hari, P. & Kulmala, M. (2005): Temperature
and light dependence of the VOC emissions of Scots Pine, Atmos. Chem. Phys., Vol.
5, pp. 989–998, 16807367
Thieme, D., Grosse, J., Sachs, H. & Mueller, R.K. (2000). Analytical strategy for detecting
doping agents in hair. Forensic Sci. Int., Vol. 107, pp. 335-345, 03790738
Thysell, E., Surowiec, I., Hörnberg, E., Crnalic, S., Widmark, A., Johansson, A.I., Stattin, P.,
Bergh, A., Moritz, T., Antti., H. & Wikström, P. (2010). Metabolomic
Characterization of Human Prostate Cancer Bone Metastases Reveals Increased
Levels of Cholesterol. PLoS One, Vol. 5, No. 12, e14175, 19326203
Timbrell, J.A., Draper, R.P. & Waterfield, C.J. (1994). Biomarkers in toxicology: new uses for
some old molecules? Toxicol. Ecotoxicol. News, Vol. 1, pp. 4-14, 13504592
Timbrell, J.A. (1998). Biomarkers in toxicology. Toxicology, Vol. 129, pp. 1–12, 0300483X
Tomás-Barberán, F.A.T. & Wollenweber, E. (1990). Flavonoid aglycones from the leaf
surfaces of some Labiatae species. Pl. Syst Evol., vol. 173, pp. 109-118, 03782697
Torskangerpoll, K., Nørbæk, R., Nodland, E., Øvstedal, D.O. & Andersen, Ø.M. (2005).
Anthocyanin content of Tulipa species and cultivars and its impact on tepal
colours. Biochem. Syst. Ecol., Vol. 33, pp. 499-510, 03051978
Turer, A. T., Stevens, R. D., Bain, J. R., Muehlbauer, M.J., van der Westhuizen, J., Mathew,
J.P., Schwinn, D.A., Glower, D.D., Newgard, C.B. & Podgoreanu M.V. (2009).
Metabolomic profiling reveals distinct patterns of myocardial substrate use in
humans with coronary artery disease or left ventricular dysfunction during surgical
ischemia/reperfusion. Circulation, Vol. 119, No. 13, pp. 1736–1746, 00097322
Turlings, T. C. J., Loughrin, J. H., McCall, P. J., R¨ose, U. S. R., Lewis, W. J., & Tumlinson, J.
H. (1995). How caterpillar-damaged plants protect themselves by attracting
parasitic wasps, Proc. Natl. Acad. Sci., Vol. 92, pp. 4169–4174, 00278424
Vallejo, M., Garcia, A., Tunon, J. , García-Martínez, D., Angulo, S., Martin-Ventura, J.L.,
Blanco-Colio, L.M., Almeida, P., Egido, J. & Barbas, C. (2009). Plasma fingerprinting
with GC-MS in acute coronary syndrome. Analytical and Bioanalytical Chemistry,
Vol. 394, No. 6, pp. 1517–1524, 16182642
van Eijk, J.P., Nieuwhof, M., van Keulen, H.A. & Keijzer, P. (1987). Flower colour analyses in
tulip (Tulipa L.). The occurrence of carotenoids and flavonoids in tulip tepals.
Euphytica, Vol. 36, pp. 855-862, 00142336
166 Metabolomics
Van Welie, R.T.H., van Dijck, R.G.J.M., Vermeulen, N.P.E., van Sittert, N.J., 1992.
Mercapturic acids, protein adducts and DNA adducts as biomarkers of
electrophilic chemicals. Crit. Rev. Toxicol., Vol. 22, No. 5-6, pp. 271-306, 10408444
Williams, C.A. (1975). Biosystematics of the Monocotyledoneae – Flavonoid Patterns in
Leaves of the Liliaceae. Biochem. Syst. Ecol., Vol. 3, pp. 229-244, 03051978
Wilson, I.D. (2009). Drugs, bugs, and personalized medicine: Pharmacometabonomics enters
the ring. PNAS, Vol. 106, No. 34, pp. 14187-14188, 00278424
Xu, X., Harris, K.S., Huei-Ju, W., Murphy, P.A. & Hendrich, S. (1995). Bioavailability of
soybean isoflavones depends upon gut microflora in women. The Journal of
Nutrition, Vol. 125, No. 9, pp.2307-2315, 00223166
Yanez, J.A., Remsberg, C.M., Miranda, N.D., Vega-Villa, K.R., Andrews, P.K. & Davies,
N.M. (2008). Pharmacokinetics of Selected Chiral Flavonoids: Hesperetin,
Naringenin and Eriodictyol in Rats and their Content in Fruit Juices. Biopharm.
Drug Dispos., Vol. 29, pp. 63–82, 01422782
Yang, C., Richardson A.D., Smith, J.W. & Osterman A. (2007). Comparative metabolomics of
breast cancer. Pacific Symposium on Biocomputing 12, 181-192, 17935091
Zhang, A.Q., Mitchell, S.C. & Smith, R.L. (1996). Exacerbation of symptoms of fish-odour
syndrome during menstruation. Lancet, Vol. 348, No. 9043, pp. 1740–1741, 01406736
Zheng, W. & Wang, S.Y. (2003). Oxygen radical absorbing capacity of phenolics in
blueberries, cranberries, chokeberries, and lingonberries. J. Agric. Food Chem., Vol.
51, pp. 502-509, 00218561
Zoetendal, E.G., Rajilic-Stojanovic, M. & de Vos, W.M. (2008). High throughput diversity
and functionality analysis of the gastrointestinal tract microbiota. Gut., Vol. 57, pp.
1605-1615, 00175749
7
1. Introduction
In recent years, the study of metabolomics and the use of metabolomics data to answer a
variety of biological questions have increased greatly (Fan, Lane et al. 2004; Griffin
2006; Khoo and Al-Rubeai 2007; Lindon, Holmes et al. 2007; Lawton, Berger et al. 2008).
While various techniques are available for analyzing this type of data (Bryan, Brennan et al.
2008; Scalbert, Brennan et al. 2009; Thielen, Heinen et al. 2009; Xia, Psychogios et al. 2009),
the fundamental goal of the analysis is the same – to quickly and accurately identify
detected molecules so that biological mechanisms and modes of action can be understood.
Metabolomics analysis was long thought of as, and in many aspects still is, an
instrumentation problem; the better and more accurate the instrumentation (LC/MS,
GC/MS, NMR, CE, etc.) the better the resulting data which, in turn, facilitates data
interpretation and, ultimately, the understanding of the biological relevance of the results.
While the quality of instrumentation does play a very important role, the rate-limiting step
is often the processing of the data. Thus, software and computational tools play an
important and direct role in the ability to process, analyze, and interpret metabolomics data.
This situation is much like the early days of automated DNA sequencing where it was the
evolution of the software components from highly manual to fully automated processes that
brought about significant advances and a new era in the technology (Hood, Hunkapiller et
al. 1987; Hunkapiller, Kaiser et al. 1991; Fields 1996). Currently, software tools exist for the
automated initial processing of metabolomic data, especially chromatographic separation
coupled to mass spectrometry data (Wilson, Nicholson et al. 2005; Nordstrom, O'Maille et al.
2006; Want, Nordstrom et al. 2007; Patterson, Li et al. 2008). Samples can be processed
automatically; peak detection, integration and alignment, and various quality control (QC)
steps on the data itself can be performed with little to no user interaction. However,
generating the data, together with peak detection and integration, is the relatively
simple part of the process; without a properly engineered system for managing it, the
vast number of data files generated can quickly become overwhelming.
Two major processes in metabolomic data processing are the verification of the accuracy of
the peak integration and the verification of the accuracy of the automated identification of
the metabolites that those peaks represent. These two processes, while vitally important to
the accuracy of the results, are very time consuming and are the most significant bottlenecks
in processing metabolomic data. In fact, the peak integration verification step is often
omitted due to the extremely large number of peaks whose integration must be verified.
2. Background
On the surface, running a metabolomics study is simple and straightforward.
Samples are prepared for running on a signal detection platform, signal data is collected on
samples from the instrumentation, the signals are translated into peaks, the peaks are
compared to reference libraries for the identification of metabolites and those identified
metabolites are then statistically analyzed with whatever metadata may exist for the
samples. Alternatively, the entirety of the detected peaks resulting from the instrument
signal data are statistically analyzed without metabolite identification prior to the statistical
analysis.
Once statistical analysis is completed and the significant signals have been stratified and
metabolites identified, biochemical pathway analysis is performed to gain insight into the
original biological questions the study asked. Too often, when the metabolomic experiments
do not provide meaningful biological results, the realization comes that there is so much
variability in the data that it cannot be used to address the original objectives of the study. Despite
the methods and software provided by the various instrument vendors, it turns out that
running a global, non-targeted analysis of small molecules in a complex mixture that
generates high-quality data and provides answers to biological questions is challenging.
Doing so in a high-throughput environment is significantly more challenging.
However, a high-throughput metabolomics platform that produces reliable, precise,
reproducible, and interpretable data is possible. It simply requires the right process coupled
with the right software tools. As with any high throughput process it is important to have a
logical, consistent workflow that is simple, reproducible, and expandable without
negatively impacting the efficiency of the process. It is important to know when human
interaction is required and when it is not. Well-designed and integrated software can
efficiently handle the majority of the mundane workload, allowing human interaction to be
focused only where required.
3. Approach
Metabolite identification is essential for chemical and biological interpretation of
metabolomics experiments. Two approaches to metabolomic data analysis have been used
and will be described in detail below. The main difference between the two approaches is
when the metabolite is identified, either before or after statistical analysis of the data.
To date, the most commonly used method of processing metabolomic data has been to
statistically analyze all of the detected ion-features (‘ion-centric’). Ion features, defined here
as a chromatographic peak with a given retention time and m/z value, are analyzed using a
statistical package such as SAS or S-plus to determine which features vary statistically
significantly and are related to a test hypothesis (Tolstikov and Fiehn 2002; Katajamaa and
Oresic 2007; Werner, Heilier et al. 2008). The significant ion feature changes are then used to
prioritize metabolite identification. One issue with this type of approach is the convoluted
Software Techniques for Enabling High-Throughput Analysis of Metabolomic Datasets 169
nature of the data being analyzed. In many cases the “statistically significant ion-features”
are various forms of the same chemical and are therefore redundant information. Most
biochemicals detected in a traditional LC- or GC-MS analysis produce several different ions,
which contributes to the massive size and complexity of metabolomics data. In addition,
there is an even larger number of measurements for each experimental sample, which
impacts the false discovery rate (Benjamini and Hochberg 1995; Storey and Tibshirani 2003).
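To illustrate how the number of features tested affects which changes remain significant, the following minimal Python sketch applies the Benjamini-Hochberg step-up procedure (Benjamini and Hochberg 1995) to a list of p-values. The function name and the p-values are hypothetical and chosen only for illustration; they are not taken from any study described here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five ion features (illustrative only).
few = [0.001, 0.012, 0.025, 0.2, 0.6]
# The same five features tested alongside 95 uninformative ones (p = 0.9).
many = few + [0.9] * 95

# With 5 features the three smallest p-values pass; with 100 features none do.
print(sum(benjamini_hochberg(few)), sum(benjamini_hochberg(many)))
```

In this toy example, the same three features that pass the correction when five features are tested no longer pass once they are tested together with many redundant or uninformative features.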
In the ‘chemo-centric’ approach to metabolomics data analysis discussed here, metabolites
are identified on the front-end through the use of a reference library composed of spectra of
authentic chemical standards (Lawton, Berger et al. 2008; Evans, Dehaven et al. 2009). Then,
instead of treating all detected peaks independently, as is done in the ion-centric approach,
the chemo-centric method selects a single ion (‘quant-ion’) to represent that metabolite in all
subsequent analyses. The other ions associated with the metabolite are essentially
redundant information that only adds to data complexity. Furthermore, if those ions are
retained, the statistical analysis may be skewed because a single metabolite is
represented by multiple ion peaks, and the false discovery rate is increased by the large
number of measurements relative to the number of samples in the experiment. Accordingly,
by taking a chemo-centric approach,
any extraneous peaks can be identified and removed from the analysis based on the
authentic standard library/database. Since the number of features analyzed statistically
contributes to the probability of obtaining false positives, analyzing one representative ion
for each metabolite reduces the number of false positives. Further, the chemo-centric data
analysis method is powerful because a significant amount of computational processing time
and power can be saved simply due to data reduction.
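The data reduction described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a hypothetical in-memory library that maps each metabolite to the m/z of its designated quant-ion; the record layout, field names, tolerance and numeric values are all assumptions and do not represent the authors' implementation.

```python
# Hypothetical library: metabolite name -> m/z of its designated quant-ion.
# The values are illustrative placeholders.
QUANT_IONS = {"heptadecanoic acid": 269.2, "citrate": 191.0}


def keep_quant_ions(features, quant_ions=QUANT_IONS, mz_tol=0.05):
    """Reduce one sample's identified ion features to a single area per metabolite."""
    reduced = {}
    for f in features:
        name = f["metabolite"]
        if name not in quant_ions:
            continue  # unidentified feature: dropped in this sketch
        if abs(f["mz"] - quant_ions[name]) <= mz_tol:
            reduced[name] = f["area"]  # the quant-ion represents the metabolite
    return reduced


# Hypothetical ion features detected in one sample.
features = [
    {"metabolite": "heptadecanoic acid", "mz": 269.2, "area": 1200.0},
    {"metabolite": "heptadecanoic acid", "mz": 270.2, "area": 220.0},  # redundant ion
    {"metabolite": "citrate", "mz": 191.0, "area": 800.0},
    {"metabolite": "citrate", "mz": 111.0, "area": 300.0},  # redundant ion
    {"metabolite": None, "mz": 455.3, "area": 50.0},  # no library match
]
reduced = keep_quant_ions(features)
print(reduced)
```

Five detected features collapse to two values, one per metabolite, which is the reduction that the subsequent statistical analysis benefits from.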
Most of the work and complexity in the chemo-centric approach lies in three areas: first, the
generation of the reference library of spectra from authentic chemical standards; second, the
actual identification of the detected metabolites using the reference library; and third, the
quality control (QC) of the automated metabolite identification, peak detection
and integration. Notably, the QC of the automated processes is often overlooked. However,
the QC step is critical to ensure that false identifications and poor or inconsistent peak
integrations do not make their way into the statistical analysis of the experimental results.
The generation of a reference library entry made up of the spectral signature and
chromatographic elution time of an authentic chemical standard is relatively
straightforward, as is the generation of spectral-matching algorithms that use the reference
library to identify the experimentally detected metabolites. In contrast, performing the QC
step on the automated processes, including peak detection, integration and metabolite
identification, is time- and labor-intensive.
A further issue with using a reference library composed of authentic
standards is how to deal with metabolites in the samples that are not contained within the
reference library. The power of the technology would be significantly reduced if it were
limited to identifying only compounds contained in the reference library. Through
intelligent software algorithms, it is possible to analyze data of similar characteristics across
multiple samples in a study to find those metabolites that are unknown by virtue of not
matching a reference standard in the library, and, in the process, group all the ion-features
related to that unknown together by examining ion correlations across the sample set
(Dehaven, Evans et al. 2010). One such method capitalizes on the natural biological
variability inherent in the experimental samples; through this variation, the metabolites and
their respective ion-features can reveal themselves and be entered into the chemical
reference library as a novel chemical entity (Dehaven, Evans et al. 2010). The unknown
chemical can then be tracked in future metabolomics studies, and, if important, can be
identified using standard analytical chemistry techniques.
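The idea of grouping the ion-features of an unknown by their correlation across the sample set can be sketched as follows. This is a simplified, greedy illustration and not the published algorithm (Dehaven, Evans et al. 2010); the function names, the co-elution criterion based on a retention index tolerance, the thresholds and the data are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def group_correlated_ions(ions, min_r=0.9, ri_tol=5.0):
    """Greedily group ions that co-elute and co-vary across the sample set."""
    groups = []
    for name, ion in ions.items():
        for group in groups:
            seed = ions[group[0]]  # compare against the first ion of the group
            if (abs(ion["ri"] - seed["ri"]) <= ri_tol
                    and pearson(ion["areas"], seed["areas"]) >= min_r):
                group.append(name)
                break
        else:
            groups.append([name])  # no existing group fits: start a new one
    return groups


# Hypothetical unmatched ions: retention index and peak areas in four samples.
ions = {
    "ion_A": {"ri": 1500.0, "areas": [10, 20, 30, 40]},
    "ion_B": {"ri": 1501.0, "areas": [21, 39, 62, 80]},  # tracks ion_A
    "ion_C": {"ri": 1500.5, "areas": [40, 10, 30, 20]},  # co-elutes, does not track
    "ion_D": {"ri": 2100.0, "areas": [10, 20, 30, 40]},  # tracks, elutes elsewhere
}
groups = group_correlated_ions(ions)
print(groups)
```

Here ion_A and ion_B end up in one group, as candidate ions of the same unknown, because their areas rise and fall together across the samples; without biological variability between samples there would be no such pattern to exploit.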
Without going into detail, it is important to note that the sample preparation process is
critical. High quality samples that have been properly and consistently prepared for analysis
on sensitive scientific instrumentation are of extreme importance. Ensuring this high quality
starts with the collection and preparation of the samples. No software system is going to be
able to produce high-quality data unless ample effort is focused on consistently following
standardized protocols for preparing high quality samples for analysis.
The following discussion, examples and workflow solutions make use of GC/MS or LC/MS
(or both) platforms for metabolomic analysis of samples, although the concepts in general
could apply to a variety of data collection techniques. Software tools are also presented to
demonstrate the application of the concepts that are discussed but the tools themselves will
not be discussed in great detail. It is also important to note that achieving the greatest
operational efficiency of the process relies on treating all of the experimental samples in a
study as a set and not as individual files. By using tools to analyze and perform quality
control on the samples as a single group or set it becomes much easier to spot patterns that
can be useful to determine what is going on in the overall process.
in high-throughput mode, it is best not to have humans manage data files, either in storage
locations, or, as noted above, in naming. For consistency, it is imperative that the machines
control this step; running one experiment on one machine may be manageable manually but
running experiments in tandem or on more than one instrument can easily result in
misnaming, file version problems, location mishaps, etc. if file management is not
automated.
4.3 Alignment based on peak similarity inadequate, retention index should be used
Many software packages provide capabilities to align chromatograms to account for
retention time drift in an instrument. In many instances, internal standards and/or
endogenous metabolites are used across the analyzed samples to align chromatography
based on their retention times, so that there is confidence that the same peak at the
same mass is consistent among the data files. This approach should be avoided: while it
works well for peak analysis and chromatographic alignment within a single, small study,
it is applicable only within that one study, where retention times are quite consistent.
It also makes it much harder to compare against a reference standard library in which a
retention profile is used as a matching criterion. The better choice is retention index
(RI) calculation, which can correctly align chromatograms even over long periods of time
during which instrument conditions can differ greatly. In a retention index method, each
RT marker is given a fixed RI value (Evans, Dehaven et al. 2009). The expected retention
times of the RT markers can be set in the integrator method, and the times at which those
internal standards actually elute are used to calculate an adjustment RI ladder. The RI
of every other detected peak can then be calculated from its actual retention time and
the adjustment ladder. In this way, all detected peaks are aligned based on their elution
relative to their flanking RT markers. An RI removes systematic changes in retention time
by assuming that a compound will always elute in the same position relative to those
flanking markers. Because of this, a unique time location and window for a spectral
library entry can be set in terms of RI, thereby ensuring that metabolites do not fall
outside the allowed window over a much longer period of time. Retention indices have
predominantly been used for GC/MS methods; however, the approach can also work well for
LC/MS data alignment. LC/MS is certainly more complex, as certain metabolites and classes
of metabolites show more chromatographic shift relative to their RI markers than others;
in these cases, widening the expected RI window of the library entry and using it in
conjunction with mass and fragmentation spectrum data is sufficient for accurate
identification. The advantage of this approach over many of the widely available
chromatographic alignment tools, e.g. XCMS (Smith, Want et al. 2006), is that it can be
used to match against an RI-locked library over long periods of time and can align data
from different biological matrices without potential distortion from structural isomers.
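As a minimal sketch of the calculation described above, the following Python function interpolates the RI of a peak between its two flanking RT markers. Linear interpolation, the function name and the marker values are assumptions made for illustration; the text does not specify the exact formula used.

```python
from bisect import bisect_right


def retention_index(rt, markers):
    """Interpolate a retention index for a peak eluting at retention time rt.

    markers: list of (observed_rt, fixed_ri) pairs for the RT markers in this
    run, sorted by observed_rt (hypothetical layout).
    """
    if len(markers) < 2:
        raise ValueError("at least two RT markers are required")
    rts = [m[0] for m in markers]
    if not rts[0] <= rt <= rts[-1]:
        raise ValueError("retention time lies outside the marker ladder")
    # Marker eluting at or before the peak, clamped so an upper marker exists.
    lo = min(bisect_right(rts, rt) - 1, len(markers) - 2)
    (rt_lo, ri_lo), (rt_hi, ri_hi) = markers[lo], markers[lo + 1]
    # Position of the peak between its flanking markers, mapped onto the RI scale.
    return ri_lo + (ri_hi - ri_lo) * (rt - rt_lo) / (rt_hi - rt_lo)


# Hypothetical marker ladders (retention time in minutes, fixed RI) for two runs;
# the second run shows retention time drift.
run_1 = [(1.0, 1000), (3.0, 2000), (6.0, 3000)]
run_2 = [(1.2, 1000), (3.4, 2000), (6.6, 3000)]

# The same compound elutes at 2.0 min in run 1 and 2.3 min in run 2,
# but both peaks map to an RI of approximately 1500.
ri_1 = retention_index(2.0, run_1)
ri_2 = retention_index(2.3, run_2)
print(ri_1, ri_2)
```

Because both peaks map to nearly the same RI despite the drift in retention time, a library entry defined by an RI window can match them in either run.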
most important aspects of metabolomics analysis, especially for running studies in high
throughput.
5. Quality control
The ability to perform thorough quality control on identified metabolites in metabolomics
studies is extremely important. The higher the quality of data entering statistical analysis,
the higher the probability that the study will provide answers to the questions being asked.
This section will focus on three aspects of quality control – quality control samples (i.e.,
blanks, technical replicates), software for assessing the quality of metabolite identification,
and software for assessing the original peak detection and integration. This last point may
seem out of order but, for reasons to be described, it provides an invaluable check of the
peak quality.
Fig. 1. Graphical user interface showing the view for the proposed identification of
heptadecanoic acid. (A) Distinct list of identified metabolites for the loaded sample set. This
list includes any metabolite identified at least once in any sample within the set. It also
includes summary statistics such as averages for spectral scoring and chromatographic peak
intensities, number of times detected, and status. (B) Chemical structure for displayed
metabolite. (C) Data for the posed library identification heptadecanoic acid from the GC/MS
method. (D) Data for the posed library identification heptadecanoic acid from the LC/MS
negative ion method. (E) List of unique sample identifiers comprising the study. (F)
Comment field for storing and displaying annotations that are relevant to the currently
displayed metabolite. (G) List of other ion peaks that exist as part of the spectral library
entry. (H) List of sample sorting options including associated sample metadata; diagnosis,
group and subgroup.
Software Techniques for Enabling High-Throughput Analysis of Metabolomic Datasets 177
Fig. 2. Plot for LC/MS negative method. Individual samples in the sample set are displayed
and sorted on the y-axis. Chromatographic retention time is presented on the x-axis.
Fig. 3. Raw MS data can be accessed for each sample via graphical user interface links. Each
“dot” represents the detected and integrated ion peak in the individual sample listed on the
y-axis. Thus, each “dot” has an associated area, height intensity, chromatographic start, stop
and apex retention time/retention index. The color and shape of each “dot” are indicative of
the quality of the match to the posed library identification (see chart 1), and each “dot” can
be used to launch underlying data such as raw MS data (inset). Colors of the samples listed
on the y-axis also hold meaning (see chart 1).
This type of visualization permits the analyst to quickly verify the quality (i.e., QC) of the
automated peak detection and integration software. Each dot that is representative of an ion
peak can be individually removed/rejected from the proposed library identification. In this
way, extraneously detected ion peaks in the window can be visualized and individually
removed, as is the case shown in Figure 4. In this particular example there are two closely
eluting ion peaks with the same mass. One of the peaks is 2-stearoylglycerophosphocholine
(Figure 4, panel A) and the other is 1-stearoylglycerophosphocholine (Figure 4, panel B). In
both panels the correct peak must be manually approved and the incorrect peak rejected
(indicated by red dots). Stray detected ions can also be individually rejected from the
identifications. In addition, the interface permits the interrogation of the integration quality
of individual ion peaks since each dot is linked to the raw ion data as illustrated in Figure 4,
panel C and Figure 5, panel B. In this fashion any potential inaccuracy in the automated
detection and integration of individual peaks can be readily determined.
In addition to being able to curate each sample individually, the automated library
identification for an entire sample set can be rejected. An example of this is shown in Figure
5. The presence of multiple dots for each sample in the RI window (Figure 5A) coupled
with the ability to view the underlying ion data (Figure 5B) makes it apparent that the
automated metabolite identification was based on erroneous ion peaks that resulted from
the integration of noise. As a result the automated call for the entire sample set was
manually rejected by the analyst. Accordingly, with this visualization tool the analyst can
rapidly determine the quality of the automated detection and integration and remove from
the dataset any peaks which are of questionable quality.
In addition to being able to QC the automated peak detection and integration software, an
interface such as this allows an analyst to visually inspect the quality of the library
identification in each individual sample. In the graphical plot the “dot” representing the
detected ion peak for the proposed metabolite identification is displayed in various color
and shape combinations. Each combination of color and shape within each plot is an
indicator of the quality of the automated metabolite identification, which greatly aids the
analyst in making the quality assessment rapidly. Listed in Table 1 are possible color and
shape combinations with the meaning for each. The quality assessment is based on spectral
library matching logic (Evans, Dehaven et al. 2009). This graphical display allows the
analyst to look at a proposed identification for a given metabolite made by the software and
immediately determine its quality and confidence based on spectral match scores. In this
way, the automated metabolite identifications for large datasets can be quickly evaluated by
the analyst. An example of a proposed call for a group of ions in a sample set where the
MS/MS spectral match was poor is shown in Figure 6. The low data quality is readily
apparent by the preponderance of the red colored dots in the plot.
Fig. 4. Example of the visualization of closely related ion peaks. (A) Two possible ion peaks
were detected in the same retention window for 2-stearoylglycerophosphocholine. In this case
the ion peak on the left is the correct peak (green and blue dots with arrow) and the peak on
the right was rejected (red). (B) The peak on the right is actually
1-stearoylglycerophosphocholine. Therefore, the peak on the right is correct (green and blue
with arrow) while the peak on the left is rejected (red). In addition, the ion peaks in dashed
boxes are stray detected ion peaks that are not associated with the peak for
1-stearoylglycerophosphocholine and were rejected (red). (C) The extracted ion
chromatogram for one of the samples in the sample set for this ion shows the two peaks are
well separated and accurately integrated.
Fig. 5. Example of rejected identification for entire sample set. (A) Detected ion-peaks that
result from (B) a noisy baseline as seen in one injection; the entire library identification for
this LC/MS positive method is rejected.
Fig. 6. An example of an entire metabolite call that was rejected because the MS/MS spectral
match was poor. (A) Red color of dots indicates that the MS/MS spectral match was of low
quality. (B) Experimental MS/MS spectrum from one injection compared to the reference
library spectrum for beta-alanyl-L-histidine (carnosine).
peaks per sample are typically detected and integrated. Those peaks are organized into
groups within a time window, adjusted/aligned by retention index from known internal
standards to account for time drift and matched to library compounds (metabolites). For
LC/MS/MS, secondary fragment ions of the primary quantification ion can also be used to
match library compounds.
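As a rough illustration of matching an RI-aligned peak to library compounds, the sketch below accepts a library entry when both the retention index and the quantification ion mass fall inside the entry's tolerances. The class, field names, tolerance values and library entries are all hypothetical; a production matcher would also score fragmentation spectra, which is omitted here.

```python
from dataclasses import dataclass

@dataclass
class LibraryEntry:
    name: str
    ri: float          # expected retention index
    ri_window: float   # allowed +/- deviation in RI units
    mz: float          # quantification ion m/z

def match_peak(peak_ri, peak_mz, library, mz_tol=0.01):
    """Return names of library entries whose RI window and m/z both fit the peak."""
    return [
        e.name for e in library
        if abs(peak_ri - e.ri) <= e.ri_window and abs(peak_mz - e.mz) <= mz_tol
    ]

# Hypothetical entries; the numbers are illustrative, not real library values.
library = [
    LibraryEntry("compound_A", ri=1500.0, ri_window=10.0, mz=269.25),
    LibraryEntry("compound_B", ri=1500.0, ri_window=10.0, mz=301.10),
    LibraryEntry("compound_C", ri=2400.0, ri_window=25.0, mz=269.25),
]

hits = match_peak(1504.2, 269.251, library)    # inside compound_A's window
misses = match_peak(1530.0, 269.251, library)  # right mass, wrong RI
```

Widening `ri_window` for an entry, as suggested for compound classes that shift more in LC/MS, makes the mass and fragmentation criteria carry more of the identification.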
All profiling and analysis of metabolites in biological samples depend on the accuracy and
consistency of ion chromatographic peak detection and peak integration. However, GC/MS
and LC/MS/MS measurements are complicated by a number of factors; for example, the
co-elution of metabolites because of incomplete separation, the existence of artifacts from
the system, the background noise, and the potentially wide concentration ranges of
metabolites in the sample. Such complexities can affect the detection and determination of
the peak start, the peak end and the peak baseline.
Incomplete separation can lead to shoulders on peaks on either the leading edge or the
trailing edge of the main peak from metabolites present at higher concentrations. When
compared to the baseline, the peak start and the peak end would be characterized by a
baseline peak or a drop peak. Because of the complexity and variance inherent in biological
samples, the same metabolite in different samples may have been automatically detected
differently in regard to peak start, peak end and peak background. For example, in some
cases, the ion chromatographic peaks for metabolites present in only trace amounts may not
be well shaped, especially when a noisy background is present, so integration of such peaks
might be quite variable from sample to sample. In other cases, the major ions for a
metabolite may appear as a small shoulder on a larger ion peak and, as a result, may not
even be detected during the automatic peak detection/integration process from sample to
sample. In still other instances, a metabolite present in high concentrations may overload the
column and distort the chromatographic profile, leading to peak splitting. For such high
concentration metabolites the automatic library match may pick only one of the two peaks
for quantification which will give an erroneously low value for the amount of the metabolite
in the sample. Clearly, each of the above examples will lead to peak detection and
integration inconsistency and inaccuracy across the samples in a sample set, which will
potentially lead to wrong conclusions and wrong decisions in later analysis.
Global metabolomics has other challenges when it comes to peak detection. Unlike targeted
metabolomics, global metabolomic profiling cannot be optimized for each metabolite that is
present within a biological sample. Chromatography methods must be broad enough to
detect as many of the metabolites in the sample as possible, regardless of chemical
characteristics. Consequently, chiral compounds cannot be resolved and structural isomers
are usually not well resolved in global metabolomics profiling. In downstream analysis for
identified metabolites, structural isomers might better be combined to represent the
metabolites, or, if one isomer is more crucial in elucidating the metabolism or biochemical
pathway, consistently picking that one form across samples would ensure analytical
consistency.
A software solution to detect and correct such inconsistencies in ion peak integration and
library matching across samples in a sample set could be developed. After the quality control
phase of the identification of the detected metabolites is completed, a deeper examination of
the consistency of peak detection and integration could be performed to ensure consistency
and accuracy. The quality control phase of automatically detected metabolites involves
providing a high-quality, filtered list of identified metabolites devoid of noise and artifactual
metabolites to the end user (Figure 7). Sample set sizes range from a few samples to hundreds
and even thousands of samples. Because hundreds of metabolites can be detected and
measured in each sample, this type of quality control operates on the ‘quant’ ion peaks, i.e.,
those peaks detected in the samples that are used for quantification of those metabolites.
The chromatograms of the ion peaks representing the quantitative mass from all of the
samples in a set must be evaluated to determine if:
the majority of the sample peaks are on the trailing edge of another peak,
the majority of the sample peaks are on the leading edge of another peak,
the majority are peaks that encompass two peaks in other samples, as a result of peak
splitting.
Peak integration ranges are evaluated with alignment by retention index and the statistics of
peak limits across the sample set. Accordingly, in addition to user specified manual
correction, corrections in consistency and re-integration would be suggested and presented
to the analyst for review and approval. Functionally, this type of software would give the
end user a variety of methods to both investigate the automated integration and peak calls
and to correct them as necessary. The software features must include:
Automatic merging of approved peaks from the sample that match to the same library
compound.
Detection of shoulder peaks based on RI-aligned peak start or peak end distribution
across the samples.
Manual integration
Manual peak splitting
Show peak chromatograms in overlay mode or tabular mode for easy review/manual
re-integration.
Update peak integrations, peak recovery and library rematch
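One of the features listed above, detection of shoulder peaks from the distribution of RI-aligned peak limits, could be approached along the following lines. This is a sketch under stated assumptions: the median absolute deviation rule, the `tol` multiplier, the 0.5 RI floor and the sample data are illustrative choices, not the logic of the software described in this chapter.

```python
from statistics import median

def flag_inconsistent_limits(peaks, tol=3.0):
    """Flag samples whose RI-aligned peak start or end lies far from the set median.

    `peaks` maps sample id -> (start_ri, end_ri). A limit is flagged when it
    deviates from the median by more than `tol` times the median absolute
    deviation (MAD); a small floor on the MAD keeps tightly clustered limits
    from flagging trivial differences.
    """
    starts = [s for s, _ in peaks.values()]
    ends = [e for _, e in peaks.values()]
    med_s, med_e = median(starts), median(ends)
    mad_s = max(median([abs(s - med_s) for s in starts]), 0.5)
    mad_e = max(median([abs(e - med_e) for e in ends]), 0.5)
    flagged = {}
    for sample, (s, e) in peaks.items():
        issues = []
        if abs(s - med_s) > tol * mad_s:
            issues.append("start")
        if abs(e - med_e) > tol * mad_e:
            issues.append("end")
        if issues:
            flagged[sample] = issues
    return flagged

# Hypothetical RI-aligned integration limits for one metabolite in five samples.
peaks = {
    "s1": (1855.0, 1870.0),
    "s2": (1855.5, 1870.5),
    "s3": (1854.5, 1869.5),
    "s4": (1855.0, 1862.0),  # ends early: the peak was probably split
    "s5": (1855.2, 1870.2),
}
flagged = flag_inconsistent_limits(peaks)
```

Samples flagged in this way would be presented to the analyst for review rather than corrected silently, in keeping with the review-and-approve workflow described above.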
When an identified metabolite in a biological sample is at a sufficiently high concentration,
it can overload the column and distort the chromatographic peak. Even though it may be
out of the linear range, a consistent integration of the peak is still needed to characterize the
group of samples. Distorted peaks tend to drive the integration software to identify a less
than optimal peak to be used for quantification. In Figure 7, the peak for glucose was
incorrectly split in a handful of samples by the automated peak integrator. By examining the
consistency of the peak integration across the set of samples it is possible to easily identify
and correct this situation. As shown in the example in Figure 8, this correction would
improve the relative standard deviation from 20.1 to 7.4.
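The effect of peak splitting on the relative standard deviation can be illustrated with a small calculation. The areas below are invented for illustration and do not reproduce the values in the figures; the sketch simply compares the RSD when the automated call keeps one half of a split peak with the RSD after the halves are merged.

```python
from statistics import mean, stdev

def rsd_percent(values):
    """Relative standard deviation (sample standard deviation / mean) in percent."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical peak areas (arbitrary units). In samples s3 and s5 the integrator
# split one overloaded peak in two and the library match kept only one half.
auto_areas = {
    "s1": [100.0], "s2": [104.0], "s3": [55.0, 47.0],
    "s4": [98.0], "s5": [52.0, 51.0], "s6": [101.0],
}

picked_one = [parts[0] for parts in auto_areas.values()]  # automated call
merged = [sum(parts) for parts in auto_areas.values()]    # after correction

rsd_before = rsd_percent(picked_one)
rsd_after = rsd_percent(merged)
```

In this made-up set the RSD falls substantially once the split peaks are merged, because the two erroneously low values no longer inflate the spread.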
[Figure: per-sample ion chromatograms (Intensity/10,000,000 versus RI, 1850.0 to 1880.0) and peak areas per sample (Area/100,000,000 versus Task ID).]
As illustrated in Figures 9 and 10, small peaks on the leading or trailing side of a larger peak
are often integrated inconsistently:
Small shoulder peaks are detected
Small shoulder peaks are not detected
Small shoulder peaks are combined into the main peak
In Figure 8, the major peak on the left is identified as cysteine, whereas the shoulder on
the right side is from threonate. In one sample, the small peak from threonate was
inaccurately combined into the main peak for cysteine when it was automatically
integrated, thus inadvertently increasing the response for cysteine in that sample. After
re-integration the erroneous integration was corrected thereby restoring the correct
integration for cysteine and permitting the independent detection of threonate in the
sample as well.
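A re-integration of this kind can be sketched with a simple drop-line split at the valley between the main peak and its shoulder. The technique shown (perpendicular drop at the lowest point between the two largest local maxima, trapezoidal integration) is a common generic approach and is assumed here for illustration; the synthetic Gaussian chromatogram and all numbers are hypothetical.

```python
import math

def gaussian(x, apex, height, width):
    return height * math.exp(-0.5 * ((x - apex) / width) ** 2)

def trapezoid(xs, ys):
    """Area under the points (xs, ys) by the trapezoidal rule."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))

def split_at_valley(xs, ys):
    """Split a fused peak pair at the lowest point between its two largest maxima.

    Returns (valley_x, left_area, right_area), or None if only one maximum exists.
    """
    maxima = [i for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] >= ys[i + 1]]
    if len(maxima) < 2:
        return None  # no resolved shoulder; leave the integration unchanged
    a, b = sorted(sorted(maxima, key=lambda i: ys[i])[-2:])
    v = min(range(a, b + 1), key=lambda i: ys[i])
    left = trapezoid(xs[:v + 1], ys[:v + 1])
    right = trapezoid(xs[v:], ys[v:])
    return xs[v], left, right

# Synthetic chromatogram on an RI axis: a large peak with a small trailing shoulder.
xs = [float(x) for x in range(5420, 5621)]
ys = [gaussian(x, 5500.0, 4.5e6, 10.0) + gaussian(x, 5535.0, 0.5e6, 8.0) for x in xs]

valley_ri, main_area, shoulder_area = split_at_valley(xs, ys)
total_area = trapezoid(xs, ys)
```

Integrating the fused pair as one peak would assign `total_area` to the main compound; after the split, the shoulder's share is reported separately and the main peak's response is no longer inflated.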
Fig. 9. Examples of inconsistent shoulder peaks. Splitting of shoulder (upper panel); area
change after re-integration (lower panel; blue for automatic integration and red for
re-evaluated integration).
Software that can detect inconsistencies in peak detection and integration across samples in
a sample set can ultimately improve the accuracy in the integration of peaks that have been
identified as metabolites; this in turn leads to lower CVs and more accurate statistical
analysis, which can contribute significantly to the elucidation of metabolism and metabolic
pathways.
[Figure: ion chromatogram panels (Intensity/1,000,000 versus RI, 5420 to 5620).]
6. Conclusion
Metabolomics as a technology has demonstrated clear utility in a broad array of biological
applications. The applications are not only in demonstrating simple metabolic comparisons
between treated and control groups but in studies involving biomarker discovery, drug
development/MOA/recovery, bio-processing, agricultural applications, consumer products,
diagnostics, and so on (Sreekumar, Poisson et al. 2009; Berger, Kramer et al. 2007; Barnes, Teles
et al. 2009; Boudonck, Mitchell et al. 2009; Ma, Ellet et al. 2009; Ohta, Masutomi et al. 2009;
Watson, Roulston et al. 2009; Oliver, Guo et al.). The ability to run metabolomic studies in high-
throughput has been a challenge thus far, not so much because of the complexity or size of the
data, but because of the difficulty in generating reproducible data having low process variation
that can be quantified, is devoid of artifactual components, and provides high confidence in the
identification of metabolites. Without knowledge of the variability of the process on a
metabolite-by-metabolite basis, it is not possible to determine the true biological variability
and, thus, to provide accurate answers to the questions under investigation.
As demonstrated here, quality and the throughput of processing sample data for
metabolomics studies do not need to be mutually exclusive. By taking an intelligent
engineering approach to the data workflow, knowing when to automate a process and
developing software solutions that are streamlined for this process, the processing of sample
data for metabolomics studies can be done in significantly high volume and with high quality.
7. Acknowledgement
We gratefully acknowledge the work and contributions of all members of the Metabolon
informatics, platform, project management, statistics and management teams for their
dedicated work in building an enterprise metabolomics platform. CD, AE, HD, and KL are
employees of Metabolon.
8. References
Barnes, V. M., R. Teles, et al. (2009). "Acceleration of purine degradation by periodontal
diseases." J Dent Res 88(9): 851-855.
Benjamini, Y. and Y. Hochberg (1995). "Controlling the false discovery rate: a practical and
powerful approach to multiple testing." Journal of the Royal Statistical Society. Series B
57: 289-300.
Berger, F. G., D. L. Kramer, et al. (2007). "Polyamine metabolism and tumorigenesis in the
Apc(Min/+) mouse." Biochem Soc Trans 35(Pt 2): 336-339.
Boudonck, K. J., M. Mitchell, et al. (2009). "Characterization of the biochemical variability of
bovine milk using metabolomics." Metabolomics 5(4): 375-386.
Bowen, B. P. and T. R. Northen "Dealing with the unknown: metabolomics and metabolite
atlases." J Am Soc Mass Spectrom 21(9): 1471-1476.
Bryan, K., L. Brennan, et al. (2008). "MetaFIND: a feature analysis tool for metabolomics
data." BMC Bioinformatics 9: 470.
Dehaven, C. D., A. M. Evans, et al. (2010). "Organization of GC/MS and LC/MS
metabolomics data into chemical libraries." J Cheminform 2(1): 9.
Dunn, W. B., N. J. Bailey, et al. (2005). "Measuring the metabolome: current analytical
technologies." Analyst 130(5): 606-625.
Evans, A. M., C. D. Dehaven, et al. (2009). "Integrated, Nontargeted Ultrahigh Performance
Liquid Chromatography/Electrospray Ionization Tandem Mass Spectrometry
Platform for the Identification and Relative Quantification of the Small-Molecule
Complement of Biological Systems." Anal Chem 81(16): 6656-6667.
Fan, T. W., A. N. Lane, et al. (2004). "The promise of metabolomics in cancer molecular
therapeutics." Curr Opin Mol Ther 6(6): 584-592.
Fields, C. (1996). "Informatics for ubiquitous sequencing." Trends Biotechnol 14(8): 286-289.
Griffin, J. L. (2006). "Understanding mouse models of disease through metabolomics." Curr
Opin Chem Biol 10(4): 309-315.
Hood, L. E., M. W. Hunkapiller, et al. (1987). "Automated DNA sequencing and analysis of
the human genome." Genomics 1(3): 201-212.
Hunkapiller, T., R. J. Kaiser, et al. (1991). "Large-scale and automated DNA sequence
determination." Science 254(5028): 59-67.
Katajamaa, M. and M. Oresic (2005). "Processing methods for differential analysis of LC/MS
profile data." BMC Bioinformatics 6: 179.
Katajamaa, M. and M. Oresic (2007). "Data processing for mass spectrometry-based
metabolomics." J Chromatogr A 1158(1-2): 318-328.
Khoo, S. H. and M. Al-Rubeai (2007). "Metabolomics as a complementary tool in cell
culture." Biotechnol Appl Biochem 47(Pt 2): 71-84.
Lawton, K. A., A. Berger, et al. (2008). "Analysis of the adult human plasma metabolome."
Pharmacogenomics 9(4): 383-397.
Lindon, J. C., E. Holmes, et al. (2007). "Metabonomics in pharmaceutical R&D." Febs J 274(5):
1140-1151.
Ma, N., J. Ellet, et al. (2009). "A single nutrient feed supports both chemically defined NS0
and CHO fed-batch processes: Improved productivity and lactate metabolism."
Biotechnol Prog 25(5): 1353-1363.
Nordstrom, A., G. O'Maille, et al. (2006). "Nonlinear Data Alignment for UPLC-MS and
HPLC-MS Based Metabolomics: Quantitative Analysis of Endogenous and
Exogenous Metabolites in Human Serum." Anal Chem 78(10): 3289-3295.
Ohta, T., N. Masutomi, et al. (2009). "Untargeted metabolomic profiling as an evaluative tool
of fenofibrate-induced toxicology in Fischer 344 male rats." Toxicol Pathol 37(4): 521-
535.
Oliver, M. J., L. Guo, et al. "A sister group contrast using untargeted global metabolomic
analysis delineates the biochemical regulation underlying desiccation tolerance in
Sporobolus stapfianus." Plant Cell 23(4): 1231-1248.
Patterson, A. D., H. Li, et al. (2008). "UPLC-ESI-TOFMS-based metabolomics and gene
expression dynamics inspector self-organizing metabolomic maps as tools for
understanding the cellular response to ionizing radiation." Anal Chem 80(3): 665-
674.
Scalbert, A., L. Brennan, et al. (2009). "Mass-spectrometry-based metabolomics: limitations
and recommendations for future progress with particular focus on nutrition
research." Metabolomics 5(4): 435-458.
Scheltema, R., S. Decuypere, et al. (2009). "Simple data-reduction method for high-resolution
LC-MS data in metabolomics." Bioanalysis 1(9): 1551-1557.
Smith, C. A., E. J. Want, et al. (2006). "XCMS: processing mass spectrometry data for
metabolite profiling using nonlinear peak alignment, matching, and identification."
Anal Chem 78(3): 779-787.
Sreekumar, A., L. M. Poisson, et al. (2009). "Metabolomic profiles delineate potential role for
sarcosine in prostate cancer progression." Nature 457(7231): 910-914.
Storey, J. D. and R. Tibshirani (2003). "Statistical significance for genomewide studies." Proc
Natl Acad Sci U S A 100(16): 9440-9445.
Thielen, B., S. Heinen, et al. (2009). "mSpecs: a software tool for the administration and
editing of mass spectral libraries in the field of metabolomics." BMC Bioinformatics
10: 229.
Tolstikov, V. V. and O. Fiehn (2002). "Analysis of highly polar compounds of plant origin:
combination of hydrophilic interaction chromatography and electrospray ion trap
mass spectrometry." Anal Biochem 301(2): 298-307.
Want, E. J., A. Nordstrom, et al. (2007). "From exogenous to endogenous: the inevitable
imprint of mass spectrometry in metabolomics." J Proteome Res 6(2): 459-468.
Watson, M., A. Roulston, et al. (2009). "The small molecule GMX1778 is a potent inhibitor of
NAD+ biosynthesis: strategy for enhanced therapy in nicotinic acid
phosphoribosyltransferase 1-deficient tumors." Mol Cell Biol 29(21): 5872-5888.
Werner, E., J. F. Heilier, et al. (2008). "Mass spectrometry for the identification of the
discriminating signals from metabolomics: current status and future trends." J
Chromatogr B Analyt Technol Biomed Life Sci 871(2): 143-163.
Wilson, I. D., J. K. Nicholson, et al. (2005). "High resolution "ultra performance" liquid
chromatography coupled to oa-TOF mass spectrometry as a tool for differential
metabolic pathway profiling in functional genomic studies." J Proteome Res 4(2):
591-598.
Wishart, D. S. (2009). "Computational strategies for metabolite identification in
metabolomics." Bioanalysis 1(9): 1579-1596.
Wishart, D. S. (2011). "Advances in metabolite identification." Bioanalysis 3(15): 1769-1782.
Wishart, D. S., D. Tzur, et al. (2007). "HMDB: the Human Metabolome Database." Nucleic
Acids Res 35(Database issue): D521-526.
Xia, J., N. Psychogios, et al. (2009). "MetaboAnalyst: a web server for metabolomic data
analysis and interpretation." Nucleic Acids Res 37(Web Server issue): W652-660.
Part 3
USA
1. Introduction
In the drug development process, candidate compounds are first screened for desirable
biological properties such as effects on gene expression, signal transduction, or enzyme activity.
The genetic and metabolic pathways used in the readouts are known as targets of the drug
screening process. Despite advances in molecular targeting, proteomics and metabolomics,
drug screening with molecular or metabolic targets has not produced results that meet the
need of the pharmaceutical industry in the selection of small-molecule leads/targets for clinical
testing. The relative lack of success in applying the -omics in drug screening is partly due to the
inability of the -omics to account for metabolic regulation, a property of the cellular metabolic
network. More recently, tracer-based metabolomics has been developed as an experimental
approach for the study of cellular metabolic networks. Interconversions of metabolites are
measured in terms of “extreme pathways” of the metabolic network, which can be used for
drug screening purposes. In this paper, these approaches for drug screening targeting genetic
pathways (transcriptomics), biochemical pathways (metabolomics and fluxomics) and “extreme
pathways” (tracer-based metabolomics) are compared. The advantages and limitations of these
approaches for metabolic research and drug screening are discussed.
the regulation of metabolism. This popular molecular genetic approach to drug screening is
based on the assumption that the effect of drugs on metabolism and metabolic regulation is
determined by gene transcription and translation alone.
The rationale for choosing gene switches as targets for drug screening can be illustrated by the
example of the action of the tumor suppressor gene (P53) in cancer metabolism. Cancer cells
have metabolic characteristics that are distinct from those of normal cells in that there is an
overall increase in macromolecular synthesis to sustain cell growth and proliferation. These
metabolic characteristics are generally grouped under the Warburg effect, which consists of
increased anaerobic glycolysis, decreased glucose oxidation and increased glutamine utilization (1). A
representation of the model of gene switches is depicted in Figure 1. The signals that
orchestrate these metabolic changes originate from the balance between oncogenes (growth
promoting factors) that turn on signaling pathways regulating the utilization of substrates for
growth and tumor suppressor genes such as P53 that modulate energy utilization. The loss of a
cancer suppressor gene or the over-expression of an oncogene may be sufficient to generate
genetic signals to switch on or off (or modulate) metabolic pathways resulting in the cancer
cell metabolic phenotype. The interaction between molecular pathways and metabolic
pathways in cancer has recently been reviewed (1). At the molecular level, P53 regulates
transcription of genes that modulate PI3K, Akt and mTOR pathways (growth promoting
pathways) to reduce cancer growth. Excessive growth induces expression of P53 in cells
keeping cell growth and cell death in balance. Independently, P53 inhibits glucose uptake,
ribose synthesis and glycolysis thus modulating cellular metabolism. When the action of P53
is lost due to mutation, cells take up more glucose for ribose synthesis and glycolysis, the key
elements of the Warburg effect. The fact that the actions of P53 can be used to explain the
cancer metabolic phenotype suggests that any signaling pathway that interacts with P53 is a
potential target for anticancer drug screening.
The use of genetic pathways for the understanding of metabolism and drug screening has its
limitations. The interactions among signaling pathways are often based on demonstrations
using artificial overexpression or underexpression of these pathways. The real actions of these
signaling pathways in normal physiology are not exactly known. The quantitative relationship
connecting gene expression to metabolism has not been worked out. Therefore, the genetic
switch hypothesis is only one possible explanation for the expression of the cancer metabolic
phenotype. Conceptual limitations of genetic switches in the understanding of metabolism or
the metabolic effects of drugs were noted by D. E. Koshland Jr (2) almost half a century ago.
He pointed out that overproduction or underproduction of enzymes by molecular
manipulation may sometimes have dramatic effects on an organism and at other times only
minor effects. The overall effect of genetic manipulation on cellular metabolism cannot always
be predicted. The lack of observable effect when an enzyme concentration is changed is
analogous to the “silent” phenotypes (3) of the carrier states of many recessive diseases, in which
enzyme or protein concentrations of the affected genes can be substantially reduced.
Discrepancies in genotype-phenotype correlation between signaling pathways and
metabolism, when they occur, may be explained by our incomplete knowledge of the feedback
regulation of the signaling pathways as well as of the metabolic regulation of cellular metabolism.
However, the lack of genotype-phenotype correlation in many cases can be attributed to
conceptual difficulties in using genetic switches to understand metabolism. First,
metabolic regulation is rarely an “all-or-none” type of control. According to metabolic
Metabolic Pathways as Targets for Drug Screening 197
control analysis, the regulation of a metabolic pathway is distributed over many of the
enzymes in the pathway. Transcriptional or post-translational modification of an enzyme
potentially changes the Km and/or Vmax of its reaction. However, the change in Km or
Vmax of one enzyme may be compensated by either a change in precursor substrate
concentration or by a shift in the locus of control of the reaction to other enzymes such that
net flux remains unchanged. Secondly, the model of metabolic switches does not take into
account how a change in one metabolic pathway may affect the many other pathways
that are connected to it by shared substrates or co-factors, and vice versa. The lack of a
quantitative relationship between genotype and phenotype is the Achilles’ heel of the gene
switching hypothesis† and of the use of genetic pathways for drug screening.
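The compensation argument can be made concrete with a single enzyme obeying Michaelis-Menten kinetics, v = Vmax·S/(Km + S). This is a deliberately simplified sketch: a one-enzyme model is an assumption made for illustration (metabolic control analysis deals with whole pathways), and the parameter values are arbitrary.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)

def substrate_for_rate(v, vmax, km):
    """Substrate concentration that gives rate v; only defined for 0 <= v < Vmax."""
    if not 0 <= v < vmax:
        raise ValueError("rate must be non-negative and below Vmax")
    return v * km / (vmax - v)

# Hypothetical enzyme, arbitrary units.
km, vmax, s = 1.0, 10.0, 0.25
flux = mm_rate(s, vmax, km)         # baseline flux through the step

vmax_reduced = 0.5 * vmax           # e.g. enzyme expression halved
s_needed = substrate_for_rate(flux, vmax_reduced, km)
flux_after = mm_rate(s_needed, vmax_reduced, km)
```

Here halving Vmax leaves the flux unchanged provided the substrate concentration rises from 0.25 to about 0.67, so a twofold change in enzyme level produces no change in the metabolic phenotype measured as flux. The compensation fails only when the required rate approaches the reduced Vmax.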
[Figure: pathway diagram with the labels glucose-6-P, ribose-5-P + NADPH, fructose-6-P, fructose-1,6-di-P, P53, pyruvate, lactate, glutamine, malate, α-KG and TCA cycle.]
†Gene expression can be quantitatively determined using the RT-PCR method. Results are reported as
fold change. Even though there may be a correlation between the fold change and the observed
metabolic effect, the correlation is not a quantitative one.
‡Systems biology as commonly defined is the enumeration of a collection of biologically related objects
(genomics, proteomics and metabolomics) or characteristics (transcriptomics and fluxomics) within the
boundary of a cell. However, in actuality the context of a cellular boundary i.e. how these objects or
characteristics separate the cell from its environment is often absent in the definition of these systems (4).
Fig. 2. A model of pentose phosphate cycle used for carbon tracing and flux analysis. Using
experimentally determined isotopomer distribution in ribose and lactate, the fluxes of the
numbered reactions can be calculated (from reference 9 with permission).
The fluxomic approach targeting traditional biochemical reactions provides more specific
information regarding the metabolic system than metabolite profiling (metabolomics). The
use of fluxomics allows the simultaneous assessment of the effect of a drug on multiple
metabolic pathways and permits a better understanding of metabolism than the gene-
targeting approach. However, in order to take into account futile cycling or stoichiometric
constraints, stable isotope tracing (carbon tracing) is required as illustrated in the above
example. Even though it is possible to construct a complex model of mammalian metabolic
networks that takes futile cycles and stoichiometric constraints into account, such a model
requires a very large data set and extensive programming. Even in the best case, there are
never sufficient data to solve for all the parameters of the system, and the results are model
dependent and difficult to verify for practical reasons (2). Nonetheless, the fluxomics
approach provides better correlation with phenotype than the gene switch
targeting approach.
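Why measured data alone leave a futile cycle unresolved can be sketched with a toy stoichiometric system. The two-metabolite network below (uptake v1, A to B as v2, export of A as v3, export of B as v4, and B back to A as v5) is hypothetical and is not the model of Figure 2; the point is only to count how many flux directions remain free under a steady-state mass balance.

```python
from fractions import Fraction


def rank(matrix):
    """Matrix rank by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Hypothetical network; columns are fluxes v1..v5, rows are metabolites A and B.
S = [
    [1, -1, -1, 0, 1],   # A: made by v1 and v5, used by v2 and v3
    [0, 1, 0, -1, -1],   # B: made by v2, used by v4 and v5
]
n_fluxes = len(S[0])
free_without_data = n_fluxes - rank(S)

# Measuring uptake (v1) and one export (v3) adds two independent constraints.
measured = S + [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]]
free_with_data = n_fluxes - rank(measured)

print(free_without_data, free_with_data)
```

One degree of freedom survives the measurements: v2 and v5 can increase together without changing any net balance. Resolving this forward and reverse exchange is the kind of information that carbon tracing is needed to supply.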
utilize substrates from its environment to produce energy and building material for the
synthesis of macromolecules. Excess intermediates are returned to the surrounding
environment to maintain a relatively constant internal environment. The boundaries of
metabolic activities represented by “extreme pathways” within which the cell functions
define the homeostatic state (10, 11). These boundaries are the result of constraints by the
stoichiometry of competing reactions, synchronization of shared pathways and/or
intermediates, and balance of energy production and utilization.
The role of “extreme pathways” in the maintenance of homeostasis can be illustrated by the
example of glucose metabolism via the TCA cycle. Pyruvate from glycolysis is metabolized
via pyruvate carboxylation leading to the conservation of 3-carbon species or pyruvate
decarboxylation leading to production of 2-carbon species (via acetyl-CoA) and energy
production (beta-oxidation and tricarboxylic acid (TCA) cycle) (16). These two processes are
concurrent in cells and the activity of one pathway constrains the activity of the other. For a
given homeostatic state, the observed utilization of pyruvate via these pathways is the
optimal§ pyruvate utilization and can be represented by a vector in the pyruvate phenotypic
phase plane. The operation of the TCA cycle is an example of metabolic constraint due to
synchronization of shared pathways or intermediates. A full turn of the TCA cycle oxidizes a
mole of acetate into two moles of carbon dioxide with production of reducing equivalents
and/or high energy phosphates. At the same time each of the TCA cycle intermediates may
have its respective substrate cycle such as the malate cycle and the citrate lyase cycle. These
individual substrate cycles perform separate metabolic functions in conveying reducing
equivalents (malate shuttle) and acetyl-CoA (citrate lyase cycle) from the mitochondria to the
cytosol. The operations of these cycles are usually synchronized for efficiency. When these
cycles lose synchrony, abnormal substrate and energy balance can result and
homeostasis in the cell is lost. The imbalance of energy metabolism in the mitochondria due
to imbalance of substrate cycles is a frequent cause for reactive oxygen species generation and
apoptosis. Changes in these boundaries consisting of “extreme pathways” are sensitive to
metabolic or therapeutic perturbations and are excellent markers of therapeutic effects.
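The mutual constraint between pyruvate carboxylation and decarboxylation, and the representation of the resulting phenotype as a vector, can be sketched as follows. This is a simplified illustration that assumes a fixed pyruvate supply shared by only these two routes; the function names and flux values are hypothetical.

```python
import math


def split_pyruvate(supply, fraction_carboxylated):
    """Divide a fixed pyruvate supply between carboxylation and decarboxylation."""
    if not 0.0 <= fraction_carboxylated <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    carboxylation = supply * fraction_carboxylated
    decarboxylation = supply - carboxylation
    return carboxylation, decarboxylation


def phase_plane_vector(carboxylation, decarboxylation):
    """Magnitude and angle (degrees) of the phenotype vector in the pyruvate plane."""
    magnitude = math.hypot(carboxylation, decarboxylation)
    angle = math.degrees(math.atan2(decarboxylation, carboxylation))
    return magnitude, angle


# Assumed baseline state: a quarter of the pyruvate is carboxylated.
pc, pdh = split_pyruvate(supply=10.0, fraction_carboxylated=0.25)

# Assumed perturbed state: half is carboxylated, so decarboxylation must fall.
pc_shift, pdh_shift = split_pyruvate(supply=10.0, fraction_carboxylated=0.5)

print(phase_plane_vector(pc, pdh))
print(phase_plane_vector(pc_shift, pdh_shift))
```

Because the supply is fixed, every attainable state lies on the line where the two fluxes sum to the supply, and a perturbation appears as a rotation of the vector along that line.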
The differences between a metabolic network and a traditional biochemical reaction model
can best be shown by representing a metabolic network as an engineering system. The
working of such a system is illustrated in Figure 3, in which the pathways shown in Figure 2 are
represented as belts and wheels connecting glycolytic/gluconeogenic substrates to those of
the pentose cycle. The enzymes that drive the belts are indicated and the roles of energy
production and utilization are included. Figure 2 is a model of pentose cycle intermediates
linked by enzymatic reactions. The fluxes of these reactions can be modeled mathematically
using a set of ordinary differential equations. Figure 3 shows the production and
consumption of different classes of compounds connected to the production and
consumption of ATP and reducing equivalents. These models are conceptually different.
The input-output model of cellular homeostasis of tracer-based metabolomics can account
for stoichiometric constraints and synchronization of substrate cycles thus overcoming
limitations of the previous approaches in metabolic studies.
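The contrast between the two kinds of model can be sketched with a deliberately small example. The two-pool chain below (input u into pool A, A converted to B, B exported) uses first-order kinetics and forward Euler integration; the rate constants, step size and names are assumptions made for illustration and do not correspond to the reactions of Figure 2.

```python
def simulate(u, k1, k2, dt=0.01, steps=5000):
    """Integrate dA/dt = u - k1*A and dB/dt = k1*A - k2*B by forward Euler."""
    a = b = 0.0
    for _ in range(steps):
        da = u - k1 * a
        db = k1 * a - k2 * b
        a += dt * da
        b += dt * db
    return a, b


# Hypothetical parameters in arbitrary units.
u, k1, k2 = 2.0, 1.0, 0.5
a_ss, b_ss = simulate(u, k1, k2)

# Input-output view at steady state: what leaves the system equals what enters.
output_flux = k2 * b_ss

print(a_ss, b_ss, output_flux)
```

The differential equations describe how each pool changes with time and need every rate constant, whereas the input-output statement at steady state (output flux equals input u) holds regardless of the individual constants.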
§Optimality is sometimes thought of as a teleological concept. The optimal metabolic function of a cell
is not for its purpose to survive, but is defined by the internal organization of the metabolic network.
Fig. 3. An engineering model of the system of reactions depicted in Figure 1 and Figure 2.
The relationship among the different substrate pools is represented by different circles.
Stoichiometric relationships are provided by the mass balance equations. The productions of
these metabolites from one another are indicated by respective drive belts. Energy substrate
consumption and production are also included in the model. The metabolic network and its
function are shown as a factory production model with sources and sinks of the raw materials
and products.
The basic concept and tracer methodology of tracer-based metabolomics have been
reviewed (4, 10-12). A key feature that distinguishes tracer-based metabolomics from
metabolite profiling (metabolomics) and fluxomics is the inclusion of a system boundary
that permits input-output analysis and a balance of flux** model in which substrate input is
linked to its output (products) by “extreme pathways” (12, 13, 14). “Extreme pathways” are
the pathways over which elements (carbon, oxygen and nitrogen) from compounds (precursors)
introduced into the system travel to the final products. The basic elements of “extreme
pathways” form the axes of a high-dimensional phenotypic space. Any two of these axes form
a phenotypic phase plane, and the line of optimality is a vector within the space (or a
plane) representing the metabolic phenotype. The relationship among any three “extreme
**A balance of flux analysis requires a steady state or quasi-steady state assumption. For most cellular
processes involving cell growth and division, these processes are slow relative to the experimental
study period and quasi-steady state of metabolic reactions may be safely assumed. However, in
biological processes that are fast such as muscle contraction, or nerve conduction the balance of flux
model cannot be applied and a dynamic model is required.
G2-G1-G1-G2-
M5 --- --- --- ---
G3@
G2-G1-G1- G2-G1-G1-G2-X;
M4 --- --- ---
G2@ G2-G1-G1-G1-X
The orientations of the amino acid molecules are shown in the top row. Mass isotopomers are
designated as M1 to M5 indicating the number of 13C per molecule of the amino acid. The
corresponding position of glucose carbon within the amino acid is designated as G1 to G3. The glucose
molecule is symmetrical around C3-C4. In the table, G1-G2-G3 is the same as G6-G5-G4, if these
positions are labeled equally. X represents 12C from exchange within the TCA cycle.
@ When glucose enrichment is high, there is a likelihood of labeled OAA condensing with labeled acetyl-
Table 1a. Examples of Position and Mass Isotopomer Distribution in Gluconeogenic Amino
Acids from [U-13C6]-glucose (16-24)
Oxidative G6-G5-G4-G3-
M5
G2
Non-oxidative Glycogen glucose are
M4 labeled as in G3-G2-X-G3-G2
glycolysis/
M3 G3-G2-X-X-G3
Oxidative gluconeogenesis
shown above X-X-X-G3-G2
plus M2
G3-G2-X-X-X
Non-oxidative
M1 G3
The orientations of the glycogen, ribose and lactate molecules are shown in the top row. Mass
isotopomers are designated as M1 to M6, indicating the number of 13C per molecule.
The corresponding position of glucose carbon within the glycogen, ribose and lactate is designated as
G1 to G6. The glucose molecule is symmetrical around C3-C4. In the table, G1-G2-G3 is the same as
G6-G5-G4, if these positions are labeled equally. X represents 12C from exchange within the TCA cycle.
Table 1b. Examples of Position and Mass Isotopomer Distribution in Glycogen, Ribose and
Lactate from [U-13C6]-glucose (16-24)
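The notation used in these tables can be read mechanically, as the short sketch below illustrates. It assumes uniformly labeled glucose, so that every G position carries 13C and every X is 12C, and it applies the symmetry rule stated in the table notes. The helper names `canonical` and `mass_isotopomer` are hypothetical.

```python
def canonical(pattern):
    """Map G4, G5, G6 onto G3, G2, G1, following the table note that
    G1-G2-G3 is the same as G6-G5-G4 when these positions are labeled equally."""
    out = []
    for pos in pattern.split("-"):
        if pos == "X":
            out.append(pos)
        elif len(pos) == 2 and pos[0] == "G" and pos[1] in "123456":
            k = int(pos[1])
            out.append(f"G{7 - k if k > 3 else k}")
        else:
            raise ValueError(f"unrecognized position: {pos!r}")
    return "-".join(out)


def mass_isotopomer(pattern):
    """Return the Mn designation: n is the number of glucose-derived 13C carbons."""
    n = sum(1 for pos in canonical(pattern).split("-") if pos != "X")
    return f"M{n}"


# Position patterns as they appear in Table 1b.
for p in ["G3-G2-X-G3-G2", "G3-G2-X-X-G3", "X-X-X-G3-G2", "G3-G2-X-X-X"]:
    print(p, mass_isotopomer(p))
```

Different position patterns can share the same mass isotopomer, as with the two M2 patterns above, which is why position information adds to what mass alone can distinguish.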
Fig. 4. Examples of phenotypic phase plane analysis showing the quantitative relationship
among phenotypes by isocline analysis. Panel (i) shows effects of two drugs (A and B) with
different mechanisms on metabolic pathways X and Y. N represents the normal phenotype.
Panel (ii) shows the effect of A is orthogonal to the X-Y plane. Panel (iii) shows dose
dependent effect of A. Panel (iv) shows non-linear response to two different doses of A.
intervention or by chemical inhibitor of the reaction had similar metabolic phenotypes (25).
Panel (ii) shows result of treatment A which is orthogonal to the phenotypic phase plane of
X and Y. This means that treatment A affects a different part of the metabolic system which
is not linked to the utilization of substrates X and Y. Metformin and rosiglitazone, two
antihyperglycemic drugs, have been shown to alter de novo lipogenesis. The inhibition of
de novo lipogenesis by metformin lies in the plane of ribose metabolism, meaning that changes in
pentose cycle metabolism are related to the decrease in lipogenesis. By contrast, the
increase in fatty acid synthesis by rosiglitazone is orthogonal to the ribose phenotypic phase
plane, suggesting very different mechanisms of action for these two drugs (26). The ability to
detect orthogonal phenotypic phase planes is important because there are potentially many
of these orthogonal planes to be discovered using tracer-based
metabolomics, and each orthogonal pair would suggest different mechanisms of action
for different drugs. The finding of orthogonal planes is one of the unique capabilities of the
metabolomics approach in generating mechanistic hypotheses. Panel (iii) shows the
proportional response to an inhibitor of substrate X, where all of the isoclines are parallel to
each other. An example of this type of response is provided by our study of the response of
a methotrexate-resistant colon cancer cell line (HT29) to DHEAS, oxythiamine
and methotrexate treatment, alone and in combination (27). Panel (iv) shows the response to
two inhibitors of substrate X with non-linear compensation of substrate Y. The application
of PPP analysis has allowed a far better understanding of metabolic adaptation in cellular
homeostasis using tracer-based metabolomics. Using PPP and isocline analysis, we can
directly exploit the large datasets accumulated from tracer-based metabolomics studies for
target discovery and lead identification in the pharmaceutical industry.
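The geometric relationships described for Figure 4 can be sketched by treating each treatment as a displacement from the normal phenotype N and comparing displacement directions. The coordinates below are invented for illustration (axes X, Y and a third pathway Z), and the helper names and the tolerance are assumptions rather than part of the published analysis.

```python
import math


def effect(normal, treated):
    """Displacement of a treated phenotype from the normal phenotype."""
    return tuple(t - n for n, t in zip(normal, treated))


def cosine(u, v):
    """Cosine of the angle between two effect vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("an effect vector has zero length")
    return dot / (nu * nv)


def classify(u, v, tol=1e-6):
    """Label a pair of effects as orthogonal, parallel or oblique."""
    c = cosine(u, v)
    if abs(c) < tol:
        return "orthogonal"
    if abs(abs(c) - 1.0) < tol:
        return "parallel"
    return "oblique"


# Hypothetical phenotypes as (X, Y, Z) pathway utilizations.
N = (10.0, 10.0, 5.0)
a_low = effect(N, (8.0, 11.0, 5.0))    # drug A, low dose
a_high = effect(N, (6.0, 12.0, 5.0))   # drug A, high dose
b = effect(N, (9.0, 7.0, 5.0))         # drug B, different mechanism in the X-Y plane
c = effect(N, (10.0, 10.0, 8.0))       # drug C, acts outside the X-Y plane

print(classify(a_low, a_high), classify(a_low, b), classify(a_low, c))
```

In this toy data set the two doses of A give parallel displacements (a proportional response), B moves the phenotype in a different direction within the same plane, and C is orthogonal to the X-Y plane, which would point to a mechanism not linked to the utilization of X and Y.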
7. Concluding comments
The study of metabolism in the post-genomic era differs from traditional biochemistry in
that it is focused on the function of the system of biochemical reactions in a cell (or
the cellular metabolic network) and its regulation. The metabolic function of a living organism
(cell) is what mediates the genetic potential of a cell and its interaction with its
environment to maintain homeostasis. The maintenance of homeostasis by the cellular
metabolic network in a living organism is the basis of normal physiology and histology (28).
When the metabolic environment of a living organism or a cell is altered such as in diabetes
or metabolic diseases, maladaptation or the lack of homeostasis in the living organism is the
underlying cause for pathophysiology and histopathology (29).
The metabolic phenotype of a cell is the result of genetic and environmental interaction.
Understanding changes in metabolic phenotype is important to our understanding of how cells
maintain homeostasis by metabolic regulation. We have reviewed three approaches that are
used in such investigations based on three different models. Of these different approaches, the
gene-switch approach is the most extensively used in the pharmaceutical industry. In the gene-
switch model, metabolic regulation begins with the interaction between genes and signaling
pathways which eventually impact on biochemical reactions known as down-stream effects
(30). This model ignores the fact that many of these signaling pathways or transcriptional
factors are altered through post-translational events such as phosphorylation, acetylation,
glycosylation and methylation. Since all these post-translational modifications are basic
biochemical reactions, they are all subject to the stoichiometric and energy substrate constraints.
Once the downstream events are initiated, the interconversion of metabolic intermediates is
subject to all the constraints as described in preceding paragraphs. We have previously shown
that altered metabolic pathways can be the initiating events in gene transcription and
post-translational modification of signaling pathways and enzymes (31, 32). Therefore, the gene-switch
model is an incomplete model for understanding metabolic regulation. Because of these conceptual
problems, application of the gene-switch approach has had disappointing results in identifying
drug candidates or targets and appalling failures in clinical trials due to unexpected toxicity or
lack of efficacy. Of the two remaining approaches, tracer-based metabolomics is a practical
experimental approach that does not require complicated mathematical modeling.
Furthermore, the results can be graphically presented and the quantitative difference of
metabolic phenotypes can be compared. Such features make tracer-based metabolomics a
powerful approach for drug screening in pharmaceutical research. Since the model does not
assume any signaling pathways, it is most suitable for studies of nutraceuticals such as
phytochemicals (33) or for screening of compounds in a chemical library that have no known
molecular targets (34). It is also applicable to investigating the metabolic effects of drug
combinations, in which the interaction of drugs can be studied. Most important of all, the
tracer-based metabolomics approach provides an understanding of cellular homeostasis and its
changes under the influence of nutrient conditions or pharmaceuticals.
Since our first publication on metabolic profiling (35), progress in tracer-based
metabolomics has been slow because there are few investigators who are trained in tracer
technology. The current tracer model mainly addresses the area of glucose metabolic
pathways. Methods for the investigation of other metabolic systems that may be distantly
connected to glucose metabolism (orthogonal systems) are not represented. These systems
include the systems of glutamine metabolism which connects glucose metabolism to nucleic
acid synthesis; arginine metabolism which is part of the urea cycle and nitric oxide synthesis
system; and the methyl donor pathways which are important in nucleic acid synthesis and
choline synthesis. The complete development of tracer-based metabolomics is probably a
decade away provided the development has the attention and adequate funding to complete
the tasks to cover the metabolic pathways of the whole cellular metabolic network.
8. Acknowledgement
This work is supported by the Biomedical Mass Spectrometry Laboratory of the GCRC at
the Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center (PHS M01-
RR00425; UCLA; CTSI 1UL1-RR033176) and the Metabolomics Core Laboratory of the
UCLA Center of Excellence in Pancreatic Diseases (P01 AT003960).
9. References
[1] Levine AJ, Puzio-Kuter AM. The Control of the Metabolic Switch in Cancers by
Oncogenes and Tumor Suppressor Genes. Science. 2010 Dec 3;330(6009):1340-4.
[2] Koshland Jr. DE Control of enzyme activity and metabolic pathways. pp1-8, in
“Metabolic Regulation” (eds. R.S. Ochs, R.W Hanson and J Hall), Elsevier Science
Publisher, Amsterdam (1985).
[3] Raamsdonk LM, Teusink B, Broadhurst D, Zhang N, Hayes A, Walsh MC, Berden JA,
Brindle KM, Kell DB, Rowland JJ, Westerhoff HV, van Dam K, Oliver SG. A
Functional Genomics Strategy That Uses Metabolome Data to Reveal The
Phenotype of Silent Mutations. Nat. Biotechnol 2001; 19:45–50.
[4] Cascante, M., Comin, B., Boren, J., Raïs, B., Centelles, J.J., Puigjaner, J., Lee, W.-N. P.,
Boros, L.G. "Application of Metabolic Control Analysis to the Design of a New
Strategy for Cancer Therapy", pp. 173-180 in "Technological and Medical
Implications of Metabolic Control Analysis" (ed. A. Cornish-Bowden and M. L.
Cárdenas), Kluwer Academic Publishers, Dordrecht (2000)
[5] Cascante M, Boros LG, Comin-Anduix B, de Atauri P, Centelles JJ, Lee PW. Metabolic
control analysis in drug discovery and disease. Nat Biotechnol. 2002 Mar;20(3):243-9.
[6] Lee WN. Characterizing Metabolic Phenotype Using Tracer-Based Metabolomics.
Metabolomics 2006, 2:31-39.
[7] Zamboni N, Fendt SM, Rühl M, Sauer U. (13)C-based metabolic flux analysis. Nat
Protoc. 2009;4(6):878-92. Epub 2009 May 21.
[8] Selivanov VA, Puigjaner J, Sillero A, Centelles JJ, Ramos-Montoya A, Lee PW, Cascante
M. An Optimized Algorithm for Flux Estimation From Isotopomer Distribution in
Glucose Metabolites. Bioinformatics 2004; 20:3387-3397.
[9] Selivanov VA, Meshalkina LE, Solovjeva ON, Kuchel PW, Ramos-Montoya A, Kochetov
GA, Lee PW, Cascante M. Rapid Simulation and Analysis of Isotopomer
Distributions Using Constraints Based on Enzyme Mechanisms: An Example From
HT29 Cancer Cells. Bioinformatics 2005; 21:3558-3564.
[10] Lee WN, Go VL. Nutrient-Gene Interaction: Tracer-Based Metabolomics. J Nutr 2005;
135:3027S-3032S.
[11] Lee, WN, Wahjudi PN, Xu, J, Go VLW. Tracer-based Metabolomics: Concepts and
Practices. Clin Biochem. 2010 Nov;43(16-17):1269-77.
[12] Maguire G, Boros LG and Lee P. Development of Tracer-Based Metabolomics and its
Implications for the Pharmaceutical Industry. Int J Pharm Med 2007; 21 (3): 217-224.
[13] Schilling CH, Palsson BO. The Underlying Pathway Structure of Biochemical Reaction
Networks. Proc Natl Acad Sci 1998; 95:4193-4198.
[14] Schilling CH, Letscher D, Palsson BO. Theory for the Systemic Definition of Metabolic
Pathways and Their Use in Interpreting Metabolic Function From a Pathway-
Oriented Perspective. J Theor Biol 2000; 203:229-248.
[15] Edwards JS, Ramakrishna R, Palsson BO. Characterizing the Metabolic Phenotype: a
Phenotype Phase Plane Analysis. Biotechnol Bioeng. 2002; 77:27-36.
[16] Katz J., Lee WP. The Application of Mass Isotopomer Analysis in The Determination of
Pathways of Glycogen Synthesis. Am. J. Physiol 1991; 259:E757-E762.
[17] Lee WNP, Boros LG, Puigjaner J, Bassilian S, Lim S, Cascante M. Investigation of The
Pentose Cycle Using [1, 2-13C2]-Glucose and Mass Isotopomer Analysis:
Estimation of Transketolase and Transaldolase Activities. Am. J. Physiol 1998; 274:
E843-E851.
[18] Katz J, Lee WNP, Wals PA, Bergner EA. Studies of Glycogen Synthesis and the Krebs
Cycle by Mass Isotopomer Analysis with U-13C-Glucose in Rats. J. Biol. Chem
1989; 264:12994-13001.
[19] Katz J, Wals PA, Lee WP. Determination of Pathways of Glycogen Synthesis and
Dilution of the 3-Carbon Pool with [U-13C6]-Glucose. Proc. Natl. Acad. Sci 1991;
88:2103-2107.
[20] Katz J, Wals PA, Lee WP. Isotopomer Studies of Gluconeogenesis and the Krebs Cycle
with 13C Labeled Lactate. J. Biol. Chem 1993; 268:25511-25521.
[21] Lee WNP. Appendix: Analysis of Tricarboxylic Acid Cycle Using Mass Isotopomer
Ratios. J. Biol. Chem 1993; 268:25522-25526.
[22] Lee WP, Edmond J, Bassilian S, Morrow J. Mass Isotopomer Study of Glutamine
Oxidation and Synthesis in Primary Culture of Astrocytes. Develop. Neurosci 1996;
18:469-477.
[23] Fu TF, Rife JP, Schirch V. The Role of Serine Hydroxymethyltransferase Isozymes in
One-Carbon Metabolism in MCF-7 Cells as Determined by (13)C NMR. Arch
Biochem Biophys 2001; 393:42-50.
[24] Solà A, Maaheimo H, Ylönen K, Ferrer P, Szyperski T. Amino Acid Biosynthesis and
Metabolic Flux Profiling of Pichia Pastoris. Eur J Biochem 2004; 271:2462-2470.
[25] Alcarraz-Vizán G, Boren J, Lee WN, Cascante M. Histone deacetylase inhibition results
in a common metabolic profile associated with HT29 differentiation. Metabolomics.
2010:229-237.
[26] Personal unpublished observation.
[27] Ramos-Montoya A, Lee WN, Bassilian S, Lim S, Trebukhina RV, Kazhyna MV, Ciudad
CJ, Noe V, Centelles JJ, Cascante M. Pentose phosphate cycle oxidative and
nonoxidative balance: A new vulnerable target for overcoming drug resistance in
cancer. Int J Cancer. 2006:119(12):2733-41.
[28] Boros LG, Lee WNP, Go VLW. A Metabolic Hypothesis of Cell Growth and Death
in Pancreatic Cancer. Pancreas 2002; 24:26-33.
[29] Boros LG, Steinkamp MP, Fleming JC, Lee WNP, Neufeld EJ. Defective RNA Ribose
Synthesis in Thiamine-Responsive Megaloblastic Anemia (TRMA): Mechanism for
the Syndrome. Blood 2003; 102:3556-3561.
[30] Zanuy M, Ramos-Montoya A, Villacanas O, Canela N, Miranda A, Aguilar E,
Agell N, Bachs O, Rubio-Martinez J, Pujol MD, Lee WNP, Marin S, Cascante M.
Cyclin-dependent kinases 4 and 6 control tumor progression and direct glucose
oxidation in the pentose cycle. Metabolomics 2011 DOI 10.1007/s11306-011-0328
[31] Zhang H, Cao R, Lee WN, Deng C, Zhao Y, Lappe J, Recker R, Yen Y, Wang Q, Tsai MY,
Go VL, Xiao GG. Inhibition of protein phosphorylation in MIA pancreatic cancer
cells: confluence of metabolic and signaling pathways. J Proteome Res. 2010 Feb
5;9(2):980-9. PMID: 20035555; PMC2836017
[32] Ma D, Wang J, Zhao Y, Lee WNP, Xiao J, Go VL, Wang Q, Recker R, Xiao GG. Inhibition
of glycogen phosphorylation induces changes in cellular proteome and signaling
pathways in MIA pancreatic cancer cells. Pancreas 2011
DOI 10.1097/MPA.0b013e318236f022
[33] Harris DM, Li L, Chen M, Lagunero FT, Go VLW, Boros LG. Diverse mechanisms of
growth inhibition by luteolin, resveratrol, and quercetin in MIA PaCa-2 cells: a
comparative glucose tracer study with the fatty acid synthase inhibitor C75.
Metabolomics 2011 DOI 10.1007/s11306-011-0300-9
[34] Harrigan GG, Brackett DJ, Boros LG. Medicinal chemistry, metabolic profiling and drug
target discovery: a role for metabolic profiling in reverse pharmacology and
chemical genetics. Mini Rev Med Chem. 2005 Jan;5(1):13-20.
[35] Boros LG, Cascante M, Go VLW, Heber D, Hidvégi M, Lee WNP. Metabolic Profiling of
Cell Growth and Death in Cancer: Applications In Drug Discovery. Drug Discovery
Today 2002; 7:366-374.
Part 4
1. Introduction
Today’s unsustainable use of fossil fuel reserves or green fuel is predicted to destabilize the
global climate and lead to reduced food security. The key challenges for the coming decades
are to meet local needs for food, in terms of both quantity and quality, while conserving
natural resources and biodiversity (Ruane & Sonnino, 2011), and to develop a supply
industry based on renewable plant-derived products. Indeed, agricultural crops can be
viewed as the starting point for a plant-based economy and as a potential input to a
biorefinery in which all parts of the plant are processed and used to yield (i) food, both
traditional and with enhanced nutritional safety, stability and processability; (ii) industrial
products, including polymers, fibres, industrial oils and packaging materials as well as
basic chemical building blocks (green chemistry); (iii) fuels such as ethanol and biodiesel; and
(iv) molecules with pharmaceutical properties and health benefits. To reach these new
agricultural perspectives, new varieties with the appropriate properties need to be selected
(Tester & Langridge, 2010) through plant breeding, be it conventional, marker assisted, QTL
mapping assisted, or genetically modified (GM) (Mittler & Blumwald, 2010). There are also
growing demands for germplasm adapted to deal with changing climates and effective
under a range of cultural practices and for foods with higher nutritional value. To decipher
agronomical traits, functional genomics approaches can help to understand the
physiological, molecular and genetic processes underlying complex traits. Appropriate
functional genomics technologies such as transcriptomics, proteomics and metabolomics
must be used together with detailed physiological and environmental information as a
combined platform for ‘candidate’ gene identification or translational genomics approaches
that aim to improve complex traits in plants (Sanchez et al., 2011). Without a
comprehensive understanding of the plant physiology, molecular processes and genetics of
the components of complex traits, the development of new varieties will remain an
empirical yet uncertain procedure. This integration of functional genomics data can be
viewed as the first step to systems and predictive biology serving agricultural perspectives.
Among the ‘omics’ technologies, metabolomics is one of the more recently introduced. The
term ‘metabolome’, coined in 1998 (Oliver et al., 1998), refers to the richly diverse population
of small molecules present in biofluids, living cells or organisms. Overall, there are two
approaches to analyse small molecules, and they differ in the number of compounds
analysed, the level of structural information obtained, and their sensitivity. The most
common approach, metabolite profiling, is the analysis of small numbers of known
metabolites in specific compound classes (e.g. sugars, amino acids or phenolics). At the
other extreme, metabolic fingerprinting detects many compounds but their structures are
rarely identified. Today metabolomics methods typically allow measuring hundreds of
compounds, with a small number being definitively identified, a larger number being
identified as belonging to particular compound classes, and many remaining unidentified.
Over the past decade, metabolomics has gone from being just a simple concept to becoming
a rapidly growing discipline with valuable outputs in plant biology (Hall, 2006; Saito &
Matsuda, 2010; Hall, 2011a; Shepherd et al., 2011). Metabolomics has played a key role in
basic plant biology and is beginning to find a broad field of applications. Plants
produce an astonishing wealth of metabolites, with estimates ranging from 200,000 to
1,000,000 metabolites (Dixon & Strack, 2003; Saito & Matsuda, 2010). The first significant
advances have been made in the area of analytical technology for metabolite identification
in order to increase our capacity to simultaneously analyse a chemically diverse range of
metabolites in complex mixtures. The metabolomics community has set up analytical
platforms with complementary analytical technologies (Moing et al., 2011) after having
realized that no single technology currently available (or likely in the near future) will be
able to detect all compounds found in living cells. Today these analytical platforms provide
a combination of multiple analytical techniques such as gas chromatography (GC), liquid
chromatography (LC) or capillary electrophoresis (CE) coupled to mass spectrometry (MS),
or nuclear magnetic resonance spectroscopy (NMR) and much more (Kim et al., 2011; Lei et
al., 2011).
Considering metabolomics as a combination of knowledge and know-how in
biochemistry, signal processing, data and metadata handling, and data mining, the
challenge remains to carry out these multidisciplinary approaches in a cohesive and
coordinated manner to solve biological questions (Ferry-Dumazet et al., 2011;
Hall, 2011b). Recently, plant biologists have used metabolomics approaches to understand
fundamental plant processes (Leiss et al., 2010; Sulpice et al., 2010), to make a link
between genotype and biochemical phenotype and to study plant responses to biotic or
abiotic stresses by combining genomics and biochemical phenotyping capabilities
(Redestig & Costa, 2011; Villiers et al., 2011). While full genome sequence annotations of
the major crops have been published, many post-genomic studies using metabolomics
approaches have tried to bridge the phenotype-genotype gap in order to link gene to
function (Smith & Bluhm, 2011). Such integrated approaches have been helpful in
assigning functions to a large class of function-unknown genes and their interactions with
other pathways and also useful in applications such as metabolic engineering (Liu et al.,
2009) and assessment of GM plants (Kusano et al., 2011b).
New Opportunities in Metabolomics and Biochemical Phenotyping for Plant Systems Biology 215
As part of a more recent emerging area, robust data generated from metabolomics can be
combined with computationally-intensive approaches based on modelling of pathways to
steer this field towards systems biology, which promises to provide an integrated view of
cellular processes (Joyce & Palsson, 2006; Wang et al., 2006). Bringing metabolomics data
into the forefront of systems biology is a challenging opportunity that implies using
quantitative metabolomics data in the context of models to improve our understanding of
metabolism and drive the biological discovery process. So far, computational studies on
metabolomics data have often been restricted to multivariate statistical analyses such as
principal component analysis or PLS discriminant analysis to look at trends among different
data sets. Such work has proven useful in discovering potential biomarkers of stress and
identifying key metabolic differences in GM plants, but provides minimal insight into the
underlying biology or the means to modulate it for agronomic or industrial purposes. Now
researchers are rising to the challenge by using omics data integration, and especially
high-throughput metabolomics data, within a constraint-based framework to address
fundamental questions that would increase our understanding of systems as a whole.
This article provides an overview of the technological trends in plant metabolomics to
optimize the characterization of a large number of metabolites with accurate and absolute
quantification in a few samples (concept of vertical high-throughput metabolomics) and
presents the technologies needed to increase the analysis capacity of samples for large-scale
studies (concept of horizontal high-throughput metabolomics). This article also outlines
how these technological developments in plant metabolomics can be used for systems
biology, quantitative genetics and the emerging field of meta-phenomics to answer the key
challenges of plant biology and agriculture in the future, and which technological and
computational developments are necessary to meet these challenges.
Saito & Matsuda, 2010). Even the number of primary metabolites, defined as the type of
compounds synthesized by all or most plant species, may exceed the number of
compounds found in other eukaryotes since plants are true autotrophs (Pichersky &
Lewinsohn, 2011). In addition, different plant lineages synthesize distinct sets of
“specialized metabolites”, often mis-named “secondary metabolites” (Pichersky &
Lewinsohn, 2011), with Arabidopsis thaliana estimated to make up to 3,500 of such
specialized metabolites. Capturing such diversity is one of the challenges for plant
metabolomics compared to animal metabolomics, which has to deal with ‘only’ 5,000 to
25,000 different metabolites (Trethewey, 2004). However, the consumption of plant-
derived food is known to lead to a strong increase in metabolite diversity in animal or
human derived samples, e.g. blood or urine. This implies that plant and nutrition
scientists face a similar challenge. Indeed, specific plant metabolites are attracting
attention due to their role/impact on health and nutrition. Vertical metabolomics mainly
relies on sophisticated instrumentation such as NMR and MS, with or without
hyphenation of chromatography or capillary electrophoresis (LC-NMR, LC-SPE-NMR,
LC-MS, GC-MS, GC-SPE-MS, CE-MS, Fourier Transform-MS (FT-MS), Table 1).
Vertical approaches have to deal with a wide variety of chemical structures, which implies
wide ranges of solubility, polarity and stability, as well as a high dynamic range of
metabolite concentrations (>10^12 (Sumner, 2010); 10^6 (Saito & Matsuda, 2010)). In addition,
plant metabolites are often extracted with elaborate protocols that include steps such as
heating or fractionation, which may not only lose or modify metabolites but also
generate or introduce chemical artefacts. This is why the term analyte, which
might be a metabolite or an artefact, is preferred. For example, during the derivatization
process, which is required for non-volatile compounds when performing GC-MS, a single
metabolite may produce multiple derivatives leading to different peaks. Similarly, adducts
and product ions are formed during the desolvation step following the ionization process in
LC-MS analyses (Werner et al., 2008). To cover the wide range of chemical diversity and
concentrations of plant metabolites, careful experimental design is required,
including special care at harvest (Ernst, 1995), several extraction protocols and multiple
analytical platforms (see Ryan & Robards, 2006; Allwood et al., 2011; and Tables 1-2).
Currently the number of quantified analytes in a given sample and in one shot is
approximately 50 with proton NMR, 100-200 with GC-MS, >1000 with LC-High-Resolution-
MS (LC-HR-MS). This expansion of scale has been made possible through improved
analytical capabilities, dissemination of routine procedures between laboratories, but also
implementation of dedicated statistical and data mining strategies. However, a large
proportion of the analytes detected in plant extracts cannot be annotated and identified
based on chemical shift and multiplicity for NMR analysis, or on elemental formula (based
on m/z ratio and isotopic ratio) and chromatographic retention time for GC- or LC-MS
analysis, alone. Hence metabolite identification, which uses a variety of analytical
techniques along with analyte/metabolite databases, remains difficult (Moco et al., 2007).
Achieving standardization for naming compounds at the plant metabolomics community
level is also an important issue, as it will enable researchers to share knowledge and speed
up metabolite identification (Saito & Matsuda, 2010; Kim et al., 2011). Another challenge is
the development of chemical libraries (chimiothèques), where trusted reference compounds would be available
for the community to validate analyte identifications, for example via spiking experiments.
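The MS annotation step described above, comparing a measured m/z with the mass expected from an elemental formula, can be sketched as follows. This is a minimal illustration only: the monoisotopic masses are standard values, but the 5 ppm tolerance, the two candidate formulas, the measured value and all function names are assumptions chosen for the example. A real workflow would also use the isotopic pattern and the retention time.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307400, "O": 15.99491462}
PROTON = 1.00727647  # proton mass, added to form the [M+H]+ ion


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mz_protonated(formula):
    """m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass(formula) + PROTON


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def annotate(measured_mz, candidates, tol_ppm=5.0):
    """Names of the candidates whose [M+H]+ lies within tol_ppm of the measured m/z."""
    return [name for name, formula in candidates.items()
            if abs(ppm_error(measured_mz, mz_protonated(formula))) <= tol_ppm]


# Hypothetical candidate list and measured value, for illustration only.
candidates = {
    "hexose (C6H12O6)": {"C": 6, "H": 12, "O": 6},
    "citric acid (C6H8O7)": {"C": 6, "H": 8, "O": 7},
}
hits = annotate(181.0707, candidates)
```

The match is reported at the level of the elemental formula: glucose, fructose and galactose all share C6H12O6, which illustrates why a formula alone is not sufficient for identification.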
target metabolites or to run whole 13C metabolome isotope labelling (Feldberg et al., 2009;
Giavalisco et al., 2009). However, even with a stable isotope the matrix effects may impair the
quantification (Jemal et al., 2003) and few isotopically-labelled metabolites are currently
commercially available (Lei et al., 2011). In contrast to MS-based technologies, NMR, although
less sensitive, provides ease of quantitation since the resonance intensity is only determined by
the molar concentration, and high reproducibility (Ward et al., 2010; Kim et al., 2011).
Surprisingly, a single extraction protocol (sometimes a one-step protocol) is typically used for a
given analytical technique, regardless of the vast variety of plant matrices (plant species, organs
and tissues). Very few metabolomics publications report extraction recovery and stability in detail.
Running blanks (solvent blank and extraction blank) in the same conditions as the biological
samples is also important, as it is needed to identify impurities originating from solvents (Kaiser
et al., 2009) or consumables (e.g., phthalates from plasticware) (Allwood et al., 2011; Weckwerth,
2011). Although metabolomics is by definition an untargeted approach, highly selective
extraction protocols along with targeted analysis should not be forgotten, especially to reach
high and reproducible extraction recovery as well as quantification accuracy (Sawada et al.,
2009). Replication is then required to achieve statistical reliability. Biological replicates should
be preferred to technical replicates, assuming that biological variance almost always exceeds
analytical variance (Shintu et al., 2009). Five biological replicates of five pooled-tissue samples
or of five individuals and two to three technical replicates are recommended in plant
metabolomics to get statistically reliable information (Tikunov et al., 2007). Quality control
samples should also be run (Fiehn et al., 2008; Allwood et al., 2011).
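The assumption that biological variance exceeds analytical variance can be checked on pilot data. The sketch below uses a classical one-way random-effects decomposition for a balanced design (every biological replicate measured the same number of times). The function name and the example values are invented for illustration and are not taken from the studies cited above.

```python
from statistics import mean


def variance_components(data):
    """Split variance into biological and analytical parts.

    data: list of biological replicates, each a list of technical
    replicate measurements (balanced design).
    Returns (biological variance, analytical variance).
    """
    b = len(data)      # number of biological replicates
    t = len(data[0])   # technical replicates per biological replicate
    if any(len(rep) != t for rep in data):
        raise ValueError("balanced design expected")
    rep_means = [mean(rep) for rep in data]
    grand_mean = mean(rep_means)
    ms_between = t * sum((m - grand_mean) ** 2 for m in rep_means) / (b - 1)
    ms_within = sum((x - m) ** 2
                    for rep, m in zip(data, rep_means)
                    for x in rep) / (b * (t - 1))
    var_analytical = ms_within
    # The estimate can be negative by chance; it is truncated at zero.
    var_biological = max(0.0, (ms_between - ms_within) / t)
    return var_biological, var_analytical


# Invented example: five biological replicates, two technical replicates each.
pilot = [[10.1, 9.9], [12.0, 12.2], [8.1, 7.9], [11.0, 11.2], [9.0, 8.8]]
var_bio, var_tech = variance_components(pilot)
```

When the biological component clearly dominates, as in this invented data set, adding biological replicates improves the precision of group means more than adding technical replicates.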
Most analytical technologies require sample grinding prior to extraction and analysis. Mills
enabling the parallel grinding of large numbers of samples (e.g., 192 samples) are now
available at affordable prices. However, they usually do not allow multiparallel grinding of
samples of large size, suggesting that further developments are needed to enable large-scale
studies with organs such as fruits or ears and with most crops. Last but not least, the
weighing of aliquots is a tedious task, especially when the material needs to be kept at very
low temperature. A robot combining grinding and weighing of up to 96 samples has been
developed recently (http://www.labman.co.uk), opening the way for unprecedented
horizontal high-throughput.
robots and readers able to handle high density plates), which implies that very high
numbers of samples will have to be processed before decreasing costs per assay. These
limitations probably explain why the use of high density microplates has not been adopted
by a wide research community so far.
intermediates of central metabolism (Arrivault et al., 2009) and pesticides (Kmellar et al.,
2010). In fact, they provide together the appropriate selectivity, sensitivity and throughput.
a selection of samples to identify the most discriminating biomarkers that would then be
analyzed on a much greater number of samples using a targeted approach (Tarpley et al.,
2005). For example, such a strategy has been successfully used in maize, where a number of
enzymes were first profiled in a small panel of eight highly diverse maize inbred lines,
revealing a highly heritable variation in NAD-dependent isocitrate dehydrogenase activity.
The use of a panel of about one hundred lines then allowed the identification of a novel amino-
acid substitution in a phylogenetically conserved site, which is associated with variation in
isocitrate dehydrogenase activity (Zhang et al., 2010). Conversely, a horizontal approach can be used to
screen large numbers of samples, thus revealing the most extreme or representative ones,
on which a vertical approach can then be used to search for unexpected modifications, to
study the system as a whole in the best possible matrix of samples, or simply to find novel
biomarkers. As an example, the easy-to-measure glucose-6-phosphate, which is a good
temporal marker of carbon depletion (Stitt et al., 2007), has been used to define a precise
time frame to study transcriptomic and metabolomic responses to carbon starvation in
Arabidopsis leaves (Usadel et al., 2008), thus avoiding unnecessary and costly analyses.
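The horizontal-then-vertical strategy described above can be sketched as a simple selection step: a single, cheaply measured marker is used to pick the most extreme and the most central samples for in-depth analysis. The function name, the number of samples retained and the marker values are hypothetical.

```python
from statistics import median


def select_for_vertical(marker, n_extreme=2, n_central=2):
    """Pick samples for in-depth analysis from a horizontal screen.

    marker: {sample_id: marker value}.
    Returns the samples with the lowest and highest values and those
    closest to the median of the screen.
    """
    ranked = sorted(marker, key=marker.get)
    lowest = ranked[:n_extreme]
    highest = ranked[-n_extreme:]
    centre = median(marker.values())
    remaining = [s for s in ranked if s not in lowest and s not in highest]
    central = sorted(remaining, key=lambda s: abs(marker[s] - centre))[:n_central]
    return {"lowest": lowest, "highest": highest, "central": central}


# Invented marker values (arbitrary units) for eight samples.
screen = {"s1": 0.8, "s2": 2.5, "s3": 1.9, "s4": 0.2,
          "s5": 3.1, "s6": 2.0, "s7": 1.7, "s8": 2.2}
selection = select_for_vertical(screen)
```

In practice the selection criterion would depend on the biological question, for instance a threshold on a temporal marker rather than a rank.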
system of interest. Once the model is sufficiently accurate and detailed, it allows biologists
to accomplish two tasks: (1) predict the behavior of the system given any perturbation such
as a modification of the environment, and (2) redesign or perturb the gene regulatory
network to create completely new emergent systems properties (Vidal, 2009; Westerhoff et
al., 2009; Arkin & Schaffer, 2011).
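The first of these tasks, predicting the response to a perturbation, can be illustrated with a deliberately small toy model. The sketch assumes a single metabolite pool S supplied at a constant rate and consumed by one Michaelis-Menten enzyme. The model structure, the parameter values and the function name are all hypothetical and do not correspond to any of the published models cited in this chapter.

```python
def simulate_pool(v_in, vmax=2.0, km=1.0, s0=0.0, dt=0.01, t_end=200.0):
    """Euler integration of dS/dt = v_in - vmax * S / (km + S).

    Returns the pool size S at time t_end (arbitrary units).
    """
    s = s0
    for _ in range(round(t_end / dt)):
        s += dt * (v_in - vmax * s / (km + s))
    return s


# Baseline supply, then a perturbation that raises the supply rate by 50 %.
baseline = simulate_pool(v_in=1.0)
perturbed = simulate_pool(v_in=1.5)
```

For this toy model the steady state can also be derived analytically as S* = km * v_in / (vmax - v_in), provided v_in < vmax, which gives a simple check on the simulation. The example also shows why absolute quantification matters for modelling: a 50 % increase in supply triples the predicted pool size, and such a prediction can only be tested against concentrations expressed in absolute units.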
Exciting examples of integrated systems biology to solve biological questions in plant science
have been published such as identification of key players in the branched amino acid
metabolism in A. thaliana (Curien et al., 2009), analysis of carbohydrate dynamics during
acclimation to low temperature in A. thaliana (Nagele et al., 2011), or understanding the
metabolism of tobacco grown on media containing different cytokinins (Lexa et al., 2003).
Systems biology will benefit from close collaborations between different teams covering
complementary sectors of metabolism, e.g. central metabolism and different sectors of
secondary metabolism. The challenges in establishing such systems approaches rely on
collecting reliable, quantitative and systemic “omics” data, including metabolomics data, for
developing models able to predict de novo biological outcomes given the list of the
components involved. Advances in plant genome sequencing, transcriptomics and
proteomics have paved the way for a systematic analysis of cellular processes at gene and
protein levels. For metabolomics, some limitations remain for true systems biology
approaches, in terms of analytical sensitivity, throughput and access to specific tissue or
subcellular compartments. Moreover, the high turn-over rate of many metabolic
intermediates has to be taken into consideration. In addition, the absolute quantification of
metabolites under physiological, in vivo and dynamic conditions remains a major challenge.
The combination of existing multiparallel analytical platforms with special attention to
metabolite quantification (see Sections 2.1.2 and 2.2.3) in a cohesive manner may not be
sufficient and emerging microtechnologies such as microfluidics will certainly help (see
Section 2.2.4 and (Wurm et al., 2010)).
Recently, plant systems biology has been redefined from cell to ecosystem (Keurentjes et al.,
2011). For these authors, in a holistic systems-biology approach, plants have to be studied at
six levels of biological organization (from subcellular level to ecosystem) in an orchestrated
way, with special attention to the interdependence between the various levels of biological
organization. The corresponding challenge will be to generate accurate experimental data
for communities, populations, single whole plants, down to cell types and their organelles
that can be used to feed new modelling concepts. For example, at the subcellular level
molecular signaling pathways are crucial to understand cell development, defense against
pathogens and many more intermediate processes in plants. The highly sensitive and high-
throughput method developed for the simultaneous analysis of 43 molecular species of
cytokinins, auxins, ABA and gibberellins (Kojima et al., 2009) has opened a major opportunity
to routinely describe basic molecular signaling pathways in plant cells. Other challenges
need to be considered on the computational (dry lab) side. Because systems biology heavily relies on
information stored in public databases for the different levels of biological organization,
which is often incomplete, not standardized or improperly annotated, it is essential that
collective efforts are developed for the validation of large data sets. Plant network biology is
in its infancy and other current needs range from the development of new theoretical
methods to characterize network topology, to insights into dynamics of motif clusters and
biological function.
3.3 Meta-phenomics
Comparing different species is a powerful way to extend knowledge about biological
processes. Thus, comparative genomics facilitates the assignment of gene function in
non-sequenced organisms, enables the quick annotation of newly sequenced genomes and greatly
contributes to studies of gene function and evolution. For example, extensive synteny
between genomes of Gramineae species has been shown (Salse, 2004) and QTL controlling
similar traits have been found in orthologous regions of e.g., maize and sorghum
(Figueiredo et al., 2010). Conversely, the fact that orthologous genes do not necessarily have
the same functions in different species (Buckler et al., 2009) opens fascinating perspectives
regarding evolution of gene function (Wang et al., 2009).
Finding common and divergent phenotypes among large numbers of species is also a
promising way to better understand biological functions in the context of evolution.
Meta-phenomics, which has recently been proposed by Poorter and colleagues (Poorter et
al., 2009; Poorter et al., 2010), is defined as the study of plant responses to environmental
factors by performing meta-analyses. This novel ecophysiological approach aims at
generalising plant responses by integrating phenotypic and environmental data gathered for
large numbers of species. Thus, by using accurate normalisation procedures, generic
response curves were found for surface leaf area as related to major abiotic factors.
Notably, data for >300 species had to be collected and curated manually from 60
years of literature. One exciting finding is that divergences between groups of species could be
pinpointed, for example between C3 and C4 species. There is no doubt that meta-phenomics is
applicable at the cellular level, and in particular to metabolic pathways, and C3 and C4
metabotypes are indeed easy to distinguish when comparing their respective metabolomes.
However, this might be considerably complicated given the heterogeneity of available
metabolic data (in terms of e.g., annotation and normalisation). Furthermore, descriptions of
environmental conditions found in literature are almost always text-based, and thus very
difficult to compute. Fortunately, the adoption and use of standardised conceptualisations
with explicit specifications to report data and metadata (i.e. minimum checklists) is
progressing in the field of metabolomics (Fiehn et al., 2007a; Fiehn et al., 2007b). It will
nevertheless be of central importance to prefer absolute quantification and to enable
quantitative descriptions of environmental factors, which will probably be facilitated via
collaborations with ecophysiologists.
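The role of normalisation in such a meta-analysis can be illustrated with a simplified sketch. It is not the procedure of Poorter and colleagues: it only assumes that each study reports a trait at several levels of one environmental factor, expresses every value relative to the value interpolated at a common reference level, and summarises the pooled data by a median per bin. The data, the reference level, the bin width and all names are invented.

```python
from collections import defaultdict
from statistics import median


def interpolate(points, x):
    """Linear interpolation in a list of (x, y) points sorted by x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("reference level outside the measured range")


def scaled_responses(studies, reference):
    """Express each trait value relative to its study's value at the reference level."""
    pooled = []
    for points in studies.values():
        points = sorted(points)
        ref_value = interpolate(points, reference)
        pooled += [(x, y / ref_value) for x, y in points]
    return pooled


def generic_curve(pooled, bin_width):
    """Median scaled response per bin of the environmental factor."""
    bins = defaultdict(list)
    for x, y in pooled:
        bins[int(x // bin_width) * bin_width].append(y)
    return {start: median(values) for start, values in sorted(bins.items())}


# Invented data: (level of the environmental factor, trait value) per study.
studies = {
    "species A": [(100, 20.0), (300, 30.0), (500, 36.0)],
    "species B": [(200, 8.0), (400, 12.0), (600, 13.0)],
}
curve = generic_curve(scaled_responses(studies, reference=300), bin_width=200)
```

The two invented species differ more than twofold in absolute trait values, yet their scaled responses become directly comparable. This step is only possible when the environmental factor is reported quantitatively rather than as free text.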
4. Conclusion
Like metabolomics in general (Hall et al., 2011), plant metabolomics is moving towards
biology with a growing variety of applications, from the ‘simple’ diagnosis of culture practices
to translational studies towards systems biology. However, for some of the emerging
applications, the optimization of analytical and computational technologies for the
acquisition, handling and mining of metabolomics data remains necessary. Some of the
crucial bottlenecks that still have to be addressed concern quantification for modelling, time-
and spatially resolved experiments, multi-experiment analyses and data sharing.
The promotion of combined multi-experiment and multi-laboratory analyses (Allwood et al.,
2009; Ward et al., 2010) for high sample numbers, indispensable for some ecology or
quantitative genetics studies for instance, requires shared plant biological standards (labelled
or non-labelled) and standardization of their use. The absolute quantification data needed
for metabolism modelling in systems biology approaches also require isotopically labelled
plant standards or at least labelled reference compounds for MS approaches. The
generalisation of time-resolved experiments for instance for the study of fine metabolism
regulation or short-term responses to stresses will need further increases in horizontal high-
throughput using microplate, microfluidics or other technologies. Besides increased
throughput, increased sensitivity for all the analytical technologies listed in this review may
open new opportunities for the use of metabolomics in plant development studies. Spatially
resolved experiments with analysis of laser-microdissected samples by NMR or MS (Moco
et al., 2009; Kim et al., 2011) will be particularly useful for the study of plant-pathogen
interactions. The generalization of metabolite compartmentation studies in plant tissues at
the cellular and subcellular levels, possibly with non-aqueous fractionation (Krueger et al.,
2011), will also require increases in both horizontal high-throughput and sensitivity.
Moreover, the systematic sharing, combining, and re-exploring of the data produced using
targeted metabolic phenotyping or untargeted metabolomics will produce new knowledge.
Cataloguing the metabolome itself through experimental and literature data stored in curated
databases can complement genomic reconstructions of metabolism (Fiehn et al., 2011).
Understanding the regulation of the plasticity and flexibility of metabolic networks requires that
the metadata of each experiment, including environment metadata (Hannemann et al., 2009),
be carefully documented and uploaded into a central or distributed network
repository dedicated to plants. This suggests that the Metabolomics Standards Initiative (MSI; Fiehn et al., 2007a) has
to continue to propose and promote standardization criteria that will be integrated by the
bioinformatics developments of open repositories and used by the community. In addition,
sophisticated but easy-to-use tools are needed for combining metabolomics data, integrating them
with other phenotyping or omics data, and performing integrated statistical analyses and modelling.
The plant metabolome community may benefit from more interaction with the human
metabolome community for the use and development of such tools, and both may address
combined analyses of food quality determinants (Hall et al., 2008) and the monitoring of human
food consumption (Wishart, 2008).
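The need for carefully documented, computable metadata can be made concrete with a small sketch of a machine-readable sample record in which environmental conditions are stored as numbers with explicit units rather than as free text. The field names, the list of required fields and the example values are hypothetical. They reproduce neither the MSI minimum reporting checklists nor the Xeml Lab file format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class GrowthEnvironment:
    light_umol_m2_s: float       # photon flux density, umol m-2 s-1
    photoperiod_h: float         # hours of light per day
    day_temperature_c: float     # degrees Celsius
    night_temperature_c: float   # degrees Celsius


@dataclass
class SampleMetadata:
    sample_id: str
    species: str
    genotype: str
    organ: str
    harvest_time_iso: str        # ISO 8601 date and time of harvest
    environment: GrowthEnvironment


# Hypothetical minimum checklist: these fields must not be empty.
REQUIRED = ("sample_id", "species", "genotype", "organ", "harvest_time_iso")


def missing_fields(record):
    """Names of required fields that are empty in the record."""
    return [name for name in REQUIRED if not getattr(record, name)]


# Invented example record.
record = SampleMetadata(
    sample_id="S001",
    species="Arabidopsis thaliana",
    genotype="Col-0",
    organ="rosette leaf",
    harvest_time_iso="2011-05-04T10:00:00",
    environment=GrowthEnvironment(150.0, 16.0, 22.0, 18.0),
)
payload = json.dumps(asdict(record), indent=2)  # text ready for upload to a repository
```

A record of this kind can be validated automatically before upload and queried numerically afterwards, which is what combined analyses across experiments and laboratories require.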
5. Acknowledgment
Financial support from ERA-NET ERASysBio+ (FRIM) and FP7 KBBE (DROPS, grant
agreement number FP7-244374) is acknowledged. All authors acknowledge support from
the Metabolome Facility of Bordeaux Functional Genomics Centre.
6. References
Allwood, J.W.; de Vos, C.H.R.; Moing, A.; Deborde, C.; Erban, A.; Kopka, J.; Goodacre, R. &
Hall, R. (2011). Plant metabolomics and its potential for systems biology research:
background concepts, technology and methodology, In: Methods In Systems Biology,
Westerhoff, H., Hayes, N., (Eds) in press, Elsevier Inc., isbn 978-0-12-385118-5,
Amsterdam, Netherlands
Allwood, J.W.; Erban, A.; de Koning, S.; Dunn, W.B.; Luedemann, A.; Lommen, A.; Kay, L.;
Loscher, R.; Kopka, J. & Goodacre, R. (2009). Inter-laboratory reproducibility of fast
gas chromatography-electron impact-time of flight mass spectrometry (GC-EI-
TOF/MS) based plant metabolomics. Metabolomics, Vol.5, No. 4, (Dec 2009), pp.
479-496, issn 1573-3882
Ap Rees, T.A.; Fuller, W.A. & Wright, B.W. (1977). Measurements of glycolytic intermediates
during onset of thermogenesis in spadix of Arum maculatum. Biochimica et Biophysica
Acta, Vol.461, No. 2, (Aug 1977), pp. 274-282, issn 0006-3002
Arkin, A.P. & Schaffer, D.V. (2011). Network News: Innovations in 21st Century Systems
Biology. Cell, Vol.144, No. 6, (Mar 2011), pp. 844-849, issn 0092-8674
Arrivault, S.; Guenther, M.; Ivakov, A.; Feil, R.; Vosloh, D.; van Dongen, J.T.; Sulpice, R. &
Stitt, M. (2009). Use of reverse-phase liquid chromatography, linked to tandem
mass spectrometry, to profile the Calvin cycle and other metabolic intermediates in
Arabidopsis rosettes at different carbon dioxide concentrations. Plant Journal,
Vol.59, No. 5, (Sep 2009), pp. 824-839, issn 0960-7412
Atalay, Y.T.; Witters, D.; Vermeir, S.; Vergauwe, N.; Verboven, P.; Nicolai, B. & Lammertyn,
J. (2009). Design and optimization of a double-enzyme glucose assay in
microfluidic lab-on-a-chip. Biomicrofluidics, Vol.3, No.4, (Oct-Dec 2009), issn 1932-
1058
Battersby, B.J. & Trau, M. (2002). Novel miniaturized systems in high-throughput screening.
Trends in Biotechnology, Vol.20, No. 4, (Apr 2002), pp. 167-173, issn 0167-7799
Beale, M.H. & Sussman, M.R. (2011). Metabolomics of Arabidopsis thaliana, In: Biology of
Plant Metabolomics, Hall, R.D., (Ed.) 157-180, Wiley-Blackwell, isbn 978-1-4051-9954-
4, Chichester, UK
Bergmeyer, H.U. (1983). Metabolites 1: Carbohydrates, In: Methods of enzymatic analysis,
Bergmeyer, J., Graßl, M., (Eds) 701, VCH Verlagsgesellschaft mbH, isbn 3-527-
26046-3, Weinheim, Germany
Bergmeyer, H.U. (1985). Metabolites 2: Tri and dicarboxylic acids, purines, pyrimidines and
derivatives, coenzymes, inorganic compounds, In: Methods of enzymatic analysis,
Bergmeyer, J., Graßl, M., (Eds) 656, VCH Verlagsgesellschaft mbH, isbn 3-527-
26047-1, Weinheim, Germany
Bergmeyer, H.U. (1987). Metabolites 3 - Lipids, Amino Acids and Related Compounds In:
Methods of enzymatic analysis, Bergmeyer, J., Graßl, M., (Eds) VCH
Verlagsgesellschaft mbH, isbn 3-527-26048-X, Weinheim, Germany
Bourgis, F.; Kilaru, A.; Cao, X.; Ngando-Ebongue, G.F.; Drira, N.; Ohlrogge, J.B. & Arondel,
V. (2011). Comparative transcriptome and metabolite analysis of oil palm and date
palm mesocarp that differ dramatically in carbon partitioning. Proceedings of the
National Academy of Sciences of the United States of America, Vol.108, No. 30, (Jul
2011), pp. 12527-12532, issn 0027-8424
Broyart, C.; Fontaine, J.X.; Molinie, R.; Cailleu, D.; Terce-Laforgue, T.; Dubois, F.; Hirel, B. &
Mesnard, F. (2010). Metabolic profiling of maize mutants deficient for two
glutamine synthetase isoenzymes using (1)H-NMR-based metabolomics.
Phytochemical Analysis, Vol.21, No. 1, (Jan-Feb 2010), pp. 102-109, issn 0958-0344
Buckler, E.S.; Holland, J.B.; Bradbury, P.J.; Acharya, C.B.; Brown, P.J.; Browne, C.; Ersoz, E.;
Flint-Garcia, S.; Garcia, A.; Glaubitz, J.C.; Goodman, M.M.; Harjes, C.; Guill, K.;
Kroon, D.E.; Larsson, S.; Lepak, N.K.; Li, H.H.; Mitchell, S.E.; Pressoir, G.; Peiffer,
J.A.; Rosas, M.O.; Rocheford, T.R.; Romay, M.C.; Romero, S.; Salvo, S.; Villeda, H.S.;
da Silva, H.S.; Sun, Q.; Tian, F.; Upadyayula, N.; Ware, D.; Yates, H.; Yu, J.M.;
Zhang, Z.W.; Kresovich, S. & McMullen, M.D. (2009). The genetic architecture of
maize flowering time. Science, Vol.325, No. 5941, (Aug 2009), pp. 714-718, issn 0036-
8075
Bylesjo, M.; Nilsson, R.; Srivastava, V.; Gronlund, A.; Johansson, A.I.; Jansson, S.; Karlsson,
J.; Moritz, T.; Wingsle, G. & Trygg, J. (2009). Integrated analysis of transcript,
protein and metabolite data to study lignin biosynthesis in hybrid aspen. Journal of
Proteome Research, Vol.8, No. 1, (Jan 2009), pp. 199-210, issn 1535-3893
Calinski, T.; Kaczmarek, Z.; Krajewski, P.; Frova, C. & Sari-Gorla, M. (2000). A multivariate
approach to the problem of QTL localization. Heredity, Vol.84, No. 3, (Mar 2000),
pp. 303-310, issn 0018-067X
Choi, C.J. & Cunningham, B.T. (2007). A 96-well microplate incorporating a replica molded
microfluidic network integrated with photonic crystal biosensors for high
throughput kinetic biomolecular interaction analysis. Lab on a Chip, Vol.7, No. 5,
(Mar 2007), pp. 550-556, issn 1473-0197
Ciccimaro, E. & Blair, I.A. (2010). Stable-isotope dilution LC-MS for quantitative biomarker
analysis. Bioanalysis, Vol.2, No. 2, (Feb 2010), pp. 311-341, issn 1757-6180
Cossegal, M.; Chambrier, P.; Mbelo, S.; Balzergue, S.; Martin-Magniette, M.L.; Moing, A.;
Deborde, C.; Guyon, V.; Perez, P. & Rogowsky, P. (2008). Transcriptional and
metabolic adjustments in ADP-glucose pyrophosphorylase-deficient bt2 maize
kernels. Plant Physiology, Vol.146, No. 4, (Apr 2008), pp. 1553-1570, issn 0032-0889
Curien, G.; Bastien, O.; Robert-Genthon, M.; Cornish-Bowden, A.; Cardenas, M.L. & Dumas,
R. (2009). Understanding the regulation of aspartate metabolism using a model
based on measured kinetic parameters. Molecular Systems Biology, Vol.5, (May
2009), issn 1744-4292
de Vos, R.C.H.; Hall, R. & Moing, A. (2011). Metabolomics of a model fruit: tomato In:
Biology of Plant Metabolomics, Hall, R., (Ed.) 109-155, Wiley-Blackwell Ltd, isbn 978-
1-4051-9954-4, Oxford
Deluc, L.G.; Grimplet, J.; Wheatley, M.D.; Tillett, R.L.; Quilici, D.R.; Osborne, C.; Schooley,
D.A.; Schlauch, K.A.; Cushman, J.C. & Cramer, G.R. (2007). Transcriptomic and
metabolite analyses of Cabernet Sauvignon grape berry development. BMC
Genomics, Vol.8 (Nov 2007), issn 1471-2164
Dixon, R.A. & Strack, D. (2003). Phytochemistry meets genome analysis, and beyond.
Phytochemistry, Vol.62, No. 6, (Mar 2003), pp. 815-816, issn 0031-9422
Dwivedi, P.; Wu, P.; Klopsch, S.J.; Puzon, G.J.; Xun, L. & Hill, H.H. (2008). Metabolic
profiling by ion mobility mass spectrometry (IMMS). Metabolomics, Vol.4, No. 1,
(Mar 2008), pp. 63-80, issn 1573-3882
Eklund, S.E.; Snider, R.M.; Wikswo, J.; Baudenbacher, F.; Prokop, A. & Cliffel, D.E. (2006).
Multianalyte microphysiometry as a tool in metabolomics and systems biology.
Journal of Electroanalytical Chemistry, Vol.587, No. 2, (Feb 2006), pp. 333-339, issn
0022-0728
Ernst, W.H.O. (1995). Sampling of plant material for chemical analysis. Science of the Total
Environment, Vol.176, No. 1-3, (Dec 1995), pp. 15-24, issn 0048-9697
Fait, A.; Hanhineva, K.; Beleggia, R.; Dai, N.; Rogachev, I.; Nikiforova, V.J.; Fernie, A.R. &
Aharoni, A. (2008). Reconfiguration of the achene and receptacle metabolic
networks during strawberry fruit development. Plant Physiology, Vol.148, No. 2,
(Oct 2008), pp. 730-750, issn 0032-0889
Farag, M.A.; Huhman, D.V.; Dixon, R.A. & Sumner, L.W. (2008). Metabolomics reveals
novel pathways and differential mechanistic and elicitor-specific responses in
Gibon, Y.; Vigeolas, H.; Tiessen, A.; Geigenberger, P. & Stitt, M. (2002). Sensitive and high
throughput metabolite assays for inorganic pyrophosphate, ADPGlc, nucleotide
phosphates, and glycolytic intermediates based on a novel enzymic cycling system.
Plant Journal, Vol.30, No. 2, (Apr 2002), pp. 221-235, issn 0960-7412
Hall, R.D. (2006). Plant metabolomics: from holistic hope, to hype, to hot topic. New
Phytologist, Vol.169, No. 3, (Jan 2006), pp. 453-468, issn 1469-8137
Hall, R.D. (2011a). Biology of Plant Metabolomics, In: Annual Plant Reviews, 420, Wiley-
Blackwell, isbn 978-1-4051-9954-4, Chichester, UK
Hall, R.D. (2011b). Plant Metabolomics in a Nutshell: Potential and Future Challenges, In:
Biology of Plant Metabolomics, Hall, R.D., (Ed.) 1-24, Wiley-Blackwell, isbn 978-1-
4051-9954-4, Chichester, UK
Hall, R.D.; Brouwer, I.D. & Fitzgerald, M.A. (2008). Plant metabolomics and its potential
application for human nutrition. Physiologia Plantarum, Vol.132, No. 2, (Feb 2008),
pp. 162-175, issn 0031-9317
Hall, R.D.; Wishart, D. & Roessner, U. (2011). Metabolomics and the move towards biology.
Metabolomics, Vol.7, No. 3, (Sep 2011), pp. 454-456, issn 1573-3882
Halpin, S.T. & Spence, D.M. (2010). Direct plate-reader measurement of nitric oxide released
from hypoxic erythrocytes flowing through a microfluidic device. Analytical
Chemistry, Vol.82, No. 17, (Sep 2010), pp. 7492-7497, issn 0003-2700
Hannemann, J.; Poorter, H.; Usadel, B.; Blasing, O.E.; Finck, A.; Tardieu, F.; Atkin, O.K.;
Pons, T.; Stitt, M. & Gibon, Y. (2009). Xeml Lab: a tool that supports the design of
experiments at a graphical interface and generates computer-readable metadata
files, which capture information about genotypes, growth conditions,
environmental perturbations and sampling strategy. Plant Cell and Environment,
Vol.32, No. 9, (Sep 2009), pp. 1185-1200, issn 0140-7791
Hausler, R.E.; Fischer, K.L. & Flugge, U.I. (2000). Determination of low-abundant
metabolites in plant extracts by NAD(P)H fluorescence with a microtiter plate
reader. Analytical Biochemistry, Vol.281, No. 1, (May 2000), pp. 1-8, issn 0003-2697
Hirai, M.Y.; Sawada, Y.; Kanaya, S.; Kuromori, T.; Kobayashi, M.; Klausnitzer, R.; Hanada,
K.; Akiyama, K.; Sakurai, T.; Saito, K. & Shinozaki, K. (2010). Toward genome-wide
metabolotyping and elucidation of metabolic system: metabolic profiling of large-
scale bioresources. Journal of Plant Research, Vol.123, No. 3, (May 2010), pp. 291-298,
issn 0918-9440
Hompesch, R.W.; Garcia, C.D.; Weiss, D.J.; Vivanco, J.M. & Henry, C.S. (2005). Analysis of
natural flavonoids by microchip-micellar electrokinetic chromatography with
pulsed amperometric detection. Analyst, Vol.130, No. 5, (May 2005), pp. 694-700,
issn 0003-2654
Jacob, S.S.; Smith, N.W. & Legido-Quigley, C. (2007). Assessment of Chinese medicinal herb
metabolite profiles by UPLC-MS-based methodology for detection of aristolochic
acids. Journal of Separation Science, Vol.30, No. 8, (May 2007), pp. 1200-1206, issn
1615-9306
Jemal, M.; Schuster, A. & Whigan, D.B. (2003). Liquid chromatography/tandem mass
spectrometry methods for quantitation of mevalonic acid in human plasma and
urine: method validation, demonstration of using a surrogate analyte, and
Moco, S.; Schneider, B. & Vervoort, J. (2009). Plant micrometabolomics: The analysis of
endogenous metabolites present in a plant cell or tissue. Journal of Proteome
Research, Vol.8, No. 4, (Apr 2009), pp. 1694-1703, issn 1535-3893
Moing, A.; Aharoni, A.; Biais, B.; Rogachev, I.; Meir, S.; Brodsky, L.; Allwood, J.W.; Erban,
A.; Dunn, W.B.; Kay, L.; de Koning, S.; de Vos, R.C.H.; Jonker, H.; Mumm, R.;
Deborde, C.; Maucourt, M.; Bernillon, S.; Gibon, Y.; Hansen, T.H.; Husted, S.;
Goodacre, R.; Kopka, J.; Schjoerring, J.K.; Rolin, D. & Hall, R.D. (2011). Extensive
metabolic cross-talk in melon fruit revealed by spatial and developmental
combinatorial metabolomics. New Phytologist, Vol.190, No. 3, (May 2011), pp. 683-
696, issn 1469-8137
Moser, I.; Jobst, G. & Urban, G.A. (2002). Biosensor arrays for simultaneous measurement of
glucose, lactate, glutamate, and glutamine. Biosensors & Bioelectronics, Vol.17, No. 4,
(Apr 2002), pp. 297-302, issn 0956-5663
Mounet, F.; Moing, A.; Garcia, V.; Petit, J.; Maucourt, M.; Deborde, C.; Bernillon, S.; Le Gall,
G.; Colquhoun, I.; Defernez, M.; Giraudel, J.-L.; Rolin, D.; Rothan, C. & Lemaire-
Chamley, M. (2009). Gene and metabolite regulatory network analysis of early
developing fruit tissues highlights new candidate genes for the control of tomato
fruit composition and development. Plant Physiology, Vol.149, No. 3, (March 2009),
pp. 1505-1528, issn 0032-0889
Mulas, G.; Galaffu, M.G.; Pretti, L.; Nieddu, G.; Mercenaro, L.; Tonelli, R. & Anedda, R.
(2011). NMR analysis of seven selections of Vermentino grape berry: metabolites
composition and development. Journal of Agricultural and Food Chemistry, Vol.59,
No. 3, (Feb 2011), pp. 793-802, issn 0021-8561
Myles, S.; Peiffer, J.; Brown, P.J.; Ersoz, E.S.; Zhang, Z.W.; Costich, D.E. & Buckler, E.S.
(2009). Association mapping: Critical considerations shift from genotyping to
experimental design. Plant Cell, Vol.21, No. 8, (Aug 2009), pp. 2194-2202, issn 1040-
4651
Nagele, T.; Kandel, B.A.; Frana, S.; Meissner, M. & Heyer, A.G. (2011). A systems biology
approach for the analysis of carbohydrate dynamics during acclimation to low
temperature in Arabidopsis thaliana. FEBS Journal, Vol.278, No. 3, (Feb 2011), pp. 506-
518, issn 1742-464X
Nagrath, S.; Sequist, L.V.; Maheswaran, S.; Bell, D.W.; Irimia, D.; Ulkus, L.; Smith, M.R.;
Kwak, E.L.; Digumarthy, S.; Muzikansky, A.; Ryan, P.; Balis, U.J.; Tompkins, R.G.;
Haber, D.A. & Toner, M. (2007). Isolation of rare circulating tumour cells in cancer
patients by microchip technology. Nature, Vol.450, No. 7173, (Dec 2007), pp. 1235-
1239, issn 0028-0836
Oikawa, A.; Matsuda, F.; Kusano, M.; Okazaki, Y. & Saito, K. (2008). Rice metabolomics.
Rice, Vol.1, No. 1, (Sep 2008), pp. 63-71, issn 1939-8425
Okada, T.; Afendi, F.M.; Altaf-Ul-Amin, M.; Takahashi, H.; Nakamura, K. & Kanaya, S.
(2010). Metabolomics of medicinal plants: The importance of multivariate analysis
of analytical chemistry data. Current Computer-Aided Drug Design, Vol.6, No. 3, (Sep
2010), pp. 179-196, issn 1573-4099
Oliver, S.G.; Winson, M.K.; Kell, D.B. & Baganz, F. (1998). Systematic functional analysis of
the yeast genome. Trends in Biotechnology, Vol.16, No. 9, (Sep 1998), pp. 373-378,
issn 0167-7799
Palama, T.L.; Khatib, A.; Choi, Y.H.; Payet, B.; Fock, I.; Verpoorte, R. & Kodja, H. (2009).
Metabolic changes in different developmental stages of Vanilla planifolia pods.
Journal of Agricultural and Food Chemistry, Vol.57, No. 17, (Sep 2009), pp. 7651-7658,
issn 0021-8561
Pichersky, E. & Lewinsohn, E. (2011). Convergent evolution in plant specialized metabolism,
In: Annual Review of Plant Biology, 549-566, Annual Reviews, isbn 978-0-8243-0662-
5, Palo Alto, California
Poorter, H.; Niinemets, U.; Walter, A.; Fiorani, F. & Schurr, U. (2010). A method to construct
dose-response curves for a wide range of environmental factors and plant traits by
means of a meta-analysis of phenotypic data. Journal of Experimental Botany, Vol.61,
No. 8, (May 2010), pp. 2043-2055, issn 0022-0957
Poorter, H.; Walter, A.; Fiorani, F.; Schurr, U. & Niinemets, U. (2009). Meta-phenomics:
Building a unified framework for interpreting plant growth responses to diverse
environmental variables. Comparative Biochemistry and Physiology a-Molecular &
Integrative Physiology, Vol.153A, No. 2, (Jun 2009), pp. S224-S224, issn 1095-6433
Ramautar, R.; Mayboroda, O.A.; Somsen, G.W. & de Jong, G.J. (2011). CE-MS for
metabolomics: Developments and applications in the period 2008-2010.
Electrophoresis, Vol.32, No. 1, (Jan 2011), pp. 52-65, issn 0173-0835
Rashed, M.S.; Bucknall, M.P.; Little, D.; Awad, A.; Jacob, M.; Alamoudi, M.; Alwattar, M. &
Ozand, P.T. (1997). Screening blood spots for inborn errors of metabolism by
electrospray tandem mass spectrometry with a microplate batch process and a
computer algorithm for automated flagging of abnormal profiles. Clinical Chemistry,
Vol.43, No. 7, (Jul 1997), pp. 1129-1141, issn 0009-9147
Redestig, H. & Costa, I.G. (2011). Detection and interpretation of metabolite-transcript
coresponses using combined profiling data. Bioinformatics, Vol.27, No. 13, (Jul 2011),
pp. I357-I365, issn 1367-4803
Redestig, H.; Kusano, M.; Fukushima, A.; Matsuda, F.; Saito, K. & Arita, M. (2010).
Consolidating metabolite identifiers to enable contextual and multi-platform
metabolomics data analysis. BMC Bioinformatics, Vol.11, (Apr 2010), issn 1471-2105
Roda, A.; Pasini, P.; Mirasoli, M.; Michelini, E. & Guardigli, M. (2004). Biotechnological
applications of bioluminescence and chemiluminescence. Trends in Biotechnology,
Vol.22, No. 6, (Jun 2004), pp. 295-303, issn 0167-7799
Ruane, J. & Sonnino, A. (2011). Agricultural biotechnologies in developing countries and
their possible contribution to food security. Journal of Biotechnology, Vol.In Press,
Corrected Proof, ( 2011), issn 0168-1656
Ryan, D. & Robards, K. (2006). Analytical chemistry considerations in plant metabolomics.
Separation and Purification Reviews, Vol.35, No. 4, (Nov 2006), pp. 319-356, issn 1542-
2119
Saito, K. & Matsuda, F. (2010). Metabolomics for Functional Genomics, Systems Biology, and
Biotechnology, In: Annual Review of Plant Biology, Vol 61, 463-489, Annual Reviews,
isbn 1543-5008, Palo Alto, California
New Opportunities in Metabolomics and Biochemical Phenotyping for Plant Systems Biology 237
Salse, J. (2004). New in silico insight into the synteny between rice (Oryza sativa L.) and
maize (Zea mays L.) highlights reshuffling and identifies new duplications in the
rice genome (Vol.38, (May 2004), pp. 396-409). Plant Journal, Vol.38, No. 5, (Jun
2004), pp. 873-873, issn 0960-7412
Sanchez-Perez, E.M.; Iglesias, M.J.; Lopez-Ortiz, F.; Sanchez-Perez, I. & Martinez-Galera, M.
(2010). Study of the suitability of HRMAS NMR for metabolic profiling of tomatoes:
Application to tissue differentiation and fruit ripening. Food Chemistry, Vol.122, No.
3, (Oct 2010), pp. 877-887, issn 0308-8146
Sanchez, D.H.; Pieckenstain, F.L.; Szymanski, J.; Erban, A.; Bromke, M.; Hannah, M.A.;
Kraemer, U.; Kopka, J. & Udvardi, M.K. (2011). Comparative functional genomics
of salt stress in related model and cultivated plants identifies and overcomes
limitations to translational genomics. Plos One, Vol.6, No. 2, (Feb 2011), issn 1932-
6203
Sawada, Y.; Akiyama, K.; Sakata, A.; Kuwahara, A.; Otsuki, H.; Sakurai, T.; Saito, K. & Hirai,
M.Y. (2009). Widely targeted metabolomics based on large-scale ms/ms data for
elucidating metabolite accumulation patterns in plants. Plant and Cell Physiology,
Vol.50, No. 1, (Jan 2009), pp. 37-47, issn 0032-0781
Schauer, N.; Semel, Y.; Roessner, U.; Gur, A.; Balbo, I.; Carrari, F.; Pleban, T.; Perez-Melis, A.;
Bruedigam, C.; Kopka, J.; Willmitzer, L.; Zamir, D. & Fernie, A.R. (2006).
Comprehensive metabolic profiling and phenotyping of interspecific introgression
lines for tomato improvement. Nature Biotechnology, Vol.24, No. 4, (Apr 2006), pp.
447-454, issn 1087-0156
Shaw, R.A.; Rigatto, C.; Reslerova, M.; Ying, S.L.; Man, A.; Schattka, B.; Battrell, C.F.;
Matthewson, J. & Mansfield, C. (2009). Toward point-of-care diagnostic metabolic
fingerprinting: quantification of plasma creatinine by infrared spectroscopy of
microfluidic-preprocessed samples. Analyst, Vol.134, No. 6, (Jun 2009), pp. 1224-
1231, issn 0003-2654
Shepherd, L.V.T.; Fraser, P. & Stewart, D. (2011). Metabolomics: a second-generation
platform for crop and food analysis. Bioanalysis, Vol.3, No. 10, (May 2011), pp. 1143-
1159, issn 1757-6180
Shintu, L.; Le Gall, G. & Colquhoun, I.J. (2009). Metabolomics and the detection of
unintended effects in genetically modified crops, In: Plant-derived Natural Products,
Osbourn, A.E., Lanzotti, V., (Eds) 505-531, Springer New York, isbn 978-0-387-
85497-7, New-York
Shroff, R.; Rulisek, L.; Doubsky, J. & Svatos, A. (2009). Acid-base-driven matrix-assisted
mass spectrometry for targeted metabolomics. Proceedings of the National Academy of
Sciences of the United States of America, Vol.106, No. 25, (Jun 2009), pp. 10092-10096,
issn 0027-8424
Skogerson, K.; Harrigan, G.G.; Reynolds, T.L.; Halls, S.C.; Ruebelt, M.; Iandolino, A.;
Pandravada, A.; Glenn, K.C. & Fiehn, O. (2010). Impact of genetics and
environment on the metabolite composition of maize grain. Journal of Agricultural
and Food Chemistry, Vol.58, No. 6, (Mar 2010), pp. 3600-3610, issn 0021-8561
238 Metabolomics
Smith, J.E. & Bluhm, B.H. (2011). Metabolic fingerprinting in Fusarium verticillioides to
determine gene function, In: Fungal Genomics : Methods and Protocols, Xu, J.R.,
Bluhm, B.H.H., (Eds) 237-247, Humana Press, isbn 978-1-61779-039-3, Heidelberg
Stitt, M.; Gibon, Y.; Lunn, J.E. & Piques, M. (2007). Multilevel genomics analysis of carbon
signalling during low carbon availability: coordinating the supply and utilisation of
carbon in a fluctuating environment. Functional Plant Biology, Vol.34, No. 6, (Jun
2007), pp. 526-549, issn 1445-4408
Sulpice, R.; Trenkamp, S.; Steinfath, M.; Usadel, B.; Gibon, Y.; Witucka-Wall, H.; Pyl, E.T.;
Tschoep, H.; Steinhauser, M.C.; Guenther, M.; Hoehne, M.; Rohwer, J.M.; Altmann,
T.; Fernie, A.R. & Stitt, M. (2010). Network analysis of enzyme activities and
metabolite levels and their relationship to biomass in a large panel of Arabidopsis
accessions. Plant Cell, Vol.22, No. 8, (Aug 2010), pp. 2872-2893, issn 1040-4651
Sumner, L. (2010). Recent advances in plant metabolomics and greener pastures. F1000
Biology Reports, Vol.2 (Jan 2010), pp. 7, issn 1757-594X
Takatsy, G. (1955). The use of spiral loops in serological and virological micro-methods. Acta
Microbiologica Academiae Scientiarum Hungaricae, Vol. 3, No. 1-2, (Jun 1955), pp. 191-
202, issn 0001-6187
Tarpley, L.; Duran, A.L.; Kebrom, T.H. & Sumner, L.W. (2005). Biomarker metabolites
capturing the metabolite variance present in a rice plant developmental period.
BMC Plant Biology, Vol.5, (May 2005), issn 1471-2229
Tester, M. & Langridge, P. (2010). Breeding technologies to increase crop production in a
changing world. Science, Vol.327, No. 5967, (Feb 2010), pp. 818-822, issn 0036-8075
Thiele, B.; Fullner, K.; Stein, N.; Oldiges, M.; Kuhn, A.J. & Hofmann, D. (2008). Analysis of
amino acids without derivatization in barley extracts by LC-MS-MS. Analytical and
Bioanalytical Chemistry, Vol.391, No. 7, (Aug 2008), pp. 2663-2672, issn 1618-2642
Tikunov, Y.M.; Verstappen, F.W.A. & Hall, R.D. (2007). Metabolomic profiling of natural
volatiles headspace trapping: GC-MS, In: Metabolomics, Weckwerth, W., (Ed.) 39-53,
Humana Press, isbn 1064-3745, Totowa, New Jersey
Tomas, R.; Kleparnik, K. & Foret, F. (2008). Multidimensional liquid phase separations for
mass spectrometry. Journal of Separation Science, Vol.31, No. 11, (Jun 2008), pp. 1964-
1979, issn 1615-9306
Trethewey, R.N. (2004). Metabolite profiling as an aid to metabolic engineering in plants.
Current Opinion in Plant Biology, Vol.7, No. 2, (Apr 2004), pp. 196-201, issn 1369-5266
Trufelli, H.; Palma, P.; Famiglini, G. & Cappiello, A. (2011). An overview of matrix effects in
liquid chromatography-mass spectrometry. Mass Spectrometry Reviews, Vol.30, No.
3, (May-Jun 2011), pp. 491-509, issn 0277-7037
Urbanczyk-Wochniak, E.; Baxter, C.; Kolbe, A.; Kopka, J.; Sweetlove, L.J. & Fernie, A.R.
(2005). Profiling of diurnal patterns of metabolite and transcript abundance in
potato (Solanum tuberosum) leaves. Planta, Vol.221, No. 6, (Aug 2005), pp. 891-903,
issn 0032-0935
Usadel, B.; Blasing, O.E.; Gibon, Y.; Retzlaff, K.; Hoehne, M.; Gunther, M. & Stitt, M. (2008).
Global transcript levels respond to small changes of the carbon status during
progressive exhaustion of carbohydrates in Arabidopsis rosettes. Plant Physiology,
Vol.146, No. 4, (Apr 2008), pp. 1834-1861, issn 0032-0889
New Opportunities in Metabolomics and Biochemical Phenotyping for Plant Systems Biology 239
Vengasandra, S.; Cai, Y.K.; Grewell, D.; Shinar, J. & Shinar, R. (2010). Polypropylene CD-
organic light-emitting diode biosensing platform. Lab on a Chip, Vol.10, No. 8, 2010),
pp. 1051-1056, issn 1473-0197
Vidal, M. (2009). A unifying view of 21st century systems biology. Febs Letters, Vol.583, No.
24, (Dec 2009), pp. 3891-3894, issn 0014-5793
Villiers, F.; Ducruix, C.; Hugouvieux, V.; Jarno, N.; Ezan, E.; Garin, J.; Junot, C. &
Bourguignon, J. (2011). Investigating the plant response to cadmium exposure by
proteomic and metabolomic approaches. Proteomics, Vol.11, No. 9, (May 2011), pp.
1650-1663, issn 1615-9861
Wang, Q.Z.; Wu, C.Y.; Chen, T.; Chen, X. & Zhao, X.M. (2006). Integrating metabolomics
into a systems biology framework to exploit metabolic complexity: strategies and
applications in microorganisms. Applied Microbiology and Biotechnology, Vol.70, No.
2, (Mar 2006), pp. 151-161, issn 0175-7598
Wang, X.Y.; Gowik, U.; Tang, H.B.; Bowers, J.E.; Westhoff, P. & Paterson, A.H. (2009).
Comparative genomic analysis of C4 photosynthetic pathway evolution in grasses.
Genome Biology, Vol.10, No. 6, (Jun 2009), issn 1474-760X
Ward, J.; Baker, J.M.; Miller, S.; Deborde, C.; Maucourt, M.; Biais, B.; Rolin, D.; Moing, A.;
Moco, S.; Vervoort, J.; Lommen, A.; Schäfer, H.; Humpfer, E. & Beale, M.H. (2010).
An inter-laboratory comparison demonstrates that [1H]-NMR metabolite
fingerprinting is a robust technique for collaborative plant metabolomic data
collection. Metabolomics, Vol.6, No. 2, (Jun 2010), pp. 263-273, issn 1573-3882
Weckwerth, W. (2007). Metabolomics. Methods and protocols. Humana Press, isbn 1-59745-244-
0, Totowa, USA
Weckwerth, W. (2011). Unpredictability of metabolism-the key role of metabolomics science
in combination with next-generation genome sequencing. Analytical and
Bioanalytical Chemistry, Vol.400, No. 7, (Jun 2011), pp. 1967-1978, issn 1618-2642
Weckwerth, W.; Loureiro, M.E.; Wenzel, K. & Fiehn, O. (2004). Differential metabolic
networks unravel the effects of silent plant phenotypes. Proceedings of the National
Academy of Sciences of the United States of America, Vol.101, No. 20, (May 2004), pp.
7809-7814, issn 0027-8424
Werner, E.; Heilier, J.F.; Ducruix, C.; Ezan, E.; Junot, C. & Tabet, J.C. (2008). Mass
spectrometry for the identification of the discriminating signals from
metabolomics: Current status and future trends. Journal of Chromatography B-
Analytical Technologies in the Biomedical and Life Sciences, Vol.871, No. 2, (Aug 2008),
pp. 143-163, issn 1570-0232
Westerhoff, H.V.; Winder, C.; Messiha, H.; Simeonidis, E.; Adamczyk, M.; Verma, M.;
Bruggeman, F.J. & Dunn, W. (2009). Systems Biology: The elements and principles
of Life. Febs Letters, Vol.583, No. 24, (Dec 2009), pp. 3882-3890, issn 0014-5793
Whitesides, G.M. (2006). The origins and the future of microfluidics. Nature, Vol.442, No.
7101, (Jul 2006), pp. 368-373, issn 0028-0836
Williams, T.C.R.; Poolman, M.G.; Howden, A.J.M.; Schwarzlander, M.; Fell, D.A.; Ratcliffe,
R.G. & Sweetlove, L.J. (2010). A genome-scale metabolic model accurately predicts
fluxes in central carbon metabolism under stress conditions. Plant Physiology,
Vol.154, No. 1, (Sep 2010), pp. 311-323, issn 0032-0889
240 Metabolomics
Wishart, D.S. (2008). Metabolomics: applications to food science and nutrition research.
Trends in Food Science & Technology, Vol.19, No. 9, 2008), pp. 482-493, issn 0924-2244
Wurm, M.; Schopke, B.; Lutz, D.; Muller, J. & Zeng, A.P. (2010). Microtechnology meets
systems biology: The small molecules of metabolome as next big targets. Journal of
Biotechnology, Vol.149, No. 1-2, (Aug 2010), pp. 33-51, issn 0168-1656
Yu, J.M.; Holland, J.B.; McMullen, M.D. & Buckler, E.S. (2008). Genetic design and statistical
power of nested association mapping in maize. Genetics, Vol.178, No. 1, (Jan 2008),
pp. 539-551, issn 0016-6731
Zhang, N.Y.; Gur, A.; Gibon, Y.; Sulpice, R.; Flint-Garcia, S.; McMullen, M.D.; Stitt, M. &
Buckler, E.S. (2010). Genetic analysis of central carbon metabolism unveils an
amino acid substitution that alters maize NAD-dependent isocitrate
dehydrogenase activity. Plos One, Vol.5, No. 3, (Apr 2010), issn 1932-6203
Zhang, Y.; Thiele, I.; Weekes, D.; Li, Z.W.; Jaroszewski, L.; Ginalski, K.; Deacon, A.M.;
Wooley, J.; Lesley, S.A.; Wilson, I.A.; Palsson, B.; Osterman, A. & Godzik, A. (2009).
Three-dimensional structural view of the central metabolic network of Thermotoga
maritima. Science, Vol.325, No. 5947, (Sep 2009), pp. 1544-1549, issn 0036-8075
10
1. Introduction
Microorganisms are indispensable for every aspect of human life, and indeed for all life on
Earth, although they cannot be seen with the naked eye. Since time immemorial, every process in the
biosphere has been affected by the apparently unending ability of microbes to renovate the
world around them. More recently, many discoveries have been made in isolating a special
class of microorganisms, mainly fungi but also bacteria, commonly called endophytes,
which have been shown to have the natural potential for accumulation of various bioactive
metabolites which may directly or indirectly be used as therapeutic agents against a
plethora of maladies (Kusari & Spiteller, 2010, 2011). Bioprospecting endophytes has led to
exciting possibilities to explore and utilize their potential. Several bioprospecting strategies
might be employed in order to discover potent endophytes with desirable traits (Figure 1).
These include randomly sampling different plants from any population to isolate the
associated endophytes, or first performing a detailed study of an ecosystem in order to
determine its features with regard to its natural population of plant species, their
relationship with the environment, soil composition, and biogeochemical cycles. Another
approach is to evaluate the evolutionary relatedness among groups of plants at a particular
sampling site, correlating to species, genus, and populations, through morphological data
matrices and molecular sequencing, followed by isolation of endophytes from the desired
plants. Traditional medicinal plants are also bioprospected for endophytes, especially for the
ones capable of producing one or more of the bioactive secondary metabolites present in the
host plants. Finally, the valuable information obtained using the different bioprospecting
schemes can be pooled together, comparatively evaluated, and stored for further use
by applying suitable data mining approaches.
Fig. 1. Different bioprospecting strategies that might be utilized in order to discover novel or
competent endophytes with desirable features.
bacteria, and actinomycetes) have been discovered as endophytes. The most frequently
encountered endophytes are fungi (Staniek et al., 2008). Fungal endophytes constitute an
inexplicably diverse group of polyphyletic fungi ubiquitous in plants, and maintain an
indiscernible dynamic relationship with their hosts for at least a part of their life cycle
(Figure 2a,b). The existence of fungi inside the tissues of asymptomatic plants has been
known since the end of the nineteenth century (Guerin, 1898). Evidence of plant-associated
microorganisms found in the fossilized tissues of stems and leaves has revealed that
endophyte-plant associations may have evolved from the time higher plants first appeared
on the earth (Redecker et al., 2000). However, except for some infrequent studies, it was not
until the end of the twentieth century that fungal endophytes began to receive more
attention from scientists. Since endophytes were first described in darnel (Freeman,
1904), various investigators have isolated endophytes from different plant species. These
discoveries led to a worldwide search for novel endophytes in order to better understand
and apply such a promising group of microorganisms. On the one hand, the
ecological aspects of endophytic fungi such as host range, evolutionary relatedness,
infection, colonization, transmission patterns, tissue specificity, and mutualistic fitness
benefits have been investigated in relation to a plethora of plants (Arnold et al., 2003, 2007;
Arnold, 2005, 2007; Stone et al., 2004; Schulz & Boyle, 2005; Rodriguez et al., 2009) (Figure
2c). On the other hand, many discoveries have been made in isolating endophytic fungi,
which have been shown to have the potential for de novo synthesis of various bioactive
metabolites that may directly or indirectly be used as therapeutic agents against numerous
ailments (Strobel & Daisy, 2003; Strobel et al., 2004; Zhang et al., 2006; Gunatilaka, 2006;
Staniek et al., 2008; Suryanarayanan et al., 2009; Aly et al., 2010; Kharwar et al., 2011; Kusari
& Spiteller, 2010, 2011).
Metabolomics of Endophytic Fungi Producing
Associated Plant Secondary Metabolites: Progress, Challenges and Opportunities 243
4. Rationale for plant selection to provide the best opportunities for isolating
endophytic fungi producing associated plant natural products
Considering the enormous numbers and the diversity of plants, ingenious strategies should
be utilized to narrow the search for endophytes producing plant compounds. A specific
rationale for the collection of each plant for endophyte isolation could be proposed to
maximize the possibility of discovering endophytes equipped with the capacity to produce
associated plant natural products. Several hypotheses governing this plant selection strategy
might be exploited.
4.1 Plants from an inimitable ecological niche, especially those with an uncommon
morphology and possessing unusual strategies for subsistence
4.1.1 Case study: Hypericum perforatum
Plants from a distinct ecological niche or with unusual biology might also harbor potent
endophytes. A fine example of such a plant is Hypericum perforatum, which is commonly
called St. John’s wort (Wichtl, 1986) (Figure 3a). This plant is a pseudogamous,
facultatively apomictic, perennial medicinal plant that is native to Europe, West and
South Asia, North Africa, North America, and Australia (Hickey and
King, 1981; Wichtl, 1986). In general, Hypericum has always been a very important
medicinal plant occupying a significant place in ancient history. Pedanius Dioscorides, the
foremost ancient Greek herbalist, mentioned four species of Hypericum - uperikon, askuron,
androsaimon, and koris, which he recommended for sciatica, “when drunk with 2 heim of
hydromel (honey water)” (Gunther, 1959). H. perforatum has also been in use, at least since
the time of ancient Greece (Tammaro and Xepapadakis, 1986), as an antidepressant and for
healing wounds and treating menstrual disorders, owing to the presence of then-unknown
bioactive compounds in the plant. This plant has also found historical use in India, China,
Egypt and many countries of Europe, where tribal peoples have burned this
plant to represent sun, light, vitality and strength (Hickey & King, 1981). We know now
that this plant produces the widely used antidepressant compound hypericin (Brockmann
4.2 Plants that have an ethno-botanical history that is associated with the specific
practices or applications of interest
4.2.1 Case study: Juniperus species
Juniperus plants (Figure 4), which contain the therapeutically important anticancer lignans
podophyllotoxin and deoxypodophyllotoxin (Hartwell et al., 1953), serve as an excellent
example of this rationale. These plants were in use as early as the first
century A.D., when Gaius Plinius Secundus mentioned that the smaller species of Juniperus
could be used, among other things, to stop tumors or swelling (Imbert, 1998). The use of the
oil of Juniperus species (J. sabina, J. phoenicea and J. communis) for the treatment of ulcers,
carbuncles and leprosy was also mentioned by Dioscorides (Gunther, 1959). Generally, the
dried needles, called savin, or the derived oil was used. In 47 A.D., Scribonius Largus wrote
that savin oil was used to soften “hard female genital parts” (Sconocchia, 1983). Savin was
later also used to treat uterine carcinoma, venereal warts and polyps. Based on such
historical use by indigenous people, we recently isolated and characterized endophytic
fungi harbored in Juniperus plants sampled from the natural populations in Dortmund and
Haltern, Germany, and Jammu and Kashmir, India. This resulted in the discovery of a
deoxypodophyllotoxin-producing endophytic fungus harbored in J. communis (Kusari et al.,
2009a).
plant has been in use as a traditional medicine in China for the treatment of psoriasis, liver and
stomach ailments and the common cold (Sung et al., 1998). Its present use rests on the fact
that it contains substantial quantities of an important antineoplastic drug, namely
camptothecin (CPT). The plant is uprooted and harvested by
various sectors, including medical groups, pharmaceutical companies and scientists from
around the world, to isolate CPT for numerous purposes (Lorence & Nessler, 2004;
Sankar-Thomas, 2010). In addition to the difficulties of the practical total synthesis of this
natural compound, the unpredictable problems of nature such as erratic weather and
pests (Kusari et al., 2011d) have rendered this plant species vulnerable to extinction. As
such, in 2000 and again in 2006, C. acuminata was proposed for protection under CITES
(Convention on International Trade in Endangered Species of Wild Fauna and Flora), World
Conservation Monitoring Centre, Appendix II (Anonymous, 2000, 2006). This appendix lists species that
are not necessarily now threatened with extinction but that may become so unless trade is
closely controlled. There are, of course, some nurseries growing C. acuminata for
commercial purposes. These few nurseries, however, cannot meet the demand for CPT
production (Sankar-Thomas, 2010). Furthermore, the yields of CPT from field trees vary
widely and depend on factors that are difficult to control. For instance, plant diseases
such as leaf spot and root rot are some of the major fungal diseases that can limit the
cultivation of Camptotheca plants (Li et al., 2005) and diminish the production of CPT.
Cultivation of Camptotheca plants is limited to subtropical climates and it takes about ten
years for plants to produce a stable fruit yield (Li et al., 2005; Sankar-Thomas, 2010). The
combination of a high demand for CPT and its scarcity from natural plant sources has,
therefore, led to a different strategy: bioprospecting the endophytic fungi associated
with C. acuminata as alternative sources of CPT and related metabolites (Kusari et al.,
2009b, 2011b).
6) have emerged as some of the most promising agents for cancer treatment owing to their
characteristic mechanism of action involving DNA topoisomerase I, i.e., they cause DNA damage by
stabilizing a normally transient covalent complex between topoisomerase I (Topo 1) and
DNA (Hsiang et al., 1985; Kusari et al., 2011a,d). CPT interacts with the Topo 1-DNA
complex, thereby forming a ternary complex that stabilizes the trans-esterification
intermediate (Hertzberg et al., 1990; Pommier et al., 1995). Thus, by stabilizing the cleavable
complex, CPT transforms the normally useful enzyme Topo 1 into an intracellular, cytotoxic
poison, and hence, CPT and structural analogues are called ‘topoisomerase poisons’ or
‘topoisomerase inhibitors’ (Lorence & Nessler, 2004).
Endophytic Entrophospora infrequens (Puri et al., 2005; Amna et al., 2006) and Neurospora
crassa (Rehman et al., 2008) isolated from N. nimmoniana were initially reported to produce
CPT. However, in both cases, there have been no further studies on how the fungi are able
to produce CPT and prevent self-toxicity from the intracellular accumulated CPT. Further,
no follow-up work on up-scaling the production of CPT has been performed, and there is no
published breakthrough in the commercial exploitation of these endophytic fungi as a
source of CPT.
Fig. 6. Camptothecin (CPT) and some important natural analogues found in plants.
Recently, we isolated an endophytic fungus, Fusarium solani, from the inner bark of
Camptotheca acuminata Decaisne, obtained from the Southwest Forestry University (SWFU)
campus, Kunming (Yunnan Province), People’s Republic of China (Figure 7a). This
Fig. 7. (a) Endophytic Fusarium solani that produces CPT, 9-MeO-CPT and 10-OH-CPT. (b) A
cross-species CPT biosynthetic pathway where the endophyte utilizes indigenous enzymes
to biosynthesize CPT precursors (10-hydroxygeraniol, secologanin, and tryptamine), but
requires the host strictosidine synthase to complete the pathway. (c) Perforations on the
leaves of an N. nimmoniana plant (Western Ghats, India) (red arrows) are caused by the feeding of
chrysomelid leaf beetles. (Photographs courtesy of M. Spiteller).
The discovery of this endophytic fungus that is capable of producing CPT led us to envisage
the possibility of using this organism to produce CPT under controlled fermentation
conditions in an economical, environment-friendly, and reproducible manner amenable to
industrial scale-up. Unfortunately, the production of CPT and 9-MeO-CPT by this in vitro-cultured
endophyte decreased substantially following repeated subculturing (i.e., over successive
subculture generations) (Kusari et al., 2009b).
Optimized fermentation conditions and the addition of precursors as well as various host
plant tissue extracts did not restore the production. We then deciphered the chemical
ecology of the endophyte-host interaction, where the fungal endophyte utilizes indigenous
G10H (geraniol 10-hydroxylase), SLS (secologanin synthase), and TDC (tryptophan
decarboxylase) to biosynthesize CPT precursors. However, to complete the cross-species
CPT biosynthetic pathway, the endophyte requires the host STR (strictosidine synthase)
(Kusari et al., 2011b) (Figure 7b). The fungal CPT biosynthetic genes destabilized ex planta
over successive subculture generations. The proteins predicted from the seventh subculture
exhibited reduced homology to the original enzymes, showing that such genomic instability leads to
dysfunction at the amino acid level. The endophyte with an impaired CPT biosynthetic
capability was artificially inoculated into the living host plants and then recovered after
colonization. CPT biosynthesis could still not be restored. This demonstrated that the
observed phenomenon of genomic instability was possibly irreversible (Kusari et al., 2011b).
Following our discovery of the endophytic fungus F. solani, another endophytic fungus
capable of producing the same compounds has been isolated from Apodytes dimidiata (Shweta et
al., 2010). Furthermore, an endophytic Xylaria sp. capable of producing only 10-OH-CPT and,
remarkably, not the parent compound CPT has recently been isolated from C. acuminata
(Liu et al., 2010). In neither case have follow-up studies been reported
so far. Recently, it was reported that chrysomelid beetles (Kanarella unicolor Jacobby) feed
on the leaves of N. nimmoniana without any apparent adverse effect (Ramesha et al., 2011)
(Figure 7c). Interestingly, most of the CPT in the insect body was found in the parental form
without any major metabolized products.
mode (1 spectrum s⁻¹; mass range: 200-800) with a nominal mass resolving power of 60000 at
m/z 400 and a scan rate of 1 Hz, with automatic gain control to provide high-accuracy mass
measurements within 2 ppm deviation using one internal lock mass (m/z 391.284286;
bis-(2-ethylhexyl)-phthalate). MS2 led to the corresponding CO2 loss of the precursor (CID of 45).
The final MS3 measurement was performed under CID of 45 and resulted in characteristic
fragments of the compounds. The compounds were additionally confirmed using 1H NMR,
performed at 298 K with a Bruker DRX-400 spectrometer using 5 mm tubes with CDCl3
(Merck, Darmstadt, Germany) as solvent.
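The 2 ppm mass-accuracy criterion quoted above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' procedure: the function names and the "measured" reading are hypothetical, and only the lock-mass value and the 2 ppm tolerance are taken from the text.

```python
# Illustrative only: function names and the example reading are assumptions,
# not values or software reported in the chapter.

LOCK_MASS_MZ = 391.284286  # m/z of the internal lock mass quoted in the text


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Return the signed mass deviation in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float,
                     tolerance_ppm: float = 2.0) -> bool:
    """True if the absolute deviation does not exceed the tolerance."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tolerance_ppm


if __name__ == "__main__":
    hypothetical_reading = 391.2847  # made-up measured value for the lock mass
    print(round(ppm_error(hypothetical_reading, LOCK_MASS_MZ), 2))
    print(within_tolerance(hypothetical_reading, LOCK_MASS_MZ))
```

The same check could be applied to any candidate ion by substituting its theoretical m/z for the lock-mass value.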
For the host C. acuminata plants, the LC-MS/MS data were subjected to a number of
different chemometric evaluations for metabolite profiling and correlating the
phytochemical loads among the various Camptotheca plants (infraspecific), among the
organic and aqueous phases, and among the different aerial tissues (dry and fresh in
parallel) to reflect the metabolome profiles of the studied plants (Kusari et al., 2011d). The
analyses included multivariate analysis (MVA), Kruskal’s multidimensional scaling (MDS),
principal component analysis (PCA), linear discriminant analysis (LDA), and hierarchical
agglomerative cluster analysis (HACA). All analyses were performed using the statistical
software XLSTAT-Pro (Addinsoft, NY, U.S.A.), except for MVA which was performed using
the statistical software QI Macros (KnowWare International Inc., CO, U.S.A.). Both
statistical software packages were used in combination with Microsoft Excel (part of
Microsoft Office Professional, Microsoft Corporation, U.S.A.).
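To make the idea behind hierarchical agglomerative cluster analysis concrete, the sketch below groups a few samples by the similarity of their metabolite profiles. It is an illustration under stated assumptions, not a reproduction of the XLSTAT-Pro analysis: the text does not specify the distance measure or linkage rule, so Euclidean distance and average linkage are assumed here, and the profile values are invented.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average-linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Hypothetical relative abundances of three metabolites in four samples.
    profiles = [
        [1.0, 0.2, 0.1],
        [0.9, 0.3, 0.1],
        [0.1, 1.1, 0.9],
        [0.2, 1.0, 1.0],
    ]
    print(agglomerate(profiles, 2))
```

In practice the abundance matrix would usually be scaled before clustering, and the choice of distance and linkage can change the resulting grouping.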
Furthermore, we used high-precision isotope-ratio mass spectrometry (HP-IRMS) with
compound-specific carbon isotope (CSCI) and compound-specific nitrogen isotope (CSNI)
modules to confirm that the endophytic fungus actually utilizes host strictosidine
synthase, as detailed above (Kusari et al., 2011b). The CPT produced by the cultured
endophyte (first generation) outside the host plant in a nitrogen-free medium was compared
to CPT from the tissue (not containing the same F. solani) of the original C. acuminata host
(from SWFU) to check both the δ13C/12C (by CSCI) and the δ15N/14N (by CSNI). It was
possible to trace the exact pattern of the accumulation of both ‘carbons’ and ‘nitrogens’
with the source of the enzyme(s) (fungal or plant) concerned up to and including the
formation of CPT in the endophytic fungus and in the host plant. Briefly, the samples
were readied for HP-IRMS in each case by placing 0.5 mg CPT in 3.5 × 5 mm tin capsules
(HEKAtech GmbH, Germany), lyophilizing completely and finally rolling the capsules
into small spheres. The HP-IRMS measurements were performed in compound-specific
carbon isotope (CSCI) and compound-specific nitrogen isotope (CSNI) modules, using a
FlashEA 1112 elemental analyzer (Thermo Fisher, Italy) coupled to a DELTA V Plus
isotope-ratio mass spectrometer (Thermo Fisher, Bremen, Germany) interfaced through a
ConFlo IV universal continuous flow interface (Thermo Fisher, Bremen, Germany) (Kusari
et al., 2011b). The combustion furnace (oxidation reactor) was maintained at 1020°C, and
flash combustion was initiated by injecting a pulse of O2 at the time of sample drop.
Helium was used as the carrier gas with a flow rate of 120 mL min⁻¹. NOx species were
reduced to N2 in a reduction furnace at 680°C. Water was removed by phosphorus
pentoxide in a water trap and CO2 was separated from N2 using a Porapak-packed
N2/CO2-separation column (3 m × 6.5 mm, Thermo Electron S.p.A.) operated
isothermally at 85°C. Each sample was analyzed in quadruplicate. Acetanilide (Fisons
Instruments) was used as the reference standard.
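The δ values compared above follow the standard delta notation, in which a sample's isotope ratio is expressed relative to a reference ratio in parts per thousand. The following is a minimal sketch of that calculation, not the authors' data-processing workflow. The reference ratios are commonly quoted values for the VPDB carbon standard and atmospheric N2 and are assumptions here, as are the sample ratios, which are hypothetical and not taken from the study.

```python
def delta_per_mil(r_sample: float, r_standard: float) -> float:
    """Return the delta value (per mil) of an isotope ratio relative to a standard."""
    if r_standard <= 0:
        raise ValueError("standard ratio must be positive")
    return (r_sample / r_standard - 1.0) * 1000.0


# Assumed reference ratios (commonly quoted values, not from the chapter).
R_VPDB_13C = 0.0112372  # 13C/12C of the VPDB carbon standard
R_AIR_15N = 0.0036765   # 15N/14N of atmospheric N2

# Hypothetical measured ratios for a single CPT sample.
sample_13c = 0.0109500
sample_15n = 0.0036900

d13c = delta_per_mil(sample_13c, R_VPDB_13C)
d15n = delta_per_mil(sample_15n, R_AIR_15N)
print(f"delta13C = {d13c:.2f} per mil, delta15N = {d15n:.2f} per mil")
```

A negative value means the sample is depleted in the heavy isotope relative to the standard, and a positive value means it is enriched. Comparing such values between fungal and plant CPT is what allows the source of the carbon and nitrogen atoms to be inferred.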
Metabolomics of Endophytic Fungi Producing
Associated Plant Secondary Metabolites: Progress, Challenges and Opportunities 253
Using the rationale that the plants containing hypericin may also contain endophytic
fungi that are able to accumulate the same or similar molecules, a selective search for
fungal endophytes was pursued. A number of endophytic fungi were isolated from
various organs of the Hypericum plants, which were morphologically different from the
strains isolated from unsterilized explants (surface-contaminating fungi). Only one
endophytic fungus was able to produce hypericin and emodin under axenic submerged
shake-flask fermentation (Kusari et al., 2008). The fungus was identified as Thielavia
subthermophila by its morphology and authenticated by 28S rDNA and ITS-5.8S rDNA
analyses. The growth of the endophyte and production of hypericin remained
independent of the illumination conditions and media spiking with emodin.
Protohypericin could not be detected, irrespective of either spiking or illumination
conditions. The hyp-1 gene, suggested to encode the Hyp-1 phenolic coupling protein
in plant cell cultures, was absent in the genome of the endophyte. Thus, it was proposed
that emodin anthrone is the common precursor of both hypericin and emodin in the
fungal endophyte, which is governed by a different molecular mechanism from that of the host
plant or host cell suspension cultures (Kusari et al., 2009c).
8. Acknowledgements
We thank the International Bureau (IB) of the German Federal Ministry of Education and
Research (BMBF/DLR), Germany for supporting our various research projects. We also
thank the Ministry of Innovation, Science, Research and Technology of the State of North
Rhine-Westphalia, Germany, and the German Research Foundation (DFG) for granting us
the necessary high-resolution instruments.
9. References
Aly AH, Debbab A, Kjer J, Proksch P. (2010). Fungal endophytes from higher plants: a
prolific source of phytochemicals and other bioactive natural products. Fungal
Divers., 41: 1-16.
Amna T, Puri SC, Verma V, Sharma JP, Khajuria RK, Musarrat J, Spiteller M, Qazi GN.
(2006). Bioreactor studies on the endophytic fungus Entrophospora infrequens for the
production of an anticancer alkaloid camptothecin. Can. J. Microbiol., 52: 189–196.
Anonymous. (2000, 2006). Consideration of proposals for amendment of appendices II: inclusion of
happy tree (Camptotheca acuminata Decaisne) in CITES appendix II of convention in
accordance with the provisions of article II, paragraph 2(a). Prop. 11.58, World
Guerin P. (1898). Sur la presence d’un champignon dans l’ivraie. J. Botanique, 12: 230–238.
Gunatilaka AAL. (2006). Natural products from plant-associated microorganisms:
distribution, structural diversity, bioactivity, and implications of their occurrence. J.
Nat. Prod., 69: 509–526.
Gunther RT. (1959). The Greek herbal of Dioscorides. Hafner Publishing Co., New York.
Häberlein H, Tschiersch KP, Stock S, Hölzl J. (1992). Johanniskraut (Hypericum perforatum L.):
Nachweis eines weiteren Naphthodianthrons. Pharm. Ztg. Wiss., 5/137: 169–174.
Hartwell JL, Johnson JM, Fitzgerald DB, Belkin M. (1953). Podophyllotoxin from Juniperus
species; Savinin. J. Am. Chem. Soc., 75: 235–236.
Hertzberg RP, Busby RW, Caranfa MJ, Holden KG, Johnson RK, Hecht SM, Kingsbury WD.
(1990). Irreversible trapping of the DNA-topoisomerase I covalent complex.
Affinity labeling of the camptothecin binding site. J. Biol. Chem., 265: 19287–19295.
Hickey M, King C. (1981). 100 Families of flowering plants (2nd edition, Walters SM ed.),
Cambridge University Press, Cambridge.
Hölzl J, Petersen M. (2003). Chemical constituents of Hypericum ssp. In: Hypericum: the genus
Hypericum (Series: Medicinal and Aromatic Plants - Industrial Profiles), vol. 31, Ernst E.
(ed.), pp. 77-93, Taylor and Francis, London, UK.
Hombe Gowda HC, Vasudeva R, Mathachen GP, Shaanker RU, Ganeshaiah KN. (2002).
Breeding types in Nothapodytes nimmoniana Graham.: An important medicinal tree.
Curr. Sci., 83: 1077–1078.
Hsiang YH, Hertzberg R, Hecht S, Liu LF. (1985). Camptothecin induces protein-linked
DNA breaks via mammalian DNA topoisomerase I. J. Biol. Chem., 260: 14873–14878.
Imbert TF. (1998). Discovery of podophyllotoxins. Biochimie, 80: 207–222.
Kharwar RN, Verma VC, Kumar A, Gond SK, Harper JK, Hess WM, Lobkovosky E, Ma
C, Ren Y, Strobel GA. (2009). Javanicin, an antibacterial naphthaquinone from an
endophytic fungus of neem, Chloridium sp. Curr. Microbiol., 58: 233-238.
Kharwar RN, Mishra A, Gond SK, Stierle A, Stierle D. (2011). Anticancer compounds
derived from fungal endophytes: their importance and future challenges. Nat. Prod.
Rep., 28: 1208-1228.
King J. (1857). Discovery of podophyllin. Coll. J. M. Sci., 2: 557–559.
Kour A, Shawl AS, Rehman S, Sultan P, Qazi PH, Suden P, Khajuria RK, Verma V. (2008).
Isolation and identification of an endophytic strain of Fusarium oxysporum
producing podophyllotoxin from Juniperus recurva. World J. Microbiol. Biotechnol.,
24: 1115–1121.
Kubin A, Wierrani F, Burner U, Alth G, Grunberger W. (2005). Hypericin - the facts about a
controversial agent. Curr. Pharm. Des., 11: 233–253.
Kumar KR, Ved DK. (2000). 100 Red listed medicinal plants of conservation concern in southern
India, Foundation for Revitalisation of Local Health Traditions (FRLHT), Bangalore,
India.
Kusari S, Kosuth J, Cellarova E, Spiteller M. (2011a). Survival-strategies of endophytic
Fusarium solani against indigenous camptothecin biosynthesis. Fungal Ecol., 4: 219-
223.
Kusari S, Lamshöft M, Spiteller M. (2009a). Aspergillus fumigatus Fresenius, an endophytic
fungus from Juniperus communis L. Horstmann as a novel source of the anticancer
pro-drug deoxypodophyllotoxin. J. Appl. Microbiol., 107: 1019-1030.
Mahesh B, Tejesvi MV, Nalini MS, Prakash HS, Kini KR, Subbiah V, Hunthrike SS. (2005).
Endophytic mycoflora of inner bark of Azadirachta indica A. Juss. Curr. Sci., 88: 218-
219.
Martinez B, Kasper S, Ruhrmann S, Moller HJ. (1993). Hypericum in the treatment of seasonal
affective disorders. Nervenheilkunde, 36: 103–108.
Nahrstedt A, Butterweck V. (1997). Biologically active and other chemical constituents of the
herb of Hypericum perforatum L. Pharmacopsychiatry, 30: 129–134.
Onelli E, Rivetta A, Giorgi A, Bignami M, Cocucci M, Patrignani G. (2002). Ultrastructural
studies on the developing secretory nodules of Hypericum perforatum. Flora, 197: 92–
102.
Petcher TJ, Weber HP, Kuhn M, von Wartburg A. (1973). Crystal structure and absolute
configuration of 2'-bromopodophyllotoxin-0.5 ethyl acetate. J. Chem. Soc., Perkin
Trans. 2: 288–292.
Podwyssotzki V. (1881). The active constituent of podophyllin. Pharm. J. Trans., 12: 217–218.
Podwyssotzki V. (1882). On the active constituents of podophyllin. Am. J. Pharm., 12: 102–
115.
Podwyssotzki V. (1884). Pharmakologische Studien über Podophyllum peltatum. Naunyn.
Schmied Arch. Exp. Path. Phar., 13: 29–52.
Pommier Y, Kohlhagen G, Kohn KW, Leteurtre F, Wani MC, Wall ME. (1995). Interaction of
an alkylating camptothecin derivative with a DNA base at topoisomerase I-DNA
cleavage sites. Proc. Natl. Acad. Sci. U. S. A., 92: 8861–8865.
Puri SC, Nazir A, Chawla R, Arora R, Riyaz-ul Hasan S, Amna T, Ahmed B, Verma V, Singh
S, Sagar R, Sharma A, Kumar R, Sharma RK, Qazi GN. (2006). The endophytic
fungus Trametes hirsuta as a novel alternative source of podophyllotoxin and
related aryl tetralin lignans. J. Biotechnol., 122: 494–510.
Puri SC, Verma V, Amna T, Qazi GN, Spiteller M. (2005). An endophytic fungus from
Nothapodytes foetida that produces camptothecin. J. Nat. Prod., 68: 1717–1719.
Radulovic N, Stankov-Jovanovic V, Stojanovic G, Smelcerovic A, Spiteller M, Asakawa Y.
(2007). Screening of in vitro antimicrobial and antioxidant activity of nine Hypericum
species from the Balkans. Food Chem., 103: 15–21.
Raffa RB. (1998). Screen of receptor and uptake-site activity of hypericin component of St.
John’s wort reveals sigma receptor binding. Life Sci., 62: 265–270.
Rajagopal R, Suryanarayanan TS. (2000). Isolation of endophytic fungi from leaves of neem
(Azadirachta indica). Curr. Sci., 78: 1375–1378.
Ramesha BT, Zuehlke S, Vijaya R, Priti V, Ravikanth G, Ganeshaiah K, Spiteller M, Shaanker
RU. (2011). Sequestration of camptothecin, an anticancer alkaloid, by chrysomelid
beetles. J. Chem. Ecol., 37: 533-536.
Razinkov SP, Yerofeyeva LN, Khovrina MP, Lazarev AI. (1989). Validation of the use of
Hypericum perforatum medicamentous form with a prolonged action to treat
patients with maxillary sinusitis. Zh. Ushn. Nos. Gorl. Bolezn., 49: 43–46.
Redecker D, Kodner R, Graham LE. (2000). Glomalean fungi from the Ordovician. Science,
289: 1920-1921.
Rehman S, Shawl AS, Kour A, Andrabi R, Sudan P, Sultan P, Verma V, Qazi GN. (2008). An
endophytic Neurospora sp. from Nothapodytes foetida producing camptothecin. Appl.
Biochem. Microbiol., 44: 203–209.
Rodriguez RJ, White JFJ, Arnold AE, Redman RS. (2009). Fungal endophytes: diversity and
functional roles. New Phytol., 182: 314–330.
Sankar-Thomas YD. (2010). In vitro culture of Camptotheca acuminata (Decaisne) in Temporary
Immersion System (TIS): growth, development and production of secondary metabolites,
PhD thesis, Universität Hamburg, Germany.
Scherlach K, Hertweck C. (2009). Triggering cryptic natural product biosynthesis in
microorganisms. Org. Biomol. Chem., 7: 1753-1760.
Schulz BJE, Boyle CJC. (2005). The endophytic continuum. Mycol. Res., 109: 661–687.
Sconocchia S. (1983). Scribonius largus compositions. In: Bibliotheca Scriptorum Graecorum et
Romanorum Teubneriana (B.G. Teubner), Hansen GC (ed.), pp. 76-77,
Verlagsgesellschaft, Leipzig, Germany.
Shaanker RU, Ramesha BT, Ravikanth G, Gunaga RP, Vasudeva R, Ganeshaiah KN. (2008).
Chemical profiling of Nothapodytes nimmoniana for camptothecin, an important
anticancer alkaloid: towards the development of a sustainable production system.
In: Bioactive molecules and medicinal plants, Ramawat KG, Merillon JM (eds.), pp. 197-
213, Springer-Verlag, Berlin and Heidelberg.
Shweta S, Zühlke S, Ramesha BT, Priti V, Kumar PM, Ravikanth G, Spiteller M, Vasudeva R,
Shaanker RU. (2010). Endophytic fungal strains of Fusarium solani, from Apodytes
dimidiata E. Mey. ex Arn (Icacinaceae) produce camptothecin, 10-
hydroxycamptothecin and 9-methoxycamptothecin. Phytochemistry, 71: 117–122.
Staniek A, Woerdenbag HJ, Kayser O. (2008). Endophytes: exploiting biodiversity for the
improvement of natural product-based drug discovery. J. Plant Interact., 3: 75–93.
Stierle A, Strobel GA, Stierle D. (1993). Taxol and taxane production by Taxomyces andreanae,
an endophytic fungus of Pacific yew. Science, 260: 214–216.
Stone JK, Polishook JD, White JF Jr. (2004). Endophytic fungi. In: Biodiversity of fungi:
inventory and monitoring methods, Mueller G, Bills GF, Foster MS (eds.), pp. 241-270,
Elsevier, Burlington, MA, USA.
Strobel GA, Daisy B. (2003). Bioprospecting for microbial endophytes and their natural
products. Microbiol. Mol. Biol. Rev., 67: 491–502.
Strobel GA, Daisy B, Castillo U, Harper J. (2004). Natural products from endophytic
microorganisms. J. Nat. Prod., 67: 257–268.
Sung CK, Kimura T, But PPH, Guo JX. (1998). International collation of traditional and folk
medicine: Northeast Asia, Part III. A project of UNESCO, vol. 3, World Scientific
Publishing Co. Pte. Ltd., Singapore.
Suryanarayanan TS, Thirunavukkarasu N, Govindarajulu MB, Sasse F, Jansen R,
Murali TS. (2009). Fungal endophytes and bioprospecting. Fungal Biol. Rev., 23: 9–
19.
Tammaro F, Xepapadakis G. (1986). Plants used in phytotherapy, cosmetics and dyeing in
the Pramanda district (Epirus, north-west Greece). J. Ethnopharmacol., 16: 167–174.
Veitch GE, Boyer A, Ley SV. (2008). The azadirachtin story. Angew. Chem. Int. Ed., 47: 9402-
9429.
Verma VC, Gond SK, Kumar A, Kharwar RN, Strobel GA. (2007). Endophytic mycoflora of
bark, leaf, and stem tissues of Azadirachta indica A. Juss. (neem) from Varanasi
(India). Microb. Ecol., 54: 119-125.
266 Metabolomics
Verma VC, Gond SK, Mishra A, Kumar A, Kharwar RN, Gange AC. (2009). Endophytic
actinomycetes from Azadirachta indica A. Juss.: isolation, diversity and anti-
microbial activity. Microb. Ecol., 57: 749–756.
Wall ME, Wani MC, Cook CE, Palmer KH, McPhail AT, Sim GA. (1966). Plant antitumor
agents. I. The isolation and structure of camptothecin, a novel alkaloidal leukemia
and tumor inhibitor from Camptotheca acuminata. J. Am. Chem. Soc., 88: 3888–3890.
Wichtl M. (1986). Hypericum perforatum L. Das Johanniskraut. Zeitschrift Phytother., 3: 87–90.
Winter JM, Behnken S, Hertweck C. (2011). Genomics-inspired discovery of natural
products. Curr. Opin. Chem. Biol., 15: 22-31.
Wu SH, Chen YW, Shao SC, Wang LD, Li ZY, Yang LY, Li SL, Huang R. (2008). Ten-
membered lactones from Phomopsis sp., an endophytic fungus of Azadirachta indica.
J. Nat. Prod., 71: 731–734.
Wu SH, Chen YW, Shao SC, Wang LD, Yu Y, Li ZY, Yang LY, Li SL, Huang R. (2009). Two
new Solanapyrone analogues from the endophytic fungus Nigrospora sp. YB-141 of
Azadirachta indica. Chem. Biodivers., 6: 79-85.
Yazaki K, Okada T. (1994). Hypericum erectum Thunb. (St. John's wort): in vitro culture and
the production of procyanidins. In: Biotechnology in Agriculture and Forestry.
Medicinal and Aromatic Plants VI, Bajaj YPS (ed.), vol. 26, pp. 167-178, Springer-
Verlag, Berlin.
Zaichikova SG, Grinkevich NI, Barabanov EI. (1985). Healing properties and determination
of the upper parameters of toxicity of Hypericum herb. Farmatsiya, 34: 62–64.
Zhang HW, Song YC, Tan RX. (2006). Biology and chemistry of endophytes. Nat. Prod. Rep.,
23: 753–771.
Part 5
1. Introduction
Inflammation is a normal and extraordinarily important component of responses to
infection and injury. The cardinal features of swelling, redness, stiffness and increasing
temperature are strong indicators of the significant changes in tissue metabolism and the
ingress of immune cells into the tissues. The increase in blood flow which underlies
many of these changes may result in changes to the supply of nutrients and in particular
the level of oxygen in the tissues. Inward migration of immune cells, which is also
enabled by the increased blood flow, will put further stress on the metabolic
environment of the tissues. The activity of macrophages and neutrophils in clearing
infection and repairing tissue damage also has significant metabolic consequences,
particularly because of the production of cytokines and cytotoxic molecules such as
reactive oxygen species and reactive nitrogen species, which are required to kill invading
organisms. Production of these molecules will consume considerable quantities of
oxygen, ATP and NADPH. These antimicrobial agents put considerable stress on host
cells in the surrounding and distal tissues and can lead to significant loss of protective
metabolites such as glutathione.
Most infections and traumatic injuries are cleared or repaired relatively rapidly and
metabolic homoeostasis is soon restored. However, there is a broad range of inflammatory
diseases which involve chronic activation of the immune system and, as a result, chronic
persistent inflammation. We have been studying the metabolic consequences of chronic
inflammatory diseases with the aim of identifying metabolic fingerprints which may
provide clues about why the localised tissue disease persists. For example, why in
rheumatoid arthritis does persistent inflammation lead to widespread cartilage and joint
destruction? However, the metabolic consequences of chronic inflammation are much
more widespread than the localised disease and can lead to important comorbidities
such as accelerated atherosclerosis and cardiovascular disease. Metabolomic analysis may
be able to distinguish between localised and systemic metabolic consequences of
inflammation and provide novel targets for therapeutic intervention in these important
human diseases.
White adipose tissue has been shown to secrete several inflammatory mediators called
adipokines or adipocytokines. These exert their effects by binding to selective
transmembrane receptors. Leptin is the most studied adipocytokine and is thought to have
an important role in the inflammatory process (Montecucco & Mach, 2009).
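[Figure 1: schematic of the cytokine network linking epithelial cells, macrophages, T cells and fibroblasts. Labelled signals include TNF-α, IL-1, IL-1Ra, IL-4, IL-6, IL-8, IL-10, IL-13, IL-17, IFN-γ, FGF and TGF-β, marked as stimulatory (+) or inhibitory (-).]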
Fig. 1. Key inflammatory cytokines and the inflammatory network. Responses are a balance
between pro-inflammatory tumour necrosis factor alpha (TNF-α), interleukin (IL)-1, IL-6 and
IL-17 and anti-inflammatory IL-1Ra, IL-4, IL-10 and IL-13. Expression of cytokines depends on
activation and local signalling, which drive progression and eventual resolution.
[Figure 2: schematic linking arachidonic acid (AA), PLA2, EPA, prostaglandins and leukotrienes with hypoxia, HIF-1α and the cytokines IL-1, IL-6, TNFα and IFNγ.]
Fig. 2. Some common metabolic responses to inflammation and hypoxia. Arachidonic acid
(AA) from cellular membranes is metabolised to inflammatory prostaglandins and
leukotrienes. Omega-3 fatty acids (EPA) compete for the same pathway, producing less
inflammatory derivatives. Hypoxic conditions in the inflammatory site stabilise the HIF
transcription factor, driving production of IL-1, IL-6, TNFα and IFNγ. TNFα in turn drives
cellular proteolysis and tissue remodelling.
When investigating inflammation it is important to take into account the many facets of the
inflammatory environment that have the potential to play a role in pathology. Hypoxia is
known to be prevalent in inflammatory environments such as those associated with
wounds, malignant tumours, bacterial infections and autoimmunity (Eltzschig & Carmeliet,
2011, Murdoch et al., 2005). Increasing hypoxia in the inflammatory site is associated with
poorer disease outcome such as increased macroscopic synovitis in rheumatoid arthritis (Ng
et al., 2010).
Normal physiological oxygen levels are thought to range between 5% and 12% oxygen (compared
to 21% atmospheric oxygen). However, hypoxic tissue oxygen levels in pathological
environments can range from as little as 0.5% oxygen to around 2.5% oxygen. Local hypoxia
develops as the result of either blood vessel occlusion by inflamed tissues, or when existing
supply is insufficient for increased cellular density caused by infiltrating or proliferating
inflammatory cells. Additionally, circulating phagocytes can block blood vessels reducing
blood flow into the inflammatory site (Sitkovsky & Lukashev, 2005). Normal tissue
structures can lend themselves to hypoxia where they are poorly perfused, such as the
synovium or eye. Tissue alteration associated with inflammation can contribute to hypoxia
by altering pressure within the blood vessels causing vessel occlusion and increasing
distances between blood vessels (Jawed et al., 1997, Mapp et al., 1995).
Metabolomics in the Analysis of Inflammatory Diseases 273
There is increasing evidence that the inflammatory environment is hypoxic. The tumour
environment is known to be hypoxic and extensive angiogenesis reveals the requirement of
the tissue for a better oxygen supply. In rheumatoid arthritis, oxygen levels of synovial fluid
have been directly measured revealing lower oxygen tensions compared with osteoarthritic
patients and patients with traumatic joint injuries (Lund-Olesen, 1970). In systemic sclerosis,
direct measurements with sensitive probes revealed lower dermal oxygen levels in fibrotic
areas compared to non-fibrotic areas in both patients and healthy controls (Beyer et al.,
2009). Metabolomic analysis of eye fluids from uveitis patients has shown increased levels
of oxaloacetate and urea, likely derived from anaerobic respiration by locally activated
macrophages (Young et al., 2009, Young & Wallace, 2009).
An elegant cellular oxygen detection system is used by cells to respond to changes in
environmental oxygen. Reductions in environmental oxygen lead to the stabilisation of the
transcription factor hypoxia-inducible factor (HIF), which is otherwise targeted for degradation
in oxygen-rich environments. HIF expression is therefore suggestive of hypoxic exposure, and
has been detected in autoimmune diseases such as rheumatoid arthritis and multiple sclerosis
(Gaber et al., 2009, Hollander et al., 2001, Lassmann, 2003). HIF is known to be important in
inflammatory development; for example, loss of HIF-1α in macrophages is associated with
impaired aggregation, motility, invasiveness and killing of bacteria (Cramer et al., 2003).
Hypoxia and HIF stabilisation have a large effect on cellular metabolism. HIF causes a
preference for glycolytic metabolism over oxidative phosphorylation by inducing the
expression of glycolytic enzymes. This allows ATP generation to continue in the absence of
sufficient oxygen albeit at a much reduced efficiency per molecule of glucose. It also induces
the upregulation of lactate dehydrogenase A, therefore promoting the conversion of
pyruvate (produced during glycolysis) to lactate (Wheaton & Chandel, 2011). Lactate has
been detected in many chronic inflammatory conditions, including inflamed joints (Chang &
Wei, 2011, Treuhaft & McCarty, 1971), multiple sclerosis, and pulmonary inflammation (Serkova
et al., 2008), and it is thought to play a role in wound healing (Trabold et al., 2003). Conversely,
the acidosis associated with increasing lactate concentrations is thought to play a pathogenic
role in cell transformation and autoantigen development in some inflammatory
environments (Chang & Wei, 2011). Recently, lactate measurements have been suggested to
be useful for distinguishing bacterially infected diabetic foot ulcers from non-infected
ulcers. Both infected and non-infected ulcers showed high lactate concentrations,
but infected ulcers had significantly higher levels, probably due to additional immune and
bacterial cell involvement (Loffler et al., 2011). The detection of lactate in metabolomic
studies of disease suggests that there may be an inflammatory component, understanding of
which may help to direct future treatment.
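The reduced efficiency of glycolytic ATP generation described above can be made concrete with a small calculation. This is an illustrative sketch only: the yields are approximate textbook values and are assumptions, not figures from this chapter (a net 2 ATP per glucose for glycolysis to lactate, and roughly 30 ATP per glucose for complete oxidation, for which published estimates vary). The ATP demand is a hypothetical number in arbitrary units.

```python
# Approximate textbook yields (assumptions, not values from the chapter).
ATP_PER_GLUCOSE_GLYCOLYSIS = 2   # net ATP when glucose is converted to 2 lactate
ATP_PER_GLUCOSE_OXPHOS = 30      # complete oxidation; estimates are about 30-32


def glucose_needed(atp_demand: float, atp_per_glucose: float) -> float:
    """Glucose units required to meet a given ATP demand."""
    return atp_demand / atp_per_glucose


def lactate_produced(glucose_used: float) -> float:
    """Each glucose fermented through glycolysis yields two lactate."""
    return 2 * glucose_used


demand = 60.0  # hypothetical ATP demand, arbitrary units
aerobic = glucose_needed(demand, ATP_PER_GLUCOSE_OXPHOS)
glycolytic = glucose_needed(demand, ATP_PER_GLUCOSE_GLYCOLYSIS)

print(f"Glucose needed with oxidative phosphorylation: {aerobic}")
print(f"Glucose needed with glycolysis alone: {glycolytic}")
print(f"Lactate produced under glycolysis alone: {lactate_produced(glycolytic)}")
```

Under these assumed yields, a cell relying on glycolysis alone needs about fifteen times more glucose for the same ATP output and releases two lactate per glucose consumed. This is consistent with the pattern of falling glucose and rising lactate reported in hypoxic, inflamed tissues.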
Immune cells are thought to be highly influenced by hypoxia and HIF stabilisation
especially due to the environments they normally act within. In a study performed recently
by Gaber et al., peripheral blood CD4+ T cells placed under hypoxia were found to have a
large induction of genes involved in metabolism and homeostasis (Gaber et al., 2009). Innate
immune cells such as neutrophils and macrophages are thought to be adapted to function
best at lower oxygen tensions as they preferentially use glycolysis to provide ATP even at
higher oxygen levels (Cramer et al., 2003). Macrophages are known to accumulate in the
hypoxic sites of chronic inflammation (Vergadi et al., 2011), and hypoxia is associated with
activation of tissue-resident macrophages. Exposure of macrophages to hypoxic conditions
at the interaction between RA susceptibility genes HLA-DRB1 and PTPN22 and their
interaction with smoking (Kallberg et al., 2007). It was observed that the odds ratio (OR) of
developing RA with two genetic risk factors was 13.2, which rose to 23.4 if two genetic
factors were present and there was a history of smoking. These studies provide sound
evidence that gene-gene and gene-environment interactions occur, and that the risk of inflammatory
disease increases greatly when more than one risk factor is present.
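For readers unfamiliar with the odds ratio quoted above, the sketch below shows how an OR and an approximate 95% confidence interval (Woolf's logit method) are computed from a 2 × 2 table. The counts are hypothetical and chosen only for illustration; they are not data from Kallberg et al. (2007), and the function names are our own.

```python
import math


def odds_ratio(exp_cases: int, exp_controls: int,
               unexp_cases: int, unexp_controls: int) -> float:
    """Odds ratio from a 2 x 2 table of exposure against disease status."""
    return (exp_cases * unexp_controls) / (exp_controls * unexp_cases)


def odds_ratio_ci(exp_cases: int, exp_controls: int,
                  unexp_cases: int, unexp_controls: int,
                  z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval using Woolf's (logit) method."""
    log_or = math.log(odds_ratio(exp_cases, exp_controls,
                                 unexp_cases, unexp_controls))
    se = math.sqrt(1 / exp_cases + 1 / exp_controls
                   + 1 / unexp_cases + 1 / unexp_controls)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: carriers of both risk factors against non-carriers.
table = (40, 10, 100, 300)
or_value = odds_ratio(*table)
ci_low, ci_high = odds_ratio_ci(*table)
print(f"OR = {or_value:.1f} (95% CI {ci_low:.1f} to {ci_high:.1f})")
```

An OR well above 1 whose confidence interval excludes 1 indicates that the exposure is associated with increased odds of disease. Comparing the OR for genetic factors alone with that for genetic factors plus smoking is how a gene-environment interaction is assessed.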
2008, Schicho et al., 2010). This has been shown using both urine samples and faecal extracts.
Hence, TMA may be a useful biomarker for IBD.
NMR spectroscopy has also been used to assess cerebrospinal fluid (CSF) in patients with
MS. It has been shown that there are increased CSF levels of lactate, creatinine and
fructose in MS compared to control patients (Nicoli et al., 1996). Two additional
unidentified signals were found to be elevated in MS. The compound responsible for
both these signals has now been identified as β-hydroxyisobutyrate (Lutz et al., 2007).
This is a typical partial degradation product of branched-chain amino acids. Increased
β-hydroxyisobutyrate in urine is thought to be due to respiratory-chain deficiency leading
to impaired oxidation of NADH (Chitayat et al., 1992). However, the level of
β-hydroxyisobutyrate in these experiments was much higher than the level found in CSF
from MS patients, and so the precise role of β-hydroxyisobutyrate in MS needs further
investigation.
In a study of metabolite fingerprints in the CSF from patients with a range of neurological
conditions we have been able to differentiate between some of these conditions by
comparing the metabolites found (Sinclair et al., 2010). In particular, we were able to identify
some novel features of idiopathic intracranial hypertension (IIH), a neurological condition
whose pathogenesis is poorly understood (Sinclair et al., 2008). Although IIH was not
thought to be an inflammatory disease, the elevated levels of lactate we observed in IIH
point towards an inflammatory component, since lactate has previously been identified in
inflammatory CNS disease (Simone et al., 1996). Rabbits with elevated
intraocular pressure also show increased levels of lactate, which may reflect anaerobic
metabolism resulting from decreased blood supply. A similar mechanism may explain
the lactate in the CSF of IIH patients, where the elevated intracranial pressure could
compress the vasculature. Oxaloacetate levels were also increased in IIH and this, together with
reduced citrate, suggests alterations in the citric acid cycle. Overall, the observations suggest
a predominantly anaerobic environment deficient in carbohydrate substrate in patients with
IIH, a conclusion supported by the elevated levels of the ketone body 3-hydroxybutyrate
(Sinclair et al., 2010), which is often observed in hypoxic tissues.
3.6 Osteoarthritis
Osteoarthritis (OA) is a complex disease and has a multifactorial pathogenesis. It has many
known risk factors such as age, sex, obesity, activity level, prior joint damage and genetic
susceptibility. It is not classically thought of as an inflammatory disease but it may have an
inflammatory element. There are currently no disease-modifying drugs for OA and very
few are in development.
Synovial fluid (SF) has been used to study OA by NMR. SF is considered a good medium to
study because it is the first place where the degradation products, enzymes and signal
transduction molecules involved in OA are released from the cartilage matrix. The SF
should therefore have a higher concentration of metabolites compared to blood, lymph or
urine.
Damyanovich et al. used SF from a canine model of OA to look at metabolic profiles
using NMR (Damyanovich et al., 1999). Metabolites from experimentally induced canine
knee OA SF were compared to metabolites from SF of normal canine knees. They found
large increases in lactate and sharp decreases of glucose in OA SF compared to normal
SF suggesting that the intra-articular environment of an OA joint is more hypoxic and
acidic than a healthy joint. They also found increased levels of pyruvate, lipoprotein
associated fatty acids, glycerol and ketones in OA SF suggesting that lipolysis may be an
important source of energy in OA. There were also elevated levels of N-
acetylglycoproteins, acetate and acetamide in OA SF especially with progressive OA
(Damyanovich et al., 1999).
In order to understand further the mechanisms behind OA progression, Damyanovich et al.
looked at the effect of joint afferent nerve injury (Damyanovich et al., 1999). They again
used a bilateral canine model of OA. Paired SF samples were taken from dogs that had
undergone bilateral anterior cruciate ligament transection, unilateral knee denervation and
contralateral sham nerve exposure. NMR was used to look at the SF. Increases in glycerol,
hydroxybutyrate, glutamine, creatinine, acetate and N-acetyl-glycoprotein were seen in the
SF from denervated compared to control knees. This suggests that the metabolite
differences seen in the denervated knees are due to the aggravation of OA caused by joint
denervation (Damyanovich et al., 1999). Hydroxybutyrate is also found in SF of RA patients
(Naughton et al., 1993) suggesting that it is more of a marker of joint destruction rather than
being specific for any joint disease.
Another group used guinea pigs to study OA metabolism (Lamers et al., 2003). They used
Hartley outbred strain guinea pigs as they develop spontaneous progressive knee OA
with features similar to human disease. The earliest histological features appear at 3
months but progress to extensive cartilage degeneration after 12 months. Urine samples
were collected from these OA guinea pigs and from healthy animals at 10 and 12 months
of age. They identified a metabolic fingerprint that reflected OA changes in the guinea pigs.
Lactic acid, malic acid, hypoxanthine and alanine contributed strongly to the fingerprint
suggesting their involvement in OA (Lamers et al., 2003). The metabolic profile largely
resembled that found in the guinea pig model. The presence of hypoxanthine suggests
that OA may be an inflammatory disease due to the increased oxygen demand and altered
purine metabolism.
Mass spectrometry has also been used to look for novel biomarkers for knee OA (Zhai et al.,
2010). The authors looked at serum samples of unrelated white women with and without knee OA.
Knee OA was defined as radiographic, medically diagnosed or total knee replacement due
to primary OA. They found the ratio of valine to histidine and the ratio of leucine to
histidine to be significantly associated with knee OA in humans (Zhai et al., 2010). These
ratios have potential clinical use as OA biomarkers. In OA, branched-chain amino acids
(BCAA) are raised, which may drive the release of acetoacetate and 3-hydroxybutyrate.
These can result from the partial oxidation of leucine. BCAA are essential amino acids and
therefore cannot be synthesised within the body. An increased level of BCAA may suggest
an increased rate of protein breakdown or be secondary to collagen degradation. BCAA
increase production of the cytokines IL-1, IL-2, TNF and interferon (Bassit et al., 2000) which
could drive the collagen degradation.
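The ratio-based biomarkers described above are simple to derive once metabolite concentrations are available. The sketch below computes the two BCAA-to-histidine ratios for a pair of samples. All concentrations are hypothetical values chosen for illustration, and the function name is our own; nothing here reproduces the data or the statistical analysis of Zhai et al. (2010).

```python
def bcaa_histidine_ratios(conc: dict[str, float]) -> dict[str, float]:
    """Return valine/histidine and leucine/histidine ratios for one sample."""
    histidine = conc["histidine"]
    if histidine <= 0:
        raise ValueError("histidine concentration must be positive")
    return {
        "valine/histidine": conc["valine"] / histidine,
        "leucine/histidine": conc["leucine"] / histidine,
    }


# Hypothetical serum concentrations in micromol/L.
control = {"valine": 220.0, "leucine": 120.0, "histidine": 80.0}
knee_oa = {"valine": 260.0, "leucine": 150.0, "histidine": 65.0}

for label, sample in (("control", control), ("knee OA", knee_oa)):
    ratios = bcaa_histidine_ratios(sample)
    summary = ", ".join(f"{name} = {value:.2f}" for name, value in ratios.items())
    print(f"{label}: {summary}")
```

Using a ratio rather than a single concentration cancels out factors that scale both metabolites together, such as sample dilution, which is one reason metabolite ratios are often explored as biomarkers.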
that they identified were cholesterol, lactate, acetylated glycoprotein and lipids. The lactate
levels represented oxidative damage and thus indirectly reflected active inflammation.
3.8 Atherosclerosis
Atherosclerosis is the thickening of arteries and is the underlying pathological process that
affects the coronary, cerebral, aortic and peripheral arteries. Atherosclerosis involves the
accumulation of cholesterol particles, cellular by-products, deposition of the extracellular
matrix and inflammatory cell infiltration within the vessel wall (Goonewardena et al., 2010).
Chronic inflammation has been recognised as one of the key components of atherogenesis
(Ross, 1999) but accelerated atherosclerosis is an important confounder of chronic
inflammatory diseases such as rheumatoid arthritis (Bacon et al., 2005). Animal models have
been widely used to investigate the biochemical basis of atherosclerosis. Using aortas from
apolipoprotein-E knockout mice, Mayr et al. concluded that inefficient vascular glucose and
energy metabolism coincided with increased oxidative stress in animals with hyperlipidaemia
(Mayr et al., 2007). NMR-based metabolomics of mouse urine has been used to look at
atherosclerosis (Leo & Darrow, 2009). Using apolipoprotein-E knockout mice they compared
untreated mice with those treated with captopril. They found elevated levels of xanthine and
ascorbate in untreated mice which may be possible markers of plaque formation (Leo &
Darrow, 2009). The interaction between diet and inflammation in promoting atherosclerosis
has also been highlighted through metabolomic studies: Kleemann et al. (2007) suggested
that a high cholesterol intake led to a switch in liver metabolism towards a
pro-atherosclerotic state. Another recent example of how metabolomics can provide novel
insights into inflammatory disease pathology was the observation that the metabolism of
dietary lecithin by gut flora leads to the increased absorption and accumulation of choline
derivatives, which in turn promote cardiovascular disease (Wang et al., 2011). Only through
the use of the systematic analysis of metabolites using metabolomics was it possible to uncover
these complex metabolic relationships underpinning the disease process.
4. Conclusion
As summarised above there is now a growing body of literature describing metabolomic
changes in inflammatory diseases, both in humans and animal models. Several distinct
metabolic changes have been identified in inflammatory disorders, but there is a core theme
of increasing energy requirements coupled with decreasing oxygen supply within the
inflammatory environment.
Studies in MS, RA, OA and inflammatory lung disease have all shown an increase in lactate,
while studies of inflammatory eye and lung diseases have shown local reductions in
glucose. Immunological responses to tissue hypoxia, such as the up-regulation of IL-1, IL-6,
IFN-γ and TNF-α seen in macrophages, show the link between local metabolic changes and
inflammatory responses. Here the transcription factor HIF-1α may play a central co-ordinating
role in both normal and pathological inflammation by regulating the underlying cellular
metabolism towards anaerobic respiratory pathways and lactate production. Subsequent
effects of inflammatory cytokines on tissue remodelling and perfusion further provide a
mechanism for feedback driving self-sustaining inflammatory microenvironments and,
potentially, where resolution is disrupted, a route to chronic inflammatory disease.
Therefore, as both by-products and mediators of local tissue conditions, metabolites offer a
unique opportunity to gain insight into local and global inflammatory processes.
Metabolomics likewise provides promising opportunities for both the diagnosis of
inflammatory diseases and the study of the underlying processes, which may offer clues as
to how the inflammatory process develops.
5. Acknowledgement
The authors were supported by grants from Arthritis Research UK (grant numbers 18552
and 19325) and the Wellcome Trust (089384/Z/09/Z and 066490/Z/01/A).
6. References
Alberg, A.J. (2002). The influence of cigarette smoking on circulating concentrations of
antioxidant micronutrients, Toxicology, vol.180, No. 2, pp.121-37, ISSN 0300-483X.
Albina, J.E., Henry, W.L., Jr., Mastrofrancesco, B., Martin, B.A., Reichner, J.S. (1995).
Macrophage activation by culture in an anoxic environment, Journal of Immunology,
vol.155, No. 9, pp.4391-6, ISSN 0022-1767.
Bacon, P.A., Church, L.D., Young, S.P. (2005). Endothelial Dysfunction - the Link Between
Inflammation and Atherosclerosis in Rheumatoid Arthritis, Journal of the Indian
Rheumatology Association, vol.13, pp.103-6, ISSN 0971-5045.
Bassit, R.A., Sawada, L.A., Bacurau, R.F.P., Navarro, F., Rosa, L.F.B.P. (2000). The effect of
BCAA supplementation upon the immune response of triathletes, Medicine and
Science in Sports and Exercise, vol.32, No. 7, pp.1214-9, ISSN 0195-9131.
Beyer, C., Schett, G., Gay, S., Distler, O., Distler, J.H. (2009). Hypoxia. Hypoxia in the
pathogenesis of systemic sclerosis, Arthritis Research and Therapy, vol.11, No. 2,
pp.220, ISSN 1478-6362.
Bezabeh, T., Somorjai, R.L., Smith, I.C.P. (2009). MR metabolomics of fecal extracts:
applications in the study of bowel diseases, Magnetic Resonance in Chemistry, vol.47,
pp.S54-S61, ISSN 0749-1581.
Borgerding, M., Klus, H. (2005). Analysis of complex mixtures - Cigarette smoke,
Experimental and Toxicologic Pathology, vol.57, pp.43-73, ISSN 0940-2993.
Brown, R.A., Spina, D., Page, C.P. (2008). Adenosine receptors and asthma, British Journal of
Pharmacology, vol.153 Suppl 1, pp.S446-S56, ISSN 0007-1188.
Chang, X., Wei, C. (2011). Glycolysis and rheumatoid arthritis, International Journal of
Rheumatic Diseases, vol.14, No. 3, pp.217-22, ISSN 1756-185X.
Chitayat, D., Meaghervillemure, K., Mamer, O.A., Ogorman, A., Hoar, D.I., Silver, K.,
Scriver, C.R. (1992). Brain Dysgenesis and Congenital Intracerebral Calcification
Associated with 3-Hydroxyisobutyric Aciduria, Journal of Pediatrics, vol.121, No. 1,
pp.86-9, ISSN 0022-3476.
Cramer, T., Yamanishi, Y., Clausen, B.E., Forster, I., Pawlinski, R., Mackman, N., et al. (2003).
HIF-1alpha is essential for myeloid cell-mediated inflammation, Cell, vol.112, No.
5, pp.645-57, ISSN 0092-8674.
Damyanovich, A.Z., Staples, J.R., Chan, A.D.M., Marshall, K.W. (1999). Comparative study
of normal and osteoarthritic canine synovial fluid using 500 MHz H-1 magnetic
Ibrahim, S.M., Gold, R. (2005). Genomics, proteomics, metabolomics: what is in a word for
multiple sclerosis?, Current Opinion in Neurology, vol.18, No. 3, pp.231-5, ISSN
1350-7540.
Isomaki, P., Punnonen, J. (1997). Pro- and anti-inflammatory cytokines in rheumatoid
arthritis, Annals of Medicine, vol.29, No. 6, pp.499-507, ISSN 0785-3890.
Ivanenkov, Y.A., Balakin, K.V., Tkachenko, S.E. (2008). New Approaches to the Treatment of
Inflammatory Disease Focus on Small-Molecule Inhibitors of Signal Transduction
Pathways, Drugs in R&D, vol.9, No. 6, pp.397-434, ISSN 1174-5886.
Jawed, S., Gaffney, K., Blake, D.R. (1997). Intra-articular pressure profile of the knee joint in
a spectrum of inflammatory arthropathies, Annals of the Rheumatic Diseases, vol.56,
No. 11, pp.686-9, ISSN 0003-4967.
Kallberg, H., Padyukov, L., Plenge, R.M., Ronnelid, J., Gregersen, P.K., van der Helm-van
Mil, A., et al. (2007). Gene-Gene and Gene-Environment Interactions Involving
HLA-DRB1, PTPN22, and Smoking in Two Subsets of Rheumatoid Arthritis, The
American Journal of Human Genetics, vol.80, No. 5, pp.867-75, ISSN 0002-9297.
Kleemann, R., Verschuren, L., van Erk, M.J., Nikolsky, Y., Cnubben, N.H.P., Verheij, E.R., et
al. (2007). Atherosclerosis and liver inflammation induced by increased dietary
cholesterol intake: a combined transcriptomics and metabolomics analysis, Genome
Biology, vol.8, No. 9, pp.R200, ISSN 1474-760X.
Lamers, R.J.A.N., DeGroot, J., Spies-Faber, E.J., Jellema, R.H., Kraus, V.B., Verzijl, N., et al.
(2003). Identification of disease- and nutrient-related metabolic fingerprints in
osteoarthritic guinea pigs, Journal of Nutrition, vol.133, No. 6, pp.1776-80, ISSN
1096-0007.
Lassmann, H. (2003). Hypoxia-like tissue injury as a component of multiple sclerosis lesions,
Journal of the Neurological Sciences, vol.206, No. 2, pp.187-91, ISSN 0022-510X.
Lauridsen, M.B., Bliddal, H., Christensen, R., Danneskiold-Samsoe, B., Bennett, R., Keun, H.,
et al. (2010). (1)H NMR Spectroscopy-Based Interventional Metabolic Phenotyping:
A Cohort Study of Rheumatoid Arthritis Patients, Journal of Proteome Research,
vol.9, No. 9, pp.4545-53, ISSN 1535-3907.
Leo, G.C., Darrow, A.L. (2009). NMR-based metabolomics of urine for the atherosclerotic
mouse model using apolipoprotein-E deficient mice, Magnetic Resonance in
Chemistry, vol.47 Suppl 1, pp.S20-S5, ISSN 1097-458X.
Lin, H.M., Edmunds, S.J., Helsby, N.A., Ferguson, L.R., Rowan, D.D. (2009). Nontargeted
Urinary Metabolite Profiling of a Mouse Model of Crohn's Disease, Journal of
Proteome Research, vol.8, No. 4, pp.2045-57, ISSN 1535-3893.
Loffler, M., Zieker, D., Weinreich, J., Lob, S., Konigsrainer, I., Symons, S., et al. (2011).
Wound fluid lactate concentration: a helpful marker for diagnosing soft-tissue
infection in diabetic foot ulcers? Preliminary findings, Diabetic Medicine, vol.28, No.
2, pp.175-8, ISSN 1464-5491.
Lund-Olesen, K. (1970). Oxygen tension in synovial fluids, Arthritis and Rheumatism, vol.13,
No. 6, pp.769-76, ISSN 0004-3591.
Lutz, N.W., Viola, A., Malikova, I., Confort-Gouny, S., Ranjeva, J.P., Pelletier, J., Cozzone,
P.J. (2007). A branched-chain organic acid linked to multiple sclerosis: First
identification by NMR spectroscopy of CSF, Biochemical and Biophysical Research
Communications, vol.354, No. 1, pp.160-4, ISSN 0006-291X.
MacGregor, A.J., Snieder, H., Rigby, A.S., Koskenvuo, M., Kaprio, J., Aho, K., Silman, A.J.
(2000). Characterizing the quantitative genetic contribution to rheumatoid arthritis
using data from twins, Arthritis and Rheumatism, vol.43, No. 1, pp.30-7, ISSN 0004-
3591.
Mapp, P.I., Grootveld, M.C., Blake, D.R. (1995). Hypoxia, Oxidative Stress and Rheumatoid-
Arthritis, British Medical Bulletin, vol.51, No. 2, pp.419-36, ISSN 0007-1420.
Marchesi, J.R., Holmes, E., Khan, F., Kochhar, S., Scanlan, P., Shanahan, F., Wilson, I.D.,
Wang, Y.L. (2007). Rapid and noninvasive metabonomic characterization of
inflammatory bowel disease, Journal of Proteome Research, vol.6, No. 2, pp.546-51,
ISSN 1535-3893.
Mayr, M., Madhu, B., Xu, Q. (2007). Proteomics and metabolomics combined in
cardiovascular research, Trends in Cardiovascular Medicine, vol.17, No. 2, pp.43-8,
ISSN 1873-2615.
Metsios, G.S., Stavropoulos-Kalinoglou, A., Panoulas, V.F., Sandoo, A., Toms, T.E., Nevill,
A.M., Koutedakis, Y., Kitas, G.D. (2009). Rheumatoid cachexia and cardiovascular
disease, Clinical and Experimental Rheumatology, vol.27, No. 6, pp.985-8, ISSN 0392-
856X.
Montecucco, F., Mach, F. (2009). Common inflammatory mediators orchestrate
pathophysiological processes in rheumatoid arthritis and atherosclerosis,
Rheumatology, vol.48, No. 1, pp.11-22, ISSN 1462-0324.
Munro, R., Capell, H. (1997). Prevalence of low body mass in rheumatoid arthritis:
Association with the acute phase response, Annals of the Rheumatic Diseases, vol.56,
No. 5, pp.326-9, ISSN 0003-4967.
Murata, Y., Ohteki, T., Koyasu, S., Hamuro, J. (2002). IFN-gamma and pro-inflammatory
cytokine production by antigen-presenting cells is dictated by intracellular thiol
redox status regulated by oxygen tension, European Journal of Immunology, vol.32,
No. 10, pp.2866-73, ISSN 0014-2980.
Murdoch, C., Muthana, M., Lewis, C.E. (2005). Hypoxia regulates macrophage functions in
inflammation, Journal of Immunology, vol.175, No. 10, pp.6257-63, ISSN 0022-1767.
Murdoch, T.B., Fu, H., MacFarlane, S., Sydora, B.C., Fedorak, R.N., Slupsky, C.M. (2008).
Urinary metabolic profiles of inflammatory bowel disease in interleukin-10 gene-
deficient mice, Analytical Chemistry, vol.80, No. 14, pp.5524-31, ISSN 0003-2700.
Naughton, D., Whelan, M., Smith, E.C., Williams, R., Blake, D.R., Grootveld, M. (1993). An
Investigation of the Abnormal Metabolic Status of Synovial-Fluid from Patients
with Rheumatoid-Arthritis by High-Field Proton Nuclear-Magnetic-Resonance
Spectroscopy, FEBS Letters, vol.317, No. 1-2, pp.135-8, ISSN 0014-5793.
Naughton, D.P., Haywood, R., Blake, D.R., Edmonds, S., Hawkes, G.E., Grootveld, M.
(1993). A Comparative-Evaluation of the Metabolic Profiles of Normal and
Inflammatory Knee-Joint Synovial-Fluids by High-Resolution Proton Nmr-
Spectroscopy, FEBS Letters, vol.332, No. 3, pp.221-5, ISSN 0014-5793.
Ng, C.T., Biniecka, M., Kennedy, A., McCormick, J., FitzGerald, O., Bresnihan, B., et al.
(2010). Synovial tissue hypoxia and inflammation in vivo, Annals of the Rheumatic
Diseases, vol.69, No. 7, pp.1389-95, ISSN 0003-4967.
Ngumah, Q.C., Buchthal, S.D., Dacheux, R.F. (2006). Longitudinal non-invasive proton NMR
spectroscopy measurement of vitreous lactate in a rabbit model of ocular
hypertension, Experimental Eye Research, vol.83, No. 2, pp.390-400, ISSN 1096-0007.
Nicoli, F., VionDury, J., ConfortGouny, S., Maillet, S., Gastaut, J.L., Cozzone, P.J. (1996).
Cerebrospinal fluid metabolic profiles in multiple sclerosis and degenerative
dementias obtained by high resolution proton magnetic resonance spectroscopy,
Comptes Rendus de l Academie des Sciences Serie Iii-Sciences de la Vie-Life Sciences,
vol.319, No. 7, pp.623-31, ISSN 0764-4469.
Nishimoto, N., Sugino, H., Lee, H.M. (2010). DNA microarray analysis of rheumatoid
arthritis susceptibility genes identified by genome-wide association studies (vol 12,
pg 403, 2010), Arthritis Research and Therapy, vol.12, No. 3, ISSN 1478-6362.
Parkes, H.G., Grootveld, M.C., Henderson, E.B., Farrell, A., Blake, D.R. (1991). Oxidative
Damage to Synovial-Fluid from the Inflamed Rheumatoid Joint Detected by H-1-
Nmr Spectroscopy, Journal of Pharmaceutical and Biomedical Analysis, vol.9, No. 1,
pp.75-82, ISSN 0731-7085.
Renz, H., von Mutius, E., Brandtzaeg, P., Cookson, W.O., Autenrieth, I.B., Haller, D. (2011).
Gene-environment interactions in chronic inflammatory disease, Nature
Immunology, vol.12, No. 4, pp.273-7, ISSN 1529-2908.
Rieckmann, P., Smith, K.J. (2001). Multiple sclerosis: more than inflammation and
demyelination, Trends in Neurosciences, vol.24, No. 8, pp.435-7, ISSN 0166-2236.
Ross, R. (1999). Mechanisms of disease - Atherosclerosis - An inflammatory disease, New
England Journal of Medicine, vol.340, No. 2, pp.115-26, ISSN 0028-4793.
Scannell, G. (1996). Leukocyte responses to hypoxic/ischemic conditions, New Horizons,
vol.4, No. 2, pp.179-83, ISSN 1063-7389.
Schicho, R., Nazyrova, A., Shaykhutdinov, R., Duggan, G., Vogel, H.J., Storr, M. (2010).
Quantitative metabolomic profiling of serum and urine in DSS-induced ulcerative
colitis of mice by (1)H NMR spectroscopy, Journal of Proteome Research, vol.9, No.
12, pp.6265-73, ISSN 1535-3907.
Serhan, C.N. (2009). Systems approach to inflammation resolution: identification of novel
anti-inflammatory and pro-resolving mediators, Journal of Thrombosis and
Haemostasis, vol.7, pp.44-8, ISSN 1538-7933.
Serkova, N.J., Van Rheen, Z., Tobias, M., Pitzer, J.E., Wilkinson, J.E., Stringer, K.A. (2008).
Utility of magnetic resonance imaging and nuclear magnetic resonance-based
metabolomics for quantification of inflammatory lung injury, American Journal Of
Physiology-Lung Cellular And Molecular Physiology, vol.295, No. 1, pp.L152-L61,
ISSN 1040-0605.
Simone, I.L., Federico, F., Trojano, M., Tortorella, C., Liguori, M., Giannini, P., Picciola, E.,
Natile, G., Livrea, P. (1996). High resolution proton MR spectroscopy of
cerebrospinal fluid in MS patients. Comparison with biochemical changes in
demyelinating plaques, Journal of the Neurological Sciences, vol.144, No. 1-2, pp.182-
90, ISSN 0022-510X.
Sinclair, A.B., Viant, M.R., Ball, A.K., Burdon, M.A., Walker, E.A., Stewart, P.M., Rauz, S.,
Young, S.P. (2010). NMR-Based Metabolomic Analysis of Cerebrospinal Fluid and
Serum in Neurological Diseases - A Diagnostic Tool?, NMR in Biomedicine, vol.23,
No. 2, pp.123-32, ISSN 1099-1492.
Sinclair, A.J., Ball, A.K., Burdon, M.A., Clarke, C.E., Stewart, P.M., Curnow, S.J., Rauz, S.
(2008). Exploring the pathogenesis of IIH: An inflammatory perspective, Journal of
Neuroimmunology, vol.201, pp.212-20, ISSN 0165-5728.
Sitkovsky, M., Lukashev, D. (2005). Regulation of immune cells by local tissue oxygen
tension: Hif1 alpha and adenosine receptors, Nature Reviews Immunology, vol.5, No.
9, pp.712-21, ISSN 1474-1733.
Stappenbeck, T.S., Cadwell, K., Patel, K.K., Maloney, N.S., Liu, T.C., Ng, A.C.Y., et al. (2010).
Virus-Plus-Susceptibility Gene Interaction Determines Crohn's Disease Gene
Atg16L1 Phenotypes in Intestine, Cell, vol.141, No. 7, pp.1135-64, ISSN 0092-8674.
Strober, W., Fuss, I., Mannon, P. (2007). The fundamental basis of inflammatory bowel
disease, Journal of Clinical Investigation, vol.117, No. 3, pp.514-21, ISSN 0021-9738.
Summers, G.D., Deighton, C.M., Rennie, M.J., Booth, A.H. (2008). Rheumatoid cachexia: a
clinical perspective, Rheumatology, vol.47, No. 8, pp.1124-31, ISSN 1462-0324.
Summers, G.D., Metsios, G.S., Stavropoulos-Kalinoglou, A., Kitas, G.D. (2010). Rheumatoid
cachexia and cardiovascular disease, Nature Reviews Rheumatology, vol.6, No. 8,
pp.445-51, ISSN 1759-4790.
Trabold, O., Wagner, S., Wicke, C., Scheuenstuhl, H., Hussain, M.Z., Rosen, N., Seremetiev,
A., Becker, H.D., Hunt, T.K. (2003). Lactate and oxygen constitute a fundamental
regulatory mechanism in wound healing, Wound Repair and Regeneration, vol.11,
No. 6, pp.504-9, ISSN 1524-475X.
Trapp, B.D., Bo, L., Mork, S., Chang, A. (1999). Pathogenesis of tissue injury in MS lesions,
Journal of Neuroimmunology, vol.98, No. 1, pp.49-56, ISSN 0165-5728.
Treuhaft, P.S., McCarty, D.J. (1971). Synovial fluid pH, lactate, oxygen and carbon dioxide
partial pressure in various joint diseases, Arthritis and Rheumatism, vol.14, No. 4,
pp.475-84, ISSN 0004-3591.
Vergadi, E., Chang, M.S., Lee, C., Liang, O.D., Liu, X., Fernandez-Gonzalez, A., Mitsialis,
S.A., Kourembanas, S. (2011). Early macrophage recruitment and alternative
activation are critical for the later development of hypoxia-induced pulmonary
hypertension, Circulation, vol.123, No. 18, pp.1986-95, ISSN 0009-7322.
Wang, Z.N., Klipfell, E., Bennett, B.J., Koeth, R., Levison, B.S., Dugar, B., et al. (2011). Gut
flora metabolism of phosphatidylcholine promotes cardiovascular disease, Nature,
vol.472, No. 7341, pp.57-65, ISSN 0028-0836.
Weljie, A.M., Dowlatabadi, R., Miller, B.J., Vogel, H.J., Jirik, F.R. (2007). An inflammatory
arthritis-associated metabolite biomarker pattern revealed by H-1 NMR
Spectroscopy, Journal of Proteome Research, vol.6, No. 9, pp.3456-64, ISSN 1535-3893.
Wheaton, W.W., Chandel, N.S. (2011). Hypoxia. 2. Hypoxia regulates cellular metabolism,
American Journal of Physiology - Cell Physiology, vol.300, No. 3, pp.C385-C93, ISSN
0363-6143.
White, J.R., Harris, R.A., Lee, S.R., Craigon, M.H., Binley, K., Price, T., Beard, G.L., Mundy,
C.R., Naylor, S. (2004). Genetic amplification of the transcriptional response to
hypoxia as a novel means of identifying regulators of angiogenesis, Genomics,
vol.83, No. 1, pp.1-8, ISSN 0888-7543.
Williams, H.R.T., Cox, I.J., Walker, D.G., North, B.V., Patel, V.M., Marshall, S.E., et al. (2009).
Characterization of Inflammatory Bowel Disease With Urinary Metabolic Profiling,
American Journal of Gastroenterology, vol.104, No. 6, pp.1435-44, ISSN 0002-9270.
Young, S.P., Nessim, M., Falciani, F., Trevino, V., Banerjee, S.P., Scott, R.A.H., Murray, P.I.,
Wallace, G.R. (2009). Metabolomic analysis of human vitreous humor differentiates
ocular inflammatory disease, Molecular Vision, vol.15, No. 125-29, pp.1210-7, ISSN
1090-0535.
Young, S.P., Wallace, G.R. (2009). Metabolomic analysis of human disease and its
application to the eye, Journal of Ocular Biology, Disease and Informatics, vol.2, No. 4,
pp.235-42, ISSN 1936-8445.
Zhai, G., Wang-Sattler, R., Hart, D.J., Arden, N.K., Hakim, A.J., Illig, T., Spector, T.D. (2010).
Serum branched-chain amino acid to histidine ratio: a novel metabolomic
biomarker of knee osteoarthritis, Annals of the Rheumatic Diseases, vol.69, No. 6,
pp.1227-31, ISSN 0003-4967.
12
1. Introduction
1.1 Overview
Metabolomics, which is also referred to as metabonomics, metabolic profiling or metabolic
fingerprinting, is the comprehensive quantitative measurement of endogenous metabolites
within a biological system (Fiehn, 2002; Kaddurah-Daouk et al, 2008; Spratlin et al, 2009).
Detection of metabolites is in general carried out in cell extracts, tissue specimens, or various
biological fluids including serum, plasma, urine and cerebrospinal fluid (CSF) by liquid
chromatography mass spectrometry (LC-MS), gas chromatography–mass spectrometry (GC-
MS), capillary electrophoresis–mass spectrometry (CE-MS) or nuclear magnetic resonance
spectroscopy (NMR). Metabolomics captures the status of diverse biochemical pathways in
a particular situation and can define the metabolic status of an organism (Aranibar et al,
2011; DeFeo et al, 2011; Lu et al, 2008; Roux et al, 2011; Soga, 2007; Yuan et al, 2007). In
clinical settings, biomarkers generated from metabolomics have become one of the most
essential diagnostic criteria that can be objectively measured and evaluated as indicators of
normal or pathological states, as well as a tool to assess responses to therapeutic
interventions (Hunter, 2009; Spratlin et al, 2009; van der Greef et al, 2006; Zeisel, 2007). As
we describe in this chapter, novel metabolomic markers, for instance, for cancer therapy,
glucose intolerance, hepatic steatosis, nephrotic and psychiatric disorders, and their
incorporation into clinical decision-making may considerably change future health care.
In order for metabolomics to be successful in clinical settings, it must surpass
conventional methods in reliability and predictive capability, and/or should be more
informative about disease pathogenesis. Utilizing a systems biology approach in
Mitsuo Takahashi1, Toshihiko Ando3, Hiroshi Miyano2, Kenji Nagao1, Yasushi Noguchi1,
Nobuhisa Shimba3,4 and Takeshi Kimura4
1Amino Acids Basic And Applied Research Group, Frontier Research Laboratories,
2011). The critical step is the construction of models from the raw dataset of
transcriptomics, proteomics, and metabolomics. This may be achieved by using different
mathematical techniques ranging from simple Pearson correlations to the use of ordinary
differential equations (Wheelock et al, 2009). Through this modeling, fundamental
concepts in the understanding of biological systems like robustness, modularity,
emergence, etc. are incorporated.
Most studies currently remain focused on local level networks within a set of related genes
or protein expressions (Bapat et al, 2010; Kirouac et al, 2010). Yet a combination of different
levels of networks can be connected to overview the whole system. A change in the gene
regulatory network may have a corresponding effect in the protein–protein interaction
network, the metabolic network, etc., which collectively may manifest changes in the
pathological phenotype. To understand the whole system, it is critical to integrate
knowledge from different datasets. Although some progress has been made in amino acid
metabolism, the integration of different types of datasets is still difficult due to differences in
dynamic range, scales, or analytical errors, particularly in metabolomic analysis (Ishii et al,
2007; Momin et al, 2011; Noguchi et al, 2008). Therefore, focused-metabolomics, with well
managed measurements in terms of accuracy and reproducibility, for lipid, amino acid and
glucose metabolism appears to be a realistic approach to illustrate how the phenotype is
altered when the metabolic network itself is modified through the alteration of endogenous
or environmental factors.
Fig. 1. Effect of cooling on concentration of glutamate in whole blood. (Plot of relative
concentration against time (h) for samples kept at room temperature (RT), in a fridge and in
ice-water.)
Fig. 3. Cooling rate when the blood tubes are set in various conditions and cooling duration
of the blood tube cooler. (Plots of temperature (°C) against time (min) and time (h); series:
room temperature, fridge, ice, ice-water and blood tube cooler.)
2.1.2 Centrifugation
It is desirable to store blood samples in ice-water after collection and to separate the plasma
from the blood cells within a few hours. As mentioned above, since blood cells contain
many amino acids and enzymes, it is important not to contaminate the plasma with
platelets. If contamination occurs, the measured concentrations of some amino acids, such as
glutamate, aspartic acid and taurine, can be high.
2.1.4 Deproteinization
Since plasma contains proteins such as albumin, deproteinization is necessary before amino
acid analysis. For analysis with an amino acid analyzer, plasma is generally mixed with
trichloroacetic acid or sulfosalicylic acid and the precipitate is removed by centrifugation.
Since these reagents are strong acids, it is necessary to analyze the amino acids rapidly or to
store the samples in a -80°C freezer so that some amino acids, such as glutamine, are not
decomposed by acid hydrolysis. For analysis with LC-MS or LC-MS/MS, organic solvents
such as methanol and acetonitrile are useful for deproteinization. In this case, the organic
solvent may influence the derivatization reaction and the separation of amino acids. Since
recovery rates for amino acids depend on the deproteinization procedure, it is desirable to
standardize the procedure. For analysis with LC-MS or LC-MS/MS, recovery rates can be
calculated by adding stable-isotope-labeled amino acids as internal standards before
deproteinization.
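The recovery calculation described in this section can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it assumes that recovery is estimated by comparing the peak area of a stable-isotope-labeled standard spiked before deproteinization with the peak area of the same amount spiked after deproteinization, and that the analyte and its labeled standard respond equally in the mass spectrometer unless a response factor is supplied. The function names and all numerical values are hypothetical.

```python
def recovery_rate(area_spiked_before: float, area_spiked_after: float) -> float:
    """Fraction of a labeled standard recovered through deproteinization.

    area_spiked_before: peak area when the standard is added before
        deproteinization (it experiences any losses).
    area_spiked_after: peak area when the same amount is added after
        deproteinization (reference without losses).
    """
    if area_spiked_after <= 0:
        raise ValueError("reference peak area must be positive")
    return area_spiked_before / area_spiked_after


def concentration_from_internal_standard(
    analyte_area: float,
    standard_area: float,
    standard_conc: float,
    response_factor: float = 1.0,
) -> float:
    """Analyte concentration from the analyte/standard peak-area ratio.

    Because the labeled standard is added before deproteinization, losses
    affect analyte and standard alike and cancel in the ratio.
    """
    if standard_area <= 0 or response_factor <= 0:
        raise ValueError("standard area and response factor must be positive")
    return (analyte_area / standard_area) * standard_conc / response_factor


# Hypothetical example values (arbitrary area units, concentration in µM).
rate = recovery_rate(area_spiked_before=8500.0, area_spiked_after=10000.0)
conc = concentration_from_internal_standard(
    analyte_area=2000.0, standard_area=1000.0, standard_conc=50.0
)
```

Taking the ratio to a co-processed labeled standard is what makes the reported concentration insensitive to moderate differences in recovery between samples.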
analysis. Derivatization methods, based on specific reactions with targeted functional groups,
are major tools in targeted analysis. These methods allow sensitive and selective
quantification of endogenous metabolites with amino and carboxyl groups (Tsukamoto et al,
2006; Yang et al, 2006). One advantage is that a suitable sample preparation can be selected
for all endogenous metabolites sharing the same functional group, because their physical
and chemical properties are similar. This is also very important for accurate quantification,
because sample stability differs for each endogenous metabolite.
The analysis of amino acids with an amino group has a long history. In 1958, earlier
methods for physiological amino acid analysis were supplanted by ion exchange column
chromatography separations on an automated apparatus designed and built by postdoctoral
fellow Darrel H. Spackman at the request of his mentor William H. Stein, and Stanford
Moore at Rockefeller University (Moore et al, 1958). This automated system reduced the
analytical time from a few weeks to a full day and was easy to operate. The
present system is used for the study of inborn errors of amino acid metabolism in clinical
laboratories (Qu et al, 2001).
Recently, pre-column derivatization reagents for amino acid analyses have been developed,
mainly to achieve greater sensitivity and selectivity, and much attention is paid to the
design of derivatization reagents for LC-MS (Yang et al, 2006) and LC-MS/MS (Shimbo et al,
2009a; Shimbo et al, 2009b). These reagents have three notable characteristics (Figure 4).
First, the reagent must have sufficient hydrophobicity to enable the retention of amino acids.
Secondly, it should have a structure that increases ionization efficiency.
Thirdly, it should be designed to provide characteristic and selective cleavage at the
bonding site between the reagent moiety and the amino acid in the collision cell of the
triple-stage quadrupole mass spectrometer. Using precursor ion scanning, endogenous
metabolites with amino groups can be extracted on ion chromatograms, even in crude
biological samples.
The 3-aminopyridyl-N-hydroxysuccinimidyl carbamate (APDS) reagent is known to provide
rapid analysis and on-column separation of amino acids with the same mass-to-charge ratio
(Shimbo et al, 2009b) (Figure 5). This reagent has been applied to the modelling of a
diagnostic index, “AminoIndex technology”, from differences in PFAA profiles between
non-cachectic colorectal/breast/lung cancer patients and healthy individuals (Maeda et al,
2010; Okamoto et al, 2009).
Fig. 4. Typical reaction of amino acids with a derivatization reagent for LC-MS/MS. This
reagent has three notable characteristics: 1) sufficient hydrophobicity (benzene ring); 2)
increased ionization efficiency (quaternary amine); 3) characteristic and selective cleavage
(between the reagent moiety and the amino acid).
Fig. 5. Typical chromatograms of amino acids with the same mass-to-charge ratio, separated
on a column.
The most important point of analysis is algorithm selection. As expressed by the well-known
“no free lunch theorem”, it is impossible to determine the most suitable algorithm a priori,
and the pros and cons of each algorithm are not fixed but depend on the situation.
Therefore, preliminary analysis to determine the most suitable algorithm is
necessary in each case. Univariate analysis can be performed to characterize the behavior of
each metabolite and to select variables, i.e. to reduce the dimensionality of the variable
space, prior to multivariate analysis. It should be noted that metabolome data are often so
strongly interconnected that statistical analysis faces a potential pitfall, so-called
multicollinearity, and that excessive reduction of dimension can sometimes lead to the loss of
the latent network structure of metabolites. Multivariate analytical methods are applicable
for simplification or dimensionality reduction of data, making it easier to visualize the
“metabolite space”, which has a huge number of dimensions (metabolites).
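As an illustration of the univariate screening and of the multicollinearity issue, the following sketch computes a Welch t statistic for each metabolite between two groups and a Pearson correlation between two metabolites. It is a minimal example under stated assumptions: the choice of the Welch t statistic as the univariate test is ours, and the metabolite names and concentrations are hypothetical toy values, not data from the studies cited in this chapter.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical plasma concentrations (µM) for two metabolites in two groups.
controls = {"Val": [230.0, 245.0, 238.0, 251.0], "Leu": [120.0, 128.0, 124.0, 131.0]}
patients = {"Val": [205.0, 214.0, 198.0, 220.0], "Leu": [108.0, 112.0, 103.0, 115.0]}

# Univariate screening: one statistic per metabolite.
t_by_metabolite = {
    name: welch_t(controls[name], patients[name]) for name in controls
}

# Collinearity check: a correlation close to +1 or -1 means the two
# variables carry largely redundant information in a multivariate model.
val_all = controls["Val"] + patients["Val"]
leu_all = controls["Leu"] + patients["Leu"]
val_leu_correlation = pearson(val_all, leu_all)
```

A strongly correlated pair such as the one in this toy data is a reason to inspect the correlation structure before discarding variables, since dropping one member of the pair also removes part of the network information mentioned above.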
Algorithms for multivariate analyses are categorized into two groups, i.e.,
unsupervised methods and supervised methods. Unsupervised methods do not require
objective variables such as subject status, other observed data, etc., while supervised
methods require them for the data set to be analyzed. Examples of multivariate
algorithms are listed in Table 1. Unsupervised learning methods are especially useful for
investigating the latent structure and decreasing the redundancy of data, and they
are therefore sometimes performed in combination. An advantage of unsupervised methods
is that they minimize the loss of information (Maeda et al, 2010). However, whether the
results of unsupervised methods can be interpreted appropriately depends on the
setting of parameters and on the problem to be analyzed.
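A common example of an unsupervised method is principal component analysis (PCA), which projects the data onto the directions of greatest variance without using any objective variable. The sketch below is a hedged, standard-library-only illustration: it finds only the first principal component by power iteration on the sample covariance matrix, assumes that the data are not constant and that the starting vector is not orthogonal to the leading component, and uses hypothetical toy data. It is not claimed to be one of the algorithms listed in Table 1.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) of the first principal component.

    rows: list of samples, each a list of metabolite values.
    The direction is a unit vector; scores are the projections of the
    mean-centered samples onto that direction.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]

    # Sample covariance matrix (p x p).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

    # Power iteration: repeated multiplication by the covariance matrix
    # converges to its leading eigenvector.
    direction = [1.0] * p
    for _ in range(iterations):
        product = [sum(cov[i][j] * direction[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(value * value for value in product))
        direction = [value / norm for value in product]

    scores = [sum(c[j] * direction[j] for j in range(p)) for c in centered]
    return direction, scores


# Hypothetical data: two perfectly correlated "metabolites" (second = 2 x first).
toy_rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, scores = first_principal_component(toy_rows)
```

For these toy data a single component captures all the variance, which is the redundancy reduction described in the text; with real metabolome data several components are usually inspected, and a numerical library would normally replace the hand-written iteration.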
In contrast, supervised methods (Caruana, 2006) make use of the objective
variables. The goal of analysis is therefore to find a model (or classifier) in which the error
between the model’s response and the target traits is minimized. Target
traits can be discrete (e.g., disease vs. healthy, grade of disease) or continuous (e.g.,
measurement value). Supervised methods are also applicable to discover and predict which
metabolites are responsible for the target traits (Maeda et al, 2010; Okamoto et al, 2009;
Zhang et al, 2006). However, the generality of the model obtained from those methods
cannot always be guaranteed because of potential overfitting or bias in the data. Therefore,
validation of the obtained model is necessary to establish its usefulness in practice.
Validation methods fall into two classes. The first is cross-validation, in which
single or multiple samples are iteratively left out from the training data set, the model is
built on the remaining samples, and the left-out samples are used to evaluate the predictive
performance of the model. The other is the use of an external validation data set, which
must not be used for construction of the model. Ideally, the latter, in which a blinded data
set is used, is the most appropriate validation.
However, it is sometimes difficult to perform such a validation test.
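The cross-validation scheme can be sketched as follows for the leave-one-out case. This is a minimal illustration under stated assumptions: the classifier is a simple nearest-centroid rule chosen only for brevity (it is not the model used in the studies cited here), and the sample values and labels are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class mean is closest to the sample."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(column) / len(members) for column in zip(*members)]
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(sample, centroids[label])),
    )


def leave_one_out_accuracy(xs, ys):
    """Leave-one-out cross-validation: each sample is held out once,
    the model is built on the rest, and the held-out sample is predicted."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical one-dimensional discriminant values; 0 = control, 1 = patient.
samples = [[1.0], [1.2], [0.9], [3.0], [3.1], [2.8]]
labels = [0, 0, 0, 1, 1, 1]
loocv_accuracy = leave_one_out_accuracy(samples, labels)
```

Any variable selection or parameter tuning should be repeated inside each iteration of the loop; performing it once on the full data set lets information from the held-out sample leak into the model and gives an optimistic estimate.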
Various metrics are used as criteria for diagnostic performance. When the
objective variable contains only two classes (e.g., controls and patients),
receiver operating characteristic (ROC) curve analysis is the most appropriate criterion for
evaluating the model because this analysis is independent of both the sample size of each
group and the threshold. As threshold-dependent metrics, sensitivity, specificity, positive
predictive value (PPV), negative predictive value (NPV), and accuracy are used. Among them,
sensitivity and specificity are independent of the sample size and ratio of each group,
while the others depend on them. Therefore, to determine a threshold in terms of PPV, NPV,
or accuracy, it is necessary to take into account the “real” distribution of subjects.
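The dependence of PPV on the group ratio can be made concrete with a short Python sketch. The confusion-matrix counts below are hypothetical and chosen only so that sensitivity and specificity are identical in both scenarios; the function names are illustrative.

```python
def threshold_metrics(tp, fp, tn, fn):
    """Threshold-dependent metrics from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV expected in a population with the given disease prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts: 100 patients vs. 100 controls.
balanced = threshold_metrics(tp=80, fp=10, tn=90, fn=20)
# Same classifier behaviour, but 100 patients vs. 1000 controls.
skewed = threshold_metrics(tp=80, fp=100, tn=900, fn=20)

for name in ("sensitivity", "specificity", "ppv"):
    print(name, round(balanced[name], 3), round(skewed[name], 3))
```

With these counts, sensitivity (0.8) and specificity (0.9) are unchanged, while PPV falls from about 0.89 to about 0.44 when controls outnumber patients tenfold. This is why a threshold tuned for PPV on a case-control data set should be re-evaluated with `ppv_at_prevalence` at the prevalence expected in the target population.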
those observed in the controls. The alteration of the PFAA profile in BC differed from that in
CRC, with fewer changes observed. Multiple logistic regression analyses with selected
variables using each data set resulted in an AUC of ROC of 0.860 for CRC and 0.906 for BC,
respectively, on the training data sets. To confirm the performance of the obtained
classifiers, ROC curves were also generated from the split test data. These reproduced similar
diagnostic performance, with AUCs of 0.910 for CRC and 0.865 for BC, respectively.
We then investigated the possibility of early detection of non-small-cell lung cancer
(NSCLC) using a larger sample set (Maeda et al, 2010). A total of 141 NSCLC patients and 423
age-matched, gender-matched healthy controls without apparent cancers were used as the
study data set. Fifteen amino acids (Ser, Gly, Ala, Cit, Val, Met, Ile, Leu, Tyr, Phe,
His, Trp, Orn, Lys, and Arg) were identified whose plasma profiles were associated with
NSCLC. Multiple logistic regression analyses by conditional likelihood methods were
performed with variable selection and leave-one-out cross-validation (LOOCV) using the
study data set.
The resulting conditional logistic regression model included six amino acids: Ala, Val, Ile,
His, Trp, and Orn. The AUC of ROC for the discriminant score was 0.817 in the study data
set. It should be noted that conditional logistic (c-logistic) regression analysis can correct
for the effects of age, gender, and smoking status, which are potential confounding factors
in the discrimination. To verify the robustness of the resulting model, a ROC curve was also
generated using the split test data set, which had not been used to construct the model. The
AUC of ROC for the discriminant score was 0.812 in the test data set, again demonstrating
that the obtained model performed well (Figure 6).
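For readers who want to see how an AUC of ROC is obtained from discriminant scores, the following Python sketch uses the rank-based interpretation of the AUC (Hanley & McNeil, 1982): the probability that a randomly chosen patient scores higher than a randomly chosen control. The scores are hypothetical and are not the values from the NSCLC study.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (patient, control) pairs ranked correctly.
    Tied pairs count as half a correct ranking."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical discriminant scores, not study data.
patients = [0.9, 0.8, 0.55, 0.4]
controls = [0.7, 0.5, 0.3, 0.2, 0.1]
print(roc_auc(patients, controls))
```

Because only the ordering of the scores matters, the result does not depend on any particular decision threshold.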
Fig. 6. ROC curves for discriminant scores for the discrimination of NSCLC (Maeda et al,
2010).
The results indicated that the model could discriminate lung cancer patients regardless of
cancer stage or histological type. Furthermore, the distribution of the discriminant scores
for small-cell lung cancer (SCLC) patients was similar to that for NSCLC patients (Figure 7).
Fig. 7. ROC curves for discriminant scores subgrouped by NSCLC stage and histological
type (Maeda et al, 2010). A. ROC curves for cancer stage of study data set. B. ROC curves for
cancer stage of test data set. C. ROC curves for histological type of study data set. D. ROC
curves for histological type of test data set (including SCLC patients).
Fig. 8. Radar chart of mean values of PFAAs over fibrosis stages. F01: dashed, F2: dot-dash,
F3: dotted, F4: solid. Mean values are scaled in z-score.
Fig. 9. Molar ratio variation over fibrosis stages. The change in distribution among F0-F2, F3
and F4 stages indicated a stage-dependent trend. Circles are 80% regions of each stage,
F0-F2: dashed and square, F3: dotted and triangle, and F4: solid and cross.
Clinical Implementation of Metabolomics 303
Fischer’s ratio, (Val+Leu+Ile)/(Phe+Tyr), was originally created for the diagnosis of hepatic
encephalopathy (Fischer et al, 1975; Fischer et al, 1976) and has been reported to perform
well in assessing chronic hepatitis (Kano et al, 1991). A comparison between Fischer’s
ratio and the classifier was therefore undertaken, in which the index was
generated to have a positive correlation with the degree of fibrosis, showing an inverse
pattern to Fischer’s ratio. The AI_fibrosis gave ROC AUC values larger than Fischer’s
ratio: the ROC AUC values of Fischer’s ratio were 0.87 (0.77-0.96) for advanced fibrosis and
0.91 (0.83-0.99) for cirrhosis, respectively. There is a close relationship between the
AI_fibrosis and Fischer’s ratio, as partially supported by the fact that the ratio Phe/Val
correlated well with the inverse of Fischer’s ratio (r = 0.95) because the BCAAs exhibited
good mutual correlation, as did Tyr and Phe. In summary, these results suggest that the
AI_fibrosis, based on amino acid concentrations, can be applied to evaluate liver fibrosis as
an effective and less invasive surrogate marker for liver biopsy, although further
extended validation studies are still necessary.
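As a worked illustration of the formula above, Fischer’s ratio can be computed directly from plasma amino acid concentrations. The concentrations below are hypothetical values in µmol/L chosen for illustration only, and the function name is an assumption.

```python
def fischer_ratio(conc):
    """Fischer's ratio: branched-chain (Val+Leu+Ile) over aromatic (Phe+Tyr)
    amino acid concentrations, all in the same molar units."""
    bcaa = conc["Val"] + conc["Leu"] + conc["Ile"]
    aaa = conc["Phe"] + conc["Tyr"]
    return bcaa / aaa


# Hypothetical plasma concentrations in umol/L, not patient data.
sample = {"Val": 220.0, "Leu": 120.0, "Ile": 60.0, "Phe": 55.0, "Tyr": 65.0}
print(round(fischer_ratio(sample), 2))
```

Since the ratio is dimensionless, any consistent concentration unit can be used as long as all five amino acids share it.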
3.3 Lipidomics: A review on the use of lipid metabolomics for clinical use
Lipidomics, a type of focused metabolomics, is the comprehensive measurement of a variety
of lipid classes, including free fatty acids (FFAs), triglycerides (TAGs), cholesterol esters
(CEs), lysophosphatidylcholines (LPCs), phosphatidylcholines (PCs),
lysophosphatidylethanolamines (LPEs), diacylglycerols (DAGs), sphingomyelins (SMs), and
ceramides, generally using LC-MS/MS (Bou Khalil et al, 2010; Bucci, 2011; Dennis, 2009). Several
studies have reported the potential of lipidomic analyses for biomarker discovery in
humans in diabetes, non-alcoholic fatty liver disease (NAFLD) (Puri et al, 2009), Alzheimer’s
disease (Han et al, 2011; Valdes-Gonzalez et al, 2011) and cancers (Hilvo et al, 2011; Min et
al, 2011). For instance, Rhee et al reported the LC-MS–based lipid profiling of 189
individuals who developed type 2 diabetes and 189 matched disease-free individuals, with
over 12 years of follow up in the Framingham Heart Study (Rhee et al, 2011). They found
that lipids of lower carbon number and double bond content were associated with an
increased risk of diabetes, whereas lipids of higher carbon number and double bond content
were associated with a decreased risk. In addition, Barr et al demonstrated differential
serum lipidomics in both NAFLD patients and in a mouse model of NAFLD by ultra
performance liquid chromatography-mass spectrometry (UPLC-MS) (Barr et al, 2010).
Multivariate statistical analysis of the UPLC-MS datasets revealed metabolic similarities
between NAFLD mice and human NAFLD patients in relative serum metabolite levels
compared to normal subjects. Lipidomic analysis is also applicable to other biological fluids
such as cerebrospinal fluid (CSF), in addition to plasma and serum (Fonteh et al, 2006). For
instance, phospholipid profiling of CSF by nano-HPLC-MS has been reported in
Alzheimer’s disease (AD) patients, and a statistically significant increase in SMs was
observed in CSF from probable AD patients compared to normal subjects (Han et al, 2011).
physiological stimuli is evoked in many disease-associated cells and tissues, it leads to the
formation of disease-specific enzymatic metabolite profiles quite different from that of the
healthy hosts, and the blood components are significantly influenced as a result.
Blood amino-acid contents are among these components (referred to as the blood amino-acid
profile). It is well known that in the process of feeding, exercising, sleeping, and other
activities, the blood amino-acid profile temporarily fluctuates, but within a few hours
returns to the normal level through intrinsic homeostatic mechanisms. By contrast,
disease-mediated disturbances in local amino-acid metabolism may result in a
disease-specific change in the blood amino-acid profile. Based on these findings and
discussions, we have introduced the AminoIndex® Cancer Screening (AICS) system as a
tool for providing new biomarkers to enable the early detection of various cancers.
5. Future expectations
Although the applications of “AminoIndex technology” are still limited, the foundations for
its use for diagnostic purposes are being laid, as described above. Studies with clinical
data indicate that even with individual variability, the “AminoIndex technology” can be
used to separate certain disease and physiological states. We believe that amino acids are
a convenient metabolomic subset to use as a model for the development of
metabolomics-based diagnostics, and that in the near future, other metabolites could be added
to the current analytical platform as practical issues such as stability are solved. At the
same time, the universality of the findings must be examined, and it should be studied whether
the data set we have obtained for the Japanese population is applicable to other populations.
We believe that there is great potential to use metabolome-based markers in preliminary
diagnostic screening for multiple diseases, in which a single measurement of a metabolomic
subset can lead to multiple diagnoses. One further advantage of the focused metabolomics
multiple metabolite marker approach is that since the biomarkers are generated from a
combination of already measured markers, new markers can be generated against any
measured target parameter. This means that if a focused metabolomic subset data is
obtained at the beginning of a treatment or an experiment, the generation of predictive
markers can be attempted with the outcome of the treatment or experiment as the target
parameter. We believe this would be of great use in tailor-made medicine and nutrition, as it
may be possible to discriminate populations for which certain pharmaceutical or nutritional
interventions would be useful or not.
6. References
Alpert, A. J. (1990) Hydrophilic-interaction chromatography for the separation of peptides,
nucleic acids and other polar compounds. J Chromatogr 499 pp.177-196
Aranibar, N., Borys, M., Mackin, N. A., Ly, V., Abu-Absi, N., Abu-Absi, S., Niemitz, M.,
Schilling, B., Li, Z. J., Brock, B., Russell, R. J., 2nd, Tymiak, A.&Reily, M. D. (2011)
NMR-based metabolomics of mammalian cell and tissue cultures. J Biomol NMR
49:(3-4) pp.195-206
Aspinall, R. J.&Pockros, P. J. (2004) The management of side-effects during therapy for
hepatitis C. Aliment Pharmacol Ther 20:(9) pp.917-29
Bapat, S. A., Krishnan, A., Ghanate, A. D., Kusumbe, A. P.&Kalra, R. S. (2010) Gene
expression: protein interaction systems network modeling identifies
transformation-associated molecules and pathways in ovarian cancer. Cancer Res
70:(12) pp.4809-19
Barr, J., Vazquez-Chantada, M., Alonso, C., Perez-Cormenzana, M., Mayo, R., Galan, A.,
Caballeria, J., Martin-Duce, A., Tran, A., Wagner, C., Luka, Z., Lu, S. C., Castro, A.,
Le Marchand-Brustel, Y., Martinez-Chantar, M. L., Veyrie, N., Clement, K.,
Tordjman, J., Gual, P.&Mato, J. M. (2010) Liquid chromatography-mass
spectrometry-based parallel metabolic profiling of human and mouse model serum
reveals putative biomarkers associated with the progression of nonalcoholic fatty
liver disease. J Proteome Res 9:(9) pp.4501-12
Bollard, M. E., Holmes, E., Lindon, J. C., Mitchell, S. C., Branstetter, D., Zhang,
W.&Nicholson, J. K. (2001) Investigations into biochemical changes due to diurnal
variation and estrus cycle in female rats using high-resolution (1)H NMR
spectroscopy of urine and pattern recognition. Anal Biochem 295:(2) pp.194-202
Bou Khalil, M., Hou, W., Zhou, H., Elisma, F., Swayne, L. A., Blanchard, A. P., Yao, Z.,
Bennett, S. A.&Figeys, D. (2010) Lipidomics era: accomplishments and challenges.
Mass Spectrom Rev 29:(6) pp.877-929
Bucci, M. (2011) Lipidomics: A viral egress. Nat Chem Biol 7:(9) pp.577
Caesar, R., Manieri, M., Kelder, T., Boekschoten, M., Evelo, C., Muller, M., Kooistra, T., Cinti,
S., Kleemann, R.&Drevon, C. A. (2010) A combined transcriptomics and lipidomics
analysis of subcutaneous, epididymal and mesenteric adipose tissue reveals
marked functional differences. PLoS One 5:(7) pp.e11525
Caruana, R.&Niculescu-Mizil, A. (2006) An Empirical Comparison of Supervised Learning
Algorithms. In ICML2006 pp.161-168
Cascino, A., Muscaritoli, M., Cangiano, C., Conversano, L., Laviano, A., Ariemma, S.,
Meguid, M. M.&Rossi Fanelli, F. (1995) Plasma amino acid imbalance in patients
with lung and breast cancer. Anticancer Res 15:(2) pp.507-10
Castera, L., Vergniol, J., Foucher, J., Le Bail, B., Chanteloup, E., Haaser, M., Darriet, M.,
Couzigou, P.&De Ledinghen, V. (2005) Prospective comparison of transient
elastography, Fibrotest, APRI, and liver biopsy for the assessment of fibrosis in
chronic hepatitis C. Gastroenterology 128:(2) pp.343-50
Cynober, L. A. (2004) Metabolic and therapeutic aspects of amino acids in clinical nutrition, 2nd
edn, CRC Press, 0-8493-1382-1, Boca Raton
Dang, C. V. (2010) Rethinking the Warburg effect with Myc micromanaging glutamine
metabolism. Cancer Res 70:(3) pp.859-62
DeFeo, E. M., Wu, C. L., McDougal, W. S.&Cheng, L. L. (2011) A decade in prostate cancer:
from NMR to metabolomics. Nat Rev Urol 8:(6) pp.301-11
Dennis, E. A. (2009) Lipidomics joins the omics evolution. Proc Natl Acad Sci U S A 106:(7)
pp.2089-90
Douvlis, Z. (1999) Interference of amino acid patterns and tissue-specific amino acids
absorption dominance under the influence of tumor cell protein degradation
toxins. Med Hypotheses 53:(5) pp.450-7
Duda, R. O., Hart, P. E.&Stork, D. G. (2001) Pattern Classification, 2nd edn,
Wiley-Interscience, 0-471-05669-3, New York
Dunn, W. B., Broadhurst, D. I., Atherton, H. J., Goodacre, R.&Griffin, J. L. (2011) Systems
level studies of mammalian metabolomes: the roles of mass spectrometry and
nuclear magnetic resonance spectroscopy. Chem Soc Rev 40:(1) pp.387-426
Fiehn, O. (2002) Metabolomics--the link between genotypes and phenotypes. Plant Mol Biol
48:(1-2) pp.155-71
Filho, J. C., Bergstrom, J., Stehle, P.&Furst, P. (1997) Simultaneous measurements of free
amino acid patterns of plasma, muscle and erythrocytes in healthy human subjects.
Clin Nutr 16:(6) pp.299-305
Fischer, J. E., Funovics, J. M., Aguirre, A., James, J. H., Keane, J. M., Wesdorp, R. I.,
Yoshimura, N.&Westman, T. (1975) The role of plasma amino acids in hepatic
encephalopathy. Surgery 78:(3) pp.276-90
Fischer, J. E., Rosen, H. M., Ebeid, A. M., James, J. H., Keane, J. M.&Soeters, P. B. (1976) The
effect of normalization of plasma amino acids on hepatic encephalopathy in man.
Surgery 80:(1) pp.77-91
Fonteh, A. N., Harrington, R. J., Huhmer, A. F., Biringer, R. G., Riggins, J. N.&Harrington,
M. G. (2006) Identification of disease markers in human cerebrospinal fluid using
lipidomic and proteomic methods. Dis Markers 22:(1-2) pp.39-64
Forslund, A. H., Hambraeus, L., van Beurden, H., Holmback, U., El-Khoury, A. E., Hjorth,
G., Olsson, R., Stridsberg, M., Wide, L., Akerfeldt, T., Regan, M.&Young, V. R.
(2000) Inverse relationship between protein intake and plasma free amino acids in
healthy men at physical exercise. Am J Physiol Endocrinol Metab 278:(5) pp.E857-67
Fox, C. J., Hammerman, P. S.&Thompson, C. B. (2005) Fuel feeds function: energy
metabolism and the T-cell response. Nat Rev Immunol 5:(11) pp.844-52
Fried, M. W. (2002) Side effects of therapy of hepatitis C and their management. Hepatology
36:(5 Suppl 1) pp.S237-44
German, J. B., Gillies, L. A., Smilowitz, J. T., Zivkovic, A. M.&Watkins, S. M. (2007)
Lipidomics and lipid profiling in metabolomics. Curr Opin Lipidol 18:(1) pp.66-71
Grimplet, J., Cramer, G. R., Dickerson, J. A., Mathiason, K., Van Hemert, J.&Fennell, A. Y.
(2009) VitisNet: "Omics" integration through grapevine molecular networks. PLoS
One 4:(12) pp.e8365
Gruning, N. M., Lehrach, H.&Ralser, M. (2010) Regulatory crosstalk of the metabolic
network. Trends Biochem Sci 35:(4) pp.220-7
Gu, H., Pan, Z., Xi, B., Asiago, V., Musselman, B.&Raftery, D. (2011) Principal component
directed partial least squares analysis for combining nuclear magnetic resonance
and mass spectrometry data in metabolomics: application to the detection of breast
cancer. Anal Chim Acta 686:(1-2) pp.57-63
Han, X., Rozen, S., Boyle, S. H., Hellegers, C., Cheng, H., Burke, J. R., Welsh-Bohmer, K. A.,
Doraiswamy, P. M.&Kaddurah-Daouk, R. (2011) Metabolomics in early
Alzheimer's disease: identification of altered plasma sphingolipidome using
shotgun lipidomics. PLoS One 6:(7) pp.e21643
Hanley, J. A.&McNeil, B. J. (1982) The meaning and use of the area under a receiver
operating characteristic (ROC) curve. Radiology 143:(1) pp.29-36
Hilvo, M., Denkert, C., Lehtinen, L., Muller, B., Brockmoller, S., Seppanen-Laakso, T.,
Budczies, J., Bucher, E., Yetukuri, L., Castillo, S., Berg, E., Nygren, H., Sysi-Aho, M.,
Griffin, J. L., Fiehn, O., Loibl, S., Richter-Ehrenstein, C., Radke, C., Hyotylainen, T.,
Kallioniemi, O., Iljin, K.&Oresic, M. (2011) Novel theranostic opportunities offered
by characterization of altered membrane lipid metabolism in breast cancer
progression. Cancer Res 71:(9) pp.3236-45
Hulley, S. B., Cummings, R. B., Browner, W. S., Grady, D. G.&Newman, T. B. (2006)
Designing Clinical Research: An Epidemiologic Approach, 3rd edn, Lippincott Williams
& Wilkins, Inc., Philadelphia
Hunter, P. (2009) Reading the metabolic fine print. The application of metabolomics to
diagnostics, drug research and nutrition might be integral to improved health and
personalized medicine. EMBO Rep 10:(1) pp.20-3
Imbert-Bismut, F., Ratziu, V., Pieroni, L., Charlotte, F., Benhamou, Y.&Poynard, T. (2001)
Biochemical markers of liver fibrosis in patients with hepatitis C virus infection: a
prospective study. Lancet 357:(9262) pp.1069-75
Ishii, N., Nakahigashi, K., Baba, T., Robert, M., Soga, T., Kanai, A., Hirasawa, T., Naba, M.,
Hirai, K., Hoque, A., Ho, P. Y., Kakazu, Y., Sugawara, K., Igarashi, S., Harada, S.,
Masuda, T., Sugiyama, N., Togashi, T., Hasegawa, M., Takai, Y., Yugi, K., Arakawa,
K., Iwata, N., Toya, Y., Nakayama, Y., Nishioka, T., Shimizu, K., Mori, H.&Tomita,
M. (2007) Multiple high-throughput analyses monitor the response of E. coli to
perturbations. Science 316:(5824) pp.593-7
Jenkins, H., Hardy, N., Beckmann, M., Draper, J., Smith, A. R., Taylor, J., Fiehn, O.,
Goodacre, R., Bino, R. J., Hall, R., Kopka, J., Lane, G. A., Lange, B. M., Liu, J. R.,
Mendes, P., Nikolau, B. J., Oliver, S. G., Paton, N. W., Rhee, S., Roessner-Tunali, U.,
Saito, K., Smedsgaard, J., Sumner, L. W., Wang, T., Walsh, S., Wurtele, E. S.&Kell,
D. B. (2004) A proposed framework for the description of plant metabolomics
experiments and their results. Nat Biotechnol 22:(12) pp.1601-6
Kaddurah-Daouk, R., Kristal, B. S.&Weinshilboum, R. M. (2008) Metabolomics: a global
biochemical approach to drug response and disease. Annu Rev Pharmacol Toxicol 48
pp.653-83
Kano, T., Nagaki, M., Takahashi, T., Ohnishi, H., Saitoh, K., Kimura, K.&Muto, Y. (1991)
Plasma free amino acid pattern in chronic hepatitis as a sensitive and prognostic
index. Gastroenterol Jpn 26:(3) pp.344-9
Kell, D. B. (2002) Metabolomics and machine learning: explanatory analysis of complex
metabolome data using genetic programming to produce simple, robust rules. Mol
Biol Rep 29:(1-2) pp.237-41
Kell, D. B. (2006) Systems biology, metabolic modelling and metabolomics in drug discovery
and development. Drug Discov Today 11:(23-24) pp.1085-92
Kim, Y., Koo, I., Jung, B. H., Chung, B. C.&Lee, D. (2010) Multivariate classification of urine
metabolome profiles for breast cancer diagnosis. BMC Bioinformatics 11 Suppl 2
pp.S4
Kimura, T., Noguchi, Y., Shikata, N.&Takahashi, M. (2009) Plasma amino acid analysis for
diagnosis and amino acid-based metabolic networks. Curr Opin Clin Nutr Metab
Care 12:(1) pp.49-53
Kirouac, D. C., Ito, C., Csaszar, E., Roch, A., Yu, M., Sykes, E. A., Bader, G. D.&Zandstra, P.
W. (2010) Dynamic interaction networks in a hierarchically organized tissue. Mol
Syst Biol 6 pp.417
Lai, H. S., Lee, J. C., Lee, P. H., Wang, S. T.&Chen, W. J. (2005) Plasma free amino acid
profile in cancer patients. Semin Cancer Biol 15:(4) pp.267-76
Lee, J. C., Chen, M. J., Chang, C. H., Tiai, Y. F., Lin, P. W., Lai, H. S.&Wang, S. T. (2003)
Plasma amino acid levels in patients with colorectal cancers and liver cirrhosis with
hepatocellular carcinoma. Hepatogastroenterology 50:(53) pp.1269-73
Lin, L., Huang, Z., Gao, Y., Yan, X., Xing, J.&Hang, W. (2011a) LC-MS based serum
metabonomic analysis for renal cell carcinoma diagnosis, staging, and biomarker
discovery. J Proteome Res 10:(3) pp.1396-405
Lin, Z. H., Xin, Y. N., Dong, Q. J., Wang, Q., Jiang, X. J., Zhan, S. H., Sun, Y.&Xuan, S. Y.
(2011b) Performance of the aspartate aminotransferase-to-platelet ratio index for
the staging of hepatitis C-related fibrosis: an updated meta-analysis. Hepatology
53:(3) pp.726-36
Lu, X., Zhao, X., Bai, C., Zhao, C., Lu, G.&Xu, G. (2008) LC-MS-based metabonomics
analysis. J Chromatogr B Analyt Technol Biomed Life Sci 866:(1-2) pp.64-76
Maeda, J., Higashiyama, M., Imaizumi, A., Nakayama, T., Yamamoto, H., Daimon, T.,
Yamakado, M., Imamura, F.&Kodama, K. (2010) Possibility of multivariate function
composed of plasma amino acid profiles as a novel screening index for non-small
cell lung cancer: a case control study. BMC Cancer 10:(1) pp.690
Mantovani, A., Allavena, P., Sica, A.&Balkwill, F. (2008) Cancer-related inflammation.
Nature 454:(7203) pp.436-44
Matthew, E. M., Hart, L. S., Astrinidis, A., Navaraj, A., Dolloff, N. G., Dicker, D. T., Henske,
E. P.&El-Deiry, W. S. (2009) The p53 target Plk2 interacts with TSC proteins
impacting mTOR signaling, tumor growth and chemosensitivity under hypoxic
conditions. Cell Cycle 8:(24) pp.4168-75
Medina, M. A., Marquez, J.&Nunez de Castro, I. (1992) Interchange of amino acids between
tumor and host. Biochem Med Metab Biol 48:(1) pp.1-7
Mercier, S., Breuille, D., Mosoni, L., Obled, C.&Patureau Mirand, P. (2002) Chronic
inflammation alters protein metabolism in several organs of adult rats. J Nutr
132:(7) pp.1921-8
Metavir. (1994) Intraobserver and interobserver variations in liver biopsy interpretation in
patients with chronic hepatitis C. The French METAVIR Cooperative Study Group.
Hepatology 20:(1 Pt 1) pp.15-20
Min, H. K., Lim, S., Chung, B. C.&Moon, M. H. (2011) Shotgun lipidomics for candidate
biomarkers of urinary phospholipids in prostate cancer. Anal Bioanal Chem 399:(2)
pp.823-30
Momin, A. A., Park, H., Portz, B. J., Haynes, C. A., Shaner, R. L., Kelly, S. L., Jordan, I.
K.&Merrill, A. H., Jr. (2011) A method for visualization of "omic" datasets for
sphingolipid metabolism to predict potentially interesting differences. J Lipid Res
52:(6) pp.1073-83
Montoliu, I., Martin, F. P., Collino, S., Rezzi, S.&Kochhar, S. (2009) Multivariate modeling
strategy for intercompartmental analysis of tissue and plasma 1H NMR
spectrotypes. J Proteome Res 8:(5) pp.2397-406
Moore, S., Spackman, D. H.&Stein, W. H. (1958) Automatic recording apparatus for use in
the chromatography of amino acids. Fed Proc 17:(4) pp.1107-15
Naini, A. B., Dickerson, J. W.&Brown, M. M. (1988) Preoperative and postoperative levels of
plasma protein and amino acid in esophageal and lung cancer patients. Cancer
62:(2) pp.355-60
Noguchi, Y., Shikata, N., Furuhata, Y., Kimura, T.&Takahashi, M. (2008) Characterization of
dietary protein-dependent amino acid metabolism by linking free amino acids with
transcriptional profiles through analysis of correlation. Physiol Genomics 34:(3)
pp.315-26
Norton, J. A., Gorschboth, C. M., Wesley, R. A., Burt, M. E.&Brennan, M. F. (1985) Fasting
plasma amino acid levels in cancer patients. Cancer 56:(5) pp.1181-6
Okamoto, N., Miyagi, Y., Chiba, A., Akaike, M., Shiozawa, M., Imaizumi, A.,
Yamamoto, H., Ando, T., Yamakado, M.&Tochikubo, O. (2009) Diagnostic
modeling with differences in plasma amino acid profiles between non-cachectic
colorectal/breast cancer patients and healthy individuals. Int J Med Med Sci
1:(1) pp.1-8
Piraud, M., Vianey-Saban, C., Petritis, K., Elfakir, C., Steghens, J. P., Morla, A.&Bouchu, D.
(2003) ESI-MS/MS analysis of underivatised amino acids: a new tool for the
diagnosis of inherited disorders of amino acid metabolism. Fragmentation study of
79 molecules of biological interest in positive and negative ionisation mode. Rapid
Commun Mass Spectrom 17:(12) pp.1297-311
Plumb, R. S., Stumpf, C. L., Granger, J. H., Castro-Perez, J., Haselden, J. N.&Dear, G. J. (2003)
Use of liquid chromatography/time-of-flight mass spectrometry and multivariate
statistical analysis shows promise for the detection of drug metabolites in biological
fluids. Rapid Commun Mass Spectrom 17:(23) pp.2632-8
Poynard, T., Yuen, M. F., Ratziu, V.&Lai, C. L. (2003) Viral hepatitis C. Lancet 362:(9401)
pp.2095-100
Proenza, A. M., Oliver, J., Palou, A.&Roca, P. (2003) Breast and lung cancer are
associated with a decrease in blood cell amino acid content. J Nutr Biochem
14:(3) pp.133-8
Puri, P., Wiest, M. M., Cheung, O., Mirshahi, F., Sargeant, C., Min, H. K., Contos, M. J.,
Sterling, R. K., Fuchs, M., Zhou, H., Watkins, S. M.&Sanyal, A. J. (2009) The
plasma lipidomic signature of nonalcoholic steatohepatitis. Hepatology 50:(6)
pp.1827-38
Qu, Y., Slocum, R. H., Fu, J., Rasmussen, W. E., Rector, H. D., Miller, J. B.&Coldwell, J. G.
(2001) Quantitative amino acid analysis using a Beckman system gold HPLC
126AA analyzer. Clin Chim Acta 312:(1-2) pp.153-62
Quinones, M. P.&Kaddurah-Daouk, R. (2009) Metabolomics tools for identifying biomarkers
for neuropsychiatric diseases. Neurobiol Dis 35:(2) pp.165-76
Rhee, E. P., Cheng, S., Larson, M. G., Walford, G. A., Lewis, G. D., McCabe, E., Yang, E.,
Farrell, L., Fox, C. S., O'Donnell, C. J., Carr, S. A., Vasan, R. S., Florez, J. C., Clish, C.
B., Wang, T. J.&Gerszten, R. E. (2011) Lipid profiling identifies a triacylglycerol
signature of insulin resistance and improves diabetes prediction in humans. J Clin
Invest 121:(4) pp.1402-11
Righi, V., Durante, C., Cocchi, M., Calabrese, C., Di Febo, G., Lecce, F., Pisi, A., Tugnoli, V.,
Mucci, A.&Schenetti, L. (2009) Discrimination of healthy and neoplastic human
colon tissues by ex vivo HR-MAS NMR spectroscopy and chemometric analyses. J
Proteome Res 8:(4) pp.1859-69
Rossi Fanelli, F., Cangiano, C., Muscaritoli, M., Conversano, L., Torelli, G. F.&Cascino, A.
(1995) Tumor-induced changes in host metabolism: a possible marker of neoplastic
disease. Nutrition 11:(5 Suppl) pp.595-600
Roux, A., Lison, D., Junot, C.&Heilier, J. F. (2011) Applications of liquid chromatography
coupled to mass spectrometry-based metabolomics in clinical chemistry and
toxicology: A review. Clin Biochem 44:(1) pp.119-35
Schetter, A. J., Heegaard, N. H.&Harris, C. C. (2010) Inflammation and cancer:
interweaving microRNA, free radical, cytokine and p53 pathways.
Carcinogenesis 31:(1) pp.37-49
Serkova, N. J., Standiford, T. J.&Stringer, K. A. (2011) The Emerging Field of Quantitative
Blood Metabolomics for Biomarker Discovery in Critical Illnesses. Am J Respir Crit
Care Med
Shiffman, M. L. (2004) Management of patients with chronic hepatitis C virus infection and
previous nonresponse. Rev Gastroenterol Disord 4 Suppl 1 pp.S22-30
Shimbo, K., Kubo, S., Harada, Y., Oonuki, T., Yokokura, T., Yoshida, H., Amao, M.,
Nakamura, M., Kageyama, N., Yamazaki, J., Ozawa, S. I., Hirayama, K., Ando, T.,
Miura, J.&Miyano, H. (2009a) Automated precolumn derivatization system for
analyzing physiological amino acids by liquid chromatography/mass
spectrometry. Biomed Chromatogr 24:(7) pp.683-91
Shimbo, K., Oonuki, T., Yahashi, A., Hirayama, K.&Miyano, H. (2009b) Precolumn
derivatization reagents for high-speed analysis of amines and amino acids in
biological fluid using liquid chromatography/electrospray ionization tandem mass
spectrometry. Rapid Commun Mass Spectrom 23:(10) pp.1483-92
Shimbo, K., Yahashi, A., Hirayama, K., Nakazawa, M.&Miyano, H. (2009c)
Multifunctional and highly sensitive precolumn reagents for amino acids in
liquid chromatography/tandem mass spectrometry. Anal Chem 81:(13) pp.5172-9
Soga, T. (2007) Capillary electrophoresis-mass spectrometry for metabolomics. Methods Mol
Biol 358 pp.129-37
Spagou, K., Wilson, I. D., Masson, P., Theodoridis, G., Raikos, N., Coen, M., Holmes, E.,
Lindon, J. C., Plumb, R. S., Nicholson, J. K.&Want, E. J. (2011) HILIC-UPLC-MS for
exploratory urinary metabolic profiling in toxicological studies. Anal Chem 83:(1)
pp.382-90
Spratlin, J. L., Serkova, N. J.&Eckhardt, S. G. (2009) Clinical applications of metabolomics in
oncology: a review. Clin Cancer Res 15:(2) pp.431-40
Steuer, R. (2006) Review: on the analysis and interpretation of correlations in metabolomic
data. Brief Bioinform 7:(2) pp.151-8
Sugimoto, M., Wong, D. T., Hirayama, A., Soga, T.&Tomita, M. (2010) Capillary
electrophoresis mass spectrometry-based saliva metabolomics identified oral,
breast and pancreatic cancer-specific profiles. Metabolomics 6:(1) pp.78-95
Taylor, N. S., Weber, R. J., White, T. A.&Viant, M. R. (2010) Discriminating between different
acute chemical toxicities via changes in the daphnid metabolome. Toxicol Sci 118:(1)
pp.307-17
Taylor, R.&Singhal, M. (2009) Biological Network Inference and analysis using SEBINI and
CABIN. Methods Mol Biol 541 pp.551-76
Thysell, E., Surowiec, I., Hornberg, E., Crnalic, S., Widmark, A., Johansson, A. I., Stattin, P.,
Bergh, A., Moritz, T., Antti, H.&Wikstrom, P. (2010) Metabolomic characterization
Zhang, P. C.&Pang, C. P. (1992) Plasma amino acid patterns in cancer. Clin Chem 38:(6)
pp.1198-9
Zhang, Q., Takahashi, M., Noguchi, Y., Sugimoto, T., Kimura, T., Okumura, A., Ishikawa,
T.&Kakumu, S. (2006) Plasma amino acid profiles applied for diagnosis of
advanced liver fibrosis in patients with chronic hepatitis C infection. Hepatol Res
34:(3) pp.170-7
Part 6
Improving Analytics
13
1. Introduction
The development and optimisation of genomic, transcriptomic and proteomic technologies
have significantly contributed to the assessment of biological systems and increased our
understanding of gene function and regulation (Kitano 2002, Brown et al., 1999, Pandey and
Mann 2000). In addition, metabolic fingerprinting, or metabolomics, complements these
approaches by measuring low molecular weight chemicals in biological samples (Nicholson
and Lindon, 2008). The elucidation of the links between genetic regulation, kinetic activities
of enzymes and metabolic reactions is key to understanding homeostatic regulation of living
organisms and the effects of food, diurnal variations, disease and drugs (Nicholson et al.,
2003, van der Greef et al., 2003, Plumb et al., 2003). Mapping of these various interactions is
likely to result in applications in disciplines such as agriculture and medicine (Lee et al.,
2007, Borodina and Nielson 2007, Wishart 200, Ducruix et al., 2006). Several analytic tools
have been applied to profile the metabolome.
LC-MS is a more recent introduction to the field of metabolomics than the
more established techniques of GC-MS and NMR. LC-MS can be used for the analysis of
metabolites with a wider range of molecular weights than those detectable by GC-MS, including
polar and non-volatile compounds. With LC-MS, many more chromatographic phases, and
thus separation techniques, are available than with GC-MS (Dunn, 2008).
Targeted metabolomic studies allow the identification and quantification of defined sets of
metabolites and are performed using triple quadrupole mass spectrometers, which provide
sensitivity and selectivity. Non-targeted global metabolomic studies are carried out on
instruments with good mass accuracy such as time-of-flight and Orbitrap mass analysers. In
3. Sample extraction
Plasma samples (200 μl) were extracted with 4 volumes of acetonitrile, using a 96-well
protein precipitation plate (Whatman, Maidstone, UK). The plate was vortexed for 1 min
before a vacuum was applied. The filtered samples were collected in a 96 deep-well plate,
and the plasma extracts were pooled and aliquoted for further analysis.
Improvement in the Number of Analytic Features Detected by Non-Targeted
Metabolomic Analysis: Influence of the Chromatographic System and the Ionization Technique 319
4. Data processing
Sample features were extracted with the molecular feature extractor (MassHunter
Workstation Software, version B.01.03). Data were processed using the following
conditions: restrict retention time to 0.20 - 8.5 min, restrict m/z to 100-800, absolute height
threshold: 25000 or 2500, mass tolerance: 0.05, peaks with height: > 100 counts, isotope
grouping: peak spacing tolerance: 0.0025 m/z, plus 7.0 ppm, isotope model: common
organic model, mass filters: filter mass list: 20 ppm.
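The retention time, m/z and height limits listed above can be pictured as a simple filter over a feature list. The following is a minimal sketch only: the `Feature` class, the `keep_feature` function and the three example features are hypothetical, the bounds are assumed to be inclusive, and the code does not reproduce the MassHunter molecular feature extractor.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    rt_min: float   # retention time in minutes
    mz: float       # mass-to-charge ratio
    height: float   # peak height in counts

# Windows quoted in the text; treating them as inclusive is an assumption.
RT_WINDOW = (0.20, 8.5)
MZ_WINDOW = (100.0, 800.0)

def keep_feature(f: Feature, height_threshold: float = 25000.0) -> bool:
    """Return True if the feature passes the RT, m/z and absolute height limits."""
    in_rt = RT_WINDOW[0] <= f.rt_min <= RT_WINDOW[1]
    in_mz = MZ_WINDOW[0] <= f.mz <= MZ_WINDOW[1]
    return in_rt and in_mz and f.height >= height_threshold

# Hypothetical features, for illustration only.
features = [
    Feature(rt_min=0.65, mz=132.07, height=40000.0),  # passes all limits
    Feature(rt_min=9.10, mz=496.34, height=90000.0),  # outside the RT window
    Feature(rt_min=3.20, mz=204.10, height=3000.0),   # below the 25000 threshold
]
kept = [f for f in features if keep_feature(f)]
kept_low = [f for f in features if keep_feature(f, height_threshold=2500.0)]
```

Lowering the threshold from 25000 to 2500 admits the third feature, which mirrors the way the lower threshold increases the total feature count later in the chapter.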
The list of features consisting of retention times (RT) and molecular masses was then
analysed using GeneSpring MS Analysis Platform (v1.2, Agilent Technologies, Inc., Santa
Clara, CA) where they were aligned and normalized.
Data were then imported into Excel spreadsheets and the mean, SD and CV of all features
were calculated.
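As an illustration of the reproducibility statistic used throughout the tables, the sketch below computes the CV of each feature across technical replicates and counts those below 25%. It assumes the sample standard deviation (n - 1 denominator); the text does not state which form was used in Excel, and the feature names and intensities are hypothetical.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation (%) = sample SD / mean * 100."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption, see lead-in
    return 100.0 * sd / mean

def summarise(features, cv_limit=25.0):
    """Return (total features, features below cv_limit, % below cv_limit)."""
    total = len(features)
    reproducible = sum(1 for v in features.values() if cv_percent(v) < cv_limit)
    return total, reproducible, 100.0 * reproducible / total

# Hypothetical triplicate intensities for four features.
intensities = {
    "feature_A": [100.0, 110.0, 90.0],
    "feature_B": [100.0, 200.0, 300.0],
    "feature_C": [50.0, 55.0, 45.0],
    "feature_D": [10.0, 30.0, 80.0],
}
total, reproducible, pct = summarise(intensities)
```

With these made-up values, two of the four features fall below the 25% CV limit.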
Method | System pressure | Total no. of features | No. of features with <25% CV | % of features with <25% CV | % change in features with <25% CV relative to conventional system
a Chromatography performed on a reverse phase Waters Acquity T3 column with a 7.5 min 0.1% formic
acid: acetonitrile gradient. Mass spectrometry analysis performed on 6530 QTOF in ESI mode. Data were
extracted using MassHunter Qualitative software package and GeneSpringMS and then exported to
Excel where statistics were performed.
Table 1. Effect of flow rate on the number and reproducibility of features present in technical
replicates of human plasma extracts using the conventional 1200 LC and novel 1290 Infinity LC
in ESI mode a (n=3).
Under similar conditions, the novel 1290 LC system generated 16% more reproducible
features than the conventional system without any change in analytic conditions.
Increasing the flow rate to 0.6 ml/min with the same gradient increased the number of
reproducible features to 1149, a 44% improvement over the 1200 LC system. When the flow
rate was increased from 0.4 to 0.6 ml/min, many additional features were detected that
had not been observed following separation on the 1200 system. Careful examination of
the features showed that they were mainly ions that had not eluted from the column at
0.4 ml/min. Further increases in flow rate yielded fewer reproducible features. The
pressure in the system with the column installed was significantly lower in the 1290
system, with a back pressure of 132 bar at 0.4 ml/min on the 1290 versus 450 bar on the
1200. At 1 ml/min, the pressure on the 1290 system was only 605 bar. It is possible that
the different composition of the pistons and their independent operation, together with
the novel mixing technology used in the 1290 Infinity LC system, explain the decreased
pressures compared with the conventional 1200 (data not shown).
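The percentage changes quoted relative to the conventional system follow the usual relative-change formula. The sketch below shows the arithmetic; the function names are illustrative, and the reference count it back-calculates is an inference from the two figures in the text (1149 features, 44% improvement), not a value reported by the authors.

```python
def percent_change(new: float, reference: float) -> float:
    """Relative change of `new` versus `reference`, in percent."""
    return 100.0 * (new - reference) / reference

def implied_reference(new: float, pct_change: float) -> float:
    """Reference value implied by a new value and its stated percent change."""
    return new / (1.0 + pct_change / 100.0)

# 1149 reproducible features at 0.6 ml/min, stated as a 44% improvement.
# The result is an inferred figure only (roughly 800 features).
ref_1200 = implied_reference(1149, 44.0)
check = percent_change(1149, ref_1200)  # recovers the stated 44%
```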
The length of the gradient was then shortened to 5.5 minutes and to 3.5 minutes, but
this resulted in a significant decrease in reproducibility (Table 2). In fact, we noted
that the isocratic segment of the gradient had to be extended in order to avoid
carry-over from previous samples, which defeated the purpose of a shorter analytic run
(data not shown).
Method b | Total no. of features | No. of features with <25% CV | % of features with <25% CV | % change in features with <25% CV relative to conventional system
a Chromatography performed on a reverse phase Waters Acquity T3 column with a 7.5 min 0.1% formic
acid: acetonitrile gradient. Mass spectrometry analysis performed on 6530 QTOF in ESI mode. Data were
extracted using MassHunter Qualitative software package and GeneSpringMS and then exported to
Excel where statistics were performed.
b Each gradient was preceded by 0.5 min of 100% A (0.1% formic acid in water) and followed by 2 min of
100% B (0.1% formic acid in acetonitrile).
Table 2. The effect of gradient duration on the number and reproducibility of features
present in technical replicates of human plasma extracts using conventional 1200 and novel
1290 Infinity LC systems in ESI mode a (n=3).
Following triplicate analysis of human plasma on the conventional 1200 LC system using
the 150 mm column with our previously described gradient, there was no significant
improvement in total or reproducible number of features when compared with the 100 mm
column regardless of the flow rate (data not shown). Our conclusion for the data from the
ESI source was that a flow rate of 0.6 ml/min was optimal with the 100 mm column with the
original 7.5 minute gradient.
We then proceeded to evaluate the effect of the Jet Stream technology on the number of
features detected and their repeatability. The temperature of the heated nitrogen sheath
gas was raised from 200°C to 400°C in increments of 50°C and evaluated at each step. At
0.6 ml/min with a sheath gas temperature of 200°C, both the total and reproducible
features were more than doubled when compared to the equivalent result with the ESI
source. Overall, 50% of features showed less than 25% CV over triplicate analysis
(Table 3). This represents a 173% increase in reproducible features when compared with
the conventional 1200 LC system and the ESI source.
Method | Total no. of features | No. of features with <25% CV | % of features with <25% CV | % change in features with <25% CV relative to conventional system
a Chromatography performed on a reverse phase Waters Acquity T3 column with a 7.5 min 0.1% formic
acid: acetonitrile gradient. Mass spectrometry analysis performed on 6530 QTOF in ESI mode. Data were
extracted using MassHunter Qualitative software package and GeneSpringMS and then exported to
Excel where statistics were performed.
Table 3. The effect of Jet Stream (JS) sheath gas temperature and flow rate on the number and
reproducibility of features present in technical replicates of human plasma extracts using the
novel 1290 Infinity LC system coupled to a 6530 QTOF using Jet Stream technology a (n=3).
The increased number of features ionized by the Jet Stream technology when compared
with ESI is illustrated in Figure 1, which shows overlaid total ion chromatograms. In
contrast to our results with the ESI source, increasing the flow rate to 0.8 ml/min further
increased the number of reproducible features when compared with 0.6 ml/min (Table 3).
Increasing the temperature of the sheath gas by increments of 50°C at both 0.6 and 0.8
ml/min gradually increased the number of features. At 400°C, a striking 5095 features were
detected; 3310 of which showed less than 25% CV.
We were concerned that this significant increase in features with increasing temperature
could be the result of thermal degradation of ions. To address this, a Venn diagram derived
from GeneSpring MS was used to show the ions present at both temperatures and those
present at 200°C or 400°C only (Figure 2).
The heat plot in Figure 3 demonstrates that all of the 1491 ions present only at 400°C
were weak, whereas more than half of the 888 ions present only at 200°C were stronger in
intensity, suggesting that the increase in temperature fragmented the ions and that
thermal degradation occurred. Further examination of the features at 200°C in Jet Stream
and ESI at 0.6 ml/min showed that >2000 features were specific to Jet Stream alone. By
analysing the most intense features, we found that they were not split features due to
errors in the automatic processing software. We were concerned that some of these
features might be detectable in ESI at a lower threshold. Therefore, we lowered the
threshold to 2500 in both Jet Stream and ESI and found that 10599 and 15424 total
features and 2792 and 4163 reproducible features were detected respectively in the two
systems, with 26% reproducibility obtained using both systems. Comparison of these
features showed that a proportion was present only in Jet Stream and the remainder were
detectable in ESI at low intensity and were not reproducible.
Fig. 2. Comparison of features observed with Jet Stream and ESI. Venn diagram showing
features found in human plasma extracts using LC 1290 coupled to 6530 QTOF with
Jet Stream technology using a sheath gas temperature of either 200°C or 400°C at 0.8 ml/min.
888 features were present only at 200°C, 1491 were present only at 400°C and 3438 were
present at both temperatures.
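The three regions of such a Venn diagram can be obtained by matching features between two lists within m/z and retention time tolerances. The sketch below is one simple way to do this and is not the GeneSpring MS alignment algorithm; the tolerances, the greedy first-match rule and the example (m/z, RT) pairs are all assumptions made for illustration.

```python
def same_feature(a, b, mz_tol=0.01, rt_tol=0.1):
    """Two (mz, rt_min) tuples match if both values agree within tolerance."""
    return abs(a[0] - b[0]) <= mz_tol and abs(a[1] - b[1]) <= rt_tol

def venn_partition(list_a, list_b, mz_tol=0.01, rt_tol=0.1):
    """Split features into (only in A, in both, only in B) by greedy matching."""
    unmatched_b = list(list_b)
    only_a, both = [], []
    for fa in list_a:
        match = next(
            (fb for fb in unmatched_b if same_feature(fa, fb, mz_tol, rt_tol)),
            None,
        )
        if match is None:
            only_a.append(fa)
        else:
            both.append(fa)
            unmatched_b.remove(match)  # each B feature is matched at most once
    return only_a, both, unmatched_b

# Hypothetical (m/z, RT in min) features at the two sheath gas temperatures.
at_200 = [(132.07, 0.60), (166.08, 1.20), (204.10, 2.50)]
at_400 = [(132.07, 0.62), (204.10, 2.48), (332.33, 6.10), (496.34, 6.40)]
only_200, both, only_400 = venn_partition(at_200, at_400)
```

The lengths of the three returned lists correspond to the three counts reported in a two-set Venn diagram.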
Fig. 3. Abundance of features observed exclusively at 200°C and 400°C. Heat plot depicting
the abundance of features present in human plasma extracts using LC 1290 coupled to 6530
QTOF with Jet Stream technology at either 200°C or 400°C at 0.8 ml/min. Each line
represents one feature found exclusively at either temperature, with red representing those
features present at high intensity and blue those present at low intensity.
In summary, the Jet Stream technology increased the overall number of features when
compared with the ESI but thermal degradation occurred above 200°C, which is therefore
the optimal temperature to use under the conditions studied.
Our data demonstrate the advantage of the new LC system which allowed operation at
higher flow rates with low back pressure and very reproducible analysis. The increase in
flow rate resulted in a predictable increase in peak capacity (Figure 4).
Fig. 4. Peak capacity versus flow rate determined for creatine (m/z 132.07), phenylalanine
(m/z 166.08), tryptophan (m/z 204.10), glycerophosphocholine (m/z 496.34) and the ion at m/z
332.33 using the conventional 1200 LC system and novel 1290 LC system coupled to ESI and
Jet Stream technology. Chromatographic separation was carried out on a 100 mm x 2.1 mm ID
reverse phase Waters Acquity T3 column with a 7.5 min 0.1% formic acid: acetonitrile
gradient followed by 2 min isocratic.
However, saturation of desolvation may have occurred at flow rates of 0.8 ml/min and above,
as mentioned previously, and features started to disappear. For example, we could no longer
detect creatine (m/z 132.07) and glycerophosphocholine (m/z 496.34) with ESI above a flow rate
of 0.6 ml/min. The Jet Stream technology detected these features at higher flow rates than
ESI and showed the expected linear increase in peak capacity with flow rate.
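The text does not state how peak capacity was calculated. A common definition for gradient separations is one plus the gradient time divided by the average baseline peak width, and the sketch below uses that definition as an assumption. The peak widths are hypothetical; only the 7.5 min gradient time comes from the text.

```python
from typing import Sequence

def peak_capacity(gradient_time_min: float, peak_widths_min: Sequence[float]) -> float:
    """Gradient peak capacity: 1 + gradient time / mean baseline peak width."""
    mean_width = sum(peak_widths_min) / len(peak_widths_min)
    return 1.0 + gradient_time_min / mean_width

# Hypothetical baseline peak widths (min) for a few marker ions at two flow rates.
widths_low_flow = [0.20, 0.25, 0.30]
widths_high_flow = [0.10, 0.15, 0.125]

capacity_low = peak_capacity(7.5, widths_low_flow)
capacity_high = peak_capacity(7.5, widths_high_flow)
```

Under this definition, narrower peaks at a fixed gradient time give a higher peak capacity, which is the direction of the trend described for increasing flow rate.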
We were concerned, however, that at higher flow rates fewer data points were collected across
chromatographic peaks and proceeded to test various scan rates (2, 4 and 6Hz) at 0.6 and 0.8
ml/min (Table 4a). For a limited number of compounds, the number of points across peaks
was measured for early and late eluting metabolites (0.5-0.8 min, n=2, and 6-6.5 min, n=4).
Increasing the scan rate increased the number of points across the peaks but decreased the
sensitivity of the analysis, resulting in a significant decrease in the total and reproducible
number of features (Table 4b).
For example, at 0.8 ml/min, 4566 features were detected at 2Hz, 2056 of them with less than
25% CV, as opposed to 3716 and 1294 at 4Hz and 3233 and 893 at 6Hz. At 2Hz, there was an
average of 5 points across chromatographic peaks versus 15 at 4Hz, and in late eluting
peaks an average of 10 points were monitored across peaks at 2Hz and 25 at 4Hz.
Interestingly, the overall percentage of reproducible features was similar at all acquisition
rates, suggesting that the loss of sensitivity observed at 4 or 6Hz occurs equally across all
features whether they are variable or not. The number of reproducible features before 2 min at
0.6 ml/min was 28 at 2Hz versus 40 at 4Hz and 41 at 6Hz (data not shown). The lower number of
reproducible features at 2Hz when compared to 4 and 6Hz suggests that early polar metabolites
defined with fewer than 10 points across peaks are less reproducibly detected.
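The relationship between scan rate and sampling of a chromatographic peak can be approximated as peak width (in seconds) multiplied by acquisition rate (in Hz). The sketch below illustrates this and the rate needed to reach the 10 points per peak mentioned above; the 3 s peak width is a hypothetical value and the function names are illustrative.

```python
def points_across_peak(peak_width_s: float, scan_rate_hz: float) -> float:
    """Approximate number of spectra acquired across one peak."""
    return peak_width_s * scan_rate_hz

def min_scan_rate(peak_width_s: float, target_points: int = 10) -> float:
    """Lowest acquisition rate (Hz) giving at least target_points per peak."""
    return target_points / peak_width_s

# Hypothetical 3 s wide early-eluting peak at the three rates tested in the text.
points = {rate: points_across_peak(3.0, rate) for rate in (2.0, 4.0, 6.0)}
needed_rate = min_scan_rate(3.0)  # rate required for 10 points on a 3 s peak
```

The trade-off described in the text is that each additional point per peak shortens the accumulation time per spectrum, which lowers sensitivity.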
a Chromatography performed on a reverse phase Waters Acquity T3 column with a 7.5 min 0.1% formic
acid: acetonitrile gradient. Mass spectrometry analysis performed on 6530 QTOF in ESI mode.
Chromatograms were then analysed using MassHunter Qualitative software.
Table 4a. Evaluation of number of points across chromatographic peaks and mass accuracy
of selected metabolites using 2Hz, 4Hz and 6Hz scan rates at 0.6ml/min and 0.8ml/min a
a Chromatography performed on a reverse phase Waters Acquity T3 column with a 7.5 min 0.1% formic
acid: acetonitrile gradient. Mass spectrometry analysis performed on 6530 QTOF in ESI mode.
Chromatograms were then analysed using MassHunter Qualitative software.
Table 4b. The effect of scan rate on the number and reproducibility of features present in
technical replicates of human plasma extracts using the novel 1290 Infinity LC system coupled
to a 6530 QTOF using Jet Stream technology a (n=3).
This study clearly demonstrates the considerable challenges associated with reproducibly
and sensitively acquiring metabolomic data. Increasing the flow rate eluted more non-polar
metabolites from the column but eventually to the detriment of polar metabolites, which
became undersampled. Whatever choice is made in analytic conditions cannot be optimal for
all metabolites. It has to be noted that the number of features does not correspond to the
number of metabolites. For example, other studies have described up to 23 features for a
given metabolite, which further complicates matters (these may include multiply charged
ions and in-source fragmentation ions) (Evans 2009).
In conclusion, our study describes the marked improvement in the number of features
detected with the 1290 LC system together with the Jet Stream technology compared with the
1200 LC system and ESI. It is clear that this increased number of features corresponds both to an
increased number of metabolites eluting from the column at higher flow rates and an
additional number of species being reproducibly ionized by the Jet Stream technology when
compared with the ESI source. Our preferred analytic conditions use the 100 mm analytic
column, a 7.5 min gradient (total run time 10 min plus 3 min equilibration) with a flow rate
of 0.6 ml/min and an acquisition rate of 2Hz.
6. References
Brown, P. O.; Botstein, D. Nat Genet 1999, 21, 33-37.
Bruce S.J., Jonsson P., Clorarec C., Trygg J., Marklund S.L., Moritz T. Anal. Biochem., 2008,
372, 237-249.
Crews, B., Wikoff, W.R., Patti, G.J., Woo, H.K., Kalisiak, E., Heideker, J., and Siuzdak, G.
Anal Chem 2009, 81, 8538-8544.
Dunn, W.B. Phys Biol 2008, 5, 100-111
Evans, A.M., DeHaven C.D., Barrett T., Mitchell M. and Milgram E. Anal Chem 2009, 81,
6656-6667.
Kitano, H. Nature 2002, 420, 206-210.
Lai, L., Michopoulos, F., Gika, H., Theodoridis, G., Wilkinson, R.W., Odedra, R., Wingate, J.,
Bonner, R., Tate, S., and Wilson, I.D. Mol Biosyst 2010 6, 108-120.
Lee, S. H.; Woo, H. M.; Jung, B. H.; Lee, J.; Kwon, O. S.; Pyo, H. S.; Choi, M. H.; Chung, B. C.
Anal Chem 2007, 79, 6102-6110.
Lenz, E. M.; Bright, J.; Knight, R.; Wilson, I. D.; Major, H. Analyst 2004, 129, 535-541.
Nicholson, J. K.; Lindon, J. C. Nature 2008, 455, 1054-1056.
Nicholson, J. K.; Wilson, I. D. Nat Rev Drug Discov 2003, 2, 668-676.
Pandey, A.; Mann, M. Nature 2000, 405, 837-846.
Pandher, R., Ducruix, C., Eccles, S.A., and Raynaud, F.I. J Chromatogr B Analyt Technol
Biomed Life Sci. 2009, 877, 1352-1358.
Plumb, R.; Granger, J.; Stumpf, C.; Wilson, I. D.; Evans, J. A.; Lenz, E. M. Analyst 2003, 128,
819-823.
Quinones, M.P., and Kaddurah-Daouk, R. Neurobiol Dis 2009 35, 165- 176.
Sabatine M.S., Liu E., Morrow D.A., Heller E., McCarroll R., Wiegand R., Berriz G.F., Roth
F.P., Gerszten R.E. Circulation 2005, 112, 3868-3875.
Want E.J., O’Maille G., Smith C.A., Brandon T.R., Uritboonthai W., Qin C., Trauger S.A.,
Siuzdak G. Anal. Chem., 2006, 78, 743-752.
van der Greef, J.; Stroobant, P.; van der Heijden, R. Curr Opin Chem Biol 2004, 8, 559-565.
Zelena E., Dunn W., Broadhurst D., Francis-McIntyre S., Carroll K., Begley P., O’Hagan S.,
Knowles J.D., Halsall A., HUSERMET Consortium, Wilson I.D., Kell D. Anal. Chem.,
2009, 81, 1357-1364.
Part 7
1. Introduction
Agriculture’s ability to supply an abundance of nutritious foods and feeds to nourish the
world’s growing population faces serious challenges (Foresight, 2011). In order to meet
these challenges, plant breeders will be required to continuously improve agricultural
productivity as well as enhance food and feed quality. In recent years, the development of
methods for the direct introduction of new traits to produce transgenic varieties – also
known as GM crops – has proven to be a powerful tool in the hands of breeders. In most
countries, however, GM crops are subjected to rigorous pre-market regulatory assessments
that require numerous laboratory and field studies and which consume time and resources
(Kalaitzandonakes et al., 2007).
Comprehensive compositional analyses represent a key component of the pre-market safety
evaluations of GM crops (Harrigan, et al., 2010). These analyses typically include the
measurement of levels of key nutrients such as protein, storage oil, fiber, amino acids, fatty
acids, vitamins, as well as crop-specific metabolites such as gossypol and cyclopropenoid
fatty acids in cotton or isoflavones in soybean. The Organisation for Economic Co-operation
and Development (OECD) has produced a series of consensus documents that identify key
analytes in a number of major crop varieties (http://www.oecd.org). These documents
carefully review the composition and uses for each crop and identify those components that
contribute to nutritional or functional food or feed value as well as components that might
confer health-beneficial, health-protective, or harmful effects (e.g. allergens, anti-nutrients,
and potential toxicants). The large-scale compositional studies performed as part of
regulatory assessments must follow internationally accepted guidelines. These are outlined
in detail by Codex Alimentarius (Codex Alimentarius, 2008) and OECD. These
studies are typically conducted under Good Laboratory Practice (GLP), a practice that
places a high premium on documentation and reconstructability of data, method validation
and personnel training, and a requirement for professionally staffed Quality Assurance
Units.
The fact that different crops produce foods or feeds with differing compositions, along with
the fact that human and animal diets vary greatly in their consumption of these crops,
means that each crop plays a unique role in diet and health. Most plant foods in the human
diet make significant contributions to the total intake of just a few macro- and
micronutrients (Senti and Rizek, 1974; Chassy, 2010). It is therefore important to assure that
no changes have occurred that would lower the dietary intake of an essential nutrient; on
the other hand, large changes in the content of one or more nutrients in a crop that supplies
an infrequently consumed food, one which is consumed in small amounts in the diet, or one
which is not an important source of that nutrient in the diet, are of no health consequence
and will have no adverse effect on health (Chassy, 2010).
The identification and analysis of a key set of relevant metabolites is often referred to as a
“targeted” compositional analysis. Analyses utilize quantitative assays and the overall
approach allows the generation of data that is easily interpretable from a nutrition and
food/feed safety aspect. Furthermore, since the small molecule metabolite pool in seed is
of low abundance relative to macromolecular components, measurement of
macronutrients approximates the total seed biomass. For example, the small molecule
metabolite pool in corn grain is only ~5% of the total biomass (corn is dominated by
starch, fiber, protein, and fat). Anti-nutrient components in grain such as phytic acid and
raffinose (which represent much of the small molecule metabolite pool) are measured in
regulatory assessments. Other small molecule metabolites can be included if they are an
intended endpoint of compositional or nutritional modification. Otherwise, analytical
measurement of the metabolites that constitute this pool (mainly ubiquitous free amino
acids, sugars, and organic acids) is of little value owing to the extreme sensitivity of
metabolite levels to environmental influences and the negligible contribution they make
to safety and nutritional content (Herman et al., 2009; Skogerson et al., 2010; Harrigan et
al., 2007).
In fact, levels of all crop compositional components are influenced markedly by
environment (Harrison and Harrigan, 2011; Harrigan, et al., 2010; Zhou et al., 2011a, 2011b).
To illustrate, as far back as 1983, it was noted that “The concentration of the isoflavones vary
from [soybean] variety to variety, and there are also differences when the same variety is
grown in different locations” (Eldridge and Kwolek, 1983). Given the extensive scientific
literature on isoflavone variability, it was unsurprising that Gutierrez-Gonzalez et al. (2009)
recently concluded that “The range of values of isoflavones is overwhelming, even for
homozygous genotypes growing in the same year and location, which greatly complicates
genetic studies.” This is true for almost all crop compositional components as evidenced by
challenges in enhancing nutritional quality in staple crops through conventional
approaches. Figure 1 illustrates the type of variability that can be observed for metabolites
such as isoflavones.
The use of multiple geographically separate sites is required in regulatory assessments to
allow compositional studies across a wide range of environmental conditions. Indeed,
information on compositional variation in conventional crops with respect to their
responsiveness to environmental factors is necessary to provide context to evaluations of
new GM crops. Studies incorporating four to five replicated field sites utilizing randomized
complete block designs with three blocks per comparator are typical in regulatory
assessments, although the European Food Safety Authority (EFSA) has recently mandated a
minimum of eight replicated sites utilizing randomized complete block designs with four
blocks (EFSA, 2011).
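To make the field design concrete, the sketch below generates a randomized complete block layout in which every block at every site contains each entry exactly once, in an independently randomized order. The entry names, the function name and the fixed seed are hypothetical; the numbers of sites and blocks follow the EFSA minimum quoted above.

```python
import random

def rcbd_layout(entries, n_sites, n_blocks, seed=0):
    """Return {(site, block): randomized plot order} for a complete block design."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    layout = {}
    for site in range(1, n_sites + 1):
        for block in range(1, n_blocks + 1):
            order = list(entries)   # every entry appears once per block
            rng.shuffle(order)      # independent randomization within each block
            layout[(site, block)] = order
    return layout

# Hypothetical comparators; eight sites with four blocks each.
entries = ["GM", "near-isogenic control", "reference A", "reference B"]
layout = rcbd_layout(entries, n_sites=8, n_blocks=4)
```

Because each block is complete, differences between entries can be assessed within blocks, while site-to-site differences capture the environmental variation discussed in the text.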
Challenges for Metabolomics as a Tool in Safety Assessments 333
Results to date from these large-scale compositional studies have generally demonstrated
that the effect of transgene insertion is significantly less than the impact of environmental or
germplasm variation on conventional crops (Harrigan et al., 2010). This has allowed some to
question the relevance and design of compositional assessments. One review, for example,
suggests that “the current complexity and resource requirements for compositional studies
on transgenic crops containing input traits are not justified by a commensurate
understanding of safety” (Herman et al., 2009).
Despite continued confirmation that conventional breeding and environmental variation
contribute to compositional variability more so than transgene insertion (Ricroch et al.,
2011), and the resource-intensiveness of the large-scale studies currently required for
regulatory assessments, there remains some interest in the application of profiling
technologies to compare GM and conventional crops. These are often posited in terms of
“gap-filling” (Heineman et al., 2011) or “case-by-case” (Davies, 2010) evaluations. It is also
perceived by many (e.g., Kok et al., 2008) that measurement of “primarily the low-
molecular weight molecules” is more relevant to safety than proteomic or transcriptomic
profiling due to a closer relationship to “the plant phenotype and nutritional and
toxicological characteristics”. This potential advantage of metabolic profiling could be
extended as an improvement over, for example, measurements of gross levels of protein,
fat, and fibers, key nutritional but essentially “safe and inert” components of food. It is
noteworthy that Kok et al. (2008) define metabolomics as the “generation of profiles of
secondary metabolites” whereas most metabolic profiling experiments to date have
focused on primary metabolites. It has also been suggested that untargeted profiling
techniques are unbiased while “targeted” compositional analysis is biased. Finally,
advocates of metabolomic profiling have suggested that such an approach can detect
potentially deleterious totally novel metabolites that would have been missed by
“targeted” analysis, although it should be noted that many profiling technologies require
standards of known identity to accurately identify and measure specific metabolites thus
limiting this potential advantage. In addition, in examples where a new traditionally bred
plant variety has caused toxic effects, this has been attributable to increased levels of well-
known toxicants (Chassy, 2010).
Profiling technologies have confirmed on a case-by-case basis the compositional
“equivalence” of GM crops to their conventional near-isogenic comparators (Ricroch et al.,
2011). Profiling technologies are, however, unlikely to provide immediately interpretable
data in safety assessments that would provide added value to, or otherwise enhance,
rigorously quantitative assessments of known nutrients and anti-nutrients that comprise
foodstuffs. In the case of metabolic profiling, this can be directly attributable to i) the
intrinsically safe nature of food itself, ii) inconsistencies in metabolite coverage versus
quantitative capabilities afforded by different data acquisition technologies, iii) the
ubiquitous and innocuous nature of small molecule metabolites identified in profiling as
well as extreme variability in metabolite levels even within homozygous genotypes, and iv)
the “chasm” between the large number of data generated in profiling experiments and the
ability to interpret them in a way that is meaningful to nutrition and food safety. We now
expand on these observations and further emphasize that a clear distinction between
“substantial equivalence” and food safety should be promoted.
almost exclusive to humans.” In other words, because “cyanogenic plants are surprisingly
well protected from herbivory and yet can be readily detoxified by food processing, … early
farmers fortuitously chose these plants above all others for cultivation.”
Of course, many modern foodstuffs are still associated with “ancestral” secondary
metabolites that may confer nutritional or safety concerns at elevated levels. Classic
examples include glycoalkaloids in potato (NIEHS, 1998), β-N-oxalyl-L-α,β-
diaminopropionic acid (ODAP) in Lathyrus sativus (Bell, 2003), psoralens in celery (Beier and
Oertli, 1983), and gossypol in cotton (Sunilkumar et al., 2006). Targeted measurement of
these components as opposed to broad-based compositional screening is recommended by
Herman et al. (2009); in other words, compositional assessments should focus on molecules
explicitly associated with safety concerns. This is consistent with the observation that in the
very few examples where a new plant variety has caused toxic effects it has been
attributable to well-known toxicants associated with conventionally bred crops and not to a
hitherto undetected metabolite (Chassy, 2010).
It is noteworthy that such targeted assessments could easily facilitate a partnership with
omics researchers conducting semi-targeted profiling on pathways associated with toxic
metabolites to support both early development and commercialization of nutritionally
enhanced products. Such a partnership could, at least in principle, mitigate the current
regulatory burden imposed on new GM crops (Graff et al., 2009; Potrykus, 2010) and
promote the application of omics within modern agricultural biotechnology.
2. Information on compositional variation in conventional crops with respect to their
responsiveness to environmental and genetic factors is necessary to provide context to
evaluations of new GM crops. The need to assess natural variation is also true for metabolomics
yet little information on the impact of conventional breeding on metabolite profiles is available.
The inconsistent coverage of metabolites offered through different data acquisition platforms may
provide challenges in establishing a coherent literature in this area.
Ironically, as mentioned earlier, continued confirmation that conventional breeding,
environment, and germplasm contribute to compositional variation more than transgene
insertion has coincided with increased interest in the use of ‘omics technologies. This
paradox is compounded by the fact that results from these technologies have only
further highlighted the equivalence of GM crops to their conventional counterparts and
reaffirmed the substantial effect of environment and germplasm on compositional and
biochemical variability (see Ricroch et al., 2011). Although there are complexities in the
interpretation of data generated through modern profiling technologies (Broadhurst and
Kell, 2006; Lay et al., 2006) including the fact that the data is not quantitative and there is
no standardized framework for comparisons, the lack of variation between GM crops
and their conventional comparators at the transcriptomic, proteomic, and metabolomic
level has, nonetheless, been independently corroborated. These profiling evaluations
extend to a wide range of plants including wheat (Baker et al., 2006; Gregersen et al.,
2005; Ioset et al., 2007), potato (Catchpole et al., 2005; Defernez et al., 2004; Lehesranta et
al., 2005), soybean (Cheng et al., 2008), rice (Dubouzet et al., 2007; Wakasa et al., 2006),
tomato (Le Gall et al., 2003), tobacco, Arabidopsis (Kristensen et al., 2005), and Gerbera
(Ainasoja et al., 2008).
As with the compositional studies reported above, results from many of the ‘omics studies
emphasize the need to understand natural variation in levels of endogenous metabolites in
providing biological context to pair-wise differences in any recorded profiles (see Figure 1).
Levels of compositional components are sensitive to environmental conditions. This has been
established, for example, for protein and oil in key crops (Panthee et al., 2005; Lam et al., 2010).
Protein levels in soybean seed generally average ~40% dry weight (dwt), with values reported
in the USDA soybean germplasm collection, for example, ranging from 34.1 to 56.8% dwt
(Wilson, 2004). In a recent meta-analysis of environmental effects on soybean composition,
Rotundo and Westgate (2009) observed that water stress, temperature, and/or nitrogen supply
all affected protein levels measured in mature seed.
Variability is even greater for lower abundance small molecule metabolites. Vitamin E
(α-tocopherol) is typically only a minor component in soybean but is known to be important in
maintaining oxidative stability of soybean oil. Levels in soybean seed are affected by
environment and germplasm. For example, Britz et al. (2008) showed a greater than 2-fold
variation in levels across three locations in the U.S. over a period of four years. Levels in
soybean seed harvested from six different locations in Eastern Canada over a single year
ranged from 0.87 to 3.32 mg/100g dwt (Seguin et al. 2009). Seguin et al. (2010) point out that
environmental factors associated with variability in vitamin E levels include drought,
temperature, and even crop management systems. The “overwhelming variability” of
isoflavones was mentioned in the introduction (see Figure 1). As will be discussed later, this
“overwhelming variability” can be considered to apply to levels of small molecule
metabolites in harvested seed and grain of most crops.
Encouragingly, many comparative profiling studies on GM and non-GM crops have been
designed to include at least one element of genotypic or environmental variability. This is
exemplified in the following two examples, both of which reaffirm the need to provide
biological context to pairwise-differences between two comparators.
In Baker et al. (2006) NMR-based metabolic profiles of three GM wheat varieties and the
corresponding parents were generated. The incorporated transgenes encoded high-
molecular weight subunits of the storage protein, glutenin. The wheat varieties were grown
at two different sites over three different growing seasons (1999-2001). Differences between
the GM and parental lines were within the same range as the differences between the
control lines grown on different sites and in different years. Analogous to the approach
taken in targeted compositional analyses that follow OECD recommendations, this study
emphasized the importance of data from multiple years and multiple sites and showed that
environmental variation influences metabolome composition.
In Catchpole et al. (2005) two GM potato varieties modified in fructan chemistry were
grown over two different seasons (2001, 2003). Metabolic profiles of the GM and five
conventional crops were generated using flow-injection electrospray MS (FIE-MS), GC-MS, and LC-MS.
These demonstrated that differences between the GM and conventional potatoes were due
to the intended metabolic changes, but aside from these targeted changes, the GM crops
were “substantially equivalent to traditional cultivars”. A major finding recognized by the
authors was the large variation in the metabolic profiles of the conventional crops and, as
such, the study emphasized the importance of understanding genotypic variability in
assessments of compositional changes.
Challenges for Metabolomics as a Tool in Safety Assessments 337
An often overlooked aspect of the Catchpole et al. (2005) paper is their demonstration
that levels of glycoalkaloids (α-chaconine and α-solanine) were normal in the GM
potatoes, a result that is easily interpretable from a food and feed perspective. Our
understanding of nutrients and anti-nutrients forms the basis of attempts to modify crops
through conventional breeding or agricultural biotechnology. It has allowed crops to be
developed by conventional breeding that are deliberately non-equivalent to their
parental progenitors in a wide range of nutritional (and agronomic) characteristics. As
Rischer and Oksman-Caldentey (2006) point out, “For centuries, conventional plant breeding programs have
produced new traits, higher yields and improved quality. However, little attention has
been paid to metabolic changes occurring in successive generations. The issue has gained
importance only recently in the context of defining thresholds for safety assessments of
GM crops.” It is not immediately obvious why these hitherto neglected metabolites
should now be at the center of such attention. Indeed, there are few studies on small
molecule metabolite changes in crops where macro-molecular composition has been
deliberately changed through conventional breeding (e.g. high oil and high protein corn,
high oil soybean).
Catchpole et al. (2005), in their demonstration of the compositional equivalence of GM
potatoes to conventional lines, also remark on the large metabolite variation in
conventional potato as follows: “These significant differences [between conventional
cultivars] were never sought as desired traits in traditional breeding programs, and
overall composition has not given cause for public safety concerns”. Overall, however,
experimental designs that will both account for natural variation and have enough power
to identify differences that can be attributed to transgene insertion will offer opportunities
to maximize the value of omics technologies as tools in plant breeding and the
development of new crops.
3. Metabolomics offers opportunities to generate data on large numbers of metabolites. Most of
these metabolites will be low in abundance and levels will be highly variable. They are also more
likely to include central (and hence ubiquitous) metabolites such as sugars, organic acids, and
free amino acids; metabolites that are not immediately associated with safety or nutritional
relevance.
Compositional assessments of new foodstuffs generally focus on the article of commerce,
most typically harvested seed or grain. This material is generally characterized by high
levels of starch, protein, fat, and fibers, with the small metabolite pool being low in
abundance. For example, approximately 95-98% of maize grain consists of the
aforementioned materials; the small metabolite pool in grain is of low abundance (~2-5% of
grain biomass) and its levels are highly dependent on changes in the macromolecular pool.
Soybean seed is composed of 40% protein, 20% fat, and 15% fiber. The residual 15% is
composed mainly of sugars (e.g. sucrose, raffinose, stachyose, glucose, galactose, fructose)
of which the principal two, raffinose and stachyose, are measured in regulatory assessments.
The fact that the small molecule metabolite pool in seed or grain is of low abundance and
influenced by levels of the major macromolecular nutrients accounts for its extensive
variability (Skogerson et al., 2010; Harrigan et al., 2007).
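The arithmetic behind this dependence can be sketched briefly. The Python example below treats the small metabolite pool simply as the remainder after the macromolecular fraction is subtracted, which is a simplifying assumption made here for illustration, and uses a hypothetical one-point shift within the 95-98% range quoted above.

```python
def residual_pool(macro_pct: float) -> float:
    """Small metabolite pool (% of grain biomass), taken as the remainder after macromolecules."""
    return 100.0 - macro_pct

def relative_change(before: float, after: float) -> float:
    """Relative change in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0

# Hypothetical shift of one percentage point in the macromolecular fraction.
macro_before, macro_after = 95.0, 96.0
macro_shift = relative_change(macro_before, macro_after)
pool_shift = relative_change(residual_pool(macro_before), residual_pool(macro_after))

print(f"macromolecules: {macro_shift:+.1f}%, small metabolite pool: {pool_shift:+.1f}%")
```

In this sketch, a change of about 1% in the macromolecular fraction corresponds to a 20% change in the residual pool, which is one way to see why a low-abundance pool appears so variable.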
Skogerson et al. (2010) sought to assess genetic and environmental impacts on the
metabolite composition of corn grain. Their data acquisition technology (gas
338 Metabolomics
Metabolite class                            No. of      Affected by   Affected by
                                            analytes    Tester (a)    Location (b)
free amino acids                            26          14            2
sterols, amines, and others                 17          6             1
organic acids                               17          6             0
free fatty acids and related metabolites    17          5             0
sugar alcohols                              18          5             0
mono-, di-, and trisaccharides              16          1             0
sugar acids                                 8           0             1
(a) This indicates a statistically significant difference (p<0.0001) between hybrids derived from a cross
with one tester (C103 heterotic group) versus another tester (Iodents heterotic group).
(b) This indicates a statistically significant difference (p<0.0001) across the three locations in this study.
Table 1. Variation in Metabolites due to Genotype or Environmental Variation
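The counts in Table 1 can be summarized as the share of analytes in each class that showed a significant effect. The Python sketch below transcribes the table and computes those shares; it assumes that the two count columns are read in the order Tester, then Location, as the header suggests, and the variable names are illustrative only.

```python
# (metabolite class, number of analytes, affected by tester, affected by location)
# Assumption: count columns are in the order Tester, Location.
TABLE_1 = [
    ("free amino acids", 26, 14, 2),
    ("sterols, amines, and others", 17, 6, 1),
    ("organic acids", 17, 6, 0),
    ("free fatty acids and related metabolites", 17, 5, 0),
    ("sugar alcohols", 18, 5, 0),
    ("mono-, di-, and trisaccharides", 16, 1, 0),
    ("sugar acids", 8, 0, 1),
]

def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed in percent."""
    return 100.0 * part / whole

for name, n, tester, location in TABLE_1:
    print(f"{name:42s} tester {percent(tester, n):5.1f}%  location {percent(location, n):5.1f}%")

total_analytes = sum(row[1] for row in TABLE_1)
total_tester = sum(row[2] for row in TABLE_1)
total_location = sum(row[3] for row in TABLE_1)
print(f"all classes: {total_tester}/{total_analytes} tester, {total_location}/{total_analytes} location")
```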
In an analogous report on the same samples, Harrigan et al. (2007) concluded that, given such
variability, measurement of the small metabolite pool was unlikely to prove useful to a
comparative assessment of GM crops unless a given metabolite was an intended nutritional or
toxicological endpoint. In fact, it is not immediately obvious how the data generated in
Skogerson et al. (2010) could be used to determine which hybrids in this study were the safest.
In its report in 2004 the US National Research Council made pointed remarks about this
disconnect, as summarized in the following quotes: “...severe imbalances between highly
advanced analytical technologies and limited ability to interpret the results and predict
health effects that result from the consumption of food that is genetically modified” and
“...inherent difficulties, however, in identifying all of the constituents detected in profiling
methods or understanding the activity and potential biological consequence of all genes in
an organism severely limit the usefulness of these methods for predictive purposes...”
Unable to bridge this gap, many profiling proponents make an assumption of safety on the
non-GM comparator and consider statistical differences to equate with unintended effects.
This tendency is described later.
4. Another challenge in establishing a coherent literature on the impact of conventional and other
approaches to breeding on natural variability in metabolites, as well as in determining a framework
to establish nutritional meaning from metabolite analysis, is the differential coverage of
metabolites offered through the numerous data acquisition platforms available to omics
researchers.
As described in numerous articles on metabolomics (e.g. Goodacre et al., 2004; Rischer and
Oksman-Caldentey, 2006; Kusano et al., 2011), the large physico-chemical diversity of small
molecule metabolites renders comprehensive metabolomic profiling through a single data
acquisition technology impossible. A range of technologies associated with different
detection capabilities (metabolite coverage and sensitivity), precision, resolution,
throughput and reproducibility are now extensively deployed by the research community.
Nuclear magnetic resonance spectroscopy (NMR), gas-chromatography mass spectrometry
(GC-MS), liquid-chromatography (LC)-MS utilizing different ionization modes, Fourier-
transform MS, and capillary electrophoresis (CE)-MS have all been applied in comparative
assessments of GM and non-GM crops. MS approaches predominate over NMR analyses
given their greater sensitivity and coverage; however, this advantage does come at the
expense of quantitation (i.e. MS would need an internal standard for every metabolite to be
quantitated) and with a large number of unidentified MS signals in any metabolite profile.
Whilst it has been suggested that untargeted profiling techniques are unbiased, it is clear
that selection of a specific data acquisition technology is a bias and that this type of
analytical bias would need to be justified by pre-specified experimental hypotheses. This
justification would be critical in a Regulatory environment.
Recognizing inherent limitations for any given data acquisition technology, Kusano et al. (2011)
applied a multi-platform approach to an evaluation of transgenic tomato. These authors used a
combination of GC-MS, LC-MS, and CE-MS with each technology covering distinct metabolite
classes. Free amino acids, sugars and organic acids were covered by GC-MS, larger molecules
(e.g. flavonoids) by LC-MS whereas CE-MS measured specific cations and anions. Overall, the
data generated 175 unique identified metabolites but a total of 1460 with “no or imprecise
metabolite annotation.” Of the identified metabolites, only 56 were observed in at least two
platforms. A total of 261 peaks showed no correlation with experimental factors (transgene,
cultivar, tissue type) and had to be removed from statistical analyses.
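The counts reported for this multi-platform study can be turned into proportions to show how much of the profile remains unannotated. The Python sketch below assumes that the 1460 peaks with "no or imprecise metabolite annotation" are counted separately from the 175 identified metabolites; the text does not state this explicitly, so the resulting share should be read as approximate.

```python
identified = 175      # uniquely identified metabolites (from the text)
unannotated = 1460    # peaks with no or imprecise annotation (from the text)
multi_platform = 56   # identified metabolites observed on at least two platforms

# Assumption: the unannotated peaks do not include the identified metabolites.
annotated_share = identified / (identified + unannotated)
cross_platform_share = multi_platform / identified

print(f"annotated: {annotated_share:.1%}, seen on two or more platforms: {cross_platform_share:.1%}")
```

Under that assumption, only about 11% of the detected signals carry a confident annotation, and about a third of the identified metabolites were observed on more than one platform.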
It is worth pointing out that two studies that assessed the metabolic profiles of grain from GM
maize containing the Cry1ab gene and that utilized the same data acquisition platform (NMR)
differed in their conclusions on the impact of transgene insertion on levels of free amino acids
(Manetti et al., 2006; Piccioni et al., 2009). Manetti et al. (2006) reported that the GM crop
contained higher levels of sugars (glucose, sucrose, melibiose), GABA, glutamine, and succinate
and decreased levels of alanine, asparagine, and choline. Piccioni et al. (2009) reported lower
levels of all amino acids, lower sugars, and lower succinate (and other organic acids). Piccioni et
al. (2009) were also able to report on metabolites observed in their NMR profiles but absent in
those of Manetti et al. (2006). Key design differences between the two studies include different
parental lines, different growth conditions, and different sample extraction protocols.
Levandi et al. (2008) utilized CE-MS to compare levels of 27 metabolites in three different
GM maize lines also containing the Cry1ab gene. Some of these metabolites (e.g., certain free
amino acids, choline, GABA) were also recorded in the NMR platforms of Manetti et al.
(2006) and Piccioni et al. (2009). No consistent association of these metabolites with the GM
trait was observed when assessed over all three GM lines, a conclusion in line with the
combined results of Manetti et al. (2006) and Piccioni et al. (2009). Several of the
metabolites reported by Levandi et al. (2008) are more typically associated with other
taxonomic groupings, for example, graveolin (Ruta graveolens, Rutaceae) and lunarine
(Lunaria annua, Brassicaceae). The assignment of peaks to metabolites not typically
associated with a genus or family would almost certainly require extensive validation in a
Regulatory environment.
Leon et al. (2009) utilized FT-MS on the same samples assessed by Levandi et al. (2008). This
allowed coverage of 5500 mass signals of which approximately 1000 could be assigned an
elemental composition. Those elemental compositions could be associated (through
MassTRIX) with specific metabolic pathways (KEGG); these associations are referred to as
“isomeric hits”. This approach would allow any differences between GM and non-GM metabolic
profiles, especially where an elemental composition could be assigned, to be tentatively
associated with biochemical differences. Overall, it was shown that a greater number of
isomeric hits in pathways such as arachidonic acid metabolism, free amino acid metabolism,
purine metabolism, and folate biosynthesis were associated with the GM samples. A list of
33 possible compounds that could distinguish the GM and non-GM varieties was generated,
of which 12 could be confirmed in an orthogonal assay (CE-MS). The authors then indicated
that only four of these could be considered as potential GM “biomarkers”: L-carnitine,
apigenidin, 5,6-dihydroxyindole, and one unidentified metabolite. There is little further
literature on levels of these metabolites in maize and the association of these metabolites as
GM biomarkers is almost certainly premature. Further, the interpretability of the Leon et
al. (2009) approach is not at all clear; there are fewer isomeric hits associated with inositol
phosphate metabolism, yet levels of phytic acid have been well-established to be near-
identical in GM and non-GM maize. The association of isomeric hits for bile acid
biosynthesis, which is not typically associated with plant metabolism, is also difficult to
interpret.
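The ambiguity inherent in an “isomeric hit” can be illustrated with a small sketch. The Python example below is not the MassTRIX procedure; it is a simplified illustration that works on neutral monoisotopic masses (a real workflow would first correct the measured m/z for the ion adduct) and uses a tiny hypothetical lookup table. It shows that a single accurate mass, and hence a single elemental composition, can match several distinct compounds.

```python
# Monoisotopic masses of the most abundant isotopes (in Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Hypothetical mini-database: compound name -> elemental composition.
COMPOUNDS = {
    "leucine": {"C": 6, "H": 13, "N": 1, "O": 2},
    "isoleucine": {"C": 6, "H": 13, "N": 1, "O": 2},
    "norleucine": {"C": 6, "H": 13, "N": 1, "O": 2},
    "glucose": {"C": 6, "H": 12, "O": 6},
    "fructose": {"C": 6, "H": 12, "O": 6},
    "galactose": {"C": 6, "H": 12, "O": 6},
}

def neutral_mass(formula: dict) -> float:
    """Monoisotopic mass of a neutral molecule from its elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())

def isomeric_hits(observed_mass: float, tol_ppm: float = 5.0) -> list:
    """Names of all database compounds whose mass lies within tol_ppm of the observed mass."""
    hits = []
    for name, formula in COMPOUNDS.items():
        mass = neutral_mass(formula)
        if abs(observed_mass - mass) / mass * 1e6 <= tol_ppm:
            hits.append(name)
    return sorted(hits)

print(isomeric_hits(131.0946))  # one mass, three candidate amino acids
print(isomeric_hits(180.0634))  # one mass, three candidate hexoses
```

Because mass alone cannot separate isomers, a pathway assignment based on such hits remains tentative until confirmed by an orthogonal method.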
In summary, different metabolic profiling platforms applied to similar biological questions
will yield non-overlapping solutions. This is due to differential metabolite coverage (even
within similar data acquisition technologies) and is compounded both by the number of
unidentified signals observed in current metabolite profiles and, in some cases,
“identification” of metabolites not previously known to be biosynthetically associated with
the plant species or genus in question.
falls well within the natural variability of metabolite levels and is even less than typical
experimental error, setting a universal threshold for relative magnitude of differences as a
trigger for further safety assessments of GM crops has been considered. In 2000, the Nordic
Council of Ministers recommended that if a component in a GM crop differed from the
conventional control by ±20% in relative magnitude, additional analyses of the GM crop
were warranted (cited in Hothorn and Oberdoerfer, 2006). This concept was refined to
account for the nutritional relevance of a component and the experimental precision of its
measurement (Hothorn and Oberdoerfer, 2006). Threshold ranges for GM components were
suggested as follows: 0.833-1.20 of the conventional control for “nutritionally very relevant”
components (minerals, vitamins, anti-nutrients, bioactives, essential amino acids, and fatty
acids), 0.769-1.30 for “relevant” (non-essential amino and fatty acids), and 0.667-1.50 for
components of “less relevance” (proximates, fiber). Suggestions for the use of limits and
triggers of this kind have been criticized for their failure to fully account for the role and
contributions of the specific crop in the human diet, particularly for GM crops, since
they are often not eaten as such but are used as a source of macronutrients such as oil, starch
and protein (Chassy, 2008; Chassy, 2010). As noted previously, most plant foods in the
human diet make significant contributions to the total intake of just a few macro- and
micronutrients and therefore even large compositional changes in a single crop plant might
produce little impact on the nutritional value of the overall diet. Chassy (2010) has observed
that composition cannot be viewed in isolation since the composition of the diet is far more
important than the composition of a single variety of a single crop. Strictly numerical
approaches have not been adopted in compositional studies and there is no reason they
would be relevant to profiling experiments.
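To make the mechanics of such a numerical trigger concrete, the threshold ranges quoted above can be written as a simple lookup. The Python sketch below only encodes the ranges as summarized in this section; it is not the full statistical procedure of Hothorn and Oberdoerfer (2006), and the function name and the example values are hypothetical.

```python
# Threshold ranges (GM mean relative to conventional control) as quoted in the text.
THRESHOLDS = {
    "nutritionally very relevant": (0.833, 1.20),
    "relevant": (0.769, 1.30),
    "less relevance": (0.667, 1.50),
}

def within_threshold(gm_mean: float, control_mean: float, category: str) -> bool:
    """Return True if the GM/control ratio lies inside the range for the given category."""
    lower, upper = THRESHOLDS[category]
    ratio = gm_mean / control_mean
    return lower <= ratio <= upper

# Hypothetical component measured at 12.5 units in the GM crop and 10.0 in the control.
for category in THRESHOLDS:
    print(category, within_threshold(12.5, 10.0, category))
```

The same 25% difference triggers further analysis in one category and not in the others, so the outcome rests entirely on how a component is classified. The rule also takes no account of the crop's contribution to the diet, which is the criticism raised above.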
At least one profiling study has attempted to apply statistical equivalence testing but again
falls prey to the dubious association of equivalence with safety. Kusano et al. (2011)
compared a GM tomato (a miraculin protein expressor) not only to the parental line but
also to a panel of conventional reference varieties. The statistical design (described by the
authors as a proof-of-safety test) involved comparing the difference between test and control
and then determining whether these differences fell within equivalence limits established by
the reference varieties. However, such a design makes more of a statement about the
selection of the reference substances and the control to which the GM trait is introgressed,
and not about the effect of transgene insertion; the same test-to-control differences can be
equivalent or non-equivalent contingent on whether a limited or diverse range of genotypes
is available. The overall conclusion from the study, however, was that “miraculin over-
expressors are remarkably similar to the control line”.
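The dependence of the equivalence verdict on the reference panel can be shown with a deliberately simplified sketch. The Python example below is not the procedure used by Kusano et al. (2011); it assumes, for illustration only, that the equivalence limit is the largest absolute deviation of any reference variety from the control, and all values are hypothetical.

```python
def is_equivalent(test: float, control: float, references: list) -> bool:
    """Simplified check: the test-control difference must not exceed the largest
    absolute reference-control difference (an illustrative equivalence limit)."""
    limit = max(abs(ref - control) for ref in references)
    return abs(test - control) <= limit

control, test = 10.0, 11.5                  # hypothetical metabolite levels
narrow_panel = [9.5, 10.2, 10.8]            # few, closely related reference varieties
diverse_panel = [7.0, 9.5, 10.8, 13.0]      # broader range of genotypes

print(is_equivalent(test, control, narrow_panel))
print(is_equivalent(test, control, diverse_panel))
```

The identical test-to-control difference is judged non-equivalent against the narrow panel and equivalent against the diverse one, which is the sense in which the result reflects the choice of references and not the effect of the transgene.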
In summary, there are no defined data analysis strategies currently being consistently
applied to profiling data that would facilitate interpretability of data.
4. Conclusion
There are clearly divergent views about the utility of ‘omics sciences in food safety
assessments. This paper has discussed some of the reasons metabolic profiling technologies
are, however, unlikely to provide immediately interpretable data in safety assessments that
would otherwise enhance rigorously quantitative assessments of known nutrients and anti-
nutrients that comprise foodstuffs. Indeed, it is not clear to the present authors that any new
types of data are in fact necessary to judge GM or other foods as safe. We are also unaware
of any “gaps” in our compositional knowledge that might compromise safety; in fact,
our current understanding of plant anti-nutrients and toxicants allows GM solutions to
enhancing food safety (e.g. Sunilkumar et al., 2006). The last 25 years of research on GM
plants and 15 years of commercial experience planting GM crops without harm or incident
suggest that no difference in safety that would require further analysis exists between GM
crops and crops bred by other strategies. All breeding induces genetic changes and these changes
give rise to transcriptomic, proteomic and metabolomic alterations.
We consider that metabolic profiling could increase its value in food safety science as well as
in the development of nutritionally enhanced crops as follows:
1. Improved compositional analysis. One potential target for future research could be to
develop metabolic screening methods that afford a comprehensive compositional
assessment in a single suite of determinations rapidly and at lower cost than traditional
targeted analysis. It is known that the metabolites in a cell form a large, complex and
interconnected network; one possible approach would be elucidation of key metabolic
compounds whose determination might provide insight into the global concentrations of
numerous other metabolites. If such a validated analytical method could be developed
it would greatly aid research and development and would be particularly valuable in
assessments of nutritionally enhanced crops where changes in a specific pathway are
sought. However, metabolomic technologies are not able to supply this kind of analysis
and data.
2. Detection of novel toxicants. Targeted analysis is inherently incapable of assessing levels
of metabolites that are not selected (targeted) for analysis. Proponents of metabolic
profiling have argued that profiling might detect the emergence of previously unknown
novel toxicants presumably created by the breeding process. However, the abundance
of a few macro-components (protein, fiber, carbohydrate, lipids) and numerous minor
metabolites leaves little compositional “space” for novel toxicants. If wholly new
molecules were created by the spontaneous evolution of a new pathway or pathways
necessary for its biosynthesis, the chances that sufficient quantities would be present to
exert an adverse effect are small indeed. Perhaps this is why such effects have not yet
been observed by science or why coherent hypotheses as to how a novel toxicant would
be generated by a specific breeding process appear to be sparse in the literature.
3. Detection of unintended effects. Proponents of metabolic profiling often suggest that a
profile itself may be an indicator that unintended changes have occurred. Methods to
draw safety conclusions based on differences in metabolic profiles do not yet exist and,
as we have discussed above, there is certainly no reason to assume that differences in profiles
imply a safety concern; in fact, by any objective measure, there is no such technique as
metabolomic profiling. What we have today is a series of distinct and emerging
powerful scanning techniques each of which surveys a slightly different molecular
landscape with variable degrees of resolution. Clearly, the number of metabolites
present in crops is very large and the power of targeted metabolic profiling will become
increasingly useful in analyzing the chemical complexity of prospective commercial
releases as they progress through initial research and development phases.
Metabolomics is an expanding and exciting field of research. The rapidly expanding scope
of the metabolomic profiling technologies tempts us to test their applicability to a wide
array of analytical challenges. We have, on the other hand, a long history of safe experience
with plant breeding. We know that many unintended changes take place in plant breeding;
however, these are almost without exception innocuous. There is no reason to believe that
GM breeding should require any new or different data set than other forms of breeding.
It seems clear to the present authors that there is no role for metabolic profiling in food
safety assessment. We agree that modern targeted metabolic profiling technologies can
rapidly identify pathway perturbations and, if judiciously applied and interpreted, might
enhance food safety science, although traditional analytical methods can still be used to
assess if changes in pathways and metabolite pools have occurred. If incorporated into the
early selection stages of a prospective new trait, targeted metabolic profiling may greatly aid
in the selection of metabolites that need to be considered during the compositional phase of
a risk assessment. To quote Larkin and Harrigan (2007): “However, it should be self-evident
that GM crops ought not to be considered a single monolithic class that is either good or bad
for the economy, agriculture or the environment. Each novel crop should be considered on
its own merits and demerits. If we ever get to that point we will have achieved something
positive out of the GM controversy.” It is our hope that colleagues will take this as a
challenge to further metabolic profiling in the advancement of food safety and nutritional
enhancement of crops.
5. Acknowledgements
Figure 1 was prepared by Jay Harrison of the Statistics Technology Center, Monsanto
Company.
6. References
Ainasoja, M. M., Pohjala, L. L., Tammela, P.S. M., Somervuo, P. J., Vuorela, P. M. & Teeri, T.
H. 2008. Comparison of transgenic Gerbera hybrida lines and traditional varieties
shows no differences in cytotoxicity or metabolic fingerprints. Transgenic Res, 17,
793-803.
Baker, J. M., Hawkins, N. D., Ward, J. L., Lovegrove, A., Napier, J. A., Shewry, P. R. & Beale,
M. H. 2006. A metabolomic study of substantial equivalence of field-grown
genetically modified wheat. Plant Biotechnol J, 4, 381-92.
Beier, R. C. & Oertli, E. H. 1983. Psoralen and other linear furocoumarins as phytoalexins in
celery. Phytochem, 22, 2595-97.
Bell, A. E. 2003. Nonprotein amino acids of plants: Significance in medicine, nutrition, and
agriculture. J Agric Food Chem, 51, 2854–65.
Berger, J.O. 1985. Statistical Decision Theory and Bayesian Analysis. 2nd edition: Springer-
Verlag.
Britz, S. J., Kremer, D. F., & Kenworthy, W. 2008. Tocopherols in soybean seeds; Genetic
variation and environmental effects in field-grown crops. J Am Oil Chem Soc, 85,
931-36.
Broadhurst, D. I. & Kell, D. B. 2006. Statistical strategies for avoiding false discoveries in
metabolomics and related experiments. Metabolomics, 2, 171-96.
Catchpole, G. S., Beckmann, M., Enot, D. P., Mondhe, M., Zywicki, B., Taylor, J., Hardy, N.,
Smith, A., King, R. D., Kell, D. B., Fiehn, O. & Draper, J. 2005. Hierarchical
Lam, H.-M., Xu, X., Liu, X., Chen, W., Yang, G., Wong, F. L., Li, M.-W., He, W., Qin, N.,
Wang, B., Li, J., Jian, M., Wang, J., Shao, G., Wang, J., Sun, S. S.-M. & Zhang, G.
2010. Resequencing of 31 wild and cultivated soybean genomes identifies patterns
of genetic diversity and selection. Nature Gen, 42, 1053-59.
Larkin, P. & Harrigan, G. G. 2007. Opportunities and surprises in crops modified by
transgenic technology: metabolic engineering of benzylisoquinoline alkaloid,
gossypol and lysine biosynthetic pathways. Metabolomics, 3, 371-82.
Lay Jr., J.O, Borgmann, S., Liyanage, R. & Wilkins, C.L. 2006. Problems with the ‘‘omics’’.
Trends Anal Chem, 25, 1046-56.
Le Gall, G., Colquhoun, I. J., Davis, A. L., Collins, G. J. & Verhoeyen, M. E. 2003. Metabolite
profiling of tomato (Lycopersicon esculentum) using 1H NMR spectroscopy as a tool
to detect potential unintended effects following a genetic modification. J Agric
Food Chem, 51, 2447-56.
Lecoutre, B., Lecoutre, M.-P. & Poitevineau, J. 2001. Uses, abuses and misuses of
significance tests in the scientific community: Won't the Bayesian choice be
unavoidable? Int Stat Rev, 69, 399-417.
Lehesranta, S. J., Davies, H. V., Shepherd, L. V., Nunan, N., Mcnicol, J. W., Auriola, S.,
Koistinen, K. M., Suomalainen, S., Kokko, H. I. & Karenlampi, S. O. 2005.
Comparison of tuber proteomes of potato varieties, landraces, and genetically
modified lines. Plant Physiol, 138, 1690-9.
Leon, C., Rodriguez-Meizoso, I., Lucio, M., Garcia-Canas, V., Ibanez, E., Schmitt-Kopplin, P.
& Cifuentes, A. 2009. Metabolomics of transgenic maize combining Fourier
transform-ion cyclotron resonance-mass spectrometry, capillary electrophoresis-
mass spectrometry and pressurized liquid extraction. J Chromatog A, 1216, 7314-
23.
Levandi, T., Leon, C., Kaljurand, M., Garcia-Canas, V. & Cifuentes, A. 2008. Capillary
electrophoresis time-of-flight mass spectrometry for comparative metabolomics of
transgenic versus conventional maize. Anal Chem, 80, 6329-35.
Manetti, C., Bianchetti, C., Casciani, L., Castro, C., Di Cocco, M. E., Miccheli, A., Motto, M. &
Conti, F. 2006. A metabonomic study of transgenic maize (Zea mays) seeds revealed
variations in osmolytes and branched amino acids. J Exp Bot, 57, 2613-25.
NIEHS. 1998. National Institute of Environmental Health Sciences; α-Chaconine [20562-03-2]
and α-solanine [20562-02-1]. Review of toxicological literature.
http://ntp-server.niehs.nih.gov/index.cfm?objectid=6F5E930D-F1F6-975E-7037ACA48ABB25F4
NRC. 2004. National Research Council: Safety of Genetically Engineered Foods. Approaches
to Assessing Unintended Health Effects. The National Academies Press,
Washington, D.C.
Panthee, D. R., Pantalone, V. R., West, D. R., Saxton, A. M. & Sams, C. E. 2005. Quantitative
trait loci for seed protein and oil concentration and seed size in soybean. Crop Sci,
45, 2015-22.
Piccioni, F., Capitani, D., Zolla, L. & Mannina, L. 2009. NMR metabolic profiling of
transgenic maize with the Cry1A(b) gene. J Agric Food Chem, 57, 6041-49.
Potrykus, I. 2010. Regulation must be revolutionized. Nature, 466, 561.
Ricroch, A. E., Berge, J. B. & Kuntz, M. 2011. Evaluation of genetically engineered crops
using transcriptomic, proteomic, and metabolomic profiling techniques. Plant
Physiol, 155, 1752-61.
Rischer, H. & Oksman-Caldentey, K.-M. 2006. Unintended effects in genetically modified
crops: revealed by metabolomics? Trends Biotechnol, 24, 102-4.
Rotundo, J.L. & Westgate, M. E. 2009. Meta-analysis of environmental effects on soybean
seed composition. Field Crops Res, 110, 147-56.
Scott, M. P., Edwards, J. W., Bell, C. P., Schussler, J. R. & Smith, J. S. 2006. Grain composition
and amino acid content in maize cultivars representing 80 Years of commercial
maize varieties. Maydica, 51, 417-23.
Seguin, P., Turcotte, P., Tremblay, G., Pageau, D., & Liu, W. 2009. Tocopherols concentration
and stability in early maturing soybean genotypes. Agron. J, 101, 1153-59.
Seguin, P., Tremblay, G., Pageau, D. & Liu, W. 2010. Soybean tocopherol concentrations are
affected by crop management. J Agric Food Chem, 58, 5495-5501.
Senti, F. R. & Rizek, R. R. 1974. An overview of GRAS regulations and the effect from the
viewpoint of nutrition, In Hanson C. H. (ed) CSSA Special Publication. Vol. 5. SSA,
Madison, WI.
Skogerson, K., Harrigan, G. G., Reynolds, T. L., Halls, S. C., Ruebelt, M., Iandolino, A.,
Pandravada, A., Glenn, K. C. & Fiehn, O. 2010. Impact of genetics and environment
on the metabolite composition of maize grain. J Agric Food Chem, 58, 3600-10.
Sunilkumar, G., Campbell, L. M., Puckhaber, L., Stipanovic, R. D. & Rathore, K. S. 2006.
Engineering cottonseed for use in human nutrition by tissue-specific reduction of
toxic gossypol. Proc Natl Acad Sci U S A, 103, 18054-59.
Uribelarrea, M., Below, F. E. & Moose, S.P. 2004. Grain composition and productivity of
maize hybrids derived from the Illinois protein strains in response to variable
nitrogen supply. Crop Sci, 44, 1593-1600.
Yin, X.H. & Vyn, T. J. 2005. Relationships of isoflavone, oil, and protein in seed with yield of
soybean. Agron J, 97, 1314-21.
Wakasa, K., Hasegawa, H., Nemoto, H., Matsuda, F., Miyazawa, H., Tozawa, Y., Morino, K.,
Komatsu, A., Yamada, T., Terakawa, T. & Miyagawa, H. 2006. High-level
tryptophan accumulation in seeds of transgenic rice and its limited effects on
agronomic traits and seed metabolite profile. J Exp Bot, 57, 3069-78.
Wilcox, J.R. & Shibles, R. M. 2001. Interrelationships among seed quality attributes in
soybean. Crop Sci, 41, 11-4.
Wilson, R. F. 2004. Seed composition, In H.R. Boerma and J.E. Specht (ed.) Soybeans:
Improvement, production and uses, 3rd ed. ASA, CSSA, SSA, Madison, WI.
Zhou, J., Berman, K. H., Breeze, M. L., Nemeth, M. A., Oliveira, W. S., Braga, D. V., Berger,
G. U. & Harrigan, G. G. 2011a. Compositional variability in conventional and
glyphosate-tolerant soybean varieties grown in different regions in Brazil. J Agric
Food Chem, 59, online
Zhou, J., Harrigan, G. G., Berman, K. H., Webb, E. G., Klusmeyer, T. H. & Nemeth, M. A.
2011b. Stability of the compositional equivalence of grain from insect-protected
corn and seed from herbicide-tolerant soybean over multiple seasons, locations and
breeding germplasms. J Agric Food Chem, 59, 8822-28.
15
1. Introduction
Exposure to xenobiotics induces complex biochemical responses in mammalian cells
resulting in several perturbations in cellular toxicity pathways. Within the context of
systems biology, such biochemical perturbations can be studied individually using “omics”
approaches such as toxicogenomics, transcriptomics, proteomics and metabolomics (Heijne
et al., 2005). The objective of this chapter is to examine how the metabolomics approach can
be used in identifying the risk posed by environmental chemicals to human health using
selective examples of organ toxicity. Metabolomics is a medium-to-high throughput
technique employing predominantly mass spectrometry (MS) and nuclear magnetic
resonance (NMR) technology (Roux et al., 2011) for the identification and characterization of
endogenous metabolites of low molecular weight (<1800 Da) arising from different
biochemical pathways either as primary or secondary metabolites (Idle & Gonzalez, 2007).
The sum total of all small metabolites is referred to as the “metabolome”. Metabolomics has
also been applied to the identification of low molecular weight, exogenous metabolites of
xenobiotics (Roux et al., 2011; Rubino et al., 2009). With these capabilities, metabolomics
represents a relatively quick and informative approach for assessing the physiological
response to environmental chemicals.
suspected to pose health hazards, quantification of the concentrations at which they are
present in the environment, a description of the specific forms of toxicity (i.e. neurotoxicity,
carcinogenicity, etc.) that can be caused by the contaminants of concern, and an evaluation
of the conditions under which these forms of toxicity might be expressed in exposed
humans” (NRC, 1994).
For human health assessment of chemicals, non-cancer or cancer risk values are derived
based on the selection of a critical endpoint of toxicity or several endpoints (e.g. biochemical,
pathological, physiological, and behavioral abnormalities) of adverse health outcomes.
Uncertainty factors are applied to the lowest dose associated with the critical health
outcome(s) in order to derive the resulting exposure level for non-cancer toxicity. These
uncertainty factors attempt to account for exposure duration, pharmacokinetic, and
pharmacodynamic data gaps associated with inter- and intra-species extrapolation.
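The arithmetic behind this step can be sketched as follows. This is a minimal illustration, assuming the conventional form in which a point of departure (the lowest dose associated with the critical effect) is divided by the product of the applicable uncertainty factors. The function name, the factor labels, and all numerical values are hypothetical and are not taken from any assessment.

```python
from math import prod


def noncancer_exposure_level(point_of_departure, uncertainty_factors):
    """Divide a point of departure (mg/kg-day) by the product of uncertainty factors.

    `uncertainty_factors` maps a label (hypothetical here) to a factor >= 1.
    """
    if point_of_departure <= 0:
        raise ValueError("point of departure must be positive")
    if any(uf < 1 for uf in uncertainty_factors.values()):
        raise ValueError("uncertainty factors are expected to be >= 1")
    return point_of_departure / prod(uncertainty_factors.values())


# Hypothetical inputs chosen only to show the calculation.
factors = {
    "interspecies": 10,       # animal-to-human extrapolation
    "intraspecies": 10,       # variability among humans
    "exposure_duration": 3,   # e.g. subchronic-to-chronic data gap
}
level = noncancer_exposure_level(15.0, factors)
print(f"{level:.3f} mg/kg-day")  # 15 / (10 * 10 * 3) = 0.050
```

Each factor that can be replaced by chemical-specific data (for example, measured pharmacodynamic differences) reduces the divisor and narrows the gap between the observed dose and the derived exposure level.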
The U.S. EPA and the International Agency for Research on Cancer (IARC) evaluate the
evidence for carcinogenesis in humans from epidemiological, experimental animal, and
mechanistic data to determine the qualitative cancer classification for humans. In addition,
the U.S. EPA evaluates exposure-response relationships and develops quantitative cancer
risk values based on the observed tumors that correspond to a unit exposure (U.S. EPA,
2005). Uncertainties with cancer risk values are presented and are generally associated with
the mode of action (MOA) for carcinogenicity.
One of the major concerns with cancer risk assessment is false-positive animal tumor
findings. Having an understanding of the mechanism(s) leading to carcinogenicity would
help in developing a better perspective of whether a carcinogen in experimental animals is
likely to be a carcinogen in humans. For example, correlating a metabolomic profile of a
suspected carcinogen between human exposures (environmental or occupational) and
experimental animal exposure studies would be highly useful. If similar biochemical
markers were to appear across the human and animal metabolomic profiles, that
information would help to characterize similarities or differences in interspecies mechanisms.
Further, if the chemical was demonstrated to be a carcinogen in animals through a
traditional two-year animal bioassay, but there was inconclusive epidemiological evidence,
the similarity in metabolomic data could be used along with other mechanistic data (e.g.
mutagenicity/genotoxicity assays, cell proliferation findings, oxidative stress, epigenetics,
etc.) to support or refute human carcinogenicity. In this regard, metabolomics information
could be used to support mechanistic data to augment the animal and human findings.
EPA, 2005). In vitro systems (e.g. cell cultures) and in vivo models (e.g. experimental
animals or human population studies) have identified several of the early key events
such as oxidative stress, inflammation, genotoxicity, and cytotoxicity that occur from
toxicant exposure. Since metabolomics measures biological response at the molecular
level, this approach can identify the metabolites associated with the sequence of ‘key
events’ and the processes inherent to the mechanism(s) of xenobiotic toxicity. The
metabolomics approach could generate several mode(s) of action hypotheses using a
nontargeted approach. The individual MOA hypothesis thus generated could be tested
in targeted approaches (e.g. measuring glutathione reduction from oxidative stress)
using more conventional assays. Metabolomics data could be used to further inform the
mode(s) of action in experimental animals associated with carcinogenicity or with non-
cancer health outcomes, which may help to confirm the relevancy of the observations in
experimental animals to humans.
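The step from a nontargeted screen to testable MOA hypotheses can be sketched as a simple lookup. This is a minimal illustration under stated assumptions: the marker-to-key-event table, the two-fold threshold, the function name, and the example fold changes are all hypothetical placeholders, not a validated annotation scheme.

```python
# Hypothetical lookup from a metabolite to the key event it may indicate.
KEY_EVENT_MARKERS = {
    "glutathione": "oxidative stress",
    "arachidonic acid": "inflammation",
    "putrescine": "cell proliferation",
    "o-phosphoethanolamine": "membrane damage",
}


def propose_moa_hypotheses(fold_changes, threshold=2.0):
    """Group markedly changed metabolites by the key event they may indicate.

    A metabolite counts as changed when its fold change versus control is
    at least `threshold` (increase) or at most 1/threshold (decrease).
    Metabolites absent from the lookup table are ignored.
    """
    hypotheses = {}
    for metabolite, fc in fold_changes.items():
        if fc <= 0:
            raise ValueError(f"fold change for {metabolite} must be positive")
        changed = fc >= threshold or fc <= 1 / threshold
        event = KEY_EVENT_MARKERS.get(metabolite)
        if changed and event is not None:
            hypotheses.setdefault(event, []).append(metabolite)
    return hypotheses


# Hypothetical screen results (treated / control ratios).
screen = {
    "glutathione": 0.4,
    "putrescine": 3.1,
    "arachidonic acid": 1.2,
    "alanine": 5.0,
}
print(propose_moa_hypotheses(screen))
# {'oxidative stress': ['glutathione'], 'cell proliferation': ['putrescine']}
```

Each key event returned would then be examined with a targeted, conventional assay before being treated as part of a mode of action.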
chemicals. The extent of the metabolomic datasets varies for each type of organ toxicity;
however, discussion focuses on how metabolomic investigations have contributed to some
understanding of mechanisms of toxicity. Some of these metabolic changes are specific but
many are nonspecific for the selected organ toxicities.
were used for this analysis. The identified putative biomarkers were γ-aminobutyric acid
(GABA), choline, glutamine, creatine, and spermine. The authors were able to speculate on
potential mechanisms of methylmercury toxicity from the results of traditional biochemical assays
(Eskes et al., 2002; Monnet-Tschudi et al., 1996) in which the activities of neuronal
enzymes including glutamine synthetase, choline acetyltransferase, and glutamic acid
decarboxylase were reported to be significantly decreased. In essence, the decreased levels
of GABA and choline in the in vitro metabolomic study may help to explain these decreases
in enzymatic activity, which then lead to neuron-specific toxicity. This information
correlates with the observed increases in apoptotic cell death associated with methylmercury.
Similarly, the increased creatine levels that were reported from methylmercury treatment
were correlated with gliosis (proliferation of astrocytes in the central nervous system) by the
authors. Creatine is generally linked to brain osmoregulation (Bothwell et al., 2001) and
increased creatine levels would lead to increased activity of glial cells that could result in
gliosis.
Thus, a very limited metabolomic dataset for methylmercury adds to the mechanistic profile
of this compound and helps establish the temporal sequence of the key events leading to
cytotoxicity in the brain. For methylmercury, the observed changes in the neuronal
metabolites provide supportive, early evidence of later stage events leading to brain
cytotoxicity. Metabolomic data from in vivo studies or from incidental human exposure
could reveal any pharmacodynamic differences and, if present, show how such differences
could be quantified from the available data rather than through the default use of
uncertainty factors.
contain antioxidant defenses, such as GSH to prevent oxidative damage (Rahman &
MacNee, 1999). The metabolomics approach has indeed found changes in metabolites
associated with these effects of cigarette smoke. Predominant changes in metabolites from
exposure to WS included glutathione and γ-glutamylglutamine, which showed 51- and 13-fold
increases, respectively, compared to control cells. The increased levels of metabolites within the
glutathione pathway (i.e. glutathione and γ-glutamylglutamine) suggest a protective
response against oxidative stress, which can result from WS. These data correlate with
human microarray data that demonstrated an antioxidant response to cigarette smoke
through the induction of genes associated with glutathione metabolism (Spira et al., 2004).
Further, Vulimiri and colleagues (Vulimiri et al., 2009) observed increased levels (16.4-fold)
of o-phosphoethanolamine, a marker of phospholipid degradation that may indicate cell
membrane damage. Additional observations included increased arachidonic acid levels that
may suggest an inflammatory response and markers (e.g. putrescine) of cell proliferation.
Conversely, there was a statistically significant decrease in glutathione levels from exposure
to WTPM and GVP when compared to controls. Predominant metabolite changes in these
phases were acetylcarnitine (4.5-fold increase) and palmitoleate (5.2-fold increase) for
WTPM and GVP, respectively, both of which indicated changes in lipid metabolism
(Vulimiri et al., 2009). In summary, these authors (Vulimiri et al., 2009) demonstrated how
metabolomics could differentiate metabolic responses caused by exposure to complex
mixtures (i.e. cigarette smoke) and also provided empirical data for metabolic changes for
known markers of toxicity (e.g. decreased GSH, membrane damage) associated with
exposure to cigarette smoke.
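Fold changes such as those quoted above are ratios of a metabolite's signal in exposed cells to its signal in control cells. A minimal sketch of that calculation follows, assuming replicate intensities are summarized by their arithmetic mean; the function name and the intensity values are hypothetical and do not come from the cited study.

```python
from statistics import fmean


def fold_change(treated, control):
    """Return mean treated signal divided by mean control signal."""
    if not treated or not control:
        raise ValueError("both groups need at least one replicate")
    mean_control = fmean(control)
    if mean_control <= 0:
        raise ValueError("mean control signal must be positive")
    return fmean(treated) / mean_control


# Hypothetical replicate intensities (arbitrary units).
control = [100.0, 110.0, 90.0]
treated = [450.0, 470.0, 430.0]
print(f"{fold_change(treated, control):.1f}-fold")  # 450 / 100 = 4.5-fold
```

A value above 1 indicates an increase relative to control and a value below 1 a decrease; whether a given change is statistically significant requires a separate test on the replicates.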
2006; Spira et al., 2004) and proteomic (Carter et al., 2011; Kelsen et al., 2008; Zhang et al.,
2008) investigations that also detected markers associated with the pathways affected by
exposure to cigarette smoke.
Conversely, the integration of “omics” technologies has also demonstrated some limitations.
For example, a study by Steiling and co-workers found a discrepancy between genomic and
proteomic data. Specifically, gel electrophoresis followed by LC-MS analysis identified 41
proteins whose expression would not have been detected by microarray analysis (Steiling et
al., 2009). Such data highlight the importance of assessing more downstream markers (i.e.
proteins or metabolites) that may provide a more accurate understanding of the biological
responses to chemical exposures.
11. Conclusions
Metabolomics is an emerging medium-to-high throughput technique which measures the
endogenous biochemicals affecting different metabolic pathways and can be useful in
characterizing the hazards of environmental chemicals. Identifying metabolic perturbations
caused in mammalian cell systems following chemical exposure helps in elucidating the
predominant toxicity pathways. Some of the advantages of using metabolomic data for
hazard identification—one of the key steps in risk assessment—include the ability to inform
gender, genetic, and organ-specific effects in a relatively expedient manner. As briefly
discussed, metabolomics can identify early biochemical perturbations associated with
toxicity in the hepatic, nervous, and pulmonary systems caused by selected environmental
chemicals. The studies surveyed here, drawn from various research systems, demonstrate how
metabolomic data could be used for hazard identification and mode of action
characterization for environmental chemicals. Overall, metabolomics represents an
opportunity to develop a better understanding of the toxicity of environmental chemicals
and could further impact the human health assessment of these chemicals.
12. Disclaimer
The views expressed in this chapter are those of the authors and do not necessarily represent
the views or policies of the U.S. Environmental Protection Agency.
13. References
ATSDR (1999). Toxicological profile for mercury, Agency for Toxic Substances and Disease
Registry, Atlanta, GA, Available from:
http://www.atsdr.cdc.gov/toxprofiles/tp46.pdf
Azmi, J.; Connelly, J.; Holmes, E.; Nicholson, J.K.; Shore, R.F. & Griffin, J.L. (2005).
Characterization of the biochemical effects of 1-nitronaphthalene in rats using
based urine metabolic profiles of male Wistar Han rats: implications for biomarker
discovery, Biomarkers, Vol. 9, No. 2, pp. 156-79.
De Zwart, L.L.; Venhorst, J.; Groot, M.; Commandeur, J.N.; Hermanns, R.C.; Meerman, J.H.;
Van Baar, B.L. & Vermeulen, N.P. (1997). Simultaneous determination of eight lipid
peroxidation degradation products in urine of rats treated with carbon
tetrachloride using gas chromatography with electron-capture detection, J
Chromatogr B Biomed Sci Appl, Vol. 694, No. 2, pp. 277-87.
Ekino, S.; Susa, M.; Ninomiya, T.; Imamura, K. & Kitamura, T. (2007). Minamata disease
revisited: an update on the acute and chronic manifestations of methyl mercury
poisoning, J Neurol Sci, Vol. 262, No. 1-2, pp. 131-44.
Eskes, C.; Honegger, P.; Juillerat-Jeanneret, L. & Monnet-Tschudi, F. (2002). Microglial
reaction induced by noncytotoxic methylmercury treatment leads to
neuroprotection via interactions with astrocytes and IL-6 release, Glia, Vol. 37, No.
1, pp. 43-52.
Faux, S.P.; Tai, T.; Thorne, D.; Xu, Y.; Breheny, D. & Gaca, M. (2009). The role of oxidative
stress in the biological responses of lung epithelial cells to cigarette smoke,
Biomarkers, Vol. 14, Suppl. 1, pp. 90-6.
Fountoulakis, M.; de Vera, M.C.; Crameri, F.; Boess, F.; Gasser, R.; Albertini, S. & Suter, L.
(2002). Modulation of gene and protein expression by carbon tetrachloride in the
rat liver, Toxicol Appl Pharmacol, Vol. 183, No. 1, pp. 71-80.
Gee, D.L.; Bechtold, M.M. & Tappel, A.L. (1981). Carbon tetrachloride-induced lipid
peroxidation: simultaneous in vivo measurements of pentane and chloroform
exhaled by the rat, Toxicol Lett, Vol. 8, No. 6, pp. 299-306.
Gieger, C.; Geistlinger, L.; Altmaier, E.; Hrabe de Angelis, M.; Kronenberg, F.; Meitinger, T.;
Mewes, H.W.; Wichmann, H.E.; Weinberger, K.M.; Adamski, J.; Illig, T. & Suhre, K.
(2008). Genetics meets metabolomics: a genome-wide association study of
metabolite profiles in human serum, PLoS Genet, Vol. 4, No. 11, pp. e1000282.
Ginsberg, G.; Smolenski, S.; Neafsey, P.; Hattis, D.; Walker, K.; Guyton, K.Z.; Johns, D.O. &
Sonawane, B. (2009). The influence of genetic polymorphisms on population
variability in six xenobiotic-metabolizing enzymes, J Toxicol Environ Health B Crit
Rev, Vol. 12, No. 5-6, pp. 307-33.
Go, E.P. (2010). Database resources in metabolomics: an overview, J Neuroimmune Pharmacol,
Vol. 5, No. 1, pp. 18-30.
Guyton, K.Z.; Kyle, A.D.; Aubrecht, J.; Cogliano, V.J.; Eastmond, D.A.; Jackson, M.; Keshava,
N.; Sandy, M.S.; Sonawane, B.; Zhang, L.; Waters, M.D. & Smith, M.T. (2009).
Improving prediction of chemical carcinogenicity by considering multiple
mechanisms and applying toxicogenomic approaches, Mutat Res, Vol. 681, No. 2-3,
pp. 230-40.
Han, J.; Antunes, L.C.; Finlay, B.B. & Borchers, C.H. (2010). Metabolomics: towards
understanding host-microbe interactions, Future Microbiol, Vol. 5, No. 2, pp. 153-61.
Harvey, B.G.; Heguy, A.; Leopold, P.L.; Carolan, B.J.; Ferris, B. & Crystal, R.G. (2007).
Modification of gene expression of the small airway epithelium in response to
cigarette smoking, J Mol Med (Berl), Vol. 85, No. 1, pp. 39-53.
Heby, O. (1981). Role of polyamines in the control of cell proliferation and differentiation,
Differentiation, Vol. 19, No. 1, pp. 1-20.
Hecht, S.S. (2006). Cigarette smoking: cancer risks, carcinogens, and mechanisms,
Langenbecks Arch Surg, Vol. 391, No. 6, pp. 603-13.
Heijne, W.H.; Kienhuis, A.S.; van Ommen, B.; Stierum, R.H. & Groten, J.P. (2005). Systems
toxicology: applications of toxicogenomics, transcriptomics, proteomics and
metabolomics in toxicology, Expert Rev Proteomics, Vol. 2, No. 5, pp. 767-80.
Hu, J.Z.; Rommereim, D.N.; Minard, K.R.; Woodstock, A.; Harrer, B.J.; Wind, R.A.; Phipps,
R.P. & Sime, P.J. (2008). Metabolomics in lung inflammation: a high-resolution (1)H
NMR study of mice exposed to silica dust, Toxicol Mech Methods, Vol. 18, No. 5, pp.
385-98.
Huang, X.; Shao, L.; Gong, Y.; Mao, Y.; Liu, C.; Qu, H. & Cheng, Y. (2008). A metabonomic
characterization of CCl4-induced acute liver failure using partial least square
regression based on the GC/MS metabolic profiles of plasma in mice, J Chromatogr
B Analyt Technol Biomed Life Sci, Vol. 870, No. 2, pp. 178-85.
Hwang, E.S. & Kim, G.H. (2007). Biomarkers for oxidative stress status of DNA, lipids, and
proteins in vitro and in vivo cancer research, Toxicology, Vol. 229, No. 1-2, pp. 1-10.
Ichi, I.; Kamikawa, C.; Nakagawa, T.; Kobayashi, K.; Kataoka, R.; Nagata, E.; Kitamura, Y.;
Nakazaki, C.; Matsura, T. & Kojo, S. (2009). Neutral sphingomyelinase-induced
ceramide accumulation by oxidative stress during carbon tetrachloride
intoxication, Toxicology, Vol. 261, No. 1-2, pp. 33-40.
Idle, J.R. & Gonzalez, F.J. (2007). Metabolomics, Cell Metab, Vol. 6, No. 5, pp. 348-51.
Kelada, S.N.; Eaton, D.L.; Wang, S.S.; Rothman, N.R. & Khoury, M.J. (2003). The role of
genetic polymorphisms in environmental health, Environ Health Perspect, Vol. 111,
No. 8, pp. 1055-64.
Kelsen, S.G.; Duan, X.; Ji, R.; Perez, O.; Liu, C. & Merali, S. (2008). Cigarette smoke induces
an unfolded protein response in the human lung: a proteomic approach, Am J
Respir Cell Mol Biol, Vol. 38, No. 5, pp. 541-50.
Kim, K.-B.; Chung, M.W.; Um, S.Y.; Oh, J.S.; Kim, S.H.; Na, M.A.; Oh, H.Y.; Cho, W.-S. &
Choi, K.H. (2008). Metabolomics and biomarker discovery: NMR spectral data of
urine and hepatotoxicity by carbon tetrachloride, acetaminophen, and D-galactosamine
in rats, Metabolomics, Vol. 4, pp. 377-392.
Krewski, D.; Acosta, D., Jr.; Andersen, M.; Anderson, H.; Bailar, J.C., 3rd; Boekelheide, K.;
Brent, R.; Charnley, G.; Cheung, V.G.; Green, S., Jr.; Kelsey, K.T.; Kerkvliet, N.I.; Li,
A.A.; McCray, L.; Meyer, O.; Patterson, R.D.; Pennie, W.; Scala, R.A.; Solomon,
G.M.; Stephens, M.; Yager, J. & Zeise, L. (2010). Toxicity testing in the 21st century:
a vision and a strategy, J Toxicol Environ Health B Crit Rev, Vol. 13, No. 2-4, pp. 51-
138.
Lin, Y.; Si, D.; Zhang, Z. & Liu, C. (2009). An integrated metabonomic method for profiling
of metabolic changes in carbon tetrachloride induced rat urine, Toxicology, Vol. 256,
No. 3, pp. 191-200.
Liu, J.Y.; Tsai, H.J.; Hwang, S.H.; Jones, P.D.; Morisseau, C. & Hammock, B.D. (2009).
Pharmacokinetic optimization of four soluble epoxide hydrolase inhibitors for use
in a murine model of inflammation, Br J Pharmacol, Vol. 156, No. 2, pp. 284-96.
Manibusan, M.K.; Odin, M. & Eastmond, D.A. (2007). Postulated carbon tetrachloride mode
of action: a review, J Environ Sci Health Part C, Vol. 25, No. 3, pp. 185-209.
Meng, Q.R.; Gideon, K.M.; Harbo, S.J.; Renne, R.A.; Lee, M.K.; Brys, A.M. & Jones, R. (2006).
Gene expression profiling in lung tissues from mice exposed to cigarette smoke,
lipopolysaccharide, or smoke plus lipopolysaccharide by inhalation, Inhal Toxicol,
Vol. 18, No. 8, pp. 555-68.
Metabolomics Approach for Hazard Identification in
Human Health Assessment of Environmental Chemicals 363
Monnet-Tschudi, F.; Zurich, M.G. & Honegger, P. (1996). Comparison of the developmental
effects of two mercury compounds on glial cells and neurons in aggregate cultures
of rat telencephalon, Brain Res, Vol. 741, No. 1-2, pp. 52-9.
Mortensen, H.M. & Euling, S.Y. (2011). Integrating mechanistic and polymorphism data to
characterize human genetic susceptibility for environmental chemical risk
assessment in the 21st century, Toxicol Appl Pharmacol (In Press).
Myers, G.J.; Thurston, S.W.; Pearson, A.T.; Davidson, P.W.; Cox, C.; Shamlaye, C.F.;
Cernichiari, E. & Clarkson, T.W. (2009). Postnatal exposure to methyl mercury from
fish consumption: a review and new data from the Seychelles Child Development
Study, Neurotoxicology, Vol. 30, No. 3, pp. 338-49.
NRC (1983). Risk assessment in the federal government: Managing the process, National
Academies Press, Washington, DC. Available from:
http://www.nap.edu/openbook.php?isbn=0309033497
NRC (1994). Science and judgment in risk assessment, National Academies Press, Washington,
DC. Available from: http://www.nap.edu/catalog/2125.html
Pegg, A.E.; Matsui, I.; Seely, J.E.; Pritchard, M.L. & Poso, H. (1981). Formation of putrescine
in rat liver, Med Biol, Vol. 59, No. 5-6, pp. 327-33.
Raamsdonk, L.M.; Teusink, B.; Broadhurst, D.; Zhang, N.; Hayes, A.; Walsh, M.C.; Berden,
J.A.; Brindle, K.M.; Kell, D.B.; Rowland, J.J.; Westerhoff, H.V.; van Dam, K. &
Oliver, S.G. (2001). A functional genomics strategy that uses metabolome data to
reveal the phenotype of silent mutations, Nat Biotechnol, Vol. 19, No. 1, pp. 45-50.
Rahman, I. & MacNee, W. (1999). Lung glutathione and oxidative stress: implications in
cigarette smoke-induced airway disease, Am J Physiol, Vol. 277, No. 6 Pt 1, pp.
L1067-88.
Robertson, D.G. (2005). Metabonomics in toxicology: a review, Toxicol Sci, Vol. 85, No. 2, pp.
809-22.
Robertson, D.G.; Reily, M.D.; Sigler, R.E.; Wells, D.F.; Paterson, D.A. & Braden, T.K. (2000).
Metabonomics: evaluation of nuclear magnetic resonance (NMR) and pattern
recognition technology for rapid in vivo screening of liver and kidney toxicants,
Toxicol Sci, Vol. 57, No. 2, pp. 326-37.
Roux, A.; Lison, D.; Junot, C. & Heilier, J.F. (2011). Applications of liquid chromatography
coupled to mass spectrometry-based metabolomics in clinical chemistry and
toxicology: A review, Clin Biochem, Vol. 44, No. 1, pp. 119-35.
Rubino, F.M.; Pitton, M.; Di Fabio, D. & Colombi, A. (2009). Toward an "omic"
physiopathology of reactive chemicals: thirty years of mass spectrometric study of
the protein adducts with endogenous and xenobiotic compounds, Mass Spectrom
Rev, Vol. 28, No. 5, pp. 725-84.
Schmelzer, K.R.; Wheelock, A.M.; Dettmer, K.; Morin, D. & Hammock, B.D. (2006). The role of
inflammatory mediators in the synergistic toxicity of ozone and 1-nitronaphthalene in
rat airways, Environ Health Perspect, Vol. 114, No. 9, pp. 1354-60.
Serhan, C.N. (2009). Systems approach to inflammation resolution: identification of novel
anti-inflammatory and pro-resolving mediators, J Thromb Haemost, Vol. 7, Suppl. 1,
pp. 44-8.
Spira, A.; Beane, J.; Pinto-Plata, V.; Kadar, A.; Liu, G.; Shah, V.; Celli, B. & Brody, J.S. (2004).
Gene expression profiling of human lung tissue from smokers with severe
emphysema, Am J Respir Cell Mol Biol, Vol. 31, No. 6, pp. 601-10.
Steiling, K.; Kadar, A.Y.; Bergerat, A.; Flanigon, J.; Sridhar, S.; Shah, V.; Ahmad, Q.R.; Brody,
J.S.; Lenburg, M.E.; Steffen, M. & Spira, A. (2009). Comparison of proteomic and