metabolomics & metabolite atlases
DESCRIPTION
Dealing With the Unknown. Metabolomics & Metabolite Atlases. Ben Bowen, Pathway Tools Workshop 2010. Acknowledgements: Trent Northen, Richard Baran, Wolfgang Reindl, Do Yup Lee, Jane Tanamachi, Jill Banfield, Curt Fisher, Paul Wilmes.
Metabolomics & Metabolite Atlases
Dealing With the Unknown
Ben Bowen
Pathway Tools Workshop 2010
Acknowledgements
Trent Northen
Richard Baran
Wolfgang Reindl
Do Yup Lee
Jane Tanamachi
Jill Banfield
Curt Fisher
Paul Wilmes
US Department of Energy BER Genome Sciences Program
LC-MS/MS Workflow
Sample independent: suitable for unsequenced organisms and communities
Metabolite solvent extraction
HPLC (C18; HILIC)
Agilent 6520 QTOF
MS/MS
Metabolite 'features' & Quantification
C18NEG/255.22807/3.39329/Hexadecanoic acid;
C18NEG/255.22862/4.89002/Hexadecanoic acid;
C18NEG/248.8424/1.47135/24-Dibromophenol;
C18NEG/112.98576/27.34079/Acetylenedicarboxylate;
C18NEG/270.82471/1.34821/
C18NEG/168.88735/1.29241/
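The feature strings on this slide appear to follow a method/m/z/retention time/name pattern. A minimal parsing sketch in Python, assuming that field order, a semicolon between records, and an empty name for unannotated features (the Feature type and parse_features function are hypothetical names, not part of any tool named in the talk):

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    method: str           # column and polarity, e.g. "C18NEG" (assumed meaning)
    mz: float             # measured m/z
    rt: float             # retention time, in whatever unit the export used
    name: Optional[str]   # None when the feature carries no annotation


def parse_features(text: str) -> List[Feature]:
    """Parse 'method/mz/rt/name;' records into Feature tuples."""
    features = []
    for record in text.split(";"):
        record = record.strip()
        if not record:
            continue
        # Split on the first three slashes only, so a name may contain "/".
        method, mz, rt, name = record.split("/", 3)
        features.append(Feature(method, float(mz), float(rt), name or None))
    return features


sample = "C18NEG/255.22807/3.39329/Hexadecanoic acid;C18NEG/270.82471/1.34821/;"
for feature in parse_features(sample):
    print(feature)
```

Keeping the method, m/z, and retention time together is what lets the same compound show up as two features (as hexadecanoic acid does here at two retention times).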
How a data point becomes a compound
From Feature to Formula
• Selection of features
• Pure Spectra
• Isotopic pattern fitting
• Stable Isotope Labeling
From Formula to Compound
• Exact Match to MS/MS Spectra
• Partial Match to MS/MS Spectra
• Exchangeable hydrogen
• Retention time
• Authentic standards
• Other (NMR & Synthesis)
Annotation of Metabolite Atlases
• Define feature in database
• Sample Metadata
• Extraction methods
• LC/MS methods
• mz@rt annotations
Photo: John Waterbury, Woods Hole Oceanographic Institution (DOE)
Systems biology depends on accurate models
Analysis of MetaCyc shows that many unique formulas appear in only a few reactions or pathways
• Models provide a framework to prove or disprove observations.
• Highlight gaps in annotations when new compounds are discovered
Pathway Specific Markers
Or
Sparsity of Knowledge
Using inexact mass for formula ID
Isotopic Pattern Fitting
C & N Isotopic Labels
Reduce Degeneracy About m/z value
Mass and Degeneracy are Correlated
Heuristically Filtered
Brute Force Method
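To make the brute-force idea concrete, here is a minimal Python sketch that enumerates CHNO formulas whose monoisotopic mass falls within a ppm window of a measured neutral mass. The element set, the atom-count limits, the 5 ppm default, and the single hydrogen-count heuristic are all assumptions chosen for illustration, not the filters used in the talk:

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def candidate_formulas(neutral_mass, ppm=5.0, max_c=40, max_n=10, max_o=20,
                       heuristic=True):
    """Return (C, H, N, O) tuples within +/- ppm of neutral_mass."""
    tol = neutral_mass * ppm * 1e-6
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = neutral_mass - (c * MONO["C"] + n * MONO["N"] + o * MONO["O"])
                if rest < -tol:
                    break  # adding more oxygen only overshoots further
                h = round(rest / MONO["H"])  # hydrogens fill the remainder
                if h < 0:
                    continue
                # Illustrative heuristic: no more H than a saturated skeleton allows.
                if heuristic and h > 2 * c + n + 2:
                    continue
                mass = (c * MONO["C"] + h * MONO["H"]
                        + n * MONO["N"] + o * MONO["O"])
                if abs(mass - neutral_mass) <= tol:
                    hits.append((c, h, n, o))
    return hits


# Neutral monoisotopic mass of C11H12N2O2 is about 204.0899 Da.
print(candidate_formulas(204.0899))
```

Running it at increasing masses with the same ppm window is one way to see the degeneracy the slide refers to: the number of formulas that fit grows with mass.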
CONTROL
Na15NO3
NaH13CO3
Large-scale formula determination using stable isotopic labeling
Baran et al., Untargeted metabolite profiling of Synechococcus sp. PCC 7002 reveals a large fraction of unexpected metabolites (Analytical Chemistry, 2010)
Problem: Many metabolites are difficult to identify given the low coverage of authentic standards
Approach: Stable isotope labeling (SIL) for direct empirical formula determination
Less Degeneracy Isn’t Better
We Prefer to Work With Unique Chemical Formulae
Heuristically Filtered Only
Heuristically Filtered + SIL
Unfiltered + SIL
Noise & Isotopic Patterns
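A rough sketch of how stable isotope labeling can narrow the candidate list: the m/z shift between the control and the fully 13C- or 15N-labeled ion gives the carbon and nitrogen counts directly. The code assumes complete labeling, a known charge, and candidate formulas stored as dictionaries; the function names and the second candidate are hypothetical:

```python
DELTA_13C = 13.0033548 - 12.0          # mass gain per 13C-for-12C swap (Da)
DELTA_15N = 15.0001089 - 14.0030740    # mass gain per 15N-for-14N swap (Da)


def atom_counts_from_sil(mz_control, mz_13c, mz_15n, charge=1):
    """Estimate C and N counts from shifts of fully labeled ions."""
    n_c = round((mz_13c - mz_control) * charge / DELTA_13C)
    n_n = round((mz_15n - mz_control) * charge / DELTA_15N)
    return n_c, n_n


def filter_by_sil(candidates, n_c, n_n):
    """Keep only candidate formulas with the observed C and N counts."""
    return [f for f in candidates if f["C"] == n_c and f["N"] == n_n]


# Illustrative values for an ion near m/z 205.097 (C11H12N2O2, [M+H]+).
n_c, n_n = atom_counts_from_sil(205.0972, 216.1341, 207.0912)
candidates = [
    {"C": 11, "H": 12, "N": 2, "O": 2},
    {"C": 7, "H": 12, "N": 2, "O": 5},   # hypothetical competing formula
]
print(n_c, n_n, filter_by_sil(candidates, n_c, n_n))
```

Because the C and N counts are fixed by measurement, the remaining search is over the other elements only, which is why an unfiltered candidate list plus SIL can still end in a unique formula.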
Initial focus is on Synechococcus sp., a simple yet important model system
1. Photosynthetic bacteria
2. Small genome (3299 ORFs)
3. ~fast growing and easy to grow
4. No metabolite background (salt media)
5. Adaptable: 0-2 M salt, T up to 45 °C
Simple system for method development
Widely distributed and globally important in carbon cycling
Benefits of Using SIL
• Are the signals being measured biological?
• What type of ion is the signal?
• Has this signal been seen before?
• What compound(s) is it?
• What else in the sample behaves like that compound?
Global Profiling
Standards
SIL
Stable isotope labeling
Control
13C: [13C]NaHCO3
15N: [15N]NaNO3
Stable isotope labeling
[Figure: features plotted with axes m/z and RT]
Non-biological features dominate
• Manually curated
• Computationally Identified
• Sets are constructed by grouping features by retention time
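Grouping features by retention time can be sketched as a simple one-dimensional clustering. The version below chains features whose retention times differ by no more than a tolerance; the 2-second tolerance, the (m/z, RT) tuple layout, and the function name are assumptions for illustration, not the procedure used to build the sets in the talk:

```python
def group_by_retention_time(features, rt_tol=2.0):
    """Group (mz, rt) features whose neighboring retention times lie within rt_tol."""
    groups = []
    for mz, rt in sorted(features, key=lambda f: f[1]):
        # Extend the current group if this feature elutes close to the last one.
        if groups and rt - groups[-1][-1][1] <= rt_tol:
            groups[-1].append((mz, rt))
        else:
            groups.append([(mz, rt)])
    return groups


features = [(205.097, 300.0), (206.100, 300.5), (188.071, 301.0), (148.060, 420.0)]
for group in group_by_retention_time(features):
    print(group)
```

Features that co-elute (isotopologues, adducts, fragments) land in the same set, which is the starting point for deciding whether a set is biological or not.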
Results
~100 distinct metabolites detected
82 assigned chemical formulas
74 unique
45 outside of Syn7002Cyc
24 outside of MetaCyc or KEGG
54 identified or putatively identified metabolites using authentic standards or MS/MS
Most dominant biological features
Formula | Metabolite | Peak height (cell extract (+) (-), media extract (+) (-)) | Formula matches in 7002 | MetaCyc | KEGG
C9H18O8 | (Glucosylglycerol) | 452242, 658300 | 1 | 2 | 2
C5H9NO4 | Glutamate | 228714, 44229 | 3 | 9 | 10
C25H40N2O18 | (Hexos(amine)-based oligomer) | 184691, 90745 | 0 | 0 | 0
C25H40N2O18 | (Hexos(amine)-based oligomer) | 174581, 152126 | 0 | 0 | 0
C9H16O9 | (Glucosylglycerate) | 39066, 163000 | 0 | 2 | 1
C12H22O11 | (2Hexoses-H2O) | 19819, 83700 | 2 | 26 | 29
C9H15N3O2 | (NNN-trimethylhistidine) | 69974, 2444 | 0 | 1 | 1
Putative hexose(amine)-based trisaccharide:
Excreted metabolites
Formula | Metabolite | Peak height (cell extract (+) (-), media extract (+) (-)) | Formula matches in 7002 | MetaCyc | KEGG
C9H11NO2 | Phenylalanine | 12860, 8878, 24417, 8259 | 1 | 4 | 4
C3H7NO2 | (Alanine) | 3987, 7325, 2479, 1500 | 4 | 7 | 8
C6H13NO2 | Isoleucine | 1200, 1301, 4427, 1532 | 2 | 8 | 11
C6H13NO2 | Leucine | 2089, 1992, 4093, 1707 | 2 | 8 | 11
C11H12N2O2 | Tryptophan | 1778, 2264, 929 | 1 | 2 | 7
C5H11NO2S | Methionine | 950 | 1 | 5 | 4
C5H11NO2 | Valine | 600 | 1 | 8 | 10
C10H14N2O6 | Methyluridine | 220, 570 | 0 | 0 | 2
C11H15N5O5 | Methylguanosine | 350, 140 | 0 | 3 | 1
C11H15N5O4 | Methyladenosine | 310 | 0 | 1 | 2
Histidine-betaine derivatives
[Figure: chemical structures of three histidine-betaine derivatives, two of them carrying an additional HS or HO group]
Previously attributed only to non-yeast fungi and Actinomycetales bacteria
Culture purity validated by PCR of markers of ribosomal RNA and sequencing
Lysine biosynthesis V (Syn7002Cyc)
Lysine biosynthesis VI (Syn7002Cyc)
N2-acetyllysine
Analyze selected features by MS/MS
Target features at specific m/z & r.t.
MS/MS structural confirmation
• Commercial Standards
• METLIN
• MassBank
• Collaborating to expand the number of authentic standards (Siuzdak, Mukhopadhyay) and make these publicly available.
De novo MS/MS analysis
5-methyluridine
Proton Painting
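$\mathrm{C}_i\mathrm{H}_j\mathrm{O}_k\mathrm{N}_x\mathrm{P}_y\mathrm{S}_z \rightarrow \mathrm{C}_i(\mathrm{H}^{\mathrm{N}}_{j_1}\mathrm{H}^{\mathrm{EX}}_{j_2})\mathrm{O}_k\mathrm{N}_x\mathrm{P}_y\mathrm{S}_z$
$j = j_1 + j_2$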
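One way to read the split of hydrogens into exchangeable and non-exchangeable counts is through the m/z shift seen after deuterium exchange. The Python sketch below is an assumption-laden illustration: it supposes full exchange, a singly charged ion, and that the charge-carrying proton of [M+H]+ (or the proton lost in [M-H]-) is itself exchangeable. The function name and the adduct table are hypothetical:

```python
DELTA_D = 2.01410178 - 1.00782503   # mass gain per H-to-D swap (Da)

# Exchangeable protons gained (+1) or lost (-1) on ionization (assumed).
CHARGE_CARRIER = {"[M+H]+": 1, "[M-H]-": -1}


def exchangeable_hydrogens(mz_h, mz_d, adduct="[M+H]+"):
    """Estimate the exchangeable H count of the neutral molecule from the D shift."""
    n_shift = round((mz_d - mz_h) / DELTA_D)   # deuteriums carried by the ion
    return n_shift - CHARGE_CARRIER[adduct]


# Illustrative numbers: an [M+H]+ ion that shifts by five deuteriums
# has four exchangeable hydrogens in the neutral molecule.
exchangeable = exchangeable_hydrogens(100.0, 100.0 + 5 * DELTA_D)
total_h = 12                                   # total H count, hypothetical
print(exchangeable, total_h - exchangeable)    # exchangeable, non-exchangeable
```

The exchangeable count is a chemical property independent of m/z, so two formula-identical candidates with different numbers of OH, NH, or SH groups can be told apart.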
Chemical properties in addition to m/z
decyldimethylammoniopropane sulfonate
Glycylglycine
Lipids from microbial communities
• Unlabeled
• 15N labeled
• 2H labeled (exchangeable)
• Sample independent
Resolve Isomers of lysolipids
Pure-Spectra Includes Ca2+ & Fe2+ Adducts
Absolute abundance of L-PE features is much higher in a “friable” sample.
AB Muck DS2
AB Muck Friable
Relative abundance of various PEs changes with development stage.
Moving from features to formulas to metabolites is challenging
Time (sec)
m/z 205.097
C11H12N2O2
Chemical formula determination
Structural analysis
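The link between the formula and the m/z value on this slide can be checked with a short calculation. The sketch below assumes a singly protonated ion, [M+H]+, and handles only the six elements listed; the function names are illustrative:

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {
    "C": 12.0, "H": 1.00782503, "N": 14.0030740,
    "O": 15.9949146, "P": 30.9737620, "S": 31.9720707,
}
PROTON = 1.00727646  # mass of a proton (Da)


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses for a formula such as 'C11H12N2O2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONO[element] * (int(count) if count else 1)
    return mass


def protonated_mz(formula):
    """m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass(formula) + PROTON


print(round(protonated_mz("C11H12N2O2"), 3))  # 205.097
```

The reverse direction, from m/z back to a formula, is the hard part, because many formulas can share nearly the same mass.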
Retention Time Correlation
After 12 Observations
Store retention time correlations
SIL Automatic Annotation
Test the fit for all possible formulas for common ionization mechanisms
Label Purity and Percent Incorporation are Parameters
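A minimal sketch of why label purity and percent incorporation matter when fitting: with incomplete labeling, an ion with n labelable atoms spreads over several isotopologues. The code models this as a binomial distribution and folds purity and incorporation into one effective enrichment; both the binomial model and the way the two parameters are combined are simplifying assumptions, and the function names are hypothetical:

```python
from math import comb


def effective_enrichment(label_purity, incorporation, natural_abundance=0.0107):
    """Per-atom probability of carrying the heavy isotope (13C default background)."""
    return incorporation * label_purity + (1 - incorporation) * natural_abundance


def labeled_isotopologue_distribution(n_atoms, enrichment):
    """Probability of k heavy atoms out of n_atoms, for k = 0..n_atoms."""
    return [
        comb(n_atoms, k) * enrichment ** k * (1 - enrichment) ** (n_atoms - k)
        for k in range(n_atoms + 1)
    ]


# Example: 11 carbons, 99% pure label, 98% incorporation.
p = effective_enrichment(0.99, 0.98)
dist = labeled_isotopologue_distribution(11, p)
for k, prob in enumerate(dist):
    if prob > 0.001:
        print(k, round(prob, 4))
```

Comparing such a predicted pattern with the observed one for each candidate formula and ion type is one way to score the fit.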
Correlation and mass defect analysis
[Figure: Kendrick Mass Defect vs. Nominal Mass (full range 200-800 and zoom on 650-800), with C2H4 spacing labeled; correlation G vs. m/z lag (0-150 and zoom on 28-28.06), with the C2H4 peak labeled]
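For readers unfamiliar with Kendrick mass defect, a minimal sketch follows. It rescales m/z so that a CH2 unit weighs exactly 14; because C2H4 is two CH2 units, the same scaling applies to the C2H4 spacing named on the slide. The sign convention (Kendrick mass minus its nearest integer) is an assumption, and conventions differ between authors:

```python
CH2_EXACT = 14.01565    # monoisotopic mass of CH2 (Da)
CH2_NOMINAL = 14.0


def kendrick_mass_defect(mz, base_exact=CH2_EXACT, base_nominal=CH2_NOMINAL):
    """Kendrick mass defect of mz relative to the chosen repeating unit."""
    kendrick_mass = mz * base_nominal / base_exact
    return kendrick_mass - round(kendrick_mass)


# Members of a homologous series differing by C2H4 share one defect value.
for mz in (700.5, 700.5 + 28.0313, 700.5 + 2 * 28.0313):
    print(mz, round(kendrick_mass_defect(mz), 4))
```

Plotting the defect against nominal mass turns each homologous series into a horizontal line, which is what makes lipid families easy to pick out.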
Autocorrelation Spectra of unprocessed data
Find the dominant mass differences in data
H2O
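Finding the dominant mass differences can be sketched as a histogram of all pairwise m/z differences, weighted by the product of the two intensities, which is a sparse form of autocorrelation. The bin width, the maximum lag, and the peak-list input are assumptions; the talk works on unprocessed data, and this sketch is not its implementation:

```python
from collections import Counter


def mass_difference_histogram(peaks, bin_width=0.001, max_lag=50.0):
    """Accumulate intensity products of peak pairs into m/z-lag bins."""
    hist = Counter()
    peaks = sorted(peaks)
    for i, (mz_i, int_i) in enumerate(peaks):
        for mz_j, int_j in peaks[i + 1:]:
            lag = mz_j - mz_i
            if lag > max_lag:
                break  # peaks are sorted, so later lags are larger still
            hist[round(lag / bin_width)] += int_i * int_j
    return hist


# Toy peak list: (m/z, intensity). Two pairs are one CH2 apart.
peaks = [(100.0, 1.0), (114.01565, 1.0), (128.0313, 1.0), (118.01056, 1.0)]
hist = mass_difference_histogram(peaks)
best_bin = max(hist, key=hist.get)
print(best_bin * 0.001, hist[best_bin])
```

Lags that recur across many peak pairs, such as H2O or C2H4, stand out as tall bins without any prior feature detection.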
Modular Metabolome
[Figure: Correlation, G, vs. m/z lag over 13.99-14.06]
Estimate the likelihood of all possible chemical differences
How can you know that this is CH2?
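One simple way to approach the CH2 question is to rank small element-count differences by how closely their mass matches the observed lag. The sketch below uses absolute mass error as a stand-in for likelihood, restricts itself to C, H, N, and O, and allows each count to range from -3 to +3; all of these are assumptions made for illustration:

```python
from itertools import product

# Monoisotopic masses (Da) in the order C, H, N, O.
MASSES = (12.0, 1.00782503, 14.0030740, 15.9949146)


def rank_differences(lag, max_count=3, top=5):
    """Return the `top` (error, (dC, dH, dN, dO)) pairs closest to lag."""
    results = []
    counts = range(-max_count, max_count + 1)
    for diff in product(counts, repeat=4):
        mass = sum(n * m for n, m in zip(diff, MASSES))
        results.append((abs(mass - lag), diff))
    results.sort(key=lambda r: r[0])
    return results[:top]


for error, diff in rank_differences(14.01565):
    print(diff, round(error, 5))
```

For a lag of 14.01565 the closest composition is (1, 2, 0, 0), that is CH2, and the next candidate is more than 1 mDa away, which is the kind of separation the mass resolution has to support.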
What can be resolved
[Figure: G vs. m/z lag near 1 (0.98-1.05) and near 0 (-3 to 3 x 10^-3)]
Mass of an electron shown for scale
Time and Mass Correlation
Neutron: Zero Time Correlation
H2O: Mixture of Zero Time and Negative Time Correlation
C2H4: Positive Time Correlation
Relate back to features
[Figure: Correlation, G, vs. m/z lag over 16.94-17.12]
Microbial Metabolite Atlases
[Figure: intensity vs. retention time (sec), full run (600-2400) and zoom (900-1100); retention time (sec) vs. m/z map of features]
From Features to Pure Spectra
Within one experiment: 1000s of features from 100s of metabolites
The End