model seed tutorial part 1: technical primer · 2010. 10. 26. · model seed tutorial part 1:...
TRANSCRIPT
Model SEED Tutorial Part 1:Technical Primer
Christopher Henry, Scott Devoid, Matt DeJongh, Aaron Best, Ross Overbeek, and Rick Stevens
Presented by: Christopher Henry
August 31-September 2
Environment
Genome + Environment = Phenotype
Transcription
Translation
Proteins
Biochemical Circuitry
Phenotypes (Organism Properties)
DNA (storage)
Gene Expression
Metabolomics
Proteomics
Adapted From Bruno Sobral VBI
Predicting Phenotypes from Genotypes the prediction of system level behavior from collections of functional components
Metabolic Modeling is One Key to Predicting Phenotype from Genotype
What is a metabolic model?1.) A list of all reactions involved in the metabolic pathways
2.) A list of rules associating reaction activity to gene activity
3.) A biomass reaction listing essential building blocks needed for growth and division Gene A
Function
Gene B
Function
Enzyme
Biomass
Amino acids
Nucleotides
Lipids
CofactorsCell walls
Energy
Nutrients
Metabolic Modeling is One Key to Predicting Phenotype from Genotype
What can a metabolic model do?1.) Predict culture conditions and possible responses to environment changes.2.) Predict metabolic capabilities from genotype.3.) Predict impact of genetic perturbations
Gene A
Function
Gene B
Function
Enzyme
Biomass
Amino acids
Nucleotides
Lipids
CofactorsCell walls
Energy
Nutrients Byproducts
Metabolic Modeling is One Key to Predicting Phenotype from Genotype
What can a metabolic model do?1.) Predict culture conditions and possible responses to environment changes.2.) Predict metabolic capabilities from genotype.3.) Predict impact of genetic perturbations4.) Linking annotations to observed organism behavior enabling validation and correction of annotations
BiomassMODEL
ANNOTATION PREDICTION
PHENOTYPE
RECONCILIATION
1
10
100
1000
Num
ber o
f gen
omes
Sequenced prokaryotes in NCBI
Manually curated published models
Total published models
Num
ber o
f mod
els
Automatically generated SEED models
Model reconstruction lags behind genome sequencing
Year
Num
ber o
f gen
omes
100
101
102
103
104
105Manually curated modelsAutomatically reconstructed models
Sequenced prokaryotes
0
30
60
90
120
150
180
•≈1000 completely sequenced prokaryotes vs ≈30 published genome-scale models
•Models are often constructed one-at-a-time by individuals working independently
•Model building typically begins by identifying bidirectional best hits with E. coli
•Current process results in replication of work, propagation of errors, and extensive manual curation
•Bottom line: it previously required approximately one year to produce a complete model
www.theseed.org/models/
Model SEED: Converting Annotated Genomes into Genome-scale Metabolic Models
Preliminary reconstruction
Annotated genome in SEED
RAST annotation server
•A biochemistry database was constructed combining content from the KEGG and 13 published genome-scale models into a non-redundant set of compounds and reactions
•Reactions were then mapped to the functional roles in the SEED based on EC number, substrate names, and enzyme names:
Acetinobacter: iAbaylyiv4 (874 rxn) M. barkeri: iAF692 (620 rxn)
Combined SEED Database
(12,103 rxn)
M. genitalium: iPS189 (263 rxn)
M. tuberculosis: iNJ661 (975 rxn)
P. putida: iJN746 (949 rxn)
S. aureus: iSB619 (649 rxn)
S. cerevisiae: iND750 (1149 rxn)
B. subtilis: iAG612 (598 rxn)
E. coli: iAF1260 (2078 rxn)
E. coli: iJR904 (932 rxn)
H. pylori: iIT341 (476 rxn)
L. lactis: iAO358 (619 rxn)
B. subtilis: iYO844 (1020 rxn)
(8000 rxn)
NAD+ + NADPH NADH + NADP+
NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2)
NAD(P) transhydrogenase subunit beta (EC 1.6.1.2)
REACTION FUNCTIONAL ROLE GENE
peg.100
peg.101
COMPLEX
Gene complex
Biochemistry Database in the SEED
www.theseed.org/models/
Biomass Objective Function•To test growth of the model, we build a biomass objective function template
Biomass
DNA
RNA
Protein
Cell wall
Lipids
Cofactors and ions
Energy
dATP, dGTP, dCTP, dTTP
ATP+H2O→ADP+Pi
ATP, GTP, CTP, UTP
Amino acids
Peptioglycan
Various acylglycerols
Nutrients
•Each biomass component may be rejected from the biomass reaction of a model based on the following criteria:
•Subsystem representation
•Functional role presence
•Taxonomy
•Cell wall types
Misc
Cell wallTeichoic acid
Cell wallCore lipid A Gram negative
Universal
Universal
Universal
Universal
Depends on genome
Gram positive
Any genome with cell wall
Depends on genome
www.theseed.org/models/
Model SEED: Converting Annotated Genomes into Genome-scale Metabolic Models
? Biomass
?
Preliminary reconstruction
Predicted 56 missing metabolic functions/
model
Predicted cell-host
interactions
Annotated genome in SEED
RAST annotation server
Auto-completion
Assuming Steady State:
No internal metabolite is allowed to accumulate
Thus, reaction rates are constrained by mass balances
For example:
v3 = v4
At Steady State:
v1 = v2
v4+v5 = v6
v2 =v3+v5+v7
A B
C
D1 2
3
5
4
6
7
The Cell
By product
BiomassNutrient
www.theseed.org/models/
Flux Balance Analysis
Flux Balance Analysis
A B
C
D1 2
3
5
4
6
7
The Cell
7
6
5
4
3
2
1
vvvvvvv
1 -1 0 0 0 0 00 1 -1 0 -1 0 -10 0 1 -1 0 0 00 0 0 1 1 -1 0
A
B
C
D
1 2 3 54 6 7
V
• =
0000
By product
BiomassNutrient
www.theseed.org/models/
Genome Annotations Contain Knowledge Gaps
chromosome
mRNA
proteinchaperone ribosome
flagella
transcription factor
??
?
?
?
?
www.theseed.org/models/
Flux Balance Analysis
A B
C
D1
3
5
4
6
7
The Cell
7
6
5
4
3
2
1
vvvvvvv
1 -1 0 0 0 0 00 1 -1 0 -1 0 -10 0 1 -1 0 0 00 0 0 1 1 -1 0
A
B
C
D
1 2 3 54 6 7
V
• =
0000
By product
BiomassNutrient
?
www.theseed.org/models/
Model Auto-completion Optimization
Objective:
Subject to:
Mass balance constraints: Compounds in model
Compounds not in model
Ncore vcore
vdb
0Ndb
Ndb0
Use variable constraints: ≤ ≤, ,0 i forward Max i forwardv v z
≤ ≤, ,0 i reverse Max i reversev v z0biomassv >Forcing positive growth:
=
+ + + ∑ , ,not in model , ,reversible not in model , ,irreversible not in model , ,irreversible in model1
Minimize 2r
i forward i reverse i reverse i reversei
z z z z
Penalizing addition of reactions to the model
Penalizing reversibility adjustments
www.theseed.org/models/
Genome Annotation: the Subsystems Approach
chromosome
mRNA
proteinchaperone ribosome
flagella
transcription factor
?
?
?
www.theseed.org/models/
Model SEED: Converting Annotated Genomes into Genome-scale Metabolic Models
? Biomass
?
130 new metabolic models
Analysis-ready models
Preliminary reconstruction
Predicted 56 missing metabolic functions/
model
Predicted cell-host
interactions
Predicted growth media
66%
Model accuracy
Predicted gene essentiality
Predicted phenotypes
•965 reactions•688 genes•876 metabolites
*
Annotated genome in SEED
RAST annotation server
Auto-completion
•SEED models were used to predict the output of 14 biolog phenotyping arrays
•Average accuracy: 60%
•SEED models were used to predict essential genes for 14 experimental gene essentiality datasets
•Average accuracy: 72%
Overall accuracy: 66%
Essentiality data
Biolog phenotype dataAccuracy Before Optimization
0%
20%
40%
60%
80%
100%
0%
20%
40%
60%
80%
100%
Esse
ntia
lity
pred
ictio
n ac
cura
cyBi
olog
pre
dict
ion
accu
racy
www.theseed.org/models/
Model SEED: Converting Annotated Genomes into Genome-scale Metabolic Models
? Biomass
?
130 new metabolic models
Analysis-ready models
Preliminary reconstruction
Predicted 56 missing metabolic functions/
model
Predicting 69 missing transporters/model
Predicted cell-host
interactions
Predicted growth media
66%
71%
Model accuracy
Predicted gene essentiality
Predicted phenotypes
•965 reactions•688 genes•876 metabolites
*
Annotated genome in SEED
RAST annotation server
Auto-completion
Biolog consistency analysis
Esse
ntia
lity
pred
ictio
n ac
cura
cyBi
olog
pre
dict
ion
accu
racy
•Add transporters for Biolog nutrients if missing from models
•69 transporters added to each model on average
•Average accuracy: 70%
•Accuracy unchanged: 72%
Overall accuracy: 71%
Essentiality data
Biolog phenotype data
0%
20%
40%
60%
80%
100%
0%
20%
40%
60%
80%
100%Biolog Consistency Analysis
www.theseed.org/models/
Model SEED: Converting Annotated Genomes into Genome-scale Metabolic Models
? Biomass
?
130 new metabolic models
Gene essentiality consistency analysis
Analysis-ready models
Preliminary reconstruction
Predicted 56 missing metabolic functions/
model
Predicting 69 missing transporters/model
Correction for 202 annotations inconsistent with essentiality data
Predicted cell-host
interactions
Predicted growth media
66%
71%
74%
Model accuracy
Predicted gene essentiality
Predicted phenotypes
•965 reactions•688 genes•876 metabolites
Essential gene A
Essential gene B
Nonessential gene C
Reaction
Original GPR
Corrected GPR
*
Annotated genome in SEED
RAST annotation server
Auto-completion
Biolog consistency analysis
Essential gene
Nonessential gene
A B
Essential gene A
Essential gene B
A B
•Reconciling annotation inconsistent with essentiality data
Essentiality data
•Accuracy 78%
Biolog phenotype data
•Accuracy unchanged: 70%
Overall accuracy: 75%
Esse
ntia
lity
pred
ictio
n ac
cura
cyBi
olog
pre
dict
ion
accu
racy
0%
20%
40%
60%
80%
100%
0%
20%
40%
60%
80%
100%Annotation Consistency Analysis
www.theseed.org/models/
Model SEED: Converting Annotated Genomes into Genome-scale Metabolic Models
? Biomass
?
130 new metabolic models
Model opt: GapFill
Gene essentiality consistency analysis
Analysis-ready models
Preliminary reconstruction
Predicted 56 missing metabolic functions/
model
Predicting 69 missing transporters/model
Correction for 202 annotations inconsistent with essentiality data
Correcting reversibility constraints
Predicted cell-host
interactions
Predicted growth media
A B
A B
A B
66%
71%
74%
82%
Model accuracy
Predicted gene essentiality
Predicted phenotypes
•965 reactions•688 genes•876 metabolites
? Biomass
?
Predicted missing and
extra metabolic functions
Essential gene A
Essential gene B
Nonessential gene C
Reaction
Original GPR
Corrected GPR
*
Annotated genome in SEED
RAST annotation server
Auto-completion
Biolog consistency analysis
Additional gap filling:
Biolog accuracy
•Average accuracy: 83%
Essentiality accuracy
•Average accuracy: 81%
Overall accuracy: 82%
GrowthNo growth
In vivo
In s
ilico No
grow
thG
row
th
•Fix false negative predictions by adding reactions to models
Esse
ntia
lity
pred
ictio
n ac
cura
cyBi
olog
pre
dict
ion
accu
racy
0%
20%
40%
60%
80%
100%
0%
20%
40%
60%
80%
100%Model Optimization: Gap Filling
www.theseed.org/models/
Model SEED: Converting Annotated Genomes into Genome-scale Metabolic Models
? Biomass
?
130 new metabolic models
Model opt: GapFill
Model opt: GapGen
Gene essentiality consistency analysis
Analysis-ready models
Preliminary reconstruction
Predicted 56 missing metabolic functions/
model
Predicting 69 missing transporters/model
Correction for 202 annotations inconsistent with essentiality data
Correcting reversibility constraints
Predicted cell-host
interactions
Predicted growth media
A B
A B
A B
66%
71%
74%
82%
87%
Model accuracy
Predicted gene essentiality
Predicted phenotypes
•965 reactions•688 genes•876 metabolites
? Biomass
?
Predicted missing and
extra metabolic functions
Essential gene A
Essential gene B
Nonessential gene C
Reaction
Original GPR
Corrected GPR
*
Annotated genome in SEED
RAST annotation server
Auto-completion
Biolog consistency analysis
Model Optimization: Gap GenerationAdditional gap filling:
Biolog accuracy
•Average accuracy: 88%
Essentiality accuracy
•Average accuracy: 85%
Overall accuracy: 87%
GrowthNo growth
In vivo
In s
ilico No
grow
thG
row
th
•Fix false positive predictions by removing reactions from models
Esse
ntia
lity
pred
ictio
n ac
cura
cyBi
olog
pre
dict
ion
accu
racy
0%
20%
40%
60%
80%
100%
0%
20%
40%
60%
80%
100%
www.theseed.org/models/
Model SEED: Converting Annotated Genomes into Genome-scale Metabolic Models
? Biomass
?
130 new metabolic models
Model opt: GapFill
Model opt: GapGen
Optimized models
Gene essentiality consistency analysis
Analysis-ready models
Preliminary reconstruction
22 optimized models
Predicted 56 missing metabolic functions/
model
Predicting 69 missing transporters/model
Correction for 202 annotations inconsistent with essentiality data
Correcting reversibility constraints
Predicted cell-host
interactions
Predicted growth media
A B
A B
A B
66%
71%
74%
82%
87%
Model accuracy
Predicted gene essentiality
Predicted phenotypes
•965 reactions•688 genes•876 metabolites
? Biomass
?
Predicted missing and
extra metabolic functions
Essential gene A
Essential gene B
Nonessential gene C
Reaction
Original GPR
Corrected GPR
*
Annotated genome in SEED
RAST annotation server
Auto-completion
Biolog consistency analysis
Seed Models vs Published ModelsOrganism name Published model Published reactions SEED Reactions Published genes SEED genesAcinetobacter iAbaylyiv4 868 1196 775 785B. subtilis iYO844 1020 1463 844 1041C. acetobutylicum iJL432 502 989 432 721E. coli iAF1260 2013 1529 1261 1083G. sulfurreducens iRM588 523 721 588 468H. influenzae iCS400 461 969 400 575H. pylori iIT341 476 731 341 421L. plantarum iBT721 643 908 721 699L. lactis iAO358 621 965 358 646M. succiniciproducens iTK425 686 1048 425 659M. tuberculosis iNJ661 939 1021 661 728M. genitalium iPS189 264 294 189 214N. meningitidis iGB555 496 903 555 560P. gingivalis iVM679 679 744 0* 399P. aeruginosa iMO1056 883 1386 1056 1094P. putida iNJ746 950 1261 746 1053R. etli iOR363 387 1264 363 1242S. aureus iSB619 641 1115 619 770S. coelicolor iIB700 700 1159 700 987
•Single-genome Seed models compare favorably with published single genome models
www.theseed.org/models/
Assessing Subsystem Annotations From Auto-completion
•We identify how complete the annotations are for each of the Seed subsystems by calculating the following ratio:
auto-completion reactions in subsystem
total reactions in subsystem
•Highest scoring subsystems:
•Cell Wall and Capsule Biosynthesis (15%)•21 reactions per model added during auto-completion•LOS Core Oligosaccharide Biosynthesis (Gram negative)•Teichoic and Lipoteichoic Acids Biosynthesis (Gram positive)•KDO2-Lipid A Biosynthesis
•Cofactors, Vitamins, and Prosthetic Group Biosynthesis (5%)•10 reaction per model added during auto-completion•Ubiquinone Biosynthesis•Menaquinone and Phylloquinone Biosynthesis•Thiamin Biosynthesis
•Six subsystems account for 31/56 reactions added to each model during the auto-completion process
Fraction of subsystem reactions with missing genes
=
www.theseed.org/models/
Model statistics across the phylogenetic tree
www.theseed.org/models/
1125/795/46γ-proteobacteria (34)
α-proteobacteria (18)944/705/64
δ-proteobacteria (5)951/663/54
spirochaetes (3)528/346/82
bacteroidetes (4)931/648/70
ε-proteobacteria (6)788 /471/61
β-proteobacteria (12)1115/870/39
elusimicrobium (1)737/415/88
fusobacteria (1)786/534/64
mollicutes (3)295/210/50
bacilli (21)1051/757/47
clostridia (5)997/728/59
572/296/86chlamydia (2)
dehalococcoides (1)600/381/105
actinobacteria (9)949/696/74
deinococcus/thermus (2)976/657/53
aquificae (1)741/493/74
thermotogae (1)855/516/60
firmicutes proteobacteria
Bacteria group (number of models)Reactions/genes/auto-completion reactions
1041/751/51963/695/49
0
400
800
1200
1600
0 20 40 60 80 100 120 140Tota
l Re
actio
ns i
n M
odel
Auto-completion reactions
2.5% 5%10%
30%
Reaction Activity Across All Models
www.theseed.org/models/
0
200
400
600
800
1000
0 400 800 1200 1600Reactions in models
Reac
tions
in e
ach
clas
s EssentialActive nonessentialInactive
80%
40%
20%
5%
www.theseed.org/models/
0
400
800
1200
1600
0 3000 6000 9000
Gen
es in
eac
h cl
ass
Genes in modeled genomes
Essential genesGenes in model
10%
15%25%40%
Essential Genes Across All Models
www.theseed.org/models/
0
10
20
30
40
50
0 400 800 1200 1600
Esse
ntia
l nut
rien
ts
Reactions in models
Essential Nutrients Across All Models
www.theseed.org
AcknowledgementsANL/U. Chicago Team- Robert Olson- Terry Disz- Daniela Bartels- Tobias Paczian- Daniel Paarmann- Scott Devoid- Andreas Wilke- Bill Mihalo- Elizabeth Glass- Folker Meyer- Jared Wilkening- Rick Stevens- Alex Rodriguez- Mark D’Souza- Rob Edwards- Christopher Henry
FIG Team- Ross Overbeek- Gordon Pusch- Bruce Parello- Veronika Vonstein- Andrei Ostermann- Olga Vassieva- Olga Zagnitzko- Svetlana GerdesHope College Team- Aaron Best- Matt DeJongh- Nathan Tintle - Hope college students