Post on 19-Dec-2015
Thanks to Harvard/MIT Team:
Jake Jaffe, Kyriacos Leptos,
Matt Wright, Daniel Segre, Martin Steffen
DARPA BIOCOMP 23-May-2002
Model-data integration: issues of flux optimality & polymer mechanics of 4D cell models
gggatttagctcagttgggagagcgccagactgaa gatttg gaggtcctgtgttcgatccacagaattcgcacca
Post-300 genomes & 3D structures
DoD Relevance: Accurate Bio I/O Engineering
Over-determined
Calculable Protein folding vs. crystallography
Accurate Comprehensive/Quantitative Bio-Systems Embrace outliers Analytic & Synthetic Useful Computer-Aided-Design (CAD)
>>INTEGRATION<<
DNA RNA Protein: in vivo & in vitro interactions
Metabolites
Replication rate
Environment
Technical challenge: Integrating Measures & Models
Microbes Cancer & stem cells Darwinian
In vitro replication
Small multicellular organisms
RNAi
Insertions
SNPs
Human Red Blood Cell
ODE model
200 measured parameters
[Figure: red blood cell metabolic network. Glycolysis: GLCe, GLCi, G6P, F6P, FDP, GA3P, DHAP, 1,3 DPG, 2,3 DPG, 3PG, 2PG, PEP, PYR, LACi, LACe. Pentose phosphate pathway: GL6P, GO6P, RU5P, R5P, X5P, S7P, E4P. Nucleotide metabolism: ADO, INO, AMP, IMP, ADE, HYPX, PRPP, R1P (with ADOe, INOe, ADEe). Cofactors and ions: NADP/NADPH, NADH/NAD, ADP/ATP, GSH/GSSG, K+, Na+, Cl-, HCO3-, pH.]
Jamshidi, Edwards, Fahland, Church, Palsson, B.O. (2001) Bioinformatics 17: 286. (http://atlas.med.harvard.edu/gmc/rbc.html)
[Figure: Normalized optimal growth vs. Gene deletions]
Linear Programming Flux Balance Analysis (v_ko = 0)
Minimal Perturbation Analysis for the analysis of non-optimal metabolic phenotypes
Daniel Segre
Challenge #1: Suboptimality of mutants -- integrating growth rate and flux data
This is a Quadratic Programming (QP) problem:
Minimize $\mathrm{Dist} = \sum_i (x_i - a_i)^2$ given $Sx = b$; $x \ge 0$
Standard form:
Minimize $(x^T Q x)/2 + a^T x$ given $Sx = b$; $x \ge 0$
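To see how the distance objective fits the standard form, the square can be expanded. This is a short derivation, not taken from the slides: $I$ is the identity matrix and $c$ is a symbol introduced here for the linear coefficient, which the slide's generic standard form writes as $a$.

```latex
\begin{aligned}
\mathrm{Dist} &= \sum_i (x_i - a_i)^2 = x^T x - 2\,a^T x + a^T a \\
              &= \tfrac{1}{2}\, x^T Q x + c^T x + \text{const},
              \qquad Q = 2I, \quad c = -2a
\end{aligned}
```

The constant $a^T a$ does not change where the minimum lies, so it can be dropped before handing the problem to a QP solver.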
χ² test for prediction of essential genes:
Optimal (FBA): p = 4·10^-3
Suboptimal (MPA): p = 10^-5
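A χ² test of this kind compares predicted against observed gene essentiality in a contingency table. The sketch below is a minimal illustration for a 2x2 table, assuming a plain Pearson χ² with one degree of freedom and no continuity correction. The function name `chi2_2x2` and the counts are hypothetical and are not the data behind the slide.

```python
import math

def chi2_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table.

    Assumes every row and column total is non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts (illustration only):
# rows    = predicted essential / predicted non-essential
# columns = observed essential  / observed non-essential
example = [[20, 5], [5, 20]]
chi2, p = chi2_2x2(example)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

A smaller p-value means the predictions are less likely to match the observations by chance, which is the sense in which the two methods are compared above.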
[Figure: Predicted Fluxes vs. Experimental Fluxes (C009-limited; data points numbered 1-18). Panels: pyk (LP), WT (LP), pyk (QP). Correlation statistics as listed on the slide: 0.91 (p = 8e-8); -0.06 (p = 6e-1); 0.56 (p = 7e-3).]
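The statistics on this slide summarize how well predicted fluxes track experimental ones. The sketch below shows one way to compute such a figure, assuming a Pearson correlation coefficient (the slide does not name the statistic). The function name `pearson` and the flux values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical flux values (arbitrary units), not read off the slide.
experimental = [10.0, 50.0, 100.0, 150.0, 200.0]
predicted = [12.0, 45.0, 110.0, 140.0, 190.0]
print(f"correlation = {pearson(experimental, predicted):.3f}")
```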
Polymer mechanics of 4D cell models
(Automating integration of data)
Challenge #2: integrating proteomics & in vivo crosslinking data
Mapping genome folding
DNA:DNA, DNA:protein, protein:protein in vivo crosslinks
Dekker et al. Science 2002 295:1306-11. Capturing chromosome conformation.
In vivo crosslinking DNA-binding proteins
[Figure: Comparison of Quantification Methods. Log-log plot; x-axis: Fractional Composition (percent - total intensity all peptides), 0.0001 to 100; y-axis: Fractional Composition (percent), 0.001 to 100. Labeled proteins: dps, rpoc, rpob, hns, dbha, ssb, gyrb, ihfa, lon, ihfb, top1, uvra, crp, argr, nusa, hrpa, sspa, fur.]
[Figure: peptide elution profile; axis: Retention time (min)]
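The x-axis above expresses each protein as a percentage of the total intensity of all peptides. The sketch below is a minimal version of that calculation. The function name `fractional_composition` and the intensities are hypothetical, and real pipelines may weight or filter peptides differently.

```python
def fractional_composition(peptide_intensities):
    """Percent of total peptide intensity attributed to each protein.

    peptide_intensities maps a protein name to a list of its peptide intensities.
    """
    totals = {protein: sum(values) for protein, values in peptide_intensities.items()}
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total intensity must be positive")
    return {protein: 100.0 * total / grand_total for protein, total in totals.items()}

# Hypothetical intensities (arbitrary units), for illustration only.
example = {"dps": [3.0e5, 1.0e5], "hns": [4.0e5], "fur": [2.0e5]}
for protein, percent in fractional_composition(example).items():
    print(f"{protein}: {percent:.1f}%")
```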
[Optional 1st & 2nd protein dimensions: subcellular fractions, sizing of native protein complexes]
1st peptide dimension: Strong Cation Exchange (charge)
2nd peptide dimension: Reverse Phase Chromatography (hydrophobicity)
3rd peptide dimension: Mass Spectrometry (mass per charge)
Multidimensional protein and peptide separations for MS quantitation
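The third dimension separates peptides by mass per charge. For positive-mode electrospray, an ion carrying z protons is observed at m/z = (M + z·m_proton)/z. The sketch below shows that relation. The function names and the 1000 Da example mass are hypothetical, and the proton mass is rounded to six decimals.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (rounded)

def mz(neutral_mass, charge):
    """Observed m/z of a peptide of given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

def neutral_mass_from_mz(mz_value, charge):
    """Invert mz(): recover the neutral mass from an observed m/z and charge."""
    return charge * (mz_value - PROTON_MASS)

# Hypothetical peptide of neutral mass 1000 Da at charge states 1 to 3.
for z in (1, 2, 3):
    print(f"z = {z}: m/z = {mz(1000.0, z):.4f}")
```

The same peptide therefore appears at several m/z values, one per charge state, which is one reason a single full scan shows many peaks.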
[Figure, panels A-D: successive full-scan MS1 spectra from KCL_MS0002_BSA3 (+ p ESI Full ms, m/z 400.00-2000.00) at retention times rt1, rt2, rt3: scans #900 (RT 26.79), #901 (RT 26.82), #902 (RT 26.85), #903 (RT 26.88). Axes: m/z vs. Relative Abundance.]
Minimal Cell Projects
• The first FULL proteome model would benefit from a small number of natural cell states & genes.
• 3D-structure of a cell during replication & motility.
• Genome engineering / complete synthesis.
[Figure: Small sequenced genomes (excludes organelle/symbionts); x-axis: year, 1994 to 2002; y-axis: Megabases, 0.0 to 8.0]
Mollicutes = cell-wall-less bacteria, a subgroup of Clostridia ("gram-positive")
o Acholeplasmataceae
  • Acholeplasma, Anaeroplasma, Phytoplasma
o Mycoplasmatales
  • Entomoplasmataceae (florum)
  • Mycoplasmataceae: pulmonis, urealyticum, pneumoniae, genitalium (mobile)
  • Spiroplasmataceae
Motility
Species              nm/sec   Replicate (hr)   Temp (°C)
M. mobile            3000     5                25
M. pneumoniae        300      8                37
M. florum            0        1.5              30
U. urealyticum       0        >10              37
E. coli              20000    0.4              37
H. sapiens           1000     >10              37
RNA Pol / ribosome   20 (= 50 nt/s)
E. coli DNA Pol3     300 (= 1000 nt/s)
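The last two rows convert a polymerization rate in nucleotides per second into a linear speed along the template. The sketch below reproduces that arithmetic, assuming a rise of 0.34 nm per nucleotide (the B-form DNA value, an assumption not stated on the slide). The slide's figures appear to be rounded.

```python
NM_PER_NT = 0.34  # assumption: rise per nucleotide of B-form DNA, in nm

def nt_per_s_to_nm_per_s(rate_nt_per_s):
    """Convert a polymerization rate (nt/s) to a linear speed along the template (nm/s)."""
    return rate_nt_per_s * NM_PER_NT

for name, rate in [("RNA Pol / ribosome", 50), ("E. coli DNA Pol3", 1000)]:
    print(f"{name}: {rate} nt/s is about {nt_per_s_to_nm_per_s(rate):.0f} nm/s")
```

On these assumptions the molecular motors run at roughly 17 and 340 nm/s, the same order of magnitude as the gliding speed listed for M. pneumoniae.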
Attachment organelle replication
Seto S, Layh-Schmitt G, Kenri T, Miyata M. J Bacteriol 2001 183:1621 Visualization of the attachment organelle and cytadherence proteins of Mycoplasma pneumoniae by immunofluorescence microscopy.
Mycoplasma pneumoniae
Regula et al., Microbiology 147:1045-57; scale bar = 100 nm
Hypothetical mechanisms
[Figure labels: Protein filament; Anchor to membrane; Linear motor; Adhesion protein; Cell membrane; Direction of cell movement; Adhesion molecules discarded (?); Adhesion molecules recycled (?); Motors recycled (?); Solid surface]
Proteo-genomic mapping (of peptide data in 3 forward & 3 reverse frames)
[Figure, panels A-D] Use of proteogenomic mapping to discover: B. a new ORF. C. a new ORF & delete an inaccurately predicted ORF. D. N-terminal extension of an existing ORF.
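Mapping peptides in three forward and three reverse frames amounts to translating the genome six ways and searching each translation for the observed peptide. The sketch below illustrates the idea on a toy sequence. All function names are hypothetical, and the second codon table applies the Mycoplasma reading of TGA as tryptophan (NCBI translation table 4). This is an illustration, not the pipeline used for the slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Mycoplasma reads TGA as tryptophan instead of stop.
MYCOPLASMA_TABLE = dict(CODON_TABLE, TGA="W")

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq, table=CODON_TABLE):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def six_frame_translations(seq, table=CODON_TABLE):
    """Return {(strand, frame_offset): protein} for all six reading frames."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset)] = translate(strand_seq[offset:], table)
    return frames

def map_peptide(seq, peptide, table=CODON_TABLE):
    """List (strand, frame_offset, nt_start) for every occurrence of the peptide.

    nt_start is counted along the translated strand, so for '-' hits it is a
    position on the reverse complement, not on the input sequence.
    """
    hits = []
    for (strand, offset), protein in six_frame_translations(seq, table).items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((strand, offset, offset + 3 * start))
            start = protein.find(peptide, start + 1)
    return hits

dna = "ATGGCCAAATGA"  # toy sequence, not from M. pneumoniae
print(map_peptide(dna, "MAK"))
print(map_peptide(dna, "FGH"))
```

A peptide that maps outside every annotated ORF points to a missing or mis-predicted gene, which is the use shown in panels B to D.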
Constraints
•Replication
•Membrane-bound polyribosomes
•Other RNA and/or protein complexes
•Metabolism
•DNA Structural Forces
Genome folding & cell 3D structure
Seto & Miyata (1999) Partitioning, movement, and positioning of nucleoids in Mycoplasma capricolum J. Bact. 181:6073
Cell = 0.5 µm; 500-800 kbp genome
Extended diameter = 80 µm; ~200 transverses, with each membrane-encoding gene anchored to the cell surface.
How to segregate this?
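The extended diameter can be checked with simple arithmetic: a circular genome stretched into a perfect circle has diameter equal to its contour length divided by π. The sketch below assumes 0.34 nm per base pair (B-form DNA) and reports lengths in micrometres. Both the function name and the choice of units are assumptions.

```python
import math

NM_PER_BP = 0.34  # assumption: rise per base pair of B-form DNA, in nm

def extended_circle_diameter_um(genome_bp):
    """Diameter (micrometres) of a circular genome laid out as a perfect circle."""
    circumference_nm = genome_bp * NM_PER_BP
    return circumference_nm / math.pi / 1000.0

for kbp in (500, 800):
    diameter = extended_circle_diameter_um(kbp * 1000)
    print(f"{kbp} kbp genome: extended diameter of about {diameter:.0f} micrometres")
```

On these assumptions the 500-800 kbp range gives roughly 54 to 87 micrometres, more than a hundred times the 0.5 cell dimension quoted above, which is why folding and segregation are the central questions here.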
Paired fork model
Dingman CW. Bidirectional chromosome replication: some topological considerations. J Theor Biol 1974 Jan;43(1):187-95.
Sundin O, Varshavsky A. Terminal stages of SV40 DNA replication proceed via multiply intertwined catenated dimers. Cell. 1980 Aug;21(1):103-14.
Hearst JE, Kauffman L, McClain WM. A simple mechanism for the avoidance of entanglement during chromosome replication. Trends Genet. 1998 Jun;14(6):244-7.
Bouligand, Y, Norris V (2000) “Both replication forks appear to be part of a single complex or factory, as first proposed by Dingman.” http://wwwmc.bio.uva.nl/texel/tekst/norris.html
Roos M, Lingeman R, Woldringh CL, Nanninga N. Biochimie 2001 Jan;83(1):67-74 Experiments on movement of DNA regions in Escherichia coli evaluated by computer simulation.
Constraints
•Replication
•Membrane-bound polyribosomes could anchor the RNA polymerase and hence the gene's DNA to within 20 nm of the cell surface.
•Other RNA and/or protein complexes
•Metabolism
•DNA Structural Forces
Origin
Blue: First MPN gene #; Green: Mid gene #344 (ter); Red: Last gene #688
Side view, no replication (gene #)
Off-axial view, no replicated segments, unoptimized membrane
Yellow: Membrane; Pink: Ribosomal; White: Hypothetical & abundant; Green: Misc. abundant; Blue: Weak
Axial view, no replicated segments
Yellow: Membrane; Pink: Ribosomal; White: Hypothetical & abundant; Green: Misc. abundant; Blue: Weak
Origin
Yellow: Membrane; Pink: Ribosomal; White: Hypothetical & abundant; Green: Misc. abundant; Blue: Weak
Side view, no replicated segments
Origin
Blue: Origin of replication; Red: Terminus
Side view, no replication (distance from ori)
Simple example cost function for chromosome structure optimization
$C = w_1 \sum_i d(M_i, \cdot) + w_2 \sum_{i,j} d(R_i, R_j)$
[Figure labels: R1, R2, M1, M2, M3]
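A cost function of this kind adds up weighted distances between gene positions. The sketch below is one possible reading, under loudly stated assumptions: M_i are taken to be membrane-protein genes penalized by their distance from the cell surface, R_i are taken to be ribosomal genes penalized by their pairwise separation, the cell is a sphere centred at the origin, and each pair is counted once. None of these choices is confirmed by the slide, and all names are hypothetical.

```python
import math
from itertools import combinations

ORIGIN = (0.0, 0.0, 0.0)

def chromosome_cost(membrane_genes, ribosomal_genes, cell_radius, w1=1.0, w2=1.0):
    """Toy cost: w1 * sum_i d(M_i, surface) + w2 * sum_{i<j} d(R_i, R_j).

    Positions are (x, y, z) tuples; the spherical cell is an assumption.
    """
    # Distance of each membrane gene from the spherical cell surface.
    surface_term = sum(abs(cell_radius - math.dist(m, ORIGIN)) for m in membrane_genes)
    # Pairwise Euclidean distance between ribosomal genes (each pair counted once).
    cluster_term = sum(math.dist(a, b) for a, b in combinations(ribosomal_genes, 2))
    return w1 * surface_term + w2 * cluster_term

# Hypothetical coordinates in micrometres for three M genes and two R genes.
membrane = [(0.25, 0.0, 0.0), (0.0, 0.2, 0.0), (0.0, 0.0, -0.25)]
ribosomal = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
print(f"cost = {chromosome_cost(membrane, ribosomal, cell_radius=0.25):.3f}")
```

Lower cost means membrane genes sit nearer the surface and ribosomal genes sit closer together. The weights w1 and w2 set the trade-off between the two terms.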
Searching six helical parameters for chromosomal fold
E_finals
Run               E_final   parameters 1-6
2002_5_16_h18_42  31.5783   0.0595431  0.444777    -0.148005   -0.12554    39.676    0.007241
2002_5_16_h19_0   61.4522   0.046929   -0.0010534  -0.37642    0.64887     -7.9804   -0.1281
2002_5_16_h19_19  91.2823   0.075882   0.16159     -0.2373     1.0718      8.0774    0.076364
2002_5_16_h19_34  45.8961   0.10725    0.165795    -0.292295   -0.0370155  46.2283   0.3454
2002_5_16_h19_42  38.601    0.0410951  0.363854    0.154569    0.0889424   24.162    0.1203
2002_5_16_h20_3   35.3927   0.0355828  -0.434093   0.17439     0.0015235   -24.9479  -0.02968
2002_5_16_h20_30  36.5715   0.0495523  0.0201888   0.533363    0.04049     -11.7067  -0.0717
2002_5_16_h20_50  108.2712  -0.03419   0.366322    -0.216694   -1.30726    -23.67    0.0181
2002_5_16_h21_5   45.4948   0.022745   0.44564     -0.26902    -0.18342    -9.5072   0.27189
2002_5_16_h21_50  50.4768   0.172497   -0.282122   -0.285109   0.478558    -46.2911  0.2758
2002_5_16_h21_56  37.6382   0.0304836  0.398325    0.201159    0.0797413   17.013    -0.81
2002_5_16_h23_41  35.4194   0.0445114  0.532795    0.0134364   0.117782    -42.2785  0.451
2002_5_17_h0_2    39.8033   0.11543    -0.006943   -0.426032   -0.128618   -35.8674  -0.03049
2002_5_17_h0_10   62.7409   0.0093794  0.040845    -0.10502    0.35003     3.4834    0.23764
2002_5_17_h4_12   47.0811   0.116387   0.146311    -0.520041   -0.28928    20.3289   0.1700
2002_5_17_h4_20   33.5733   0.096      0.00628     0.547581    0.0413792   22.1782   -0.1598
2002_5_17_h4_29   41.1507   0.167149   0.422391    0.126038    0.59806     38.4758   0.1079
2002_5_17_h4_35   46.4101   0.0765229  0.106407    0.460038    0.350776    12.6997   -0.01097
2002_5_17_h4_50   31.2508   0.0209708  0.484708    -0.131666   0.0525948   17.7536   -0.07883
2002_5_17_h5_41   41.8434   0.0638499  0.411257    0.20358     0.380453    19.9535   -0.04410
2002_5_17_h5_54   31.7824   0.0219507  0.568525    -0.0296989  -0.25155    10.4541   0.01661
2002_5_17_h6_39   42.8122   0.21156    0.003633    -0.502632   0.315238    -61.1441  0.39604
2002_5_17_h6_45   31.5284   0.026136   0.52898     -0.0904436  -0.0902993  -25.0525  0.1101
2002_5_17_h7_17   44.8789   0.069805   -0.00365152 -0.539196   0.179759    -18.5657  0.0189
2002_5_17_h7_26   110.863   0.231782   0.311698    0.218959    -1.51978    11.0336   0.01407
2002_5_17_h7_34   27.5664   0.0463924  0.44446     0.077077    -0.237724   -26.988   -0.0272
2002_5_17_h7_51   43.5492   0.0300962  0.230355    0.293637    0.0425634   12.5355   -0.0275
2002_5_17_h8_15   44.922    0.107868   0.0263435   -0.554559   -0.298406   -18.3352  0.04061
[Figure: Energy (0 to 1800) vs. time steps (0 to 500)]
Monte Carlo minimization of the model fit to constraints.
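Monte Carlo minimization repeatedly perturbs the parameters and keeps a change if it lowers the energy, or occasionally if it raises it. The sketch below is a minimal Metropolis-style version, assuming Gaussian perturbations and a fixed temperature. The toy energy, the function names and all settings are hypothetical stand-ins and not the model behind the plot.

```python
import math
import random

def monte_carlo_minimize(energy, x0, steps=500, step_size=0.1, temperature=0.01, seed=0):
    """Metropolis-style minimization; returns (best_params, best_energy, energy_trace)."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    trace = [e]
    for _ in range(steps):
        candidate = [xi + rng.gauss(0.0, step_size) for xi in x]
        e_new = energy(candidate)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp((e - e_new) / temperature):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = list(x), e
        trace.append(e)
    return best_x, best_e, trace

def toy_energy(params):
    """Stand-in energy with its minimum at all parameters equal to 1."""
    return sum((p - 1.0) ** 2 for p in params)

# Six parameters, echoing the six-parameter search; the starting values are arbitrary.
best_params, best_energy, trace = monte_carlo_minimize(toy_energy, [0.0] * 6)
print(f"start energy = {trace[0]:.3f}, best energy = {best_energy:.3f}")
```

Plotting `trace` against the step index gives an energy-versus-time-steps curve of the same kind as the figure. Independent runs from different seeds can end at different final energies.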
2002_5_17_h5_54 70.5984 31.7824
2002_5_16_h20_3 95.1449 35.3927
2002_5_17_h4_20 92.7126 33.5733
2002_5_17_h4_50 749.4929 31.2508
[Plot legend: data_2002_5_19_h0_40, data_2002_5_16_h18_42, data_2002_5_16_h19_34, data_2002_5_16_h21_50, data_2002_5_16_h19_42, data_2002_5_16_h21_56, data_2002_5_16_h20_3, data_2002_5_16_h19_0, data_2002_5_16_h20_30, data_2002_5_16_h21_5]
Origin
Blue: Left replicated segment (yelgr=high gene#)
Red: Right (i.e. middle) segment
Aqua: unduplicated segment of the circular genome
Avoidance of entanglement throughout cell cycle
M. pneumoniae genes generally point away from Ori
More significant if abundance data are integrated
Alignment of known motors: polymerases, ribosomes, F1 ATPase
Biospice 2.0 Deliverables: toolsets for data integration & optimality assessment
#1: QP MPA flux & growth modeling
#2: 4D-model current plan:
• Chromosome segregation
• Membrane-bound polysomes
• Ribosomal protein/rRNA assembly
• Motility (coordination with replication origin)
Next few months:
• Other protein complexes
• Space filling metric
• Replication entanglement metric
• In vivo crosslinking