Probabilistic Methods for Interpreting Electron-Density Maps
Frank DiMaio
University of Wisconsin – Madison Computer Sciences Department
3D Protein Structure
[Figure: protein structure with backbone, sidechain, and C-alpha labeled; residues ALA, LEU, PRO, VAL, ARG along the chain, with unknown residues marked "?"]
High-Throughput Structure Determination
Protein-structure determination is important
Understanding the function of a protein
Understanding mechanisms
Targets for drug design
Some proteins produce poor density maps
Interpreting poor electron-density maps is very laborious for humans
I aim to automatically interpret poor-quality electron-density maps
Electron-Density Map Interpretation
GIVEN: 3D electron-density map, (linear) amino-acid sequence
FIND: All-atom protein model
Density Map Resolution
[Figure: density maps at 1.0Å, 2.0Å, 3.0Å, and 4.0Å resolution, annotated with prior methods (Morris et al. 2003, Ioerger et al. 2002, Terwilliger 2003) and a "My focus" marker]
Thesis Contributions
A probabilistic approach to protein-backbone tracing. DiMaio et al., Intelligent Systems for Molecular Biology (2006)
Improved template matching in electron-density maps. DiMaio et al., IEEE Conference on Bioinformatics and Biomedicine (2007)
Creating all-atom protein models using particle filtering. DiMaio et al. (under review)
Pictorial structures for atom-level molecular modeling. DiMaio et al., Advances in Neural Information Processing Systems (2004)
Improving the efficiency of belief propagation. DiMaio and Shavlik, IEEE International Conference on Data Mining (2006)
Iterative phase improvement in ACMI
ACMI Overview
Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)
Independent amino-acid search
Templates model 5-mer conformational space
Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)
Protein structural constraints refine local search
Markov random field (MRF) models pairwise constraints
Phase 3: Sample all-atom models
Particle filtering samples high-probability structures
Probabilities from the MRF guide particle trajectories
5-mer Lookup
…SAW C VKFEKPADKNGKTE…
ProteinDB
ACMI searches map for each template independently
Spherical-harmonic decomposition allows rapid search of all template rotations
Spherical-Harmonic Decomposition
f (θ,φ)
5-mer Fast Rotation Search
[Figure: a pentapeptide fragment from the PDB (the "template") gives the calculated (expected) density in a 5Å sphere; the electron-density map gives a sampled region of density in a 5Å sphere. Both the template density and the map region are sampled in spherical shells.]
5-mer Fast Rotation Search
[Figure: the shell samples yield template spherical-harmonic coefficients and map-region spherical-harmonic coefficients; the fast-rotation function (Navaza 2006, Risbo 1996) gives the correlation coefficient as a function of rotation.]
Convert Scores to Probabilities
Scan density map for fragment
Correlation coefficients over density map, $t_i(u_i)$
Bayes' rule
Probability distribution over density map, $P(\text{5-mer at } u_i \mid \mathrm{EDM})$
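The score-to-probability step can be illustrated with a small Python sketch. It is not ACMI's actual calibration: the Gaussian score distributions for matching and non-matching locations, their parameters, and the prior are all assumptions chosen for illustration, and `scores_to_probabilities` is a hypothetical name.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def scores_to_probabilities(scores, match=(0.6, 0.15), nonmatch=(0.1, 0.15), prior=0.01):
    """Posterior P(fragment at location | score) via Bayes' rule.

    match / nonmatch are assumed (mean, std) of correlation scores at
    correct and incorrect locations; prior is the assumed P(fragment at location).
    """
    posteriors = []
    for t in scores:
        joint_match = gaussian_pdf(t, *match) * prior
        joint_non = gaussian_pdf(t, *nonmatch) * (1.0 - prior)
        posteriors.append(joint_match / (joint_match + joint_non))
    return posteriors

def normalize(values):
    """Rescale values so they sum to one over the map."""
    total = sum(values)
    return [v / total for v in values]

# Four hypothetical grid locations with their correlation coefficients.
scores = [0.05, 0.12, 0.55, 0.30]
post = scores_to_probabilities(scores)
dist = normalize(post)
```

With equal standard deviations and a higher mean for matches, the posterior increases with the correlation score, so the best-correlated location receives the most probability mass.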
Probabilistic Backbone Model
A trace assigns a position and orientation $u_i = \{x_i, q_i\}$ to each amino acid $i$
The probability of a trace $U = \{u_i\}$ is
$P(U \mid \mathrm{EDM}) = P(u_1, \ldots, u_N \mid \mathrm{EDM})$
This full joint probability is intractable to compute
Approximate it using a pairwise Markov random field
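Pairwise Markov-Field Model
Joint probabilities are defined on a graph as a product of vertex and edge potentials
$P(U \mid \mathrm{EDM}) \propto \prod_{i \in \mathrm{AAs}} \psi_i(u_i \mid \mathrm{EDM}) \times \prod_{i,j \in \mathrm{AAs}} \psi_{ij}(u_i, u_j)$
[Figure: graph with one vertex per amino acid: ALA, GLY, LYS, LEU, SER]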
ACMI's Backbone Model
Observational potentials tie the map to the model
[Figure: backbone-model graph over ALA, GLY, LYS, LEU, SER]
Adjacency constraints ensure adjacent amino acids are ~3.8Å apart and in proper orientation
Occupancy constraints ensure nonadjacent amino acids do not occupy the same 3D space
Backbone Model Potential
$p(U \mid \mathrm{EDM}) \propto \prod_{i,j \in \mathrm{AAs},\ |i-j|=1} \psi_{adj}(u_i, u_j) \times \prod_{i,j \in \mathrm{AAs},\ |i-j|>1} \psi_{occ}(u_i, u_j) \times \prod_{i \in \mathrm{AAs}} \psi_i(u_i \mid \mathrm{EDM})$
Constraints between adjacent amino acids
$\psi_{adj}(u_i, u_j) = p_x(\|x_i - x_j\|) \times p_\theta(u_i, u_j)$
Constraints between all other amino acid pairs
$\psi_{occ}(u_i, u_j) = 0$ if $\|x_i - x_j\| < K$, and $1$ otherwise
Observational ("template-matching") probabilities
$\psi_i(u_i \mid \mathrm{EDM}) = \Pr(\text{5-mer } s_{i-2} \ldots s_{i+2} \text{ at } u_i)$
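To make the product of adjacency, occupancy, and observational potentials concrete, here is a minimal Python sketch that scores a candidate trace. Several pieces are assumptions rather than ACMI's definitions: orientation is ignored, the adjacency term is a Gaussian on Cα distance with a made-up width, the occupancy cutoff of 3.0Å is hypothetical, and the observational probabilities are supplied as plain numbers.

```python
import math

def psi_adj(xi, xj, ideal=3.8, sigma=0.2):
    """Assumed adjacency potential: peaks when neighbors are ~3.8 Angstroms apart."""
    d = math.dist(xi, xj)
    return math.exp(-0.5 * ((d - ideal) / sigma) ** 2)

def psi_occ(xi, xj, cutoff=3.0):
    """Occupancy potential: zero if two nonadjacent residues overlap (cutoff is hypothetical)."""
    return 0.0 if math.dist(xi, xj) < cutoff else 1.0

def trace_potential(positions, obs):
    """Unnormalized score of a trace.

    positions[i] is the Calpha position of amino acid i; obs[i] is its
    observational (template-matching) probability at that position.
    """
    score = 1.0
    n = len(positions)
    for i in range(n):
        score *= obs[i]
        for j in range(i + 1, n):
            if j - i == 1:
                score *= psi_adj(positions[i], positions[j])
            else:
                score *= psi_occ(positions[i], positions[j])
    return score

obs = [0.5, 0.5, 0.5]
# Evenly spaced chain: every pairwise potential is 1, so the score is 0.5 ** 3.
good = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
# Third residue folds back onto the first: the occupancy potential zeroes the score.
clash = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (1.0, 0.0, 0.0)]
good_score = trace_potential(good, obs)
clash_score = trace_potential(clash, obs)
```

The double loop visits every pair, which is fine for scoring one trace but is exactly the quadratic cost that makes naive inference over all traces expensive.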
Inferring Backbone Locations
Want to find the backbone layout that maximizes
$\prod_{i,j \in \mathrm{AAs},\ |i-j|=1} \psi_{adj}(u_i, u_j) \times \prod_{i,j \in \mathrm{AAs},\ |i-j|>1} \psi_{occ}(u_i, u_j) \times \prod_{i \in \mathrm{AAs}} \psi_i(u_i \mid \mathrm{EDM})$
Exact methods are intractable
Use belief propagation (Pearl 1988) to approximate marginal distributions
$p_i(u_i \mid \mathrm{EDM}) = \sum_{u_k,\ k \neq i} p(U \mid \mathrm{EDM})$
Belief Propagation Example
[Figure: nodes LYS31 and LEU32 with approximate marginals $\hat{p}_{LYS31}$ and $\hat{p}_{LEU32}$; the message $m_{LYS31 \to LEU32}$ is passed first, then $m_{LEU32 \to LYS31}$]
Scaling BP to Proteins (DiMaio and Shavlik, ICDM 2006)
Naïve implementation: $O(N^2 G^2)$
N = the number of amino acids in the protein
G = the number of points in the discretized density map
$O(G^2)$ computation for each message passed
$O(G \log G)$ as Fourier-space multiplication
$O(N^2)$ messages computed and stored
Approximate (N-3) occupancy messages with 1 message
$O(N)$ messages using a message accumulator
Improved implementation: $O(N G \log G)$
Occupancy Message Approximation
To pass a message
$m^n_{i \to j}(u_j) = \int_{u_i} \psi_{occ}(u_i, u_j) \times \frac{\hat{p}^{\,n-1}_i(u_i)}{m^{n-1}_{j \to i}(u_i)} \, du_i$
Here $\psi_{occ}$ is the occupancy edge potential, and the ratio is the product of incoming messages to $i$ except the one from $j$
"Weak" potentials between nonadjacent amino acids let us approximate
$m^n_{i \to j}(u_j) \approx \int_{u_i} \psi_{occ}(u_i, u_j) \times \hat{p}^{\,n-1}_i(u_i) \, du_i$
Here $\hat{p}^{\,n-1}_i$ is the product of all incoming messages to $i$
Occupancy Message Approximation
[Figure: six-node chain 1, 2, 3, 4, 5, 6. Node 3's exact occupancy messages to nodes 1, 5, and 6 integrate $\psi_{occ}$ against $\hat{p}_3 / m_{1 \to 3}$, $\hat{p}_3 / m_{5 \to 3}$, and $\hat{p}_3 / m_{6 \to 3}$ respectively. The approximation sends the same message to all three, integrating $\psi_{occ}$ against $\hat{p}_3$.]
Send outgoing occupancy message product to a central accumulator
$ACC(x) = \prod_{i \in \mathrm{AAs}} m_i(x)$
Then, each node's incoming message product is computed in constant time
$\hat{p}_3 \propto \frac{ACC}{m_2 \, m_3 \, m_4} \times m_{2 \to 3} \times m_{4 \to 3}$
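The accumulator idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the ICDM 2006 implementation: each amino acid's single outgoing occupancy message is a list of strictly positive values over a tiny three-point grid, and the work is done in log space so the product becomes a sum. The function names are hypothetical.

```python
import math

def build_accumulator(log_msgs):
    """Per grid point, sum the log occupancy messages of all amino acids."""
    n_points = len(log_msgs[0])
    return [sum(m[g] for m in log_msgs) for g in range(n_points)]

def incoming_occupancy(acc, log_msgs, i):
    """Log-product of occupancy messages reaching node i.

    Removes node i's own message and those of its chain neighbors
    (which interact with i through adjacency, not occupancy).
    """
    excluded = [j for j in (i - 1, i, i + 1) if 0 <= j < len(log_msgs)]
    return [acc[g] - sum(log_msgs[j][g] for j in excluded) for g in range(len(acc))]

def brute_force(log_msgs, i):
    """Reference: explicitly combine messages from every nonadjacent node."""
    n = len(log_msgs)
    return [sum(log_msgs[j][g] for j in range(n) if abs(i - j) > 1)
            for g in range(len(log_msgs[0]))]

# Six amino acids, three grid points each (made-up positive message values).
msgs = [[0.9, 0.5, 0.8], [0.7, 0.6, 0.9], [0.4, 0.9, 0.9],
        [0.8, 0.8, 0.3], [0.6, 0.7, 0.5], [0.9, 0.9, 0.2]]
log_msgs = [[math.log(v) for v in m] for m in msgs]
acc = build_accumulator(log_msgs)
fast = incoming_occupancy(acc, log_msgs, 2)   # index 2 is the third amino acid
slow = brute_force(log_msgs, 2)
```

The accumulator is built once per iteration; each node then needs at most three subtractions per grid point, independent of the number of amino acids, instead of combining N-3 separate messages.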
BP Output
After some number of iterations, BP gives probability distributions over Cα locations
[Figure: chain ALA, LEU, PRO, VAL, ARG with a probability map for each amino acid, for example $\hat{p}_{LEU}(x_{LEU})$ and $\hat{p}_{VAL}(x_{VAL})$]
ACMI’s Backbone Trace
Independently choose Cα locations that maximize approximate marginal distribution
$b^*_i = \arg\max_{x_i} \hat{p}_i(x_i)$
Example: 1XRI
[Figure: predicted backbone trace colored by prob(AA at location), from LOW (0.1) to HIGH (0.9)]
3.3Å resolution density map, 39° mean phase error
0.9009Å RMSd, 93% complete
Testset Density Maps (raw data)
[Figure: scatter plot of density-map mean phase error (deg., 15 to 75) versus density-map resolution (Å, 1.0 to 4.0)]
Experimental Accuracy
[Figure: bar chart for ACMI, ARP/wARP, Textal, and Resolve; y-axis (0 to 100): % Cα's located within 2Å of some Cα / correct Cα; series: % backbone correctly placed, % amino acids correctly identified]
Experimental Accuracy on a Per-Protein Basis
[Figure: three scatter plots of ACMI % Cα's located (y-axis, 0 to 100) versus ARP/wARP, Resolve, and Textal % Cα's located (x-axes, 0 to 100)]
Problems with ACMI
Biologists want the location of all atoms
All Cα's lie on a discrete grid
Maximum-marginal backbone model may be physically unrealistic
Ignoring a lot of information
Multiple models may better represent conformational variation within the crystal
[Figure: three candidate structures with probabilities 0.4, 0.35, and 0.25, next to the maximum-marginal structure]
ACMI with Particle Filtering (ACMI-PF)
Idea: Represent protein using a set of static 3D all-atom protein models
Particle Filtering Overview (Doucet et al. 2000)
Given some Markov process $x_{1:K} \in X$ with observations $y_{1:K} \in Y$
Particle filtering approximates some posterior probability distribution over X using a set of N weighted point estimates
$p(x_{1:K} \mid y_{1:K}) \approx \sum_{i=1}^{N} wt^{(i)}_K \, \delta\!\left(x_{1:K} - x^{(i)}_{1:K}\right)$
Particle Filtering Overview
Markov process gives recursive formulation
$p(x_{1:k} \mid y_{1:k}) \propto p(y_k \mid x_k) \, p(x_k \mid x_{k-1}) \, p(x_{1:k-1} \mid y_{1:k-1})$
Use importance function $q(x_k \mid x_{0:k-1}, y_k)$ to grow particles
Recursive weight update:
$wt^{(i)}_k \propto wt^{(i)}_{k-1} \times \frac{p(y_k \mid x^{(i)}_k) \, p(x^{(i)}_k \mid x^{(i)}_{k-1})}{q(x^{(i)}_k \mid x^{(i)}_{0:k-1}, y_k)}$
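The recursive weight update is short enough to write out directly. The Python sketch below is generic sequential importance sampling, not ACMI-PF code; the function names and the numbers are hypothetical, and the three densities are passed in as already-evaluated values.

```python
def update_weight(prev_weight, likelihood, transition, proposal):
    """One recursive importance-weight update for a single particle.

    likelihood = p(y_k | x_k), transition = p(x_k | x_{k-1}),
    proposal   = q(x_k | x_{0:k-1}, y_k), all evaluated at the particle's new state.
    """
    return prev_weight * likelihood * transition / proposal

def normalize(weights):
    """Rescale particle weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

# If the proposal equals the transition density, the update reduces to
# multiplying the old weight by the observation likelihood.
w_new = update_weight(prev_weight=0.5, likelihood=0.2, transition=0.3, proposal=0.3)

# Three particles updated with different likelihoods, then normalized.
raw = [update_weight(1.0, lik, 0.3, 0.3) for lik in (0.1, 0.4, 0.5)]
weights = normalize(raw)
```

Choosing the proposal is the main design decision: a proposal close to the true posterior keeps the weights balanced, while a poor one concentrates nearly all weight on a few particles.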
Particle Filtering for Protein Structures
A particle refers to one specific 3D layout of some subsequence of the protein
At each iteration, advance the particle's trajectory by placing an additional amino acid's atoms
Alternate extending the chain left and right
An iteration alternately places
Cα position $b_{k+1}$ given $b_k$
All sidechain atoms $s_k$ given $b_{k-1:k+1}$
[Figure: chain fragment with Cα positions $b_{k-1}$, $b_k$, $b_{k+1}$ and sidechain $s_k$]
Particle Filtering for Protein Structures
Key idea: Use the conditional distribution $p(b_k \mid b^i_{k-1}, \mathrm{Map})$ to advance particle trajectories
Construct this conditional distribution from BP's marginal distributions
[Figure: chain fragment with Cα positions $b_{k-1}$, $b_k$, $b_{k+1}$ and sidechain $s_k$]
Algorithm
place "seeds" $b^i_k$ for each particle i = 1…N
while amino acids remain
    place $b^i_{k+1}$ / $b^i_{j-1}$ given $b^i_{j:k}$ for each i = 1…N
    place $s^i_k$ given $b^i_{k-1:k+1}$ for each i = 1…N
    optionally resample N particles
end while
Backbone Step (for particle i)
Algorithm step: place $b^i_{k+1}$ given $b^i_k$ for each i = 1…N
(1) Sample L $b_{k+1}$'s from the $b_{k-1}$–$b_k$–$b_{k+1}$ pseudoangle distribution, giving candidates $b^1_{k+1}, \ldots, b^L_{k+1}$
(2) Weight each sample by its ACMI-computed approximate marginal, $p_{k+1}(b^1_{k+1}), p_{k+1}(b^2_{k+1}), \ldots, p_{k+1}(b^L_{k+1})$
(3) Select $b_{k+1}$ with probability proportional to sample weight
(4) Update particle weight as sum of sample weights
$wt_{k+1} = wt_k \times \sum_{l=1}^{L} p_{k+1}(b^l_{k+1})$
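The four parts of the backbone step map onto a short Python function. This is a sketch under stated assumptions: candidates are drawn at a fixed 3.8Å distance in uniformly random directions, which merely stands in for the pseudoangle distribution, and `toy_marginal` is a made-up replacement for the ACMI-computed marginal.

```python
import math
import random

def random_direction(rng):
    """Uniform random unit vector, obtained by normalizing a Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def backbone_step(b_k, weight, marginal, n_samples, rng, bond=3.8):
    """Place the next Calpha for one particle and update the particle weight."""
    # (1) Sample candidate positions (assumption: uniform directions at fixed distance).
    candidates = []
    for _ in range(n_samples):
        d = random_direction(rng)
        candidates.append(tuple(b + bond * c for b, c in zip(b_k, d)))
    # (2) Weight each candidate by the approximate marginal at that position.
    sample_weights = [marginal(c) for c in candidates]
    # (3) Select one candidate with probability proportional to its weight.
    chosen = rng.choices(candidates, weights=sample_weights, k=1)[0]
    # (4) Multiply the particle weight by the sum of candidate weights.
    return chosen, weight * sum(sample_weights)

def toy_marginal(x):
    """Hypothetical marginal: a smooth bump centered 3.8 Angstroms along +x."""
    return math.exp(-math.dist(x, (3.8, 0.0, 0.0)) ** 2 / 8.0)

rng = random.Random(0)
b_next, new_weight = backbone_step((0.0, 0.0, 0.0), 1.0, toy_marginal, 20, rng)
```

Using the sum of candidate weights as the update means a particle whose surroundings all look improbable loses weight even if one candidate is selected, which is what lets resampling prune it later.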
Sidechain Step (for particle i)
Algorithm step: place $s^i_k$ given $b^i_{k-1:k+1}$ for each i = 1…N
(1) Sample $s_k$ from a database of sidechain conformations (Protein Data Bank)
(2) For each sidechain conformation, compute probability of density map given the sidechain, $p_k(\mathrm{EDM} \mid s^1_k), p_k(\mathrm{EDM} \mid s^2_k), p_k(\mathrm{EDM} \mid s^3_k), \ldots$
(3) Select sidechain conformation from this weighted distribution
(4) Update particle weight as sum of sample weights
$wt_{k+1} = wt_k \times \sum_{m=1}^{M} p(\mathrm{EDM} \mid s^m_k)$
Particle Resampling
[Figure: five particles with weights 0.1, 0.1, 0.4, 0.3, and 0.1 are resampled in proportion to weight, giving five particles with weight 0.2 each]
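Resampling as pictured can be sketched with the standard library alone. The sketch below uses plain multinomial resampling, which is an assumption; the slides do not say which resampling scheme ACMI-PF uses, and the particle labels are placeholders.

```python
import random

def resample(particles, weights, rng):
    """Draw N particles with replacement, proportional to weight, and reset weights to 1/N."""
    n = len(particles)
    chosen = rng.choices(particles, weights=weights, k=n)
    return chosen, [1.0 / n] * n

rng = random.Random(1)
particles = ["A", "B", "C", "D", "E"]      # placeholder particle labels
weights = [0.1, 0.1, 0.4, 0.3, 0.1]        # weights from the slide
new_particles, new_weights = resample(particles, weights, rng)
```

High-weight particles tend to be duplicated and low-weight ones dropped, so computation concentrates on the more probable partial structures.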
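Amino-Acid Sampling Order
Begin at some amino acid k with probability
$P(k) \propto \exp\!\left(-\mathrm{entropy}\!\left(\hat{p}_k(b_k)\right)\right)$
At each step, with placed segment j…k, move left (to j-1) or right (to k+1) with probability
$P(j-1) \propto \exp\!\left(-\mathrm{entropy}\!\left(\hat{p}_{j-1}(b_{j-1})\right)\right)$
$P(k+1) \propto \exp\!\left(-\mathrm{entropy}\!\left(\hat{p}_{k+1}(b_{k+1})\right)\right)$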
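An entropy-based sampling order can be sketched as follows. The sketch assumes that lower-entropy (more confident) marginals are favored, that is, probability proportional to exp(-entropy); the sign is an interpretation of the slide. Marginals are represented as small discrete distributions, and the function names are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)

def start_probabilities(marginals):
    """P(start at k), proportional to exp(-entropy of amino acid k's marginal)."""
    scores = [math.exp(-entropy(m)) for m in marginals]
    total = sum(scores)
    return [s / total for s in scores]

def extend_left_probability(marginals, j, k):
    """Probability of extending segment j..k to the left (j-1) rather than the right (k+1)."""
    left = math.exp(-entropy(marginals[j - 1])) if j > 0 else 0.0
    right = math.exp(-entropy(marginals[k + 1])) if k + 1 < len(marginals) else 0.0
    if left + right == 0.0:
        raise ValueError("segment already covers the whole chain")
    return left / (left + right)

# Three amino acids: a flat marginal, a sharply peaked one, and a two-way tie.
marginals = [
    [0.25, 0.25, 0.25, 0.25],
    [0.97, 0.01, 0.01, 0.01],
    [0.5, 0.5, 0.0, 0.0],
]
start = start_probabilities(marginals)
p_left = extend_left_probability(marginals, 1, 1)
```

Here the peaked marginal is the most likely starting point, and from it the chain is twice as likely to grow toward the two-way tie as toward the flat marginal.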
Experimental Methodology
Run ACMI-PF 10 times with 100 particles each
Return highest-weight particle from each run
Each run samples amino acids in a different order
Refine each structure for 10 iterations in Refmac5
Compare 10-structure model to others using $R_{free}$
$R = \frac{\sum \left| F_{obs} - F_{calc} \right|}{\sum F_{obs}}$
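The R factor is a simple ratio, shown here as a Python sketch on made-up structure-factor amplitudes. $R_{free}$ uses the same formula, evaluated only on reflections held out of refinement; the function name and the numbers are illustrative.

```python
def r_factor(f_obs, f_calc):
    """R = sum |F_obs - F_calc| / sum F_obs over a set of reflections."""
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator

# Hypothetical amplitudes for three held-out reflections.
f_obs = [10.0, 20.0, 30.0]
f_calc = [12.0, 18.0, 33.0]
r_free = r_factor(f_obs, f_calc)   # (2 + 2 + 3) / 60
```

Lower values mean the model's calculated amplitudes agree better with the measured ones, which is why the comparison plots treat a lower refined $R_{free}$ as the better model.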
ACMI-PF Versus ACMI-Naïve
[Figure: refined $R_{free}$ (y-axis, 0.2 to 0.5) versus number of ACMI-PF runs (x-axis, 1 to 10), with series ACMI-PF and ACMI-Naïve]
Additionally, ACMI-PF's models have
Fewer gaps (10 vs. 28)
Lower sidechain RMS error (2.1Å vs. 2.3Å)
ACMI-PF Versus Others
[Figure: three scatter plots of ACMI-PF $R_{free}$ (y-axis, 0.25 to 0.65) versus ARP/wARP, Resolve, and Textal $R_{free}$ (x-axes, 0.25 to 0.65)]
ACMI-PF Example: 2A3Q
2.3Å resolution, 66° phase error
1.79Å RMSd, 92% complete
ACMI Overview
Phase 4: Iterative phase improvement
Use particle-filtering models to improve density-map quality
Rerun entire pipeline on improved density map
Repeat until convergence
Phase Problem
$f(I, \Phi)$
Intensities $I$: measured by X-ray crystallography
Phases $\Phi$: experimentally estimated (e.g. MAD, MIR)
Density-Map Phasing
[Figure: density maps at 0°, 30°, 60°, and 75° mean phase error, labeled with $I_{calc}$, $\Phi_{calc}$, and $\Phi_{exp}$]
Iterative Phase Improvement
[Figure: cycle from initial density map to predicted 3D model to revised density map, with $I_{obs}$ as input]
ACMI-PF's Phase Improvement
[Figure: scatter plot of error in ACMI-PF's phases (deg. mean phase error, 0 to 75) versus error in initial phases (deg. mean phase error, 0 to 75)]
Two-Iteration ACMI
[Figure: scatter plot of % backbone located in iteration 2 (y-axis, 50 to 100) versus % backbone located in iteration 1 (x-axis, 50 to 100)]
Future Work: Many-iteration ACMI
[Figure: two plots against number of ACMI iterations: average % uninterpreted AAs, and average mean phase error]
Conclusions
ACMI’s three steps construct a set of all-atom protein models from a density map
Novel message approximation allows inference on large, highly-connected models
Resulting protein models are more accurate than those produced by other methods
Ongoing and Future Work
Incorporate additional structural biology background knowledge
Incorporate more complex potential functions
Further work on iterative phase improvement
Generalize my algorithms to other 3D image data
Acknowledgements
Advisor: Jude Shavlik
Committee: George Phillips, Charles Dyer, David Page, Mark Craven
Collaborators: Ameet Soni, Dmitry Kondrashov, Eduard Bitto, Craig Bingman
6th floor MSCers
Center for Eukaryotic Structural Genomics
Funding: UW-Madison Graduate School, NLM 1T15 LM007359, NLM 1R01 LM008796