Probabilistic Methods for Interpreting Electron-Density Maps

Frank DiMaio

University of Wisconsin – Madison Computer Sciences Department


3D Protein Structure

[Figure: protein structure with backbone, sidechain, and C-alpha atoms labeled; example residue sequence … ALA LEU PRO VAL ARG … with ?? markers]

High-Throughput Structure Determination

Protein-structure determination is important for
  Understanding the function of a protein
  Understanding mechanisms
  Identifying targets for drug design

Some proteins produce poor density maps

Interpreting poor electron-density maps is very laborious for humans

I aim to automatically interpret poor-quality electron-density maps

Electron-Density Map Interpretation

GIVEN: 3D electron-density map, (linear) amino-acid sequence

FIND: All-atom protein model

Density Map Resolution

[Figure: example density at 1.0Å, 2.0Å, 3.0Å, and 4.0Å resolution, annotated with Morris et al. (2003), Ioerger et al. (2002), Terwilliger (2003), and "My focus"]

Thesis Contributions

A probabilistic approach to protein-backbone tracing
  DiMaio et al., Intelligent Systems for Molecular Biology (2006)

Improved template matching in electron-density maps
  DiMaio et al., IEEE Conference on Bioinformatics and Biomedicine (2007)

Creating all-atom protein models using particle filtering
  DiMaio et al. (under review)

Pictorial structures for atom-level molecular modeling
  DiMaio et al., Advances in Neural Information Processing Systems (2004)

Improving the efficiency of belief propagation
  DiMaio and Shavlik, IEEE International Conference on Data Mining (2006)

Iterative phase improvement in ACMI

ACMI Overview

Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)
  Independent amino-acid search
  Templates model 5-mer conformational space

Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)
  Protein structural constraints refine local search
  Markov field (MRF) models pairwise constraints

Phase 3: Sample all-atom models
  Particle filtering samples high-probability structures
  Probabilities from MRF guide particle trajectories

5-mer Lookup

…SAW C VKFEKPADKNGKTE…

ProteinDB

ACMI searches map for each template independently

Spherical-harmonic decomposition allows rapid search of all template rotations

Spherical-Harmonic Decomposition

f (θ,φ)
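For reference, the standard spherical-harmonic expansion of a function on the sphere is written out below. This is the textbook form, not a formula taken from the slides; the coefficient name $a_{lm}$ and the truncation bandwidth $B$ are notation assumed here for illustration.

```latex
f(\theta,\phi) \;\approx\; \sum_{l=0}^{B} \sum_{m=-l}^{l} a_{lm}\, Y_l^m(\theta,\phi),
\qquad
a_{lm} = \int_{0}^{2\pi}\!\!\int_{0}^{\pi} f(\theta,\phi)\, \overline{Y_l^m(\theta,\phi)}\, \sin\theta \, d\theta \, d\phi
```

Rotating $f$ mixes only coefficients that share the same degree $l$, which is what makes a search over all rotations cheap once both the template and the map region are expressed in this basis.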

5-mer Fast Rotation Search

Template side
  pentapeptide fragment from PDB (the “template”)
  calculated (expected) density in 5Å sphere
  template density sampled in spherical shells
  template spherical-harmonic coefficients

Map side
  electron density map
  sampled region of density in 5Å sphere
  map region sampled in spherical shells
  map-region spherical-harmonic coefficients

Both sets of coefficients feed the fast-rotation function (Navaza 2006, Risbo 1996), which gives the correlation coefficient as a function of rotation

Convert Scores to Probabilities

Scan density map for fragment
  → correlation coefficients over density map, $t_i(u_i)$
  → (Bayes’ rule) probability distribution over density map, $P(\text{5-mer at } u_i \mid EDM)$
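A minimal sketch of the score-to-probability step, assuming Gaussian score distributions for true matches and for background. The slides do not say which likelihood model ACMI uses, so the function name `scores_to_posterior` and every numeric parameter below are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def scores_to_posterior(scores, match_mean, match_sd, bg_mean, bg_sd, prior_match):
    """Bayes' rule: P(match | score) at each grid point.

    ASSUMPTION: match and background scores are Gaussian; the real
    likelihood model is not given in the slides.
    """
    like_match = gaussian_pdf(scores, match_mean, match_sd)
    like_bg = gaussian_pdf(scores, bg_mean, bg_sd)
    numerator = like_match * prior_match
    return numerator / (numerator + like_bg * (1.0 - prior_match))

# Hypothetical correlation coefficients at four map locations.
scores = np.array([0.05, 0.20, 0.55, 0.80])
posterior = scores_to_posterior(scores, 0.6, 0.15, 0.1, 0.15, prior_match=0.01)
print(posterior)
```

With equal standard deviations, a higher correlation always maps to a higher posterior, so the ranking of locations is preserved while the values become comparable across templates.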

Probabilistic Backbone Model

Trace assigns a position and orientation $u_i = \{x_i, q_i\}$ to each amino acid $i$

The probability of a trace $U = \{u_i\}$ is

$$P(U \mid EDM) = P(u_1, \ldots, u_N \mid EDM)$$

This full joint probability is intractable to compute

Approximate using pairwise Markov field

Pairwise Markov-Field Model

Joint probabilities defined on a graph as product of vertex and edge potentials

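$$P(U \mid EDM) \propto \prod_{\text{AAs } i} \psi_i(u_i \mid EDM) \times \prod_{\text{AAs } i,j} \psi_{ij}(u_i, u_j)$$

[Figure: graph over amino acids ALA, GLY, LYS, LEU, SER]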
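The product-of-potentials definition can be checked on a toy model by brute-force enumeration. This is only an illustrative sketch: the three-node graph, the two states per node, and all potential values are made up, and enumeration is exactly what becomes intractable at protein scale.

```python
import itertools
import numpy as np

# Hypothetical vertex potentials: 3 nodes, 2 candidate states each.
vertex_pot = [np.array([0.7, 0.3]), np.array([0.6, 0.4]), np.array([0.2, 0.8])]

# Hypothetical edge potentials, indexed as table[state_i, state_j].
edge_pot = {
    (0, 1): np.array([[1.0, 0.2], [0.2, 1.0]]),
    (1, 2): np.array([[1.0, 0.2], [0.2, 1.0]]),
    (0, 2): np.array([[0.0, 1.0], [1.0, 0.0]]),  # nodes 0 and 2 may not share a state
}

def unnormalized_joint(states):
    """Product of all vertex and edge potentials for one full assignment."""
    value = 1.0
    for i, s in enumerate(states):
        value *= vertex_pot[i][s]
    for (i, j), table in edge_pot.items():
        value *= table[states[i], states[j]]
    return value

all_states = list(itertools.product(range(2), repeat=3))
z = sum(unnormalized_joint(s) for s in all_states)          # partition function
joint = {s: unnormalized_joint(s) / z for s in all_states}  # normalized joint
best = max(joint, key=joint.get)
print(best, joint[best])
```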

ACMI’s Backbone Model

Observational potentials tie the map to the model

Adjacency constraints ensure adjacent amino acids are ~3.8Å apart and in proper orientation

Occupancy constraints ensure nonadjacent amino acids do not occupy the same 3D space


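Constraints between adjacent amino acids

$$\psi_{adj}(u_i, u_j) = p_x(\lVert x_i - x_j \rVert) \times p_\theta(u_i, u_j)$$

Constraints between all other amino-acid pairs

$$\psi_{occ}(u_i, u_j) = \begin{cases} 0 & \text{if } \lVert x_i - x_j \rVert < K \\ 1 & \text{otherwise} \end{cases}$$

Both enter the joint probability

$$p(U \mid EDM) \propto \prod_{\text{AAs } i} \psi_i(u_i \mid EDM) \times \prod_{\substack{\text{AAs } i,j \\ |i-j| = 1}} \psi_{adj}(u_i, u_j) \times \prod_{\substack{\text{AAs } i,j \\ |i-j| > 1}} \psi_{occ}(u_i, u_j)$$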
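The two pairwise potentials can be sketched as small functions. The 3.8Å spacing comes from the slides; the Gaussian form of the distance term, its width `SIGMA`, the threshold `K`, and the omission of the orientation term are all assumptions made for illustration.

```python
import numpy as np

CA_SPACING = 3.8  # Angstroms between adjacent C-alphas (from the slides)
SIGMA = 0.2       # ASSUMED width of the distance term
K = 3.0           # ASSUMED occupancy threshold in Angstroms

def distance(x_i, x_j):
    return float(np.linalg.norm(np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)))

def psi_adj_distance(x_i, x_j, sigma=SIGMA):
    """Distance part of the adjacency potential (orientation term omitted)."""
    d = distance(x_i, x_j)
    return float(np.exp(-0.5 * ((d - CA_SPACING) / sigma) ** 2))

def psi_occ(x_i, x_j, k=K):
    """Occupancy potential: 0 if the two positions are closer than k, else 1."""
    return 0.0 if distance(x_i, x_j) < k else 1.0

print(psi_adj_distance([0, 0, 0], [3.8, 0, 0]))  # ideal spacing scores highest
print(psi_occ([0, 0, 0], [1.0, 0, 0]))           # clash
print(psi_occ([0, 0, 0], [5.0, 0, 0]))           # no clash
```

Because the occupancy potential is flat everywhere except near a clash, it carries little information per edge, which is the sense in which the later slides call it "weak".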

Inferring Backbone Locations

Want to find backbone layout that maximizes

$$p(U \mid EDM) \propto \prod_{\text{AAs } i} \psi_i(u_i \mid EDM) \times \prod_{\substack{\text{AAs } i,j \\ |i-j| = 1}} \psi_{adj}(u_i, u_j) \times \prod_{\substack{\text{AAs } i,j \\ |i-j| > 1}} \psi_{occ}(u_i, u_j)$$

Exact methods are intractable

Use belief propagation (Pearl 1988) to approximate marginal distributions

$$p_i(u_i \mid EDM) = \sum_{u_k,\ k \neq i} p(U \mid EDM)$$

Belief Propagation Example

[Figure: nodes LYS31 and LEU32 with current beliefs $\hat p_{LYS31}$ and $\hat p_{LEU32}$; message $m_{LYS31 \to LEU32}$ is passed first, then message $m_{LEU32 \to LYS31}$]
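Message passing between neighbouring nodes can be sketched with sum-product on a short chain. Note the simplification: on a chain BP is exact, whereas ACMI's graph is highly connected and BP there is only approximate. The potentials are random placeholders and the function names are hypothetical.

```python
import itertools
import numpy as np

def chain_bp(vertex_pot, edge_pot):
    """Sum-product on a chain; edge_pot[i][a, b] couples node i (state a) to node i+1 (state b)."""
    n = len(vertex_pot)
    fwd = [np.ones_like(vertex_pot[0]) for _ in range(n)]  # message arriving from the left
    bwd = [np.ones_like(vertex_pot[0]) for _ in range(n)]  # message arriving from the right
    for i in range(1, n):
        fwd[i] = edge_pot[i - 1].T @ (vertex_pot[i - 1] * fwd[i - 1])
    for i in range(n - 2, -1, -1):
        bwd[i] = edge_pot[i] @ (vertex_pot[i + 1] * bwd[i + 1])
    beliefs = [vertex_pot[i] * fwd[i] * bwd[i] for i in range(n)]
    return np.array([b / b.sum() for b in beliefs])

def brute_force_marginals(vertex_pot, edge_pot):
    """Reference marginals by enumerating every joint assignment."""
    n, k = len(vertex_pot), len(vertex_pot[0])
    marg = np.zeros((n, k))
    for states in itertools.product(range(k), repeat=n):
        w = 1.0
        for i, s in enumerate(states):
            w *= vertex_pot[i][s]
        for i in range(n - 1):
            w *= edge_pot[i][states[i], states[i + 1]]
        for i, s in enumerate(states):
            marg[i, s] += w
    return marg / marg.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
vertex_pot = [rng.uniform(0.1, 1.0, size=3) for _ in range(4)]
edge_pot = [rng.uniform(0.1, 1.0, size=(3, 3)) for _ in range(3)]
bp_marginals = chain_bp(vertex_pot, edge_pot)
exact_marginals = brute_force_marginals(vertex_pot, edge_pot)
print(np.max(np.abs(bp_marginals - exact_marginals)))
```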

Scaling BP to Proteins (DiMaio and Shavlik, ICDM 2006)

Naïve implementation: O(N²G²)
  N = number of amino acids in the protein
  G = number of points in the discretized density map

O(G²) computation for each message passed
  O(G log G) as a Fourier-space multiplication

O(N²) messages computed and stored
  Approximate the (N-3) occupancy messages with 1 message
  O(N) messages using a message accumulator

Improved implementation: O(NG log G)
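The drop from O(G²) to O(G log G) per message follows from the convolution theorem when the edge potential depends only on the displacement between two grid points. The sketch below assumes a 1D periodic grid for brevity (a real map is 3D) and uses made-up values; `message_direct` and `message_fft` are hypothetical names.

```python
import numpy as np

def message_direct(kernel, belief):
    """O(G^2): out[j] = sum_i kernel[(j - i) mod G] * belief[i]."""
    g = len(belief)
    out = np.zeros(g)
    for j in range(g):
        for i in range(g):
            out[j] += kernel[(j - i) % g] * belief[i]
    return out

def message_fft(kernel, belief):
    """O(G log G): the same circular convolution as a Fourier-space product."""
    return np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(belief)))

rng = np.random.default_rng(1)
G = 64
kernel = rng.random(G)  # hypothetical displacement-only potential
belief = rng.random(G)  # hypothetical belief over grid points
slow = message_direct(kernel, belief)
fast = message_fft(kernel, belief)
print(np.max(np.abs(slow - fast)))
```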

Occupancy Message Approximation

To pass a message

$$m^{n}_{i \to j}(u_j) = \int_{u_i} \psi_{occ}(u_i, u_j) \times \frac{\hat p^{\,n-1}_i(u_i)}{m^{n-1}_{j \to i}(u_i)} \, du_i$$

where $\psi_{occ}$ is the occupancy edge potential and the fraction is the product of incoming messages to $i$ except the one from $j$

“Weak” potentials between nonadjacent amino acids let us approximate

$$m^{n}_{i}(u) = \int_{u_i} \psi_{occ}(u_i, u) \times \hat p^{\,n-1}_i(u_i) \, du_i$$

where $\hat p^{\,n-1}_i(u_i)$ is the product of all incoming messages to $i$

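[Figure: six-node chain (1 to 6). Node 3's exact occupancy messages to nonadjacent nodes 1, 5, and 6 each divide $\hat p_3(x_3)$ by the matching incoming message $m_{1 \to 3}$, $m_{5 \to 3}$, or $m_{6 \to 3}$; under the approximation, all three outgoing messages are the same integral of $\psi_{occ}$ and $\hat p_3(x_3)$]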


Send outgoing occupancy message product to a central accumulator

$$ACC(x) = \prod_{\text{AAs } i} m_i(x)$$

Then, each node’s incoming message product is computed in constant time

$$\hat p_3(x_3) \propto \frac{ACC(x_3)}{m_2(x_3)\, m_3(x_3)\, m_4(x_3)} \times m_{2 \to 3}(x_3) \times m_{4 \to 3}(x_3)$$
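A small numerical check of the accumulator idea: dividing the global product by a node's own message and its chain neighbours' messages gives the same result as multiplying the nonadjacent messages directly. The message values are random placeholders, nodes are 0-indexed here, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, grid = 6, 8
# outgoing[i] plays the role of m_i(x): node i's single broadcast occupancy message.
outgoing = rng.uniform(0.5, 1.5, size=(n_nodes, grid))

acc = np.prod(outgoing, axis=0)  # ACC(x), computed once per iteration

def incoming_via_accumulator(i):
    """Divide out node i's own message and those of its chain neighbours."""
    excluded = [k for k in (i - 1, i, i + 1) if 0 <= k < n_nodes]
    return acc / np.prod(outgoing[excluded], axis=0)

def incoming_direct(i):
    """Reference: multiply the messages from every nonadjacent node."""
    keep = [k for k in range(n_nodes) if abs(k - i) > 1]
    return np.prod(outgoing[keep], axis=0)

for i in range(n_nodes):
    print(i, np.allclose(incoming_via_accumulator(i), incoming_direct(i)))
```

The division assumes strictly positive messages; an implementation working with zeros or very small values would more likely accumulate in log space.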

BP Output

After some number of iterations, BP gives probability distributions over Cα locations

[Figure: marginal distributions such as $\hat p_{LEU}(x_{LEU})$ and $\hat p_{VAL}(x_{VAL})$ shown for residues … ALA LEU PRO VAL ARG …]

ACMI’s Backbone Trace

Independently choose Cα locations that maximize approximate marginal distribution

$$b_i^{*} = \arg\max_{x_i} \hat p_i(x_i)$$

Example: 1XRI

[Figure: predicted backbone trace colored by prob(AA at location), from LOW (0.1) to HIGH (0.9)]

3.3Å resolution density map, 39° mean phase error

0.9009Å RMSd, 93% complete

Testset Density Maps (raw data)

[Figure: scatter plot of the test-set maps. x-axis: density-map resolution (Å), 1.0 to 4.0; y-axis: density-map mean phase error (deg.), 15 to 75]

Experimental Accuracy

[Figure: bar chart comparing ACMI, ARP/wARP, Textal, and Resolve. y-axis: % Cα’s located within 2Å of some Cα / correct Cα. Series: % backbone correctly placed, % amino acids correctly identified]

Experimental Accuracy on a Per-Protein Basis

[Figure: three scatter plots of ACMI % Cα’s located (y-axis, 0 to 100) against ARP/wARP, Resolve, and Textal % Cα’s located (x-axes, 0 to 100)]

Problems with ACMI

Biologists want location of all atoms

All Cα’s lie on a discrete grid

Maximum-marginal backbone model may be physically unrealistic

Ignoring a lot of information

Multiple models may better represent conformational variation within crystal

[Figure: three candidate structures with Probability=0.4, Probability=0.35, and Probability=0.25, alongside the maximum-marginal structure]

ACMI with Particle Filtering (ACMI-PF)

Idea: Represent protein using a set of static 3D all-atom protein models

Particle Filtering Overview (Doucet et al. 2000)

Given some Markov process $x_{1:K} \in X$ with observations $y_{1:K} \in Y$

Particle filtering approximates some posterior probability distribution over $X$ using a set of $N$ weighted point estimates

$$p(x_{1:K} \mid y_{1:K}) \approx \sum_{i=1}^{N} wt_K^{(i)} \, \delta\!\left(x_{1:K} - x_{1:K}^{(i)}\right)$$

Markov process gives recursive formulation

$$p(x_{1:k} \mid y_{1:k}) \propto p(y_k \mid x_k)\, p(x_k \mid x_{k-1})\, p(x_{1:k-1} \mid y_{1:k-1})$$

Use importance function $q(x_k \mid x_{0:k-1}, y_k)$ to grow particles

Recursive weight update

$$wt_k^{(i)} \propto wt_{k-1}^{(i)} \times \frac{p\!\left(y_k \mid x_k^{(i)}\right) p\!\left(x_k^{(i)} \mid x_{k-1}^{(i)}\right)}{q\!\left(x_k^{(i)} \mid x_{k-1}^{(i)}, y_k\right)}$$
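The recursive weight update can be written as one vectorized step over all particles. This is a generic sketch of sequential importance sampling, not ACMI-PF's code; the function name and the numbers are hypothetical.

```python
import numpy as np

def update_weights(prev_weights, likelihood, transition, proposal):
    """One importance-weight update, followed by normalization.

    Each argument holds one value per particle:
      likelihood[i] = p(y_k | x_k^(i))
      transition[i] = p(x_k^(i) | x_{k-1}^(i))
      proposal[i]   = q(x_k^(i) | x_{k-1}^(i), y_k)
    """
    w = np.asarray(prev_weights) * np.asarray(likelihood) * np.asarray(transition) / np.asarray(proposal)
    return w / w.sum()

# Hypothetical values for four particles. When the proposal equals the
# transition model, the update reduces to reweighting by the likelihood.
prev = np.full(4, 0.25)
likelihood = np.array([0.1, 0.2, 0.3, 0.4])
transition = np.array([0.5, 0.6, 0.7, 0.8])
new_weights = update_weights(prev, likelihood, transition, proposal=transition)
print(new_weights)
```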

Particle Filtering for Protein Structures

Particle refers to one specific 3D layout of some subsequence of the protein

At each iteration, advance the particle’s trajectory by placing an additional amino acid’s atoms

Alternate extending chain left and right

An iteration alternately places
  Cα position $b_{k+1}$ given $b_k$
  All sidechain atoms $s_k$ given $b_{k-1:k+1}$

Key idea: Use the conditional distribution $p(b_k \mid b^i_{k-1}, Map)$ to advance particle trajectories

Construct this conditional distribution from BP’s marginal distributions

Algorithm
  place “seeds” $b^i_k$ for each particle $i = 1 \ldots N$
  while amino acids remain
    place $b^i_{k+1}$ / $b^i_{j-1}$ given $b^i_{j:k}$ for each $i = 1 \ldots N$
    place $s^i_k$ given $b^i_{k-1:k+1}$ for each $i = 1 \ldots N$
    optionally resample N particles
  end while

Backbone Step (for particle i)

Algorithm line: place $b^i_{k+1}$ given $b^i_k$ for each $i = 1 \ldots N$

(1) Sample L $b_{k+1}$’s from the $b_{k-1}$–$b_k$–$b_{k+1}$ pseudoangle distribution, giving $b^{1 \ldots L}_{k+1}$

(2) Weight each sample by its ACMI-computed approximate marginal: $p_{k+1}(b^1_{k+1}),\ p_{k+1}(b^2_{k+1}),\ \ldots,\ p_{k+1}(b^L_{k+1})$

(3) Select $b_{k+1}$ with probability proportional to sample weight

(4) Update particle weight as sum of sample weights

$$wt_{k+1} \propto wt_k \sum_{l=1}^{L} p_{k+1}\!\left(b^l_{k+1}\right)$$
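Steps (2) to (4) of the backbone step can be sketched as below. Step (1), drawing candidates from the pseudoangle distribution, is assumed to have been done already; `backbone_step`, `toy_marginal`, and the candidate positions are hypothetical stand-ins for ACMI's real data.

```python
import numpy as np

def backbone_step(candidates, marginal, particle_weight, rng):
    """Weight candidates by the marginal, pick one, and update the particle weight."""
    sample_weights = np.array([marginal(b) for b in candidates], dtype=float)
    total = sample_weights.sum()  # assumed > 0; a real implementation must handle 0
    choice = rng.choice(len(candidates), p=sample_weights / total)
    return candidates[choice], particle_weight * total

# Hypothetical candidate C-alpha positions and a toy marginal lookup table.
candidates = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
toy_table = {candidates[0]: 0.1, candidates[1]: 0.0, candidates[2]: 0.3}

def toy_marginal(b):
    return toy_table[b]

rng = np.random.default_rng(0)
chosen, new_weight = backbone_step(candidates, toy_marginal, particle_weight=0.5, rng=rng)
print(chosen, new_weight)
```

The sidechain step follows the same pattern, with sidechain conformations as candidates and the probability of the density map given each conformation as the sample weight.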

Sidechain Step (for particle i)

Algorithm line: place $s^i_k$ given $b^i_{k-1:k+1}$ for each $i = 1 \ldots N$

(1) Sample $s_k$ from a database of sidechain conformations (Protein Data Bank)

(2) For each sidechain conformation, compute probability of density map given the sidechain: $p_k(EDM \mid s^1_k),\ p_k(EDM \mid s^2_k),\ p_k(EDM \mid s^3_k),\ \ldots$

(3) Select sidechain conformation from this weighted distribution

(4) Update particle weight as sum of sample weights

$$wt_{k+1} \propto wt_k \sum_{m=1}^{M} p_k\!\left(EDM \mid s^m_k\right)$$


Particle Resampling

[Figure: five particles with weights 0.1, 0.1, 0.4, 0.3, and 0.1 are resampled into five particles with weight 0.2 each]
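A minimal multinomial resampling sketch using the weights shown on the slide. The slides do not say which resampling scheme ACMI-PF uses, so multinomial resampling is an assumption, as are the function name and the particle labels.

```python
import numpy as np

def resample(particles, weights, rng):
    """Draw N particles with replacement, proportional to weight; reset weights to 1/N."""
    weights = np.asarray(weights, dtype=float)
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights / weights.sum())
    return [particles[i] for i in idx], np.full(n, 1.0 / n)

particles = ["A", "B", "C", "D", "E"]  # placeholder labels for five protein models
weights = [0.1, 0.1, 0.4, 0.3, 0.1]    # weights from the slide
rng = np.random.default_rng(0)
new_particles, new_weights = resample(particles, weights, rng)
print(new_particles, new_weights)
```

High-weight particles tend to be duplicated and low-weight ones dropped, after which every survivor carries weight 0.2.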

Amino-Acid Sampling Order

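Begin at some amino acid $k$ with probability

$$P(k) \propto \exp\!\big(-\mathrm{entropy}(\hat p_k(b_k))\big)$$

At each step, with amino acids $j$ through $k$ already placed, move left or right with probability

$$P(j-1) \propto \exp\!\big(-\mathrm{entropy}(\hat p_{j-1}(b_{j-1}))\big)$$

$$P(k+1) \propto \exp\!\big(-\mathrm{entropy}(\hat p_{k+1}(b_{k+1}))\big)$$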
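The entropy-based choice of starting residue can be sketched as follows, assuming the sign convention that low-entropy (confident) marginals are preferred. The marginals are toy arrays and the function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution; zero entries are skipped."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())

def start_probabilities(marginals):
    """P(k) proportional to exp(-entropy(marginal_k)), normalized over residues."""
    scores = np.array([np.exp(-entropy(m)) for m in marginals])
    return scores / scores.sum()

# Toy marginals over four grid points for three residues.
marginals = [
    [0.25, 0.25, 0.25, 0.25],  # flat: location unknown
    [0.97, 0.01, 0.01, 0.01],  # peaked: location well determined
    [0.50, 0.50, 0.00, 0.00],
]
probs = start_probabilities(marginals)
print(probs)
```

The same scores, restricted to the two residues flanking the placed segment, would give the left-versus-right choice at each step.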

Experimental Methodology

Run ACMI-PF 10 times with 100 particles each

Return highest-weight particle from each run

Each run samples amino acids in a different order

Refine each structure for 10 iterations in Refmac5

Compare 10-structure model to others using $R_{free}$

$$R = \frac{\sum \big|\, |F_{obs}| - |F_{calc}| \,\big|}{\sum |F_{obs}|}$$
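The R factor is a short computation once observed and calculated structure-factor amplitudes are available. The sketch below uses made-up amplitudes; $R_{free}$ applies the same formula to a set of reflections held out of refinement.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|)."""
    f_obs = np.abs(np.asarray(f_obs))
    f_calc = np.abs(np.asarray(f_calc))
    return float(np.abs(f_obs - f_calc).sum() / f_obs.sum())

# Hypothetical amplitudes for three reflections.
perfect = r_factor([10.0, 20.0, 30.0], [10.0, 20.0, 30.0])
imperfect = r_factor([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
print(perfect, imperfect)
```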

ACMI-PF Versus ACMI-Naïve

[Figure: refined $R_{free}$ (y-axis, 0.2 to 0.5) versus number of ACMI-PF runs (x-axis, 1 to 10) for ACMI-PF and ACMI-Naïve]

Additionally, ACMI-PF’s models have
  Fewer gaps (10 vs. 28)
  Lower sidechain RMS error (2.1Å vs. 2.3Å)

ACMI-PF Versus Others

[Figure: three scatter plots of ACMI-PF $R_{free}$ (y-axis, 0.25 to 0.65) against ARP/wARP, Resolve, and Textal $R_{free}$ (x-axes, 0.25 to 0.65)]

ACMI-PF Example: 2A3Q

2.3Å resolution, 66° phase error

1.79Å RMSd, 92% complete

ACMI Overview





Phase 4: Iterative phase improvement
  Use particle-filtering models to improve density-map quality
  Rerun entire pipeline on improved density map
  Repeat until convergence

Phase Problem

Density map $= f(I, \Phi)$

Intensities $I$: measured by X-ray crystallography

Phases $\Phi$: experimentally estimated (e.g. MAD, MIR)
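The dependence of the map on both amplitudes and phases can be illustrated with a discrete Fourier transform. This is a 1D toy with a random "density", assumed purely for illustration; real maps are 3D and the measured intensities are the squared amplitudes.

```python
import numpy as np

rng = np.random.default_rng(0)
density = rng.random(64)                # hypothetical 1D "density"
structure_factors = np.fft.fft(density)
amplitudes = np.abs(structure_factors)  # available from measured intensities
phases = np.angle(structure_factors)    # lost in the experiment

def density_from(amplitudes, phases):
    """Rebuild a density from amplitudes and phases by inverse Fourier transform."""
    return np.real(np.fft.ifft(amplitudes * np.exp(1j * phases)))

recovered = density_from(amplitudes, phases)
scrambled = density_from(amplitudes, rng.uniform(-np.pi, np.pi, size=64))
print(np.allclose(recovered, density), np.allclose(scrambled, density))
```

Correct amplitudes with wrong phases give a different map, which is why improving the phase estimates improves the density map.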

Density-Map Phasing

[Figure: density maps at 0°, 30°, 60°, and 75° mean phase error; labels: $I_{calc}$, $\Phi_{calc}$, $\Phi_{exp}$]

Iterative Phase Improvement

[Figure: loop linking the initial density map, the predicted 3D model, the observed intensities $I_{obs}$, and the revised density map]

ACMI-PF’s Phase Improvement

[Figure: scatter plot of error in ACMI-PF’s phases (y-axis, deg. mean phase error, 0 to 75) versus error in initial phases (x-axis, deg. mean phase error, 0 to 75)]

Two-Iteration ACMI

[Figure: scatter plot of % backbone located in iteration 2 (y-axis, 50 to 100) versus % backbone located in iteration 1 (x-axis, 50 to 100)]

Future Work: Many-iteration ACMI

[Figure: two plots against number of ACMI iterations: average % uninterpreted AAs, and average mean phase error]

Conclusions

ACMI’s three steps construct a set of all-atom protein models from a density map

Novel message approximation allows inference on large, highly-connected models

Resulting protein models are more accurate than other methods

Ongoing and Future Work

Incorporate additional structural biology background knowledge

Incorporate more complex potential functions

Further work on iterative phase improvement

Generalize my algorithms to other 3D image data

Acknowledgements

Advisor: Jude Shavlik

Committee: George Phillips, Charles Dyer, David Page, Mark Craven

Collaborators: Ameet Soni, Dmitry Kondrashov, Eduard Bitto, Craig Bingman

6th floor MSCers

Center for Eukaryotic Structural Genomics

Funding: UW-Madison Graduate School, NLM 1T15 LM007359, NLM 1R01 LM008796
