rigid body docking -...
TRANSCRIPT
Rigid Body DockingWilly Wriggers
School of Health Information Sciences & Institute of Molecular MedicineUniversity of Texas – Houston
14 Years of Multi-Scale Modeling in EMGuoji Wang, Claudine Porta, Zhongguo Chen, Timothy S. Baker, John E. Johnson:Identification of a Fab interaction footprint site on an icosahedral virus by cryoelectron microscopy and X-ray crystallography. Nature, 355:275, 1992.
Phoebe L. Stewart, Stephen D. Fuller, Roger M. Burnett:Difference imaging of adenovirus: bridging the resolution gap between X-ray crystallography and electron microscopy. EMBO J., 12:2589, 1993.
“At that time, placing an atomic structure into an EM map seemed like a very dangerous idea…”Phoebe Stewart, 2003
1998: “Simulated Markers”
Actin filament: Reconstruction from EM data at 20Å resolution rmsd: 1.1Å
Willy Wriggers, Ronald A. Milligan, Klaus Schulten, and J. Andrew McCammon:Self-Organizing Neural Networks Bridge the Biomolecular Resolution Gap. J. Mol. Biol., 284:1247, 1998
1999: The First Algorithmic Packages
Willy Wriggers, Ronald A. Milligan, and J. Andrew McCammon:Situs: A Package for Docking Crystal Structures into Low-Resolution Maps from Electron Microscopy. J. Structural Biology, 125:185, 1999
Niels Volkmann and Dorit Hanein:Quantitative Fitting of Atomic Models into Observed Densities Derived by Electron Microscopy. J. Structural Biology, 125:176, 1999
2000-Present: The “Gold Rush” era: Many other labs join the effort to develop multi-scale modeling and visualization tools…
Advantages of a Collaborative Design
VMD,ChimeraSculptor
Resumé: 2001-2006
• ~2000 unique, registered users of Situs• 6 workshops in Houston, San Diego, New Hampshire, Finland, France• ~1000 citations• 34 collaborators around the world• 25 publications (our lab)• ~50 publications using Situs (collaborators)
( ) ⎟⎟⎠
⎞⎜⎜⎝
⎛ −= 22
23expσ
rrG
Actin filament(Holmes et al., 1990)
Convolution withGaussian:
Q: We can lower the resolution of 3D data, but how can one increase it?A: Combine low- with high-resolution data by flexible and rigid-body fitting.
2σ=0Å 2σ=10Å 2σ=20Å 2σ=40Å
Combining Multi-Resolution Biophys. Data
( )( )
*em1
calc
( )C
ff
f
e
e
ρ
ρ
− ⊗
⊗
=
⎡ ⎤×⎢ ⎥⎢ ⎥⎣ ⎦
T
LOW-RESOLUTIONEM MAP
COMPUTE CORRELATION
ROTATE
FFT
UPDATE LATTICE:SAVE MAX. C VALUE + CORRESPONDING Θ,Φ,Ψ
ATOMICSTRUCTURE
FFT
TARGET PROBE
GAUSSIAN g
( )*em ( )f e ρ⊗ h
em( )ρ r atomic( )ρ r
atomic ( )ρ rRFILTER e
atomic
calc
( )
( )
g ρ
ρ
⊗
≡
rR
r
( )calc ( )ef ρ⊗ h
ROTATIONAL SPACE3 EULER ANGLES
Θ,Φ,Ψ → R
FILTER e
6D Search with FTM
Grid size 6ÅResolution 15Å90 steps (30481 rotations)
Only Laplacian filtering successfully restores the initial pose
standard cross-correlation
w/ Laplacianfiltering
Example: RecA
E. Coli RNA polymerase
(Seth Darst)
E. Coli RNA polymerase + factor GreB
Need to perform difference mapping to localize GreB(difficult at variable helical symmetry)
Problem: different helical arrays
Registration of Two EM Maps
Rigid-body docking: The RNAP “jaws” are open in presence of GrepB factor, perform flexible map fitting
Map fitting new in Situs 2.2.
Registration and Difference Mapping
FRM: Fast Rotational Matching
Euler angle search is expensive!
9º angular sampling (30481 rotations) requires > 10 minutes on standard workstation for rotations only. Rotations + translations: 10-20 hours.
Our Goal: We seek to FFT-accelerate rotational search in addition to translational search.
To do this, we need to do the math in rotational space and take advantage of expressions similar to convolution theorem that are best described bygroup theory.
Target map f
Probe map g
lmY are the spherical harmonic functions
∑ ∑−
= −=
=1
0
)()(ˆ)(B
l
l
lmlmlm uYsfsuf
∑ ∑−
= −=
=1
0
)()(ˆ)(B
l
l
lmlmlm uYsgsug
Expansion in Spherical Harmonics
FRM3D Method
3
( ) ( ) ( , , )c R f g R c α β γ= ⋅ =∫R
The correlation function is:
Expanding f and g in spherical harmonics as beforewe arrive at: 2 2
ˆ( , , ) ( ) ( )l l lpq qr pr
l
c p q r d d Iπ π= ∑
dsssgsfI lrlplpr
2
0
)(ˆ)(ˆ∫∞
=where:
α,β,γ are specially chosen Euler angles (origin shift).
13 ˆ( , , ) ( )Dc FFT cα β γ −=
The quantities (which come from rotation group theory) are precomputed using a recursive procedure.
)( 2πl
pqd
FRM
( ) ( , , )c R c α β γ=
2 2ˆ( , , ) ( ) ( )l l l
pq qr prl
c p q r d d Iπ π= ∑Precomputations:
FFT setup)( 2
πlpqd
Reading mapsCOMs & sampling
lprlrlp Irgrf ),(ˆ),(ˆ
ˆ( , , )c p q r1
3 ˆ( , , ) ( )Dc FFT cα β γ −=
Postprocessing& Saving
Cro
wth
er
( ) ( , ; )c R c φ ψ θ=ˆ( , ; ) ( )l l
pr prl
c p r d Iθ θ= ∑
Precomputations:FFT setup
Reading mapsCOMs & sampling
lprlrlp Irgrf ),(ˆ),(ˆ
ˆ( , ; )kc p r θ1
2 ˆ( , ; ) ( )k Dc FFT cφ ψ θ −=
Postprocessing& Saving
k =
0, …
, B
)(, klprB
kk d θθ π=
Comparison between FRM3D and Crowther
Timings of FRM3D and Crowther
(seconds)
37.43371.4°3.7519.33°0.971.666°
FRMCrowtherangular sampling
FRM6D (Rigid-Body Matching)
5 angular parameters.
The correlation function is now:
1 linear parameter remains, distance δ of movement along the z axis.
emρ
calcρδ
( ) ( )3
em calc
( , ; )
( ) ( '; ).e e
c R R
R Rρ ρ
δ
δ⊗ ⊗
′ =
⋅∫R
Rigid-Body Search by 5D FFT
The 5D Fourier transform of the correlation function turns out to be:
,
ˆ( , , , , ; ) ( 1) ( ).n l l l l llnh hm nh h m mnm
l l
c n h m h m d d d d Iδ δ′ ′ ′′ ′ ′ ′−
′
′ ′ = − ∑
The quantities are the so called two-center integrals, corresponding to the spherical harmonic transforms of the two maps, at a distance δ of one another:
( )llmnmI δ′
′
1 1 ' ' 20 02 2
0 0
ˆ ˆ( ) ( )( ) ( ) ( ) ( ) ( )sinll lm l m l lmnm em calc n nI l l r r d r dr d d
π
δ ρ ρ β β β β∞
′ ′′
⎡ ⎤′ ′ ′= + + ⎢ ⎥
⎣ ⎦∫ ∫
Test Cases
1afw (peroxisomal thiolase)
1nic (copper-nitrite reductase)
7cat (catalase)
1e0j (Gp4D helicase)
Test Cases
1
10
100
1000
10000
1000 10000 100000 1000000 10000000
Number of voxels
Tim
e (m
in)
FTMFRMFTMFRM
Efficiency: FRM vs. FTM
11° sampling,
standard workstation.
FRM
FTM
FTM: 3D FFT + 3D rot. searchFRM: 5D FFT + 1D trans. search
typical EM map size
• Gain: 2-4 orders of magnitude for typical EM map
Correlations: SummaryInput
Laplacian filterNo filter Local mask
Situs 6D exhaustive searches:• Rigid Body• Fast Translational Matching (2.2) • Fast Rotational Matching (2.3)• Density Filtering
Increasing Fitting Contrast
Summary: Correlation Based Matching
Questions?
Alternative
Alternative: “Simulated Markers”
Actin filament: Reconstruction from EM data at 20Å resolution rmsd: 1.1Å
Reduced Representations of Biomolecular Structure
Reduced Representations of Biomolecular Structure
emiwcalc
jw
Feature points (fiducials, landmarks), reduce complexity of search space
Useful for:
•Rigid-body fitting•Flexible fitting •Interactive fitting / force feedback•Building of deformable models
1w
2w
3w
4w
Vector QuantizationLloyd (1957)
Linde, Buzo, & Gray (1980)Martinetz & Schulten (1993)
Digital Signal Processing,Speech and Image Compression.Topology-Representing Network.
}
{ }jwEncode data (in ) using a finite set (j=1,…,k) of codebook vectors.3=ℜd
Delaunay triangulation divides into k Voronoi polyhedra (“receptive fields”):3ℜ
{ }jwvwvv jii ∀−≤−ℜ∈= 3V
1w
2w
3w
4w
Linde, Buzo, Gray (LBG) AlgorithmEncoding Distortion Error:
ii
mijiE wv∑ −=
voxels)(atoms,
2
)(
Lower iteratively: Gradient descent( ){ }( )twE j
( ) ( ) ( ) . 2
1 )( )(∑ −⋅=∂∂
⋅−=−−≡∆i
iriirjr
rrr mwvwEtwtwtw δεε
:r∀
Inline (Monte Carlo) approach for a sequence selected at random according to weights
( )tvi:im
( )( ). ~ )( )( riirjr wtvtw −⋅⋅=∆ δε
How do we avoid getting trapped in the many local minima of E?
Soft-Max Adaptation
1 1 0 )1(10
−===
−≤≤−≤− −
ksss
wvwvwv
rrr
kjijiji …
{ }( ) . )( ,~ 2k
1ri
i
s
j mijiewE wvr
∑ −∑−
=
=λ
λ
( ) { }( )jir wtvs ,
Avoid local minima by smoothing of energy function (here: TRN method):
( )( ) , ~ )( : ri
s
r wtvetwrr
−⋅⋅=∆∀−λε
Where is the closeness rank:
Note: LBG algorithm.not only “winner” , also second, third, ... closest are updated.
:0→λ:0≠λ ( )ijw
Can show that this corresponds to stochastic gradient descent on
Note: LBG algorithm.parabolic (single minimum).
. ~ :0 EE →→λE~ :∞→λ } ( )t λ⇒
QuestionAnswer
Codebook vector variability arises due to:• statistical uncertainty,• spread of local minima.
A small variability indicates good convergence behavior.Optimum choice of # of vectors k: variability is minimal.
Q: How do we know that we have found the global minimum of E?A: We don’t (in general).
But we can compute the statistical variability of the by repeating thecalculation with different seeds for random number generator.
{ }jw
Convergence and Variability
EMlow res. data
Xtalstructure
)(hjw )(l
jw
•Estimate optimum k with variability criterion. •Index map I: (m, n = 1,…,k).• k! = k (k-1)…2 possible combinations.• For each index map I perform a least squares fit of the to the . • Quality of I: residual rms deviation
• Find optimal I by direct enumeration of the k! cases (minimum of ).
nm →
)(ljw)(
)(h
jIw
I∆
∑ −=
=∆k
jI
lj
hjIk ww
1
2)()()(
1
Single-Molecule Rigid-Body Docking
Application Example
ncd monomer and dimer-decorated microtubules (Milligan et al., 1997)ncd monomer crystal structure (Fletterick et al., 1996,1998)
Search for Conformations
Two possible ranking criteria:
• Codebook vector rms deviation ( ).• Overlap between both data sets:
Correlation coefficient:
21
,,
2,,
21
,,
2,,
,,,,,,
⎟⎟⎠
⎞⎜⎜⎝
⎛⎟⎟⎠
⎞⎜⎜⎝
⎛
⋅=
∑∑
∑
zyxzyx
zyxzyx
zyxzyxzyx
hl
lh
lhC
I∆
ncd motor (white, shown with ATP nucleotide) docked to EM map (black) using k=7 codebook vectors
Reduced Search FeaturesTop 20, 7!=5040 possible pairs of codebook vectors.
I (permutation)hlCI∆
For a fixed k, codebook rmsd is more stringent criterion than correlation coefficient!
Performance (I)
Dependence on resolution (simulated EM map, automatic assignment of k from ):3 9k≤ ≤
Deviation from start structure (PDB: 1TOP) used to generate simulated EM map.
Accurate matching up to ~30ÅCatastrophic misalignment
Performance (II)Is minimum vector variability a suitable choice for optimum k?
10 test systems, simulated EM densities from 2-100Å.
2-20Å (reliable fitting)22-50Å (borderline)52-100Å (mismatches)
Reasonable correlation with actual deviation
No “false positives” for resolution values < 20Åand variability < 1Å.
3 9k≤ ≤
Wriggers & Birmanns, J. Struct. Biol 133, 193-202 (2001)
Summary: Classic Situs (Versions 1.x)
Advantages of vector quantization:•Fast (seconds of compute time).•Reduced search is robust.
Limitations:•Original k->k algorithm works best for single molecules, not for matching subunits to larger densities.•Largely superseded by FTM and FRM in Situs 2.x•But newer application allows k->l matching (Sculptor: Stefan Birmanns)
Resources and Further Reading
WWW: http://situs.biomachina.org
Tools:colores (FTM)Contact me for beta version of FRMqvol, qpdb, qdock, qrange (vector quantization)
WWW: http://sculptor.biomachina.org
Acknowledgement:Yao Cong, Stefan Birmanns, Julio Kovacs, Pablo Chacon (http://biomachina.org)