Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Upload: dale-smith

Post on 18-Dec-2015


TRANSCRIPT

Page 1: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Dukka

Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Page 2: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• Background

• Protein Structure Prediction and CASP

• TASSER algorithm

• MCORE algorithm

Outline

Page 3: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• Experimental and computational methods often output results as an ensemble of protein structures.

– NMR, protein structure prediction, protein docking, RNA structure prediction

• A single representative structure is required for comparison or further analysis.

• Representative structure (consensus structure) = a centroid structure obtained by averaging the Cartesian coordinates of the ensemble of superimposed structures.

• The RMSD between the ‘averaged structure’ and any reference structure is always less than or equal to the average RMSD of the individual members to that reference. (Zagrovic et al.)

• However, the centroid structure has averaging artifacts that render its bond angles and bond lengths unphysical.

Background
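The two effects of coordinate averaging described on this slide can be checked numerically. The following is a minimal sketch, assuming NumPy and a purely synthetic ensemble: the chain builder `random_chain`, the 3.8 Å bond length, and the ensemble size are illustrative assumptions, and the members share a fixed frame instead of being superimposed by a fitting procedure.

```python
import numpy as np

BOND = 3.8  # nominal C-alpha to C-alpha distance in angstroms (assumed)

def random_chain(n_beads, rng, spread=0.4):
    """Build a synthetic chain whose consecutive beads are exactly BOND apart."""
    base = np.array([1.0, 0.0, 0.0])
    directions = base + spread * rng.normal(size=(n_beads - 1, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    steps = BOND * directions
    return np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])

def bond_lengths(coords):
    """Distances between consecutive beads."""
    return np.linalg.norm(np.diff(coords, axis=0), axis=1)

def rmsd(a, b):
    """RMSD in a fixed frame (no re-superposition is performed)."""
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))

rng = np.random.default_rng(0)
ensemble = np.array([random_chain(20, rng) for _ in range(50)])
centroid = ensemble.mean(axis=0)      # coordinate-averaged structure
reference = random_chain(20, rng)     # stand-in for a reference structure

centroid_bonds = bond_lengths(centroid)
mean_member_rmsd = np.mean([rmsd(m, reference) for m in ensemble])

print("shortest centroid bond:", centroid_bonds.min())
print("centroid RMSD:", rmsd(centroid, reference))
print("mean member RMSD:", mean_member_rmsd)
```

Every member has ideal bonds, yet the averaged bonds come out shorter, while the centroid RMSD to the reference stays at or below the mean member RMSD. Both follow from the triangle inequality applied to the averaged coordinates.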

Page 4: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• Critical Assessment of protein Structure Prediction (CASP) is a biennial contest in which different groups try to predict the structure of a protein whose experimental structure has not yet been released publicly.

• One of the most popular and objective contests in the bioinformatics field.

• CASP8 has just concluded.

• Major observations from CASP7:

– Methods are more or less mature

– Consensus servers usually outperform individual servers

– A lot of work remains to be done in the refinement step

Protein Structure Prediction and CASP

Page 5: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• Given a set of conformations, obtain a conformation that is closest to the native structure.

• Molecular force fields such as AMBER and CHARMM can be utilized, but they are not perfect.

• Furthermore, there is still no perfect definition of “closest”. Hence, CASP keeps introducing new measures of closeness to the native structure, such as the HB score.

• Often, the prediction closest to native is not ranked first. Hence, ‘refinement’ is getting a lot of attention.

Refinement

Page 6: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

TASSER algorithm(Threading/ASSembly/Refinement)

Zhang & Skolnick, 2004

Centroid Structure

Page 7: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• TASSER was one of the best prediction servers in both CASP7 and CASP8.

• A large number of conformations are generated after the assembly step. However, only a few models can be submitted.

• Clustering is utilized, and the centroid of the largest cluster (Combo model) is submitted as the prediction; this has proven to be successful.

• Artifacts in ‘TASSER (Combo) output’

– Unrealistic bond lengths and bond angles due to averaging artifacts

• Scope

– To fix these unrealistic bond lengths and bond angles

Problem Identification

C-alpha Space

Energy Minimization!

Page 8: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Combo and Closc Models

COMBO model: The centroid structure of the densest cluster.

CLOSC model: The structure that is closest to the centroid of the densest cluster.

[Figure; axis label: Fraction of clashes]

Page 9: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction


Page 10: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction


Page 11: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• PULCHRA is based on steepest-descent minimization and a simple force field.

• Sometimes it cannot escape a kinetic trap.

• For a heavily distorted chain, the minimization procedure does not converge, or the optimized model still exhibits irregularities.

PULCHRA

Rotkiewicz and Skolnick, 2008

Page 12: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction


Page 13: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Generate an extended structure based on Combo model

Monte-Carlo Minimization

MCORE

Output the best structure

Start from a ‘close-by model’

Page 14: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Distances between consecutive C-alpha atoms are set using the distance distribution from the PDB, which has mainly three types: x-Pro = 3.77 Å, x-{ALA|ARG|ASN|LEU|LYS|MET} = 3.81 Å, and x-{ASP|CYS|GLU|GLY|HIS|ILE|PHE|SER|THR|TRP|TYR|VAL} = 3.80 Å

Generation of Extended Structure
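One way an extended C-alpha trace could be built from these three distance classes is sketched below. Several points are assumptions and not taken from the slides: the distance is chosen by the second residue of each consecutive pair (reading "x-Pro" as "any residue followed by Pro"), residues not named in either list fall back to 3.80 Å, and the chain is laid out as a planar zigzag with a hypothetical 120 degree virtual bond angle. The function names are illustrative.

```python
import math

# Residues listed on the slide with a 3.81 angstrom distance.
BOND_381 = {"ALA", "ARG", "ASN", "LEU", "LYS", "MET"}

def ca_bond_length(next_residue):
    """C-alpha to C-alpha distance, keyed on the second residue of the pair."""
    if next_residue == "PRO":
        return 3.77
    if next_residue in BOND_381:
        return 3.81
    # The 3.80 group from the slide; also used as a fallback for any
    # residue name not listed there (assumption).
    return 3.80

def extended_trace(sequence, angle_deg=120.0):
    """Place C-alpha beads on a planar zigzag with the given virtual bond angle."""
    half = math.radians(angle_deg) / 2.0
    coords = [(0.0, 0.0, 0.0)]
    for idx, residue in enumerate(sequence[1:]):
        d = ca_bond_length(residue)
        x, y, _ = coords[-1]
        sign = 1.0 if idx % 2 == 0 else -1.0  # alternate up and down
        coords.append((x + d * math.sin(half), y + sign * d * math.cos(half), 0.0))
    return coords

trace = extended_trace(["GLY", "PRO", "ALA", "SER"])
for a, b in zip(trace, trace[1:]):
    print(round(math.dist(a, b), 2))
```

The zigzag keeps every consecutive distance exact by construction, so the Monte-Carlo step that follows only has to change the fold, not repair bond lengths.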

Page 15: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• Three major components of any Monte-Carlo approach

– Energy function

• Can be a generic force field or any combination of terms

– Move sets

• Critical to the performance of the algorithm; more of an art(?)

– Convergence criteria

• Naïve way (run for a certain number of steps)

• Introduce some criteria based on the generated conformations

Monte-Carlo

Page 16: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• Starting from a state A, make a change in the configuration to obtain a new (nearby) configuration B.

• Compute $E_B$.

• If $E_B < E_A$, accept the new configuration, since a lower energy is desirable.

• If $E_B > E_A$, calculate the acceptance probability $p = e^{-(E_B - E_A)/T}$.

• Draw r from the uniform distribution on [0,1]; if r < p, accept the new configuration B, otherwise reject it.

Monte-Carlo: Metropolis Criteria
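The acceptance rule on this slide can be written as a small function. This is a minimal sketch using only the Python standard library; the function name `metropolis_accept` and the optional `rng` argument are illustrative choices, and the temperature is assumed to be in the same units as the energies.

```python
import math
import random

def metropolis_accept(e_old, e_new, temperature, rng=random):
    """Return True if the move from energy e_old to e_new is accepted."""
    if e_new <= e_old:
        # Downhill (or equal-energy) moves are always accepted.
        return True
    # Uphill moves are accepted with probability exp(-(E_B - E_A) / T).
    p = math.exp(-(e_new - e_old) / temperature)
    return rng.random() < p

# Example: estimate the acceptance rate of an uphill move with dE = T.
rng = random.Random(0)
trials = 20000
accepted = sum(metropolis_accept(0.0, 1.0, 1.0, rng) for _ in range(trials))
print("observed acceptance rate:", accepted / trials)
print("expected exp(-1):", math.exp(-1.0))
```

Passing a seeded `random.Random` instance keeps a run reproducible, which helps when comparing move sets or energy weights.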

Page 17: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• Move Sets

– Global move-set

• Rest-all bead move

– Local move-set

• 1-bead move

• 2-bead move

• 3-bead move

• 4-bead move

• 5-bead move

– End-bond move

• 1,2,3-bead C-terminal end bond move

• 1,2,3-bead N-terminal end bond move

Move Sets

Page 18: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• Calculate the unit vector along axis defined by i-1 and i+1

• Calculate the rotation matrix around this vector

• Calculate the new position of i

• Important thing is to preserve the bond length i.e. to preserve the distance between consecutive C-alphas.

[Figure: one-bead move and two-bead move, with beads labeled i-1, i, i+1]

Move Sets
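The one-bead move can be sketched as a rotation of bead i about the axis through beads i-1 and i+1. The version below assumes NumPy and uses Rodrigues' rotation formula in vector form in place of an explicit rotation matrix; the two are equivalent. The function name and the example coordinates are illustrative.

```python
import numpy as np

def one_bead_move(coords, i, angle):
    """Rotate bead i by `angle` radians about the axis from bead i-1 to bead i+1."""
    prev_pt, next_pt = coords[i - 1], coords[i + 1]
    axis = next_pt - prev_pt
    axis = axis / np.linalg.norm(axis)   # unit vector along the i-1 -> i+1 axis
    v = coords[i] - prev_pt              # bead i relative to a point on the axis
    rotated = (v * np.cos(angle)
               + np.cross(axis, v) * np.sin(angle)
               + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))
    new_coords = coords.copy()
    new_coords[i] = prev_pt + rotated
    return new_coords

coords = np.array([[0.0, 0.0, 0.0],
                   [1.9, 3.0, 0.5],
                   [3.8, 0.0, 0.0]])
moved = one_bead_move(coords, 1, np.pi / 3)
print(np.linalg.norm(moved[1] - moved[0]), np.linalg.norm(coords[1] - coords[0]))
```

Because both neighbours lie on the rotation axis, the distances from bead i to bead i-1 and to bead i+1 are unchanged, which is the bond-length constraint stated on the slide.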

Page 19: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

[Figure panels: three-bead move, four-bead move, five-bead move, and rest-bead move, each shown with its axis of rotation]

Page 20: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

End-bond Move Sets

Axis of rotation

Page 21: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction


Page 22: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction


Page 23: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

$$E = \sum_{k=1}^{N-2}\sum_{l=k+2}^{N} k_{excl}\,(r_{kl} - r_{o\_excl})^2 + \sum_{i=1}^{N-2} k_{ang}\,(\phi_{i,i+1,i+2} - \phi_{o\_ang})^2 + \sum_{k=1}^{N} k_{clos}\,(d_{kk}^{t} - d_{o\_clo})^2$$

$$\phi_{o\_ang} = \begin{cases} 70 & \text{if } \phi_{i,i+1,i+2} < 70 \\ 150 & \text{if } \phi_{i,i+1,i+2} > 150 \\ \phi_{i,i+1,i+2} & \text{otherwise} \end{cases}$$

$k_{excl} = 2.9$, $r_{o\_excl} = 4.0$, $k_{ang} = 0.075$, $k_{clos} = 5.4$, $d_{o\_clo} = 0.001$

Energy Function

Excluded volume: penalize if the distance is less than 4.0 Å

Bond angle: penalize if the angle is not between 70 and 150

Closeness to target: penalize if the difference in C-alpha position between the target and the starting structure is not within a certain cutoff

N: Number of C-alpha atoms
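The three terms of the energy function can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the constants are the values read from the slide, angles are taken in degrees at the middle bead of each consecutive triple, each term is applied only when its penalty condition holds, and the closeness distance is measured between bead k of the current structure and bead k of the target in a common frame.

```python
import numpy as np

# Constants as read from the slide (treat as assumptions).
K_EXCL, R_EXCL = 2.9, 4.0
K_ANG = 0.075
K_CLOS, D_CLOS = 5.4, 0.001

def energy(coords, target):
    """Excluded volume + bond angle + closeness-to-target penalty."""
    n = len(coords)
    e = 0.0
    # Excluded volume: pairs (k, l) with l >= k + 2 closer than R_EXCL.
    for k in range(n - 2):
        d = np.linalg.norm(coords[k + 2:] - coords[k], axis=1)
        close = d[d < R_EXCL]
        e += K_EXCL * np.sum((close - R_EXCL) ** 2)
    # Bond angle: penalize angles outside the 70 to 150 degree window.
    for i in range(n - 2):
        a = coords[i] - coords[i + 1]
        b = coords[i + 2] - coords[i + 1]
        cosang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        phi = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
        phi0 = min(max(phi, 70.0), 150.0)  # equals phi inside the window
        e += K_ANG * (phi - phi0) ** 2
    # Closeness: per-bead distance to the target beyond the cutoff D_CLOS.
    d_t = np.linalg.norm(coords - target, axis=1)
    far = d_t[d_t > D_CLOS]
    e += K_CLOS * np.sum((far - D_CLOS) ** 2)
    return float(e)

# Example: a 6-bead planar zigzag with 3.8 angstrom bonds and 120 degree angles.
h = np.radians(60.0)
steps = 3.8 * np.array([[np.sin(h), np.cos(h) * (-1) ** i, 0.0] for i in range(5)])
trace = np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])
shifted = trace + np.array([0.0, 0.0, 1.0])

print(energy(trace, trace))    # identical to target, no violations
print(energy(shifted, trace))  # only the closeness term contributes
```

With a weight this large on the closeness term relative to the angle term, the target structure dominates the search while the other two terms act as geometric filters.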

Page 24: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• Before doing the actual computation, we have to test whether the move sets and energy function are working properly.

• So, some test cases have to be designed. A positive test case is to drive an extended structure to the native structure.

– Desired result: the extended structure should be driven ‘very close’ to the native structure in a relatively small number of steps.

Assessment of Move Sets and Energy Function

Page 25: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• 1363 proteins with fewer than 200 residues and a Combo model RMSD to native below 6.5 Å.

• 1363 Centroid structures (COMBO models)

• 1363 CLOSC models

• 1363 Close-by structures (CLOSC models + Pulchra Refinement)

• 1363 Native structures.

Data Set

Page 26: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Driving Extended to Native

[Figure, two panels: Average Energy vs. Steps; Average RMSD to NATIVE vs. Steps. Annotated values: 0.06, 0.045, 0.041; at 10000 steps RMSD = 0.039]

Page 27: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Ext-refined Vs CA

Driving Extended to Native

0.033Å

Page 28: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Convergence test between steps i and l: |rmsd_diff(i − l)| < tolerance value, where l = i+j, j = 1,…,L

Different values of L were tried; L = 49 with a tolerance value of 0.005 seems reasonable.

Convergence criteria
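One possible reading of this criterion is sketched below: the run is considered converged when the RMSD recorded at step i differs by less than the tolerance from the RMSD at each of the following L steps. This interpretation, the function name, and the use of a plain list of per-step RMSD values are assumptions.

```python
def has_converged(rmsd_history, window=49, tolerance=0.005):
    """Check the last window + 1 recorded RMSD values against the first of them."""
    if len(rmsd_history) < window + 1:
        return False  # not enough steps recorded yet
    recent = rmsd_history[-(window + 1):]
    anchor = recent[0]  # RMSD at step i
    # Steps l = i+1 ... i+L must all stay within the tolerance of step i.
    return all(abs(anchor - value) < tolerance for value in recent[1:])

# Example: a trajectory that decays and then flattens out.
history = [3.0 / (1 + step) + 0.5 for step in range(40)] + [0.5] * 60
print(has_converged(history))
```

Calling this once per Monte-Carlo step, after appending the current RMSD, replaces the fixed step count with a stopping rule based on the generated conformations.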

Page 29: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• MCORE: Start from a ‘close-by model’ and drive it towards the COMBO model.

• CLOSC models as the close-by models.

– When close-by model is readily available

• MCORE-EXT: Start from an extended structure and drive it towards the COMBO model.

– When close-by model is not readily available

Propose two algorithms

Page 30: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

MCORE: Driving Close-by models to COMBO

[Figure: Average Energy vs. Steps]

Page 31: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Why can MCORE not get much closer to COMBO?

[Figure, two panels: Fraction of Atoms Clashing in COMBO vs. RMSD of MCORE to COMBO (Å); Fraction of Atoms Clashing in MCORE vs. RMSD of MCORE to COMBO (Å)]

Page 32: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

For 38 proteins, the MCORE model had an even lower RMSD to native than the respective Combo model.

MCORE Vs Combo

[Figure: RMSD of MCORE to NATIVE (Å) vs. RMSD of COMBO to NATIVE]

Page 33: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Comparison of Different Models

[Figure, two bar charts. RMSD to Native (Å): 3.35, 3.36, 3.54, 3.28. Fraction of Atoms in Clashes: 0.010, 0.065, 0.000, 0.63]

Page 34: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

0.770 0.746 0.747 0.754

TM-score of four models

Page 35: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Model                      Avg. RMSD to Native (Å)   Avg. clash (< 1.9 Å)   Avg. clash (< 3.6 Å)
Combo                      3.28                      0.03                   0.630
Closc                      3.54                      0                      0.614
MCORE                      3.35                      0                      0.010
Pulchra (Closc)            3.54                      0                      0
MCORE (EXT, 2000 steps)    3.35                      0                      0.011
Pulchra (Combo)            3.36                      0.005                  0.065

Results

Page 36: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Some Examples

Page 37: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

[Figure, four superposition panels for 1akhA: refined Vs native; com Vs refined; com Vs native; pulchra Vs native. Annotated values: 12 clashes, 0.354 Å, 0.68 Å, 0.78 Å, 0.674 Å]

Page 38: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

[Figure, four superposition panels for 3bbn_: refined Vs Native; pulchra Vs Native; combo Vs Native; refined Vs combo. Annotated values: 2.918 Å, 2.948 Å, 3.099 Å, 0.852 Å]

Page 39: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• Built the main-chain atoms from the refined Cα trace.

• Rebuilt side-chains using two methods

– Pulchra (-c)

– Scwrl

3.92(all) 2.868(cα) 3.95(all)

All-atom model reconstruction

Page 40: Dukka Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

• Designed an algorithm to remove averaging artifacts and applied it to refine the Combo model.

• Acknowledgments

– Dr. Jeff Skolnick and all the members of the Skolnick Lab, especially Lila, Shashi, Hongyi, Seung Yup,…….

– Dr. Dennis Livesay

• Future Work

– Refinement in All-atom space

Conclusion