flexscore: ensemble-based evaluation for protein structure models

Post on 12-Apr-2017


1

ENSEMBLE-BASED EVALUATION FOR PROTEIN STRUCTURE MODELS

Michal Jamroz¹, Andrzej Kolinski¹, & Daisuke Kihara²

¹ Faculty of Chemistry, Warsaw University, Poland
² Department of Biological Sciences/Computer Science, Purdue University, USA

http://kiharalab.org

2

Protein Structure Comparison
• Superimposition of two structures, treating the structures as rigid
• Root mean square deviation (RMSD)

$$\mathrm{rmsd}(A, B) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\| x_i^A - x_i^B \right\|^2}$$

• CE, Dali, SSAP, 3D-SURFER (http://kiharalab.org/3d-surfer)
• In protein structure prediction, structure comparison is important for evaluating structure models
• GDT-TS, TM-Score
• Rigid structure comparison stems from the static pictures provided by crystal structures of proteins in the PDB
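As a concrete illustration of rigid comparison, the following is a minimal sketch of RMSD after optimal superposition. It assumes two equally long arrays of matched Cα coordinates and uses the standard Kabsch (SVD-based) rotation; the function name `kabsch_rmsd` is hypothetical and not part of any tool named on the slide.

```python
import numpy as np

def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Remove the translation by centering both structures on their centroids
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix (Kabsch algorithm)
    u, _, vt = np.linalg.svd(a_c.T @ b_c)
    d = np.sign(np.linalg.det(u @ vt))          # -1 would mean a reflection
    rot = u @ np.diag([1.0, 1.0, d]) @ vt       # proper rotation only
    diff = a_c @ rot - b_c
    return float(np.sqrt((diff ** 2).sum() / len(a)))
```

Every atom contributes with the same weight here, which is exactly the rigid-body assumption the slide points out.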

3

But proteins are intrinsically flexible!
• Flexibility can be measured/observed by
  • NMR
  • Molecular dynamics (MD) simulation
  • Coarse-grained model simulation, e.g. the Gaussian Network Model
• Even diffraction data from X-ray crystallography contain flexibility information beyond the single isotropic B-factor model (Blundell, 2004; Terwilliger, 2006, etc.)
• Intrinsically disordered proteins

(Madl et al., JMB 2006; CcdA, NMR) (10 ns MD, PDB ID: 2n2u)
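To make the coarse-grained option concrete, here is a minimal sketch of a Gaussian Network Model. It assumes Cα coordinates as input and a 7 Å contact cutoff (a commonly used value, chosen here as an assumption). The function name `gnm_fluctuations` is hypothetical. The returned values are relative fluctuations, not calibrated to Å.

```python
import numpy as np

def gnm_fluctuations(ca, cutoff=7.0):
    """Relative mean-square fluctuation per residue from a Gaussian Network Model."""
    ca = np.asarray(ca, dtype=float)
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    # Kirchhoff (connectivity) matrix: -1 for residue pairs in contact
    kirchhoff = -(dist <= cutoff).astype(float)
    np.fill_diagonal(kirchhoff, 0.0)
    # Diagonal holds the number of contacts of each residue
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    # Fluctuations are proportional to the diagonal of the pseudo-inverse
    return np.diag(np.linalg.pinv(kirchhoff))
```

In such a model, residues with few contacts (chain termini, exposed loops) come out as more mobile than densely packed ones.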

4

Protein Structure Comparison Methods that Consider Chain Flexibility

• Weighted RMSD using B-factors (Wu & Wu, 2010)
• Iterative RMSD computation (Damm & Carlson, 2006)
• Use of an elastic network model (FlexE, Perez et al., 2012)
• Use of structural ensembles
  • KL divergence of two ensembles (L-Larsen et al., 2009)
  • Maximum likelihood (THESEUS, 2006; bFit, 2010)

5

FlexScore (Jamroz, Kolinski, & Kihara, ISMB, Bioinformatics, 2016)

• Evaluating a computational protein structure model by comparing it to an ensemble of the target protein structure

• The ensemble comes from NMR, MD simulation, or another source
  • 10-nanosecond MD simulation with explicit water molecules

• Structure X_i in an ensemble X is represented as

$$X_i = (M + E_i) R_i^T - 1_k t_i^T$$

M: the mean structure
E_i: displacement that follows a Gaussian distribution N_{k,3}(0, Σ, I_3), where Σ is a k × k covariance matrix
k: the number of Cα atoms
R_i: rotation matrix
t_i: translation vector; 1_k is a k × 1 vector of ones
T denotes the transpose of a matrix
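The generative model can be simulated directly. The sketch below assumes the sign convention X_i = (M + E_i) R_i^T - 1_k t_i^T, a positive-definite covariance matrix, and unit-variance random translations. The helper names `random_rotation` and `sample_ensemble` are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Random proper rotation matrix (det = +1) from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))        # fix the column signs of the factorization
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]             # flip one axis to remove a reflection
    return q

def sample_ensemble(mean, sigma, n, rng):
    """Draw n structures X_i = (M + E_i) R_i^T - 1_k t_i^T."""
    k = mean.shape[0]
    chol = np.linalg.cholesky(sigma)   # sigma must be positive definite
    ones = np.ones((k, 1))
    out = []
    for _ in range(n):
        e = chol @ rng.normal(size=(k, 3))   # E_i ~ N_{k,3}(0, sigma, I_3)
        rot = random_rotation(rng)
        t = rng.normal(size=(3, 1))          # arbitrary translation (assumption)
        out.append((mean + e) @ rot.T - ones @ t.T)
    return np.stack(out)
```

Large diagonal entries of the covariance matrix produce atoms that scatter widely around the mean structure, which is how flexible regions show up in the ensemble.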

6

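Ensemble Superimposition

Ensemble structures, X: $X_i = (M + E_i) R_i^T - 1_k t_i^T$

Initialization: $\hat{\Sigma} = I$, $\hat{M} = X_j$, $\alpha = 0$

• Estimate t: $\hat{t}_i = -\dfrac{X_i^T \hat{\Sigma}^{-1} 1_k}{1_k^T \hat{\Sigma}^{-1} 1_k}$

• Estimate R: $\hat{R}_i$ computed by SVD of $\hat{M}^T \hat{\Sigma}^{-1} (X_i + 1_k \hat{t}_i^T)$

• Estimate $\hat{M}$

• Estimate $\hat{\Sigma}_s$ and its $\hat{\Lambda}_s$ (eigenvalues): $\hat{\Sigma}_s = \dfrac{1}{3n} \sum_{i=1}^{n} \left[ (X_i + 1_k \hat{t}_i^T) \hat{R}_i - \hat{M} \right] \left[ (X_i + 1_k \hat{t}_i^T) \hat{R}_i - \hat{M} \right]^T$

• Estimate $\hat{\alpha}$

• Estimate $\hat{\Sigma}_h$, $\hat{\Lambda}_h$: $\hat{\Sigma}_h = \dfrac{3n}{3n + 3} \left( \dfrac{2\hat{\alpha}}{3n} I + \hat{\Sigma}_s \right)$

Hierarchical log-likelihood model: $\ell(R, t, M, \Sigma \mid X) + \ell(\Sigma \mid \alpha)$

Σ: the k × k covariance matrix; α: parameter of the inverse gamma distribution that the eigenvalues Λ of Σ follow

(Theobald DL, 2012)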
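The following is a deliberately simplified sketch of iterative ensemble superposition, not the hierarchical maximum-likelihood procedure of the slide. It assumes the covariance is fixed to the identity, so the translation estimate reduces to centering and the rotation to an ordinary least-squares fit; the α and Σ_h steps are omitted. The names `fit_rotation` and `superimpose_ensemble` are hypothetical.

```python
import numpy as np

def fit_rotation(x, target):
    """Rotation R minimizing ||x R - target|| for centered (k, 3) arrays."""
    u, _, vt = np.linalg.svd(x.T @ target)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

def superimpose_ensemble(ensemble, n_iter=20):
    """Least-squares ensemble superposition (covariance fixed to identity)."""
    ens = np.asarray(ensemble, dtype=float)          # shape (n, k, 3)
    # With Sigma = I the translation estimate is the centroid of each structure
    centered = ens - ens.mean(axis=1, keepdims=True)
    mean = centered[0].copy()                        # initialize M with one structure
    for _ in range(n_iter):
        # Rotate every structure onto the current mean, then update the mean
        fitted = np.stack([x @ fit_rotation(x, mean) for x in centered])
        mean = fitted.mean(axis=0)
    # Per-atom root-mean-square deviation around the mean structure
    sigma = np.sqrt(((fitted - mean) ** 2).sum(axis=2).mean(axis=0))
    return mean, sigma, fitted
```

The full method additionally down-weights variable atoms through the inverse covariance matrix, so flexible regions do not distort the superposition of rigid ones; this sketch weights all atoms equally.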

7

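FlexScore (FS)
• Score of a computational model Y, computed after shifting Y by $\hat{t}$ and rotating it with the rotation matrix obtained by SVD of $\hat{M}^T \hat{\Sigma}_h^{-1} (Y + 1_k \hat{t}^T)$

$$FS(Y) = \frac{1}{k} \sum_{i=1}^{k} \frac{1}{\sigma_i} \left\| \hat{M}_i - Y_i^{sup} \right\|$$

where $Y^{sup}$ is the superimposed model and $\sigma_i$ is the fluctuation of Cα atom i in the ensemble

• Score of 0 for the perfect model

• FS-GDT: defined as the average of the fractions of Cα atoms within FlexScore of 1, 2, 4, and 8. The score ranges over [0, 1].

(analogous to GDT-TS, which is the average of the fractions of Cα atoms within 1, 2, 4, and 8 Å)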
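A minimal sketch of the scoring step, assuming the model has already been superimposed onto the mean structure and that a per-atom fluctuation (standard deviation) is available for every Cα atom. The function names, and the use of "less than or equal" for the thresholds, are assumptions.

```python
import numpy as np

def flexscore(model_sup, mean, sigma):
    """Per-residue and average FlexScore of an already superimposed model."""
    dev = np.linalg.norm(np.asarray(model_sup, dtype=float)
                         - np.asarray(mean, dtype=float), axis=1)
    # Deviation of each atom expressed in units of its own fluctuation
    per_residue = dev / np.asarray(sigma, dtype=float)
    return per_residue, float(per_residue.mean())

def fs_gdt(per_residue, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of residues whose FlexScore is within each cutoff."""
    per_residue = np.asarray(per_residue, dtype=float)
    return float(np.mean([(per_residue <= c).mean() for c in cutoffs]))
```

Because each deviation is divided by the local fluctuation, a 3 Å error in a loop that itself moves by 3 Å counts as much as a 0.5 Å error in a core residue that moves by 0.5 Å.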

8

FlexScore of Toy models

NMR structures (PDB ID: 2j8p)
Identical RMSD, GDT-TS, & TM-Score to the mean structure: 1.47, 0.95, & 0.93
FlexScore: Green, 1.96; Blue, 1.42

9

Correlation of FlexScore to the Other Scores

Columns GDT-TS, TM-Score, RMSD: correlation of FlexScore with that score; columns in <>: average value of the score.

Target    GDT-TS   TM-Score   RMSD    <GDT-TS>   <TM>    <RMSD>   <FS>
T0651*    -0.04    -0.13      1.00    0.27       0.36    24.02    62.76
T0655     -0.83    -0.88      0.77    0.49       0.58    13.95    15.41
T0657     -0.94    -0.95      0.92    0.63       0.68     7.69     9.64
T0662     -0.97    -0.96      0.99    0.67       0.67     3.87     5.24
T0667     -0.96    -0.98      0.98    0.57       0.69     6.73    13.34
T0669     -0.83    -0.84      0.96    0.46       0.50     9.21    16.70
T0673     -0.65    -0.58      0.95    0.33       0.27    11.85    22.87
T0675     -0.62    -0.56      0.74    0.37       0.33    11.14     6.96
T0714     -0.91    -0.92      0.98    0.78       0.79     2.67     5.24
T0716     -0.82    -0.79      0.88    0.65       0.62     7.55     5.62
T0763*    -0.30    -0.48      0.99    0.16       0.20    18.18    54.71
T0767*    -0.48    -0.69      1.00    0.11       0.19    33.84    94.69
T0769     -0.88    -0.87      0.80    0.50       0.53    11.58    13.22
T0773     -0.91    -0.89      0.85    0.52       0.49     9.45    12.04
T0777*    -0.63    -0.72      1.00    0.10       0.21    31.60    81.96
T0780      0.08     0.03      0.99    0.29       0.37    23.13    32.47
T0782     -0.88    -0.89      0.99    0.45       0.49     9.20    17.83
T0785*    -0.54    -0.59      0.97    0.18       0.20    16.40    37.16
T0790*    -0.28    -0.57      1.00    0.11       0.19    26.15    50.85
T0803     -0.27    -0.30      0.98    0.34       0.39    13.84    35.47
T0808*    -0.02    -0.15      0.99    0.11       0.21    26.47    70.98
T0814*     0.10    -0.43      0.98    0.10       0.19    27.14    75.96
T0829     -0.78    -0.72      0.95    0.47       0.42     9.63    22.38
T0832*    -0.41    -0.64      0.97    0.15       0.22    20.65    51.35
T0833     -0.94    -0.95      0.96    0.57       0.60     7.50    11.78
T0853     -0.27    -0.32      0.99    0.21       0.26    17.55    36.25
T0856     -0.89    -0.92      0.99    0.69       0.77     4.01    10.81
T0857     -0.89    -0.90      0.95    0.29       0.31    13.96    13.27

~200 predicted (server) models for single-chain targets from CASP10 and CASP11
* Free modeling targets
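Per-target correlations of this kind can be computed as in the sketch below, which assumes the Pearson coefficient (the slide does not state which coefficient was used). The function name `pearson` is hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

For one target, `x` would hold the FlexScore of every model and `y` the GDT-TS, TM-Score, or RMSD of the same models. A negative correlation with GDT-TS and TM-Score is the expected sign, since those scores grow as models improve while FlexScore approaches 0.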

10

Correlation between FlexScore and RMSD

11

Different Evaluation by FlexScore, GDT-TS, TM-Score, & RMSD (T0716)

Green, Orange
GDT-TS: 0.52, 0.51
TM-Score: 0.48, 0.49
FlexScore: 7.4, 24.1

FlexScore of Green, Orange: 2.75, 2.71
GDT-TS: 0.75, 0.70; TM-Score: 0.72, 0.70; RMSD: 3.93 Å, 5.40 Å

12

Different Evaluation by FlexScore, GDT-TS, TM-Score, & RMSD (T0655)

Green, Orange models:
GDT-TS: 0.50, 0.54
TM-Score: 0.61, 0.66
FlexScore: 23.05, 9.2

13

Different Evaluation by FlexScore, GDT-TS, TM-Score, & RMSD (T0714)

Green and orange models
GDT-TS: 0.84, 0.83; TM-Score: 0.83, 0.86
FlexScore: 4.42, 2.69

14

Different MD Trajectories

3 MD trajectories

T0829 (4rgi, 70 res): FlexScore: 5.20, 5.21, 5.22
T0782 (4qrl, 70 res): FlexScore: 3.63, 3.63, 3.63

15

Dependence on the Length of the MD Simulation

T0773, PDB ID: 2n2u, 77 aa long. Left half: correlation with the other scores; right half: average values of the scores.

16

FlexScore from NMR and MD Ensembles

Scores of 235 models of T0176 are compared.

17

CASP10 Prediction Group Ranking

Rank   FS   FS-GDT    GDT-TS    TM        RMSD
1      A    A         A         A         A
2      B    D         B         B         B
3      C    B         F         C         C
4      D    C         C         F         F
5      E    F         D         D         E
6      F    E         I         I         G
7      G    O (14)    G         X (24)    J
8      H    J         J         L (12)    I
9      I    Q (17)    E         G         D
10     J    H         O (14)    Q (17)    H

18

Real-value Prediction of Protein Flexibility

http://kiharalab.org/flexpred/

(Peterson, Jamroz, Kolinski, Kihara, Methods Mol. Biol, 2016)

(Jamroz, Kolinski, Kihara, Proteins 80: 1425-1435, 2012)

Structural Features                            Avg. corr. coefficient
B-factor                                        0.484
Distance to center of mass                      0.509
Square of distance to center of mass (D2)       0.545
Contact number (cutoff 6 Å)                    -0.374
Contact number (8 Å)                           -0.480
Contact number (12 Å)                          -0.554
Contact number (15 Å)                          -0.568
Contact number (16 Å)                          -0.567
Contact number (18 Å)                          -0.562
Accessible surface area, normalized             0.476
Residue depth (residue mean)                   -0.352
Prediction by GNM (cutoff 16 Å)                 0.643
Prediction by GNM (no cutoff)                   0.646
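Two of the structural features in the table can be sketched as follows. The sketch assumes Cα coordinates as input and approximates the center of mass by the Cα centroid (equal masses). The function names are hypothetical, and whether the original contact number counts the residue itself is not stated; it is excluded here.

```python
import numpy as np

def contact_number(ca, cutoff=16.0):
    """Number of other C-alpha atoms within `cutoff` angstroms of each residue."""
    ca = np.asarray(ca, dtype=float)
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    return (dist <= cutoff).sum(axis=1) - 1     # subtract the residue itself

def squared_distance_to_center(ca):
    """Squared distance of each C-alpha atom to the centroid (the D2 feature)."""
    ca = np.asarray(ca, dtype=float)
    return ((ca - ca.mean(axis=0)) ** 2).sum(axis=1)
```

The signs in the table follow from the geometry: residues far from the center tend to fluctuate more (positive correlation), while residues with many contacts are packed and fluctuate less (negative correlation).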

19

(592 MD trajectories from the MoDEL db)

Fluctuation Prediction Using Support Vector Regression

20

Features used                                            Average corr. coeff.   RMS (Å)
B, D2, Sec, C(16), C(18), C(12), C(8)                    0.667                  1.042
B, D2, C(16), C(18), C(12), C(8), C(6), C(20)            0.666                  1.042
B, D2, C(16), C(18), C(12), C(8), C(6), C(20), C(22)     0.667                  1.042
B, C(16), C(18), C(12), C(8), C(6), C(20), C(22)         0.669                  1.073
C(16), C(18), C(12), C(8), C(6), C(15), C(20), C(22)     0.660                  1.092

B, B-factor; D2, square of the distance to the center of mass; C(x), the contact number with x Å cutoff

(Jamroz, Kolinski, Kihara, Proteins 80: 1425-1435, 2012)

21

Examples of Predicted Fluctuations

1gpc, 218 aa

1a1x, 108 aa

22

Summary
• Developed FlexScore, which evaluates computational protein structure models by considering the flexibility of target proteins
• Flexibility is represented by a structure ensemble, which comes from MD, NMR, or prediction using FlexPred
• Distinguishes a model's discrepancy at a flexible region from one at a rigid region of the target protein
• Overall correlates well with existing scores (GDT-TS, TM-Score), but occasionally gives a different, more reasonable evaluation

Available at
FlexScore: https://bitbucket.org/mjamroz/flexscore
FlexPred: http://kiharalab.org/flexpred/

Acknowledgement

@kiharalab
