FlexScore: Ensemble-based evaluation for protein structure models

ENSEMBLE-BASED EVALUATION FOR PROTEIN STRUCTURE MODELS. Michal Jamroz (1), Andrzej Kolinski (1), & Daisuke Kihara (2). (1) Faculty of Chemistry, Warsaw University, Poland; (2) Department of Biological Sciences/Computer Science, Purdue University, USA. http://kiharalab.org

Upload: daisuke-kihara

Post on 12-Apr-2017


Category:

Science



TRANSCRIPT

Page 1: Flexscore: Ensemble-based evaluation for protein Structure models


ENSEMBLE-BASED EVALUATION FOR PROTEIN STRUCTURE MODELS

Michal Jamroz (1), Andrzej Kolinski (1), & Daisuke Kihara (2)

(1) Faculty of Chemistry, Warsaw University, Poland
(2) Department of Biological Sciences/Computer Science, Purdue University, USA

http://kiharalab.org

Page 2: Flexscore: Ensemble-based evaluation for protein Structure models

Protein Structure Comparison

• Superimposition of two structures, treating the structures as rigid
• Root mean square deviation (RMSD):

$$\mathrm{rmsd}(A, B) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\| x_i^A - x_i^B \right\|^2}$$

• CE, Dali, SSAP, 3D-SURFER (http://kiharalab.org/3d-surfer)
• In protein structure prediction, structure comparison is important in evaluating structure models
• GDT-TS, TM-Score
• Rigid structure comparison is due to the static pictures provided by crystal structures of proteins in PDB
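As an illustration of the rigid-body RMSD above, the following is a minimal sketch that superimposes two Cα coordinate sets with the Kabsch (SVD) procedure and then evaluates the RMSD. The function name `kabsch_rmsd` is hypothetical, and the two inputs are assumed to be N x 3 arrays with atoms already in one-to-one correspondence.

```python
import numpy as np

def kabsch_rmsd(A, B):
    """RMSD between two N x 3 coordinate arrays after optimal rigid superposition."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Remove the translation by centering both structures on their centroids.
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    # SVD of the 3 x 3 cross-covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(A0.T @ B0)
    # Flip the last axis if needed so that the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    diff = A0 @ R - B0
    return float(np.sqrt((diff ** 2).sum() / len(A)))
```

Because the score depends only on the single pair of static coordinate sets, a deviation in a floppy loop and the same deviation in a rigid core contribute equally.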

Page 3: Flexscore: Ensemble-based evaluation for protein Structure models

But proteins are intrinsically flexible!

• Flexibility can be measured/observed by
  • NMR
  • Molecular dynamics (MD) simulation
  • Coarse-grained model simulation, e.g. Gaussian network model
• Even diffraction data from X-ray crystallography contain flexibility information beyond the single isotropic B-factor model (Blundell, 2004; Terwilliger, 2006, ...)
• Intrinsically disordered proteins

Figures: NMR ensemble of CcdA (Madl et al., JMB 2006); ensemble from a 10 ns MD simulation (PDB ID: 2n2u)

Page 4: Flexscore: Ensemble-based evaluation for protein Structure models

Protein Structure Comparison Methods that Consider Chain Flexibility

• Weighted RMSD using B-factors (Wu & Wu, 2010)
• Iterative RMSD computation (Damm & Carlson, 2006)
• Use of an elastic network model (FlexE, Perez et al., 2012)
• Use of structural ensembles
  • KL divergence of two ensembles (Lindorff-Larsen et al., 2009)
  • Maximum likelihood (THESEUS, 2006; bFit, 2010)

Page 5: Flexscore: Ensemble-based evaluation for protein Structure models

FlexScore (Jamroz, Kolinski, & Kihara, ISMB, Bioinformatics, 2016)

• Evaluates a computational protein structure model by comparing it to an ensemble of the target protein structure
• The ensemble comes from NMR, MD simulation, or another source
  • 10 ns MD simulation with explicit water molecules
• Structure X_i in an ensemble X is represented as

$$X_i = (M + E_i) R_i^T - 1_k t_i^T$$

M: a mean structure
E_i: displacement that follows a matrix Gaussian distribution N_{k,3}(0, Σ, I_3), where Σ is a k × k covariance matrix
k: the number of Cα atoms
R_i: rotation matrix
t_i: translation vector; 1_k is a k × 1 vector of ones
T denotes the transpose of a matrix
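To make the generative model concrete, here is a small sketch that draws ensemble members of the form (M + E_i) R_i^T - 1_k t_i^T. It assumes the minus sign of the translation term as in Theobald's formulation, draws E_i from the matrix normal distribution via a Cholesky factor of Σ, and uses hypothetical helper names (`random_rotation`, `sample_ensemble`); the random rotations and translations are purely illustrative.

```python
import numpy as np

def random_rotation(rng):
    """Random proper rotation matrix from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))   # fix column signs so the factorization is unique
    if np.linalg.det(Q) < 0:      # enforce det = +1
        Q[:, 0] = -Q[:, 0]
    return Q

def sample_ensemble(M, Sigma, n, rng):
    """Draw n structures X_i = (M + E_i) R_i^T - 1_k t_i^T, returned as an (n, k, 3) array."""
    k = M.shape[0]
    L = np.linalg.cholesky(Sigma)             # Sigma = L L^T
    ones = np.ones((k, 1))
    ensemble = []
    for _ in range(n):
        E = L @ rng.normal(size=(k, 3))       # E_i ~ N_{k,3}(0, Sigma, I_3)
        R = random_rotation(rng)
        t = rng.normal(size=(3, 1))
        ensemble.append((M + E) @ R.T - ones @ t.T)
    return np.stack(ensemble)
```

A diagonal Σ with large entries for loop residues and small entries for core residues would produce an ensemble that is tight in the core and spread out in the loops.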

Page 6: Flexscore: Ensemble-based evaluation for protein Structure models

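Ensemble Superimposition

Ensemble structures, X:

$$X_i = (M + E_i) R_i^T - 1_k t_i^T$$

Initialization: $\hat{\Sigma} = I$, $\hat{M} = X_j$, $\alpha = 0$

Iterated steps:

• Estimate t:

$$\hat{t}_i = -\frac{X_i^T \hat{\Sigma}^{-1} 1_k}{1_k^T \hat{\Sigma}^{-1} 1_k}$$

• Estimate R: R computed by SVD of

$$\hat{M}^T \hat{\Sigma}^{-1} \left( X_i + 1_k \hat{t}_i^T \right)$$

• Estimate $\hat{M}$
• Estimate $\hat{\Sigma}_s$ and its $\hat{\Lambda}_s$ (eigenvalues):

$$\hat{\Sigma}_s = \frac{1}{3n} \sum_{i=1}^{n} \left[ (X_i + 1_k \hat{t}_i^T) \hat{R}_i - \hat{M} \right] \left[ (X_i + 1_k \hat{t}_i^T) \hat{R}_i - \hat{M} \right]^T$$

• Estimate $\hat{\alpha}$ [update formula not recoverable from the transcript]
• Estimate $\hat{\Sigma}_h$, $\hat{\Lambda}_h$:

$$\hat{\Sigma}_h = \frac{3n}{3n + 3} \left( \frac{2 \hat{\alpha}}{3n} I + \hat{\Sigma}_s \right)$$

Hierarchical log-likelihood model: $\ell(R, t, M, \Sigma \mid X) + \ell(\Sigma \mid \alpha)$

α: parameter of the inverse gamma distribution that Λ (the eigenvalues of Σ) follows

(Theobald DL, 2012)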
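The iteration on this slide can be sketched roughly as follows. This is a simplified illustration, not the THESEUS implementation: the function name `superimpose_ensemble` is hypothetical, α is held at a fixed user-supplied value instead of being re-estimated (its update formula is not legible in the transcript), the mean update is assumed to be the plain average of the superimposed structures, and a fixed number of iterations replaces a convergence test.

```python
import numpy as np

def superimpose_ensemble(X, alpha=1.0, n_iter=20):
    """X: (n, k, 3) ensemble. Returns mean M (k, 3), Sigma_h (k, k), superimposed copies."""
    X = np.asarray(X, dtype=float)
    n, k, _ = X.shape
    Sigma = np.eye(k)                        # initialization: Sigma = I
    M = X[0] - X[0].mean(axis=0)             # initialization: M = one (centered) member
    sup = np.empty_like(X)
    for _ in range(n_iter):
        Sinv = np.linalg.inv(Sigma)
        w = Sinv @ np.ones(k)                # Sigma^-1 1_k
        for i in range(n):
            # t_i = -X_i^T Sigma^-1 1_k / (1_k^T Sigma^-1 1_k)
            t = -(X[i].T @ w) / w.sum()
            Z = X[i] + t                     # X_i + 1_k t_i^T (broadcast over rows)
            # Rotation from the SVD of M^T Sigma^-1 (X_i + 1_k t_i^T)
            U, _, Vt = np.linalg.svd(M.T @ Sinv @ Z)
            d = np.sign(np.linalg.det(Vt.T @ U.T))
            R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
            sup[i] = Z @ R
        M = sup.mean(axis=0)                 # assumed mean update: average structure
        dev = sup - M
        # Sigma_s = 1/(3n) * sum_i dev_i dev_i^T
        Sigma_s = np.einsum('nia,nja->ij', dev, dev) / (3 * n)
        # Shrinkage toward the identity keeps Sigma_h invertible (alpha fixed here).
        Sigma = (3 * n / (3 * n + 3)) * ((2 * alpha / (3 * n)) * np.eye(k) + Sigma_s)
    return M, Sigma, sup
```

With α > 0 the covariance estimate stays positive definite even when the number of structures is small relative to the number of atoms, which is the practical reason for using the hierarchical estimate instead of the raw sample covariance.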

Page 7: Flexscore: Ensemble-based evaluation for protein Structure models

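FlexScore (FS)

• Score of a computational model Y, obtained by shifting Y by $\hat{t}$ and rotating it with a rotation matrix computed by SVD of

$$\hat{M}^T \hat{\Sigma}_h^{-1} \left( Y + 1_k \hat{t}^T \right)$$

giving the superimposed model $Y^{sup}$, which is scored as

$$FS(Y) = \frac{1}{k} \sum_{i=1}^{k} \frac{1}{\sigma_i} \left\| \hat{M}_i - Y_i^{sup} \right\|$$

where σ_i is the positional fluctuation of Cα atom i in the ensemble (the normalizing symbol is not legible in the transcript)

• Score of 0 for the perfect model
• FS-GDT: defined as the average of the fractions of Cα atoms within a FlexScore of 1, 2, 4, and 8. The score ranges over [0, 1].
  (analogous to GDT-TS, which is the average of the fractions of Cα atoms within 1, 2, 4, and 8 Å)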
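A minimal sketch of the scoring step follows, under the assumption that each atom's deviation from the mean structure is divided by a per-atom fluctuation `sigma` estimated from the ensemble; the normalizing term is not legible in the transcript, so this reading is an assumption. The function names are hypothetical, and the model is assumed to be already superimposed on the mean structure.

```python
import numpy as np

def flexscore(M, Y_sup, sigma):
    """Per-residue and overall FlexScore.

    M, Y_sup: (k, 3) mean structure and superimposed model.
    sigma: (k,) per-atom fluctuation used as the normalizer (assumed form).
    """
    dist = np.linalg.norm(M - Y_sup, axis=1)   # deviation of each C-alpha atom
    per_residue = dist / sigma                 # deviation in units of fluctuation
    return per_residue, float(per_residue.mean())

def fs_gdt(per_residue, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of atoms whose per-residue FlexScore is within each cutoff."""
    fractions = [(per_residue <= c).mean() for c in cutoffs]
    return float(np.mean(fractions))
```

Under this reading, a 3 Å error in a loop that fluctuates by 3 Å costs the same as a 0.5 Å error in a core residue that fluctuates by 0.5 Å, which is how the score separates errors in flexible regions from errors in rigid ones.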

Page 8: Flexscore: Ensemble-based evaluation for protein Structure models

FlexScore of Toy models

NMR structures (PDB ID: 2j8p)
Identical RMSD, GDT-TS, & TM-Score to the mean structure: 1.47, 0.95, & 0.93
FlexScore: Green, 1.96; Blue, 1.42

Page 9: Flexscore: Ensemble-based evaluation for protein Structure models

Correlation of FlexScore to the Other Scores

Columns GDT-TS, TM-Score, RMSD: correlation of FlexScore with each score. Columns in angle brackets: average values over the models.

Target      GDT-TS  TM-Score      RMSD  <GDT-TS>      <TM>    <RMSD>      <FS>
T0651*       -0.04     -0.13      1.00      0.27      0.36     24.02     62.76
T0655        -0.83     -0.88      0.77      0.49      0.58     13.95     15.41
T0657        -0.94     -0.95      0.92      0.63      0.68      7.69      9.64
T0662        -0.97     -0.96      0.99      0.67      0.67      3.87      5.24
T0667        -0.96     -0.98      0.98      0.57      0.69      6.73     13.34
T0669        -0.83     -0.84      0.96      0.46      0.50      9.21     16.70
T0673        -0.65     -0.58      0.95      0.33      0.27     11.85     22.87
T0675        -0.62     -0.56      0.74      0.37      0.33     11.14      6.96
T0714        -0.91     -0.92      0.98      0.78      0.79      2.67      5.24
T0716        -0.82     -0.79      0.88      0.65      0.62      7.55      5.62
T0763*       -0.30     -0.48      0.99      0.16      0.20     18.18     54.71
T0767*       -0.48     -0.69      1.00      0.11      0.19     33.84     94.69
T0769        -0.88     -0.87      0.80      0.50      0.53     11.58     13.22
T0773        -0.91     -0.89      0.85      0.52      0.49      9.45     12.04
T0777*       -0.63     -0.72      1.00      0.10      0.21     31.60     81.96
T0780         0.08      0.03      0.99      0.29      0.37     23.13     32.47
T0782        -0.88     -0.89      0.99      0.45      0.49      9.20     17.83
T0785*       -0.54     -0.59      0.97      0.18      0.20     16.40     37.16
T0790*       -0.28     -0.57      1.00      0.11      0.19     26.15     50.85
T0803        -0.27     -0.30      0.98      0.34      0.39     13.84     35.47
T0808*       -0.02     -0.15      0.99      0.11      0.21     26.47     70.98
T0814*        0.10     -0.43      0.98      0.10      0.19     27.14     75.96
T0829        -0.78     -0.72      0.95      0.47      0.42      9.63     22.38
T0832*       -0.41     -0.64      0.97      0.15      0.22     20.65     51.35
T0833        -0.94     -0.95      0.96      0.57      0.60      7.50     11.78
T0853        -0.27     -0.32      0.99      0.21      0.26     17.55     36.25
T0856        -0.89     -0.92      0.99      0.69      0.77      4.01     10.81
T0857        -0.89     -0.90      0.95      0.29      0.31     13.96     13.27

~200 predicted (server) models for single-chain targets from CASP10 and CASP11
* Free modeling targets
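The per-target correlations in this table are correlation coefficients computed over the models of one target. A minimal sketch, assuming the Pearson coefficient is the one used (the transcript does not say which) and using a hypothetical helper name:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

Since FlexScore is 0 for a perfect model while GDT-TS and TM-Score increase with model quality, good agreement shows up as a correlation close to -1 with those two scores and close to +1 with RMSD.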

Page 10: Flexscore: Ensemble-based evaluation for protein Structure models


Correlation between FlexScore and RMSD

Page 11: Flexscore: Ensemble-based evaluation for protein Structure models

Different Evaluation by FlexScore, GDT-TS, TM-Score, & RMSD (T0716)

Green, Orange models:
GDT-TS: 0.52, 0.51
TM-Score: 0.48, 0.49
FlexScore: 7.4, 24.1

Green, Orange models:
FlexScore: 2.75, 2.71
GDT-TS: 0.75, 0.70; TM-Score: 0.72, 0.70; RMSD: 3.93 Å, 5.40 Å

Page 12: Flexscore: Ensemble-based evaluation for protein Structure models

Different Evaluation by FlexScore, GDT-TS, TM-Score, & RMSD (T0655)

Green, Orange models:
GDT-TS: 0.50, 0.54
TM-Score: 0.61, 0.66
FlexScore: 23.05, 9.2

Page 13: Flexscore: Ensemble-based evaluation for protein Structure models

Different Evaluation by FlexScore, GDT-TS, TM-Score, & RMSD (T0714)

Green, Orange models:
GDT-TS: 0.84, 0.83; TM-Score: 0.83, 0.86
FlexScore: 4.42, 2.69

Page 14: Flexscore: Ensemble-based evaluation for protein Structure models

Different MD Trajectories

3 MD trajectories

T0829 (4rgi, 70 res): FlexScore 5.20, 5.21, 5.22
T0782 (4qrl, 70 res): FlexScore 3.63, 3.63, 3.63

Page 15: Flexscore: Ensemble-based evaluation for protein Structure models

Dependency on Length of MD Simulation

T0773, PDB ID: 2n2u, 77 aa long. Left half: correlation with the other scores; right half: average values of the scores.

Page 16: Flexscore: Ensemble-based evaluation for protein Structure models


FlexScore from NMR and MD Ensembles

Scores of 235 models of T0176 are compared.

Page 17: Flexscore: Ensemble-based evaluation for protein Structure models

CASP10 Prediction Group Ranking

Rank  FS  FS-GDT  GDT-TS  TM      RMSD
1     A   A       A       A       A
2     B   D       B       B       B
3     C   B       F       C       C
4     D   C       C       F       F
5     E   F       D       D       E
6     F   E       I       I       G
7     G   O (14)  G       X (24)  J
8     H   J       J       L (12)  I
9     I   Q (17)  E       G       D
10    J   H       O (14)  Q (17)  H

Page 18: Flexscore: Ensemble-based evaluation for protein Structure models


Real-value Prediction of Protein Flexibility

http://kiharalab.org/flexpred/

(Peterson, Jamroz, Kolinski, Kihara, Methods Mol. Biol, 2016)

(Jamroz, Kolinski, Kihara, Proteins 80: 1425-1435, 2012)

Page 19: Flexscore: Ensemble-based evaluation for protein Structure models

Structural Features                           Avg. corr. coefficient

B-factor                                       0.484
Distance to center of mass                     0.509
Square of distance to center of mass (D2)      0.545
Contact number (cutoff 6 Å)                   -0.374
Contact number (8 Å)                          -0.480
Contact number (12 Å)                         -0.554
Contact number (15 Å)                         -0.568
Contact number (16 Å)                         -0.567
Contact number (18 Å)                         -0.562
Accessible surface area, normalized            0.476
Residue depth (residue mean)                  -0.352
Prediction by GNM (cutoff 16 Å)                0.643
Prediction by GNM (no cutoff)                  0.646

(592 MD trajectories from the MoDEL db)
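Several of the features in this table can be computed directly from Cα coordinates. The sketch below shows the contact number, the squared distance to the center (taken here as the unweighted centroid of the Cα atoms, an assumption), and relative fluctuations from a standard Gaussian network model (GNM). Function names are hypothetical, the GNM output is defined only up to a constant factor, and the contact network is assumed to be connected.

```python
import numpy as np

def pairwise_distances(ca):
    """All-against-all distance matrix for a (k, 3) C-alpha coordinate array."""
    return np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)

def contact_numbers(ca, cutoff):
    """Number of other C-alpha atoms within `cutoff` (in Å) of each atom."""
    return (pairwise_distances(ca) < cutoff).sum(axis=1) - 1   # subtract self

def squared_distance_to_center(ca):
    """Squared distance of each atom to the centroid (the D2 feature)."""
    return ((ca - ca.mean(axis=0)) ** 2).sum(axis=1)

def gnm_fluctuations(ca, cutoff=16.0):
    """Relative mean-square fluctuations from the GNM Kirchhoff matrix."""
    n = len(ca)
    contact = (pairwise_distances(ca) < cutoff) & ~np.eye(n, dtype=bool)
    kirchhoff = -contact.astype(float)
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))
    # Diagonal of the pseudo-inverse, skipping the single zero mode
    # (valid when the contact network is connected).
    w, v = np.linalg.eigh(kirchhoff)
    return (v[:, 1:] ** 2) @ (1.0 / w[1:])
```

The signs in the table match intuition: buried residues have high contact numbers and low fluctuation (negative correlation), while residues far from the center tend to fluctuate more (positive correlation).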

Page 20: Flexscore: Ensemble-based evaluation for protein Structure models

Fluctuation Prediction Using Support Vector Regression

Features used                                           Average corr. coeff.   RMS (Å)

B, D2, Sec, C(16), C(18), C(12), C(8)                   0.667                  1.042
B, D2, C(16), C(18), C(12), C(8), C(6), C(20)           0.666                  1.042
B, D2, C(16), C(18), C(12), C(8), C(6), C(20), C(22)    0.667                  1.042
B, C(16), C(18), C(12), C(8), C(6), C(20), C(22)        0.669                  1.073
C(16), C(18), C(12), C(8), C(6), C(15), C(20), C(22)    0.660                  1.092

B, B-factor; D2, square of the distance to the center of mass; C(x), the contact number with x Å cutoff

(Jamroz, Kolinski, Kihara, Proteins 80: 1425-1435, 2012)

Page 21: Flexscore: Ensemble-based evaluation for protein Structure models

Examples of Predicted Fluctuations

1gpc, 218 aa

1a1x, 108 aa

Page 22: Flexscore: Ensemble-based evaluation for protein Structure models

Summary

• Developed FlexScore, which evaluates computational protein structure models by considering the flexibility of target proteins
• Flexibility is represented by a structure ensemble, which comes from MD, NMR, or prediction using FlexPred
• Distinguishes a discrepancy of a model at a flexible region from one at a rigid region of the target protein
• Overall correlates well with existing scores (GDT-TS, TM-Score), but occasionally gives a different, more reasonable evaluation

Available at
FlexScore: https://bitbucket.org/mjamroz/flexscore
FlexPred: http://kiharalab.org/flexpred/

Page 23: Flexscore: Ensemble-based evaluation for protein Structure models

Acknowledgement

@kiharalab
