flexscore: ensemble-based evaluation for protein structure models
Post on 12-Apr-2017
1
ENSEMBLE-BASED EVALUATION FOR PROTEIN STRUCTURE MODELS
Michal Jamroz (1), Andrzej Kolinski (1), & Daisuke Kihara (2)
(1) Faculty of Chemistry, Warsaw University, Poland
(2) Department of Biological Sciences/Computer Science, Purdue University, USA
http://kiharalab.org
2
Protein Structure Comparison
• Superimposition of two structures considering the structures are rigid
• Root mean square deviation (RMSD):
$$\mathrm{rmsd}(A, B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\| x_i^{A} - x_i^{B} \right\|^{2}}$$
• CE, Dali, SSAP, 3D-SURFER (http://kiharalab.org/3d-surfer)
• In protein structure prediction, structure comparison is important in evaluating structure models
• GDT-TS, TM-Score
• Rigid structure comparison stems from the static pictures provided by crystal structures of proteins in the PDB
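As a minimal sketch of the RMSD definition on this slide, assuming the two structures are already superimposed and given as equal-length lists of (x, y, z) coordinates. The function name `rmsd` and the toy coordinates are illustrative and do not come from the slides.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-superimposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must have the same non-zero number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical three-atom example: every atom is displaced by 1 Å along x.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(1.0, 0.0, 0.0), (4.8, 0.0, 0.0), (8.6, 0.0, 0.0)]
print(rmsd(a, b))
```

Because every atom contributes equally, a large deviation in a flexible loop is penalized exactly as much as the same deviation in a rigid core.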
3
But proteins are intrinsically flexible!
• Flexibility can be measured/observed by
  • NMR
  • Molecular dynamics (MD) simulation
  • Coarse-grained model simulation, e.g. Gaussian network model
• Even diffraction data from X-ray crystallography contain flexibility information beyond the single isotropic B-factor model (Blundell, 2004; Terwilliger, 2006; ...)
• Intrinsically disordered proteins
Figures: CcdA, NMR (Madl et al., JMB 2006); 10 ns MD (PDB ID: 2n2u)
4
Protein Structure Comparison Methods that Consider Chain Flexibility
• Weighted RMSD using B-factors (Wu & Wu, 2010)
• Iterative RMSD computation (Damm & Carlson, 2006)
• Use of elastic network model (FlexE, Perez et al., 2012)
• Use of structural ensembles
  • KL divergence of two ensembles (L-Larsen et al. 2009)
  • Maximum likelihood (THESEUS, 2006; bFit, 2010)
5
FlexScore (Jamroz, Kolinski, & Kihara, ISMB, Bioinformatics, 2016)
• Evaluates a computational protein structure model by comparing it to an ensemble of the target protein structure
• The ensemble comes from NMR, MD simulation, or another source
  • 10 ns MD simulation with explicit water molecules
• Structure $X_i$ in an ensemble X is represented as
$$X_i = (M + E_i) R_i^{T} - 1_k t_i^{T}$$
  M: mean structure
  $E_i$: displacement that follows a Gaussian distribution $N_{k,3}(0, \Sigma, I_3)$, where $\Sigma$ is a k × k covariance matrix
  k: the number of Cα atoms
  $R_i$: rotation matrix
  $t_i$: translation vector; $1_k$ is a k × 1 vector of ones
  T denotes the transpose of a matrix
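The generative model on this slide can be sketched by sampling from it. This is an illustration only, assuming the sign convention $X_i = (M + E_i) R_i^{T} - 1_k t_i^{T}$, a positive-definite Σ, and NumPy. The helper names `random_rotation` and `sample_ensemble` and the toy inputs are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Draw a proper 3x3 rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))      # fix column signs of the factorization
    if np.linalg.det(q) < 0:         # flip one axis to avoid a reflection
        q[:, 0] = -q[:, 0]
    return q

def sample_ensemble(mean, sigma, n, rng):
    """Sample n structures X_i = (M + E_i) R_i^T - 1_k t_i^T."""
    k = mean.shape[0]
    chol = np.linalg.cholesky(sigma)            # sigma = chol @ chol.T
    ones = np.ones((k, 1))
    ensemble = []
    for _ in range(n):
        e = chol @ rng.standard_normal((k, 3))  # E_i ~ N_{k,3}(0, sigma, I_3)
        rot = random_rotation(rng)              # R_i
        t = rng.standard_normal((3, 1))         # t_i
        ensemble.append((mean + e) @ rot.T - ones @ t.T)
    return np.stack(ensemble)

# Toy input: 5 atoms, with the last atom far more flexible than the first two.
rng = np.random.default_rng(0)
mean = rng.standard_normal((5, 3))
sigma = np.diag([0.1, 0.1, 1.0, 1.0, 4.0])
ensemble = sample_ensemble(mean, sigma, 10, rng)
print(ensemble.shape)
```

Each sampled structure has an arbitrary orientation and position, which is why the ensemble must be superimposed before M and Σ can be estimated.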
6
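Ensemble Superimposition
Model for the ensemble structures X: $X_i = (M + E_i) R_i^{T} - 1_k t_i^{T}$
Initialization: $\hat{\Sigma} = I$, $\hat{M} = X_j$, $\hat{\alpha} = 0$
Iterated estimation steps:
• Estimate t:
$$\hat{t}_i = -\frac{X_i^{T} \hat{\Sigma}^{-1} 1_k}{1_k^{T} \hat{\Sigma}^{-1} 1_k}$$
• Estimate R: $\hat{R}_i$ is computed by SVD of
$$\hat{M}^{T} \hat{\Sigma}^{-1} \left( X_i + 1_k \hat{t}_i^{T} \right)$$
• Estimate M:
$$\hat{M} = \frac{1}{n}\sum_{i=1}^{n} \left( X_i + 1_k \hat{t}_i^{T} \right) \hat{R}_i$$
• Estimate $\hat{\Sigma}_s$ and its eigenvalues $\hat{\Lambda}_s$:
$$\hat{\Sigma}_s = \frac{1}{3n}\sum_{i=1}^{n} \left[ \left( X_i + 1_k \hat{t}_i^{T} \right) \hat{R}_i - \hat{M} \right] \left[ \left( X_i + 1_k \hat{t}_i^{T} \right) \hat{R}_i - \hat{M} \right]^{T}$$
• Estimate $\hat{\alpha}$ (the update formula is not recoverable from this transcript)
• Estimate $\hat{\Sigma}_h$, $\hat{\Lambda}_h$:
$$\hat{\Sigma}_h = \frac{3n}{3n + 3}\left( \frac{2\hat{\alpha}}{3n} I + \hat{\Sigma}_s \right)$$
Hierarchical log-likelihood model of R, t, M, and Σ given X, with hyperparameter α
α: parameter of the inverse gamma distribution that the eigenvalues Λ of Σ follow
(Theobald DL, 2012)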
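The estimate-t, estimate-R, estimate-M loop can be sketched in a simplified form. Note the swap: this is an ordinary least-squares version that keeps Σ = I throughout (plain Kabsch superposition onto an evolving mean), not the maximum-likelihood procedure with the hierarchical covariance model described on the slide. The function names, the fixed iteration count, and the use of NumPy are assumptions.

```python
import numpy as np

def kabsch(mobile, target):
    """Rotation R (3x3) minimizing ||mobile @ R - target|| for centered inputs."""
    u, _, vt = np.linalg.svd(mobile.T @ target)
    if np.linalg.det(u @ vt) < 0:     # exclude improper rotations (reflections)
        u[:, -1] = -u[:, -1]
    return u @ vt

def superimpose_ensemble(ensemble, n_iter=20):
    """Iteratively superimpose structures of shape (n, k, 3) onto their mean."""
    # Estimate t: with Sigma = I this is just centering each structure.
    centered = ensemble - ensemble.mean(axis=1, keepdims=True)
    mean = centered[0].copy()                      # initialization: M = X_j
    for _ in range(n_iter):
        # Estimate R: rotate every structure onto the current mean.
        aligned = np.stack([x @ kabsch(x, mean) for x in centered])
        # Estimate M: average of the superimposed structures.
        mean = aligned.mean(axis=0)
    residual = aligned - mean
    # Diagonal of Sigma_s = 1/(3n) * sum_i (X_i R_i - M)(X_i R_i - M)^T.
    variance = (residual ** 2).sum(axis=(0, 2)) / (3 * ensemble.shape[0])
    return mean, aligned, variance
```

With Σ fixed to the identity, every atom has equal weight in the fit, so flexible regions can distort the superposition. Down-weighting them through the estimated covariance is the motivation for the likelihood-based scheme.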
7
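FlexScore (FS)
• A computational model Y is scored after shifting it by $\hat{t}$ and rotating it with the rotation matrix obtained by SVD of
$$\hat{M}^{T} \hat{\Sigma}_h^{-1} \left( Y + 1_k \hat{t}^{T} \right)$$
$$FS(Y) = \frac{1}{k}\sum_{i=1}^{k} \frac{1}{\sigma_i}\left\| \hat{M}_i - Y_i^{\mathrm{sup}} \right\|$$
  where $Y^{\mathrm{sup}}$ is the superimposed model and $\sigma_i$ is the fluctuation scale of residue i in the ensemble
• A score of 0 indicates a perfect model
• FS-GDT: defined as the average of the fractions of Cα atoms within FlexScore of 1, 2, 4, and 8. The score ranges over [0, 1].
  (analogous to GDT-TS, which is the average of the fractions of Cα atoms within 1, 2, 4, and 8 Å)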
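A hedged sketch of how a fluctuation-normalized score and its GDT-like variant might be computed. Several things here are assumptions rather than the authors' implementation: the model is superimposed onto the mean by an unweighted Kabsch fit (the slide uses a covariance-weighted SVD), `sigma` is taken to be a per-residue standard deviation from the ensemble, the cutoffs are inclusive, and the names `flexscore` and `fs_gdt` are hypothetical.

```python
import numpy as np

def flexscore(mean, sigma, model):
    """Average and per-residue deviation of a model, scaled by fluctuation.

    mean  : (k, 3) mean structure of the ensemble
    sigma : (k,) per-residue fluctuation scale (assumed: ensemble std. dev.)
    model : (k, 3) model coordinates, residues in the same order as mean
    """
    m = mean - mean.mean(axis=0)
    y = model - model.mean(axis=0)
    # Unweighted Kabsch superposition of the model onto the mean (assumption).
    u, _, vt = np.linalg.svd(y.T @ m)
    if np.linalg.det(u @ vt) < 0:
        u[:, -1] = -u[:, -1]
    y_sup = y @ (u @ vt)
    per_residue = np.linalg.norm(m - y_sup, axis=1) / sigma
    return float(per_residue.mean()), per_residue

def fs_gdt(per_residue, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of residues whose scaled deviation is within each cutoff."""
    return float(np.mean([(per_residue <= c).mean() for c in cutoffs]))
```

Dividing by the fluctuation scale means a 2 Å error in a residue that moves by 2 Å in the ensemble counts the same as a 0.5 Å error in a residue that moves by only 0.5 Å.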
8
FlexScore of Toy Models
NMR structures (PDB ID: 2j8p)
Identical RMSD, GDT-TS, & TM-Score: 1.47, 0.95, & 0.93 to the mean structure
FlexScore: Green, 1.96; Blue, 1.42
9
Correlation of FlexScore to the Other Scores
(GDT-TS, TMSCORE, RMSD columns: correlation of FlexScore with each score; <...> columns: average value of the score)
Target   GDT-TS  TMSCORE  RMSD   <GDT-TS>  <TM>   <RMSD>  <FS>
T0651*   -0.04   -0.13    1.00   0.27      0.36   24.02   62.76
T0655    -0.83   -0.88    0.77   0.49      0.58   13.95   15.41
T0657    -0.94   -0.95    0.92   0.63      0.68    7.69    9.64
T0662    -0.97   -0.96    0.99   0.67      0.67    3.87    5.24
T0667    -0.96   -0.98    0.98   0.57      0.69    6.73   13.34
T0669    -0.83   -0.84    0.96   0.46      0.50    9.21   16.70
T0673    -0.65   -0.58    0.95   0.33      0.27   11.85   22.87
T0675    -0.62   -0.56    0.74   0.37      0.33   11.14    6.96
T0714    -0.91   -0.92    0.98   0.78      0.79    2.67    5.24
T0716    -0.82   -0.79    0.88   0.65      0.62    7.55    5.62
T0763*   -0.30   -0.48    0.99   0.16      0.20   18.18   54.71
T0767*   -0.48   -0.69    1.00   0.11      0.19   33.84   94.69
T0769    -0.88   -0.87    0.80   0.50      0.53   11.58   13.22
T0773    -0.91   -0.89    0.85   0.52      0.49    9.45   12.04
T0777*   -0.63   -0.72    1.00   0.10      0.21   31.60   81.96
T0780     0.08    0.03    0.99   0.29      0.37   23.13   32.47
T0782    -0.88   -0.89    0.99   0.45      0.49    9.20   17.83
T0785*   -0.54   -0.59    0.97   0.18      0.20   16.40   37.16
T0790*   -0.28   -0.57    1.00   0.11      0.19   26.15   50.85
T0803    -0.27   -0.30    0.98   0.34      0.39   13.84   35.47
T0808*   -0.02   -0.15    0.99   0.11      0.21   26.47   70.98
T0814*    0.10   -0.43    0.98   0.10      0.19   27.14   75.96
T0829    -0.78   -0.72    0.95   0.47      0.42    9.63   22.38
T0832*   -0.41   -0.64    0.97   0.15      0.22   20.65   51.35
T0833    -0.94   -0.95    0.96   0.57      0.60    7.50   11.78
T0853    -0.27   -0.32    0.99   0.21      0.26   17.55   36.25
T0856    -0.89   -0.92    0.99   0.69      0.77    4.01   10.81
T0857    -0.89   -0.90    0.95   0.29      0.31   13.96   13.27
~200 predicted (server) models for single-chain targets from CASP10 and CASP11
* Free modeling targets
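For one target, a correlation like those in the table would be computed across that target's models. The sketch below assumes a Pearson coefficient (the slides do not name the type of correlation) and uses made-up score values that are not taken from the table.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical scores for five models of one target (illustrative values only).
flexscores = [2.0, 4.0, 6.0, 8.0, 10.0]   # lower is better
gdt_ts = [0.9, 0.7, 0.6, 0.4, 0.3]        # higher is better
print(pearson(flexscores, gdt_ts))
```

Since a lower FlexScore and a higher GDT-TS both indicate a better model, agreement between the two shows up as a strongly negative coefficient, whereas agreement with RMSD shows up as a positive one.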
10
Correlation between FlexScore and RMSD
11
Different Evaluation by FlexScore, GDT-TS, TM-Score, & RMSD (T0716)
Green, Orange models: GDT-TS: 0.52, 0.51; TM-Score: 0.48, 0.49; FlexScore: 7.4, 24.1
Green, Orange models: FlexScore: 2.75, 2.71; GDT-TS: 0.75, 0.70; TM-Score: 0.72, 0.70; RMSD: 3.93 Å, 5.40 Å
12
Different Evaluation by FlexScore, GDT-TS, TM-Score, & RMSD (T0655)
Green, Orange models: GDT-TS: 0.50, 0.54; TM-Score: 0.61, 0.66; FlexScore: 23.05, 9.2
13
Different Evaluation by FlexScore, GDT-TS, TM-Score, & RMSD (T0714)
Green, Orange models: GDT-TS: 0.84, 0.83; TM-Score: 0.83, 0.86; FlexScore: 4.42, 2.69
14
Different MD Trajectories
3 MD trajectories
T0829 (4rgi, 70 res): FlexScore: 5.20, 5.21, 5.22
T0782 (4qrl, 70 res): FlexScore: 3.63, 3.63, 3.63
15
Dependence on Length of MD Simulation
T0773, PDB ID: 2n2u, 77 aa long. Left half: correlation with the other scores; right half: average values of the scores.
16
FlexScore from NMR and MD Ensembles
Scores of 235 models of T0176 are compared.
17
CASP10 Prediction Group Ranking
Rank  FS  FS-GDT  GDT-TS  TM      RMSD
1     A   A       A       A       A
2     B   D       B       B       B
3     C   B       F       C       C
4     D   C       C       F       F
5     E   F       D       D       E
6     F   E       I       I       G
7     G   O (14)  G       X (24)  J
8     H   J       J       L (12)  I
9     I   Q (17)  E       G       D
10    J   H       O (14)  Q (17)  H
18
Real-value Prediction of Protein Flexibility
http://kiharalab.org/flexpred/
(Peterson, Jamroz, Kolinski, Kihara, Methods Mol. Biol., 2016)
(Jamroz, Kolinski, Kihara, Proteins 80: 1425-1435, 2012)
Structural Features                            Avg. corr. coefficient
B-Factor                                        0.484
Distance to center of mass                      0.509
Square of distance to center of mass (D2)       0.545
Contact number (cutoff 6 Å)                    -0.374
Contact number (8 Å)                           -0.480
Contact number (12 Å)                          -0.554
Contact number (15 Å)                          -0.568
Contact number (16 Å)                          -0.567
Contact number (18 Å)                          -0.562
Accessible Surface Area normalized              0.476
Residue depth (residue mean)                   -0.352
Prediction by GNM (cutoff 16 Å)                 0.643
Prediction by GNM (no cutoff)                   0.646
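Two of the structural features in the table, the contact number and the squared distance to the center of mass (D2), can be sketched as follows. This assumes Cα coordinates as input, an inclusive distance cutoff, and the unweighted centroid as a stand-in for the center of mass. The function names are hypothetical and are not taken from FlexPred.

```python
import numpy as np

def contact_numbers(ca, cutoff):
    """Number of other C-alpha atoms within `cutoff` Å of each residue."""
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    return (dist <= cutoff).sum(axis=1) - 1   # subtract the residue itself

def squared_distance_to_center(ca):
    """Squared distance of each C-alpha atom to the unweighted centroid."""
    return ((ca - ca.mean(axis=0)) ** 2).sum(axis=1)

# Toy input: four atoms on a line, spaced 3.8 Å apart.
ca = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]])
print(contact_numbers(ca, 8.0))
print(squared_distance_to_center(ca))
```

The signs in the table match this picture: buried residues have many contacts and fluctuate less (negative correlation), while residues far from the center fluctuate more (positive correlation).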
19
(592 MD trajectories from the MoDEL db)
Fluctuation Prediction Using Support Vector Regression
20
Features used                                           Average corr. coeff.  RMS (Å)
B, D2, Sec, C(16), C(18), C(12), C(8)                   0.667                 1.042
B, D2, C(16), C(18), C(12), C(8), C(6), C(20)           0.666                 1.042
B, D2, C(16), C(18), C(12), C(8), C(6), C(20), C(22)    0.667                 1.042
B, C(16), C(18), C(12), C(8), C(6), C(20), C(22)        0.669                 1.073
C(16), C(18), C(12), C(8), C(6), C(15), C(20), C(22)    0.660                 1.092
B, B-factor; D2, square of the distance to the center of mass; C(x), the contact number with x Å cutoff
(Jamroz, Kolinski, Kihara, Proteins 80: 1425-1435, 2012)
21
Examples of Predicted Fluctuations
1gpc, 218 aa
1a1x, 108 aa
22
Summary
• Developed FlexScore, which evaluates computational protein structure models by considering the flexibility of target proteins
• Flexibility is represented by a structure ensemble, which comes from MD, NMR, or prediction using FlexPred
• Distinguishes a model's discrepancy in a flexible region from its discrepancy in a rigid region of the target protein
• Overall correlates well with existing scores (GDT-TS, TM-Score), but occasionally gives a different, more reasonable evaluation
Available at
FlexScore: https://bitbucket.org/mjamroz/flexscore
FlexPred: http://kiharalab.org/flexpred/
Acknowledgement
@kiharalab