Template-based Prediction of Protein 8-state Secondary Structures June 12 th 2013 Ashraf Yaseen and Yaohang Li DEPARTMENT OF COMPUTER SCIENCE OLD DOMINION UNIVERSITY, NORFOLK, VA 3rd IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)



Template-based Prediction of Protein 8-state

Secondary Structures

June 12th 2013

Ashraf Yaseen and Yaohang Li

DEPARTMENT OF COMPUTER SCIENCE
OLD DOMINION UNIVERSITY, NORFOLK, VA

3rd IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)


Contents

• Introduction
    • Secondary Structure Definition & Representation
    • Secondary Structure Prediction
    • C8-Scorpion
• Materials & Methods
    • Data Sets, Template Construction, and Encoding
    • Neural Network Model
• Results & Discussions
• Summary


Protein Secondary Structure Prediction in Protein Modeling

Proteins: from the Greek proteios, “primary” or “of prime importance”; the primary components of living things

In nature, proteins fold into specific 3D structures critical to their functions

Protein Modeling

Correctly predicting protein secondary structure is a critical stepping stone toward obtaining correct 3D models

Sequence → intermediate prediction steps → 3D


Secondary Structures - Definition

Figure: Protein 1BOO chain A, with labeled π-helix, α-helix, 3-10 helix, turn, bend, other, and β-strand regions

• General 3D form of local segments of residues
• Identified from determined protein 3D structures
• DSSP


Secondary Structures - Representation

• 3-10 helix (G)
• α-helix (H)
• π-helix (I)
• β-strand (E)
• β-bridge (B)
• turn (T)
• bend (S)
• others (C)


Secondary Structure Prediction - Effectiveness

Correctly predicting secondary structure:
• Reduces the degrees of freedom in protein structure modeling
• Reduces the difficulty of obtaining high-resolution 3D models
• Yields a much smaller range of possible torsion angles

http://www.imb-jena.de/~rake/Bioinformatics_WEB/basics_peptide_bond.html


Secondary Structure Prediction - Background

Secondary Structure Prediction
• 3-state (helix, sheet, coil)
• 8-state (α-helix, π-helix, 3-10 helix, β-strand, β-bridge, turn, bend, and others)
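The relationship between the two alphabets can be sketched in a few lines of Python. The one-letter codes are the ones listed on the Representation slide; the 8-to-3 reduction table is an assumption (a commonly used convention), since the slides do not state which reduction is intended.

```python
# One-letter codes for the eight states, as listed on the Representation slide.
DSSP_STATES = {
    "G": "3-10 helix",
    "H": "alpha-helix",
    "I": "pi-helix",
    "E": "beta-strand",
    "B": "beta-bridge",
    "T": "turn",
    "S": "bend",
    "C": "others",
}

# ASSUMPTION: a commonly used 8-to-3 reduction (helix H, sheet E, coil C).
# The slides do not say which reduction is meant; other conventions exist.
EIGHT_TO_THREE = {
    "G": "H", "H": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C",
}


def reduce_to_three_state(ss8: str) -> str:
    """Map an 8-state string onto the 3-state alphabet, residue by residue."""
    return "".join(EIGHT_TO_THREE[state] for state in ss8)
```

Under this convention the rare states (G, I, B, S) are absorbed into larger classes, which is one reason 3-state accuracy figures are higher than 8-state ones.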

Secondary structure prediction is a classification problem: each residue is predicted to be in one of a few states

Predictor → Structural state of Ri

Machine Learning (ANN, SVM, HMM, ...)

3-state examples: GOR4, PSI-Pred, PHD, SAM, Porter, JPred, SPINE, SSPRO, NETSURF, and many others; ~80% Q3

8-state examples: SSpro8 (62-63% Q8), RaptorXss8 (67.9% Q8)
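The Q3 and Q8 figures quoted here are per-residue accuracies: the percentage of residues whose predicted state matches the observed state. A minimal sketch follows, with hypothetical function names, that also computes the per-state accuracies (QG, QH, ...) reported in the tables of this deck.

```python
from collections import Counter


def q_accuracy(true_ss: str, pred_ss: str) -> float:
    """Percentage of residues whose predicted state equals the observed one.

    With an 8-letter alphabet this is Q8; with a 3-letter alphabet, Q3.
    """
    if len(true_ss) != len(pred_ss):
        raise ValueError("observed and predicted strings must have equal length")
    if not true_ss:
        raise ValueError("cannot score an empty sequence")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


def per_state_accuracy(true_ss: str, pred_ss: str) -> dict:
    """For each observed state X, the percentage of X residues predicted as X."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("observed and predicted strings must have equal length")
    totals = Counter(true_ss)
    hits = Counter(t for t, p in zip(true_ss, pred_ss) if t == p)
    return {state: 100.0 * hits[state] / totals[state] for state in totals}
```

Because the overall score is dominated by frequent states such as H and E, a predictor can reach a reasonable Q8 while scoring near zero on rare states, which is why the per-state breakdown matters.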


Secondary Structure Prediction - 8-state

      CB513   CASP9   Manesh215   Carugo338
QG    17.54   20.58   18.43       19.20
QH    89.96   92.90   90.22       89.91
QI     0.00    0.00    0.00        0.00
QE    77.68   81.64   79.60       79.45
QB     0.09    0.00    0.32        0.44
QS    15.87   18.11   17.80       17.14
QT    48.02   51.45   51.28       50.11
QC    63.29   59.37   63.73       63.36
Q8    65.59   69.31   67.69       66.64

Prediction accuracy of RaptorXss8 on the CB513, CASP9, Manesh215, and Carugo338 benchmarks. Prediction accuracies for 3-10 helices (G), π-helices (I), β-bridges (B), and bends (S) are particularly low because these states occur infrequently.

Distribution of 3-10 helices (G), α-helices (H), π-helices (I), β-sheets (E), β-bridges (B), turns (T), bends (S), and coils (C) in Cull5547


Secondary Structure Prediction - Template-based

Most current methods for secondary structure predictions are ab initio

However, many protein sequences have some degree of similarity among themselves

Latest version of Porter (in 3-state):
• Improvement in prediction accuracy with >30% sequence similarity
• Decline in efficiency with low sequence similarity (<20%)


Template-based C8-SCORPION

Input encoding: sequence & evolutionary info (PSSM) + structure info from templates or context-based scores

Input encoding → Predictor → Structural feature (state) of Ri

C8-SCORPION is an extension of our previous method, C3-SCORPION
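One way to picture an input encoding of this kind is a sliding window that concatenates, for each residue and its neighbors, the PSSM row with a one-hot vector of the template's state. The sketch below is an illustration only: the window size, the feature layout, the `-` marker for residues without template coverage, and all function names are assumptions, not the encoding actually used by C8-SCORPION.

```python
STATES = "GHIEBTSC"  # the eight state codes used throughout the slides


def one_hot_state(label: str) -> list:
    """One-hot vector over the 8 states; all zeros for '-' (no template info)."""
    return [1.0 if label == state else 0.0 for state in STATES]


def encode_windows(pssm, template_ss, half_window=7):
    """Build one feature vector per residue from a symmetric window.

    pssm        -- list of per-residue score rows (normally 20 values each)
    template_ss -- string of template states, '-' where no template aligns
    half_window -- ASSUMED neighbors on each side; positions past the
                   sequence ends are zero-padded
    """
    if len(pssm) != len(template_ss):
        raise ValueError("pssm and template_ss must cover the same residues")
    per_residue = [list(row) + one_hot_state(label)
                   for row, label in zip(pssm, template_ss)]
    width = len(per_residue[0]) if per_residue else 0
    padding = [0.0] * width
    encoded = []
    for i in range(len(per_residue)):
        window = []
        for j in range(i - half_window, i + half_window + 1):
            inside = 0 <= j < len(per_residue)
            window.extend(per_residue[j] if inside else padding)
        encoded.append(window)
    return encoded
```

Each output vector has `(2 * half_window + 1) * (pssm_columns + 8)` entries and would be fed to the predictor for the residue at the window center.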


Materials & Methods

Data Sets
• Cull5547: from the PISCES server, at most 25% sequence identity, 2.0 Å resolution
• Benchmarks: CB513, CASP9, Manesh215, Carugo338

Template Construction

Encoding

Context-based scores: potential scores, based on statistics derived from the protein datasets, that estimate the favorability of residues adopting specific structural states within their amino acid environment.
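To make the idea of a statistics-based score concrete, here is a deliberately simplified sketch: a log-odds score for a single residue type adopting a state, estimated from counts in a labeled dataset. It ignores the neighboring residues entirely, so it is not the context-based potential described above; the function name, the pseudocount, and the sign convention (positive means favorable) are all assumptions made for illustration.

```python
import math
from collections import Counter


def singlet_log_odds(sequences, structures, pseudocount=1.0):
    """Return {(amino_acid, state): log(P(state | amino_acid) / P(state))}.

    ILLUSTRATION ONLY: a single-residue statistic. A context-based score
    would also condition on the surrounding amino acids.
    """
    pair_counts, aa_counts, state_counts = Counter(), Counter(), Counter()
    total = 0
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        for aa, state in zip(seq, ss):
            pair_counts[(aa, state)] += 1
            aa_counts[aa] += 1
            state_counts[state] += 1
            total += 1
    n_states = len(state_counts)
    scores = {}
    for aa in aa_counts:
        for state in state_counts:
            # Pseudocounts keep unseen (aa, state) pairs from giving log(0).
            p_state_given_aa = ((pair_counts[(aa, state)] + pseudocount)
                                / (aa_counts[aa] + pseudocount * n_states))
            p_state = state_counts[state] / total
            scores[(aa, state)] = math.log(p_state_given_aa / p_state)
    return scores
```

A positive score means the residue type is seen in that state more often than the state's background frequency would suggest; a negative score means less often.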


Materials & Methods -cont.

Two phases of template-based 8-state secondary structure prediction (architecture and encoding)

Neural Network Model


Results & Discussions

           Q8      SOV8
G          43.99   47.96
H          92.48   95.19
I           0.00    0.00
E          88.30   92.77
B          27.86   27.57
S          43.46   45.32
T          64.18   66.64
C          75.51   71.45
Overall    78.85   80.10

7-fold cross-validation accuracy in template-based 8-state prediction

             Q8                            SOV8
             No Template   With Template   No Template   With Template
CB513        67.22         79.39           67.66         80.64
CASP9        71.54         76.36           73.47         78.15
Manesh215    69.71         81.10           70.79         82.99
Carugo338    68.44         80.39           69.50         81.95

Comparison between 8-state predictions with and without template on Benchmarks
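The size of the template effect can be read off the table as a simple difference per benchmark. The short sketch below does that arithmetic; the numbers are copied from the table, while the variable and function names are hypothetical.

```python
# (no template, with template) accuracies copied from the benchmark table.
BENCHMARK_Q8 = {
    "CB513": (67.22, 79.39),
    "CASP9": (71.54, 76.36),
    "Manesh215": (69.71, 81.10),
    "Carugo338": (68.44, 80.39),
}
BENCHMARK_SOV8 = {
    "CB513": (67.66, 80.64),
    "CASP9": (73.47, 78.15),
    "Manesh215": (70.79, 82.99),
    "Carugo338": (69.50, 81.95),
}


def template_gain(table):
    """Percentage-point gain from using templates, per benchmark."""
    return {name: round(with_t - without_t, 2)
            for name, (without_t, with_t) in table.items()}


if __name__ == "__main__":
    for label, table in (("Q8", BENCHMARK_Q8), ("SOV8", BENCHMARK_SOV8)):
        for name, gain in template_gain(table).items():
            print(f"{label:5s} {name:10s} +{gain:.2f}")
```

The Q8 gain is about 11 to 12 percentage points on CB513, Manesh215, and Carugo338, and 4.82 points on CASP9.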

Distribution of 8-state secondary structure prediction accuracy (Q8) as a function of sequence similarity; the first group of bars corresponds to template-less predictions


Results & Discussions -cont.

               (0, 10]   (10, 20]   (20, 40]   (40, 70]   (70, 95]
# of chains    4,426     4,215      3,204      1,437      1,133
QH             92.05     92.70      93.60      94.97      95.94
QG             22.07     23.93      35.09      55.03      69.44
QI              0.00      0.00       0.00       0.00       0.00
QE             83.37     84.53      86.59      90.16      93.61
QB              1.53      3.59       7.24      22.30      44.26
QT             53.35     55.34      60.89      69.66      77.06
QS             22.83     26.41      35.19      54.09      73.40
QC             66.55     67.84      71.81      79.56      86.80
Q8             71.33     73.01      76.29      82.11      88.01

Comparison of 7-fold cross validation prediction accuracies in eight states when templates with different sequence similarities are used
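The column headers in this table are half-open intervals of template sequence similarity (in percent). A small sketch of how a similarity value could be assigned to one of these columns is shown below; the bin edges come from the table, while the function name and the choice to return `None` outside (0, 95] are assumptions.

```python
import bisect

# Bin edges taken from the table header: (0, 10], (10, 20], ..., (70, 95].
BIN_EDGES = [0, 10, 20, 40, 70, 95]


def similarity_bin(similarity_percent):
    """Return the '(lo, hi]' column label for a similarity value.

    Returns None for values outside (0, 95], which the table does not cover.
    """
    if not (BIN_EDGES[0] < similarity_percent <= BIN_EDGES[-1]):
        return None
    # bisect_left places a value equal to an edge in the bin that ends at
    # that edge, which matches the right-closed intervals of the table.
    idx = bisect.bisect_left(BIN_EDGES, similarity_percent)
    return f"({BIN_EDGES[idx - 1]}, {BIN_EDGES[idx]}]"
```

For example, a template at exactly 20% similarity falls in the (10, 20] column, while one at 20.5% falls in (20, 40].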


Results & Discussions -cont.

Comparison between template-less and template-based predictions on 1BTN chain A


Working with C8-Scorpion
• Input title
• Input your sequence
• Input your e-mail
• Submit, then wait for the results...

“C8-Scorpion” available at: http://hpcr.cs.odu.edu/c8scorpion


Working with C8-Scorpion
• Check your e-mail
• Click the link provided
• The results are displayed


Summary

The effectiveness of using structural information in templates has been demonstrated in our computational results in 7-fold cross validation as well as on benchmarks, where enhancements of prediction accuracies are observed.

Overall, 78.85% Q8 accuracy and 80.10% SOV8 accuracy are achieved in 7-fold cross validation.

More importantly, when good templates are available, the prediction accuracies of less frequent secondary structure states, such as 3-10 helices, turns, and bends, are greatly improved, making them suitable for practical use in applications.

A webserver (C8-Scorpion) implementing template-less 8-state secondary structure prediction is currently available at http://hpcr.cs.odu.edu/c8scorpion. The integration of template-based prediction into the C8-Scorpion webserver is currently under development.


Acknowledgement

This work is partially supported by NSF grant 1066471 and ODU 2013 Multidisciplinary Seed grant


Questions?

Thank You