Post on 18-Dec-2015

High Throughput Computing and Protein Structure

Stephen E. Hamby

Overview

• Introduction To Protein Structure

• Dihedral Angles

• Previous Work

• Support Vector Regression

• Optimisation

• Prediction

• Results

• Conclusions

Introduction To Protein Structure

Molecules with massive biological importance

Structure determination gives insight into:

• Function, dynamics, potential drug targets.

Experimental structure determination is:

• Expensive, slow, difficult.

Introduction To Protein Structure

Primary Structure:

Order of Amino Acids

Secondary Structure:

Building blocks

Tertiary Structure:

Complete 3D Structure

Introduction To Protein Structure

Secondary Structure Types

α-helix

β-sheet

Random Coil

Dihedral Angles

Finding the secondary structure of a protein is a step towards finding its complete structure

Predicting dihedral angles can help us to get the secondary structure
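As a reminder of what is being predicted, the sketch below computes a signed dihedral angle from four atom positions. For a residue i, the backbone angle phi is defined by the atoms C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The function name and the tuple representation of coordinates are illustrative choices, not something taken from the talk.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points given as (x, y, z)."""
    b0 = _sub(p1, p0)  # bond p0 -> p1
    b1 = _sub(p2, p1)  # central bond p1 -> p2
    b2 = _sub(p3, p2)  # bond p2 -> p3
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    b1_len = math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    y = b1_len * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Four points chosen so that the two planes are perpendicular.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Using atan2 rather than acos keeps the sign of the angle, which matters because phi and psi are signed quantities.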

How Can We Predict Dihedral Angles?

Previous work

Destruct

Multiple neural networks.

Iterative method.

Predicts secondary structure and dihedral angles.

Previous work

Real Spine

Twin neural networks give a consensus prediction.

Predicts dihedral angles from various amino acid properties, amino acid composition, and predicted structure.

Support Vector Regression

Kernel machine learning maps the data into a higher-dimensional feature space, where a linear relationship can be found.

Support Vector Regression

Attempts to fit a linear function to the data in a high dimensional feature space

Accurate but…

Slow, needs optimisation, black box.
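A minimal sketch of the two ingredients of support vector regression: the epsilon-insensitive loss, which ignores errors smaller than epsilon, and the prediction function, which is a weighted sum of kernel evaluations against the support vectors plus a bias. The support vectors, coefficients, and parameter values here are hypothetical; in practice they come out of training, which is not shown.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors inside the epsilon tube cost nothing; larger errors cost linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_predict(x, support_vectors, coefficients, bias, gamma=0.5):
    """Evaluate f(x) = sum_i coef_i * K(sv_i, x) + bias with a Gaussian kernel.

    coefficients are the signed dual weights found during training
    (hypothetical values in this sketch).
    """
    total = bias
    for sv, coef in zip(support_vectors, coefficients):
        squared_distance = sum((a - b) ** 2 for a, b in zip(sv, x))
        total += coef * math.exp(-gamma * squared_distance)
    return total


# Hypothetical two-support-vector model evaluated at one query point.
prediction = svr_predict((0.0, 0.0),
                         support_vectors=[(0.0, 0.0), (1.0, 1.0)],
                         coefficients=[2.0, -1.0],
                         bias=0.5)
```

The function is linear in the kernel feature space, but because the features are never written out explicitly the fitted model is hard to interpret, which is the "black box" drawback noted above.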

Support Vector Regression

Kernel Choice

We tested the various kernels available through the PyML package.

These are the linear, polynomial, and Gaussian kernels.

We tested them using the CASP4 dataset.

Gaussian kernel produced the best results.
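For reference, the three kernel families compared above can be written as plain functions. This is a sketch in standard-library Python, not the PyML interface; the parameter names (degree, coef0, gamma) and default values are assumptions, and PyML may parameterise its kernels differently.

```python
import math


def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree"""
    return (linear_kernel(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2); equals 1 when x == y and decays with distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


u, v = (1.0, 2.0), (3.0, 4.0)
values = {
    "linear": linear_kernel(u, v),
    "polynomial": polynomial_kernel(u, v),
    "gaussian": gaussian_kernel(u, v),
}
```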

Optimisation

Three interdependent parameters

Grid-based optimisation on the CASP4 dataset

Around 10,000 three-hour jobs.

Run in blocks of 10 on Jupiter

Accuracy assessed using the Pearson correlation coefficient
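The grid search can be sketched as enumerating every combination of the three parameters, splitting the resulting job list into blocks of 10 for submission, and keeping the combination with the highest Pearson correlation. The talk does not name the three parameters; C, epsilon, and gamma are assumed here because they are the usual ones for Gaussian-kernel support vector regression, and the grid values are made up for illustration.

```python
from itertools import product


def build_grid(c_values, epsilon_values, gamma_values):
    """Every (C, epsilon, gamma) combination; each one is an independent job."""
    return list(product(c_values, epsilon_values, gamma_values))


def in_blocks(jobs, block_size=10):
    """Yield consecutive blocks of jobs, e.g. for submission to a cluster queue."""
    for start in range(0, len(jobs), block_size):
        yield jobs[start:start + block_size]


def best_setting(scores):
    """scores maps a parameter tuple to its Pearson r; return the best tuple."""
    return max(scores, key=scores.get)


# Hypothetical small grid: 5 * 3 * 4 = 60 jobs, i.e. 6 blocks of 10.
c_values = [2.0 ** k for k in range(-2, 3)]
epsilon_values = [0.01, 0.1, 1.0]
gamma_values = [2.0 ** k for k in range(-3, 1)]

grid = build_grid(c_values, epsilon_values, gamma_values)
blocks = list(in_blocks(grid))
```

Because the parameters interact, they are searched jointly rather than one at a time, which is why the number of jobs grows as the product of the grid sizes. Each job is independent of the others, which suits a high-throughput queue.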

Prediction

Support vector machine using a Gaussian kernel and optimal parameters.

Training on the CB513 dataset.

Tested by 10-fold cross-validation.

CASP4 used as a test set.
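A minimal sketch of how a 10-fold split can be generated. Every item is used for testing exactly once and for training in the other nine folds. Splitting at the level of whole proteins and the fixed random seed are assumptions of this sketch; the item count of 513 reflects the number of sequences that gives CB513 its name.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(513, k=10))
fold_sizes = [len(test) for _, test in splits]
```

The model is trained ten times, once per split, and the predictions on the ten held-out folds are pooled to give the cross-validated score.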

Results

Pearson correlation coefficient by method:

• Destruct: 0.42

• Real Spine: 0.62

• SVM Prediction: 0.57

Results measured by cross-validation.

The CASP4 test set gives a Pearson correlation coefficient of 0.56.
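The Pearson correlation coefficient used for these scores can be computed as below. This is a plain-Python sketch with illustrative names. It treats angles as ordinary numbers, and how the talk handled the wrap-around at ±180° is not stated.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical observed and predicted angles in degrees.
observed = [-60.0, -120.0, -65.0, -140.0, 55.0]
predicted = [-70.0, -100.0, -60.0, -130.0, 20.0]
r = pearson_r(observed, predicted)
```

A value of 1 means the predictions follow the observed angles perfectly up to a linear rescaling, and 0 means no linear relationship.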

Results

Using Secondary structure predictions made by cascade correlation neural networks:

Dihedral angle prediction assisted by predicted secondary structure gives a Pearson correlation coefficient of 0.582.

Subsequent iterations should lead to better predictions of both structure and dihedral angles.

What Next?

Using further iterations to improve accuracy.

Current method is a black box.

Can we use a program like Trepan to get some definite rules about secondary structure?

Conclusions

• Dihedral Angles define protein secondary structure

• Using Support Vector Machines it is possible to predict dihedral angles

• We (hopefully!) can use predicted dihedral angles to improve the accuracy of secondary structure prediction.
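To illustrate the first conclusion, the sketch below assigns a (phi, psi) pair to a rough secondary structure region. Helical residues cluster near phi = -60°, psi = -45° and sheet residues near phi = -120°, psi = +130°. The rectangular boundaries used here are assumptions chosen for illustration, not values from the talk or from any standard assignment program.

```python
def rough_region(phi, psi):
    """Coarse secondary structure guess from backbone dihedrals in degrees.

    The box boundaries are illustrative assumptions only.
    """
    if -160.0 <= phi <= -20.0 and -120.0 <= psi <= 50.0:
        return "helix-like"
    if -180.0 <= phi <= -45.0 and (psi >= 90.0 or psi <= -150.0):
        return "sheet-like"
    return "other"


examples = {
    "typical helix": rough_region(-60.0, -45.0),
    "typical sheet": rough_region(-120.0, 130.0),
    "left-handed region": rough_region(60.0, 60.0),
}
```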

Acknowledgements

Jonathan Hirst

Hirst group members

BBSRC

The University of Nottingham