Development of classification methods to predict new 14-3-3-binding proteins and phosphopeptides



Fábio M. Marques Madeira

Supervisor: Professor Geoff Barton

7th May 2013


14-3-3s dock onto pairs of tandem phosphoSer/Thr sites

[Figure: 14-3-3 docking onto two phosphorylated sites (P); labels: Kinase 1, Kinase 2]

Hundreds of structurally and functionally diverse targets

2R-ohnologue families


The binding specificity of 14-3-3s is determined by overall steric fit and the sequence flanking the phosphoSer/Thr site


Mode I: RSX(pS/T)XP

Mode II: RX(F/Y)X(pS)XP

Mode III: C-terminal X(pS/T)

Johnson et al. (2011) Molecular & Cellular Proteomics 10, M110.005751.
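Modes I and II are fixed-spacing patterns, so they translate directly into regular expressions. The sketch below assumes a hypothetical encoding in which phosphorylated Ser/Thr are written in lowercase ('s', 't'); it is an illustration of motif matching, not the tool used in this work. Mode III (C-terminal) is left out of the sketch.

```python
import re

# Hypothetical encoding (not from the talk): phosphorylated Ser/Thr are written
# in lowercase ('s', 't'); every other residue is an uppercase one-letter code.
MOTIFS = {
    "Mode I": "RS.[st].P",    # RSX(pS/T)XP
    "Mode II": "R.[FY].s.P",  # RX(F/Y)X(pS)XP
}


def scan_motifs(sequence: str) -> list[tuple[str, int, str]]:
    """Return (mode, 0-based start index, matched text) for every Mode I/II hit."""
    hits = []
    for mode, pattern in MOTIFS.items():
        # A zero-width lookahead lets overlapping occurrences be reported too.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((mode, match.start(), match.group(1)))
    return hits


print(scan_motifs("AARSAsAPGG"))  # [('Mode I', 2, 'RSAsAP')]
print(scan_motifs("KRAFGsLPE"))   # [('Mode II', 1, 'RAFGsLP')]
```

Because uppercase 'S' does not match `[st]`, an unphosphorylated RSXSXP stretch is not reported.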


ANIA: ANnotation and Integrated Analysis of the 14-3-3 interactome


Development and evaluation of three new classifiers


Position-specific scoring matrix (PSSM)

Artificial Neural Network (ANN)

Support Vector Machines (SVM)
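Of the three, the PSSM is the simplest to illustrate: count residues at each window position across the positive examples, add pseudocounts, convert to log-odds against a background, and score a new window by summing its per-position log-odds. This is a generic log-odds PSSM sketch, not necessarily the exact formulation used here; the uniform background and the pseudocount of 1 are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(windows: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM (base 2) from equal-length windows, assuming a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform residue frequencies
    pssm = []
    for pos in range(len(windows[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for window in windows:
            aa = window[pos].upper()  # phosphorylated 's'/'t' counted as S/T
            if aa in counts:          # padding such as '-' is ignored
                counts[aa] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((c / total) / background) for aa, c in counts.items()})
    return pssm


def score_window(pssm: list[dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds; symbols absent from the matrix contribute 0."""
    return sum(column.get(aa.upper(), 0.0) for column, aa in zip(pssm, window))


# Usage with three made-up positive windows (illustrative data only).
pssm = build_pssm(["RSAsAP", "RSLsEP", "RSKsSP"])
print(score_window(pssm, "RSGsAP") > score_window(pssm, "GGGsGG"))  # True
```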


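Defining positive and negative examples for training and testing

Training datasets:

Previous: 76 positives, 76 negatives

Current: 273 positives, 93 negatives

Likely negatives: 1,192

[Figure: protein schematic with pS/T sites between the N- and C-termini; label: 72 proteins]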


Blind datasets:

Previous: 17 positives, 17 negatives

Current: 38 positives, 38 negatives

Sequence redundancy thresholds: 60%, 50% and 40%

Different motif regions/lengths (positions relative to the pS/T site): [-11:11], [-9:9], [-7:7], [-5:5] and [-3:3]
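The motif regions are windows of residues around the phosphosite, and the redundancy thresholds cap pairwise sequence identity. The sketch below extracts a (possibly asymmetric) window and then greedily drops windows that are too similar to one already kept. The talk does not say how redundancy was measured; ungapped identity between equal-length windows is an assumption, and full-length sequence clustering would be an equally plausible reading.

```python
def extract_window(sequence: str, site: int, upstream: int, downstream: int,
                   pad: str = "-") -> str:
    """Residues from site-upstream to site+downstream (0-based site), padded at the termini."""
    padded = pad * upstream + sequence + pad * downstream
    # After left padding the site sits at index site + upstream,
    # so the window [-upstream, +downstream] starts at index site.
    return padded[site:site + upstream + downstream + 1]


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length windows, in percent."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def filter_redundant(windows: list[str], threshold: float) -> list[str]:
    """Keep a window only if its identity to every kept window is below threshold."""
    kept: list[str] = []
    for window in windows:
        if all(percent_identity(window, other) < threshold for other in kept):
            kept.append(window)
    return kept


print(extract_window("MKRSAsAPGG", 5, 5, 5))  # [-5:5] -> MKRSAsAPGG-
print(extract_window("MKRSAsAPGG", 5, 6, 3))  # [-6:3] -> -MKRSAsAPG
print(filter_redundant(["RSAsAP", "RSAsAG", "KLGtEE"], 60))  # ['RSAsAP', 'KLGtEE']
```

The greedy filter is order-dependent: whichever of two similar windows comes first is the one retained.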


Development and evaluation of three new classifiers

The area under the ROC curve (AUC) was estimated by jackknife (leave-one-out) testing
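Reading "jackknife" as leave-one-out cross-validation, its usual meaning in this literature, each example is scored by a model trained on all the other examples, and the pooled scores give one ROC AUC. The sketch computes AUC as the probability that a positive outscores a negative (ties count one half), which equals the area under the empirical ROC curve. The `train`/`score` callables and the toy nearest-mean model are hypothetical placeholders for any of the three classifiers.

```python
from typing import Callable, Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def jackknife_auc(examples: list, labels: list[int],
                  train: Callable, score: Callable) -> float:
    """Leave-one-out: retrain without example i, score example i, pool, then AUC."""
    pos, neg = [], []
    for i, (example, label) in enumerate(zip(examples, labels)):
        model = train(examples[:i] + examples[i + 1:], labels[:i] + labels[i + 1:])
        (pos if label else neg).append(score(model, example))
    return roc_auc(pos, neg)


# Hypothetical toy model: the "model" is the mean of the positive training values,
# and an example scores higher the closer it lies to that mean.
def mean_of_positives(xs: list[float], ys: list[int]) -> float:
    return sum(x for x, y in zip(xs, ys) if y) / sum(ys)


def closeness(model: float, x: float) -> float:
    return -abs(x - model)


print(jackknife_auc([1.0, 1.2, 0.9, 5.0, 6.0], [1, 1, 1, 0, 0],
                    mean_of_positives, closeness))  # 1.0
```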


Development and evaluation of three new classifiers


Q - Accuracy

MCC - Matthews Correlation Coefficient
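Both measures come from the confusion matrix: Q = (TP + TN) / (TP + TN + FP + FN), and MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). A minimal sketch follows; returning 0 when a marginal total is empty is a common convention, assumed here rather than taken from the talk.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Q: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC in [-1, 1]; 0.0 when any row or column of the confusion matrix is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(accuracy(8, 7, 3, 2))               # 0.75
print(round(matthews_cc(8, 7, 3, 2), 3))  # 0.503
```

Unlike Q, the MCC stays informative when positives and negatives are unbalanced, as in the current training set (273 positives vs 93 negatives).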


Amino acid alphabet reduction reduces accuracy


Li et al. (2003); Livingston and Barton (1993)

Grouping the 20 amino acids into 10 physicochemical classes

Overall, alphabet reduction led to lower classification performance, suggesting that some sequence features that influence 14-3-3 binding were lost in the reduction.
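Mechanically, alphabet reduction rewrites every residue as the label of its class before training. The 10-class grouping below is HYPOTHETICAL and only illustrates the mechanics; it is not the Li et al. (2003) scheme used in the talk. It shows how information can be lost: if R and K share a class, as they do here, the reduced sequence can no longer tell the Arg of RSX(pS/T)XP from a Lys.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Illustrative 10-class partition; HYPOTHETICAL, not the grouping from the talk.
HYPOTHETICAL_GROUPS = ["AG", "ST", "C", "DE", "NQ", "KR", "H", "ILMV", "FWY", "P"]


def make_reduction_map(groups: list[str]) -> dict[str, str]:
    """Map each residue to its group's first letter, checking the groups form a partition."""
    mapping: dict[str, str] = {}
    for group in groups:
        for aa in group:
            if aa in mapping:
                raise ValueError(f"residue {aa} assigned to two groups")
            mapping[aa] = group[0]
    if set(mapping) != AMINO_ACIDS:
        raise ValueError("groups must cover exactly the 20 standard amino acids")
    return mapping


def reduce_sequence(sequence: str, mapping: dict[str, str]) -> str:
    """Rewrite a sequence in the reduced alphabet; other symbols (e.g. 's', '-') are kept."""
    return "".join(mapping.get(aa, aa) for aa in sequence)


reduction = make_reduction_map(HYPOTHETICAL_GROUPS)
print(reduce_sequence("RSAsAP", reduction))  # KSAsAP
```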


Protein secondary structure, disorder and conservation do not improve the performance of the ANN


Sequence conservation

Protein secondary structure by Jpred

Protein disorder by IUPred, DisEMBL and GlobPlot

P: positives; N: negatives (true + likely negatives); L: likely negatives only; R: random negatives
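The talk does not describe the ANN input encoding. A common choice, assumed here, is a one-hot vector per window position with any extra per-residue features (for example a disorder score from IUPred, or a conservation score) appended after each position's one-hot block. Those feature values would have to be computed beforehand by the respective tools; the numbers in the usage line are made up.

```python
from typing import Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_window(window: str, extra: Optional[list[list[float]]] = None) -> list[float]:
    """One-hot encode a window (21 slots per position: 20 residues + gap/other),
    appending that position's extra features, if given, after each one-hot block."""
    vector: list[float] = []
    for i, aa in enumerate(window.upper()):
        one_hot = [0.0] * (len(AMINO_ACIDS) + 1)
        idx = AMINO_ACIDS.find(aa)
        one_hot[idx if idx >= 0 else len(AMINO_ACIDS)] = 1.0  # last slot: gap/other
        vector.extend(one_hot)
        if extra is not None:
            vector.extend(extra[i])
    return vector


# Two made-up features per position, e.g. (disorder, conservation).
features = encode_window("RS", [[0.7, 0.2], [0.6, 0.9]])
print(len(features))  # 46 = 2 positions x (21 one-hot + 2 extra)
```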


Blind testing shows that the PSSM is the best overall predictor

80% Overall Accuracy


Prediction of new 14-3-3-binding sites in the human proteome using the PSSM
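A proteome-wide prediction amounts to scoring a window around every candidate site and ranking the hits. The sketch below assumes every Ser/Thr is a candidate (the talk may instead have restricted candidates to known phosphosites); `score_window` stands in for the trained PSSM scorer, and the threshold and the toy scorer are hypothetical.

```python
from typing import Callable


def scan_proteome(proteins: dict[str, str], score_window: Callable[[str], float],
                  flank: int = 5, threshold: float = 0.0) -> list[tuple[float, str, int]]:
    """Score a [-flank:flank] window around every Ser/Thr; return hits above threshold, best first."""
    hits = []
    for protein_id, sequence in proteins.items():
        padded = "-" * flank + sequence + "-" * flank
        for pos, aa in enumerate(sequence):
            if aa not in "ST":
                continue
            window = padded[pos:pos + 2 * flank + 1]  # the site sits at window[flank]
            score = score_window(window)
            if score > threshold:
                hits.append((score, protein_id, pos + 1))  # 1-based residue number
    return sorted(hits, reverse=True)


# Hypothetical toy scorer: rewards an Arg at position -3 (window index flank - 3 = 2).
def toy_score(window: str) -> float:
    return 1.0 if window[2] == "R" else 0.0


print(scan_proteome({"P1": "MRSASAPK", "P2": "MSTK"}, toy_score))  # [(1.0, 'P1', 5)]
```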


Scansite includes a set of predictions based on the mode I 14-3-3-binding motif: RSX(pS/T)XP

The PSSM predictor outperforms Scansite in terms of accuracy

[Figure: comparison of PSSM and Scansite]


Conclusions

New strategy to map negative datasets

Improved performance (AUC from ~0.80 to 0.88) and 80% accuracy for the PSSM model (60% redundancy threshold, [-5:5] motif region)

Large-scale prediction of the human 14-3-3-binding proteome

The PSSM classifier outperforms Scansite in terms of accuracy


Future work

1. Train and test the classifiers using non-symmetrical motif regions, e.g. [-6:3]

2. Investigate other machine learning algorithms, such as Bayesian classifiers

3. Use the PSSM classifier to predict the 14-3-3-binding proteome of model organisms such as Arabidopsis thaliana

4. Integrate the predictions into ANIA and investigate whether the candidate sites are lynchpin sites conserved across 2R-ohnologue family members


Acknowledgements

Geoff Barton

Chris Cole

All members of the Computational Biology group

Carol MacKintosh and Michele Tinti