TMpro & Comparison of Algorithms for “Protein Stability Prediction Upon Mutations”
Madhavi Ganapathiraju, Graduate student
Carnegie Mellon University
Overview
• TMpro evaluations on PDBTM, TMPDB and MPTOPO are complete
• Additional inputs to TMpro are being studied
  – Yule values (not successful)
  – Evolutionary profile (promising)
• TMpro website has been completed
• Evaluation of algorithms to predict protein stability changes upon mutations
Part 1: TMpro
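TMpro Evaluations
Segment-level measures: Qok, F score, recall, precision. Residue-level measure: Q2.

| Data set | Method | Qok | Segment F score | Segment recall | Segment precision | Q2 | Misclassified as soluble |
|---|---|---|---|---|---|---|---|
| MPtopo (101 TM proteins) | 2a TMHMM | 66 | 91 | 89 | 94 | 84 | 5 |
| MPtopo (101 TM proteins) | 2b TMpro NN | 60 | 93 | 92 | 94 | 79 | 0 |
| PDBTM (191 TM proteins) | 3a TMHMM | 68 | 90 | 89 | 90 | 84 | 13 |
| PDBTM (191 TM proteins) | 3b TMpro NN | 57 | 93 | 93 | 93 | 81 | 2 |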
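The relation between the segment recall, precision and F score columns can be checked with a short sketch. It assumes the F score is the usual harmonic mean of recall and precision, which the slide does not state; the function name `f_score` is only illustrative. Under that assumption the computed values agree with the listed F scores to within the rounding of the inputs.

```python
def f_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (assumed definition of the F score)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Segment recall and precision as listed in the evaluation table.
rows = {
    "MPtopo TMHMM": (89, 94),
    "MPtopo TMpro NN": (92, 94),
    "PDBTM TMHMM": (89, 90),
    "PDBTM TMpro NN": (93, 93),
}

for name, (recall, precision) in rows.items():
    print(f"{name}: F = {f_score(recall, precision):.1f}")
```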
TMpro web server is fully functional!
Competition for TMpro logo
Prize: see your logo on the web!
Attempts to overcome confusion with globular soluble helices (1)
• Yule value features to be added
  – Yule value features that discriminate amino acid neighbor propensities between TM and non-TM helices were computed earlier
  – Tried to add these features as input to the NN predictor, but could not achieve quantitative improvement
  – I will discuss this in the future, when I have results to present
Attempts to overcome confusion with globular soluble helices (2)
• Evolutionary profile information
  – Knowledge of a protein's evolutionary profile is known to improve prediction accuracy to a great extent
• TMpro is capable of predicting TM segments without requiring knowledge of the profile
  – Useful when you cannot extract sequence alignments from known proteins
• But where the profile is known, we would like to use that additional information
Profile generation
• Get multiple sequence alignments
• Compute position specific scoring matrix for each protein
  – 21 rows (20 amino acids, and 1 row for gaps)
• Profile is generated for each protein in the training and test sets
Those of you who have worked with evolutionary analysis before, please give feedback
PSSM(i,j) = log( C(i,j) / total counts at position j )
            log( C(i,j) / unigram count of i in the protein )
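A minimal sketch of the counting step, assuming C(i,j) is the number of times symbol i occurs at alignment position j. It computes only the first expression, log(C(i,j) / total counts at position j); the unigram-normalised expression is left out because the slide does not make clear how the two are combined. The pseudocount is an added assumption to avoid log(0), and the alphabet order, function names and toy alignment are illustrative only.

```python
import math

# 20 amino acids plus one gap symbol, giving the 21 rows of the profile.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def column_counts(alignment):
    """Return counts[i][j]: occurrences of ALPHABET[i] at alignment column j."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    counts = [[0] * length for _ in ALPHABET]
    for seq in alignment:
        for j, symbol in enumerate(seq):
            row = ALPHABET.find(symbol.upper())
            if row >= 0:  # symbols outside the 21-letter alphabet are ignored
                counts[row][j] += 1
    return counts


def log_column_frequency(counts, pseudocount=1.0):
    """log(C(i,j) / total counts at position j), smoothed by a pseudocount (assumption)."""
    n_rows, n_cols = len(counts), len(counts[0])
    profile = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        total = sum(counts[i][j] for i in range(n_rows)) + pseudocount * n_rows
        for i in range(n_rows):
            profile[i][j] = math.log((counts[i][j] + pseudocount) / total)
    return profile


alignment = ["MKV-L", "MRVAL", "MKI-L"]  # toy alignment, not real data
counts = column_counts(alignment)
profile = log_column_frequency(counts)
```

Each column of `profile` then holds 21 values, one per row of the matrix described above.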
Doubts
• We have labels for training sequences
  – But when the original sequence has gaps in the alignment, how should the labels at the gap positions be interpreted?

              --n------n----n------nnn-----n------n-----------------M-----
2a65      369 --D------E----L------KLS-----R------K-----------------H----- 377
2A65_A    369 --.------.----.------...-----.------.-----------------.----- 377
AAC07817  369 --.------.----.------...-----.------.-----------------.----- 377
YP_001956 364 --E------S----F------G.K-----.------.-----------------T----- 372

              -M------M------M------M-------M----------M---------MM-------
2a65      378 -A------V------L------W-------T----------A---------AI------- 385
2A65_A    378 -.------.------.------.-------.----------.---------..------- 385
AAC07817  378 -.------.------.------.-------.----------.---------..------- 385
YP_001956 373 -S------C------.-----------------------------------IL------- 377

Even TM regions contain gaps, as shown above
What labels should be assigned to gaps?
Doubts
• When nothing is shown (neither a gap nor an aligned residue) for some sequences, I am counting those positions as gaps

XP_659910 47 L-......K.----------...KAP----RSNQV.-..FVAGTMGLASAVGA.AT 86
AAW43619 100 .....A..A-----------KNP----NTTRNV-..FMVGALGALGASSV.ST 136
CAB59195 59 ----.N.RP.-A..VIGSARFAYMAWTRVA 83
XP_466001 107 SKRA.-A.FVLSGGRFIYASLLRLL 130
AAA20832 103 SKRA.-A.FVLTGGRFVYASLVRLL 126

What should be done about the missing segment information for some sequences?
Using profile for prediction
Studied independently of TMpro
Neural network with 21 input, 21 hidden and 1 output neurons
[Figure: predicted output (non-membrane = 0, membrane = 1) against residue number, with the experimentally observed locations of TM helices marked]
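The 21-21-1 architecture can be sketched as a plain forward pass. Everything beyond the layer sizes is an assumption: the tanh and sigmoid activations, the random untrained weights, and feeding one 21-value profile column per residue are illustrative choices, not the settings used for the plots.

```python
import math
import random

random.seed(0)
N_IN, N_HIDDEN = 21, 21  # layer sizes from the slide; one output neuron follows

# Untrained placeholder weights; a real predictor would learn these from labelled data.
W1 = [[random.gauss(0, 0.1) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
b1 = [0.0] * N_HIDDEN
W2 = [random.gauss(0, 0.1) for _ in range(N_HIDDEN)]
b2 = 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_residue(column):
    """Map one 21-value profile column to a score in (0, 1)."""
    if len(column) != N_IN:
        raise ValueError("expected 21 input values")
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, column)) + b)
        for row, b in zip(W1, b1)
    ]
    return sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)


# Placeholder input: 50 residues, each described by 21 made-up values.
protein = [[random.random() for _ in range(N_IN)] for _ in range(50)]
scores = [predict_residue(column) for column in protein]
```

With trained weights, scores near 1 would be read as membrane and scores near 0 as non-membrane, matching the axis of the plot.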
Another output
NN architecture needs to be modified
But instead I post-processed the neural network output
Computed wavelet transform
Mexican hat wavelet, scale = 10
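A minimal sketch of this kind of post-processing, assuming the transform is a correlation of the per-residue network output with a Mexican hat (Ricker) wavelet at a single scale of 10. The normalisation, the zero padding at the sequence ends, the kernel half-width and the toy input are assumptions, not the settings behind the plots.

```python
import math


def mexican_hat(t):
    """Unnormalised Mexican hat (Ricker) wavelet: (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def wavelet_transform(signal, scale=10.0, half_width=None):
    """Correlate the signal with a Mexican hat wavelet at one scale."""
    if half_width is None:
        half_width = int(5 * scale)  # the wavelet is negligible beyond about 5 scales
    offsets = range(-half_width, half_width + 1)
    kernel = [mexican_hat(k / scale) / math.sqrt(scale) for k in offsets]
    out = []
    for b in range(len(signal)):
        total = 0.0
        for k, weight in zip(offsets, kernel):
            n = b + k
            if 0 <= n < len(signal):  # positions outside the sequence count as zero
                total += signal[n] * weight
        out.append(total)
    return out


# Toy network output: one plateau of high scores from residue 40 to 59.
nn_output = [0.9 if 40 <= i < 60 else 0.1 for i in range(120)]
coeffs = wavelet_transform(nn_output, scale=10.0)
peak = max(range(len(coeffs)), key=coeffs.__getitem__)
```

The coefficient is largest near the middle of the plateau, which is what makes the transform usable for locating helix-length stretches in a noisy output.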
Some more wavelet outputs
Note that these are from the training data itself. Overall performance is yet to be checked.
Part 2: Stability upon Mutations
Evaluation of predictions of protein stability changes upon mutations
• Effects of mutations on 2 TM proteins are available in our group
  – The two proteins are rhodopsin and bacteriorhodopsin
  – Data are available on how much misfolding occurs
  – And on how the stability of the protein is affected
• There are algorithms that can also predict these changes
• We assessed how accurate and reliable the prediction methods are by comparing their results with our experimental data
3 Prediction algorithms
• I-Mutant 2.0
  – Support vector machine
  – Features: amino acid neighbors in a 9 Å (0.9 nm) sphere, temperature, pH, relative solvent accessible surface area
  – http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi
• DFIRE
  – Knowledge-based statistical potentials
  – http://phyyz4.med.buffalo.edu/hzhou/mutation.html
• FOLDX
  – Statistical mechanics; accounts for various energy terms
  – http://fold-x.embl-heidelberg.de:1100/
Authors’ claims in 3 papers
Our results
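| | Number of known mutations | I-Mutant | DFIRE | FOLDX |
|---|---|---|---|---|
| Folding | 52 | 54.7 | 57.7 | 50 |
| Meta 2 | 32 | 78.1 | 73.3 | 46.9 |
| Both | 84 | 64.3 | 63.0 | 50.6 |

| | Number of known mutations | I-Mutant | DFIRE | FOLDX |
|---|---|---|---|---|
| Folding | 147 | 35.4 | 37.1 | 55.7 |
| Meta 2 | 159 | 56.0 | 47.5 | 67.2 |
| Both | 279 | 55.3 | 38.7 | 52.7 |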
Rhodopsin (PDB: 1U19)
Bacteriorhodopsin (PDB: 1QM8)
Bias in # of mutations that increase/decrease stability
Database bias affects apparent accuracies of algorithms
I-Mutant, for example, predicts a decrease in stability for a majority of the mutations.
Whether the experimentally studied mutations preserve the natural bias towards stability-decreasing mutations affects the apparent accuracy of the prediction algorithms.

| | Experimental | I-Mutant | DFIRE | FOLDX |
|---|---|---|---|---|
| Rhodopsin | 63 | 75 | 46 | 66 |
| Bacteriorhodopsin | 81 | 97 | 81 | 65 |
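The effect of class bias on apparent accuracy can be made concrete with a trivial baseline: a predictor that calls every mutation "decrease" scores exactly the percentage of decreasing mutations in the data set. The sketch below uses hypothetical data sets of 100 mutations; reading the "Experimental" entries 63 and 81 as percentages of stability-decreasing mutations is an assumption, and the function names are illustrative.

```python
def sign_accuracy(experimental, predicted):
    """Percentage of mutations where predicted and experimental labels agree."""
    if len(experimental) != len(predicted):
        raise ValueError("label lists must have the same length")
    matches = sum(e == p for e, p in zip(experimental, predicted))
    return 100.0 * matches / len(experimental)


def baseline_accuracy(n_decrease, n_increase):
    """Accuracy of a predictor that calls every mutation 'decrease'."""
    experimental = ["decrease"] * n_decrease + ["increase"] * n_increase
    predicted = ["decrease"] * len(experimental)
    return sign_accuracy(experimental, predicted)


# Hypothetical data sets of 100 mutations each, differing only in class balance.
for n_decrease in (50, 63, 81):
    print(n_decrease, baseline_accuracy(n_decrease, 100 - n_decrease))
```

A method biased towards predicting "decrease" therefore looks better on a data set dominated by destabilising mutations without being any more informative.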
Correlation with known data

| | I-Mutant | DFIRE | FOLDX |
|---|---|---|---|
| Rhodopsin | 0.11 | 0.16 | 0.24 |
| Bacteriorhodopsin | -0.09 | 0.18 | -0.18 |

Reported correlations for these methods are quite high (>0.7)
On the data compared here, the correlations are quite low
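For reference, a correlation of this kind can be computed as follows. The sketch assumes the Pearson coefficient between experimental and predicted stability changes, which the slide does not specify; the numbers are made up for illustration and are not the data behind the table.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up stability changes for five hypothetical mutations.
experimental = [-1.2, -0.4, 0.3, -2.1, 0.8]
predicted = [-0.5, -0.9, -0.2, -0.7, -0.1]
r = pearson(experimental, predicted)
print(f"r = {r:.2f}")
```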
Notes
• Local installations of blast and netblast are on Cologne:
  – /usr1/blast-2.2.13/
  – /usr1/netblast-2.2.13/
• Java SDK on Cologne
  – /usr1/j2sdk1.4.2_11/
Acknowledgements
Judith Klein-Seetharaman
Christopher Jon Jursa, Pitt Information Sciences (for developing the web interface)