Fast Search Protein Structure Prediction Algorithm for Almost Perfect Matches
By
Jayakumar Rudhrasenan
S3047315
Primary Supervisor: Prof. Heiko Schroder
Secondary Supervisor: Dr. Margaret Hamilton
Introduction
Bio-Informatics
What is Bio-Informatics?
Bio-Informatics is the science of developing computer databases and algorithms to facilitate biological research, especially in the area of genomics.
Genomics is the study of genes and their functions.
Background - Protein Structure
How can we find the structure of a protein?
• X-ray Crystallography
• NMR Spectroscopy
[Figure: Protein Structure. A chain of amino acids with the Phi and Psi angles labelled.]
Where does Computer Science come into it?
Limitations of traditional lab-work
• Expensive
Finding the structure through these methods is costly.
• Time Consuming
It takes 6 to 12 months to determine the structure of a single protein.
REASON:
Some proteins don’t crystallise
Some don’t give good diffraction patterns
All proteins are fragile and difficult to handle.
Methods Available
This problem is being tackled in many ways.
The methods fall into two broad groups:
• ab initio
• Homology modelling
What is homology modelling?
Homology modelling works on the principle that although each protein adopts a unique structure, there are only ~2,000 common folds among the various superfamilies identified thus far.
If two protein sequences are aligned and their percentage similarity is above the ‘twilight zone’ (about 20%), we can conclude that the sequences are homologous, that is, they share a common ancestry. Below this zone it is not possible to say whether the identical amino acid residues are in fact evolutionarily linked or have arisen by chance.
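The threshold test above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (with `-` for gaps) and uses plain percentage identity as a stand-in for the "percentage similarity" on the slide. The function names and the gap convention are illustrative assumptions, not part of the original talk.

```python
TWILIGHT_ZONE = 20.0  # threshold quoted on the slide, in percent


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns that hold the same residue.

    Assumption: both strings come from an existing alignment, so they have
    equal length, and '-' marks a gap. A gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def above_twilight_zone(aligned_a: str, aligned_b: str,
                        threshold: float = TWILIGHT_ZONE) -> bool:
    """True when the identity is above the threshold, False otherwise."""
    return percent_identity(aligned_a, aligned_b) > threshold


print(percent_identity("arndc", "arndq"))  # 4 of 5 columns agree
```

Real homology tools score similarity with substitution matrices rather than raw identity, so this only shows the shape of the decision.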
What is Protein Structure Prediction?
In its most general form, it is the prediction of the relative position of each amino acid in the protein structure, using knowledge of the structural details of other known proteins.
Why predict protein structure?
• The sequence structure gap
– 750 000 known sequences, 17 000 known structures
• Structural knowledge brings understanding of function and mechanism of action
• Can help in prediction of function
• Predicted structures can be used in structure-based drug design
• It can help us understand the effects of mutations on structure or function
• It is a very interesting scientific problem
– still unsolved in its most general form after more than 20 years of effort
Protein Structure Prediction Algorithm
Protein sequence for which the structure is unknown:
n f s b c a r . . . . .
Protein Database:
a r n d c q e g h i l k m n f s s d
e g h i l n f s e a r l k s p q g a
n h e . . . . . . . . . . .
A window slides along the unknown sequence, and each window is looked up in the database.
Window size = 3 here. It can also be implemented with a window size of 5, 7 or 9. With a window size of 9, we look for almost perfect matches, as we won't get a perfect match with the database we have.
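The window search on this slide can be sketched roughly as follows. This is a minimal illustration that assumes exact matching and a database held as a plain list of strings. The helper names `windows` and `find_exact` are hypothetical, and the two database strings are the complete rows shown on the slide.

```python
from typing import Iterator, List, Tuple


def windows(sequence: str, size: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for every window of `size` residues."""
    for start in range(len(sequence) - size + 1):
        yield start, sequence[start:start + size]


def find_exact(window: str, database: List[str]) -> List[Tuple[int, int]]:
    """Return (protein index, start position) for every exact occurrence."""
    hits = []
    for protein_index, protein in enumerate(database):
        for start, candidate in windows(protein, len(window)):
            if candidate == window:
                hits.append((protein_index, start))
    return hits


# Rows copied from the slide (spaces removed).
database = ["arndcqeghilkmnfssd", "eghilnfsearlkspqga"]
query = "nfsbcar"

for start, window in windows(query, 3):
    print(start, window, find_exact(window, database))
```

With a window size of 3, the first window `nfs` occurs once in each of the two database rows. Every lookup scans the whole database.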
Algorithm – continued..
[Figure: Phi graph and Psi graph, each showing Number of Occurrences.]
Limitations of this algorithm
Time Consuming
Time taken to predict the structure of a protein: 2 hr PC time
Time taken to predict the structure of 20,000 proteins: 2 x 20,000 = 40,000 hrs PC time
Why does it take time?
Each subsequence of the unknown protein is compared with all the subsequences of the proteins in the database.
With a window size of 9, the database contains around 2 million subsequences.
So there will be around 2 million comparisons for each subsequence of the unknown protein.
“Unknown protein” here means a protein whose sequence is known but whose structure is not.
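To give a feel for the scale, the comparison count can be worked out directly. The sketch below assumes a hypothetical unknown protein of 300 residues, a length chosen only for illustration. The window size of 9 and the figure of 2 million database subsequences come from the slide.

```python
def naive_comparisons(query_length: int, window: int,
                      database_windows: int) -> int:
    """Window-to-window comparisons made by an exhaustive scan."""
    # A sequence of n residues contains n - window + 1 windows.
    query_windows = max(query_length - window + 1, 0)
    return query_windows * database_windows


# 300 residues is an assumed, illustrative protein length.
print(naive_comparisons(300, 9, 2_000_000))  # 292 windows x 2,000,000
```

Under these assumptions a single protein needs 584,000,000 window comparisons, each of which may inspect up to nine residues.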
Fast Search Protein Structure Prediction Algorithm for Almost Perfect Matches
• Arrange the subsequences so that each one is at a Hamming distance of one from the next.
What is Hamming distance?
The number of disagreeing bits between two binary vectors.
Used as a measure of dissimilarity.
E.g. 1000011
1000001 These two binary numbers differ by one bit.
A Hamming distance of one here means that each subsequence differs from the one next to it by just one amino acid.
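As a small illustration, the Hamming distance applies to strings of amino acid letters just as it does to bit strings. The function below is a generic sketch, not code from the talk. The `max_mismatches` parameter of `is_almost_perfect` is an assumed way of expressing "almost perfect", which the slides do not quantify.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_almost_perfect(a: str, b: str, max_mismatches: int = 1) -> bool:
    """Assumed definition: at most `max_mismatches` differing positions."""
    return hamming_distance(a, b) <= max_mismatches


print(hamming_distance("1000011", "1000001"))      # bit example from the slide
print(hamming_distance("arndcqegh", "arndcqegk"))  # two windows of size 9
```

Both calls print 1: the bit strings differ in one bit, and the two windows differ in their last amino acid only.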
Continued…
• Maintain a table which stores the hop index value for a mismatch. For example:
Row number  Sub sequence  Jump to row number
1023 111110000 1027
1024 111110001
1025 111110002
1026 111110003
1027 111110013 1031
1028 111110012
1029 111110011
1030 111110010
1031 111110020 1035
. . .
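The example rows are consistent with a reflected Gray code over a four-symbol alphabet (digits 0 to 3). That reading is inferred from the table and is not stated in the slides. Under that assumption, an ordering in which neighbouring rows differ in exactly one position can be generated as in the sketch below. The function name is hypothetical, and the jump column is not modelled.

```python
from typing import List


def reflected_gray_code(base: int, length: int) -> List[str]:
    """All `length`-digit strings in `base`, neighbours differing in one digit.

    Digits are written with str(), so this sketch supports base <= 10 only.
    """
    if length == 0:
        return [""]
    shorter = reflected_gray_code(base, length - 1)
    ordered = []
    for digit in range(base):
        # Reverse the tails for every odd leading digit, so that only the
        # leading digit changes at the boundary between two blocks.
        block = shorter if digit % 2 == 0 else shorter[::-1]
        ordered.extend(str(digit) + tail for tail in block)
    return ordered


rows = reflected_gray_code(4, 2)
print(rows[:9])
```

The first nine entries are 00, 01, 02, 03, 13, 12, 11, 10, 20, which match the last two digits of rows 1023 to 1031 in the table. For amino acid windows the alphabet would have 20 symbols rather than 4, which needs a different digit encoding than the `str()` shortcut used here.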