Advanced Computational Biology Project Presentation
![Page 1: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/1.jpg)
ADVANCED COMPUTATIONAL BIOLOGY
PROJECT PRESENTATION
Team Members: Joshua Wu 11174269
Shuyu (Christine) Xu 11161640
![Page 2: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/2.jpg)
OVERVIEW
Project Description
Introduction
Motivation
Bioinformatics Application
Explicit vs Implicit
Problem Analysis
Implement Files
Experimental Results
Conclusion
Possible Future Work
![Page 3: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/3.jpg)
Project Description
Explicit Suffix Trees
Suppose that we want to explicitly store all strings that are edge labels of a suffix tree.
The main question of this project is how much space explicit suffix trees require compared to implicit suffix trees.
Implement a suffix tree algorithm and run it on substrings of real data.
![Page 5: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/5.jpg)
Introduction
Any string of length m can be decomposed into m suffixes, and these suffixes can be stored in a suffix tree.
Setup time: O(m) (m is the length of the string)
Search time: O(n) (n is the length of the pattern)
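As a rough illustration of the idea above, the following Python sketch lists the m suffixes of a string and stores them in an uncompressed suffix trie built from nested dictionaries. The function names are hypothetical, and this naive construction takes O(m^2) time and space; it is not the linear-time suffix tree construction that the O(m) setup bound refers to. The search, however, does follow one character per step, so it runs in O(n) for a pattern of length n.

```python
def suffixes(text):
    """Return all suffixes of text, longest first."""
    return [text[i:] for i in range(len(text))]


def build_suffix_trie(text):
    """Insert every suffix into a nested-dict trie (naive, O(m^2))."""
    root = {}
    for suffix in suffixes(text):
        node = root
        for ch in suffix:
            # Follow the existing branch or create a new one.
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Check whether pattern occurs in the indexed text, in O(len(pattern))."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("ABC$")
print(suffixes("ABC$"))       # ['ABC$', 'BC$', 'C$', '$']
print(contains(trie, "BC"))   # True
print(contains(trie, "CB"))   # False
```

Every substring of the text is a prefix of some suffix, which is why walking down from the root is enough to answer the query.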
![Page 7: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/7.jpg)
Motivation
"Suffix trees are widely used in the computer field... Recent improvements in the method have cut the memory requirement to 17 bytes per letter, which brings the method to the verge of practicality [for bioinformatics applications]" -- Nat Goodman (Genome Technology).
![Page 9: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/9.jpg)
Bioinformatics Application
1. multiple genome alignment (Michael Hohl et al., 2002)
2. selection of signature oligonucleotides for DNA arrays (Kaderali and Schliep, 2002)
3. identification of sequence repeats (Kurtz and Schleiermacher, 1999)
![Page 11: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/11.jpg)
Explicit vs Implicit
String: A B C $ (positions 1 2 3 4)
Explicit edge labels: ABC$, $, BC$, C$
Implicit edge labels: (1,4), (4,4), (2,4), (3,4)
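The relationship between the two labelings can be sketched in a few lines of Python. The helper name `explicit_label` is hypothetical; the index pairs are taken from the slide and are read as 1-based, inclusive positions into the string.

```python
def explicit_label(text, start, end):
    """Turn a 1-based inclusive (start, end) pair into the substring it names."""
    return text[start - 1:end]


text = "ABC$"
# Implicit edge labels as given on the slide.
implicit = [(1, 4), (4, 4), (2, 4), (3, 4)]
explicit = [explicit_label(text, s, e) for s, e in implicit]
print(explicit)  # ['ABC$', '$', 'BC$', 'C$']
```

An implicit label always stores two numbers, while an explicit label stores as many characters as the edge is long.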
![Page 13: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/13.jpg)
Problem Analysis
Best case for explicit and implicit suffix trees: all characters are different.
The best case is unlikely with DNA inputs, whose alphabet has a total of 4 characters.
Worst case: the same character throughout.
![Page 14: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/14.jpg)
Assumptions
In implicit trees, each number is counted as one unit of storage regardless of its magnitude (for example, the number 10 counts as 1 unit).
Only alphabetic characters appear in the sequence.
![Page 15: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/15.jpg)
Example: all different characters (best case)
String: A B C D $ (positions 1 2 3 4 5)
Implicit edge labels: (1,5), (2,5), (3,5), (4,5), (5,5)
N (string length) = 5
Memory = 10
![Page 16: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/16.jpg)
Example
String: A B C A B C $ (positions 1 2 3 4 5 6 7)
Implicit edge labels: (1,3), (2,3), (6,6), (7,7), plus (4,7) and (7,7) three times each
N (string length) = 7
Memory = 20
![Page 17: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/17.jpg)
Example: all same character (worst case)
String: A A A A $ (positions 1 2 3 4 5)
Implicit edge labels: (1,1), (5,5), (2,2), (5,5), (3,3), (5,5), (4,5), (5,5)
N (string length) = 5, 6, 7
Memory = 16, 20, 24
Memory = 4N - 4
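The memory figures in the three examples above can be checked with a small Python sketch. It is not the project's own program: it builds the suffix tree with a naive O(m^2) insertion of each suffix, and the names `Node`, `build_suffix_tree` and `memory_units` are hypothetical. The implicit size charges two units per edge (one per index). The explicit size, one unit per label character, is an assumption about how the explicit tree would be counted, since the example slides only give implicit figures.

```python
class Node:
    """A suffix tree node; children maps a first character to an edge."""

    def __init__(self):
        # Each edge is (start, end, child), 0-based with end exclusive.
        self.children = {}


def build_suffix_tree(text):
    """Naive construction; text must end in a terminator that occurs once."""
    assert text and text.count(text[-1]) == 1, "need a unique terminator"
    n = len(text)
    root = Node()
    for i in range(n):                      # insert suffix text[i:]
        node, j = root, i
        while j < n:
            first = text[j]
            if first not in node.children:
                node.children[first] = (j, n, Node())   # new leaf edge
                break
            start, end, child = node.children[first]
            k = start
            while k < end and text[k] == text[j]:
                k += 1
                j += 1
            if k == end:                    # whole edge matched, descend
                node = child
                continue
            # Mismatch inside the edge: split it at position k.
            mid = Node()
            mid.children[text[k]] = (k, end, child)
            mid.children[text[j]] = (j, n, Node())
            node.children[first] = (start, k, mid)
            break
    return root


def memory_units(text):
    """Return (implicit, explicit) sizes under the counting model above."""
    edges = chars = 0
    stack = [build_suffix_tree(text)]
    while stack:
        node = stack.pop()
        for start, end, child in node.children.values():
            edges += 1                      # implicit: two indices per edge
            chars += end - start            # explicit: one unit per character
            stack.append(child)
    return 2 * edges, chars


for s in ["ABCD$", "ABCABC$", "AAAA$"]:
    print(s, memory_units(s))
```

Under these assumptions the implicit sizes come out as 10, 20 and 16, matching the Memory values in the examples, and repeating the last case with AAAAA$ and AAAAAA$ gives 20 and 24, in line with 4N - 4.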
![Page 18: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/18.jpg)
Program Input Data
DNA for all kinds of creatures:
Homo Sapiens, Monkeys, Chickens, …
![Page 20: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/20.jpg)
Sample input: Homo Sapiens
cagctcctgagactgctggcatgaaggggagccgtgccctcctgctggtggccctcaccctgttctgcatctgccggatggccacaggggaggacaacgatgagtttttcatggacttcctgcaaacactactggtggggaccccagaggagctctatgaggggaccttgggcaagtacaatgtcaacgaagatgccaaggcagcaatgactgaactcaagtcctgcagagatggcctgcagccaatgcacaaggcggagctggtcaagctgctggtgcaagtgctgggcagtcaggacggtgcctaagtggacctcagacatggctcagccataggacctgccacacaagcagccgtggacacaacgcccactaccacctcccacatggaaatgtatcctcaaaccgtttaatcaataa
![Page 21: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/21.jpg)
Sample result
![Page 22: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/22.jpg)
Sample input 2: plants
EARPIVVGPPPPLSGGLPGTENSDQARDGTLPYTKDRFYLQPLPPTEAAQRAKVSASEILNVKQFIDRKAWPSLQNDLRLRASYLRYDLKTVISAKPKDEKKSLQELTSKLFSSIDNLDHAAKIKSPTEAEKYYGQTVSNINEVLAKLG
![Page 23: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/23.jpg)
Sample output:
![Page 25: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/25.jpg)
Homo Sapiens
![Page 26: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/26.jpg)
Sample Input: Homo Sapiens
atgaaggggagccgtgccctcctgctggtggccctcaccctgttctgcatctgccggatggccacaggggaggacaacgatgagtttttcatggacttcctgcaaacactactggtggggaccccagaggagctctatgaggggaccttgggcaagtacaatgtcaacgaagatgccaaggcagcaatgactgaactcaagtcctgcagagatggcctgcagccaatgcacaaggcggagctggtcaagctgctggtgcaagtgctgggcagtcaggacggtgcctaa
![Page 27: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/27.jpg)
Comparisons: Homo Sapiens
![Page 28: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/28.jpg)
Comparisons: Homo Sapiens
![Page 29: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/29.jpg)
Monkey Virus
![Page 30: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/30.jpg)
Sample Input: Monkey Virus
GGSCFKCGKKGHFAKNCHEHAHNNAEPKVPGLCPRCKRGKHWANECKSKTDNQGNPIPPH
![Page 31: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/31.jpg)
Monkey Virus
![Page 32: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/32.jpg)
Plants
![Page 33: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/33.jpg)
Sample Input: Plants
EARPIVVGPPPPLSGGLPGTENSDQARDGTLPYTKDRFYLQPLPPTEAAQRAKVSASEILNVKQFIDRKAWPSLQNDLRLRASYLRYDLKTVISAKPKDEKKSLQELTSKLFSSIDNLDHAAKIKSPTEAEKYYGQTVSNINEVLAKLG
![Page 34: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/34.jpg)
Plants
![Page 35: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/35.jpg)
Tobacco
![Page 36: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/36.jpg)
Sample input: tobacco
SYSITTPSQFVFLSSAWADPIELINLCTNALGNQFQTQQARTVVQRQFSEVWKPSPQVTVRFPDSDFKVYRYNAVLDPLVTALLGAFDTRNRIIEVENQANPTTAETLDATRRVDDATVAIRSAINNLIVELIRGTGSYNRSSFESSSGLVWTSGPAT
![Page 37: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/37.jpg)
Tobacco
![Page 38: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/38.jpg)
Insects
![Page 39: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/39.jpg)
Sample Input: Insects
DCLSGRYKGPCAVWDNETCRRVCKEEGRSSGHCSPSLKCWCEGC
![Page 40: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/40.jpg)
Insects
![Page 41: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/41.jpg)
Birds
![Page 42: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/42.jpg)
Sample Input: Birds
IDTCRLPSDRGRCKASFERWYFNGRTCAKFIYGGCGGNGNKFPTQEACMKRCAKA
![Page 43: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/43.jpg)
Birds
![Page 44: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/44.jpg)
SARS
![Page 45: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/45.jpg)
Sample Input: SARS
ALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEV
![Page 46: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/46.jpg)
SARS
![Page 47: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/47.jpg)
Fish
![Page 48: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/48.jpg)
Sample Input: Fish
GHHHHHHLEDPSGGTPYIGSKISLISKAEIRYEGILYTIDTENSTVALAKVRSFGTEDRPTDRPIAPRDETFEYIIFRGSDIKDLTVCEPPKPIM
![Page 49: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/49.jpg)
Fish
![Page 50: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/50.jpg)
Chicken
![Page 51: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/51.jpg)
Sample Input: Chicken
RVKRVWPLVIRTVIAGYNLYRAIKKK
![Page 52: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/52.jpg)
Chicken
![Page 53: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/53.jpg)
Files
Code
Results
![Page 55: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/55.jpg)
Conclusion
Explicit suffix trees require more space than implicit suffix trees on real data.
Data comparison: the worst case is DNA input (least variety of characters).
Result: implicit trees should be used to reduce storage.
![Page 56: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/56.jpg)
Figure: variety of string vs tree size. x-axis: # of alphabets (ticks 1 to 25); y-axis ticks 0 to 3000.
![Page 57: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/57.jpg)
Conclusion
Application: it is easier to compare structures for implicit suffix trees than for explicit ones (number comparisons).
Saves space
Easy to implement
Further improvement?
![Page 59: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/59.jpg)
Possible Future Work
The program is too slow.
The interface of our program should be improved. (Matlab)
More variety of input
![Page 60: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/60.jpg)
References
Real Data
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=74273665
http://www.rcsb.org/pdb
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&db=nucleotide
![Page 61: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/61.jpg)
References
Online info
http://en.wikipedia.org/wiki/Suffix_tree
http://marknelson.us/1996/08/01/suffix-trees/
http://homepage.usask.ca/~ctl271/857/suffix_tree.shtml
http://www.cs.uku.fi/~kilpelai/BSA05/lectures/print07.pdf
![Page 62: Advanced ComputationAL Biology Project Presentation](https://reader035.vdocument.in/reader035/viewer/2022081513/56816354550346895dd3fd7d/html5/thumbnails/62.jpg)
THANK YOU!