Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm
![Page 1: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/1.jpg)
Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm
Mathieu Blanchette
Shane Neph
Martin Tompa
Computer Science & Engineering
University of Washington
![Page 2: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/2.jpg)
Outline
• How are genes regulated?
• What is phylogenetic footprinting?
• First solution
• Improvements and extensions
• Application to regulation of several important genes
![Page 3: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/3.jpg)
Regulation of Genes
• What turns genes on and off?
• When is a gene turned on or off?
• Where (in which cells) is a gene turned on?
• How many copies of the gene product are produced?
![Page 4: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/4.jpg)
Regulation of Genes
[Figure labels: DNA, regulatory element, coding region, transcription factor, RNA polymerase]
![Page 5: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/5.jpg)
Regulation of Genes
[Figure labels: DNA, regulatory element, coding region, transcription factor, RNA polymerase]
![Page 6: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/6.jpg)
Goal
Identify regulatory elements in DNA sequences. These are:
• Binding sites for proteins
• Short substrings (5-25 nucleotides)
• Up to 1000 nucleotides (or farther) from gene
• Inexactly repeating patterns (“motifs”)
![Page 7: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/7.jpg)
Phylogenetic Footprinting
(Tagle et al. 1988)
Functional sequences evolve more slowly than nonfunctional ones.
• Consider a set of orthologous sequences from different species
• Identify unusually well conserved regions
![Page 8: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/8.jpg)
Substring Parsimony Problem
Given:
• phylogenetic tree T,
• set of orthologous sequences at leaves of T,
• length k of motif,
• threshold d
Problem:
• Find each set S of k-mers, one k-mer from each leaf, such that the “parsimony” score of S in T is at most d.
This problem is NP-hard.
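One way to write the score down, as a sketch that assumes substitutions only and takes $d(s,t)$ to be the number of positions where two k-mers differ. The symbols $\lambda$ and $E(T)$ are notation introduced here for illustration and do not appear on the slides; the threshold is the slide's d.

```latex
\mathrm{score}_T(S) \;=\; \min_{\lambda} \sum_{(u,v) \in E(T)} d\bigl(\lambda(u), \lambda(v)\bigr),
\qquad \text{report every } S \text{ with } \mathrm{score}_T(S) \le d
```

Here $\lambda$ ranges over labelings of all nodes of T by k-mers that agree with S at the leaves, and $E(T)$ is the edge set of T.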
![Page 9: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/9.jpg)
Small Example
AGTCGTACGTGAC... (Human)
AGTAGACGTGCCG... (Chimp)
ACGTGAGATACGT... (Rabbit)
GAACGGAGTACGT... (Mouse)
TCGTGACGGTGAT... (Rat)
Size of motif sought: k = 4
![Page 10: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/10.jpg)
Solution
AGTCGTACGTGAC...
AGTAGACGTGCCG...
ACGTGAGATACGT...
GAACGGAGTACGT...
TCGTGACGGTGAT...
[Figure: tree over the five sequences, with internal nodes labeled ACGG, ACGT, ACGT, ACGT]
Parsimony score: 1 mutation
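A quick check of why one mutation is needed in this example, as a minimal sketch. It uses only the truncated sequence prefixes shown on the slide, and the helper names (`kmers`, `species_containing`) are made up for illustration.

```python
# Sequence prefixes as printed on the slide (the originals continue past "...").
SEQS = {
    "Human":  "AGTCGTACGTGAC",
    "Chimp":  "AGTAGACGTGCCG",
    "Rabbit": "ACGTGAGATACGT",
    "Mouse":  "GAACGGAGTACGT",
    "Rat":    "TCGTGACGGTGAT",
}
K = 4  # motif length from the slide


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def species_containing(motif, seqs):
    """Return the sorted names of the species whose sequence contains motif."""
    return sorted(name for name, seq in seqs.items() if motif in seq)


# k-mers that occur, unchanged, in all five sequences.
shared_by_all = set.intersection(*(kmers(s, K) for s in SEQS.values()))

print(shared_by_all)                     # empty: no 4-mer is perfectly conserved
print(species_containing("ACGT", SEQS))  # every species except Rat
print(species_containing("ACGG", SEQS))  # Mouse and Rat
```

Since no 4-mer occurs in all five prefixes, a score of 0 is impossible here; choosing ACGT in four species and ACGG in Rat costs a single substitution.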
![Page 11: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/11.jpg)
CLUSTALW multiple sequence alignment (rbcS gene)

Cotton    ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA-------AGGCTTTACCATT
Pea       GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA-------AGG--TTAGCACA
Tobacco   TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA-------ATGGCTTAGCACC
Ice-plant TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACC
Turnip    ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA-------A-------GGAGC
Wheat     TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAA
Duckweed  TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAA
Larch     TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACC

Cotton    CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----A
Pea       C---AAAACTTTTCAATCT-------TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT---------A
Tobacco   AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGA
Ice-plant ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAA
Turnip    CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT---------A
Wheat     GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC--------
Duckweed  ATATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATT
Larch     TTCTCGTATAAGGCCACCA-------TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCA

Cotton    ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTA
Pea       GGCAGTGGCC---AACTAC--------------------CACAATTT-TAAGACCATAA-TAT----TGGAAATAGAA------AAATCAAT--ACATTA
Tobacco   GGGGGTTGTT---GATTTTT----GTCCGTTAGATAT-GCGAAATATGTAAAACCTTAT-CAT----TATATATAGAG------TGGTGGGCA-ACGATG
Ice-plant GGCTCTTAATCAAAAGTTTTAGGTGTGAATTTAGTTT-GATGAGTTTTAAGGTCCTTAT-TATA---TATAGGAAGGGGG----TGCTATGGA-GCAAGG
Turnip    CACCTTTCTTTAATCCTGTGGCAGTTAACGACGATATCATGAAATCTTGATCCTTCGAT-CATTAGGGCTTCATACCTCT----TGCGCTTCTCACTATA
Wheat     CACTGATCCGGAGAAGATAAGGAAACGAGGCAACCAGCGAACGTGAGCCATCCCAACCA-CATCTGTACCAAAGAAACGG----GGCTATATATACCGTG
Duckweed  TTAGGTTGAATGGAAAATAG---AACGCAATAATGTCCGACATATTTCCTATATTTCCG-TTTTTCGAGAGAAGGCCTGTGTACCGATAAGGATGTAATC
Larch     CGCTTCTCCTCTGGAGTTATCCGATTGTAATCCTTGCAGTCCAATTTCTCTGGTCTGGC-CCA----ACCTTAGAGATTG----GGGCTTATA-TCTATA

Cotton    T-TAAGGGATCAGTGAGAC-TCTTTTGTATAACTGTAGCAT--ATAGTAC
Pea       TATAAAGCAAGTTTTAGTA-CAAGCTTTGCAATTCAACCAC--A-AGAAC
Tobacco   CATAGACCATCTTGGAAGT-TTAAAGGGAAAAAAGGAAAAG--GGAGAAA
Ice-plant TCCTCATCAAAAGGGAAGTGTTTTTTCTCTAACTATATTACTAAGAGTAC
Larch     TCTTCTTCACAC---AATCCATTTGTGTAGAGCCGCTGGAAGGTAAATCA
Turnip    TATAGATAACCA---AAGCAATAGACAGACAAGTAAGTTAAG-AGAAAAG
Wheat     GTGACCCGGCAATGGGGTCCTCAACTGTAGCCGGCATCCTCCTCTCCTCC
Duckweed  CATGGGGCGACG---CAGTGTGTGGAGGAGCAGGCTCAGTCTCCTTCTCG
![Page 12: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/12.jpg)
An Exact Algorithm
(generalizing Sankoff and Rousseau 1975)
$W_u[s]$ = best parsimony score for subtree rooted at node $u$, if $u$ is labeled with string $s$.
[Figure: tree with leaf sequences AGTCGTACGTG, ACGGGACGTGC, ACGTGAGATAC, GAACGGAGTAC, TCGTGACGGTG and a table of $4^k$ entries at each node. Sample entries: (ACGG: 2, ACGT: 1), (ACGG: 0, ACGT: 2), (ACGG: 1, ACGT: 1), (ACGG: 1, ACGT: 0); other tables pair a 0 entry with a $+\infty$ entry, e.g. (ACGG: $+\infty$, ACGT: 0) and (ACGG: 0, ACGT: $+\infty$).]
![Page 13: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/13.jpg)
Recurrence
$$W_u[s] = \sum_{v:\ \text{child of } u} \min_{t} \bigl( W_v[t] + d(s, t) \bigr)$$
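The recurrence can be evaluated bottom-up over all $4^k$ strings. The sketch below is a direct, unoptimized version, not the authors' implementation. It assumes substitutions only (d is Hamming distance), leaf tables that are 0 for k-mers occurring in the leaf sequence and infinite otherwise, and a tree topology, `TREE`, that is a guess made for illustration since the slides do not give one in text form. It returns only the best score; recovering the k-mer sets themselves would need a traceback, which is omitted.

```python
from itertools import product

INF = float("inf")

# Sequence prefixes from the small example slide.
SEQS = {
    "Human":  "AGTCGTACGTGAC",
    "Chimp":  "AGTAGACGTGCCG",
    "Rabbit": "ACGTGAGATACGT",
    "Mouse":  "GAACGGAGTACGT",
    "Rat":    "TCGTGACGGTGAT",
}
# Assumed topology (hypothetical): nested tuples, leaves are species names.
TREE = (("Human", "Chimp"), ("Rabbit", ("Mouse", "Rat")))
K = 4


def hamming(s, t):
    """Number of positions at which equal-length strings s and t differ."""
    return sum(a != b for a, b in zip(s, t))


def table(node, seqs, all_kmers, k):
    """Return {s: W_node[s]} for every k-mer s."""
    if isinstance(node, str):  # leaf: 0 if s occurs in the sequence, else infinity
        seq = seqs[node]
        present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        return {s: (0 if s in present else INF) for s in all_kmers}
    w = dict.fromkeys(all_kmers, 0)
    for child in node:  # sum over children v of u
        child_w = table(child, seqs, all_kmers, k)
        finite = [(t, c) for t, c in child_w.items() if c < INF]
        for s in all_kmers:  # min over labels t of the child
            w[s] += min(c + hamming(s, t) for t, c in finite)
    return w


ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
root = table(TREE, SEQS, ALL_KMERS, K)
best = min(root.values())
print(best, root["ACGT"])
```

For internal children, each table costs about $4^k \cdot 4^k$ distance computations of length k, which is where the $4^{2k}$ factor in the running time comes from.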
![Page 14: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/14.jpg)
Running Time
$$W_u[s] = \sum_{v:\ \text{child of } u} \min_{t} \bigl( W_v[t] + d(s, t) \bigr)$$
$O(k \cdot 4^{2k})$ time per node
![Page 15: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/15.jpg)
Running Time
$$W_u[s] = \sum_{v:\ \text{child of } u} \min_{t} \bigl( W_v[t] + d(s, t) \bigr)$$
$O(k \cdot 4^{2k})$ time per node
Total time $O(n k (4^{2k} + l))$, where $n$ is the number of species, $l$ the average sequence length, and $k$ the motif length
![Page 16: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/16.jpg)
Improvements
• Better algorithm reduces time from $O(n k (4^{2k} + l))$ to $O(n k (4^k + l))$
• By restricting to motifs with parsimony score at most d, greatly reduce the number of table entries computed (exponential in d, polynomial in k)
• Amenable to many useful extensions (e.g., allow insertions and deletions)
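To get a feel for the first improvement, the two bounds can be compared numerically. This is only a rough sketch: it ignores constant factors, and the parameter values (10 species, motif length 10, sequences of 1000 bp) are made-up inputs chosen for illustration.

```python
def naive_bound(n, k, l):
    """Value of n * k * (4^(2k) + l), the bound before the improvement."""
    return n * k * (4 ** (2 * k) + l)


def improved_bound(n, k, l):
    """Value of n * k * (4^k + l), the bound after the improvement."""
    return n * k * (4 ** k + l)


n, k, l = 10, 10, 1000  # hypothetical problem size
before = naive_bound(n, k, l)
after = improved_bound(n, k, l)
print(before, after, round(before / after))
```

For these inputs the ratio is on the order of $4^k$, about a million, because the sequence-length term is small next to the table size.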
![Page 17: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/17.jpg)
Application to -actin Gene
Gilthead sea bream (678 bp)
Medaka fish (1016 bp)
Common carp (696 bp)
Grass carp (917 bp)
Chicken (871 bp)
Human (646 bp)
Rabbit (636 bp)
Rat (966 bp)
Mouse (684 bp)
Hamster (1107 bp)
![Page 18: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/18.jpg)
Common carp
ACGGACTGTTACCACTTCACGCCGACTCAACTGCGCAGAGAAAAACTTCAAACGACAACATTGGCATGGCTTTTGTTATTTTTGGCGCTTGACTCAGGATCTAAAAACTGGAACGGCGAAGGTGACGGCAATGTTTTGGCAAATAAGCATCCCCGAAGTTCTACAATGCATCTG
AGGACTCAATGTTTTTTTTTTTTTTTTTTCTTTAGTCATTCCAAATGTTTGTTAAATGCATTGTTCCGAAACTTATTTGCCTCTATGAAGGCTGCCCAGTAATTGGGAGCATACTTAACATTGTAGTATTGTATGTAAATTATGTAACAAAACAATGACTGGGTTTTTGTACTTTCAGCCTTAATCTTGGGTTTTTTTTTTTTTTTGGTTCCAAAAAACTAAGCTTTACCATTCAAGATGTAAAGGTTTCATTCCCCCTGGCATATTGAAAAAGCTGTGTGGAACGTGGCGGTGCA
GACATTTGGTGGGGCCAACCTGTACACTGACTAATTCAAATAAAAGTGCACATGTAAGACATCCTACTCTGTGTGATTTTTCTGTTTGTGCTGAGTGAACTTGCTATGAAGTCTTTTAGTGCACTCTTTAATAAAAGTAGTCTTCCCTTAAAGTGTCCCTTCCCTTATGGCCTTCACATTTCTCAACTAGCGCTTCAACTAGAAAGCACTTTAGGGACTGGGATGC
Chicken
ACCGGACTGTTACCAACACCCACACCCCTGTGATGAAACAAAACCCATAAATGCGCATAAAACAAGACGAGATTGGCATGGCTTTATTTG
TTTTTTCTTTTGGCGCTTGACTCAGGATTAAAAAACTGGAATGGTGAAGGTGTCAGCAGCAGTCTTAAAATGAAACATGTTGGA
GCGAACGCCCCCAAAGTTCTACAATGCATCTGAGGACTTTGATTGTACATTTGTTTCTTTTTTAATAGTCATTCCAAATATTGTTATAATGCATTGTTACAGGAAGTTACTCGCCTCTGTGAAGGCAACAGCCCAGCTGGGAGGAGCCGGTACCAATTACTGGTGTTAGATGATAATTGCTTGTCTGTAAATTATGTAACCCAACAAGTGTCTTTTTGTATCTTCCGCCTTAAAAACAAAACACACTTGATCCTTTTTGGTTTGTCAAGCAAGCGGGCTGTGTTCCCCAGTGA
TAGATGTGAATGAAGGCTTTACAGTCCCCCACAGTCTAGGAGTAAAGTGCCAGTATGTGGGGGAGGGAGGGGCTACCTGTACACTGACTTAAGACCAGTTCAAATAAAAGTGCACACAATAGAGGCTTGACTGGTGTTGGTTTTTATTTCTGTGCTGCGCTGCTTGGCCGTTGGTAGCTGTTCTCATCTAGCCTTGCCAGCCTGTGTGGGTCAGCTATCTGCATGGGCTGCGTGCTGGTGCTGTCTGGTGCAGAGGTTGGATAAACCGTGATGATATTTCAGCAAGTGGGAGTTGGCTCTGATTCCATCCTGAGCTGCCATCAGTGTGTTCTGAAGGAAGCTGTTGGATGAGGGTGGGCTGAGTGCTGGGGGACAGCTGGGCTCAGTGGGACTGCAGCTGTGCT
Human
GCGGACTATGACTTAGTTGCGTTACACCCTTTCTTGACAAAACCTAACTTGCGCAGAAAACAAGATGAGATTGGCATGGCTTTATTTGTTT
TTTTTGTTTTGTTTTGGTTTTTTTTTTTTTTTTGGCTTGACTCAGGATTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTT
GGAGCGAGCATCCCCCAAAGTTCACAATGTGGCCGAGGACTTTGATTGCATTGTTGTTTTTTTAATAGTCATTCCAAATATGAGATGCATTGTTACAGGAAGTCCCTTGCCATCCTAAAAGCCACCCCACTTCTCTCTAAGGAGAATGGCCCAGTCCTCTCCCAAGTCCACACAGGGGAGGTGATAGCATTGCTTTCGTGTAAATTATGTAATGCAAAATTTTTTTAATCTTCGCCTTAATACTTTTTTATTTTGTTTTATTTTGAATGATGAGCCTTCGTGCCCCCCCTTC
CCCCTTTTTGTCCCCCAACTTGAGATGTATGAAGGCTTTTGGTCTCCCTGGGAGTGGGTGGAGGCAGCCAGGGCTTACCTGTACACTGACTTGAGACCAGTTGAATAAAAGTGCACACCTTAAAAATGAGGCCAAGTGTGACTTTGTGGTGTGGCTGGGTTGGGGGCAGCAGAGGGTG
Parsimony score over 10 vertebrates (legend for the highlighted motifs): 0, 1, 2
![Page 19: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/19.jpg)
Motifs Absent from Some Species
• Find motifs
  – with small parsimony score
  – that span a large part of the tree
• Example: in tree of 10 species spanning 760 Myrs, find all motifs with
  – score 0 spanning at least 250 Myrs
  – score 1 spanning at least 350 Myrs
  – score 2 spanning at least 450 Myrs
  – score 3 spanning at least 550 Myrs
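The score-versus-span rule in this example amounts to a small lookup. The sketch below encodes only the four thresholds listed on the slide; the function name `is_reported` and the choice to reject scores with no listed threshold are assumptions made for illustration.

```python
# Minimum span (in Myrs) required for each parsimony score, from the slide.
MIN_SPAN_MYRS = {0: 250, 1: 350, 2: 450, 3: 550}


def is_reported(score, span_myrs, thresholds=MIN_SPAN_MYRS):
    """Return True if a motif with this score spans enough of the tree.

    Scores without a listed threshold are rejected (an assumption).
    """
    required = thresholds.get(score)
    return required is not None and span_myrs >= required


print(is_reported(0, 250))  # meets the score-0 threshold exactly
print(is_reported(1, 300))  # too short a span for score 1
print(is_reported(2, 760))  # spans the whole 760 Myr tree
```

The effect is that a motif with more mutations must be conserved across a larger part of the tree before it is reported.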
![Page 20: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/20.jpg)
Application to c-fos Gene
Asked for motifs of length 10, with
• 0 mutations over tree of size 6
• 1 mutation over tree of size 11
• 2 mutations over tree of size 16
• 3 mutations over tree of size 21
• 4 mutations over tree of size 26
[Figure: phylogenetic tree of puffer fish, chicken, pig, mouse, hamster, and human; branch lengths shown: 10, 2, 7, 2, 2, 21, 0, 1, 1]
Found:
• 0 mutations over tree of size 8
• 1 mutation over tree of size 16
• 3 mutations over tree of size 21
• 4 mutations over tree of size 28
![Page 21: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/21.jpg)
Application to c-fos Gene

| Motif | Score | Conserved in | Known? |
|---|---|---|---|
| CAGGTGCGAATGTTC | 0 | 4 mammals | |
| TTCCCGCCTCCCCTCCCC | 0 | 4 mammals | yes |
| GAGTTGGCTGcagcc | 3 | puffer + 4 mammals | |
| GTTCCCGTCAATCcct | 1 | chicken + 4 mammals | yes |
| CACAGGATGTcc | 4 | all 6 | yes |
| AGGACATCTG | 1 | chicken + 4 mammals | yes |
| GTCAGCAGGTTTCCACG | 0 | 4 mammals | yes |
| TACTCCAACCGC | 0 | 4 mammals | |
![Page 22: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/22.jpg)
MicroFootPrinter
• Designed specifically for phylogenetic footprinting in microbial genomes
• Front end to FootPrinter designed with Shane Neph
• Available at bio.cs.washington.edu/software.html
![Page 23: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/23.jpg)
MicroFootPrinter
• 317 prokaryotes with genomes completely sequenced (as of 3/28/2006)
– For any prokaryotic gene of interest, plenty of orthologous genes in other species available
• User specifies species and gene of interest
• Automates collection of orthologous genes, cis-regulatory sequences, gene tree, parameters
![Page 24: Discovery of Regulatory Elements by a Phylogenetic Footprinting Algorithm](https://reader036.vdocument.in/reader036/viewer/2022081603/568148f0550346895db60efc/html5/thumbnails/24.jpg)
Operons
[Figure labels: < 100 bp; gene g; upstream sequence for g]