finding regulatory modules from local alignment
Finding regulatory modules from local alignment. Department of Computer Science & Helsinki Institute of Information Technology HIIT, University of Helsinki. Erice, 30 Nov 2005.
![Page 1: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/1.jpg)
Finding regulatory modules from local alignment
-
Department of Computer Science & Helsinki Institute of Information Technology HIIT
University of Helsinki
Erice 30 Nov 2005
![Page 2: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/2.jpg)
Pairwise alignment of strings
• A: S T O C K H O L M
• B: T U K H O L M A
• minimum number of 'mutation' steps: a -> b (substitution), a -> ε (deletion), ε -> b (insertion), …
![Page 3: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/3.jpg)
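Dynamic programming

$$d_{i,j} = \min\bigl(\text{if } a_i = b_j \text{ then } d_{i-1,j-1} \text{ else } \infty,\; d_{i-1,j} + 1,\; d_{i,j-1} + 1\bigr)$$

$d_{i,j}$ = distance between the i-prefix of A and the j-prefix of B (without substitutions)

[Figure: the m × n table d for strings A and B. The entry $d_{i,j}$ for symbols $a_i$ and $b_j$ is computed from $d_{i-1,j-1}$, from $d_{i-1,j}$ (+1) and from $d_{i,j-1}$ (+1); the final distance is $d_{m,n}$.]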
![Page 4: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/4.jpg)
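| A\B |   | s | t | o | c | k | h | o | l | m |
|-----|---|---|---|---|---|---|---|---|---|---|
|     | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| t   | 1 | 2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| u   | 2 | 3 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| k   | 3 | 4 | 3 | 4 | 5 | 4 | 5 | 6 | 7 | 8 |
| h   | 4 | 5 | 4 | 5 | 6 | 5 | 4 | 5 | 6 | 7 |
| o   | 5 | 6 | 5 | 4 | 5 | 6 | 5 | 4 | 5 | 6 |
| l   | 6 | 7 | 6 | 5 | 6 | 7 | 6 | 5 | 4 | 5 |
| m   | 7 | 8 | 7 | 6 | 7 | 8 | 7 | 6 | 5 | 4 |
| a   | 8 | 9 | 8 | 7 | 8 | 9 | 8 | 7 | 6 | 5 |

$$d_{i,j} = \min\bigl(\text{if } a_i = b_j \text{ then } d_{i-1,j-1} \text{ else } \infty,\; d_{i-1,j} + 1,\; d_{i,j-1} + 1\bigr)$$

• $d_{ID}(A,B)$ = 5, the bottom-right entry of the table
• optimal alignment by trace-back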
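The recurrence and the trace-back can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names `indel_table` and `trace_back` are made up here, and the distance counts insertions and deletions only, as in the table for the two example strings.

```python
import math


def indel_table(a: str, b: str) -> list[list[int]]:
    """Fill the (len(a)+1) x (len(b)+1) table d for insertions and deletions only."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # delete the whole i-prefix of a
    for j in range(1, n + 1):
        d[0][j] = j  # insert the whole j-prefix of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # The diagonal step is allowed only on a match (no substitutions).
            diag = d[i - 1][j - 1] if a[i - 1] == b[j - 1] else math.inf
            d[i][j] = min(diag, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d


def trace_back(a: str, b: str, d: list[list[int]]) -> tuple[str, str]:
    """Walk back from d[m][n] and return one optimal alignment as gapped strings."""
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and d[i][j] == d[i - 1][j - 1]:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


table = indel_table("stockholm", "tukholma")
distance = table[-1][-1]  # 5 for these strings, as in the table
aligned_a, aligned_b = trace_back("stockholm", "tukholma", table)
```

The table costs O(mn) time and space; the trace-back adds only O(m + n) steps.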
![Page 5: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/5.jpg)
Homology searches
• find homologous sequences: new sequence versus all old ones in database – the most popular computational task in present-day molecular biology = approximate string matching
• BLAST - big success
• good homology => same biological function
[Figure: a NEW SEQUENCE is compared against the DATABASE.]
![Page 6: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/6.jpg)
Multiple alignment
• multiple alignment of sequence families to find interesting conserved motifs: NP-hard => heuristics, Hidden Markov models, MCMC
• comparison of entire genomes
![Page 7: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/7.jpg)
Gene enhancer module prediction
![Page 8: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/8.jpg)
Problem
• Gene expression regulation in multicellular organisms is controlled in a combinatorial fashion by so-called transcription factors (TFs).
• Transcription factors bind to DNA cis-elements (TF binding sites) on enhancer modules (promoters), and multiple factors need to bind to activate the module.
• In mammals, the modules are few and far between.
• The problem: locate functional regulatory modules, that is, find interesting patterns.
![Page 9: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/9.jpg)
Gene enhancer modules
[Figure labels: DNA with gene1, gene2, gene3, gene4; transcription to RNA; translation to proteins; transcription factors; enhancer module.]
![Page 10: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/10.jpg)
Model of cell type specific regulation of target gene expression
[Figure labels: GLI (ubiquitously expressed TF); X, Y (tissue specific TFs); common targets (e.g. Patched); cell type specific targets (e.g. N-myc); transcription.]
![Page 11: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/11.jpg)
Binding affinity matrices
• The TF binding sites are represented by affinity matrices.
– A column per position
– A row per nucleotide
• Discovered:
– Computationally
– Traditional wet lab
– Microarrays

|  9 | 11 | 49 | 51 |  0 |  1 |  1 |  4 |
| 19 |  3 |  0 |  0 |  0 | 45 | 25 | 16 |
|  5 |  1 |  2 |  0 | 17 |  0 |  4 | 21 |
| 18 | 36 |  0 |  0 | 34 |  5 | 21 | 10 |
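A matrix like this can be used to scan a sequence for candidate binding sites. The sketch below reads the 32 numbers on this slide as 4 rows of 8 columns (each column then sums to 51) and turns the counts into log-odds weights. Several things are assumptions made for illustration and are not stated on the slide: the row order A, C, G, T, the pseudocount, the uniform background, and the function names `log_odds` and `scan`.

```python
import math

# Counts from the slide, read as 4 rows (nucleotides) x 8 columns (positions).
COUNTS = [
    [9, 11, 49, 51, 0, 1, 1, 4],
    [19, 3, 0, 0, 0, 45, 25, 16],
    [5, 1, 2, 0, 17, 0, 4, 21],
    [18, 36, 0, 0, 34, 5, 21, 10],
]
ROW_ORDER = "ACGT"  # ASSUMPTION: the slide does not say which row is which nucleotide


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert a count matrix into log2-odds weights against a uniform background."""
    n_cols = len(counts[0])
    weights = [[] for _ in counts]
    for col in range(n_cols):
        total = sum(row[col] for row in counts) + len(counts) * pseudocount
        for r, row in enumerate(counts):
            p = (row[col] + pseudocount) / total
            weights[r].append(math.log2(p / background))
    return weights


def scan(sequence, weights, threshold):
    """Return (start, score) for every window whose summed weight reaches the threshold."""
    width = len(weights[0])
    index = {nt: r for r, nt in enumerate(ROW_ORDER)}
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(nt not in index for nt in window):
            continue  # skip windows with masked or ambiguous symbols
        score = sum(weights[index[nt]][k] for k, nt in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


weights = log_odds(COUNTS)
hits = scan("GGGGCTAATCCGGGGG", weights, threshold=5.0)
```

The threshold trades sensitivity against the number of false positives; any concrete value, including the 5.0 above, is arbitrary here.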
![Page 12: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/12.jpg)
Binding affinity matrices
![Page 13: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/13.jpg)
Determined TF binding profiles (+ JASPAR)
![Page 14: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/14.jpg)
Finding conserved motifs of binding sites
• looking at one (human) genome gives too many positives
• comparative genomics approach:
– take the 200 kb regions surrounding the same genes (paralogs and orthologs) of different mammals: human, mouse, chicken, …
– find conserved clusters (= motifs) of binding sites
• cluster = group of binding sites with good local alignment => Smith-Waterman type algorithm with a novel scoring function
![Page 15: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/15.jpg)
Smith-Waterman
• find the best local alignment of strings A and B: substring X of A and substring Y of B such that X and Y have the best scoring pairwise alignment
[Figure: substring X of A aligned with substring Y of B.]
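For plain strings, the local alignment problem can be sketched as follows. This is a minimal textbook-style Smith-Waterman with a linear gap penalty; the scoring values (match 2, mismatch -1, gap -1) and the function name are illustrative assumptions, not the scoring used in the talk.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return (best score, substring X of a, substring Y of b) of one best local alignment."""
    m, n = len(a), len(b)
    h = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets an alignment start anywhere: this is what makes it local.
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_i, best_j = h[i][j], i, j
    # Trace back from the best cell until the score drops to 0.
    i, j = best_i, best_j
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:best_i], b[j:best_j]


score, x, y = smith_waterman("xxxkholmyyy", "zzkholmww")
```

In contrast to the global distance table, cell values never go below 0 and the answer is read from the maximum cell rather than from the bottom-right corner.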
![Page 16: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/16.jpg)
Computational identification of enhancer elements
• Preserved in evolution:
– Affinities of functional cis-elements.
– Spatial arrangement of elements within a module.

[Figure: the arrangement of binding sites in the Human sequence compared with the Mouse sequence.]
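The two preserved properties suggest aligning lists of binding sites rather than raw nucleotides. The sketch below chains pairs of same-factor sites from two species, rewarding high affinities and penalizing differences in spacing. The scoring here is a made-up stand-in and not the EEL scoring function: the name `align_sites`, the parameter `spacing_penalty`, and the site data are all hypothetical.

```python
def align_sites(sites_a, sites_b, spacing_penalty=0.01):
    """Local alignment of two site lists.

    Each site is (position, tf_name, affinity), lists sorted by position.
    HYPOTHETICAL scoring: matched sites of the same TF add their affinities;
    consecutive matched pairs pay spacing_penalty per bp of spacing difference.
    """
    score = {}  # (i, j) -> best chain score ending with site i paired to site j
    prev = {}   # (i, j) -> previous pair in that chain, or None
    best_score, best_end = 0.0, None
    for i, (pos_a, tf_a, aff_a) in enumerate(sites_a):
        for j, (pos_b, tf_b, aff_b) in enumerate(sites_b):
            if tf_a != tf_b:
                continue
            pair = aff_a + aff_b
            s, p = pair, None  # option: start a new local alignment at this pair
            for (k, l), prev_score in score.items():
                if k < i and l < j:
                    gap_a = pos_a - sites_a[k][0]
                    gap_b = pos_b - sites_b[l][0]
                    cand = prev_score + pair - spacing_penalty * abs(gap_a - gap_b)
                    if cand > s:
                        s, p = cand, (k, l)
            score[(i, j)] = s
            prev[(i, j)] = p
            if s > best_score:
                best_score, best_end = s, (i, j)
    chain, node = [], best_end
    while node is not None:
        chain.append(node)
        node = prev[node]
    return best_score, chain[::-1]


# Invented example sites, for illustration only.
human = [(100, "GLI", 5.0), (150, "X", 3.0), (300, "GLI", 4.0)]
mouse = [(40, "Y", 2.0), (90, "GLI", 5.0), (145, "X", 3.0), (280, "GLI", 4.0)]
total, pairs = align_sites(human, mouse)
```

The inner loop over all earlier pairs keeps the sketch short but makes it slow for long site lists.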
![Page 17: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/17.jpg)
Parameter optimization
• The scoring function has 3 free parameters.
• Find good parameters by greedy hill climbing using training data.
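Greedy hill climbing over a small parameter vector can be sketched as below. The objective is a toy function standing in for "how well the predictions match the training data"; the real objective, the step sizes and the names `hill_climb` and `toy_objective` are assumptions made for illustration.

```python
def hill_climb(objective, start, step=1.0, min_step=1e-3):
    """Greedy coordinate-wise hill climbing that maximizes objective(params).

    Tries +step and -step on each parameter, keeps any improvement,
    and halves the step when no single move helps.
    """
    current = list(start)
    best = objective(current)
    while step >= min_step:
        improved = False
        for k in range(len(current)):
            for delta in (step, -step):
                candidate = list(current)
                candidate[k] += delta
                value = objective(candidate)
                if value > best:
                    current, best, improved = candidate, value, True
        if not improved:
            step /= 2
    return current, best


def toy_objective(params):
    """HYPOTHETICAL stand-in for the training-data fit: peak at (2.0, -1.0, 0.5)."""
    target = (2.0, -1.0, 0.5)
    return -sum((p - t) ** 2 for p, t in zip(params, target))


params, value = hill_climb(toy_objective, start=(0.0, 0.0, 0.0))
```

Hill climbing only finds a local optimum, so the result can depend on the starting point.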
![Page 18: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/18.jpg)
Whole genome comparisons
• Whole genomes can be analyzed with our implementation EEL (Enhancer Element Locator)
• We compared human genes to orthologs in mouse, rat, chicken, fugu, tetraodon and zebrafish
– 100 kbp flanking regions on both sides of the gene.
– Coding regions masked out.
– About 20 000 comparisons for each pair of species.
![Page 19: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/19.jpg)
Annotating the Human genome with mammalian enhancer-elements
![Page 20: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/20.jpg)
EEL output
● Output from EEL program.
● Previously known functional sites are highlighted
● DNA between the sites is aligned just for the output
![Page 21: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/21.jpg)
Enhancer prediction for N-myc
[Figure: 200 kb Mouse N-Myc genomic region plotted against 200 kb Human N-Myc genomic region. Annotations: conserved GLI binding sites in two predicted enhancer elements, CM5 and CM7; coding region of N-Myc.]
![Page 22: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/22.jpg)
Wet-lab verification
● Selected some predicted enhancer modules for wet-lab verification
● Fused a 1 kb DNA segment containing the predicted enhancer to a marker gene (LacZ) with a minimal promoter, and generated transgenic embryos.
![Page 23: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/23.jpg)
Enhancer prediction for N-myc
![Page 24: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/24.jpg)
Summary
• input: ±100 kb flanking sequences of DNA of orthologous pairs of genes from human and mouse
• find all good enough TF binding sites from the sequences
• find the best local alignments of the binding sites using the EEL scoring function
• output: the sequences in good local alignments; these are the putative enhancers
• postprocessing: an expert biologist selects the most promising predictions for wet lab verification; hopefully he/she has good luck!
![Page 25: Finding regulatory modules from local alignment](https://reader035.vdocument.in/reader035/viewer/2022062518/56814524550346895db1ea36/html5/thumbnails/25.jpg)
Acknowledgements
• Kimmo Palin
• Outi Hallikas (Biom)
• Jussi Taipale (Biom)
The BioSapiens project is funded by the European Commission within its FP6 Programme, under the thematic area "Life sciences, genomics and biotechnology for health," contract number LHSG-CT-2003-503265.