the amadeus motif discovery platform c. linhart, y. halperin, r. shamir tel-aviv university aposys...
TRANSCRIPT
![Page 1: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/1.jpg)
The
AMADEUSMotif Discovery Platform
C. Linhart, Y. Halperin, R. Shamir
Tel-Aviv University
ApoSys workshop May ‘08Genome Research 2008
![Page 2: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/2.jpg)
• Transcription is regulated primarily by transcription factors (TFs) – proteins that bind to DNA subsequences, called binding sites (BSs)
• TFBSs are located mainly (not always!) in the gene’s promoter – the DNA sequence upstream the gene’s transcription start site (TSS)
• TFs can promote or repress transcription
Promoter Analysis:Exteremely brief intro
TFTF
Gene5’ 3’
BSBSTSS
![Page 3: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/3.jpg)
• The BSs of a particular TF share a common pattern, or motif, which is often modeled using:– Consensus string
TASDAC (S={C,G} D={A,G,T})
– Position weight matrix (PWM / PSSM)
Promoter Analysis (cont.)TFBS models
A0.10.800.70.20
C00.10.50.10.40.6
G000.50.10.40.1
T0.90.100.100.3
> Threshold = 0.01:
TACACC (0.06)TAGAGC (0.06)TACAAT (0.015)…
![Page 4: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/4.jpg)
Promoter Analysis (cont.): Typical pipeline
Cluster I
Cluster II
Cluster III
Gene expressionmicroarrays
Clustering
Location analysis(ChIP-chip, …)
Functional group(e.g., GO term)
Promotersequences
Motifdiscovery
Co-regulated gene set
![Page 5: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/5.jpg)
Reverse-engineer the transcriptional regulatory network = find the TFs (and their BSs) that regulate the studied biological process
Input: A set of co-expressed genes
Output: “Interesting” motif(s):
1. Known motifs: PRIMA, ROVER, …
2. Novel motifs: MEME, AlignACE, …
3. A group of co-occurring motifs = cis-regulatory module (CRM):
MITRA, CREME, …
Promoter Analysis (cont.): Goals
AMADEUS
![Page 6: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/6.jpg)
• Extant tools perform reasonably well for:
– Finding known/novel motifs in organisms with short, simple promoters, e.g., yeast
– Identifying some of the known motifs in complex species, e.g., TFs whose BSs are usually close to the TSS
• … but often fail in other cases!
• Each tool is custom-built for a specific target score, often parametric (i.e., assumes a BG model) or uses a small part of the genome as BG reference;
Majority of tools can efficiently handle only dozens of genes
• Comparison of tools: [Tompa et al. ’05]
Promoter Analysis: Status of motif discovery tools
![Page 7: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/7.jpg)
AMADEUSA Motif Algorithm for DetectingEnrichment in mUltiple Species
• Research platform:• Extensible: add new algs, scores, motif models• Flexible: control params, algs, scores of execution
• Experimental tool:• Sensitive: find subtle signals • Efficient: analyze many long sequences• Informative: show lots of info on motifs • User-friendly: nice GUI
![Page 8: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/8.jpg)
Main features: I/OInput:
• Type: target set / expression data• Multiple species / target-sets• Sequence region (promoter, 1st intron, 3’ UTR, …)
Output:• Non-redundant set of motifs• Rich info per output motif:
1. Graphical motif logo2. Multiple scores & combined p-value3. Similarity to known TFBS models4. List of target genes5. BS localization graph6. Targets mean expression graph
![Page 9: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/9.jpg)
Main features: alg.Algorithm: Multiple refinement phases: • Each phase receives best candidates of previous phase,
and refines them (e.g., uses a more complex motif model)• First phases are simple and fast (e.g., try all k-mers);
Last phases are more complex (e.g., optimize PWM using EM)
![Page 10: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/10.jpg)
Main features: scores
Motif scores: • User selects scores to use, a subset of:
─ Target-set: Over/under-representation: 1. Hypergeometric2. GC-content+length binned binomial
─ Expression: 1. Enrichment of ranked expression (multiple conditions)
(Not yet in the public version) ─ Global/spatial:
1. Localization2. Strand-bias3. Chromosomal preference
• Scores are combined into a single p-value• Doesn’t assume specific models for distribution of BSs
and/or expression values
![Page 11: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/11.jpg)
Main features: misc.
GUI:• Control all parameters• Save/load parameters from file• Save textual+graphical output to file• TFBS viewer
Other:• Ignore redundant sequences (with identical subsequence) • Applicable to multiple genome-scale promoter sequences • Bootstrapping: Empirical p-value estimation using
random target sets / shuffled data• Execution modes: GUI , batch• Interoperability: Java application
![Page 12: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/12.jpg)
Case study:G2 & G2/M phases of human cell
cycle [Whitfield et al. ’02]
CHR (not in TRANSFAC)
NF-Y
(Module was reported in [Linhart et al., ’05], [Tabach et al. ’05])Module: CHR and NF-Y motifs co-occur
![Page 13: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/13.jpg)
Benchmark I:Yeast TF target sets [Harbison et al.
’04]Source: ChIP-chip [Harbison et al., ’04]
Data: target-sets of 83 TFs with known BS motifs
Average set size: 58 genes (=35 Kbps)
Success rates: (for top 2 motifs of lengths 8 & 10)
![Page 14: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/14.jpg)
Performance on metazoan datasets
Results on 42 target-sets:• Collected from 29 publications• Based on high-throughput expr’s• Species: human, mouse, fly, worm • Sets: 26 TFs, 8 microRNAs• All have known motifs
![Page 15: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/15.jpg)
Global Analysis I:Localized human+mouse motifs
Input: • All human & mouse promoters (2 x ~20,000) • Region: -500…100 (w.r.t. TSS)• Total sequence length: ~26 Mbps• [No target-set / expression data]• Score: localization
Results: • Recovered known TFs: Sp1, NF-Y, GABP, TATA, Nrf-1, ATF/CREB, Myc, RFX1• Recovered the splice donor site• Identified several novel motifs
![Page 16: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/16.jpg)
Input: • All fly promoters (~14,000) • Region: -1000…200 (w.r.t. TSS)• Total sequence length: ~11 Mbps• [No target-set / expression data]• Score: chromosomal preference
Results: • DNA Replication Element Factor (DREF) on X chromosome
Global Analysis II:Chromosomal preference
![Page 17: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/17.jpg)
Global Analysis II:Chromosomal preference (cont.)
Input: • All worm promoters (~18,000) • Region: -500…100 (w.r.t. TSS)• Total sequence length: 6.6 Mbps• [No target-set / expression data]• Score: chromosomal preference
Results: • Novel motif on chrom IV
![Page 18: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/18.jpg)
Summary
• Developed Amadeus motif discovery platform:• Easy to use• Feature-rich, informative• Sensitive & efficient
• Constructed a large, real-life, heterogeneous
benchmark for testing motif finding tools• Demonstrated various applications of motif discovery• http://acgt.cs.tau.ac.il/amadeus
![Page 19: The AMADEUS Motif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University ApoSys workshop May ‘ 08 Genome Research 2008](https://reader034.vdocument.in/reader034/viewer/2022051115/56649cd75503460f9499f82f/html5/thumbnails/19.jpg)
Acknowledgements
Tel-Aviv UniversityChaim LinhartYonit HalperinRon Shamir
The Hebrew University of JerusalemGidi Weber