Post on 20-Dec-2015
Human: 78 tissues (Su et al, 2004)
Statistical significance
P. falciparum: intra-erythrocytic development cycle
Yeast: 78 co-expression clusters
From k-mers to motifs
Statistical significance
What is FIRE?
FIRE (for Finding Informative Regulatory Elements) is a highly sensitive approach for motif discovery from expression data, based on mutual information. It has the following characteristics:
• highly sensitive, with very few false positive predictions, if any,
• applicable to any type of expression data,
• obviates assumptions and parameter tuning often required by existing methods,
• simultaneously finds DNA and RNA motifs and explores their functional relationships,
• scales well to mammalian genomes,
• highlights the biological role of predicted motifs, their inter-species conservation, and spatial and orientation biases,
• characterizes motif interactions and co-localizations,
• displays the results in a user-friendly graphical format.
FIRE uses mutual information to discover and characterize motifs
Systematic exploration of cis-regulation using a generic computational framework
Olivier Elemento*, Noam Slonim* and Saeed Tavazoie (* equal contribution)
Lewis-Sigler Institute for Integrative Genomics, Princeton University
[Figure: discrete expression data (cluster index per gene) and continuous expression data (log-ratio per gene), each paired with the genes' 5’ upstream regions; further panels: position bias, co-occurrence]
[Figure: Cy3/Cy5 log-ratios, from down-regulated to up-regulated; motif labels: PAC, Rpn4, Yap1, Puf3]
Experiment: H2O2 treatment in ΔMsn2/ΔMsn4 background
[Figure: ~2,700 periodically expressed genes; axes: phase (-π to +π) and time (0h to 48h); scale label: change]
Similarity to ChIP-chip RAP1 motif (Lee et al, 2002)
Mutual information
Real mutual information value
Maximum of 10,000 expression-shuffled mutual information values
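The comparison named by these labels (real mutual information against the maximum over expression-shuffled values) can be sketched roughly as follows. This is a minimal illustration, not FIRE's implementation: the function names, the toy data, the base-2 logarithm, and the pass criterion (real value above the shuffled maximum) are all assumptions.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # c/n = P(x,y); px[x]/n = P(x); py[y]/n = P(y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def shuffle_test(motif_present, expression, n_shuffles=10_000, seed=0):
    """Return (real MI, maximum MI over shuffled expression labels)."""
    rng = random.Random(seed)
    real = mutual_information(motif_present, expression)
    shuffled = list(expression)
    max_shuffled = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks the gene-to-expression assignment
        max_shuffled = max(max_shuffled, mutual_information(motif_present, shuffled))
    return real, max_shuffled


# Hypothetical toy data: the motif occurs in every gene of cluster 1 and nowhere else.
clusters = [0] * 20 + [1] * 20 + [2] * 20
motif = [1 if c == 1 else 0 for c in clusters]
real, max_shuffled = shuffle_test(motif, clusters, n_shuffles=1000)
print(real > max_shuffled)
```

Shuffling the expression labels keeps both marginal distributions fixed, so any drop in mutual information reflects only the lost association between motif and expression.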
17 motifs in 5’ upstream regions
6 motifs in 3’UTRs
0 “motifs” when shuffling the gene labels of the clustering partition
1129 motifs when applying AlignACE (with default parameters) to each cluster independently
880 “motifs” when applying AlignACE to the same shuffled clusters as above
All 23 motifs are highly conserved with S. bayanus
> 50% of our predicted motifs have a non-random spatial distribution
$$I(X;Y) = \sum_{x}\sum_{y} P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}$$
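As a hedged worked example of this definition (the poster does not give the log base; base 2 is assumed here, so values are in bits): let X be the presence (1) or absence (0) of a hypothetical motif and Y a two-cluster index, with each value covering half of the genes. If the motif occurs in exactly the genes of one cluster, only two joint outcomes have nonzero probability, each 1/2, and zero-probability terms contribute nothing:

$$I(X;Y) = 2 \cdot \tfrac{1}{2}\,\log_2\frac{1/2}{(1/2)(1/2)} = \log_2 2 = 1 \text{ bit}$$

If instead the motif is spread evenly across both clusters, every joint probability equals P(x)P(y) = 1/4, so each log term vanishes:

$$I(X;Y) = 4 \cdot \tfrac{1}{4}\,\log_2\frac{1/4}{(1/2)(1/2)} = 0$$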
Mutual Information
21 motifs in 5’ upstream regions
0 motifs in 3’UTRs
0 “motifs” when shuffling the gene labels of the phase profile
71% highly conserved with P. yoelii
DNA replication, p<1e-4
plastid, p<0.01
ribosome, p<0.001
Bozdech, Llinás, et al, 2003
[Figure axis: phase, -π to +π]
Motifs informative about the phase?
Yeast: single microarray
Biological insights
• Importance of RNA motifs in shaping transcriptomes (~30% of yeast, worm, human, Arabidopsis motifs we found are RNA motifs)
• In worm/human/mouse, several RNA motifs match miRNA targets
• “Cooperation” between DNA and RNA motifs
• Avoidance of joint-presence for certain motifs
• Under-representation of certain motifs
Practical aspects
Unix command line:
perl fire.pl --expfile=human_clusters.txt --exptype=discrete --species=human
Human gene expression atlas (clustered)
PAC and the Msn2/4 binding site tend to avoid being in the same promoters
PAC and RRPE tend to co-localize on the DNA
(data from Gasch et al, 2000)
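One way to illustrate avoidance versus joint presence between two motifs is to compare how often they share a promoter with the count expected under independence, alongside the mutual information between their presence profiles. The sketch below is only an illustration under assumptions: the function name and the toy profiles are hypothetical, FIRE's actual interaction statistics may differ, and it covers co-occurrence only, not the distance between sites on the DNA.

```python
import math
from collections import Counter


def pair_interaction(a, b):
    """Return (MI in bits, observed co-occurrences, expected co-occurrences)."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    mi = sum((c / n) * math.log2(c * n / (pa[x] * pb[y]))
             for (x, y), c in joint.items())
    observed = joint[(1, 1)]          # promoters containing both motifs
    expected = pa[1] * pb[1] / n      # count expected if the motifs were independent
    return mi, observed, expected


# Hypothetical presence (1) / absence (0) profiles over 12 promoters.
motif_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
avoids_a = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # never shares a promoter with motif_a
with_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]    # mostly shares promoters with motif_a

for name, other in [("avoidance", avoids_a), ("co-occurrence", with_a)]:
    mi, obs, exp = pair_interaction(motif_a, other)
    print(name, round(mi, 3), obs, exp)
```

Mutual information alone is high for both strong avoidance and strong co-occurrence, so the observed-versus-expected comparison is what gives the direction of the interaction.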
Motif conservation with S. bayanus
The RAP1 binding site has a position and orientation bias
PAC
RRPE
PUF4
PUF3
MSN2/4
RAP1
RPN4
REB1
MBP1
HAP4
XBP1
BAS1
CBF1
SWI4
73 motifs in 5’ upstream regions
42 motifs in 3’UTRs
0 “motifs” when shuffling the gene labels of the clustering partition
ELK4
Sp1
miR-525/miR-526c
bZIP911
NF-Y
E2F1
miR-200b/miR-429
TCF11-MafG
Pax2
E2F
CHOP-C/EBPα
TCF11-MafG
…
(data from Su et al, 2004)
Several 3’UTR motifs match the 5’ extremity of microRNAs
(Data from Bozdech, Llinás, et al, 2003)