Outline
Who regulates whom and when?
Model
Learning algorithm
Evaluation
Wet lab experiments
Perspective: why does it work?
Gene Regulation: Simple Example
[Figure: a regulated gene with an activator and a repressor, shown in three states (State 1, State 2, State 3). Labels: DNA Microarray, Regulators, Regulated gene.]
Regulation Tree
[Figure: regulation tree with two true/false questions, "Activator?" and "Repressor?", whose leaves are State 1, State 2 and State 3. Labels: Regulation program, Module genes, Activator expression, Repressor expression.]
Genes in the same module share the same regulation program
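The regulation tree in the simple example can be read as a two-question decision procedure. The sketch below is illustrative only: it assumes boolean "expressed" flags for the two regulators, and it assumes the leaves are ordered as the slide lists them (no activator gives State 1, activator without repressor gives State 2, both give State 3). The function name is hypothetical.

```python
def regulation_state(activator_on: bool, repressor_on: bool) -> str:
    """Walk the two-question regulation tree and return the leaf (state) name.

    Assumed leaf layout: "Activator?" false -> State 1; otherwise
    "Repressor?" false -> State 2, true -> State 3.
    """
    if not activator_on:   # "Activator?" answered false
        return "State 1"
    if not repressor_on:   # "Repressor?" answered false
        return "State 2"
    return "State 3"       # activator and repressor both expressed


if __name__ == "__main__":
    for activator in (False, True):
        for repressor in (False, True):
            print(activator, repressor, regulation_state(activator, repressor))
```

Every gene in a module would be passed through the same function, which is what "share the same regulation program" means in practice.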
Module Networks
Goal: Discover regulatory modules and their regulators
Module genes: set of genes that are similarly controlled
Regulation program: expression as a function of regulators
[Figure: modules, each with a regulation tree (example splits: HAP4, CMK1; branches true/false)]
Expression level in each module is a function of expression of regulators
Module Network Probabilistic Model
[Figure: probabilistic model with variables Gene, Experiment, Module, Regulator1, Regulator2, Regulator3 and Level (Expression); example genes BMH1 and GIC2, example regulation tree on HAP4 and CMK1]
What module does gene “g” belong to?
Expression level of Regulator1 in experiment
P(Level | Module, Regulators)
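To make P(Level | Module, Regulators) concrete, here is a minimal sketch that assumes each module's regulation tree ends in Gaussian leaves (as the assignment at the end of the deck describes). The tree layout, the thresholds, the means and the standard deviations are all made-up illustration values, and the function names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


# Hypothetical regulation tree for one module: internal nodes ask whether a
# regulator's expression exceeds a threshold; leaves hold Gaussian parameters.
TREE = {
    "regulator": "HAP4", "threshold": 0.0,
    "low": {"mean": -1.0, "std": 0.5},
    "high": {
        "regulator": "CMK1", "threshold": 0.0,
        "low": {"mean": 0.5, "std": 0.5},
        "high": {"mean": 2.0, "std": 0.5},
    },
}


def find_leaf(tree, regulators):
    """Follow the regulator expression values of one experiment down to a leaf."""
    node = tree
    while "regulator" in node:
        above = regulators[node["regulator"]] > node["threshold"]
        node = node["high" if above else "low"]
    return node


def log_p_level(level, tree, regulators):
    """log P(Level | Module, Regulators) for one gene in one experiment."""
    leaf = find_leaf(tree, regulators)
    return gaussian_logpdf(level, leaf["mean"], leaf["std"])


if __name__ == "__main__":
    experiment = {"HAP4": 1.3, "CMK1": -0.4}   # made-up regulator levels
    print(log_p_level(0.7, TREE, experiment))
```

The module variable enters through the choice of tree: every gene assigned to the module is scored with the same `TREE`.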
Learning Problem
[Figure: the probabilistic model (Gene, Experiment, Module, Regulator1, Regulator2, Regulator3, Level) with a regulation tree on HAP4 and CMK1]
Goal: Find gene module assignments and tree structures that maximize P(M|D)
Gene module assignments
Tree structures
Hard (Genes: 5000-10000; Regulators: ~500)
Learning Algorithm Overview
[Figure: learning loop. Clustering gives a gene module assignment; learn regulation programs (example tree: HAP4, CMK1) to obtain regulatory modules; relearn gene assignments to modules; repeat.]
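The "relearn gene assignments to modules" step of the loop can be sketched as follows. This is a simplified illustration, not the deck's actual procedure: it assumes each module's regulation program has already been reduced to one Gaussian (mean, std) per experiment, and it moves every gene independently to the module that gives it the highest log-likelihood. The data layout and function names are hypothetical.

```python
import math


def gene_log_likelihood(levels, means, stds):
    """Sum of Gaussian log densities of one gene's levels across experiments."""
    return sum(
        -0.5 * math.log(2 * math.pi * s * s) - (x - m) ** 2 / (2 * s * s)
        for x, m, s in zip(levels, means, stds)
    )


def reassign_genes(expression, programs):
    """Assign every gene to the module whose program explains it best.

    expression: dict gene -> list of levels, one per experiment
    programs:   dict module -> (means, stds), one value per experiment, i.e. the
                leaf Gaussian each experiment maps to (hypothetical layout)
    """
    return {
        gene: max(programs, key=lambda m: gene_log_likelihood(levels, *programs[m]))
        for gene, levels in expression.items()
    }


if __name__ == "__main__":
    programs = {
        "module_A": ([1.0, 1.0], [0.5, 0.5]),
        "module_B": ([-1.0, -1.0], [0.5, 0.5]),
    }
    expression = {"gene1": [0.9, 1.2], "gene2": [-1.1, -0.8]}
    print(reassign_genes(expression, programs))
```

Alternating this step with relearning the regulation programs gives the iterative structure shown in the overview.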
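Learning Regulation Programs
[Figure: heat map of module genes (rows) by experiments (columns), first with experiments sorted in original order, then with experiments sorted by Hap4 expression and split on the regulator]
No split:
$\log P(M \mid D) \propto \log P(D \mid \mu, \sigma) + \log P(\mu, \sigma)$
Split on HAP4:
$\log P(M \mid D) \propto \log P(D_{\mathrm{HAP4}\uparrow} \mid \mu_{\mathrm{HAP4}\uparrow}, \sigma_{\mathrm{HAP4}\uparrow}) + \log P(D_{\mathrm{HAP4}\downarrow} \mid \mu_{\mathrm{HAP4}\downarrow}, \sigma_{\mathrm{HAP4}\downarrow}) + \log P(\mu_{\mathrm{HAP4}\uparrow}, \sigma_{\mathrm{HAP4}\uparrow}, \mu_{\mathrm{HAP4}\downarrow}, \sigma_{\mathrm{HAP4}\downarrow})$
Split on SIP4:
$\log P(M \mid D) \propto \log P(D_{\mathrm{SIP4}\uparrow} \mid \mu_{\mathrm{SIP4}\uparrow}, \sigma_{\mathrm{SIP4}\uparrow}) + \log P(D_{\mathrm{SIP4}\downarrow} \mid \mu_{\mathrm{SIP4}\downarrow}, \sigma_{\mathrm{SIP4}\downarrow}) + \log P(\mu_{\mathrm{SIP4}\uparrow}, \sigma_{\mathrm{SIP4}\uparrow}, \mu_{\mathrm{SIP4}\downarrow}, \sigma_{\mathrm{SIP4}\downarrow})$
Split on HAP4, then on CMK1:
$\log P(M \mid D) \propto \log P(D_{\mathrm{HAP4}\uparrow} \mid \mu_{\mathrm{HAP4}\uparrow}, \sigma_{\mathrm{HAP4}\uparrow}) + \log P(D_{\mathrm{CMK1}\uparrow} \mid \mu_{\mathrm{CMK1}\uparrow}, \sigma_{\mathrm{CMK1}\uparrow}) + \log P(D_{\mathrm{CMK1}\downarrow} \mid \mu_{\mathrm{CMK1}\downarrow}, \sigma_{\mathrm{CMK1}\downarrow}) + \dots$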
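The idea of scoring candidate regulators by how well their split explains the module's expression can be sketched as below. Note the substitution: this sketch scores a split by the maximum-likelihood Gaussian log-likelihood of each side, not by the Bayesian score P(M|D) used in the deck, so it omits the prior term. The data layout, the `min_arrays` parameter and the function names are assumptions for illustration.

```python
import math
from statistics import fmean, pstdev


def gaussian_loglik(values, min_std=1e-3):
    """Log-likelihood of values under a Gaussian fitted to those same values."""
    mu = fmean(values)
    sd = max(pstdev(values), min_std)   # floor avoids a zero-variance leaf
    return sum(
        -0.5 * math.log(2 * math.pi * sd * sd) - (v - mu) ** 2 / (2 * sd * sd)
        for v in values
    )


def best_split(module_expr, regulator_expr, min_arrays=1):
    """Return (score, regulator, threshold) of the best single split, or None.

    module_expr[e]:          expression levels of the module genes in experiment e
    regulator_expr[name][e]: expression level of a candidate regulator in experiment e
    """
    n = len(module_expr)
    best = None
    for name, levels in regulator_expr.items():
        for threshold in sorted(set(levels)):
            high = [e for e in range(n) if levels[e] > threshold]
            low = [e for e in range(n) if levels[e] <= threshold]
            if len(high) < min_arrays or len(low) < min_arrays:
                continue
            score = sum(
                gaussian_loglik([x for e in side for x in module_expr[e]])
                for side in (low, high)
            )
            if best is None or score > best[0]:
                best = (score, name, threshold)
    return best


if __name__ == "__main__":
    module_expr = [[-1.0, -1.1], [-0.9, -1.0], [1.0, 1.1], [0.9, 1.0]]
    regulator_expr = {"HAP4": [-2.0, -1.0, 1.0, 2.0], "SIP4": [1.0, -1.0, 1.0, -1.0]}
    print(best_split(module_expr, regulator_expr))
```

In the toy data the module genes switch from low to high exactly where the made-up HAP4 levels cross between -1 and 1, so the HAP4 split separates the experiments cleanly while the SIP4 split does not.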
Learning Algorithm Performance
[Figure: Bayesian score (avg. per gene; axis from -131 to -128) versus algorithm iterations (0 to 20)]
[Figure: gene module assignment changes (% from total; axis from 0 to 50) versus algorithm iterations (0 to 20)]
Significant improvements across learning iterations
Many genes (50%) change module assignment in learning
Yeast Stress Data
Genes: selected 2355 that showed activity
Experiments (173): diverse environmental stress conditions: heat shock, nitrogen depletion, …
Comparison to Bayesian Networks
Problems: robustness, interpretability
Bayesian Network [Friedman et al. ’00; Hartemink et al. ’01]
Expression level of each gene is a function of expression of regulators
[Figure: fragment of learned Bayesian network over genes including Cmk1, Hap4, Mig1, Ste12, Yap1, Gic1]
2355 variables (genes), 173 instances (experiments)
Module Network [SPRKF ’03 (UAI)]
Solutions: robustness through sharing parameters; interpretability through a module-level model
[Figure: module network in which Regulator1, Regulator2 and Regulator3 determine Level through Module]
[Figure: test data log-likelihood (gain per instance; axis from -150 to 150) versus number of modules (0 to 500), with the Bayesian Network performance marked for comparison]
Learn which parameters are shared (by learning which genes are in the same module)
From Model to Regulatory Modules
[Figure: from the learned model (Module, Regulator1, Regulator2, Regulator3, Level) to a regulatory module with a regulation tree on HAP4 and CMK1]
Biologically relevant?
Respiration Module
[Figure: respiration module, showing the regulation program with the predicted regulator and the module genes]
Module genes functionally coherent? Energy production (oxid. phos. 26/55, P<10^-30)
Module genes known targets of predicted regulators? Hap4+Msn4 known to regulate module genes
Energy, Osmolarity, & cAMP Signaling
Tpk1: Regulation by non-TFs
(Tpk1 is a catalytic subunit of cAMP-dependent protein kinase)
Module contains known Tpk1 targets (e.g. Tps1)
Tpk1-mediated STRE motif (50/64 genes; p<3x10^-11)
EM: Biological Improvement
[Figure: scatter plot of negative log p-value (module network) versus negative log p-value (standard clustering), both axes from 0 to 45]
[Figure: overview of the inferred regulatory network linking regulators, numbered modules and enriched cis-regulatory motifs]
Regulators shown: Hap4, Xbp1, Yer184c, Yap6, Gat1, Ime4, Lsg1, Msn4, Gac1, Gis1, Ypl230w, Not3, Sip2, Kin82, Cmk1, Tpk1, Ppt1, Tpk2, Pph3, Bmh1, Gcn20
Module groups: Amino acid metabolism; Energy and cAMP signaling; DNA and RNA processing; nuclear
Motifs shown: STRE, HAP234, REPCAR, CAT8, ADR1, HSF, HAC1, XBP1, MCM1, ABF_C, GATA, GCN4, CBF1_B, GCR1, MIG1, N41, N26, N30, N36, N11, N14, N13, N18
Legend: Regulator (signaling molecule); Regulator (transcription factor); Module (number); Enriched cis-regulatory motif; Inferred regulation; Regulation supported in literature; Experimentally tested regulator
Biological Evaluation Summary
Are the module genes functionally coherent? 46/50
Are some module genes known targets of the predicted regulators? 30/50
Functionally coherent = module genes enriched for GO annotations with hypergeometric p-value < 0.01 (corrected for multiple hypotheses)
Known targets = direct biological experiments reported in the literature
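The hypergeometric enrichment test behind "functionally coherent" can be sketched with the standard library alone. The deck does not say which multiple-hypothesis correction was used, so the Bonferroni correction below is an assumption, and the numbers in the usage example (genome size, annotation size, number of tests) are made up for illustration.

```python
from math import comb


def hypergeom_pvalue(k, module_size, annotated, population):
    """P(X >= k): chance of seeing at least k annotated genes in a module.

    The module is treated as module_size genes drawn without replacement from
    population genes, of which annotated carry the annotation of interest.
    """
    upper = min(module_size, annotated)
    total = comb(population, module_size)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, module_size - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def bonferroni(p_value, n_tests):
    """Bonferroni correction (an assumed choice; the source names no method)."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Hypothetical example: 26 of 55 module genes carry an annotation shared
    # by 60 genes in a population of 2355, tested against 500 annotations.
    p = hypergeom_pvalue(26, 55, 60, 2355)
    print(p, bonferroni(p, 500) < 0.01)
```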
From Model to Detailed Predictions
Prediction: Regulator ‘X’ regulates process ‘Y’
Experiment: Knock out ‘X’ and repeat experiment
[Figure: regulation tree on HAP4 and Ypl230w; Ypl230w is crossed out (X) and the outcome is marked "?"]
Does ‘X’ Regulate Predicted Genes?
Experiment: knock out Ypl230w (stationary phase)
1334 regulated genes (312 expected by chance)
[Figure: regulated genes in wild-type versus mutant (label: >4x)]
Rank modules by regulated genes
Module                            Sig.
Protein folding                   P<0.0001
Cell differentiation              P<0.02
Glycolysis and folding            P<0.04
Mitochondrial and protein fate    P<0.04
Predicted modules: modules predicted to be regulated by Ypl230w
Ypl230w regulates computationally predicted genes
Does ‘X’ Regulate Predicted Genes?
Ppt1 knockout (hypo-osmotic stress): regulated genes (1014)
[Figure: regulated genes in wild-type versus mutant]
Module                                 Sig.
Ribosomal and phosphate metabolism     P<0.009
Amino acid and purine metabolism       P<0.01
mRNA, rRNA and tRNA processing         P<0.02
Protein folding                        P<0.02
Cell cycle                             P<0.02
Kin82 knockout (heat shock): regulated genes (1034)
[Figure: regulated genes in wild-type versus mutant]
Module                                 Sig.
Energy and osmotic stress              P<0.0001
Energy, osmolarity & cAMP signaling    P<0.006
mRNA, rRNA and tRNA processing         P<0.02
Wet Lab Experiments Summary
3/3 regulators regulate computationally predicted genes
New yeast biology suggested:
Ypl230w activates protein-folding, cell wall and ATP-binding genes
Ppt1 represses phosphate metabolism and rRNA processing
Kin82 activates energy and osmotic stress genes
Why does it work?
Underlying assumption: regulators are transcriptionally regulated
Regulators are part of regulatory structures in which they are themselves regulated*
Statistical methods can detect associations between regulators and their targets
* [Shen-Orr et al., ’02] find many such structures
Regulator Chain
Respiration module
Chain: Phd1 (TF), Hap4 (TF), targets Cox4, Cox6, Atp17
[Figure: active protein level and mRNA expression level over time for Phd1, Hap4 and the targets]
Black: regulators that cannot be detected; Red: correctly predicted regulator; Blue: targets
Auto Regulation
Snf kinase regulated processes module
Yap6 (TF)
Vid24 Tor1 Gut2
Black: regulators that cannot be detected; Red: correctly predicted regulator; Blue: targets
Positive Signaling Loop
Sporulation and cAMP pathway module
Sip2 (SM)
Msn4 (TF)
Vid24 Tor1 Gut2
Black: regulators that cannot be detected; Red: correctly predicted regulator; Blue: targets
Negative Signaling Loop
Energy and osmotic stress module
Tpk1 (SM)
Msn4 (TF)
Nth1 Tps1 Glo1
Black: regulators that cannot be detected; Red: correctly predicted regulator; Blue: targets
Why Does it Work?
Feed-forward and feedback loops
Some transcription factors and signal transduction molecules have a detectable expression signature
Module Networks infers their regulatory relationships
Assignment
Download the yeast stress expression dataset
Download the list of transcription factor regulators
Randomly partition the dataset in a 5-fold cross validation scheme
For k=50:
  Create a hard-clustering model (use code from earlier exercise). At each array, this model has a separate Gaussian distribution for each of the 50 values of the cluster variable
  Use the assignment of genes to clusters that you learned in the hard-clustering, and for each cluster, learn a decision tree with at most: (1) one split (2) two splits (3) three splits
  Note 1: allow only splits with >=5 arrays in each side of the split
  Note 2: split question is whether the expression level of the transcription factor is greater than some value
Assignment Continued
  Note 3: at each leaf of the resulting model, there is a single Gaussian distribution that is used for all arrays that map to that leaf
Compute the log-likelihood of the test data for each model (hard-clustering, and each of the three regulation models)
Plot the avg. and std. test log-likelihood for each model
For the model with two splits on each cluster, use the Gaussian distribution at each array to sample a new expression dataset with exactly the same number of genes and number of arrays. For each original gene and array, you sample from the Gaussian distribution associated with that gene and that array
Learn a model with two splits for each cluster
Plot the number of regulation tree splits that are identical between the model that sampled the data and the new model that you learned
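A possible starting point for the one-split case is sketched below, using only the standard library. It is a sketch under stated assumptions, not a reference solution: the assignment does not say whether folds are taken over genes or arrays, so `five_fold_indices` only partitions a range of indices; the data layout (`expr[a]` holding the cluster's gene values at array `a`) and all function names are hypothetical; and splits are scored by maximum-likelihood Gaussian fits.

```python
import math
import random
from statistics import fmean, pstdev


def five_fold_indices(n_items, n_folds=5, seed=0):
    """Randomly partition range(n_items) into n_folds nearly equal folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def fit_gaussian(values, min_std=1e-3):
    return fmean(values), max(pstdev(values), min_std)


def learn_one_split(expr, tf_expr, min_side=5):
    """Best single split for one cluster, or None if no split is allowed.

    expr[a]:        expression values of the cluster's training genes at array a
    tf_expr[tf][a]: expression of a transcription factor at array a
    Note 1 is enforced by min_side; Note 2 by the "level > threshold" question.
    Returns (train log-likelihood, tf, threshold, {"low": (mean, std), "high": ...}).
    """
    arrays = range(len(expr))
    best = None
    for tf, levels in tf_expr.items():
        for threshold in sorted(set(levels)):
            sides = {
                "low": [a for a in arrays if levels[a] <= threshold],
                "high": [a for a in arrays if levels[a] > threshold],
            }
            if min(len(s) for s in sides.values()) < min_side:
                continue
            leaves, score = {}, 0.0
            for name, side in sides.items():
                values = [x for a in side for x in expr[a]]
                leaves[name] = fit_gaussian(values)   # Note 3: one Gaussian per leaf
                score += sum(logpdf(x, *leaves[name]) for x in values)
            if best is None or score > best[0]:
                best = (score, tf, threshold, leaves)
    return best


def loglik(expr, tf_expr, model):
    """Log-likelihood of (held-out) values under a learned one-split model."""
    _, tf, threshold, leaves = model
    total = 0.0
    for a, values in enumerate(expr):
        mean, std = leaves["high" if tf_expr[tf][a] > threshold else "low"]
        total += sum(logpdf(x, mean, std) for x in values)
    return total
```

Two and three splits follow by applying `learn_one_split` again to the arrays that fall in one leaf, and sampling a synthetic dataset amounts to calling `random.gauss(mean, std)` with the leaf parameters for each gene and array.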