Post on 20-Jan-2021
A hierarchical regression mixture model for inferring gene regulatory networks
Mayetri Gupta (gupta@bios.unc.edu)
University of North Carolina at Chapel Hill
gupta@bios.unc.edu -- p.1/30
Upstream regulation ↔ Downstream expression
Fundamental question: how can we understand the biological mechanisms leading to disease?
...gtggtTAGAATagcgactgttttt... gene 1
...taggTATAATacagtctgacaaaa... gene 2
...cagcaacattgaTATAATtgccat... gene 3
...ctaaaacaatTATTATttatcagg... gene 4
[Figure: sequence logo of the shared upstream motif; information content (0 to 2 bits) at positions 1 to 6]
Co-regulated genes share similar upstream patterns
Identify genes that are differentially expressed under different treatments or conditions
Gene Regulation: DNA Motifs
Proteins bind to DNA to activate gene transcription
[Figure: sequence logo; information content (0 to 2 bits) at positions 1 to 6]
Position-specific weight matrix (PSWM), or motif
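As a rough illustration of what a PSWM and its logo encode, the sketch below builds column frequencies from the four aligned sites shown for genes 1 to 4 (TAGAAT, TATAAT, TATAAT, TATTAT) and computes the information content per position as 2 minus the column entropy in bits. The function names are hypothetical, and no pseudocounts or small-sample correction are applied, so this is a simplification rather than the procedure behind the slide's logo.

```python
import math
from collections import Counter

# The four upstream sites highlighted in the example sequences (genes 1 to 4).
SITES = ["TAGAAT", "TATAAT", "TATAAT", "TATTAT"]


def position_frequencies(sites):
    """Return one {base: frequency} dict per motif position (no pseudocounts)."""
    width = len(sites[0])
    freqs = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        freqs.append({base: counts[base] / len(sites) for base in "ACGT"})
    return freqs


def information_bits(column):
    """Information content of one column: 2 bits minus its Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


pswm = position_frequencies(SITES)
bits = [information_bits(col) for col in pswm]
```

A fully conserved column (position 1, always T) carries 2 bits, while a column split 3:1 between two bases carries about 1.19 bits.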
Gene regulation in complex genomes
Harder problem: many transcription factors working in co-ordination
LARGE sequence search space: using sequence data only → many false positives?
Upstream regulation ↔ Downstream expression
Gene expression contains information about sequence motifs
Sequence may contain information on gene co-regulation
Expression clustering → Motif discovery
or
Expression clustering ↔ Motif discovery?
What if initial clustering is inaccurate?
Cell-cycle data set (Spellman, Mol Biol Cell. 1998)
[Figure: gene expression against time (minutes) for cell-cycle genes; phases G1, S, S/G2, G2/M, M/G1]
Measurements over 18 time points, 3 different experiments
∼ 800 genes known to be cell-cycle dependent
Do clusters of genes share common TFs?
Do certain TFs work combinatorially on groups of genes?
color: time when gene is active
Using Gene Expression in Motif Discovery
REDUCE (Bussemaker, Nat. Genet. 2001) correlates expression of gene with number of motif occurrences
MDScan (Liu, Nat Biotech. 2002): most strongly differentially expressed genes → candidate motifs
Motif Regressor (Conlon, PNAS 2003): multiple regression model; sum of motif effects explains gene expression
$$Y_g = \alpha + \sum_{m=1}^{M} \beta_m S_{mg} + \varepsilon_g$$
$Y_g$: expression of gene $g$; $S_{mg}$: motif-match score
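The multiple regression of expression on motif scores can be sketched with ordinary least squares. This is only a minimal illustration of the linear model above: the function names and the toy scores are hypothetical, and the published Motif Regressor procedure (including its motif selection steps) is not reproduced here.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_motif_regression(scores, expr):
    """Least-squares fit of Y_g = alpha + sum_m beta_m * S_mg.

    scores[g] holds the motif-match scores of gene g; expr[g] is Y_g.
    Returns [alpha, beta_1, ..., beta_M] from the normal equations.
    """
    design = [[1.0] + list(row) for row in scores]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, expr)) for i in range(p)]
    return solve_linear(xtx, xty)


# Toy, noiseless data (hypothetical): alpha = 0.5, beta = (2, -1).
scores = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0], [1.0, 2.0]]
expr = [0.5 + 2.0 * s1 - 1.0 * s2 for s1, s2 in scores]
coef = fit_motif_regression(scores, expr)
```

With noiseless toy data the fit recovers the generating coefficients; real expression data would add the error term and call for a check of which coefficients are significant.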
Using Gene Expression in Motif Discovery
Non-parametric approaches
Phuong et al. (Bioinformatics, 2004): Classification Trees (CART)
Arbitrary decision criterion based on the number of occurrences of a motif type
Multivariate adaptive regression splines (Das et al, PNAS 2004)
Joint sequence-expression model without parametric connection
Holmes and Bruno (ISMB proc., 2000): joint likelihood for sequence and expression data
Using motif information in gene clustering
Infer sets of transcription factors involved in regulating groups of genes
Higher transcriptional activity → greater presence of TF binding sites, more pronounced expression changes
Genes within a “cluster” may be correlated, with or without sharing common transcription factors
Measurements on the same gene in different conditions may be correlated due to sharing the same upstream transcription factor binding sites
Linear mixed effects model
$$y = X\beta + Zb + \varepsilon$$
$\beta$: fixed effects
$b$: random effects
$\varepsilon \sim N(0, \tau_0^{-1} I)$
$y$: gene expression
Fixed effects: Sequence motif
Levels of factor are reproduced exactly if experiment is repeated
Random effects: Expression cluster
Levels of factor (expression + clusters) may not be reproduced exactly if experiment is repeated
Joint Model for Sequence-Expression
Complication: gene cluster identity cannot be assumed known; Z matrix not “fixed”
Conditional on cluster $k$ ($k = 1, \ldots, K$), vector of log-expression values of gene $g$ generated from mixed-effects model:
$$Y_g \mid z_g = k, X, \text{parameters} \sim N(\xi_g + X_g^T \beta_k \mathbf{1}, \sigma_k^2 I) \quad (\equiv f_k)$$
Unconditionally, a mixture model
$$P(Y \mid X, \text{parameters}) = \prod_{\text{genes } g} \left[ \sum_{\text{cluster } k} \pi_k f_k(Y_g \mid \text{parameters}) \right]$$
Which β’s significant, in which cluster?
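The mixture likelihood above can be evaluated on the log scale with a log-sum-exp over clusters. The sketch below assumes the gene-level terms ξ_g are given as lists of length T and that all parameters are known; the names `log_f_k` and `mixture_log_likelihood` and the toy inputs are hypothetical, chosen only to mirror the notation of the slide.

```python
import math


def log_normal(y, mean, var):
    """Log density of a univariate normal N(mean, var) at y."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)


def log_f_k(y_g, xi_g, x_g, beta_k, var_k):
    """log f_k(Y_g): independent normals with mean xi_g[t] + X_g^T beta_k."""
    shift = sum(x * b for x, b in zip(x_g, beta_k))
    return sum(log_normal(y, xi + shift, var_k) for y, xi in zip(y_g, xi_g))


def mixture_log_likelihood(Y, Xi, X, pis, betas, variances):
    """Sum over genes of log( sum_k pi_k f_k(Y_g) ), computed stably."""
    total = 0.0
    for y_g, xi_g, x_g in zip(Y, Xi, X):
        terms = [
            math.log(p) + log_f_k(y_g, xi_g, x_g, b, v)
            for p, b, v in zip(pis, betas, variances)
        ]
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


# Toy inputs (hypothetical): 2 genes, T = 2 measurements, 2 motif scores.
Y = [[1.0, 1.2], [-0.5, -0.3]]
Xi = [[0.0, 0.0], [0.0, 0.0]]
X = [[0.5, 0.0], [0.0, 1.0]]
one_cluster = mixture_log_likelihood(Y, Xi, X, [1.0], [[2.0, 0.0]], [1.0])
two_equal = mixture_log_likelihood(
    Y, Xi, X, [0.3, 0.7], [[2.0, 0.0], [2.0, 0.0]], [1.0, 1.0]
)
```

Two clusters with identical parameters give the same value as a single cluster, which is a quick sanity check on the mixing weights.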
Bayesian hierarchical formulation
Prior distributions for cluster $k$ parameters:
“Expression” model:
$\mu_k \sim N(\cdot, v_{k0}^2 I)$
$\sigma_k^2 \sim \mathrm{InvGamma}(\cdot, \cdot)$
$\xi_g \mid z_g = k \sim N(\mu_k, \tau_0 \sigma_k^2 I)$
“Sequence” model: $\beta_k \sim N(\beta_0, V_k)$
Probabilities of cluster membership: $P(z_g = k) = \pi_k$
$(\pi_1, \ldots, \pi_K) \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_K)$
$G$ genes; $T$ measurements per gene
Bayesian hierarchical formulation
For each $\beta_k$, we use a multivariate extension of the g-prior (Zellner, 1986), so that
$$V_k = \frac{c \sigma_k^2}{T} S_k^{-1}, \quad \text{where } S_k = \sum_{g:\, z_g = k} X_g X_g^T$$
Why use g-prior?
Computational efficiency; varying $c$ → more/less informative
Induces dependence among genes in a cluster due to sequence effects
$$\mathrm{Cov}(Y_g, Y_{g'} \mid Z_g, Z_{g'} = k) = v_{k0}^2 I + \frac{c \sigma_k^2}{T} \mathbf{1} \left[ X_g^T S_k^{-1} X_g \right] \mathbf{1}^T$$
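To make the g-prior covariance concrete, the sketch below computes V_k = (c σ_k² / T) S_k⁻¹ for the special case of two motif scores per gene, so that a closed-form 2×2 inverse suffices. The function names and toy values are hypothetical; a general implementation would use a linear algebra library.

```python
def outer_sum(rows):
    """S_k: sum over the genes in a cluster of X_g X_g^T (2 motif scores per gene)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in rows:
        for i in range(2):
            for j in range(2):
                s[i][j] += x[i] * x[j]
    return s


def inverse_2x2(m):
    """Closed-form inverse of a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("S_k is singular; the g-prior is undefined")
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def g_prior_covariance(rows, c, sigma2, t):
    """V_k = (c * sigma_k^2 / T) * S_k^{-1}."""
    s_inv = inverse_2x2(outer_sum(rows))
    scale = c * sigma2 / t
    return [[scale * v for v in row] for row in s_inv]


# Toy cluster (hypothetical scores): two genes, two motifs, T = 2 measurements.
cluster_scores = [[1.0, 0.0], [0.0, 2.0]]
v_k = g_prior_covariance(cluster_scores, c=2.0, sigma2=1.0, t=2)
v_k_diffuse = g_prior_covariance(cluster_scores, c=4.0, sigma2=1.0, t=2)
```

Doubling c doubles every entry of V_k, which is the sense in which varying c makes the prior more or less informative.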
Regulatory Motif Model
$\cdots\ \theta_0\ \theta_0\ \theta_1 \cdots \theta_6\ \theta_0\ \theta_0\ \cdots$
Every non-site position multinomial with $\theta_0 = (\theta_{01}, \ldots, \theta_{04})$
Every motif position $i$ multinomial with $\theta_i = (\theta_{i1}, \ldots, \theta_{i4})$
Product multinomial model
Challenge: Find position of sites and θ’s
MEME (Bailey, ISMB 1994), Gibbs Motif Sampler (Liu, JASA 1995), AlignACE (Roth, Nat. BioTech 1998), BioProspector (Liu, Pac. Symp. Biocomp. 2001), Stochastic Dictionary (Gupta, JASA 2003)
Sequence Motif Scoring
Starting set of motifs
De-novo: Different clusters of genes exhibiting “strong” up- or down-regulation (MDScan)
Databases: Derived from experimental data
Motif score for $w$-width motif $j$ and upstream sequence $g$:
$$X_{gj} = \sum_{\text{positions } i} \frac{P(\mathrm{seq}(i, i+w-1) \mid \text{motif } j)}{P(\mathrm{seq}(i, i+w-1) \mid \text{null})}$$
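The motif score can be sketched as a sliding-window sum of likelihood ratios, with the motif likelihood given by the product multinomial model and the null by independent background frequencies. The toy PSWM, the uniform background and the function names are assumptions made for illustration; only one strand is scanned.

```python
def window_likelihood_ratio(window, pswm, background):
    """P(window | motif) / P(window | null) under the product multinomial model."""
    ratio = 1.0
    for pos, base in enumerate(window):
        ratio *= pswm[pos][base] / background[base]
    return ratio


def motif_score(sequence, pswm, background):
    """X_gj: sum of likelihood ratios over all start positions i in the sequence."""
    w = len(pswm)
    return sum(
        window_likelihood_ratio(sequence[i:i + w], pswm, background)
        for i in range(len(sequence) - w + 1)
    )


# Hypothetical width-2 motif favouring "AT", with a uniform background.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pswm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
score_at = motif_score("AT", pswm, background)
score_atg = motif_score("ATG", pswm, background)
```

For the sequence "ATG" the windows "AT" and "TG" contribute 7.84 and 0.16, so a strong match dominates the score.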
Sequence motif selection
Initial set of $D$ motif candidates ($D$ can be large!)
In regression model, want to know which motifs correlated “significantly” with response (gene expression)
$u = (u_1, \ldots, u_D)$ where
$$u_j = \begin{cases} 1 & \text{if motif } j \text{ is in model} \\ 0 & \text{otherwise} \end{cases}$$
Prior probability of motif inclusion
$$P(u) = \prod_{j=1}^{D} \eta^{u_j} (1 - \eta)^{1 - u_j}$$
Variable selection from LARGE potential set
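The inclusion prior is a product of independent Bernoulli terms, so its log depends only on how many motifs are switched on. A minimal sketch, with a hypothetical function name and example values:

```python
import math


def log_inclusion_prior(u, eta):
    """log P(u) = sum_j [ u_j log(eta) + (1 - u_j) log(1 - eta) ]."""
    n_in = sum(u)
    return n_in * math.log(eta) + (len(u) - n_in) * math.log(1.0 - eta)


# Hypothetical example: D = 4 candidates, motifs 1 and 4 in the model, eta = 0.1.
u = [1, 0, 0, 1]
log_prior = log_inclusion_prior(u, eta=0.1)
```

A small η favours sparse models, which matters when D is large.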
Outline of method
select motifs from large initial set
given motifs active in each class, update clusters
update model parameters given cluster membership and functional motifs in each class
Complications
Cluster membership z unknown
Number of clusters K may be unknown (assume fixed for now)
Number of motif candidates is large
Parameter updating using MCMC
Update from joint posterior distribution
$$P(\theta, \beta, u, z \mid Y, X, K)$$
$\theta = (\mu, \sigma^2, \pi)$
For updating steps for parameters $\theta$, $\beta$ and $z$, marginalize over other parameters for efficiency (conjugate forms permit this)
For updating $u$, use evolutionary Monte Carlo (Liang and Wong, JASA 2001)
Select motifs that have most effect on expression, and differentiate most among clusters
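One step of such a sampler, the update of a gene's cluster label, can be sketched as a draw from P(z_g = k | ...) ∝ π_k f_k(Y_g). Note the simplification: this version conditions on current parameter values instead of marginalizing over them as the slide describes, and the function names are hypothetical.

```python
import math
import random


def membership_probs(log_f, pis):
    """Normalized P(z_g = k) proportional to pi_k * f_k(Y_g), from log f_k values."""
    logs = [math.log(p) + lf for p, lf in zip(pis, log_f)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def sample_membership(log_f, pis, rng):
    """Draw a cluster index for one gene from its conditional distribution."""
    probs = membership_probs(log_f, pis)
    return rng.choices(range(len(probs)), weights=probs)[0]


# Hypothetical values: two clusters with equal weights, f_1 = 0.2 and f_2 = 0.6.
rng = random.Random(1)
probs = membership_probs([math.log(0.2), math.log(0.6)], [0.5, 0.5])
draws = [sample_membership([math.log(0.2), math.log(0.6)], [0.5, 0.5], rng)
         for _ in range(1000)]
```

Subtracting the largest log term before exponentiating keeps the computation stable when the densities are very small.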
Three simulated data sets
Regression coefficients for motifs corresponding to 3 PSWMs from the JASPAR database: SAP1, SRF, and MEF2
200 “genes” in K = 2 clusters
2 measurements each
Coef.   Data 1      Data 2      Data 3
        C1   C2     C1   C2     C1   C2
βM1      2    0      2    0      2   -2
βM2      0    2      0   -2      0    0
βM3      0    0      0    0      0    0
(Motif 3 not present in data)
[Figure: scatterplots of Y1 against motif 1, 2 and 3 scores (columns) in the 3 data sets (rows)]
Simulation study results: Bayes factors
[Figure: P(M_K) against K = 1, ..., 6 for Data 1, Data 2 and Data 3]
Optimal choice: K = 2
Marginal model probability for $M_K$ through double mixture importance sampling
$$P(Y \mid M_K) \,\hat{=}\, \frac{1}{N_t N_s} \sum_t \sum_s P(Y \mid z^{(t)}, \theta^{(s)}, K)\, \frac{\pi_1(\theta^{(s)} \mid z^{(t)})}{f_1(\theta^{(s)} \mid z^{(t)})}\, \frac{\pi_2(z^{(t)})}{f_2(z^{(t)})}$$
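The idea behind an importance sampling estimate of a marginal probability can be shown on a much simpler model than the one in the talk. The sketch below uses a single level (θ only, no cluster labels z), with y ~ N(θ, 1) and θ ~ N(0, 1), so the exact marginal N(y | 0, 2) is available for comparison. The model, names and sample sizes are all assumptions for illustration; this is not the double mixture scheme itself.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def marginal_by_importance_sampling(y, n_samples, prop_mean, prop_var, rng):
    """Estimate P(y) = integral of N(y | theta, 1) N(theta | 0, 1) d theta.

    Each draw theta ~ f contributes likelihood * prior / f(theta).
    """
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(prop_mean, math.sqrt(prop_var))
        likelihood = normal_pdf(y, theta, 1.0)
        prior = normal_pdf(theta, 0.0, 1.0)
        proposal = normal_pdf(theta, prop_mean, prop_var)
        total += likelihood * prior / proposal
    return total / n_samples


rng = random.Random(0)
y = 1.2
exact = normal_pdf(y, 0.0, 2.0)
# Sampling density equal to the posterior N(y/2, 1/2): every weight equals P(y).
ideal = marginal_by_importance_sampling(y, 2000, y / 2, 0.5, rng)
# A cruder, wider sampling density still works but with Monte Carlo error.
rough = marginal_by_importance_sampling(y, 20000, 0.0, 2.0, rng)
```

When the sampling density matches the posterior, the weights are constant and the estimate has essentially no variance, which is why good sampling densities matter so much for this kind of estimator.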
Simulation: β estimates
[Figure: estimated coefficients by motif type for clusters c1 and c2, in Data 1, Data 2 and Data 3]
Data sets 1 and 2: Motifs 1, 2 selected
Data set 3: Motif 1 selected
Spellman (1998) data
[Figure: gene expression against time (minutes) for cell-cycle genes; phases G1, S, S/G2, G2/M, M/G1]
Pick different groups of genes that are highly differentially expressed
Sets of motif candidates of widths 7-12 bp (using MDScan)
Motif overlaps lead to collinearity: remove motifs with correlation > 0.5 with a higher-ranked one → 32 candidate motifs
Two consecutive time points: 1-2, 3-4, ..., 17-18
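The collinearity filter described on this slide can be sketched as a greedy pass over motifs in rank order, dropping any motif whose score vector correlates above 0.5 with one already kept. The use of Pearson correlation on raw (signed) values, the function names and the toy score columns are assumptions; the slide does not specify these details.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length score vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def filter_collinear(score_columns, threshold=0.5):
    """Keep motifs in rank order, skipping any that correlate too strongly
    with a higher-ranked motif that was already kept.

    score_columns[j] holds the scores of motif j across genes, best rank first.
    Returns the indices of the retained motifs.
    """
    kept = []
    for j, column in enumerate(score_columns):
        if all(pearson(column, score_columns[i]) <= threshold for i in kept):
            kept.append(j)
    return kept


# Hypothetical scores for three ranked motifs across four genes.
columns = [
    [1.0, 2.0, 3.0, 4.0],    # rank 1
    [2.0, 4.0, 6.0, 8.5],    # rank 2, nearly collinear with rank 1
    [1.0, -1.0, -1.0, 1.0],  # rank 3, uncorrelated with rank 1
]
kept = filter_collinear(columns)
```

Here the second motif is dropped and the first and third are kept.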
Number of clusters K∗
log(B̂F) compared to 1-component model
[Figure: log(BF) against time interval (1 to 9); points labeled 2 and 3]
Interval   1  2  3  4  5  6  7  8  9
K∗         1  3  1  2  2  2  2  1  1
[Figure: sequence logos (0 to 2 bits) of the significant motifs; logo labels by row: SCB MCB MCB SFF MCB MCM1 MCM1 MCB; SFF SFF MCB MCB; SFF]
Significant motifs over time intervals
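Choosing K∗ from estimated marginal probabilities amounts to comparing log Bayes factors against the 1-component model and taking the largest. A minimal sketch; the function names are hypothetical and the numbers are made up for illustration, not taken from the talk.

```python
def log_bayes_factors(log_marginals):
    """log BF of each model M_K against the 1-component model M_1.

    log_marginals maps K to an estimate of log P(Y | M_K).
    """
    base = log_marginals[1]
    return {k: value - base for k, value in log_marginals.items()}


def best_k(log_marginals):
    """K with the largest estimated log marginal probability."""
    return max(log_marginals, key=log_marginals.get)


# Hypothetical log marginal estimates for K = 1, 2, 3.
example = {1: -250.0, 2: -231.5, 3: -238.0}
log_bf = log_bayes_factors(example)
k_star = best_k(example)
```

In this made-up example the log Bayes factor of K = 2 against K = 1 is 18.5, so K∗ = 2.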
Motif influence at different time intervals
[Figure: nine panels, one per time interval, showing posterior prob. of selection (0.0 to 1.0) against motif index (0 to 30)]
Selected motif types for 9 time intervals for optimal K
Significant motifs match experimental PSWMs
Index  TF name  Consensus             Expt.            Phase
4      MCM1     CGAAGAG/CTCTTCG       CCNNNWWRGG       M
6      GCN1     TCAGTCA/TGACTGA       TCAGTCA
7      [CSRE]   GGACAGA/TCTGTCC       [YCGGAYRRAWGG]
10     MCB      ACGCGTA/TACGCGT       WCGCGW           G1
16     SFF      AACAACA/TGTTGTT       GTMAACAA         M
18     MCM1     CCAATTAGG/CCTAATTGG   CCNNNWWRGG       M
20     [RME1]   TTCAGGTAC/GTACCTGAA   [GAACCTCAA]
22     SCB      CGCGAAAAA/TTTTTCGCG   CNCGAAA          G1
25     PHO4     CGTACGTAC/GTACGTACG   CACGTK
29     −        CTTCGCATC/GATGCGAAG
K ≡ {G or T} ; M ≡ {A or C} ; N ≡ {A or C or G or T} ; R ≡ {A or G} ; W ≡ {A or T} ; Y ≡ {C or T}
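The degenerate-base codes in the legend can be turned into regular expressions to check whether a discovered consensus contains an experimental pattern on either strand. This is a simple exact-containment check with hypothetical function names; it is not the matching criterion used in the talk, which is not stated.

```python
import re

# Degenerate codes as defined in the legend above, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
    "R": "[AG]", "W": "[AT]", "Y": "[CT]",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a plain A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def consensus_to_regex(consensus):
    """Compile a degenerate consensus such as WCGCGW into a regular expression."""
    return re.compile("".join(IUPAC[symbol] for symbol in consensus))


def matches_either_strand(consensus, motif):
    """True if the degenerate consensus occurs in the motif or its reverse complement."""
    pattern = consensus_to_regex(consensus)
    return bool(pattern.search(motif) or pattern.search(reverse_complement(motif)))


mcb_hit = matches_either_strand("WCGCGW", "ACGCGTA")    # MCB row
scb_hit = matches_either_strand("CNCGAAA", "CGCGAAAAA")  # SCB row
```

The paired consensus strings in the table are reverse complements of each other, for example ACGCGTA and TACGCGT.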
Summary
Treating gene expression clustering as a variable may help in discovering relationships between functional sequence motifs, and groups of genes they regulate
Different groups of genes may behave as a cluster at different time points
Upstream sequence motifs “constant” but effects/interactions over time may vary
Further extensions
Motif scoring issues: sensitivity, co-occurrence of sites
Efficient model selection
Extension to high density ChIP tiling arrays
Acknowledgement: Joseph G. Ibrahim (UNC), Jason Lieb (UNC)
UNC high-performance scientific computing group
Model Selection: number of clusters K
Likelihood-based methods not valid (BIC, etc.)
Bayes factor: ratio of model marginal probabilities
Marginal probability for model $M_K$
$$P(Y \mid M_K) = \sum_z \int_\theta P(Y \mid z, \theta, K)\, p(\theta \mid z)\, p(z \mid K)\, d\theta$$
Double mixture importance sampling
$$P(Y \mid M_K) \,\hat{=}\, \frac{1}{N_t N_s} \sum_t \sum_s P(Y \mid z^{(t)}, \theta^{(s)}, K)\, \frac{\pi_1(\theta^{(s)} \mid z^{(t)})}{f_1(\theta^{(s)} \mid z^{(t)})}\, \frac{\pi_2(z^{(t)})}{f_2(z^{(t)})}$$
Challenge: Good sampling densities $f(\cdot)$ for $z$ and $\theta$
For a simpler case ($\theta$ marginalized), Raftery et al (TR, 2003) propose permutation-based methods to find good candidate sampling densities for $z$