Infinite Structured Explicit Duration Hidden Markov Models
Frank Wood
Joint work with Chris Wiggins, Mike Dewar, and Jonathan Huggins
Columbia University
November, 2011
Wood (Columbia University) ISEDHMM November, 2011 1 / 57
Hidden Markov Models
Hidden Markov models (HMMs) [Rabiner, 1989] are an important tool for data exploration and engineering applications.
Applications include
Speech recognition [Jelinek, 1997, Juang and Rabiner, 1985]
Natural language processing [Manning and Schutze, 1999]
Hand-writing recognition [Nag et al., 1986]
DNA and other biological sequence modeling applications [Krogh et al., 1994]
Gesture recognition [Tanguay Jr, 1995, Wilson and Bobick, 1999]
Financial data modeling [Ryden et al., 1998]
. . . and many more.
Graphical Model: Hidden Markov Model
[Graphical model: transition distributions π_m and emission parameters θ_m (plate over K states); latent chain z_0 → z_1 → z_2 → z_3 → · · · → z_T with emissions y_1, y_2, y_3, . . . , y_T.]
Notation: Hidden Markov Model
z_t | z_{t−1} = m ∼ Discrete(π_m)
y_t | z_t = m ∼ F_θ(θ_m)

The transition matrix A stacks the per-state transition distributions as rows:

A = [ π_1 ; · · · ; π_m ; · · · ; π_K ]
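To make the notation concrete, here is a minimal sketch (not from the talk) that samples an HMM forward in time; the 3-state transition matrix A and Gaussian emission means are made-up illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state HMM: rows of A are the per-state transition
# distributions pi_m; emissions are Gaussian with state-specific means.
A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
mu = np.array([-2.0, 0.0, 2.0])  # emission parameters theta_m

def sample_hmm(T, A, mu, sigma=1.0):
    """Draw (z_1..z_T, y_1..y_T) from the generative model above."""
    K = A.shape[0]
    z = np.empty(T, dtype=int)
    z[0] = rng.integers(K)                    # z_0 uniform for simplicity
    for t in range(1, T):
        z[t] = rng.choice(K, p=A[z[t - 1]])   # z_t | z_{t-1} ~ Discrete(pi_m)
    y = rng.normal(mu[z], sigma)              # y_t | z_t ~ N(mu_m, sigma^2)
    return z, y

z, y = sample_hmm(200, A, mu)
```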
Wood (Columbia University) ISEDHMM November, 2011 4 / 57
HMM: Typical Usage Scenario (Character Recognition)
Training data: multiple "observed" y_t = (v_t, h_t) sequences of stylus positions for each kind of character
Task: train |Σ| different models, one for each character
Latent states: sequences of strokes
Usage: classify new stylus position sequences using trained models M_σ = (A_σ, Θ_σ)

P(M_σ | y_1, . . . , y_T) ∝ P(y_1, . . . , y_T | M_σ) P(M_σ)
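The classification rule above can be sketched numerically; the per-model log-likelihoods below are hypothetical stand-ins for what the forward algorithm would return under each trained model M_σ:

```python
import numpy as np

# Hypothetical per-character log-likelihoods log P(y_1..y_T | M_sigma),
# e.g. produced by running the forward algorithm under each trained model.
log_lik = np.array([-120.3, -115.7, -130.1])   # models for 'a', 'b', 'c'
log_prior = np.log(np.array([1 / 3, 1 / 3, 1 / 3]))  # uniform P(M_sigma)

log_post = log_lik + log_prior
log_post -= np.logaddexp.reduce(log_post)      # normalize in log space
posterior = np.exp(log_post)                   # P(M_sigma | y_1..y_T)
best = int(np.argmax(posterior))               # MAP character index
```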
Shortcomings of Original HMM Specification
Latent state dwell times are not usually geometrically distributed
P(z_t = m, . . . , z_{t+L} = m | A) = ∏_{ℓ=1}^{L} P(z_{t+ℓ} = m | z_{t+ℓ−1} = m, A) = Geometric(L; π_m(m))

Prior knowledge often suggests structural constraints on allowable transitions
The state cardinality of the latent Markov chain K is usually unknown
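The geometric dwell-time claim is easy to check empirically; this sketch (with a made-up self-transition probability of 0.9) simulates how long a Markov chain dwells in one state:

```python
import numpy as np

rng = np.random.default_rng(1)

# Self-transition probability pi_m(m) for a single state m.
p_stay = 0.9

def dwell_time():
    """Steps spent in state m before leaving, under Markov dynamics."""
    L = 1
    while rng.random() < p_stay:
        L += 1
    return L

dwells = np.array([dwell_time() for _ in range(20000)])
# The dwell time is Geometric(1 - p_stay), so its mean is
# 1 / (1 - p_stay) = 10 here.
print(dwells.mean())
```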
Explicit Duration HMM / Hidden Semi-Markov Model
[Mitchell et al., 1995, Murphy, 2002, Yu and Kobayashi, 2003, Yu, 2010]
Latent state sequence z = ((s_1, r_1), . . . , (s_T, r_T))
Latent state id sequence s = (s_1, . . . , s_T)
Latent "remaining duration" sequence r = (r_1, . . . , r_T)
State-specific duration distribution Fr (λm)
Other distributions the same
An EDHMM transitions between states in a different way than does a typical HMM. Unless r_t = 0, the current remaining duration is decremented and the state does not change. If r_t = 0 then the EDHMM transitions to a state m ≠ s_t according to the distribution defined by π_{s_t}.
EDHMM notation
Latent state z_t = (s_t, r_t) is a tuple consisting of the state identity and the time left in that state.

s_t | s_{t−1}, r_{t−1} ∼ { I(s_t = s_{t−1}),        r_{t−1} > 0
                          { Discrete(π_{s_{t−1}}),   r_{t−1} = 0

r_t | s_t, r_{t−1} ∼ { I(r_t = r_{t−1} − 1),  r_{t−1} > 0
                      { F_r(λ_{s_t}),          r_{t−1} = 0

y_t | s_t ∼ F_θ(θ_{s_t})
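A minimal simulator for these EDHMM dynamics (parameter values are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 3-state EDHMM: no self-transitions in pi (zero diagonal),
# Poisson duration rates lam, Gaussian emission means mu.
pi = np.array([[0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5],
               [0.5, 0.5, 0.0]])
lam = np.array([10.0, 20.0, 3.0])
mu = np.array([-6.0, 0.0, 6.0])

def sample_edhmm(T):
    s = np.empty(T, dtype=int)
    r = np.empty(T, dtype=int)
    s_prev, r_prev = rng.integers(3), 0       # start with an expired duration
    for t in range(T):
        if r_prev > 0:                         # dwell: decrement, keep state
            s[t], r[t] = s_prev, r_prev - 1
        else:                                  # expire: jump, redraw duration
            s[t] = rng.choice(3, p=pi[s_prev])
            r[t] = rng.poisson(lam[s[t]])
        s_prev, r_prev = s[t], r[t]
    y = rng.normal(mu[s], 1.0)
    return s, r, y

s, r, y = sample_edhmm(500)
```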
EDHMM: Graphical Model
[Graphical model: per-state duration parameters λ_m, transition distributions π_m, and emission parameters θ_m (plate over K states); latent chains (s_0, r_0) → (s_1, r_1) → · · · → (s_T, r_T) with emissions y_1, y_2, y_3, . . . , y_T.]
Structured HMMs: e.g. the left-to-right HMM [Rabiner, 1989]
Example: Chicken pox
Observations: vital signs
Latent states: pre-infection, infected, post-infection (disregarding shingles)
State transition structure: can't go from infected back to pre-infection

Structured transitions imply zeros in the transition matrix A, i.e.

p(s_t = m | s_{t−1} = ℓ) = 0 ∀ m < ℓ
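Such hard constraints can be sketched as a mask on the transition matrix; the gamma-weight initialization here is just an illustrative way to fill the allowed entries:

```python
import numpy as np

rng = np.random.default_rng(3)
K = 4

# Left-to-right constraint: p(s_t = m | s_{t-1} = l) = 0 for all m < l,
# i.e. the transition matrix is upper triangular.
raw = rng.gamma(1.0, size=(K, K))          # unconstrained positive weights
mask = np.triu(np.ones((K, K)))            # zero out "backward" transitions
A = raw * mask
A /= A.sum(axis=1, keepdims=True)          # renormalize each row
```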
Bayesian HMM
We will put a prior on the parameters so that we can effect a solution that conforms to our ideas about what the solution should look like.

Structured prior examples
A_{i,j} = 0 (hard constraints)
A_{i,j} large where transitions are frequently used (rich get richer)

Regularization means that we can specify a model with more parameters than could possibly be needed
infinite complexity (i.e. K → ∞) avoids many model selection problems
"extra" states can be thought of as auxiliary or nuisance variables
Bayesian HMM
[Graphical model: as before, with priors H_z on the π_m and H_y on the θ_m (plate over K states).]

π_m ∼ H_z
θ_m ∼ H_y
z_t | z_{t−1} = m ∼ Discrete(π_m)
y_t | z_t = m ∼ F_θ(θ_m)
Infinite HMMs (IHMM) [Beal et al., 2002, Teh et al., 2006]
[Graphical model: as the Bayesian HMM but with the state plate taken to K → ∞; the π_m are tied together through a hierarchical construction with concentration parameters c_0 and c_1, top-level weights γ, and emission base measure H_y.]
Other HMM variants
Sticky Infinite HMMs [Fox et al., 2011]
Extra parameter per state used to bias towards self-transition
Hierarchical HMMs [Murphy, 2001]
State is hierarchical (i.e. a sequence of letters composed of stroke sequences)
Factorial HMMs [Ghahramani and Jordan, 1996]
Infinite explicit duration HMM [Johnson and Willsky, 2010]
No generative model. Latent history sampling is used to assert the existence of an implicitly defined IEDHMM that can be sampled from by rejecting HDP-HMM samples that violate transition constraints.
Infinite Structured Explicit Duration HMM (ISEDHMM)
Generative framework for HMMs with
Explicitly parameterized duration distributions
Structured transition priors
Countable state cardinality
Fundamental Problems
How to generate structured, dependent, infinite-dimensional transition distributions.
How to do inference in HMMs with countable state cardinality and countable duration distribution support.
[Huggins and W, 2011]
ISEDHMM: Graphical Model
[Graphical model: as the EDHMM, with the state plate taken to ∞; duration base measure H_r on the λ_m, emission base measure H_y on the θ_m, and concentration parameters c_0, γ, c_1 governing the π_m.]
Structured, dependent, infinite-dimensional transition distributions?
ISEDHMM: Recipe
Poisson process [Kingman, 1993]
Gamma process [Kingman, 1993]
SNΓPs [Rao and Teh, 2009]
Structured, dependent, infinite-dimensional transition distributions [Huggins and W, 2011]
ISEDHMM: Recipe
Gamma process [Kingman, 1993] as a Poisson process over Θ ⊗ V ⊗ [0,∞) with rate / mean measure

μ(Θ, V, S) = α(Θ, V) ∫_S γ^{−1} e^{−γ} dγ

A draw from a Gamma process with

α(Θ, V) = c_0 H_θ(Θ) H_v(V)

has the form

G = Σ_{m=1}^{∞} γ_m δ_{(θ_m, v_m)}

where (θ_m, v_m) ∼ H_θ × H_v.
ISEDHMM: Recipe
Non-disjoint "restricted projections" of Gamma processes are dependent Gamma processes (SNΓPs) [Rao and Teh, 2009].

[Figure: atoms v_0, v_1, v_2, . . . laid out in point regions R_0, R_1, R_2, . . . plus a remainder region R_+, with H_v = Geom(p); the highlighted region R̄_4 covers every atom except v_4, so]

G_0 = Σ_{m≠0} γ_m δ_{θ_m}, · · · , G_4 = Σ_{m≠4} γ_m δ_{θ_m}, · · ·
ISEDHMM: Recipe
Normalized dependent ΓP draws are dependent Dirichlet process draws.¹ In the ISEDHMM, DP draws are the dependent, structured, infinite-dimensional transition distributions:

D_4 = G_4 / G_4(Θ)                                                  (1)
    = Σ_{m≠4} γ_m δ_{θ_m} / Σ_{θ∈Θ} Σ_{m′≠4} γ_{m′} δ_{θ_{m′}}     (2)
    = Σ_{m≠4} γ_m δ_{θ_m} / Σ_{m′≠4} γ_{m′}                        (3)
    = Σ_{m≠4} (γ_m / Σ_{m′≠4} γ_{m′}) δ_{θ_m}

¹A draw from a Dirichlet process [Ferguson, 1973] is an infinite sum of weighted atoms [Sethuraman, 1994] where the weights sum to one.
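A finite truncation of this normalization step can be sketched as follows (the truncation level M = 10 and unit-scale gamma weights are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)

# Sketch of the normalization step: truncate the base gamma process to
# M atoms with weights gamma_m, drop atom 4 (the restricted projection),
# and renormalize.  The resulting weights define the DP draw D_4.
M = 10
gamma = rng.gamma(shape=1.0, scale=1.0, size=M)   # atom weights of G
keep = np.arange(M) != 4                           # exclude self-atom
d4 = gamma * keep
d4 = d4 / d4.sum()                                 # weights of D_4
```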
ISEDHMM: Recipe
[Figure: the base ΓP G = G_{R_+} together with restricted projections G_{R̄_1}, . . . , G_{R̄_M}; normalizing gives dependent DP draws D_1, . . . , D_M, governed by concentration parameters c_0 and c_1.]

Structured, dependent, infinite-dimensional transition distributions π_m can be formed from draws from DDPs [Huggins and W, 2011].
ISEDHMM Inference: Beam Sampling
We employ the forward-filtering, backward slice-sampling approach for the IHMM of [Van Gael et al., 2008] and the EDHMM of [Dewar, Wiggins and W, 2011], in which the state and duration variables s and r are sampled conditioned on auxiliary slice variables u.

Net result: an efficient, always finite forward-backward procedure for sampling latent states.
Auxiliary Variables for Sampling
Objective: get samples of x.

Sometimes it is easier to introduce an auxiliary variable u, Gibbs sample the joint P(x, u) (i.e. sample from P(x | u; λ), then P(u | x, λ), etc.), and then discard the u values than it is to directly sample from p(x | λ).

Useful when: p(x | λ) does not have a known parametric form but adding u results in a parametric form, and when x has countable support and sampling it requires enumerating all values.

[Graphical models: x depending on λ, and the augmented model with auxiliary variable u.]
Slice Sampling: A very useful auxiliary variable sampling trick
Unreasonable Pedagogical Example:

x | λ ∼ Poisson(λ) (countable support)
enumeration strategy for sampling x (impossible)
auxiliary variable u with P(u | x, λ) = I(0 ≤ u ≤ P(x | λ)) / P(x | λ)

Note: the marginal distribution of x is

P(x | λ) = Σ_u P(x, u | λ)
         = Σ_u P(x | λ) P(u | x, λ)
         = Σ_u P(x | λ) I(0 ≤ u ≤ P(x | λ)) / P(x | λ)
         = Σ_u I(0 ≤ u ≤ P(x | λ)) = P(x | λ)
Slice Sampling: A very useful auxiliary variable sampling trick
This suggests a Gibbs sampling scheme: alternately sampling from

P(x | u, λ) ∝ I(u ≤ P(x | λ))   (finite support, uniform above the slice, enumeration possible)
P(u | x, λ) = I(0 ≤ u ≤ P(x | λ)) / P(x | λ)   (uniform between 0 and P(x | λ))

then discarding the u values to arrive at x samples marginally distributed according to P(x | λ).
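This Gibbs scheme can be sketched directly for the Poisson example; the scan below relies on the Poisson pmf being unimodal to enumerate the finite slice {x : P(x | λ) > u}:

```python
import math
import random

random.seed(0)

lam = 4.0  # Poisson rate

def pmf(x):
    return math.exp(-lam) * lam**x / math.factorial(x)

x, samples = 0, []
for _ in range(20000):
    # u | x: uniform between 0 and P(x | lam).
    u = random.uniform(0.0, pmf(x))
    # x | u: uniform over the finite slice.  The pmf is unimodal, so scan
    # upward until we have passed the mode and dropped below u again.
    slice_vals, k, past_mode = [], 0, False
    while True:
        p = pmf(k)
        if p > u:
            slice_vals.append(k)
            past_mode = True
        elif past_mode:
            break
        k += 1
    x = random.choice(slice_vals)
    samples.append(x)

print(sum(samples) / len(samples))   # should be near lam
```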
ISEDHMM Inference: Beam Sampling
Forward-backward slice sampling only has to consider a finite number of successor states at each timestep. With auxiliary variables

p(u_t | z_t, z_{t−1}) = I(u_t < p(z_t | z_{t−1})) / p(z_t | z_{t−1})

and

p(z_t | z_{t−1}) = p((s_t, r_t) | (s_{t−1}, r_{t−1}))
                = { I(s_t = s_{t−1}) I(r_t = r_{t−1} − 1),  r_{t−1} > 0
                  { π_{s_{t−1} s_t} F_r(r_t; λ_{s_t}),       r_{t−1} = 0
one can run standard forward-backward conditioned on u’s.
Results
To illustrate IEDHMM learning on synthetic data, five hundred datapoints were generated using a 4-state EDHMM with Poisson duration distributions

λ = (10, 20, 3, 7)

and Gaussian emission distributions with means

μ = (−6, −2, 2, 6),

all unit variance.
IEDHMM: Synthetic Data Results
[Figures: emission/mean vs. time (0–500) for the synthetic data; histogram of posterior counts of the number of states (4–7).]
IEDHMM: Synthetic Data, State Duration Parameter Posterior
[Figure: histogram of posterior duration-rate samples (0–25).]
IEDHMM: Synthetic Data, State Mean Posterior
[Figure: histogram of posterior emission-mean samples (−10 to 10).]
IEDHMM: Nanoscale Transistor Spontaneous Voltage Fluctuation
[Figure: emission/mean vs. time (0–5000) for the nanoscale transistor voltage fluctuation data.]
IEDHMM vs. IHMM: Modeling the Morse Code Cepstrum
Wrap-Up
Novel Gamma process construction for dependent, structured, infinite-dimensional HMM transition distributions.
Other transition distribution structures (left-to-right) can be implemented simply by changing the "restricted projection regions."
The ISEDHMM framework generalizes the HMM, Bayesian HMM, infinite HMM, left-to-right HMM, explicit duration HMM, and more.
Future Work
Generalize to spatial prior on HMM states (“location”)
Simultaneous location and mapping
Process diagram modeling for systems biology
Applications; seeking “users”
Questions?
Thank you!
Bibliography I
M J Beal, Z Ghahramani, and C E Rasmussen. The Infinite Hidden Markov Model. In Advances in Neural Information Processing Systems, pages 29–245, March 2002.
M Dewar, C Wiggins, and F Wood. Inference in hidden Markov models with explicit state duration distributions. In Submission, 2011.
T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209–230, 1973.
E B Fox, E B Sudderth, M I Jordan, and A S Willsky. A Sticky HDP-HMM with Application to Speaker Diarization. Annals of Applied Statistics, 5(2A):1020–1056, 2011.
Z Ghahramani and M I Jordan. Factorial Hidden Markov Models. Machine Learning, 29:245–273, 1996.
J. Huggins and F. Wood. Infinite structured explicit duration hidden Markov models. In Under Review, 2012.
Bibliography II
F. Jelinek. Statistical Methods for Speech Recognition. MIT Press, 1997.
M Johnson and A S Willsky. The hierarchical Dirichlet process hidden semi-Markov model. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI-10), pages 252–259, 2010.
B H Juang and L R Rabiner. Mixture Autoregressive Hidden Markov Models for Speech Signals. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(6):1404–1413, 1985.
J F C Kingman. Poisson Processes. Oxford Studies in Probability. Oxford University Press, 1993.
A. Krogh, M. Brown, I. S. Mian, K. Sjolander, and D. Haussler. Hidden Markov models in computational biology: Applications to protein modelling. Journal of Molecular Biology, 235:1501–1531, 1994.
C. Manning and H. Schutze. Foundations of statistical natural language processing. MIT Press, Cambridge, MA, 1999.
Bibliography III
C Mitchell, M Harper, and L Jamieson. On the complexity of explicit duration HMMs. IEEE Transactions on Speech and Audio Processing, 3(3):213–217, 1995.
K P Murphy. Hierarchical HMMs. Technical report, University of California, Berkeley, 2001.
K P Murphy. Hidden semi-Markov models (HSMMs). Technical report, MIT, 2002.
R. Nag, K. Wong, and F. Fallside. Script recognition using hidden Markov models. In ICASSP86, pages 2071–2074, 1986.
L R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE, pages 257–286, 1989.
V Rao and Y W Teh. Spatial Normalized Gamma Processes. In Advances in Neural Information Processing Systems, pages 1554–1562, 2009.
Bibliography IV
T. Ryden, T. Terasvirta, and S. Asbrink. Stylized facts of daily return series and the hidden Markov model. Journal of Applied Econometrics, 13(3):217–244, 1998.
J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.
D. O. Tanguay Jr. Hidden Markov models for gesture recognition. PhD thesis, Massachusetts Institute of Technology, 1995.
Y W Teh, M I Jordan, M J Beal, and D M Blei. Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
J Van Gael, Y Saatci, Y W Teh, and Z Ghahramani. Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th International Conference on Machine Learning, pages 1088–1095. ACM, 2008.
Bibliography V
A. D. Wilson and A. F. Bobick. Parametric hidden Markov models for gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(9):884–900, 1999.
S Yu and H Kobayashi. An Efficient Forward–Backward Algorithm for an Explicit-Duration Hidden Markov Model. IEEE Signal Processing Letters, 10(1):11–14, 2003.
S Z Yu. Hidden semi-Markov models. Artificial Intelligence, 174(2):215–243, 2010.
IEDHMM vs. IHMM
[Surface plots: AMI difference as a function of log(1 + duration KL) and emission KL. Left: data from a Poisson duration distribution ISEDHMM; right: data from an IHMM, fit with a Poisson duration distribution ISEDHMM.]
Poisson Process [Kingman, 1993]
A Poisson process can be defined by the requirement that the random variables defined as the counts of the number of "events" inside each of a number of non-overlapping finite sub-regions of some space should each have a Poisson distribution and should be independent of each other.

[Figure: disjoint sub-regions A, B, C, D, E of a space Ω.]

N(A) ∼ Poisson(μ(A))
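A sketch of this definition for a homogeneous Poisson process on the unit square (the rate is chosen arbitrarily): the total count is Poisson with mean μ(Ω), and the count in any sub-region A is Poisson with mean μ(A).

```python
import numpy as np

rng = np.random.default_rng(5)

# Homogeneous Poisson process on the unit square with mean measure
# mu(A) = rate * area(A): draw a Poisson total count, then scatter the
# events uniformly.
rate = 50.0
n = rng.poisson(rate)                 # N(Omega) ~ Poisson(mu(Omega))
pts = rng.uniform(size=(n, 2))

# Count events in the sub-region A = [0, 0.5] x [0, 0.5]; mu(A) = 12.5,
# so N(A) ~ Poisson(12.5), independent of counts in disjoint regions.
n_A = int(((pts[:, 0] < 0.5) & (pts[:, 1] < 0.5)).sum())
```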
Gamma Process as Poisson Process [Kingman, 1993]
Define a Poisson process over the product space Θ ⊗ [0,∞) with mean measure

μ(Θ, S) = α(Θ) ∫_S γ^{−1} e^{−γ} dγ

where Θ ∈ Ω and S ⊂ [0,∞). A draw from this Poisson process yields a countably infinite set of pairs {(θ_n, γ_n)}_{n≥1}, which can be used to form an atomic random measure

G = Σ_{n≥1} γ_n δ_{θ_n}.
Gamma Process
This discrete measure is a draw from G ∼ ΓP(α) (from the previous slide):

G = Σ_{n≥1} γ_n δ_{θ_n}.

A ΓP can be thought of as an unnormalized DP.
Dirichlet Processes
D = G / G(Θ) is a sample from a DP with base measure α:

D ∼ DP(α).

A draw from a Dirichlet process is an infinite mixture of weighted, discrete atoms.

For the ISEDHMM:
each atom is a next state (there are a countably infinite number of such states)
each atom's weight is the probability of transitioning to that state
there will be a countably infinite number of such transition distributions
Structured Transitions Via Dependent Dirichlet Processes
One way to define a set of dependent DPs is to construct a base gamma process over an augmented space by taking the union of disjoint independent gamma processes, then define a series of restricted projections of that base process, which are themselves gamma processes.
The normalization of these dependent gamma processes forms a set of dependent DPs.
We will use this procedure to construct a number of dependent DPs (one for each HMM state) which preclude certain transitions. The precluded transitions, a form of dependence, arise from the particulars of the construction.
SNΓP construction of IEDHMM transition distributions
[Figure: the base ΓP G = G_{R_+} together with restricted projections G_{R̄_1}, . . . , G_{R̄_M}; normalizing gives dependent DP draws D_1, . . . , D_M, governed by concentration parameters c_0 and c_1.]
[Huggins and W, 2011]
Spatial Normalized Gamma Processes [Rao and Teh, 2009]
To review SNΓPs formally, let V be an arbitrary auxiliary space. In general, one can think of this as a covariate, time, or an index. For V ⊂ V, let

α(Θ, V) = c_0 H_θ(Θ) H_v(V)

be the base measure for a gamma process G = ΓP(α) defined over the product space Θ ⊗ V. Here c_0 is a concentration parameter and H_θ and H_v are probability measures. We will refer to G as the "base ΓP".
Spatial Normalized Gamma Processes [Rao and Teh, 2009]
Let T be an index set and define restricted projected measures α_m for all m ∈ T such that

α_m(Θ) = α(Θ, V_m)

where V_m is a subset of V indexed by m.

The SNΓP gets its name from thinking of V as a space and V_m as a region of space indexed by m. With G ∼ G being a draw from the base gamma process, define the restricted projection G_m by

G_m(Θ) = G(Θ, V_m).

Then G_m(Θ) is distributed according to a gamma process with base measure α_m:

G_m ∼ ΓP(α_m).
Spatial Normalized Gamma Processes [Rao and Teh, 2009]
Normalizing G_m yields a draw D_m = G_m / G_m(Θ) distributed according to a Dirichlet process with base measure α_m:

D_m ∼ DP(α_m).

D_m(Θ) is not in general independent of D_{m′}(Θ) because they can share atoms from G.
IEDHMM [Huggins and W, 2011] (an example ISEDHMM)
Recall the gamma process G = ΓP(α) with base measure

α(Θ, V) = c_0 H_θ(Θ) H_v(V).

Draws G ∼ G are measures over the parameter and covariate product space Θ ⊗ V with the form

G = Σ_{m=1}^{∞} γ_m δ_{(θ_m, v_m)}

where (θ_m, v_m) ∼ H_θ × H_v.

In the IEDHMM, θ_m comprises the duration parameter λ_m and the output distribution parameters for state m.

For pedagogical expediency let T = V = {0, 1, 2, 3, 4, . . .}, so v_m ∈ N.
IEDHMM: “spatial” regions for restrictions of base ΓP
R_m = {v_m},  m ∈ T
R_+ = V \ {v_m : m ∈ T}
R = {R_m : m ∈ T} ∪ {R_+}
R̄_m = R \ {R_m},  m ∈ T
IEDHMM: Geometric base distribution over states example
H_v = Geom(p)

[Figure: atoms v_0, . . . , v_9, . . . at the integers 0, 1, 2, . . . , each in its own point region R_0, R_1, . . . , with the remaining mass in R_+; the highlighted region R̄_4 covers every region except R_4.]
IEDHMM: SNΓP restrictions
From before, using the restricted projection α_R(Θ) = α(Θ, R), setting G_R(Θ) = G(Θ, R) and D_R = G_R / G_R(Θ), we have

G_R ∼ ΓP(α_R)
D_R ∼ DP(α_R).

In the case of the IEDHMM, the base measure corresponding to the point region R_m ∈ R is

α_{R_m}(Θ) = c_0 δ_{θ_m}(Θ) H_v(R_m),

while for R_+ it is

α_{R_+}(Θ) = c_0 H_θ(Θ) H_v(R_+).
IEDHMM: From restricted ΓPs to DPs
A highly dependent transition distribution for state m is a DP draw D_m:

D_m = Ḡ_m / Ḡ_m(Θ)
    = Σ_{R∈R̄_m} G_R / Σ_{R′∈R̄_m} G_{R′}(Θ)
    = Σ_{R∈R̄_m} [ G_R(Θ) / Σ_{R′∈R̄_m} G_{R′}(Θ) ] · G_R / G_R(Θ)
    = Σ_{R∈R̄_m} [ G_R(Θ) / Σ_{R′∈R̄_m} G_{R′}(Θ) ] D_R
IEDHMM: Mixture of DP construction for transition distributions
D_{R_m} ∼ DP(α_{R_m})
D_{R_+} ∼ DP(α_{R_+})
γ_m ∼ Gamma(c_0 H_v(R_m), 1)
γ_+ ∼ Gamma(c_0 H_v(R_+), 1)

β_{mk} = I(m ≠ k) γ_k / (γ_+ + Σ_{k′≠m}^{M} γ_{k′})
β_{m+} = γ_+ / (γ_+ + Σ_{k′≠m}^{M} γ_{k′})

D_m = Σ_{k≠m}^{M} β_{mk} D_{R_k} + β_{m+} D_{R_+}.
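A finite sketch of these mixture weights (M, p, and c_0 are arbitrary illustrative values, and the Geom(p) parameterization on {0, 1, 2, . . .} is an assumption):

```python
import numpy as np

rng = np.random.default_rng(7)

# Finite sketch of the mixture-of-DPs weights with M instantiated states.
# H_v = Geom(p) on {0, 1, 2, ...} gives the point-region masses H_v(R_k).
M, p, c0 = 5, 0.5, 1.0
Hv_Rk = p * (1 - p) ** np.arange(M)       # H_v({v_k}) under Geom(p)
Hv_Rplus = 1.0 - Hv_Rk.sum()              # mass of the remainder region

gamma_k = rng.gamma(c0 * Hv_Rk, 1.0)      # gamma_k ~ Gamma(c0 Hv(R_k), 1)
gamma_plus = rng.gamma(c0 * Hv_Rplus, 1.0)

def beta_row(m):
    """Mixture weights (beta_m1..beta_mM, beta_m+) for state m."""
    denom = gamma_plus + gamma_k.sum() - gamma_k[m]   # sum excludes k = m
    b = np.where(np.arange(M) != m, gamma_k / denom, 0.0)
    return np.append(b, gamma_plus / denom)

B = np.vstack([beta_row(m) for m in range(M)])
# Each row sums to one and has a structural zero at the self-transition.
```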
IEDHMM: Relaxing the dependence hierarchically
These dependent DPs are the base distributions for conditional state transition distributions D̄_m ∼ DP(c_1 D_m). With β_m = (β_{m1}, . . . , β_{mM}, β_{m+}),

π_m ∼ Dirichlet(c_1 β_m),

D̄_m = Σ_{k=1}^{M} π_{mk} δ_{θ_k} + π_{m+} D_{R_+}

where c_1 is a concentration parameter and π_m = (π_{m1}, . . . , π_{mM}, π_{m+}).

The conditional state transition probability row vector π_m is finite, since the probabilities of transitioning to new states have been merged into a single probability π_{m+} = Σ_{k=M+1}^{∞} π_{mk}. This "bin" is dynamically split and joined at inference time.