
Maximum Likelihood Estimation of Hidden Markov Processes

Halina Frydman and Peter Lakner
New York University

June 11, 2001

Abstract

We consider the process $dY_t = u_t\,dt + dW_t$, where $u$ is a process not necessarily adapted to $F^Y$ (the filtration generated by the process $Y$) and $W$ is a Brownian Motion. We obtain a general representation for the likelihood ratio of the law of the $Y$ process relative to Brownian measure. This representation involves only one basic filter (the expectation of $u$ conditional on the observed process $Y$). It generalizes the result of Kailath and Zakai (1971), where it is assumed that the process $u$ is adapted to $F^Y$. In particular, we consider the model in which $u$ is a functional of $Y$ and of a random element $X$ which is independent of the Brownian Motion $W$; for example, $X$ could be a diffusion or a Markov chain. This result can be applied to the estimation of an unknown multidimensional parameter $\theta$ appearing in the dynamics of the process $u$, based on continuous observation of $Y$ on the time interval $[0,T]$. For a specific hidden diffusion financial model, in which $u$ is an unobserved mean-reverting diffusion, we give an explicit form for the likelihood function of $\theta$. For this model we also develop a computationally explicit E-M algorithm for the estimation of $\theta$. In contrast to the likelihood ratio, the algorithm involves the evaluation of a number of filtered integrals in addition to the basic filter.

1 Introduction

Let $(\Omega, \mathcal{F}, P)$, $\{\mathcal{F}_t,\ t \le T\}$ be a filtered complete probability space and $W = \{W_t,\ t \le T\}$ a standard Brownian Motion. We consider a process $Y$ with the decomposition

$$dY_t = u_t(\theta)\,dt + dW_t, \tag{1}$$

where $u$ is a measurable, adapted process and $\theta$ is an unknown, vector-valued parameter. We assume that $Y$ is observed continuously on the interval $[0,T]$, but that neither $u = \{u_t,\ t \le T\}$ nor $W$ is observed. The purpose of this work is to develop methods for the maximum likelihood estimation of $\theta$.

Our interest in this work was motivated by the following model from finance. Suppose that we observe continuously a single security with the price process $\{S_t,\ t \le T\}$, which follows the equation

$$dS_t = \mu_t S_t\,dt + \sigma S_t\,dW_t, \tag{2}$$

where $\sigma > 0$ is a known constant, but the drift coefficient $\mu = \{\mu_t,\ t \le T\}$ is an unobserved mean-reverting process with the dynamics

$$d\mu_t = \alpha(\upsilon - \mu_t)\,dt + \beta\,dW^1_t. \tag{3}$$

Here $\alpha > 0$, $\beta > 0$, and $\upsilon$ are unknown parameters, and $W^1$ is a standard Brownian Motion independent of $W$. The initial value of the security, $S_0$, is an observable constant, whereas the initial value of the drift, $\mu_0$, is an unknown constant. The model in (2) and (3) has recently been found to be very flexible in the modeling of security prices, and versions of it have been used extensively in the area of dynamic asset management (see e.g. Bielecki and Pliska (1999), Lakner (1998), Kim and Omberg (1996), and Haugh and Lo (2000)). We now transform the model defined by (2) and (3) to the form in (1) by introducing the following changes of variables; the reason for the transformation is that we want the law of $\{Y_t,\ t \le T\}$ to be equivalent to that of a Brownian Motion. Let

$$X_t = \frac{1}{\beta}\mu_t - \frac{1}{\beta}\mu_0, \qquad Y_t = \frac{1}{\sigma}\log\frac{S_t}{S_0} + \frac{\sigma}{2}t.$$

Then the hidden process is

$$dX_t = \alpha(\delta - X_t)\,dt + dW^1_t, \tag{4}$$

where $\delta = (\upsilon - \mu_0)/\beta$, and the observed process is

$$dY_t = (\lambda X_t + \nu)\,dt + dW_t, \tag{5}$$

where $\lambda = \beta/\sigma$ and $\nu = \mu_0/\sigma$. Let $\theta = (\alpha, \delta, \lambda, \nu)$. We note that $X_0 = 0$ and $Y_0 = 0$, and that equation (5) is in the form (1) with $u_t(\theta) = \lambda X_t + \nu$.
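On discretely sampled data this change of variables is immediate to carry out. The following sketch is ours, not the paper's; the function and variable names are hypothetical, and $\sigma$ is treated as known, as in (2). It maps a sampled price path to the corresponding path of $Y$:

```python
import numpy as np

def prices_to_Y(S, dt, sigma):
    """Y_t = (1/sigma) * log(S_t / S_0) + (sigma/2) * t, computed on a grid.

    S     : array of prices sampled every dt, with S[0] = S_0
    sigma : the known volatility constant from equation (2)
    """
    t = dt * np.arange(len(S))
    return np.log(S / S[0]) / sigma + 0.5 * sigma * t
```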

We summarize our results. Our main result is a new formula for the likelihood ratio of the law of the process $Y$, specified in (1), with respect to the Brownian measure. Naturally, the likelihood ratio is a function of $\theta$. This result generalizes a result in Kailath and Zakai (1971) that gives a formula for the likelihood ratio in the case when $u$ is adapted to $F^Y = \{\mathcal{F}^Y_t,\ t \le T\}$, the filtration generated by the observed process $Y$; here we extend this result to the case when $u$ is not adapted to $F^Y$. We note that in our financial model (5), $u$ is not adapted to $F^Y$. We establish the new formula under a stronger (Theorem 1) and a weaker (Theorem 2) set of sufficient conditions. These conditions allow $u$ to be a functional of $Y$ and of a random element $X$ which is independent of the Brownian Motion $W$; for example, $X$ could be a diffusion or a Markov chain. The likelihood ratio involves only one basic filter, that is, the expectation of $u$ conditional on the observed data. The new formula makes it possible to maximize the likelihood function directly with respect to the parameters. It may also be useful in developing likelihood ratio tests of various hypotheses about the parameters of hidden Markov processes. For the financial model in (4) and (5) we give an explicit form of the likelihood function.

We also consider the Expectation-Maximization (E-M) algorithm approach to the maximization of the loglikelihood function for the model in (4) and (5). Dembo and Zeitouni (1986) extended the E-M algorithm to the parameter estimation of hidden continuous-time random processes and, in particular, to the parameter estimation of hidden diffusions. The financial model in (4) and (5) generalizes an example in Dembo and Zeitouni (1986). For this model we develop a computationally explicit E-M algorithm for the maximum likelihood estimation of $\theta$. The expectation step of this algorithm requires the evaluation of filtered integrals, that is, of expectations of stochastic integrals conditional on the observed data. As remarked by Elliott et al. (1997, p. 203), Dembo and Zeitouni (1986) express the filtered integrals in the form of nonadapted stochastic integrals, which are not well defined. In our development we express the required filtered integral in terms of adapted stochastic integrals (Proposition 4). In contrast to the likelihood function, the E-M algorithm involves a number of filters in addition to the basic filter.


The development of likelihood ratio tests based on the new formula for the likelihood function, and of simulations to assess the accuracy of the maximum likelihood estimators, is left for future investigation.

This paper is organized as follows. The new formula for the likelihood ratio is established in Section 2. Section 3 presents the extended E-M algorithm and some of its properties. In Sections 4 and 5 we apply the E-M algorithm to the estimation of $\theta$ in the model in (4) and (5). The Appendix contains an auxiliary result for the derivation of the likelihood ratio in Section 2.

2 The Likelihood Ratio

In this section we generalize a result of Kailath and Zakai (1971). We consider the process in (1), but to simplify notation we suppress the dependence of $u$ on $\theta$, so that

$$dY_t = u_t\,dt + dW_t. \tag{6}$$

Our standing assumption is

$$\int_0^T u_t^2\,dt < \infty \quad \text{a.s.} \tag{7}$$

Let $P_y$ be the measure induced by $Y$ on $\mathcal{B}(C[0,T])$ and $P_w$ the Brownian measure on the same class of Borel sets. We know from Kailath and Zakai (1971) that (7) implies $P_y \ll P_w$. The same authors give a formula for the likelihood ratio $dP_y/dP_w$ for the case when $u$ is adapted to $F^Y$. Here we are going to extend this result to the case when $u$ is not adapted to $F^Y$.

Let $\{v_t,\ t \le T\}$ be the $F^Y$-optional projection of $u$; thus

$$E[u_t \mid \mathcal{F}^Y_t] = v_t \quad \text{a.s.} \tag{8}$$

whenever $E|u_t| < \infty$.

Theorem 1 In addition to assumption (7), we assume that

$$E|u_t| < \infty \quad \text{for all } t \in [0,T], \tag{9}$$

$$\int_0^T v_s^2\,ds < \infty \quad \text{a.s.,} \tag{10}$$

and

$$E\left[\exp\left\{-\int_0^T u_s\,dW_s - \frac{1}{2}\int_0^T u_s^2\,ds\right\}\right] = 1. \tag{11}$$

Then $P_y$ is equivalent to $P_w$ and

$$\frac{dP_y}{dP_w} = \exp\left\{\int_0^T v_s\,dY_s - \frac{1}{2}\int_0^T v_s^2\,ds\right\}. \tag{12}$$

Remark: The formula on the right-hand side of (12) is an $\mathcal{F}^Y_T$-measurable random variable, and hence also a path-functional of $\{Y_t,\ t \le T\}$. This expression thus represents a random variable and a functional on $C[0,T]$ simultaneously; in (12) it appears in its capacity as a functional on the space of continuous functions.
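Although the paper works in continuous time, formula (12) also suggests a direct numerical approximation once $Y$ has been sampled on a grid and the filter $v$ has been computed on the same grid. The sketch below is our illustration, not part of the paper, and the names are hypothetical; it approximates the log of (12) with a left-endpoint (Itô) sum for the stochastic integral and a Riemann sum for the quadratic term:

```python
import numpy as np

def log_likelihood_ratio(Y, v, dt):
    """Approximate log(dPy/dPw) = int_0^T v dY - (1/2) int_0^T v^2 ds, eq. (12).

    Y : sampled path of the observed process (length N+1, spacing dt)
    v : filter values v_s = E[u_s | F_s^Y] on the same grid (length N+1)
    """
    dY = np.diff(Y)                             # increments Y_{k+1} - Y_k
    stochastic = np.sum(v[:-1] * dY)            # left-endpoint (Ito) sum
    quadratic = 0.5 * dt * np.sum(v[:-1] ** 2)  # Riemann sum for the ds-integral
    return stochastic - quadratic
```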

Proof. We define the process

$$Z_t = \exp\left\{-\int_0^t u_s\,dW_s - \frac{1}{2}\int_0^t u_s^2\,ds\right\}, \quad t \in [0,T], \tag{13}$$

and the probability measure $P_0$ by

$$\frac{dP_0}{dP} = Z_T,$$

which is a bona fide probability measure by condition (11). We are going to use repeatedly the following form of "Bayes' theorem": for every $\mathcal{F}_t$-measurable random variable $V$ we have

$$E_0[V \mid \mathcal{F}^Y_t] = \frac{1}{E[Z_t \mid \mathcal{F}^Y_t]}\,E[Z_t V \mid \mathcal{F}^Y_t], \quad t \in [0,T], \tag{14}$$

where $E_0$ is the expectation operator corresponding to the probability measure $P_0$. According to Girsanov's Theorem, $Y$ is a Brownian Motion under $P_0$. Also, one can see easily that

$$\frac{dP_y}{dP_w} = E_0\left[\frac{1}{Z_T} \,\Big|\, \mathcal{F}^Y_T\right] = \left(E[Z_T \mid \mathcal{F}^Y_T]\right)^{-1}.$$

The equivalence of $P_y$ and $P_w$ already follows, since $Z_T > 0$ (under $P$ and $P_0$), thus also

$$E_0\left[\frac{1}{Z_T} \,\Big|\, \mathcal{F}^Y_T\right] > 0;$$

hence the same expression, as a path-functional of $\{Y_t,\ t \le T\}$, is $P_y$-almost everywhere positive. We still need to show (12). We start with the identity

$$\frac{1}{Z_t} = 1 + \int_0^t \frac{1}{Z_s}\,u_s\,dY_s.$$

Theorem 12.1 in Liptser and Shiryayev (1978) implies that

$$E_0\left[\frac{1}{Z_t} \,\Big|\, \mathcal{F}^Y_t\right] = 1 + \int_0^t E_0\left[\frac{1}{Z_s}\,u_s \,\Big|\, \mathcal{F}^Y_s\right] dY_s, \tag{15}$$

as long as the following two conditions hold:

$$E_0\left[\frac{1}{Z_s}\,|u_s|\right] < \infty, \tag{16}$$

$$\int_0^T \left(E_0\left[\frac{1}{Z_s}\,u_s \,\Big|\, \mathcal{F}^Y_s\right]\right)^2 ds < \infty \quad \text{a.s.} \tag{17}$$

However, (16) follows from condition (9): by (11), $Z$ is a $P$-martingale, so $E_0[(1/Z_s)|u_s|] = E[(Z_T/Z_s)|u_s|] = E|u_s|$. In order to show (17) we use the rule (14) and compute

$$\int_0^T \left(E_0\left[\frac{1}{Z_s}\,u_s \,\Big|\, \mathcal{F}^Y_s\right]\right)^2 ds = \int_0^T \left(E[Z_s \mid \mathcal{F}^Y_s]\right)^{-2} v_s^2\,ds = \int_0^T \left(E_0\left[\frac{1}{Z_s} \,\Big|\, \mathcal{F}^Y_s\right]\right)^2 v_s^2\,ds.$$

Now we observe that by (15), $(s,\omega) \mapsto E_0\left[\frac{1}{Z_s} \,\big|\, \mathcal{F}^Y_s\right]$ is a $P_0$-local martingale (and also a martingale, but this is not important at this point), adapted to the filtration generated by the $P_0$-Brownian Motion $Y$; thus it must be continuous (Karatzas and Shreve (1988), Problem 3.4.16). This fact and condition (10) imply (17). Now, using (15) and (14), we get

$$E_0\left[\frac{1}{Z_t} \,\Big|\, \mathcal{F}^Y_t\right] = 1 + \int_0^t \frac{1}{E[Z_s \mid \mathcal{F}^Y_s]}\,v_s\,dY_s = 1 + \int_0^t E_0\left[\frac{1}{Z_s} \,\Big|\, \mathcal{F}^Y_s\right] v_s\,dY_s,$$


which implies (12), since the unique solution of this linear equation is $E_0[1/Z_t \mid \mathcal{F}^Y_t] = \exp\{\int_0^t v_s\,dY_s - \frac{1}{2}\int_0^t v_s^2\,ds\}$.

In the rest of this section we are going to strengthen Theorem 1 by eliminating condition (11). This will again be a generalization of a result by Kailath and Zakai (1971), where $u$ is assumed to be $F^Y$-adapted. Our objective here is to have a result which applies to the situation when $u$ depends on $Y$ and possibly on another process which is independent of $W$. This other process may be a Brownian Motion, or possibly a Markov chain. In order to achieve this generality we assume that $\mathcal{X}$ is a complete separable metric space and $X : \Omega \mapsto \mathcal{X}$ is an $\mathcal{F} \mapsto \mathcal{B}(\mathcal{X})$-measurable random element, independent of the Brownian Motion $W$. We denote by $P_X$ the measure induced by $X$ on $\mathcal{B}(\mathcal{X})$. We assume that

$$u_s = \Phi(Y_0^s, s, X), \quad s \in [0,T], \tag{18}$$

where $Y_0^s$ represents the path of $Y$ up to time $s$. Before spelling out the exact conditions on $\Phi$, we note that $X$ may be a Brownian Motion if $\mathcal{X} = C[0,T]$, or a Markov chain if $\mathcal{X} = D[0,T]$ (the space of right-continuous functions with left limits) with the Skorohod topology. Other examples are also available.

The conditions on $\Phi$ are the following. First a notation: for every $f \in C([0,T])$ and $s \in [0,T]$ we denote by $f^s \in C[0,s]$ the restriction of $f$ to $[0,s]$. We assume that for every fixed $s \in [0,T]$, $\Phi(\cdot, s, \cdot)$ is a mapping from $C[0,s] \times \mathcal{X}$ to $\mathbb{R}$ such that

(i) the $C[0,T] \times [0,T] \times \mathcal{X} \mapsto \mathbb{R}$ mapping given by $(f, s, x) \mapsto \Phi(f^s, s, x)$ is measurable with respect to the appropriate Borel classes;

(ii) for every $s \in [0,T]$ and $f \in C[0,s]$ the random variable $\Phi(f, s, X)$ is $\mathcal{F}_s$-measurable;

(iii) the following "functional Lipschitz" condition (Protter, 1990) holds: for every $f, g \in C([0,T])$ and $P_X$-almost every $x \in \mathcal{X}$ there exists an increasing (finite) process $K_t(x)$ such that

$$\left|\Phi(f^t, t, x) - \Phi(g^t, t, x)\right| \le K_t(x)\,\sup_{u \le t}|f(u) - g(u)| \quad \text{for all } t \in [0,T].$$

Our basic setup now is $Y$ with the decomposition (6) and $u$ of the form (18), where $X$ is a random element independent of $W$ and $\Phi$ is a functional satisfying (i), (ii) and (iii).


Theorem 2 Suppose that (7) holds, i.e., in our case

$$\int_0^T \Phi^2(Y_0^s, s, X)\,ds < \infty \quad \text{a.s.,} \tag{19}$$

and also

$$\int_0^T \Phi^2(W_0^s, s, X)\,ds < \infty \quad \text{a.s.} \tag{20}$$

Then

$$E\left[\exp\left\{-\int_0^T u_s\,dW_s - \frac{1}{2}\int_0^T u_s^2\,ds\right\}\right] = 1. \tag{21}$$

Remark: It is interesting to observe that if $\Phi$ does not depend on $Y$, i.e., $u$ is independent of $W$, then (21) follows from (7) alone.

Remark: Conditions (19) and (20) are obviously satisfied if the mapping $s \mapsto \Phi(f^s, s, x)$ is continuous for every $(f, x) \in C[0,T] \times \mathcal{X}$.

Proof. The main idea of the proof is to derive our theorem from the results of Kailath and Zakai by conditioning on $X = x$. Let $Q(x, \cdot)$, $x \in \mathcal{X}$, be a regular conditional probability measure on $\mathcal{F}$ given $X$. The existence of such a regular version is well known in this situation. It is shown in the Appendix that the independence of the Brownian Motion $W$ and $X$ implies that $\{(W_t, \mathcal{F}^W_t),\ t \le T\}$ remains a Brownian Motion under $Q(x, \cdot)$ for $P_X$-almost every $x \in \mathcal{X}$. We also show in the Appendix that if $A$ is a zero-probability event under $P$, then for $P_X$-almost every $x \in \mathcal{X}$ we have $Q(x, A) = 0$. (The "$P_X$-almost every" is important here; obviously $Q(x, \cdot)$ is not absolutely continuous with respect to $P$.) Accordingly, for $P_X$-almost every $x \in \mathcal{X}$,

$$\int_0^T \Phi^2(Y_0^s, s, x)\,ds < \infty \quad Q(x, \cdot)\text{-a.s.,}$$

and

$$\int_0^T \Phi^2(W_0^s, s, x)\,ds < \infty \quad Q(x, \cdot)\text{-a.s.}$$

Since $\mathcal{X}$ is assumed to be a complete separable metric space, we have

$$Q(x, \{\omega \in \Omega : X(\omega) = x\}) = 1$$

for $P_X$-almost every $x \in \mathcal{X}$ (Karatzas and Shreve (1988), Theorem 5.3.19). It follows that under $Q(x, \cdot)$ our equations (6) and (18) become

$$dY_t = \Phi(Y_0^t, t, x)\,dt + dW_t.$$

It follows from Theorem V.3.7 in Protter (1990) and our condition (iii) that the above equation has a unique (strong) solution. Thus the process $Y$ must be adapted to the $Q(x, \cdot)$-augmentation of $F^W$. However, now we are back in the situation studied by Kailath and Zakai (1971). For every $x \in \mathcal{X}$ let $P_Y(x, \cdot)$ be the measure induced by $Y$ on $\mathcal{B}(C[0,T])$ under the conditional probability measure $Q(x, \cdot)$. Theorems 2 and 3 of Kailath and Zakai (1971) imply that for $P_X$-almost every $x \in \mathcal{X}$ we have $P_Y(x, \cdot) \sim P_w$ and

$$\frac{dP_w}{dP_Y(x, \cdot)} = \exp\left\{-\int_0^T \Phi(Y_0^t, t, x)\,dY_t + \frac{1}{2}\int_0^T \Phi^2(Y_0^t, t, x)\,dt\right\}.$$

This implies that the integral of the right-hand side with respect to the measure $P_Y(x, \cdot)$ is $1$, which can be expressed as

$$E\left[\exp\left\{-\int_0^T u_t\,dY_t + \frac{1}{2}\int_0^T u_t^2\,dt\right\} \,\Big|\, X = x\right] = 1$$

for $P_X$-almost every $x \in \mathcal{X}$. Since $dY_t = u_t\,dt + dW_t$, the exponent equals $-\int_0^T u_t\,dW_t - \frac{1}{2}\int_0^T u_t^2\,dt$, and formula (21) now follows.

We summarize the results of Theorems 1 and 2 in the following theorem.

Theorem 3 Let $X : \Omega \mapsto \mathcal{X}$ be a measurable mapping taking values in the complete separable metric space $\mathcal{X}$. Assume that $X$ is independent of the Brownian Motion $W$, $Y$ has the form (6), and $u$ is as specified in (18), where $\Phi$ satisfies conditions (i), (ii) and (iii). Additionally we assume

$$E\left|\Phi(Y_0^s, s, X)\right| < \infty \quad \text{for all } s \in [0,T],$$

$$\int_0^T v_s^2\,ds < \infty \quad \text{a.s.,}$$

$$\int_0^T \Phi^2(Y_0^s, s, X)\,ds < \infty \quad \text{a.s.,}$$

$$\int_0^T \Phi^2(W_0^s, s, X)\,ds < \infty \quad \text{a.s.,}$$

where $v$ is the optional projection of $u$ to $F^Y$. Then $P_y \sim P_w$ and

$$\frac{dP_y}{dP_w} = \exp\left\{\int_0^T v_s\,dY_s - \frac{1}{2}\int_0^T v_s^2\,ds\right\}.$$

Furthermore,

$$E\left[\exp\left\{-\int_0^T u_s\,dW_s - \frac{1}{2}\int_0^T u_s^2\,ds\right\}\right] = E\left[\exp\left\{-\int_0^T v_s\,dW_s - \frac{1}{2}\int_0^T v_s^2\,ds\right\}\right] = 1. \quad \blacksquare$$

For our example of the security price model we compute the required optional projection explicitly in Section 5.

3 E-M algorithm

We briefly describe the extended E-M algorithm presented in Dembo and Zeitouni (1986). In the next section we apply this algorithm to the estimation of $\theta$ in the model defined by (4) and (5). Let $\mathcal{F}^X_T$ be the filtration generated by $\{X_t,\ 0 \le t \le T\}$, $\mathcal{F}^Y_T$ the filtration generated by $\{Y_t,\ 0 \le t \le T\}$, and $\mathcal{F}^{X,Y}_T$ the filtration generated by $\{X_t,\ 0 \le t \le T\}$ and $\{Y_t,\ 0 \le t \le T\}$. We identify $\Omega$ with $C[0,T] \times C[0,T]$ and $\mathcal{F}$ with the Borel sets of $\Omega$. Let $X$ be the first coordinate mapping process and $Y$ the second coordinate mapping process, i.e., for $(\omega_1, \omega_2) \in C[0,T] \times C[0,T]$ we have $X_t(\omega_1, \omega_2) = \omega_1(t)$ and $Y_t(\omega_1, \omega_2) = \omega_2(t)$. Let $\{P_\theta\}$ be a family of probability measures such that $P_\theta \sim P_\phi$ for all $\theta, \phi \in \Theta$, where $\Theta$ is the parameter space. Also let $E_\theta$ denote expectation under the measure $P_\theta$. We denote by $P^Y_\theta$ the restriction of $P_\theta$ to $\mathcal{F}^Y_T$. Then obviously

$$\frac{dP^Y_\theta}{dP^Y_\phi} = E_\phi\left(\frac{dP_\theta}{dP_\phi} \,\Big|\, \mathcal{F}^Y_T\right). \tag{22}$$


The maximum likelihood estimator $\hat{\theta}$ of $\theta$ is defined as

$$\hat{\theta} = \arg\max_\theta \left(\frac{dP^Y_\theta}{dP^Y_\phi}\right) \quad \text{for some constant } \phi \in \Theta.$$

It should be clear that this definition does not depend on the choice of $\phi$. The maximization of $dP^Y_\theta/dP^Y_\phi$ is equivalent to the maximization of the log-likelihood function

$$L_\phi(\theta) \triangleq \log\frac{dP^Y_\theta}{dP^Y_\phi}. \tag{23}$$

Now, by (22) and (23),

$$L_\phi(\theta') - L_\phi(\theta) = \log\frac{dP^Y_{\theta'}}{dP^Y_\theta} = \log E_\theta\left(\frac{dP_{\theta'}}{dP_\theta} \,\Big|\, \mathcal{F}^Y_T\right),$$

and from Jensen's inequality

$$L_\phi(\theta') - L_\phi(\theta) \ge E_\theta\left(\log\frac{dP_{\theta'}}{dP_\theta} \,\Big|\, \mathcal{F}^Y_T\right), \tag{24}$$

with equality if and only if $\theta' = \theta$. The E-M algorithm is based on (24) and proceeds as follows. In the first step we select an initial value $\theta_0$ for $\theta$. In the $(n+1)$st iteration, $n \ge 0$, the E-M algorithm is applied to produce $\theta_{n+1}$. Each iteration consists of two steps: the expectation, or E, step and the maximization, or M, step. In the E-step of the $(n+1)$st iteration the following quantity is computed:

$$Q(\theta, \theta_n) \triangleq E_{\theta_n}\left(\log\frac{dP_\theta}{dP_{\theta_n}} \,\Big|\, \mathcal{F}^Y_T\right).$$

In the M-step, $Q(\theta, \theta_n)$ is maximized with respect to $\theta$ to obtain $\theta_{n+1}$. Iterations are repeated until the sequence $\{\theta_n\}$ converges. It is clear from (24) that

$$L_\phi(\theta_{n+1}) \ge L_\phi(\theta_n), \tag{25}$$

with equality if and only if $\theta_{n+1} = \theta_n$. Thus at each iteration of the E-M algorithm the likelihood function increases. In the M-step the maximizer is obtained as the solution of

$$\frac{\partial}{\partial\theta}Q(\theta, \theta_n) = \frac{\partial}{\partial\theta}E_{\theta_n}\left(\log\frac{dP_\theta}{dP_{\theta_n}} \,\Big|\, \mathcal{F}^Y_T\right) = 0, \tag{26}$$

or, if we can interchange the differentiation and expectation operations, as the solution of

$$E_{\theta_n}\left(\frac{\partial}{\partial\theta}\log\frac{dP_\theta}{dP_{\theta_n}} \,\Big|\, \mathcal{F}^Y_T\right) = 0.$$

A corollary of (25) is that the maximum likelihood estimator is a fixed point of the E-M algorithm. For a summary of the convergence properties of the E-M algorithm and additional references we refer the reader to Dembo and Zeitouni (1986).
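In outline, the algorithm is a fixed-point iteration. Below is a minimal sketch of ours, with hypothetical callables: e_step should return whatever filtered quantities $Q(\cdot, \theta_n)$ requires, and m_step should return its maximizer, which for our model is available in closed form in Section 4.

```python
def em_algorithm(theta0, e_step, m_step, max_iter=200, tol=1e-8):
    """Generic E-M iteration: theta_{n+1} maximizes Q(theta, theta_n).

    e_step : theta_n -> quantities needed to evaluate Q(., theta_n)
    m_step : those quantities -> the maximizer theta_{n+1}
    """
    theta = theta0
    for _ in range(max_iter):
        filtered = e_step(theta)          # E-step: conditional expectations
        theta_next = m_step(filtered)     # M-step: solve (26)
        if max(abs(a - b) for a, b in zip(theta_next, theta)) < tol:
            return theta_next             # converged: a fixed point is reached
        theta = theta_next
    return theta
```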

4 Estimation of the drift of the security price process with the E-M algorithm

We develop the E-M algorithm for the estimation of $\theta = (\alpha, \delta, \lambda, \nu)$ in the security price model in (4) and (5). By Girsanov's theorem, if the process $X$ is also completely observed, the likelihood function of $\theta$ is given by

$$\frac{dP_\theta}{dP_{\mathbf{W}}} = \exp\left\{\int_0^T \alpha(\delta - X_s)\,dX_s - \frac{1}{2}\int_0^T \alpha^2(\delta - X_s)^2\,ds\right\} \times \exp\left\{\int_0^T (\lambda X_s + \nu)\,dY_s - \frac{1}{2}\int_0^T (\lambda X_s + \nu)^2\,ds\right\},$$

where $\mathbf{W} = (W^1, W)$ is a two-dimensional Brownian motion and $P_{\mathbf{W}}$ is the associated Brownian measure on $\mathcal{B}(C[0,T] \times C[0,T])$. Hence

$$\log\frac{dP_\theta}{dP_{\theta_n}} = \left\{\int_0^T \alpha(\delta - X_s)\,dX_s - \frac{1}{2}\int_0^T \alpha^2(\delta - X_s)^2\,ds\right\} + \left\{\int_0^T (\lambda X_s + \nu)\,dY_s - \frac{1}{2}\int_0^T (\lambda X_s + \nu)^2\,ds\right\} + f(\theta_n),$$

where $f(\theta_n)$ is a term which does not depend on $\theta$. Since, for our problem, we can interchange the operations of differentiation and integration, the equations in (26) take the form

$$E_{\theta_n}\left[\frac{\partial}{\partial\alpha}\log\frac{dP_\theta}{dP_{\theta_n}} \,\Big|\, \mathcal{F}^Y_T\right] = E_{\theta_n}\left[\int_0^T (\delta - X_s)\,dX_s - \alpha\int_0^T (\delta - X_s)^2\,ds \,\Big|\, \mathcal{F}^Y_T\right] = 0, \tag{27}$$


$$E_{\theta_n}\left[\frac{\partial}{\partial\delta}\log\frac{dP_\theta}{dP_{\theta_n}} \,\Big|\, \mathcal{F}^Y_T\right] = E_{\theta_n}\left[\alpha X_T - \int_0^T \alpha^2(\delta - X_s)\,ds \,\Big|\, \mathcal{F}^Y_T\right] = 0, \tag{28}$$

$$E_{\theta_n}\left[\frac{\partial}{\partial\lambda}\log\frac{dP_\theta}{dP_{\theta_n}} \,\Big|\, \mathcal{F}^Y_T\right] = E_{\theta_n}\left[\int_0^T X_s\,dY_s - \int_0^T (\lambda X_s + \nu)X_s\,ds \,\Big|\, \mathcal{F}^Y_T\right] = 0, \tag{29}$$

$$E_{\theta_n}\left[\frac{\partial}{\partial\nu}\log\frac{dP_\theta}{dP_{\theta_n}} \,\Big|\, \mathcal{F}^Y_T\right] = E_{\theta_n}\left[Y_T - \int_0^T (\lambda X_s + \nu)\,ds \,\Big|\, \mathcal{F}^Y_T\right] = 0. \tag{30}$$

Let

$$m(t, T, \theta_n) = E_{\theta_n}(X_t \mid \mathcal{F}^Y_T), \qquad n(t, T, \theta_n) = E_{\theta_n}(X_t^2 \mid \mathcal{F}^Y_T).$$

Then, recalling that $\int_0^T X_s\,dX_s = \frac{1}{2}X_T^2 - \frac{T}{2}$, equation (27) becomes

$$\delta\,m(T, T, \theta_n) - \frac{1}{2}n(T, T, \theta_n) + \frac{T}{2} - \alpha\delta^2 T + 2\alpha\delta\int_0^T m(s, T, \theta_n)\,ds - \alpha\int_0^T n(s, T, \theta_n)\,ds = 0. \tag{31}$$

Equation (28) takes the form

$$m(T, T, \theta_n) - \alpha\delta T + \alpha\int_0^T m(s, T, \theta_n)\,ds = 0. \tag{32}$$

Multiplying (32) by $\delta$ and subtracting the result from (31) gives

$$-\frac{1}{2}n(T, T, \theta_n) + \frac{T}{2} + \alpha\delta\int_0^T m(s, T, \theta_n)\,ds - \alpha\int_0^T n(s, T, \theta_n)\,ds = 0. \tag{33}$$

From equations (32) and (33) we obtain the updated values of $\alpha$ and $\delta$:

$$\alpha_{n+1} = \frac{-\frac{T}{2}\,n(T, T, \theta_n) + \frac{T^2}{2} + m(T, T, \theta_n)\int_0^T m(s, T, \theta_n)\,ds}{T\int_0^T n(s, T, \theta_n)\,ds - \left[\int_0^T m(s, T, \theta_n)\,ds\right]^2}, \tag{34}$$

$$\delta_{n+1} = \frac{m(T, T, \theta_n)}{T\,\alpha_{n+1}} + \frac{\int_0^T m(s, T, \theta_n)\,ds}{T}. \tag{35}$$


Similarly, we write equation (30) as

$$Y_T - \lambda\int_0^T m(s, T, \theta_n)\,ds - \nu T = 0, \tag{36}$$

and equation (29) as

$$E_{\theta_n}\left[\int_0^T X_s\,dY_s \,\Big|\, \mathcal{F}^Y_T\right] - \lambda\int_0^T n(s, T, \theta_n)\,ds - \nu\int_0^T m(s, T, \theta_n)\,ds = 0. \tag{37}$$

We compute the first term of (37) in the following proposition. Let

$$m_s(\theta_n) = E_{\theta_n}(X_s \mid \mathcal{F}^Y_s),$$

and, to simplify notation, we will write

$$m(t, T) = m(t, T, \theta), \qquad n(t, T) = n(t, T, \theta), \qquad m_s = m_s(\theta).$$

Proposition 4

$$E_\theta\left[\int_0^T X_s\,dY_s \,\Big|\, \mathcal{F}^Y_T\right] = \int_0^T m_s\,dY_s - \int_0^T m_s(\nu + \lambda m_s)\,ds + \nu\int_0^T m(s, T)\,ds + \lambda\int_0^T m(s, T)\,m_s\,ds.$$

Proof. Let

$$\widetilde{W}_t = Y_t - \nu t - \lambda\int_0^t m_s\,ds. \tag{38}$$

By Lemma 11.3 in Liptser and Shiryayev (1978), the process $(\widetilde{W}_t, \mathcal{F}^Y_t)$, $0 \le t \le T$, is a Brownian motion. By Theorem 12.1 of Liptser and Shiryayev (1978), $m_t$ satisfies the equation

$$dm_t = \alpha(\delta - m_t)\,dt - \lambda\gamma_t(\nu + \lambda m_t)\,dt + \lambda\gamma_t\,dY_t, \quad m_0 = 0, \tag{39}$$

where $\gamma_t = V(X_t \mid \mathcal{F}^Y_t)$ is a deterministic function satisfying

$$\frac{d\gamma_t}{dt} = -\lambda^2\gamma_t^2 - 2\alpha\gamma_t + 1, \quad \gamma_0 = 0. \tag{40}$$


From (38),

$$d\widetilde{W}_t = dY_t - (\nu + \lambda m_t)\,dt, \tag{41}$$

and thus

$$dm_t = \alpha(\delta - m_t)\,dt + \lambda\gamma_t\,d\widetilde{W}_t. \tag{42}$$

We see from (42) that $m_t$ is $\mathcal{F}^{\widetilde{W}}_t$-adapted. Then $Y_t$ is $\mathcal{F}^{\widetilde{W}}_t$-adapted by (38). Hence

$$E_\theta\left[\int_0^T X_s\,dY_s \,\Big|\, \mathcal{F}^Y_T\right] = E_\theta\left[\int_0^T X_s\,dY_s \,\Big|\, \mathcal{F}^{\widetilde{W}}_T\right] = E_\theta\left[\int_0^T X_s\,d\widetilde{W}_s + \int_0^T X_s(\nu + \lambda m_s)\,ds \,\Big|\, \mathcal{F}^{\widetilde{W}}_T\right]$$
$$= E_\theta\left[\int_0^T X_s\,d\widetilde{W}_s \,\Big|\, \mathcal{F}^{\widetilde{W}}_T\right] + \nu\int_0^T m(s, T)\,ds + \lambda\int_0^T m(s, T)\,m_s\,ds. \tag{43}$$

Now, by Theorem 5.14 in Liptser and Shiryayev (1977, p. 185),

$$E_\theta\left[\int_0^T X_s\,d\widetilde{W}_s \,\Big|\, \mathcal{F}^{\widetilde{W}}_T\right] = \int_0^T m_s\,d\widetilde{W}_s, \tag{44}$$

provided that

$$E|X_s| < \infty, \quad 0 \le s \le T, \tag{45}$$

and

$$P\left(\int_0^T \left[E\left(|X_s| \,\big|\, \mathcal{F}^{\widetilde{W}}_s\right)\right]^2 ds < \infty\right) = 1. \tag{46}$$

Conditions (45) and (46) hold in our case, and thus by (44), (43) and (41),

$$E_\theta\left[\int_0^T X_s\,dY_s \,\Big|\, \mathcal{F}^Y_T\right] = \int_0^T m_s\,d\widetilde{W}_s + \nu\int_0^T m(s, T)\,ds + \lambda\int_0^T m(s, T)\,m_s\,ds$$
$$= \int_0^T m_s\,dY_s - \int_0^T m_s(\nu + \lambda m_s)\,ds + \nu\int_0^T m(s, T)\,ds + \lambda\int_0^T m(s, T)\,m_s\,ds,$$


which is the statement of the Proposition.

By Proposition 4, equation (37) becomes

$$0 = \int_0^T m_s(\theta_n)\,dY_s - \nu\int_0^T m_s(\theta_n)\,ds - \lambda\int_0^T m_s^2(\theta_n)\,ds + \lambda\int_0^T m(s, T, \theta_n)\,m_s(\theta_n)\,ds - \lambda\int_0^T n(s, T, \theta_n)\,ds. \tag{47}$$

From (36) and (47) the updated values for $\lambda$ and $\nu$ are

$$\lambda_{n+1} = \frac{T\int_0^T m_s(\theta_n)\,dY_s - Y_T\int_0^T m_s(\theta_n)\,ds}{T\int_0^T \left[m_s^2(\theta_n) + n(s, T, \theta_n) - m(s, T, \theta_n)\,m_s(\theta_n)\right]ds - \int_0^T m(s, T, \theta_n)\,ds\int_0^T m_s(\theta_n)\,ds}, \tag{48}$$

and

$$\nu_{n+1} = \frac{Y_T - \lambda_{n+1}\int_0^T m(s, T, \theta_n)\,ds}{T}. \tag{49}$$

To implement the E-M algorithm using equations (34), (35), (48) and (49) we need expressions for $m_s$, $m(s, T)$ and $n(s, T)$. These are obtained in the next section.
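For concreteness, here is a sketch of ours (argument names hypothetical) of the M-step of one iteration: given the filter $m_s(\theta_n)$ and the smoothed moments $m(s,T,\theta_n)$, $n(s,T,\theta_n)$ on a grid, it evaluates (34), (35), (48) and (49), approximating the time integrals by Riemann sums and $\int_0^T m_s\,dY_s$ by a left-endpoint sum:

```python
import numpy as np

def em_update(Y, m_filt, m_sm, n_sm, dt):
    """One M-step: equations (34), (35), (48), (49).

    Y      : observed path on a grid of spacing dt (length N+1)
    m_filt : filter m_s(theta_n) = E[X_s | F_s^Y] on the grid
    m_sm   : smoother m(s,T,theta_n) = E[X_s | F_T^Y]
    n_sm   : smoothed second moment n(s,T,theta_n) = E[X_s^2 | F_T^Y]
    """
    T = dt * (len(Y) - 1)
    int_m = dt * np.sum(m_sm[:-1])                  # int_0^T m(s,T) ds
    int_n = dt * np.sum(n_sm[:-1])                  # int_0^T n(s,T) ds
    alpha = ((-0.5 * T * n_sm[-1] + 0.5 * T**2 + m_sm[-1] * int_m)
             / (T * int_n - int_m**2))                                   # (34)
    delta = m_sm[-1] / (T * alpha) + int_m / T                           # (35)
    int_mdY = np.sum(m_filt[:-1] * np.diff(Y))      # int_0^T m_s dY_s
    int_mf = dt * np.sum(m_filt[:-1])               # int_0^T m_s ds
    mixed = dt * np.sum(m_filt[:-1]**2 + n_sm[:-1] - m_sm[:-1] * m_filt[:-1])
    lam = (T * int_mdY - Y[-1] * int_mf) / (T * mixed - int_m * int_mf)  # (48)
    nu = (Y[-1] - lam * int_m) / T                                       # (49)
    return alpha, delta, lam, nu
```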

5 Expressions for Filters

We obtain $m_t$ as the solution of equation (39). Letting

$$\Phi_t = \exp\left(-\alpha t - \lambda^2\int_0^t \gamma_s\,ds\right), \tag{50}$$

the solution is

$$m_t = \Phi_t\left[\lambda\int_0^t \gamma_s\Phi_s^{-1}(dY_s - \nu\,ds) + \alpha\delta\int_0^t \Phi_s^{-1}\,ds\right]. \tag{51}$$

The function $\gamma_t$ is obtained as the solution of (40) and is given by

$$\gamma_t = \frac{b}{\lambda^2}\,\frac{\exp(2bt) - d}{\exp(2bt) + d} - \frac{\alpha}{\lambda^2}, \tag{52}$$

where

$$b = \sqrt{\alpha^2 + \lambda^2}, \qquad d = \frac{b - \alpha}{b + \alpha}.$$

For the evaluation of $m_t$ it is useful to note the following expressions. Substituting (52) into (50) gives

$$\Phi_t = \exp\left(-b\int_0^t \frac{(b + \alpha)\exp(bs) - (b - \alpha)\exp(-bs)}{(b + \alpha)\exp(bs) + (b - \alpha)\exp(-bs)}\,ds\right) = \exp\left\{-\ln\left[(b + \alpha)\exp(bs) + (b - \alpha)\exp(-bs)\right]\Big|_0^t\right\}$$
$$= \frac{2b}{(b + \alpha)\exp(bt) + (b - \alpha)\exp(-bt)},$$

and also

$$\gamma_s\Phi_s^{-1} = \frac{\sinh(bs)}{b}.$$
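Numerically, (51) together with these closed forms gives the filter directly. The sketch below is ours (names hypothetical); it evaluates $m_t$ on a grid using $\gamma_s\Phi_s^{-1} = \sinh(bs)/b$ and the closed form of $\Phi_t$, with cumulative left-endpoint sums for the two integrals:

```python
import numpy as np

def filter_m(Y, dt, alpha, delta, lam, nu):
    """m_t = Phi_t * [ lam * int_0^t (gamma_s/Phi_s)(dY_s - nu ds)
                       + alpha*delta * int_0^t (1/Phi_s) ds ],  eq. (51)."""
    t = dt * np.arange(len(Y))
    b = np.sqrt(alpha**2 + lam**2)
    Phi = 2 * b / ((b + alpha) * np.exp(b * t) + (b - alpha) * np.exp(-b * t))
    g_over_Phi = np.sinh(b * t) / b               # gamma_s * Phi_s^{-1}
    dY = np.diff(Y)
    I1 = np.concatenate(([0.0], np.cumsum(g_over_Phi[:-1] * (dY - nu * dt))))
    I2 = np.concatenate(([0.0], dt * np.cumsum(1.0 / Phi[:-1])))
    return Phi * (lam * I1 + alpha * delta * I2)  # m_0 = 0 as required
```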

We note that the likelihood function of $\theta$ involves only the filter $m_s$ and has the following explicit form:

$$\exp\left\{\int_0^T \left[\lambda m_s(\theta) + \nu\right]dY_s - \frac{1}{2}\int_0^T \left[\lambda m_s(\theta) + \nu\right]^2 ds\right\}.$$
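Combined with the likelihood-ratio sketch of Section 2, the log-likelihood of $\theta$ on a sampled path follows by taking $v_s = \lambda m_s(\theta) + \nu$. This is again a hedged illustration of ours, reusing the hypothetical filter_m and log_likelihood_ratio defined in the earlier sketches:

```python
def log_likelihood(Y, dt, theta):
    """Log of the explicit likelihood above, with v_s = lam * m_s(theta) + nu."""
    alpha, delta, lam, nu = theta
    v = lam * filter_m(Y, dt, alpha, delta, lam, nu) + nu
    return log_likelihood_ratio(Y, v, dt)
```

Maximizing this function over $\theta$ numerically is an alternative to the E-M iteration of Section 4, as the paper notes for the exact likelihood.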

Application of Lemma 12.3 and Theorem 12.3 in Liptser and Shiryayev (1978, pp. 36-37) to our model shows that

$$m(s, t) = \left(\varphi^t_s\right)^{-1} m_t - \alpha\delta\int_s^t \left(\varphi^u_s\right)^{-1} du - \lambda\int_s^t \left(\varphi^u_s\right)^{-1}\gamma(u, s)\,(dY_u - \nu\,du), \quad s \le t, \tag{53}$$

and

$$\gamma(s, t) = \left[1 + \lambda^2\gamma_s\int_s^t \left(\varphi^u_s\right)^2 du\right]^{-1}\gamma_s, \quad s \le t, \tag{54}$$

respectively, where $\varphi^t_s$, $t \ge s$, is the solution of

$$\frac{d\varphi^t_s}{dt} = \left(-\alpha - \lambda^2\gamma(t, s)\right)\varphi^t_s, \quad \varphi^s_s = 1, \tag{55}$$

and, for $t > s$, $\gamma(t, s) = V(X_t \mid \mathcal{F}^Y_t, X_s)$ satisfies

$$\frac{d\gamma(t, s)}{dt} = -2\alpha\gamma(t, s) + 1 - \lambda^2\gamma(t, s)^2, \quad \gamma(s, s) = 0. \tag{56}$$

It can easily be verified that the solution to (56) is

$$\gamma(t, s) = \frac{b}{\lambda^2}\,\frac{\exp[2b(t - s)] - d}{\exp[2b(t - s)] + d} - \frac{\alpha}{\lambda^2}, \quad t \ge s. \tag{57}$$

Using (57), the solution to (55) is

$$\varphi^t_s = \exp\left[-b\int_s^t \frac{\exp[2b(u - s)] - d}{\exp[2b(u - s)] + d}\,du\right] = (1 + d)\,\frac{\exp[b(t - s)]}{\exp[2b(t - s)] + d}.$$

For the evaluation of $m(s, t)$ it is useful to note that

$$\left(\varphi^t_s\right)^{-1}\gamma(t, s) = \frac{\sinh[b(t - s)]}{b},$$

and to evaluate $\gamma(s, t)$ we compute the integral

$$\int_s^t \left(\varphi^u_s\right)^2 du = \frac{4b^2}{(b + \alpha)^2}\int_s^t \frac{\exp[2b(u - s)]}{\{\exp[2b(u - s)] + d\}^2}\,du = \frac{2b}{(b + \alpha)^2}\left[-\frac{1}{\exp[2b(u - s)] + d}\Big|_s^t\right]$$
$$= \frac{\exp[2b(t - s)] - 1}{(b + \alpha)\exp[2b(t - s)] + b - \alpha}.$$

Finally, we note that

$$n(s, t) = \gamma(s, t) + m(s, t)^2, \quad 0 \le s \le t.$$
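The smoothed moments entering (34), (35) and (48) can then be sketched as follows (ours; hypothetical names): for a fixed grid index k, so that $s = k\,dt$, the function evaluates (53) at $t = T$ using the identity $(\varphi^u_s)^{-1}\gamma(u,s) = \sinh[b(u-s)]/b$, and (54) with the closed-form integral above:

```python
import numpy as np

def smoothed_moments(Y, dt, theta, m_filt, k):
    """m(s,T) via (53) and n(s,T) = gamma(s,T) + m(s,T)^2 via (54), s = k*dt.

    m_filt : filter path m_t (e.g. from the filter_m sketch); m_filt[-1] = m_T
    """
    alpha, delta, lam, nu = theta
    b = np.sqrt(alpha**2 + lam**2)
    d = (b - alpha) / (b + alpha)
    tau = dt * np.arange(len(Y) - k)                  # u - s for u in [s, T]
    phi_inv = (np.exp(2*b*tau) + d) / ((1 + d) * np.exp(b*tau))  # (phi_s^u)^{-1}
    phigam = np.sinh(b * tau) / b                     # (phi_s^u)^{-1} gamma(u,s)
    dY = np.diff(Y[k:])
    m_sT = (phi_inv[-1] * m_filt[-1]
            - alpha * delta * dt * np.sum(phi_inv[:-1])
            - lam * np.sum(phigam[:-1] * (dY - nu * dt)))        # (53) at t = T
    s = dt * k
    gam_s = (b * (np.exp(2*b*s) - d) / (np.exp(2*b*s) + d) - alpha) / lam**2  # (52)
    Tms = tau[-1]                                     # T - s
    integral = ((np.exp(2*b*Tms) - 1)
                / ((b + alpha) * np.exp(2*b*Tms) + b - alpha))   # int_s^T (phi_s^u)^2 du
    gam_sT = gam_s / (1 + lam**2 * gam_s * integral)             # (54)
    return m_sT, gam_sT + m_sT**2
```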

Appendix


Proposition: Suppose that $\{(W_t, \mathcal{F}_t),\ t \le T\}$ is a Brownian Motion, $\mathcal{X}$ is a complete separable metric space, $X : \Omega \to \mathcal{X}$ is a measurable mapping, and $\{Q(x, \cdot),\ x \in \mathcal{X}\}$ is a family of regular conditional probability measures for $\mathcal{F}$ given $X$. Let $P_X$ be the measure induced by $X$ on $\mathcal{B}(\mathcal{X})$. Then

(a) for every $A \in \mathcal{F}$ such that $P(A) = 0$ we have $Q(x, A) = 0$ for $P_X$-almost every $x \in \mathcal{X}$;

(b) for $P_X$-almost every $x \in \mathcal{X}$, the process $\{(W_t, \mathcal{F}^W_t),\ t \le T\}$ is a Brownian Motion under the probability measure $Q(x, \cdot)$.

Proof: Statement (a) is an obvious consequence of the definition of a regular conditional probability measure. Indeed, for every $A \in \mathcal{F}$ we have

$$Q(x, A) = P(A \mid X = x) \quad \text{for } P_X\text{-a.e. } x \in \mathcal{X},$$

and thus

$$Q(X, A) = P(A \mid X) \quad P\text{-a.s.}$$

This implies that if $A$ is a $P$-zero event then $E[Q(X, A)] = P(A) = 0$, and (a) now follows. Next we prove (b). It follows from (a) that the process $W$ has $Q(x, \cdot)$-almost surely continuous paths for $P_X$-almost every $x \in \mathcal{X}$. We need to show that the finite-dimensional distributions of $W$ under $Q(x, \cdot)$ coincide with those under the probability measure $P$. Let us fix a positive integer $n$ and time points $0 \le t_1 \le \dots \le t_n \le T$. We denote by $\mathbb{Q}$ the set of rational numbers. For every $q = (q_1, \dots, q_n) \in \mathbb{Q}^n$ there exists a set $B_q \in \mathcal{B}(\mathcal{X})$ with $P_X(B_q) = 1$ such that for all $x \in B_q$

$$Q(x;\ W(t_1) \le q_1, \dots, W(t_n) \le q_n) = P(W(t_1) \le q_1, \dots, W(t_n) \le q_n \mid X = x) = P(W(t_1) \le q_1, \dots, W(t_n) \le q_n),$$

where the last step follows from the independence of $X$ and $W$ under $P$. Now we define

$$B = \bigcap_{q \in \mathbb{Q}^n} B_q,$$

and note that $P_X(B) = 1$ since $\mathbb{Q}^n$ is countable. It follows that the equation above holds for every $x \in B$, which completes the proof.


Acknowledgement

Halina Frydman completed part of this work while visiting the Department of Biostatistics at the University of Copenhagen. She thanks the department for its hospitality.

REFERENCES

Bielecki, T. R. and Pliska, S. R. (1999). Risk Sensitive Dynamic Asset Management. Journal of Applied Mathematics and Optimization 39: 337-360.

Dembo, A. and Zeitouni, O. (1986). Parameter Estimation of Partially Observed Continuous Time Stochastic Processes. Stochastic Processes and their Applications 23: 91-113.

Elliott, R. J., Aggoun, L. and Moore, J. B. (1997). Hidden Markov Models. Springer-Verlag.

Haugh, M. B. and Lo, A. W. (2000). Asset Allocation and Derivatives. Draft. MIT Sloan School of Management.

Kailath, T. and Zakai, M. (1971). Absolute Continuity and Radon-Nikodym Derivatives for Certain Measures Relative to Wiener Measure. The Annals of Mathematical Statistics 42: 130-140.

Karatzas, I. and Shreve, S. E. (1988). Brownian Motion and Stochastic Calculus. Springer-Verlag.

Kim, T. and Omberg, E. (1996). Dynamic Nonmyopic Portfolio Behavior. Review of Financial Studies 9: 141-161.

Lakner, P. (1998). Optimal Trading Strategy for an Investor: the Case of Partial Information. Stochastic Processes and their Applications 76: 77-97.

Liptser, R. S. and Shiryayev, A. N. (1977). Statistics of Random Processes, Vol. 1. Springer-Verlag.

Liptser, R. S. and Shiryayev, A. N. (1978). Statistics of Random Processes, Vol. 2. Springer-Verlag.


Protter, P. (1990). Stochastic Integration and Differential Equations. Springer-Verlag.

Department of Statistics and Operations Research
Stern School of Business, New York University
44 West 4th Street, New York, NY 10012
E-mail: [email protected], [email protected]

Abbreviated Title: Hidden Markov Processes
