
Statistics and Probability Letters 80 (2010) 1157–1166


Patterns generated by mth-order Markov chains

Evan Fisher a,*,1, Shiliang Cui b

a Department of Mathematics, Lafayette College, 111 Quad Drive, Easton, PA 18042, United States
b OPIM Department, The Wharton School, University of Pennsylvania, 3730 Walnut Street, Philadelphia, PA 19104, United States

Article history: Received 1 February 2010; Accepted 15 March 2010; Available online 24 March 2010

Keywords: Patterns; Markov chains; Waiting time

Abstract

We derive an expression for the expected time for a pattern to appear in higher-order Markov chains, with and without a starting sequence. This yields a result for directly calculating the expected value of the first time at which one of a collection of patterns appears, in addition to the probability, for each pattern, that it is the first to appear.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Let S be a non-empty finite set. A pattern from S is any finite sequence of elements from S. A pattern A of length n ∈ N = {1, 2, …} is denoted by A = a_1 a_2 … a_n, where a_i ∈ S for every i = 1, 2, …, n. In the case of a pattern generated by a sequence of random variables Z_1, Z_2, …, Z_n, we denote the random pattern similarly as Z_1 Z_2 … Z_n. (In the context of this paper, this notation does not represent a product.)

We introduce the following notation for typographical efficiency.

Notation 1.1. Let A = a_1 a_2 … a_n be a pattern from S. For each m = 1, 2, …, n, let A_m = a_1 a_2 … a_m and let Ā_m = a_{n−m+1} … a_n. That is, A_m represents the pattern consisting of the first m elements of A and Ā_m represents the pattern consisting of the last m elements of A. For k ≤ m ≤ n, we define the pattern A_{m,k} by A_{m,k} = a_{m−k+1} … a_m. (In what follows, all indices are assumed to take values in N, unless explicitly indicated otherwise.)

We adopt the same notation for random patterns Z_1 … Z_n with the exception that, to avoid ambiguity, we denote the random pattern Z_1 Z_2 … Z_m by Z_{m,m}, rather than Z_m.
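In code, the notation above amounts to simple slicing. A minimal Python sketch (patterns represented as strings; illustrative only, not part of the paper) may help fix the conventions:

```python
def prefix(A, m):
    """A_m: the first m elements of the pattern A."""
    return A[:m]

def suffix(A, m):
    """Abar_m: the last m elements of the pattern A."""
    return A[len(A) - m:]

def segment(A, m, k):
    """A_{m,k} = a_{m-k+1} ... a_m: the last k elements of A_m (k <= m <= len(A))."""
    return A[m - k:m]

A = "31323"
print(prefix(A, 2), suffix(A, 2), segment(A, 4, 3))   # 31 23 132
```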

Let Z_1, Z_2, … be an mth-order, irreducible, homogeneous Markov chain defined on a probability space (Ω, F, P) with finite state space S. That is, for every n ∈ N with n ≥ m and for s, s_1, …, s_m ∈ S and x_1, x_2, …, x_{n−m} ∈ S such that

P(Z_n = s_m, …, Z_{n−m+1} = s_1, Z_{n−m} = x_{n−m}, …, Z_1 = x_1) > 0,

we have

P(Z_{n+1} = s | Z_n = s_m, …, Z_{n−m+1} = s_1, Z_{n−m} = x_{n−m}, …, Z_1 = x_1) = P(Z_{n+1} = s | Z_n = s_m, …, Z_{n−m+1} = s_1).    (1.1)

We denote the latter conditional probability by P_{s_1⋯s_m, s}.

* Corresponding author. E-mail addresses: [email protected] (E. Fisher), [email protected] (S. Cui).
1 Tel.: +1 610 330 5281; fax: +1 610 330 5721.

0167-7152/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2010.03.011


Notation 1.2. Let µ be a probability distribution on the set of all patterns of length m from S and suppose that µ is the initial distribution of the Markov chain. That is, suppose that if A = a_1 … a_m is a pattern from S, then µ_A = P(Z_1 = a_1, …, Z_m = a_m). We denote the probability of events C associated with the Markov chain with initial distribution µ by P_µ(C), and if µ_A = 1, we denote P_µ(C) by P_A(C).

Definition 1.1. Let B = b_1 … b_n be a pattern from S where n > m. The pattern B is said to be observable if

µ_{B_m} ∏_{j=1}^{n−m} P_{B_{m+j−1,m}, b_{m+j}} > 0.    (1.2)

Let B = b_1 … b_n be an observable pattern and define the random variable T by

T = inf{k ≥ n : Z_{k,n} = B}.    (1.3)

Our main result, Theorem 3.1, describes a closed-form expression for ET, the expected value of the first time, T, that the pattern B is generated by the Markov chain {Z_i}_{i∈N}.

Theorem 3.1 generalizes the results of Li (1980) and Benevento (1984) for patterns generated by i.i.d. random variables and first-order Markov chains, respectively. Our proof is based partially upon a modification of the martingale construction used by Li (1980) (recent papers by Pozdnyakov (2008a,b) and Glaz et al. (2006) employ the same modification). However, when combined with the occupation measure approach of Benevento (1984), we obtain an efficient method for computing the expected time for a given pattern to appear in higher-order Markov chains; this is the content of Sections 2 and 3, illustrated by Example 3.1.

This result is applied, in Section 4, to derive a closed-form expression for the expected number of transitions from a starting pattern to the first appearance of a given target pattern (Theorem 4.1). This extends a result of Li (1980) for the i.i.d. case. In Section 5, we apply this result, using a technique employed in Gerber and Li (1981), to obtain the expected time at which a pattern from a finite collection of patterns first appears, as well as the probability that each pattern in the collection is the first to appear. This yields a direct and efficient procedure for computing these quantities and is an alternative to the approach in Pozdnyakov (2008a) and to the generating function approach of Fu and Lou (2006) for computing the former. This is illustrated by Example 5.1.

2. Martingale construction

For every j = 1, 2, …, m − 1 and for all t = 1, 2, …, define M^{(j)}_t = 0. Similarly, for every j = m, m + 1, …, and every t = 1, 2, …, j, define M^{(j)}_t = 0.

For j ≥ m and j < t ≤ j + n − m, define M^{(j)}_t by

M^{(j)}_t = 0,  if Z_{j,m} ≠ B_m;
M^{(j)}_t = (∏_{k=1}^{t−j} P_{B_{m+k−1,m}, b_{m+k}})^{−1} − 1,  if Z_{t,m+t−j} = B_{m+t−j};
M^{(j)}_t = −1,  if Z_{j,m} = B_m and Z_{t,m+t−j} ≠ B_{m+t−j}.    (2.1)

For t > j + n − m, define M^{(j)}_t by

M^{(j)}_t = M^{(j)}_{j+n−m}.    (2.2)

The process {M^{(j)}_t}_{t∈N} represents the following gambling game. For each j ≥ m, a gambler arrives at the game just after Z_j has been observed. If the sequence of m observations ending with Z_j does not match B_m, the gambler does not enter the game. Otherwise, the gambler enters the game betting one dollar that the next observation, Z_{j+1}, generates the next element of the target pattern, B. If that occurs, the gambler wins the amount (P_{B_m, b_{m+1}})^{−1} dollars and bets that amount on the outcome Z_{j+2} = b_{m+2}. If Z_{j+2} ≠ b_{m+2}, the gambler sustains a net loss of 1 dollar. If Z_{j+2} = b_{m+2}, the gambler wins the amount (P_{B_m, b_{m+1}} P_{B_{m+1,m}, b_{m+2}})^{−1} and must bet that amount on the event that the next element generated continues the pattern B.

This pattern of play continues until either B is subsequently generated, in which case the net gain of the gambler is fixed at (∏_{k=1}^{n−m} P_{b_k…b_{m+k−1}, b_{m+k}})^{−1} − 1, or a generated element fails to continue the pattern, in which case the gambler sustains a fixed net loss of 1. The game has been defined so that it is a fair game and, as noted in Pozdnyakov (2008a,b), the process M^{(j)}_t is a martingale. This is the content of Proposition 2.1.

Proposition 2.1. For each j = 1, 2, …, the process {M^{(j)}_t, F_t}_{t∈N} is a martingale, where F_t = σ(Z_1, Z_2, …, Z_t), the sigma algebra generated by the set of random variables Z_1, Z_2, …, Z_t.

Proof. The result follows by direct calculation using Eqs. (2.1) and (2.2).


For each t ∈ N, we define X_t by X_t = ∑_{j=1}^{∞} M^{(j)}_t. In the context of the underlying gambling game, X_t represents the net gain of all the gamblers after the tth play.

Proposition 2.2. The process {X_t, F_t}_{t∈N} is a martingale.

Proof. The result follows immediately from Proposition 2.1.

From this point, our proof represents an adaptation and modification of that used in Benevento (1984). Let δ : S × S → R be the Kronecker delta function:

δ(x, y) = 1 if x = y, and δ(x, y) = 0 otherwise.

We extend this definition to patterns of length greater than or equal to one. If x = x_1 x_2 … x_k and y = y_1 y_2 … y_k are patterns from S, we define δ(x; y) by

δ(x; y) = ∏_{i=1}^{k} δ(x_i, y_i).    (2.3)

That is, the function δ identifies whether the two patterns match.

Define N by

N = ∑_{t=m}^{T−1} 1_{(Z_{t,m} = B_m)},    (2.4)

where 1_C is the indicator function of the event C and T is defined by Eq. (1.3). Then N represents the number of times prior to time T that the subpattern B_m appears and, therefore, the number of gamblers who have entered the game (that is, placed a bet) through time T − 1. It follows that

X_T = ∑_{j=1}^{n−m} (∏_{k=1}^{j} P_{B_{m+k−1,m}, b_{m+k}})^{−1} δ(B_{m+j}; B̄_{m+j}) − N.    (2.5)

(Note that δ(B_{m+j}; B̄_{m+j}) = 1 if and only if the first m + j and last m + j elements of B match.)

We make use of the following result (see Williams (1991), p. 101).

Lemma 2.1. Suppose that T is a stopping time such that

P(T ≤ n + N | F_n) > ε  a.s.

for some N ∈ N, some ε > 0, and for every n ∈ N. Then ET < ∞.

Lemma 2.2. Let T be the stopping time for the pattern B = b_1 … b_n. Then ET < ∞.

Proof. Since the Markov chain is irreducible and the number of states (and patterns of length m) is finite, there exists some K ∈ N such that for every pattern β of length m, there exists k(β) ∈ N with k(β) < K such that α_β = P_β(Z_{k(β),m} = B_m) > 0. Let α = min{α_β : β is a pattern of length m}. Then α > 0.

Let N = K + n. It follows from the Markov property that

P(T ≤ k + N | F_k) ≥ α ∏_{j=1}^{n−m} P_{B_{m+j−1,m}, b_{m+j}} > 0

for all k ∈ N. By Lemma 2.1, we obtain ET < ∞.

Proposition 2.3. The process {X_{t∧T}, F_t}_{t∈N} is a uniformly integrable martingale.

Proof. It is a standard result (see Williams (1991), p. 99) that the process {X_{t∧T}, F_t}_{t∈N} is a martingale. Let

c = ∑_{j=1}^{n−m} (∏_{k=1}^{j} P_{B_{m+k−1,m}, b_{m+k}})^{−1} δ(B_{m+j}; B̄_{m+j}).

It is sufficient to show that X_{t∧T} is bounded by an integrable random variable. Since |X_{t∧T}| ≤ max{c, t ∧ T} for all t = 1, 2, …, and ET < ∞ (Lemma 2.2), the process {X_{t∧T}, F_t}_{t∈N} is uniformly integrable.


Proposition 2.4. Let N be as defined by (2.4). Then

EN = ∑_{j=1}^{n−m} (∏_{k=1}^{j} P_{B_{m+k−1,m}, b_{m+k}})^{−1} δ(B_{m+j}; B̄_{m+j}).    (2.6)

Proof. Since X_1 = 0, the result follows from (2.5) and Proposition 2.3.

Let N* equal the number of times the initial segment B_m of B appears up to and including time T. Then N* can be defined by

N* = N + δ(B_m; B̄_m).    (2.7)

Define W_1 by

W_1 = min{k ≥ m | Z_{k,m} = B_m}.    (2.8)

Define W_2 by W_2 = min{k > W_1 | Z_{k,m} = B_m}, and for j > 2 define W_j by W_j = min{k > W_{j−1} | Z_{k,m} = B_m}. That is, W_1, W_2, W_3, … are the successive hitting times for the initial segment B_m of B.

Define W by

W = min{k ≥ 1 | Z_{m+k,m} = B_m}.    (2.9)

If A = a_1 a_2 … a_m is a pattern of length m from S then, consistent with Notation 1.2, we define E_A W by

E_A W = E[W | Z_{m,m} = A].

Given an initial pattern A, E_A W equals the expected number of additional transitions for the pattern B_m to appear. Since the Markov chain is irreducible and the state space is finite, it is a standard result from the theory of Markov chains (e.g. see Resnick (1992), pp. 119–120) that there exists a unique stationary distribution π and that

π_{B_m}^{−1} = E_{B_m} W < ∞.    (2.10)

Similarly, E_{B̄_m} W < ∞.

For each stopping time W_n, we define the σ-field F_{W_n} by

F_{W_n} = {Λ ∈ F : Λ ∩ {W_n = k} ∈ F_k for all k ∈ N}

(see Durrett (2005), p. 285).

Lemma 2.3. For each n ∈ N, the event (N* ≥ n) ∈ F_{W_n}.

Proof. For every n ∈ N, we note that (N* ≥ n) = (W_n ≤ T). If k, n ∈ N, then

(W_n ≤ T) ∩ (W_n = k) = (T ≥ k) ∩ (W_n = k) ∈ σ(Z_1, …, Z_k).

Therefore, (W_n ≤ T) ∈ F_{W_n}.

3. Main theorem: proof and corollaries

Theorem 3.1. Let Z_1, Z_2, … be an irreducible, mth-order Markov chain with finite state space S. Suppose that B = b_1 … b_n is an observable pattern from S. Let T, N*, W_1, and W be as defined by (1.3) and (2.7)–(2.9), respectively. Then

ET = EW_1 + (EN*) π_{B_m}^{−1} − E_{B̄_m} W,    (3.1)

where

EN* = ∑_{j=1}^{n−m} (∏_{k=1}^{j} P_{B_{m+k−1,m}, b_{m+k}})^{−1} δ(B_{m+j}; B̄_{m+j}) + δ(B_m; B̄_m).    (3.2)
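For readers implementing Theorem 3.1, the quantity EN* in (3.2) can be computed directly. A minimal sketch (patterns as strings; the transition table is supplied by the user), checked against the value EN* = P_{32,3}^{−1} = 3/2 that arises for the pattern B = 323 in Example 3.1 below:

```python
from fractions import Fraction

def EN_star(B, m, P):
    """EN* from Eq. (3.2).  B is the target pattern (a string), m the order of the
    chain, and P maps (length-m context, next symbol) -> transition probability.
    delta(B_{m+j}; Bbar_{m+j}) is 1 iff the length-(m+j) prefix and suffix of B agree."""
    n = len(B)
    total = Fraction(1) if B[:m] == B[n - m:] else Fraction(0)   # delta(B_m; Bbar_m)
    prod = Fraction(1)
    for j in range(1, n - m + 1):
        prod *= P[(B[j - 1:j - 1 + m], B[m + j - 1])]   # P_{B_{m+j-1,m}, b_{m+j}}
        if B[:m + j] == B[n - (m + j):]:
            total += 1 / prod
    return total

# Only the transition probabilities along B are needed; for B = 323, m = 2,
# the single relevant entry is P_{32,3} = 2/3.
P = {("32", "3"): Fraction(2, 3)}
print(EN_star("323", 2, P))   # 3/2
```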

Proof. Since W_{N*+1} = W_1 + ∑_{k=1}^{N*} (W_{k+1} − W_k), we have

EW_{N*+1} = EW_1 + E ∑_{k=1}^{N*} (W_{k+1} − W_k).    (3.3)


It follows from Lemma 2.3, the strong Markov property (see Durrett (2005), p. 285), and (2.10) that

E ∑_{k=1}^{N*} (W_{k+1} − W_k) = E ∑_{k=1}^{∞} (W_{k+1} − W_k) 1_{(N* ≥ k)}
    = ∑_{k=1}^{∞} E[ E[(W_{k+1} − W_k) 1_{(N* ≥ k)} | F_{W_k}] ] = (EN*) π_{B_m}^{−1}.    (3.4)

Therefore, we obtain

EW_{N*+1} = EW_1 + (EN*) π_{B_m}^{−1}.    (3.5)

Since Z_{T,m} = B̄_m, the strong Markov property yields

E(W_{N*+1} − T) = E[E(W_{N*+1} − T | F_T)] = E[E(W_{N*+1} − T | Z_{T,m})] = E_{B̄_m} W.    (3.6)

Eq. (3.1) follows from (3.5) and (3.6), and Eq. (3.2) follows from (2.7) and Proposition 2.4.

Example 3.1. Consider a second-order (m = 2) Markov chain on the state space S = {1, 2, 3} with transition probabilities defined by

P_{11,1} = 1/5, P_{11,3} = 4/5;  P_{13,1} = P_{13,2} = P_{13,3} = 1/3;
P_{22,2} = 1/4, P_{22,3} = 3/4;  P_{23,1} = 1/3, P_{23,2} = 2/3;
P_{31,1} = 1/4, P_{31,3} = 3/4;  P_{32,2} = 1/3, P_{32,3} = 2/3;  P_{33,1} = P_{33,2} = 2/5, P_{33,3} = 1/5.

This chain is irreducible on the set of patterns {11, 13, 22, 23, 31, 32, 33}. Let µ be the initial distribution on this set of patterns defined by µ_{11} = 1/3 and µ_{31} = 2/3.

Let B = 323 and let T = inf{k ≥ 3 : Z_{k,3} = B}. We apply Theorem 3.1 to calculate ET.

Standard calculations determine the stationary distribution, π, on the patterns listed above:

π = (15/307, 48/307, 32/307, 72/307, 48/307, 72/307, 20/307).

In the context of Theorem 3.1, W_1 = min{k ≥ 2 | Z_{k,2} = B_2 = 32} and W = min{k ≥ 1 | Z_{2+k,2} = 32}. Using standard matrix techniques (see Resnick (1992), pp. 105–110), we obtain E_{B̄_2} W = E_{23} W = 203/72. Given that the initial distribution, µ, is concentrated on the states {11, 31}, it follows that W_1 = W + 2 and that EW_1 = 119/16. It follows from Eq. (3.1) and from EN* = P_{32,3}^{−1} = 3/2 that

ET = 119/16 + (3/2)(307/72) − 203/72 ≈ 11.014.
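The value of ET in Example 3.1 can be checked by direct simulation of the chain. The following sketch (illustrative only; the string encoding of pairs is ours) averages the hitting time of B = 323 over independent runs:

```python
import random

# Transition law of the second-order chain in Example 3.1:
# last two states (as a string) -> (possible next states, probabilities).
P = {"11": ((1, 3), (0.2, 0.8)),
     "13": ((1, 2, 3), (1/3, 1/3, 1/3)),
     "22": ((2, 3), (0.25, 0.75)),
     "23": ((1, 2), (1/3, 2/3)),
     "31": ((1, 3), (0.25, 0.75)),
     "32": ((2, 3), (1/3, 2/3)),
     "33": ((1, 2, 3), (0.4, 0.4, 0.2))}

def waiting_time(rng, target="323"):
    """First time T (>= 3) at which target appears, starting from mu:
    Z1 Z2 = 11 with probability 1/3 and 31 with probability 2/3."""
    last3 = rng.choices(("11", "31"), (1/3, 2/3))[0]
    t = 2
    while last3[-3:] != target:
        states, probs = P[last3[-2:]]
        last3 = last3[-2:] + str(rng.choices(states, probs)[0])
        t += 1
    return t

rng = random.Random(2010)
n = 100_000
est = sum(waiting_time(rng) for _ in range(n)) / n
print(round(est, 2))   # close to the exact value 11.014
```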

3.1. Markov chains of order 1

In the case of a Markov chain Z_1, Z_2, … of order one, target pattern B = b_1 … b_n for n ≥ 2, and stationary distribution π, the result of Theorem 3.1 becomes

ET = EW_1 + (EN*) π_{b_1}^{−1} − E_{b_n} W,    (3.7)

where

EN* = ∑_{j=1}^{n−1} (∏_{k=1}^{j} P_{b_k, b_{k+1}})^{−1} δ(B_{1+j}; B̄_{1+j}) + δ(b_1; b_n).    (3.8)

We note that this is equivalent to Theorem 3.1 in Benevento (1984), and in this form, the original result of Li (1980) for the case of i.i.d. random variables follows immediately. We state the result as Corollary 3.1.

Corollary 3.1. Let Z_1, Z_2, … be i.i.d. random variables with support S, where S is finite. For each s ∈ S, let π_s = P(Z_i = s) > 0. Let B = b_1 … b_n, where n ≥ 2, be a pattern from S and let T be as defined by (1.3). Then

ET = ∑_{k=1}^{n} (π_{b_1} π_{b_2} … π_{b_k})^{−1} δ(B_k; B̄_k).    (3.9)
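Corollary 3.1 is the classical overlap formula of Li (1980). A short illustration for fair-coin patterns (the coin example is ours, not from the paper):

```python
from fractions import Fraction

def expected_hit_time(B, pi):
    """Corollary 3.1: ET = sum_k delta(B_k; Bbar_k) (pi_{b1}...pi_{bk})^{-1},
    where the k-th term contributes iff the length-k prefix of B equals its suffix."""
    n = len(B)
    total, prod = Fraction(0), Fraction(1)
    for k in range(1, n + 1):
        prod *= pi[B[k - 1]]
        if B[:k] == B[n - k:]:
            total += 1 / prod
    return total

# Fair coin: 'HTH' overlaps itself at k = 1 and k = 3, so ET = 2 + 8 = 10,
# while 'HTT' has only the k = 3 term, so ET = 8.
pi = {'H': Fraction(1, 2), 'T': Fraction(1, 2)}
print(expected_hit_time('HTH', pi), expected_hit_time('HTT', pi))   # 10 8
```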


4. Starting patterns and applications

Let Z_1, Z_2, … be a first-order, irreducible, stationary Markov chain with finite state space S, transition probability matrix P, and unique stationary distribution π. For any pattern B = b_1 b_2 … b_l from S where l ≥ 1, we define Π_B by

Π_B = P(Z_1 = b_1, …, Z_l = b_l) = π_{b_1} ∏_{i=1}^{l−1} P_{b_i b_{i+1}}.

That is, we take the initial distribution of the chain to be π.

For any two patterns A = a_1 a_2 … a_n and B = b_1 b_2 … b_l we define, analogously to Li (1980), the notation A ⋆ B by

A ⋆ B = ∑_{j=1}^{l} δ(B_j; Ā_j) Π_{B_j}^{−1}.    (4.1)

In this equation, δ(B_j; Ā_j) identifies whether the initial subpattern of j elements of B matches the subpattern of the last j elements of A (see (2.3)).

Let A = a_1 a_2 … a_n and B = b_1 b_2 … b_l be patterns from S, where B is observable (see Definition 1.1) but not a subpattern of A. Define N(A, B), the number of transitions from the last element of A until B is observed, by

N(A, B) = min{k ≥ 1 : B is a connected subpattern of a_1 a_2 … a_n Z_1 … Z_k}.    (4.2)

A key characteristic describing the relationship between two patterns A and B is the maximum overlap of the head of B on the tail of A. We define ν(A, B) by

ν(A, B) = max{k : δ(B_k; Ā_k) = 1},    (4.3)

where we set ν(A, B) ≡ 0 if {k : δ(B_k; Ā_k) = 1} = ∅.

A useful result in the sequel is that for any two patterns A and B, it follows that

A ⋆ B = Ā_{ν(A,B)} ⋆ B = Ā_{ν(A,B)} ⋆ B_{ν(A,B)}.    (4.4)
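The quantities ν(A, B) and A ⋆ B are straightforward to compute. In the sketch below, the two-state chain and its stationary distribution are illustrative assumptions (not from the paper); the final assertion checks identity (4.4):

```python
from fractions import Fraction as F

# Illustrative two-state chain (states 'a', 'b'); pi solves pi P = pi for this P.
P  = {('a', 'a'): F(1, 2), ('a', 'b'): F(1, 2), ('b', 'a'): F(1, 4), ('b', 'b'): F(3, 4)}
pi = {'a': F(1, 3), 'b': F(2, 3)}

def Pi(B):
    """Pi_B = pi_{b1} prod_i P_{b_i b_{i+1}}: stationary probability of observing B."""
    p = pi[B[0]]
    for x, y in zip(B, B[1:]):
        p *= P[(x, y)]
    return p

def nu(A, B):
    """nu(A, B): maximum overlap of the head of B on the tail of A (Eq. (4.3))."""
    ks = [k for k in range(1, min(len(A), len(B)) + 1) if B[:k] == A[len(A) - k:]]
    return max(ks, default=0)

def star(A, B):
    """A * B = sum_j delta(B_j; Abar_j) Pi_{B_j}^{-1}  (Eq. (4.1))."""
    return sum(1 / Pi(B[:j]) for j in range(1, len(B) + 1)
               if len(A) >= j and B[:j] == A[len(A) - j:])

A, B = 'ab', 'bba'
v = nu(A, B)
# Identity (4.4): A*B = Abar_v * B = Abar_v * B_v
assert star(A, B) == star(A[len(A) - v:], B) == star(A[len(A) - v:], B[:v])
print(v, star(A, B))   # 1 3/2
```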

The following notation will be useful.

Notation 4.1. Let ν_0 = ν(A, B). If 0 < ν_0 < l, let ν_1 = ν(B̄_{ν_0}, B_{ν_0}). For k ≥ 1, we define ν_{k+1} by ν_{k+1} = ν(B_{ν_{k−1},ν_k}, B_{ν_k}).

Let

K = min{k ≥ 0 : ν_{k+1} ∈ {0, ν_k}}.    (4.5)

For the case 0 < ν_0 < l, we have

l > ν_0 > ν_1 > ⋯ > ν_K = ν_{K+1} > 0    (4.6)

or

l > ν_0 > ν_1 > ⋯ > ν_K > ν_{K+1} = 0.    (4.7)

Theorem 4.1, which follows, provides a simple expression for EN(A, B), the expected value of N(A, B). It is analogous to Lemma 2.4 in Li (1980).

Theorem 4.1. Let ν_0 = ν(A, B).
1. If ν_0 = l, then EN(A, B) = Π_B^{−1}.
2. If ν_0 = 0, then EN(A, B) = B ⋆ B + EN(a_n, b_1) − EN(b_l, b_1).
3. Let K be as defined by (4.5) and suppose that 0 < ν_0 < l.
(a) If ν_1 = 0, then EN(A, B) = B ⋆ B − A ⋆ B + EN(b_l, b_1) − EN(a_n, b_1).
(b) If K ≥ 1 and ν_{K+1} = 0, then EN(A, B) = B ⋆ B − A ⋆ B + (−1)^K [EN(a_n, b_1) − EN(b_{ν_K}, b_1)].
(c) If K ≥ 0 and ν_{K+1} = ν_K, then EN(A, B) = B ⋆ B − A ⋆ B.

Remark 1. In the case of an i.i.d. sequence of observations {Z_k}, k ≥ 1, Theorem 4.1 reduces to Lemma 2.4 in Li (1980), which we state as Corollary 4.1.

Corollary 4.1. Suppose that Z_1, Z_2, … are i.i.d. random variables with support a finite set S. Let A and B be patterns from S. If B is not a connected subpattern of A, then

EN(A, B) = B ⋆ B − A ⋆ B.

Proof. The result follows from Theorem 4.1 and the observation that, in this setting, EN(s, b_1) = EN(t, b_1) for all s ∈ S and t ∈ S.
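In the i.i.d. setting of Corollary 4.1, Π_{B_j} reduces to a product of marginal probabilities, so EN(A, B) = B ⋆ B − A ⋆ B can be computed in a few lines (the coin example is ours, not from the paper):

```python
from fractions import Fraction

def star_iid(A, B, pi):
    """A * B for i.i.d. observations: Pi_{B_j} is just pi_{b1}...pi_{bj}."""
    total, prod = Fraction(0), Fraction(1)
    for j in range(1, len(B) + 1):
        prod *= pi[B[j - 1]]
        if len(A) >= j and B[:j] == A[len(A) - j:]:
            total += 1 / prod
    return total

# Fair coin, A = 'HT', B = 'HTH' (B is not a subpattern of A):
# B*B = 2 + 8 = 10 and A*B = 4, so EN(A, B) = 10 - 4 = 6.
pi = {'H': Fraction(1, 2), 'T': Fraction(1, 2)}
EN = star_iid('HTH', 'HTH', pi) - star_iid('HT', 'HTH', pi)
print(EN)   # 6
```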


4.1. Proof of Theorem 4.1, parts (1) and (2)

Suppose that ν_0 = l. Then N(A, B) = N(B, B) is the return time to B. Therefore,

EN(A, B) = Π_B^{−1}.    (4.8)

Suppose that ν_0 = 0. Define the probability distribution µ on S by µ(a_n) ≡ 1 and let Z_1, Z_2, … have the initial distribution µ; that is, P(Z_1 = a_n) = 1. Let

W = min{j ≥ 1 : Z_{j+1} = B_1 = b_1}    (4.9)

and

W_1 = min{j ≥ 1 : Z_j = b_1}.    (4.10)

Let T = inf{k ≥ 1 : Z_1 Z_2 … Z_k = B}. Apply Theorem 3.1 (with m = 1) to obtain

ET = E_{a_n} W_1 + (EN*) π_{b_1}^{−1} − E_{b_l} W.    (4.11)

Since ν_0 = 0, then a_n ≠ b_1, E_{a_n} W_1 = 1 + E_{a_n} W = 1 + EN(a_n, b_1), and N(A, B) = T − 1. By the definition of N, we have E_{b_l} W = EN(b_l, b_1). Therefore

EN(A, B) = (EN*) π_{b_1}^{−1} + EN(a_n, b_1) − EN(b_l, b_1).    (4.12)

Theorem 4.1, part (2), follows from (4.12), (3.8), (4.1), and the fact that A ⋆ B = 0.

4.2. Proof of Theorem 4.1, part (3)

We prove Theorem 4.1 for the case K = 0 and then by induction for K ≥ 1. Suppose that 1 ≤ ν_0 ≤ l − 1. It follows that

b_1 … b_{ν_0} = a_{n−ν_0+1} … a_n    (4.13)

and

b_1 … b_{ν_0+k} ≠ a_{n−(ν_0+k)+1} … a_n    (4.14)

for k ≥ 1.

We apply Theorem 3.1 with m = ν_0. That is, we consider the process {Z_k; k ≥ 1} as a Markov chain of order ν_0. We assume an initial distribution µ defined by µ_{Ā_{ν_0}} = P(Z_{ν_0,ν_0} = Ā_{ν_0}) = 1. As in Theorem 3.1, we define W and W_1 by

W = min{k ≥ 1 : Z_{ν_0+k,ν_0} = B_{ν_0}}    (4.15)

and

W_1 = min{k ≥ ν_0 : Z_{k,ν_0} = B_{ν_0}}.    (4.16)

Note that N(A, B) = T − ν_0. It follows from Theorem 3.1 and E_{Ā_{ν_0}} W_1 = ν_0 that

ET = EW_1 + (EN*) Π_{B_{ν_0}}^{−1} − E_{B̄_{ν_0}} W
   = ν_0 + (EN*) Π_{B_{ν_0}}^{−1} − EN(B̄_{ν_0}, B_{ν_0})    (4.17)

and, from this, that

EN(A, B) = (EN*) Π_{B_{ν_0}}^{−1} − EN(B̄_{ν_0}, B_{ν_0}).    (4.18)

By combining (3.2) with (4.1), it follows that if 1 ≤ ν_0 ≤ l − 1, then

EN(A, B) = B ⋆ B − B ⋆ B_{ν_0} + δ(B_{ν_0}; B̄_{ν_0}) Π_{B_{ν_0}}^{−1} − EN(B̄_{ν_0}, B_{ν_0}).    (4.19)

4.2.1. The case K = 0

Suppose that ν_1 = ν(B̄_{ν_0}, B_{ν_0}) = ν_0. Then δ(B_{ν_0}; B̄_{ν_0}) = 1 and we obtain EN(B̄_{ν_0}, B_{ν_0}) = Π_{B_{ν_0}}^{−1} by (4.8).

By the definition of ν (see (4.3)), the assumption that ν_1 = ν_0, and by (4.4), it follows that

B ⋆ B_{ν_0} = B̄_{ν_0} ⋆ B_{ν_0} = B_{ν_0} ⋆ B_{ν_0} = A ⋆ B.

This yields the result

EN(A, B) = B ⋆ B − A ⋆ B.    (4.20)

Now assume that ν_1 = 0. By definition (see Notation 4.1), it follows that δ(B_{ν_0}; B̄_{ν_0}) = 0 and B ⋆ B_{ν_0} = B̄_{ν_0} ⋆ B_{ν_0} = 0. We apply Theorem 4.1, part (2), to B̄_{ν_0} and B_{ν_0}, respectively, and obtain

EN(B̄_{ν_0}, B_{ν_0}) = B_{ν_0} ⋆ B_{ν_0} + EN(b_l, b_1) − EN(b_{ν_0}, b_1).

This result, the fact that B_{ν_0} ⋆ B_{ν_0} = A ⋆ B, and (4.19) establish part (3a) of Theorem 4.1.


4.2.2. The case K = 1

The remainder of Theorem 4.1 is proved by induction on K for K ≥ 1. We first establish the result for K = 1, that is, for the case l > ν_0 > ν_1 > 0 and either ν_2 = ν_1 or ν_2 = 0.

We apply Eq. (4.19) to B̄_{ν_0} and B_{ν_0}, respectively, to obtain

EN(B̄_{ν_0}, B_{ν_0}) = B_{ν_0} ⋆ B_{ν_0} − B_{ν_0} ⋆ B_{ν_1} + δ(B_{ν_1}; B_{ν_0,ν_1}) Π_{B_{ν_1}}^{−1} − EN(B_{ν_0,ν_1}, B_{ν_1}).    (4.21)

Since ν_1 < ν_0, then δ(B_{ν_0}; B̄_{ν_0}) = 0 and (4.19) reduces to

EN(A, B) = B ⋆ B − B ⋆ B_{ν_0} − EN(B̄_{ν_0}, B_{ν_0}).    (4.22)

The definition of ν_0 implies that

B_{ν_0} ⋆ B_{ν_0} = A ⋆ B  and  B_{ν_0} ⋆ B_{ν_1} = A ⋆ B_{ν_1}.    (4.23)

The definition of ν_1 implies that

B ⋆ B_{ν_0} = B̄_{ν_0} ⋆ B_{ν_0} = B̄_{ν_1} ⋆ B_{ν_1} = B_{ν_1} ⋆ B_{ν_1}.    (4.24)

It follows that

EN(A, B) = B ⋆ B − A ⋆ B − (B ⋆ B_{ν_1} − A ⋆ B_{ν_1}) − δ(B_{ν_1}; B_{ν_0,ν_1}) Π_{B_{ν_1}}^{−1} + EN(B_{ν_0,ν_1}, B_{ν_1}).    (4.25)

The definitions of ν_1 and ν_2, with (4.4), yield

A ⋆ B_{ν_1} = B_{ν_0} ⋆ B_{ν_1} = B_{ν_0,ν_1} ⋆ B_{ν_1} = B_{ν_2} ⋆ B_{ν_1}    (4.26)

and

B ⋆ B_{ν_1} = B̄_{ν_0} ⋆ B_{ν_1} = B_{ν_1} ⋆ B_{ν_1}.    (4.27)

Consider the case ν_2 = ν_1. Then ν_2 = ν(B_{ν_0,ν_1}, B_{ν_1}) = ν_1 and δ(B_{ν_1}; B_{ν_0,ν_1}) = 1. By Theorem 4.1 (part 1), we obtain EN(B_{ν_0,ν_1}, B_{ν_1}) = Π_{B_{ν_1}}^{−1}. It follows from the last equality in (4.27) that B ⋆ B_{ν_1} = B_{ν_2} ⋆ B_{ν_1}, and hence that EN(A, B) = B ⋆ B − A ⋆ B.

Now consider the case ν_2 = 0. Then δ(B_{ν_1}; B_{ν_0,ν_1}) = 0. We apply the previously established result, Theorem 4.1 part (2), to the patterns B_{ν_0,ν_1} and B_{ν_1}, respectively, to obtain

EN(B_{ν_0,ν_1}, B_{ν_1}) = B_{ν_1} ⋆ B_{ν_1} + EN(a_n, b_1) − EN(b_{ν_1}, b_1).    (4.28)

(Note that the last state in the pattern B_{ν_0,ν_1} equals a_n.)

Since B ⋆ B_{ν_1} = B_{ν_1} ⋆ B_{ν_1} (see (4.27)) and A ⋆ B_{ν_1} = B_{ν_0} ⋆ B_{ν_1} = 0 (by (4.26) and the assumption that ν_2 = 0), Eqs. (4.25) and (4.28) yield

EN(A, B) = B ⋆ B − A ⋆ B + EN(a_n, b_1) − EN(b_{ν_1}, b_1).

This establishes Theorem 4.1 for the case K = 1.

4.2.3. The case K ≥ 2

Suppose that K ≥ 2. This means that l > ν_0 > ⋯ > ν_{K−1} > ν_K > 0 and that ν_{K+1} = ν_K or ν_{K+1} = 0. Assuming the induction hypothesis and applying the result of Theorem 4.1, parts (3b) and (3c), to B̄_{ν_0} and B_{ν_0} (where the corresponding index defined by (4.5) equals K − 1), we obtain

EN(B̄_{ν_0}, B_{ν_0}) = B_{ν_0} ⋆ B_{ν_0} − B̄_{ν_0} ⋆ B_{ν_0}    (4.29)

if ν_{K+1} = ν_K, and

EN(B̄_{ν_0}, B_{ν_0}) = B_{ν_0} ⋆ B_{ν_0} − B̄_{ν_0} ⋆ B_{ν_0} + (−1)^{K−1} [EN(a_n, b_1) − EN(b_{ν_K}, b_1)]    (4.30)

if ν_{K+1} = 0.

If ν_{K+1} = ν_K, then Theorem 4.1, part (3c), follows from (4.22)–(4.24) and (4.29). If ν_{K+1} = 0, then Theorem 4.1, part (3b), follows from (4.22)–(4.24) and (4.30). Hence, the induction proof is complete and Theorem 4.1 is established.


5. Application of Theorem 4.1

The result of Theorem 4.1 can be applied to solve the following problem. Given a finite state, irreducible, stationary first-order Markov chain {Z_k}_{k≥1}, an initial pattern A, and a finite set of observable patterns {A_1, A_2, …, A_n}, each of which is not a subpattern of A, find the expected value of the number of observations, T, beyond A at which a pattern in the collection first appears. In addition, for each pattern A_i, find the probability, p_i, that it is the first pattern to appear.

The solution, which also applies in the case where there is no starting pattern, is a direct application of the results in Li (1980, Theorem 3.1) and Gerber and Li (1981, see pp. 102–103), which were derived for the case of i.i.d. sequences of observations. The proof in the context of this section is identical to that in Gerber and Li (1981). For the purpose of completeness, we state the result here:

Theorem 5.1. Let T_i represent the time at which the pattern A_i first appears, for i = 1, 2, …, n. Let T = min{T_1, T_2, …, T_n}. For i ≠ j and i, j = 1, 2, …, n, define e_{ji} by e_{ji} = EN(A_j, A_i). Let e_{ii} = 0 and let e_i = ET_i for i = 1, 2, …, n. Then ET, p_1, p_2, …, p_n satisfy the system of n + 1 linear equations described in matrix form by

M (ET, p_1, …, p_n)′ = (1, e_1, …, e_n)′,    (5.1)

where

M = [ 0  1      1      …  1
      1  e_{11} e_{21} …  e_{n1}
      1  e_{12} e_{22} …  e_{n2}
      ⋮
      1  e_{1n} e_{2n} …  e_{nn} ]    (5.2)

is invertible and v′ represents the transpose of the row vector v.

Example 5.1. Let {Z_j}_{j≥1} be a Markov chain with state space S = {1, 2, 3} and transition probability matrix

P = [ 1/4   1/4   1/2
      1/3   1/6   1/2
      5/12  7/12  0   ].    (5.3)

Let A be the pattern 113 and let A_1 and A_2 be the patterns 322 and 131, respectively. We apply Theorems 4.1 and 5.1 to calculate ET, the expected time until one of the patterns appears, and the probability, for each pattern, that it is the first to appear.

5.1. Solution to Example 5.1

Since the transition probability matrix P is doubly stochastic, the stationary distribution π is uniform on {1, 2, 3}.

We first compute e_1. With the notation of Theorem 4.1, note that ν_0 = 1 and ν_1 = 0. Thus, part (3a) of Theorem 4.1 applies, so that

e_1 = EN(A, A_1) = A_1 ⋆ A_1 − A ⋆ A_1 + EN(2, 3) − EN(3, 3)
    = (π_3 P_{32} P_{22})^{−1} − π_3^{−1} + EN(2, 3) − EN(3, 3).    (5.4)

Clearly, EN(3, 3) = π_3^{−1}. Using a standard Markov chain calculation, such as a first-step analysis, one obtains EN(2, 3) = 2. It follows that e_1 = 188/7.

Since e_2 = EN(A, A_2), then ν_0 = ν(113, 131) = 2, ν_1 = ν(31, 13) = 1, and ν_2 = ν(3, 1) = 0. Hence, part (3b) of Theorem 4.1 applies with K = 1. We obtain

e_2 = EN(A, A_2) = A_2 ⋆ A_2 − A ⋆ A_2 + (−1)^1 [EN(3, 1) − EN(1, 1)]
    = π_1^{−1} + (π_1 P_{13} P_{31})^{−1} − (π_1 P_{13})^{−1} − [EN(3, 1) − 3].    (5.5)

As with the previous calculation, it is easily determined that EN(3, 1) = 34/13, and we conclude that e_2 = 766/65.

The computations for e_{12} and e_{21} are similar. Since ν(A_1, A_2) = 0 and ν(A_2, A_1) = 0, we apply part (2) of Theorem 4.1 and obtain e_{12} = 1116/65 and e_{21} = 216/7.

Thus, the vector (ET, p_1, p_2)′ is the solution of the matrix equation

(ET, p_1, p_2)′ = M^{−1} (1, e_1, e_2)′,    (5.6)


where

M = [ 0  1        1
      1  0        216/7
      1  1116/65  0     ].

We conclude that ET = 3728/607 ≈ 6.14, p_1 = 399/1214 ≈ 0.329, and p_2 = 815/1214 ≈ 0.671.
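The conclusion of Example 5.1 can be reproduced by solving the 3 × 3 system (5.1) in exact arithmetic. A sketch (illustrative; the elimination routine is generic):

```python
from fractions import Fraction as F

# Inputs from Example 5.1: e1 = E[T_1], e2 = E[T_2], and the cross
# quantities e12 = EN(A1, A2), e21 = EN(A2, A1).
e1, e2 = F(188, 7), F(766, 65)
e12, e21 = F(1116, 65), F(216, 7)

# System (5.1): M (ET, p1, p2)' = (1, e1, e2)' with M as in (5.2).
M = [[F(0), F(1), F(1)],
     [F(1), F(0), e21],
     [F(1), e12, F(0)]]
rhs = [F(1), e1, e2]

def solve(M, b):
    """Gauss-Jordan elimination with exact fractions."""
    n = len(M)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)   # pivot row
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]

ET, p1, p2 = solve(M, rhs)
print(ET, p1, p2)   # 3728/607 399/1214 815/1214
```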

Acknowledgement

The second author’s research participation was funded by a Lafayette College Excel grant.

References

Benevento, R.V., 1984. The occurrence of sequence patterns in ergodic Markov chains. Stochastic Process. Appl. 17, 369–373.
Durrett, R., 2005. Probability: Theory and Examples, third ed. Brooks/Cole, Belmont, California.
Fu, J.C., Lou, W.Y.W., 2006. Waiting time distributions of simple and compound patterns in a sequence of r-th order Markov dependent multi-state trials. Ann. Inst. Statist. Math. 58, 291–310.
Gerber, H.U., Li, S.Y.R., 1981. The occurrence of sequence patterns in repeated experiments and hitting times in a Markov chain. Stochastic Process. Appl. 11, 101–108.
Glaz, J., Kulldorff, M., Pozdnyakov, V., Steele, J.M., 2006. Gambling teams and waiting times for patterns in two-state Markov chains. J. Appl. Probab. 43, 127–140.
Li, S.Y.R., 1980. A martingale approach to the study of occurrence of sequence patterns in repeated experiments. Ann. Probab. 8, 1171–1176.
Pozdnyakov, V., 2008a. On occurrence of patterns in Markov chains: method of gambling teams. Statist. Probab. Lett. 78, 2762–2767.
Pozdnyakov, V., 2008b. On occurrence of subpattern and method of gambling teams. Ann. Inst. Statist. Math. 60, 193–203.
Resnick, S., 1992. Adventures in Stochastic Processes. Birkhäuser, Boston.
Williams, D., 1991. Probability with Martingales. Cambridge University Press, Cambridge.