Appendix B1
Measure and Measurability
σ-algebra
Let Ω denote a set of elements of interest, which is referred to
as a space, or sample space in the statistical context.
A set Ω is said to be countable if its elements can be listed as a
sequence: Ω = {ω_1, ω_2, . . .}. Otherwise Ω is uncountable.
In particular, any finite set is countable. Any interval [a, b]
with a < b is uncountable.
A space Ω is said to be discrete if it is countable, or continuous
if it is an interval or a product of intervals, such as:
Ω = R = (−∞, ∞), Ω = [0, ∞), Ω = [0, 1],
Ω = R^n = {(x_1, . . . , x_n) : x_1, . . . , x_n ∈ R},
Ω = [0, ∞)^2 = {(x, y) : x, y ≥ 0}, or
Ω = [0, ∞) × [0, 1] = {(x, y) : x ≥ 0, 0 ≤ y ≤ 1}, etc.
A collection F of subsets of a space Ω is called a σ-algebra or
σ-field if it satisfies the following axioms:
(i) The empty set ∅ ∈ F;
(ii) If E ∈ F, then its complement E^c ∈ F;
(iii) If E_i ∈ F, i = 1, 2, . . ., then ∪_{i=1}^∞ E_i ∈ F.
-
The three axioms above further imply:
(iv) The state space Ω ∈ F;
(v) If E ∈ F and F ∈ F, then E ∪ F ∈ F, E ∩ F ∈ F, and
E \ F = E ∩ F^c ∈ F;
(vi) If E_i ∈ F, i = 1, 2, . . ., then ∩_{i=1}^∞ E_i ∈ F.
In summary, a σ-algebra is nonempty and closed under finite or
countable operations of unions, intersections and complements.
In other words, it is a self-contained collection of subsets.
Given a collection G of subsets of Ω, the smallest σ-algebra F
such that G ⊆ F is called the σ-algebra generated by G, and
denoted by F = σ(G).
The smallest σ-algebra is F = σ({∅}) = {∅, Ω}. The largest
σ-algebra is the collection of all subsets of Ω, denoted by 2^Ω.
Given E ⊆ Ω (E ≠ ∅, Ω), the smallest σ-algebra that includes E
as a member is F = σ({E}) = {∅, E, E^c, Ω}.
Obviously, if G_1 ⊆ G_2, then σ(G_1) ⊆ σ(G_2).
If Ω is a discrete space, then every subset E of Ω can be expressed
as a countable union of single-element sets:
E = ∪_{ω∈E} {ω}
As a result, if a σ-algebra F on a discrete space Ω includes all
single-point sets {ω} for ω ∈ Ω, then F = 2^Ω.
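On a small finite space the identity σ({E}) = {∅, E, E^c, Ω} can be checked by brute force, closing a family of subsets under complements and unions. The following Python sketch is illustrative only; the space Ω = {1, 2, 3, 4} and the generator E = {1, 2} are arbitrary choices, not taken from the text.

```python
from itertools import combinations

def generate_sigma_algebra(omega, generators):
    """Close a collection of subsets of a finite space omega under
    complements and (finite) unions, starting from the generators.
    On a finite space this yields the generated sigma-algebra."""
    family = {frozenset(), frozenset(omega)} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        # close under complement
        for a in current:
            c = frozenset(omega) - a
            if c not in family:
                family.add(c)
                changed = True
        # close under pairwise union (sufficient on a finite space)
        for a, b in combinations(current, 2):
            u = a | b
            if u not in family:
                family.add(u)
                changed = True
    return family

omega = {1, 2, 3, 4}
E = {1, 2}
F = generate_sigma_algebra(omega, [E])
# sigma({E}) = {emptyset, E, E^c, Omega}, matching the formula above
print(sorted(sorted(s) for s in F))
```

Running the same closure with all singletons as generators recovers the full power set 2^Ω, in line with the discrete-space remark above.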
-
Measurable sets
Given a space Ω and a σ-algebra F on Ω, a subset E of Ω is said
to be measurable with respect to F if and only if E ∈ F.
By the definition of a σ-algebra, any countable union, intersection
and complement of measurable sets is also measurable.
Borel field and sets
If Ω = R and G = {[a, b] : a, b ∈ R}, then B = σ(G) is called
the Borel algebra or Borel field on R. Equivalently, the intervals
[a, b] can be replaced by (a, b), (a, b], [a, b), (−∞, b], etc.
Similarly we can define the Borel field on [0, ∞) or any other
interval, as well as on R^n by the products of intervals:
B = σ({∏_{i=1}^n [a_i, b_i] : a_i, b_i ∈ R, i = 1, . . . , n})
Any A ∈ B is called a Borel set, or said to be Borel measurable.
In particular, any single-point set {a} = [a, a] is a Borel set, and
consequently, every countable subset of R is a Borel set as well.
The Borel field B on an interval I must include all intervals
and countable unions, intersections and complements of intervals
contained in I. It is quite large and includes all subsets we need
to define probability, but it does not include every subset of an
interval; in other words, B ≠ 2^I. There exist many non-Borel
sets, but it is not easy to give an explicit example.
-
Measure
Given a space Ω and a σ-algebra F on Ω, a set function m(·)
defined on F is said to be a measure if it satisfies the following
two axioms:
(i) m(E) ≥ 0 for any E ∈ F;
(ii) If E_1, E_2, . . . ∈ F are disjoint or mutually exclusive in the
sense that E_i ∩ E_j = ∅ for i ≠ j, then
m(∪_{i=1}^∞ E_i) = ∑_{i=1}^∞ m(E_i)
A measure defined on the Borel field is called a Borel measure.
Among the most useful Borel measures is the Lebesgue measure,
which assigns measure
m([a, b]) = m((a, b)) = m((a, b]) = m([a, b)) = b − a
to any interval in R. In particular, for any a ∈ R,
m({a}) = m([a, a]) = a − a = 0
Consequently, every countable subset of R has zero measure.
The Lebesgue measure is defined on a σ-algebra F larger than
the Borel field B. There exist, however, subsets of an interval I
to which even the Lebesgue measure cannot be assigned. That
explains why 2^I is too large to be a useful σ-algebra.
For the purpose of probability, the Borel field together with the
Lebesgue measure is sufficient.
-
Measurable functions
A real-valued function g(x) defined on R is said to be measurable
if g^{-1}(A) = {x : g(x) ∈ A} is a Borel set for every Borel set A.
In particular, an indicator function g(x) = I{x ∈ B} of any B ∈ B
is measurable since g^{-1}(A) = ∅, B, B^c or R for any A ∈ B.
A real-valued function g(x) defined on R is said to be Riemann
integrable if the integral ∫_R g(x)dx is well-defined in the usual
sense of calculus. This integral is referred to as the Riemann
integral.
A property is said to hold almost everywhere if it holds on a
Borel set B such that m(B^c) = 0 under the Lebesgue measure.
A function is Riemann integrable if and only if it is continuous
almost everywhere.
An almost everywhere continuous function g(x) is measurable.
Consequently, all analytic, continuous, piecewise continuous, and
Riemann integrable functions are measurable.
In fact, all functions of practical interest are measurable.
A simple example of a measurable function that is not Riemann
integrable is the indicator of the set Q of rational numbers:
I_Q(x) = I{x ∈ Q} = 1 if x is a rational number, and 0 if x is
an irrational number.
Since Q is countable, it is a Borel set and so I_Q(x) is measurable.
It is, however, nowhere continuous, hence not Riemann integrable.
-
Lebesgue integral
The Lebesgue integral is defined for measurable functions.
If g(x) is a Riemann integrable function, its Lebesgue integral
coincides with its Riemann integral. Hence we will use the same
notation for Lebesgue and Riemann integrals.
If g(x) = α_1 I_{A_1}(x) + · · · + α_k I_{A_k}(x) is a linear
combination of indicators of Borel sets, its Lebesgue integral is
defined by
∫_R g(x)dx = ∑_{i=1}^k α_i m(A_i)
where m(·) is the Lebesgue measure. In particular, if Q is the
set of rational numbers, then ∫_R I_Q(x)dx = m(Q) = 0.
For a measurable function g(x) ≥ 0, there exists a sequence of
linear combinations {g_n(x)} of indicators of Borel sets such that
g_n(x) ↑ g(x). The Lebesgue integral of g(x) is then defined by
∫_R g(x)dx = lim_{n→∞} ∫_R g_n(x)dx
A measurable function g(x) is said to be Lebesgue integrable if
∫_R |g(x)|dx = ∫_R g^+(x)dx + ∫_R g^−(x)dx < ∞
where g^+(x) = max(g(x), 0) and g^−(x) = max(−g(x), 0).
-
Probability space and random variable
For a continuous state space S, by a measurable set A ⊆ S we
mean A ∈ B; that is, A is Borel measurable.
Let Ω be a sample space (of all possible outcomes) and F be a
σ-algebra of subsets of Ω. Each E ∈ F is called an event.
A probability (measure) is a measure Pr(·) defined on F such that
Pr(Ω) = 1. The probability Pr(E) is defined only for E ∈ F.
The triplet (Ω, F, Pr(·)) is called a probability space.
A real-valued function X = X(ω) of ω ∈ Ω is said to be a random
variable if and only if {X ∈ A} ∈ F for every Borel set A.
If X is a random variable and g(x) is a measurable function,
then g(X) is also a random variable.
If X is a continuous random variable, then its state space S is
an interval and Pr(X ∈ A) is defined only for Borel sets A ⊆ S.
Given a cdf F(x), we can define Pr(X ≤ x) = F(x) and extend
it to Pr(X ∈ A) for any Borel set A via the axioms or properties
of the probability, such as
Pr(X ∈ (a, b]) = Pr(a < X ≤ b) = Pr(X ≤ b) − Pr(X ≤ a)
Pr(X ∈ (−∞, b)) = Pr(X < b) = lim_{n→∞} Pr(X ≤ b − n^{-1})
and so on, as the Borel field B is generated by {(−∞, x] : x ∈ R}.
Such extensions may not be possible if A ∉ B.
This explains why a cdf can determine a probability distribution.
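The extension rules above can be checked numerically for a concrete cdf. The sketch below uses the standard exponential cdf F(x) = 1 − e^{−x}, an illustrative choice not taken from the text:

```python
import math

def F(x):
    """cdf of the standard exponential distribution: Pr(X <= x)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

# Pr(X in (a, b]) = Pr(a < X <= b) = F(b) - F(a)
a, b = 1.0, 2.0
p_interval = F(b) - F(a)
print(p_interval)            # e^{-1} - e^{-2}, about 0.2325

# Pr(X < b) = lim_n Pr(X <= b - 1/n), approximated with a small gap;
# for a continuous cdf this coincides with F(b)
p_open = F(b - 1e-9)
print(abs(p_open - F(b)))    # near 0: F is continuous at b
```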
-
Appendix B2
Stationary Distribution
Existence
Let {X_n : n = 0, 1, . . .} be a Markov chain with a finite state
space S = {1, 2, . . . , N} and transition matrix P = (p_ij)_{N×N}.
Since p_i1 + · · · + p_iN = 1, i = 1, . . . , N, the matrix
I − P = [ 1 − p_11    −p_12   . . .    −p_1N  ]
        [  −p_21    1 − p_22  . . .    −p_2N  ]
        [   . . .                             ]
        [  −p_N1     −p_N2    . . .  1 − p_NN ]
has a zero sum of elements in each row.
Thus the N columns of I − P are linearly dependent, implying
Rank(I − P) < N. Consequently, the equation π(I − P) = 0, or
π = πP, has at least one solution π ≠ 0.
For convenience, we write π > 0 for a vector π = (π_1, . . . , π_N)
(either a row or a column) if π_j ≥ 0 for all j = 1, . . . , N and
π_j > 0 for some j. We also write π < 0 if −π > 0.
We now prove the existence of a row vector π > 0 such that
π = πP by induction on the number N of states.
Start with N = 1. In this case, we must have P = 1, hence
π = 1 obviously satisfies π = πP.
-
For N > 1, assume there exists such a π > 0 for any chain with
k states, 1 ≤ k < N. We then need to prove the case with N states.
Let π ≠ 0 be a solution to π = πP. If neither π > 0 nor π < 0,
we can write π = [π_1 π_2] and
P = [ P_11  P_12 ]
    [ P_21  P_22 ]      (B2.1)
where π_1 is a vector of k negative elements (1 ≤ k < N), π_2 > 0
has N − k elements, and P_11 is a k × k matrix.
If P_12 = 0, then P_11 is a k × k transition matrix (with the
elements of each row adding to 1). Hence by the induction
assumption, there exists π′ > 0 such that π′ = π′P_11.
Consequently,
[π′ 0]P = [π′ 0] [ P_11   0   ] = [π′P_11  0] = [π′ 0]
                 [ P_21  P_22 ]
Thus π = [π′ 0] > 0 satisfies π = πP.
If P_12 ≠ 0, then p_ij > 0 for some i ≤ k and j > k, so that
p_i1 + · · · + p_ik < p_i1 + · · · + p_ik + p_ij ≤ 1.
Let 1_k = [1 1 · · · 1]^T denote the k × 1 vector with all elements
equal to 1. Then
(I − P_11)1_k = [ 1 − p_11 − · · · − p_1k ]
                [         . . .          ]  > 0
                [ 1 − p_k1 − · · · − p_kk ]
Thus π_1(I − P_11)1_k < 0 since all elements of π_1 are negative.
-
On the other hand, by (B2.1),
π = πP ⟹ π_1 = π_1 P_11 + π_2 P_21 ⟹ π_1(I − P_11) = π_2 P_21
This together with π_2 > 0 leads to a contradiction:
0 > π_1(I − P_11)1_k = π_2 P_21 1_k ≥ 0
It follows that when P_12 ≠ 0, either π > 0 or π < 0. Thus we
can take either π (if π > 0) or −π (if −π > 0) to satisfy π = πP.
We have shown the existence of π > 0 such that π = πP with
N states. By the principle of mathematical induction, this holds
for all N = 1, 2, . . ..
Let π* = (π*_1, . . . , π*_N) > 0 be a row vector such that π* = π*P.
Then π*_1 + · · · + π*_N > 0.
Take π = [π_1 · · · π_N] with
π_j = π*_j / (π*_1 + · · · + π*_N), j = 1, 2, . . . , N.
Then π > 0, π_1 + · · · + π_N = 1 and
π(I − P) = π*(I − P) / (π*_1 + · · · + π*_N) = 0 ⟹ π = πP
Thus π is a stationary distribution of {X_n}.
This shows that a Markov chain with a finite state space must
have at least one stationary distribution.
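For an irreducible aperiodic chain, the stationary distribution guaranteed above can be found numerically by iterating π(n+1) = π(n)P. A pure-Python sketch follows; the 3-state matrix is an illustrative choice, not from the text:

```python
def row_times_matrix(pi, P):
    """Compute the row-vector product pi * P for a square matrix P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# An illustrative irreducible, aperiodic transition matrix
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

pi = [1.0, 0.0, 0.0]          # any initial distribution works
for _ in range(200):          # power iteration: pi(n) = pi(0) P^n
    pi = row_times_matrix(pi, P)

print(pi)  # approaches the stationary distribution (0.25, 0.5, 0.25)
residual = max(abs(pi[j] - row_times_matrix(pi, P)[j]) for j in range(3))
print(residual)  # near 0, confirming pi = pi P
```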
-
Uniqueness
Suppose there exist two stationary distributions π ≠ π′. Then
π* = π − π′ ≠ 0 and
π*P = (π − π′)P = πP − π′P = π − π′ = π*
Thus π* is a non-zero solution to the equation π* = π*P.
Since π and π′ each have elements summing to 1, the elements
of π* sum to 0. Hence neither π* > 0 nor π* < 0.
Then a partition of P as in (B2.1) must have P_12 = 0 (otherwise
the argument above would force π* > 0 or π* < 0), leading to
P = [ P_11   0   ]
    [ P_21  P_22 ]
where P_11 is a k × k matrix with 1 ≤ k < N.
It follows that the transition matrix at time n has the form
P(n) = P^n = [ P_11^n      0     ]
             [ P_21^(n)  P_22^n  ]      (B2.2)
where P_21^(n) and P_22^n are two matrices of orders (N − k) × k
and (N − k) × (N − k) respectively.
From (B2.2) we see that p_1N(n) = 0 for all n. This shows that
the Markov chain with transition matrix P is reducible.
Therefore, two distinct stationary distributions are possible only
for a reducible Markov chain.
Consequently, an irreducible Markov chain with finitely many
states must have a unique stationary distribution.
-
Appendix B3
Convergence of Markov Chain
For any time-homogeneous Markov chain with finite N states
and transition matrices P = (p_ij)_{N×N} and P^n = (p_ij(n))_{N×N},
it has been theoretically proved that lim_{n→∞} p_ij(n) exists (i.e.,
p_ij(n) converges) for every aperiodic state j. Thus if P is
aperiodic (every state of P is aperiodic), then P^n converges.
If P is irreducible, then P^∞ = lim_{n→∞} P^n exists if and only if
P is aperiodic. This is because each row of P^∞ is the unique
stationary distribution π = (π_1, . . . , π_N) of P with π_j > 0 for
at least one state j. Therefore,
lim_{n→∞} P^n = P^∞ ⟹ lim_{n→∞} p_jj(n) = π_j > 0 ⟹ p_jj(n) > 0
for all sufficiently large n. This means that state j is aperiodic
(a periodic state j must have p_jj(n) = 0 for infinitely many n),
and so P is aperiodic since it is irreducible.
If all states of P are periodic, then P^n must diverge. To see this,
let P_11 be an irreducible block of P that is a transition matrix
itself. Then P_11^n diverges as P_11 is periodic, hence P^n diverges.
However, individual p_ij(n) may converge for some (i, j) even if
every state of P is periodic. A trivial example is a reducible P,
which has some states i, j such that p_ij(n) = 0 for all n.
If P has both periodic and aperiodic states (such a P must be
reducible), then P^n may or may not converge.
-
A simple example with divergent P^n is
P = [ 0 1 0 ]
    [ 1 0 0 ]   ⟹   P^n = I_3 for even n, and P^n = P for odd n
    [ 0 0 1 ]
Here states 1 and 2 are periodic with d = 2, whereas state 3 is
aperiodic. As P^n oscillates between I_3 and P, it diverges.
The next example shows that P^n may converge:
P = [  0   0.5  0.5 ]
    [ 0.5   0   0.5 ]
    [  0    0    1  ]
⟹ P^n = [ 0.5^n    0     1 − 0.5^n ]
         [   0    0.5^n   1 − 0.5^n ]
         [   0      0         1     ]
for even n, and
P^n = [   0    0.5^n   1 − 0.5^n ]
      [ 0.5^n    0     1 − 0.5^n ]
      [   0      0         1     ]
for odd n.
Thus states 1 and 2 are periodic (d = 2) and state 3 is aperiodic.
P^n obviously converges in this case:
P^∞ = lim_{n→∞} P^n = [ 0 0 1 ]
                      [ 0 0 1 ]
                      [ 0 0 1 ]
where each row of P^∞ is the unique stationary distribution of P.
This example also shows that lim_{n→∞} p_ij(n) = 0 can exist for
periodic states j = 1, 2 beyond the trivial case p_ij(n) = 0 for
all n.
Even if P^n diverges, π(n) = π(0)P^n may converge for some π(0)
that satisfies certain conditions. This is demonstrated in the
following example.
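The closed form of P^n in the second example can be verified by direct matrix multiplication, a quick numerical sketch in Python:

```python
def matmul(A, B):
    """Multiply two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0]]

Pn = P
for _ in range(9):            # compute P^10 (an even power)
    Pn = matmul(Pn, P)

# Stated formula for even n: p11(n) = p22(n) = 0.5^n,
# p12(n) = p21(n) = 0, p13(n) = p23(n) = 1 - 0.5^n
expected = [[0.5**10, 0.0, 1 - 0.5**10],
            [0.0, 0.5**10, 1 - 0.5**10],
            [0.0, 0.0, 1.0]]
err = max(abs(Pn[i][j] - expected[i][j])
          for i in range(3) for j in range(3))
print(err)   # near 0: the formula holds, and P^n -> rows (0, 0, 1)
```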
-
Consider a transition matrix of the form
P = [ A  B ]         with D = [ 0 1 ]         (N > 2)
    [ 0  D ]_{N×N}            [ 1 0 ]_{2×2}
Assume that A^n → 0 as n → ∞. Then (I − A^2)^{-1} exists.
Note that
P^2 = [ A^2  AB + BD ]
      [  0     I_2   ]
has aperiodic states N − 1 and N. Hence
lim_{m→∞} P^{2m} = lim_{m→∞} (P^2)^m = [ 0   Q  ]  for some Q  (B3.1)
                                       [ 0  I_2 ]
(as p_ij(2m) converges for aperiodic states j = N − 1 and N).
Since P^{2m+2} has the same limit as P^{2m}, (B3.1) implies
[ 0   Q  ] = P^2 [ 0   Q  ] = [ A^2  AB + BD ] [ 0   Q  ]
[ 0  I_2 ]       [ 0  I_2 ]   [  0     I_2   ] [ 0  I_2 ]
⟹ Q = A^2 Q + AB + BD ⟹ (I − A^2)Q = AB + BD
⟹ Q = (I − A^2)^{-1}(AB + BD)      (B3.2)
It also follows from (B3.1) that
lim_{m→∞} P^{2m+1} = [ 0   Q  ] P = [ 0  QD ]      (B3.3)
                     [ 0  I_2 ]     [ 0  D  ]
(B3.1) and (B3.3) show that P^n diverges as D ≠ I_2.
Let π_1 be a 1 × (N − 2) vector of nonnegative elements with sum
no more than 0.5, and π_2 = [0.5 0.5] − π_1 Q. Then
π_1 Q + π_2 = [0.5 0.5] = [0.5 0.5]D = (π_1 Q + π_2)D      (B3.4)
-
It follows from (B3.1), (B3.3) and (B3.4) that
lim_{m→∞} [π_1 π_2]P^{2m} = [0  π_1 Q + π_2] = [0  π_1 QD + π_2 D]
= lim_{m→∞} [π_1 π_2]P^{2m+1}      (B3.5)
where 0 is a 1 × (N − 2) vector of zeros. This shows that the
limit lim_{n→∞} [π_1 π_2]P^n exists.
Since each row of the matrix Q must have sum equal to 1, it
is not difficult to see that π(0) = [π_1 π_2] provides an initial
distribution such that
lim_{n→∞} π(n) = lim_{n→∞} π(0)P^n = lim_{n→∞} [π_1 π_2]P^n = [0 0.5 0.5]
Therefore, if π_1 and π_2 have nonnegative elements and satisfy
the condition π_1 Q + π_2 = [0.5 0.5], then π(n) converges with
π(0) = [π_1 π_2], although P^n diverges.
The domain for π(0) = [π_1 π_2] to meet the above conditions
has dimension N − 2, which is one less than the dimension
N − 1 for π(0) without such conditions.
The above example can also show that lim_{n→∞} p_ij(n) > 0 is
possible for a periodic state j. To see this, take N = 3, A = 0.8
and B = [0.1 0.1]. Then by (B3.2),
Q = (1 − 0.8^2)^{-1}(0.8 [0.1 0.1] + [0.1 0.1]D)
  = (1/0.36)[0.18 0.18] = [0.5 0.5] = QD
Hence by (B3.1) and (B3.3), lim_{n→∞} p_1j(n) = 0.5 > 0 for
periodic states j = 2, 3.
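For N = 3 the claim lim p_1j(n) = 0.5 for j = 2, 3 can be confirmed by raising P to a large power and by recomputing Q from (B3.2); a small numerical sketch:

```python
def matmul(A, B):
    """Multiply two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# N = 3, A = 0.8, B = [0.1 0.1], D swaps states 2 and 3
P = [[0.8, 0.1, 0.1],
     [0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]

Pn = P
for _ in range(59):           # P^60, an even power
    Pn = matmul(Pn, P)
Pn1 = matmul(Pn, P)           # P^61, an odd power

print(Pn[0])    # row 1 of P^60: approximately [0, 0.5, 0.5]
print(Pn1[0])   # row 1 of P^61: also approximately [0, 0.5, 0.5]

# Scalar form of (B3.2): Q = (1 - A^2)^{-1}(A*0.1 + 0.1) per column
q = (0.8 * 0.1 + 0.1) / (1 - 0.8**2)
print(q)        # 0.5, matching Q = [0.5 0.5]
```

Rows 2 and 3 of P^n keep oscillating (the D block), so P^n itself still diverges, exactly as the example states.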
-
Appendix B4
Properties of Poisson Processes
Partition of a Poisson process
Let N(t) be a Poisson process with rate λ, which counts the
number of events occurring by time t.
Each event is classified into one of k types, independently of
N(t), with Pr(Type j) = p_j, j = 1, . . . , k, where p_1 + · · · + p_k = 1.
Let N_j(t) be the number of type-j events occurring by time t,
j = 1, . . . , k. Conditional on N(t) = n, N_1(t), . . . , N_k(t) have a
multinomial distribution:
Pr(N_j(t) = n_j, j = 1, . . . , k | N(t) = n)
= n!/(n_1! · · · n_k!) p_1^{n_1} · · · p_k^{n_k}
where n_1, . . . , n_k satisfy n_1 + · · · + n_k = n.
Consequently,
Pr(N_1(t) = n_1, . . . , N_k(t) = n_k)
= Pr(N_j(t) = n_j, j = 1, . . . , k | N(t) = n) Pr(N(t) = n)
= n!/(n_1! · · · n_k!) p_1^{n_1} · · · p_k^{n_k} e^{−λt} (λt)^n / n!
= (λt)^{n_1+···+n_k}/(n_1! · · · n_k!) p_1^{n_1} · · · p_k^{n_k} e^{−λ(p_1+···+p_k)t}
= [(λtp_1)^{n_1}/n_1!] e^{−λp_1 t} · · · [(λtp_k)^{n_k}/n_k!] e^{−λp_k t}
Thus N_1(t), N_2(t), . . . , N_k(t) are independent Poisson processes
with rates λp_1, . . . , λp_k respectively.
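The factorization above is an algebraic identity, so it can be spot-checked numerically for one set of values. The choices λt = 4, p = (0.3, 0.7), n_1 = 2, n_2 = 3 below are arbitrary and purely illustrative:

```python
import math

def poisson_pmf(n, mean):
    """Pr(N = n) for N ~ Poisson(mean)."""
    return math.exp(-mean) * mean**n / math.factorial(n)

lam_t = 4.0                  # lambda * t
p1, p2 = 0.3, 0.7            # classification probabilities (k = 2)
n1, n2 = 2, 3
n = n1 + n2

# Left side: multinomial split given N(t) = n, times Pr(N(t) = n)
multinomial = math.factorial(n) / (math.factorial(n1) * math.factorial(n2))
lhs = multinomial * p1**n1 * p2**n2 * poisson_pmf(n, lam_t)

# Right side: product of independent Poisson pmfs with means lam*t*p_j
rhs = poisson_pmf(n1, lam_t * p1) * poisson_pmf(n2, lam_t * p2)

print(lhs, rhs)   # the two sides agree
```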
-
Transform of a multivariate density
Before deriving the joint distribution of arrival times, we first
review the transform of a multivariate density function.
Let f_X(x_1, . . . , x_n) denote the joint density function of random
variables X_1, . . . , X_n.
Assume that a one-to-one transform between (X_1, . . . , X_n) and
(Y_1, . . . , Y_n) is given by X_i = g_i(Y_1, . . . , Y_n), i = 1, . . . , n.
Then by multivariate calculus, the joint density of Y_1, . . . , Y_n
is given by
f_Y(y_1, . . . , y_n) = f_X(x_1, . . . , x_n)|J|      (B4.1)
where x_i = g_i(y_1, . . . , y_n), i = 1, . . . , n, and
J = det(∂x_i/∂y_j)_{n×n} = | ∂x_1/∂y_1  ∂x_1/∂y_2  · · ·  ∂x_1/∂y_n |
                           | ∂x_2/∂y_1  ∂x_2/∂y_2  · · ·  ∂x_2/∂y_n |
                           |                · · ·                   |
                           | ∂x_n/∂y_1  ∂x_n/∂y_2  · · ·  ∂x_n/∂y_n |
which is called the Jacobian of the transform from (x_1, . . . , x_n)
to (y_1, . . . , y_n).
If the transform is linear: X = AY, where X = (X_1, . . . , X_n)^T,
Y = (Y_1, . . . , Y_n)^T, and A = (a_ij)_{n×n} is a constant matrix,
then
∂x_i/∂y_j = ∂/∂y_j ∑_{k=1}^n a_ik y_k = a_ij, i, j = 1, . . . , n.
Hence J = |A| (the determinant of A).
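As a small sketch of J = |A| for a linear transform, the determinant of a lower-triangular matrix of ones (the kind of cumulative-sum transform used for arrival times) is 1, the product of its diagonal:

```python
def det(A):
    """Determinant by Laplace expansion along the first row
    (adequate for the small matrices used here)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

# n x n lower-triangular matrix of ones: maps (t1, ..., tn) to the
# partial sums (t1, t1 + t2, ..., t1 + ... + tn)
n = 4
A = [[1.0 if j <= i else 0.0 for j in range(n)] for i in range(n)]
print(det(A))   # 1.0, so the transform has Jacobian J = |A| = 1
```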
-
Arrival times
Given a Poisson process N(t) with rate λ to count the number
of events, the arrival time of the kth event is given by
A_k = T_1 + T_2 + · · · + T_k, k = 1, 2, . . .
where T_1, T_2, . . . are i.i.d. exponentially distributed with rate λ.
Let (a_1, . . . , a_n) and (t_1, . . . , t_n) denote the values of the
random vectors (A_1, . . . , A_n) and (T_1, . . . , T_n) respectively. Then
[ a_1 ]   [ 1 0 · · · 0 ] [ t_1 ]
[ a_2 ] = [ 1 1 · · · 0 ] [ t_2 ]      (B4.2)
[  .  ]   [    · · ·    ] [  .  ]
[ a_n ]   [ 1 1 · · · 1 ] [ t_n ]
subject to the restrictions a_1 < · · · < a_n and t_1 > 0, . . . , t_n > 0.
Let f_A and f_T denote the joint densities of (A_1, . . . , A_n) and
(T_1, . . . , T_n) respectively.
Since the matrix in (B4.2) has determinant equal to 1, we have
J = 1 for the transform in (B4.2). Hence by (B4.1),
f_A(a_1, . . . , a_n | N(t) = n) = f_T(t_1, . . . , t_n | N(t) = n)
= Pr(N(t) = n | t_1, . . . , t_n) f_T(t_1, . . . , t_n) / Pr(N(t) = n)
= Pr(N(t) − N(a_n) = 0) λe^{−λt_1} · · · λe^{−λt_n} / Pr(N(t) = n)
= e^{−λ(t−a_n)} λ^n e^{−λ(t_1+···+t_n)} / (e^{−λt}(λt)^n/n!)
= n!/t^n      (B4.3)
if 0 ≤ a_1 < · · · < a_n ≤ t; and 0 otherwise.
-
Let X_1, . . . , X_n be i.i.d. random variables with a common
density f_X(x), and X_(1) < · · · < X_(n) their order statistics.
The joint density of X_1, . . . , X_n at (x_1, . . . , x_n) is given by
f(x_1, . . . , x_n) = f_X(x_1) · · · f_X(x_n) = f_X(x_(1)) · · · f_X(x_(n)).
Given x_(1) < x_(2) < · · · < x_(n), there are n! unordered n-tuples
(x_1, . . . , x_n) whose ordered values are equal to x_(1), . . . , x_(n),
each with density f_X(x_(1)) · · · f_X(x_(n)).
Therefore, the density of X_(1), . . . , X_(n) is given by
f_(n)(x_(1), . . . , x_(n)) = n! ∏_{i=1}^n f_X(x_(i))      (B4.4)
For example, when n = 3 and (x_(1), x_(2), x_(3)) = (1, 2, 3),
f_(3)(1, 2, 3) = f(1, 2, 3) + f(1, 3, 2) + f(2, 1, 3)
              + f(2, 3, 1) + f(3, 1, 2) + f(3, 2, 1)
              = 3! f_X(1) f_X(2) f_X(3)
In particular, if f_X(x) = t^{-1} I{0 ≤ x ≤ t} is uniform over
[0, t], then
f_(n)(x_(1), . . . , x_(n)) = n!(1/t)^n = n!/t^n      (B4.5)
if 0 ≤ x_(1) < · · · < x_(n) ≤ t; and 0 otherwise.
Comparing (B4.5) with (B4.3), we see that the conditional joint
distribution of the arrival times A_1 < · · · < A_n given N(t) = n
is the same as that of the order statistics of n independent
uniform random variables over the interval [0, t].
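This equivalence can be illustrated by a seeded Monte Carlo check: conditional on N(t) = n, the mean of the kth arrival time should match t·k/(n+1), the mean of the kth order statistic of n uniforms on [0, t]. The parameters λ = 1, t = 5, n = 3 below are illustrative choices:

```python
import random

random.seed(0)
lam, t, n_target = 1.0, 5.0, 3
first_arrivals = []

for _ in range(40000):
    # build arrival times from i.i.d. Exp(lam) inter-arrival times
    arrivals, a = [], 0.0
    while True:
        a += random.expovariate(lam)
        if a > t:
            break
        arrivals.append(a)
    if len(arrivals) == n_target:     # condition on N(t) = 3
        first_arrivals.append(arrivals[0])

mean_a1 = sum(first_arrivals) / len(first_arrivals)
# the 1st of 3 uniform order statistics on [0, t] has mean t*1/(n+1)
print(mean_a1)   # close to 5 * 1/4 = 1.25
```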
-
The time-inhomogeneous case
If the Poisson process N(t) is time-inhomogeneous with intensity
function λ(t) and cumulative intensity Λ(t) = ∫_0^t λ(s)ds, then
by the independent increments,
Pr(T_k > t | T_1 = t_1, . . . , T_{k−1} = t_{k−1})
= Pr(N(a_{k−1} + t) − N(a_{k−1}) = 0) = e^{−[Λ(t+a_{k−1})−Λ(a_{k−1})]}
where a_0 = t_0 = 0 and a_k = t_1 + · · · + t_k, k = 1, 2, . . ..
Thus the density of T_k given {T_1 = t_1, . . . , T_{k−1} = t_{k−1}} is
f_k(t | t_1, . . . , t_{k−1}) = λ(t + a_{k−1}) e^{−[Λ(t+a_{k−1})−Λ(a_{k−1})]}
and the joint density of T_1, . . . , T_n is given by
f(t_1, . . . , t_n) = ∏_{k=1}^n f_k(t_k | t_1, . . . , t_{k−1})
= ∏_{k=1}^n λ(a_k) e^{−[Λ(a_k)−Λ(a_{k−1})]} = λ(a_1) · · · λ(a_n) e^{−Λ(a_n)}
Then, similarly to (B4.3), for 0 ≤ a_1 < · · · < a_n ≤ t,
f_A(a_1, . . . , a_n | N(t) = n)
= e^{−[Λ(t)−Λ(a_n)]} λ(a_1) · · · λ(a_n) e^{−Λ(a_n)} / (e^{−Λ(t)}[Λ(t)]^n/n!)
= n! λ(a_1) · · · λ(a_n) / [Λ(t)]^n = n! ∏_{j=1}^n [λ(a_j)/Λ(t)]      (B4.6)
Comparing (B4.6) with (B4.4), we see that given N(t) = n,
the arrival times A_1 < · · · < A_n are distributed as the order
statistics of i.i.d. X_1, . . . , X_n with a common density
f_X(x) = λ(x)/Λ(t), 0 ≤ x ≤ t.
-
Define Poisson process by inter-arrival times
Let {N_t, t ≥ 0} be a counting process and T_1, T_2, . . . the
inter-arrival times of N_t. If T_1, T_2, . . . is a sequence of i.i.d.
exponential random variables with a common rate λ, we can show
that N_t is a Poisson process with rate λ in the following steps.
First, N_t can be defined by T_1, T_2, . . . as follows:
N_t = max{k ≥ 0 : A_k ≤ t},      (B4.6)
where A_0 = 0 and A_k = T_1 + · · · + T_k is the arrival time of the
kth event.
Then (B4.6) implies
{N_t ≤ n} = {A_{n+1} > t}, {N_t = n} = {A_n ≤ t < A_{n+1}}      (B4.7)
Next, it is not difficult to show that N_t is Poisson distributed
with mean λt (a tutorial exercise).
Then we show that N_t has the Markov property. Let n_t ≥ 0 be
integer-valued and nondecreasing in t ≥ 0. Then T_{n_s+1}, . . . , T_{n_t}
are independent of A_{n_u} = T_1 + · · · + T_{n_u} for any 0 ≤ u ≤ s < t
and of T_{n_u+1} if n_u < n_s. These together with (B4.7) imply
Pr(N_t < n_t | N_u = n_u, u ≤ s, A_{n_s} = a_s)
= Pr(A_{n_t} > t | A_{n_u} ≤ u < A_{n_u} + T_{n_u+1}, u ≤ s, A_{n_s} = a_s)
= Pr(a_s + T_{n_s+1} + · · · + T_{n_t} > t | a_s + T_{n_s+1} > s, A_{n_s} = a_s)
= Pr(N_t < n_t | N_s = n_s, A_{n_s} = a_s)      (B4.8)
for any 0 ≤ s < t and 0 ≤ a_s ≤ s.
-
Note that the value a_s of A_{n_s} must satisfy 0 ≤ a_s ≤ s since
n_s is the number of arrivals no later than time s by (B4.6) and
(B4.7).
It follows from (B4.8) that
Pr(N_t < n_t | N_u = n_u, u ≤ s) = Pr(N_t < n_t | N_s = n_s)
for any 0 ≤ s < t. This shows the Markov property of N_t.
It remains to show that N_{t+u} − N_t is Poisson distributed with
mean λu and is independent of N_t for any t, u > 0. By (B4.7),
Pr(N_{t+u} − N_t ≤ n | N_t = m) = Pr(N_{t+u} ≤ m + n | N_t = m)
= Pr(A_{m+n+1} > t + u | A_m ≤ t < A_{m+1})
Hence for any 0 ≤ a_m ≤ t, by the memoryless property of the
exponential distribution and the independence between A_m and T_{m+1},
Pr(N_{t+u} − N_t ≤ 0 | N_t = m, A_m = a_m)
= Pr(N_{t+u} ≤ m | N_t = m, A_m = a_m)
= Pr(A_{m+1} > t + u | A_m = a_m ≤ t < A_{m+1})
= Pr(a_m + T_{m+1} > t + u | a_m + T_{m+1} > t)
= Pr(T_{m+1} > t + u − a_m | T_{m+1} > t − a_m)
= Pr(T_{m+1} > u) = e^{−λu}      (B4.9)
As e^{−λu} does not depend on m and a_m, (B4.9) shows that
Pr(N_{t+u} − N_t ≤ 0 | N_t = m) = Pr(N_{t+u} − N_t ≤ 0)
= e^{−λu} = Pr(N_u = 0) = Pr(N_u ≤ 0)      (B4.10)
-
For n ≥ 1, let X = T_{m+2} + · · · + T_{m+n+1} ~ Gamma(n, λ) and
define the event E = E(x) = {A_m = a_m ≤ t, X = x}. Then an
argument similar to that of (B4.9) leads to
Pr(N_{t+u} − N_t ≤ n | N_t = m, E) = Pr(N_{t+u} ≤ m + n | N_t = m, E)
= Pr(a_m + T_{m+1} + x > t + u | a_m + T_{m+1} > t, E)
= Pr(T_{m+1} > u − x) = e^{−λ(u−x)} I{x ≤ u} + I{x > u}      (B4.11)
Multiplying (B4.11) by the density f_X(x) of X ~ Gamma(n, λ) and
integrating over x ≥ 0, we get
Pr(N_{t+u} − N_t ≤ n | N_t = m) = ∫_0^∞ Pr(T_{m+1} > u − x) f_X(x) dx
= ∫_0^u e^{−λ(u−x)} λ^n x^{n−1} e^{−λx}/(n−1)! dx
  + ∫_u^∞ λ^n x^{n−1} e^{−λx}/(n−1)! dx
= e^{−λu} λ^n ∫_0^u x^{n−1}/(n−1)! dx + Pr(A_n > u)
= e^{−λu} (λu)^n/n! + Pr(N_u ≤ n − 1) = Pr(N_u ≤ n)      (B4.12)
It follows from (B4.10) and (B4.12) that N_{t+u} − N_t has a
Poisson distribution with mean λu and is independent of N_t. Thus
N_t is a Poisson process with rate λ.
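The construction can also be checked by simulation: building N_t from i.i.d. Exp(λ) inter-arrival times via (B4.6) should yield a Poisson(λt) count. A seeded sketch with illustrative parameters λ = 2, t = 3:

```python
import math
import random

random.seed(1)
lam, t = 2.0, 3.0

def count_Nt(lam, t):
    """N_t = max{k >= 0 : A_k <= t}, with A_k sums of Exp(lam) gaps."""
    k, a = 0, 0.0
    while True:
        a += random.expovariate(lam)
        if a > t:
            return k
        k += 1

samples = [count_Nt(lam, t) for _ in range(50000)]
mean = sum(samples) / len(samples)
p0 = samples.count(0) / len(samples)

print(mean)                    # close to lam * t = 6
print(p0, math.exp(-lam * t))  # Pr(N_t = 0) close to e^{-6}
```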
Remarks
(i) (B4.6) and (B4.7) summarise the relationship between a counting
process and its arrival and inter-arrival times.
(ii) A process N_t defined by (B4.6) is a Markov process provided
that T_1, T_2, . . . are independent (they need not be i.i.d. or
exponential).