Stochastic Processes Applications Lecture Notes


  • 8/13/2019 Stochastic Processes Applications Lecturenotes


    Stochastic Processes

    Selective Topics and Applications

    Nguyen V.M. Man, Ph.D.

    January 15, 2013


    Keywords. probabilistic model, random process, linear algebra, computational algebra, statistical inference and modeling

    Copyright 2013 by

    Lecturer: Nguyen V. M. Man, Ph.D.

    Faculty: Computer Science and Engineering

    Institution: University of Technology of HCMC - HCMUT

    Address: 268 Ly Thuong Kiet, Dist. 10, HCMC, Vietnam

    Email: [email protected]

    E-home: www.cse.hcmut.edu.vn/mnguyen

    The Author

    Man Nguyen conducted his Ph.D. research in Applied Mathematics and Industrial Statistics, after following a master program in Computational Lie Algebras at the HCMC University of Science.

    The Ph.D. work, on Factorial Experiment Designs using Computer Algebra methods and Discrete Mathematics, was done at the Eindhoven University of Technology, the Netherlands, in 2001-2005.

    His current research interests include

    * Algebraic Statistics and Experimental Designs, and

    * Mathematical & Statistical Modeling of practical problems.

    For more information, you are welcome to visit his e-home at

    www.cse.hcmut.edu.vn/mnguyen


    Contents

    1 Background
      1.1 Introductory Stochastic Processes
      1.2 Generating Functions
        1.2.1 Introduction
        1.2.2 Elementary results of Generating Functions
        1.2.3 Convolutions
        1.2.4 Compound distributions

    2 Markov Chains & Modeling
      2.1 Homogeneous Markov chains
      2.2 Classification of States
      2.3 Markov Chain Decomposition
      2.4 Limiting probabilities & Stationary distributions
      2.5 Theory of stochastic matrix for MC
      2.6 Spectral Theorem for Diagonalizable Matrices
      2.7 Markov Chains with Absorbing States
        2.7.1 Theory
      2.8 Chapter Review and Discussion

    3 Random walks & Wiener process
      3.1 Introduction to Random Walks
      3.2 Random Walk - a mathematical realization
      3.3 Wiener process

    4 Arrival-Type processes
      4.1 Introduction
      4.2 The Bernoulli process
        4.2.1 Basic facts
        4.2.2 Random Variables Associated with the Bernoulli Process
      4.3 The Poisson process
        4.3.1 Poisson distribution
        4.3.2 Poisson process
      4.4 Course Review and Discussion

    5 Probability Modeling and Mathematical Finance
      5.1 Martingales
        5.1.1 History
        5.1.2 Conditional expectation
        5.1.3 Key properties of Conditional expectation
        5.1.4 Filtration
        5.1.5 Martingale
        5.1.6 Martingale examples
        5.1.7 Stopping time
      5.2 Stochastic Calculus
        5.2.1 A Simple Model for Asset Prices
        5.2.2 Stochastic differential equation

    6 Part III: Practical Applications of SP
      6.1 Statistical Parameter Estimation
      6.2 Inventory Control in Logistics
      6.3 Epidemic processes
      6.4 Statistical Models in Risk Management
      6.5 Optimization Methods for Portfolio Risk Management


    Introduction

    We present a few specific probabilistic techniques used in mathematically modeling complex phenomena in biology, service systems, and finance. These notes are aimed at graduate students in Applied Mathematics and Statistics.

    The aims of the course. The course introduces basic techniques of Stochastic Processes theory, including:

    * Markov chains and processes (discrete and continuous parameters),

    * Random walks, fluctuation theory,

    * Stationary processes, spectral analysis,

    * Diffusion processes,

    * Applications in finance and transportation.

    The structure of the course. The course consists of three parts:

    Part I: Motivating topics for studying Stochastic Processes



    Part II: Fundamental setting of Stochastic Processes

    Part III: Connections and research projects

    Part I: Motivating topics and Background

    * Service systems: mathematical models of queueing systems.

    * Introductory Stochastic Processes: basic concepts.

    Part II: Basic Stochastic Processes

    We will discuss the following:

    * Markov chains and processes

    * Random walks and the Wiener process

    * Arrival-type processes

    * Martingales and Stochastic Calculus

    Part III: New applications of SP

    We investigate the following applications:

    * Statistical Models and Simulation in Risk Management

    * Mathematical and Statistical Models in Transportation Science


    Motivating topics of SP

    Service systems. Over the last few years the Processor Sharing scheme has attracted renewed attention as a convenient and efficient approach for studying bandwidth-sharing mechanisms such as TCP, or any process requiring resource sharing.

    Understanding and analyzing such processes in order to build a high-performance system with limited resources is a very difficult task. A few typical aspects of resource allocation are:

    1. many classes of jobs (clients) enter a system with distinct rates, which demands a wise policy to get them through efficiently;

    2. measuring the performance of a system through many different parameters (metrics) is hard, and requires complex mathematical models.

    Evolutionary Dynamics

    Keywords: critical lineages, virus mutant, mutation, reproductive ratio, invasion, escape, ecology, vaccine.


    Introductory Invasion and Escape. Some realistic biological phenomena occur in nature, such as: (a) a parasite infecting a new host, (b) a species trying to invade a new ecological niche, (c) cancer cells escaping from chemotherapy, and (d) viruses evading anti-microbial therapy.

    Typical problems. Imagine a virus of one host species that is transferred to another host species (HIV, SARS). In the new host, the virus has a basic reproductive ratio R less than one. Some mutation may be required to generate a virus mutant that can invade the new host and lead to an epidemic in the new host species. A few crucial concerns are:

    1. how to calculate the probability that such an attempt succeeds?

    2. suppose a successful and effective vaccine is found, but some mutants can break through the protective immunity of the vaccine. How to calculate the probability that a virus quasispecies contains an escape mutant that establishes an infection and thereby causes vaccine failure?

    Summary. We call for a theory to calculate the probability of non-extinction/escape for lineages starting from single individuals.
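The calculation called for here is a standard branching-process one: for a lineage started by a single individual, the extinction probability q is the smallest root of q = F(q), where F is the offspring probability-generating function. A minimal Python sketch, assuming (purely for illustration) a Poisson offspring distribution whose mean plays the role of the reproductive ratio R; a successful mutant corresponds to R > 1:

```python
import math

def extinction_prob(R, tol=1e-12, max_iter=100_000):
    """Smallest fixed point of q = F(q) for Poisson(R) offspring,
    where F(q) = exp(R*(q - 1)) is the offspring pgf.  Iterating
    q_{n+1} = F(q_n) from q_0 = 0 converges monotonically to the
    smallest root, which is the extinction probability."""
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp(R * (q - 1.0))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

q = extinction_prob(1.5)    # supercritical case: R > 1
print(q, 1.0 - q)           # extinction vs. escape probability
```

For R <= 1 the iteration converges to q = 1 (extinction is certain), so the escape probability 1 - q is positive only in the supercritical case R > 1.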

    Computing Software

    OpenModelica, ScalaLab and R.

    Introductory R: a statistical language

    R is a language and environment for statistical computing and graphics. It is similar to the S language and environment, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies). The R distribution contains functionality for a large number of statistical procedures, among them: linear and generalized linear models, nonlinear regression models, time series analysis, and classical parametric and nonparametric tests. There is also a large set of functions which provide a flexible graphical environment for creating various kinds of data presentations.

    One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

    * an effective data handling and storage facility,

    * a suite of operators for calculations on arrays, in particular matrices,

    * a large, coherent, integrated collection of intermediate tools for data analysis,

    * graphical facilities for data analysis and display,

    * a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions, and input and output facilities.

    Note: most classical statistics and much of the latest methodology is available for use with R, but users may need to be prepared to do a little work to find it.


    Chapter 1

    Background

    1.1 Introductory Stochastic Processes

    The concept. A stochastic process is just a collection (usually infinite) of random variables, denoted Xt or X(t), where the parameter t often represents time. The state space of a stochastic process consists of all realizations x of Xt; i.e., Xt = x says the random process is in state x at time t. Stochastic processes can be generally subdivided into four distinct categories, depending on whether t and Xt are discrete or continuous:

    1. Discrete processes: both are discrete, such as the Bernoulli process (die rolling) or discrete time Markov chains.

    2. Continuous time discrete state processes: the state space of Xt is discrete and the index set (e.g. the time set T of t) is continuous, such as an interval of the reals R.

       Poisson process: the number of clients X(t) who have entered ACB from the time it opened until time t. X(t) will have the Poisson distribution with mean E[X(t)] = λt (λ being the arrival rate).

       Continuous time Markov chain.

       Queueing process: people not only enter but also leave the bank, so we also need the distribution of service time (the time a client spends in ACB).

    3. Continuous processes: both Xt and t are continuous, such as a diffusion process (Brownian motion).

    4. Discrete time continuous state processes: Xt is continuous and t is discrete, the so-called TIME SERIES, such as

       monthly fluctuations of the inflation rate of Vietnam,

       daily fluctuations of a stock market.

    Examples

    1. Discrete processes: a random walk model consisting of positions Xt of an object (a drunkard) at discrete time points t during 24 hours, whose directional distance from a particular point 0 is measured in integer units. Here T = {0, 1, 2, . . . , 24}.

    2. Continuous time discrete state processes: Xt is the number of births in a given population during the time period [0, t]. Here T = R+ = [0, ∞) and the state space is {0, 1, 2, . . .}. The sequence of failure times of a machine is a specific instance.


    - X(t) and X(t + τ) will have the same distributions. For the first-order distribution,

      FX(x; t) = FX(x; t + τ) = FX(x), and fX(x; t) = fX(x).

      These properties are found in arrival-type processes, in which we are interested in occurrences that have the character of an arrival, such as message receptions at a receiver, job completions in a manufacturing cell, customer purchases at a store, etc. We will focus on models in which the interarrival times (the times between successive arrivals) are independent random variables.

      The case where arrivals occur in discrete time and the interarrival times are geometrically distributed is the Bernoulli process.

      The case where arrivals occur in continuous time and the interarrival times are exponentially distributed is the Poisson process.

      The Bernoulli process and the Poisson process will be investigated next.

    2. MARKOVIAN (memoryless) property: many processes have the memoryless property, arising from experiments that evolve in time and in which the future evolution exhibits a probabilistic dependence on the past.

    As an example, the future daily prices of a stock are typically dependent on past prices. However, in a Markov process, we assume a very special type of dependence: the next value depends on past values only through the current value; that is, Xi+1 depends only on Xi, and not on any previous values.
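The continuous-time arrival model above can be simulated directly: Poisson arrivals are generated by accumulating independent exponential interarrival times. A minimal sketch (the rate lam = 2.0, the horizon t_end = 1000, and the seed are illustrative choices):

```python
import random

random.seed(42)
lam, t_end = 2.0, 1000.0

# Poisson process: arrival times are cumulative sums of
# independent Exponential(lam) interarrival times.
arrivals = []
t = random.expovariate(lam)
while t < t_end:
    arrivals.append(t)
    t += random.expovariate(lam)

# The count N(t_end) has mean lam * t_end, so the empirical
# rate len(arrivals) / t_end should be close to lam.
print(len(arrivals) / t_end)
```

The same loop with a Bernoulli(p) coin flip per unit slot and geometric gaps gives the discrete-time analogue.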


    1.2 Generating Functions

    1.2.1 Introduction

    Probabilistic models often involve several random variables of interest. For example, in a medical diagnosis context, the results of several tests may be significant, or in a networking context, the workloads of several gateways may be of interest. All of these random variables are associated with the same experiment, sample space, and probability law, and their values may relate in interesting ways. Mathematically, a random variable is a mapping!

    Definition 1. A random variable X is a mapping (function) from a sample space S to the reals R. For any j ∈ R, the preimage A := X^(-1)(j) = {w : X(w) = j} ⊆ S is an event, and we understand

    P{X = j} = P(A) = Σ_{w ∈ A} P(w).

    For a finite sample space S with equally likely outcomes, obviously

    P{X = j} = P(A) = |A| / |S|.

    A discrete random variable X is one having a finite or countably infinite range Range(X), described by the probability mass function (pmf), determined by P{X = j} = p_j. We must have

    p_j ≥ 0, and Σ_j p_j = 1.


    A continuous random variable X is one having an uncountable range Range(X), described by the probability density function (pdf) f(x), which satisfies

    f(t) ≥ 0, and ∫_{Range(X)} f(t) dt = 1.

    Generating functions are important in handling stochastic processes involving integral-valued random variables.

    Multiple random variables. We consider probabilities involving simultaneously the numerical values of several random variables and investigate their mutual couplings. In this section, we extend the concepts of pmf and expectation developed so far to multiple random variables.

    Consider two discrete random variables X, Y : S → R associated with the same experiment. The joint pmf of X and Y is defined by

    pX,Y(x, y) = P(X = x, Y = y)

    for all pairs of numerical values (x, y) that X and Y can take. We will use the abbreviated notation P(X = x, Y = y) instead of the more precise notations P({X = x} ∩ {Y = y}) or P({X = x} and {Y = y}). That is,

    P(X = x, Y = y) = P({X = x} ∩ {Y = y}) = P({X = x} and {Y = y}).

    For the pair of random variables X, Y, we say

    Definition 2. X and Y are independent if for all x, y ∈ R we have

    P(X = x, Y = y) = P{X = x} P{Y = y}, i.e., pX,Y(x, y) = pX(x) pY(y),

    or in terms of conditional probability


    P({X = x} | {Y = y}) = P({X = x}).

    This can be extended to the notion of mutual independence of a finite number n of random variables.

    Definition 3. The expectation operator defines the expected value of a random variable X as

    E(X) = Σ_{x ∈ Range(X)} P{X = x} · x.

    If we consider X as a function from a sample space S to the naturals N, then

    E(X) = Σ_{i=0}^∞ P{X > i}. (Why?)
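The tail-sum identity behind the "Why?" is easy to check numerically for an N-valued variable; the pmf below is an illustrative choice, not from the notes:

```python
# Check E(X) = sum_{i>=0} P(X > i) for a small N-valued pmf.
pmf = {0: 0.2, 1: 0.5, 2: 0.2, 3: 0.1}   # illustrative distribution

mean_direct = sum(j * p for j, p in pmf.items())

# P(X > i) summed over i = 0, 1, ..., max-1 (tails beyond max are 0)
mean_tail = sum(sum(p for j, p in pmf.items() if j > i)
                for i in range(max(pmf)))

print(mean_direct, mean_tail)   # both equal 1.2
```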

    Functions of Multiple Random Variables. When there are multiple random variables of interest, it is possible to generate new random variables by considering functions involving several of these random variables. In particular, a function Z = g(X, Y) of the random variables X and Y defines another random variable. Its pmf can be calculated from the joint pmf pX,Y according to

    pZ(z) = Σ_{(x,y) : g(x,y) = z} pX,Y(x, y).

    Furthermore, the expected value rule for functions naturally extends and takes the form

    E[g(X, Y)] = Σ_{(x,y)} g(x, y) pX,Y(x, y).


    Theorem 4. We have two important results on expectation.

    Linearity: E(X + Y) = E(X) + E(Y) for any pair of random variables X, Y.

    Independence: E(X Y) = E(X) E(Y) for any pair of independent random variables X, Y.

    Mean, variance and moments of the probability distribution P{X = j} = p_j:

    m = E(X) = Σ_{j=0}^∞ j p_j = P'(1) = Σ_{j=0}^∞ q_j = Q(1) (why!?)

    Recall that the variance of the probability distribution p_j is

    σ² = E(X(X − 1)) + E(X) − [E(X)]²,

    so we need to know

    E(X(X − 1)) = Σ_{j=0}^∞ j(j − 1) p_j = P''(1) = 2Q'(1)?

    Therefore, σ² = ?

    Exercise: Find the formula of the r-th factorial moment

    μ[r] = E(X(X − 1)(X − 2) · · · (X − r + 1)).
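As a concrete check of m = P'(1) and σ² = P''(1) + P'(1) − [P'(1)]², the derivatives of the probability-generating function can be evaluated exactly from the pmf coefficients, since P(x) is a polynomial here. A sketch for a Binomial(3, 1/2) variable (an illustrative choice):

```python
# pmf of Binomial(3, 1/2): p_j = C(3, j) / 8 for j = 0..3
p = [1/8, 3/8, 3/8, 1/8]

# For P(x) = sum p_j x^j:  P'(1) = sum j p_j,  P''(1) = sum j(j-1) p_j
P1 = sum(j * pj for j, pj in enumerate(p))
P2 = sum(j * (j - 1) * pj for j, pj in enumerate(p))

mean = P1
var = P2 + P1 - P1**2
print(mean, var)   # 1.5 and 0.75, i.e. n*p and n*p*(1-p)
```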

    1.2.2 Elementary results of Generating Functions

    Suppose we have a sequence of real numbers a0, a1, a2, . . .. Introducing the dummy variable x, we may define a function

    A(x) = a0 + a1 x + a2 x² + · · · = Σ_{j=0}^∞ a_j x^j. (1.2.1)


    If the series converges in some real interval −x0 < x < x0, the function A(x) is called the generating function of the sequence {aj}.

    Fact 1.1. If the sequence {aj} is bounded by some constant K, then A(x) converges at least for |x| < 1. [Prove it!]

    Fact 1.2. In case the sequence {aj} represents probabilities, we introduce the restriction

    aj ≥ 0, Σ_{j=0}^∞ aj = 1.

    The corresponding function A(x) is then called a probability-generating function. We consider the (point) probability distribution and the tail probability of a random variable X, given by

    P{X = j} = pj, P{X > j} = qj,

    so that the usual distribution function is P{X ≤ j} = 1 − qj. The probability-generating function now is

    P(x) = Σ_{j=0}^∞ pj x^j = E(x^X), where E is the expectation operator.

    Also we can define a generating function for the tail probabilities:

    Q(x) = Σ_{j=0}^∞ qj x^j.

    Q(x) is not a probability-generating function, however.

    Fact 1.3.

    a/ P(1) = Σ_{j=0}^∞ pj 1^j = 1, and |P(x)| ≤ Σ_{j=0}^∞ |pj x^j| ≤ Σ_{j=0}^∞ pj ≤ 1 if |x| ≤ 1. So P(x) is absolutely convergent at least for |x| ≤ 1.

    b/ Q(x) is absolutely convergent at least for |x| < 1.


    c/ Connection between P(x) and Q(x): (check this!)

    (1 − x) Q(x) = 1 − P(x), or equivalently P(x) + Q(x) = 1 + x Q(x).

    Finding a generating function from a recurrence: multiply both sides by x^n and sum. For example, for the Fibonacci sequence,

    fn = fn−1 + fn−2 implies F(x) = x + x F(x) + x² F(x).

    Finding a recurrence from a generating function: whenever you know F(x), we can find its power series; the coefficients of x^n are the Fibonacci numbers. How? Just remember how to find a partial fractions expansion of F(x), in particular the basic expansion

    1/(1 − x) = 1 + x + x² + · · ·

    In general, if G(x) is the generating function of a sequence (gn), then

    G^(n)(0) = n! gn.
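The two directions above can be combined in code: F = x + xF + x²F gives the closed form F(x) = x/(1 − x − x²), and its power-series coefficients can be recovered by long division of power series. A minimal sketch (series_coeffs is a hypothetical helper name, not from the notes):

```python
def series_coeffs(num, den, n):
    """First n power-series coefficients of num(x)/den(x),
    assuming den[0] != 0; polynomials are given low-order first.
    Solves den * c = num coefficient by coefficient."""
    c = []
    for k in range(n):
        acc = num[k] if k < len(num) else 0.0
        for i in range(1, min(k, len(den) - 1) + 1):
            acc -= den[i] * c[k - i]
        c.append(acc / den[0])
    return c

# F(x) = x / (1 - x - x^2): coefficients are the Fibonacci numbers
print(series_coeffs([0, 1], [1, -1, -1], 10))
# -> [0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]
```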

    1.2.3 Convolutions

    Now we consider two nonnegative independent integral-valued random variables X and Y, having the probability distributions

    P{X = j} = aj, P{Y = k} = bk. (1.2.2)

    The joint probability of the event (X = j, Y = k) is aj bk, obviously. We form a new random variable S = X + Y; then the event S = r comprises the mutually exclusive events

    (X = 0, Y = r), (X = 1, Y = r − 1), . . . , (X = r, Y = 0).


    Fact 1.4. The probability distribution of the sum S then is

    P{S = r} = cr = a0 br + a1 br−1 + · · · + ar b0.

    Proof.

    pS(r) = P(X + Y = r) = Σ_{(x,y): x+y=r} P(X = x and Y = y) = Σ_x pX(x) pY(r − x).

    This method of compounding two sequences of numbers (not necessarily probabilities) is called convolution. The notation

    {cj} = {aj} ∗ {bj}

    will be used.

    Fact 1.5. Define the generating functions of the sequences {aj}, {bj} and {cj} by

    A(x) = Σ_{j=0}^∞ aj x^j, B(x) = Σ_{j=0}^∞ bj x^j, C(x) = Σ_{j=0}^∞ cj x^j;

    it follows that C(x) = A(x) B(x). [check this!]

    In practical applications, the sum of several independent integral-valued random variables Xi can be defined:

    Sn = X1 + X2 + · · · + Xn, n ∈ Z+.

    If the Xi have a common probability distribution given by pj, with probability-generating function P(x), then the probability-generating function of Sn is P(x)^n. Clearly, the distribution of Sn is the n-fold convolution

    {pj} ∗ {pj} ∗ · · · ∗ {pj} (n factors) = {pj}^(∗n).
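Fact 1.5 can be checked numerically: convolving two pmfs and multiplying their generating functions must agree. A sketch with two fair four-sided dice taking values 0..3 (an illustrative choice):

```python
def convolve(a, b):
    """Convolution {c_r} = {a_j} * {b_j}: c_r = sum_j a_j b_{r-j}."""
    c = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

a = [0.25] * 4            # pmf of X, uniform on {0, 1, 2, 3}
b = [0.25] * 4            # pmf of Y, same distribution
c = convolve(a, b)        # pmf of S = X + Y

# Evaluate the generating polynomials at a point: C(x) = A(x)B(x)
x = 0.5
A = sum(aj * x**j for j, aj in enumerate(a))
B = sum(bj * x**j for j, bj in enumerate(b))
C = sum(cj * x**j for j, cj in enumerate(c))
print(sum(c), abs(A * B - C) < 1e-12)
```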


    1.2.4 Compound distributions

    In our discussion so far of sums of random variables, we have always assumed that the number of variables in the sum is known and fixed, i.e., it is nonrandom. We now generalize the previous concept of convolution to the case where the number N of random variables Xk contributing to the sum is itself a random variable! In particular, we consider the sum SN = X1 + X2 + · · · + XN, where

    P{Xk = j} = fj, P{N = n} = gn, P{SN = l} = hl. (1.2.3)

    The probability-generating functions of X, N and S are

    F(x) = Σ fj x^j, G(x) = Σ gn x^n, H(x) = Σ hl x^l. (1.2.4)

    Compute H(x) in terms of F(x) and G(x). Prove that

    H(x) = G(F(x)).

    Example 1.1. A remote village has three gas stations, and each one of them is open on any given day with probability 1/2, independently of the others. The amount of gas available in each gas station is unknown and is uniformly distributed between 0 and 1000 gallons. We wish to characterize the distribution of the total amount of gas available at the gas stations that are open.

    The number N of open gas stations is a binomial random variable


    with p = 1/2, and the corresponding transform (a moment-type transform, with e^x in place of x, since the amounts are continuous) is

    GN(x) = (1 − p + p e^x)³ = (1/8)(1 + e^x)³.

    The transform FX(x) associated with the amount of gas available in an open gas station is

    FX(x) = (e^(1000x) − 1) / (1000x).

    The transform HS(x) associated with the total amount S of gas available at the three gas stations of the village that are open is the same as GN(x), except that each occurrence of e^x is replaced with FX(x), i.e.,

    HS(x) = G(F(x)) = (1/8)(1 + FX(x))³.
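The compound structure of Example 1.1 can be checked by simulation: draw N as a Binomial(3, 1/2) count of open stations, then add N independent Uniform(0, 1000) amounts. One consequence of H = G(F) is that E[S] = E[N] E[X] = 1.5 · 500 = 750 gallons, which the sample mean should approach (the seed and sample size are illustrative choices):

```python
import random

random.seed(7)
total = 0.0
n_trials = 100_000
for _ in range(n_trials):
    # N ~ Binomial(3, 1/2): number of open stations today
    n_open = sum(random.random() < 0.5 for _ in range(3))
    # add n_open independent Uniform(0, 1000) gas amounts
    total += sum(random.uniform(0, 1000) for _ in range(n_open))

print(total / n_trials)   # close to E[N] * E[X] = 750
```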

    The next chapter will discuss fundamental stochastic processes.


    Chapter 2

    Markov Chains & Modeling

    We discuss the concept of discrete time Markov chains, or just Markov Chains (MC), in this chapter. Suppose we have a sequence M of consecutive trials, numbered n = 0, 1, 2, . . .. The outcome of the nth trial is represented by the random variable Xn, which we assume to be discrete and to take one of the values j in a finite set Q of discrete outcomes/states

    {e1, e2, e3, . . . , es}.

    M is called a (discrete time) Markov chain if, while occupying Q states at each of the unit time points 0, 1, 2, 3, . . . , n − 1, n, n + 1, . . ., M satisfies the following property, called the

    Markov property or memoryless property:

    P(Xn+1 = j | Xn = i, . . . , X0 = a) = P(Xn+1 = j | Xn = i),

    for all n = 0, 1, 2, . . .


    (In each time step n to n + 1, the process can stay at the same state ei (at both n and n + 1) or move to another state ej (at n + 1), with respect to the memoryless rule, which says that the future behavior of the system depends only on the present and not on its past history.)

    Definition 5 (One-step transition probability).

    Denote the absolute probability of outcome j at the nth trial by

    pj(n) = P(Xn = j). (2.0.1)

    The one-step transition probability, denoted

    pij(n + 1) = P(Xn+1 = j | Xn = i),

    is defined as the conditional probability that the process is in state j at time n + 1, given that the process was in state i at the previous time n, for all i, j ∈ Q.

    2.1 Homogeneous Markov chains

    If the state transition probabilities pij(n + 1) in a Markov chain M are independent of the time n, they are said to be stationary, time homogeneous, or just homogeneous. The state transition probability in a homogeneous chain can then be written without mentioning the time point n:

    pij = P(Xn+1 = j | Xn = i). (2.1.1)

    Unless stated otherwise, we assume and will work with homogeneous Markov chains M. The one-step transition probabilities given by 2.1.1


    of these Markov chains must satisfy

    Σ_{j=1}^s pij = 1 for each i = 1, 2, . . . , s, and pij ≥ 0.

    Transition Probability Matrix. In practice, we are likely given the initial distribution (the probability distribution of the starting position of the concerned object at time point 0) and the transition probabilities, and we want to determine the probability distribution of the position Xn for any time point n > 0. The Markov property, quantitatively described through transition probabilities, is represented in the state transition matrix P = [pij]:

    P =
    [ p11  p12  p13  . . .  p1s ]
    [ p21  p22  p23  . . .  p2s ]
    [ p31  p32  p33  . . .  p3s ]
    [  .    .    .   . . .   .  ]
    (2.1.2)

    Briefly, we have

    Definition 6. A (homogeneous) Markov chain M is a triple (Q, p(0), P) in which:

    * Q is a finite set of states (identified with an alphabet),

    * p(0) are the initial probabilities (at the initial time point n = 0),

    * P = [pij] are the state transition probabilities, in which pij = P(Xn+1 = j | Xn = i),


    and such that the memoryless property is satisfied, i.e.,

    P(Xn+1 = j | Xn = i, . . . , X0 = a) = P(Xn+1 = j | Xn = i), for all n.

    In practice, the initial probabilities p(0) are obtained at the current time (the beginning of a study), and the transition probability matrix P is found from empirical observations in the past. In most cases, the major concern is using P and p(0) to predict the future.

    Example 2.1. The Coopmart chain (denoted C) in SG currently controls 60% of the daily processed-food market; their rivals Maximart and other brands (denoted M) take the other share. Data from the previous years (2006 and 2007) show that 88% of C's customers remained loyal to C, while 12% switched to rival brands. In addition, 85% of M's customers remained loyal to M, while the other 15% switched to C. Assuming that these trends continue, determine C's share of the market (a) in 5 years and (b) over the long run.

    Proposed solution. Suppose that the brand attraction is time homogeneous. For a sample of large enough size n, we denote the customers' attention in year n by a random variable Xn. The market share probability of the whole population can then be approximated by using the sample statistics, e.g.

    P(Xn = C) = |{x : Xn(x) = C}| / n, and P(Xn = M) = 1 − P(Xn = C).

    Set n = 0 for the current time; the initial probabilities then are

    p(0) = [0.6, 0.4] = [P(X0 = C), P(X0 = M)].


    Obviously we want to know the market share probabilities p(n) = [P(Xn = C), P(Xn = M)] at any year n > 0. We now introduce a transition probability matrix with rows and columns labeled C and M:

          C     M
    C  [ 0.88  0.12 ]     [ 1 - a    a  ]
    M  [ 0.15  0.85 ]  =  [  b    1 - b ],  with a = 0.12 and b = 0.15,
    (2.1.3)

    where a = pCM = P[Xn+1 = M | Xn = C] and b = pMC = P[Xn+1 = C | Xn = M].

    Higher-order transition probabilities.

    The aim: find the absolute probabilities at any stage n. We write

    p(n)ij = P(Xn+m = j | Xm = i), with p(1)ij = pij, (2.1.4)

    for the n-step transition probability, which by homogeneity is independent of m ∈ N; see Equation 2.1.1. The n-step transition matrix is denoted as P(n) = (p(n)ij). For the case n = 0, we have

    p(0)ij = δij = 1 if i = j, and 0 if i ≠ j.

    Chapman-Kolmogorov equations. The Chapman-Kolmogorov equations relate the n-step transition probabilities to the k-step and (n − k)-step transition probabilities:

    p(n)ij = Σ_{h=1}^s p(n−k)ih p(k)hj, 0 < k < n.

    This results in the matrix notation

    P(n) = P(n−k) P(k).


    Since P(1) =P, we get P(2) =P2, and in general P(n) =Pn.

    Let p

    (n)

    denote the vector form of probability mass distribution (pmf orabsolute probability distribution) associated withXnof a Markov process,

    that is

    p(n) = [p1(n), p2(n), p3(n), . . . , ps(n)],

    where each pi(n) is defined as in 2.0.1.

    Proposition 7. The absolute probability distribution p(n) at any stage n of a Markov chain is given in the matrix form

    p(n) = p(0) P^n, where p(0) = p is the initial (row) probability vector.    (2.1.5)

    Proof. We employ two facts:

    * P^(n) = P^n, and

    * the absolute probability distribution p(n+1) at any stage n + 1 (associated with Xn+1) can be found from the 1-step transition matrix P = [p_ij] and the distribution

    p(n) = [p1(n), p2(n), p3(n), . . . , ps(n)]

    at any stage n (associated with Xn):

    p_j(n + 1) = Σ_{i=1}^{s} p_i(n) p_ij, or in matrix notation p(n+1) = p(n) P.

    Then just do the induction p(n+1) = p(n) P = p(n−1) P^2 = · · · = p(0) P^{n+1}.

    Example 2.2 (The Coopmart chain, cont.). (a/) C's share of the market in 5 years can be computed by

    p(5) = [pC(5), pM(5)] = p(0) P^5.
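A quick numerical sketch of this computation, in pure Python (the helper names mat_mul and mat_pow are ours, not from the text):

```python
# Example 2.2: market shares after 5 years, p(5) = p(0) P^5,
# with the Coopmart values P and p(0) = [0.6, 0.4].

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def mat_pow(P, n):
    """n-th power of a square matrix by repeated multiplication."""
    R = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        R = mat_mul(R, P)
    return R

P = [[0.88, 0.12],
     [0.15, 0.85]]       # rows: from C, from M
p0 = [[0.60, 0.40]]      # initial shares [P(X0 = C), P(X0 = M)]

p5 = mat_mul(p0, mat_pow(P, 5))[0]
print([round(x, 4) for x in p5])
```

The result is approximately [0.5648, 0.4352]: Coopmart's share drifts down toward the stationary value computed later in this chapter.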


    Practical Problem 1. A state transition diagram of a finite-state Markov chain is a line diagram with a vertex corresponding to each state and a directed line between two vertices i and j if p_ij > 0. In such a diagram, if one can move from i to j by a path following the arrows, then i → j.

    The diagram is useful to determine whether a finite-state Markov chain is irreducible or not, or to check for periodicities.

    Draw the state transition diagrams and classify the states of the MCs with the following transition probability matrices:

    P1 = [ 0    0.5  0.5 ]    P2 = [ 0  0  0.5  0.5 ]    P3 = [ 0.3  0.4  0  0    0.3 ]
         [ 0.5  0    0.5 ]         [ 1  0  0    0   ]         [ 0    1    0  0    0   ]
         [ 0.5  0.5  0   ]         [ 0  1  0    0   ]         [ 0    0    0  0.6  0.4 ]
                                   [ 0  1  0    0   ]         [ 0    0    1  0    0   ]

    2.2 Classification of States

    A) Accessible states.

    State j is said to be accessible from state i if for some N ≥ 0, p_ij^(N) > 0, and we write i → j. Two states i and j accessible to each other are said to communicate, and we write i ↔ j. If all states communicate with each other, then we say that the Markov chain is irreducible. Formally, irreducibility means

    ∀ i, j ∈ Q : ∃ N ≥ 0 [ p_ij^(N) > 0 ].
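This formal condition is easy to test by graph reachability. Below is a sketch (the function names are ours) that checks irreducibility of the matrices P1 and P2 from Practical Problem 1:

```python
# Irreducibility check by reachability: j is accessible from i iff there is
# a directed path i -> j along edges with p_ij > 0 (N = 0 gives i -> i).

def accessible_from(P, i):
    """Set of states reachable from i in the transition diagram."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v, p in enumerate(P[u]):
            if p > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_irreducible(P):
    n = len(P)
    return all(accessible_from(P, i) == set(range(n)) for i in range(n))

P1 = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]
P2 = [[0, 0, 0.5, 0.5], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
print(is_irreducible(P1), is_irreducible(P2))  # True True
```

Both chains turn out to be irreducible; for P2 every cycle passes through state 0, yet all pairs still communicate.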

    B) Recurrent/persistence states and Transient states.


    Let A(i) be the set of states that are accessible from i. We say that i is recurrent if from any future state, there is always some probability of returning to i and, given enough time, this is certain to happen. By repeating this argument, if a recurrent state is visited once, it will be revisited an infinite number of times.

    A state is called transient if it is not recurrent. In particular, there are states j ∈ A(i) such that i is not accessible from j. After each visit to state i, there is positive probability that the chain enters such a j. Given enough time, this will happen, and state i cannot be visited after that. Thus, a transient state will only be visited a finite number of times.

    We now formalize the concepts of recurrent/persistent state and transient state.

    Let the first return time Tj indicate the first time, or the number of steps after which the chain is first at state j after leaving j at time 0 (if j is never reached then set Tj = ∞). It is a discrete r.v., taking values in {1, 2, 3, ...}. For any two states i ≠ j and n > 0, let f_ij^n be the conditional probability that the chain is first at state j after n steps, given it was at state i at time 0:

    f_ij^n := P[Tj = n | X0 = i] = P[Xn = j, Xk ≠ j, k = 1, 2, ..., n − 1 | X0 = i],

    and f_ij^0 = 0 since Tj ≥ 1. Then clearly

    f_ij^1 = P[X1 = j | X0 = i] = p_ij.


    State j is said to be transient (or nonrecurrent) if

    f_jj = P[Tj < ∞ | X0 = j] < 1,

    and recurrent (or persistent) if f_jj = 1.


    2.3 Markov Chain Decomposition

    Fact 2.1. In any Markov Chain, the following are correct.

    - It can be decomposed into one or more recurrent classes (or equivalence classes), plus possibly some transient states. Each equivalence class contains those states that communicate with each other.

    - A recurrent state is accessible from all states in its class, but is not accessible from recurrent states in other classes.

    - A transient state is not accessible from any recurrent state. But at least one, possibly more, recurrent states are accessible from a given transient state.

    For the purpose of understanding the long-term behavior of a Markov chain, it is important to analyze chains that consist of a single recurrent class. Such a Markov chain is called an irreducible chain.

    For the purpose of understanding short-term behavior, it is also important to analyze the mechanism by which any particular class of recurrent states is entered starting from a given transient state.

    C) Periodic states.

    In a finite Markov Chain M = (Q, , P) (i.e. having a finite number of states), a periodic state i is a state to which an agent can return only at positive integer time points t0, 2t0, 3t0, . . . (multiples of an integer period t0 > 1). t0 is named the period of i, being the greatest common divisor of the integers {t > 0 : p_ii^(t) > 0}.

  • 8/13/2019 Stochastic Processes Applications Lecturenotes

    39/102

    34 CHAPTER 2. MARKOV CHAINS & MODELING

    A Markov Chain is aperiodic if there is no such periodic state; in other words, if the period of each state i ∈ Q is 1.

    For example, we can check that a MC with the transition matrix

    P = [ 0    0    0.6  0.4 ]
        [ 0    0    0.3  0.7 ]
        [ 0.5  0.5  0    0   ]
        [ 0.2  0.8  0    0   ]

    is periodic. Indeed, if the Markovian random variable (agent) starts at time 0 in state E1, then at time 1 it must be in state E3 or E4, and at time 2 it must be in state E1 or E2. Therefore, it generally can visit E1 only at times 2, 4, 6, . . . Summarizing, we have

    Definition 9. A finite Markov chain M = (Q, , P) is

    1. irreducible iff it has only one single recurrent class, i.e. any state is accessible from all other states;

    2. aperiodic iff the period of each state i ∈ Q is 1, i.e. it has no periodic state;

    3. ergodic if it is positive recurrent and aperiodic.

    It can be shown that recurrence, transience, and periodicity are all class properties; that is, if state i is recurrent (positive recurrent, null recurrent, transient, periodic), then all other states in the same class as state i inherit the same property.
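The period of each state, gcd{t > 0 : p_ii^(t) > 0}, can be approximated numerically by scanning matrix powers up to a cutoff. A sketch (the cutoff T = 20 and the helper names are our choices) applied to the 4-state periodic example above:

```python
# Period of each state as gcd of observed return times t with p^(t)_ii > 0,
# scanning t = 1..T (T large enough for small examples).

from math import gcd
from functools import reduce

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def periods(P, T=20):
    n = len(P)
    Pt = P                                    # holds P^t
    return_times = [[] for _ in range(n)]
    for t in range(1, T + 1):
        for i in range(n):
            if Pt[i][i] > 0:
                return_times[i].append(t)
        Pt = mat_mul(Pt, P)
    # gcd over collected return times (0 if no return observed by T)
    return [reduce(gcd, ts, 0) for ts in return_times]

P = [[0, 0, 0.6, 0.4],
     [0, 0, 0.3, 0.7],
     [0.5, 0.5, 0, 0],
     [0.2, 0.8, 0, 0]]
print(periods(P))  # [2, 2, 2, 2]
```

Every state has period 2, confirming that the chain alternates between the groups {E1, E2} and {E3, E4}.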

    D) Absorbing states and Absorption probabilities.


    State j is said to be an absorbing state if p_jj = 1; that is, once state j is reached, it is never left.

    - If there is a unique absorbing state k, its steady-state probability is 1 (because all other states are transient and have zero steady-state probability), and it will be reached with probability 1, starting from any initial state.

    - If there are multiple absorbing states, the probability that one of them will eventually be reached is still 1, but the identity of the absorbing state to be entered is random, and the associated probabilities may depend on the starting state.

    Can we determine precisely the absorption probabilities for all the absorbing states of a MC in the generic case?

    Consider a Markov chain X(n) = {Xn, n ≥ 0} with finite state space E = {1, 2, · · · , N} and transition probability matrix P.

    Theorem 10. Let A = {1, · · · , m} be the set of absorbing states and B = {m + 1, · · · , N} be the set of nonabsorbing states. Then the transition probability matrix P can be expressed as

    P = [ I  O ]
        [ R  Q ]

    where I is an m × m identity matrix, O is an m × (N − m) zero matrix, the elements of R are the one-step transition probabilities from nonabsorbing to absorbing states, and the elements of Q are the one-step transition probabilities among the nonabsorbing states.


    Let U = [u_kj] be an (N − m) × m matrix whose elements are the absorption probabilities for the various absorbing states,

    u_kj = P[Xn = j (∈ A), eventually | X0 = k (∈ B)].

    We have

    U = (I − Q)^{−1} R = Φ R,

    where Φ = (I − Q)^{−1} is called the fundamental matrix of the Markov chain X(n).
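A sketch of this computation in pure Python, using the convergent series (I − Q)^{−1} = Σ_{k≥0} Q^k to avoid an explicit matrix inverse. The example chain is the symmetric Gambler's Ruin walk on {0, 1, 2, 3} with absorbing states A = {0, 3} and nonabsorbing states B = {1, 2} (cf. Example 2.6); the function names are ours:

```python
# Absorption probabilities U = (I - Q)^{-1} R, computed via the series
# U = (I + Q + Q^2 + ...) R, iterated until the added term is negligible.

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def absorption_probs(Q, R, tol=1e-12):
    """Sum the series R + QR + Q^2 R + ... to numerical convergence."""
    U, term = R, R
    while max(abs(x) for row in term for x in row) > tol:
        term = mat_mul(Q, term)   # next term Q^k R
        U = mat_add(U, term)
    return U

Q = [[0.0, 0.5], [0.5, 0.0]]   # transitions within B = {1, 2}
R = [[0.5, 0.0], [0.0, 0.5]]   # transitions from B into A = {0, 3}
U = absorption_probs(Q, R)
print([[round(x, 6) for x in row] for row in U])
```

From state 1 (one dollar) the walk is absorbed at 0 with probability 2/3 and at 3 with probability 1/3, matching the classical k/N formula for the symmetric ruin problem.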

    2.4 Limiting probabilities & Stationary distributions

    From now on we assume that all MCs are finite, aperiodic and irreducible. The irreducibility assumption implies that any state can eventually be reached from any other state. Both the irreducibility and aperiodicity assumptions hold for essentially all practical applications of MCs (in bioinformatics, ...) except for the case of MCs with absorbing states.

    Definition 11. The vector p* = (p*_1, p*_2, · · · , p*_s) is called the stationary distribution of a Markov chain {Xn, n ≥ 0} with state transition matrix P if:

    p* P = p*.

    This equation indicates that a stationary distribution p* is a left eigenvector of P with eigenvalue 1. In general, we wish to know the limiting probabilities p(∞) obtained by taking n → ∞ in the equation

    p(∞) = lim_{n→∞} p(0) P^n.
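A stationary distribution can be approximated by simply iterating p(n+1) = p(n) P until it stops changing (power iteration). A sketch for the two-state Coopmart chain, where the exact answer is p* = (b, a)/(a + b) = (0.15, 0.12)/0.27:

```python
# Power iteration toward the stationary distribution p* P = p*.

def step(p, P):
    """One transition of the row vector p: p -> p P."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(p))]

P = [[0.88, 0.12], [0.15, 0.85]]
p = [0.6, 0.4]                     # any starting distribution works
for _ in range(500):
    nxt = step(p, P)
    if max(abs(a - b) for a, b in zip(nxt, p)) < 1e-14:
        break
    p = nxt

print([round(x, 6) for x in p])    # close to [0.555556, 0.444444]
```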


    We need some general results to determine the stationary distribution p* and limiting probabilities p(∞) of a Markov chain. For the following specific class of MCs, a stationary distribution exists.

    Lemma 12. If M = (Q, , P) is a finite, aperiodic and irreducible Markov chain, then some power of P is strictly positive.

    See a proof in [7], page 79. Such matrices P (for which there exists a natural m such that P^m > 0) are called regular matrices.

    Theorem 13. [Equilibrium distribution] Given a finite, aperiodic and irreducible Markov chain M = (Q, , P), where Q consists of s states. Then there exist stationary probabilities

    p*_i := lim_{t→∞} p_i(t),

    where the p*_i form a unique solution to the conditions:

    Σ_{i=1}^{s} p*_i = 1, where each p*_i ≥ 0;

    p*_j = Σ_{i=1}^{s} p*_i p_ij.

    See the proof in Theorem 19. We discuss here two particular cases, when s = 2 and s > 2.

    A) Markov chains that have two states.

    At first we investigate the case of Markov chains that have two states, say Q = {e1, e2}. Let a = p_{e1 e2} and b = p_{e2 e1} be the state transition probabilities between the distinct states of a two-state Markov chain; its state transition matrix is

    P = [ p11  p12 ]  =  [ 1 − a    a   ],   where 0 < a, b < 1.
        [ p21  p22 ]     [   b    1 − b ]


    Proposition 14.

    a) The n-step transition probability matrix is given by

    P^(n) = P^n = 1/(a + b) [ b  a ]  +  (1 − a − b)^n / (a + b) [  a  −a ]
                            [ b  a ]                             [ −b   b ]

    b) Find the limit matrix when n → ∞.

    To prove this basic Proposition 14 (computing the transition probability matrix of two-state Markov chains), we use a fundamental result of Linear Algebra that is recalled in Section 2.6.

    Proof. The eigenvalues of the state transition matrix P, found by solving the equation

    c(λ) = |λI − P| = 0,

    are λ1 = 1 and λ2 = 1 − a − b. The spectral decomposition of a square matrix says P can be decomposed into two constituent matrices E1, E2 (since only two eigenvalues were found):

    E1 = 1/(λ1 − λ2) [P − λ2 I],    E2 = 1/(λ2 − λ1) [P − λ1 I].

    These E1, E2 are mutually orthogonal idempotents, i.e. E1 E2 = 0 = E2 E1, and

    P = λ1 E1 + λ2 E2;    E1^2 = E1, E2^2 = E2.

    Hence,

    P^n = λ1^n E1 + λ2^n E2 = E1 + (1 − a − b)^n E2,

    or


    P^(n) = P^n = 1/(a + b) [ b  a ]  +  (1 − a − b)^n / (a + b) [  a  −a ]
                            [ b  a ]                             [ −b   b ]

    b) The limit matrix when n → ∞:

    lim_{n→∞} P^n = 1/(a + b) [ b  a ]
                              [ b  a ]
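The closed form of Proposition 14 can be checked against direct matrix multiplication. A sketch with a = 0.12, b = 0.15 and n = 7 (values and variable names are our illustration):

```python
# Compare P^n by repeated multiplication with the closed form
#   P^n = (1/(a+b)) [[b, a], [b, a]] + ((1-a-b)^n/(a+b)) [[a, -a], [-b, b]].

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

a, b, n = 0.12, 0.15, 7
P = [[1 - a, a], [b, 1 - b]]

direct = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(n):
    direct = mat_mul(direct, P)

lam = (1 - a - b) ** n
closed = [[(b + lam * a) / (a + b), (a - lam * a) / (a + b)],
          [(b - lam * b) / (a + b), (a + lam * b) / (a + b)]]

ok = all(abs(direct[i][j] - closed[i][j]) < 1e-12
         for i in range(2) for j in range(2))
print(ok)  # True
```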

    B) Markov chains that have more than two states.

    For s > 2 it is cumbersome to compute the constituent matrices Ei of P, so we could employ the so-called regular property.

    Definition 15. A Markov chain is regular if there exists m ∈ N such that

    P^(m) = P^m > 0

    (i.e. every matrix entry is positive).

    In summary, in a DTMC M that has more than two states, we have 4 cases:

    Fact 2.2.

    1. M is irreducible and positive recurrent, but has periodic states. The component p*_i of the stationary distribution vector must then be understood as the long-run proportion of time that the process is in state i.


    2. M has several closed, positive recurrent classes. In this case, the transition matrix of the DTMC takes a block form. In contrast to the irreducible ergodic DTMC, where the limiting distribution is independent of the initial state, a DTMC with several closed, positive recurrent classes has a limiting distribution that depends on the initial state.

    3. M has both recurrent and transient classes. In this situation, we often seek the probabilities that the chain is eventually absorbed by the different recurrent classes. See the well-known Gambler's Ruin problem.

    4. M is an irreducible DTMC with null recurrent or transient states. This case is only possible when the state space is infinite, since any finite-state, irreducible DTMC must be positive recurrent. In this case, neither the limiting distribution nor the stationary distribution exists. A well-known example of this case is the random walk model.

    Practical Problem 3. Consider a Markov chain with state space {0, 1, 2} and transition probability matrix

    P = [ 0  0.5  0.5 ]
        [ 1  0    0   ]
        [ 1  0    0   ]

    Show that state 0 is periodic with period 2.


    Practical Problem 4 (The Gambler's Ruin problem). Let two gamblers, A and B, initially have k dollars and m dollars, respectively. Suppose that at each round of their game, A wins one dollar from B with probability p and loses one dollar to B with probability q = 1 − p. Assume that A and B play until one of them has no money left. Let Xn be A's capital after round n, where n = 0, 1, 2, · · · and X0 = k.

    (a) Show that X(n) = {Xn, n ≥ 0} is a Markov chain with absorbing states.

    (b) Find its transition probability matrix P. Realize P when p = q = 1/2 and N = 4.

    (c*) What is the probability of A losing all his money?
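For part (c*), a sketch that solves the first-step equations u_k = q u_{k−1} + p u_{k+1}, u_0 = 1, u_N = 0, by fixed-point iteration rather than in closed form (function name and iteration count are our choices):

```python
# Ruin probability u_k = P[A loses all money | X_0 = k] by iterating the
# first-step (harmonic) equations until they stabilize.

def ruin_probabilities(p, N, iters=20000):
    q = 1.0 - p
    u = [0.0] * (N + 1)
    u[0] = 1.0                     # state 0: A is already ruined
    for _ in range(iters):
        for k in range(1, N):
            u[k] = q * u[k - 1] + p * u[k + 1]
    return u

# Symmetric game, N = 4: the known answer is u_k = 1 - k/N.
u = ruin_probabilities(0.5, 4)
print(round(u[2], 6))  # 0.5
```

With p = q = 1/2 and k = 2 out of N = 4 dollars in play, A is ruined with probability 1 − k/N = 1/2, as the iteration confirms.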

    2.5 Theory of stochastic matrix for MC

    A stochastic matrix is a matrix for which each row sum equals one. If the column sums also equal one, the matrix is called doubly stochastic. Hence the transition probability matrix P = [p_ij] is a stochastic matrix.

    Proposition 16. Every stochastic matrix K has

    - 1 as an eigenvalue (possibly with multiplicity), and

    - no eigenvalue exceeding 1 in absolute value; that is, all eigenvalues λ_i satisfy |λ_i| ≤ 1.

    Proof. Let e = [1, 1, · · · , 1]^t. Since each row of K sums to one, Ke = e, so 1 is an eigenvalue. For any eigenpair Kx = λx, pick an index i with |x_i| maximal; then |λ| |x_i| = |Σ_j k_ij x_j| ≤ Σ_j k_ij |x_i| = |x_i|, hence |λ| ≤ 1.


    The spectral radius ρ(K) of any square matrix K is defined as

    ρ(K) = max_i {|λ_i| : λ_i an eigenvalue of K}.

    When K is stochastic, ρ(K) = 1. Note that if P is a transition matrix for a finite-state Markov chain (so P is stochastic), the multiplicity of the eigenvalue ρ(P) = 1 is equal to the number of recurrent classes associated with P.

    Fact 2.3. If K is a stochastic matrix then K^m is a stochastic matrix.

    Proof. Let e = [1, 1, · · · , 1]^t be the all-one vector, then use the fact that Ke = e. Prove that K^m e = e.

    Let A = [a_ij] > 0 denote that every element a_ij of A satisfies the condition a_ij > 0.

    Definition 17.

    - A stochastic matrix P = [p_ij] is ergodic if lim_{m→∞} P^m = L (say) exists, that is, each p_ij^(m) has a limit when m → ∞.

    - A stochastic matrix P is regular if there exists a natural m such that P^m > 0. In our context, a Markov chain with transition probability matrix P is called regular if there exists an m > 0 such that P^m > 0, i.e. there is a finite positive integer m such that after m time-steps, every state has a nonzero chance of being occupied, no matter what the initial state.

    Example 2.3. Is the matrix

    P = [ 0.88  0.12 ]
        [ 0.15  0.85 ]

    regular? ergodic? Calculate the limit matrix L = lim_{m→∞} P^m.


    (p is called a stationary distribution of the MC). Your final task is proving that L's rows are identical and equal to the stationary distribution p, i.e.: L = [p; · · · ; p].

    Corollary 20. A few important remarks are: (a) for a regular MC, the long-term behavior does not depend on the initial state distribution probabilities p(0); (b) in general, the limiting distributions are influenced by the initial distribution p(0) whenever the stochastic matrix P = [p_ij] is ergodic but not regular. (See more at problem D.)

    Example 2.4. Consider a Markov chain with two states and transition probability matrix

    P = [ 3/4  1/4 ]
        [ 1/2  1/2 ]

    (a) Find the stationary distribution p of the chain. (b) Find P^n. (c) Find lim_{n→∞} P^n.


    2.6 Spectral Theorem for Diagonalizable Matrices

    Consider a square matrix P of order s with spectrum σ(P) = {λ1, λ2, · · · , λk} consisting of its eigenvalues. Then:

    - If {(λ1, x1), (λ2, x2), · · · , (λk, xk)} are eigenpairs for P, then S = {x1, · · · , xk} is a linearly independent set. If Bi is a basis for the null space N(P − λi I), then B = B1 ∪ B2 ∪ · · · ∪ Bk is a linearly independent set.

    - P is diagonalizable if and only if P possesses a complete set of eigenvectors (i.e. a set of s linearly independent eigenvectors). Moreover, H^{−1} P H = D = diag(λ1, λ2, · · · , λs) if and only if the columns of H constitute a complete set of eigenvectors and the λj are the associated eigenvalues, i.e., each (λj, H[:, j]) is an eigenpair for P.

    Spectral Theorem for Diagonalizable Matrices. A square matrix P of order s with spectrum σ(P) = {λ1, λ2, · · · , λk} consisting of eigenvalues is diagonalizable if and only if there exist constituent matrices {E1, E2, · · · , Ek} (called the spectral set) such that

    P = λ1 E1 + λ2 E2 + · · · + λk Ek,    (2.6.1)

    where the Ei have the following properties:

    - Ei Ej = 0 whenever i ≠ j, and Ei^2 = Ei for all i = 1..k;

    - E1 + E2 + · · · + Ek = I.


    In practice we employ the decomposition 2.6.1 in two ways:

    Way 1: if we know the decomposition 2.6.1 explicitly, then we can compute powers

    P^m = λ1^m E1 + λ2^m E2 + · · · + λk^m Ek, for any integer m > 0.    (2.6.2)

    Way 2: if we know P is diagonalizable, then we find the constituent matrices Ei by:

    * finding the nonsingular matrix H = (x1 | x2 | · · · | xk), where each xi is a basis (right) eigenvector of the null subspace

    N(P − λi I) = {v : (P − λi I) v = 0 ⟺ P v = λi v};

    ** then, P = H D H^{−1} = (x1 | x2 | · · · | xk) D H^{−1}, where D = diag(λ1, · · · , λk) is the diagonal matrix, and H^{−1} has rows

    H^{−1} = [ y1^t ]
             [ y2^t ]
             [ ...  ]
             [ yk^t ]    (i.e. K = (y1 | y2 | · · · | yk), H^{−1} = K^t).

    Here each yi is a basis (left) eigenvector, i.e. a vector of the null subspace

    N(P^t − λi I) = {v : v^t P = λi v^t}.

    The constituent matrices are Ei = xi yi^t.

    Example 2.5. Diagonalize the following matrix and provide its spectral decomposition.

    P = [  1   −4  −4 ]
        [  8  −11  −8 ]
        [ −8    8   5 ]

    The characteristic equation is

    p(λ) = det(P − λI) = −(λ^3 + 5λ^2 + 3λ − 9) = 0.

    So λ = 1 is a simple eigenvalue, and λ = −3 is repeated twice (its algebraic multiplicity is 2). Any set of vectors x satisfying

    x ∈ N(P − λI) ⟺ (P − λI) x = 0

    can be taken as a basis of the eigenspace (or null space) N(P − λI). Bases for the eigenspaces are:

    N(P − 1I) = span{ [1, 2, −2]^t };  N(P + 3I) = span{ [1, 1, 0]^t, [1, 0, 1]^t }.

    It is easy to check that these three eigenvectors xi form a linearly independent set, so P is diagonalizable. The nonsingular matrix (also called the similarity transformation matrix)

    H = (x1 | x2 | x3) = [  1  1  1 ]
                         [  2  1  0 ]
                         [ −2  0  1 ]

    will diagonalize P, and since P = H D H^{−1} we have

    H^{−1} P H = D = diag(λ1, λ2, λ2) = diag(1, −3, −3) = [ 1   0   0 ]
                                                          [ 0  −3   0 ]
                                                          [ 0   0  −3 ]

    Here,

    H^{−1} = [  1  −1  −1 ]
             [ −2   3   2 ]
             [  2  −2  −1 ]

    implies that


    y1^t = [1, −1, −1], y2^t = [−2, 3, 2], y3^t = [2, −2, −1]. Therefore, the constituent matrices are

    E1 = x1 y1^t = [  1  −1  −1 ]   E2 = x2 y2^t = [ −2  3  2 ]   E3 = x3 y3^t = [ 2  −2  −1 ]
                   [  2  −2  −2 ]                  [ −2  3  2 ]                  [ 0   0   0 ]
                   [ −2   2   2 ]                  [  0  0  0 ]                  [ 2  −2  −1 ]

    Obviously,

    P = λ1 E1 + λ2 E2 + λ3 E3 = E1 − 3 E2 − 3 E3 = [  1   −4  −4 ]
                                                   [  8  −11  −8 ]
                                                   [ −8    8   5 ]
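The identities claimed for the spectral set can be verified directly. A sketch (helper names are ours) checking idempotence, mutual annihilation, resolution of the identity, and the decomposition itself for Example 2.5:

```python
# Verify the constituent matrices of Example 2.5:
# E_i^2 = E_i, E_i E_j = 0 (i != j), E1 + E2 + E3 = I, E1 - 3 E2 - 3 E3 = P.

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

P  = [[1, -4, -4], [8, -11, -8], [-8, 8, 5]]
E1 = [[1, -1, -1], [2, -2, -2], [-2, 2, 2]]
E2 = [[-2, 3, 2], [-2, 3, 2], [0, 0, 0]]
E3 = [[2, -2, -1], [0, 0, 0], [2, -2, -1]]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
S  = [[E1[i][j] + E2[i][j] + E3[i][j] for j in range(3)] for i in range(3)]
Pr = [[E1[i][j] - 3*E2[i][j] - 3*E3[i][j] for j in range(3)] for i in range(3)]

print(close(mat_mul(E1, E1), E1),         # idempotent
      close(mat_mul(E1, E2), [[0]*3]*3),  # mutually annihilating
      close(S, I3),                       # sum to the identity
      close(Pr, P))                       # spectral decomposition
```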

    2.7 Markov Chains with Absorbing States

    2.7.1 Theory

    Two questions:

    1/ if there are at least two absorbing states, what is the probability that a specific absorbing state is the one eventually entered?

    2/ what is the mean time until an absorbing state is eventually entered?

    Question 1. The probability that a specific absorbing state is the one eventually entered.

    Theorem 21. Consider a Markov chain X(n) = {Xn, n ≥ 0} with finite state space E = {1, 2, · · · , N} and transition probability matrix P. Let A = {1, · · · , m} be the set of absorbing states and B = {m + 1, · · · , N} be the set of nonabsorbing states.


    we could equivalently check that absorption of X(n) in one or another of the absorbing states is certain. Formally, you could prove

    Lemma 22.

    lim_{n→∞} P[Xn ∈ B] = 0, or equivalently lim_{n→∞} P[Xn ∈ A] = 1.

    Question 2. The mean time until an absorbing state is eventually entered.

    Let Tk denote the total time units (or steps) to absorption from state k (meaning X0 = k), where k = m + 1..N. Let

    T = [T_{m+1}, T_{m+2}, · · · , T_N].

    Then it can be shown that the mean time E(Tk) to absorption from state k is

    E(Tk) = Σ_{i=m+1}^{N} Φ[k, i],

    where Φ[k, i] is the (k, i)th element of the fundamental matrix Φ.

    Proof. Let W = [n_jk], where n_jk is the number of times state k (∈ B) is occupied until absorption takes place when Xn starts in state j (∈ B). Then

    Tj = Σ_{k=m+1}^{N} n_jk;

    then calculate E(n_jk).
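A sketch of the mean-time formula, computing the fundamental matrix Φ = (I − Q)^{−1} as the series Σ_{k≥0} Q^k and summing its rows. The chain is again the symmetric Gambler's Ruin on {0, 1, 2, 3} with B = {1, 2}, where the known answer is E(Tk) = k(N − k) = 2 from either interior state:

```python
# Mean time to absorption E(T_k) = sum_i Phi[k, i], with
# Phi = I + Q + Q^2 + ... summed to numerical convergence.

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

def fundamental(Q, tol=1e-13):
    n = len(Q)
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # Q^0 = I
    Phi = [row[:] for row in term]
    while max(abs(x) for row in term for x in row) > tol:
        term = mat_mul(term, Q)
        Phi = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(Phi, term)]
    return Phi

Q = [[0.0, 0.5], [0.5, 0.0]]     # transitions within B = {1, 2}
Phi = fundamental(Q)
mean_times = [sum(row) for row in Phi]
print([round(t, 6) for t in mean_times])  # [2.0, 2.0]
```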


    Example 2.6. Consider a simple random walk X(n) with absorbing barriers at state 0 and state N = 3 = mA + mB as in the Gambler's Ruin problem, where mA = 2 USD is A's capital and mB = 1 USD is B's capital at round 0. Can you write out

    a/ the transition probability matrix P, knowing that p = P[A wins] in each round, where 0 < p < 1?


    2.8 Chapter Review and Discussion

    Application in Large Deviation theory. We are interested in a practical situation in the insurance industry, originally recognized in 1932 by F. Esscher (Notices of the AMS, Feb 2008).

    Problem: too many claims could be made against the insurance company; we worry about the total claim amount exceeding the reserve fund set aside for paying these claims.

    Our aim: to compute the probability of this event.

    Modeling. Each individual claim is a random variable, and we assume some distribution for it; the total claim is then the sum S of a large number of (independent or not) random variables. The probability that this sum exceeds a certain reserve amount is the tail probability of the sum S of random variables.

    Large Deviation theory, invented by Esscher, requires the calculation of moment generating functions. If your random variables are independent, then the moment generating function is the product of the individual ones; but if they are not (as in a Markov chain), then there is no longer just one moment generating function!

    Research project: study Large Deviation theory to solve this problem.

    Practical Problem 5 (Brand switching model for consumer behavior). Suppose there are several brands of a product competing in a market (for example, those brands might be competing brands of soft drinks). Assume that every week a consumer buys one of three brands, labeled as 1, 2, and 3. In each week, a consumer may either buy the same brand he bought the previous week or switch to a different brand. A consumer's preference can be influenced by many factors, such as brand loyalty and brand pressure (i.e., a consumer is persuaded to purchase the same brand). To gauge consumer behavior, sample surveys are frequently conducted. Suppose that one such survey identifies the following consumer behavior:

                              Following week
    Current week     Brand 1     Brand 2     Brand 3
    Brand 1          0.51        0.35        0.14
    Brand 2          0.12        0.80        0.08
    Brand 3          0.03        0.05        0.92

    The market share of a brand during a period is defined as the average proportion of people who buy the brand during the period. Our questions are:

    a/ What is the market share of a specific brand in a short run (say in 3 months) or in a long run (say in 3 years)?

    b/ How does repeat business, due to brand loyalty and brand pressure, affect a company's market share and profitability?

    c/ What is the expected number of weeks that a consumer stays with a particular brand?
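A sketch for parts a/ and c/, treating the survey table as a transition matrix: the long-run shares are the stationary distribution (found here by power iteration), and the expected number of consecutive weeks with brand i follows a geometric law with mean 1/(1 − p_ii). The iteration count is our choice:

```python
# Brand switching: long-run market shares and expected holding times.

P = [[0.51, 0.35, 0.14],
     [0.12, 0.80, 0.08],
     [0.03, 0.05, 0.92]]

pi = [1/3, 1/3, 1/3]            # any starting distribution works
for _ in range(2000):           # power iteration pi <- pi P
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

# Expected consecutive weeks with brand i: geometric with mean 1/(1 - p_ii).
holding = [1 / (1 - P[i][i]) for i in range(3)]
print([round(x, 4) for x in pi], [round(h, 2) for h in holding])
```

In the long run brand 3 dominates with share 56/103 ≈ 0.544 despite starting from any initial split, and a consumer stays with brand 3 for 12.5 weeks on average versus about 2 weeks for brand 1.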


    Chapter 3

    Random walks & Wiener process

    Random walks are special cases of Markov chains, and thus can be studied by Markov chain methods.

    3.1 Introduction to Random Walks

    We use random walks to supply the mathematical base for BLAST. BLAST is a procedure often employed in Biomatics that

    - searches for high-scoring local alignments between two sequences, then

    - tests for significance of the scores found via P-values.

    Example 3.1. Consider a simple case of the two aligned DNA sequences

    ggagactgtagacagctaatgctata
    gaacgccctagccacgagcccttatc


    Suppose we give

    - a score +1 if the two nucleotides in corresponding positions are the same, and

    - a score −1 if they are different.

    When we compare the two sequences from left to right, the accumulated score performs a random walk, or more precisely a simple random walk in one dimension. The following theory addresses the generic case, but we will use this example and BLAST as a running example.

    3.2 Random Walk - a mathematical realization

    Let Z1, Z2, · · · be independent identically distributed r.v.s with

    P(Zn = 1) = p and P(Zn = −1) = q = 1 − p

    for all n. Let

    Xn = Σ_{i=1}^{n} Zi, n = 1, 2, · · · , and X0 = 0.

    The collection of r.v.s {Xn, n ≥ 0} is a random process, and it is called the simple random walk in one dimension.

    (a) Describe the simple random walk X(n).

    (b) Construct a typical sample sequence (or realization) of X(n).

    (c) Find the probability that X(n) = −2 after four steps.


    (d) Verify the result of part (c) by enumerating all possible sample sequences that lead to the value X(n) = −2 after four steps.

    (e) Find the mean and variance of the simple random walk X(n). Find the autocorrelation function RX(n, m) of the simple random walk X(n).

    (f) Show that the simple random walk X(n) is a Markov chain.

    (g) Find its one-step transition probabilities.

    (h) Derive the first-order probability distribution of the random walk X(n).

    Solution.

    (a) Describe the simple random walk. X(n) is a discrete-parameter (or discrete-time), discrete-state random process. The state space is E = {. . . , −2, −1, 0, 1, 2, . . .}, and the index parameter set is T = {0, 1, 2, . . .}.

    (b) Typical sample sequence. A sample sequence x(n) of a simple random walk X(n) can be produced by tossing a coin every second and letting x(n) increase by unity if a head H appears and decrease by unity if a tail T appears. Thus, for instance, we have a small realization of X(n) in Table 3.1. The sample sequence x(n) obtained this way can be plotted in the (n, x(n))-plane. The simple random walk X(n) specified in this problem is said to be unrestricted because there are no bounds on the possible values of Xn. The simple random walk process is often used in Game Theory or Biomatics.


    n             0  1  2   3  4  5  6  7  8  9  10
    Coin tossing     H  T   T  H  H  H  T  H  H  T
    xn            0  1  0  −1  0  1  2  1  2  3  2

    Table 3.1: Simple random walk from Coin tossing

    Remark 3.1. We define the ladder points to be the points in the walk lower than any previously reached point. An excursion in a walk is the part of the walk from a ladder point to the highest point attained before the next ladder point.

    BLAST theory focuses on the maximum heights achieved by these excursions.

    (c) The probability that X(n) = −2 after four steps.

    We compute the first-order probability distribution of the random walk X(n):

    pn(k) = P(Xn = k), with boundary conditions p0(0) = 1, and pn(k) = 0 if n < |k|.


    When X(n) = k, the number A of +1 steps among the first n satisfies

    A = (n + k)/2,

    and A is a binomial r.v. with parameters (n, p), so that

    pn(k) = P(Xn = k) = C(n, (n + k)/2) p^{(n+k)/2} q^{(n−k)/2}.    (3.2.1)

    Conclude: the probability distribution of X(n) is given by 3.2.1, in which n ≥ |k|, and n, k must be both even or both odd.

    Set k = −2 and n = 4 in 3.2.1 to get the concerned probability p4(−2) that X(4) = −2.
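A sketch checking parts (c) and (d) at once: the binomial formula against brute-force enumeration of all 2^4 step sequences, here with p = q = 1/2:

```python
# P(X_4 = -2) two ways: via p_n(k) = C(n, (n+k)/2) p^((n+k)/2) q^((n-k)/2),
# and by enumerating every sequence of four +/-1 steps.

from itertools import product
from math import comb

def p_n_k(n, k, p):
    """First-order distribution of the simple random walk."""
    if (n + k) % 2 or abs(k) > n:
        return 0.0
    a = (n + k) // 2                       # number of +1 steps
    return comb(n, a) * p**a * (1 - p)**(n - a)

p = 0.5
formula = p_n_k(4, -2, p)
brute = sum(p**4 for steps in product([1, -1], repeat=4) if sum(steps) == -2)
print(formula, brute)  # 0.25 0.25
```

Both give C(4, 1)(1/2)^4 = 4/16 = 0.25, coming from the four sequences with exactly one +1 step.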

    (d) Verify the result of part (c) by enumerating all possible sample sequences that lead to the value X(n) = −2 after four steps. DIY!

    (e) The mean and variance of the simple random walk X(n). Use the fact

    P(Zn = +1) = p and P(Zn = −1) = 1 − p.


    3.3 Wiener process

    Counting process. A random process {X(t), t ≥ 0} is said to be a

    counting process if X(t) represents the total number of events that

    have occurred in the interval (0, t). From its definition, we see that for

    a counting process, X(t) must satisfy the following conditions:

    - X(t) ≥ 0 and X(0) = 0.

    - X(t) is integer valued.

    - X(s) ≤ X(t) if s < t.

    - X(t) - X(s) equals the number of events that have occurred in the

      interval (s, t).

    Independent increments and stationary increments. A counting

    process X(t) is said to possess independent increments if the numbers of

    events which occur in disjoint time intervals are independent.

    A counting process X(t) is said to possess stationary increments if X(t +

    h) - X(s + h) (the number of events in the interval (s + h, t + h)) has the

    same distribution as X(t) - X(s) (the number of events in the interval

    (s, t)), for all s < t and h > 0.

    Wiener process. A random process {X(t), t ≥ 0} is called a Wiener

    process if

    1. X(t) has stationary independent increments,

    2. the increment X(t) - X(s) (t > s) is normally distributed,


    3. E[X(t)] = 0, and

    4. X(0) = 0.

    The Wiener process is also known as the Brownian motion process, since

    it originated as a model for Brownian motion, the motion of particles

    suspended in a fluid.

    Definition 23. A random process {X(t), t ≥ 0} is called a Wiener

    process with drift coefficient μ if

    1. X(t) has stationary independent increments,

    2. X(t) is normally distributed with mean E[X(t)] = μt, and

    3. X(0) = 0.
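Definition 23 suggests a straightforward discrete-time simulation. The sketch below is an illustration, not part of the notes, and it assumes the usual convention Var[X(t)] = σ²t (which the notes do not state explicitly): a path is built by summing independent stationary normal increments.

```python
# Simulate a discretized Wiener process with drift mu on [0, T]:
# each increment over a step of length dt is N(mu*dt, sigma^2*dt).
import random

def wiener_path(T=1.0, n=1000, mu=0.0, sigma=1.0, seed=1):
    rng = random.Random(seed)
    dt = T / n
    x, path = 0.0, [0.0]                         # X(0) = 0
    for _ in range(n):
        x += mu * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

path = wiener_path(mu=2.0)
print(path[0], len(path))    # starts at 0, with n + 1 grid points
```

Averaging the endpoint X(T) over many independent paths should give approximately μT, matching E[X(t)] = μt in the definition.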


    Chapter 4

    Arrival-Type processes

    4.1 Introduction

    In Stochastic processes, we are interested in a few distinct properties:

    (a) the dependencies in the sequence of values generated by the

    process. For example, how do future prices of a stock depend on

    past values?

    (b) long-term averages, involving the entire sequence of generated

    values. For example, what is the fraction of time that a machine

    is idle?

    (c) the likelihood or frequency of certain boundary events. For

    example, what is the probability that within a given hour all cir-

    cuits of some telephone system become simultaneously busy?


    In this chapter, we will discuss the first major category of stochastic

    processes, Arrival-Type Processes. We are interested in occurrences that

    have the character of an arrival, such as

    - message receptions at a receiver,

    - job completions in a manufacturing cell,

    - customer purchases at a store, etc.

    We will focus on models in which the interarrival times (the times be-

    tween successive arrivals) are independent random variables.

    First, we consider the case where arrivals occur in discrete time

    and the interarrival times are geometrically distributed: this is the

    Bernoulli process.

    Then we consider the case where arrivals occur in continuous time

    and the interarrival times are exponentially distributed: this is the

    Poisson process.

    4.2 The Bernoulli process

    4.2.1 Basic facts

    The Bernoulli process can be visualized as a sequence of independent

    coin tosses, where the probability of heads in each toss is a fixed number

    p in the range 0 < p < 1. In general, the Bernoulli process consists of

    a sequence of Bernoulli trials, where each trial produces

    - a 1 (a success) with probability p, and

    - a 0 (a failure) with probability 1 - p, independently of what happens

      in other trials.


    There are many realizations of the Bernoulli process. Coin tossing is just a

    paradigm involving a sequence of independent binary outcomes. The se-

    quence Z1, Z2, . . . of independent, identically distributed r.v.'s in Section

    3 is another paradigm for the same phenomenon.

    In practice, a Bernoulli process is often used to model systems involving

    arrivals of customers or jobs at service centers. Here, time is discretized

    into periods, and a success at the k-th trial is associated with the arrival

    of at least one customer at the service center during the k-th period. In

    fact, we will often use the term arrival in place of success when this is

    justified by the context.

    Given an arrival process, one is often interested in random variables such

    as the number of arrivals within a certain time period, or the time until

    the first arrival. For the case of a Bernoulli process, some answers are

    already available from earlier chapters. Here is a summary of the main

    facts.

    Bernoulli Distribution B(p) describes a random variable that can take

    only two possible values, X ∈ {0, 1}. The distribution is described

    by the probability function

    p(1) = P(X = 1) = p,   p(0) = P(X = 0) = 1 - p,   for some p ∈ [0, 1].

    It is easy to check that E(X) = p, Var(X) = p(1 - p).


    4.2.2 Random Variables Associated with the

    Bernoulli Process

    Binomial distribution B(n, p). This distribution describes a random

    variable X that is the number of successes in n independent Bernoulli trials

    with probability of success p.

    In other words, X is a sum of n independent Bernoulli r.v.'s. Therefore,

    X takes values in {0, 1, . . . , n} and the distribution is given by the

    probability function

    p(k) = P(X = k) = C(n, k) p^k (1 - p)^(n-k),   k = 0, 1, . . . , n.

    It is easy to check that E(X) = np, Var(X) = np(1 - p).
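These facts are easy to verify numerically; the following sketch (illustrative, with one arbitrarily chosen pair (n, p)) checks the pmf's normalization, mean, and variance:

```python
# The number of arrivals in n slots of a Bernoulli process is B(n, p);
# check that the pmf sums to 1 with mean np and variance np(1 - p).
from math import comb

def binom_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

n, p = 12, 0.35
pmf = binom_pmf(n, p)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum(k * k * q for k, q in enumerate(pmf)) - mean**2
print(round(sum(pmf), 10), round(mean, 6), round(var, 6))   # 1.0 4.2 2.73
```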

    4.3 The Poisson process

    4.3.1 Poisson distribution

    This is another discrete probability distribution, used to determine the

    probability of a designated number of successes per unit of time when the

    successes/events are independent and the average number of successes

    per unit of time remains constant. The Poisson distribution is

    p(x) = e^(-λ) λ^x / x!,   x = 0, 1, 2, . . .   (4.3.1)

    where

    x = designated number of successes, e ≈ 2.718 is the natural base,


    λ > 0 is a constant: the average number of successes per unit of time

    period. The Poisson distribution's mean and variance are

    μ = λ;   σ² = λ.

    Example 4.1 (Poisson distribution usage). We often model the number

    of defects or non-conformities that occur in a unit of product (unit area,

    volume, and most frequently unit of time...), say, a semiconductor device,

    by a Poisson distribution. The number of wire-bonding defects per unit

    X is Poisson distributed with parameter λ = 4. Compute the probability

    that a randomly selected semiconductor device will contain two or fewer

    wire-bonding defects.

    This probability is

    P(X ≤ 2) = p(0) + p(1) + p(2) = Σ (x = 0 to 2) e^(-4) 4^x / x! = 0.2381.
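The figure 0.2381 can be reproduced directly:

```python
# Reproducing Example 4.1: P(X <= 2) for X ~ Poisson(lambda = 4).
from math import exp, factorial

lam = 4.0
prob = sum(exp(-lam) * lam**x / factorial(x) for x in range(3))
print(round(prob, 4))   # 0.2381
```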

    4.3.2 Poisson process

    The Poisson process can be viewed as a continuous-time analog of the

    Bernoulli process and applies to situations where there is no natural way

    of dividing time into discrete periods. We consider an arrival process

    that evolves in continuous time, in the sense that any real number t is a

    possible arrival time.

    Definition 24. A counting process X(t) is said to be a Poisson (count-

    ing) process with positive rate (or intensity) λ if

    - X(0) = 0, and X(t) has independent increments.


    - The number of events in any interval of length t is Poisson dis-

      tributed with mean λt; that is, for all s, t > 0,

    P[X(t + s) - X(s) = n] = e^(-λt) (λt)^n / n!,   n = 0, 1, 2, . . .   (4.3.2)
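Equation (4.3.2) can be illustrated by simulation. The sketch below is an illustrative assumption, not part of the notes: it builds the process from i.i.d. Exponential(λ) interarrival times, as suggested by the chapter introduction, and checks that the mean count over (0, t] is close to λt.

```python
# Simulate a Poisson process of rate lam by summing exponential
# interarrival times, then count arrivals in (0, t].
import random

def poisson_count(lam, t, rng):
    """Number of arrivals in (0, t] for one sample path."""
    arrivals, clock = 0, 0.0
    while True:
        clock += rng.expovariate(lam)
        if clock > t:
            return arrivals
        arrivals += 1

rng = random.Random(7)
counts = [poisson_count(2.0, 3.0, rng) for _ in range(5000)]
print(sum(counts) / len(counts))   # close to lam * t = 6
```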

    4.4 Course Review and Discussion

    Practical Problem 6.

    1. Prove that a Poisson process X(t) with positive rate λ has station-

    ary increments, and

    E[X(t)] = λt,   Var[X(t)] = λt.

    2. Practice. Patients arrive at the doctor's office according to a Pois-

    son process with rate λ = 1/10 per minute. The doctor will not see a

    patient until at least three patients are in the waiting room.

    a/ Find the expected waiting time until the first patient is admitted

    to see the doctor.

    b/ What is the probability that nobody is admitted to see the doctor

    in the first hour?

    Theorem 25. If every eigenvalue of a matrix P yields linearly indepen-

    dent left eigenvectors in number equal to its multiplicity, then

    1. there exists a nonsingular matrix M whose rows are left eigenvec-

    tors of P, such that


    2. D = M P M^(-1) is a diagonal matrix whose diagonal elements are the

    eigenvalues of P, repeated according to multiplicity.
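Theorem 25 can be illustrated numerically. In the sketch below (the 2x2 matrix is a hypothetical example), the rows of M are left eigenvectors of P, obtained as ordinary eigenvectors of the transpose P^T:

```python
# Left eigenvectors of P satisfy m P = lam m, i.e. P^T m^T = lam m^T,
# so the rows of M come from the columns of eig(P.T)'s eigenvector matrix.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])            # a 2x2 stochastic matrix (example)

eigvals, W = np.linalg.eig(P.T)       # columns w of W: P.T w = lam w
M = W.T                               # rows of M: left eigenvectors of P
D = M @ P @ np.linalg.inv(M)

print(np.round(D, 10))                # diagonal; entries = eigenvalues of P
```

For this P the eigenvalues are 1 and 0.5, and D comes out diagonal with those entries, in agreement with the theorem.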

    Practical Problem 7 (MC for Business Intelligence). Consider a case

    study of the mobile phone industry in VN. Due to a most recent survey,

    there are four big mobile producers/sellers N, S, M and L, and their

    market distribution in 2007 is given by the stochastic matrix:

    P =

              N     M     L     S

        N     1     0     0     0

        M     0.4   0     0.6   0

        L     0.2   0     0.1   0.7

        S     0     0     0     1

    - Is P regular? Ergodic?

    - Find the long-term distribution matrix L = lim_(m→∞) P^m.

    - What is your conclusion?

    (Remark that the states N and S are called absorbing states.)
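A numerical sketch for checking your hand computation (the state order (N, M, L, S) follows the table above): approximate L = lim P^m by taking a large matrix power.

```python
# Approximate the long-term distribution matrix of the mobile-market
# chain by computing P^m for large m.
import numpy as np

P = np.array([[1.0, 0.0, 0.0, 0.0],   # N (absorbing)
              [0.4, 0.0, 0.6, 0.0],   # M
              [0.2, 0.0, 0.1, 0.7],   # L
              [0.0, 0.0, 0.0, 1.0]])  # S (absorbing)

L_limit = np.linalg.matrix_power(P, 200)
print(np.round(L_limit, 4))
```

Each row of the result stays a probability distribution, and all long-run mass ends up in the absorbing states N and S.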


    Chapter 5

    Probability Modeling and Mathematical Finance

    Probability modeling in finance provides instruments to rationalize the

    unknown by embedding it into a coherent framework. Three key com-

    ponents should be distinguished: randomness, uncertainty and chaos.

    Kolmogorov defined randomness in terms of non-uniqueness and non-

    regularity (as a die with six faces or the expansion of π). Kalman defined

    chaos as randomness without probability.

    A few areas that employ much probability modeling include: weather fore-

    casting, biology and financial forecasting. In general, in order to model

    uncertainty we seek to distinguish the known from the unknown and find

    some mechanisms (theories, intuition, common sense...) to reconcile our

    knowledge with our lack of it.


    5.1 Martingales

    5.1.1 History

    Girolamo Cardano, in his book The Book of Games of Chance in 1565,

    proposed the notion of a fair game. He stated: "The most fundamental

    principle of all in gambling is simply equal conditions, ...". This is the

    essence of the martingale; however, it was not until 1900, in Bachelier's

    thesis, that a mathematical model of a fair game, or martingale, was proposed.

    Nowadays, we understand the concept of a fair game or martingale, in

    money terms, to state that the expected profit at a given time, given the

    total past capital, is null with probability one.

    Throughout this chapter we assume that (Ω, F, P) is a fixed probability

    space, where

    - Ω is a sample space representing the set of all possible outcomes,

    - F is a σ-algebra of subsets of Ω representing the events to which

      we can assign probabilities, and

    - P is a probability measure on (Ω, F).

    The expectation with respect to P will be denoted by E[.].

    5.1.2 Conditional expectation

    Let X and Z be two r.v.'s on the same (Ω, F, P)-space. Suppose X has

    range {x1, x2, . . . , xm} and Z has range {z1, z2, . . . , zn}. We know that

    P[X = xi | Z = zj] := P[X = xi, Z = zj] / P[Z = zj]

    and also

    E[X | Z = zj] = Σ_i xi P[X = xi | Z = zj].

    Definition 26. The random variable Y = E[X|Z], the conditional ex-

    pectation of X given Z, is defined as follows:

    (a) if Z(ω) = zj, then Y(ω) := E[X | Z = zj] =: yj (say).

    Justification. In this way we partition the space Ω into

    Z-atoms {Z = zj}, on which Z is constant. The σ-algebra G = σ(Z)

    generated by Z consists of the sets {Z ∈ B}, B ∈ B, the Borel sets. Therefore

    G = σ(Z) consists precisely of the 2^n possible unions of the n Z-atoms.

    Note from (a) that Y is constant on Z-atoms, so better we say

    (b) Y is G-measurable.

    Theorem 27 (Kolmogorov 1933). Let (Ω, F, P) be a probability space

    and X a random variable with E[|X|] < ∞. Let G be a sub-σ-algebra of

    F. Then there exists a random variable Y such that

    a) Y is G-measurable,

    b) E[|Y|] < ∞,

    c) for every G ∈ G we have

    ∫_G Y dP = ∫_G X dP.


    Moreover, if Y1 is another random variable with these properties, then

    Y1 = Y almost surely (a.s.), that is, P[Y1 = Y] = 1.

    A random variable Y with properties a)-c) is called a version of the

    conditional expectation E[X|G] of X given G, and we write Y = E[X|G]

    a.s.

    Proof. Since G is generated by Z, any G ∈ G is a union of the n

    Z-atoms, so we first prove that

    ∫_{Z=zj} Y dP = yj P[Z = zj] = . . . = ∫_{Z=zj} X dP.

    Write Gj = {Z = zj}; then this equation means E[Y I_Gj] = E[X I_Gj] . . .

    Note 5.1. We often write

    E[X|Z] for E[X|G] = E[X|σ(Z)]; and

    E[X|Z1, Z2, . . .] for E[X|σ(Z1, Z2, . . .)].

    Fact 5.2. If U is a non-negative bounded r.v., then

    E[U|G] ≥ 0, a.s.
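Definition 26 is easy to realize on a finite sample space. In this sketch (the data values are illustrative assumptions), E[X | Z = zj] is the probability-weighted average of X over each Z-atom:

```python
# Compute E[X|Z] on a finite sample space by averaging X over each
# Z-atom {Z = z_j} with the conditional probabilities.
from collections import defaultdict

# sample points as (probability, X value, Z value)
omega = [(0.1, 1, 0), (0.2, 4, 0), (0.3, 2, 1), (0.4, 5, 1)]

def cond_expectation(points):
    """Return {z_j: E[X | Z = z_j]}."""
    mass = defaultdict(float)     # P[Z = z_j]
    accum = defaultdict(float)    # E[X 1_{Z = z_j}]
    for p, x, z in points:
        mass[z] += p
        accum[z] += p * x
    return {z: accum[z] / mass[z] for z in mass}

print(cond_expectation(omega))
```

Here E[X | Z = 0] = (0.1·1 + 0.2·4)/0.3 = 3 and E[X | Z = 1] = (0.3·2 + 0.4·5)/0.7 = 26/7, and Y = E[X|Z] is by construction constant on each Z-atom, as in (b).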

    5.1.3 Key properties of Conditional expectation

    See textbook.


    5.1.4 Filtration

    A filtration is a family {Ft, t = 0, 1, . . . , T} of sub-σ-algebras indexed by

    t = 0, 1, . . . , T such that

    F0 ⊆ F1 ⊆ F2 ⊆ . . . ⊆ FT;

    that is, the family is increasing with time. Intuitively, for each t =

    0, 1, . . . , T, the σ-algebra Ft tells us which events may be observed by

    time t.

    If the sample space Ω is a finite set, often the σ-algebra F0 is trivial,

    consisting simply of the empty set ∅ and the whole sample space Ω. We

    also often write just {Ft} instead of the lengthy {Ft, t = 0, 1, . . . , T},

    and can assume that FT = F (since we shall be considering only random

    variables that are FT-measurable).

    Definition 28. We call the quadruple (Ω, F, {Ft}, P) a filtered probabil-

    ity space.

    We fix a filtered probability space (Ω, F, {Ft}, P) from now on. Given

    d ∈ N.

    A d-dimensional stochastic process with time index set {0, 1, . . . , T},

    defined on the given filtered probability space, is a collection

    X = {Xt, t = 0, 1, . . . , T}

    where each Xt is a d-dimensional random vector, i.e. a function

    Xt : Ω → R^d such that


    Xt^(-1)(B) ≡ {ω ∈ Ω : Xt(ω) ∈ B} ∈ F

    for each Borel subset B of R^d.

    - The process X = {Xt, t ≥ 0} is called adapted (to the filtration

      {Ft}) if for each t, Xt is Ft-measurable, i.e.

      if Xt^(-1)(B) ∈ Ft for each Borel set B of R^d and for each t = 0, 1, . . . , T.

      We often write Xt ∈ Ft as shorthand for Xt^(-1)(B) ∈ Ft for all Borel sets

      B in R^d.

    - Two d-dimensional stochastic processes Y = {Yt} and Z = {Zt}

      are modifications of one another if P(Yt = Zt) = 1 for each t =

      0, 1, . . . , T.

    5.1.5 Martingale

    A collection/process M = {Mt, Ft, t = 0, 1, . . . , T}, where each Mt is a

    real-valued random variable, is called a martingale if the following three

    conditions hold:

    1. E[|Mt|] < ∞ for t = 0, 1, . . . , T;

    2. Mt is Ft-measurable for t = 0, 1, . . . , T [i.e. the process M is

    adapted];

    3. the conditional expectation satisfies

    E[Mt | Ft-1] = Mt-1 for t = 1, . . . , T.


    In our discrete-time setting, condition 3. can be equivalently re-

    placed by

    3'.

    E[Mt | Fs] = Ms for all s < t in {0, 1, . . . , T}.

    We call M a sub-martingale if the "=" in condition 3. or 3'. is replaced

    by "≥"; we call M a super-martingale if the "=" in condition 3. or 3'. is

    replaced by "≤".

    When describing (sub/super)martingales we will sometimes omit the fil-

    tration Ft from the notation for M when it is understood.

    Interpretation of Martingale in Finance

    The martingale is considered to be a necessary condition for an efficient

    asset market, one in which the information contained in past prices is

    instantly, fully and perpetually reflected in the asset's current price. We

    identify

    M = {Mt = pt, the asset's price at t},

    and take as the information available at time t = 0, 1, 2, . . . the price

    history {p0, p1, . . . , pt}, expressing the relevant information we have at

    this time regarding the time series. Then we could think that in a martingale

    process each process event (as a new price)

    - is independent and can be summed (or integrable); and

    - has the property that its conditional expectation remains the same

      (i.e. time-invariant).


    Hence, M = {Mt = pt} is a martingale iff the expected next-period price

    is equal to the current price:

    E[pt+1 | p0, p1, . . . , pt] = pt for any time t.

    If instead asset prices decrease (or increase) in expectation over time, we

    have a super-martingale (sub-martingale):

    E[pt+1 | p0, p1, . . . , pt] ≤ pt (respectively ≥ pt).

    Observation 1. Martingales may also be defined with respect to other

    processes.

    If, for example, P = {pt, t ≥ 0} is a price process and Y = {yt, t ≥ 0}

    is an interest rate process, we can say that P is a martingale with respect

    to Y if

    E[|pt|] < ∞, and E[pt+1 | y0, y1, . . . , yt] = pt, for all t.

    Fact 5.3. By induction, a martingale implies an invariant mean:

    E[pt+1] = E[pt] = . . . = E[p0].
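Fact 5.3 can be checked exactly on a toy fair-coin price model (an illustration, not taken from the notes): let pt = p0 plus a sum of fair ±1 steps, so that E[pt] should equal p0 for every t.

```python
# E[p_t] for p_t = p0 + (fair +/-1 steps), computed exactly by
# enumerating all 2^t equally likely paths.
from itertools import product

p0 = 100.0

def mean_price(t):
    paths = list(product([+1, -1], repeat=t))
    return sum(p0 + sum(steps) for steps in paths) / len(paths)

print([mean_price(t) for t in range(7)])   # every entry is 100.0
```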

    5.1.6 Martingale examples

    Example 5.1. Sum of independent zero-mean r.v.'s. Let X1, X2, . . . be a

    sequence of independent r.v.'s with E[|Xn|] < ∞ for all n and E[Xn] = 0.

    Define S0 = 0, F0 = {∅, Ω} and

    Sn := X1 + X2 + X3 + . . . + Xn,


    Fn := σ(X1, X2, X3, . . . , Xn).

    Then you can prove for n ≥ 1 that

    E[Sn | Fn-1] = Sn-1 a.s.

    (Hint: E[Sn | Fn-1] = Sn-1 + E[Xn | Fn-1] = Sn-1 + E[Xn] = Sn-1, since

    Sn-1 is Fn-1-measurable and Xn is independent of Fn-1.)

    Example 5.2. Geometric Random Walks and a specific case.

    The essential idea underlying the random walk for real processes is the

    assumption of mutually independent increments of the same order of magni-

    tude for each point in time. However, economic time series in particular

    do not satisfy the latter assumption. Seasonal fluctuations of monthly

    sales figures, for example, are in absolute terms significantly greater if the

    yearly average sales figure is high. By contrast, the relative or percent-

    age changes are stable over time and do not depend on the current

    level of Xt.

    Analogously to the random walk Xt = Σ (i = 0 to t) Zi with i.i.d. absolute

    increments Zt = Xt - Xt-1, a geometric random walk {Xt; t ≥ 0} is

    assumed to have i.i.d. relative increments

    Rt = Xt / Xt-1, for t = 1, 2, . . .

    For a specific case, the geometric binomial random walk is

    Xt = Rt Xt-1 = X0 Π (k = 1 to t) Rk,

    where X0, R1, R2, . . . are mutually independent, each Rk is Bernoulli-type,

    and for u > 1 (up) and d < 1 (down), each Rk takes the values u or d.
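Under the standard binomial-model convention (an assumption here, since the notes' specification of Rk is incomplete at this point) that Rk = u with probability q and Rk = d with probability 1 - q, the walk's mean X0(qu + (1 - q)d)^t can be checked by enumeration:

```python
# E[X_t] for the geometric binomial random walk, computed by
# enumerating all 2^t up/down paths and weighting by path probability.
from itertools import product

def mean_geometric_walk(x0, u, d, q, t):
    total = 0.0
    for path in product([u, d], repeat=t):
        prob, x = 1.0, x0
        for r in path:
            prob *= q if r == u else (1 - q)
            x *= r
        total += prob * x
    return total

m = mean_geometric_walk(100.0, 1.1, 0.9, 0.5, 5)
print(m, 100.0 * (0.5 * 1.1 + 0.5 * 0.9) ** 5)
```

Note that with qu + (1 - q)d = 1 the mean stays at X0, which is the martingale case discussed in this section.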


    Example 5.4. Product of non-negative independent r.v.'s of mean

    1. Let X1, X2, . . . be a sequence of independent non-negative r.v.'s with

    E[Xn] = 1 for all n.

    Define M0 = 1, F0 = {∅, Ω} and

    Mn := X1 X2 X3 . . . Xn,   Fn := σ(X1, X2, X3, . . . , Xn).

    The process M is a martingale. (Why?)

    5.1.7 Stopping time

    Definition 30. A (discrete) stopping time is a function τ : Ω → {0, 1, . . . , T}

    ∪ {∞}

    such that

    {τ = t} ∈ Ft for t = 0, 1, . . . , T. . . . (*)

    Obviously for such a stopping time we see:

    {τ = ∞} = Ω \ (∪ (t = 0 to T) {τ = t}) ∈ FT.

    For convenience we define F∞ = FT, and then (*) also holds with

    t = ∞.

    Justification. Intuitively, τ is a time when you can decide to stop

    playing our game. Whether or not you stop immediately after the n-

    th game depends only on the history up to (and including) time n:

    {τ = n} = {ω : τ(ω) = n} ∈ Fn.


    Fact 5.4. With any (discrete) stopping time τ, there is a σ-algebra de-

    fined by

    Fτ = {A ∈ F : A ∩ {τ = t} ∈ Ft for t = 0, 1, . . . , T}.

    Lemma 31. If σ and τ are two stopping times, then

    σ ∧ τ = min(σ, τ), and σ ∨ τ = max(σ, τ)

    both also are stopping times.


    5.2 Stochastic Calculus

    Our basic assumption is that we do not know and cannot predict tomorrow's

    values of asset prices. The past history of the asset value is there as a

    financial time series for us to examine as much as we want, but we can

    n