Stochastic Processes Applications Lecture Notes


  • 8/13/2019 Stochastic Processes Applications Lecturenotes


    Stochastic Processes

    Selective Topics and Applications

    Nguyen V.M. Man, Ph.D.

    January 15, 2013


    Keywords. probabilistic model, random process, linear algebra, computational algebra, statistical inference and modeling

    Copyright 2013 by

    Lecturer: Nguyen V. M. Man, Ph.D.

    Faculty: Computer Science and Engineering

    Institution: University of Technology of HCMC - HCMUT

    Address: 268 Ly Thuong Kiet, Dist. 10, HCMC, Vietnam

    Email: [email protected]

    E-home: www.cse.hcmut.edu.vn/mnguyen

    The Author

    Man Nguyen conducted his Ph.D. research in Applied Mathematics and Industrial Statistics, after following a master program in Computational Lie Algebras at the HCMC University of Science.

    The Ph.D. work, on Factorial Experiment Designs using Computer Algebra methods and Discrete Mathematics, was done at the Eindhoven University of Technology, the Netherlands, in 2001-2005.

    His current research interests include

    * Algebraic Statistics and Experimental Designs, and

    * Mathematical & Statistical Modeling of practical problems.

    For more information, you are welcome to visit his e-home at

    www.cse.hcmut.edu.vn/mnguyen


    Contents

    1 Background
      1.1 Introductory Stochastic Processes
      1.2 Generating Functions
        1.2.1 Introduction
        1.2.2 Elementary results of Generating Functions
        1.2.3 Convolutions
        1.2.4 Compound distributions

    2 Markov Chains & Modeling
      2.1 Homogeneous Markov chains
      2.2 Classification of States
      2.3 Markov Chain Decomposition
      2.4 Limiting probabilities & Stationary distributions
      2.5 Theory of stochastic matrix for MC
      2.6 Spectral Theorem for Diagonalizable Matrices
      2.7 Markov Chains with Absorbing States
        2.7.1 Theory
      2.8 Chapter Review and Discussion

    3 Random walks & Wiener process
      3.1 Introduction to Random Walks
      3.2 Random Walk - a mathematical realization
      3.3 Wiener process

    4 Arrival-Type processes
      4.1 Introduction
      4.2 The Bernoulli process
        4.2.1 Basic facts
        4.2.2 Random Variables Associated with the Bernoulli Process
      4.3 The Poisson process
        4.3.1 Poisson distribution
        4.3.2 Poisson process
      4.4 Course Review and Discussion

    5 Probability Modeling and Mathematical Finance
      5.1 Martingales
        5.1.1 History
        5.1.2 Conditional expectation
        5.1.3 Key properties of Conditional expectation
        5.1.4 Filtration
        5.1.5 Martingale
        5.1.6 Martingale examples
        5.1.7 Stopping time
      5.2 Stochastic Calculus
        5.2.1 A Simple Model for Asset Prices
        5.2.2 Stochastic differential equation

    6 Part III: Practical Applications of SP
      6.1 Statistical Parameter Estimation
      6.2 Inventory Control in Logistics
      6.3 Epidemic processes
      6.4 Statistical Models in Risk Management
      6.5 Optimization Methods for Portfolio Risk Management


    Introduction

    We present a few specific probabilistic techniques used in mathematically modeling complex phenomena in biology, service systems, and finance. These notes are aimed at graduate students in Applied Mathematics and Statistics.

    The aims of the course. The course introduces basic techniques of Stochastic Processes theory, including:

    * Markov chains and processes (discrete and continuous parameters),

    * Random walks, fluctuation theory,

    * Stationary processes, spectral analysis,

    * Diffusion processes,

    * Applications in finance and transportation.

    The structure of the course. The course consists of three parts:

    Part I: Motivating topics for studying Stochastic Processes



    Part II: Fundamental setting of Stochastic Processes

    Part III: Connections and research projects

    Part I: Motivating topics and Background

    * Service systems: mathematical models of queueing systems.

    * Introductory Stochastic Processes: basic concepts.

    Part II: Basic Stochastic Processes

    We will discuss the following:

    * Markov chains and processes

    * Random walks and the Wiener process

    * Arrival-type processes

    * Martingales and Stochastic Calculus

    Part III: New applications of SP

    We investigate the following applications:

    * Statistical Models and Simulation in Risk Management

    * Mathematical and Statistical Models in Transportation Science


    Motivating topics of SP

    Service systems. Over the last few years the Processor Sharing scheme has attracted renewed attention as a convenient and efficient approach for studying bandwidth-sharing mechanisms such as TCP, or any process requiring resource sharing.

    Understanding and analyzing such processes in order to build a high-performance system with limited resources is a very difficult task. A few typical aspects of resource allocation are:

    1. many classes of jobs (clients) enter a system with distinct rates, which demands a wise policy to get them through efficiently;

    2. measuring the performance of a system through many different parameters (metrics) is hard, and requires complex mathematical models.

    Evolutionary Dynamics

    Keywords: critical lineages, virus mutant, mutation, reproductive ratio, invasion, escape, ecology, vaccine.


    Introductory Invasion and Escape. Some realistic biological phenomena occur in nature, such as: (a) a parasite infecting a new host, (b) a species trying to invade a new ecological niche, (c) cancer cells escaping from chemotherapy, and (d) viruses evading anti-microbial therapy.

    Typical problems. Imagine a virus of one host species that is transferred to another host species (HIV, SARS). In the new host, the virus has a basic reproductive ratio R less than one. Some mutation may be required to generate a virus mutant that can invade the new host and lead to an epidemic in the new host species. A few crucial concerns are:

    1. how to calculate the probability that such an attempt succeeds?

    2. suppose a successful and effective vaccine is found, but some mutants can break through the protective immunity of the vaccine. How to calculate the probability that a virus quasispecies contains an escape mutant that establishes an infection and thereby causes vaccine failure?

    Summary. We call for a theory to calculate the probability of non-extinction/escape for lineages starting from single individuals.
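The calculation called for here is a standard branching-process one: for a lineage started by a single individual, the extinction probability q is the smallest root of q = F(q), where F is the offspring probability-generating function. A minimal Python sketch, assuming (purely for illustration) a Poisson offspring distribution whose mean plays the role of the reproductive ratio R; a successful mutant corresponds to R > 1:

```python
import math

def extinction_prob(R, tol=1e-12, max_iter=100_000):
    """Smallest fixed point of q = F(q) for Poisson(R) offspring,
    where F(q) = exp(R*(q - 1)) is the offspring pgf.  Iterating
    q_{n+1} = F(q_n) from q_0 = 0 converges monotonically to the
    smallest root, which is the extinction probability."""
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp(R * (q - 1.0))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

q = extinction_prob(1.5)    # supercritical case: R > 1
print(q, 1.0 - q)           # extinction vs. escape probability
```

For R <= 1 the iteration converges to q = 1 (extinction is certain), so the escape probability 1 - q is positive only in the supercritical case R > 1.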

    Computing Software

    OpenModelica, ScalaLab and R.

    Introductory R: a statistical language

    R is a language and environment for statistical computing and graphics. It is similar to the S language and environment, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies). The R distribution contains functionality for a large number of statistical procedures, among them: linear and generalized linear models, nonlinear regression models, time series analysis, and classical parametric and nonparametric tests. There is also a large set of functions which provide a flexible graphical environment for creating various kinds of data presentations.

    One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

    * an effective data handling and storage facility,

    * a suite of operators for calculations on arrays, in particular matrices,

    * a large, coherent, integrated collection of intermediate tools for data analysis,

    * graphical facilities for data analysis and display,

    * a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions, and input and output facilities.

    Note: most classical statistics and much of the latest methodology is available for use with R, but users may need to be prepared to do a little work to find it.


    Chapter 1

    Background

    1.1 Introductory Stochastic Processes

    The concept. A stochastic process is just a collection (usually infinite) of random variables, denoted Xt or X(t), where the parameter t often represents time. The state space of a stochastic process consists of all realizations x of Xt; i.e., Xt = x says the random process is in state x at time t. Stochastic processes can be generally subdivided into four distinct categories, depending on whether t and Xt are discrete or continuous:

    1. Discrete processes: both are discrete, such as the Bernoulli process (die rolling) or discrete time Markov chains.

    2. Continuous time discrete state processes: the state space of Xt is discrete and the index set (e.g. the time set T of t) is continuous, such as an interval of the reals R.

       Poisson process: the number of clients X(t) who have entered ACB from the time it opened until time t. X(t) will have the Poisson distribution with mean E[X(t)] = λt (λ being the arrival rate).

       Continuous time Markov chain.

       Queueing process: people not only enter but also leave the bank, so we also need the distribution of service time (the time a client spends in ACB).

    3. Continuous processes: both Xt and t are continuous, such as a diffusion process (Brownian motion).

    4. Discrete time continuous state processes: Xt is continuous and t is discrete, the so-called TIME SERIES, such as

       monthly fluctuations of the inflation rate of Vietnam,

       daily fluctuations of a stock market.

    Examples

    1. Discrete processes: a random walk model consisting of positions Xt of an object (a drunkard) at discrete time points t during 24 hours, whose directional distance from a particular point 0 is measured in integer units. Here T = {0, 1, 2, . . . , 24}.

    2. Continuous time discrete state processes: Xt is the number of births in a given population during the time period [0, t]. Here T = R+ = [0, ∞) and the state space is {0, 1, 2, . . .}. The sequence of failure times of a machine is a specific instance.


    - X(t) and X(t + τ) will have the same distributions. For the first-order distribution,

      FX(x; t) = FX(x; t + τ) = FX(x), and fX(x; t) = fX(x).

      These properties are found in arrival-type processes, in which we are interested in occurrences that have the character of an arrival, such as message receptions at a receiver, job completions in a manufacturing cell, customer purchases at a store, etc. We will focus on models in which the interarrival times (the times between successive arrivals) are independent random variables.

      The case where arrivals occur in discrete time and the interarrival times are geometrically distributed is the Bernoulli process.

      The case where arrivals occur in continuous time and the interarrival times are exponentially distributed is the Poisson process.

      The Bernoulli process and the Poisson process will be investigated next.

    2. MARKOVIAN (memoryless) property: many processes have the memoryless property, arising from experiments that evolve in time and in which the future evolution exhibits a probabilistic dependence on the past.

    As an example, the future daily prices of a stock are typically dependent on past prices. However, in a Markov process, we assume a very special type of dependence: the next value depends on past values only through the current value; that is, Xi+1 depends only on Xi, and not on any previous values.
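The continuous-time arrival model above can be simulated directly: Poisson arrivals are generated by accumulating independent exponential interarrival times. A minimal sketch (the rate lam = 2.0, the horizon t_end = 1000, and the seed are illustrative choices):

```python
import random

random.seed(42)
lam, t_end = 2.0, 1000.0

# Poisson process: arrival times are cumulative sums of
# independent Exponential(lam) interarrival times.
arrivals = []
t = random.expovariate(lam)
while t < t_end:
    arrivals.append(t)
    t += random.expovariate(lam)

# The count N(t_end) has mean lam * t_end, so the empirical
# rate len(arrivals) / t_end should be close to lam.
print(len(arrivals) / t_end)
```

The same loop with a Bernoulli(p) coin flip per unit slot and geometric gaps gives the discrete-time analogue.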


    1.2 Generating Functions

    1.2.1 Introduction

    Probabilistic models often involve several random variables of interest. For example, in a medical diagnosis context, the results of several tests may be significant, or in a networking context, the workloads of several gateways may be of interest. All of these random variables are associated with the same experiment, sample space, and probability law, and their values may relate in interesting ways. Mathematically, a random variable is a mapping!

    Definition 1. A random variable X is a mapping (function) from a sample space S to the reals R. For any j ∈ R, the preimage A := X^(-1)(j) = {w : X(w) = j} ⊆ S is an event, and we understand

    P{X = j} = P(A) = Σ_{w ∈ A} P(w).

    For a finite sample space S with equally likely outcomes, obviously

    P{X = j} = P(A) = |A| / |S|.

    A discrete random variable X is one having a finite or countably infinite range Range(X), described by the probability mass function (pmf), determined by P{X = j} = p_j. We must have

    p_j ≥ 0, and Σ_j p_j = 1.


    A continuous random variable X is one having an uncountable range Range(X), described by the probability density function (pdf) f(x), which satisfies

    f(t) ≥ 0, and ∫_{Range(X)} f(t) dt = 1.

    Generating functions are important in handling stochastic processes involving integral-valued random variables.

    Multiple random variables. We consider probabilities involving simultaneously the numerical values of several random variables and investigate their mutual couplings. In this section, we extend the concepts of pmf and expectation developed so far to multiple random variables.

    Consider two discrete random variables X, Y : S → R associated with the same experiment. The joint pmf of X and Y is defined by

    pX,Y(x, y) = P(X = x, Y = y)

    for all pairs of numerical values (x, y) that X and Y can take. We will use the abbreviated notation P(X = x, Y = y) instead of the more precise notations P({X = x} ∩ {Y = y}) or P({X = x} and {Y = y}). That is,

    P(X = x, Y = y) = P({X = x} ∩ {Y = y}) = P({X = x} and {Y = y}).

    For the pair of random variables X, Y, we say

    Definition 2. X and Y are independent if for all x, y ∈ R we have

    P(X = x, Y = y) = P{X = x} P{Y = y}, i.e., pX,Y(x, y) = pX(x) pY(y),

    or in terms of conditional probability


    P({X = x} | {Y = y}) = P({X = x}).

    This can be extended to the notion of mutual independence of a finite number n of random variables.

    Definition 3. The expectation operator defines the expected value of a random variable X as

    E(X) = Σ_{x ∈ Range(X)} P{X = x} · x.

    If we consider X as a function from a sample space S to the naturals N, then

    E(X) = Σ_{i=0}^∞ P{X > i}. (Why?)
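The tail-sum identity behind the "Why?" is easy to check numerically for an N-valued variable; the pmf below is an illustrative choice, not from the notes:

```python
# Check E(X) = sum_{i>=0} P(X > i) for a small N-valued pmf.
pmf = {0: 0.2, 1: 0.5, 2: 0.2, 3: 0.1}   # illustrative distribution

mean_direct = sum(j * p for j, p in pmf.items())

# P(X > i) summed over i = 0, 1, ..., max-1 (tails beyond max are 0)
mean_tail = sum(sum(p for j, p in pmf.items() if j > i)
                for i in range(max(pmf)))

print(mean_direct, mean_tail)   # both equal 1.2
```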

    Functions of Multiple Random Variables. When there are multiple random variables of interest, it is possible to generate new random variables by considering functions involving several of these random variables. In particular, a function Z = g(X, Y) of the random variables X and Y defines another random variable. Its pmf can be calculated from the joint pmf pX,Y according to

    pZ(z) = Σ_{(x,y) : g(x,y) = z} pX,Y(x, y).

    Furthermore, the expected value rule for functions naturally extends and takes the form

    E[g(X, Y)] = Σ_{(x,y)} g(x, y) pX,Y(x, y).


    Theorem 4. We have two important results on expectation.

    Linearity: E(X + Y) = E(X) + E(Y) for any pair of random variables X, Y.

    Independence: E(X Y) = E(X) E(Y) for any pair of independent random variables X, Y.

    Mean, variance and moments of the probability distribution P{X = j} = p_j:

    m = E(X) = Σ_{j=0}^∞ j p_j = P'(1) = Σ_{j=0}^∞ q_j = Q(1) (why!?)

    Recall that the variance of the probability distribution p_j is

    σ² = E(X(X − 1)) + E(X) − [E(X)]²,

    so we need to know

    E(X(X − 1)) = Σ_{j=0}^∞ j(j − 1) p_j = P''(1) = 2Q'(1)?

    Therefore, σ² = ?

    Exercise: Find the formula of the r-th factorial moment

    μ[r] = E(X(X − 1)(X − 2) · · · (X − r + 1)).
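As a concrete check of m = P'(1) and σ² = P''(1) + P'(1) − [P'(1)]², the derivatives of the probability-generating function can be evaluated exactly from the pmf coefficients, since P(x) is a polynomial here. A sketch for a Binomial(3, 1/2) variable (an illustrative choice):

```python
# pmf of Binomial(3, 1/2): p_j = C(3, j) / 8 for j = 0..3
p = [1/8, 3/8, 3/8, 1/8]

# For P(x) = sum p_j x^j:  P'(1) = sum j p_j,  P''(1) = sum j(j-1) p_j
P1 = sum(j * pj for j, pj in enumerate(p))
P2 = sum(j * (j - 1) * pj for j, pj in enumerate(p))

mean = P1
var = P2 + P1 - P1**2
print(mean, var)   # 1.5 and 0.75, i.e. n*p and n*p*(1-p)
```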

    1.2.2 Elementary results of Generating Functions

    Suppose we have a sequence of real numbers a0, a1, a2, . . .. Introducing the dummy variable x, we may define a function

    A(x) = a0 + a1 x + a2 x² + · · · = Σ_{j=0}^∞ a_j x^j. (1.2.1)


    If the series converges in some real interval −x0 < x < x0, the function A(x) is called the generating function of the sequence {aj}.

    Fact 1.1. If the sequence {aj} is bounded by some constant K, then A(x) converges at least for |x| < 1. [Prove it!]

    Fact 1.2. In case the sequence {aj} represents probabilities, we introduce the restriction

    aj ≥ 0, Σ_{j=0}^∞ aj = 1.

    The corresponding function A(x) is then called a probability-generating function. We consider the (point) probability distribution and the tail probability of a random variable X, given by

    P{X = j} = pj, P{X > j} = qj,

    so that the usual distribution function is P{X ≤ j} = 1 − qj. The probability-generating function now is

    P(x) = Σ_{j=0}^∞ pj x^j = E(x^X), where E is the expectation operator.

    Also we can define a generating function for the tail probabilities:

    Q(x) = Σ_{j=0}^∞ qj x^j.

    Q(x) is not a probability-generating function, however.

    Fact 1.3.

    a/ P(1) = Σ_{j=0}^∞ pj 1^j = 1, and |P(x)| ≤ Σ_{j=0}^∞ |pj x^j| ≤ Σ_{j=0}^∞ pj ≤ 1 if |x| ≤ 1. So P(x) is absolutely convergent at least for |x| ≤ 1.

    b/ Q(x) is absolutely convergent at least for |x| < 1.


    c/ Connection between P(x) and Q(x): (check this!)

    (1 − x) Q(x) = 1 − P(x), or equivalently P(x) + Q(x) = 1 + x Q(x).

    Finding a generating function from a recurrence: multiply both sides by x^n and sum. For example, for the Fibonacci sequence,

    fn = fn−1 + fn−2 implies F(x) = x + x F(x) + x² F(x).

    Finding a recurrence from a generating function: whenever you know F(x), we can find its power series; the coefficients of x^n are the Fibonacci numbers. How? Just remember how to find a partial fractions expansion of F(x), in particular the basic expansion

    1/(1 − x) = 1 + x + x² + · · ·

    In general, if G(x) is the generating function of a sequence (gn), then

    G^(n)(0) = n! gn.
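The two directions above can be combined in code: F = x + xF + x²F gives the closed form F(x) = x/(1 − x − x²), and its power-series coefficients can be recovered by long division of power series. A minimal sketch (series_coeffs is a hypothetical helper name, not from the notes):

```python
def series_coeffs(num, den, n):
    """First n power-series coefficients of num(x)/den(x),
    assuming den[0] != 0; polynomials are given low-order first.
    Solves den * c = num coefficient by coefficient."""
    c = []
    for k in range(n):
        acc = num[k] if k < len(num) else 0.0
        for i in range(1, min(k, len(den) - 1) + 1):
            acc -= den[i] * c[k - i]
        c.append(acc / den[0])
    return c

# F(x) = x / (1 - x - x^2): coefficients are the Fibonacci numbers
print(series_coeffs([0, 1], [1, -1, -1], 10))
# -> [0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]
```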

    1.2.3 Convolutions

    Now we consider two nonnegative independent integral-valued random variables X and Y, having the probability distributions

    P{X = j} = aj, P{Y = k} = bk. (1.2.2)

    The joint probability of the event (X = j, Y = k) is aj bk, obviously. We form a new random variable S = X + Y; then the event S = r comprises the mutually exclusive events

    (X = 0, Y = r), (X = 1, Y = r − 1), . . . , (X = r, Y = 0).


    Fact 1.4. The probability distribution of the sum S then is

    P{S = r} = cr = a0 br + a1 br−1 + · · · + ar b0.

    Proof.

    pS(r) = P(X + Y = r) = Σ_{(x,y): x+y=r} P(X = x and Y = y) = Σ_x pX(x) pY(r − x).

    This method of compounding two sequences of numbers (not necessarily probabilities) is called convolution. The notation

    {cj} = {aj} ∗ {bj}

    will be used.

    Fact 1.5. Define the generating functions of the sequences {aj}, {bj} and {cj} by

    A(x) = Σ_{j=0}^∞ aj x^j, B(x) = Σ_{j=0}^∞ bj x^j, C(x) = Σ_{j=0}^∞ cj x^j;

    it follows that C(x) = A(x) B(x). [check this!]

    In practical applications, the sum of several independent integral-valued random variables Xi can be defined:

    Sn = X1 + X2 + · · · + Xn, n ∈ Z+.

    If the Xi have a common probability distribution given by pj, with probability-generating function P(x), then the probability-generating function of Sn is P(x)^n. Clearly, the distribution of Sn is the n-fold convolution

    {pj} ∗ {pj} ∗ · · · ∗ {pj} (n factors) = {pj}^(∗n).
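Fact 1.5 can be checked numerically: convolving two pmfs and multiplying their generating functions must agree. A sketch with two fair four-sided dice taking values 0..3 (an illustrative choice):

```python
def convolve(a, b):
    """Convolution {c_r} = {a_j} * {b_j}: c_r = sum_j a_j b_{r-j}."""
    c = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

a = [0.25] * 4            # pmf of X, uniform on {0, 1, 2, 3}
b = [0.25] * 4            # pmf of Y, same distribution
c = convolve(a, b)        # pmf of S = X + Y

# Evaluate the generating polynomials at a point: C(x) = A(x)B(x)
x = 0.5
A = sum(aj * x**j for j, aj in enumerate(a))
B = sum(bj * x**j for j, bj in enumerate(b))
C = sum(cj * x**j for j, cj in enumerate(c))
print(sum(c), abs(A * B - C) < 1e-12)
```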


    1.2.4 Compound distributions

    In our discussion so far of sums of random variables, we have always assumed that the number of variables in the sum is known and fixed, i.e., it is nonrandom. We now generalize the previous concept of convolution to the case where the number N of random variables Xk contributing to the sum is itself a random variable! In particular, we consider the sum SN = X1 + X2 + · · · + XN, where

    P{Xk = j} = fj, P{N = n} = gn, P{SN = l} = hl. (1.2.3)

    The probability-generating functions of X, N and S are

    F(x) = Σ fj x^j, G(x) = Σ gn x^n, H(x) = Σ hl x^l. (1.2.4)

    Compute H(x) in terms of F(x) and G(x). Prove that

    H(x) = G(F(x)).

    Example 1.1. A remote village has three gas stations, and each one of them is open on any given day with probability 1/2, independently of the others. The amount of gas available in each gas station is unknown and is uniformly distributed between 0 and 1000 gallons. We wish to characterize the distribution of the total amount of gas available at the gas stations that are open.

    The number N of open gas stations is a binomial random variable


    with p = 1/2, and the corresponding transform (a moment-type transform, with e^x in place of x, since the amounts are continuous) is

    GN(x) = (1 − p + p e^x)³ = (1/8)(1 + e^x)³.

    The transform FX(x) associated with the amount of gas available in an open gas station is

    FX(x) = (e^(1000x) − 1) / (1000x).

    The transform HS(x) associated with the total amount S of gas available at the three gas stations of the village that are open is the same as GN(x), except that each occurrence of e^x is replaced with FX(x), i.e.,

    HS(x) = G(F(x)) = (1/8)(1 + FX(x))³.
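The compound structure of Example 1.1 can be checked by simulation: draw N as a Binomial(3, 1/2) count of open stations, then add N independent Uniform(0, 1000) amounts. One consequence of H = G(F) is that E[S] = E[N] E[X] = 1.5 · 500 = 750 gallons, which the sample mean should approach (the seed and sample size are illustrative choices):

```python
import random

random.seed(7)
total = 0.0
n_trials = 100_000
for _ in range(n_trials):
    # N ~ Binomial(3, 1/2): number of open stations today
    n_open = sum(random.random() < 0.5 for _ in range(3))
    # add n_open independent Uniform(0, 1000) gas amounts
    total += sum(random.uniform(0, 1000) for _ in range(n_open))

print(total / n_trials)   # close to E[N] * E[X] = 750
```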

    The next chapter will discuss fundamental stochastic processes.


    Chapter 2

    Markov Chains & Modeling

    We discuss the concept of discrete time Markov chains, or just Markov Chains (MC), in this chapter. Suppose we have a sequence M of consecutive trials, numbered n = 0, 1, 2, . . .. The outcome of the nth trial is represented by the random variable Xn, which we assume to be discrete and to take one of the values j in a finite set Q of discrete outcomes/states

    {e1, e2, e3, . . . , es}.

    M is called a (discrete time) Markov chain if, while occupying Q states at each of the unit time points 0, 1, 2, 3, . . . , n − 1, n, n + 1, . . ., M satisfies the following property, called the

    Markov property or memoryless property:

    P(Xn+1 = j | Xn = i, . . . , X0 = a) = P(Xn+1 = j | Xn = i),

    for all n = 0, 1, 2, . . .


    (In each time step n to n + 1, the process can stay at the same state ei (at both n and n + 1) or move to another state ej (at n + 1), with respect to the memoryless rule, which says that the future behavior of the system depends only on the present and not on its past history.)

    Definition 5 (One-step transition probability).

    Denote the absolute probability of outcome j at the nth trial by

    pj(n) = P(Xn = j). (2.0.1)

    The one-step transition probability, denoted

    pij(n + 1) = P(Xn+1 = j | Xn = i),

    is defined as the conditional probability that the process is in state j at time n + 1, given that the process was in state i at the previous time n, for all i, j ∈ Q.

    2.1 Homogeneous Markov chains

    If the state transition probabilities pij(n + 1) in a Markov chain M are independent of the time n, they are said to be stationary, time homogeneous, or just homogeneous. The state transition probability in a homogeneous chain can then be written without mentioning the time point n:

    pij = P(Xn+1 = j | Xn = i). (2.1.1)

    Unless stated otherwise, we assume and will work with homogeneous Markov chains M. The one-step transition probabilities given by 2.1.1


    of these Markov chains must satisfy

    Σ_{j=1}^s pij = 1 for each i = 1, 2, . . . , s, and pij ≥ 0.

    Transition Probability Matrix. In practice, we are likely given the initial distribution (the probability distribution of the starting position of the concerned object at time point 0) and the transition probabilities, and we want to determine the probability distribution of the position Xn for any time point n > 0. The Markov property, quantitatively described through transition probabilities, is represented in the state transition matrix P = [pij]:

    P =
    [ p11  p12  p13  . . .  p1s ]
    [ p21  p22  p23  . . .  p2s ]
    [ p31  p32  p33  . . .  p3s ]
    [  .    .    .   . . .   .  ]
    (2.1.2)

    Briefly, we have

    Definition 6. A (homogeneous) Markov chain M is a triple (Q, p(0), P) in which:

    * Q is a finite set of states (identified with an alphabet),

    * p(0) are the initial probabilities (at the initial time point n = 0),

    * P = [pij] are the state transition probabilities, in which pij = P(Xn+1 = j | Xn = i),


    and such that the memoryless property is satisfied, i.e.,

    P(Xn+1 = j | Xn = i, . . . , X0 = a) = P(Xn+1 = j | Xn = i), for all n.

    In practice, the initial probabilities p(0) are obtained at the current time (the beginning of a study), and the transition probability matrix P is found from empirical observations in the past. In most cases, the major concern is using P and p(0) to predict the future.

    Example 2.1. The Coopmart chain (denoted C) in SG currently controls 60% of the daily processed-food market; their rivals Maximart and other brands (denoted M) take the other share. Data from the previous years (2006 and 2007) show that 88% of C's customers remained loyal to C, while 12% switched to rival brands. In addition, 85% of M's customers remained loyal to M, while the other 15% switched to C. Assuming that these trends continue, determine C's share of the market (a) in 5 years and (b) over the long run.

    Proposed solution. Suppose that the brand attraction is time homogeneous. For a sample of large enough size n, we denote the customers' attention in year n by a random variable Xn. The market share probability of the whole population can then be approximated by using the sample statistics, e.g.

    P(Xn = C) = |{x : Xn(x) = C}| / n, and P(Xn = M) = 1 − P(Xn = C).

    Set n = 0 for the current time; the initial probabilities then are

    p(0) = [0.6, 0.4] = [P(X0 = C), P(X0 = M)].


    Obviously we want to know the market share probabilities p(n) = [P(Xn = C), P(Xn = M)] at any year n > 0. We now introduce a transition probability matrix with rows and columns labeled C and M:

          C     M
    C  [ 0.88  0.12 ]     [ 1 - a    a  ]
    M  [ 0.15  0.85 ]  =  [  b    1 - b ],  with a = 0.12 and b = 0.15,
    (2.1.3)

    where a = pCM = P[Xn+1 = M | Xn = C] and b = pMC = P[Xn+1 = C | Xn = M].

    Higher-order transition probabilities.

    The aim: find the absolute probabilities at any stage n. We write

    p(n)ij = P(Xn+m = j | Xm = i), with p(1)ij = pij, (2.1.4)

    for the n-step transition probability, which by homogeneity is independent of m ∈ N; see Equation 2.1.1. The n-step transition matrix is denoted as P(n) = (p(n)ij). For the case n = 0, we have

    p(0)ij = δij = 1 if i = j, and 0 if i ≠ j.

    Chapman-Kolmogorov equations. The Chapman-Kolmogorov equations relate the n-step transition probabilities to the k-step and (n − k)-step transition probabilities:

    p(n)ij = Σ_{h=1}^s p(n−k)ih p(k)hj, 0 < k < n.

    This results in the matrix notation

    P(n) = P(n−k) P(k).


    Since P(1) =P, we get P(2) =P2, and in general P(n) =Pn.

    Let p

    (n)

    denote the vector form of probability mass distribution (pmf orabsolute probability distribution) associated withXnof a Markov process,

    that is

    p(n) = [p1(n), p2(n), p3(n), . . . , ps(n)],

    where each pi(n) is defined as in 2.0.1.

    Proposition 7. The absolute probability distribution p(n) at any stage n of a Markov chain is given in the matrix form

    p(n) = p(0) P^n, where p(0) = p is the initial (row) probability vector.    (2.1.5)

    Proof. We employ two facts:

    * P^(n) = P^n, and

    * the absolute probability distribution p(n+1) at any stage n + 1 (associated with Xn+1) can be found from the 1-step transition matrix P = [p_ij] and the distribution

    p(n) = [p1(n), p2(n), p3(n), . . . , ps(n)]

    at any stage n (associated with Xn):

    p_j(n + 1) = Σ_{i=1}^{s} p_i(n) p_ij, or in matrix notation p(n+1) = p(n) P.

    Then just do the induction p(n+1) = p(n) P = p(n−1) P^2 = · · · = p(0) P^{n+1}.

    Example 2.2 (The Coopmart chain, cont.). (a/) C's share of the market in 5 years can be computed by

    p(5) = [pC(5), pM(5)] = p(0) P^5.
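A quick numerical sketch of this computation, in pure Python (the helper names mat_mul and mat_pow are ours, not from the text):

```python
# Example 2.2: market shares after 5 years, p(5) = p(0) P^5,
# with the Coopmart values P and p(0) = [0.6, 0.4].

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def mat_pow(P, n):
    """n-th power of a square matrix by repeated multiplication."""
    R = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        R = mat_mul(R, P)
    return R

P = [[0.88, 0.12],
     [0.15, 0.85]]       # rows: from C, from M
p0 = [[0.60, 0.40]]      # initial shares [P(X0 = C), P(X0 = M)]

p5 = mat_mul(p0, mat_pow(P, 5))[0]
print([round(x, 4) for x in p5])
```

The result is approximately [0.5648, 0.4352]: Coopmart's share drifts down toward the stationary value computed later in this chapter.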


    Practical Problem 1. A state transition diagram of a finite-state Markov chain is a line diagram with a vertex corresponding to each state and a directed line between two vertices i and j if p_ij > 0. In such a diagram, if one can move from i to j by a path following the arrows, then i → j.

    The diagram is useful to determine whether a finite-state Markov chain is irreducible or not, or to check for periodicities.

    Draw the state transition diagrams and classify the states of the MCs with the following transition probability matrices:

    P1 = [ 0    0.5  0.5 ]    P2 = [ 0  0  0.5  0.5 ]    P3 = [ 0.3  0.4  0  0    0.3 ]
         [ 0.5  0    0.5 ]         [ 1  0  0    0   ]         [ 0    1    0  0    0   ]
         [ 0.5  0.5  0   ]         [ 0  1  0    0   ]         [ 0    0    0  0.6  0.4 ]
                                   [ 0  1  0    0   ]         [ 0    0    1  0    0   ]

    2.2 Classification of States

    A) Accessible states.

    State j is said to be accessible from state i if for some N ≥ 0, p_ij^(N) > 0, and we write i → j. Two states i and j accessible to each other are said to communicate, and we write i ↔ j. If all states communicate with each other, then we say that the Markov chain is irreducible. Formally, irreducibility means

    ∀ i, j ∈ Q : ∃ N ≥ 0 [ p_ij^(N) > 0 ].
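This formal condition is easy to test by graph reachability. Below is a sketch (the function names are ours) that checks irreducibility of the matrices P1 and P2 from Practical Problem 1:

```python
# Irreducibility check by reachability: j is accessible from i iff there is
# a directed path i -> j along edges with p_ij > 0 (N = 0 gives i -> i).

def accessible_from(P, i):
    """Set of states reachable from i in the transition diagram."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v, p in enumerate(P[u]):
            if p > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_irreducible(P):
    n = len(P)
    return all(accessible_from(P, i) == set(range(n)) for i in range(n))

P1 = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]
P2 = [[0, 0, 0.5, 0.5], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
print(is_irreducible(P1), is_irreducible(P2))  # True True
```

Both chains turn out to be irreducible; for P2 every cycle passes through state 0, yet all pairs still communicate.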

    B) Recurrent/persistence states and Transient states.


    Let A(i) be the set of states that are accessible from i. We say that i is recurrent if from any future state, there is always some probability of returning to i and, given enough time, this is certain to happen. By repeating this argument, if a recurrent state is visited once, it will be revisited an infinite number of times.

    A state is called transient if it is not recurrent. In particular, there are states j ∈ A(i) such that i is not accessible from j. After each visit to state i, there is positive probability that the chain enters such a j. Given enough time, this will happen, and state i cannot be visited after that. Thus, a transient state will only be visited a finite number of times.

    We now formalize the concepts of recurrent/persistent state and transient state.

    Let the first return time Tj indicate the first time, or the number of steps after which the chain is first at state j after leaving j at time 0 (if j is never reached then set Tj = ∞). It is a discrete r.v., taking values in {1, 2, 3, ...}. For any two states i ≠ j and n > 0, let f_ij^n be the conditional probability that the chain is first at state j after n steps, given it was at state i at time 0:

    f_ij^n := P[Tj = n | X0 = i] = P[Xn = j, Xk ≠ j, k = 1, 2, ..., n − 1 | X0 = i],

    and f_ij^0 = 0 since Tj ≥ 1. Then clearly

    f_ij^1 = P[X1 = j | X0 = i] = p_ij.


    State j is said to be transient (or nonrecurrent) if

    f_jj = P[Tj < ∞ | X0 = j] < 1,

    and recurrent (or persistent) if f_jj = 1.


    2.3 Markov Chain Decomposition

    Fact 2.1. In any Markov Chain, the following are correct.

    - It can be decomposed into one or more recurrent classes (or equivalence classes), plus possibly some transient states. Each equivalence class contains those states that communicate with each other.

    - A recurrent state is accessible from all states in its class, but is not accessible from recurrent states in other classes.

    - A transient state is not accessible from any recurrent state. But at least one, possibly more, recurrent states are accessible from a given transient state.

    For the purpose of understanding the long-term behavior of a Markov chain, it is important to analyze chains that consist of a single recurrent class. Such a Markov chain is called an irreducible chain.

    For the purpose of understanding short-term behavior, it is also important to analyze the mechanism by which any particular class of recurrent states is entered starting from a given transient state.

    C) Periodic states.

    In a finite Markov Chain M = (Q, , P) (i.e. having a finite number of states), a periodic state i is a state to which an agent can return only at positive integer time points t0, 2t0, 3t0, . . . (multiples of an integer period t0 > 1). t0 is named the period of i, being the greatest common divisor of the integers {t > 0 : p_ii^(t) > 0}.

  • 8/13/2019 Stochastic Processes Applications Lecturenotes

    39/102

    34 CHAPTER 2. MARKOV CHAINS & MODELING

    A Markov Chain is aperiodic if there is no such periodic state; in other words, if the period of each state i ∈ Q is 1.

    For example, we can check that a MC with the transition matrix

    P = [ 0    0    0.6  0.4 ]
        [ 0    0    0.3  0.7 ]
        [ 0.5  0.5  0    0   ]
        [ 0.2  0.8  0    0   ]

    is periodic. Indeed, if the Markovian random variable (agent) starts at time 0 in state E1, then at time 1 it must be in state E3 or E4, and at time 2 it must be in state E1 or E2. Therefore, it generally can visit E1 only at times 2, 4, 6, . . . Summarizing, we have

    Definition 9. A finite Markov chain M = (Q, , P) is

    1. irreducible iff it has only one single recurrent class, i.e. any state is accessible from all other states;

    2. aperiodic iff the period of each state i ∈ Q is 1, i.e. it has no periodic state;

    3. ergodic if it is positive recurrent and aperiodic.

    It can be shown that recurrence, transience, and periodicity are all class properties; that is, if state i is recurrent (positive recurrent, null recurrent, transient, periodic), then all other states in the same class as state i inherit the same property.
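The period of each state, gcd{t > 0 : p_ii^(t) > 0}, can be approximated numerically by scanning matrix powers up to a cutoff. A sketch (the cutoff T = 20 and the helper names are our choices) applied to the 4-state periodic example above:

```python
# Period of each state as gcd of observed return times t with p^(t)_ii > 0,
# scanning t = 1..T (T large enough for small examples).

from math import gcd
from functools import reduce

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def periods(P, T=20):
    n = len(P)
    Pt = P                                    # holds P^t
    return_times = [[] for _ in range(n)]
    for t in range(1, T + 1):
        for i in range(n):
            if Pt[i][i] > 0:
                return_times[i].append(t)
        Pt = mat_mul(Pt, P)
    # gcd over collected return times (0 if no return observed by T)
    return [reduce(gcd, ts, 0) for ts in return_times]

P = [[0, 0, 0.6, 0.4],
     [0, 0, 0.3, 0.7],
     [0.5, 0.5, 0, 0],
     [0.2, 0.8, 0, 0]]
print(periods(P))  # [2, 2, 2, 2]
```

Every state has period 2, confirming that the chain alternates between the groups {E1, E2} and {E3, E4}.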

    D) Absorbing states and Absorption probabilities.


    State j is said to be an absorbing state if p_jj = 1; that is, once state j is reached, it is never left.

    - If there is a unique absorbing state k, its steady-state probability is 1 (because all other states are transient and have zero steady-state probability), and it will be reached with probability 1, starting from any initial state.

    - If there are multiple absorbing states, the probability that one of them will eventually be reached is still 1, but the identity of the absorbing state to be entered is random, and the associated probabilities may depend on the starting state.

    Can we determine precisely the absorption probabilities for all the absorbing states of a MC in the generic case?

    Consider a Markov chain X(n) = {Xn, n ≥ 0} with finite state space E = {1, 2, · · · , N} and transition probability matrix P.

    Theorem 10. Let A = {1, · · · , m} be the set of absorbing states and B = {m + 1, · · · , N} be the set of nonabsorbing states. Then the transition probability matrix P can be expressed as

    P = [ I  O ]
        [ R  Q ]

    where I is an m × m identity matrix, O is an m × (N − m) zero matrix, the elements of R are the one-step transition probabilities from nonabsorbing to absorbing states, and the elements of Q are the one-step transition probabilities among the nonabsorbing states.


    Let U = [u_kj] be an (N − m) × m matrix whose elements are the absorption probabilities for the various absorbing states,

    u_kj = P[Xn = j (∈ A), eventually | X0 = k (∈ B)].

    We have

    U = (I − Q)^{−1} R = Φ R,

    where Φ = (I − Q)^{−1} is called the fundamental matrix of the Markov chain X(n).
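A sketch of this computation in pure Python, using the convergent series (I − Q)^{−1} = Σ_{k≥0} Q^k to avoid an explicit matrix inverse. The example chain is the symmetric Gambler's Ruin walk on {0, 1, 2, 3} with absorbing states A = {0, 3} and nonabsorbing states B = {1, 2} (cf. Example 2.6); the function names are ours:

```python
# Absorption probabilities U = (I - Q)^{-1} R, computed via the series
# U = (I + Q + Q^2 + ...) R, iterated until the added term is negligible.

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def absorption_probs(Q, R, tol=1e-12):
    """Sum the series R + QR + Q^2 R + ... to numerical convergence."""
    U, term = R, R
    while max(abs(x) for row in term for x in row) > tol:
        term = mat_mul(Q, term)   # next term Q^k R
        U = mat_add(U, term)
    return U

Q = [[0.0, 0.5], [0.5, 0.0]]   # transitions within B = {1, 2}
R = [[0.5, 0.0], [0.0, 0.5]]   # transitions from B into A = {0, 3}
U = absorption_probs(Q, R)
print([[round(x, 6) for x in row] for row in U])
```

From state 1 (one dollar) the walk is absorbed at 0 with probability 2/3 and at 3 with probability 1/3, matching the classical k/N formula for the symmetric ruin problem.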

    2.4 Limiting probabilities & Stationary distributions

    From now on we assume that all MCs are finite, aperiodic and irreducible. The irreducibility assumption implies that any state can eventually be reached from any other state. Both the irreducibility and aperiodicity assumptions hold for essentially all practical applications of MCs (in bioinformatics, ...) except for the case of MCs with absorbing states.

    Definition 11. The vector p* = (p*_1, p*_2, · · · , p*_s) is called the stationary distribution of a Markov chain {Xn, n ≥ 0} with state transition matrix P if:

    p* P = p*.

    This equation indicates that a stationary distribution p* is a left eigenvector of P with eigenvalue 1. In general, we wish to know the limiting probabilities p(∞) obtained by taking n → ∞ in the equation

    p(∞) = lim_{n→∞} p(0) P^n.
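A stationary distribution can be approximated by simply iterating p(n+1) = p(n) P until it stops changing (power iteration). A sketch for the two-state Coopmart chain, where the exact answer is p* = (b, a)/(a + b) = (0.15, 0.12)/0.27:

```python
# Power iteration toward the stationary distribution p* P = p*.

def step(p, P):
    """One transition of the row vector p: p -> p P."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(p))]

P = [[0.88, 0.12], [0.15, 0.85]]
p = [0.6, 0.4]                     # any starting distribution works
for _ in range(500):
    nxt = step(p, P)
    if max(abs(a - b) for a, b in zip(nxt, p)) < 1e-14:
        break
    p = nxt

print([round(x, 6) for x in p])    # close to [0.555556, 0.444444]
```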


    We need some general results to determine the stationary distribution p* and limiting probabilities p(∞) of a Markov chain. For the following specific class of MCs, a stationary distribution exists.

    Lemma 12. If M = (Q, , P) is a finite, aperiodic and irreducible Markov chain, then some power of P is strictly positive.

    See a proof in [7], page 79. Such matrices P (for which there exists a natural m such that P^m > 0) are called regular matrices.

    Theorem 13. [Equilibrium distribution] Given a finite, aperiodic and irreducible Markov chain M = (Q, , P), where Q consists of s states. Then there exist stationary probabilities

    p*_i := lim_{t→∞} p_i(t),

    where the p*_i form a unique solution to the conditions:

    Σ_{i=1}^{s} p*_i = 1, where each p*_i ≥ 0;

    p*_j = Σ_{i=1}^{s} p*_i p_ij.

    See the proof in Theorem 19. We discuss here two particular cases, when s = 2 and s > 2.

    A) Markov chains that have two states.

    At first we investigate the case of Markov chains that have two states, say Q = {e1, e2}. Let a = p_{e1 e2} and b = p_{e2 e1} be the state transition probabilities between the distinct states of a two-state Markov chain; its state transition matrix is

    P = [ p11  p12 ]  =  [ 1 − a    a   ],   where 0 < a, b < 1.
        [ p21  p22 ]     [   b    1 − b ]


    Proposition 14.

    a) The n-step transition probability matrix is given by

    P^(n) = P^n = 1/(a + b) [ b  a ]  +  (1 − a − b)^n / (a + b) [  a  −a ]
                            [ b  a ]                             [ −b   b ]

    b) Find the limit matrix when n → ∞.

    To prove this basic Proposition 14 (computing the transition probability matrix of two-state Markov chains), we use a fundamental result of Linear Algebra that is recalled in Section 2.6.

    Proof. The eigenvalues of the state transition matrix P, found by solving the equation

    c(λ) = |λI − P| = 0,

    are λ1 = 1 and λ2 = 1 − a − b. The spectral decomposition of a square matrix says P can be decomposed into two constituent matrices E1, E2 (since only two eigenvalues were found):

    E1 = 1/(λ1 − λ2) [P − λ2 I],    E2 = 1/(λ2 − λ1) [P − λ1 I].

    These E1, E2 are mutually orthogonal idempotents, i.e. E1 E2 = 0 = E2 E1, and

    P = λ1 E1 + λ2 E2;    E1^2 = E1, E2^2 = E2.

    Hence,

    P^n = λ1^n E1 + λ2^n E2 = E1 + (1 − a − b)^n E2,

    or


    P^(n) = P^n = 1/(a + b) [ b  a ]  +  (1 − a − b)^n / (a + b) [  a  −a ]
                            [ b  a ]                             [ −b   b ]

    b) The limit matrix when n → ∞:

    lim_{n→∞} P^n = 1/(a + b) [ b  a ]
                              [ b  a ]
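The closed form of Proposition 14 can be checked against direct matrix multiplication. A sketch with a = 0.12, b = 0.15 and n = 7 (values and variable names are our illustration):

```python
# Compare P^n by repeated multiplication with the closed form
#   P^n = (1/(a+b)) [[b, a], [b, a]] + ((1-a-b)^n/(a+b)) [[a, -a], [-b, b]].

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

a, b, n = 0.12, 0.15, 7
P = [[1 - a, a], [b, 1 - b]]

direct = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(n):
    direct = mat_mul(direct, P)

lam = (1 - a - b) ** n
closed = [[(b + lam * a) / (a + b), (a - lam * a) / (a + b)],
          [(b - lam * b) / (a + b), (a + lam * b) / (a + b)]]

ok = all(abs(direct[i][j] - closed[i][j]) < 1e-12
         for i in range(2) for j in range(2))
print(ok)  # True
```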

    B) Markov chains that have more than two states.

    For s > 2 it is cumbersome to compute the constituent matrices Ei of P, so we could employ the so-called regular property.

    Definition 15. A Markov chain is regular if there exists m ∈ N such that

    P^(m) = P^m > 0

    (i.e. every matrix entry is positive).

    In summary, in a DTMC M that has more than two states, we have 4 cases:

    Fact 2.2.

    1. M is irreducible and positive recurrent, but has periodic states. The component p*_i of the stationary distribution vector must then be understood as the long-run proportion of time that the process is in state i.


    2. M has several closed, positive recurrent classes. In this case, the transition matrix of the DTMC takes a block form. In contrast to the irreducible ergodic DTMC, where the limiting distribution is independent of the initial state, a DTMC with several closed, positive recurrent classes has a limiting distribution that depends on the initial state.

    3. M has both recurrent and transient classes. In this situation, we often seek the probabilities that the chain is eventually absorbed by the different recurrent classes. See the well-known Gambler's Ruin problem.

    4. M is an irreducible DTMC with null recurrent or transient states. This case is only possible when the state space is infinite, since any finite-state, irreducible DTMC must be positive recurrent. In this case, neither the limiting distribution nor the stationary distribution exists. A well-known example of this case is the random walk model.

    Practical Problem 3. Consider a Markov chain with state space {0, 1, 2} and transition probability matrix

    P = [ 0  0.5  0.5 ]
        [ 1  0    0   ]
        [ 1  0    0   ]

    Show that state 0 is periodic with period 2.


    Practical Problem 4 (The Gambler's Ruin problem). Let two gamblers, A and B, initially have k dollars and m dollars, respectively. Suppose that at each round of their game, A wins one dollar from B with probability p and loses one dollar to B with probability q = 1 − p. Assume that A and B play until one of them has no money left. Let Xn be A's capital after round n, where n = 0, 1, 2, · · · and X0 = k.

    (a) Show that X(n) = {Xn, n ≥ 0} is a Markov chain with absorbing states.

    (b) Find its transition probability matrix P. Realize P when p = q = 1/2 and N = 4.

    (c*) What is the probability of A losing all his money?
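For part (c*), a sketch that solves the first-step equations u_k = q u_{k−1} + p u_{k+1}, u_0 = 1, u_N = 0, by fixed-point iteration rather than in closed form (function name and iteration count are our choices):

```python
# Ruin probability u_k = P[A loses all money | X_0 = k] by iterating the
# first-step (harmonic) equations until they stabilize.

def ruin_probabilities(p, N, iters=20000):
    q = 1.0 - p
    u = [0.0] * (N + 1)
    u[0] = 1.0                     # state 0: A is already ruined
    for _ in range(iters):
        for k in range(1, N):
            u[k] = q * u[k - 1] + p * u[k + 1]
    return u

# Symmetric game, N = 4: the known answer is u_k = 1 - k/N.
u = ruin_probabilities(0.5, 4)
print(round(u[2], 6))  # 0.5
```

With p = q = 1/2 and k = 2 out of N = 4 dollars in play, A is ruined with probability 1 − k/N = 1/2, as the iteration confirms.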

    2.5 Theory of stochastic matrix for MC

    A stochastic matrix is a matrix for which each row sum equals one. If the column sums also equal one, the matrix is called doubly stochastic. Hence the transition probability matrix P = [p_ij] is a stochastic matrix.

    Proposition 16. Every stochastic matrix K has

    - 1 as an eigenvalue (possibly with multiplicity), and

    - no eigenvalue exceeding 1 in absolute value; that is, all eigenvalues λ_i satisfy |λ_i| ≤ 1.

    Proof. Let e = [1, 1, · · · , 1]^t. Since each row of K sums to one, Ke = e, so 1 is an eigenvalue. For any eigenpair Kx = λx, pick an index i with |x_i| maximal; then |λ| |x_i| = |Σ_j k_ij x_j| ≤ Σ_j k_ij |x_i| = |x_i|, hence |λ| ≤ 1.


    The spectral radius ρ(K) of any square matrix K is defined as

    ρ(K) = max_i {|λ_i| : λ_i an eigenvalue of K}.

    When K is stochastic, ρ(K) = 1. Note that if P is a transition matrix for a finite-state Markov chain (so P is stochastic), the multiplicity of the eigenvalue ρ(P) = 1 is equal to the number of recurrent classes associated with P.

    Fact 2.3. If K is a stochastic matrix then K^m is a stochastic matrix.

    Proof. Let e = [1, 1, · · · , 1]^t be the all-one vector, then use the fact that Ke = e. Prove that K^m e = e.

    Let A = [a_ij] > 0 denote that every element a_ij of A satisfies the condition a_ij > 0.

    Definition 17.

    - A stochastic matrix P = [p_ij] is ergodic if lim_{m→∞} P^m = L (say) exists, that is, each p_ij^(m) has a limit when m → ∞.

    - A stochastic matrix P is regular if there exists a natural m such that P^m > 0. In our context, a Markov chain with transition probability matrix P is called regular if there exists an m > 0 such that P^m > 0, i.e. there is a finite positive integer m such that after m time-steps, every state has a nonzero chance of being occupied, no matter what the initial state.

    Example 2.3. Is the matrix

    P = [ 0.88  0.12 ]
        [ 0.15  0.85 ]

    regular? ergodic? Calculate the limit matrix L = lim_{m→∞} P^m.


    (p is called a stationary distribution of the MC). Your final task is proving that L's rows are identical and equal to the stationary distribution p, i.e.: L = [p; · · · ; p].

    Corollary 20. A few important remarks are: (a) for a regular MC, the long-term behavior does not depend on the initial state distribution probabilities p(0); (b) in general, the limiting distributions are influenced by the initial distribution p(0) whenever the stochastic matrix P = [p_ij] is ergodic but not regular. (See more at problem D.)

    Example 2.4. Consider a Markov chain with two states and transition probability matrix

    P = [ 3/4  1/4 ]
        [ 1/2  1/2 ]

    (a) Find the stationary distribution p of the chain. (b) Find P^n. (c) Find lim_{n→∞} P^n.


    2.6 Spectral Theorem for Diagonalizable Matrices

    Consider a square matrix P of order s with spectrum σ(P) = {λ1, λ2, · · · , λk} consisting of its eigenvalues. Then:

    - If {(λ1, x1), (λ2, x2), · · · , (λk, xk)} are eigenpairs for P, then S = {x1, · · · , xk} is a linearly independent set. If Bi is a basis for the null space N(P − λi I), then B = B1 ∪ B2 ∪ · · · ∪ Bk is a linearly independent set.

    - P is diagonalizable if and only if P possesses a complete set of eigenvectors (i.e. a set of s linearly independent eigenvectors). Moreover, H^{−1} P H = D = diag(λ1, λ2, · · · , λs) if and only if the columns of H constitute a complete set of eigenvectors and the λj are the associated eigenvalues, i.e., each (λj, H[:, j]) is an eigenpair for P.

    Spectral Theorem for Diagonalizable Matrices. A square matrix P of order s with spectrum σ(P) = {λ1, λ2, · · · , λk} consisting of eigenvalues is diagonalizable if and only if there exist constituent matrices {E1, E2, · · · , Ek} (called the spectral set) such that

    P = λ1 E1 + λ2 E2 + · · · + λk Ek,    (2.6.1)

    where the Ei have the following properties:

    - Ei Ej = 0 whenever i ≠ j, and Ei^2 = Ei for all i = 1..k;

    - E1 + E2 + · · · + Ek = I.


    In practice we employ the decomposition 2.6.1 in two ways:

    Way 1: if we know the decomposition 2.6.1 explicitly, then we can compute powers

    P^m = λ1^m E1 + λ2^m E2 + · · · + λk^m Ek, for any integer m > 0.    (2.6.2)

    Way 2: if we know P is diagonalizable, then we find the constituent matrices Ei by:

    * finding the nonsingular matrix H = (x1 | x2 | · · · | xk), where each xi is a basis (right) eigenvector of the null subspace

    N(P − λi I) = {v : (P − λi I) v = 0 ⟺ P v = λi v};

    ** then, P = H D H^{−1} = (x1 | x2 | · · · | xk) D H^{−1}, where D = diag(λ1, · · · , λk) is the diagonal matrix, and H^{−1} has rows

    H^{−1} = [ y1^t ]
             [ y2^t ]
             [ ...  ]
             [ yk^t ]    (i.e. K = (y1 | y2 | · · · | yk), H^{−1} = K^t).

    Here each yi is a basis (left) eigenvector, i.e. a vector of the null subspace

    N(P^t − λi I) = {v : v^t P = λi v^t}.

    The constituent matrices are Ei = xi yi^t.

    Example 2.5. Diagonalize the following matrix and provide its spectral decomposition.

    P = [  1   −4  −4 ]
        [  8  −11  −8 ]
        [ −8    8   5 ]

    The characteristic equation is

    p(λ) = det(P − λI) = −(λ^3 + 5λ^2 + 3λ − 9) = 0.

    So λ = 1 is a simple eigenvalue, and λ = −3 is repeated twice (its algebraic multiplicity is 2). Any set of vectors x satisfying

    x ∈ N(P − λI) ⟺ (P − λI) x = 0

    can be taken as a basis of the eigenspace (or null space) N(P − λI). Bases for the eigenspaces are:

    N(P − 1I) = span{ [1, 2, −2]^t };  N(P + 3I) = span{ [1, 1, 0]^t, [1, 0, 1]^t }.

    It is easy to check that these three eigenvectors xi form a linearly independent set, so P is diagonalizable. The nonsingular matrix (also called the similarity transformation matrix)

    H = (x1 | x2 | x3) = [  1  1  1 ]
                         [  2  1  0 ]
                         [ −2  0  1 ]

    will diagonalize P, and since P = H D H^{−1} we have

    H^{−1} P H = D = diag(λ1, λ2, λ2) = diag(1, −3, −3) = [ 1   0   0 ]
                                                          [ 0  −3   0 ]
                                                          [ 0   0  −3 ]

    Here,

    H^{−1} = [  1  −1  −1 ]
             [ −2   3   2 ]
             [  2  −2  −1 ]

    implies that


    y1^t = [1, −1, −1], y2^t = [−2, 3, 2], y3^t = [2, −2, −1]. Therefore, the constituent matrices are

    E1 = x1 y1^t = [  1  −1  −1 ]   E2 = x2 y2^t = [ −2  3  2 ]   E3 = x3 y3^t = [ 2  −2  −1 ]
                   [  2  −2  −2 ]                  [ −2  3  2 ]                  [ 0   0   0 ]
                   [ −2   2   2 ]                  [  0  0  0 ]                  [ 2  −2  −1 ]

    Obviously,

    P = λ1 E1 + λ2 E2 + λ3 E3 = E1 − 3 E2 − 3 E3 = [  1   −4  −4 ]
                                                   [  8  −11  −8 ]
                                                   [ −8    8   5 ]
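The identities claimed for the spectral set can be verified directly. A sketch (helper names are ours) checking idempotence, mutual annihilation, resolution of the identity, and the decomposition itself for Example 2.5:

```python
# Verify the constituent matrices of Example 2.5:
# E_i^2 = E_i, E_i E_j = 0 (i != j), E1 + E2 + E3 = I, E1 - 3 E2 - 3 E3 = P.

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

P  = [[1, -4, -4], [8, -11, -8], [-8, 8, 5]]
E1 = [[1, -1, -1], [2, -2, -2], [-2, 2, 2]]
E2 = [[-2, 3, 2], [-2, 3, 2], [0, 0, 0]]
E3 = [[2, -2, -1], [0, 0, 0], [2, -2, -1]]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
S  = [[E1[i][j] + E2[i][j] + E3[i][j] for j in range(3)] for i in range(3)]
Pr = [[E1[i][j] - 3*E2[i][j] - 3*E3[i][j] for j in range(3)] for i in range(3)]

print(close(mat_mul(E1, E1), E1),         # idempotent
      close(mat_mul(E1, E2), [[0]*3]*3),  # mutually annihilating
      close(S, I3),                       # sum to the identity
      close(Pr, P))                       # spectral decomposition
```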

    2.7 Markov Chains with Absorbing States

    2.7.1 Theory

    Two questions:

    1/ if there are at least two absorbing states, what is the probability that a specific absorbing state is the one eventually entered?

    2/ what is the mean time until an absorbing state is eventually entered?

    Question 1. The probability that a specific absorbing state is the one eventually entered.

    Theorem 21. Consider a Markov chain X(n) = {Xn, n ≥ 0} with finite state space E = {1, 2, · · · , N} and transition probability matrix P. Let A = {1, · · · , m} be the set of absorbing states and B = {m + 1, · · · , N} be the set of nonabsorbing states.


    we could equivalently check that absorption of X(n) in one or another of the absorbing states is certain. Formally, you could prove

    Lemma 22.

    lim_{n→∞} P[Xn ∈ B] = 0, or equivalently lim_{n→∞} P[Xn ∈ A] = 1.

    Question 2. The mean time until an absorbing state is eventually entered.

    Let Tk denote the total time units (or steps) to absorption from state k (meaning X0 = k), where k = m + 1..N. Let

    T = [T_{m+1}, T_{m+2}, · · · , T_N].

    Then it can be shown that the mean time E(Tk) to absorption from state k is

    E(Tk) = Σ_{i=m+1}^{N} Φ[k, i],

    where Φ[k, i] is the (k, i)th element of the fundamental matrix Φ.

    Proof. Let W = [n_jk], where n_jk is the number of times state k (∈ B) is occupied until absorption takes place when Xn starts in state j (∈ B). Then

    Tj = Σ_{k=m+1}^{N} n_jk;

    then calculate E(n_jk).
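A sketch of the mean-time formula, computing the fundamental matrix Φ = (I − Q)^{−1} as the series Σ_{k≥0} Q^k and summing its rows. The chain is again the symmetric Gambler's Ruin on {0, 1, 2, 3} with B = {1, 2}, where the known answer is E(Tk) = k(N − k) = 2 from either interior state:

```python
# Mean time to absorption E(T_k) = sum_i Phi[k, i], with
# Phi = I + Q + Q^2 + ... summed to numerical convergence.

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

def fundamental(Q, tol=1e-13):
    n = len(Q)
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # Q^0 = I
    Phi = [row[:] for row in term]
    while max(abs(x) for row in term for x in row) > tol:
        term = mat_mul(term, Q)
        Phi = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(Phi, term)]
    return Phi

Q = [[0.0, 0.5], [0.5, 0.0]]     # transitions within B = {1, 2}
Phi = fundamental(Q)
mean_times = [sum(row) for row in Phi]
print([round(t, 6) for t in mean_times])  # [2.0, 2.0]
```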


    Example 2.6. Consider a simple random walk X(n) with absorbing barriers at state 0 and state N = 3 = mA + mB as in the Gambler's Ruin problem, where mA = 2 USD is A's capital and mB = 1 USD is B's capital at round 0. Can you write out

    a/ the transition probability matrix P, knowing that p = P[A wins] in each round, where 0 < p < 1?


    2.8 Chapter Review and Discussion

    Application in Large Deviation theory. We are interested in a practical situation in the insurance industry, originally recognized in 1932 by F. Esscher (Notices of the AMS, Feb 2008).

    Problem: too many claims could be made against the insurance company; we worry about the total claim amount exceeding the reserve fund set aside for paying these claims.

    Our aim: to compute the probability of this event.

    Modeling. Each individual claim is a random variable, and we assume some distribution for it; the total claim is then the sum S of a large number of (independent or not) random variables. The probability that this sum exceeds a certain reserve amount is the tail probability of the sum S of random variables.

    Large Deviation theory, invented by Esscher, requires the calculation of moment generating functions. If your random variables are independent, then the moment generating function is the product of the individual ones; but if they are not (as in a Markov chain), then there is no longer just one moment generating function!

    Research project: study Large Deviation theory to solve this problem.

    Practical Problem 5 (Brand switching model for consumer behavior). Suppose there are several brands of a product competing in a market (for example, those brands might be competing brands of soft drinks). Assume that every week a consumer buys one of three brands, labeled as 1, 2, and 3. In each week, a consumer may either buy the same brand he bought the previous week or switch to a different brand. A consumer's preference can be influenced by many factors, such as brand loyalty and brand pressure (i.e., a consumer is persuaded to purchase the same brand). To gauge consumer behavior, sample surveys are frequently conducted. Suppose that one such survey identifies the following consumer behavior:

                              Following week
    Current week     Brand 1     Brand 2     Brand 3
    Brand 1          0.51        0.35        0.14
    Brand 2          0.12        0.80        0.08
    Brand 3          0.03        0.05        0.92

    The market share of a brand during a period is defined as the average proportion of people who buy the brand during the period. Our questions are:

    a/ What is the market share of a specific brand in a short run (say in 3 months) or in a long run (say in 3 years)?

    b/ How does repeat business, due to brand loyalty and brand pressure, affect a company's market share and profitability?

    c/ What is the expected number of weeks that a consumer stays with a particular brand?
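A sketch for parts a/ and c/, treating the survey table as a transition matrix: the long-run shares are the stationary distribution (found here by power iteration), and the expected number of consecutive weeks with brand i follows a geometric law with mean 1/(1 − p_ii). The iteration count is our choice:

```python
# Brand switching: long-run market shares and expected holding times.

P = [[0.51, 0.35, 0.14],
     [0.12, 0.80, 0.08],
     [0.03, 0.05, 0.92]]

pi = [1/3, 1/3, 1/3]            # any starting distribution works
for _ in range(2000):           # power iteration pi <- pi P
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

# Expected consecutive weeks with brand i: geometric with mean 1/(1 - p_ii).
holding = [1 / (1 - P[i][i]) for i in range(3)]
print([round(x, 4) for x in pi], [round(h, 2) for h in holding])
```

In the long run brand 3 dominates with share 56/103 ≈ 0.544 despite starting from any initial split, and a consumer stays with brand 3 for 12.5 weeks on average versus about 2 weeks for brand 1.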


    Chapter 3

    Random walks & Wiener process

    Random walks are special cases of Markov chains, and thus can be studied by Markov chain methods.

    3.1 Introduction to Random Walks

    We use random walks to supply the mathematical base for BLAST. BLAST is a procedure often employed in Biomatics that

    - searches for high-scoring local alignments between two sequences, then

    - tests for significance of the scores found via P-values.

    Example 3.1. Consider a simple case of the two aligned DNA sequences

    ggagactgtagacagctaatgctata
    gaacgccctagccacgagcccttatc


    Suppose we give

    - a score +1 if the two nucleotides in corresponding positions are the same, and

    - a score −1 if they are different.

    When we compare the two sequences from left to right, the accumulated score performs a random walk, or more precisely a simple random walk in one dimension. The following theory addresses the generic case, but we will use this example and BLAST as a running example.

    3.2 Random Walk - a mathematical realization

    Let Z1, Z2, · · · be independent identically distributed r.v.s with

    P(Zn = 1) = p and P(Zn = −1) = q = 1 − p

    for all n. Let

    Xn = Σ_{i=1}^{n} Zi, n = 1, 2, · · · , and X0 = 0.

    The collection of r.v.s {Xn, n ≥ 0} is a random process, and it is called the simple random walk in one dimension.

    (a) Describe the simple random walk X(n).

    (b) Construct a typical sample sequence (or realization) of X(n).

    (c) Find the probability that X(n) = −2 after four steps.


    (d) Verify the result of part (c) by enumerating all possible sample sequences that lead to the value X(n) = −2 after four steps.

    (e) Find the mean and variance of the simple random walk X(n). Find the autocorrelation function RX(n, m) of the simple random walk X(n).

    (f) Show that the simple random walk X(n) is a Markov chain.

    (g) Find its one-step transition probabilities.

    (h) Derive the first-order probability distribution of the random walk X(n).

    Solution.

    (a) Describe the simple random walk. X(n) is a discrete-parameter (or discrete-time), discrete-state random process. The state space is E = {. . . , −2, −1, 0, 1, 2, . . .}, and the index parameter set is T = {0, 1, 2, . . .}.

    (b) Typical sample sequence. A sample sequence x(n) of a simple random walk X(n) can be produced by tossing a coin every second and letting x(n) increase by unity if a head H appears and decrease by unity if a tail T appears. Thus, for instance, we have a small realization of X(n) in Table 3.1. The sample sequence x(n) obtained this way can be plotted in the (n, x(n))-plane. The simple random walk X(n) specified in this problem is said to be unrestricted because there are no bounds on the possible values of Xn. The simple random walk process is often used in Game Theory or Biomatics.


    n             0  1  2   3  4  5  6  7  8  9  10
    Coin tossing     H  T   T  H  H  H  T  H  H  T
    xn            0  1  0  −1  0  1  2  1  2  3  2

    Table 3.1: Simple random walk from Coin tossing

    Remark 3.1. We define the ladder points to be the points in the walk lower than any previously reached point. An excursion in a walk is the part of the walk from a ladder point to the highest point attained before the next ladder point.

    BLAST theory focuses on the maximum heights achieved by these excursions.

    (c) The probability that X(n) = −2 after four steps.

    We compute the first-order probability distribution of the random walk X(n):

    pn(k) = P(Xn = k), with boundary conditions p0(0) = 1, and pn(k) = 0 if n < |k|.


    When X(n) = k, the number A of +1 steps among the first n satisfies

    A = (n + k)/2,

    and A is a binomial r.v. with parameters (n, p), so that

    pn(k) = P(Xn = k) = C(n, (n + k)/2) p^{(n+k)/2} q^{(n−k)/2}.    (3.2.1)

    Conclude: the probability distribution of X(n) is given by 3.2.1, in which n ≥ |k|, and n, k must be both even or both odd.

    Set k = −2 and n = 4 in 3.2.1 to get the concerned probability p4(−2) that X(4) = −2.
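A sketch checking parts (c) and (d) at once: the binomial formula against brute-force enumeration of all 2^4 step sequences, here with p = q = 1/2:

```python
# P(X_4 = -2) two ways: via p_n(k) = C(n, (n+k)/2) p^((n+k)/2) q^((n-k)/2),
# and by enumerating every sequence of four +/-1 steps.

from itertools import product
from math import comb

def p_n_k(n, k, p):
    """First-order distribution of the simple random walk."""
    if (n + k) % 2 or abs(k) > n:
        return 0.0
    a = (n + k) // 2                       # number of +1 steps
    return comb(n, a) * p**a * (1 - p)**(n - a)

p = 0.5
formula = p_n_k(4, -2, p)
brute = sum(p**4 for steps in product([1, -1], repeat=4) if sum(steps) == -2)
print(formula, brute)  # 0.25 0.25
```

Both give C(4, 1)(1/2)^4 = 4/16 = 0.25, coming from the four sequences with exactly one +1 step.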

    (d) Verify the result of part (c) by enumerating all possible sample sequences that lead to the value X(n) = −2 after four steps. DIY!

    (e) The mean and variance of the simple random walk X(n). Use the fact

    P(Zn = +1) = p and P(Zn = −1) = 1 − p.


    3.3 Wiener process

    Counting process. A random process {X(t), t ≥ 0} is said to be a

    counting process if X(t) represents the total number of events that

    have occurred in the interval (0, t). From its definition, we see that for

    a counting process, X(t) must satisfy the following conditions:

    - X(t) ≥ 0 and X(0) = 0.

    - X(t) is integer valued.

    - X(s) ≤ X(t) if s < t.

    - X(t) - X(s) equals the number of events that have occurred in the

      interval (s, t).

    Independent increments and stationary increments. A counting

    process X(t) is said to possess independent increments if the numbers of

    events which occur in disjoint time intervals are independent.

    A counting process X(t) is said to possess stationary increments if X(t +

    h) - X(s + h) (the number of events in the interval (s + h, t + h)) has the

    same distribution as X(t) - X(s) (the number of events in the interval

    (s, t)), for all s < t and h > 0.

    Wiener process. A random process {X(t), t ≥ 0} is called a Wiener

    process if

    1. X(t) has stationary independent increments,

    2. the increment X(t) - X(s) (t > s) is normally distributed,


    3. E[X(t)] = 0, and

    4. X(0) = 0.

    The Wiener process is also known as the Brownian motion process, since

    it originated as a model for Brownian motion, the motion of particles

    suspended in a fluid.

    Definition 23. A random process {X(t), t ≥ 0} is called a Wiener

    process with drift coefficient μ if

    1. X(t) has stationary independent increments,

    2. X(t) is normally distributed with mean E[X(t)] = μt, and

    3. X(0) = 0.
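Definition 23 suggests a straightforward discrete-time simulation. The sketch below is an illustration, not part of the notes, and it assumes the usual convention Var[X(t)] = σ²t (which the notes do not state explicitly): a path is built by summing independent stationary normal increments.

```python
# Simulate a discretized Wiener process with drift mu on [0, T]:
# each increment over a step of length dt is N(mu*dt, sigma^2*dt).
import random

def wiener_path(T=1.0, n=1000, mu=0.0, sigma=1.0, seed=1):
    rng = random.Random(seed)
    dt = T / n
    x, path = 0.0, [0.0]                         # X(0) = 0
    for _ in range(n):
        x += mu * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

path = wiener_path(mu=2.0)
print(path[0], len(path))    # starts at 0, with n + 1 grid points
```

Averaging the endpoint X(T) over many independent paths should give approximately μT, matching E[X(t)] = μt in the definition.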


    Chapter 4

    Arrival-Type processes

    4.1 Introduction

    In Stochastic processes, we are interested in a few distinct properties:

    (a) the dependencies in the sequence of values generated by the

    process. For example, how do future prices of a stock depend on

    past values?

    (b) long-term averages, involving the entire sequence of generated

    values. For example, what is the fraction of time that a machine

    is idle?

    (c) the likelihood or frequency of certain boundary events. For

    example, what is the probability that within a given hour all cir-

    cuits of some telephone system become simultaneously busy?


    In this chapter, we will discuss the first major category of stochastic

    processes, Arrival-Type Processes. We are interested in occurrences that

    have the character of an arrival, such as

    - message receptions at a receiver,

    - job completions in a manufacturing cell,

    - customer purchases at a store, etc.

    We will focus on models in which the interarrival times (the times be-

    tween successive arrivals) are independent random variables.

    First, we consider the case where arrivals occur in discrete time

    and the interarrival times are geometrically distributed: this is the

    Bernoulli process.

    Then we consider the case where arrivals occur in continuous time

    and the interarrival times are exponentially distributed: this is the

    Poisson process.

    4.2 The Bernoulli process

    4.2.1 Basic facts

    The Bernoulli process can be visualized as a sequence of independent

    coin tosses, where the probability of heads in each toss is a fixed number

    p in the range 0 < p < 1. In general, the Bernoulli process consists of

    a sequence of Bernoulli trials, where each trial produces

    - a 1 (a success) with probability p, and

    - a 0 (a failure) with probability 1 - p, independently of what happens

      in other trials.


    There are many realizations of the Bernoulli process. Coin tossing is just a

    paradigm involving a sequence of independent binary outcomes. The se-

    quence Z1, Z2, . . . of independent, identically distributed r.v.'s in Section

    3 is another paradigm for the same phenomenon.

    In practice, a Bernoulli process is often used to model systems involving

    arrivals of customers or jobs at service centers. Here, time is discretized

    into periods, and a success at the k-th trial is associated with the arrival

    of at least one customer at the service center during the k-th period. In

    fact, we will often use the term arrival in place of success when this is

    justified by the context.

    Given an arrival process, one is often interested in random variables such

    as the number of arrivals within a certain time period, or the time until

    the first arrival. For the case of a Bernoulli process, some answers are

    already available from earlier chapters. Here is a summary of the main

    facts.

    Bernoulli Distribution B(p) describes a random variable that can take

    only two possible values, X ∈ {0, 1}. The distribution is described

    by the probability function

    p(1) = P(X = 1) = p,   p(0) = P(X = 0) = 1 - p,   for some p ∈ [0, 1].

    It is easy to check that E(X) = p, Var(X) = p(1 - p).


    4.2.2 Random Variables Associated with the

    Bernoulli Process

    Binomial distribution B(n, p). This distribution describes a random

    variable X that is the number of successes in n independent Bernoulli trials

    with probability of success p.

    In other words, X is a sum of n independent Bernoulli r.v.'s. Therefore,

    X takes values in {0, 1, . . . , n} and the distribution is given by the

    probability function

    p(k) = P(X = k) = C(n, k) p^k (1 - p)^(n-k),   k = 0, 1, . . . , n.

    It is easy to check that E(X) = np, Var(X) = np(1 - p).
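These facts are easy to verify numerically; the following sketch (illustrative, with one arbitrarily chosen pair (n, p)) checks the pmf's normalization, mean, and variance:

```python
# The number of arrivals in n slots of a Bernoulli process is B(n, p);
# check that the pmf sums to 1 with mean np and variance np(1 - p).
from math import comb

def binom_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

n, p = 12, 0.35
pmf = binom_pmf(n, p)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum(k * k * q for k, q in enumerate(pmf)) - mean**2
print(round(sum(pmf), 10), round(mean, 6), round(var, 6))   # 1.0 4.2 2.73
```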

    4.3 The Poisson process

    4.3.1 Poisson distribution

    This is another discrete probability distribution, used to determine the

    probability of a designated number of successes per unit of time when the

    successes/events are independent and the average number of successes

    per unit of time remains constant. The Poisson distribution is

    p(x) = e^(-λ) λ^x / x!,   x = 0, 1, 2, . . .   (4.3.1)

    where

    x = designated number of successes, e ≈ 2.718 is the natural base,


    λ > 0 is a constant: the average number of successes per unit of time

    period. The Poisson distribution's mean and variance are

    μ = λ;   σ² = λ.

    Example 4.1 (Poisson distribution usage). We often model the number

    of defects or non-conformities that occur in a unit of product (unit area,

    volume, and most frequently unit of time...), say, a semiconductor device,

    by a Poisson distribution. The number of wire-bonding defects per unit

    X is Poisson distributed with parameter λ = 4. Compute the probability

    that a randomly selected semiconductor device will contain two or fewer

    wire-bonding defects.

    This probability is

    P(X ≤ 2) = p(0) + p(1) + p(2) = Σ (x = 0 to 2) e^(-4) 4^x / x! = 0.2381.
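The figure 0.2381 can be reproduced directly:

```python
# Reproducing Example 4.1: P(X <= 2) for X ~ Poisson(lambda = 4).
from math import exp, factorial

lam = 4.0
prob = sum(exp(-lam) * lam**x / factorial(x) for x in range(3))
print(round(prob, 4))   # 0.2381
```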

    4.3.2 Poisson process

    The Poisson process can be viewed as a continuous-time analog of the

    Bernoulli process and applies to situations where there is no natural way

    of dividing time into discrete periods. We consider an arrival process

    that evolves in continuous time, in the sense that any real number t is a

    possible arrival time.

    Definition 24. A counting process X(t) is said to be a Poisson (count-

    ing) process with positive rate (or intensity) λ if

    - X(0) = 0, and X(t) has independent increments.


    - The number of events in any interval of length t is Poisson dis-

      tributed with mean λt; that is, for all s, t > 0,

    P[X(t + s) - X(s) = n] = e^(-λt) (λt)^n / n!,   n = 0, 1, 2, . . .   (4.3.2)
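Equation (4.3.2) can be illustrated by simulation. The sketch below is an illustrative assumption, not part of the notes: it builds the process from i.i.d. Exponential(λ) interarrival times, as suggested by the chapter introduction, and checks that the mean count over (0, t] is close to λt.

```python
# Simulate a Poisson process of rate lam by summing exponential
# interarrival times, then count arrivals in (0, t].
import random

def poisson_count(lam, t, rng):
    """Number of arrivals in (0, t] for one sample path."""
    arrivals, clock = 0, 0.0
    while True:
        clock += rng.expovariate(lam)
        if clock > t:
            return arrivals
        arrivals += 1

rng = random.Random(7)
counts = [poisson_count(2.0, 3.0, rng) for _ in range(5000)]
print(sum(counts) / len(counts))   # close to lam * t = 6
```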

    4.4 Course Review and Discussion

    Practical Problem 6.

    1. Prove that a Poisson process X(t) with positive rate λ has station-

    ary increments, and

    E[X(t)] = λt,   Var[X(t)] = λt.

    2. Practice. Patients arrive at the doctor's office according to a Pois-

    son process with rate λ = 1/10 per minute. The doctor will not see a

    patient until at least three patients are in the waiting room.

    a/ Find the expected waiting time until the first patient is admitted

    to see the doctor.

    b/ What is the probability that nobody is admitted to see the doctor

    in the first hour?

    Theorem 25. If every eigenvalue of a matrix P yields linearly indepen-

    dent left eigenvectors in number equal to its multiplicity, then

    1. there exists a nonsingular matrix M whose rows are left eigenvec-

    tors of P, such that


    2. D = M P M^(-1) is a diagonal matrix whose diagonal elements are the

    eigenvalues of P, repeated according to multiplicity.
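Theorem 25 can be illustrated numerically. In the sketch below (the 2x2 matrix is a hypothetical example), the rows of M are left eigenvectors of P, obtained as ordinary eigenvectors of the transpose P^T:

```python
# Left eigenvectors of P satisfy m P = lam m, i.e. P^T m^T = lam m^T,
# so the rows of M come from the columns of eig(P.T)'s eigenvector matrix.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])            # a 2x2 stochastic matrix (example)

eigvals, W = np.linalg.eig(P.T)       # columns w of W: P.T w = lam w
M = W.T                               # rows of M: left eigenvectors of P
D = M @ P @ np.linalg.inv(M)

print(np.round(D, 10))                # diagonal; entries = eigenvalues of P
```

For this P the eigenvalues are 1 and 0.5, and D comes out diagonal with those entries, in agreement with the theorem.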

    Practical Problem 7 (MC for Business Intelligence). Consider a case

    study of the mobile phone industry in VN. Due to a most recent survey,

    there are four big mobile producers/sellers N, S, M and L, and their

    market distribution in 2007 is given by the stochastic matrix:

    P =

              N     M     L     S

        N     1     0     0     0

        M     0.4   0     0.6   0

        L     0.2   0     0.1   0.7

        S     0     0     0     1

    - Is P regular? Ergodic?

    - Find the long-term distribution matrix L = lim_(m→∞) P^m.

    - What is your conclusion?

    (Remark that the states N and S are called absorbing states.)
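A numerical sketch for checking your hand computation (the state order (N, M, L, S) follows the table above): approximate L = lim P^m by taking a large matrix power.

```python
# Approximate the long-term distribution matrix of the mobile-market
# chain by computing P^m for large m.
import numpy as np

P = np.array([[1.0, 0.0, 0.0, 0.0],   # N (absorbing)
              [0.4, 0.0, 0.6, 0.0],   # M
              [0.2, 0.0, 0.1, 0.7],   # L
              [0.0, 0.0, 0.0, 1.0]])  # S (absorbing)

L_limit = np.linalg.matrix_power(P, 200)
print(np.round(L_limit, 4))
```

Each row of the result stays a probability distribution, and all long-run mass ends up in the absorbing states N and S.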


    Chapter 5

    Probability Modeling and Mathematical Finance

    Probability modeling in finance provides instruments to rationalize the

    unknown by embedding it into a coherent framework. Three key com-

    ponents should be distinguished: randomness, uncertainty and chaos.

    Kolmogorov defined randomness in terms of non-uniqueness and non-

    regularity (as a die with six faces or the expansion of π). Kalman defined

    chaos as randomness without probability.

    A few areas that employ much probability modeling include: weather fore-

    casting, biology and financial forecasting. In general, in order to model

    uncertainty we seek to distinguish the known from the unknown and find

    some mechanisms (theories, intuition, common sense...) to reconcile our

    knowledge with our lack of it.


    5.1 Martingales

    5.1.1 History

    Girolamo Cardano, in his book The Book of Games of Chance in 1565,

    proposed the notion of a fair game. He stated: "The most fundamental

    principle of all in gambling is simply equal conditions, ...". This is the

    essence of the martingale; however, it was not until 1900, in Bachelier's

    thesis, that a mathematical model of a fair game, or martingale, was proposed.

    Nowadays, we understand the concept of a fair game or martingale, in

    money terms, to state that the expected profit at a given time, given the

    total past capital, is null with probability one.

    Throughout this chapter we assume that (Ω, F, P) is a fixed probability

    space, where

    - Ω is a sample space representing the set of all possible outcomes,

    - F is a σ-algebra of subsets of Ω representing the events to which

      we can assign probabilities, and

    - P is a probability measure on (Ω, F).

    The expectation with respect to P will be denoted by E[.].

    5.1.2 Conditional expectation

    Let X and Z be two r.v.'s on the same (Ω, F, P)-space. Suppose X has

    range {x1, x2, . . . , xm} and Z has range {z1, z2, . . . , zn}. We know that

    P[X = xi | Z = zj] := P[X = xi, Z = zj] / P[Z = zj]

    and also

    E[X | Z = zj] = Σ_i xi P[X = xi | Z = zj].

    Definition 26. The random variable Y = E[X|Z], the conditional ex-

    pectation of X given Z, is defined as follows:

    (a) if Z(ω) = zj, then Y(ω) := E[X | Z = zj] =: yj (say).

    Justification. In this way we partition the space Ω into

    Z-atoms {Z = zj}, on which Z is constant. The σ-algebra G = σ(Z)

    generated by Z consists of the sets {Z ∈ B}, B ∈ B, the Borel sets. Therefore

    G = σ(Z) consists precisely of the 2^n possible unions of the n Z-atoms.

    Note from (a) that Y is constant on Z-atoms, so better we say

    (b) Y is G-measurable.

    Theorem 27 (Kolmogorov 1933). Let (Ω, F, P) be a probability space

    and X a random variable with E[|X|] < ∞. Let G be a sub-σ-algebra of

    F. Then there exists a random variable Y such that

    a) Y is G-measurable,

    b) E[|Y|] < ∞,

    c) for every G ∈ G we have

    ∫_G Y dP = ∫_G X dP.


    Moreover, if Y1 is another random variable with these properties, then

    Y1 = Y almost surely (a.s.), that is, P[Y1 = Y] = 1.

    A random variable Y with properties a)-c) is called a version of the

    conditional expectation E[X|G] of X given G, and we write Y = E[X|G]

    a.s.

    Proof. Since G is generated by Z, any G ∈ G is a union of the n

    Z-atoms, so we first prove that

    ∫_{Z=zj} Y dP = yj P[Z = zj] = . . . = ∫_{Z=zj} X dP.

    Write Gj = {Z = zj}; then this equation means E[Y I_Gj] = E[X I_Gj] . . .

    Note 5.1. We often write

    E[X|Z] for E[X|G] = E[X|σ(Z)]; and

    E[X|Z1, Z2, . . .] for E[X|σ(Z1, Z2, . . .)].

    Fact 5.2. If U is a non-negative bounded r.v., then

    E[U|G] ≥ 0, a.s.
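Definition 26 is easy to realize on a finite sample space. In this sketch (the data values are illustrative assumptions), E[X | Z = zj] is the probability-weighted average of X over each Z-atom:

```python
# Compute E[X|Z] on a finite sample space by averaging X over each
# Z-atom {Z = z_j} with the conditional probabilities.
from collections import defaultdict

# sample points as (probability, X value, Z value)
omega = [(0.1, 1, 0), (0.2, 4, 0), (0.3, 2, 1), (0.4, 5, 1)]

def cond_expectation(points):
    """Return {z_j: E[X | Z = z_j]}."""
    mass = defaultdict(float)     # P[Z = z_j]
    accum = defaultdict(float)    # E[X 1_{Z = z_j}]
    for p, x, z in points:
        mass[z] += p
        accum[z] += p * x
    return {z: accum[z] / mass[z] for z in mass}

print(cond_expectation(omega))
```

Here E[X | Z = 0] = (0.1·1 + 0.2·4)/0.3 = 3 and E[X | Z = 1] = (0.3·2 + 0.4·5)/0.7 = 26/7, and Y = E[X|Z] is by construction constant on each Z-atom, as in (b).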

    5.1.3 Key properties of Conditional expectation

    See textbook.


    5.1.4 Filtration

    A filtration is a family {Ft, t = 0, 1, . . . , T} of sub-σ-algebras indexed by

    t = 0, 1, . . . , T such that

    F0 ⊆ F1 ⊆ F2 ⊆ . . . ⊆ FT;

    that is, the family is increasing with time. Intuitively, for each t =

    0, 1, . . . , T, the σ-algebra Ft tells us which events may be observed by

    time t.

    If the sample space Ω is a finite set, often the σ-algebra F0 is trivial,

    consisting simply of the empty set ∅ and the whole sample space Ω. We

    also often write just {Ft} instead of the lengthy {Ft, t = 0, 1, . . . , T},

    and can assume that FT = F (since we shall be considering only random

    variables that are FT-measurable).

    Definition 28. We call the quadruple (Ω, F, {Ft}, P) a filtered probabil-

    ity space.

    We fix a filtered probability space (Ω, F, {Ft}, P) from now on. Given

    d ∈ N.

    A d-dimensional stochastic process with time index set {0, 1, . . . , T},

    defined on the given filtered probability space, is a collection

    X = {Xt, t = 0, 1, . . . , T}

    where each Xt is a d-dimensional random vector, i.e. a function

    Xt : Ω → R^d such that


    Xt^(-1)(B) ≡ {ω ∈ Ω : Xt(ω) ∈ B} ∈ F

    for each Borel subset B of R^d.

    - The process X = {Xt, t ≥ 0} is called adapted (to the filtration

      {Ft}) if for each t, Xt is Ft-measurable, i.e.

      if Xt^(-1)(B) ∈ Ft for each Borel set B of R^d and for each t = 0, 1, . . . , T.

      We often write Xt ∈ Ft as shorthand for Xt^(-1)(B) ∈ Ft for all Borel sets

      B in R^d.

    - Two d-dimensional stochastic processes Y = {Yt} and Z = {Zt}

      are modifications of one another if P(Yt = Zt) = 1 for each t =

      0, 1, . . . , T.

    5.1.5 Martingale

    A collection/process M = {Mt, Ft, t = 0, 1, . . . , T}, where each Mt is a

    real-valued random variable, is called a martingale if the following three

    conditions hold:

    1. E[|Mt|] < ∞ for t = 0, 1, . . . , T;

    2. Mt is Ft-measurable for t = 0, 1, . . . , T [i.e. the process M is

    adapted];

    3. the conditional expectation satisfies

    E[Mt | Ft-1] = Mt-1 for t = 1, . . . , T.


    In our discrete-time setting, condition 3. can be equivalently re-

    placed by

    3'.

    E[Mt | Fs] = Ms for all s < t in {0, 1, . . . , T}.

    We call M a sub-martingale if the "=" in condition 3. or 3'. is replaced

    by "≥"; we call M a super-martingale if the "=" in condition 3. or 3'. is

    replaced by "≤".

    When describing (sub/super)martingales we will sometimes omit the fil-

    tration Ft from the notation for M when it is understood.

    Interpretation of Martingale in Finance

    The martingale is considered to be a necessary condition for an efficient

    asset market, one in which the information contained in past prices is

    instantly, fully and perpetually reflected in the asset's current price. We

    identify

    M = {Mt = pt, the asset's price at t},

    and take as the information available at time t = 0, 1, 2, . . . the price

    history {p0, p1, . . . , pt}, expressing the relevant information we have at

    this time regarding the time series. Then we could think that in a martingale

    process each process event (as a new price)

    - is independent and can be summed (or integrable); and

    - has the property that its conditional expectation remains the same

      (i.e. time-invariant).


    Hence, M = {Mt = pt} is a martingale iff the expected next-period price

    is equal to the current price:

    E[pt+1 | p0, p1, . . . , pt] = pt for any time t.

    If instead asset prices decrease (or increase) in expectation over time, we

    have a super-martingale (sub-martingale):

    E[pt+1 | p0, p1, . . . , pt] ≤ pt (respectively ≥ pt).

    Observation 1. Martingales may also be defined with respect to other

    processes.

    If, for example, P = {pt, t ≥ 0} is a price process and Y = {yt, t ≥ 0}

    is an interest rate process, we can say that P is a martingale with respect

    to Y if

    E[|pt|] < ∞, and E[pt+1 | y0, y1, . . . , yt] = pt, for all t.

    Fact 5.3. By induction, a martingale implies an invariant mean:

    E[pt+1] = E[pt] = . . . = E[p0].
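Fact 5.3 can be checked exactly on a toy fair-coin price model (an illustration, not taken from the notes): let pt = p0 plus a sum of fair ±1 steps, so that E[pt] should equal p0 for every t.

```python
# E[p_t] for p_t = p0 + (fair +/-1 steps), computed exactly by
# enumerating all 2^t equally likely paths.
from itertools import product

p0 = 100.0

def mean_price(t):
    paths = list(product([+1, -1], repeat=t))
    return sum(p0 + sum(steps) for steps in paths) / len(paths)

print([mean_price(t) for t in range(7)])   # every entry is 100.0
```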

    5.1.6 Martingale examples

    Example 5.1. Sum of independent zero-mean r.v.'s. Let X1, X2, . . . be a

    sequence of independent r.v.'s with E[|Xn|] < ∞ for all n and E[Xn] = 0.

    Define S0 = 0, F0 = {∅, Ω} and

    Sn := X1 + X2 + X3 + . . . + Xn,


    Fn := σ(X1, X2, X3, . . . , Xn).

    Then you can prove for n ≥ 1 that

    E[Sn | Fn-1] = Sn-1 a.s.

    (Hint: E[Sn | Fn-1] = Sn-1 + E[Xn | Fn-1] = Sn-1 + E[Xn] = Sn-1, since

    Sn-1 is Fn-1-measurable and Xn is independent of Fn-1.)

    Example 5.2. Geometric Random Walks and a specific case.

    The essential idea underlying the random walk for real processes is the

    assumption of mutually independent increments of the same order of magni-

    tude for each point in time. However, economic time series in particular

    do not satisfy the latter assumption. Seasonal fluctuations of monthly

    sales figures, for example, are in absolute terms significantly greater if the

    yearly average sales figure is high. By contrast, the relative or percent-

    age changes are stable over time and do not depend on the current

    level of Xt.

    Analogously to the random walk Xt = Σ (i = 0 to t) Zi with i.i.d. absolute

    increments Zt = Xt - Xt-1, a geometric random walk {Xt; t ≥ 0} is

    assumed to have i.i.d. relative increments

    Rt = Xt / Xt-1, for t = 1, 2, . . .

    For a specific case, the geometric binomial random walk is

    Xt = Rt Xt-1 = X0 Π (k = 1 to t) Rk,

    where X0, R1, R2, . . . are mutually independent, each Rk is Bernoulli-type,

    and for u > 1 (up) and d < 1 (down), each Rk takes the values u or d.
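Under the standard binomial-model convention (an assumption here, since the notes' specification of Rk is incomplete at this point) that Rk = u with probability q and Rk = d with probability 1 - q, the walk's mean X0(qu + (1 - q)d)^t can be checked by enumeration:

```python
# E[X_t] for the geometric binomial random walk, computed by
# enumerating all 2^t up/down paths and weighting by path probability.
from itertools import product

def mean_geometric_walk(x0, u, d, q, t):
    total = 0.0
    for path in product([u, d], repeat=t):
        prob, x = 1.0, x0
        for r in path:
            prob *= q if r == u else (1 - q)
            x *= r
        total += prob * x
    return total

m = mean_geometric_walk(100.0, 1.1, 0.9, 0.5, 5)
print(m, 100.0 * (0.5 * 1.1 + 0.5 * 0.9) ** 5)
```

Note that with qu + (1 - q)d = 1 the mean stays at X0, which is the martingale case discussed in this section.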


    Example 5.4. Product of non-negative independent r.v.'s of mean

    1. Let X1, X2, . . . be a sequence of independent non-negative r.v.'s with

    E[Xn] = 1 for all n.

    Define M0 = 1, F0 = {∅, Ω} and

    Mn := X1 X2 X3 . . . Xn,   Fn := σ(X1, X2, X3, . . . , Xn).

    The process M is a martingale. (Why?)

    5.1.7 Stopping time

    Definition 30. A (discrete) stopping time is a function τ : Ω → {0, 1, . . . , T}

    ∪ {∞}

    such that

    {τ = t} ∈ Ft for t = 0, 1, . . . , T. . . . (*)

    Obviously for such a stopping time we see:

    {τ = ∞} = Ω \ (∪ (t = 0 to T) {τ = t}) ∈ FT.

    For convenience we define F∞ = FT, and then (*) also holds with

    t = ∞.

    Justification. Intuitively, τ is a time when you can decide to stop

    playing our game. Whether or not you stop immediately after the n-

    th game depends only on the history up to (and including) time n:

    {τ = n} = {ω : τ(ω) = n} ∈ Fn.


    Fact 5.4. With any (discrete) stopping time τ, there is a σ-algebra de-

    fined by

    Fτ = {A ∈ F : A ∩ {τ = t} ∈ Ft for t = 0, 1, . . . , T}.

    Lemma 31. If σ and τ are two stopping times, then

    σ ∧ τ = min(σ, τ), and σ ∨ τ = max(σ, τ)

    both also are stopping times.


    5.2 Stochastic Calculus

    Our basic assumption is that we do not know and cannot predict tomorrow's

    values of asset prices. The past history of the asset value is there as a

    financial time series for us to examine as much as we want, but we can

    n