
    B10b Mathematical Models of Financial Derivatives

    Hilary Term 2012

Michael Monoyios
Mathematical Institute

    University of Oxford

    March 5, 2012


    Useful books

The books by Shreve [13, 14] are excellent on, respectively, probabilistic aspects of the binomial model and on stochastic calculus for finance.

Etheridge [5] is a good stochastic calculus primer for finance, Bjork [1] covers many finance topics outside the scope of the course, Wilmott et al [16] is good on the PDE aspects of the subject, and background on financial derivatives is given in Hull [8]. Jacod and Protter [9] and Grimmett and Stirzaker [6] are excellent for background probability material.

    The lecture notes

These notes contain some background material, which is marked with an asterisk. In general, this will not be examinable. It is good to know the basic probability theory underlying conditional expectation and martingales, though this will not be directly examined. It is essential to know the properties of Brownian motion, particularly its quadratic variation, and to have an idea of how this leads to the Ito formula and to properties of the Ito integral, though you are not required to know the theory of the construction of the Ito integral.


Contents

Part I: Introduction to derivative securities

1 Financial derivatives
  1.1 Underlying assets
  1.2 Interest rates and time value of money
  1.3 Forwards and futures
    1.3.1 Valuation of forward contracts
  1.4 Arbitrage
    1.4.1 Forward on dividend-paying stock
  1.5 Options
    1.5.1 European call and put price bounds
    1.5.2 Combinations of options
  1.6 Some history*

Part II: The binomial model

2 Probability spaces
  2.1 Finite probability spaces
  2.2 Random variables
  2.3 Stochastic processes

3 The binomial stock price process
  3.1 Filtration as information flow in the binomial model*

4 Conditional expectation and martingales
  4.1 Conditional expectation
    4.1.1 Conditional expectation in the three-period binomial model
    4.1.2 Partial averaging
    4.1.3 Properties of conditional expectation
  4.2 Martingales

5 Equivalent measures*
  5.1 Radon-Nikodym martingales*
  5.2 Conditional expectation and Radon-Nikodym theorem*

6 Arbitrage pricing in the binomial model
  6.1 Equivalent martingale measures and no arbitrage*
    6.1.1 Fundamental theorems of asset pricing*
  6.2 Pricing by replication in the binomial model
    6.2.1 Replication in a one-period binomial model
  6.3 Completeness of the multiperiod binomial model

7 American options
  7.1 Value of hedging portfolio for an American option*
  7.2 Stopping times*
  7.3 Properties of American derivative securities*

Part III: Continuous time

8 Brownian motion
  8.1 Random walk*
  8.2 BM as scaled limit of symmetric random walk*
  8.3 Brownian motion
  8.4 Properties of BM
  8.5 Quadratic variation of BM
    8.5.1 First variation
    8.5.2 Quadratic variation of Brownian motion

9 The Ito integral
  9.1 Construction of the Ito integral*
  9.2 Ito integral of an elementary integrand*
  9.3 Properties of the Ito integral of an elementary process
  9.4 Ito integral of a general integrand*
  9.5 Properties of the general Ito integral

10 The Ito formula
  10.1 Ito's formula for one Brownian motion
  10.2 Ito's formula for Ito processes
  10.3 Stochastic differential equations
  10.4 Multidimensional Brownian motion
    10.4.1 Cross-variations of Brownian motions
  10.5 Two-dimensional Ito formula
  10.6 Multidimensional Ito formula
    10.6.1 Multidimensional Ito process
  10.7 Extensions*
  10.8 Connections with PDEs: Feynman-Kac theorem
  10.9 The Girsanov Theorem*

11 The Black-Scholes-Merton model
  11.1 Portfolio wealth evolution
  11.2 Perfect hedging
    11.2.1 Riskless portfolio argument
  11.3 Solution of the BSM equation
  11.4 BS option pricing formulae
  11.5 Sensitivity parameters (Greeks)*
    11.5.1 Delta
    11.5.2 Theta
    11.5.3 Gamma
    11.5.4 Vega
  11.6 Probabilistic (martingale) interpretation of perfect hedging*
  11.7 Black-Scholes analysis for dividend-paying stock
  11.8 Time-dependent parameters

12 Options on futures contracts
  12.1 The mechanics of futures markets
  12.2 Options on futures contracts

13 American options in the BS model
  13.1 Smooth pasting condition for American put
  13.2 Optimal stopping representation*
  13.3 The American call with no dividends

14 Exotic options
  14.1 Digital options
  14.2 Pay-later options
  14.3 Multi-stage options: compounds and choosers
    14.3.1 Multi-stage options
    14.3.2 Compound options
    14.3.3 Chooser options
  14.4 Barrier options
    14.4.1 PDE approach to valuing barrier options

    Part I

    Introduction to derivative securities

    1 Financial derivatives

A European derivative security (or contingent claim) is a financial contract paying to its holder a random amount (called the payoff of the claim) at some future time $T$ (called the maturity time of the derivative). An American derivative delivers the payoff at a random time $\tau \le T$ chosen by the holder of the contract. The payoff is (typically) contingent on the value of some other underlying security (or securities), or on the level of some non-traded reference index.

A basic example is a forward contract. The holder of such a contract agrees to buy an asset at some future time $T$ for a fixed price $K$ (the delivery price) that is decided at initiation of the contract. Hence, the forward contract has a value (to the holder) at maturity of $S_T - K$, where $S_T$ is the underlying asset value at maturity.

The origins of derivatives lie in medieval agreements between farmers and merchants to trade the farmer's harvest at some future date, at a price set in advance. This allowed farmers to fix the selling price of their crop, and reduced the risk of having to sell at a lower price than their cost of production, which might happen in a bumper harvest year. This is one motivation for the existence of derivatives: they give random payoffs which can be used to eliminate uncertainty from future asset price trades. The act of removing uncertainty in finance is called hedging.

Consider a farmer whose unit cost of crop production is $C$. His profit on selling the crop would be $S_T - C$, where $S_T$ is the market crop price at harvest time. If the farmer were to sell (that is, take a short position in) a forward contract with delivery price $K > C$, at some time $t < T$, then his overall payoff at $T$ would be $S_T - C - (S_T - K) = K - C > 0$. The risk of the crop price being less than the cost of production has been removed.
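As a quick numerical illustration of this hedge (a sketch; the values of $C$, $K$ and the possible harvest prices are made up):

```python
# Made-up numbers: production cost C and delivery price K > C.
C, K = 5.0, 7.0

for S_T in [3.0, 7.0, 12.0]:            # possible crop prices at harvest
    unhedged = S_T - C                   # profit with no forward
    hedged = (S_T - C) - (S_T - K)       # short forward adds K - S_T at time T
    print(f"S_T = {S_T:5.1f}: unhedged = {unhedged:5.1f}, hedged = {hedged:5.1f}")
# The hedged profit is K - C = 2.0 whatever S_T turns out to be.
```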

Of course, since derivatives have random payoffs, they can also be used to take risk by speculating on the future values of asset prices, and they are often a cheaper device for doing so than investing in the underlying asset. For example, another classical example of a derivative is a European call option on a stock, with payoff $(S_T - K)^+$, where $S_T$ is the (random) stock price at time $T$ and $K \ge 0$ is a constant called the strike price of the option. This allows the holder of the call to take a positive payoff if the stock price is above $K$, and the cost of acquiring a call option is usually only a fraction of the cost of buying the stock itself.

This course will be about how to assign a value to a derivative at any time $t \le T$. This will involve modelling the randomness in the underlying asset price process $S = (S_t)_{0 \le t \le T}$. To do this, we will need the notion of a stochastic process on a filtered probability space. We shall see that the key to valuing derivatives is to attempt to use the underlying asset to remove the risk from selling (or buying) the derivative. That is, derivative valuation is via a hedging argument.

    1.1 Underlying assets

Typical assets which are traded in financial markets, and which can be the underlying assets for a derivative contract, include:

  • shares (stocks)
  • commodities (metals, oil, other physical products)
  • currencies
  • bonds (assets used as borrowing tools by governments and companies) which pay fixed amounts at regular intervals to the bond holder.

An agent who holds an asset will be said to hold a long position in the asset, or to be long in the asset.

An agent who has sold an asset will be said to hold a short position in the asset, or to be short in the asset.


For the most part in this course, we will focus on derivative securities which have a stock as underlying asset. The stock price will be a stochastic process denoted by $S = (S_t)_{0 \le t \le T}$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. This means that for each $t \in [0, T]$, $S_t$, the value of the stock at time $t$, is a random variable.

    1.2 Interest rates and time value of money

Let us measure time in some convenient units, say years. If an interest rate $r$ is quoted per annum and with compounding frequency at time intervals $\Delta t$, this means that an amount $A$ invested for a time period $\Delta t$ will grow to $A(1 + r\Delta t)$. If this is re-invested for another period $\Delta t$, the balance becomes $A(1 + r\Delta t)^2$, and so on. So after $n$ periods, with $t := n\Delta t$, we have $A(1 + r\Delta t)^n = A(1 + rt/n)^n$. A continuously compounded interest rate corresponds to the limit $n \to \infty$, or $\Delta t \to 0$. In this case, after time $t$ an amount $A$ will grow to

$$\lim_{n \to \infty} A\left(1 + \frac{rt}{n}\right)^n = Ae^{rt}.$$

So an amount $A$ invested at time zero for a time $t$ will grow to an amount $Ae^{rt}$, where $r$ is the continuously compounded risk-free interest rate. We call the amount $Ae^{rt}$ the future value of $A$ invested at time zero, and the factor $e^{rt}$ is called an accumulation factor.

By the same token, receiving an amount $A$ at time $t$ is equivalent to receiving $Ae^{-rt}$ at time zero. We call $Ae^{-rt}$ the present value of $A$ received at time $t$ (we say that $A$ is discounted to the present) and the factor $e^{-rt}$ is called a discount factor.

It is usually convenient (but nothing more) to assume that interest is continuously compounded. We do not need to assert that, in reality, interest is continuously compounded, in order to use a continuously compounded interest rate in all our analysis. If the interest is actually compounded $m$ times a year at an interest rate of $R$ per annum, then we can still use a continuously compounded interest rate $r$ simply by making the identification

$$A\left(1 + \frac{R}{m}\right)^{mt} = Ae^{rt}, \qquad (1.1)$$

so that there is a one-to-one correspondence between the interest rate $R$ (compounded $m$ times per annum) and the continuously compounded interest rate $r$. In this course, we will nearly always use continuously compounded rates when considering continuous time models.
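The identification (1.1) gives $r = m\ln(1 + R/m)$. A minimal sketch of this conversion in code, with illustrative (made-up) values of $R$, $m$ and $t$:

```python
import math

def continuous_rate(R: float, m: int) -> float:
    """Continuously compounded rate r equivalent, via (1.1), to R compounded m times p.a."""
    return m * math.log(1.0 + R / m)

# Illustrative values: 10% compounded quarterly, over two years.
A, R, m, t = 100.0, 0.10, 4, 2.0
r = continuous_rate(R, m)
assert abs(A * (1.0 + R / m) ** (m * t) - A * math.exp(r * t)) < 1e-9
print(r)   # about 0.09877, slightly below R
```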

A differential version of the above arguments is as follows. In continuous time, we model the time evolution of cash in a bank account in terms of a riskless asset which we shall call a money market account, which is the value at time $t > 0$ of \$1 invested at time zero and continuously earning interest which is reinvested. We shall denote the value of this asset at time $t$ by $B_t$, which satisfies

$$dB_t = rB_t\,dt, \quad B_0 = 1, \qquad (1.2)$$

where $r$ is the (assumed constant) interest rate. Then the value of the bank account at time $t$ is given by $B_t = e^{rt}$, which we recognise as the familiar accumulation factor we have encountered above.

For more advanced modelling we could assume that interest rates are time-varying (and possibly stochastic). In this case the money market account satisfies

$$dB_t = r_tB_t\,dt, \quad B_0 = 1, \qquad (1.3)$$

where $r_t$ is the instantaneous (or short term) interest rate. We have allowed for this to be time-varying, and $r_t$ represents the interest rate in the time interval $[t, t + dt)$. From (1.3) we see that

$$B_t = \exp\left(\int_0^t r_u\,du\right), \qquad (1.4)$$

and this is the accumulation factor in this case. This is the factor by which \$1 invested at time zero has grown by time $t$, when the interest generated is continually reinvested.
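A short numerical check of (1.4) (a sketch; the linearly rising short rate $r_u = 0.02 + 0.02u$ is a made-up example):

```python
import math

def accumulation_factor(r, t, n=100_000):
    """Approximate B_t = exp(integral_0^t r(u) du) from (1.4) by the trapezoidal rule."""
    h = t / n
    integral = sum(0.5 * (r(i * h) + r((i + 1) * h)) * h for i in range(n))
    return math.exp(integral)

r = lambda u: 0.02 + 0.02 * u            # made-up short rate, rising from 2% to 4%
print(accumulation_factor(r, 1.0))       # numerical value of B_1
print(math.exp(0.02 + 0.01))             # exact: the integral is 0.02t + 0.01t^2 = 0.03 at t = 1
```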


    1.3 Forwards and futures

A forward contract is a contract which obliges the holder to buy an underlying asset at some future time $T$ (the maturity time) for a price $K$ (the delivery price) that is fixed at contract initiation. Hence, at time $T$, when the stock price is $S_T$, the contract is worth $S_T - K$ (the payoff of the forward) to the holder. This payoff is shown in Figure 1.


    Figure 1: Forward contract payoff as function of final underlying asset price

A futures contract is a rather specialised forward contract, traded on an organised exchange, and such that, if a contract is traded at some time $t \le T$, the delivery price is set to a special value $F_{t,T}$, called the futures price of the asset or the forward price of the asset, chosen so that the value of the futures contract at initiation (that is, at time $t$) is zero.

    1.3.1 Valuation of forward contracts

In what follows we value forward contracts on a non-dividend-paying stock, that is, an asset with price process $S = (S_t)_{0 \le t \le T}$ that pays no income to its holder.

Lemma 1.1. The value at time $t \le T$ of a forward contract with delivery price $K$ and maturity $T$, on an asset with price process $S = (S_t)_{0 \le t \le T}$, is $f_{t,T} \equiv f(t, S_t; T) \equiv f(t, S_t; K, T)$, given by

$$f_{t,T} = S_t - Ke^{-r(T-t)}, \quad 0 \le t \le T. \qquad (1.5)$$

Proof. This is a simple hedging argument which provides our first example of a riskless hedging strategy. Start with zero wealth at time $t \le T$, and sell the contract at time $t$ for some price $f_{t,T}$. Hedge this sale by purchasing the asset for price $S_t$. This requires borrowing of $S_t - f_{t,T}$.

At time $T$, sell the asset for price $K$ under the terms of the forward contract, and require that this is enough to pay back the loan. Hence we must have

$$K = (S_t - f_{t,T})e^{r(T-t)},$$

and the result follows.

Corollary 1.2. The forward price of the asset at time $t \le T$ is given by

$$F_{t,T} = S_te^{r(T-t)}, \quad 0 \le t \le T.$$

Proof. Set $f_{t,T} = 0$ in Lemma 1.1 and then by definition we must have $K = F_{t,T}$.
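Lemma 1.1 and Corollary 1.2 translate directly into code. A minimal sketch (the function names and sample market data are illustrative, not from the notes):

```python
import math

def forward_value(S_t, K, r, tau):
    """f_{t,T} = S_t - K e^{-r tau}, tau = T - t, as in Lemma 1.1 / (1.5)."""
    return S_t - K * math.exp(-r * tau)

def forward_price(S_t, r, tau):
    """F_{t,T} = S_t e^{r tau}: the delivery price making the contract worthless (Corollary 1.2)."""
    return S_t * math.exp(r * tau)

S_t, r, tau = 100.0, 0.05, 0.5           # made-up market data
F = forward_price(S_t, r, tau)
assert abs(forward_value(S_t, F, r, tau)) < 1e-12   # zero value at K = F_{t,T}
print(F)
```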

    1.4 Arbitrage

The simple argument above for valuing a forward contract is an example of valuation by the principle of no arbitrage. If the relationship in Lemma 1.1 is violated, then an elementary example of a riskless profit opportunity, called an arbitrage, ensues. Here is a formal definition of arbitrage.


Definition 1.3 (Arbitrage). Let $X = (X_t)_{0 \le t \le T}$ denote the wealth process of a trading strategy. An arbitrage over $[0, T]$ is a strategy satisfying $X_0 = 0$, $\mathbb{P}[X_T \ge 0] = 1$ and $\mathbb{P}[X_T > 0] > 0$.

So an arbitrage is guaranteed not to lose money and has a positive probability of making a profit. If the valuation formula (1.5) for the forward contract is violated, an immediate arbitrage opportunity occurs, as we now illustrate.

Suppose $f_{t,T} > S_t - Ke^{-r(T-t)}$. Then one can short the forward contract and buy the stock, by borrowing $S_t - f_{t,T}$ at time $t$. At maturity, one sells the stock for $K$ under the terms of the forward contract and uses the proceeds to pay back the loan, yielding a profit of

$$K - (S_t - f_{t,T})e^{r(T-t)} > 0.$$

This is an arbitrage. A symmetrical argument applies if $f_{t,T} < S_t - Ke^{-r(T-t)}$ (and you should supply this).

The principle of riskless hedging and no arbitrage will also apply, rather less trivially, to the valuation of options later in the course.

An equivalent way of looking at no arbitrage is sometimes called the law of one price. Two portfolios which give the same payoff at $T$ should have the same value at time $t \le T$. Let us show how this applies to the valuation of a forward contract. Consider the following two portfolios at time $t \le T$:

  • a long position in one forward contract,
  • a long position in the stock plus a short cash position of $Ke^{-r(T-t)}$.

At time $T$, these are both worth $S_T - K$, so their values at time $t \le T$ must be equal, yielding $f_{t,T} = S_t - Ke^{-r(T-t)}$, as before. Notice that the second portfolio perfectly replicates (or perfectly hedges) the payoff of the forward contract, meaning that it reproduces the payoff $S_T - K$. Denote the position in the stock that is needed to perfectly hedge a forward contract by $H^f_t$. Then we have that $H^f_t = 1$ for all $t \in [0, T]$, and note that

$$H^f_t = f_x(t, S_t) = 1, \quad 0 \le t \le T,$$

where $f(t, x) := x - Ke^{-r(T-t)}$. This is a simple example of a delta hedging rule, in which one differentiates the pricing function of the derivative with respect to the variable representing the underlying asset price, in order to get the hedging strategy. We will see a similar result when valuing options.

    1.4.1 Forward on dividend-paying stock

The stock in the preceding analysis was assumed to pay no dividends. Now assume that the stock pays dividends as a continuous income stream with dividend yield $q$. This means that in the interval $[t, t + dt)$, the income received by someone holding the stock will be $qS_t\,dt$. Suppose that at time $t \in [0, T]$ an agent holds $n_t$ shares of stock. The income received in the next infinitesimal time interval is $qn_tS_t\,dt$. If this is immediately re-invested in shares, the holding in shares satisfies

$$dn_t = qn_t\,dt.$$

Hence, if the initial holding is $n_0$ at time zero, we have

$$n_t = n_0e^{qt}, \quad 0 \le t \le T.$$

In particular, we have $n_t = n_Te^{-q(T-t)}$, $0 \le t \le T$. This means that in order to hold one share of stock at time $T$, one may buy $e^{-q(T-t)}$ shares at $t \le T$ and re-invest the dividends in the stock.

If we use this to value a forward contract on the dividend-paying stock we arrive at the following.


Lemma 1.4. The value at time $t \le T$ of a forward contract with delivery price $K$ and maturity $T$, on a stock with price process $S = (S_t)_{0 \le t \le T}$ paying dividends at a dividend yield $q$, is given by

$$f_{t,T} = S_te^{-q(T-t)} - Ke^{-r(T-t)}, \quad 0 \le t \le T. \qquad (1.6)$$

Proof. This is again a hedging argument. Start with zero wealth at time $t \le T$, and sell the contract at time $t$ for some price $f_{t,T}$. Hedge this sale by purchasing $e^{-q(T-t)}$ shares at price $S_t$. This requires borrowing of $e^{-q(T-t)}S_t - f_{t,T}$. Re-invest all dividends immediately in the stock, so as to hold one share at time $T$.

At time $T$, sell the asset for price $K$ under the terms of the forward contract, and ensure that this is enough to pay back the loan. Hence we must have

$$K = (S_te^{-q(T-t)} - f_{t,T})e^{r(T-t)},$$

and the result follows.

Corollary 1.5. The forward price of the dividend-paying asset at time $t \le T$ is given by

$$F_{t,T} = S_te^{(r-q)(T-t)}, \quad 0 \le t \le T.$$

Proof. Set $f_{t,T} = 0$ in Lemma 1.4 and then by definition we must have $K = F_{t,T}$.

Remark 1.6 (Forwards and futures on currencies). A foreign currency is treated as an asset which pays a dividend yield equal to the foreign interest rate $r_f$. Hence, if $S = (S_t)_{0 \le t \le T}$ is the exchange rate (the value in dollars of one unit of foreign currency), then a forward contract on the foreign currency has value at time $t \le T$ given by

$$f_{t,T} = S_te^{-r_f(T-t)} - Ke^{-r(T-t)}, \quad 0 \le t \le T,$$

where $T$ is the maturity and $K$ is the delivery price.
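A sketch of (1.6) and Corollary 1.5 in code, using a hypothetical currency-forward example in the spirit of Remark 1.6 (the foreign rate plays the role of $q$; all values are made up):

```python
import math

def forward_value_div(S_t, K, r, q, tau):
    """f_{t,T} = S_t e^{-q tau} - K e^{-r tau}, as in (1.6); set q = r_f for a currency."""
    return S_t * math.exp(-q * tau) - K * math.exp(-r * tau)

def forward_price_div(S_t, r, q, tau):
    """F_{t,T} = S_t e^{(r-q) tau} (Corollary 1.5)."""
    return S_t * math.exp((r - q) * tau)

# Made-up FX data: spot 1.25 dollars per unit, domestic r = 5%, foreign r_f = 3%, 6 months.
S_t, r, q, tau = 1.25, 0.05, 0.03, 0.5
F = forward_price_div(S_t, r, q, tau)
assert abs(forward_value_div(S_t, F, r, q, tau)) < 1e-12
print(F)
```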

    1.5 Options

An option is a contract that gives the holder the right but not the obligation to buy or sell an asset for some price that is defined in advance.

The two most basic option types are a European call and a European put.

A European call option on a stock is a contract that gives its holder the right (but not the obligation) to purchase the stock at some future time $T$ (the maturity time) for a price $K$ (the strike price or exercise price) that is fixed at contract initiation. If $S = (S_t)_{0 \le t \le T}$ denotes the underlying asset's price process, the payoff of a call option is $(S_T - K)^+$, as shown in Figure 2.


    Figure 2: Call option payoff



    Figure 3: Put option payoff

A European put option on a stock is a contract that entitles the holder to sell the underlying stock for a fixed price $K$, the strike price, at a future time $T$. If $S = (S_t)_{0 \le t \le T}$ denotes the underlying asset's price process, the payoff of a put option is $(K - S_T)^+$, as shown in Figure 3. The act of choosing to buy or sell the asset under the terms of the option contract is called exercising the option.

Options which can be exercised any time before the maturity date $T$ are called American options, whilst European options can only be exercised at $T$. Hence, an American call (respectively, put) option allows the holder to buy (respectively, sell) the underlying stock for price $K$ at any time before maturity.

Lemma 1.7 (Put-call parity). The European call and put prices $c(t, S_t)$ and $p(t, S_t)$ of options with the same strike $K$ on a non-dividend-paying traded stock with price $S_t$ at time $t \in [0, T]$ are related by

$$c(t, S_t) - p(t, S_t) = S_t - Ke^{-r(T-t)}, \quad 0 \le t \le T.$$

Proof. The payoffs of a call and put satisfy

$$c(T, S_T) - p(T, S_T) = (S_T - K)^+ - (K - S_T)^+ = S_T - K,$$

which shows the (obvious) fact that a long position in a call combined with a short position in a put is equivalent to a long position in a forward contract. Hence, their prices at $t \le T$ must satisfy

$$c(t, S_t) - p(t, S_t) = f_{t,T} = S_t - Ke^{-r(T-t)}, \quad 0 \le t \le T,$$

where $f_{t,T}$ is the value of a forward contract at $t \le T$.

Remark 1.8. The same argument applied to a dividend-paying stock yields

$$c(t, S_t) - p(t, S_t) = f_{t,T} = S_te^{-q(T-t)} - Ke^{-r(T-t)}, \quad 0 \le t \le T,$$

where $q$ is the dividend yield.
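Put-call parity pins down either option price given the other. A minimal sketch (the quoted call price and market data are made-up values):

```python
import math

def put_from_call(c, S_t, K, r, tau, q=0.0):
    """Put price implied by put-call parity (Lemma 1.7; Remark 1.8 when q > 0)."""
    return c - S_t * math.exp(-q * tau) + K * math.exp(-r * tau)

# Made-up market data: a one-year at-the-money call quoted at 10.45.
c, S_t, K, r, tau = 10.45, 100.0, 100.0, 0.05, 1.0
print(put_from_call(c, S_t, K, r, tau))   # the unique put price consistent with no arbitrage
```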

1.5.1 European call and put price bounds

Put-call parity is a model-independent result. From it, we see that $c(t, S_t) \ge f(t, S_t)$. That is, a call option is always at least as valuable as a forward contract (an obvious fact).¹

A call option gives its holder the right to buy the underlying stock, which means that its value can never be greater than that of the stock, so $c(t, S_t) \le S_t$.² From this we deduce model-independent bounds on a European call option price on a non-dividend-paying stock:

$$S_t - Ke^{-r(T-t)} \le c(t, S_t) \le S_t, \quad 0 \le t \le T.$$

¹Equivalently, comparing the payoff of a call with that of one share plus a short position of $Ke^{-r(T-t)}$ in cash, we obtain $S_T - K \le (S_T - K)^+$, and therefore $c(t, S_t) \ge S_t - Ke^{-r(T-t)}$.

²Equivalently, comparing the payoffs of a share and a call, we have $S_T = S_T^+ \ge (S_T - K)^+$, and therefore $c(t, S_t) \le S_t$.


    These bounds are shown in Figure 4.

[Figure 4 plots the lower bound $C = S - Ke^{-rT}$ and the upper bound $C = S$ against the stock price $S$, for parameters $K = 10$, $r = 10\%$, $\sigma = 25\%$, $T = 1$ year.]

    Figure 4: Bounds on European call option value

In Figure 4 we have plotted the upper and lower bounds of a European call value (the dotted graphs) as well as the Black-Scholes value of the call (the solid graph). We will show in these lectures how this function arises. If the above call option pricing bounds are violated, then arbitrage opportunities arise.

For example, if $c(t, S_t) < S_t - Ke^{-r(T-t)}$, one should buy the call and short the stock, which gives a cash amount $S_t - c(t, S_t)$ to be invested in a bank account. At time $T$, we have two possibilities:

1. $S_T \le K$, in which case the call is not exercised. The arbitrageur buys the stock in the market to close out the short position, using the proceeds from the bank account, which stand at $(S_t - c(t, S_t))e^{r(T-t)}$ prior to buying the stock. This leaves a profit of

$$(S_t - c(t, S_t))e^{r(T-t)} - S_T \ge (S_t - c(t, S_t))e^{r(T-t)} - K = e^{r(T-t)}(S_t - Ke^{-r(T-t)} - c(t, S_t)) > 0.$$

2. $S_T > K$, in which case the call is exercised. The arbitrageur buys the stock for $K$ to close out the short position, using the proceeds from the bank account, which stand at $(S_t - c(t, S_t))e^{r(T-t)}$ prior to buying the stock. This leaves a profit of

$$(S_t - c(t, S_t))e^{r(T-t)} - K = e^{r(T-t)}(S_t - Ke^{-r(T-t)} - c(t, S_t)) > 0.$$
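The two cases can be checked numerically. A sketch with a hypothetical mispriced call (all values made up):

```python
import math

# Made-up mispricing: a call quoted below its lower bound S_t - K e^{-r tau}.
S_t, K, r, tau, c = 100.0, 100.0, 0.05, 1.0, 3.0
assert c < S_t - K * math.exp(-r * tau)       # the bound is indeed violated

# Buy the call, short the stock, and bank S_t - c at rate r.
bank_at_T = (S_t - c) * math.exp(r * tau)
for S_T in [80.0, 120.0]:                      # case 1 (S_T <= K) and case 2 (S_T > K)
    cost_to_close = S_T if S_T <= K else K     # exercise the call when S_T > K
    print(f"S_T = {S_T}: profit = {bank_at_T - cost_to_close:.4f}")   # positive in both cases
```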


We can derive similar model-independent bounds on a put option price. A put option gives its holder the right to receive an amount $K$ for the stock, so the most it can be worth at maturity is $K$ (if the final stock price is $S_T = 0$). Hence, its current value can never be greater than the present value of $K$, so that

$$p(t, S_t) \le Ke^{-r(T-t)}, \quad 0 \le t \le T.$$

Similarly, for the value of a put at expiry we have $p(T, S_T) = (K - S_T)^+ \ge K - S_T$. That is, a put option is at least as valuable as a short position in a forward contract. Hence we have the lower bound

$$p(t, S_t) \ge Ke^{-r(T-t)} - S_t, \quad 0 \le t \le T.$$

The results in this section are model-independent. To say more about option values we need a model for the dynamic evolution of a stock price. One of the simplest continuous-time models is the Black-Scholes-Merton (BSM) model, which we shall describe later, and one of the simplest discrete-time models is the binomial model, which we shall also see shortly.

1.5.2 Combinations of options

Options can be combined to give a variety of payoffs for different hedging purposes, or for speculation on movements in the underlying asset price, and they are often used to do so because the option premiums are relatively small in some cases, thus proving very attractive to gamblers.

A straddle is a call and a put with the same strike and maturity. The payoff of a long position in a straddle is

$$(S_T - K)^+ + (K - S_T)^+ = \begin{cases} K - S_T, & S_T < K, \\ S_T - K, & S_T \ge K. \end{cases} \qquad (1.7)$$

    This payoff is illustrated in Figure 5.


    Figure 5: Straddle payoff

    1.6 Some history*

As remarked earlier, the origins of derivatives lie in medieval agreements between farmers and merchants to insure farmers against low crop prices.

In the 1860s the Chicago Board of Trade was founded to trade commodity futures (contracts that set trading prices of commodities in advance), formalising the act of hedging against future price changes of important products.


Options were first valued by Bachelier in 1900 in his PhD thesis, a translation of which can be found in the book by Davis and Etheridge [3]. Bachelier introduced a stochastic process now known as Brownian motion (BM) to model stock price movements in continuous time. Bachelier did this before a rigorous treatment of BM was available in mathematics. His work was decades ahead of its time, both mathematically and economically speaking, and was therefore not given the credit it deserved at the time. In the decades that followed, mathematicians and physicists (Einstein, Wiener, Levy, Kolmogorov, Feller, to name but a few) developed a rigorous theory of Brownian motion, and Ito developed a rigorous theory of stochastic integration with respect to Brownian motion, leading to the notion of a stochastic calculus, which we shall encounter. In the 1960s, economists re-discovered Bachelier's work, and this was one of the ingredients that led to the modern theory of option valuation.

In the early 1970s a combination of forces existed which made markets more risky, derivatives more prominent, and their valuation and trading possible. The system of fixed exchange rates that existed before 1970 collapsed, and the Middle East oil crises caused a big increase in the volatility of financial prices. This increased the demand for risk management products such as options. At the same time Black and Scholes [2] and Merton [11] (BSM) published their seminal work on how to price options, based on managing the risk associated with selling such an asset. This breakthrough, for which Scholes and Merton received a Nobel Prize (Black having passed away in 1995), coincided with the opening of the Chicago Board Options Exchange (CBOE), giving individuals both a means to value option contracts and a marketplace where they could profit from this knowledge of the fair price.

Following on from this, the financial deregulation of the 1980s, allied to technological developments which made it possible to trade securities globally and to run large portfolios of complex products, caused a huge increase in risky trading across international borders. This opened up yet more risks across currencies, interest rates and equities, and financial institutions very skillfully (or opportunistically, perhaps) created markets to trade derivatives and to sell these products to customers. This has led to the massive increase in derivative trading that we now see, with the volume of derivative contracts traded now dwarfing that in the associated underlying assets.

The papers of Black-Scholes [2] and Merton [11] attracted mathematicians to the subject, and led to a mathematically rigorous approach to valuing derivatives, based on probability and martingale theory, inspired by Harrison and Pliska [7]. This led directly to modern financial mathematics, and has also contributed to the advent of derivatives written on a plethora of underlying stochastic reference entities, such as interest rates, weather indices, default events, as well as on more traditional traded underlying securities such as stocks, currencies and interest rates.

    Part II

    The binomial model

2 Probability spaces

We shall use a rigorous approach to probability theory, based on measure theory, to model the evolution of a stock price. The reason for taking this path is that it enables us to treat both discrete and continuous random variables with the same foundations, and enables a general definition of the concept of conditional expectation, and hence of a stochastic process known as a martingale.

A good reference for this material is Shreve [13], which applies these ideas to finance, while Durrett [4], Jacod and Protter [9], and Williams [15] are good on the pure probability aspects.

The first building block in modelling a probabilistic situation is a set, $\Omega$, called the sample space, which represents the possible outcomes of a random experiment. The number of elements in $\Omega$ might be finite or infinite, but we shall start off with the simpler, finite case. The elements $\omega$ of $\Omega$ are called sample points.


    2.1 Finite probability spaces

Let $\Omega$ be a set with finitely many elements. Let $\mathcal{F}$ be the set of all subsets of $\Omega$.³

Definition 2.1. A probability measure $\mathbb{P}$ is a function mapping $\mathcal{F}$ into $[0, 1]$ with the following properties:

1. $\mathbb{P}(\Omega) = 1$.

2. If $A_1, A_2, \ldots$ is a sequence of disjoint sets in $\mathcal{F}$, then

$$\mathbb{P}\left(\bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} \mathbb{P}(A_k).$$

The interpretation is that, for a set $A \in \mathcal{F}$, there is a probability in $[0, 1]$ that the outcome of a random experiment will lie in the set $A$. We think of $\mathbb{P}(A)$ as this probability. The set $A \in \mathcal{F}$ is called an event.

For $A \in \mathcal{F}$ we define

$$\mathbb{P}(A) := \sum_{\omega \in A} \mathbb{P}(\omega). \qquad (2.1)$$

We can define $\mathbb{P}(A)$ in this way because $A$ has only finitely many elements, and so only finitely many elements appear in the sum on the right-hand side of the above equation.

Example 2.2 (3-coin toss sample space). Let $\Omega$ be the finite set

$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\},$$

the set of all possible outcomes of three coin tosses, where $H$ stands for a head and $T$ for a tail. Each sample point $\omega \in \Omega$ is a sequence of length three, and we denote the $t$th element of $\omega$ by $\omega_t$, $t \in \{1, 2, 3\}$. Hence a general sample point is written as $\omega = (\omega_1\omega_2\omega_3)$. For example, when $\omega = HTH$, $\omega_1 = H$, $\omega_2 = T$, $\omega_3 = H$. Let $\mathcal{F}$ be the collection of all subsets of $\Omega$ ($\mathcal{F}$ is a $\sigma$-algebra). Suppose the probability of $H$ on each coin toss is $p \in (0, 1)$; then the probability of $T$ is $q := 1 - p$. For each $\omega = (\omega_1\omega_2\omega_3)$ we define

$$\mathbb{P}(\omega) := p^{\text{number of } H \text{ in } \omega}\,q^{\text{number of } T \text{ in } \omega}.$$

Then for each $A \in \mathcal{F}$ we define $\mathbb{P}(A)$ according to (2.1).

Definition 2.3. Let $\Omega$ be a nonempty set. A $\sigma$-algebra is a collection, $\mathcal{G}$, of subsets of $\Omega$ with the following three properties:

1. $\emptyset \in \mathcal{G}$,

2. if $A \in \mathcal{G}$ then $A^c \in \mathcal{G}$,

3. if $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{G}$, then $\bigcup_{k=1}^{\infty} A_k$ is also in $\mathcal{G}$.

We interpret $\sigma$-algebras as a record of information. We develop some intuition shortly. Given a set $\Omega$, a $\sigma$-algebra $\mathcal{F}$, and a probability measure $\mathbb{P}$, the pair $(\Omega, \mathcal{F})$ is called a measurable space, and the triple $(\Omega, \mathcal{F}, \mathbb{P})$ is called a probability space.

Definition 2.4. Let $\Omega$ be a nonempty finite set. A filtration is a sequence of $\sigma$-algebras $\mathcal{F}_0, \mathcal{F}_1, \ldots, \mathcal{F}_T$, such that each $\sigma$-algebra in the sequence contains all the sets contained by the previous $\sigma$-algebra.

Let $\mathbb{T} = \{0, 1, \ldots, T\}$. A probability space $(\Omega, \mathcal{F}, \mathbb{P})$ equipped with a filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in \mathbb{T}}$, with each $\mathcal{F}_t \subseteq \mathcal{F}$, is called a filtered probability space.

Remark 2.5. When dealing with filtrations we shall usually assume that $\mathcal{F}_0$ is trivial, that is, $\mathcal{F}_0 = \{\emptyset, \Omega\}$, so that $A \in \mathcal{F}_0 \Rightarrow \mathbb{P}(A) \in \{0, 1\}$.

³$\mathcal{F}$ is a $\sigma$-algebra (see later).


    2.2 Random Variables

Definition 2.6. Let $\Omega$ be a nonempty finite set and let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$. A random variable $X$ is a measurable function mapping $\Omega$ into the real line $\mathbb{R}$, that is, the set $X^{-1}(A) = \{\omega \in \Omega : X(\omega) \in A\} \in \mathcal{F}$ for every Borel set $A \subseteq \mathbb{R}$.

We often denote a random variable $X$ by $X(\omega)$, where $\omega \in \Omega$, to reinforce the fact that the random variable is a mapping from the sample space to the space of real numbers.

Since a random variable maps $\Omega$ into $\mathbb{R}$, we can look at the preimage under the random variable of sets in $\mathbb{R}$, that is, sets of the form

$$\{\omega \in \Omega : X(\omega) \in A \subseteq \mathbb{R}\},$$

which is, of course, a subset of $\Omega$. The complete list of subsets of $\Omega$ that you can get as preimages (under $X$) of sets in $\mathbb{R}$ turns out to be a $\sigma$-algebra, whose content is exactly the information obtained by observing $X$, and is called the $\sigma$-algebra generated by the random variable $X$.

Definition 2.7. Let $\Omega$ be a nonempty finite set and let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$. Let $X$ be a random variable on $(\Omega, \mathcal{F})$. The $\sigma$-algebra $\sigma(X)$ generated by $X$ is defined to be the collection of all sets of the form $\{\omega \in \Omega : X(\omega) \in A\}$ where $A$ is a subset of $\mathbb{R}$. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. We say that $X$ is $\mathcal{G}$-measurable if every set in $\sigma(X)$ is also in $\mathcal{G}$.

Notation 2.8. We usually write $\{X \in A\}$ for $\{\omega \in \Omega : X(\omega) \in A\}$.

Remark 2.9. An equivalent characterisation of measurability is that $X : \Omega \to \mathbb{R}$ is $\mathcal{G}$-measurable if the event $\{X \le x\}$ is an element of $\mathcal{G}$ for all $x \in \mathbb{R}$.

Definition 2.10. Let $\Omega$ be a nonempty finite set and let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$. Let $\mathbb{P}$ be a probability measure on $(\Omega, \mathcal{F})$ and let $X$ be a random variable on $(\Omega, \mathcal{F})$. Given any set $A \subseteq \mathbb{R}$, we define the induced measure of the set $A$ to be

$$\mu_X(A) := \mathbb{P}\{\omega \in \Omega : X(\omega) \in A\} \equiv \mathbb{P}\{X \in A\}.$$

So the induced measure of a set $A$ tells us the probability that $X$ takes a value in $A$. By the distribution of a random variable $X$, we mean any of the several ways of characterizing $\mu_X$. A common way of doing this is to give the cumulative distribution function $F_X(x)$ of $X$, defined by

$$F_X(x) := \mathbb{P}\{\omega \in \Omega : X(\omega) \le x\} \equiv \mathbb{P}\{X \le x\}.$$

If $X$ is discrete, $\mu_X$ places a mass $f_X(x_i) = \mathbb{P}\{X = x_i\} = \mu_X\{x_i\}$, $i = 1, \ldots, n$, at each possible value $x_i$ of $X$. We call $f_X(x_i)$, $i = 1, \ldots, n$, the probability mass function of $X$, and knowing this for all possible values of $X$ is equivalent to knowing the cumulative distribution function. (Later we shall encounter continuous random variables which have probability density functions, in which case the induced measure of a set $A \subseteq \mathbb{R}$ is the integral of the density over the set $A$.)

    A simple example of a random variable is the following.

Definition 2.11. The indicator function $I_A(\omega)$ of an event $A \in \mathcal{F}$ is

$$I_A(\omega) := \begin{cases} 1, & \omega \in A, \\ 0, & \omega \notin A. \end{cases}$$

Remark 2.12. We make a clear distinction between random variables and their distributions. A random variable is a mapping from $\Omega$ to $\mathbb{R}$, nothing more, and has an existence quite apart from any discussion of probabilities. The distribution of a random variable is a measure $\mu_X$ on $\mathbb{R}$, i.e. a way of assigning probabilities to sets in $\mathbb{R}$. It depends on the random variable $X$ and on the probability measure $\mathbb{P}$ we use on $\Omega$. If we change $\mathbb{P}$, we change the distribution of the random variable $X$, but not the random variable itself. Thus, a random variable can have more than one distribution (e.g. an "objective" or "market" distribution, and a "risk-neutral" distribution, and we shall see such constructs in finance). In a similar vein, two different random variables can have the same distribution.


Definition 2.13. Let $\Omega$ be a nonempty finite set and let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$. Let $\mathbb{P}$ be a probability measure on $(\Omega, \mathcal{F})$ and let $X$ be a random variable on $(\Omega, \mathcal{F})$. The expected value or expectation of $X$ is defined to be

$$E[X] := \sum_{\omega \in \Omega} X(\omega)\mathbb{P}(\omega). \qquad (2.2)$$

Notice that this is a sum over the sample space $\Omega$. If $\Omega$ is a finite set, then $X$ can take only finitely many values $x_1, \ldots, x_K$. In this case we can partition $\Omega$ into the subsets $\{X = x_1\}$ ($\equiv \{\omega \in \Omega : X(\omega) = x_1\}$), \ldots, $\{X = x_K\}$, which allows us to write (2.2) as

$$E[X] := \sum_{\omega \in \Omega} X(\omega)\mathbb{P}(\omega) = \sum_{k=1}^{K} \sum_{\omega \in \{X = x_k\}} X(\omega)\mathbb{P}(\omega) = \sum_{k=1}^{K} x_k \sum_{\omega \in \{X = x_k\}} \mathbb{P}(\omega) = \sum_{k=1}^{K} x_k\mathbb{P}\{X = x_k\} = \sum_{k=1}^{K} x_k\mu_X\{x_k\},$$

the penultimate expression being the familiar form. Thus, although expectation is defined as a sum over the sample space $\Omega$, we can also write it as a sum over $\mathbb{R}$.
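A short computational check, in the 3-coin-toss model of Example 2.2, that the sum over $\Omega$ in (2.2) agrees with the sum over the values of $X$ (a sketch; taking $X$ to be the number of heads is an illustrative choice):

```python
from itertools import product

p = 0.5
q = 1.0 - p
Omega = [''.join(w) for w in product('HT', repeat=3)]          # the 8 sample points
P = {w: p ** w.count('H') * q ** w.count('T') for w in Omega}  # P(omega) = p^{#H} q^{#T}
X = {w: w.count('H') for w in Omega}                           # X = number of heads

E1 = sum(X[w] * P[w] for w in Omega)                           # sum over the sample space, (2.2)
E2 = sum(x * sum(P[w] for w in Omega if X[w] == x)             # sum over the values x_k,
         for x in set(X.values()))                             # weighted by P{X = x_k}
assert abs(E1 - E2) < 1e-12
print(E1)   # 1.5 for a fair coin
```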

Remark 2.14. When the sample space is infinite and, in particular, uncountable, the summation in the definition of expectation is replaced by an integral. In general, the integral over an abstract measurable space $(\Omega, \mathcal{F})$ with respect to a probability measure $\mathbb{P}$ is a so-called Lebesgue integral (which has all the linearity and comparison properties we associate with ordinary integrals). The expectation $E[X]$ becomes the Lebesgue integral over $\Omega$ of $X$ with respect to $\mathbb{P}$, written as

$$E[X] = \int_\Omega X\,d\mathbb{P} \equiv \int_\Omega X(\omega)\,d\mathbb{P}(\omega) = \int_{\mathbb{R}} x\,d\mu_X(x). \qquad (2.3)$$

When $X$ takes on a continuum of values and has a density $f_X$, then $d\mu_X(x) = f_X(x)\,dx$ and the integral on the right-hand side of (2.3) reduces to the familiar Riemann integral $\int_{\mathbb{R}} xf_X(x)\,dx$.

We do not delve into the construction of Lebesgue integrals over abstract spaces here. Merely think of the right-hand side of (2.3) as an alternative notation for the sum $\sum_{\omega \in \Omega} X(\omega)\mathbb{P}(\omega)$. See Williams [15] or Shreve [14] for more details on Lebesgue integration.

It is easy to see (exercise) that, for an indicator function $I_A$ of an event $A \in \mathcal{F}$, the definition of expectation leads to

$$E[I_A] = \mathbb{P}(A).$$

The expectation operator is linear, so for two random variables $X, Y$ and constants $a, b$, we have

$$E[aX + bY] = aE[X] + bE[Y].$$

The variance of $X$ is the expected value of $(X - E[X])^2$:

$$\mathrm{var}(X) := \sum_{\omega \in \Omega} (X(\omega) - E[X])^2\,\mathbb{P}(\omega) = E[(X - E[X])^2] = E[X^2] - (E[X])^2,$$

where the last equality comes from expanding the square in the definition of variance and applying the expectation operator to each term.


The moment generating function $M(a)$ of the random variable $X$ is

$$M(a) := E[\exp(aX)].$$

The characteristic function $\phi(a)$ of $X$ is defined by

$$\phi(a) := E[\exp(iaX)], \quad i = \sqrt{-1}. \qquad (2.4)$$

    2.3 Stochastic processes

Definition 2.15 (Discrete-time stochastic process). Let $\mathbb{T} = \{0\} \cup \mathbb{N} = \{0, 1, 2, \ldots\}$ be a discrete time set. A discrete-time stochastic process $(X_t)_{t \in \mathbb{T}}$ is a sequence of random variables.

Thus, a $d$-dimensional stochastic process is a parametrised collection of random variables $(X_t)_{t \in \mathbb{T}}$ defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and assuming values in $\mathbb{R}^d$ for $d \in \mathbb{N}$. The parameter space $\mathbb{T}$ is thought of as a time index set.

For a finite discrete-time $T$-period model, we would have $\mathbb{T} = \{0, 1, \ldots, T\}$. For an infinite horizon model we would have $\mathbb{T} = \{0\} \cup \mathbb{N} = \{0, 1, \ldots\}$.⁴

⁴If we use a continuous-time model, $\mathbb{T}$ could be the half-line $\mathbb{T} = [0, \infty)$, or a finite interval $\mathbb{T} = [0, T]$.

There are a number of ways of thinking about a stochastic process. First, for each fixed $t \in \mathbb{T}$ we have a random variable $X_t : \Omega \to \mathbb{R}^d$, that is, the function

$$\omega \mapsto X_t(\omega), \quad \omega \in \Omega.$$

On the other hand, if we fix $\omega \in \Omega$ we can consider the function

$$t \mapsto X_t(\omega), \quad t \in \mathbb{T},$$

which is called a path (or sample path) of $X$.

We think of $t$ as time and of each $\omega \in \Omega$ as the outcome of an experiment (such as trading in a financial market). With this in mind, $X_t(\omega)$ would represent the result at time $t$ if the outcome of the experiment is $\omega$ (or the price at time $t$ of an asset being traded in a market, in the state $\omega$). Sometimes we write $X(t, \omega) \equiv X_t(\omega)$. Thus we may also regard the process as a function of two variables:

$$(t, \omega) \mapsto X(t, \omega), \quad (t, \omega) \in \mathbb{T} \times \Omega,$$

from $\mathbb{T} \times \Omega$ into $\mathbb{R}^d$. This is often a natural point of view in stochastic analysis, where it is crucial to have $X(t, \omega)$ jointly measurable in $(t, \omega)$.

Now, we may identify each $\omega \in \Omega$ with the sample path $t \mapsto X_t(\omega)$, i.e. with the function $\tilde{\omega} : [0, \infty) \to \mathbb{R}^d$ given by $\tilde{\omega}(t) = X_t(\omega)$ (the so-called coordinate mapping process). In this way we may regard $\Omega$ as a subset of the space $(\mathbb{R}^d)^{[0,\infty)}$ of all $\mathbb{R}^d$-valued functions on $[0, \infty)$.

Definition 2.16. Let $\mathbb{T} = \{0\} \cup \mathbb{N}$. A stochastic process $(X_t)_{t \in \mathbb{T}}$ on a filtered space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \mathbb{T}})$ is adapted to the filtration $(\mathcal{F}_t)_{t \in \mathbb{T}}$ if $X_t$ is $\mathcal{F}_t$-measurable for each $t \in \mathbb{T}$.

    3 The binomial stock price process

We will use a stochastic process $S$ on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t \in \mathbb{T}}, \mathbb{P})$, where $\mathbb{T}$ is some time index set, to represent the evolution of a stock price. We will use a discrete-time framework. For a finite horizon, $T$-period model, the time set will be

$$\mathbb{T} = \{0, 1, 2, \ldots, T\}.$$


For each $t \in \mathbb{T}$, $S_t$ will be a one-dimensional random variable on the measurable space $(\Omega, \mathcal{F})$. We assume also that for each $t \in \mathbb{T}$, $S_t$ is $\mathcal{F}_t$-measurable, so that $S$ is an $\mathbb{F}$-adapted process. This encapsulates the idea that the information at time $t \in \mathbb{T}$, represented by $\mathcal{F}_t$, is sufficient information to know the values of $S_s$ for all $s \le t$.

The sample space will be finite, say $\Omega = \{\omega_1, \ldots, \omega_N\}$, and the probability measure $\mathbb{P}$, called the physical measure (or objective measure, or the market measure), will be such that $\mathbb{P}(\omega_n) = p_n > 0$, for $n = 1, \ldots, N$. We shall assume $\mathcal{F}_T = \mathcal{F}$ and also $\mathcal{F}_0 = \{\emptyset, \Omega\}$. In this setting, a random variable $Y$ on $(\Omega, \mathcal{F}, \mathbb{P})$ is a vector in $\mathbb{R}^N$ with components $Y(\omega_n)$, for $n = 1, \ldots, N$. That is,

$$Y(\omega) \equiv (Y_1, \ldots, Y_N) = (Y(\omega_1), \ldots, Y(\omega_N)).$$

We introduce a riskless asset (or cash account, or money market account, or bond) with price process $S^0$. It is riskless because its return is never less than 1: $S^0_{t+1}/S^0_t = 1 + r_t \ge 1$. Specifically, its price will be assumed to evolve according to

$$S^0_{t+1} = (1 + r_t)S^0_t, \quad t = 0, 1, \ldots, T - 1,$$

where $(r_t)_{t=0}^{T-1}$ is the interest rate process, an $\mathbb{F}$-adapted process satisfying $r_t \ge 0$ almost surely for all $t \in \{0, 1, \ldots, T - 1\}$. Let us take this process to be constant, $r_t = r > 0$ for all $t \in \{0, 1, \ldots, T - 1\}$. We shall also assume $S^0_0 = 1$, so that with constant interest rate $r$, we have

$$S^0_t = (1 + r)^t, \quad t = 0, 1, \ldots, T.$$

Hence, $S^0_t$ represents the value at time $t$ of one unit of currency (say \$1) invested at time zero. So the return on the riskless asset is $1 + r \ge 1$, and this encapsulates the risk-free nature of this asset. In contrast, the return on the stock can be greater or less than 1, encapsulating that it is a risky asset.

Example 3.1 (One-period binomial model). Let $\Omega$ be the finite set

$$\Omega = \{H, T\},$$

the set of outcomes of a single coin toss. Take a 1-period model, with $\mathbb{T} = \{0, 1\}$. The risky stock price process is $(S_t)_{t \in \{0,1\}}$, and $S_1(\omega)$ takes on two possible values, $S_1(H)$ or $S_1(T)$, with values given by (see Figure 6)

$$S_1(\omega) = \begin{cases} S_0u, & \text{if } \omega = H, \\ S_0d, & \text{if } \omega = T, \end{cases}$$

where $u > 1 + r > d > 0$ are constants.


Figure 6: One-period binomial process for stock price. We have associated a probability $p \in (0, 1)$ with an upward stock price move.

Example 3.2 (Three-period binomial model). Let $\Omega$ be the finite set

$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\},$$

the set of all possible outcomes of three coin tosses. The sample points are of the form $\omega = \omega_1\omega_2\omega_3$, with $\omega_t$, $t = 1, 2, 3$, representing the outcome of the $t$th toss.

    18

  • 8/2/2019 b10b Notes

    20/101

    In this 3-period model, with T = {0, 1, 2, 3}, the risky stock price process is (St)3t=0, andSt() = St(1 . . . t) is the stock price after t tosses, with evolution given by (see Figure 7)

    St+1() = Stu, if t+1 = H,

    Std, if t+1 = T,

    t = 0, 1, 2,

    where u > 1 + r > d > 0 are constants. We also write

    St+1(1 . . . t+1) =

    St(1 . . . t)u, if t+1 = HSt(1 . . . t)d, if t+1 = T,

    t = 0, 1, 2,

    whenever we wish to emphasise that St actually depends only on the outcome of the first t cointosses.


Figure 7: Binomial process for stock price. We have associated a probability $p \in (0, 1)$ with an upward stock price move.

With $\mathcal{F}_t$ representing the $\sigma$-algebra determined by the first $t$ tosses, we take $\mathcal{F} = \mathcal{F}_3$. There are a total of 8 sample points $\omega \in \Omega$. We shall sometimes label these as the set $\Omega = \{\omega_1, \ldots, \omega_8\}$. So $\omega_1 = HHH$, $\omega_2 = HHT$, and so on.

Example 3.3 ($T$-period binomial model). As Example 3.2, but now let $\Omega$ be the set of all outcomes of $T$ coin tosses, so $\mathbb{T} = \{0, 1, \ldots, T\}$, and each $\omega \in \Omega$ is of the form $\omega = (\omega_1\omega_2 \ldots \omega_T)$, with each $\omega_t \in \{H, T\}$, for each $t \in \{1, \ldots, T\}$.

It is easy to see that at time $t \in \mathbb{T}$ the possible stock prices are $S^{(j)}_t$, $j = 0, 1, \ldots, t$, given by

$$S^{(j)}_t = S_0u^jd^{t-j}, \quad j = 0, 1, \ldots, t, \quad t \in \mathbb{T}.$$
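A minimal sketch of this price lattice in code (the parameter values $S_0 = 4$, $u = 2$, $d = 1/2$ anticipate the worked example in Section 3.1):

```python
def binomial_prices(S0, u, d, T):
    """Possible prices S_t^{(j)} = S0 u^j d^(t-j), j = 0, ..., t, for each t = 0, ..., T."""
    return [[S0 * u ** j * d ** (t - j) for j in range(t + 1)] for t in range(T + 1)]

for t, prices in enumerate(binomial_prices(4.0, 2.0, 0.5, 3)):
    print(t, prices)
# t = 2 gives [1.0, 4.0, 16.0], matching the tree of Section 3.1.
```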

    3.1 Filtration as information flow in the binomial model*

A filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in \mathbb{T}}$ represents information flow as time marches forward. Let us explain this idea and the associated notion of measurability, with reference to the three-period binomial model of a stock price given in Example 3.2.

Let $\Omega$ be the finite set of all outcomes of 3 coin tosses:

$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.$$

Each sample point is a sequence $\omega = (\omega_1\omega_2\omega_3)$, with $\omega_t$, $t = 1, 2, 3$, representing the outcome of the $t$th coin toss.

Suppose the coin has probability $p \in (0, 1)$ for $H$ and $q := 1 - p$ for $T$. With each coin toss independent, we have

$$\mathbb{P}(\omega) = \begin{cases} p^3, & \text{for } \omega = HHH, \\ p^2q, & \text{for } \omega \in \{HHT, HTH, THH\}, \\ pq^2, & \text{for } \omega \in \{HTT, THT, TTH\}, \\ q^3, & \text{for } \omega = TTT. \end{cases}$$

    We can easily write down all the stock prices in the tree as:


$$S_1(H) = uS_0, \quad S_1(T) = dS_0,$$
$$S_2(HH) = u^2S_0, \quad S_2(HT) = S_2(TH) = udS_0, \quad S_2(TT) = d^2S_0,$$
$$S_3(HHH) = u^3S_0, \quad S_3(HHT) = S_3(HTH) = S_3(THH) = u^2dS_0,$$
$$S_3(HTT) = S_3(THT) = S_3(TTH) = ud^2S_0, \quad S_3(TTT) = d^3S_0.$$

Define the following two subsets of $\Omega$:

$$A_H = \{HHH, HHT, HTH, HTT\}, \quad A_T = \{THH, THT, TTH, TTT\},$$

corresponding to the events that the first coin toss results in $H$ and $T$ respectively. Using the definition

$$\mathbb{P}(A) := \sum_{\omega \in A} \mathbb{P}(\omega),$$

we find

$$\mathbb{P}(A_H) = \mathbb{P}\{HHH, HHT, HTH, HTT\} = \mathbb{P}\{H \text{ on first toss}\} = p,$$

and

$$\mathbb{P}(A_T) = \mathbb{P}\{THH, THT, TTH, TTT\} = \mathbb{P}\{T \text{ on first toss}\} = q,$$

precisely in accordance with intuition.
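A quick computational confirmation of $\mathbb{P}(A_H) = p$ (a sketch; the value $p = 0.4$ is arbitrary):

```python
from itertools import product

p = 0.4                                   # any p in (0, 1)
q = 1.0 - p
Omega = [''.join(w) for w in product('HT', repeat=3)]
P = {w: p ** w.count('H') * q ** w.count('T') for w in Omega}

A_H = [w for w in Omega if w[0] == 'H']   # first toss is H
print(sum(P[w] for w in A_H))             # equals p: p^3 + 2p^2 q + p q^2 = p(p + q)^2 = p
```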

Here are two $\sigma$-algebras of subsets of the set $\Omega$:

$$\mathcal{F}_0 = \{\emptyset, \Omega\}, \quad \mathcal{F}_1 = \{\emptyset, \Omega, A_H, A_T\}.$$

As an easy exercise, you can verify that both the above collections of sets are indeed $\sigma$-algebras.

Let us shed some light on the sense in which $\sigma$-algebras are a record of information. Suppose, after the three coin tosses, that you are not told the outcome, but that you are told, for each set in $\mathcal{F}_1$, whether or not the outcome is in that set. For instance, you might be told that the outcome is not in $A_H$, but is in $A_T$ (you would also be told that the outcome is not in $\emptyset$ but is in $\Omega$, which is obvious). In effect, you have been told that the first toss was a $T$, and nothing more. In this sense the $\sigma$-algebra $\mathcal{F}_1$ contains the information of the first toss, or the information up to time 1.

The $\sigma$-algebra $\mathcal{F}_0$ corresponding to information at time zero (before the coin has been tossed, so no information on $\omega$ is available) is composed of all sets $A$ such that $\mathcal{F}_0$ is indeed a $\sigma$-algebra, and such that one can answer the question "is $\omega \in A$?", given that one has no information on the coin tosses. Hence all we can say is that $\omega \notin \emptyset$ and $\omega \in \Omega$, and so

$$\mathcal{F}_0 = \{\emptyset, \Omega\}.$$

At time 1, the coin has been tossed once. Then one knows that either $\omega_1 = H$ or that $\omega_1 = T$. In this case one can answer the following questions.

Is $\omega \in \emptyset$? No.


Is $\omega \in \Omega$? Yes.

Is $\omega \in A_H$? Yes, if $\omega_1 = H$; otherwise no.

Is $\omega \in A_T$? Yes, if $\omega_1 = T$; otherwise no.

One cannot answer a question such as: is $\omega \in \{HHH, HHT\}$? The $\sigma$-algebra $\mathcal{F}_1$ corresponding to information at time 1 is composed of all the sets $A$ such that $\mathcal{F}_1$ is indeed a $\sigma$-algebra, and such that one can answer the question "is $\omega \in A$?", given that one has information on the outcome of the first coin toss. This is why we must have

$$\mathcal{F}_1 = \{\emptyset, A_H, A_T, \Omega\}.$$

Define

$$A_{HH} = \{HHH, HHT\}, \quad A_{HT} = \{HTH, HTT\}, \quad A_{TH} = \{THH, THT\}, \quad A_{TT} = \{TTH, TTT\},$$

corresponding to the events that the first two coin tosses result in $HH$, $HT$, $TH$ and $TT$ respectively. Consider the collection of sets

$$\mathcal{F}_2 = \{\emptyset, \Omega, A_{HH}, A_{HT}, A_{TH}, A_{TT}, \text{plus all unions of these}\}.$$

Then $\mathcal{F}_2$ can be written as (check)

$$\mathcal{F}_2 = \{\emptyset, \Omega, A_{HH}, A_{HT}, A_{TH}, A_{TT}, A_H, A_T, A_{HH} \cup A_{TH}, A_{HH} \cup A_{TT}, A_{HT} \cup A_{TH}, A_{HT} \cup A_{TT}, A^c_{HH}, A^c_{HT}, A^c_{TH}, A^c_{TT}\}.$$

Then $\mathcal{F}_2$ is indeed a $\sigma$-algebra (a tedious and lengthy verification) which contains the information of the first two tosses, or the information up to time 2. This is because, if you know the outcome of the first two coin tosses, you can say whether the outcome $\omega$ of all three tosses satisfies $\omega \in A$, for each $A \in \mathcal{F}_2$.

Similarly, $\mathcal{F}_3 \equiv \mathcal{F}$, the set of all subsets of $\Omega$, contains full information about the outcome of all three tosses, whilst the trivial $\sigma$-algebra $\mathcal{F}_0$ contains no information. Knowing whether the outcome $\omega$ of the three tosses is in $\emptyset$ (it is not) and whether it is in $\Omega$ (it is) tells you nothing about $\omega$. The sequence of $\sigma$-algebras $\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2, \mathcal{F}_3$ is a filtration.

    Let us show that the stock price process (St)tT is indeed F-adapted. That is, the randomvariable St is known after t coin tosses, equivalently, St is Ft-measurable, for each t T.

    First, S0 must be a deterministic constant:

    S0(ω) = a ∈ R,  ω ∈ Ω,

    since one has no information on the outcome of the coin tosses at time zero. Then, sets of the form

    {S0 ∈ A ⊆ R}

    are clearly either ∅ or Ω, so S0 is F0-measurable.

    The random variable S1 must be of the form

    S1(ω) = { a ∈ R, if ω ∈ AH,
              b ∈ R, if ω ∈ AT,

    since the only information available at time 1 is whether ω1 = H or ω1 = T, and we notice that S1 is indeed of the above form.

    Continuing to argue in this fashion, we find that at each time t ∈ T, the event

    {ω ∈ Ω : St(ω) ∈ A ⊆ R} = {St ∈ A}

    is in Ft. The stochastic process S = (St)t∈T is said to be adapted to the filtration F = (Ft)t∈T, as in Definition 2.16.


    Now suppose S0 = 4, u = 2 and d = 1/2. Then S0, S1, S2 and S3 are all random variables, denoted by St(ω) for t = 0, 1, 2, 3 and ω ∈ Ω. We may calculate the values of S2(ω) for all ω, as

    S2(HHH) = S2(HHT) = 16,
    S2(HTH) = S2(HTT) = S2(THH) = S2(THT) = 4,
    S2(TTH) = S2(TTT) = 1.

    Now consider the preimage under the random variable S2 of certain sets in R. Specifically, consider the interval [4, 29]. The preimage under S2 of this interval is

    {ω ∈ Ω : S2(ω) ∈ [4, 29]}.

    We can characterise the above subset of Ω in terms of one of the sets given earlier in the list of sets in F2. We have: {ω ∈ Ω : S2(ω) ∈ [4, 29]} = A^c_TT.
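    This preimage calculation can be checked directly; here is a short Python sketch (our own variable names) for S0 = 4, u = 2, d = 1/2:

```python
from itertools import product

S0, u, d = 4, 2.0, 0.5
omegas = [''.join(w) for w in product('HT', repeat=3)]

def S(t, w):
    """Stock price after the first t tosses of the path w."""
    heads = w[:t].count('H')
    return S0 * u**heads * d**(t - heads)

# Preimage of [4, 29] under S2
preimage = {w for w in omegas if 4 <= S(2, w) <= 29}

# A_TT = {TTH, TTT}; the preimage should be its complement
assert preimage == set(omegas) - {'TTH', 'TTT'}
```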

    Suppose we list, in as minimal a fashion as possible, the subsets of Ω that we can get as preimages under S2 of sets in R, along with sets which can be built by taking unions of these; then this collection of sets turns out to be a σ-algebra, the σ-algebra generated by the random variable S2, denoted σ(S2).

    Now, if ω ∈ AHH, then S2(ω) = 16. If ω ∈ AHT ∪ ATH, then S2(ω) = 4. If ω ∈ ATT, then S2(ω) = 1. Hence σ(S2) is composed of ∅, Ω, AHH, AHT ∪ ATH, ATT, plus all relevant unions and complements. Using the identities

    AHH ∪ (AHT ∪ ATH) = A^c_TT,
    AHH ∪ ATT = (AHT ∪ ATH)^c,
    (AHT ∪ ATH) ∪ ATT = A^c_HH,

    we obtain

    σ(S2) = {∅, Ω, AHH, AHT ∪ ATH, ATT, AHH ∪ ATT, A^c_HH, A^c_TT}.

    The information content of the σ-algebra σ(S2) is exactly the information learned by observing S2. So, suppose the coin is tossed three times and you do not know the outcome ω, but you are told, for each set in σ(S2), whether ω is in the set. For instance, you might be told that ω is not in AHH, is in AHT ∪ ATH, and is not in ATT. Then you know that in the first two throws there was a head and a tail, but you are not told in which order they occurred. This is the same information you would have got by being told that the value of S2(ω) is 4.

    Note that F2 contains all the sets which are in σ(S2), and even more. In other words, the information in the first two throws is greater than the information in S2. In particular, if you see the first two tosses you can distinguish AHT from ATH, but you cannot make this distinction from knowing the value of S2 alone.
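    To make the atoms of σ(S2) concrete, one can group outcomes by the observed value of S2; the blocks that emerge are exactly AHH, AHT ∪ ATH and ATT, and σ(S2) consists of all unions of these blocks. A sketch (our own naming):

```python
from itertools import product
from collections import defaultdict

S0, u, d = 4, 2.0, 0.5
omegas = [''.join(w) for w in product('HT', repeat=3)]

def S2(w):
    heads = w[:2].count('H')
    return S0 * u**heads * d**(2 - heads)

# Two outcomes lie in the same atom of sigma(S2) iff S2 agrees on them
atoms = defaultdict(set)
for w in omegas:
    atoms[S2(w)].add(w)

for value, atom in sorted(atoms.items()):
    print(value, sorted(atom))
# 1.0  ['TTH', 'TTT']                 (= A_TT)
# 4.0  ['HTH', 'HTT', 'THH', 'THT']   (= A_HT u A_TH)
# 16.0 ['HHH', 'HHT']                 (= A_HH)
```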

    4 Conditional expectation and martingales

    We review independence in a finite probability space (Ω, F, P). Many of the definitions as written here extend to general probability spaces.

    Definition 4.1 (Independence of sets). Two sets A ∈ F and B ∈ F are independent if

    P(A ∩ B) = P(A)P(B).

    To see that this is a correct definition, suppose that a random experiment is conducted, and ω is the outcome. The probability that ω ∈ A is P(A). Suppose you are not told ω, but you are told that ω ∈ B. Conditional on this information, the probability that ω ∈ A is

    P(A|B) := P(A ∩ B)/P(B).


    The sets A and B are independent if and only if this conditional probability is the unconditional probability P(A), i.e. knowing that ω ∈ B does not change the probability you assign to A. This discussion is symmetric with respect to A and B; if A and B are independent and you know that ω ∈ A, the conditional probability you assign to B is still the unconditional probability P(B). Note that whether two sets are independent depends on the probability measure P.

    Definition 4.2 (Independence of σ-algebras). Let G and H be sub-σ-algebras of F. We say that G and H are independent if every set in G is independent of every set in H, i.e.

    P(A ∩ B) = P(A)P(B), for every A ∈ G, B ∈ H.

    Definition 4.3 (Independence of random variables). Two random variables X and Y are independent if the σ-algebras they generate, σ(X) and σ(Y), are independent.

    The above definition says that for independent random variables X and Y, every set defined in terms of X is independent of every set defined in terms of Y.

    Suppose X and Y are independent random variables. The measure induced by X on R is μ_X(A) := P{X ∈ A}, for A ⊆ R. Similarly, the measure induced by Y is μ_Y(B) := P{Y ∈ B}, for B ⊆ R. The pair (X, Y) takes values in the plane R², and we define the measure induced by the pair (X, Y) as

    μ_{X,Y}(C) := P{(X, Y) ∈ C},  C ⊆ R².

    In particular, C could be a rectangle, i.e. a set of the form A × B, where A ⊆ R and B ⊆ R. In this case

    {(X, Y) ∈ C} = {(X, Y) ∈ A × B} = {X ∈ A} ∩ {Y ∈ B},

    and X and Y are independent if and only if

    μ_{X,Y}(A × B) = P({X ∈ A} ∩ {Y ∈ B}) = P{X ∈ A} P{Y ∈ B} = μ_X(A) μ_Y(B).

    In other words, for independent random variables X and Y, the joint distribution represented by μ_{X,Y} factorises into the product of the marginal distributions represented by the measures μ_X and μ_Y.

    Theorem 4.4. Suppose X and Y are independent random variables. Let g and h be functions from R to R. Then g(X) and h(Y) are also independent random variables.

    Proof. Put W = g(X) and Z = h(Y). We must consider sets in σ(W) and σ(Z). But a typical set in σ(W) is of the form

    {ω ∈ Ω : W(ω) ∈ A} = {ω ∈ Ω : g(X(ω)) ∈ A},

    which is defined in terms of the random variable X, and is therefore in σ(X). So, every set in σ(W) is also in σ(X). Similarly every set in σ(Z) is also in σ(Y). Since every set in σ(X) is independent of every set in σ(Y), we conclude that every set in σ(W) is independent of every set in σ(Z).

    Definition 4.5. Let X1, X2, . . . be a sequence of random variables. We say that these random variables are independent if for every sequence of sets A1 ∈ σ(X1), A2 ∈ σ(X2), . . ., and for every positive integer n,

    P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1)P(A2) · · · P(An).

    Theorem 4.6. If two random variables X and Y are independent, and if g and h are functions from R to R, then

    E[g(X)h(Y)] = E[g(X)] E[h(Y)],

    provided all the expectations are defined.


    Proof. Note that by Theorem 4.4 it is enough to prove the result for g(x) = x and h(y) = y. We prove this only in a finite probability space, when X, Y can take on only finitely many values X = x_i, i = 1, . . . , K, and Y = y_j, j = 1, . . . , L. We use the fact that in this case, the expectation of X has the familiar form E[X] = Σ_{i=1}^K x_i P{X = x_i}. So we have

    E[XY] = Σ_{i=1}^K Σ_{j=1}^L x_i y_j P({X = x_i} ∩ {Y = y_j})
          = Σ_{i=1}^K Σ_{j=1}^L x_i y_j P{X = x_i} P{Y = y_j}
          = ( Σ_{i=1}^K x_i P{X = x_i} ) ( Σ_{j=1}^L y_j P{Y = y_j} )
          = E[X] E[Y].

    Remark 4.7. For general probability spaces, the above theorem is proved using the Lebesgue integral representation of expectation, and an argument which Shreve [14] (Section 1.5) calls "the standard machine". Let g(x) = IA(x) and h(y) = IB(y) be indicator functions. Then the equation we are trying to prove becomes

    P({X ∈ A} ∩ {Y ∈ B}) = P{X ∈ A} P{Y ∈ B},

    which is true because X and Y are independent. Now this is extended to simple functions (linear combinations of indicator functions) by linearity of expectation. Sequences of such functions can always be constructed that converge to general functions g and h, and then an integral convergence theorem, the Monotone Convergence Theorem,⁵ gives the result.

    ⁵The Monotone Convergence Theorem is as follows. Let Xn, n = 1, 2, . . . be a sequence of random variables converging almost surely to a random variable X (that is, P{Xn → X} = 1). Assume that 0 ≤ X1 ≤ X2 ≤ · · · almost surely. Then

    ∫_Ω X dP = lim_{n→∞} ∫_Ω Xn dP,

    or equivalently E[X] = lim_{n→∞} E[Xn].

    The covariance of two random variables X and Y is

    cov(X, Y) := E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y],

    so var(X) = cov(X, X). According to Theorem 4.6, two independent random variables have zero covariance (though the converse is not necessarily true!).

    For independent random variables, the variance of their sum is the sum of their variances. Indeed, for any two random variables X and Y, if Z = X + Y, then

    var(Z) = var(X + Y) = var(X) + var(Y) + 2 cov(X, Y),

    so that for independent X and Y, var(X + Y) = var(X) + var(Y). This argument extends to any finite number of random variables: if we are given independent random variables X1, X2, . . . , Xn, then

    var(X1 + · · · + Xn) = var(X1) + · · · + var(Xn).
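    As a numerical illustration (our own construction, not from the text), take X and Y to be the indicators of a head on the first and second of two tosses; these are independent, so their covariance vanishes and their variances add:

```python
from itertools import product

p = 0.3
omegas = [''.join(w) for w in product('HT', repeat=2)]
prob = {w: p**w.count('H') * (1 - p)**w.count('T') for w in omegas}

def E(X):
    return sum(X(w) * prob[w] for w in omegas)

X = lambda w: 1.0 if w[0] == 'H' else 0.0    # depends on toss 1 only
Y = lambda w: 1.0 if w[1] == 'H' else 0.0    # depends on toss 2 only

cov = E(lambda w: X(w) * Y(w)) - E(X) * E(Y)
var = lambda Z: E(lambda w: Z(w)**2) - E(Z)**2

assert abs(cov) < 1e-12
assert abs(var(lambda w: X(w) + Y(w)) - (var(X) + var(Y))) < 1e-12
```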

    Example 4.8. Toss a coin twice, so Ω = {HH, HT, TH, TT}, with probability p ∈ (0, 1) for H and probability q = 1 − p for T on each toss. Let A = {HH, HT} and B = {HT, TH}. We



    have P(A) = p² + pq = p, P(B) = pq + qp = 2pq, and P(A ∩ B) = pq. The sets A and B are independent if and only if 2p²q = pq, that is, if and only if p = 1/2.

    Let G = F1 be the σ-algebra determined by the first toss and H be the σ-algebra determined by the second toss. Then, writing AH := {HH, HT} and AT := {TH, TT}, we have

    G = {∅, Ω, AH, AT},
    H = {∅, Ω, {HH, TH}, {HT, TT}}.

    It is easy to see that these two σ-algebras are independent. For example, if we choose AH from G and {HH, TH} from H, we find

    P(AH) P{HH, TH} = p(p² + pq) = p²,
    P(AH ∩ {HH, TH}) = P{HH} = p².

    This will be true no matter which sets we choose from G and H. This captures the notion that the coin tosses are independent of each other.
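    One can confirm this exhaustively with a few lines of Python (the listing of G and H below is an illustrative encoding of the two σ-algebras):

```python
from itertools import product

p = 0.3
q = 1 - p
omegas = [''.join(w) for w in product('HT', repeat=2)]
prob = {w: p**w.count('H') * q**w.count('T') for w in omegas}
P = lambda A: sum(prob[w] for w in A)

all_w = set(omegas)
G = [set(), all_w, {'HH', 'HT'}, {'TH', 'TT'}]   # determined by toss 1
H = [set(), all_w, {'HH', 'TH'}, {'HT', 'TT'}]   # determined by toss 2

# Every set in G is independent of every set in H
assert all(abs(P(A & B) - P(A) * P(B)) < 1e-12 for A in G for B in H)
```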

    4.1 Conditional expectation

    The conditional expectation E[X|G] of a random variable X given the σ-algebra G is, as we shall see, a random variable that expresses mathematically the best estimate of X given the information represented by G.

    To motivate this idea, we will use our familiar example of a binomial model. Recall the three-period binomial model of Example 3.2, analysed in Section 3.1.

    We say that a set A ⊆ Ω is determined by the first t coin tosses if, knowing only the outcome of the first t tosses, we can decide whether the outcome of all the tosses is in A. We denote the collection of sets determined by the first t tosses by Ft (and this is a σ-algebra). The random variable St is in fact Ft-measurable for each t ∈ T. In Section 3.1 we encountered some of the Ft σ-algebras for the case T = {0, 1, 2, 3}.

    Definition 4.9 (Information carried by a random variable). Let X be a random variable on Ω. We say that a set A ⊆ Ω is determined by the random variable X if, knowing only the value X(ω) of the random variable, we can decide whether or not ω ∈ A. Another way of saying this is that for every x ∈ R, either X⁻¹(x) ⊆ A or X⁻¹(x) ∩ A = ∅. The collection of subsets of Ω determined by X is a σ-algebra, called the σ-algebra generated by X, and denoted by σ(X).

    If the random variable X takes finitely many different values, then σ(X) is generated by the collection of sets {ω ∈ Ω : X(ω) = x_i}, where x_i (i = 1, . . . , K) are the possible values of X. We can write this collection as {X⁻¹(x_i)} or {X⁻¹(X(ω)) | ω ∈ Ω}. These sets are called the atoms of the σ-algebra σ(X).

    In general, if X is a random variable Ω → R, then σ(X) is given by

    σ(X) = {X⁻¹(B); B ∈ B(R)},

    where B(R) denotes the Borel σ-algebra on R.

    Section 3.1 gave an example from the 3 coin-toss experiment of the sets determined by the stock price at time 2, S2, which form the σ-algebra generated by S2. This is given by

    σ(S2) = {∅, Ω, AHH, ATT, AHT ∪ ATH, plus complements and unions},

    where

    AHH = {HHH, HHT} = {ω : S2(ω) = u²S0},
    ATT = {TTH, TTT} = {ω : S2(ω) = d²S0},
    AHT ∪ ATH = {ω : S2(ω) = udS0}.


    4.1.1 Conditional expectation in the three-period binomial model

    To talk about conditional expectation, we need to introduce a probability measure on our coin-toss sample space Ω. As usual, we do this by defining p ∈ (0, 1) as the probability of H on any throw, and q = 1 − p as the probability of T. For a set A ⊆ Ω, the indicator function IA(ω) of the set A is defined in the usual way, and

    E[IA X] = ∫_A X dP = Σ_{ω ∈ A} X(ω) P(ω).

    We can think of E[IA X] as a partial average of X over the set A.

    We give an example of a conditional expectation in the 3-period binomial model, to motivate the general definition which follows. Let us estimate S1, given S2. Denote this estimate by E[S1|S2]. We expect the properties of E[S1|S2] to be:

    E[S1|S2] should depend on ω, that is, it is a random variable: E[S1|S2] = E[S1|S2(ω)] = E[S1|S2](ω);

    If the value of S2 is known then the value of E[S1|S2] should also be known.

    In particular, if ω = HHH or ω = HHT, then S2(ω) = u²S0 and, even without knowing ω, we know that S1(ω) = uS0. We therefore define

    E[S1|S2](HHH) = E[S1|S2](HHT) = uS0.

    Similarly we define

    E[S1|S2](TTT) = E[S1|S2](TTH) = dS0.

    Finally, if A = AHT ∪ ATH = {HTH, HTT, THH, THT}, then S2(ω) = udS0, so that S1(ω) = uS0 or S1(ω) = dS0. So, to get E[S1|S2] in this case, we take a weighted average, as follows. For ω ∈ A we define

    E[S1|S2](ω) = (∫_A S1 dP) / P(A),  ω ∈ A = AHT ∪ ATH,

    which is a partial average of S1 over the set A, normalised by the probability of A. Now, P(A) = 2pq and ∫_A S1 dP = pq(u + d)S0, so that for ω ∈ A

    E[S1|S2](ω) = (1/2)(u + d)S0,  ω ∈ A = AHT ∪ ATH.

    (In other words, the best estimate of S1, given that S2 = udS0, is the average of the possibilities uS0 and dS0.) Then we have that

    ∫_A E[S1|S2] dP = ∫_A S1 dP.

    In conclusion, we can write

    E[S1|S2](ω) = g(S2(ω)),

    where

    g(x) = { uS0,              if x = u²S0,
             (1/2)(u + d)S0,   if x = udS0,
             dS0,              if x = d²S0.

    In other words, E[S1|S2] is random only through dependence on S2 (and hence is σ(S2)-measurable). The random variable has two fundamental properties:

    1. E[S1|S2] is σ(S2)-measurable;


    2. for every A ∈ σ(S2),

       ∫_A E[S1|S2] dP = ∫_A S1 dP,

    which is known as the partial averaging property. (A numerical sketch of this example is given below.)
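    Both properties can be verified numerically by averaging over the atoms of σ(S2). Here is a sketch (all names are ours), using S0 = 4, u = 2, d = 1/2 and an arbitrary p:

```python
from itertools import product
from collections import defaultdict

S0, u, d, p = 4, 2.0, 0.5, 0.3
q = 1 - p
omegas = [''.join(w) for w in product('HT', repeat=3)]
prob = {w: p**w.count('H') * q**w.count('T') for w in omegas}

def S(t, w):
    h = w[:t].count('H')
    return S0 * u**h * d**(t - h)

# Atoms of sigma(S2): group outcomes by the value of S2
atoms = defaultdict(list)
for w in omegas:
    atoms[S(2, w)].append(w)

# E[S1|S2](w) = partial average of S1 over the atom containing w
cond = {}
for atom in atoms.values():
    P_atom = sum(prob[w] for w in atom)
    avg = sum(S(1, w) * prob[w] for w in atom) / P_atom
    for w in atom:
        cond[w] = avg

# On the atom {S2 = u d S0}, the estimate is (u + d) S0 / 2
assert abs(cond['HTH'] - 0.5 * (u + d) * S0) < 1e-12

# Partial averaging over each atom (hence over every set in sigma(S2))
for atom in atoms.values():
    lhs = sum(cond[w] * prob[w] for w in atom)
    rhs = sum(S(1, w) * prob[w] for w in atom)
    assert abs(lhs - rhs) < 1e-12
```

    Checking the atoms suffices: every set in σ(S2) is a union of atoms, so partial averaging over the atoms implies it over the whole σ-algebra.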

    Definition 4.10 (Conditional expectation). Let (Ω, F, P) be a probability space, and let G be a sub-σ-algebra of F. Let X be a random variable on (Ω, F, P). Then the conditional expectation E[X|G] is defined to be any random variable Y that satisfies

    1. Y = E[X|G] is G-measurable,

    2. for every set A ∈ G, we have the partial averaging property

       ∫_A Y dP ≡ ∫_A E[X|G] dP = ∫_A X dP.

    Note (we do not prove this here) that there is always a random variable Y satisfying the above properties (provided that E|X| < ∞), i.e. conditional expectations always exist. There can be more than one random variable satisfying the above properties, but if Y′ is another one, then Y′ = Y with probability 1 (or almost surely (a.s.)).

    For random variables X, Y it is standard notation to write

    E[X|Y] := E[X|σ(Y)].

    Here are some ways to interpret E[X|G].

    A random experiment is performed, i.e. an element ω ∈ Ω is selected. The value of ω is partially but not fully revealed to us, so we cannot compute the exact value of X(ω). Based on what we know about ω, we compute an estimate of X(ω) which, since it depends on the partial information we have about ω, depends on ω, i.e. E[X|G] = E[X|G](ω) is a function of ω, though this dependence is often not shown explicitly.

    If the σ-algebra G contains finitely many sets, there will be a smallest set A ∈ G containing ω, which is the intersection of all sets in G containing ω. The way ω is partially revealed to us is that we are told it is in A, but not told which element of A it is. We then define E[X|G](ω) to be the average (with respect to P) value of X over the set A. Thus, for all ω ∈ A, E[X|G](ω) will be the same.

    4.1.2 Partial averaging

    The partial averaging property is

    ∫_A E[X|G] dP = ∫_A X dP,  A ∈ G.

    We can rewrite this as

    E[IA E[X|G]] = E[IA X]. (4.1)

    Note that IA(ω) (which equals 1 for ω ∈ A and 0 otherwise) is a G-measurable random variable. Equation (4.1) suggests (and it is indeed true) that the following holds.

    Lemma 4.11. If V is any G-measurable random variable, then, provided E|V E[X|G]| < ∞,

    E[V E[X|G]] = E[V X]. (4.2)

    Proof. Here is a sketch of the proof in a general probability space. First use (4.1) and linearity of expectations to prove (4.2) when V is a simple G-measurable random variable, i.e. V = Σ_{k=1}^K c_k I_{A_k}, where each A_k ∈ G and each c_k is constant. Next consider the case that V is a nonnegative G-measurable random variable, not necessarily simple. Such a V


    can be written as the limit of an increasing (almost surely) sequence of simple random variables Vn. We write (4.2) for each Vn and pass to the limit n → ∞, using the Monotone Convergence Theorem, to obtain (4.2) for V. Finally, the general (integrable) G-measurable random variable V can be written as the difference of two nonnegative random variables, V = V⁺ − V⁻, and since (4.2) holds for V⁺ and V⁻ it must hold for V as well. This is the standard machine argument.

    Based on Lemma 4.11, we can replace the second condition in the definition of conditional expectation by (4.2), so that the defining properties of Y = E[X|G] are:

    1. Y = E[X|G] is G-measurable,

    2. for every G-measurable random variable V, we have

       E[V E[X|G]] = E[V X]. (4.3)

    Notice that we can write (4.3) as

    E[V (E[X|G] − X)] = 0,

    which allows an interpretation of E[X|G] as the projection of the vector X onto the subspace of G-measurable random variables. Then E[X|G] − X is perpendicular to any V in the subspace.

    4.1.3 Properties of conditional expectation

    Conditional expectations have many useful properties, which we list below (some with sketch proofs). All the X below satisfy E|X| < ∞.

    1. E[E[X|G]] = E[X].

    Proof. Just take A = Ω in the partial averaging property (or, equivalently, V = I_Ω in (4.2)).

    The conditional expectation of X is thus an unbiased estimator of the random variable X.

    2. If X is G-measurable, then E[X|G] = X.

    Proof. The partial averaging property ∫_A Y dP ≡ ∫_A E[X|G] dP = ∫_A X dP holds trivially when Y is replaced by X. Then, if X is G-measurable, it satisfies the first requirement in the definition of conditional expectation as well.

    In other words, if the information content of G is sufficient to determine X, then the best estimate of X based on G is X itself.

    3. (Linearity) For a1, a2 ∈ R,

       E[a1X1 + a2X2|G] = a1 E[X1|G] + a2 E[X2|G].

    Proof. By linearity of integrals (i.e. of expectations), as follows: E[a1X1 + a2X2|G] is G-measurable and satisfies, for any A ∈ G,

    ∫_A E[a1X1 + a2X2|G] dP = ∫_A (a1X1 + a2X2) dP                       (partial averaging)
                            = a1 ∫_A X1 dP + a2 ∫_A X2 dP                (linearity of integrals)
                            = a1 ∫_A E[X1|G] dP + a2 ∫_A E[X2|G] dP      (partial averaging)
                            = ∫_A (a1 E[X1|G] + a2 E[X2|G]) dP           (linearity of integrals),


    and this is the partial averaging property.

    4. (Positivity) If X ≥ 0 almost surely, then E[X|G] ≥ 0 almost surely.

    Proof. Let A = {ω ∈ Ω : E[X|G](ω) < 0}. This set is in G since E[X|G] is G-measurable. Now, the partial averaging property implies that

    ∫_A E[X|G] dP = ∫_A X dP.

    The RHS of this is ≥ 0, and the LHS is < 0 unless P(A) = 0. Therefore we must have P(A) = 0, i.e. E[X|G] ≥ 0 almost surely.

    5. (Jensen's inequality) If φ : R → R is convex and E|φ(X)| < ∞, then

       E[φ(X)|G] ≥ φ(E[X|G]).

    Proof. Recall the usual Jensen inequality: E[φ(X)] ≥ φ(E[X]). The proof of the conditional version follows exactly the same lines (see the proof of Theorem 23.9 in Jacod and Protter [9]).

    6. (Tower property) If H is a sub-σ-algebra of G, then

       E[E[X|G]|H] = E[X|H], a.s.

    Proof. If H ∈ H, then it is also true that H ∈ G, since H is a sub-σ-algebra of G. Hence

    ∫_H E[E[X|G]|H] dP = ∫_H E[X|G] dP = ∫_H X dP = ∫_H E[X|H] dP,

    so by a.s. uniqueness of conditional expectations, there is a unique H-measurable random variable on the left and right hand sides of the above, so we must have E[E[X|G]|H] = E[X|H] a.s.

    The intuition here is that G contains more information than H. If we estimate X based on the information in G, and then estimate the estimator based on the smaller amount of information in H, then we get the same result as if we had estimated X directly based on the information in H.

    7. (Taking out what is known). If Z is G-measurable, then

    E[ZX|G] = Z E[X|G].


    Proof. Z E[X|G] is G-measurable (since the product of G-measurable functions is G-measurable), so it satisfies the first property of a conditional expectation. So we check the partial averaging property. For A ∈ G we have

    ∫_A Z E[X|G] dP = E[IA Z E[X|G]] = E[IA Z X] = ∫_A ZX dP = ∫_A E[ZX|G] dP,

    (the second equality obtained using (4.3) with V = IA Z), so the partial averaging property holds.

    8. (Role of independence) If X is independent of H (i.e. if σ(X) and H are independent σ-algebras), then

       E[X|H] = E[X]. (4.4)

    Proof. Observe first that E[X] is H-measurable, since it is not random. So we only need to check the partial averaging property; we require that

    ∫_A E[X] dP = ∫_A X dP,  A ∈ H.

    If X is an indicator of some set B, which by assumption must be independent of H, then the partial averaging equation we must check is

    ∫_A P(B) dP = ∫_A IB dP.

    The LHS is P(A)P(B), and the RHS is

    ∫ IA IB dP = ∫ I_{A∩B} dP = P(A ∩ B),

    and so the partial averaging property holds because the sets A and B are independent. The partial averaging property for general X independent of H then follows by the standard machine.

    The intuition behind (4.4) is that if X is independent of H, then the best estimate of X based on the information in H is E[X], the same as the best estimate of X based on no information.

    Remark 4.12. There are also analogues of integral convergence theorems such as Fatou's Lemma, and the Monotone and Dominated Convergence Theorems, for conditional expectations as opposed to ordinary expectations.

    Example 4.13. Consider the 3-period binomial model of stock price movements, with sample space given by

    Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

    The stock price, initially S0, goes up or down depending on the coin toss. If St(ω) = St(ω1 . . . ωt) is the stock price after t tosses, it evolves as

    St+1(ω) = { St(ω)u, if ωt+1 = H,
                St(ω)d, if ωt+1 = T,

    where u > 1 > d > 0 are constants. Let Ft represent the σ-algebra determined by the first t tosses. The σ-algebra determined by the first toss is F1 = {∅, Ω, AH, AT}, where AH (respectively AT) is the event corresponding to a H (respectively a T) on the first toss.


    Using the partial averaging property on the sets AH and AT, we can show (the obvious fact) that

    E[S2|F1](ω) = (pu + qd)S1(ω),

    as follows: E[S2|F1] is constant on AH (and constant on AT) and must satisfy the partial averaging property on these sets:

    ∫_{AH} E[S2|F1] dP = ∫_{AH} S2 dP,
    ∫_{AT} E[S2|F1] dP = ∫_{AT} S2 dP.

    (Obviously the partial averaging property is true on ∅ (all integrals are zero), and it will be true on Ω if it is true on AH and AT, since AH ∪ AT = Ω.) On AH we have

    ∫_{AH} E[S2|F1] dP = P(AH) E[S2|F1](ω) = p E[S2|F1](ω),  ω ∈ AH  (since E[S2|F1] is constant over AH),

    whilst on the other hand

    ∫_{AH} S2 dP = p²u²S0 + pq·udS0.

    Hence

    E[S2|F1](ω) = pu²S0 + q·udS0 = (pu + qd)uS0 = (pu + qd)S1(ω),  ω ∈ AH.

    Similarly, we can show that

    E[S2|F1](ω) = (pu + qd)S1(ω),  ω ∈ AT.

    So overall we get

    E[S2|F1](ω) = (pu + qd)S1(ω),  ω ∈ Ω.

    With F0 = {∅, Ω}, we can show similarly that E[S1|F0] = (pu + qd)S0. F0 contains no information, so any F0-measurable random variable must be constant (non-random). Therefore E[S1|F0] is that constant which satisfies the averaging property

    ∫_Ω E[S1|F0] dP = ∫_Ω S1 dP = E[S1] = (pu + qd)S0,

    and so we have E[S1|F0] = (pu + qd)S0.

    In a T-period binomial model, we have

    E[St+1|Ft] = (pu + qd)St,  t = 0, 1, . . . , T − 1.

    To show this, define, for any t = 0, 1, . . . , T − 1, the random variable

    X := St+1/St.

    Then X = u if ωt+1 = H and X = d if ωt+1 = T, and X is independent of Ft because each coin toss is independent. Hence

    E[St+1|Ft] = E[XSt|Ft] = St E[X|Ft] = St E[X] = (pu + qd)St.

    Note that the stock price is a Markov process.
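    The one-step conditional expectation formula is easy to check by brute force: conditioning on Ft amounts to fixing the first t tosses and averaging over the single next toss. A sketch (our own naming):

```python
from itertools import product

S0, u, d, p = 4, 2.0, 0.5, 0.3
q = 1 - p
T = 3

def S(path):
    """Stock price after the tosses in `path` (a string of H's and T's)."""
    h = path.count('H')
    return S0 * u**h * d**(len(path) - h)

def cond_exp_next(path):
    """E[S_{t+1} | F_t] on the event that the first t tosses were `path`."""
    return p * S(path + 'H') + q * S(path + 'T')

for t in range(T):
    for path in map(''.join, product('HT', repeat=t)):
        assert abs(cond_exp_next(path) - (p*u + q*d) * S(path)) < 1e-12
```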


    4.2 Martingales

    Definition 4.14 (Martingale). A stochastic process M = (Mt)_{t=0}^T on a filtered probability space (Ω, F, F := (Ft)_{t=0}^T, P) is a martingale with respect to the filtration F = (Ft)_{t=0}^T if

    1. each Mt is Ft-measurable (so the process (Mt)_{t=0}^T is adapted to the filtration (Ft)_{t=0}^T),

    2. for each t ∈ {0, 1, . . . , T}, E|Mt| < ∞,

    3. E[Mt+1|Ft] = Mt, t = 0, 1, . . . , T − 1.

    So martingales tend to go neither up nor down. A supermartingale tends to go down, i.e. the last condition above is replaced by E[Mt+1|Ft] ≤ Mt. A submartingale tends to go up, i.e. E[Mt+1|Ft] ≥ Mt.

    A simple argument using the tower property and induction shows the following.

    Lemma 4.15. E[Mt+u|Ft] = Mt, for arbitrary u ∈ {1, . . . , T − t}.

    Proof. Consider E[Mt+2|Ft]. By the tower property,

    E[Mt+2|Ft] = E[E[Mt+2|Ft+1]|Ft] = E[Mt+1|Ft] = Mt,

    and continuing in this fashion we get

    E[Mt+u|Ft] = Mt, for u = 1, 2, . . . , T − t.

    Let X be an integrable random variable (E|X| < ∞) on a filtered probability space (Ω, F, F := (Ft)_{t=0}^T, P). Define

    Mt := E[X|Ft],  t ∈ {0, 1, . . . , T}.

    We then have:

    Lemma 4.16. M := (Mt)_{t=0}^T is a (P, F)-martingale.

    Proof. We have

    E[Mt+1|Ft] = E[E[X|Ft+1]|Ft] = E[X|Ft] (by the tower property) = Mt.
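    A martingale obtained this way, by conditioning a fixed random variable on more and more information, is sometimes called a Doob martingale. A quick numerical check with X = S3 in the three-period model (names are ours):

```python
from itertools import product

S0, u, d, p = 4, 2.0, 0.5, 0.3
q = 1 - p
T = 3

def prob_tail(tail):
    """Probability of a given continuation of the path."""
    return p**tail.count('H') * q**tail.count('T')

def S3(w):
    return S0 * u**w.count('H') * d**(T - w.count('H'))

def M(path):
    """M_t = E[S3 | F_t], on the event that the first t tosses were `path`."""
    tails = map(''.join, product('HT', repeat=T - len(path)))
    return sum(prob_tail(tl) * S3(path + tl) for tl in tails)

# One-step martingale property: E[M_{t+1} | F_t] = M_t
for t in range(T):
    for path in map(''.join, product('HT', repeat=t)):
        assert abs(p * M(path + 'H') + q * M(path + 'T') - M(path)) < 1e-12
```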

    Definition 4.17 (Predictable process). On a filtered probability space (Ω, F, (Ft)_{t=0}^T, P), a predictable process (φt)_{t=1}^T is one such that, for each t ∈ {1, . . . , T}, φt is F_{t−1}-measurable.

    Proposition 4.18. Let (Mt)_{t=0}^T be a martingale on a filtered probability space (Ω, F, F := (Ft)_{t=0}^T, P). Let (φt)_{t=1}^T be a bounded predictable process. Then the process N := (Nt)_{t=0}^T defined by

    N0 := 0,   Nt := Σ_{s=1}^t φs(Ms − M_{s−1}),  t = 1, . . . , T,

    is a (P, F)-martingale.


    Remark 4.19. The process N is called a martingale transform or discrete-time stochastic integral, and is sometimes denoted Nt = ∫_0^t φs dMs or Nt = (φ · M)t.

    Proof. We have

    E[Nt+1|Ft] = E[ Σ_{s=1}^{t+1} φs(Ms − M_{s−1}) | Ft ]
               = E[ Σ_{s=1}^t φs(Ms − M_{s−1}) + φ_{t+1}(Mt+1 − Mt) | Ft ]
               = E[ Nt + φ_{t+1}(Mt+1 − Mt) | Ft ]
               = Nt + φ_{t+1}(E[Mt+1|Ft] − Mt)   (since Nt and φ_{t+1} are Ft-measurable)
               = Nt,

    since E[Mt+1|Ft] − Mt = 0.
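    A numerical sketch of the martingale transform (our own construction): take M to be a symmetric random walk, which is a martingale under p = 1/2, and let φ be a betting strategy that depends only on past tosses. However φ is chosen, E[Nt] = N0 = 0:

```python
from itertools import product

p = 0.5                       # fair coin, so the walk M is a martingale
q = 1 - p
T = 3
omegas = [''.join(w) for w in product('HT', repeat=T)]
prob = {w: p**w.count('H') * q**w.count('T') for w in omegas}

def M(t, w):
    """Symmetric random walk: +1 for each head, -1 for each tail."""
    return w[:t].count('H') - w[:t].count('T')

def phi(t, w):
    """A predictable integrand: depends only on the first t-1 tosses.
    Here: bet 2 if the previous toss was a head, else 1 (and phi_1 = 1)."""
    return 2.0 if t >= 2 and w[t - 2] == 'H' else 1.0

def N(t, w):
    return sum(phi(s, w) * (M(s, w) - M(s - 1, w)) for s in range(1, t + 1))

for t in range(T + 1):
    assert abs(sum(N(t, w) * prob[w] for w in omegas)) < 1e-12   # E[N_t] = 0
```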

    Proposition 4.20. On a filtered probability space (Ω, F, F := (Ft)_{t=0}^T, P), an adapted sequence of real random variables M = (Mt)_{t=0}^T is a (P, F)-martingale if and only if for any predictable process φ = (φt)_{t=1}^T, we have

    E[ Σ_{s=1}^t φs(Ms − M_{s−1}) ] = 0,  t = 1, . . . , T.

    Proof. If (Mt)_{t=0}^T is a martingale, define the process X := (Xt)_{t=0}^T by X0 = 0 and, for t = 1, . . . , T, Xt := Σ_{s=1}^t φs(Ms − M_{s−1}), for any predictable process φ = (φt)_{t=1}^T. Then, by Proposition 4.18, X is also a martingale, so E[Xt] = X0 = 0.

    Conversely, if E[Σ_{s=1}^t φs(Ms − M_{s−1})] = 0 holds for any predictable φ, take u ∈ {0, 1, . . . , T − 1}, let A ∈ Fu be given, and define a predictable process φ by setting φ_{u+1} = I_A, and φt = 0 for all other t ∈ {1, . . . , T}. Then

    0 = E[ Σ_{s=1}^T φs(Ms − M_{s−1}) ]
      = E[IA(Mu+1 − Mu)]
      = E[ E[IA(Mu+1 − Mu)|Fu] ]
      = E[ IA(E[Mu+1|Fu] − Mu) ].

    Since this holds for all A ∈ Fu it follows that E[Mu+1|Fu] = Mu, so M is a martingale.

    5 Equivalent measures*

    Here is a deep theorem, which we do not prove.

    Theorem 5.1 (Radon-Nikodym). Let P and Q be two probability measures on a space (Ω, F). Assume that for every A ∈ F satisfying P(A) = 0, we also have Q(A) = 0. Then we say Q is absolutely continuous with respect to P. Under this assumption, there is a nonnegative random variable Z such that

    Q(A) = ∫_A Z dP,  A ∈ F, (5.1)

    and Z is called the Radon-Nikodym derivative of Q with respect to P.


    The random variable Z is often written as

    Z = dQ/dP.

    Equation (5.1) implies the apparently stronger condition

    EQ[X] = E[XZ],

    for every random variable X for which E|XZ| < ∞. To see this, note that (5.1) in Theorem 5.1 is equivalent to

    EQ[IA] = E[IA Z],  A ∈ F.

    This is then extended to general X via the standard machine argument.

    If Q is absolutely continuous with respect to P and P is absolutely continuous with respect to Q, then we say that P and Q are equivalent. In other words, P and Q are equivalent if and only if

    P(A) = 0 ⟺ Q(A) = 0,  A ∈ F.

    If P and Q are equivalent and Z is the Radon-Nikodym derivative of Q w.r.t. P, then 1/Z is the Radon-Nikodym derivative of P w.r.t. Q, i.e.

    EQ[X] = E[XZ]  for every X, (5.2)
    E[Y] = EQ[Y(1/Z)]  for every Y, (5.3)

    and letting X and Y be related by Y = XZ we see that the above two equations are the same.
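    On a finite space with P(ω) > 0 for every ω, the Radon-Nikodym derivative is simply Z(ω) = Q(ω)/P(ω), and the identity EQ[X] = E[XZ] can be checked directly. A sketch (the two measures below are illustrative coin-toss measures of our choosing):

```python
from itertools import product

omegas = [''.join(w) for w in product('HT', repeat=2)]

p, p_tilde = 0.5, 0.3          # heads probabilities under P and Q
P = {w: p**w.count('H') * (1 - p)**w.count('T') for w in omegas}
Q = {w: p_tilde**w.count('H') * (1 - p_tilde)**w.count('T') for w in omegas}

# Radon-Nikodym derivative on a finite space: Z(w) = Q(w) / P(w)
Z = {w: Q[w] / P[w] for w in omegas}

X = lambda w: float(w.count('H'))                # any random variable

EQ_X = sum(X(w) * Q[w] for w in omegas)          # E_Q[X]
E_XZ = sum(X(w) * Z[w] * P[w] for w in omegas)   # E[XZ] under P
assert abs(EQ_X - E_XZ) < 1e-12
```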

    Example 5.2 (Radon-Nikodym theorem in 2-period coin toss space). Let

    Ω = {HH, HT, TH, TT},

    the set of coin toss sequ