
    Stochastic Analysis in Mathematical Finance

    MA5248

    Lecture Notes 2

    Conditional Expectations

    1. Two Examples.

Let (Ω, F, IP) be a probability space. We are going to introduce conditional expectations. Let us begin with some special cases and then proceed to two concrete examples (in items 3 and 6).

1. Let Λ ∈ F with IP(Λ) > 0. Define IP_Λ(·) on F as follows:

    IP_\Lambda(E) = \frac{IP(\Lambda \cap E)}{IP(\Lambda)}. \qquad (1.1)

It is trivial to check that IP_Λ(·) thus defined in (1.1) is a probability measure (p.m.) on F, which is known as the conditional probability relative to Λ. It is usually written as IP(· | Λ). Note also that (1.1) can also be written as

    IP(\Lambda \cap E) = IP(\Lambda)\, IP_\Lambda(E).

The integral w.r.t. this p.m. is called the conditional expectation relative to Λ:

    EE_\Lambda[X] = \int_\Omega X(\omega)\, IP_\Lambda(d\omega) = \frac{1}{IP(\Lambda)} \int_\Lambda X(\omega)\, IP(d\omega) = \frac{EE[X;\,\Lambda]}{IP(\Lambda)}, \qquad (1.2)

provided that EE_Λ[X] is defined (i.e., EE_Λ|X| < ∞).

(To establish (1.2), one only needs to make use of the standard machine, namely, to begin with X as indicator r.v.s; then simple r.v.s; nonnegative r.v.s; etc. For details, see 3.2 below.) Recall that

    EE[X;\,\Lambda] = EE[X I_\Lambda] = \int_\Lambda X\, dIP.
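
As a quick numerical illustration of (1.2) (an added sketch, not part of the original notes; the die example is an arbitrary choice), the following Python snippet takes X to be the face of a fair die and Λ the event that the face is even; both sides of (1.2) come out close to the exact value 4.

    import numpy as np

    rng = np.random.default_rng(0)
    faces = rng.integers(1, 7, size=200_000)   # X = face of a fair die
    even = (faces % 2 == 0)                    # the event Lambda = {face is even}

    lhs = faces[even].mean()                   # EE_Lambda[X]: average of X over Lambda
    rhs = (faces * even).mean() / even.mean()  # EE[X; Lambda] / IP(Lambda)
    print(lhs, rhs)                            # both are close to the exact value 4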

2. However, if IP(Λ) = 0, we decree that IP_Λ(E) = 0 for every E ∈ F.

    3. We now give our first example.

Example 1. Suppose there is a countable measurable partition {Λ_n, n ≥ 1} of Ω, namely:

    \Omega = \bigcup_n \Lambda_n, \qquad \Lambda_n \in F, \qquad \Lambda_m \cap \Lambda_n = \emptyset \ \text{ if } m \neq n.


Then we have, when EE[X] is defined (i.e., EE|X| < ∞),

    IP(E) = \sum_n IP(\Lambda_n \cap E) = \sum_n IP(\Lambda_n)\, IP_{\Lambda_n}(E),
    EE[X] = \sum_n \int_{\Lambda_n} X(\omega)\, IP(d\omega) = \sum_n IP(\Lambda_n)\, EE_{\Lambda_n}[X]. \qquad (1.3)

Let G be the σ-algebra generated by this partition. Given an integrable r.v. X ∈ F, we define the function, denoted by EE[X | G] (or EE_G[X]), on Ω by: for ω ∈ Ω,

    EE[X \mid G](\omega) = \sum_n EE_{\Lambda_n}[X]\, I_{\Lambda_n}(\omega).

Observe that EE[X | G] is a discrete r.v. that assumes the value

    EE_{\Lambda_n}[X] = \frac{EE[X;\,\Lambda_n]}{IP(\Lambda_n)}

on the event Λ_n for each n. More precisely, the above function EE[X | G] can be expressed as

    EE[X \mid G](\omega) = \sum_n EE_{\Lambda_n}[X]\, I_{\Lambda_n}(\omega) = \sum_n \frac{EE[X;\,\Lambda_n]}{IP(\Lambda_n)}\, I_{\Lambda_n}(\omega). \qquad (1.4)

Obviously, this function EE[X | G] is measurable with respect to the sub-σ-algebra G, i.e., EE[X | G] ∈ G.

Now (1.3) can be re-written as

    EE[X] = \int_\Omega X\, dIP = \sum_n \int_{\Lambda_n} EE[X \mid G]\, dIP = \int_\Omega EE[X \mid G]\, dIP. \qquad (1.5)

Furthermore, if Λ ∈ G, then Λ is a union of a sub-collection of the Λ_n's, and note that the following also holds:

    \forall\, \Lambda \in G : \quad EE[X;\,\Lambda] = EE[X I_\Lambda] = \int_\Lambda X\, dIP = \int_\Lambda EE[X \mid G]\, dIP. \qquad (1.6)

(Clearly, (1.5) is a special case of (1.6) with Λ replaced by Ω.)
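
A small Python sketch of (1.4)–(1.6), added for illustration (the twelve-point sample space and the three-block partition are arbitrary choices, not from the notes):

    import numpy as np

    rng = np.random.default_rng(1)
    prob = np.full(12, 1 / 12)                   # uniform IP on a 12-point sample space
    X = rng.normal(size=12)                      # an arbitrary integrable r.v. X
    labels = np.array([0]*3 + [1]*4 + [2]*5)     # the partition Lambda_0, Lambda_1, Lambda_2

    # EE[X|G] from (1.4): on Lambda_n it equals EE[X; Lambda_n] / IP(Lambda_n)
    cond_exp = np.empty(12)
    for n in range(3):
        block = (labels == n)
        cond_exp[block] = (X * prob)[block].sum() / prob[block].sum()

    # Check (1.6) with Lambda = Lambda_0 ∪ Lambda_2, an element of G
    Lam = (labels == 0) | (labels == 2)
    lhs = (X * prob)[Lam].sum()                  # EE[X; Lambda]
    rhs = (cond_exp * prob)[Lam].sum()           # integral of EE[X|G] over Lambda
    print(np.isclose(lhs, rhs))                  # True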

    4. Remarks:

(a) Look at (1.6) and especially note that the last two terms are integrals over the same event Λ ∈ G; but the integrand X of the 3rd term is measurable w.r.t. F, while the integrand EE[X | G] on the far right is measurable w.r.t. the sub-σ-algebra G.

(b) If Λ in (1.6) is replaced by another event in F \ G, then (1.6) may not hold any more. In other words, (1.6) holds in general for Λ ∈ G only.

(c) We now claim that for any integrable X, EE[X | G] is unique a.s. That is, if there is another r.v. W ∈ G such that

    \forall\, \Lambda \in G : \quad \int_\Lambda X\, dIP = \int_\Lambda W\, dIP, \qquad (1.7)

then W = EE[X | G] a.s.

Proof of the Claim. Put

    \Lambda = \{\omega : EE[X \mid G](\omega) > W(\omega)\}.

Obviously, Λ ∈ G, and Λ = ∪_n Λ_n, where

    \Lambda_n = \{\omega : EE[X \mid G](\omega) > W(\omega) + n^{-1}\}.

If IP(Λ) > 0, then there exists an n ≥ 1 such that IP(Λ_n) > 0. On this Λ_n, (1.6) and (1.7) give

    \int_{\Lambda_n} W\, dIP = \int_{\Lambda_n} X\, dIP = \int_{\Lambda_n} EE[X \mid G]\, dIP \ge \int_{\Lambda_n} W\, dIP + n^{-1} IP(\Lambda_n),

which is impossible since IP(Λ_n) > 0. Hence IP(Λ) = 0, i.e., EE[X | G] ≤ W a.s.; interchanging the roles of W and EE[X | G] gives the reverse inequality, so W = EE[X | G] a.s.

For jointly discrete r.v.s X and Y, define the elementary conditional probability mass function of X given Y = y by

    f_{X \mid Y}(x \mid y) = \frac{IP(X = x,\, Y = y)}{IP(Y = y)}, \quad \text{if } IP(Y = y) > 0, \qquad \text{and } 0 \ \text{ elsewhere.}

Then we check that for any y such that IP(Y = y) > 0, this f_{X|Y}(x|y) is indeed a density function.

Next, for X with finite mean, we define the conditional expectation of X given Y = y to be

    EE[X \mid Y = y] \overset{\text{def.}}{=} \sum_x x\, f_{X \mid Y}(x \mid y). \qquad (4.1)

Obviously, EE[X | Y = y] in (4.1) is a function of y. It can be easily verified that

    f_X(x) = \sum_y f_Y(y)\, f_{X \mid Y}(x \mid y); \qquad EE[X] = \sum_y f_Y(y)\, EE[X \mid Y = y]. \qquad (4.2)

By (4.1),

    EE[X \mid Y = y] = \sum_x x\, \frac{IP(X = x,\, Y = y)}{IP(Y = y)} = \frac{EE[X;\, Y = y]}{IP(Y = y)}, \qquad (4.3)

which is the same as EE[X | Y](ω) for ω ∈ {Y = y}. (Cf. (1.4).)

Recall that by 2.4, EE[X | Y] = φ(Y) for a certain extended-valued Borel measurable function φ. We claim that φ can be taken to be φ(y) = EE[X | Y = y]. Indeed, it can be verified easily that

    EE[X \mid Y](\omega) = \varphi(Y)(\omega) = \sum_y EE[X \mid Y = y]\, I_{\{y\}}(Y(\omega)). \qquad (4.4)

A slightly more general formula can also be derived: let G be a function such that EE|G(X)| < ∞; we claim that

    EE[G(X) \mid Y] = \psi(Y), \qquad \text{where } \psi(y) = \sum_x G(x)\, f_{X \mid Y}(x \mid y). \qquad (4.5)

Note that the function ψ is nothing but EE[G(X) | Y = y], the conditional expectation of G(X) given Y = y, mentioned in an undergraduate course on Probability. To verify (4.5), observe first that ψ(Y) ∈ σ(Y), so (a) of 2.2 holds.


To check (b) of 2.2, take an arbitrary event A ∈ σ(Y). Then, for some B ∈ B(IR), A = Y^{-1}(B) = (Y ∈ B) = {ω : Y(ω) ∈ B}. Recall that

    f_Y(y) = IP(Y = y) = IP\{\omega : Y(\omega) = y\} = \sum_{\omega:\, Y(\omega) = y} IP(\{\omega\}).

Hence

    EE[\psi(Y);\, A] = \int_A \psi(Y)(\omega)\, IP(d\omega) = \sum_{y \in B} \psi(y) \sum_{\omega:\, Y(\omega) = y} IP(\{\omega\})
    = \sum_{y \in B} \sum_x G(x)\, f_{X \mid Y}(x \mid y)\, f_Y(y) = \sum_x G(x) \sum_{y \in B} f(x, y)
    = \sum_x \sum_y G(x)\, I_B(y)\, f(x, y) = EE[G(X)\, I_B(Y)] = EE[G(X);\, A].
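
The computation in (4.5) can be checked on a small joint pmf; the table below is an illustrative choice added here, not from the notes.

    import numpy as np

    # joint pmf f(x, y) on x in {0, 1, 2} and y in {0, 1} (rows = x, columns = y)
    f = np.array([[0.10, 0.20],
                  [0.15, 0.25],
                  [0.05, 0.25]])
    xs = np.array([0, 1, 2])

    def G(x):
        return x ** 2                          # any G with EE|G(X)| finite

    fY = f.sum(axis=0)                         # marginal pmf of Y
    f_cond = f / fY                            # f_{X|Y}(x|y); each column sums to 1

    # psi(y) = sum_x G(x) f_{X|Y}(x|y), as in (4.5)
    psi = (G(xs)[:, None] * f_cond).sum(axis=0)

    # direct check: EE[G(X); Y = y] / IP(Y = y)
    direct = (G(xs)[:, None] * f).sum(axis=0) / fY
    print(np.allclose(psi, direct))            # True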

2. In the above, we discussed the discrete case. Now we turn to X and Y with a joint density f(x, y); that is, for B ∈ B(IR²),

    IP((X, Y) \in B) = \iint_B f(x, y)\, dx\, dy.

In this case, we define the conditional density of X given Y = y by

    f_{X \mid Y}(x \mid y) =
    \begin{cases}
    \dfrac{f(x, y)}{f_Y(y)}, & 0 < f_Y(y) < \infty, \\[4pt]
    0, & \text{elsewhere.}
    \end{cases}

For each y such that 0 < f_Y(y) < ∞, the function f_{X|Y}(x|y), −∞ < x < ∞, is a density function, and hence we can talk about its mean (and other moments as well), provided EE|X| < ∞ (and other moments exist). Its mean is called the conditional expectation of X given Y = y and is denoted by EE[X | Y = y]. Thus

    EE[X \mid Y = y] = \int_{IR} x\, f_{X \mid Y}(x \mid y)\, dx = \frac{\int_{IR} x\, f(x, y)\, dx}{f_Y(y)} \qquad (4.6)

when 0 < f_Y(y) < ∞. Define EE[X | Y = y] = 0 elsewhere. (Cf. (4.3).) Clearly, (4.6) shows that EE[X | Y = y] is a function of y. More generally, for measurable G such that EE|G(X)| < ∞,

    EE[G(X) \mid Y = y] = \int_{IR} G(x)\, f_{X \mid Y}(x \mid y)\, dx = \frac{\int_{IR} G(x)\, f(x, y)\, dx}{f_Y(y)}, \qquad (4.7)

which is also a function of y. In fact, take

    \psi(y) = \frac{\int_{IR} G(x)\, f(x, y)\, dx}{f_Y(y)}, \qquad (4.8)


and one can easily check that ψ(Y) is a version of the conditional expectation of G(X) given σ(Y), and hence EE[G(X) | Y] = ψ(Y) a.s. To verify, observe first that ψ(Y) ∈ σ(Y).

To check (b) of (viii), take A ∈ σ(Y); then A = Y^{-1}(B) for some B ∈ B(IR). So,

    EE[\psi(Y);\, A] = EE[\psi(Y)\, I_B(Y)] = \int_B \psi(y)\, f_Y(y)\, dy
    = \int_B \frac{\int_{IR} G(x)\, f(x, y)\, dx}{f_Y(y)}\, f_Y(y)\, dy = \int_B \int_{IR} G(x)\, f(x, y)\, dx\, dy
    = \int_{IR} \int_{IR} G(x)\, I_B(y)\, f(x, y)\, dx\, dy = EE[G(X)\, I_B(Y)] = EE[G(X);\, A].

Note that both (4.5) and (4.8) are useful for us to get EE[X | Y] in the jointly discrete and jointly absolutely continuous cases, respectively.
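
As an added illustration of (4.8), with the density f(x, y) = x + y on the unit square chosen only for concreteness, ψ(y) can be computed by a simple numerical integration and compared with the closed form.

    import numpy as np

    # joint density f(x, y) = x + y on the unit square (an illustrative choice)
    def f(x, y):
        return x + y

    xs = np.linspace(0.0, 1.0, 20_001)

    def psi(y):
        """psi(y) of (4.8) with G(x) = x, by a simple Riemann approximation."""
        num = np.mean(xs * f(xs, y))      # approximates the integral of x f(x, y) dx over [0, 1]
        den = np.mean(f(xs, y))           # approximates f_Y(y)
        return num / den

    for y in (0.2, 0.5, 0.8):
        exact = (1/3 + y/2) / (1/2 + y)   # closed form of EE[X | Y = y] for this density
        print(round(psi(y), 3), round(exact, 3))   # the two columns agree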

3. Example 1. Let X_1 and X_2 be independent, each having a Poisson distribution with mean λ, and let Z = X_1 + X_2. Find EE[X_1 | Z] and IP(X_1 = k | Z).

Discussion. Observe that Z is a discrete r.v., still having a Poisson distribution (with mean 2λ). This is the special case mentioned in 1.3 and 6, especially in (1.4) and (4.4).

To be precise, on the event {Z = n},

    EE[X_1 \mid Z] = \frac{EE[X_1;\, Z = n]}{IP(Z = n)}.

It is trivial to obtain the denominator:

    IP(Z = n) = e^{-2\lambda}\, \frac{(2\lambda)^n}{n!}.

Now turn to the numerator:

    EE[X_1;\, Z = n] = \sum_{i=0}^{n} i\, IP(X_1 = i)\, IP(X_2 = n - i)
    = \sum_{i=0}^{n} i\, \frac{e^{-\lambda} \lambda^i}{i!} \cdot \frac{e^{-\lambda} \lambda^{n-i}}{(n-i)!}
    = \lambda^n e^{-2\lambda} \sum_{i=0}^{n} \frac{i}{i!\,(n-i)!}.

Thus, by taking the ratio, on {Z = n},

    EE[X_1 \mid Z] = \frac{1}{2^n} \sum_{i=0}^{n} i \binom{n}{i} = \frac{n}{2}.


In other words,

    EE[X_1 \mid Z](\omega) = \sum_n \frac{n}{2}\, I_{\{Z = n\}}(\omega) = \sum_n \frac{n}{2}\, I_{\{n\}}(Z(\omega)).

One can take

    \varphi(z) = \sum_n \frac{n}{2}\, I_{\{n\}}(z),

which is an extended-valued Borel measurable function such that a.s.

    EE[X_1 \mid Z] = \varphi(Z).

To find IP(X_1 = k | Z), write A = {X_1 = k}. Then IP(X_1 = k | Z) = EE[I_A | Z]. Again by (1.4), on the event {Z = n},

    EE[I_A \mid Z] = \frac{EE[I_A;\, Z = n]}{IP(Z = n)} = \frac{IP(X_1 = k,\, X_2 = n - k)}{IP(Z = n)}
    = \frac{IP(X_1 = k)\, IP(X_2 = n - k)}{IP(Z = n)}
    = \begin{cases} \binom{n}{k} \big/ 2^n, & 0 \le k \le n, \\[2pt] 0, & \text{otherwise.} \end{cases}

What do these results mean? They say that, on each of the events {X_1 + X_2 = n}, the conditional distribution of X_1 (given X_1 + X_2) behaves like a binomial distribution with parameters n and 1/2.
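
A quick Monte Carlo check of this conclusion (added illustration; the value λ = 1.7 and the level n = 5 are arbitrary choices):

    import numpy as np
    from math import comb

    rng = np.random.default_rng(2)
    lam = 1.7
    x1 = rng.poisson(lam, size=500_000)
    x2 = rng.poisson(lam, size=500_000)
    z = x1 + x2

    n = 5                                      # condition on the event {Z = n}
    sel = (z == n)
    print(x1[sel].mean())                      # close to n / 2 = 2.5

    emp = np.bincount(x1[sel], minlength=n + 1) / sel.sum()
    binom = np.array([comb(n, k) / 2 ** n for k in range(n + 1)])
    print(np.round(emp, 3))                    # empirical conditional law of X1 given Z = 5
    print(np.round(binom, 3))                  # Binomial(5, 1/2) probabilities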

4. Example 2. Let X_1 and X_2 be independent with IP(X_i > t) = e^{-t} for t ≥ 0, and let Y = X_1 + X_2. Find EE[X_1 | Y] and IP(X_1 < 3 | Y).

Discussion. To make use of (4.7), one needs to find the joint density of (X_1, Y) = (X_1, X_1 + X_2). But this is a standard exercise on multidimensional change of variables.
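
Here is a sketch of that change-of-variables step, added for orientation (the remaining computation is left to Exercise 3): the map (x_1, x_2) ↦ (x_1, x_1 + x_2) has Jacobian 1, so for 0 < x < y,

    f_{(X_1, Y)}(x, y) = f_{X_1}(x)\, f_{X_2}(y - x) = e^{-x}\, e^{-(y - x)} = e^{-y},
    \qquad
    f_Y(y) = \int_0^y e^{-y}\, dx = y\, e^{-y},

so that f_{X_1|Y}(x | y) = 1/y for 0 < x < y; that is, given Y = y, X_1 is uniformly distributed on (0, y). Formula (4.7) then yields EE[X_1 | Y] and IP(X_1 < 3 | Y) directly.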


    5. General Properties of Conditional Expectation.

Let (Ω, F, IP) be a probability space. Let X ∈ F be a r.v., and let G ⊂ F be a sub-σ-algebra. We have defined the conditional expectation EE[X | G]. We now turn to its properties. Conditional expectation has in fact many of the same properties that ordinary (unconditional) expectation does. Recall that whenever EE[X | G] is mentioned, it implicitly means that X is integrable, i.e., EE|X| < ∞, as the conditional expectation is defined only for r.v.s with finite means.

1. Linearity. EE[aX + Y | G] = a EE[X | G] + EE[Y | G] a.s.

Proof. All we need to check is that the right-hand side is a version of the left. It clearly is G-measurable. Next, for all A ∈ G, by linearity of the integral and the defining properties of conditional expectation,

    \int_A \big[a\, EE(X \mid G) + EE(Y \mid G)\big]\, dIP = a \int_A EE[X \mid G]\, dIP + \int_A EE[Y \mid G]\, dIP
    = a \int_A X\, dIP + \int_A Y\, dIP = \int_A (aX + Y)\, dIP.

2. Monotonicity. If X ≤ Y, then EE[X | G] ≤ EE[Y | G] a.s.

Proof. First observe that for all A ∈ G,

    \int_A EE[X \mid G]\, dIP = \int_A X\, dIP \le \int_A Y\, dIP = \int_A EE[Y \mid G]\, dIP.

In particular, putting A = {EE[X | G] − EE[Y | G] > ε}, it is easy to see that this event A has probability 0 for every ε > 0.

Remark: This monotonicity property is equivalent to the following positivity property:

Positivity. If X ≥ 0, then EE[X | G] ≥ 0 a.s.

3. Modulus Inequality. |EE[X | G]| ≤ EE[|X| | G] a.s.

Proof. Obviously, X ≤ |X| and −X ≤ |X|. Apply the above monotonicity property to obtain that EE[X | G] ≤ EE[|X| | G] a.s. and −EE[X | G] ≤ EE[|X| | G] a.s. We conclude that |EE[X | G]| ≤ EE[|X| | G] a.s.

4. Monotone Convergence Theorem. If X_n ≥ 0 and X_n ↑ X a.s. with EE X < ∞, then EE[X_n | G] ↑ EE[X | G] a.s.


Proof. By item 2, EE[X_n | G] ↑ and EE[X_n | G] ≤ EE[X | G] a.s., and hence EE[X_n | G] converges to a limit, say X_∞, a.s. Obviously, X_∞ ∈ G, and X_∞ ≤ EE[X | G] a.s. Now for any A ∈ G, on the one hand,

    \int_A EE(X_n \mid G)\, dIP = \int_A X_n\, dIP \uparrow \int_A X\, dIP = \int_A EE(X \mid G)\, dIP

by the monotone convergence theorem (for the ordinary unconditional expectation); on the other hand,

    \int_A EE[X_n \mid G]\, dIP \uparrow \int_A X_\infty\, dIP,

also by the monotone convergence theorem. We obtain that on each A ∈ G,

    \int_A X_\infty\, dIP = \int_A EE[X \mid G]\, dIP.

In view of the fact that X_∞ ≤ EE[X | G] a.s., we conclude that X_∞ = EE[X | G] a.s. (Why?) That is, EE[X_n | G] ↑ EE[X | G] a.s. as n → ∞.

An Alternative Proof. Let Y_n = X − X_n. Clearly, Y_n ↓ 0 and hence item 2 implies that

    Z_n \overset{\text{def.}}{=} EE[Y_n \mid G] \downarrow.

Thus, there exists a limit, say Z ∈ G, such that Z_n ↓ Z a.s. If we can show that Z = 0 a.s., then we are done.

Let A ∈ G. Then

    \int_A Z_n\, dIP = \int_A Y_n\, dIP \downarrow 0 \quad \text{as } n \to \infty

by the dominated convergence theorem. We conclude that \int_A Z\, dIP = 0 for all A ∈ G, so Z = 0 a.s.

Corollary 1. Let X_n ≥ 0 be integrable. If X_n ↓ 0 a.s., then EE[X_n | G] ↓ 0 a.s.

Proof. Refer to the above proof.

Corollary 2. If X_n ↑ X a.s. with X_n, X integrable (but not necessarily nonnegative), then EE[X_n | G] ↑ EE[X | G] a.s.

Proof. Consider X − X_n and apply Corollary 1.

5. Dominated Convergence Theorem. If |X_n| ≤ Y a.s. where EE(Y) < ∞ and X_n → X a.s., then EE[X_n | G] → EE[X | G] a.s.

Proof. Let W_k \overset{\text{def.}}{=} \sup_{n \ge k} (Y - X_n). Then

    0 \le W_k \downarrow \lim_n\, (Y - X_n) = Y - X

a.s., as k → ∞. Let Z_k \overset{\text{def.}}{=} \inf_{n \ge k} (Y - X_n). Then 0 ≤ Z_k ↑ Y − X a.s., as k → ∞. By item 4 and its corollaries, EE[W_k | G] ↓ EE[Y − X | G] a.s. and EE[Z_k | G] ↑ EE[Y − X | G] a.s.

On the other hand, by item 2 we have EE[Z_k | G] ≤ EE[Y − X_k | G] ≤ EE[W_k | G] a.s. Hence we obtain that EE[Y − X_k | G] → EE[Y − X | G] a.s., as k → ∞. It follows that EE[X_k | G] → EE[X | G] a.s. by making use of item 1, and we are done!

6. Proposition on Taking Out What Is Known. Let Y and YZ be integrable r.v.s and Z ∈ G; then

    EE[YZ \mid G] = Z\, EE[Y \mid G] \quad \text{a.s.} \qquad (5.1)

Proof. By decomposing Y = Y^+ − Y^− and Z = Z^+ − Z^−, one may only need to deal with the case that Y ≥ 0 and Z ≥ 0. As Z ∈ G, the right-hand side is measurable w.r.t. G. For any A ∈ G, by the definition of conditional expectation, \int_A EE[YZ \mid G]\, dIP = \int_A YZ\, dIP. So, we would like to establish

    \int_A YZ\, dIP = \int_A Z\, EE[Y \mid G]\, dIP. \qquad (5.2)

Recall the standard machine. When Z = I_Λ where Λ ∈ G, the LHS of (5.2) becomes \int_{A \cap \Lambda} Y\, dIP, while the RHS is equal to \int_{A \cap \Lambda} EE[Y \mid G]\, dIP = \int_{A \cap \Lambda} Y\, dIP. Extend to the case when Z is simple. Next, for any nonnegative Z ∈ G, there is a sequence of simple functions Z_n ∈ G such that 0 ≤ Z_n ↑ Z. The rest is obvious.

Remark: Note that (5.1) is a very basic and useful result. It says that, for conditional expectation with respect to G, r.v.s Z ∈ G act like constants: they can be brought outside the integral.
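
A small numerical illustration of (5.1), added here (G is taken to be generated by a three-valued r.v. W; the helper cond_exp is an illustrative name):

    import numpy as np

    rng = np.random.default_rng(3)
    w = rng.integers(0, 3, size=300_000)       # W generates G = sigma(W)
    y = rng.normal(size=300_000) + w           # an integrable r.v. Y
    z = w ** 2                                 # Z = W^2 is G-measurable

    def cond_exp(values, labels):
        """Conditional expectation given sigma(labels): average within each cell."""
        out = np.empty(values.shape, dtype=float)
        for k in np.unique(labels):
            out[labels == k] = values[labels == k].mean()
        return out

    lhs = cond_exp(y * z, w)                   # EE[YZ | G]
    rhs = z * cond_exp(y, w)                   # Z EE[Y | G]
    print(np.abs(lhs - rhs).max())             # essentially 0: Z is constant on each cell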

7. Jensen's Inequality. Let φ be a convex function on IR. If EE|X| < ∞ and EE|φ(X)| < ∞, then a.s.

    \varphi\big(EE[X \mid G]\big) \le EE[\varphi(X) \mid G].

Proof. To begin with, consider the case that X is a simple r.v. taking values {y_i} on the sets {Λ_i}, 1 ≤ i ≤ n, which form a partition of Ω. We have

    EE[X \mid G] = \sum_i y_i\, IP\{\Lambda_i \mid G\},
    EE[\varphi(X) \mid G] = \sum_i \varphi(y_i)\, IP\{\Lambda_i \mid G\},


where \sum_i IP\{\Lambda_i \mid G\} = 1 a.e. Hence Jensen's inequality holds in this case by the property of convexity. In general, let {X_m} be a sequence of simple r.v.s converging to X a.s. and satisfying |X_m| ≤ |X| for all m. (Note that such sequences exist.) Recall that φ is continuous as it is convex. If we let m → ∞ below:

    \varphi\big(EE[X_m \mid G]\big) \le EE[\varphi(X_m) \mid G] \quad \text{a.s.}, \qquad (5.3)

the left-hand member converges to φ(EE[X | G]). For the right-hand member, we need dominated convergence. To get this, we first consider φ_n, which is obtained from φ by replacing the graph of φ outside (−n, n) with tangential lines. Thus for each n there is a constant C_n such that

    \forall\, x \in IR : \quad |\varphi_n(x)| \le C_n(|x| + 1).

Consequently, we have

    |\varphi_n(X_m)| \le C_n(|X_m| + 1) \le C_n(|X| + 1),

and the last term is integrable by hypothesis. It now follows from property 5 that

    \lim_{m \to \infty} EE[\varphi_n(X_m) \mid G] = EE[\varphi_n(X) \mid G].

To establish Jensen's inequality, we need to replace φ_n by φ. Letting n ↑ ∞ we have φ_n ↑ φ, and φ_n(X) is integrable. Hence by monotone convergence, Jensen's inequality follows for a general convex φ.

An Alternative Proof. For any x and y, note that

    \varphi(x) - \varphi(y) \ge \varphi'(y)(x - y),

where φ′ is the right-hand derivative of φ. Hence

    \varphi(X) - \varphi\big(EE[X \mid G]\big) \ge \varphi'\big(EE[X \mid G]\big)\,\big[X - EE(X \mid G)\big].

The right member may not be integrable; but let Λ = {ω : |EE[X | G]| ≤ A} for A > 0. Replace X by X I_Λ in the above, take conditional expectations (given G) of both sides, and let A → ∞. Observe that

    EE[\varphi(X I_\Lambda) \mid G] = EE[\varphi(X)\, I_\Lambda + \varphi(0)\, I_{\Lambda^c} \mid G] = EE[\varphi(X) \mid G]\, I_\Lambda + \varphi(0)\, I_{\Lambda^c}.

(Note that in the above we have used (5.1).)

Yet Another Proof. If φ is linear, the result is trivial, so we will suppose that φ is not linear. In this case let

    S = \{(a, b) : a, b \in Q,\ ax + b \le \varphi(x) \ \text{for all } x\};

then φ(x) = sup{ax + b : (a, b) ∈ S}. If φ(x) ≥ ax + b for all x, then using the monotonicity in item 2 and the linearity in item 1, one arrives at

    EE[\varphi(X) \mid G] \ge a\, EE[X \mid G] + b \quad \text{a.s.}

Taking the sup over (a, b) ∈ S gives EE(φ(X) | G) ≥ φ(EE(X | G)) a.s.

An important corollary of Jensen's inequality is:

Corollary. ||EE(X | G)||_p ≤ ||X||_p for p ≥ 1.

8. Conditional expectation also has properties that have no analog for ordinary unconditional expectation. Now we come to one such property, which can be regarded as the most important property of conditional expectation relating to changing σ-fields.

Theorem on Tower Property. If X is integrable and G_1 ⊂ G_2, then

    EE\big(EE[X \mid G_1] \,\big|\, G_2\big) = EE[X \mid G_1] \quad \text{a.s.} \qquad (5.4)
    EE\big(EE[X \mid G_2] \,\big|\, G_1\big) = EE[X \mid G_1] \quad \text{a.s.} \qquad (5.5)

In words, the smaller σ-algebra always wins.

Proof. (5.4) follows immediately from 3.6, once we observe that EE[X | G_1] ∈ G_2 as G_1 ⊂ G_2. To prove (5.5), notice that EE[X | G_1] ∈ G_1, and if Λ ∈ G_1 then, as G_1 ⊂ G_2, one has Λ ∈ G_2 too. Moreover,

    \int_\Lambda EE[X \mid G_1]\, dIP = \int_\Lambda X\, dIP = \int_\Lambda EE[X \mid G_2]\, dIP = \int_\Lambda EE\big(EE[X \mid G_2] \,\big|\, G_1\big)\, dIP

holds for all Λ ∈ G_1. Thus, EE(EE[X | G_2] | G_1) = EE[X | G_1] a.s.

Remark: Note that the above result, in conjunction with item 6, is very useful for computing conditional expectation.
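
An added numerical illustration of (5.5), using nested partitions (the coarse/fine labels and the helper cond_exp are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(5)
    u = rng.integers(0, 8, size=400_000)       # fine label: generates G2
    v = u // 2                                 # coarser label: generates G1, and G1 is contained in G2
    x = rng.normal(size=400_000) + np.sin(u)   # an integrable r.v. X

    def cond_exp(values, labels):
        out = np.empty(values.shape, dtype=float)
        for k in np.unique(labels):
            out[labels == k] = values[labels == k].mean()
        return out

    # (5.5): conditioning EE[X|G2] further on the smaller G1 collapses to EE[X|G1]
    lhs = cond_exp(cond_exp(x, u), v)
    rhs = cond_exp(x, v)
    print(np.abs(lhs - rhs).max())             # essentially 0 (up to floating-point error)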

    9. A Geometric Interpretation of EE[X|G].

Let

    L^2(F) = \{Y \in F : EE\, Y^2 < \infty\},

which is a Hilbert space, and note that L^2(G) is a closed subspace. (Also refer to 3.1 in the previous set of notes.)

Theorem. Suppose that EE(X^2) < ∞. Then EE[X | G] is the projection of X onto L^2(G). In statistical terms, EE[X | G] is the variable Y ∈ G that minimizes the mean square error EE[(X − Y)^2]. In particular,

    EE\big[(X - EE[X \mid G])^2\big] \le EE\big[(X - EE X)^2\big]. \qquad (5.6)


Proof. We will prove the statistical interpretation only; the geometric part will follow. If Z ∈ L^2(G), then EE[ZX | G] = Z EE[X | G] a.s. by the above item 6 (Taking Out What Is Known). Taking expectations gives

    EE[ZX] = EE\big(EE[ZX \mid G]\big) = EE\big(Z\, EE[X \mid G]\big),

or, rearranging, EE[Z(X − EE[X | G])] = 0 for Z ∈ L^2(G). Now for Y ∈ L^2(G), taking Z = Y − EE[X | G] leads to

    EE[(X - Y)^2] = EE\big[(X - EE[X \mid G] - Z)^2\big] = EE\big[(X - EE[X \mid G])^2\big] + EE[Z^2].

So EE[(X − Y)^2] is minimized when Z = 0, i.e., when Y = EE[X | G] a.s.
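
An added numerical illustration of the least-squares interpretation (Y is taken discrete so that EE[X | Y] is a simple cell average; the alternative predictors are arbitrary):

    import numpy as np

    rng = np.random.default_rng(6)
    y = rng.integers(0, 5, size=300_000)
    x = y + rng.normal(size=300_000)            # X = Y + noise

    cond = np.array([x[y == k].mean() for k in range(5)])[y]   # EE[X | sigma(Y)]

    def mse(pred):
        return float(np.mean((x - pred) ** 2))

    print(mse(cond))                            # smallest mean square error among functions of Y
    print(mse(np.full_like(x, x.mean())))       # the constant EE[X]: larger, which is (5.6)
    print(mse(1.5 * y))                         # some other function of Y: also larger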

Discussion. By the above geometric interpretation, one can define conditional expectation without using the Radon-Nikodym theorem. To be precise, first define the conditional expectation EE[X | G], for X with X ∈ L^2 (i.e., EE[X^2] < ∞), as the orthogonal projection of X onto the closed subspace L^2(G); then extend the definition to general integrable X by an approximation argument.

We also record two classical inequalities.

(a) Hölder's Inequality. Let X and Y be random variables, p > 1 and 1/p + 1/q = 1. Then

    |EE[XY]| \le EE|XY| \le \big[EE|X|^p\big]^{1/p}\, \big[EE|Y|^q\big]^{1/q}. \qquad (5.8)

(Hence if X ∈ L^p and Y ∈ L^q with p, q as above, then the product XY belongs to L^1.)

(b) Minkowski's Inequality. Let X and Y be random variables and 1 < p < ∞ with EE(|X|^p) < ∞ and EE(|Y|^p) < ∞. Then

    \big[EE|X + Y|^p\big]^{1/p} \le \big[EE|X|^p\big]^{1/p} + \big[EE|Y|^p\big]^{1/p}. \qquad (5.9)

    6. Regular Conditional Probability Distribution.

Let (Ω, F, IP) be a probability space. Let G ⊂ F be a sub-σ-algebra. For F ∈ F, we call any version of EE[I_F | G] a version of the conditional probability of F given G, and write IP{F | G} = EE[I_F | G] a.s.

Obviously, as a set function on F, 0 ≤ IP{· | G} ≤ 1 a.s., IP{Ω | G} = 1 a.s., and IP{F^c | G} = 1 − IP{F | G} a.s. By linearity and the monotone convergence theorem for conditional expectations [see 5.1 and 5.4], we can also show that for a fixed sequence {F_n} of disjoint elements of F,

    IP\Big\{\bigcup_n F_n \,\Big|\, G\Big\} = \sum_n IP(F_n \mid G) \quad \text{a.s.} \qquad (6.1)


Thus IP{· | G}, as a set function, looks like a probability measure on F. But, in general, it is not! Note that the exceptional set in (6.1) usually depends on the sequence {F_n}. Except in trivial cases, there are uncountably many sequences of disjoint sets, so it is not at all clear that we can choose a good modification {(IP|G)(F, ·) : F ∈ F} of the collection {IP(F | G) : F ∈ F}. Let us formulate what we mean by a good modification.

1. Definition (regular conditional probability given G).

Let (Ω, F, IP) be a probability space and let G be a sub-σ-algebra of F. By a regular conditional probability (IP|G)(·, ·) given G, we mean a map

    (IP|G) : F \times \Omega \to [0, 1]

such that

(a) for each F ∈ F, the function (IP|G)(F, ·) is a version of IP(F | G);

(b) for almost every ω ∈ Ω, the map F ↦ (IP|G)(F, ω) is a probability measure on F.

It is known that regular conditional probabilities exist under most conditions encountered in practice, but they do not always exist.

2. A Counterexample. This counterexample, for which Halmos, Dieudonné, Andersen and Jessen share the credit, exhibits a situation in which no regular conditional probability given G exists. Take Ω = [0, 1] and G = B[0, 1].

Let λ denote Lebesgue measure on (Ω, G), and let λ* denote the corresponding outer measure. Let Z be a subset of Ω of inner λ-measure 0 and outer λ-measure 1. (Such a set Z exists by making use of the Axiom of Choice.) Note that Z^c (with Z^c denoting [0, 1] \ Z) also has outer λ-measure 1. Let F be the smallest σ-algebra on Ω extending G and containing Z, so that a typical element Λ of F may be written

    \Lambda = (Z \cap A) \cup (Z^c \cap B), \qquad \text{where } A, B \in G.

Thus

    \lambda^*(Z \cap \Lambda) = \lambda^*(Z \cap A) = \lambda(A), \qquad
    \lambda^*(Z^c \cap \Lambda) = \lambda^*(Z^c \cap B) = \lambda(B).

Hence we can define a probability measure IP on (Ω, F) by

    IP\{\Lambda\} \overset{\text{def.}}{=} \frac{1}{2}\lambda^*(Z \cap \Lambda) + \frac{1}{2}\lambda^*(Z^c \cap \Lambda) = \frac{1}{2}\big[\lambda(A) + \lambda(B)\big].

Assume that (IP|G) : F × Ω → [0, 1] is a regular conditional probability given G. We shall show that this assumption leads to a contradiction.


Let Λ ∈ G. Then, for G ∈ G,

    EE\big[(IP|G)(Z \cap \Lambda, \cdot);\ G\big] = IP(Z \cap \Lambda \cap G) = \tfrac{1}{2}\lambda^*(Z \cap \Lambda \cap G) = \tfrac{1}{2}\lambda(\Lambda \cap G) = EE\big[\tfrac{1}{2} I_\Lambda;\ G\big],

so that (IP|G)(Z ∩ Λ, ω) = ½ I_Λ(ω), a.s.

Since G is generated by a countable π-system I, and since

    \Lambda \mapsto (IP|G)(Z \cap \Lambda, \omega) \quad \text{and} \quad \Lambda \mapsto \tfrac{1}{2} I_\Lambda(\omega)

are measures for every ω (the first because of our assumption), the set

    J \overset{\text{def.}}{=} \{\omega : (IP|G)(Z \cap \Lambda, \omega) = \tfrac{1}{2} I_\Lambda(\omega),\ \forall\, \Lambda \in G\}
      = \{\omega : (IP|G)(Z \cap \Lambda, \omega) = \tfrac{1}{2} I_\Lambda(\omega),\ \forall\, \Lambda \in I\}

is in G, and IP(J) = λ(J) = 1. If ω ∈ J, then

    (IP|G)(Z \cap J, \omega) = \tfrac{1}{2} I_J(\omega) \ne \tfrac{1}{2} I_{J \setminus \{\omega\}}(\omega) = (IP|G)\big(Z \cap [J \setminus \{\omega\}], \omega\big),

so that Z ∩ J ≠ Z ∩ [J \ {ω}]; in other words, ω ∈ Z. Hence J, which is an element of G of measure 1, is a subset of Z, contradicting the fact that Z has inner measure 0.

3. An Example on Regular Conditional PDF. Refer to the example in 4.2 of Notes 2-I. Suppose that X and Y, defined on (Ω, F, IP) and both taking values in IR, have a joint density f(x, y) > 0. That is, for all C ∈ B²,

    IP\{(X, Y) \in C\} = \iint_C f(x, y)\, dx\, dy.

Obviously, σ(Y) is a sub-σ-algebra of F. Recall that for every A ∈ σ(Y) there exists B ∈ B(IR) such that A = Y^{-1}(B), and hence

    \sigma(Y) = \{(Y \in B) : B \in B(IR)\}.

Now, we claim that the elementary conditional pdf f_{X|Y}(x|y) is a regular conditional pdf for X given Y, in the sense that for every B ∈ B(IR),

    \int_B f_{X \mid Y}(x \mid Y(\omega))\, dx = \frac{\int_B f(x, Y(\omega))\, dx}{f_Y(Y(\omega))}

is a version of IP{X ∈ B | σ(Y)}.

Referring to the example in 4.2, one can easily see that this claim is nothing but (4.8), taking G = I_B in (4.8). In other words, for y ∈ IR and a Borel measurable set B ∈ B(IR), define

    \mu(B, y) = \frac{\int_B f(x, y)\, dx}{f_Y(y)}.

One can show that μ(B, Y(ω)) is a regular conditional distribution for X given Y (or, equivalently, given σ(Y)).
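
An added sketch of μ(B, y) for a concrete joint density (a bivariate normal with correlation 0.6 is chosen only for illustration); the grid value of μ((−∞, a], y) is compared with the known normal conditional cdf.

    import numpy as np
    from math import erf, sqrt, pi

    rho = 0.6                                   # correlation of an illustrative bivariate normal
    def f(x, y):                                # joint density of (X, Y)
        det = 1.0 - rho ** 2
        return np.exp(-(x**2 - 2*rho*x*y + y**2) / (2*det)) / (2*pi*sqrt(det))

    xs = np.linspace(-8.0, 8.0, 4001)
    dx = xs[1] - xs[0]

    def mu(a, y):
        """mu((-inf, a], y): the regular conditional probability IP(X <= a | Y = y)."""
        dens = f(xs, y) / (f(xs, y).sum() * dx)     # f_{X|Y}(. | y) on the grid
        return float(dens[xs <= a].sum() * dx)

    y0, a0 = 1.0, 0.5
    exact = 0.5 * (1 + erf((a0 - rho*y0) / sqrt(2*(1 - rho**2))))   # N(rho*y, 1 - rho^2) cdf
    print(mu(a0, y0), exact)                    # the grid value is close to the exact one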

    7. Uniform Integrability.

The concept of uniform integrability has been introduced in Notes 1. It is also of basic importance in connection with martingale theory. Thus, we will make an extensive review of UI in the following. First, recall the following.

1. Definition of Uniform Integrability. A collection of r.v.s X_t, t ∈ I, is said to be uniformly integrable (abbrev. UI) if

    \lim_{A \to \infty} \sup_{t \in I} EE\big[\,|X_t|;\ |X_t| > A\,\big] = 0. \qquad (7.1)

That is, for all ε > 0, there is an A > 0 sufficiently large such that for all t ∈ I,

    \int_{\{|X_t| > A\}} |X_t|\, dIP < \varepsilon.

If we pick A large enough so that in (7.1) the sup is < 1, it follows that

    \sup_{t \in I} EE|X_t| \le A + 1 < \infty. \qquad (7.2)

This observation, which says that {X_t, t ∈ I} is L^1-bounded if it is UI, will be useful several times below. Note also that it is not true that a family bounded in L^1 is UI.
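
A standard illustration of the last remark (added here): on ([0, 1], B[0, 1]) with Lebesgue measure, take X_n = n I_{(0, 1/n)}. Then

    EE|X_n| = 1 \ \text{ for every } n, \qquad \text{while} \qquad EE[\,|X_n|;\ |X_n| > A\,] = 1 \ \text{ for every } n > A,

so the family is bounded in L^1, but (7.1) fails and {X_n} is not UI.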

    For easy reference, we collect a few known results concerning UI in the following.

2. Theorem. The family {X_t, t ∈ I} is uniformly integrable if and only if the two conditions below are satisfied:

(a) EE|X_t| is bounded in t ∈ I, i.e., \sup_{t \in I} EE|X_t| < ∞, which means that {X_t, t ∈ I} is L^1-bounded;

(b) for every ε > 0, there exists δ(ε) > 0 such that for any E ∈ F,

    IP(E) < \delta(\varepsilon) \implies \int_E |X_t|\, dIP < \varepsilon \ \text{ for every } t \in I. \qquad (7.3)

Proof. (a) is nothing but (7.2). To deduce (b) from (7.1), let E ∈ F and write E_t = {|X_t| > A}. So,

    \int_E |X_t|\, dIP = \int_{E \cap E_t} |X_t|\, dIP + \int_{E \setminus E_t} |X_t|\, dIP \le \int_{E_t} |X_t|\, dIP + A\, IP(E).

Given ε > 0, there exists by (7.1) an A = A(ε) such that the last-written integral is less than ε/2 for every t. Choose δ(ε) = ε/(2A) and (b) follows.

Conversely, suppose that (a) and (b) are true. Then by the Markov inequality, for all t,

    IP(|X_t| > A) \le \frac{EE|X_t|}{A} \le \frac{M}{A},

where M is an upper bound as indicated in (a). Hence if A > M/δ, then

    IP(|X_t| > A) < \delta,

and we have by (b)

    \int_{\{|X_t| > A\}} |X_t|\, dIP < \varepsilon.

Thus (7.1) is true.

3. Proposition. If \sup_t EE[|X_t|^p] < ∞ for some p > 1, then {X_t, t ∈ I} is uniformly integrable.

Proof. By assumption, EE[|X_t|^p] is uniformly bounded from above, say by M < ∞. For A large enough,

    \int_{\{|X_t| > A\}} |X_t|\, dIP = EE\big[\,|X_t|\, I(|X_t| > A)\,\big] \le \frac{1}{A^{p-1}}\, EE[|X_t|^p] \le \frac{M}{A^{p-1}} \to 0,

as A → ∞. Note that the last term does not depend on t. Hence {X_t} is uniformly integrable.

    4. A Few Obvious Observations. Proofs will be omitted.


(a) If {X_t} and {Y_t} (t ∈ I) are uniformly integrable, then so is {X_t + Y_t}.

    (b) Any subfamily of a UI family is itself UI.

(c) Every finite family of L^1 r.v.s is UI.

(d) Suppose that for every α ∈ A there exists β ∈ B such that

    |X_\alpha| \le |Y_\beta| \quad \text{a.s.}

If {Y_β, β ∈ B} is UI, so is {X_α, α ∈ A}.

5. Proposition. Let X be a r.v. on (Ω, F, IP) with finite mean (i.e., EE|X| < ∞). Let

    A = \{\text{all sub-}\sigma\text{-algebras of } F\}.

Then the family of r.v.s {EE[X | G] : G ∈ A} is uniformly integrable.

This proposition is left to you as an exercise. In view of part (b) of the above item 4, we may re-phrase the above proposition as follows:

Corollary. Let X ∈ L^1(IP). Let A be a family of some sub-σ-algebras of F. Then the family of r.v.s {EE[X | G] : G ∈ A} is UI.

6. A common way to check uniform integrability is to use the following.

Theorem. Let φ ≥ 0 be any function with φ(x)/x → ∞ as x → ∞, e.g., φ(x) = x^p with p > 1 or φ(x) = x log⁺ x. If

    EE\,\varphi(|X_t|) \le C

for all t ∈ I, then {X_t, t ∈ I} is UI.

Proof. For ε > 0 small enough, take M = C/ε. As φ(x)/x → ∞ as x → ∞, there exists an A > 0 large enough such that for x > A one has φ(x)/x > M. For such an A,

    \int_{\{|X_t| > A\}} |X_t|\, dIP = \int_{\{|X_t| > A\}} \frac{|X_t|}{\varphi(|X_t|)}\, \varphi(|X_t|)\, dIP \le \frac{1}{M}\, EE\,\varphi(|X_t|) \le C/M = \varepsilon.

7. The relevance of UI to convergence in L^1 is explained by the following theorem.

Theorem. If X_n → X in probability, then the following are equivalent:

(a) {X_n : n ≥ 1} is UI;

(b) X_n → X in L^1;

(c) EE|X_n| → EE|X| < ∞.

Proof. To show (a) ⟹ (b), let

    \varphi_M(x) =
    \begin{cases}
    M, & x \ge M, \\
    x, & |x| \le M, \\
    -M, & x \le -M.
    \end{cases}

The triangle inequality implies that

    |X_n - X| \le |X_n - \varphi_M(X_n)| + |\varphi_M(X_n) - \varphi_M(X)| + |\varphi_M(X) - X|.

Observe that |φ_M(Y) − Y| = (|Y| − M)^+ ≤ |Y| I_{{|Y| > M}}. Taking expected values gives

    EE|X_n - X| \le EE|\varphi_M(X_n) - \varphi_M(X)| + EE[\,|X_n|;\ |X_n| > M\,] + EE[\,|X|;\ |X| > M\,]. \qquad (7.4)

As φ_M(·) is continuous, φ_M(X_n) → φ_M(X) in pr. Also φ_M(·) is bounded, so by the bounded convergence theorem,

    EE|\varphi_M(X_n) - \varphi_M(X)| \to 0.

If ε > 0 and M is large, UI implies that the 2nd term of the RHS in (7.4) is ≤ ε. To bound the 3rd term, we observe that UI of {X_n} implies L^1-boundedness, i.e.,

    \sup_n EE|X_n| < \infty.

Fatou's lemma leads to EE|X| < ∞, and then by making M large we can make the 3rd term ≤ ε. Therefore,

    \limsup_n EE|X_n - X| \le 2\varepsilon.

Since ε is arbitrary this proves (b).

Now (b) ⟹ (c). This is obvious, for

    \big|\, EE|X_n| - EE|X| \,\big| \le EE\big|\, |X_n| - |X| \,\big| \le EE|X_n - X| \to 0.

(c) ⟹ (a). Let

    \psi_M(x) =
    \begin{cases}
    x, & \text{on } [0, M-1], \\
    0, & \text{on } [M, \infty), \\
    \text{linear}, & \text{on } [M-1, M].
    \end{cases}


The dominated convergence theorem implies that if M is large,

    EE|X| - EE\,\psi_M(|X|) \le \varepsilon/2.

As in the first part of the proof, the bounded convergence theorem implies

    EE\,\psi_M(|X_n|) \to EE\,\psi_M(|X|),

so using (c) we get that for n ≥ n_0,

    EE[\,|X_n|;\ |X_n| > M\,] \le EE|X_n| - EE\,\psi_M(|X_n|) \le EE|X| - EE\,\psi_M(|X|) + \varepsilon/2 < \varepsilon.

By choosing M larger if necessary, we can make

    EE[\,|X_n|;\ |X_n| > M\,] \le \varepsilon

for 1 ≤ n < n_0, so {X_n} is UI.

8. Corollary. Uniform integrability is a necessary condition for a sequence X_1, X_2, ... of integrable r.v.s to converge in L^1.

Proof. This is an immediate consequence of the above theorem. Alternatively, with the aid of the theorem in item 2, one may also give a direct proof, as follows.

Suppose X_n → X in L^1, i.e., EE|X_n − X| → 0 as n → ∞. Obviously, EE|X_n| → EE|X|, and hence {EE|X_n|, n ≥ 1} is bounded. Observe that

    \int_A |X_n|\, dIP \le \int_A |X_n - X|\, dIP + \int_A |X|\, dIP.

For every ε > 0 (and every measurable A), there exists N(ε) such that for all n ≥ N(ε), \int_A |X_n - X|\, dIP < ε/2. As for X and the finitely many X_j (1 ≤ j < N(ε)), there also exists a δ > 0 such that \int_A |X|\, dIP < ε/2 as well as \int_A |X_j|\, dIP < ε/2 for 1 ≤ j < N(ε), so long as IP(A) < δ. By the theorem in 7.2, we conclude that {X_n, n ≥ 1} is UI.

9. To conclude this section, we turn to another important and relevant result, which is known as Scheffé's theorem. Let (X, A, μ) be a measure space.

Scheffé's Lemma. Suppose that f_n, f are in L^1(X, A, μ), where μ is a σ-finite measure, and that f_n → f a.e. with respect to μ. Then, as n → ∞,

    \int_X |f_n - f|\, d\mu \to 0 \quad \text{if and only if} \quad \int_X |f_n|\, d\mu \to \int_X |f|\, d\mu.


Remark: If (X, A, μ) is a probability space, then this is contained in the theorem of 7.7, in which case these two conditions are also equivalent to the uniform integrability of {f_n, n ≥ 1}. This lemma gives a necessary and sufficient condition for L^1 convergence as well.

Proof of Scheffé's Lemma. Let us see how to establish this technical lemma. It suffices for us to establish the following implication:

    \int_X |f_n|\, d\mu \to \int_X |f|\, d\mu \implies \int_X |f_n - f|\, d\mu \to 0.

(I) Consider first the case in which f_n and f are nonnegative. Thus, by assumption,

    \int_X (f_n - f)\, d\mu = \int_X f_n\, d\mu - \int_X f\, d\mu \to 0, \quad \text{as } n \to \infty.

Observe that

    (f_n - f)^- =
    \begin{cases}
    f - f_n \le f, & \text{if } f_n < f, \\
    0 \le f, & \text{if } f_n \ge f.
    \end{cases}

Since f_n → f a.e.(μ), by the Dominated Convergence Theorem we get

    \int_X (f_n - f)^-\, d\mu \to 0

as n → ∞. On the other hand, |f_n − f| = (f_n − f) + 2(f_n − f)^−, so we conclude that |f_n − f| → 0 in L^1, when f_n and f are nonnegative.

(II) Now for general f_n and f, write f_n = f_n^+ − f_n^− and f = f^+ − f^−. Obviously, a.e.(μ), f_n^+ → f^+ and f_n^− → f^−.

We would next like to show that, respectively, f_n^+ → f^+ and f_n^− → f^− in L^1, as n → ∞. By Fatou's lemma,

    \int_X f^+\, d\mu \le \liminf_n \int_X f_n^+\, d\mu \quad \text{and} \quad \int_X f^-\, d\mu \le \liminf_n \int_X f_n^-\, d\mu.

Therefore,

    \int_X |f|\, d\mu = \int_X f^+\, d\mu + \int_X f^-\, d\mu
    \le \liminf_n \int_X f_n^+\, d\mu + \liminf_n \int_X f_n^-\, d\mu
    \le \liminf_n \Big( \int_X f_n^+\, d\mu + \int_X f_n^-\, d\mu \Big)
    \le \limsup_n \Big( \int_X f_n^+\, d\mu + \int_X f_n^-\, d\mu \Big)
    = \limsup_n \int_X |f_n|\, d\mu = \lim_n \int_X |f_n|\, d\mu = \int_X |f|\, d\mu,

so all the inequalities above are in fact equalities. We conclude that

    \int_X f^+\, d\mu = \liminf_n \int_X f_n^+\, d\mu \quad \text{and} \quad \int_X f^-\, d\mu = \liminf_n \int_X f_n^-\, d\mu.

Moreover, since

    \limsup_n \int_X f_n^+\, d\mu + \liminf_n \int_X f_n^-\, d\mu \le \limsup_n \Big( \int_X f_n^+\, d\mu + \int_X f_n^-\, d\mu \Big) = \int_X |f|\, d\mu,

we also have \limsup_n \int_X f_n^+\, d\mu \le \int_X f^+\, d\mu, which means that \lim_n \int_X f_n^+\, d\mu exists and is equal to \int_X f^+\, d\mu. Similarly, one can draw the same conclusion that \lim_n \int_X f_n^-\, d\mu exists and is equal to \int_X f^-\, d\mu. Now use the first part of the argument to obtain that both f_n^+ → f^+ and f_n^− → f^− in L^1.

Finally, observe that, in L^1 as n → ∞,

    |f_n - f| = |f_n^+ - f_n^- - (f^+ - f^-)| \le |f_n^+ - f^+| + |f_n^- - f^-| \to 0.

We are done!

10. The following is a simple consequence.

Corollary. Let f_n, f be probability density functions. Suppose f_n → f a.e. with respect to Lebesgue measure on IR. Then

    \int_{IR} |f_n - f| \to 0 \quad \text{as } n \to \infty.

As a consequence, the probability distribution corresponding to f_n converges weakly to that corresponding to f as n → ∞.

Proof. We record a proof of the first part here for reference, though it is a simple consequence of Scheffé's lemma. Put g_n = f − f_n. Observe first that

    g_n^+ \le |g_n| = |f_n - f| \to 0

a.e., that g_n^+ ≤ f, and that f is integrable. One can apply the dominated convergence theorem to conclude that

    \int_{IR} g_n^+ \to 0

as n → ∞.

Recall that

    g_n^+ = \frac{g_n + |g_n|}{2}

and that \int_{IR} g_n = 0 for all n. Immediately, we arrive at

    \int_{IR} |g_n| = 2 \int_{IR} g_n^+.

Therefore,

    \int_{IR} |f_n - f| \to 0 \quad \text{as } n \to \infty.

Consequently, for all x ∈ IR,

    |F_n(x) - F(x)| = \Big| \int_{(-\infty, x]} (f_n - f) \Big| \le \int_{(-\infty, x]} |f_n - f| \le \int_{IR} |f_n - f| \to 0

as n → ∞. We are done!

11. In the above corollary, the assumption of a.e. convergence f_n → f (with respect to Lebesgue measure on IR) can be replaced by convergence in Lebesgue measure.


    Exercises

1. Refer to the definition of conditional expectation in 2.2. Show that if W satisfies (a) and (b), then it must be integrable. Indeed, show that EE|W| ≤ EE|X|.

2. If X_1 = X_2 a.s. on B ∈ G, show that EE[X_1 | G] = EE[X_2 | G] a.s. on B.

    3. Complete the discussion of the above Example 2 in 4.4.

4. If the random vector (X, Y) has the probability density function f(·, ·) and X is integrable, show that one version of EE[X | X + Y = z] is given by

    \frac{\int x\, f(x, z - x)\, dx}{\int f(x, z - x)\, dx}.

    5. Show that EE[X] = EE[EE(X | G)].

6. Let X be a nonnegative random variable on a probability space (Ω, F, IP). Put μ = IP and define a set function ν on F by

    \nu(A) = \int_A X\, d\mu

for all A ∈ F. Show that ν is a measure and ν ≪ μ.

7. Suppose that X, Y ∈ L^1(Ω, F, IP) and that

    EE[X \mid Y] = Y \ \text{a.s.}, \qquad EE[Y \mid X] = X \ \text{a.s.}

Prove that X = Y a.s., i.e., IP{X = Y} = 1.

8. If Y is nonnegative, show that a.s. {EE[Y | G] = 0} ⊂ {Y = 0}.
