
  • Chapter 5.

    JOINT DISTRIBUTIONS

    5.1 Introduction

    In this chapter we look at the simultaneous (joint) behaviour of two or more rvs.

    Example: We measure height and weight of randomly selected individuals.

    [Scatter plot of height (ht) against weight (wt): the points show a positive association.]

    Clearly the two rvs representing height and weight are linked.

  • 5.2 Discrete rvs: joint probability function

    Let X have range x1, x2, . . . , xn, and let Y have range y1, y2, . . . , ym.

    DEFINITION: The joint pf of X and Y is defined as the function

    Pr(X = x ∩ Y = y) ,

    for x = x1, x2, . . . , xn , y = y1, y2, . . . , ym . It is a function of x and y.

    Notation: the joint pf is often written pX,Y (x, y).

    SIMPLE EXAMPLE

                      X
                  1     2     3     4
              1   0.05  0.05  0.05  0.05
        Y     2   0.05  0.10  0.15  0.20
              3   0.05  0.00  0.10  0.15

    Entries in the body of the table give the joint pf of X and Y .

  • Marginal Distributions

    The joint pf gives full information about the joint behaviour of X and Y . But X by itself is just a discrete random variable, so it has a pf Pr(X = x), x = 1, 2, 3, 4.

    Example

                      X
                  1     2     3     4
              1   0.05  0.05  0.05  0.05   0.20
        Y     2   0.05  0.10  0.15  0.20   0.50
              3   0.05  0.00  0.10  0.15   0.30

                  0.15  0.15  0.30  0.40

    To obtain the marginal pf of X, sum the joint pf over all values of Y , since

    Pr(X = x) = ∑_{y=1}^{m} Pr(X = x ∩ Y = y) .

    This gives the marginal distribution of X.

    The pf of Y is obtained similarly.

    Exercise: Find E(X) and E(Y ).
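    As a quick numerical check (a NumPy sketch, not part of the original notes, using the joint pf from the table above), the marginal pfs and the expectations E(X) and E(Y ) can be computed directly:

```python
import numpy as np

# Joint pf from the table: rows are Y = 1, 2, 3; columns are X = 1, 2, 3, 4.
p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])
x_vals = np.array([1, 2, 3, 4])
y_vals = np.array([1, 2, 3])

p_X = p.sum(axis=0)            # marginal pf of X: [0.15, 0.15, 0.30, 0.40]
p_Y = p.sum(axis=1)            # marginal pf of Y: [0.20, 0.50, 0.30]

print((x_vals * p_X).sum())    # E(X) = 2.95
print((y_vals * p_Y).sum())    # E(Y) = 2.1
```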

  • Conditional Distributions

    Consider now

    Pr(X = x | Y = y) = Pr(X = x ∩ Y = y) / Pr(Y = y) .

    For fixed y, this function of x gives the

    conditional distribution of X given Y = y .

    Extract from the table above:

    x                     1     2     3     4
    Pr(X = x ∩ Y = 2)     0.05  0.10  0.15  0.20
    Pr(X = x | Y = 2)     0.10  0.20  0.30  0.40

    Conditional distributions share the properties

    of probability distributions. Note in particular

    that these conditional probabilities are in the

    range [0, 1], and that they sum to 1 over the

    range of X.

    Exercise: Find E(X | Y = 2).
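    Continuing the NumPy sketch (again an illustration, not part of the notes), the conditional pf of X given Y = 2 and the conditional expectation E(X | Y = 2) follow in the same way:

```python
import numpy as np

p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])
x_vals = np.array([1, 2, 3, 4])

row_y2 = p[1]                          # Pr(X = x ∩ Y = 2) for x = 1..4
p_X_given_y2 = row_y2 / row_y2.sum()   # divide by Pr(Y = 2) = 0.50
print(p_X_given_y2)                    # [0.1, 0.2, 0.3, 0.4] -- sums to 1
print((x_vals * p_X_given_y2).sum())   # E(X | Y = 2) = 3.0
```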

  • Conditional and marginal distributions

    We have seen that

    Pr(X = x) = ∑_y Pr(X = x ∩ Y = y).

    Also, by definition,

    Pr(X = x | Y = y) = Pr(X = x ∩ Y = y) / Pr(Y = y) .

    Together, these give the useful result

    Pr(X = x) = ∑_y Pr(X = x | Y = y) Pr(Y = y) .

    These concepts extend to joint distributions:

    • of more than 2 rvs,

    • of continuous rvs.

    We consider the case of continuous rvs later:

    we will need to use integration instead of

    summation.

  • Example: Poisson and binomial

    Seeds of a particular plant species fall at

    random, so that the number Y in a particular

    area has a Poisson distribution with some

    mean µ.

    For each seed, independent of all others, the

    probability of germinating is p.

    Find the distribution of the number X of

    seeds that germinate; that is, calculate

    Pr(X = x) , for x = 0,1, . . ..

    Information provided. We are told:

    (i): Y ∼ Poisson(µ), so that

    Pr(Y = y) = e^{−µ} µ^y / y! ,   y = 0, 1, 2, . . .

    (ii): Given that Y = y, X ∼ B(y, p) ; so that

    Pr(X = x | Y = y) = (y choose x) p^x (1 − p)^{y−x} ,   x = 0, 1, . . . , y.

  • We need to calculate Pr(X = x), the

    marginal distribution of X. Noting that y ≥ x, and that, therefore,

    Pr(X = x) = ∑_{y=x}^{∞} Pr(X = x | Y = y) Pr(Y = y) ,

    we obtain

    Pr(X = x) = ∑_{y=x}^{∞} (y choose x) p^x (1 − p)^{y−x} · e^{−µ} µ^y / y!

              = (e^{−µ} p^x / x!) ∑_{y=x}^{∞} (1 − p)^{y−x} µ^y / (y − x)!

                [substitute z = y − x]

              = (e^{−µ} p^x µ^x / x!) ∑_{z=0}^{∞} (1 − p)^z µ^z / z!

              = (e^{−µ} (pµ)^x / x!) e^{(1−p)µ}  =  e^{−pµ} (pµ)^x / x! .

    Hence X ∼ Poisson(pµ) .
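    The thinning result X ∼ Poisson(pµ) can be supported by a quick Monte Carlo sketch (the values µ = 4 and p = 0.3 are illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, p, reps = 4.0, 0.3, 200_000

y = rng.poisson(mu, size=reps)        # number of seeds falling
x = rng.binomial(y, p)                # number germinating, given Y = y

# Mean and variance of X should both be close to p * mu = 1.2,
# as they would be for a Poisson(p * mu) random variable.
print(x.mean(), x.var())
print(np.exp(-p * mu))                # theoretical Pr(X = 0) = e^{-p mu}
print((x == 0).mean())                # empirical  Pr(X = 0)
```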

  • 5.3 Continuous rvs: joint pdf

    Instead of a joint probability function (pf) Pr(X = x ∩ Y = y), we obtain a joint probability density function (pdf).

    Recall: for a single random variable X, we view the pdf as giving the probability that X is close to some value x; that is,

    Pr(x < X ≤ x + g) ≈ g fX(x).

    We extend this idea here to the case of two variables X and Y . We define the joint pdf of X and Y informally as the function fX,Y (x, y) such that the probability of the event

    (x < X ≤ x + g) ∩ (y < Y ≤ y + h)  is approximately  g · h · fX,Y (x, y).

    Notes

    Note 1. Formally, the joint pdf is defined by differentiating a joint cdf Pr(X ≤ x ∩ Y ≤ y) partially with respect to x and y.

  • Note 2. Joint pdfs have behaviour analogous

    to that of joint pfs. For example, we have the

    following:

    (A). ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y (x, y) dx dy = 1 .

    (B). Marginal distributions:

    fX(x) = ∫_{−∞}^{∞} fX,Y (x, y) dy

    (C). Conditional distributions:

    fX|Y (x | y) = fX,Y (x, y) / fY (y)

    In the discrete case, we noted the result

    Pr(X = x) = ∑_y Pr(X = x | Y = y) Pr(Y = y) .

    For continuous rvs, (B) and (C) combine to

    give the equivalent result

    fX(x) = ∫_{−∞}^{∞} fX|Y (x | y) fY (y) dy .
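    As an illustration only (the joint pdf fX,Y (x, y) = x + y on the unit square is a hypothetical example, not taken from the notes), a short SciPy sketch checks numerically that integrating fX|Y (x | y) fY (y) over y recovers the marginal fX(x) = x + 1/2:

```python
from scipy.integrate import quad

def f_joint(x, y):
    # Hypothetical joint pdf on the unit square 0 <= x, y <= 1 (integrates to 1).
    return x + y

def f_Y(y):
    # Marginal pdf of Y: integrate the joint pdf over x. Equals y + 0.5.
    return quad(lambda x: f_joint(x, y), 0.0, 1.0)[0]

def f_X_via_conditional(x):
    # Integrate f_{X|Y}(x | y) * f_Y(y) over y; the f_Y factors cancel, so this
    # is the marginal f_X(x) computed the long way round.
    integrand = lambda y: (f_joint(x, y) / f_Y(y)) * f_Y(y)
    return quad(integrand, 0.0, 1.0)[0]

for x in (0.2, 0.5, 0.9):
    print(x, f_X_via_conditional(x), x + 0.5)   # the last two values agree
```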

  • 5.4 Expectation

    In Chapter 3, we used the pf to give a

    complete description of the behaviour of a

    discrete rv.

    In Chapter 4, we used the pdf to give a

    complete description of the behaviour of a

    continuous rv.

    To summarise the most important features

    of the behaviour of any rv (whether discrete

    or continuous), we used the mean and

    variance. We defined:

    mean (µ):       E(X)

    variance (σ²):  E{(X − µ)²}

    These are both expectations of functions of

    X. We can also make use of the concept of

    expectation to summarise the joint behaviour

    of two rvs X and Y .

  • We first extend the definition of expectation

    to cover a function g(X, Y ) of two rvs, not

    just a function of a single rv.

    Definition: If g(X, Y ) is a scalar function of X

    and Y , then we define

    E{g(X, Y )} = ∑_x ∑_y g(x, y) Pr(X = x ∩ Y = y)

    if X and Y are discrete, and

    E{g(X, Y )} = ∫∫ g(x, y) fX,Y (x, y) dx dy

    if X and Y are continuous.

  • Example:

                      X
                  1     2     3     4
              1   0.05  0.05  0.05  0.05   0.20
        Y     2   0.05  0.10  0.15  0.20   0.50
              3   0.05  0.00  0.10  0.15   0.30

                  0.15  0.15  0.30  0.40

    For g(X, Y ) = 2X + 5Y , we obtain:

    E(2X + 5Y ) = (7 × 0.05) + (9 × 0.05) + · · · + (14 × 0.10) + · · · + (23 × 0.15) = 16.4.

    Similarly, for g(X, Y ) = XY , we obtain:

    E(XY ) = (1 × 0.05) + (2 × 0.05) + · · · + (4 × 0.10) + · · · + (12 × 0.15) = 6.35.

    The concept of expectation is very powerful, especially when used in the context of the joint behaviour of two or more rvs.
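    These two expectations can be checked with the same kind of NumPy sketch used earlier for the marginals:

```python
import numpy as np

p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])
x = np.array([1, 2, 3, 4])       # values of X (columns)
y = np.array([1, 2, 3])          # values of Y (rows)
X, Y = np.meshgrid(x, y)         # grids of x- and y-values over the table

print((p * (2 * X + 5 * Y)).sum())   # E(2X + 5Y) = 16.4
print((p * X * Y).sum())             # E(XY)      = 6.35
```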

  • Expectation is a linear operator

    We now show that, for any function g(X) of

    a discrete rv X,

    E{ag(X) + b} = aE{g(X)}+ b .

    For brevity, we write pX(x) for Pr(X = x) .

    Now

    E{ag(X) + b} = ∑ {ag(x) + b} pX(x)

                 = a ∑ g(x) pX(x) + b ∑ pX(x)

                 = a E{g(X)} + b.

    A similar proof holds when X is continuous.

    Hence, for scalar problems, expectation acts

    as a linear operator.

    It is also a linear operator when applied to functions of several rvs.

  • For any rvs X and Y and constants a and b,

    E(aX + bY ) = aE(X) + bE(Y ).

    Proof (discrete case). We again write Pr(X = x ∩ Y = y) = pX,Y (x, y); also Pr(X = x) = pX(x), Pr(Y = y) = pY (y). Now

    E(aX + bY ) = ∑_x ∑_y (ax + by) pX,Y (x, y)

                = ∑_x ∑_y ax pX,Y (x, y) + ∑_x ∑_y by pX,Y (x, y)

                = a ∑_x x {∑_y pX,Y (x, y)} + b ∑_y y {∑_x pX,Y (x, y)}

                = a ∑_x x pX(x) + b ∑_y y pY (y)

                = a E(X) + b E(Y ).

    Note that the operator is again linear.

    We return to this topic later.

  • 5.5 Covariance

    Now reconsider the plot of height (Y ) against

    weight (X).

    [Scatter plot of height, ht (vertical axis, y), against weight, wt (horizontal axis, x): the points show a positive association.]

    When X is relatively large, so is Y . So if, for

    an individual, {X − E(X)} is positive, {Y − E(Y )} is also likely to be positive.

    Similarly, when {X − E(X)} is negative, {Y − E(Y )} is also likely to be negative.

    So the product {X − E(X)}{Y − E(Y )} is likely to be positive.

  • We can summarise the link between X and Y

    by examining the product

    {X − E(X)}{Y − E(Y )}

    In particular, we consider the expectation of

    this function.

    DEFINITION: The covariance between rvs

    X and Y , written as Cov(X, Y ) , is defined as

    Cov(X, Y ) = E [{X − E(X)}{Y − E(Y )}] .

    If Cov(X, Y ) = 0 , then X and Y are said to

    be uncorrelated.

    The correlation between X and Y is positive

    if Cov(X, Y ) > 0. We then say that they are

    positively correlated.

    They are negatively correlated if

    Cov(X, Y ) < 0.

  • Calculation of covariance

    Recall that the variance Var(X) of a rv X is

    defined as Var(X) = E[{X − E(X)}²] .

    However, we proved in §3.3 that we can write Var(X) = E(X²) − {E(X)}².

    There is, similarly, an easier way to calculate

    a covariance.

    Theorem:

    Cov(X, Y ) = E(XY )− E(X)E(Y ).

    Proof

    Cov(X, Y ) = E[{X − E(X)}{Y − E(Y )}]

               = E[XY − E(X)Y − XE(Y ) + E(X)E(Y )]

               = E(XY ) − E(X)E(Y ).

  • Example:

                      X
                  1     2     3     4
              1   0.05  0.05  0.05  0.05   0.20
        Y     2   0.05  0.10  0.15  0.20   0.50
              3   0.05  0.00  0.10  0.15   0.30

                  0.15  0.15  0.30  0.40

    We can obtain E(X) and E(Y ) easily: for

    example

    E(Y ) = (1× 0.2)+ (2× 0.5)+ (3× 0.3) = 2.1 .

    Similarly, we can show that E(X) = 2.95 .

    We showed earlier that E(XY ) = 6.35 .

    Hence Cov(X, Y ) = 6.35− 2.95× 2.1 = 0.155.

    Link between concepts of covariance and

    correlation: important in due course, but not

    covered in this module.
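    For completeness, a NumPy sketch of the covariance calculation from the table (again an illustration, not part of the notes):

```python
import numpy as np

p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])
x = np.array([1, 2, 3, 4])
y = np.array([1, 2, 3])
X, Y = np.meshgrid(x, y)

E_X = (p * X).sum()                  # 2.95
E_Y = (p * Y).sum()                  # 2.1
E_XY = (p * X * Y).sum()             # 6.35
print(E_XY - E_X * E_Y)              # Cov(X, Y) = 0.155
```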

  • 5.6 Independent Random Variables

    Recall that events A and B are independent

    if Pr(A ∩ B) = Pr(A) × Pr(B) . We extend the concept in a natural way.

    Definition: Random variables X and Y are

    statistically independent if any event relating

    to X alone is independent of any event

    relating to Y alone.

    For example, if X and Y are independent,

    Pr{(X ≤ 3)∩(Y ≥ 6)} = Pr(X ≤ 3)×Pr(Y ≥ 6).

    As a consequence, for discrete rvs

    Pr(X = x ∩ Y = y) = Pr(X = x)Pr(Y = y)

    and for continuous rvs

    fX,Y (x, y) = fX(x)fY (y).

    That is, the joint pf (discrete) or pdf

    (continuous) factorises into terms involving x

    and y separately.

  • Initial Example, revisited

    Original - NOT INDEPENDENT

                      X
                  1     2     3     4
              1   0.05  0.05  0.05  0.05   0.20
        Y     2   0.05  0.10  0.15  0.20   0.50
              3   0.05  0.00  0.10  0.15   0.30

                  0.15  0.15  0.30  0.40

    Revised - INDEPENDENT

                      X
                  1      2      3     4
              1   0.030  0.030  0.06  0.08   0.20
        Y     2   0.075  0.075  0.15  0.20   0.50
              3   0.045  0.045  0.09  0.12   0.30

                  0.15   0.15   0.30  0.40

    For example, note that, in the second table,

    Pr({X = 4} ∩ {Y = 1}) = 0.40× 0.20 = 0.08 .
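    A short NumPy sketch (illustrative, not part of the notes) confirms that every entry of the revised table equals the product of the corresponding marginal probabilities, while the original table fails this check:

```python
import numpy as np

original = np.array([[0.05, 0.05, 0.05, 0.05],
                     [0.05, 0.10, 0.15, 0.20],
                     [0.05, 0.00, 0.10, 0.15]])
revised = np.array([[0.030, 0.030, 0.06, 0.08],
                    [0.075, 0.075, 0.15, 0.20],
                    [0.045, 0.045, 0.09, 0.12]])

def is_independent(p):
    # Independence: joint pf equals the outer product of the marginal pfs.
    product = np.outer(p.sum(axis=1), p.sum(axis=0))
    return np.allclose(p, product)

print(is_independent(original))   # False
print(is_independent(revised))    # True
```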

  • Independence and covariance

    If two random variables are independent, then

    several aspects of their expectations and

    variances simplify.

    In particular, if X and Y are independent rvs,

    then E(XY ) = E(X)E(Y ) .

    Proof: (discrete case)

    E(XY ) = ∑_x ∑_y xy Pr(X = x ∩ Y = y)

           = ∑_x ∑_y xy Pr(X = x) Pr(Y = y)

           = (∑_x x Pr(X = x)) (∑_y y Pr(Y = y))

           = E(X)E(Y ).

    Hence, if X and Y are independent,

    Cov(X, Y ) = 0 . The rvs are also

    uncorrelated.

  • 5.7 Applications of Expectation

    Reminders: For a rv X with pdf fX(x), the expectation E{g(X)} of the function g(x) is defined as

    E{g(X)} = ∫_{−∞}^{∞} g(x) fX(x) dx ;   or

             = ∑_x g(x) Pr(X = x) .

    We have seen that

    • Expectation is linear, so that E{ag(X) + b} = aE{g(X)} + b and E(aX + bY ) = aE(X) + bE(Y ).

    • The covariance between X and Y can be calculated as Cov(X, Y ) = E(XY ) − E(X)E(Y ).

    • If X and Y are independent random variables, Cov(X, Y ) = 0.

    There are several natural extensions of these results.

  • The result

    E(aX + bY ) = aE(X) + bE(Y )

    extends to the weighted sum of any number

    of rvs.

    If X1, X2, . . . , Xm are rvs and a1, a2, . . . , am are constants, then

    E( ∑_{i=1}^{m} aiXi ) = ∑_{i=1}^{m} ai E(Xi).

    In the result above, the coefficients a1, a2 etc

    must be constants.

    Note that, in general, E(XY ) ≠ E(X)E(Y ) .

  • Results for variances

    Recall that the expectation of a linear function aX + b of a random variable X is given by

    E(aX + b) = aE(X) + b.

    The equivalent result for the variance of a linear function is: if X is a rv, and a and b are constants, then Var(aX + b) = a²Var(X).

    Proof: Let Z = aX + b, then

    E(Z) = aE(X) + b.

    Now, by definition, Var(Z) = E[{Z − E(Z)}²]. We also know that

    Z − E(Z) = (aX + b) − (aE(X) + b) = a{X − E(X)}.

    So Var(Z) = E[a²{X − E(X)}²] = a²E[{X − E(X)}²] = a²Var(X).

  • Variances of sums of rvs:

    For any two rvs X and Y ,

    Var(X + Y ) = Var(X)+Var(Y )+2Cov(X, Y ).

    Proof: Write µx = E(X) and µy = E(Y ).

    Var(X + Y ) = E[{(X + Y ) − (µx + µy)}²]

                = E[{(X − µx) + (Y − µy)}²]

                = E[(X − µx)² + (Y − µy)² + 2(X − µx)(Y − µy)]

                = Var(X) + Var(Y ) + 2Cov(X, Y ).

    Combining this with the previous result

    Var(aX + b) = a²Var(X)

    gives the following:

    If X and Y are two rvs and a and b are constants, then

    Var(aX + bY ) = a²Var(X) + b²Var(Y ) + 2ab Cov(X, Y ) .

  • Special case: difference between two rvs

    Substituting a = 1, b = −1, we obtain

    Var(X − Y ) = Var(X) + Var(Y )− 2Cov(X, Y )

    [Compare:

    Var(X + Y ) = Var(X) + Var(Y ) + 2Cov(X, Y ) .]

    Further Extension: For any jointly distributed

    rvs X1, X2, . . . , Xn:

    Var(X1 + X2 + · · · + Xn) = ∑_{i=1}^{n} Var(Xi) + 2 ∑_{i=1}^{n−1} ∑_{j=i+1}^{n} Cov(Xi, Xj).

    That is: the variance of the sum of rvs is the

    sum of their variances, plus twice the sum of

    all the covariances.
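    Returning to the joint pf table used throughout this chapter, a NumPy sketch verifies the identity Var(X + Y ) = Var(X) + Var(Y ) + 2Cov(X, Y ) numerically:

```python
import numpy as np

p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])
X, Y = np.meshgrid([1, 2, 3, 4], [1, 2, 3])

def E(g):
    # Expectation of a function g(X, Y) under the joint pf.
    return (p * g).sum()

var_X = E(X**2) - E(X)**2
var_Y = E(Y**2) - E(Y)**2
cov_XY = E(X * Y) - E(X) * E(Y)

lhs = E((X + Y)**2) - E(X + Y)**2          # Var(X + Y) computed directly
rhs = var_X + var_Y + 2 * cov_XY           # via the identity
print(lhs, rhs)                            # the two values agree
```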

  • Special case: Independent rvs

    For independent rvs X1, X2, . . . , Xn:

    Var( ∑_{i=1}^{n} Xi ) = ∑_{i=1}^{n} Var(Xi),

    and

    Var( ∑_{i=1}^{n} aiXi ) = ∑_{i=1}^{n} ai² Var(Xi).

    For independent rvs only:

    Variance of a sum = sum of the variances.

    For all rvs:

    Expectation of a sum = sum of the

    expectations.

  • 5.8 Applications to Sampling Problems

    In statistical work we often select a set of

    observations, a random sample, from some

    distribution.

    This is used to make inferences about the

    features of the distribution, e.g. its mean and

    variance.

    Suppose that X1, X2, . . . , Xn are independent

    and identically distributed, each with mean µ

    and variance σ2.

    Typically µ and σ are unknown, and we will

    wish to use the information in the sample to

    estimate them.

  • Estimating the mean, µ

    Suppose that just the first value, X1, is used, ignoring all the rest. It is clear that E(X1) = µ. On average, X1 is neither too high nor too low.

    How close can X1 be expected to be to the correct value µ? What is the ‘error’ in this estimate?

    On any particular occasion, we will not know the value of X1 − µ, but we can assess its likely size by finding Var(X1). (We have already denoted this by σ².)

    We will compare X1 with another estimator of µ: the sample mean

    Y = (X1 + X2 + · · · + Xn) / n .

    It is clear that Y makes better use of the data than X1 does. But can we prove that Y is a better estimator of µ than X1 is?

  • Consider the mean and variance of Y .

    E(Y ) = (1/n)µ + (1/n)µ + · · · + (1/n)µ = nµ/n = µ.

    On average, Y is neither too high nor too low. What is the ‘error’ in this estimate? For Var(Y ) we use the result

    Var( ∑ aiXi ) = ∑ ai² Var(Xi),

    valid when the rvs X1, . . . , Xn are independent. But Y is defined as

    Y = (X1 + X2 + · · · + Xn)/n , so a1 = · · · = an = 1/n.

    Hence Var(Y ) = (1/n²)σ² + (1/n²)σ² + · · · + (1/n²)σ² = nσ²/n² = σ²/n.

    But Var(X1) = σ² . So, for n > 1,

    Var(Y ) = σ²/n < Var(X1).

    On average Y will be closer to µ than X1 will: it is a better estimator of µ.
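    A Monte Carlo sketch (the values µ = 10, σ = 2 and n = 25 are illustrative choices) shows the sample mean having variance close to σ²/n, much smaller than the variance σ² of a single observation:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 2.0, 25, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
x1 = samples[:, 0]             # estimator using only the first observation
ybar = samples.mean(axis=1)    # the sample mean of each sample of size n

print(x1.var())                # close to sigma**2 = 4
print(ybar.var())              # close to sigma**2 / n = 0.16
```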

  • Notes:

    1. The sample mean is usually denoted by a bar over the symbol, e.g.,

    X̄ = (1/n)(X1 + X2 + · · · + Xn) .

    2. We show in §5.9 that linear combinations of Normal rvs are themselves Normal. If X1, X2, . . . , Xn are mutually independent, and if they are all N(µ, σ²), then

    X̄ ∼ N(µ, σ²/n).

    This is an important result, and is used very frequently in statistical practice.

    3. When estimating the mean µ from a random sample, it is not necessarily best to use X̄. Another possible estimator is the sample median M. If X1, X2, . . . , Xn are mutually independent, and if they are all N(µ, σ²), it can be shown that, approximately,

    M ∼ N( µ , (π/2) · σ²/n ).

  • Estimating the variance

    The sample mean X̄ is usually used to estimate µ. What can we use to estimate σ²?

    Since Var(X) is defined as E{(X − µ)²}, we consider the sample version of this quantity:

    (1/n) ∑_{i=1}^{n} (Xi − µ)².

    However, µ is typically unknown , so cannot

    be used. It is sensible to consider replacing it

    by X̄ . We therefore examine the properties

    of a statistic T , defined as

    T = (1/n) ∑_{i=1}^{n} (Xi − X̄)²

    as an estimator for σ2.

    Will it be too high or too low, on average?

    We need to find E(T ).

    We start by expanding the term (Xi − X̄)². This gives

    T = (1/n) ∑_{i=1}^{n} (Xi − X̄)²

      = (1/n) { ∑_{i=1}^{n} Xi² − 2X̄ ∑_{i=1}^{n} Xi + nX̄² }

      = (1/n) ∑_{i=1}^{n} Xi² − X̄².

    Since expectation is a linear operator, we

    obtain

    E(T ) = (1/n) ∑_{i=1}^{n} E(Xi²) − E(X̄²).

    For any rv (W , say),

    Var(W ) = E(W²) − {E(W )}² .   So E(W²) = Var(W ) + {E(W )}².

    We know E(Xi), Var(Xi), E(X̄) and Var(X̄), so we can now obtain E(Xi²) and E(X̄²).

  • We obtain

    E(T ) = (1/n) ∑_{i=1}^{n} [ {E(Xi)}² + Var(Xi) ] − [ {E(X̄)}² + Var(X̄) ]

          = (µ² + σ²) − (µ² + σ²/n)

          = ((n − 1)/n) σ².

    This shows that, on average, T is a little too

    small. To compensate for this, we usually use

    S² = (n/(n − 1)) T = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)²

    Since E(S²) = E( (n/(n − 1)) T ) = σ² , S² is said to be an unbiased estimator of σ².
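    A Monte Carlo sketch (the values µ = 0, σ = 2 and n = 5 are illustrative choices) shows T underestimating σ² by the factor (n − 1)/n on average, while S² is unbiased:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.0, 2.0, 5, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1, keepdims=True)

T = ((samples - xbar) ** 2).sum(axis=1) / n         # divisor n
S2 = ((samples - xbar) ** 2).sum(axis=1) / (n - 1)  # divisor n - 1

print(T.mean())    # close to (n - 1)/n * sigma**2 = 3.2
print(S2.mean())   # close to sigma**2 = 4.0
```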

  • 5.9 Sums of random variables

    If X1, X2, . . . , Xn are independent, and

    Y = X1 + X2 + · · ·+ Xn

    is their sum, we know that

    E(Y ) = ∑_{i=1}^{n} E(Xi)

    and that

    Var(Y ) = ∑_{i=1}^{n} Var(Xi).

    But what is the distribution of Y ?

    This distribution can be very complicated,

    and there are several methods which can be

    used to find it (seen in later courses).

    Here we look at the use of the

    moment generating function (mgf)

  • Moment Generating Function

    DEFINITION: The mgf MX(s) of a random

    variable X is defined as MX(s) = E(e^{sX}); hence

    MX(s) = E(e^{sX})

          = ∑_x e^{sx} Pr(X = x)   (discrete)

          = ∫_{−∞}^{∞} e^{sx} fX(x) dx   (continuous).

    Examples:

    (1) Binomial distribution B(n, p)

    MX(s) = ∑_{x=0}^{n} e^{sx} (n choose x) p^x (1 − p)^{n−x}

          = ∑_{x=0}^{n} (n choose x) (pe^s)^x (1 − p)^{n−x}

          = (1 − p + pe^s)^n .

  • (2) Exponential distribution, parameter λ

    MX(s) = ∫_0^{∞} e^{sx} fX(x) dx

          = ∫_0^{∞} e^{sx} λe^{−λx} dx

          = λ/(λ − s) ,   s < λ   (exercise 82).

    Note that this integral is valid for s < λ only.

    (3) Normal distribution, N(µ, σ2)

    MX(s) = ∫_{−∞}^{∞} e^{sx} · (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} dx

          = e^{µs + σ²s²/2} .

    Evaluating this integral is tricky, and beyond the scope of this module.

    The key point is that if one calculates the mgf for some rv W , and finds it to be of the form e^{as + bs²/2}, then the distribution of W must be Normal; the mean will be a, and the variance will be b.

  • Some properties of mgfs

    (1) Generating moments

    Definition: For any rv X, E(Xr) is known as

    the rth moment of X.

    So the mean, E(X), is the first moment of

    X, E(X2) is the second moment, and so on.

    Now e^{sX} = 1 + sX + (s²/2)X² + · · · ,

    and therefore

    E(e^{sX}) = 1 + sE(X) + (s²/2)E(X²) + · · · .

    So, if one expands MX(s) in powers of s, the coefficient of s^r/r! will be E(X^r), the rth moment of X.

    In particular, the coefficient of s will be the mean, and the coefficient of s²/2 will be E(X²), from which we can calculate the variance.

  • Example: For the exponential distribution

    with parameter λ, the mgf is

    MX(s) = λ/(λ − s) = (1 − s/λ)^{−1}

          = 1 + s/λ + s²/λ² + · · ·

          = 1 + (1/λ) s + (2/λ²) (s²/2) + · · ·

    Hence E(X) = 1/λ and E(X²) = 2/λ².

    These values agree with those we found in

    §4.5; from them we can show that

    Var(X) = E(X²) − {E(X)}² = 2/λ² − (1/λ)² = 1/λ².
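    The same moments can be extracted symbolically (a SymPy sketch, taking the mgf λ/(λ − s) as its starting point):

```python
from sympy import symbols, series, factorial, simplify

s, lam = symbols('s lambda', positive=True)
mgf = lam / (lam - s)                      # mgf of the exponential distribution

expansion = series(mgf, s, 0, 4).removeO() # expand in powers of s about s = 0

# The coefficient of s**r, multiplied by r!, is the r-th moment E(X**r).
for r in (1, 2):
    moment = simplify(expansion.coeff(s, r) * factorial(r))
    print(r, moment)                       # 1/lambda, then 2/lambda**2
```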

  • (2) Sums of independent rvs:

    Let X and Y be independent rvs, and let Z = X + Y . Then

    MZ(s) = E(e^{sZ}) = E(e^{s(X+Y )}) = E(e^{sX} e^{sY })

          = E(e^{sX}) E(e^{sY })   (by independence)

          = MX(s) MY (s).

    The mgf of the sum of two independent rvs is the product of their mgfs.

    The result also holds for the sum of n independent rvs,

    MX1+X2+···+Xn(s) = MX1(s) MX2(s) · · · MXn(s).

    This is an important result. It gives us a relatively easy way to find the distribution of sample means – which are just sums of rvs divided by a constant.

  • Example 1: Binomial distribution

    Suppose that rvs X1 and X2 are known to be

    independent, and that X1 ∼ B(n1, p) and X2 ∼ B(n2, p). Find the distribution of Y = X1 + X2 .

    MY (s) = MX1(s) MX2(s)

           = (1 − p + pe^s)^{n1} (1 − p + pe^s)^{n2}

           = (1 − p + pe^s)^{n1+n2} .

    This is the mgf of the B(n1 + n2, p)

    distribution. We have therefore shown that

    X1 + X2 ∼ B(n1 + n2, p).

    Note that the result is not valid if the

    probabilities of success are different. If

    X1 ∼ B(n1, p1) and X2 ∼ B(n2, p2), with p1 ≠ p2, then the mgf of Y = X1 + X2 will not be of the binomial form.
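    A Monte Carlo sketch (the values n1 = 5, n2 = 8 and p = 0.3 are illustrative choices) compares the simulated sum X1 + X2 with a direct B(n1 + n2, p) sample:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, p, reps = 5, 8, 0.3, 200_000

total = rng.binomial(n1, p, reps) + rng.binomial(n2, p, reps)  # X1 + X2
direct = rng.binomial(n1 + n2, p, reps)                        # B(n1 + n2, p)

# The empirical pfs of the two samples should be very close.
for k in range(5):
    print(k, (total == k).mean(), (direct == k).mean())
```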

  • Example 2: Exponential distribution

    Let X1 and X2 be independent, and let them

    both have an exponential distribution with

    parameter λ.

    The distribution of X1 + X2 has mgf (λ/(λ − s))².

    This is not of the same form as the mgf of an

    exponential distribution, so the sum of

    exponential rvs does not have an exponential

    distribution.

  • Example 3: Normal distribution

    Suppose that rvs X1, X2, . . . , Xn are all

    independent, and that they are Normally

    distributed with Xi ∼ N(µi, σi²).

    Find the distribution of the sum Y = ∑_{i=1}^{n} Xi.

    Now the mgf of Xi is e^{µi s + σi² s²/2}, and the mgf of Y is the product of all the mgfs of the Xs.

    Hence MY (s) = ∏_{i=1}^{n} e^{µi s + σi² s²/2} .

    That is, MY (s) = e^{∑ (µi s + σi² s²/2)} = e^{s ∑ µi + (s²/2) ∑ σi²} ,

    where all sums are over the range i = 1, . . . , n.

    We see that the mgf of Y is of the form e^{as + bs²/2}, which is the form of the mgf of a Normal distribution. Therefore, Y is Normally distributed.

    Recall that, if X ∼ N(µ, σ²), then the mgf of X will be e^{µs + σ²s²/2}. The mgf of Y is clearly of this form, so

    Y ∼ N( ∑_{i=1}^{n} µi , ∑_{i=1}^{n} σi² ) .

    Sums of Normal rvs are themselves

    Normally distributed.

    Very few distributions have this property. The

    fact that the Normal does contributes a great

    deal to its importance.
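    A final Monte Carlo sketch (the means and variances below are illustrative choices) checks that the mean and variance of a sum of independent Normal rvs match ∑µi and ∑σi²:

```python
import numpy as np

rng = np.random.default_rng(4)
mus = np.array([1.0, -2.0, 0.5])        # illustrative means mu_i
sigmas = np.array([1.0, 0.5, 2.0])      # illustrative standard deviations sigma_i
reps = 200_000

# Each row is one draw of (X1, X2, X3); the row sum is one draw of Y.
samples = rng.normal(mus, sigmas, size=(reps, len(mus)))
y = samples.sum(axis=1)

print(y.mean(), mus.sum())              # both close to -0.5
print(y.var(), (sigmas**2).sum())       # both close to 5.25
```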

  • CHAPTER 5 SUMMARY

    Joint distributions

    • Description of the simultaneous behaviour of two or more random variables.

    • Discrete, joint pf: pX,Y (x, y); continuous, joint pdf: fX,Y (x, y)

    • joint cdf

    • marginal and conditional distributions

    • independent rvs:
      – E[g(X)h(Y )] = E[g(X)]E[h(Y )]
      – fX,Y (x, y) = fX(x)fY (y)

    • Expectation as an operator
      – expectation of a function of 2 rvs
      – linear operator: E(aX + bY + c)
      – covariance
      – mean and variance of linear transformations, especially of independent rvs
      – variance of a sum of rvs

    • sampling problems

    • sums of independent rvs: mgfs

  • COURSE SUMMARY p1/2

    [Chapters 1 and 2]

    • Experiment, event, sample space.

    • Union, intersection, complement

    • Exclusive and exhaustive events

    • Probability: axioms.

    • Interpretations of probability:

      – symmetry,

      – limiting relative frequency,

      – subjective probability

    • Deductions from axioms

    • Sampling problems, replacement

    • Conditional probability

    • Independence (pairwise, mutual)

    • Important theorems

    – law of total probability

    – Bayes’ theorem

  • COURSE SUMMARY p2/2
    [Chapters 3, 4 and 5]

    • Discrete and continuous rvs
      – discrete: pf, cdf
      – continuous: pdf, cdf

    • Expectation and variance

    • Bernoulli trials

    • Important discrete distributions
      – binomial, Poisson, geometric
      – Poisson approximation to binomial

    • Important continuous distributions
      – uniform, exponential, Normal

    • The Normal distribution
      – standardisation: N(0,1)
      – Use of tables
      – Normal approximation to the binomial and Poisson distributions

    • Joint distributions
      – Marginal and conditional distributions
      – Independent random variables
      – Use of expectation
      – Sampling and estimation
      – Sum of rvs. Moment generating fn