
  • Chapter 5.

    JOINT DISTRIBUTIONS

    5.1 Introduction

    In this chapter we look at the simultaneous (joint) behaviour of two or more rvs.

    Example: We measure height and weight of randomly selected individuals.

    [Scatter plot of height (ht) against weight (wt): the points show a positive association.]

    Clearly the two rvs representing height and weight are linked.

  • 5.2 Discrete rvs: joint probability function

    Let X have range x1, x2, . . . , xn, and let Y have range y1, y2, . . . , ym.

    DEFINITION: The joint pf of X and Y is defined as the function

    Pr(X = x ∩ Y = y) ,

    for x = x1, x2, . . . , xn , y = y1, y2, . . . , ym . It is a function of x and y.

    Notation: the joint pf is often written pX,Y (x, y).

    SIMPLE EXAMPLE

                      X
                  1     2     3     4
              1   0.05  0.05  0.05  0.05
        Y     2   0.05  0.10  0.15  0.20
              3   0.05  0.00  0.10  0.15

    Entries in the body of the table give the joint pf of X and Y .

  • Marginal Distributions

    The joint pf gives full information about the joint behaviour of X and Y . But X by itself is just a discrete random variable, so it has a pf Pr(X = x), x = 1, 2, 3, 4.

    Example

                      X
                  1     2     3     4
              1   0.05  0.05  0.05  0.05   0.20
        Y     2   0.05  0.10  0.15  0.20   0.50
              3   0.05  0.00  0.10  0.15   0.30

                  0.15  0.15  0.30  0.40

    To obtain the marginal pf of X, sum the joint pf over all values of Y , since

    Pr(X = x) = ∑_{y=1}^{m} Pr(X = x ∩ Y = y) .

    This gives the marginal distribution of X.

    The pf of Y is obtained similarly.

    Exercise: Find E(X) and E(Y ).
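    As a quick numerical check (a NumPy sketch, not part of the original notes, using the joint pf from the table above), the marginal pfs and the expectations E(X) and E(Y ) can be computed directly:

```python
import numpy as np

# Joint pf from the table: rows are Y = 1, 2, 3; columns are X = 1, 2, 3, 4.
p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])
x_vals = np.array([1, 2, 3, 4])
y_vals = np.array([1, 2, 3])

p_X = p.sum(axis=0)            # marginal pf of X: [0.15, 0.15, 0.30, 0.40]
p_Y = p.sum(axis=1)            # marginal pf of Y: [0.20, 0.50, 0.30]

print((x_vals * p_X).sum())    # E(X) = 2.95
print((y_vals * p_Y).sum())    # E(Y) = 2.1
```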

  • Conditional Distributions

    Consider now

    Pr(X = x | Y = y) = Pr(X = x ∩ Y = y) / Pr(Y = y) .

    For fixed y, this function of x gives the

    conditional distribution of X given Y = y .

    Extract from the table above:

    x                     1     2     3     4
    Pr(X = x ∩ Y = 2)     0.05  0.10  0.15  0.20
    Pr(X = x | Y = 2)     0.10  0.20  0.30  0.40

    Conditional distributions share the properties

    of probability distributions. Note in particular

    that these conditional probabilities are in the

    range [0, 1], and that they sum to 1 over the

    range of X.

    Exercise: Find E(X | Y = 2).
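    Continuing the NumPy sketch (again an illustration, not part of the notes), the conditional pf of X given Y = 2 and the conditional expectation E(X | Y = 2) follow in the same way:

```python
import numpy as np

p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])
x_vals = np.array([1, 2, 3, 4])

row_y2 = p[1]                          # Pr(X = x ∩ Y = 2) for x = 1..4
p_X_given_y2 = row_y2 / row_y2.sum()   # divide by Pr(Y = 2) = 0.50
print(p_X_given_y2)                    # [0.1, 0.2, 0.3, 0.4] -- sums to 1
print((x_vals * p_X_given_y2).sum())   # E(X | Y = 2) = 3.0
```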

  • Conditional and marginal distributions

    We have seen that

    Pr(X = x) = ∑_y Pr(X = x ∩ Y = y).

    Also, by definition,

    Pr(X = x | Y = y) = Pr(X = x ∩ Y = y) / Pr(Y = y) .

    Together, these give the useful result

    Pr(X = x) = ∑_y Pr(X = x | Y = y) Pr(Y = y) .

    These concepts extend to joint distributions:

    • of more than 2 rvs,

    • of continuous rvs.

    We consider the case of continuous rvs later:

    we will need to use integration instead of

    summation.

  • Example: Poisson and binomial

    Seeds of a particular plant species fall at

    random, so that the number Y in a particular

    area has a Poisson distribution with some

    mean µ.

    For each seed, independent of all others, the

    probability of germinating is p.

    Find the distribution of the number X of

    seeds that germinate; that is, calculate

    Pr(X = x) , for x = 0,1, . . ..

    Information provided. We are told:

    (i): Y ∼ Poisson(µ), so that

    Pr(Y = y) = e^{−µ} µ^y / y! ,   y = 0, 1, 2, . . .

    (ii): Given that Y = y, X ∼ B(y, p) ; so that

    Pr(X = x | Y = y) = (y choose x) p^x (1 − p)^{y−x} ,   x = 0, 1, . . . , y.

  • We need to calculate Pr(X = x), the

    marginal distribution of X. Noting that y ≥ x, and that, therefore,

    Pr(X = x) = ∑_{y=x}^{∞} Pr(X = x | Y = y) Pr(Y = y) ,

    we obtain

    Pr(X = x) = ∑_{y=x}^{∞} (y choose x) p^x (1 − p)^{y−x} · e^{−µ} µ^y / y!

              = (e^{−µ} p^x / x!) ∑_{y=x}^{∞} (1 − p)^{y−x} µ^y / (y − x)!

                [substitute z = y − x]

              = (e^{−µ} p^x µ^x / x!) ∑_{z=0}^{∞} (1 − p)^z µ^z / z!

              = (e^{−µ} (pµ)^x / x!) e^{(1−p)µ}  =  e^{−pµ} (pµ)^x / x! .

    Hence X ∼ Poisson(pµ) .
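    The thinning result X ∼ Poisson(pµ) can be supported by a quick Monte Carlo sketch (the values µ = 4 and p = 0.3 are illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, p, reps = 4.0, 0.3, 200_000

y = rng.poisson(mu, size=reps)        # number of seeds falling
x = rng.binomial(y, p)                # number germinating, given Y = y

# Mean and variance of X should both be close to p * mu = 1.2,
# as they would be for a Poisson(p * mu) random variable.
print(x.mean(), x.var())
print(np.exp(-p * mu))                # theoretical Pr(X = 0) = e^{-p mu}
print((x == 0).mean())                # empirical  Pr(X = 0)
```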

  • 5.3 Continuous rvs: joint pdf

    Instead of a joint probability function (pf) Pr(X = x ∩ Y = y), we obtain a joint probability density function (pdf).

    Recall: for a single random variable X, we view the pdf as giving the probability that X is close to some value x; that is,

    Pr(x < X ≤ x + g) ≈ g fX(x).

    We extend this idea here to the case of two variables X and Y . We define the joint pdf of X and Y informally as the function fX,Y (x, y) such that the probability of the event

    (x < X ≤ x + g) ∩ (y < Y ≤ y + h)  is approximately  g · h · fX,Y (x, y).

    Notes

    Note 1. Formally, the joint pdf is defined by differentiating a joint cdf Pr(X ≤ x ∩ Y ≤ y) partially with respect to x and y.

  • Note 2. Joint pdfs have behaviour analogous

    to that of joint pfs. For example, we have the

    following:

    (A). ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y (x, y) dx dy = 1 .

    (B). Marginal distributions:

    fX(x) = ∫_{−∞}^{∞} fX,Y (x, y) dy

    (C). Conditional distributions:

    fX|Y (x | y) = fX,Y (x, y) / fY (y)

    In the discrete case, we noted the result

    Pr(X = x) = ∑_y Pr(X = x | Y = y) Pr(Y = y) .

    For continuous rvs, (B) and (C) combine to

    give the equivalent result

    fX(x) = ∫_{−∞}^{∞} fX|Y (x | y) fY (y) dy .
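    As an illustration only (the joint pdf fX,Y (x, y) = x + y on the unit square is a hypothetical example, not taken from the notes), a short SciPy sketch checks numerically that integrating fX|Y (x | y) fY (y) over y recovers the marginal fX(x) = x + 1/2:

```python
from scipy.integrate import quad

def f_joint(x, y):
    # Hypothetical joint pdf on the unit square 0 <= x, y <= 1 (integrates to 1).
    return x + y

def f_Y(y):
    # Marginal pdf of Y: integrate the joint pdf over x. Equals y + 0.5.
    return quad(lambda x: f_joint(x, y), 0.0, 1.0)[0]

def f_X_via_conditional(x):
    # Integrate f_{X|Y}(x | y) * f_Y(y) over y; the f_Y factors cancel, so this
    # is the marginal f_X(x) computed the long way round.
    integrand = lambda y: (f_joint(x, y) / f_Y(y)) * f_Y(y)
    return quad(integrand, 0.0, 1.0)[0]

for x in (0.2, 0.5, 0.9):
    print(x, f_X_via_conditional(x), x + 0.5)   # the last two values agree
```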

  • 5.4 Expectation

    In Chapter 3, we used the pf to give a

    complete description of the behaviour of a

    discrete rv.

    In Chapter 4, we used the pdf to give a

    complete description of the behaviour of a

    continuous rv.

    To summarise the most important features

    of the behaviour of any rv (whether discrete

    or continuous), we used the mean and

    variance. We defined:

    mean (µ):       E(X)

    variance (σ²):  E{(X − µ)²}

    These are both expectations of functions of

    X. We can also make use of the concept of

    expectation to summarise the joint behaviour

    of two rvs X and Y .

  • We first extend the definition of expectation

    to cover a function g(X, Y ) of two rvs, not

    just a function of a single rv.

    Definition: If g(X, Y ) is a scalar function of X

    and Y , then we define

    E{g(X, Y )} = ∑_x ∑_y g(x, y) Pr(X = x ∩ Y = y)

    if X and Y are discrete, and

    E{g(X, Y )} = ∫∫ g(x, y) fX,Y (x, y) dx dy

    if X and Y are continuous.

  • Example:

                      X
                  1     2     3     4
              1   0.05  0.05  0.05  0.05   0.20
        Y     2   0.05  0.10  0.15  0.20   0.50
              3   0.05  0.00  0.10  0.15   0.30

                  0.15  0.15  0.30  0.40

    For g(X, Y ) = 2X + 5Y , we obtain:

    E(2X + 5Y ) = (7 × 0.05) + (9 × 0.05) + · · · + (14 × 0.10) + · · · + (23 × 0.15) = 16.4.

    Similarly, for g(X, Y ) = XY , we obtain:

    E(XY ) = (1 × 0.05) + (2 × 0.05) + · · · + (4 × 0.10) + · · · + (12 × 0.15) = 6.35.

    The concept of expectation is very powerful, especially when used in the context of the joint behaviour of two or more rvs.
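    These two expectations can be checked with the same kind of NumPy sketch used earlier for the marginals:

```python
import numpy as np

p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])
x = np.array([1, 2, 3, 4])       # values of X (columns)
y = np.array([1, 2, 3])          # values of Y (rows)
X, Y = np.meshgrid(x, y)         # grids of x- and y-values over the table

print((p * (2 * X + 5 * Y)).sum())   # E(2X + 5Y) = 16.4
print((p * X * Y).sum())             # E(XY)      = 6.35
```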

  • Expectation is a linear operator

    We now show that, for any function g(X) of

    a discrete rv X,

    E{ag(X) + b} = aE{g(X)}+ b .

    For brevity, we write pX(x) for Pr(X = x) .

    Now

    E{ag(X) + b} = ∑ {ag(x) + b} pX(x)

                 = a ∑ g(x) pX(x) + b ∑ pX(x)

                 = a E{g(X)} + b.

    A similar proof holds when X is continuous.

    Hence, for scalar problems, expectation acts

    as a linear operator.

    It is also a linear operator when applied to functions of several rvs.

  • For any rvs X and Y and constants a and b,

    E(aX + bY ) = aE(X) + bE(Y ).

    Proof (discrete case). We again write Pr(X = x ∩ Y = y) = pX,Y (x, y); also Pr(X = x) = pX(x), Pr(Y = y) = pY (y). Now

    E(aX + bY ) = ∑_x ∑_y (ax + by) pX,Y (x, y)

                = ∑_x ∑_y ax pX,Y (x, y) + ∑_x ∑_y by pX,Y (x, y)

                = a ∑_x x {∑_y pX,Y (x, y)} + b ∑_y y {∑_x pX,Y (x, y)}

                = a ∑_x x pX(x) + b ∑_y y pY (y)

                = a E(X) + b E(Y ).

    Note that the operator is again linear.

    We return to this topic later.

  • 5.5 Covariance

    Now reconsider the plot of height (Y ) against

    weight (X).

    [Scatter plot of height, ht (vertical axis, y), against weight, wt (horizontal axis, x): the points show a positive association.]

    When X is relatively large, so is Y . So if, for

    an individual, {X − E(X)} is positive, {Y − E(Y )} is also likely to be positive.

    Similarly, when {X − E(X)} is negative, {Y − E(Y )} is also likely to be negative.

    So the product {X − E(X)}{Y − E(Y )} is likely to be positive.

  • We can summarise the link between X and Y

    by examining the product

    {X − E(X)}{Y − E(Y )}

    In particular, we consider the expectation of

    this function.

    DEFINITION: The covariance between rvs

    X and Y , written as Cov(X, Y ) , is defined as

    Cov(X, Y ) = E [{X − E(X)}{Y − E(Y )}] .

    If Cov(X, Y ) = 0 , then X and Y are said to

    be uncorrelated.

    The correlation between X and Y is positive

    if Cov(X, Y ) > 0. We then say that they are

    positively correlated.

    They are negatively correlated if

    Cov(X, Y ) < 0.

  • Calculation of covariance

    Recall that the variance Var(X) of a rv X is

    defined as Var(X) = E[{X − E(X)}²] .

    However, we proved in §3.3 that we can write Var(X) = E(X²) − {E(X)}².

    There is, similarly, an easier way to calculate

    a covariance.

    Theorem:

    Cov(X, Y ) = E(XY )− E(X)E(Y ).

    Proof

    Cov(X, Y ) = E[{X − E(X)}{Y − E(Y )}]

               = E[XY − E(X)Y − XE(Y ) + E(X)E(Y )]

               = E(XY ) − E(X)E(Y ).

  • Example:

                      X
                  1     2     3     4
              1   0.05  0.05  0.05  0.05   0.20
        Y     2   0.05  0.10  0.15  0.20   0.50
              3   0.05  0.00  0.10  0.15   0.30

                  0.15  0.15  0.30  0.40

    We can obtain E(X) and E(Y ) easily: for

    example

    E(Y ) = (1× 0.2)+ (2× 0.5)+ (3× 0.3) = 2.1 .

    Similarly, we can show that E(X) = 2.95 .

    We showed earlier that E(XY ) = 6.35 .

    Hence Cov(X, Y ) = 6.35− 2.95× 2.1 = 0.155.

    Link between concepts of covariance and

    correlation: important in due course, but not

    covered in this module.
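    For completeness, a NumPy sketch of the covariance calculation from the table (again an illustration, not part of the notes):

```python
import numpy as np

p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])
x = np.array([1, 2, 3, 4])
y = np.array([1, 2, 3])
X, Y = np.meshgrid(x, y)

E_X = (p * X).sum()                  # 2.95
E_Y = (p * Y).sum()                  # 2.1
E_XY = (p * X * Y).sum()             # 6.35
print(E_XY - E_X * E_Y)              # Cov(X, Y) = 0.155
```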

  • 5.6 Independent Random Variables

    Recall that events A and B are independent

    if Pr(A ∩ B) = Pr(A) × Pr(B) . We extend the concept in a natural way.

    Definition: Random variables X and Y are

    statistically independent if any event relating

    to X alone is independent of any event

    relating to Y alone.

    For example, if X and Y are independent,

    Pr{(X ≤ 3)∩(Y ≥ 6)} = Pr(X ≤ 3)×Pr(Y ≥ 6).

    As a consequence, for discrete rvs

    Pr(X = x ∩ Y = y) = Pr(X = x)Pr(Y = y)

    and for continuous rvs

    fX,Y (x, y) = fX(x)fY (y).

    That is, the joint pf (discrete) or pdf

    (continuous) factorises into terms involving x

    and y separately.

  • Initial Example, revisited

    Original - NOT INDEPENDENT

                      X
                  1     2     3     4
              1   0.05  0.05  0.05  0.05   0.20
        Y     2   0.05  0.10  0.15  0.20   0.50
              3   0.05  0.00  0.10  0.15   0.30

                  0.15  0.15  0.30  0.40

    Revised - INDEPENDENT

                      X
                  1      2      3     4
              1   0.030  0.030  0.06  0.08   0.20
        Y     2   0.075  0.075  0.15  0.20   0.50
              3   0.045  0.045  0.09  0.12   0.30

                  0.15   0.15   0.30  0.40

    For example, note that, in the second table,

    Pr({X = 4} ∩ {Y = 1}) = 0.40× 0.20 = 0.08 .
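    A short NumPy sketch (illustrative, not part of the notes) confirms that every entry of the revised table equals the product of the corresponding marginal probabilities, while the original table fails this check:

```python
import numpy as np

original = np.array([[0.05, 0.05, 0.05, 0.05],
                     [0.05, 0.10, 0.15, 0.20],
                     [0.05, 0.00, 0.10, 0.15]])
revised = np.array([[0.030, 0.030, 0.06, 0.08],
                    [0.075, 0.075, 0.15, 0.20],
                    [0.045, 0.045, 0.09, 0.12]])

def is_independent(p):
    # Independence: joint pf equals the outer product of the marginal pfs.
    product = np.outer(p.sum(axis=1), p.sum(axis=0))
    return np.allclose(p, product)

print(is_independent(original))   # False
print(is_independent(revised))    # True
```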

  • Independence and covariance

    If two random variables are independent, then

    several aspects of their expectations and

    variances simplify.

    In particular, if X and Y are independent rvs,

    then E(XY ) = E(X)E(Y ) .

    Proof: (discrete case)

    E(XY ) = ∑_x ∑_y xy Pr(X = x ∩ Y = y)

           = ∑_x ∑_y xy Pr(X = x) Pr(Y = y)

           = (∑_x x Pr(X = x)) (∑_y y Pr(Y = y))

           = E(X)E(Y ).

    Hence, if X and Y are independent,

    Cov(X, Y ) = 0 . The rvs are also

    uncorrelated.

  • 5.7 Applications of Expectation

    Reminders: For a rv X with pdf fX(x), the expectation E{g(X)} of the function g(x) is defined as

    E{g(X)} = ∫_{−∞}^{∞} g(x) fX(x) dx ;   or

             = ∑_x g(x) Pr(X = x) .

    We have seen that

    • Expectation is linear, so that E{ag(X) + b} = aE{g(X)} + b and E(aX + bY ) = aE(X) + bE(Y ).

    • The covariance between X and Y can be calculated as Cov(X, Y ) = E(XY ) − E(X)E(Y ).

    • If X and Y are independent random variables, Cov(X, Y ) = 0.

    There are several natural extensions of these results.

  • The result

    E(aX + bY ) = aE(X) + bE(Y )

    extends to the weighted sum of any number

    of rvs.

    If X1, X2, . . . , Xm are rvs and a1, a2, . . . , am are constants, then

    E( ∑_{i=1}^{m} aiXi ) = ∑_{i=1}^{m} ai E(Xi).

    In the result above, the coefficients a1, a2 etc

    must be constants.

    Note that, in general, E(XY ) ≠ E(X)E(Y ) .

  • Results for variances

    Recall that the expectation of a linear function aX + b of a random variable X is given by

    E(aX + b) = aE(X) + b.

    The equivalent result for the variance of a linear function is: if X is a rv, and a and b are constants, then Var(aX + b) = a²Var(X).

    Proof: Let Z = aX + b, then

    E(Z) = aE(X) + b.

    Now, by definition, Var(Z) = E[{Z − E(Z)}²]. We also know that

    Z − E(Z) = (aX + b) − (aE(X) + b) = a{X − E(X)}.

    So Var(Z) = E[a²{X − E(X)}²] = a²E[{X − E(X)}²] = a²Var(X).

  • Variances of sums of rvs:

    For any two rvs X and Y ,

    Var(X + Y ) = Var(X)+Var(Y )+2Cov(X, Y ).

    Proof: Write µx = E(X) and µy = E(Y ).

    Var(X + Y ) = E[{(X + Y ) − (µx + µy)}²]

                = E[{(X − µx) + (Y − µy)}²]

                = E[(X − µx)² + (Y − µy)² + 2(X − µx)(Y − µy)]

                = Var(X) + Var(Y ) + 2Cov(X, Y ).

    Combining this with the previous result

    Var(aX + b) = a²Var(X)

    gives the following:

    If X and Y are two rvs and a and b are constants, then

    Var(aX + bY ) = a²Var(X) + b²Var(Y ) + 2ab Cov(X, Y ) .

  • Special case: difference between two rvs

    Substituting a = 1, b = −1, we obtain

    Var(X − Y ) = Var(X) + Var(Y )− 2Cov(X, Y )

    [Compare:

    Var(X + Y ) = Var(X) + Var(Y ) + 2Cov(X, Y ) .]

    Further Extension: For any jointly distributed

    rvs X1, X2, . . . , Xn:

    Var(X1 + X2 + · · · + Xn) = ∑_{i=1}^{n} Var(Xi) + 2 ∑_{i=1}^{n−1} ∑_{j=i+1}^{n} Cov(Xi, Xj).

    That is: the variance of the sum of rvs is the

    sum of their variances, plus twice the sum of

    all the covariances.
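    Returning to the joint pf table used throughout this chapter, a NumPy sketch verifies the identity Var(X + Y ) = Var(X) + Var(Y ) + 2Cov(X, Y ) numerically:

```python
import numpy as np

p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])
X, Y = np.meshgrid([1, 2, 3, 4], [1, 2, 3])

def E(g):
    # Expectation of a function g(X, Y) under the joint pf.
    return (p * g).sum()

var_X = E(X**2) - E(X)**2
var_Y = E(Y**2) - E(Y)**2
cov_XY = E(X * Y) - E(X) * E(Y)

lhs = E((X + Y)**2) - E(X + Y)**2          # Var(X + Y) computed directly
rhs = var_X + var_Y + 2 * cov_XY           # via the identity
print(lhs, rhs)                            # the two values agree
```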

  • Special case: Independent rvs

    For independent rvs X1, X2, . . . , Xn:

    Var( ∑_{i=1}^{n} Xi ) = ∑_{i=1}^{n} Var(Xi),

    and

    Var( ∑_{i=1}^{n} aiXi ) = ∑_{i=1}^{n} ai² Var(Xi).

    For independent rvs only:

    Variance of a sum = sum of the variances.

    For all rvs:

    Expectation of a sum = sum of the

    expectations.

  • 5.8 Applications to Sampling Problems

    In statistical work we often select a set of

    observations, a random sample, from some

    distribution.

    This is used to make inferences about the

    features of the distribution, e.g. its mean and

    variance.

    Suppose that X1, X2, . . . , Xn are independent

    and identically distributed, each with mean µ

    and variance σ2.

    Typically µ and σ are unknown, and we will

    wish to use the information in the sample to

    estimate them.

  • Estimating the mean, µ

    Suppose that just the first value, X1, is used, ignoring all the rest. It is clear that E(X1) = µ. On average, X1 is neither too high nor too low.

    How close can X1 be expected to be to the correct value µ? What is the ‘error’ in this estimate?

    On any particular occasion, we will not know the value of X1 − µ, but we can assess its likely size by finding Var(X1). (We have already denoted this by σ².)

    We will compare X1 with another estimator of µ: the sample mean

    Y = (X1 + X2 + · · · + Xn) / n .

    It is clear that Y makes better use of the data than X1 does. But can we prove that Y is a better estimator of µ than X1 is?

  • Consider the mean and variance of Y .

    E(Y ) = (1/n)µ + (1/n)µ + · · · + (1/n)µ = nµ/n = µ.

    On average, Y is neither too high nor too low. What is the ‘error’ in this estimate? For Var(Y ) we use the result

    Var( ∑ aiXi ) = ∑ ai² Var(Xi),

    valid when the rvs X1, . . . , Xn are independent. But Y is defined as

    Y = (X1 + X2 + · · · + Xn)/n , so a1 = · · · = an = 1/n.

    Hence Var(Y ) = (1/n²)σ² + (1/n²)σ² + · · · + (1/n²)σ² = nσ²/n² = σ²/n.

    But Var(X1) = σ² . So, for n > 1,

    Var(Y ) = σ²/n < Var(X1).

    On average Y will be closer to µ than X1 will: it is a better estimator of µ.
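    A Monte Carlo sketch (the values µ = 10, σ = 2 and n = 25 are illustrative choices) shows the sample mean having variance close to σ²/n, much smaller than the variance σ² of a single observation:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 2.0, 25, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
x1 = samples[:, 0]             # estimator using only the first observation
ybar = samples.mean(axis=1)    # the sample mean of each sample of size n

print(x1.var())                # close to sigma**2 = 4
print(ybar.var())              # close to sigma**2 / n = 0.16
```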

  • Notes:

    1. The sample mean is usually denoted by a bar over the symbol, e.g.,

    X̄ = (1/n)(X1 + X2 + · · · + Xn) .

    2. We show in §5.9 that linear combinations of Normal rvs are themselves Normal. If X1, X2, . . . , Xn are mutually independent, and if they are all N(µ, σ²), then

    X̄ ∼ N(µ, σ²/n).

    This is an important result, and is used very frequently in statistical practice.

    3. When estimating the mean µ from a random sample, it is not necessarily best to use X̄. Another possible estimator is the sample median M. If X1, X2, . . . , Xn are mutually independent, and if they are all N(µ, σ²), it can be shown that, approximately,

    M ∼ N( µ , (π/2) · σ²/n ).

  • Estimating the variance

    The sample mean X̄ is usually used to estimate µ. What can we use to estimate σ²?

    Since Var(X) is defined as E{(X − µ)²}, we consider the sample version of this quantity:

    (1/n) ∑_{i=1}^{n} (Xi − µ)².

    However, µ is typically unknown , so cannot

    be used. It is sensible to consider replacing it

    by X̄ . We therefore examine the properties

    of a statistic T , defined as

    T = (1/n) ∑_{i=1}^{n} (Xi − X̄)²

    as an estimator for σ2.

    Will it be too high or too low, on average?

    We need to find E(T ).

    We start by expanding the term (Xi − X̄)². This gives

    T = (1/n) ∑_{i=1}^{n} (Xi − X̄)²

      = (1/n) { ∑_{i=1}^{n} Xi² − 2X̄ ∑_{i=1}^{n} Xi + nX̄² }

      = (1/n) ∑_{i=1}^{n} Xi² − X̄².

    Since expectation is a linear operator, we

    obtain

    E(T ) = (1/n) ∑_{i=1}^{n} E(Xi²) − E(X̄²).

    For any rv (W , say),

    Var(W ) = E(W²) − {E(W )}² .   So E(W²) = Var(W ) + {E(W )}².

    We know E(Xi), Var(Xi), E(X̄) and Var(X̄), so we can now obtain E(Xi²) and E(X̄²).

  • We obtain

    E(T ) = (1/n) ∑_{i=1}^{n} [ {E(Xi)}² + Var(Xi) ] − [ {E(X̄)}² + Var(X̄) ]

          = (µ² + σ²) − (µ² + σ²/n)

          = ((n − 1)/n) σ².

    This shows that, on average, T is a little too

    small. To compensate for this, we usually use

    S² = (n/(n − 1)) T = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)²

    Since E(S²) = E( (n/(n − 1)) T ) = σ² , S² is said to be an unbiased estimator of σ².
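    A Monte Carlo sketch (the values µ = 0, σ = 2 and n = 5 are illustrative choices) shows T underestimating σ² by the factor (n − 1)/n on average, while S² is unbiased:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.0, 2.0, 5, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1, keepdims=True)

T = ((samples - xbar) ** 2).sum(axis=1) / n         # divisor n
S2 = ((samples - xbar) ** 2).sum(axis=1) / (n - 1)  # divisor n - 1

print(T.mean())    # close to (n - 1)/n * sigma**2 = 3.2
print(S2.mean())   # close to sigma**2 = 4.0
```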

  • 5.9 Sums of random variables

    If X1, X2, . . . , Xn are independent, and

    Y = X1 + X2 + · · ·+ Xn

    is their sum, we know that

    E(Y ) = ∑_{i=1}^{n} E(Xi)

    and that

    Var(Y ) = ∑_{i=1}^{n} Var(Xi).

    But what is the distribution of Y ?

    This distribution can be very complicated,

    and there are several methods which can be

    used to find it (seen in later courses).

    Here we look at the use of the

    moment generating function (mgf)

  • Moment Generating Function

    DEFINITION: The mgf MX(s) of a random

    variable X is defined as MX(s) = E(e^{sX}); hence

    MX(s) = E(e^{sX})

          = ∑_x e^{sx} Pr(X = x)   (discrete)

          = ∫_{−∞}^{∞} e^{sx} fX(x) dx   (continuous).

    Examples:

    (1) Binomial distribution B(n, p)

    MX(s) = ∑_{x=0}^{n} e^{sx} (n choose x) p^x (1 − p)^{n−x}

          = ∑_{x=0}^{n} (n choose x) (pe^s)^x (1 − p)^{n−x}

          = (1 − p + pe^s)^n .

  • (2) Exponential distribution, parameter λ

    MX(s) = ∫_0^{∞} e^{sx} fX(x) dx

          = ∫_0^{∞} e^{sx} λe^{−λx} dx

          = λ/(λ − s) ,   s < λ   (exercise 82).

    Note that this integral is valid for s < λ only.

    (3) Normal distribution, N(µ, σ2)

    MX(s) = ∫_{−∞}^{∞} e^{sx} · (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} dx

          = e^{µs + σ²s²/2} .

    Evaluating this integral is tricky, and beyond the scope of this module.

    The key point is that if one calculates the mgf for some rv W , and finds it to be of the form e^{as + bs²/2}, then the distribution of W must be Normal; the mean will be a, and the variance will be b.

  • Some properties of mgfs

    (1) Generating moments

    Definition: For any rv X, E(Xr) is known as

    the rth moment of X.

    So the mean, E(X), is the first moment of

    X, E(X2) is the second moment, and so on.

    Now e^{sX} = 1 + sX + (s²/2)X² + · · · ,

    and therefore

    E(e^{sX}) = 1 + sE(X) + (s²/2)E(X²) + · · · .

    So, if one expands MX(s) in powers of s, the coefficient of s^r/r! will be E(X^r), the rth moment of X.

    In particular, the coefficient of s will be the mean, and the coefficient of s²/2 will be E(X²), from which we can calculate the variance.

  • Example: For the exponential distribution

    with parameter λ, the mgf is

    MX(s) = λ/(λ − s) = (1 − s/λ)^{−1}

          = 1 + s/λ + s²/λ² + · · ·

          = 1 + (1/λ) s + (2/λ²) (s²/2) + · · ·

    Hence E(X) = 1/λ and E(X²) = 2/λ².

    These values agree with those we found in

    §4.5; from them we can show that

    Var(X) = E(X²) − {E(X)}² = 2/λ² − (1/λ)² = 1/λ².
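    The same moments can be extracted symbolically (a SymPy sketch, taking the mgf λ/(λ − s) as its starting point):

```python
from sympy import symbols, series, factorial, simplify

s, lam = symbols('s lambda', positive=True)
mgf = lam / (lam - s)                      # mgf of the exponential distribution

expansion = series(mgf, s, 0, 4).removeO() # expand in powers of s about s = 0

# The coefficient of s**r, multiplied by r!, is the r-th moment E(X**r).
for r in (1, 2):
    moment = simplify(expansion.coeff(s, r) * factorial(r))
    print(r, moment)                       # 1/lambda, then 2/lambda**2
```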

  • (2) Sums of independent rvs:

    Let X and Y be independent rvs, and let Z = X + Y . Then

    MZ(s) = E(e^{sZ}) = E(e^{s(X+Y )}) = E(e^{sX} e^{sY })

          = E(e^{sX}) E(e^{sY })   (by independence)

          = MX(s) MY (s).

    The mgf of the sum of two independent rvs is the product of their mgfs.

    The result also holds for the sum of n independent rvs,

    MX1+X2+···+Xn(s) = MX1(s) MX2(s) · · · MXn(s).

    This is an important result. It gives us a relatively easy way to find the distribution of sample means – which are just sums of rvs divided by a constant.

  • Example 1: Binomial distribution

    Suppose that rvs X1 and X2 are known to be

    independent, and that X1 ∼ B(n1, p) and X2 ∼ B(n2, p). Find the distribution of Y = X1 + X2 .

    MY (s) = MX1(s) MX2(s)

           = (1 − p + pe^s)^{n1} (1 − p + pe^s)^{n2}

           = (1 − p + pe^s)^{n1+n2} .

    This is the mgf of the B(n1 + n2, p)

    distribution. We have therefore shown that

    X1 + X2 ∼ B(n1 + n2, p).

    Note that the result is not valid if the

    probabilities of success are different. If

    X1 ∼ B(n1, p1) and X2 ∼ B(n2, p2), with p1 ≠ p2, then the mgf of Y = X1 + X2 will not be of the binomial form.
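    A Monte Carlo sketch (the values n1 = 5, n2 = 8 and p = 0.3 are illustrative choices) compares the simulated sum X1 + X2 with a direct B(n1 + n2, p) sample:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, p, reps = 5, 8, 0.3, 200_000

total = rng.binomial(n1, p, reps) + rng.binomial(n2, p, reps)  # X1 + X2
direct = rng.binomial(n1 + n2, p, reps)                        # B(n1 + n2, p)

# The empirical pfs of the two samples should be very close.
for k in range(5):
    print(k, (total == k).mean(), (direct == k).mean())
```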

  • Example 2: Exponential distribution

    Let X1 and X2 be independent, and let them

    both have an exponential distribution with

    parameter λ.

    The distribution of X1 + X2 has mgf (λ/(λ − s))².

    This is not of the same form as the mgf of an

    exponential distribution, so the sum of

    exponential rvs does not have an exponential

    distribution.

  • Example 3: Normal distribution

    Suppose that rvs X1, X2, . . . , Xn are all

    independent, and that they are Normally

    distributed with Xi ∼ N(µi, σi²).

    Find the distribution of the sum Y = ∑_{i=1}^{n} Xi.

    Now the mgf of Xi is e^{µi s + σi² s²/2}, and the mgf of Y is the product of all the mgfs of the Xs.

    Hence MY (s) = ∏_{i=1}^{n} e^{µi s + σi² s²/2} .

    That is, MY (s) = e^{∑ (µi s + σi² s²/2)} = e^{s ∑ µi + (s²/2) ∑ σi²} ,

    where all sums are over the range i = 1, . . . , n.

    We see that the mgf of Y is of the form e^{as + bs²/2}, which is the form of the mgf of a Normal distribution. Therefore, Y is Normally distributed.

    Recall that, if X ∼ N(µ, σ²), then the mgf of X will be e^{µs + σ²s²/2}. The mgf of Y is clearly of this form, so

    Y ∼ N( ∑_{i=1}^{n} µi , ∑_{i=1}^{n} σi² ) .

    Sums of Normal rvs are themselves

    Normally distributed.

    Very few distributions have this property. The

    fact that the Normal does contributes a great

    deal to its importance.
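    A final Monte Carlo sketch (the means and variances below are illustrative choices) checks that the mean and variance of a sum of independent Normal rvs match ∑µi and ∑σi²:

```python
import numpy as np

rng = np.random.default_rng(4)
mus = np.array([1.0, -2.0, 0.5])        # illustrative means mu_i
sigmas = np.array([1.0, 0.5, 2.0])      # illustrative standard deviations sigma_i
reps = 200_000

# Each row is one draw of (X1, X2, X3); the row sum is one draw of Y.
samples = rng.normal(mus, sigmas, size=(reps, len(mus)))
y = samples.sum(axis=1)

print(y.mean(), mus.sum())              # both close to -0.5
print(y.var(), (sigmas**2).sum())       # both close to 5.25
```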

  • CHAPTER 5 SUMMARY

    Joint distributions

    • Description of the simultaneous behaviour of two or more random variables.

    • Discrete, joint pf: pX,Y (x, y); continuous, joint pdf: fX,Y (x, y)

    • joint cdf

    • marginal and conditional distributions

    • independent rvs:
      – E[g(X)h(Y )] = E[g(X)]E[h(Y )]
      – fX,Y (x, y) = fX(x)fY (y)

    • Expectation as an operator
      – expectation of a function of 2 rvs
      – linear operator: E(aX + bY + c)
      – covariance
      – mean and variance of linear transformations, especially of independent rvs
      – variance of a sum of rvs

    • sampling problems

    • sums of independent rvs: mgfs

  • COURSE SUMMARY p1/2

    [Chapters 1 and 2]

    • Experiment, event, sample space.

    • Union, intersection, complement

    • Exclusive and exhaustive events

    • Probability: axioms.

    • Interpretations of probability:

      – symmetry,

      – limiting relative frequency,

      – subjective probability

    • Deductions from axioms

    • Sampling problems, replacement

    • Conditional probability

    • Independence (pairwise, mutual)

    • Important theorems

    – law of total probability

    – Bayes’ theorem

  • COURSE SUMMARY p2/2
    [Chapters 3, 4 and 5]

    • Discrete and continuous rvs
      – discrete: pf, cdf
      – continuous: pdf, cdf

    • Expectation and variance

    • Bernoulli trials

    • Important discrete distributions
      – binomial, Poisson, geometric
      – Poisson approximation to binomial

    • Important continuous distributions
      – uniform, exponential, Normal

    • The Normal distribution
      – standardisation: N(0,1)
      – Use of tables
      – Normal approximation to the binomial and Poisson distributions

    • Joint distributions
      – Marginal and conditional distributions
      – Independent random variables
      – Use of expectation
      – Sampling and estimation
      – Sum of rvs. Moment generating fn