
Lecture 3: The Normal Distribution and Statistical Inference

Ani [email protected]

19 April 2007

1 / 62

A Review and Some Connections

The Normal Distribution

The Central Limit Theorem

Estimates of means and proportions: uses and properties

Confidence intervals and Hypothesis tests

2 / 62

The Normal Distribution

Probability distribution for continuous data

Under certain conditions, can be used to approximate binomial probabilities

np > 5 and n(1 − p) > 5

Characterized by a symmetric bell-shaped curve (Gaussian curve)

Symmetric about its mean µ
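
The approximation condition above can be checked numerically. The following is a minimal sketch, not part of the original slides: the values of n, p, and k are made up, and the continuity correction of 0.5 is an added assumption that the lecture does not mention.

```python
from math import comb
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(S <= k) for S ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(S <= k), using a continuity correction (assumption)."""
    mean = n * p
    sd = (n * p * (1 - p)) ** 0.5
    return NormalDist(mean, sd).cdf(k + 0.5)


# Illustrative values: np = 12 and n(1 - p) = 28, so both exceed 5.
n, p, k = 40, 0.3, 10
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

Rerunning with a small n (so that np falls below 5) should show the two numbers drifting apart.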

3 / 62

Normal Distribution

Takes on values between −∞ and +∞

Mean = Median = Mode

Area under curve equals 1

Parameters

µ = mean

σ = standard deviation

4 / 62

Normal Distribution

[Figure: normal density curve over −∞ to +∞, centered at µ]

Notation for Normal random variable: X ∼ N(µ, σ²)

5 / 62

Formula: Normal Distribution

The normal probability distribution is given by:

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \cdot e^{-(x-\mu)^2 / (2\sigma^2)}, \quad -\infty < x < +\infty$

π ≈ 3.14 and e ≈ 2.72 are mathematical constants

µ and σ are the mean and standard deviation (SD) parameters of the distribution
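
As a quick check of the density formula, it can be coded directly and compared with the standard library. This is only a sketch; the function name `normal_pdf` and the values of µ and σ are illustrative choices, not taken from the slides.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sqrt(2*pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)


# Compare the hand-written formula with statistics.NormalDist at a few points.
mu, sigma = 3000.0, 1000.0  # illustrative parameter values
for x in (2000.0, 3000.0, 4500.0):
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```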

6 / 62

Standard Normal

The standard normal distribution has parameters µ = 0 and σ = 1

Its density function is written as:

$f(x) = \frac{1}{\sqrt{2\pi}} \cdot e^{-x^2/2}, \quad -\infty < x < +\infty$

We typically use the letter Z to denote a standard normal random variable (Z ∼ N(0, 1))

If X ∼ N(µ, σ²), then (X − µ)/σ ∼ N(0, 1)

7 / 62

68-95-99.7 Rule I

68% of density is within one standard deviation of the mean

8 / 62

68-95-99.7 Rule II

95% of density is within two standard deviations of the mean

9 / 62

68-95-99.7 Rule III

99.7% of density is within three standard deviations of the mean
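
The three percentages in the 68-95-99.7 rule can be reproduced from the standard normal CDF. A minimal sketch, assuming Python's `statistics.NormalDist`; the helper name `prob_within` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)  # standard normal


def prob_within(k):
    """P(-k <= Z <= k): probability of falling within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

The rule rounds these values; for example, the exact 95% cutoff is closer to 1.96 standard deviations than to 2.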

10 / 62

Different Means

[Figure: three normal densities with different means, µ1 < µ2 < µ3]

11 / 62

Different Standard Deviations

[Figure: three normal densities with different standard deviations, σ1 < σ2 < σ3]

12 / 62

Standard Normal

[Figure: standard normal density with µ = 0 and σ = 1, horizontal axis from −4 to 4]

13 / 62

Example: Birthweights I

Birthweights (in grams) of infants in a population

14 / 62

Example: Birthweights II

Continuous data

Mean = Median = Mode = 3000 = µ

Standard deviation = 1000 = σ

The area under the curve represents the probability (proportion) of infants with birthweights between certain values

15 / 62

Normal Probabilities

16 / 62

Calculating Probabilities

Equivalent to finding area under the curve

Continuous distribution, so we cannot use sums to find probabilities

Performing the integration is not necessary since tables and computers are available

17 / 62

Z Tables

18 / 62

Normal Table

19 / 62

Looking up z=2.22

20 / 62

Looking up z=-0.67
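
The table images for these two lookups did not survive in this text, so the sketch below computes the corresponding areas directly. It reports both the lower-tail and upper-tail areas, since it is an assumption which of the two the original table lists.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mean 0 and SD 1

for z in (2.22, -0.67):
    lower = Z.cdf(z)    # P(Z < z)
    upper = 1 - lower   # P(Z > z)
    print(f"z = {z:5.2f}: P(Z < z) = {lower:.4f}, P(Z > z) = {upper:.4f}")
```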

21 / 62

Example: Birthweights

22 / 62

Question I

What is the probability of an infant weighing more than 5000g?

$P(X > 5000) = P\left(\frac{X - \mu}{\sigma} > \frac{5000 - 3000}{1000}\right)$

$= P(Z > 2)$

$= 0.0228$

23 / 62

Question II

What is the probability of an infant weighing between 2500 and 4000g?

$P(2500 < X < 4000) = P\left(\frac{2500 - 3000}{1000} < \frac{X - \mu}{\sigma} < \frac{4000 - 3000}{1000}\right)$

$= P(-0.5 < Z < 1)$

$= 1 - P(Z > 1) - P(Z < -0.5)$

$= 1 - 0.1587 - 0.3085$

$= 0.5328$

24 / 62

Question III

What is the probability of an infant weighing less than 3500g?

$P(X < 3500) = P\left(\frac{X - \mu}{\sigma} < \frac{3500 - 3000}{1000}\right)$

$= P(Z < 0.5)$

$= 1 - P(Z > 0.5)$

$= 1 - 0.3085$

$= 0.6915$
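
The three birthweight questions can also be answered without a table. The sketch below assumes the stated model X ∼ N(3000, 1000²) and uses `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Birthweight model from the example: mean 3000 g, SD 1000 g.
birthweight = NormalDist(mu=3000, sigma=1000)

p_over_5000 = 1 - birthweight.cdf(5000)                      # Question I
p_between = birthweight.cdf(4000) - birthweight.cdf(2500)    # Question II
p_under_3500 = birthweight.cdf(3500)                         # Question III

print(f"P(X > 5000)        = {p_over_5000:.4f}")
print(f"P(2500 < X < 4000) = {p_between:.4f}")
print(f"P(X < 3500)        = {p_under_3500:.4f}")
```

Working on the original scale gives the same answers as standardizing to Z first, because `NormalDist` performs the standardization internally.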

25 / 62

Statistical Inference

Populations and samples

Sampling distributions

26 / 62

Definitions

Statistical inference is “the attempt to reach a conclusion concerning all members of a class from observations of only some of them.” (Runes 1959)

A population is a collection of observations

A parameter is a numerical descriptor of a population

A sample is a part or subset of a population

A statistic is a numerical descriptor of the sample

27 / 62

Population

Population size = N

µ = mean, a measure of center

σ² = variance, a measure of dispersion

σ = standard deviation

28 / 62

Sample Estimates

Sample size = n

X̄ = sample mean

s² = sample variance

s = sample standard deviation

Population: parameters

Sample: statistics

29 / 62

Estimating µ

Usually µ is unknown and we would like to estimate it

We use X̄ to estimate µ

We know the sampling distribution of X̄

30 / 62

Sampling Distribution

The distribution of all possible values of some statistic, computed from samples of the same size randomly drawn from the same population, is called the sampling distribution of that statistic

31 / 62

Sampling Distribution of X̄

When sampling from a normally distributed population

X̄ will be normally distributed

The mean of the distribution of X̄ is equal to the true mean µ of the population from which the samples were drawn

The variance of the distribution is σ²/n, where σ² is the variance of the population and n is the sample size

We can write: X̄ ∼ N(µ, σ²/n)

When sampling is from a population whose distribution is not normal and the sample size is large, use the Central Limit Theorem

32 / 62

The Central Limit Theorem (CLT)

Given a population of any distribution with mean µ and variance σ², the sampling distribution of X̄, computed from samples of size n from this population, will be approximately N(µ, σ²/n) when the sample size is large

As a rule of thumb, this applies when n ≥ 25

The approximation of normality becomes better as n increases
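
A small simulation can illustrate the theorem. The sketch below is an illustration under stated assumptions: the population is chosen to be Exponential(1), which is skewed and has µ = 1 and σ² = 1, and the sample size, repetition count, and seed are arbitrary.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the sketch is reproducible


def sample_means(n, reps):
    """Means of `reps` independent samples of size n from an Exponential(1) population."""
    return [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


n, reps = 25, 20_000
means = sample_means(n, reps)

# The CLT predicts a mean near mu = 1 and a variance near sigma^2 / n = 1 / 25.
print(f"mean of sample means     = {fmean(means):.4f}")
print(f"variance of sample means = {pvariance(means):.4f} (theory: {1 / n:.4f})")
```

Plotting a histogram of `means` for several values of n would show the shape becoming more bell-like as n grows.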

33 / 62

What about for Binomial RVs? I

First, recall that a Binomial variable is just the sum of n Bernoulli variables: $S_n = \sum_{i=1}^{n} X_i$

Notation:

Sn ∼ Binomial(n, p)

Xi ∼ Bernoulli(p) = Binomial(1, p) for i = 1, . . . , n

34 / 62

What about for Binomial RVs? II

In this case, we want to estimate p by p̂ where

$\hat{p} = \frac{S_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} = \bar{X}$

p̂ is just a sample mean!

So we can use the central limit theorem when n is large

35 / 62

Binomial CLT

For a Bernoulli variable

µ = mean = p

σ² = variance = p(1 − p)

X̄ ≈ N(µ, σ²/n) as before

Equivalently, p̂ ≈ N(p, p(1 − p)/n)
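
To make the approximation concrete, the sketch below builds the approximate sampling distribution of p̂ for made-up values p = 0.3 and n = 100. The helper name `phat_sampling_distribution` is hypothetical.

```python
from statistics import NormalDist


def phat_sampling_distribution(p, n):
    """Approximate sampling distribution of p-hat: N(p, p(1 - p)/n)."""
    return NormalDist(mu=p, sigma=(p * (1 - p) / n) ** 0.5)


dist = phat_sampling_distribution(p=0.3, n=100)  # illustrative values
print(f"standard error of p-hat = {dist.stdev:.4f}")
print(f"P(0.25 < p-hat < 0.35) is roughly {dist.cdf(0.35) - dist.cdf(0.25):.4f}")
```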

36 / 62

Notation I

Often we are interested in detecting a difference between two populations

Differences in average income by neighborhood

Differences in disease cure rates by age

37 / 62

Notation II

Population 1:

Size = N1

Mean = µ1

Standard deviation = σ1

Population 2:

Size = N2

Mean = µ2

Standard deviation = σ2

Samples of size n1 from Population 1:

Mean = $\mu_{\bar{X}_1} = \mu_1$

Standard deviation = $\sigma_1/\sqrt{n_1} = \sigma_{\bar{X}_1}$

Samples of size n2 from Population 2:

Mean = $\mu_{\bar{X}_2} = \mu_2$

Standard deviation = $\sigma_2/\sqrt{n_2} = \sigma_{\bar{X}_2}$

38 / 62

Notation III

Now by CLT, for large n:

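$\bar{X}_1 \sim N(\mu_1, \sigma_1^2/n_1)$

$\bar{X}_2 \sim N(\mu_2, \sigma_2^2/n_2)$

and $\bar{X}_1 - \bar{X}_2 \approx N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$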
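
The variance of the difference can be checked by simulation. This sketch assumes the two samples are drawn independently, which the variance formula requires, and uses made-up normal populations; all parameter values and the seed are illustrative.

```python
import random
from statistics import fmean, pvariance

random.seed(1)  # fixed seed so the sketch is reproducible

n1, n2 = 30, 50
mu1, sigma1 = 10.0, 2.0  # population 1 (illustrative)
mu2, sigma2 = 8.0, 3.0   # population 2 (illustrative)


def mean_difference():
    """One realization of X-bar_1 minus X-bar_2 from two independent samples."""
    xbar1 = fmean(random.gauss(mu1, sigma1) for _ in range(n1))
    xbar2 = fmean(random.gauss(mu2, sigma2) for _ in range(n2))
    return xbar1 - xbar2


diffs = [mean_difference() for _ in range(10_000)]
theoretical_var = sigma1**2 / n1 + sigma2**2 / n2

print(f"mean of differences     = {fmean(diffs):.4f} (theory: {mu1 - mu2:.4f})")
print(f"variance of differences = {pvariance(diffs):.4f} (theory: {theoretical_var:.4f})")
```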

39 / 62

Difference in proportions?

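We’re done if the underlying variable is continuous. What if the underlying variable is Binomial?

Then $\bar{X}_1 - \bar{X}_2 \approx N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$ is replaced by:

$\hat{p}_1 - \hat{p}_2 \approx N\left(p_1 - p_2, \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right)$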

40 / 62

Sampling Distributions

Mean and variance of the sampling distribution of each statistic (here q = 1 − p):

Statistic      Mean       Variance
X̄              µ          σ²/n
X̄1 − X̄2        µ1 − µ2    σ1²/n1 + σ2²/n2
p̂              p          pq/n
np̂             np         npq
p̂1 − p̂2        p1 − p2    p1q1/n1 + p2q2/n2

41 / 62

Statistical inference

Two methods

Estimation

Hypothesis testing

Both make use of sampling distributions

Remember to use CLT

42 / 62

Estimation

Point estimation

An estimator of a population parameter: a statistic (e.g. x̄ , p̂)

An estimate of a population parameter: the value of the estimator for a particular sample

Interval estimation

A point estimate plus an interval that expresses the uncertainty or variability associated with the estimate

43 / 62

Hypothesis Testing

Given the observed data, do we reject or accept a pre-specified null hypothesis in favor of an alternative?

“Significance testing”

44 / 62

Point Estimation

X̄ is a point estimator of µ

X̄1 − X̄2 is a point estimator of µ1 − µ2

p̂ is a point estimator of p

p̂1 − p̂2 is a point estimator of p1 − p2

We know the sampling distribution of these statistics, e.g.

$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}} = \sigma/\sqrt{n}\right)$

If σ is not known, we can use s, the sample standard deviation, as a point estimator of σ

45 / 62

Interval Estimation

100(1− α)% Confidence interval:

estimate ± (tabled value of z or t) · (standard error)

Plugging in the values, we get

X̄ ± zα/2 × σX̄ = [L,U]

46 / 62

Confidence Interval

We are saying that

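$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$

$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \le z_{\alpha/2}\right) = 1 - \alpha$

$P(-z_{\alpha/2} \cdot \sigma_{\bar{X}} \le \bar{X} - \mu \le z_{\alpha/2} \cdot \sigma_{\bar{X}}) = 1 - \alpha$

After some algebra:

$P(\bar{X} - z_{\alpha/2} \cdot \sigma_{\bar{X}} \le \mu \le \bar{X} + z_{\alpha/2} \cdot \sigma_{\bar{X}}) = 1 - \alpha$

$P(L \le \mu \le U) = 1 - \alpha$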

47 / 62

CI for mean

A confidence interval for µ is given by the interval estimate

X̄ ± z(α/2) · σX̄

when the population variance σ² is known
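
The known-variance interval can be sketched in a few lines. The function name `z_confidence_interval` and the nine data values are made up for illustration; σ = 1000 is borrowed from the birthweight example and is treated as known.

```python
from statistics import NormalDist, fmean


def z_confidence_interval(sample, sigma, alpha=0.05):
    """100(1 - alpha)% interval for mu when the population SD sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    xbar = fmean(sample)
    se = sigma / len(sample) ** 0.5          # standard error of the sample mean
    return xbar - z * se, xbar + z * se


# Hypothetical birthweights in grams.
weights = [2900, 3100, 3400, 2600, 3000, 3300, 2800, 3150, 2950]
low, high = z_confidence_interval(weights, sigma=1000)
print(f"95% CI for mu: [{low:.1f}, {high:.1f}]")
```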

48 / 62

Interpretation

Before the data are observed, the probability is at least (1 − α) that [L, U] will contain µ, the population parameter

In repeated sampling from a normally distributed population, 100(1 − α)% of all intervals of the form above will include the population mean µ

After the data are observed, the constructed interval [L, U] either contains the true mean or it does not (no probability involved anymore)
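
The repeated-sampling interpretation can be illustrated with a simulation. This is a sketch with assumed values: the population is N(3000, 1000²) as in the birthweight example, and the sample size, repetition count, and seed are arbitrary.

```python
import random
from statistics import NormalDist, fmean

random.seed(2)  # fixed seed so the sketch is reproducible

mu, sigma, n, alpha = 3000.0, 1000.0, 20, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z * sigma / n**0.5  # sigma is treated as known


def interval_covers_mu():
    """Draw one sample, build the interval, and report whether it contains mu."""
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    return xbar - half_width <= mu <= xbar + half_width


reps = 10_000
coverage = sum(interval_covers_mu() for _ in range(reps)) / reps
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

Each individual interval either contains µ or does not; only the long-run fraction is close to 1 − α.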

49 / 62

Known Variance

Sampling from a normally distributed population with known variance (σ² known)

Confidence interval: X̄ ± z(α/2) · σX̄

What if σ² is unknown?

50 / 62

The t-distribution

[Figure: t densities for df = 2, df = 5, and df = 20]

$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

51 / 62

Use Sample Variance I

Sampling from a normally distributed population with population variance unknown

We can make use of the sample variance s²

Now we construct the confidence interval as:

X̄ ± z(α/2) · sX̄ when n is “large”

X̄ ± t(α/2,n−1) · sX̄ when n is “small”

52 / 62

Use Sample Variance II

Estimate σ² with s²

Here, sX̄ = s/√n

and tα/2 has n-1 degrees of freedom

When s replaces σ, the distribution of the standardized statistic (X̄ − µ)/sX̄ is not quite normal, so we need the t-distribution

53 / 62

Properties of the t-distribution

mean = median = mode = 0

Symmetric about the mean

t ranges from −∞ to +∞

Family of distributions determined by n − 1, the degrees of freedom

The t distribution approaches the normal distribution as n − 1 approaches ∞
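
The convergence to the normal distribution can be seen by evaluating the t density for increasing degrees of freedom. The density formula used below is the standard textbook one and does not appear in these slides; the function name `t_pdf` is hypothetical.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(t, df):
    """Student t density with df degrees of freedom (standard formula, computed on the log scale)."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + t * t / df))


# Density at the center (t = 0) and in the tail (t = 3) for several df.
for df in (2, 5, 20, 1000):
    print(f"df = {df:4d}: f(0) = {t_pdf(0.0, df):.4f}, f(3) = {t_pdf(3.0, df):.4f}")
print(f"normal   : f(0) = {NormalDist().pdf(0.0):.4f}, f(3) = {NormalDist().pdf(3.0):.4f}")
```

For small df the t density is lower at the center and heavier in the tails than the standard normal.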

54 / 62

Comparing t with normal

[Figure: standard normal density compared with the t density for df = 2]

55 / 62

Confidence intervals for means

Population            Sample   Population             95% Confidence
Distribution          Size     Variance               Interval

Normal                Any      σ² known               X̄ ± 1.96 σ/√n
Normal                Any      σ² unknown, use s²     X̄ ± t(0.025, n−1) s/√n
Not Normal/Unknown    Large    σ² known               X̄ ± 1.96 σ/√n
Not Normal/Unknown    Large    σ² unknown, use s²     X̄ ± 1.96 s/√n
Not Normal/Unknown    Small    Any                    Non-parametric methods
Binomial              Large    -                      p̂ ± 1.96 √(p̂(1 − p̂)/n)
Binomial              Small    -                      Exact methods
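
As an example of one row of the table, the large-sample interval for a binomial proportion might be computed as follows. The function name `proportion_ci` and the counts are made up for illustration.

```python
from statistics import NormalDist


def proportion_ci(successes, n, alpha=0.05):
    """Large-sample interval p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% interval
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - z * se, p_hat + z * se


low, high = proportion_ci(successes=45, n=150)  # hypothetical data, p-hat = 0.30
print(f"95% CI for p: [{low:.4f}, {high:.4f}]")
```

For small samples the table points to exact methods instead, which this sketch does not cover.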

56 / 62

Confidence Intervals for Differences in Means

This is a bit tricky

Recall that formulas for CIs for a single mean depend on

whether or not σ² is known

the sample size

For a difference in means, the formula for a CI depends on

whether or not the variances are assumed to be equal when the variances are unknown

sample sizes in each group

57 / 62

Equal Variances I

When variances are assumed to be equal:

The standard error of the difference is estimated by: $\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$

Here, $s_p^2$ is the pooled variance

58 / 62

Equal Variances II

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

where df = n1 + n2 − 2

Recall, n1 is the size of sample 1, and n2 is the size of sample 2
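
The pooled variance and the resulting standard error can be sketched as below. The function names and the two small data sets are hypothetical; `statistics.variance` is the sample variance with an n − 1 denominator.

```python
from statistics import variance


def pooled_variance(sample1, sample2):
    """s_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def pooled_standard_error(sample1, sample2):
    """Estimated SE of the difference in means under the equal-variance assumption."""
    sp_sq = pooled_variance(sample1, sample2)
    return (sp_sq / len(sample1) + sp_sq / len(sample2)) ** 0.5


group1 = [1, 2, 3, 4, 5]  # made-up data
group2 = [2, 4, 6, 8]     # made-up data
print(f"pooled variance = {pooled_variance(group1, group2):.4f}")
print(f"standard error  = {pooled_standard_error(group1, group2):.4f}")
```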

59 / 62

Unequal Variances

When variances are assumed to be unequal:

The standard error of the difference is estimated by: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Here, df = ν and

$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
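
The degrees of freedom ν for the unequal-variance case, often called the Welch-Satterthwaite approximation (a name not used in the slides), can be computed as in the sketch below. The function name `welch_df` and the data are made up.

```python
from statistics import variance


def welch_df(sample1, sample2):
    """Approximate df: (v1 + v2)^2 / (v1^2/(n1 - 1) + v2^2/(n2 - 1)), with v_i = s_i^2 / n_i."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


group1 = [1, 2, 3, 4, 5]  # made-up data
group2 = [2, 4, 6, 8]     # made-up data
nu = welch_df(group1, group2)
print(f"nu = {nu:.4f}")
```

The result is generally not an integer, and it lies between the smaller of n1 − 1 and n2 − 1 and the pooled value n1 + n2 − 2.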

60 / 62

Confidence intervals for difference of means

Population            Sample   Population            95% Confidence
Distribution          Size     Variances             Interval

Normal                Any      known                 (X̄1 − X̄2) ± 1.96 √(σ1²/n1 + σ2²/n2)
Normal                Any      unknown, σ1² = σ2²    (X̄1 − X̄2) ± t(0.025, n1+n2−2) √(sp²/n1 + sp²/n2)
Normal                Any      unknown, σ1² ≠ σ2²    (X̄1 − X̄2) ± t(0.025, ν) √(s1²/n1 + s2²/n2)
Not Normal/Unknown    Large    known                 (X̄1 − X̄2) ± 1.96 √(σ1²/n1 + σ2²/n2)
Not Normal/Unknown    Large    unknown, σ1² = σ2²    (X̄1 − X̄2) ± 1.96 √(sp²/n1 + sp²/n2)
Not Normal/Unknown    Large    unknown, σ1² ≠ σ2²    (X̄1 − X̄2) ± 1.96 √(s1²/n1 + s2²/n2)
Not Normal/Unknown    Small    Any                   Non-parametric methods

61 / 62

Confidence intervals for difference of proportions

Population      Sample   95% Confidence
Distribution    Size     Interval

Binomial        Large    (p̂1 − p̂2) ± 1.96 √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
Binomial        Small    Exact methods
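
The large-sample interval for a difference in proportions could be computed along these lines. The function name `diff_proportions_ci` and the counts are hypothetical, and the sketch assumes two independent samples.

```python
from statistics import NormalDist


def diff_proportions_ci(x1, n1, x2, n2, alpha=0.05):
    """Large-sample interval for p1 - p2 from success counts x1, x2 and sample sizes n1, n2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% interval
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    diff = p1 - p2
    return diff - z * se, diff + z * se


low, high = diff_proportions_ci(x1=60, n1=100, x2=45, n2=100)  # made-up counts
print(f"95% CI for p1 - p2: [{low:.4f}, {high:.4f}]")
```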

62 / 62
