chapter 8: introduction to statistical inference

Apr 20, 2023

Chapter 8: Chapter 8: Introduction to Statistical Introduction to Statistical

InferenceInference

In Chapter 8:

8.1 Concepts

8.2 Sampling Behavior of a Mean

8.3 Sampling Behavior of a Count and Proportion

§8.1: ConceptsStatistical inference is the act of generalizing from a sample to a population with calculated degree of certainty.

We calculate statistics in the sample

We are curious about parameters in the population

Parameters Statistics

Source Population Sample

Calculated? No Yes

Constant? Yes No

Notation (examples) μ, σ, p psx ˆ,,

Parameters and Statistics

It is essential to draw the distinction between parameters and statistics.

§8.2 Sampling Behavior of a Mean

• How precisely does a given sample mean reflect the underlying population mean?

• To answer this question, we must establish the sampling distribution of x-bar

• The sampling distribution of x-bar is the hypothetical distribution of means from all possible samples of size n taken from the same population

Simulation Experiment

• Population: N = 10,000 with lognormal distribution (positive skew), μ = 173, and σ = 30 (Figure A, next slide)

• Take repeated SRSs, each of n = 10, from this population

• Calculate x-bar in each sample

• Plot x-bars (Figure B , next slide)

A. Population (individual values)

B. Sampling distribution of x-bars

Findings1. Distribution B is Normal

even though Distribution A is not (Central Limit Theorem)

2. Both distributions are centered on µ (“unbiasedness”)

3. The standard deviation of Distribution B is much less than the standard deviation of Distribution A (square root law)

Results from Simulation Experiment

• Finding 1 (central limit theorem) says the sampling distribution of x-bar tends toward Normality even when the population distribution is not Normal. This effect is strong in large samples.

• Finding 2 (unbiasedness) means that the expected value of x-bar is μ

• Finding 3 is related to the square root law which says:

nx

Standard Deviation (Error) of the Mean

• The standard deviation of the sampling distribution of the mean has a special name: it is called the “standard error of the mean” (SE)

• The square root law says the SE is inversely proportional to the square root of the sample size:

nSExx

Example, the Weschler Adult Intelligence Scale has σ = 15

Quadrupling the sample size cut the SE in half Square root law!

151

15

nSEx

For n = 1

5.74

15

nSEx

For n = 4

75.316

15

nSEx

For n = 16

Putting it together:

• The sampling distribution of x-bar is Normal with mean µ and standard deviation (SE) = σ / √n (when population Normal or n is large)

• These facts make inferences about µ possible• Example: Let X represent Weschler adult

intelligence scores: X ~ N(100, 15). Take an SRS of n = 10 SE = σ / √n = 15/√10 = 4.7 xbar ~ N(100, 4.7) 68% of sample mean will be in the range

µ ± SE = 100 ± 4.7 = 95.3 to 104.7

x ~ N(µ, SE)

Law of Large Numbers

• As a sample gets larger and larger, the sample mean tends to get closer and closer to the μ

• This tendency is known as the Law of Large Numbers

This figure shows results from a sampling experiment in a population with μ = 173.3

As n increased, the sample mean became a better reflection of μ = 173.3

8.3 Sampling Behavior of Counts and Proportions

• Recall (from Ch 6) that binomial random variable represents the random number of successes (X) in n independent “success/failure” trials; the probability of success for each trial is p

• Notation X~b(n,p)

• The sampling distribution X~b(10,0.2) is shown on the next slide: μ = 2 when the outcome is expressed as a count and μ = 0.2 when the outcome is expressed as a proportion.

Normal Approximation to the Binomial

• When n is large, the binomial distribution takes on Normal properties

• How large does the sample have to be to apply the Normal approximation?

• One rule says that the Normal approximation applies when npq ≥ 5

Top figure: X~b(10,0.2)npq = 10 ∙ 0.2 ∙ (1–0.2) = 1.6 (less than 5) → Normal approximation does not apply

Bottom figure: X~b(100,0.2) npq = 100 ∙ 0.2 ∙ (1−0.2) = 16 (greater than 5) → Normal approximation applies

Normal Approximation for a Binomial Count

npqnpNX ,~

When Normal approximation applies:

npqnp and

n

pqpNp

n

pqp

,~ˆ

and

Normal Approximation for a Binomial Proportion

“p-hat” is the symbol for the sample proportion

Illustrative Example: Normal Approximation to the Binomial

• Suppose the prevalence of a risk factor in a population is p = 0.2

• Take an SRS of n = 100 from this population

• A variable number of cases in a sample will follow a binomial distribution with n = 20 and p = .2

Illustrative Example, cont.

4,20~

48.2.100 and

20.2100

NX

npq

np

The Normal approximation for the binomial count is:

04.0,2.0~ˆ

04.0100

8.2.

.2

Np

n

pq

p

The Normal approximation for the binomial proportion is:

1. Statement of a problem: Suppose we see a sample with 30 cases. What is the probability of see at least 30 cases under these circumstance, i.e., Pr(X ≥ 30) = ? assuming X ~ N(20, 4)

2. Standardize: z = (30 – 20) / 4 = 2.5

3. Sketch: next slide

4. Table B: Pr(Z ≥ 2.5) = 0.0062

Illustrative Example, cont.

Illustrative Example, cont.Binomial and superimposed Normal sampling distributions for the problem.

Our Normal approximation suggests that only .0062 of samples will see at least this many cases.

chapter 8: introduction to statistical inference

Documents

pthe sampling distribution

population distribution

binomial distribution

hypothetical distribution

pnotation x

populationcalculate

given sample mean

binomialwhen n