chapter 8: introduction to statistical inference
DESCRIPTION
Chapter 8: Introduction to Statistical Inference. In Chapter 8:. 8.1 Concepts 8.2 Sampling Behavior of a Mean 8.3 Sampling Behavior of a Count and Proportion. § 8.1: Concepts. - PowerPoint PPT PresentationTRANSCRIPT
Apr 20, 2023
Chapter 8: Chapter 8: Introduction to Statistical Introduction to Statistical
InferenceInference
In Chapter 8:
8.1 Concepts
8.2 Sampling Behavior of a Mean
8.3 Sampling Behavior of a Count and Proportion
§8.1: ConceptsStatistical inference is the act of generalizing from a sample to a population with calculated degree of certainty.
We calculate statistics in the sample
We are curious about parameters in the population
Parameters Statistics
Source Population Sample
Calculated? No Yes
Constant? Yes No
Notation (examples) μ, σ, p psx ˆ,,
Parameters and Statistics
It is essential to draw the distinction between parameters and statistics.
§8.2 Sampling Behavior of a Mean
• How precisely does a given sample mean reflect the underlying population mean?
• To answer this question, we must establish the sampling distribution of x-bar
• The sampling distribution of x-bar is the hypothetical distribution of means from all possible samples of size n taken from the same population
Simulation Experiment
• Population: N = 10,000 with lognormal distribution (positive skew), μ = 173, and σ = 30 (Figure A, next slide)
• Take repeated SRSs, each of n = 10, from this population
• Calculate x-bar in each sample
• Plot x-bars (Figure B , next slide)
A. Population (individual values)
B. Sampling distribution of x-bars
Findings1. Distribution B is Normal
even though Distribution A is not (Central Limit Theorem)
2. Both distributions are centered on µ (“unbiasedness”)
3. The standard deviation of Distribution B is much less than the standard deviation of Distribution A (square root law)
Results from Simulation Experiment
• Finding 1 (central limit theorem) says the sampling distribution of x-bar tends toward Normality even when the population distribution is not Normal. This effect is strong in large samples.
• Finding 2 (unbiasedness) means that the expected value of x-bar is μ
• Finding 3 is related to the square root law which says:
nx
Standard Deviation (Error) of the Mean
• The standard deviation of the sampling distribution of the mean has a special name: it is called the “standard error of the mean” (SE)
• The square root law says the SE is inversely proportional to the square root of the sample size:
nSExx
Example, the Weschler Adult Intelligence Scale has σ = 15
Quadrupling the sample size cut the SE in half Square root law!
151
15
nSEx
For n = 1
5.74
15
nSEx
For n = 4
75.316
15
nSEx
For n = 16
Putting it together:
• The sampling distribution of x-bar is Normal with mean µ and standard deviation (SE) = σ / √n (when population Normal or n is large)
• These facts make inferences about µ possible• Example: Let X represent Weschler adult
intelligence scores: X ~ N(100, 15). Take an SRS of n = 10 SE = σ / √n = 15/√10 = 4.7 xbar ~ N(100, 4.7) 68% of sample mean will be in the range
µ ± SE = 100 ± 4.7 = 95.3 to 104.7
x ~ N(µ, SE)
Law of Large Numbers
• As a sample gets larger and larger, the sample mean tends to get closer and closer to the μ
• This tendency is known as the Law of Large Numbers
This figure shows results from a sampling experiment in a population with μ = 173.3
As n increased, the sample mean became a better reflection of μ = 173.3
8.3 Sampling Behavior of Counts and Proportions
• Recall (from Ch 6) that binomial random variable represents the random number of successes (X) in n independent “success/failure” trials; the probability of success for each trial is p
• Notation X~b(n,p)
• The sampling distribution X~b(10,0.2) is shown on the next slide: μ = 2 when the outcome is expressed as a count and μ = 0.2 when the outcome is expressed as a proportion.
Normal Approximation to the Binomial
• When n is large, the binomial distribution takes on Normal properties
• How large does the sample have to be to apply the Normal approximation?
• One rule says that the Normal approximation applies when npq ≥ 5
Top figure: X~b(10,0.2)npq = 10 ∙ 0.2 ∙ (1–0.2) = 1.6 (less than 5) → Normal approximation does not apply
Bottom figure: X~b(100,0.2) npq = 100 ∙ 0.2 ∙ (1−0.2) = 16 (greater than 5) → Normal approximation applies
Normal Approximation for a Binomial Count
npqnpNX ,~
When Normal approximation applies:
npqnp and
n
pqpNp
n
pqp
,~ˆ
and
Normal Approximation for a Binomial Proportion
“p-hat” is the symbol for the sample proportion
Illustrative Example: Normal Approximation to the Binomial
• Suppose the prevalence of a risk factor in a population is p = 0.2
• Take an SRS of n = 100 from this population
• A variable number of cases in a sample will follow a binomial distribution with n = 20 and p = .2
Illustrative Example, cont.
4,20~
48.2.100 and
20.2100
NX
npq
np
The Normal approximation for the binomial count is:
04.0,2.0~ˆ
04.0100
8.2.
.2
Np
n
pq
p
The Normal approximation for the binomial proportion is:
1. Statement of a problem: Suppose we see a sample with 30 cases. What is the probability of see at least 30 cases under these circumstance, i.e., Pr(X ≥ 30) = ? assuming X ~ N(20, 4)
2. Standardize: z = (30 – 20) / 4 = 2.5
3. Sketch: next slide
4. Table B: Pr(Z ≥ 2.5) = 0.0062
Illustrative Example, cont.
Illustrative Example, cont.Binomial and superimposed Normal sampling distributions for the problem.
Our Normal approximation suggests that only .0062 of samples will see at least this many cases.