Lecture 3: The Normal Distribution and Statistical Inference
Source: people.virginia.edu/~am3xa/BiostatII/slides/lecture3.pdf
19 April 2007
1 / 62
A Review and Some Connections
The Normal Distribution
The Central Limit Theorem
Estimates of means and proportions: uses and properties
Confidence intervals and Hypothesis tests
2 / 62
The Normal Distribution
Probability distribution for continuous data
Under certain conditions, can be used to approximate binomial probabilities
np > 5 and n(1 − p) > 5
Characterized by a symmetric bell-shaped curve (Gaussian curve)
Symmetric about its mean µ
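As a rough illustration of the binomial approximation mentioned above, the sketch below checks the np > 5 and n(1 − p) > 5 condition and compares an exact binomial probability with its normal approximation. The function names and the values n = 100, p = 0.3, k = 35 are illustrative assumptions, not taken from the lecture, and no continuity correction is applied.

```python
from math import comb, sqrt
from statistics import NormalDist


def rule_of_thumb_ok(n: int, p: float) -> bool:
    """Check the slide's condition: np > 5 and n(1 - p) > 5."""
    return n * p > 5 and n * (1 - p) > 5


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(S <= k) for S ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Approximate P(S <= k) by a normal with mean np and variance np(1 - p)."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k)


# Illustrative values (assumed, not from the lecture)
n, p, k = 100, 0.3, 35
print("condition met:", rule_of_thumb_ok(n, p))
print("exact:", binomial_cdf(k, n, p))
print("normal approximation:", normal_approx_cdf(k, n, p))
```

When the condition fails (for example n = 10, p = 0.1), the approximation should not be relied on.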
3 / 62
Normal Distribution
Takes on values between −∞ and +∞
Mean = Median = Mode
Area under curve equals 1
Parameters
µ = mean
σ = standard deviation
4 / 62
Normal Distribution
[Figure: normal density curve centered at µ, horizontal axis from −∞ to +∞]
Notation for Normal random variable: X ∼ N(µ, σ²)
5 / 62
Formula: Normal Distribution
The normal probability distribution is given by:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \cdot e^{-(x-\mu)^2 / (2\sigma^2)}, \quad -\infty < x < +\infty$
π ≈ 3.14 and e ≈ 2.72 are mathematical constants
µ,σ are mean and SD parameters of the distribution
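The density formula can be transcribed directly into code. The sketch below is one possible transcription; the name `normal_pdf` and the evaluation points are hypothetical, and the result is compared against the standard library's `statistics.NormalDist.pdf` as a sanity check.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) * sigma)."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)


# Compare the hand-written formula with the library implementation
# at a few arbitrary points (values chosen only for illustration).
for x, mu, sigma in [(0.0, 0.0, 1.0), (1.5, 2.0, 3.0), (-4.0, 1.0, 0.5)]:
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```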
6 / 62
Standard Normal
The standard normal distribution has parameters µ = 0 and σ = 1
Its density function is written as:
$f(x) = \frac{1}{\sqrt{2\pi}} \cdot e^{-x^2/2}, \quad -\infty < x < +\infty$
We typically use the letter Z to denote a standard normal random variable (Z ∼ N(0, 1))
If X ∼ N(µ, σ²), then (X − µ)/σ ∼ N(0, 1)
7 / 62
68-95-99.7 Rule I
68% of density is within one standard deviation of the mean
8 / 62
68-95-99.7 Rule II
95% of density is within two standard deviations of the mean
9 / 62
68-95-99.7 Rule III
99.7% of density is within three standard deviations of the mean
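The 68-95-99.7 rule can be checked numerically. This short sketch uses `statistics.NormalDist` from the Python standard library; the helper name `within` is an illustrative choice.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1


def within(k: float) -> float:
    """P(-k < Z < k): probability of falling within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The rule rounds these values; the exact 95% cutoff is about 1.96 standard deviations rather than 2.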
10 / 62
Different Means
[Figure: normal density curves centered at µ1, µ2, and µ3]
Three normal distributions with different means µ1 < µ2 < µ3
11 / 62
Different Standard Deviations
[Figure: normal density curves with standard deviations σ1, σ2, and σ3]
Three normal distributions with different standard deviations σ1 < σ2 < σ3
12 / 62
Standard Normal
[Figure: standard normal density with µ = 0 and σ = 1, horizontal axis from −4 to 4]
13 / 62
Example: Birthweights I
Birthweights (in grams) of infants in a population
14 / 62
Example: Birthweights II
Continuous data
Mean = Median = Mode = 3000 = µ
Standard deviation = 1000 = σ
The area under the curve represents the probability (proportion) of infants with birthweights between certain values
15 / 62
Normal Probabilities
16 / 62
Calculating Probabilities
Equivalent to finding area under the curve
Continuous distribution, so we cannot use sums to find probabilities
Performing the integration is not necessary since tables and computers are available
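As one example of the "computers" option, the standard normal cumulative probabilities that a Z table lists can be obtained from `statistics.NormalDist`. The helper names below are hypothetical; z = 2.22 and z = −0.67 are the values used in the table lookup slides.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def lower_tail(z: float) -> float:
    """P(Z < z), the area to the left of z."""
    return Z.cdf(z)


def upper_tail(z: float) -> float:
    """P(Z > z), the area to the right of z."""
    return 1 - Z.cdf(z)


for z in (2.22, -0.67):
    print(f"z = {z}: lower tail {lower_tail(z):.4f}, upper tail {upper_tail(z):.4f}")
```

Whether a printed table reports the lower or the upper tail varies, so both are shown here.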
17 / 62
Z Tables
18 / 62
Normal Table
19 / 62
Looking up z=2.22
20 / 62
Looking up z=-0.67
21 / 62
Example: Birthweights
22 / 62
Question I
What is the probability of an infant weighing more than 5000g?
$P(X > 5000) = P\left(\frac{X-\mu}{\sigma} > \frac{5000-3000}{1000}\right)$
$= P(Z > 2)$
$= 0.0228$
23 / 62
Question II
What is the probability of an infant weighing between 2500 and 4000g?
$P(2500 < X < 4000) = P\left(\frac{2500-3000}{1000} < \frac{X-\mu}{\sigma} < \frac{4000-3000}{1000}\right)$
$= P(-0.5 < Z < 1)$
$= 1 - P(Z > 1) - P(Z < -0.5)$
$= 1 - 0.1587 - 0.3085$
$= 0.5328$
24 / 62
Question III
What is the probability of an infant weighing less than 3500g?
$P(X < 3500) = P\left(\frac{X-\mu}{\sigma} < \frac{3500-3000}{1000}\right)$
$= P(Z < 0.5)$
$= 1 - P(Z > 0.5)$
$= 1 - 0.3085$
$= 0.6915$
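The three birthweight questions can be cross-checked without a table. The sketch below assumes, as in the example, that birthweights follow N(3000, 1000²); the variable names are illustrative.

```python
from statistics import NormalDist

# Birthweight model from the example: mean 3000 g, standard deviation 1000 g
birthweight = NormalDist(mu=3000, sigma=1000)

# Question I: P(X > 5000)
p_over_5000 = 1 - birthweight.cdf(5000)

# Question II: P(2500 < X < 4000)
p_between = birthweight.cdf(4000) - birthweight.cdf(2500)

# Question III: P(X < 3500)
p_under_3500 = birthweight.cdf(3500)

print(round(p_over_5000, 4), round(p_between, 4), round(p_under_3500, 4))
```

Passing µ and σ to `NormalDist` performs the standardization (X − µ)/σ implicitly, so the results should agree with the table-based answers up to rounding.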
25 / 62
Statistical Inference
Populations and samples
Sampling distributions
26 / 62
Definitions
Statistical inference is “the attempt to reach a conclusion concerning all members of a class from observations of only some of them.” (Runes 1959)
A population is a collection of observations
A parameter is a numerical descriptor of a population
A sample is a part or subset of a population
A statistic is a numerical descriptor of the sample
27 / 62
Population
Population size = N
µ = mean, a measure of center
σ² = variance, a measure of dispersion
σ = standard deviation
28 / 62
Sample Estimates
Sample size = n
X̄ = sample mean
s² = sample variance
s = sample standard deviation
Population: parameters
Sample: statistics
29 / 62
Estimating µ
Usually µ is unknown and we would like to estimate it
We use X̄ to estimate µ
We know the sampling distribution of X̄
30 / 62
Sampling Distribution
The distribution of all possible values of some statistic, computed from samples of the same size randomly drawn from the same population, is called the sampling distribution of that statistic
31 / 62
Sampling Distribution of X̄
When sampling from a normally distributed population
X̄ will be normally distributed
The mean of the distribution of X̄ is equal to the true mean µ of the population from which the samples were drawn
The variance of the distribution is σ²/n, where σ² is the variance of the population and n is the sample size
We can write: X̄ ∼ N(µ, σ²/n)
When sampling is from a population whose distribution is not normal and the sample size is large, use the Central Limit Theorem
32 / 62
The Central Limit Theorem (CLT)
Given a population of any distribution with mean µ and variance σ², the sampling distribution of X̄, computed from samples of size n from this population, will be approximately N(µ, σ²/n) when the sample size is large
As a rule of thumb, this applies when n ≥ 25
The approximation of normality becomes better as n increases
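A small simulation can make the CLT statement concrete. The sketch below draws repeated samples of size n = 25 from an exponential population with rate 1 (so µ = 1 and σ² = 1; this population, the seed, and the replicate count are assumptions chosen for illustration) and checks that the sample means have mean close to µ and variance close to σ²/n.

```python
import random
from statistics import fmean, pvariance


def sample_means(n: int, reps: int, rng: random.Random) -> list:
    """Draw `reps` samples of size `n` from Exponential(1) and return their means."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


rng = random.Random(0)  # fixed seed so the run is reproducible
n, reps = 25, 5000
means = sample_means(n, reps, rng)

# The population is skewed, yet the sample means should center on mu = 1
# with variance near sigma^2 / n = 1 / 25 = 0.04.
print("mean of sample means:", fmean(means))
print("variance of sample means:", pvariance(means))
```

Plotting a histogram of `means` would show the roughly bell-shaped sampling distribution even though the population itself is far from normal.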
33 / 62
What about for Binomial RVs? I
First, recall that a Binomial variable is just the sum of n Bernoulli variables: $S_n = \sum_{i=1}^{n} X_i$
Notation:
$S_n \sim \text{Binomial}(n, p)$
$X_i \sim \text{Bernoulli}(p) = \text{Binomial}(1, p)$ for $i = 1, \ldots, n$
34 / 62
What about for Binomial RVs? II
In this case, we want to estimate p by p̂ where
$\hat{p} = \frac{S_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} = \bar{X}$
p̂ is just a sample mean!
So we can use the central limit theorem when n is large
35 / 62
Binomial CLT
For a Bernoulli variable
µ = mean = p
σ² = variance = p(1 − p)
X̄ ≈ N(µ, σ²/n) as before
Equivalently, $\hat{p} \approx N\left(p, \frac{p(1-p)}{n}\right)$
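The mean and variance in this approximation are exact for any n, which can be verified by enumerating the binomial distribution. In this sketch the function name `phat_moments` and the values n = 40, p = 0.25 are illustrative assumptions.

```python
from math import comb


def phat_moments(n: int, p: float) -> tuple:
    """Exact mean and variance of p_hat = S_n / n, where S_n ~ Binomial(n, p)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var


n, p = 40, 0.25  # illustrative values
mean, var = phat_moments(n, p)
print(mean, p)                 # mean of p_hat vs. p
print(var, p * (1 - p) / n)    # variance of p_hat vs. p(1 - p)/n
```

Only the normal shape is approximate; the CLT is what justifies it when n is large.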
36 / 62
Notation I
Often we are interested in detecting a difference between two populations
Differences in average income by neighborhood
Differences in disease cure rates by age
37 / 62
Notation II
Population 1:
Size = N1
Mean = µ1
Standard deviation = σ1
Population 2:
Size = N2
Mean = µ2
Standard deviation = σ2
Samples of size n1 from Population 1:
Mean = $\mu_{\bar{X}_1} = \mu_1$
Standard deviation = $\sigma_1/\sqrt{n_1} = \sigma_{\bar{X}_1}$
Samples of size n2 from Population 2:
Mean = $\mu_{\bar{X}_2} = \mu_2$
Standard deviation = $\sigma_2/\sqrt{n_2} = \sigma_{\bar{X}_2}$
38 / 62
Notation III
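Now by CLT, for large n:
$\bar{X}_1 \sim N(\mu_1, \sigma_1^2/n_1)$
$\bar{X}_2 \sim N(\mu_2, \sigma_2^2/n_2)$
and $\bar{X}_1 - \bar{X}_2 \approx N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$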
39 / 62
Difference in proportions?
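We’re done if the underlying variable is continuous. What if the underlying variable is Binomial?
Then $\bar{X}_1 - \bar{X}_2 \approx N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$ is replaced by:
$\hat{p}_1 - \hat{p}_2 \approx N\left(p_1 - p_2, \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right)$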
40 / 62
Sampling Distributions
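| Statistic | Mean of sampling distribution | Variance of sampling distribution |
|---|---|---|
| $\bar{X}$ | $\mu$ | $\sigma^2/n$ |
| $\bar{X}_1 - \bar{X}_2$ | $\mu_1 - \mu_2$ | $\sigma_1^2/n_1 + \sigma_2^2/n_2$ |
| $\hat{p}$ | $p$ | $pq/n$ |
| $n\hat{p}$ | $np$ | $npq$ |
| $\hat{p}_1 - \hat{p}_2$ | $p_1 - p_2$ | $p_1 q_1/n_1 + p_2 q_2/n_2$ |

Here q = 1 − p.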
41 / 62
Statistical inference
Two methods
Estimation
Hypothesis testing
Both make use of sampling distributions
Remember to use CLT
42 / 62
Estimation
Point estimation
An estimator of a population parameter: a statistic (e.g. x̄ , p̂)
An estimate of a population parameter: the value of the estimator for a particular sample
Interval estimation
A point estimate plus an interval that expresses the uncertainty or variability associated with the estimate
43 / 62
Hypothesis Testing
Given the observed data, do we reject or accept a pre-specified null hypothesis in favor of an alternative?
“Significance testing”
44 / 62
Point Estimation
X̄ is a point estimator of µ
X̄1 − X̄2 is a point estimator of µ1 − µ2
p̂ is a point estimator of p
p̂1 − p̂2 is a point estimator of p1 − p2
We know the sampling distribution of these statistics, e.g.
$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\right)$
If σ is not known, we can use s, the sample standard deviation, as a point estimator of σ
45 / 62
Interval Estimation
100(1− α)% Confidence interval:
estimate ± (tabled value of z or t) · (standard error)
Plugging in the values, we get
$\bar{X} \pm z_{\alpha/2} \times \sigma_{\bar{X}} = [L, U]$
46 / 62
Confidence Interval
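We are saying that
$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$
$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \le z_{\alpha/2}\right) = 1 - \alpha$
$P(-z_{\alpha/2} \cdot \sigma_{\bar{X}} \le \bar{X} - \mu \le z_{\alpha/2} \cdot \sigma_{\bar{X}}) = 1 - \alpha$
After some algebra:
$P(\bar{X} - z_{\alpha/2} \cdot \sigma_{\bar{X}} \le \mu \le \bar{X} + z_{\alpha/2} \cdot \sigma_{\bar{X}}) = 1 - \alpha$
$P(L \le \mu \le U) = 1 - \alpha$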
47 / 62
CI for mean
A confidence interval for µ is given by the interval estimate
$\bar{X} \pm z_{\alpha/2} \cdot \sigma_{\bar{X}}$
when the population variance σ² is known
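The known-variance interval can be computed in a few lines. This is a minimal sketch: the function name `z_interval` and the inputs (sample mean 3100, σ = 1000, n = 100) are hypothetical, and the critical value comes from `statistics.NormalDist.inv_cdf` instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, alpha: float = 0.05) -> tuple:
    """Return [L, U] = xbar -/+ z_{alpha/2} * sigma / sqrt(n), for known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    se = sigma / sqrt(n)                     # standard error of the sample mean
    return xbar - z * se, xbar + z * se


# Hypothetical sample summary
lower, upper = z_interval(xbar=3100, sigma=1000, n=100)
print(f"95% CI: [{lower:.1f}, {upper:.1f}]")
```

Lowering `alpha` widens the interval, and increasing `n` narrows it through the standard error.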
48 / 62
Interpretation
Before the data are observed, the probability is at least (1 − α) that [L, U] will contain µ, the population parameter
In repeated sampling from a normally distributed population, 100(1 − α)% of all intervals of the form above will include the population mean µ
After the data are observed, the constructed interval [L, U] either contains the true mean or it does not (no probability involved anymore)
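The repeated-sampling interpretation can be illustrated by simulation. The sketch below repeatedly draws samples from a normal population with known σ, builds the interval each time, and counts how often it contains µ. The function name, seed, and parameter values are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu: float, sigma: float, n: int, reps: int,
             alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of simulated known-sigma intervals that contain the true mean."""
    rng = random.Random(seed)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1  # this particular interval captured mu
    return hits / reps


estimated = coverage(mu=3000, sigma=1000, n=20, reps=4000)
print("estimated coverage:", estimated)  # should land near 0.95
```

Each individual interval either contains µ or does not; the 95% describes the long-run fraction of intervals that do.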
49 / 62
Known Variance
Sampling from a normally distributed population with known variance (σ² known)
Confidence interval: $\bar{X} \pm z_{\alpha/2} \cdot \sigma_{\bar{X}}$
What if σ² is unknown?
50 / 62
The t-distribution
[Figure: t densities for df = 2, df = 5, and df = 20]
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
51 / 62
Use Sample Variance I
Sampling from a normally distributed population with population variance unknown
We can make use of the sample variance s²
Now we construct the confidence interval as:
$\bar{X} \pm z_{\alpha/2} \cdot s_{\bar{X}}$ when n is “large”
$\bar{X} \pm t_{\alpha/2,\,n-1} \cdot s_{\bar{X}}$ when n is “small”
52 / 62
Use Sample Variance II
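Estimate σ² with s²
Here, $s_{\bar{X}} = s/\sqrt{n}$
and $t_{\alpha/2}$ has n − 1 degrees of freedom
Because σ is replaced by its estimate s, the standardized statistic $(\bar{X} - \mu)/s_{\bar{X}}$ is not quite normal, so we need the t-distribution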
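A small-sample interval based on s might be computed as sketched below. The Python standard library has no t quantile function, so the critical value is passed in from a t table; 2.262 is the tabled value of t with 9 degrees of freedom at α/2 = 0.025. The sample data and the names are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def t_interval(sample: list, t_crit: float) -> tuple:
    """Return xbar -/+ t_crit * s / sqrt(n), with t_crit taken from a t table (df = n - 1)."""
    n = len(sample)
    xbar = fmean(sample)
    se = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator, i.e. s
    return xbar - t_crit * se, xbar + t_crit * se


# Hypothetical sample of n = 10 birthweights (grams)
sample = [2900, 3100, 3300, 2700, 3000, 3400, 2800, 3200, 3050, 2950]
T_CRIT_DF9 = 2.262  # tabled t_{0.025, 9}

lower, upper = t_interval(sample, T_CRIT_DF9)
print(f"95% CI: [{lower:.1f}, {upper:.1f}]")
```

Using 1.96 in place of 2.262 would give a narrower interval, which understates the uncertainty when n is this small.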
53 / 62
Properties of the t-distribution
mean = median = mode = 0
Symmetric about the mean
t ranges from −∞ to +∞
Family of distributions determined by n − 1, the degrees of freedom
The t distribution approaches the normal distribution as n − 1 approaches ∞
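The convergence to the normal distribution can be seen by evaluating the t density for increasing degrees of freedom. The lecture does not state the t density formula; the sketch below uses the standard closed form, computed on the log scale with `math.lgamma` so that large df do not overflow. The function name `t_pdf` is an illustrative choice.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(t: float, df: float) -> float:
    """Density of Student's t with `df` degrees of freedom (standard closed form)."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + t * t / df))


normal = NormalDist()
for df in (2, 5, 20, 1000):
    print(f"df={df:>4}: peak {t_pdf(0, df):.4f}, tail at t=3 {t_pdf(3, df):.5f}")
print(f"normal : peak {normal.pdf(0):.4f}, tail at z=3 {normal.pdf(3):.5f}")
```

For small df the peak is lower and the tails are heavier than the normal curve, which is why t critical values exceed the corresponding z values.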
54 / 62
Comparing t with normal
[Figure: standard normal density compared with the t density with df = 2]
55 / 62
Confidence intervals for means
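| Population Distribution | Sample Size | Population Variance | 95% Confidence Interval |
|---|---|---|---|
| Normal | Any | σ² known | $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ |
| Normal | Any | σ² unknown, use s² | $\bar{X} \pm t_{0.025,\,n-1}\, s/\sqrt{n}$ |
| Not Normal/Unknown | Large | σ² known | $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ |
| Not Normal/Unknown | Large | σ² unknown, use s² | $\bar{X} \pm 1.96\, s/\sqrt{n}$ |
| Not Normal/Unknown | Small | Any | Non-parametric methods |
| Binomial | Large | - | $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$ |
| Binomial | Small | - | Exact methods |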
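The large-sample binomial row of the table can be sketched as follows. The function name and the counts (40 successes out of 100) are hypothetical, and the interval is only appropriate when n is large.

```python
from math import sqrt


def proportion_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Large-sample interval: p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
    return p_hat - z * se, p_hat + z * se


# Hypothetical data: 40 successes in 100 trials
lower, upper = proportion_interval(40, 100)
print(f"95% CI for p: [{lower:.4f}, {upper:.4f}]")
```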
56 / 62
Confidence Intervals for Differences in Means
This is a bit tricky
Recall that formulas for CIs for a single mean depend on
whether or not σ² is known
the sample size
For a difference in means, the formula for a CI depends on
whether or not the variances are assumed to be equal when the variances are unknown
the sample sizes in each group
57 / 62
Equal Variances I
When variances are assumed to be equal:
The standard error of the difference is estimated by:
$\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$
Here, $s_p^2$ is the pooled variance
58 / 62
Equal Variances II
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
where df = n1 + n2 − 2
Recall, n1 is the size of sample 1, and n2 is the size of sample 2
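The pooled variance and the resulting standard error are simple to compute from the two sample variances. In this sketch the function names and the summary values (s1² = 4, n1 = 10, s2² = 9, n2 = 15) are hypothetical.

```python
from math import sqrt


def pooled_variance(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """s_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def pooled_se(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """Standard error of the difference in means under equal variances."""
    sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
    return sqrt(sp_sq / n1 + sp_sq / n2)


# Hypothetical sample summaries
print("pooled variance:", pooled_variance(4.0, 10, 9.0, 15))
print("standard error:", pooled_se(4.0, 10, 9.0, 15))
print("df:", 10 + 15 - 2)
```

The pooled variance is a weighted average of the two sample variances, with weights given by each sample's degrees of freedom.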
59 / 62
Unequal Variances
When variances are assumed to be unequal:
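The standard error of the difference is estimated by:
$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Here, df = ν and
$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$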
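The degrees of freedom ν can be computed as sketched below, assuming the usual Welch-Satterthwaite form in which the numerator is the squared sum of the two variance terms. The function names and input values are hypothetical.

```python
from math import sqrt


def welch_df(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """Approximate df for unequal variances (Welch-Satterthwaite form, assumed)."""
    a = s1_sq / n1
    b = s2_sq / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


def unequal_se(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """Standard error of the difference in means without pooling."""
    return sqrt(s1_sq / n1 + s2_sq / n2)


# Hypothetical sample summaries
nu = welch_df(4.0, 10, 9.0, 15)
print("df:", nu)
print("standard error:", unequal_se(4.0, 10, 9.0, 15))
```

ν is generally not an integer and lies between min(n1 − 1, n2 − 1) and n1 + n2 − 2; when looking up a tabled t value it is commonly rounded down.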
60 / 62
Confidence intervals for difference of means
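| Population Distribution | Sample Size | Population Variances | 95% Confidence Interval |
|---|---|---|---|
| Normal | Any | known | $(\bar{X}_1 - \bar{X}_2) \pm 1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ |
| Normal | Any | unknown, $\sigma_1^2 = \sigma_2^2$ | $(\bar{X}_1 - \bar{X}_2) \pm t_{0.025,\,n_1+n_2-2}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$ |
| Normal | Any | unknown, $\sigma_1^2 \ne \sigma_2^2$ | $(\bar{X}_1 - \bar{X}_2) \pm t_{0.025,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ |
| Not Normal/Unknown | Large | known | $(\bar{X}_1 - \bar{X}_2) \pm 1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ |
| Not Normal/Unknown | Large | unknown, $\sigma_1^2 = \sigma_2^2$ | $(\bar{X}_1 - \bar{X}_2) \pm 1.96\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$ |
| Not Normal/Unknown | Large | unknown, $\sigma_1^2 \ne \sigma_2^2$ | $(\bar{X}_1 - \bar{X}_2) \pm 1.96\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ |
| Not Normal/Unknown | Small | Any | Non-parametric methods |

61 / 62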
Confidence intervals for difference of proportions
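| Population Distribution | Sample Size | 95% Confidence Interval |
|---|---|---|
| Binomial | Large | $(\hat{p}_1 - \hat{p}_2) \pm 1.96\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$ |
| Binomial | Small | Exact methods |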
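The large-sample interval for a difference of proportions can be sketched in the same way as the single-proportion case. The function name and the counts (60 of 100 versus 45 of 100) are hypothetical.

```python
from math import sqrt


def diff_proportions_interval(x1: int, n1: int, x2: int, n2: int,
                              z: float = 1.96) -> tuple:
    """Large-sample interval for p1 - p2 using the unpooled standard error."""
    p1 = x1 / n1
    p2 = x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Hypothetical data: 60/100 successes in group 1, 45/100 in group 2
lower, upper = diff_proportions_interval(60, 100, 45, 100)
print(f"95% CI for p1 - p2: [{lower:.4f}, {upper:.4f}]")
```

An interval that excludes 0 suggests the two population proportions differ, subject to the large-sample assumption.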
62 / 62