Lesson 8
Chapter 7: Confidence and Prediction Intervals
Michael Akritas
Department of Statistics
The Pennsylvania State University
Michael Akritas Lesson 8 Chapter 7: Confidence and Prediction Intervals
1 Introduction to Confidence Intervals
2 CIs for a Mean and a Proportion
3 CIs for the Regression Parameters
4 The issue of Precision
5 Prediction Intervals
Bounding the Error of Estimation
By the CLT, if $n$ is large, most estimators $\hat\theta$ are (at least approximately) normally distributed, with mean equal to the true value $\theta$ of the parameter they estimate. Thus,
$$\hat\theta \overset{\cdot}{\sim} N\left(\theta,\ \sigma^2_{\hat\theta}\right).$$
The above fact provides probabilistic bounds on the size of the estimation error:
$$\left|\hat\theta - \theta\right| \le 1.96\,\sigma_{\hat\theta}$$
holds 95% of the time.
From Error Bounds to Confidence Intervals
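The probabilistic error bound can be rewritten as
$$\hat\theta - 1.96\,\sigma_{\hat\theta} \le \theta \le \hat\theta + 1.96\,\sigma_{\hat\theta},$$
i.e., an interval of plausible values for $\theta$, with degree of plausibility approximately 95%.
Such intervals are called confidence intervals (CIs).
In general, the $100(1-\alpha)\%$ CI is of the form
$$\hat\theta - z_{\alpha/2}\,\sigma_{\hat\theta} \le \theta \le \hat\theta + z_{\alpha/2}\,\sigma_{\hat\theta}, \quad\text{or}\quad \hat\theta \pm z_{\alpha/2}\,\sigma_{\hat\theta}.$$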
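The percentile $z_{\alpha/2}$ can be computed numerically instead of being read from a table. A minimal sketch in R using the base function `qnorm`; the variable names `conf`, `alpha` and `z` are illustrative and not part of the lecture:

```r
# Upper alpha/2 percentiles of N(0,1) for several confidence levels.
conf  <- c(0.90, 0.95, 0.99)   # confidence levels 1 - alpha
alpha <- 1 - conf
z     <- qnorm(1 - alpha / 2)  # z_{alpha/2}
round(z, 3)                    # 1.645 1.960 2.576
```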
Z Intervals
$100(1-\alpha)\%$ CIs that use percentiles of the standard normal distribution, $z_{\alpha/2}$, as above, are called Z intervals.
Z intervals for the mean require known variance, and either the assumption of normality or $n \ge 30$. Typically, the variance is not known.
Z intervals will be primarily used for proportions.
The T Distribution and T Intervals
When sampling from normal populations, an estimator $\hat\theta$ of some parameter $\theta$ often satisfies, for all sample sizes $n$,
$$\frac{\hat\theta - \theta}{\hat\sigma_{\hat\theta}} \sim T_\nu,$$
where $\hat\sigma_{\hat\theta}$ is the estimated standard error, and $T_\nu$ stands for the T distribution with $\nu$ degrees of freedom.
A T distribution is symmetric and its pdf tends to that of the standard normal as $\nu$ tends to infinity.
The $100(1-\alpha/2)$th percentile of the T distribution with $\nu$ degrees of freedom will be denoted by $t_{\nu,\alpha/2}$.
Figure: PDF and Percentile of a T Distribution. (The p.d.f. of the t distribution with $\nu$ degrees of freedom; the area under the curve to the right of $t_{\nu,\alpha}$ equals $\alpha$.)
As the DF $\nu$ gets large, $t_{\nu,\alpha/2}$ approaches $z_{\alpha/2}$. For example, for $\nu = 9, 19, 60$ and $120$, $t_{\nu,0.05}$ is 1.833, 1.729, 1.671, 1.658, respectively, while $z_{0.05} = 1.645$.
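These percentiles can be reproduced with the base R functions `qt` and `qnorm`. A short sketch; the names `nu`, `t_pct` and `z_pct` are illustrative:

```r
# Upper 5% percentiles t_{nu,0.05} for increasing degrees of freedom,
# compared with the standard normal percentile z_{0.05}.
nu    <- c(9, 19, 60, 120)
t_pct <- qt(0.95, df = nu)
z_pct <- qnorm(0.95)
round(t_pct, 3)   # 1.833 1.729 1.671 1.658
round(z_pct, 3)   # 1.645
```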
Plots of N(0,1) and T densities in R
http://stat.psu.edu/~mga/401/fig/ComparTdensit.pdf
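A comparison plot of this kind can be sketched with base R graphics. The degrees of freedom (1, 3, 10) and the plotting range below are arbitrary choices for illustration and are not necessarily those used in the linked figure:

```r
# Overlay the N(0,1) density with a few T densities.
dfs <- c(1, 3, 10)                          # illustrative degrees of freedom
curve(dnorm(x), from = -4, to = 4, lwd = 2,
      ylab = "density", main = "N(0,1) and T densities")
for (i in seq_along(dfs)) {
  curve(dt(x, df = dfs[i]), add = TRUE, lty = i + 1)
}
legend("topright", lty = 1:4, lwd = c(2, 1, 1, 1),
       legend = c("N(0,1)", paste("T, df =", dfs)))
```

The T densities are lower at the center and heavier in the tails than the normal density, and the gap shrinks as the degrees of freedom grow.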
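The relation $\frac{\hat\theta - \theta}{\hat\sigma_{\hat\theta}} \sim t_\nu$, which also holds approximately when sampling non-normal populations provided $n \ge 30$, leads to the following $1-\alpha$ bound on the error of estimation of $\theta$:
$$\left|\hat\theta - \theta\right| \le t_{\nu,\alpha/2}\,\hat\sigma_{\hat\theta}.$$
This error bound leads to the following $(1-\alpha)100\%$ CI for $\theta$:
$$\left(\hat\theta - t_{\nu,\alpha/2}\,\hat\sigma_{\hat\theta},\ \hat\theta + t_{\nu,\alpha/2}\,\hat\sigma_{\hat\theta}\right). \tag{2.1}$$
T intervals will be used for the mean, as well as for the regression parameters in the linear regression model.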
Read Section 7.2 CI Semantics: The Meaning of “Confidence”
Figure: 50 CIs for p. (Horizontal axis: end points of CIs; vertical axis: CI count.)
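The idea behind a figure like this can be reproduced by simulation: draw many samples, build a CI from each, and count how many of the intervals cover the true value. A minimal sketch, in which the true proportion `p = 0.5`, the sample size `n = 100` and the seed are all assumptions chosen for illustration:

```r
set.seed(401)                    # arbitrary seed, for reproducibility only
p <- 0.5; n <- 100; reps <- 50   # assumed true p, sample size, number of CIs
phat  <- rbinom(reps, size = n, prob = p) / n
half  <- qnorm(0.975) * sqrt(phat * (1 - phat) / n)
lower <- phat - half
upper <- phat + half
covered <- (lower <= p) & (p <= upper)
sum(covered)                     # how many of the 50 intervals contain p
```

Over many repetitions roughly 95% of such intervals cover p, although the count in any single batch of 50 varies.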
T CIs for the Mean: Proposition
Let $X_1, \ldots, X_n$ be a simple random sample from a population with mean $\mu$ and variance $\sigma^2$, both unknown. Then
$$\frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1} \tag{3.1}$$
holds exactly for any $n$ if the population is normal, and holds approximately for non-normal populations provided $n \ge 30$.
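The Proposition yields the $1-\alpha$ error bound
$$\left|\bar X - \mu\right| \le t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},$$
which leads to the following $(1-\alpha)100\%$ CI for the mean:
$$\left(\bar X - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\ \bar X + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right).$$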
Example
The mean weight loss of $n = 16$ grinding balls after a certain length of time in mill slurry is 3.42 g with $S = 0.68$ g. Construct a 99% CI for the true mean weight loss.
Solution. Because $n < 30$ we must assume that the (statistical) population of weight losses is normal. (In a real-life application, the normality assumption should be checked with a histogram or a Q-Q plot of the data.) For $\alpha = 0.01$ and $n - 1 = 15$ DF, Table A.4 gives $t_{n-1,\alpha/2} = t_{15,0.005} = 2.947$. Thus the desired 99% CI for $\mu$ is
$$3.42 \pm 2.947\,(0.68/\sqrt{16}), \quad\text{or}\quad 2.92 < \mu < 3.92.$$
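The grinding-ball interval can be checked from the summary statistics alone, with `qt` standing in for Table A.4. A minimal sketch; the variable names are illustrative:

```r
# 99% t-interval for the mean from summary statistics (grinding balls example).
n <- 16; xbar <- 3.42; s <- 0.68
alpha  <- 0.01
t_crit <- qt(1 - alpha / 2, df = n - 1)       # t_{15, 0.005}
ci     <- xbar + c(-1, 1) * t_crit * s / sqrt(n)
round(t_crit, 3)   # 2.947
round(ci, 2)       # 2.92 3.92
```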
R command for the t-interval for the mean
With the data in x, the commands
"lm(x ~ 1)" and "confint(lm(x ~ 1))"
will return $\bar X$ and the pair of values $\bar X \pm t_{n-1,0.025}\,S/\sqrt{n}$, which is the 95% CI for $\mu$.
For the 90% CI for $\mu$ use
"confint(lm(x ~ 1), level=0.9)"
(∗) The pair of values $\bar X \pm t_{n-1,0.025}\,S/\sqrt{n}$ can also be obtained as
"mean(x) + c(-1, 1)*qt(0.975, df=length(x)-1)*sd(x)/sqrt(length(x))"
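The agreement between the two routes can be checked on any numeric vector. A minimal sketch, in which the data are simulated purely for illustration (the seed, sample size, mean and standard deviation are arbitrary assumptions):

```r
set.seed(1)                          # arbitrary seed
x <- rnorm(20, mean = 10, sd = 2)    # hypothetical data, for illustration only
fit <- lm(x ~ 1)                     # intercept-only model: the intercept is mean(x)
ci_lm <- as.numeric(confint(fit))    # 95% CI for the mean from confint
ci_manual <- mean(x) +
  c(-1, 1) * qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
all.equal(ci_lm, ci_manual)          # TRUE
```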
Z CIs for Proportions
CIs for p, however, are slightly different due to:
1 We are typically given only $T = X_1 + \cdots + X_n$, or $\bar X = \hat p$.
2 $\sigma^2$ is estimated by $\hat p(1 - \hat p)$.
3 We use the percentiles from the normal distribution, using the approximate result
$$\hat p \overset{\cdot}{\sim} N\left(p,\ \hat p(1 - \hat p)/n\right),$$
which holds if $n\hat p \ge 8$ and $n(1 - \hat p) \ge 8$, i.e. at least eight 1s and at least eight 0s.
The above approximate distribution of $\hat p$ leads to the (approximate) $(1-\alpha)100\%$ CI
$$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}}.$$
Example
A Gallup Survey estimated the proportion of adults across the country who drink beer, wine, or hard liquor, at least occasionally. Of the 1516 adults interviewed, 985 said they drank. Find a 95% confidence interval for the proportion, $p$, of all Americans who drink.
Solution: Here $\alpha = 0.05$, and $z_{0.025} = 1.96$. Thus
$$\frac{985}{1516} \pm 1.96\sqrt{\frac{0.65 \times 0.35}{1516}} = 0.65 \pm 0.024.$$
QUESTION: An interpretation of the above CI is that the probability is 0.95 that the true proportion of adults who drink lies in the interval you obtained. True or False?
R command for z-intervals for a proportion
With T being the number of "successes" in n trials, set "phat=T/n" and use the command
"phat + c(-1, 1)*qnorm(0.975)*sqrt(phat*(1-phat)/n)"
to obtain the 95% CI for p, i.e. the pair of values $\hat p \pm z_{0.025}\sqrt{\hat p(1-\hat p)/n}$.
To obtain 90% or other CIs, adjust the 0.975 in the above command accordingly.
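Applied to the Gallup example (985 of 1516), the command can be run as follows. This is a sketch; the name `successes` is an illustrative substitute for T, chosen because `T` is also shorthand for `TRUE` in R:

```r
# 95% z-interval for a proportion (Gallup example: 985 of 1516).
successes <- 985; n <- 1516
phat <- successes / n
half <- qnorm(0.975) * sqrt(phat * (1 - phat) / n)
ci   <- phat + c(-1, 1) * half
round(phat, 3)   # 0.65
round(half, 3)   # 0.024
round(ci, 3)     # 0.626 0.674
```

Here n * phat = 985 and n * (1 - phat) = 531, so the "at least eight 1s and eight 0s" condition is comfortably met.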
T CIs for the Slope of a Regression Line: Proposition
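Let $(X_1,Y_1), \ldots, (X_n,Y_n)$ be iid satisfying $E(Y_i|X_i = x) = \alpha_1 + \beta_1 x$, and $\mathrm{Var}(Y_i|X_i = x) = \sigma_\varepsilon^2$, same for all $x$. Then,
$$\hat\sigma_{\hat\beta_1} = \sqrt{\frac{S_\varepsilon^2}{\sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2}}, \quad\text{where}$$
$$S_\varepsilon^2 = \frac{1}{n-2}\left[\sum_{i=1}^n Y_i^2 - \hat\alpha_1\sum_{i=1}^n Y_i - \hat\beta_1\sum_{i=1}^n X_iY_i\right]$$
is the estimator of the intrinsic variability. NOTE: $\hat\sigma_{\hat\beta_1}$ is also denoted by $S_{\hat\beta_1}$.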
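We saw that under the normality assumption,
$$\frac{\hat\beta_1 - \beta_1}{\hat\sigma_{\hat\beta_1}} \sim t_{n-2}.$$
This leads to the $100(1-\alpha)\%$ error bound
$$|\hat\beta_1 - \beta_1| < t_{n-2,\alpha/2}\,\hat\sigma_{\hat\beta_1},$$
and corresponding $100(1-\alpha)\%$ CI for $\beta_1$ of:
$$\left(\hat\beta_1 - t_{n-2,\alpha/2}\,\hat\sigma_{\hat\beta_1},\ \hat\beta_1 + t_{n-2,\alpha/2}\,\hat\sigma_{\hat\beta_1}\right)$$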
Example (Y=propagation of stress wave, X=tensile strength)
In this study, $n = 14$, $\sum_i X_i = 890$, $\sum_i X_i^2 = 67{,}182$, $\sum_i Y_i = 37.6$, $\sum_i Y_i^2 = 103.54$ and $\sum_i X_iY_i = 2234.30$. Let $Y_1$ denote an observation made at $X_1 = 30$, and $Y_2$ denote an observation at $X_2 = 35$. Construct a 95% CI for $E(Y_1 - Y_2)$.
Solution. Note that $E(Y_1 - Y_2) = -5\beta_1$. We will first construct a 95% CI for $\beta_1$. We have: $\hat\beta_1 = -0.0147109$, $\hat\alpha_1 = 3.6209072$, and $S_\varepsilon^2 = 0.02187$. Thus,
$$\hat\sigma_{\hat\beta_1} = \sqrt{\frac{S_\varepsilon^2}{\sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2}} = \sqrt{\frac{0.02187}{67{,}182 - \frac{1}{14}890^2}} = 0.001436,$$
so that the 95% CI for $\beta_1$ is
$$\hat\beta_1 \pm t_{12,0.025}\,\hat\sigma_{\hat\beta_1} = -0.0147109 \pm 2.179 \times 0.001436 = -0.0147109 \pm 0.00313 = (-0.01784, -0.01158).$$
The 95% CI for $-5\beta_1$ now follows easily:
$$-5\hat\beta_1 \pm 5\,t_{n-2,\alpha/2}\,\hat\sigma_{\hat\beta_1} = 5(0.0147109) \pm 5 \times 2.179 \times 0.001436 = 0.0736 \pm 0.0156.$$
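The whole calculation can be recomputed from the six summary sums. The sketch below uses illustrative variable names and carries full precision throughout, so its last digits may differ slightly from hand-rounded figures:

```r
# Slope CI from summary sums (stress-wave example).
n <- 14
sx <- 890; sxx <- 67182; sy <- 37.6; syy <- 103.54; sxy <- 2234.30
Sxx <- sxx - sx^2 / n                        # sum of squares of X about its mean
b1  <- (sxy - sx * sy / n) / Sxx             # least-squares slope
a1  <- sy / n - b1 * sx / n                  # least-squares intercept
s2  <- (syy - a1 * sy - b1 * sxy) / (n - 2)  # S_eps^2, intrinsic variability
se_b1  <- sqrt(s2 / Sxx)                     # estimated standard error of the slope
t_crit <- qt(0.975, df = n - 2)
ci_b1   <- b1 + c(-1, 1) * t_crit * se_b1    # 95% CI for beta_1
ci_diff <- rev(-5 * ci_b1)                   # 95% CI for E(Y1 - Y2) = -5 * beta_1
round(c(b1 = b1, a1 = a1, s2 = s2, se_b1 = se_b1), 6)
round(ci_diff, 4)
```

Multiplying by -5 reverses the order of the end points, which is why `rev` is applied to keep the lower limit first.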
T CIs for the Regression Line
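Let $(X_1,Y_1), \ldots, (X_n,Y_n)$ be iid satisfying $E(Y_i|X_i = x) = \alpha_1 + \beta_1 x$, and $\mathrm{Var}(Y_i|X_i = x) = \sigma_\varepsilon^2$, same for all $x$. Then,
$$\hat\sigma_{\hat\mu_{Y|X=x}} = S_\varepsilon\sqrt{\frac{1}{n} + \frac{n(x - \bar X)^2}{n\sum X_i^2 - \left(\sum X_i\right)^2}}, \quad\text{where}$$
$$S_\varepsilon^2 = \frac{1}{n-2}\left[\sum_{i=1}^n Y_i^2 - \hat\alpha_1\sum_{i=1}^n Y_i - \hat\beta_1\sum_{i=1}^n X_iY_i\right]$$
is the estimator of the intrinsic variability. NOTE: $\hat\sigma_{\hat\mu_{Y|X=x}}$ is also denoted by $S_{\hat\mu_{Y|X=x}}$.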
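We saw that under the normality assumption,
$$\frac{\hat\mu_{Y|X=x} - \mu_{Y|X=x}}{\hat\sigma_{\hat\mu_{Y|X=x}}} \sim t_{n-2}.$$
This leads to the $100(1-\alpha)\%$ error bound
$$|\hat\mu_{Y|X=x} - \mu_{Y|X=x}| < t_{n-2,\alpha/2}\,\hat\sigma_{\hat\mu_{Y|X=x}},$$
and corresponding $100(1-\alpha)\%$ CI for $\mu_{Y|X=x}$ of:
$$\left(\hat\mu_{Y|X=x} - t_{n-2,\alpha/2}\,\hat\sigma_{\hat\mu_{Y|X=x}},\ \hat\mu_{Y|X=x} + t_{n-2,\alpha/2}\,\hat\sigma_{\hat\mu_{Y|X=x}}\right)$$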
The term $n(x - \bar X)^2$ in the expression for $\hat\sigma_{\hat\mu_{Y|X=x}}$ means that confidence intervals for $\mu_{Y|X=x}$ get wider as $x$ gets farther away from $\bar X$.
Figure: Confidence Intervals for $\mu_{Y|X=x}$ Get Wider Away from $\bar X$
Estimation of $\mu_{Y|X=x}$ for $x < X_{(1)}$ or $x > X_{(n)}$ is NOT recommended.
Example
$n = 11$ data points yield $\sum X_i = 292.90$, $\sum Y_i = 69.03$, $\sum X_i^2 = 8141.75$, $\sum X_iY_i = 1890.2$, $\sum Y_i^2 = 442.1903$, $\hat\mu_{Y|X=x} = 2.22494 + 0.152119x$, and $S_\varepsilon = 0.3444$. Construct 95% CIs for $\mu_{Y|X=26.627}$ and $\mu_{Y|X=25}$. [Note that $\bar X = 26.627$.]
Solution: First,
$$S_{\hat\mu_{Y|X=x}} = 0.3444\sqrt{\frac{1}{11} + \frac{11(x - 26.627)^2}{11(8141.75) - (292.9)^2}},$$
so that $S_{\hat\mu_{Y|X=26.627}} = 0.1038$ and $S_{\hat\mu_{Y|X=25}} = 0.1082$. Thus,
$$\hat\mu_{Y|X=25} \pm t_{9,0.025}\,(0.1082) = 6.028 \pm 0.245 \quad (\text{CI at } X = 25),$$
$$\hat\mu_{Y|X=26.627} \pm t_{9,0.025}\,(0.1038) = 6.275 \pm 0.235 \quad (\text{CI at } X = 26.627).$$
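Both intervals in this example can be reproduced from the summary statistics. A minimal sketch; the helper `se_mu` and the other names are illustrative, and the fitted coefficients and $S_\varepsilon$ are taken as given in the example:

```r
# CI for the regression line at given x, from summary statistics.
n <- 11; sx <- 292.90; sxx <- 8141.75
a1 <- 2.22494; b1 <- 0.152119; s_eps <- 0.3444   # values given in the example
xbar <- sx / n
# Estimated standard error of the fitted mean response at x.
se_mu <- function(x) s_eps * sqrt(1 / n + n * (x - xbar)^2 / (n * sxx - sx^2))
x0   <- c(25, xbar)
fit  <- a1 + b1 * x0
half <- qt(0.975, df = n - 2) * se_mu(x0)
round(cbind(x = x0, fit = fit, half_width = half), 3)
```

The half-width is smallest at x equal to the sample mean of X and grows as x moves away from it.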
R commands for CIs in regression
CIs for the intercept and slope:
"confint(lm(y ~ x))" (or "confint(lm(y ~ x), level=0.95)") gives 95% CIs for both $\alpha_1$ and $\beta_1$.
"confint(lm(y ~ x), level=0.90)" gives 90% CIs for both $\alpha_1$ and $\beta_1$.
"confint(lm(y ~ x), parm='x', level=0.90)" gives the 90% CI only for $\beta_1$.
"confint(lm(y ~ x), parm='(Intercept)', level=0.90)" gives the 90% CI only for $\alpha_1$.
CIs for $\mu_{Y|X}(x)$ at, e.g., $x = 5.5$: "newx=data.frame(x=5.5)", "predict(lm(y ~ x), newx, interval='confidence', level=0.9)"
Use "newx=data.frame(x=c(4.5,5.5))" above for multiple CIs.
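A runnable version of these commands needs some data. The sketch below uses the `cars` data set that ships with base R (stopping distance `dist` versus `speed`) as a stand-in for the generic (x, y) data; the choice of data set and of the speeds 10 and 20 is an assumption made only for illustration:

```r
# `cars` plays the role of the (x, y) data: x = speed, y = dist.
fit <- lm(dist ~ speed, data = cars)
ci_both  <- confint(fit, level = 0.90)                   # intercept and slope
ci_slope <- confint(fit, parm = "speed", level = 0.90)   # slope only
newx <- data.frame(speed = c(10, 20))                    # column name must match the predictor
ci_line <- predict(fit, newx, interval = "confidence", level = 0.90)
ci_both
ci_line    # columns: fit, lwr, upr
```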
Generalities
Precision in estimation is quantified by the size of the probabilistic error bound, or by the length of the CI.
Error bounds are of the form
$$\left|\bar X - \mu\right| \le t_{n-1,\alpha/2}\frac{S}{\sqrt{n}} \quad (\text{unknown } \sigma;\ \text{normal case, or } n \ge 30)$$
$$|\hat p - p| \le z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}} \quad (n\hat p \ge 8,\ n(1 - \hat p) \ge 8).$$
Thus error bounds depend on $n$ and $\alpha$ since, e.g.,
$$z_{.05} = 1.645 < z_{.025} = 1.96 < z_{.005} = 2.575$$
In improving precision, we do not want to adjust $\alpha$.
The Ideal Case: σ known
To construct a $(1-\alpha)100\%$ CI having a prescribed length of $L$, the sample size $n$ is found by solving the equation
$$2z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = L.$$
The solution is:
$$n = \left(2z_{\alpha/2}\frac{\sigma}{L}\right)^2.$$
If the solution is not an integer (as is typically the case), the number is rounded up. Rounding up guarantees that the prescribed precision objective will be more than met.
The Ideal Case: An Example
Example
The time to response (in milliseconds) to an editing command with a new operating system is normally distributed with an unknown mean $\mu$ and $\sigma = 25$. We want a 95% CI for $\mu$ of length $L = 10$ milliseconds. What sample size $n$ should be used?
Solution. For a 95% CI, $\alpha/2 = .025$ and $z_{.025} = 1.96$. Thus
$$n = \left(2 \cdot (1.96)\,\frac{25}{10}\right)^2 = 96.04,$$
which is rounded up to $n = 97$.
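The sample-size formula for known σ is easy to wrap in a small function. A minimal sketch; the function name `n_for_mean` and its arguments are illustrative, not from the lecture:

```r
# Sample size for a z-interval of prescribed total length L when sigma is known.
n_for_mean <- function(L, sigma, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)   # z_{alpha/2}
  ceiling((2 * z * sigma / L)^2)   # round up so the length is at most L
}
n_for_mean(L = 10, sigma = 25)     # 97
```

Halving L multiplies the required n by about four, since n is proportional to 1/L^2.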
The Realistic Case: σ unknown
Sample size determination must rely on a preliminary approximation, $S_{prl}$, of $\sigma$. Two common methods are:
1 If the range of population values is known, use
$$S_{prl} = \frac{\text{range}}{3.5}, \quad\text{or}\quad S_{prl} = \frac{\text{range}}{4}.$$
This approximation is inspired by the standard deviation of a $U(a,b)$ random variable, which is $\sigma = (b - a)/\sqrt{12} = (b - a)/3.464$.
2 Use the standard deviation, $S_{prl}$, of a preliminary sample. This is somewhat cumbersome because it requires some trial-and-error iterations.
Sample Size Determination for Estimating p
Equating the length of the $(1-\alpha)100\%$ CI for $p$ to $L$ and solving for $n$ gives
$$n = \frac{4z_{\alpha/2}^2\,\hat p(1 - \hat p)}{L^2}.$$
Round up.
Two commonly used methods for obtaining a preliminary approximation, $\hat p_{prl}$, are:
1 Obtain $\hat p_{prl}$ either from a small pilot sample or from expert opinion, and use it in the above formula.
2 Replace $\hat p(1 - \hat p)$ in the formula by 0.25, its largest possible value. This gives
$$n = z_{\alpha/2}^2/L^2.$$
Round up.
Example
A preliminary sample gave $\hat p_{prl} = 0.91$. How large should $n$ be to estimate the probability of interest to within 0.01 with 95% confidence?
Solution. "To within 0.01" is another way of saying that the 95% bound on the error of estimation should be 0.01, or the desired CI should have a width of 0.02. Since we have preliminary information, we use the first formula:
$$n = \frac{4(1.96)^2(0.91)(0.09)}{(0.02)^2} = 3146.27.$$
This is rounded up to 3147.
Example
A new method of pre-coating fittings used in oil, brake and other fluid systems in heavy-duty trucks is being studied. How large an $n$ is needed to estimate the proportion of fittings that leak to within .02 with 90% confidence? (No prior info available.)
Solution. Here we have no preliminary information about $p$. Thus, we apply the second formula with $L = 0.04$ and obtain
$$n = z_{\alpha/2}^2/L^2 = (1.645)^2/(.04)^2 = 1691.27.$$
This is rounded up to 1692.
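Both proportion examples follow from one formula, which can be sketched as a small function. The name `n_for_prop` and its arguments are illustrative; the percentile `z` is passed in explicitly so that the table values 1.96 and 1.645 used in the examples can be reproduced:

```r
# Sample size for estimating p with a z-interval of total length L.
# The default p_prl = 0.5 gives the conservative choice p(1 - p) = 0.25.
n_for_prop <- function(L, z, p_prl = 0.5) {
  ceiling(4 * z^2 * p_prl * (1 - p_prl) / L^2)
}
n_for_prop(L = 0.02, z = 1.96, p_prl = 0.91)   # 3147
n_for_prop(L = 0.04, z = 1.645)                # 1692
```

Results that sit close to an integer are sensitive to how the percentile is rounded: with the unrounded `qnorm(0.95)` in place of 1.645, the second computation gives about 1690.96 and rounds up to 1691 instead.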
Prediction refers to estimating an observation. It is related to estimating the mean, but prediction intervals (PIs) are different from CIs. For example:
1 Predicting the fat content of the hot dog you are about to eat is related to estimating the mean fat content of hot dogs. But the PI is different from the CI.
2 Predicting the failure time of your resistor from its resistance is related to estimating the mean failure time of all resistors having the same resistance. But the PI is different from the CI.
In the first example, there was no explanatory variable. The second example involves a regression context.
We begin with the case of no explanatory variable.
Prediction Based on a Univariate Sample
To emphasize the difference between PIs and CIs, suppose that the amount of fat in a randomly selected hot dog is $N(20, 9)$. Thus there are no unknown parameters to be estimated, and no need to construct a CI.
Still, the amount of fat, $X$, in the hot dog which one is about to eat is unknown, simply because it is a random variable.
According to well-accepted criteria, the best point predictor of a normal random variable with mean $\mu$ is $\mu$.
A $(1-\alpha)100\%$ PI is an interval that contains the r.v. with probability $1-\alpha$. Namely: $\mu \pm z_{\alpha/2}\,\sigma$.
In the hot dog example, $X \sim N(20, 9)$, so the best point predictor of $X$ is 20 and a 95% PI is $20 \pm (1.96)3$.
Typically, $\mu$, $\sigma$ are unknown and are estimated from a sample $X_1, \ldots, X_n$ by $\bar X$, $S$, respectively.
Then, the best point predictor of a future observation is $\bar X$.
The PI, however, must now take into account the variability of $\bar X$, $S$ as estimators of $\mu$, $\sigma$.
Assuming normality, the $(1-\alpha)100\%$ PI for the next $X$ is:
$$\left(\bar X - t_{n-1,\alpha/2}\,S\sqrt{1 + \frac{1}{n}},\ \bar X + t_{n-1,\alpha/2}\,S\sqrt{1 + \frac{1}{n}}\right).$$
The variability of $\bar X$ is accounted for by the $\frac{1}{n}$, and the variability of $S$ is accounted for by the use of the t-percentiles.
Example
The fat content measurements from a sample of size $n = 10$ hot dogs gave sample mean and sample standard deviation of $\bar X = 21.9$ and $S = 4.134$. Give a 95% PI for the fat content, $X$, of the next hot dog to be sampled.
Solution: Assuming that the fat content of a randomly selected hot dog has the normal distribution, the best point predictor of $X$ is $\bar X = 21.9$ and the 95% PI is
$$\bar X \pm t_{9,0.025}\,S\sqrt{1 + \frac{1}{n}} = (12.09, 31.71).$$
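The hot dog interval can be reproduced from the summary statistics, and computing the 95% CI for the mean alongside it makes the contrast between the two intervals concrete. A minimal sketch with illustrative variable names:

```r
# 95% prediction interval for the next observation (hot dog example),
# with the 95% CI for the mean shown for comparison.
n <- 10; xbar <- 21.9; s <- 4.134
t_crit <- qt(0.975, df = n - 1)
pi95 <- xbar + c(-1, 1) * t_crit * s * sqrt(1 + 1 / n)
ci95 <- xbar + c(-1, 1) * t_crit * s / sqrt(n)
round(pi95, 2)   # 12.09 31.71
round(ci95, 2)   # 18.94 24.86
```

The PI is much wider than the CI because it has to cover the variability of a single new observation, not just the uncertainty in the estimated mean.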
PIs for the Normal Simple Linear Regression Model
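Let $(X_1,Y_1), \ldots, (X_n,Y_n)$ be $n$ observations that follow the normal simple linear regression model, i.e. $Y_i|X_i = x_i \sim N(\alpha_1 + \beta_1 x_i, \sigma^2)$.
The point predictor for a future observation $Y$ made at $X = x$ is $\hat\mu_{Y|X=x} = \hat\alpha_1 + \hat\beta_1 x$.
The $100(1-\alpha)\%$ PI is
$$\hat\mu_{Y|X=x} \pm t_{n-2,\alpha/2}\,S_\varepsilon\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar X)^2}{n\sum X_i^2 - \left(\sum X_i\right)^2}}.$$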
Example
Consider again the study where $n = 11$, $\sum X_i = 292.90$, $\sum Y_i = 69.03$, $\sum X_i^2 = 8141.75$, $\sum X_iY_i = 1890.200$, $\sum Y_i^2 = 442.1903$, $\hat\mu_{Y|X} = 2.22494 + 0.152119X$, and $S_\varepsilon = 0.3444$. Construct a 95% PI for a future observation made at $X = 25$.
Solution. The point predictor is $\hat\mu_{Y|X=25} = 6.028$, and the 95% PI at $X = 25$ is $6.028 \pm 0.8165$, as obtained from the formula
$$\hat\mu_{Y|X=25} \pm t_{9,0.025}\,(0.3444)\sqrt{1 + \frac{1}{11} + \frac{11(1.627)^2}{11\sum X_i^2 - \left(\sum X_i\right)^2}}.$$
The 95% CI for $\mu_{Y|X=25}$ was found to be $6.028 \pm 0.245$. This demonstrates that PIs are wider than CIs.
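The PI and CI half-widths at X = 25 can be computed side by side from the summary statistics. A minimal sketch; the variable names are illustrative, and the fitted coefficients and the value 0.3444 are taken as given in the example:

```r
# Prediction vs. confidence interval half-widths at x0 = 25.
n <- 11; sx <- 292.90; sxx <- 8141.75
a1 <- 2.22494; b1 <- 0.152119; s_eps <- 0.3444   # values given in the example
x0 <- 25; xbar <- sx / n
lev <- 1 / n + n * (x0 - xbar)^2 / (n * sxx - sx^2)
t_crit  <- qt(0.975, df = n - 2)
fit     <- a1 + b1 * x0
half_pi <- t_crit * s_eps * sqrt(1 + lev)   # prediction interval half-width
half_ci <- t_crit * s_eps * sqrt(lev)       # confidence interval half-width
round(c(fit = fit, half_pi = half_pi, half_ci = half_ci), 3)
```

With the raw data available, `predict(lm(y ~ x), newx, interval = "prediction")` returns a prediction interval directly, in the same way that `interval = "confidence"` returns the CI.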