lesson 8 chapter 7: confidence and prediction...

Lesson 8 Chapter 7: Confidence and Prediction Intervals

Michael Akritas

Department of Statistics, The Pennsylvania State University


Post on 14-Aug-2020


Source: bigdataudesa.weebly.com/uploads/8/6/9/0/86901080/ci_cp.pdf


1 Introduction to Confidence Intervals

2 CIs for a Mean and a Proportion

3 CIs for the Regression Parameters

4 The issue of Precision

5 Prediction Intervals


Bounding the Error of Estimation

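By the CLT, if $n$ is large, most estimators $\hat{\theta}$ are (at least approximately) normally distributed, with mean equal to the true value $\theta$ of the parameter they estimate. Thus,

$$\hat{\theta} \stackrel{\cdot}{\sim} N\left(\theta, \sigma^2_{\hat{\theta}}\right),$$

where $\stackrel{\cdot}{\sim}$ means "is approximately distributed as".

The above fact provides probabilistic bounds on the size of the estimation error:

$$\left|\hat{\theta} - \theta\right| \le 1.96\,\sigma_{\hat{\theta}} \quad \text{holds with probability approximately } 0.95.$$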
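The 95% statement can be checked by simulation. The sketch below is illustrative only and rests on assumptions not in the text: the population is exponential with mean $\theta = 1$ (so $\sigma = 1$, a clearly non-normal case), $n = 50$, and $\hat\theta$ is the sample mean, whose standard error is $\sigma_{\hat\theta} = \sigma/\sqrt{n}$. It records how often $|\hat\theta - \theta| \le 1.96\sigma_{\hat\theta}$ holds over 10,000 simulated samples.

```r
# Simulated check of the 95% error bound for the sample mean.
# Assumed setup: Exponential population with mean theta = 1 and sd sigma = 1, n = 50.
set.seed(2020)
theta <- 1; sigma <- 1; n <- 50; reps <- 10000
se <- sigma / sqrt(n)                                  # standard error of the sample mean
theta_hat <- replicate(reps, mean(rexp(n, rate = 1)))  # one estimate per simulated sample
within_bound <- abs(theta_hat - theta) <= 1.96 * se    # does the error bound hold?
coverage <- mean(within_bound)
cat("Proportion of samples satisfying the bound:", coverage, "\n")
```

The printed proportion should be close to 0.95; it is only approximately 0.95 because the CLT approximation is not exact for $n = 50$.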


From Error Bounds to Confidence Intervals

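The probabilistic error bound can be rewritten as

$$\hat{\theta} - 1.96\,\sigma_{\hat{\theta}} \le \theta \le \hat{\theta} + 1.96\,\sigma_{\hat{\theta}},$$

i.e., an interval of plausible values for $\theta$, with degree of plausibility approximately 95%.

Such intervals are called confidence intervals (CIs).

In general, the 100(1 − α)% CI is of the form

$$\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}} \le \theta \le \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}, \quad \text{or} \quad \hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}},$$

where $z_{\alpha/2}$ is the 100(1 − α/2)th percentile of the standard normal distribution (e.g., $z_{0.025} = 1.96$).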
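The general form $\hat\theta \pm z_{\alpha/2}\sigma_{\hat\theta}$ can be coded directly. The helper z_ci below is hypothetical (it is not a base R function); it relies on qnorm(1 - alpha/2), which returns $z_{\alpha/2}$, and assumes the standard error is supplied by the user.

```r
# Hypothetical helper: 100(1 - alpha)% Z interval, theta_hat +/- z_{alpha/2} * se
z_ci <- function(theta_hat, se, level = 0.95) {
  alpha <- 1 - level
  z <- qnorm(1 - alpha / 2)   # z_{alpha/2}: upper alpha/2 point of N(0, 1)
  c(lower = theta_hat - z * se, upper = theta_hat + z * se)
}

z_ci(0, 1)                  # about -1.96, 1.96
z_ci(0, 1, level = 0.90)    # about -1.645, 1.645
```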


Z Intervals

100(1 − α)% CIs that use percentiles of the standard normal distribution, $z_{\alpha/2}$, as above, are called Z intervals.

Z intervals for the mean require a known variance, and either the assumption of normality or $n \ge 30$. Typically, the variance is not known.

Z intervals will be primarily used for proportions.


The T Distribution and T Intervals

When sampling from normal populations, an estimator $\hat{\theta}$ of some parameter $\theta$ often satisfies, for all sample sizes $n$,

$$\frac{\hat{\theta} - \theta}{\hat{\sigma}_{\hat{\theta}}} \sim T_\nu,$$

where $\hat{\sigma}_{\hat{\theta}}$ is the estimated standard error and $T_\nu$ stands for the T distribution with $\nu$ degrees of freedom.

A T distribution is symmetric about zero, and its pdf tends to that of the standard normal as $\nu$ tends to infinity.

The 100(1 − α/2)th percentile of the T distribution with $\nu$ degrees of freedom will be denoted by $t_{\nu,\alpha/2}$.


Figure: PDF and Percentile of a T Distribution (the p.d.f. of the t distribution with $\nu$ degrees of freedom; the area to the right of $t_{\nu,\alpha}$ equals $\alpha$).

As the degrees of freedom $\nu$ get large, $t_{\nu,\alpha/2}$ approaches $z_{\alpha/2}$. For example, for $\nu$ = 9, 19, 60 and 120, $t_{\nu,0.05}$ is

1.833, 1.729, 1.671, 1.658,

respectively, while $z_{0.05}$ = 1.645.
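These table values can be reproduced with R's quantile functions: qt(p, df) returns the 100p-th percentile of the t distribution, so $t_{\nu,0.05}$ is qt(0.95, df = nu), and qnorm(0.95) gives $z_{0.05}$.

```r
# Upper 5% points t_{nu, 0.05} for increasing degrees of freedom, versus z_{0.05}
nu <- c(9, 19, 60, 120)
t_05 <- qt(0.95, df = nu)
z_05 <- qnorm(0.95)
print(round(t_05, 3))   # 1.833 1.729 1.671 1.658
print(round(z_05, 3))   # 1.645
```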


Plots of N(0,1) and T densities in R

http://stat.psu.edu/~mga/401/fig/ComparTdensit.pdf
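The linked figure is not reproduced here. A minimal base R sketch of such a comparison follows; the degrees of freedom 1, 3 and 10 are arbitrary choices, and the resulting plot need not match the linked PDF.

```r
# Overlay the N(0,1) density with t densities for a few degrees of freedom
curve(dnorm(x), from = -4, to = 4, lwd = 2, ylab = "density",
      main = "N(0,1) vs. t densities")
dfs <- c(1, 3, 10)
for (i in seq_along(dfs)) {
  curve(dt(x, df = dfs[i]), add = TRUE, lty = i + 1)   # heavier tails for small df
}
legend("topright", legend = c("N(0,1)", paste0("t, df = ", dfs)),
       lty = 1:(length(dfs) + 1), lwd = c(2, rep(1, length(dfs))))
```

The t curves sit below the normal curve at 0 and above it in the tails, approaching the normal curve as the degrees of freedom grow.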


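The relation $(\hat{\theta} - \theta)/\hat{\sigma}_{\hat{\theta}} \sim t_\nu$, which also holds approximately when sampling from non-normal populations provided $n \ge 30$, leads to the following $1 - \alpha$ bound on the error of estimation of $\theta$:

$$\left|\hat{\theta} - \theta\right| \le t_{\nu,\alpha/2}\,\hat{\sigma}_{\hat{\theta}}.$$

This error bound leads to the following (1 − α)100% CI for $\theta$:

$$\left(\hat{\theta} - t_{\nu,\alpha/2}\,\hat{\sigma}_{\hat{\theta}},\ \hat{\theta} + t_{\nu,\alpha/2}\,\hat{\sigma}_{\hat{\theta}}\right). \tag{2.1}$$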

T intervals will be used for the mean, as well as for the regression parameters in the linear regression model.


Read Section 7.2 CI Semantics: The Meaning of “Confidence”


Figure: 50 CIs for p (horizontal axis: end points of the CIs, roughly 0.2 to 0.8; vertical axis: CI count, 0 to 50).
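The idea behind the figure, that intervals built this way cover p in a fixed long-run proportion of samples, can be simulated. The sketch assumes a true p = 0.5 and n = 100 trials per sample (values chosen for illustration, not read off the figure) and uses the Z interval for p given later in this lesson.

```r
# Simulate 50 independent 95% Z intervals for p and count how many contain p.
# Assumed settings (not from the text): true p = 0.5, n = 100 trials per sample.
set.seed(1)
p <- 0.5; n <- 100; reps <- 50
phat <- rbinom(reps, size = n, prob = p) / n          # one estimate per sample
half <- qnorm(0.975) * sqrt(phat * (1 - phat) / n)    # z_{0.025} times estimated SE
lower <- phat - half
upper <- phat + half
covers <- lower <= p & p <= upper
cat("Intervals containing p:", sum(covers), "of", reps, "\n")

# Optional picture: one horizontal segment per CI, misses drawn in red
plot(NULL, xlim = range(lower, upper), ylim = c(1, reps),
     xlab = "End points of CIs", ylab = "CI count")
segments(lower, seq_len(reps), upper, seq_len(reps),
         col = ifelse(covers, "black", "red"))
abline(v = p, lty = 2)
```

Any single interval either contains p or it does not; the 95% describes the long-run proportion of intervals that do.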


T CIs for the Mean: Proposition

Let $X_1, \ldots, X_n$ be a simple random sample from a population with mean $\mu$ and variance $\sigma^2$, both unknown. Then

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1} \tag{3.1}$$

holds exactly for any $n$ if the population is normal, and holds approximately for non-normal populations provided $n \ge 30$.


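The Proposition yields the $1 - \alpha$ error bound

$$\left|\bar{X} - \mu\right| \le t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}},$$

which leads to the following (1 − α)100% CI for the mean:

$$\left(\bar{X} - t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}}\right).$$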


Example: The mean weight loss of n = 16 grinding balls after a certain length of time in mill slurry is 3.42 g with S = 0.68 g. Construct a 99% CI for the true mean weight loss.

Solution. Because n < 30, we must assume that the (statistical) population of weight losses is normal. (In a real-life application, the normality assumption should be verified by a histogram or a Q-Q plot of the data.) For α = 0.01 and n − 1 = 15 DF, Table A.4 gives $t_{n-1,\alpha/2} = t_{15,0.005} = 2.947$. Thus the desired 99% CI for $\mu$ is

$$3.42 \pm 2.947\,\frac{0.68}{\sqrt{16}} = 3.42 \pm 0.50, \quad \text{or} \quad 2.92 < \mu < 3.92.$$
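The same calculation in R, starting from the summary statistics; qt(0.995, df = 15) plays the role of the Table A.4 lookup. The variable names are our own.

```r
# 99% t interval for the mean from summary statistics (grinding-ball example)
xbar <- 3.42; s <- 0.68; n <- 16; level <- 0.99
tq <- qt(1 - (1 - level) / 2, df = n - 1)   # t_{15, 0.005}, about 2.947
ci <- xbar + c(-1, 1) * tq * s / sqrt(n)    # xbar -/+ t * S / sqrt(n)
print(round(ci, 2))   # 2.92 3.92
```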


R command for the t-interval for the mean

With the data in x, the commands

"lm(x ~ 1)" and "confint(lm(x ~ 1))"

return, respectively, $\bar{X}$ and the 95% CI for $\mu$, i.e., the pair of values $\bar{X} \pm t_{n-1,0.025}\,S/\sqrt{n}$. For the 90% CI for $\mu$, use

"confint(lm(x ~ 1), level = 0.9)"

(∗) The pair of values $\bar{X} \pm t_{n-1,0.025}\,S/\sqrt{n}$ can also be obtained as

"mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))"
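Both routes can be checked against each other on a simulated sample. The data below are hypothetical (a normal sample of size 20 with an arbitrary seed); the point is only that confint(lm(x ~ 1)) and the explicit formula agree.

```r
# Compare confint(lm(x ~ 1)) with the explicit t-interval formula
set.seed(42)
x <- rnorm(20, mean = 10, sd = 2)    # hypothetical sample
fit <- lm(x ~ 1)
coef(fit)                            # the intercept equals mean(x)
ci_lm <- confint(fit)                # 95% CI for mu (one row, "(Intercept)")
ci_90 <- confint(fit, level = 0.9)   # 90% CI for mu
ci_manual <- mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
print(ci_lm)
print(ci_manual)
```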


Z CIs for Proportions

CIs for p, however, are slightly different because:

1 We are typically given only $T = X_1 + \cdots + X_n$, or $\bar{X} = \hat{p}$.

2 $\sigma^2$ is estimated by $\hat{p}(1 - \hat{p})$.

3 We use percentiles of the normal distribution, based on the approximate result

$$\hat{p} \stackrel{\cdot}{\sim} N\left(p, \frac{\hat{p}(1-\hat{p})}{n}\right),$$

which holds if $n\hat{p} \ge 8$ and $n(1 - \hat{p}) \ge 8$, i.e., if there are at least eight 1s and at least eight 0s.


The above approximate distribution of $\hat{p}$ leads to the (approximate) (1 − α)100% CI

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$


Example: A Gallup Survey estimated the proportion of adults across the country who drink beer, wine, or hard liquor, at least occasionally. Of the 1516 adults interviewed, 985 said they drank. Find a 95% confidence interval for the proportion, p, of all Americans who drink.

Solution: Here α = 0.05, $z_{0.025}$ = 1.96, and $\hat{p} = 985/1516 = 0.65$. Thus

$$\frac{985}{1516} \pm 1.96\sqrt{\frac{0.65 \times 0.35}{1516}} = 0.65 \pm 0.024.$$

QUESTION: An interpretation of the above CI is that the probability is 0.95 that the true proportion of adults who drink lies in the interval you obtained. True or False?


R command for z-intervals for a proportion

With T being the number of "successes" in n trials, set "phat = T/n" and use the command

"phat + c(-1, 1) * qnorm(0.975) * sqrt(phat * (1 - phat) / n)"

to obtain the 95% CI for p, i.e., the pair of values $\hat{p} \pm z_{0.025}\sqrt{\hat{p}(1-\hat{p})/n}$. To obtain 90% or other CIs, adjust the 0.975 in the above command accordingly (e.g., 0.95 for a 90% CI).
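Applied to the Gallup example, the command reads as follows. The variable name T_succ is our own choice: in R, T is a built-in shorthand for TRUE, so storing the count under another name avoids surprises. The sketch also checks the "at least eight 1s and eight 0s" condition before computing the interval.

```r
# 95% Z interval for p from the Gallup survey counts
T_succ <- 985; n <- 1516          # T_succ avoids masking R's built-in T (TRUE)
phat <- T_succ / n
stopifnot(n * phat >= 8, n * (1 - phat) >= 8)   # normal approximation condition
half <- qnorm(0.975) * sqrt(phat * (1 - phat) / n)
ci <- phat + c(-1, 1) * half
print(round(phat, 2))   # 0.65
print(round(half, 3))   # 0.024
```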


T CIs for the Slope of a Regression Line: Proposition

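Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be iid satisfying $E(Y_i \mid X_i = x) = \alpha_1 + \beta_1 x$ and $\mathrm{Var}(Y_i \mid X_i = x) = \sigma^2_\varepsilon$, the same for all $x$. Then the estimated standard error of $\hat{\beta}_1$ is

$$\hat{\sigma}_{\hat{\beta}_1} = \sqrt{\frac{S^2_\varepsilon}{\sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2}},$$

where

$$S^2_\varepsilon = \frac{1}{n-2}\left[\sum_{i=1}^n Y_i^2 - \hat{\alpha}_1 \sum_{i=1}^n Y_i - \hat{\beta}_1 \sum_{i=1}^n X_i Y_i\right]$$

is the estimator of the intrinsic variability. NOTE: $\hat{\sigma}_{\hat{\beta}_1}$ is also denoted by $S_{\hat{\beta}_1}$.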


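We saw that, under the normality assumption,

$$\frac{\hat{\beta}_1 - \beta_1}{\hat{\sigma}_{\hat{\beta}_1}} \sim t_{n-2}.$$

This leads to the 100(1 − α)% error bound

$$\left|\hat{\beta}_1 - \beta_1\right| < t_{n-2,\alpha/2}\,\hat{\sigma}_{\hat{\beta}_1},$$

and the corresponding 100(1 − α)% CI for $\beta_1$:

$$\left(\hat{\beta}_1 - t_{n-2,\alpha/2}\,\hat{\sigma}_{\hat{\beta}_1},\ \hat{\beta}_1 + t_{n-2,\alpha/2}\,\hat{\sigma}_{\hat{\beta}_1}\right).$$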


Example (Y=propagation of stress wave, X=tensile strength)

In this study, $n = 14$, $\sum_i X_i = 890$, $\sum_i X_i^2 = 67{,}182$, $\sum_i Y_i = 37.6$, $\sum_i Y_i^2 = 103.54$ and $\sum_i X_i Y_i = 2234.30$. Let $Y_1$ denote an observation made at $X_1 = 30$, and $Y_2$ an observation made at $X_2 = 35$. Construct a 95% CI for $E(Y_1 - Y_2)$.

Solution. Note that $E(Y_1 - Y_2) = \beta_1(30 - 35) = -5\beta_1$. We will first construct a 95% CI for $\beta_1$. We have $\hat{\beta}_1 = -0.0147109$, $\hat{\alpha}_1 = 3.6209072$, and $S^2_\varepsilon = 0.02187$. Thus,

$$\hat{\sigma}_{\hat{\beta}_1} = \sqrt{\frac{S^2_\varepsilon}{\sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2}} = \sqrt{\frac{0.02187}{67{,}182 - \frac{1}{14}890^2}} = \sqrt{\frac{0.02187}{10{,}603.43}} = 0.001436,$$

so that the 95% CI for $\beta_1$ is

$$\hat{\beta}_1 \pm t_{12,0.025}\,\hat{\sigma}_{\hat{\beta}_1} = -0.0147109 \pm 2.179 \times 0.001436 = -0.0147109 \pm 0.00313 = (-0.01784, -0.01158).$$

The 95% CI for $-5\beta_1$ now follows easily:

$$-5\hat{\beta}_1 \pm 5\,t_{12,0.025}\,\hat{\sigma}_{\hat{\beta}_1} = 5(0.0147109) \pm 5 \times 2.179 \times 0.001436 = 0.07355 \pm 0.01565 = (0.0579, 0.0892).$$
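The slope calculations can be recomputed from the summary statistics alone. A minimal base R sketch (variable names are our own): it rebuilds $\hat\beta_1$, $\hat\alpha_1$, $S^2_\varepsilon$ and $\hat\sigma_{\hat\beta_1}$ from the sums, then forms the CI for $\beta_1$ and, by multiplying by −5 and re-sorting the endpoints, the CI for $E(Y_1 - Y_2)$.

```r
# Slope CI from the summary statistics of the stress-wave example
n <- 14; sum_x <- 890; sum_x2 <- 67182; sum_y <- 37.6; sum_y2 <- 103.54; sum_xy <- 2234.30
Sxx <- sum_x2 - sum_x^2 / n                 # sum(X^2) - (sum X)^2 / n
Sxy <- sum_xy - sum_x * sum_y / n           # sum(XY) - (sum X)(sum Y) / n
b1 <- Sxy / Sxx                             # beta1-hat, about -0.01471
a1 <- sum_y / n - b1 * sum_x / n            # alpha1-hat = Ybar - beta1-hat * Xbar
S2 <- (sum_y2 - a1 * sum_y - b1 * sum_xy) / (n - 2)   # S^2_epsilon
se_b1 <- sqrt(S2 / Sxx)                     # estimated SE of beta1-hat, about 0.001436
tq <- qt(0.975, df = n - 2)                 # t_{12, 0.025}, about 2.179
ci_b1 <- b1 + c(-1, 1) * tq * se_b1         # 95% CI for beta1
ci_diff <- sort(-5 * ci_b1)                 # 95% CI for E(Y1 - Y2) = -5 * beta1
print(signif(c(b1 = b1, a1 = a1, S2 = S2, se_b1 = se_b1), 4))
print(round(ci_b1, 5))
print(round(ci_diff, 4))
```

Multiplying by a negative constant reverses the order of the endpoints, which is why sort() is applied before reporting the interval for $-5\beta_1$.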


T CIs for the Regression Line

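Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be iid satisfying $E(Y_i \mid X_i = x) = \alpha_1 + \beta_1 x$ and $\mathrm{Var}(Y_i \mid X_i = x) = \sigma^2_\varepsilon$, the same for all $x$. Then the estimated standard error of $\hat{\mu}_{Y|X=x} = \hat{\alpha}_1 + \hat{\beta}_1 x$ is

$$\hat{\sigma}_{\hat{\mu}_{Y|X=x}} = S_\varepsilon\sqrt{\frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum X_i^2 - \left(\sum X_i\right)^2}},$$

where

$$S^2_\varepsilon = \frac{1}{n-2}\left[\sum_{i=1}^n Y_i^2 - \hat{\alpha}_1 \sum_{i=1}^n Y_i - \hat{\beta}_1 \sum_{i=1}^n X_i Y_i\right]$$

is the estimator of the intrinsic variability. NOTE: $\hat{\sigma}_{\hat{\mu}_{Y|X=x}}$ is also denoted by $S_{\hat{\mu}_{Y|X=x}}$.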


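We saw that, under the normality assumption,

$$\frac{\hat{\mu}_{Y|X=x} - \mu_{Y|X=x}}{\hat{\sigma}_{\hat{\mu}_{Y|X=x}}} \sim t_{n-2}.$$

This leads to the 100(1 − α)% error bound

$$\left|\hat{\mu}_{Y|X=x} - \mu_{Y|X=x}\right| < t_{n-2,\alpha/2}\,\hat{\sigma}_{\hat{\mu}_{Y|X=x}},$$

and the corresponding 100(1 − α)% CI for $\mu_{Y|X=x}$:

$$\left(\hat{\mu}_{Y|X=x} - t_{n-2,\alpha/2}\,\hat{\sigma}_{\hat{\mu}_{Y|X=x}},\ \hat{\mu}_{Y|X=x} + t_{n-2,\alpha/2}\,\hat{\sigma}_{\hat{\mu}_{Y|X=x}}\right).$$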


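The term $n(x - \bar{X})^2$ in the expression for $\hat{\sigma}_{\hat{\mu}_{Y|X=x}}$ means that confidence intervals for $\mu_{Y|X=x}$ get wider as $x$ gets farther away from $\bar{X}$.

Figure: Confidence Intervals for $\mu_{Y|X=x}$ Get Wider Away from $\bar{X}$

Estimation of $\mu_{Y|X=x}$ for $x < X_{(1)}$ or $x > X_{(n)}$, i.e., outside the range of the observed $X$ values, is NOT recommended.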


Example

n = 11 data points yield $\sum X_i = 292.90$, $\sum Y_i = 69.03$, $\sum X_i^2 = 8141.75$, $\sum X_i Y_i = 1890.2$, $\sum Y_i^2 = 442.1903$, $\hat{\mu}_{Y|X=x} = 2.22494 + 0.152119x$, and $S_\varepsilon = 0.3444$. Construct 95% CIs for $\mu_{Y|X=26.627}$ and $\mu_{Y|X=25}$. [Note that $\bar{X} = 26.627$.]

Solution: First,

$$S_{\hat{\mu}_{Y|X=x}} = 0.3444\sqrt{\frac{1}{11} + \frac{11(x - 26.627)^2}{11(8141.75) - (292.9)^2}},$$

so that $S_{\hat{\mu}_{Y|X=26.627}} = 0.1038$ and $S_{\hat{\mu}_{Y|X=25}} = 0.1082$. Thus, with $t_{9,0.025} = 2.262$,

$$\hat{\mu}_{Y|X=25} \pm t_{9,0.025}(0.1082) = 6.028 \pm 0.245, \quad \text{CI at } X = 25,$$

$$\hat{\mu}_{Y|X=26.627} \pm t_{9,0.025}(0.1038) = 6.275 \pm 0.235, \quad \text{CI at } X = 26.627.$$
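The two intervals can be recomputed in R from the summary statistics of this example. This is a sketch; se_mu is a hypothetical helper implementing the formula for $S_{\hat\mu_{Y|X=x}}$, and qt(0.975, df = 9) replaces the table value $t_{9,0.025} = 2.262$.

```r
# CIs for the regression line at x = 25 and x = Xbar from summary statistics
n <- 11; sum_x <- 292.90; sum_x2 <- 8141.75
a1 <- 2.22494; b1 <- 0.152119; S_eps <- 0.3444
xbar <- sum_x / n                                   # about 26.627
se_mu <- function(x) {                              # hypothetical helper: S_{mu-hat at x}
  S_eps * sqrt(1 / n + n * (x - xbar)^2 / (n * sum_x2 - sum_x^2))
}
tq <- qt(0.975, df = n - 2)                         # t_{9, 0.025}, about 2.262
x0 <- c(25, xbar)
mu_hat <- a1 + b1 * x0                              # point estimates of mu_{Y|X=x}
half <- tq * se_mu(x0)                              # CI half-widths
print(round(cbind(x = x0, mu_hat, se = se_mu(x0), half), 4))
```

The larger standard error at x = 25 reflects its distance from $\bar{X}$.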


R commands for CIs in regression

CIs for the intercept and slope:

"confint(lm(y ~ x))" (or "confint(lm(y ~ x), level = 0.95)") gives 95% CIs for both $\alpha_1$ and $\beta_1$.

"confint(lm(y ~ x), level = 0.90)" gives 90% CIs for both $\alpha_1$ and $\beta_1$.

"confint(lm(y ~ x), parm = 'x', level = 0.90)" gives a 90% CI only for $\beta_1$.

"confint(lm(y ~ x), parm = '(Intercept)', level = 0.90)" gives a 90% CI only for $\alpha_1$.

CIs for $\mu_{Y|X}(x)$ at, e.g., x = 5.5: "newx = data.frame(x = 5.5)", then "predict(lm(y ~ x), newx, interval = 'confidence', level = 0.9)".

Use "newx = data.frame(x = c(4.5, 5.5))" above for multiple CIs.
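A runnable illustration of these commands on simulated data (the data-generating line, sample size and seed are hypothetical). It also checks that the interval returned by predict() at x = 5.5 agrees with the formula $\hat\mu_{Y|X=x} \pm t_{n-2,\alpha/2}\hat\sigma_{\hat\mu_{Y|X=x}}$, taking $S_\varepsilon$ as the residual standard error reported by summary().

```r
# Regression CIs with confint() and predict() on hypothetical simulated data
set.seed(7)
x <- runif(25, 0, 10)                       # hypothetical predictor values
y <- 1 + 0.5 * x + rnorm(25, sd = 1)        # hypothetical responses
fit <- lm(y ~ x)
ci_slope <- confint(fit, parm = "x", level = 0.90)    # 90% CI for beta1 only
newx <- data.frame(x = c(4.5, 5.5))
ci_mean <- predict(fit, newx, interval = "confidence", level = 0.90)
print(ci_slope)
print(ci_mean)

# Manual check of the 90% CI at x = 5.5 using the standard-error formula
n <- length(x)
S_eps <- summary(fit)$sigma                 # residual standard error, S_epsilon
se_mu <- S_eps * sqrt(1 / n + n * (5.5 - mean(x))^2 / (n * sum(x^2) - sum(x)^2))
mu_hat <- sum(coef(fit) * c(1, 5.5))        # alpha1-hat + beta1-hat * 5.5
manual <- mu_hat + c(-1, 1) * qt(0.95, df = n - 2) * se_mu
print(manual)
```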


Generalities

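Precision in estimation is quantified by the size of the probabilistic error bound, or by the length of the CI. Error bounds are of the form

$$\left|\bar{X} - \mu\right| \le t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}} \quad (\sigma \text{ unknown; normal population, or } n \ge 30),$$

$$\left|\hat{p} - p\right| \le z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad (n\hat{p} \ge 8,\ n(1-\hat{p}) \ge 8).$$

Thus error bounds depend on $n$ and on $\alpha$, since, e.g.,

$$z_{0.05} = 1.645 < z_{0.025} = 1.96 < z_{0.005} = 2.576.$$

In improving precision, we do not want to adjust $\alpha$; instead, we choose the sample size $n$.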


The Ideal Case: σ known

To construct a (1 − α)100% CI having a prescribed length $L$, the sample size $n$ is found by solving the equation

$$2 z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = L.$$

The solution is

$$n = \left(2 z_{\alpha/2}\frac{\sigma}{L}\right)^2.$$

If the solution is not an integer (as is typically the case), the number is rounded up. Rounding up guarantees that the prescribed precision objective will be more than met.


The Ideal Case: An Example

Example: The time to response (in milliseconds) to an editing command with a new operating system is normally distributed with an unknown mean $\mu$ and $\sigma = 25$. We want a 95% CI for $\mu$ of length L = 10 milliseconds. What sample size n should be used?

Solution. For a 95% CI, α/2 = 0.025 and $z_{0.025}$ = 1.96. Thus

$$n = \left(2(1.96)\frac{25}{10}\right)^2 = 96.04,$$

which is rounded up to n = 97.
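The sample-size formula is easy to wrap in a function. n_for_mean is a hypothetical helper name; ceiling() implements the rounding-up rule, and qnorm() supplies $z_{\alpha/2}$ (1.959964 rather than the rounded 1.96, which gives the same answer here).

```r
# Hypothetical helper: smallest n giving a (1 - alpha)100% Z interval of length at most L
n_for_mean <- function(sigma, L, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)      # z_{alpha/2}
  ceiling((2 * z * sigma / L)^2)       # round up so the CI length is at most L
}

n_for_mean(sigma = 25, L = 10)         # 97
```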


The Realistic Case: σ unknown

Sample size determination must rely on a preliminary approximation, $S_{prl}$, of $\sigma$. Two common methods are:

1 If the range of population values is known, use

$$S_{prl} = \frac{\text{range}}{3.5} \quad \text{or} \quad S_{prl} = \frac{\text{range}}{4}.$$

This approximation is inspired by the standard deviation of a U(a, b) random variable, which is $\sigma = (b - a)/\sqrt{12} = (b - a)/3.464$.

2 Use the standard deviation, $S_{prl}$, of a preliminary sample. This is somewhat cumbersome because it requires some trial-and-error iterations.
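A sketch of method 1, using a hypothetical known range of 100 units and the target length L = 10 from the previous example. The second part is one possible way (an assumption, not necessarily what the text intends) to carry out trial-and-error iterations: since $S_{prl}$ only approximates $\sigma$, it replaces $z_{\alpha/2}$ by $t_{n-1,\alpha/2}$, whose value depends on n, and repeats the calculation until n stops changing.

```r
range_prl <- 100                      # hypothetical known range of population values
S_prl <- range_prl / 4                # method 1: preliminary approximation of sigma
L <- 10; level <- 0.95
alpha <- 1 - level
n_z <- ceiling((2 * qnorm(1 - alpha / 2) * S_prl / L)^2)   # z-based sample size

# Assumed trial-and-error refinement: the t percentile depends on n, so recompute
# n with t_{n-1, alpha/2} until it stops changing (capped at 50 rounds)
n_t <- n_z
for (round_i in 1:50) {
  tq <- qt(1 - alpha / 2, df = n_t - 1)
  n_new <- ceiling((2 * tq * S_prl / L)^2)
  if (n_new == n_t) break
  n_t <- n_new
}
cat("n with z:", n_z, "  n after t refinement:", n_t, "\n")
```

The t-based answer is slightly larger than the z-based one, since $t_{n-1,\alpha/2} > z_{\alpha/2}$.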


Sample Size Determination for Estimating p

Equating the length of the (1 − α)100% CI for p to L and solving for n gives

$$n = \frac{4 z^2_{\alpha/2}\,\hat{p}(1-\hat{p})}{L^2}. \quad \text{Round up.}$$

Two commonly used methods for obtaining a preliminary approximation, $\hat{p}_{prl}$, are:

1 Obtain $\hat{p}_{prl}$ either from a small pilot sample or from expert opinion, and use it in the above formula.

2 Replace $\hat{p}(1-\hat{p})$ in the formula by 0.25, its largest possible value. This gives

$$n = \frac{z^2_{\alpha/2}}{L^2}. \quad \text{Round up.}$$


Example

A preliminary sample gave $\hat{p}_{prl}$ = 0.91. How large should n be to estimate the probability of interest to within 0.01 with 95% confidence?

Solution. "To within 0.01" is another way of saying that the 95% bound on the error of estimation should be 0.01, or that the desired CI should have a length of L = 0.02. Since we have preliminary information, we use the first formula:

$$n = \frac{4(1.96)^2(0.91)(0.09)}{(0.02)^2} = 3146.27.$$

This is rounded up to 3147.


Example: A new method of pre-coating fittings used in oil, brake and other fluid systems in heavy-duty trucks is being studied. How large an n is needed to estimate the proportion of fittings that leak to within 0.02 with 90% confidence? (No prior information is available.)

Solution. Here we have no preliminary information about p, and the desired CI length is L = 0.04. Thus, we apply the second formula and obtain

$$n = \frac{z^2_{\alpha/2}}{L^2} = \frac{(1.645)^2}{(0.04)^2} = 1691.27.$$

This is rounded up to 1692.
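Both proportion examples can be reproduced with a small helper. n_for_prop is a hypothetical name; calling it without a preliminary estimate uses the worst case $\hat p(1-\hat p) = 0.25$. The table values 1.96 and 1.645 are passed explicitly so the results match the hand calculations.

```r
# Hypothetical helper: sample size needed to estimate p with a CI of length L
n_for_prop <- function(L, z, p_prl = NULL) {
  pq <- if (is.null(p_prl)) 0.25 else p_prl * (1 - p_prl)   # 0.25 is the worst case
  ceiling(4 * z^2 * pq / L^2)                               # round up
}

n_for_prop(L = 0.02, z = 1.96, p_prl = 0.91)   # 3147 (preliminary estimate available)
n_for_prop(L = 0.04, z = 1.645)                # 1692 (no prior information)
```

With the more precise qnorm(0.95) = 1.64485 in place of 1.645, the second calculation gives about 1690.96, which rounds up to 1691 rather than 1692; the difference comes only from rounding the percentile.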


Prediction refers to estimating an observation. It is related to estimating the mean, but prediction intervals (PIs) are different from CIs. For example:

1 Predicting the fat content of the hot dog you are about to eat is related to estimating the mean fat content of hot dogs. But the PI is different from the CI.

2 Predicting the failure time of your resistor from its resistance is related to estimating the mean failure time of all resistors having the same resistance. But the PI is different from the CI.

In the first example, there is no explanatory variable. The second example involves a regression context.

We begin with the case of no explanatory variable.


Prediction Based on a Univariate Sample

To emphasize the difference between PIs and CIs, suppose that the amount of fat in a randomly selected hot dog is N(20, 9), i.e., it has mean 20 and variance 9 (so σ = 3). Thus there are no unknown parameters to be estimated, and no need to construct a CI.

Still, the amount of fat, X, in the hot dog which one is about to eat is unknown, simply because it is a random variable.

According to well-accepted criteria, the best point predictor of a normal random variable with mean $\mu$ is $\mu$.

A (1 − α)100% PI is an interval that contains the random variable with probability 1 − α, namely $\mu \pm z_{\alpha/2}\sigma$.

In the hot dog example, X ∼ N(20, 9), so the best point predictor of X is 20, and a 95% PI is 20 ± (1.96)(3), i.e., (14.12, 25.88).
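Because the distribution in the hot dog example is fully known, the PI and its 0.95 coverage can be checked directly. The sketch below computes the exact probability with pnorm() and a simulated frequency; the seed and the number of simulated hot dogs are arbitrary.

```r
# 95% PI for the fat content X ~ N(20, 9), i.e. mean 20 and sd 3
mu <- 20; sigma <- 3
pi95 <- mu + c(-1, 1) * qnorm(0.975) * sigma          # about (14.12, 25.88)
exact <- pnorm(pi95[2], mean = mu, sd = sigma) -
  pnorm(pi95[1], mean = mu, sd = sigma)                # P(X falls in the PI)
set.seed(3)
x_new <- rnorm(1e5, mean = mu, sd = sigma)             # simulated hot dogs
sim <- mean(x_new >= pi95[1] & x_new <= pi95[2])       # observed coverage
print(round(c(pi95, exact, sim), 3))
```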


Typically, $\mu$ and $\sigma$ are unknown and are estimated from a sample $X_1, \ldots, X_n$ by $\bar{X}$ and $S$, respectively. Then the best point predictor of a future observation is $\bar{X}$. The PI, however, must now take into account the variability of $\bar{X}$ and $S$ as estimators of $\mu$ and $\sigma$. Assuming normality, the (1 − α)100% PI for the next X is

$$\left(\bar{X} - t_{n-1,\alpha/2}\,S\sqrt{1 + \frac{1}{n}},\ \bar{X} + t_{n-1,\alpha/2}\,S\sqrt{1 + \frac{1}{n}}\right).$$

The variability of $\bar{X}$ is accounted for by the $\frac{1}{n}$ term, and the variability of $S$ is accounted for by the use of the t percentiles.


Example: The fat content measurements from a sample of n = 10 hot dogs gave sample mean and sample standard deviation $\bar{X}$ = 21.9 and S = 4.134. Give a 95% PI for the fat content, X, of the next hot dog to be sampled.

Solution: Assuming that the fat content of a randomly selected hot dog has a normal distribution, the best point predictor of X is $\bar{X}$ = 21.9, and the 95% PI is

$$\bar{X} \pm t_{9,0.025}\,S\sqrt{1 + \frac{1}{n}} = 21.9 \pm 2.262(4.134)\sqrt{1.1} = (12.09, 31.71).$$
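The same PI in R from the summary statistics, with qt(0.975, df = 9) in place of $t_{9,0.025}$. For comparison, the sketch also computes the CI for the mean, which omits the 1 under the square root and is therefore narrower.

```r
# 95% PI for the next hot dog's fat content, from summary statistics
xbar <- 21.9; s <- 4.134; n <- 10
tq <- qt(0.975, df = n - 1)                          # t_{9, 0.025}, about 2.262
pi95 <- xbar + c(-1, 1) * tq * s * sqrt(1 + 1 / n)   # prediction interval
ci95 <- xbar + c(-1, 1) * tq * s / sqrt(n)           # CI for the mean, for comparison
print(round(pi95, 2))   # 12.09 31.71
print(round(ci95, 2))
```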


PIs for the Normal Simple Linear Regression Model

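Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be n observations that follow the normal simple linear regression model, i.e., $Y_i \mid X_i = x_i \sim N(\alpha_1 + \beta_1 x_i, \sigma^2_\varepsilon)$. The point predictor for a future observation Y made at X = x is $\hat{\mu}_{Y|X=x} = \hat{\alpha}_1 + \hat{\beta}_1 x$. The 100(1 − α)% PI is

$$\hat{\mu}_{Y|X=x} \pm t_{n-2,\alpha/2}\,S_\varepsilon\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum X_i^2 - \left(\sum X_i\right)^2}}.$$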


Example

Consider again the study where n = 11, $\sum X_i = 292.90$, $\sum Y_i = 69.03$, $\sum X_i^2 = 8141.75$, $\sum X_i Y_i = 1890.200$, $\sum Y_i^2 = 442.1903$, $\hat{\mu}_{Y|X} = 2.22494 + 0.152119X$, and $S_\varepsilon$ = 0.3444. Construct a 95% PI for a future observation made at X = 25.

Solution. The point predictor is $\hat{\mu}_{Y|X=25}$ = 6.028, and the 95% PI at X = 25 is 6.028 ± 0.8165, as obtained from the formula

$$\hat{\mu}_{Y|X=25} \pm t_{9,0.025}(0.3444)\sqrt{1 + \frac{1}{11} + \frac{11(1.627)^2}{11\sum X_i^2 - \left(\sum X_i\right)^2}},$$

where 1.627 = 26.627 − 25 is the distance of x = 25 from $\bar{X}$.

The 95% CI for $\mu_{Y|X=25}$ was found to be 6.028 ± 0.245. This demonstrates that PIs are wider than CIs.
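A sketch reproducing the PI and CI half-widths at X = 25 from the summary statistics (qt() replaces the table value; variable names are our own). With raw data in vectors x and y, predict(lm(y ~ x), data.frame(x = 25), interval = "prediction") returns the corresponding PI directly.

```r
# PI versus CI at X = 25 from summary statistics
n <- 11; sum_x <- 292.90; sum_x2 <- 8141.75
a1 <- 2.22494; b1 <- 0.152119; S_eps <- 0.3444
x0 <- 25
xbar <- sum_x / n
lev <- n * (x0 - xbar)^2 / (n * sum_x2 - sum_x^2)   # distance-from-Xbar term
tq <- qt(0.975, df = n - 2)                          # t_{9, 0.025}
mu_hat <- a1 + b1 * x0                               # point predictor, about 6.028
half_pi <- tq * S_eps * sqrt(1 + 1 / n + lev)        # PI half-width, about 0.8165
half_ci <- tq * S_eps * sqrt(1 / n + lev)            # CI half-width, about 0.245
print(round(c(mu_hat = mu_hat, half_pi = half_pi, half_ci = half_ci), 4))
```

The extra 1 under the square root, the variance of the new observation itself, is what makes the PI so much wider than the CI.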
