

Lesson 8
Chapter 7: Confidence and Prediction Intervals

Michael Akritas

Department of Statistics
The Pennsylvania State University





1 Introduction to Confidence Intervals

2 CIs for a Mean and a Proportion

3 CIs for the Regression Parameters

4 The issue of Precision

5 Prediction Intervals





Bounding the Error of Estimation

By the CLT, if n is large, most estimators $\hat\theta$ are (at least approximately) normally distributed, with mean equal to the true value $\theta$ of the parameter they estimate. Thus,

$$\hat\theta \overset{\cdot}{\sim} N\left(\theta, \sigma^2_{\hat\theta}\right).$$

The above fact provides probabilistic bounds on the size of the estimation error: $|\hat\theta - \theta| \le 1.96\,\sigma_{\hat\theta}$ holds 95% of the time.





From Error Bounds to Confidence Intervals

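The probabilistic error bound can be rewritten as

$$\hat\theta - 1.96\,\sigma_{\hat\theta} \le \theta \le \hat\theta + 1.96\,\sigma_{\hat\theta},$$

i.e., an interval of plausible values for $\theta$, with degree of plausibility approximately 95%.

Such intervals are called confidence intervals (CIs).

In general, the $100(1-\alpha)\%$ CI is of the form

$$\hat\theta - z_{\alpha/2}\,\sigma_{\hat\theta} \le \theta \le \hat\theta + z_{\alpha/2}\,\sigma_{\hat\theta}, \quad\text{or}\quad \hat\theta \pm z_{\alpha/2}\,\sigma_{\hat\theta}.$$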





Z Intervals

$100(1-\alpha)\%$ CIs that use percentiles of the standard normal distribution, $z_{\alpha/2}$, as above, are called Z intervals.

Z intervals for the mean require known variance, and either the assumption of normality or $n \ge 30$. Typically, the variance is not known.

Z intervals will be primarily used for proportions.





The T Distribution and T Intervals

When sampling from normal populations, an estimator $\hat\theta$ of some parameter $\theta$ often satisfies, for all sample sizes n,

$$\frac{\hat\theta - \theta}{\hat\sigma_{\hat\theta}} \sim T_\nu,$$

where $\hat\sigma_{\hat\theta}$ is the estimated standard error, and $T_\nu$ stands for the T distribution with $\nu$ degrees of freedom.

A T distribution is symmetric and its pdf tends to that of the standard normal as $\nu$ tends to infinity.

The $100(1-\alpha/2)$th percentile of the T distribution with $\nu$ degrees of freedom will be denoted by $t_{\nu,\alpha/2}$.





Figure: PDF and Percentile of a T Distribution. (The plot shows the p.d.f. of the t distribution with $\nu$ degrees of freedom, with the percentile $t_{\nu,\alpha}$ and the area $\alpha$ marked.)

As the DF $\nu$ gets large, $t_{\nu,\alpha/2}$ approaches $z_{\alpha/2}$. For example, for $\nu$ = 9, 19, 60 and 120, $t_{\nu,0.05}$ is

1.833, 1.729, 1.671, 1.658,

respectively, while $z_{0.05} = 1.645$.
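The percentiles quoted above can be reproduced with base R's `qt` and `qnorm`. This is a small sketch; the variable names `nu`, `t_vals` and `z_val` are chosen here for illustration only.

```r
# Upper 5% points (95th percentiles) of the T distribution for several DF
nu <- c(9, 19, 60, 120)
t_vals <- qt(0.95, df = nu)

# Corresponding percentile of the standard normal distribution
z_val <- qnorm(0.95)

print(round(t_vals, 3))
print(round(z_val, 3))
```

Increasing `nu` further should bring the t percentile still closer to the normal one.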





Plots of N(0,1) and T densities in R

http://stat.psu.edu/~mga/401/fig/ComparTdensit.pdf




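The relation $\frac{\hat\theta-\theta}{\hat\sigma_{\hat\theta}} \sim t_\nu$, which also holds approximately when sampling non-normal populations provided $n \ge 30$, leads to the following $1-\alpha$ bound on the error of estimation of $\theta$:

$$|\hat\theta - \theta| \le t_{\nu,\alpha/2}\,\hat\sigma_{\hat\theta}.$$

This error bound leads to the following $(1-\alpha)100\%$ CI for $\theta$:

$$\left(\hat\theta - t_{\nu,\alpha/2}\,\hat\sigma_{\hat\theta},\ \hat\theta + t_{\nu,\alpha/2}\,\hat\sigma_{\hat\theta}\right). \tag{2.1}$$

T intervals will be used for the mean, as well as for the regression parameters in the linear regression model.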





Read Section 7.2 CI Semantics: The Meaning of “Confidence”





Figure: 50 CIs for p. (Horizontal axis: end points of CIs; vertical axis: CI count.)





T CIs for the Mean: Proposition

Let $X_1, \ldots, X_n$ be a simple random sample from a population with mean $\mu$ and variance $\sigma^2$, both unknown. Then

$$\frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1} \tag{3.1}$$

holds exactly for any n if the population is normal, and holds approximately for non-normal populations provided $n \ge 30$.





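The Proposition yields the $1-\alpha$ error bound

$$|\bar X - \mu| \le t_{n-1,\alpha/2}\frac{S}{\sqrt n},$$

which leads to the following $(1-\alpha)100\%$ CI for the mean:

$$\left(\bar X - t_{n-1,\alpha/2}\frac{S}{\sqrt n},\ \bar X + t_{n-1,\alpha/2}\frac{S}{\sqrt n}\right).$$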





Example
The mean weight loss of n = 16 grinding balls after a certain length of time in mill slurry is 3.42 g with S = 0.68 g. Construct a 99% CI for the true mean weight loss.

Solution. Because n < 30 we must assume that the (statistical) population of weight losses is normal. (In a real-life application, the normality assumption should be checked with a histogram or a Q-Q plot of the data.) For $\alpha = 0.01$ and n − 1 = 15 DF, Table A.4 gives $t_{n-1,\alpha/2} = t_{15,0.005} = 2.947$. Thus the desired 99% CI for $\mu$ is

$$3.42 \pm 2.947\,(0.68/\sqrt{16}), \quad\text{or}\quad 2.92 < \mu < 3.92.$$
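The arithmetic of this example can be checked in R from the summary statistics alone. The sketch below uses `qt` in place of Table A.4; the variable names are illustrative.

```r
# Summary statistics from the grinding-ball example
n <- 16
xbar <- 3.42
s <- 0.68
alpha <- 0.01

# t percentile with n - 1 degrees of freedom (plays the role of Table A.4)
tcrit <- qt(1 - alpha / 2, df = n - 1)

# 99% CI: xbar +/- tcrit * s / sqrt(n)
ci <- xbar + c(-1, 1) * tcrit * s / sqrt(n)
print(round(tcrit, 3))
print(round(ci, 2))
```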





R command for the t-interval for the mean

With data in x, the commands

    lm(x ~ 1); confint(lm(x ~ 1))

will return $\bar X$ and the pair of values $\bar X \pm t_{n-1,0.025}\,S/\sqrt n$, which is the 95% CI for $\mu$.

For the 90% CI of $\mu$ use

    confint(lm(x ~ 1), level = 0.9)

(∗) The pair of values $\bar X \pm t_{n-1,0.025}\,S/\sqrt n$ can also be obtained as

    mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
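As a quick check that the two routes agree, the following sketch applies both to a small data vector. The values in `x` are hypothetical and serve only to make the commands runnable.

```r
# Hypothetical measurements, used only to illustrate the commands
x <- c(3.1, 2.8, 4.0, 3.6, 3.3, 2.9, 3.8, 3.5)

# Intercept-only model: the fitted intercept is the sample mean
fit <- lm(x ~ 1)
ci_lm <- as.numeric(confint(fit))      # 95% CI by default

# Same interval from xbar +/- t_{n-1,0.025} * S / sqrt(n)
n <- length(x)
ci_manual <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)

print(ci_lm)
print(ci_manual)
```

Both printed pairs should coincide up to floating-point error.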





Z CIs for Proportions

CIs for p, however, are slightly different because:

1 We are typically given only $T = X_1 + \cdots + X_n$, or $\bar X = \hat p$.

2 $\sigma^2$ is estimated by $\hat p(1-\hat p)$.

3 We use the percentiles from the normal distribution, using the approximate result

$$\hat p \overset{\cdot}{\sim} N\left(p, \frac{\hat p(1-\hat p)}{n}\right),$$

which holds if $n\hat p \ge 8$ and $n(1-\hat p) \ge 8$, i.e. at least eight 1s and at least eight 0s.





The above approximate distribution of $\hat p$ leads to the (approximate) $(1-\alpha)100\%$ CI

$$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}.$$





Example
A Gallup Survey estimated the proportion of adults across the country who drink beer, wine, or hard liquor, at least occasionally. Of the 1516 adults interviewed, 985 said they drank. Find a 95% confidence interval for the proportion, p, of all Americans who drink.

Solution: Here $\alpha = 0.05$, and $z_{0.025} = 1.96$. Thus

$$\frac{985}{1516} \pm 1.96\sqrt{\frac{0.65 \times 0.35}{1516}} = 0.65 \pm 0.024.$$

QUESTION: An interpretation of the above CI is that the probability is 0.95 that the true proportion of adults who drink lies in the interval you obtained. True or False?





R command for z-intervals for a proportion

With T being the number of "successes" in n trials, set

    phat = T/n

and use the command

    phat + c(-1, 1) * qnorm(0.975) * sqrt(phat * (1 - phat) / n)

to obtain the 95% CI for p, i.e. the pair of values $\hat p \pm z_{0.025}\sqrt{\hat p(1-\hat p)/n}$.

To obtain 90% or other CIs, adjust the 0.975 in the above command accordingly.
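Applied to the Gallup example (985 of 1516), the command can be sketched as follows. The name `n_success` is used here instead of `T` only to avoid masking R's shorthand for `TRUE`; that choice is not part of the slides.

```r
# Gallup example: 985 "successes" among 1516 adults interviewed
n_success <- 985
n <- 1516
phat <- n_success / n

# The normal approximation is recommended when there are at least
# eight 1s and at least eight 0s
stopifnot(n * phat >= 8, n * (1 - phat) >= 8)

# 95% z-interval: phat +/- z_{0.025} * sqrt(phat * (1 - phat) / n)
half_width <- qnorm(0.975) * sqrt(phat * (1 - phat) / n)
ci <- phat + c(-1, 1) * half_width

print(round(c(phat, half_width), 3))
print(round(ci, 3))
```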





T CIs for the Slope of a Regression Line: Proposition

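Let $(X_1,Y_1), \ldots, (X_n,Y_n)$ be iid satisfying $E(Y_i|X_i = x) = \alpha_1 + \beta_1 x$ and $\mathrm{Var}(Y_i|X_i = x) = \sigma^2_\varepsilon$, same for all x. Then,

$$\hat\sigma_{\hat\beta_1} = \sqrt{\frac{S^2_\varepsilon}{\sum X_i^2 - \frac1n\left(\sum X_i\right)^2}}, \quad\text{where}$$

$$S^2_\varepsilon = \frac{1}{n-2}\left[\sum_{i=1}^n Y_i^2 - \hat\alpha_1\sum_{i=1}^n Y_i - \hat\beta_1\sum_{i=1}^n X_iY_i\right]$$

is the estimator of the intrinsic variability. NOTE: $\hat\sigma_{\hat\beta_1}$ is also denoted by $S_{\hat\beta_1}$.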





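We saw that under the normality assumption,

$$\frac{\hat\beta_1 - \beta_1}{\hat\sigma_{\hat\beta_1}} \sim t_{n-2}.$$

This leads to the $100(1-\alpha)\%$ error bound

$$|\hat\beta_1 - \beta_1| < t_{n-2,\alpha/2}\,\hat\sigma_{\hat\beta_1},$$

and the corresponding $100(1-\alpha)\%$ CI for $\beta_1$:

$$\left(\hat\beta_1 - t_{n-2,\alpha/2}\,\hat\sigma_{\hat\beta_1},\ \hat\beta_1 + t_{n-2,\alpha/2}\,\hat\sigma_{\hat\beta_1}\right).$$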





Example (Y=propagation of stress wave, X=tensile strength)

In this study, n = 14, $\sum_i X_i = 890$, $\sum_i X_i^2 = 67{,}182$, $\sum_i Y_i = 37.6$, $\sum_i Y_i^2 = 103.54$ and $\sum_i X_iY_i = 2234.30$. Let $Y_1$ denote an observation made at $X_1 = 30$, and $Y_2$ denote an observation at $X_2 = 35$. Construct a 95% CI for $E(Y_1 - Y_2)$.

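Solution. Note that $E(Y_1 - Y_2) = -5\beta_1$. We will first construct a 95% CI for $\beta_1$. We have: $\hat\beta_1 = -0.0147109$, $\hat\alpha_1 = 3.6209072$, and $S^2_\varepsilon = 0.02187$. Thus,

$$\hat\sigma_{\hat\beta_1} = \sqrt{\frac{S^2_\varepsilon}{\sum X_i^2 - \frac1n\left(\sum X_i\right)^2}} = \sqrt{\frac{0.02187}{67{,}182 - \frac{1}{14}890^2}} = 0.001436,$$



Example (Continued)
so that the 95% CI for $\beta_1$ is

$$\hat\beta_1 \pm t_{0.025,12}\,\hat\sigma_{\hat\beta_1} = -0.0147109 \pm 2.179 \times 0.001436 = -0.0147109 \pm 0.00313 = (-0.0178, -0.0116).$$

The 95% CI for $-5\beta_1$ now follows easily:

$$-5\hat\beta_1 \pm 5\,t_{\alpha/2,n-2}\,\hat\sigma_{\hat\beta_1} = 5(0.0147109) \pm 5 \times 2.179 \times 0.001436.$$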
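The quantities in this example can be recomputed directly from the five summary sums. The sketch below assumes the usual least-squares formulas for the slope and intercept, which this section does not restate; the variable names are illustrative.

```r
# Summary statistics quoted in the example
n <- 14
sx <- 890;  sxx <- 67182
sy <- 37.6; syy <- 103.54
sxy <- 2234.30

# Least-squares estimates from the sums (standard formulas, assumed here)
Sxx <- sxx - sx^2 / n
b1 <- (sxy - sx * sy / n) / Sxx
a1 <- sy / n - b1 * sx / n

# Estimator of the intrinsic variability and standard error of the slope
s2 <- (syy - a1 * sy - b1 * sxy) / (n - 2)
se_b1 <- sqrt(s2 / Sxx)

# 95% CI for beta1, then for -5 * beta1 = E(Y1 - Y2)
tcrit <- qt(0.975, df = n - 2)
ci_b1 <- b1 + c(-1, 1) * tcrit * se_b1
ci_diff <- -5 * b1 + c(-1, 1) * 5 * tcrit * se_b1

print(signif(c(b1 = b1, a1 = a1, s2 = s2, se_b1 = se_b1), 4))
print(signif(ci_b1, 3))
print(signif(ci_diff, 3))
```

Because the sums are rounded, the last digits may differ slightly from values computed on the raw data.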





T CIs for the Regression Line

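Let $(X_1,Y_1), \ldots, (X_n,Y_n)$ be iid satisfying $E(Y_i|X_i = x) = \alpha_1 + \beta_1 x$ and $\mathrm{Var}(Y_i|X_i = x) = \sigma^2$, same for all x. Then,

$$\hat\sigma_{\hat\mu_{Y|X=x}} = S_\varepsilon\sqrt{\frac1n + \frac{n(x-\bar X)^2}{n\sum X_i^2 - \left(\sum X_i\right)^2}}, \quad\text{where}$$

$$S^2_\varepsilon = \frac{1}{n-2}\left[\sum_{i=1}^n Y_i^2 - \hat\alpha_1\sum_{i=1}^n Y_i - \hat\beta_1\sum_{i=1}^n X_iY_i\right]$$

is the estimator of the intrinsic variability. NOTE: $\hat\sigma_{\hat\mu_{Y|X=x}}$ is also denoted by $S_{\hat\mu_{Y|X=x}}$.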





The $n(x-\bar X)^2$ term in the expression for $\hat\sigma_{\hat\mu_{Y|X=x}}$ means that confidence intervals for $\mu_{Y|X=x}$ get wider as x gets farther away from $\bar X$.

Figure: Confidence Intervals for $\mu_{Y|X=x}$ Get Wider Away from $\bar X$

Estimation of $\mu_{Y|X=x}$ for $x < X_{(1)}$ or $x > X_{(n)}$ is NOT recommended.





Example

n = 11 data points yield $\sum X_i = 292.90$, $\sum Y_i = 69.03$, $\sum X_i^2 = 8141.75$, $\sum X_iY_i = 1890.2$, $\sum Y_i^2 = 442.1903$, $\hat\mu_{Y|X=x} = 2.22494 + 0.152119x$, and $S_\varepsilon = 0.3444$. Construct 95% CIs for $\mu_{Y|X=26.627}$ and $\mu_{Y|X=25}$. [Note that $\bar X = 26.627$.]

Solution: First,

$$S_{\hat\mu_{Y|X=x}} = 0.3444\sqrt{\frac{1}{11} + \frac{11(x-26.627)^2}{11(8141.75) - (292.9)^2}},$$

so that $S_{\hat\mu_{Y|X=26.627}} = 0.1038$ and $S_{\hat\mu_{Y|X=25}} = 0.1082$. Thus,

$$\hat\mu_{Y|X=25} \pm t_{.025,9}\,(0.1082) = 6.028 \pm 0.245, \quad\text{CI at } X = 25,$$

$$\hat\mu_{Y|X=26.627} \pm t_{.025,9}\,(0.1038) = 6.275 \pm 0.235, \quad\text{CI at } X = 26.627.$$
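The two intervals can be reproduced from the quoted summary statistics. In this sketch the helper names `mu_hat` and `se_mu` are introduced for illustration; the fitted line and $S_\varepsilon$ are taken as given in the example.

```r
# Summary statistics and fitted line quoted in the example
n <- 11
sx <- 292.90
sxx <- 8141.75
s_eps <- 0.3444
mu_hat <- function(x) 2.22494 + 0.152119 * x

# Estimated standard error of the fitted line at x
se_mu <- function(x) {
  xbar <- sx / n
  s_eps * sqrt(1 / n + n * (x - xbar)^2 / (n * sxx - sx^2))
}

# 95% CIs use the t percentile with n - 2 degrees of freedom
tcrit <- qt(0.975, df = n - 2)
for (x0 in c(25, 26.627)) {
  half <- tcrit * se_mu(x0)
  cat(sprintf("x = %.3f: %.3f +/- %.3f\n", x0, mu_hat(x0), half))
}
```

The half-width is smallest at the sample mean of the X values and grows as x moves away from it.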




R commands for CIs in regression

CIs for the intercept and slope:

95% CIs for both $\alpha_1$ and $\beta_1$ (equivalently, add level=0.95):

    confint(lm(y ~ x))

90% CIs for both $\alpha_1$ and $\beta_1$:

    confint(lm(y ~ x), level=0.90)

90% CI only for $\beta_1$:

    confint(lm(y ~ x), parm="x", level=0.90)

90% CI only for $\alpha_1$:

    confint(lm(y ~ x), parm="(Intercept)", level=0.90)

CIs for $\mu_{Y|X}(x)$ at, e.g., x = 5.5:

    newx = data.frame(x=5.5)
    predict(lm(y ~ x), newx, interval="confidence", level=0.9)

Use newx = data.frame(x=c(4.5, 5.5)) above for multiple CIs.
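To see these commands run end to end, the sketch below fits a line to a small made-up data set and compares the slope interval from `confint` with the formula $\hat\beta_1 \pm t_{n-2,\alpha/2}\,\hat\sigma_{\hat\beta_1}$. The `x` and `y` values are hypothetical.

```r
# Hypothetical data, for illustration only
x <- c(3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5)
y <- c(2.1, 2.6, 2.9, 3.7, 3.8, 4.4, 4.6, 5.3)
fit <- lm(y ~ x)

# 90% CI for the slope only
ci_slope <- confint(fit, parm = "x", level = 0.90)

# 90% CIs for the mean response at two x values
newx <- data.frame(x = c(4.5, 5.5))
ci_mean <- predict(fit, newx, interval = "confidence", level = 0.90)

# Slope interval by hand: estimate +/- t_{n-2,0.05} * standard error
est <- coef(summary(fit))["x", ]
manual <- est["Estimate"] + c(-1, 1) * qt(0.95, df = length(x) - 2) * est["Std. Error"]

print(ci_slope)
print(ci_mean)
print(as.numeric(manual))
```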





Generalities

Precision in estimation is quantified by the size of the probabilistic error bound, or by the length of the CI.

Error bounds are of the form

$$|\bar X - \mu| \le t_{n-1,\alpha/2}\frac{S}{\sqrt n} \quad (\text{unknown } \sigma;\ \text{normal case, or } n \ge 30),$$

$$|\hat p - p| \le z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}} \quad (n\hat p \ge 8,\ n(1-\hat p) \ge 8).$$

Thus error bounds depend on n and $\alpha$ since, e.g.,

$$z_{.05} = 1.645 < z_{.025} = 1.96 < z_{.005} = 2.575.$$

In improving precision, we do not want to adjust $\alpha$.





The Ideal Case: σ known

To construct a $(1-\alpha)100\%$ CI having a prescribed length of L, the sample size n is found by solving the equation

$$2 z_{\alpha/2}\frac{\sigma}{\sqrt n} = L.$$

The solution is

$$n = \left(2 z_{\alpha/2}\frac{\sigma}{L}\right)^2.$$

If the solution is not an integer (as is typically the case), the number is rounded up. Rounding up guarantees that the prescribed precision objective will be more than met.





The Ideal Case: An Example

Example
The time to response (in milliseconds) to an editing command with a new operating system is normally distributed with an unknown mean $\mu$ and $\sigma = 25$. We want a 95% CI for $\mu$ of length L = 10 milliseconds. What sample size n should be used?

Solution. For a 95% CI, $\alpha/2 = .025$ and $z_{.025} = 1.96$. Thus

$$n = \left(2\cdot(1.96)\frac{25}{10}\right)^2 = 96.04,$$

which is rounded up to n = 97.
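The sample-size formula for known $\sigma$ is easy to wrap in a small function. `n_for_mean` is a hypothetical helper name, not an R built-in, and it uses the unrounded normal percentile from `qnorm` rather than the table value 1.96.

```r
# Sample size giving a z-interval of total length L when sigma is known
n_for_mean <- function(sigma, L, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)     # z_{alpha/2}
  ceiling((2 * z * sigma / L)^2)     # round up to the next integer
}

# Editing-command example: sigma = 25, desired length L = 10, 95% confidence
n_needed <- n_for_mean(sigma = 25, L = 10)
print(n_needed)
```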





The Realistic Case: σ unknown

Sample size determination must rely on a preliminary approximation, $S_{prl}$, of $\sigma$. Two common methods are:

1 If the range of population values is known, use

$$S_{prl} = \frac{\text{range}}{3.5}, \quad\text{or}\quad S_{prl} = \frac{\text{range}}{4}.$$

This approximation is inspired by the standard deviation of a U(a,b) random variable, which is $\sigma = (b-a)/3.464$.

2 Use the standard deviation, $S_{prl}$, of a preliminary sample. This is somewhat cumbersome because it requires some trial-and-error iterations.





Sample Size Determination for Estimating p

Equating the length of the $(1-\alpha)100\%$ CI for p to L and solving for n gives

$$n = \frac{4 z^2_{\alpha/2}\,\hat p(1-\hat p)}{L^2}. \quad\text{Round up.}$$

Two commonly used methods for obtaining a preliminary approximation, $\hat p_{prl}$, are:

1 Obtain $\hat p_{prl}$ either from a small pilot sample or from expert opinion, and use it in the above formula.

2 Replace $\hat p(1-\hat p)$ in the formula by 0.25. This gives

$$n = z^2_{\alpha/2}/L^2. \quad\text{Round up.}$$





Example

A preliminary sample gave $\hat p_{prl} = 0.91$. How large should n be to estimate the probability of interest to within 0.01 with 95% confidence?

Solution. "To within 0.01" is another way of saying that the 95% bound on the error of estimation should be 0.01, or the desired CI should have a width of 0.02. Since we have preliminary information, we use the first formula:

$$n = \frac{4(1.96)^2(0.91)(0.09)}{(.02)^2} = 3146.27.$$

This is rounded up to 3147.





Example
A new method of pre-coating fittings used in oil, brake and other fluid systems in heavy-duty trucks is being studied. How large an n is needed to estimate the proportion of fittings that leak to within .02 with 90% confidence? (No prior information is available.)

Solution. Here we have no preliminary information about p. Thus, we apply the second formula and obtain

$$n = z^2_{\alpha/2}/L^2 = (1.645)^2/(.04)^2 = 1691.27.$$

This is rounded up to 1692.
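Both sample-size formulas for p fit in one small function, since setting the preliminary value to 0.5 gives $\hat p(1-\hat p) = 0.25$. `n_for_proportion` is a hypothetical helper written for this sketch; the table values 1.96 and 1.645 are passed explicitly so that the results match the hand calculations.

```r
# Sample size so that the z-interval for p has total length L.
# p_prl = 0.5 corresponds to replacing phat * (1 - phat) by 0.25.
n_for_proportion <- function(L, z, p_prl = 0.5) {
  ceiling(4 * z^2 * p_prl * (1 - p_prl) / L^2)
}

# Pilot estimate 0.91, error bound 0.01 (length 0.02), 95% confidence
n_pilot <- n_for_proportion(L = 0.02, z = 1.96, p_prl = 0.91)

# No prior information, error bound 0.02 (length 0.04), 90% confidence
n_conservative <- n_for_proportion(L = 0.04, z = 1.645)

print(c(n_pilot, n_conservative))
```

Using an unrounded percentile such as `qnorm(0.95)` in place of 1.645 can change the rounded-up answer by one.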





Prediction refers to estimating an observation. It is related to estimating the mean, but prediction intervals (PIs) are different from CIs. For example:

1 Predicting the fat content of the hot dog you are about to eat is related to estimating the mean fat content of hot dogs. But the PI is different from the CI.

2 Predicting the failure time of your resistor from its resistance is related to estimating the mean failure time of all resistors having the same resistance. But the PI is different from the CI.

In the first example, there is no explanatory variable. The second example involves a regression context.

We begin with the case of no explanatory variable.





Prediction Based on a Univariate Sample

To emphasize the difference between PIs and CIs, suppose that the amount of fat in a randomly selected hot dog is N(20, 9). Thus there are no unknown parameters to be estimated, and no need to construct a CI.

Still, the amount of fat, X, in the hot dog which one is about to eat is unknown, simply because it is a random variable.

According to well-accepted criteria, the best point predictor of a normal random variable with mean $\mu$ is $\mu$.

A $(1-\alpha)100\%$ PI is an interval that contains the r.v. with probability $1-\alpha$. Namely: $\mu \pm z_{\alpha/2}\sigma$.

In the hot dog example, $X \sim N(20, 9)$, so the best point predictor of X is 20 and a 95% PI is $20 \pm (1.96)3$.





Typically, $\mu$ and $\sigma$ are unknown and are estimated from a sample $X_1, \ldots, X_n$ by $\bar X$ and S, respectively.

Then, the best point predictor of a future observation is $\bar X$.

The PI, however, must now take into account the variability of $\bar X$ and S as estimators of $\mu$ and $\sigma$.

Assuming normality, the $(1-\alpha)100\%$ PI for the next X is:

$$\left(\bar X - t_{\alpha/2,n-1}\,S\sqrt{1+\frac1n},\ \bar X + t_{\alpha/2,n-1}\,S\sqrt{1+\frac1n}\right).$$

The variability of $\bar X$ is accounted for by the $\frac1n$, and the variability of S is accounted for by the use of the t-percentiles.





Example
The fat content measurements from a sample of n = 10 hot dogs gave sample mean and sample standard deviation of $\bar X = 21.9$ and S = 4.134. Give a 95% PI for the fat content, X, of the next hot dog to be sampled.

Solution: Assuming that the fat content of a randomly selected hot dog has the normal distribution, the best point predictor of X is $\bar X = 21.9$ and the 95% PI is

$$\bar X \pm t_{.025,9}\,S\sqrt{1+\frac1n} = (12.09, 31.71).$$
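The PI in this example, and the narrower CI for the mean computed from the same numbers, can be obtained as follows. This is a sketch from the summary statistics only; the CI is included here purely for contrast and is not part of the example.

```r
# Summary statistics from the hot dog sample
n <- 10
xbar <- 21.9
s <- 4.134
tcrit <- qt(0.975, df = n - 1)

# 95% PI for the next observation: xbar +/- t * S * sqrt(1 + 1/n)
pi95 <- xbar + c(-1, 1) * tcrit * s * sqrt(1 + 1 / n)

# 95% CI for the mean, for comparison: xbar +/- t * S / sqrt(n)
ci95 <- xbar + c(-1, 1) * tcrit * s / sqrt(n)

print(round(pi95, 2))
print(round(ci95, 2))
```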





PIs for the Normal Simple Linear Regression Model

Let $(X_1,Y_1), \ldots, (X_n,Y_n)$ be n observations that follow the normal simple linear regression model, i.e. $Y_i|X_i = x_i \sim N(\alpha_1 + \beta_1 x_i, \sigma^2)$.

The point predictor for a future observation Y made at X = x is $\hat\mu_{Y|X=x} = \hat\alpha_1 + \hat\beta_1 x$.

The $100(1-\alpha)\%$ PI is

$$\hat\mu_{Y|X=x} \pm t_{\alpha/2,n-2}\,S\sqrt{1 + \frac1n + \frac{n(x-\bar X)^2}{n\sum X_i^2 - \left(\sum X_i\right)^2}}.$$





Example

Consider again the study where n = 11, $\sum X_i = 292.90$, $\sum Y_i = 69.03$, $\sum X_i^2 = 8141.75$, $\sum X_iY_i = 1890.200$, $\sum Y_i^2 = 442.1903$, $\hat\mu_{Y|X} = 2.22494 + 0.152119X$, and S = 0.3444. Construct a 95% PI for a future observation made at X = 25.

Solution. The point predictor is $\hat\mu_{Y|X=25} = 6.028$, and the 95% PI at X = 25 is $6.028 \pm 0.8165$, as obtained from the formula

$$\hat\mu_{Y|X=25} \pm t_{.025,9}\,(0.3444)\sqrt{1 + \frac{1}{11} + \frac{11(1.627)^2}{11\sum X_i^2 - \left(\sum X_i\right)^2}}.$$

The 95% CI for $\mu_{Y|X=25}$ was found to be $6.028 \pm 0.245$. This demonstrates that PIs are wider than CIs.
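The half-widths of the PI and of the CI at X = 25 differ only by the extra 1 under the square root, which the following sketch makes explicit using the quoted summary statistics. With raw data in hand, `predict(lm(y ~ x), newx, interval = "prediction")` would be the more usual route.

```r
# Summary statistics and fitted line quoted in the example
n <- 11
sx <- 292.90
sxx <- 8141.75
s <- 0.3444
x0 <- 25
fit0 <- 2.22494 + 0.152119 * x0

# Distance term shared by the CI and PI formulas
d <- n * (x0 - sx / n)^2 / (n * sxx - sx^2)
tcrit <- qt(0.975, df = n - 2)

half_ci <- tcrit * s * sqrt(1 / n + d)        # CI for the mean response
half_pi <- tcrit * s * sqrt(1 + 1 / n + d)    # PI for a future observation

cat(sprintf("CI: %.3f +/- %.3f\n", fit0, half_ci))
cat(sprintf("PI: %.3f +/- %.3f\n", fit0, half_pi))
```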

