Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis

Upload: berniece-washington

Post on 13-Dec-2015


TRANSCRIPT

Page 1: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis

Copyright © Cengage Learning. All rights reserved.

13 Linear Correlation and Regression Analysis

Page 2: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


13.5 Confidence Intervals for Regression

Page 3: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Confidence Intervals for Regression

The confidence interval for μ_Y|x0 and the prediction interval for y_x0 are constructed in a similar fashion, with ŷ replacing x̄ as our point estimate.

If we were to randomly select several samples from the population, construct the line of best fit for each sample, calculate ŷ for a given x using each regression line, and plot the various ŷ values (they would vary because each sample would yield a slightly different regression line), we would find that the ŷ values form a normal distribution.

Page 4: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Confidence Intervals for Regression

That is, the sampling distribution of ŷ is normal, just as the sampling distribution of x̄ is normal.

What about the appropriate standard deviation of ŷ?

The standard deviation in both cases (μ_Y|x0 and y_x0) is calculated by multiplying the square root of the variance of the error by an appropriate correction factor. We know that the variance of the error, s_e², is calculated by means of formula (13.8).
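For readers who want to check this numerically, here is a minimal Python sketch (not from the text). It computes s_e² both from its defining form, SSE/(n − 2), and from a shortcut form, (Σy² − b0·Σy − b1·Σxy)/(n − 2); treating the latter as formula (13.8) is an assumption restated from Section 13.4.

```python
def fit_line(x, y):
    """Least-squares slope b1 and intercept b0 for yhat = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    b0 = ybar - b1 * xbar
    return b0, b1

def variance_of_error(x, y):
    """Defining form: s_e^2 = SSE / (n - 2)."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return sse / (n - 2)

def variance_of_error_shortcut(x, y):
    """Shortcut form (assumed to be formula 13.8):
    (sum(y^2) - b0*sum(y) - b1*sum(xy)) / (n - 2)."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    return (sum(yi ** 2 for yi in y) - b0 * sum(y)
            - b1 * sum(xi * yi for xi, yi in zip(x, y))) / (n - 2)
```

Both functions agree on any data set, since the shortcut is an algebraic rearrangement of the sum of squared residuals.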

Page 5: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Confidence Intervals for Regression

Before we look at the correction factors for the two cases, let’s see why they are necessary. We know that the line of best fit passes through the point (x̄, ȳ), the centroid.

If we draw lines with slopes equal to the extremes of that confidence interval, 1.27 to 2.51, through the point (x̄, ȳ), which is (12.3, 26.9), on the scatter diagram, we will see that the value for ŷ fluctuates considerably for different values of x (Figure 13.11).

FIGURE 13.11  Lines Representing the Confidence Interval for Slope

Page 6: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Confidence Intervals for Regression

Therefore, we should suspect a need for a wider confidence interval as we select values of x that are farther away from x̄. Hence we need a correction factor to adjust for the distance between x0 and x̄.

This factor must also adjust for the variation of the y values about ŷ. First, let’s estimate the mean value of y at a given value of x, μ_Y|x0. The confidence interval formula is:

ŷ ± t(n − 2, α/2) · s_e · √(1/n + (x0 − x̄)² / Σ(x − x̄)²)    (13.16)
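A minimal Python sketch of formula (13.16) follows; it is illustrative, not from the text, and the function name and arguments are my own. The caller supplies the critical value t(n − 2, α/2), since no t-table lookup is built in.

```python
import math

def mean_response_ci(x, y, x0, t_crit):
    """Confidence interval for the mean of y at x = x0, per (13.16):
    yhat +/- t_crit * s_e * sqrt(1/n + (x0 - xbar)^2 / sum((x - xbar)^2)).
    t_crit must be t(n - 2, alpha/2), looked up by the caller."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))      # standard deviation of error
    yhat = b0 + b1 * x0                # point estimate of mu_{Y|x0}
    half_width = t_crit * se * math.sqrt(1 / n + (x0 - xbar) ** 2 / ssx)
    return yhat - half_width, yhat + half_width
```

Note that the half-width grows as x0 moves away from x̄, which is exactly the correction-factor behavior described above.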

Page 7: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Confidence Intervals for Regression

Note: The numerator of the second term under the radical sign is the square of the distance of x0 from x̄. The denominator is closely related to the variance of x and has a “standardizing effect” on this term.

Formula (13.16) can be modified for greater ease of calculation. Here is the new form:

ŷ ± t(n − 2, α/2) · s_e · √(1/n + (x0 − x̄)² / (Σx² − (Σx)²/n))    (13.17)
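The “greater ease of calculation” rests on an algebraic identity: Σ(x − x̄)² = Σx² − (Σx)²/n. This short Python check (illustrative, not from the text) confirms the two denominators agree:

```python
# The denominators of (13.16) and (13.17) are algebraically identical:
#   sum((x - xbar)^2)  ==  sum(x^2) - (sum(x))^2 / n
x = [3.0, 5.0, 7.0, 10.0]   # any sample data works here
n = len(x)
xbar = sum(x) / n
definitional = sum((xi - xbar) ** 2 for xi in x)          # 26.75 for this data
shortcut = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n     # 26.75 as well
print(abs(definitional - shortcut) < 1e-9)                # True
```

The shortcut form avoids a second pass over the data after computing x̄, which mattered when these sums were tallied by hand or on a basic calculator.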

Page 8: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Confidence Intervals for Regression

Let’s compare formula (13.16) with formula (9.1):

x̄ ± t(n − 1, α/2) · s/√n    (9.1)

ŷ ± t(n − 2, α/2) · s_e · √(1/n + (x0 − x̄)² / Σ(x − x̄)²)    (13.16)

ŷ replaces x̄, and

Page 9: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Confidence Intervals for Regression

s_e · √(1/n + (x0 − x̄)² / Σ(x − x̄)²), the estimated standard deviation of ŷ in estimating μ_Y|x0, replaces s/√n, the standard deviation of x̄. The degrees of freedom are now n − 2 instead of n − 1 as before.

Page 10: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Example 10 – Constructing a Confidence Interval for μ_Y|x0

Construct a 95% confidence interval for the mean travel time for the co-workers who travel 7 miles to work (refer to Example 5 in Section 13.3).

Solution:

Step 1 Parameter of interest: μ_Y|x=7, the mean travel time for co-workers who travel 7 miles to work

Step 2 a. Assumptions: The ordered pairs form a random sample, and we will assume that the y values (minutes) at each x (miles) have a normal distribution.

cont’d

Page 11: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Example 10 – Solution

b. Probability distribution and formula: Student’s t-distribution and formula (13.17)

c. Level of confidence: 1 − α = 0.95

Step 3 Sample information: n = 15; therefore,

s_e² = 29.17 (found in Example 5 in Section 13.3)

s_e = √29.17 = 5.40

cont’d

Page 12: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Example 10 – Solution

ŷ = 3.64 + 1.89x = 3.64 + 1.89(7) = 16.87

Step 4 a. Confidence coefficient: t (13, 0.025) = 2.16

(from Table 6 in Appendix B)

b. Maximum error of estimate: Using formula (13.17) with t(13, 0.025) = 2.16 and s_e = 5.40, we have

E = (2.16)(5.40) · √(1/15 + (7 − 12.3)² / (Σx² − (Σx)²/15)) = 4.43

cont’d

Page 13: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Example 10 – Solution

c. Lower and upper confidence limits:

ŷ ± E = 16.87 ± 4.43

Thus, 12.44 to 21.30 is the 95% confidence interval for μ_Y|x=7. That is, with 95% confidence, the mean travel time for commuters who travel 7 miles is between 12.44 minutes (12 min, 26 sec) and 21.30 minutes (21 min, 18 sec).
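As a quick arithmetic check (illustrative, not part of the text), the numbers stated in this example can be verified in Python. The maximum error E = 4.43 is taken from the stated limits, since the intermediate sums (Σx, Σx²) for the travel-time data are not reproduced here:

```python
# Verify Example 10's arithmetic using only values stated in the text.
t_crit = 2.16            # t(13, 0.025), from Table 6 in Appendix B
s_e = 5.40               # sqrt(29.17), rounded as in the text
yhat = 3.64 + 1.89 * 7   # point estimate of mean travel time at x0 = 7
E = 4.43                 # maximum error implied by the stated limits
lower, upper = yhat - E, yhat + E
print(round(yhat, 2), round(lower, 2), round(upper, 2))  # 16.87 12.44 21.3
```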

cont’d

Page 14: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Example 10 – Solution

This confidence interval is shown in Figure 13.12 by the dark red vertical line.

cont’d

Confidence Belts for μ_Y|x0

Figure 13.12

Page 15: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Example 10 – Solution

The confidence belt showing the upper and lower boundaries of all intervals at 95% confidence is also shown in red.

Notice that the boundary lines for the x values far away from x̄ become close to the two lines that represent the equations with slopes equal to the extreme values of the 95% confidence interval for the slope (see Figure 13.12).

cont’d

Page 16: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Confidence Intervals for Regression

The formula for the prediction interval of the value of a single randomly selected y is:

ŷ ± t(n − 2, α/2) · s_e · √(1 + 1/n + (x0 − x̄)² / Σ(x − x̄)²)
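The only difference from formula (13.16) is the extra 1 under the radical, which accounts for the variability of an individual y about its mean. A minimal Python sketch (illustrative, not from the text) contrasting the two half-widths:

```python
import math

def interval_half_widths(x, y, x0, t_crit):
    """Half-widths of (a) the confidence interval for mu_{Y|x0} and
    (b) the prediction interval for a single y at x0; the latter adds
    a '1' under the radical and so is always wider."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
    b0 = ybar - b1 * xbar
    se = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2
                       for xi, yi in zip(x, y)) / (n - 2))
    ci = t_crit * se * math.sqrt(1 / n + (x0 - xbar) ** 2 / ssx)      # mean
    pi = t_crit * se * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / ssx)  # single y
    return ci, pi
```

The prediction interval is wider because it must cover both the uncertainty in estimating the mean and the scatter of individual observations around that mean.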

Page 17: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Confidence Intervals for Regression

There are three basic precautions that you need to be aware of as you work with regression analysis:

1. Remember that the regression equation is meaningful only in the domain of the x-variable studied. Estimation outside this domain is extremely dangerous; it requires that we know or assume that the relationship between x and y remains the same outside the domain of the sample data.

However, although projections outside the interval may be somewhat dangerous, they may be the best predictors available.

Page 18: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Confidence Intervals for Regression

2. Don’t get caught by the common fallacy of applying the regression results inappropriately.

Basically, the results of one sample should not be used to make inferences about a population other than the one from which the sample was drawn.

Page 19: Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis


Confidence Intervals for Regression

3. Don’t jump to the conclusion that the results of the regression prove that x causes y to change. (This is perhaps the most common fallacy.) Regressions measure only movement between x and y; they never prove causation.

The most common difficulty in this regard occurs because of what is called the missing-variable, or third-variable, effect. That is, we observe a relationship between x and y because a third variable, one that is not in the regression, affects both x and y.