© The McGraw-Hill Companies, Inc., 2000
Chapter 11
Correlation and Regression
Outline
11-1 Introduction
11-2 Scatter Plots
11-3 Correlation
11-4 Regression
11-5 Coefficient of Determination and Standard Error of Estimate
Objectives
Draw a scatter plot for a set of ordered pairs.
Find the correlation coefficient.
Test the hypothesis H0: ρ = 0.
Find the equation of the regression line.
Find the coefficient of determination.
Find the standard error of estimate.
Find a prediction interval.
11-2 Scatter Plots
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.
11-2 Scatter Plots - Example
Construct a scatter plot for the data obtained in a study of age and systolic blood pressure of six randomly selected subjects.
The data is given on the next slide.
| Subject | Age, x | Pressure, y |
|---------|--------|-------------|
| A       | 43     | 128         |
| B       | 48     | 120         |
| C       | 56     | 135         |
| D       | 61     | 143         |
| E       | 67     | 141         |
| F       | 70     | 152         |
[Figure: scatter plot of pressure (y) against age (x) for the six subjects. Positive relationship.]
11-2 Scatter Plots - Other Examples
[Figure: scatter plot of final grade against number of absences. Negative relationship.]
[Figure: scatter plot of y against x. No relationship.]
11-3 Correlation Coefficient
The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.
Sample correlation coefficient: r.
Population correlation coefficient: ρ.
11-3 Range of Values for the Correlation Coefficient
The value of r ranges from -1 to +1.
r near -1: strong negative relationship.
r near 0: no linear relationship.
r near +1: strong positive relationship.
11-3 Formula for the Correlation Coefficient r
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}$$
where n is the number of data pairs.
11-3 Correlation Coefficient - Example (Verify)
Compute the correlation coefficient for the age and blood pressure data.
Σx = 345, Σy = 819, Σxy = 47,634, Σx² = 20,399, Σy² = 112,443.
Substituting in the formula for r, with n = 6, gives
$$r = \frac{(6)(47{,}634) - (345)(819)}{\sqrt{[(6)(20{,}399) - (345)^2]\,[(6)(112{,}443) - (819)^2]}} = 0.897$$
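The "verify" step can be sketched in Python. This is an illustration only, not part of the original slides: the function name `pearson_r` is our own, and the data are the six age and pressure pairs from the scatter plot example.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient via the computational (sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

r = pearson_r(ages, pressures)
print(round(r, 3))  # 0.897
```

The intermediate sums inside the function are the same Σx, Σy, Σxy, Σx², and Σy² listed above, so each can be printed and compared with the slide if a hand calculation disagrees.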
11-3 The Significance of the Correlation Coefficient
The population correlation coefficient, ρ, is the correlation between all possible pairs of data values (x, y) taken from a population.
H0: ρ = 0
H1: ρ ≠ 0
This tests for a significant correlation between the variables in the population.
11-3 Formula for the t Test for the Correlation Coefficient
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
with d.f. = n - 2.
11-3 Example
Test the significance of the correlation coefficient for the age and blood pressure data. Use α = 0.05 and r = 0.897.
Step 1: State the hypotheses. H0: ρ = 0, H1: ρ ≠ 0.
Step 2: Find the critical values. Since α = 0.05 and there are 6 - 2 = 4 degrees of freedom, the critical values are t = +2.776 and t = -2.776.
Step 3: Compute the test value. t = 4.059 (verify).
Step 4: Make the decision. Reject the null hypothesis, since the test value falls in the critical region (4.059 > 2.776).
Step 5: Summarize the results. There is a significant relationship between the variables of age and blood pressure.
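Steps 3 and 4 can be checked with a short Python sketch. It is an illustration, not part of the original slides: the function name `t_statistic` is our own, and the critical value 2.776 is copied from Step 2 rather than computed.

```python
import math

def t_statistic(r, n):
    """Test value for H0: rho = 0, using t = r * sqrt((n - 2) / (1 - r^2))."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r, n = 0.897, 6
t_critical = 2.776  # two-tailed, alpha = 0.05, d.f. = n - 2 = 4 (value taken from Step 2)

t = t_statistic(r, n)
reject_h0 = abs(t) > t_critical  # two-tailed decision rule

print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

Comparing |t| with the critical value covers both tails at once, which matches the two-sided alternative H1: ρ ≠ 0.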
11-4 Regression
The scatter plot for the age and blood pressure data displays a linear pattern.
We can model this relationship with a straight line.
This line is called the line of best fit, or the regression line.
The equation of the line is y = a + bx.
11-4 Formulas for the Regression Line y = a + bx
$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$$
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
where a is the y intercept and b is the slope of the line.
11-4 Example
Find the equation of the regression line for the age and the blood pressure data.
Substituting into the formulas gives a = 81.048 and b = 0.964 (verify).
Hence, y = 81.048 + 0.964x. Note that a represents the intercept and b the slope of the line.
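The values of a and b can be verified with a short Python sketch of the two formulas. It is an illustration, not part of the original slides; the function name `regression_line` is our own.

```python
def regression_line(xs, ys):
    """Return (a, b) for y = a + b*x using the sums formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2  # shared by both formulas
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

a, b = regression_line(ages, pressures)
print(round(a, 3), round(b, 3))  # 81.048 0.964
```

Both formulas share the denominator nΣx² - (Σx)², so it is computed once. It is zero only when every x value is the same, in which case no line can be fitted.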
[Figure: scatter plot of pressure against age with the regression line y = 81.048 + 0.964x.]
11-4 Using the Regression Line to Predict
The regression line can be used to predict a value for the dependent variable (y) for a given value of the independent variable (x).
Caution: Use x values within the experimental region when predicting y values.
11-4 Example
Use the equation of the regression line to predict the blood pressure for a person who is 50 years old.
Since y = 81.048 + 0.964x, then y = 81.048 + 0.964(50) = 129.248 ≈ 129.
Note that the value of 50 is within the range of x values.
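The prediction and the caution about the experimental region can be combined in a small Python sketch. It is an illustration, not part of the original slides: the function name `predict_pressure` and the choice to raise an error on extrapolation are our own assumptions, and the range 43 to 70 is the span of ages in the sample.

```python
def predict_pressure(age, a=81.048, b=0.964, age_min=43, age_max=70):
    """Predict y from the fitted line, refusing to extrapolate beyond the data."""
    if not age_min <= age <= age_max:
        raise ValueError(
            f"age {age} is outside the observed range {age_min}-{age_max}"
        )
    return a + b * age

predicted = predict_pressure(50)
print(round(predicted))  # 129
```

Raising an error is one possible design. A gentler version could return the value with a warning, since the line can still be evaluated outside the range even though the prediction is not supported by the data.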
11-5 Coefficient of Determination and Standard Error of Estimate
The coefficient of determination, denoted by r², is a measure of the variation of the dependent variable that is explained by the regression line and the independent variable.
r² is the square of the correlation coefficient.
The coefficient of nondetermination is (1 - r²).
Example: If r = 0.90, then r² = 0.81, so 81% of the variation in y is explained by the regression line and 1 - r² = 0.19, or 19%, is unexplained.
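To see why r² is read as "explained variation", the following Python sketch splits the variation in y for the age and blood pressure data into explained and unexplained parts. It is an illustration, not part of the original slides. The variable names are our own, and the slope and intercept are computed in deviation form, which is algebraically equivalent to the sums formulas.

```python
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(pressures) / n

# Least-squares slope and intercept (deviation form).
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, pressures)) / sum(
    (x - mean_x) ** 2 for x in ages
)
a = mean_y - b * mean_x
predicted = [a + b * x for x in ages]

total_variation = sum((y - mean_y) ** 2 for y in pressures)
explained_variation = sum((p - mean_y) ** 2 for p in predicted)
unexplained_variation = sum((y - p) ** 2 for y, p in zip(pressures, predicted))

r_squared = explained_variation / total_variation
print(round(r_squared, 3))      # coefficient of determination
print(round(1 - r_squared, 3))  # coefficient of nondetermination
```

For these data the ratio is about 0.804, which agrees with squaring r = 0.897. The explained and unexplained parts add up to the total variation.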
The standard error of estimate, denoted by s_est, is the standard deviation of the observed y values about the predicted y values.
The formula is given on the next slide.
11-5 Formula for the Standard Error of Estimate
$$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}}$$
or
$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
where y′ is the predicted y value from the regression line.
11-5 Standard Error of Estimate - Example
From the regression equation, y = 55.57 + 8.13x and n = 6, find s_est. (The data are the copy machine ages and monthly costs tabulated in the prediction interval example.)
Here, a = 55.57, b = 8.13, and n = 6. Substituting into the formula gives s_est = 6.48 (verify).
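A Python sketch of both forms of the formula, using the copy machine data from the prediction interval example and the rounded coefficients quoted above. It is an illustration, not part of the original slides, and the variable names are our own.

```python
import math

ages_years = [1, 2, 3, 4, 4, 6]
monthly_costs = [62, 78, 70, 90, 93, 103]
a, b = 55.57, 8.13  # rounded coefficients as quoted in the example
n = len(ages_years)

sum_y = sum(monthly_costs)
sum_y2 = sum(y * y for y in monthly_costs)
sum_xy = sum(x * y for x, y in zip(ages_years, monthly_costs))

# Shortcut form: sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2))
s_est_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Residual form: sqrt(sum (y - y')^2 / (n - 2))
residuals = [y - (a + b * x) for x, y in zip(ages_years, monthly_costs)]
s_est_residual = math.sqrt(sum(e * e for e in residuals) / (n - 2))

print(round(s_est_shortcut, 2))  # 6.48
print(round(s_est_residual, 2))  # 6.51
```

The two forms are algebraically identical only for unrounded least-squares coefficients. With a and b rounded to two decimals, the shortcut form reproduces the 6.48 quoted in the example, while the residual form comes out slightly higher.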
11-5 Prediction Interval
A prediction interval is an interval constructed about a predicted y value, y′, for a specified x value.
For a given x value, we can state with (1 - α)100% confidence that the interval will contain the actual y value that corresponds to the given value of x.
11-5 Formula for the Prediction Interval about a Value y′
$$y' - t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}}$$
with d.f. = n - 2.
11-5 Prediction Interval - Example
A researcher collects the data shown on the next slide and determines that there is a significant relationship between the age of a copy machine and its monthly maintenance cost. The regression equation is y = 55.57 + 8.13x. Find the 95% prediction interval for the monthly maintenance cost of a machine that is 3 years old.
| Machine | Age, x (years) | Monthly cost, y |
|---------|----------------|-----------------|
| A       | 1              | $62             |
| B       | 2              | $78             |
| C       | 3              | $70             |
| D       | 4              | $90             |
| E       | 4              | $93             |
| F       | 6              | $103            |
Step 1: Find Σx, Σx², and X̄. Σx = 20, Σx² = 82, X̄ = 20/6 = 3.3.
Step 2: Find y′ for x = 3. y′ = 55.57 + 8.13(3) = 79.96.
Step 3: Find s_est. s_est = 6.48, as shown in the previous example.
Step 4: Substitute in the formula and solve. For 95% confidence, tα/2 = 2.776 with d.f. = 6 - 2 = 4.
60.53 < y < 99.39 (verify)
Hence, one can be 95% confident that the interval 60.53 < y < 99.39 contains the actual value of y.
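The four steps can be checked with a Python sketch of the prediction interval formula. It is an illustration, not part of the original slides: the function name `prediction_interval` is our own, and the values s_est = 6.48 and tα/2 = 2.776 are taken from the example rather than computed.

```python
import math

def prediction_interval(x, xs, a, b, s_est, t_crit):
    """Return (lower, upper) limits of the prediction interval for y at x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(v * v for v in xs)
    x_bar = sum_x / n
    y_pred = a + b * x
    spread = math.sqrt(
        1 + 1 / n + n * (x - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    margin = t_crit * s_est * spread
    return y_pred - margin, y_pred + margin

ages_years = [1, 2, 3, 4, 4, 6]
lower, upper = prediction_interval(
    x=3, xs=ages_years, a=55.57, b=8.13, s_est=6.48, t_crit=2.776
)
print(f"{lower:.2f} < y < {upper:.2f}")
```

Without intermediate rounding the limits come out near 60.47 and 99.45. The slide's 60.53 < y < 99.39 is reproduced when the square-root factor is rounded to 1.08 before multiplying, which appears to be what the hand calculation does.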