M22- Regression & Correlation 1 Department of ISM, University of Alabama, 1992-2003
Lesson Objectives
Know what the equation of a straight line is, in terms of slope and y-intercept.
Learn how to find the equation of the least squares regression line.
Know how to draw a regression line on a scatterplot.
Know how to use the regression equation to estimate the mean of Y for a given value of X.
M22- Regression & Correlation 2 Department of ISM, University of Alabama, 1992-2003
Scatterplot
Best graphical tool for “seeing” the relationship between two quantitative variables.
Use to identify:
• Patterns (relationships)
• Unusual data (outliers)
M22- Regression & Correlation 3 Department of ISM, University of Alabama, 1992-2003
[Figure: four scatterplot panels of Y versus X]
• Positive Linear Relationship
• Negative Linear Relationship
• Nonlinear Relationship, need to change the model
• No Relationship (X is not useful)
M22- Regression & Correlation 4 Department of ISM, University of Alabama, 1992-2003
Regression Analysis mechanics
M22- Regression & Correlation 5 Department of ISM, University of Alabama, 1992-2003
Equation of a straight line.
Days of algebra:
$Y = mx + b$
m = slope = “rate of change”
b = the “y” intercept.
Statistics form:
$\hat{Y} = a + bx$
a = the “y” intercept.
b = slope
$\hat{Y}$ = estimate of the mean of Y for some X value.
M22- Regression & Correlation 6 Department of ISM, University of Alabama, 1992-2003
Equation of a straight line. How are the slope and y-intercept determined?
by “eyeball”.
by using equations by hand.
by hand calculator.
by computer: Minitab, Excel, etc.
M22- Regression & Correlation 7 Department of ISM, University of Alabama, 1992-2003
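Equation of a straight line.
$\hat{Y} = a + bx$
[Figure: a straight line plotted against the X-axis, starting at X = 0, with the rise and run marked]
a = the “y” intercept
$b = \dfrac{\text{rise}}{\text{run}}$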
M22- Regression & Correlation 9 Department of ISM, University of Alabama, 1992-2003
Example 1:
Population: All ST 260 students
Measure: Y = Weight in pounds, X = Height in inches.
Each value of X (height) defines a subpopulation of “weight” values.
The goal is to estimate the true mean weight for each of the infinite number of subpopulations.
Is height a good estimator of mean weight?
M22- Regression & Correlation 10 Department of ISM, University of Alabama, 1992-2003
Example 1:
Sample of n = 5 students. Y = Weight in pounds, X = Height in inches.
Case   Ht   Wt
1      73   175
2      68   158
3      67   140
4      72   207
5      62   115
Step 1?
M22- Regression & Correlation 11 Department of ISM, University of Alabama, 1992-2003
DTDP
M22- Regression & Correlation 12 Department of ISM, University of Alabama, 1992-2003
Example 1
[Figure: scatterplot of WEIGHT (axis 100 to 220) versus HEIGHT (axis 60 to 76) showing the five sample points]
Where should the line go?
X    Y
73   175
68   158
67   140
72   207
62   115
M22- Regression & Correlation 13 Department of ISM, University of Alabama, 1992-2003
Equation of Least Squares Regression Line (page 615)
$\hat{y} = a + bx$
Slope: $b = \dfrac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$
y-intercept: $a = \bar{y} - b\bar{x}$
These are not the preferred computational equations.
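As a rough illustration, the slope and intercept formulas on this slide can be applied directly to the Example 1 sample. The function name `least_squares_line` is hypothetical, chosen only for this sketch, and the code assumes the slope formula reads as the sum of products minus n times the two means, divided by the sum of squares minus n times the squared mean.

```python
# Minimal sketch: the slide's least squares formulas applied to Example 1.
# The function name is an assumption made for illustration.

def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope: (sum of x*y - n*xbar*ybar) / (sum of x^2 - n*xbar^2)
    b = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    # Intercept: ybar - b*xbar
    a = y_bar - b * x_bar
    return a, b

heights = [73, 68, 67, 72, 62]       # X, inches
weights = [175, 158, 140, 207, 115]  # Y, pounds

a, b = least_squares_line(heights, weights)
print(f"slope b = {b:.3f}, intercept a = {a:.2f}")
```

The same function would work for any paired sample with at least two distinct X values; with identical X values the denominator is zero and no slope exists.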
M22- Regression & Correlation 14 Department of ISM, University of Alabama, 1992-2003
Basic intermediate calculations
(1) $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$
(2) $S_{xx} = \sum (x_i - \bar{x})^2$
(3) $S_{yy} = \sum (y_i - \bar{y})^2$
Numerator part of $s^2$, the sample variance: (2) for x, (3) for y.
Look at your formula sheet
M22- Regression & Correlation 15 Department of ISM, University of Alabama, 1992-2003
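Alternate intermediate calculations
(1) $S_{xy} = \sum xy - \dfrac{(\sum x)(\sum y)}{n}$
(2) $S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n}$
(3) $S_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n}$
Numerator part of $s^2$, the sample variance: (2) for x, (3) for y.
Look at your formula sheet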
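The deviation form and the alternate (shortcut) form of the three sums should give the same numbers. The sketch below checks this on the Example 1 sample; the two function names are hypothetical and exist only for this illustration.

```python
# Minimal sketch: Sxy, Sxx, Syy computed two ways for the Example 1 sample.
# Function names are assumptions made for illustration.

def sums_by_deviations(xs, ys):
    """Deviation form: sums of products of deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy, sxx, syy

def sums_by_shortcut(xs, ys):
    """Alternate form: raw sums minus a correction term divided by n."""
    n = len(xs)
    total_x = sum(xs)
    total_y = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - total_x * total_y / n
    sxx = sum(x * x for x in xs) - total_x ** 2 / n
    syy = sum(y * y for y in ys) - total_y ** 2 / n
    return sxy, sxx, syy

heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]

dev = sums_by_deviations(heights, weights)
short = sums_by_shortcut(heights, weights)
print("deviation form:", dev)
print("shortcut form: ", short)
```

The shortcut form needs only the column totals, which is why it is convenient for hand calculation.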
Example 1
Case   x = Ht   y = Wt   xy = Ht*Wt   x^2 = Ht^2   y^2 = Wt^2
1      73       175      12775        5329         30625
2      68       158      10744        4624         24964
3      67       140      .            .            .
4      72       207      .            .            .
5      62       115      .            .            .
Sum    342      795      54933        23470        131263
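The product and square columns of this worksheet can also be generated by a few lines of code, which is a quick way to check the column totals. This is only a sketch; the variable names are assumptions.

```python
# Minimal sketch: build the xy, x^2 and y^2 columns and their totals
# for the Example 1 sample. Variable names are illustrative assumptions.

heights = [73, 68, 67, 72, 62]       # x = Ht
weights = [175, 158, 140, 207, 115]  # y = Wt

# One row per student: (x, y, x*y, x^2, y^2)
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(heights, weights)]

# Column totals: sum x, sum y, sum xy, sum x^2, sum y^2
totals = tuple(sum(column) for column in zip(*rows))

header = ("Ht", "Wt", "Ht*Wt", "Ht^2", "Wt^2")
print("".join(f"{name:>8}" for name in header))
for row in rows:
    print("".join(f"{value:>8}" for value in row))
print("".join(f"{value:>8}" for value in totals))
```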
M22- Regression & Correlation 17 Department of ISM, University of Alabama, 1992-2003
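Intermediate Summary Values, Example 1
(1) $\sum xy - \dfrac{(\sum x)(\sum y)}{n} = 54933 - \dfrac{(342)(795)}{5}$
(2) $\sum x^2 - \dfrac{(\sum x)^2}{n} = 23470 - \dfrac{(342)^2}{5}$
(3) $\sum y^2 - \dfrac{(\sum y)^2}{n} = 131263 - \dfrac{(795)^2}{5}$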
M22- Regression & Correlation 18 Department of ISM, University of Alabama, 1992-2003
Intermediate Summary Values, Example 1
(1) = 555.0
(2) = 77.2
(3) = 4858.0
Once these values are calculated, the rest is easy!
M22- Regression & Correlation 19 Department of ISM, University of Alabama, 1992-2003
Least Squares Regression Line
Prediction equation: $\hat{Y} = a + bX$, where
Estimated slope: $b = \dfrac{(1)}{(2)} = \dfrac{S_{xy}}{S_{xx}}$
Estimated Y-intercept: $a = \bar{y} - b\bar{x}$
M22- Regression & Correlation 20 Department of ISM, University of Alabama, 1992-2003
Example 1
Slope, for Weight vs. Height
$b = \dfrac{(1)}{(2)} = \dfrac{555}{77.2} = 7.189$
M22- Regression & Correlation 21 Department of ISM, University of Alabama, 1992-2003
Example 1
Intercept, for Weight vs. Height
$a = \bar{y} - b\bar{x}$
$\bar{y} = \dfrac{795}{5} = 159$
$\bar{x} = \dfrac{342}{5} = 68.4$
$a = 159 - (7.189)(68.4) = -332.73$
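The intercept −332.73 is obtained with the slope already rounded to 7.189. A short check, sketched below with illustrative variable names, suggests that carrying the unrounded slope through gives a value that rounds to −332.74 instead, so small differences from software output are to be expected.

```python
# Minimal sketch: effect of rounding the slope before computing the intercept.
# Inputs are the intermediate summary values and totals from Example 1.

sxy, sxx = 555.0, 77.2
x_bar = 342 / 5   # 68.4
y_bar = 795 / 5   # 159.0

b_full = sxy / sxx            # slope at full precision
b_rounded = round(b_full, 3)  # slope as reported on the slide

a_from_rounded = y_bar - b_rounded * x_bar  # hand calculation with rounded slope
a_from_full = y_bar - b_full * x_bar        # what software would typically report

print(f"b (rounded) = {b_rounded}")
print(f"a using rounded b   = {a_from_rounded:.2f}")
print(f"a using unrounded b = {a_from_full:.2f}")
```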
M22- Regression & Correlation 22 Department of ISM, University of Alabama, 1992-2003
Example 1
Prediction equation
$\hat{Y} = a + bX$
$\hat{Y} = -332.73 + 7.189X$
$\widehat{\text{Wt}} = -332.73 + 7.189\,\text{Ht}$
M22- Regression & Correlation 23 Department of ISM, University of Alabama, 1992-2003
Example 1: Draw the line on the plot
[Figure: scatterplot of WEIGHT (axis 100 to 220) versus HEIGHT (axis 60 to 76) with the fitted line]
$\hat{Y} = -332.7 + 7.189X$
M22- Regression & Correlation 24 Department of ISM, University of Alabama, 1992-2003
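Example 1: Draw the line on the plot
[Figure: scatterplot of WEIGHT (axis 100 to 220) versus HEIGHT (axis 60 to 76); the line is drawn through the two computed endpoints]
At X = 60: $\hat{Y} = -332.7 + 7.189(60) = 98.64$
At X = 76: $\hat{Y} = -332.7 + 7.189(76) = 213.7$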
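Drawing the line only needs two points, so evaluating the prediction equation at the two ends of the HEIGHT axis is enough. The sketch below does this with the rounded coefficients from the slide; the function name `predict` is an assumption for illustration.

```python
# Minimal sketch: two endpoints for drawing the fitted line on the plot.
# Coefficients are the rounded values shown on the slide.

A = -332.7  # intercept
B = 7.189   # slope

def predict(x):
    """Estimated mean of Y at a given X on the fitted line."""
    return A + B * x

# Evaluate at the left and right ends of the HEIGHT axis.
endpoints = [(x, predict(x)) for x in (60, 76)]
for x, y_hat in endpoints:
    print(f"X = {x}: Y-hat = {y_hat:.2f}")
```

Plotting the two pairs and joining them with a straight edge reproduces the line on the scatterplot.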
M22- Regression & Correlation 25 Department of ISM, University of Alabama, 1992-2003
What a regression equation gives you:
The “line of means” for the Y population.
A prediction of the mean of the population of Y-values defined by a specific value of X.
Each value of X defines a subpopulation of Y-values; the value of the regression equation is the “least squares” estimate of the mean of that Y subpopulation.
M22- Regression & Correlation 26 Department of ISM, University of Alabama, 1992-2003
Example 2: Estimate the weight of a student 5’ 5” tall.
$\hat{Y} = a + bX = -332.73 + 7.189X$
M22- Regression & Correlation 27 Department of ISM, University of Alabama, 1992-2003
Example 2
[Figure: scatterplot of WEIGHT (axis 100 to 220) versus HEIGHT (axis 60 to 76) with the fitted line]
$\hat{Y} = -332.7 + 7.189(65) =$ ______
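The only extra step in Example 2 is converting 5’ 5” to inches before substituting into the equation. A small sketch, with hypothetical helper names and the rounded coefficients from this slide as default arguments:

```python
# Minimal sketch: estimate mean weight for a height given in feet and inches.
# Helper names are assumptions; default coefficients are the slide's rounded values.

def height_in_inches(feet, inches):
    """Convert a height in feet and inches to inches."""
    return 12 * feet + inches

def estimated_mean_weight(height, a=-332.7, b=7.189):
    """Plug a height in inches into Y-hat = a + b*X."""
    return a + b * height

x = height_in_inches(5, 5)
y_hat = estimated_mean_weight(x)
print(f"X = {x} inches, estimated mean weight = {y_hat:.1f} pounds")
```

The result is an estimate of the mean weight of all students of that height, not the weight of any one student.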
M22- Regression & Correlation 28 Department of ISM, University of Alabama, 1992-2003
Calculate your own weight.
Why was your estimate not exact?
M22- Regression & Correlation 29 Department of ISM, University of Alabama, 1992-2003
Regression: Know How To:
1. Calculate the least squares regression line.
2. Plot the data and draw the line through the data.
3. Predict Y for a given X.
4. Interpret the meaning of the regression line.
M22- Regression & Correlation 30 Department of ISM, University of Alabama, 1992-2003
M22- Regression & Correlation 31 Department of ISM, University of Alabama, 1992-2003
Correlation
M22- Regression & Correlation 32 Department of ISM, University of Alabama, 1992-2003
Sample Correlation Coefficient, r
A numerical summary statistic that measures the strength of
the linear association between two quantitative variables.
M22- Regression & Correlation 33 Department of ISM, University of Alabama, 1992-2003
Notation:
• r = sample correlation.
• ρ = population correlation, “rho”.
r is an “estimator” of ρ.
M22- Regression & Correlation 34 Department of ISM, University of Alabama, 1992-2003
Interpreting correlation:
$-1.0 \le r \le +1.0$
r > 0.0 Pattern runs upward from left to right; “positive” trend.
r < 0.0 Pattern runs downward from left to right; “negative” trend.
M22- Regression & Correlation 35 Department of ISM, University of Alabama, 1992-2003
Upward & downward trends:
[Figure: two scatterplot panels, Y versus X-axis; one with r > 0.0, one with r < 0.0]
Slope and correlation must have the same sign.
M22- Regression & Correlation 36 Department of ISM, University of Alabama, 1992-2003
All data exactly on a straight line:
[Figure: two scatterplot panels, Y versus X-axis]
Perfect positive relationship: r = _____
Perfect negative relationship: r = _____
M22- Regression & Correlation 37 Department of ISM, University of Alabama, 1992-2003
Which has stronger correlation?
[Figure: two scatterplot panels, Y versus X-axis]
r = _____________ r = _____________
M22- Regression & Correlation 38 Department of ISM, University of Alabama, 1992-2003
r close to -1 or +1 means _________________________ linear relation.
r close to 0 means _________________________ linear relation.
"Strength": How tightly the data follow a straight line.
M22- Regression & Correlation 39 Department of ISM, University of Alabama, 1992-2003
Which has stronger correlation?
[Figure: two scatterplot panels, Y versus X-axis]
r = ________________ r = ________________
M22- Regression & Correlation 40 Department of ISM, University of Alabama, 1992-2003
Which has stronger correlation?
[Figure: two scatterplot panels, Y versus X-axis; one shows a strong parabolic pattern]
r = ________________ r = ________________
Strong parabolic pattern! We can fix it.
M22- Regression & Correlation 41 Department of ISM, University of Alabama, 1992-2003
Computing Correlation
by hand using the formula
using a calculator (built-in)
using a computer: Excel, Minitab, . . . .
M22- Regression & Correlation 42 Department of ISM, University of Alabama, 1992-2003
Formula for Sample Correlation (Page 627)
$r = \dfrac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{n\sum x_i^2 - (\sum x_i)^2}\,\sqrt{n\sum y_i^2 - (\sum y_i)^2}}$
$r = \dfrac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \dfrac{(1)}{\sqrt{(2)(3)}}$
Look at your formula sheet
M22- Regression & Correlation 43 Department of ISM, University of Alabama, 1992-2003
Calculating Correlation
Example 1; Weight versus Height
$r = \dfrac{(1)}{\sqrt{(2)(3)}} =$ ______
Look at your formula sheet
“Go to Slide 18 for values.”
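As a check on the hand calculation, the sketch below evaluates r for Example 1 in two ways: from the raw column totals (the textbook formula) and from the three intermediate summary values on Slide 18. Variable names are illustrative assumptions; both routes should agree.

```python
# Minimal sketch: sample correlation for Example 1, computed two ways.
from math import sqrt

# Column totals from the Example 1 worksheet.
n = 5
sum_x, sum_y = 342, 795
sum_xy, sum_xx, sum_yy = 54933, 23470, 131263

# Textbook formula using raw sums.
r_raw = (n * sum_xy - sum_x * sum_y) / (
    sqrt(n * sum_xx - sum_x ** 2) * sqrt(n * sum_yy - sum_y ** 2)
)

# Formula-sheet version using the intermediate summary values (1), (2), (3).
sxy, sxx, syy = 555.0, 77.2, 4858.0
r_short = sxy / sqrt(sxx * syy)

print(f"r from raw sums       = {r_raw:.4f}")
print(f"r from summary values = {r_short:.4f}")
```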
M22- Regression & Correlation 44 Department of ISM, University of Alabama, 1992-2003
Example 6: Real estate data, previous section
[Figure: Scatterplot of Selling Price vs Square Footage (SQFT) for 50 Houses]
Positive Linear Relationship
r =
M22- Regression & Correlation 45 Department of ISM, University of Alabama, 1992-2003
Example 7: AL school data, previous section
[Figure: Scatterplot of 8th Grade SAT Percentile vs Free Lunch Participation (FRLUNCH) for the 128 Public School Systems in Alabama in 1995]
Negative Linear Relationship
r =
M22- Regression & Correlation 46 Department of ISM, University of Alabama, 1992-2003
Example 9: Rainfall data, previous section
[Figure: Scatterplot of Tuscaloosa Rainfall vs Moscow Rainfall (M_RAIN) for 60 Months]
No Linear Relationship
r =
M22- Regression & Correlation 47 Department of ISM, University of Alabama, 1992-2003
Comment 1:
Size of “r” does NOT reflect the steepness of the slope, “b”; but “r” and “b” must have the same sign.
$r = b\,\dfrac{s_x}{s_y}$ and $b = r\,\dfrac{s_y}{s_x}$
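The link between r and b in Comment 1 can be checked numerically on the Example 1 sample. The sketch below computes b and r from the sums of squares and then confirms that each can be recovered from the other using the sample standard deviations; variable names are assumptions made for illustration.

```python
# Minimal sketch: verify r = b * s_x / s_y and b = r * s_y / s_x on Example 1.
from math import sqrt
from statistics import mean, stdev

heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]

x_bar, y_bar = mean(heights), mean(weights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
sxx = sum((x - x_bar) ** 2 for x in heights)
syy = sum((y - y_bar) ** 2 for y in weights)

b = sxy / sxx              # least squares slope
r = sxy / sqrt(sxx * syy)  # sample correlation

s_x, s_y = stdev(heights), stdev(weights)  # sample standard deviations (n - 1)

r_from_b = b * s_x / s_y
b_from_r = r * s_y / s_x

print(f"b = {b:.3f}, r = {r:.4f}")
print(f"r recovered from b = {r_from_b:.4f}")
print(f"b recovered from r = {b_from_r:.3f}")
```

Because the ratio of standard deviations is always positive, the two quantities share a sign, while the slope's size depends on the units of X and Y.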
M22- Regression & Correlation 48 Department of ISM, University of Alabama, 1992-2003
Comment 2:
Changing the units of Y and X does not affect the size of r.
• Inches to centimeters
• Pounds to kilograms
• Celsius to Fahrenheit
• X to Z (standardized)
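To illustrate Comment 2, the sketch below converts the Example 1 heights to centimeters and the weights to kilograms, then recomputes the slope and the correlation. The helper name `slope_and_r` is hypothetical; the conversion factors are 2.54 cm per inch and 0.45359237 kg per pound.

```python
# Minimal sketch: r is unchanged by a change of units, but the slope is not.
from math import sqrt

def slope_and_r(xs, ys):
    """Return (b, r) for paired data using the sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sxx, sxy / sqrt(sxx * syy)

CM_PER_INCH = 2.54
KG_PER_POUND = 0.45359237

heights_in = [73, 68, 67, 72, 62]
weights_lb = [175, 158, 140, 207, 115]
heights_cm = [x * CM_PER_INCH for x in heights_in]
weights_kg = [y * KG_PER_POUND for y in weights_lb]

b_imperial, r_imperial = slope_and_r(heights_in, weights_lb)
b_metric, r_metric = slope_and_r(heights_cm, weights_kg)

print(f"pounds per inch: b = {b_imperial:.3f}, r = {r_imperial:.4f}")
print(f"kg per cm:       b = {b_metric:.3f}, r = {r_metric:.4f}")
```

The slope changes because it carries units (Y-units per X-unit), while r is unit-free.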
M22- Regression & Correlation 49 Department of ISM, University of Alabama, 1992-2003
Comment 3: High correlation does not always imply causation.
Example: X = dryer temperature, Y = drying time for clothes
Causation: Changes in X actually do cause changes in Y.
Consistency, responsiveness, mechanism
M22- Regression & Correlation 50 Department of ISM, University of Alabama, 1992-2003
Comment 4:
Common Response
Both X and Y change as some unobserved third variable changes.
Example: In basketball, there is a high correlation between points scored and personal fouls committed over a season. Third variable is ___?
M22- Regression & Correlation 51 Department of ISM, University of Alabama, 1992-2003
Comment 5:
Confounding
The effect of X on Y is "hopelessly" mixed up with the effects of other variables on Y.
Example: Is adult behavior most affected by environment or genetics?
M22- Regression & Correlation 52 Department of ISM, University of Alabama, 1992-2003
The end