regression
![Page 1: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/1.jpg)
Regression
![Page 2: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/2.jpg)
Regression
• Correlation measures the strength of the linear relationship
• Great! But what is that relationship? How do we describe it?
– regression, regression line, regression equation
• Regression line is used for prediction
![Page 3: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/3.jpg)
Predicting weights from heights
• Independent variable: height
• Dependent variable: weight
• How can we predict one from the other?
• Regression is to a scatter plot as the mean is to a histogram.
![Page 4: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/4.jpg)
Weights vs. Heights
![Page 5: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/5.jpg)
[Figure: Salary by years employed. Scatter plot of SALARY (axis 20000 to 70000) against YRSEM, years employed (axis -5 to 30).]
![Page 6: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/6.jpg)
Regression by local averages
Approximation of local averages by regression line
Inappropriate use of regression line (use other methods)
![Page 7: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/7.jpg)
The equation of a line
• a represents the y-intercept
– when x equals zero, y equals a
– Is this always meaningful in the context of a problem?
– Is it always useful in defining a line?
• b represents the slope of the line (rise/run)
– for every unit change in x, y changes by b.
– Does this mean that if we physically change x by one unit, y will change by b units? Say we gain another year of experience. Will our salary go up by 1107?
$y = a + bx$
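A minimal sketch of what a and b do in $y = a + bx$, using the intercept 28394 and slope 1107 shown on the salary plot in these slides. The function name `line` is illustrative only.

```python
def line(x, a=28394.0, b=1107.0):
    """Evaluate y = a + b*x (defaults: the salary line from the slides)."""
    return a + b * x

# The intercept is the value of the line at x = 0.
print(line(0))              # 28394.0

# Moving one unit along x changes the value of the line by exactly b.
print(line(11) - line(10))  # 1107.0
```

The difference of 1107 compares two points on the fitted line, that is, average salaries for groups one year apart. On its own it does not say that one person's salary will rise by that amount after another year.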
![Page 8: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/8.jpg)
Regression equation
• What is the predicted weight of somebody whose height is h cm?
• w = intercept + slope × h
• This is known as the regression equation.
• How do we get this formula?
• We have a statistical model
![Page 9: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/9.jpg)
Regression line by minimising residual errors
[Figure: scatter plot of SALARY (axis 20000 to 70000) against YRSEM (axis -5 to 30) with the fitted line $y = 28394 + 1107x$ and one residual marked.]
$y_i = a + b x_i + \varepsilon_i$, where $\varepsilon_i$ = error of i-th obs from regression line
• The best candidate line will minimise these errors
• No line can make all errors vanish (some +ve, some –ve)
• Minimise the sum of squared errors, $\sum_i \varepsilon_i^2$
• Minimising $\sum_i \varepsilon_i^2$ gives the regression line
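The least-squares idea can be sketched in a few lines. This is an illustration, not the S code used for these slides: the data are made up, the helper names `fit_line` and `sum_sq_errors` are hypothetical, and the closed-form solution b = Sxy / Sxx, a = mean(y) − b·mean(x) is the standard textbook result for minimising the sum of squared errors.

```python
def fit_line(xs, ys):
    """Return (a, b) minimising sum((y - a - b*x)**2), via the closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_sq_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration only (not the salary data from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
best = sum_sq_errors(xs, ys, a, b)

print(round(a, 3), round(b, 3))             # 2.2 0.6
print([round(e, 3) for e in residuals])     # some positive, some negative
print(best <= sum_sq_errors(xs, ys, a + 0.1, b))   # True: shifting the line is worse
print(best <= sum_sq_errors(xs, ys, a, b - 0.05))  # True: tilting the line is worse
```

The residuals do not vanish, but they sum to zero, and any other line tried gives a larger sum of squared errors.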
![Page 10: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/10.jpg)
Regression and correlation
• Want to predict weight for those people who are 1 SD more than avg. height.
• SD line says:
– pred. wt. = overall avg. wt. + SD of wt.
• Regression line says:
– Predicted wt. = overall avg. wt. + r × SD of wt.
• For people who are k SDs away from avg. height:
– Predicted wt. = overall avg. wt. + r × k × SD of wt.
• Clearly valid for r = 0 or r = 1
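A small sketch of the regression method, checked against the regression equation. The heights and weights are invented for illustration, the helper names are hypothetical, and SDs are population SDs (divide by n). It assumes the standard identity slope = r × SD of y / SD of x.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Population standard deviation (divide by n)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def corr(xs, ys):
    """Correlation coefficient r."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sd(xs) * sd(ys))

# Invented heights (cm) and weights (kg), for illustration only.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [52.0, 60.0, 65.0, 72.0, 85.0]

r = corr(heights, weights)
k = 1.0  # people who are k SDs above average height

# SD line: move up a full k SDs of weight.
pred_sd_line = mean(weights) + k * sd(weights)
# Regression line: move up only r * k SDs of weight.
pred_regression = mean(weights) + r * k * sd(weights)

# Same prediction from w = intercept + slope * h.
slope = r * sd(weights) / sd(heights)
intercept = mean(weights) - slope * mean(heights)
pred_from_equation = intercept + slope * (mean(heights) + k * sd(heights))

print(pred_sd_line, pred_regression, pred_from_equation)
```

With 0 < r < 1 the regression prediction sits between the overall average and the SD-line prediction, and it agrees with the value given by the regression equation.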
![Page 11: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/11.jpg)
RMS error of regression
• RMS error = $\sqrt{1 - r^2}$ × SD of y
• RMS error shrinks as the correlation gets stronger (as |r| approaches 1)
RMS error is to regression what SD is to average
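A minimal numerical check of the RMS error formula, on made-up data. The identity is exact when the RMS error and the SD of y both divide by n; the variable names are illustrative.

```python
import math

# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least-squares line and correlation coefficient.
b = sxy / sxx
a = mean_y - b * mean_x
r = sxy / math.sqrt(sxx * syy)

# RMS error computed directly from the residuals.
rms_error = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n)

# RMS error from the formula sqrt(1 - r^2) * SD of y.
sd_y = math.sqrt(syy / n)
rms_from_r = math.sqrt(1 - r ** 2) * sd_y

print(round(rms_error, 4), round(rms_from_r, 4))  # 0.6928 0.6928
```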
![Page 12: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/12.jpg)
Residuals
residual = observed − predicted
![Page 13: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/13.jpg)
Example: ozone vs. temperature
> air[,c(1,3)]
ozone temperature
3.45 67
3.30 72
2.29 74
2.62 62
2.84 65
. . .
> cor(ozone,temperature)
[1] 0.7531038
![Page 14: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/14.jpg)
Fitting a regression model in S
> ozone.lm <- lm(ozone ~ temperature, data = air)
Coefficients:
             Value Std. Error t value Pr(>|t|)
(Intercept)  -2.23       0.46   -4.82   0.0000
temperature   0.07       0.01   11.95   0.0000
Multiple R-Squared: 0.5672
> var(ozone)
[1] 0.7928069
> var(resid(ozone.lm))
[1] 0.3431544
> cor(ozone,temperature)
[1] 0.7531038
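The three numbers printed above tie together. A quick check in Python, using only values copied from the S output (the variable names are illustrative): the squared correlation and the share of variance removed by the fit should both reproduce the reported Multiple R-Squared.

```python
import math

# Values copied from the S output on this slide.
var_ozone = 0.7928069   # var(ozone)
var_resid = 0.3431544   # var(resid(ozone.lm))
r = 0.7531038           # cor(ozone, temperature)

r_squared_from_r = r ** 2
r_squared_from_var = 1 - var_resid / var_ozone

print(round(r_squared_from_r, 4))    # 0.5672
print(round(r_squared_from_var, 4))  # 0.5672

# The same numbers give the size of a typical residual:
# sqrt(var_resid) should match sqrt(1 - r^2) * SD of ozone.
rms_error = math.sqrt(var_resid)
rms_from_r = math.sqrt(1 - r ** 2) * math.sqrt(var_ozone)
print(round(rms_error, 3), round(rms_from_r, 3))  # 0.586 0.586
```

Both variances use the same divisor, so their ratio does not depend on whether var() divides by n or n − 1.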
![Page 15: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/15.jpg)
Checking model appropriateness
What assumptions have we made in the regression model?
Checking model assumptions in S-plus
> par(mfrow=c(2,3))
> plot(ozone.lm)
![Page 16: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/16.jpg)
[Figure: Residual diagnostics for ozone data. Six panels: residuals vs. fitted values (Fitted : temperature); sqrt(abs(Residuals)) vs. fits; ozone vs. fitted values; residuals vs. quantiles of standard normal; fitted values and residuals vs. f-value; Cook's distance vs. index. Observations 23, 45 and 77 are labelled in the residual panels, and 17, 20 and 77 in the Cook's distance panel.]
![Page 17: Regression](https://reader035.vdocument.in/reader035/viewer/2022070411/56814759550346895db4980a/html5/thumbnails/17.jpg)
Pizza party at the Frat.
• How many laps would you predict a pledge could run if he ate 6 slices of pizza?
• How many laps if he ate 9 slices of pizza?
• A pledge shows off and eats 35 slices of pizza. How many laps would you predict he would run?
[Figure: scatter plot of DISTANCE (axis 2 to 20) against SLICES (axis 0 to 12), annotated with the fitted line $y = 20 - 1.5x$ and $r = -0.965$ (minus signs inferred from the plotted ranges).]
Beware of extrapolation
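A short sketch of why the 35-slice question is a trap. It assumes the fitted line is y = 20 − 1.5x: the numbers 20 and 1.5 come from the plot annotation, and the minus sign is an inference from the axis ranges, not something the slide states. The function name and the range check are illustrative.

```python
def predicted_laps(slices, intercept=20.0, slope=-1.5):
    """Predicted distance from the assumed line y = 20 - 1.5 * slices."""
    return intercept + slope * slices

# Range of SLICES shown on the plot axis; the line was fitted inside it.
SLICES_MIN, SLICES_MAX = 0, 12

for s in (6, 9, 35):
    inside = SLICES_MIN <= s <= SLICES_MAX
    note = "within the data" if inside else "EXTRAPOLATION"
    print(s, predicted_laps(s), note)
# 6 11.0 within the data
# 9 6.5 within the data
# 35 -32.5 EXTRAPOLATION
```

A prediction of −32.5 laps is meaningless: the line only describes the relationship over the range of slices that was actually observed.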