Regression

Upload: minh

Posted on 14-Jan-2016


DESCRIPTION

Regression (PowerPoint presentation).

TRANSCRIPT

Page 1: Regression

Regression

Page 2: Regression

Regression

• Correlation measures the strength of the linear relationship

• Great! But what is that relationship? How do we describe it?

– regression, regression line, regression equation

• Regression line is used for prediction

Page 3: Regression

Predicting weights from heights

• Independent variable: height

• Dependent variable: weight

• How can we predict one from the other?

• Regression is to a scatter plot as the mean is to a histogram.

Page 4: Regression

Weights vs. Heights

Page 5: Regression

Salary by years employed (scatter plot of salary against years employed)

Page 6: Regression

Regression by local averages

Approximation of local averages by regression line

Inappropriate use of regression line (use other methods)
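The "local averages" idea can be sketched in Python: group the observations by x and average y within each group. The data and the function name `local_averages` below are hypothetical. Here the averages happen to fall on the straight line y = 8 + 2x, which is the situation where a regression line summarises them well. When the averages follow a curve, as in the "inappropriate use" panel, a straight line is the wrong summary.

```python
from collections import defaultdict

def local_averages(xs, ys):
    """Average y within each distinct x value (a 'graph of averages')."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    # One point per distinct x: the mean of the y values observed there
    return {x: sum(v) / len(v) for x, v in sorted(groups.items())}

# Hypothetical data: three observations at each of four x values
xs = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
ys = [9, 10, 11, 11, 12, 13, 13, 14, 15, 15, 16, 17]
print(local_averages(xs, ys))  # {1: 10.0, 2: 12.0, 3: 14.0, 4: 16.0}
```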

Page 7: Regression

The equation of a line

• a represents the y-intercept

– when x equals zero, y equals a

– Is this always meaningful in the context of a problem?

– Is it always useful in defining a line?

• b represents the slope of the line (rise/run)

– for every unit change in x, y changes by b.

– Does this mean that if we physically change x by one unit, y will change by b units? Say we gain another year of experience. Will our salary go up by 1107?

$y = a + bx$
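Below is a minimal Python sketch of the line y = a + bx. It assumes the salary line y = 28394 + 1107x, which is our reading of the fitted line on the salary slide; its slope matches the 1107 quoted above. The helper `line` is hypothetical. The slope appears as the difference between two predictions one unit of x apart. That is a statement about the fitted line, not a guarantee about what happens to any one person's salary.

```python
def line(a, b):
    """Return the function y = a + b*x for intercept a and slope b."""
    return lambda x: a + b * x

# Assumed coefficients: our reading of the salary slide, y = 28394 + 1107x
salary = line(28394, 1107)

print(salary(0))                # intercept: predicted salary at 0 years employed
print(salary(11) - salary(10))  # slope: change in the prediction per extra year
```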

Page 8: Regression

Regression equation

• What is the predicted weight of somebody whose height is h cm?

• w = intercept + slope × h

• This is known as the regression equation.

• How do we get this formula?

• We have a statistical model.

Page 9: Regression

Regression line by minimising residual errors

Salary by years employed, with the fitted regression line $y = 28394 + 1107x$ and one residual marked.

$y_i = a + b x_i + \varepsilon_i$, where $\varepsilon_i$ is the error of the i-th observation from the regression line.

• The best candidate line will minimise these errors.

• No line can make all errors vanish (some are positive, some negative).

• Minimise the sum of squared errors, $\sum_i \varepsilon_i^2$. Minimising gives the regression line.
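The least-squares line has a standard closed form:

$b = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$, $a = \bar{y} - b\bar{x}$

The Python sketch below applies that textbook formula to hypothetical data, not the salary data, which are not listed. It also checks that the residuals sum to zero, meaning the positive and negative errors cancel. The function names are our own.

```python
def least_squares(xs, ys):
    """Intercept a and slope b minimising sum((y_i - a - b*x_i)**2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # the line passes through (x_bar, y_bar)
    return a, b

def residuals(xs, ys, a, b):
    """Error of each observation from the line: observed - predicted."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b = least_squares(xs, ys)
e = residuals(xs, ys, a, b)
print(a, b)
print(sum(e))  # essentially zero: positive and negative errors cancel
```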

Page 10: Regression

Regression and correlation

• We want to predict weight for people who are 1 SD above average height.

• SD line says:

– Predicted wt. = overall avg. wt. + SD of wt.

• Regression line says:

– Predicted wt. = overall avg. wt. + r × SD of wt.

• For people who are k SDs away from average height:

– Predicted wt. = overall avg. wt. + r × k × SD of wt.

• Clearly valid for r = 0 or r = 1.
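The regression-line rule above can be coded directly in Python. This sketch uses hypothetical height and weight numbers and population (divide-by-n) SDs. With that convention the rule gives the same prediction as the least-squares line, because the least-squares slope is r × SD(y) / SD(x). For these numbers that works out to 65 + 0.98 × 15 = 79.7 kg at 185 cm. The function names are our own.

```python
from statistics import mean, pstdev

def predict_regression(xs, ys, x_new):
    """Predicted y = avg(y) + r * k * SD(y), where x_new is k SDs from avg(x)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    # Pearson correlation, written with population (divide-by-n) SDs
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)
    k = (x_new - mx) / sx
    return my + r * k * sy

def predict_sd_line(xs, ys, x_new):
    """SD line for comparison: avg(y) + k * SD(y), which ignores r."""
    mx, my = mean(xs), mean(ys)
    k = (x_new - mx) / pstdev(xs)
    return my + k * pstdev(ys)

# Hypothetical heights (cm) and weights (kg)
heights = [160, 165, 170, 175, 180]
weights = [55, 61, 64, 70, 75]
print(predict_regression(heights, weights, 185))
print(predict_sd_line(heights, weights, 185))
```

Because |r| < 1 here, the regression prediction sits closer to the average weight than the SD line does.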

Page 11: Regression

RMS error of regression

• RMS error = $\sqrt{1 - r^2}$ × SD of y

• RMS error shrinks as the correlation gets stronger: it is 0 when r = ±1 and equals the SD of y when r = 0.

RMS error is to regression what SD is to average.
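Here is a quick numerical check of the RMS formula in Python. The sketch fits the least-squares line to hypothetical data and compares the RMS of the residuals with √(1 − r²) × SD(y). When the SDs are computed by dividing by n, the two agree exactly, up to floating-point rounding. The function name `rms_error` is our own.

```python
from math import sqrt
from statistics import mean, pstdev

def rms_error(xs, ys):
    """Return (RMS of least-squares residuals, sqrt(1 - r**2) * SD(y))."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)
    b = r * sy / sx        # least-squares slope
    a = my - b * mx        # least-squares intercept
    rms = sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n)
    return rms, sqrt(1 - r ** 2) * sy

# Hypothetical data
print(rms_error([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8]))
```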

Page 12: Regression

Residuals

residual = observed − predicted

Page 13: Regression

Example: ozone vs. temperature

> air[,c(1,3)]
  ozone temperature
   3.45          67
   3.30          72
   2.29          74
   2.62          62
   2.84          65
  . . .
> cor(ozone,temperature)
[1] 0.7531038

Page 14: Regression

Fitting a regression model in S

> ozone.lm <- lm(ozone ~ temperature, data = air)

Coefficients:
              Value  Std. Error  t value  Pr(>|t|)
(Intercept)   -2.23        0.46    -4.82    0.0000
temperature    0.07        0.01    11.95    0.0000

Multiple R-Squared: 0.5672

> var(ozone)
[1] 0.7928069
> var(resid(ozone.lm))
[1] 0.3431544
> cor(ozone,temperature)
[1] 0.7531038
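The numbers in this output are consistent with each other. For a regression with one predictor, R² equals r². The quantity 1 − var(residuals)/var(ozone) is the same thing, because both variances use the same n − 1 divisor and it cancels. The Python check below does the arithmetic using only the printed values.

```python
# Values copied from the S output above
r = 0.7531038          # cor(ozone, temperature)
var_ozone = 0.7928069  # var(ozone)
var_resid = 0.3431544  # var(resid(ozone.lm))

r_squared = r ** 2                       # R^2 = r^2 for one predictor
explained = 1 - var_resid / var_ozone    # share of variance not left in the residuals

print(round(r_squared, 4), round(explained, 4))  # 0.5672 0.5672
```

Both match the reported Multiple R-Squared of 0.5672.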

Page 15: Regression

Checking model appropriateness

What assumptions have we made in the regression model?

Checking model assumptions in S-plus

> par(mfrow=c(2,3))
> plot(ozone.lm)

Page 16: Regression

Residual diagnostics for ozone data (six panels: residuals vs. fitted values; sqrt(|residuals|) vs. fitted values; ozone vs. fitted values; normal Q-Q plot of residuals; residual-fit spread plot of fitted values and residuals against f-value; Cook's distance vs. observation index)

Page 17: Regression

Pizza party at the frat

• How many laps would you predict a pledge could run if he ate 6 slices of pizza?

• How many laps if he ate 9 slices of pizza?

• A pledge shows off and eats 35 slices of pizza. How many laps would you predict he would run?

Scatter plot of distance against slices of pizza, with fitted regression line

Beware of extrapolation
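One way to take this warning literally is to refuse any prediction outside the observed range of x. In the Python sketch below, the function name, the coefficients (a = 20, b = −1) and the range of 0 to 12 slices are all hypothetical. They loosely match the plot's axes, but the slide's own fitted values are not recoverable here. Without the guard, the same line would predict −15 laps for 35 slices.

```python
def predict_in_range(a, b, x_min, x_max, x_new):
    """Predict a + b*x_new, refusing to extrapolate outside [x_min, x_max]."""
    if not (x_min <= x_new <= x_max):
        raise ValueError(f"x = {x_new} is outside the observed range [{x_min}, {x_max}]")
    return a + b * x_new

# Hypothetical line and data range
print(predict_in_range(20, -1, 0, 12, 6))  # within range: prediction returned
print(predict_in_range(20, -1, 0, 12, 9))
try:
    predict_in_range(20, -1, 0, 12, 35)    # 35 slices: far outside the data
except ValueError as err:
    print(err)
```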