regression for the purposes of this class: does y depend on x? does a change in x cause a change in...


Upload: allyson-nelson

Post on 19-Jan-2018


TRANSCRIPT

Page 1: Regression For the purposes of this class: Does Y depend on X? Does a change in X cause a change in Y? Can Y be predicted from X? Y= mX + b Predicted

Regression

• For the purposes of this class:
– Does Y depend on X?
– Does a change in X cause a change in Y?
– Can Y be predicted from X?
• Y = mX + b

[Figure: scatter plot of Dependent Value against Independent Value, showing actual values, predicted values, and the overall mean.]

Page 2

When analyzing a regression-type data set, the first step is to plot the data:

X        Y
35       114
45       120
55       150
65       140
75       166
Average
55       138

The next step is to determine the line that ‘best fits’ these points. It appears this line would be sloped upward and linear (straight).

[Figure: scatter plot of Dependent Value (Y) against Independent Value (X).]

Page 3

The line of best fit is the sample regression of Y on X, and its position is fixed by two results:

1) The regression line passes through the point (X_avg, Y_avg).

2) Its slope is at the rate of “m” units of Y per unit of X, where m = regression coefficient (the slope in Y = mX + b).

[Figure: scatter plot of Dependent Value against Independent Value, with the fitted line passing through the point (55, 138).]

Y = 1.24(X) + 69.8, where 1.24 is the slope (rise/run) and 69.8 is the Y-intercept.
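The slides give the fitted line but not the arithmetic behind it. The sketch below assumes the usual least-squares formula for the slope (sum of cross-deviations divided by the sum of squared X deviations), which the slides do not spell out, and then uses the fact that the line passes through (X_avg, Y_avg) to get the intercept. Variable names are illustrative.

```python
# Data from the slides (X = independent value, Y = dependent value).
xs = [35, 45, 55, 65, 75]
ys = [114, 120, 150, 140, 166]

n = len(xs)
x_avg = sum(xs) / n
y_avg = sum(ys) / n

# Assumed least-squares slope: cross-deviations over squared X deviations.
s_xy = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys))
s_xx = sum((x - x_avg) ** 2 for x in xs)
slope = s_xy / s_xx

# The line passes through (x_avg, y_avg), which fixes the intercept.
intercept = y_avg - slope * x_avg

print(f"overall average: ({x_avg:g}, {y_avg:g})")
print(f"Y = {slope:.2f}(X) + {intercept:.1f}")
```

With the five data points above, this should reproduce the slide's line, Y = 1.24(X) + 69.8, and the pivot point (55, 138).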

Page 4

Testing the Regression Line for Significance

• An F-test is used based on the Model, Error, and Total SOS (sums of squares).
– Very similar to ANOVA
• Basically, we are testing if the regression line has a significantly different slope than a line formed by using just Y_avg.
– If there is no difference, then that means that Y does not change as X changes (stays around the average value)
• To begin, we must first find the regression line that has the smallest Error SOS.

Page 5

Error SOS

The regression line should pass through the overall average with the slope that gives the smallest Error SOS. Error SOS is the sum of the squared vertical distances between each point and the predicted line; it gives an index of the variability of the data points around the predicted line.

[Figure: scatter plot of Dependent Value against Independent Value with the fitted line. The overall average (55, 138) is the pivot point.]

Page 6

For each X, we can predict Y: Y = 1.24(X) + 69.8

X     Y_Actual   Y_Pred   SOSError
35    114        113.2    0.64
45    120        125.6    31.36
55    150        138      144
65    140        150.4    108.16
75    166        162.8    10.24
Sum                       294.4

Error SOS is calculated as the sum of (Y_Actual – Y_Predicted)²

This gives us an index of how scattered the actual observations are around the predicted line. The more scattered the points, the larger the Error SOS will be. This is like analysis of variance, except we are using the predicted line instead of the mean value.

[Figure: scatter plot against Independent Value.]
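To make the "smallest Error SOS" idea concrete, here is a minimal sketch that pivots a line through the overall average (55, 138) and computes Error SOS for a few slopes. The helper name `error_sos_for_slope` and the trial slopes 1.0 and 1.5 are illustrative choices, not values from the slides.

```python
xs = [35, 45, 55, 65, 75]
ys = [114, 120, 150, 140, 166]
X_AVG, Y_AVG = 55, 138  # pivot point (overall average) from the slides


def error_sos_for_slope(slope):
    """Error SOS for a line with this slope that passes through the pivot."""
    intercept = Y_AVG - slope * X_AVG
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# 1.24 is the slide's fitted slope; 1.0 and 1.5 are arbitrary comparisons.
for slope in (1.0, 1.24, 1.5):
    print(f"slope {slope:.2f}: Error SOS = {error_sos_for_slope(slope):.2f}")
```

The slope 1.24 should give the table's 294.4, while the flatter and steeper trial lines give larger values (about 352 and 362), which is what "best fit" means here.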

Page 7

Total SOS

• Calculated as the sum of (Y – Y_avg)²
• Gives us an index of how scattered our data set is around the overall Y average.

[Figure: scatter plot of Dependent Value against Independent Value, showing the overall Y average. Regression line not shown.]

Page 8

X     Y_Actual   Y_Average   SOSTotal
35    114        138         576
45    120        138         324
55    150        138         144
65    140        138         4
75    166        138         784
Sum                          1832

Total SOS gives us an index of how scattered the data points are around the overall average. This is calculated the same way as for a single treatment in ANOVA.

What happens to Total SOS when all of the points are close to the overall average? What happens when the points form a non-horizontal linear trend?

Page 9

Model SOS

• Calculated as the sum of (Y_Predicted – Y_avg)²
• Gives us an index of how far away the predicted values are from the overall average value

[Figure: scatter plot of Dependent Value against Independent Value, marking the distance between each predicted Y and the overall mean.]

Page 10

Model SOS

• Gives us an index of how far away the predicted values are from the overall average value
• What happens to Model SOS when all of the predicted values are close to the average value?

X     Y_Pred   Y_Average   SOSModel
35    113.2    138         615.04
45    125.6    138         153.76
55    138      138         0
65    150.4    138         153.76
75    162.8    138         615.04
Sum                        1537.6

Page 11

All Together Now!!

X     Y_Actual   Y_Pred   SOSError   Y_Avg   SOSTotal   SOSModel
35    114        113.2    0.64       138     576        615.04
45    120        125.6    31.36      138     324        153.76
55    150        138      144        138     144        0
65    140        150.4    108.16     138     4          153.76
75    166        162.8    10.24      138     784        615.04
Sum                       294.4              1832       1537.6

SOSError = (Y_Actual – Y_Pred)²

SOSTotal = (Y_Actual – Y_Avg)²

SOSModel = (Y_Pred – Y_Avg)²
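The three column sums can be reproduced in a few lines. This sketch takes the slope and intercept from the slides as given and also checks that Total SOS equals Model SOS plus Error SOS, a relationship that holds for the least-squares line but that the slides leave implicit.

```python
xs = [35, 45, 55, 65, 75]
ys = [114, 120, 150, 140, 166]
slope, intercept = 1.24, 69.8  # fitted line from the slides

y_avg = sum(ys) / len(ys)
preds = [slope * x + intercept for x in xs]

error_sos = sum((y - p) ** 2 for y, p in zip(ys, preds))   # actual vs predicted
total_sos = sum((y - y_avg) ** 2 for y in ys)              # actual vs average
model_sos = sum((p - y_avg) ** 2 for p in preds)           # predicted vs average

print(f"Error SOS = {error_sos:.1f}")
print(f"Total SOS = {total_sos:.1f}")
print(f"Model SOS = {model_sos:.1f}")
print(f"Model + Error = {model_sos + error_sos:.1f}")
```

The printed sums should match the bottom row of the table (294.4, 1832, 1537.6), with Model + Error adding up to the Total.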

Page 12

Using SOS to Assess Regression Line

• Model SOS gives us an index of how ‘different’ the predicted values are from the average value.
– Bigger Model SOS = more different
– Tells us how different a sloped line is from a line made up only of Y_avg.
– Remember, the regression line will pass through the overall average point.
• Error SOS gives us an index of how different the predicted values are from the actual values.
– More variability = larger Error SOS = large distance between predicted and actual values

Page 13

Magic of the F-test

• The ratio of Model SOS to Error SOS, each first divided by its degrees of freedom (Mean Model SOS divided by Mean Error SOS), gives us an overall index (the F statistic) used to indicate the relative ‘difference’ between the regression line and a line with a slope of zero (all values = Y_avg).
– A large Model SOS and a small Error SOS = a large F statistic. Why does this indicate a significant difference?
– A small Model SOS and a large Error SOS = a small F statistic. Why does this indicate no significant difference?
• Based on sample size, each F statistic has an associated P-value, which is compared with the alpha level (0.05 here).
– P < 0.05 (large F statistic): there is a significant difference between the regression line and the Y_avg line.
– P ≥ 0.05 (small F statistic): there is NO significant difference between the regression line and the Y_avg line.

Page 14

F = Mean Model SOS / Mean Error SOS

[Figure: two scatter plots of Dependent Value against Independent Value, one illustrating Mean Model SOS and one illustrating Mean Error SOS.]

Basically, this is an index that tells us how different the regression line is from Y_avg, relative to the scatter of the data around the predicted values.
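As a worked example, the F statistic for the class data can be computed from the two sums of squares. The slides do not state the degrees of freedom; this sketch assumes the standard ones for a straight-line regression (1 for the model, n – 2 for the error). The critical value 10.13 is the tabulated F value for alpha = 0.05 with (1, 3) degrees of freedom, used here in place of an exact P-value.

```python
model_sos = 1537.6  # from the Model SOS table
error_sos = 294.4   # from the Error SOS table
n = 5               # number of data points

# Assumed degrees of freedom for a straight-line regression.
df_model = 1        # one slope
df_error = n - 2    # n points minus slope and intercept

mean_model_sos = model_sos / df_model
mean_error_sos = error_sos / df_error
f_stat = mean_model_sos / mean_error_sos

# Tabulated critical value of F for alpha = 0.05 and (1, 3) degrees of freedom.
F_CRIT_05 = 10.13
significant = f_stat > F_CRIT_05

print(f"F = {f_stat:.2f}, significant at 0.05: {significant}")
```

An F statistic above the critical value corresponds to P < 0.05, so under these assumptions the regression line differs significantly from the Y_avg line.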

Page 15

[Figure: scatter plot of Dependent Value against Independent Value with the fitted line.]

Y = 1.24(X) + 69.8, where 1.24 is the slope (rise/run) and 69.8 is the Y-intercept.

Use the regression line to predict a specific number or a specific change.
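Both uses can be sketched briefly. The function name `predict_y`, the input X = 60, and the 10-unit change are hypothetical values chosen for illustration; only the slope and intercept come from the slides.

```python
SLOPE, INTERCEPT = 1.24, 69.8  # fitted line from the slides


def predict_y(x):
    """Predicted Y for a given X on the fitted line."""
    return SLOPE * x + INTERCEPT


# Predict a specific number: Y at a hypothetical X of 60.
y_at_60 = predict_y(60)

# Predict a specific change: the change in Y for a hypothetical 10-unit rise in X.
change_for_10 = SLOPE * 10

print(f"Predicted Y at X = 60: {y_at_60:.1f}")
print(f"Predicted change in Y per 10 units of X: {change_for_10:.1f}")
```

X = 60 was picked because it lies inside the observed range of X (35 to 75), where the fitted line is supported by data.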

Page 16

Correlation (r): Another measure of the mutual linear relationship between two variables.

• ‘r’ is a pure number without units or dimensions
• ‘r’ is always between –1 and 1
• Positive values indicate that y increases when x does and negative values indicate that y decreases when x increases.
– What does r = 0 mean?
• ‘r’ is a measure of intensity of association observed between x and y.
– ‘r’ does not predict – only describes associations between variables
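The slides do not give a formula for r. A minimal sketch, assuming the standard Pearson formula (cross-deviations divided by the square root of the product of the squared deviations of x and y), applied to the class data:

```python
import math

xs = [35, 45, 55, 65, 75]
ys = [114, 120, 150, 140, 166]

x_avg = sum(xs) / len(xs)
y_avg = sum(ys) / len(ys)

# Sums of cross-deviations and squared deviations.
s_xy = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys))
s_xx = sum((x - x_avg) ** 2 for x in xs)
s_yy = sum((y - y_avg) ** 2 for y in ys)

# Pearson's r: unitless, always between -1 and 1.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"r = {r:.4f}")
print(f"r squared = {r ** 2:.4f}")
```

For these data r is positive and close to 1, consistent with the upward trend in the scatter plot.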

Page 17

[Figure: three scatter plots of Dependent Variable against Independent Variable, illustrating r > 0, r < 0, and r = 0.]

r is also called Pearson’s correlation coefficient.

Page 18

R-square

• If we square r, we get rid of the negative sign (if r is negative) and we get an index of how close the data points are to the regression line.
• Allows us to decide how much confidence we have in making a prediction based on our model.
• Is calculated as Model SOS / Total SOS
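Using the sums from the earlier tables (Model SOS = 1537.6, Error SOS = 294.4, Total SOS = 1832), the calculation can be written out as a worked equation. The second form, in terms of Error SOS, is an equivalent restatement that follows from Total SOS = Model SOS + Error SOS; it is not given in the slides.

```latex
r^2 = \frac{\text{Model SOS}}{\text{Total SOS}}
    = \frac{1537.6}{1832}
    \approx 0.8393
\qquad\text{equivalently}\qquad
r^2 = 1 - \frac{\text{Error SOS}}{\text{Total SOS}}
    = 1 - \frac{294.4}{1832}
    \approx 0.8393
```

Read this way, about 84% of the scatter of Y around its average is accounted for by the regression line.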

Page 19

r² = Model SOS / Total SOS

[Figure: scatter plot of Dependent Value against Independent Value, with Model SOS and Total SOS marked.]

Page 20

r² = Model SOS / Total SOS (numerator / denominator)

A small numerator over a big denominator gives a small r².

[Figure: two scatter plots with Model SOS and Total SOS marked, labelled R² = 0.0144 and R² = 0.8393.]

Page 21

R-square and Prediction Confidence

[Figure: four scatter plots, labelled R² = 0.0144, R² = 0.5537, R² = 0.7605, and R² = 0.9683.]

Page 22

Finally…

• If we have a significant relationship (based on the p-value), we can use the r-square value to judge how sure we are in making a prediction.