Least-Squares Regression Section 3.3

Uploaded by roland-adams

Posted on 12-Jan-2016


Page 1: Least-Squares Regression Section 3.3. Why Create a Model? There are two reasons to create a mathematical model for a set of bivariate data. To predict

Least-Squares Regression, Section 3.3

Page 2

Why Create a Model?

There are two reasons to create a mathematical model for a set of bivariate data:

• To predict the response value for a new individual.
• To find the “average” response value for any explanatory value.

Page 3

Which Model Is “Best”?

We want to use our model to predict response values for given explanatory values, so we will define “best” as the model with the smallest error. (We define the “error,” or residual, as the vertical distance from an observed value to the prediction line.)

Residual = Observed – Predicted

When the variables show a linear relationship, the line of “best” fit is the Least-Squares Regression Line.
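Here is a minimal sketch of the residual definition. The Python below computes Residual = Observed - Predicted for every point. The data and the candidate line (b0 = 2.0, b1 = 0.5) are hypothetical values chosen for illustration, not the hamburger data. `residuals` is a helper name introduced here.

```python
def residuals(xs, ys, b0, b1):
    """Return observed - predicted for each point, using the line y-hat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical toy data and an arbitrary candidate line (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(residuals(xs, ys, b0=2.0, b1=0.5))  # [-0.5, 1.0, 1.5, 0.0, 0.5]
```

A positive residual means the point lies above the line. A negative residual means it lies below.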

Page 4

Least-Squares Regression Line

Why is it called the “Least-Squares Regression Line”?

Consider the hamburger data set.

Notice that our line is an “average” line and does not pass through every data point.

• This means that our predictions will have some error associated with them.

[Figure: Collection 1 Scatter Plot of Calories vs. Fat, with fitted line Calories = 11.1 Fat + 210; r² = 0.92]

Page 5

So Why Is It “Best”?

If we find the vertical distance from each actual data point to our prediction line, we can measure the amount of error. But if we add these errors together, they sum to zero, because our line is an “average” line: the positive and negative errors cancel.

We can avoid that sum of zero by squaring each error and then adding the squares.
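This minimal Python sketch uses hypothetical toy data, not the hamburger data. It fits the least-squares line with b1 = Sxy/Sxx and b0 = ȳ - b1·x̄, which is an equivalent form of the summary-statistics formula given later in this section. It then shows that the raw errors sum to (essentially) zero while the squared errors do not. `fit_lsrl` is a name introduced here for illustration.

```python
def fit_lsrl(xs, ys):
    """Least-squares intercept and slope: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


xs = [1, 2, 3, 4, 5]  # hypothetical toy data, not the hamburger data
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_lsrl(xs, ys)
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(round(abs(sum(res)), 10))             # raw errors cancel: 0.0
print(round(sum(e ** 2 for e in res), 10))  # squared errors do not: 2.4
```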

Page 6

Smallest “Sum of Squared Error”

The line called the Least-Squares Regression Line has the smallest sum of squared errors of any line.

By this criterion, it is the line that does the best job of predicting.

[Figure: Collection 1 Scatter Plot of Calories vs. Fat, with fitted line Calories = 11.1 Fat + 210; r² = 0.92; sum of squares = 3736]
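The "smallest sum of squared error" claim can be spot-checked numerically. This sketch uses hypothetical toy data and a hypothetical helper `sse`. It computes the least-squares line, nudges its intercept and slope, and confirms that each nudged line has a larger sum of squared errors. A handful of nudges illustrates the idea but does not prove it. The general result comes from calculus.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]  # hypothetical toy data
ys = [2, 4, 5, 4, 5]

# Least-squares line for this toy data: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar.
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
best = sse(xs, ys, b0, b1)

# Nudge the intercept and/or slope; each nudged line should have a larger SSE.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.3, -0.1)]
for d0, d1 in nudges:
    other = sse(xs, ys, b0 + d0, b1 + d1)
    print(f"b0={b0 + d0:.2f}, b1={b1 + d1:.2f}: SSE={other:.4f} (LSRL: {best:.4f})")
    assert other > best
```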

Page 7

Equation of the LSRL

The LSRL can be found using the means, standard deviations, and the correlation between our explanatory and response variables.

$$\hat{y} = b_0 + b_1 x$$

where:

• $\hat{y}$ = predicted response variable
• $b_0$ = y-intercept
• $b_1$ = slope
• $x$ = explanatory variable value

Page 8

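Calculating the LSRL Using Summary Statistics

When all we have are the summary statistics, we can use the following equation for the LSRL:

$$\hat{y} = b_0 + b_1 x$$

where $b_1$ and $b_0$ can be found using:

$$b_1 = r\,\frac{s_y}{s_x} \qquad b_0 = \bar{y} - b_1 \bar{x}$$

Here $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ are the standard deviations, of the explanatory and response variables, and $r$ is their correlation.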
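To see the two formulas in action, this sketch reduces a hypothetical toy data set to its means, sample standard deviations, and r. It then rebuilds the LSRL from those numbers alone. `lsrl_from_summary` is a helper name introduced for illustration. The result matches fitting the raw data directly, because r·s_y/s_x simplifies algebraically to Sxy/Sxx.

```python
import math
import statistics


def lsrl_from_summary(x_bar, y_bar, s_x, s_y, r):
    """LSRL coefficients from summary statistics: b1 = r * s_y / s_x, b0 = y_bar - b1 * x_bar."""
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data, reduced to its summary statistics.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r = sxy / math.sqrt(sxx * syy)  # correlation coefficient

b0, b1 = lsrl_from_summary(x_bar, y_bar, s_x, s_y, r)
print(round(b0, 6), round(b1, 6))  # 2.2 0.6, the same line as fitting the raw data
```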

Page 9

Finding the LSRL

With the summary statistics for both fat grams (x) and calories (y), we can find the line of “best” fit. This line predicts the number of calories we can expect, on average, for a given number of fat grams.

$$r = 0.9606 \qquad \bar{x} = 34.2857 \qquad s_x = 7.8042 \qquad \bar{y} = 590 \qquad s_y = 89.8146$$

Page 10

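$$\widehat{\text{calories}} = b_0 + b_1(\text{fat})$$

$$b_1 = 0.9606\left(\frac{89.8146}{7.8042}\right) = 11.0551$$

$$b_0 = 590 - 11.0551(34.2857) = 210.9682$$

$$\widehat{\text{calories}} = 210.9682 + 11.0551(\text{fat})$$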
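The same arithmetic can be written as a short Python sketch using the hamburger summary statistics from the slides. `lsrl_from_summary` is a helper name introduced here. The sketch rounds b1 to four decimals before computing b0, as the worked calculation does. It also shows what b0 would be without that rounding.

```python
def lsrl_from_summary(x_bar, y_bar, s_x, s_y, r):
    """b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hamburger summary statistics from the slides: x = fat (g), y = calories.
x_bar, s_x = 34.2857, 7.8042
y_bar, s_y = 590, 89.8146
r = 0.9606

b0, b1 = lsrl_from_summary(x_bar, y_bar, s_x, s_y, r)
print(f"b1 = {b1:.4f}")  # b1 = 11.0551

# Rounding b1 to 11.0551 first, as in the worked calculation:
b0_rounded = y_bar - round(b1, 4) * x_bar
print(f"b0 = {b0_rounded:.4f}")  # b0 = 210.9682
print(f"b0 without rounding b1 first = {b0:.4f}")
```

The small differences between these values and StatCrunch's 210.95387 and 11.055512 come from rounding: the summary statistics themselves are rounded to four decimals.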

Page 11

Describing b0 in Context

b0 = the y-intercept: the y-intercept is the predicted value of the response variable when the explanatory variable is zero. Sometimes this has meaning in context, and sometimes it has only a mathematical meaning.

$$\widehat{\text{calories}} = 210.9682 + 11.0551(\text{fat})$$

b0 = 210.9682: this would mean that a hamburger with no grams of fat would still have, on average, approximately 211 calories.

Page 12

Describing b1 in Context

b1 = the slope: the slope of the regression line tells us the change in the response variable we expect, on average, for an increase of 1 in the explanatory variable.

Since b1 = 11.0551, we can say that, on average, each additional gram of fat in a hamburger comes with approximately 11.06 more calories.

$$\widehat{\text{calories}} = 210.9682 + 11.0551(\text{fat})$$
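This short sketch uses the fitted equation for prediction, with the coefficients taken from the slides. The 30 g input is a hypothetical burger chosen for illustration, and `predict_calories` is a name introduced here. The sketch also shows the slope and intercept interpretations numerically.

```python
B0, B1 = 210.9682, 11.0551  # LSRL from the slides: predicted calories = B0 + B1 * fat


def predict_calories(fat_grams):
    """Predicted (average) calories for a hamburger with the given grams of fat."""
    return B0 + B1 * fat_grams


print(f"{predict_calories(30):.4f}")  # 542.6212 for a hypothetical 30 g burger

# Slope interpretation: one more gram of fat raises the prediction by b1.
print(f"{predict_calories(31) - predict_calories(30):.4f}")  # 11.0551

# Intercept interpretation: the prediction at 0 g of fat is b0.
print(f"{predict_calories(0):.4f}")  # 210.9682
```

A fat-free burger lies far below the fat values in this data set (mean about 34 g), so the 0 g prediction is an extrapolation. That is why b0 may have only a mathematical meaning here.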

Page 13

Finding the LSRL with Raw Data

We can find the LSRL using technology: either our TI calculators or statistical software.

“StatCrunch” is a web-based statistical program that provides statistical calculations and plots. Its output is very similar to that of most statistical programs.

Page 14

Least-Squares Regression Output

Simple linear regression results:
Dependent Variable: Calories
Independent Variable: Fat
Regression equation: Calories = 210.95387 + 11.055512 Fat
Sample size: 7
R (correlation coefficient) = 0.9606
R-sq = 0.9228155
Estimate of error standard deviation: 27.333975

Parameter estimates:

Parameter    Estimate     Std. Err.    DF   T-Stat      P-Value
Intercept    210.95387    50.10144     5    4.210535    0.0084
Slope        11.055512    1.4298865    5    7.7317414   0.0006
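The pieces of this output are tied together by standard simple-regression identities. These are textbook facts rather than something stated on the slide: DF = n - 2, each T-Stat = Estimate / Std. Err., R-sq = R², and the error standard deviation is √(SSE / (n - 2)). This sketch plugs in the numbers from the output to check them. It does not recompute the P-values, which would need a t-distribution routine.

```python
# Numbers copied from the StatCrunch output (hamburger data).
n = 7
r, r_sq = 0.9606, 0.9228155
s_e = 27.333975                          # estimate of error standard deviation
slope, slope_se = 11.055512, 1.4298865
intercept, intercept_se = 210.95387, 50.10144

df = n - 2                               # two parameters estimated: intercept and slope
t_slope = slope / slope_se               # T-Stat = Estimate / Std. Err.
t_intercept = intercept / intercept_se
sse = df * s_e ** 2                      # s_e = sqrt(SSE / (n - 2)), rearranged

print(df)                                  # 5
print(f"{t_intercept:.4f}")                # 4.2105
print(f"{t_slope:.4f}")                    # 7.7317
print(f"{r ** 2:.4f} vs R-sq {r_sq:.4f}")  # 0.9228 vs R-sq 0.9228
print(f"{sse:.0f}")                        # 3736
```

The recovered SSE of about 3736 agrees with the "sum of squares = 3736" shown on the Page 6 scatter plot.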

Page 15

TI Tips for the LSRL

To find the LSRL on a TI-83 or TI-84 calculator, first enter the data into the calculator's list editor. You can use either named lists or the built-in lists.

Page 16

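From the home screen: STAT → CALC → 8:LinReg(a+bx)

The arguments for this command simply tell the calculator where the explanatory and response values are located (for example, LinReg(a+bx) L1, L2).

Press ENTER. Notice that, in addition to the y-intercept (a) and slope (b), the calculator also gives the correlation coefficient, r. On many TI-83/84 models, r and r² appear only after DiagnosticOn has been run from the CATALOG.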

Page 17

Is a Linear Model Appropriate?

We now know how to create a linear model, but how do we know that this type of model is the appropriate one?

To answer this question, we look at three things:

• Does a scatterplot of the data appear linear?
• How strong is the linear relationship, as measured by the correlation coefficient, r?
• What does a graph of the residuals (errors in prediction) look like?

Page 18

Checking for Linearity

As we can see from the scatterplot, the relationship appears fairly linear.

The correlation coefficient for the linear relationship is 0.9606.

Both of these suggest a linear model. Even so, we must check a graph of the residuals to make sure the errors from the linear model aren’t systematic in some way.

[Figure: Collection 1 Scatter Plot of Calories vs. Fat]

Page 19

Residuals

We can look at a graph of the explanatory values (x-values) vs. the errors (residuals) produced by the LSRL. If no pattern is present, we can use a linear model to represent the relationship.

However, if a pattern is present (like those in the example graphs on this slide), we should investigate other possible models.

[Figure: three example residual plots (residual vs. x), illustrating the patterns below]

• A parabolic (curved) pattern indicates that the relationship is not linear.
• A wave-like (“trig”-looking) pattern indicates autocorrelation.
• An increase or decrease in the spread of the residuals is called a megaphone effect.
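The parabolic case is easy to reproduce. This sketch fits the LSRL to hypothetical data lying exactly on y = x², not data from the slides. The correlation r is still about 0.98, yet the residuals run positive, then negative, then positive again. That U shape is the pattern that signals a curved relationship. `fit_with_r` is a helper name introduced here.

```python
def fit_with_r(xs, ys):
    """Least-squares intercept b0, slope b1, and correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx
    r = sxy / (sxx * syy) ** 0.5
    return y_bar - b1 * x_bar, b1, r


# Hypothetical curved data: y = x^2 exactly.
xs = [1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]
b0, b1, r = fit_with_r(xs, ys)
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(f"r = {r:.4f}")                        # r = 0.9789
print(["+" if e > 0 else "-" for e in res])  # ['+', '-', '-', '-', '-', '+']
```

A high r alone does not rule out a curved pattern, which is why the residual plot is checked as well.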

Page 20

Hamburger Residuals

Notice that there does not appear to be any pattern in the residuals of the least-squares regression line of calories on fat grams for fast-food hamburgers. This indicates that a linear model is appropriate.

Page 21

How Good Is Our Model?

Even when a linear model is appropriate, we can also ask how much of the variation in the response variable is explained by variation in the explanatory variable.

The statistic that gives this information is r², the coefficient of determination. It measures how much our explanatory variable contributes to predicting our response variable.

Page 22

How Good Is Our Hamburger Model?

From both our StatCrunch output and our calculator output, we found that r² = 0.9228.

Approximately 92% of the variation in the number of calories in a hamburger can be explained by the variation in the number of fat grams.

An alternative way to say the same thing: approximately 92% of the variation in the number of calories can be explained by the least-squares regression of calories on fat grams.

Page 23

So, How Good Is It?

It may help to know how r² is calculated.

Yes, r² is the square of the correlation coefficient r, but it is also useful to see it in a different light.

Remember that our goal is to find a model that helps us predict the response variable, in this case calories.
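One common way to see r² in a different light is the standard identity r² = 1 - SSE/SST. It is offered here as a sketch, not as the slide's own derivation. SST is the sum of squared errors we would get by predicting every response with its mean ȳ, and SSE is the sum of squared residuals from the LSRL. Using only numbers from the slides, SST can be recovered as (n - 1)s_y² and SSE as (n - 2)s_e², where s_e is StatCrunch's "estimate of error standard deviation."

```python
# Hamburger numbers from the slides and the StatCrunch output.
n = 7
s_y = 89.8146        # standard deviation of calories
s_e = 27.333975      # estimate of error standard deviation
r = 0.9606

sst = (n - 1) * s_y ** 2   # squared error if every burger is predicted by y-bar
sse = (n - 2) * s_e ** 2   # squared error left over after using the LSRL

r_sq = 1 - sse / sst
print(f"SST = {sst:.0f}, SSE = {sse:.0f}")  # SST = 48400, SSE = 3736
print(f"r^2 = 1 - SSE/SST = {r_sq:.4f}")    # r^2 = 1 - SSE/SST = 0.9228
print(f"r^2 = r * r       = {r * r:.4f}")   # r^2 = r * r       = 0.9228
```

Read this way, r² is the fraction of the mean-only prediction error that the regression line removes. When SSE is nearly as large as SST, r² is near 0 and the line does little better than the mean.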

Page 24

Interpreting r²

When r² is close to zero, the variable we chose as a predictor does not contribute much. In other words, we could just as well use the mean of the response variable.

As r² gets closer to 1, the explanatory variable contributes much more to our predictions, and the regression model becomes more useful for prediction than just reporting the mean.

Some models include more than one explanatory variable. This type of model is called multiple linear regression, and we’ll leave the study of these models for another course.

Page 25

Additional Resources

• Against All Odds, Video #7: Models for Growth, http://www.learner.org/resources/series65.html
• The Practice of Statistics (YMM), pp. 137-151
• The Practice of Statistics (YMS), pp. 149-165
• The Basic Practice of Statistics (Moore), pp. 104-123

Page 26

What You Learned:

• Why we create a model
• Which model is “best” and why
• Finding the LSRL using summary statistics
• Using technology to find the LSRL
• Describing the y-intercept and slope in context
• Determining whether an LSRL is appropriate
• How “good” is our model?