AP Statistics Lesson 3 – 3: Least-Squares Regression

Post on 31-Dec-2015


AP STATISTICS

LESSON 3 – 3

LEAST-SQUARES REGRESSION

Regression Line

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x. Regression, unlike correlation, requires that we have an explanatory variable and a response variable.

LSRL is the abbreviation for least-squares regression line. The LSRL is a mathematical model.

Least-squares Regression Line

Error = observed y – predicted y

To find the best-fitting line, we square the errors and add them up. The best line is the one with the smallest sum of squared errors.

Least-squares Regression Line

The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
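The definition above can be checked numerically. The sketch below uses a small made-up data set (not from the lesson) and computes the least-squares slope and intercept with the standard closed-form formulas, which are an assumption here since the transcript does not spell them out. It then compares the sum of squared errors of that line with a few arbitrary candidate lines.

```python
# Made-up illustrative data (not from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Least-squares slope and intercept from the standard closed-form formulas.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b_ls = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs))
a_ls = y_bar - b_ls * x_bar

sse_ls = sum_squared_errors(a_ls, b_ls, xs, ys)
print(f"least-squares line: SSE = {sse_ls:.3f}")

# A few other candidate lines (intercept, slope), chosen arbitrarily.
candidates = [(2.0, 0.6), (2.2, 0.7), (1.0, 1.0), (4.0, 0.0)]
for a, b in candidates:
    sse = sum_squared_errors(a, b, xs, ys)
    print(f"line y = {a} + {b}x: SSE = {sse:.3f}")
```

Every candidate line gives a larger sum of squared errors than the least-squares line, which is what "as small as possible" means in the definition.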

Equation of the LSRL

We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means x̄ and ȳ, the standard deviations s_x and s_y, and the correlation r.
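The slide's formula for the line itself is not preserved in this transcript. The standard textbook form is ŷ = a + bx, with slope b = r(s_y / s_x) and intercept a = ȳ − b·x̄. The sketch below applies those formulas to a small made-up data set (not from the lesson); the variable names are illustrative only.

```python
import statistics

# Made-up illustrative data (not from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# Summary statistics named in the text.
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
s_x = statistics.stdev(xs)  # sample standard deviation (divides by n - 1)
s_y = statistics.stdev(ys)

# Correlation r: average product of standardized values, dividing by n - 1.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

# Slope and intercept of the least-squares regression line.
b = r * s_y / s_x
a = y_bar - b * x_bar


def predict(x):
    """Predicted value y-hat for a given x."""
    return a + b * x


print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"predicted y at x = 6: {predict(6):.2f}")
```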

What happened to y = mx + b?

y represents the observed (actual) values of y, and ŷ ("y hat") represents the predicted values of y. We use ŷ in the equation of the regression line to emphasize that the line gives predicted values for any x.

When you are solving regression problems, be sure to distinguish between y and ŷ.

Hot tip: (x̄, ȳ) is always a point on the regression line!

LESSON 3 – 3 (DAY 2)

The role of r² in regression

Essential Question:

How is r² used to determine the reliability of a linear regression line?

Objectives:

To calculate r².

To find the SST and the SSE, and to find r² from them.

Definitions and Abbreviations

r² = coefficient of determination (the proportion of the total sample variability that is explained by the least-squares regression of y on x).

LSRL – Least-squares regression line.

SST – Total sum of squares

$SST = \sum (y - \bar{y})^2$

SSE – Sum of squares of errors

$SSE = \sum (y - \hat{y})^2$

Exercises

Small r² and Large r²

Page 158: Example 3.10 (small r²)

Page 160: Example 3.11 (large r²)

r² in Regression

The coefficient of determination, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.

$r^2 = \dfrac{SST - SSE}{SST}$
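As a worked illustration of this formula, the sketch below computes SST, SSE, and r² for a small made-up data set (not from the lesson), then compares the result with the square of the correlation computed directly. The least-squares fit uses the standard closed-form slope and intercept, which is an assumption of this example rather than something stated in the transcript.

```python
import math

# Made-up illustrative data (not from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares line y-hat = a + b*x.
b = s_xy / s_xx
a = y_bar - b * x_bar
y_hat = [a + b * x for x in xs]

# SST: total variation of y about its mean.
sst = sum((y - y_bar) ** 2 for y in ys)
# SSE: variation left over after fitting the line.
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))

r_squared = (sst - sse) / sst

# Correlation computed directly, for comparison with r_squared.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"SST = {sst:.3f}, SSE = {sse:.3f}")
print(f"r^2 from (SST - SSE) / SST = {r_squared:.3f}")
print(f"r^2 from squaring r        = {r ** 2:.3f}")
```

For this data the two routes to r² agree, and the value says that 60% of the variation in y is explained by the regression on x.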

Facts about Least-squares Regression

Fact 1: The distinction between explanatory and response variable is essential in regression.

Fact 2: There is a close connection between correlation and the slope of the least-squares line. Along the line, a change of one standard deviation in x corresponds to a change of r standard deviations in the predicted value of y.

Facts of Regression (continued)

Fact 3: The least-squares regression line always passes through the point (x̄, ȳ).

Fact 4: The square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
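Facts 2 and 3 can be checked numerically. The sketch below assumes the standard slope and intercept formulas b = r(s_y / s_x) and a = ȳ − b·x̄ and uses a small made-up data set (not from the lesson). It evaluates the line at x̄ and at one standard deviation above x̄.

```python
import statistics

# Made-up illustrative data (not from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
s_x = statistics.stdev(xs)
s_y = statistics.stdev(ys)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

b = r * s_y / s_x
a = y_bar - b * x_bar


def predict(x):
    """Predicted value y-hat for a given x."""
    return a + b * x


# Fact 3: the line passes through (x-bar, y-bar).
at_mean = predict(x_bar)
print(f"y-hat at x-bar: {at_mean:.4f}   y-bar: {y_bar:.4f}")

# Fact 2: moving x up by one s_x moves y-hat by r standard deviations of y.
one_sd_up = predict(x_bar + s_x)
change_in_sd_units = (one_sd_up - at_mean) / s_y
print(f"change in y-hat, in units of s_y: {change_in_sd_units:.4f}   r: {r:.4f}")
```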

LESSON 3 – 3 (DAY 3)

RESIDUALS

ESSENTIAL QUESTION:

What is a residual and what can a residual graph tell us about linear regression lines?

Objective: To define and use residuals in the analysis of linear regression lines.

Residuals

A residual is the difference between an observed value of the response variable and the value predicted by the regression line.

That is, residual = observed y – predicted y

= y - ŷ

Residual Facts

The mean of the least-squares residuals is always zero.

In the software output for the example, the sum is not exactly 0 because the software rounded the residuals to four decimal places.

This is roundoff error.

The horizontal line in a residual plot is at zero.
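A short sketch of these facts, using made-up data (not the textbook example): it computes the residuals y − ŷ from a least-squares fit, then sums them both at full precision and after rounding each residual to four decimal places. The closed-form fit is the standard one and is an assumption of this example.

```python
# Made-up illustrative data (not the textbook example).
xs = [1, 2, 3, 4, 5, 6]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept (standard closed-form formulas).
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

# residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# At full precision the residuals sum to zero, up to floating-point error.
exact_sum = sum(residuals)

# Rounding each residual, as software output often does, can leave a sum
# that differs slightly from zero. That difference is roundoff error.
rounded = [round(e, 4) for e in residuals]
rounded_sum = sum(rounded)

print("rounded residuals:", rounded)
print("sum at full precision:", exact_sum)
print("sum of rounded residuals:", rounded_sum)
```

Each rounded residual is off by at most 0.00005, so the sum of the rounded values can differ from zero by at most n times that amount.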

Residual Plots

A residual plot is a scatterplot of the regression residuals against the explanatory variable. Residual plots help us assess the fit of a regression line.

If the regression line captures the overall relationship between x and y, the residuals should have no systematic pattern. The residual plot will then show a uniform scatter of points about the horizontal line at zero, which stands in for the fitted line, with no unusual individual observations.
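To see what a systematic pattern looks like, the sketch below fits a least-squares line to deliberately curved, made-up data (y = x², not from the lesson) and prints the residuals in order of x. It uses a text listing instead of a plot so that it needs no plotting library; the closed-form fit is the standard one and is an assumption of this example.

```python
# Made-up curved data: y = x^2, so a straight line is the wrong model.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept (standard closed-form formulas).
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

# residual = observed y - predicted y, listed in order of increasing x
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    if e > 0:
        side = "above zero"
    elif e < 0:
        side = "below zero"
    else:
        side = "on zero"
    print(f"x = {x}: residual = {e:+.1f} ({side})")
```

The residuals are positive at both ends and negative in the middle. That U shape is a systematic pattern, which signals that a straight line does not capture the overall relationship in this data.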
