scatterplot, find r & describe


Upload: harry

Post on 07-Jan-2016


TRANSCRIPT

Page 1: Scatterplot, find r & describe

A student wonders whether tall women tend to date taller men than short women do. She measures herself, her dormitory roommate, and the women in the adjoining rooms. Then she measures the next man each woman dates. Draw and discuss the scatterplot and calculate the correlation coefficient.

Women (x)   Men (y)
66          72
64          68
66          70
65          68
70          71
65          65
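A minimal sketch of the correlation calculation for the height data above, written in plain Python. The function name `correlation` and the list names are illustrative choices, not part of the original slides.

```python
from math import sqrt

women = [66, 64, 66, 65, 70, 65]  # x values from the table
men = [72, 68, 70, 68, 71, 65]    # y values from the table

def correlation(xs, ys):
    """Pearson r computed from sums of deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = correlation(women, men)
print(round(r, 3))  # 0.565
```

A value near 0.57 suggests a moderate positive association, which should be checked against the scatterplot before describing it.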

Page 2: Scatterplot, find r & describe

Scatterplot, find r & describe.

SAT-math   SAT-verbal
680        780
450        570
440        550
610        500
730        720
530        570
700        600
640        530
740        800

Page 3: Scatterplot, find r & describe

Create scatterplot

Find the correlation

Describe the association

Fat (g)   Sodium
19        920
31        1500
34        1310
35        860
39        1180
39        940
43        1260

Page 4: Scatterplot, find r & describe

Linear Regression

Page 5: Scatterplot, find r & describe

Guess the correlation coefficient

http://istics.net/stat/Correlations/

Page 6: Scatterplot, find r & describe

Can we make a Line of Best Fit?

Page 7: Scatterplot, find r & describe

Regression Line

This is a line that describes how a response variable (y) changes as an explanatory variable (x) changes.

It’s used to predict the value of (y) for a given value of (x).

Unlike correlation, regression requires that we have an explanatory variable.

Page 8: Scatterplot, find r & describe

Let’s try some!

http://illuminations.nctm.org/ActivityDetail.aspx?ID=146

Page 9: Scatterplot, find r & describe

Regression Line

Page 10: Scatterplot, find r & describe

The following data show the number of miles driven and the advertised price for 11 used Honda CR-Vs from the 2002-2006 model years (prices found at www.carmax.com). The scatterplot below shows a strong, negative linear association between number of miles and advertised price. The correlation is -0.874. The line on the plot is the regression line for predicting advertised price from number of miles.

Thousand Miles Driven   Cost (dollars)
22                      17998
29                      16450
35                      14998
39                      13998
45                      14599
49                      14988
55                      13599
56                      14599
69                      11998
70                      14450
86                      10998

[Scatterplot: Cost (thousands of dollars, axis 10 to 18) vs. Thousand Miles Driven (axis 20 to 90), with the regression line drawn]

Cost = 18800 - 86.2 (Thousand Miles Driven)

Page 11: Scatterplot, find r & describe

The regression line is shown below…. Use it to answer the following.

Slope:

Y-intercept:

Page 12: Scatterplot, find r & describe

Predict the price for a Honda with 50,000 miles.
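As a rough sketch, the prediction can be read straight off the fitted line from the Honda slide, Cost = 18800 - 86.2 (Thousand Miles Driven). The function name `predicted_cost` is a made-up helper; note that 50,000 miles is entered as 50 because x is measured in thousands of miles.

```python
def predicted_cost(thousand_miles):
    """Predicted advertised price in dollars from the fitted line on the slide."""
    return 18800 - 86.2 * thousand_miles

price = predicted_cost(50)  # 50,000 miles = 50 thousand miles
print(round(price))  # 14490
```

Since 50 thousand miles lies inside the range of the data (22 to 86), this is an interpolation rather than an extrapolation.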

Page 13: Scatterplot, find r & describe

Extrapolation

This refers to using a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line.

Such predictions are usually not very accurate.

Page 14: Scatterplot, find r & describe

Slope:

Y-int:

Predict weight after 16 wk

Predict weight at 2 years:

Page 15: Scatterplot, find r & describe

Residual

Page 16: Scatterplot, find r & describe

The equation of the least-squares regression line for the sprint time and long-jump distance data is predicted long-jump distance = 304.56 – 27.3 (sprint time).

Find and interpret the residual for the student who had a sprint time of 8.09 seconds.
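A small sketch of the residual calculation, assuming the regression equation given above. The transcript does not include the student's observed long-jump distance, so it is left as an argument to the hypothetical helper `residual` rather than filled in.

```python
def predicted_jump(sprint_time):
    """Predicted long-jump distance from the least-squares line on the slide."""
    return 304.56 - 27.3 * sprint_time

def residual(observed, predicted):
    """Residual = observed - predicted."""
    return observed - predicted

y_hat = predicted_jump(8.09)
print(round(y_hat, 2))  # 83.7
```

Once the observed distance for this student is known, `residual(observed, y_hat)` gives the answer: a positive value means the student jumped farther than the line predicts, a negative value means shorter.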

Page 17: Scatterplot, find r & describe

Regression

Let’s see how a regression line is calculated.

Page 18: Scatterplot, find r & describe

Fat vs Calories in Burgers

Fat (g)   Calories
19        410
31        580
34        590
35        570
39        640
39        680
43        660

Page 19: Scatterplot, find r & describe

Let’s standardize the variables

Fat   Cal   z-x's    z-y's
19    410   -1.959   -2
31    580   -0.42    -0.1
34    590   -0.036   0
35    570   0.09     -0.2
39    640   0.6      0.56
39    680   0.6      1
43    660   1.12     0.78

The line must contain the point $(\bar{x}, \bar{y})$, which after standardizing is the origin, so it must pass through the origin.
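The z-scores in the table can be reproduced with a short sketch like the one below. It assumes the sample standard deviation (dividing by n - 1), which matches the rounded values shown; the helper name `z_scores` is illustrative.

```python
from statistics import mean, stdev

fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

def z_scores(values):
    """Standardize: subtract the mean, divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(v - m) / s for v in values]

z_fat = z_scores(fat)
z_cal = z_scores(calories)

for f, c, zf, zc in zip(fat, calories, z_fat, z_cal):
    print(f, c, round(zf, 3), round(zc, 3))
```

After standardizing, both columns have mean 0 and standard deviation 1, which is why the point of means lands on the origin.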

Page 20: Scatterplot, find r & describe

Let’s clarify a little. (Just watch & listen)

The equation for a line that passes through the origin can be written with just a slope and no intercept: y = mx.

But we’re using z-scores, so our equation should reflect this, and thus it’s

$\hat{z}_y = m z_x$

Many lines with different slopes pass through the origin. Which one fits our data best? That is, which slope determines the line that minimizes the sum of the squared residuals?

Page 21: Scatterplot, find r & describe

Line of Best Fit: Least-Squares Regression Line

It’s the line for which the sum of the squared residuals is smallest. We want to find the mean squared residual.

Focus on the vertical deviations from the line.

Residual = Observed - Predicted

Page 22: Scatterplot, find r & describe

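Let’s find it. (Just watch & soak it in.)

$$
\begin{aligned}
\mathrm{MSR} &= \frac{\sum (z_y - \hat{z}_y)^2}{n-1} \\
&= \frac{\sum (z_y - m z_x)^2}{n-1} \qquad \text{since } \hat{z}_y = m z_x \\
&= \frac{\sum \left( z_y^2 - 2 m z_x z_y + m^2 z_x^2 \right)}{n-1} \\
&= \frac{\sum z_y^2}{n-1} - 2m \frac{\sum z_x z_y}{n-1} + m^2 \frac{\sum z_x^2}{n-1} \\
&= 1 - 2mr + m^2
\end{aligned}
$$

St. Dev of z-scores is 1, so the variance is 1 also: $\frac{\sum z_y^2}{n-1} = 1$ and $\frac{\sum z_x^2}{n-1} = 1$.

The middle term contains $\frac{\sum z_x z_y}{n-1}$. This is r!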

Page 23: Scatterplot, find r & describe

Continue……

Since this is a parabola in m, it reaches its minimum at

$x = \frac{-b}{2a}$

With a = 1 and b = -2r, this gives us

$m = \frac{-(-2r)}{2(1)} = r$

Hence – the slope of the best fit line for z-scores is the correlation coefficient → r.
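The result can be checked numerically with a sketch like the following, which uses the burger data from the earlier slide. It tries a grid of candidate slopes on the z-scores and picks the one with the smallest mean squared residual; the grid search and all variable names are illustrative assumptions, not the method used on the slides.

```python
from statistics import mean, stdev

fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

def z_scores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

zx, zy = z_scores(fat), z_scores(calories)
n = len(zx)

# Correlation as the (n - 1)-averaged product of z-scores
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

def msr(m):
    """Mean squared residual of the line z_y-hat = m * z_x."""
    return sum((b - m * a) ** 2 for a, b in zip(zx, zy)) / (n - 1)

# Try slopes 0.000, 0.001, ..., 2.000 and keep the one with the smallest MSR
candidates = [i / 1000 for i in range(2001)]
best = min(candidates, key=msr)

print(round(r, 3), best)
```

The best slope on the grid should agree with r to within the grid spacing, and `msr(m)` should match 1 - 2mr + m² for any m.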

Page 24: Scatterplot, find r & describe

Slope – rise over run

A slope of r for z-scores means that for every increase of 1 standard deviation in $z_x$, there is an increase of r standard deviations in $\hat{z}_y$. “Over 1 and up r.”

Translate back to x & y values: “over one standard deviation in x, up r standard deviations in y.”

Slope of the regression line is:

$b_1 = \frac{r s_y}{s_x}$

Page 25: Scatterplot, find r & describe

Why is correlation called “r”?

Because it was calculated from the regression of y on x after standardizing the variables, just as we have done, the letter r was used to stand for (standardized) regression.

Page 26: Scatterplot, find r & describe

The number of miles (in thousands) for the 11 used Hondas has a mean of 50.5 and a standard deviation of 19.3. The asking prices had a mean of $14,425 and a standard deviation of $1,899. The correlation for these variables is r = -0.874. Find the equation of the least-squares regression line and explain what change in price we would expect for each additional 19.3 thousand miles.
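One way to work this out from the summary statistics alone is sketched below, using slope = r·s_y/s_x and the fact that the line passes through the point of means. Variable names are illustrative.

```python
x_bar, s_x = 50.5, 19.3    # thousand miles: mean and standard deviation
y_bar, s_y = 14425, 1899   # asking price in dollars: mean and standard deviation
r = -0.874

b1 = r * s_y / s_x         # slope: dollars per additional thousand miles
b0 = y_bar - b1 * x_bar    # intercept: line passes through (x_bar, y_bar)

# One standard deviation in x (19.3 thousand miles) changes y-hat by r * s_y
change_per_sd = r * s_y

print(round(b1, 2), round(b0, 1), round(change_per_sd, 1))
# about -86.0 18767.8 -1659.7
```

So the line is roughly predicted price = 18768 - 86.0 (thousand miles), close to the software-fitted line shown earlier, and each additional 19.3 thousand miles is associated with a predicted price about $1,660 lower.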

Page 27: Scatterplot, find r & describe

So let’s write the equation!

From algebra: $y = mx + b$

$\hat{y} = b_0 + b_1 x$

$b_0$: y-intercept

$b_1$: slope

Fat (g)   Calories
19        410
31        580
34        590
35        570
39        640
39        680
43        660

Slope:

Explain the slope:

Page 28: Scatterplot, find r & describe

Now for the final part – the equation!

Y-intercept: Remember, it has to pass through the point $(\bar{x}, \bar{y})$.

$\hat{y} = b_0 + b_1 x$

Solve for y-intercept:

Page 29: Scatterplot, find r & describe

Now it can be used to predict.

How many calories do I expect to find in a hamburger that has 25 grams of fat?
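Putting the pieces together for the burger data, a sketch of the full calculation might look like this: compute r from z-scores, get the slope as r·s_y/s_x, solve for the intercept using the point of means, then predict. The names `b0`, `b1`, and `predict_calories` are illustrative.

```python
from statistics import mean, stdev

fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

n = len(fat)
x_bar, y_bar = mean(fat), mean(calories)
s_x, s_y = stdev(fat), stdev(calories)

# Correlation: (n - 1)-averaged product of z-scores
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(fat, calories)) / (n - 1)

b1 = r * s_y / s_x        # slope: calories per additional gram of fat
b0 = y_bar - b1 * x_bar   # intercept: line passes through (x_bar, y_bar)

def predict_calories(fat_grams):
    return b0 + b1 * fat_grams

pred = predict_calories(25)
print(round(b1, 2), round(b0, 1), round(pred))
```

With these data the slope comes out near 11 calories per gram of fat and the prediction for 25 grams near 487 calories. Since 25 grams is within the observed range of 19 to 43, this is not an extrapolation.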

Page 30: Scatterplot, find r & describe

That’s…all…..Folks!