Chapter 20 Linear Regression

Upload: hillary-kennedy

Post on 01-Jan-2016


Chapter 20

Linear Regression

What if…

We believe that an important relation between two measures exists?

For example, we ask 5 people about their salary and education level.

For each observation we have two measures, and those two measures came from the same person.

What would we “predict”?

– Does more education mean more salary?
– Does more salary mean more education?
– Does more education mean less salary?
– Does more salary mean less education?
– Are salary and education related?

Regression

Descriptive vs. Inferential

Bivariate data - measurements on two variables for each observation
– Heights (X) and weights (Y)
– IQ (X) and SAT (Y) scores
– Years of educ. (X) and Annual salary (Y)
– Number of Policemen (X) and Number of crimes (Y) in US cities


Regression

How are the two sets of scores related?

Using a scatterplot we can “look” at the relationship

Constructed by plotting each of the bivariate observations (X, Y)

[Figure: scatterplot of Annual Salary (in 1000's) against Yrs of Education]

Regression

Which one’s X and which one’s Y?

That’s up to you, but… Generally, the X variable is thought of as the “predictor” variable

We try to predict a Y score given an X score

[Figure: scatterplot of Annual Salary (in 1000's) against Yrs of Education]

Regression

If the scores seem to “line up,” we call this a “linear relationship”

[Figure: scatterplot of Annual Salary (in 1000's) against Yrs of Education]

Interpreting Scatterplots

If the following relations hold:

– low x - high y
– mid x - mid y
– high x - low y,

“A negative linear relationship”

[Figure: scatterplot of Annual Salary (in 1000's) against Yrs of Education]

Interpreting Scatterplots

If the following relations hold:

– low x - low y
– mid x - mid y
– high x - high y,

“A positive linear relationship”

[Figure: scatterplot of Number of Crimes (1000s) against Police per 1000 citizens]

Interpreting Scatterplots

However, there can also be “no relation”

[Figure: scatterplot of IQ against Shoe Size]

Interpreting Scatterplots

Curvilinear

[Figure: scatterplot of Weight against Height]

Measuring Linear Relationships

The first measure of a linear relationship (not in the book) is COVARIANCE (sXY)

Or

SPXY is known as the “Sum of Products” or the sum of the products of the deviations of X and Y from their means
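Written out, the verbal definition of the Sum of Products looks as follows. The symbols M_X and M_Y (the means of X and Y) are notation assumed here. The n − 1 denominator for the covariance is the usual sample convention and is also an assumption, since some texts divide by n instead:

```latex
SP_{XY} = \sum_{i=1}^{n} (X_i - M_X)(Y_i - M_Y),
\qquad
s_{XY} = \frac{SP_{XY}}{n - 1}
```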


Easy Calculation
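One common shortcut, assumed here to be what the “easy calculation” refers to, is the computational form SP = ΣXY − (ΣX)(ΣY)/n, which avoids computing deviations one by one. A minimal sketch comparing it with the definitional form, using the INCOME and NUMDRK scores from the practice slides (the variable names are illustrative):

```python
# Practice data from the slides: INCOME (X) and NUMDRK (Y) for five people.
x = [1, 6, 5, 4, 6]
y = [1, 2, 8, 1, 3]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Definitional form: sum of the products of deviations from the means.
sp_definitional = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Computational form (assumed shortcut): sum(XY) - sum(X) * sum(Y) / n.
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sp_computational = sum_xy - sum(x) * sum(y) / n

# Sample covariance, assuming the n - 1 denominator.
covariance = sp_computational / (n - 1)

print(sp_definitional, sp_computational, covariance)
```

Both forms should agree up to floating-point rounding; the computational form needs only the column totals ΣX, ΣY and ΣXY.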


Covariance

Interpretation:
– positive = positive linear relationship
– negative = negative linear relationship
– zero = no linear relationship

Magnitude (strength of the relationship)?
– Uninterpretable
– for example, a large covariance does not necessarily mean a strong relationship
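One way to see why the magnitude is hard to interpret: covariance changes with the units of measurement. A minimal sketch, using the practice data from later slides and assuming the n − 1 sample formula. The helper name `covariance` and the factor of 1000 are purely illustrative:

```python
def covariance(x, y):
    """Sample covariance with an n - 1 denominator (an assumed convention)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sp / (n - 1)

income = [1, 6, 5, 4, 6]
drinks = [1, 2, 8, 1, 3]

# Same people, same relationship, but X expressed on a scale 1000 times larger.
income_rescaled = [1000 * v for v in income]

cov_original = covariance(income, drinks)
cov_rescaled = covariance(income_rescaled, drinks)

print(cov_original, cov_rescaled)
```

The covariance grows by the same factor of 1000 even though the scatterplot would look identical apart from its axis labels.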


But, we can use covariance

Which line best fits our data?

Do we just draw one that looks good?

No, we can use something called “least squares regression” to find the equation of the best-fit line (“Best-fit linear regression”)

[Figure: scatterplot of Annual Salary (in 1000's) against Yrs of Education]

Linear Equations

Yi = mXi + b

m = slope

b = y-intercept

Finding the Slope


Or…
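A minimal sketch of the slope calculation, assuming the standard least-squares results m = SP / SS(X) and, equivalently, m = covariance / variance of X when both use the same denominator. The data are the practice scores from later slides; the variable names are illustrative:

```python
x = [1, 6, 5, 4, 6]   # INCOME
y = [1, 2, 8, 1, 3]   # NUMDRK
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)

# Form 1: sum of products over the sum of squares of X.
slope_from_sp = sp_xy / ss_x

# Form 2: covariance over variance of X (both with n - 1, so the factor cancels).
cov_xy = sp_xy / (n - 1)
var_x = ss_x / (n - 1)
slope_from_cov = cov_xy / var_x

print(slope_from_sp, slope_from_cov)
```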


Finding the y-intercept (b)

After finding the slope (m), find b using:
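A standard way to get b, assumed here, uses the fact that the least-squares line passes through the point of means: b = M_Y − m·M_X. A minimal sketch on the practice data from later slides; `predict` is a hypothetical helper name:

```python
x = [1, 6, 5, 4, 6]   # INCOME
y = [1, 2, 8, 1, 3]   # NUMDRK
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Slope from the sum of products and the sum of squares of X.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))

# Intercept: the fitted line passes through (mean_x, mean_y).
intercept = mean_y - slope * mean_x

def predict(xi):
    """Estimated Y for a given X on the fitted line."""
    return slope * xi + intercept

print(round(slope, 6), round(intercept, 6))
```

These values agree with the trend line reported for the practice data, f(x) = 0.523255813953 x + 0.697674418605.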


Least Squares Criterion

The best line has the property of least squares

The sum of the squared deviations of the points from the line is a minimum

[Figure: scatterplot of Annual Salary (in 1000's) against Yrs of Education]

What’s the “least” again?

What are we trying to minimize?

– The best fit line will be described by the function Yi = mXi + b

– Thus, for any Xi, we can estimate a corresponding Yi value

– Problem: for some Xi’s we already have Yi’s

– So, let’s call the estimated value Ŷi (“Y-sub-i-hat”), to differentiate it from the “real” Yi

Least Squares Criterion

For example, when Xi = 15 we would estimate that Ŷi = 44,000

But, we have a “real” Yi value corresponding to Xi = 15 (35,000)

[Figure: scatterplot of Annual Salary (in 1000's) against Yrs of Education, annotated at Xi = 15 with the estimated value Ŷi = 44,000 and a “real” Y value of 35,000]

Minimize this…

For every Xi, we have a value Yi, and an estimate of Yi (Ŷi)

Consider the quantity (Yi − Ŷi):
– Which is the deviation of the real score from the estimated score, for any given Xi value

The sum of these deviations will be zero
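A quick numerical check of that last claim, as a sketch on the practice data from later slides. It assumes the line is the least-squares line, with slope SP / SS(X) and intercept M_Y − m·M_X; the variable names are illustrative:

```python
x = [1, 6, 5, 4, 6]   # INCOME
y = [1, 2, 8, 1, 3]   # NUMDRK
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

# Deviation of each real score from its estimated score.
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
residual_sum = sum(residuals)

print(residuals)
print(residual_sum)  # zero up to floating-point rounding
```

Positive and negative deviations cancel, so the raw sum cannot tell a good line from a bad one.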

• But, by squaring those deviations and summing, we get Σ(Yi − Ŷi)²

• We want the line that makes the above quantity the minimum (the least squares criterion)

• This is also called the sum of squares error or SSE (how much do our estimates “err” from our real values?)
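The criterion can be illustrated numerically. The sketch below computes SSE for the least-squares line on the practice data from later slides, then nudges the slope and intercept by arbitrary amounts chosen only for illustration. The function name `sse` is hypothetical:

```python
x = [1, 6, 5, 4, 6]   # INCOME
y = [1, 2, 8, 1, 3]   # NUMDRK
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

def sse(slope, intercept):
    """Sum of squared deviations of the real Y values from the line's estimates."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))

best_slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
              / sum((xi - mean_x) ** 2 for xi in x))
best_intercept = mean_y - best_slope * mean_x
sse_best = sse(best_slope, best_intercept)

# Nudge the line in several directions; every nudge should raise SSE.
nudged = [sse(best_slope + ds, best_intercept + db)
          for ds in (-0.1, 0.0, 0.1)
          for db in (-0.5, 0.0, 0.5)
          if (ds, db) != (0.0, 0.0)]

print(sse_best, min(nudged))
```

Trying a handful of nearby lines is not a proof, but it shows what “least” means: no other line tried gives a smaller SSE.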

How accurate are our Estimates?

Two ways to measure how “good” our estimates are:
– Standard Error of the Estimate
– Coefficient of Determination (not covered in our book, yet)


Standard Error of the Estimate

but, this term is very hard to interpret. (Hurrah, there are better ways to measure the goodness of the fit!)
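A common definition, assumed here, is the square root of SSE divided by n − 2; some texts divide by n instead, so treat the denominator as a convention to check against the book. A minimal sketch on the practice data from later slides:

```python
import math

x = [1, 6, 5, 4, 6]   # INCOME
y = [1, 2, 8, 1, 3]   # NUMDRK
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

# Sum of squared deviations of real scores from estimated scores.
sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))

# Standard error of the estimate, assuming the n - 2 denominator.
std_error_estimate = math.sqrt(sse / (n - 2))

print(std_error_estimate)
```

The result is in the units of Y (here, NUMDRK), which is part of why it is awkward to compare across data sets.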


Coefficient of Determination

cd = r²
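Here r is taken to be the Pearson correlation, r = SP / √(SS(X)·SS(Y)), which is an assumption since the slides do not define it. The sketch below computes r² on the practice data from later slides and checks it against the equivalent form 1 − SSE / SS(Y):

```python
import math

x = [1, 6, 5, 4, 6]   # INCOME
y = [1, 2, 8, 1, 3]   # NUMDRK
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

# Pearson correlation and its square (the coefficient of determination).
r = sp_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

# Equivalent view: the share of the variability in Y accounted for by the line.
slope = sp_xy / ss_x
intercept = mean_y - slope * mean_x
sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
r_squared_from_sse = 1 - sse / ss_y

print(r, r_squared, r_squared_from_sse)
```

Unlike covariance or the standard error, r² has no units and always lies between 0 and 1.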

Now You:

ID     INCOME  NUMDRK
2001   1       1
2002   6       2
2003   5       8
2004   4       1
2005   6       3

Practice:

ID     INCOME  NUMDRK  XY
2001   1       1
2002   6       2
2003   5       8
2004   4       1
2005   6       3
Σ
n
M
SS(X)

Practice:

ID     INCOME  NUMDRK  XY
2001   1       1       1
2002   6       2       12
2003   5       8       40
2004   4       1       4
2005   6       3       18
Σ      22      15      75
n      5       5
M      4.4     3
SS(X)  17.2    34
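The summary rows of the table can be reproduced with a few lines of code. This is a minimal sketch; the variable names are illustrative, and SS is computed as the sum of squared deviations from each column's mean:

```python
ids = [2001, 2002, 2003, 2004, 2005]
income = [1, 6, 5, 4, 6]   # X
numdrk = [1, 2, 8, 1, 3]   # Y

# XY column: product of the two scores for each person.
xy = [a * b for a, b in zip(income, numdrk)]

n = len(ids)
sum_income, sum_numdrk, sum_xy = sum(income), sum(numdrk), sum(xy)
mean_income, mean_numdrk = sum_income / n, sum_numdrk / n

# Sum of squares: squared deviations from the column mean, summed.
ss_income = sum((v - mean_income) ** 2 for v in income)
ss_numdrk = sum((v - mean_numdrk) ** 2 for v in numdrk)

print("sum ", sum_income, sum_numdrk, sum_xy)
print("n   ", n, n)
print("mean", mean_income, mean_numdrk)
print("SS  ", round(ss_income, 1), round(ss_numdrk, 1))
```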


[Figure: scatterplot of the practice data (INCOME on X, NUMDRK on Y) with the fitted line f(x) = 0.523255813953 x + 0.697674418605]