STAT/CSE 416: Intro to Machine Learning

Regression: Predicting House Prices
Emily Fox
University of Washington
March 27, 2018

©2018 Emily Fox

Predicting house prices


How much is my house worth?


I want to list my house for sale


$$ ????


Data


(x1 = sq.ft., y1 = $)
(x2 = sq.ft., y2 = $)
(x3 = sq.ft., y3 = $)
(x4 = sq.ft., y4 = $)
(x5 = sq.ft., y5 = $)

Input vs. Output (x is the input, y is the output):
• y is the quantity of interest
• assume y can be predicted from x


Look at recent sales in my neighborhood

How much did they sell for?


Plot recent house sales (Past 2 years)

[Plot: price ($) on the y-axis versus square feet (sq.ft.) on the x-axis]

Terminology:
x – feature, covariate, or predictor
y – observation or response


Predict your house by similar houses

No house sold recently had exactly the same sq.ft.


• Look at average price in range
• Still only 2 houses!
• Throwing out info from all other sales
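The "average price in range" idea can be sketched in a few lines of Python. This is only an illustration: the function name `average_price_in_range`, the `window` parameter, and the sales figures are hypothetical and do not come from the slides.

```python
def average_price_in_range(sales, sqft, window):
    """Average the sale prices of houses within +/- window sq.ft. of sqft.

    sales is a list of (square_feet, price) pairs. Returns None when no
    house falls inside the range.
    """
    nearby = [price for size, price in sales if abs(size - sqft) <= window]
    if not nearby:
        return None
    return sum(nearby) / len(nearby)


# Hypothetical recent sales: (square feet, sale price in $).
sales = [
    (1400, 310_000),
    (1800, 400_000),
    (1950, 430_000),
    (2050, 450_000),
    (2600, 560_000),
]

# Only the 1950 and 2050 sq.ft. houses fall within 100 sq.ft. of 2000,
# so the other three sales are ignored entirely.
estimate = average_price_in_range(sales, sqft=2000, window=100)
```

With this toy data the estimate rests on just two houses, which is the weakness the bullets point out: every sale outside the window contributes nothing.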

Model – How we assume the world works

Regression model: yi = f(xi) + εi



“Essentially, all models are wrong, but some are useful.”

George Box, 1987.


[Block diagram. Boxes: Training data, Feature extraction, ML model, Quality metric, ML algorithm. Labels: x, y, ŷ, f̂]


Linear regression




Use a simple linear regression model

[Plot: price ($), y, versus square feet (sq.ft.), x]

Fit a line through the data:

f(x) = w0 + w1 x

where w0 and w1 are the parameters of the model.

yi = w0 + w1 xi + εi
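As a minimal sketch of the line f(x) = w0 + w1 x, the function below evaluates the model for a given house size. The name `predict` and the parameter values are hypothetical, chosen only to make the arithmetic easy to follow; the slides do not give numeric values for w0 or w1.

```python
def predict(x, w0, w1):
    """Evaluate the line f(x) = w0 + w1 * x."""
    return w0 + w1 * x


# Hypothetical parameters: intercept in $ and slope in $ per sq.ft.
w0 = 50_000.0
w1 = 200.0

# Predicted price for a 2,000 sq.ft. house under these parameters.
predicted_price = predict(2_000, w0, w1)

# For an observed sale, the term εi is what is left over after the line's
# prediction. The observed price here is also made up.
observed_price = 460_000.0
residual = observed_price - predicted_price
```

Here w0 is the predicted price at zero square feet and w1 is the predicted change in price per additional square foot.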


Which line?

Different parameters w0, w1 give different lines:

f(x) = w0 + w1 x


[Block diagram. Boxes: Training data, Feature extraction, ML model, Quality metric, ML algorithm. Labels: x, y, ŷ, ŵ, f̂]


“Cost” of using a given line

Residual sum of squares (RSS)

RSS(w0,w1) = ($_{house 1} - [w0 + w1 sq.ft._{house 1}])^2
           + ($_{house 2} - [w0 + w1 sq.ft._{house 2}])^2
           + ($_{house 3} - [w0 + w1 sq.ft._{house 3}])^2
           + … [include all houses]
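Written compactly, the sum above is RSS(w0,w1) = sum over all houses i of (yi - [w0 + w1 xi])^2. The sketch below computes it for two candidate lines to show how RSS scores a given line. The function name `rss`, the three houses, and both parameter pairs are hypothetical and used only for illustration.

```python
def rss(w0, w1, sqft, prices):
    """Residual sum of squares of the line w0 + w1 * x over all houses."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(sqft, prices))


# Hypothetical training data: square feet and sale price in $.
sqft = [1000, 1500, 2000]
prices = [260_000, 340_000, 455_000]

# Two candidate lines, each defined by its own (w0, w1).
rss_a = rss(50_000, 200, sqft, prices)  # line A: f(x) = 50,000 + 200 x
rss_b = rss(0, 250, sqft, prices)       # line B: f(x) = 250 x

# The line with the smaller RSS has the lower "cost" on this data.
better = "A" if rss_a < rss_b else "B"
```

On this toy data line A has the smaller RSS, so by this quality metric it is the better of the two candidates.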