dr. yanjun qi / uva cs 6316 / f16 uva cs 6316/4501 – fall ... · lr with non-linear basis...

36
UVA CS 6316/4501 – Fall 2016 Machine Learning Lecture 5: Non-Linear Regression Models Dr. Yanjun Qi University of Virginia Department of Computer Science 9/12/16 Dr. Yanjun Qi / UVA CS 6316 / f16 1

Upload: others

Post on 01-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

UVACS6316/4501–Fall2016

MachineLearning

Lecture5:Non-LinearRegressionModels

Dr.YanjunQi

UniversityofVirginia

DepartmentofComputerScience

9/12/16

Dr.YanjunQi/UVACS6316/f16

1

Page 2: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Wherearewe?èFivemajorsecGonsofthiscourse

q Regression(supervised)q ClassificaGon(supervised)q Unsupervisedmodelsq Learningtheoryq Graphicalmodels

9/12/16 2

Dr.YanjunQi/UVACS6316/f16

Page 3: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

TodayèRegression(supervised)

q Fourwaystotrain/performopGmizaGonforlinearregressionmodelsq NormalEquaGonq GradientDescent(GD)q StochasGcGDq Newton’smethod

q Supervisedregressionmodels

q Linearregression(LR)q LRwithnon-linearbasisfuncGonsq LocallyweightedLRq LRwithRegularizaGons

9/12/16 3

Dr.YanjunQi/UVACS6316/f16

Page 4: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Today

q MachineLearningMethodinanutshellq RegressionModelsBeyondLinear

– LRwithnon-linearbasisfuncGons– Locallyweightedlinearregression– RegressiontreesandMulGlinearInterpolaGon(later)

9/12/16 4

Dr.YanjunQi/UVACS6316/f16

Page 5: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

TradiGonalProgramming

MachineLearning

ComputerData

ProgramOutput

ComputerData

OutputProgram

9/12/16 5

Dr.YanjunQi/UVACS6316/f16

Page 6: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Machine Learning in a Nutshell

Task

Representation

Score Function

Search/Optimization

Models, Parameters

9/12/16 6

MLgrewoutofworkinAIOp#mizeaperformancecriterionusingexampledataorpastexperience,Aimingtogeneralizetounseendata

Dr.YanjunQi/UVACS6316/f16

Page 7: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

(1) Multivariate Linear Regression

Regression

Y = Weighted linear sum of X�s

Sum of squared error

Linear algebra / GD / SGD

Regression coefficients

Task

Representation

Score Function

Search/Optimization

Models, Parameters

y = f (x)=θ0 +θ1x1 +θ2x29/12/16 7

Dr.YanjunQi/UVACS6316/f16

Page 8: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Today

q MachineLearningMethodinanutshellq RegressionModelsBeyondLinear

– LRwithnon-linearbasisfuncGons– Locallyweightedlinearregression– RegressiontreesandMulGlinearInterpolaGon(later)

9/12/16 8

Dr.YanjunQi/UVACS6316/f16

Page 9: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Dr.YanjunQi/UVACS6316/f16

9

LRwithnon-linearbasisfuncGons

•  LRdoesnotmeanwecanonlydealwithlinearrelaGonships

•  Wearefreetodesign(non-linear)features(e.g.,basisfuncGonderived)underLR

wherethearefixedbasisfuncGons(alsodefine).

•  E.g.:polynomialregression:

!!y =θ0 + θ jϕ j(x)j=1m∑ =θTϕ(x)

ϕ(x):= 1,x ,x2⎡⎣ ⎤⎦T

9/12/16

!!ϕ j(x)

!!ϕ0(x)=1

Page 10: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

e.g.(1)polynomialregression

•  IntroducebasisfuncGons

9/12/16

Dr.YanjunQi/UVACS6316/f16

10Dr.NandodeFreitas’stutorialslide

θ * = ϕTϕ( )−1ϕT !y

Page 11: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

e.g.(1)polynomialregression

9/12/16

Dr.YanjunQi/UVACS6316/f16

11

KEY:ifthebasesaregiven,theproblemoflearningtheparametersissGlllinear.

Page 12: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Dr.YanjunQi/UVACS6316/f16

12

ManyPossibleBasisfuncGons•  TherearemanybasisfuncGons,e.g.:

–  Polynomial

–  RadialbasisfuncGons

–  Sigmoidal

–  Splines,–  Fourier,–  Wavelets,etc

ϕ j (x) = xj−1

( )⎟⎟⎠

⎞⎜⎜⎝

⎛ −−= 2

2

2sx

x jj

µφ exp)(

⎟⎟⎠

⎞⎜⎜⎝

⎛ −=

sx

x jj

µσφ )(

9/12/16

Page 13: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

ManyPossibleBasisfuncGons

9/12/16

Dr.YanjunQi/UVACS6316/f16

13

Page 14: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Dr.YanjunQi/UVACS6316/f16

14

e.g.(2)LRwithradial-basisfuncGons

•  E.g.:LRwithRBFregression:

!! y =θ0 + θ jϕ j(x)j=1m∑ =ϕ(x)Tθ

9/12/16

!!ϕ(x):= 1,Kλ1(x ,r1),Kλ2(x ,r2),Kλ3

(x ,r3),Kλ4(x ,r4 )⎡⎣

⎤⎦T

θ * = ϕTϕ( )−1ϕT !y

Page 15: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Kλ(x ,r)= exp − (x − r)

2

2λ2⎛

⎝⎜⎞

⎠⎟

RBF=radial-basisfuncGon:afuncGonwhichdependsonlyontheradialdistancefromacentrepoint

GaussianRBFè

asdistancefromthecenterrincreases,theoutputoftheRBFdecreases

1Dcase 2Dcase

9/12/16

Dr.YanjunQi/UVACS6316/f16

15

Page 16: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

9/12/16

Dr.YanjunQi/UVACS6316/f16

16

Kλ(x ,r)= exp − (x − r)

2

2λ2⎛

⎝⎜⎞

⎠⎟

X=

10.6065307

0.1353353

0.0001234098

Kλ(x ,r)=r

r +λ

r +2λr +3λ

Page 17: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

e.g.anotherLinearregressionwith1DRBFbasisfuncGons

(assuming3predefinedcentresandwidth)

9/12/16

Dr.YanjunQi/UVACS6316/f16

17

ϕ(x):= 1,Kλ1(x ,r1),Kλ2(x ,r2),Kλ3

(x ,r3)⎡⎣

⎤⎦T

θ * = ϕTϕ( )−1ϕT !y

Page 18: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Dr.YanjunQi/UVACS6316/f16

18

e.g.aLRwith1DRBFs(3predefinedcentresandwidth)

•  1DRBF

•  Aierfit:

9/12/16

Page 19: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Dr.YanjunQi/UVACS6316/f16

19

e.g.2DGoodandBadRBFs

•  Agood2DRBF

•  Twobad2DRBFs

9/12/16

Page 20: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Twomainissues:

•  Learntheparameter\theta– AlmostthesameasLR,justèXto– LinearcombinaGonofbasisfuncGons(thatcanbenon-linear)

•  Howtochoosethemodelorder,e.g.whatpolynomialdegreeforpolynomialregression

9/12/16

Dr.YanjunQi/UVACS6316/f16

20

ϕ(x)

Page 21: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Dr.YanjunQi/UVACS6316/f16

21

Issue:Overfilngandunderfilng

xy 10 θθ += 2210 xxy θθθ ++= ∑ =

= 5

0jj

j xy θ

9/12/16

K-foldCrossValidaGon!!!!

Generalisation:learnfuncGon/hypothesisfrompastdatainorderto“explain”,“predict”,“model”or“control”newdataexamples

Underfit Looksgood Overfit

Page 22: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

(2) Multivariate Linear Regression with basis Expansion

Regression

Y = Weighted linear sum of (X basis expansion)

SSE

Linear algebra

Regression coefficients

Task

Representation

Score Function

Search/Optimization

Models, Parameters

!! y =θ0 + θ jϕ j(x)j=1m∑ =ϕ(x)Tθ

9/12/16 22

Dr.YanjunQi/UVACS6316/f16

Page 23: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Today

q MachineLearningMethodinanutshellq RegressionModelsBeyondLinear

– LRwithnon-linearbasisfuncGons– Locallyweightedlinearregression– RegressiontreesandMulGlinearInterpolaGon(later)

9/12/16 23

Dr.YanjunQi/UVACS6316/f16

Page 24: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

24

Locally weighted regression

•  aka locally weighted regression, local linear regression, LOESS, …

9/12/16

Dr.YanjunQi/UVACS6316/f16

linear_func(x)->yèTorepresentonlytheneighborregionofx_0

UseRBFfuncGontopickout/emphasizetheneighborregionofx_0è

Kλ (xi, x0 )

Kλ (xi, x0 )

Page 25: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Dr.YanjunQi/UVACS6316/f16

25

Locallyweightedlinearregression

Insteadofminimizingnowwefittominimize

∑=

−=n

ii

Ti yJ

1

2

21 )()( θθ x

J(θ ) = 12

wi (xiTθ − yi )

2

i=1

n

wi = Kλ(x i ,x0)= exp −

(x i − x0)22λ2

⎝⎜

⎠⎟

9/12/16

wherex_0isthequerypointforwhichwe'dliketoknowitscorresponding y

Page 26: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Dr.YanjunQi/UVACS6316/f16

26

LocallyweightedlinearregressionWefit\thetatominimizeWheredowi'scomefrom?

•  x_0isthequerypointforwhichwe'dliketoknowitscorresponding y

àEssenGallyweputhigherweightson(thoseerrorsfrom)trainingexamplesthatareclosetothequerypointx_0(thanthosethatarefurtherawayfromthequerypoint)

J(θ ) = 12

wi (xiTθ − yi )

2

i=1

n

wi = Kλ(x i ,x0)= exp −

(x i − x0)22λ2

⎝⎜

⎠⎟

9/12/16

Page 27: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

27

Locally weighted linear regression

•  ThewidthofRBFmasers!

9/12/16

Dr.YanjunQi/UVACS6316/f16

Page 28: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

9/12/16

Dr. Yanjun Qi / UVA CS 6316 / f16

Locally weighted linear regression

•  èSeparate weighted least squares at each target point x0:

min

θ0(x0 ),θ1(x0 )Kλ(xi ,x0)[ yi −θ0(x0)−θ1(x0)xi ]2

i=1

N

f (x0)=θ0!(x0)+θ1

!(x0)x0

Page 29: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

LEARNING of Locally weighted linear regression

9/12/16

Dr.YanjunQi/UVACS6316/f16

x0

29

è Separate weighted least squares at each target point x0

f (x0)=θ0!(x0)+θ1

!(x0)x0

Page 30: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

9/12/16

Dr. Yanjun Qi / UVA CS 6316 / f16

Locally weighted linear regression

•  b(x)T=(1,x); •  B: Nx2 regression matrix with i-th row b(x)T;

( ) NixxKdiagxW iNN ,,1,),()( 00 …==× λ

f (x0 ) = b(x0 )T (BTW (x0 )B)

−1BTW (x0 )y

e.g.whenforonlyonefeaturevariable

versus LR f (xq ) = (xq )Tθ * = (xq )

T XTX( )−1XT !y

LWR

Page 31: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

9/12/16

More è Local Weighted Polynomial Regression

•  Local polynomial fits of any degree d

∑∑ ∑

=

= ==

+=

⎥⎦

⎤⎢⎣

⎡−−

d

jj

j

N

i

d

j

jijiidjxx

xxxxf

xxxyxxKj

1 0000

1

2

1000,,1),(),(

)(ˆ)(ˆ)(ˆ

)()(),(min00

βα

βαλβα …

Dr.YanjunQi/UVACS6316/f16

Blue:trueGreen:esGmated

Page 32: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

Dr.YanjunQi/UVACS6316/f16

32

Parametricvs.non-parametric•  Locallyweightedlinearregressionisanon-parametric

algorithm.

•  The(unweighted)linearregressionalgorithmthatwesawearlierisknownasaparametriclearningalgorithm–  becauseithasafixed,finitenumberofparameters(the),whicharefittothedata;

–  Oncewe'vefitthe\thetaandstoredthemaway,wenolongerneedtokeepthetrainingdataaroundtomakefuturepredicGons.

–  Incontrast,tomakepredicGonsusinglocallyweightedlinearregression,weneedtokeeptheenGretrainingsetaround.

•  Theterm"non-parametric"(roughly)referstothefactthattheamountofknowledgeweneedtokeep,inordertorepresentthehypothesisgrowswithlinearlythesizeofthetrainingset.

9/12/16

θ

Page 33: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

(3) Locally Weighted / Kernel Linear Regression

Regression

Y = Weighted linear sum of X�s

Weighted SSE

Linear algebra

Local Regression coefficients

(conditioned on each test point)

Task

Representation

Score Function

Search/Optimization

Models, Parameters

0000 )(ˆ)(ˆ)(ˆ xxxxf βα +=min

α (x0 ),β(x0 )Kλ(x0 ,xi )[ yi −α(x0)−β(x0)xi ]2

i=1

N

∑9/12/16 33

Dr.YanjunQi/UVACS6316/f16

Page 34: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

TodayRecap

q MachineLearningMethodinanutshellq RegressionModelsBeyondLinear

– LRwithnon-linearbasisfuncGons– Locallyweightedlinearregression– RegressiontreesandMulGlinearInterpolaGon(later)

9/12/16 34

Dr.YanjunQi/UVACS6316/f16

Page 35: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

35

ProbabilisGcInterpretaGonofLinearRegression(LATER)

•  LetusassumethatthetargetvariableandtheinputsarerelatedbytheequaGon:

whereεisanerrortermofunmodeledeffectsorrandomnoise

•  NowassumethatεfollowsaGaussianN(0,σ),thenwehave:

•  Byiid(amongsamples)assumpGon:

iiT

iy εθ += x

⎟⎟⎠

⎞⎜⎜⎝

⎛ −−= 2

2

221

σθ

σπθ )(

exp);|( iT

iii

yxyp

x

⎟⎟

⎞⎜⎜

⎛ −−⎟

⎠⎞⎜

⎝⎛== ∑∏ =

=2

12

1 221

σθ

σπθθ

n

i iT

inn

iii

yxypL

)(exp);|()(

x

9/12/16

Dr.YanjunQi/UVACS6316/f16

ManymorevariaGonsofLRfromthisperspecGve,e.g.binomial/poisson

(LATER)

Page 36: Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

References

•  BigthankstoProf.EricXing@CMUforallowingmetoreusesomeofhisslides

q Prof.NandodeFreitas’stutorialslide

9/12/16 36

Dr.YanjunQi/UVACS6316/f16