linear regression - penn engineering › ~cis519 › spring2019 › ... · based on slide by...

61
Linear Regression Robot Image Credit: Viktoriya Sukhanova © 123RF.com These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.

Upload: others

Post on 04-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearRegression

RobotImageCredit:Viktoriya Sukhanova ©123RF.com

TheseslideswereassembledbyEricEaton,withgratefulacknowledgementofthemanyotherswhomadetheircoursematerialsfreelyavailableonline.Feelfreetoreuseoradapttheseslidesforyourownacademicpurposes,providedthatyouincludeproperattribution.PleasesendcommentsandcorrectionstoEric.

Page 2: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

RegressionGiven:– Datawhere

– Correspondinglabelswhere

2

0

1

2

3

4

5

6

7

8

9

1970 1980 1990 2000 2010 2020

Septem

berA

rcticSeaIceExtent

(1,000,000sq

km)

Year

DatafromG.Witt.JournalofStatisticsEducation,Volume21,Number1(2013)

LinearRegressionQuadraticRegression

X =nx(1), . . . ,x(n)

ox(i) 2 Rd

y =ny(1), . . . , y(n)

oy(i) 2 R

Page 3: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

• 97samples,partitionedinto67train/30test• Eightpredictors(features):

– 6continuous(4logtransforms),1binary,1ordinal

• Continuousoutcomevariable:– lpsa:log(prostatespecificantigenlevel)

ProstateCancerDataset

BasedonslidebyJeffHowbert

Page 4: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearRegression• Hypothesis:

• Fitmodelbyminimizingsumofsquarederrors

5

x

y = ✓0 + ✓1x1 + ✓2x2 + . . .+ ✓dxd =dX

j=0

✓jxj

Assumex0 =1

y = ✓0 + ✓1x1 + ✓2x2 + . . .+ ✓dxd =dX

j=0

✓jxj

Figures are courtesy ofGregShakhnarovich

Page 5: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LeastSquaresLinearRegression

6

• CostFunction

• Fitbysolving

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

min✓

J(✓)

Page 6: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

7

ForinsightonJ(),let’sassumesox 2 R ✓ = [✓0, ✓1]

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

BasedonexamplebyAndrewNg

Page 7: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

8

0

1

2

3

0 1 2 3

y

x

(forfixed,thisisafunctionofx) (functionoftheparameter)

0

1

2

3

-0.5 0 0.5 1 1.5 2 2.5

ForinsightonJ(),let’sassumesox 2 R ✓ = [✓0, ✓1]

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

BasedonexamplebyAndrewNg

Page 8: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

9

0

1

2

3

0 1 2 3

y

x

(forfixed,thisisafunctionofx) (functionoftheparameter)

0

1

2

3

-0.5 0 0.5 1 1.5 2 2.5

ForinsightonJ(),let’sassumesox 2 R ✓ = [✓0, ✓1]

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

J([0, 0.5]) =1

2⇥ 3

⇥(0.5� 1)2 + (1� 2)2 + (1.5� 3)2

⇤⇡ 0.58Basedonexample

byAndrewNg

Page 9: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

10

0

1

2

3

0 1 2 3

y

x

(forfixed,thisisafunctionofx) (functionoftheparameter)

0

1

2

3

-0.5 0 0.5 1 1.5 2 2.5

ForinsightonJ(),let’sassumesox 2 R ✓ = [✓0, ✓1]

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

J([0, 0]) ⇡ 2.333

BasedonexamplebyAndrewNg

J()isconcave

Page 10: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

11SlidebyAndrewNg

Page 11: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

12

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 12: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

13

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 13: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

14

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 14: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

15

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 15: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

BasicSearchProcedure• Chooseinitialvaluefor• Untilwereachaminimum:– Chooseanewvaluefortoreduce

16

✓ J(✓)

q1q0

J(q0,q1)

FigurebyAndrewNg

Page 16: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

BasicSearchProcedure• Chooseinitialvaluefor• Untilwereachaminimum:– Chooseanewvaluefortoreduce

17

J(✓)

q1q0

J(q0,q1)

FigurebyAndrewNg

Page 17: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

BasicSearchProcedure• Chooseinitialvaluefor• Untilwereachaminimum:– Chooseanewvaluefortoreduce

18

J(✓)

q1q0

J(q0,q1)

FigurebyAndrewNg

Sincetheleastsquaresobjectivefunctionisconvex(concave),wedon’tneedtoworryaboutlocalminima

Page 18: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent• Initialize• Repeatuntilconvergence

19

✓j ✓j � ↵@

@✓jJ(✓) simultaneousupdate

forj =0...d

learningrate(small)e.g.,α=0.05

J(✓)

0

1

2

3

-0.5 0 0.5 1 1.5 2 2.5

Page 19: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent• Initialize• Repeatuntilconvergence

20

✓j ✓j � ↵@

@✓jJ(✓) simultaneousupdate

forj =0...d

ForLinearRegression:@

@✓jJ(✓) =

@

@✓j

1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

=@

@✓j

1

2n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!2

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!⇥ @

@✓j

dX

k=0

✓kx(i)k � y(i)

!

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!x(i)j

=1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

Page 20: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent• Initialize• Repeatuntilconvergence

21

✓j ✓j � ↵@

@✓jJ(✓) simultaneousupdate

forj =0...d

ForLinearRegression:@

@✓jJ(✓) =

@

@✓j

1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

=@

@✓j

1

2n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!2

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!⇥ @

@✓j

dX

k=0

✓kx(i)k � y(i)

!

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!x(i)j

=1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

Page 21: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent• Initialize• Repeatuntilconvergence

22

✓j ✓j � ↵@

@✓jJ(✓) simultaneousupdate

forj =0...d

ForLinearRegression:@

@✓jJ(✓) =

@

@✓j

1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

=@

@✓j

1

2n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!2

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!⇥ @

@✓j

dX

k=0

✓kx(i)k � y(i)

!

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!x(i)j

=1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

Page 22: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent• Initialize• Repeatuntilconvergence

23

✓j ✓j � ↵@

@✓jJ(✓) simultaneousupdate

forj =0...d

ForLinearRegression:@

@✓jJ(✓) =

@

@✓j

1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

=@

@✓j

1

2n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!2

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!⇥ @

@✓j

dX

k=0

✓kx(i)k � y(i)

!

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!x(i)j

=1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

Page 23: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescentforLinearRegression

• Initialize• Repeatuntilconvergence

24

simultaneousupdateforj =0...d

✓j ✓j � ↵1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

• Toachievesimultaneousupdate• AtthestartofeachGDiteration,compute• Usethisstoredvalueintheupdatesteploop

h✓

⇣x(i)

kvk2 =

sX

i

v2i =q

v21 + v22 + . . .+ v2|v|L2 norm:

k✓new � ✓oldk2 < ✏• Assumeconvergencewhen

Page 24: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

25

(forfixed,thisisafunctionofx) (functionoftheparameters)

h(x)=-900– 0.1x

SlidebyAndrewNg

Page 25: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

26

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 26: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

27

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 27: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

28

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 28: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

29

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 29: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

30

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 30: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

31

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 31: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

32

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 32: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

33

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 33: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

Choosingα

34

αtoosmall

slowconvergence

αtoolarge

Increasingvaluefor J(✓)

• Mayovershoottheminimum• Mayfailtoconverge• Mayevendiverge

Toseeifgradientdescentisworking,printouteachiteration• Thevalueshoulddecreaseateachiteration• Ifitdoesn’t,adjustα

J(✓)

Page 34: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

ExtendingLinearRegressiontoMoreComplexModels

• TheinputsX forlinearregressioncanbe:– Originalquantitativeinputs– Transformationofquantitativeinputs

• e.g.log,exp,squareroot,square,etc.

– Polynomialtransformation• example:y =b0 +b1×x +b2×x2 +b3×x3

– Basisexpansions– Dummycodingofcategoricalinputs– Interactionsbetweenvariables

• example:x3 =x1 × x2

Thisallowsuseoflinearregressiontechniquestofitnon-lineardatasets.

Page 35: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearBasisFunctionModels

• Generally,

• Typically,sothatactsasabias• Inthesimplestcase,weuselinearbasisfunctions:

h✓(x) =dX

j=0

✓j�j(x)

�0(x) = 1 ✓0

�j(x) = xj

basisfunction

BasedonslidebyChristopherBishop(PRML)

Page 36: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearBasisFunctionModels

– Theseareglobal;asmallchangeinx affectsallbasisfunctions

• Polynomialbasisfunctions:

• Gaussianbasisfunctions:

– Thesearelocal;asmallchangeinx onlyaffectnearbybasisfunctions.μj ands controllocationandscale(width).

BasedonslidebyChristopherBishop(PRML)

Page 37: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearBasisFunctionModels• Sigmoidal basisfunctions:

where

– Thesearealsolocal;asmallchangeinx onlyaffectsnearbybasisfunctions.μjands controllocationandscale(slope).

BasedonslidebyChristopherBishop(PRML)

Page 38: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

ExampleofFittingaPolynomialCurvewithaLinearModel

y = ✓0 + ✓1x+ ✓2x2 + . . .+ ✓px

p =pX

j=0

✓jxj

Page 39: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearBasisFunctionModels

• BasicLinearModel:

• GeneralizedLinearModel:

• Oncewehavereplacedthedatabytheoutputsofthebasisfunctions,fittingthegeneralizedmodelisexactlythesameproblemasfittingthebasicmodel– Unlessweusethekerneltrick– moreonthatwhenwecoversupportvectormachines

– Therefore,thereisnopointinclutteringthemathwithbasisfunctions

40

h✓(x) =dX

j=0

✓j�j(x)

h✓(x) =dX

j=0

✓jxj

BasedonslidebyGeoffHinton

Page 40: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearAlgebraConcepts• Vector in isanorderedsetofd realnumbers– e.g.,v=[1,6,3,4]isin– “[1,6,3,4]” isacolumnvector:– asopposedtoarowvector:

• Anm-by-n matrix isanobjectwithm rowsandn columns,whereeachentryisarealnumber:

÷÷÷÷÷

ø

ö

ççççç

è

æ

4361

( )4361

÷÷÷

ø

ö

ççç

è

æ

2396784821

Rd

R4

BasedonslidesbyJosephBradley

Page 41: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

• Transpose:reflectvector/matrixonline:

( )baba T

=÷÷ø

öççè

æ÷÷ø

öççè

æ=÷÷

ø

öççè

ædbca

dcba T

– Note:(Ax)T=xTAT (We’lldefinemultiplicationsoon…)

• Vectornorms:– Lp normofv =(v1,…,vk)is– Commonnorms:L1,L2– Linfinity =maxi |vi|

• Lengthofavectorv isL2(v)

X

i

|vi|p! 1

p

BasedonslidesbyJosephBradley

LinearAlgebraConcepts

Page 42: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

• Vectordotproduct:

– Note:dotproductofu withitself =length(u)2 =

• Matrixproduct:

( ) ( ) 22112121 vuvuvvuuvu +=•=•

÷÷ø

öççè

æ++++

=

÷÷ø

öççè

æ=÷÷

ø

öççè

æ=

2222122121221121

2212121121121111

2221

1211

2221

1211 ,

babababababababa

AB

bbbb

Baaaa

A

kuk22

BasedonslidesbyJosephBradley

LinearAlgebraConcepts

Page 43: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

• Vectorproducts:– Dotproduct:

– Outerproduct:

( ) 22112

121 vuvuvv

uuvuvu T +=÷÷ø

öççè

æ==•

( ) ÷÷ø

öççè

æ=÷÷

ø

öççè

æ=

2212

211121

2

1

vuvuvuvu

vvuu

uvT

BasedonslidesbyJosephBradley

LinearAlgebraConcepts

Page 44: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

h(x) = ✓|x

x| =⇥1 x1 . . . xd

Vectorization• Benefitsofvectorization– Morecompactequations– Fastercode(usingoptimizedmatrixlibraries)

• Considerourmodel:

• Let

• Canwritethemodelinvectorized formas45

h(x) =dX

j=0

✓jxj

✓ =

2

6664

✓0✓1...✓d

3

7775

Page 45: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

Vectorization• Considerourmodelforn instances:

• Let

• Canwritethemodelinvectorized formas46

h✓(x) = X✓

X =

2

66666664

1 x(1)1 . . . x(1)

d...

.... . .

...

1 x(i)1 . . . x(i)

d...

.... . .

...

1 x(n)1 . . . x(n)

d

3

77777775

✓ =

2

6664

✓0✓1...✓d

3

7775

h⇣x(i)

⌘=

dX

j=0

✓jx(i)j

R(d+1)⇥1 Rn⇥(d+1)

Page 46: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

J(✓) =1

2n

nX

i=1

⇣✓|x(i) � y(i)

⌘2

Vectorization• Forthelinearregressioncostfunction:

47

J(✓) =1

2n(X✓ � y)| (X✓ � y)

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

Rn⇥(d+1)

R(d+1)⇥1

Rn⇥1R1⇥n

Let:

y =

2

6664

y(1)

y(2)

...y(n)

3

7775

Page 47: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

ClosedFormSolution:

ClosedFormSolution• InsteadofusingGD,solveforoptimal analytically– Noticethatthesolutioniswhen

• Derivation:

Takederivativeandsetequalto0,thensolvefor:

48

✓@

@✓J(✓) = 0

J (✓) =1

2n(X✓ � y)| (X✓ � y)

/ ✓|X|X✓ � y|X✓ � ✓|X|y + y|y/ ✓|X|X✓ � 2✓|X|y + y|y

1x1J (✓) =1

2n(X✓ � y)| (X✓ � y)

/ ✓|X|X✓ � y|X✓ � ✓|X|y + y|y/ ✓|X|X✓ � 2✓|X|y + y|y

J (✓) =1

2n(X✓ � y)| (X✓ � y)

/ ✓|X|X✓ � y|X✓ � ✓|X|y + y|y/ ✓|X|X✓ � 2✓|X|y + y|y

@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

✓@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

Page 48: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

ClosedFormSolution• CanobtainbysimplypluggingX and into

• IfX TX isnotinvertible(i.e.,singular),mayneedto:– Usepseudo-inverseinsteadoftheinverse

• Inpython,numpy.linalg.pinv(a)

– Removeredundant(notlinearlyindependent)features– Removeextrafeaturestoensurethatd ≤n

49

@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

y =

2

6664

y(1)

y(2)

...y(n)

3

7775X =

2

66666664

1 x(1)1 . . . x(1)

d...

.... . .

...

1 x(i)1 . . . x(i)

d...

.... . .

...

1 x(n)1 . . . x(n)

d

3

77777775

✓ y

Page 49: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescentvs ClosedForm

GradientDescentClosedFormSolution

50

• Requiresmultipleiterations• Needtochooseα• Workswellwhenn islarge• Cansupportincremental

learning

• Non-iterative• Noneedforα• Slowifn islarge

– Computing(X TX)-1 isroughlyO(n3)

Page 50: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

ImprovingLearning:FeatureScaling

• Idea:Ensurethatfeaturehavesimilarscales

• Makesgradientdescentconvergemuch faster

51

0

5

10

15

20

0 5 10 15 20

✓1

✓2

BeforeFeatureScaling

0

5

10

15

20

0 5 10 15 20

✓1

✓2

AfterFeatureScaling

Page 51: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

FeatureStandardization• Rescalesfeaturestohavezeromeanandunitvariance

– Letμj bethemeanoffeaturej:

– Replaceeachvaluewith:

• sj isthestandarddeviationoffeaturej• Couldalsousetherangeoffeaturej (maxj – minj)forsj

• Mustapplythesametransformationtoinstancesforbothtrainingandprediction

• Outlierscancauseproblems52

µj =1

n

nX

i=1

x(i)j

x(i)j

x(i)j � µj

sj

forj =1...d(notx0!)

Page 52: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

QualityofFit

Overfitting:• Thelearnedhypothesismayfitthetrainingsetverywell( )

• ...butfailstogeneralizetonewexamples

53

Prod

uctiv

ity

TimeSpent

Prod

uctiv

ity

TimeSpent

Prod

uctiv

ity

TimeSpent

Underfitting(highbias)

Overfitting(highvariance)

Correctfit

J(✓) ⇡ 0

BasedonexamplebyAndrewNg

Page 53: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

Regularization• Amethodforautomaticallycontrollingthecomplexityofthelearnedhypothesis

• Idea:penalizeforlargevaluesof– Canincorporateintothecostfunction– Workswellwhenwehavealotoffeatures,eachthatcontributesabittopredictingthelabel

• Canalsoaddressoverfitting byeliminatingfeatures(eithermanuallyorviamodelselection)

54

✓j

Page 54: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

Regularization• Linearregressionobjectivefunction

– istheregularizationparameter()– Noregularizationon!

55

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+ �

dX

j=1

✓2j

modelfittodata regularization

✓0

� � � 0

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+

2

dX

j=1

✓2j

Page 55: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

UnderstandingRegularization

• Notethat

– Thisisthemagnitudeofthefeaturecoefficientvector!

• Wecanalsothinkofthisas:

• L2 regularizationpullscoefficientstoward0

56

dX

j=1

✓2j = k✓1:dk22

dX

j=1

(✓j � 0)2 = k✓1:d � ~0k22

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+

2

dX

j=1

✓2j

Page 56: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

UnderstandingRegularization

• Whathappensas?

57

Prod

uctiv

ity

TimeSpentonWork

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+

2

dX

j=1

✓2j

� ! 1

✓0 + ✓1x+ ✓2x2 + ✓3x

3 + ✓4x4

Page 57: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

UnderstandingRegularization

• Whathappensas?

58

Prod

uctiv

ity

TimeSpentonWork

0 0 0 0

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+

2

dX

j=1

✓2j

� ! 1

✓0 + ✓1x+ ✓2x2 + ✓3x

3 + ✓4x4

Page 58: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

RegularizedLinearRegression

59

• CostFunction

• Fitbysolving

• Gradientupdate:

min✓

J(✓)

✓j ✓j � ↵1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

✓0 ✓0 � ↵1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

regularization

@

@✓jJ(✓)

@

@✓0J(✓)

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+

2

dX

j=1

✓2j

� ↵�✓j

Page 59: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

RegularizedLinearRegression

60

✓0 ✓0 � ↵1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

• Wecanrewritethegradientstepas:

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+

2

dX

j=1

✓2j

✓j ✓j (1� ↵�)� ↵1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

✓j ✓j � ↵1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j � ↵�✓j

Page 60: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

RegularizedLinearRegression

61

✓ =

0

BBBBB@X|X + �

2

666664

0 0 0 . . . 00 1 0 . . . 00 0 1 . . . 0...

......

. . ....

0 0 0 . . . 1

3

777775

1

CCCCCA

�1

X|y

• Toincorporateregularizationintotheclosedformsolution:

Page 61: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

RegularizedLinearRegression

62

• Toincorporateregularizationintotheclosedformsolution:

• Canderivethisthesameway,bysolving

• Canprovethatforλ >0,inverseexistsintheequationabove

✓ =

0

BBBBB@X|X + �

2

666664

0 0 0 . . . 00 1 0 . . . 00 0 1 . . . 0...

......

. . ....

0 0 0 . . . 1

3

777775

1

CCCCCA

�1

X|y

@

@✓J(✓) = 0