Transcript
Page 1: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearRegression

RobotImageCredit:Viktoriya Sukhanova ©123RF.com

TheseslideswereassembledbyEricEaton,withgratefulacknowledgementofthemanyotherswhomadetheircoursematerialsfreelyavailableonline.Feelfreetoreuseoradapttheseslidesforyourownacademicpurposes,providedthatyouincludeproperattribution.PleasesendcommentsandcorrectionstoEric.

Page 2: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

RegressionGiven:– Datawhere

– Correspondinglabelswhere

2

0

1

2

3

4

5

6

7

8

9

1970 1980 1990 2000 2010 2020

Septem

berA

rcticSeaIceExtent

(1,000,000sq

km)

Year

DatafromG.Witt.JournalofStatisticsEducation,Volume21,Number1(2013)

LinearRegressionQuadraticRegression

X =nx(1), . . . ,x(n)

ox(i) 2 Rd

y =ny(1), . . . , y(n)

oy(i) 2 R

Page 3: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

• 97samples,partitionedinto67train/30test• Eightpredictors(features):

– 6continuous(4logtransforms),1binary,1ordinal

• Continuousoutcomevariable:– lpsa:log(prostatespecificantigenlevel)

ProstateCancerDataset

BasedonslidebyJeffHowbert

Page 4: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearRegression• Hypothesis:

• Fitmodelbyminimizingsumofsquarederrors

5

x

y = ✓0 + ✓1x1 + ✓2x2 + . . .+ ✓dxd =dX

j=0

✓jxj

Assumex0 =1

y = ✓0 + ✓1x1 + ✓2x2 + . . .+ ✓dxd =dX

j=0

✓jxj

Figures are courtesy ofGregShakhnarovich

Page 5: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LeastSquaresLinearRegression

6

• CostFunction

• Fitbysolving

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

min✓

J(✓)

Page 6: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

7

ForinsightonJ(),let’sassumesox 2 R ✓ = [✓0, ✓1]

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

BasedonexamplebyAndrewNg

Page 7: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

8

0

1

2

3

0 1 2 3

y

x

(forfixed,thisisafunctionofx) (functionoftheparameter)

0

1

2

3

-0.5 0 0.5 1 1.5 2 2.5

ForinsightonJ(),let’sassumesox 2 R ✓ = [✓0, ✓1]

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

BasedonexamplebyAndrewNg

Page 8: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

9

0

1

2

3

0 1 2 3

y

x

(forfixed,thisisafunctionofx) (functionoftheparameter)

0

1

2

3

-0.5 0 0.5 1 1.5 2 2.5

ForinsightonJ(),let’sassumesox 2 R ✓ = [✓0, ✓1]

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

J([0, 0.5]) =1

2⇥ 3

⇥(0.5� 1)2 + (1� 2)2 + (1.5� 3)2

⇤⇡ 0.58Basedonexample

byAndrewNg

Page 9: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

10

0

1

2

3

0 1 2 3

y

x

(forfixed,thisisafunctionofx) (functionoftheparameter)

0

1

2

3

-0.5 0 0.5 1 1.5 2 2.5

ForinsightonJ(),let’sassumesox 2 R ✓ = [✓0, ✓1]

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

J([0, 0]) ⇡ 2.333

BasedonexamplebyAndrewNg

J()isconcave

Page 10: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

11SlidebyAndrewNg

Page 11: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

12

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 12: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

13

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 13: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

14

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 14: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

IntuitionBehindCostFunction

15

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 15: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

BasicSearchProcedure• Chooseinitialvaluefor• Untilwereachaminimum:– Chooseanewvaluefortoreduce

16

✓ J(✓)

q1q0

J(q0,q1)

FigurebyAndrewNg

Page 16: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

BasicSearchProcedure• Chooseinitialvaluefor• Untilwereachaminimum:– Chooseanewvaluefortoreduce

17

J(✓)

q1q0

J(q0,q1)

FigurebyAndrewNg

Page 17: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

BasicSearchProcedure• Chooseinitialvaluefor• Untilwereachaminimum:– Chooseanewvaluefortoreduce

18

J(✓)

q1q0

J(q0,q1)

FigurebyAndrewNg

Sincetheleastsquaresobjectivefunctionisconvex(concave),wedon’tneedtoworryaboutlocalminima

Page 18: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent• Initialize• Repeatuntilconvergence

19

✓j ✓j � ↵@

@✓jJ(✓) simultaneousupdate

forj =0...d

learningrate(small)e.g.,α=0.05

J(✓)

0

1

2

3

-0.5 0 0.5 1 1.5 2 2.5

Page 19: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent• Initialize• Repeatuntilconvergence

20

✓j ✓j � ↵@

@✓jJ(✓) simultaneousupdate

forj =0...d

ForLinearRegression:@

@✓jJ(✓) =

@

@✓j

1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

=@

@✓j

1

2n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!2

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!⇥ @

@✓j

dX

k=0

✓kx(i)k � y(i)

!

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!x(i)j

=1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

Page 20: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent• Initialize• Repeatuntilconvergence

21

✓j ✓j � ↵@

@✓jJ(✓) simultaneousupdate

forj =0...d

ForLinearRegression:@

@✓jJ(✓) =

@

@✓j

1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

=@

@✓j

1

2n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!2

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!⇥ @

@✓j

dX

k=0

✓kx(i)k � y(i)

!

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!x(i)j

=1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

Page 21: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent• Initialize• Repeatuntilconvergence

22

✓j ✓j � ↵@

@✓jJ(✓) simultaneousupdate

forj =0...d

ForLinearRegression:@

@✓jJ(✓) =

@

@✓j

1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

=@

@✓j

1

2n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!2

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!⇥ @

@✓j

dX

k=0

✓kx(i)k � y(i)

!

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!x(i)j

=1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

Page 22: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent• Initialize• Repeatuntilconvergence

23

✓j ✓j � ↵@

@✓jJ(✓) simultaneousupdate

forj =0...d

ForLinearRegression:@

@✓jJ(✓) =

@

@✓j

1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

=@

@✓j

1

2n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!2

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!⇥ @

@✓j

dX

k=0

✓kx(i)k � y(i)

!

=1

n

nX

i=1

dX

k=0

✓kx(i)k � y(i)

!x(i)j

=1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

Page 23: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescentforLinearRegression

• Initialize• Repeatuntilconvergence

24

simultaneousupdateforj =0...d

✓j ✓j � ↵1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

• Toachievesimultaneousupdate• AtthestartofeachGDiteration,compute• Usethisstoredvalueintheupdatesteploop

h✓

⇣x(i)

kvk2 =

sX

i

v2i =q

v21 + v22 + . . .+ v2|v|L2 norm:

k✓new � ✓oldk2 < ✏• Assumeconvergencewhen

Page 24: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

25

(forfixed,thisisafunctionofx) (functionoftheparameters)

h(x)=-900– 0.1x

SlidebyAndrewNg

Page 25: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

26

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 26: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

27

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 27: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

28

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 28: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

29

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 29: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

30

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 30: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

31

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 31: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

32

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 32: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescent

33

(forfixed,thisisafunctionofx) (functionoftheparameters)

SlidebyAndrewNg

Page 33: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

Choosingα

34

αtoosmall

slowconvergence

αtoolarge

Increasingvaluefor J(✓)

• Mayovershoottheminimum• Mayfailtoconverge• Mayevendiverge

Toseeifgradientdescentisworking,printouteachiteration• Thevalueshoulddecreaseateachiteration• Ifitdoesn’t,adjustα

J(✓)

Page 34: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

ExtendingLinearRegressiontoMoreComplexModels

• TheinputsX forlinearregressioncanbe:– Originalquantitativeinputs– Transformationofquantitativeinputs

• e.g.log,exp,squareroot,square,etc.

– Polynomialtransformation• example:y =b0 +b1×x +b2×x2 +b3×x3

– Basisexpansions– Dummycodingofcategoricalinputs– Interactionsbetweenvariables

• example:x3 =x1 × x2

Thisallowsuseoflinearregressiontechniquestofitnon-lineardatasets.

Page 35: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearBasisFunctionModels

• Generally,

• Typically,sothatactsasabias• Inthesimplestcase,weuselinearbasisfunctions:

h✓(x) =dX

j=0

✓j�j(x)

�0(x) = 1 ✓0

�j(x) = xj

basisfunction

BasedonslidebyChristopherBishop(PRML)

Page 36: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearBasisFunctionModels

– Theseareglobal;asmallchangeinx affectsallbasisfunctions

• Polynomialbasisfunctions:

• Gaussianbasisfunctions:

– Thesearelocal;asmallchangeinx onlyaffectnearbybasisfunctions.μj ands controllocationandscale(width).

BasedonslidebyChristopherBishop(PRML)

Page 37: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearBasisFunctionModels• Sigmoidal basisfunctions:

where

– Thesearealsolocal;asmallchangeinx onlyaffectsnearbybasisfunctions.μjands controllocationandscale(slope).

BasedonslidebyChristopherBishop(PRML)

Page 38: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

ExampleofFittingaPolynomialCurvewithaLinearModel

y = ✓0 + ✓1x+ ✓2x2 + . . .+ ✓px

p =pX

j=0

✓jxj

Page 39: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearBasisFunctionModels

• BasicLinearModel:

• GeneralizedLinearModel:

• Oncewehavereplacedthedatabytheoutputsofthebasisfunctions,fittingthegeneralizedmodelisexactlythesameproblemasfittingthebasicmodel– Unlessweusethekerneltrick– moreonthatwhenwecoversupportvectormachines

– Therefore,thereisnopointinclutteringthemathwithbasisfunctions

40

h✓(x) =dX

j=0

✓j�j(x)

h✓(x) =dX

j=0

✓jxj

BasedonslidebyGeoffHinton

Page 40: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

LinearAlgebraConcepts• Vector in isanorderedsetofd realnumbers– e.g.,v=[1,6,3,4]isin– “[1,6,3,4]” isacolumnvector:– asopposedtoarowvector:

• Anm-by-n matrix isanobjectwithm rowsandn columns,whereeachentryisarealnumber:

÷÷÷÷÷

ø

ö

ççççç

è

æ

4361

( )4361

÷÷÷

ø

ö

ççç

è

æ

2396784821

Rd

R4

BasedonslidesbyJosephBradley

Page 41: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

• Transpose:reflectvector/matrixonline:

( )baba T

=÷÷ø

öççè

æ÷÷ø

öççè

æ=÷÷

ø

öççè

ædbca

dcba T

– Note:(Ax)T=xTAT (We’lldefinemultiplicationsoon…)

• Vectornorms:– Lp normofv =(v1,…,vk)is– Commonnorms:L1,L2– Linfinity =maxi |vi|

• Lengthofavectorv isL2(v)

X

i

|vi|p! 1

p

BasedonslidesbyJosephBradley

LinearAlgebraConcepts

Page 42: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

• Vectordotproduct:

– Note:dotproductofu withitself =length(u)2 =

• Matrixproduct:

( ) ( ) 22112121 vuvuvvuuvu +=•=•

÷÷ø

öççè

æ++++

=

÷÷ø

öççè

æ=÷÷

ø

öççè

æ=

2222122121221121

2212121121121111

2221

1211

2221

1211 ,

babababababababa

AB

bbbb

Baaaa

A

kuk22

BasedonslidesbyJosephBradley

LinearAlgebraConcepts

Page 43: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

• Vectorproducts:– Dotproduct:

– Outerproduct:

( ) 22112

121 vuvuvv

uuvuvu T +=÷÷ø

öççè

æ==•

( ) ÷÷ø

öççè

æ=÷÷

ø

öççè

æ=

2212

211121

2

1

vuvuvuvu

vvuu

uvT

BasedonslidesbyJosephBradley

LinearAlgebraConcepts

Page 44: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

h(x) = ✓|x

x| =⇥1 x1 . . . xd

Vectorization• Benefitsofvectorization– Morecompactequations– Fastercode(usingoptimizedmatrixlibraries)

• Considerourmodel:

• Let

• Canwritethemodelinvectorized formas45

h(x) =dX

j=0

✓jxj

✓ =

2

6664

✓0✓1...✓d

3

7775

Page 45: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

Vectorization• Considerourmodelforn instances:

• Let

• Canwritethemodelinvectorized formas46

h✓(x) = X✓

X =

2

66666664

1 x(1)1 . . . x(1)

d...

.... . .

...

1 x(i)1 . . . x(i)

d...

.... . .

...

1 x(n)1 . . . x(n)

d

3

77777775

✓ =

2

6664

✓0✓1...✓d

3

7775

h⇣x(i)

⌘=

dX

j=0

✓jx(i)j

R(d+1)⇥1 Rn⇥(d+1)

Page 46: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

J(✓) =1

2n

nX

i=1

⇣✓|x(i) � y(i)

⌘2

Vectorization• Forthelinearregressioncostfunction:

47

J(✓) =1

2n(X✓ � y)| (X✓ � y)

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2

Rn⇥(d+1)

R(d+1)⇥1

Rn⇥1R1⇥n

Let:

y =

2

6664

y(1)

y(2)

...y(n)

3

7775

Page 47: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

ClosedFormSolution:

ClosedFormSolution• InsteadofusingGD,solveforoptimal analytically– Noticethatthesolutioniswhen

• Derivation:

Takederivativeandsetequalto0,thensolvefor:

48

✓@

@✓J(✓) = 0

J (✓) =1

2n(X✓ � y)| (X✓ � y)

/ ✓|X|X✓ � y|X✓ � ✓|X|y + y|y/ ✓|X|X✓ � 2✓|X|y + y|y

1x1J (✓) =1

2n(X✓ � y)| (X✓ � y)

/ ✓|X|X✓ � y|X✓ � ✓|X|y + y|y/ ✓|X|X✓ � 2✓|X|y + y|y

J (✓) =1

2n(X✓ � y)| (X✓ � y)

/ ✓|X|X✓ � y|X✓ � ✓|X|y + y|y/ ✓|X|X✓ � 2✓|X|y + y|y

@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

✓@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

Page 48: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

ClosedFormSolution• CanobtainbysimplypluggingX and into

• IfX TX isnotinvertible(i.e.,singular),mayneedto:– Usepseudo-inverseinsteadoftheinverse

• Inpython,numpy.linalg.pinv(a)

– Removeredundant(notlinearlyindependent)features– Removeextrafeaturestoensurethatd ≤n

49

@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

y =

2

6664

y(1)

y(2)

...y(n)

3

7775X =

2

66666664

1 x(1)1 . . . x(1)

d...

.... . .

...

1 x(i)1 . . . x(i)

d...

.... . .

...

1 x(n)1 . . . x(n)

d

3

77777775

✓ y

Page 49: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

GradientDescentvs ClosedForm

GradientDescentClosedFormSolution

50

• Requiresmultipleiterations• Needtochooseα• Workswellwhenn islarge• Cansupportincremental

learning

• Non-iterative• Noneedforα• Slowifn islarge

– Computing(X TX)-1 isroughlyO(n3)

Page 50: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

ImprovingLearning:FeatureScaling

• Idea:Ensurethatfeaturehavesimilarscales

• Makesgradientdescentconvergemuch faster

51

0

5

10

15

20

0 5 10 15 20

✓1

✓2

BeforeFeatureScaling

0

5

10

15

20

0 5 10 15 20

✓1

✓2

AfterFeatureScaling

Page 51: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

FeatureStandardization• Rescalesfeaturestohavezeromeanandunitvariance

– Letμj bethemeanoffeaturej:

– Replaceeachvaluewith:

• sj isthestandarddeviationoffeaturej• Couldalsousetherangeoffeaturej (maxj – minj)forsj

• Mustapplythesametransformationtoinstancesforbothtrainingandprediction

• Outlierscancauseproblems52

µj =1

n

nX

i=1

x(i)j

x(i)j

x(i)j � µj

sj

forj =1...d(notx0!)

Page 52: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

QualityofFit

Overfitting:• Thelearnedhypothesismayfitthetrainingsetverywell( )

• ...butfailstogeneralizetonewexamples

53

Prod

uctiv

ity

TimeSpent

Prod

uctiv

ity

TimeSpent

Prod

uctiv

ity

TimeSpent

Underfitting(highbias)

Overfitting(highvariance)

Correctfit

J(✓) ⇡ 0

BasedonexamplebyAndrewNg

Page 53: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

Regularization• Amethodforautomaticallycontrollingthecomplexityofthelearnedhypothesis

• Idea:penalizeforlargevaluesof– Canincorporateintothecostfunction– Workswellwhenwehavealotoffeatures,eachthatcontributesabittopredictingthelabel

• Canalsoaddressoverfitting byeliminatingfeatures(eithermanuallyorviamodelselection)

54

✓j

Page 54: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

Regularization• Linearregressionobjectivefunction

– istheregularizationparameter()– Noregularizationon!

55

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+ �

dX

j=1

✓2j

modelfittodata regularization

✓0

� � � 0

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+

2

dX

j=1

✓2j

Page 55: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

UnderstandingRegularization

• Notethat

– Thisisthemagnitudeofthefeaturecoefficientvector!

• Wecanalsothinkofthisas:

• L2 regularizationpullscoefficientstoward0

56

dX

j=1

✓2j = k✓1:dk22

dX

j=1

(✓j � 0)2 = k✓1:d � ~0k22

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+

2

dX

j=1

✓2j

Page 56: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

UnderstandingRegularization

• Whathappensas?

57

Prod

uctiv

ity

TimeSpentonWork

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+

2

dX

j=1

✓2j

� ! 1

✓0 + ✓1x+ ✓2x2 + ✓3x

3 + ✓4x4

Page 57: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

UnderstandingRegularization

• Whathappensas?

58

Prod

uctiv

ity

TimeSpentonWork

0 0 0 0

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+

2

dX

j=1

✓2j

� ! 1

✓0 + ✓1x+ ✓2x2 + ✓3x

3 + ✓4x4

Page 58: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

RegularizedLinearRegression

59

• CostFunction

• Fitbysolving

• Gradientupdate:

min✓

J(✓)

✓j ✓j � ↵1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

✓0 ✓0 � ↵1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

regularization

@

@✓jJ(✓)

@

@✓0J(✓)

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+

2

dX

j=1

✓2j

� ↵�✓j

Page 59: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

RegularizedLinearRegression

60

✓0 ✓0 � ↵1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

• Wecanrewritethegradientstepas:

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘2+

2

dX

j=1

✓2j

✓j ✓j (1� ↵�)� ↵1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j

✓j ✓j � ↵1

n

nX

i=1

⇣h✓

⇣x(i)

⌘� y(i)

⌘x(i)j � ↵�✓j

Page 60: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

RegularizedLinearRegression

61

✓ =

0

BBBBB@X|X + �

2

666664

0 0 0 . . . 00 1 0 . . . 00 0 1 . . . 0...

......

. . ....

0 0 0 . . . 1

3

777775

1

CCCCCA

�1

X|y

• Toincorporateregularizationintotheclosedformsolution:

Page 61: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:

RegularizedLinearRegression

62

• Toincorporateregularizationintotheclosedformsolution:

• Canderivethisthesameway,bysolving

• Canprovethatforλ >0,inverseexistsintheequationabove

✓ =

0

BBBBB@X|X + �

2

666664

0 0 0 . . . 00 1 0 . . . 00 0 1 . . . 0...

......

. . ....

0 0 0 . . . 1

3

777775

1

CCCCCA

�1

X|y

@

@✓J(✓) = 0


Top Related