Logistic Regression

Robot image credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton's slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.
Classification Based on Probability
• Instead of just predicting the class, give the probability of the instance being that class
  – i.e., learn p(y | x)
• Comparison to the perceptron:
  – The perceptron doesn't produce a probability estimate
• Recall that:
  p(event) + p(¬event) = 1
  0 ≤ p(event) ≤ 1
Logistic Regression
• Takes a probabilistic approach to learning discriminative functions (i.e., a classifier)
• h_θ(x) should give p(y = 1 | x; θ)
  – Want 0 ≤ h_θ(x) ≤ 1
• Logistic regression model:
  h_θ(x) = g(θ^T x)
  where g(z) = 1 / (1 + e^(−z)) is the logistic/sigmoid function, so
  h_θ(x) = 1 / (1 + e^(−θ^T x))
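To make the model concrete, here is a minimal NumPy sketch of the sigmoid hypothesis (the names sigmoid and hypothesis are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x), the estimated p(y = 1 | x; theta)."""
    return sigmoid(theta @ x)
```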
Interpretation of Hypothesis Output
h_θ(x) = estimated p(y = 1 | x; θ)
Example: cancer diagnosis from tumor size
  x = [x_0, x_1]^T = [1, tumorSize]^T
  h_θ(x) = 0.7
→ Tell the patient that there is a 70% chance of the tumor being malignant
Note that: p(y = 0 | x; θ) + p(y = 1 | x; θ) = 1
Therefore, p(y = 0 | x; θ) = 1 − p(y = 1 | x; θ) = 1 − 0.7 = 0.3 in this example
Based on example by Andrew Ng
Another Interpretation
• Equivalently, logistic regression assumes that
  log [ p(y = 1 | x; θ) / p(y = 0 | x; θ) ] = θ_0 + θ_1 x_1 + … + θ_d x_d
  (the left-hand side is the log odds of y = 1)
• In other words, logistic regression assumes that the log odds is a linear function of x
Side note: the odds in favor of an event is the quantity p / (1 − p), where p is the probability of the event
  E.g., if I toss a fair die, what are the odds that I will roll a 6? Here p = 1/6, so the odds are (1/6) / (5/6) = 1/5.
Based on slide by Xiaoli Fern
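The claim that the log odds equals θ^T x can be checked numerically; a quick sketch with made-up parameter values:

```python
import numpy as np

theta = np.array([-1.0, 2.0])            # [theta_0, theta_1], illustrative values
x = np.array([1.0, 0.75])                # [x_0 = 1, x_1]
p1 = 1.0 / (1.0 + np.exp(-theta @ x))    # p(y = 1 | x; theta)
log_odds = np.log(p1 / (1.0 - p1))       # log [ p(y=1|x) / p(y=0|x) ]
assert np.isclose(log_odds, theta @ x)   # the log odds is exactly theta^T x
```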
Logistic Regression
  h_θ(x) = g(θ^T x),  g(z) = 1 / (1 + e^(−z))
• Assume a threshold and...
  – Predict y = 1 if h_θ(x) ≥ 0.5
  – Predict y = 0 if h_θ(x) < 0.5
• So θ^T x should take large positive values for positive instances and large negative values for negative instances; note that h_θ(x) ≥ 0.5 exactly when θ^T x ≥ 0
Based on slide by Andrew Ng
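A minimal sketch of this decision rule (function and variable names are illustrative); since g(z) ≥ 0.5 exactly when z ≥ 0, thresholding h_θ(x) at 0.5 is the same as thresholding θ^T x at 0:

```python
import numpy as np

def predict(theta, X):
    """Predict y = 1 where h_theta(x) >= 0.5, else y = 0.

    X is an (n, d+1) matrix whose first column is all ones (x_0 = 1).
    """
    probs = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(x^(i)) for each row
    return (probs >= 0.5).astype(int)          # equivalent to (X @ theta >= 0)
```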
Non-Linear Decision Boundary
• Can apply basis function expansion to the features, same as with linear regression:
  x = [1, x_1, x_2]^T → [1, x_1, x_2, x_1 x_2, x_1², x_2², x_1² x_2, x_1 x_2², …]^T
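As a sketch, the expansion shown above for two features could be hand-rolled like this (sklearn.preprocessing.PolynomialFeatures automates the general case):

```python
import numpy as np

def expand(x1, x2):
    """Map the features [1, x1, x2] to the expanded basis from the slide."""
    return np.array([1.0, x1, x2, x1 * x2,
                     x1**2, x2**2, x1**2 * x2, x1 * x2**2])
```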
Logistic Regression
• Given {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(n), y^(n))} where x^(i) ∈ ℝ^d, y^(i) ∈ {0, 1}
• Model:
  h_θ(x) = g(θ^T x),  g(z) = 1 / (1 + e^(−z))
  x^T = [1  x_1  …  x_d],  θ = [θ_0  θ_1  …  θ_d]^T
Logistic Regression Objective Function
• Can't just use squared loss as in linear regression:
  J(θ) = (1 / 2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))²
  – Using the logistic regression model h_θ(x) = 1 / (1 + e^(−θ^T x)) results in a non-convex optimization
Deriving the Cost Function via Maximum Likelihood Estimation
• The likelihood of the data is given by:
  l(θ) = Π_{i=1}^n p(y^(i) | x^(i); θ)
• So, we are looking for the θ that maximizes the likelihood:
  θ_MLE = argmax_θ l(θ) = argmax_θ Π_{i=1}^n p(y^(i) | x^(i); θ)
• Can take the log without changing the solution:
  θ_MLE = argmax_θ log Π_{i=1}^n p(y^(i) | x^(i); θ)
        = argmax_θ Σ_{i=1}^n log p(y^(i) | x^(i); θ)
Deriving the Cost Function via Maximum Likelihood Estimation
• Expand as follows, using the fact that y^(i) ∈ {0, 1}, so that
  p(y^(i) | x^(i); θ) = p(y^(i)=1 | x^(i); θ)^{y^(i)} (1 − p(y^(i)=1 | x^(i); θ))^{1−y^(i)}:
  θ_MLE = argmax_θ Σ_{i=1}^n log p(y^(i) | x^(i); θ)
        = argmax_θ Σ_{i=1}^n [ y^(i) log p(y^(i)=1 | x^(i); θ) + (1 − y^(i)) log(1 − p(y^(i)=1 | x^(i); θ)) ]
• Substitute in the model, and take the negative to yield the logistic regression objective: min_θ J(θ), where
  J(θ) = −Σ_{i=1}^n [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
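A direct NumPy transcription of this objective (a sketch; the epsilon clipping guards against log(0) in floating point, which the math above does not need):

```python
import numpy as np

def cross_entropy_loss(theta, X, y, eps=1e-12):
    """J(theta): negative log-likelihood for logistic regression.

    X: (n, d+1) design matrix with a leading column of ones; y: (n,) in {0, 1}.
    """
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(x^(i)) for all i
    h = np.clip(h, eps, 1.0 - eps)         # keep log() finite
    return -np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
```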
Intuition Behind the Objective
  J(θ) = −Σ_{i=1}^n [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
• Cost of a single instance:
  cost(h_θ(x), y) = −log(h_θ(x))       if y = 1
                    −log(1 − h_θ(x))   if y = 0
• Can rewrite the objective function as
  J(θ) = Σ_{i=1}^n cost(h_θ(x^(i)), y^(i))
Compare to linear regression:
  J(θ) = (1 / 2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))²
Intuition Behind the Objective
  cost(h_θ(x), y) = −log(h_θ(x))       if y = 1
                    −log(1 − h_θ(x))   if y = 0
Aside: recall the plot of log(z)
[Figure: plot of log(z)]
IntuitionBehindtheObjective
Ify =1• Cost=0ifpredictioniscorrect• As
• Capturesintuitionthatlargermistakesshouldgetlargerpenalties– e.g.,predict,buty =1
14
cost (h✓(x), y) =
⇢� log(h✓(x)) if y = 1
� log(1� h✓(x)) if y = 0
h✓(x) ! 0, cost ! 1
h✓(x) = 0
BasedonexamplebyAndrewNg
Ify =1
10
cost
h✓(x) = 0
Intuition Behind the Objective
  cost(h_θ(x), y) = −log(h_θ(x))       if y = 1
                    −log(1 − h_θ(x))   if y = 0
If y = 0:
[Figure: cost vs. h_θ(x) for y = 0 and y = 1]
• Cost = 0 if the prediction is correct (h_θ(x) = 0)
• As (1 − h_θ(x)) → 0, cost → ∞
• Captures the intuition that larger mistakes should get larger penalties
Based on example by Andrew Ng
Regularized Logistic Regression
• We can regularize logistic regression exactly as before:
  J(θ) = −Σ_{i=1}^n [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
  J_regularized(θ) = J(θ) + λ Σ_{j=1}^d θ_j² = J(θ) + λ ‖θ_[1:d]‖₂²
  (note that the intercept θ_0 is not penalized: the sum starts at j = 1)
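Extending the hypothetical cross_entropy_loss sketch above with the penalty term (lam plays the role of λ; theta[0] is the unpenalized intercept):

```python
import numpy as np

def regularized_loss(theta, X, y, lam):
    """J_regularized(theta) = J(theta) + lambda * ||theta[1:d]||^2."""
    return cross_entropy_loss(theta, X, y) + lam * np.sum(theta[1:] ** 2)
```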
Gradient Descent for Logistic Regression
Want: min_θ J(θ)
  J_reg(θ) = −Σ_{i=1}^n [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + λ ‖θ_[1:d]‖₂²
• Initialize θ
• Repeat until convergence (simultaneous update for j = 0 … d):
  θ_j ← θ_j − α ∂J(θ)/∂θ_j
Side note: use the natural logarithm (ln = log_e) so that the log cancels with the exp() in h_θ(x) = 1 / (1 + e^(−θ^T x))
Working out the derivatives gives:
  θ_0 ← θ_0 − α Σ_{i=1}^n (h_θ(x^(i)) − y^(i))
  θ_j ← θ_j − α [ Σ_{i=1}^n (h_θ(x^(i)) − y^(i)) x_j^(i) + λ θ_j ]   for j = 1 … d
Gradient Descent for Logistic Regression
Want: min_θ J(θ)
  J_reg(θ) = −Σ_{i=1}^n [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + λ ‖θ_[1:d]‖₂²
• Initialize θ
• Repeat until convergence (simultaneous update for j = 0 … d):
  θ_0 ← θ_0 − α Σ_{i=1}^n (h_θ(x^(i)) − y^(i))
  θ_j ← θ_j − α [ Σ_{i=1}^n (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/n) θ_j ]   for j = 1 … d
Gradient Descent for Logistic Regression
• Initialize θ
• Repeat until convergence (simultaneous update for j = 0 … d):
  θ_j ← θ_j − α [ Σ_{i=1}^n (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/n) θ_j ]
This looks IDENTICAL to gradient descent for linear regression!
• Ignoring the 1/n constant
• However, the form of the model is very different:
  h_θ(x) = 1 / (1 + e^(−θ^T x))
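Putting the pieces together, a sketch of batch gradient descent for regularized logistic regression (names, default hyperparameters, and the convergence test are illustrative; folding the 1/n into λ, the penalty gradient is written λθ_j, and θ_0 is left unpenalized, matching the updates above):

```python
import numpy as np

def train_logistic_regression(X, y, alpha=0.01, lam=0.1, max_iters=10000, tol=1e-6):
    """Fit theta by batch gradient descent on the regularized objective.

    X: (n, d+1) design matrix with a leading column of ones; y: (n,) in {0, 1}.
    """
    theta = np.zeros(X.shape[1])                  # initialize theta
    for _ in range(max_iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))      # h_theta(x^(i)) for all i
        grad = X.T @ (h - y)                      # sum_i (h - y) x^(i)
        grad[1:] += lam * theta[1:]               # penalize all but theta_0
        new_theta = theta - alpha * grad          # simultaneous update, j = 0..d
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta                      # converged
        theta = new_theta
    return theta
```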
Multi-Class Classification
Binary classification vs. multi-class classification:
[Figure: scatter plots in (x_1, x_2) of two-class data vs. multi-class data]
Disease diagnosis: healthy / cold / flu / pneumonia
Object classification: desk / chair / monitor / bookcase
Multi-Class Logistic Regression
• For 2 classes:
  h_θ(x) = 1 / (1 + exp(−θ^T x)) = exp(θ^T x) / (1 + exp(θ^T x))
  In the denominator, the 1 is the weight assigned to y = 0 and exp(θ^T x) is the weight assigned to y = 1
• For C classes {1, …, C}:
  p(y = c | x; θ_1, …, θ_C) = exp(θ_c^T x) / Σ_{c′=1}^C exp(θ_{c′}^T x)
  – Called the softmax function
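A minimal sketch of the softmax model (stacking the per-class parameter vectors as rows of a matrix Theta is an illustrative convention, not from the slides):

```python
import numpy as np

def softmax_probs(Theta, x):
    """p(y = c | x; theta_1, ..., theta_C) for every class c.

    Theta: (C, d+1) matrix whose rows are theta_1, ..., theta_C.
    """
    scores = Theta @ x                        # theta_c^T x for each class
    scores = scores - np.max(scores)          # stabilize exp(); cancels in the ratio
    exp_scores = np.exp(scores)
    return exp_scores / np.sum(exp_scores)    # normalize to sum to 1
```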
Multi-Class Logistic Regression
Split into One vs. Rest:
[Figure: multi-class data in (x_1, x_2) split into per-class binary problems]
• Train a logistic regression classifier for each class c to predict the probability that y = c with
  h_c(x) = exp(θ_c^T x) / Σ_{c′=1}^C exp(θ_{c′}^T x)
Implementing Multi-Class Logistic Regression
• Use h_c(x) = exp(θ_c^T x) / Σ_{c′=1}^C exp(θ_{c′}^T x) as the model for class c
• Gradient descent simultaneously updates all parameters for all C models
  – Same derivative as before, just with the above h_c(x)
• Predict the class label as the most probable label: ŷ = argmax_c h_c(x)
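A sketch of this prediction rule; because softmax is monotone in the scores θ_c^T x, the most probable class is simply the one with the largest score:

```python
import numpy as np

def predict_class(Theta, x):
    """Return argmax_c h_c(x) for a (C, d+1) parameter matrix Theta."""
    return int(np.argmax(Theta @ x))   # argmax of softmax = argmax of scores
```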