logistic regression - svivek · logistic regression is the discriminative version. this lecture...

63
Machine Learning Logistic Regression 1

Upload: others

Post on 30-Jul-2020

30 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

MachineLearning

LogisticRegression

1

Page 2: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Wherearewe?

Wehaveseenthefollowingideas– Linearmodels– Learningaslossminimization– Bayesianlearningcriteria(MAPandMLEestimation)– TheNaïveBayesclassifier

2

Page 3: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thislecture

• Logisticregression

• ConnectiontoNaïveBayes

• Trainingalogisticregressionclassifier

• Backtolossminimization

3

Page 4: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thislecture

• Logisticregression

• ConnectiontoNaïveBayes

• Trainingalogisticregressionclassifier

• Backtolossminimization

4

Page 5: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

LogisticRegression:Setup

• Thesetting– Binaryclassification– Inputs:Featurevectorsx 2 <d

– Labels:y 2 {-1, +1}

• Trainingdata– S={(xi, yi)},mexamples

5

Page 6: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Classification,but…

Theoutputy isdiscretevalued(-1 or 1)

Insteadofpredictingtheoutput,letustrytopredictP(y=1 |x)

Expandhypothesisspacetofunctionswhoseoutputis[0-1]• Originalproblem:<d ! {-1, 1}• Modifiedproblem:<d ! [0-1]• Effectivelymaketheproblemaregressionproblem

Manyhypothesisspacespossible

6

Page 7: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Classification,but…

Theoutputy isdiscretevalued(-1 or 1)

Insteadofpredictingtheoutput,letustrytopredictP(y=1 |x)

Expandhypothesisspacetofunctionswhoseoutputis[0-1]• Originalproblem:<d ! {-1, 1}• Modifiedproblem:<d ! [0-1]• Effectivelymaketheproblemaregressionproblem

Manyhypothesisspacespossible

7

Page 8: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

TheSigmoidfunction

Thehypothesisspaceforlogisticregression:Allfunctionsoftheform

Thatis,alinearfunction,composedwithasigmoidfunction(thelogisticfunction) ¾

Whatisthedomainandtherangeofthesigmoidfunction?

Thisisareasonablechoice.Wewillseewhylater

8

Page 9: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

TheSigmoidfunction

Thehypothesisspaceforlogisticregression:Allfunctionsoftheform

Thatis,alinearfunction,composedwithasigmoidfunction(thelogisticfunction) ¾

Thisisareasonablechoice.Wewillseewhylater

9

Page 10: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

TheSigmoidfunction

Thehypothesisspaceforlogisticregression:Allfunctionsoftheform

Thatis,alinearfunction,composedwithasigmoidfunction(thelogisticfunction) ¾

Whatisthedomainandtherangeofthesigmoidfunction?

Thisisareasonablechoice.Wewillseewhylater

10

Page 11: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

TheSigmoidfunction

¾(z)

z

11

Page 12: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

TheSigmoidfunction

12

Whatisitsderivativewithrespecttoz?

Page 13: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

TheSigmoidfunction

13

Whatisitsderivativewithrespecttoz?

Page 14: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Predictingprobabilities

Accordingtothelogisticregressionmodel,wehave

14

Page 15: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Predictingprobabilities

Accordingtothelogisticregressionmodel,wehave

15

Page 16: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Predictingprobabilities

Accordingtothelogisticregressionmodel,wehave

16

Page 17: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Predictingprobabilities

Accordingtothelogisticregressionmodel,wehave

Orequivalently

17

Page 18: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Predictingprobabilities

Accordingtothelogisticregressionmodel,wehave

Orequivalently

18

Notethatwearedirectlymodeling𝑃(𝑦|𝑥) ratherthan𝑃(𝑥|𝑦)and𝑃(𝑦)

Page 19: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Predictingalabelwithlogisticregression

• ComputeP(y=1|x;w)

• Ifthisisgreaterthanhalf,predict1elsepredict-1– WhatdoesthiscorrespondtointermsofwTx?

19

Page 20: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Predictingalabelwithlogisticregression

• ComputeP(y=1|x;w)

• Ifthisisgreaterthanhalf,predict1elsepredict-1– WhatdoesthiscorrespondtointermsofwTx?

– Prediction=sgn(wTx)

20

Page 21: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thislecture

• Logisticregression

• ConnectiontoNaïveBayes

• Trainingalogisticregressionclassifier

• Backtolossminimization

21

Page 22: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

NaïveBayesandLogisticregression

RememberthatthenaïveBayesdecisionisalinearfunction

Here,theP’srepresenttheNaïveBayesposteriordistribution,andwcanbeusedtocalculatethepriorsandthelikelihoods.

Thatis,𝑃(𝑦 = 1|𝐰, 𝐱)iscomputedusing𝑃(𝐱|𝑦 = 1,𝐰)and𝑃(𝑦 = 1|𝐰)

22

log𝑃(𝑦 = −1|𝐱,𝐰)𝑃(𝑦 = +1|𝐱,𝐰) = 𝐰2𝐱

Page 23: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

NaïveBayesandLogisticregression

RememberthatthenaïveBayesdecisionisalinearfunction

Butwealsoknowthat𝑃 𝑦 = +1 𝐱,𝐰 = 1 − 𝑃(𝑦 = −1|𝐱,𝐰)

23

log𝑃(𝑦 = −1|𝐱,𝐰)𝑃(𝑦 = +1|𝐱,𝐰) = 𝐰2𝐱

Page 24: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

NaïveBayesandLogisticregression

RememberthatthenaïveBayesdecisionisalinearfunction

Butwealsoknowthat𝑃 𝑦 = +1 𝐱,𝐰 = 1 − 𝑃(𝑦 = −1|𝐱,𝐰)

Substitutingintheaboveexpression,weget

24

log𝑃(𝑦 = −1|𝐱,𝐰)𝑃(𝑦 = +1|𝐱,𝐰) = 𝐰2𝐱

𝑃 𝑦 = +1 𝐰, 𝐱 = 𝜎 𝐰2𝐱 =1

1 + exp(−𝐰2𝐱)

Page 25: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

NaïveBayesandLogisticregression

RememberthatthenaïveBayesdecisionisalinearfunction

Butwealsoknowthat𝑃 𝑦 = +1 𝐱,𝐰 = 1 − 𝑃(𝑦 = −1|𝐱,𝐰)

Substitutingintheaboveexpression,weget

25

log𝑃(𝑦 = −1|𝐱,𝐰)𝑃(𝑦 = +1|𝐱,𝐰) = 𝐰2𝐱

𝑃 𝑦 = +1 𝐰, 𝐱 = 𝜎 𝐰2𝐱 =1

1 + exp(−𝐰2𝐱)

Thatis,bothnaïveBayesandlogisticregressiontrytocomputethesameposteriordistributionovertheoutputs

NaïveBayesisagenerativemodel.

LogisticRegressionisthediscriminativeversion.

Page 26: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thislecture

• Logisticregression

• ConnectiontoNaïveBayes

• Trainingalogisticregressionclassifier– First:Maximumlikelihoodestimation– Then:Addingpriorsà MaximumaPosterioriestimation

• Backtolossminimization

26

Page 27: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Maximumlikelihoodestimation

Let’sgetbacktotheproblemoflearning

• Trainingdata– S={(xi, yi)},mexamples

• Whatwewant– Findaw suchthatP(S|w)ismaximized– Weknowthatourexamplesaredrawnindependentlyandareidenticallydistributed(i.i.d)

– Howdoweproceed?

27

Page 28: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Maximumlikelihoodestimation

28

Theusualtrick:Convertproductstosumsbytakinglog

Recallthatthisworksonlybecauselogisanincreasingfunctionandthemaximizer willnotchange

argmax𝐰

𝑃 𝑆 𝐰 = argmax𝐰

;𝑃 𝑦< 𝐱<,𝐰)=

<>?

Page 29: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Maximumlikelihoodestimation

29

Equivalenttosolving

argmax𝐰

𝑃 𝑆 𝐰 = argmax𝐰

;𝑃 𝑦< 𝐱<,𝐰)=

<>?

max𝐰

@log𝑃 𝑦< 𝐱<, 𝐰)=

<

Page 30: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Maximumlikelihoodestimation

30

But(bydefinition)weknowthat

argmax𝐰

𝑃 𝑆 𝐰 = argmax𝐰

;𝑃 𝑦< 𝐱<,𝐰)=

<>?

max𝐰

@log𝑃 𝑦< 𝐱<, 𝐰)=

<

𝑃 𝑦 𝐰, 𝐱 = 𝜎 𝑦<𝐰2𝐱< =1

1 + exp(−𝑦<𝐰2𝐱<)

Page 31: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Maximumlikelihoodestimation

31

argmax𝐰

𝑃 𝑆 𝐰 = argmax𝐰

;𝑃 𝑦< 𝐱<,𝐰)=

<>?

max𝐰

@log𝑃 𝑦< 𝐱<, 𝐰)=

<

𝑃 𝑦 𝐰, 𝐱 =1

1 + exp(−yB𝐰2𝐱<)

Equivalenttosolving

max𝐰

@−log(1 + exp(−𝑦<𝐰2𝐱<)=

<

Page 32: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Maximumlikelihoodestimation

32

argmax𝐰

𝑃 𝑆 𝐰 = argmax𝐰

;𝑃 𝑦< 𝐱<,𝐰)=

<>?

max𝐰

@log𝑃 𝑦< 𝐱<, 𝐰)=

<

𝑃 𝑦 𝐰, 𝐱 =1

1 + exp(−yB𝐰2𝐱<)

Equivalenttosolving

Thegoal:Maximumlikelihoodtrainingofadiscriminativeprobabilisticclassifierunderthelogisticmodelfortheposteriordistribution.

max𝐰

@−log(1 + exp(−𝑦<𝐰2𝐱<)=

<

Page 33: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Maximumlikelihoodestimation

33

argmax𝐰

𝑃 𝑆 𝐰 = argmax𝐰

;𝑃 𝑦< 𝐱<,𝐰)=

<>?

max𝐰

@log𝑃 𝑦< 𝐱<, 𝐰)=

<

𝑃 𝑦 𝐰, 𝐱 =1

1 + exp(−yB𝐰2𝐱<)

Equivalenttosolving

max𝐰

@−log(1 + exp(−𝑦<𝐰2𝐱<)=

<

Equivalentto:Trainingalinearclassifierbyminimizingthelogisticloss.

Thegoal:Maximumlikelihoodtrainingofadiscriminativeprobabilisticclassifierunderthelogisticmodelfortheposteriordistribution.

Page 34: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Maximumaposterioriestimation

Wecouldalsoaddapriorontheweights

Supposeeachweightintheweightvectorisdrawnindependentlyfromthenormaldistributionwithzeromeanandstandarddeviation𝜎

𝑝 𝐰 =;𝑝(𝑤<)E

F>?

=;1

𝜎 2𝜋� exp−𝑤<J

𝜎J

E

F>?

34

Page 35: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

MAPestimationforlogisticregression

35

𝑝 𝐰 =;𝑝(𝑤<)E

F>?

=;1

𝜎 2𝜋� exp−𝑤<J

𝜎J

E

F>?

Letusworkthroughthisprocedureagaintoseewhatchanges

Page 36: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

MAPestimationforlogisticregression

36

𝑝 𝐰 =;𝑝(𝑤<)E

F>?

=;1

𝜎 2𝜋� exp−𝑤<J

𝜎J

E

F>?

Letusworkthroughthisprocedureagaintoseewhatchanges

WhatisthegoalofMAPestimation?(Inmaximumlikelihood,wemaximizedthelikelihoodofthedata)

Page 37: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

MAPestimationforlogisticregression

37

𝑝 𝐰 =;𝑝(𝑤<)E

F>?

=;1

𝜎 2𝜋� exp−𝑤<J

𝜎J

E

F>?

WhatisthegoalofMAPestimation?(Inmaximumlikelihood,wemaximizedthelikelihoodofthedata)

Tomaximizetheposteriorprobabilityofthemodelgiventhedata(i.e.tofindthemostprobablemodel,giventhedata)

𝑃 𝐰 𝑆 ∝ 𝑃 𝑆 𝐰 𝑃(𝐰)

Page 38: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

MAPestimationforlogisticregression

38

Learningbysolving

𝑝 𝐰 =;𝑝(𝑤<)E

F>?

=;1

𝜎 2𝜋� exp−𝑤<J

𝜎J

E

F>?

argmax𝐰

𝑃(𝐰|𝑆) = argmax𝐰

𝑃 𝑆 𝐰 𝑃(𝐰)

Page 39: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

MAPestimationforlogisticregression

39

Learningbysolving

𝑝 𝐰 =;𝑝(𝑤<)E

F>?

=;1

𝜎 2𝜋� exp−𝑤<J

𝜎J

E

F>?

argmax𝐰

𝑃 𝑆 𝐰 𝑃(𝐰)

Takelogtosimplify

max𝐰

log 𝑃 𝑆 𝐰 + log𝑃(𝐰)

Page 40: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

MAPestimationforlogisticregression

40

Learningbysolving

𝑝 𝐰 =;𝑝(𝑤<)E

F>?

=;1

𝜎 2𝜋� exp−𝑤<J

𝜎J

E

F>?

argmax𝐰

𝑃 𝑆 𝐰 𝑃(𝐰)

Takelogtosimplify

max𝐰

log 𝑃 𝑆 𝐰 + log𝑃(𝐰)

Wehavealreadyexpandedoutthefirstterm.

@−log(1 + exp(−𝑦<𝐰2𝐱<)=

<

Page 41: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

MAPestimationforlogisticregression

41

Learningbysolving

𝑝 𝐰 =;𝑝(𝑤<)E

F>?

=;1

𝜎 2𝜋� exp−𝑤<J

𝜎J

E

F>?

argmax𝐰

𝑃 𝑆 𝐰 𝑃(𝐰)

Takelogtosimplify

max𝐰

log 𝑃 𝑆 𝐰 + log𝑃(𝐰)

@−log(1 + exp(−𝑦<𝐰2𝐱<)=

<

+@−𝑤<J

𝜎J

E

F>?

+ 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡𝑠

Expandthelogprior

Page 42: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

MAPestimationforlogisticregression

42

Learningbysolving

𝑝 𝐰 =;𝑝(𝑤<)E

F>?

=;1

𝜎 2𝜋� exp−𝑤<J

𝜎J

E

F>?

argmax𝐰

𝑃 𝑆 𝐰 𝑃(𝐰)

Takelogtosimplify

max𝐰

log 𝑃 𝑆 𝐰 + log𝑃(𝐰)

max𝐰

@−log(1 + exp(−𝑦<𝐰2𝐱<)=

<

+@−𝑤<J

𝜎J

E

F>?

+ 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡𝑠

Page 43: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

MAPestimationforlogisticregression

43

Learningbysolving

𝑝 𝐰 =;𝑝(𝑤<)E

F>?

=;1

𝜎 2𝜋� exp−𝑤<J

𝜎J

E

F>?

argmax𝐰

𝑃 𝑆 𝐰 𝑃(𝐰)

Takelogtosimplify

max𝐰

log 𝑃 𝑆 𝐰 + log𝑃(𝐰)

max𝐰

@−log(1 + exp(−𝑦<𝐰2𝐱<)=

<

−1𝜎J 𝐰

2𝐰

Page 44: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

MAPestimationforlogisticregression

44

Learningbysolving

𝑝 𝐰 =;𝑝(𝑤<)E

F>?

=;1

𝜎 2𝜋� exp−𝑤<J

𝜎J

E

F>?

argmax𝐰

𝑃 𝑆 𝐰 𝑃(𝐰)

Takelogtosimplify

max𝐰

log 𝑃 𝑆 𝐰 + log𝑃(𝐰)

max𝐰

@−log(1 + exp(−𝑦<𝐰2𝐱<)=

<

−1𝜎J 𝐰

2𝐰

Maximizinganegativefunctionisthesameasminimizingthefunction

Page 45: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Learningalogisticregressionclassifier

Learningalogisticregressionclassifierisequivalenttosolving

45

min𝐰@log(1 + exp(−𝑦<𝐰2𝐱<)=

<

+1𝜎J 𝐰

2𝐰

Page 46: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Learningalogisticregressionclassifier

Learningalogisticregressionclassifierisequivalenttosolving

46

Wherehaveweseenthisbefore?

min𝐰@log(1 + exp(−𝑦<𝐰2𝐱<)=

<

+1𝜎J 𝐰

2𝐰

Page 47: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Learningalogisticregressionclassifier

Learningalogisticregressionclassifierisequivalenttosolving

47

Wherehaveweseenthisbefore?

Thefirstquestioninthehomework:Writedownthestochasticgradientdescentalgorithmforthis?

Historically,othertrainingalgorithmsexist.Inparticular,youmightrunintoLBFGS

min𝐰@log(1 + exp(−𝑦<𝐰2𝐱<)=

<

+1𝜎J 𝐰

2𝐰

Page 48: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Logisticregressionis…

• Aclassifierthatpredictstheprobabilitythatthelabelis+1foraparticularinput

• Thediscriminativecounter-partofthenaïveBayesclassifier

• AdiscriminativeclassifierthatcanbetrainedviaMAPorMLEestimation

• Adiscriminativeclassifierthatminimizesthelogisticlossoverthetrainingset

48

Page 49: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thislecture

• Logisticregression

• ConnectiontoNaïveBayes

• Trainingalogisticregressionclassifier

• Backtolossminimization

49

Page 50: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Learningaslossminimization• Thesetup

– Examplesx drawnfromafixed,unknowndistributionD– Hiddenoracleclassifierf labelsexamples– Wewishtofindahypothesish thatmimicsf

• Theidealsituation– DefineafunctionL thatpenalizesbadhypotheses– Learning:Pickafunctionh2 Htominimizeexpectedloss

• Instead,minimizeempiricallossonthetrainingset

50

ButdistributionDisunknown

Page 51: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Empiricallossminimization

Learning=minimizeempiricallossonthetrainingset

51

Isthereaproblemhere?

Page 52: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Empiricallossminimization

Learning=minimizeempiricallossonthetrainingset

Weneedsomethingthatbiasesthelearnertowardssimplerhypotheses• Achievedusingaregularizer,whichpenalizescomplex

hypotheses

52

Isthereaproblemhere? Overfitting!

Page 53: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Regularizedlossminimization

• Learning:

• Withlinearclassifiers:

• Whatisalossfunction?– Lossfunctionsshouldpenalizemistakes– Weareminimizingaveragelossoverthetrainingdata

• Whatistheideallossfunctionforclassification?

53

(usingl2regularization)

Page 54: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

The0-1loss

Penalizeclassificationmistakesbetweentruelabelyandpredictiony’

• Forlinearclassifiers,thepredictiony’=sgn(wTx)– MistakeifywTx· 0

Minimizing0-1lossisintractable.Needsurrogates

54

Page 55: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thelossfunctionzoo

Manylossfunctionsexist– Perceptronloss

– Hingeloss(SVM)

– Exponentialloss(AdaBoost)

– Logisticloss(logisticregression)

55

Page 56: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thelossfunctionzoo

56

Page 57: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thelossfunctionzoo

57

Zero-one

Page 58: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thelossfunctionzoo

58

Hinge:SVM

Zero-one

Page 59: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thelossfunctionzoo

59

Perceptron

Hinge:SVM

Zero-one

Page 60: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thelossfunctionzoo

60

Perceptron

Hinge:SVM

Exponential:AdaBoost

Zero-one

Page 61: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thelossfunctionzoo

61

Perceptron

Hinge:SVM

Logisticregression

Exponential:AdaBoost

Zero-one

Page 62: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thelossfunctionzoo

62

Zoomedout

Page 63: Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Thelossfunctionzoo

63

Zoomedoutevenmore