
Machine Learning: Logistic Regression (svivek)

Where are we?

We have seen the following ideas:
– Linear models
– Learning as loss minimization
– Bayesian learning criteria (MAP and MLE estimation)
– The Naïve Bayes classifier

This lecture

• Logistic regression
• Connection to Naïve Bayes
• Training a logistic regression classifier
• Back to loss minimization


Logistic Regression: Setup

• The setting
  – Binary classification
  – Inputs: feature vectors x ∈ ℝᵈ
  – Labels: y ∈ {-1, +1}

• Training data
  – S = {(xᵢ, yᵢ)}, m examples
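To make this setup concrete, here is a small synthetic dataset in exactly this format (a sketch only; the dimension d, the number of examples m, and the generating process are arbitrary choices of mine, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 5, 100                  # feature dimension and number of examples (arbitrary)
true_w = rng.normal(size=d)    # a hypothetical weight vector used only to generate labels

X = rng.normal(size=(m, d))    # inputs: m feature vectors in R^d
y = np.sign(X @ true_w)        # labels in {-1, +1}
y[y == 0] = 1                  # break the (measure-zero) ties

S = list(zip(X, y))            # training data S = {(x_i, y_i)}, m examples
```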

Classification, but…

The output y is discrete valued (-1 or 1).

Instead of predicting the output, let us try to predict P(y = 1 | x).

Expand the hypothesis space to functions whose output is in [0, 1]:
• Original problem: ℝᵈ → {-1, 1}
• Modified problem: ℝᵈ → [0, 1]
• This effectively makes the problem a regression problem.

Many hypothesis spaces are possible.


The Sigmoid function

The hypothesis space for logistic regression: all functions of the form

$$\sigma(\mathbf{w}^T\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{w}^T\mathbf{x})}$$

That is, a linear function, composed with a sigmoid function (the logistic function) σ.

What is the domain and the range of the sigmoid function? (Its domain is all of ℝ; its range is the interval (0, 1).)

This is a reasonable choice. We will see why later.

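A minimal NumPy sketch of this hypothesis class (the function names `sigmoid` and `predict_proba` are mine, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """The logistic (sigmoid) function: defined for every real z, output in (0, 1).
    For very negative z, np.exp(-z) can overflow and warn; scipy.special.expit
    is a numerically safer alternative."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    """A logistic-regression hypothesis: a linear score w^T x passed through the sigmoid."""
    return sigmoid(np.dot(w, x))
```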

The Sigmoid function

[Plot: σ(z) as a function of z — an S-shaped curve increasing from 0 to 1, with σ(0) = 0.5.]

The Sigmoid function

What is its derivative with respect to z?

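For reference (the slide leaves this as a question), the derivative has the familiar closed form:

```latex
\frac{d\sigma}{dz}
  = \frac{d}{dz}\bigl(1 + e^{-z}\bigr)^{-1}
  = \frac{e^{-z}}{\bigl(1 + e^{-z}\bigr)^{2}}
  = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
```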


Predicting probabilities

According to the logistic regression model, we have

$$P(y = +1 \mid \mathbf{w}, \mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{w}^T\mathbf{x})}$$

$$P(y = -1 \mid \mathbf{w}, \mathbf{x}) = 1 - \sigma(\mathbf{w}^T\mathbf{x}) = \frac{1}{1 + \exp(\mathbf{w}^T\mathbf{x})}$$

Or equivalently, for y ∈ {-1, +1},

$$P(y \mid \mathbf{w}, \mathbf{x}) = \frac{1}{1 + \exp(-y\,\mathbf{w}^T\mathbf{x})}$$

Note that we are directly modeling P(y | x) rather than P(x | y) and P(y).
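The two cases collapse into the single "or equivalently" formula because the sigmoid satisfies 1 − σ(z) = σ(−z):

```latex
1 - \sigma(\mathbf{w}^T\mathbf{x}) = \sigma(-\mathbf{w}^T\mathbf{x})
\quad\Longrightarrow\quad
P(y \mid \mathbf{w}, \mathbf{x}) = \sigma(y\,\mathbf{w}^T\mathbf{x}) = \frac{1}{1 + \exp(-y\,\mathbf{w}^T\mathbf{x})}
\quad \text{for } y \in \{-1, +1\}.
```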


Predicting a label with logistic regression

• Compute P(y = 1 | x; w)

• If this is greater than half, predict 1, else predict -1
  – What does this correspond to in terms of wᵀx?
  – Prediction = sgn(wᵀx)
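A small NumPy sketch of both prediction rules (the function names are mine). Thresholding the probability at one half is the same as taking the sign of wᵀx because σ(0) = 0.5:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    """P(y = 1 | x; w) under the logistic model."""
    return sigmoid(np.dot(w, x))

def predict_label(w, x):
    """Predict +1 if P(y = 1 | x; w) > 0.5, else -1.
    Since sigmoid(z) > 0.5 exactly when z > 0, this is sgn(w^T x)."""
    return 1 if np.dot(w, x) > 0 else -1
```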

This lecture

• Logistic regression
• Connection to Naïve Bayes (next)
• Training a logistic regression classifier
• Back to loss minimization

Naïve Bayes and Logistic regression

Remember that the naïve Bayes decision is a linear function:

$$\log \frac{P(y = -1 \mid \mathbf{x}, \mathbf{w})}{P(y = +1 \mid \mathbf{x}, \mathbf{w})} = -\mathbf{w}^T\mathbf{x}$$

Here, the P's represent the naïve Bayes posterior distribution, and w can be used to calculate the priors and the likelihoods. That is, P(y = 1 | w, x) is computed using P(x | y = 1, w) and P(y = 1 | w).

But we also know that P(y = +1 | x, w) = 1 − P(y = −1 | x, w).

Substituting into the expression above, we get

$$P(y = +1 \mid \mathbf{w}, \mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{w}^T\mathbf{x})}$$

That is, both naïve Bayes and logistic regression try to compute the same posterior distribution over the outputs.

Naïve Bayes is a generative model. Logistic regression is the discriminative version.
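Spelling out the substitution: exponentiating the log-odds and using the fact that the two probabilities sum to one gives

```latex
\frac{P(y=-1 \mid \mathbf{x}, \mathbf{w})}{P(y=+1 \mid \mathbf{x}, \mathbf{w})} = \exp(-\mathbf{w}^T\mathbf{x})
\;\Longrightarrow\;
P(y=+1 \mid \mathbf{x}, \mathbf{w})\bigl(1 + \exp(-\mathbf{w}^T\mathbf{x})\bigr) = 1
\;\Longrightarrow\;
P(y=+1 \mid \mathbf{x}, \mathbf{w}) = \frac{1}{1 + \exp(-\mathbf{w}^T\mathbf{x})}.
```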

This lecture

• Logistic regression
• Connection to Naïve Bayes
• Training a logistic regression classifier
  – First: Maximum likelihood estimation
  – Then: Adding priors → Maximum a Posteriori estimation
• Back to loss minimization

Maximum likelihood estimation

Let's get back to the problem of learning.

• Training data
  – S = {(xᵢ, yᵢ)}, m examples

• What we want
  – Find a w such that P(S | w) is maximized
  – We know that our examples are drawn independently and are identically distributed (i.i.d.)
  – How do we proceed?

The usual trick: convert products to sums by taking the log. Recall that this works only because log is an increasing function, so the maximizer will not change:

$$\arg\max_{\mathbf{w}} P(S \mid \mathbf{w}) = \arg\max_{\mathbf{w}} \prod_{i=1}^{m} P(y_i \mid \mathbf{x}_i, \mathbf{w})$$

This is equivalent to solving

$$\max_{\mathbf{w}} \sum_{i} \log P(y_i \mid \mathbf{x}_i, \mathbf{w})$$

But (by definition) we know that

$$P(y_i \mid \mathbf{w}, \mathbf{x}_i) = \sigma(y_i \mathbf{w}^T\mathbf{x}_i) = \frac{1}{1 + \exp(-y_i \mathbf{w}^T\mathbf{x}_i)}$$

So maximum likelihood estimation is equivalent to solving

$$\max_{\mathbf{w}} \sum_{i} -\log\bigl(1 + \exp(-y_i \mathbf{w}^T\mathbf{x}_i)\bigr)$$

The goal: maximum likelihood training of a discriminative probabilistic classifier under the logistic model for the posterior distribution.

Equivalent to: training a linear classifier by minimizing the logistic loss.
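A NumPy sketch of this objective, written as the equivalent minimization of the logistic loss (the helper name, the toy data, and the use of `np.logaddexp` for numerical stability are my choices):

```python
import numpy as np

def negative_log_likelihood(w, X, y):
    """sum_i log(1 + exp(-y_i * w^T x_i)): maximizing the likelihood of the
    training set under the logistic model is the same as minimizing this."""
    margins = y * (X @ w)                       # y_i * w^T x_i for every example
    return np.sum(np.logaddexp(0.0, -margins))  # log(1 + exp(-m)) without overflow

# Tiny usage example on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.where(X[:, 0] > 0, 1, -1)                     # labels in {-1, +1}
print(negative_log_likelihood(np.zeros(5), X, y))    # = 100 * log(2) at w = 0
```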

Maximum a posteriori estimation

We could also add a prior on the weights. Suppose each weight in the weight vector is drawn independently from the normal distribution with zero mean and standard deviation σ:

$$p(\mathbf{w}) = \prod_{j=1}^{d} p(w_j) = \prod_{j=1}^{d} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{w_j^2}{\sigma^2}\right)$$

MAP estimation for logistic regression

Let us work through this procedure again to see what changes.

What is the goal of MAP estimation? (In maximum likelihood, we maximized the likelihood of the data.) Here, it is to maximize the posterior probability of the model given the data, i.e. to find the most probable model given the data:

$$P(\mathbf{w} \mid S) \propto P(S \mid \mathbf{w})\, P(\mathbf{w})$$

Learning by solving

$$\arg\max_{\mathbf{w}} P(\mathbf{w} \mid S) = \arg\max_{\mathbf{w}} P(S \mid \mathbf{w})\, P(\mathbf{w})$$

Take the log to simplify:

$$\max_{\mathbf{w}} \log P(S \mid \mathbf{w}) + \log P(\mathbf{w})$$

We have already expanded out the first term:

$$\sum_{i} -\log\bigl(1 + \exp(-y_i \mathbf{w}^T\mathbf{x}_i)\bigr)$$

Expanding the log prior gives

$$\sum_{j=1}^{d} -\frac{w_j^2}{\sigma^2} + \text{constants}$$

Putting these together, MAP learning solves

$$\max_{\mathbf{w}} \sum_{i} -\log\bigl(1 + \exp(-y_i \mathbf{w}^T\mathbf{x}_i)\bigr) - \frac{1}{\sigma^2}\mathbf{w}^T\mathbf{w}$$

Maximizing a negative function is the same as minimizing the function.


Learning a logistic regression classifier

Learning a logistic regression classifier is equivalent to solving

$$\min_{\mathbf{w}} \sum_{i} \log\bigl(1 + \exp(-y_i \mathbf{w}^T\mathbf{x}_i)\bigr) + \frac{1}{\sigma^2}\mathbf{w}^T\mathbf{w}$$

Where have we seen this before?

The first question in the homework: write down the stochastic gradient descent algorithm for this problem.

Historically, other training algorithms exist. In particular, you might run into LBFGS.
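One possible stochastic gradient descent sketch for this objective (this is my own illustration, not the course's homework solution; the step size, number of epochs, σ², and how the regularizer is split across examples are all arbitrary choices). It uses the per-example gradient −yᵢ xᵢ σ(−yᵢ wᵀxᵢ) for the logistic loss plus (2/(m σ²)) w for the regularizer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic_regression(X, y, sigma2=10.0, lr=0.1, epochs=100, seed=0):
    """Minimize  sum_i log(1 + exp(-y_i w^T x_i)) + (1/sigma2) w^T w  by SGD.
    The objective is rewritten as sum_i [loss_i + w^T w / (m * sigma2)] so that
    each per-example step also takes a small regularization step."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(m):
            margin = y[i] * np.dot(w, X[i])
            grad = -y[i] * X[i] * sigmoid(-margin) + (2.0 / (m * sigma2)) * w
            w -= lr * grad
    return w

# Usage on toy data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.where(X @ np.array([1.0, -2.0, 0.5]) > 0, 1, -1)
w_hat = sgd_logistic_regression(X, y)
print("training accuracy:", (np.sign(X @ w_hat) == y).mean())
```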

Logistic regression is…

• A classifier that predicts the probability that the label is +1 for a particular input

• The discriminative counterpart of the naïve Bayes classifier

• A discriminative classifier that can be trained via MAP or MLE estimation

• A discriminative classifier that minimizes the logistic loss over the training set

This lecture

• Logistic regression
• Connection to Naïve Bayes
• Training a logistic regression classifier
• Back to loss minimization (next)

Learning as loss minimization

• The setup
  – Examples x are drawn from a fixed, unknown distribution D
  – A hidden oracle classifier f labels the examples
  – We wish to find a hypothesis h that mimics f

• The ideal situation
  – Define a function L that penalizes bad hypotheses
  – Learning: pick a function h ∈ H to minimize the expected loss

• But the distribution D is unknown. Instead, minimize the empirical loss on the training set (both objectives are written out below).
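The two objectives referred to on this slide, written out explicitly (a standard formulation; the notation is mine, since the slide's own equations did not survive the transcript):

```latex
\text{Expected loss:}\quad \min_{h \in H}\; \mathbb{E}_{x \sim D}\bigl[L\bigl(h(x), f(x)\bigr)\bigr]
\qquad\qquad
\text{Empirical loss:}\quad \min_{h \in H}\; \frac{1}{m}\sum_{i=1}^{m} L\bigl(h(x_i), y_i\bigr)
```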


Empirical loss minimization

Learning = minimize empirical loss on the training set.

Is there a problem here? Overfitting!

We need something that biases the learner towards simpler hypotheses:
• Achieved using a regularizer, which penalizes complex hypotheses

Regularized loss minimization

• Learning: minimize a regularized loss

• With linear classifiers: the same objective, with an l2 regularizer on w (a sketch of this objective follows below)

• What is a loss function?
  – Loss functions should penalize mistakes
  – We are minimizing average loss over the training data

• What is the ideal loss function for classification?
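A standard way to write the linear-classifier version with l2 regularization (the slide's own equation is an image, so the exact constants here are my assumption):

```latex
\min_{\mathbf{w}}\; \frac{1}{m}\sum_{i=1}^{m} L\bigl(y_i,\ \mathbf{w}^T\mathbf{x}_i\bigr) \;+\; \lambda\,\mathbf{w}^T\mathbf{w}
```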

The 0-1 loss

Penalize classification mistakes between the true label y and the prediction y′.

• For linear classifiers, the prediction is y′ = sgn(wᵀx)
  – Mistake if y wᵀx ≤ 0

Minimizing the 0-1 loss is intractable. We need surrogates.

The loss function zoo

Many loss functions exist:
– Perceptron loss
– Hinge loss (SVM)
– Exponential loss (AdaBoost)
– Logistic loss (logistic regression)

The loss function zoo

[Plots: the zero-one, perceptron, hinge (SVM), exponential (AdaBoost), and logistic (logistic regression) losses drawn as functions of y wᵀx, shown first close up, then zoomed out, then zoomed out even more.]
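A small NumPy sketch that evaluates the losses in this zoo as functions of the margin y wᵀx (the exact scalings follow common textbook conventions and may differ from the plots in the slides):

```python
import numpy as np

def zero_one(m):    return (m <= 0).astype(float)     # 1 on a mistake (y w^T x <= 0), else 0
def perceptron(m):  return np.maximum(0.0, -m)        # perceptron loss
def hinge(m):       return np.maximum(0.0, 1.0 - m)   # hinge loss (SVM)
def exponential(m): return np.exp(-m)                 # exponential loss (AdaBoost)
def logistic(m):    return np.log(1.0 + np.exp(-m))   # logistic loss (logistic regression)

margins = np.linspace(-2.0, 2.0, 9)                   # values of y * w^T x
for name, loss in [("zero-one", zero_one), ("perceptron", perceptron),
                   ("hinge", hinge), ("exponential", exponential),
                   ("logistic", logistic)]:
    print(f"{name:12s}", np.round(loss(margins), 2))
```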
