
Logistic Regression & Neural Networks
CMSC723 / LING723 / INST725

Marine Carpuat

Slides credit: Graham Neubig, Jacob Eisenstein

Logistic Regression

Perceptron & Probabilities

• What if we want a probability p(y|x)?

• The perceptron gives us a prediction y
• Let's illustrate this with binary classification

Illustrations: Graham Neubig

The logistic function

• "Softer" function than in the perceptron

• Can account for uncertainty

• Differentiable
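For reference, the logistic (sigmoid) function, written here to match the formula used later in these slides:

$$P(y = 1 \mid x) = \frac{e^{\mathbf{w}\cdot\phi(x)}}{1 + e^{\mathbf{w}\cdot\phi(x)}} = \frac{1}{1 + e^{-\mathbf{w}\cdot\phi(x)}}$$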

Logistic regression: how to train?

• Train based on conditional likelihood
• Find parameters w that maximize the conditional likelihood of all answers $y_i$ given examples $x_i$ (written out below)
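The training objective, written out as maximizing the (log) conditional likelihood of the training data:

$$\hat{\mathbf{w}} = \arg\max_{\mathbf{w}} \prod_i P(y_i \mid x_i; \mathbf{w}) = \arg\max_{\mathbf{w}} \sum_i \log P(y_i \mid x_i; \mathbf{w})$$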

Stochastic gradient ascent (or descent)
• Online training algorithm for logistic regression

• and other probabilistic models

• Update weights for every training example
• Move in direction given by gradient
• Size of update step scaled by learning rate

Gradient of the logistic function
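The key result, derived in full in the "Gradient of the Sigmoid Function" slide later in the deck:

$$\frac{d}{d\mathbf{w}} P(y = 1 \mid x) = \phi(x)\,\frac{e^{\mathbf{w}\cdot\phi(x)}}{\left(1 + e^{\mathbf{w}\cdot\phi(x)}\right)^{2}}$$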

Example: Person/not-person classification problem
Given an introductory sentence in Wikipedia, predict whether the article is about a person

Example: initial update

Example: second update

How to set the learning rate?

• Various strategies
• decay over time:

$$\alpha = \frac{1}{C + t}$$

where $C$ is a parameter and $t$ is the number of samples seen so far

• Use a held-out test set, increase the learning rate when likelihood increases
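For example, with an assumed setting of C = 1, this schedule gives α = 1, 1/2, 1/3, … after t = 0, 1, 2, … updates, so early examples move the weights much more than later ones.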

Multiclass version

Some models are better than others…
• Consider these 2 examples

• Which of the 2 models below is better?

Classifier 2 will probably generalize better! It does not include irrelevant information => Smaller model is better

Regularization

• A penalty on adding extra weights

• L2 regularization: penalty on $w^2$ for each weight
  • big penalty on large weights
  • small penalty on small weights

• L1 regularization: penalty on $|w|$ for each weight
  • Uniform increase when large or small
  • Will cause many weights to become zero
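One common way to write the resulting objective (a sketch, not taken from the slides; λ is a regularization-strength hyperparameter):

$$\hat{\mathbf{w}} = \arg\max_{\mathbf{w}} \Big( \sum_i \log P(y_i \mid x_i; \mathbf{w}) - \lambda \|\mathbf{w}\|_2^2 \Big) \;\text{(L2)}, \qquad \hat{\mathbf{w}} = \arg\max_{\mathbf{w}} \Big( \sum_i \log P(y_i \mid x_i; \mathbf{w}) - \lambda \|\mathbf{w}\|_1 \Big) \;\text{(L1)}$$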

L1 regularization in online learning

What you should know

• Standard supervised learning set-up for text classification
• Difference between train vs. test data
• How to evaluate

• 3 examples of supervised linear classifiers
  • Naïve Bayes, Perceptron, Logistic Regression
• Learning as optimization: what is the objective function optimized?
• Difference between generative vs. discriminative classifiers
• Smoothing, regularization
• Overfitting, underfitting

Neural networks

Person/not-person classification problem
Given an introductory sentence in Wikipedia, predict whether the article is about a person

Formalizing binary prediction

The Perceptron: a "machine" to calculate a weighted sum

$$y = \operatorname{sign}\Big( \sum_{i=1}^{I} w_i \cdot \phi_i(x) \Big)$$

Example feature values:
φ("A") = 1, φ("site") = 1, φ(",") = 2, φ("located") = 1, φ("in") = 1, φ("Maizuru") = 1, φ("Kyoto") = 1, φ("priest") = 0, φ("black") = 0

[Figure: perceptron diagram multiplying each feature value by its weight, summing the results with a bias, and applying the sign function to get the prediction]
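A minimal sketch of this weighted-sum prediction in Python; the feature values follow the Wikipedia example above, but the weights are made-up placeholders, not the ones shown on the slide:

# Illustrative perceptron prediction: y = sign(sum_i w_i * phi_i(x))
phi = {"A": 1, "site": 1, ",": 2, "located": 1, "in": 1,
       "Maizuru": 1, "Kyoto": 1, "priest": 0, "black": 0}
w = {"Kyoto": 2, "site": -3}          # hypothetical weights; missing features count as 0

score = sum(w.get(f, 0) * v for f, v in phi.items())   # weighted sum of feature values
y = 1 if score >= 0 else -1                             # sign of the weighted sum
print(score, y)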

The Perceptron: Geometric interpretation

[Figure: X and O examples in a 2D feature space, separated by the perceptron's linear decision boundary]

Limitation of the perceptron
● Can only find linear separations between positive and negative examples

[Figure: XOR-style arrangement of X and O examples that no single line can separate]

Neural Networks
● Connect together multiple perceptrons

[Figure: the Wikipedia feature vector fed into several perceptrons whose outputs feed a final perceptron]

● Motivation: Can represent non-linear functions!

Neural Networks: key terms

[Figure: multi-layer perceptron over the Wikipedia feature vector]

• Input (aka features)
• Output
• Nodes
• Layers
• Hidden layers
• Activation function (non-linear)

• Multi-layer perceptron

Example
● Create two classifiers

[Figure: XOR-style arrangement of the four points]
φ0(x1) = {-1, 1}    φ0(x2) = {1, 1}
φ0(x3) = {-1, -1}   φ0(x4) = {1, -1}

Classifier 1: φ1[0] = sign(w0,0 · φ0 + b0,0), with weights w0,0 = {1, 1} and bias b0,0 = -1
Classifier 2: φ1[1] = sign(w0,1 · φ0 + b0,1), with weights w0,1 = {-1, -1} and bias b0,1 = -1

Example
● These classifiers map to a new space

Original space:
φ0(x1) = {-1, 1}    φ0(x2) = {1, 1}
φ0(x3) = {-1, -1}   φ0(x4) = {1, -1}

New space (φ1[0], φ1[1]):
φ1(x1) = {-1, -1}
φ1(x2) = {1, -1}
φ1(x3) = {-1, 1}
φ1(x4) = {-1, -1}

Example
● In the new space, the examples are linearly separable!

[Figure: the transformed points φ1(x1) = {-1, -1}, φ1(x2) = {1, -1}, φ1(x3) = {-1, 1}, φ1(x4) = {-1, -1}, with a line separating the two classes]

A final unit with weights {1, 1} and bias 1 computes the output: φ2[0] = y

Example wrap-up: Forward propagation
● The final net

[Figure: two tanh hidden units over inputs φ0[0], φ0[1] and a bias input of 1, with weights {1, 1, -1} and {-1, -1, -1}, producing φ1[0] and φ1[1]; a final tanh unit with weights {1, 1, 1} over φ1[0], φ1[1], and a bias input of 1 produces the output φ2[0]]
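A minimal numpy sketch of forward propagation through this final net, using the hidden-layer and output weights shown in the example (tanh activations; the variable names are my own):

import numpy as np

# Weights and biases from the example network
W1 = np.array([[1.0, 1.0], [-1.0, -1.0]])   # hidden weights, one row per hidden unit
b1 = np.array([-1.0, -1.0])                 # hidden biases
w2 = np.array([1.0, 1.0])                   # output weights
b2 = 1.0                                    # output bias

def forward(phi0):
    phi1 = np.tanh(W1 @ phi0 + b1)          # hidden layer: phi1[0], phi1[1]
    phi2 = np.tanh(w2 @ phi1 + b2)          # output: phi2[0] = y
    return phi1, phi2

# The four XOR-style points from the example
for phi0 in [(-1, 1), (1, 1), (-1, -1), (1, -1)]:
    phi1, y = forward(np.array(phi0, dtype=float))
    print(phi0, "->", np.round(phi1, 2), "-> y =", round(float(y), 2))
# The points {1, 1} and {-1, -1} get positive y while {-1, 1} and {1, -1} get
# negative y, so the mapped examples are linearly separable as described above.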

Softmax Function for multiclass classification

● Sigmoid function for multiple classes

$$P(y \mid x) = \frac{e^{\mathbf{w}\cdot\phi(x,\,y)}}{\sum_{\tilde{y}} e^{\mathbf{w}\cdot\phi(x,\,\tilde{y})}}$$

(numerator: score of the current class; denominator: sum over all classes)

● Can be expressed using matrix/vector ops:

$$\mathbf{r} = \exp\big(\mathbf{W} \cdot \phi(x)\big), \qquad \mathbf{p} = \frac{\mathbf{r}}{\sum_{\tilde{r} \in \mathbf{r}} \tilde{r}}$$
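A minimal numpy sketch of the matrix/vector form; W and phi_x below are illustrative placeholders, not values from the slides:

import numpy as np

def softmax_probs(W, phi_x):
    # r = exp(W · phi(x)): one unnormalized score per class
    # (subtracting the max score before exp would improve numerical stability)
    r = np.exp(W @ phi_x)
    return r / r.sum()          # p = r divided by the sum of its entries

# Illustrative values: 3 classes, 4 features
W = np.array([[ 0.5, -1.0, 0.0,  2.0],
              [ 1.0,  0.0, 0.3, -0.5],
              [-0.2,  0.4, 1.0,  0.0]])
phi_x = np.array([1.0, 0.0, 2.0, 1.0])
p = softmax_probs(W, phi_x)
print(p, p.sum())               # class probabilities; the sum is 1.0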

Stochastic Gradient Descent
Online training algorithm for probabilistic models

w = 0
for I iterations:
    for each labeled pair (x, y) in the data:
        w += α * dP(y|x)/dw

In other words:
• For every training example, calculate the gradient (the direction that will increase the probability of y)
• Move in that direction, multiplied by learning rate α

Gradient of the Sigmoid Function
Take the derivative of the probability

$$\frac{d}{d\mathbf{w}} P(y = 1 \mid x) = \frac{d}{d\mathbf{w}} \frac{e^{\mathbf{w}\cdot\phi(x)}}{1 + e^{\mathbf{w}\cdot\phi(x)}} = \phi(x)\,\frac{e^{\mathbf{w}\cdot\phi(x)}}{\left(1 + e^{\mathbf{w}\cdot\phi(x)}\right)^{2}}$$

$$\frac{d}{d\mathbf{w}} P(y = -1 \mid x) = \frac{d}{d\mathbf{w}} \left( 1 - \frac{e^{\mathbf{w}\cdot\phi(x)}}{1 + e^{\mathbf{w}\cdot\phi(x)}} \right) = -\phi(x)\,\frac{e^{\mathbf{w}\cdot\phi(x)}}{\left(1 + e^{\mathbf{w}\cdot\phi(x)}\right)^{2}}$$
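Putting the update rule and this gradient together, a minimal numpy sketch of stochastic gradient ascent for binary logistic regression; the synthetic data and hyperparameters are illustrative, not from the slides:

import numpy as np

def p_y1(w, phi_x):
    # P(y = 1 | x) = e^{w·phi(x)} / (1 + e^{w·phi(x)})
    return 1.0 / (1.0 + np.exp(-np.dot(w, phi_x)))

def gradient(w, phi_x, y):
    # dP(y|x)/dw, matching the two cases derived above
    p1 = p_y1(w, phi_x)
    g = phi_x * p1 * (1.0 - p1)   # phi(x) * e^{w·phi}/(1 + e^{w·phi})^2
    return g if y == 1 else -g

def train(data, iterations=10, alpha=0.1):
    # w += alpha * dP(y|x)/dw for every training example
    w = np.zeros(len(data[0][0]))
    for _ in range(iterations):
        for phi_x, y in data:
            w += alpha * gradient(w, phi_x, y)
    return w

# Tiny synthetic dataset: label +1 iff the first feature is positive
data = [(np.array([ 1.0,  0.5]),  1), (np.array([ 2.0, -1.0]),  1),
        (np.array([-1.0,  0.5]), -1), (np.array([-2.0, -1.0]), -1)]
w = train(data)
print(w, [round(p_y1(w, x), 2) for x, _ in data])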

Learning: We Don't Know the Derivative for Hidden Units!

For NNs, we only know the correct tag for the last layer

[Figure: network mapping input φ(x) through hidden units h(x), computed with weights w1, w2, w3, to the output y = 1, computed from h(x) with weights w4]

$$\frac{dP(y = 1 \mid \mathbf{x})}{d\mathbf{w_4}} = \mathbf{h}(x)\,\frac{e^{\mathbf{w_4}\cdot\mathbf{h}(x)}}{\left(1 + e^{\mathbf{w_4}\cdot\mathbf{h}(x)}\right)^{2}}$$

$$\frac{dP(y = 1 \mid \mathbf{x})}{d\mathbf{w_1}} = ? \qquad \frac{dP(y = 1 \mid \mathbf{x})}{d\mathbf{w_2}} = ? \qquad \frac{dP(y = 1 \mid \mathbf{x})}{d\mathbf{w_3}} = ?$$

Answer: Back-Propagation
Calculate the derivative with the chain rule

$$\frac{dP(y = 1 \mid x)}{d\mathbf{w_1}} = \underbrace{\frac{dP(y = 1 \mid x)}{d\big(\mathbf{w_4}\cdot\mathbf{h}(\mathbf{x})\big)}}_{\text{Error of next unit } (\delta_4)} \; \underbrace{\frac{d\big(\mathbf{w_4}\cdot\mathbf{h}(\mathbf{x})\big)}{dh_1(\mathbf{x})}}_{\text{Weight}} \; \underbrace{\frac{dh_1(\mathbf{x})}{d\mathbf{w_1}}}_{\text{Gradient of this unit}} = \frac{e^{\mathbf{w_4}\cdot\mathbf{h}(x)}}{\left(1 + e^{\mathbf{w_4}\cdot\mathbf{h}(x)}\right)^{2}} \; w_{4,1} \; \frac{dh_1(\mathbf{x})}{d\mathbf{w_1}}$$

In general, calculate δ_i based on the next units j:

$$\frac{dP(y = 1 \mid \mathbf{x})}{d\mathbf{w_i}} = \frac{dh_i(\mathbf{x})}{d\mathbf{w_i}} \sum_j \delta_j \, w_{i,j}$$

Backpropagation =
Gradient descent +
Chain rule
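A minimal numpy sketch of this chain-rule computation for a network with one tanh hidden layer and a sigmoid output; all variable names and values are illustrative, and the last lines check the analytic gradient against a numerical estimate:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W1, w4, phi_x):
    h = np.tanh(W1 @ phi_x)          # hidden activations h(x)
    p = sigmoid(w4 @ h)              # P(y = 1 | x)
    return h, p

def backward(W1, w4, phi_x):
    # Gradients of P(y = 1 | x) w.r.t. w4 and W1 via the chain rule
    h, p = forward(W1, w4, phi_x)
    dp_dz = p * (1 - p)              # sigmoid derivative = e^z / (1 + e^z)^2
    grad_w4 = dp_dz * h              # dP/dw4 = h(x) * e^{w4·h}/(1 + e^{w4·h})^2
    delta = dp_dz * w4               # error passed back to each hidden unit
    dh_dz = 1 - h ** 2               # tanh derivative for each hidden unit
    grad_W1 = (delta * dh_dz)[:, None] * phi_x[None, :]   # dP/dW1
    return grad_w4, grad_W1

# Numerical check of one entry of the W1 gradient (illustrative random values)
rng = np.random.default_rng(0)
W1, w4, phi_x = rng.normal(size=(2, 3)), rng.normal(size=2), rng.normal(size=3)
grad_w4, grad_W1 = backward(W1, w4, phi_x)
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
num = (forward(W1p, w4, phi_x)[1] - forward(W1, w4, phi_x)[1]) / eps
print(np.isclose(grad_W1[0, 0], num, atol=1e-4))   # expected: True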

Feed-Forward Neural Nets
All connections point forward

[Figure: network from φ(x) to y in which every edge points toward the output]

It is a directed acyclic graph (DAG)

NeuralNetworks

• Non-linearclassification

• Prediction:forwardpropagation• Vector/matrixoperations+non-linearities

• Training:backpropagation+stochasticgradientdescent

Formoredetails,seeCIMLChap7
