Algorithms for NLP (FA17 11-711)
![Page 1: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/1.jpg)
Algorithms for NLP
Classification II
Taylor Berg-Kirkpatrick – CMU
Slides: Dan Klein – UC Berkeley
![Page 2: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/2.jpg)
Minimize Training Error?
§ A loss function declares how costly each mistake is
  § E.g. 0 loss for correct label, 1 loss for wrong label
  § Can weight mistakes differently (e.g. false positives worse than false negatives or Hamming distance over structured labels)
§ We could, in principle, minimize training loss:
§ This is a hard, discontinuous optimization problem
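The objective itself is missing from this slide. One way to write it, assuming the notation of the structured-margin slides later in the deck (w for the weights, f_i(y) for the feature vector of candidate y on example i, and ℓ_i(y) for the loss of predicting y on example i), is sketched here; the exact form on the original slide is not known:

```latex
% Training loss of the model's own predictions (assumed form)
\min_w \; \sum_i \ell_i\!\left( \arg\max_y \; w^\top f_i(y) \right)
```

The argmax inside the loss is what makes the objective piecewise constant in w, hence discontinuous.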
![Page 3: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/3.jpg)
Examples: Perceptron
§ Separable Case
![Page 4: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/4.jpg)
Examples: Perceptron
§ Non-Separable Case
![Page 5: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/5.jpg)
Objective Functions
§ What do we want from our weights?
  § Depends!
  § So far: minimize (training) errors:
  § This is the “zero-one loss”
  § Discontinuous, minimizing is NP-complete
§ Maximum entropy and SVMs have other objectives related to zero-one loss
![Page 6: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/6.jpg)
Linear Models: Maximum Entropy
§ Maximum entropy (logistic regression)
  § Use the scores as probabilities:
  § Maximize the (log) conditional likelihood of training data
Make positive
Normalize
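The two annotations on this slide (make positive, normalize) can be sketched as a small softmax over label scores. The label names and score values are hypothetical; only the two steps come from the slide:

```python
import math

def maxent_probs(scores):
    """Map a dict label -> linear score to a dict label -> probability."""
    m = max(scores.values())                                # shift for numerical stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}  # make positive
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}              # normalize

def log_likelihood(scores, gold):
    """Log conditional likelihood of the gold label for one example."""
    return math.log(maxent_probs(scores)[gold])

# Hypothetical scores w^T f(y) for three candidate labels
probs = maxent_probs({"ORG": 2.0, "LOC": 1.0, "---": 0.0})
```

Training then maximizes the sum of `log_likelihood` over the training examples.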
![Page 7: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/7.jpg)
Maximum Entropy II
§ Motivation for maximum entropy:
  § Connection to maximum entropy principle (sort of)
  § Might want to do a good job of being uncertain on noisy cases…
  § …in practice, though, posteriors are pretty peaked
§ Regularization (smoothing)
![Page 8: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/8.jpg)
Log-Loss
§ If we view maxent as a minimization problem:
§ This minimizes the “log loss” on each example
§ One view: log loss is an upper bound on zero-one loss
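A quick numerical check of the upper-bound view, as a sketch. It assumes the log loss is measured in bits (log base 2): whenever the prediction is wrong, the gold label has probability below 1/2, so its log loss is above 1. With natural logs the bound holds only up to a constant factor. The random scores are hypothetical:

```python
import math
import random

def gold_probability(scores, gold):
    """Softmax probability of the gold label index."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    return exps[gold] / sum(exps)

def zero_one_loss(scores, gold):
    predicted = max(range(len(scores)), key=lambda y: scores[y])
    return 0 if predicted == gold else 1

def log_loss_bits(scores, gold):
    return -math.log2(gold_probability(scores, gold))

# Check the bound on random score vectors over 4 labels
random.seed(0)
bound_holds = True
for _ in range(1000):
    scores = [random.uniform(-3.0, 3.0) for _ in range(4)]
    gold = random.randrange(4)
    if log_loss_bits(scores, gold) < zero_one_loss(scores, gold):
        bound_holds = False
```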
![Page 9: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/9.jpg)
Maximum Margin
§ Non-separable SVMs
  § Add slack to the constraints
  § Make objective pay (linearly) for slack:
§ C is called the capacity of the SVM – the smoothing knob
§ Learning:
  § Can still stick this into Matlab if you want
  § Constrained optimization is hard; better methods!
  § We’ll come back to this later
Note: there exist other choices of how to penalize slacks!
![Page 10: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/10.jpg)
Remember SVMs…
§ We had a constrained minimization
§ …but we can solve for ξ_i
§ Giving
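The three equations on this slide did not survive extraction. A reconstruction that agrees with the primal margin objective shown later in the deck is sketched here; the symbols w, C, f_i, ℓ_i and y_i^* follow that later slide, and the exact layout of the original is an assumption:

```latex
% Constrained minimization with one slack variable per example
\min_{w,\xi} \; \frac{1}{2}\|w\|_2^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad \forall i, y: \;\;
w^\top f_i(y_i^*) + \xi_i \;\ge\; w^\top f_i(y) + \ell_i(y)

% At the optimum each slack is as small as its constraints allow
\xi_i = \max_y \left( w^\top f_i(y) + \ell_i(y) \right) - w^\top f_i(y_i^*)

% Substituting gives an unconstrained objective
\min_w \; \frac{1}{2}\|w\|_2^2 + C \sum_i
\left( \max_y \left( w^\top f_i(y) + \ell_i(y) \right) - w^\top f_i(y_i^*) \right)
```

Because ℓ_i(y_i^*) = 0, the max is at least w^T f_i(y_i^*), so each ξ_i is non-negative without a separate constraint.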
![Page 11: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/11.jpg)
Hinge Loss
§ Consider the per-instance objective:
§ This is called the “hinge loss”
  § Unlike maxent / log loss, you stop gaining objective once the true label wins by enough
  § You can start from here and derive the SVM objective
  § Can solve directly with sub-gradient descent (e.g. Pegasos: Shalev-Shwartz et al 07)
Plot really only right in binary case
![Page 12: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/12.jpg)
Max vs “Soft-Max” Margin
§ SVMs:
§ Maxent:
§ Very similar! Both try to make the true score better than a function of the other scores
  § The SVM tries to beat the augmented runner-up
  § The Maxent classifier tries to beat the “soft-max”
You can make this zero
… but not this one
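The contrast can be checked numerically with a small sketch. It assumes the SVM term is the max of loss-augmented scores minus the true score, and the maxent term is the log-sum-exp ("soft-max") of all scores minus the true score. The labels, scores and losses are hypothetical:

```python
import math

def svm_term(scores, losses, gold):
    """Hinge loss: loss-augmented max over labels minus the gold score."""
    return max(scores[y] + losses[y] for y in scores) - scores[gold]

def maxent_term(scores, gold):
    """Log loss: log-sum-exp over all labels minus the gold score."""
    m = max(scores.values())
    lse = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return lse - scores[gold]

# Gold label "A" wins by more than the required margin of 1
losses = {"A": 0.0, "B": 1.0, "C": 1.0}
confident = {"A": 5.0, "B": 1.0, "C": 0.0}

hinge = svm_term(confident, losses, "A")      # exactly zero here
log_loss = maxent_term(confident, "A")        # small but strictly positive
```

Once the true label wins by the margin, the hinge term stops changing, while the log loss keeps rewarding a larger gap.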
![Page 13: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/13.jpg)
Loss Functions: Comparison
§ Zero-One Loss
§ Hinge
§ Log
![Page 14: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/14.jpg)
Structure
![Page 15: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/15.jpg)
Handwriting recognition
x: image of a handwritten word (figure)
y: brace
Sequential structure
[Slides: Taskar and Klein 05]
![Page 16: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/16.jpg)
CFG Parsing
x: The screen was a sea of red
y: parse tree (figure)
Recursive structure
![Page 17: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/17.jpg)
Bilingual Word Alignment
What is the anticipated cost of collecting fees under the new proposal?
En vertu de nouvelle propositions, quel est le côut prévu de perception de les droits?
x: the sentence pair; y: word-to-word alignment between the two sentences (figure)
Combinatorial structure
![Page 18: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/18.jpg)
Structured Models
Assumption:
Score is a sum of local “part” scores
Parts = nodes, edges, productions
space of feasible outputs
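For a tag sequence, the part-based assumption can be sketched as follows: the score of a whole output is the sum of node scores (word, tag) and edge scores (previous tag, tag). The score tables and their values are hypothetical:

```python
def sequence_score(words, tags, node_score, edge_score):
    """Score of a tagging as a sum of local part scores (nodes and edges)."""
    total = 0.0
    for i, (word, tag) in enumerate(zip(words, tags)):
        total += node_score.get((word, tag), 0.0)          # node part
        if i > 0:
            total += edge_score.get((tags[i - 1], tag), 0.0)  # edge part
    return total

# Hypothetical part scores
node_score = {("Apple", "ORG"): 2.0, ("Computer", "ORG"): 1.5, ("bought", "---"): 1.0}
edge_score = {("ORG", "ORG"): 0.5, ("ORG", "---"): 0.25}

total = sequence_score(["Apple", "Computer", "bought"],
                       ["ORG", "ORG", "---"], node_score, edge_score)
```

Because the score decomposes over parts, the best output can be found with dynamic programming instead of enumerating the whole space of feasible outputs.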
![Page 19: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/19.jpg)
Named Entity Recognition
Apple Computer bought Smart Systems Inc. located in Arkansas.
ORG ORG --- ORG ORG ORG --- --- LOC
![Page 20: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/20.jpg)
Bilingual word alignment
§ association
§ position
§ orthography
[Figure: alignment grid between “What is the anticipated cost of collecting fees under the new proposal ?” and “En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits?”, with word positions indexed by j and k]
![Page 21: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/21.jpg)
Efficient Decoding
§ Common case: you have a black box which computes the best-scoring y (the argmax of the model score) at least approximately, and you want to learn w
§ Easiest option is the structured perceptron [Collins 01]
  § Structure enters here in that the search for the best y is typically a combinatorial algorithm (dynamic programming, matchings, ILPs, A*…)
  § Prediction is structured, learning update is not
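A minimal sketch of the structured perceptron with the decoder treated as a black box. The brute-force `decode` over all tag sequences is a stand-in for a real combinatorial algorithm such as Viterbi; the tag set, feature templates and toy data are hypothetical:

```python
from collections import defaultdict
from itertools import product

TAGS = ("ORG", "---")  # hypothetical tag set

def features(words, tags):
    """Counts of node (word, tag) and edge (previous tag, tag) features."""
    f = defaultdict(float)
    prev = "<s>"
    for word, tag in zip(words, tags):
        f[("node", word, tag)] += 1.0
        f[("edge", prev, tag)] += 1.0
        prev = tag
    return f

def score(w, f):
    return sum(w[k] * v for k, v in f.items())

def decode(w, words):
    """Black-box argmax; brute force here, dynamic programming in practice."""
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: score(w, features(words, tags)))

def train(data, epochs=5):
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            guess = decode(w, words)              # structured prediction
            if guess != tuple(gold):              # plain additive update
                for k, v in features(words, gold).items():
                    w[k] += v
                for k, v in features(words, guess).items():
                    w[k] -= v
    return w

data = [(["Apple", "Computer", "bought"], ("ORG", "ORG", "---")),
        (["Smart", "Systems", "located"], ("ORG", "ORG", "---"))]
w = train(data)
```

Only `decode` knows about the output structure; the update is the same feature-difference step as in the multiclass perceptron.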
![Page 22: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/22.jpg)
Structured Margin (Primal)
Remember our primal margin objective?
$$\min_w \; \frac{1}{2}\|w\|_2^2 + C \sum_i \left( \max_y \left( w^\top f_i(y) + \ell_i(y) \right) - w^\top f_i(y_i^*) \right)$$
Still applies with structured output space!
![Page 23: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/23.jpg)
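Structured Margin (Primal)
Just need efficient loss-augmented decode:
$$\bar{y} = \arg\max_y \left( w^\top f_i(y) + \ell_i(y) \right)$$
$$\min_w \; \frac{1}{2}\|w\|_2^2 + C \sum_i \left( w^\top f_i(\bar{y}) + \ell_i(\bar{y}) - w^\top f_i(y_i^*) \right)$$
$$\nabla_w = w + C \sum_i \left( f_i(\bar{y}) - f_i(y_i^*) \right)$$
Still use general subgradient descent methods! (Adagrad)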
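A sketch of one subgradient step on this objective. It assumes Hamming loss, a brute-force loss-augmented decoder standing in for a real combinatorial one, and a plain fixed step size rather than Adagrad. The tag set, features and data are hypothetical:

```python
from itertools import product

TAGS = ("ORG", "---")  # hypothetical tag set

def features(words, tags):
    """Counts of node (word, tag) and edge (previous tag, tag) features."""
    feats, prev = {}, "<s>"
    for word, tag in zip(words, tags):
        for key in (("node", word, tag), ("edge", prev, tag)):
            feats[key] = feats.get(key, 0.0) + 1.0
        prev = tag
    return feats

def score(w, feats):
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def hamming(gold, tags):
    return sum(g != t for g, t in zip(gold, tags))

def loss_augmented_decode(w, words, gold):
    """argmax over y of score plus loss (brute force stand-in)."""
    return max(product(TAGS, repeat=len(words)),
               key=lambda y: score(w, features(words, y)) + hamming(gold, y))

def objective(w, data, C):
    total = 0.5 * sum(v * v for v in w.values())
    for words, gold in data:
        y_bar = loss_augmented_decode(w, words, gold)
        total += C * (score(w, features(words, y_bar)) + hamming(gold, y_bar)
                      - score(w, features(words, gold)))
    return total

def subgradient(w, data, C):
    grad = dict(w)  # gradient of the regularizer is w itself
    for words, gold in data:
        y_bar = loss_augmented_decode(w, words, gold)
        for k, v in features(words, y_bar).items():
            grad[k] = grad.get(k, 0.0) + C * v
        for k, v in features(words, gold).items():
            grad[k] = grad.get(k, 0.0) - C * v
    return grad

def step(w, data, C=1.0, rate=0.1):
    grad = subgradient(w, data, C)
    return {k: w.get(k, 0.0) - rate * g for k, g in grad.items()}

data = [(["Apple", "Computer", "bought"], ("ORG", "ORG", "---"))]
w0 = {}
w1 = step(w0, data)
```

Repeating `step` gives plain subgradient descent; an adaptive method such as Adagrad would rescale each coordinate of the same subgradient.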
![Page 24: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/24.jpg)
Structured Margin (Dual)
§ Remember the constrained version of primal:
§ Dual has a variable for every constraint here
![Page 25: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/25.jpg)
Full Margin: OCR
§ We want: [figure: the input image is decoded to “brace”]
§ Equivalently: [figure: “brace” outscores each alternative “aaaaa”, “aaaab”, …, “zzzzz”; a lot!]
![Page 26: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/26.jpg)
Parsing example
§ We want: [figure: ‘It was red’ is decoded to its correct parse tree]
§ Equivalently: [figure: the correct tree for ‘It was red’ outscores each alternative tree; a lot!]
![Page 27: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/27.jpg)
Alignment example
§ We want: [figure: ‘What is the’ / ‘Quel est le’ is decoded to the correct alignment between positions 1 2 3 and 1 2 3]
§ Equivalently: [figure: the correct alignment outscores each alternative alignment; a lot!]
![Page 28: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/28.jpg)
Cutting Plane (Dual)
§ A constraint induction method [Joachims et al 09]
  § Exploits that the number of constraints you actually need per instance is typically very small
  § Requires (loss-augmented) primal-decode only
§ Repeat:
  § Find the most violated constraint for an instance:
  § Add this constraint and re-solve the (non-structured) QP (e.g. with SMO or other QP solver)
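The first half of the loop (finding the most violated constraint and deciding whether to add it) can be sketched as follows. The QP re-solve is deliberately left out. Hamming loss, the brute-force search, the tolerance `eps` and the toy features are all assumptions for illustration:

```python
from itertools import product

TAGS = ("ORG", "---")  # hypothetical tag set

def features(words, tags):
    """Counts of node (word, tag) and edge (previous tag, tag) features."""
    feats, prev = {}, "<s>"
    for word, tag in zip(words, tags):
        for key in (("node", word, tag), ("edge", prev, tag)):
            feats[key] = feats.get(key, 0.0) + 1.0
        prev = tag
    return feats

def score(w, feats):
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def hamming(gold, tags):
    return sum(g != t for g, t in zip(gold, tags))

def violation(w, words, gold, tags):
    """Amount by which the margin constraint for `tags` is violated (before slack)."""
    return (score(w, features(words, tags)) + hamming(gold, tags)
            - score(w, features(words, gold)))

def find_new_constraint(w, words, gold, working_set, eps=1e-3):
    """Return the most violated output if it beats the current slack by eps, else None."""
    slack = max([0.0] + [violation(w, words, gold, y) for y in working_set])
    y_bar = max(product(TAGS, repeat=len(words)),
                key=lambda y: violation(w, words, gold, y))  # loss-augmented decode
    if violation(w, words, gold, y_bar) > slack + eps:
        return y_bar
    return None

words, gold = ["Apple", "Computer", "bought"], ("ORG", "ORG", "---")
working_set = set()
first = find_new_constraint({}, words, gold, working_set)
working_set.add(first)
second = find_new_constraint({}, words, gold, working_set)  # nothing new at the same w
```

In a full implementation, each added constraint would be followed by re-solving the QP restricted to the working sets, which changes w before the next search.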
![Page 29: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/29.jpg)
Cutting Plane (Dual)
§ Some issues:
  § Can easily spend too much time solving QPs
  § Doesn’t exploit shared constraint structure
  § In practice, works pretty well; fast like perceptron/MIRA, more stable, no averaging
![Page 30: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/30.jpg)
Likelihood, Structured
§ Structure needed to compute:
  § Log-normalizer
  § Expected feature counts
    § E.g. if a feature is an indicator of DT-NN then we need to compute posterior marginals P(DT-NN|sentence) for each position and sum
§ Also works with latent variables (more later)
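For a chain-structured model, both quantities can be computed with forward and backward passes. The sketch below assumes a linear chain with per-position node scores and position-independent edge scores; the score values are hypothetical, and a brute-force normalizer is included only as a cross-check:

```python
import math
from itertools import product

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward(node, edge):
    """alpha[i][t]: log-sum of scores of all prefixes ending in tag t at position i."""
    k = len(node[0])
    alpha = [list(node[0])]
    for i in range(1, len(node)):
        alpha.append([node[i][t] + logsumexp([alpha[i - 1][s] + edge[s][t]
                                              for s in range(k)]) for t in range(k)])
    return alpha

def backward(node, edge):
    """beta[i][s]: log-sum of scores of all suffixes after position i, given tag s at i."""
    n, k = len(node), len(node[0])
    beta = [[0.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        beta[i] = [logsumexp([edge[s][t] + node[i + 1][t] + beta[i + 1][t]
                              for t in range(k)]) for s in range(k)]
    return beta

def log_normalizer(node, edge):
    return logsumexp(forward(node, edge)[-1])

def expected_edge_count(node, edge, s, t):
    """Sum over positions of the posterior marginal P(tag s followed by tag t)."""
    alpha, beta = forward(node, edge), backward(node, edge)
    log_z = logsumexp(alpha[-1])
    return sum(math.exp(alpha[i][s] + edge[s][t] + node[i + 1][t]
                        + beta[i + 1][t] - log_z) for i in range(len(node) - 1))

def brute_force_log_normalizer(node, edge):
    totals = []
    for tags in product(range(len(node[0])), repeat=len(node)):
        total = sum(node[i][t] for i, t in enumerate(tags))
        total += sum(edge[a][b] for a, b in zip(tags, tags[1:]))
        totals.append(total)
    return logsumexp(totals)

# Hypothetical scores: tag 0 = DT, tag 1 = NN, three positions
node = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
edge = [[0.0, 1.0], [-0.5, 0.0]]
log_z = log_normalizer(node, edge)
dt_nn = expected_edge_count(node, edge, 0, 1)
```

The expected count of the DT-NN indicator is exactly the quantity subtracted from the observed count in the gradient of the log-likelihood.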
![Page 31: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/31.jpg)
Comparison
![Page 32: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/32.jpg)
Option 0: Reranking
x = “The screen was a sea of red.”
Input → Baseline Parser → N-Best List (e.g. n=100) → Non-Structured Classification → Output
[e.g. Charniak and Johnson 05]
![Page 33: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/33.jpg)
Reranking
§ Advantages:
  § Directly reduce to non-structured case
  § No locality restriction on features
§ Disadvantages:
  § Stuck with errors of baseline parser
  § Baseline system must produce n-best lists
  § But, feedback is possible [McClosky, Charniak, Johnson 2006]
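Reranking reduces to scoring each candidate in the n-best list with a linear model and picking the best. In this sketch the candidate names, feature names and weights are hypothetical; note that features may describe the whole parse, since no locality restriction applies:

```python
def rerank(candidates, weights):
    """candidates: list of (parse, feature dict) pairs from a baseline n-best list."""
    def score(feats):
        return sum(weights.get(k, 0.0) * v for k, v in feats.items())
    return max(candidates, key=lambda c: score(c[1]))[0]

# Hypothetical 2-best list with one global (whole-tree) feature
candidates = [
    ("parse_1", {"baseline_logprob": -10.0, "right_branching": 0.0}),
    ("parse_2", {"baseline_logprob": -10.5, "right_branching": 1.0}),
]
weights = {"baseline_logprob": 1.0, "right_branching": 2.0}

best = rerank(candidates, weights)
```

The reranker can only choose among the candidates it is given, which is the "stuck with errors of baseline parser" disadvantage.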
![Page 34: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/34.jpg)
M3Ns
§ Another option: express all constraints in a packed form
  § Maximum margin Markov networks [Taskar et al 03]
  § Integrates solution structure deeply into the problem structure
§ Steps
  § Express inference over constraints as an LP
  § Use duality to transform minimax formulation into min-min
  § Constraints factor in the dual along the same structure as the primal; alphas essentially act as a dual “distribution”
  § Various optimization possibilities in the dual
![Page 35: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/35.jpg)
Example: Kernels
§ Quadratic kernels
![Page 36: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/36.jpg)
Non-Linear Separators
§ Another view: kernels map an original feature space to some higher-dimensional feature space where the training set is (more) separable
Φ: y→ φ(y)
![Page 37: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured](https://reader034.vdocument.in/reader034/viewer/2022052616/609c0b00bf8bbe194e439252/html5/thumbnails/37.jpg)
Why Kernels?
§ Can’t you just add these features on your own (e.g. add all pairs of features instead of using the quadratic kernel)?
  § Yes, in principle, just compute them
  § No need to modify any algorithms
  § But, number of features can get large (or infinite)
  § Some kernels not as usefully thought of in their expanded representation, e.g. RBF or data-defined kernels [Henderson and Titov 05]
§ Kernels let us compute with these features implicitly
  § Example: implicit dot product in quadratic kernel takes much less space and time per dot product
  § Of course, there’s the cost for using the pure dual algorithms…
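The implicit versus explicit trade-off can be checked on a small example. The sketch assumes the quadratic kernel K(x, z) = (x·z + 1)^2, which is one common choice; the slides do not say which form they use. The input vectors are hypothetical:

```python
import math
from itertools import combinations

def quadratic_kernel(x, z):
    """Implicit: one dot product in the original space, O(d) time."""
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** 2

def expand(x):
    """Explicit feature map whose dot product equals the kernel, O(d^2) features."""
    root2 = math.sqrt(2.0)
    feats = [1.0]
    feats += [root2 * a for a in x]                          # linear terms
    feats += [a * a for a in x]                              # squares
    feats += [root2 * a * b for a, b in combinations(x, 2)]  # all pairs
    return feats

def explicit_dot(x, z):
    return sum(a * b for a, b in zip(expand(x), expand(z)))

x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]
implicit = quadratic_kernel(x, z)
explicit = explicit_dot(x, z)
```

With d = 3 the expanded vector already has 10 entries; the gap grows quadratically with d, while the implicit computation stays linear.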