Loss-augmented Structured Prediction CMSC 723 / LING 723 / INST 725 Marine Carpuat Figures, algorithms & equations from CIML chap 17

Post on 12-May-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Loss-augmented Structured Prediction

Loss-augmented Structured Prediction
CMSC 723 / LING 723 / INST 725

Marine Carpuat

Figures, algorithms & equations from CIML chap 17

Page 2: Loss-augmented Structured Prediction

POS tagging: Sequence labeling with the perceptron

Sequence labeling problem
• Input:
  • sequence of tokens x = [x1 … xL]
  • Variable length L
• Output (aka label):
  • sequence of tags y = [y1 … yL]
  • # tags = K
  • Size of output space?

Structured Perceptron
• Perceptron algorithm can be used for sequence labeling
• But there are challenges
  • How to compute argmax efficiently?
  • What are appropriate features?
• Approach: leverage structure of output space

Page 3: Loss-augmented Structured Prediction

Solving the argmax problem for sequences with dynamic programming

• Efficient algorithms possible if the feature function decomposes over the input
• This holds for unary and Markov features used for POS tagging

Page 4: Loss-augmented Structured Prediction

Feature functions for sequence labeling

• Standard features of POS tagging
  • Unary features: # times word w has been labeled with tag l, for all words w and all tags l
  • Markov features: # times tag l is adjacent to tag l’ in output, for all tags l and l’
• Size of feature representation is constant wrt input length
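The unary and Markov counts described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `features`, the tuple-shaped feature keys, and the example sentence and tags are all assumptions made here. A sparse counter stands in for the fixed-size feature vector, whose dimension depends only on the vocabulary and tag set, not on the sentence length.

```python
from collections import Counter

def features(words, tags):
    """Return sparse counts of unary and Markov features for one tagged sentence."""
    phi = Counter()
    # Unary features: how often word w is labeled with tag l.
    for word, tag in zip(words, tags):
        phi[("unary", word, tag)] += 1
    # Markov features: how often tag l is immediately followed by tag l'.
    for prev_tag, tag in zip(tags, tags[1:]):
        phi[("markov", prev_tag, tag)] += 1
    return phi

# Hypothetical example sentence and tagging (N = noun, V = verb, A = adjective).
words = ["monsters", "eat", "tasty", "bunnies"]
tags = ["N", "V", "A", "N"]
phi = features(words, tags)
print(phi[("unary", "eat", "V")])   # 1
print(phi[("markov", "N", "V")])    # 1
```

Because each feature looks at one position or one adjacent pair of positions, the score of a full tag sequence is a sum of local terms, which is what makes dynamic programming applicable.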

Page 5: Loss-augmented Structured Prediction

Solving the argmax problem for sequences

• Trellis for sequence labeling
  • Any path represents a labeling of the input sentence
  • Gold standard path in red
  • Each edge receives a weight such that adding weights along the path corresponds to the score for the input/output configuration
• Any max-weight path algorithm can find the argmax
  • e.g. Viterbi algorithm, O(LK^2)

Page 6: Loss-augmented Structured Prediction

Defining weights of edges in the trellis

• Weight of edge that goes from time l-1 to time l, and transitions from y to y’

Unary features at position l together with Markov features that end at position l

Page 7: Loss-augmented Structured Prediction

Dynamic program

• Define α_{l,k}: the score of the best possible output prefix up to and including position l that labels the l-th word with label k
• With decomposable features, alphas can be computed recursively
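The recursion over alphas can be sketched as follows. This is a minimal sketch under stated assumptions, not the lecture's own code: scores are assumed to be precomputed into two hypothetical tables, `unary[l][k]` (score of tag k at position l) and `trans[j][k]` (score of tag j followed by tag k), and tags are represented as integer indices. Each alpha is the best predecessor alpha plus the transition and unary scores, and back-pointers recover the best path.

```python
def viterbi(unary, trans):
    """Return (best_score, best_tag_indices) for a sequence of length len(unary)."""
    L, K = len(unary), len(unary[0])
    # alpha[l][k]: score of the best prefix ending at position l with tag k.
    alpha = [list(unary[0])]
    back = []  # back[l-1][k]: best previous tag when position l has tag k
    for l in range(1, L):
        row, ptr = [], []
        for k in range(K):
            best_j = max(range(K), key=lambda j: alpha[-1][j] + trans[j][k])
            row.append(alpha[-1][best_j] + trans[best_j][k] + unary[l][k])
            ptr.append(best_j)
        alpha.append(row)
        back.append(ptr)
    # Pick the best final tag, then follow back-pointers to the start.
    k = max(range(K), key=lambda j: alpha[-1][j])
    best_score = alpha[-1][k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return best_score, path

# Toy scores for L = 3 positions and K = 2 tags (made-up numbers).
unary = [[1, 0], [0, 2], [1, 0]]
trans = [[0, 1], [1, 0]]
print(viterbi(unary, trans))  # (6, [0, 1, 0])
```

The two nested loops over tags inside the loop over positions give the O(LK^2) running time quoted for the Viterbi algorithm.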

Page 8: Loss-augmented Structured Prediction
Page 9: Loss-augmented Structured Prediction

A more general approach for argmax: Integer Linear Programming

• ILP: optimization problem of the form, for a fixed vector a
• With integer constraints
• Pro: can leverage well-engineered solvers (e.g., Gurobi)
• Con: not always most efficient
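The formula on this slide did not survive extraction. As a hedged reconstruction based on the surrounding text (a fixed vector a, integer constraints) and on the standard definition of an integer linear program, the general form is presumably:

```latex
\max_{z} \; a \cdot z
\quad \text{subject to linear constraints on } z,
\quad z \text{ integer-valued}
```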

Page 10: Loss-augmented Structured Prediction

POS tagging as ILP

• Markov features as binary indicator variables
• Output sequence: y(z) obtained by reading off variables z
• Define a such that a · z is equal to score
• Enforcing constraints for well-formed solutions
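The equations for this slide are missing from the transcript. The following is a sketch of one common way to write the encoding, along the lines of CIML chapter 17; the exact indexing (z indexed by position l, previous tag k', and current tag k) is an assumption made here, not recovered from the slide.

```latex
% Binary indicators: z_{l,k',k} = 1 iff position l has tag k and position l-1 has tag k'.
\begin{align*}
  z_{l,k',k} &\in \{0, 1\} && \forall l, k', k \\
  \sum_{k} \sum_{k'} z_{l,k',k} &= 1 && \forall l
     \quad \text{(exactly one transition is active at each position)} \\
  \sum_{k'} z_{l,k',k} &= \sum_{k''} z_{l+1,k,k''} && \forall l, k
     \quad \text{(the tag entered at } l \text{ is the tag left at } l+1\text{)}
\end{align*}
```

Under these constraints every feasible z corresponds to exactly one tag sequence y(z), and choosing each entry of a to be the weight of the matching trellis edge makes a · z equal to the sequence score.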

Page 11: Loss-augmented Structured Prediction

Sequence labeling

• Structured perceptron
  • A general algorithm for structured prediction problems such as sequence labeling
• The argmax problem
  • Efficient argmax for sequences with Viterbi algorithm, given some assumptions on feature structure
  • A more general solution: Integer Linear Programming
• Loss-augmented structured prediction
  • Training algorithm
  • Loss-augmented argmax

Page 12: Loss-augmented Structured Prediction

In structured perceptron, all errors are equally bad

Page 13: Loss-augmented Structured Prediction

All bad output sequences are not equally bad

• Consider
  • ŷ1 = [A, A, A, A]
  • ŷ2 = [N, V, N, N]
• Hamming Loss
  • Gives a more nuanced evaluation of output than 0–1 loss
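A small sketch can make the contrast between the two losses concrete. The gold sequence is not given in the transcript, so `gold = [N, V, A, N]` below is a hypothetical choice; the two predictions are the ones from the slide. Hamming loss is taken here in its usual sense, the number of positions where the prediction differs from the gold sequence.

```python
def hamming_loss(gold, predicted):
    """Number of positions at which the predicted tag differs from the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(g != p for g, p in zip(gold, predicted))

def zero_one_loss(gold, predicted):
    """1 if the sequences differ anywhere, 0 if they are identical."""
    return int(list(gold) != list(predicted))

gold = ["N", "V", "A", "N"]       # hypothetical gold tags (assumption)
y_hat_1 = ["A", "A", "A", "A"]
y_hat_2 = ["N", "V", "N", "N"]

print(zero_one_loss(gold, y_hat_1), zero_one_loss(gold, y_hat_2))  # 1 1
print(hamming_loss(gold, y_hat_1), hamming_loss(gold, y_hat_2))    # 3 1
```

Under 0–1 loss both predictions count as equally wrong, while Hamming loss separates the prediction with one wrong tag from the one with three.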

Page 14: Loss-augmented Structured Prediction

Loss functions for structured prediction

• Recall learning as optimization for classification
  • e.g.,
• Let’s define a structure-aware optimization objective
  • e.g.,

Structured hinge loss
• 0 if true output beats score of every imposter output
• Otherwise: scales linearly as function of score diff between most confusing imposter and true output
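The formula for the structured hinge loss is missing from the transcript. The sketch below assumes the loss-augmented form used in CIML chapter 17: the best imposter score plus its Hamming loss, minus the score of the true output, clipped at zero. The score tables `unary` and `trans`, the integer tag indices, and the brute-force enumeration of all K^L outputs are illustrative assumptions that only make sense for tiny examples.

```python
from itertools import product

def score(unary, trans, tags):
    """Sum of unary scores and transition scores along one tag sequence."""
    total = sum(unary[l][t] for l, t in enumerate(tags))
    total += sum(trans[a][b] for a, b in zip(tags, tags[1:]))
    return total

def hamming(gold, predicted):
    return sum(g != p for g, p in zip(gold, predicted))

def structured_hinge(unary, trans, gold):
    """max(0, max_y [score(y) + hamming(gold, y)] - score(gold)), by enumeration."""
    K, L = len(trans), len(gold)
    augmented_best = max(
        score(unary, trans, y) + hamming(gold, y)
        for y in product(range(K), repeat=L)
    )
    return max(0.0, augmented_best - score(unary, trans, gold))

# Toy scores for 3 positions and 2 tags (made-up numbers).
unary = [[1, 0], [0, 2], [1, 0]]
trans = [[0, 1], [1, 0]]
print(structured_hinge(unary, trans, [0, 1, 0]))  # 0.0: gold wins with enough margin
print(structured_hinge(unary, trans, [1, 0, 1]))  # 7: an imposter scores far higher
```

With this form the loss is zero only when the true output beats every imposter by at least that imposter's Hamming loss, so outputs with more wrong tags must be beaten by a larger margin.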

Page 15: Loss-augmented Structured Prediction

Optimization: stochastic subgradient descent

• Subgradients of structured hinge loss?

Page 16: Loss-augmented Structured Prediction

Optimization: stochastic subgradient descent

• Subgradients of structured hinge loss

Page 17: Loss-augmented Structured Prediction

Optimization: stochastic subgradient descent
Resulting training algorithm

Only 2 differences compared to structured perceptron!
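The algorithm itself was shown as a figure and is not in the transcript. The sketch below is a reconstruction under assumptions, loosely following the stochastic subgradient algorithm in CIML chapter 17. The two places where it departs from a structured perceptron are presumably the two differences the slide refers to: the argmax is loss-augmented, and the weights are shrunk after each example to account for an L2 regularizer. The names `train`, `lam`, and the brute-force argmax are illustrative choices, practical only for very short sentences.

```python
from collections import Counter, defaultdict
from itertools import product

def features(words, tags):
    phi = Counter()
    for word, tag in zip(words, tags):
        phi[("unary", word, tag)] += 1
    for prev_tag, tag in zip(tags, tags[1:]):
        phi[("markov", prev_tag, tag)] += 1
    return phi

def score(w, words, tags):
    return sum(w.get(f, 0.0) * v for f, v in features(words, tags).items())

def hamming(gold, predicted):
    return sum(g != p for g, p in zip(gold, predicted))

def loss_augmented_argmax(w, words, gold, tagset):
    # Difference 1: add the Hamming loss to the model score before maximizing.
    return max(product(tagset, repeat=len(words)),
               key=lambda y: score(w, words, y) + hamming(gold, y))

def train(data, tagset, epochs=10, lam=0.01):
    w = defaultdict(float)
    n = len(data)
    for _ in range(epochs):
        for words, gold in data:
            y_hat = loss_augmented_argmax(w, words, gold, tagset)
            if list(y_hat) != list(gold):
                # Same update as the structured perceptron.
                for f, v in features(words, gold).items():
                    w[f] += v
                for f, v in features(words, y_hat).items():
                    w[f] -= v
            # Difference 2: shrink the weights (subgradient of the regularizer).
            for f in w:
                w[f] *= 1.0 - lam / n
    return w

# Hypothetical one-sentence training set.
data = [(["they", "run"], ["N", "V"])]
w = train(data, tagset=["N", "V"])
prediction = max(product(["N", "V"], repeat=2),
                 key=lambda y: score(w, ["they", "run"], y))
print(prediction)  # ('N', 'V')
```

In practice the brute-force argmax would be replaced by a loss-augmented Viterbi search, since enumerating all K^L sequences is only feasible for toy inputs.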

Page 18: Loss-augmented Structured Prediction

Loss-augmented inference/search
Recall dynamic programming solution without Hamming loss

Page 19: Loss-augmented Structured Prediction

Loss-augmented inference/search
Dynamic programming with Hamming loss

We can use the Viterbi algorithm as before, as long as the loss function decomposes over the input consistently with the features!
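Because Hamming loss is a sum of per-position terms, it can be folded into the unary scores before running an ordinary Viterbi search. The sketch below illustrates that idea under assumptions: scores live in hypothetical tables `unary[l][k]` and `trans[j][k]`, tags are integer indices, and `loss_augment` is a helper name introduced here. It adds 1 to the unary score of every tag that disagrees with the gold tag at that position.

```python
def viterbi(unary, trans):
    """Plain max-score search; returns (best_score, best_tag_indices)."""
    K = len(unary[0])
    alpha, back = [list(unary[0])], []
    for l in range(1, len(unary)):
        row, ptr = [], []
        for k in range(K):
            j = max(range(K), key=lambda j: alpha[-1][j] + trans[j][k])
            row.append(alpha[-1][j] + trans[j][k] + unary[l][k])
            ptr.append(j)
        alpha.append(row)
        back.append(ptr)
    k = max(range(K), key=lambda j: alpha[-1][j])
    best, path = alpha[-1][k], [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return best, path[::-1]

def loss_augment(unary, gold):
    """Add 1 to the unary score of each tag that differs from the gold tag."""
    return [[s + (k != gold[l]) for k, s in enumerate(row)]
            for l, row in enumerate(unary)]

def loss_augmented_viterbi(unary, trans, gold):
    return viterbi(loss_augment(unary, gold), trans)

# Toy scores for 3 positions and 2 tags (made-up numbers).
unary = [[1, 0], [0, 2], [1, 0]]
trans = [[0, 1], [1, 0]]
print(loss_augmented_viterbi(unary, trans, [1, 0, 1]))  # (9, [0, 1, 0])
```

The returned score is the model score of the path plus its Hamming loss against the gold sequence, and the search cost is unchanged from plain Viterbi.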

Page 20: Loss-augmented Structured Prediction

Sequence labeling

• Structured perceptron
  • A general algorithm for structured prediction problems such as sequence labeling
• The argmax problem
  • Efficient argmax for sequences with Viterbi algorithm, given some assumptions on feature structure
  • A more general solution: Integer Linear Programming
• Loss-augmented structured prediction
  • Training algorithm
  • Loss-augmented argmax

Page 21: Loss-augmented Structured Prediction

Syntax & Grammars
From Sequences to Trees

Page 22: Loss-augmented Structured Prediction
Page 23: Loss-augmented Structured Prediction

Syntax & Grammar

• Syntax
  • From Greek syntaxis, meaning “setting out together”
  • refers to the way words are arranged together.
• Grammar
  • Set of structural rules governing composition of clauses, phrases, and words in any given natural language
  • Descriptive, not prescriptive
  • Panini’s grammar of Sanskrit ~2000 years ago

Page 24: Loss-augmented Structured Prediction

Syntax and Grammar

• Goal of syntactic theory
  • “explain how people combine words to form sentences and how children attain knowledge of sentence structure”
• Grammar
  • implicit knowledge of a native speaker
  • acquired without explicit instruction
  • minimally able to generate all and only the possible sentences of the language

[Philips, 2003]

Page 25: Loss-augmented Structured Prediction

Syntax in NLP

• Syntactic analysis often a key component in applications
  • Grammar checkers
  • Dialogue systems
  • Question answering
  • Information extraction
  • Machine translation
  • …

Page 26: Loss-augmented Structured Prediction

Two views of syntactic structure

• Constituency (phrase structure)
  • Phrase structure organizes words in nested constituents
• Dependency structure
  • Shows which words depend on (modify or are arguments of) which other words

Page 27: Loss-augmented Structured Prediction

Constituency

• Basic idea: groups of words act as a single unit
• Constituents form coherent classes that behave similarly
  • With respect to their internal structure: e.g., at the core of a noun phrase is a noun
  • With respect to other constituents: e.g., noun phrases generally occur before verbs

Page 28: Loss-augmented Structured Prediction

Constituency: Example

• The following are all noun phrases in English...
• Why?
  • They can all precede verbs
  • They can all be preposed/postposed
  • …

Page 29: Loss-augmented Structured Prediction

Grammars and Constituency

• For a particular language:
  • What are the “right” set of constituents?
  • What rules govern how they combine?
• Answer: not obvious and difficult
  • That’s why there are many different theories of grammar and competing analyses of the same data!
• Our approach
  • Focus primarily on the “machinery”

Page 30: Loss-augmented Structured Prediction

Context-Free Grammars

• Context-free grammars (CFGs)
  • Aka phrase structure grammars
  • Aka Backus-Naur form (BNF)
• Consist of
  • Rules
  • Terminals
  • Non-terminals

Page 31: Loss-augmented Structured Prediction

Context-Free Grammars

• Terminals
  • We’ll take these to be words
• Non-Terminals
  • The constituents in a language (e.g., noun phrase)
• Rules
  • Consist of a single non-terminal on the left and any number of terminals and non-terminals on the right
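The three ingredients above can be written down directly as a data structure. The toy grammar below is hypothetical (it is not the example grammar from the slides, which did not survive extraction): each key is a non-terminal, each value lists its possible right-hand sides, and any symbol that is not a key is treated as a terminal word. The helper `expand` enumerates every sentence the grammar generates, which is only safe because this grammar has no recursive rules.

```python
# Hypothetical toy grammar: non-terminal -> list of right-hand sides.
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Pronoun"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["letter"], ["shelf"]],
    "V": [["hid"]],
    "Pronoun": [["they"]],
}

def expand(symbols, grammar):
    """Yield every terminal string derivable from a list of symbols.

    Terminates only for grammars without recursion, like the toy one above.
    """
    if not symbols:
        yield []
        return
    first, rest = symbols[0], symbols[1:]
    if first in grammar:
        # Non-terminal: try each rule with this symbol on the left-hand side.
        for rhs in grammar[first]:
            yield from expand(list(rhs) + rest, grammar)
    else:
        # Terminal: keep the word and expand the remaining symbols.
        for tail in expand(rest, grammar):
            yield [first] + tail

sentences = [" ".join(words) for words in expand(["S"], grammar)]
print(len(sentences))                        # 9
print("they hid the letter" in sentences)    # True
```

The grammar also generates odd sentences such as "the shelf hid they", a reminder that a CFG captures structure and not meaning or case agreement.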

Page 32: Loss-augmented Structured Prediction

An Example Grammar

Page 33: Loss-augmented Structured Prediction

Parse Tree: Example

Note: equivalence between parse trees and bracket notation
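The equivalence can be illustrated by converting a tree to bracket notation and back. In this sketch a tree is assumed to be either a word (a string) or a tuple whose first element is the constituent label and whose remaining elements are its children; the example tree and its labels are hypothetical, since the slide's figure is not in the transcript.

```python
def to_brackets(tree):
    """Render a tree as a bracketed string such as (NP (Det the) (N letter))."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "(" + label + " " + " ".join(to_brackets(c) for c in children) + ")"

def from_brackets(text):
    """Parse a bracketed string back into the nested-tuple representation."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(i):
        if tokens[i] != "(":
            return tokens[i], i + 1          # a bare word
        label, children, i = tokens[i + 1], [], i + 2
        while tokens[i] != ")":
            child, i = parse(i)
            children.append(child)
        return (label, *children), i + 1     # skip the closing bracket

    tree, _ = parse(0)
    return tree

# Hypothetical parse tree for "they hid the letter".
tree = ("S",
        ("NP", ("Pronoun", "they")),
        ("VP", ("V", "hid"),
               ("NP", ("Det", "the"), ("N", "letter"))))

bracketed = to_brackets(tree)
print(bracketed)
# (S (NP (Pronoun they)) (VP (V hid) (NP (Det the) (N letter))))
print(from_brackets(bracketed) == tree)  # True
```

The round trip loses nothing, which is why treebanks can store parse trees as plain bracketed text.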

Page 34: Loss-augmented Structured Prediction

Dependency Grammars

• CFGs focus on constituents
  • Non-terminals don’t actually appear in the sentence
• In dependency grammar, a parse is a graph (usually a tree) where:
  • Nodes represent words
  • Edges represent dependency relations between words (typed or untyped, directed or undirected)

Page 35: Loss-augmented Structured Prediction

Dependency Grammars

• Syntactic structure = lexical items linked by binary asymmetrical relations called dependencies

Page 36: Loss-augmented Structured Prediction

Dependency Relations

Page 37: Loss-augmented Structured Prediction

Example Dependency Parse

They hid the letter on the shelf

Compare with constituent parse… What’s the relation?
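The parse figure for this sentence is not in the transcript, so the sketch below encodes one plausible analysis as an assumption: the prepositional phrase attaches to the verb, and the relation labels follow Universal Dependencies conventions. Each word stores the 1-based index of its head, with 0 marking the root, and `is_tree` checks the property that makes the graph a tree: exactly one root, and every word reaches it without a cycle.

```python
words = ["They", "hid", "the", "letter", "on", "the", "shelf"]
# Assumed analysis: heads[i] is the 1-based index of the head of word i+1 (0 = root).
heads = [2, 0, 4, 2, 7, 7, 2]
labels = ["nsubj", "root", "det", "obj", "case", "det", "obl"]

def is_tree(heads):
    """True if there is exactly one root and every word reaches it without a cycle."""
    if sum(h == 0 for h in heads) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False      # followed head links back to a visited word
            seen.add(node)
            node = heads[node - 1]
    return True

print(is_tree(heads))  # True
for word, head, label in zip(words, heads, labels):
    head_word = "ROOT" if head == 0 else words[head - 1]
    print(f"{label}({head_word}, {word})")
```

Unlike the constituent parse, every node here is a word of the sentence, and attaching "shelf" to "letter" instead of "hid" would encode the other reading of the sentence.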

Page 38: Loss-augmented Structured Prediction
Page 39: Loss-augmented Structured Prediction

Universal Dependencies project

• Set of dependency relations that are
  • Linguistically motivated
  • Computationally useful
  • Cross-linguistically applicable
  • [Nivre et al. 2016]

• Universaldependencies.org

Page 40: Loss-augmented Structured Prediction

Summary

• Syntax & Grammar
• Two views of syntactic structures
  • Context-Free Grammars
  • Dependency grammars
  • Can be used to capture various facts about the structure of language (but not all!)
• Treebanks as an important resource for NLP