loss-augmented structured prediction

Post on 12-May-2022


Loss-augmented Structured Prediction
CMSC 723 / LING 723 / INST 725

Marine Carpuat

Figures, algorithms & equations from CIML chap 17

POS tagging
Sequence labeling with the perceptron

Sequence labeling problem
• Input:
  • sequence of tokens x = [x1 … xL]
  • Variable length L
• Output (aka label):
  • sequence of tags y = [y1 … yL]
  • # tags = K
  • Size of output space?
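The question on this slide has a short answer: each of the L positions can take any of the K tags independently, so there are K^L candidate outputs. A minimal Python sketch that checks this by brute-force enumeration; the three-tag set `TAGS` is made up for illustration and is not from the slides.

```python
from itertools import product

# Hypothetical tag set, for illustration only.
TAGS = ["N", "V", "A"]

def output_space_size(num_tags, length):
    """Number of distinct tag sequences of a given length: K ** L."""
    return num_tags ** length

def enumerate_outputs(length, tags=TAGS):
    """List every tag sequence of the given length (feasible only for tiny L)."""
    return list(product(tags, repeat=length))

# Enumeration and the closed form agree on a small case.
print(len(enumerate_outputs(4)), output_space_size(len(TAGS), 4))  # 81 81
```

The count grows exponentially with sentence length, which is why the argmax over outputs cannot be computed by enumeration in practice.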

Structured Perceptron
• Perceptron algorithm can be used for sequence labeling
• But there are challenges
  • How to compute argmax efficiently?
  • What are appropriate features?
• Approach: leverage structure of output space

Solving the argmax problem for sequences with dynamic programming
• Efficient algorithms possible if the feature function decomposes over the input
• This holds for unary and Markov features used for POS tagging

Feature functions for sequence labeling
• Standard features of POS tagging
• Unary features: # times word w has been labeled with tag l, for all words w and all tags l
• Markov features: # times tag l is adjacent to tag l’ in output, for all tags l and l’
• Size of feature representation is constant wrt input length
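The two feature families above can be sketched as a sparse count vector. This is an illustrative sketch, not code from the course: the function names, the tuple-shaped feature keys, and the example sentence are assumptions introduced here.

```python
from collections import Counter

def features(words, tags):
    """Sparse feature vector phi(x, y) with unary and Markov counts."""
    phi = Counter()
    # Unary features: how often word w is labeled with tag l.
    for word, tag in zip(words, tags):
        phi[("unary", word, tag)] += 1
    # Markov features: how often tag l is immediately followed by tag l'.
    for left, right in zip(tags, tags[1:]):
        phi[("markov", left, right)] += 1
    return phi

def feature_dimension(vocab_size, num_tags):
    """Total number of possible features: one per (word, tag) and (tag, tag) pair."""
    return vocab_size * num_tags + num_tags * num_tags

# Illustrative sentence and tags (an assumption, not taken from the slides).
phi = features(["monsters", "eat", "tasty", "bunnies"], ["N", "V", "A", "N"])
print(phi[("unary", "eat", "V")], phi[("markov", "V", "A")])  # 1 1
```

The dimension depends only on the vocabulary and tag-set sizes, not on the length of any particular sentence, which is what the last bullet refers to.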

Solving the argmax problem for sequences
• Trellis sequence labeling
  • Any path represents a labeling of input sentence
  • Gold standard path in red
  • Each edge receives a weight such that adding weights along the path corresponds to score for input/output configuration
• Any max-weight path algorithm can find the argmax
  • e.g. Viterbi algorithm, O(LK²)

Defining weights of edges in trellis
• Weight of edge that goes from time l-1 to time l, and transitions from y to y’
• Unary features at position l together with Markov features that end at position l

Dynamic program
• Define α(l, k): the score of best possible output prefix up to and including position l that labels the l-th word with label k
• With decomposable features, alphas can be computed recursively
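The recursion can be sketched in Python as follows. This is a minimal sketch under stated assumptions: scores are passed in as two precomputed tables (`unary` and `trans`, names introduced here) rather than computed from w and the feature function, and tags are integer indices.

```python
def viterbi(unary, trans):
    """Return (best_score, best_path) for a chain-structured model.

    unary[l][k] : score of giving position l the tag k
    trans[j][k] : score of tag j being followed by tag k
    Runs in O(L * K^2) time.
    """
    L, K = len(unary), len(unary[0])
    # alpha[l][k]: best score of any prefix ending at position l with tag k.
    alpha = [[0.0] * K for _ in range(L)]
    # back[l][k]: the tag at position l-1 on that best prefix.
    back = [[0] * K for _ in range(L)]
    alpha[0] = list(unary[0])
    for l in range(1, L):
        for k in range(K):
            best_j = max(range(K), key=lambda j: alpha[l - 1][j] + trans[j][k])
            alpha[l][k] = alpha[l - 1][best_j] + trans[best_j][k] + unary[l][k]
            back[l][k] = best_j
    # Pick the best final tag, then follow back-pointers to recover the path.
    last = max(range(K), key=lambda k: alpha[L - 1][k])
    path = [last]
    for l in range(L - 1, 0, -1):
        path.append(back[l][path[-1]])
    path.reverse()
    return alpha[L - 1][last], path

# Toy scores: two tags, three positions, a bonus of 1 for repeating a tag.
unary = [[2, 0], [0, 1], [3, 0]]
trans = [[1, 0], [0, 1]]
print(viterbi(unary, trans))  # (7, [0, 0, 0])
```

In this toy case the unary scores alone would prefer tag 1 at the middle position, but the transition bonus makes staying on tag 0 the best overall path.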

A more general approach for argmax
Integer Linear Programming
• ILP: optimization problem of the form, for a fixed vector a
• With integer constraints
• Pro: can leverage well-engineered solvers (e.g., Gurobi)
• Con: not always most efficient
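The formula on this slide did not survive extraction. Following the usual statement of an ILP, and consistent with the "fixed vector a" and "integer constraints" mentioned above, it presumably has the form below; treat this as a reconstruction rather than the slide's exact notation.

```latex
\max_{z} \; a \cdot z
\quad \text{subject to linear constraints on } z,
\quad z \text{ integer-valued}
```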

POS tagging as ILP
• Markov features as binary indicator variables
• Output sequence: y(z) obtained by reading off variables z
• Define a such that a · z is equal to score
• Enforcing constraints for well-formed solutions
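The equations for this encoding are missing from the transcript. One way to write it down, following the treatment in CIML chapter 17 as best it can be reconstructed, is sketched below; the index names l, k, k', k'' are notation introduced here.

```latex
% Indicator variables: one per position l and tag pair (k', k)
z_{l,k',k} = \mathbf{1}\left[\, y_{l-1} = k' \ \text{and}\ y_l = k \,\right]

% Objective coefficients: the model score of that local configuration
a_{l,k',k} = w \cdot \phi_l\left(x, \langle \dots, k', k \rangle\right)

% Constraint 1: exactly one variable is active at each position l
\sum_{k} \sum_{k'} z_{l,k',k} = 1 \quad \forall l

% Constraint 2: the tag chosen at position l must be the "previous tag" used at position l+1
\sum_{k'} z_{l,k',k} = \sum_{k''} z_{l+1,k,k''} \quad \forall l, k
```

With these definitions, a · z sums the local scores along the path selected by z, and the two constraint families rule out assignments that do not correspond to a single consistent tag sequence.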

Sequence labeling
• Structured perceptron
  • A general algorithm for structured prediction problems such as sequence labeling
• The argmax problem
  • Efficient argmax for sequences with Viterbi algorithm, given some assumptions on feature structure
  • A more general solution: Integer Linear Programming
• Loss-augmented structured prediction
  • Training algorithm
  • Loss-augmented argmax

In structured perceptron, all errors are equally bad

All bad output sequences are not equally bad
• Consider
  • ŷ1 = [A, A, A, A]
  • ŷ2 = [N, V, N, N]
• Hamming Loss
  • Gives a more nuanced evaluation of output than 0–1 loss
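The contrast between the two losses can be checked with a few lines of Python. The gold tag sequence is not given in the transcript, so the one below (`gold`) is an assumption chosen for illustration; the two predictions are the ones from the slide.

```python
def hamming_loss(gold, pred):
    """Number of positions where the predicted tag differs from the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("sequences must have the same length")
    return sum(g != p for g, p in zip(gold, pred))

def zero_one_loss(gold, pred):
    """1 if the prediction differs from the gold sequence anywhere, else 0."""
    return int(list(gold) != list(pred))

gold = ["N", "V", "A", "N"]      # assumed gold tags, not from the transcript
y_hat_1 = ["A", "A", "A", "A"]
y_hat_2 = ["N", "V", "N", "N"]

print(zero_one_loss(gold, y_hat_1), zero_one_loss(gold, y_hat_2))  # 1 1
print(hamming_loss(gold, y_hat_1), hamming_loss(gold, y_hat_2))    # 3 1
```

Under 0–1 loss both predictions count as equally wrong, while Hamming loss records that the second one is off by a single tag.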

Loss functions for structured prediction
• Recall learning as optimization for classification
  • e.g.,
• Let’s define a structure-aware optimization objective
  • e.g.,

Structured hinge loss
• 0 if true output beats score of every imposter output
• Otherwise: scales linearly as function of score diff between most confusing imposter and true output
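The formulas for the objective and the loss are missing from the transcript. A common way to write them, in the spirit of CIML chapter 17, is sketched below; the exact notation on the slides may differ, and the symbols C, φ and 𝒴(x_n) are assumed here.

```latex
% Structure-aware objective: regularizer plus summed structured hinge loss
\min_{w} \; \tfrac{1}{2} \lVert w \rVert^2
  + C \sum_{n} \ell^{(\text{s-h})}(y_n, x_n, w)

% Structured hinge loss with a Hamming-loss margin
\ell^{(\text{s-h})}(y_n, x_n, w)
  = \max\Big\{ 0,\;
      \max_{\hat{y} \in \mathcal{Y}(x_n)}
        \big[ w \cdot \phi(x_n, \hat{y}) + \ell^{(\text{Ham})}(y_n, \hat{y}) \big]
      - w \cdot \phi(x_n, y_n) \Big\}
```

In this form, "beats" means the true output outscores each imposter by at least that imposter's Hamming loss, so worse imposters must be beaten by a wider margin.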

Optimization: stochastic subgradient descent
• Subgradients of structured hinge loss?

Optimization: stochastic subgradient descent
Resulting training algorithm

Only 2 differences compared to structured perceptron!
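The algorithm itself is an image that did not survive extraction. The sketch below is one plausible reading, following the stochastic subgradient method for the structured hinge loss: compared with the structured perceptron, the argmax is loss-augmented, and the weights are shrunk after each example to account for the regularizer. Everything named here (`train`, `lam`, the brute-force argmax, the toy sentence) is an assumption for illustration; a real tagger would use Viterbi instead of enumeration.

```python
from collections import defaultdict
from itertools import product

def features(words, tags):
    """Unary (word, tag) and Markov (tag, tag) counts as a sparse dict."""
    phi = defaultdict(float)
    for word, tag in zip(words, tags):
        phi[("unary", word, tag)] += 1.0
    for left, right in zip(tags, tags[1:]):
        phi[("markov", left, right)] += 1.0
    return phi

def score(w, phi):
    return sum(w.get(f, 0.0) * v for f, v in phi.items())

def hamming(gold, pred):
    return sum(g != p for g, p in zip(gold, pred))

def loss_augmented_argmax(w, words, gold, tagset):
    # Brute force over all K**L outputs; only workable on toy inputs.
    return max(product(tagset, repeat=len(words)),
               key=lambda y: score(w, features(words, y)) + hamming(gold, y))

def predict(w, words, tagset):
    return max(product(tagset, repeat=len(words)),
               key=lambda y: score(w, features(words, y)))

def train(data, tagset, epochs=10, lam=0.01):
    w = defaultdict(float)
    n = len(data)
    for _ in range(epochs):
        for words, gold in data:
            gold = tuple(gold)
            # Difference 1: the argmax includes the Hamming loss.
            pred = loss_augmented_argmax(w, words, gold, tagset)
            if pred != gold:
                for f, v in features(words, gold).items():
                    w[f] += v
                for f, v in features(words, pred).items():
                    w[f] -= v
            # Difference 2: shrink the weights (subgradient of the regularizer).
            for f in w:
                w[f] *= 1.0 - lam / n
    return w

# Toy data, made up for illustration.
data = [(["they", "run"], ["N", "V"])]
w = train(data, tagset=["N", "V"])
print(predict(w, ["they", "run"], ["N", "V"]))  # ('N', 'V')
```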

Loss-augmented inference/search
Recall dynamic programming solution without Hamming loss

Loss-augmented inference/search
Dynamic programming with Hamming loss

We can use the Viterbi algorithm as before, as long as the loss function decomposes over the input consistently with the features!
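Because Hamming loss is a sum of per-position terms, it can be folded into the per-position scores before running ordinary Viterbi. A minimal sketch, assuming table-based scores (`unary`, `trans`) and integer tag indices, all of which are conventions introduced here rather than taken from the slides:

```python
def viterbi(unary, trans):
    """Max-scoring path for unary[l][k] and trans[j][k] score tables."""
    L, K = len(unary), len(unary[0])
    alpha = [list(unary[0])] + [[0.0] * K for _ in range(L - 1)]
    back = [[0] * K for _ in range(L)]
    for l in range(1, L):
        for k in range(K):
            best_j = max(range(K), key=lambda j: alpha[l - 1][j] + trans[j][k])
            alpha[l][k] = alpha[l - 1][best_j] + trans[best_j][k] + unary[l][k]
            back[l][k] = best_j
    last = max(range(K), key=lambda k: alpha[L - 1][k])
    path = [last]
    for l in range(L - 1, 0, -1):
        path.append(back[l][path[-1]])
    path.reverse()
    return alpha[L - 1][last], path

def loss_augmented_viterbi(unary, trans, gold):
    """Add a Hamming cost of 1 to every non-gold tag, then run plain Viterbi."""
    augmented = [[s + (0 if k == gold[l] else 1) for k, s in enumerate(row)]
                 for l, row in enumerate(unary)]
    return viterbi(augmented, trans)

# Toy scores: two tags, three positions; the gold path is all tag 0.
unary = [[2, 0], [0, 1.5], [3, 0]]
trans = [[1, 0], [0, 0.5]]
gold = [0, 0, 0]

print(viterbi(unary, trans))                       # (7, [0, 0, 0])
print(loss_augmented_viterbi(unary, trans, gold))  # (7.5, [0, 1, 0])
```

Here the plain argmax already returns the gold path, but the loss-augmented argmax returns a competitor that is within the required margin: its model score is 6.5 and its Hamming loss is 1. That competitor is what a loss-augmented update would push down.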

Sequence labeling
• Structured perceptron
  • A general algorithm for structured prediction problems such as sequence labeling
• The argmax problem
  • Efficient argmax for sequences with Viterbi algorithm, given some assumptions on feature structure
  • A more general solution: Integer Linear Programming
• Loss-augmented structured prediction
  • Training algorithm
  • Loss-augmented argmax

Syntax & Grammars
From Sequences to Trees

Syntax & Grammar
• Syntax
  • From Greek syntaxis, meaning “setting out together”
  • Refers to the way words are arranged together
• Grammar
  • Set of structural rules governing composition of clauses, phrases, and words in any given natural language
  • Descriptive, not prescriptive
  • Panini’s grammar of Sanskrit ~2000 years ago

Syntax and Grammar
• Goal of syntactic theory
  • “explain how people combine words to form sentences and how children attain knowledge of sentence structure”
• Grammar
  • implicit knowledge of a native speaker
  • acquired without explicit instruction
  • minimally able to generate all and only the possible sentences of the language

[Philips, 2003]

Syntax in NLP
• Syntactic analysis is often a key component in applications
  • Grammar checkers
  • Dialogue systems
  • Question answering
  • Information extraction
  • Machine translation
  • …

Two views of syntactic structure
• Constituency (phrase structure)
  • Phrase structure organizes words in nested constituents
• Dependency structure
  • Shows which words depend on (modify or are arguments of) which other words

Constituency
• Basic idea: groups of words act as a single unit
• Constituents form coherent classes that behave similarly
  • With respect to their internal structure: e.g., at the core of a noun phrase is a noun
  • With respect to other constituents: e.g., noun phrases generally occur before verbs

Constituency: Example
• The following are all noun phrases in English...
• Why?
  • They can all precede verbs
  • They can all be preposed/postposed
  • …

Grammars and Constituency
• For a particular language:
  • What is the “right” set of constituents?
  • What rules govern how they combine?
• Answer: not obvious and difficult
  • That’s why there are many different theories of grammar and competing analyses of the same data!
• Our approach
  • Focus primarily on the “machinery”

Context-Free Grammars
• Context-free grammars (CFGs)
  • Aka phrase structure grammars
  • Aka Backus-Naur form (BNF)
• Consist of
  • Rules
  • Terminals
  • Non-terminals

Context-Free Grammars
• Terminals
  • We’ll take these to be words
• Non-Terminals
  • The constituents in a language (e.g., noun phrase)
• Rules
  • Consist of a single non-terminal on the left and any number of terminals and non-terminals on the right
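These three ingredients map directly onto a small data structure. The sketch below uses a toy grammar made up for illustration (it is not the example grammar from the slides): each key is a non-terminal, each value lists its possible right-hand sides, and any symbol that is not a key is treated as a terminal.

```python
import random

# Toy grammar, hypothetical and for illustration only.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "NP":   [["Det", "N"], ["Pron"]],
    "VP":   [["V", "NP"], ["V", "NP", "PP"]],
    "PP":   [["P", "NP"]],
    "Det":  [["the"]],
    "N":    [["letter"], ["shelf"]],
    "Pron": [["they"]],
    "V":    [["hid"]],
    "P":    [["on"]],
}

def is_terminal(symbol):
    """Terminals are the symbols that never appear on a rule's left side."""
    return symbol not in GRAMMAR

def generate(symbol="S", rng=None):
    """Expand a symbol into a list of words by picking rules at random."""
    if rng is None:
        rng = random.Random(0)
    if is_terminal(symbol):
        return [symbol]
    words = []
    for child in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(child, rng))
    return words

print(" ".join(generate("S", random.Random(1))))
```

Every rule has exactly one non-terminal on its left side, which is what makes the grammar context-free: a symbol can be rewritten the same way wherever it occurs.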

An Example Grammar

Parse Tree: Example

Note: equivalence between parse trees and bracket notation
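The equivalence noted above can be made concrete with a round trip between a nested tree and its bracketed string. This is a sketch under assumptions: trees are represented as Python tuples of the form (label, child, ...), leaves are plain strings, and the example tree and its category labels are invented here rather than copied from the slide.

```python
def to_brackets(tree):
    """Render a (label, child, ...) tuple tree in bracket notation."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "(" + label + " " + " ".join(to_brackets(c) for c in children) + ")"

def from_brackets(text):
    """Parse bracket notation back into nested tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(parse())
            pos += 1                      # consume ")"
            return (label, *children)
        word = tokens[pos]
        pos += 1
        return word

    return parse()

# Hypothetical parse tree for a short sentence.
tree = ("S",
        ("NP", ("Pron", "They")),
        ("VP", ("V", "hid"), ("NP", ("Det", "the"), ("N", "letter"))))

text = to_brackets(tree)
print(text)  # (S (NP (Pron They)) (VP (V hid) (NP (Det the) (N letter))))
print(from_brackets(text) == tree)  # True
```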

Dependency Grammars
• CFGs focus on constituents
  • Non-terminals don’t actually appear in the sentence
• In dependency grammar, a parse is a graph (usually a tree) where:
  • Nodes represent words
  • Edges represent dependency relations between words (typed or untyped, directed or undirected)

Dependency Grammars
• Syntactic structure = lexical items linked by binary asymmetrical relations called dependencies

Dependency Relations

Example Dependency Parse

They hid the letter on the shelf

Compare with constituent parse… What’s the relation?
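The parse diagram for this sentence is not in the transcript. As a sketch of how such a parse can be stored, the block below encodes one plausible analysis as a list of head indices with relation labels. The attachments and the Universal Dependencies style labels are assumptions made here; in particular, "on the shelf" is attached to the verb, although attaching it to "letter" is also a defensible reading.

```python
WORDS = ["They", "hid", "the", "letter", "on", "the", "shelf"]
# HEADS[i] is the 1-based position of the head of word i+1; 0 marks the root.
HEADS = [2, 0, 4, 2, 7, 7, 2]
# Assumed relation labels, one per word.
LABELS = ["nsubj", "root", "det", "obj", "case", "det", "obl"]

def is_tree(heads):
    """True if there is exactly one root and every word reaches it without a cycle."""
    if heads.count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def dependents(heads, head):
    """1-based positions of the words whose head is the given position."""
    return [i + 1 for i, h in enumerate(heads) if h == head]

print(is_tree(HEADS))                                 # True
print([WORDS[i - 1] for i in dependents(HEADS, 2)])   # ['They', 'letter', 'shelf']
```

Unlike a constituent parse, this structure has no non-terminal nodes: every node is a word of the sentence, and each word other than the root has exactly one head.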

Universal Dependencies project
• Set of dependency relations that are
  • Linguistically motivated
  • Computationally useful
  • Cross-linguistically applicable
  • [Nivre et al. 2016]

• Universaldependencies.org

Summary
• Syntax & Grammar
• Two views of syntactic structures
  • Context-Free Grammars
  • Dependency grammars
  • Can be used to capture various facts about the structure of language (but not all!)
• Treebanks as an important resource for NLP
