TRANSCRIPT
CS395T: Structured Models for NLP, Lecture 9: Trees 3
Greg Durrett
Administrivia
‣ Project 1 due at *5pm* today
‣ Project 2 will be out by tonight. Due October 17
‣ Shift-reduce parser: greedy model, beam search model, extension
Recall: Dependencies
ROOT the dog ran to the house
DT NN VBD TO DT NN
‣ Dependency syntax: syntactic structure is defined by dependencies
‣ Head (parent, governor) connected to dependent (child, modifier)
‣ Each word has exactly one parent except for the ROOT symbol
‣ Dependencies must form a directed acyclic graph
Recall: Projectivity
‣ Projective <-> no "crossing" arcs: the dog ran to the house
‣ Crossing arcs: dogs in houses and cats (credit: Language Log)
‣ Today: algorithms for projective parsing
This Lecture
‣ Graph-based dependency parsing
‣ Dynamic programs for exact inference; these look a lot like sequential CRFs
‣ Transition-based (shift-reduce) dependency parsing
‣ Approximate, greedy inference: fast, but a little bit weird!
Graph-based Dependency Parsing
‣ How did we parse lexicalized trees?
‣ Normal CKY is too slow: the grammar is too large if it includes words
Graph-based Dependency Parsing
‣ Naive algorithm: O(n^5)
X[h] -> Y[h] Z[h'], with indices i, h, k, h', j (span endpoints and heads)
‣ Combine spans like CKY and look at their heads
‣ Five indices to loop over
‣ Features can look at spans and heads
‣ Can be applied to dependency parses as well! Builds projective trees
‣ What do our scores look like? For now, assume features on each (head, child) pair with some weights
Why is this inefficient?
ROOT the dog ran to the house
‣ Lots of spurious ambiguity: many ways to derive the right parses
‣ Can split at either point and build up subtrees X[h] -> Y[h] Z[h'] over i, h, k, h', j
Eisner's Algorithm: O(n^3)
ROOT the dog ran to the house
‣ Cubic-time algorithm like CKY
‣ Maintain two charts with dimension [n, n, 2]:
‣ Complete items: all children are attached, head is at the "tall end"
‣ Incomplete items: arc from "tall" to "short" end; the word on the short end has a parent but maybe not all of its children
Eisner's Algorithm: O(n^3)
ROOT the dog ran to the house
‣ Complete item: all children are attached, head is at the "tall end"
‣ Incomplete item: arc from "tall end" to "short end", may still expect children
‣ Take two adjacent complete items, add an arc, and build an incomplete item
‣ Take an incomplete item and complete it by absorbing an adjacent complete item (the other case is symmetric)
Eisner's Algorithm: O(n^3)
ROOT the dog ran to the house
1) Build incomplete span
2) Promote to complete
3) Build incomplete span
Eisner's Algorithm: O(n^3)
ROOT the dog ran to the house
4) Promote to complete
Eisner's Algorithm: O(n^3)
ROOT the dog ran to the house
‣ We've built the left children and right children of "ran" as complete items
‣ Attaching to ROOT makes an incomplete item with the left children; attaching the right children subsequently finishes the parse
Eisner's Algorithm
ROOT the dog ran to the house
‣ Eisner's algorithm doesn't have split-point ambiguities like this
‣ Left and right children are built independently; heads are at the edges of spans
‣ Charts are n x n x 2 because we need to track arc direction (left vs. right)
‣ Eisner: O(n^3), versus O(n^5) for the naive algorithm
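The two charts and the build/promote operations above can be sketched as a dynamic program. This is a minimal illustrative implementation, not the lecture's code; the function name `eisner` and the score-matrix convention (score[h, m] = score of an arc from head h to modifier m, with index 0 as ROOT) are my own assumptions.

```python
import numpy as np

def eisner(score):
    """Best projective tree under an arc-factored model (illustrative sketch).

    score[h, m] = score of arc head h -> modifier m; index 0 is ROOT.
    Returns heads[m] for each position m (heads[0] = -1 for ROOT).
    """
    n = score.shape[0]
    NEG = float("-inf")
    # Two charts of shape [n, n, 2]: d=0 means head on the right (left arc),
    # d=1 means head on the left (right arc), matching the n x n x 2 slides.
    comp = np.full((n, n, 2), NEG)      # complete items
    incomp = np.full((n, n, 2), NEG)    # incomplete items
    comp_bp = np.zeros((n, n, 2), dtype=int)
    incomp_bp = np.zeros((n, n, 2), dtype=int)
    for i in range(n):
        comp[i, i, 0] = comp[i, i, 1] = 0.0

    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            # Build incomplete: two adjacent complete items + a new arc
            for k in range(i, j):
                s = comp[i, k, 1] + comp[k + 1, j, 0]
                if s + score[j, i] > incomp[i, j, 0]:
                    incomp[i, j, 0] = s + score[j, i]
                    incomp_bp[i, j, 0] = k
                if s + score[i, j] > incomp[i, j, 1]:
                    incomp[i, j, 1] = s + score[i, j]
                    incomp_bp[i, j, 1] = k
            # Promote to complete: incomplete item absorbs a complete item
            for k in range(i, j):
                s = comp[i, k, 0] + incomp[k, j, 0]
                if s > comp[i, j, 0]:
                    comp[i, j, 0] = s
                    comp_bp[i, j, 0] = k
            for k in range(i + 1, j + 1):
                s = incomp[i, k, 1] + comp[k, j, 1]
                if s > comp[i, j, 1]:
                    comp[i, j, 1] = s
                    comp_bp[i, j, 1] = k

    heads = [-1] * n
    def backtrack(i, j, d, complete):
        if i == j:
            return
        if complete:
            k = comp_bp[i, j, d]
            if d == 0:
                backtrack(i, k, 0, True); backtrack(k, j, 0, False)
            else:
                backtrack(i, k, 1, False); backtrack(k, j, 1, True)
        else:
            k = incomp_bp[i, j, d]
            if d == 0:
                heads[i] = j    # left arc: head on the right
            else:
                heads[j] = i    # right arc: head on the left
            backtrack(i, k, 1, True); backtrack(k + 1, j, 0, True)
    backtrack(0, n - 1, 1, True)
    return heads
```

Backpointers recover the tree; replacing max with log-sum-exp in the same recursions would give marginals, mirroring the Viterbi/forward-backward pairing in sequential CRFs.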
MST Parser
‣ View dependency parsing as finding a maximum directed spanning tree: we search the space of all spanning trees, so we find nonprojective trees too!
‣ Chu-Liu-Edmonds algorithm finds the best MST in O(n^2) (McDonald et al., 2005)
‣ Ironically, the software artifact called MSTParser has an implementation of Eisner's algorithm, which is what most people use
‣ This only computes maxes, but there is an algorithm for summing over all trees as well (matrix-tree theorem)
Building Systems
‣ Can implement Viterbi decoding and marginal computation using Eisner's algorithm or MST to max/sum over projective/nonprojective trees
‣ Same concept as sequential CRFs for NER; can also use margin-based methods. You know how to implement these!
‣ Features are over dependency edges
Features in Graph-Based Parsing
‣ Dynamic program exposes the parent and child indices
‣ McDonald et al. (2005): conjunctions of parent and child words + POS, POS of words in between, POS of surrounding words. ~91 UAS
ROOT the dog ran to the house
‣ HEAD=TO & MOD=NN
‣ HEAD=TO & MOD=house
‣ HEAD=TO & MOD=DT
‣ HEAD=TO & MOD-1=the
‣ Lei et al. (2014): ways of learning conjunctions of these
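The templates above can be sketched as an arc-factored feature extractor. A minimal sketch: the feature-name strings are illustrative, modeled on but not copied from McDonald et al. (2005).

```python
def edge_features(words, tags, head, mod):
    """Arc-factored features for one (head, modifier) edge.

    Illustrative sketch: conjunctions of head/modifier words and POS tags,
    POS tags of the words in between, and a neighbor of the modifier.
    """
    feats = [
        f"HEAD={tags[head]}&MOD={tags[mod]}",    # POS x POS
        f"HEAD={words[head]}&MOD={words[mod]}",  # word x word
        f"HEAD={tags[head]}&MOD={words[mod]}",   # POS x word
    ]
    lo, hi = min(head, mod), max(head, mod)
    for i in range(lo + 1, hi):                  # POS of words in between
        feats.append(f"HEAD={tags[head]}&BTWN={tags[i]}&MOD={tags[mod]}")
    if mod - 1 >= 0:                             # surrounding-word feature
        feats.append(f"HEAD={tags[head]}&MOD-1={words[mod - 1]}")
    return feats
```

These fire on each edge scored by the dynamic program, so the weight vector scores trees as a sum over edges.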
Features in Graph-Based Parsing
ROOT the dog ran to the house
‣ Ideally would use features on more arcs
‣ Grandparents: ran -> to -> house
‣ Siblings: dog <- ran -> to
Higher-Order Parsing
‣ Terry Koo (2010)
‣ Track additional state during parsing so we can look at grandparents and siblings: O(n^4)
‣ Additional indicator features based on this information: ~93 UAS (up from 91 UAS)
‣ Turns out you can just use beam search and forget this crazy dynamic program…
Shift-Reduce Parsing

Shift-Reduce Parsing
‣ Also called transition-based parsing
‣ Similar to deterministic parsers for compilers
‣ A tree is built from a sequence of incremental decisions moving left to right through the sentence
‣ Stack containing the partially-built tree, buffer containing the rest of the sentence
‣ Shifts consume the buffer, reduces build a tree on the stack
Shift-Reduce Parsing
ROOT I ate some spaghetti bolognese
‣ Initial state: Stack: [ROOT]  Buffer: [I ate some spaghetti bolognese]
‣ Shift: top of buffer -> top of stack
‣ Shift 1: Stack: [ROOT I]  Buffer: [ate some spaghetti bolognese]
‣ Shift 2: Stack: [ROOT I ate]  Buffer: [some spaghetti bolognese]
Shift-Reduce Parsing
ROOT I ate some spaghetti bolognese
‣ State: Stack: [ROOT I ate]  Buffer: [some spaghetti bolognese]
‣ Left-arc (reduce operation): let σ denote the stack
‣ "Pop two elements, add an arc, put them back on the stack"
‣ σ|w-2, w-1 → σ|w-1  (w-2 is now a child of w-1)
‣ State: Stack: [ROOT ate]  Buffer: [some spaghetti bolognese]  (I is now a child of ate)
Arc-Standard Parsing
ROOT I ate some spaghetti bolognese
‣ Arc-standard system: three operations
‣ Shift: top of buffer -> top of stack
‣ Left-Arc: σ|w-2, w-1 → σ|w-1  (w-2 is now a child of w-1)
‣ Right-Arc: σ|w-2, w-1 → σ|w-2  (w-1 is now a child of w-2)
‣ Start: stack contains [ROOT], buffer contains [I ate some spaghetti bolognese]
‣ End: stack contains [ROOT], buffer is empty []
‣ Must take 2n steps for n words (n shifts, n LA/RA)
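The three operations can be sketched as a tiny state machine. This is an illustrative implementation (the name `parse_arc_standard` and the index conventions are my own, not from the lecture), which executes a given action sequence; word index 0 is reserved for ROOT.

```python
def parse_arc_standard(words, actions):
    """Run an arc-standard derivation (illustrative sketch, not lecture code).

    words: list of tokens; actions: sequence of "S" (shift), "LA" (left-arc),
    "RA" (right-arc). Returns {child_index: head_index}; index 0 is ROOT.
    """
    stack = [0]                              # start: stack holds ROOT
    buffer = list(range(1, len(words) + 1))  # word indices, left to right
    arcs = {}
    for act in actions:
        if act == "S":        # shift: top of buffer -> top of stack
            stack.append(buffer.pop(0))
        elif act == "LA":     # pop two, add left arc: w-2 is a child of w-1
            w1, w2 = stack.pop(), stack.pop()
            arcs[w2] = w1
            stack.append(w1)
        elif act == "RA":     # pop two, add right arc: w-1 is a child of w-2
            w1, w2 = stack.pop(), stack.pop()
            arcs[w1] = w2
            stack.append(w2)
    return arcs
```

For "I ate some spaghetti bolognese", the 2n = 10 actions S S LA S S LA S RA RA RA reproduce the derivation on the following slides: I attaches to ate, some and bolognese attach to spaghetti, spaghetti attaches to ate, and ate attaches to ROOT.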
Arc-Standard Parsing
ROOT I ate some spaghetti bolognese
(S: top of buffer -> top of stack; LA: pop two, left arc between them; RA: pop two, right arc between them)
     [ROOT]          [I ate some spaghetti bolognese]
S →  [ROOT I]        [ate some spaghetti bolognese]
S →  [ROOT I ate]    [some spaghetti bolognese]
L →  [ROOT ate]      [some spaghetti bolognese]   (I is a child of ate)
‣ Could do the left arc later! But no reason to wait
‣ Can't attach ROOT <- ate yet even though this is a correct dependency!
Arc-Standard Parsing
ROOT I ate some spaghetti bolognese
(S: top of buffer -> top of stack; LA: pop two, left arc between them; RA: pop two, right arc between them)
       [ROOT ate]                  [some spaghetti bolognese]   (I is a child of ate)
S, S → [ROOT ate some spaghetti]   [bolognese]
L →    [ROOT ate spaghetti]        [bolognese]   (some is a child of spaghetti)
Arc-Standard Parsing
ROOT I ate some spaghetti bolognese
(S: top of buffer -> top of stack; LA: pop two, left arc between them; RA: pop two, right arc between them)
S →  [ROOT ate spaghetti bolognese]  []
R →  [ROOT ate spaghetti]            []   (bolognese is a child of spaghetti)
R →  [ROOT ate]                      []   (spaghetti is a child of ate)
R →  [ROOT]                          []   Final state (ate is a child of ROOT)
‣ Stack consists of all words that are still waiting for right children; end with a bunch of right-arc ops
Other Systems
‣ Arc-eager (Nivre, 2004): lets you add right arcs sooner and keeps items on the stack; separate reduce action that clears out the stack
‣ Arc-swift (Qi and Manning, 2017): explicitly choose a parent from what's on the stack
‣ Many ways to decompose these; which one works best depends on the language and features
Building Shift-Reduce Parsers
‣ Multi-way classification problem: shift, left-arc, or right-arc?
Stack: [ROOT]  Buffer: [I ate some spaghetti bolognese]
‣ How do we make the right decision in this case? Only one legal move (shift)
Stack: [ROOT ate some spaghetti] (I attached to ate)  Buffer: [bolognese]
‣ How do we make the right decision in this case? (all three actions legal)
‣ Correct action is left-arc
Features for Shift-Reduce Parsing
Stack: [ROOT ate some spaghetti] (I attached to ate)  Buffer: [bolognese]
‣ Features to know this should left-arc?
‣ One of the harder feature design tasks!
‣ In this case: the stack tag sequence VBD - DT - NN is pretty informative: it looks like a verb taking a direct object which has a determiner in it
‣ Things to look at: top words/POS of buffer, top words/POS of stack, leftmost and rightmost children of top items on the stack
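The feature ideas above (top of stack and buffer, their tags, and conjunctions) can be sketched as follows. The S0/S1/B0 naming is a common convention in the transition-based parsing literature, assumed here rather than taken from the lecture.

```python
def sr_features(stack, buffer, tags):
    """Feature sketch for one shift-reduce decision (illustrative templates).

    stack/buffer hold word strings (top of stack last, front of buffer
    first); tags maps word -> POS. S0 = stack top, S1 = next, B0 = buffer
    front.
    """
    feats = []
    if stack:                         # word and POS of the stack top
        feats += [f"S0w={stack[-1]}", f"S0t={tags[stack[-1]]}"]
    if len(stack) >= 2:               # second stack item + a tag conjunction
        feats += [f"S1t={tags[stack[-2]]}",
                  f"S1t+S0t={tags[stack[-2]]}+{tags[stack[-1]]}"]
    if buffer:                        # word and POS of the buffer front
        feats += [f"B0w={buffer[0]}", f"B0t={tags[buffer[0]]}"]
    return feats
```

On the slide's state, the conjunction S1t+S0t=DT+NN is exactly the kind of evidence that the determiner-noun pair on the stack should left-arc.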
Training a Greedy Model
Stack: [ROOT ate some spaghetti] (I attached to ate)  Buffer: [bolognese]
‣ The algorithm we've developed so far is an oracle: it tells us the correct state transition sequence for each tree
‣ Use our oracle to extract parser states + correct decisions
‣ Train a classifier to predict the right decision using these as training data
‣ Problems: no lookahead; training data is extracted assuming everything is correct
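The oracle described above can be sketched for the arc-standard system: given a gold projective tree, emit the action sequence that derives it. This is a minimal static oracle of my own (the dynamic oracle on the next slide generalizes it to states reached after mistakes); it reduces a dependent as soon as all of that dependent's own children have been collected.

```python
def static_oracle(gold_heads):
    """Action sequence deriving a gold projective tree (illustrative sketch).

    gold_heads[i] is the gold parent of word i (1-indexed); gold_heads[0] is
    unused (set it to -1), and a parent of 0 means ROOT. Assumes the tree is
    projective.
    """
    n = len(gold_heads) - 1
    remaining = [gold_heads[1:].count(i) for i in range(n + 1)]  # child counts
    stack, buffer, actions = [0], list(range(1, n + 1)), []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s1, s2 = stack[-1], stack[-2]
            if s2 != 0 and gold_heads[s2] == s1 and remaining[s2] == 0:
                actions.append("LA")    # s2 becomes a child of s1
                stack.pop(-2)
                remaining[s1] -= 1
                continue
            if gold_heads[s1] == s2 and remaining[s1] == 0:
                actions.append("RA")    # s1 becomes a child of s2
                stack.pop()
                remaining[s2] -= 1
                continue
        actions.append("S")             # otherwise shift
        stack.append(buffer.pop(0))
    return actions
```

Pairing each emitted action with the parser state it was taken from yields the (state, correct decision) training pairs for the classifier.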
Dynamic Oracle
Stack: [ROOT ate some spaghetti] (I attached to ate)  Buffer: [bolognese]
‣ Need a dynamic oracle to determine what the optimal thing to do is even if mistakes have already been made (so we know how to supervise it)
‣ Extract training data based on the oracle but also on an execution trace of a trained parser (Goldberg and Nivre, 2012)
‣ We'll see similar ideas in neural net contexts as well
Speed Tradeoffs
[Figure: parsing speeds of unoptimized shift-reduce, optimized shift-reduce, graph-based, and neural shift-reduce parsers; Chen and Manning (2014)]
‣ Optimized constituency parsers are ~5 sentences/sec
‣ Using S-R used to mean taking a performance hit compared to graph-based; that's no longer true
Global Decoding
Stack: [ROOT ate some spaghetti] (I attached to ate)  Buffer: [bolognese]
‣ Try to find the highest-scoring sequence of decisions
‣ Global search problem, requires approximate search
Global Decoding
ROOT I gave him dinner
‣ Correct: Right-arc, Shift, Right-arc, Right-arc
      [ROOT gave him]     [dinner]   (I attached to gave)
RA →  [ROOT gave]         [dinner]   (him attached to gave)
S  →  [ROOT gave dinner]  []
RA →  [ROOT gave]         []         (dinner attached to gave)
Global Decoding: A Cartoon
ROOT I gave him dinner
From state [ROOT gave him] [dinner] (I attached to gave):
‣ S → [ROOT gave him dinner] [], then LA or RA: both wrong! Also both probably low scoring!
‣ RA → [ROOT gave] [dinner] (him attached to gave), then S: the correct, high-scoring option
Global Decoding: A Cartoon
ROOT I gave him dinner
State: [ROOT gave him]  [dinner]
‣ Lookahead can help us avoid getting stuck in bad spots
‣ Global model: maximize the sum of scores over all decisions
‣ Similar to how Viterbi works: we maintain uncertainty over the current state so that if another one looks more optimal going forward, we can use that one
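Maintaining uncertainty over the current state is typically implemented with beam search over action sequences. A minimal generic sketch, assuming caller-supplied callbacks (none of these names come from the lecture):

```python
import heapq

def beam_search(init_state, legal_actions, score_action, apply_action,
                is_final, beam_size=8):
    """Global decoding sketch: keep the top-k action sequences by summed score.

    Assumed callback signatures: legal_actions(state) -> list,
    score_action(state, action) -> float, apply_action(state, action) ->
    new state, is_final(state) -> bool.
    """
    beam = [(0.0, init_state, [])]   # (summed score, state, action history)
    while not all(is_final(state) for _, state, _ in beam):
        candidates = []
        for total, state, hist in beam:
            if is_final(state):      # finished hypotheses compete unchanged
                candidates.append((total, state, hist))
                continue
            for act in legal_actions(state):
                candidates.append((total + score_action(state, act),
                                   apply_action(state, act), hist + [act]))
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])
```

With beam_size=1 this is exactly the greedy parser; larger beams provide the lookahead described above, since a prefix that looks locally worse can survive and win on total score.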
Recap
‣ Eisner's algorithm for graph-based parsing
‣ Arc-standard system for transition-based parsing
‣ Run a classifier and do it greedily for now; we'll see global systems next time