Algorithms for NLP (11-711, Fall 2017), Lecture 15 · 2017-10-19


Parsing VI
Taylor Berg-Kirkpatrick – CMU
Slides: Dan Klein – UC Berkeley

Algorithms for NLP

P1 Shout-outs
§ Saksham Singhal -- implemented pseudo-tries. Used implicit caching (stored the most frequent n-grams on top of hash tables) and explicit caching.
§ Soumya Wadhwa, Tejas Nama -- approximated by ignoring all trigrams with count 1. That dropped BLEU score by less than 0.1 only but freed half the memory!
§ Craig Stewart -- rehash annealing idea. Made resizing factor and load factor change with every rehash to converge to 0.9 load factor to minimize wasted space.
§ Griffin Thomas Adams -- built a "waterfall" tiered cache system
§ Dean Alderucci -- built a class to pack data types of arbitrary size into an array of longs. Built a custom implementation of log that ran faster.
§ Robin Jonathan Algayres -- context trie!
§ Raghuram Mandyam Annasamy -- used database-inspired sharding technique on keys
§ Xianyang Chen -- compressed hash table and did smarter binary search by indexing chunks with the same last word
§ Aldrian Obaja -- implemented NestedMap, achieving 792 MB of memory.
§ Other things many people did -- LRU caching, packing multiple values (counts and context fertilities) into a single long, binary search instead of hash table.

Grammar Projections

Example coarse rule: NP → DT @NP

Coarse grammar symbols: DT, NP, JJ, @NP, NN
Fine grammar symbols: DT^NP, NP^VP, JJ^NP, NN^NP, @NP^VP[DT], @NP^VP[…,NN], @NP^VP[…,JJ]

Fine rule NP^VP → DT^NP @NP^VP[DT] projects to coarse rule NP → DT @NP

Note: X-Bar grammars are projections with rules like XP → Y @X or XP → @X Y or @X → X

Efficient Parsing for Structural Annotation

Coarse-to-Fine Pruning

coarse: … QP NP VP …
fine: the corresponding refined symbols

E.g. consider the span 5 to 12: prune any fine item whose coarse counterpart has posterior P(X | i, j, S) < threshold

Coarse-to-Fine Pruning

For each coarse chart item X[i, j], compute posterior probability:

P(X | i, j, S) = α(X, i, j) · β(X, i, j) / α(root, 0, n)

E.g. consider the span 5 to 12: skip fine items whose coarse posterior is < threshold

Computing Marginals

α(X, i, j) = Σ_{X→YZ} Σ_{k∈(i,j)} P(X → Y Z) · α(Y, i, k) · α(Z, k, j)

Computing Marginals

β(X, i, j) = Σ_{Y→ZX} Σ_{k∈[0,i)} P(Y → Z X) · β(Y, k, j) · α(Z, k, i)
           + Σ_{Y→XZ} Σ_{k∈(j,n]} P(Y → X Z) · β(Y, i, k) · α(Z, j, k)
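As a concrete illustration, the two recursions can be written out for a toy CNF PCFG. The grammar, lexicon, and function names below are invented for this sketch; α is the inside score and β the outside score, matching the equations above.

```python
from collections import defaultdict

# Toy CNF PCFG (invented): binary rules (X, Y, Z, prob) and a preterminal lexicon.
binary = [("S", "NP", "VP", 1.0), ("NP", "DT", "NN", 1.0)]
lexicon = {("DT", "the"): 1.0, ("NN", "dog"): 1.0, ("VP", "barks"): 1.0}

def inside(sent):
    """alpha(X, i, j): total probability of X deriving words i..j."""
    n = len(sent)
    a = defaultdict(float)
    for i, w in enumerate(sent):
        for (X, word), p in lexicon.items():
            if word == w:
                a[(X, i, i + 1)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for (X, Y, Z, p) in binary:
                for k in range(i + 1, j):  # split point k in (i, j)
                    a[(X, i, j)] += p * a[(Y, i, k)] * a[(Z, k, j)]
    return a

def outside(sent, a):
    """beta(X, i, j): total probability of everything outside the span."""
    n = len(sent)
    b = defaultdict(float)
    b[("S", 0, n)] = 1.0
    for span in range(n, 1, -1):           # widest parents first
        for i in range(n - span + 1):
            j = i + span
            for (X, Y, Z, p) in binary:
                if b[(X, i, j)] == 0.0:
                    continue
                for k in range(i + 1, j):
                    # each child gets: beta(parent) * P(rule) * alpha(sibling)
                    b[(Y, i, k)] += b[(X, i, j)] * p * a[(Z, k, j)]
                    b[(Z, k, j)] += b[(X, i, j)] * p * a[(Y, i, k)]
    return b

sent = ["the", "dog", "barks"]
a = inside(sent)
b = outside(sent, a)
# Posterior used for pruning: alpha * beta / alpha(root, 0, n)
post = {key: a[key] * b[key] / a[("S", 0, len(sent))] for key in a}
```

In a coarse-to-fine parser these posteriors are computed on the coarse grammar, and any span/symbol pair whose posterior falls below the threshold is skipped in the fine pass.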

Efficient Parsing for Lexical Grammars

Lexicalized Trees
§ Add "head words" to each phrasal node
§ Syntactic vs. semantic heads
§ Headship not in (most) treebanks
§ Usually use head rules, e.g.:
  § NP: Take leftmost NP; Take rightmost N*; Take rightmost JJ; Take right child
  § VP: Take leftmost VB*; Take leftmost VP; Take left child
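A minimal sketch of head-rule application, loosely following the NP/VP rules above; the rule table and tree encoding are simplifications invented for illustration, not the exact Collins rules.

```python
# Priority-ordered head rules: (search direction, label prefixes to look for).
# The empty prefix "" matches anything (the "take left/right child" fallback).
HEAD_RULES = {
    "NP": [("left", ["NP"]), ("right", ["NN"]), ("right", ["JJ"]), ("right", [""])],
    "VP": [("left", ["VB"]), ("left", ["VP"]), ("left", [""])],
}

def head_child(label, children):
    """Pick the head child of a phrase using priority-ordered head rules."""
    labels = [c[0] for c in children]
    for direction, prefixes in HEAD_RULES.get(label, [("right", [""])]):
        order = range(len(children)) if direction == "left" else reversed(range(len(children)))
        for i in order:
            if any(labels[i].startswith(p) for p in prefixes):
                return children[i]
    return children[-1]

def head_word(tree):
    """Recursively propagate head words up the tree.
    Trees are (label, [children]) for phrases, (tag, word) for preterminals."""
    label, rest = tree[0], tree[1]
    if isinstance(rest, str):       # preterminal: (tag, word)
        return rest
    return head_word(head_child(label, rest))

np = ("NP", [("DT", "the"), ("JJ", "big"), ("NN", "dog")])
vp = ("VP", [("VBD", "saw"), np])
print(head_word(vp))   # "saw": leftmost VB* child heads the VP
print(head_word(np))   # "dog": rightmost N* heads the NP
```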

Lexicalized PCFGs?
§ Problem: we now have to estimate probabilities like
§ Never going to get these atomically off of a treebank
§ Solution: break up derivation into smaller steps

Lexical Derivation Steps
§ A derivation of a local tree [Collins 99]

Choose a head tag and word

Choose a complement bag

Generate children (incl. adjuncts)

Recursively derive children

Lexicalized CKY

bestScore(X, i, j, h)
  if (j = i + 1)
    return tagScore(X, s[i])
  else
    return max over k, h', X → Y Z of:
      score(X[h] → Y[h] Z[h']) * bestScore(Y, i, k, h) * bestScore(Z, k, j, h')
      score(X[h] → Y[h'] Z[h]) * bestScore(Y, i, k, h') * bestScore(Z, k, j, h)

[diagram: Y[h] over (i, k) combines with Z[h'] over (k, j) to build X[h]; e.g. (VP → VBD •)[saw] combines with NP[her] to give (VP → VBD ... NP •)[saw]]
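The pseudocode above can be fleshed out as a naive (roughly O(n^5)) memoized implementation over a toy lexicalized grammar. The grammar, scores, and head-side encoding are invented for illustration.

```python
from functools import lru_cache

# Toy lexicalized grammar (invented): rule scores for X[h] -> Y Z where the
# parent's head comes from the left or right child, plus preterminal tag scores.
sent = ("the", "dog", "barks")
tag_score = {("DT", 0): 1.0, ("NN", 1): 1.0, ("VBZ", 2): 1.0}
rules = {  # (parent, left child, right child, side of the head child): score
    ("NP", "DT", "NN", "right"): 0.9,
    ("S", "NP", "VBZ", "right"): 0.8,
}

@lru_cache(maxsize=None)
def best_score(X, i, j, h):
    """Best score of an X constituent over span (i, j) headed at position h."""
    if j == i + 1:
        return tag_score.get((X, i), 0.0) if h == i else 0.0
    best = 0.0
    for (P, Y, Z, side), sc in rules.items():
        if P != X:
            continue
        for k in range(i + 1, j):           # split point
            for h2 in range(i, j):          # head of the non-head child
                if side == "left":          # X[h] -> Y[h] Z[h2]
                    cand = sc * best_score(Y, i, k, h) * best_score(Z, k, j, h2)
                else:                       # X[h] -> Y[h2] Z[h]
                    cand = sc * best_score(Y, i, k, h2) * best_score(Z, k, j, h)
                best = max(best, cand)
    return best

# Best S over the whole sentence, headed by "barks" (position 2): 0.8 * 0.9
print(best_score("S", 0, 3, 2))
```

The two inner loops over k and the other child's head h2 are exactly what makes this quintic; Eisner's trick and beams (next slides) attack those loops.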

Quartic Parsing
§ Turns out, you can do (a little) better [Eisner 99]
§ Gives an O(n^4) algorithm
§ Still prohibitive in practice if not pruned

[diagram: instead of combining Y[h] over (i, k) with Z[h'] over (k, j) directly, first collapse Z[h'] to the best headless item Z over (k, j), so only (i, h, k, j) need be tracked]

Pruning with Beams
§ The Collins parser prunes with per-cell beams [Collins 99]
  § Essentially, run the O(n^5) CKY, but remember only a few hypotheses for each span <i, j>
  § If we keep K hypotheses at each span, then we do at most O(nK^2) work per span (why?)
§ Keeps things more or less cubic (and in practice is more like linear!)
§ Also: certain spans are forbidden entirely on the basis of punctuation (crucial for speed)
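A per-cell beam is easy to sketch; the chart encoding below is illustrative, not the Collins parser's actual data structures.

```python
import heapq
from collections import defaultdict

# Per-cell beams in the spirit of the Collins parser: keep only the K
# highest-scoring hypotheses for each span <i, j>.
K = 2

def prune_to_beam(chart, k=K):
    """chart maps (i, j) -> list of (score, hypothesis); keep top-k per span."""
    pruned = defaultdict(list)
    for span, hyps in chart.items():
        pruned[span] = heapq.nlargest(k, hyps)  # top-k by score
    return pruned

chart = {
    (0, 2): [(0.9, "NP"), (0.05, "S"), (0.01, "FRAG")],
    (2, 5): [(0.7, "VP"), (0.6, "S")],
}
beamed = prune_to_beam(chart)
print(beamed[(0, 2)])   # only the two best hypotheses survive
```

With at most K entries per cell, combining a left and a right cell considers at most K * K pairs, which is where the O(nK^2) work-per-span figure comes from.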

Pruning with a PCFG
§ The Charniak parser prunes using a two-pass, coarse-to-fine approach [Charniak 97+]
  § First, parse with the base grammar
  § For each X:[i,j] calculate P(X | i, j, s)
  § This isn't trivial, and there are clever speedups
  § Second, do the full O(n^5) CKY
  § Skip any X:[i,j] which had low (say, < 0.0001) posterior
  § Avoids almost all work in the second phase!
§ Charniak et al 06: can use more passes
§ Petrov et al 07: can use many more passes

Results
§ Some results
  § Collins 99 – 88.6 F1 (generative lexical)
  § Charniak and Johnson 05 – 89.7 / 91.3 F1 (generative lexical / reranked)
  § Petrov et al 06 – 90.7 F1 (generative unlexicalized)
  § McClosky et al 06 – 92.1 F1 (gen + rerank + self-train)

Latent Variable PCFGs

The Game of Designing a Grammar
§ Annotation refines base treebank symbols to improve statistical fit of the grammar
§ Parent annotation [Johnson '98]
§ Head lexicalization [Collins '99, Charniak '00]
§ Automatic clustering?

Latent Variable Grammars

[figure: a parse tree over the sentence, the corresponding derivations over refined subcategories, and the grammar parameters]

Learning Latent Annotations

EM algorithm:
§ Brackets are known
§ Base categories are known
§ Only induce subcategories

Just like Forward-Backward for HMMs: forward (inside) and backward (outside) passes over the latent subcategories X1 … X7 of the observed tree for "He was right."

Refinement of the DT tag

DT → DT-1, DT-2, DT-3, DT-4

Hierarchical refinement

Hierarchical Estimation Results

[plot: parsing accuracy (F1), 74–90, vs. total number of grammar symbols, 100–1700]

Model                 | F1
Flat Training         | 87.3
Hierarchical Training | 88.4

Refinement of the , tag
§ Splitting all categories equally is wasteful:

Adaptive Splitting
§ Want to split complex categories more
§ Idea: split everything, roll back splits which were least useful

Adaptive Splitting Results

Model            | F1
Previous         | 88.4
With 50% Merging | 89.5

Number of Phrasal Subcategories

[bar chart: learned subcategory counts (0–40) per phrasal category, roughly decreasing: NP, VP, PP, ADVP, S, ADJP, SBARQ, WHNP, PRN, NX, SINV, PRT, WHPP, SQ, CONJP, FRAG, NAC, UCP, WHADVP, INTJ, SBAR, RRC, WHADJP, X, ROOT, LST]

Number of Lexical Subcategories

[bar chart: learned subcategory counts (0–70) per POS tag, roughly decreasing: NNP, JJ, NNS, NN, VBN, RB, VBG, VB, VBD, CD, IN, VBZ, VBP, DT, NNPS, CC, JJR, JJS, :, PRP, PRP$, MD, RBR, WP, POS, PDT, WRB, -LRB-, ., EX, WP$, WDT, -RRB-, '', FW, RBS, TO, $, UH, ,, ``, SYM, RP, LS, #]

Learned Splits

§ Proper Nouns (NNP):

NNP-14 | Oct. Nov. Sept.
NNP-12 | John Robert James
NNP-2  | J. E. L.
NNP-1  | Bush Noriega Peters
NNP-15 | New San Wall
NNP-3  | York Francisco Street

§ Personal pronouns (PRP):

PRP-0 | It He I
PRP-1 | it he they
PRP-2 | it them him

§ Relative adverbs (RBR):

RBR-0 | further lower higher
RBR-1 | more less More
RBR-2 | earlier Earlier later

§ Cardinal Numbers (CD):

CD-7  | one two Three
CD-4  | 1989 1990 1988
CD-11 | million billion trillion
CD-0  | 1 50 100
CD-3  | 1 30 31
CD-9  | 78 58 34


Final Results (Accuracy)

     Parser                              | ≤ 40 words F1 | all F1
ENG  Charniak & Johnson '05 (generative) | 90.1          | 89.6
ENG  Split / Merge                       | 90.6          | 90.1
GER  Dubey '05                           | 76.3          | -
GER  Split / Merge                       | 80.8          | 80.1
CHN  Chiang et al. '02                   | 80.0          | 76.6
CHN  Split / Merge                       | 86.3          | 83.4

Still higher numbers from reranking / self-training methods

Efficient Parsing for Hierarchical Grammars

Coarse-to-Fine Inference
§ Example: PP attachment

[figure: candidate PP attachments marked with question marks]

Hierarchical Pruning

coarse:         … QP NP VP …
split in two:   … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:  … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight: … (and so on)
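One way to sketch the hierarchical pruning step: a refined item is considered only if its projection into the previous, coarser grammar survived pruning there. The symbol encoding below (base label plus subcategory index, with children 2k and 2k+1 projecting back to k) is an assumption for illustration.

```python
def project(base, idx):
    """One level coarser: subcategories 2k and 2k+1 both came from splitting k."""
    return (base, idx // 2)

def allowed_fine_items(coarse_survivors, fine_items):
    """Keep a refined chart item only if its coarse projection survived pruning.
    fine_items: list of (span, base, idx); coarse_survivors: set of (span, (base, idx))."""
    return [(span, base, idx) for (span, base, idx) in fine_items
            if (span, project(base, idx)) in coarse_survivors]

# After pruning the split-in-two pass, suppose only NP_1 survived on span (0, 2):
coarse_survivors = {((0, 2), ("NP", 1))}
fine_items = [((0, 2), "NP", 2), ((0, 2), "NP", 3), ((0, 2), "NP", 0)]
print(allowed_fine_items(coarse_survivors, fine_items))
# only the refinements of the surviving NP_1 pass through to the next pass
```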

Bracket Posteriors

Parsing time with successive coarse-to-fine passes: 1621 min → 111 min → 35 min → 15 min (no search error)

Other Syntactic Models

Dependency Parsing
§ Lexicalized parsers can be seen as producing dependency trees
§ Each local binary tree corresponds to an attachment in the dependency graph

        questioned
       /          \
   lawyer        witness
     |              |
    the            the

Dependency Parsing
§ Pure dependency parsing is only cubic [Eisner 99]
§ Some work on non-projective dependencies
  § Common in, e.g., Czech parsing
  § Can do with MST algorithms [McDonald and Pereira 05]

[diagram: the lexicalized combination tracks (i, h, k, h', j); the dependency-only combination needs just (h, k, h')]

Shift-Reduce Parsers
§ Another way to derive a tree:
§ Parsing
  § No useful dynamic programming search
  § Can still use beam search [Ratnaparkhi 97]
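A toy shift-reduce derivation makes the action inventory concrete; the action sequence here is hand-written rather than predicted by a model, and the tree encoding is invented.

```python
# A minimal shift-reduce derivation of a constituency tree: SHIFT moves the
# next word onto the stack; REDUCE-X pops the top two stack items and
# combines them under label X.
def shift_reduce(words, actions):
    stack, buf = [], list(words)
    for act in actions:
        if act == "SHIFT":
            stack.append(buf.pop(0))
        else:                        # e.g. "REDUCE-NP"
            label = act.split("-", 1)[1]
            right = stack.pop()
            left = stack.pop()
            stack.append((label, left, right))
    assert len(stack) == 1 and not buf   # a complete derivation
    return stack[0]

tree = shift_reduce(
    ["the", "dog", "barks"],
    ["SHIFT", "SHIFT", "REDUCE-NP", "SHIFT", "REDUCE-S"],
)
print(tree)   # ('S', ('NP', 'the', 'dog'), 'barks')
```

A real parser scores each action with a classifier and keeps a beam of partial derivations, since no useful dynamic program covers the action sequences.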

Tree Insertion Grammars
§ Rewrite large (possibly lexicalized) subtrees in a single step
§ Formally, a tree-insertion grammar
§ Derivational ambiguity whether subtrees were generated atomically or compositionally
§ Most probable parse is NP-complete

TIG: Insertion

Tree-adjoining grammars
§ Start with local trees
§ Can insert structure with adjunction operators
§ Mildly context-sensitive
§ Models long-distance dependencies naturally
§ … as well as other weird stuff that CFGs don't capture well (e.g. cross-serial dependencies)

TAG: Long Distance

CCG Parsing
§ Combinatory Categorial Grammar
  § Fully (mono-)lexicalized grammar
  § Categories encode argument sequences
  § Very closely related to the lambda calculus (more later)
  § Can have spurious ambiguities (why?)

Empty Elements
§ In the PTB, three kinds of empty elements:
  § Null items (usually complementizers)
  § Dislocation (WH-traces, topicalization, relative clause and heavy NP extraposition)
  § Control (raising, passives, control, shared argumentation)
§ Need to reconstruct these (and resolve any indexation)

Example: English

Example: German

Types of Empties

A Pattern-Matching Approach [Johnson 02]

Pattern-Matching Details
§ Something like transformation-based learning
§ Extract patterns
  § Details: transitive verb marking, auxiliaries
  § Details: legal subtrees
§ Rank patterns
  § Pruning ranking: by correct / match rate
  § Application priority: by depth
§ Pre-order traversal
§ Greedy match

Top Patterns Extracted

Results

Semantic Roles

Semantic Role Labeling (SRL)

§ Characterize clauses as relations with roles:
§ Says more than which NP is the subject (but not much more):
  § Relations like subject are syntactic, relations like agent or message are semantic
§ Typical pipeline:
  § Parse, then label roles
  § Almost all errors locked in by parser
  § Really, SRL is quite a lot easier than parsing

SRL Example

PropBank / FrameNet
§ FrameNet: roles shared between verbs
§ PropBank: each verb has its own roles
§ PropBank more used, because it's layered over the treebank (and so has greater coverage, plus parses)
§ Note: some linguistic theories postulate fewer roles than FrameNet (e.g. 5–20 total: agent, patient, instrument, etc.)

PropBank Example

Shared Arguments

Path Features

Results

§ Features:
  § Path from target to filler
  § Filler's syntactic type, headword, case
  § Target's identity
  § Sentence voice, etc.
  § Lots of other second-order features

§ Gold vs parsed source trees

§ SRL is fairly easy on gold trees

§ Harder on automatic parses
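The path feature above can be sketched as a walk up from the predicate to the lowest common ancestor, then down to the argument. The parent-pointer tree encoding and the `^` / `!` arrow notation below are illustrative stand-ins for the usual up/down arrows.

```python
# Parent pointers for a tiny tree; ":suffix" ids keep duplicate labels distinct.
parent = {"VBD:saw": "VP", "VP": "S", "NP:subj": "S", "NP:obj": "VP"}
label = lambda n: n.split(":")[0]

def path_to_root(node):
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def path_feature(source, target):
    """Syntactic path feature: '^' steps up to the lowest common ancestor,
    then '!' steps down to the target."""
    up = path_to_root(source)
    down = path_to_root(target)
    common = next(n for n in up if n in set(down))   # lowest common ancestor
    ups = up[: up.index(common) + 1]
    downs = list(reversed(down[: down.index(common)]))
    return "^".join(label(n) for n in ups) + "".join("!" + label(n) for n in downs)

# Path from the predicate "saw" up through VP and S, down to its subject NP:
print(path_feature("VBD:saw", "NP:subj"))   # VBD^VP^S!NP
```

Because the feature is read off the parse tree, parser errors corrupt it directly, which is one reason SRL degrades on automatic parses.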

Parse Reranking
§ Assume the number of parses is very small
§ We can represent each parse T as a feature vector φ(T)
  § Typically, all local rules are features
  § Also non-local features, like how right-branching the overall tree is
  § [Charniak and Johnson 05] gives a rich set of features
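Reranking itself is then just an argmax of w · φ(T) over the small candidate list; the features and weights below are invented for the sketch.

```python
# Linear reranker: score each candidate parse by w . phi(T), keep the best.
def score(w, phi):
    """Sparse dot product of weight vector and feature vector (both dicts)."""
    return sum(w.get(f, 0.0) * v for f, v in phi.items())

def rerank(w, candidates):
    """candidates: list of (parse, feature dict); return the highest-scoring parse."""
    return max(candidates, key=lambda c: score(w, c[1]))[0]

w = {"log_p_parser": 1.0, "right_branching": 0.5}   # invented weights
candidates = [
    ("T1", {"log_p_parser": -2.0, "right_branching": 3.0}),
    ("T2", {"log_p_parser": -1.5, "right_branching": 0.0}),
]
print(rerank(w, candidates))   # T1: -2.0 + 1.5 = -0.5 beats T2's -1.5
```

The base parser's log probability is typically one feature among many, so the reranker can only fix errors that survive into the k-best list.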

K-Best Parsing [Huang and Chiang 05; Pauls, Klein, Quirk 10]
