Probabilistic Parsing
Chapter 14, Part 2
•This slide set was adapted from J. Martin, R. Mihalcea, Rebecca Hwa, and Ray Mooney
Vanilla PCFG Limitations
• Since the probabilities of productions do not depend on specific words or concepts, only general structural disambiguation is possible (e.g., a preference for attaching PPs to Nominals).
• Consequently, vanilla PCFGs cannot resolve syntactic ambiguities that require semantics to resolve, e.g., "ate with a fork" vs. "ate with meatballs".
• To work well, PCFGs must be lexicalized.
Example of Importance of Lexicalization
• A general preference for attaching PPs to NPs rather than VPs can be learned by a vanilla PCFG.
• But the desired preference can depend on specific words.
Grammar for the English PCFG parser (production, probability):

  S → NP VP       0.9
  S → VP          0.1
  NP → Det A N    0.5
  NP → NP PP      0.3
  NP → PropN      0.2
  A → ε           0.6
  A → Adj A       0.4
  PP → Prep NP    1.0
  VP → V NP       0.7
  VP → VP PP      0.3

Input sentence: "John put the dog in the pen."

Desired parse, with the PP attached to the VP:

  (S (NP John)
     (VP (V put) (NP the dog) (PP in the pen)))
Example of Importance of Lexicalization
• A general preference for attaching PPs to NPs rather than VPs can be learned by a vanilla PCFG.
• But the desired preference can depend on specific words.
Using the same grammar, the learned preference for NP attachment leads the PCFG parser to attach the PP inside the object NP instead, producing the incorrect parse (marked X on the slide):

  X (S (NP John)
       (VP (V put) (NP (NP the dog) (PP in the pen))))
Head Words
• Syntactic phrases usually have a word in them that is most “central” to the phrase.
• Linguists have defined the concept of a lexical head of a phrase.
• Simple rules can identify the head of any phrase by percolating head words up the parse tree:
  – The head of a VP is its main verb.
  – The head of an NP is its main noun.
  – The head of a PP is its preposition.
  – The head of a sentence is the head of its VP.
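The percolation rules above can be sketched as a small recursive function. The tree encoding and the rule table below are illustrative assumptions, not the textbook's implementation:

```python
# Simplified head-percolation table: for each non-terminal, which
# child category supplies the head word (an illustrative subset).
HEAD_RULES = {"S": "VP", "VP": "V", "NP": "NN", "PP": "IN"}

def find_head(tree):
    """Return the head word of a tree encoded as (label, children)
    for internal nodes and (tag, word) for preterminals."""
    label, rest = tree
    if isinstance(rest, str):        # preterminal: (tag, word)
        return rest
    target = HEAD_RULES.get(label)
    for child in rest:
        if child[0] == target:       # head child found by rule
            return find_head(child)
    return find_head(rest[-1])       # fallback: rightmost child

tree = ("S",
        [("NP", [("DT", "the"), ("NN", "man")]),
         ("VP", [("V", "bought"),
                 ("NP", [("DT", "the"), ("NN", "book")])])])
print(find_head(tree))  # -> bought
```

Real parsers use much richer head tables (e.g. handling coordination and punctuation), but the percolation idea is the same.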
Head propagation in parse trees
For "the man bought the book", head words percolate up as follows:

  (S(bought)
     (NP(man) (DT(the) the) (NN(man) man))
     (VP(bought)
        (V(bought) bought)
        (NP(book) (DT(the) the) (NN(book) book))))
Lexicalized Productions
• Specialized productions can be generated by including each non-terminal's head word and its POS as part of that non-terminal's symbol.
Parse of "John liked the dog in the pen" with lexicalized non-terminals:

  (S(liked-VBD)
     (NP(John-NNP) (NNP John))
     (VP(liked-VBD)
        (VBD liked)
        (NP(dog-NN)
           (DT the)
           (Nominal(dog-NN)
              (Nominal(dog-NN) (NN dog))
              (PP(in-IN)
                 (IN in)
                 (NP(pen-NN) (DT the) (Nominal(pen-NN) (NN pen))))))))

Example lexicalized production:

  Nominal(dog-NN) → Nominal(dog-NN) PP(in-IN)
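Lexicalizing a tree mechanically amounts to percolating heads upward and appending the head word and tag to each non-terminal label. The nested-tuple tree format and the small head table below are assumptions for illustration:

```python
# Simplified head table: candidate head-child categories per label.
HEAD_CHILD = {"S": ["VP"], "VP": ["VBD"], "NP": ["Nominal", "NNP"],
              "Nominal": ["Nominal", "NN"], "PP": ["IN"]}

def lexicalize(tree):
    """Return (annotated_tree, (head_word, head_tag)); non-terminal
    labels become e.g. 'NP(dog-NN)'. Nodes are (label, children) or
    (tag, word) preterminals."""
    label, rest = tree
    if isinstance(rest, str):                 # preterminal stays as-is
        return (label, rest), (rest, label)
    annotated, heads = zip(*(lexicalize(c) for c in rest))
    head = heads[-1]                          # default: rightmost child
    for target in HEAD_CHILD.get(label, []):
        for child, h in zip(rest, heads):
            if child[0] == target:
                head = h
                break
        else:
            continue
        break
    word, tag = head
    return (f"{label}({word}-{tag})", list(annotated)), head

tree = ("S",
        [("NP", [("NNP", "John")]),
         ("VP", [("VBD", "liked"),
                 ("NP", [("DT", "the"),
                         ("Nominal",
                          [("Nominal", [("NN", "dog")]),
                           ("PP", [("IN", "in"),
                                   ("NP", [("DT", "the"),
                                           ("Nominal", [("NN", "pen")])])])])])])])
annotated, head = lexicalize(tree)
print(annotated[0])  # -> S(liked-VBD)
```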
Lexicalized Productions
Parse of "John put the dog in the pen" with lexicalized non-terminals (here the PP attaches to the VP):

  (S(put-VBD)
     (NP(John-NNP) (NNP John))
     (VP(put-VBD)
        (VP(put-VBD)
           (VBD put)
           (NP(dog-NN) (DT the) (Nominal(dog-NN) (NN dog))))
        (PP(in-IN)
           (IN in)
           (NP(pen-NN) (DT the) (Nominal(pen-NN) (NN pen))))))

Example lexicalized production:

  VP(put-VBD) → VP(put-VBD) PP(in-IN)
Parameters
• We used to have:
  – VP → V NP PP with probability P(r | VP).
  – That is the count of this rule divided by the number of VPs in a treebank.
• Now we have:
  – VP(dumped-VBD) → V(dumped-VBD) NP(sacks-NNS) PP(into-P)
  – P(r | VP whose head is dumped-VBD), where dumped is the verb, sacks is the head of the NP, and into is the head of the PP.
  – Such a specific rule is not likely to have significant counts in any treebank.
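The unlexicalized estimate described above (rule count divided by left-hand-side count) can be sketched directly. The toy treebank productions below are invented for illustration:

```python
from collections import Counter

# Toy list of productions observed in a treebank (invented data).
rules = [("VP", ("V", "NP", "PP")),
         ("VP", ("V", "NP")),
         ("VP", ("V", "NP")),
         ("NP", ("DT", "NN"))]

rule_counts = Counter(rules)
lhs_counts = Counter(lhs for lhs, _ in rules)

def mle(lhs, rhs):
    """P(lhs -> rhs | lhs) = count(rule) / count(lhs expansions)."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

print(mle("VP", ("V", "NP", "PP")))  # 1 of 3 VP expansions
```

With fully lexicalized non-terminals the keys become so specific that most counts are 0 or 1, which is exactly the sparsity problem the slide describes.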
Challenges for Probabilistic Lexicalized Grammar
• Huge number of rules
• Most rules are too specific, so it is hard to make good probability estimates for them
• Need to make more manageable independence assumptions
• Many approaches have been developed; we'll look at one (a simplified version).
Collins Parser (1997)
• Replaces production rules with a model of bilexical relationships.
• Focuses on head and dependency relationships (see notes in lecture).
• Generates parses in smaller steps: generate the head, then generate its siblings, assuming they are conditionally independent given the head.
Dependency Representation
• Each dependency is represented as an 8-tuple:

  (head word, head tag, modifier word, modifier tag,
   parent non-terminal, head non-terminal,
   modifier non-terminal, direction)

Example lexicalized CFG rule:

  VP(bought, VBD) → V(bought, VBD) NP(book, NN) PP(with, Prep)

decomposes into two dependencies:

  (bought, VBD, book, NN, VP, V, NP, right)
  (bought, VBD, with, Prep, VP, V, PP, right)
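The decomposition into 8-tuples can be sketched as follows; the tuple and function names are my own, not from the text:

```python
from collections import namedtuple

Dependency = namedtuple(
    "Dependency",
    "head_word head_tag mod_word mod_tag parent_nt head_nt mod_nt direction")

def decompose(parent_nt, head, right_mods):
    """Split a lexicalized rule into one dependency per modifier.
    head and each modifier are (non-terminal, word, tag) triples;
    only right-side modifiers are handled, as in the example."""
    head_nt, head_word, head_tag = head
    return [Dependency(head_word, head_tag, word, tag,
                       parent_nt, head_nt, mod_nt, "right")
            for mod_nt, word, tag in right_mods]

deps = decompose("VP", ("V", "bought", "VBD"),
                 [("NP", "book", "NN"), ("PP", "with", "Prep")])
print(deps[0])
```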
Collins Model (see lecture)

For VP(bought, VBD) → V(bought, VBD) NP(book, NN) PP(with, Prep), using the 8-tuple notation (head word, head tag, modifier word, modifier tag, parent non-terminal, head non-terminal, modifier non-terminal, direction), the rule probability is a product of head- and modifier-generation terms:

1. Pr_head(headNT | parentNT, headword, headtag) ×
2. Pr_left(modNT, modword, modtag | parentNT, headNT, headword, headtag) × … ×
3. Pr_left(STOP | parentNT, headNT, headword, headtag) ×
4. Pr_right(modNT, modword, modtag | parentNT, headNT, headword, headtag) × … ×
5. Pr_right(STOP | parentNT, headNT, headword, headtag)

For the example rule:

  Pr_head(V | VP, bought, VBD) ×
  Pr_left(STOP | VP, V, bought, VBD) ×
  Pr_right(NP, book, NN | VP, V, bought, VBD) ×
  Pr_right(PP, with, Prep | VP, V, bought, VBD) ×
  Pr_right(STOP | VP, V, bought, VBD)
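The rule probability is then just the product of those five terms. In this sketch the probability tables and their values are invented:

```python
def rule_prob(parent, head, left_mods, right_mods,
              pr_head, pr_left, pr_right):
    """Collins-style rule probability: generate the head child, then
    each left/right modifier (plus STOP) conditionally independently
    given the head. Tables map tuples to probabilities (toy values)."""
    head_nt, word, tag = head
    ctx = (parent, head_nt, word, tag)
    p = pr_head[(head_nt, parent, word, tag)]
    for mod in left_mods + [("STOP",)]:
        p *= pr_left[mod + ctx]
    for mod in right_mods + [("STOP",)]:
        p *= pr_right[mod + ctx]
    return p

# Invented probabilities for the example rule:
pr_head = {("V", "VP", "bought", "VBD"): 0.8}
pr_left = {("STOP", "VP", "V", "bought", "VBD"): 0.9}
pr_right = {("NP", "book", "NN", "VP", "V", "bought", "VBD"): 0.05,
            ("PP", "with", "Prep", "VP", "V", "bought", "VBD"): 0.02,
            ("STOP", "VP", "V", "bought", "VBD"): 0.6}

p = rule_prob("VP", ("V", "bought", "VBD"), [],
              [("NP", "book", "NN"), ("PP", "with", "Prep")],
              pr_head, pr_left, pr_right)
```

Because each modifier is conditioned only on the head and the parent, the model needs far fewer parameters than one probability per fully lexicalized rule.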
Estimating Production Generation Parameters
• Smooth estimates by linearly interpolating with simpler models conditioned on just the POS tag, or on no lexical information at all. Using the version in the text:

  Pr_sm(PP(in-IN) | VP(put-VBD)) =
      λ1 · Pr(PP(in-IN) | VP(put-VBD))
    + (1 − λ1) · ( λ2 · Pr(PP(in-IN) | VP(VBD))
                 + (1 − λ2) · Pr(PP(in-IN) | VP) )
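The interpolation is a one-line helper; the lambda values and probabilities below are illustrative, not estimates from any treebank:

```python
def interpolated(p_lex, p_tag, p_bare, lam1=0.6, lam2=0.7):
    """Back off from the fully lexicalized estimate toward the
    tag-only and unlexicalized estimates (lambdas are illustrative)."""
    return lam1 * p_lex + (1 - lam1) * (lam2 * p_tag + (1 - lam2) * p_bare)

# A sparse lexical estimate (zero count) still gets probability mass
# from the coarser back-off models:
p = interpolated(p_lex=0.0, p_tag=0.10, p_bare=0.30)
```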
Missed Context Dependence
• Another problem with CFGs is that the choice of production used to expand a non-terminal is independent of its context.
• However, this independence assumption is frequently violated in normal grammars:
  – NPs that are subjects are more likely to be pronouns than NPs that are objects.
Splitting Non-Terminals
• To provide more contextual information, non-terminals can be split into multiple new non-terminals based on their parent in the parse tree, using parent annotation:
  – A subject NP becomes NP^S, since its parent node is an S.
  – An object NP becomes NP^VP, since its parent node is a VP.
Parent Annotation Example
Parent-annotated parse of "John liked the dog in the pen":

  (S
     (NP^S (NNP^NP John))
     (VP^S
        (VBD^VP liked)
        (NP^VP
           (DT^NP the)
           (Nominal^NP
              (Nominal^Nominal (NN^Nominal dog))
              (PP^Nominal
                 (IN^PP in)
                 (NP^PP (DT^NP the) (Nominal^NP (NN^Nominal pen))))))))

Example annotated production:

  VP^S → VBD^VP NP^VP
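Parent annotation is a simple tree transform; the nested-tuple tree format here is an assumption for illustration:

```python
def parent_annotate(tree, parent=None):
    """Append ^Parent to every label below the root, as in NP^S.
    Nodes are (label, children); preterminals are (tag, word)."""
    label, rest = tree
    new_label = f"{label}^{parent}" if parent else label
    if isinstance(rest, str):                 # preterminal: (tag, word)
        return (new_label, rest)
    return (new_label, [parent_annotate(c, label) for c in rest])

tree = ("S", [("NP", [("NNP", "John")]),
              ("VP", [("VBD", "liked"),
                      ("NP", [("DT", "the"), ("NN", "dog")])])])
annotated = parent_annotate(tree)
print(annotated[1][0][0])  # -> NP^S
```

Note that every distinct parent context multiplies the non-terminal inventory, which is why the next slide worries about grammar size.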
Split and Merge
• Non-terminal splitting greatly increases the size of the grammar and the number of parameters that need to be learned from limited training data.
• The best approach is to split non-terminals only when doing so improves the accuracy of the grammar.
• It may also help to merge some non-terminals to remove unhelpful distinctions and learn more accurate parameters for the merged productions.
• Method: Heuristically search for a combination of splits and merges that produces a grammar that maximizes the likelihood of the training treebank.
Treebanks
• English Penn Treebank: the standard corpus for testing syntactic parsing; it consists of 1.2M words of text from the Wall Street Journal (WSJ).
• It is typical to train on about 40,000 parsed sentences and test on an additional standard disjoint test set of 2,416 sentences.
• Chinese Penn Treebank: 100K words from the Xinhua news service.
• Corpora exist in many other languages; see the Wikipedia article "Treebank".
First WSJ Sentence
( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) ) (, ,) (ADJP (NP (CD 61) (NNS years) ) (JJ old) ) (, ,) ) (VP (MD will) (VP (VB join) (NP (DT the) (NN board) ) (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) )) (NP-TMP (NNP Nov.) (CD 29) ))) (. .) ))
WSJ Sentence with Trace (NONE)
( (S (NP-SBJ (DT The) (NNP Illinois) (NNP Supreme) (NNP Court) ) (VP (VBD ordered) (NP-1 (DT the) (NN commission) ) (S (NP-SBJ (-NONE- *-1) ) (VP (TO to) (VP (VP (VB audit) (NP (NP (NNP Commonwealth) (NNP Edison) (POS 's) ) (NN construction) (NNS expenses) )) (CC and) (VP (VB refund) (NP (DT any) (JJ unreasonable) (NNS expenses) )))))) (. .) ))
Parsing Evaluation Metrics
• PARSEVAL metrics measure the fraction of constituents that match between the computed and human parse trees. If P is the system's parse tree and T is the human parse tree (the "gold standard"):
  – Recall = (# correct constituents in P) / (# constituents in T)
  – Precision = (# correct constituents in P) / (# constituents in P)
• Labeled precision and labeled recall additionally require the non-terminal label on a constituent node to be correct for that constituent to count as correct.
• F1 is the harmonic mean of precision and recall.
Computing Evaluation Metrics
Correct tree T for "book the flight through Houston" (PP attached inside the Nominal):

  (S (VP (Verb book)
         (NP (Det the)
             (Nominal (Nominal (Noun flight))
                      (PP (Prep through)
                          (NP (Proper-Noun Houston)))))))

Computed tree P (PP attached to the VP):

  (S (VP (VP (Verb book)
             (NP (Det the) (Nominal (Noun flight))))
         (PP (Prep through)
             (NP (Proper-Noun Houston)))))

  # Constituents in T: 12    # Constituents in P: 12
  # Correct constituents: 10

  Recall = 10/12 = 83.3%   Precision = 10/12 = 83.3%   F1 = 83.3%
Treebank Results
• Results of current state-of-the-art systems on the English Penn WSJ treebank are slightly greater than 90% labeled precision and recall.
Discriminative Parse Reranking
• Motivation: even when the top-ranked parse is not correct, the correct parse is frequently among those ranked highly by a statistical parser.
• Use a classifier trained to select the best parse from the N-best parses produced by the original parser.
• A reranker can exploit global features of the entire parse, whereas a PCFG is restricted to decisions based on local information.
2-Stage Reranking Approach
• Adapt the PCFG parser to produce an N-best list of the most probable parses, in addition to the single most likely one.
• Extract from each of these parses a set of global features that help determine whether it is a good parse tree.
• Train a classifier (e.g., logistic regression) using the best parse in each N-best list as a positive example and the others as negative examples.
Parse Reranking
  sentence → PCFG Parser → N-best parse trees
           → Parse Tree Feature Extractor → parse tree descriptions
           → Parse Tree Classifier → best parse tree
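The classifier stage of this pipeline can be sketched as a linear scorer over global features; the feature names, weights, and candidate descriptions below are invented:

```python
def rerank(nbest, feature_fn, weights):
    """Return the candidate parse with the highest linear score
    over its global features (a sketch of the classifier stage)."""
    def score(parse):
        return sum(weights.get(name, 0.0) * value
                   for name, value in feature_fn(parse).items())
    return max(nbest, key=score)

# Toy candidates described only by invented global features:
candidates = [{"log_prob": -10.2, "right_branching": 0.4},
              {"log_prob": -10.5, "right_branching": 0.9}]
weights = {"log_prob": 1.0, "right_branching": 2.0}
best = rerank(candidates, lambda p: p, weights)
```

Here a slightly less probable parse wins because its global features score higher, which is exactly what the reranker is for.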
Sample Parse Tree Features
• Probability of the parse from the PCFG.
• The number of parallel conjuncts:
  – "the bird in the tree and the squirrel on the ground"
  – "the bird and the squirrel in the tree"
• The degree to which the parse tree is right-branching:
  – English parses tend to be right-branching (cf. the parse of "Book the flight through Houston").
• Frequency of various tree fragments, i.e., specific combinations of 2 or 3 rules.
Evaluation of Reranking
• Reranking is limited by oracle accuracy, i.e. the accuracy that results when an omniscient oracle picks the best parse from the N-best list.
• Typical current oracle accuracy is around F1 = 97%.
• Reranking can generally improve the test accuracy of current PCFG models by a percentage point or two.
Dependency Grammars
• An alternative to phrase-structure grammar is to define a parse as a directed graph between the words of a sentence representing dependencies between the words.
Untyped dependency parse of "John liked the dog in the pen":

  liked → John
  liked → dog
  dog → the
  dog → in
  in → pen
  pen → the

Typed dependency parse (the same graph with labeled arcs):

  nsubj(liked, John)   dobj(liked, dog)
  det(dog, the)        det(pen, the)
Dependency Graph from Parse Tree
• Can convert a phrase structure parse to a dependency tree by making the head of each non-head child of a node depend on the head of the head child.
Lexicalized phrase-structure parse:

  (S(liked-VBD)
     (NP(John-NNP) (NNP John))
     (VP(liked-VBD)
        (VBD liked)
        (NP(dog-NN)
           (DT the)
           (Nominal(dog-NN)
              (Nominal(dog-NN) (NN dog))
              (PP(in-IN)
                 (IN in)
                 (NP(pen-NN) (DT the) (Nominal(pen-NN) (NN pen))))))))

Resulting dependency tree:

  liked → John
  liked → dog
  dog → the
  dog → in
  in → pen
  pen → the
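The conversion rule (the head of each non-head child depends on the head of the head child) can be sketched as follows; the tree encoding and head table are simplifying assumptions:

```python
# Candidate head-child categories per non-terminal (simplified).
HEAD_CHILD = {"S": ["VP"], "VP": ["VBD", "VP"], "NP": ["Nominal", "NNP"],
              "Nominal": ["Nominal", "NN"], "PP": ["IN"]}

def head_of(tree):
    """Head word of a (label, children) node; (tag, word) at leaves."""
    label, rest = tree
    if isinstance(rest, str):
        return rest
    for target in HEAD_CHILD.get(label, []):
        for child in rest:
            if child[0] == target:
                return head_of(child)
    return head_of(rest[-1])        # fallback: rightmost child

def to_dependencies(tree, deps=None):
    """Head of each non-head child depends on the head of the node."""
    if deps is None:
        deps = []
    label, rest = tree
    if isinstance(rest, str):
        return deps
    h = head_of(tree)
    for child in rest:
        ch = head_of(child)
        if ch != h:
            deps.append((h, ch))     # (governor, dependent)
        to_dependencies(child, deps)
    return deps

tree = ("S",
        [("NP", [("NNP", "John")]),
         ("VP", [("VBD", "liked"),
                 ("NP", [("DT", "the"),
                         ("Nominal",
                          [("Nominal", [("NN", "dog")]),
                           ("PP", [("IN", "in"),
                                   ("NP", [("DT", "the"),
                                           ("Nominal", [("NN", "pen")])])])])])])])
print(to_dependencies(tree))
```

Running this on the slide's tree recovers exactly the six arcs shown above.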
Statistical Parsing Conclusions
• Statistical models such as PCFGs allow for probabilistic resolution of ambiguities.
• PCFGs can be easily learned from treebanks.
• Lexicalization and non-terminal splitting are required to effectively resolve many ambiguities.
• Current statistical parsers are quite accurate, but not yet at the level of human-expert agreement.