Probabilistic Parsing
Ling 571, Fei Xia
Week 5: 10/25-10/27/05
Outline
• Lexicalized CFG (recap)
• Hw5 and Project 2
• Parsing evaluation measures: ParseVal
• Collins' parser
• TAG
• Parsing summary
Lexicalized CFG recap
Important equations
$$P(A_1, \dots, A_n) = P(A_1) \prod_{i=2}^{n} P(A_i \mid A_1, \dots, A_{i-1})$$
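This chain-rule decomposition holds for any joint distribution; a quick numeric sanity check on a made-up distribution over three binary variables (the numbers below are invented for illustration):

```python
from itertools import product

# Toy joint distribution over three binary variables A1, A2, A3 (sums to 1).
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15,
    (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.12, (1, 0, 1): 0.08,
    (1, 1, 0): 0.18, (1, 1, 1): 0.12,
}

def marginal(prefix):
    """P(A1..Ak = prefix), summing out the remaining variables."""
    return sum(p for assign, p in joint.items()
               if assign[:len(prefix)] == prefix)

def chain_rule(assign):
    """P(A1) * prod_i P(Ai | A1..A_{i-1}), computed from the marginals."""
    prob = 1.0
    for i in range(len(assign)):
        # marginal(()) == 1, so the first factor is just P(A1).
        prob *= marginal(assign[:i + 1]) / marginal(assign[:i])
    return prob

# The chain-rule product recovers the joint probability exactly.
for assign in product((0, 1), repeat=3):
    assert abs(chain_rule(assign) - joint[assign]) < 1e-9
```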
Lexicalized CFG
• Lexicalized rules:

  $A(w) \to B_1(w_1) \dots B_n(w_n)$

• Sparse data problem:
  – First generate the head
  – Then generate the unlexicalized rule

  $A(h) \to L_n(l_n) \dots L_1(l_1)\, H(h)\, R_1(r_1) \dots R_m(r_m)$
Lexicalized models
$$
\begin{aligned}
P(T,S) &= \prod_{i=1}^{n} P(lr_i \mid lr_1, \dots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(r_i, h(r_i) \mid lr_1, \dots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(h(r_i) \mid lr_1, \dots, lr_{i-1}) \cdot P(r_i \mid h(r_i), lr_1, \dots, lr_{i-1}) \\
&\approx \prod_{i=1}^{n} P(h(r_i) \mid lhs(r_i), h(m(r_i))) \cdot P(r_i \mid lhs(r_i), h(r_i))
\end{aligned}
$$

where $lr_i$ is the i-th lexicalized rule, $lhs(r_i)$ is its left-hand side, and $m(r_i)$ is its mother.
An example
• he likes her
$$
\begin{aligned}
P(T,S) = {} & P(likes \mid S) \cdot P(S \to NP\ VP \mid S, likes) \\
& \cdot P(he \mid NP, likes) \cdot P(NP \to Pron \mid NP, he) \\
& \cdot P(likes \mid VP, likes) \cdot P(VP \to V\ NP \mid VP, likes) \\
& \cdot P(her \mid NP, likes) \cdot P(NP \to Pron \mid NP, her)
\end{aligned}
$$
An example
• he likes her
$$
\begin{aligned}
P(T,S) = {} & P(likes \mid TOP) \cdot P(TOP \to S \mid TOP, likes) \\
& \cdot P(likes \mid S, likes) \cdot P(S \to NP\ VP \mid S, likes) \\
& \cdot P(he \mid NP, likes) \cdot P(NP \to Pron \mid NP, he) \\
& \cdot P(likes \mid VP, likes) \cdot P(VP \to V\ NP \mid VP, likes) \\
& \cdot P(her \mid NP, likes) \cdot P(NP \to Pron \mid NP, her) \\
& \cdot P(he \mid Pron, he) \cdot P(Pron \to he \mid Pron, he) \\
& \cdot P(likes \mid V, likes) \cdot P(V \to likes \mid V, likes) \\
& \cdot P(her \mid Pron, her) \cdot P(Pron \to her \mid Pron, her)
\end{aligned}
$$
Head-head probability
$$P(w_2 \mid A_2, w_1) = \frac{P(w_2, A_2, w_1)}{P(A_2, w_1)} = \frac{C(X(w_1) \to \dots A_2(w_2) \dots)}{\sum_{w} C(X(w_1) \to \dots A_2(w) \dots)}$$

$$P(he \mid NP, likes) = \frac{C(X(likes) \to \dots NP(he) \dots)}{\sum_{w} C(X(likes) \to \dots NP(w) \dots)}$$
Head-rule probability
$$P(A \to \alpha \mid A, w) = \frac{P(A \to \alpha, w)}{P(A, w)} = \frac{C(A(w) \to \alpha)}{C(A(w))}$$

$$P(NP \to Pron \mid NP, he) = \frac{C(NP(he) \to Pron)}{C(NP(he))}$$
Estimate parameters
$$P(w_2 \mid A_2, w_1) = \frac{C(X(w_1) \to \dots A_2(w_2) \dots)}{\sum_{w} C(X(w_1) \to \dots A_2(w) \dots)}$$

$$P(A \to \alpha \mid A, w) = \frac{C(A(w) \to \alpha)}{C(A(w))}$$
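These relative-frequency estimates can be sketched from counts over a toy "treebank" of lexicalized rules (the data and helper names below are invented for illustration):

```python
from collections import Counter

# Toy treebank: each lexicalized rule is
# (parent label, parent head word, [(child label, child head word), ...]).
rules = [
    ("S",  "likes", [("NP", "he"),  ("VP", "likes")]),
    ("S",  "likes", [("NP", "she"), ("VP", "likes")]),
    ("NP", "he",    [("Pron", "he")]),
]

# Counts for C(X(w1) -> ... A2(w2) ...): a child (A2, w2) under a parent headed by w1.
pair = Counter()      # keyed by (w1, A2, w2)
pair_any = Counter()  # keyed by (w1, A2), i.e. summed over w2
for _, w1, children in rules:
    for a2, w2 in children:
        pair[(w1, a2, w2)] += 1
        pair_any[(w1, a2)] += 1

def p_head(w2, a2, w1):
    """Head-head probability P(w2 | A2, w1)."""
    return pair[(w1, a2, w2)] / pair_any[(w1, a2)]

# Counts for C(A(w) -> alpha) and C(A(w)).
rule_count = Counter()  # keyed by (A, w, alpha)
lhs_count = Counter()   # keyed by (A, w)
for a, w, children in rules:
    alpha = tuple(label for label, _ in children)
    rule_count[(a, w, alpha)] += 1
    lhs_count[(a, w)] += 1

def p_rule(alpha, a, w):
    """Head-rule probability P(A -> alpha | A, w)."""
    return rule_count[(a, w, alpha)] / lhs_count[(a, w)]

print(p_head("he", "NP", "likes"))    # 0.5: one of the two NPs under 'likes' is headed by 'he'
print(p_rule(("Pron",), "NP", "he"))  # 1.0
```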
Building a statistical tool

• Design a model:
  – Objective function: generative model vs. discriminative model
  – Decomposition: independence assumptions
  – The types of parameters and the parameter size
• Training: estimate the model parameters
  – Supervised vs. unsupervised
  – Smoothing methods
• Decoding: find the best analysis given the model
Team Project 1 (Hw5)

• Form a team: programming language, schedule, expertise, etc.
• Understand the lexicalized model
• Design the training algorithm
• Work out the decoding (parsing) algorithm: augment the CYK algorithm
• Illustrate the algorithms with a real example
Team Project 2
• Task: parse real data with a real grammar extracted from a treebank.
• Parser: PCFG or lexicalized PCFG
• Training data: English Penn Treebank, sections 02-21
• Development data: section 00
Team Project 2 (cont)

• Hw6: extract a PCFG from the treebank
• Hw7: make sure your parser works given a real grammar and real sentences; measure parsing performance
• Hw8: improve parsing results
• Hw10: write a report and give a presentation
Parsing evaluation measures
Evaluation of parsers: ParseVal

• Labeled recall: # of correct constituents in the parser output / # of constituents in the gold standard
• Labeled precision: # of correct constituents in the parser output / # of constituents in the parser output
• Labeled F-measure: the harmonic mean of labeled precision and labeled recall
• Complete match: % of sentences where recall and precision are 100%
• Average crossing: # of crossing brackets per sentence
• No crossing: % of sentences with no crossing brackets
An example
Gold standard: (VP (V saw) (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope))))
Parser output: (VP (V saw) (NP (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope)))))
ParseVal measures
• Gold standard: (VP, 1, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)
• System output: (VP, 1, 6), (NP, 2, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)
• Recall=4/4, Prec=4/5, crossing=0
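These measures can be computed directly from labeled-span lists like the ones above. A minimal sketch (the function name and span encoding are my own; spans are (label, start, end) with inclusive word indices, following the slides):

```python
def parseval(gold, system):
    """Labeled recall, labeled precision, and crossing-bracket count."""
    gold_set, sys_set = set(gold), set(system)
    correct = len(gold_set & sys_set)

    def crosses(a, b):
        """True if the two spans overlap but neither contains the other."""
        (_, s1, e1), (_, s2, e2) = a, b
        return s1 < s2 <= e1 < e2 or s2 < s1 <= e2 < e1

    crossing = sum(1 for s in sys_set if any(crosses(s, g) for g in gold_set))
    return correct / len(gold_set), correct / len(sys_set), crossing

gold = [("VP", 1, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
system = [("VP", 1, 6), ("NP", 2, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
print(parseval(gold, system))  # (1.0, 0.8, 0): recall 4/4, precision 4/5, no crossing
```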
A different annotation

Gold standard: (VP (V saw) (NP (Det the) (N’ (N man))) (PP (P with) (NP (Det a) (N’ (N telescope)))))

Parser output: (VP (V saw) (NP (Det the) (N’ (N man) (PP (P with) (NP (Det a) (N’ (N telescope)))))))
ParseVal measures (cont)

• Gold standard: (VP, 1, 6), (NP, 2, 3), (N’, 3, 3), (PP, 4, 6), (NP, 5, 6), (N’, 6, 6)
• System output: (VP, 1, 6), (NP, 2, 6), (N’, 3, 6), (PP, 4, 6), (NP, 5, 6), (N’, 6, 6)
• Recall = 4/6, Prec = 4/6, crossing = 1
EVALB

• A tool that calculates the ParseVal measures
• To run it: evalb -p parameter_file gold_file system_output
• A copy is available in my dropbox
• You will need it for Team Project 2
Summary of parsing evaluation measures

• ParseVal is the most widely used; the F-measure is the most important of its measures
• The results depend on the annotation style
• EVALB is a tool that calculates the ParseVal measures
• Other measures are also used: e.g., accuracy of dependency links
History-based models
History-based models

• History-based approaches map (T, S) into a decision sequence $d_1, \dots, d_n$
• The probability of tree T for sentence S is:

$$P(T,S) = P(d_1, \dots, d_n) = \prod_{i=1}^{n} P(d_i \mid d_1, \dots, d_{i-1}) \approx \prod_{i=1}^{n} P(d_i \mid f(d_1, \dots, d_{i-1}))$$

where f picks out the part of the history that each decision is conditioned on.
History-based models (cont)

• PCFGs can be viewed as history-based models
• There are other history-based models:
  – Magerman’s parser (1995)
  – Collins’ parsers (1996, 1997, …)
  – Charniak’s parsers (1996, 1997, …)
  – Ratnaparkhi’s parser (1997)
Collins’ models
• Model 1: Generative model of (Collins, 1996)
• Model 2: Add complement/adjunct distinction
• Model 3: Add wh-movement
Model 1

• First generate the head constituent label
• Then generate the left and right dependents

$$
\begin{aligned}
&P(L_n(l_n) \dots L_1(l_1)\, H(h)\, R_1(r_1) \dots R_m(r_m) \mid A, h) \\
&\quad = P(H \mid A, h) \cdot P(L_1(l_1) \dots L_n(l_n) \mid A, H, h) \cdot P(R_1(r_1) \dots R_m(r_m) \mid A, H, h, L_1(l_1) \dots L_n(l_n)) \\
&\quad \approx P(H \mid A, h) \cdot P(L_1(l_1) \dots L_n(l_n) \mid A, H, h) \cdot P(R_1(r_1) \dots R_m(r_m) \mid A, H, h)
\end{aligned}
$$
Model 1 (cont)

$$
\begin{aligned}
P(L_1(l_1) \dots L_n(l_n) \mid A, H, h) &= \prod_{i=1}^{n} P_L(L_i(l_i) \mid A, H, h, L_1(l_1) \dots L_{i-1}(l_{i-1})) \\
&\approx \prod_{i=1}^{n} P_L(L_i(l_i) \mid A, H, h) \\
P(R_1(r_1) \dots R_m(r_m) \mid A, H, h) &= \prod_{i=1}^{m} P_R(R_i(r_i) \mid A, H, h, R_1(r_1) \dots R_{i-1}(r_{i-1})) \\
&\approx \prod_{i=1}^{m} P_R(R_i(r_i) \mid A, H, h)
\end{aligned}
$$
An example

Sentence: Last week Marks bought Brooks.
rule: S(bought) → NP(week) NP(Marks) VP(bought)

$$
\begin{aligned}
P(rule \mid S, bought) = {} & P_H(VP \mid S, bought) \\
& \cdot P_L(NP(Marks) \mid S, VP, bought) \cdot P_L(NP(week) \mid S, VP, bought) \\
& \cdot P_L(STOP \mid S, VP, bought) \cdot P_R(STOP \mid S, VP, bought)
\end{aligned}
$$
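Under Model 1, a rule probability is the product of a head term and independent left/right dependent terms, each dependent sequence terminated by STOP. A minimal sketch with hypothetical probability tables (all numbers and table entries are invented for illustration):

```python
# Hypothetical probability tables; all values are invented for illustration.
P_H = {("VP", ("S", "bought")): 0.7}                   # P_H(H | A, h)
P_L = {("NP(Marks)", ("S", "VP", "bought")): 0.2,      # P_L(L_i(l_i) | A, H, h)
       ("NP(week)",  ("S", "VP", "bought")): 0.1,
       ("STOP",      ("S", "VP", "bought")): 0.5}
P_R = {("STOP",      ("S", "VP", "bought")): 0.8}      # P_R(R_i(r_i) | A, H, h)

def model1_rule_prob(parent, head_word, head_label, left_deps, right_deps):
    """P(rule | A, h): the head term times the left and right dependent
    terms, with each dependent sequence terminated by STOP."""
    ctx = (parent, head_label, head_word)
    prob = P_H[(head_label, (parent, head_word))]
    for dep in left_deps + ["STOP"]:
        prob *= P_L[(dep, ctx)]
    for dep in right_deps + ["STOP"]:
        prob *= P_R[(dep, ctx)]
    return prob

# S(bought) -> NP(week) NP(Marks) VP(bought); left dependents listed head-outward.
p = model1_rule_prob("S", "bought", "VP", ["NP(Marks)", "NP(week)"], [])
print(p)  # = 0.7 * 0.2 * 0.1 * 0.5 * 0.8, i.e. about 0.0056
```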
Model 2

• Generate a head label H
• Choose left and right subcat frames
• Generate left and right arguments
• Generate left and right modifiers
An example

rule: S(bought) → NP(week) NP-C(Marks) VP(bought)

$$
\begin{aligned}
P(rule \mid S, bought) = {} & P_H(VP \mid S, bought) \cdot P_{lc}(\{NP\text{-}C\} \mid S, VP, bought) \cdot P_{rc}(\{\} \mid S, VP, bought) \\
& \cdot P_L(NP\text{-}C(Marks) \mid S, VP, bought, \{NP\text{-}C\}) \cdot P_L(NP(week) \mid S, VP, bought, \{\}) \\
& \cdot P_L(STOP \mid S, VP, bought, \{\}) \cdot P_R(STOP \mid S, VP, bought, \{\})
\end{aligned}
$$
Model 3

• Add trace and wh-movement
• Given that the LHS of a rule has a gap, there are three ways to pass down the gap:
  – Head: S(+gap) → NP VP(+gap)
  – Left: S(+gap) → NP(+gap) VP
  – Right: SBAR(that)(+gap) → WHNP(that) S(+gap)
Parsing results

          LR      LP
Model 1   87.4%   88.1%
Model 2   88.1%   88.6%
Model 3   88.1%   88.6%
Tree Adjoining Grammar (TAG)
TAG

• TAG basics
• Extensions of TAG:
  – Lexicalized TAG (LTAG)
  – Synchronous TAG (STAG)
  – Multi-component TAG (MCTAG)
  – …
TAG basics

• A tree-rewriting formalism (Joshi et al., 1975)
• It can generate mildly context-sensitive languages
• The primitive elements of a TAG are elementary trees
• Elementary trees are combined by two operations: substitution and adjoining
• TAG has been used in:
  – parsing, semantics, discourse, etc.
  – machine translation, summarization, generation, etc.
Two types of elementary trees

Initial tree: (S NP (VP (V draft) NP))
Auxiliary tree: (VP (ADVP (ADV still)) VP*)
Substitution operation
They draft policies
Adjoining operation

[Diagram: an auxiliary tree rooted in Y with foot node Y* is adjoined at a Y-labeled node of another tree; the subtree under that node is reattached at the foot node Y*.]
They still draft policies
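The two operations can be sketched on simple (label, children) tuples. The encoding is my own convention ("X!" marks a substitution node, "X*" the foot node of an auxiliary tree); the elementary trees follow the slides:

```python
# A tree is (label, children); a leaf word is a node with no children.
# "X!" marks a substitution node; "X*" marks the foot node of an auxiliary tree.

def substitute(tree, initial):
    """Substitute `initial` at the first substitution node (pre-order)
    whose label matches the initial tree's root label."""
    label, children = tree
    if label == initial[0] + "!":
        return initial, True
    out, done = [], False
    for child in children:
        if not done:
            child, done = substitute(child, initial)
        out.append(child)
    return (label, out), done

def adjoin(tree, target, aux):
    """Adjoin auxiliary tree `aux` at the first node labeled `target`:
    that node's subtree is reattached at the foot node `target + '*'`."""
    label, children = tree
    if label == target:
        return _fill_foot(aux, target, (label, children)), True
    out, done = [], False
    for child in children:
        if not done:
            child, done = adjoin(child, target, aux)
        out.append(child)
    return (label, out), done

def _fill_foot(aux, target, subtree):
    """Replace the foot node of `aux` with `subtree`."""
    label, children = aux
    if label == target + "*":
        return subtree
    return (label, [_fill_foot(c, target, subtree) for c in children])

def tree_yield(tree):
    """Left-to-right leaf labels, i.e. the derived sentence."""
    label, children = tree
    return [label] if not children else [w for c in children for w in tree_yield(c)]

# Elementary trees from the slides:
draft = ("S", [("NP!", []), ("VP", [("V", [("draft", [])]), ("NP!", [])])])
still = ("VP", [("ADVP", [("ADV", [("still", [])])]), ("VP*", [])])
they = ("NP", [("PN", [("they", [])])])
policies = ("NP", [("N", [("policies", [])])])

t, _ = substitute(draft, they)      # subject NP
t, _ = substitute(t, policies)      # object NP
t, _ = adjoin(t, "VP", still)       # "still" adjoins at the VP node
print(" ".join(tree_yield(t)))      # they still draft policies
```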
Derivation tree

[Diagram: the elementary trees, the derived tree, and the corresponding derivation tree for the example.]
Derived tree vs. derivation tree
• The mapping is not 1-to-1.
• Finding the best derivation is not the same as finding the best derived tree.
Wh-movement: What do they draft?

[Tree diagrams: an initial tree for "draft" with a fronted WHNP "what" co-indexed with an NP trace, an auxiliary tree for "do" (rooted in S, with foot node S*), and NP trees for "they" and "what"; the auxiliary tree adjoins to yield the derived tree.]
Long-distance wh-movement: What does John think they draft?

[Tree diagrams: auxiliary trees for "does" (S; V does; S*) and "think" (S; NP VP; V think; S*) adjoin into the "draft" tree with its co-indexed NP trace, yielding the derived tree.]
Who did you have dinner with?

[Tree diagrams: an initial tree for "who" (PN) with a co-indexed trace, the "have" tree, and an auxiliary tree for the stranded preposition (VP; VP* PP(P with, NP)) combine into the derived tree.]
TAG extensions

• Lexicalized TAG (LTAG)
• Synchronous TAG (STAG)
• Multi-component TAG (MCTAG)
• …
STAG
• The primitive elements in STAG are elementary tree pairs.
• Used for MT
Summary of TAG

• A formalism beyond CFG
• Primitive elements are trees, not rules
• Extended domain of locality
• Two operations: substitution and adjoining
• Parsing algorithm: O(n^6)
• Statistical parsers for TAG
• Algorithms for extracting TAGs from treebanks
Parsing summary
Types of parsers

• Phrase structure vs. dependency tree
• Statistical vs. rule-based
• Grammar-based or not
• Supervised vs. unsupervised

Our focus:
• Phrase structure
• Mainly statistical
• Mainly grammar-based: CFG, TAG
• Supervised
Grammars

• Chomsky hierarchy:
  – Unrestricted grammar (type 0)
  – Context-sensitive grammar (type 1)
  – Context-free grammar (type 2)
  – Regular grammar (type 3)
  Human languages are beyond context-free.
• Other formalisms:
  – HPSG, LFG
  – TAG
  – Dependency grammars
Parsing algorithms for CFG

• Top-down
• Bottom-up
• Top-down with a bottom-up filter
• Earley algorithm
• CYK algorithm
  – Requires the CFG to be in CNF
  – Can be augmented to deal with PCFG, lexicalized CFG, etc.
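A minimal probabilistic CYK sketch over a toy CNF grammar (the grammar and its probabilities are invented for illustration), keeping the best probability per (span, label):

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form; grammar and probabilities invented for illustration.
lexical = {("NP", "he"): 0.5, ("NP", "her"): 0.5, ("V", "likes"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}

def cyk(words, start="S"):
    """Probability of the best parse of `words` rooted in `start` (0.0 if none)."""
    n = len(words)
    best = defaultdict(float)        # (i, j, label) -> best probability of words[i:j]
    for i, w in enumerate(words):    # fill in the preterminal cells
        for (a, word), p in lexical.items():
            if word == w:
                best[(i, i + 1, a)] = max(best[(i, i + 1, a)], p)
    for span in range(2, n + 1):     # combine adjacent cells bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    q = p * best[(i, k, b)] * best[(k, j, c)]
                    best[(i, j, a)] = max(best[(i, j, a)], q)
    return best[(0, n, start)]

print(cyk(["he", "likes", "her"]))  # 0.25
```

The same chart shape extends to lexicalized CFG by splitting each label into (label, head word) pairs, at the cost of a larger chart.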
Extensions of CFG

• PCFG: find the most likely parse tree
• Lexicalized CFG:
  – uses weaker independence assumptions
  – accounts for certain types of lexical and structural dependencies
Beyond CFG

• History-based models
  – Collins’ parsers
• TAG
  – Tree-rewriting formalism
  – Mildly context-sensitive
  – Many extensions: LTAG, STAG, …
Statistical approach

• Modeling
  – Choose the objective function
  – Decompose the function:
    • Common equations: joint, conditional, marginal probabilities
    • Independence assumptions
• Training
  – Supervised vs. unsupervised
  – Smoothing
• Decoding
  – Dynamic programming
  – Pruning
Evaluation of parsers
• Accuracy: ParseVal
• Robustness
• Resources needed
• Efficiency
• Richness
Other things

• Converting into CNF:
  – CFG
  – PCFG
  – Lexicalized CFG
• Treebank annotation
  – Tagset: syntactic labels, POS tags, function tags, empty categories
  – Format: indentation, brackets