-
Web-Scale Features for Full-Scale Parsing
Mohit Bansal and Dan Klein
UC Berkeley
-
Example
They considered running the ad during the Super Bowl.
running * during → 11K        running it during → 239
considered * during → 7K      considered it during → 112
(head-arg web counts)
[Nakov and Hearst 2005b]
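As a rough illustration only (the paper feeds such counts to a discriminative parser as features rather than using a hard rule), a few lines of Python show how these paraphrase counts can vote on the attachment:

def attachment_ratio(star_count, it_count):
    # Relative frequency of the "it" paraphrase: a higher ratio means
    # the verb commonly takes an object before this preposition,
    # favoring PP attachment to that verb.
    return it_count / star_count

# Web counts from this slide.
running = attachment_ratio(11_000, 239)      # ~0.022
considered = attachment_ratio(7_000, 112)    # ~0.016

# "running it during" is relatively more frequent, so the PP
# "during the Super Bowl" is predicted to attach to "running".
print("running" if running > considered else "considered")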
-
Canonical Ambiguity Type 1
Prepositional phrase (PP) attachment ambiguities (isolated)
[Tree diagrams: "spent time with family", with the PP "with family" attached to the verb "spent" (VP attachment) vs. attached to the noun "time" (NP attachment)]
[Volk 2001; Nakov & Hearst 2005b]
-
Canonical Ambiguity Type 2
NP coordination ambiguities
[Tree diagrams: "car and truck production", bracketed as (NP (NP car) and (NP truck production)) vs. (NP (NP car and truck) production)]
[Nakov & Hearst 2005b; Bergsma et al. 2011]
-
Canonical Ambiguity Type 3
Noun compound bracketing ambiguities
[Tree diagrams: "liver cell antibody", bracketed as ((liver cell) antibody) vs. (liver (cell antibody)); key comparison: count("liver cell") vs. count("liver antibody")]
[Lapata & Keller 2004; Nakov & Hearst 2005a; Pitler et al. 2010]
-
Parsing Errors
Berkeley parser: errors cast as incorrect dependency attachments
This work: a single system that addresses various kinds of ambiguities
[Bar chart, % of total errors by child tag: NNP 21.2, IN 19.6, NN 10.0, RB 7.4, NNS 5.3, CC 4.5, JJ 4.2, VBD 3.5, TO 3.3, CD 3.1, DT 3.1]
-
WSJ Errors
… ordered full pages in the Monday editions of half a dozen newspapers .
[Tree diagram, incorrect parse: flat (NP (PDT half) (DT a) (NN dozen) (NNS newspapers))]
-
WSJ Errors
… ordered full pages in the Monday editions of half a dozen newspapers .
[Tree diagram, correct parse: (NP (QP (PDT half) (DT a) (NN dozen)) (NNS newspapers))]
-
WSJ Errors
… a familiar message : Keep on investing , the market 's just fine .
[Tree diagram, incorrect parse: (VP (VBZ 's) (ADVP (RB just)) (ADJP (JJ fine)))]
-
WSJ Errors
… a familiar message : Keep on investing , the market 's just fine .
[Tree diagram, correct parse: (VP (VBZ 's) (ADJP (RB just) (JJ fine)))]
-
Using Web-Scale Features
Idea: edge-factored features φ(head, arg) that encode web counts
raising $ 30 million from debt
φ(raising, from)    φ($, from)
-
Web-Scale Statistics
Prepositional phrase (PP) disambiguation (Volk, 2001)
(v, n1, p, n2): verb or noun attachment?
Only 2 competing attachments!
-
Dependency Features
Discriminative dependency parsing (McDonald et al., 2005; inter alia)
φ(h → a):
1[h, a]
1[cluster(h), cluster(a)]   (Koo et al., 2008; Finkel et al., 2008)
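As a toy Python sketch of such edge-factored indicator features (the cluster map and the string encoding of feature names are illustrative assumptions):

def edge_features(h, a, cluster):
    # Indicator features for dependency edge h -> a; each string is a
    # feature name whose weight is learned by the discriminative parser.
    return [
        f"1[{h}, {a}]",                    # lexical pair feature
        f"1[{cluster[h]}, {cluster[a]}]",  # word-cluster pair feature
    ]

clusters = {"raising": "C0110", "from": "C1011"}  # toy cluster ids
print(edge_features("raising", "from", clusters))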
-
Web-Scale Features
Affinity-based Web features
φ(raising +3 from)
Web count c(raising * from) = 20K, binned to 20
-
Web-Scale Features
Affinity-based Web features
φ(raising +3 from)
1[VBG ― +3 ― IN, 20]   (unlexicalized; a lexicalized variant also fires)
-
Web-Scale Features
Affinity-based Web features (general form)
φ(h [distance] a)
1[POS(h) ― d ― POS(a), webcnt]
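A toy Python sketch of these affinity features, assuming a precomputed count table and log-scale binning (the paper's exact binning scheme may differ):

import math

web_counts = {("raising", "from"): 20_000}  # toy stand-in for c(h * a)

def bin_count(c):
    # Coarse log-scale bin of a raw web count (binning scheme assumed).
    return int(math.log2(c)) if c > 0 else -1

def affinity_features(h, h_pos, a, a_pos, d):
    b = bin_count(web_counts.get((h, a), 0))
    return [
        f"1[{h_pos} - {d:+d} - {a_pos}, {b}]",  # unlexicalized
        f"1[{h} - {d:+d} - {a}, {b}]",          # lexicalized
    ]

# "raising ... from" at signed distance +3, as on this slide:
print(affinity_features("raising", "VBG", "from", "IN", 3))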
-
Web-Scale Features
Paraphrase (context) based Web features
φ(raising, from)
Top trigrams for "raising * from": raising money from, raising funds from, raising him from, raising it from, raising capital from, …
[Nakov and Hearst 2005b]
-
Web-Scale Features
Paraphrase (context) based Web features
φ(raising, from)
1[VBG ― it ― IN]
-
Web-Scale Features
Paraphrase (context) based Web features
φ(h, a)
1[POS(h) ― context ― POS(a)]
-
Web-Scale Features
Collecting top context words for φ(h, a) in three positions: before, middle, and after the head-arg pair
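A toy Python sketch of collecting top middle-context words from n-gram matches of "h * a" and firing the corresponding paraphrase features (toy counts; the before and after positions would be handled analogously):

from collections import Counter

ngram_counts = {                    # toy stand-in for the Web n-grams
    ("raising", "money", "from"): 3200,
    ("raising", "funds", "from"): 2100,
    ("raising", "it", "from"): 900,
    ("raising", "him", "from"): 400,
}

def top_middle_words(h, a, k=3):
    # Aggregate counts of words seen between h and a, keep the top k.
    mids = Counter()
    for (w1, mid, w2), c in ngram_counts.items():
        if (w1, w2) == (h, a):
            mids[mid] += c
    return [w for w, _ in mids.most_common(k)]

def paraphrase_features(h, h_pos, a, a_pos):
    # One unlexicalized feature per top context word, e.g. 1[VBG - it - IN].
    return [f"1[{h_pos} - {mid} - {a_pos}]" for mid in top_middle_words(h, a)]

print(paraphrase_features("raising", "VBG", "from", "IN"))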
-
Computing Web Statistics Efficiently
Search engines are inefficient for this: use the Google n-grams corpus (n = 1 to 5) instead
Batch approach: collect all queries beforehand, then scan all n-grams
4 billion n-grams, 4.5 million queries, < 2 hours
[Diagram: a query-count trie; each n-gram w1 … wn is checked against queries q1 q2, q1 * q2, q1 ** q2, q1 *** q2 by hashing on the boundary words q1 = w1 and q2 = wn]
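As a rough sketch of this batch scan in Python (the "n-gram TAB count" file format and all names here are assumptions, not the paper's actual implementation):

from collections import Counter

def batch_counts(queries, ngram_path):
    # queries: set of (q1, q2) pairs, each matching n-grams that start
    # with q1 and end with q2 (i.e. q1 q2, q1 * q2, ..., up to 5-grams).
    wanted = set(queries)
    counts = Counter()
    with open(ngram_path, encoding="utf-8") as f:
        for line in f:                   # one pass over all n-grams
            ngram, count = line.rstrip("\n").split("\t")
            words = ngram.split()
            if len(words) < 2:
                continue
            key = (words[0], words[-1])  # boundary-word hash key
            if key in wanted:
                counts[key] += int(count)
    return counts

# e.g. batch_counts({("raising", "from")}, "5gms-0001.txt")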
-
Parsing Results
Dependency parsing: Web features integrated into the underlying dynamic program
[Bar chart, UAS: This work 92.0 vs. McDonald & Pereira 2006 91.4]
Relative error reduction of 7.0% over order-2 features
-
Parsing Results
Constituent parsing: get k-best parses and rerank them discriminatively
[Bar chart, parsing F1: Web + configurational features 91.4, Web features 91.1, Petrov et al. 2006 baseline 90.2]
Relative error reductions of 9.2% (Web features) and 12.2% (Web + configurational features, in the style of the Collins and Charniak rerankers)
-
Error Analysis
Errors reduced for a variety of child (argument) types
[Bar chart, % error reduction by child tag, in decreasing order of total # of attachments: NN 12.4, NNP 9.1, IN 12.1, DT 5.8, NNS 15.7, JJ 11.7, CD 6.7, VBD 1.4, RB 11.6; values for CC and VB not recovered]
-
Error Analysis
Error reduction for each type of parent attachment for a given child
[Bar charts, % error reduction by parent tag for three child tags:
NN child: IN 18, NN 23, VB 30, NNP 20, VBN 33
IN child: NN 11, VBD 11, NNS 20, VB 18, VBG 23
JJ child: NN 9, NNS 22, VBD 17, JJ 9, CD 17]
-
High-Weight Features
Affinity features:
their bridge back into the big-time   (RB → IN)
an Oct. 19 review of “The Misanthrope”   (NN → IN)
The new rate will be payable Feb. 15   (DT → NN)
-
High-Weight Features
Paraphrase features:
the guile learned from his years in   1[VBN ― this ― IN]   (“learned this from”)
sow a row of male-fertile plants   1[The ― NN ― IN]   (“The row of”)
about stock purchases and sales by   1[NNS ― and ― NNS]   (“purchases and sales”)
-
Conclusion
Web features are powerful disambiguators
Incorporation into end-to-end, full-scale parsing
Uniform treatment of all attachment error types
7-12% relative error reduction in state-of-the-art parsers
Intuitive features surface in the learning setup
-
Thank you!
Questions?