Better k-best Parsing
Liang Huang (Penn)
David Chiang (Maryland)
9th International Workshop on Parsing Technologies (IWPT 2005), Vancouver, B.C., Canada
Liang Huang and David Chiang 1IWPT 2005

Motivations
• NLP pipeline (POS tagging, Syntactic Parsing, Semantic Interpretation)
  – 1-best is not always optimal in the future
  – postpone disambiguation to next phases
  – next phase compatible with current?
    • Y: packed representation (forest, lattice), which is compact
    • N: k-best lists
• Discriminative Training
  – Reranking (Collins, 2000)
  – Minimum error training (Och, 2003)
  – k-best MIRA/Perceptron (McDonald, Crammer and Pereira, 2005)

Previous Work
• Collins (2000); Bikel (2004)
  – Turn down Dynamic Programming
  – Aggressive Pruning
    • Tight Beam Width
    • Hard Cell Limit
• Charniak and Johnson (ACL 2005)
  – multi-pass, coarse-to-fine k-best
  – improvement: f-score: 89.7% ==> 91.0%
• Jiménez and Marzal (2000)
  – very close to our lazy offline algorithm (but for CKY only)
  – tested on a tiny grammar on WSJ (512 rules)

Outline
• Formulation
  – directed monotonic hypergraphs
• Algorithms
  – Alg. 0 thru Alg. 3
• Experiments
  – k-best parser on top of the Collins/Bikel Parser
  – k-best CKY-based Hiero decoder (Chiang, 2005)
• Conclusion

Hypergraph
• A hypergraph is a pair <V, E>
  – V is the set of vertices (the items in derivation)
  – E is the set of hyperedges, each hyperedge connecting several vertices (the antecedents of a derivation rule) to one vertex (the consequent of a derivation rule)

Packed Forest as Hypergraph
[Figure, built up over three slides: packed forest for "I saw a boy with a telescope", with nodes S, NP, VP, PP, v and prep over the spans "I", "saw", "a boy", "with" and "a telescope". The nodes are marked as vertices and the rule applications as hyperedges: a hypergraph!]
• logical deduction (Shieber et al., 1995)
• hypergraph search (Klein & Manning, 2001)
• weighted deduction (Nederhof, 2003): weighted hypergraph

Weighted Hypergraph
• A tuple <V, E, t, R>
• t - target vertex (goal item), e.g. t = (S, 1, n)
• R - weight set with a total ordering ≤
• every e = <T(e), h(e), f>, where f is the weight function (Nederhof, 2003)
[Figure: a hyperedge e = <(tail vertices), head vertex, f> whose tails carry the weights a, b, c and whose head carries the weight f(a, b, c)]
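
The tuple above can be sketched as a small data structure. This is only an illustrative Python sketch, not the authors' implementation: the names `Hyperedge`, `WeightedHypergraph`, `add_edge` and `vertices`, the `(label, start, end)` item encoding, and the rule probability 0.9 are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vertex = Tuple[str, int, int]  # assumed item encoding: (label, start, end)


@dataclass
class Hyperedge:
    tails: Tuple[Vertex, ...]   # T(e): the antecedent vertices
    head: Vertex                # h(e): the consequent vertex
    f: Callable[..., float]     # weight function, one argument per tail


@dataclass
class WeightedHypergraph:
    target: Vertex              # t: the goal item
    incoming: Dict[Vertex, List[Hyperedge]] = field(default_factory=dict)

    def add_edge(self, tails, head, f):
        """Register a hyperedge under its head vertex and return it."""
        edge = Hyperedge(tuple(tails), head, f)
        self.incoming.setdefault(head, []).append(edge)
        return edge

    def vertices(self):
        """V: every vertex mentioned as a target, head, or tail."""
        vs = {self.target}
        for head, edges in self.incoming.items():
            vs.add(head)
            for edge in edges:
                vs.update(edge.tails)
        return vs


g = WeightedHypergraph(target=("S", 1, 5))
e = g.add_edge([("NP", 1, 2), ("VP", 3, 5)], ("S", 1, 5),
               lambda b, c: b * c * 0.9)  # 0.9 is a made-up rule probability
print(len(g.vertices()), e.f(0.5, 0.4))
```

Indexing hyperedges by their head is a design choice made here because the algorithms in this talk visit a vertex and then look at its incoming hyperedges.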

Monotonic Weight Functions
• all weight functions must be monotonic in each of their arguments
• optimal sub-problem property in dynamic programming
  – if B: b and C: c give A: f(b, c), then B: b’ ≤ b and C: c give A: f(b’, c) ≤ f(b, c)
• CKY example:
  – e = <((NP, 1, 2), (VP, 3, 5)), (S, 1, 5), f>
  – f(b, c) = b · c · Pr(S → NP VP)
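
The monotonicity condition can be checked numerically for the CKY weight function. The sketch below is a hedged illustration: `RULE_PROB` is a made-up value for Pr(S → NP VP), and `is_monotonic_on_grid` is a hypothetical helper that only tests a finite set of sample weights, so it can refute monotonicity but not prove it.

```python
RULE_PROB = 0.9  # hypothetical value for Pr(S -> NP VP)


def f(b, c):
    """CKY weight function from the slide: f(b, c) = b * c * Pr(S -> NP VP)."""
    return b * c * RULE_PROB


def is_monotonic_on_grid(fn, values):
    """Check that lo <= hi implies fn(lo, c) <= fn(hi, c), in both argument
    positions, for every combination of the sample values."""
    for c in values:
        for lo in values:
            for hi in values:
                if lo <= hi:
                    if fn(lo, c) > fn(hi, c) or fn(c, lo) > fn(c, hi):
                        return False
    return True


samples = [0.0, 0.1, 0.5, 1.0]
print(is_monotonic_on_grid(f, samples))
print(is_monotonic_on_grid(lambda b, c: -b * c, [0.1, 0.5, 1.0]))
```

The second call uses a deliberately non-monotonic function (a negated product) to show a case the check rejects.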

k-best Problem in Hypergraph
• 1-best problem
  – find the best derivation of the target vertex t
• k-best problem
  – find the top k derivations of the target vertex t
• assumptions
  – acyclic: so that we can use topological order
• in CKY, t = (S, 1, n)

Outline
• Formulations
• Algorithms
  – Algorithm 0: naïve polynomial
  – Algorithm 1: speeding up mult_k
  – Algorithm 2: speeding up merge_k
  – Algorithm 3: offline lazy algorithm
• Experiments
• Conclusion and Future Work

Generic 1-best Viterbi Algorithm
• traverse the hypergraph in topological order
  – for each incoming hyperedge
    • compute the result of the f function along the hyperedge
    • update the 1-best value for the current vertex if possible
[Figure, built up over three slides: vertex v with incoming hyperedges f1 from (u: a, w: b), f2 from (u’: c, w’: d), and so on. The value at v goes from f1(a, b) to better(f1(a, b), f2(c, d)) to better(better(f1(a, b), f2(c, d)), …).]
• overall time complexity: O(|E|)
• CKY + CNF: |E| = O(n^3 |P|)
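
The traversal above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the topological order is supplied by the caller, "better" is taken to mean "larger" (as with probabilities), and the function name `viterbi_1best`, the vertex names and the weights are made up for illustration.

```python
def viterbi_1best(topo_order, incoming, axioms):
    """Return the best weight of every vertex.

    topo_order: vertices listed so that tails come before heads (assumed given)
    incoming:   dict mapping a vertex to its hyperedges, each a (tails, f) pair
    axioms:     dict mapping leaf vertices to their initial weights
    """
    best = dict(axioms)
    for v in topo_order:
        for tails, f in incoming.get(v, []):
            # compute the result of the f function along the hyperedge
            score = f(*(best[u] for u in tails))
            # update the 1-best value for the current vertex if possible
            if v not in best or score > best[v]:
                best[v] = score
    return best


def prod(x, y):
    return x * y


# v has two incoming hyperedges, as in the figure: f1 over (u, w), f2 over (u', w')
axioms = {"u": 0.6, "w": 0.5, "u'": 0.4, "w'": 0.4}
incoming = {"v": [(("u", "w"), prod), (("u'", "w'"), prod)]}
best = viterbi_1best(["u", "w", "u'", "w'", "v"], incoming, axioms)
print(best["v"])
```

Each hyperedge is visited exactly once, which is where the O(|E|) bound comes from (treating one evaluation of f as constant time).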

Dynamic Programming: 1957
Dr. Andrew Viterbi
We knew everything so far in your talk 40 years ago
Dr. Richard Bellman

k-best Viterbi algorithm 0: naïve
• straightforward k-best extension:
  – a vector of length k instead of a single value
  – vector components are kept sorted
  – now what’s f(a, b)?
    • k^2 values -- Cartesian product f(a_i, b_j)
    • just need the top k out of the k^2 values
    • O(k^2 log k) (sorting) or O(k^2) (selection)
• mult_k(f, a, b) = top_k { f(a_i, b_j) }
[Figure: vertex v with one incoming hyperedge f1 from (u: a, w: b); v gets mult_k(f1, a, b)]

Algorithm 0: naïve
• straightforward k-best extension:
  – a vector of length k instead of a single value
  – and how to update?
    • from two length-k vectors (2k elements)
    • select the top k elements: O(k)
[Figure: vertex v with incoming hyperedges f1 from (u: a, w: b) and f2 from (u’: c, w’: d); v gets merge_k(mult_k(f1, a, b), mult_k(f2, c, d))]
• overall time complexity: O(k^2 |E|)
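
The two naïve operations can be written down directly. This is a hedged sketch: the names `mult_k_naive` and `merge_k_naive` and the example weights are made up, and `heapq.nlargest` is used for brevity, so the code takes O(k^2 log k) for the product rather than the O(k^2) linear-time selection mentioned on the slide.

```python
import heapq
from itertools import product


def mult_k_naive(f, a, b, k):
    """Top k of all |a| * |b| combinations f(a_i, b_j), best first."""
    return heapq.nlargest(k, (f(x, y) for x, y in product(a, b)))


def merge_k_naive(vectors, k):
    """Top k of the concatenation of several k-best vectors, best first."""
    return heapq.nlargest(k, (w for vec in vectors for w in vec))


def times(x, y):
    return x * y


# two incoming hyperedges with made-up, already sorted k-best vectors (k = 2)
left = mult_k_naive(times, [0.6, 0.4], [0.5, 0.4], 2)
right = mult_k_naive(times, [0.9, 0.1], [0.3, 0.2], 2)
print(merge_k_naive([left, right], 2))
```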

Algorithm 1: speedup mult_k
• mult_k(f, a, b) = top_k { f(a_i, b_j) }
• only interested in top k, why enumerate all k^2?
• a and b are sorted!
• f is monotonic!
• so …?
• f(a_1, b_1) must be the 1-best
• the 2nd-best must be either f(a_2, b_1) or f(a_1, b_2)
• what about the 3rd-best?

Algorithm 1 (Demo)
f(a, b) = ab, with a_i = (.6, .4, .3, .3) and b_j = (.5, .4, .3, .1)
[Figure, three slides of the grid of products a_i b_j:
  1. .30 is extracted; the frontier holds .24 and .20
  2. .24 is extracted; .18 and .16 join the frontier
  3. .20 is extracted; .15 joins the frontier]
use a priority queue (heap) to store the candidates (frontier); in each iteration:
  1. extract-max from the heap
  2. push the two “shoulders” into the heap
k iterations.
O(k log k |E|) overall time
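
The heap-based enumeration can be sketched as follows, using the demo vectors. This is an illustrative sketch, not the authors' code: the name `mult_k_lazy` and the `seen` set (used to avoid pushing the same grid cell twice) are assumptions, and Python's `heapq` is a min-heap, so weights are negated to get extract-max.

```python
import heapq


def mult_k_lazy(f, a, b, k):
    """Top k values of f(a_i, b_j), best first.

    Assumes a and b are sorted best-first and f is monotonic in each argument.
    """
    if not a or not b:
        return []
    heap = [(-f(a[0], b[0]), 0, 0)]   # the corner of the grid is the 1-best
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)          # extract-max
        out.append(-neg)
        for ni, nj in ((i + 1, j), (i, j + 1)):  # the two "shoulders"
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-f(a[ni], b[nj]), ni, nj))
    return out


a = [0.6, 0.4, 0.3, 0.3]
b = [0.5, 0.4, 0.3, 0.1]
top6 = mult_k_lazy(lambda x, y: x * y, a, b, 6)
print([round(w, 2) for w in top6])
```

Each of the k iterations does one pop and at most two pushes on a heap of O(k) entries, which gives O(k log k) per hyperedge.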

Algorithm 2: speedup merge_k
• if a vertex has d incoming hyperedges, Algorithm 1 takes time O(d k log k)
  – d mult_k and d merge_k
• mult_k(f, a, b) is just an intermediate result; we are only interested in the result of merge_k(mult_k(f_1, a, b), …, mult_k(f_d, x, y))
[Figure: vertex v with d incoming hyperedges f_1, …, f_i, …, f_d; f_1 comes from (u: a, w: b) and f_d from (p: x, q: y)]

Algorithm 2 (Demo)
• can we do the merge_k and mult_k simultaneously? same trick -- heapsort
k = 2, d = 3; vertex v has three incoming hyperedges:
  B1 x C1: (.7, .4) x (.6, .1), 1-best .42
  B2 x C2: (.4, .3) x (.9, .5), 1-best .36
  B3 x C3: (.8, .7) x (.4, .1), 1-best .32
• item-level heap: starts with an initial heap of the 1-best derivations from each hyperedge
• but just need the top k among the d 1-best derivations
• pop the best (.42) and push the two shoulders (.07 and .24) as its successors
• output: .42, then .36
• improves the O(d k log k) to O(d + k log k)
• overall time complexity: O(|E| + |V| k log k)
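
The item-level heap can be sketched with the demo values (k = 2, d = 3). This is a simplified illustration under stated assumptions: the name `kbest_at_vertex` is made up, every hyperedge is binary, and the refinement of first selecting only the top k among the d initial candidates is omitted, so this version costs O(d + k log(d + k)) per vertex rather than O(d + k log k).

```python
import heapq


def kbest_at_vertex(edges, k):
    """k best weights at one vertex, best first.

    edges: list of (f, a, b), one per incoming hyperedge, where a and b are the
    k-best vectors of the two tails, sorted best-first, and f is monotonic.
    """
    # initial heap: the 1-best derivation along each hyperedge
    heap = [(-f(a[0], b[0]), e, 0, 0)
            for e, (f, a, b) in enumerate(edges) if a and b]
    heapq.heapify(heap)                      # O(d)
    seen = {(e, i, j) for _, e, i, j in heap}
    out = []
    while heap and len(out) < k:
        neg, e, i, j = heapq.heappop(heap)   # pop the best
        out.append(-neg)
        f, a, b = edges[e]
        for ni, nj in ((i + 1, j), (i, j + 1)):   # push its two shoulders
            if ni < len(a) and nj < len(b) and (e, ni, nj) not in seen:
                seen.add((e, ni, nj))
                heapq.heappush(heap, (-f(a[ni], b[nj]), e, ni, nj))
    return out


def times(x, y):
    return x * y


edges = [
    (times, [0.7, 0.4], [0.6, 0.1]),   # B1 x C1
    (times, [0.4, 0.3], [0.9, 0.5]),   # B2 x C2
    (times, [0.8, 0.7], [0.4, 0.1]),   # B3 x C3
]
print([round(w, 2) for w in kbest_at_vertex(edges, 2)])
```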

Algorithm 3: Offline (lazy)
• from Alg. 0 to Alg. 2:
  – delaying the calculations until needed -- lazier
  – larger locality
• Even lazier… (one step further)
  – we are interested in the k-best derivations of the final item only!
• forward phase
  – do a normal 1-best search till the final item
  – construct the hypergraph (parse forest) along the way
• recursive backward phase
  – ask the final item: what’s your 2nd-best?
  – final item will propagate this question till the leaves
  – then ask the final item: what’s your 3rd-best?

Algorithm 3 demo
k = 2; the final item S (1, 7) has three incoming hyperarcs:
  NP (1, 3) x VP (4, 7): 0.7 x 0.6 = .42 (1-best)
  VP (1, 5) x NP (6, 7): 0.7 x 0.4 = .28
  NP (1, 2) x VP (3, 7): 0.4 x 0.5 = .20
All other entries are still unknown (?).
• after the “forward” step (1-best parsing): forest = 1-best derivations from each hyperarc
• now the backward step: what’s your 2nd-best?
• I’m not sure... let me ask my parents…
• what’s your 2nd-best? or, equivalently… who’s your successor in this hyperarc?
• well, it must be either … or …: these are candidates for my 2nd-best
• but wait a minute… did you already know the ?’s ? oops… forgot to ask more questions recursively …
• what’s your 2nd-best? recursion goes on to the leaf nodes
• and reports back the numbers: the 2nd-best values are 0.5 (next to 0.7) and 0.3 (next to 0.6)
• push .30 and .21 to the candidate heap (priority queue)
• pop the root of the heap (.30): now I know my 2nd-best
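
The question-and-answer recursion of the demo can be sketched as a memoized "k-th best" query. This is a hedged sketch rather than the authors' implementation: the class name `LazyKBest` and its fields are made up, the forward 1-best phase is folded into the same recursion (asking for rank 0), leaves are modeled as hyperedges with no tails, and assigning the 2nd-best value 0.5 to NP (1, 3) and 0.3 to VP (4, 7) is one reading of the figure.

```python
import heapq
from operator import mul


class LazyKBest:
    def __init__(self, incoming):
        self.incoming = incoming  # vertex -> list of (tails, f)
        self.found = {}   # vertex -> weights extracted so far, best first
        self.cand = {}    # vertex -> heap of (-weight, edge index, ranks)
        self.seen = {}    # vertex -> set of (edge index, ranks) already tried
        self.last = {}    # vertex -> (edge index, ranks) of latest extraction

    def _push(self, v, e, ranks):
        """Add the derivation using the ranks[i]-th best of each tail of edge e."""
        if (e, ranks) in self.seen[v]:
            return
        self.seen[v].add((e, ranks))
        tails, f = self.incoming[v][e]
        args = [self.kth(u, r) for u, r in zip(tails, ranks)]  # ask recursively
        if all(arg is not None for arg in args):
            heapq.heappush(self.cand[v], (-f(*args), e, ranks))

    def kth(self, v, k):
        """Weight of the k-th best derivation of v (k = 0 is the 1-best)."""
        if v not in self.found:
            self.found[v], self.cand[v], self.seen[v] = [], [], set()
            for e, (tails, _) in enumerate(self.incoming[v]):
                self._push(v, e, (0,) * len(tails))
        found = self.found[v]
        while len(found) <= k:
            if found:  # successors of the most recently extracted derivation
                e, ranks = self.last[v]
                for i in range(len(ranks)):
                    self._push(v, e, ranks[:i] + (ranks[i] + 1,) + ranks[i + 1:])
            if not self.cand[v]:
                return None  # fewer than k + 1 derivations exist
            neg, e, ranks = heapq.heappop(self.cand[v])
            found.append(-neg)
            self.last[v] = (e, ranks)
        return found[k]


def leaf(w):
    """A hyperedge with no tails and constant weight w."""
    return ((), lambda: w)


incoming = {
    "NP(1,3)": [leaf(0.7), leaf(0.5)], "VP(4,7)": [leaf(0.6), leaf(0.3)],
    "NP(1,2)": [leaf(0.4)], "VP(3,7)": [leaf(0.5)],
    "VP(1,5)": [leaf(0.7)], "NP(6,7)": [leaf(0.4)],
    "S(1,7)": [(("NP(1,3)", "VP(4,7)"), mul),
               (("NP(1,2)", "VP(3,7)"), mul),
               (("VP(1,5)", "NP(6,7)"), mul)],
}
lazy = LazyKBest(incoming)
print(round(lazy.kth("S(1,7)", 0), 2), round(lazy.kth("S(1,7)", 1), 2))
```

Only the vertices that a query actually reaches do any work: asking S (1, 7) for its 2nd-best touches the tails of the hyperarc that produced the 1-best, and nothing else.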

Summary of Algorithms
Algorithms        Time Complexity               Locality
1-best            O(|E|)                        hyperedge
alg. 0: naïve     O(k^a |E|)                    hyperedge (mult_k)
alg. 1            O(k log k |E|)                hyperedge (mult_k)
alg. 2            O(|E| + |V| k log k)          item (merge_k)
alg. 3: lazy      O(|E| + |D| k log k)          global
generalized J&M   O(|E| + |D| k log (d + k))    global
for CKY: a = 2, |E| = n^3 |P|, |V| = n^2 |N|, |D| = O(n)

Outline
• Formulations
• Algorithms: Alg. 0 thru Alg. 3
• Experiments
  – Collins/Bikel Parser
  – CKY-based MT decoder (Chiang, 2005)
• Conclusion

Bikel Parser
• Based on lexical context-free models (Collins, 2003)
• We use it to emulate Collins Model 2
• beam search (pruning on cells)
  – cell [i, j] contains all items in the form of (A, i, j)
  – beam width x
    • prune away items worse than x times the best item in the cell (threshold pruning in MT)
  – cell limit y
    • only keep at most y best items in a cell (histogram pruning in MT)
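
The two pruning rules can be sketched for a single cell. This is an illustration only, not code from the Bikel Parser: the function name `prune_cell`, the dictionary representation of a cell, and the example probabilities are assumptions, and "worse than x times the best" is read as "probability below x times the best probability".

```python
def prune_cell(items, beam_width, cell_limit):
    """Apply threshold pruning, then histogram pruning, to one chart cell.

    items:      dict mapping an item (here just a label) to its probability
    beam_width: x, a factor such as 1e-3
    cell_limit: y, the maximum number of items kept
    """
    if not items:
        return {}
    best = max(items.values())
    # beam width x: drop items worse than x times the best item in the cell
    kept = {it: p for it, p in items.items() if p >= beam_width * best}
    # cell limit y: keep at most the y best of the survivors
    ranked = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:cell_limit])


cell = {"NP": 0.5, "VP": 0.4, "S": 0.01, "PP": 1e-5}  # made-up probabilities
print(prune_cell(cell, beam_width=1e-3, cell_limit=2))
print(prune_cell(cell, beam_width=1e-3, cell_limit=10))
```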

Efficiency
Implemented Algorithms 0, 1, 3 on top of Bikel Parser
Average (wall-clock) time on section 23 (per sentence):
[Plot not preserved in the transcript; it is annotated with O(|E| + |D| k log k)]

Quality of the k-best lists
Oracle Reranking -- Accuracy (F-score)

Why are our k-best lists better?
average number of parses for sentences of certain length
as sentences get longer, the number of parses should go up (exponentially)!
[Plot legend: Collins; k=100 with beam width 10^-3; k=100 with beam width 10^-4]

MT decoder (Efficiency)
CKY-based Hiero decoder (Chiang, ACL 2005): implemented algorithms 2 and 3
average decoding time (excluding 1-best part)

Discussions: Hyperpaths vs. Derivations
• hyperpath
  – a minimal sub-hypergraph
    • every vertex has at most one hyperedge
• derivation
  – a tree
  – a vertex can appear more than once
• 1-best: always coincide
[Figure: a hypergraph, a hyperpath and a derivation]

Conclusion
• monotonic hypergraph formulation
  – we solved the k-best derivations problem
  – not the k-shortest hyperpaths problem
• k-best Algorithms
  – Alg. 0 (naïve) thru Alg. 3 (lazy)
• experimental results
  – efficiency
  – accuracy (effective search over larger space)

THE END
Questions?
Comments?

Discussions (cont’d): Hyperpaths vs. Derivations
[Figure: a hyperedge from (B, i, j) and (C, j+1, k) to (A, i, k)]
• CKY: always coincide
• Earley: not the case, but easy to fix

Interesting Properties
• 1-best is best everywhere (all decisions optimal)
• 2nd-best is optimal everywhere except one decision
  – and that decision must be 2nd-best
  – and it’s the best of all 2nd-best decisions
• so what about the 3rd-best?
• kth-best is…
(Charniak and Johnson, ACL 2005)
local picture (grid of products a_i b_j; .18 and .16 are highlighted):
  b_j \ a_i   .6    .4    .3    .3
  .5          .30   .20   .15   .15
  .4          .24   .16   .12   .12
  .3          .18   .12   .09   .09
  .1          .06   .04   .03   .03

Quality of the k-best lists
Oracle Reranking -- Relative Improvement
f ∈ R^|T(e)| → R

Future Work
• k-best discriminative supertagging
• Implement Alg. 2 and 3 for Bikel Parser and Alg. 0 and 1 for MT decoder (David)
  – so both experiments have all four algorithms
• Real Reranking (w/ Libin)
• Chinese Parsing?
• Formal Grammars and Hypergraphs
• Case Factor Diagrams and Hypergraphs

Hypergraph is Everything!
Generic Dynamic Programming
Shared Forest
Weighted Deduction
Hypergraph Searching
branching structures vs. finite-state structures

Convergence of Formulations
Shared Forest
Weighted Deduction
Hypergraph

on a log scale…
?