Exponential Decay Pruning for Bottom-Up Beam-Search Parsing Nathan Bodenstab, Brian Roark, Aaron Dunlop, and Keith Hall April 2010


Exponential Decay Pruning for Bottom-Up Beam-Search Parsing

Nathan Bodenstab, Brian Roark, Aaron Dunlop, and Keith Hall

April 2010


Talk Outline

• Intro to Syntactic Parsing
  – Why Parse?
• Parsing Algorithms
  – CYK
  – Best-First
  – Beam-Search
• Exponential Decay Pruning
• Results


Intro to Syntactic Parsing

• Hierarchically cluster and label syntactic word groups (constituents)
• Provides structure and meaning


Intro to Syntactic Parsing

• Why Parse?
  – Machine Translation
    • Synchronous Grammars
  – Language Understanding
    • Semantic Role Labeling
    • Word Sense Disambiguation
    • Question-Answering
    • Document Summarization
  – Language Modeling
    • Long-distance dependencies
  – Because it’s fun


Intro to Syntactic Parsing

• What you (usually) need to parse
  – Supervised data: a treebank of sentences with annotated parse structure
    • WSJ treebank: 50k sentences
  – A binarized Probabilistic Context-Free Grammar induced from a treebank
  – A parsing algorithm
• Example grammar rules:
  – S → NP VP (prob=0.2)
  – NP → NP NN (prob=0.1)
  – NP → JJ NN (prob=0.06)
  – Binarize: VP → PP VB NN
    • VP → PP @VP (prob=0.2)
    • @VP → VB NN (prob=0.5)
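The binarization step above can be sketched in a few lines. This is a minimal illustration, not the authors' grammar-induction code: the function name `binarize` is hypothetical, every factored symbol reuses the single label "@" + parent (as in the slide's @VP), and rule probabilities are left out because the slide does not say how the 0.2 and 0.5 were estimated.

```python
def binarize(lhs, rhs):
    """Right-binarize one rule lhs -> rhs so no rule has more than two children.

    Assumption: all intermediate symbols share the label "@" + lhs,
    mirroring the "@VP" symbol in the slide's example.
    """
    rhs = list(rhs)
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]  # already binary (or unary)

    factored = "@" + lhs
    rules = [(lhs, (rhs[0], factored))]  # first child + factored remainder
    rest = rhs[1:]
    while len(rest) > 2:
        # Peel off one child at a time; the remainder stays factored.
        rules.append((factored, (rest[0], factored)))
        rest = rest[1:]
    rules.append((factored, tuple(rest)))  # final two children
    return rules


print(binarize("VP", ["PP", "VB", "NN"]))
```

For the slide's rule VP → PP VB NN this yields VP → PP @VP and @VP → VB NN.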


Parsing Accuracy

                             Non-terminals  Grammar Size  Sec / Sent  F-Score
Baseline                     2,500          64,000        0.1         74%
Parent Annotation (Johnson)  6,000          75,000        1.0         78%
Manual Refinement (Klein)    15,000         -             -           86%
Latent Variable (Petrov)     1,100          4,000,000     100.0       89%
Lexical (Collins, Charniak)  Lots           Implicit      -           89%

• Accuracy improvements from grammar refinement
  – Split original non-terminal categories (Subject-NP vs. Object-NP)
  – Accuracy at the cost of speed
    • Solution space becomes impractical to exhaustively search


Berkeley Grammar & Parser

• Petrov et al. automatically split non-terminals using latent variables
• Example grammar rules:
  – S_3 → NP_12 VP_6 (prob=0.2)
  – NP_12 → NP_9 NN_7 (prob=0.1)
  – NN_7 → house (prob=0.06)
• Berkeley Coarse-to-Fine parser uses six latent variable grammars
  – Parse input sentence once with each grammar
  – Posterior probabilities from pass n used to prune pass n+1
  – Must know mapping between non-terminals from different grammars
    • Grammar(2) { NP_1, NP_6 } → Grammar(3) { NP_2, NP_9, NP_14 }


Research Goals

• Our Research Goals
  – Find good solutions very quickly in this LARGE grammar space (not necessarily the Maximum Likelihood solution)
  – Algorithms should be grammar agnostic
  – Consider practical implications (speed, memory)
• This talk: Exponential Decay Pruning
  – Beam-Search parsing for efficient search
  – Searches the final grammar space directly
  – Balance overhead of targeted exploration (best-first) vs. memory and cache benefits of local exploration (CYK)


Parsing Algorithms: CYK

• Exhaustive population of all parse trees permitted by the grammar

• Dynamic Programming algorithm gives the Maximum Likelihood solution


Parsing Algorithms: CYK

• Fill in cells for SPAN=1,2,3,4,…

Grammar
S → NP VP (p=0.7)
NP → NP NP (p=0.2)
NP → NP VP (p=0.1)
NN → court (p=0.4)
VB → court (p=0.1)
…


• N iterations through the grammar at each chart cell to consider all possible midpoints
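The chart-filling order and the midpoint loop can be sketched as follows. This is a minimal Viterbi-style CYK under stated assumptions: the function name `cyk`, the dictionary layout, and the lexical entries for "dogs" and "bark" are hypothetical (the slide's grammar is truncated with "…"), and back-pointers for recovering the tree are omitted.

```python
from collections import defaultdict


def cyk(words, lexical, binary):
    """Fill a chart bottom-up, keeping the best probability per label per cell.

    lexical: word -> list of (label, prob)
    binary:  (left_label, right_label) -> list of (parent_label, prob)
    """
    n = len(words)
    chart = defaultdict(dict)  # (start, end) -> {label: best probability}

    # SPAN = 1: one cell per word.
    for i, word in enumerate(words):
        for label, p in lexical.get(word, []):
            chart[i, i + 1][label] = p

    # SPAN = 2, 3, 4, ...: every cell considers all possible midpoints.
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            cell = chart[start, end]
            for mid in range(start + 1, end):
                for left, p_left in chart[start, mid].items():
                    for right, p_right in chart[mid, end].items():
                        for parent, p_rule in binary.get((left, right), []):
                            score = p_rule * p_left * p_right
                            if score > cell.get(parent, 0.0):
                                cell[parent] = score
    return chart


# Binary rules taken from the slide; lexical entries are made up for the demo.
binary = {
    ("NP", "VP"): [("S", 0.7), ("NP", 0.1)],
    ("NP", "NP"): [("NP", 0.2)],
}
lexical = {"dogs": [("NP", 0.3)], "bark": [("VP", 0.4)]}

chart = cyk(["dogs", "bark"], lexical, binary)
print(chart[0, 2])
```

Because every cell keeps every label the grammar permits, memory and time grow with grammar size, which is the cost the later slides try to avoid.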


Parsing Algorithms: Best-First

Grammar
S → NP VP (p=0.7)
VB → court (p=0.1)
…

Frontier PQ
[try][shooting,defendant] VP → VB NP fom=28.1
[try,shooting][defendant] VP → VB NP fom=14.7
[Juvenile][court] NP → ADJ NN fom=13

• Frontier is a Priority Queue of all potentially buildable entries

• Add best entry from Frontier; expand Frontier with all possible chart + grammar extensions
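The frontier ordering can be illustrated with Python's `heapq`. This sketch shows only the priority-queue part (push, pop best); the expansion step that adds new chart + grammar extensions is left out. The helper names `push` and `pop_best` are hypothetical, and the entries are the three from the slide, stored as plain strings.

```python
import heapq


def push(frontier, fom, entry):
    # heapq is a min-heap, so store the negated FOM to pop the largest first.
    heapq.heappush(frontier, (-fom, entry))


def pop_best(frontier):
    neg_fom, entry = heapq.heappop(frontier)
    return -neg_fom, entry


frontier = []
push(frontier, 14.7, "[try,shooting][defendant] VP -> VB NP")
push(frontier, 13.0, "[Juvenile][court] NP -> ADJ NN")
push(frontier, 28.1, "[try][shooting,defendant] VP -> VB NP")

best_fom, best_entry = pop_best(frontier)
print(best_fom, best_entry)
```

The single global queue is what makes the search targeted, and also what has to be maintained across the whole chart.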


Parsing Algorithms: Best-First

• How do we rank Frontier entries?
  – Figure-of-Merit (FOM)
  – FOM = Inside (grammar) * Outside (heuristic)
  – Caraballo and Charniak, 1997 (C&C)
  – Problem: entries covering different spans are hard to compare, since inside probabilities shrink as spans grow



Parsing Algorithms: Beam-Search

• Beam-Search: Best of both worlds
• CYK exhaustive traversal (bottom-up)
• At each chart cell
  – Compute FOM for all possible cell entries
  – Rank entries in a (temporary) local priority queue
  – Only populate the cell with the n-best entries (beam-width)
• Less Memory
  – Not storing all cell entries (CYK) nor bad frontier entries (Best-First)
• Runs Faster
  – Search space is pruned (unlike CYK) and no global priority queue has to be maintained (unlike Best-First)
• Eliminates problem of global cell entry comparison
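The per-cell pruning step can be sketched as below. It is a minimal illustration, assuming candidates arrive as a label-to-FOM mapping; the name `prune_cell` is hypothetical and the FOM values are made up.

```python
import heapq


def prune_cell(candidates, beam_width):
    """Keep only the beam_width highest-FOM entries of one chart cell.

    candidates: dict mapping label -> FOM score (higher is better).
    """
    kept = heapq.nlargest(beam_width, candidates.items(), key=lambda kv: kv[1])
    return dict(kept)


cell_candidates = {"NP": 0.5, "VP": 0.2, "S": 0.9, "PP": 0.1}
print(prune_cell(cell_candidates, beam_width=2))
```

Entries are only ranked against others in the same cell, so scores for different spans are never compared with each other.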


Exponential Decay Pruning

• What is the optimal beam-width per chart cell?
  – Common solutions:
    • Relative score difference from highest ranking entry
    • Global maximum number of candidates
• Exponential Decay Pruning
  – Adaptive beam-width conditioned on chart cell information
  – How reliable is our Figure-of-Merit per chart cell?
  – Plotted rank of Gold entry against span and sentence size
    • FOM is more reliable for larger spans
      – Less dependent on outside estimate
    • FOM is less reliable for short sentences
      – Atypical grammatical structure (in WSJ?)


Exponential Decay Pruning

• Confidence in FOM can be modeled with the Exponential Decay function
  – N0 = Global beam-width maximum
  – n = sentence length
  – s = span length (number of words covered)
  – λ = tuning parameter
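The slide's formula did not survive in this transcript, so the following is only a sketch of the general idea. It assumes the textbook exponential decay form N = N0 · e^(−λx); what x is built from (here n · s, so that both longer sentences and larger spans shrink the beam) is an assumption chosen to match the observations on the previous slide, not the authors' stated definition. The function name, the floor of one entry per cell, and the parameter values are likewise hypothetical.

```python
import math


def decayed_beam_width(n0, lam, x):
    """Generic exponential decay: N = N0 * exp(-lam * x), at least 1.

    n0:  global beam-width maximum (N0 on the slide)
    lam: tuning parameter (lambda on the slide)
    x:   decay argument; how it is derived from n and s is an assumption
    """
    return max(1, round(n0 * math.exp(-lam * x)))


n0, lam = 100, 0.01   # made-up values for illustration
n = 10                # sentence length
for s in (1, 2, 4, 8):  # span length
    # Assumed decay argument: x = n * s (not confirmed by the source).
    print(s, decayed_beam_width(n0, lam, n * s))
```

With λ = 0 this reduces to a constant beam-width of N0, which is the "Constant" setting compared against "Decay" in the results.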


Exponential Decay Pruning

• Confidence in FOM can be modeled with the Exponential Decay function

[Figure: Percent Baseline Constituents Added to Chart (y-axis, 0 to 1) vs. SpanLength / SentenceLength (x-axis, 0 to 1); curves: baseline, n=5, n=10, n=20, n=40]


Results

• Wall Street Journal treebank
  – Train: Sections 2-21 (40k sentences)
  – Dev: Section 24 (1.3k sentences)
  – Test: Section 23 (2.4k sentences)
• Berkeley SM6 Latent Variable Grammar
• Figure-of-Merit from Caraballo and Charniak, 1997 (C&C)
• Also applied Cell Closing Constraints (Roark and Hollingshead, 2008)
• External comparison with Berkeley Coarse-to-Fine parser using the same grammar


Results: Dev

Algorithm    FOM     Beam-Width  Cell Closing  Seconds per Sent  Chart Entries  F-Score
CYK          -       -           -             94.1              163537         87.2
Best-First   Inside  -           -             138.0             152472         87.2
Best-First   C&C     -           -             1.43              349            85.2
Beam-Search  Inside  Constant    -             5.68              35501          87.2
Beam-Search  Inside  Decay       -             3.01              20002          87.0
Beam-Search  C&C     Constant    -             0.62              7548           87.0
Beam-Search  C&C     Decay       -             0.37              5145           87.1
Beam-Search  C&C     Constant    Yes           0.31              5333           87.4
Beam-Search  C&C     Decay       Yes           0.20              3839           87.5

• Figure-of-Merit makes a big difference
• Best-First with the C&C FOM is fast (1.43 vs. 138.0 seconds per sentence), but accuracy degrades significantly (85.2 vs. 87.2 F-score)


• Using the inside probability for the FOM
  – 95% reduction in parse time with Beam-Search over Best-First
  – Exponential Decay adds an additional 47% reduction in parse time

• Using the C&C FOM
  – Beam-Search is faster (57% less parse time) and more accurate than Best-First
  – Exponential Decay adds an additional 40% reduction in parse time


Results: Test

Algorithm     FOM  Beam-Width  Cell Closing  Seconds per Sent  F-Score
CYK           -    -           -             76.63             88.0
Beam-Search   C&C  Constant    -             0.45              87.9
Beam-Search   C&C  Decay       -             0.28              88.0
Beam-Search   C&C  Decay       Yes           0.16              88.3
Berkeley C2F  -    -           -             0.21              88.3

• 38% less time per sentence (Decay vs. Constant beam-width: 0.28 vs. 0.45 seconds)
• Decay pruning and Cell Closing Constraints are complementary
• Same ball-park as Coarse-to-Fine (perhaps a bit faster)
• Requires no knowledge of the grammar
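The percentages quoted on the results slides can be rechecked from the seconds-per-sentence columns. The helper below is a small worked calculation, assuming each figure is a relative reduction in time, (1 − new / baseline) · 100; the function name `time_reduction` is hypothetical.

```python
def time_reduction(baseline_sec, new_sec):
    """Percent reduction in seconds per sentence relative to the baseline."""
    return 100.0 * (1.0 - new_sec / baseline_sec)


# (description, baseline seconds, new seconds), all values from the tables.
comparisons = [
    ("Test: C&C Constant -> Decay", 0.45, 0.28),
    ("Dev: C&C Constant -> Decay", 0.62, 0.37),
    ("Dev: Inside Constant -> Decay", 5.68, 3.01),
]
for name, baseline, new in comparisons:
    print(f"{name}: {time_reduction(baseline, new):.1f}% less time")
```

Read this way, the test-set figure is a reduction in time per sentence rather than a ratio of speeds.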


Thanks


FOM Details

• C&C FOM Details
  – FOM(NT) = Outside_left * Inside * Outside_right
  – Inside = Constituent grammar score for NT
  – Outside_left = Max { POS forward prob * POS-to-NT transition prob }
  – Outside_right = Max { NT-to-POS transition prob * POS backward prob }
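The product above can be written out directly. This is a sketch of the formula as stated on the slide, not the C&C implementation: the function name `cc_fom`, the argument layout, and all probability values are hypothetical, and a real parser would typically work in log space to avoid underflow.

```python
def cc_fom(inside, left_candidates, right_candidates):
    """FOM(NT) = Outside_left * Inside * Outside_right.

    left_candidates:  (POS forward prob, POS-to-NT transition prob) pairs,
                      one per candidate POS tag left of the span
    right_candidates: (NT-to-POS transition prob, POS backward prob) pairs,
                      one per candidate POS tag right of the span
    """
    outside_left = max(fwd * trans for fwd, trans in left_candidates)
    outside_right = max(trans * bkwd for trans, bkwd in right_candidates)
    return outside_left * inside * outside_right


# Made-up probabilities for a single non-terminal over a single span.
fom = cc_fom(
    inside=0.5,
    left_candidates=[(0.5, 0.5), (0.25, 0.5)],
    right_candidates=[(0.5, 0.5)],
)
print(fom)
```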


Research Goals

• Research Goals
  – Find good solutions very quickly in this LARGE grammar space (not necessarily the Maximum Likelihood solution)
  – Algorithms should be grammar agnostic
  – Consider practical implications (speed, memory)
• Current projects towards these goals
  – Better FOM function
    • Inside estimate (grammar refinement)
    • Outside estimate (participation in complete parse tree)
  – Optimal chart traversal strategy
    • Which areas of the search space are most promising?
    • Cell Closing Constraints (Roark and Hollingshead, 2008)
  – Balance between targeted and exhaustive exploration
    • How much “work” should be done exploring the search space around these promising areas?
    • Overhead of targeted exploration (best-first) vs. memory and cache benefits of local exploration (CYK)