
ICS482 Parsing
Chapter 13

Dr. Muhammed Al-Mulhem
March 1, 2009

Parsing

Parsing with CFGs refers to the task of assigning proper trees to input strings.

Proper here means a tree that covers all and only the elements of the input and has an S at the top.

Top Down Parsing

Bottom-Up Search

Ambiguity

Shared Sub-Problems

Consider "A flight from Indianapolis to Houston on TWA".

Assume a top-down parse making choices among the various Nominal rules, in particular between these two:

Nominal → Noun
Nominal → Nominal PP

Choosing the rules in this order leads to the following bad results.


Dynamic Programming

DP search methods fill tables with partial results and thereby:
- Avoid doing avoidable repeated work
- Solve exponential problems in polynomial time (well, no, not really)
- Efficiently store ambiguous structures with shared sub-parts

We'll cover two approaches that roughly correspond to the top-down and bottom-up approaches: CKY and Earley.

CKY Parsing

First we'll limit our grammar to epsilon-free, binary rules (more later).

Consider the rule A → B C. If there is an A somewhere in the input, then there must be a B followed by a C in the input.

If the A spans from i to j in the input, then there must be some k such that i < k < j. I.e., the B splits from the C someplace.

Problem

What if your grammar isn't binary? As in the case of the Treebank grammar.

Convert it to binary... any arbitrary CFG can be rewritten into Chomsky Normal Form automatically.

What does this mean? The resulting grammar accepts (and rejects) the same set of strings as the original grammar.

But the resulting derivations (trees) are different.

Problem

More specifically, we want our rules to be of the form

A → B C

or

A → w

That is, rules can expand to either two non-terminals or to a single terminal.

Binarization Intuition

Eliminate chains of unit productions.

Introduce new intermediate non-terminals into the grammar that distribute rules with length > 2 over several rules.

So... S → A B C turns into S → X C and X → A B, where X is a symbol that doesn't occur anywhere else in the grammar.
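To make that intuition concrete, here is a minimal sketch in Python of the long-rule part of the conversion (the rule representation and the fresh-symbol naming are assumptions for illustration; a full CNF conversion would also handle unit productions, epsilon rules, and terminals inside long right-hand sides):

```python
# Minimal sketch: split rules longer than 2 symbols, as in S -> A B C
# becoming X -> A B and S -> X C. Rules are (lhs, [rhs symbols]) pairs.
def binarize(rules):
    new_rules = []
    counter = 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            x = f"X{counter}"                 # fresh symbol, assumed unused elsewhere
            new_rules.append((x, rhs[:2]))    # X -> first two symbols
            rhs = [x] + rhs[2:]               # the parent rule now starts with X
        new_rules.append((lhs, rhs))
    return new_rules

# S -> A B C turns into X1 -> A B and S -> X1 C
print(binarize([("S", ["A", "B", "C"])]))
```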

Sample L1 Grammar

CNF Conversion

CKY

So let's build a table so that an A spanning from i to j in the input is placed in cell [i,j] in the table.

So a non-terminal spanning an entire string will sit in cell [0,n]. Hopefully an S.

If we build the table bottom-up, we'll know that the parts of the A must go from i to k and from k to j, for some k.

CKY

Meaning that for a rule like A → B C we should look for a B in [i,k] and a C in [k,j].

In other words, if we think there might be an A spanning i,j in the input... AND A → B C is a rule in the grammar, THEN there must be a B in [i,k] and a C in [k,j] for some i < k < j.

CKY

So to fill the table, loop over the cell [i,j] values in some systematic way. What constraint should we put on that systematic search?

For each cell, loop over the appropriate k values to search for things to add.

CKY Algorithm
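The slide shows the textbook pseudocode for the algorithm; as a rough stand-in, here is a minimal CKY recognizer sketch in Python. The grammar encoding (a lexical map for A → w rules and a binary map for A → B C rules) and the toy grammar are assumptions, not the slide's code:

```python
from collections import defaultdict

def cky_recognize(words, lexical, binary, start="S"):
    """Minimal CKY recognizer for a grammar in CNF.
    lexical: dict word -> set of non-terminals (A -> w rules)
    binary:  dict (B, C) -> set of non-terminals (A -> B C rules)"""
    n = len(words)
    table = defaultdict(set)                  # table[(i, j)] = non-terminals spanning i..j
    for j in range(1, n + 1):                 # one column per word, left to right
        table[(j - 1, j)] |= lexical.get(words[j - 1], set())
        for i in range(j - 2, -1, -1):        # rows bottom to top within the column
            for k in range(i + 1, j):         # every split point between i and j
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        table[(i, j)] |= binary.get((B, C), set())
    return start in table[(0, n)]

# Toy grammar, assumed purely for illustration
lexical = {"book": {"V"}, "that": {"Det"}, "flight": {"N"}}
binary = {("Det", "N"): {"NP"}, ("V", "NP"): {"S"}}
print(cky_recognize(["book", "that", "flight"], lexical, binary))  # True
```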

CKY Parsing

Is that really a parser?

Note

We arranged the loops to fill the table a column at a time, from left to right, bottom to top. This assures us that whenever we're filling a cell, the parts needed to fill it are already in the table (to the left and below).

It's somewhat natural in that it processes the input left to right, a word at a time. This is known as online.
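A bare loop skeleton makes that filling order explicit (a sketch only; the index conventions match the recognizer above):

```python
# Column-at-a-time, bottom-to-top filling order (sketch only).
words = ["book", "that", "flight"]
n = len(words)
for j in range(1, n + 1):             # columns left to right: one new word at a time
    print(f"fill diagonal cell [{j - 1},{j}] from the lexicon")
    for i in range(j - 2, -1, -1):    # rows bottom to top within column j
        for k in range(i + 1, j):     # every split point between i and j
            print(f"  combine [{i},{k}] and [{k},{j}] into [{i},{j}]")
```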

Example

Filling column 5

CKY Notes

Since it's bottom-up, CKY populates the table with a lot of phantom constituents: segments that by themselves are constituents but cannot really occur in the context in which they are being suggested.

To avoid this we can switch to a top-down control strategy.

Or we can add some kind of filtering that blocks constituents where they cannot happen in a final analysis.

Earley Parsing

- Allows arbitrary CFGs
- Top-down control
- Fills a table in a single sweep over the input
- Table is length N+1, where N is the number of words
- Table entries represent:
  - Completed constituents and their locations
  - In-progress constituents
  - Predicted constituents

States

The table entries are called states and are represented with dotted rules.

S → · VP               A VP is predicted
NP → Det · Nominal     An NP is in progress
VP → V NP ·            A VP has been found

States/Locations

S → · VP [0,0]
NP → Det · Nominal [1,2]
VP → V NP · [0,3]

A VP is predicted at the start of the sentence.

An NP is in progress; the Det goes from 1 to 2.

A VP has been found starting at 0 and ending at 3.
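One way to make these states concrete is a small record carrying the rule, the dot position, and the span; a minimal sketch (the class and field names are assumptions, not anything shown on the slides):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class State:
    lhs: str                 # left-hand side, e.g. "VP"
    rhs: Tuple[str, ...]     # right-hand side symbols
    dot: int                 # how many rhs symbols have been recognized so far
    start: int               # where the constituent begins in the input
    end: int                 # where the dot currently sits

    def complete(self) -> bool:
        return self.dot == len(self.rhs)

# The three states from the slide:
predicted   = State("S", ("VP",), 0, 0, 0)              # S -> . VP [0,0]
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)  # NP -> Det . Nominal [1,2]
found       = State("VP", ("V", "NP"), 2, 0, 3)         # VP -> V NP . [0,3]
print(predicted.complete(), in_progress.complete(), found.complete())  # False False True
```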

Earley

As with most dynamic programming approaches, the answer is found by looking in the table in the right place.

In this case, there should be an S state in the final column that spans from 0 to N and is complete. That is, S → α · [0,N].

If that's the case, you're done.

Earley

So sweep through the table from 0 to N...

- New predicted states are created by starting top-down from S.
- New incomplete states are created by advancing existing states as new constituents are discovered.
- New complete states are created in the same way.

Earley

More specifically...
1. Predict all the states you can upfront.
2. Read a word:
   1. Extend states based on matches.
   2. Generate new predictions.
   3. Go to step 2.
3. When you're out of words, look at the chart to see if you have a winner.

Core Earley Code

Earley Code
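The two code slides show the textbook's EARLEY-PARSE pseudocode with its predictor, scanner, and completer operations; as a rough stand-in, here is a minimal recognizer sketch in Python (the grammar and lexicon encoding, the dummy GAMMA start state, and the toy grammar are assumptions for illustration, not the slides' code):

```python
def earley_recognize(words, rules, lexicon, start="S"):
    """Minimal Earley recognizer.
    rules:   dict lhs -> list of rhs tuples of non-terminals
    lexicon: dict word -> set of parts of speech
    A state is (lhs, rhs, dot, origin); its end position is the chart index."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(i, state):
        if state not in chart[i]:
            chart[i].append(state)

    add(0, ("GAMMA", (start,), 0, 0))            # dummy start state
    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):                 # chart[i] grows while we walk it
            lhs, rhs, dot, origin = chart[i][j]
            j += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in rules:                 # PREDICTOR
                    for r in rules[nxt]:
                        add(i, (nxt, r, 0, i))
                elif i < n and nxt in lexicon.get(words[i], set()):
                    add(i + 1, (lhs, rhs, dot + 1, origin))   # SCANNER
            else:                                # COMPLETER
                for (l2, r2, d2, o2) in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, o2))
    return ("GAMMA", (start,), 1, 0) in chart[n]

# Toy grammar and lexicon for "Book that flight" (an assumption for illustration)
rules = {"S": [("VP",)], "VP": [("V", "NP")],
         "NP": [("Det", "Nominal")], "Nominal": [("Noun",)]}
lexicon = {"book": {"V"}, "that": {"Det"}, "flight": {"Noun"}}
print(earley_recognize(["book", "that", "flight"], rules, lexicon))  # True
```

As the later slides note, this only recognizes; to turn it into a parser, each added state would also need backpointers recording which states produced it.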

Example

"Book that flight." We should find... an S from 0 to 3 that is a completed state...

Chart[0]

Note that given a grammar, these entries are the same for all inputs; they can be pre-loaded.

Chart[1]

Charts[2] and [3]

Efficiency

For such a simple example, there seems to be a lot of useless stuff in there. Why?

- It's predicting things that aren't consistent with the input.
- That's the flip side to the CKY problem.

Details

As with CKY, that isn't a parser until we add the backpointers so that each state knows where it came from.

Back to Ambiguity

Did we solve it?

Ambiguity

No... Both CKY and Earley will result in multiple S structures for the [0,N] table entry.

They both efficiently store the sub-parts that are shared between multiple parses.

And they obviously avoid re-deriving those sub-parts.

But neither can tell us which one is right.

Ambiguity

In most cases humans don't notice incidental ambiguity (lexical or syntactic); it is resolved on the fly and never noticed.

We'll try to model that with probabilities.

But note something odd and important about the Groucho Marx example...

March 1 2009 2Dr Muhammed Al_mulhem

Parsing

Parsing with CFGs refers to the task of assigning proper trees to input strings

Proper here means a tree that covers all and only the elements of the input and has an S at the top

March 1 2009 3Dr Muhammed Al_mulhem

Top Down Parsing

March 1 2009 4Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 5Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 6Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 7Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 8Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 9Dr Muhammed Al_mulhem

Ambiguity

March 1 2009 10Dr Muhammed Al_mulhem

Shared Sub-Problems

Consider A flight from Indianapolis to Houston on TWA

March 1 2009 11Dr Muhammed Al_mulhem

Shared Sub-Problems

Assume a top-down parse making choices among the various Nominal rules

In particular between these two Nominal -gt Noun Nominal -gt Nominal PP

Choosing the rules in this order leads to the following bad results

March 1 2009 12Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 13Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 14Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 15Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 16Dr Muhammed Al_mulhem

Dynamic Programming

DP search methods fill tables with partial results and thereby Avoid doing avoidable repeated work Solve exponential problems in polynomial time

(well no not really) Efficiently store ambiguous structures with

shared sub-parts Wersquoll cover two approaches that roughly

correspond to top-down and bottom-up approaches CKY Earley

March 1 2009 17Dr Muhammed Al_mulhem

CKY Parsing

First wersquoll limit our grammar to epsilon-free binary rules (more later)

Consider the rule A BC If there is an A somewhere in the

input then there must be a B followed by a C in the input

If the A spans from i to j in the input then there must be some k st iltkltj Ie The B splits from the C someplace

March 1 2009 18Dr Muhammed Al_mulhem

Problem

What if your grammar isnrsquot binary As in the case of the TreeBank grammar

Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

What does this mean The resulting grammar accepts (and rejects)

the same set of strings as the original grammar

But the resulting derivations (trees) are different

March 1 2009 19Dr Muhammed Al_mulhem

Problem

More specifically we want our rules to be of the formA B C

Or

A w

That is rules can expand to either 2 non-terminals or to a single terminal

March 1 2009 20Dr Muhammed Al_mulhem

Binarization Intuition

Eliminate chains of unit productions Introduce new intermediate non-

terminals into the grammar that distribute rules with length gt 2 over several rules Sohellip S A B C turns into S X C andX A BWhere X is a symbol that doesnrsquot occur

anywhere else in the the grammar

March 1 2009 21Dr Muhammed Al_mulhem

Sample L1 Grammar

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 3Dr Muhammed Al_mulhem

Top Down Parsing

March 1 2009 4Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 5Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 6Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 7Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 8Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 9Dr Muhammed Al_mulhem

Ambiguity

March 1 2009 10Dr Muhammed Al_mulhem

Shared Sub-Problems

Consider A flight from Indianapolis to Houston on TWA

March 1 2009 11Dr Muhammed Al_mulhem

Shared Sub-Problems

Assume a top-down parse making choices among the various Nominal rules

In particular between these two Nominal -gt Noun Nominal -gt Nominal PP

Choosing the rules in this order leads to the following bad results

March 1 2009 12Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 13Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 14Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 15Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 16Dr Muhammed Al_mulhem

Dynamic Programming

DP search methods fill tables with partial results and thereby Avoid doing avoidable repeated work Solve exponential problems in polynomial time

(well no not really) Efficiently store ambiguous structures with

shared sub-parts Wersquoll cover two approaches that roughly

correspond to top-down and bottom-up approaches CKY Earley

March 1 2009 17Dr Muhammed Al_mulhem

CKY Parsing

First wersquoll limit our grammar to epsilon-free binary rules (more later)

Consider the rule A BC If there is an A somewhere in the

input then there must be a B followed by a C in the input

If the A spans from i to j in the input then there must be some k st iltkltj Ie The B splits from the C someplace

March 1 2009 18Dr Muhammed Al_mulhem

Problem

What if your grammar isnrsquot binary As in the case of the TreeBank grammar

Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

What does this mean The resulting grammar accepts (and rejects)

the same set of strings as the original grammar

But the resulting derivations (trees) are different

March 1 2009 19Dr Muhammed Al_mulhem

Problem

More specifically we want our rules to be of the formA B C

Or

A w

That is rules can expand to either 2 non-terminals or to a single terminal

March 1 2009 20Dr Muhammed Al_mulhem

Binarization Intuition

Eliminate chains of unit productions Introduce new intermediate non-

terminals into the grammar that distribute rules with length gt 2 over several rules Sohellip S A B C turns into S X C andX A BWhere X is a symbol that doesnrsquot occur

anywhere else in the the grammar

March 1 2009 21Dr Muhammed Al_mulhem

Sample L1 Grammar

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 4Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 5Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 6Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 7Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 8Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 9Dr Muhammed Al_mulhem

Ambiguity

March 1 2009 10Dr Muhammed Al_mulhem

Shared Sub-Problems

Consider A flight from Indianapolis to Houston on TWA

March 1 2009 11Dr Muhammed Al_mulhem

Shared Sub-Problems

Assume a top-down parse making choices among the various Nominal rules

In particular between these two Nominal -gt Noun Nominal -gt Nominal PP

Choosing the rules in this order leads to the following bad results

March 1 2009 12Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 13Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 14Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 15Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 16Dr Muhammed Al_mulhem

Dynamic Programming

DP search methods fill tables with partial results and thereby Avoid doing avoidable repeated work Solve exponential problems in polynomial time

(well no not really) Efficiently store ambiguous structures with

shared sub-parts Wersquoll cover two approaches that roughly

correspond to top-down and bottom-up approaches CKY Earley

March 1 2009 17Dr Muhammed Al_mulhem

CKY Parsing

First wersquoll limit our grammar to epsilon-free binary rules (more later)

Consider the rule A BC If there is an A somewhere in the

input then there must be a B followed by a C in the input

If the A spans from i to j in the input then there must be some k st iltkltj Ie The B splits from the C someplace

March 1 2009 18Dr Muhammed Al_mulhem

Problem

What if your grammar isnrsquot binary As in the case of the TreeBank grammar

Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

What does this mean The resulting grammar accepts (and rejects)

the same set of strings as the original grammar

But the resulting derivations (trees) are different

March 1 2009 19Dr Muhammed Al_mulhem

Problem

More specifically we want our rules to be of the formA B C

Or

A w

That is rules can expand to either 2 non-terminals or to a single terminal

March 1 2009 20Dr Muhammed Al_mulhem

Binarization Intuition

Eliminate chains of unit productions Introduce new intermediate non-

terminals into the grammar that distribute rules with length gt 2 over several rules Sohellip S A B C turns into S X C andX A BWhere X is a symbol that doesnrsquot occur

anywhere else in the the grammar

March 1 2009 21Dr Muhammed Al_mulhem

Sample L1 Grammar

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 5Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 6Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 7Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 8Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 9Dr Muhammed Al_mulhem

Ambiguity

March 1 2009 10Dr Muhammed Al_mulhem

Shared Sub-Problems

Consider A flight from Indianapolis to Houston on TWA

March 1 2009 11Dr Muhammed Al_mulhem

Shared Sub-Problems

Assume a top-down parse making choices among the various Nominal rules

In particular between these two Nominal -gt Noun Nominal -gt Nominal PP

Choosing the rules in this order leads to the following bad results

March 1 2009 12Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 13Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 14Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 15Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 16Dr Muhammed Al_mulhem

Dynamic Programming

DP search methods fill tables with partial results and thereby Avoid doing avoidable repeated work Solve exponential problems in polynomial time

(well no not really) Efficiently store ambiguous structures with

shared sub-parts Wersquoll cover two approaches that roughly

correspond to top-down and bottom-up approaches CKY Earley

March 1 2009 17Dr Muhammed Al_mulhem

CKY Parsing

First wersquoll limit our grammar to epsilon-free binary rules (more later)

Consider the rule A BC If there is an A somewhere in the

input then there must be a B followed by a C in the input

If the A spans from i to j in the input then there must be some k st iltkltj Ie The B splits from the C someplace

March 1 2009 18Dr Muhammed Al_mulhem

Problem

What if your grammar isnrsquot binary As in the case of the TreeBank grammar

Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

What does this mean The resulting grammar accepts (and rejects)

the same set of strings as the original grammar

But the resulting derivations (trees) are different

March 1 2009 19Dr Muhammed Al_mulhem

Problem

More specifically we want our rules to be of the formA B C

Or

A w

That is rules can expand to either 2 non-terminals or to a single terminal

March 1 2009 20Dr Muhammed Al_mulhem

Binarization Intuition

Eliminate chains of unit productions Introduce new intermediate non-

terminals into the grammar that distribute rules with length gt 2 over several rules Sohellip S A B C turns into S X C andX A BWhere X is a symbol that doesnrsquot occur

anywhere else in the the grammar

March 1 2009 21Dr Muhammed Al_mulhem

Sample L1 Grammar

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 6Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 7Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 8Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 9Dr Muhammed Al_mulhem

Ambiguity

March 1 2009 10Dr Muhammed Al_mulhem

Shared Sub-Problems

Consider A flight from Indianapolis to Houston on TWA

March 1 2009 11Dr Muhammed Al_mulhem

Shared Sub-Problems

Assume a top-down parse making choices among the various Nominal rules

In particular between these two Nominal -gt Noun Nominal -gt Nominal PP

Choosing the rules in this order leads to the following bad results

March 1 2009 12Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 13Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 14Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 15Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 16Dr Muhammed Al_mulhem

Dynamic Programming

DP search methods fill tables with partial results and thereby Avoid doing avoidable repeated work Solve exponential problems in polynomial time

(well no not really) Efficiently store ambiguous structures with

shared sub-parts Wersquoll cover two approaches that roughly

correspond to top-down and bottom-up approaches CKY Earley

March 1 2009 17Dr Muhammed Al_mulhem

CKY Parsing

First wersquoll limit our grammar to epsilon-free binary rules (more later)

Consider the rule A BC If there is an A somewhere in the

input then there must be a B followed by a C in the input

If the A spans from i to j in the input then there must be some k st iltkltj Ie The B splits from the C someplace

March 1 2009 18Dr Muhammed Al_mulhem

Problem

What if your grammar isnrsquot binary As in the case of the TreeBank grammar

Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

What does this mean The resulting grammar accepts (and rejects)

the same set of strings as the original grammar

But the resulting derivations (trees) are different

March 1 2009 19Dr Muhammed Al_mulhem

Problem

More specifically we want our rules to be of the formA B C

Or

A w

That is rules can expand to either 2 non-terminals or to a single terminal

March 1 2009 20Dr Muhammed Al_mulhem

Binarization Intuition

Eliminate chains of unit productions Introduce new intermediate non-

terminals into the grammar that distribute rules with length gt 2 over several rules Sohellip S A B C turns into S X C andX A BWhere X is a symbol that doesnrsquot occur

anywhere else in the the grammar

March 1 2009 21Dr Muhammed Al_mulhem

Sample L1 Grammar

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 7Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 8Dr Muhammed Al_mulhem

Bottom-Up Search

March 1 2009 9Dr Muhammed Al_mulhem

Ambiguity

March 1 2009 10Dr Muhammed Al_mulhem

Shared Sub-Problems

Consider A flight from Indianapolis to Houston on TWA

March 1 2009 11Dr Muhammed Al_mulhem

Shared Sub-Problems

Assume a top-down parse making choices among the various Nominal rules

In particular between these two Nominal -gt Noun Nominal -gt Nominal PP

Choosing the rules in this order leads to the following bad results

March 1 2009 12Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 13Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 14Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 15Dr Muhammed Al_mulhem

Shared Sub-Problems

March 1 2009 16Dr Muhammed Al_mulhem

Dynamic Programming

DP search methods fill tables with partial results and thereby Avoid doing avoidable repeated work Solve exponential problems in polynomial time

(well no not really) Efficiently store ambiguous structures with

shared sub-parts Wersquoll cover two approaches that roughly

correspond to top-down and bottom-up approaches CKY Earley

March 1 2009 17Dr Muhammed Al_mulhem

CKY Parsing

First wersquoll limit our grammar to epsilon-free binary rules (more later)

Consider the rule A BC If there is an A somewhere in the

input then there must be a B followed by a C in the input

If the A spans from i to j in the input then there must be some k st iltkltj Ie The B splits from the C someplace

March 1 2009 18Dr Muhammed Al_mulhem

Problem

What if your grammar isnrsquot binary As in the case of the TreeBank grammar

Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

What does this mean The resulting grammar accepts (and rejects)

the same set of strings as the original grammar

But the resulting derivations (trees) are different

March 1 2009 19Dr Muhammed Al_mulhem

Problem

More specifically we want our rules to be of the formA B C

Or

A w

That is rules can expand to either 2 non-terminals or to a single terminal

March 1 2009 20Dr Muhammed Al_mulhem

Binarization Intuition

Eliminate chains of unit productions Introduce new intermediate non-

terminals into the grammar that distribute rules with length gt 2 over several rules Sohellip S A B C turns into S X C andX A BWhere X is a symbol that doesnrsquot occur

anywhere else in the the grammar

March 1 2009 21Dr Muhammed Al_mulhem

Sample L1 Grammar

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it?

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

No… Both CKY and Earley will result in multiple S structures for the [0,N] table entry.

They both efficiently store the sub-parts that are shared between multiple parses.

And they obviously avoid re-deriving those sub-parts.

But neither can tell us which one is right.

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans don't notice incidental ambiguity (lexical or syntactic); it is resolved on the fly and never noticed.

We'll try to model that with probabilities.

But note something odd and important about the Groucho Marx example…


Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 16Dr Muhammed Al_mulhem

Dynamic Programming

DP search methods fill tables with partial results and thereby Avoid doing avoidable repeated work Solve exponential problems in polynomial time

(well no not really) Efficiently store ambiguous structures with

shared sub-parts Wersquoll cover two approaches that roughly

correspond to top-down and bottom-up approaches CKY Earley

March 1 2009 17Dr Muhammed Al_mulhem

CKY Parsing

First wersquoll limit our grammar to epsilon-free binary rules (more later)

Consider the rule A BC If there is an A somewhere in the

input then there must be a B followed by a C in the input

If the A spans from i to j in the input then there must be some k st iltkltj Ie The B splits from the C someplace

March 1 2009 18Dr Muhammed Al_mulhem

Problem

What if your grammar isnrsquot binary As in the case of the TreeBank grammar

Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

What does this mean The resulting grammar accepts (and rejects)

the same set of strings as the original grammar

But the resulting derivations (trees) are different

March 1 2009 19Dr Muhammed Al_mulhem

Problem

More specifically we want our rules to be of the formA B C

Or

A w

That is rules can expand to either 2 non-terminals or to a single terminal

March 1 2009 20Dr Muhammed Al_mulhem

Binarization Intuition

Eliminate chains of unit productions Introduce new intermediate non-

terminals into the grammar that distribute rules with length gt 2 over several rules Sohellip S A B C turns into S X C andX A BWhere X is a symbol that doesnrsquot occur

anywhere else in the the grammar

March 1 2009 21Dr Muhammed Al_mulhem

Sample L1 Grammar

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 17Dr Muhammed Al_mulhem

CKY Parsing

First wersquoll limit our grammar to epsilon-free binary rules (more later)

Consider the rule A BC If there is an A somewhere in the

input then there must be a B followed by a C in the input

If the A spans from i to j in the input then there must be some k st iltkltj Ie The B splits from the C someplace

March 1 2009 18Dr Muhammed Al_mulhem

Problem

What if your grammar isnrsquot binary As in the case of the TreeBank grammar

Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

What does this mean The resulting grammar accepts (and rejects)

the same set of strings as the original grammar

But the resulting derivations (trees) are different

March 1 2009 19Dr Muhammed Al_mulhem

Problem

More specifically we want our rules to be of the formA B C

Or

A w

That is rules can expand to either 2 non-terminals or to a single terminal

March 1 2009 20Dr Muhammed Al_mulhem

Binarization Intuition

Eliminate chains of unit productions Introduce new intermediate non-

terminals into the grammar that distribute rules with length gt 2 over several rules Sohellip S A B C turns into S X C andX A BWhere X is a symbol that doesnrsquot occur

anywhere else in the the grammar

March 1 2009 21Dr Muhammed Al_mulhem

Sample L1 Grammar

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 18Dr Muhammed Al_mulhem

Problem

What if your grammar isnrsquot binary As in the case of the TreeBank grammar

Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

What does this mean The resulting grammar accepts (and rejects)

the same set of strings as the original grammar

But the resulting derivations (trees) are different

March 1 2009 19Dr Muhammed Al_mulhem

Problem

More specifically we want our rules to be of the formA B C

Or

A w

That is rules can expand to either 2 non-terminals or to a single terminal

March 1 2009 20Dr Muhammed Al_mulhem

Binarization Intuition

Eliminate chains of unit productions Introduce new intermediate non-

terminals into the grammar that distribute rules with length gt 2 over several rules Sohellip S A B C turns into S X C andX A BWhere X is a symbol that doesnrsquot occur

anywhere else in the the grammar

March 1 2009 21Dr Muhammed Al_mulhem

Sample L1 Grammar

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 19Dr Muhammed Al_mulhem

Problem

More specifically we want our rules to be of the formA B C

Or

A w

That is rules can expand to either 2 non-terminals or to a single terminal

March 1 2009 20Dr Muhammed Al_mulhem

Binarization Intuition

Eliminate chains of unit productions Introduce new intermediate non-

terminals into the grammar that distribute rules with length gt 2 over several rules Sohellip S A B C turns into S X C andX A BWhere X is a symbol that doesnrsquot occur

anywhere else in the the grammar

March 1 2009 21Dr Muhammed Al_mulhem

Sample L1 Grammar

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 20Dr Muhammed Al_mulhem

Binarization Intuition

Eliminate chains of unit productions Introduce new intermediate non-

terminals into the grammar that distribute rules with length gt 2 over several rules Sohellip S A B C turns into S X C andX A BWhere X is a symbol that doesnrsquot occur

anywhere else in the the grammar

March 1 2009 21Dr Muhammed Al_mulhem

Sample L1 Grammar

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 21Dr Muhammed Al_mulhem

Sample L1 Grammar

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 22Dr Muhammed Al_mulhem

CNF Conversion

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 23Dr Muhammed Al_mulhem

CKY

So letrsquos build a table so that an A spanning from i to j in the input is placed in cell [ij] in the table

So a non-terminal spanning an entire string will sit in cell [0 n] Hopefully an S

If we build the table bottom-up wersquoll know that the parts of the A must go from i to k and from k to j for some k

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 24Dr Muhammed Al_mulhem

CKY

Meaning that for a rule like A B C we should look for a B in [ik] and a C in [kj]

In other words if we think there might be an A spanning ij in the inputhellip AND

A B C is a rule in the grammar THEN

There must be a B in [ik] and a C in [kj] for some iltkltj

March 1 2009 25Dr Muhammed Al_mulhem

CKY

So to fill the table loop over the cell[ij] values in some systematic way What constraint should we put on that

systematic search

For each cell loop over the appropriate k values to search for things to add

March 1 2009 26Dr Muhammed Al_mulhem

CKY Algorithm

March 1 2009 27Dr Muhammed Al_mulhem

CKY Parsing

Is that really a parser

March 1 2009 28Dr Muhammed Al_mulhem

Note

We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore

filling a cell the parts needed to fill it are already in the table (to the left and below)

Itrsquos somewhat natural in that it processes the input a left to right a word at a time Known as online

March 1 2009 29Dr Muhammed Al_mulhem

Example

March 1 2009 30Dr Muhammed Al_mulhem

Example

Filling column 5

March 1 2009 31Dr Muhammed Al_mulhem

Example

March 1 2009 32Dr Muhammed Al_mulhem

Example

March 1 2009 33Dr Muhammed Al_mulhem

Example

March 1 2009 34Dr Muhammed Al_mulhem

Example

March 1 2009 35Dr Muhammed Al_mulhem

CKY Notes

Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are

constituents but cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy

Or we can add some kind of filtering that blocks constituents where they can not happen in a final analysis

March 1 2009 36Dr Muhammed Al_mulhem

Earley Parsing

Allows arbitrary CFGs Top-down control Fills a table in a single sweep over

the input Table is length N+1 N is number of

words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

March 1 2009 37Dr Muhammed Al_mulhem

States

The table-entries are called states and are represented with dotted-rules

S VP A VP is predicted

NP Det Nominal An NP is in progress

VP V NP A VP has been found

March 1 2009 38Dr Muhammed Al_mulhem

StatesLocations

S VP [00]

NP Det Nominal [12]

VP V NP [03]

A VP is predicted at the start of the sentence

An NP is in progress the Det goes from 1 to 2

A VP has been found starting at 0 and ending at 3

March 1 2009 39Dr Muhammed Al_mulhem

Earley

As with most dynamic programming approaches the answer is found by looking in the table in the right place

In this case there should be an S state in the final column that spans from 0 to N and is complete That is S α [0N]

If thatrsquos the case yoursquore done

March 1 2009 40Dr Muhammed Al_mulhem

Earley

So sweep through the table from 0 to Nhellip New predicted states are created by

starting top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

March 1 2009 41Dr Muhammed Al_mulhem

Earley

More specificallyhellip1 Predict all the states you can upfront2 Read a word

1 Extend states based on matches2 Generate new predictions3 Go to step 2

3 When yoursquore out of words look at the chart to see if you have a winner

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY, that isn't a parser until we add the backpointers, so that each state knows where it came from.
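One minimal way to sketch that, as an assumption rather than the chapter's own data structure: give each state a tuple of backpointers to the completed states that advanced it, and read the tree off the final S state.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class BackpointerState:
    """Like the Earley states above, but remembering how each constituent was built."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int
    end: int
    backpointers: Tuple["BackpointerState", ...] = ()   # completed children, in order

def tree(state):
    """Recover a bracketed tree by following backpointers from a completed state."""
    if not state.backpointers:                # a scanned word: rhs holds the word itself
        return (state.lhs, state.rhs[0])
    return (state.lhs,) + tuple(tree(child) for child in state.backpointers)
```

In the completer step, the advanced copy would carry waiting.backpointers + (st,), so each constituent remembers its children in order.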

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it?

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

No… Both CKY and Earley will result in multiple S structures for the [0,N] table entry.

They both efficiently store the sub-parts that are shared between multiple parses, and they obviously avoid re-deriving those sub-parts.

But neither can tell us which one is right.

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans don't notice incidental ambiguity (lexical or syntactic). It is resolved on the fly and never noticed.

We'll try to model that with probabilities.

But note something odd and important about the Groucho Marx example…


They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 42Dr Muhammed Al_mulhem

Core Earley Code

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 43Dr Muhammed Al_mulhem

Earley Code

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 44Dr Muhammed Al_mulhem

Example

Book that flight We should findhellip an S from 0 to 3

that is a completed statehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 45Dr Muhammed Al_mulhem

Chart[0]

Note that given a grammar these entries are the same for all inputs they can be pre-loaded

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 46Dr Muhammed Al_mulhem

Chart[1]

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 47Dr Muhammed Al_mulhem

Charts[2] and [3]

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 48Dr Muhammed Al_mulhem

Efficiency

For such a simple example there seems to be a lot of useless stuff in there

Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 49Dr Muhammed Al_mulhem

Details

As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 50Dr Muhammed Al_mulhem

Back to Ambiguity

Did we solve it

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 51Dr Muhammed Al_mulhem

Ambiguity

Nohellip Both CKY and Earley will result in

multiple S structures for the [0N] table entry

They both efficiently store the sub-parts that are shared between multiple parses

And they obviously avoid re-deriving those sub-parts

But neither can tell us which one is right

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip

March 1 2009 52Dr Muhammed Al_mulhem

Ambiguity

In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

Wersquoll try to model that with probabilities

But note something odd and important about the Groucho Marx examplehellip