Context-Free Parsing

Post on 19-Dec-2015


2/37

Basic issues

• Top-down vs. bottom-up

• Handling ambiguity
  – Lexical ambiguity
  – Structural ambiguity

• Breadth first vs. depth first

• Handling recursive rules

• Handling empty rules

3/37

Some terminology

• Rules written A → B c
  o Terminal vs. non-terminal symbols
  o Left-hand side (head): always non-terminal
  o Right-hand side (body): can be a mix of terminal and non-terminal, any number of them
  o Unique start symbol (usually S)
  o ‘→’ means “rewrites as”, but is not directional (an “=” sign would be better)

4/37

1. Top-down with simple grammar

Grammar:
S → NP VP
NP → det n
VP → v
VP → v NP

Lexicon:
det → {an, the}
n → {elephant, man}
v → shot

Input: the man shot an elephant

[Figure: top-down parse. S → NP VP is expanded; NP → det n matches “the man”; the first VP rule, VP → v, matches “shot”.]

No more rules, but the input is not completely accounted for. So we must backtrack and try the other VP rule.

5/37

1. Top-down with simple grammar

Grammar and lexicon: unchanged from the previous slide.

Input: the man shot an elephant

[Figure: top-down parse. S → NP VP is expanded; NP → det n matches “the man”; the second VP rule, VP → v NP, matches “shot”, and NP → det n matches “an elephant”.]

No more rules, and the input is completely accounted for.
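
The backtracking in this example can be sketched in a few lines of Python. This is only a minimal depth-first, top-down recognizer under stated assumptions: the names `GRAMMAR`, `LEXICON`, `expand` and `recognize` are illustrative and do not come from the slides, and rules are simply tried in the order listed.

```python
# Illustrative encoding of the simple grammar (names are assumptions).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["det", "n"]],
    "VP": [["v"], ["v", "NP"]],
}
LEXICON = {
    "det": {"an", "the"},
    "n": {"elephant", "man"},
    "v": {"shot"},
}


def expand(symbols, words):
    """Yield the words left over after `symbols` has matched a prefix of `words`."""
    if not symbols:
        yield words
        return
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:
        # Lexical category: it must match the next word of the input.
        if words and words[0] in LEXICON[first]:
            yield from expand(rest, words[1:])
    else:
        # Non-terminal: try each rule in turn; moving on to the next
        # rule after a failure is the backtracking step.
        for body in GRAMMAR[first]:
            yield from expand(body + rest, words)


def recognize(sentence):
    """True if some derivation from S accounts for the whole input."""
    return any(left == [] for left in expand(["S"], sentence.split()))
```

With `recognize("the man shot an elephant")`, the rule VP → v is tried first and leaves “an elephant” unconsumed; the call only succeeds after falling back to VP → v NP.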

6/37

Breadth-first vs depth-first (1)

• When we came to the VP rule we were faced with a choice of two rules

• “Depth-first” means following the first choice through to the end

• “Breadth-first” means keeping all your options open

• We’ll see this distinction more clearly later, and also see that it is quite significant

7/37

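2. Bottom-up with simple grammar

Grammar:
S → NP VP
NP → det n
VP → v
VP → v NP

Lexicon:
det → {an, the}
n → {elephant, man}
v → shot

Input: the man shot an elephant

[Figure: bottom-up parse. The words are tagged det n v det n; NP → det n builds an NP over “the man”; VP → v builds a VP over “shot”; S → NP VP builds an S over “the man shot”.]

We’ve reached the top, but the input is not completely accounted for. So we must backtrack and try the other VP rule.

[Figure: NP → det n builds an NP over “an elephant”; VP → v NP builds a VP over “shot an elephant”; S → NP VP builds an S over the whole input.]

We’ve reached the top, and the input is completely accounted for.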
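
The bottom-up procedure can be sketched as a backtracking shift-reduce recognizer. This is a hedged illustration, not the method used to draw the slides: the stack-based formulation and the names `GRAMMAR`, `LEXICON`, `parse` and `recognize` are assumptions made for the example.

```python
# Rules as (head, body) pairs; the lexicon maps each word to its categories.
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "n")),
    ("VP", ("v",)),
    ("VP", ("v", "NP")),
]
LEXICON = {
    "an": {"det"}, "the": {"det"},
    "elephant": {"n"}, "man": {"n"},
    "shot": {"v"},
}


def parse(stack, words):
    """True if the stack plus the remaining words can be reduced to a lone S."""
    if stack == ("S",) and not words:
        return True
    # Reduce: replace a rule body found on top of the stack by its head.
    for head, body in GRAMMAR:
        n = len(body)
        if stack[-n:] == body and parse(stack[:-n] + (head,), words):
            return True
    # Shift: move the next word onto the stack under each of its categories.
    if words:
        for cat in LEXICON.get(words[0], ()):
            if parse(stack + (cat,), words[1:]):
                return True
    # Neither worked: the caller backtracks to its next alternative.
    return False


def recognize(sentence):
    return parse((), tuple(sentence.split()))
```

On “the man shot an elephant” this sketch first reduces “shot” with VP → v and reaches S with “an elephant” still unread, then backtracks and succeeds with VP → v NP, mirroring the two stages of the example.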

8/37

Same again but with lexical ambiguity

Grammar:
S → NP VP
NP → det n
VP → v
VP → v NP

Lexicon:
det → {an, the}
n → {elephant, man, shot}
v → shot

shot can be v or n

9/37

3. Top-down with lexical ambiguity

Grammar and lexicon: unchanged from the previous slide.

Input: the man shot an elephant

[Figure: top-down parse. S → NP VP is expanded; NP → det n matches “the man”; VP → v NP matches “shot” followed by det n, “an elephant”.]

Same as before: at this point, we are looking for a v, and shot fits the bill; the n reading never comes into play.

10/37

4. Bottom-up with lexical ambiguity

Grammar and lexicon: unchanged from the previous slide.

Input: the man shot an elephant

[Figure: bottom-up chart. The words are tagged det n v det n, and “shot” also receives an n arc. NP → det n, VP → v, VP → v NP and S → NP VP add NP, VP and S arcs above the words.]

Terminology: graph, nodes, arcs (edges)

11/37

4. Bottom-up with lexical ambiguity

Grammar and lexicon: unchanged from the previous slide.

Input: the man shot an elephant

[Figure: the same chart, with arcs for det, n, v, NP, VP and S, including the n arc over “shot” and the VP and S arcs that do not take part in the complete parse.]

Let’s get rid of all the unused arcs

[Figure: the chart with only the arcs used in the complete parse: det n v det n, two NPs, one VP and one S.]

And let’s clear away all the arcs…

15/37

Breadth-first vs depth-first (2)

• In chart parsing, the distinction is more clear cut:

• At any point there may be a choice of things to do: which arcs to develop

• Breadth-first vs. depth-first can be seen as what order they are done in

• Queue (FIFO = breadth-first) vs. stack (LIFO = depth-first)

16/37

Same again but with structural ambiguity

Grammar:
S → NP VP
NP → det n
NP → det n PP
VP → v
VP → v NP
VP → v NP PP
PP → prep NP

Lexicon:
det → {an, the, his}
n → {elephant, man, shot, pyjamas}
v → shot
prep → in

[Figure: parse tree for “the man shot an elephant” (S over NP VP, with NP = det n and VP = v NP), and a PP “in his pyjamas” (prep NP, with NP = det n) still to be attached.]

We introduce a PP rule in two places

18/37

5. Top-down with structural ambiguity

Grammar and lexicon: as on slide 16.

Input: the man shot an elephant in his pyjamas

[Figure: top-down parse. S → NP VP is expanded; NP → det n matches “the man”.]

At this point, depending on our strategy (breadth-first vs. depth-first), we may consider the NP complete and look for the VP, or we may try the second NP rule. Let’s see what happens in the latter case.

[Figure: NP → det n PP, with PP → prep NP and prep → in.]

The next word, shot, isn’t a prep, so this rule simply fails.

19/37

5. Top-down with structural ambiguity

Grammar and lexicon: as on slide 16.

Input: the man shot an elephant in his pyjamas

[Figure: NP → det n matches “the man”; VP → v matches “shot”.]

As before, the first VP rule works, but does not account for all the input.

[Figure: VP → v NP matches “shot”, and NP → det n matches “an elephant”.]

Similarly, if we try the second VP rule and the first NP rule …

20/37

5. Top-down with structural ambiguity

Grammar and lexicon: as on slide 16.

Input: the man shot an elephant in his pyjamas

[Figure: “the man” parsed as NP, “shot” as v, and “an elephant” as NP by VP → v NP and NP → det n. Two rules are marked as still untried: NP → det n PP and VP → v NP PP.]

So what do we try next? This? Or this?

Depth-first: it’s a stack, LIFO

Breadth-first: it’s a queue, FIFO

21/37

5. Top-down with structural ambiguity (depth-first)

Grammar and lexicon: as on slide 16.

Input: the man shot an elephant in his pyjamas

[Figure: VP → v NP with the second NP rule, NP → det n PP: “shot”, then “an elephant” followed by PP → prep NP over “in his pyjamas”.]

22/37

5. Top-down with structural ambiguity (breadth-first)

Grammar and lexicon: as on slide 16.

Input: the man shot an elephant in his pyjamas

[Figure: the third VP rule, VP → v NP PP: “shot”, NP → det n over “an elephant”, and PP → prep NP over “in his pyjamas”.]

23/37

Recognizing ambiguity

• Notice how the choice of strategy determines which result we get (first).

• In both strategies, there are often rules left untried, on the list (whether queue or stack).

• If we want to know whether our input is ambiguous, at some point we do have to follow these through.

• As you will see later, trying out alternative paths can be quite computationally intensive.
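
The points above can be illustrated with an agenda-based top-down parser in which the only difference between the two strategies is whether the agenda is used as a stack or as a queue. This is a sketch under stated assumptions: the state layout (symbols still to expand, input position, rules used so far) and the names `GRAMMAR`, `LEXICON` and `parses` are hypothetical, and the agenda is emptied completely so that every analysis is found.

```python
from collections import deque

GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("det", "n"), ("det", "n", "PP")],
    "VP": [("v",), ("v", "NP"), ("v", "NP", "PP")],
    "PP": [("prep", "NP")],
}
LEXICON = {
    "det": {"an", "the", "his"},
    "n": {"elephant", "man", "shot", "pyjamas"},
    "v": {"shot"},
    "prep": {"in"},
}


def parses(sentence, depth_first=True):
    """Return every analysis as a tuple of the rules used, in the order found."""
    words = sentence.split()
    agenda = deque([(("S",), 0, ())])
    found = []
    while agenda:
        # Stack (LIFO) gives depth-first; queue (FIFO) gives breadth-first.
        symbols, pos, rules = agenda.pop() if depth_first else agenda.popleft()
        if not symbols:
            if pos == len(words):
                found.append(rules)
            continue
        first, rest = symbols[0], symbols[1:]
        if first in LEXICON:
            if pos < len(words) and words[pos] in LEXICON[first]:
                agenda.append((rest, pos + 1, rules))
        else:
            # Every alternative goes on the agenda; none is thrown away.
            for body in GRAMMAR[first]:
                agenda.append((body + rest, pos, rules + ((first, body),)))
    return found
```

For “the man shot an elephant in his pyjamas” both settings return the same two analyses, one using VP → v NP PP and one using NP → det n PP. Which one comes out first depends on the agenda discipline and on the order in which rules are pushed, so it need not match the order shown on the slides.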

24/37

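6. Bottom-up with structural ambiguity

Grammar and lexicon: as on slide 16.

Input: the man shot an elephant in his pyjamas

[Figure: bottom-up chart. The words are tagged det n v det n prep det n. NP → det n builds NPs over “the man”, “an elephant” and “his pyjamas”; PP → prep NP builds a PP over “in his pyjamas”; NP → det n PP builds an NP over “an elephant in his pyjamas”; VP → v, VP → v NP and VP → v NP PP build VPs starting at “shot”; S → NP VP builds an S over each VP.]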

25/37

6. Bottom-up with structural ambiguity

Grammar and lexicon: as on slide 16.

Input: the man shot an elephant in his pyjamas

[Figure: the same chart without the rule labels: tags det n v det n prep det n, the NP and PP arcs, and the VP and S arcs built on them.]

26/37

Recursive rules

• “Recursive” rules call themselves

• We already have some recursive rule pairs:
  NP → det n PP
  PP → prep NP

• Rules can be immediately recursive:
  AdjG → adj AdjG
  (the) big fat ugly (man)

27/37

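Recursive rules

Left recursive:
AdjG → AdjG adj
AdjG → adj

Right recursive:
AdjG → adj AdjG
AdjG → adj

[Figure: two trees over “big fat rich old”. With the left-recursive rules the tree branches to the left: [[[[big] fat] rich] old]. With the right-recursive rules it branches to the right: [big [fat [rich [old]]]].]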

28/37

7. Top-down with left recursion

Grammar:
NP → det n
NP → det AdjG n
AdjG → AdjG adj
AdjG → adj

Input: the big fat rich old man

[Figure: top-down parse. NP → det n matches “the”, but the next word is not an n. NP → det AdjG n matches “the”; AdjG → AdjG adj is then expanded again and again, each time proposing another AdjG adj without consuming any input.]

You can’t have left-recursive rules with a top-down parser, even if the non-recursive rule is first.
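
The problem can be made concrete with a small depth-first, top-down recognizer that gives up once its expansion depth passes a limit. This is a sketch under stated assumptions: the depth limit of 50, the `RuntimeError` and the names `recognize`, `LEFT`, `LEFT_BASE_FIRST`, `RIGHT` and `LEXICON` are all illustrative choices, not part of the slides.

```python
LEXICON = {"det": {"the"}, "adj": {"big", "fat", "rich", "old"}, "n": {"man"}}

NP_RULES = [("det", "n"), ("det", "AdjG", "n")]
# Left-recursive AdjG, recursive rule first.
LEFT = {"NP": NP_RULES, "AdjG": [("AdjG", "adj"), ("adj",)]}
# Left-recursive AdjG, non-recursive rule first.
LEFT_BASE_FIRST = {"NP": NP_RULES, "AdjG": [("adj",), ("AdjG", "adj")]}
# Right-recursive AdjG.
RIGHT = {"NP": NP_RULES, "AdjG": [("adj", "AdjG"), ("adj",)]}


def recognize(grammar, start, sentence, max_depth=50):
    """Depth-first top-down recognition that stops at the first success."""
    words = sentence.split()

    def expand(symbols, pos, depth):
        if depth > max_depth:
            # Stand-in for "never terminates": a real parser would loop here.
            raise RuntimeError("depth limit exceeded, probable left recursion")
        if not symbols:
            return pos == len(words)
        first, rest = symbols[0], symbols[1:]
        if first in LEXICON:
            return (pos < len(words) and words[pos] in LEXICON[first]
                    and expand(rest, pos + 1, depth + 1))
        # Left recursion puts the same non-terminal first again
        # without moving `pos`, so the depth grows without bound.
        return any(expand(body + rest, pos, depth + 1) for body in grammar[first])

    return expand((start,), 0, 0)
```

With `LEFT` the call on “the big fat rich old man” hits the limit at once. With `LEFT_BASE_FIRST` this particular sketch accepts a well-formed phrase because it stops at the first success, but it still runs away on input it cannot parse, such as “the big fat”, or whenever all alternatives have to be explored. `RIGHT` terminates in both cases.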

29/37

7. Top-down with right recursion

Grammar:
NP → det n
NP → det AdjG n
AdjG → adj AdjG
AdjG → adj

Input: the big fat rich old man

[Figure: top-down parse. NP → det AdjG n matches “the”; AdjG → adj AdjG matches “big”, “fat” and “rich” in turn; for “old”, AdjG → adj AdjG finds no further adj, so AdjG → adj is used instead; n matches “man”.]

30/37

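8. Bottom-up with left and right recursion

Grammar:
NP → det n
NP → det AdjG n
AdjG → AdvG adj AdjG
AdjG → adj
AdvG → AdvG adv
AdvG → adv

AdjG rule is right recursive, AdvG rule is left recursive

Input: the very very fat ugly man

[Figure: bottom-up chart. The words are tagged det adv adv adj adj n. AdjG → adj builds an AdjG over each adj; AdvG → adv builds an AdvG over each adv; AdvG → AdvG adv builds an AdvG over “very very”; AdjG → AdvG adj AdjG builds AdjGs over “very fat ugly” and “very very fat ugly”; NP → det AdjG n builds the NP.]

Quite a few useless paths, but overall no difficulty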

31/37

8. Bottom-up with left and right recursion

Grammar: unchanged from the previous slide. AdjG rule is right recursive, AdvG rule is left recursive

Input: the very very fat ugly man

[Figure: the same chart with fewer arcs: tags det adv adv adj adj n, the AdjG and AdvG arcs built by AdjG → adj, AdvG → adv, AdvG → AdvG adv and AdjG → AdvG adj AdjG, and the NP built by NP → det AdjG n.]

32/37

Empty rules

• For example
  NP → det AdjG n
  AdjG → adj AdjG
  AdjG → ε

• Equivalent to
  NP → det AdjG n
  NP → det n
  AdjG → adj
  AdjG → adj AdjG

• Or
  NP → det (AdjG) n
  AdjG → adj (AdjG)
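
The step from the first grammar to the second can be sketched as the usual removal of empty rules: find the non-terminals that can rewrite as nothing, then add a copy of each rule with those symbols left out. The function name `remove_empty_rules` and the dictionary encoding, where `()` stands for ε, are assumptions made for this illustration; the sketch also ignores the case where the start symbol itself can be empty.

```python
from itertools import product


def remove_empty_rules(grammar):
    """grammar maps each head to a list of bodies (tuples); () is the empty body."""
    # A head is nullable if one of its bodies consists only of nullable symbols.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(head)
                changed = True

    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # Each nullable symbol may be either kept or dropped.
            options = [((sym,), ()) if sym in nullable else ((sym,),) for sym in body]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                if new_body and new_body not in new_bodies:
                    new_bodies.append(new_body)
        result[head] = new_bodies
    return result


grammar = {
    "NP": [("det", "AdjG", "n")],
    "AdjG": [("adj", "AdjG"), ()],
}
print(remove_empty_rules(grammar))
```

Applied to the example grammar, this yields the four rules listed under “Equivalent to”, up to the order in which the AdjG rules are listed.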

33/37

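7. Top-down with empty rules

Grammar:
NP → det AdjG n
AdjG → adj AdjG
AdjG → ε

Input: the man

[Figure: NP → det AdjG n matches “the”; AdjG → adj AdjG fails because “man” is not an adj, so AdjG → ε applies; n matches “man”.]

Input: the big fat man

[Figure: NP → det AdjG n matches “the”; AdjG → adj AdjG matches “big” and then “fat”; the remaining AdjG rewrites as ε; n matches “man”.]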

34/37

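8. Bottom-up with empty rules

Grammar:
NP → det AdjG n
AdjG → adj AdjG
AdjG → ε

Input: the fat man

[Figure: bottom-up chart. The words are tagged det adj n. AdjG → ε adds an empty AdjG at every position; AdjG → adj AdjG builds an AdjG over “fat”; NP → det AdjG n builds the NP.]

Lots of useless paths, especially in a long sentence, but otherwise no difficulty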

35/37

Top down vs. bottom-up

• Bottom-up builds many useless trees

• Top-down can propose false trails, sometimes quite long, which are only abandoned when they reach the word level
  – Especially a problem if breadth-first

• Bottom-up very inefficient with empty rules

• Top-down CANNOT handle left-recursion

• Top-down cannot do partial parsing
  – Partial parsing is especially useful for speech

• Wouldn’t it be nice to combine them to get the advantages of both?

36/37

Left-corner parsing

• The “left corner” of a rule is the first symbol after the rewrite arrow
  – e.g. in S → NP VP, the left corner is NP.

• Left-corner parsing starts bottom-up, taking the first item off the input and finding a rule for which it is the left corner.

• This provides a top-down prediction, but we continue working bottom-up until the prediction is fulfilled.

• When a rule is completed, apply the left-corner principle: is that completed constituent a left corner?
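
The steps above can be sketched as a small recursive left-corner recognizer. It is a minimal illustration under stated assumptions: the function names `recognize`, `parse`, `complete` and `parse_seq` are hypothetical, the grammar is assumed to have no empty rules, and no left-corner table is used to filter out predictions that cannot lead to the goal.

```python
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "n")),
    ("VP", ("v",)),
    ("VP", ("v", "NP")),
]
LEXICON = {
    "an": {"det"}, "the": {"det"},
    "elephant": {"n"}, "man": {"n"},
    "shot": {"v"},
}


def recognize(sentence, start="S"):
    words = sentence.split()

    def parse(goal, pos):
        """Yield every position at which a `goal` starting at `pos` can end."""
        # Bottom-up step: take the next word off the input.
        if pos < len(words):
            for cat in LEXICON.get(words[pos], ()):
                yield from complete(cat, goal, pos + 1)

    def complete(cat, goal, pos):
        """`cat` has just been recognized ending at `pos`; grow it towards `goal`."""
        if cat == goal:
            yield pos
        # Left-corner principle: is `cat` the left corner of some rule?
        for head, body in GRAMMAR:
            if body[0] == cat:
                # Top-down prediction of the rest of the rule body.
                for end in parse_seq(body[1:], pos):
                    yield from complete(head, goal, end)

    def parse_seq(symbols, pos):
        if not symbols:
            yield pos
            return
        for mid in parse(symbols[0], pos):
            yield from parse_seq(symbols[1:], mid)

    return any(end == len(words) for end in parse(start, 0))
```

For “the man shot an elephant”, “the” is found to be the left corner of NP → det n, the completed NP is the left corner of S → NP VP, and the predicted VP is then found both by VP → v and by VP → v NP; only the second accounts for the whole input.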

37/37

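9. Left-corner with simple grammar

Grammar:
S → NP VP
NP → det n
VP → v
VP → v NP

Input: the man shot an elephant

[Figure: left-corner parse. “the” is a det, the left corner of NP → det n; “man” supplies the predicted n, completing the NP. NP is the left corner of S → NP VP, which predicts a VP. “shot” is a v, the left corner of VP → v, completing the VP and the S.]

But the text is not all accounted for, so try VP → v NP.

[Figure: “an” is a det, the left corner of NP → det n; “elephant” supplies the predicted n, completing the NP, the VP and the S.]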