Transcript
Page 1: Structure and Ambiguity Removing Ambiguity

Structure and AmbiguityRemoving Ambiguity

Chomsky Normal FormPushdown Automata Intro

(who is he foolin', thinking that there will be time to get to this?)

MA/CSSE 474Theory of Computation

Page 2: Structure and Ambiguity Removing Ambiguity

Context free languages:

We care about structure.

E

E + E

id E * E

3 id id

5 7

Structure

Page 3: Structure and Ambiguity Removing Ambiguity

Parse trees capture essential structure:

1 2 3 4 5 6S SS (S)S ((S))S (())S (())(S) (())()S SS (S)S ((S))S ((S))(S) (())(S) (())() 1 2 3 5 4 6

S

S S

( S ) ( S )

( S )

Derivations and parse trees

Page 4: Structure and Ambiguity Removing Ambiguity

Parse Trees

A parse tree, derived from a grammar G = (V, , R, S), is a rooted, ordered tree in which:

● Every leaf node is labeled with an element of {}, ● The root node is labeled S,

● Every other node is labeled with an element of N, and

● If m is a non-leaf node labeled X and the (ordered) children of m are labeled x1, x2, …, xn, then R contains the rule X  x1 x2, … xn.

Page 5: Structure and Ambiguity Removing Ambiguity

S

NP VP

Nominal V NP

Adjs N Nominal

Adj N

the smart cat smells chocolate

Structure in English

Page 6: Structure and Ambiguity Removing Ambiguity

Generative Capacity

Because parse trees matter, it makes sense, given a grammar G, to distinguish between:

● G’s weak generative capacity, defined to be the set of strings, L(G), that G generates, and

● G’s strong generative capacity, defined to be the set of parse trees that G generates.

Page 7: Structure and Ambiguity Removing Ambiguity

Algorithms Care How We Search

Algorithms for generation and recognition must be systematic. They typically use either the leftmost derivation or the rightmost derivation.

S

S S

( S ) ( S )

( S )

Page 8: Structure and Ambiguity Removing Ambiguity

Derivations of The Smart Cat• A left-most derivation is: S NP VP the Nominal VP the Adjs N VP the Adj N VP the smart N VP the smart cat VP the smart cat V NP the smart cat smells NP

the smart cat smells Nominal the smart cat smells N the smart cat smells chocolate

• A right-most derivation is: S NP VP NP V NP NP V Nominal NP V N NP V chocolate NP smells chocolate the Nominal smells chocolate the Adjs N smells chocolate the Adjs cat smells chocolate

the Adj cat smells chocolate the smart cat smells chocolate

Page 9: Structure and Ambiguity Removing Ambiguity

Ambiguity

A grammar is ambiguous iff there is at least one string in L(G) for which G produces more than one parse tree*.

For many applications of context-free grammars, this is a problem.

Example: A programming language. •If there can be two different structures for a string in the language, there can be two different meanings. •Not good!

* Equivalently, more than one leftmost derivation, or more than one rightmost derivation.

Page 10: Structure and Ambiguity Removing Ambiguity

An Arithmetic Expression Grammar

E E + E E E E E (E) E id

Page 11: Structure and Ambiguity Removing Ambiguity

Inherent Ambiguity

Some CF languages have the property that every grammar for them is ambiguous. We call such languages inherently ambiguous.

Example:

L = {anbncm: n, m 0} {anbmcm: n, m 0}.

Page 12: Structure and Ambiguity Removing Ambiguity

Inherent Ambiguity

L = {anbncm: n, m 0} {anbmcm: n, m 0}.

One grammar for L has the rules:

S S1 | S2

S1 S1c | A /* Generate all strings in {anbncm}.A aAb |

S2 aS2 | B /* Generate all strings in {anbmcm}.B bBc |

Consider any string of the form anbncn.

It turns out that L is inherently ambiguous.

Page 13: Structure and Ambiguity Removing Ambiguity

Ambiguity and undecidability

Both of the following problems are undecidable*:

• Given a context-free grammar G, is G ambiguous?

• Given a context-free language L, is L inherently ambiguous?

Informal definition of undecidable for the first problem:There is no algorithm (procedure that is guaranteed to always halt) that, given a grammar G, determines whether G is ambiguous.

Page 14: Structure and Ambiguity Removing Ambiguity

But We Can Often Reduce Ambiguity

We can get rid of:

● some rules like S ,

● rules with symmetric right-hand sides, e.g.,

S SS E E + E

● rule sets that lead to ambiguous attachment of optional postfixes, such as if … else ….

Page 15: Structure and Ambiguity Removing Ambiguity

A Highly Ambiguous Grammar

S S SSS (S)

Page 16: Structure and Ambiguity Removing Ambiguity

Resolving the Ambiguity with a Different Grammar

The biggest problem is the rule.

A different grammar for the language of balanced parentheses:

S* S* SS SSS (S)S ()

We'd like to have an algorithm for removing all -productions…… except for the case where - is actually in the language; then we introduce a new start symbol and have one -production whose left side is that symbol.

Page 17: Structure and Ambiguity Removing Ambiguity

Nullable Nonterminals

Examples:

S aTa T

S aTaT A B

A B

A nonterminal X is nullable iff either: (1) there is a rule X , or (2) there is a rule X PQR… and P, Q, R, … are all nullable.

Page 18: Structure and Ambiguity Removing Ambiguity

Nullable Nonterminals

A nonterminal X is nullable iff either: (1) there is a rule X , or (2) there is a rule X PQR… and P, Q, R, … are all nullable.

So compute N, the set of nullable nonterminals, as follows:

1. Set N to the set of nonterminals that satisfy (1). 2. Repeat until an entire pass is made without adding anything to N Evaluate all other nonterminals with respect to (2). If any nonterminal satisfies (2) and is not in N, insert it.

Page 19: Structure and Ambiguity Removing Ambiguity

A General Technique for Getting Rid of -Rules

Definition: a rule is modifiable iff it is of the form:

P Q, for some nullable Q.

removeEps(G: cfg) = 1. Let G = G. 2. Find the set N of nullable nonterminals in G. 3. Repeat until G contains no modifiable rules that

haven’t been processed: Given the rule P Q, where Q N,

add the rule P if it is not already present and if   and if P . 4. Delete from G all rules of the form X . 5. Return G.

L(G) = L(G) – {}

Page 20: Structure and Ambiguity Removing Ambiguity

An Example G = {{S, T, A, B, C, a, b, c}, {a, b, c}, R, S), R = { S aTa

T ABC A aA | C

B Bb | C C c | }

removeEps(G: cfg) = 1. Let G = G. 2. Find the set N of nullable nonterminals in G. 3. Repeat until G contains no modifiable rules

that haven’t been processed:

Given the rule P Q, where Q N, add the rule P

if it is not already present and if   and if P .

4. Delete from G all rules of the form X . 5. Return G.

Page 21: Structure and Ambiguity Removing Ambiguity

What If L?

atmostoneEps(G: cfg) = 1. G = removeEps(G). 2. If SG is nullable then /* i. e., L(G)

2.1 Create in G a new start symbol S*. 2.2 Add to RG the two rules:

S* S* SG.

3. Return G.

Page 22: Structure and Ambiguity Removing Ambiguity

But There Can Still Be Ambiguity

S* What about ()()() ?S* SS SSS (S)S ()

Page 23: Structure and Ambiguity Removing Ambiguity

Eliminating Symmetric Recursive Rules

S* S* S S SS S (S) S ()

Replace S SS with one of:

S  SS1 /* force branching to the left S  S1S /* force branching to the right

So we get:

S* S  SS1

S* S S  S1 S1 (S) S1 ()

Page 24: Structure and Ambiguity Removing Ambiguity

Eliminating Symmetric Recursive Rules

So we get: S* S* S S  SS1 S  S1 S1 (S) S1 ()

S*

S

S S1

S S1

S1

( ) ( ) ( )

Page 25: Structure and Ambiguity Removing Ambiguity

Arithmetic Expressions

E E + E E E E E (E) E id }

E E

E E E E

E E E E

id id id id id id

Problem 1: Associativity

Page 26: Structure and Ambiguity Removing Ambiguity

Arithmetic Expressions

E E + E E E E E (E) E id }

E E

E E E E

E E E E

id id + id id id + id

Problem 2: Precedence

Page 27: Structure and Ambiguity Removing Ambiguity

Arithmetic Expressions - A Better Way

E E + T E T

T T * FT F

F (E)F id

Page 28: Structure and Ambiguity Removing Ambiguity

Ambiguous Attachment

The dangling else problem:

<stmt> ::= if <cond> then <stmt><stmt> ::= if <cond> then <stmt> else <stmt>

Consider:

if cond1 then if cond2 then st1 else st2

Page 29: Structure and Ambiguity Removing Ambiguity

<Statement> ::= <IfThenStatement> | <IfThenElseStatement> | <IfThenElseStatementNoShortIf><StatementNoShortIf> ::= <block> | <IfThenElseStatementNoShortIf> | …<IfThenStatement> ::= if ( <Expression> ) <Statement><IfThenElseStatement> ::= if ( <Expression> ) <StatementNoShortIf> else <Statement><IfThenElseStatementNoShortIf> ::= if ( <Expression> ) <StatementNoShortIf> else <StatementNoShortIf>

<Statement>

<IfThenElseStatement>

if (cond) <StatementNoShortIf> else <Statement>

The Java Fix

Page 30: Structure and Ambiguity Removing Ambiguity

Hidden: Going Too Far

S NP VPNP the Nominal | Nominal | ProperNoun | NP PPNominal N | Adjs NN cat | girl | dogs | ball | chocolate | bat ProperNoun Chris | FluffyAdjs Adj Adjs | AdjAdj young | older | smartVP V | V NP | VP PPV like | likes | thinks | hitsPP Prep NPPrep with

● Chris likes the girl with the cat.

● Chris shot the bear with a rifle.

Page 31: Structure and Ambiguity Removing Ambiguity

Normal Forms

A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid objects, with the following two properties: ● For every element c of C, except possibly a finite set of special cases, there exists some element f of F such that f is equivalent to c with respect to some set of tasks.

● F is simpler than the original form in which the elements of C are written. By “simpler” we mean that at least some tasks are easier to perform on elements of F than they would be on elements of C.

Page 32: Structure and Ambiguity Removing Ambiguity

Normal Form Examples

● Disjunctive normal form for database queries so that they can be entered in a query-by-example grid.

● Jordan normal form for a square matrix, in which the matrix is almost diagonal in the sense that its only non-zero entries lie on the diagonal and the superdiagonal.

● Various normal forms for grammars to support specific parsing techniques.

Page 33: Structure and Ambiguity Removing Ambiguity

Normal Forms for Grammars

Chomsky Normal Form, in which all rules are of one of the following two forms:

● X a, where a , or● X BC, where B and C are elements of V - .

Advantages:

● Parsers can use binary trees.● Exact length of derivations is known:

S

A B

A A B B

a a b B B

b b

Page 34: Structure and Ambiguity Removing Ambiguity

Normal Forms for Grammars

Greibach Normal Form, in which all rules are of the following form:

● X a , where a and (V - )*.

Advantages:

● Every derivation of a string s contains |s| rule applications.

● Greibach normal form grammars can easily be converted to pushdown automata with no - transitions. This is useful because such PDAs are guaranteed to halt.

Page 35: Structure and Ambiguity Removing Ambiguity

Theorems: Normal Forms ExistTheorem: Given a CFG G, there exists an equivalent Chomsky normal form grammar GC such that:

L(GC) = L(G) – {}.

Proof: The proof is by construction.

Theorem: Given a CFG G, there exists an equivalent Greibach normal form grammar GG such that:

L(GG) = L(G) – {}.

Proof: The proof is also by construction.

Details of Chomsky conversion are complex but straightforward; I leave them for you to read in Chapter 11 and/or in the next 16 slides.

Details of Greibach conversion are more complex but still straightforward; I leave them for you to read in Appendix D if you wish (not req'd).

Page 36: Structure and Ambiguity Removing Ambiguity

The Price of Normal Forms

E E + EE (E)E id

Converting to Chomsky normal form:

E E EE P EE L EE E RE id L (R )P +

Conversion doesn’t change weak generative capacity but it may change strong generative capacity.


Top Related