structure and ambiguity removing ambiguity

36
Structure and Ambiguity Removing Ambiguity Chomsky Normal Form Pushdown Automata Intro (who is he foolin', thinking that there will be time to get to this?) MA/CSSE 474 Theory of Computation

Upload: ita

Post on 21-Jan-2016

114 views

Category:

Documents


2 download

DESCRIPTION

MA/CSSE 474 Theory of Computation. Structure and Ambiguity Removing Ambiguity Chomsky Normal Form Pushdown Automata Intro (who is he foolin', thinking that there will be time to get to this?). Structure. Context free languages: We care about structure. E E + E id E * E - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Structure and Ambiguity Removing Ambiguity

Structure and AmbiguityRemoving Ambiguity

Chomsky Normal FormPushdown Automata Intro

(who is he foolin', thinking that there will be time to get to this?)

MA/CSSE 474Theory of Computation

Page 2: Structure and Ambiguity Removing Ambiguity

Context free languages:

We care about structure.

E

E + E

id E * E

3 id id

5 7

Structure

Page 3: Structure and Ambiguity Removing Ambiguity

Parse trees capture essential structure:

1 2 3 4 5 6S SS (S)S ((S))S (())S (())(S) (())()S SS (S)S ((S))S ((S))(S) (())(S) (())() 1 2 3 5 4 6

S

S S

( S ) ( S )

( S )

Derivations and parse trees

Page 4: Structure and Ambiguity Removing Ambiguity

Parse Trees

A parse tree, derived from a grammar G = (V, , R, S), is a rooted, ordered tree in which:

● Every leaf node is labeled with an element of {}, ● The root node is labeled S,

● Every other node is labeled with an element of N, and

● If m is a non-leaf node labeled X and the (ordered) children of m are labeled x1, x2, …, xn, then R contains the rule X  x1 x2, … xn.

Page 5: Structure and Ambiguity Removing Ambiguity

S

NP VP

Nominal V NP

Adjs N Nominal

Adj N

the smart cat smells chocolate

Structure in English

Page 6: Structure and Ambiguity Removing Ambiguity

Generative Capacity

Because parse trees matter, it makes sense, given a grammar G, to distinguish between:

● G’s weak generative capacity, defined to be the set of strings, L(G), that G generates, and

● G’s strong generative capacity, defined to be the set of parse trees that G generates.

Page 7: Structure and Ambiguity Removing Ambiguity

Algorithms Care How We Search

Algorithms for generation and recognition must be systematic. They typically use either the leftmost derivation or the rightmost derivation.

S

S S

( S ) ( S )

( S )

Page 8: Structure and Ambiguity Removing Ambiguity

Derivations of The Smart Cat• A left-most derivation is: S NP VP the Nominal VP the Adjs N VP the Adj N VP the smart N VP the smart cat VP the smart cat V NP the smart cat smells NP

the smart cat smells Nominal the smart cat smells N the smart cat smells chocolate

• A right-most derivation is: S NP VP NP V NP NP V Nominal NP V N NP V chocolate NP smells chocolate the Nominal smells chocolate the Adjs N smells chocolate the Adjs cat smells chocolate

the Adj cat smells chocolate the smart cat smells chocolate

Page 9: Structure and Ambiguity Removing Ambiguity

Ambiguity

A grammar is ambiguous iff there is at least one string in L(G) for which G produces more than one parse tree*.

For many applications of context-free grammars, this is a problem.

Example: A programming language. •If there can be two different structures for a string in the language, there can be two different meanings. •Not good!

* Equivalently, more than one leftmost derivation, or more than one rightmost derivation.

Page 10: Structure and Ambiguity Removing Ambiguity

An Arithmetic Expression Grammar

E E + E E E E E (E) E id

Page 11: Structure and Ambiguity Removing Ambiguity

Inherent Ambiguity

Some CF languages have the property that every grammar for them is ambiguous. We call such languages inherently ambiguous.

Example:

L = {anbncm: n, m 0} {anbmcm: n, m 0}.

Page 12: Structure and Ambiguity Removing Ambiguity

Inherent Ambiguity

L = {anbncm: n, m 0} {anbmcm: n, m 0}.

One grammar for L has the rules:

S S1 | S2

S1 S1c | A /* Generate all strings in {anbncm}.A aAb |

S2 aS2 | B /* Generate all strings in {anbmcm}.B bBc |

Consider any string of the form anbncn.

It turns out that L is inherently ambiguous.

Page 13: Structure and Ambiguity Removing Ambiguity

Ambiguity and undecidability

Both of the following problems are undecidable*:

• Given a context-free grammar G, is G ambiguous?

• Given a context-free language L, is L inherently ambiguous?

Informal definition of undecidable for the first problem:There is no algorithm (procedure that is guaranteed to always halt) that, given a grammar G, determines whether G is ambiguous.

Page 14: Structure and Ambiguity Removing Ambiguity

But We Can Often Reduce Ambiguity

We can get rid of:

● some rules like S ,

● rules with symmetric right-hand sides, e.g.,

S SS E E + E

● rule sets that lead to ambiguous attachment of optional postfixes, such as if … else ….

Page 15: Structure and Ambiguity Removing Ambiguity

A Highly Ambiguous Grammar

S S SSS (S)

Page 16: Structure and Ambiguity Removing Ambiguity

Resolving the Ambiguity with a Different Grammar

The biggest problem is the rule.

A different grammar for the language of balanced parentheses:

S* S* SS SSS (S)S ()

We'd like to have an algorithm for removing all -productions…… except for the case where - is actually in the language; then we introduce a new start symbol and have one -production whose left side is that symbol.

Page 17: Structure and Ambiguity Removing Ambiguity

Nullable Nonterminals

Examples:

S aTa T

S aTaT A B

A B

A nonterminal X is nullable iff either: (1) there is a rule X , or (2) there is a rule X PQR… and P, Q, R, … are all nullable.

Page 18: Structure and Ambiguity Removing Ambiguity

Nullable Nonterminals

A nonterminal X is nullable iff either: (1) there is a rule X , or (2) there is a rule X PQR… and P, Q, R, … are all nullable.

So compute N, the set of nullable nonterminals, as follows:

1. Set N to the set of nonterminals that satisfy (1). 2. Repeat until an entire pass is made without adding anything to N Evaluate all other nonterminals with respect to (2). If any nonterminal satisfies (2) and is not in N, insert it.

Page 19: Structure and Ambiguity Removing Ambiguity

A General Technique for Getting Rid of -Rules

Definition: a rule is modifiable iff it is of the form:

P Q, for some nullable Q.

removeEps(G: cfg) = 1. Let G = G. 2. Find the set N of nullable nonterminals in G. 3. Repeat until G contains no modifiable rules that

haven’t been processed: Given the rule P Q, where Q N,

add the rule P if it is not already present and if   and if P . 4. Delete from G all rules of the form X . 5. Return G.

L(G) = L(G) – {}

Page 20: Structure and Ambiguity Removing Ambiguity

An Example G = {{S, T, A, B, C, a, b, c}, {a, b, c}, R, S), R = { S aTa

T ABC A aA | C

B Bb | C C c | }

removeEps(G: cfg) = 1. Let G = G. 2. Find the set N of nullable nonterminals in G. 3. Repeat until G contains no modifiable rules

that haven’t been processed:

Given the rule P Q, where Q N, add the rule P

if it is not already present and if   and if P .

4. Delete from G all rules of the form X . 5. Return G.

Page 21: Structure and Ambiguity Removing Ambiguity

What If L?

atmostoneEps(G: cfg) = 1. G = removeEps(G). 2. If SG is nullable then /* i. e., L(G)

2.1 Create in G a new start symbol S*. 2.2 Add to RG the two rules:

S* S* SG.

3. Return G.

Page 22: Structure and Ambiguity Removing Ambiguity

But There Can Still Be Ambiguity

S* What about ()()() ?S* SS SSS (S)S ()

Page 23: Structure and Ambiguity Removing Ambiguity

Eliminating Symmetric Recursive Rules

S* S* S S SS S (S) S ()

Replace S SS with one of:

S  SS1 /* force branching to the left S  S1S /* force branching to the right

So we get:

S* S  SS1

S* S S  S1 S1 (S) S1 ()

Page 24: Structure and Ambiguity Removing Ambiguity

Eliminating Symmetric Recursive Rules

So we get: S* S* S S  SS1 S  S1 S1 (S) S1 ()

S*

S

S S1

S S1

S1

( ) ( ) ( )

Page 25: Structure and Ambiguity Removing Ambiguity

Arithmetic Expressions

E E + E E E E E (E) E id }

E E

E E E E

E E E E

id id id id id id

Problem 1: Associativity

Page 26: Structure and Ambiguity Removing Ambiguity

Arithmetic Expressions

E E + E E E E E (E) E id }

E E

E E E E

E E E E

id id + id id id + id

Problem 2: Precedence

Page 27: Structure and Ambiguity Removing Ambiguity

Arithmetic Expressions - A Better Way

E E + T E T

T T * FT F

F (E)F id

Page 28: Structure and Ambiguity Removing Ambiguity

Ambiguous Attachment

The dangling else problem:

<stmt> ::= if <cond> then <stmt><stmt> ::= if <cond> then <stmt> else <stmt>

Consider:

if cond1 then if cond2 then st1 else st2

Page 29: Structure and Ambiguity Removing Ambiguity

<Statement> ::= <IfThenStatement> | <IfThenElseStatement> | <IfThenElseStatementNoShortIf><StatementNoShortIf> ::= <block> | <IfThenElseStatementNoShortIf> | …<IfThenStatement> ::= if ( <Expression> ) <Statement><IfThenElseStatement> ::= if ( <Expression> ) <StatementNoShortIf> else <Statement><IfThenElseStatementNoShortIf> ::= if ( <Expression> ) <StatementNoShortIf> else <StatementNoShortIf>

<Statement>

<IfThenElseStatement>

if (cond) <StatementNoShortIf> else <Statement>

The Java Fix

Page 30: Structure and Ambiguity Removing Ambiguity

Hidden: Going Too Far

S NP VPNP the Nominal | Nominal | ProperNoun | NP PPNominal N | Adjs NN cat | girl | dogs | ball | chocolate | bat ProperNoun Chris | FluffyAdjs Adj Adjs | AdjAdj young | older | smartVP V | V NP | VP PPV like | likes | thinks | hitsPP Prep NPPrep with

● Chris likes the girl with the cat.

● Chris shot the bear with a rifle.

Page 31: Structure and Ambiguity Removing Ambiguity

Normal Forms

A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid objects, with the following two properties: ● For every element c of C, except possibly a finite set of special cases, there exists some element f of F such that f is equivalent to c with respect to some set of tasks.

● F is simpler than the original form in which the elements of C are written. By “simpler” we mean that at least some tasks are easier to perform on elements of F than they would be on elements of C.

Page 32: Structure and Ambiguity Removing Ambiguity

Normal Form Examples

● Disjunctive normal form for database queries so that they can be entered in a query-by-example grid.

● Jordan normal form for a square matrix, in which the matrix is almost diagonal in the sense that its only non-zero entries lie on the diagonal and the superdiagonal.

● Various normal forms for grammars to support specific parsing techniques.

Page 33: Structure and Ambiguity Removing Ambiguity

Normal Forms for Grammars

Chomsky Normal Form, in which all rules are of one of the following two forms:

● X a, where a , or● X BC, where B and C are elements of V - .

Advantages:

● Parsers can use binary trees.● Exact length of derivations is known:

S

A B

A A B B

a a b B B

b b

Page 34: Structure and Ambiguity Removing Ambiguity

Normal Forms for Grammars

Greibach Normal Form, in which all rules are of the following form:

● X a , where a and (V - )*.

Advantages:

● Every derivation of a string s contains |s| rule applications.

● Greibach normal form grammars can easily be converted to pushdown automata with no - transitions. This is useful because such PDAs are guaranteed to halt.

Page 35: Structure and Ambiguity Removing Ambiguity

Theorems: Normal Forms ExistTheorem: Given a CFG G, there exists an equivalent Chomsky normal form grammar GC such that:

L(GC) = L(G) – {}.

Proof: The proof is by construction.

Theorem: Given a CFG G, there exists an equivalent Greibach normal form grammar GG such that:

L(GG) = L(G) – {}.

Proof: The proof is also by construction.

Details of Chomsky conversion are complex but straightforward; I leave them for you to read in Chapter 11 and/or in the next 16 slides.

Details of Greibach conversion are more complex but still straightforward; I leave them for you to read in Appendix D if you wish (not req'd).

Page 36: Structure and Ambiguity Removing Ambiguity

The Price of Normal Forms

E E + EE (E)E id

Converting to Chomsky normal form:

E E EE P EE L EE E RE id L (R )P +

Conversion doesn’t change weak generative capacity but it may change strong generative capacity.