Review of CFGs and Parsing - University of Waterloo


Page 1: Review of CFGs and Parsing - University of Waterloo

Winter 2014 Costas Busch - RPI 1

Review of CFGs and Parsing – I

Context-free Languages and Grammars

Page 2: Review of CFGs and Parsing - University of Waterloo

Context-Free Languages

$\{a^n b^n : n \ge 0\}$, $\{w w^R\}$

Regular Languages (a subset of the context-free languages)

$a^* b^*$, $(a+b)^*$

Page 3: Review of CFGs and Parsing - University of Waterloo

Context-Free Languages

Context-Free Grammars

Pushdown Automata (automaton + stack)

Page 4: Review of CFGs and Parsing - University of Waterloo


Context-Free Grammars

Page 5: Review of CFGs and Parsing - University of Waterloo

Formal Definitions

Grammar: $G = (V, T, S, P)$

$V$: set of variables (non-terminals)

$T$: set of terminal symbols

$S$: start symbol

$P$: set of productions

Page 6: Review of CFGs and Parsing - University of Waterloo

Context-Free Grammar: $G = (V, T, S, P)$

All productions in $P$ are of the form

$A \to s$

where $A$ is a non-terminal and $s$ is a string of non-terminals and terminals

Page 7: Review of CFGs and Parsing - University of Waterloo

Example of Context-Free Grammar

$S \to aSb \mid \lambda$

$G = (V, T, S, P)$

variables: $V = \{S\}$

terminals: $T = \{a, b\}$

start variable: $S$

productions: $P = \{S \to aSb,\ S \to \lambda\}$

Page 8: Review of CFGs and Parsing - University of Waterloo

Language of a Grammar:

For a grammar $G$ with start variable $S$

$L(G) = \{w : S \Rightarrow^* w,\ w \in T^*\}$

where $w$ is a string of terminals or $\lambda$

Page 9: Review of CFGs and Parsing - University of Waterloo

Example

Grammar:

$S \to aSb$

$S \to \lambda$

Left side: non-terminals

Right side: sequence of terminals and non-terminals

The right side may be $\lambda$

Page 10: Review of CFGs and Parsing - University of Waterloo

Grammar:

$S \to aSb$

$S \to \lambda$

Derivation of string $ab$:

$S \Rightarrow aSb \Rightarrow ab$

(first step uses $S \to aSb$, second step uses $S \to \lambda$)

Page 11: Review of CFGs and Parsing - University of Waterloo

Grammar:

$S \to aSb$

$S \to \lambda$

Derivation of string $aabb$:

$S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aabb$

($S \to aSb$ is applied twice, then $S \to \lambda$)

Page 12: Review of CFGs and Parsing - University of Waterloo

Grammar:

$S \to aSb$

$S \to \lambda$

Language of the grammar:

$L = \{a^n b^n : n \ge 0\}$
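The derivations on the preceding slides all follow one pattern: apply $S \to aSb$ some number of times, then finish with $S \to \lambda$. The following is a minimal sketch of that pattern, not part of the original slides; the function name `derive` is hypothetical and sentential forms are represented as plain Python strings.

```python
def derive(n):
    """Return the sentential forms of the derivation of a^n b^n.

    Applies S -> aSb exactly n times, then S -> lambda once.
    """
    current = "S"
    steps = [current]
    for _ in range(n):
        # There is only one S in the sentential form, so replace() rewrites it.
        current = current.replace("S", "aSb")
        steps.append(current)
    # Final step: S -> lambda erases the variable.
    steps.append(current.replace("S", ""))
    return steps


if __name__ == "__main__":
    print(" => ".join(derive(2)))
```

Calling `derive(n)` for increasing `n` lists exactly the strings of $\{a^n b^n : n \ge 0\}$ as the last element of each derivation.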

Page 13: Review of CFGs and Parsing - University of Waterloo

A Convenient Notation

We write:

$S \Rightarrow^* aaabbb$

for zero or more derivation steps

Instead of:

$S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aaaSbbb \Rightarrow aaabbb$

Page 14: Review of CFGs and Parsing - University of Waterloo

In general we write:

$w_1 \Rightarrow^* w_n$

If:

$w_1 \Rightarrow w_2 \Rightarrow w_3 \Rightarrow \cdots \Rightarrow w_n$

in zero or more derivation steps

Trivially: $w \Rightarrow^* w$

Page 15: Review of CFGs and Parsing - University of Waterloo

Context-Free Language:

A language $L$ is context-free if there is a context-free grammar $G$ with

$L = L(G)$

Page 16: Review of CFGs and Parsing - University of Waterloo

Example:

$L = \{a^n b^n : n \ge 0\}$

is a context-free language

since context-free grammar $G$:

$S \to aSb \mid \lambda$

generates $L(G) = L$

Page 17: Review of CFGs and Parsing - University of Waterloo

Another Example

Context-free grammar $G$:

$S \to aSa \mid bSb \mid \lambda$

Example derivations:

$S \Rightarrow aSa \Rightarrow abSba \Rightarrow abba$

$S \Rightarrow aSa \Rightarrow abSba \Rightarrow abaSaba \Rightarrow abaaba$

$L(G) = \{w w^R : w \in \{a, b\}^*\}$

Palindromes of even length

Page 18: Review of CFGs and Parsing - University of Waterloo

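Another Example

Context-free grammar $G$:

$S \to aSb \mid SS \mid \lambda$

Example derivations:

$S \Rightarrow SS \Rightarrow aSbS \Rightarrow abS \Rightarrow ab$

$S \Rightarrow SS \Rightarrow aSbS \Rightarrow abS \Rightarrow abaSb \Rightarrow abab$

$L(G) = \{w : n_a(w) = n_b(w), \text{ and } n_a(v) \ge n_b(v) \text{ in any prefix } v\}$

Describes matched parentheses: $a = ($, $b = )$

Example: ()((()))(())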
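The characterization of this language (equal numbers of $a$ and $b$, and no prefix with more $b$ than $a$) can be checked with a single counter. The sketch below is an illustration only; the function name `is_balanced` is an assumption and does not appear in the slides.

```python
def is_balanced(w):
    """True iff n_a(w) == n_b(w) and n_a(v) >= n_b(v) for every prefix v of w."""
    depth = 0  # n_a(v) - n_b(v) for the prefix v read so far
    for ch in w:
        if ch == "a":
            depth += 1
        elif ch == "b":
            depth -= 1
            if depth < 0:  # some prefix has more b's than a's
                return False
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return depth == 0


if __name__ == "__main__":
    # Map "(" to a and ")" to b, as on the slide.
    parens = "()((()))(())".translate(str.maketrans("()", "ab"))
    print(parens, is_balanced(parens))
```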

Page 19: Review of CFGs and Parsing - University of Waterloo


Derivation Order and

Derivation Trees

Page 20: Review of CFGs and Parsing - University of Waterloo

Derivation Order

Consider the following example grammar with 5 productions:

1. $S \to AB$

2. $A \to aaA$

3. $A \to \lambda$

4. $B \to Bb$

5. $B \to \lambda$

Page 21: Review of CFGs and Parsing - University of Waterloo

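Leftmost derivation order of string $aab$:

$S \overset{1}{\Rightarrow} AB \overset{2}{\Rightarrow} aaAB \overset{3}{\Rightarrow} aaB \overset{4}{\Rightarrow} aaBb \overset{5}{\Rightarrow} aab$

At each step, we substitute the leftmost variable

Productions: 1. $S \to AB$   2. $A \to aaA$   3. $A \to \lambda$   4. $B \to Bb$   5. $B \to \lambda$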

Page 22: Review of CFGs and Parsing - University of Waterloo

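Rightmost derivation order of string $aab$:

$S \overset{1}{\Rightarrow} AB \overset{4}{\Rightarrow} ABb \overset{5}{\Rightarrow} Ab \overset{2}{\Rightarrow} aaAb \overset{3}{\Rightarrow} aab$

At each step, we substitute the rightmost variable

Productions: 1. $S \to AB$   2. $A \to aaA$   3. $A \to \lambda$   4. $B \to Bb$   5. $B \to \lambda$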

Page 23: Review of CFGs and Parsing - University of Waterloo

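Leftmost derivation of $aab$:

$S \overset{1}{\Rightarrow} AB \overset{2}{\Rightarrow} aaAB \overset{3}{\Rightarrow} aaB \overset{4}{\Rightarrow} aaBb \overset{5}{\Rightarrow} aab$

Rightmost derivation of $aab$:

$S \overset{1}{\Rightarrow} AB \overset{4}{\Rightarrow} ABb \overset{5}{\Rightarrow} Ab \overset{2}{\Rightarrow} aaAb \overset{3}{\Rightarrow} aab$

Productions: 1. $S \to AB$   2. $A \to aaA$   3. $A \to \lambda$   4. $B \to Bb$   5. $B \to \lambda$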
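The two derivation orders can be replayed mechanically: given a sequence of production numbers, rewrite either the leftmost or the rightmost variable at each step. This is a small sketch under the assumption that variables are the single uppercase letters S, A, B and sentential forms are strings; the names `PRODUCTIONS` and `derive` are hypothetical.

```python
# The five numbered productions of the example grammar; "" stands for lambda.
PRODUCTIONS = {
    1: ("S", "AB"),
    2: ("A", "aaA"),
    3: ("A", ""),
    4: ("B", "Bb"),
    5: ("B", ""),
}
VARIABLES = "SAB"


def derive(order, leftmost=True):
    """Apply the numbered productions in `order`, always rewriting the
    leftmost (or rightmost) variable, and return all sentential forms."""
    form = "S"
    steps = [form]
    for number in order:
        head, body = PRODUCTIONS[number]
        positions = [i for i, sym in enumerate(form) if sym in VARIABLES]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"production {number} does not apply to {form!r}")
        form = form[:i] + body + form[i + 1:]
        steps.append(form)
    return steps


if __name__ == "__main__":
    print(" => ".join(derive([1, 2, 3, 4, 5], leftmost=True)))
    print(" => ".join(derive([1, 4, 5, 2, 3], leftmost=False)))
```

Both calls end in `aab`, but they pass through different intermediate sentential forms.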

Page 24: Review of CFGs and Parsing - University of Waterloo

Derivation Trees

Consider the same example grammar:

$S \to AB$   $A \to aaA \mid \lambda$   $B \to Bb \mid \lambda$

And a derivation of $aab$:

$S \Rightarrow AB \Rightarrow aaAB \Rightarrow aaABb \Rightarrow aaBb \Rightarrow aab$

Page 25: Review of CFGs and Parsing - University of Waterloo

$S \to AB$   $A \to aaA \mid \lambda$   $B \to Bb \mid \lambda$

$S \Rightarrow AB$

Tree: root $S$ with children $A$, $B$

yield $AB$

Page 26: Review of CFGs and Parsing - University of Waterloo

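$S \to AB$   $A \to aaA \mid \lambda$   $B \to Bb \mid \lambda$

$S \Rightarrow AB \Rightarrow aaAB$

Tree: root $S$ with children $A$, $B$; the node $A$ has children $a$, $a$, $A$

yield $aaAB$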

Page 27: Review of CFGs and Parsing - University of Waterloo

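$S \to AB$   $A \to aaA \mid \lambda$   $B \to Bb \mid \lambda$

$S \Rightarrow AB \Rightarrow aaAB \Rightarrow aaABb$

Tree: root $S$ with children $A$, $B$; $A$ has children $a$, $a$, $A$; $B$ has children $B$, $b$

yield $aaABb$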

Page 28: Review of CFGs and Parsing - University of Waterloo

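$S \to AB$   $A \to aaA \mid \lambda$   $B \to Bb \mid \lambda$

$S \Rightarrow AB \Rightarrow aaAB \Rightarrow aaABb \Rightarrow aaBb$

Tree: root $S$ with children $A$, $B$; $A$ has children $a$, $a$, $A$, and the inner $A$ has the single child $\lambda$; $B$ has children $B$, $b$

yield $aa\lambda Bb = aaBb$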

Page 29: Review of CFGs and Parsing - University of Waterloo

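Derivation Tree (parse tree)

$S \to AB$   $A \to aaA \mid \lambda$   $B \to Bb \mid \lambda$

$S \Rightarrow AB \Rightarrow aaAB \Rightarrow aaABb \Rightarrow aaBb \Rightarrow aab$

Tree: root $S$ with children $A$, $B$; $A$ has children $a$, $a$, $A$, and the inner $A$ has the single child $\lambda$; $B$ has children $B$, $b$, and the inner $B$ has the single child $\lambda$

yield $aa\lambda\lambda b = aab$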
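The yield of a derivation tree is the left-to-right concatenation of its leaves, with $\lambda$ leaves contributing nothing. The sketch below encodes the finished tree for $aab$ as nested tuples; this encoding and the names `TREE` and `tree_yield` are assumptions made for illustration.

```python
# A node is (label, child, child, ...); a leaf is a plain string.
# "λ" marks a lambda leaf.
TREE = (
    "S",
    ("A", "a", "a", ("A", "λ")),
    ("B", ("B", "λ"), "b"),
)


def tree_yield(node):
    """Concatenate the leaves of a derivation tree from left to right."""
    if isinstance(node, str):
        return "" if node == "λ" else node
    _label, *children = node
    return "".join(tree_yield(child) for child in children)


if __name__ == "__main__":
    print(tree_yield(TREE))
```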

Page 30: Review of CFGs and Parsing - University of Waterloo

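Sometimes, derivation order doesn't matter

Leftmost derivation:

$S \Rightarrow AB \Rightarrow aaAB \Rightarrow aaB \Rightarrow aaBb \Rightarrow aab$

Rightmost derivation:

$S \Rightarrow AB \Rightarrow ABb \Rightarrow Ab \Rightarrow aaAb \Rightarrow aab$

Give same derivation tree: root $S$ with children $A$, $B$; $A$ has children $a$, $a$, $A$ with the inner $A \to \lambda$; $B$ has children $B$, $b$ with the inner $B \to \lambda$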

Page 31: Review of CFGs and Parsing - University of Waterloo


Ambiguity

Page 32: Review of CFGs and Parsing - University of Waterloo

Grammar for mathematical expressions

$E \to E + E \mid E * E \mid (E) \mid a$

Example strings:

$(a + a) * a + (a + a * (a + a))$

$a$ denotes any number

Page 33: Review of CFGs and Parsing - University of Waterloo

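$E \to E + E \mid E * E \mid (E) \mid a$

A leftmost derivation for $a + a * a$:

$E \Rightarrow E + E \Rightarrow a + E \Rightarrow a + E * E \Rightarrow a + a * E \Rightarrow a + a * a$

Tree: root $E$ with children $E$, $+$, $E$; the left $E$ yields $a$; the right $E$ has children $E$, $*$, $E$, each yielding $a$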

Page 34: Review of CFGs and Parsing - University of Waterloo

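$E \to E + E \mid E * E \mid (E) \mid a$

Another leftmost derivation for $a + a * a$:

$E \Rightarrow E * E \Rightarrow E + E * E \Rightarrow a + E * E \Rightarrow a + a * E \Rightarrow a + a * a$

Tree: root $E$ with children $E$, $*$, $E$; the left $E$ has children $E$, $+$, $E$, each yielding $a$; the right $E$ yields $a$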

Page 35: Review of CFGs and Parsing - University of Waterloo

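$E \to E + E \mid E * E \mid (E) \mid a$

Two derivation trees for $a + a * a$:

Tree 1: root $E$ with children $E$, $+$, $E$; the right $E$ expands to $E * E$. Grouping: $a + (a * a)$

Tree 2: root $E$ with children $E$, $*$, $E$; the left $E$ expands to $E + E$. Grouping: $(a + a) * a$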

Page 36: Review of CFGs and Parsing - University of Waterloo

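take $a = 2$

$a + a * a = 2 + 2 * 2$

Tree 1: root $E$ with children $E$, $+$, $E$; the right $E$ expands to $E * E$; all leaves are 2

Tree 2: root $E$ with children $E$, $*$, $E$; the left $E$ expands to $E + E$; all leaves are 2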

Page 37: Review of CFGs and Parsing - University of Waterloo

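Compute expression result using the tree

Good Tree: root $E \to E + E$ whose right subtree is $E * E$. First $2 * 2 = 4$, then $2 + 4 = 6$, so $2 + 2 * 2 = 6$

Bad Tree: root $E \to E * E$ whose left subtree is $E + E$. First $2 + 2 = 4$, then $4 * 2 = 8$, so $2 + 2 * 2 = 8$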
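Evaluating an expression bottom-up over its tree makes the difference between the two trees concrete. The sketch below is illustrative only: trees are encoded as nested `(left, op, right)` tuples with integer leaves, and the names `evaluate`, `good_tree`, and `bad_tree` are assumptions.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate an expression tree bottom-up.

    A tree is either an int leaf or a tuple (left, op, right).
    """
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))


# 2 + (2 * 2): the multiplication sits lower in the tree, so it is done first.
good_tree = (2, "+", (2, "*", 2))
# (2 + 2) * 2: the addition sits lower in the tree, so it is done first.
bad_tree = ((2, "+", 2), "*", 2)

if __name__ == "__main__":
    print(evaluate(good_tree), evaluate(bad_tree))
```

The same string of tokens gives 6 with one tree and 8 with the other, which is why an application that evaluates over the tree needs the grammar to fix a single tree.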

Page 38: Review of CFGs and Parsing - University of Waterloo


Two different derivation trees may cause problems in applications which use the derivation trees:

•  Evaluating expressions

•  In general, in compilers for programming languages

Page 39: Review of CFGs and Parsing - University of Waterloo

Ambiguous Grammar:

A context-free grammar $G$ is ambiguous if there is a string $w \in L(G)$ which has:

two different derivation trees or two leftmost derivations

(Two different derivation trees give two different leftmost derivations and vice-versa)

Page 40: Review of CFGs and Parsing - University of Waterloo

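Example:

$E \to E + E \mid E * E \mid (E) \mid a$

this grammar is ambiguous since

string $a + a * a$ has two derivation trees

Tree 1: root $E$ with children $E$, $+$, $E$; the right $E$ expands to $E * E$

Tree 2: root $E$ with children $E$, $*$, $E$; the left $E$ expands to $E + E$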

Page 41: Review of CFGs and Parsing - University of Waterloo

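$E \to E + E \mid E * E \mid (E) \mid a$

this grammar is ambiguous also because

string $a + a * a$ has two leftmost derivations

$E \Rightarrow E + E \Rightarrow a + E \Rightarrow a + E * E \Rightarrow a + a * E \Rightarrow a + a * a$

$E \Rightarrow E * E \Rightarrow E + E * E \Rightarrow a + E * E \Rightarrow a + a * E \Rightarrow a + a * a$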
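One way to see ambiguity concretely is to count the derivation trees of a string. The sketch below does this for the expression grammar $E \to E + E \mid E * E \mid (E) \mid a$ by trying every production at the root of every substring; it is an illustration under the assumption that the input is a string over the characters `a`, `+`, `*`, `(`, `)`, and the name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(w):
    """Count derivation trees of w under E -> E+E | E*E | (E) | a."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of derivation trees with root E whose yield is w[i:j].
        total = 0
        if j - i == 1 and w[i] == "a":          # E -> a
            total += 1
        if j - i >= 3 and w[i] == "(" and w[j - 1] == ")":  # E -> (E)
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):           # E -> E+E or E -> E*E, split at k
            if w[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(w))


if __name__ == "__main__":
    for s in ["a", "a+a", "a+a*a", "(a+a)*a"]:
        print(s, count_parse_trees(s))
```

A count above 1 for any string is enough to show that the grammar is ambiguous.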

Page 42: Review of CFGs and Parsing - University of Waterloo

Another ambiguous grammar:

IF_STMT → if EXPR then STMT | if EXPR then STMT else STMT

Variables: IF_STMT, EXPR, STMT

Terminals: if, then, else

Very common piece of grammar in programming languages

Page 43: Review of CFGs and Parsing - University of Waterloo

if expr1 then if expr2 then stmt1 else stmt2

Two derivation trees

Tree 1: IF_STMT → if expr1 then STMT else stmt2, where STMT derives "if expr2 then stmt1" (the else belongs to the outer if)

Tree 2: IF_STMT → if expr1 then STMT, where STMT derives "if expr2 then stmt1 else stmt2" (the else belongs to the inner if)

Page 44: Review of CFGs and Parsing - University of Waterloo


In general, ambiguity is bad and we want to remove it

Sometimes it is possible to find a non-ambiguous grammar for a language

But, in general we cannot do so

Page 45: Review of CFGs and Parsing - University of Waterloo

A successful example:

Ambiguous Grammar

$E \to E + E$

$E \to E * E$

$E \to (E) \mid a$

Equivalent (generates the same language)

Non-Ambiguous Grammar

$E \to E + T \mid T$

$T \to T * F \mid F$

$F \to (E) \mid a$

Page 46: Review of CFGs and Parsing - University of Waterloo

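$E \to E + T \mid T$

$T \to T * F \mid F$

$F \to (E) \mid a$

$E \Rightarrow E + T \Rightarrow T + T \Rightarrow F + T \Rightarrow a + T \Rightarrow a + T * F \Rightarrow a + F * F \Rightarrow a + a * F \Rightarrow a + a * a$

Unique derivation tree for $a + a * a$

Tree: root $E$ with children $E$, $+$, $T$; the left $E$ derives $T \to F \to a$; the right $T$ has children $T$, $*$, $F$, where $T \to F \to a$ and $F \to a$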
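Because the unambiguous grammar separates $E$, $T$, and $F$ by precedence level, it maps naturally onto a hand-written parser with one function per variable. The sketch below is one possible realization, not something taken from the slides: the left-recursive productions $E \to E + T$ and $T \to T * F$ are handled with loops (a standard substitute for left recursion in recursive descent), the result is a nested `(left, op, right)` tuple rather than the full derivation tree, and all function names are hypothetical.

```python
def parse(text):
    """Parse a string over a, +, *, (, ) using E -> E+T | T, T -> T*F | F, F -> (E) | a."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def parse_E():
        # E -> T (+ T)*, grouped to the left like E -> E + T
        tree = parse_T()
        while peek() == "+":
            expect("+")
            tree = (tree, "+", parse_T())
        return tree

    def parse_T():
        # T -> F (* F)*, grouped to the left like T -> T * F
        tree = parse_F()
        while peek() == "*":
            expect("*")
            tree = (tree, "*", parse_F())
        return tree

    def parse_F():
        # F -> (E) | a
        if peek() == "(":
            expect("(")
            tree = parse_E()
            expect(")")
            return tree
        expect("a")
        return "a"

    tree = parse_E()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree


if __name__ == "__main__":
    print(parse("a + a * a"))
```

Each input string is accepted with exactly one grouping, with `*` binding tighter than `+`.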

Page 47: Review of CFGs and Parsing - University of Waterloo

An un-successful example:

$L = \{a^n b^n c^m\} \cup \{a^n b^m c^m\}$,   $n, m \ge 0$

$L$ is inherently ambiguous:

every grammar that generates this language is ambiguous

Page 48: Review of CFGs and Parsing - University of Waterloo

Example (ambiguous) grammar for $L$:

$L = \{a^n b^n c^m\} \cup \{a^n b^m c^m\}$

$S \to S_1 \mid S_2$

$S_1 \to S_1 c \mid A$

$A \to aAb \mid \lambda$

$S_2 \to a S_2 \mid B$

$B \to bBc \mid \lambda$

Page 49: Review of CFGs and Parsing - University of Waterloo

Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Dealing with Ambiguity

•  There are several ways to handle ambiguity

•  Most direct method is to rewrite grammar unambiguously

$E \to E' + E \mid E'$

$E' \to \mathrm{id} * E' \mid \mathrm{id} \mid (E) * E' \mid (E)$

•  Enforces precedence of * over +

Page 50: Review of CFGs and Parsing - University of Waterloo

Ambiguity

•  No general techniques for handling ambiguity

•  Impossible to convert automatically an ambiguous grammar to an unambiguous one

•  Used with care, ambiguity can simplify the grammar

•  Sometimes allows more natural definitions

•  However, we still need to eventually eliminate ambiguity. How?

Page 51: Review of CFGs and Parsing - University of Waterloo

Precedence and Associativity Declarations

•  Instead of rewriting the grammar

•  Use the more natural (ambiguous) grammar

•  Along with disambiguating declarations

•  Most parser-generators allow precedence and associativity declarations to disambiguate grammars

Page 52: Review of CFGs and Parsing - University of Waterloo

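Associativity Declarations

•  Ambiguous: two parse trees of int + int + int

•  Left associativity declaration: %left +

Tree 1: root $E$ with children $E$, $+$, $E$; the left $E$ expands to $E + E$. Grouping: (int + int) + int

Tree 2: root $E$ with children $E$, $+$, $E$; the right $E$ expands to $E + E$. Grouping: int + (int + int)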

Page 53: Review of CFGs and Parsing - University of Waterloo

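Precedence Declarations

•  Consider the string int + int * int

•  Precedence declarations:

%left +

%left *

Tree 1: root $E$ with children $E$, $*$, $E$; the left $E$ expands to $E + E$. Grouping: (int + int) * int

Tree 2: root $E$ with children $E$, $+$, $E$; the right $E$ expands to $E * E$. Grouping: int + (int * int)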
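The effect of precedence and associativity declarations can be imitated with a small table-driven parser. The sketch below uses precedence climbing, which is a different technique from the table-conflict resolution a parser generator performs, so it only illustrates the outcome of declarations such as `%left +` followed by `%left *` (the later declaration binding tighter). The table `DECLS` and the function `parse` are hypothetical names.

```python
# operator -> (precedence level, associativity); a higher level binds tighter.
# Assumed to mirror "%left +" followed by "%left *".
DECLS = {"+": (1, "left"), "*": (2, "left")}


def parse(tokens, decls=DECLS):
    """Parse a token list such as ["int", "+", "int"] into nested tuples."""
    pos = 0

    def parse_expr(min_prec):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "int":
            raise SyntaxError(f"expected 'int' at position {pos}")
        left = tokens[pos]
        pos += 1
        # Keep absorbing operators whose precedence is at least min_prec.
        while (pos < len(tokens) and tokens[pos] in decls
               and decls[tokens[pos]][0] >= min_prec):
            op = tokens[pos]
            prec, assoc = decls[op]
            pos += 1
            # Left-associative: the right operand may only contain tighter operators.
            next_min = prec + 1 if assoc == "left" else prec
            right = parse_expr(next_min)
            left = (left, op, right)
        return left

    tree = parse_expr(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


if __name__ == "__main__":
    print(parse(["int", "+", "int", "+", "int"]))
    print(parse(["int", "+", "int", "*", "int"]))
```

Changing an entry in the table to `"right"` flips the grouping of repeated uses of that operator, which is the choice an associativity declaration makes between the two parse trees.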

Page 54: Review of CFGs and Parsing - University of Waterloo


Properties of

Context-Free languages

Page 55: Review of CFGs and Parsing - University of Waterloo

Union

Context-free languages are closed under: Union

$L_1$ is context free, $L_2$ is context free

$\Rightarrow$ $L_1 \cup L_2$ is context-free

Page 56: Review of CFGs and Parsing - University of Waterloo

Example

Language: $L_1 = \{a^n b^n\}$   Grammar: $S_1 \to a S_1 b \mid \lambda$

Language: $L_2 = \{w w^R\}$   Grammar: $S_2 \to a S_2 a \mid b S_2 b \mid \lambda$

Union

Language: $L = \{a^n b^n\} \cup \{w w^R\}$   Grammar: $S \to S_1 \mid S_2$

Page 57: Review of CFGs and Parsing - University of Waterloo

In general:

For context-free languages $L_1, L_2$ with context-free grammars $G_1, G_2$ and start variables $S_1, S_2$

The grammar of the union $L_1 \cup L_2$ has new start variable $S$ and additional production $S \to S_1 \mid S_2$

Page 58: Review of CFGs and Parsing - University of Waterloo

Concatenation

Context-free languages are closed under: Concatenation

$L_1$ is context free, $L_2$ is context free

$\Rightarrow$ $L_1 L_2$ is context-free

Page 59: Review of CFGs and Parsing - University of Waterloo

Example

Language: $L_1 = \{a^n b^n\}$   Grammar: $S_1 \to a S_1 b \mid \lambda$

Language: $L_2 = \{w w^R\}$   Grammar: $S_2 \to a S_2 a \mid b S_2 b \mid \lambda$

Concatenation

Language: $L = \{a^n b^n\}\{w w^R\}$   Grammar: $S \to S_1 S_2$

Page 60: Review of CFGs and Parsing - University of Waterloo

In general:

For context-free languages $L_1, L_2$ with context-free grammars $G_1, G_2$ and start variables $S_1, S_2$

The grammar of the concatenation $L_1 L_2$ has new start variable $S$ and additional production $S \to S_1 S_2$

Page 61: Review of CFGs and Parsing - University of Waterloo

Star Operation

Context-free languages are closed under: Star-operation

$L$ is context free $\Rightarrow$ $L^*$ is context-free

Page 62: Review of CFGs and Parsing - University of Waterloo

Example

Language: $L = \{a^n b^n\}$   Grammar: $S \to aSb \mid \lambda$

Star Operation

Language: $L = \{a^n b^n\}^*$   Grammar: $S_1 \to S S_1 \mid \lambda$

Page 63: Review of CFGs and Parsing - University of Waterloo

In general:

For context-free language $L$ with context-free grammar $G$ and start variable $S$

The grammar of the star operation $L^*$ has new start variable $S_1$ and additional production $S_1 \to S S_1 \mid \lambda$
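The three closure constructions (union, concatenation, star) each add one new start variable and one new production to existing grammars. The sketch below writes them out for grammars stored as Python dictionaries and checks them by enumerating all short strings. Everything about the representation is an assumption made for illustration: a grammar maps each variable name to a list of right-hand sides, a right-hand side is a tuple of symbols, `()` stands for $\lambda$, and the variable names of the two input grammars are assumed to be disjoint.

```python
def union(g1, s1, g2, s2, new_start="S"):
    """Add S -> S1 | S2 (new_start must be a fresh variable name)."""
    return {**g1, **g2, new_start: [(s1,), (s2,)]}


def concat(g1, s1, g2, s2, new_start="S"):
    """Add S -> S1 S2."""
    return {**g1, **g2, new_start: [(s1, s2)]}


def star(g, s, new_start="S1"):
    """Add S1 -> S S1 | lambda."""
    return {**g, new_start: [(s, new_start), ()]}


def strings_up_to(grammar, start, n):
    """All terminal strings of length <= n derivable from `start`.

    Fixed-point iteration: keep extending the set of known strings for each
    variable until nothing new of length <= n appears.
    """
    lang = {var: set() for var in grammar}
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    choices = lang[sym] if sym in grammar else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p + c) <= n}
                if not partial <= lang[var]:
                    lang[var] |= partial
                    changed = True
    return lang[start]


G1 = {"S1": [("a", "S1", "b"), ()]}                        # {a^n b^n}
G2 = {"S2": [("a", "S2", "a"), ("b", "S2", "b"), ()]}      # {w w^R}

if __name__ == "__main__":
    print(sorted(strings_up_to(union(G1, "S1", G2, "S2"), "S", 4), key=len))
```

The enumeration is only a bounded sanity check on short strings; it does not prove the closure properties.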

Page 64: Review of CFGs and Parsing - University of Waterloo


Context-free Languages don’t have the following

Properties

Page 65: Review of CFGs and Parsing - University of Waterloo

Intersection

Context-free languages are not closed under: intersection

$L_1$ is context free, $L_2$ is context free

$\Rightarrow$ $L_1 \cap L_2$ not necessarily context-free

Page 66: Review of CFGs and Parsing - University of Waterloo

Example

$L_1 = \{a^n b^n c^m\}$   Context-free:

$S \to AC$,   $A \to aAb \mid \lambda$,   $C \to cC \mid \lambda$

$L_2 = \{a^n b^m c^m\}$   Context-free:

$S \to AB$,   $A \to aA \mid \lambda$,   $B \to bBc \mid \lambda$

Intersection

$L_1 \cap L_2 = \{a^n b^n c^n\}$   NOT context-free
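The claim that $L_1 \cap L_2 = \{a^n b^n c^n\}$ can be sanity-checked on short strings by brute force. The sketch below is a bounded check only, not a proof, and it says nothing about whether the intersection is context-free; the membership tests `in_L1` and `in_L2` and the helper `strings_up_to` are hypothetical names.

```python
import re
from itertools import product

BLOCKS = re.compile(r"(a*)(b*)(c*)")


def in_L1(w):
    """w is in {a^n b^n c^m}: equal numbers of a's and b's."""
    m = BLOCKS.fullmatch(w)
    return m is not None and len(m.group(1)) == len(m.group(2))


def in_L2(w):
    """w is in {a^n b^m c^m}: equal numbers of b's and c's."""
    m = BLOCKS.fullmatch(w)
    return m is not None and len(m.group(2)) == len(m.group(3))


def strings_up_to(n, alphabet="abc"):
    """Yield every string over the alphabet of length 0..n, shortest first."""
    for length in range(n + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


both = [w for w in strings_up_to(6) if in_L1(w) and in_L2(w)]

if __name__ == "__main__":
    print(both)
```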

Page 67: Review of CFGs and Parsing - University of Waterloo

Complement

Context-free languages are not closed under: complement

$L$ is context free $\Rightarrow$ $\overline{L}$ not necessarily context-free.

I.e., the complement of a context-free language may or may not be context-free.

Page 68: Review of CFGs and Parsing - University of Waterloo

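Complement

Example

$L_1 = \{a^n b^n c^m\}$   Context-free:

$S \to AC$,   $A \to aAb \mid \lambda$,   $C \to cC \mid \lambda$

$L_2 = \{a^n b^m c^m\}$   Context-free:

$S \to AB$,   $A \to aA \mid \lambda$,   $B \to bBc \mid \lambda$

$\overline{\overline{L_1} \cup \overline{L_2}} = L_1 \cap L_2 = \{a^n b^n c^n\}$   NOT context-free

If context-free languages were closed under complement, then, since they are closed under union, this language would be context-free.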

Page 69: Review of CFGs and Parsing - University of Waterloo


Intersection of

Context-free languages and

Regular Languages

Page 70: Review of CFGs and Parsing - University of Waterloo

The intersection of a context-free language and a regular language is a context-free language

$L_1$ context free, $L_2$ regular

$\Rightarrow$ $L_1 \cap L_2$ context-free

Page 71: Review of CFGs and Parsing - University of Waterloo


Applications of

Regular Closure

Page 72: Review of CFGs and Parsing - University of Waterloo

Regular Closure

The intersection of a context-free language and a regular language is a context-free language

$L_1$ context free, $L_2$ regular

$\Rightarrow$ $L_1 \cap L_2$ context-free

Page 73: Review of CFGs and Parsing - University of Waterloo

An Application of Regular Closure

Prove that: $L = \{a^n b^n : n \ne 100,\ n \ge 0\}$

is context-free

Page 74: Review of CFGs and Parsing - University of Waterloo

We know:

$\{a^n b^n : n \ge 0\}$

is context-free

Page 75: Review of CFGs and Parsing - University of Waterloo

We also know:

$L_1 = \{a^{100} b^{100}\}$ is regular

$\overline{L_1} = \{(a+b)^*\} - \{a^{100} b^{100}\}$ is regular

Page 76: Review of CFGs and Parsing - University of Waterloo

$\{a^n b^n\}$ context-free

$\overline{L_1} = \{(a+b)^*\} - \{a^{100} b^{100}\}$ regular

$\Rightarrow$ (regular closure) $\{a^n b^n\} \cap \overline{L_1}$ context-free

$\{a^n b^n\} \cap \overline{L_1} = \{a^n b^n : n \ne 100,\ n \ge 0\} = L$ is context-free

Page 77: Review of CFGs and Parsing - University of Waterloo

Another Application of Regular Closure

Prove that: $L = \{w : n_a = n_b = n_c\}$

is not context-free

Page 78: Review of CFGs and Parsing - University of Waterloo

If $L = \{w : n_a = n_b = n_c\}$ is context-free

Then $L \cap \{a^* b^* c^*\} = \{a^n b^n c^n\}$ is context-free

(regular closure: context-free $\cap$ regular = context-free)

Impossible!!! $\{a^n b^n c^n\}$ is not context-free

Therefore, $L$ is not context free