languages and finite automataryan/cse4083/busch/class09.pdf · 6 the parser finds the derivation of...

Post on 07-Jun-2020

Page 1

Compilers

Page 2

Compiler

Program:

    v = 5;
    if (v > 5)
        x = 12 + v;
    while (x != 3) {
        x = x - 3;
        v = 10;
    }
    ......

Machine code:

    add v, 5, 0
    cmp v, 5
    jmple ELSE
THEN:
    add x, 12, v
ELSE:
WHILE:
    cmp x, 3
    ...

Page 3

Compiler

input: program → lexical analyzer → parser → output: machine code

Page 4

A parser knows the grammar of the programming language.

Page 5

Parser

PROGRAM -> STMT_LIST
STMT_LIST -> STMT; STMT_LIST | STMT;
STMT -> EXPR | IF_STMT | WHILE_STMT | { STMT_LIST }
EXPR -> EXPR + EXPR | EXPR - EXPR | ID
IF_STMT -> if (EXPR) then STMT
         | if (EXPR) then STMT else STMT
WHILE_STMT -> while (EXPR) do STMT

Page 6

The parser finds the derivation of a particular input.

input: 10 + 2 * 5

Parser (grammar):
E -> E + E
   | E * E
   | INT

derivation:
E => E + E
  => E + E * E
  => 10 + E * E
  => 10 + 2 * E
  => 10 + 2 * 5

Page 7

derivation:
E => E + E => E + E * E => 10 + E * E => 10 + 2 * E => 10 + 2 * 5

derivation tree:

          E
        / | \
       E  +  E
       |   / | \
      10  E  *  E
          |     |
          2     5

Page 8

derivation tree:

          E
        / | \
       E  +  E
       |   / | \
      10  E  *  E
          |     |
          2     5

machine code:

    mult a, 2, 5
    add b, 10, a
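A minimal Python sketch of this last step, assuming the derivation tree is stored as nested tuples (left, operator, right) with integer leaves. The names gen_code and OPCODES, and the rule of naming temporaries a, b, c, ... in creation order, are illustrative assumptions chosen so the output matches the two instructions above.

```python
OPCODES = {"+": "add", "*": "mult"}  # hypothetical opcode names, as on the slide


def gen_code(tree):
    """Translate a derivation tree for E -> E + E | E * E | INT into
    three-address instructions. Returns (operand holding the result, code)."""
    code = []
    temps = iter("abcdefghijklmnopqrstuvwxyz")  # temporaries a, b, c, ...

    def walk(node):
        if isinstance(node, int):  # leaf: E -> INT
            return str(node)
        left, op, right = node  # inner node: E -> E op E
        lhs = walk(left)  # post-order: compute both operands first
        rhs = walk(right)
        dest = next(temps)
        code.append(f"{OPCODES[op]} {dest}, {lhs}, {rhs}")
        return dest

    return walk(tree), code


# Tree for 10 + 2 * 5: the root uses E -> E + E, its right child E -> E * E
result, code = gen_code((10, "+", (2, "*", 5)))
for line in code:
    print(line)
```

Running it prints mult a, 2, 5 followed by add b, 10, a. The post-order walk is what makes the multiplication come first: an operator node can only be emitted once both of its subtrees have been computed.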

Page 9

Parsing

Page 10

input: grammar and string → Parser → output: derivation

Page 11

Example:

grammar: S -> SS | aSb | bSa | λ
input: aabb
Parser output (derivation): ?

Page 12

Exhaustive Search

S -> SS | aSb | bSa | λ

Find a derivation of aabb.

Phase 1 (all possible derivations of length 1):
S => SS
S => aSb
S => bSa
S => λ

Page 13

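S -> SS | aSb | bSa | λ        input: aabb

S => SS
S => aSb
S => bSa    (cannot produce aabb: discarded)
S => λ      (cannot produce aabb: discarded)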

Page 14

S -> SS | aSb | bSa | λ        input: aabb

Phase 1:
S => SS
S => aSb

Phase 2:
S => SS => SSS
S => SS => aSbS
S => SS => bSaS
S => SS => S
S => aSb => aSSb
S => aSb => aaSbb
S => aSb => abSab
S => aSb => ab

Page 15

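S -> SS | aSb | bSa | λ        input: aabb

Phase 2 (derivations that can still produce aabb):
S => SS => SSS
S => SS => aSbS
S => SS => S
S => aSb => aSSb
S => aSb => aaSbb

Phase 3:
S => aSb => aaSbb => aabb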

Page 16

Final result of exhaustive search (top-down parsing)

grammar: S -> SS | aSb | bSa | λ
input: aabb
Parser output (derivation): S => aSb => aaSbb => aabb
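A minimal Python sketch of exhaustive search, assuming single-character symbols (uppercase letters are variables, lowercase letters are terminals, and "" stands for λ). Each phase replaces the leftmost variable of every surviving sentential form in all possible ways and discards forms whose terminal prefix no longer matches the input, as in Phases 1 to 3 above. Because this grammar has the λ-production S -> λ, the phase limit is passed in explicitly as max_phases, a hypothetical parameter.

```python
GRAMMAR = {"S": ["SS", "aSb", "bSa", ""]}  # S -> SS | aSb | bSa | λ


def leftmost_variable(form):
    """Index of the leftmost variable in a sentential form, or -1 if none."""
    for i, symbol in enumerate(form):
        if symbol.isupper():
            return i
    return -1


def exhaustive_search(grammar, start, w, max_phases):
    """Breadth-first search over leftmost derivations.

    Returns the derivation of w as a list of sentential forms,
    or None if none is found within max_phases phases."""
    frontier = [[start]]  # every form kept in the frontier still has a variable
    for _ in range(max_phases):
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            i = leftmost_variable(form)
            for body in grammar[form[i]]:
                new_form = form[:i] + body + form[i + 1:]
                j = leftmost_variable(new_form)
                if j == -1:  # only terminals left
                    if new_form == w:
                        return derivation + [new_form]
                    continue  # a terminal string other than w
                if not w.startswith(new_form[:j]):
                    continue  # terminal prefix already disagrees with w
                next_frontier.append(derivation + [new_form])
        frontier = next_frontier
    return None


print(" => ".join(exhaustive_search(GRAMMAR, "S", "aabb", 4)))
```

This prints S => aSb => aaSbb => aabb, found in phase 3. Even with prefix pruning the frontier can grow by a constant factor per phase, which is the cost analysed next.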

Page 17

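Time complexity of exhaustive search

Suppose there are no productions of the form
A -> λ
A -> B

Number of phases for string w: $2|w|$

(Without such productions, every step either makes the sentential form longer or turns a variable into a terminal; each of these can happen at most |w| times.)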

Page 18

Time for phase 1: $k$

For a grammar with $k$ rules there are $k$ possible derivations.

Page 19

Time for phase 2: $k^2$

$k^2$ possible derivations

Page 20

Time for phase $2|w|$: $k^{2|w|}$

$k^{2|w|}$ possible derivations

Page 21

Total time needed for string w:

$k + k^2 + \cdots + k^{2|w|}$
(phase 1 + phase 2 + ... + phase $2|w|$)

Extremely bad!!!
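The total above is a geometric series. Assuming k ≥ 2 rules, a short derivation shows that the last phase dominates, so the running time is exponential in |w|. The second line is a hypothetical instance (k = 4, |w| = 4) chosen only to show the scale; it is not taken from the slides.

```latex
\begin{aligned}
\sum_{i=1}^{2|w|} k^{i} &= k + k^{2} + \cdots + k^{2|w|}
  = \frac{k^{2|w|+1} - k}{k - 1}
  \le \frac{k}{k - 1}\, k^{2|w|} \le 2\, k^{2|w|} \qquad (k \ge 2) \\
k = 4,\ |w| = 4: \quad \sum_{i=1}^{8} 4^{i} &= \frac{4^{9} - 4}{3} = \frac{262140}{3} = 87380
\end{aligned}
```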

Page 22

There exist faster algorithms for specialized grammars.

S-grammar: every production has the form

A -> ax

where a is a terminal symbol and x is a string of variables, and each pair (A, a) appears at most once.

Page 23

S-grammar example:

S -> aS | bSS | c

Each string has a unique derivation:

S => aS => abSS => abcS => abcc

Page 24

For S-grammars:

In exhaustive search parsing there is only one choice in each phase.

Time for a phase: 1

Total time for parsing string w: |w|
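Because each pair (A, a) appears at most once, the single choice in each phase can be made by a table lookup on (leftmost variable, next input symbol). The Python sketch below assumes single-character symbols (uppercase variables, lowercase terminals); the function names are hypothetical. The sentential forms are recorded only for display: rebuilding each form as a new string costs extra time, while the parsing decision per input symbol is a single dictionary lookup.

```python
S_GRAMMAR = {"S": ["aS", "bSS", "c"]}  # S -> aS | bSS | c


def is_s_grammar(grammar):
    """Every body is a terminal followed by variables only,
    and each pair (A, a) occurs at most once."""
    for bodies in grammar.values():
        seen = set()
        for body in bodies:
            if not body or not body[0].islower():
                return False
            if not all(symbol.isupper() for symbol in body[1:]):
                return False
            if body[0] in seen:
                return False
            seen.add(body[0])
    return True


def parse_s_grammar(grammar, start, w):
    """Leftmost derivation of w as a list of sentential forms, or None."""
    table = {(head, body[0]): body[1:]
             for head, bodies in grammar.items() for body in bodies}
    pending = [start]  # variables still to expand; the leftmost is at the end
    matched = ""  # terminals produced so far
    derivation = [start]
    for a in w:
        if not pending:
            return None  # the derivation ended before the input did
        rest = table.get((pending.pop(), a))
        if rest is None:
            return None  # no production A -> a x for this pair
        pending.extend(reversed(rest))  # keep the leftmost variable on top
        matched += a
        derivation.append(matched + "".join(reversed(pending)))
    return derivation if not pending else None


print(is_s_grammar(S_GRAMMAR))
print(" => ".join(parse_s_grammar(S_GRAMMAR, "S", "abcc")))
```

This prints True and then S => aS => abSS => abcS => abcc, the unique derivation shown earlier.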

Page 25

For general context-free grammars:

There exists a parsing algorithm that parses a string w in time $|w|^3$.

Page 26

Simplifications of Context-Free Grammars

Page 27

A Substitution Rule

S -> aA
A -> aaA
A -> abBc
B -> abbA
B -> b

Substitute B:

S -> aA
A -> aaA
A -> ababbAc | abbc

Equivalent grammar

Page 28

In general:

A -> xBz
B -> y1 | y2 | ... | yn

Substitute B:

A -> xy1z | xy2z | ... | xynz

Equivalent grammar
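A minimal Python sketch of the substitution rule, assuming a grammar is a dict mapping each variable to a list of bodies, with single-character symbols (uppercase variables, lowercase terminals). The function name substitute is hypothetical, and for simplicity it substitutes only the leftmost occurrence of B in each body. B's own productions are kept; in the example on the previous page no remaining body mentions B after the substitution, which is why the equivalent grammar there can omit them.

```python
def substitute(grammar, b):
    """Replace each production A -> xBz (A != B) by A -> x y1 z | ... | x yn z,
    where B -> y1 | ... | yn; only the leftmost B in a body is substituted."""
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            i = body.find(b)
            if head == b or i == -1:
                new_bodies.append(body)  # production without B: unchanged
            else:
                x, z = body[:i], body[i + 1:]
                new_bodies.extend(x + y + z for y in grammar[b])
        result[head] = new_bodies
    return result


# Example grammar from the previous page
grammar = {"S": ["aA"], "A": ["aaA", "abBc"], "B": ["abbA", "b"]}
print(substitute(grammar, "B"))
```

The A-productions become aaA, ababbAc and abbc, matching the equivalent grammar on the previous page.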

Page 29

Useless Productions

S -> aSb
S -> λ
S -> A
A -> aA      (useless production)

Some derivations never terminate...

S => A => aA => aaA => ... => aa...aA => ...

Page 30

Another grammar:

S -> A
A -> aA
A -> λ
B -> bA      (useless production: B is not reachable from S)

Page 31

In general:

If S =>* xAy =>* w for some w ∈ L(G),
then variable A is useful.

Otherwise, variable A is useless.

Page 32

A production A -> x is useful if all its variables are useful.

Page 33

Removing Useless Productions

Example grammar:

S -> aS | A | C
A -> a
B -> aa
C -> aCb

Page 34

First: find all variables that produce strings with only terminals.

S -> aS | A | C
A -> a
B -> aa
C -> aCb

Round 1: {A, B}
Round 2: {A, B, S}

Page 35

Keep only the variables that produce terminal symbols: {A, B, S}

S -> aS | A | C
A -> a
B -> aa
C -> aCb

becomes

S -> aS | A
A -> a
B -> aa
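A minimal Python sketch of this first step, assuming the same dict representation (uppercase variables, lowercase terminals, "" for λ). Each round adds every variable that has a body made only of terminals and variables found in earlier rounds, so the recorded rounds match Round 1 and Round 2 above. The function names are hypothetical.

```python
def generating_variables(grammar):
    """Round-by-round computation of the variables that derive a string of
    terminals. Returns (final set, sorted contents of the set after each round)."""
    gen, rounds = set(), []
    while True:
        # uses the set as it was at the start of this round
        new = {head for head, bodies in grammar.items()
               if head not in gen
               and any(all(s.islower() or s in gen for s in body) for body in bodies)}
        if not new:
            return gen, rounds
        gen |= new
        rounds.append(sorted(gen))


def keep_generating(grammar, gen):
    """Drop non-generating variables and every production that mentions one."""
    return {head: [body for body in bodies
                   if all(s.islower() or s in gen for s in body)]
            for head, bodies in grammar.items() if head in gen}


grammar = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
gen, rounds = generating_variables(grammar)
print(rounds)
print(keep_generating(grammar, gen))
```

The first line shows the two rounds {A, B} and {A, B, S}; C never qualifies because its only body contains C itself.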

Page 36

Second: find all variables reachable from S.

S -> aS | A
A -> a
B -> aa

Dependency graph: S → A; B is not reachable from S.

Page 37

Keep only the variables reachable from S.

S -> aS | A
A -> a
B -> aa

Final grammar:

S -> aS | A
A -> a
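A minimal Python sketch of the second step, assuming the same dict representation. The dependency graph has an edge A → B whenever B appears in a body of an A-production, and a depth-first traversal from S collects the reachable variables. The function name is hypothetical.

```python
def reachable_variables(grammar, start):
    """Variables reachable from start in the dependency graph."""
    edges = {head: {s for body in bodies for s in body if s.isupper()}
             for head, bodies in grammar.items()}
    seen, stack = {start}, [start]
    while stack:  # depth-first traversal
        for v in edges.get(stack.pop(), set()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


grammar = {"S": ["aS", "A"], "A": ["a"], "B": ["aa"]}
reach = reachable_variables(grammar, "S")
final = {head: bodies for head, bodies in grammar.items() if head in reach}
print(final)
```

This prints the final grammar S -> aS | A, A -> a. Generating variables are removed before reachability is checked because deleting productions in the first step can make further variables unreachable.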

Page 38

Nullable Variables

λ-production: A -> λ

Nullable variable: A => ... => λ

Page 39

Removing Nullable Variables

Example grammar:

S -> aMb
M -> aMb
M -> λ

Nullable variable: M

Page 40

S -> aMb
M -> aMb
M -> λ

Substitute M -> λ:

S -> aMb
S -> ab
M -> aMb
M -> ab

Final grammar
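A minimal Python sketch of this step, assuming the same dict representation with "" standing for λ. It computes the nullable variables as a fixed point, then, for every body, adds each variant obtained by omitting some occurrences of nullable variables and drops the λ-bodies. Like the slide, it assumes λ is not in the language: if the start symbol were nullable, this construction would lose λ. The function names are hypothetical.

```python
from itertools import product


def nullable_variables(grammar):
    """Variables A with A => ... => λ, computed as a fixed point."""
    nullable = set()
    while True:
        new = {head for head, bodies in grammar.items()
               if head not in nullable
               and any(all(s in nullable for s in body) for body in bodies)}
        if not new:
            return nullable
        nullable |= new


def remove_nullable(grammar):
    """Drop λ-productions; keep every variant of each body in which some
    occurrences of nullable variables are omitted."""
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # a nullable symbol may be kept or dropped; other symbols are kept
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate and candidate not in new_bodies:  # skip λ and repeats
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


grammar = {"S": ["aMb"], "M": ["aMb", ""]}  # "" stands for λ
print(nullable_variables(grammar))
print(remove_nullable(grammar))
```

The result is S -> aMb | ab, M -> aMb | ab, the final grammar above.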

Page 41

Unit-Productions

Unit production: A -> B

Page 42

Removing Unit Productions

Observation: a production A -> A is removed immediately.

Page 43

Example grammar:

S -> aA
A -> a
A -> B
B -> A
B -> bb

Page 44

S -> aA
A -> a
A -> B
B -> A
B -> bb

Substitute A -> B:

S -> aA | aB
A -> a
B -> A | B
B -> bb

Page 45

S -> aA | aB
A -> a
B -> A | B
B -> bb

Remove B -> B:

S -> aA | aB
A -> a
B -> A
B -> bb

Page 46

S -> aA | aB
A -> a
B -> A
B -> bb

Substitute B -> A:

S -> aA | aB | aA
A -> a
B -> bb

Page 47

Remove repeated productions:

S -> aA | aB | aA
A -> a
B -> bb

Final grammar:

S -> aA | aB
A -> a
B -> bb
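The slides remove unit productions by successive substitution. The Python sketch below instead uses a different, commonly taught construction, named plainly here as a swap: compute every pair (A, B) with A =>* B through unit productions only, then give A all non-unit productions of B. It assumes the same dict representation, and the function names are hypothetical. On the example it yields S -> aA, A -> a | bb, B -> a | bb, which differs from the slide's final grammar but generates the same language {aa, abb}; B becomes unreachable and would be dropped in the useless-variable step.

```python
def is_unit(body):
    """A unit production has a single variable as its body."""
    return len(body) == 1 and body.isupper()


def unit_pairs(grammar):
    """All pairs (A, B) such that A =>* B using unit productions only."""
    pairs = {(v, v) for v in grammar}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in grammar.get(b, []):
                if is_unit(body) and (a, body) not in pairs:
                    pairs.add((a, body))
                    changed = True
    return pairs


def remove_unit_productions(grammar):
    """For every pair (A, B), give A all non-unit productions of B."""
    pairs = unit_pairs(grammar)
    result = {}
    for head in grammar:
        bodies = []
        for a, b in sorted(pairs):
            if a == head:
                for body in grammar.get(b, []):
                    if not is_unit(body) and body not in bodies:
                        bodies.append(body)
        result[head] = bodies
    return result


grammar = {"S": ["aA"], "A": ["a", "B"], "B": ["A", "bb"]}
print(remove_unit_productions(grammar))
```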

Page 48

Removing All

Step 1: Remove Nullable Variables

Step 2: Remove Unit-Productions

Step 3: Remove Useless Variables