
Lecture Compiler Construction

Franz Wotawa
[email protected]

Institute for Software Technology
Technische Universität Graz

Inffeldgasse 16b/2, A-8010 Graz, Austria

Summer term 2016

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 1 / 309


Compilers are everywhere

Programming languages like Java, C#, C, C++, Pascal, Modula, SML, Lisp, VHDL, Basic, ...
Graphical languages also need compilers (HTML, LaTeX, ...)
Means for communication between computers like XML
Natural languages

Compilers translate a sentence written in one language to another language.


Example – HTML

<!DOCTYPE HTML PUBLIC "...">
<html>
<head>
<meta content="text/html; ..." http-equiv="Content-Type">
<title>Hello World</title>
</head>
<body>
<h1>Hello World!</h1>
</body>
</html>


Another example

(Source: en.wikipedia.org/wiki/Java_bytecode)


Compiler vs. Interpreter

Compiler: Translates from one language into another; the translated programs are then executed directly (C, C++, Pascal, ...).

Interpreter: Executes a program directly without converting it (BASIC, batch languages, ...).

The exact boundary is difficult to define nowadays (a byte code interpreter vs. a CPU, which executes machine code statements).
The front end (analysis phase) of a compiler is always needed!


Organizational issues

Lecture (Vorlesung) 2h
Practical exercises (Übung) 1h

Examples from the Compiler Construction theory
Hands-on part: Development of a compiler
More information soon (via email and/or the webpage)

Office hours:
Franz Wotawa: Tuesday, 13:00–14:00
Roxane Koitz: Monday, 13:00–14:00


Lecture dates

Monday, 29.2.2016, 16:00 - 19:00 (preliminary discussion, lexical analysis)

Monday, 7.3.2016, 16:00 - 19:00 (lexical analysis, parsing)

Tuesday, 8.3.2016, 16:00 - 17:30 (parsing, attributed grammars)

Monday, 14.3.2016, 16:00 - 19:00 (attributed grammars, type checking)

Tuesday, 15.3.2016, 16:00 - 17:30 (type checking, runtime environment)

Monday, 11.4.2016, 16:00 - 19:00 (code generation)

Monday, 18.4.2016, 16:00 - 19:00 (code optimization)


Exam

Written form only; no papers, books etc. allowed!
Content: DFA, NFA, LL(1), SLR(1), Attributed Grammars, Type Checking, Code Generation, Code Optimization, etc.
1. date: Monday, 9.5.2016, 18:00-20:00, i13 (2 groups, each 1 hour)
2. date: Monday, 13.6.2016, 18:00-20:00, i13 (2 groups, each 1 hour)
3. date: in October 2016, will be announced soon


Literature, etc.

Aho, Sethi, Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1985, ISBN 0-201-10088-6.
Tremblay, Sorenson, The Theory and Practice of Compiler Writing, McGraw Hill, 1985, ISBN 0-07-065161-2.
Copy of slides but no lecture notes
Compiler construction webpage: http://www.ist.tugraz.at/cb16.html


PART 1 - INTRODUCTION


What is a compiler?

Program
Source language ⇒ Target language
Error reports if source code contains errors

Source program → Compiler → Target program
                    ↓
              Error messages


Implications of definition

There are (very) many compilers
Thousands of source languages (e.g. Pascal, Modula, C, C++, Java, ...)
Thousands of target languages (other high-level programming languages, machine code)
Classification: single-pass, multi-pass, debugging, or optimizing
But: basics of creating a compiler are mostly consistent


Concept of compilation

Analysis phase
  Split source program into tokens
  Generate intermediate code (intermediate representation of source program)

Synthesis phase
  Generate target program based on intermediate code

Enables reuse of program parts for different source and target languages


Concept of compilation – Analysis

Operations used within program are stored in hierarchical structure
Syntax Tree
Other tools which perform an analysis:
  Structure editors
  Pretty printers
  Static checkers
  Interpreters
  Web browser


Language-processing systems

skeletal source program
  ↓
preprocessor
  ↓
source program
  ↓
compiler
  ↓
target assembly language
  ↓
assembler
  ↓
relocatable machine code
  ↓
loader/link-editor ← library, relocatable object files
  ↓
absolute machine code


Components of a compiler

source code
  ↓
lexical analyser
  ↓
syntax analyser
  ↓
semantic analyser
  ↓
intermediate code generator
  ↓
code optimizer
  ↓
code generator
  ↓
target program

(The symbol-table manager and the error handler interact with all phases.)


Lexical analysis, scanning

Example:
pos := init + rate * 60

Tokens:
1 Identifier pos
2 Assignment symbol :=
3 Identifier init
4 Plus operator
5 Identifier rate
6 Multiplication operator
7 Constant (number) 60

Syntax tree

      :=
     /  \
  pos    +
        / \
    init   *
          / \
      rate   60
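The token list above can be reproduced with a small scanner. The following is a minimal sketch in Python, assuming illustrative token names (`ID`, `ASSIGN`, `PLUS`, `MULT`, `NUM`) that do not come from the lecture; it uses the standard `re` module rather than a hand-built automaton.

```python
import re

# Token patterns for the tiny example language; the token names are illustrative.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r":="),
    ("PLUS", r"\+"),
    ("MULT", r"\*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs; raise on unmatched input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # white space is filtered, not reported
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("pos := init + rate * 60"):
    print(token)
```

The scanner yields seven pairs, one per token listed on the slide, and drops the white space between them.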


Syntax analysis, parsing

Grammatical analysis
Token ↔ rules of grammar
Rules of grammar (example)

1 Each identifier is an expression
2 Each number is an expression
3 Assuming that ex1 and ex2 are expressions, then ex1 + ex2 and ex1 ∗ ex2 are expressions as well
4 A term: identifier := expression is a statement

Parse tree


Example - Parse tree

assignment statement
├── identifier: pos
├── :=
└── expression
    ├── expression
    │   └── identifier: init
    ├── +
    └── expression
        ├── expression
        │   └── identifier: rate
        ├── *
        └── expression
            └── number: 60


Semantic analysis

Check for semantic errors
Type checking
Identification of operators and operands
Necessary for code generation
Example: conversion from int to real

The statement
  pos := init + rate * 60
becomes
  pos := init + rate * int2real(60)


Intermediate code generation

E.g.: three-address code
Example:
1. tmp1 := int2real(60)
2. tmp2 := id3 * tmp1
3. tmp3 := id2 + tmp2
4. id1 := tmp3

using the symbol table

pos   (id1)  real  ...
init  (id2)  real  ...
rate  (id3)  real  ...
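The three-address code above can be derived mechanically from the syntax tree. The sketch below assumes a hypothetical tree encoding as nested Python tuples (`(operator, left, right)`, `("int2real", operand)`), which is not part of the lecture; it evaluates the right operand first only so that the temporaries come out numbered as in the example.

```python
def gen_three_address(stmt):
    """Translate ('assign', target, expr) into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"tmp{counter}"

    def gen(expr):
        # Leaves (identifiers and constants) are used directly as operands.
        if not isinstance(expr, tuple):
            return str(expr)
        op = expr[0]
        if op == "int2real":
            arg = gen(expr[1])
            tmp = new_temp()
            code.append(f"{tmp} := int2real({arg})")
            return tmp
        # Binary operator: the right operand is translated first so that the
        # temporaries are numbered as in the example.
        right = gen(expr[2])
        left = gen(expr[1])
        tmp = new_temp()
        code.append(f"{tmp} := {left} {op} {right}")
        return tmp

    _, target, expr = stmt
    code.append(f"{target} := {gen(expr)}")
    return code


# id1 := id2 + id3 * int2real(60), with identifiers already replaced by symbol-table names
tree = ("assign", "id1", ("+", "id2", ("*", "id3", ("int2real", 60))))
for line in gen_three_address(tree):
    print(line)
```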


Code optimizer

Goal: faster machine code
Example:

1. tmp1 := int2real(60)
2. tmp2 := id3 * tmp1
3. tmp3 := id2 + tmp2
4. id1 := tmp3

⇓

1. tmp1 := id3 * 60.0
2. id1 := id2 + tmp1


Code generator

Generation of actual target code (machine code, assembler)
Example:

1. MOVF id3, R2
2. MULF #60.0, R2
3. MOVF id2, R1
4. ADDF R2, R1
5. MOVF R1, id1


PART 2 - LEXICAL ANALYSIS


Goals of this section

Specification and implementation of lexical analysers
Easiest method:
1 Create diagram which describes structure of tokens
2 Manually translate diagram into a program

Regular expressions in finite state machines


Tasks

[Figure: source program → lexical analyser → parser; the parser sends "get next token" to the lexical analyser, which returns a token; both access the symbol table]

Filter comments, white spaces, ...
Establish relations between error messages of compiler and source code (e.g. via line number)
Preprocessor functions (e.g. in C)
Create copy of source program containing error messages


Why separate lexical analysis and parsing?

1 Simpler design: e.g. removal of comments and white spaces enables use of simpler parser design
2 Improve efficiency: large portion of time is invested into reading of programs and conversion into tokens
3 Enhance portability of compilers
4 (There are tools for both phases)


Tokens, patterns, lexemes

Pattern: Rule describing a set of strings (a token)
Token: Set of strings described by a pattern
Lexeme: Sequence of characters which is matched by the pattern of a token

Token     Lexeme                  Pattern (verbal)
if        if                      if
relation  >, <, ...               < or > or ...
id        pi, test_cases          letter followed by letters, digits, or '_'
num       2.7, 0, 12e-4           any numeric constant
literal   "segmentation fault"    any characters between " and "


Tokens

Terminals in the grammar (checked by the parser)
In programming languages: keywords, operators, identifiers, constants, strings, brackets

Language conventions:
1 Tokens have to occur in specific places within the code
2 Tokens are separated by white spaces
  Example (Fortran): DO 5 I = 1.25 is interpreted as DO5I = 1.25
3 Keywords are reserved and must not be used as identifiers

Convention 1 is (currently) not used
Convention 2 is accepted
Convention 3 is reasonable (and simplifies compiler construction)

IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;


Token attributes

Store lexeme for further processing
Store pointers to entries in symbol table
Line number or position of token within source code

U = 2 * R * PI

<id, pointer to U>
<assign_op>
<num, integer value 2>
<mult_op>
<id, pointer to R>
<mult_op>
<id, pointer to PI>


Lexical errors

fi ( a == 1) ... ⇒ fi is interpreted as undefined identifier
String sequences which cannot be matched to a token, e.g.: 2.3e*4 instead of 2.3e+4.
Error correction:
1 Panic mode: Removal of faulty characters until a token can be matched
2 Remove a faulty character
3 Include a new character
4 Replace a character
5 Change order of neighboring characters

Some errors (like the first example) may be only recognizable by the parser


Specification of tokens

Strings over an alphabet:
  Alphabet ... finite set of symbols, example: {0, 1} binary alphabet
  String (word, sentence) ... finite sequence of symbols of the alphabet
  ε ... empty string

Language: a set of strings over an alphabet
Operators:
  Concatenation: xy is the string in which y is attached to the end of x
  x^i is defined as: x^0 = ε, and x^i = x^(i−1) x for i > 0
  Prefix: computer → comp
  Suffix: computer → uter
  Substring: computer → put
  Subsequence: computer → opt


Operations on languages

Union: $L \cup M = \{x \mid x \in L \lor x \in M\}$
Concatenation: $LM = \{xy \mid x \in L \land y \in M\}$
Kleene closure: $L^* = \bigcup_{i=0}^{\infty} L^i$
Positive closure: $L^+ = \bigcup_{i=1}^{\infty} L^i$

Example: L = {A, ..., Z}, M = {0, 1, ..., 9}
L ∪ M   Set of letters and digits
LM      Strings consisting of a letter followed by a digit
L^4     Strings which are made up of four letters
L∗      Strings made up of letters, including ε
M+      Digit strings containing at least one digit
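The operations above can be tried out on finite sets. This is a minimal sketch in Python; since L∗ and L+ are infinite, the helper `closure_up_to` (a name chosen here for illustration) only builds the union of L^i up to a given bound k.

```python
from itertools import product
import string


def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {x + y for x, y in product(L, M)}


def power(L, n):
    """L^n, with L^0 = {ε} (the empty string)."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


def closure_up_to(L, k, positive=False):
    """Finite approximation of L* (or L+): union of L^i for i up to k."""
    start = 1 if positive else 0
    result = set()
    for i in range(start, k + 1):
        result |= power(L, i)
    return result


L = set(string.ascii_uppercase)   # {A, ..., Z}
M = set(string.digits)            # {0, ..., 9}

print(len(L | M))                 # size of the union
print("A7" in concat(L, M))       # a letter followed by a digit
print("" in closure_up_to(M, 2), "" in closure_up_to(M, 2, positive=True))
```

The last line shows the only difference between the two closures: ε belongs to M∗ but not to M+.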


Regular expressions

Example: Pascal identifier: letter ( letter | digit )∗

A regular expression r defines a language L(r) over an alphabet Σ using the following rules:

1 The regular expression ε describes {ε}
2 a ∈ Σ: The regular expression a describes {a}
3 r, s are regular expressions (languages L(r), L(s)):
  1 (r)|(s) describes L(r) ∪ L(s)
  2 (r)(s) describes L(r)L(s)
  3 (r)∗ describes (L(r))∗
  4 (r)⁺ describes (L(r))⁺

Regular sets


Examples

Assuming Σ = {a, b}
1 a|b describes the language {a, b}
2 (a|b)(a|b) describes {aa, ab, ba, bb}
3 a∗ describes {ε, a, aa, aaa, aaaa, ...}
4 (a|b)∗ describes all strings of zero or more a's and b's

Not all languages can be described by regular expressions, e.g. {wcw | w is a string made up of a's and b's}
Regular expressions may only describe words containing either a fixed number of repetitions or an arbitrary (unbounded) number of repetitions
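The example expressions can be checked with Python's standard `re` module. This is a sketch only: `re` uses a backtracking matcher with extensions beyond regular expressions in the formal sense, but for the patterns used here the two notions coincide. The helper name `matches` is an assumption made for this example.

```python
import re


def matches(pattern, word):
    """True if the whole word belongs to the language described by pattern."""
    return re.fullmatch(pattern, word) is not None


print(matches("a|b", "a"), matches("a|b", "ab"))
print([w for w in ["aa", "ab", "ba", "bb", "a"] if matches("(a|b)(a|b)", w)])
print(matches("a*", ""), matches("(a|b)*", "abba"))

# Pascal-style identifier: letter (letter | digit)*
IDENT = "[A-Za-z][A-Za-z0-9]*"
print(matches(IDENT, "x1"), matches(IDENT, "1x"))
```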


Regular definitions

A regular definition is a sequence of the form:

d_1 → r_1
...
d_n → r_n

whereby d_i is a distinct name and r_i is a regular expression over Σ ∪ {d_1, ..., d_(i−1)}, i.e. over the alphabet and the previously defined names
Example:

letter → A | ... | Z | a | ... | z
digit → 0 | 1 | ... | 9
id → letter (letter | digit)∗


Token detection

Goal: Identify lexeme of token in input buffer

Output: Token-attribute-pair

Regular expression   Token   Attribute value
white space          -       -
if                   if      -
then                 then    -
id                   id      pointer to table entry
num                  num     pointer to table entry
<                    relop   LT
>                    relop   GT
...                  ...     ...


Transition diagrams

Actions performed by lexical analyser (after get next token of the parser)
States connected by directed edges
Edges are labeled:
  Labels contain symbols expected in input buffer in order to progress to next state
  Label other denotes all symbols not used by any other edge originating from the same node
One start state
End states after matching a token
Deterministic (not necessarily)


Example

Regular expression letter (letter | digit)∗ can be described by the following transition diagram:

start → (1) --letter--> (2) --other--> (3)   return(gettoken(), install_id())
                        (2) --letter or digit--> (2)

Assume the input buffer contains:

Pos     1   2   3    4   ...
Symbol  P   I   ' '  =   ...

and the current-symbol pointer is pointing to position 1


Token matching - Example

1 Current symbol = P, position = 1 ⇒ Transition from state 1 to state 2 and read new symbol
2 Current symbol = I, position = 2 ⇒ Remain in state 2 and read new symbol
3 Current symbol = SPACE, position = 3 ⇒ Transition from state 2 to state 3
4 State 3 is an end state

[Figure: transition diagram for letter (letter | digit)∗ with states 1, 2, 3, annotated with the path 1 → 2 → 2 → 3 taken for the symbols P, I, ' ']


Example – Implementing a Lexer directly

Pseudo code:

lexeme = "";
if (next_char ∈ letter) then
    lexeme = lexeme + next_char;
    get_next_char();
    while (next_char ∈ letter ∪ digit) do
        lexeme = lexeme + next_char;
        get_next_char();
    return IDENTIFIER(lexeme);
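The pseudo code translates almost line by line into a runnable function. The following Python sketch assumes a hypothetical interface: the input is a string, the scan starts at index `pos`, and the function returns the lexeme together with the index of the first character after it (or `None` if no identifier starts at `pos`).

```python
def scan_identifier(text, pos=0):
    """Return (lexeme, next_pos) if an identifier starts at pos, else (None, pos)."""
    # State 1: the first character must be a letter.
    if pos >= len(text) or not (text[pos].isascii() and text[pos].isalpha()):
        return None, pos
    end = pos + 1
    # State 2: consume letters and digits; any other character ends the token.
    while end < len(text) and text[end].isascii() and text[end].isalnum():
        end += 1
    return text[pos:end], end


print(scan_identifier("PI = 3.14"))
print(scan_identifier("9lives"))
```

Returning the next position instead of consuming a global input buffer keeps the function free of side effects; a full lexer would call it in a loop together with scanners for the other token classes.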


Generalization – Finite automata

Nondeterministic finite automaton (NFA) (S, Σ, move, s0, F)
1 Set of states S
2 Set of input symbols Σ
3 Transition relation move : S × (Σ ∪ {ε}) → 2^S, relating state-symbol pairs to sets of states
4 Initial state s0 ∈ S
5 Set of end states F ⊆ S

NFA can be visualized in form of directed graph (transition graph)
Distinctions from transition diagram:
  Nondeterministic
  ε-moves allowed


NFA – Example

NFA for aa∗|bb∗ (language, accepting NFA):
({0, 1, 2, 3, 4}, {a, b}, move, 0, {2, 4}) with move:

state  symbol  new states
0      ε       {1, 3}
1      a       {2}
2      a       {2}
3      b       {4}
4      b       {4}


Deterministic finite automaton

Deterministic finite automaton (DFA) is a special case of an NFA whereby:
1 No state has an ε-move
2 For each state s and each input symbol a there is at most one outgoing edge of s labeled a

A DFA has at most one transition per state and input symbol
Each entry in the transition table therefore contains at most one state

State   Input symbol
        a    b
0       1    2


DFA simulation

Input: Input string x, DFA D
Output: 'Yes' if D accepts x, 'No' otherwise

s := s0
c := nextchar
while c ≠ eof do
    s := move(s, c)
    if s = ⊥ then
        return 'No'
    c := nextchar
end
if s ∈ F then
    return 'Yes'
else
    return 'No'
end
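The simulation loop can be written in a few lines of Python. In this sketch the transition function is assumed to be a dictionary keyed by (state, symbol), a missing entry plays the role of ⊥, and the table is the DFA for aa∗|bb∗ used in this lecture.

```python
def dfa_accepts(move, start, finals, text):
    """Simulate a DFA; a missing table entry stands for the undefined value ⊥."""
    state = start
    for ch in text:
        state = move.get((state, ch))
        if state is None:
            return False
    return state in finals


# DFA for aa*|bb*: states {0, 1, 2}, start state 0, end states {1, 2}
MOVE = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (2, "b"): 2}

for word in ["aaa", "bb", "", "ab"]:
    print(repr(word), dfa_accepts(MOVE, 0, {1, 2}, word))
```

Each input symbol costs one dictionary lookup, which corresponds to the O(|x|) running time of DFA simulation.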


DFA – Example

DFA for aa∗|bb∗:
({0, 1, 2}, {a, b}, move, 0, {1, 2}) whereby move:

state  symbol  new state
0      a       1
0      b       2
1      a       1
2      b       2

[Figure: transition graph with edges 0 --a--> 1, 0 --b--> 2, 1 --a--> 1, 2 --b--> 2]


Conversion NFA→ DFA

Subset construction algorithm
Idea:
1 Each DFA state corresponds to a set of NFA states
2 After reading a1a2...an the DFA is in a state which is equivalent to a subset of the NFA states. This subset contains the states the NFA can reach on some path when reading a1a2...an.

Number of DFA states is exponential in the number of NFA states (worst case)


Subset construction

Input: NFA N = (S, Σ, move, s0, F)
Output: DFA D, accepting the same language as N

Initially, ε-closure(s0) is the only state in Dstates and it is unmarked
while there is an unmarked state T in Dstates do
    mark T
    for each input symbol a do
        U := ε-closure(move(T, a))
        if U ∉ Dstates then
            Add U as unmarked state to Dstates
        end
        Dtran[T, a] := U
    end
end


ε-closure, move

ε-closure(s): Set of NFA states which are reachable from a state s via ε-moves (including s itself)
ε-closure(T): Set of NFA states which are reachable from a state s ∈ T via ε-moves (including the states of T)
move(T, a): Set of NFA states which are reachable from a state s ∈ T via a transition on the input symbol a


Calculation of ε-closure

Input: NFA N, set of states T
Output: ε-closure(T)

Push all states in T onto stack
Initialize ε-closure(T) to T
while stack ≠ ∅ do
    Pop t, the top element of stack
    for each state u with an edge from t to u labeled ε do
        if u ∉ ε-closure(T) then
            Add u to ε-closure(T)
            Push u onto stack
        end
    end
end
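The ε-closure computation and the subset construction that relies on it can be sketched together in Python. The encoding is an assumption made for this example: the NFA transition relation is a dictionary from (state, symbol) to a set of states, ε is represented by `None`, and DFA states are `frozenset` objects so that they can serve as dictionary keys. The NFA is the one for aa∗|bb∗ from this lecture.

```python
def eps_closure(nfa_move, states):
    """All NFA states reachable from `states` via ε-moves (ε is encoded as None)."""
    closure = set(states)
    stack = list(states)
    while stack:
        t = stack.pop()
        for u in nfa_move.get((t, None), set()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)


def move(nfa_move, states, symbol):
    """All NFA states reachable from `states` by one transition on `symbol`."""
    result = set()
    for s in states:
        result |= nfa_move.get((s, symbol), set())
    return result


def subset_construction(nfa_move, start, alphabet):
    """Return (start state, set of DFA states, transition table Dtran)."""
    start_state = eps_closure(nfa_move, {start})
    dstates = {start_state}
    unmarked = [start_state]
    dtran = {}
    while unmarked:
        T = unmarked.pop()              # popping T marks it
        for a in alphabet:
            U = eps_closure(nfa_move, move(nfa_move, T, a))
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return start_state, dstates, dtran


# NFA for aa*|bb*: ε-moves from 0 to 1 and 3
NFA_MOVE = {(0, None): {1, 3}, (1, "a"): {2}, (2, "a"): {2}, (3, "b"): {4}, (4, "b"): {4}}
start_state, dstates, dtran = subset_construction(NFA_MOVE, 0, "ab")
for (T, a), U in sorted(dtran.items(), key=lambda item: (sorted(item[0][0]), item[0][1])):
    print(sorted(T), a, "->", sorted(U))
```

As in the algorithm above, the empty set is added as an ordinary DFA state; it acts as a dead state from which no end state is reachable.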


Example NFA→ DFA

[Figure: transition graph of an NFA with states 0 to 10, eight ε-edges, and edges labeled a, b, a, b, b]


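Thompson's Construction

Input: Regular expression r over an alphabet Σ
Output: NFA N, which accepts L(r)

[Figure: NFA fragments, each with initial state i and final state f, for the cases ε, a ∈ Σ, st (N(s) followed by N(t)), s|t (N(s) and N(t) in parallel, connected by ε-moves), s∗ (N(s) with additional ε-moves), and [s]]

whereby [s] = (s | ε)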


NFA simulation

Input: NFA N, input string x
Output: 'Yes' if N accepts x, 'No' otherwise

S := ε-closure({s0})
a := nextChar
while a ≠ eof do
    S := ε-closure(move(S, a))
    a := nextChar
end
if S ∩ F ≠ ∅ then
    return 'Yes'
else
    return 'No'
end
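The NFA simulation keeps a set of current states instead of a single state. A minimal Python sketch, under the assumptions that the transition relation is a dictionary from (state, symbol) to a set of states and that ε is encoded as `None`; the example automaton is the NFA for aa∗|bb∗ from this lecture.

```python
def eps_closure(nfa_move, states):
    """All NFA states reachable from `states` via ε-moves (ε is encoded as None)."""
    closure = set(states)
    stack = list(states)
    while stack:
        t = stack.pop()
        for u in nfa_move.get((t, None), set()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure


def nfa_accepts(nfa_move, start, finals, text):
    """Simulate the NFA on text by tracking the set of possible states."""
    current = eps_closure(nfa_move, {start})
    for ch in text:
        moved = set()
        for s in current:
            moved |= nfa_move.get((s, ch), set())
        current = eps_closure(nfa_move, moved)
    return bool(current & finals)


NFA_MOVE = {(0, None): {1, 3}, (1, "a"): {2}, (2, "a"): {2}, (3, "b"): {4}, (4, "b"): {4}}

for word in ["aa", "b", "", "ab"]:
    print(repr(word), nfa_accepts(NFA_MOVE, 0, {2, 4}, word))
```

Every input symbol may touch all NFA states, which is where the |r| factor in the running time of NFA simulation comes from.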


Summary

Check whether an input string x is contained in a language defined by the regular expression r:
1 Construct NFA (Thompson's construction) and apply simulation algorithm for NFAs
2 Construct NFA (Thompson's construction), convert NFA to DFA and apply simulation algorithm for DFAs

Time-space tradeoffs

Automaton   Space       Time
NFA         O(|r|)      O(|r| · |x|)
DFA         O(2^|r|)    O(|x|)


Regular expression→ DFA

Direct conversion
Notation:
  NFA state s is important ↔ s has at least one non-ε-move to a successor
  Extended regular expression (augmented regular expression) (r)#
  Illustration of extended expressions via syntax trees (cat-node, or-node, star-node)
  Each leaf node is labeled either with a symbol (of the alphabet) or with ε
  Each leaf node (≠ ε) has a designated number (position)

Remark: NFA-DFA-conversion only via important states = positions


Syntax tree – Example

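Syntax tree for (a|b)∗abb# (• denotes a cat-node, positions in parentheses):

•
├── •
│   ├── •
│   │   ├── •
│   │   │   ├── ∗
│   │   │   │   └── |
│   │   │   │       ├── a (1)
│   │   │   │       └── b (2)
│   │   │   └── a (3)
│   │   └── b (4)
│   └── b (5)
└── # (6)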


Algorithm - Idea

Approach:
1 Create syntax tree for extended regular expression
2 Calculate four functions: nullable, firstpos, lastpos, followpos
3 Construct DFA using followpos

DFA states correspond to sets of positions (important NFA states)
For a position i, followpos(i) is the set of positions j for which there exists an input string ...cd... where i corresponds to c and j corresponds to d


followpos – Example

Syntax tree for (a|b)∗abb#
followpos(1) = {1, 2, 3}
Explanation: Suppose we see an a. This symbol could belong either to (a|b)∗ or to the following a. If we see a b then this symbol has to belong to (a|b)∗. Thus positions 1, 2, 3 are contained in followpos(1).
Informal definition of functions (node n, string s)
  firstpos ... positions which match the first symbol of s
  lastpos ... positions which match the last symbol of s
  nullable ... true if node n can create a language containing the empty string, false otherwise


Rules for nullable, firstpos, lastpos

Node n                    nullable(n)                   firstpos(n)                    lastpos(n)
Leaf labeled ε            true                          ∅                              ∅
Leaf labeled position i   false                         {i}                            {i}
c1 | c2                   nullable(c1) ∨ nullable(c2)   firstpos(c1) ∪ firstpos(c2)    lastpos(c1) ∪ lastpos(c2)
c1 • c2                   nullable(c1) ∧ nullable(c2)   if nullable(c1)                if nullable(c2)
                                                        then firstpos(c1) ∪ firstpos(c2)   then lastpos(c1) ∪ lastpos(c2)
                                                        else firstpos(c1)              else lastpos(c2)
c1∗                       true                          firstpos(c1)                   lastpos(c1)
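The rules in the table map directly onto a recursive function over the syntax tree. The sketch below assumes a hypothetical tuple encoding of tree nodes (`("leaf", symbol, position)`, `("eps",)`, `("or", c1, c2)`, `("cat", c1, c2)`, `("star", c1)`) and applies it to the syntax tree for (a|b)∗abb#.

```python
def analyse(node):
    """Return (nullable, firstpos, lastpos) for a syntax-tree node."""
    kind = node[0]
    if kind == "eps":
        return True, frozenset(), frozenset()
    if kind == "leaf":
        _, _symbol, position = node
        return False, frozenset({position}), frozenset({position})
    if kind == "star":
        _, first, last = analyse(node[1])
        return True, first, last
    n1, f1, l1 = analyse(node[1])
    n2, f2, l2 = analyse(node[2])
    if kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    raise ValueError(f"unknown node kind: {kind}")


def cat(*children):
    """Left-associative concatenation of several subtrees."""
    tree = children[0]
    for child in children[1:]:
        tree = ("cat", tree, child)
    return tree


# (a|b)*abb# with positions 1 to 6
star = ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2)))
root = cat(star, ("leaf", "a", 3), ("leaf", "b", 4), ("leaf", "b", 5), ("leaf", "#", 6))

nullable, firstpos, lastpos = analyse(root)
print(nullable, sorted(firstpos), sorted(lastpos))
```

This version recomputes the values of subtrees on every call; an implementation for larger expressions would store the three values in the nodes during a single bottom-up pass.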


firstpos, followpos – Example

[Figure: syntax tree for (a|b)∗abb# with positions 1 to 6, each node annotated with its firstpos and lastpos sets; for the root node firstpos = {1,2,3} and lastpos = {6}]


Calculation followpos

1 If n is a cat-node (•), c1 is the left, c2 is the right child and i is a position in lastpos(c1), then all positions of firstpos(c2) are contained in followpos(i)
2 If n is a star-node (∗) with child c and i is contained in lastpos(n), then all positions of firstpos(c) are contained in followpos(i)

Position  followpos
1         {1, 2, 3}
2         {1, 2, 3}
3         {4}
4         {5}
5         {6}
6         -

[Figure: followpos graph with nodes 1 to 6 and a directed edge (i, j) for each j ∈ followpos(i)]


followpos-graph→ NFA

An NFA without ε-moves can be created based on a followpos-graph:
1 Mark all positions in firstpos of the root node of the syntax tree as initial states
2 Label each directed edge (i, j) with the symbol of position i
3 Mark position which belongs to # as end state

⇒ followpos-graph can be converted to DFA (using subset construction)


DFA-algorithm

Input: Regular expression r
Output: DFA D, accepting the language L(r)

Initially, the only unmarked state in Dstates is firstpos(root)
while there is an unmarked state T ∈ Dstates do
    Mark T
    for each input symbol a do
        Let U be the set of positions that are in followpos(p) for some position p ∈ T where the symbol of p is a.
        if U ≠ ∅ and U ∉ Dstates then
            Add U as unmarked state to Dstates
        end
        DTran(T, a) = U
    end
end
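The algorithm can be run on the followpos table computed for (a|b)∗abb#. In the Python sketch below the followpos table, the symbol of each position and firstpos(root) = {1, 2, 3} are entered by hand; the dictionary encoding and the helper names are assumptions made for this example. A DFA state is treated as an end state if it contains position 6, the position of #.

```python
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
END_POSITION = 6  # position of the end marker #


def build_dfa(first_root, followpos, symbol, alphabet):
    """Construct DFA states (sets of positions) and the transition table."""
    start = frozenset(first_root)
    dstates = {start}
    unmarked = [start]
    dtran = {}
    while unmarked:
        T = unmarked.pop()              # popping T marks it
        for a in alphabet:
            # Union of followpos(p) for all positions p in T labeled with a
            U = frozenset().union(*(followpos[p] for p in T if symbol[p] == a))
            if U and U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return start, dstates, dtran


start, dstates, dtran = build_dfa({1, 2, 3}, FOLLOWPOS, SYMBOL, "ab")


def accepts(text):
    """Run the constructed DFA; the empty set acts as a dead state."""
    state = start
    for ch in text:
        state = dtran.get((state, ch), frozenset())
        if not state:
            return False
    return END_POSITION in state


for word in ["abb", "aabb", "babb", "ab", ""]:
    print(repr(word), accepts(word))
```

For this expression the construction yields four DFA states: {1,2,3}, {1,2,3,4}, {1,2,3,5} and {1,2,3,6}.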
