Post on 22-Dec-2015
Chapter 2
Chang Chi-Chung
2008.03 rev.1
A Simple Syntax-Directed Translator
This chapter contains introductory material for Chapters 3 to 8. Its goal is to create a syntax-directed translator that maps infix arithmetic expressions into postfix expressions.
Building a simple compiler involves:
- Defining the syntax of a programming language.
- Developing a source-code parser. For our compiler we will use predictive parsing.
- Implementing syntax-directed translation to generate intermediate code.
A Code Fragment To Be Translated
{
    int i; int j; float[100] a; float v; float x;
    while (true) {
        do i = i + 1; while ( a[i] < v );
        do j = j - 1; while ( a[j] > v );
        if ( i >= j ) break;
        x = a[i]; a[i] = a[j]; a[j] = x;
    }
}
Goal: extend the syntax-directed translator to map code fragments like this one into three-address code (see Appendix A).
1: i = i + 1
2: t1 = a [ i ]
3: if t1 < v goto 1
4: j = j - 1
5: t2 = a [ j ]
6: if t2 > v goto 4
7: ifFalse i >= j goto 9
8: goto 14
9: x = a [ i ]
10: t3 = a [ j ]
11: a [ i ] = t3
12: a [ j ] = x
13: goto 1
14:
A Model of a Compiler Front End
[Figure: source program (character stream) → lexical analyzer → token stream → parser → syntax tree → intermediate code generator → three-address code. The three phases share a symbol table.]
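Two Forms of Intermediate Code
Intermediate code can take the form of abstract syntax trees or three-address instructions. Example: the loop do i = i + 1; while ( a[i] < v );
[Figure: abstract syntax tree with root do-while. Its body child is an assign node for i = i + 1. Its condition child is a < node that compares a[i] (an array-access node [ ] with children a and i) with v.]
Three-address code for the same loop:
1: i = i + 1
2: t1 = a [ i ]
3: if t1 < v goto 1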
Syntax Definition
Syntax is specified using a context-free grammar (CFG), written in BNF (Backus-Naur Form). A context-free grammar has four components:
- A set of tokens (terminal symbols)
- A set of nonterminals
- A set of productions
- A designated start symbol
Example of CFG
G = <T, N, P, S>
T = { +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
N = { list, digit }
P =
    list → list + digit
    list → list - digit
    list → digit
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
S = list
Derivations
A derivation begins with the start symbol. At each step it replaces a nonterminal in the current sentential form with the body (right-hand side) of one of that nonterminal's productions. The set of all strings of tokens that can be derived this way is the language generated by the CFG.
Example of the Derivations
A leftmost derivation replaces the leftmost nonterminal in each step.
A rightmost derivation replaces the rightmost nonterminal in each step.
Leftmost derivation of 9 - 5 + 2:
list ⇒ list + digit
     ⇒ list - digit + digit
     ⇒ digit - digit + digit
     ⇒ 9 - digit + digit
     ⇒ 9 - 5 + digit
     ⇒ 9 - 5 + 2
Productions:
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
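The leftmost derivation above can be reproduced mechanically. The following is a minimal Java sketch tied to grammar G only. It assumes a well-formed input of single digits separated by + or - with no spaces, and the class name Derivation is hypothetical. Because list → list op digit keeps list on the left, the rightmost operator is introduced first.

```java
import java.util.ArrayList;
import java.util.List;

// Leftmost derivation of strings such as "9-5+2" in the grammar
// list -> list + digit | list - digit | digit (input assumed well-formed).
class Derivation {
    static List<String> leftmost(String input) {
        List<String> ops = new ArrayList<>();
        List<String> digits = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c == '+' || c == '-') ops.add(String.valueOf(c));
            else if (Character.isDigit(c)) digits.add(String.valueOf(c));
            else throw new IllegalArgumentException("unexpected " + c);
        }
        List<String> form = new ArrayList<>(List.of("list"));
        List<String> steps = new ArrayList<>();
        steps.add(String.join(" ", form));
        // Expand the leftmost symbol, list, once per operator (rightmost operator first).
        for (int k = ops.size() - 1; k >= 0; k--) {
            form.remove(0);
            form.addAll(0, List.of("list", ops.get(k), "digit"));
            steps.add(String.join(" ", form));
        }
        form.set(0, "digit");                          // list -> digit
        steps.add(String.join(" ", form));
        // Replace the remaining digit nonterminals from left to right.
        int next = 0;
        for (int i = 0; i < form.size(); i++) {
            if (form.get(i).equals("digit")) {
                form.set(i, digits.get(next++));
                steps.add(String.join(" ", form));
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (String s : leftmost("9-5+2")) System.out.println(s);
    }
}
```

Running main prints the seven sentential forms shown above, from "list" to "9 - 5 + 2".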
Parse Trees
Given a CFG, a parse tree according to the grammar is a tree with the following properties:
- The root is labeled by the start symbol.
- Each leaf is labeled by a terminal (token) or by ε.
- Each interior node is labeled by a nonterminal.
- If A → X1 X2 … Xn is a production, then a node labeled A can have immediate children labeled X1, X2, …, Xn, where each Xi is a terminal or a nonterminal. (ε denotes the empty string; for A → ε, node A has a single child labeled ε.)
Example: for the production A → XYZ, node A has the children X, Y, and Z, in that order.
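Example of the Parse Tree
Parse tree of the string 9-5+2 using grammar G. Children are indented below their parent, in left-to-right order:
list
    list
        list
            digit
                9
        -
        digit
            5
    +
    digit
        2
The sequence of leaves, read from left to right, is called the yield of the parse tree. Here the yield is 9-5+2.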
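To make "yield" concrete, here is a minimal Java sketch of a parse-tree node whose leaves() method concatenates the leaf labels from left to right. The class ParseTreeNode is a hypothetical representation, and for simplicity it does not model ε leaves.

```java
import java.util.Arrays;
import java.util.List;

// A parse-tree node: interior nodes are labeled by nonterminals,
// leaves by terminals (epsilon leaves are not modeled here).
class ParseTreeNode {
    final String label;
    final List<ParseTreeNode> children;

    ParseTreeNode(String label, ParseTreeNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    // The yield: the labels of the leaves, read from left to right.
    String leaves() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder();
        for (ParseTreeNode c : children) sb.append(c.leaves());
        return sb.toString();
    }

    // The parse tree of 9-5+2 in grammar G.
    static ParseTreeNode example() {
        ParseTreeNode nine = new ParseTreeNode("list",
                new ParseTreeNode("digit", new ParseTreeNode("9")));
        ParseTreeNode nineMinusFive = new ParseTreeNode("list",
                nine, new ParseTreeNode("-"),
                new ParseTreeNode("digit", new ParseTreeNode("5")));
        return new ParseTreeNode("list",
                nineMinusFive, new ParseTreeNode("+"),
                new ParseTreeNode("digit", new ParseTreeNode("2")));
    }

    public static void main(String[] args) {
        System.out.println(example().leaves()); // prints 9-5+2
    }
}
```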
Ambiguity
Consider the following context-free grammar:
G = <{ +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }, { string }, P, string>
P: string → string + string | string - string | 0 | 1 | … | 9
This grammar is ambiguous, because more than one parse tree has the string 9-5+2 as its yield.
Ambiguity (Cont’d)
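[Figure: two parse trees for 9-5+2. In the first, the root uses string → string + string and its left child derives 9-5, which corresponds to (9-5)+2. In the second, the root uses string → string - string and its right child derives 5+2, which corresponds to 9-(5+2).]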
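The ambiguity matters because the two trees give different values. The following minimal Java sketch builds both trees as expression objects and evaluates them. The Tree, Leaf, and Op types are illustrative only, and records require Java 16 or later.

```java
// The two parse trees of the ambiguous string 9-5+2, as expression trees.
class Ambiguity {
    interface Tree { int eval(); String show(); }

    record Leaf(int value) implements Tree {
        public int eval() { return value; }
        public String show() { return Integer.toString(value); }
    }

    record Op(char op, Tree left, Tree right) implements Tree {
        public int eval() {
            return op == '+' ? left.eval() + right.eval() : left.eval() - right.eval();
        }
        public String show() { return "(" + left.show() + op + right.show() + ")"; }
    }

    // Root is string + string; the left child derives 9-5.
    static Tree first() {
        return new Op('+', new Op('-', new Leaf(9), new Leaf(5)), new Leaf(2));
    }

    // Root is string - string; the right child derives 5+2.
    static Tree second() {
        return new Op('-', new Leaf(9), new Op('+', new Leaf(5), new Leaf(2)));
    }

    public static void main(String[] args) {
        System.out.println(first().show() + " = " + first().eval());   // ((9-5)+2) = 6
        System.out.println(second().show() + " = " + second().eval()); // (9-(5+2)) = 2
    }
}
```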
Associativity of Operators
Left-associative: if an operand has an operator on both sides of it, it belongs to the operator on its left. The string a+b+c has the same meaning as (a+b)+c. Left-associative operators have left-recursive productions:
left → left + term | term
Right-associative: if an operand has an operator on both sides of it, it belongs to the operator on its right. The string a=b=c has the same meaning as a=(b=c). Right-associative operators have right-recursive productions:
right → term = right | term
Associativity of Operators (cont’d)
[Figure: a left-associative parse tree for a+b+c, using list → list + digit | digit, grows down to the left. A right-associative parse tree for a=b=c, using right → letter = right | letter, grows down to the right.]
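The grouping that each kind of production produces can be observed directly. The following minimal Java sketch makes several assumptions: operands are single letters, the input has no spaces, and the class name Associativity is hypothetical. It implements the right-recursive rule as a recursive call and the left-recursive rule as a loop. A literal recursive call for a left-recursive production would call itself without consuming input; this technique is discussed under Left Recursion later in this chapter.

```java
// Shows the grouping produced by left- and right-associative rules.
class Associativity {
    private final String in;
    private int pos = 0;

    Associativity(String in) { this.in = in; }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    private String term() {
        char c = in.charAt(pos++);
        if (!Character.isLetter(c)) throw new IllegalStateException("letter expected");
        return String.valueOf(c);
    }

    // left -> left + term | term, written as a loop:
    // each new operand is grouped with everything to its left.
    String left() {
        String result = term();
        while (peek() == '+') {
            pos++;
            result = "(" + result + "+" + term() + ")";
        }
        return result;
    }

    // right -> term = right | term: the recursive call groups to the right.
    String right() {
        String t = term();
        if (peek() == '=') {
            pos++;
            return "(" + t + "=" + right() + ")";
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(new Associativity("a+b+c").left());  // ((a+b)+c)
        System.out.println(new Associativity("a=b=c").right()); // (a=(b=c))
    }
}
```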
Precedence of Operators
The string 9+5*2 has the same meaning as 9+(5*2), because * has higher precedence than +.
To construct a grammar for arithmetic expressions with operator precedence, use one nonterminal per precedence level:
- + and - are left-associative and have the lowest precedence (expr).
- * and / are left-associative and have higher precedence (term).
Step 1: factor → digit | ( expr )
Step 2: term → term * factor | term / factor | factor
Step 3: expr → expr + term | expr - term | term
Step 4: the complete grammar:
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → digit | ( expr )
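The layered grammar translates directly into one procedure per nonterminal. The following minimal Java sketch evaluates expressions built from single digits, + - * /, and parentheses. It assumes integer division, and the class name Precedence is hypothetical. The left-recursive productions are written as loops so that the operators remain left-associative.

```java
// Evaluator following expr -> expr + term | expr - term | term,
//                     term -> term * factor | term / factor | factor,
//                     factor -> digit | ( expr ).
class Precedence {
    private final String in;
    private int pos = 0;

    Precedence(String in) { this.in = in.replace(" ", ""); }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    int expr() {                       // lowest precedence: + and -
        int v = term();
        while (peek() == '+' || peek() == '-') {
            char op = in.charAt(pos++);
            int t = term();
            v = (op == '+') ? v + t : v - t;
        }
        return v;
    }

    int term() {                       // higher precedence: * and /
        int v = factor();
        while (peek() == '*' || peek() == '/') {
            char op = in.charAt(pos++);
            int f = factor();
            v = (op == '*') ? v * f : v / f;   // integer division
        }
        return v;
    }

    int factor() {                     // digit | ( expr )
        char c = peek();
        if (Character.isDigit(c)) { pos++; return c - '0'; }
        if (c == '(') {
            pos++;
            int v = expr();
            if (peek() != ')') throw new IllegalStateException("')' expected");
            pos++;
            return v;
        }
        throw new IllegalStateException("digit or '(' expected at " + pos);
    }

    static int eval(String s) {
        Precedence p = new Precedence(s);
        int v = p.expr();
        if (p.pos != p.in.length()) throw new IllegalStateException("trailing input");
        return v;
    }

    public static void main(String[] args) {
        System.out.println(eval("9+5*2"));   // 19
        System.out.println(eval("(9+5)*2")); // 28
    }
}
```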
An Example: Syntax of Statements
The following grammar describes a subset of Java statements:
stmt → id = expression ;
     | if ( expression ) stmt
     | if ( expression ) stmt else stmt
     | while ( expression ) stmt
     | do stmt while ( expression ) ;
     | { stmts }
stmts → stmts stmt | ε
Only the assignment and do-while productions end in a semicolon. This approach prevents the build-up of semicolons after statements such as if- and while-statements, which end with nested substatements.
Syntax-Directed Translation
Syntax-directed translation is done by attaching rules or program fragments to productions in a grammar. In this chapter we translate infix expressions into postfix notation:
Infix: 9 - 5 + 2
Postfix: 9 5 - 2 +
An Example: expr → expr1 + term
Pseudocode of the translation:
translate expr1;
translate term;
handle +;
Syntax-Directed Translation (Cont'd)
Two concepts (approaches) are related to syntax-directed translation:
- Synthesized attributes (syntax-directed definition): build up a translation by attaching strings as attributes to the nodes in the parse tree. Semantic rules compute the attribute values.
- Translation schemes (syntax-directed translation): build up a translation by program fragments, called semantic actions, that are embedded within production bodies.
Syntax-Directed Definition
A syntax-directed definition associates:
- with each grammar symbol (terminals and nonterminals), a set of attributes;
- with each production, a set of semantic rules for computing the values of the attributes associated with the symbols appearing in the production.
An attribute is said to be:
- synthesized if its value at a parse-tree node N is determined from attribute values at the children of N and at N itself;
- inherited if its value at a parse-tree node N is determined from attribute values at N itself, its parent, and its siblings in the parse tree.
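An Example: Synthesized Attributes
Suppose a node N in a parse tree is labeled by grammar symbol X. We write X.a for the value of attribute a of X at node N. A parse tree that shows the attribute values at each node is called an annotated parse tree.
Annotated parse tree for 9-5+2 (children are indented below their parent):
expr.t = "95-2+"
    expr.t = "95-"
        expr.t = "9"
            term.t = "9"
                9
        -
        term.t = "5"
            5
    +
    term.t = "2"
        2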
Semantic Rules
Production              Semantic Rules
expr → expr1 + term     expr.t = expr1.t || term.t || '+'
expr → expr1 - term     expr.t = expr1.t || term.t || '-'
expr → term             expr.t = term.t
term → 0                term.t = '0'
term → 1                term.t = '1'
…                       …
term → 9                term.t = '9'
|| is the string-concatenation operator in the semantic rules.
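The semantic rules in the table can be read as a recursive function over the tree. The following minimal Java sketch computes the synthesized attribute t for 9-5+2. The node types Term, Binary, and Single are hypothetical, one per group of productions, and Java's string + plays the role of ||. Records require Java 16 or later.

```java
// Synthesized attribute t (the postfix translation) computed from the
// semantic rules of the table; node types are illustrative only.
class SyntaxDirectedDefinition {
    interface Attributed { String t(); }

    // term -> 0 | 1 | ... | 9 : term.t is the digit itself
    record Term(char digit) implements Attributed {
        public String t() { return String.valueOf(digit); }
    }

    // expr -> expr1 + term | expr1 - term : expr.t = expr1.t || term.t || op
    record Binary(Attributed expr1, char op, Term term) implements Attributed {
        public String t() { return expr1.t() + term.t() + op; }
    }

    // expr -> term : expr.t = term.t
    record Single(Term term) implements Attributed {
        public String t() { return term.t(); }
    }

    static Attributed example() {
        Attributed nine = new Single(new Term('9'));
        Attributed nineMinusFive = new Binary(nine, '-', new Term('5'));
        return new Binary(nineMinusFive, '+', new Term('2'));
    }

    public static void main(String[] args) {
        System.out.println(example().t()); // prints 95-2+
    }
}
```

Each t() first obtains the children's attributes and then combines them, so every attribute depends only on the node and its children. That is what makes t synthesized.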
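Depth-First Traversals
Tree traversals are either breadth-first or depth-first. The depth-first orders are:
- Preorder: N L R (the node, then its left subtree, then its right subtree)
- Inorder: L N R
- Postorder: L R N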
Depth-First Traversals: Postorder, from Left to Right
procedure visit(node N) {
    for ( each child C of N, from left to right ) {
        visit(C);
    }
    evaluate semantic rules at node N;
}
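Example: Depth-First Traversals
Applying visit to the annotated parse tree for 9-5+2 evaluates the semantic rules in postorder: term.t = "9", expr.t = "9", term.t = "5", expr.t = "95-", term.t = "2", and finally expr.t = "95-2+" at the root.
Note: all attributes in this example are synthesized.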
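The visit procedure can be run as written. The following minimal Java sketch simulates "evaluate semantic rules at node N" by recording the node's label. Leaves are skipped because they carry no rules, so the recorded list is the postorder evaluation sequence given above. The Node record and the labels are hypothetical, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// visit(N): visit the children of N from left to right, then "evaluate" N.
class Traversal {
    record Node(String label, List<Node> children) {
        Node(String label, Node... kids) { this(label, List.of(kids)); }
    }

    static void visit(Node n, List<String> order) {
        for (Node c : n.children()) visit(c, order);        // children, left to right
        if (!n.children().isEmpty()) order.add(n.label());  // then node N (leaves have no rules)
    }

    static List<String> postorder(Node root) {
        List<String> order = new ArrayList<>();
        visit(root, order);
        return order;
    }

    // Annotated parse tree of 9-5+2; each label names the rule evaluated there.
    static Node example() {
        Node e9 = new Node("expr.t=9", new Node("term.t=9", new Node("9")));
        Node e95 = new Node("expr.t=95-", e9, new Node("-"),
                new Node("term.t=5", new Node("5")));
        return new Node("expr.t=95-2+", e95, new Node("+"),
                new Node("term.t=2", new Node("2")));
    }

    public static void main(String[] args) {
        // prints [term.t=9, expr.t=9, term.t=5, expr.t=95-, term.t=2, expr.t=95-2+]
        System.out.println(postorder(example()));
    }
}
```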
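Translation Schemes
A translation scheme is a CFG with semantic actions embedded in the production bodies. Example:
rest → + term { print('+') } rest
[Figure: the parse-tree node rest has the children +, term, { print('+') }, and rest. The embedded semantic action is drawn as an extra child between term and rest.]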
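An Example: Translation Scheme
expr → expr + term { print('+') }
expr → expr - term { print('-') }
expr → term
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }
[Figure: parse tree for 9-5+2 with the actions attached as extra leaves: { print('9') }, { print('5') }, and { print('2') } under the terms for 9, 5, and 2; { print('-') } as the last child of the expr for 9-5; and { print('+') } as the last child of the root.]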
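Unlike the attribute approach, a translation scheme produces output by executing the actions as the traversal reaches them. The following minimal Java sketch uses a hypothetical representation: actions are leaves holding a Runnable, and print is a hypothetical helper that appends to a buffer. A left-to-right depth-first walk of the tree for 9-5+2 then emits the postfix string.

```java
// Executes embedded semantic actions during a left-to-right depth-first walk.
class TranslationScheme {
    static final StringBuilder out = new StringBuilder();

    static void print(String s) { out.append(s); }   // hypothetical print helper

    interface Node { void walk(); }

    // A grammar symbol: walking it walks its children from left to right.
    record Sym(String label, Node... kids) implements Node {
        public void walk() { for (Node k : kids) k.walk(); }
    }

    // An embedded semantic action: walking it executes the action.
    record Action(Runnable code) implements Node {
        public void walk() { code.run(); }
    }

    // term -> d { print('d') }
    static Node term(char d) {
        return new Sym("term", new Sym(String.valueOf(d)),
                new Action(() -> print(String.valueOf(d))));
    }

    static Node example() {
        Node e9 = new Sym("expr", term('9'));                        // expr -> term
        Node e95 = new Sym("expr", e9, new Sym("-"), term('5'),
                new Action(() -> print("-")));                       // expr -> expr - term { print('-') }
        return new Sym("expr", e95, new Sym("+"), term('2'),
                new Action(() -> print("+")));                       // expr -> expr + term { print('+') }
    }

    static String translate() {
        out.setLength(0);
        example().walk();
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate()); // prints 95-2+
    }
}
```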
Parsing
Parsing is the process of determining whether a string of terminals (tokens) can be generated by a grammar.
Time complexity: for any CFG there is a parser that takes at most O(n³) time to parse a string of n terminals. However, linear-time algorithms suffice to parse essentially all languages that arise in practice.
Two kinds of methods:
- Top-down: constructs a parse tree from the root to the leaves.
- Bottom-up: constructs a parse tree from the leaves to the root.
Top-Down Parsing
Recursive-descent parsing is a top-down method of syntax analysis in which a set of recursive procedures processes the input. One procedure is associated with each nonterminal of the grammar. If a nonterminal has multiple productions, each production is implemented in a branch of a selection statement, and the branch is chosen based on the lookahead symbol.
Predictive parsing is a special form of recursive-descent parsing in which the lookahead symbol unambiguously determines the flow of control through the procedure body for each nonterminal.
An Example: Top-Down Parsing
Here expr and other are treated as terminals.
stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other
optexpr → ε | expr
[Figure: parse tree for for ( ; expr ; expr ) other. The root stmt has the children for, (, optexpr, ;, optexpr, ;, optexpr, ), and stmt. The first optexpr derives ε, the other two derive expr, and the last stmt derives other.]
Pseudocode for a predictive parser (optexpr uses the ε-production when the lookahead is not expr):
void stmt() {
    switch ( lookahead ) {
    case expr:
        match(expr); match(';'); break;
    case if:
        match(if); match('('); match(expr); match(')'); stmt(); break;
    case for:
        match(for); match('(');
        optexpr(); match(';'); optexpr(); match(';'); optexpr();
        match(')'); stmt(); break;
    case other:
        match(other); break;
    default:
        report("syntax error");
    }
}
void optexpr() {
    if ( lookahead == expr ) match(expr);
    // otherwise use optexpr → ε and match nothing
}
void match(terminal t) {
    if ( lookahead == t ) lookahead = nextTerminal;
    else report("syntax error");
}
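Example: Predictive Parsing
Input: for ( ; expr ; expr ) other
[Figure: with the lookahead for, an LL(1) predictive parser selects stmt → for ( optexpr ; optexpr ; optexpr ) stmt and executes match(for); match('('); optexpr(); match(';'); optexpr(); match(';'); optexpr(); match(')'); stmt(). The parse tree grows from the root while the lookahead advances through the input.]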
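The pseudocode becomes runnable once terminals are given a concrete form. The following minimal Java sketch assumes that tokens are space-separated words, that "$" is a hypothetical end-of-input marker, and that syntax errors are reported by throwing an exception. The class PredictiveParser and its parses helper are illustrative names. The switch uses arrow syntax, which requires Java 14 or later.

```java
import java.util.List;

// Predictive parser for
//   stmt    -> expr ; | if ( expr ) stmt | for ( optexpr ; optexpr ; optexpr ) stmt | other
//   optexpr -> epsilon | expr
// with expr and other treated as single terminals.
class PredictiveParser {
    private final List<String> tokens;
    private int pos = 0;
    private String lookahead;

    PredictiveParser(List<String> tokens) {
        this.tokens = tokens;
        this.lookahead = tokens.isEmpty() ? "$" : tokens.get(0);
    }

    private void match(String t) {
        if (!lookahead.equals(t))
            throw new IllegalStateException("syntax error: expected " + t + ", saw " + lookahead);
        pos++;
        lookahead = pos < tokens.size() ? tokens.get(pos) : "$";  // "$" marks end of input
    }

    void stmt() {
        switch (lookahead) {   // the lookahead alone selects the production
            case "expr" -> { match("expr"); match(";"); }
            case "if" -> { match("if"); match("("); match("expr"); match(")"); stmt(); }
            case "for" -> {
                match("for"); match("(");
                optexpr(); match(";"); optexpr(); match(";"); optexpr();
                match(")"); stmt();
            }
            case "other" -> match("other");
            default -> throw new IllegalStateException("syntax error at " + lookahead);
        }
    }

    // optexpr -> expr if the lookahead is expr; otherwise the epsilon-production.
    void optexpr() {
        if (lookahead.equals("expr")) match("expr");
    }

    // True if the whole input is a single stmt.
    static boolean parses(String input) {
        try {
            PredictiveParser p = new PredictiveParser(List.of(input.trim().split("\\s+")));
            p.stmt();
            return p.lookahead.equals("$");
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("for ( ; expr ; expr ) other")); // true
        System.out.println(parses("for ( expr ) other"));          // false
    }
}
```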
FIRST
FIRST(α) is the set of terminals that appear as the first symbols of one or more strings generated from α, where α is a sentential form (a string of grammar symbols).
Example, for the grammar
stmt → expr ; | if ( expr ) stmt | for ( optexpr ; optexpr ; optexpr ) stmt | other
FIRST(stmt) = { expr, if, for, other }
FIRST(expr ;) = { expr }
Examples: FIRST
type → simple | ^ id | array [ simple ] of type
simple → integer | char | num dotdot num
FIRST(simple) = { integer, char, num }
FIRST(^ id) = { ^ }
FIRST(type) = { integer, char, num, ^, array }
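The FIRST sets above can be computed by iterating until nothing changes. The following minimal Java sketch assumes a grammar without ε-productions, so only the first symbol of each body matters. Symbols that appear as map keys are treated as nonterminals, and the class FirstSets is hypothetical. Handling ε would require also looking past nullable symbols, which this sketch omits.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Fixed-point computation of FIRST for a grammar without epsilon-productions.
class FirstSets {
    static Map<String, Set<String>> first(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> first = new LinkedHashMap<>();
        for (String a : grammar.keySet()) first.put(a, new LinkedHashSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (var entry : grammar.entrySet()) {
                for (List<String> body : entry.getValue()) {
                    // No epsilon-productions: FIRST(body) = FIRST(first symbol of body).
                    changed |= first.get(entry.getKey()).addAll(firstOfSymbol(body.get(0), first));
                }
            }
        }
        return first;
    }

    // A terminal's FIRST is itself; a nonterminal's is its current set.
    static Set<String> firstOfSymbol(String x, Map<String, Set<String>> first) {
        return first.containsKey(x) ? first.get(x) : Set.of(x);
    }

    static Map<String, List<List<String>>> typeGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("type", List.of(List.of("simple"), List.of("^", "id"),
                List.of("array", "[", "simple", "]", "of", "type")));
        g.put("simple", List.of(List.of("integer"), List.of("char"),
                List.of("num", "dotdot", "num")));
        return g;
    }

    public static void main(String[] args) {
        // prints {type=[^, array, integer, char, num], simple=[integer, char, num]}
        System.out.println(first(typeGrammar()));
    }
}
```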
Designing a Predictive Parser
A predictive parser is a program consisting of a procedure for every nonterminal. The procedure for nonterminal A:
- decides which A-production to use by examining the lookahead symbol (the issues to handle are left factoring, left recursion, and ε-productions);
- mimics the body of the chosen production.
Applying a translation scheme:
- Construct a predictive parser, ignoring the actions.
- Copy the actions from the translation scheme into the parser.
Left Factoring
Left factoring is needed when two or more productions for a nonterminal A start with the same symbols. Example:
stmt → if ( expr ) stmt
     | if ( expr ) stmt else stmt
Use left factoring to fix it:
stmt → if ( expr ) stmt rest
rest → else stmt | ε
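After left factoring, the parser can match the common prefix first and choose between else and ε only afterwards. The following minimal Java sketch makes several assumptions: space-separated tokens, expr as a single terminal, and a hypothetical base statement other so that stmt can terminate. It returns a bracketed rendering of the structure. Because rest takes else whenever one is present, an else attaches to the nearest if.

```java
// Recursive-descent parser for the left-factored grammar
//   stmt -> if ( expr ) stmt rest | other   ("other" is an assumed base case)
//   rest -> else stmt | epsilon
class LeftFactoring {
    private final String[] toks;
    private int pos = 0;

    LeftFactoring(String input) { toks = input.trim().split("\\s+"); }

    private String lookahead() { return pos < toks.length ? toks[pos] : "$"; }

    private void match(String t) {
        if (!lookahead().equals(t)) throw new IllegalStateException("expected " + t);
        pos++;
    }

    String stmt() {
        if (lookahead().equals("if")) {
            match("if"); match("("); match("expr"); match(")");  // common prefix
            String then = stmt();
            return "[if " + then + rest() + "]";                 // decide else/epsilon now
        }
        match("other");
        return "other";
    }

    // rest -> else stmt when the lookahead is else; otherwise epsilon.
    String rest() {
        if (lookahead().equals("else")) {
            match("else");
            return " else " + stmt();
        }
        return "";
    }

    public static void main(String[] args) {
        // prints [if [if other else other]]
        System.out.println(new LeftFactoring("if ( expr ) if ( expr ) other else other").stmt());
    }
}
```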
Left Recursion
A production for nonterminal A is left-recursive if its body starts with A itself:
A → Aα | β
Example: expr → expr + term | term
Rewrite the left-recursive productions as right-recursive ones using the following rule, where R is a new nonterminal:
A → βR
R → αR | ε
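Example: Left and Right Recursion
[Figure: both grammars generate β α α … α. With the left-recursive productions the parse tree grows down to the left, as a chain of A nodes with β at the bottom left. With the right-recursive productions it grows down to the right: A → β R, then a chain of R nodes ending in R → ε.]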
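The rewriting rule is mechanical enough to automate for a single nonterminal. The following minimal Java sketch makes several assumptions: productions are strings of space-separated symbols, at least one body is not left-recursive (the β case), and the caller supplies the name of the new nonterminal R. It prints ε as "epsilon", and the class LeftRecursion is hypothetical. It handles several α and β alternatives, as in A → Aα | Aβ | γ.

```java
import java.util.ArrayList;
import java.util.List;

// Immediate left-recursion elimination:
//   A -> A alpha1 | ... | beta1 | ...   ==>   A -> beta1 R | ...,  R -> alpha1 R | ... | epsilon
class LeftRecursion {
    static List<String> eliminate(String a, List<String> bodies, String r) {
        List<String> alphas = new ArrayList<>();
        List<String> betas = new ArrayList<>();
        for (String body : bodies) {
            String b = body.trim();
            if (b.split("\\s+")[0].equals(a)) alphas.add(b.substring(a.length()).trim()); // A alpha
            else betas.add(b);                                                          // beta
        }
        List<String> aBodies = new ArrayList<>();
        for (String beta : betas) aBodies.add(beta + " " + r);
        List<String> rBodies = new ArrayList<>();
        for (String alpha : alphas) rBodies.add(alpha + " " + r);
        rBodies.add("epsilon");
        return List.of(a + " -> " + String.join(" | ", aBodies),
                       r + " -> " + String.join(" | ", rBodies));
    }

    public static void main(String[] args) {
        eliminate("expr", List.of("expr + term", "term"), "rest").forEach(System.out::println);
        // expr -> term rest
        // rest -> + term rest | epsilon
    }
}
```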
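Abstract and Concrete Syntax
[Figure: the abstract syntax tree for 9-5+2 has root + with children - and 2, and the - node has children 9 and 5. The concrete parse tree for the same string uses the nonterminals expr and term for its interior nodes.]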
Conclusion: Parsing and Translation Scheme
Consider the following translation scheme G, whose semantic actions translate expressions into postfix notation:
expr → expr + term { print('+') }
expr → expr - term { print('-') }
expr → term
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }
Conclusion: Parsing and Translation Scheme
Step 1: Eliminate left recursion. The technique transforms
A → Aα | Aβ | γ
into
A → γR
R → αR | βR | ε
Applying this rule to G gives:
expr → term rest
rest → + term { print('+') } rest
     | - term { print('-') } rest
     | ε
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }
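Conclusion: Parsing and Translation Scheme
An Example: Left-Recursion Elimination
[Figure: parse tree for 9-5+2 using the transformed grammar. The root expr → term rest, where term derives 9 { print('9') }. That rest → - term { print('-') } rest, with term deriving 5 { print('5') }. The next rest → + term { print('+') } rest, with term deriving 2 { print('2') }. The last rest derives ε. Executing the actions from left to right prints 95-2+.]
expr → term rest
rest → + term { print('+') } rest | - term { print('-') } rest | ε
term → 0 { print('0') } | 1 { print('1') } | … | 9 { print('9') }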
Conclusion: Parsing and Translation Scheme
Step 2: Procedures for Nonterminals
void expr() {
    term(); rest();
}
void rest() {
    if ( lookahead == '+' ) {
        match('+'); term(); print('+'); rest();
    } else if ( lookahead == '-' ) {
        match('-'); term(); print('-'); rest();
    } else {
        // rest → ε: do nothing with the input
    }
}
void term() {
    if ( lookahead is a digit ) {
        t = lookahead; match(lookahead); print(t);
    } else report("syntax error");
}
Conclusion: Parsing and Translation Scheme
Step 3: Simplifying the Translator
The recursive call at the end of rest() is a tail call, so it can be replaced by a loop. Before:
void rest() {
    if ( lookahead == '+' ) {
        match('+'); term(); print('+'); rest();
    } else if ( lookahead == '-' ) {
        match('-'); term(); print('-'); rest();
    } else { }
}
After:
void rest() {
    while ( true ) {
        if ( lookahead == '+' ) {
            match('+'); term(); print('+'); continue;
        } else if ( lookahead == '-' ) {
            match('-'); term(); print('-'); continue;
        }
        break;
    }
}
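To check that the Step 2 and Step 3 versions of rest() produce the same translation, the following minimal Java sketch runs both side by side. It reads from a String instead of System.in so that the two versions can be compared directly. It assumes input of single digits and + or - with no spaces, and the class PostfixSteps is hypothetical. Appending to a StringBuilder stands in for print.

```java
// Step 2 (recursive rest) and Step 3 (loop) on the same input.
class PostfixSteps {
    private final String in;
    private final boolean iterative;
    private final StringBuilder out = new StringBuilder();
    private int pos = 0;

    private PostfixSteps(String in, boolean iterative) { this.in = in; this.iterative = iterative; }

    private char lookahead() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    private void match(char t) {
        if (lookahead() != t) throw new IllegalStateException("syntax error");
        pos++;
    }

    private void expr() { term(); if (iterative) restLoop(); else restRecursive(); }

    private void term() {
        char t = lookahead();
        if (!Character.isDigit(t)) throw new IllegalStateException("syntax error");
        match(t);
        out.append(t);                                  // print(t)
    }

    // Step 2: rest -> + term { print('+') } rest | - term { print('-') } rest | epsilon
    private void restRecursive() {
        if (lookahead() == '+') { match('+'); term(); out.append('+'); restRecursive(); }
        else if (lookahead() == '-') { match('-'); term(); out.append('-'); restRecursive(); }
        // else: epsilon, do nothing with the input
    }

    // Step 3: the tail call replaced by a loop.
    private void restLoop() {
        while (true) {
            if (lookahead() == '+') { match('+'); term(); out.append('+'); continue; }
            else if (lookahead() == '-') { match('-'); term(); out.append('-'); continue; }
            break;
        }
    }

    static String translate(String input, boolean iterative) {
        PostfixSteps p = new PostfixSteps(input, iterative);
        p.expr();
        if (p.pos != input.length()) throw new IllegalStateException("syntax error");
        return p.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("9-5+2", false)); // 95-2+
        System.out.println(translate("9-5+2", true));  // 95-2+
    }
}
```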
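Conclusion: Parsing and Translation Scheme (Complete Program)
import java.io.*;

class Parser {
    static int lookahead;

    public Parser() throws IOException {
        lookahead = System.in.read();
    }

    // expr -> term rest, with the tail-recursive rest turned into a loop
    void expr() throws IOException {
        term();
        while ( true ) {
            if ( lookahead == '+' ) {
                match('+'); term(); System.out.write('+'); continue;
            } else if ( lookahead == '-' ) {
                match('-'); term(); System.out.write('-'); continue;
            } else return;
        }
    }

    // term -> 0 { print('0') } | ... | 9 { print('9') }
    void term() throws IOException {
        if ( Character.isDigit((char) lookahead) ) {
            System.out.write((char) lookahead); match(lookahead);
        } else throw new Error("syntax error");
    }

    void match(int t) throws IOException {
        if ( lookahead == t ) lookahead = System.in.read();
        else throw new Error("syntax error");
    }
}

// Driver added so the program runs: translates one line read from standard input.
public class Postfix {
    public static void main(String[] args) throws IOException {
        Parser parse = new Parser();
        parse.expr(); System.out.write('\n');
    }
}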