intermediate code generation manas thakur
TRANSCRIPT
CS502: Compiler Design
Intermediate Code Generation
Manas Thakur
Fall 2020
Manas Thakur CS502: Compiler Design 2
Midway through the course!
Lexical AnalyzerLexical Analyzer
Syntax AnalyzerSyntax Analyzer
Semantic AnalyzerSemantic Analyzer
Intermediate Code Generator
Intermediate Code Generator
Character stream
Token stream
Syntax tree
Syntax tree
Intermediaterepresentation
Machine-Independent Code Optimizer
Machine-Independent Code Optimizer
Code GeneratorCode Generator
Target machine code
Intermediate representation
Machine-Dependent Code Optimizer
Machine-Dependent Code Optimizer
Target machine code
SymbolTable
F r
o n
t e
n d
B a
c k
e n
d
Manas Thakur CS502: Compiler Design 3
Roles of IR Generator
● Act as a glue between front-end and back-end
– Or source and machine codes
● Lower abstraction from source level
– To make life simple
● Maintain some high-level information
– To keep life interesting
● Make the dream of m+n components for m languages and n platforms look like a possibility
– Scala to Java Bytecode, for example
● Enable machine-independent optimization
– Next phase
Manas Thakur CS502: Compiler Design 4
Intermediate Representations (IR)
● IR design affects compiler speed and capabilities
● Some important IR properties:
– Ease of generation, manipulation, optimization
– Size of the representation
– Level of abstraction: level of detail in the IR● How close is the IR to source code? To the machine?● What kinds of operations are represented?
● Often, different IRs for different jobs:
– High-level IR: close to the source language
– Low-level IR: close to the assembly code
– Some compilers even have mid-level IRs!
Manas Thakur CS502: Compiler Design 5
Kinds of IRs
● Structural
– Graph oriented
– Heavily used in IDEs, source-to-sourcetranslators
– Tend to be large
● Linear
– Pseudo-code for an abstract machine
– Level of abstraction varies
– Simple, compact data structures
● Hybrid
– Combination of graphs and linear code
Examples:ASTs, DAGs
Examples:3 address codeBytecode (Stack machine)
Examples:Control-flow graphs,Ideal IR (HotSpot C2)
Manas Thakur CS502: Compiler Design 6
Abstract Syntax Tree (AST)
● Parse tree with some intermediate nodes removed
● Advantages:
– Easy to evaluate● Postfix form: x 2 y * -● Useful for interpretation
– Source code can be reconstructed● Helpful in program understanding
x – 2 * y
-
x
2 y
*
Manas Thakur CS502: Compiler Design 7
Directed Acyclic Graph (DAG)
● AST with a unique node for each value
● Advantages:
– Compact (reduces redundancy)
– Won’t have to evaluate the same expression twice
a + a * (b – c) + (b – c) * d
++
**++
aa ** -- dd
bb ccaa --
bb cc
++
**++
** dd
aa --
bb cc
becomes
Manas Thakur CS502: Compiler Design 8
Three Address Code (3AC or TAC)● At most
– Three addresses (names/constants) in the instruction
– One operator on the right hand side of assignment
● General statement form: x = y op z
● Longer expressions are simplified by introducing temporaries
● Advantages:
– Easy to understand
– Names for intermediate values
z = x – 2 * y becomest1 = 2 * yt2 = x – t1z = t2
ort1 = 2 * yz = x – t1
Manas Thakur CS502: Compiler Design 9
More about 3AC● Allows variety of instructions:
– Assignments● x = y op z● x = op y● x = y● x = y[i] and x[i] = y● x = y.f and x.f = y
– Branches● goto L● if x goto L
– Procedure calls
● param x1; param x2; ..., param xn; call p, n
– Pointer assignments
Manas Thakur CS502: Compiler Design 10
Classwork: Generate 3AC
● r = a + a * (b – c) + (b – c) * d
● if (x < y) S1 else S2
● while (x < 10) S1
t1 = b - ct2 = t1 * dt3 = b – ct4 = a * t3t5 = t4 + t2r = a + t5t1 = x < y
if !t1 goto L1S1goto L2L1: S2L2:
L1: c = x < 10t = !cif !t goto L2S1goto L1L2:
Manas Thakur CS502: Compiler Design 11
3AC Representations
● Triples
● Quadruples
Assignment: a = b * -c + d * -e
minus c t1
* b t1 t2
minus e t3
* d t3 t4
+ t2 t4 t5
= t5 a
t1 = minus c
t2 = b * t1
t3 = minus e
t4 = d * t3
t5 = t2 + t4
a = t5
op arg1 arg2 result
minus c
* b (0)
minus e
* d (2)
+ (1) (3)
= a (4)
op arg1 arg2
0
1
2
3
4
5
Instructions can be reordered easily.
Instructions cannot be reordered easily.
Manas Thakur CS502: Compiler Design 12
3AC Representations (Cont.)
● Triples
● Quadruples
Assignment: a = b * -c + d * -e
minus c
* b (0)
minus e
* d (2)
+ (1) (3)
= a (4)
op arg1 arg2
0
1
2
3
4
5
Instructions cannot be reordered easily.
0
1
2
3
4
5
(2)
(3)
(0)
(1)
(4)
(5)
can be reordered easily
(0)
(1)
(2)
(3)
(4)
(5)
Indirect triples
t1 = minus c
t2 = b * t1
t3 = minus e
t4 = d * t3
t5 = t2 + t4
a = t5
Manas Thakur CS502: Compiler Design 13
2 Address Code
● Where have you seen them?
– Common in Assembly
● Example:
● Larger number of instructions compared to 3AC
● Good for register allocation
z = x – 2 * y
MOV R1, yMUL R1, 2MOV R2, xSUB R2, R1MOV x , R2
becomes
Manas Thakur CS502: Compiler Design 14
1 Address Code
● Stack-based computers
● Example: Java Virtual Machines!
● Advantages:
– Simple to generate and execute
– Compact form● There is a reason you find Java based systems popular in:
– Embedded systems– Mobile phones (Android)– Systems where code is transmitted (Internet)
x – 2 * y becomes
push xpush 2push ymultiplysubtract
Manas Thakur CS502: Compiler Design 15
What next?
● More IRs (while learning CGO):
– Control-Flow Graph (CFG)
– Static Single Assignment (SSA)
● Next class: IR generation
– Focus: 3AC. Why?● Comfortable and still affordable!● Offers a wide understanding of
the involved challenges.● Assignment 3 would involve
3AC generation!– But there is time for it.
CS502: Compiler Design
Intermediate Code Generation (Cont.)
Manas Thakur
Fall 2020
Manas Thakur CS502: Compiler Design 17
IR Generation
● High level language is complex
● Goal: Lower HLL code to a simpler form (3AC)
● Constructs that we need to translate:
– Variable declarations
– Expressions
– Array accesses
– Control structures (conditionals, loops)
– Function calls
– Function bodies
– Classes and objects!
● Approach: Syntax-directed translation from parse tree.
Manas Thakur CS502: Compiler Design 18
Variable declarations
● Use symbol tables
– Maps from names to values
● Take care of nested scopes
– What will you do at the entry to a new block?
– What to do at a function call?
– Function entry?
– Function exit?
– Need to push and pop the current environment.
● Fields of a structure/class?
– We will study in detail when we learn translating objects.
Manas Thakur CS502: Compiler Design 19
Lowering scheme
● Code template for each AST node
– Captures key semantics of each construct
– Has blanks for the node’s children
– Implemented in a function called gen● To fill in the template:
– Call the function gen recursively on children● Did anyone say “visitors”?
– Plug code into the blanks
● How to stitch code together?
– gen stores the results into a temporary
– Emit code that combines the results for the syntactic construct represented by the current node
Manas Thakur CS502: Compiler Design 20
Translating expressionsSay E.addr is a synthesized attribute that denotes the temporary holding the value of E.
Construct Translation
E -> E1 + E2
E.addr = newtemp();gen(E.addr ‘=’ E1.addr ‘+’ E2.addr)
Construct visit() method
E -> E1 + E2
t1 = visit(E1);t2 = visit(E2);r = newtemp();System.out.println(“r = t1 + t2”);return r;
Construct Translation
E -> E1 + E2
E.addr = newtemp();E.code = E1.code || E2.code || gen(E.addr ‘=’ E1.addr ‘+’ E2.addr)
In t
erm
s o
f o
ur
ass
ign
me
nt:
In t
erm
s o
f a n
attr
ibu
te E.c ode
:
Manas Thakur CS502: Compiler Design 21
Translating expressions (Cont.)
● symTab is the symbol table of the current scope.
Construct Translation
S -> id = E gen(symTab.get(id.lexeme) ‘=’ E.addr)
E -> -E1
E.addr = newtemp()gen(E.addr ‘=’ ‘-’E1.addr)
E -> (E1) E.addr = E1.addr
E -> id E.addr = symTab.get(id.lexeme)
Manas Thakur CS502: Compiler Design 22
Example
● 3AC for a = b + -c:
Construct Translation
S -> id = E gen(symTab.get(id.lexeme) ‘=’ E.addr)
E -> E1 + E2
E.addr = newtemp();gen(E.addr ‘=’ E1.addr ‘+’ E2.addr)
E -> -E1
E.addr = newtemp()gen(E.addr ‘=’ ‘-’E1.addr)
E -> (E1) E.addr = E1.addr
E -> id E.addr = symTab.get(id.lexeme)
t1 = - ct2 = b + t1a = t2
Manas Thakur CS502: Compiler Design 23
Translating array references
● Each type has a width (e.g., int may have 4)
● How do you get the relative address (from base) of the ith element of an array A, that is, A[i]?
– base + i * w
● What about A[i][j]?
– base + i1 * w1 + i2 * w2
● In general for a k-dimension array:
– base + i1 * w1 + i2 * w2 + ... + ik * wk
● Note: We are assuming row-major order.
Manas Thakur CS502: Compiler Design 24
Translating array references (Cont.)
● Say we have the following grammar rule for generating a possibly multidimensional array reference:
● Say we have the following attributes:
– L.addr: a temporary that holds the offset for the array reference
– L.array: pointer to the symTab entry for the array name
– L.array.base: actual location of the array reference
– L.type: type of the subarray generated by L
– t.width: width of type t
– t.elem: type of the elements of array type t
L -> L[E] | id [E]
Manas Thakur CS502: Compiler Design 25
Translating array references (Cont.)
S -> L = E
E -> L
L -> id[E]
L -> L1[E]
gen(L.array.base’[‘L.addr’]’ ’=’ E.addr)
E.addr = newtemp()gen(E.addr ‘=’ L.array.base’[’L.addr’]’)
L.array = symTab.get(id.lexeme)L.type = L.array.type.elemL.addr = newtemp()gen(L.addr ‘=’ E.addr ‘*’ L.type.width)
L.array = L1.arrayL.type = L1.type.elemt = newtemp()L.addr = newtemp()gen(t ‘=’ E.addr ‘*’ L.type.width)gen(L.addr ‘=’ L1.addr ‘+’ t)
Manas Thakur CS502: Compiler Design 26
Example● 3AC for c + a[i][j],
– where type of a is array(2, array(3, integer)),
– and width of integer is 4.
t1 = i * 12t2 = j * 4t3 = t1 + t2t4 = a[t3]t5 = c + t4
Construct Translation
S -> L = E gen(L.array.base’[‘L.addr’]’ ’=’ E.addr)
E -> LE.addr = newtemp()gen(E.addr ‘=’ L.array.base’[’L.addr’]’)
L -> id[E]L.array = symTab.get(id.lexeme)L.type = L.array.type.elem; L.addr = newtemp()gen(L.addr ‘=’ E.addr ‘*’ L.type.width)
L -> L1[E]
L.array = L1.array; L.type = L1.type.elem
t = newtemp(); L.addr = newtemp()gen(t ‘=’ E.addr ‘*’ L.type.width)gen(L.addr ‘=’ L1.addr ‘+’ t)
Section 6.4 (DB)
CS502: Compiler Design
Intermediate Code Generation (Cont.)
Manas Thakur
Fall 2020
Manas Thakur CS502: Compiler Design 28
Control flow
● By default straight line
– One statement after another
● Conditionals
– if, if-else, switch
● Loops
– while, do-while, for, repeat-until
● But we first need to consider:
– Boolean expressions
– Jumps (gotos) and labels
Manas Thakur CS502: Compiler Design 29
Translating boolean expressions
● B -> B || B | B && B | !B | E relop E | true | false
● relop -> < | <= | > | >= | == | !=
● How to optimize the evaluation of || and &&?
– Short-circuiting
– We need to keep that in mind in order to generate efficient 3AC.
– When can not doing short-circuiting affect correctness?
Manas Thakur CS502: Compiler Design 30
Translating boolean expressions (Cont.)
B -> B1 || B2
B1.true = B.trueB1.false = newlabel()B2.true = B.trueB2.false = B.falseB.code = B1.code || label(B1.false) || B2.code
B -> B1 && B2
B1.true = newlabel()B1.false = B.falseB2.true = B.trueB2.false = B.falseB.code = B1.code || label(B1.true) || B2.code
Say apart from the synthesized attributed code,each Boolean expressions has two inherited attributes true and false.
We will see how B.true and B.false are set after two slides.
Manas Thakur CS502: Compiler Design 31
Translating boolean expressions (Cont.)
B1.true = B.falseB1.false = B.trueB.code = B1.code
B.code = E1.code || E2.code || gen(‘if’ E1.addr relop E2.addr ‘goto’ B.true) || gen(‘goto’ B.false)
B.code = gen(‘goto’ B.true)
B.code = gen(‘goto’ B.false)
B -> !B1
B -> E1 relop E2
B -> true
B -> false
Manas Thakur CS502: Compiler Design 32
Translating control-flow expressions
S -> if (B) S1
S -> if (B) S1 else S2
B.true = newlabel()B.false = S1.next = S.nextS.code = B.code || label(B.true) || S1.code
B.true = newlabel()B.false = newlabel()S1.next = S2.next = S.nextS.code = B.code || label(B.true) || S1.code || gen(‘goto’ S.next) || label(B.false) || S2.code
Notice that next is another inherited attribute with each statement.
Manas Thakur CS502: Compiler Design 33
Translating control-flow expressions (Cont.)
S -> while (B) S1
S -> S1 S2
begin = newlabel()B.true = newlabel()B.false = S.nextS1.next = begin
S.code = label(begin) || B.code || label(B.true) || S1.code || gen(‘goto’ begin)
S1.next = newlabel()
S2.next = S.nextS.code = S1.code || label(S1.next) || S2.code
Manas Thakur CS502: Compiler Design 34
3AC for if (x < 100 || x > 200 && x != y) x = 0 if x < 100 goto L2
goto L3
L3: if x > 200 goto L4
goto L1
L4: if x != y goto L2
goto L1
L2: x = 0L1:
B1.true = B.trueB1.false = newlabel()B2.true = B.true; B2.false = B.falseB.code = B1.code || label(B1.false) || B2.code
B1.true = newlabel()B1.false = B.false; B2.true = B.trueB2.false = B.falseB.code = B1.code || label(B1.true) || B2.code
B.true = newlabel()B.false = S1.next = S.nextS.code = B.code || label(B.true) || S1.code
B -> B1 || B2
B.code = E1.code || E2.code || gen(‘if’ E1.addr relop E2.addr ‘goto’ B.true) || gen(‘goto’ B.false)
if (B) S1
B -> E1 relop E2
B -> B1 && B2
S1.next = newlabel()
S2.next = S.nextS.code = S1.code || label(S1.next) || S2.code
S -> S1 S2
if x < 100 goto L2
if x <= 200 goto L1
if x == y goto L1
L2: x = 0L1: another way
straightforward
Manas Thakur CS502: Compiler Design 35
Backpatching
● S -> if (B) S1 required us to pass label for evaluating B.
– We did it using inherited attributes.
● Alternatively, we could leave the label unspecified,
– and fill it in later.
● Called backpatching.
– A general concept for one-pass code generation.
– Self reading: Section 6.7 of Dragon book.
Manas Thakur CS502: Compiler Design 36
Translating break and continue
● break and continue are special (disciplined?) gotos.
● Their IR needs
– currently enclosing loop/switch.
– goto to a label just outside/before the enclosing block.
● How to generate the 3AC for break?
– either pass on the enclosing block and label as inherited attributes,
or
– use backpatching to fill-in the label of goto.
● For continue?
Manas Thakur CS502: Compiler Design 37
Translating switch statements
● Using nested if-else
● Using a table of pairs <Vi, S
i>
● Using a hash-table
– when n is large (say >10)
● Special case when Vis are
consecutive integrals
– Indexed array would be sufficient
switch (E) { case V1: S1
case V2: S2
... case Vn-1: Sn-1
default: Sn
}
Manas Thakur CS502: Compiler Design 38
Where are we?
● Learnt to generate 3AC for:
– Expressions
– Array references
– Control-flow statements
● Key learning:
– Generate code for yourself; trust the family to patch-up
● Next class (when?):
– Translating classes/structures, objects, object references.