CS502: Compiler Design
Intermediate Code Generation
Manas Thakur
Fall 2020
Midway through the course!
Phases of a compiler (the symbol table is shared by all phases):
● Front end
– Character stream → Lexical Analyzer → Token stream
– Token stream → Syntax Analyzer → Syntax tree
– Syntax tree → Semantic Analyzer → Syntax tree
– Syntax tree → Intermediate Code Generator → Intermediate representation
● Back end
– Intermediate representation → Machine-Independent Code Optimizer → Intermediate representation
– Intermediate representation → Code Generator → Target machine code
– Target machine code → Machine-Dependent Code Optimizer → Target machine code
● Symbol Table
Roles of IR Generator
● Act as a glue between front-end and back-end
– Or source and machine codes
● Lower abstraction from source level
– To make life simple
● Maintain some high-level information
– To keep life interesting
● Make the dream of m+n components for m languages and n platforms look like a possibility
– Scala to Java Bytecode, for example
● Enable machine-independent optimization
– Next phase
Intermediate Representations (IR)
● IR design affects compiler speed and capabilities
● Some important IR properties:
– Ease of generation, manipulation, optimization
– Size of the representation
– Level of abstraction: level of detail in the IR
  ● How close is the IR to source code? To the machine?
  ● What kinds of operations are represented?
● Often, different IRs for different jobs:
– High-level IR: close to the source language
– Low-level IR: close to the assembly code
– Some compilers even have mid-level IRs!
Kinds of IRs
● Structural
– Graph oriented
– Heavily used in IDEs, source-to-source translators
– Tend to be large
– Examples: ASTs, DAGs
● Linear
– Pseudo-code for an abstract machine
– Level of abstraction varies
– Simple, compact data structures
– Examples: 3 address code, Bytecode (Stack machine)
● Hybrid
– Combination of graphs and linear code
– Examples: Control-flow graphs, Ideal IR (HotSpot C2)
Abstract Syntax Tree (AST)
● Parse tree with some intermediate nodes removed
● Advantages:
– Easy to evaluate
  ● Postfix form: x 2 y * -
  ● Useful for interpretation
– Source code can be reconstructed
  ● Helpful in program understanding
● Example: the AST for x – 2 * y has root -, with children x and *; the * node has children 2 and y.
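The two advantages above (postfix form and easy evaluation) can be sketched in Python. This is only an illustration: the class names Leaf and BinOp and the helper names postfix and evaluate are assumptions made for this example and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: Union[str, int]   # a variable name or an integer constant


@dataclass
class BinOp:
    op: str
    left: object
    right: object


def postfix(node):
    """Post-order walk: operands first, then the operator."""
    if isinstance(node, Leaf):
        return [str(node.value)]
    return postfix(node.left) + postfix(node.right) + [node.op]


def evaluate(node, env):
    """Interpret the tree directly; env maps variable names to values."""
    if isinstance(node, Leaf):
        return env[node.value] if isinstance(node.value, str) else node.value
    left = evaluate(node.left, env)
    right = evaluate(node.right, env)
    if node.op == "+":
        return left + right
    if node.op == "-":
        return left - right
    if node.op == "*":
        return left * right
    raise ValueError(f"unknown operator {node.op}")


# AST for x - 2 * y
ast = BinOp("-", Leaf("x"), BinOp("*", Leaf(2), Leaf("y")))
print(" ".join(postfix(ast)))
print(evaluate(ast, {"x": 10, "y": 3}))
```

The same post-order walk yields both the postfix string and the value, which is why ASTs suit interpreters.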
Directed Acyclic Graph (DAG)
● AST with a unique node for each value
● Advantages:
– Compact (reduces redundancy)
– Won’t have to evaluate the same expression twice
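● Example: a + a * (b – c) + (b – c) * d
– As a tree: the root + has children + and *. The left + has children a and *, and that * has children a and (b – c). The right * has children (b – c) and d. The leaf a and the subtree b – c each appear twice.
– The tree becomes a DAG with the same shape but a single a node and a single b – c node, each shared by its two parents.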
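One common way to get "a unique node for each value" is to look a node up in a table before creating it. The Python sketch below assumes that approach; the DagBuilder class and its method names are hypothetical, chosen only for this example.

```python
class DagBuilder:
    """Creates a node only if an identical one does not already exist."""

    def __init__(self):
        self.nodes = []   # node id -> (op, left id, right id) or ("leaf", name, None)
        self.index = {}   # the same tuple -> node id

    def _intern(self, key):
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name):
        return self._intern(("leaf", name, None))

    def op(self, operator, left, right):
        return self._intern((operator, left, right))


# a + a * (b - c) + (b - c) * d
b = DagBuilder()
bc1 = b.op("-", b.leaf("b"), b.leaf("c"))
left = b.op("+", b.leaf("a"), b.op("*", b.leaf("a"), bc1))
bc2 = b.op("-", b.leaf("b"), b.leaf("c"))   # reuses the existing b - c node
root = b.op("+", left, b.op("*", bc2, b.leaf("d")))
print(len(b.nodes))   # the DAG's node count; the full tree has 13 nodes
```

Because children are identified by node id, two structurally equal subexpressions always map to the same id.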
Three Address Code (3AC or TAC)
● At most
– Three addresses (names/constants) in the instruction
– One operator on the right hand side of assignment
● General statement form: x = y op z
● Longer expressions are simplified by introducing temporaries
● Advantages:
– Easy to understand
– Names for intermediate values
● Example: z = x – 2 * y becomes
t1 = 2 * y
t2 = x – t1
z = t2
or
t1 = 2 * y
z = x – t1
More about 3AC
● Allows variety of instructions:
– Assignments
  ● x = y op z
  ● x = op y
  ● x = y
  ● x = y[i] and x[i] = y
  ● x = y.f and x.f = y
– Branches
  ● goto L
  ● if x goto L
– Procedure calls
  ● param x1; param x2; ...; param xn; call p, n
– Pointer assignments
Classwork: Generate 3AC
● r = a + a * (b – c) + (b – c) * d
t1 = b - c
t2 = t1 * d
t3 = b - c
t4 = a * t3
t5 = t4 + t2
r = a + t5
● if (x < y) S1 else S2
t1 = x < y
if !t1 goto L1
S1
goto L2
L1: S2
L2:
● while (x < 10) S1
L1: c = x < 10
t = !c
if t goto L2
S1
goto L1
L2:
3AC Representations
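Assignment: a = b * -c + d * -e
● 3AC:
t1 = minus c
t2 = b * t1
t3 = minus e
t4 = d * t3
t5 = t2 + t4
a = t5
● Quadruples (op, arg1, arg2, result). Instructions can be reordered easily.
op | arg1 | arg2 | result
minus | c | | t1
* | b | t1 | t2
minus | e | | t3
* | d | t3 | t4
+ | t2 | t4 | t5
= | t5 | | a
● Triples (op, arg1, arg2); a result is referred to by the position of the instruction that computes it. Instructions cannot be reordered easily.
position | op | arg1 | arg2
0 | minus | c |
1 | * | b | (0)
2 | minus | e |
3 | * | d | (2)
4 | + | (1) | (3)
5 | = | a | (4)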
3AC Representations (Cont.)
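Assignment: a = b * -c + d * -e
● Triples: instructions cannot be reordered easily.
position | op | arg1 | arg2
0 | minus | c |
1 | * | b | (0)
2 | minus | e |
3 | * | d | (2)
4 | + | (1) | (3)
5 | = | a | (4)
● Indirect triples: a separate instruction list holds pointers to the triples, so instructions can be reordered easily by permuting the list while the triples stay in place.
– Instruction list in the original order: (0) (1) (2) (3) (4) (5)
– Instruction list after one possible reordering: (2) (3) (0) (1) (4) (5)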
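The relationship between the three representations can be sketched in Python. The tuple layouts and the names to_triples and run are assumptions made for illustration; ("ref", n) stands for the slide's (n) notation.

```python
# Quadruples for a = b * -c + d * -e: (op, arg1, arg2, result)
quads = [
    ("minus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("minus", "e", None, "t3"),
    ("*", "d", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]


def to_triples(quads):
    """Drop the result field; refer to temporaries by triple position."""
    position = {}   # temporary name -> index of the triple computing it
    triples = []

    def ref(arg):
        return ("ref", position[arg]) if arg in position else arg

    for op, arg1, arg2, result in quads:
        if op == "=":
            triples.append(("=", result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples


def run(triples, order, env):
    """Execute triples in the order given by an indirect instruction list."""
    env = dict(env)
    values = {}   # triple index -> computed value

    def val(arg):
        return values[arg[1]] if isinstance(arg, tuple) else env[arg]

    for index in order:
        op, arg1, arg2 = triples[index]
        if op == "minus":
            values[index] = -val(arg1)
        elif op == "*":
            values[index] = val(arg1) * val(arg2)
        elif op == "+":
            values[index] = val(arg1) + val(arg2)
        elif op == "=":
            env[arg1] = val(arg2)
    return env


triples = to_triples(quads)
inputs = {"b": 2, "c": 3, "d": 4, "e": 5}
print(run(triples, [0, 1, 2, 3, 4, 5], inputs)["a"])
print(run(triples, [2, 3, 0, 1, 4, 5], inputs)["a"])   # reordered list, same triples
```

Reordering only permutes the list of indices; the triples and their (n) references are untouched.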
2 Address Code
● Where have you seen them?
– Common in Assembly
● Example: z = x – 2 * y becomes
MOV R1, y
MUL R1, 2
MOV R2, x
SUB R2, R1
MOV z, R2
● Larger number of instructions compared to 3AC
● Good for register allocation
1 Address Code
● Stack-based computers
● Example: Java Virtual Machines!
● Advantages:
– Simple to generate and execute
– Compact form
● There is a reason you find Java based systems popular in:
– Embedded systems
– Mobile phones (Android)
– Systems where code is transmitted (Internet)
● Example: x – 2 * y becomes
push x
push 2
push y
multiply
subtract
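To see why such code is simple to execute, here is a small Python sketch of a stack machine that runs the five instructions from the example. The function name run_stack_code and the textual instruction format are assumptions for illustration, not an actual JVM interface.

```python
def run_stack_code(code, env):
    """Run 1-address code; env maps variable names to values."""
    stack = []
    for instruction in code:
        parts = instruction.split()
        if parts[0] == "push":
            operand = parts[1]
            # Push a variable's value, or the literal integer itself.
            stack.append(env[operand] if operand in env else int(operand))
        elif parts[0] in ("multiply", "subtract"):
            right = stack.pop()   # operands come off in reverse order
            left = stack.pop()
            stack.append(left * right if parts[0] == "multiply" else left - right)
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return stack.pop()


code = ["push x", "push 2", "push y", "multiply", "subtract"]
print(run_stack_code(code, {"x": 10, "y": 3}))
```

Operators name no operands at all, which is what keeps the encoding compact.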
What next?
● More IRs (while learning CGO):
– Control-Flow Graph (CFG)
– Static Single Assignment (SSA)
● Next class: IR generation
– Focus: 3AC. Why?
  ● Comfortable and still affordable!
  ● Offers a wide understanding of the involved challenges.
  ● Assignment 3 would involve 3AC generation!
– But there is time for it.
Intermediate Code Generation (Cont.)
IR Generation
● High level language is complex
● Goal: Lower HLL code to a simpler form (3AC)
● Constructs that we need to translate:
– Variable declarations
– Expressions
– Array accesses
– Control structures (conditionals, loops)
– Function calls
– Function bodies
– Classes and objects!
● Approach: Syntax-directed translation from parse tree.
Variable declarations
● Use symbol tables
– Maps from names to values
● Take care of nested scopes
– What will you do at the entry to a new block?
– What to do at a function call?
– Function entry?
– Function exit?
– Need to push and pop the current environment.
● Fields of a structure/class?
– We will study in detail when we learn translating objects.
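Pushing and popping the current environment can be sketched as a chain of tables, one per scope. This is a minimal Python illustration; the class names SymbolTable and Environment and their methods are assumptions, not the course's API.

```python
class SymbolTable:
    """One table per scope, linked to the enclosing scope's table."""

    def __init__(self, parent=None):
        self.parent = parent
        self.entries = {}

    def put(self, name, info):
        self.entries[name] = info

    def get(self, name):
        table = self
        while table is not None:          # search innermost scope first
            if name in table.entries:
                return table.entries[name]
            table = table.parent
        raise KeyError(f"undeclared name: {name}")


class Environment:
    def __init__(self):
        self.current = SymbolTable()

    def enter_scope(self):                # at block or function entry
        self.current = SymbolTable(self.current)

    def exit_scope(self):                 # at block or function exit
        self.current = self.current.parent


env = Environment()
env.current.put("x", "int")
env.enter_scope()
env.current.put("x", "float")             # shadows the outer x
print(env.current.get("x"))
env.exit_scope()
print(env.current.get("x"))
```

A lookup that fails in the innermost table falls through to the enclosing ones, which gives the usual shadowing behaviour.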
Lowering scheme
● Code template for each AST node
– Captures key semantics of each construct
– Has blanks for the node’s children
– Implemented in a function called gen
● To fill in the template:
– Call the function gen recursively on children
  ● Did anyone say “visitors”?
– Plug code into the blanks
● How to stitch code together?
– gen stores the results into a temporary
– Emit code that combines the results for the syntactic construct represented by the current node
Translating expressions
● Say E.addr is a synthesized attribute that denotes the temporary holding the value of E.
● E -> E1 + E2
– E.addr = newtemp();
– gen(E.addr '=' E1.addr '+' E2.addr)
● In terms of our assignment, the visit() method for E -> E1 + E2:
– t1 = visit(E1);
– t2 = visit(E2);
– r = newtemp();
– System.out.println(r + " = " + t1 + " + " + t2);
– return r;
● In terms of an attribute E.code, for E -> E1 + E2:
– E.addr = newtemp();
– E.code = E1.code || E2.code || gen(E.addr '=' E1.addr '+' E2.addr)
Translating expressions (Cont.)
● symTab is the symbol table of the current scope.
Construct and its translation:
● S -> id = E
– gen(symTab.get(id.lexeme) '=' E.addr)
● E -> -E1
– E.addr = newtemp()
– gen(E.addr '=' '-' E1.addr)
● E -> (E1)
– E.addr = E1.addr
● E -> id
– E.addr = symTab.get(id.lexeme)
Example
● 3AC for a = b + -c:
– Using the rules for S -> id = E, E -> E1 + E2, E -> -E1 and E -> id:
t1 = - c
t2 = b + t1
a = t2
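The translation rules for expressions can be sketched as a recursive Python function over a small tuple-based tree. The tree encoding ("id", "neg", "paren", "+") and the Translator class are assumptions made for this example; newtemp, gen and symTab mirror the names used on the slides.

```python
class Translator:
    def __init__(self, sym_tab):
        self.sym_tab = sym_tab   # maps a lexeme to its address (here: its name)
        self.code = []
        self.temp_count = 0

    def newtemp(self):
        self.temp_count += 1
        return f"t{self.temp_count}"

    def gen(self, text):
        self.code.append(text)

    def expr(self, node):
        """Emit code for node and return E.addr."""
        kind = node[0]
        if kind == "id":                     # E -> id
            return self.sym_tab[node[1]]
        if kind == "paren":                  # E -> (E1)
            return self.expr(node[1])
        if kind == "neg":                    # E -> -E1
            operand = self.expr(node[1])
            addr = self.newtemp()
            self.gen(f"{addr} = - {operand}")
            return addr
        if kind == "+":                      # E -> E1 + E2
            left = self.expr(node[1])
            right = self.expr(node[2])
            addr = self.newtemp()
            self.gen(f"{addr} = {left} + {right}")
            return addr
        raise ValueError(f"unknown node kind: {kind}")

    def assign(self, name, node):            # S -> id = E
        addr = self.expr(node)
        self.gen(f"{self.sym_tab[name]} = {addr}")


tr = Translator({"a": "a", "b": "b", "c": "c"})
tr.assign("a", ("+", ("id", "b"), ("neg", ("id", "c"))))   # a = b + -c
print("\n".join(tr.code))
```

Children are translated before their parent's temporary is allocated, which is why the negation receives t1 and the addition t2.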
Translating array references
● Each type has a width (e.g., int may have 4)
● How do you get the relative address (from base) of the ith element of an array A, that is, A[i]?
– base + i * w
● What about A[i][j]?
– base + i1 * w1 + i2 * w2
● In general for a k-dimension array:
– base + i1 * w1 + i2 * w2 + ... + ik * wk
● Note: We are assuming row-major order.
Translating array references (Cont.)
● Say we have the following grammar rule for generating a possibly multidimensional array reference:
– L -> L[E] | id[E]
● Say we have the following attributes:
– L.addr: a temporary that holds the offset for the array reference
– L.array: pointer to the symTab entry for the array name
– L.array.base: actual location of the array reference
– L.type: type of the subarray generated by L
– t.width: width of type t
– t.elem: type of the elements of array type t
Translating array references (Cont.)
● S -> L = E
– gen(L.array.base '[' L.addr ']' '=' E.addr)
● E -> L
– E.addr = newtemp()
– gen(E.addr '=' L.array.base '[' L.addr ']')
● L -> id[E]
– L.array = symTab.get(id.lexeme)
– L.type = L.array.type.elem
– L.addr = newtemp()
– gen(L.addr '=' E.addr '*' L.type.width)
● L -> L1[E]
– L.array = L1.array
– L.type = L1.type.elem
– t = newtemp()
– L.addr = newtemp()
– gen(t '=' E.addr '*' L.type.width)
– gen(L.addr '=' L1.addr '+' t)
Example
● 3AC for c + a[i][j],
– where type of a is array(2, array(3, integer)),
– and width of integer is 4.
t1 = i * 12
t2 = j * 4
t3 = t1 + t2
t4 = a[t3]
t5 = c + t4
● Reference: Section 6.4 (Dragon book)
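The array rules can be checked with a short Python sketch that reproduces the 3AC for c + a[i][j]. The type encoding ("array", count, elem), the Emitter class and the helper names are assumptions made for illustration; the width of 4 for integer is taken from the example.

```python
INTEGER = "integer"


def array(count, elem):
    return ("array", count, elem)


def width(t):
    """Width of a type: 4 for integer, count * element width for arrays."""
    if t == INTEGER:
        return 4
    _, count, elem = t
    return count * width(elem)


class Emitter:
    def __init__(self):
        self.code = []
        self.temps = 0

    def newtemp(self):
        self.temps += 1
        return f"t{self.temps}"

    def gen(self, text):
        self.code.append(text)


def array_offset(out, sym_tab, name, indices):
    """L -> id[E], then L -> L1[E] for each further index; returns L.addr."""
    elem_type = sym_tab[name][2]              # L.type = L.array.type.elem
    addr = out.newtemp()
    out.gen(f"{addr} = {indices[0]} * {width(elem_type)}")
    for index in indices[1:]:
        elem_type = elem_type[2]              # L.type = L1.type.elem
        t = out.newtemp()
        new_addr = out.newtemp()
        out.gen(f"{t} = {index} * {width(elem_type)}")
        out.gen(f"{new_addr} = {addr} + {t}")
        addr = new_addr
    return addr


def array_read(out, sym_tab, name, indices):
    """E -> L: load the element at the computed offset into a temporary."""
    offset = array_offset(out, sym_tab, name, indices)
    addr = out.newtemp()
    out.gen(f"{addr} = {name}[{offset}]")
    return addr


out = Emitter()
sym_tab = {"a": array(2, array(3, INTEGER))}
elem = array_read(out, sym_tab, "a", ["i", "j"])
total = out.newtemp()
out.gen(f"{total} = c + {elem}")              # E -> E1 + E2
print("\n".join(out.code))
```

The factor 12 is the width of one row, array(3, integer), which is what a[i] denotes.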
Intermediate Code Generation (Cont.)
Control flow
● By default straight line
– One statement after another
● Conditionals
– if, if-else, switch
● Loops
– while, do-while, for, repeat-until
● But we first need to consider:
– Boolean expressions
– Jumps (gotos) and labels
Translating boolean expressions
● B -> B || B | B && B | !B | E relop E | true | false
● relop -> < | <= | > | >= | == | !=
● How to optimize the evaluation of || and &&?
– Short-circuiting
– We need to keep that in mind in order to generate efficient 3AC.
– When can skipping short-circuit evaluation affect correctness?
Translating boolean expressions (Cont.)
● Say apart from the synthesized attribute code, each Boolean expression has two inherited attributes true and false.
● B -> B1 || B2
– B1.true = B.true
– B1.false = newlabel()
– B2.true = B.true
– B2.false = B.false
– B.code = B1.code || label(B1.false) || B2.code
● B -> B1 && B2
– B1.true = newlabel()
– B1.false = B.false
– B2.true = B.true
– B2.false = B.false
– B.code = B1.code || label(B1.true) || B2.code
● We will see how B.true and B.false are set after two slides.
Translating boolean expressions (Cont.)
● B -> !B1
– B1.true = B.false
– B1.false = B.true
– B.code = B1.code
● B -> E1 relop E2
– B.code = E1.code || E2.code || gen('if' E1.addr relop E2.addr 'goto' B.true) || gen('goto' B.false)
● B -> true
– B.code = gen('goto' B.true)
● B -> false
– B.code = gen('goto' B.false)
Translating control-flow expressions
● S -> if (B) S1
– B.true = newlabel()
– B.false = S1.next = S.next
– S.code = B.code || label(B.true) || S1.code
● S -> if (B) S1 else S2
– B.true = newlabel()
– B.false = newlabel()
– S1.next = S2.next = S.next
– S.code = B.code || label(B.true) || S1.code || gen('goto' S.next) || label(B.false) || S2.code
● Notice that next is another inherited attribute with each statement.
Translating control-flow expressions (Cont.)
● S -> while (B) S1
– begin = newlabel()
– B.true = newlabel()
– B.false = S.next
– S1.next = begin
– S.code = label(begin) || B.code || label(B.true) || S1.code || gen('goto' begin)
● S -> S1 S2
– S1.next = newlabel()
– S2.next = S.next
– S.code = S1.code || label(S1.next) || S2.code
3AC for if (x < 100 || x > 200 && x != y) x = 0
● Straightforward, using the rules for B -> B1 || B2, B -> B1 && B2, B -> E1 relop E2, S -> if (B) S1 and S -> S1 S2:
if x < 100 goto L2
goto L3
L3: if x > 200 goto L4
goto L1
L4: if x != y goto L2
goto L1
L2: x = 0
L1:
● Another way:
if x < 100 goto L2
if x <= 200 goto L1
if x == y goto L1
L2: x = 0
L1:
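The "straightforward" version can be reproduced by passing the true and false labels down as function arguments, which plays the role of the inherited attributes. This Python sketch makes several assumptions: the tuple encoding of Boolean expressions, the class name JumpCodeGen, and the choice to emit each label on its own line rather than in front of the next instruction.

```python
class JumpCodeGen:
    def __init__(self):
        self.code = []
        self.labels = 0

    def newlabel(self):
        self.labels += 1
        return f"L{self.labels}"

    def gen(self, text):
        self.code.append(text)

    def label(self, name):
        self.code.append(f"{name}:")

    def bool_expr(self, b, true, false):
        """Emit jumping code for b; true/false are the target labels."""
        kind = b[0]
        if kind == "||":
            b1_false = self.newlabel()
            self.bool_expr(b[1], true, b1_false)
            self.label(b1_false)
            self.bool_expr(b[2], true, false)
        elif kind == "&&":
            b1_true = self.newlabel()
            self.bool_expr(b[1], b1_true, false)
            self.label(b1_true)
            self.bool_expr(b[2], true, false)
        elif kind == "!":
            self.bool_expr(b[1], false, true)    # swap the two targets
        elif kind == "relop":
            _, op, left, right = b
            self.gen(f"if {left} {op} {right} goto {true}")
            self.gen(f"goto {false}")
        elif kind == "true":
            self.gen(f"goto {true}")
        elif kind == "false":
            self.gen(f"goto {false}")
        else:
            raise ValueError(f"unknown Boolean node: {kind}")

    def if_stmt(self, cond, body, next_label):
        """S -> if (B) S1, where body is a list of already-lowered lines."""
        b_true = self.newlabel()
        self.bool_expr(cond, b_true, next_label)
        self.label(b_true)
        for line in body:
            self.gen(line)


# if (x < 100 || x > 200 && x != y) x = 0
cond = ("||",
        ("relop", "<", "x", "100"),
        ("&&", ("relop", ">", "x", "200"), ("relop", "!=", "x", "y")))
g = JumpCodeGen()
s_next = g.newlabel()            # supplied by the enclosing statement
g.if_stmt(cond, ["x = 0"], s_next)
g.label(s_next)
print("\n".join(g.code))
```

No Boolean value is ever stored in a temporary here; the outcome of the condition is encoded purely in which label control reaches.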
Backpatching
● S -> if (B) S1 required us to pass label for evaluating B.
– We did it using inherited attributes.
● Alternatively, we could leave the label unspecified,
– and fill it in later.
● Called backpatching.
– A general concept for one-pass code generation.
– Self reading: Section 6.7 of Dragon book.
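A rough Python sketch of the idea follows, assuming the usual formulation with lists of unfilled jumps (a truelist and a falselist per Boolean expression) and instruction indices as jump targets. The Backpatcher class and its method names are hypothetical, and "_" marks a jump target that is not yet known.

```python
class Backpatcher:
    def __init__(self):
        self.code = []

    def next_instr(self):
        return len(self.code)

    def emit(self, text):
        self.code.append(text)
        return len(self.code) - 1

    def backpatch(self, holes, target):
        """Fill the target of every unfinished jump listed in holes."""
        for index in holes:
            self.code[index] = self.code[index][:-1] + str(target)

    def relop(self, left, op, right):
        truelist = [self.emit(f"if {left} {op} {right} goto _")]
        falselist = [self.emit("goto _")]
        return truelist, falselist

    def or_(self, b1, b2_start, b2):
        self.backpatch(b1[1], b2_start)       # B1 false: try B2
        return b1[0] + b2[0], b2[1]

    def and_(self, b1, b2_start, b2):
        self.backpatch(b1[0], b2_start)       # B1 true: B2 must also hold
        return b2[0], b1[1] + b2[1]


# if (x < 100 || x > 200 && x != y) x = 0, generated in a single pass
bp = Backpatcher()
b1 = bp.relop("x", "<", "100")
m1 = bp.next_instr()
b2 = bp.relop("x", ">", "200")
m2 = bp.next_instr()
b3 = bp.relop("x", "!=", "y")
cond = bp.or_(b1, m1, bp.and_(b2, m2, b3))
body_start = bp.next_instr()
bp.emit("x = 0")
bp.backpatch(cond[0], body_start)             # true: run the body
bp.backpatch(cond[1], bp.next_instr())        # false: skip past it
for index, line in enumerate(bp.code):
    print(f"{index}: {line}")
```

Each jump is emitted before its destination exists and is completed as soon as the destination's index becomes known.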
Translating break and continue
● break and continue are special (disciplined?) gotos.
● Their IR needs
– currently enclosing loop/switch.
– goto to a label just outside/before the enclosing block.
● How to generate the 3AC for break?
– either pass on the enclosing block and label as inherited attributes,
– or use backpatching to fill in the label of goto.
● For continue?
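The inherited-attribute option can be sketched with a stack of (begin, after) label pairs, one per enclosing loop. This is a simplified Python illustration: the LoopGen class and the statement encoding are assumptions, and the loop condition is kept as a single string inside "if !(...)" rather than being lowered to proper 3AC.

```python
class LoopGen:
    def __init__(self):
        self.code = []
        self.labels = 0
        self.loops = []          # stack of (begin label, after label)

    def newlabel(self):
        self.labels += 1
        return f"L{self.labels}"

    def gen(self, text):
        self.code.append(text)

    def stmt(self, s):
        if s == "break":
            if not self.loops:
                raise SyntaxError("break outside a loop")
            self.gen(f"goto {self.loops[-1][1]}")   # just after the loop
        elif s == "continue":
            if not self.loops:
                raise SyntaxError("continue outside a loop")
            self.gen(f"goto {self.loops[-1][0]}")   # back to the loop test
        elif isinstance(s, tuple) and s[0] == "while":
            _, cond, body = s
            begin, after = self.newlabel(), self.newlabel()
            self.loops.append((begin, after))
            self.gen(f"{begin}:")
            self.gen(f"if !({cond}) goto {after}")  # simplification, see above
            for inner in body:
                self.stmt(inner)
            self.gen(f"goto {begin}")
            self.gen(f"{after}:")
            self.loops.pop()
        else:
            self.gen(s)          # any other statement is emitted unchanged


lg = LoopGen()
lg.stmt(("while", "a", [("while", "b", ["break"]), "continue"]))
print("\n".join(lg.code))
```

The break inside the inner loop jumps to the inner loop's exit label, while the continue in the outer body jumps to the outer loop's test.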
Translating switch statements
switch (E) {
case V1: S1
case V2: S2
...
case Vn-1: Sn-1
default: Sn
}
● Using nested if-else
● Using a table of pairs <Vi, Si>
● Using a hash-table
– when n is large (say >10)
● Special case when the Vi are consecutive integers
– Indexed array would be sufficient
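For the special case of consecutive integer case values, the indexed array can be sketched in Python as below. The function name switch_jump_table is hypothetical, and the indirect jump "goto table[t]" is an assumed instruction that does not appear in the 3AC forms listed earlier in these slides.

```python
def switch_jump_table(selector, cases, default_label):
    """cases maps consecutive integer case values to statement labels."""
    low, high = min(cases), max(cases)
    if sorted(cases) != list(range(low, high + 1)):
        raise ValueError("case values are not consecutive integers")
    code = [
        f"if {selector} < {low} goto {default_label}",   # range check
        f"if {selector} > {high} goto {default_label}",
        f"t = {selector} - {low}",                       # index into the table
        "goto table[t]",                                 # assumed indirect jump
    ]
    table = [cases[value] for value in range(low, high + 1)]
    return code, table


code, table = switch_jump_table("e", {3: "L3", 4: "L4", 5: "L5"}, "Ldefault")
print("\n".join(code))
print(table)
```

Dispatch costs two comparisons and one indexed jump regardless of how many cases there are, which is the appeal over nested if-else.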
Where are we?
● Learnt to generate 3AC for:
– Expressions
– Array references
– Control-flow statements
● Key learning:
– Generate code for yourself; trust the family to patch-up
● Next class (when?):
– Translating classes/structures, objects, object references.