Fall 2015-2016 Compiler Principles
Lecture 6: Intermediate Representation
Roman Manevich
Ben-Gurion University of the Negev
Tentative syllabus
• Front End
– Scanning
– Top-down Parsing (LL)
– Bottom-up Parsing (LR)
• Intermediate Representation
– Lowering
– Operational Semantics
• Optimizations
– Dataflow Analysis
– Loop Optimizations
• Code Generation
– Register Allocation
– Instruction Selection
Previously
• Becoming parsing ninjas
– Going from text to an Abstract Syntax Tree
From scanning to parsing
[Diagram: program text "59 + (1257 * xPosition)" → Lexical Analyzer → token stream "num + ( num * id )" → Parser → Abstract Syntax Tree (valid) or syntax error]
Grammar:
E → id
E → num
E → E + E
E → E * E
E → ( E )
Abstract Syntax Tree: +(num, *(num, x))
Agenda
• The role of intermediate representations
• Two example languages
– A high-level language
– An intermediate language
• Lowering
• Correctness
– Formal meaning of programs
Role of intermediate representation
• Bridge between front-end and back-end
• Allow implementing optimizations independent of source language and executable (target) language
[Diagram: High-level Language (scheme) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Inter. Rep. (IR) → Code Generation → Executable Code]
Motivation for intermediate representation
Intermediate representation
• A language that is between the source language and the target language
– Not specific to any source language or machine language
• Goal 1: retargeting compiler components for different source languages/target machines
[Diagram: C++, Python, Java → IR → Pentium, Java bytecode, Sparc]
Intermediate representation (continued)
• Goal 2: machine-independent optimizer
– Narrow interface: small number of node types (instructions)
[Diagram: C++, Python, Java → Lowering → IR (optimize) → Code Gen. → Pentium, Java bytecode, Sparc]
Multiple IRs
• Some optimizations require high-level structure
• Others more appropriate on low-level code
• Solution: use multiple IR stages
[Diagram: AST → HIR (optimize) → LIR (optimize) → Pentium, Java bytecode, Sparc]
Multiple IRs example
Elixir – a language for parallel graph algorithms
[Diagram labels: Elixir Program; Delta Inferencer; Automated Reasoning (Boogie+Z3), with Query/Answer; Elixir Program + delta; HIR Lowering; HIR; IL Synthesizer; Automated Planner, with Planning Problem/Plan; LIR; C++ backend; C++ code; Galois Library]
Mini-project on parallel graph algorithms
AST vs. LIR for imperative languages
AST
• Rich set of language constructs
• Rich type system
• Declarations: types (classes, interfaces), functions, variables
• Control flow statements: if-then-else, while-do, break-continue, switch, exceptions
• Data statements: assignments, array access, field access
• Expressions: variables, constants, arithmetic operators, logical operators, function calls
LIR
• An abstract machine language
• Very limited type system
• Only computation-related code
• Labels and conditional/unconditional jumps, no looping
• Data movements, generic memory access statements
• No sub-expressions, logical as numeric, temporaries, constants, function calls – explicit argument passing
Three-address code
Three-Address Code IR
• A popular form of IR
• High-level assembly where instructions have at most three operands
• There exist other types of IR
– For example, IR based on acyclic graphs – more amenable for analysis and optimizations
Chapter 8
Base language: While
Syntax
A → n | x | A ArithOp A | (A)
ArithOp → - | + | * | /
B → true | false | A = A | A ≤ A | ¬B | B ∧ B | (B)
S → x := A | skip | S; S | { S } | if B then S else S | while B S
n ∈ Num (numerals)
x ∈ Var (program variables)
Example program
while x < y {
  x := x + 1
}
y := x;
Intermediate language: IL
Syntax
V → n | x
R → V Op V
Op → - | + | * | / | = | ≤ | << | >> | …
C → l: skip | l: x := R | l: Goto l’ | l: IfZ x Goto l’ | l: IfNZ x Goto l’
IR → C+
n ∈ Num (numerals)
l ∈ Num (labels)
x ∈ Temp ∪ Var (temporaries and variables)
Intermediate language programs
• An intermediate program P has the form 1: c1 … n: cn
• We can view it as a map from labels to individual commands and write P(j) = cj
1: t0 := 137
2: y := t0 + 3
3: IfZ x Goto 7
4: t1 := y
5: z := t1
6: Goto 9
7: t2 := y
8: x := t2
9: skip
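The "map from labels to commands" view can be sketched in Python. This is only an illustration: the tuple encoding of commands, the names `run`, `value` and `OPS`, and the execution model (start at label 1, fall through to the next label, stop when control leaves the program) are assumptions made here, not definitions from the slides. A plain copy command is included because the example program above uses forms such as `t0 := 137`.

```python
import operator

# Binary operators supported by this sketch (a subset of Op).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "=": lambda a, b: int(a == b)}

def value(v, env):
    """A value V is a numeral (int) or a variable/temporary name (str)."""
    return v if isinstance(v, int) else env[v]

def run(program, env):
    """Execute from label 1 until control leaves the program (assumed model)."""
    env = dict(env)
    pc = 1
    while pc in program:
        cmd = program[pc]
        kind = cmd[0]
        nxt = pc + 1                      # default: fall through
        if kind == "copy":                # l: x := V
            _, x, v = cmd
            env[x] = value(v, env)
        elif kind == "binop":             # l: x := V1 Op V2
            _, x, v1, op, v2 = cmd
            env[x] = OPS[op](value(v1, env), value(v2, env))
        elif kind == "goto":              # l: Goto l'
            nxt = cmd[1]
        elif kind == "ifz":               # l: IfZ x Goto l'
            if env[cmd[1]] == 0:
                nxt = cmd[2]
        elif kind == "ifnz":              # l: IfNZ x Goto l'
            if env[cmd[1]] != 0:
                nxt = cmd[2]
        elif kind != "skip":
            raise ValueError(f"unknown command: {cmd!r}")
        pc = nxt
    return env

# The example program, written as a map P from labels to commands.
P = {
    1: ("copy", "t0", 137),
    2: ("binop", "y", "t0", "+", 3),
    3: ("ifz", "x", 7),
    4: ("copy", "t1", "y"),
    5: ("copy", "z", "t1"),
    6: ("goto", 9),
    7: ("copy", "t2", "y"),
    8: ("copy", "x", "t2"),
    9: ("skip",),
}
```

With this encoding, `P[3]` plays the role of P(3), and `run(P, {"x": 0})` takes the branch to label 7, while `run(P, {"x": 1})` falls through to label 4.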
Lowering
TAC generation
• At this stage in compilation, we have
– an AST
– annotated with scope information
– and annotated with type information
• To generate TAC for the program, we do recursive tree traversal
– Generate TAC for any subexpressions and substatements
– Using the result, generate TAC for the overall expression (bottom-up manner)
TAC generation for expressions
• Define a function cgen(expr) that generates TAC that computes an expression, stores it in a temporary variable, then hands back the name of that temporary
• Define cgen directly for atomic expressions (constants, this, identifiers, etc.)
• Define cgen recursively for compound expressions (binary operators, function calls, etc.)
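Translation rules for expressions
$$\mathit{cgen}(n) = (l\colon t := n,\ t) \quad \text{where } l \text{ and } t \text{ are fresh}$$
$$\mathit{cgen}(x) = (l\colon t := x,\ t) \quad \text{where } l \text{ and } t \text{ are fresh}$$
$$\frac{\mathit{cgen}(e_1) = (P_1, t_1) \qquad \mathit{cgen}(e_2) = (P_2, t_2)}{\mathit{cgen}(e_1\ \mathit{op}\ e_2) = (P_1 \cdot P_2 \cdot l\colon t := t_1\ \mathit{op}\ t_2,\ t)} \quad \text{where } l \text{ and } t \text{ are fresh}$$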
cgen for basic expressions
• Maintain a counter for temporaries in c, and a counter for labels in l
• Initially: c = 0, l = 0
cgen(k) = { // k is a constant
  c = c + 1, l = l + 1
  Emit(l: tc := k)
  Return tc
}
cgen(id) = { // id is an identifier
  c = c + 1, l = l + 1
  Emit(l: tc := id)
  Return tc
}
Naive cgen for binary expressions
cgen(e1 op e2) = {
  Let A = cgen(e1)
  Let B = cgen(e2)
  c = c + 1, l = l + 1
  Emit(l: tc := A op B)
  Return tc
}
The translation emits code to evaluate e1 before e2. Why is that?
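The cgen pseudocode for constants, identifiers and binary expressions can be written as a small runnable Python sketch. The AST encoding (an `int` for a constant, a `str` for an identifier, a tuple `(op, e1, e2)` for a binary expression) and the class name `CodeGen` are assumptions chosen for illustration; the counters `c` and `l` follow the slides.

```python
class CodeGen:
    """Naive TAC generator for expressions (illustrative sketch)."""

    def __init__(self):
        self.c = 0       # counter for temporaries
        self.l = 0       # counter for labels
        self.code = []   # emitted instructions, in order

    def fresh_temp(self):
        self.c += 1
        return f"t{self.c}"

    def emit(self, text):
        self.l += 1
        self.code.append(f"{self.l}: {text}")

    def cgen(self, e):
        # Atomic expressions: a constant (int) or an identifier (str).
        if isinstance(e, (int, str)):
            t = self.fresh_temp()
            self.emit(f"{t} := {e}")
            return t
        # Compound expression: generate code for e1, then for e2,
        # and only then allocate the temporary for the result.
        op, e1, e2 = e
        a = self.cgen(e1)
        b = self.cgen(e2)
        t = self.fresh_temp()
        self.emit(f"{t} := {a} {op} {b}")
        return t


gen = CodeGen()
result = gen.cgen(("-", ("*", "a", "b"), "d"))   # (a*b)-d
```

After this call, `gen.code` holds five instructions ending in `5: t5 := t3 - t4`, and `result` is `t5`. The generator copies every leaf into its own temporary, which is why the slides call the scheme naive.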
Example: cgen for binary expressions
Initially c = 0, l = 0. Fully expanding cgen( (a*b)-d ):
cgen( (a*b)-d ) = {
  Let A = {   // cgen(a*b)
    Let A = { c = c + 1, l = l + 1, Emit(l: tc := a), Return tc }   // here A = t1
    Let B = { c = c + 1, l = l + 1, Emit(l: tc := b), Return tc }
    c = c + 1, l = l + 1
    Emit(l: tc := A * B)
    Return tc
  }   // here A = t3
  Let B = { c = c + 1, l = l + 1, Emit(l: tc := d), Return tc }
  c = c + 1, l = l + 1
  Emit(l: tc := A - B)
  Return tc
}
Code
1: t1 := a
2: t2 := b
3: t3 := t1 * t2
4: t4 := d
5: t5 := t3 - t4
cgen for statements
• We can extend the cgen function to operate over statements as well
• Unlike cgen for expressions, cgen for statements does not return the name of a temporary holding a value
– (Why?)
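Translation rules for statements
$$\frac{\mathit{cgen}(e) = (P, t)}{\mathit{cgen}(x := e) = P \cdot l\colon x := t} \quad \text{where } l \text{ is fresh}$$
$$\mathit{cgen}(\texttt{skip}) = l\colon \texttt{skip} \quad \text{where } l \text{ is fresh}$$
$$\frac{\mathit{cgen}(b) = (P_b, t) \qquad \mathit{cgen}(S_1) = P_1 \qquad \mathit{cgen}(S_2) = P_2}{\mathit{cgen}(\texttt{if } b \texttt{ then } S_1 \texttt{ else } S_2) = \begin{array}{l} P_b \\ \texttt{IfZ } t \texttt{ Goto } l_{false} \\ P_1 \\ l_{finish}\colon \texttt{Goto } l_{after} \\ l_{false}\colon \texttt{skip} \\ P_2 \\ l_{after}\colon \texttt{skip} \end{array}} \quad \text{where } l_{finish}, l_{false}, l_{after} \text{ are fresh}$$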
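Translation rules for loops
$$\frac{\mathit{cgen}(b) = (P_b, t) \qquad \mathit{cgen}(S) = P}{\mathit{cgen}(\texttt{while } b\ S) = \begin{array}{l} l_{before}\colon \texttt{skip} \\ P_b \\ \texttt{IfZ } t \texttt{ Goto } l_{after} \\ P \\ l_{loop}\colon \texttt{Goto } l_{before} \\ l_{after}\colon \texttt{skip} \end{array}} \quad \text{where } l_{after}, l_{before}, l_{loop} \text{ are fresh}$$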
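A possible Python rendering of the statement and loop rules is sketched below. The rules use symbolic fresh labels; this sketch instead numbers commands by their position and fills in forward jump targets once they are known (backpatching), which is an implementation choice made here and not something the slides prescribe. The tuple encoding of the AST and the class name `Lowering` are likewise assumptions.

```python
class Lowering:
    """Lower While statements to labelled TAC (illustrative sketch)."""

    def __init__(self):
        self.temps = 0
        self.code = []                 # code[i] is the command labelled i + 1

    def emit(self, cmd):
        self.code.append(cmd)
        return len(self.code)          # label of the command just emitted

    def expr(self, e):
        """cgen for expressions: returns the temporary holding the value."""
        if isinstance(e, (int, str)):
            self.temps += 1
            t = f"t{self.temps}"
            self.emit(f"{t} := {e}")
            return t
        op, e1, e2 = e
        a, b = self.expr(e1), self.expr(e2)
        self.temps += 1
        t = f"t{self.temps}"
        self.emit(f"{t} := {a} {op} {b}")
        return t

    def stmt(self, s):
        """cgen for statements: emits code and returns nothing."""
        kind = s[0]
        if kind == "skip":
            self.emit("skip")
        elif kind == "assign":                     # ("assign", x, e)
            _, x, e = s
            t = self.expr(e)
            self.emit(f"{x} := {t}")
        elif kind == "seq":                        # ("seq", S1, S2)
            self.stmt(s[1])
            self.stmt(s[2])
        elif kind == "if":                         # ("if", b, S1, S2)
            _, b, s1, s2 = s
            t = self.expr(b)
            branch = self.emit(None)               # IfZ t Goto lfalse (patched below)
            self.stmt(s1)
            finish = self.emit(None)               # lfinish: Goto lafter (patched below)
            lfalse = self.emit("skip")
            self.stmt(s2)
            lafter = self.emit("skip")
            self.code[branch - 1] = f"IfZ {t} Goto {lfalse}"
            self.code[finish - 1] = f"Goto {lafter}"
        elif kind == "while":                      # ("while", b, S)
            _, b, body = s
            lbefore = self.emit("skip")
            t = self.expr(b)
            branch = self.emit(None)               # IfZ t Goto lafter (patched below)
            self.stmt(body)
            self.emit(f"Goto {lbefore}")           # lloop
            lafter = self.emit("skip")
            self.code[branch - 1] = f"IfZ {t} Goto {lafter}"
        else:
            raise ValueError(f"unknown statement: {s!r}")

    def listing(self):
        return [f"{i}: {c}" for i, c in enumerate(self.code, 1)]


# while x < y { x := x + 1 }
loop = Lowering()
loop.stmt(("while", ("<", "x", "y"), ("assign", "x", ("+", "x", 1))))
```

Because the sketch follows the rules literally, every `if` and `while` contributes its own `skip` commands as jump targets. A hand-written translation may attach a label directly to the first command of a branch instead and so come out slightly shorter.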
Translation example
1: t1 := 137
2: t2 := 3
3: t3 := t1 + t2
4: y := t3
5: t4 := x
6: t5 := 0
7: t6 := t4=t5
8: IfZ t6 Goto 12
9: t7 := y
10: z := t7
11: Goto 14
12: t8 := y
13: x := t8
14: skip
y := 137+3;
if x=0
z := y;
else
x := y;
Correctness
Compiler correctness
• Intuitively, a compiler translates programs in one language (usually high-level) to another language (usually lower-level) such that the source and target programs are equivalent
• Our goal is to formally define the meaning of this equivalence
• But first, we must define the meaning of a programming language
Formal semantics
http://www.daimi.au.dk/~bra8130/Wiley_book/wiley.html
What is formal semantics?
“Formal semantics is concerned with rigorously specifying the meaning, or behavior,
of programs, pieces of hardware, etc.”
What is formal semantics?
“This theory allows a program to be manipulated like a formula –
that is to say, its properties can be calculated.”
Gérard Huet & Philippe Flajolet, homage to Gilles Kahn
Why formal semantics?
• Implementation-independent definition of a programming language
• Automatically generating interpreters (and some day maybe full-fledged compilers)
• Optimization, verification, and debugging
– If you don’t know what it does, how do you know it’s correct/incorrect?
– How do you know whether a given optimization is correct?
Operational semantics
• Elements of the semantics
• States/configurations: the (aggregate) values that a program computes during execution
• Transition rules: how the program advances from one configuration to another
Operational semantics of While
While syntax reminder
A → n | x | A ArithOp A | (A)
ArithOp → - | + | * | /
B → true | false | A = A | A ≤ A | ¬B | B ∧ B | (B)
S → x := A | skip | S; S | { S } | if B then S else S | while B S
n ∈ Num (numerals)
x ∈ Var (program variables)
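Semantic categories
$\mathbb{Z}$: integers $\{0, 1, -1, 2, -2, \ldots\}$
$\mathbf{T}$: truth values $\{\mathit{ff}, \mathit{tt}\}$
$\mathbf{State} = \mathbf{Var} \to \mathbb{Z}$
Example state: $\sigma = [x \mapsto 5, y \mapsto 7, z \mapsto 0]$
Lookup: $\sigma(x) = 5$
Update: $\sigma[x \mapsto 6] = [x \mapsto 6, y \mapsto 7, z \mapsto 0]$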
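As a rough illustration of lookup and update, a state can be modelled as a Python dictionary from variable names to integers. The helper name `update` is hypothetical. It returns a new dictionary so that the original state is left unchanged, mirroring the fact that the update notation denotes a new state instead of modifying the old one.

```python
def update(state, x, v):
    """Return the state that maps x to v and agrees with `state` elsewhere."""
    new_state = dict(state)   # copy, so the original state is untouched
    new_state[x] = v
    return new_state


sigma = {"x": 5, "y": 7, "z": 0}      # the example state
looked_up = sigma["x"]                 # lookup of x
sigma2 = update(sigma, "x", 6)         # update of x to 6
```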
Semantics of expressions
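Semantics of arithmetic expressions
• Semantic function $\mathcal{A}[\![A]\!] : \mathbf{State} \to \mathbb{Z}$
• Defined by induction on the syntax tree
$$\mathcal{A}[\![n]\!]\sigma = n$$
$$\mathcal{A}[\![x]\!]\sigma = \sigma(x)$$
$$\mathcal{A}[\![a_1 + a_2]\!]\sigma = \mathcal{A}[\![a_1]\!]\sigma + \mathcal{A}[\![a_2]\!]\sigma$$
$$\mathcal{A}[\![a_1 - a_2]\!]\sigma = \mathcal{A}[\![a_1]\!]\sigma - \mathcal{A}[\![a_2]\!]\sigma$$
$$\mathcal{A}[\![a_1 * a_2]\!]\sigma = \mathcal{A}[\![a_1]\!]\sigma \times \mathcal{A}[\![a_2]\!]\sigma$$
$$\mathcal{A}[\![(a_1)]\!]\sigma = \mathcal{A}[\![a_1]\!]\sigma \quad \text{(not needed)}$$
$$\mathcal{A}[\![-a]\!]\sigma = 0 - \mathcal{A}[\![a]\!]\sigma$$
• Compositional
• Expressions in While are side-effect free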
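The inductive definition reads naturally as a recursive function. The sketch below assumes a hypothetical encoding of expressions (an `int` for a numeral, a `str` for a variable, a tuple `(op, a1, a2)` for a binary operation) and covers only `+`, `-` and `*`, the binary operators given equations above.

```python
def A(a, sigma):
    """Evaluate arithmetic expression `a` in state `sigma` (a dict)."""
    if isinstance(a, int):          # numeral
        return a
    if isinstance(a, str):          # variable: look it up in the state
        return sigma[a]
    op, a1, a2 = a                  # compound: evaluate both operands first
    v1, v2 = A(a1, sigma), A(a2, sigma)
    if op == "+":
        return v1 + v2
    if op == "-":
        return v1 - v2
    if op == "*":
        return v1 * v2
    raise ValueError(f"unsupported operator: {op!r}")


sigma = {"x": 5, "y": 7, "z": 0}
answer = A(("*", ("+", "x", 1), "y"), sigma)   # (x + 1) * y
```

The function is compositional in the same sense as the definition: the value of a compound expression depends only on the values of its immediate subexpressions. It never modifies `sigma`, which reflects the absence of side effects.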
Arithmetic expression exercise
Suppose $\sigma(x) = 3$
Evaluate $\mathcal{A}[\![x+1]\!]\sigma$
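Semantics of boolean expressions
• Semantic function $\mathcal{B}[\![B]\!] : \mathbf{State} \to \mathbf{T}$
• Defined by induction on the syntax tree
$$\mathcal{B}[\![\texttt{true}]\!]\sigma = \mathit{tt}$$
$$\mathcal{B}[\![\texttt{false}]\!]\sigma = \mathit{ff}$$
$$\mathcal{B}[\![a_1 = a_2]\!]\sigma = \begin{cases} \mathit{tt} & \text{if } \mathcal{A}[\![a_1]\!]\sigma = \mathcal{A}[\![a_2]\!]\sigma \\ \mathit{ff} & \text{otherwise} \end{cases}$$
$$\mathcal{B}[\![a_1 \le a_2]\!]\sigma = \begin{cases} \mathit{tt} & \text{if } \mathcal{A}[\![a_1]\!]\sigma \le \mathcal{A}[\![a_2]\!]\sigma \\ \mathit{ff} & \text{otherwise} \end{cases}$$
$$\mathcal{B}[\![b_1 \wedge b_2]\!]\sigma = \begin{cases} \mathit{tt} & \text{if } \mathcal{B}[\![b_1]\!]\sigma = \mathit{tt} \text{ and } \mathcal{B}[\![b_2]\!]\sigma = \mathit{tt} \\ \mathit{ff} & \text{otherwise} \end{cases}$$
$$\mathcal{B}[\![\neg b]\!]\sigma = \begin{cases} \mathit{tt} & \text{if } \mathcal{B}[\![b]\!]\sigma = \mathit{ff} \\ \mathit{ff} & \text{otherwise} \end{cases}$$
• Compositional
• Expressions in While are side-effect free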
Natural operational semantics
• Developed by Gilles Kahn [STACS 1987]
• Configurations
– $\langle S, \sigma \rangle$: statement $S$ is about to execute on state $\sigma$
– $\sigma$: terminal (final) state
• Transitions
– $\langle S, \sigma \rangle \to \sigma'$: execution of $S$ from $\sigma$ will terminate with the result state $\sigma'$
– Ignores non-terminating computations
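Natural operational semantics
• Defined by rules of the form
$$\frac{\langle S_1, \sigma_1 \rangle \to \sigma_1', \ \ldots, \ \langle S_n, \sigma_n \rangle \to \sigma_n'}{\langle S, \sigma \rangle \to \sigma'} \quad \text{if } \ldots$$
– The transitions above the line are the premises, the transition below the line is the conclusion, and "if …" is the side condition
• The meaning of compound statements is defined using the meaning of their immediate constituent statements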
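Natural semantics for While
$$[\mathrm{ass}_{ns}] \quad \langle x := a, \sigma \rangle \to \sigma[x \mapsto \mathcal{A}[\![a]\!]\sigma]$$
$$[\mathrm{skip}_{ns}] \quad \langle \texttt{skip}, \sigma \rangle \to \sigma$$
$$[\mathrm{comp}_{ns}] \quad \frac{\langle S_1, \sigma \rangle \to \sigma', \quad \langle S_2, \sigma' \rangle \to \sigma''}{\langle S_1; S_2, \sigma \rangle \to \sigma''}$$
$$[\mathrm{if}^{tt}_{ns}] \quad \frac{\langle S_1, \sigma \rangle \to \sigma'}{\langle \texttt{if } b \texttt{ then } S_1 \texttt{ else } S_2, \sigma \rangle \to \sigma'} \quad \text{if } \mathcal{B}[\![b]\!]\sigma = \mathit{tt}$$
$$[\mathrm{if}^{ff}_{ns}] \quad \frac{\langle S_2, \sigma \rangle \to \sigma'}{\langle \texttt{if } b \texttt{ then } S_1 \texttt{ else } S_2, \sigma \rangle \to \sigma'} \quad \text{if } \mathcal{B}[\![b]\!]\sigma = \mathit{ff}$$
The rules $[\mathrm{ass}_{ns}]$ and $[\mathrm{skip}_{ns}]$ have no premises: they are axioms.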
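Natural semantics for While (loops)
$$[\mathrm{while}^{tt}_{ns}] \quad \frac{\langle S, \sigma \rangle \to \sigma', \quad \langle \texttt{while } b\ S, \sigma' \rangle \to \sigma''}{\langle \texttt{while } b\ S, \sigma \rangle \to \sigma''} \quad \text{if } \mathcal{B}[\![b]\!]\sigma = \mathit{tt}$$
$$[\mathrm{while}^{ff}_{ns}] \quad \langle \texttt{while } b\ S, \sigma \rangle \to \sigma \quad \text{if } \mathcal{B}[\![b]\!]\sigma = \mathit{ff}$$
Non-compositional: the premise of $[\mathrm{while}^{tt}_{ns}]$ refers to the whole statement $\texttt{while } b\ S$, not only to its constituent $S$.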
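The natural-semantics rules can be read as a recursive interpreter, one case per rule. The sketch below is an assumption-laden illustration, not the lecture's definition: statements and expressions are encoded as tuples, states are dictionaries, comparison is taken to be `<=`, and the names `A`, `B` and `run` are chosen here.

```python
def A(a, s):
    """Arithmetic expressions: int numeral, str variable, or (op, a1, a2)."""
    if isinstance(a, int):
        return a
    if isinstance(a, str):
        return s[a]
    op, a1, a2 = a
    v1, v2 = A(a1, s), A(a2, s)
    return {"+": v1 + v2, "-": v1 - v2, "*": v1 * v2}[op]


def B(b, s):
    """Boolean expressions: bool, ("=", a1, a2), ("<=", a1, a2),
    ("not", b1) or ("and", b1, b2)."""
    if isinstance(b, bool):
        return b
    if b[0] == "=":
        return A(b[1], s) == A(b[2], s)
    if b[0] == "<=":
        return A(b[1], s) <= A(b[2], s)
    if b[0] == "not":
        return not B(b[1], s)
    if b[0] == "and":
        return B(b[1], s) and B(b[2], s)
    raise ValueError(f"unknown boolean expression: {b!r}")


def run(S, s):
    """Return the final state reached by executing S from state s."""
    kind = S[0]
    if kind == "assign":                 # [ass]: update x with the value of a
        _, x, a = S
        new_s = dict(s)
        new_s[x] = A(a, s)
        return new_s
    if kind == "skip":                   # [skip]: state unchanged
        return s
    if kind == "seq":                    # [comp]: run S1, then S2 on the result
        return run(S[2], run(S[1], s))
    if kind == "if":                     # [if-tt] / [if-ff]
        _, b, S1, S2 = S
        return run(S1, s) if B(b, s) else run(S2, s)
    if kind == "while":                  # [while-tt] / [while-ff]
        _, b, body = S
        if B(b, s):
            return run(S, run(body, s))  # premise mentions the whole loop again
        return s
    raise ValueError(f"unknown statement: {S!r}")


# while x <= y - 1 { x := x + 1 }; y := x
prog = ("seq",
        ("while", ("<=", "x", ("-", "y", 1)), ("assign", "x", ("+", "x", 1))),
        ("assign", "y", "x"))
final = run(prog, {"x": 0, "y": 3})
```

The `while` case calls `run` on the same statement `S`, which is the non-compositional premise in executable form. A loop that never terminates has no derivation in the semantics; in this sketch it shows up as unbounded recursion instead of a result.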
Next lecture: Correctness of lowering