intermediate representations saumya debray dept. of computer science the university of arizona...

28
Intermediate Representations Saumya Debray Dept. of Computer Science The University of Arizona Tucson, AZ 85721

Upload: jorden-oxford

Post on 22-Dec-2015

224 views

Category:

Documents


3 download

TRANSCRIPT

Intermediate Representations

Saumya DebrayDept. of Computer ScienceThe University of Arizona

Tucson, AZ 85721

The Role of Intermediate Code

Intermediate Code 2

lexical analysissyntax analysisstatic checking

intermediate code

generation

final code generation

source code

finalcode

tokens intermediatecode

Intermediate Code 3

Why Intermediate Code?

• Closer to target language. – simplifies code generation.

• Machine-independent.– simplifies retargeting of the compiler.– Allows a variety of optimizations to be implemented in

a machine-independent way.

• Many compilers use several different intermediate representations.

Intermediate Code 4

Different Kinds of IRs

• Graphical IRs: the program structure is represented as a graph (or tree) structure.Example: parse trees, syntax trees, DAGs.

• Linear IRs: the program is represented as a list of instructions for some virtual machine.Example: three-address code.

• Hybrid IRs: combines elements of graphical and linear IRs.Example: control flow graphs with 3-address code.

Intermediate Code 5

Graphical IRs 1: Parse Trees

• A parse tree is a tree representation of a derivation during parsing.

• Constructing a parse tree:– The root is the start symbol S of the grammar.– Given a parse tree for X , if the next derivation step is

X 1…n then the parse tree is obtained as:

Intermediate Code 6

Graphical IRs 2: Abstract Syntax Trees (AST)

A syntax tree shows the structure of a program by abstracting away irrelevant details from a parse tree.– Each node represents a computation to be performed;– The children of the node represents what that

computation is performed on.

Intermediate Code 7

Abstract Syntax Trees: Example

Grammar :E E + T | T

T T * F | F

F ( E ) | id

Input: id + id * id

Parse tree:

Syntax tree:

Intermediate Code 8

Syntax Trees: Structure

• Expressions:– leaves: identifiers or constants;– internal nodes are labeled with

operators;– the children of a node are its operands.

• Statements:– a node’s label indicates what kind of

statement it is;– the children correspond to the

components of the statement.

Intermediate Code 9

Graphical IRs 3: Directed Acyclic Graphs (DAGs)A DAG is a contraction of an AST that avoids

duplication of nodes.• reduces compiler memory requirements;• exposes redundancies.

E.g.: for the expression (x+y)*(x+y), we have:

AST: DAG:

Intermediate Code 10

Linear IRs

• A linear IR consists of a sequence of instructions that execute in order.– “machine-independent assembly code”

• Instructions may contain multiple operations, which (if present) execute in parallel.

• They often form a starting point for hybrid representations (e.g., control flow graphs).

Intermediate Code 11

Linear IR 1: Three Address Code

• Instructions are of the form ‘x = y op z,’ where x, y, z are variables, constants, or “temporaries”.

• At most one operator allowed on RHS, so no ‘built-up” expressions.Instead, expressions are computed using temporaries

(compiler-generated variables).

• The specific set of operators represented, and their level of abstraction, can vary widely.

Intermediate Code 12

Three Address Code: Example

• Source: if ( x + y*z > x*y + z) a = 0;

• Three Address Code:t1 = y*zt2 = x+t1 // x + y*zt3 = x*yt4 = t3+z // x*y + zif (t2 t4) goto La = 0

L:

Intermediate Code 13

An Example Intermediate Instruction Set• Assignment:

– x = y op z (op binary)– x = op y (op unary); – x = y

• Jumps:– if ( x op y ) goto L (L a label); – goto L

• Pointer and indexed assignments:– x = y[ z ]– y[ z ] = x– x = &y– x = *y– *y = x.

• Procedure call/return:– param x, k (x is the kth

param)– retval x– call p– enter p– leave p– return– retrieve x

• Type Conversion:– x = cvt_A_to_B y (A, B base

types) e.g.: cvt_int_to_float• Miscellaneous

– label L

Intermediate Code 14

Three Address Code: Representation• Each instruction represented as a structure called a

quadruple (or “quad”):– contains info about the operation, up to 3 operands.– for operands: use a bit to indicate whether constant or Symbol

Table pointer.

E.g.: x = y + z if ( x y ) goto L

Intermediate Code 15

Linear IRs 2: Stack Machine Code

• Sometimes called “One-address code.”• Assumes the presence of an operand stack.

– Most operations take (pop) their operands from the stack and push the result on the stack.

• Example: code for “x*y + z”

Stack machine code push x

push y

mult

push z

add

Three Address Code tmp1 = x

tmp2 = y

tmp3 = tmp1 * tmp2

tmp4 = z

tmp5 = tmp3 + tmp4

Intermediate Code 16

Stack Machine Code: Features

• Compact– the stack creates an implicit name space, so many

operands don’t have to be named explicitly in instructions.

– this shrinks the size of the IR.

• Necessitates new operations for manipulating the stack, e.g., “swap top two values”, “duplicate value on top.”

• Simple to generate and execute. • Interpreted stack machine codes easy to port.

Intermediate Code 17

Linear IRs 3: Register Transfer Language (GNU RTL)

• Inspired by (and has syntax resembling) Lisp lists.

• Expressions are not “flattened” as in three-address code, but may be nested.– gives them a tree structure.

• Incorporates a variety of machine-level information.

Intermediate Code 18

RTLs (cont’d)

Low-level information associated with an RTL expression include:

• “machine modes” – gives the size of a data object;

• information about access to registers and memory;

• information relating to instruction scheduling and delay slots;

• whether a memory reference is “volatile.”

Intermediate Code 19

RTLs: Examples

Example operations:– (plus:m x y), (minus:m x y), (compare:m x y), etc.,

where m is a machine mode.

– (cond [test1 value1 test2 value2 …] default)

– (set lval x) (assigns x to the place denoted by lval).

– (call func argsz), (return)

– (parallel [x0 x1 …]) (simultaneous side effects).

– (sequence [ins1 ins2 … ])

Intermediate Code 20

RTL Examples (cont’d)

• A call to a function at address a passing n bytes of arguments, where the return value is in a (“hard”) register r:

(set (reg:m r) (call (mem:fm a) n))

– here m and fm are machine modes.

• A division operation where the result is truncated to a smaller size:

(truncate:m1 (div:m2 x (sign_extend:m2 y)))

Intermediate Code 21

Hybrid IRs

• Combine features of graphical and linear IRs:– linear IR aspects capture a lower-level program

representation;– graphical IR aspects make control flow behavior

explicit.

• Examples:– control flow graphs– static single assignment form (SSA).

Intermediate Code 22

Hybrid IRs 1: Control Flow Graphs

Example: L1: if x > y goto L0 t1 = x+1 x = t1 L0: y = 0 goto L1

Definition: A control flow graph for a function is a directed graph G = (V, E) such that:– each v V is a straight-line code sequence (“basic block”); and– there is an edge a b E iff control can go directly from a to b.

Intermediate Code 23

Basic Blocks

• Definition: A basic block B is a sequence of consecutive instructions such that:1. control enters B only at its beginning; and

2. control leaves B only at its end (under normal execution); and

• This implies that if any instruction in a basic block B is executed, then all instructions in B are executed.Þ for program analysis purposes, we can treat a

basic block as a single entity.

Intermediate Code 24

Identifying Basic Blocks

1. Determine the set of leaders, i.e., the first instruction of each basic block:

– the entry point of the function is a leader;– any instruction that is the target of a branch is a

leader;– any instruction following a (conditional or

unconditional) branch is a leader.

2. For each leader, its basic block consists of:– the leader itself;– all subsequent instructions upto, but not including,

the next leader.

Intermediate Code 25

Example

int dotprod(int a[], int b[], int N)

{

int i, prod = 0;

for (i = 1; i N; i++) {

prod += a[i]b[i];

}

return prod;

}

No. Instruction leader? Block No.

1 enter dotprod Y 1

2 prod = 0 1

3 i = 1 1

4 t1 = 4*i Y 2

5 t2 = a[t1] 2

6 t3 = 4*i 2

7 t4 = b[t3] 2

8 t5 = t2*t4 2

9 t6 = prod+t5 2

10 prod = t6 2

11 t7 = i+i 2

12 i = t7 2

13 if i N goto 4 2

14 retval prod Y 3

15 leave dotprod 3

16 return 3

Intermediate Code 26

Hybrid IRs 2: Static Single Assignment Form

• The Static Single Assignment (SSA) form of a program makes information about variable definitions and uses explicit.– This can simplify program analysis.

• A program is in SSA form if it satisfies:– each definition has a distinct name; and– each use refers to a single definition.

• To make this work, the compiler inserts special operations, called -functions, at points where control flow paths join.

Intermediate Code 27

SSA Form: - Functions

• A -function behaves as follows:

x1 = … x2 = …

x3 = (x1, x2)

This assigns to x3 the value of x1, if control comes from the left, and that of x2 if control comes from the right.

• On entry to a basic block, all the -functions in the block execute (conceptually) in parallel.

Intermediate Code 28

SSA Form: Example

Example:

Original code Code in SSA form