basic block and trace


Upload: kaveri

Post on 18-Mar-2016


TRANSCRIPT

Page 1: Basic  Block and Trace

Basic Block and Trace

Chapter 8

Page 2: Basic  Block and Trace

Tree IR vs. Machine Languages

(1) Semantic gap

(2) IR is not proper for optimization analysis

E.g.:

- Some expressions have side effects: ESEQ, CALL. The tree representation assumes no execution order.

- Semantic gap: CJUMP has 2 targets; a machine "jump on condition" has 1 target plus "fall through".

Page 3: Basic  Block and Trace

Semantic Gap Continued

- An ESEQ within an expression is inconvenient: evaluation order matters.

- A CALL node within an expression causes side effects!

- A CALL node within the argument expression of another CALL node causes problems if the arguments or results are passed in the same (one) register.

- Rewrite the tree into an equivalent tree (canonical form):

[Figure: a nested tree of SEQ nodes over the statements S1 ... S5]

=> S1; S2; S3; S4; S5

Page 4: Basic  Block and Trace

Transformation

Step 1: A tree is rewritten into a list of "canonical trees" without SEQ or ESEQ nodes.

-> Tree.StmList linearize(Tree.Stm s);

Step 2: The list is grouped into a set of "basic blocks", which contain no internal jumps or labels.

-> BasicBlocks

Step 3: The basic blocks are ordered into a set of "traces" in which every CJUMP is immediately followed by its false label.

-> TraceSchedule(BasicBlocks b)

Page 5: Basic  Block and Trace

8.1 Canonical Trees

Def: canonical trees have the following properties:

1. No SEQ or ESEQ.
2. The parent of each CALL is either EXP(...) or MOVE(TEMP t, ...).

=> Separate SEQs and EXPressions

Page 6: Basic  Block and Trace

Transformations on ESEQ: move the ESEQ to a higher level.

Case 1:

ESEQ(S1, ESEQ(S2, e))  =>  ESEQ(SEQ(S1, S2), e)

Case 2:

BINOP(op, ESEQ(S, e1), e2)  =>  ESEQ(S, BINOP(op, e1, e2))

MEM(ESEQ(S, e1))  =>  ESEQ(S, MEM(e1))

JUMP(ESEQ(S, e1))  =>  SEQ(S, JUMP(e1))

CJUMP(op, ESEQ(S, e1), e2, l1, l2)  =>  SEQ(S, CJUMP(op, e1, e2, l1, l2))

Page 7: Basic  Block and Trace

Case 3:

BINOP(op, e1, ESEQ(S, e2))  =>  ESEQ(MOVE(TEMP t, e1), ESEQ(S, BINOP(op, TEMP t, e2)))

CJUMP(op, e1, ESEQ(S, e2), l1, l2)  =>  SEQ(MOVE(TEMP t, e1), SEQ(S, CJUMP(op, TEMP t, e2, l1, l2)))

Page 8: Basic  Block and Trace

Case 4: When S does not affect e1 in Case 3 (and neither S nor e1 performs I/O), that is, when S and e1 commute:

BINOP(op, e1, ESEQ(S, e2))  =>  ESEQ(S, BINOP(op, e1, e2))    if S, e1 commute

CJUMP(op, e1, ESEQ(S, e2), l1, l2)  =>  SEQ(S, CJUMP(op, e1, e2, l1, l2))    if S, e1 commute

Page 9: Basic  Block and Trace

How can we tell if two expressions commute?

MOVE(MEM(x), y) and MEM(z):
- x = z (aliased): they do not commute
- x ≠ z: they commute

?? At compile time we don't know yet whether x = z!

CONST(n) can commute with any statement!!

=> Be conservative!!
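The conservative test described above can be sketched in Java. This is a minimal illustration, not the book's code: the nested classes (`Exp`, `Const`, `Name`, `Temp`, `Mem`, `ExpStm`, `Move`) are hypothetical stand-ins for Tree IR nodes, and the rule "constants and label names commute with anything, an empty statement commutes with anything, everything else is assumed not to commute" is the assumed policy.

```java
final class CommuteSketch {
    // Hypothetical stand-ins for Tree IR expression nodes.
    static abstract class Exp {}
    static final class Const extends Exp { final int value; Const(int v) { value = v; } }
    static final class Name extends Exp { final String label; Name(String l) { label = l; } }
    static final class Temp extends Exp { final String name; Temp(String n) { name = n; } }
    static final class Mem extends Exp { final Exp addr; Mem(Exp a) { addr = a; } }

    // Hypothetical stand-ins for Tree IR statement nodes.
    static abstract class Stm {}
    static final class ExpStm extends Stm { final Exp exp; ExpStm(Exp e) { exp = e; } }
    static final class Move extends Stm {
        final Exp dst, src;
        Move(Exp d, Exp s) { dst = d; src = s; }
    }

    // A statement that evaluates a constant and discards it has no effect.
    static boolean isNop(Stm s) {
        return s instanceof ExpStm && ((ExpStm) s).exp instanceof Const;
    }

    // Conservative: answer true only when reordering is certainly safe.
    static boolean commute(Stm s, Exp e) {
        return isNop(s) || e instanceof Const || e instanceof Name;
    }

    public static void main(String[] args) {
        Stm store = new Move(new Mem(new Temp("x")), new Temp("y"));
        System.out.println(commute(store, new Const(7)));           // true
        System.out.println(commute(store, new Mem(new Temp("z")))); // false: x and z may alias
    }
}
```

Answering false too often only costs an extra MOVE into a temporary; answering true wrongly would change the program's meaning, which is why the test errs on the side of false.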

Page 10: Basic  Block and Trace

General Rewriting Rules

1. Identify the subexpressions.
2. Pull the ESEQs out of the stm or exp.

Ex: [e1, e2, ESEQ(s, e3)]

-> (s; [e1, e2, e3])    if s commutes with both e1 and e2

-> (SEQ(MOVE(t1, e1), SEQ(MOVE(t2, e2), s)); [TEMP(t1), TEMP(t2), e3])    if s commutes with neither

-> (SEQ(MOVE(t1, e1), s); [TEMP(t1), e2, e3])    if s commutes with e2 but not with e1

reorder(ExpList exps) => (stms; ExpList)

Page 11: Basic  Block and Trace

MOVING CALLS TO TOP LEVEL

Every CALL returns its result in the same register, TEMP(RV). In BINOP(+, CALL(...), CALL(...)) the second call will overwrite TEMP(RV) before the first result is used.

Solution: CALL(fun, args) -> ESEQ(MOVE(TEMP t, CALL(fun, args)), TEMP t)

Then eliminate the ESEQ. => needs an extra TEMP t (registers)

do_stm(MOVE(TEMP tnew, CALL(f, args))) and do_stm(EXP(CALL(f, args))):

- will not reorder on the CALL node
- will reorder on f and args as the children of the MOVE

Page 12: Basic  Block and Trace

A LINEAR LIST OF STATEMENTS

S0 -> [S0'; ]

SEQ(SEQ(a, b), c)  =>  SEQ(a, SEQ(b, c))

SEQ(SEQ(SEQ(...)))  =>  a; b; c

linear(Stm s)
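Flattening nested SEQ nodes into a statement list can be sketched as follows. This is a minimal illustration under assumptions: `Stm`, `Seq`, and `Leaf` are hypothetical stand-ins for Tree IR statements, and the sketch flattens by an in-order walk rather than by repeatedly applying the rotation rule, which yields the same list.

```java
import java.util.ArrayList;
import java.util.List;

final class LinearizeSketch {
    // Hypothetical statement nodes: a SEQ of two statements, or a named leaf.
    static abstract class Stm {}
    static final class Seq extends Stm {
        final Stm left, right;
        Seq(Stm left, Stm right) { this.left = left; this.right = right; }
    }
    static final class Leaf extends Stm {
        final String name;
        Leaf(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    // Append the non-SEQ statements of s to out in execution (left-to-right) order.
    static void linear(Stm s, List<Stm> out) {
        if (s instanceof Seq) {
            linear(((Seq) s).left, out);
            linear(((Seq) s).right, out);
        } else {
            out.add(s);
        }
    }

    static List<Stm> linearize(Stm s) {
        List<Stm> out = new ArrayList<>();
        linear(s, out);
        return out;
    }

    public static void main(String[] args) {
        Stm a = new Leaf("a"), b = new Leaf("b"), c = new Leaf("c");
        System.out.println(linearize(new Seq(new Seq(a, b), c))); // [a, b, c]
        System.out.println(linearize(new Seq(a, new Seq(b, c)))); // [a, b, c]
    }
}
```

Both tree shapes produce the same list, which is the point of the rule SEQ(SEQ(a, b), c) => SEQ(a, SEQ(b, c)): only the order of the leaves matters.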

Page 13: Basic  Block and Trace

8.2 TAMING CONDITIONAL BRANCHES

BASIC BLOCK: a sequence of statements entered at the beginning and exited at the end.

- The first statement is a LABEL.
- The last statement is a JUMP or a CJUMP.
- There are no other LABELs, JUMPs, or CJUMPs.

[Figure: a CJUMP on a condition with true (T) and false (F) targets, and a basic block running from its LABEL to its final JUMP]

Page 14: Basic  Block and Trace

Algorithm

Scan from beginning to end:

- When a LABEL is found, begin a new block.
- When a JUMP or CJUMP is found, the block is ended.
- If a block ends without a JUMP or CJUMP, insert a JUMP to the label that starts the next block (JUMP L; LABEL L).

Epilogue block of the function: label it DONE and put JUMP DONE at the end of the body of the function.

-> Canon.BasicBlocks
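The scan above can be sketched in Java. This is an illustration under assumptions, not Canon.BasicBlocks itself: statements are modeled by a hypothetical `Stm` class carrying only a kind and some text, the epilogue label is the string DONE, and a block that would start without a LABEL is given an invented one (the `_bb` prefix is an arbitrary naming choice for this sketch).

```java
import java.util.ArrayList;
import java.util.List;

final class BasicBlockSketch {
    enum Kind { LABEL, JUMP, CJUMP, OTHER }

    // Hypothetical statement: text holds a label name, jump target(s), or opaque statement text.
    static final class Stm {
        final Kind kind;
        final String text;
        Stm(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + " " + text; }
    }

    private static int fresh = 0; // counter for invented labels

    static List<List<Stm>> split(List<Stm> stms) {
        List<List<Stm>> blocks = new ArrayList<>();
        List<Stm> cur = null; // block under construction, or null between blocks
        for (Stm s : stms) {
            if (s.kind == Kind.LABEL) {
                if (cur != null) {
                    // The previous block fell through: close it with a JUMP to this label.
                    cur.add(new Stm(Kind.JUMP, s.text));
                    blocks.add(cur);
                }
                cur = new ArrayList<>();
                cur.add(s);
                continue;
            }
            if (cur == null) {
                // A block is starting without a LABEL: invent one.
                cur = new ArrayList<>();
                cur.add(new Stm(Kind.LABEL, "_bb" + fresh++));
            }
            cur.add(s);
            if (s.kind == Kind.JUMP || s.kind == Kind.CJUMP) {
                blocks.add(cur); // a jump ends the block
                cur = null;
            }
        }
        if (cur != null) {
            // End of the function body: jump to the epilogue.
            cur.add(new Stm(Kind.JUMP, "DONE"));
            blocks.add(cur);
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<Stm> body = new ArrayList<>();
        body.add(new Stm(Kind.LABEL, "a"));
        body.add(new Stm(Kind.OTHER, "x"));
        body.add(new Stm(Kind.CJUMP, "t f"));
        body.add(new Stm(Kind.OTHER, "y"));
        body.add(new Stm(Kind.LABEL, "f"));
        body.add(new Stm(Kind.OTHER, "z"));
        for (List<Stm> b : split(body)) System.out.println(b);
    }
}
```

On this input the scan yields three blocks: the first ends at the CJUMP, the second gets an invented label and a JUMP f, and the third ends with JUMP DONE.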

Page 15: Basic  Block and Trace

Traces

Trace: a sequence of statements that could be consecutively executed during the execution of the program.

We want a set of traces that exactly covers the program: each block is in exactly one trace.

To reduce JUMPs, fewer traces are preferred!!

Page 16: Basic  Block and Trace

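[Figure: a flow graph of blocks 1 to 7, with true (T) and false (F) branches, rearranged into traces]

Idea!! Order the blocks so that a JUMP is immediately followed by its target (remove the JUMP), and so that each CJUMP is followed by its false target (jump on false), swapping T and F where needed (T->F, F->T).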

Page 17: Basic  Block and Trace

Algorithm 8.2 (Canon.TraceSchedule)

Put all the blocks of the program into a list Q.
while Q is not empty
    Start a new (empty) trace, call it T.
    Remove the head element b from Q.
    while b is not marked
        Mark b; T <- T; b.
        Examine the successors of b.
        if there is any unmarked successor c
            b <- c.
    End the current trace T.
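Algorithm 8.2 can be sketched in Java as below. The representation is an assumption made for illustration: blocks are named by strings, and `succ` maps each block to its successors, listed with the preferred successor (for a CJUMP, the false target) first. One deliberate deviation: a head element that is already marked is skipped, so no empty traces are produced.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TraceSketch {
    // order: blocks in program order; succ: successors of each block, preferred one first.
    static List<List<String>> schedule(List<String> order, Map<String, List<String>> succ) {
        Deque<String> q = new ArrayDeque<>(order);
        Set<String> marked = new HashSet<>();
        List<List<String>> traces = new ArrayList<>();
        while (!q.isEmpty()) {
            String b = q.removeFirst();
            if (marked.contains(b)) continue; // already placed in an earlier trace
            List<String> trace = new ArrayList<>();
            while (b != null && !marked.contains(b)) {
                marked.add(b);
                trace.add(b);
                // Follow the first unmarked successor, if there is one.
                String next = null;
                for (String c : succ.getOrDefault(b, Collections.<String>emptyList())) {
                    if (!marked.contains(c)) { next = c; break; }
                }
                b = next;
            }
            traces.add(trace);
        }
        return traces;
    }

    public static void main(String[] args) {
        Map<String, List<String>> succ = new HashMap<>();
        succ.put("A", Arrays.asList("B"));
        succ.put("B", Arrays.asList("C", "D")); // CJUMP: false target C, true target D
        succ.put("C", Arrays.asList("B"));      // loop back edge
        System.out.println(schedule(Arrays.asList("A", "B", "C", "D"), succ)); // [[A, B, C], [D]]
    }
}
```

Listing the false target first is what makes each CJUMP tend to be followed by its false label inside a trace.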

Page 18: Basic  Block and Trace

Finishing Up

- Analysis and optimizations are efficient on basic blocks (rather than at the level of individual statements).

- Some local rearrangement:
(1) CJUMP followed by its false label => OK
(2) CJUMP followed by its true label => swap the labels and reverse the condition
(3) CJUMP(lt, lf) followed by neither lt nor lf => rewrite as CJUMP(lt, lf'); LABEL lf'; JUMP lf, that is, "jump on true to lt; JUMP lf". A chance to optimize!!

Finding an optimal trace is not easy!!

Page 19: Basic  Block and Trace

Instruction Selection

Chapter 9

Page 20: Basic  Block and Trace

What we are going to do

Tree IR -> machine instructions (Jouette architecture or SPARC or MIPS or Pentium or T )

Ex: MEM(BINOP(+, ea, CONST c))  =>  LOAD R1, ea(c);

Page 21: Basic  Block and Trace

Machine Example: Jouette Architecture

Register r0 always contains zero.

Name   Effect              Tiles
-      ri                  TEMP
ADD    ri <- rj + rk       +(e, e)
MUL    ri <- rj * rk       *(e, e)
SUB    ri <- rj - rk       -(e, e)
DIV    ri <- rj / rk       /(e, e)
ADDI   ri <- rj + c        +(e, CONST), +(CONST, e), CONST
SUBI   ri <- rj - c        -(e, CONST)
LOAD   ri <- M[rj + c]     MEM(+(e, CONST)), MEM(+(CONST, e)), MEM(CONST), MEM(e)

These instructions produce a result in a register => Exp

Page 22: Basic  Block and Trace

Name    Effect              Tiles
STORE   M[rj + c] <- ri     MOVE(MEM(+(e, CONST)), e), MOVE(MEM(+(CONST, e)), e), MOVE(MEM(CONST), e), MOVE(MEM(e), e)
MOVEM   M[rj] <- M[ri]      MOVE(MEM(e), MEM(e))

Execution of these instructions produces side effects on memory => Stm

Page 23: Basic  Block and Trace

Tiling the IR tree

ex: a[i] := x    (i: register; a, x: frame variables)

IR tree: MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))), MEM(+(FP, CONST x)))

2 LOAD  r1 <- M[fp + a]
4 ADDI  r2 <- r0 + 4
5 MUL   r2 <- ri * r2
6 ADD   r1 <- r1 + r2
8 LOAD  r2 <- M[fp + x]
9 STORE M[r1 + 0] <- r2

(Tiles 1, 3, and 7 cover TEMP nodes and emit no instruction.)

Page 24: Basic  Block and Trace

Another Solution

ex: a[i] := x    (i: register; a, x: frame variables), same IR tree as before

2 LOAD  r1 <- M[fp + a]
4 ADDI  r2 <- r0 + 4
5 MUL   r2 <- ri * r2
6 ADD   r1 <- r1 + r2
8 ADDI  r2 <- fp + x
9 MOVEM M[r1] <- M[r2]

Page 25: Basic  Block and Trace

Or another tiling, with a different set of tile patterns (same IR tree)

1  ADDI  r1 <- r0 + a
2  ADD   r1 <- fp + r1
3  LOAD  r1 <- M[r1 + 0]
4  ADDI  r2 <- r0 + 4
5  MUL   r2 <- ri * r2
6  ADD   r1 <- r1 + r2
7  ADDI  r2 <- r0 + x
8  ADD   r2 <- fp + r2
9  LOAD  r2 <- M[r2 + 0]
10 STORE M[r1 + 0] <- r2

Page 26: Basic  Block and Trace

OPTIMAL and OPTIMUM TILINGS

Optimum tiling: one whose tile costs sum to the lowest possible value.

Cost of a tile: instruction execution time, number of bytes, ...

Optimal tiling: one where no two adjacent tiles can be combined into a single tile of lower cost.

Every optimum tiling is also optimal, but not vice versa.

then why we keep ?are enough.30 25

Page 27: Basic  Block and Trace

Algorithms for Instruction Selection

1. Optimal vs. optimum: finding an optimal tiling is simple; finding an optimum tiling may be hard.

2. CISC (Complex Instruction Set Computer) vs. RISC:

                     CISC                           RISC
tile size            large                          small
cost of tilings      optimal >= optimum             optimal ~= optimum
instruction cost     varies with addressing mode    almost the same

Page 28: Basic  Block and Trace

Maximal Munch: an optimal tiling algorithm

1. Starting at the root, find the largest tile that fits.

2. Repeat step 1 for each of the subtrees that remain.

3. Generate instructions for each tile. The tiles are found root first, but the instructions must be emitted in reverse order => traverse the tree of tiles in post-order.

When several tiles match, select the largest one (the one that covers the most nodes).

If two tiles of the same size match, choose an arbitrary one.
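A small Java sketch of maximal munch over a subset of the Jouette tiles (TEMP, CONST, +, MEM). The `Exp` class, its helper constructors, and the `r1, r2, ...` register naming are assumptions made for this illustration. Larger tiles are tested before smaller ones, and each child is munched before the parent's instruction is emitted, which gives the post-order instruction sequence.

```java
import java.util.ArrayList;
import java.util.List;

final class MunchSketch {
    // Hypothetical expression node; op is "MEM", "+", "CONST", or "TEMP".
    static final class Exp {
        final String op;
        final Exp left, right; // children (null where unused)
        final String value;    // constant text or temp name for leaves
        Exp(String op, Exp left, Exp right, String value) {
            this.op = op; this.left = left; this.right = right; this.value = value;
        }
    }
    static Exp mem(Exp a) { return new Exp("MEM", a, null, null); }
    static Exp plus(Exp a, Exp b) { return new Exp("+", a, b, null); }
    static Exp con(int c) { return new Exp("CONST", null, null, Integer.toString(c)); }
    static Exp temp(String t) { return new Exp("TEMP", null, null, t); }
    static boolean isConst(Exp e) { return e.op.equals("CONST"); }

    final List<String> code = new ArrayList<>();
    private int next = 1;

    // Allocate a fresh result register and record the instruction that defines it.
    private String emit(String opcode, String rhs) {
        String r = "r" + next++;
        code.add(opcode + " " + r + " <- " + rhs);
        return r;
    }

    // Returns the register holding the value of e; larger tiles are tried first.
    String munch(Exp e) {
        switch (e.op) {
        case "TEMP":
            return e.value;
        case "CONST":
            return emit("ADDI", "r0 + " + e.value);
        case "+":
            if (isConst(e.right)) return emit("ADDI", munch(e.left) + " + " + e.right.value);
            if (isConst(e.left)) return emit("ADDI", munch(e.right) + " + " + e.left.value);
            return emit("ADD", munch(e.left) + " + " + munch(e.right));
        case "MEM": {
            Exp a = e.left;
            if (a.op.equals("+") && isConst(a.right))
                return emit("LOAD", "M[" + munch(a.left) + " + " + a.right.value + "]");
            if (a.op.equals("+") && isConst(a.left))
                return emit("LOAD", "M[" + munch(a.right) + " + " + a.left.value + "]");
            if (isConst(a)) return emit("LOAD", "M[r0 + " + a.value + "]");
            return emit("LOAD", "M[" + munch(a) + " + 0]");
        }
        default:
            throw new IllegalArgumentException("no tile for " + e.op);
        }
    }

    public static void main(String[] args) {
        MunchSketch m = new MunchSketch();
        m.munch(mem(plus(mem(plus(temp("fp"), con(8))), con(4))));
        for (String line : m.code) System.out.println(line);
        // LOAD r1 <- M[fp + 8]
        // LOAD r2 <- M[r1 + 4]
    }
}
```

The two LOADs each cover three nodes (MEM, +, CONST); a tiling with single-node tiles would need six instructions for the same tree.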

Page 29: Basic  Block and Trace

Implementation

See Program 9.3 for an example (p. 181).

Use case statements for each type of root node!!

There is at least one tile for each type of root node!!

Page 30: Basic  Block and Trace

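Dynamic Programming: finding an optimum tiling

Find the optimum solution based on the optimum solutions of each subproblem!!

1. Assign a cost to every node in the tree, working bottom-up.

2. At each node, find the tiles that match.

3. Compute the cost of each match (the cost of the tile plus the costs of the subtrees at its leaves).

4. Choose the best one.

5. Let that cost be the value of the node.

[Figure: a MEM tree whose subtrees are annotated with costs (A: 40, B: 30+2, leaves: 10 and 20). The candidate totals compared are 10 + 20 + 40 + 4 = 74 and 30 + 2 + 40 + 5 = 77.]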

Page 31: Basic  Block and Trace

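Example: MEM(+(CONST 1, CONST 2))

+ node:

Tile           Instruction   Tile cost   Leaves cost   Total
+(e, e)        ADD           1           1+1           3
+(e, CONST)    ADDI          1           1             2
+(CONST, e)    ADDI          1           1             2

MEM node:

Tile                Instruction         Tile cost   Leaves cost   Total
MEM(e)              LOAD ri <- M[rj]    1           2             3
MEM(+(e, CONST))    LOAD ri <- M[rj+c]  1           1             2
MEM(+(CONST, e))    LOAD ri <- M[rj+c]  1           1             2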
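The bottom-up cost computation for this example can be sketched in Java. The `Node` class and helpers are hypothetical, every tile is assumed to cost 1 (as in the example), and only the tiles for TEMP, CONST, + and MEM are modeled. Each node's best cost is stored in the node, so every subtree is solved once.

```java
final class TilingCostSketch {
    // Hypothetical tree node; op is "MEM", "+", "CONST", or "TEMP".
    static final class Node {
        final String op;
        final Node left, right;
        int cost = -1; // best cost of the subtree, -1 until computed
        Node(String op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    }
    static Node mem(Node a) { return new Node("MEM", a, null); }
    static Node plus(Node a, Node b) { return new Node("+", a, b); }
    static Node con() { return new Node("CONST", null, null); }
    static Node temp() { return new Node("TEMP", null, null); }
    static boolean isConst(Node n) { return n.op.equals("CONST"); }

    // Minimum total tile cost for the subtree rooted at n (each tile costs 1).
    static int cost(Node n) {
        if (n.cost >= 0) return n.cost; // subproblem already solved
        int best;
        switch (n.op) {
        case "TEMP":
            best = 0; // already in a register, no instruction
            break;
        case "CONST":
            best = 1; // ADDI ri <- r0 + c
            break;
        case "+":
            best = 1 + cost(n.left) + cost(n.right);                        // ADD
            if (isConst(n.right)) best = Math.min(best, 1 + cost(n.left));  // ADDI
            if (isConst(n.left)) best = Math.min(best, 1 + cost(n.right));  // ADDI
            break;
        case "MEM": {
            Node a = n.left;
            best = 1 + cost(a);                                             // LOAD M[rj]
            if (a.op.equals("+") && isConst(a.right)) best = Math.min(best, 1 + cost(a.left));
            if (a.op.equals("+") && isConst(a.left)) best = Math.min(best, 1 + cost(a.right));
            if (isConst(a)) best = Math.min(best, 1);                       // LOAD M[r0 + c]
            break;
        }
        default:
            throw new IllegalArgumentException("no tile for " + n.op);
        }
        n.cost = best;
        return best;
    }

    public static void main(String[] args) {
        Node tree = mem(plus(con(), con()));
        System.out.println(cost(tree));     // 2
        System.out.println(tree.left.cost); // 2 (best cost of the + node)
    }
}
```

The result 2 for the MEM node and 2 for the + node match the Total columns in the tables above; a full instruction selector would also record which tile achieved the minimum so that code can be emitted afterwards.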

Page 32: Basic  Block and Trace

Tree Grammars

A generalization of DP for machines with complex instruction sets and several classes of registers and addressing modes.

Example: Schizo-Jouette machine

ai: address register
dj: data register

Name    Effect             Tiles
ADD     di <- dj + dk      d -> +(d, d)
MUL     di <- dj * dk      d -> *(d, d)
SUB     di <- dj - dk      d -> -(d, d)
DIV     di <- dj / dk      d -> /(d, d)
ADDI    di <- dj + c       d -> +(d, CONST), d -> +(CONST, d), d -> CONST
SUBI    di <- dj - c       d -> -(d, CONST)
MOVEA   dj <- ai           d -> a
MOVED   aj <- di           a -> d

Page 33: Basic  Block and Trace

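Name    Effect              Tiles
LOAD    di <- M[aj + c]     d -> MEM(+(a, CONST)), d -> MEM(+(CONST, a)), d -> MEM(CONST), d -> MEM(a)
STORE   M[aj + c] <- di     s -> MOVE(MEM(+(a, CONST)), d), s -> MOVE(MEM(+(CONST, a)), d), s -> MOVE(MEM(CONST), d), s -> MOVE(MEM(a), d)
MOVEM   M[aj] <- M[ai]      s -> MOVE(MEM(a), MEM(a))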

Page 34: Basic  Block and Trace

Use a context-free grammar to describe the tiles.

ex: nonterminals
s : statement
d : data
a : address

LOAD:   d -> MEM(+(a, CONST))
        d -> MEM(+(CONST, a))
        d -> MEM(CONST)
        d -> MEM(a)
MOVEA:  d -> a
MOVED:  a -> d
STORE:  s -> MOVE(MEM(+(a, CONST)), d)
MOVEM:  s -> MOVE(MEM(a), MEM(a))

=> ambiguous grammar!!

-> parse based on the minimum cost!!

Page 35: Basic  Block and Trace

Efficiency of Tiling Algorithms

Order of execution cost for maximal munch and dynamic programming

T  : number of different tiles
K  : average number of non-leaf nodes per tile
K' : largest number of nodes that need to be examined to choose the right tile (~= the size of the largest tile)
T' : average number of tile patterns that match at each tree node

Ex: for a RISC machine, T = 50, K = 2, K' = 4, T' = 5

Page 36: Basic  Block and Trace

N : number of input nodes in the tree

Complexity of maximal munch = N/K * (K' + T')

- N/K : number of nodes at which tiles are matched
- K'  : nodes to be examined to find the matching patterns
- T'  : patterns to be compared to find the minimum cost

Complexity of dynamic programming = N * (K' + T')

Both are linear in N.
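As a worked example, plugging the RISC figures quoted earlier (K = 2, K' = 4, T' = 5) into the two formulas gives the following; the only assumption is that those example values apply.

```latex
\begin{align*}
\text{maximal munch:} \quad & \frac{N}{K}\,(K' + T') = \frac{N}{2}\,(4 + 5) = 4.5\,N \\
\text{dynamic programming:} \quad & N\,(K' + T') = N\,(4 + 5) = 9\,N
\end{align*}
```

Under these numbers dynamic programming does about twice the work of maximal munch, and both grow linearly with N.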

Page 37: Basic  Block and Trace

9.2 RISC vs CISC

RISC

1. 32 registers.
2. Only one class of integer/pointer registers.
3. Arithmetic operations only between registers.
4. "Three-address" instructions of the form r1 <- r2 ⊕ r3.
5. Load and store instructions with only the M[reg+const] addressing mode.
6. Every instruction exactly 32 bits long.
7. One result or effect per instruction.

Page 38: Basic  Block and Trace

CISC (Complex Instruction Set Computers)

Complex addressing modes

1. Few registers (16 or 8 or 6).
2. Registers divided into different classes.
3. Arithmetic operations can access registers or memory through "addressing modes".
4. "Two-address" instructions of the form r1 <- r1 ⊕ r2.
5. Several different addressing modes.
6. Variable-length instruction format.
7. Instructions with side effects, e.g. auto-increment/decrement.

Page 39: Basic  Block and Trace

Solutions for CISC

1. Few registers.
- Handle it in the register allocation phase.

2. Classes of registers.
- Specify the operands and result explicitly.
- ex: the left operand of an arithmetic op (e.g. mul) must be in eax

  t1 <- t2 × t3 ==>
  mov eax, t2      eax <- t2
  mul t3           eax <- eax × t3; edx <- garbage
  mov t1, eax      t1 <- eax

3. Two-address instructions.
- Add an extra move instruction -> left to register allocation

  t1 <- t2 + t3 ==>
  mov t1, t2       t1 <- t2
  add t1, t3       t1 <- t1 + t3
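The two-address rewrite can be sketched in Java. Instructions are modeled as plain strings, and the method name `lower` and the `scratch` temporary are hypothetical. The scratch temporary covers a case the slide does not show: when the destination is also the second operand, the initial mov would otherwise destroy that operand.

```java
import java.util.ArrayList;
import java.util.List;

final class TwoAddressSketch {
    // Rewrite the three-address form dst <- a op b into two-address instructions.
    static List<String> lower(String op, String dst, String a, String b, String scratch) {
        List<String> out = new ArrayList<>();
        if (dst.equals(a)) {
            // Already in two-address shape: dst <- dst op b.
            out.add(op + " " + dst + ", " + b);
        } else if (dst.equals(b)) {
            // mov dst, a would overwrite b, so save b in the scratch temporary first.
            out.add("mov " + scratch + ", " + b);
            out.add("mov " + dst + ", " + a);
            out.add(op + " " + dst + ", " + scratch);
        } else {
            out.add("mov " + dst + ", " + a);
            out.add(op + " " + dst + ", " + b);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : lower("add", "t1", "t2", "t3", "tmp")) System.out.println(line);
        // mov t1, t2
        // add t1, t3
    }
}
```

The extra mov is cheap to emit at this stage because a later register allocator can often assign both temporaries to the same register and delete the move.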

Page 40: Basic  Block and Trace

4. Arithmetic operations can address memory.
- Actually handled by the "register spill" phase.
- Load the memory operand into a register and store it back into memory -> may trash registers!!
- ex: add [ebp - 8], ecx is equivalent to

  mov eax, [ebp - 8]
  add eax, ecx
  mov [ebp - 8], eax

Page 41: Basic  Block and Trace

5. Several addressing modes.
- They take time to execute (no faster than the equivalent multi-instruction sequence).
- They "trash" fewer registers and give a shorter instruction sequence.
- Select appropriate patterns for the addressing modes.

6. Variable-length instructions.
- Let the assembler generate the binary code.

7. Instructions with side effects.
- eg: r2 <- M[r1]; r1 <- r1 + 4;
- Difficult to model!!
  (a) Ignore the auto-increment -> forget it!
  (b) Try to match special idioms.
  (c) Try to invent new algorithms.

Page 42: Basic  Block and Trace

Abstract Assembly Language Instructions

• Assembly language instructions without register assignment.

    package assem;

    public abstract class Instr {
        public String assem;                     // instruction template
        public abstract temp.TempList use();     // return source list
        public abstract temp.TempList def();     // return destination list
        public abstract Targets jumps();         // return jump targets
        public String format(temp.TempMap m);    // text of the assembly instruction
    }

    public Targets(temp.LabelList labels);

Page 43: Basic  Block and Trace

    // dst, src and jump can be null.
    public OPER(String assem, TempList dst, TempList src, temp.LabelList jump);
    public OPER(String assem, TempList dst, TempList src);

    public MOVE(String assem, Temp dst, Temp src);

    public LABEL(String assem, temp.Label label);

Page 44: Basic  Block and Trace

Example

• assem.Instr is independent of the target machine's assembly language.

ex: MEM(+(fp, CONST(8))) ==>

    new OPER("LOAD `d0 <- M[`s0 + 8]",
             new TempList(new Temp(), null),
             new TempList(frame.FP(), null));

Calling format(...) on the above Instr, we get

    LOAD r1 <- M[r27+8]

assuming the register allocator assigns r1 to the new Temp and r27 is the frame pointer register.
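How a template's `d0 and `s0 placeholders might be filled in after register allocation can be sketched as below. This is not the book's format method: the static method `format`, its list parameters, and the restriction to single-digit indices are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class FormatSketch {
    // Replace each `dN / `sN placeholder with the register chosen for the
    // N-th destination / source temp (single-digit N only in this sketch).
    static String format(String assem, List<String> dst, List<String> src) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < assem.length(); i++) {
            char c = assem.charAt(i);
            if (c == '`' && i + 2 < assem.length()) {
                char kind = assem.charAt(i + 1);
                int n = assem.charAt(i + 2) - '0';
                if (kind == 'd') {
                    out.append(dst.get(n));
                } else if (kind == 's') {
                    out.append(src.get(n));
                } else {
                    throw new IllegalArgumentException("bad placeholder at index " + i);
                }
                i += 2; // skip the two placeholder characters just consumed
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = format("LOAD `d0 <- M[`s0+8]", Arrays.asList("r1"), Arrays.asList("r27"));
        System.out.println(text); // LOAD r1 <- M[r27+8]
    }
}
```

Keeping the template separate from the register names is what lets the same Instr be printed before allocation (with temp names) and after it (with machine registers).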

Page 45: Basic  Block and Trace

Another Example

• *(+(TEMP t87, CONST 3), MEM(TEMP t92))

  assem                      dst     src
  ADDI `d0 <- `s0 + 3        t908    t87
  LOAD `d0 <- M[`s0+0]       t909    t92
  MUL  `d0 <- `s0*`s1        t910    t908, t909

• After register allocation, the instructions look like:

  ADDI r1 <- r12 + 3         t908/r1   t87/r12
  LOAD r2 <- M[r13+0]        t909/r2   t92/r13
  MUL  r1 <- r1*r2           t910/r1

• Two-address instructions

  t1 <- t1 + t2 ==>

  assem              dst    src
  add `d0, `s1       t1     t1, t2