Topic 2: Fundamentals in Basic Back-End Optimization

111/05/08 \course\cpeg421-10F\Topic-2.ppt 1 Topic 2: Fundamentals in Basic Back-End Optimization: Instruction Selection, Instruction Scheduling, Register Allocation

Upload: joyce

Post on 29-Jan-2016


DESCRIPTION

Topic 2: Fundamentals in Basic Back-End Optimization. Instruction Selection, Instruction Scheduling, Register Allocation. ABET Outcome. (PowerPoint PPT presentation)

TRANSCRIPT

Page 1: Topic 2: Fundamentals in Basic Back-End Optimization

112/04/22 \course\cpeg421-10F\Topic-2.ppt 1

Topic 2: Fundamentals in Basic Back-End Optimization

Instruction Selection

Instruction scheduling

Register allocation

Page 2

ABET Outcome

• Ability to apply knowledge of basic code generation techniques, e.g., instruction selection, instruction scheduling, and register allocation, to solve code generation problems.
• Ability to analyze the basic algorithms behind these techniques and conduct experiments to show their effectiveness.
• Ability to use a modern compiler development platform and tools to practice the above.
• Knowledge of contemporary issues on this topic.

Page 3

General Compiler Framework

[Figure: Source → middle end (ME): Inter-Procedural Optimization (IPA), Loop Nest Optimization (LNO), Global Optimization (OPT) → back end/code generator (BE/CG): innermost loop scheduling, global instruction scheduling, register allocation, local instruction scheduling → Executable. Architecture models feed the back end.]

• Good IPO
• Good LNO
• Good global optimization
• Good integration of IPO/LNO/OPT
• Smooth information passing between FE and CG
• Complete and flexible support of inner-loop scheduling (SWP), instruction scheduling, and register allocation

Page 4

Three Basic Back-End Optimizations

Instruction selection
• Mapping IR into assembly code
• Assumes a fixed storage mapping and code shape
• Combining operations, using address modes

Instruction scheduling
• Reordering operations to hide latencies
• Assumes a fixed program (set of operations)
• Changes demand for registers

Register allocation
• Deciding which values will reside in registers
• Changes the storage mapping; may add false sharing
• Concerns about placement of data and memory operations

Page 5

Instruction Selection

Some slides are from the CS 640 lecture at George Mason University.

Page 6

Reading List


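(1) K. D. Cooper & L. Torczon, Engineering a Compiler, Chapter 11
(2) A. V. Aho, M. S. Lam, R. Sethi & J. D. Ullman, Compilers: Principles, Techniques, and Tools (the "Dragon Book"), Sections 8.7 and 8.9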

Page 7

Objectives

• Introduce the complexity and importance of instruction selection
• Study practical issues and solutions
• Case study: EBO (Extended Basic Block Optimization) in Open64

Page 8

Instruction Selection: Retargetable

[Figure: Description-based retargeting. Front End, Middle End, and Back End share common Infrastructure; a Back-end Generator reads a Machine description and produces the Tables that drive the back end's Pattern Matching Engine.]

The machine description should also help with scheduling and allocation. This is a simplistic but useful view.

Page 9

Complexity of Instruction Selection

Modern computers have many ways to do the same thing.

Consider a register-to-register copy:
• The obvious operation is: mv rj, ri
• Many others exist:
  add rj, ri, 0    sub rj, ri, 0    rshiftI rj, ri, 0
  mul rj, ri, 1    or rj, ri, 0     divI rj, ri, 1
  xor rj, ri, 0    others …

Page 10

Complexity of Instruction Selection (Cont.)

• Multiple addressing modes
• Each alternative sequence has its own cost
  Complex ops (mult, div): several cycles
  Memory ops: latency varies
• Sometimes, cost is context-dependent
• Use under-utilized FUs (functional units)
• Dependent on objectives: speed, power, code size
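The context-dependent cost point can be made concrete with a short sketch. It is an illustration, not a real machine model. It assumes a hypothetical list of copy candidates, each bound to one functional unit with an invented latency. It also assumes a made-up stall penalty whenever that unit is already busy. On an idle machine the plain `mv` wins. When the ALU is busy, the shift-by-zero form becomes the cheapest, which is the "use under-utilized FUs" case.

```python
# Hypothetical machine model: each way to copy ri into rj runs on one
# functional unit (FU) with a fixed latency. All numbers are illustrative.
CANDIDATES = [
    # (instruction,        FU,        latency in cycles)
    ("mv rj, ri",          "alu",     1),
    ("add rj, ri, 0",      "alu",     1),
    ("rshiftI rj, ri, 0",  "shifter", 1),
    ("mul rj, ri, 1",      "mul",     3),
]


def pick_copy(busy_units, stall=2):
    """Return the cheapest copy sequence in the current context.

    busy_units: FUs already occupied this cycle; using one of them is
    assumed to cost `stall` extra cycles. Ties keep list order.
    """
    def cost(candidate):
        _, unit, latency = candidate
        return latency + (stall if unit in busy_units else 0)

    return min(CANDIDATES, key=cost)[0]


print(pick_copy(set()))    # mv rj, ri
print(pick_copy({"alu"}))  # rshiftI rj, ri, 0
```

A real selector would take these costs from the machine description and the current schedule, not from a fixed table.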

Page 11

Complexity of Instruction Selection (Cont.)

• Additional constraints on specific operations
  Load/store multiple words: contiguous registers
  Multiply: needs a special register (accumulator)
• Interaction between instruction selection, instruction scheduling, and register allocation
  For scheduling, instruction selection predetermines latencies and functional units
  For register allocation, instruction selection pre-colors some variables, e.g., non-uniform registers (such as registers for multiplication)

Page 12

Instruction Selection Techniques

Tree Pattern-Matching

• Tree-oriented IR suggests pattern matching on trees

• Tree-patterns as input, matcher as output

• Each pattern maps to a target-machine instruction sequence

• Use dynamic programming or bottom-up rewrite systems

Peephole-based Matching

• Linear IR suggests using some sort of string matching

• Inspired by peephole optimization

• Strings as input, matcher as output

• Each string maps to a target-machine instruction sequence

In practice, both work well; matchers are quite different.

Page 13

A Simple Tree-Walk Code Generation Method

• Assume we start with a tree-like IR
• Starting from the root, recursively walk through the tree
• At each node, use a simple (unique) rule to generate a low-level instruction
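Here is a minimal sketch of this naive method. It assumes a tiny expression AST (`Node` and `TreeWalkCodeGen` are hypothetical names) and ILOC-style output. Each operator has exactly one rule, and every value gets a fresh virtual register. Because the rule set is fixed, the walk never combines operations, for example by folding an address computation into a load.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str                                   # "num", "var", "+", "-", "*"
    value: object = None                      # constant or variable name (leaves only)
    kids: list = field(default_factory=list)  # operands (interior nodes only)


# One fixed rule per operator; the ILOC-style opcodes are an assumption.
OPCODES = {"+": "add", "-": "sub", "*": "mult"}


class TreeWalkCodeGen:
    def __init__(self):
        self.next_reg = 0
        self.code = []

    def new_reg(self):
        # "NextRegister++": each value gets a fresh virtual register
        self.next_reg += 1
        return f"r{self.next_reg}"

    def gen(self, node):
        """Emit code for the subtree at node; return the register holding its value."""
        if node.op == "num":
            reg = self.new_reg()
            self.code.append(f"loadI {node.value} => {reg}")
        elif node.op == "var":
            # assumption: variables live at offset @name from a base address in r0
            reg = self.new_reg()
            self.code.append(f"loadAI r0, @{node.value} => {reg}")
        elif node.op in OPCODES:
            left = self.gen(node.kids[0])
            right = self.gen(node.kids[1])
            reg = self.new_reg()
            self.code.append(f"{OPCODES[node.op]} {left}, {right} => {reg}")
        else:
            raise ValueError(f"no rule for operator {node.op!r}")
        return reg


# x - 2 * y
tree = Node("-", kids=[Node("var", "x"),
                       Node("*", kids=[Node("num", 2), Node("var", "y")])])
gen = TreeWalkCodeGen()
result = gen.gen(tree)
for line in gen.code:
    print(line)
```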

Page 14

Tree Pattern-Matching

Assumptions
• A tree-like IR: an AST
• For each subtree of the IR, there is a corresponding set of tree patterns (or "operation trees": low-level abstract syntax trees)

Problem formulation: find the best mapping of the AST to operations by "tiling" the AST with operation trees, where a tiling is a collection of (AST-node, operation-tree) pairs.

Page 15

Tile AST

[Figure: An AST for an assignment ("gets") whose nodes include +, -, *, val, num, ref, and lab, covered by six tiles labeled Tile 1 through Tile 6.]

Page 16

Tile AST with Operation Trees

The goal is to "tile" the AST with operation trees.
• A tiling is a collection of <ast-node, op-tree> pairs
  ◊ ast-node is a node in the AST
  ◊ op-tree is an operation tree
  ◊ <ast-node, op-tree> means that op-tree could implement the subtree at ast-node
• A tiling "implements" an AST if it covers every node in the AST and the overlap between any two trees is limited to a single node
  ◊ <ast-node, op-tree> in the tiling means ast-node is also covered by a leaf in another operation tree in the tiling, unless it is the root
  ◊ Where two operation trees meet, they must be compatible (expect the value in the same location)

Page 17

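Tree Walk by Tiling: An Example

a = a + 22;

[Figure: the tree for a = a + 22. The root MOVE has the address SP + a as its left child and, as its right child, a + node with children MEM(SP + a) and 22.]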

Page 18

Example

a = a + 22;

[Figure: the same tree tiled with four tiles; the tile results are named t1, t2, and t3.]

ld t1, [sp+a]
add t2, t1, 22
add t3, sp, a
st [t3], t2

Page 19

Example: An Alternative

a = a + 22;

[Figure: the same tree tiled with three tiles; the store tile also covers the address computation SP + a.]

ld t1, [sp+a]
add t2, t1, 22
st [sp+a], t2

Page 20

Finding Matches to Tile the Tree

• The compiler writer connects operation trees to AST subtrees
  ◊ Provides a set of rewrite rules
  ◊ Encodes tree syntax in linear form
  ◊ Associates a code template with each rule

Page 21

Generating Code in Tilings

Given a tiled tree:
• Postorder treewalk, with node-dependent order for children
  ◊ Do the right child before its left child
• Emit the code sequence for tiles, in order
• Tie boundaries together with register names
  ◊ Can incorporate a "real" register allocator or can simply use a "NextRegister++" approach
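The emission step can be sketched in a few lines. `Tile` and `emit` are hypothetical names. Each tile carries a code template whose `{k0}`, `{k1}`, … slots are filled with the registers produced by the tiles below it. Children are visited right to left, and register names come from a simple counter ("NextRegister++"). Encoding the two tilings of a = a + 22 this way yields the four- and three-instruction sequences from the example slides.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    template: str                             # {dst} = this tile's result, {k0}, {k1}, ... = kids' results
    kids: list = field(default_factory=list)  # tiles whose results feed this tile's leaves
    has_result: bool = True                   # a store (MOVE) tile defines no register


def emit(root):
    """Emit code for a tiled tree: postorder, right child before left child."""
    code = []
    next_register = 0

    def walk(tile):
        nonlocal next_register
        kid_regs = [None] * len(tile.kids)
        for i in reversed(range(len(tile.kids))):  # right child first
            kid_regs[i] = walk(tile.kids[i])
        dst = None
        if tile.has_result:
            next_register += 1                     # "NextRegister++"
            dst = f"t{next_register}"
        operands = {f"k{i}": reg for i, reg in enumerate(kid_regs)}
        code.append(tile.template.format(dst=dst, **operands))
        return dst

    walk(root)
    return code


# First tiling of a = a + 22: four tiles
load = Tile("ld {dst}, [sp+a]")                  # MEM(SP + a)
add22 = Tile("add {dst}, {k0}, 22", [load])      # (...) + 22
addr = Tile("add {dst}, sp, a")                  # SP + a, the store address
store = Tile("st [{k0}], {k1}", [addr, add22], has_result=False)  # MOVE

# Alternative tiling: the store tile absorbs the address computation
store2 = Tile("st [sp+a], {k0}", [add22], has_result=False)

print("\n".join(emit(store)))
print("\n".join(emit(store2)))
```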

Page 22

Optimal Tilings

• The best tiling corresponds to the least-cost instruction sequence
• Optimal tiling: no two adjacent tiles can be combined into a tile of lower cost

Page 23

Dynamic Programming for Optimal Tiling

• For a node x, let f(x) be the cost of the optimal tiling for the whole expression tree rooted at x. Then

$$ f(x) = \min_{T \text{ covering } x} \Big( \mathrm{cost}(T) + \sum_{y \,\in\, \text{children of tile } T} f(y) \Big) $$

Page 24

Dynamic Programming for Optimal Tiling (Con’t)

• Maintain a table: node x → the optimal tiling covering node x and its cost
• Start from the root and proceed recursively: check the table for the optimal tiling of this node. If it has not been computed, try all possible tiles, find the optimal one, store the lowest-cost tile in the table, and return
• Finally, use the entries in the table to emit code
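The recurrence and the table can be sketched as a memoized function. The sketch rests on these assumptions:
• Trees are nested tuples.
• `TILES` is a hypothetical tile set with unit costs, not a real target.
• In a pattern, `"_"` marks a tile leaf whose value another tile must compute.

The table maps each node to f(node), the chosen tile, and that tile's leaves. Emitting code would then walk the table top-down from the root. Because tuples compare by value, identical subtrees share one table entry. That is harmless here, since identical subtrees have the same cost.

```python
LEAF_OPS = {"CONST", "REG"}

# Hypothetical tile set: (name, pattern, cost). "_" marks a tile leaf,
# i.e. a subtree whose value must be computed by some other tile.
TILES = [
    ("reg",    ("REG",),                         0),
    ("loadI",  ("CONST",),                       1),
    ("add",    ("+", "_", "_"),                  1),
    ("addI",   ("+", "_", ("CONST",)),           1),
    ("load",   ("MEM", "_"),                     1),
    ("loadAI", ("MEM", ("+", "_", ("CONST",))),  1),
]


def match(pattern, node):
    """Return the subtrees left uncovered when pattern is placed at node, or None."""
    if pattern == "_":
        return [node]
    if pattern[0] != node[0]:
        return None
    if node[0] in LEAF_OPS:
        return []
    if len(pattern) != len(node):
        return None
    leaves = []
    for sub_pattern, child in zip(pattern[1:], node[1:]):
        sub = match(sub_pattern, child)
        if sub is None:
            return None
        leaves += sub
    return leaves


def optimal_tiling(root):
    """Memoized f(x) = min over tiles T at x of cost(T) + sum of f(y) over T's leaves y."""
    table = {}  # node -> (f(node), best tile name, leaves of that tile)

    def f(x):
        if x not in table:
            best = None
            for name, pattern, cost in TILES:
                leaves = match(pattern, x)
                if leaves is None:
                    continue
                total = cost + sum(f(y) for y in leaves)
                if best is None or total < best[0]:
                    best = (total, name, leaves)
            if best is None:
                raise ValueError(f"no tile covers node {x[0]}")
            table[x] = best
        return table[x][0]

    f(root)
    return table


# Right-hand side of a = a + 22, assuming a lives at offset 8 from SP
mem_a = ("MEM", ("+", ("REG", "sp"), ("CONST", 8)))
rhs = ("+", mem_a, ("CONST", 22))
table = optimal_tiling(rhs)
print(table[rhs][:2])    # (2, 'addI')
print(table[mem_a][:2])  # (1, 'loadAI')
```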

Page 25

Peephole-based Matching

• Basic idea inspired by peephole optimization
• The compiler can discover improvements locally
  ◊ Look at a small set of adjacent operations
  ◊ Move a "peephole" over the code and search for improvements

A classic example is a store followed by a load:

Original code:
st $r1,($r0)
ld $r2,($r0)

Improved code:
st $r1,($r0)
move $r2,$r1
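The store-then-load rewrite can be sketched as a single pass with a two-operation window. Instructions are hypothetical `(opcode, register, address)` tuples. The sketch assumes the address names ordinary memory, not volatile or memory-mapped I/O, so the value just stored can safely be reused.

```python
def store_load_peephole(code):
    """One pass over (opcode, register, address) tuples with a 2-op window.

    Rewrites   st rA, addr ; ld rB, addr
    into       st rA, addr ; move rB, rA
    Assumes addr is ordinary memory (not volatile or memory-mapped I/O).
    """
    out = []
    for op in code:
        prev = out[-1] if out else None
        if (prev is not None and prev[0] == "st" and op[0] == "ld"
                and prev[2] == op[2]):
            # ("move", dst, src): copy the value that was just stored
            out.append(("move", op[1], prev[1]))
        else:
            out.append(op)
    return out


original = [("st", "$r1", "($r0)"), ("ld", "$r2", "($r0)")]
print(store_load_peephole(original))
```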

Page 26

Implementing Peephole Matching

• Early systems used a limited set of hand-coded patterns
• A small window size ensured quick processing
• Modern peephole instruction selectors break the problem into three tasks:

IR → Expander → LLIR → Simplifier → LLIR → Matcher → ASM

LLIR: Low Level IR
ASM: Assembly Code

Page 27

Implementing Peephole Matching (Con’t)

IR → Expander → LLIR → Simplifier → LLIR → Matcher → ASM

Expander
• Turns IR code into a low-level IR (LLIR)
• Operation-by-operation, template-driven rewriting
• LLIR form includes all direct effects
• Significant, albeit constant, expansion of size

Simplifier
• Looks at LLIR through a window and rewrites it
• Uses forward substitution, algebraic simplification, local constant propagation, and dead-effect elimination
• Performs local optimization within the window
• This is the heart of the peephole system; the benefit of peephole optimization shows up in this step

Matcher
• Compares simplified LLIR against a library of patterns
• Picks the low-cost pattern that captures the effects
• Must preserve LLIR effects; may add new ones
• Generates the assembly code output

Page 28

Some Design Issues of Peephole Optimization

• Dead values
  Recognizing dead values is critical to removing useless effects, e.g., condition codes
  Expander: construct a list of dead values for each low-level operation by a backward pass over the code
  Example: consider the code sequence:
  r1 = ri * rj
  cc = fx(ri, rj)   // is this dead?
  r2 = r1 + rk
  cc = fx(r1, rk)
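The backward pass can be sketched as ordinary liveness analysis with one extra output. Each LLIR op is described by a hypothetical `(text, defs, uses)` triple, and the caller supplies the set of values live after the sequence. A value is dead at the op that defines it when it is not live immediately afterwards. The first `cc` is reported dead whatever the caller passes in, because the second `cc` overwrites it before any use. Whether the second `cc` is dead depends on the assumed live-out set.

```python
def dead_values(ops, live_out):
    """Backward pass: for each op, the values it defines that are dead.

    ops: list of (text, defs, uses) in program order; defs/uses are sets.
    live_out: values live after the last op (supplied by the caller).
    Returns a list, parallel to ops, of the dead values each op defines.
    """
    live = set(live_out)
    dead = []
    for _text, defs, uses in reversed(ops):
        dead.append(defs - live)     # defined here, never read afterwards
        live = (live - defs) | uses  # standard liveness update
    dead.reverse()
    return dead


ops = [
    ("r1 = ri * rj",    {"r1"}, {"ri", "rj"}),
    ("cc = fx(ri, rj)", {"cc"}, {"ri", "rj"}),
    ("r2 = r1 + rk",    {"r2"}, {"r1", "rk"}),
    ("cc = fx(r1, rk)", {"cc"}, {"r1", "rk"}),
]
# assumption: only r2 is used after this sequence
for (text, _, _), d in zip(ops, dead_values(ops, live_out={"r2"})):
    print(f"{text:18} dead: {sorted(d)}")
```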

Page 29

Some Design Issues of Peephole Optimization (Cont.)

• Control flow and predicated operations
  A simple way: clear the simplifier's window when it reaches a branch, a jump, or a labeled or predicated instruction
  A more aggressive way: to be discussed next

Page 30

Some Design Issues of Peephole Optimization (Cont.)

• Physical vs. Logical Window
  The simplifier uses a window containing adjacent low-level operations
  However, adjacent operations may not operate on the same values
  In practice, they tend to be independent, for parallelism or resource-usage reasons

Page 31

Some Design Issues of Peephole Optimization (Cont.)

• Use a Logical Window
  The simplifier can link each definition with the next use of its value in the same basic block
  The simplifier is largely based on forward substitution
  No need for operations to be physically adjacent
  More aggressively, extend to larger scopes beyond a basic block.
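The following sketch shows one way to build a logical window inside a single basic block. It assumes the hypothetical `(dest, sources)` form used below, where a store has no destination. Each definition is linked to the first later op that reads it. The sample block is the LLIR for `mult 2 y t1; sub x t1 w`. In it, `r10 ← 2` links to the multiply four ops later. A 3-operation physical window only brings that pair together after earlier rewrites have shrunk the code.

```python
def link_defs_to_uses(block):
    """Map each op in a basic block to the index of the next op that reads
    the value it defines (None if it is not read before being redefined
    or before the block ends).

    block: list of (dest, sources); dest is None for a store.
    """
    next_use = [None] * len(block)
    pending = {}                      # register -> index of its unread definition
    for j, (dest, sources) in enumerate(block):
        for src in sources:
            i = pending.pop(src, None)
            if i is not None:
                next_use[i] = j       # first read after the definition
        if dest is not None:
            pending[dest] = j         # a redefinition replaces any unread def
    return next_use


block = [
    ("r10", []),              # r10 <- 2
    ("r11", []),              # r11 <- @y
    ("r12", ["r0", "r11"]),   # r12 <- r0 + r11
    ("r13", ["r12"]),         # r13 <- MEM(r12)
    ("r14", ["r10", "r13"]),  # r14 <- r10 * r13
    ("r15", []),              # r15 <- @x
    ("r16", ["r0", "r15"]),   # r16 <- r0 + r15
    ("r17", ["r16"]),         # r17 <- MEM(r16)
    ("r18", ["r17", "r14"]),  # r18 <- r17 - r14
    ("r19", []),              # r19 <- @w
    ("r20", ["r0", "r19"]),   # r20 <- r0 + r19
    (None,  ["r20", "r18"]),  # MEM(r20) <- r18
]
print(link_defs_to_uses(block))
```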

Page 32

An Example

Original IR Code
OP     Arg1   Arg2   Result
mult   2      y      t1
sub    x      t1     w

Expand ↓

LLIR Code
r10 ← 2
r11 ← @y
r12 ← r0 + r11
r13 ← MEM(r12)
r14 ← r10 * r13
r15 ← @x
r16 ← r0 + r15
r17 ← MEM(r16)
r18 ← r17 - r14
r19 ← @w
r20 ← r0 + r19
MEM(r20) ← r18

where @x, @y, and @w are the offsets of x, y, and w from a global location stored in r0.
r12: address of y; r20: address of w; r13: y; r14: t1; r17: x; r18: w
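The expansion step can be sketched as operation-by-operation template rewriting. All of the following are hypothetical assumptions of the sketch:
• Quads are `(op, arg1, arg2, result)`.
• Names like `t1` are compiler temporaries kept in registers.
• Other names are variables at offset `@name` from the base address in `r0`.
• `<-` stands for ←.

With the register counter starting at r10, the sketch reproduces the 12-operation LLIR for `mult 2 y t1; sub x t1 w`.

```python
import re

OPS = {"mult": "*", "sub": "-", "add": "+"}
TEMP = re.compile(r"t\d+$")  # assumption: t1, t2, ... are register temporaries


class Expander:
    def __init__(self, first_reg=10):
        self.next_reg = first_reg
        self.temps = {}   # temporary name -> register holding it
        self.llir = []

    def reg(self):
        r = f"r{self.next_reg}"
        self.next_reg += 1
        return r

    def operand(self, arg):
        """Expand one operand; return the register that holds its value."""
        if TEMP.match(arg):
            return self.temps[arg]            # already in a register
        if arg.isdigit():
            r = self.reg()
            self.llir.append(f"{r} <- {arg}")
            return r
        off, addr, val = self.reg(), self.reg(), self.reg()
        self.llir += [f"{off} <- @{arg}",             # offset of the variable
                      f"{addr} <- r0 + {off}",        # its address
                      f"{val} <- MEM({addr})"]        # its value
        return val

    def quad(self, op, arg1, arg2, result):
        a, b = self.operand(arg1), self.operand(arg2)
        r = self.reg()
        self.llir.append(f"{r} <- {a} {OPS[op]} {b}")
        if TEMP.match(result):
            self.temps[result] = r            # keep the temporary in r
        else:
            off, addr = self.reg(), self.reg()
            self.llir += [f"{off} <- @{result}",
                          f"{addr} <- r0 + {off}",
                          f"MEM({addr}) <- {r}"]  # store the result


exp = Expander()
exp.quad("mult", "2", "y", "t1")
exp.quad("sub", "x", "t1", "w")
print("\n".join(exp.llir))
```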

Page 33

An Example (Con’t)

Simplify: the 12-operation LLIR code expanded from the original IR code (mult 2 y t1; sub x t1 w) becomes

LLIR Code
r13 ← MEM(r0 + @y)
r14 ← 2 * r13
r17 ← MEM(r0 + @x)
r18 ← r17 - r14
MEM(r0 + @w) ← r18

Page 34

An Example (Con’t)

Match: simplified LLIR code → ILOC assembly code
r13 ← MEM(r0 + @y)     loadAI r0, @y ⇒ r13
r14 ← 2 * r13          multI 2 * r13 ⇒ r14
r17 ← MEM(r0 + @x)     loadAI r0, @x ⇒ r17
r18 ← r17 - r14        sub r17 - r14 ⇒ r18
MEM(r0 + @w) ← r18     storeAI r18 ⇒ r0, @w

• Introduced all memory operations & temporary names
• Turned out pretty good code
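Here is a minimal matcher sketch. It assumes a hypothetical pattern library written as regular expressions over the `<-` text form of LLIR. Patterns are tried in order, and the order is assumed to put the cheapest match first; a real matcher would compare costs explicitly. The output uses standard ILOC operand order (`multI r13, 2 => r14`), which differs slightly from the notation on the slide.

```python
import re

# Hypothetical pattern library: (LLIR regex, ILOC template), cheapest first.
# Only the shapes needed for this example are covered.
REG = r"(r\d+)"
PATTERNS = [
    (rf"{REG} <- MEM\(r0 \+ @(\w+)\)", "loadAI r0, @{1} => {0}"),
    (rf"{REG} <- (\d+) \* {REG}",      "multI {2}, {1} => {0}"),
    (rf"{REG} <- {REG} \* (\d+)",      "multI {1}, {2} => {0}"),
    (rf"{REG} <- {REG} - {REG}",       "sub {1}, {2} => {0}"),
    (rf"MEM\(r0 \+ @(\w+)\) <- {REG}", "storeAI {1} => r0, @{0}"),
]


def match_llir(llir):
    """Translate each simplified LLIR op into one ILOC instruction."""
    asm = []
    for op in llir:
        for regex, template in PATTERNS:
            m = re.fullmatch(regex, op)
            if m:
                asm.append(template.format(*m.groups()))
                break
        else:
            raise ValueError(f"no pattern matches: {op}")
    return asm


simplified = [
    "r13 <- MEM(r0 + @y)",
    "r14 <- 2 * r13",
    "r17 <- MEM(r0 + @x)",
    "r18 <- r17 - r14",
    "MEM(r0 + @w) <- r18",
]
print("\n".join(match_llir(simplified)))
```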

Page 35

Simplifier (3-operation window)

Window before:
r10 ← 2
r11 ← @y
r12 ← r0 + r11

Window after:
r10 ← 2
r12 ← r0 + @y
r13 ← MEM(r12)

Page 36

Simplifier (3-operation window)

Window before:
r10 ← 2
r12 ← r0 + @y
r13 ← MEM(r12)

Window after:
r10 ← 2
r13 ← MEM(r0 + @y)
r14 ← r10 * r13

Page 37

Simplifier (3-operation window)

Window before:
r10 ← 2
r13 ← MEM(r0 + @y)
r14 ← r10 * r13

Window after:
r13 ← MEM(r0 + @y)
r14 ← 2 * r13
r15 ← @x

Page 38

Simplifier (3-operation window)

Window before:
r13 ← MEM(r0 + @y)
r14 ← 2 * r13
r15 ← @x

Window after:
r14 ← 2 * r13
r15 ← @x
r16 ← r0 + r15

Emitted (the 1st op has rolled out of the window):
r13 ← MEM(r0 + @y)

Page 39

Simplifier (3-operation window)

Window before:
r14 ← 2 * r13
r15 ← @x
r16 ← r0 + r15

Window after:
r14 ← 2 * r13
r16 ← r0 + @x
r17 ← MEM(r16)

Emitted so far:
r13 ← MEM(r0 + @y)

Page 40

Simplifier (3-operation window)

Window before:
r14 ← 2 * r13
r16 ← r0 + @x
r17 ← MEM(r16)

Window after:
r14 ← 2 * r13
r17 ← MEM(r0 + @x)
r18 ← r17 - r14

Emitted so far:
r13 ← MEM(r0 + @y)

Page 41

Simplifier (3-operation window)

Window before:
r14 ← 2 * r13
r17 ← MEM(r0 + @x)
r18 ← r17 - r14

Window after:
r17 ← MEM(r0 + @x)
r18 ← r17 - r14
r19 ← @w

Emitted so far:
r13 ← MEM(r0 + @y)
r14 ← 2 * r13

Page 42

Simplifier (3-operation window)

Window before:
r17 ← MEM(r0 + @x)
r18 ← r17 - r14
r19 ← @w

Window after:
r18 ← r17 - r14
r19 ← @w
r20 ← r0 + r19

Emitted so far:
r13 ← MEM(r0 + @y)
r14 ← 2 * r13
r17 ← MEM(r0 + @x)

Page 43

Simplifier (3-operation window)

Window before:
r18 ← r17 - r14
r19 ← @w
r20 ← r0 + r19

Window after:
r18 ← r17 - r14
r20 ← r0 + @w
MEM(r20) ← r18

Emitted so far:
r13 ← MEM(r0 + @y)
r14 ← 2 * r13
r17 ← MEM(r0 + @x)

Page 44

Simplifier (3-operation window)

Window before:
r18 ← r17 - r14
r20 ← r0 + @w
MEM(r20) ← r18

Window after:
r18 ← r17 - r14
MEM(r0 + @w) ← r18

Emitted so far:
r13 ← MEM(r0 + @y)
r14 ← 2 * r13
r17 ← MEM(r0 + @x)


Page 46

An Example (Con’t)

Simplify: after the 3-operation window has passed over it, the 12-operation LLIR code becomes

LLIR Code
r13 ← MEM(r0 + @y)
r14 ← 2 * r13
r17 ← MEM(r0 + @x)
r18 ← r17 - r14
MEM(r0 + @w) ← r18
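A small simplifier can reproduce the window walk. It is a sketch under assumptions, not a real simplifier:
• LLIR ops are strings in `<-` form.
• Every virtual register is defined once, as the expander produces.
• `LEGAL` is a hypothetical list of shapes the matcher can handle.

A rewrite forward-substitutes a definition whose value is used exactly once into a later op in the window. The rewrite is kept only if the result is still legal. When nothing applies, the first op rolls out of the window. On the 12-operation LLIR, this performs the same sequence of window states as the slides and ends with the same 5 operations.

```python
import re
from collections import Counter, deque

REG = r"r\d+"
ADDR = rf"(?:{REG}|{REG} \+ @\w+)"
# Hypothetical "legal" LLIR shapes (what the matcher can handle); a rewrite
# is kept only if its result still has one of these shapes.
LEGAL = [re.compile(p) for p in (
    rf"{REG} <- \d+",                        # constant
    rf"{REG} <- @\w+",                       # offset of a variable
    rf"{REG} <- {REG} \+ (?:{REG}|@\w+)",    # address arithmetic
    rf"{REG} <- MEM\({ADDR}\)",              # load
    rf"{REG} <- (?:{REG}|\d+) [*-] {REG}",   # multiply / subtract
    rf"MEM\({ADDR}\) <- {REG}",              # store
)]


def split(op):
    lhs, rhs = op.split(" <- ")
    return lhs, rhs


def uses_of(op):
    """Registers read by op (a MEM(...) destination reads its address)."""
    lhs, rhs = split(op)
    text = rhs + (" " + lhs if lhs.startswith("MEM(") else "")
    return re.findall(rf"\b{REG}\b", text)


def substitute(op, reg, expr):
    """Forward-substitute expr for every read of reg in op."""
    lhs, rhs = split(op)
    pattern = re.compile(rf"\b{reg}\b")
    rhs = pattern.sub(lambda m: expr, rhs)
    if lhs.startswith("MEM("):
        lhs = pattern.sub(lambda m: expr, lhs)
    return f"{lhs} <- {rhs}"


def try_rewrite(window, use_count):
    """Fold one single-use definition into a later op in the window."""
    for i, op in enumerate(window):
        lhs, rhs = split(op)
        if not re.fullmatch(REG, lhs) or use_count[lhs] != 1:
            continue
        for j in range(i + 1, len(window)):
            if lhs in uses_of(window[j]):
                candidate = substitute(window[j], lhs, rhs)
                if any(p.fullmatch(candidate) for p in LEGAL):
                    window[j] = candidate
                    del window[i]          # the definition is now dead
                    return True
    return False


def simplify(llir, size=3):
    use_count = Counter(r for op in llir for r in uses_of(op))
    pending, window, out = deque(llir), [], []
    while pending or window:
        while len(window) < size and pending:
            window.append(pending.popleft())
        if not try_rewrite(window, use_count):
            out.append(window.pop(0))      # no rewrite: roll the first op out
    return out


LLIR = [
    "r10 <- 2", "r11 <- @y", "r12 <- r0 + r11", "r13 <- MEM(r12)",
    "r14 <- r10 * r13", "r15 <- @x", "r16 <- r0 + r15", "r17 <- MEM(r16)",
    "r18 <- r17 - r14", "r19 <- @w", "r20 <- r0 + r19", "MEM(r20) <- r18",
]
print("\n".join(simplify(LLIR)))
```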

Page 47

Making It All Work

• LLIR is largely machine independent
• The target machine is described as LLIR → ASM patterns
• Actual pattern matching
  Use a hand-coded pattern matcher, or
  Turn the patterns into a grammar and use an LR parser
• Several important compilers use this technology
• It seems to produce good portable instruction selectors
• Its key strength appears to be late low-level optimization

Page 48

Case Study: LLIR and Instruction Selection

• LLIR stands for low-level intermediate representation, e.g.:
  GCC RTL
  Open64 CGIR
• Instruction selection techniques in real compilers:
  GCC: peephole-based matching
  LCC: tree pattern matching

Page 49

Case Study: Peephole Optimization in Kcc/Open64

• Aggressive simplifier using a logical window that spans a set of basic blocks (extended basic blocks; the pass is called EBO)
• Basis
  One definition for multiple uses
  Constant propagation
  Forward substitution
  Others: pattern matching

Page 50

KCC/Open64: Where Does Instruction Selection Happen?

[Figure: the Open64 compiler, organized as front end, middle end, and back end, with the IR lowered step by step through Very High, High, Middle, Low, and Very Low WHIRL.
Front end: gfec (C), gfecc (C++), and f90 (Fortran); source to IR: Scanner → Parser → RTL → WHIRL.
Very High WHIRL: VHO (Very High WHIRL Optimizer), standalone inliner; W2C/W2F translate WHIRL back to C/Fortran.
IPA: IPL (Pre_IPA); IPA_LINK (main_IPA): analysis, optimization.
LNO: loop unrolling, loop reversal, loop fission, loop fusion, loop tiling, loop peeling…
WOPT (PREOPT, SSA): SSAPRE (Partial Redundancy Elimination), VNFRE (Value Numbering based Full Redundancy Elimination), RVI-1 (Register Variable Identification); after lowering: RVI-2, IVR (Induction Variable Recognition).
Back end (CGIR, with CFG/DDG): WHIRL-to-TOP lowering with some peephole optimization; Cflow (control flow opt), HBS (hyperblock schedule), EBO (Extended Block Opt.), GCM (Global Code Motion), PQS (Predicate Query System), SWP, loop unrolling; IGLS (Global and Local Instruction Scheduling) pre-pass, GRA (Global Register Allocation), LRA (Local Register Allocation), IGLS post-pass; driven by a machine description/machine model (labeled "GCC Compile").
Output: assembly code.]

Page 51

Case Study: Peephole Optimization in Kcc/Open64 (cont.)

• Summary
  Recognize an extended block/sequence
  Perform peephole optimizations on the instructions within a logical window (the size of the logical window grows as the basic block sequence grows)
• Optimizations
  Forward propagation of constants
  Redundant expression elimination
  Dead expression elimination
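Recognizing the extended blocks can be sketched with the textbook definition. An extended basic block is a tree of blocks rooted at the entry or at a block with several (or zero) predecessors, in which every other block has exactly one predecessor. This is an illustration, not Open64's EBO code. The CFG is a hypothetical dict from each block to its successors. Each root-to-leaf path in an EBB is a straight-line sequence, so a logical window can keep growing along it.

```python
def extended_basic_blocks(succ, entry):
    """Partition a CFG into extended basic blocks (EBBs).

    succ: dict mapping every block to its list of successor blocks.
    Returns {root: [blocks of that EBB in preorder]}.
    """
    preds = {b: 0 for b in succ}
    for ss in succ.values():
        for s in ss:
            preds[s] += 1
    # EBB roots: the entry and every block without exactly one predecessor
    roots = [b for b in succ if b == entry or preds[b] != 1]
    ebbs = {}
    for root in roots:
        members, stack = [], [root]
        while stack:
            b = stack.pop()
            members.append(b)
            for s in reversed(succ[b]):
                if s not in roots and preds[s] == 1:  # single-predecessor block joins
                    stack.append(s)
        ebbs[root] = members
    return ebbs


# A diamond (B1 -> B2/B3 -> B4) followed by B5
cfg = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": ["B5"], "B5": []}
print(extended_basic_blocks(cfg, "B1"))
```

Here B4 starts a new EBB because it has two predecessors. B1 → B2 and B1 → B3 are the two straight-line paths inside the first EBB.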

Page 52

Flowchart of Kcc/Open64 Code Generator

WHIRL
→ WHIRL-to-TOP Lowering
→ CGIR: Quad Op List
→ Control Flow Opt I, EBO
→ Hyperblock Formation, Critical-Path Reduction
→ Control Flow Opt II, EBO
→ Process Inner Loops: unrolling, EBO; Loop prep, software pipelining
→ IGLS: pre-pass
→ GRA, LRA, EBO
→ IGLS: post-pass
→ Control Flow Opt
→ Code Emission

EBO: Extended basic block optimization (peephole, etc.)
PQS: Predicate Query System

Page 53

Case Study: Peephole Optimization in Kcc/Open64 (cont.)

• Design issues
  Need to recognize the definition and use connections between values.
  Called several times during compilation by different modules, and may have different information available each time it is called.