05 May 2006 1

Lazy Code Motion in an SSA World

A CS 526 Course Project

Patrick Meredith, Steven Lauterburg

05 May 2006

05 May 2006 2

Presentation Overview

Introduction
Motivation
Preliminaries
Implementing LCM
Results
Implementation Status

05 May 2006 3

Motivation

What problem are we trying to solve?

Lazy Code Motion (LCM) is a bit-vector-based iterative dataflow algorithm for partial redundancy elimination (PRE) that delivers safe, computationally optimal results.

SSAPRE is an approach to PRE that was specifically designed to work on SSA form and that also delivers a computationally optimal placement.

Unfortunately, the sparse SSAPRE algorithm does not always perform better than the older Lazy Code Motion dataflow algorithm.

05 May 2006 4

Solution?

Why is implementing Lazy Code Motion on an SSA-based internal representation (like LLVM’s) difficult?

LCM is based on the source-level syntax of a program… an expression like a+b is easy to identify in non-SSA form.

In SSA form, variables are renamed…

What variables are the same from a source-level perspective?

What expressions are equivalent to each other?

How do we handle multiple instances of the same variable being live at the same time?

Which instance of a variable do we use when we move a computation to a new location?

05 May 2006 5

Redundant and Partially Redundant Computations

Code motion is used to remove Redundant computations…

… and Partially Redundant computations.

[Figure: flow graphs illustrating redundant and partially redundant computations, with nodes f := 7, y := e + f, s := b + c and t := b + c]

05 May 2006 6

Critical Edges

Problem: Code motion can be blocked by “Critical Edges”: edges leading from nodes with more than one successor to nodes with more than one predecessor.

Solution: An edge-splitting transformation can be performed that inserts extra nodes.

[Figure: flow graph before edge splitting (z := u + v, w := u + v) and after (h := u + v, z := h, w := u + v, h := u + v)]
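The edge-splitting step can be sketched as follows. This is a minimal illustration, assuming the flow graph is given as a dictionary from node names to successor lists with no duplicate edges; the function name `split_critical_edges` and the `_split` naming scheme for the inserted nodes are hypothetical and not taken from the project.

```python
def split_critical_edges(succ):
    """Return a new successor map in which every critical edge u -> v
    (u has more than one successor, v has more than one predecessor)
    is routed through a fresh, empty node."""
    # Count the predecessors of every node.
    pred_count = {n: 0 for n in succ}
    for targets in succ.values():
        for v in targets:
            pred_count[v] = pred_count.get(v, 0) + 1

    result = {n: list(targets) for n, targets in succ.items()}
    for u, targets in succ.items():
        for i, v in enumerate(targets):
            if len(targets) > 1 and pred_count[v] > 1:
                new_node = f"{u}_{v}_split"  # hypothetical naming scheme
                result[u][i] = new_node      # u now jumps to the new node
                result[new_node] = [v]       # which falls through to v
    return result


# A -> C is critical: A has two successors and C has two predecessors.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
split = split_critical_edges(cfg)
```

The new node gives code motion a place to insert a computation that executes only when control flows along the former edge A -> C.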

05 May 2006 7

Variable Equivalent Classes (VECs)

What is a Variable? A VEC?

Variables that are operands of a phi-node, along with the phi-node itself, are placed into the same VEC.

Many variables may be tied together by multiple phi-nodes.

Independent variables and constants are placed in singleton VECs.

Function arguments can also be included in VECs.

[Figure: example flow graph with a1 := c, x := a1 + b, a0 := d, a2 = phi (a1, a0), y := b + a2, a3 := f, z := a3 + b]
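One way to build such classes is a union-find structure that merges each phi-node with its operands. The sketch below is an assumption about how this could be done, not the project's code; the class `VECs` and the helper `build_vecs` are hypothetical names, and SSA values are represented simply as strings.

```python
class VECs:
    """Union-find over SSA value names; each set is one VEC."""

    def __init__(self):
        self.parent = {}

    def find(self, v):
        # Unknown values start out in a singleton VEC.
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_vecs(values, phis):
    """values: all SSA names; phis: phi result -> list of operands."""
    vecs = VECs()
    for v in values:
        vecs.find(v)                 # register a singleton VEC
    for result, operands in phis.items():
        for op in operands:
            vecs.union(result, op)   # phi and its operands share a VEC
    return vecs


# a2 = phi (a1, a0) ties a0, a1 and a2 together; b stays on its own.
vecs = build_vecs(["a0", "a1", "a2", "b"], {"a2": ["a1", "a0"]})
```

Because union is transitive, variables tied together by several phi-nodes end up in a single class without any extra work.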

05 May 2006 8

Expression Equivalent Classes (EECs)

When are two expressions equivalent?

Two expressions are considered equivalent for purposes of code motion if and only if…

1. they have the same operator and

2. the corresponding operands of the two expressions are in the same VEC.

Modulo commutativity, etc. of course

[Figure: example flow graph with a1 := c, x := a1 + b, a0 := d, a2 = phi (a1, a0), y := b + a2, a3 := f, z := a3 + b]
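The two conditions can be folded into a canonical key, so that two expressions are in the same EEC exactly when their keys are equal. The sketch below assumes each operand has already been mapped to a VEC identifier; the names `eec_key`, `COMMUTATIVE` and the opcode strings are hypothetical.

```python
# Operators whose operands may be swapped (an illustrative, partial list).
COMMUTATIVE = {"add", "mul", "and", "or", "xor"}


def eec_key(op, lhs, rhs, vec_of):
    """Canonical key for a binary expression: the operator plus the VECs
    of its operands, with the operand order normalised when the operator
    is commutative."""
    left, right = vec_of[lhs], vec_of[rhs]
    if op in COMMUTATIVE and right < left:
        left, right = right, left
    return (op, left, right)


vec_of = {"a0": "A", "a1": "A", "a2": "A", "b": "B"}
k1 = eec_key("add", "a1", "b", vec_of)  # x := a1 + b
k2 = eec_key("add", "b", "a2", vec_of)  # y := b + a2
k3 = eec_key("sub", "b", "a2", vec_of)  # b - a2
k4 = eec_key("sub", "a2", "b", vec_of)  # a2 - b
```

Here `k1` and `k2` coincide, so x and y fall into the same EEC, while the two subtractions stay distinct because subtraction is not commutative.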

05 May 2006 9

“Stale” Uses and “Fresh” Values

What is a “stale” use? A “stale” use occurs when the live ranges of two different versions of the same source-level variable overlap.

An expression that uses a “stale” definition is not the same syntactically as those using “fresh” values.

What is a “fresh” value? Intuitively, it is the most recent definition of a variable. More formally, for a given VEC, the fresh value is the VEC member definition that immediately dominates a program point.

[Figure: example with a1 := c, a0 := d, a2 = phi (a1, a0), y := b + a2, z := b + a0]

05 May 2006 10

Freshness Analysis

The Freshness Lattice: BOTTOM < SSA values < TOP

Local Freshness: For each instruction, make that SSA value the Fresh value for its VEC. Whatever is Fresh at the exit is X_FRESH.

Global Freshness: To compute the N_FRESH for a basic block we meet over the successors.

Removal of Stale Uses: After completion of the Freshness analysis we remove Stale uses by inserting copies.

[Figure: example with a1 := c, a0 := d, a2 = phi (a1, a0), y := b + a2, z := b + a0]
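The local part of the analysis can be sketched for a single basic block as follows. This is an illustration under several assumptions: instructions are `(dest, opcode, operands)` tuples, `vec_of` maps every SSA name to a VEC identifier, the fresh values at block entry (`n_fresh`) are supplied by the caller, a VEC missing from the map is treated as "nothing known", and phi operands are not checked because they are read on the incoming edges. The function name `local_freshness` is hypothetical.

```python
def local_freshness(block, vec_of, n_fresh):
    """Walk one basic block, tracking the fresh value of each VEC.

    Returns (x_fresh, stale_uses): the fresh map at block exit and the
    list of (instruction, operand) pairs that read a non-fresh value."""
    fresh = dict(n_fresh)
    stale_uses = []
    for dest, opcode, operands in block:
        if opcode != "phi":
            for op in operands:
                vec = vec_of[op]
                if vec in fresh and fresh[vec] != op:
                    stale_uses.append((dest, op))
        # The value just defined becomes the fresh member of its VEC.
        fresh[vec_of[dest]] = dest
    return fresh, stale_uses


block = [
    ("a2", "phi", ["a1", "a0"]),
    ("y", "add", ["b", "a2"]),
    ("z", "add", ["b", "a0"]),   # a0 is read after a2 became fresh
]
vec_of = {"a0": "A", "a1": "A", "a2": "A", "b": "B", "y": "Y", "z": "Z"}
x_fresh, stale = local_freshness(block, vec_of, {"B": "b"})
```

In this example the use of a0 in z is reported as stale, which is the kind of use a copy would then be inserted for.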

05 May 2006 11

LCM Analyses

Which analyses do we perform?

We perform upsafety, downsafety, earliestness, delayability, and lateness.

Why do we not do the Isolation analysis?

Mem2reg, essentially like leaving the original computation in place.

Worklist based

No predecessors/successors can cause problems

[Figure: example with a1 := c, a0 := d, a2 = phi (a1, a0), y := b + a2, z := b + a0]

05 May 2006 12

Moving Code… The Almost LCM Transformation

The Basic Block Local Transform

We do not require local CSE as a prerequisite.

We first insert new computations for everything marked as N_INSERT for this basic block.

As we step through instructions in a given basic block we update the local fresh set based on the fresh set at the beginning of the basic block. We also keep a set of dead computations.

For each Binary Operator, if its computation is dead we insert a new computation with the proper Fresh operands. We store this computation to a memory location specific to each EEC.

At the point of each original computation we insert a load of the proper memory location, and replace all uses of that original computation with the load.

At the end of the basic block we insert computations and stores for all expressions that are X_INSERT and not X_REPLACE. These will be used in later basic blocks.

05 May 2006 13

Example

Before:

y0 := a0 + b0
y1 := a0 + b0
a1 := G0
y2 := a1 + b0
y3 := y2 + b0

After:

EEC0_comp_0 := a0 + b0
store EEC0_comp_0, ECC0
EEC0_load_0 := load ECC0
a1 := G0
EEC0_comp_1 := a1 + b0
store EEC0_comp_1, ECC0
EEC0_load_1 := load ECC0
EEC1_comp_0 := EEC0_load_1 + b0

05 May 2006 14

Results

Removed Stale Uses           3.0     2.0    16.0     2.0     2.0     7.0     5.0     4.0   111.0     0.0     0.0
Unpropagated Constants      17.0    30.0   123.0    49.0    95.0    16.0    15.0    40.0   227.0    14.0    41.0
VECs                       227.0  1002.0  1040.0  1103.0   713.0   341.0   601.0   764.0  6879.0   169.0   409.0
EECs                        48.0   250.0   291.0   163.0   251.0    87.0   206.0   238.0  3088.0    34.0   131.0
Non-singleton VECs          12.0    20.0    74.0    51.0    61.0    21.0    27.0    31.0   243.0    14.0    31.0
Insertions                  50.0   253.0   332.0   167.0   258.0    99.0   219.0   252.0  3196.0    34.0   134.0
Replacements                50.0   257.0   349.0   115.0   257.0    99.0   219.0   249.0  3194.0    34.0   143.0
Lines of code              537.0   448.0   703.0   634.0   785.0   432.0   579.0   913.0  4185.0   366.0   435.0
Base + LCM time              6.7     1.8     8.9    12.3    59.9      NA     0.8    65.5   104.2     6.4    21.9
Base time                    6.6     1.8     8.8    12.3    79.5      NA     0.8    65.4   102.5     6.4    21.8
Number of functions         13.0     9.0    17.0    21.0     5.0     9.0     9.0    12.0    76.0     3.0     5.0

05 May 2006 15

Bmps!

05 May 2006 16

Bmps!

05 May 2006 17

Limitations

Currently we can only dubiously handle programs that use unwind (maybe it will work, maybe not; if it does, it is probably by accident).

While we appear to handle programs that use unreachable correctly, we are not completely sure.

The algorithm is pretty slow due to all the bookkeeping we must do.

05 May 2006 18

Random Thoughts

I actually found a case where map is faster than hash_map!

Using handles to make Fresh updates not suck

The truth(?) of VECs!

05 May 2006 20

[Figure: example flow graph with nodes z := a + b, x := a + b, y := a + b, w := a + b, a := c, x := a + b]

The predicate equations used by the algorithm make use of two local predicates defined below:

For every assignment node n ≡ v := t′ and every term t ∈ T \ V (where T is the set of all terms, and V is the set of all variables):

\[ \mathrm{Used}(n, t) = \bigl(t \in \mathrm{SubTerms}(t')\bigr) \]

\[ \mathrm{Transp}(n, t) = \bigl(v \notin \mathrm{Var}(t)\bigr) \]

When t is understood, these predicates will be denoted Used(n) and Transp(n).
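The two local predicates can be written directly from their definitions. The sketch below assumes a term is either a variable name (a string) or a tuple `(op, left, right)`, and that an assignment node is a pair `(v, rhs)`; these encodings and the function names are illustrative assumptions.

```python
def subterms(term):
    """Yield the term itself and all of its nested subterms."""
    yield term
    if isinstance(term, tuple):
        _, left, right = term
        yield from subterms(left)
        yield from subterms(right)


def variables(term):
    """The set Var(t): every variable occurring in the term."""
    return {t for t in subterms(term) if isinstance(t, str)}


def used(node, t):
    """Used(n, t): t occurs as a subterm of the right-hand side of n."""
    _, rhs = node
    return t in subterms(rhs)


def transp(node, t):
    """Transp(n, t): n does not assign to any variable of t."""
    v, _ = node
    return v not in variables(t)


t = ("+", "a", "b")
n1 = ("x", ("+", "a", "b"))              # x := a + b
n2 = ("a", "c")                          # a := c
n3 = ("y", ("*", ("+", "a", "b"), "d"))  # y := (a + b) * d
```

For t = a + b, node `n1` uses the term and is transparent, `n2` neither uses it nor is transparent (it redefines a), and `n3` uses it as a proper subterm.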

05 May 2006 21

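Down-Safe

A node n is D-SAFE if a computation of a term t at n does not introduce a new value on a terminating path starting in n.

\[
\text{D-SAFE}(n) =
\begin{cases}
\mathit{false} & \text{if } n = e \\
\mathrm{Used}(n) \lor \Bigl(\mathrm{Transp}(n) \land \prod_{m \in \mathit{succ}(n)} \text{D-SAFE}(m)\Bigr) & \text{otherwise}
\end{cases}
\]

[Figure: example flow graph (z := a + b, x := a + b, y := a + b, w := a + b, a := c, x := a + b) with the D-Safe nodes marked]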
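The D-SAFE equation can be solved by simple iteration, reading the product as a conjunction over successors. The sketch below starts every node except the exit at true and lowers values until nothing changes, which yields the greatest solution. The six-node flow graph is a made-up example, not the one on the slide, and the function name `down_safe` is hypothetical.

```python
def down_safe(succ, used, transp, exit_node):
    """Greatest solution of the D-SAFE equations for one term."""
    dsafe = {n: n != exit_node for n in succ}  # optimistic start
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == exit_node:
                continue  # D-SAFE(e) is false by definition
            new = used[n] or (transp[n] and all(dsafe[m] for m in succ[n]))
            if new != dsafe[n]:
                dsafe[n] = new
                changed = True
    return dsafe


# Hypothetical graph: n1 is x := a + b, n2 is a := c, n3 is empty,
# n4 is y := a + b; s and e are the empty entry and exit nodes.
succ = {"s": ["n1", "n2"], "n1": ["n4"], "n2": ["n3"],
        "n3": ["n4"], "n4": ["e"], "e": []}
used = {"s": False, "n1": True, "n2": False,
        "n3": False, "n4": True, "e": False}
transp = {"s": True, "n1": True, "n2": False,
          "n3": True, "n4": True, "e": True}
dsafe = down_safe(succ, used, transp, "e")
```

In this graph the entry node is not down-safe, because the path through n2 redefines a before a + b is needed again.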

05 May 2006 22

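Earliest

A node n is EARLIEST if there is a path from s to n where no node on that path prior to n is D-Safe and delivers the same value for t as when computed at n.

\[
\text{EARLIEST}(n) =
\begin{cases}
\mathit{true} & \text{if } n = s \\
\sum_{m \in \mathit{pred}(n)} \Bigl(\lnot\mathrm{Transp}(m) \lor \bigl(\lnot\text{D-SAFE}(m) \land \text{EARLIEST}(m)\bigr)\Bigr) & \text{otherwise}
\end{cases}
\]

[Figure: example flow graph (z := a + b, x := a + b, y := a + b, w := a + b, a := c, x := a + b) with the D-Safe and Earliest nodes marked]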
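The EARLIEST equation can be iterated in the same style, reading the sum as a disjunction over predecessors. Because the definition asks for the existence of a path, this sketch starts every node except the entry at false and raises values until a fixed point is reached; that choice of starting point is an assumption of the sketch. The flow graph and the D-SAFE values are a hand-worked, made-up example (n1 is x := a + b, n2 is a := c, n3 is empty, n4 is y := a + b), and the function name `earliest` is hypothetical.

```python
def earliest(pred, transp, dsafe, entry):
    """Least solution of the EARLIEST equations for one term."""
    early = {n: n == entry for n in pred}  # pessimistic start
    changed = True
    while changed:
        changed = False
        for n in pred:
            if n == entry:
                continue  # EARLIEST(s) is true by definition
            new = any((not transp[m]) or ((not dsafe[m]) and early[m])
                      for m in pred[n])
            if new != early[n]:
                early[n] = new
                changed = True
    return early


pred = {"s": [], "n1": ["s"], "n2": ["s"], "n3": ["n2"],
        "n4": ["n1", "n3"], "e": ["n4"]}
transp = {"s": True, "n1": True, "n2": False,
          "n3": True, "n4": True, "e": True}
# D-SAFE values for this graph, worked out by hand for the example.
dsafe = {"s": False, "n1": True, "n2": False,
         "n3": True, "n4": True, "e": False}
result = earliest(pred, transp, dsafe, "s")
safe_earliest = {n for n in pred if dsafe[n] and result[n]}
```

The nodes that are both D-Safe and Earliest here are n1 and n3, so the Safe-Earliest placement would compute a + b at the entry of those two nodes.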

05 May 2006 23

Safe-Earliest Transformation

Introduce a new auxiliary variable h for the term t.

Insert at the entry of every node n that is both D-Safe and Earliest the assignment h := t.

Replace every original computation of t by h.

The nodes that are both D-Safe and Earliest are computationally optimal computation points.

[Figure: the example flow graph after the Safe-Earliest transformation, with h := a + b inserted at the D-Safe & Earliest nodes and the original computations replaced by x := h, y := h, w := h and z := h; a := c is unchanged]

05 May 2006 24

An Example…

[Figure: example flow graph with nodes z := a + b, x := a + b, y := a + b, w := a + b, a := c, x := a + b]

The Lazy Code Motion Approach:

Identifies the earliest optimal computation points based on predicate values determined through a series of forward and backward analyses.

Identifies computation points that allow variables to be initialized “as late as possible”.

Replaces original computations with auxiliary variables that are initialized at identified computation points… But only when there is computational gain.

05 May 2006 25

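Delay

A node n is DELAY if on every path from s to n there is a computation of the Safe-Earliest Transform such that all subsequent original computations lie in n.

\[
\text{DELAY}(n) = \bigl(\text{D-SAFE}(n) \land \text{EARLIEST}(n)\bigr) \lor
\begin{cases}
\mathit{false} & \text{if } n = s \\
\prod_{m \in \mathit{pred}(n)} \bigl(\lnot\mathrm{Used}(m) \land \text{DELAY}(m)\bigr) & \text{otherwise}
\end{cases}
\]

[Figure: example flow graph (z := a + b, x := a + b, y := a + b, w := a + b, a := c, x := a + b) with the Delay and D-Safe & Earliest nodes marked]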

05 May 2006 26

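Latest

A node n is LATEST if…

n is a computation point of some computationally optimal placement.

On every terminating path starting in n, any subsequent optimal computation point follows an original computation.

\[
\text{LATEST}(n) =
\begin{cases}
\mathit{false} & \text{if } n = e \\
\text{DELAY}(n) \land \Bigl(\mathrm{Used}(n) \lor \lnot \prod_{m \in \mathit{succ}(n)} \text{DELAY}(m)\Bigr) & \text{otherwise}
\end{cases}
\]

[Figure: example flow graph (z := a + b, x := a + b, y := a + b, w := a + b, a := c, x := a + b) with the Delay and Latest nodes marked]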

05 May 2006 28

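Isolated

A node n is ISOLATED if on every terminating path starting from a successor of n, any original computation of t is preceded by a new, latest computation.

\[
\text{ISOLATED}(n) =
\begin{cases}
\mathit{true} & \text{if } n = e \\
\prod_{m \in \mathit{succ}(n)} \Bigl(\text{LATEST}(m) \lor \bigl(\lnot\mathrm{Used}(m) \land \text{ISOLATED}(m)\bigr)\Bigr) & \text{otherwise}
\end{cases}
\]

[Figure: example flow graph (z := a + b, x := a + b, y := a + b, w := a + b, a := c, x := a + b) with the Isolated and Latest nodes marked]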

05 May 2006 29

Lazy Code Motion Transformation

Set of Optimal Computation Points for t:

\[ \mathit{OCP} = \{\, n \mid \text{LATEST}(n) \land \lnot\text{ISOLATED}(n) \,\} \]

Set of Redundant Occurrences of t:

\[ \mathit{RO} = \{\, n \mid \mathrm{Used}(n) \land \lnot\bigl(\text{LATEST}(n) \land \text{ISOLATED}(n)\bigr) \,\} \]

Introduce a new auxiliary variable h for the term t.

Insert at the entry of every node in OCP the assignment h := t.

Replace every original computation of t in nodes of RO by h.

[Figure: the example flow graph after the LCM transformation, with nodes z := h, x := a + b, h := a + b, y := h, w := h, a := c, x := a + b, h := a + b; Latest and Latest & Isolated nodes are marked]
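The remaining steps (DELAY, LATEST, ISOLATED, then OCP and RO) can be chained as in the sketch below, which follows the equations as they appear on these slides. It is an illustration, not the project's implementation: the six-node flow graph is made up (n1 is x := a + b, n2 is a := c, n3 is empty, n4 is y := a + b), the Used, D-SAFE and EARLIEST values were worked out by hand for it, and the names `greatest_fixpoint` and `lcm_placement` are hypothetical.

```python
def greatest_fixpoint(nodes, step):
    """Start every node at True and lower values until stable."""
    val = {n: True for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new = step(n, val)
            if new != val[n]:
                val[n] = new
                changed = True
    return val


def lcm_placement(succ, used, dsafe, earliest, entry, exit_node):
    nodes = list(succ)
    pred = {n: [] for n in nodes}
    for n in nodes:
        for m in succ[n]:
            pred[m].append(n)

    # DELAY: a safe-earliest computation can be postponed to n.
    delay = greatest_fixpoint(nodes, lambda n, d: (
        (dsafe[n] and earliest[n])
        or (n != entry and all(not used[m] and d[m] for m in pred[n]))))

    # LATEST: n is delayable but the delay cannot continue past n.
    latest = {n: n != exit_node and delay[n]
                 and (used[n] or not all(delay[m] for m in succ[n]))
              for n in nodes}

    # ISOLATED: a value computed at n would serve no later use.
    isolated = greatest_fixpoint(nodes, lambda n, i: (
        n == exit_node
        or all(latest[m] or (not used[m] and i[m]) for m in succ[n])))

    ocp = {n for n in nodes if latest[n] and not isolated[n]}
    ro = {n for n in nodes if used[n] and not (latest[n] and isolated[n])}
    return ocp, ro


succ = {"s": ["n1", "n2"], "n1": ["n4"], "n2": ["n3"],
        "n3": ["n4"], "n4": ["e"], "e": []}
used = {"s": False, "n1": True, "n2": False,
        "n3": False, "n4": True, "e": False}
dsafe = {"s": False, "n1": True, "n2": False,
         "n3": True, "n4": True, "e": False}
earliest = {"s": True, "n1": True, "n2": True,
            "n3": True, "n4": False, "e": False}
ocp, ro = lcm_placement(succ, used, dsafe, earliest, "s", "e")
```

For this graph h := a + b is inserted at the entries of n1 and n3, and the computations in n1 and n4 are replaced by h, so a + b is evaluated once on every path.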

05 May 2006 30

Considerations

Register pressure is not always reduced.

Some desirable code motion is not allowed.

Code size can be increased.

05 May 2006 31

Reducing Register Pressure?

How do the live ranges of a and b affect “lifetime optimality”?

[Figure: two transformed flow graphs built from the nodes h := a + b, x := a + b, x := h, y := h, w := h and z := h]

05 May 2006 32

Code Motion & Down-Safety

Some desirable code motion is not D-Safe and therefore not allowed.

[Figure: flow graphs with nodes w := a + b, w := h and h := a + b]

05 May 2006 33

Code Bloat

Late placement of computation points increases code size.

[Figure: flow graphs in which h := a + b is inserted before several occurrences of y := h]

05 May 2006 35

Equations 1

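\[
\text{D-SAFE}(n) =
\begin{cases}
\mathit{false} & \text{if } n = e \\
\mathrm{Used}(n) \lor \Bigl(\mathrm{Transp}(n) \land \prod_{m \in \mathit{succ}(n)} \text{D-SAFE}(m)\Bigr) & \text{otherwise}
\end{cases}
\]

\[
\text{EARLIEST}(n) =
\begin{cases}
\mathit{true} & \text{if } n = s \\
\sum_{m \in \mathit{pred}(n)} \Bigl(\lnot\mathrm{Transp}(m) \lor \bigl(\lnot\text{D-SAFE}(m) \land \text{EARLIEST}(m)\bigr)\Bigr) & \text{otherwise}
\end{cases}
\]

For every node n ≡ v := t′ and every term t ∈ T \ V…

\[ \mathrm{Used}(n, t) = \bigl(t \in \mathrm{SubTerms}(t')\bigr) \]

\[ \mathrm{Transp}(n, t) = \bigl(v \notin \mathrm{Var}(t)\bigr) \]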

05 May 2006 36

Equations 2

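\[
\text{DELAY}(n) = \bigl(\text{D-SAFE}(n) \land \text{EARLIEST}(n)\bigr) \lor
\begin{cases}
\mathit{false} & \text{if } n = s \\
\prod_{m \in \mathit{pred}(n)} \bigl(\lnot\mathrm{Used}(m) \land \text{DELAY}(m)\bigr) & \text{otherwise}
\end{cases}
\]

\[
\text{LATEST}(n) =
\begin{cases}
\mathit{false} & \text{if } n = e \\
\text{DELAY}(n) \land \Bigl(\mathrm{Used}(n) \lor \lnot \prod_{m \in \mathit{succ}(n)} \text{DELAY}(m)\Bigr) & \text{otherwise}
\end{cases}
\]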

05 May 2006 37

Equations 3

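\[
\text{ISOLATED}(n) =
\begin{cases}
\mathit{true} & \text{if } n = e \\
\prod_{m \in \mathit{succ}(n)} \Bigl(\text{LATEST}(m) \lor \bigl(\lnot\mathrm{Used}(m) \land \text{ISOLATED}(m)\bigr)\Bigr) & \text{otherwise}
\end{cases}
\]

\[ \mathit{RO} = \{\, n \mid \mathrm{Used}(n) \land \lnot\bigl(\text{LATEST}(n) \land \text{ISOLATED}(n)\bigr) \,\} \]

\[ \mathit{OCP} = \{\, n \mid \text{LATEST}(n) \land \lnot\text{ISOLATED}(n) \,\} \]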
