Course Outline
• Traditional Static Program Analysis
  – Theory
    • Compiler Optimizations; Control Flow Graphs
    • Data-flow Analysis – today's class
  – Classic analyses and applications
• Software Testing
• Dynamic Program Analysis
Outline
• The four classical data-flow problems
  – Reaching definitions
  – Live variables
  – Available expressions
  – Very busy expressions
• Data-flow frameworks
• Reading: Compilers: Principles, Techniques and Tools, by Aho, Lam, Sethi and Ullman, Chapter 9.2
Four Classical Data-flow Problems
• Reaching definitions (Reach)
• Live uses of variables (Live)
• Available expressions (Avail)
• Very busy expressions (VeryB)
• Def-use chains, built from Reach, and the dual use-def chains, built from Live, play a role in many optimizations
• Avail enables global common subexpression elimination
• VeryB is used for conservative code motion
Classical Data-flow Problems
• How to formulate the analysis using data-flow equations defined on the control flow graph?
• Forward and backward data-flow problems
• May and must data-flow problems
Forward: $out(i) = gen(i) \cup (in(i) - kill(i))$
Backward: $in(i) = gen(i) \cup (out(i) - kill(i))$
Problem 1: Reaching Definitions
• For each CFG node n, compute the set of definitions that reach n.
$inRD(i) = \bigcup \{ outRD(j) \mid j \text{ is a predecessor of } i \}$
$outRD(i) = gen(i) \cup (inRD(i) - kill(i))$
For a node j: a=b+c
kill(j): all definitions of a
gen(j): this definition of a, (a,j)
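The gen/kill rule above can be sketched in Python as follows. This is a minimal illustration, assuming a definition is modeled as a (variable, node) pair as on the slides; the helper names `gen_kill` and `out_rd` are hypothetical. The sample definitions are those of the example program that follows.

```python
from typing import Optional, Set, Tuple

Definition = Tuple[str, int]  # (variable, node), e.g. ("a", 7) for "7: a = b + c"


def gen_kill(node: int, var: Optional[str],
             all_defs: Set[Definition]) -> Tuple[Set[Definition], Set[Definition]]:
    """Return (gen, kill) for a CFG node that assigns `var` (None if it assigns nothing)."""
    if var is None:
        return set(), set()
    gen = {(var, node)}
    # kill(j) is every definition of var in the program, including this one;
    # the union with gen in out_rd puts this node's own definition back.
    kill = {d for d in all_defs if d[0] == var}
    return gen, kill


def out_rd(in_rd: Set[Definition], gen: Set[Definition],
           kill: Set[Definition]) -> Set[Definition]:
    """Forward transfer function: outRD(i) = gen(i) ∪ (inRD(i) − kill(i))."""
    return gen | (in_rd - kill)


# Definitions of the example program: (x,1), (y,2), (y,4), (x,5).
ALL_DEFS = {("x", 1), ("y", 2), ("y", 4), ("x", 5)}
gen4, kill4 = gen_kill(4, "y", ALL_DEFS)  # node 4: y := x*y
out4 = out_rd({("x", 1), ("x", 5), ("y", 2), ("y", 4)}, gen4, kill4)
print(sorted(out4))  # [('x', 1), ('x', 5), ('y', 4)]
```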
Example
1. x:=read()
2. y:=1
3. if x<2 then
4. y:=x*y
5. x:=x-1
6. goto 3
7. …
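$inRD(1) = \emptyset$
$inRD(2) = outRD(1)$
$inRD(3) = outRD(2) \cup outRD(6)$
$inRD(4) = outRD(3)$
$inRD(5) = outRD(4)$
$inRD(6) = outRD(5)$
$inRD(7) = outRD(3)$
$outRD(1) = (inRD(1) - D_x) \cup \{(x,1)\}$
$outRD(2) = (inRD(2) - D_y) \cup \{(y,2)\}$
$outRD(3) = inRD(3)$
$outRD(4) = (inRD(4) - D_y) \cup \{(y,4)\}$
$outRD(5) = (inRD(5) - D_x) \cup \{(x,5)\}$
$outRD(6) = inRD(6)$
where $D_x = \{(x,1),(x,5)\}$ and $D_y = \{(y,2),(y,4)\}$ are the sets of all definitions of x and y.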
Example: Solution
inRD(1) = Ø
outRD(1) = {(x,1)}
inRD(2) = {(x,1)}
outRD(2) = {(x,1),(y,2)}
inRD(3) = {(x,1),(x,5),(y,2),(y,4)}
outRD(3) = {(x,1),(x,5),(y,2),(y,4)}
inRD(4) = {(x,1),(x,5),(y,2),(y,4)}
outRD(4) = {(x,1),(x,5),(y,4)}
inRD(5) = {(x,1),(x,5),(y,4)}
outRD(5) = {(x,5),(y,4)}
inRD(6) = {(x,5),(y,4)}
outRD(6) = {(x,5),(y,4)}
inRD(7) = {(x,1),(x,5),(y,2),(y,4)}
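As a sanity check, the listed solution can be plugged back into the example's equations. The sketch below assumes the CFG edges 1→2→3, 3→4→5→6→3 and 3→7 read off from those equations, and reports any equation the sets fail to satisfy. This only confirms a fixed point; that it is the smallest one is what an iterative solver starting from empty sets guarantees. The name `violated_equations` is hypothetical.

```python
# Definitions of the example program, written as (variable, node) pairs.
X1, Y2, Y4, X5 = ("x", 1), ("y", 2), ("y", 4), ("x", 5)
D_X, D_Y = {X1, X5}, {Y2, Y4}  # all definitions of x and of y

PREDS = {1: [], 2: [1], 3: [2, 6], 4: [3], 5: [4], 6: [5], 7: [3]}
GEN = {1: {X1}, 2: {Y2}, 3: set(), 4: {Y4}, 5: {X5}, 6: set()}
KILL = {1: D_X, 2: D_Y, 3: set(), 4: D_Y, 5: D_X, 6: set()}

# The solution listed above.
ALL = {X1, X5, Y2, Y4}
IN_RD = {1: set(), 2: {X1}, 3: ALL, 4: ALL, 5: {X1, X5, Y4}, 6: {X5, Y4}, 7: ALL}
OUT_RD = {1: {X1}, 2: {X1, Y2}, 3: ALL, 4: {X1, X5, Y4}, 5: {X5, Y4}, 6: {X5, Y4}}


def violated_equations():
    """Return the names of the equations that the listed solution fails to satisfy."""
    bad = []
    for n, preds in PREDS.items():
        # inRD(n) is the union of outRD over the predecessors of n.
        if IN_RD[n] != set().union(*(OUT_RD[p] for p in preds)):
            bad.append(f"inRD({n})")
    for n in OUT_RD:
        # outRD(n) = gen(n) ∪ (inRD(n) − kill(n))
        if OUT_RD[n] != GEN[n] | (IN_RD[n] - KILL[n]):
            bad.append(f"outRD({n})")
    return bad


print(violated_equations())  # []
```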
Reaching Definitions
[Figure: node j with immediate predecessors m1, m2, m3; inRD(j) is computed from inRD(m1), inRD(m2), inRD(m3).]
Forward, may dataflow problem
Equivalent Equations
$$inRD(j) = \bigcup_{m \in pred(j)} \big( (inRD(m) \cap pres(m)) \cup gen(m) \big)$$
where:
pres(m) is the set of definitions preserved through node m
gen(m) is the set of definitions generated at node m
pred(j) is the set of immediate predecessors of node j
Problem 2: Live Uses of Variables
• For each node n, compute the set of variables live on exit from n.
$outLV(i) = \bigcup \{ inLV(j) \mid j \text{ is a successor of } i \}$
$inLV(i) = gen(i) \cup (outLV(i) - kill(i))$
1. x:=2; 2. y:=4; 3. x:=1; 4. if (y>x) then 5. z:=y; else 6. z:=y*y; 7. x:=z;
What variables are live on exit from statement 1? Statement 3?
For a node i: x = y+z
Q: What is gen(i)? Q: What is kill(i)?
Example
1. x:=2
2. y:=4
3. x:=1
4. if (y>x)
5. z:=y (then branch)
6. z:=y*y (else branch)
7. x := z
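The live-variable equations can be solved for this example with a short round-robin iteration. This sketch assumes no variable is live after statement 7 (the program is taken to end there), takes gen(i) to be the variables used at i and kill(i) the variable assigned at i (so for x = y+z, gen = {y, z} and kill = {x}), and uses the hypothetical name `live_variables`.

```python
# Control flow graph of the example: 4 branches to 5 and 6, which both reach 7.
SUCC = {1: [2], 2: [3], 3: [4], 4: [5, 6], 5: [7], 6: [7], 7: []}
# gen(i): variables used at i; kill(i): variable assigned at i.
GEN = {1: set(), 2: set(), 3: set(), 4: {"x", "y"}, 5: {"y"}, 6: {"y"}, 7: {"z"}}
KILL = {1: {"x"}, 2: {"y"}, 3: {"x"}, 4: set(), 5: {"z"}, 6: {"z"}, 7: {"x"}}


def live_variables(succ, gen, kill):
    """Round-robin solver for the backward, may problem of live variables."""
    in_lv = {n: set() for n in succ}
    out_lv = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in reversed(list(succ)):  # backward problem: visit later nodes first
            # outLV(n) = union of inLV over successors (empty at the last node 7)
            new_out = set().union(*(in_lv[s] for s in succ[n]))
            # inLV(n) = gen(n) ∪ (outLV(n) − kill(n))
            new_in = gen[n] | (new_out - kill[n])
            if new_out != out_lv[n] or new_in != in_lv[n]:
                out_lv[n], in_lv[n] = new_out, new_in
                changed = True
    return in_lv, out_lv


in_lv, out_lv = live_variables(SUCC, GEN, KILL)
print(sorted(out_lv[1]), sorted(out_lv[3]))  # [] ['x', 'y']
```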
Live Uses of Variables
[Figure: node j with immediate successors m1, m2, m3; outLV(j) is computed from outLV(m1), outLV(m2), outLV(m3).]
Backward, may dataflow problem
Equivalent equations
$$outLV(j) = \bigcup_{m \in succ(j)} \big( (outLV(m) \cap pres(m)) \cup gen(m) \big)$$
where:
pres(m) is the set of uses preserved through node m (roughly, the uses of variables that are not defined at m)
gen(m) is the set of uses generated at node m
succ(j) is the set of immediate successors of node j
Problem 3: Available Expressions
• An expression X op Y is available at node n if every path from entry to n evaluates X op Y, and after the last such evaluation on each path there are NO subsequent assignments to X or Y before reaching n.
[Figure: paths ρ from entry to node n, each containing X op Y and assignments X = …, Y = …]
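Global Common Subexpressions
[Figure: control flow graph whose basic blocks contain:]
z=a*b; r=2*z
q=a*b
u=a*b; z=u/2
w=a*b
Global Common Subexpressions (after elimination)
[Figure: the same control flow graph after the transformation:]
t1=a*b; z=t1; r=2*z
t1=a*b; q=t1
u=t1; z=u/2
w=a*b
Can we eliminate w=a*b?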
Available Expressions
[Figure: node j with immediate predecessors m1, m2, m3; inAE(j) is computed from inAE(m1), inAE(m2), inAE(m3).]
Forward, must dataflow problem
For a node j: x=y+z
inAE(j) = ?
outAE(j) = ?
gen(j) = ?
kill(j) = ?
Example
1. x = a + b
2. y = a * b
3. if y <= a + b then goto 7
4. a = a + 1
5. x = a + b
6. goto 3
7. …
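A minimal sketch for this example, assuming the jump at 3 goes to 7 when taken and falls through to 4 otherwise, that node 7 (elided as "…") neither generates nor kills anything, and that the expression universe is {a+b, a*b, a+1}. The result shows that a+b is available on entry to statement 3 along both the path from 2 and the back edge from 6, so its recomputation there is redundant, while a*b is not available because a = a + 1 kills it inside the loop. The name `available_expressions` is hypothetical.

```python
U = {"a+b", "a*b", "a+1"}  # all expressions in the example
PREDS = {1: [], 2: [1], 3: [2, 6], 4: [3], 5: [4], 6: [5], 7: [3]}
GEN = {1: {"a+b"}, 2: {"a*b"}, 3: {"a+b"}, 4: set(), 5: {"a+b"}, 6: set(), 7: set()}
# a = a + 1 kills every expression that mentions a (including a+1 itself).
KILL = {1: set(), 2: set(), 3: set(), 4: set(U), 5: set(), 6: set(), 7: set()}


def available_expressions(preds, gen, kill, universe, entry=1):
    """Round-robin solver for the forward, must problem of available expressions."""
    in_ae = {n: set() for n in preds}
    out_ae = {n: set(universe) for n in preds}  # must problem: start from the full set
    changed = True
    while changed:
        changed = False
        for n in preds:
            if n == entry:
                new_in = set()  # nothing is available on entry
            else:
                new_in = set(universe)
                for p in preds[n]:
                    new_in &= out_ae[p]  # intersection over all predecessors
            new_out = gen[n] | (new_in - kill[n])
            if new_in != in_ae[n] or new_out != out_ae[n]:
                in_ae[n], out_ae[n] = new_in, new_out
                changed = True
    return in_ae, out_ae


in_ae, out_ae = available_expressions(PREDS, GEN, KILL, U)
print(sorted(in_ae[3]))  # ['a+b']
```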
Problem 4: Very Busy Expressions
• An expression X op Y is very busy at node n, if along EVERY path from n to the end of the program, we come to a computation of X op Y BEFORE any redefinition of X or Y.
[Figure: paths from node n to the end of the program, each containing assignments X = …, Y = … and a computation t1=X op Y]
Very Busy Expressions
[Figure: node j with immediate successors m1, m2, m3; outVB(j) is computed from outVB(m1), outVB(m2), outVB(m3).]
Very Busy Expressions
$$outVB(j) = \bigcap_{m \in succ(j)} \big( (outVB(m) \cap pres(m)) \cup gen(m) \big)$$
where:
pres(m) is the set of expressions preserved through node m
gen(m) is the set of expressions generated at node m
succ(j) is the set of immediate successors of node j
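The very-busy equation can be solved the same way. The CFG below is a hypothetical example, not from the slides: node 1 branches to 2 (x = a + b) and 3 (y = a * b), both of which flow to 4 (z = a * b), the last node. Only a*b comes out very busy on exit from 1, since a+b is computed on just one of the two paths; this is the kind of fact a conservative code-motion pass could use to hoist a*b above the branch. The name `very_busy` is hypothetical.

```python
U = {"a+b", "a*b"}
# Hypothetical CFG: 1: if c > 0 -> {2, 3}; 2: x = a + b; 3: y = a * b; 4: z = a * b (end)
SUCC = {1: [2, 3], 2: [4], 3: [4], 4: []}
GEN = {1: set(), 2: {"a+b"}, 3: {"a*b"}, 4: {"a*b"}}
# No node assigns a or b, so every expression is preserved everywhere.
PRES = {n: set(U) for n in SUCC}


def very_busy(succ, gen, pres, universe):
    """Solve outVB(j) = ∩ over m in succ(j) of ((outVB(m) ∩ pres(m)) ∪ gen(m))."""
    # Must problem: start at the full set, except at the last node where
    # nothing follows and so nothing is very busy.
    out_vb = {n: set(universe) if succ[n] else set() for n in succ}
    changed = True
    while changed:
        changed = False
        for j in reversed(list(succ)):  # backward problem: visit later nodes first
            if not succ[j]:
                continue
            new = set(universe)
            for m in succ[j]:
                new &= (out_vb[m] & pres[m]) | gen[m]
            if new != out_vb[j]:
                out_vb[j] = new
                changed = True
    return out_vb


out_vb = very_busy(SUCC, GEN, PRES, U)
print(sorted(out_vb[1]))  # ['a*b']
```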
Dataflow Problems
                     May Problems              Must Problems
Forward Problems     Reaching Definitions      Available Expressions
Backward Problems    Live Uses of Variables    Very Busy Expressions
Similarities
• There is a finite set, U, of data-flow facts:
  – Reaching Definitions: the set of all definitions, e.g., {(x,1),(y,2),(x,4),(y,5)}
  – Available Expressions and Very Busy Expressions: the set of all arithmetic expressions, e.g., {a+b, a*b, a+1}
  – Live Uses: the set of all variables, e.g., {x, y, z}
• The solution at a node is a subset of U (e.g., every definition either reaches node i or does not).
Similarities
• Equations (i.e., transfer functions) always have the form:
$out(i) = F_i(in(i)) = (in(i) - kill(i)) \cup gen(i) = (in(i) \cap pres(i)) \cup gen(i)$
where $pres(i) = U - kill(i)$ (for backward problems, the roles of in and out are exchanged).
A note: what makes the 4 classical problems special is that the sets pres(i) and gen(i) are constants, i.e., they do not depend on in(i).
• Set union and set intersection can be implemented as logical OR and AND, respectively.
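With bit strings, the transfer function and the meet become single integer operations. A minimal sketch, assuming Python integers as bit vectors and the five-bit layout i1, k1, k4, k5, i6 of the bitvector example later in the lecture; `bits`, `show`, and `transfer` are hypothetical helper names. A may problem combines predecessors with OR, a must problem with AND.

```python
WIDTH = 5  # one bit per definition: i1, k1, k4, k5, i6 (leftmost bit first)


def bits(s: str) -> int:
    """Parse a bit string such as "10001" into an integer."""
    return int(s, 2)


def show(v: int) -> str:
    """Format an integer as a WIDTH-bit string."""
    return format(v, f"0{WIDTH}b")


def transfer(in_bits: int, pres: int, gen: int) -> int:
    """out = (in ∩ pres) ∪ gen, with ∩ as bitwise AND and ∪ as bitwise OR."""
    return (in_bits & pres) | gen


# Blocks B4 (k:=k-1) and B5 (k:=k+1) with every definition reaching them.
out_b4 = transfer(bits("11111"), bits("10001"), bits("00100"))
out_b5 = transfer(bits("11111"), bits("10001"), bits("00010"))
# B6 has predecessors B4 and B5; reaching definitions is a may problem, so OR.
in_b6 = out_b4 | out_b5
print(show(out_b4), show(out_b5), show(in_b6))  # 10101 10011 10111
```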
The worklist algorithm for data-flow analysis: Reaching Definitions
change = true;
Initialize inRD(m) = Ø for m = 2…n
inRD(1) = UNDEF
while (change) do {
    change = false;
    while (∃ j s.t. inRD(j) ≠ ∪_{m ∈ pred(j)} ((inRD(m) ∩ pres(m)) ∪ gen(m))) {
        inRD(j) = ∪_{m ∈ pred(j)} ((inRD(m) ∩ pres(m)) ∪ gen(m));
        change = true;
    }
}
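A minimal Python sketch of this iteration, applied to the reaching-definitions example from earlier (statements 1 to 7). It assumes the entry node's UNDEF can be treated as the empty set, since node 1 has no predecessors, and it replaces "pick any j whose equation fails" with a sweep over all nodes per pass, which reaches the same fixed point. The name `reaching_round_robin` is hypothetical.

```python
ALL_DEFS = {("x", 1), ("y", 2), ("y", 4), ("x", 5)}
PREDS = {1: [], 2: [1], 3: [2, 6], 4: [3], 5: [4], 6: [5], 7: [3]}
ASSIGNS = {1: "x", 2: "y", 4: "y", 5: "x"}  # variable defined at each node, if any

# gen(m): the definition made at m; pres(m): definitions of other variables.
GEN = {n: {(ASSIGNS[n], n)} if n in ASSIGNS else set() for n in PREDS}
PRES = {n: {d for d in ALL_DEFS if d[0] != ASSIGNS.get(n)} for n in PREDS}


def reaching_round_robin(preds, gen, pres):
    """Iterate inRD(j) = ∪ over m in pred(j) of ((inRD(m) ∩ pres(m)) ∪ gen(m))."""
    in_rd = {n: set() for n in preds}  # every set starts empty
    change = True
    while change:
        change = False
        for j in preds:
            new = set()
            for m in preds[j]:
                new |= (in_rd[m] & pres[m]) | gen[m]
            if new != in_rd[j]:
                in_rd[j] = new
                change = True
    return in_rd


in_rd = reaching_round_robin(PREDS, GEN, PRES)
print(sorted(in_rd[6]))  # [('x', 5), ('y', 4)]
```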
A Better Algorithm
/* initially all inRD sets are empty */
for m := 2 to n do inRD(m) := Ø;
inRD(1) = UNDEF
W := {1,2,…,n}   /* put every node on the worklist */
while W ≠ Ø do {
    remove j from W;
    new = ∪_{m ∈ pred(j)} ((inRD(m) ∩ pres(m)) ∪ gen(m));
    if new ≠ inRD(j) then {
        inRD(j) = new;
        for k ∈ succ(j) do add k to W
    }
}
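A minimal sketch of the worklist version, run on the bitvector example (blocks B1 to B6) that appears later in the lecture. It assumes the edges B1→B2, B2→B3, B3→B4, B3→B5, B4→B6, B5→B6 and B6→B2 (the edge from B2 to exit is omitted because exit holds no definitions), Python integers as bit vectors, and a FIFO queue that never holds a node twice, which matches treating W as a set. The name `worklist_rd` is hypothetical.

```python
from collections import deque

# Bits, leftmost first: i1, k1, k4, k5, i6.
PRED = {1: [], 2: [1, 6], 3: [2], 4: [3], 5: [3], 6: [4, 5]}
SUCC = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: [2]}
PRES = {1: 0b00000, 2: 0b11111, 3: 0b11111, 4: 0b10001, 5: 0b10001, 6: 0b01110}
GEN = {1: 0b11000, 2: 0b00000, 3: 0b00000, 4: 0b00100, 5: 0b00010, 6: 0b00001}


def worklist_rd(pred, succ, pres, gen):
    in_rd = {n: 0 for n in pred}  # all inRD sets start empty
    work = deque(pred)            # put every node on the worklist
    queued = set(pred)
    while work:
        j = work.popleft()
        queued.discard(j)
        new = 0
        for m in pred[j]:
            new |= (in_rd[m] & pres[m]) | gen[m]
        if new != in_rd[j]:
            in_rd[j] = new
            for k in succ[j]:     # only successors of a changed node need revisiting
                if k not in queued:
                    work.append(k)
                    queued.add(k)
    return in_rd


in_rd = worklist_rd(PRED, SUCC, PRES, GEN)
print({f"B{n}": format(v, "05b") for n, v in in_rd.items()})
# {'B1': '00000', 'B2': '11111', 'B3': '11111', 'B4': '11111', 'B5': '11111', 'B6': '10111'}
```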
An Implementation
• Use a bitstring representation for sets: 1 bit position per variable definition
For each control flow graph node j:
pres(j)
  – has 0 in bit positions corresponding to definitions of variables defined at node j
  – has 1 in bit positions corresponding to definitions of variables not defined at node j
gen(j)
  – has 1 in bit positions corresponding to definitions at node j
  – has 0 in bit positions for all other definitions (i.e., definitions not at node j)
Detailed Algorithm
W = empty                                // initialize the worklist
for (i = 1; i < n+1; i++)                // i varies over nodes
    for (j = 1; j < m+1; j++) {          // j varies over definitions
        if (∃ k ∈ pred(i) with j ∈ gen(k)) then
            { set j bit to 1 in inRD(i); add (j,i) to W }
        else { set j bit to 0 in inRD(i); }
    }
while (W not empty) do {
    remove (j,i) from W
    if (j ∈ pres(i)) then {
        for (k ∈ succ(i))
            if (j bit in inRD(k) == 0) then { set j bit to 1 in inRD(k); add (j,k) to W }
    }
}
First loop (for) passes gen sets to successors.
Second loop (while) performs worklist propagation.
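The two loops can be sketched in Python on the B1 to B6 example that follows, assuming the edges B1→B2, B2→B3, B3→B4, B3→B5, B4→B6, B5→B6, B6→B2, integer bit vectors, and a FIFO worklist of (definition, node) pairs. With this visiting order the initialization reproduces the worklist {(i1,2), (k1,2), (i6,2), (k4,6), (k5,6)} shown in the propagation trace. The helper names `bit`, `initialize`, and `propagate` are hypothetical.

```python
from collections import deque

DEFS = ["i1", "k1", "k4", "k5", "i6"]  # bit order on the slides, leftmost first
PRED = {1: [], 2: [1, 6], 3: [2], 4: [3], 5: [3], 6: [4, 5]}
SUCC = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: [2]}
PRES = {1: 0b00000, 2: 0b11111, 3: 0b11111, 4: 0b10001, 5: 0b10001, 6: 0b01110}
GEN = {1: 0b11000, 2: 0b00000, 3: 0b00000, 4: 0b00100, 5: 0b00010, 6: 0b00001}


def bit(d):
    """Mask for definition d (e.g. "k4") in the five-bit vector."""
    return 1 << (len(DEFS) - 1 - DEFS.index(d))


def initialize(pred, gen):
    """First loop: set bit d in inRD(i) if some predecessor of i generates d."""
    in_rd = {i: 0 for i in pred}  # bits not set here stay 0
    work = deque()
    for i in pred:
        for d in DEFS:
            if any(gen[k] & bit(d) for k in pred[i]):
                in_rd[i] |= bit(d)
                work.append((d, i))
    return in_rd, work


def propagate(in_rd, work, succ, pres):
    """Second loop: push definition d past node i to its successors if i preserves d."""
    while work:
        d, i = work.popleft()
        if pres[i] & bit(d):
            for k in succ[i]:
                if not (in_rd[k] & bit(d)):
                    in_rd[k] |= bit(d)
                    work.append((d, k))
    return in_rd


in_rd, work = initialize(PRED, GEN)
print(list(work))  # [('i1', 2), ('k1', 2), ('i6', 2), ('k4', 6), ('k5', 6)]
in_rd = propagate(in_rd, work, SUCC, PRES)
print({f"B{n}": format(v, "05b") for n, v in in_rd.items()})
# {'B1': '00000', 'B2': '11111', 'B3': '11111', 'B4': '11111', 'B5': '11111', 'B6': '10111'}
```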
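Example, Bitvector Calculation
[Figure: control flow graph with edges B1→B2, B2→B3, B3→B4, B3→B5, B4→B6, B5→B6, B6→B2; B2 also branches to exit]
B1: i=0; k=0   definitions (i,1), (k,1)
B2: i<0
B3: mod(i,3) = 0?
B4: k:=k-1   definition (k,4)
B5: k:=k+1   definition (k,5)
B6: i:=i+1   definition (i,6)
Definitions and basic blocks are given unique identifiers.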
Initialization
        B1     B2     B3     B4     B5     B6
pres:   00000  11111  11111  10001  10001  01110
gen:    11000  00000  00000  00100  00010  00001
Bits: i1, k1, k4, k5, i6
After Initialization Loop
inRD: B1 = 00000, B2 = 11001, B3 = 00000, B4 = 00000, B5 = 00000, B6 = 00110
Propagation Loop
Worklist W = {(i1,2), (k1,2), (i6,2), (k4,6), (k5,6)}, where each pair is (definition, node) and Reach(3) denotes inRD of block B3.
Choose (i1,2); pres(2) = 11111, so Reach(3) = 10000, and we add (i1,3) to W.
Then choose (k1,2) off W, set Reach(3) = 11000, and add (k1,3) to W.
Then choose (i6,2) off W, set Reach(3) = 11001, and add (i6,3) to W.
Now W = {(k4,6), (k5,6), (i1,3), (k1,3), (i6,3)}.
Iteration continues until the worklist is empty.
After Steps in Previous Slide
inRD: B1 = 00000, B2 = 11001, B3 = 11001, B4 = 00000, B5 = 00000, B6 = 00110
After Steps in Previous Slide
inRD: B1 = 00000, B2 = 11111, B3 = 11001, B4 = 00000, B5 = 00000, B6 = 00110
After Steps in Previous Slide
inRD: B1 = 00000, B2 = 11111, B3 = 11001, B4 = 11001, B5 = 11001, B6 = 00110
Solution (skipping some steps)
inRD: B1 = 00000, B2 = 11111, B3 = 11111, B4 = 11111, B5 = 11111, B6 = 10111