new results in program slicing
1
New Results in Program Slicing
Aharon Abadi, Ran Ettinger, and Yishai Feldman
IBM Haifa Research Lab
2
Context
• The Programmer’s Apprentice
  – The Plan Calculus
• Bogart
• Midas
• Sliding
• Painless
  – Paz
  – Aderet
3
Improving Slice Accuracy by Compression of Data and Control Flow Paths
Presented at ESEC/FSE 2009
4
Program Slicing
[Figure: a program and its slice with respect to x := exp, both running from Start; at x := exp the slice produces the same sequence of values as the program]
5
Control-Flow Path Compression
Work in two stages:
- Compute the ‘traditional’ slice
  - Control dependences
  - Data dependences
- Compute the necessary branches to prevent infeasible control paths
[Figure: an unstructured program (test X; if-zero-go-to A; . . .; L: test Y; if-zero-; . . .; go-to L; B: . . .) next to its slice, which contains test X, if-zero-go-to A, go-to B, the statement labeled A (an assignment to Z), and the label B]
6
Control-Flow Path Compression
Limitations of previous approaches:
- insert the whole loop;
- add branches not from the program; or
- do not preserve behavior
This algorithm:
- preserves behavior
- yields a sub-program
- one version may turn conditional branches into unconditional ones (“rhetorization”)
[Figure: the program and slice from the previous slide (test X; if-zero-go-to A; . . .; L: test Y; if-zero-; . . .; go-to L; B: . . .), with the branches test X, if-zero-go-to A, and go-to B shown in the slice]
7
Data-Flow Path Compression
Program (slicing on Out: R0 := R7 + 1):
Start: R2 := 0
       R7 := exp1
Loop:  R2 := R2 + 1
       compare R2, R9
       if-not-less-go-to Out
       use R7
       Temp := R7    ; spill R7 to memory
       …             ; code that uses all registers
       R7 := Temp    ; restore R7
       go-to Loop
Out:   R0 := R7 + 1
Previous syntax-preserving algorithms insert the loop and the assignments inside it:
Start: R2 := 0
       R7 := exp1
Loop:  R2 := R2 + 1
       compare R2, R9
       if-not-less-go-to Out
       Temp := R7
       R7 := Temp
       go-to Loop
Out:   R0 := R7 + 1
The result is too large: the value of R7 does not depend on the loop.
Desired slice:
       R7 := exp1
Out:   R0 := R7 + 1
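Why the traditional slice pulls in the loop can be seen with a minimal Java sketch. It assumes a hand-written dependence graph for the register program above; the edges are an illustrative reading of the listing, not output of the authors' tool, and the class and method names are hypothetical. The slice is simply the backward closure over data and control dependences from the criterion. Jumps such as go-to Loop are not modeled here.

```java
import java.util.*;

class TraditionalSlice {
    // Assumed dependence graph: each statement maps to the statements it is
    // data- or control-dependent on (written by hand for this example).
    static Map<String, List<String>> dependences() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        String branch = "if-not-less-go-to Out";
        deps.put("R2:=0", List.of());
        deps.put("R7:=exp1", List.of());
        deps.put("R2:=R2+1", List.of("R2:=0", "R2:=R2+1", branch));
        deps.put("compare R2,R9", List.of("R2:=R2+1", branch));
        deps.put(branch, List.of("compare R2,R9"));
        deps.put("use R7", List.of("R7:=exp1", "R7:=Temp", branch));
        deps.put("Temp:=R7", List.of("R7:=exp1", "R7:=Temp", branch));
        deps.put("R7:=Temp", List.of("Temp:=R7", branch));
        deps.put("R0:=R7+1", List.of("R7:=exp1", "R7:=Temp"));
        return deps;
    }

    // Backward closure over dependence edges, starting at the criterion.
    static Set<String> backwardSlice(Map<String, List<String>> deps, String criterion) {
        Set<String> slice = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            String stmt = work.pop();
            if (slice.add(stmt)) {
                work.addAll(deps.getOrDefault(stmt, List.of()));
            }
        }
        return slice;
    }

    public static void main(String[] args) {
        System.out.println(backwardSlice(dependences(), "R0:=R7+1"));
    }
}
```

Because R7:=Temp reaches the criterion and is control-dependent on the loop branch, the closure drags in the spill, the restore, and the whole loop counter computation, even though none of them changes the value of R7.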
8
Control-Flow Path Compression
[Figure: flowchart of the program below, with decision nodes x<11, y<T, and x<9]
if (x<11)
x := x+1
goto A2
A1: if (y<T)
y := y-1
goto A1
goto A2
goto A4
x := x-1
A4: if (x<9) goto A3
A3: x := x+2
A2: print(x)
9
Compute the ‘Traditional’ Slice
[Figure: the flowchart and listing from the previous slide, with the nodes print(x), x:=x+1, x:=x+2, x:=x-1, x<11, x<9, and y<T marked]
10
Completing Control Flow Paths: Main Lemma
• precisely identifies the possible sets of branches that may be added to the slice
• any path in the original program can be chosen
• optimizations can be performed
All paths from the same point in the slice enter the slice at a single point
11
Compute the Necessary Branches
[Figure: the same flowchart and program listing as on the previous slides]
12
Data-Flow Path Compression
Slice:
       R7 := exp1
Out:   R0 := R7 + 1
[Figure: flow graph of the program from slide 7, with nodes R2:=0, R7:=exp1, R2:=R2+1, compare R2,R9, if-not-less, use R7, Temp:=R7, R7:=Temp, goto Loop, go-to Out, R0:=R7+1, and exit]
13
Data-Flow Path Compression
• R7, Temp carry the value of exp1
• Use data edges instead of variables
  – out data port holds the last value
  – in data port holds the next value
[Figure: flow graph of the program from slide 7, with data edges labeled d1 and d2]
• The Plan Calculus: The Programmer’s Apprentice, Rich and Waters, 1990
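The idea that R7 and Temp merely carry the value of exp1 can be illustrated with a small Java sketch. It is a simplification, not the plan-based algorithm of the talk: copy assignments are treated as transparent, and the reaching definitions of a use are followed through them until only real computations remain. The map of copy sources below is written by hand for the register program, and all names are hypothetical.

```java
import java.util.*;

class CopyCompression {
    // copySources: for each copy assignment, the definitions that may reach
    // the variable it copies. Definitions absent from the map are real
    // computations. Returns the real computations behind the given definitions.
    static Set<String> origins(Map<String, List<String>> copySources,
                               List<String> reachingDefs) {
        Set<String> origins = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(reachingDefs);
        while (!work.isEmpty()) {
            String def = work.pop();
            if (!seen.add(def)) {
                continue;                 // already visited (copy cycle in the loop)
            }
            List<String> sources = copySources.get(def);
            if (sources == null) {
                origins.add(def);         // a real computation: keep it
            } else {
                work.addAll(sources);     // a copy: look through it
            }
        }
        return origins;
    }

    public static void main(String[] args) {
        Map<String, List<String>> copies = new HashMap<>();
        copies.put("Temp:=R7", List.of("R7:=exp1", "R7:=Temp"));
        copies.put("R7:=Temp", List.of("Temp:=R7"));
        // Definitions of R7 that may reach "Out: R0:=R7+1".
        System.out.println(origins(copies, List.of("R7:=exp1", "R7:=Temp")));
    }
}
```

With the copies looked through, the only definition the criterion depends on is R7:=exp1, so neither the spill and restore nor the loop that controls them needs to enter the slice.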
14
[Figure: plan of the program from slide 7, from entry to exit, with data edges labeled R0, R2, R7, and R9 and the test compare R2,R9 (T/F); shown alongside the program listing and the slice R7 := exp1; Out: R0 := R7 + 1]
15
Decompression
[Figure: the plan from the previous slide, shown with the program listing and with the elements entry, R7 := exp1, go-to Out, Out: R0 := R7 + 1, and exit; the copy assignments Temp := R7 and R7 := Temp are shown separately]
16
Properties of the Slices
• Syntax preserving, possibly rhetorizing
• Behavior preserving
• Executable
• For structured programs
  – At least as accurate as previous algorithms
  – Strictly smaller in interesting cases
• For unstructured programs
  – Empirically shown to be superior
  – Modification of the algorithm guaranteed at least as accurate
17
Implementation
• A family of slicing algorithms
  – rhetorizing (*RB, *RM)
  – strictly syntax-preserving (*PB, *PM)
  – amorphous (*AB, *AM)
    • adds new branches (not from the program)
[Figure: example listings, including the branch A1: if(y<T) goto A2 and an unstructured program with labels A, L, B, and C and branches go-to C and go-to exit]
18
Empirical Study
• Corpus of 15 manually-written assembly-language modules from a large mainframe product
• 8578 non-comment source lines
• Computed slices from all lines
• 5801 non-empty slices
19
Empirical Results
Effect of | % slices better | % average decrease | % slices worse | % average decrease
Rhetorization: 177.5
Control path compression
  Lenient BH: 3017
  Strict BH: 9465
Data path compression
  implemented: 124815
  modified
20
Related Work
Behavior
  Preserve behavior: BH, CF1, Ag, HLB, *P, *R, *A
  May add infinite loops: HLB, HD
  Not executable: KH
Subset of the original program (for flat languages)
  Syntax-preserving: BH, CF1, Ag, HD, HLB, *P
  Rhetorizing: *R
  Amorphous: HLB, CF, *A
Comparison to traditional algorithm on structured programs
  Smaller than traditional: *P, *R, *A
  Equal to or larger than traditional: BH, CF1, Ag, HD, KH, HLB, CF2
BH: Ball & Horwitz 1993
CF: Choi & Ferrante 1994
Ag: Agrawal 1994
KH: Kumar & Horwitz 2002
HD: Harman & Danicic 1998
HLB: Harman, Lakhotia & Binkley 2006
21
Conclusions
• Two techniques for reducing slice size
  – Control-Flow Path Compression
    • Precise identification of all correct solutions
    • Shortest paths significantly improve slice accuracy
      – 17-22% improvement for 30-37% of the cases
  – Data-Flow Path Compression
    • Eliminates copy assignments
    • Yields significant improvement in a few cases
      – 24% improvement for 1% of the slices computed
• Strictly smaller even for structured programs
22
Fine Slicing for Program Transformation
23
Refactoring’s Rubicon: Extract Method
• Automating Extract Method is Refactoring’s Rubicon (Fowler*)
  – The one that demonstrates “serious tool support”
  – Precondition for many other transformations
• This Rubicon has not yet been crossed
  – Getting it right requires more analysis capability than is available in current IDEs
*http://www.martinfowler.com/articles/refactoringRubicon.html
24
Fowler’s Example (website)
Before:
void printOwing() {
    printBanner();

    //print details
    System.out.println("name: " + _name);
    System.out.println("amount " + getOutstanding());
}
After:
void printOwing() {
    printBanner();
    printDetails(getOutstanding());
}

void printDetails(double outstanding) {
    System.out.println("name: " + _name);
    System.out.println("amount " + outstanding);
}
25
A Case Study in Enterprise Refactoring
• Converted a Java Servlet to use the MVC pattern*
• Used as much automated support as available
  – The whole conversion could be described as a series of cataloged (“small”) refactorings
  – Most steps were inadequately supported by the IDE
  – Some were not supported at all
* Based on Alex Chaffee’s “Refactoring to Model-View-Controller” article (http://www.purpletech.com/articles/mvc/refactoring-to-mvc.html)
26
Case-Study: Automation (1)
Fully Supported Refactorings                    Uses
Extract Method                                  3
Extract Temp                                    3
(Self) Encapsulate Field                        2
Replace Magic Number with Symbolic Constant     1
Inline Temp                                     1
Extract Superclass                              1
Delete Methods                                  1
Move Method                                     1
Total                                           13
27
Case-Study: Automation (2)
Partial (*) or No (**) Support                  Uses
Extract Method *                                10
Substitute Expression **                        5
Replace Temp with Query *                       3
Replace Method with Method Object **            2
Substitute Statement **                         1
Extract Class *                                 1
Move Statement (or Swap Statements) **          1
Total                                           23
28
Currently Unsupported Cases of Extract Method
(a) Extract multiple fragments
(b) Extract a partial fragment
  – select sub-expressions as parameters
(c) Extract loop with partial body
  – loop duplication with data flow
(d) Extract code with conditional exits
Program slicing pulls related code together!
29
slice (v.): to cut with or as if with a knife
slice (n.): a thin flat piece cut from something
Merriam-Webster
30
A (backward) slice of a given program with respect to selected “interesting” variables is a subprogram that computes the same values as the original program for the selected variables
A (backward) fine slice of a given program with respect to selected “interesting” variables and other “oracle” variables is a subprogram that computes the same values as the original program for the selected variables, given values for the oracle variables
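The difference between the two definitions can be seen in a toy Java sketch. The program, its names, and the choice of oracle variable are all hypothetical, picked only to illustrate the definitions. The original computes count and total, the traditional slice on total keeps the summation loop, and the fine slice on total treats sum as an oracle variable whose value is supplied from outside.

```java
class FineSliceDemo {
    // Original program: computes both count and total.
    static int[] original(int[] prices, int rate) {
        int count = 0;
        int sum = 0;
        for (int p : prices) {
            sum += p;
            count++;
        }
        int total = sum * rate;
        return new int[] { count, total };
    }

    // Traditional backward slice on total: the count computation is gone,
    // but the loop that computes sum must stay.
    static int sliceOnTotal(int[] prices, int rate) {
        int sum = 0;
        for (int p : prices) {
            sum += p;
        }
        return sum * rate;
    }

    // Fine slice on total with sum as an oracle variable: the data
    // dependence on the loop is ignored and sum is supplied by the caller.
    static int fineSliceOnTotal(int sumOracle, int rate) {
        return sumOracle * rate;
    }

    public static void main(String[] args) {
        int[] prices = { 1, 2, 3 };
        System.out.println(original(prices, 2)[1]);
        System.out.println(sliceOnTotal(prices, 2));
        System.out.println(fineSliceOnTotal(6, 2)); // 6 is the value of sum in the original run
    }
}
```

All three agree on total as long as the oracle supplies the value that sum has in the original run, which is the sense in which the fine slice computes the same values given values for the oracle variables.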
31
Fine Slicing
• A generalization of traditional program slicing
• Fine slices can be precisely bounded
  – Slicing criteria include set of data and control dependences to ignore
• Fine slices are executable and extractable
• Complement slices (co-slices) are also fine slices
• Oracle-based semantics for fine slices
• Algorithm for computing data-structure representing the oracle
• Forward fine slices are executable, may be slightly larger than traditional forward slices
• Confines generalize blocks for unstructured programs
32
Extract Computation
• A new refactoring
• Extracts a fine slice into contiguous code
• Computes the co-slice
• Computation can then be extracted into a separate method using Extract Method
• Passes necessary “oracle” variables between slice and co-slice
• Generates new containers if a series of values needs to be passed
33
(a) Extract multiple fragments
User user = getCurrentUser(request);
if (user == null) {
    response.sendRedirect(LOGIN_PAGE_URL);
    return;
}
response.setContentType("text/html");
disableCache(response);
String albumName = request.getParameter("album");
PrintWriter out = response.getWriter();
34
(b) Extract a partial fragment
out.println(DOCTYPE_HTML);
out.println("<html>");
out.println("<head>");
out.println("<title>Error</title>");
out.println("</head>");
out.print("<body><p class='error'>");
out.print("Could not load album '" +
albumName + "'");
out.println("</p></body>");
out.println("</html>");
35
(c) Extract loop with partial body
 1  out.println("<table border=0>");
 2  int start = page * 20;
 3  int end = start + 20;
 4  end = Math.min(end,
 5                 album.getPictures().size());
 6  for (int i = start; i < end; i++) {
 7      Picture picture = album.getPicture(i);
 8      printPicture(out, picture);
 9  }
10  out.println("</table>");
36
(Numbers refer to the lines of the original code on the previous slide; *** marks added lines.)
 2   int start = page * 20;
 3   int end = start + 20;
 4   end = Math.min(end,
 5                  album.getPictures().size());
***  Queue<Picture> pictures =
***      new LinkedList<Picture>();
 6   for (int i = start; i < end; i++) {
 7       Picture picture = album.getPicture(i);
***      pictures.add(picture);
 9   }
 1   out.println("<table border=0>");
 6   for (int i = start; i < end; i++)
 8       printPicture(out, pictures.remove());
10   out.println("</table>");
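That the split loops behave like the original can be checked with a self-contained Java sketch. It uses stand-ins that are not part of the case study: the album is a plain List<String>, the output is a StringBuilder, and printPicture emits one table row. The first method mirrors the original loop and the second mirrors the transformed code, where a queue passes the series of pictures from the first loop to the second.

```java
import java.util.*;

class ContainerDemo {
    // Stand-in for the servlet's printPicture(out, picture).
    static void printPicture(StringBuilder out, String picture) {
        out.append("<tr><td>").append(picture).append("</td></tr>\n");
    }

    // Original: fetching and printing are interleaved in a single loop.
    static String original(List<String> album, int page) {
        StringBuilder out = new StringBuilder();
        out.append("<table border=0>\n");
        int start = page * 20;
        int end = Math.min(start + 20, album.size());
        for (int i = start; i < end; i++) {
            String picture = album.get(i);
            printPicture(out, picture);
        }
        out.append("</table>\n");
        return out.toString();
    }

    // Transformed: the first loop fills a container, the second drains it.
    static String transformed(List<String> album, int page) {
        int start = page * 20;
        int end = Math.min(start + 20, album.size());
        Queue<String> pictures = new LinkedList<>();
        for (int i = start; i < end; i++) {
            pictures.add(album.get(i));
        }
        StringBuilder out = new StringBuilder();
        out.append("<table border=0>\n");
        for (int i = start; i < end; i++) {
            printPicture(out, pictures.remove());
        }
        out.append("</table>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> album = List.of("a.jpg", "b.jpg", "c.jpg");
        System.out.println(original(album, 0).equals(transformed(album, 0)));
    }
}
```

A FIFO queue is what keeps the order of values intact: the second loop removes pictures in exactly the order the first loop added them, and both loops run the same number of iterations.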
37
(d) Extract code with conditional exits
if (album == null) {
new ErrorPage("Could not load album '"
+ album.getName() + "'").printMessage(out);
return;
}
//...
38
if (invalidAlbum(album, out))
    return;
//...
boolean invalidAlbum(Album album,
PrintWriter out) {
boolean invalid = album == null;
if (invalid) {
new ErrorPage("Could not load album '"
+ album.getName() + "'").printMessage(out);
}
return invalid;
}
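The conditional-exit pattern can be exercised with a small runnable Java sketch. It uses stand-ins that are assumptions: the album is a String that is null when loading failed, the output is a StringBuilder, and the requested album name is passed separately, since a null album has no name to query. The extracted method reports through its boolean result whether the caller should exit, and the caller keeps the return statement.

```java
class ConditionalExitDemo {
    // Extracted computation: writes the error page if needed and tells the
    // caller whether it should exit.
    static boolean invalidAlbum(String album, String albumName, StringBuilder out) {
        boolean invalid = album == null;
        if (invalid) {
            out.append("Could not load album '").append(albumName).append("'");
        }
        return invalid;
    }

    // Caller: the conditional exit stays here, driven by the returned flag.
    static String show(String album, String albumName) {
        StringBuilder out = new StringBuilder();
        if (invalidAlbum(album, albumName, out)) {
            return out.toString();
        }
        out.append("Album: ").append(album);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(show(null, "holiday"));
        System.out.println(show("Holiday 2009", "holiday"));
    }
}
```

Returning a flag is one way to extract code that contains a return: the exit itself cannot move into the new method, so only the decision to exit is passed back.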
39
Token Semantics
[Figure: plan (data- and control-flow graph) of the code of example (c), from entry to exit, with nodes println, *, +, min, getPictures, size, >, ++, getPicture, and printPicture, data edges for out, album, page, start, end, and i, and labels p1 and p2]
40
Fine Slicing
[Figure: the plan of example (c), with the printPicture call singled out]
41
The Fine Slice
out.println("<table border=0>");
for (int i = start; i < end; i++) {
    printPicture(out, picture);
}
out.println("</table>");
[Figure: plan of the fine slice, with start and picture as labeled inputs]
42
Co-Slicing
[Figure: the plan of example (c), with the printPicture call singled out]
43
The Co-Slice
int start = page * 20;
int end = start + 20;
end = Math.min(end, album.getPictures().size());
for (int i = start; i < end; i++) {
    Picture picture = album.getPicture(i);
}
[Figure: plan of the co-slice, with start and picture as labeled edges]
44
Fine slice / Co-slice
[Figure: the plans of the fine slice and the co-slice side by side, with the edges start and picture labeled in both]
45
Adding a Container
out.println("<table border=0>");
int start = page * 20;
int end = start + 20;
end = Math.min(end, album.getPictures().size());
Queue<Picture> pictures = new LinkedList<Picture>();
for (int i = start; i < end; i++) {
    Picture picture = album.getPicture(i);
    pictures.add(picture);
    printPicture(out, pictures.remove());
}
out.println("</table>");
[Figure: the plan of example (c) with added nodes new, add, and remove and data edges labeled pictures and picture]
46
The Fine Slice
void display(PrintStream out, int start, int end, Queue<Picture> pictures) {
    out.println("<table border=0>");
    for (int i = start; i < end; i++) {
        printPicture(out, pictures.remove());
    }
    out.println("</table>");
}
[Figure: plan of the fine slice, with inputs out, start, end, and pictures and nodes println, ++, the loop test, remove, and printPicture]
47
Program with Container
[Figure: plan of the program with the container (code as on slide 45), with nodes new, add, and remove and data edges labeled pictures and picture]
48
The Co-Slice
int start = page * 20;
int end = start + 20;
end = Math.min(end, album.getPictures().size());
Queue<Picture> pictures = new LinkedList<Picture>();
for (int i = start; i < end; i++) {
    Picture picture = album.getPicture(i);
    pictures.add(picture);
}
display(out, start, end, pictures);
[Figure: plan of the co-slice, with nodes new, add, and display and data edges labeled out, start, end, pictures, and picture]
49
Conclusions
• Fine slicing algorithm yields executable slices whose boundaries can be precisely controlled
• Can be used to make any subset of a program executable by adding some control structures but not the data on which they depend
  – including forward slices, thin slices, barrier slices, chops, and barrier chops
  – Conjecture: the size of these executable programs will not be substantially larger
50
Conclusions
• New Extract Computation refactoring is an important step towards the automation of Extract Method in difficult cases
  – Enables the automation of big refactorings from smaller building blocks
• Uses new fine-slicing algorithm
• Automatically computes complement slices (co-slices)
• Automatically generates containers to pass series of values if necessary
51
Related Work (I): Non-Executable Slices
• Traditional backward slicing (e.g., Weiser [ICSE81] or Ottenstein & Ottenstein [PSDE84]), when applied to unstructured code
  – Solved by path-completion stage in plan-based slicing (Abadi, Ettinger & Feldman [FSE09])
• Forward slicing (Horwitz, Reps & Binkley [TOPLAS90])
• Barrier slicing (Krinke [SCAM03])
• Chopping (Jackson & Rollins [FSE94]) and Barrier Chopping (Krinke [SCAM03])
• Thin slicing (Sridharan, Fink & Bodik [PLDI07])
• All the above can be made executable with an appropriate oracle, by adding the required control structure
52
Related Work (II): Executable Slices with Reduced Scope or Size
• Block-based slicing (Maruyama [SSR01]): structured code only, no correctness proof
• Co-slicing (Ettinger's thesis, Oxford 2006): limited to slicing from the end and oracle of final values only; proof on toy language
• Parametric slicing (Field, Ramalingam & Tip [POPL95]): an executable generalization of static and dynamic slices; like oracle semantics, they formalize programs with holes; however, their holes stand for expressions whose values are irrelevant, while our holes stand for significant (oracle) values
• Some forms of dynamic and forward slicing are executable (Binkley et al. [SCAM04]): forward slices made excessively large through the addition of backward slices
53
Related Work (III): Behavior-Preserving Procedure Extraction
• Contiguous code
  – Bill Opdyke's thesis (UIUC 1992): for C++
  – Griswold and Notkin [ToSE93]: for Scheme
• Arbitrary selections
  – Tucking (Lakhotia & Deprez [IST98]): the complement is a slice too; no dataflow from the extracted slice to its complement yields over-duplication; strong preconditions (e.g., no global variables involved, and no live-on-exit variable defined in both the slice and complement)
  – Semantics-Preserving Procedure Extraction (Komondoor & Horwitz [POPL00]): considers all permutations of selected and surrounding statements; no duplication allowed; not practical (exponential time complexity); very strong preconditions
  – Effective Automatic Procedure Extraction (Komondoor & Horwitz [IWPC03]): improves on their previous algorithm by improving complexity (cubic time and space), allowing some duplication (of conditionals and jumps); might miss some correct permutations; no duplication of assignments or loops; allows dataflow from complement to extracted code and from extracted code to (the second portion of the) complement; supports extraction of returns
  – Extraction of block-based slices (Maruyama [SSR01]): extracts a slice of one variable only; restricted to structured code; no proof given
  – Ettinger's thesis (Oxford 2006): sliding transformation sequentially composes a slice and its complement, allowing dataflow from the former to the latter; supports loop untangling and duplication of assignments; restricted to slicing from the end, and only final values from the extracted slice can be reused in the complement; proof for toy language