CS4723 Software Engineering
Lecture 10: Debugging and Fault Localization
Debugging
  What we do when testing finds a bug
  Basic process
    Reproduce the bug
    Locate the fault
    Fix it
Debugging
  Sometimes the software is too large to debug directly
  Before we can fix the bug, we narrow down what to look at
    Narrow down the relevant input: delta debugging
    Narrow down the relevant code: statistical debugging, dynamic slicing
Debugging
  The inputs can be very complex
    Quite common in the real world (compilers, office suites, browsers, databases, operating systems, ...)
  It is important to isolate just the relevant inputs
    Shortens the execution to debug
    Filters out the noise
    Makes the root cause of the bug easier to identify
Consider Mozilla Firefox
  It takes HTML pages as inputs
  A large number of bugs are related to loading certain HTML pages
    Corner cases in HTML syntax
    Incompatibility between browsers
    Corner cases in JavaScript, CSS, ...
    Error handling for incorrect HTML, JavaScript, CSS, ...
How do we go from this<SELECT NAME="op sys" MULTIPLE SIZE=7><OPTION VALUE="All">All<OPTION VALUE="Windows 3.1">Windows 3.1<OPTION VALUE="Windows 95">Windows 95<OPTIONVALUE="Windows 98">Windows 98<OPTION VALUE="Windows ME">Windows ME<OPTION VALUE="Windows 2000">Windows2000<OPTION VALUE="Windows NT">Windows NT<OPTION VALUE="Mac System 7">Mac System 7<OPTION VALUE="Mac System7.5">Mac System 7.5<OPTION VALUE="Mac System 7.6.1">Mac System 7.6.1<OPTION VALUE="Mac System 8.0">Mac System8.0<OPTION VALUE="Mac System 8.5">Mac System 8.5<OPTION VALUE="Mac System 8.6">Mac System 8.6<OPTION VALUE="MacSystem 9.x">Mac System 9.x<OPTION VALUE="MacOS X">MacOS X<OPTION VALUE="Linux">Linux<OPTIONVALUE="BSDI">BSDI<OPTION VALUE="FreeBSD">FreeBSD<OPTION VALUE="NetBSD">NetBSD<OPTIONVALUE="OpenBSD">OpenBSD<OPTION VALUE="AIX">AIX<OPTION VALUE="BeOS">BeOS<OPTION VALUE="HP-UX">HPUX<OPTION VALUE="IRIX">IRIX<OPTION VALUE="Neutrino">Neutrino<OPTION VALUE="OpenVMS">OpenVMS<OPTIONVALUE="OS/2">OS/2<OPTION VALUE="OSF/1">OSF/1<OPTION VALUE="Solaris">Solaris<OPTIONVALUE="SunOS">SunOS<OPTION VALUE="other">other</SELECT></td><td align=left valign=top><SELECT NAME="priority" MULTIPLE SIZE=7><OPTION VALUE="--">--<OPTION VALUE="P1">P1<OPTION VALUE="P2">P2<OPTION VALUE="P3">P3<OPTIONVALUE="P4">P4<OPTION VALUE="P5">P5</SELECT></td><td align=left valign=top><SELECT NAME="bug severity" MULTIPLE SIZE=7><OPTION VALUE="blocker">blocker<OPTION VALUE="critical">critical<OPTION VALUE="major">major<OPTIONVALUE="normal">normal<OPTION VALUE="minor">minor<OPTION VALUE="trivial">trivial<OPTIONVALUE="enhancement">enhancement<
To this…
<SELECT NAME="priority" MULTIPLE SIZE=7>
Motivation
  Turn bug reports that contain real web pages into minimized test cases
  The minimized test case must still reveal the bug
  Benefits of simplification
    Easier to communicate
    Duplicate bug reports become easier to spot and remove
    Easier debugging
      Less potentially buggy code is involved
      Shorter execution time
Delta Debugging
  The problem definition
    A program exhibits an error for an input
    The input is a set of elements
      E.g., a sequence of API calls, a text file, a serialized object, ...
  Problem: find a smaller subset of the input that still causes the failure
A generic algorithm
  How do people handle this problem?
  Binary search
    Cut the input into halves
    Try to reproduce the bug with each half
    Iterate
Delta Debugging Version 1
  The set of elements in the bug-revealing input is I
  Assumptions
    Each subset of I is a valid input: running it yields either success or failure
    A single input element E causes the failure
    E causes the failure in every case, whatever other elements it is combined with (monotonicity)
Solution is simple
  Go with the binary search process
    Throw away half of the input elements if the remaining elements still cause the failure
    When a single element is left, we are done!
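Under the Version 1 assumptions (every subset is a valid input, and a single element causes the failure whatever accompanies it), the halving process can be sketched in Python as below. The names `find_failing_element` and `fails` are hypothetical; `fails` stands for whatever automated test reports that a given subset still triggers the bug.

```python
def find_failing_element(elements, fails):
    """Binary search for the single failure-causing element.

    Assumes fails(elements) is True on entry and that exactly one
    element causes the failure (the Version 1 assumptions).
    """
    candidates = list(elements)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Keep whichever half still reproduces the failure.
        if fails(first_half):
            candidates = first_half
        else:
            candidates = candidates[mid:]
    return candidates[0]


# Hypothetical input of 8 elements in which element 6 triggers the bug.
culprit = find_failing_element(range(1, 9), lambda subset: 6 in subset)
print(culprit)
```

Only one half is tested per round: if the first half passes, the single-cause assumption implies the culprit is in the second half.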
Example
Delta Debugging Version 1
  This is just binary search: easy to automate
  The assumptions do not always hold
  Let's look at what the assumptions imply:
    I1 ∪ I2 = FAIL
    -> (I1 = FAIL and I2 = PASS)
       or (I1 = PASS and I2 = FAIL)
  It is interesting to see what happens when this is not the case
Case I: multiple failing branches
  What happens if I1 = FAIL and I2 = FAIL?
  A subset of I1 fails and a subset of I2 also fails
  We can simply continue to search both I1 and I2, and we find two failure-causing elements
  They may or may not be due to the same bug
Case II: Interference
  What happens if I1 = PASS and I2 = PASS?
  A subset of I1 and a subset of I2 cause the failure only when they are combined
  This is called interference
Handling Interference
  The cute trick
    Consider I1 = PASS and I2 = PASS, but I1 ∪ I2 = FAIL
    An element D1 in I1 and an element D2 in I2 together cause the failure
    Do a binary search in I2 while keeping all of I1
      Split I2 into P1 and P2, and try I1 ∪ P1 and I1 ∪ P2
      Continue until you find D2 such that I1 ∪ D2 causes the failure
    Then do a binary search in I1 while keeping D2, until you find D1
    Return D1 ∪ D2
Example I: Handle interference
  Consider 8 input elements, of which 3 and 7 cause the failure when they are applied together
  Configuration      Result
  1 2 3 4            pass
  5 6 7 8            pass   (interference!)
  1 2 3 4 5 6        pass
  1 2 3 4 7 8        fail
  1 2 3 4 7          fail
  1 2 7              pass
  3 4 7              fail
  3 7                fail
Example II: Handle multiple interference
  Consider 8 input elements, of which 3, 5 and 7 cause the failure when they are applied together
  Configuration      Result
  1 2 3 4            pass
  5 6 7 8            pass   (interference!)
  1 2 3 4 5 6        pass
  1 2 3 4 7 8        pass   (second interference! What to do? Go on with I1 ∪ P1!)
  1 2 3 4 5 6 7      fail
  1 2 3 4 5 7        fail
  1 2 5 7            pass
  3 4 5 7            fail
  3 5 7              fail
Delta Debugging Version 2
  The set of elements in the bug-revealing input is I
  New assumptions
    Each subset of I is a valid input
    A subset E of the input elements causes the failure
    E causes the failure in every case, whatever other elements it is combined with
Delta Debugging Version 2
  Algorithm
    Split I into I1 and I2
    Case I: I1 = FAIL and I2 = PASS
      Try I1
    Case I: I1 = PASS and I2 = FAIL
      Try I2
    Case I: I1 = FAIL and I2 = FAIL
      Try both I1 and I2
    Case II: I1 = PASS and I2 = PASS
      Handle interference for I1 and I2
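The case analysis above can be sketched as a recursive Python function. This is a minimal sketch under the Version 2 assumptions, not a reference implementation: the names `dd`, `fails`, and `context` are hypothetical, and when both halves fail the sketch follows only I1 and returns a single cause rather than searching both.

```python
def dd(elements, fails, context=()):
    """Return a subset of `elements` that, added to `context`, causes failure.

    Assumes fails(context + elements) is True on entry and that failure
    is monotonic (the Version 2 assumptions).
    """
    elements, context = list(elements), list(context)
    if len(elements) == 1:
        return elements
    mid = len(elements) // 2
    i1, i2 = elements[:mid], elements[mid:]
    fail1 = fails(context + i1)
    fail2 = fails(context + i2)
    if fail1:
        # Case I: I1 fails on its own. If I2 fails as well, it could be
        # searched the same way to report a second cause.
        return dd(i1, fails, context)
    if fail2:
        # Case I: only I2 fails.
        return dd(i2, fails, context)
    # Case II (interference): search I2 while keeping I1, then search I1
    # while keeping what was found in I2.
    d2 = dd(i2, fails, context + i1)
    d1 = dd(i1, fails, context + d2)
    return d1 + d2


# The two interference examples: 8 elements, failure needs {3, 7} or {3, 5, 7}.
two_way = dd(range(1, 9), lambda s: {3, 7} <= set(s))
three_way = dd(range(1, 9), lambda s: {3, 5, 7} <= set(s))
print(two_way, three_way)
```

Passing the already-fixed elements as `context` is what lets the same function handle nested interference, as in the second example.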
Real example: GNU Compiler
  This input program (bug.c) causes GCC 2.95.2 to crash when all optimizations are enabled
  Minimize it to debug GCC
  Consider each character as an element
Real example: GNU Compiler
  Our delta debugging process
    Create the appropriate subset of bug.c
    Feed it to GCC
    Continue according to whether GCC crashes
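The "feed it to GCC and check for a crash" step needs an automatic test oracle. The Python sketch below shows one possible shape for it. The function name `crashes`, the crash-detection rule, and the stand-in `FAKE_COMPILER` are all assumptions made for illustration; a real run would pass the compiler command line from the bug report instead.

```python
import os
import subprocess
import sys
import tempfile


def crashes(candidate, command):
    """Write `candidate` to a temporary .c file, run `command` on it, and
    report whether the run looks like a compiler crash."""
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(candidate)
        path = f.name
    try:
        proc = subprocess.run(command + [path], capture_output=True, text=True)
    finally:
        os.unlink(path)
    # Assumption: a crash shows up as death by signal (negative return code
    # on POSIX) or as an "internal compiler error" message on stderr.
    return proc.returncode < 0 or "internal compiler error" in proc.stderr


# Stand-in "compiler" so the sketch runs without GCC installed: it reports
# an internal compiler error whenever the input contains "z[i]".
FAKE_COMPILER = [
    sys.executable, "-c",
    "import sys\n"
    "if 'z[i]' in open(sys.argv[1]).read():\n"
    "    sys.exit('internal compiler error')\n",
]

print(crashes("z[i]=z[i]*(z[0]+0);", FAKE_COMPILER))
print(crashes("int x;", FAKE_COMPILER))
```

An oracle of this shape, with the command fixed, can serve as the test that delta debugging calls on each candidate subset of characters.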
GCC compiler example
  The minimized code:
    t(double z[],int n){int i,j;for(;;){i=i+j+1;z[i]=z[i]*(z[0]+0);}return z[n];}
  The test case is 1-minimal
    No single character can be removed without losing the failure
    Even every space has been removed
    The function name has been shortened from mult to a single t
  GCC was executed 700+ times
  The input was reduced to 10% of its original size
Another example: GDB
  GDB is the debugger from GNU
  It was updated from 4.16 to 4.17
  Version 4.17 is no longer compatible with DDD (a GUI for GNU software development tools)
  178,000 lines of code changed since 4.16
  How do we know which code change(s) cause the failure?
Results
  After a lot of work (by machine)
    The 178 KLOC of changes are grouped into 8,700 groups (commits)
    Use delta debugging
    Work it out in 470 tests
    It took 48 hours
  Doing this by hand would be a nightmare!
Importance of input elements
  It is important to have a good definition of input elements, so that
    Subsets of the input elements are valid inputs
    The number of input elements is small
  Consider the examples
    GCC example: we use characters as elements, which is simple but not so good; if the bug happens after the parser, the failure is not monotonic because of syntax errors
    GDB example: we group the changed lines into groups, reducing the input size to 5% of the original. 2 days are acceptable; what about 40 days?
Limitations of Delta debugging
  Relies on the assumptions
    Monotonicity does not always hold
  Relies on good input elements; always providing valid inputs improves efficiency
  Requires automatic test oracles
    Good for regression testing
    Not good for development-time testing
Statistical Debugging
  Delta debugging
    Narrows down the input to be considered
  Statistical debugging
    Narrows down the code to be considered
Statistical Debugging
  Basic idea
    Consider a number of test cases, some of which pass and some of which fail
    If a statement is covered mostly by failed test cases, it is highly likely to be part of the buggy code
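Tarantula
  A classical tool for statistical debugging
  Uses the following formulas, where pass and fail are the percentages of passed and failed test cases that cover a statement
    Color = red + pass / (pass + fail) * (color range from red to green)
    Brightness = max(pass, fail)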
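The two Tarantula formulas can be made concrete with a small Python sketch. It assumes, as in the usual presentation of the tool, that pass and fail are the fractions of passing and failing test cases that execute the statement. The function name `tarantula`, the 0-to-1 hue scale, and the coverage numbers are hypothetical.

```python
def tarantula(passed_cov, failed_cov, total_passed, total_failed):
    """Return (hue, brightness) for one statement, or None if no test covers it.

    hue is 0.0 for pure red (suspicious) and 1.0 for pure green (safe).
    """
    pass_ratio = passed_cov / total_passed if total_passed else 0.0
    fail_ratio = failed_cov / total_failed if total_failed else 0.0
    if pass_ratio + fail_ratio == 0:
        return None  # never executed: no color can be assigned
    hue = pass_ratio / (pass_ratio + fail_ratio)
    brightness = max(pass_ratio, fail_ratio)
    return hue, brightness


# Hypothetical suite with 3 passing and 2 failing tests; each entry gives
# (passing tests covering the statement, failing tests covering it).
coverage = {"s1": (3, 2), "s2": (0, 2), "s3": (3, 0)}
for stmt, (p, f) in coverage.items():
    print(stmt, tarantula(p, f, 3, 2))
```

Here `s2`, covered only by failing tests, gets hue 0.0 (red), `s3` gets hue 1.0 (green), and `s1`, covered by every test, lands in the middle.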
Tarantula: Illustration
Context based statistical debugging
  Do not consider just a statement in isolation
  Also consider connections
    Outcomes of branches
    Connections on a runtime control flow graph (runtime CFG)
Runtime Control Flow Graph
  void replaceFirst (sx, sy) {
    for (int i=0;i<len;i++) {
      if (arr[i]==sx){
        arr[i] = sz;
        //should break;
      }
      if (arr[i]==sy){
        arr[i] = sz;
        //should break;
      }
    }
  }
  [Figure: runtime control flow graphs of two passing runs and one failing run]
Limitations
  Questions:
    If a statement is covered only by passed test cases, can it be the root cause of the bug found?
    If a statement is covered only by failed test cases, must it be the root cause of the bug found?
Example
  void f(int a, int b){
    if (a > 0){ //error: should be >=
      do something;
    }
    if (b < 0){
      do something
    }
  }
Test Cases:3, 22, 1, 0, -12, 0
Dynamic Slicing
  Another way to narrow down the code to be considered in debugging
  Recall static slicing
    All code elements that affect or are affected by a certain variable
    Generate a large dependency graph for the code
    Do reachability analysis
Data Dependencies
  A data dependency goes from a use of a variable to the definition of that variable
  Example:
    s1: x = 3;
    s2: if(y > 5){
    s3:   y = y + x; //data depend on x in s1
    s4: }
Control Dependencies
  A control dependency goes from the basic blocks of a branch to the predicate that controls them
  Example:
    s1: x = 3;
    s2: if(y > 5){
    s3:   y = y + x; //control depend on y in s2
    s4: }
Program slicing for sum = 0 -> sum = 1
  [Figure: dependence graph of a program in which main sets sum=0 and i=1, loops while i<11, and calls add(a, b) at two call sites to update sum and i. Nodes shown: entry, expression, control-point, call-site, actual-in/actual-out, and formal-in/formal-out vertices. Edges not recovered.]
Dynamic Slicing
  Also describes dependencies among code elements
  If a variable has an incorrect value, the bug should be in its backward dynamic slice
  Like the runtime control flow graph
    A map from static slicing to the executed code
Dynamic Slicing Example
  1: b=0
  2: a=2
  3: for i= 1 to N do
  4:   if ((i++)%2==1) then
  5:     a = a+1
       else
  6:     b = a*2
       endif
     done
  7: z = a+b
  8: print(z)
  For input N=2, the trace is (k^j denotes the j-th execution of statement k):
    1^1: b=0 [b=0]
    2^1: a=2
    3^1: for i = 1 to N do [i=1]
    4^1: if ( (i++) %2 == 1) then [i=1]
    5^1: a=a+1 [a=3]
    3^2: for i=1 to N do [i=2]
    4^2: if ( i%2 == 1) then [i=2]
    6^1: b=a*2 [b=6]
    7^1: z=a+b [z=9]
    8^1: print(z) [z=9]
Algorithm I
This algorithm uses a static dependence graph in which all executed nodes are marked dynamically so that during slicing when the graph is traversed, nodes that are not marked are avoided as they cannot be a part of the dynamic slice.
Limited dynamic information - fast, imprecise (but more precise than static slicing)
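Algorithm I Example
  Static graph nodes (T and F label the branch edges of the predicates):
    1: b=0
    2: a=2
    3: 1 <= i <= N
    4: if ((i++)%2 == 1)
    5: a=a+1 (T branch of 4)
    6: b=a*2 (F branch of 4)
    7: z=a+b
    8: print(z)
  For input N=1, the trace is: 1^1, 2^1, 3^1, 4^1, 5^1, 3^2, 7^1, 8^1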
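Algorithm I can be sketched in a few lines of Python: traverse the static dependence graph backward from the slicing criterion, but only through statements that were executed. The `STATIC_DEPS` table below is hand-derived for the example program and is an assumption of this sketch (data and control dependences combined, self-edges omitted); the function name `algorithm_one` is hypothetical.

```python
# Static dependences of the example program: statement -> statements it
# depends on. Hand-derived for illustration, not produced by a real analyzer.
STATIC_DEPS = {
    1: [], 2: [],
    3: [4],           # loop test reads i, which statement 4 updates
    4: [3],           # executes under the loop predicate
    5: [2, 4],        # reads a; controlled by predicate 4
    6: [2, 4, 5],     # reads a (from 2 or 5); controlled by predicate 4
    7: [1, 2, 5, 6],  # reads a (from 2 or 5) and b (from 1 or 6)
    8: [7],           # reads z
}


def algorithm_one(static_deps, executed, criterion):
    """Backward reachability over the static graph, skipping unexecuted nodes."""
    in_slice, worklist = set(), [criterion]
    while worklist:
        stmt = worklist.pop()
        if stmt in in_slice or stmt not in executed:
            continue
        in_slice.add(stmt)
        worklist.extend(static_deps[stmt])
    return in_slice


slice_n1 = algorithm_one(STATIC_DEPS, {1, 2, 3, 4, 5, 7, 8}, 8)  # run with N=1
slice_n2 = algorithm_one(STATIC_DEPS, set(range(1, 9)), 8)       # run with N=2
print(sorted(slice_n1), sorted(slice_n2))
```

For N=2 every statement is executed, so the slice contains statement 1 even though its value of b is overwritten before z is computed. That is the imprecision this algorithm accepts in exchange for speed.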
Algorithm II
A dependence edge is introduced from a load to a store if during execution, at least once, the value stored by the store is indeed read by the load (mark dependence edge)
No static analysis is needed.
Algorithm II Example
  For input N=1 (same program as in the Algorithm I example), the statement instances shown are 1^1, 2^1, 3^1, 4^1, 5^1, 7^1, 8^1
Algorithm II Example
  For input N=2 (same program as in the Algorithm I example), the trace is:
    1^1 : save b
    2^1 : save a
    3^1 : save i
    4^1 : load i
    5^1 : load/save a
    3^2 : load/save i
    4^2 : load i
    6^1 : load a / save b
    7^1 : load a, b / save z
    8^1 : load z
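The load/save trace for N=2 is enough to build the dependence edges of Algorithm II. The Python sketch below links each load to the most recent store of the same variable and then walks backward from print(z). It covers data dependences only, which is a simplification of this sketch. The names `dynamic_slice` and `TRACE_N2` are hypothetical, and each instance k^j is written as the tuple (k, j).

```python
def dynamic_slice(trace, criterion):
    """trace: list of (instance, loads, stores) in execution order."""
    last_store = {}   # variable -> instance that most recently stored it
    depends_on = {}   # instance -> instances whose stored values it loaded
    for instance, loads, stores in trace:
        depends_on[instance] = {last_store[v] for v in loads if v in last_store}
        for v in stores:
            last_store[v] = instance
    # Backward reachability from the slicing criterion.
    in_slice, worklist = set(), [criterion]
    while worklist:
        node = worklist.pop()
        if node not in in_slice:
            in_slice.add(node)
            worklist.extend(depends_on[node])
    return in_slice


TRACE_N2 = [
    ((1, 1), [], ["b"]),
    ((2, 1), [], ["a"]),
    ((3, 1), [], ["i"]),
    ((4, 1), ["i"], []),
    ((5, 1), ["a"], ["a"]),
    ((3, 2), ["i"], ["i"]),
    ((4, 2), ["i"], []),
    ((6, 1), ["a"], ["b"]),
    ((7, 1), ["a", "b"], ["z"]),
    ((8, 1), ["z"], []),
]

result = dynamic_slice(TRACE_N2, (8, 1))
print(sorted(result))
```

Because 7^1 reads the value of b stored by 6^1, instance 1^1 (b=0) never enters the slice. Only edges exercised at run time are followed.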
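Algorithm II: Compared to Algorithm I
  More precise
    Algorithm I keeps every static dependence edge between executed nodes; Algorithm II keeps only the edges that were actually exercised during the run
  [Figure: a store b=… followed by two loads …=b, with the dependence edges drawn by Algo. I and by Algo. II]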
Efficiency: Summary
  For an execution of 130M instructions:
    Space requirement: about 1.5 GB
    Time requirement: about 10 min
  JSlice: http://jslice.sourceforge.net/
Dynamic Dependence Graph Sizes
  Program       Statements Executed (Millions)   Dynamic Dependence Graph Size (MB)
  300.twolf     140                              1,568
  256.bzip2     67                               1,296
  255.vortex    108                              1,442
  197.parser    123                              1,816
  181.mcf       118                              1,535
  134.perl      220                              1,954
  130.li        124                              1,745
  126.gcc       131                              1,534
  099.go        138                              1,707
Classic Dynamic Slicing in Debugging
  Buggy Runs         LOC     EXEC (%LOC)      BS (%EXEC)
  flex 2.5.31(a)     26754   1871 (6.99%)     695 (37.2%)
  flex 2.5.31(b)     26754   2198 (8.2%)      272 (12.4%)
  flex 2.5.31(c)     26754   2053 (7.7%)      50 (2.4%)
  grep 2.5           8581    1157 (13.5%)     NA
  grep 2.5.1(a)      8587    509 (5.9%)       NA
  grep 2.5.1(b)      8587    1123 (13.1%)     NA
  grep 2.5.1(c)      8587    1338 (15.6%)     NA
  make 3.80(a)       29978   2277 (7.6%)      981 (43.1%)
  make 3.80(b)       29978   2740 (9.1%)      1290 (47.1%)
  gzip-1.2.4         8164    118 (1.5%)       34 (28.8%)
  ncompress-4.2.4    1923    59 (3.1%)        18 (30.5%)
  polymorph-0.4.0    716     45 (6.3%)        21 (46.7%)
  tar 1.13.25        25854   445 (1.7%)       105 (23.6%)
  bc 1.06            8288    636 (7.7%)       204 (32.1%)
  Tidy               31132   1519 (4.9%)      554 (36.5%)
  BS ranges from 2.4% to 47.1% of EXEC; average 30.9%
Advantages compared with Statistical Debugging
  Error-related code is guaranteed to appear in the slice
  Only requires the test case that reveals the bug
    This is a large advantage for field bugs reported by users
Issues about Dynamic Slicing
  Slices are usually not very small (about 30% of the executed code)
  The running history is very big (GBs)
  The algorithm to compute the dynamic slice is slow and has a very high space requirement
    On average, given an execution of 130M instructions, the constructed dependence graph requires 1.5 GB of space
Review of Debugging
  Debugging is a process after testing
  Steps: reproduce, localize, fix
  Approaches to localization
    Delta debugging
    Statistical debugging
    Dynamic slicing