AAIS 05 Curino, Giusti Delta Debugging
Delta Debugging
Authors: Carlo Curino, Alessandro Giusti
Politecnico di Milano
An advanced debugging technique
Motivations
• Reducing faults:
• 50%-80% of total cost
• Debugging:
• One of the hardest, yet least systematic activities of software engineering
• most time-consuming
• Locating faults:
• most difficult
[Figure: software maintenance cost as a share of total software cost, for 1979, 1984, 1990 and 2000; vertical axis from 50% to 100%]
Overview
• Which problems are solved by Delta Debugging
• Four solutions: a common approach
1. Simplifying failure-inducing input
2. Isolating failure-inducing thread schedule
3. Identifying failure-inducing changes in the code
4. Isolating Cause-Effect Chains
Failure-inducing input
• This HTML input makes Mozilla crash (segmentation fault). Which portion is the failure-inducing one?
Thread scheduling
• The result of a multithreaded program seems nondeterministic. Why does this happen?
Code changes
• The old version of GDB works with DDD, the new one doesn’t!
• 178,000 lines of code have been modified between the two versions: where is the bug?
Cause-effect chain
• Which part of the program state is involved in the failure?
Four solutions: a single approach
• The underlying problem is:
• Find which part of something determines the failure
So a common strategy can be applied:
• Divide and conquer (divide et impera), applied to the deltas between:
• Working and failing inputs
• Working and failing code versions
• Working and failing thread schedules
• Working and failing program states
This allows:
• Efficient and automatic debugging procedure
Common terminology
• A test case can either:
• Fail
• (The failure shows up)
• Pass
• (program runs properly)
• Be Unresolved
• (a different problem arises, so the outcome cannot be determined)
• Delta debugging algorithms iteratively:
• Apply changes (to input, code, schedule or state)
• Run tests
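The three possible test outcomes can be sketched in Python as follows. This is only an illustration: `Outcome`, `run_test`, `toy_parser` and `is_crash` are hypothetical names, and the toy program merely simulates a crash on a `<SELECT>` tag. The third outcome covers the case where a different problem arises (called "unresolved" later in these slides).

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"              # the program runs properly
    FAIL = "fail"              # the failure under investigation shows up
    UNRESOLVED = "unresolved"  # a different problem arises


def run_test(program, test_input, is_the_failure):
    """Run `program` on `test_input` and classify the result."""
    try:
        program(test_input)
    except Exception as error:
        # Only the failure we are chasing counts as FAIL.
        return Outcome.FAIL if is_the_failure(error) else Outcome.UNRESOLVED
    return Outcome.PASS


# Toy stand-in for the program under test (purely illustrative).
def toy_parser(html):
    if not html:
        raise ValueError("empty document")     # a different problem
    if "<SELECT>" in html:
        raise RuntimeError("simulated crash")  # the failure we are chasing


def is_crash(error):
    return isinstance(error, RuntimeError)
```

A delta debugging algorithm only ever needs this kind of classifier: it applies a change, calls the test, and looks at the outcome.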
Common terminology (2)
• Concept of difference:
• A very general notion: any delta between two test cases
• Examples:
• Difference in the input: different character (or bit) in the input stream
• Difference in thread schedule: difference in the time a given thread switch is performed
• Difference in the code: a different statement in two versions of a program
• Difference in the program state: different values of the internal variables of a program
Simplifying Failure-inducing input
Minimizing vs Isolating
• Minimizing (ddmin algorithm):
• Slower
• More human friendly
• Isolating (dd algorithm):
• Generalization of the ddmin algorithm
• Faster
• Well suited to generating the input for cause-effect chain DD
Minimizing: Mozilla bug
• Minimizing:
• 57 tests simplify the 896-line HTML input down to the “<SELECT>” tag that causes the crash
• Each character is relevant (as shown from line 20 to 26)
• Only removes deltas from the failing test
• Returns a 1-minimal input that still causes the failure: removing any single delta makes the failure disappear (finding the global minimum is intractable in general)
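A minimal sketch of the minimizing idea, assuming the input is a string and the deltas are its characters. This is the commonly taught simplified variant that only tests complements (the input with one chunk removed); the names `ddmin` and `fails` and the toy failure condition are illustrative, not taken from the slides.

```python
def ddmin(failing_input, fails):
    """Shrink `failing_input` while `fails(...)` keeps returning True."""
    assert fails(failing_input)
    current = failing_input
    n = 2  # number of chunks the input is currently split into
    while len(current) >= 2:
        chunk = len(current) // n
        reduced = False
        for start in range(0, len(current), chunk):
            # Remove one chunk and test what is left.
            complement = current[:start] + current[start + chunk:]
            if fails(complement):
                current = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(current):
                break                     # single characters tried: done
            n = min(2 * n, len(current))  # increase granularity
    return current


# Toy failure: the "crash" happens whenever the tag is present.
def crashes(html):
    return "<SELECT>" in html


page = "<HTML><BODY><FORM><SELECT><OPTION>a</OPTION></SELECT></FORM></BODY></HTML>"
minimal = ddmin(page, crashes)
```

The loop stops only after every single-character removal has been tried without reproducing the failure, which is exactly the 1-minimality property.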
Minimizing: didactic example
Isolating: Mozilla bug
• Isolating:
• Only 7 tests (instead of 26)
• Removes deltas from the failing test and adds deltas to the passing test
• Isolates a single delta, “<”, whose removal makes the failure go away
• Returns the two closest inputs found: one failing and one passing
General DD Algorithm
[Figure, animated over several slides: the set of differences between the initial passing test and the initial failing test]
• What if we remove these differences from the current failing test?
• Failure disappears: “Move up”
• What if we remove these differences?
• Unresolved test: “Increase granularity”
• What if we remove these differences from the current failing test?
• Still fails: “Move down”
Formally: the Algorithm
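The moves walked through in the preceding slides can be sketched in Python as follows. This is a sketch under stated assumptions rather than a definitive formulation: configurations are modelled as sets of sortable deltas, the passing configuration is a subset of the failing one, and the names `dd`, `split`, `Outcome` and `toy_test` are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNRESOLVED = "unresolved"


def split(items, n):
    """Split a list into n consecutive, nearly equal, non-empty chunks."""
    chunks, start = [], 0
    for i in range(n):
        end = start + (len(items) - start) // (n - i)
        chunks.append(items[start:end])
        start = end
    return chunks


def dd(c_pass, c_fail, test):
    """Narrow the gap between a passing and a failing configuration."""
    c_pass, c_fail = frozenset(c_pass), frozenset(c_fail)
    cache = {}

    def run(config):
        if config not in cache:  # never run the same test twice
            cache[config] = test(config)
        return cache[config]

    n = 2
    while True:
        delta = sorted(c_fail - c_pass)
        if len(delta) <= 1:
            return c_pass, c_fail
        n = min(n, len(delta))
        chunks = [frozenset(c) for c in split(delta, n)]
        move = None
        # Rules are tried in priority order; the first chunk that matches wins.
        for rule in ("up fails", "down passes", "up passes", "down fails"):
            for chunk in chunks:
                up, down = c_pass | chunk, c_fail - chunk
                if rule == "up fails" and run(up) is Outcome.FAIL:
                    move = (c_pass, up, 2)
                elif rule == "down passes" and run(down) is Outcome.PASS:
                    move = (down, c_fail, 2)
                elif rule == "up passes" and run(up) is Outcome.PASS:
                    move = (up, c_fail, max(n - 1, 2))
                elif rule == "down fails" and run(down) is Outcome.FAIL:
                    move = (c_pass, down, max(n - 1, 2))
                if move:
                    break
            if move:
                break
        if move:
            c_pass, c_fail, n = move
        elif n < len(delta):
            n = min(2 * n, len(delta))  # only unresolved tests: finer chunks
        else:
            return c_pass, c_fail


# Toy test: delta 5 alone induces the failure.
def toy_test(config):
    return Outcome.FAIL if 5 in config else Outcome.PASS


closest_pass, closest_fail = dd(set(), set(range(8)), toy_test)
```

The result is a pair of configurations, one passing and one failing, whose remaining difference is the isolated failure-inducing delta.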
Efficiency considerations
• The worst case: k² + 3k tests (k = cardinality of the change set)
• all test cases are unresolved except the last one
• very unlikely
• The best case: 2·log₂(k) tests
• Try to avoid unresolved test outcomes
• by exploiting lexical and syntactic knowledge about the input
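To get a feel for the gap between the two bounds, here is a small calculation. It assumes the logarithm is base 2 (the algorithm halves the change set); the function names are illustrative.

```python
import math


def worst_case_tests(k):
    """Upper bound on the number of tests for k deltas: k^2 + 3k."""
    return k * k + 3 * k


def best_case_tests(k):
    """Lower bound when every test is conclusive: 2 * log2(k)."""
    return 2 * math.log2(k)


for k in (8, 1024):
    print(k, worst_case_tests(k), best_case_tests(k))
```

For 1024 deltas the bounds are about a million tests versus 20, which is why avoiding unresolved outcomes matters so much in practice.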
DEMO
Eclipse Plugin Live Demo
Thread Scheduling
• The behavior of a multithreaded program may depend on the schedule.
DD applied to Thread Scheduling
• Debugging is even harder here:
• Thread switches and schedules are nondeterministic
• It is difficult to reproduce and isolate failures
• Goal:
• Relate the failure to a small set of relevant differences between passing and failing schedules
• Again a “purely experimental approach”, no need to understand the program
Purely experimental: Pros and Cons
• Pros:
• The program is treated as a black box: we only need to be able to execute it
• A failure can be any arbitrary behaviour of the program: we only need to distinguish failure from success
• Cons:
• Test-based (unlike static analysis): cannot establish properties that hold for all runs of a program, such as the general absence of deadlocks
• Requires an observable failure
Dejavu tool
• Tool: Dejavu (DEterministic JAVa replay Utility) by IBM
• Reproduces schedules and the failures they induce
• Exploiting Dejavu:
• the thread schedule becomes an input
• we can generate new schedules by mixing one passing schedule and one failing schedule
Differences in thread scheduling
• Starting point:
• Passing run
• Failing run
• Differences (for thread switch t1):
• t1 occurs at time 254 in one run
• t1 occurs at time 278 in the other run
• ∆1 = |278 − 254| induces a statement interval: the code executed between time 254 and time 278
Differences in thread scheduling
• We can build further test cases by mixing the two schedules, to isolate the relevant differences
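Mixing two schedules can be pictured with a small Python sketch. It assumes a schedule is simply a list of thread-switch times and that both runs have the same number of switches; each unit delta shifts one switch by one time step. The representation and the names `unit_deltas` and `apply_deltas` are illustrative, not Dejavu's actual format.

```python
def unit_deltas(passing, failing):
    """List the +1/-1 steps that turn the passing schedule into the failing one."""
    deltas = []
    for index, (p, f) in enumerate(zip(passing, failing)):
        step = 1 if f > p else -1
        deltas.extend((index, step) for _ in range(abs(f - p)))
    return deltas


def apply_deltas(passing, deltas):
    """Build a mixed schedule by applying some of the deltas to the passing one."""
    schedule = list(passing)
    for index, step in deltas:
        schedule[index] += step
    return schedule


passing_schedule = [254, 1000]  # switch times in the passing run
failing_schedule = [278, 1000]  # switch times in the failing run
deltas = unit_deltas(passing_schedule, failing_schedule)
halfway = apply_deltas(passing_schedule, deltas[: len(deltas) // 2])
```

Applying all deltas reproduces the failing schedule, applying none reproduces the passing one, and any subset yields a new schedule that can be replayed and tested.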
Real life test: setting
• Test #205 of the SPEC JVM98 Java test suite
• Modification of the raytracer program to a multi-threaded version
• Introduction of a simple race condition
• Implementation of an automated test that checks failure/passing
• Generation of random schedules to find a passing schedule and a failing schedule
• Differences between the passing and failing schedule:
• 3,842,577,240 differences
• Each difference moves a thread switch time by +1 or -1
Real life test: results
• DD isolates a single difference after 50 tests (about 28 minutes)
Real life test: pin-point the failure
• The failure occurs if and only if thread switch #33 occurs at yield point 59,772,127 instead of 59,772,126 (a yield point is a safe point, such as a function invocation)
• At 59,772,127, line 91 is the first yield point after the initialization of OldScenesLoaded
• At 59,772,126, line 82 is the yield point just before the initialization of OldScenesLoaded
Real life test: conclusion
• Delta Debugging is efficient
• even when applied to very large thread schedules (more than 3,000,000,000 differences)
• No analysis is required as Delta Debugging relies on experiments alone
• only the schedule was observed and altered
• failure-inducing thread switch is easily associated with code
• Alternate runs are obtained automatically
• by generating random schedules
• only one initial run (pass or fail) is required
Code changes
• A given revision of a program behaves correctly. The next one does not.
• Find which of the changes in the code causes the problem.
• Inconvenient when the difference amounts to thousands of lines of code
The manual solution
• Binary search through the revision history (regression containment)
• Does not always work:
• Multiple changes that cause the failure only when combined (interference)
• A single change can amount to many code lines (granularity)
• Mixing parallel development branches causes inconsistency problems
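The manual approach amounts to an ordinary binary search, sketched below. It assumes a linear history in which the first revision passes, the last one fails, and every revision after the first bad one also fails; the interference and granularity cases listed above break precisely these assumptions. The function and parameter names are illustrative.

```python
def first_failing_revision(revisions, passes):
    """Binary search for the first revision whose test no longer passes."""
    lo, hi = 0, len(revisions) - 1  # revisions[lo] passes, revisions[hi] fails
    assert passes(revisions[lo]) and not passes(revisions[hi])
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]


# Toy history: revisions 0..99, the regression was introduced in revision 37.
history = list(range(100))
culprit = first_failing_revision(history, lambda revision: revision < 37)
```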
Procedure
• Developed in 1999; it differs in some respects from the current general DD algorithms.
• Consider the differences between the working and failing revisions.
• Ignore any knowledge about the temporal ordering of the changes.
• Goal: find a minimal failure-inducing change set.
Inconsistencies
• Mixing code changes regardless of their ordering produces many tests with an “Unresolved” outcome:
• Integration failure (a change cannot be applied)
• Construction failure (the changed program does not build)
• Execution failure (the program does not run properly, so the test outcome is undetermined)
• They increase the complexity of the DD algorithm!
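How the three kinds of inconsistency map to an unresolved outcome can be sketched as follows. The hooks `apply_changes`, `build` and `run_test` are hypothetical callables supplied by the caller (they stand for patching, compiling and running the regression test); each returns `None` when its step fails. The toy hooks only simulate dependencies between changes.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNRESOLVED = "unresolved"


def classify_change_set(changes, apply_changes, build, run_test):
    """Test one subset of changes and classify the outcome."""
    source_tree = apply_changes(changes)
    if source_tree is None:
        return Outcome.UNRESOLVED  # integration failure
    program = build(source_tree)
    if program is None:
        return Outcome.UNRESOLVED  # construction failure
    verdict = run_test(program)
    if verdict is None:
        return Outcome.UNRESOLVED  # execution failure
    return Outcome.PASS if verdict else Outcome.FAIL


# Toy hooks (purely illustrative).
def toy_apply(changes):
    # Change 2 patches code that only exists after change 1.
    return None if 2 in changes and 1 not in changes else frozenset(changes)


def toy_build(tree):
    # Change 3 calls a function that only change 4 declares.
    return None if 3 in tree and 4 not in tree else tree


def toy_run(program):
    # Change 5 is the failure-inducing one; True means the test passed.
    return 5 not in program
```

Grouping related changes (for example 1 with 2, and 3 with 4) would remove the unresolved cases in this toy setting.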
Future work
• Group related changes (partly done) to get fewer inconsistent trials:
• Common change dates/sources
• Location criteria
• Lexical criteria
• Syntactic criteria (common functions/modules)
• Semantic criteria
Cause-Effect Background
• A bit of background:
• A program state is represented by variable values, and references.
Background (2)
• While the program runs, the state evolves.
• We assume the program is
• Deterministic
• Not interactive
• Hence, identical states at identical times have identical evolutions.
Idea: apply DD to program states.
• We need two distinct runs:
• one failing
• one passing
• We want the two runs to be (initially) as similar as possible.
• If we let the two runs evolve in parallel, their initial states will be similar.
• Isolating failure-inducing input can help.
• Apply DD to different "slices" of the program evolution (a sort of CT scan for computer programs).
Procedure
• Iteratively:
• Build a new state by mixing the passing and failing states.
• Let the program evolve and see whether it passes, fails, or does something unrelated (unresolved outcome).
• Isolate the smallest subset of the state relevant for the failure.
• Nothing new so far. But:
• this happens at a specific moment of the program evolution. It will be repeated (e.g. at important functions' entry points).
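Mixing two program states can be sketched as below, under the simplifying assumption that a state is a flat mapping from variable names to values and that both runs have the same variables. Real states contain pointers and nested structures, so this is only a picture of the idea; `state_deltas` and `mix_states` are illustrative names.

```python
def state_deltas(passing_state, failing_state):
    """Names of the variables whose values differ between the two runs."""
    return sorted(
        name for name in failing_state
        if passing_state.get(name) != failing_state[name]
    )


def mix_states(passing_state, failing_state, chosen):
    """Passing state with the chosen variables set to their failing values."""
    mixed = dict(passing_state)
    for name in chosen:
        mixed[name] = failing_state[name]
    return mixed


# Two toy states captured at the same point of the two runs.
passing = {"size": 3, "index": 0, "buffer": "abc"}
failing = {"size": 3, "index": 7, "buffer": "abcdefg"}
differences = state_deltas(passing, failing)
candidate = mix_states(passing, failing, ["index"])
```

The program would then be resumed from `candidate` and its outcome observed, exactly as with mixed inputs or schedules.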
The result
• A cause-effect chain that leads to a failure.
The cause-effect chain
• The initial states are perfectly legitimate: for example, the direct consequence of a specific input that the program should handle. These are intended program states.
• The final effects are the failure. These are faulty program states.
• The error lies somewhere in the middle, where an intended program state evolves into a faulty one.
Fascinating terminology
• A defect in the code causes an infection in the state.
• The infection usually propagates as the program evolves.
Limits
• No automatic discrimination between intended and faulty (infected) states!
• The human user can increase the resolution of the slices and pinpoint the code that turns an INTENDED state into a FAULTY one.
• Then: correct the error (the defect in the code) and break the cause-effect chain that leads to the failure.
Cause Transitions
• Sometimes, when an instruction is executed:
• a given variable ceases to be failure-inducing
• others begin to be
• That is, the failure-inducing subset of the state changes (a cause transition)
• An algorithm can efficiently find cause transitions in cause-effect chains, by means of binary search (again).
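A sketch of the binary search idea, assuming a hypothetical function `cause_at(t)` that returns the set of failure-inducing variables at execution time `t` (in practice each call would itself be a delta debugging run on program states). The sketch finds one transition between two moments whose causes differ; it is not the full algorithm.

```python
def find_cause_transition(cause_at, start, end):
    """Return adjacent moments (t, t + 1) between which the cause changes."""
    cause_at_start = cause_at(start)
    assert cause_at_start != cause_at(end)
    while end - start > 1:
        mid = (start + end) // 2
        if cause_at(mid) == cause_at_start:
            start = mid  # the cause is still the same: transition is later
        else:
            end = mid    # the cause has already changed: transition is earlier
    return start, end


# Toy run: variable "a" is failure-inducing until time 42, then "b" takes over.
def toy_cause_at(t):
    return {"a"} if t < 42 else {"b"}


transition = find_cause_transition(toy_cause_at, 0, 100)
```

Because each probe halves the interval, the number of expensive `cause_at` calls grows only logarithmically with the length of the run.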
Cause Transitions (2)
Cause Transitions (3)
Why do we bother looking for cause transitions?
• A variable begins to cause a failure:
• Good location for a fix
• More important:
• “cause transitions are significantly better locators of defects than any other methods previously known”
• Result: valuable help in the search for the defect. Only a handful of cause transitions and nearby code locations need to be analyzed as the source of the infection.
Other approaches to defect localization
• Coverage
• Slicing
• Dynamic invariants: no success with the Siemens test suite
• Explicit specification: good results, but needs a specification of the desired internal behavior
• Nearest neighbor (using coverage): best results, albeit quite naive
Evaluation setup
• Siemens suite
• 7 C sample programs (hundreds of lines of code each).
• 132 variations with one realistic defect each.
• A test suite for each program.
• Apply the different defect locators, and compare their performance (only comparison to NN is presented).
Evaluation results
Clarification
• Two small improvements:
• relevance of code locations (automatic)
• sources of infection (programmer-driven): Unfair!
Zoom on the representation of the state
We said:
“A program state is represented by variable values, and references”
In general, representing and manipulating the state is not trivial
• One of the problems: C pointers. Copying their values does not make sense.
Solution: Memory graphs.
Memory graphs
• Systematically unfold all data structures, starting from base variables.
Memory graphs (2)
• Nodes: all values and all variables of a program
• Edges: operations like
• variable access
• pointer dereferencing
• struct member access
• array element access
• Result: we abstract from memory addresses, and can compare and alter pointers.
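The unfolding can be illustrated with a small Python sketch in which dictionaries stand in for C structs and lists for arrays. This is an analogy, not the authors' implementation: nodes are numbered in visiting order, so the graph no longer depends on memory addresses, and shared or cyclic references end up as edges to the same node. The function name `memory_graph` is illustrative.

```python
def memory_graph(base_variables):
    """Unfold all data reachable from the base variables into nodes and edges."""
    labels = {0: "<root>"}  # node number -> short description
    numbering = {}          # id(value) -> node number (only used while unfolding)
    edges = []              # (from_node, operation, to_node)

    def visit(value):
        if id(value) in numbering:  # already unfolded: shared or cyclic reference
            return numbering[id(value)]
        node = len(labels)
        numbering[id(value)] = node
        if isinstance(value, dict):    # stands in for a struct
            labels[node] = "struct"
            for name, member in value.items():
                edges.append((node, f".{name}", visit(member)))
        elif isinstance(value, list):  # stands in for an array
            labels[node] = "array"
            for index, element in enumerate(value):
                edges.append((node, f"[{index}]", visit(element)))
        else:
            labels[node] = repr(value)
        return node

    for name, value in base_variables.items():
        edges.append((0, name, visit(value)))  # variable access
    return labels, edges


# Toy state: a two-element circular linked list reachable from "head".
first = {"next": None, "value": 1}
second = {"next": first, "value": 2}
first["next"] = second
labels, edges = memory_graph({"head": first})
```

Two such graphs, one per run, can then be compared structurally instead of comparing raw pointer values.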
Memory graphs (3)
• What if the set of variables differs between the two states we are mixing?
• Just compute the largest common subgraph.
The deltas we apply to a state:
• Change variable values.
• Alter data structures.
Implementation considerations
• All we need is a way to access and modify program state.
• GDB is the solution for C programs, but has performance problems (5000% overhead).
• DD applied to states is still a black box approach (sort of)
• Easily extended to other languages as soon as something provides GDB-like functionality.
Conclusions
Delta Debugging:
• is an extremely interesting technique
• works quite well, at least in theory
• has no usable tools yet
• can be usefully integrated into various IDEs
• the algorithm is now patent-free (expired patent)
SO: LET’S MAKE SOME MONEY ON IT!
Acknowledgements
• Some slides and images adapted from Dr. Andreas Zeller’s presentations and papers
• (http://www.st.cs.uni-sb.de/~zeller/)
References
• Yesterday, My Program Worked. Today, It Does Not. Why? Andreas Zeller. FSE 1999.
• Finding Failure Causes through Automated Testing. Holger Cleve, Andreas Zeller. 4th International Workshop on Automated Debugging, 2000.
• Simplifying Failure-Inducing Input. Ralf Hildebrandt, Andreas Zeller. ISSTA 2000.
• Automated Debugging: Are We Close? Andreas Zeller. IEEE Computer, November 2001.
• Isolating Failure-Inducing Thread Schedules. Jong-Deok Choi, Andreas Zeller. ISSTA 2002.
• Isolating Cause-Effect Chains from Computer Programs. Andreas Zeller. FSE 2002.
• Locating Causes of Program Failures. Holger Cleve, Andreas Zeller. ICSE 2005.