“Isolating Failure Causes through Test Case Generation”
Jeremias Rößler, Gordon Fraser, Andreas Zeller, Alessandro Orso
Presented by John-Paul Ore
Motivation: Debugging & Maintenance is Super Expensive
• Cost to develop software worldwide: $1,500,000,000,000 (USD)
• Debugging and maintenance cost: $350,000,000,000 (USD) (assumes 23% of developer time spent debugging)
Source: Judge Business School of the University of Cambridge, UK (2013); Evans Data Corporation (2012); Payscale (2012); RTI (2002); CVP Surveys (2012)
What is Debugging?
Finding the fault responsible for the failure, and applying a change to program P such that P is correct with regard to the specification S concerning the failure.
Debugging includes a search problem. We can automate search.
Talk Outline
• Problems BugEx seeks to address
• Background concepts
• Inner Workings of BugEx Algorithm
• Empirical Evaluation
• Relation of this work to 990 Class Project
Automated Debugging: still a hard problem
Parnin, Chris, and Alessandro Orso. "Are automated debugging techniques actually helping programmers?." Proceedings of the 2011 International Symposium on Software Testing and Analysis. ACM, 2011.
BugEx: Overview of Problems Addressed
Problem 1: Automated debugging techniques reveal too many possible code locations
Solution 1: Increase precision through guided test-generation
Problem 2: Even if the location is known, developer might not have perfect bug understanding
Solution 2: Present ‘facts’ rather than code locations
Problem 3: Other experimental techniques are unsound (Delta Debugging, predicate switching)
Solution 3: Generate real program executions
BugEx: Underlying Concepts
1. Expands on statistical debugging. Correlate program facts with failures
1. BugEx extends Statistical Debugging
Benjamin Liblit et al.
Liblit, B., Aiken, A., Zheng, A. X., & Jordan, M. I. (2003). Bug isolation via remote program sampling. ACM SIGPLAN Notices, 38(5), 141-154.
(and more, identified in the paper)
“Statistical debugging works off of the contrast between good and bad runs, so you need to feed it both.” – B. Liblit.
Passing test case
Failing test case
BugEx: Underlying Concepts
1. Expands on statistical debugging. Correlate program facts with failures
2. Use automatic test generation (genetic algorithms) to create statistically significant number of tests
2. Test Case Generation: Genetic Algorithms
• Individual is a TEST encoded in Java bytecode
• Mutation might change branching or variable values
• Fitness: branch distance or predicate distance (closer is better)
[Figure: genetic algorithm illustration with individuals TEST_a, TEST_b, TEST_b’, TEST_a’]
Image http://www.ewh.ieee.org/soc/es/May2001/14/Begin.htm
Test Case Generation: Genetic Algorithms
• Shape of the fitness function (the gradient) directs the search
• Global optimality not guaranteed
Image © Mathworks, 2010
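The fitness idea from the genetic-algorithm slides (branch distance, closer is better) can be sketched in a few lines. This is only an illustration, not BugEx's implementation: the individual is a single integer input rather than a bytecode-encoded test, the target branch `x == 42` is hypothetical, and the search uses mutation and selection only.

```python
import random

def branch_distance(x: int, target: int = 42) -> int:
    """Distance to making the hypothetical branch `x == target` true (0 = branch taken)."""
    return abs(x - target)

def evolve(seed: int = 0, generations: int = 1000, pop_size: int = 10) -> int:
    """Tiny mutation-only genetic search that minimizes branch distance."""
    rng = random.Random(seed)
    population = [rng.randint(-1000, 1000) for _ in range(pop_size)]
    for _ in range(generations):
        # Mutation: nudge every individual by a small random amount.
        offspring = [x + rng.randint(-10, 10) for x in population]
        # Selection: keep the individuals closest to taking the branch.
        population = sorted(population + offspring, key=branch_distance)[:pop_size]
        if branch_distance(population[0]) == 0:
            break
    return population[0]

best = evolve()
print(best, branch_distance(best))
```

Because the best individuals always survive selection, the distance never gets worse from one generation to the next. Nothing guarantees the search reaches distance 0, which is the "global optimality not guaranteed" caveat in miniature.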
Overview of BugEx (hint: it’s a Search)
Generate tests to explore the search space (Genetic algorithm)
Find facts that correlate with failure to guide test generation (Statistical debugging)
Show results
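The three steps above can be sketched as a toy loop. Everything here is an assumption made for illustration: `run` stands in for a hypothetical program under test with a made-up failure condition, `neighbours` replaces genetic mutation with a fixed set of small input changes, and the score is a simple difference of shares rather than the correlation measure used in the paper. The feedback step (using the correlating facts to steer generation) is left out to keep the sketch short.

```python
from itertools import product

def run(test):
    """Hypothetical program under test; returns (observed facts, failed?)."""
    x, y = test
    facts = {"x > 10": x > 10, "y == 0": y == 0, "x is even": x % 2 == 0}
    return facts, (x > 10 and y == 0)  # toy failure condition

def neighbours(test):
    """Stand-in for genetic mutation: small changes to each input."""
    x, y = test
    return {(x + dx, y + dy) for dx, dy in product((-12, 0, 1), (0, 1))}

def correlate(runs):
    """Score = share of failing runs where the fact holds minus share of passing runs."""
    fail = [f for f, failed in runs if failed]
    ok = [f for f, failed in runs if not failed]
    share = lambda group, name: sum(f[name] for f in group) / len(group) if group else 0.0
    return {name: share(fail, name) - share(ok, name) for name in runs[0][0]}

def bugex_like_search(failing_test, rounds=3):
    tests = {failing_test}
    scores = {}
    for _ in range(rounds):
        # 1. Generate tests around everything seen so far.
        tests |= {n for t in tests for n in neighbours(t)}
        # 2. Find facts that correlate with failure.
        scores = correlate([run(t) for t in sorted(tests)])
    # 3. Show results, strongest correlation first.
    return sorted(scores.items(), key=lambda kv: -kv[1])

for fact, score in bugex_like_search((20, 0), rounds=1):
    print(f"{score:+.2f}  {fact}")
```

Starting from the single failing test `(20, 0)`, one round is enough for the two facts in the failure condition to rank above the irrelevant `x is even`.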
BugEx Algorithm: Initialization (Figure 4, p. 312)
BugEx Algorithm: Main Loop (Figure 4, p. 312)
[Figure annotations: LOOP; (of the best!); (Statistical Debugging); (Genetic Algorithm); (branches or state predicates)]
14. F := getFacts(Tfail) U getFacts(Tpass) U F
1. Fact must be Boolean: either true or false at runtime
2. Fact must be observable.
Branches
• Reached or not reached
• T or F branch taken?
State Predicates
• All available variables, objects, constants at beginning of method
• Form: (attribute | parameters | inspector) (< | > | <= | >= | = | !=) (attribute | parameters | inspector | constant)
How big is this space (in Big O)?
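One rough way to get a feel for the size of the state-predicate space is to enumerate it for a handful of values. The sketch below is a simplification and not the paper's `getFacts`: it treats attributes, parameters and inspector results uniformly as named values at method entry, and the names `size`, `index` and `capacity` are made up.

```python
import operator
from itertools import permutations

OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

def state_predicates(values: dict, constants=(0,)) -> dict:
    """Evaluate every `lhs op rhs` fact over named values observed at method entry."""
    facts = {}
    # value-vs-value comparisons over all ordered pairs of distinct names
    for (a, va), (b, vb) in permutations(values.items(), 2):
        for sym, op in OPS.items():
            facts[f"{a} {sym} {b}"] = op(va, vb)
    # value-vs-constant comparisons
    for a, va in values.items():
        for c in constants:
            for sym, op in OPS.items():
                facts[f"{a} {sym} {c}"] = op(va, c)
    return facts

facts = state_predicates({"size": 3, "index": 5, "capacity": 3})
print(len(facts))  # 6 * (3*2 + 3*1) = 54 Boolean facts
```

Under this simplified grammar, n values and k constants give 6 · (n(n−1) + n·k) predicates per method entry, so the space grows quadratically in the number of observable values.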
16. Fcorrelating := correlateToFailure(F, Tfail, Tpass)
Bayes’ Theorem
Bayesian Inference
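As a reminder of how Bayes' theorem applies here, the probability of failure given that a fact holds can be estimated from counts over passing and failing runs. This is a generic illustration, not necessarily the exact scoring in `correlateToFailure`; the function name, the representation of a run as the set of facts that held in it, and the fact names are all assumptions.

```python
def p_fail_given_fact(fact, failing_runs, passing_runs):
    """P(fail | fact) = P(fact | fail) * P(fail) / P(fact), estimated from counts.

    Each run is the set of facts that held during it.
    Assumes at least one failing run (the search starts from a failing test).
    """
    n_fail, n_pass = len(failing_runs), len(passing_runs)
    total = n_fail + n_pass
    p_fail = n_fail / total
    p_fact_given_fail = sum(fact in run for run in failing_runs) / n_fail
    p_fact = sum(fact in run for run in failing_runs + passing_runs) / total
    if p_fact == 0:
        return 0.0  # fact never observed, so there is no evidence either way
    return p_fact_given_fail * p_fail / p_fact

failing = [{"a", "b"}, {"a"}]
passing = [{"b"}, {"c"}]
for fact in ("a", "b", "c"):
    print(fact, p_fail_given_fact(fact, failing, passing))
```

In this toy data, fact `a` holds only in failing runs, `b` holds equally often in both, and `c` holds only in a passing run, which is the contrast between good and bad runs that statistical debugging relies on.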
Slides courtesy of Jeremias Rößler (2012)
Empirical Evaluation
Empirical Research Questions
• RQ1. Is the number of relevant facts identified by BUGEX small enough for a developer to examine?
[Figure: # of Branches vs Time to Converge; axes: Branches, Seconds]
RQ1: BugEx compared to Statistical Debugging
BugEx
Empirical Research Questions
• RQ2. Do the facts identified by BUGEX help the developer understand the failure?
• Authors answered ‘yes’, compared their fix with the ‘official fix’. Challenging because sometimes the original developers refactored the code at a larger scale.
Subsequent User Studies: nope
“This study showed how much effort the design and preparation of
a user study requires, and how easy error prone it is. This is
probably the reason, why there are still so few user studies in the
field of automated debugging.”
“So there was little time to prepare BUGEX and the underlying
infrastructure.”
Rößler, Jeremias. "From software failure to explanation." (2013).
Summary
• BugEx combines Statistical Debugging and Automated Test Generation (GA) to improve debugging precision.
• BugEx treats debugging as a search problem, and tries to find information that is useful to developers.
• Usefulness difficult to evaluate because prototype tool is very specific.
Relation of BugEx to Project
• Guided automatic test generation.
• Focus on message passing programs, observed at the component level (ROS – robot operating system)
• Use program traces to generate test suites for regression testing, based on component properties.