“isolating failure causes through test case generation “ jeremias rößler gordon fraser...

“Isolating Failure Causes through Test Case Generation”

Jeremias Rößler, Gordon Fraser, Andreas Zeller, Alessandro Orso

Presented by John-Paul Ore

Motivation: Debugging & Maintenance Are Super Expensive

• Cost to develop software worldwide: $1,500,000,000,000 (USD)

• Debugging and maintenance cost: $350,000,000,000 (USD) (assumes 23% of developer time is spent debugging)

Source: Judge Business School of the University of Cambridge, UK (2013); Evans Data Corporation (2012); Payscale (2012); RTI (2002); CVP Surveys (2012)

What is Debugging?

Finding the fault responsible for a failure and applying a change to program P such that P becomes correct with respect to the specification S concerning that failure.

Debugging includes a search problem. We can automate search.

Talk Outline

• Problems BugEx seeks to address
• Background concepts
• Inner workings of the BugEx algorithm
• Empirical evaluation
• Relation of this work to the 990 class project

Automated Debugging: still a hard problem

Parnin, Chris, and Alessandro Orso. "Are automated debugging techniques actually helping programmers?" Proceedings of the 2011 International Symposium on Software Testing and Analysis. ACM, 2011.

BugEx: Overview of Problems Addressed

Problem 1: Automated debugging techniques reveal too many possible code locations

Solution 1: Increase precision through guided test generation

Problem 2: Even if the location is known, the developer might not fully understand the bug

Solution 2: Present ‘facts’ rather than code locations

Problem 3: Other experimental techniques (Delta Debugging, Predicate Switching) are unsound

Solution 3: Generate real program executions

BugEx: Underlying Concepts

1. Expands on statistical debugging. Correlate program facts with failures

1. BugEx extends Statistical Debugging (Benjamin Liblit et al.)

Liblit, B., Aiken, A., Zheng, A. X., & Jordan, M. I. (2003). Bug isolation via remote program sampling. ACM SIGPLAN Notices, 38(5), 141-154. (See the paper for further related work.)

“Statistical debugging works off of the contrast between good and bad runs, so you need to feed it both.” – B. Liblit.

[Figure: a passing test case contrasted with a failing test case]
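To make the contrast between good and bad runs concrete, here is a minimal Python sketch. It is an illustration only, not Liblit et al.'s or BugEx's actual scoring: each run records the set of predicates observed true and whether it failed, and each predicate is scored by the fraction of runs in which it was true that ended in failure. The predicate names and run data are hypothetical.

```python
from collections import Counter


def failure_scores(runs):
    """Score each predicate by how often runs where it was true failed.

    runs: list of (true_predicates, failed) pairs, where true_predicates is
    the set of predicate names observed true in that run and failed is a bool.
    """
    true_in_fail = Counter()
    true_in_pass = Counter()
    for predicates, failed in runs:
        # Count each predicate once per run, split by run outcome.
        (true_in_fail if failed else true_in_pass).update(predicates)
    scores = {}
    for p in set(true_in_fail) | set(true_in_pass):
        f, s = true_in_fail[p], true_in_pass[p]
        scores[p] = f / (f + s)  # fraction of "p was true" runs that failed
    return scores


# Hypothetical runs: two failing, two passing.
runs = [
    ({"x > 0", "y == null"}, True),
    ({"x > 0", "y == null"}, True),
    ({"x > 0"}, False),
    ({"x <= 0"}, False),
]
scores = failure_scores(runs)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Here `y == null` is true only in failing runs and scores 1.0, while `x > 0` also appears in a passing run and scores lower. Without the passing runs, both predicates would look equally suspicious, which is why both kinds of runs are needed.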

BugEx: Underlying Concepts

1. Expands on statistical debugging. Correlate program facts with failures

2. Use automatic test generation (genetic algorithms) to create a statistically significant number of tests

2. Test Case Generation: Genetic Algorithms

• Each individual is a test, encoded in Java bytecode
• Mutation might change branching or variable values

[Figure: parent tests TEST_a and TEST_b produce offspring TEST_a′ and TEST_b′]

• Fitness: branch distance or predicate distance (closer is better)
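Branch distance can be sketched as follows, assuming the textbook definitions from search-based testing (Korel; Tracey et al.) with a penalty constant K = 1 for integer operands. The exact formulas inside BugEx and its test generator may differ. A distance of 0 means the desired branch outcome is reached, and a smaller value means a test is closer to reaching it. The same idea extends to state predicates, which use the same relational operators.

```python
K = 1  # assumed penalty added when a strict comparison is not yet satisfied


def branch_distance(op, a, b):
    """Distance to making the condition `a op b` true; 0 means it holds."""
    if op == "==":
        return abs(a - b)
    if op == "!=":
        return 0 if a != b else K
    if op == "<":
        return 0 if a < b else a - b + K
    if op == "<=":
        return 0 if a <= b else a - b
    if op == ">":
        return 0 if a > b else b - a + K
    if op == ">=":
        return 0 if a >= b else b - a
    raise ValueError(f"unsupported operator: {op}")


def normalize(d):
    """Map a distance in [0, inf) to [0, 1) so it can be combined with other terms."""
    return d / (d + 1)
```

For example, `branch_distance("==", 40, 42)` is 2 and `branch_distance("<", 5, 3)` is 3; normalizing keeps every distance in [0, 1).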

Image: http://www.ewh.ieee.org/soc/es/May2001/14/Begin.htm

Test Case Generation: Genetic Algorithms

• The shape of the fitness function (its gradient) directs the search

• Global optimality is not guaranteed
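The search can be sketched as a small genetic algorithm. To keep it self-contained, each individual is reduced to a single integer input (a simplification; in BugEx an individual is a whole test), fitness is the branch distance to a hypothetical target branch `x == 42`, and the loop uses truncation selection with elitism, a crossover that picks a value between two parents, and random mutation. All parameter values are assumptions. Because the best quarter of each generation survives, the best fitness never gets worse from one generation to the next, but nothing guarantees that it reaches 0 within the generation budget.

```python
import random

TARGET = 42  # hypothetical branch under test: `if x == TARGET:`


def fitness(x):
    # Branch distance for `x == TARGET`; 0 means the branch is covered.
    return abs(x - TARGET)


def mutate(x, rng):
    # Small step most of the time, occasional large jump to escape plateaus.
    step = rng.randint(-3, 3) if rng.random() < 0.8 else rng.randint(-500, 500)
    return x + step


def crossover(a, b, rng):
    # For integer individuals: pick a random value between the two parents.
    lo, hi = min(a, b), max(a, b)
    return rng.randint(lo, hi)


def evolve(pop_size=20, generations=100, seed=0):
    rng = random.Random(seed)
    population = [rng.randint(-1000, 1000) for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        population.sort(key=fitness)  # lower distance = fitter
        history.append(fitness(population[0]))
        if history[-1] == 0:
            break  # target branch covered
        elite = population[: pop_size // 4]  # elitism: keep the best quarter
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)  # parents chosen from the elite
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    best = min(population, key=fitness)
    return best, history
```

The returned `history` makes the monotone but possibly incomplete progress visible: on a harder fitness landscape the same loop can stall on a plateau or in a local optimum.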

Image © Mathworks, 2010

Overview of BugEx (hint: it’s a Search)

Generate tests to explore the search space (genetic algorithm)

Find facts that correlate with failure to guide test generation (statistical debugging)

Show results

BugEx Algorithm: Initialization (Figure 4, p. 312)

BugEx Algorithm: Main Loop (Figure 4, p. 312)

[Figure 4 annotations: LOOP; (of the best!); (Genetic Algorithm); (Statistical Debugging); facts are branches or state predicates]

14. F := getFacts(T_fail) ∪ getFacts(T_pass) ∪ F

1. A fact must be Boolean: either true or false at runtime.
2. A fact must be observable.

Branches
• Reached or not reached
• True or false branch taken?

State Predicates
• Built from all variables, objects, and constants available at the beginning of the method
• Form: (attribute | parameter | inspector) (< | > | <= | >= | = | !=) (attribute | parameter | inspector | constant)

How big is this space (in Big O)?
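One way to answer the question is to enumerate the state-predicate form (attribute | parameter | inspector) op (attribute | parameter | inspector | constant) directly. The Python sketch below assumes n distinct left-hand operands and c distinct constants, and skips trivial self-comparisons such as `x < x`. It then produces 6 · n · (n + c − 1) candidates per method, which is O(n · (n + c)), quadratic in the number of operands. A real tool would also filter by type (for example, not applying `<` to a boolean inspector); this sketch does not. The operand names are hypothetical.

```python
from itertools import product

OPERATORS = ["<", ">", "<=", ">=", "==", "!="]


def state_predicates(operands, constants):
    """Enumerate candidate predicates `left op right` at method entry.

    operands: names of attributes, parameters and inspector results in scope.
    constants: constants that may appear only on the right-hand side.
    """
    right_side = list(operands) + list(constants)
    preds = []
    for left, op, right in product(operands, OPERATORS, right_side):
        if left != right:  # skip trivial self-comparisons such as `x < x`
            preds.append(f"{left} {op} {right}")
    return preds


# Hypothetical method context: three operands and two constants.
preds = state_predicates(["this.size", "index", "list.isEmpty()"], ["0", "null"])
print(len(preds))
```

For three operands and two constants this yields 6 · 3 · 4 = 72 candidate predicates; adding operands grows the count quadratically.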

16. F_correlating := correlateToFailure(F, T_fail, T_pass)
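Lines 14 and 16 of the main loop can be sketched as below. Everything here is an illustrative assumption: `run_test` is a hypothetical method under test that fails for negative inputs, test inputs are drawn at random as a stand-in for BugEx's genetic algorithm (which would instead use the correlating facts to guide the next generation), and `correlate_to_failure` uses a deliberately strict criterion (true in every failing run, false in every passing run) rather than BugEx's Bayesian scoring. The loop stops once at least one failing and one passing run exist and at most one correlating fact remains.

```python
import random


def run_test(x):
    """Hypothetical method under test: it fails for negative inputs.

    Returns the Boolean facts observed during the run and whether it failed.
    """
    facts = {"x < 0" if x < 0 else "x >= 0",
             "x even" if x % 2 == 0 else "x odd"}
    return facts, x < 0


def get_facts(runs):
    # Union of all facts observed in a list of runs.
    return set().union(*runs)


def correlate_to_failure(facts, t_fail, t_pass):
    # Assumed strict criterion: true in every failing run, in no passing run.
    return {f for f in facts
            if all(f in r for r in t_fail) and not any(f in r for r in t_pass)}


def search(max_iterations=50, batch=5, seed=0):
    rng = random.Random(seed)
    F, t_fail, t_pass = set(), [], []
    correlating = set()
    for _ in range(max_iterations):
        # Stand-in for the genetic algorithm: random test inputs.
        for _ in range(batch):
            facts, failed = run_test(rng.randint(-100, 100))
            (t_fail if failed else t_pass).append(facts)
        F = get_facts(t_fail) | get_facts(t_pass) | F           # line 14
        correlating = correlate_to_failure(F, t_fail, t_pass)   # line 16
        if t_fail and t_pass and len(correlating) <= 1:
            break
    return correlating
```

Early on, incidental facts such as `x even` can look correlated; as more runs accumulate, passing runs and odd failing inputs rule them out, leaving `x < 0`.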

Bayes’ Theorem

Bayesian Inference
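The Bayesian step can be illustrated with textbook Bayes' theorem, P(fail | f) = P(f | fail) · P(fail) / P(f), with each probability estimated from run counts. This is a sketch of the idea only: the slides do not show BugEx's exact formula or priors, and the counts in the example are made up.

```python
from fractions import Fraction


def p_fail_given_fact(n_fail_with_f, n_fail, n_pass_with_f, n_pass):
    """Estimate P(fail | f) via Bayes' theorem from run counts.

    Requires at least one failing run and at least one run where f was true.
    """
    if n_fail == 0 or n_fail_with_f + n_pass_with_f == 0:
        raise ValueError("need a failing run and at least one run where f holds")
    total = n_fail + n_pass
    p_fail = Fraction(n_fail, total)                           # prior P(fail)
    p_f_given_fail = Fraction(n_fail_with_f, n_fail)           # likelihood P(f | fail)
    p_f = Fraction(n_fail_with_f + n_pass_with_f, total)       # evidence P(f)
    return p_f_given_fail * p_fail / p_f


# Hypothetical counts: 10 failing runs (fact true in 9), 30 passing runs (fact true in 3).
print(p_fail_given_fact(9, 10, 3, 30))
```

With these counts the result is 3/4, the same as 9 failing runs out of the 12 runs where the fact held: with raw frequency estimates, Bayes' theorem reduces to that ratio.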

Slides courtesy of Jeremias Rößler (2012)

Empirical Evaluation

Empirical Research Questions

• RQ1. Is the number of relevant facts identified by BUGEX small enough for a developer to examine?

[Figure: number of branches vs. time to converge; axes: branches, seconds]

RQ1: BugEx compared to Statistical Debugging

[Figure: BugEx compared to statistical debugging]

Empirical Research Questions

• RQ2. Do the facts identified by BUGEX help the developer understand the failure?

• The authors answered ‘yes’, by comparing their own fix with the ‘official fix’. This was challenging because the original developers sometimes refactored the code at a larger scale.

Subsequent User Studies: nope

“This study showed how much effort the design and preparation of a user study requires, and how easy error prone it is. This is probably the reason, why there are still so few user studies in the field of automated debugging.”

“So there was little time to prepare BUGEX and the underlying infrastructure.”

Rößler, Jeremias. "From software failure to explanation." (2013).

Summary

• BugEx combines Statistical Debugging and Automated Test Generation (GA) to improve debugging precision.

• BugEx treats debugging as a search problem and tries to find information that is useful to developers.

• Usefulness is difficult to evaluate because the prototype tool is very specific.

Relation of BugEx to Project

• Guided automatic test generation.
• Focus on message-passing programs, observed at the component level (ROS, the Robot Operating System).
• Use program traces to generate test suites for regression testing, based on component properties.
