bug isolation in the presence of multiple errors ben liblit, mayur naik, alice x. zheng, alex aiken,...

24
Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Post on 18-Dec-2015

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Bug Isolation in thePresence of Multiple Errors

Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan

UC Berkeley and Stanford University

Page 2: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

In Our Last Episode…

• Generic instrumentation schemes– Wild guesses about interesting behavior

• Bernoulli sampling transformation– Amortization across acyclic regions

• Data mining techniques– Deterministic: process of elimination– Non-deterministic: logistic regression

Page 3: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Schemes Sites Predicates

• Several instrumentation schemes available– Function returns, pairwise comparisons, branches, …

• Scheme induces finite set of instrumentation sites• Site determines finite set of observable predicates• Predicates completely partition each site

– Bump exactly one counter per observation

– Infer additional predicates (e.g. ≤, ≠, ≥) offline

Page 4: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

What Does This Give Us?

• Absolutely certain of what we do see

• Uncertain of what we don’t see

• Given enough runs, samples ≈ reality– Common events seen most often– Rare events seen at proportionate rate

Page 5: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Regularized Logistic Regression

• S-shaped cousin to linear regression• Predict success/failure as function of counters• Penalty factor forces most coefficients to zero

– Large coefficient highly predictive of failure

count

failure = 1

success = 0

Page 6: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Buffer Overrun in bc

void more_arrays (){ …

/* Copy the old arrays. */ for (indx = 1; indx < old_count; indx++) arrays[indx] = old_ary[indx];

/* Initialize the new elements. */ for (; indx < v_count; indx++) arrays[indx] = NULL;

…}

#1: indx > scale#2: indx > use_math#3: indx > opterr#4: indx > next_func#5: indx > i_base

#1: indx > scale#2: indx > use_math#3: indx > opterr#4: indx > next_func#5: indx > i_base

Page 7: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Limitations of Logistic Regression

• Linearly-weighted combination of features– What does this mean?

• Many correlated features– Weight may be spread in unpredictable ways

• Suited to explaining a single mode of failure– Do you really believe you have just one bug?

Page 8: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Multiple-Bug Isolation

• Consider predicates one at a time– Include inferred predicates (e.g. ≤, ≠, ≥)

• How likely is failure when predicate P is true?– (technically, when P is observed to be true)

Page 9: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Multiple-Bug Isolation

• Consider predicates one at a time– Include inferred predicates (e.g. ≤, ≠, ≥)

• How likely is failure when predicate P is true?– (technically, when P is observed to be true)

)()(

)()(

PFPS

PFPBad

Page 10: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Are We Done? Not Exactly!

f = …;

if (f == NULL) {

x = 0;

*f;

}

• Predicate (x == 0) is an innocent bystander– Program is already doomed

Bad(f == NULL)

= 1.0

Bad(f == NULL)

= 1.0Bad(x == 0)

= 1.0

Bad(x == 0)

= 1.0

Page 11: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Three-Valued Logic

• Identify unlucky sites on the doomed path

• Captures risk of failure from reaching site at all, regardless of predicate truth/falsehood

)()(

)()(

PPSPPS

PPFPContext

Page 12: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Getting to the Heart of the Matter

• Looking for increase in failure odds

• Correspondence to likelihood ratio testing

)()()( PContextPBadPIncrease

Page 13: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Multiple-Bug Filtering & Ranking

1. Discard predicates having Increase(P) ≤ 0– Dead predicates– Invariant predicates– Bystander predicates– Others

2. Sort remaining predicates by Bad(P)– Likely causes with determinacy metrics

Page 14: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Case Study: Moss

• Reintroduce nine historic Moss bugs– Including wrong-output bugs

• Instrument with everything we’ve got– Branches, returns, scalar pairs, the works

• Generate 32,000 randomized runs

Page 15: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Effectiveness of Filtering

• Eliminates 99% of branch predicates– 4170 → 51

• Eliminates 99.5% of return predicates– 2964 → 16

• Eliminates 96% of scalar pair predicates– 195,864 → 8242

Page 16: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Effectiveness of Ranking

• Five bugs: captured by branches, returns– Lists are short, easy to examine by hand– “Smoking guns” rise to the top– Stop early if Bad() dips down

• Two bugs: buried in scalar pairs results– List is still too large to be useful

• Two bugs: never cause a failure– No failure, no problem!

Page 17: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Summary: Putting it All Together

• Wild guesses + fair random sampling• Seek behaviors that co-vary with outcome

– Statistical modeling, confidence tests, more…

• Future work– More selective instrumentation schemes– Non-uniform sampling– Improved statistical models– Use of program structure in analysis

Page 18: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Join the Cause!

The Cooperative Bug Isolation Projecthttp://www.cs.berkeley.edu/~liblit/sampler/

Page 19: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University
Page 20: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Linear Regression

xxXyYP T

0)|(

• Match a line to the data points• Outcome can be anywhere along y axis• But our outcomes are always 0/1

Page 21: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Logistic Regression

xxXYP

T

0exp1

1)|1(

• Prediction asymptotically approaches 0 and 1– 0: predict success

– 1: predict failure

Page 22: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Training the Model

xYPy

xYPyyxLL

|11log)1(

|1log),,(

• Maximize LL using stochastic gradient ascent• Problem: model is wildly under-constrained

– Far more counters than runs

– Will get perfectly predictive model just using noise

Page 23: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University

Regularized Logistic Regression

j j

xYPy

xYPyyxLL

|11log)1(

|1log),,(

• Add penalty factor for nonzero terms• Force most coefficients to zero• Retain only features that “pay their way” by

significantly improving prediction accuracy

Page 24: Bug Isolation in the Presence of Multiple Errors Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan UC Berkeley and Stanford University