Bug Isolation via Remote Program Sampling
Ben Liblit, Alex Aiken, Alice X. Zheng, Michael I. Jordan
Presented by: Xia Cheng
Outline of My Talk
• Bug Isolation Using Predicate Elimination
• Statistical Debugging
• Related Work
• Privacy and Security
• Future Work
• Conclusions
Bug Isolation Using Predicate Elimination
• Instrumentation strategy
• Elimination strategies
• Data collection and analysis
• Refinement over time
• Performance impact
Bug Isolation: Instrumentation strategy
• Automatic isolation of deterministic bugs
• Release 1.2 of the ccrypt encryption tool
• Randomly sampling function return values may identify key operations that behave differently in successful versus crashed runs
• Group return values into three classes: negative values, zero, and positive values
Bug Isolation: Instrument ccrypt
• At each syntactic call site that returns a scalar value, update one of three counters
• Each site has a triple of counters: negative value, zero, positive value
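A minimal Python sketch of the counter triple described above. The names `counters` and `observe_return` and the call-site labels are hypothetical; the talk does not show the actual ccrypt instrumentation.

```python
import random
from collections import defaultdict

# Hypothetical names: `counters`, `observe_return`, and the site labels
# are illustrative only and are not taken from the ccrypt instrumentation.
counters = defaultdict(lambda: [0, 0, 0])  # per site: [negative, zero, positive]

def observe_return(site, value, rate=1 / 1000, rng=random):
    """With probability `rate`, record the sign class of `value` for `site`."""
    if rng.random() < rate:
        index = 0 if value < 0 else (1 if value == 0 else 2)
        counters[site][index] += 1
    return value  # pass the value through so the call site behaves as before

# rate=1.0 samples every call, which keeps this demo deterministic.
observe_return("main.c:42 open()", -1, rate=1.0)
observe_return("main.c:42 open()", 3, rate=1.0)
observe_return("main.c:57 read()", 0, rate=1.0)
print(dict(counters))
```

Each counter in a triple corresponds to one predicate about the call site, for example "the return value at this site was negative".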
Bug Isolation: Elimination strategies (discard irrelevant predicates)
• Elimination by universal falsehood: disregard any counter that is zero on all runs. These represent predicates that can never be true.
• Elimination by lack of failing coverage: disregard any triple of counters all three of which are zero on all failed runs. These sites are not even reached in failing executions.
Bug Isolation: Discard irrelevant predicates (cont.)
• Elimination by lack of failing example: disregard any counter that is zero on all failed runs. These predicates need not be true for a failure to occur.
• Elimination by successful counterexample: disregard any counter that has a non-zero value on any successful run. These predicates can be true without a subsequent program failure.
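The four elimination strategies can be sketched as set filters over per-run counter values. This is an illustrative sketch only: the run format (a `crashed` flag plus a `counts` dictionary keyed by `(site, class)`), the function names, and the toy data are assumptions, not the authors' implementation.

```python
CLASSES = ("neg", "zero", "pos")

def nonzero(run, counter):
    return run["counts"].get(counter, 0) > 0

def universal_falsehood(runs, counters):
    """Keep counters that are non-zero on at least one run."""
    return {c for c in counters if any(nonzero(r, c) for r in runs)}

def lack_of_failing_coverage(runs, counters):
    """Keep counters whose site was reached in at least one failed run."""
    failed = [r for r in runs if r["crashed"]]
    reached = {site for r in failed
               for (site, _cls), n in r["counts"].items() if n > 0}
    return {c for c in counters if c[0] in reached}

def lack_of_failing_example(runs, counters):
    """Keep counters that are non-zero on at least one failed run."""
    failed = [r for r in runs if r["crashed"]]
    return {c for c in counters if any(nonzero(r, c) for r in failed)}

def successful_counterexample(runs, counters):
    """Keep counters that are zero on every successful run."""
    ok = [r for r in runs if not r["crashed"]]
    return {c for c in counters if not any(nonzero(r, c) for r in ok)}

# Toy data (hypothetical): two call sites "f" and "g", three runs.
runs = [
    {"crashed": False, "counts": {("f", "zero"): 2, ("g", "pos"): 1}},
    {"crashed": True,  "counts": {("f", "neg"): 1, ("g", "pos"): 1}},
    {"crashed": False, "counts": {("f", "zero"): 1}},
]
all_counters = {(site, cls) for site in ("f", "g") for cls in CLASSES}

# Combine two strategies by intersecting their surviving sets.
survivors = (lack_of_failing_example(runs, all_counters)
             & successful_counterexample(runs, all_counters))
print(survivors)
```

In this toy data the only predicate that is observed in a crash and never in a success is "f returned a negative value".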
Bug Isolation
• Universal falsehood: discards 1569 counters that are zero on all runs; leaves 141 candidate predicates
• Lack of failing coverage: discards 526 counter triples that are all zero on all crashes; leaves 132 candidate predicates
• Lack of failing example: discards 1665 counters that are zero on all crashes; leaves 45 candidate predicates
• Successful counterexample: discards 139 counters that are non-zero on any successful run; leaves 1571 candidate predicates
Bug Isolation: Data collection and analysis
• 2990 trial runs at sampling rate 1/1000; 88 of these end in a crash
• Successful counterexample is distinct; the other three strategies partially overlap
• Universal falsehood and successful counterexample test disjoint properties and combine to good effect
• Universal falsehood and successful counterexample both apply to successful runs, so they can be analyzed together
• Lack of failing example eliminates the most features; combine it with successful counterexample
• Lack of failing coverage is an inherently weaker strategy
Bug Isolation: Refinement over time
• Elimination strategies benefit from an increasing number of runs
• On average, 1750 runs are enough to isolate twenty candidate features
• Greater diversity of runs benefits the analysis
Bug Isolation: Performance impact
• Sampling transformation: a simpler but slower pattern that checks the next-sample countdown at each and every site
• The performance impact is minimal for sampled instrumentation
• Overhead for 1/1000 sampling is less than 4%
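The "check the countdown at every site" pattern can be sketched as follows. This is a minimal Python illustration under assumptions: the class name `Sampler`, the helper `next_countdown`, and the use of a geometric draw to realize fair 1/1000 sampling are illustrative choices, not code from the talk.

```python
import math
import random

def next_countdown(rate, rng):
    """Number of sites until the next sample: geometric, always >= 1.

    Assumes 0 < rate < 1.
    """
    u = 1.0 - rng.random()  # u is in (0, 1], so log(u) is defined
    return int(math.log(u) / math.log(1.0 - rate)) + 1

class Sampler:
    def __init__(self, rate, seed=0):
        self.rate = rate
        self.rng = random.Random(seed)
        self.countdown = next_countdown(rate, self.rng)

    def should_sample(self):
        """Called at each and every instrumentation site."""
        self.countdown -= 1
        if self.countdown > 0:
            return False
        self.countdown = next_countdown(self.rate, self.rng)
        return True

sampler = Sampler(rate=1 / 1000, seed=1)
hits = sum(sampler.should_sample() for _ in range(200_000))
print(hits)  # roughly 200 on average; the exact value depends on the seed
```

Drawing the gap between samples from a geometric distribution gives every site the same independent chance of being sampled, while the common case at each site is a single decrement and comparison.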
Statistical Debugging
• Instrumentation strategy
• Crash prediction using logistic regression
• Data collection and analysis
• Performance impact
Statistical Debugging: Instrumentation strategy
• Automatic isolation of non-deterministic bugs
• Instrument bc to guess and randomly check a large number of predicates
• Goal: identify predicates that capture bad behavior, that is, false on successful runs and true on crashing runs
• Cast an extremely broad net, with an eye toward pointer and buffer errors
Statistical Debugging: Crash prediction using logistic regression
• Goal: narrow down the set of features
• Method: balance good classification performance with aggressive feature selection
• Binary classifier: given by quantizing the logistic function output; takes feature values as input and outputs a prediction of either 0 or 1
• Feature selection: achieved by regularizing the function parameters to ignore most input features, forcing the model to predict success or failure using just a small selection of sampled features
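A small Python sketch of a regularized logistic classifier that performs this kind of feature selection. Several things here are assumptions rather than details from the talk: the penalty is taken to be an L1 penalty, the optimizer is plain proximal gradient ascent with soft-thresholding (swapped in for simplicity), and the names `fit_l1_logistic`, `predict`, `lam`, and `step` as well as the toy data are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l1_logistic(X, y, lam=0.1, step=0.1, iters=2000):
    """Maximize (average log likelihood) - lam * sum(|beta_j|)."""
    n, d = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * d
    for _ in range(iters):
        mu = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) for row in X]
        err = [yi - mi for yi, mi in zip(y, mu)]
        b0 += step * sum(err) / n  # the intercept is not penalized
        for j in range(d):
            grad = sum(e * row[j] for e, row in zip(err, X)) / n
            bj = beta[j] + step * grad
            # Soft-thresholding drives weak coefficients to exactly zero.
            beta[j] = math.copysign(max(abs(bj) - step * lam, 0.0), bj)
    return b0, beta

def predict(b0, beta, row):
    """Quantize the logistic output to a 0/1 prediction."""
    z = b0 + sum(b * x for b, x in zip(beta, row))
    return 1 if sigmoid(z) >= 0.5 else 0

# Toy data (hypothetical): feature 0 decides the outcome, 1 and 2 are noise.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [row[0] for row in X]  # 1 = crash, 0 = success
b0, beta = fit_l1_logistic(X, y)
print(beta)
```

On this toy data only the first coefficient stays non-zero, which is the behavior the slide describes: the model is forced to predict with a small selection of features.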
Statistical Debugging
• To learn a good classifier, maximize the log likelihood of the training set
• The distribution is modeled as a logistic function
• Penalized log likelihood function
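One standard way to write these three steps, assuming a feature vector $x$, a label $y \in \{0, 1\}$ (1 = crash), weights $\beta$, intercept $\beta_0$, $M$ training runs, and a penalty weight $\lambda$. The symbols and the $\ell_1$ form of the penalty are assumptions, not taken from the slide:

```latex
% Logistic model of the crash probability
P(Y = 1 \mid x) = \mu_\beta(x) = \frac{1}{1 + \exp\left(-\beta_0 - \beta^{\top} x\right)}

% Log likelihood of the training set (x_i, y_i), i = 1, ..., M
LL(\beta) = \sum_{i=1}^{M} \Big[ y_i \log \mu_\beta(x_i) + (1 - y_i) \log\big(1 - \mu_\beta(x_i)\big) \Big]

% Penalized log likelihood: the penalty pushes most coefficients to zero
\hat{\beta} = \arg\max_{\beta} \; LL(\beta) - \lambda \lVert \beta \rVert_1
```

A larger $\lambda$ selects fewer features; a prediction of 1 corresponds to $\mu_\beta(x) \ge 1/2$.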
Statistical Debugging: Data collection and analysis
• The bc data set consists of 4390 runs with distinct random inputs and distinct randomized 1/1000 sampling
Statistical Debugging: Performance impact
Related Work: Performance profiling and optimization
• Triggers: periodic hardware timers/interrupts, periodic software event counters, or both [M. Arnold, 2000]
• Digital Continuous Profiling Infrastructure [Anderson 1997]: chooses sampling intervals randomly
Related Work: Trace collection for program understanding
• Difficulty: minimizing performance overhead and managing large quantities of captured data
• Directly adapt dynamic trace analysis techniques to this domain
Related Work (cont.)
These systems share techniques with this work, which differs from each as noted:
• Daikon: makes fairly unstructured guesses and eliminates those that do not hold [Ernst 2001]. New approach: gathering data from production code
• DIDUCE: identifies bugs using analysis of executions [Hangal 2002]. New approach: more probabilistic, correlating predicate violations with increased likelihood of failure
• Software tomography, through the GAMMA system: low-overhead distributed monitoring of deployed code [Bowring 2002]. New approach: bug isolation
Privacy and Security
• Statistical model: a mechanism for protecting user anonymity
• Logistic regression: parameters are updated with each new trace
• Statistical approach with noise: guards against malicious users
• Collaborative filtering system
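The idea of updating the model parameters with each new trace can be sketched as a single stochastic-gradient step. This is a hedged illustration: the function name `update_with_trace`, the step size, and the assumption that the raw trace can be discarded after the update are not taken from the talk.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def update_with_trace(b0, beta, features, crashed, step=0.05):
    """One gradient step on the log likelihood of a single run.

    Only the updated parameters are returned; nothing about the
    individual trace is stored (an assumption of this sketch).
    """
    mu = sigmoid(b0 + sum(b * x for b, x in zip(beta, features)))
    err = (1.0 if crashed else 0.0) - mu
    new_b0 = b0 + step * err
    new_beta = [b + step * err * x for b, x in zip(beta, features)]
    return new_b0, new_beta

# A crashed run in which only the first (hypothetical) predicate was observed.
b0, beta = update_with_trace(0.0, [0.0, 0.0], features=[1, 0], crashed=True)
print(b0, beta)
```

Because the model keeps only aggregate parameters, a single report has limited influence on it, which relates to both the anonymity and the noise points above.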
Future Work
• Public Deployment of Cooperative Bug Isolation
• Scalable Statistical Bug Isolation
• Path Optimization in Programs and its Application to Debugging
• Statistical Debugging: Simultaneous Identification of Multiple Bugs
• The Cooperative Bug Isolation Project: visit www.cs.wisc.edu/cbi/
Conclusions
• Sampling infrastructure: gathers information from the set of runs produced by the user community
• Uses a Bernoulli process to do the sampling
• Several sample applications:
• Sharing the overhead of assertions
• Predicate guessing and elimination to isolate a deterministic bug
• Regularized logistic regression to isolate a non-deterministic memory corruption error