CS590Z Statistical Debugging
Xiangyu Zhang
(some of the slides are from Chao Liu)
A Very Important Principle
Traditional debugging techniques deal with a single execution (or very few).
Given a large set of executions, including both passing and failing ones, statistical debugging is often highly effective.
Such sets of executions come from failure reporting and from in-house testing.
Tarantula (ASE 2005, ISSTA 2007)
Scalable Remote Bug Isolation (PLDI 2004, 2005)
Look at three kinds of predicates.
Branches.
Function returns, compared against zero (<0, <=0, >0, >=0, ==0, !=0).
Scalar pairs: for each assignment x=…, find all variables y_i and constants c_j, and compare x with each y_i or c_j (=, <, <=, …).
Sample the predicate evaluations (Bernoulli sampling).
Investigate how the probability of a predicate being true relates to the manifestation of the bug.
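The predicate families and the sampling step above can be sketched in C as follows. This is a minimal illustration under assumptions, not the original instrumentation: the site numbering, the `pair_counts`/`branch_counts` tables, the single sampling decision shared by the six comparisons at a site, and the 1/100 rate are all hypothetical. Each site is kept with independent probability `sample_rate` (one Bernoulli trial per visit); the published instrumentation avoids a random draw at every site by using a geometric countdown, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_SITES 64  /* hypothetical number of instrumentation sites */

/* Six comparison predicates tracked per scalar-pair or return site. */
enum { P_LT, P_LE, P_GT, P_GE, P_EQ, P_NE, NUM_OPS };

/* Per-run counters for one predicate: how many times it was sampled,
   and how many of those samples found it true. */
struct pred_counts {
    unsigned long observed;
    unsigned long true_count;
};

static struct pred_counts pair_counts[NUM_SITES][NUM_OPS];
static struct pred_counts branch_counts[NUM_SITES];

/* Hypothetical sampling rate: each site visit is observed with p = 1/100. */
double sample_rate = 0.01;

/* One Bernoulli trial: returns 1 with probability sample_rate. */
static int should_sample(void) {
    return (double)rand() / ((double)RAND_MAX + 1.0) < sample_rate;
}

static void record(struct pred_counts *c, int value) {
    c->observed++;
    if (value)
        c->true_count++;
}

/* Branch predicate: was the true branch taken?  Returns cond unchanged so
   it can wrap an existing condition, e.g. if (observe_branch(3, m >= 0)). */
int observe_branch(int site, int cond) {
    assert(site >= 0 && site < NUM_SITES);
    if (should_sample())
        record(&branch_counts[site], cond != 0);
    return cond;
}

/* Scalar pair: after an assignment x = ..., compare x with another
   variable or constant y.  The six predicates share one sampling decision. */
void observe_scalar_pair(int site, long x, long y) {
    assert(site >= 0 && site < NUM_SITES);
    if (!should_sample())
        return;
    record(&pair_counts[site][P_LT], x < y);
    record(&pair_counts[site][P_LE], x <= y);
    record(&pair_counts[site][P_GT], x > y);
    record(&pair_counts[site][P_GE], x >= y);
    record(&pair_counts[site][P_EQ], x == y);
    record(&pair_counts[site][P_NE], x != y);
}

/* Function return: the same six comparisons, against the constant 0. */
void observe_return(int site, long r) {
    observe_scalar_pair(site, r, 0);
}
```

Each run would then report its counters together with a pass/fail label. Setting `sample_rate` to 1.0 observes every evaluation, which is convenient for testing.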
Bug Isolation
How much does P being true increase the probability of failure beyond simply reaching the line where P is sampled?
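One way to make this question quantitative, assuming the Failure/Context/Increase definitions from the PLDI 2005 scalable bug isolation work of Liblit et al., is sketched below. A run counts toward F(P) or S(P) if P was observed true at least once in it, and toward the "observed" counts if P's site was sampled at all in it. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Per-predicate run counts, aggregated over all collected runs. */
struct run_counts {
    int f_true;      /* F(P): failing runs in which P was observed true    */
    int s_true;      /* S(P): passing runs in which P was observed true    */
    int f_observed;  /* F(P observed): failing runs that sampled P's site  */
    int s_observed;  /* S(P observed): passing runs that sampled P's site  */
};

/* Failure(P) = F(P) / (S(P) + F(P)) */
double failure_score(const struct run_counts *c) {
    assert(c->f_true + c->s_true > 0);
    return (double)c->f_true / (c->f_true + c->s_true);
}

/* Context(P) = F(P observed) / (S(P observed) + F(P observed)) */
double context_score(const struct run_counts *c) {
    assert(c->f_observed + c->s_observed > 0);
    return (double)c->f_observed / (c->f_observed + c->s_observed);
}

/* Increase(P) = Failure(P) - Context(P): how much P being true raises the
   failure probability beyond merely reaching P's site. */
double increase_score(const struct run_counts *c) {
    return failure_score(c) - context_score(c);
}
```

For counts {20, 5, 25, 75}, Failure(P) = 20/25 = 0.8 and Context(P) = 25/100 = 0.25, so Increase(P) = 0.55. Because these counts only record whether P was ever true in a run, the score cannot see how often P was true within that run.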
An Example
Symptoms: 563 lines of C code; 130 out of 5542 test cases fail to produce correct output; no crashes.
The predicate is evaluated both true and false within a single execution.
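Faulty version (the lastm != m test is missing from the first condition):
```c
/* amatch(), putsub() and ENDSTR are defined elsewhere in the program. */
void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;

    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if (m >= 0) {                 /* BUG: should be (m >= 0) && (lastm != m) */
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}
```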
Correct version:
```c
void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;

    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if ((m >= 0) && (lastm != m)) {
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}
```
Knowing only whether the predicate was true or false in a run is therefore not enough.
Here A is the condition (m >= 0) and B is (lastm != m):
$$P_f(A) = \tilde{P}(A \mid A \wedge \neg B)$$
$$P_t(A) = \tilde{P}(A \mid \neg(A \wedge \neg B))$$
Program Predicates
A predicate is a proposition about any program property, e.g., idx < BUFSIZE, a + b == c, foo() > 0, …
Each predicate can be evaluated multiple times during one execution, and every evaluation gives either true or false.
Therefore, a predicate is simply a boolean random variable that encodes program executions from a particular aspect.
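To make the boolean-random-variable view concrete, the sketch below tallies every evaluation of one predicate during a run. The loop is a hypothetical stand-in for the `while` loop in `subline()`, with the successive values of `m` supplied as an array instead of coming from `amatch()`; `pred_tally`, `eval_pred` and `run_loop` are illustrative names.

```c
#include <assert.h>

/* Per-run tally for one predicate: each evaluation is one true/false outcome. */
struct pred_tally {
    long n_true;
    long n_false;
};

/* Record one evaluation and return the outcome unchanged, so the call can
   wrap an existing condition without altering program behavior. */
int eval_pred(struct pred_tally *t, int outcome) {
    assert(t != 0);
    if (outcome)
        t->n_true++;
    else
        t->n_false++;
    return outcome;
}

/* Hypothetical loop: the predicate (m >= 0) is evaluated once per
   iteration, so a single run yields a whole sequence of outcomes. */
void run_loop(const int *m_values, int len, struct pred_tally *t) {
    for (int i = 0; i < len; i++) {
        if (eval_pred(t, m_values[i] >= 0)) {
            /* ... body of the original if-statement ... */
        }
    }
}
```

For the values {-1, 3, -1, -1, 5}, the predicate is true twice and false three times within the same run.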
Evaluation Bias of Predicate P
Definition: the evaluation bias of P is the probability that P is evaluated as true within one execution.
Maximum likelihood estimate: the number of true evaluations divided by the total number of evaluations in one run.
Each run gives one observation of the evaluation bias of P.
Suppose we have n correct and m incorrect executions. For any predicate P, we end up with an observation sequence for the correct runs, $S_p = (X'_1, X'_2, \ldots, X'_n)$, and one for the incorrect runs, $S_f = (X_1, X_2, \ldots, X_m)$.
Can we infer whether P is suspicious based on S_p and S_f?
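The per-run estimate and the two observation sequences can be sketched as follows. The function names are hypothetical, and skipping runs in which P is never evaluated (they carry no observation of its bias) is an assumption of this sketch rather than something the slides specify.

```c
#include <assert.h>
#include <stddef.h>

/* Maximum-likelihood estimate of P's evaluation bias in one run:
   true evaluations divided by total evaluations. */
double evaluation_bias(long n_true, long n_false) {
    assert(n_true + n_false > 0);  /* P must be evaluated at least once */
    return (double)n_true / (double)(n_true + n_false);
}

/* Fill out[] with one bias observation per run (S_p from the passing runs,
   or S_f from the failing runs).  Runs in which P was never evaluated are
   skipped; the return value is the number of observations written. */
size_t bias_sequence(const long *n_true, const long *n_false,
                     size_t runs, double *out) {
    size_t k = 0;
    for (size_t r = 0; r < runs; r++) {
        if (n_true[r] + n_false[r] > 0)
            out[k++] = evaluation_bias(n_true[r], n_false[r]);
    }
    return k;
}
```

Calling `bias_sequence` once over the passing runs and once over the failing runs yields S_p and S_f.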
Underlying Populations
Imagine the underlying distributions of evaluation bias for correct and incorrect executions. S_p and S_f can be viewed as random samples from these two populations, respectively.
One major heuristic: the larger the divergence between the two distributions, the more relevant predicate P is to the bug.
[Figure: two plots of Prob versus evaluation bias, over the range 0 to 1]
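The slides do not give the divergence measure here, so the following is only one simple way to quantify it: a squared z-statistic for how far the mean evaluation bias of the m failing runs lies from the passing-run mean, in units of the standard error estimated from the passing runs. It is not claimed to be SOBER's exact score s(P). Estimating the spread from the passing runs reflects that failing runs are usually scarce; the variance floor is an arbitrary, hypothetical guard for predicates whose passing-run bias never varies.

```c
#include <assert.h>
#include <stddef.h>

static double mean(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s / (double)n;
}

/* Unbiased sample variance (n - 1 in the denominator). */
static double variance(const double *x, size_t n, double mu) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += (x[i] - mu) * (x[i] - mu);
    return s / (double)(n - 1);
}

/* Squared z-statistic: (mean(S_f) - mean(S_p))^2 / (var(S_p) / m).
   Larger values mean the failing runs sit further from the passing
   population, regardless of direction. */
double divergence_z2(const double *s_p, size_t n, const double *s_f, size_t m) {
    assert(n >= 2 && m >= 1);
    double mu_p = mean(s_p, n);
    double var_p = variance(s_p, n, mu_p);
    if (var_p < 1e-12)  /* hypothetical floor: avoid dividing by zero */
        var_p = 1e-12;
    double d = mean(s_f, m) - mu_p;
    return d * d * (double)m / var_p;
}
```

With S_p = (0.2, 0.4, 0.6) and S_f = (0.8, 0.8), the passing mean is 0.4 and its sample variance is 0.04, so the statistic is 0.16 · 2 / 0.04 = 8.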
Major Challenges
The closed forms of both distributions are unknown.
Usually, we do not have enough incorrect executions to estimate the incorrect-execution distribution reliably.
[Figure: two plots of Prob versus evaluation bias, over the range 0 to 1]
Our Approach
Algorithm Outputs
A ranked list of program predicates w.r.t. the bug relevance score s(P). Higher-ranked predicates are regarded as more relevant to the bug.
What's the use?
Top-ranked predicates suggest the possible buggy regions.
Several predicates may point to the same region.
…
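Producing the ranked list is a plain descending sort on s(P). A minimal sketch with the standard `qsort`; `ranked_pred` and `rank_predicates` are hypothetical names, and any scores used with them here are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* One candidate predicate and its bug relevance score s(P). */
struct ranked_pred {
    const char *text;  /* e.g. "m >= 0" */
    double score;      /* s(P): higher means more relevant to the bug */
};

/* qsort comparator: orders predicates by descending score. */
static int by_score_desc(const void *a, const void *b) {
    double sa = ((const struct ranked_pred *)a)->score;
    double sb = ((const struct ranked_pred *)b)->score;
    return (sa < sb) - (sa > sb);
}

/* Sort in place so that the most suspicious predicates come first. */
void rank_predicates(struct ranked_pred *preds, size_t n) {
    assert(preds != NULL || n == 0);
    if (n > 1)
        qsort(preds, n, sizeof *preds, by_score_desc);
}
```

A developer would then inspect the entries from the top, keeping in mind that neighboring entries may point to the same code region.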
Outline: Program Predicates; Predicate Rankings; Experimental Results; Case Study: bc-1.06; Future Work; Conclusions
Experimental Results
Localization quality metric: software bug benchmark; quantitative metric.
Related works: Cause Transition (CT) [CZ05]; Statistical Debugging [LN+05].
Performance comparisons.
Bug Benchmark
Ideal (dream) benchmark: a large number of known bugs in large-scale programs with adequate test suites.
Siemens Program Suite: 130 variants of 7 subject programs, each of 100-600 LOC; 130 known bugs in total, mainly logic (or semantic) bugs.
Advantages: the bugs are known, so judgments are objective; the number of bugs is large, so a comparative study is statistically significant.
Disadvantages: the subject programs are small-scale.
State-of-the-art performance claimed in the literature so far: the cause-transition approach [CZ05].
Localization Quality Metric [RR03]
1st Example
[Figure: graph with nodes numbered 1 to 10]
T-score = 70%
2nd Example
[Figure: graph with nodes numbered 1 to 10]
T-score = 20%
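The examples can be computed mechanically once the metric is pinned down. The sketch below assumes the T-score of [RR03] is the percentage of program dependence graph (PDG) nodes a programmer examines when searching breadth-first from the reported nodes, with edges treated as undirected, and stopping after the first BFS layer that contains a faulty node, so lower is better. The adjacency-matrix representation, `MAX_NODES`, and the 100% result for an unreachable fault are assumptions of this sketch. The graphs from the two examples survive only as images and are not reproduced.

```c
#include <assert.h>

#define MAX_NODES 64  /* hypothetical upper bound on PDG size */

/* adj[i][j] != 0: dependence edge between nodes i and j (undirected).
   reported[i] != 0: node i is reported by the tool.
   faulty[i] != 0: node i contains the bug.
   Returns the percentage of nodes examined. */
double t_score(int n, int adj[][MAX_NODES], const int *reported, const int *faulty) {
    int dist[MAX_NODES], queue[MAX_NODES];
    int head = 0, tail = 0;
    assert(n > 0 && n <= MAX_NODES);

    /* Multi-source BFS: every reported node starts at distance 0. */
    for (int i = 0; i < n; i++) {
        dist[i] = -1;
        if (reported[i]) {
            dist[i] = 0;
            queue[tail++] = i;
        }
    }
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < n; v++) {
            if (adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                queue[tail++] = v;
            }
        }
    }

    /* k = distance of the nearest faulty node from the report. */
    int k = -1;
    for (int i = 0; i < n; i++)
        if (faulty[i] && dist[i] >= 0 && (k < 0 || dist[i] < k))
            k = dist[i];
    if (k < 0)
        return 100.0;  /* assumption: fault unreachable, whole program examined */

    /* Every node within distance k is examined. */
    int examined = 0;
    for (int i = 0; i < n; i++)
        if (dist[i] >= 0 && dist[i] <= k)
            examined++;
    return 100.0 * examined / n;
}
```

On a 10-node chain with node 0 reported and node 2 faulty, three nodes are examined, giving a T-score of 30%.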
Related Works
Cause Transition (CT) approach [CZ05]: a variant of delta debugging [Z02] and the previous state-of-the-art performance holder on the Siemens suite; published in ICSE'05, May 15, 2005. Cons: it relies on memory abnormalities, which restricts its performance.
Statistical Debugging (Liblit05) [LN+05]: predicate ranking based on discriminant analysis; published in PLDI'05, June 12, 2005. Cons: it ignores the evaluation patterns of predicates within each execution.
Localized bugs w.r.t. Examined Code
Cumulative Effects w.r.t. Code Examination
Top-k Selection
Regardless of the specific choice of k, both Liblit05 and SOBER outperform CT, the current state-of-the-art holder.
From k = 2 to 10, SOBER consistently outperforms Liblit05.
Outline: Evaluation Bias of Predicates; Predicate Rankings; Experimental Results; Case Study: bc-1.06; Future Work; Conclusions
Case Study: bc 1.06
bc 1.06 is an arbitrary-precision calculator of 14288 LOC, shipped with most distributions of Unix/Linux.
Two bugs were localized: one was reported by Liblit in [LN+05], and the other had not been reported previously.
This sheds some light on scalability.
Future Work
Further improve the localization quality.
Improve robustness to sampling.
Torture-test on large-scale programs to confirm scalability to code size.
…
Conclusions
We devised a principled statistical method for bug localization.
There are no parameter-setting hassles.
It handles both crashing and noncrashing bugs.
It achieves the best localization quality so far.
Discussion
Features: easy implementation, but difficult experimentation.
More advanced statistical techniques may not be necessary: go wide, not deep.
Predicates are treated as independent random variables.
Can execution indexing help? Can statistical principles be combined with slicing or IWIH?