CS590Z Statistical Debugging Xiangyu Zhang (part of the slides are from Chao Liu)

Post on 20-Dec-2015


Page 1: CS590Z Statistical Debugging

Xiangyu Zhang

(part of the slides are from Chao Liu)

Page 2: A Very Important Principle

- Traditional debugging techniques deal with a single execution (or very few).
- With the acquisition of a large set of executions, including both passing and failing executions, statistical debugging is often highly effective.
  - Failure reporting
  - In-house testing

Page 3: Tarantula (ASE 2005, ISSTA 2007)
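The slide gives only the name of the technique. As a rough illustration, the suspiciousness score commonly associated with Tarantula compares how often a statement is covered by failing tests with how often it is covered by passing tests. The function and parameter names below are hypothetical, and returning 0 for a statement that no test covers is an assumption of this sketch.

```c
#include <assert.h>

/* Tarantula-style suspiciousness of one statement s:
 *   (failed(s)/total_failed) / (passed(s)/total_passed + failed(s)/total_failed)
 * passed_s / failed_s: number of passing / failing tests that execute s.
 * Returning 0.0 for a statement covered by no test is an assumption. */
double tarantula_suspiciousness(unsigned passed_s, unsigned failed_s,
                                unsigned total_passed, unsigned total_failed)
{
    assert(passed_s <= total_passed && failed_s <= total_failed);

    double fail_ratio = total_failed ? (double)failed_s / total_failed : 0.0;
    double pass_ratio = total_passed ? (double)passed_s / total_passed : 0.0;

    if (fail_ratio + pass_ratio == 0.0)
        return 0.0;                     /* statement never executed */
    return fail_ratio / (pass_ratio + fail_ratio);
}
```

A statement executed only by failing tests scores 1, and one executed only by passing tests scores 0.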

Page 4: Scalable Remote Bug Isolation (PLDI 2004, 2005)

- Look at predicates
  - Branches
  - Function returns (<0, <=0, >0, >=0, ==0, !=0)
  - Scalar pairs
    - For each assignment x=…, find all variables y_i and constants c_j, each pair of x (=,<,<=…) y_i/c_j
- Sample the predicate evaluations (Bernoulli sampling)
- Investigate the relation between the probability of a predicate being true and the bug manifestation.
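The sampling step above can be sketched as a coin flip in front of every predicate observation. This is only a minimal illustration: `PredCounter`, `bernoulli`, and `observe` are hypothetical names, and `rand()` stands in for whatever random source a real instrumentor would use.

```c
#include <assert.h>
#include <stdlib.h>

/* Per-predicate counters: how often the predicate was sampled as true / false. */
typedef struct {
    unsigned long n_true;
    unsigned long n_false;
} PredCounter;

/* Returns 1 with probability `rate`, 0 otherwise. */
int bernoulli(double rate)
{
    return (double)rand() / ((double)RAND_MAX + 1.0) < rate;
}

/* Record one evaluation of a predicate, but only if this evaluation is sampled. */
void observe(PredCounter *c, int value, double rate)
{
    assert(rate >= 0.0 && rate <= 1.0);
    if (!bernoulli(rate))
        return;                 /* evaluation skipped: nothing is recorded */
    if (value)
        c->n_true++;
    else
        c->n_false++;
}
```

An instrumented branch would then call something like `observe(&counter_for_this_branch, cond != 0, 0.01)` each time it is reached, so that only about one evaluation in a hundred is recorded.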

Page 5: Bug Isolation

Page 6: Bug Isolation

- How much does P being true increase the probability of failure, over simply reaching the line where P is sampled?
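The question above can be turned into a number by subtracting two failure rates: the failure rate among runs where P was observed to be true, and the failure rate among runs where P was observed at all. The sketch below follows what the PLDI 2005 work calls Failure(P), Context(P), and Increase(P), as best reconstructed here. The struct layout and the choice to return 0 for an empty denominator are assumptions.

```c
#include <assert.h>

/* Run counts for one predicate P (hypothetical layout). */
typedef struct {
    unsigned f_true;   /* failing runs in which P was observed true     */
    unsigned s_true;   /* successful runs in which P was observed true  */
    unsigned f_obs;    /* failing runs in which P was observed at all   */
    unsigned s_obs;    /* successful runs in which P was observed at all */
} PredStats;

static double ratio(unsigned num, unsigned den)
{
    return den ? (double)num / den : 0.0;   /* assumption: 0 when undefined */
}

/* Probability of failure given that P was observed true. */
double failure(const PredStats *p) { return ratio(p->f_true, p->f_true + p->s_true); }

/* Probability of failure given that the site of P was merely reached and sampled. */
double context(const PredStats *p) { return ratio(p->f_obs, p->f_obs + p->s_obs); }

/* How much P being true raises the failure probability over the context. */
double increase(const PredStats *p)
{
    assert(p->f_true <= p->f_obs && p->s_true <= p->s_obs);
    return failure(p) - context(p);
}
```

A predicate whose increase is near zero or negative says little beyond "this line was reached", so it is a weak bug predictor.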

Page 7: An Example

- Symptoms
  - 563 lines of C code
  - 130 out of 5542 test cases fail to give correct outputs
  - No crashes
- The predicates are evaluated to both true and false in one execution

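/* Faulty version: the first if lacks the subclause (lastm != m). */
void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;

    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);    /* try to match pat at position i */
        if (m >= 0) {
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            /* no match, or an empty match: copy one character through */
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;                     /* continue after the matched text */
    }
}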

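/* Correct version: substitute only when the match end differs from the last one. */
void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;

    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);    /* try to match pat at position i */
        if ((m >= 0) && (lastm != m)) {
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            /* no match, or an empty match: copy one character through */
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;                     /* continue after the matched text */
    }
}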

not enough

Page 8

void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;

    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if ((m >= 0) && (lastm != m)) {
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}

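Here A denotes the subclause (m >= 0) and B denotes the subclause (lastm != m).

$P_f(A) = \tilde{P}(A \mid A \land \lnot B)$

$P_t(A) = \tilde{P}(A \mid \lnot(A \land \lnot B))$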

Page 9: Program Predicates

- A predicate is a proposition about any program property
  - e.g., idx < BUFSIZE, a + b == c, foo() > 0 …
- Each can be evaluated multiple times during one execution
- Every evaluation gives either true or false
- Therefore, a predicate is simply a boolean random variable, which encodes program executions from a particular aspect.

Page 10: Evaluation Bias of Predicate P

- Evaluation bias
  - Def’n: the probability of being evaluated as true within one execution
  - Maximum likelihood estimation: number of true evaluations over the total number of evaluations in one run
  - Each run gives one observation of evaluation bias for predicate P
- Suppose we have n correct and m incorrect executions; for any predicate P, we end up with
  - An observation sequence for correct runs: S_p = (X’_1, X’_2, …, X’_n)
  - An observation sequence for incorrect runs: S_f = (X_1, X_2, …, X_m)
- Can we infer whether P is suspicious based on S_p and S_f?
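The maximum likelihood estimate above is just a ratio of counters, so one run's evaluation bias and the resulting observation sequence can be sketched as follows. The names `EvalCounts`, `evaluation_bias`, and `observation_sequence` are hypothetical, and skipping runs in which the predicate was never evaluated is an assumption of this sketch; the slides do not say how such runs are treated.

```c
#include <assert.h>
#include <stddef.h>

/* Counts of true and false evaluations of one predicate in one run. */
typedef struct {
    unsigned long n_true;
    unsigned long n_false;
} EvalCounts;

/* Stores n_true / (n_true + n_false) in *bias and returns 1,
 * or returns 0 if the predicate was never evaluated in this run. */
int evaluation_bias(const EvalCounts *c, double *bias)
{
    unsigned long total = c->n_true + c->n_false;
    if (total == 0)
        return 0;
    *bias = (double)c->n_true / (double)total;
    return 1;
}

/* Builds an observation sequence (one bias per run) from per-run counts.
 * Returns the number of observations written to out (at most n_runs). */
size_t observation_sequence(const EvalCounts *runs, size_t n_runs, double *out)
{
    size_t k = 0;
    assert(runs != NULL && out != NULL);
    for (size_t i = 0; i < n_runs; i++) {
        double b;
        if (evaluation_bias(&runs[i], &b))
            out[k++] = b;       /* assumption: unevaluated runs are skipped */
    }
    return k;
}
```

Calling `observation_sequence` once on the passing runs and once on the failing runs yields S_p and S_f for that predicate.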

Page 11: Underlying Populations

- Imagine the underlying distributions of evaluation bias for correct executions and for incorrect executions
- S_p and S_f can be viewed as random samples from these two underlying populations, respectively
- One major heuristic is
  - The larger the divergence between the two distributions, the more relevant the predicate P is to the bug

[Figure: two plots of probability (Prob) against evaluation bias, each on the range 0 to 1]
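To make the divergence heuristic concrete, the sketch below scores a predicate by how far the mean evaluation bias of the failing runs lies from the mean of the passing runs, measured in units of the passing runs' variance. This squared standardized difference is chosen here only for brevity. It is an assumption of this example, not necessarily the scoring function used by the approach in these slides, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static double mean(const double *x, size_t n)
{
    double sum = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

static double sample_variance(const double *x, size_t n, double mu)
{
    double sum = 0.0;
    assert(n > 1);
    for (size_t i = 0; i < n; i++)
        sum += (x[i] - mu) * (x[i] - mu);
    return sum / (double)(n - 1);
}

/* s_p: n evaluation biases from passing runs; s_f: m from failing runs.
 * Stores m * (mean_f - mean_p)^2 / var_p in *score and returns 1,
 * or returns 0 when var_p is zero and the score is undefined. */
int divergence_score(const double *s_p, size_t n,
                     const double *s_f, size_t m, double *score)
{
    assert(n > 1 && m > 0);
    double mu_p = mean(s_p, n);
    double var_p = sample_variance(s_p, n, mu_p);
    if (var_p == 0.0)
        return 0;
    double diff = mean(s_f, m) - mu_p;
    *score = (double)m * diff * diff / var_p;
    return 1;
}
```

A predicate whose failing-run biases sit far from the passing-run mean gets a large score, and one whose two samples look alike gets a score near zero.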

Page 12: Major Challenges

- No knowledge of the closed forms of both distributions
- Usually, we do not have sufficient incorrect executions to estimate the distribution for incorrect executions reliably.

[Figure: two plots of probability (Prob) against evaluation bias, each on the range 0 to 1]

Page 13: Our Approach

Page 14: Algorithm Outputs

- A ranked list of program predicates w.r.t. the bug relevance score s(P)
  - Higher-ranked predicates are regarded as more relevant to the bug
- What’s the use?
  - Top-ranked predicates suggest the possible buggy regions
  - Several predicates may point to the same region
  - … …
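Producing the ranked list is a sort by descending score once every predicate has its s(P). The sketch below assumes the scores have already been computed by some means. `RankedPredicate` and `rank_predicates` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* One predicate with its bug relevance score s(P). */
typedef struct {
    const char *name;    /* e.g. source location plus predicate text */
    double score;
} RankedPredicate;

/* qsort comparator: higher scores sort first. */
static int by_score_desc(const void *a, const void *b)
{
    double sa = ((const RankedPredicate *)a)->score;
    double sb = ((const RankedPredicate *)b)->score;
    return (sa < sb) - (sa > sb);
}

/* Sorts preds in place so that preds[0] is the most suspicious predicate. */
void rank_predicates(RankedPredicate *preds, size_t n)
{
    assert(preds != NULL || n == 0);
    if (n > 1)
        qsort(preds, n, sizeof *preds, by_score_desc);
}
```

A developer would then inspect the code around the first few entries, which is what the top-k comparison later in the slides measures.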

Page 15: Outline

- Program Predicates
- Predicate Rankings
- Experimental Results
- Case Study: bc-1.06
- Future Work
- Conclusions

Page 16: Experiment Results

- Localization quality metric
  - Software bug benchmark
  - Quantitative metric
- Related works
  - Cause Transition (CT), [CZ05]
  - Statistical Debugging, [LN+05]
- Performance comparisons

Page 17: Bug Benchmark

- Bug benchmark
  - Dream benchmark: a large number of known bugs on large-scale programs with adequate test suites
  - Siemens Program Suite
    - 130 variants of 7 subject programs, each of 100-600 LOC
    - 130 known bugs in total
    - Mainly logic (or semantic) bugs
- Advantages
  - Known bugs, thus judgments are objective
  - Large number of bugs, thus comparative study is statistically significant.
- Disadvantages
  - Small-scale subject programs
- State-of-the-art performance, so far claimed in literature: Cause-transition approach, [CZ05]

Page 18: Localization Quality Metric [RR03]

Page 19: 1st Example

[Figure: graph with nodes numbered 1 to 10]

T-score = 70%

Page 20: 2nd Example

[Figure: graph with nodes numbered 1 to 10]

T-score = 20%
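The two examples report the share of a graph that has to be examined before the bug is reached. One plausible reading, sketched below, is a breadth-first search that starts from the reported nodes and expands layer by layer until the layer containing the defect is covered. The undirected adjacency matrix, the rule that the whole final layer counts as examined, and every name in the code are assumptions of this sketch. The graph in the usage note is hypothetical and is not the one drawn on the slides.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 64

typedef struct {
    int n;                                        /* number of nodes */
    unsigned char adj[MAX_NODES][MAX_NODES];      /* undirected adjacency */
} Graph;

void graph_init(Graph *g, int n)
{
    assert(n > 0 && n <= MAX_NODES);
    memset(g, 0, sizeof *g);
    g->n = n;
}

void graph_add_edge(Graph *g, int u, int v)
{
    assert(u >= 0 && u < g->n && v >= 0 && v < g->n);
    g->adj[u][v] = g->adj[v][u] = 1;
}

/* Percentage of nodes examined when searching breadth-first from the
 * reported nodes until the defect node's layer is covered.
 * Returns -1.0 if the defect cannot be reached from the report. */
double t_score(const Graph *g, const int *report, int n_report, int defect)
{
    int dist[MAX_NODES], queue[MAX_NODES];
    int head = 0, tail = 0, examined = 0;

    assert(defect >= 0 && defect < g->n);
    for (int i = 0; i < g->n; i++)
        dist[i] = -1;
    for (int i = 0; i < n_report; i++) {
        if (dist[report[i]] < 0) {
            dist[report[i]] = 0;
            queue[tail++] = report[i];
        }
    }
    while (head < tail) {                         /* plain BFS */
        int u = queue[head++];
        for (int v = 0; v < g->n; v++) {
            if (g->adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                queue[tail++] = v;
            }
        }
    }
    if (dist[defect] < 0)
        return -1.0;
    for (int v = 0; v < g->n; v++)
        if (dist[v] >= 0 && dist[v] <= dist[defect])
            examined++;
    return 100.0 * examined / g->n;
}
```

On a hypothetical five-node chain 0-1-2-3-4 with the report at node 0 and the defect at node 2, three of the five nodes are examined, so the score is 60%. Under this reading a lower score means a better localization.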

Page 21: Related Works

- Cause Transition (CT) approach [CZ05]
  - A variant of delta debugging [Z02]
  - Previous state-of-the-art performance holder on the Siemens suite
  - Published in ICSE’05, May 15, 2005
  - Cons: it relies on memory abnormality, hence its performance is restricted.
- Statistical Debugging (Liblit05) [LN+05]
  - Predicate ranking based on discriminant analysis
  - Published in PLDI’05, June 12, 2005
  - Cons: ignores evaluation patterns of predicates within each execution

Page 22: Localized bugs w.r.t. Examined Code

Page 23: Cumulative Effects w.r.t. Code Examination

Page 24: Top-k Selection

- Regardless of the specific selection of k, both Liblit05 and SOBER are better than CT, the current state-of-the-art holder
- From k=2 to 10, SOBER is consistently better than Liblit05

Page 25: Outline

- Evaluation Bias of Predicates
- Predicate Rankings
- Experimental Results
- Case Study: bc-1.06
- Future Work
- Conclusions

Page 26: Case Study: bc 1.06

- bc 1.06
  - 14288 LOC
  - An arbitrary-precision calculator shipped with most distributions of Unix/Linux
- Two bugs were localized
  - One was reported by Liblit in [LN+05]
  - One was not reported previously
- Sheds some light on scalability

Page 27: Outline

- Evaluation Bias of Predicates
- Predicate Rankings
- Experimental Results
- Case Study: bc-1.06
- Future Work
- Conclusions

Page 28: Future Work

- Further leverage the localization quality
- Robustness to sampling
- Stress-test on large-scale programs to confirm its scalability to code size
- …

Page 29: Conclusions

- We devised a principled statistical method for bug localization.
  - No parameter setting hassles
  - It handles both crashing and noncrashing bugs.
  - Best quality so far.

Page 30: Discussion

- Features
  - Easy implementation
  - Difficult experimentation
  - More advanced statistical techniques may not be necessary
  - Go wide, not deep…
- Predicates are treated as independent random variables.
- Can execution indexing help?
- Can statistical principles be combined with slicing or IWIH?