Mining Behavior Graphs for “Backtrace” of Noncrashing Bugs. Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han (University of Illinois at Urbana-Champaign); Philip S. Yu (IBM T. J. Watson Research). Presented by: Chao Liu


Mining Behavior Graphs for “Backtrace” of Noncrashing Bugs

Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han
University of Illinois at Urbana-Champaign

Philip S. Yu
IBM T. J. Watson Research

Presented by: Chao Liu


Outline

• Motivations
• Related Work
• Classification of Program Executions
• Extract “Backtrace” from Classification Dynamics
• Case Study
• Conclusions


Motivations

• Software is full of bugs
  – Windows 2000, 35M LOC
    • 63,000 known bugs at the time of release, about 2 per 1,000 lines
• Software failure costs
  – The Ariane 5 explosion was due to “errors in the software of the inertial reference system” (Ariane 5 Flight 501 inquiry board report, http://ravel.esrin.esa.it/docs/esa-x-1819eng.pdf)
  – A study by the National Institute of Standards and Technology found that software errors cost the U.S. economy about $59.5 billion annually (http://www.nist.gov/director/prog-ofc/report02-3.pdf)
• Testing and debugging are laborious and expensive
  – “50% of my company employees are testers, and the rest spends 50% of their time testing!” --Bill Gates, in 1995

[Image courtesy of CNN.com]


Bug Localization

• Automatically single out the most suspicious places
• Two kinds of bugs w.r.t. symptoms
  – Crashing bugs
    • Typical symptoms: segmentation faults
    • Reasons: memory access violations
  – Noncrashing bugs
    • Typical symptoms: smooth executions but unexpected outputs
    • Reasons: logic or semantic errors
    • An example


Running Example

• Subject program
  – replace: perform regular expression matching and substitutions
  – 563 lines of C code
  – 17 functions are involved
• Execution behaviors
  – 130 out of 5542 test cases fail to give correct outputs
  – No incorrect executions incur segmentation faults
• Debug method
  – Step-by-step tracing

Buggy version:

    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while ((lin[i] != ENDSTR)) {
            m = amatch(lin, i, pat, 0);
            if (m >= 0) {            /* the (lastm != m) check is missing */
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }

Correct version:

    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while ((lin[i] != ENDSTR)) {
            m = amatch(lin, i, pat, 0);
            if ((m >= 0) && (lastm != m)) {   /* substitute only for a new match position */
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }


Debugging Crashes


Bug Localization via Backtrace

• Backtrace for noncrashing bugs?
• Major challenges
  – No abnormality is visible on the surface.
  – It is unclear when and where the abnormality happens.


Related Work

• Crashing bugs
  – Memory access monitoring
    • Purify [HJ92], Valgrind [SN00], GDB …
• Noncrashing bugs
  – Static program analysis
  – Traditional model checking
  – Model checking source code


Static Program Analysis

• Methodology
  – Examine source code directly
  – Enumerate all the possible execution paths without running the program
  – Check user-specified properties, e.g.
    • free(p) …… (*p)
    • lock(res) …… unlock(res)
    • receive_ack() …… send_data()
• Strengths
  – Check all possible execution paths
• Problems
  – Shallow semantics
  – Properties should be directly mapped to source code structure
• Tools
  – ESC [DRL+98], LCLint [EGH+94], ESP [DLS02], MC Checker [ECC00] …
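To make the first property (no use of p after free(p)) concrete, here is a minimal sketch of how such a rule might be checked along one abstract execution path. The Op encoding and the function first_use_after_free are hypothetical names introduced for illustration only; the tools listed above work on real source code and enumerate many paths, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>

/* Abstract operations on a single pointer p along one execution path
   (hypothetical encoding, for illustration only). */
typedef enum { OP_ALLOC, OP_FREE, OP_DEREF } Op;

/* Returns the index of the first dereference that follows a free without
   an intervening allocation, or -1 if the path satisfies the property. */
int first_use_after_free(const Op *path, size_t n)
{
    int freed = 0;  /* 1 while p is in the "freed" state */

    assert(path != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (path[i]) {
        case OP_ALLOC: freed = 0; break;
        case OP_FREE:  freed = 1; break;
        case OP_DEREF: if (freed) return (int)i; break;
        }
    }
    return -1;
}
```

A static checker would, in effect, run this two-state automaton over every path it can enumerate. This also suggests why the property has to map directly onto source code structure: each event must be recognizable as a statement.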


Traditional Model Checking

• Methodology
  – Model program computation as finite state machines
  – The model is described in a particular description language
  – Exhaustively explore all the reachable states to check desired or undesired properties
• Strengths
  – Model deeper semantics
  – Naturally fit for checking event-driven systems, like protocols
• Problems
  – Significant manual effort in modeling
  – State space explosion
• Tools
  – SMV [M93], SPIN [H97], Murphi [DDH+92] …


Model Checking Source Code

• Methodology
  – Execute the real program in a sandbox (e.g., virtual machine)
  – Manipulate event happenings, e.g.,
    • Message incomings
    • Return value of memory allocation
• Strengths
  – Less significant manual specification
• Problems
  – Application restrictions, e.g.,
    • Event-driven programs (still)
    • Clear mapping between source code and logic event
• Tools
  – CMC [MPC+02], Verisoft [G97], Java PathFinder [BHP+-00] …


Summary of Related Work

• In summary,
  – Semantic inputs are necessary
    • Program model
    • Properties to be checked (all three methods)
  – Restricted application domain
    • Event-driven model
    • Properties are also event-related.

Example Revisited

Buggy version:

    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while ((lin[i] != ENDSTR)) {
            m = amatch(lin, i, pat, 0);
            if (m >= 0) {            /* the (lastm != m) check is missing */
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }

Correct version:

    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while ((lin[i] != ENDSTR)) {
            m = amatch(lin, i, pat, 0);
            if ((m >= 0) && (lastm != m)) {   /* substitute only for a new match position */
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }

• No memory violations

• Not an event-driven program

• No explicit error properties


Synopsis of Program Execution

• Program behavior graphs
  – Function-level abstraction of program behaviors
  – Function calls and transitions
  – First-order sequential information about function interactions

    int main() { ... A(); ... B(); }
    int A() { ... }
    int B() { ... C() ... }
    int C() { ... }
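As a rough sketch of how a behavior graph like this could be collected at run time, the code below records a call edge whenever a function is entered from its caller, and a transition edge whenever two functions are called one after the other by the same caller. The event functions bg_enter and bg_exit, the integer function ids, and the matrix layout are all assumptions made for illustration; the slides do not describe the actual instrumentation.

```c
#include <assert.h>
#include <string.h>

#define BG_MAX_FUNCS 16  /* assumed bound on distinct functions */
#define BG_MAX_DEPTH 64  /* assumed bound on call depth */

typedef struct {
    unsigned char call[BG_MAX_FUNCS][BG_MAX_FUNCS];  /* call[a][b]: a calls b */
    unsigned char trans[BG_MAX_FUNCS][BG_MAX_FUNCS]; /* trans[a][b]: b is called
                                                        right after a returns,
                                                        by the same caller */
    int stack[BG_MAX_DEPTH];      /* ids of the functions currently active */
    int last_child[BG_MAX_DEPTH]; /* last callee finished at each level, or -1 */
    int depth;
} BehaviorGraph;

void bg_init(BehaviorGraph *g)
{
    memset(g, 0, sizeof *g);
}

/* Record that function f has just been entered. */
void bg_enter(BehaviorGraph *g, int f)
{
    assert(f >= 0 && f < BG_MAX_FUNCS && g->depth < BG_MAX_DEPTH);
    if (g->depth > 0) {
        int level = g->depth - 1;
        int caller = g->stack[level];
        g->call[caller][f] = 1;
        if (g->last_child[level] >= 0)
            g->trans[g->last_child[level]][f] = 1;
    }
    g->stack[g->depth] = f;
    g->last_child[g->depth] = -1;  /* f has not completed any callee yet */
    g->depth++;
}

/* Record that the function on top of the stack has just returned. */
void bg_exit(BehaviorGraph *g)
{
    assert(g->depth > 0);
    g->depth--;
    if (g->depth > 0)
        g->last_child[g->depth - 1] = g->stack[g->depth];
}
```

With ids main=0, A=1, B=2, C=3, the four-function program above produces the event sequence enter main, enter A, exit, enter B, enter C, exit, exit, exit. That yields call edges main→A, main→B, B→C and one transition edge A→B.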


Identification of Incorrect Executions

• A two-class classification problem
  – Every execution gives one behavior graph
  – Edges and closed frequent subgraphs as features
• Is classification useful?
  – Classification alone does not localize bugs
    • The classifier only labels each run, as a whole, as either correct or incorrect
    • It does not tell when and where the abnormality happens
• Observations
  – Good classifiers know the differences between correct and incorrect executions
    • Is the difference a kind of abnormality?
  – Where and when does the abnormality happen?
    • Incremental classification
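To illustrate the two-class setup, the sketch below treats each run as a 0/1 vector of edge features and labels a new run by its nearest class centroid. Nearest-centroid is a deliberately simple stand-in chosen for brevity; the slides do not say which classifier was used, and N_FEATURES, CentroidModel, train, and classify are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define N_FEATURES 4  /* assumed number of edge features per run */

typedef struct {
    double correct[N_FEATURES];   /* mean feature vector of correct runs */
    double incorrect[N_FEATURES]; /* mean feature vector of incorrect runs */
} CentroidModel;

/* x holds n runs of N_FEATURES values each, stored row by row;
   label[i] is 1 for an incorrect run and 0 for a correct run. */
void train(CentroidModel *m, const int *x, const int *label, size_t n)
{
    size_t n_ok = 0, n_bad = 0;

    assert(m != NULL && x != NULL && label != NULL);
    for (size_t j = 0; j < N_FEATURES; j++)
        m->correct[j] = m->incorrect[j] = 0.0;
    for (size_t i = 0; i < n; i++) {
        double *sum = label[i] ? m->incorrect : m->correct;
        for (size_t j = 0; j < N_FEATURES; j++)
            sum[j] += x[i * N_FEATURES + j];
        if (label[i]) n_bad++; else n_ok++;
    }
    assert(n_ok > 0 && n_bad > 0);  /* both classes must be represented */
    for (size_t j = 0; j < N_FEATURES; j++) {
        m->correct[j] /= (double)n_ok;
        m->incorrect[j] /= (double)n_bad;
    }
}

/* Returns 1 if the run is closer to the incorrect centroid, else 0. */
int classify(const CentroidModel *m, const int *run)
{
    double d_ok = 0.0, d_bad = 0.0;

    for (size_t j = 0; j < N_FEATURES; j++) {
        double a = run[j] - m->correct[j];
        double b = run[j] - m->incorrect[j];
        d_ok += a * a;
        d_bad += b * b;
    }
    return d_bad < d_ok;
}
```

The returned label covers the run as a whole, which is exactly the limitation noted above: it says nothing about where in the execution the difference arose.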


Incremental Classification

• Classification works only when instances from two classes are different.

• Precision as a measure of the difference.

• Incremental classification
• Observe accuracy dynamics


Illustration: Precision Boost

[Figure: behavior graphs of one correct execution and one incorrect execution, with nodes main and A through H]


Bug Relevance

• Precision boost
  – For each function F:
    • Precision boost = Exit precision - Entrance precision.
  – Intuition & heuristics
    • Differences take place within the execution of F
    • Abnormality happens while F is in the stack
    • The larger the boost, the more likely F is relevant to the bug
• Bug-relevant function
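The precision boost can be sketched in a few lines. The code below assumes precision means TP / (TP + FP) for the "incorrect" class, measured once with classifiers trained at function entrance and once at function exit; that reading, the FuncPrecision record, and all function names are illustrative assumptions rather than the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Precision for the "incorrect" class: TP / (TP + FP).
   predicted[i] and actual[i] are 1 for an incorrect run, 0 for a correct one.
   Returns 0.0 when no run is predicted incorrect. */
double precision(const int *predicted, const int *actual, size_t n)
{
    size_t tp = 0, fp = 0;

    for (size_t i = 0; i < n; i++) {
        if (predicted[i]) {
            if (actual[i]) tp++; else fp++;
        }
    }
    return (tp + fp) ? (double)tp / (double)(tp + fp) : 0.0;
}

typedef struct {
    const char *name;          /* function name */
    double entrance_precision; /* precision at the entrance of the function */
    double exit_precision;     /* precision at the exit of the function */
} FuncPrecision;

/* Precision boost = exit precision - entrance precision. */
double precision_boost(const FuncPrecision *f)
{
    assert(f != NULL);
    return f->exit_precision - f->entrance_precision;
}

/* Index of the function with the largest boost (first one wins on ties). */
size_t most_bug_relevant(const FuncPrecision *funcs, size_t n)
{
    size_t best = 0;

    assert(funcs != NULL && n > 0);
    for (size_t i = 1; i < n; i++) {
        if (precision_boost(&funcs[i]) > precision_boost(&funcs[best]))
            best = i;
    }
    return best;
}
```

Ranking functions by this value follows the heuristic above: a function whose exit precision is much higher than its entrance precision is one during whose execution correct and incorrect runs start to look different.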


Case Study

• Subject program
  – replace: perform regular expression matching and substitutions
  – 563 lines of C code
  – 17 functions are involved
• Execution behaviors
  – 130 out of 5542 test cases fail to give correct outputs
  – No incorrect executions incur segmentation faults
• Task
  – Can we single out the backtrace for this bug?


Precision Pairs


Backtrace for Noncrashing Bugs


Conclusions

• Identify incorrect executions from program runtime behaviors.

• Classification dynamics can give away “backtrace” for noncrashing bugs without any semantic inputs.

• Data mining can contribute to software engineering and systems research in general.

Mining into Software and Systems?


References

• [DRL+98] David L. Detlefs, K. Rustan M. Leino, Greg Nelson, and James B. Saxe. Extended static checking, 1998.

• [EGH+94] David Evans, John Guttag, James Horning, and Yang Meng Tan. LCLint: A tool for using specifications to check code. In Proceedings of the ACM SIGSOFT '94 Symposium on the Foundations of Software Engineering, pages 87-96, 1994.

• [DLS02] Manuvir Das, Sorin Lerner, and Mark Seigle. ESP: Path-sensitive program verification in polynomial time. In Conference on Programming Language Design and Implementation, 2002.

• [ECC00] D.R. Engler, B. Chelf, A. Chou, and S. Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In Proceedings of the Fourth Symposium on Operating Systems Design and Implementation, October 2000.

• [M93] Ken McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.

• [H97] Gerard J. Holzmann. The model checker SPIN. IEEE Transactions on Software Engineering, 23(5):279-295, 1997.

• [DDH+92] David L. Dill, Andreas J. Drexler, Alan J. Hu, and C. Han Yang. Protocol verification as a hardware design aid. In IEEE International Conference on Computer Design: VLSI in Computers and Processors, pages 522-525, 1992.

• [MPC+02] Madanlal Musuvathi, David Y.W. Park, Andy Chou, Dawson R. Engler, and David L. Dill. CMC: A Pragmatic Approach to Model Checking Real Code. In Proceedings of the Fifth Symposium on Operating Systems Design and Implementation, 2002.


References (cont’d)

• [G97] P. Godefroid. Model Checking for Programming Languages using VeriSoft. In Proceedings of the 24th ACM Symposium on Principles of Programming Languages, 1997.

• [BHP+-00] G. Brat, K. Havelund, S. Park, and W. Visser. Model checking programs. In IEEE International Conference on Automated Software Engineering (ASE), 2000.

• [HJ92] R. Hastings and B. Joyce. Purify: Fast Detection of Memory Leaks and Access Errors, 1991. In Proceedings of the Winter 1992 USENIX Conference, pages 125-138, San Francisco, California.

• [SN00] Julian Seward and Nick Nethercote. Valgrind, an open-source memory debugger for x86-GNU/Linux. http://valgrind.org/

• [LLM+04] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation, 2004.

• [LCS+04] Zhenmin Li, Zhifeng Chen, Sudarshan M. Srinivasan, and Yuanyuan Zhou. C-Miner: Mining Block Correlations in Storage Systems. In Proceedings of the 3rd USENIX Conference on File and Storage Technologies, 2004.


Q & A

Thank You!