
10/06 U Mass DistLect, BGRyder 1

Helping Programmers Debug Code Using Semantic Change Impact Analysis

Dr. Barbara G. Ryder
Rutgers University
http://www.cs.rutgers.edu/~ryder

This research is supported by NSF grants CCR-0204410 & CCR-0331797 and, in part, by IBM Research. Collaborators are Dr. Frank Tip of IBM T.J. Watson Research Center and graduate students Ophelia Chesley, Xiaoxia Ren, Maximilian Stoerzer and Fenil Shah.

10/06 U Mass DistLect, BGRyder 2

PROLANGS
• Languages/Compilers and Software Engineering
  – Algorithm design and prototyping
• Mature research projects
  – Incremental dataflow analysis, TOPLAS 1988, POPL’88, POPL’90, TSE 1990, ICSE’97, ICSE’99
  – Pointer analysis of C programs, POPL’91, PLDI’92
  – Side effect analysis of C systems, PLDI’93, TOPLAS 2001
  – Points-to and def-use analysis of statically-typed object-oriented programs (C++/Java), POPL’99, OOPSLA’01, ISSTA’02, invited paper at CC’03, TOSEM 2005, ICSM 2005
  – Profiling by sampling in Java feedback-directed optimization, LCPC’00, PLDI’01, OOPSLA’02
  – Analysis of program fragments (i.e., modules, libraries), FSE’99, CC’01, ICSE’03, IEEE-TSE 2004

10/06 U Mass DistLect, BGRyder 3

PROLANGS
• Ongoing research projects
  – Change impact analysis for object-oriented systems, PASTE’01, DCS-TR-533 (9/03), OOPSLA’04, ICSM’05, FSE’06, IEEE-TSE 2007
  – Robustness testing of Java web server applications, DSN’01, ISSTA’04, IEEE-TSE 2005, Eclipse Wkshp OOPSLA’05, DCS-TR-579 (7/05)
  – Analyses to aid performance understanding of programs built with frameworks (e.g., Tomcat, Websphere)
  – Using more precise static analysis information in testing OOPLs, DCS-TR-574 (4/05), SCAM’06

10/06 U Mass DistLect, BGRyder 4

Outline
• Overview
• Contributions
• Analysis framework
• Tools built
  – Chianti
    • Experiments with 1 year of Daikon CVS data
  – JUnit/CIA
    • Experiments on SE class
  – CRISP
• Related work
• Summary

10/06 U Mass DistLect, BGRyder 5

Non-locality of change impact in OO programs

• Small source code changes can have major and non-local effects in object-oriented systems
  – Due to subtyping and dynamic dispatch

[Figure: class hierarchy with root X and subclasses Y and Z, where f() is overridden; an unchanged call site

  X x = new Z(); … x.f()

can change its dispatch target when an override of f() is added or removed.]
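The figure's point can be sketched in runnable Java (a hedged illustration using the slide's names X, Y, Z; the class `DispatchDemo` and the string return values are mine, added so the dispatch target is observable):

```java
// Adding an override in Z silently redirects the call site `x.f()`,
// even though the line containing the call never changed.
class X { String f() { return "X.f"; } }
class Y extends X { String f() { return "Y.f"; } }
// Before the edit, Z inherited X.f; after this override is added,
// the unchanged call site below dispatches to Z.f instead.
class Z extends X { String f() { return "Z.f"; } }

public class DispatchDemo {
    public static String callSite() {
        X x = new Z();   // unchanged client code
        return x.f();    // now resolves to Z.f
    }
    public static void main(String[] args) {
        System.out.println(callSite());
    }
}
```

Deleting the override in Z would flip the same call site back to X.f, which is exactly the non-local effect the analysis must track.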

10/06 U Mass DistLect, BGRyder 6

Motivation
• Program analysis provides feedback on semantic impact of changes
  – E.g., added/deleted method calls, fields and classes
• Object-oriented system presumed to consist of a set of classes and a set of associated unit or regression tests
• Change impact measured by tests affected
  – Describes application functionality modified
  – Discovers need for new tests

10/06 U Mass DistLect, BGRyder 7

Example

class A { public void foo() {} }
class B extends A { public void foo() {} }
class C extends A {}

public void test1() {
  A a = new A();
  a.foo(); // A’s foo
}
public void test2() {
  A a = new B();
  a.foo(); // B’s foo
}
public void test3() {
  A a = new C();
  a.foo(); // A’s foo
}

[Figure: hierarchy A with subclasses B and C; foo() defined in A and overridden in B.]

10/06 U Mass DistLect, BGRyder 8

Example

class A {
  public void foo() {}
  public int x;
}
class B extends A {
  public void foo() { B.bar(); }
  public static void bar() { y = 17; }
  public static int y;
}
class C extends A {
  public void foo() { x = 18; }
  public void baz() { z = 19; }
  public int z;
}

[Figure: hierarchy A (foo(), x) with subclasses B (foo(), bar(), y) and C (foo(), baz(), z).]

Question: what effect did this edit have on these tests?

10/06 U Mass DistLect, BGRyder 9

Example - Changes to foo()

class A {
  public void foo() {}
  public int x;
}
class B extends A {
  public void foo() { B.bar(); }
  public static void bar() { y = 17; }
  public static int y;
}
class C extends A {
  public void foo() { x = 18; }
  public void baz() { z = 19; }
  public int z;
}

public void test1() {
  A a = new A();
  a.foo(); // A’s foo
}
public void test2() {
  A a = new B();
  a.foo(); // B’s foo
}
public void test3() {
  A a = new C();
  a.foo(); // A’s foo
}

1. Decompose edit into atomic changes
2. Find affected tests
3. Find affecting changes

10/06 U Mass DistLect, BGRyder 10

Assumptions
• Our change analysis tools run within an interactive programming environment - Eclipse
  – User makes a program edit
  – User requests change impact analysis of an edit
• Before and after the edit, the program compiles
• Tests execute different functionalities of the system
• Call graphs obtainable for each test through static or dynamic analysis
  – Call graphs show the (possible) calling structure of a program; nodes ~ methods, edges ~ calls
• Change impact measured in terms of changes to these call graphs, corresponding to a (growing) set of tests

10/06 U Mass DistLect, BGRyder 11

Usage Scenario

[Flow diagram:
  Edit code → derive atomic changes S = {c1, c2, …, cn} (Chianti)
  → build call graphs → find potentially affected tests T1, T2, …, Tk (Chianti)
  → ∀ Ti, run Ti (JUnit/CIA) → output okay?
  If yes: done. If no: find S′ ⊆ S that may impact Ti, then explore subsets of S′ (Crisp).]

10/06 U Mass DistLect, BGRyder 12

Prototypes
• Chianti, a prototype change impact analyzer for the full Java language (1.4)
  – Written as an Eclipse plug-in
  – Experimental results from analysis of year 2002 Daikon system CVS check-ins
• Crisp, a builder of intermediate program versions (between original and edited)
  – To find failure-inducing changes semi-automatically
• JUnit/CIA, augmented version of JUnit in Eclipse to perform unit/regression testing and change classification
  – Find likelihood of a change being failure-inducing

10/06 U Mass DistLect, BGRyder 13

Atomic Changes

AC    Add an empty class
DC    Delete an empty class
AM    Add an empty method
DM    Delete an empty method
CM    Change body of a method
LC    Change virtual method lookup
AF    Add a field
DF    Delete a field
CFI   Change definition of instance field initializer
CSFI  Change definition of static field initializer
AI    Add an empty instance initializer
DI    Delete an empty instance initializer
CI    Change definition of instance initializer
ASI   Add an empty static initializer
DSI   Delete an empty static initializer
CSI   Change definition of static initializer

Each edit corresponds to a unique set of (method-level) atomic changes.
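The sixteen categories above can be transcribed directly as a Java enum (the enum itself is illustrative, not part of Chianti's implementation):

```java
// The sixteen atomic change kinds from the table, one enum constant each.
public enum AtomicChange {
    AC,   // add an empty class
    DC,   // delete an empty class
    AM,   // add an empty method
    DM,   // delete an empty method
    CM,   // change body of a method
    LC,   // change virtual method lookup
    AF,   // add a field
    DF,   // delete a field
    CFI,  // change definition of instance field initializer
    CSFI, // change definition of static field initializer
    AI,   // add an empty instance initializer
    DI,   // delete an empty instance initializer
    CI,   // change definition of instance initializer
    ASI,  // add an empty static initializer
    DSI,  // delete an empty static initializer
    CSI   // change definition of static initializer
}
```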

10/06 U Mass DistLect, BGRyder 14

Examples of Atomic Changes

class A {
  public void foo() {}
  public int x;                         // 1
}
class B extends A {
  public void foo() { B.bar(); }        // 6
  public static void bar() { y = 17; }  // 5, 8
  public static int y;                  // 7
}
class C extends A {
  public void foo() { x = 18; }         // 2, 3
  public void baz() { z = 19; }         // 10, 11
  public int z;                         // 9
}

10/06 U Mass DistLect, BGRyder 15

Dynamic Dispatch Changes

A set of triples defined for a class hierarchy:
  <runtime receiver type, declared method signature, actual target method>

<C, A.m, B.m> means:
1. class A contains method m
2. class C does not contain method m
3. class B contains method m, and C inherits m from B

[Figure: hierarchy A – B – C; before the edit, m() is defined in A and B; after m() is added to C:
  Delete <C, A.m, B.m>
  Add <C, A.m, C.m>
  LC change: <C, A.m>]
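The lookup behind the third component of a triple can be sketched as a walk up the superclass chain (a hedged illustration; the string encoding of classes and methods, and the class `Lookup` itself, are mine, not Chianti's representation):

```java
import java.util.Map;
import java.util.Set;

// Given a runtime receiver type and a method name, find the actual target:
// the first class on the superclass chain that defines the method.
public class Lookup {
    /** superOf maps each class to its superclass; defines holds "C.m" entries. */
    public static String target(String receiver, String method,
                                Map<String, String> superOf, Set<String> defines) {
        for (String c = receiver; c != null; c = superOf.get(c)) {
            if (defines.contains(c + "." + method)) return c + "." + method;
        }
        return null; // no definition anywhere on the chain
    }
}
```

With C extends B extends A and m defined in A and B, lookup for receiver C yields B.m, matching the triple <C, A.m, B.m>; once an edit adds C.m, the same lookup yields C.m, so <C, A.m, B.m> is deleted and <C, A.m, C.m> is added.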

10/06 U Mass DistLect, BGRyder 16

Affected Tests

T, set of all tests; A, set of all atomic changes; P, program before edit; P′, program after edit

AffectedTests(T, A) ≡
  { Ti | Ti ∈ T, (Nodes(P, Ti) ∩ (CM ∪ DM)) ≠ ∅ } ∪
  { Ti | Ti ∈ T, n ∈ Nodes(P, Ti), n →(B,X.m) A.m ∈ Edges(P, Ti),
         for (B, X.m) ∈ LC, B <* A ≤* X }
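The first clause of AffectedTests can be sketched directly: a test Ti is affected if Nodes(P, Ti) intersects CM ∪ DM. This is a simplified illustration (the LC clause over dispatch edges is omitted, and the class name, map, and string encodings are mine):

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// A test is affected if its call graph in the original program P contains a
// method whose body was changed (CM) or that was deleted (DM).
public class AffectedTestsSketch {
    public static Set<String> affected(Map<String, Set<String>> nodesPerTest,
                                       Set<String> cmOrDm) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : nodesPerTest.entrySet()) {
            if (!Collections.disjoint(e.getValue(), cmOrDm)) {
                result.add(e.getKey()); // some node of Ti lies in CM ∪ DM
            }
        }
        return result;
    }
}
```

On the running example, an edit producing CM(B.foo) would affect only test2, since test2 is the only test whose call graph contains B.foo.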

10/06 U Mass DistLect, BGRyder 17

Affecting Changes

T, set of all tests; A, set of all atomic changes; P, program before edit; P′, program after edit

AffectingChanges(Ti, A) ≡
  { a′ | a ∈ Nodes(P′, Ti) ∩ (CM ∪ AM), a′ <* a } ∪
  { a′ | a ≡ (B, X.m) ∈ LC, B <* X, n →(B,X.m) A.m ∈ Edges(P′, Ti)
         for some n, A.m ∈ Nodes(P′, Ti), a′ <* a }

10/06 U Mass DistLect, BGRyder 18

[Chianti architecture diagram: inputs are the original program P, the edited program P′, and the unit/regression tests. The Atomic Change Decoder derives atomic changes from P and P′. The Call Graph Builder produces call graphs of the tests in P and call graphs of the affected tests in P′. The Change Impact Analyzer combines atomic changes and call graphs to report affected tests and their affecting changes.]

10/06 U Mass DistLect, BGRyder 19

How to run Chianti?
• To simulate use in an IDE while editing evolving code, select 2 successive CVS versions of a project and their associated tests
• Call for change impact analysis of the edit
• Chianti displays change analysis results
  – Affected tests with their affecting atomic changes and prerequisite changes

10/06 U Mass DistLect, BGRyder 20

Chianti GUI

[Screenshot: Eclipse project files, Java code, affected tests & affecting changes. Example shown: test2 calls B.foo(), affected by CM(B.foo()) with prerequisite AM(B.bar()), because B.foo() calls B.bar().]

10/06 U Mass DistLect, BGRyder 21

Chianti GUI

[Screenshot: test2 calls B.bar(), affected by CM(B.bar()) with prerequisites AM(B.bar()) & AF(B.y), because B.bar() uses B.y.]

10/06 U Mass DistLect, BGRyder 22

Chianti Experiments
• Data: Daikon project (cf. M. Ernst, MIT)
  – Obtained CVS repository from 2002 with version history - an active debugging period
  – Grouped code updates within same week for 12 months
  – Obtained 39 intervals with changes
  – Growth from
    • 48K to 121K lines of code
    • 357 to 765 classes
    • 2409 to 6284 methods
    • 937 to 2885 fields
    • 40-62 unit tests per version

OOPSLA’04

10/06 U Mass DistLect, BGRyder 23

Number of atomic changes

[Bar chart, log scale (1 to 100,000): number of atomic changes per weekly interval, from 0107-0114 through 1223-1230; counts range from 1 to 11,698 per interval.]

10/06 U Mass DistLect, BGRyder 24

Affected Tests

[Bar chart: percentage of affected tests per weekly interval (0107-0114 through 1223-1230, plus an average bar); values range from 0% to 89%.]

10/06 U Mass DistLect, BGRyder 25

Affecting Changes

[Bar chart, log scale (0.10% to 100.00%): percentage of affecting changes per weekly interval (0107-0114 through 1223-1230, plus a geometric-mean bar); values range from 0.26% to 73.53%.]

10/06 U Mass DistLect, BGRyder 26

Performance of Chianti
• Deriving atomic changes from 2 successive Daikon versions takes on average 87 seconds
• Calculating the set of affected tests takes on average 5 seconds
• Calculating affecting changes for an affected test takes on average 1.2 seconds
• Results show promise of our change impact framework

10/06 U Mass DistLect, BGRyder 27

JUnit/CIA
• Integrated Chianti with JUnit in Eclipse
  – Executes change impact analysis as part of regular unit testing of code during software evolution
    • Allows user to avoid rerunning unaffected unit tests
    • Allows user to see atomic changes not tested by current test suite, in order to develop new tests
• Using test outcome history to estimate likelihood that an affecting change is failure-inducing for a test
  – Classifying changes by whether they are affecting to tests with worsening, improving, or same outcomes on both versions

FSE’06

10/06 U Mass DistLect, BGRyder 28

JUnit/CIA - Intuition
• JUnit/CIA change classifications
  • untested: change affects no test; more tests should be added
  • green: all affected tests improve in outcome
  • yellow: some affected tests worsen, some improve in outcome; indeterminate
  • red: all affected tests worsen in outcome; good candidate for a bug!
• Performed in situ experiments with SE class at University of Passau
  • 61 student version pairs had worsening tests with more than 1 affecting change
  • Overall results showed best coloring classifier helped focus programmer attention on failure-inducing changes in ~50% of cases

10/06 U Mass DistLect, BGRyder 29

Comparison of Recall, Precision for Red Classifiers
• Recall
  • Measures fraction of failure-inducing changes colored RED
  • False negative rate = 1 - recall
• Precision
  • Ratio of actual failure-inducing RED changes to all RED changes
  • False positive rate = 1 - precision
• Pair of values quantifies effectiveness for any RED classifier
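The metrics above, and the harmonic mean reported on the following slides, can be computed as follows (a small sketch; the class name and the sample counts in the usage note are illustrative, not the study's data):

```java
// Recall, precision, and their harmonic mean for a RED classifier,
// following the definitions on the slide.
public class RedMetrics {
    // fraction of all failure-inducing changes that were colored RED
    public static double recall(int redFailureInducing, int allFailureInducing) {
        return (double) redFailureInducing / allFailureInducing;
    }
    // fraction of RED changes that were actually failure-inducing
    public static double precision(int redFailureInducing, int allRed) {
        return (double) redFailureInducing / allRed;
    }
    // harmonic mean of recall and precision (the F1 measure)
    public static double harmonicMean(double recall, double precision) {
        return 2.0 * recall * precision / (recall + precision);
    }
}
```

For example, if 3 of 4 failure-inducing changes are colored RED (recall 0.75) but only 3 of 6 RED changes are failure-inducing (precision 0.5), the harmonic mean is 0.6.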

10/06 U Mass DistLect, BGRyder 30

Relaxed Red (“Best”) Classifier

[Chart: harmonic mean = .68]

10/06 U Mass DistLect, BGRyder 31

Strict Red Classifier

[Chart: harmonic mean = .64]

10/06 U Mass DistLect, BGRyder 32

Simple Classifier

[Chart: harmonic mean = .49]

10/06 U Mass DistLect, BGRyder 33

CRISP
• Tool to find changes introducing bugs
• IDEA: atomic changes which do not affect a test cannot be responsible for its failure
  – Starts with original program and allows user to select atomic changes to add
  – Tool adds selected changes and their prerequisites to the original version, allowing running of tests on this intermediate program version
    • Goal: to perform this task automatically, making ‘smart’ choices of changes to add

ICSM’05, IEEE-TSE 2007
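The prerequisite handling mentioned above can be sketched as a transitive closure over a prerequisite map (a hedged illustration; the class name and the string encoding of changes are mine, echoing the GUI example where CM(B.foo()) has prerequisite AM(B.bar())):

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Applying a selected atomic change to the original version also requires
// transitively applying its prerequisite changes.
public class PrereqClosure {
    public static Set<String> closure(String change,
                                      Map<String, List<String>> prereqs) {
        Set<String> needed = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(change);
        while (!work.isEmpty()) {
            String c = work.pop();
            if (needed.add(c)) { // first visit: enqueue its prerequisites
                for (String p : prereqs.getOrDefault(c, Collections.emptyList())) {
                    work.push(p);
                }
            }
        }
        return needed;
    }
}
```

Selecting CM(B.foo) with prerequisite chain AM(B.bar) → AF(B.y) yields an intermediate version containing all three changes, on which the tests can then be run.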

10/06 U Mass DistLect, BGRyder 34

Crisp - Daikon Case Study
Nov 11th (original) and Nov 19th (new) versions

[Diagram: Chianti derives 6093 atomic changes, of which 35 are affecting changes in a failing test; Crisp then iterates - select changes, apply a group of changes, run tests, un-apply changes - narrowing down to 2 failure-inducing changes. To create a failing-test scenario for Crisp, the test suite for the Daikon version pair in version n was executed against source code in version n+1.]

10/06 U Mass DistLect, BGRyder 35

Related Work - Impact Analysis
• Static impact analysis techniques
  – Reachability: Bohner-Arnold ’96; Kung et al. JSS’96; Tonella TSE 2003
  – Year 2000 analyses: Eidorff et al. POPL’99; Ramalingam et al. POPL’99
• Dynamic impact analysis techniques
  – Zeller FSE’99; Law-Rothermel ICSE’03
• Combined analysis techniques
  – Orso et al. FSE’03; Orso et al. ICSE’04

10/06 U Mass DistLect, BGRyder 36

Related Work - Testing, Avoiding Recompilation
• Selective regression testing
  – TestTube: Rosenblum ICSE’94; DejaVu: Harrold-Rothermel TOSEM 1997
  – Slicing-based: Bates-Horwitz POPL’93; Binkley TSE’97
  – Test selection: Harrold et al. OOPSLA’01; Elbaum et al. JSTVR 2003
• Techniques for avoiding recompilation
  – Tichy ACM TOPLAS ’86; et al. ACM TOSEM ’94; Hood et al. PSDE’87; Burke et al. TOPLAS ’93; Chambers et al. ICSE’95; Dmitriev OOPSLA’02

10/06 U Mass DistLect, BGRyder 37

Related Work - Fault Localization
• Comparing dynamic data across executions
  • Reps et al. FSE’97; Harrold et al. PASTE’98; Jones et al. ICSE’02; Renieris and Reiss ASE’03; Dallmeier et al. ECOOP’05
• Statistically-based techniques: Liblit et al. PLDI’03, PLDI’05; Liu et al. FSE’05
• Slicing-based techniques
  • Lyle et al. ISICCA’87; DeMillo et al. ISSTA’96; Bonus et al. ASE’03

10/06 U Mass DistLect, BGRyder 38

Summary
Developed change impact analysis for object-oriented codes with emphasis on practicality, viability in an IDE, and scalability to industrial-strength systems
– Defined atomic changes to subdivide edits
– Defined impact using tests affected and their affecting changes
– Tested ideas in Chianti on 12 months of data from the Daikon project with promising results
– Developed new tools -- JUnit/CIA, CRISP -- that incorporate change impact analysis in practical ways for testing and debugging applications

10/06 U Mass DistLect, BGRyder 39

Future Work
• Upgrade Chianti to Java 1.5
  • Need to consider additional dependences introduced
• Experiment with JUnit/CIA in combination with Crisp to find failure-inducing atomic changes
  • Develop heuristics to automate this process
  • Use larger programs as data -- e.g., Eclipse compiler
• Use change impact analysis to examine collaborative changes to determine unexpected semantic dependences in resulting code
  • Experiment with finding committable changes that allow early exposure of changes to team members
• Explore further applications of change analysis to testing
  • Codify issues of semantic dependence between changes

10/06 U Mass DistLect, BGRyder 40

10/06 U Mass DistLect, BGRyder 41

JUnit/CIA Screen Shot