faults and regression testing - localizing failure-inducing program edits based on spectrum...

Localizing Failure-Inducing Program Edits Based on Spectrum Information

Lingming Zhang, Miryung Kim, Sarfraz Khurshid
The University of Texas at Austin

ICSM2011, September 27th 2011


Overview

Change impact analysis is effective at finding suspicious edits but lacks precise ranking.

Spectrum-based fault localization is effective at ranking but does not scale well.

Our insight: combine change impact analysis and spectrum-based fault localization.
• Identify suspicious edits using extended call graphs.
• Rank suspicious edits using dynamic program spectrum information.

L. Zhang: Localizing failure-inducing program edits based on spectrum information 2

Summary of our results

FaultTracer localizes failure-inducing edits with high precision:

• Identifying suspicious edits: outperforms Chianti by 19.37%.
• Ranking all suspicious edits: ranks real regression faults within the top 3 edits for 14 of the 22 studied real-world failures.
• Ranking method-level suspicious edits: outperforms the existing heuristic by 56.25%.


Outline

• FaultTracer Approach
• Empirical Evaluation
• Related Work
• Conclusions


Example

Program P

public class A {
    public static int f1=0;
    public static int f2=0;
    ...
}
class B {
    int f1=0; int f2=0; int f3=0;
    public int foo(){return f1;}
    ...
}
class C extends B{
    ...
}

evolves to

Program P’

public class A {
    public static int f1=1;
    public static int f2=1;
    ...
}
class B {
    int f1=0; int f2=1; int f3=1;
    int f4=1;
    public int foo(){ if(f1>=0) return f1;
                      else return f4;
    }
    ...
}
class C extends B{
    public int f1=3;
    public void bar(int f) {f3=f+f1;}
    ...
}

Regression test suite T

public void test1() { A.bar(1); }
public void test2() { ... }
public void test3() { ... }
public void test4() {
    C c = new C();
    int f = c.foo();
}
public void test5() { ... }

T is run on P (Test) and re-run on P’ (Re-Test): Bug!

FaultTracer overview

① Detecting changes and dependences
② Selecting tests based on Extended Call Graph analysis
③ Identifying suspicious edits based on Extended Call Graph analysis
④ Ranking suspicious edits based on program spectrum information

[Workflow diagram; labels: P, P’, T, T’, ∆, δt, δt’.]

Extended Call Graph representation

public void test1() { A.bar(1); }
public void test4() {
    C c = new C();
    int f = c.foo();
}

[Figure: the traditional call graph used by Chianti and the Extended Call Graph used by FaultTracer for test1 and test4. Both graphs contain the method nodes A.bar(), A.Clinit(), C.C(), B.B() and C.foo(), and the call label <C,C.foo()>. The Extended Call Graph additionally contains the field nodes A.f2 and B.f1 and the field-access labels <SFW,A.f2> and <FR,C.f1>.]


Step 1. Detecting atomic changes and dependences

Atomic Change Types

Change type  Description
CM           Change method
AM           Add method
DM           Delete method
AF           Add field
DF           Delete field
CFI          Change instance field
CSFI         Change static field
LCm          Method look-up change
LCf          Field look-up change

Change dependences inference rules
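For the field-level atomic change types listed in Step 1 (AF, DF, CFI, CSFI), the following is a minimal sketch of how such changes could be derived by comparing the field declarations of two program versions. The `FieldDecl` class, the `diffFields` helper and the map-based program model are assumptions made for this illustration; they are not FaultTracer's implementation, and method-level and look-up changes are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a field: its initializer text and whether it is static. */
final class FieldDecl {
    final String init;
    final boolean isStatic;
    FieldDecl(String init, boolean isStatic) { this.init = init; this.isStatic = isStatic; }
}

final class AtomicFieldChanges {
    /** Compares fields (keyed by qualified name) of the old and the new version. */
    static List<String> diffFields(Map<String, FieldDecl> oldV, Map<String, FieldDecl> newV) {
        List<String> changes = new ArrayList<>();
        for (Map.Entry<String, FieldDecl> e : newV.entrySet()) {
            FieldDecl before = oldV.get(e.getKey());
            FieldDecl after = e.getValue();
            if (before == null) {
                changes.add("AF(" + e.getKey() + ")");   // field exists only in the new version
            } else if (!before.init.equals(after.init)) {
                // Initializer changed: distinguish static from instance fields.
                changes.add((after.isStatic ? "CSFI(" : "CFI(") + e.getKey() + ")");
            }
        }
        for (String name : oldV.keySet()) {
            if (!newV.containsKey(name)) {
                changes.add("DF(" + name + ")");         // field exists only in the old version
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // A small subset of the fields from the example programs P and P'.
        Map<String, FieldDecl> p = new LinkedHashMap<>();
        p.put("A.f1", new FieldDecl("0", true));
        p.put("B.f2", new FieldDecl("0", false));
        Map<String, FieldDecl> pPrime = new LinkedHashMap<>();
        pPrime.put("A.f1", new FieldDecl("1", true));
        pPrime.put("B.f2", new FieldDecl("1", false));
        pPrime.put("C.f1", new FieldDecl("3", false));
        System.out.println(diffFields(p, pPrime)); // prints [CSFI(A.f1), CFI(B.f2), AF(C.f1)]
    }
}
```

Keying fields by qualified name keeps the comparison a simple map lookup; a real tool would work on parsed declarations rather than initializer strings.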


Step 2. Test selection based on Extended Call Graph (ECG) analysis

FaultTracer directly matches all changes with test ECGs before edits to select the influenced tests.
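The matching described in Step 2 can be pictured as a set intersection. The sketch below assumes each test's old-version ECG is available as a set of node and edge labels, and that the labels matched by the atomic changes have already been computed; `selectTests`, the label strings and the assignment of labels to tests are hypothetical and only loosely follow the example figure.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class EcgTestSelection {
    /**
     * Returns the tests whose ECG (collected before the edit) shares at least
     * one label with the set of labels matched by some change.
     */
    static Set<String> selectTests(Map<String, Set<String>> ecgByTest, Set<String> changedLabels) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : ecgByTest.entrySet()) {
            if (!Collections.disjoint(e.getValue(), changedLabels)) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed ECG contents, for illustration only.
        Map<String, Set<String>> ecgs = new LinkedHashMap<>();
        ecgs.put("test1", new HashSet<>(Arrays.asList("A.bar()", "A.Clinit()", "<SFW,A.f2>")));
        ecgs.put("test4", new HashSet<>(Arrays.asList("C.C()", "B.B()", "<C,C.foo()>", "<FR,C.f1>")));
        // Assumed: adding field C.f1 matches the field-read label <FR,C.f1>.
        Set<String> changed = new HashSet<>(Arrays.asList("<FR,C.f1>"));
        System.out.println(selectTests(ecgs, changed)); // prints [test4]
    }
}
```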


Step 3. Suspicious edit identification based on Extended Call Graph analysis

FaultTracer directly selects the non-look-up changes that appear on test ECGs after edits as suspicious edits.

FaultTracer selects method or field edits that have caused look-up changes on test ECGs as suspicious edits.
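The two selection rules of Step 3 can be sketched as follows, assuming each change records the ECG label it matches and, for look-up changes, the edit that caused it. The `EcgChange` class, the `suspiciousEdits` helper, the `LCf(C.f1)` identifier and the sample labels are hypothetical and are not taken from FaultTracer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical change record; causedBy is non-null only for look-up changes. */
final class EcgChange {
    final String id;        // e.g. "CM(B.foo)"
    final String label;     // ECG node or edge label the change matches
    final String causedBy;  // edit that caused a look-up change, else null
    EcgChange(String id, String label, String causedBy) {
        this.id = id; this.label = label; this.causedBy = causedBy;
    }
}

final class SuspiciousEdits {
    /** Applies both rules to one test's ECG collected after the edit. */
    static Set<String> suspiciousEdits(Set<String> ecgAfterEdit, List<EcgChange> changes) {
        Set<String> result = new LinkedHashSet<>();
        for (EcgChange c : changes) {
            if (!ecgAfterEdit.contains(c.label)) {
                continue;                        // change is not exercised by this test
            }
            if (c.causedBy == null) {
                result.add(c.id);                // rule 1: non-look-up change on the ECG
            } else {
                result.add(c.causedBy);          // rule 2: edit behind a look-up change
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed post-edit ECG of one test, for illustration only.
        Set<String> ecg = new HashSet<>(Arrays.asList("B.foo()", "<FR,C.f1>"));
        List<EcgChange> changes = Arrays.asList(
            new EcgChange("CM(B.foo)", "B.foo()", null),
            new EcgChange("LCf(C.f1)", "<FR,C.f1>", "AF(C.f1)"),
            new EcgChange("AM(C.bar)", "C.bar()", null));
        System.out.println(suspiciousEdits(ecg, changes)); // prints [CM(B.foo), AF(C.f1)]
    }
}
```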


Step 4. Spectrum-based fault localization for program edits

Correlation between suspicious edits and tests

Edits        test2  test3  test4  test5
CSFI(A.f1)
CM(B.foo)
AF(C.f1)
AM(C.bar)
Outcome      Pass   Pass   Pass   Fail

[The per-test marks of this table did not survive extraction.]

Suspiciousness score computation

Edits        Tarantula  SBI   Jaccard  Ochiai  TieBreak
CSFI(A.f1)   0.00       0.00  0.00     0.00    -
CM(B.foo)    0.75       0.50  0.50     0.71    1
AF(C.f1)     0.75       0.50  0.50     0.71    0
AM(C.bar)    1.00       1.00  1.00     1.00    -
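The four scores can be reproduced with the commonly published definitions of Tarantula, SBI, Jaccard and Ochiai, applied to edits instead of statements. In the sketch below, `ef` and `ep` are the numbers of failing and passing tests that exercise an edit, and `totalF` and `totalP` are the numbers of all failing and passing tests. The concrete counts used in `main` are assumptions chosen to be consistent with the listed scores (one failing and three passing tests); the class and method names are hypothetical.

```java
import java.util.Locale;

final class Suspiciousness {
    /** Tarantula: (ef/F) / (ef/F + ep/P). */
    static double tarantula(int ef, int ep, int totalF, int totalP) {
        double f = totalF == 0 ? 0.0 : (double) ef / totalF;
        double p = totalP == 0 ? 0.0 : (double) ep / totalP;
        return f + p == 0.0 ? 0.0 : f / (f + p);
    }

    /** SBI: 1 - ep/(ep + ef), which equals ef/(ep + ef). */
    static double sbi(int ef, int ep) {
        return ef + ep == 0 ? 0.0 : (double) ef / (ef + ep);
    }

    /** Jaccard: ef / (F + ep). */
    static double jaccard(int ef, int ep, int totalF) {
        int denom = totalF + ep;
        return denom == 0 ? 0.0 : (double) ef / denom;
    }

    /** Ochiai: ef / sqrt(F * (ef + ep)). */
    static double ochiai(int ef, int ep, int totalF) {
        double denom = Math.sqrt((double) totalF * (ef + ep));
        return denom == 0.0 ? 0.0 : ef / denom;
    }

    public static void main(String[] args) {
        int totalF = 1, totalP = 3;
        // Assumed counts: an edit exercised by the failing test and by one passing test.
        int ef = 1, ep = 1;
        System.out.println(String.format(Locale.ROOT, "%.2f %.2f %.2f %.2f",
            tarantula(ef, ep, totalF, totalP), sbi(ef, ep),
            jaccard(ef, ep, totalF), ochiai(ef, ep, totalF)));
        // prints 0.75 0.50 0.50 0.71
    }
}
```

With `ef = 1, ep = 0` all four formulas give 1.00, and with `ef = 0, ep = 1` they all give 0.00, matching the first and last rows of the score table.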


Outline

• FaultTracer Approach
• Empirical Evaluation
• Related Work
• Conclusions


Research Questions

RQ1: How does FaultTracer compare to Chianti in identifying suspicious edits?

RQ2: How effective is FaultTracer in ranking suspicious edits?


Subjects: overview

Subjects from the Software-artifact Infrastructure Repository (SIR).

Project       Version   Program Size (KLoC)  Number of Tests
Jtopas        0.0-3.0   1.83 ~ 5.36          95-209
Xml-Security  0.0-3.0   17.44 ~ 18.99        84-106
JMeter        0.0-5.0   31.01 ~ 41.05        70-97
Ant           0.0-8.0   17.20 ~ 80.44        112-878


Subjects: change statistics

Number of changes for each version pair

[Figure: horizontal bar chart of the number of atomic changes of each type (AM, DM, CM, AF, DF, CFI, CSFI, LCm, LCf) for each version pair: Jtopas 0.0-1.0 to 2.0-3.0, XmlSec 0.0-1.0 to 2.0-3.0, JMeter 0.0-1.0 to 4.0-5.0 and Ant 0.0-1.0 to 7.0-8.0; x-axis from 0 to 7000.]

RQ1: How does FaultTracer compare to Chianti in identifying suspicious edits?

FaultTracer achieves a 19.37% improvement in the precision of identifying suspicious edits.

[Figure: bar chart comparing Chianti and FaultTracer for each version pair: Jtopas 0.0-1.0 to 2.0-3.0, XmlSec 0.0-1.0 to 2.0-3.0, JMeter 0.0-1.0 to 4.0-5.0 and Ant 0.0-1.0 to 7.0-8.0; y-axis from 0 to 160.]

RQ2: How effective is FaultTracer in ranking suspicious edits?

Ranks all types of edits:

• Average performance.

                           Tarantula  SBI    Jaccard  Ochiai  Suspicious edit num.  Edit number
Average                    8.50       8.50   10.83    14.66   68.83                 3932
Percentage to edit number  0.22%      0.22%  0.28%    0.37%   1.75%                 --

• Example (Ant5.0-6.0)

Test                                                        Tarantula  SBI  Jaccard  Ochiai  Suspicious edit num.  Edit number
ant.taskdefs.optional.EchoPropertiesTest.testEchoToBadFile  1          1    1        10      182                   5019


RQ2: How effective is FaultTracer in ranking suspicious edits?

Ranks method edits (FaultTracer vs. Heuristic)

• Achieves a 56.25% improvement in the precision of localizing method-level failure-inducing edits.


Limitations

Does not currently filter out refactorings (e.g., by using RefFinder [Prete+2010]).

Uses only four spectrum-based fault localization techniques.

The experimental evaluation is limited by the small number of real regression faults.


Related work

Change impact analysis
• Chianti [Ren+2004]
• Crisp [Chesley+2005]
• Heuristic ranking [Ren+2007]

Fault localization
• Spectrum-based
  • E.g., Tarantula [Jones+2002], SBI [Liblit+2005], Jaccard [Abreu+2007], Ochiai [Abreu+2007].
• Delta debugging [Zeller1999]
• Model-based
  • E.g., Bayesian diagnosis [Kleer+1987]


Conclusion

FaultTracer combines change impact analysis with dynamic spectra.

FaultTracer improves change impact analysis based on extended call graph analysis.

Experimental evaluation shows FaultTracer:
• Performs 19.37% better than Chianti in determining affecting changes.
• Localizes failure-inducing edits within the top 3 edits for 14 of the 22 regression failures.
• Performs 56.25% better than the previous heuristic for localizing failure-inducing program edits.

zhanglm10@gmail.com
