faults and regression testing - localizing failure-inducing program edits based on spectrum...
DESCRIPTION
Paper: Localizing Failure-Inducing Program Edits Based on Spectrum Information.
Authors: Lingming Zhang, Miryung Kim, Sarfraz Khurshid.
Session: Research Track Session 1: Faults and Regression Testing
TRANSCRIPT
Localizing Failure-Inducing Program Edits Based on Spectrum Information
Lingming Zhang, Miryung Kim, Sarfraz Khurshid
The University of Texas at Austin
ICSM 2011, September 27th, 2011
Overview
Change impact analysis is effective at finding suspicious edits but lacks precise ranking.
Spectrum-based fault localization is effective at ranking but does not scale well.
Our insight: combine change impact analysis and spectrum-based fault localization.
• Identify suspicious edits using extended call graphs.
• Rank suspicious edits using dynamic program spectrum information.
L. Zhang: Localizing failure-inducing program edits based on spectrum information 2
Summary of our results
FaultTracer localizes failure-inducing edits with high precision:
• Identifying suspicious edits: outperforms Chianti by 19.37%.
• Ranking all suspicious edits: ranks real regression faults within top 3 edits for 14 of the 22 studied real-world failures.
• Ranking method-level suspicious edits: outperforms the existing heuristic by 56.25%.
Outline
FaultTracer Approach
Empirical Evaluation
Related Work
Conclusions
Example
Program P
public class A {
    public static int f1=0;
    public static int f2=0;
    ...
}
class B {
    int f1=0; int f2=0; int f3=0;
    ...
}
class C extends B {
    int f4=1;
    public int foo() {
        if (f1>=0) return f1;
        else return f4;
    }
    ...
}
evolve
Program P'
public class A {
    public static int f1=1;
    public static int f2=1;
    ...
}
class B {
    int f1=0; int f2=1; int f3=1;
    public int foo() { return f1; }
    ...
}
class C extends B {
    public int f1=3;
    public void bar(int f) { f3=f+f1; }
    ...
}
Regression test suite T (Test, Re-Test: Bug!)
public void test1() { A.bar(1); }
public void test2() { ... }
public void test3() { ... }
public void test4() {
    C c = new C();
    int f = c.foo();
}
public void test5() { ... }
FaultTracer overview
① Detecting changes and dependences
② Selecting tests based on Extended Call Graph analysis
③ Identifying suspicious edits based on Extended Call Graph analysis
④ Rank suspicious edits based on program spectrum information
[Diagram: the four steps connected by arrows; artifact labels P, P', ∆, T, T', δt, δt']
Extended Call Graph representation
public void test1() { A.bar(1); }
public void test4() {
    C c = new C();
    int f = c.foo();
}
[Figure: Traditional Call Graph used by Chianti. Nodes: test1, test4, A.bar(), A.Clinit(), C.C(), B.B(), C.foo(); edge label <C,C.foo()>.]
[Figure: Extended Call Graph used by FaultTracer. Same nodes and edge label, plus field nodes A.f2 and B.f1 with edge labels <SFW,A.f2> and <FR,C.f1>.]
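The two graphs above differ in that the extended graph also records field accesses. One way to picture this is a labeled directed graph whose nodes are tests, methods, and fields. The following is a minimal sketch under that assumption, not FaultTracer's implementation: the class `LabeledGraph`, its method names, and the edges used in the usage note are hypothetical and reflect only one reading of the figure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical labeled graph: nodes are tests, methods, or fields; edges carry a label. */
final class LabeledGraph {
    /** One outgoing edge: a label such as "<FR,C.f1>" (may be empty) and a target node. */
    static final class Edge {
        final String label;
        final String target;
        Edge(String label, String target) {
            this.label = label;
            this.target = target;
        }
    }

    private final Map<String, List<Edge>> outgoing = new LinkedHashMap<>();

    void addEdge(String from, String label, String to) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(label, to));
    }

    /** Returns every node and every non-empty edge label reachable from the root. */
    Set<String> elementsReachableFrom(String root) {
        Set<String> visited = new LinkedHashSet<>();
        Set<String> elements = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            String node = work.pop();
            if (!visited.add(node)) {
                continue; // already expanded
            }
            elements.add(node);
            for (Edge e : outgoing.getOrDefault(node, new ArrayList<>())) {
                if (!e.label.isEmpty()) {
                    elements.add(e.label);
                }
                work.push(e.target);
            }
        }
        return elements;
    }
}
```

For instance, adding the edges test4 → C.C(), C.C() → B.B(), test4 → C.foo() (labeled <C,C.foo()>), and C.foo() → B.f1 (labeled <FR,C.f1>) makes the field node B.f1 reachable from test4, which a graph restricted to method calls would not record.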
Step 1. Detecting atomic changes and dependences
Atomic Change Types
Change type   Description
CM            Change method
AM            Add method
DM            Delete method
AF            Add field
DF            Delete field
CFI           Change instance field
CSFI          Change static field
LCm           Method look-up change
LCf           Field look-up change
[Figure: Change dependences inference rules]
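The change types in the table map naturally onto an enumeration. The sketch below is only an illustration of that idea: the names `ChangeType` and `AtomicChange`, and the `isLookup` helper, are hypothetical and are not taken from FaultTracer. The descriptions are copied from the table.

```java
import java.util.Objects;

/** Atomic change types as listed in the table; LCm and LCf are the look-up changes. */
enum ChangeType {
    CM("Change method"),
    AM("Add method"),
    DM("Delete method"),
    AF("Add field"),
    DF("Delete field"),
    CFI("Change instance field"),
    CSFI("Change static field"),
    LCm("Method look-up change"),
    LCf("Field look-up change");

    final String description;

    ChangeType(String description) {
        this.description = description;
    }

    /** True for the two look-up change types, false for direct edits. */
    boolean isLookup() {
        return this == LCm || this == LCf;
    }
}

/** Hypothetical value type: one atomic change applied to one program element. */
final class AtomicChange {
    final ChangeType type;
    final String element; // for example "B.foo" or "A.f1"

    AtomicChange(ChangeType type, String element) {
        this.type = Objects.requireNonNull(type);
        this.element = Objects.requireNonNull(element);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof AtomicChange)) {
            return false;
        }
        AtomicChange that = (AtomicChange) other;
        return type == that.type && element.equals(that.element);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, element);
    }

    /** Renders the change in the notation used on the slides, e.g. CM(B.foo). */
    @Override
    public String toString() {
        return type.name() + "(" + element + ")";
    }
}
```

With these types, `new AtomicChange(ChangeType.CSFI, "A.f1")` prints as `CSFI(A.f1)`, matching the notation used in the later ranking tables.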
Step 2. Test selection based on Extended Call Graph (ECG) analysis
FaultTracer directly matches all changes against the test ECGs before edits to select the influenced tests.
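The selection rule above can be sketched as a set intersection: a test is influenced when its pre-edit ECG contains at least one changed element. This is a simplified illustration, not FaultTracer's code. Representing an ECG as a set of element names, and the names `TestSelectionSketch` and `selectInfluencedTests`, are assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of ECG-based test selection. */
final class TestSelectionSketch {
    /**
     * Selects the tests whose pre-edit ECG shares at least one element with the
     * set of changed elements.
     *
     * @param oldEcgByTest    test name mapped to the elements on its pre-edit ECG (assumed encoding)
     * @param changedElements elements touched by the detected changes
     */
    static Set<String> selectInfluencedTests(Map<String, Set<String>> oldEcgByTest,
                                             Set<String> changedElements) {
        Set<String> influenced = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : oldEcgByTest.entrySet()) {
            if (!Collections.disjoint(entry.getValue(), changedElements)) {
                influenced.add(entry.getKey());
            }
        }
        return influenced;
    }
}
```

Tests whose ECGs touch none of the changed elements are left out, so they need not be re-run.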
Step 3. Suspicious edit identification based on Extended Call Graph analysis
FaultTracer directly selects the non-look-up changes that appear on test ECGs after edits as suspicious edits.
FaultTracer selects method or field edits that have caused look-up changes on test ECGs as suspicious edits.
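The two rules above can be sketched as two passes over a test's post-edit ECG. This is a simplified illustration under stated assumptions, not FaultTracer's implementation: the ECG is encoded as a set of element names and edge labels, and the two input maps (`directEdits`, `lookupCauses`) are hypothetical stand-ins for the change and dependence information computed in Step 1.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of suspicious-edit identification for a single test. */
final class SuspiciousEditsSketch {
    /**
     * @param ecgAfterEdit elements and edge labels on the test's post-edit ECG (assumed encoding)
     * @param directEdits  non-look-up edit mapped to the element it touches,
     *                     e.g. "CM(B.foo)" mapped to "B.foo()"
     * @param lookupCauses look-up change, identified by its edge label, mapped to the
     *                     method or field edits that caused it
     */
    static Set<String> identify(Set<String> ecgAfterEdit,
                                Map<String, String> directEdits,
                                Map<String, Set<String>> lookupCauses) {
        Set<String> suspicious = new TreeSet<>();
        // Rule 1: non-look-up changes that appear on the post-edit ECG.
        for (Map.Entry<String, String> edit : directEdits.entrySet()) {
            if (ecgAfterEdit.contains(edit.getValue())) {
                suspicious.add(edit.getKey());
            }
        }
        // Rule 2: edits that caused a look-up change present on the ECG.
        for (Map.Entry<String, Set<String>> lookup : lookupCauses.entrySet()) {
            if (ecgAfterEdit.contains(lookup.getKey())) {
                suspicious.addAll(lookup.getValue());
            }
        }
        return suspicious;
    }
}
```

The second rule is what lets a field addition be flagged even though the test never touches the new field directly: the edit is blamed through the look-up change it causes.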
Step 4. Spectrum-based fault localization for program edits
Correlation between suspicious edits and tests
Edits        test2  test3  test4  test5
CSFI(A.f1)
CM(B.foo)
AF(C.f1)
AM(C.bar)
Outcome      Pass   Pass   Pass   Fail
[The marks showing which tests exercise which edits are not preserved in the transcript.]
Suspiciousness score computation
             Suspiciousness Score
Edits        Tarantula  SBI   Jaccard  Ochiai  TieBreak
CSFI(A.f1)   0.00       0.00  0.00     0.00    -
CM(B.foo)    0.75       0.50  0.50     0.71    1
AF(C.f1)     0.75       0.50  0.50     0.71    0
AM(C.bar)    1.00       1.00  1.00     1.00    -
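The four scores in the table are consistent with the commonly used definitions of Tarantula, SBI, Jaccard, and Ochiai, restated in the sketch below. The class and parameter names are illustrative. Here `ef` and `ep` are the numbers of failing and passing tests that exercise an edit, and `totalFailed` and `totalPassed` are the suite totals. Returning 0 when a denominator is zero is a convention chosen for this sketch, not something stated on the slides.

```java
/** Illustrative restatement of four spectrum-based suspiciousness formulas. */
final class Suspiciousness {
    /** Tarantula: (ef/F) / (ef/F + ep/P). */
    static double tarantula(int ef, int ep, int totalFailed, int totalPassed) {
        double failRatio = totalFailed == 0 ? 0.0 : (double) ef / totalFailed;
        double passRatio = totalPassed == 0 ? 0.0 : (double) ep / totalPassed;
        double denominator = failRatio + passRatio;
        return denominator == 0.0 ? 0.0 : failRatio / denominator;
    }

    /** SBI: 1 - ep / (ep + ef). */
    static double sbi(int ef, int ep) {
        int covering = ef + ep;
        return covering == 0 ? 0.0 : 1.0 - (double) ep / covering;
    }

    /** Jaccard: ef / (F + ep). */
    static double jaccard(int ef, int ep, int totalFailed) {
        int denominator = totalFailed + ep;
        return denominator == 0 ? 0.0 : (double) ef / denominator;
    }

    /** Ochiai: ef / sqrt(F * (ef + ep)). */
    static double ochiai(int ef, int ep, int totalFailed) {
        double denominator = Math.sqrt((double) totalFailed * (ef + ep));
        return denominator == 0.0 ? 0.0 : ef / denominator;
    }
}
```

As a check against the table, with three passing tests and one failing test, an edit exercised by the failing test and one passing test (ef = 1, ep = 1) scores 0.75, 0.50, 0.50, and about 0.71, matching the rows for CM(B.foo) and AF(C.f1). An edit exercised only by the failing test scores 1.00 on all four, matching AM(C.bar). These coverage counts are inferred from the reported scores rather than read from the slide.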
Outline
FaultTracer Approach
Empirical Evaluation
Related Work
Conclusions
Research Questions
RQ1: How does FaultTracer compare to Chianti in identifying suspicious edits?
RQ2: How effective is FaultTracer in ranking suspicious edits?
Subjects: overview
Subjects from the Software-artifact Infrastructure Repository (SIR).
Project        Version    Program Size (KLoC)   Number of Tests
Jtopas         0.0-3.0    1.83 ~ 5.36           95-209
Xml-Security   0.0-3.0    17.44 ~ 18.99         84-106
JMeter         0.0-5.0    31.01 ~ 41.05         70-97
Ant            0.0-8.0    17.20 ~ 80.44         112-878
Subjects: change statistics
[Figure: Number of changes for each version pair. One bar per version pair (Jtopas0.0-1.0 through Jtopas2.0-3.0, XmlSec0.0-1.0 through XmlSec2.0-3.0, JMeter0.0-1.0 through JMeter4.0-5.0, Ant0.0-1.0 through Ant7.0-8.0), broken down by change type (AM, DM, CM, AF, DF, CFI, CSFI, LCm, LCf); axis from 0 to 7000.]
RQ1: How does FaultTracer compare to Chianti in identifying suspicious edits?
FaultTracer achieves a 19.37% improvement in the precision of identifying suspicious edits.
[Figure: bar chart comparing Chianti and FaultTracer for each version pair, Jtopas0.0-1.0 through Ant7.0-8.0; vertical axis from 0 to 160.]
RQ2: How effective is FaultTracer in ranking suspicious edits?
Ranks all types of edits:
• Average performance.
                            Tarantula  SBI    Jaccard  Ochiai  Suspicious edit num.  Edit number
Average                     8.50       8.50   10.83    14.66   68.83                 3932
Percentage to edit number   0.22%      0.22%  0.28%    0.37%   1.75%                 --
• Example (Ant5.0-6.0)
Test: ant.taskdefs.optional.EchoPropertiesTest.testEchoToBadFile
Tarantula  SBI  Jaccard  Ochiai  Suspicious edit num.  Edit number
1          1    1        10      182                   5019
RQ2: How effective is FaultTracer in ranking suspicious edits?
Ranks method edits (FaultTracer vs. Heuristic)
• Achieves a 56.25% improvement in the precision of localizing method-level failure-inducing edits.
Limitations
Does not currently filter out refactorings (e.g., using RefFinder [Prete+2010]).
Uses only four spectrum-based fault localization techniques.
The experimental evaluation is limited by the small number of real regression faults.
Related work
Change impact analysis
• Chianti [Ren+2004]
• Crisp [Chesley+2005]
• Heuristic ranking [Ren+2007]
Fault localization
• Spectrum-based
  • E.g., Tarantula [Jones+2002], SBI [Liblit+2005], Jaccard [Abreu+2007], Ochiai [Abreu+2007].
• Delta debugging [Zeller1999]
• Model-based
  • E.g., Bayesian diagnosis [Kleer+1987]
Conclusion
FaultTracer combines change impact analysis with dynamic spectra.
FaultTracer improves change impact analysis based on extended call graph analysis.
Experimental evaluation shows FaultTracer:
• Performs 19.37% better than Chianti in determining affecting changes.
• Localizes failure-inducing edits within top 3 edits for 14 of the 22 regression failures.
• Performs 56.25% better than the previous heuristic for localizing failure-inducing program edits.
zhanglm10@gmail.com