Information and Software Technology (article in press)
Journal homepage: www.elsevier.com/locate/infsof
https://doi.org/10.1016/j.infsof.2018.11.009
Received 2 January 2018; Received in revised form 10 October 2018; Accepted 29 November 2018
0950-5849/© 2018 Elsevier B.V. All rights reserved.

VFL: Variable-based fault localization

Jeongho Kim a,∗, Jindae Kim b, Eunseok Lee a,∗

a Sungkyunkwan University, Seobu-ro, Jangan-gu, Suwon, Republic of Korea
b HKUST, Clear Water Bay, Sai Kung, Hong Kong
∗ Corresponding authors.

Keywords: Software debugging; Software testing; Spectrum-based fault localization; Variable-based fault localization; Suspicious variable

Abstract

Context: Fault localization is one of the most important debugging tasks. Hence, many automatic fault localization techniques have been proposed to reduce the burden on developers. Among them, Spectrum-based Fault Localization (SFL) techniques leverage coverage information and localize faults based on the coverage difference between failed and passed test cases.

Objective: However, SFL techniques cannot localize faults effectively when the coverage differences are not clear. To address this issue and improve the fault localization performance of SFL techniques, we propose a Variable-based Fault Localization (VFL) technique.

Method: The VFL technique identifies suspicious variables and uses them to generate a ranked list of suspicious source code lines. Since it only requires additional information about variables, which is also available to SFL techniques, the proposed technique is lightweight and can be used to improve the performance of existing SFL techniques.

Results: In an evaluation with 224 real Java faults and 120 C faults, the VFL technique outperforms the SFL techniques using the same similarity coefficient. The average Exam scores of the VFL techniques are more than 55% lower than those of the SFL techniques, and the VFL techniques localize faults at a better (numerically lower) rank than the SFL techniques for about 73% of the 344 faults.

Conclusion: We proposed a novel variable-based fault localization technique for more effective debugging. The VFL technique performs better than the existing techniques, and its results were more useful for actual fault localization tasks. In addition, the technique is lightweight and scalable, so it is easy to combine with other fault localization techniques.

1. Introduction

Software debugging is an expensive software development task, which accounts for 50%–80% of the total software development costs [19,42]. Debugging consists of fault localization and fault fixing, and fault localization is the particularly difficult part, taking more time and cost. In addition, since a fault cannot be fixed until its location is found, fault localization is a very important task.

Therefore, many researchers have proposed various fault localization techniques, such as spectrum-based [32], mutation-based [28], and slice-based [41] techniques. Among these, Spectrum-based Fault Localization (SFL) is the most popular, accounting for 35% of all fault localization-related studies [17]. SFL techniques provide a ranked list of suspicious entities (such as statements, functions, basic blocks, and classes) to help developers localize a fault and fix it during debugging.

Many SFL techniques with various similarity coefficients, such as Tarantula [12], Jaccard [1], Naish [23], and GP [45], have been proposed to improve localization performance. The formulas and results of these similarity coefficients differ, but they are all designed for the same purpose. The intuition behind SFL techniques is that an entity covered by more failed tests and fewer passed tests is more likely to be faulty. However, SFL techniques cannot localize faults effectively if the coverage information is not sufficient to distinguish the execution of failed and passed test cases [2,10]. In other words, because SFL techniques are statistical approaches, their performance is not guaranteed when there is not enough reliable data. For example, suppose a program has one if statement. It is possible that a failed test case executes the then-block of the if statement, although it should execute the else-block. The problem occurs when a given passed test case executes the then-block just like the failed test case. Since both the failed and the passed test case execute the same block, they have the same coverage. With such coverage information, SFL techniques can only indicate that a fault is located somewhere in the then-block. If there are many source code lines in this block, this outcome is not satisfactory. Moreover, given that the failed test case took the wrong branch, it is more likely that the fault exists in the statements before the branch, which lead the execution to the wrong side of the branch. Various techniques [10,18,33,34,46–48]




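Table 1
Representative similarity coefficient formulas.

| Similarity coefficient | Formula |
|---|---|
| Tarantula | $\dfrac{\frac{N_{cf}}{N_{cf}+N_{uf}}}{\frac{N_{cf}}{N_{cf}+N_{uf}}+\frac{N_{cp}}{N_{cp}+N_{up}}}$ |
| Jaccard | $\dfrac{N_{cf}}{N_{cf}+N_{uf}+N_{cp}}$ |
| Naish | $N_{cf}-\dfrac{N_{cp}}{N_{cp}+N_{up}+1}$ |
| GP08 | $N_{cf}^{2}\,(2N_{cp}+2N_{cf}+3N_{up})$ |
| GP10 | $\sqrt{\left\lvert N_{cf}-\dfrac{1}{N_{up}}\right\rvert}$ |
| GP11 | $N_{cf}^{2}\,\left(N_{cf}^{2}+\sqrt{N_{up}}\right)$ |
| GP13 | $N_{cf}\left(1+\dfrac{1}{2N_{cp}+N_{cf}}\right)$ |
| GP20 | $2\left(N_{cf}+\dfrac{N_{up}}{N_{cp}+N_{up}}\right)$ |
| GP26 | $2N_{cf}^{2}+\sqrt{N_{up}}$ |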


have been proposed to overcome this issue, and we wish to contribute a technique that addresses this problem differently.

Our idea for overcoming this limitation and improving the fault localization performance of the SFL techniques is to leverage additional information about program entities. More precisely, we focus on the variables that appear in the source code lines and try to find suspicious variables. Then, we consider the source code lines that are closely related to the suspicious variables as possible fault locations.

The intuition behind this idea is that developers often monitor the values of suspicious variables while debugging. If a variable has an unexpected value at a certain location, it is likely that a fault resides in the statements containing this suspicious variable. Therefore, a developer checks the locations of that variable to localize the fault. In this debugging scenario, finding suspicious variables and localizing a fault among the statements where such variables appear can be an effective way to improve the existing SFL techniques.

Hence, we propose a Variable-based Fault Localization (VFL) technique to improve the fault localization performance of the SFL techniques. The proposed VFL technique only requires coverage information and information about the variables appearing in the source code. Since many tools have already been developed, such as Ctags, Cscope and JavaParser, variable information can be easily obtained without additional human effort. Coverage information can also be easily obtained from the inputs originally used by existing SFL techniques. Therefore, the VFL technique is a lightweight technique that can be combined with existing SFL techniques by simply borrowing their similarity coefficients.

The detailed process of the VFL technique is as follows.

1. Extract information on variables and function invocations, and then calculate the suspicious ratio of each individual variable and function invocation.

2. Assign a suspicious ratio to each statement that contains variables and function invocations, using the ranked list of suspicious variables and function invocations. (If a statement contains several variables and function invocations, the maximum value is selected.)

3. Perform debugging in the order of the ranked list of statements derived from the above steps.
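Step 2 of the process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `statement_suspiciousness`, the entity names, and the scores are made-up placeholders rather than values from the paper, and giving 0.0 to a statement without any known entity is a choice of this sketch.

```python
def statement_suspiciousness(entities_in_statement, entity_score):
    """Return the largest suspicious ratio among the entities of one statement.

    A statement with no known entity gets 0.0 (an assumption of this sketch).
    """
    scores = [entity_score[e] for e in entities_in_statement if e in entity_score]
    return max(scores, default=0.0)


# Hypothetical suspicious ratios for two variables and one function invocation.
entity_score = {"total": 0.75, "count": 0.25, "readInput()": 0.5}

# Hypothetical statements (line number -> entities appearing on that line).
statements = {
    10: ["total", "count"],
    11: ["count", "readInput()"],
    12: [],
}

# Step 3: inspect statements in descending order of their assigned ratio.
ranked = sorted(
    statements,
    key=lambda line: statement_suspiciousness(statements[line], entity_score),
    reverse=True,
)
print(ranked)
```

Taking the maximum means that a single highly suspicious variable is enough to move a statement to the top of the inspection order.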

To evaluate the proposed technique, we applied it to 224 real faults from four Java programs in the Defects4J dataset [14] and 120 faults from seven C programs in the Siemens suite [30].

The contributions of this paper are listed below.

1. A novel variable-based fault localization technique with improved performance that can benefit existing SFL techniques.

2. A new approach to fault localization based on suspicious variables, which can be identified from information originally available to the SFL techniques.

The rest of the paper is organized as follows. First, we introduce the techniques that form the background of the proposed technique in Section 2. Then, we describe the details of the proposed approach with a motivating example in Section 3. Next, we present the research questions and the experimental setup for evaluating the proposed approach in Section 4. In Section 5, we provide the evaluation results of the proposed technique. We then discuss the issues of this paper in Section 6. After that, we discuss threats to validity (Section 7) and related work (Section 8). Finally, we conclude the study with suggestions for future work in Section 9.

2. Background

The SFL technique identifies suspicious statements using execution traces and test results. It determines the order in which statements should be inspected based on a statistically computed suspicious ratio. Failed test cases increase the suspicious ratio of the statements they cover, while passed test cases reduce the suspicious ratio of statements that are relatively unrelated to the program failure. Therefore, the SFL technique cannot be applied unless there is at least one failed test case. The definition of the SFL technique is as follows:

Definition 1. Spectrum-based fault localization technique

Given a program $P$ with $n$ statements $S = \{s_1, s_2, \ldots, s_n\}$ and $m$ test cases $T = \{t_1, t_2, \ldots, t_m\}$, a vector $V = \{N_{cf}, N_{uf}, N_{cp}, N_{up}\}$ can be calculated for each statement $s_i$ using an SFL matrix ($n \times m$). Similarity coefficients can then be calculated using $V$.
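As a minimal sketch of Definition 1, the following Python code derives the four counts for every statement from a boolean coverage matrix. The matrix layout (rows are statements, columns are tests), the function name `spectrum_counts`, and the toy data are assumptions made for this illustration.

```python
def spectrum_counts(coverage, failed):
    """Compute (n_cf, n_uf, n_cp, n_up) for each statement.

    coverage[i][j] is True when test j executes statement i.
    failed[j] is True when test j fails.
    """
    total_failed = sum(failed)
    total_passed = len(failed) - total_failed
    counts = []
    for row in coverage:
        n_cf = sum(1 for hit, bad in zip(row, failed) if hit and bad)
        n_cp = sum(1 for hit, bad in zip(row, failed) if hit and not bad)
        # Uncovered counts follow from the totals.
        counts.append((n_cf, total_failed - n_cf, n_cp, total_passed - n_cp))
    return counts


# Toy data: three statements, four tests, the last two tests fail.
coverage = [
    [True, True, True, True],
    [False, True, False, True],
    [True, False, False, False],
]
failed = [False, False, True, True]

counts = spectrum_counts(coverage, failed)
print(counts)
```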

The goal of the SFL technique is to provide a developer with a suspiciousness ranking of the statements in $S$ so that debugging can proceed effectively. The SFL technique uses four parameters per statement, defined by whether the statement is covered and by the test result (pass or fail). $N_{cf}$ is the number of failed test cases covering the statement, $N_{uf}$ is the number of failed test cases not covering the statement, $N_{cp}$ is the number of passed test cases covering the statement, and $N_{up}$ is the number of passed test cases not covering the statement. The technique computes $N_{cf}$, $N_{uf}$, $N_{cp}$, and $N_{up}$ for each statement from the program spectra and then calculates the suspicious ratio using a similarity coefficient. There are various similarity coefficients [4]; we compare our technique against the nine most recent and representative ones, listed in Table 1.
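To make the role of the four parameters concrete, here is a hedged Python sketch of four of the coefficients in Table 1 (Tarantula, Jaccard, Naish, and GP13). The function names are illustrative, and returning 0.0 when a denominator is zero is an assumption of this sketch, since the text does not say how that case is handled.

```python
def _ratio(numerator, denominator):
    # Assumption of this sketch: a zero denominator yields 0.0.
    return numerator / denominator if denominator else 0.0


def tarantula(n_cf, n_uf, n_cp, n_up):
    fail_rate = _ratio(n_cf, n_cf + n_uf)
    pass_rate = _ratio(n_cp, n_cp + n_up)
    return _ratio(fail_rate, fail_rate + pass_rate)


def jaccard(n_cf, n_uf, n_cp, n_up):
    return _ratio(n_cf, n_cf + n_uf + n_cp)


def naish(n_cf, n_uf, n_cp, n_up):
    # The "+ 1" keeps the denominator positive.
    return n_cf - n_cp / (n_cp + n_up + 1)


def gp13(n_cf, n_uf, n_cp, n_up):
    return n_cf * (1 + _ratio(1, 2 * n_cp + n_cf))


# A statement covered by 4 failed and 3 passed tests, with no uncovered tests.
spectrum = (4, 0, 3, 0)
for coefficient in (tarantula, jaccard, naish, gp13):
    print(coefficient.__name__, round(coefficient(*spectrum), 2))
```

All four functions share one signature, so a ranking routine can take the coefficient as a parameter and stay independent of the formula.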

3. Approach

In this section, we present a motivating example that explains why a variable-based approach can produce better results than previous line-based approaches, and then introduce our variable-based fault localization technique.

3.1. Motivating example

The SFL technique statistically assigns suspicious ratios to entities based on the execution traces and test results of a test suite. Although the similarity coefficients differ, the SFL technique basically assigns an entity a high suspicious ratio if it is covered by many failed test cases and few passed test cases [5,35]. Therefore, even if suspicious ratios are assigned at the statement level, which is the finest-grained entity, all statements in the same branch receive the same suspicious ratio. This problem also occurs for other entity types, and all affected entities are given the same rank. This phenomenon is generally referred to as the tied-rank problem.

Fig. 1 illustrates the tied-rank problem of the traditional SFL technique using the source code of a binary search algorithm as an example. This algorithm determines the position of a particular value in an array sorted in ascending order. The algorithm first selects an intermediate value and compares it with the value to be searched. If the first selected


| Line | Source code | Test suite (keys 0 to 6): results of the covering tests | Jaccard | SFL Rank |
|---|---|---|---|---|
| 1 | `public class binarySearch {` | | | |
| 2 | `public static int binarySearch(int[] arr, int start, int end, int key) {` | | | |
| 3 | `if (start < end) {` | F F P P F P F | 0.57 | 4 |
| 4 | `int mid = start + (end - start) / 2;` | F F P P F P F | 0.57 | 4 |
| 5 | `if (key < arr[mid]) {` | F F P F | 0.6 | 2 |
| 6 | `return binarySearch(arr, start + 1, mid, key); // bug: start+1 -> start` | F F P F | 0.6 | 2 |
| 7 | `} else if (key > arr[mid]) {` | F P F | 0.4 | 7 |
| 8 | `return binarySearch(arr, mid + 1, end, key);` | F P F | 0.4 | 7 |
| 9 | `} else {` | P P P | 0 | 9 |
| 10 | `return mid;` | P P P | 0 | 9 |
| 11 | `}` | | 0 | 9 |
| 12 | `}` | | 0 | 9 |
| 13 | `return -1;` | F F F F | 1 | 1 |
| 14 | `}` | F F P P F P F | 0.57 | 4 |
| 15 | | | | |
| 16 | `public static void main(String[] args) {` | | | |
| 17 | `int arr[] = { 0, 1, 2, 3, 4, 5 };` | | | |
| 18 | `int start = 0;` | | | |
| 19 | `int end = arr.length;` | | | |
| 21 | `int result = binarySearch(arr, start, end, key);` | | | |
| 22 | `}` | | | |
| 23 | `}` | | | |

Fig. 1. A buggy program and its test suite.


Algorithm 1 Variable-based fault localization technique.

 1: P: Program
 2: FTC: A set of failed test cases
 3: PTC: A set of passed test cases
 4: S: Similarity coefficient
 5: procedure VFL(P, FTC, PTC, S)
 6:     V ← variables in P
 7:     L ← source code lines in P
 8:     for i = 1…|V| do
 9:         for j = 1…|L| do
10:             $N_{cf}$ ← |covered $V_j$| in FTC
11:             $N_{uf}$ ← |uncovered $V_j$| in FTC
12:             $N_{cp}$ ← |covered $V_j$| in PTC
13:             $N_{up}$ ← |uncovered $V_j$| in PTC
14:             $V_i(N_{tuple})$ ← ($N_{cf}$, $N_{uf}$, $N_{cp}$, $N_{up}$)
15:         suspicious($V_i$) ← average(S($V_i(N_{tuple})$))
16:         order($V_i$) with condition ($N_{cf}$ desc, $N_{cp}$ asc)
17:     L ← order(L, suspicious(V), desc)
        return L


intermediate value is greater than the value to be searched, that value becomes the new highest value. On the other hand, if the first selected intermediate value is smaller, that value becomes the new lowest value [16].

Columns 1, 2, and 3 in Fig. 1 represent line numbers, source code, and a test suite, respectively. Column 4 represents the suspicious ratio calculated from the execution traces and test results, and column 5 represents the rank of each statement. In this example, the tied-rank problem occurs in {statements 3, 4, 14}, {statements 5, 6}, {statements 7, 8}, and {statements 9, 10, 11, 12}. Because the statements in each group are in the same branch, they are given the same suspicious ratio and rank. The location of the actual fault is statement 6 (start + 1 ⇒ start). However, the traditional SFL technique cannot pinpoint this statement directly. In other words, the SFL rank of the actual fault is 2, but another statement shares that rank. Since SFL techniques cannot further distinguish these two statements, both must be examined. This example is very simple, so the number of statements with the same rank is very small. In real-world programs, however, the tied-rank problem is serious.
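The ranks and tie groups discussed above can be reproduced with a short Python sketch. The per-statement counts are read off Fig. 1 (four failing and three passing tests in total). The ranking rule used here, one plus the number of statements with a strictly higher ratio, is an assumption that happens to match the SFL Rank column of the figure.

```python
def jaccard(n_cf, n_uf, n_cp):
    denominator = n_cf + n_uf + n_cp
    return n_cf / denominator if denominator else 0.0


TOTAL_FAILED = 4  # the tests for keys 0, 1, 4 and 6 fail in Fig. 1

# statement -> (failed tests covering it, passed tests covering it)
covered = {
    3: (4, 3), 4: (4, 3), 5: (3, 1), 6: (3, 1),
    7: (2, 1), 8: (2, 1), 9: (0, 3), 10: (0, 3),
    11: (0, 0), 12: (0, 0), 13: (4, 0), 14: (4, 3),
}

score = {
    stmt: round(jaccard(n_cf, TOTAL_FAILED - n_cf, n_cp), 2)
    for stmt, (n_cf, n_cp) in covered.items()
}


def competition_rank(score):
    """Rank = 1 + number of statements with a strictly higher score."""
    return {
        stmt: 1 + sum(other > value for other in score.values())
        for stmt, value in score.items()
    }


rank = competition_rank(score)

# Group statements that share a rank to expose the tied-rank problem.
ties = {}
for stmt, r in rank.items():
    ties.setdefault(r, []).append(stmt)
print(ties)
```

The faulty statement 6 ends up in the same group as statement 5, so coverage alone cannot order the two.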

3.2. Variable-based fault localization

In this section, we introduce our variable-based fault localization technique in detail.

Fig. 2 illustrates the entire process of the VFL technique. First, we construct a variable list by parsing the source code. Then, we execute the program with the test cases and collect the variable spectra as the program runs. The variable spectra are calculated based on the execution traces and the test results of the program. Next, we calculate the suspicious ratio of each variable by substituting the variable spectra into the similarity coefficient. Finally, we obtain a ranked list of suspicious variables, sorted in descending order of their suspicious ratios.

The difference between the traditional SFL technique and the VFL technique is that they calculate suspicious ratios for different entities (i.e., at the statement level and the variable level, respectively). That is, the SFL technique assigns suspicious ratios to individual statements based on the spectra covered by the test suite, whereas the VFL technique assigns suspicious ratios to individual variables and then recalculates the suspicious ratio of each statement based on the ranked list of suspicious variables.

Algorithm 1 describes the entire algorithm of the VFL technique in detail. The inputs are a program, a set of failed test cases, a set of passed test cases, and a similarity coefficient, and the output is a ranked list of


Fig. 2. An overview of the variable-based fault localization technique.

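Table 2
Rank computation of suspicious variables. The variable columns form the suspicious variable ranked list; each variable has a suspicious ratio (Susp.) and a rank per line.

| Line | arr Susp. | arr Rank | start Susp. | start Rank | end Susp. | end Rank | key Susp. | key Rank | mid Susp. | mid Rank | Final VFL Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | | | | | | | | |
| 2 | | | | | | | | | | | |
| 3 | | | 0.57 | 2 | 0.57 | 1 | | | | | 2 |
| 4 | | | 0.57 | 2 | 0.57 | 1 | | | 0.57 | 3 | 2 |
| 5 | 0.6 | 1 | | | | | 0.6 | 1 | 0.6 | 1 | 5 |
| 6 | 0.6 | 1 | 0.6 | 1 | | | 0.6 | 1 | 0.6 | 1 | 1 |
| 7 | 0.4 | 3 | | | | | 0.4 | 3 | 0.4 | 4 | 6 |
| 8 | 0.4 | 3 | | | 0.4 | 3 | 0.4 | 3 | 0.4 | 4 | 4 |
| 9 | | | | | | | | | | | |
| 10 | | | | | | | | | 0 | 6 | 7 |
| 11 | | | | | | | | | | | |
| 12 | | | | | | | | | | | |
| 13 | | | | | | | | | | | |
| 14 | | | | | | | | | | | |
| VFL Score | 0.5 | | 0.58 | | 0.51 | | 0.5 | | 0.43 | | |
| VFL Rank | 3 | | 1 | | 2 | | 3 | | 4 | | |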


suspicious statements. In the beginning, the set of variables declared in the program is assigned to V, and the source code lines of the program are assigned to L. In the loops at statements 8 and 9, i and j iterate up to |V| and |L|, respectively. Within these loops, the suspicious ratios are calculated to generate a ranked list of all variables.

First, we calculate the values of $N_{cf}$, $N_{uf}$, $N_{cp}$, and $N_{up}$ in statements 10–13. $N_{cf}$ is the number of failed test cases that cover variable V, $N_{uf}$ is the number of failed test cases that do not cover V, $N_{cp}$ is the number of passed test cases that cover V, and $N_{up}$ is the number of passed test cases that do not cover V. When the same variable is used several times in the source code, $N_{cf}$, $N_{uf}$, $N_{cp}$, and $N_{up}$ are assigned several times. In statement 14, the four parameters are grouped into a tuple $V_i(N_{tuple})$. Statement 15 uses a similarity coefficient to calculate the average suspicious ratio of each variable. Next, the order of the statements containing the same variable is rearranged in statement 16, because the same variables are assigned the same suspicious ratio. Specifically, the statements are sorted so that those with a large $N_{cf}$ and a small $N_{cp}$ come first. This is motivated by the fact that the suspicious ratios of the statements are reassigned using the suspicious ratios of the variables in statement 17.
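Statements 10 to 15 of Algorithm 1 can be sketched in Python as follows. The occurrence map and the spectra are read off Fig. 1 and Table 2, Jaccard is used as the similarity coefficient, and the function name `vfl_scores` is an illustrative choice rather than the authors' implementation.

```python
def jaccard(n_cf, n_uf, n_cp, n_up):
    denominator = n_cf + n_uf + n_cp
    return n_cf / denominator if denominator else 0.0


def vfl_scores(occurrences, spectra, coefficient):
    """Average the coefficient over every line on which a variable appears."""
    scores = {}
    for variable, lines in occurrences.items():
        ratios = [coefficient(*spectra[line]) for line in lines]
        scores[variable] = sum(ratios) / len(ratios)
    return scores


# line -> (n_cf, n_uf, n_cp, n_up), taken from the coverage in Fig. 1
spectra = {
    3: (4, 0, 3, 0), 4: (4, 0, 3, 0),
    5: (3, 1, 1, 2), 6: (3, 1, 1, 2),
    7: (2, 2, 1, 2), 8: (2, 2, 1, 2),
    10: (0, 4, 3, 0),
}

# variable -> lines where it appears, as listed in Table 2
occurrences = {
    "arr": [5, 6, 7, 8],
    "start": [3, 4, 6],
    "end": [3, 4, 8],
    "key": [5, 6, 7, 8],
    "mid": [4, 5, 6, 7, 8, 10],
}

scores = vfl_scores(occurrences, spectra, jaccard)
rounded = {variable: round(value, 2) for variable, value in scores.items()}
print(rounded)
```

Rounded to two decimals, the averages agree with the VFL Score row of Table 2, with start as the most suspicious variable.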

Table 2 details how to calculate the suspicious ratio by applying the VFL technique to the buggy program in Fig. 1. First, we prepare a list of variables, as in the second row of Table 2, by parsing the variables declared in the example source code. Then, both the suspicious ratio and the rank of these variables are obtained based on the execution traces and test results. The difference between the SFL and the VFL technique is that the SFL technique calculates the suspicious ratio of the statements, whereas the VFL technique calculates the suspicious ratio of the variables. For example, the variable arr is used in statements 5, 6, 7, and 8, and its suspicious ratios at these statements are 0.6, 0.6, 0.4, and 0.4, respectively, using the Jaccard similarity coefficient. We define the average suspicious ratio of an individual variable as its VFL Score. Therefore, the VFL Score of arr is 0.5 ((0.6 + 0.6 + 0.4 + 0.4)/4 = 0.5). In this way, the VFL Scores of all declared variables are calculated and their rankings (the VFL Rank) are also obtained. Finally, we calculate the Final VFL Rank based on these VFL Scores and the VFL Rank. To obtain the Final VFL Rank, we select the variable with the highest VFL Rank and then order the statements containing that variable by their suspicious ratios. For example, the variable with the highest VFL Score in the example source code is start. The statements containing start are 3, 4, and 6. Therefore, Final VFL Rank 1 is assigned to statement 6, which has the highest suspicious ratio for the start variable. Then, we find the statement with the next highest suspicious ratio for the same variable. Since the ranks of statements 3 and 4 are the same, rank 2 is assigned to both of them in the Final VFL Rank. Next, we take end, which has the second highest VFL Score, and repeat the above operation. Statements with multiple variables can be given a Final VFL Rank more than once. In such a case, the Final VFL Rank is determined by the variable that is ranked first. That is, when several ranks are given, the highest rank is selected, and the rest are skipped.
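The Final VFL Rank procedure described above can be sketched in Python as one possible reading of the text. The per-variable ratios and VFL Scores are copied from Table 2. The function name `final_vfl_rank` and the handling of variables with equal VFL Scores (input order is kept) are assumptions of this sketch.

```python
def final_vfl_rank(vfl_score, ratio_by_variable):
    """Assign each statement the rank given by the first variable that reaches it.

    ratio_by_variable[variable][line] is the suspicious ratio of the
    variable at that line. Tied statements share a rank, and the ranks
    that follow a tie are skipped.
    """
    final = {}
    next_rank = 1
    # Visit variables from the highest VFL Score to the lowest.
    for variable in sorted(vfl_score, key=vfl_score.get, reverse=True):
        pending = {
            line: ratio
            for line, ratio in ratio_by_variable[variable].items()
            if line not in final  # a statement keeps its first (best) rank
        }
        for ratio in sorted(set(pending.values()), reverse=True):
            tied = [line for line, value in pending.items() if value == ratio]
            for line in tied:
                final[line] = next_rank
            next_rank += len(tied)
    return final


vfl_score = {"arr": 0.5, "start": 0.58, "end": 0.51, "key": 0.5, "mid": 0.43}
ratio_by_variable = {
    "arr": {5: 0.6, 6: 0.6, 7: 0.4, 8: 0.4},
    "start": {3: 0.57, 4: 0.57, 6: 0.6},
    "end": {3: 0.57, 4: 0.57, 8: 0.4},
    "key": {5: 0.6, 6: 0.6, 7: 0.4, 8: 0.4},
    "mid": {4: 0.57, 5: 0.6, 6: 0.6, 7: 0.4, 8: 0.4, 10: 0.0},
}

final = final_vfl_rank(vfl_score, ratio_by_variable)
print(sorted(final.items(), key=lambda item: item[1]))
```

With these inputs the sketch yields the Final VFL Rank column of Table 2, in which statement 6 alone holds rank 1.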

Because the SFL technique uses only execution traces and test results, it produces the same suspicious ratio and rank for all statements with the same coverage. Therefore, a large amount of effort is needed to break the tied ranks, which is accomplished by extending the information or improving the SFL technique itself. Many studies have been conducted on this problem, as a solution can reduce the time and cost for developers to localize faults [48–50]. Therefore, the ultimate goal of this paper is to propose a VFL technique that can solve the tied-rank problem of the SFL technique.

The motivating example in Fig. 1 illustrates how to apply the SFL technique to calculate the SFL Rank. When debugging with the SFL technique, statements 8, 9, 18, and 19, which comprise the


Table 3
Example of statement types consisting of a combination of variables.

| Statement type | Example |
|---|---|
| Assignment | `x = y + z` |
| Condition | `if ((x + y) < 5)` |
| Loop | `for (x = 1; x < 5; x++)` |
| Declaration | `int x = 10` |

Table 4
Example of statement types with function invocation statements.

| Statement type | Example |
|---|---|
| Assignment | `x = y + getCurrentMonth()` |
| Condition | `if ((x + y) < calculateTotalLength(y, z))` |
| Loop | `for (x = getCurrentAge(name); x < 80; x++)` |
| Declaration | `int x = countNumberOfFiles()` |


roup that contains actual faults, are ranked first. However, becausehere are four statements with the same rank, the developer must checkt least 1 to a maximum of 4 statements to localize the actual faultsn the source code. On the other hand, when debugging with the VFLechnique, the ranking of statement 8, which includes an actual fault,s the unique first rank in Table 2 . Therefore, the VFL technique canreak the tied ranks occurred in the SFL technique, thus improving theerformance.

Because the SFL technique is highly dependent on the quality of theest cases, a small diversity of test cases cannot guarantee high accuracy.owever, the above example is a toy program with enough test cases.herefore, in this case, the accuracy of the SFL Rank is relatively highnd the probability of localizing the fault is high. That said, real-worldrojects often fail to meet all of these conditions. Therefore, the per-ormance difference between the SFL and the VFL techniques is morerominent when applied to real-world projects.

.2.1. Handling statements without variables

Unlike the existing SFL techniques, the VFL technique calculates auspicious ratio based on a variable, not a statement. This technique isesigned with the assumption that most executable statements containariables and that the probability that a statement without a variables defective is very low, for which we have verified these assumptionshrough experiments. However, all faults do not necessarily occur intatements including variables. Therefore, in this chapter, we explainow the VFL technique can handle statements that do not contain vari-bles.

To explain the VFL technique step by step, we classify the proposedechnique into the VFL (V: Variable) and the VFL (VF: Variable andunction invocation) versions. The VFL (VF) technique is designed tonable the VFL (V) technique to handle additional statements that doot contain variables.

In other words, the VFL (V) technique assigns only suspicious ra-io based on variables and VFL (VF) can handle not only statementsncluding variables but also statements excluding variables. Therefore,he VFL (VF) technique is more accurate and scalable than the VFL (V)echnique. Finally, the VFL technique proposed in this paper is the VFLVF) technique and the VFL (V) is a concept to introduce the main ideaf the VFL technique.

First, the type of statement is classified as assignment, condition, command, loop, and other statements. The assignment statement is used to assign values on the right-hand side to the variables on the left-hand side of the equals sign. The condition statement is used to determine whether the dependent statements are executed by confirming that the condition is satisfied. This statement can be set to various conditions by adding else if and else statements. The command statement consists of an invocation and a return statement. The invocation statement is used to invoke an internal or external function with arguments. The return statement is used to return data such as a specific variable, array, or table from a function. The loop statement is used to iterate and execute dependent statements by determining the start and end points. Finally, the other statement consists of declaration and annotation statements. The declaration statement is used to define the type of variables and initialize values, and the annotation statement is an unexecutable statement that gives additional explanations of the source code, including functions, variables and so on [3, 7, 36].

Among the types of statements described above, the only one that may not contain variables is the command statement. The function invocation statement is used to invoke internal and external functions with arguments. Therefore, variables can be included as arguments, but this is not mandatory, so the function invocation statement does not necessarily contain a variable. In other words, in the case of a function that does not require a variable as an argument, the invocation statement does not contain a variable. Therefore, if there is a fault in the function invocation statement, the accuracy of the VFL (V) technique can be lowered in specific cases. For example, suppose that there is a fault in the function Calendar.getInstance(getCurrentTimeZone()). Because this is a simple function invocation statement that gets the current time in the system, no variables are needed as arguments (its only argument is itself a function invocation). Therefore, it is difficult to localize this kind of fault with the VFL (V) technique.

Except for the function statement, the remaining statements such as assignment, condition, loop, and declaration statements generally use variables, as listed in Table 3.

However, some of these variables can be replaced with function invocation as in Table 4. That is, the value returned by the function invocation can be used instead of the value of the variable. Therefore, the example source code of Table 3 can be transformed like Table 4.

Because the VFL (V) technique assigns a suspicious ratio to each variable, it is very difficult to localize faults in function invocation statements that do not contain variables. Furthermore, the performance of the VFL (V) technique cannot be guaranteed if there are faults in other types of statements, including function invocation as described in Table 4. Therefore, it is necessary to additionally handle function invocation statements to improve the performance of the VFL (V) technique. In other words, we extend the VFL (V) technique to further consider the function invocation statements.

The difference between the VFL (V) and the VFL (VF) techniques is that the VFL (V) technique assigns suspicious ratios based on variables, while the VFL (VF) technique assigns suspicious ratios based on both variables and function invocation statements. That is, the VFL (V) technique calculates the suspicious ratio for each variable and then assigns suspicious ratios to the statements containing these variables, while the VFL (VF) technique calculates the suspicious ratio for each variable and function invocation statement and then assigns suspicious ratios to the statements containing these variables and function invocations. For example, suppose that the Main() function contains a statement that invokes the getCurrentMonth() function. The VFL (V) technique only assigns suspicious ratios to individual variables. Therefore, a suspicious ratio is not assigned to this function invocation statement because the statement that invokes getCurrentMonth() does not contain any variables. On the other hand, the VFL (VF) technique assigns suspicious ratios both to variables and function invocation statements. Thus, the VFL (VF) technique calculates suspicious ratios based on the execution traces and the test results of the statement that invokes the getCurrentMonth() function. When the same variable is repeatedly used in several statements, the VFL (V) technique collects all the information regarding the statements that include this variable, and then uses this information to calculate the suspicious ratio of the variable. Similarly, if getCurrentMonth() is invoked multiple times, the VFL (VF) technique collects all the information regarding the invoking statements and immediately computes the suspicious ratio of getCurrentMonth().
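The aggregation described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes that a test case "covers" a variable or function invocation when it executes any statement containing it, that the Tarantula coefficient is used, and that a statement inherits the highest suspicious ratio among its entities. The data and the names `entity_suspiciousness` and `line_suspiciousness` are hypothetical.

```python
from collections import defaultdict


def tarantula(ef, ep, total_failed, total_passed):
    """Tarantula suspiciousness from failed/passed coverage counts."""
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def entity_suspiciousness(coverage, outcomes, entities_by_line):
    """coverage: test -> set of covered lines; outcomes: test -> True if passed;
    entities_by_line: line -> set of variable names or invoked function names."""
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    total_passed = len(outcomes) - total_failed
    ef, ep = defaultdict(int), defaultdict(int)
    for test, lines in coverage.items():
        # Assumption: a test covers an entity if it executes any line using it.
        touched = set()
        for line in lines:
            touched |= entities_by_line.get(line, set())
        for entity in touched:
            if outcomes[test]:
                ep[entity] += 1
            else:
                ef[entity] += 1
    return {e: tarantula(ef[e], ep[e], total_failed, total_passed)
            for e in set(ef) | set(ep)}


def line_suspiciousness(entity_scores, entities_by_line):
    # Assumption: a line inherits the highest score among its entities.
    return {line: max((entity_scores.get(e, 0.0) for e in ents), default=0.0)
            for line, ents in entities_by_line.items()}


# Hypothetical program: line 2 only invokes getCurrentMonth() and has no variable.
entities_by_line = {1: {"month"}, 2: {"getCurrentMonth()"}, 3: {"month", "day"}}
coverage = {"t1": {1, 2}, "t2": {1, 3}, "t3": {1, 2, 3}}
outcomes = {"t1": False, "t2": True, "t3": True}

scores = entity_suspiciousness(coverage, outcomes, entities_by_line)
line_scores = line_suspiciousness(scores, entities_by_line)
```

Dropping `"getCurrentMonth()"` from `entities_by_line` mimics the VFL (V) variant: line 2 then receives no entity score at all, which is the limitation the VFL (VF) extension addresses.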


Finally, the return statement, which is the remainder of the command statement, is used to return data of the function. If a fault occurs in this return statement, the suspicious ratio of the function also increases. Therefore, the VFL (VF) technique assigns a high suspicious ratio to the function including the return statement. In other words, performance improvement can also be expected by applying the VFL (VF) technique. In addition, since the annotation statement is not executed directly in the program, the description of this statement is skipped.

4. Evaluation

In this section, we present our research questions and experimental setup.

4.1. Research questions

4.1.1. RQ1: How well does the VFL (V) technique localize faults?

The main purpose of using an automatic fault localization technique is to help the developers during debugging by reducing the search area for finding a fault. Hence, we evaluated the VFL (V) technique by assessing the amount of source code to be examined until a real faulty statement is found.

4.1.2. RQ2: How well does the VFL (V) technique place faulty statements at low ranks?

Although answering RQ1 can provide information about the overall performance of the VFL (V) technique, we want to evaluate the VFL (V) technique more thoroughly. From RQ1, we obtain an answer such as 'only 3% of statements are examined when a fault is found'. However, this is not helpful for developers if the examined program has over one million lines of code. Therefore, we apply stricter conditions (e.g., we examine only the top N statements of a ranked list) and determine whether the VFL (V) technique can localize a fault.

4.1.3. RQ3: Is the VFL (V) technique effective for every fault?

It is known that no technique shows the best performance for all bugs [11, 15, 42]. One fault localization technique can show impressive performance for some bugs, but perform poorly for other bugs. This same argument can be applied to the VFL and the SFL techniques. Hence, it is important to verify that the VFL (V) technique performs more effectively than the SFL technique for the majority of faults.

4.1.4. RQ4: Can we increase the VFL (V) technique performance and applicability by using additional techniques?

We introduce a technique to list suspicious variables to localize fault statements. However, a faulty statement might not contain any variables, and the VFL (V) technique cannot be applied in this case. Therefore, we extend the VFL (V) technique to enable handling of statements that do not contain variables. We name this technique VFL (VF) in this paper and evaluate its performance through experiments.

4.2. Experimental setup

In this section, we present the experimental setup for evaluation of the proposed technique.

4.2.1. Benchmarks and Tools

To evaluate the proposed VFL technique, we employ the Defects4j dataset [14] for Java programs and the Siemens suite [30] for C programs as our sets of faults. The Defects4j dataset provides faults along with human-written patches, test cases, and other information useful for fault localization studies [9, 14, 22, 43]. The Siemens suite is a popular set of faults that has been used for most fault localization studies [8, 27, 29, 39, 40].

Table 5 shows details related to the two datasets used in this study. The Defects4j dataset originally consisted of five different Java projects, but we excluded Closure since its test cases do not follow the standard JUnit structure, resulting in some difficulties obtaining coverage information. However, even for a partial dataset, we secured 224 real bugs from large software projects with at least 22,000 lines of code, and these were sufficient to conduct our study. The Siemens suite consists of 132 versions of seven C programs. However, only 120 identified faulty versions were used in this experiment. We excluded the other 12 versions since they have no fault or their faulty lines reside in header files that cannot be localized by the standard SFL technique.

The proposed techniques require two types of data. Therefore, we use open source tools to collect this data automatically. The first type of data is the execution traces, which are necessary to calculate the suspicious ratio by applying the existing SFL technique. This is a collection of statements covered by individual test cases. In this experiment, the Java-based Defects4j dataset uses Gzoltar and the C language-based Siemens suite uses Clover to collect this data. Second, the additional data needed to apply the proposed VFL techniques are variable and function invocation information. Variable information refers to the names of variables declared in the program along with their location information. Function information refers to the names of functions coded in the program. In this experiment, the Java-based Defects4j dataset uses ASTParser and the C language-based Siemens suite uses Ctags to collect this data. We can collect all the data we need using a variety of open source tools. Therefore, these tasks are very simple and easy, not requiring any additional human effort. Further, variable names and function invocation information can be gathered from source code statically, as opposed to dynamically. Therefore, the time and cost of collecting variable names in order to apply the VFL (V) technique is negligible. In addition, to enable the VFL (V) technique to handle statements that do not contain variables, not only variable names but also function invocation information are needed. However, the time and cost of collecting this information is also negligible. Most open source parsers do not require additional effort because these tools can automatically collect information as long as we specify the type of information we want.
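To make the notion of "variable names along with their location information" concrete, the following is a rough sketch of a variable-to-line index. It is only an illustration under simplifying assumptions: the study relies on ASTParser and Ctags, whereas this sketch uses a plain whole-word regular expression, ignores comments, string literals, and scoping, and takes the declared variable names as given. The function name `index_variables` and the sample source are hypothetical.

```python
import re
from collections import defaultdict


def index_variables(source_lines, variable_names):
    """Map each declared variable name to the 1-based lines that mention it."""
    index = defaultdict(list)
    for lineno, text in enumerate(source_lines, start=1):
        for name in variable_names:
            # Whole-word match so that a short name does not match inside a longer one.
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                index[name].append(lineno)
    return dict(index)


source = [
    "int min = a;",
    "if (b < min) {",
    "    min = b;",
    "}",
    "return min;",
]
index = index_variables(source, ["a", "b", "min"])
```

Because the index is built from source text alone, it can be computed once, statically, without running any test case.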

4.2.2. Evaluation metric

One popular metric for evaluating the performance of fault localization techniques is the Exam score [13, 21, 24, 26, 31]. The Exam score indicates the amount of source code that needs to be examined until the actual fault is localized. Since fault localization techniques often provide a ranked list of suspicious lines, we can assume that the developer examines each line in the given ranked list from the top. Once a developer finds a fault on an examined line, the rank of the line indicates the number of source code lines the developer already examined. Hence, we can compute the proportion of such examined lines to all source code lines to assess how well a fault localization technique localizes an actual fault location. The detailed formula to compute the Exam score is as follows.

$$
\text{Exam Score} = \frac{\text{Rank of an actual fault location}}{\text{Total number of code lines}} \times 100
$$

We consider two types of Exam scores: Best (B) and Worst (W). It is possible that multiple lines can have the same rank in a ranked list, and we need to decide how many lines of the same rank to examine. For example, suppose there are five lines with rank 7, and one of these lines is a faulty line. Since all lines at ranks 1 to 6 have already been examined, we only need to confirm how many of the five lines at rank 7 are examined until the faulty line is found. If the faulty line is examined first among the rank 7 lines, this is the best case (i.e., Exam score (B)). On the contrary, if we assume that the faulty line is examined after all other rank 7 lines are examined, this is the worst case (i.e., Exam score (W)).

Table 5
Fault information from the Defects4j dataset and Siemens suite.

| Benchmark | Program | #Fault | LoC | Average #Variable | Description |
|---|---|---|---|---|---|
| Defects4j | Chart | 26 | 96K | 23,810 | JFreechart |
| Defects4j | Lang | 65 | 22K | 5296 | Apache commons-lang |
| Defects4j | Math | 106 | 85K | 15,928 | Apache commons-math |
| Defects4j | Time | 27 | 28K | 8104 | Joda-time |
| Defects4j | Java total | 224 | 231K | 13,285 | |
| Siemens suite | Printtokens | 5 | 726 | 97 | Lexical analyzer |
| Siemens suite | Printtokens2 | 10 | 570 | 94 | Lexical analyzer |
| Siemens suite | Replace | 31 | 564 | 139 | Pattern replacement |
| Siemens suite | Schedule | 9 | 412 | 87 | Priority scheduler |
| Siemens suite | Schedule2 | 9 | 374 | 87 | Priority scheduler |
| Siemens suite | Tcas | 37 | 173 | 42 | Altitude separation |
| Siemens suite | Totinfo | 19 | 565 | 69 | Information measure |
| Siemens suite | C total | 120 | 3384 | 615 | |

The Exam score is highly dependent on the sensitivity of the correlation coefficient used to calculate the suspicious ratio. This sensitivity means the suspicious ratio increases when one failed test case is covered and decreases when one passed test case is covered. Of course, the sensitivities of the correlation coefficients are different because these are designed for different purposes and environments. Therefore, a high-performance SFL technique is good for classifying the difference between failed test cases and passed test cases with suitable sensitivity. The higher the sensitivity of the correlation coefficient, the lower the Exam score (B) but the higher the Exam score (W). This means that although a fault is found in the upper ranking, there are many statements with the same ranking. For example, the correlation coefficient named Intersection [4] defines the suspicious ratio simply as the number of failed test cases covered.

Intersection = The number of covered failed test cases

Suppose a faulty program has one failed test case and a hundred passed test cases. If the Intersection correlation coefficient is applied to this program, the suspicious ratio of all statements covered by the failed test case is 1, otherwise it is 0. Therefore, in this case, the Exam score (B) of this correlation coefficient is almost zero, but the Exam score (W) is almost 100. In addition, suppose that the Exam score (B) using the Tarantula correlation coefficient is 0.5 and the Exam score (W) is 5. The Exam score (B) of Tarantula is higher than that of Intersection, but Tarantula is not less accurate. The reason for this is that the Exam score (B) of Tarantula is worse than that of Intersection, but the Exam score (W) is much better. Therefore, it is more accurate to compare the performance using the Exam score (W) than the Exam score (B). For this reason, we used the stricter metric, Exam score (W), in this study.
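The best-case and worst-case Exam scores under tied ranks can be sketched as below. This is a minimal illustration assuming that every line of the program has a suspicious ratio and that ties are resolved exactly as described in the text (faulty line examined first for the best case, last for the worst case). The function name `exam_scores` and the 100-line program are hypothetical.

```python
def exam_scores(scores, faulty_line):
    """Return (best, worst) Exam scores in percent for one faulty line.

    scores: dict mapping every line of the program to its suspicious ratio.
    """
    target = scores[faulty_line]
    higher = sum(1 for s in scores.values() if s > target)
    tied = sum(1 for s in scores.values() if s == target)  # includes the faulty line
    total = len(scores)
    best = 100.0 * (higher + 1) / total      # faulty line examined first among ties
    worst = 100.0 * (higher + tied) / total  # faulty line examined last among ties
    return best, worst


# Hypothetical 100-line program: six lines rank above a five-way tie at rank 7.
scores = {line: 0.9 for line in range(1, 7)}
scores.update({line: 0.4 for line in range(7, 12)})
scores.update({line: 0.0 for line in range(12, 101)})

best, worst = exam_scores(scores, faulty_line=9)
```

With this data the gap between the two scores is exactly the size of the tie, which is why a coefficient that produces large ties can look good on Exam score (B) and poor on Exam score (W).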

The Exam score is most commonly used as a metric for comparing the performance of various types of fault localization techniques. However, this score may not help developers localize faults, practically speaking, because developers do not check so many statements to find faults. In other words, most developers have complete trust in this kind of result from automated techniques [55]. Therefore, we also use Top N as an additional metric to answer RQ2. Top N is defined as localizing the actual faults within the top N ranked statements. We decided to examine the top 1, 5, 10, 20, and 50 lines, since these numbers were used in a survey of developers conducted in a previous study. Developers prefer the Top 5 (73.58%) as the minimum criterion for debugging. The next rankings are Top 10 (15.09%), Top 1 (9.43%), and Top 20 (1.35%). There were very few developers who wanted to use the Top 50, but no single value was more than a minimum criterion for debugging [17]. Therefore, we evaluate the performance of the proposed technique based on Top 1, 5, 10, 20 and 50.
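The Top N metric can be computed from one rank per faulty program, as in the following sketch. It assumes one rank per fault (for example the worst-case rank) and uses `None` for a fault that the technique does not rank at all; the function name `top_n_counts` and the sample ranks are hypothetical.

```python
def top_n_counts(fault_ranks, ns=(1, 5, 10, 20, 50)):
    """Count how many faults are ranked within each Top N cutoff.

    fault_ranks: one rank per faulty program, or None when the fault is not ranked.
    """
    return {n: sum(1 for r in fault_ranks if r is not None and r <= n) for n in ns}


ranks = [1, 3, 7, 18, 42, 120, None]  # hypothetical ranks of seven faults
counts = top_n_counts(ranks)
```

The counts are cumulative by construction, so a fault counted in Top 5 is also counted in Top 10, Top 20, and Top 50.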

5. Results

This section presents evaluation results and a corresponding analysis to answer the research questions introduced in Section 4.

5.1. RQ1: How much effort can be saved with the VFL (V) technique?

To answer RQ1, we computed the Exam scores of the VFL (V) and the SFL techniques for all 344 faults from the Defects4j dataset and Siemens suite. Fig. 3 shows the Exam scores of the VFL (V) and the SFL techniques using nine different similarity coefficients. Each group represents the Exam scores of the SFL (left) and the VFL (V) (right) techniques computed with a similarity coefficient. The Exam scores shown here are the average scores of 344 programs in the Defects4j dataset and Siemens suite for each similarity coefficient. We also present the reduced effort when using the VFL (V) technique instead of the SFL technique as a line graph. For example, in Tarantula, the SFL Exam score is 7.51, the VFL (V) Exam score is 3.31, and the reduced effort is about 56%.

This result indicates that the SFL technique should examine 7.51% of the total source code lines to find the fault, while the VFL (V) technique can find faults by reviewing only 3.31% of the total lines. Therefore, our VFL (V) technique can reduce about 56% of the effort for examination compared to the SFL technique with Tarantula. Note that the reduced effort is dependent on the similarity coefficient used in the VFL and the SFL techniques. In addition, the maximum value is 56.41% in GP26, and the minimum value is 51.81% in GP11. The average Exam scores of the SFL and the VFL (V) techniques are 6.87 and 3.11, respectively. The average reduced effort for all similarity coefficients is about 55%.
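As a worked check of these figures, and assuming that reduced effort is the relative decrease of the Exam score (the text does not state the formula explicitly), the Tarantula and average values give:

$$
\text{Reduced effort} = \frac{\text{Exam}_{\text{SFL}} - \text{Exam}_{\text{VFL(V)}}}{\text{Exam}_{\text{SFL}}} \times 100, \qquad
\frac{7.51 - 3.31}{7.51} \times 100 \approx 55.9\%, \qquad
\frac{6.87 - 3.11}{6.87} \times 100 \approx 54.7\%
$$

Both values round to the reported figures of about 56% and about 55%.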

We further perform statistical significance tests to identify pre-post differences between the accuracy of the SFL and the VFL (V) techniques. If the sample size is very small or if the sample data does not satisfy normality, the data should be tested with non-parametric techniques. Therefore, we use the Wilcoxon signed ranks test because our sample size is sufficient but normality is not met. This is a statistical technique for comparing two paired variables (θ1, θ2). Therefore, it is mainly used to measure the effects of an experiment or treatment. We set the accuracy of the SFL technique as the pre-data and the accuracy of the VFL (V) technique as the post-data to prove that they are different. The null hypothesis and the alternative hypothesis are set as follows.

• H0 (null hypothesis): θ1 = θ2 (There is no difference between the accuracy of the SFL technique and that of the VFL (V) technique.)
• H1 (alternative hypothesis): θ1 ≠ θ2 (There is a difference between the accuracy of the SFL technique and that of the VFL (V) technique.)

For this hypothesis test, we apply nine similarity coefficients to both the SFL and the VFL (V) techniques. The accuracy of the SFL technique used as the pre-data is defined as the ranking of the statements of the actual faults localized by applying the SFL technique. The accuracy of the VFL (V) technique used as the post-data is defined as the ranking of the statements of the actual faults localized by applying the VFL (V) technique. In addition, we use the two-tailed test rather than the one-tailed test for more rigorous hypothesis testing.
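For readers unfamiliar with the test, the following sketch computes the signed-rank sums and the normal-approximation Z statistic for paired ranks. It is a simplified illustration, not the statistical package used in the study: zero differences are dropped, tied absolute differences share their average rank, and no tie or continuity correction is applied to the variance. The function name `wilcoxon_signed_rank` and the six rank pairs are hypothetical.

```python
import math


def wilcoxon_signed_rank(pre, post):
    """Return (w_plus, w_minus, z) for paired samples (normal approximation)."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 0.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd  # Z based on the sum of positive ranks
    return w_plus, w_minus, z


sfl_ranks = [10, 8, 15, 7, 20, 12]  # hypothetical pre-data (SFL rank of each fault)
vfl_ranks = [3, 8, 5, 9, 4, 6]      # hypothetical post-data (VFL (V) rank)
w_plus, w_minus, z = wilcoxon_signed_rank(sfl_ranks, vfl_ranks)
```

A strongly negative Z here means that positive differences (the post rank being larger, i.e. worse) are rare, so the post-data ranks tend to be lower than the pre-data ranks.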

We apply the SFL technique to 224 Defects4j dataset programs and 120 Siemens suite programs to calculate the accuracy (θ1) for each program. We also calculate the accuracy (θ2) of the VFL (V) technique in the same way. The ultimate goal of this hypothesis test is to confirm by the Wilcoxon signed ranks test that there is a difference between these two variables (θ1, θ2). In Table 6, since the asymptotic significance of all pairs is less than 0.01, the null hypothesis is rejected at the 1% significance level (99% confidence). This result implies that there is a clear difference between the accuracy of the SFL technique and that of the VFL (V) technique using all similarity coefficients in the experiment.

Fig. 3. Comparison of Exam scores between SFL and VFL (V) techniques.

Table 6
Wilcoxon signed ranks test results of the accuracy of SFL and VFL (V) techniques.

| Pair | Z | r (Effect size) | Asymp. Sig. (2-tailed) |
|---|---|---|---|
| VFL(V) Tarantula – SFL Tarantula | −8.874∗ | 0.500 | 0.000 |
| VFL(V) Jaccard – SFL Jaccard | −6.978∗ | 0.393 | 0.000 |
| VFL(V) Naish – SFL Naish | −10.921∗ | 0.615 | 0.000 |
| VFL(V) GP08 – SFL GP08 | −10.125∗ | 0.570 | 0.000 |
| VFL(V) GP10 – SFL GP10 | −9.176∗ | 0.517 | 0.000 |
| VFL(V) GP11 – SFL GP11 | −10.595∗ | 0.597 | 0.000 |
| VFL(V) GP13 – SFL GP13 | −7.867∗ | 0.443 | 0.000 |
| VFL(V) GP20 – SFL GP20 | −9.626∗ | 0.542 | 0.000 |
| VFL(V) GP26 – SFL GP26 | −7.390∗ | 0.416 | 0.000 |

∗ Based on positive ranks.

We also calculated the r value of the Wilcoxon signed ranks test to obtain the effect size in Table 6. Generally, an r value greater than 0.1, 0.3, or 0.5 is regarded as a small, medium, or large effect, respectively. In the experimental results, although the effect is of medium size in the case of Jaccard, GP13 and GP26, the effect is of large size in the remaining cases. In addition, since the average of these values is 0.511, the average effect is of large size. In other words, the average effect size between the SFL technique and the VFL (V) technique is large. Therefore, we can conclude that the VFL (V) technique is a substantial improvement over the SFL technique.
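The text does not state how r is obtained from Z. A common convention, assumed here, is r = |Z| / sqrt(N), where N is the number of pairs. The value N = 315 used below is our own inference (344 programs minus the 29 programs reported as N/A for the VFL (V) technique), not a figure given by the authors; with it, the sketch matches the r column of Table 6 for the three coefficients shown.

```python
import math


def effect_size(z, n_pairs):
    """Effect size r = |Z| / sqrt(N) for a Wilcoxon signed ranks test."""
    return abs(z) / math.sqrt(n_pairs)


N_PAIRS = 315  # assumption: 344 programs minus 29 that the VFL (V) technique cannot rank
z_values = {"Tarantula": -8.874, "Jaccard": -6.978, "Naish": -10.921}  # Z from Table 6
r_values = {name: round(effect_size(z, N_PAIRS), 3) for name, z in z_values.items()}
```

Because r depends only on |Z| and N, the ordering of the effect sizes across coefficients is the same as the ordering of the |Z| values.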

5.2. RQ2: How well does the VFL (V) technique place fault statements at low ranks?

To answer RQ2, we compare the performance of the VFL (V) technique with that of the SFL technique using only the Top N entries of the generated ranked lists. The SFL and the VFL (V) columns indicate the number of faults found within Top 1, 5, 10, 20, and 50 lines.

In some cases, the number of faults found within the Top N is smaller than or equal to (white box) that of existing SFL techniques. Overall, however, the performance of the VFL (V) technique improves (gray box) dramatically in both the Defects4j dataset and the Siemens suite in Table 7. For example, in Tarantula, the VFL (V) technique in the Defects4j dataset localized 37 out of 224 faults in Top 1, while the SFL technique localized only 6 faults. In the Siemens suite, the SFL technique included 2 out of 120 faults in the Top 1, and the VFL (V) technique included 13 faults. Therefore, when the Tarantula similarity coefficient is applied, the VFL (V) technique includes 50 out of 344 faults in Top 1, while the SFL technique includes 8.

In addition, the SFL technique included 57, 85, 140, and 247 faults in the Top 5, Top 10, Top 20, and Top 50, and the VFL (V) technique included 114, 157, 212, and 259 faults, respectively. The rates of increase for Top 1, Top 5, Top 10, Top 20, and Top 50 were 525%, 100%, 85%, 51% and 5%, respectively.
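The rates of increase follow directly from the Tarantula totals in Table 7. The short sketch below recomputes them, assuming the rate is the relative increase over the SFL count rounded to the nearest percent; the function name `rate_of_increase` is hypothetical.

```python
def rate_of_increase(before, after):
    """Percentage increase from `before` to `after`."""
    return 100.0 * (after - before) / before


# Tarantula totals over all 344 faults, taken from Table 7.
sfl = {1: 8, 5: 57, 10: 85, 20: 140, 50: 247}
vfl_v = {1: 50, 5: 114, 10: 157, 20: 212, 50: 259}

rates = {n: round(rate_of_increase(sfl[n], vfl_v[n])) for n in sfl}
```

The gain shrinks as N grows because the SFL technique already places most faults within the Top 50, leaving little room for improvement at that cutoff.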

The accuracy depends on the applied similarity coefficient. However, the VFL (V) technique outperforms the SFL technique with all similarity coefficients. On average, the VFL (V) technique extends the number of programs included in the Top 1 from 11 to 44 (about 306%). In addition, the programs included in the Top 5 expanded by about 93% from 54 to 104. The programs included in the Top 10 expanded by about 89% from 77 to 145. The programs included in the Top 20 expanded by about 70% from 117 to 199. The programs included in the Top 50 expanded by about 24% from 203 to 251.

The minimum success criterion for finding faults is generally set from Top 1 to Top 50. However, most software developers only consider values up to the Top 5, even if a list of suspicious entities is obtained by applying the automated fault localization technique. In other words, if developers cannot find actual faults in the Top 5 of the suspicious entity ranked list, they resort to manual debugging [17]. Therefore, to help debugging tasks in the real world, faults must be found in the Top 5.


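Table 7
Comparison of successfully localized faults with only the Top N entries for SFL, VFL (V), and VFL (VF).

| Similarity coefficient | Program | Top 1 SFL | Top 1 VFL(V) | Top 1 VFL(VF) | Top 5 SFL | Top 5 VFL(V) | Top 5 VFL(VF) | Top 10 SFL | Top 10 VFL(V) | Top 10 VFL(VF) | Top 20 SFL | Top 20 VFL(V) | Top 20 VFL(VF) | Top 50 SFL | Top 50 VFL(V) | Top 50 VFL(VF) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tarantula | Defects4j | 6 | 37 | 45 | 36 | 85 | 95 | 57 | 116 | 118 | 99 | 145 | 149 | 170 | 176 | 182 |
| Tarantula | Siemens | 2 | 13 | 15 | 21 | 29 | 31 | 28 | 41 | 46 | 41 | 67 | 78 | 77 | 83 | 91 |
| Tarantula | Total | 8 | 50 | 60 | 57 | 114 | 126 | 85 | 157 | 164 | 140 | 212 | 227 | 247 | 259 | 273 |
| Jaccard | Defects4j | 47 | 59 | 61 | 114 | 111 | 117 | 140 | 134 | 136 | 163 | 159 | 163 | 190 | 187 | 191 |
| Jaccard | Siemens | 2 | 13 | 16 | 22 | 30 | 33 | 29 | 43 | 48 | 42 | 68 | 76 | 79 | 82 | 92 |
| Jaccard | Total | 49 | 72 | 77 | 136 | 141 | 150 | 169 | 177 | 184 | 205 | 227 | 239 | 269 | 269 | 283 |
| Naish | Defects4j | 3 | 36 | 43 | 18 | 86 | 97 | 29 | 112 | 116 | 53 | 141 | 147 | 102 | 176 | 181 |
| Naish | Siemens | 4 | 15 | 18 | 25 | 33 | 35 | 36 | 49 | 54 | 48 | 70 | 79 | 88 | 88 | 98 |
| Naish | Total | 7 | 51 | 61 | 43 | 119 | 132 | 65 | 161 | 170 | 101 | 211 | 226 | 190 | 264 | 279 |
| GP08 | Defects4j | 3 | 18 | 20 | 18 | 49 | 58 | 29 | 76 | 86 | 53 | 114 | 126 | 102 | 167 | 179 |
| GP08 | Siemens | 3 | 14 | 17 | 25 | 34 | 35 | 38 | 51 | 55 | 55 | 74 | 84 | 91 | 89 | 98 |
| GP08 | Total | 6 | 32 | 37 | 43 | 83 | 93 | 67 | 127 | 141 | 108 | 188 | 210 | 193 | 256 | 277 |
| GP10 | Defects4j | 3 | 22 | 36 | 18 | 65 | 73 | 29 | 88 | 90 | 53 | 123 | 115 | 102 | 155 | 153 |
| GP10 | Siemens | 0 | 13 | 13 | 10 | 26 | 26 | 14 | 40 | 41 | 35 | 65 | 75 | 68 | 85 | 95 |
| GP10 | Total | 3 | 35 | 49 | 28 | 91 | 99 | 43 | 128 | 131 | 88 | 188 | 190 | 170 | 240 | 248 |
| GP11 | Defects4j | 3 | 29 | 40 | 18 | 70 | 83 | 29 | 101 | 107 | 53 | 132 | 145 | 102 | 168 | 179 |
| GP11 | Siemens | 4 | 15 | 18 | 27 | 30 | 31 | 36 | 45 | 49 | 48 | 66 | 75 | 88 | 86 | 97 |
| GP11 | Total | 7 | 44 | 58 | 45 | 100 | 114 | 65 | 146 | 156 | 101 | 198 | 220 | 190 | 254 | 276 |
| GP13 | Defects4j | 3 | 25 | 15 | 18 | 72 | 79 | 29 | 92 | 93 | 53 | 114 | 120 | 102 | 152 | 158 |
| GP13 | Siemens | 4 | 15 | 18 | 25 | 34 | 35 | 36 | 49 | 53 | 48 | 70 | 79 | 88 | 88 | 99 |
| GP13 | Total | 7 | 40 | 33 | 43 | 106 | 114 | 65 | 141 | 146 | 101 | 184 | 199 | 190 | 240 | 257 |
| GP20 | Defects4j | 3 | 29 | 40 | 18 | 72 | 84 | 29 | 96 | 108 | 53 | 129 | 137 | 102 | 164 | 176 |
| GP20 | Siemens | 4 | 15 | 18 | 25 | 33 | 34 | 36 | 49 | 54 | 48 | 70 | 80 | 88 | 87 | 97 |
| GP20 | Total | 7 | 44 | 58 | 43 | 105 | 118 | 65 | 145 | 162 | 101 | 199 | 217 | 190 | 251 | 273 |
| GP26 | Defects4j | 3 | 15 | 31 | 18 | 43 | 56 | 29 | 76 | 67 | 53 | 112 | 92 | 102 | 146 | 132 |
| GP26 | Siemens | 1 | 15 | 18 | 29 | 34 | 36 | 38 | 49 | 54 | 56 | 71 | 75 | 84 | 84 | 95 |
| GP26 | Total | 4 | 30 | 49 | 47 | 77 | 92 | 67 | 125 | 121 | 109 | 183 | 167 | 186 | 230 | 227 |
| Average | | 11 | 44 | 54 | 54 | 104 | 115 | 77 | 145 | 153 | 117 | 199 | 211 | 203 | 251 | 266 |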


In the experimental results, the existing SFL technique has only 54 programs that find the actual faults in the Top 5. Therefore, we propose the VFL (V) technique, which can improve the localized faults from 54 to 104 (about 93%). In addition, the existing SFL technique has only 11 programs that find actual faults in the Top 1, but the VFL (V) technique included 44 programs, an improvement of about 306% in the Top 1. Based on the above experimental results, the VFL (V) technique is more helpful to the developer than the SFL technique.

As described above, the performance of the SFL technique is highly dependent on the similarity coefficient. If other similarity coefficients are applied to the same program, the results are different. Of course, this phenomenon appears in the VFL (V) technique as well because the basic concept of the VFL (V) technique is very similar to that of the SFL technique. Note that our results in Table 7 also show different results for each similarity coefficient.

When applying the SFL technique to the Defects4j dataset and Siemens suite, Tarantula has 8, Jaccard has 49, Naish, GP11, GP13, and GP20 each have 7, GP08 has 6, GP10 has 3, and GP26 has 4 programs in the Top 1. Of course, the performance also differs according to the similarity coefficient for the other Top N values. Specifically, if Jaccard is applied to the Defects4j dataset, the accuracy of the technique is very high, being one of the outliers. Based on these results, we can see that applying Jaccard is the best way to localize faults in the Defects4j dataset. However, we additionally determined that it is not necessary to apply the VFL (V) technique to programs that already have a very high accuracy when applying the SFL technique. In other words, applying the VFL (V) technique to such programs is not necessarily better compared to the performance of the SFL technique. In Table 7, when Jaccard is applied to the Defects4j dataset, the SFL technique outperforms the VFL (V) technique (Top 5, 10, 20, 50). Since the SFL and the VFL (V) techniques include 47 and 59 programs in Top 1, respectively, the VFL (V) technique outperforms the SFL technique. However, since the SFL and the VFL (V) techniques include 114 and 111 programs in Top 5, respectively, the SFL technique outperforms the VFL (V) technique. Similarly, the SFL and the VFL (V) techniques include 140 and 134 programs in Top 10, 163 and 159 programs in Top 20, and 190 and 187 programs in Top 50, respectively.

This result ultimately requires further discussion, but it seems that this is caused by the ability of Jaccard to localize faults in the Defects4j dataset with high accuracy. In other words, we believe that it is difficult to improve the performance via the VFL (V) technique because the SFL technique using Jaccard is already overfitted for this benchmark. However, practically speaking it is rarely possible to know in advance the most optimal similarity coefficient for each benchmark [20, 35, 42, 44], and the VFL (V) technique outperforms the SFL technique for the remaining similarity coefficients except Jaccard.

In addition, when GP08, GP11, and GP20 are applied to the Siemens suite, the SFL technique outperforms the VFL (V) technique in the Top 50. However, developers do not check up to the Top 50 in practice, and applying the VFL (VF) technique to these programs achieves a better performance than the SFL technique. Therefore, we believe these outliers can be ignored. Although the SFL technique outperforms the VFL (V) technique for some programs, it is clear that the VFL (V) technique is an overall very effective technique, given the above reasons.

5.3. RQ3: Is the VFL (V) technique effective for every fault?

To answer RQ3, we compare the SFL and the VFL (V) ranks of actual fault locations by applying the SFL and the VFL (V) techniques to 344 programs. Fig. 4 indicates the results for 224 Defects4j dataset and 120 Siemens suite programs. The x-axis is the portion of programs, and the y-axis is similarity coefficients. Better defines the number of programs whose rank of faults found by the VFL (V) technique is lower than the rank of faults found by the SFL technique. Worse defines the number of programs whose rank of faults found by the VFL (V) technique is higher than the rank of faults found by the SFL technique. Same defines the number of programs whose rank of faults found by the VFL (V) technique equals the rank of faults found by the SFL technique. Finally, N/A defines the number of programs whose faults cannot be found by the VFL (V) technique.
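The four categories can be expressed as a small comparison function, sketched below. It assumes one rank per fault for each technique and represents a fault that the VFL (V) technique cannot rank as `None`; the function name `classify` and the sample rank pairs are hypothetical.

```python
from collections import Counter


def classify(sfl_rank, vfl_rank):
    """Compare the ranks of the faulty statement; a lower rank is better."""
    if vfl_rank is None:
        return "N/A"  # the VFL (V) technique could not rank the faulty statement
    if vfl_rank < sfl_rank:
        return "Better"
    if vfl_rank > sfl_rank:
        return "Worse"
    return "Same"


# Hypothetical (SFL rank, VFL (V) rank) pairs for five faults.
pairs = [(12, 3), (5, 5), (2, 9), (40, None), (30, 1)]
summary = Counter(classify(s, v) for s, v in pairs)
```

Dividing each count by the number of programs gives the portions plotted on the x-axis.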

Fig. 4. Change of rank between SFL and VFL (V) techniques.

Table 8
Information for faults from the Defects4j dataset and Siemens suite.

Benchmark        Program        Version   Rank of actual fault
Defects4j        Chart          23        42
                 Lang           11        52
                 Lang           51        161
                 Lang           52        143
                 Math           2         18
                 Math           22        22
                 Math           27        28
                 Math           10        211
                 Math           12        19
                 Math           51        69
                 Time           25        302
Siemens suite    Printtokens2   1         2
                 Printtokens2   3         2
                 Printtokens2   4         39
                 Printtokens2   9         3
                 Schedule2      4         25
                 Tcas           1         2
                 Tcas           4         35
                 Tcas           6         26
                 Tcas           10        26
                 Tcas           25        2
                 Tcas           31        28
                 Tcas           32        28
                 Tcas           37        3
                 Tcas           39        5
                 Tcas           40        37
                 Tcas           41        210
                 Totinfo        3         79
                 Totinfo        14        71

Fig. 5. Change of rank between SFL and VFL (VF) techniques.

For example, with Tarantula, the VFL (V) technique outperforms the SFL technique in 238 of 344 programs (about 70%). In addition, the accuracy is the same or worse in 15 (4%) and 62 (18%) of the programs, respectively.

The accuracy depends on the similarity coefficient. However, the accuracy is improved in about 70% of programs for all similarity coefficients. The VFL (V) technique shows a higher accuracy than the SFL technique in an average of 240 programs (about 70%). Also, the numbers of programs with the same and worse accuracy are 11 (3%) and 62 (18%), respectively.

The N/A count for all the similarity coefficients is 29, which accounts for about 8% of the 344 programs. Therefore, the VFL (V) technique can be applied to about 92% of the programs on average, and the accuracy is improved in about 70% of the programs.
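The shares quoted for Tarantula follow directly from the reported counts. The short Python check below uses only numbers stated in this section; the dictionary labels are illustrative.

```python
# Tarantula outcome counts reported in this section for the 344 programs.
counts = {"Better": 238, "Same": 15, "Worse": 62, "N/A": 29}

total = sum(counts.values())

# Share of each outcome, in percent, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Share of programs to which the VFL (V) technique can be applied at all.
applicable = round(100 * (total - counts["N/A"]) / total, 1)

print(total, shares, applicable)
```

The four outcomes sum to 344, so every program falls into exactly one category.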

5.4. RQ4: Can we increase the VFL (V) technique performance and applicability by using additional techniques?

The SFL technique assigns a suspicious ratio to all statements covered by the test suite. Therefore, although the accuracy of this technique can be very low, it can always find faults. However, unlike the SFL technique, the VFL (V) technique only assigns suspicious ratios to variables. Therefore, it is necessary to handle statements that do not include any variable to improve the performance of the VFL (V) technique.

However, faults that cannot be found by the VFL (V) technique also cannot be accurately detected by the SFL technique. In Fig. 4, there are 29 N/A faults. This means that the location of these faults cannot be localized by the VFL (V) technique. Table 8 lists all programs whose fault cannot be found using the VFL (V) technique. However, only 7 of the 29 programs (about 24%) can be included in the Top 10 with the existing SFL technique. In other words, the remaining 76% are programs that do not provide useful information to actual developers, even if the existing SFL technique is applied.
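The "7 of the 29" figure can be checked against Table 8. The sketch below assumes that the "Rank of actual fault" column in Table 8 is the rank obtained with the existing SFL technique, which is how the surrounding text reads; the variable names are illustrative.

```python
# "Rank of actual fault" values copied from Table 8, in table order.
defects4j_ranks = [42, 52, 161, 143, 18, 22, 28, 211, 19, 69, 302]
siemens_ranks = [2, 2, 39, 3, 25, 2, 35, 26, 26, 2, 28, 28, 3, 5, 37, 210, 79, 71]

table8_ranks = defects4j_ranks + siemens_ranks

# Programs whose fault appears within the first ten ranked entries.
in_top10 = sum(1 for rank in table8_ranks if rank <= 10)
share_top10 = round(100 * in_top10 / len(table8_ranks))

print(len(table8_ranks), in_top10, share_top10)
```

Under that assumption, all seven programs inside the Top 10 belong to the Siemens suite.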

After manual inspection of these 29 programs, we found a common characteristic: most of them invoke functions frequently. If there is a fault in a statement that invokes a function, it is difficult to find the fault with the VFL (V) technique. Therefore, we extend the VFL (V) technique to handle statements that do not contain variables by parsing the function invocation information. This information can be used to find an additional 14 faults (Math 2–27 and all Tcas faults in Table 8). In Fig. 4, there are 29 N/A faults with the VFL (V) technique. In Fig. 5, the number of N/A faults is reduced from 29 to 15 by applying the VFL (VF) technique. Among the 14 recovered programs, an average of 12 are included in Better.
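The effect of adding function invocation information can be illustrated with a small Python sketch. It is not the exact procedure of this paper: the data layout, the use of Tarantula, and the choice of taking the maximum name score per statement are all assumptions made only for illustration. It shows how a statement that contains no variable is N/A when only variables are scored, and receives a score once invoked function names are scored too.

```python
def tarantula(ef, ep, total_failed, total_passed):
    """Tarantula suspiciousness from failed/passed coverage counts."""
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def rank_statements(statements, tests, use_functions):
    """Score statements through the names they contain (hypothetical layout).

    statements: {line: {"vars": set of names, "calls": set of names}}
    tests: list of (covered_lines, passed) pairs
    Returns {line: score or None}; None models an N/A statement.
    """
    total_failed = sum(1 for _, passed in tests if not passed)
    total_passed = len(tests) - total_failed

    def names(line):
        info = statements[line]
        return info["vars"] | (info["calls"] if use_functions else set())

    # Per name, count failed/passed tests executing a statement that uses it.
    ef, ep = {}, {}
    for covered, passed in tests:
        touched = set().union(*(names(line) for line in covered))
        target = ep if passed else ef
        for name in touched:
            target[name] = target.get(name, 0) + 1

    scores = {}
    for line in statements:
        candidates = [
            tarantula(ef.get(n, 0), ep.get(n, 0), total_failed, total_passed)
            for n in names(line)
        ]
        # Assumption: a statement inherits its most suspicious name's score.
        scores[line] = max(candidates) if candidates else None
    return scores


statements = {
    1: {"vars": {"x"}, "calls": set()},
    2: {"vars": set(), "calls": {"reset"}},  # call-only statement
    3: {"vars": {"y"}, "calls": set()},
}
tests = [({1, 2}, False), ({1, 3}, True)]

v_only = rank_statements(statements, tests, use_functions=False)
v_and_f = rank_statements(statements, tests, use_functions=True)
print(v_only, v_and_f)
```

In this toy input, line 2 has no score with variables only, and becomes the most suspicious statement once the invoked function name is considered.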

A comparison of the VFL (V) and the VFL (VF) techniques in Figs. 4 and 5 shows that the VFL (VF) technique improves the accuracy in more programs than the VFL (V) technique. Compared to the SFL technique, the VFL (V) and the VFL (VF) techniques improve the accuracy in an average of 240 (about 70%) and 252 (about 73%) programs, respectively. In addition, to analyze the validity of the VFL (VF) technique, the evaluation metric is set to Top N and compared with the SFL and the VFL (V) techniques. The VFL (VF) columns in Table 7 indicate the number of programs included in the Top N when the VFL (VF) technique is applied. In the Top 1, Top 5, Top 10, Top 20, and Top 50, the accuracy of the VFL (VF) technique is improved by 28%, 23%, 11%, 14%, and 30%, respectively, compared with the VFL (V) technique. Also, the VFL (VF) technique extends the number of programs included in the Top 1 from 11 to 54, about 392%, compared with the SFL technique. In addition, programs included in the Top 5 are expanded by about 114% from 54 to 115. The programs included in the Top 10 are expanded by about 99% from 77 to 153. The programs included in the Top 20 are expanded by about 80% from 117 to 211. The programs included in the Top 50 are expanded by about 31% from 203 to 266.
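The expansion percentages are relative increases over the SFL count. As a quick check in Python, using the Top 10, Top 20, and Top 50 counts quoted above (the helper name is illustrative):

```python
def relative_increase(before, after):
    """Percentage growth from `before` to `after`."""
    return 100 * (after - before) / before


# (SFL count, VFL (VF) count) of programs within each Top N, as quoted above.
top_n = {"Top 10": (77, 153), "Top 20": (117, 211), "Top 50": (203, 266)}

increases = {
    label: round(relative_increase(sfl, vf)) for label, (sfl, vf) in top_n.items()
}
print(increases)
```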

6. Discussion

6.1. Multiple faults

It is necessary to consider the number of faults when applying the proposed technique to real projects. However, it is difficult not only to determine the number of faults contained in the source code, but more generally even to determine whether the source code contains a single fault or multiple faults. Therefore, various studies dealing with multiple faults have already been actively carried out, and we classify this research into two types based on the treatment method.

The first is a sequential fault localization technique. Here, if some portion of the test cases fail, the technique assumes that the program has one fault and then localizes the fault. Next, this technique fixes the fault and then runs the entire test suite again. (It is assumed that the found faults are completely fixed.) If another test case fails, this technique again assumes that there is an additional fault at another location and localizes this new fault. This technique repeatedly and sequentially localizes the faults until all test cases pass.
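The sequential procedure can be sketched as a simple loop in Python. The three callables `run_tests`, `localize`, and `fix` are hypothetical placeholders rather than part of any tool named in this paper, and the toy fault table exists only to make the example runnable.

```python
def sequential_localization(run_tests, localize, fix, max_rounds=100):
    """Localize and fix one fault at a time until the test suite passes.

    run_tests() -> list of failing test ids (empty when all tests pass)
    localize(failing) -> the most suspicious location for these failures
    fix(location) -> repairs that location (assumed to fix it completely)
    """
    fixed = []
    for _ in range(max_rounds):
        failing = run_tests()
        if not failing:
            return fixed  # all test cases pass
        location = localize(failing)
        fix(location)
        fixed.append(location)
    raise RuntimeError("tests still failing after max_rounds")


# Toy program state: each fault location makes some tests fail.
faults = {"line 3": {"t1"}, "line 9": {"t2", "t3"}}

def run_tests():
    return sorted(set().union(*faults.values()))

def localize(failing):
    # Pretend the localizer finds the fault behind the first failing test.
    return next(loc for loc, tests in faults.items() if failing[0] in tests)

def fix(location):
    faults.pop(location)

order = sequential_localization(run_tests, localize, fix)
print(order)
```

Each round reruns the whole suite, which matches the description above: one fault is assumed, localized, and fixed per iteration.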

The second is a parallel fault localization technique. If some portion of the test cases fail, this technique first determines whether the program contains a single fault or multiple faults. If the source code contains multiple faults, this technique predicts the number of faults by using information such as coverage, model, and pattern. Then, the technique localizes and fixes these multiple faults in parallel. Unlike sequential fault localization techniques, a parallel technique can be completed at once without repeating the fault localization task. However, if the test information is not sufficient, it is difficult to predict the number of faults. Therefore, in this case, the accuracy of such a parallel fault localization technique is not high [51–54].

In addition, since these multiple faults cannot be guaranteed to be mutually independent, an interference problem may occur between them. The sequential technique finds faults one by one and fixes them in any order. Therefore, the interference problem need not be considered. However, because the parallel fault localization technique finds all faults at once, this interference problem must be considered.

The interference problem means that two or more different faults that are combined affect each other. In general, interference consists of constructive, destructive, and constructive and destructive interference. If two or more different faults exist at the same time, a new fault may be created that is different from the existing faults, and these faults are defined as being in a constructive interference relation. On the contrary, if two or more different faults exist at the same time, all faults may be canceled out, and these faults are defined as being in a destructive interference relation. Finally, if both constructive and destructive interference occur simultaneously, these faults are defined as being in a constructive and destructive interference relation.

To apply the parallel VFL technique and verify its performance, multiple faults must be generated by combining two or more faults. However, interference between faults occurs in this process, and it is very important to deal with it. In addition, this is a difficult problem that has been actively studied but has not yet been resolved [58–62]. In fact, we created new programs that contain multiple faults by combining several faults, and we applied the parallel VFL technique to them. As a result, some programs improved more when parallel techniques were applied than when sequential techniques were applied. However, some programs showed the opposite result. In other words, the performance of the parallel VFL technique is not stable and is highly dependent on interference between faults.

In addition, the VFL technique simply calculates the suspicious ratio by substituting the execution traces and test results information into the similarity coefficients, like the SFL technique. Therefore, the performance of this technique is highly dependent on the similarity coefficient. Even the performances of sequential and parallel fault localization techniques are largely determined by the similarity coefficient. In other words, some similarity coefficients are more suitable for sequential techniques than for parallel techniques, while for other similarity coefficients the opposite is true.

For the above reasons, we think that the parallel VFL technique is not yet more effective than the sequential VFL technique, because there are not yet enough techniques that help improve the parallel VFL technique by using the extracted variable and function invocation information. In other words, it is more effective to apply the sequential fault localization technique rather than the parallel one to find multiple faults with the VFL technique. Therefore, to handle multiple faults with the VFL technique, we must apply this technique repeatedly to localize faults one at a time.

7. Threats to validity

7.1. External validity

The primary threat to external validity is that our results might not be generalizable. To generalize our results, we evaluate performance on both Java programs (Defects4j dataset) and C programs (Siemens suite). The Defects4j dataset consists of four large-scale projects of 22,000 lines or more. Also, because they are open source projects, the individual programs contain actual faults. Therefore, it is reasonable to evaluate the performance by applying the proposed technique to the Defects4j dataset. In other words, this result can be generalized and is expected to be similar for other projects.

However, the Siemens suite, the most widely used benchmark for evaluating the performance of FL techniques, consists of very small toy programs of less than 1,000 lines. It is often pointed out that some researchers use an insufficient test suite. In other words, this suite cannot represent all C programs. Likewise, the UNIX programs flex, grep, gzip, and make provided by SIR are large, but they also contain artificial faults. Therefore, these programs are also not realistic. In other words, high performance on these projects does not guarantee high performance on all C programs. However, most previous studies have used this benchmark to evaluate performance. Although the Siemens suite might not be realistic, it is sufficient to establish the feasibility of the proposed technique, as in previous studies [39].

8. Related work

8.1. Spectrum-based fault localization techniques

The common feature of SFL techniques is that they assign suspicious ratios to statements based on execution traces and test results, and we classify them into three types according to research purpose. The first type computes the execution traces and test results of the program and then substitutes these values into a similarity coefficient formula. Although this technique is very traditional, it is still being studied because it has the advantage of very low computational cost [4].

Jones et al. [12] first designed Tarantula for software debugging. This technique assigns a high suspicious ratio to entities that are close to the failed test cases and far from the passed test cases. Many similar techniques such as Jaccard [1] and Ochiai [32] have been designed. Dallmeier et al. [6] subsequently designed the AMPLE technique. This technique produces a suspicious ratio that is a function of the distance between covered failed and passed test cases. Compared to Tarantula, this technique assigns an equally high suspicious ratio to statements that are far from the failed test cases and close to the passed ones. In other words, this technique considers only the difference between the two test case sets, regardless of the test results. Naish et al. [23] suggested the Op and Op2 techniques by experimentally analyzing various similarity coefficients. The Op2 technique is the representative in most SFL studies and is often referred to as the Naish technique. Yoo [45] suggested GP 1–30 based on a reformulation of various similarity coefficients with genetic programming techniques. Not all proposed similarity coefficients are optimal for all projects. However, the six GP similarity coefficients recommended by the author showed high performance in most projects. Wong et al. [37, 38] developed techniques for adding weights to failed test cases through various rules. The Heuristic 1, 2, and 3 techniques empirically added different weights to the covered entities via the failed and the passed test cases. Also, Dstar additionally gave weights to the entities covered by the failed test cases by squaring the numerator of the existing Kulczynski [38] similarity coefficient.
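For orientation, several of the coefficients named above can be written as small Python functions. The formulas are the forms commonly given in the fault localization literature, not definitions taken from this section, and the count names are assumptions: `ef` and `ep` are the failed and passed tests that execute an entity, `nf` and `np_` the failed and passed tests that do not. The zero-denominator conventions are also a choice made here.

```python
import math


def tarantula(ef, ep, nf, np_):
    fail = ef / (ef + nf) if ef + nf else 0.0
    passed = ep / (ep + np_) if ep + np_ else 0.0
    return fail / (fail + passed) if fail + passed else 0.0


def jaccard(ef, ep, nf, np_):
    denom = ef + nf + ep
    return ef / denom if denom else 0.0


def ochiai(ef, ep, nf, np_):
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def op2(ef, ep, nf, np_):
    return ef - ep / (ep + np_ + 1)


def dstar(ef, ep, nf, np_, star=2):
    # Kulczynski is ef / (nf + ep); DStar raises the numerator to `star`.
    denom = ep + nf
    return ef ** star / denom if denom else math.inf


# One entity executed by 3 of 4 failed tests and 1 of 6 passed tests.
spectrum = dict(ef=3, ep=1, nf=1, np_=5)
scores = {f.__name__: f(**spectrum) for f in (tarantula, jaccard, ochiai, op2, dstar)}
print(scores)
```

All five functions take the same four counts, which is why such coefficients can be swapped freely inside one localization pipeline.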

The second SFL technique type improves the performance of the fault localization technique by combining it with other techniques or adding information to the suspicious ratio calculated by the traditional SFL technique described above. Mengshi et al. [56] suggested a technique that uses the PageRank technique to improve the performance of existing SFL techniques. This technique improves the accuracy of the SFL technique by further utilizing the connection relation (invocation frequency) between the functions and the information of functions covered by the test case. Divya Gopinath et al. [46] proposed a technique for improving the performance of the formula-based fault localization technique with the SFL technique. This is a way to incrementally localize the faults while adding test cases one at a time based on the SFL technique. This technique computes the suspicious ratio of Tarantula and then selects the next test case most similar to this ratio to improve the efficiency of fault localization. Wong et al. [40, 57] designed a technique that applies machine learning using the execution traces and test results of a program as training data. In addition, this technique utilizes the orthogonal matrix as test data. Therefore, the degrees to which individual statements affected the failed test case can be calculated and used as the suspicious ratios of individual statements.

The last SFL technique type compares and analyzes various techniques. It is not easy to further improve the accuracy or efficiency of this technique, since it has been studied very actively up until recently. Therefore, many studies are being conducted to compare and analyze the features of existing similarity coefficients rather than to research improved techniques. Xie et al. [42] theoretically analyzed the 30 existing similarity coefficients and grouped those with similar characteristics into six groups. In addition, Tang et al. [35] compared and analyzed the accuracy of the coefficients using graph theory based on Xie's results.

Xuan et al. [44] suggested the MULTRIC technique, which combines multiple similarity coefficients. Because the performance of a similarity coefficient depends on the project, it is not possible to achieve the best performance using only one similarity coefficient. Therefore, the optimal situation is a combination of similarity coefficients based on historical results.

Lucia et al. [25] applied 40 similarity coefficients to the same project to compare their performance in the same environment. The experimental results showed that there is no single best similarity coefficient for every project. They also showed that the performance of similarity coefficients depends on the number of faults (single or multiple).

Tien-Duy et al. [20] proposed Savant, which recommends the similarity coefficient that is most suitable for a project among several similarity coefficients. This technique uses various similarity coefficients to calculate rankings and learns their changes. Then, when the project is determined, it recommends the similarity coefficient that is most appropriate for the project.

9. Conclusion and future work

In this paper, we proposed a novel variable-based fault localization technique for more effective debugging. The existing SFL techniques assign suspicious ratios to individual statements based on the execution traces and test results. The proposed technique requires these two types of data as well as the name and location information of the variables and functions extracted from the faulty source code. This information can be easily extracted automatically without human effort using various existing open source tools. In addition, it is further utilized to solve the problems of existing SFL techniques that rely solely on coverage information. We have verified that the proposed technique significantly improves the accuracy of all correlation coefficients for 344 Java and C programs that include artificial and real faults.

Also, since the proposed technique is very lightweight, it is relatively easy to combine with other fault localization techniques. Although this requires the additional task of extracting variable and function information, it consumes very little time and cost with existing automation tools. In addition, if statements include multiple variables or function invocations, the VFL technique needs iterative calculations, but this is a simple computation, so its cost is small enough to be ignored.

In addition, the existing SFL techniques calculate the suspicious ratios of all statements regardless of whether they contain variables and function invocations. On the other hand, the proposed technique first determines the suspicious variables and functions, and then computes only the suspicious ratios of the statements containing these variables and function invocations, thus reducing the overall computation.

To conclude, we expect that the proposed techniques are highly likely to evolve and will be very useful in the future. Furthermore, even though we currently assign equal weights without taking into account the type of statements, we will experiment in the future with assigning weights differently based on historical data. We will also consider assigning suspicious ratios only to statements containing some of the top-ranked variables in the calculated suspicious variable list.

Acknowledgments

This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (2017M3C4A7068179), and the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03934610).

References

[1] R. Abreu, P. Zoeteweij, A.J.C. van Gemund, An evaluation of similarity coefficients for software fault localization, in: Proceedings of the International Symposium on Software Testing and Analysis, 2006.

[2] G.K. Baah, A. Podgurski, M.J. Harrold, Causal inference for statistical fault localization, in: Proceedings of the International Symposium on Software Testing and Analysis, 2010.

[3] R.M. Balzer, EXDAMS: extendable debugging and monitoring system, in: Proceedings of the Spring Joint Computer Conference, 1969.

[4] S. Choi, S. Cha, C.C. Tappert, A survey of binary similarity and distance measures, J. Syst. Cybern. Inform. 8 (1) (2010) 43–48.

[5] P. Yulei, X. Xue, A.S. Namin, Debugging in parallel or sequential: an empirical study, J. Softw. 10 (5) (2015) 566–576.

[6] V. Dallmeier, C. Lindig, A. Zeller, Lightweight bug localization with AMPLE, in: Proceedings of the Symposium on Automated Analysis-driven Debugging, 2005.

[7] R.W. Floyd, Assigning meanings to programs, in: Proceedings of the Symposia in Applied Mathematics, 1967.

[8] R. Gao, W.E. Wong, Z. Chen, Y. Wang, Effective software fault localization using predicted execution results, Softw. Qual. J. 25 (1) (2017) 131–169.

[9] G. Gay, The fitness function for the job: search-based generation of test suites that detect real faults, in: Proceedings of the International Conference on Software Testing, Verification and Validation, 2017.

[10] D. Gopinath, R.N. Zaeem, S. Khurshid, FAULTTRACER: a spectrum-based approach to localizing failure-inducing program edits, Autom. Softw. Eng. 25 (12) (2013) 1357–1383.

[11] B. Hofer, A. Perez, R. Abreu, F. Wotawa, On the empirical evaluation of similarity coefficients for spreadsheets fault localization, Autom. Softw. Eng. 22 (1) (2015) 47–74.

[12] J.A. Jones, M.J. Harrold, Empirical evaluation of the Tarantula automatic fault-localization technique, in: Proceedings of the International Conference on Automated Software Engineering, 2005.

[13] X. Ju, S. Jiang, X. Chen, X. Wang, Y. Zhang, H. Cao, HSFal: effective fault localization using hybrid spectrum of full slices and execution slices, J. Syst. Softw. 90 (2014) 3–17.

[14] R. Just, D. Jalali, M.D. Ernst, Defects4J: a database of existing faults to enable controlled testing studies for Java programs, in: Proceedings of the International Symposium on Software Testing and Analysis, 2014.

[15] J. Kim, J. Park, E. Lee, A new spectrum-based fault localization with the technique of test case optimization, J. Inf. Sci. Eng. 32 (1) (2013) 177–196.

[16] D.E. Knuth, Optimum binary search trees, Acta Inf. 1 (1) (1971) 14–25.

[17] P.S. Kochhar, X. Xia, D. Lo, S. Li, Practitioners' expectations on automated fault localization, in: Proceedings of the International Symposium on Software Testing and Analysis, 2016.

[18] G. Laghari, A. Murgia, S. Demeyer, Fine-tuning spectrum based fault localisation with frequent method item sets, in: Proceedings of the International Conference on Automated Software Engineering, 2016.

[19] D. Landsberg, H. Chockler, D. Kroening, M. Lewis, Evaluation of measures for statistical fault localisation and an optimising scheme, in: International Conference on Fundamental Approaches to Software Engineering, 2015.

[20] T.-D.B. Le, D. Lo, C.L. Goues, L. Grunske, A learning-to-rank based fault localization approach using likely invariants, in: Proceedings of the International Symposium on Software Testing and Analysis, 2016.

[21] T.-D.B. Le, F. Thung, D. Lo, Theory and practice, do they match? A case with spectrum-based fault localization, in: Proceedings of the International Conference on Software Maintenance, 2013.

[22] X.-B.D. Le, D.-H. Chu, D. Lo, C.L. Goues, W. Visser, S3: syntax- and semantic-guided repair synthesis via programming by examples, in: Proceedings of the Joint Meeting on Foundations of Software Engineering, 2017.

[23] L. Naish, H.J. Lee, K. Ramamohanarao, A model for spectra-based software diagnosis, Trans. Softw. Eng. Methodol. 20 (3) (2006) 11.

[24] X. Li, W.E. Wong, R. Gao, L. Hu, S. Hosono, Genetic algorithm-based test generation for software product line with the integration of fault localization techniques, Empir. Softw. Eng. 23 (1) (2018) 1–51.

[25] Lucia, D. Lo, L. Jiang, F. Thung, A. Budi, Extended comprehensive study of association measures for fault localization, J. Softw. Evolut. Process 26 (2) (2014) 172–219.

[26] M. Maamar, N. Lazaar, S. Loudni, Y. Lebbah, Fault localization using itemset mining under constraints, Autom. Softw. Eng. 24 (2) (2017) 341–368.

[27] X. Mao, Y. Lei, Z. Dai, Y. Qi, C. Wang, Slice-based statistical fault localization, J. Syst. Softw. 89 (2013) 51–62.

[28] P. Mike, Y.L. Traon, Metallaxis-FL: mutation-based fault localization, Softw. Test. Verif. Reliab. 25 (5–7) (2015) 605–628.

[29] B. Mohammed, H. Collavizza, M. Rueher, LocFaults: a new flow-driven and constraint-based error localization approach, in: Proceedings of the Symposium on Applied Computing, 2015.

[30] University of Nebraska-Lincoln, Software-artifact Infrastructure Repository, 2005. http://sir.unl.edu/php/index.php

[31] S. Pearson, Evaluation of fault localization techniques, in: Proceedings of the International Symposium on Foundations of Software Engineering, 2016.

[32] A. Rui, P. Zoeteweij, A.J.C. van Gemund, On the accuracy of spectrum-based fault localization, in: Proceedings of the Academic and Industrial Conference Practice and Research Techniques-MUTATION, 2007.

[33] J. Sohn, S. Yoo, FLUCCS: using code and change metrics to improve fault localization, in: Proceedings of the International Symposium on Software Testing and Analysis, 2017.

[34] F. Steimann, M. Frenkel, R. Abreu, Threats to the validity and value of empirical assessments of the accuracy of coverage-based fault locators, in: Proceedings of the International Symposium on Software Testing and Analysis, 2013.

[35] C.M. Tang, W.K. Chan, Y.T. Yu, Z. Zhang, Accuracy graphs of spectrum-based fault localization formulas, Trans. Softw. Eng. 66 (2) (2017) 403–424.

[36] R.P. Weicker, Dhrystone: a synthetic systems programming benchmark, Commun. ACM 27 (10) (1984) 1013–1030.

[37] W.E. Wong, V. Debroy, B. Choi, A family of code coverage-based heuristics for effective fault localization, J. Syst. Softw. 83 (2) (2010) 188–208.

[38] W.E. Wong, V. Debroy, R. Gao, Y. Li, The DStar method for effective software fault localization, Trans. Reliab. 63 (1) (2013) 290–308.

[39] W.E. Wong, R. Gao, Y. Li, R. Abreu, F. Wotawa, A survey on software fault localization, Trans. Softw. Eng. 42 (8) (2016) 707–740.

[40] W.E. Wong, Y. Qi, BP neural network-based effective fault localization, J. Softw. Eng. Knowl. Eng. 19 (04) (2009) 573–597.

[41] X. Xiaoyuan, W.E. Wong, T.Y. Chen, Baowen Xu, Metamorphic slice: an application in spectrum-based fault localization, Inf. Softw. Technol. 55 (5) (2013) 866–879.

[42] X. Xie, T.Y. Chen, F.-C. Kuo, B. Xu, A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization, Trans. Softw. Eng. Methodol. 22 (4) (2013) 31.

[43] J. Xuan, M. Martinez, F. DeMarco, M. Clement, S.L. Marcote, T. Durieux, D.L. Berre, M. Monperrus, Nopol: automatic repair of conditional statement bugs in Java programs, Trans. Softw. Eng. 43 (1) (2017) 34–55.

[44] J. Xuan, M. Monperrus, Learning to combine multiple ranking metrics for fault localization, in: Proceedings of the International Conference on Software Maintenance and Evolution, 2014.

[45] S. Yoo, Evolving human competitive spectra-based fault localisation techniques, in: International Symposium on Search Based Software Engineering LNCS 7515, 2012.

[46] D. Gopinath, R.N. Zaeem, S. Khurshid, Improving the effectiveness of spectra-based fault localization using specifications, in: Proceedings of the International Conference on Automated Software Engineering, 2012.

[47] L. Zhang, L. Yan, Z. Zhang, J. Zhang, W.K. Chan, Z. Zheng, A theoretical analysis on cloning the failed test cases to improve spectrum-based fault localization, J. Syst. Softw. 129 (2017) 35–57.

[48] X. Xu, V. Debroy, W.E. Wong, D. Guo, Ties within fault localization rankings: exposing and addressing the problem, Int. J. Softw. Eng. Knowl. Eng. 21 (06) (2011) 803–827.

[49] A. Rui, M. Wolfgang, S. Markus, J.C. van Gemund Arjan, Refining spectrum-based fault localization rankings, in: Proceedings of the Symposium on Applied Computing, 2009.

[50] X. Xiaoyuan, T.Y. Chen, B. Xu, Isolating suspiciousness from spectrum-based fault localization techniques, in: Proceedings of the International Conference on Quality Software, 2010.

[51] A. Jones James, J.F. Bowring, M.J. Harrold, Debugging in parallel, in: Proceedings of the International Symposium on Software Testing and Analysis, 2007.

[52] G. Ruizhi, W.E. Wong, MSeer: an advanced technique for locating multiple bugs in parallel, IEEE Trans. Softw. Eng. (2017) (in press).

[53] D. Vidroha, W.E. Wong, Insights on fault interference for programs with multiple bugs, in: Proceedings of the International Symposium on Software Reliability Engineering, 2009.

[54] D. Vidroha, W.E. Wong, Using mutation to automatically suggest fixes for faulty programs, in: Proceedings of the International Conference on Software Testing, Verification and Validation, 2010.

[55] C. Parnin, A. Orso, Are automated debugging techniques actually helping programmers? in: Proceedings of the International Symposium on Software Testing and Analysis, 2011.

[56] M. Zhang, X. Li, L. Zhang, S. Khurshid, Boosting spectrum-based fault localization using PageRank, in: Proceedings of the International Symposium on Software Testing and Analysis, 2017.

[57] W.E. Wong, V. Debroy, R. Golden, X. Xu, B. Thuraisingham, Effective software fault localization using an RBF neural network, IEEE Trans. Reliab. 61 (1) (2012) 149–169.

[58] C. Fang, Y. Feng, Q. Shi, Z. Liu, S. Li, B. Xu, Fault interference and coupling effect, in: Proceedings of the International Conference on Software Engineering & Knowledge Engineering, 2017.

[59] N. DiGiuseppe, J.A. Jones, Fault density, fault types, and spectra-based fault localization, Empir. Softw. Eng. 20 (4) (2014) 928–967.

[60] N. DiGiuseppe, J.A. Jones, Software behavior and failure clustering: an empirical study of fault causality, in: Proceedings of the International Conference on Software Testing, Verification and Validation, 2012.

[61] N. DiGiuseppe, J.A. Jones, On the influence of multiple faults on coverage-based fault localization, in: Proceedings of the International Symposium on Software Testing and Analysis, 2011.

[62] X. Xue, A.S. Namin, How significant is the effect of fault interactions on coverage-based fault localizations? in: Proceedings of the International Symposium on Empirical Software Engineering and Measurement, 2013.