

Applying Smoothing Filters to Improve IR-based Traceability Recovery Processes: An Empirical Investigation

Andrea De Lucia∗, Massimiliano Di Penta†, Rocco Oliveto‡, Annibale Panichella∗, Sebastiano Panichella†

∗University of Salerno, Via Ponte don Melillo - 84084 Fisciano (SA), Italy

†University of Sannio, Viale Traiano - 82100 Benevento, Italy

‡University of Molise, C.da Fonte Lappone - 86090 Pesche (IS), Italy

[email protected], [email protected], [email protected], [email protected], [email protected]

November 7, 2011 DRAFT


Abstract

This technical report contains details on the case study reported in our paper “Applying Smoothing Filters to Improve IR-based Traceability Recovery Processes: An Empirical Investigation”, submitted to the Information and Software Technology journal (Special Issue ICPC’11). Figures 1 to 7 show the Precision/Recall curves obtained to answer our first research question, RQ1: To what extent do smoothing filters improve the accuracy of traceability recovery methods? Tables I and II report the statistical tests performed to answer our second research question, RQ2: How effective is the smoothing filter in filtering out non-relevant words, as compared to stop word removal?


[Figure: three Precision/Recall plots, with Recall on the x-axis and Precision on the y-axis, both ranging from 0 to 100. Panels: (a) Precision/Recall curve achieved by VSM; (b) Precision/Recall curve achieved by LSI; (c) Precision/Recall curve achieved by JS.]

Fig. 1. Precision/Recall curves achieved tracing UC onto CC on EasyClinic.
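The curves in these figures plot precision against recall over a ranked list of candidate links. The following is a minimal sketch of how such points could be computed, assuming a ranking ordered by decreasing similarity and an oracle of correct links. The function name `precision_recall_points` and the sample artifacts are hypothetical and are not taken from the report.

```python
from typing import Iterable, List, Set, Tuple

Link = Tuple[str, str]  # (source artifact, target artifact)


def precision_recall_points(
    ranked_links: Iterable[Link], true_links: Set[Link]
) -> List[Tuple[float, float]]:
    """Return (recall %, precision %) measured at each correct link in the ranking."""
    points = []
    correct = 0
    for cut, link in enumerate(ranked_links, start=1):
        if link in true_links:
            correct += 1
            # Recall: share of all correct links retrieved so far.
            recall = 100.0 * correct / len(true_links)
            # Precision: share of retrieved links that are correct.
            precision = 100.0 * correct / cut
            points.append((recall, precision))
    return points


# Hypothetical ranked list (most similar first) and oracle of correct links.
ranked = [("UC1", "A"), ("UC1", "B"), ("UC2", "A"), ("UC2", "C")]
oracle = {("UC1", "A"), ("UC2", "C")}
for recall, precision in precision_recall_points(ranked, oracle):
    print(f"recall={recall:.0f}% precision={precision:.0f}%")
```

Sampling the curve only at correct links is one common convention; other cut strategies (for example, fixed similarity thresholds) would give different points along the same ranking.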


[Figure: three Precision/Recall plots, with Recall on the x-axis and Precision on the y-axis, both ranging from 0 to 100. Panels: (a) Precision/Recall curve achieved by VSM; (b) Precision/Recall curve achieved by LSI; (c) Precision/Recall curve achieved by JS.]

Fig. 2. Precision/Recall curves achieved tracing ID onto CC on EasyClinic.


[Figure: three Precision/Recall plots, with Recall on the x-axis and Precision on the y-axis, both ranging from 0 to 100. Panels: (a) Precision/Recall curve achieved by VSM; (b) Precision/Recall curve achieved by LSI; (c) Precision/Recall curve achieved by JS.]

Fig. 3. Precision/Recall curves achieved tracing TC onto CC on EasyClinic.


[Figure: three Precision/Recall plots, with Recall on the x-axis and Precision on the y-axis, both ranging from 0 to 100. Panels: (a) Precision/Recall curve achieved by VSM; (b) Precision/Recall curve achieved by LSI; (c) Precision/Recall curve achieved by JS.]

Fig. 4. Precision/Recall curves achieved tracing UC onto CC on e-Tour.


[Figure: three Precision/Recall plots, with Recall on the x-axis and Precision on the y-axis, both ranging from 0 to 100. Panels: (a) Precision/Recall curve achieved by VSM; (b) Precision/Recall curve achieved by LSI; (c) Precision/Recall curve achieved by JS.]

Fig. 5. Precision/Recall curves achieved tracing HLR onto LLR on Modis.


[Figure: three Precision/Recall plots, with Recall on the x-axis and Precision on the y-axis, both ranging from 0 to 100. Panels: (a) Precision/Recall curve achieved by VSM; (b) Precision/Recall curve achieved by LSI; (c) Precision/Recall curve achieved by JS.]

Fig. 6. Precision/Recall curves achieved tracing HLR onto UC on Pine.


[Figure: three Precision/Recall plots, with Recall on the x-axis and Precision on the y-axis, both ranging from 0 to 100. Panels: (a) Precision/Recall curve achieved by VSM; (b) Precision/Recall curve achieved by LSI; (c) Precision/Recall curve achieved by JS.]

Fig. 7. Precision/Recall curves achieved tracing HLR onto CC on iTrust.


TABLE I

COMPARISON OF DIFFERENT “NOISE REMOVAL” CONFIGURATIONS USING VSM: CLIFF’S d EFFECT SIZE FOR COMPARISONS WHERE THE WILCOXON RANK SUM TEST INDICATES A SIGNIFICANT DIFFERENCE.

| Comparison | A1 | A2 | A3 | A4 | A5 | A6 | A7 |
|---|---|---|---|---|---|---|---|
| Customized stop word list vs. Standard stop word list | -0.29 | -0.52 | -0.71 | -0.19 | 0.65 | 0.08 | -0.16 |
| Smoothing Filter vs. Standard stop word list | -0.24 | 0.21 | 0.68 | -0.08 | -0.23 | -0.06 | -0.08 |
| Smoothing Filter vs. Customized stop word list | 0.36 | 0.22 | 0.72 | 0.52 | -0.23 | -0.07 | 0 |
| Standard stop word list + Smoothing Filter vs. Standard stop word list | -0.18 | 0.47 | -0.73 | -0.17 | 0.01 | 0.55 | 0.01 |
| Standard stop word list + Smoothing Filter vs. Customized stop word list | 0.8 | 0.47 | -0.56 | 0.34 | 0.01 | 0.56 | 0.15 |
| Standard stop word list + Smoothing Filter vs. Smoothing Filter | -0.13 | 0.55 | -2.46 | -0.17 | 0.63 | 0.32 | 0.2 |
| Customized stop word list + Smoothing Filter vs. Standard stop word list | 0.62 | 0.47 | -0.41 | 0.28 | 0.51 | 0.59 | 0.09 |
| Customized stop word list + Smoothing Filter vs. Customized stop word list | 0.51 | 0.47 | -0.23 | 0.47 | 0.51 | 0.58 | 0.21 |
| Customized stop word list + Smoothing Filter vs. Smoothing Filter | 0.72 | 0.54 | -2.33 | 0.28 | 0.46 | 0.38 | 0.25 |
| Customized stop word list + Smoothing Filter vs. Standard stop word list + Smoothing Filter | 0.34 | 0.16 | 0.24 | 0.36 | 0.33 | 0.11 | 0.16 |
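The effect sizes in this table are Cliff's d values. As a minimal sketch of the standard definition (not the authors' script), d can be computed from two samples by counting, over all pairs, how often a value from the first sample exceeds or falls below a value from the second. The function name `cliffs_delta` and the sample values below are hypothetical.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's d = (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys))."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    smaller = sum(1 for x in xs for y in ys if x < y)
    return (greater - smaller) / (len(xs) * len(ys))


# Hypothetical measurements for two configurations (illustrative values only).
config_a = [3, 4, 5]
config_b = [1, 2, 3]
print(round(cliffs_delta(config_a, config_b), 2))
```

With this definition the result always lies between -1 and 1: the sign tells which sample tends to hold the larger values, and 0 means neither dominates. The pairwise count is quadratic in the sample sizes, which is acceptable for a sketch but may need a rank-based formulation for large samples.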

TABLE II

COMPARISON OF DIFFERENT “NOISE REMOVAL” CONFIGURATIONS USING JS: CLIFF’S d EFFECT SIZE FOR COMPARISONS WHERE THE WILCOXON RANK SUM TEST INDICATES A SIGNIFICANT DIFFERENCE.

| Comparison | A1 | A2 | A3 | A4 | A5 | A6 | A7 |
|---|---|---|---|---|---|---|---|
| Customized stop word list vs. Standard stop word list | -0.06 | -0.32 | 0.7 | -0.21 | 1.1 | 0.17 | 0.23 |
| Smoothing Filter vs. Standard stop word list | 0.04 | -0.54 | -0.27 | -0.08 | -0.48 | 0.62 | 0.02 |
| Smoothing Filter vs. Customized stop word list | 0.06 | -0.54 | -0.53 | 0.54 | -0.49 | 0.59 | 0.11 |
| Standard stop word list + Smoothing Filter vs. Standard stop word list | 0.59 | 0.46 | 0.3 | -0.19 | -0.39 | 0.66 | 0.17 |
| Standard stop word list + Smoothing Filter vs. Customized stop word list | 0.57 | 0.52 | -0.39 | 0.53 | -0.41 | 0.63 | 0.29 |
| Standard stop word list + Smoothing Filter vs. Smoothing Filter | 0.31 | 1.08 | 1.07 | -0.17 | 0.38 | 0.29 | 0.11 |
| Customized stop word list + Smoothing Filter vs. Standard stop word list | 0.58 | 0.46 | 0.27 | 0.26 | 0.32 | 0.62 | 0.18 |
| Customized stop word list + Smoothing Filter vs. Customized stop word list | 0.56 | 0.52 | -0.47 | 0.52 | 0.45 | 0.58 | 0.28 |
| Customized stop word list + Smoothing Filter vs. Smoothing Filter | 0.31 | 1.06 | 1.02 | 0.29 | 0.82 | 0.21 | 0.25 |
| Customized stop word list + Smoothing Filter vs. Standard stop word list + Smoothing Filter | 0.06 | 0.06 | -0.27 | 0.42 | 0.65 | -0.05 | 0.07 |
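Both tables report effect sizes only where the Wilcoxon rank sum test indicates a significant difference. The sketch below illustrates a two-sided version of that test using the normal approximation. It is an assumption-laden illustration rather than the procedure used in the report: the function name `rank_sum_test` is hypothetical, ties receive average ranks, and no tie or continuity correction is applied.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank sum test (normal approximation). Returns (z, p)."""
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    # Mean and standard deviation of the rank sum under the null hypothesis.
    mean = n * (n + m + 1) / 2.0
    std = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (rank_sum_x - mean) / std
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return z, p


# Hypothetical samples (illustrative values only).
z, p = rank_sum_test([1, 2, 3], [4, 5, 6])
print(f"z={z:.3f}, p={p:.4f}")
```

The normal approximation is only reasonable for moderately large samples; for small samples an exact test would be the safer choice.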
