Applying Smoothing Filters to Improve
IR-based Traceability Recovery Processes: An
Empirical Investigation
Andrea De Lucia∗, Massimiliano Di Penta†, Rocco Oliveto‡, Annibale Panichella∗, Sebastiano
Panichella†
∗University of Salerno, Via Ponte don Melillo - 84084 Fisciano (SA), Italy
†University of Sannio, Viale Traiano - 82100 Benevento, Italy
‡University of Molise, C.da Fonte Lappone - 86090 Pesche (IS), Italy
November 7, 2011 DRAFT
Abstract
This technical report contains details on the case study reported in our paper “Applying Smoothing Filters to
Improve IR-based Traceability Recovery Processes: An Empirical Investigation”, submitted to the Information and
Software Technology journal, Special Issue on ICPC’11. Figures 1 to 7 show the Precision/Recall curves obtained
to answer our first research question, RQ1: To what extent do smoothing filters improve the accuracy of traceability
recovery methods? Tables I and II report the statistical tests performed to answer our second research question, RQ2:
How effective is the smoothing filter in filtering out non-relevant words, as compared to stop word removal?
Fig. 1. Precision/Recall curves achieved tracing UC onto CC on EasyClinic.
Panels: (a) VSM, (b) LSI, (c) JS. Axes: Recall (x, 0 to 100) vs. Precision (y, 0 to 100).
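As an illustration of how a Precision/Recall curve such as those in the figures can be derived from a ranked list of candidate links, the following is a minimal sketch. It is not the procedure used in the study: the function name, the link identifiers, and the choice to record a point each time a correct link is retrieved are assumptions made for this example.

```python
from typing import List, Set, Tuple


def precision_recall_curve(
    ranked_links: List[str], true_links: Set[str]
) -> List[Tuple[float, float]]:
    """Return (recall %, precision %) points for a ranked list of candidate links.

    A point is recorded each time a correct link is met while scanning the
    ranked list from the top (an assumption of this sketch).
    """
    if not true_links:
        raise ValueError("true_links must not be empty")
    points = []
    hits = 0
    for cut, link in enumerate(ranked_links, start=1):
        if link in true_links:
            hits += 1
            recall = 100.0 * hits / len(true_links)  # correct retrieved / all correct
            precision = 100.0 * hits / cut           # correct retrieved / all retrieved
            points.append((recall, precision))
    return points


# Hypothetical ranked list (most similar first) and hypothetical oracle of true links.
ranked = ["UC1-C3", "UC1-C7", "UC2-C3", "UC2-C1"]
truth = {"UC1-C3", "UC2-C3"}
curve = precision_recall_curve(ranked, truth)
```

With these hypothetical inputs, the first correct link is found at rank 1 (recall 50, precision 100) and the second at rank 3 (recall 100, precision about 66.7).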
Fig. 2. Precision/Recall curves achieved tracing ID onto CC on EasyClinic.
Panels: (a) VSM, (b) LSI, (c) JS. Axes: Recall (x, 0 to 100) vs. Precision (y, 0 to 100).
Fig. 3. Precision/Recall curves achieved tracing test cases (TC) onto CC on EasyClinic.
Panels: (a) VSM, (b) LSI, (c) JS. Axes: Recall (x, 0 to 100) vs. Precision (y, 0 to 100).
Fig. 4. Precision/Recall curves achieved tracing UC onto CC on e-Tour.
Panels: (a) VSM, (b) LSI, (c) JS. Axes: Recall (x, 0 to 100) vs. Precision (y, 0 to 100).
Fig. 5. Precision/Recall curves achieved tracing HLR onto LLR on Modis.
Panels: (a) VSM, (b) LSI, (c) JS. Axes: Recall (x, 0 to 100) vs. Precision (y, 0 to 100).
Fig. 6. Precision/Recall curves achieved tracing HLR onto UC on Pine.
Panels: (a) VSM, (b) LSI, (c) JS. Axes: Recall (x, 0 to 100) vs. Precision (y, 0 to 100).
Fig. 7. Precision/Recall curves achieved tracing HLR onto CC on iTrust.
Panels: (a) VSM, (b) LSI, (c) JS. Axes: Recall (x, 0 to 100) vs. Precision (y, 0 to 100).
TABLE I
COMPARISON OF DIFFERENT “NOISE REMOVAL” CONFIGURATIONS USING VSM: CLIFF’S d EFFECT SIZE FOR COMPARISONS WHERE THE
WILCOXON RANK SUM TEST INDICATES A SIGNIFICANT DIFFERENCE.
Comparison | A1 | A2 | A3 | A4 | A5 | A6 | A7
Customized stop word list vs. Standard stop word list | -0.29 | -0.52 | -0.71 | -0.19 | 0.65 | 0.08 | -0.16
Smoothing Filter vs. Standard stop word list | -0.24 | 0.21 | 0.68 | -0.08 | -0.23 | -0.06 | -0.08
Smoothing Filter vs. Customized stop word list | 0.36 | 0.22 | 0.72 | 0.52 | -0.23 | -0.07 | 0
Standard stop word list + Smoothing Filter vs. Standard stop word list | -0.18 | 0.47 | -0.73 | -0.17 | 0.01 | 0.55 | 0.01
Standard stop word list + Smoothing Filter vs. Customized stop word list | 0.8 | 0.47 | -0.56 | 0.34 | 0.01 | 0.56 | 0.15
Standard stop word list + Smoothing Filter vs. Smoothing Filter | -0.13 | 0.55 | -2.46 | -0.17 | 0.63 | 0.32 | 0.2
Customized stop word list + Smoothing Filter vs. Standard stop word list | 0.62 | 0.47 | -0.41 | 0.28 | 0.51 | 0.59 | 0.09
Customized stop word list + Smoothing Filter vs. Customized stop word list | 0.51 | 0.47 | -0.23 | 0.47 | 0.51 | 0.58 | 0.21
Customized stop word list + Smoothing Filter vs. Smoothing Filter | 0.72 | 0.54 | -2.33 | 0.28 | 0.46 | 0.38 | 0.25
Customized stop word list + Smoothing Filter vs. Standard stop word list + Smoothing Filter | 0.34 | 0.16 | 0.24 | 0.36 | 0.33 | 0.11 | 0.16
TABLE II
COMPARISON OF DIFFERENT “NOISE REMOVAL” CONFIGURATIONS USING JS: CLIFF’S d EFFECT SIZE FOR COMPARISONS WHERE THE
WILCOXON RANK SUM TEST INDICATES A SIGNIFICANT DIFFERENCE.
Comparison | A1 | A2 | A3 | A4 | A5 | A6 | A7
Customized stop word list vs. Standard stop word list | -0.06 | -0.32 | 0.7 | -0.21 | 1.1 | 0.17 | 0.23
Smoothing Filter vs. Standard stop word list | 0.04 | -0.54 | -0.27 | -0.08 | -0.48 | 0.62 | 0.02
Smoothing Filter vs. Customized stop word list | 0.06 | -0.54 | -0.53 | 0.54 | -0.49 | 0.59 | 0.11
Standard stop word list + Smoothing Filter vs. Standard stop word list | 0.59 | 0.46 | 0.3 | -0.19 | -0.39 | 0.66 | 0.17
Standard stop word list + Smoothing Filter vs. Customized stop word list | 0.57 | 0.52 | -0.39 | 0.53 | -0.41 | 0.63 | 0.29
Standard stop word list + Smoothing Filter vs. Smoothing Filter | 0.31 | 1.08 | 1.07 | -0.17 | 0.38 | 0.29 | 0.11
Customized stop word list + Smoothing Filter vs. Standard stop word list | 0.58 | 0.46 | 0.27 | 0.26 | 0.32 | 0.62 | 0.18
Customized stop word list + Smoothing Filter vs. Customized stop word list | 0.56 | 0.52 | -0.47 | 0.52 | 0.45 | 0.58 | 0.28
Customized stop word list + Smoothing Filter vs. Smoothing Filter | 0.31 | 1.06 | 1.02 | 0.29 | 0.82 | 0.21 | 0.25
Customized stop word list + Smoothing Filter vs. Standard stop word list + Smoothing Filter | 0.06 | 0.06 | -0.27 | 0.42 | 0.65 | -0.05 | 0.07
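
For readers unfamiliar with the effect size named in the table captions, the sketch below computes Cliff's d as it is usually defined: the proportion of pairs in which a value from the first sample exceeds one from the second, minus the proportion in which it is smaller, which by construction lies between -1 and 1. The function name and the sample values are assumptions made for illustration and are not data from the study. The Wilcoxon rank sum test is not implemented here.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's d: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys))."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical per-link measurements for two "noise removal" configurations.
config_a = [3.0, 4.0, 5.0]
config_b = [1.0, 2.0, 3.0]
d = cliffs_delta(config_a, config_b)
```

In this hypothetical example, 8 of the 9 pairs favour the first sample and one pair is tied, so d is 8/9. A positive value means the first sample tends to have larger values and a negative value means the opposite.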