Automatic Fine-Grained Issue Report Reclassification

Pavneet Singh Kochhar, Ferdian Thung, David Lo
Singapore Management University
{kochharps.2012, ferdiant.2013, davidlo}@smu.edu.sg

Uploaded by pavneet-singh-kochhar on 09-Feb-2017


Page 1: Automatic Fine-Grained Issue Report Reclassification


Page 2: Misclassification of Issue Reports

• Herzig et al.*: about 40% of issue reports are misclassified, and about one in three reports filed as bugs is not actually a bug.

[Figure: reports labelled BUG and the other categories they may actually belong to: DOCUMENTATION, IMPROVEMENT, REFACTORING, BACKPORT, CLEANUP, DESIGN DEFECT, TASK, TEST]

* K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” in ICSE, 2013.

Page 3: Impact of Misclassification

• Well-known projects receive a large number of issue reports.

• The volume of bug reports can overwhelm the available developers.

• A Mozilla developer: “Everyday, almost 300 bugs appear that need triaging.” *

• Triage is a manual process.

• Misclassified reports take more time to fix.+

* J. Anvik, L. Hiew, and G. C. Murphy, “Coping with an open bug repository,” in ETX, pp. 35–39, 2005.
+ X. Xia, D. Lo, M. Wen, E. Shihab, and B. Zhou, “An empirical study of bug report field reassignment,” in CSMR-WCRE, pp. 174–183, 2014.


Page 4: Related Work

• Herzig et al. [1]: manually classified over 7000 issue reports into 14 different categories.

We use the same dataset, with 13 categories (merging UNKNOWN and OTHERS).

• Antoniol et al. [2]: classify issue reports as either “bug” or “enhancement”.

We address the “reclassification” problem, with 13 different categories.

[1] K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” in ICSE, 2013.
[2] G. Antoniol, K. Ayari, M. Di Penta, F. Khomh, and Y.-G. Gueheneuc, “Is it a bug or an enhancement? A text-based approach to classify change requests,” in CASCON, pp. 23:304–23:318, 2008.


Page 5: Our Study

Fine-Grained Issue Report Reclassification

13 Categories*:

BUG, RFE, IMPROVEMENT, DOCUMENTATION, TASK, BUILD, REFACTORING, DESIGN DEFECT, TEST, CLEANUP, BACKPORT, SPECIFICATION, OTHERS

(Adaptive Maintenance) (Perfective Maintenance) (Deallocating memory) (Removing Duplicate methods)

* K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” in ICSE, 2013.

Page 6: Overall Framework

[Figure: Training Phase: training issue reports and their ground-truth categories* → feature extraction → model building → model. Deployment Phase: new issue reports → feature extraction → model → predicted reclassified categories.]

*Ground-truth categories from Herzig et al.

Page 7: Pre-Processing

• Text pre-processing of the Summary and Description fields

• Stop-word removal, e.g., “is”, “are”, “if”

• Stemming (reducing words to their root form), e.g., “reads” and “reading” → “read”, using the Porter stemmer*

*http://tartarus.org/martin/PorterStemmer/
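The pre-processing steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The stop-word list is a small hypothetical sample. `naive_stem` is a deliberately crude suffix stripper that stands in for the Porter stemmer; it is not Porter's algorithm, and a real pipeline would call an existing Porter implementation instead.

```python
import re

# Hypothetical, abbreviated stop-word list; the slide only names "is", "are", "if".
STOP_WORDS = {"a", "an", "the", "is", "are", "if", "and", "or", "of", "to", "in", "on", "it"}

# Crude suffix stripping as a stand-in for the Porter stemmer (NOT Porter's algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(summary: str, description: str) -> list[str]:
    """Lower-case and tokenize Summary + Description, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z0-9]+", f"{summary} {description}".lower())
    return [naive_stem(tok) for tok in tokens if tok not in STOP_WORDS]


print(preprocess("Reading fails", "The parser reads and is failing"))
```

Running it prints `['read', 'fail', 'parser', 'read', 'fail']`. As in the slide's example, both “reads” and “reading” reduce to “read”.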


Page 8: Feature Extraction

1. TF-IDF (TF: term frequency; IDF: inverse document frequency)

2. Reported Category (C1-C13): Cn = 1 if the reported category is the n-th category and 0 otherwise, for n = 1 to 13
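The first two feature groups can be sketched in Python as follows. The slides do not specify the TF-IDF variant. This sketch therefore assumes term count divided by document length, multiplied by log(N / document frequency). The C1..C13 index order in `CATEGORIES` follows the list on Page 5, which is also an assumption.

```python
import math
from collections import Counter

# The 13 categories as listed on Page 5; this index order (C1..C13) is an assumption.
CATEGORIES = ["BUG", "RFE", "IMPROVEMENT", "DOCUMENTATION", "TASK", "BUILD",
              "REFACTORING", "DESIGN DEFECT", "TEST", "CLEANUP", "BACKPORT",
              "SPECIFICATION", "OTHERS"]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by (count / document length) * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({term: (c / len(doc)) * math.log(n_docs / doc_freq[term])
                        for term, c in counts.items()})
    return vectors


def reported_category_features(reported: str) -> list[int]:
    """One-hot features C1..C13: Cn = 1 only for the category the reporter assigned."""
    return [1 if reported == cat else 0 for cat in CATEGORIES]


vectors = tf_idf([["test", "fail"], ["test", "doc"]])
print(vectors[0])                          # "test" occurs in every report, so its weight is 0.0
print(reported_category_features("TEST"))  # 1 only in the TEST position
```

A term that occurs in every report gets an IDF of log(1) = 0, so it cannot help separate categories. The one-hot encoding lets the classifier learn how often each reported category is kept or changed.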


Page 9: Feature Extraction

3. Exception Trace (S)
 a) Phrase: “Exception in thread”
 b) Regex: [A-Za-z0-9$.]+Exception, e.g., java.lang.NullPointerException
 c) Regex: [A-Za-z0-9$.]+[A-Za-z0-9]+([A-Za-z0-9]+(java:[0-9]+)?), e.g., oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:447)

4. Issue Reporter (R1-RM), where M is the total number of reporters
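The exception-trace and reporter features can be sketched in Python as follows. Two points are assumptions. First, S is treated as a binary flag. Second, pattern (c) as printed in the transcript cannot match literal parentheses, so the sketch assumes its backslash escapes were lost in extraction and restores `\(`, `\)` and `\.`. The helper names are hypothetical.

```python
import re

# Patterns (a) and (b) as on the slide. Pattern (c) assumes the transcript lost the
# backslash escapes, so literal "(", ")" and "." are re-escaped here.
PHRASE = "Exception in thread"
EXCEPTION_NAME = re.compile(r"[A-Za-z0-9$.]+Exception")
STACK_FRAME = re.compile(r"[A-Za-z0-9$.]+[A-Za-z0-9]+\([A-Za-z0-9]+(\.java:[0-9]+)?\)")


def exception_trace_feature(text: str) -> int:
    """Binary feature S: 1 if the phrase or either regex occurs in the report text."""
    found = PHRASE in text or EXCEPTION_NAME.search(text) or STACK_FRAME.search(text)
    return 1 if found else 0


def reporter_features(reporter: str, all_reporters: list[str]) -> list[int]:
    """One-hot features R1..RM over the M reporters seen in the training reports."""
    return [1 if reporter == r else 0 for r in all_reporters]


print(exception_trace_feature("at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:447)"))
print(exception_trace_feature("Please update the docs"))
print(reporter_features("starksm", ["alice", "starksm", "bob"]))
```

The three calls print `1`, `0` and `[0, 1, 0]`. A reporter who did not appear in the training reports gets an all-zero reporter vector.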


Page 10: Model Building

• LibSVM (Support Vector Machine)* for multi-class classification

• Inputs:
  • L, the learner (training algorithm)
  • X, the set of training data, i.e., issue reports
  • y, the labels, where $y_i \in \{1, \dots, 13\}$, i.e., the 13 categories

• Output: a list of classifiers $f_k$ for $k \in \{1, \dots, 13\}$

• The classifiers are applied to unseen data to predict a label $k$

*http://www.csie.ntu.edu.tw/~cjlin/libsvm/
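The inputs and outputs above can be illustrated with a small one-vs-rest linear SVM in plain Python. This is a sketch and not the authors' setup:

* It swaps in Pegasos-style stochastic subgradient training for LibSVM's solver.
* It uses a linear model.
* It fixes hypothetical hyperparameters `lam` and `epochs`.

LibSVM itself combines one-vs-one classifiers for multi-class problems. The one-vs-rest scheme here mirrors the per-category classifiers described on the slide.

```python
# Stand-in for LibSVM: linear SVMs trained with Pegasos-style subgradient steps and
# combined one-vs-rest. LibSVM itself solves the SVM dual (with kernels) and combines
# one-vs-one classifiers for multi-class problems, so this is NOT LibSVM.


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_binary_svm(X, y, lam=0.1, epochs=100):
    """Minimise lam/2 * ||w||^2 + mean hinge loss; labels y are +1/-1.

    A constant 1.0 is appended to each x, so the last weight acts as a
    (regularised) bias term.
    """
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                   # Pegasos step size
            xb = list(x) + [1.0]
            violated = label * dot(w, xb) < 1       # inside the margin before this step?
            w = [(1 - eta * lam) * wi for wi in w]  # shrink (regularisation step)
            if violated:
                w = [wi + eta * label * xi for wi, xi in zip(w, xb)]
    return w


def train_one_vs_rest(X, labels, lam=0.1, epochs=100):
    """One classifier f_k per category k: category k (+1) versus all others (-1)."""
    return {k: train_binary_svm(X, [1 if lab == k else -1 for lab in labels], lam, epochs)
            for k in sorted(set(labels))}


def predict(models, x):
    """Assign the category whose classifier f_k gives the highest score."""
    xb = list(x) + [1.0]
    return max(models, key=lambda k: dot(models[k], xb))


X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels = ["BUG", "RFE", "TEST"]
models = train_one_vs_rest(X, labels)
print(predict(models, [0.9, 0.1, 0.0]))
```

On this toy data the script should print `BUG`. In the real setting, each row of `X` would be the concatenated TF-IDF, reported-category, exception-trace and reporter features of one report.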

Page 11: Dataset

Project | Organization | Tracker | Number of Issue Reports
HTTPClient | Apache | JIRA | 746
Jackrabbit | Apache | JIRA | 2402
Lucene-Java | Apache | JIRA | 2443
Rhino | Mozilla | Bugzilla | 1226
Tomcat5 | Apache | Bugzilla | 584
Total | | | 7401*

* K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” in ICSE, 2013.

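Page 12: Evaluation Metrics

For each category $k$, let $TP_k$, $FP_k$ and $FN_k$ be its true positives, false positives and false negatives, $n_k$ the number of reports whose ground-truth category is $k$, and $N$ the total number of reports.

(Precision) $P_k = \frac{TP_k}{TP_k + FP_k}$

(Recall) $R_k = \frac{TP_k}{TP_k + FN_k}$

(F-Measure) $F_k = \frac{2 P_k R_k}{P_k + R_k}$

(Weighted F-Measure) $WF1 = \sum_{k} \frac{n_k}{N} F_k$

We use weighted precision, recall and F-measure, each averaging the per-category scores with weights $n_k / N$.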
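The weighted metrics can be computed directly from the true and predicted labels. This is a minimal sketch. It assumes a category that is never predicted contributes a precision of 0, and `weighted_prf` is a hypothetical helper name. Weighted recall simplifies to plain accuracy, because $\sum_k \frac{n_k}{N} \cdot \frac{TP_k}{n_k} = \frac{\sum_k TP_k}{N}$.

```python
from collections import Counter


def weighted_prf(y_true: list[str], y_pred: list[str]) -> tuple[float, float, float]:
    """Per-category precision, recall and F-measure, averaged with weights n_k / N."""
    n = len(y_true)
    support = Counter(y_true)     # n_k: reports whose true category is k
    predicted = Counter(y_pred)   # TP_k + FP_k: reports predicted as k
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    w_prec = w_rec = w_f1 = 0.0
    for k, n_k in support.items():
        prec = true_pos[k] / predicted[k] if predicted[k] else 0.0  # 0 if k never predicted
        rec = true_pos[k] / n_k
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = n_k / n
        w_prec += weight * prec
        w_rec += weight * rec
        w_f1 += weight * f1
    return w_prec, w_rec, w_f1


print(weighted_prf(["BUG", "BUG", "RFE", "TEST"], ["BUG", "RFE", "RFE", "TEST"]))
```

For this small example the result is roughly (0.875, 0.75, 0.75). The weighted recall of 0.75 equals the share of correctly classified reports, 3 of 4.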

Page 13: Baselines

• Baseline-1: predicts the reclassified category to be the same as the category assigned by the reporter

• Baseline-2: predicts the reclassified category of every report as “BUG” (BUG is the most frequent category)
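Both baselines are one-liners. Baseline-2 also has a closed form, which is our own derivation. It assumes a fraction p of reports are truly BUG and that never-predicted categories score a precision of 0. Under those assumptions, weighted precision is p², weighted recall is p, and weighted F-measure is p · 2p / (1 + p). With p = 0.40 this gives 0.16, 0.40 and 0.23, matching the HTTPClient Baseline-2 row of the RQ1 results. That match suggests BUG covers about 40% of HTTPClient reports, a plurality rather than a majority. The function names are hypothetical.

```python
def baseline_1(reported: list[str]) -> list[str]:
    """Baseline-1: the reclassified category is the category the reporter assigned."""
    return list(reported)


def baseline_2(reported: list[str]) -> list[str]:
    """Baseline-2: every report is reclassified as BUG."""
    return ["BUG"] * len(reported)


def baseline_2_scores(bug_share: float) -> tuple[float, float, float]:
    """Closed-form weighted precision, recall and F-measure of Baseline-2.

    Only the BUG category gets non-zero scores: its precision equals bug_share,
    its recall is 1, and both are weighted by bug_share.
    """
    bug_f1 = 2 * bug_share * 1.0 / (bug_share + 1.0)
    return bug_share * bug_share, bug_share * 1.0, bug_share * bug_f1


print([round(v, 2) for v in baseline_2_scores(0.40)])  # [0.16, 0.4, 0.23]
```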


Page 14: Research Questions

RQ1: Effectiveness of Our Approach

RQ2: Varying the Amount of Training Data

RQ3: Most Discriminative Features

RQ4: Analysis of Correctly & Wrongly Classified Issue Reports

RQ5: Comparison to Other Classification Algorithms


Page 15: RQ1: Effectiveness of Our Approach

Approach | HTTPClient Prec Rec WF1 | Jackrabbit Prec Rec WF1 | Lucene-Java Prec Rec WF1
Ours | 0.61 0.63 0.60 | 0.71 0.72 0.71 | 0.62 0.63 0.62
Baseline-1 | 0.54 0.52 0.43 | 0.61 0.62 0.54 | 0.50 0.50 0.43
Baseline-2 | 0.16 0.40 0.23 | 0.15 0.39 0.21 | 0.08 0.28 0.12
Improvement-1 (%) | 12.96 21.15 39.53 | 16.39 16.12 31.48 | 24.00 26.00 44.18
Improvement-2 (%) | 281.2 57.4 160.8 | 373.3 84.6 238.0 | 675.0 125.0 416.6

Approach | Rhino Prec Rec WF1 | Tomcat5 Prec Rec WF1
Ours | 0.58 0.61 0.57 | 0.58 0.62 0.58
Baseline-1 | 0.35 0.57 0.43 | 0.36 0.58 0.45
Baseline-2 | 0.26 0.51 0.35 | 0.30 0.54 0.38
Improvement-1 (%) | 65.71 7.01 32.55 | 61.11 6.89 28.88
Improvement-2 (%) | 123.0 19.6 62.85 | 93.3 14.8 52.63

Improvement-1 and Improvement-2 are the relative improvements (%) of our approach over Baseline-1 and Baseline-2.
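The Improvement rows are relative gains, (ours - baseline) / baseline × 100. For example, HTTPClient's weighted F-measure of 0.60 against Baseline-1's 0.43 gives about 39.53%. The table values appear to be truncated rather than rounded: 160.87 is shown as 160.8, and Tomcat5's 28.89 as 28.88. `improvement` is a hypothetical helper name.

```python
def improvement(ours: float, baseline: float) -> float:
    """Relative improvement of our score over a baseline score, in percent."""
    return (ours - baseline) / baseline * 100


# HTTPClient weighted F-measure: ours 0.60, Baseline-1 0.43, Baseline-2 0.23.
print(round(improvement(0.60, 0.43), 2))   # 39.53
print(round(improvement(0.60, 0.23), 2))   # 160.87
```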

Page 16: RQ2: Varying Training Data

% of Issue Reports (training) | HTTPClient Prec Rec WF1 | Jackrabbit Prec Rec WF1 | Lucene-Java Prec Rec WF1
10 | 0.49 0.56 0.47 | 0.63 0.65 0.60 | 0.55 0.57 0.53
20 | 0.54 0.55 0.46 | 0.64 0.66 0.61 | 0.57 0.57 0.54
30 | 0.58 0.60 0.54 | 0.68 0.70 0.67 | 0.59 0.60 0.58
40 | 0.54 0.53 0.48 | 0.69 0.71 0.68 | 0.59 0.58 0.56
50 | 0.58 0.61 0.57 | 0.69 0.71 0.69 | 0.62 0.63 0.61
60 | 0.59 0.62 0.58 | 0.64 0.65 0.62 | 0.61 0.62 0.61
70 | 0.60 0.62 0.58 | 0.70 0.72 0.70 | 0.62 0.63 0.62
80 | 0.62 0.68 0.61 | 0.70 0.72 0.70 | 0.63 0.64 0.63
90 | 0.61 0.64 0.60 | 0.71 0.73 0.71 | 0.62 0.63 0.62

Page 17: RQ2: Varying Training Data

% of Issue Reports (training) | Rhino Prec Rec WF1 | Tomcat5 Prec Rec WF1
10 | 0.45 0.52 0.40 | 0.47 0.54 0.43
20 | 0.46 0.50 0.39 | 0.50 0.55 0.45
30 | 0.46 0.50 0.40 | 0.54 0.60 0.53
40 | 0.47 0.48 0.40 | 0.56 0.62 0.56
50 | 0.52 0.58 0.50 | 0.56 0.61 0.56
60 | 0.55 0.59 0.53 | 0.50 0.48 0.42
70 | 0.56 0.60 0.54 | 0.49 0.44 0.38
80 | 0.58 0.61 0.56 | 0.57 0.62 0.58
90 | 0.59 0.61 0.56 | 0.54 0.59 0.55

Page 18: RQ3: Most Discriminative Features

HTTPClient
Feature | Fisher Score
Stemmed word “test” | 1.73
Reported Category (TASK) | 0.58
Stemmed word “privat” | 0.56
Reported Category (BUG) | 0.54
Stemmed word “cleanup” | 0.50

Jackrabbit
Feature | Fisher Score
Reported Category (BUG) | 0.72
Stemmed word “test” | 0.55
Stemmed word “maven” | 0.51
Stemmed word “backport” | 0.46
Reported Category (IMPR) | 0.43

Page 19: RQ3: Most Discriminative Features

Lucene-Java
Feature | Fisher Score
Stemmed word “test” | 0.94
Reported Category (BUG) | 0.61
Reported Category (TEST) | 0.50
Stemmed word “backport” | 0.45
Stemmed word “remov” | 0.38

Rhino
Feature | Fisher Score
Stemmed word “test” | 3.84
Stemmed word “suit” | 0.43
Stemmed word “patch” | 0.32
Stemmed word “driver” | 0.29
Stemmed word “regress” | 0.27

Tomcat5
Feature | Fisher Score
Stemmed word “longer” | 1.15
Issue Reporter “starksm” | 0.71
Stemmed word “class” | 0.64
Stemmed word “ant” | 0.62
Reported Category (BUG) | 0.56
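The slides rank features by Fisher score but do not give the formula. This sketch assumes one common multi-class definition, computed separately for each feature. The between-category scatter $\sum_k n_k (\mu_k - \mu)^2$ is divided by the within-category scatter $\sum_k n_k \sigma_k^2$. A feature scores high when its values differ strongly between categories and vary little within each one. `fisher_score` is a hypothetical helper name.

```python
from collections import defaultdict


def fisher_score(values: list[float], labels: list[str]) -> float:
    """Between-category scatter of one feature divided by its within-category scatter.

    values[i] is the feature's value in report i and labels[i] its category.
    """
    overall_mean = sum(values) / len(values)
    by_category = defaultdict(list)
    for value, label in zip(values, labels):
        by_category[label].append(value)
    between = within = 0.0
    for vals in by_category.values():
        n_k = len(vals)
        mean_k = sum(vals) / n_k
        var_k = sum((v - mean_k) ** 2 for v in vals) / n_k
        between += n_k * (mean_k - overall_mean) ** 2
        within += n_k * var_k
    return between / within if within else float("inf")


labels = ["TEST", "TEST", "BUG", "BUG"]
print(fisher_score([1.0, 0.8, 0.0, 0.2], labels))  # large: values track the category
print(fisher_score([1.0, 0.0, 1.0, 0.0], labels))  # 0.0: both categories have the same mean
```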

Page 20: RQ4: Correctly & Wrongly Classified Reports

Confusion matrix (rows: ground-truth labels; columns: predicted labels). The table shows 8 of the 13 categories.

Ground Truth \ Predicted | BUG | RFE | IMPR | TEST | DOC | BUILD | CLEANUP | REFAC
BUG | 2631 | 48 | 119 | 26 | 23 | 8 | 8 | 1
RFE | 139 | 765 | 223 | 6 | 13 | 7 | 13 | 31
IMPR | 320 | 214 | 658 | 8 | 12 | 13 | 16 | 19
TEST | 84 | 12 | 15 | 220 | 1 | 8 | 4 | 3
DOC | 95 | 39 | 37 | 0 | 209 | 13 | 17 | 2
BUILD | 29 | 17 | 19 | 11 | 10 | 127 | 5 | 1
CLEANUP | 58 | 30 | 42 | 6 | 11 | 5 | 104 | 12
REFAC | 20 | 51 | 61 | 1 | 2 | 0 | 16 | 91

Correctly classified:
• BUG: 2631/2914 (90.3%)
• TEST: 220/349 (63%)
• RFE: 765/1221 (62.7%)
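The percentages below the matrix are per-category recall: the diagonal cell divided by the category's full row total. The slide shows only 8 of the 13 columns, so row totals such as the 2914 BUG reports must come from the slide rather than from summing the visible cells. The sketch also finds each category's most frequent wrong prediction among the shown columns. `recall` and `top_confusion` are hypothetical helper names.

```python
# Page 20 confusion matrix: rows are ground-truth labels, columns predicted labels.
LABELS = ["BUG", "RFE", "IMPR", "TEST", "DOC", "BUILD", "CLEANUP", "REFAC"]
MATRIX = [
    [2631, 48, 119, 26, 23, 8, 8, 1],
    [139, 765, 223, 6, 13, 7, 13, 31],
    [320, 214, 658, 8, 12, 13, 16, 19],
    [84, 12, 15, 220, 1, 8, 4, 3],
    [95, 39, 37, 0, 209, 13, 17, 2],
    [29, 17, 19, 11, 10, 127, 5, 1],
    [58, 30, 42, 6, 11, 5, 104, 12],
    [20, 51, 61, 1, 2, 0, 16, 91],
]


def recall(label: str, row_total: int) -> float:
    """Share of reports with ground truth `label` that were predicted as `label`."""
    i = LABELS.index(label)
    return MATRIX[i][i] / row_total


def top_confusion(label: str) -> tuple[str, int]:
    """Most frequent wrong prediction (among the 8 shown columns) for a ground-truth label."""
    i = LABELS.index(label)
    col = max((c for c in range(len(LABELS)) if c != i), key=lambda c: MATRIX[i][c])
    return LABELS[col], MATRIX[i][col]


print(f"{recall('BUG', 2914):.1%}")   # 90.3%
print(top_confusion("IMPR"))          # ('BUG', 320)
print(top_confusion("RFE"))           # ('IMPR', 223)
```

Read this way, IMPR reports are most often predicted as BUG, and RFE reports most often as IMPR.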


Page 22: RQ5: Comparison with Other Algorithms

Approach | HTTPClient Prec Rec WF1 | Jackrabbit Prec Rec WF1 | Lucene-Java Prec Rec WF1
Ours (LibSVM) | 0.61 0.63 0.60 | 0.71 0.72 0.71 | 0.62 0.63 0.62
Naïve Bayes | 0.49 0.47 0.48 | 0.51 0.39 0.43 | 0.46 0.37 0.40
NB Multinomial | 0.53 0.60 0.54 | 0.64 0.66 0.61 | 0.60 0.59 0.56
K-Nearest Neighbors | 0.47 0.29 0.34 | 0.60 0.58 0.59 | 0.46 0.40 0.42
Random Forest | 0.45 0.56 0.46 | 0.54 0.58 0.53 | 0.45 0.48 0.43
RBF Network | 0.37 0.39 0.37 | 0.39 0.41 0.40 | 0.31 0.31 0.30

Page 23: RQ5: Comparison with Other Algorithms

Approach | Rhino Prec Rec WF1 | Tomcat5 Prec Rec WF1
Ours (LibSVM) | 0.58 0.61 0.57 | 0.58 0.62 0.58
Naïve Bayes | 0.51 0.51 0.51 | 0.48 0.40 0.42
NB Multinomial | 0.52 0.58 0.49 | 0.51 0.58 0.47
K-Nearest Neighbors | 0.50 0.43 0.43 | 0.43 0.43 0.42
Random Forest | 0.51 0.56 0.47 | 0.45 0.56 0.46
RBF Network | 0.40 0.43 0.41 | 0.33 0.54 0.39

Page 24: Conclusion & Future Work

• Automated approach to reclassify issue reports
• Evaluated on over 7000 issue reports
• Extracted features: TF-IDF, reported category, exception trace, and issue reporter
• Performed multi-class classification (13 categories)
• Weighted F-measure of 0.57-0.71
• Improvement of 28.88%-416.66% in weighted F-measure over the baselines

Future Work:
• Analyse more issue reports
• Design more advanced multi-class solutions

Page 25: Thank You!
