![Page 1: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/1.jpg)
Potential Biases in Bug Localization: Do They Matter?
Pavneet Singh Kochhar, Yuan Tian, David Lo
Singapore Management University
{kochharps.2012, yuan.tian.2012, davidlo}@smu.edu.sg
![Page 2: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/2.jpg)
Issue Tracking
• Projects use issue tracking systems like JIRA
• Well-known projects receive a large number of issue reports
• The volume of bug reports can overwhelm the available developers
• Mozilla developer - “Everyday, almost 300 bugs appear that need triaging. This is far too much for only the Mozilla programmers to handle” *
What have researchers proposed to overcome this issue?
* J. Anvik, L. Hiew, and G. C. Murphy, “Coping with an open bug repository,” in ETX, pp. 35–39, 2005
![Page 3: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/3.jpg)
Bug Localization
Thousands of Source Code Files
GOAL: Find the buggy files
![Page 4: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/4.jpg)
How Bug Localization Works
• Uses fixed/closed bug reports
• Uses standard information retrieval (IR) techniques such as the vector space model (VSM)
• Computes similarity between bug reports & source code
• Returns a ranked list of potentially buggy source code files
• The returned list is compared with the actual buggy files to compute accuracy
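The ranking step above can be sketched with a plain TF-IDF vector space model. This is a minimal illustration, not the study's implementation: the lowercase alphanumeric tokenizer, the `ln(N / df) + 1` IDF variant, and the names `tokenize`, `tfidf_vectors`, `cosine`, and `rank_files` are all assumptions, and real bug localization tools typically also split identifiers, remove stop words, and stem.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document, plus the IDF table."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # One common IDF variant: ln(N / df) + 1, so terms found everywhere keep weight 1.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [
        {term: freq * idf[term] for term, freq in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_files(bug_report, files):
    """Rank file names by descending similarity to the bug report text.

    `files` maps each file name to its source code.
    """
    names = list(files)
    vectors, idf = tfidf_vectors([files[name] for name in names])
    # Weight the query with the corpus IDF; terms absent from the corpus are dropped.
    query = {t: f * idf[t] for t, f in Counter(tokenize(bug_report)).items() if t in idf}
    scores = {name: cosine(query, vec) for name, vec in zip(names, vectors)}
    return sorted(names, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    # Toy corpus; the file contents are made up for illustration.
    corpus = {
        "DecompressingEntity.java": "class DecompressingEntity { void writeTo(OutputStream outstream)"
        " { InputStream in = getContent(); } }",
        "HttpHost.java": "class HttpHost { String hostname; int port; }",
    }
    report = "DecompressingEntity does not close the InputStream retrieved by getContent"
    print(rank_files(report, corpus))
```

On this toy corpus the report shares `decompressingentity`, `inputstream`, and `getcontent` with the first file and nothing with the second, so `DecompressingEntity.java` is ranked first; comparing such a list against the actual buggy files is what the accuracy metrics measure.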
![Page 5: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/5.jpg)
Issues in Bug Localization
• Past study* shows:
  • Up to 80% of the bug reports can be localized by inspecting 5 source code files
  • Results are promising
HOWEVER
What if bug localization results are biased?
* Improving bug localization using structured information retrieval, R. K. Saha, M. Lease, S. Khurshid, and D. E. Perry, ASE 2013
![Page 6: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/6.jpg)
Our Study
Potential Biases in Bug Localization
1. Wrongly Classified Reports
   Herzig et al.*: 1/3 of reports marked as bugs are not bugs
2. Already Localized Reports
3. Incorrect Ground Truth Files
   Kawrykow et al.+: many changes are non-essential
* It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction, K. Herzig, S. Just, A. Zeller, ICSE 2013
+ Non-essential changes in version histories, D. Kawrykow and M. P. Robillard, ICSE 2011
![Page 8: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/8.jpg)
Dataset
| Project | Organization | Tracker | Number of Issue Reports |
|---|---|---|---|
| HTTPClient | Apache | JIRA | 746 |
| Jackrabbit | Apache | JIRA | 2402 |
| Lucene-Java | Apache | JIRA | 2443 |

Total = 5591 issue reports*
* It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction, K. Herzig, S. Just, A. Zeller, ICSE 2013
![Page 9: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/9.jpg)
Evaluation Metric
Average precision
Mean Average Precision (MAP) – Mean of average precisions over all ranked lists.
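The AP formula itself appears only in the slide image. One common definition, assumed here, sums the precision at every rank that holds an actual buggy file and divides by the number of actual buggy files, so AP = (Σ_k P(k) · rel(k)) / |buggy files|, where P(k) is the precision of the top k results and rel(k) is 1 if the file at rank k is buggy. A minimal Python sketch under that assumption (the function names are illustrative):

```python
def average_precision(ranked, buggy):
    """AP = (sum of P(k) at each rank k holding a buggy file) / number of buggy files.

    ranked: file names, best match first.
    buggy: set of actual buggy files (the ground truth).
    Buggy files that never appear in `ranked` contribute 0 to the sum.
    """
    if not buggy:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, name in enumerate(ranked, start=1):
        if name in buggy:
            hits += 1
            precision_sum += hits / k  # precision of the top k results
    return precision_sum / len(buggy)


def mean_average_precision(results):
    """MAP: mean AP over (ranked list, ground-truth set) pairs, one per bug report."""
    if not results:
        return 0.0
    return sum(average_precision(ranked, buggy) for ranked, buggy in results) / len(results)


example = [
    (["A", "B", "C", "D"], {"B", "D"}),  # AP = (1/2 + 2/4) / 2 = 0.5
    (["A", "B"], {"A"}),                 # AP = 1/1 = 1.0
]
print(mean_average_precision(example))  # 0.75
```

Because AP rewards buggy files placed near the top of the list, MAP is sensitive to reports whose text already names the buggy file, which is the concern examined under BIAS 2.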
![Page 10: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/10.jpg)
BIAS 1– Report Misclassification
Mean Average Precision (MAP) Scores

| Project | Reported | Actual | Difference | Cohen’s d |
|---|---|---|---|---|
| HTTPClient | 0.429 | 0.419 | -2.33% | 0.13 |
| Jackrabbit | 0.302 | 0.339 | 12.25%* | 0.06 |
| Lucene-Java | 0.301 | 0.322 | 6.98% | 0.04 |

• MAP scores differ by -2.33% to 12.25% between reported and actual issue types
* Statistically significant difference (Mann-Whitney-Wilcoxon test)
• Effect sizes are trivial (d < 0.2)
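The Difference column matches the relative change from the Reported to the Actual MAP score, (Actual - Reported) / Reported; the snippet below reproduces it from the table. It also sketches Cohen's d in its pooled-standard-deviation form, which is an assumption since the slides do not say which variant was used. In the study, d would be computed over per-report AP values, which are not listed on the slides, so the d example uses made-up numbers.

```python
import math
from statistics import mean, variance


def relative_difference(reported, actual):
    """Percentage change of the MAP score from reported to actual issue types."""
    return (actual - reported) / reported * 100.0


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (sample variances, n - 1)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# MAP scores from the table above: (reported, actual).
MAP_SCORES = {
    "HTTPClient": (0.429, 0.419),
    "Jackrabbit": (0.302, 0.339),
    "Lucene-Java": (0.301, 0.322),
}

for project, (reported, actual) in MAP_SCORES.items():
    # Prints -2.33%, +12.25%, and +6.98%, matching the Difference column.
    print(f"{project}: {relative_difference(reported, actual):+.2f}%")

# Made-up per-report AP values, only to show the call.
print(cohens_d([1, 2, 3], [2, 3, 4]))  # -1.0
```

The sign of d only records which sample has the larger mean; the magnitude is what the "trivial (d < 0.2)" judgement refers to.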
![Page 11: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/11.jpg)
BIAS 1– Report Misclassification
Mean Average Precision (MAP) Scores

| Actual to Reported | HC | JB | LJ | Overall |
|---|---|---|---|---|
| None | 0.429 | 0.302 | 0.301 | 0.312 |
| RFE to BUG | 0.427 | 0.303 | 0.304 | 0.313 |
| DOCUMENTATION to BUG | 0.430 | 0.304 | 0.305 | 0.315 |
| IMPROVEMENT to BUG | 0.416 | 0.299 | 0.295 | 0.307 |
| REFACTORING to BUG | 0.428 | 0.301 | 0.301 | 0.311 |
| BACKPORT to BUG | 0.430 | 0.303 | 0.300 | 0.313 |
| CLEANUP to BUG | 0.429 | 0.303 | 0.303 | 0.314 |
| SPEC to BUG | 0.435 | 0.302 | 0.301 | 0.312 |
| TASK to BUG | 0.432 | 0.302 | 0.301 | 0.312 |
| TEST to BUG | 0.429 | 0.328 | 0.313 | 0.334 |
| BUILD_SYSTEM to BUG | 0.429 | 0.306 | 0.303 | 0.315 |
| DESIGN_DEFECT to BUG | 0.424 | 0.301 | 0.301 | 0.311 |
| OTHERS to BUG | 0.439 | 0.303 | 0.301 | 0.313 |

* HC = HTTPClient, JB = Jackrabbit, LJ = Lucene-Java
![Page 12: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/12.jpg)
BIAS 1– Report Misclassification
Results:
• Significantly impacts bug localization results for 1 of the 3 projects
• However, effect sizes are negligible (d < 0.2)
![Page 13: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/13.jpg)
BIAS 2– Localized Bug Reports
Categories

| Category | Description |
|---|---|
| Fully | All the buggy files are mentioned in the bug report |
| Partially | Some of the buggy files are mentioned in the bug report |
| Not | The bug report does not mention any buggy file |

Fully Localized Report (Example)

| Field | Content |
|---|---|
| Summary | DecompressingEntity not calling close on InputStream retrieved by getContent |
| Description | The method DecompressingEntity.writeTo(OutputStream outstream) does not close the InputStream retrieved by getContent(). |
| Buggy Files | DecompressingEntity.java |
![Page 14: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/14.jpg)
BIAS 2– Localized Bug Reports
Manually Identifying Localized Reports
5591 issue reports
→ 1191 bug reports (Herzig et al.*)
→ 350 randomly selected bug reports
→ Manual inspection of the files changed and the summary & description
→ Classified bug reports
* It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction, K. Herzig, S. Just, A. Zeller, ICSE 2013
![Page 15: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/15.jpg)
BIAS 2– Localized Bug Reports
Automatically Identifying Localized Reports
Based on the manual investigation, we built an algorithm to automatically classify bug reports:
• Input: summary/description of a bug report and the files changed to fix the bug
• Output: the bug report classified into 1 of the 3 categories
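The slides do not spell out the matching rule this algorithm uses. As a minimal sketch, assuming a changed file counts as "mentioned" when its base name (for example `DecompressingEntity` for `DecompressingEntity.java`) appears as a whole word in the summary or description, the three-way classification could look like this; `is_mentioned` and `classify_report` are hypothetical names.

```python
import re
from pathlib import PurePosixPath


def is_mentioned(path, text):
    """True if the file's base name appears as a whole word in text (case-insensitive).

    Matching on the bare class/file name is an assumption made for this sketch.
    """
    stem = PurePosixPath(path).stem  # "DecompressingEntity.java" -> "DecompressingEntity"
    return re.search(r"\b" + re.escape(stem) + r"\b", text, re.IGNORECASE) is not None


def classify_report(summary, description, changed_files):
    """Return 'Fully', 'Partially', or 'Not' for one bug report."""
    text = f"{summary}\n{description}"
    mentioned = [f for f in changed_files if is_mentioned(f, text)]
    if changed_files and len(mentioned) == len(changed_files):
        return "Fully"      # every changed file is named in the report
    if mentioned:
        return "Partially"  # some, but not all, changed files are named
    return "Not"            # no changed file is named


summary = "DecompressingEntity not calling close on InputStream retrieved by getContent"
description = ("The method DecompressingEntity.writeTo(OutputStream outstream) does not close "
               "the InputStream retrieved by getContent().")
print(classify_report(summary, description, ["DecompressingEntity.java"]))  # Fully
```

Applied to the example report from the categories slide, the only changed file is named in both the summary and the description, so the report is classified as fully localized.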
![Page 16: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/16.jpg)
BIAS 2– Localized Bug Reports
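Number/Proportion (proportions are relative to all 1191 bug reports)

| Project | Category | Number | Proportion |
|---|---|---|---|
| HTTPClient | Fully | 36 | 3.02% |
| HTTPClient | Partially | 28 | 2.35% |
| HTTPClient | Not | 35 | 2.93% |
| Jackrabbit | Fully | 299 | 25.10% |
| Jackrabbit | Partially | 132 | 11.08% |
| Jackrabbit | Not | 402 | 33.75% |
| Lucene-Java | Fully | 63 | 5.28% |
| Lucene-Java | Partially | 87 | 7.30% |
| Lucene-Java | Not | 109 | 9.15% |

• Overall, 33.41% are fully localized
• More than 50% are fully or partially localized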
![Page 17: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/17.jpg)
BIAS 2– Localized Bug Reports
Mean Average Precision (MAP) Scores

| Project | Fully | Partially | Not |
|---|---|---|---|
| HTTPClient | 0.615 | 0.349 | 0.250 |
| Jackrabbit | 0.560 | 0.373 | 0.187 |
| Lucene-Java | 0.527 | 0.338 | 0.197 |

Difference between Fully & Not: HTTPClient 84.39%, Jackrabbit 99.86%, Lucene-Java 91.16%
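The Fully vs. Not percentages are not a plain change relative to the Not score: (0.615 - 0.250) / 0.250 would give 146% for HTTPClient. They do agree, to within rounding, with the difference divided by the mean of the two MAP scores. This is our reading of the numbers rather than something the slide states; the sketch below reproduces them under that assumption (`symmetric_percent_difference` is an illustrative name).

```python
def symmetric_percent_difference(a, b):
    """|a - b| divided by the mean of a and b, as a percentage."""
    return abs(a - b) / ((a + b) / 2.0) * 100.0


# MAP scores from the table above: (fully localized, not localized).
MAP_FULLY_NOT = {
    "HTTPClient": (0.615, 0.250),
    "Jackrabbit": (0.560, 0.187),
    "Lucene-Java": (0.527, 0.197),
}

for project, (fully, not_localized) in MAP_FULLY_NOT.items():
    print(f"{project}: {symmetric_percent_difference(fully, not_localized):.2f}%")
```

This prints 84.39% and 91.16% for HTTPClient and Lucene-Java; for Jackrabbit it prints 99.87% against the slide's 99.86%, a gap small enough to come from the MAP values being rounded to three decimals.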
![Page 18: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/18.jpg)
BIAS 2– Localized Bug Reports
Comparison: Fully vs. Partially vs. Not

| Project | Fully-Partially p-value | d | Effect Size | Partially-Not p-value | d | Effect Size | Fully-Not p-value | d | Effect Size |
|---|---|---|---|---|---|---|---|---|---|
| HTTPClient | * | 0.94 | L | * | 0.53 | M | * | 1.27 | L |
| Jackrabbit | * | 0.56 | M | * | 0.55 | M | * | 1.14 | L |
| Lucene-Java | * | 0.53 | M | * | 0.41 | S | * | 1.04 | L |

*Significant differences (p-value < 0.05); S = small, M = medium, L = large
• Effect sizes between Fully & Not are LARGE
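The S/M/L labels in the table are consistent with Cohen's commonly used thresholds of 0.2, 0.5, and 0.8 for |d|. The slides only state the d < 0.2 bound explicitly, so the other cut-offs are an assumption; the sketch below checks them against every value in the table (`effect_size_label` is an illustrative name).

```python
def effect_size_label(d):
    """Map |d| to Cohen's conventional magnitude labels."""
    magnitude = abs(d)
    if magnitude < 0.2:
        return "trivial"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"


# (d, label shown on the slide) pairs from the table above.
SLIDE_VALUES = [(0.94, "L"), (0.53, "M"), (1.27, "L"),
                (0.56, "M"), (0.55, "M"), (1.14, "L"),
                (0.53, "M"), (0.41, "S"), (1.04, "L")]

for d, shown in SLIDE_VALUES:
    # The first letter of the computed label should match the slide's S/M/L.
    assert effect_size_label(d)[0].upper() == shown, (d, shown)
print("all labels match")
```

The same thresholds also place every d reported for BIAS 1 and BIAS 3 (0.04 to 0.17) in the trivial range.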
![Page 19: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/19.jpg)
BIAS 2– Localized Bug Reports
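Best & Worst bug reports

| Project | Group | Fully | Partially | Not | p-value |
|---|---|---|---|---|---|
| HTTPClient | Upper | 16 | 5 | 4 | 0.0041* |
| HTTPClient | Lower | 6 | 4 | 15 | |
| Jackrabbit | Upper | 35 | 9 | 6 | 2.807e-13* |
| Jackrabbit | Lower | 7 | 1 | 42 | |
| Lucene-Java | Upper | 22 | 18 | 10 | 8.724e-05* |
| Lucene-Java | Lower | 5 | 18 | 27 | |

*Significant differences (p-value < 0.05)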
![Page 20: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/20.jpg)
BIAS 2– Localized Bug Reports
Results:
• More than 50% of bug reports are either fully or partially localized
• MAP scores for fully and partially localized reports are much higher than for not localized reports
• Effect sizes between fully and not localized reports are LARGE
![Page 21: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/21.jpg)
BIAS 3– Non-Buggy Files
Manual Investigation
Randomly selected 100 not localized bug reports
→ Files changed to fix these bugs
→ Diff between original & modified file
→ Non-buggy = cosmetic changes, refactorings, etc.
→ Clean GROUND TRUTH files
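The investigation above was done by hand and also counted refactorings as non-buggy. Purely as an illustration of the narrower "cosmetic change" criterion, a hypothetical heuristic could treat a file as non-buggy when its Java-like source has the same token sequence before and after the fix once comments and whitespace are ignored. `code_tokens`, `is_cosmetic_change`, and `clean_ground_truth` are illustrative names, and the regex tokenizer is a simplification: it does not understand string literals and cannot detect refactorings.

```python
import re

# Block comments (/* ... */) or line comments (// ...), matched left to right.
COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
# Identifiers/numbers, or any single non-space punctuation character.
TOKEN = re.compile(r"\w+|[^\w\s]")


def code_tokens(source):
    """Token sequence of Java-like source with comments removed and whitespace ignored."""
    return TOKEN.findall(COMMENT.sub(" ", source))


def is_cosmetic_change(old_source, new_source):
    """True if the change only touched comments or whitespace."""
    return code_tokens(old_source) == code_tokens(new_source)


def clean_ground_truth(changed_files):
    """Keep only files whose change is not purely cosmetic.

    changed_files maps a file name to its (old_source, new_source) pair.
    """
    return {name for name, (old, new) in changed_files.items()
            if not is_cosmetic_change(old, new)}


fix = {
    "A.java": ("x = 1;", "x  =  1; // reformatted"),  # whitespace/comment only
    "B.java": ("x = 1;", "x = 2;"),                    # real code change
}
print(clean_ground_truth(fix))  # {'B.java'}
```

Splitting on tokens rather than deleting all whitespace keeps `int a` and `inta` distinct while still treating `a+b` and `a + b` as the same code.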
![Page 22: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/22.jpg)
BIAS 3– Non-Buggy Files
Example
![Page 23: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/23.jpg)
BIAS 3– Non-Buggy Files
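Mean Average Precision (MAP) Scores

| Project | Dirty | Clean | Difference | d |
|---|---|---|---|---|
| HTTPClient | 0.207 | 0.171 | 0.036 | 0.08 |
| Jackrabbit | 0.115 | 0.115 | 0.000 | 0.08 |
| Lucene-Java | 0.271 | 0.239 | 0.032 | 0.17 |

Dirty = original ground-truth files; Clean = ground truth with non-buggy files removed
• Differences are not significant
• Effect sizes are trivial (d < 0.2)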
![Page 24: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/24.jpg)
BIAS 3– Non-Buggy Files
Results:
• 28.11% of the files in the ground truth are non-buggy
• Differences between MAP scores are not significant
• Effect sizes are negligible (d < 0.2)
![Page 25: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/25.jpg)
Conclusion
• BIAS 1 (wrongly classified issue reports): NOT statistically significant; NO substantial impact
• BIAS 2 (localized bug reports): statistically significant; substantial impact
• BIAS 3 (non-buggy files): NOT statistically significant; NO substantial impact
![Page 27: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/27.jpg)
Other Evaluation Metrics
HIT@N: percentage of bug reports with at least one buggy file in the top N ranked results
Mean Reciprocal Rank (MRR): the reciprocal rank of a bug report is the inverse of the rank of its first buggy file; MRR is the average of the reciprocal ranks over all bug reports.
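Both metrics can be computed from the same (ranked list, ground-truth set) pairs used for MAP. A minimal sketch, assuming a report whose buggy files never appear in the ranked list gets a reciprocal rank of 0 (a common convention; the function names are illustrative):

```python
def hit_at_n(results, n):
    """Percentage of bug reports with at least one buggy file in the top n results."""
    if not results:
        return 0.0
    hits = sum(1 for ranked, buggy in results if any(f in buggy for f in ranked[:n]))
    return 100.0 * hits / len(results)


def reciprocal_rank(ranked, buggy):
    """1 / rank of the first buggy file, or 0 if no buggy file is retrieved."""
    for rank, name in enumerate(ranked, start=1):
        if name in buggy:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results):
    """MRR: average reciprocal rank over all bug reports."""
    if not results:
        return 0.0
    return sum(reciprocal_rank(ranked, buggy) for ranked, buggy in results) / len(results)


results = [
    (["A", "B", "C"], {"B"}),  # first buggy file at rank 2
    (["D", "E", "F"], {"X"}),  # buggy file not retrieved
]
print(hit_at_n(results, 1), hit_at_n(results, 2))  # 0.0 50.0
print(mean_reciprocal_rank(results))               # 0.25
```

Unlike AP, both metrics look only at the first buggy file, so they ignore how the remaining buggy files of a multi-file fix are ranked.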
![Page 28: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/28.jpg)
BIAS 1- Report Misclassification
![Page 29: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/29.jpg)
BIAS 2- Localized Bug Reports
![Page 30: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/30.jpg)
BIAS 3- Non-Buggy Files
![Page 31: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/31.jpg)
BIAS 1, BIAS 2 & BIAS 3
Mean Reciprocal Rank (MRR) Scores
![Page 32: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/32.jpg)
Appendix (Statistical Analysis)
• Mann-Whitney-Wilcoxon (MWW) test: given a significance level α = 0.05, if the p-value < α, the test rejects the null hypothesis.
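To illustrate the test itself (not the study's tooling), here is a two-sided MWW test using the large-sample normal approximation, with average ranks for ties and a tie-corrected variance. No continuity correction is applied, and small samples would normally use exact p-values; SciPy's `scipy.stats.mannwhitneyu` offers exact and continuity-corrected variants. The function name is illustrative.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney-Wilcoxon test (normal approximation).

    Returns (U, p_value), where U is the statistic for sample x.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda item: item[0])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(rank for rank, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided: 2 * (1 - Phi(|z|))
    return u1, p


alpha = 0.05
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, p, p < alpha)  # U = 0.0, p just below 0.05, so H0 is rejected
```

In the study's setting, x and y would be the per-report AP values under the two conditions being compared (for example, reported vs. actual issue types).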
![Page 33: Potential Biases in Bug Localization: Do They Matter?](https://reader036.vdocument.in/reader036/viewer/2022062821/589bc46d1a28ab082b8b5e81/html5/thumbnails/33.jpg)
Appendix (BIAS-2 Results)