Patterns for Cleaning Up Bug Data
Rodrigo Souza¹,*
Christina Chavez¹
Roberto Bittencourt²
¹ Federal University of Bahia, Brazil
² State University of Feira de Santana, Brazil
DAPSE’13: International Workshop on Data Analysis Patterns in Software Engineering
* speaker
May 21, 2013, San Francisco, USA
Bug reports provide insight about:
- the quality of the software
- the quality of the process
Bug reports often contain data that is:
- incomplete
- inaccurate
- biased

Bug reports may lead you to wrong conclusions.
Bug reports are like vegetables…
You have to clean them up before using them.
In This Talk
Two patterns to help you clean up your data:
1. Look Out for Mass Updates
2. Old Wine Tastes Better
They’re like recipes for data scientists.
Look Out for Mass Updates
Determine which changes to bug reports were the result of a mass update.
Look Out for Mass Updates: 1. Context
Joe’s worklog (Tuesday):
- Worked on bug #5
- Worked on bug #12
- Updated bug report #5
- Updated bug report #12

Today, Joe worked on two bugs and updated the corresponding bug reports.
Joe’s worklog (Tuesday), as data scientists see it:
- Updated bug report #5
- Updated bug report #12

Data scientists see only the updates: Joe updated two reports ⇒ Joe worked on two bugs (#5 and #12).
Joe’s worklog (Wednesday):
- Updated bug report #3
- Updated bug report #18
- Updated bug report #9
- Updated bug report #15
- Updated bug report #21
- Updated bug report #52
- Updated bug report #40
- Updated bug report #41
- Updated bug report #68
- Updated bug report #73
- Updated bug report #78
- …

Joe updated 2600 reports ⇒ Joe worked on 2600 bugs?
Mass updates do not represent actual work. Often, they are just cleanup.
Mass updates should be discarded from your analyses
Look Out for Mass Updates: 2. Problem
Determine which changes to bug reports were the result of a mass update
Look Out for Mass Updates: 3. Solution
Ingredients
You’ll need:
- Changes in bug reports (i.e., updates), each with:
  - What changed
  - Date
  - User
  - Comment
For example:

| Bug # | What changed | Date | User | Comment |
|---|---|---|---|---|
| 1 | status ⇒ VERIFIED | ... | ... | ... |
| 2 | status ⇒ VERIFIED | ... | ... | ... |
| 3 | status ⇒ CLOSED | ... | ... | ... |
| 4 | status ⇒ VERIFIED | ... | ... | ... |

Select one type of change (“what changed”), e.g., status ⇒ VERIFIED.
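As a minimal sketch, assuming each change is held as a Python record with the bug number plus the four ingredients (the `Change` class, its field names, and the `"status => VERIFIED"` string encoding are hypothetical), selecting one type of change is a plain filter:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Change:
    """One update to a bug report (hypothetical record layout)."""
    bug_id: int
    what_changed: str  # e.g. "status => VERIFIED" (assumed string encoding)
    day: date
    user: str
    comment: str


def select_change_type(changes, what_changed):
    """Keep only the changes of a single type, e.g. status => VERIFIED."""
    return [c for c in changes if c.what_changed == what_changed]


# Invented data mirroring the example table.
changes = [
    Change(1, "status => VERIFIED", date(2013, 1, 7), "joe", "ok"),
    Change(2, "status => VERIFIED", date(2013, 1, 8), "ann", "fixed"),
    Change(3, "status => CLOSED", date(2013, 1, 8), "joe", "done"),
    Change(4, "status => VERIFIED", date(2013, 1, 9), "joe", "ok"),
]
verified = select_change_type(changes, "status => VERIFIED")
print([c.bug_id for c in verified])  # [1, 2, 4]
```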
Directions (solution #1)
1. Plot the accumulated number of changes over time.
2. Seek unusually high cliffs, i.e., sudden jumps in the accumulated count.
3. Changes in the cliff are considered mass updates.
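The directions above rely on eyeballing a plot. The sketch below is a minimal illustration, not the authors' procedure: it builds the same accumulated series from the dates of the selected changes and flags a day as a cliff when its jump exceeds `factor` times the median daily jump. The `factor` rule and its default of 10 are assumptions standing in for "unusually high"; plotting `series` with any charting library gives the picture described on the slide.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import median


def cumulative_changes(days):
    """Return (day, accumulated number of changes) pairs, one per day that has changes."""
    per_day = Counter(days)
    total = 0
    series = []
    for day in sorted(per_day):
        total += per_day[day]
        series.append((day, total))
    return series


def find_cliffs(series, factor=10):
    """Return the days whose jump in the accumulated count exceeds factor x the median jump.

    `factor` is a hypothetical knob standing in for "unusually high".
    """
    jumps = []
    previous = 0
    for day, total in series:
        jumps.append((day, total - previous))
        previous = total
    typical = median(jump for _, jump in jumps)
    return [day for day, jump in jumps if jump > factor * typical]


# Invented data: 2 changes per day for 10 days, plus 500 extra changes on Jan 5.
start = date(2013, 1, 1)
days = [start + timedelta(days=d) for d in range(10) for _ in range(2)]
days += [date(2013, 1, 5)] * 500
series = cumulative_changes(days)
print(find_cliffs(series))  # [datetime.date(2013, 1, 5)]
```

Every change dated on a flagged day would then be treated as part of a mass update.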
Directions (solution #2)
1. Group changes by ⟨date, user, comment⟩.
2. Count the changes in each group.
3. Groups with higher counts are mass updates.

| Date | User | Comment | Count ▼ |
|---|---|---|---|
| D1 | U1 | C1 | 1703 |
| D2 | U2 | C2 | 972 |
| D3 | U3 | C3 | 447 |
| D4 | U4 | C4 | 1 |
| D5 | U5 | C5 | 1 |
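A minimal sketch of solution #2, assuming dates are recorded at day granularity and that `threshold` (a hypothetical parameter) is the group size from which a group counts as a mass update. The example data, 30 changes sharing one comment plus two ordinary ones, is invented:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Change:
    bug_id: int
    day: date  # assumed day granularity
    user: str
    comment: str


def group_key(change):
    """The grouping key from the slide: <date, user, comment>."""
    return (change.day, change.user, change.comment)


def group_counts(changes):
    """(key, number of changes) for every group, largest groups first."""
    return Counter(group_key(c) for c in changes).most_common()


def discard_mass_updates(changes, threshold):
    """Drop every change whose group has at least `threshold` changes (hypothetical cutoff)."""
    mass = {key for key, n in group_counts(changes) if n >= threshold}
    return [c for c in changes if group_key(c) not in mass]


# Invented data: one mass update touching 30 bugs, plus two ordinary changes.
d = date(2013, 1, 9)
changes = [Change(bug, d, "joe", "Closing stale bugs") for bug in range(100, 130)]
changes += [Change(5, d, "joe", "Fixed in trunk"), Change(12, d, "ann", "Verified")]
print(group_counts(changes)[0][1])  # 30
print([c.bug_id for c in discard_mass_updates(changes, threshold=10)])  # [5, 12]
```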
Look Out for Mass Updates: 4. Discussion
The main challenge is to find a suitable threshold (i.e., how many updates in a group characterize a mass update).
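One way to explore that choice, our assumption rather than a recipe from the slides, is to list how many changes each candidate threshold would discard. The group sizes 1703, 972, and 447 come from the example table; the 50 singleton groups are invented:

```python
def discarded_by_threshold(group_sizes, thresholds):
    """For each candidate threshold, count the changes that would be discarded as mass updates."""
    return {t: sum(n for n in group_sizes if n >= t) for t in thresholds}


# 1703, 972 and 447 are the group sizes from the example table; the 50 singletons are invented.
sizes = [1703, 972, 447] + [1] * 50
print(discarded_by_threshold(sizes, [2, 500, 1000]))
# {2: 3122, 500: 2675, 1000: 1703}
```

Any threshold between 2 and 447 gives the same result here, so a wide gap between consecutive group sizes makes the choice less sensitive; thresholds that fall between large groups (e.g., 500 vs. 1000) change the outcome.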
Old Wine Tastes Better
Determine bug reports that are too recent to be classified.
Old Wine Tastes Better: 1. Context
Prediction models predict which bug reports will undergo some change, e.g.:
- predict which bugs get reopened
- predict which bugs get closed as invalid
- predict which bugs get assigned to John
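For example, to predict which bugs get reopened, you train a model on a training set:

| # | Who reported? | Severity | Age | Reopened? |
|---|---|---|---|---|
| 1 | ... | ... | ... | YES |
| 2 | ... | ... | ... | YES |
| 3 | ... | ... | ... | NO |
| 4 | ... | ... | ... | NO |
| 5 | ... | ... | ... | NO |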
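Training set, with a recent bug:

| # | Who reported? | Severity | Age | Reopened? |
|---|---|---|---|---|
| 1 | ... | ... | ... | YES |
| 2 | ... | ... | ... | YES |
| 3 | ... | ... | ... | NO |
| 4 | ... | ... | ... | NO |
| 5 | ... | ... | 1 day | not yet |

Bug #5 is only 1 day old: it has not been reopened yet, but it still may be. You can’t use bugs that are too recent for training.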
Old Wine Tastes Better: 2. Problem
Determine bug reports that are too recent to be classified
Old Wine Tastes Better: 3. Solution
Ingredients
You’ll need:
- Date of the last change in your data set
- Bug reports, each with:
  - Creation date
  - Whether it has been reopened*

* or, in general, whether it has undergone a particular change
Directions
1. Measure each bug’s age, from its creation date to the date of the last change in your data set.

| # | ... | Age | Reopened? |
|---|---|---|---|
| 1 | ... | 180 days | YES |
| 2 | ... | 90 days | NO |
| 3 | ... | 16 days | YES |
| 4 | ... | 12 days | NO |
| ... | ... | ... | ... |
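Step 1 as a minimal sketch, assuming creation dates are available as a `dict` from bug number to `datetime.date`, and that `last_change` is the latest change date in the whole data set (e.g., the `max` over all change dates). The dates below are invented so that the ages match the example table:

```python
from datetime import date


def bug_ages(created, last_change):
    """Age of each bug in days, from its creation date up to `last_change`.

    `created` maps bug number -> creation date; `last_change` is the date of the
    last change in the whole data set (assumed to be computed beforehand).
    """
    return {bug: (last_change - day).days for bug, day in created.items()}


# Invented creation dates, chosen so that the ages match the example table.
created = {
    1: date(2012, 11, 22),
    2: date(2013, 2, 20),
    3: date(2013, 5, 5),
    4: date(2013, 5, 9),
}
last_change = date(2013, 5, 21)
print(bug_ages(created, last_change))  # {1: 180, 2: 90, 3: 16, 4: 12}
```

Measuring up to the last change in the data set, rather than up to today, keeps ages consistent with the snapshot being analyzed.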
2. Guess a threshold so that bugs younger than the threshold are considered too recent to be classified.
For example, with threshold = 42 days, bugs #3 (16 days) and #4 (12 days) are too recent.
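Step 2 as a sketch, reusing the ages from the example table; the function name is hypothetical, and a bug whose age equals the threshold is treated as old enough, since only bugs younger than the threshold are excluded:

```python
def split_by_threshold(ages, threshold_days):
    """Split bugs into (old enough to classify, too recent), by age in days."""
    usable = {bug for bug, age in ages.items() if age >= threshold_days}
    too_recent = {bug for bug, age in ages.items() if age < threshold_days}
    return usable, too_recent


ages = {1: 180, 2: 90, 3: 16, 4: 12}  # ages from the example table
usable, too_recent = split_by_threshold(ages, threshold_days=42)
print(sorted(usable), sorted(too_recent))  # [1, 2] [3, 4]
```

Only the `usable` bugs would go into the training set.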
3. Estimate the confidence (α) that the remaining non-reopened bugs will never be reopened.
Directions (formula in the paper):

$$\alpha = \frac{\text{num. bugs that have been reopened}}{\text{num. bugs older than the threshold}}$$
4. If α is not high enough (e.g., α < 0.95), choose another threshold (i.e., repeat from step 2).
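Steps 2 to 4 form a search loop. Because the exact α formula is given in the paper, the sketch below takes the estimator as a caller-supplied function (`estimate_alpha` is hypothetical) and, assuming that raising the threshold tends to raise α, tries candidate thresholds in increasing order. The α values in the usage example are placeholders, not results:

```python
def choose_threshold(ages, reopened, candidates, estimate_alpha, min_alpha=0.95):
    """Try candidate thresholds in increasing order; return the first with alpha >= min_alpha.

    `estimate_alpha(ages, reopened, threshold)` is supplied by the caller; the exact
    formula is in the paper, so it is deliberately left pluggable here.
    Returns (threshold, alpha), or None if no candidate is good enough.
    """
    for threshold in sorted(candidates):
        alpha = estimate_alpha(ages, reopened, threshold)
        if alpha >= min_alpha:
            return threshold, alpha
    return None


# Placeholder alpha values, invented only to exercise the loop.
made_up = {14: 0.80, 28: 0.88, 42: 0.95, 56: 0.97}
ages = {1: 180, 2: 90, 3: 16, 4: 12}
reopened = {1: True, 2: False, 3: True, 4: False}
result = choose_threshold(ages, reopened, made_up, lambda a, r, t: made_up[t])
print(result)  # (42, 0.95)
```

Stopping at the first threshold that reaches the target keeps as much data as possible, which matters because of the trade-off discussed next.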
Old Wine Tastes Better: 4. Discussion
There’s a trade-off:
- larger α ⇒ more confidence, less data
- smaller α ⇒ less confidence, more data
For the NetBeans/Platform project, removing bugs younger than 6 weeks (0.7% of the bugs) raises the confidence from 88% to 95%.
Do ye have any source code to show?
Arrrr! It’s in the paper!
Thank you!
And clean up your bug reports before using them!