Patterns for Cleaning Up Bug Data

Patterns for Cleaning Up Bug Data. Rodrigo Souza (1,*), Christina Chavez (1), Roberto Bittencourt (2). (1) Federal University of Bahia, Brazil; (2) State University of Feira de Santana, Brazil. DAPSE’13: International Workshop on Data Analysis Patterns in Software Engineering, May 21, 2013, San Francisco, USA. * speaker; email: [email protected]

Upload: rodrigo-rocha

Post on 09-Jul-2015

DESCRIPTION

Paper at https://github.com/rodrigorgs/dapse13-bugpatterns/blob/master/preprint/icsews13dapse-id2-p-16145-preprint.pdf?raw=true

TRANSCRIPT

Page 1: Patterns for Cleaning Up Bug Data

Patterns for Cleaning Up Bug Data

Rodrigo Souza1,*

Christina Chavez1

Roberto Bittencourt2

1 Federal University of Bahia, Brazil 2 State University of Feira de Santana, Brazil

DAPSE’13: International Workshop on Data Analysis Patterns in Software Engineering

* speaker; email: [email protected]

May 21, 2013 San Francisco, USA

Page 2: Patterns for Cleaning Up Bug Data

Bug reports  

Page 3: Patterns for Cleaning Up Bug Data

Bug reports provide insight about:
- the quality of the software
- the quality of the process

Page 4: Patterns for Cleaning Up Bug Data

Bug reports often contain data that is:
- incomplete
- inaccurate
- biased

This may lead you to wrong conclusions.

Page 5: Patterns for Cleaning Up Bug Data

Bug reports are like vegetables: you have to clean them up before using them.

Page 6: Patterns for Cleaning Up Bug Data

In this Talk

Two patterns to help you clean up your data:

1. Look Out for Mass Updates
2. Old Wine Tastes Better

They’re like recipes for data scientists.

Page 7: Patterns for Cleaning Up Bug Data

Look Out for Mass Updates

Determine which changes to bug reports were the result of a mass update.

Page 8: Patterns for Cleaning Up Bug Data

Look Out for Mass Updates

1. Context
2. Problem
3. Solution
4. Discussion

Page 9: Patterns for Cleaning Up Bug Data

Joe’s worklog (Tuesday)

- Worked on bug #5
- Worked on bug #12
- Updated bug report #5
- Updated bug report #12

Today, Joe worked on two bugs and updated the corresponding bug reports.

Page 10: Patterns for Cleaning Up Bug Data

Joe’s worklog (Tuesday)

- Updated bug report #5
- Updated bug report #12

Data scientists just see the updates (the entries "Worked on bug #5" and "Worked on bug #12" are invisible to them), so they infer:

Joe updated two reports ⇒ Joe worked on two bugs

Page 11: Patterns for Cleaning Up Bug Data

Joe’s worklog (Wednesday)

- Updated bug report #3
- Updated bug report #18
- Updated bug report #9
- Updated bug report #15
- Updated bug report #21
- Updated bug report #52
- Updated bug report #40
- Updated bug report #41
- Updated bug report #68
- Updated bug report #73
- Updated bug report #78
- …

Joe updated 2600 reports ⇒ Joe worked on 2600 bugs?

Page 12: Patterns for Cleaning Up Bug Data
Page 13: Patterns for Cleaning Up Bug Data

Mass updates do not represent actual work. Often, they are just cleanup.

Page 14: Patterns for Cleaning Up Bug Data

Mass updates should be discarded from your analyses.

Page 15: Patterns for Cleaning Up Bug Data

Look Out for Mass Updates

1. Context
2. Problem
3. Solution
4. Discussion

Page 16: Patterns for Cleaning Up Bug Data

Determine which changes to bug reports were the result of a mass update

Page 17: Patterns for Cleaning Up Bug Data

Look Out for Mass Updates

1. Context
2. Problem
3. Solution
4. Discussion

Page 18: Patterns for Cleaning Up Bug Data

Ingredients

You’ll need:
- Changes in bug reports (i.e., updates)
  - What changed
  - Date
  - User
  - Comment

Page 19: Patterns for Cleaning Up Bug Data

Ingredients

Bug #   What changed        Date   User   Comment
1       status ⇒ VERIFIED   ...    ...    ...
2       status ⇒ VERIFIED   ...    ...    ...
3       status ⇒ CLOSED     ...    ...    ...
4       status ⇒ VERIFIED   ...    ...    ...

Select one type of change (“what changed”), e.g., status ⇒ VERIFIED
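Selecting one type of change can be sketched as a simple filter. The record layout below (keys `bug`, `field`, `new_value`) and the helper name `select_changes` are assumptions made for illustration; they are not taken from the slides or from any particular bug tracker.

```python
def select_changes(changes, field, new_value):
    """Keep only the changes that set `field` to `new_value`."""
    return [c for c in changes
            if c["field"] == field and c["new_value"] == new_value]


# Hypothetical change records mirroring the table on the slide.
changes = [
    {"bug": 1, "field": "status", "new_value": "VERIFIED"},
    {"bug": 2, "field": "status", "new_value": "VERIFIED"},
    {"bug": 3, "field": "status", "new_value": "CLOSED"},
    {"bug": 4, "field": "status", "new_value": "VERIFIED"},
]

verified = select_changes(changes, "status", "VERIFIED")
print([c["bug"] for c in verified])
```

Working on one change type at a time keeps unrelated activity (e.g., comments or reassignments) from hiding a mass update.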

Page 20: Patterns for Cleaning Up Bug Data

Directions (solution #1)

1. Plot the accumulated number of changes over time
2. Seek unusually high cliffs
3. Changes in the cliff are considered mass updates
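The slide does this visually. As a rough, non-graphical sketch of the same idea, the code below accumulates the number of changes per day and reports the days whose jump is at least `min_jump`. The function name `daily_cliffs`, the daily granularity, the `min_jump` cutoff, and the sample dates are all assumptions for illustration.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def daily_cliffs(change_dates, min_jump):
    """Return (day, jump, running_total) for days with at least min_jump changes."""
    per_day = Counter(change_dates)
    days = sorted(per_day)
    # Running totals correspond to the accumulated curve one would plot.
    totals = accumulate(per_day[d] for d in days)
    return [(d, per_day[d], t) for d, t in zip(days, totals)
            if per_day[d] >= min_jump]


# Hypothetical data: dates on which some bug's status changed to VERIFIED.
change_dates = (
    [date(2013, 1, 7)] * 3
    + [date(2013, 1, 8)] * 2
    + [date(2013, 1, 9)] * 2600  # a suspicious cliff
    + [date(2013, 1, 10)] * 4
)

cliffs = daily_cliffs(change_dates, min_jump=100)
print(cliffs)
```

A fixed cutoff stands in for "unusually high" here; in practice the cutoff would be chosen after looking at the plot.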

Page 21: Patterns for Cleaning Up Bug Data

Directions (solution #2)

1. Group changes by ⟨date, user, comment⟩
2. Count the changes in each group
3. Groups with higher counts are mass updates

Date   User   Comment   Count ▼
D1     U1     C1        1703
D2     U2     C2        972
D3     U3     C3        447
D4     U4     C4        1
D5     U5     C5        1
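The grouping approach can be sketched with a counter keyed by ⟨date, user, comment⟩. The record keys, the helper name `mass_update_groups`, the `threshold` value, and the sample data are assumptions for illustration only.

```python
from collections import Counter


def group_key(change):
    """The <date, user, comment> triple that identifies a group."""
    return (change["date"], change["user"], change["comment"])


def mass_update_groups(changes, threshold):
    """Return (key, count) pairs, largest first, for groups of at least `threshold` changes."""
    counts = Counter(group_key(c) for c in changes)
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


# Hypothetical changes: five identical bulk edits plus two ordinary ones.
changes = [
    {"bug": i, "date": "2013-01-09", "user": "joe", "comment": "Bulk verify"}
    for i in range(5)
] + [
    {"bug": 100, "date": "2013-01-09", "user": "ann", "comment": "Fixed in trunk"},
    {"bug": 101, "date": "2013-01-10", "user": "joe", "comment": "Works for me"},
]

groups = mass_update_groups(changes, threshold=3)
mass_keys = {key for key, _ in groups}
# Discard the changes that belong to a mass update before further analysis.
kept = [c for c in changes if group_key(c) not in mass_keys]
print(groups, len(kept))
```

Sorting by count, as the slide's table does, makes it easier to judge where the threshold should fall.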

Page 22: Patterns for Cleaning Up Bug Data

Look Out for Mass Updates

1. Context
2. Problem
3. Solution
4. Discussion

Page 23: Patterns for Cleaning Up Bug Data

The main challenge is to find a suitable threshold (i.e., how many updates characterize mass updates)

Page 24: Patterns for Cleaning Up Bug Data

Old Wine Tastes Better

Determine bug reports that are too recent to be classified.

Page 25: Patterns for Cleaning Up Bug Data

Old Wine Tastes Better

1. Context
2. Problem
3. Solution
4. Discussion

Page 26: Patterns for Cleaning Up Bug Data

Prediction models predict which bug reports will undergo some change, e.g.,

- predict which bugs get reopened
- predict which bugs get closed as invalid
- predict which bugs get assigned to John

Page 27: Patterns for Cleaning Up Bug Data

e.g., predict which bugs get reopened

Training set:

#   Who reported?   Severity   Age   Reopened?
1   ...             ...        ...   YES
2   ...             ...        ...   YES
3   ...             ...        ...   NO
4   ...             ...        ...   NO
5   ...             ...        ...   NO

Page 28: Patterns for Cleaning Up Bug Data

Training set:

#   Who reported?   Severity   Age     Reopened?
1   ...             ...        ...     YES
2   ...             ...        ...     YES
3   ...             ...        ...     NO
4   ...             ...        ...     NO
5   ...             ...        1 day   not yet

You can’t use bugs that are too recent for training.

Page 29: Patterns for Cleaning Up Bug Data

Old Wine Tastes Better

1. Context
2. Problem
3. Solution
4. Discussion

Page 30: Patterns for Cleaning Up Bug Data

Determine bug reports that are too recent to be classified

Page 31: Patterns for Cleaning Up Bug Data

Old Wine Tastes Better

1. Context
2. Problem
3. Solution
4. Discussion

Page 32: Patterns for Cleaning Up Bug Data

Ingredients

You’ll need:
- Date of last change in your data set
- Bug reports
  - Creation date
  - Whether it has been reopened*

* or, in general, whether it has undergone a particular change

Page 33: Patterns for Cleaning Up Bug Data

Directions

1. Measure each bug’s age, from its creation date to the date of the last change in your data set

#     ...   Age        Reopened?
1     ...   180 days   YES
2     ...   90 days    NO
3     ...   16 days    YES
4     ...   12 days    NO
...   ...   ...        ...
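Measuring age amounts to subtracting dates. In the sketch below, the helper name `bug_ages`, the last-change date, and the creation dates are hypothetical; the dates were picked so that the resulting ages match the ones shown on the slide.

```python
from datetime import date


def bug_ages(creation_dates, last_change):
    """Age in days of each bug, measured up to the last change in the data set."""
    return {bug: (last_change - created).days
            for bug, created in creation_dates.items()}


# Hypothetical dates, for illustration only.
last_change = date(2013, 5, 1)
creation_dates = {
    1: date(2012, 11, 2),
    2: date(2013, 1, 31),
    3: date(2013, 4, 15),
    4: date(2013, 4, 19),
}

ages = bug_ages(creation_dates, last_change)
print(ages)
```

Age is measured against the last change in the data set, not against today's date, because nothing is known about the bugs after the data was collected.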

Page 34: Patterns for Cleaning Up Bug Data

Directions

2. Guess a threshold so that bugs younger than the threshold are considered too recent to be classified

threshold = 42 days

#     ...   Age        Reopened?
1     ...   180 days   YES
2     ...   90 days    NO
3     ...   16 days    YES          (too recent)
4     ...   12 days    NO           (too recent)
...   ...   ...        ...
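Applying a guessed threshold is a partition of the bugs by age. This is a minimal sketch; the helper name `split_by_age` is hypothetical, and the ages and the 42-day threshold are the example values from the slide.

```python
def split_by_age(ages, threshold_days):
    """Split bugs into those old enough to classify and those that are too recent."""
    old_enough = {bug: age for bug, age in ages.items() if age >= threshold_days}
    too_recent = {bug: age for bug, age in ages.items() if age < threshold_days}
    return old_enough, too_recent


# Ages in days, as in the slide's example table.
ages = {1: 180, 2: 90, 3: 16, 4: 12}

old_enough, too_recent = split_by_age(ages, threshold_days=42)
print(old_enough, too_recent)
```

Only the bugs in `old_enough` would go into the training set; the others are set aside, whatever their current reopened status.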

Page 35: Patterns for Cleaning Up Bug Data

Directions

3. Estimate the confidence (α) that the remaining non-reopened bugs will never be reopened

#     ...   Age        Reopened?
1     ...   180 days   YES
2     ...   90 days    NO
3     ...   16 days    YES
4     ...   12 days    NO
...   ...   ...        ...

confidence (α)?

Page 36: Patterns for Cleaning Up Bug Data

Directions (formula in the paper)

α = [formula shown as a graphic on the slide; see the paper. Surviving labels: "num. bugs that have been reopened", "num. bugs older than the threshold"]

#     ...   Age        Reopened?
1     ...   180 days   YES
2     ...   90 days    NO
3     ...   16 days    YES
4     ...   12 days    NO
...   ...   ...        ...

Page 37: Patterns for Cleaning Up Bug Data

Directions

4. If α is not high enough (e.g., α < 0.95), choose another threshold (i.e., repeat from step 2)
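The guess-and-check loop in steps 2 to 4 might be sketched as follows. The confidence estimate itself is defined in the paper and is not reproduced here, so it is passed in as a caller-supplied function; `choose_threshold`, the candidate thresholds, and the toy confidence values are all made up for illustration.

```python
from typing import Callable, Iterable, Optional


def choose_threshold(candidates: Iterable[int],
                     confidence: Callable[[int], float],
                     target: float = 0.95) -> Optional[int]:
    """Return the first candidate threshold (in days) whose confidence reaches target."""
    for threshold in candidates:
        if confidence(threshold) >= target:
            return threshold
    return None  # no candidate was good enough


# Made-up confidence values per threshold; a real estimate would follow the paper.
toy_confidence = {14: 0.80, 28: 0.90, 42: 0.96, 56: 0.98}

chosen = choose_threshold(sorted(toy_confidence), toy_confidence.__getitem__)
print(chosen)
```

Trying candidates from smallest to largest returns the threshold that discards the least data among those that reach the target.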

Page 38: Patterns for Cleaning Up Bug Data

Old Wine Tastes Better

1. Context
2. Problem
3. Solution
4. Discussion

Page 39: Patterns for Cleaning Up Bug Data

There’s a trade-off:

larger α ⇒ more confidence, less data
smaller α ⇒ less confidence, more data

Page 40: Patterns for Cleaning Up Bug Data

For the project NetBeans/Platform:

removing bugs younger than 6 weeks (0.7%) raises the confidence from 88% to 95%

Page 41: Patterns for Cleaning Up Bug Data

Do ye have any source code to show?

Arrrr! It’s in the paper!

Page 42: Patterns for Cleaning Up Bug Data

Thank you!

And clean up your bug reports before using them!