emad fse2011 final

High-Impact Defects: A Study of Breakage and Surprise Defects Emad Shihab, Audris Mockus, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan

Upload: sailqu

Post on 12-Apr-2017


Page 1: Emad fse2011 final

High-Impact Defects: A Study of Breakage and Surprise Defects

Emad Shihab, Audris Mockus, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan

Page 2: Emad fse2011 final

We know that...

Software always has defects

Projects always have limited resources

Q: How can we spend the limited resources to maximize quality?

Page 3: Emad fse2011 final

Defect Prediction

Input: Metrics (size, complexity, churn, pre-release defects, ...)

Prediction Model

Output: Risk [0..1] (e.g., 0.8, 0.1)

Key Predictors: Size and pre-release defects

Page 4: Emad fse2011 final

Existing Approaches Aren’t Adding Value

• Obvious to practitioners
• Require a large amount of effort
• Not all defects are equally important

So... what can we do? FOCUS ON HIGH-IMPACT DEFECTS!

Page 5: Emad fse2011 final

Impact Is In The Eye of The Beholder!

Customers: Breakages
• Break existing functionality
• Affect established customers
• Hurt company image

Developers: Surprises
• Occur in unexpected locations
• Low pre-, high post-release defects
• Catch developers off-guard
• Lead to schedule interruptions

Page 6: Emad fse2011 final

Case Study

Commercial telecom project

• 30+ years of development
• 7+ MLOC
• Mainly in C/C++

Page 7: Emad fse2011 final

Study Overview

Part 1: Exploratory Study of Breakages and Surprises
Part 2: Prediction of Breakages and Surprises
Part 3: Understanding Prediction Models of Breakages and Surprises
Part 4: Value of Focusing on Breakages and Surprises

Page 8: Emad fse2011 final

Exploratory Study of Breakages and Surprises

Share of all files:
• Post-release defects: 10%
• Breakages: 2%
• Surprises: 2%

Rare (2% of files): very difficult to model

6% overlap: should study them separately

Page 9: Emad fse2011 final


Predicting Breakages and Surprises

Page 10: Emad fse2011 final

Prediction Using Logistic Regression

Outcome = Const + β1·factor 1 + β2·factor 2 + β3·factor 3 + … + βn·factor n

Outcome: Breakage? Surprise?

Factors From 3 Dimensions
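The logistic regression form on this slide can be sketched in a few lines of Python. This is only an illustration: the toy data, the factor names in the comment, the learning rate, and the helper names `fit_logistic` and `predict` are all assumptions made for the example, not the study's data or tooling. In a logistic model the linear combination of factors is passed through a sigmoid to yield a risk between 0 and 1.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Estimate Const and beta_1..beta_n by batch gradient ascent
    on the log-likelihood (a simple stand-in for a statistics package)."""
    n_factors = len(rows[0])
    const = 0.0
    betas = [0.0] * n_factors
    for _ in range(epochs):
        grad_const = 0.0
        grad_betas = [0.0] * n_factors
        for factors, label in zip(rows, labels):
            p = sigmoid(const + sum(b * x for b, x in zip(betas, factors)))
            error = label - p
            grad_const += error
            for i, x in enumerate(factors):
                grad_betas[i] += error * x
        const += learning_rate * grad_const / len(rows)
        betas = [b + learning_rate * g / len(rows)
                 for b, g in zip(betas, grad_betas)]
    return const, betas

def predict(const, betas, factors):
    """Predicted probability that a file has the outcome (e.g., a breakage)."""
    return sigmoid(const + sum(b * x for b, x in zip(betas, factors)))

# Made-up, pre-scaled toy data; columns are hypothetical factors:
# [size, pre_release_defects, latest_change]
ROWS = [[0.1, 0.0, 0.2], [0.2, 0.1, 0.1], [0.3, 0.2, 0.3],
        [0.8, 0.9, 0.7], [0.9, 0.7, 0.8], [0.7, 0.8, 0.9]]
LABELS = [0, 0, 0, 1, 1, 1]  # 1 = file had the outcome, 0 = it did not

CONST, BETAS = fit_logistic(ROWS, LABELS)
```

Calling `predict(CONST, BETAS, ROWS[4])` returns the fitted risk for the fifth toy file; in practice a statistics package would replace the hand-written fitting loop.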

Page 11: Emad fse2011 final

Factors Used to Model Breakages and Surprises

Traditional: Size, Pre-release defects

Co-changed files: Number, churn, size, pre-release changes, pre-release defects

Time: Latest change, Age

Page 12: Emad fse2011 final

Prediction Results

Breakages and Surprises

Recall: 74.1% and 71.2%

Precision: 6.7% and 4.7% (Random Predictor: 2.0% in both cases)

2-3X precision, high recall
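The "2-3X" figure can be checked with simple arithmetic on the precision values shown on this slide. The sketch below assumes that the improvement is the ratio of model precision to random-predictor precision; the function name `precision_lift` is made up for the example.

```python
def precision_lift(model_precision, random_precision):
    """Ratio of the model's precision to that of a random predictor."""
    return model_precision / random_precision

# Precision values (in percent) as listed on the slide.
LIFTS = [precision_lift(6.7, 2.0), precision_lift(4.7, 2.0)]
```

A random predictor's expected precision equals the share of files that actually have the defect, which is why the 2.0% baseline matches the 2% of files reported earlier.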

Page 13: Emad fse2011 final


Understanding Breakages and Surprises Models

Page 14: Emad fse2011 final

Determining Important Factors

Quality of fit: Deviance Explained

Example: Breakages R1.1
• Traditional: 15.6%
• Co-change: +1.5%
• Time: +0.4%
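The slide does not define "deviance explained". A minimal sketch follows, assuming the common definition for logistic models: one minus the ratio of the model's deviance to the deviance of a null model that predicts the overall defect rate for every file. The function names and the toy numbers in the test data are illustrative assumptions.

```python
import math

def binomial_deviance(labels, probabilities):
    """-2 * log-likelihood of 0/1 labels under the predicted probabilities."""
    eps = 1e-12  # guard against log(0)
    log_likelihood = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        log_likelihood += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * log_likelihood

def deviance_explained(labels, probabilities):
    """Percentage of the null deviance removed by the model (assumed definition)."""
    base_rate = sum(labels) / len(labels)
    null_deviance = binomial_deviance(labels, [base_rate] * len(labels))
    model_deviance = binomial_deviance(labels, probabilities)
    return 100.0 * (1.0 - model_deviance / null_deviance)
```

Under this reading, the increments on the slide would come from refitting the model after adding each group of factors and reporting the gain in deviance explained: 15.6% for traditional factors, then +1.5% and +0.4%.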

Page 15: Emad fse2011 final

Important Factors for High-Impact Defects

[Figure: Deviance Explained (%) per release (R1.1, R2.1, R3, R4, R4.1), split into Traditional, Co-change, and Time factors; two panels: Breakages and Surprises; y-axis from 0 to 40]

Page 16: Emad fse2011 final


Value of Focusing on Breakages and Surprises

Page 17: Emad fse2011 final

Building Specialized Models

General model: Train on Post-release Defects, Test on Breakages

Specialized model: Train on Breakages, Test on Breakages

Compare False Positives

Page 18: Emad fse2011 final

Effort Savings Using Specialized Models

[Figure: bar chart of Effort Savings (%) for Breakages and Surprises, measured by File and by LOC; bar values: 41, 42, 55, 50; y-axis from 0 to 100]

40-50% Effort Savings Using Specialized Models

Page 19: Emad fse2011 final

Take Home Messages

1. Breakages and Surprises are different. Occur in 2% of files, hard to predict
2. Achieve 2-3X improvement in precision, high recall
3. Breakages: Traditional metrics; Surprises: Co-change and Time metrics
4. Building specialized models saves 40-50% effort

http://research.cs.queensu.ca/home/emads/data/FSE2011/hid_artifact.html


Page 21: Emad fse2011 final

Quantifying Effort Savings

Set recall to be the same

General model
                 Actual Yes   Actual No
Predicted Yes        26          538
Predicted No          7          875

Specialized model
                 Actual Yes   Actual No
Predicted Yes        26          320
Predicted No          7         1093

Effort Savings ~41%!
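One way to arrive at a figure near 41% from the two confusion matrices on this slide is to compare false positives at equal recall. The sketch below rests on two assumptions: that the matrix with fewer false positives (320) belongs to the specialized model, and that effort is measured as the number of files flagged unnecessarily. The dictionary layout and function names are made up for the example.

```python
# Confusion matrices from the slide; the model assignment is an assumption.
GENERAL = {"tp": 26, "fp": 538, "fn": 7, "tn": 875}
SPECIALIZED = {"tp": 26, "fp": 320, "fn": 7, "tn": 1093}

def recall_of(matrix):
    """Share of actual defect files that the model flags."""
    return matrix["tp"] / (matrix["tp"] + matrix["fn"])

def effort_savings(general, specialized):
    """Percentage reduction in false positives (assumed effort measure)."""
    return 100.0 * (general["fp"] - specialized["fp"]) / general["fp"]

SAVINGS = effort_savings(GENERAL, SPECIALIZED)
```

Both matrices give the same recall (26 of 33 defect files), so the comparison isolates the number of files that would be inspected without finding a breakage.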

Page 22: Emad fse2011 final

Remaining Challenges

• “We tend to test features not files”
  – Can we predict defects for features?
• “Without knowing more about the nature of the defect or recommendations for how to fix it, I am not sure how we can use it”
  – Predict the nature of defects
  – Can we provide specific remediation strategies for predicted defects?
    • e.g., surprises mostly relate to incorrectly implemented requirements

Page 23: Emad fse2011 final

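Quantifying Effect... An Example...

Baseline input: Median Size, Median Pre-defects, ..., Median age → Prediction Model → 0.1

Modified input: 2 x Median Size (other factors unchanged) → Prediction Model → 0.2

Effect: 100% increase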
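The example on this slide can be reproduced with a small sketch: evaluate the model with every factor at its median, double one factor, and report the relative change in predicted risk. The coefficients, medians, and function names below are hypothetical and were chosen only so that the output matches the slide's 0.1 and 0.2; they are not values from the study.

```python
import math

def predicted_risk(intercept, coefficients, factors):
    """Logistic model output for one file, given its factor values."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in factors.items())
    return 1.0 / (1.0 + math.exp(-z))

def effect_of_doubling(intercept, coefficients, medians, factor):
    """Percent change in risk when one factor goes from its median to 2x median."""
    baseline = predicted_risk(intercept, coefficients, medians)
    changed = dict(medians)
    changed[factor] = 2 * medians[factor]
    modified = predicted_risk(intercept, coefficients, changed)
    return 100.0 * (modified - baseline) / baseline

# Hypothetical medians and coefficients, picked to give 0.1 -> 0.2.
MEDIANS = {"size": 1.0, "pre_defects": 0.0}
COEFFICIENTS = {"size": math.log(2.25), "pre_defects": 0.5}
INTERCEPT = math.log(1 / 9) - math.log(2.25)

EFFECT = effect_of_doubling(INTERCEPT, COEFFICIENTS, MEDIANS, "size")
```

With these made-up values the baseline risk is 0.1 and the risk at twice the median size is 0.2, so `EFFECT` is a 100% increase.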

Page 24: Emad fse2011 final

Effect of Factors on Breakages and Surprises

[Figure: bar chart of the effect of Pre-release defects, Size, No. of co-changed files, Churn of co-changed files, and Latest change on Breakages and Surprises; bar values: 154, 39, -85, -19, -92; axis from -150 to 200]

Page 25: Emad fse2011 final

High Impact Defects: Summary

Can we identify them?
Yes, 2-3X precision, ~70% recall

What factors best predict them?
Breakages: Traditional
Surprises: Co-change and release schedule

What is the value of focusing on them?
40-50% effort savings

Page 26: Emad fse2011 final

Current approaches predict the obvious

Focus on high-impact defects, i.e., surprises and breakages

Pre-defects and size predict breakages

Number and churn of co-changed files and late changes predict surprises

Using specialized models reduces effort by 40-50%

Page 27: Emad fse2011 final

Study Overview

Extract Metrics
1. Traditional
2. Co-change
3. Time

Build Statistical Models
Logistic Regression

Analyze Effect on Quality
1. Predictive & explanative power
2. Quantify Effect

Page 28: Emad fse2011 final

Breakage Defects

Defects that break existing functionality

• Affect an established customer base
• Hurt quality image

Page 29: Emad fse2011 final

Surprise Defects

Flag files with defects in unexpected locations

• Catch practitioners off guard
• Interrupt schedules

High ratio of post-to-pre defects

Page 30: Emad fse2011 final

Predicting Breakages and Surprises: Explanative Power

Breakages and Surprises: 17.8% and 13.1%

State of the Art (post-release): 17.7-27.9%

Page 31: Emad fse2011 final

Stability of Important Factors: Breakages

[Figure: stability of each factor (No. co-changed files, Late changes, Pre-defects, Size, Churn co-changed files) across releases R1.1, R2.1, R3, R3.1, R4.1; legend: Highly stable, Mainly stable, Not stable]

Page 32: Emad fse2011 final

Stability of Important Factors

[Figure: stability of Size, Pre-defects, No. co-change files, Churn of co-change files, and Late changes across releases R1.1, R2.1, R3, R3.1, R4.1; two panels: Breakages and Surprises]

Page 35: Emad fse2011 final

Defect Prediction Helps Focus Quality Assurance Efforts

Build the model:
Extract Metrics (size, complexity, ...) + Post-release defects → Model (e.g., Logistic Regression): D(f) = C + 0.1*size(f) + 0.2*complexity(f) + …

Apply the model:
Extract Metrics (size, complexity, ...) → D(f) = C + 0.1*size(f) + 0.2*complexity(f) + … → D(f) = 0.8, D(f) = 0.6

Page 36: Emad fse2011 final

Factors Used to Model High Impact Defects

Traditional: Size, Pre-release defects, Age

Co-changed files: Number, churn, size, pre-release changes, pre-release defects

Release schedule: Latest changes

Page 37: Emad fse2011 final

Prediction Factors

Traditional: Size, Pre-release defects

Co-Changed Files: # of files, Churn, Size, Pre-release defects, Pre-release changes

Time: Latest Change, Age

Page 38: Emad fse2011 final

Evaluation of Prediction Model

Data: Training 2/3 (Build Model), Testing 1/3 (Input → Outcome)

                 Actual Yes   Actual No
Predicted Yes        TP           FP
Predicted No         FN           TN

$\text{Precision} = \frac{TP}{TP + FP}$

$\text{Recall} = \frac{TP}{TP + FN}$
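The evaluation setup on this slide can be sketched as follows. Precision and recall follow the standard confusion-matrix definitions; the random shuffle in the 2/3 and 1/3 split, the fixed seed, and the function names are assumptions for the example, since the slide does not say how files are assigned to each set.

```python
import random

def precision(tp, fp):
    """Share of files flagged by the model that really have the defect."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of files with the defect that the model flags."""
    return tp / (tp + fn)

def split_train_test(items, train_fraction=2 / 3, seed=0):
    """Shuffle the items and split them into training and testing sets.
    Random assignment is an assumption, not stated on the slide."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

For instance, with the counts TP = 26, FP = 320, and FN = 7 from the effort-savings backup slide, `precision(26, 320)` is about 0.075 and `recall(26, 7)` is about 0.79.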