emad fse2011 final

High-Impact Defects: A Study of Breakage and Surprise Defects

Emad Shihab, Audris Mockus, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan


We know that...

Software always has defects.

Projects always have limited resources.

Q: How can we spend the limited resources to maximize quality?


Defect Prediction

Input: Metrics (Size, Complexity, Churn, Pre-release defects, ...)

→ Prediction Model →

Output: Risk [0..1] (e.g., 0.8, 0.1)

Key Predictors: Size and pre-release defects
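The risk scores are what let a project spend limited quality-assurance effort where it matters most. A minimal sketch, assuming the model has already produced one score per file and that the budget is simply a fixed number of files to inspect; the function name `prioritize`, the file names and the scores are hypothetical.

```python
def prioritize(risk: dict[str, float], budget: int) -> list[str]:
    """Return the `budget` files with the highest predicted risk, riskiest first."""
    ranked = sorted(risk, key=lambda name: risk[name], reverse=True)
    return ranked[:budget]

# Hypothetical risk scores produced by some prediction model
risk = {"billing.c": 0.8, "ui.cpp": 0.1, "router.c": 0.6}
print(prioritize(risk, 2))  # ['billing.c', 'router.c']
```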


Existing Approaches Aren’t Adding Value

• Obvious to practitioners
• Require a large amount of effort
• Not all defects are equally important

So... what can we do? FOCUS ON HIGH-IMPACT DEFECTS!


Impact Is In The Eye of The Beholder!

Customers: Breakages
• Break existing functionality
• Affect established customers
• Hurt company image

Developers: Surprises
• Occur in unexpected locations
• Low pre-, high post-release defects
• Catch developers off-guard
• Lead to schedule interruptions
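Surprises are characterized by counts rather than by defect content: few pre-release defects, many post-release defects. A minimal sketch of one way to turn that into a labeling rule; the ratio form, the thresholds `min_post` and `min_ratio`, the +1 smoothing and the name `is_surprise` are all hypothetical choices for illustration, not the study's exact definition. Breakages cannot be labeled this way, since they depend on whether a defect broke existing functionality.

```python
def is_surprise(pre_defects: int, post_defects: int,
                min_post: int = 1, min_ratio: float = 2.0) -> bool:
    """Flag a file whose post-release defects are high relative to its
    pre-release defects (hypothetical thresholds, for illustration only)."""
    if post_defects < min_post:
        return False  # no post-release defects: nothing surprising
    # +1 smoothing (an assumption) avoids dividing by zero for files with
    # no pre-release defects, which are exactly the "unexpected" locations.
    ratio = post_defects / (pre_defects + 1)
    return ratio >= min_ratio

# Hypothetical (pre-release, post-release) defect counts per file
files = {"a.c": (0, 3), "b.c": (10, 2), "c.c": (1, 0)}
for name, (pre, post) in files.items():
    print(name, is_surprise(pre, post))
```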


Case Study

Commercial telecom project

• 30+ years of development
• 7+ MLOC
• Mainly in C/C++


Study Overview

Part 1: Exploratory Study of Breakages and Surprises

Part 2: Prediction of Breakages and Surprises

Part 3: Understanding Prediction Models of Breakages and Surprises

Part 4: Value of Focusing on Breakages and Surprises



Figure: Venn diagram of all files: 10% have post-release defects, 2% have breakages, 2% have surprises.

Rare (2% of files): very difficult to model

Only 6% overlap between breakages and surprises: should study them separately


Part 2: Predicting Breakages and Surprises


Prediction Using Logistic Regression

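$$\text{Outcome} = \text{Const} + \beta_1 \cdot \text{factor}_1 + \beta_2 \cdot \text{factor}_2 + \beta_3 \cdot \text{factor}_3 + \dots + \beta_n \cdot \text{factor}_n$$

Outcome: Breakage? Surprise?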

Factors From 3 Dimensions
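The slide writes the model as a weighted sum of factors. In logistic regression that sum is the log-odds of the outcome (does the file have a breakage, or a surprise?), and the logistic function maps it to a probability between 0 and 1. A minimal sketch; the function name, coefficients and factor values are made up for illustration.

```python
import math

def predict_probability(const: float, betas: list[float], factors: list[float]) -> float:
    """Logistic regression: the linear predictor is the log-odds of the outcome."""
    if len(betas) != len(factors):
        raise ValueError("need one coefficient per factor")
    log_odds = const + sum(b * x for b, x in zip(betas, factors))
    return 1.0 / (1.0 + math.exp(-log_odds))  # logistic (sigmoid) function

# Hypothetical coefficients for three factors (e.g. size, pre-release defects, age)
p = predict_probability(-3.0, [0.5, 0.8, -0.2], [2.0, 1.0, 0.5])
print(round(p, 3))
```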


Factors Used to Model Breakages and Surprises

Traditional: Size, Pre-release defects

Co-changed files: Number, churn, size, pre-release changes, pre-release defects

Time: Latest change, Age


Prediction Results

                       Breakages   Surprises
Precision              6.7%        4.7%
Precision (random)     2.0%        2.0%
Recall                 74.1%       71.2%

2-3X precision, high recall
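A random predictor's expected precision equals the share of files that actually have the defect type, which is why it sits at about 2% here (breakages and surprises each occur in roughly 2% of files). The sketch below computes precision, recall and the lift over that random baseline from confusion-matrix counts; the counts are hypothetical, chosen only for illustration.

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def lift_over_random(tp: int, fp: int, fn: int, tn: int) -> float:
    """Precision divided by the precision expected from random flagging,
    which equals the prevalence of defective files."""
    prevalence = (tp + fn) / (tp + fp + fn + tn)
    return precision(tp, fp) / prevalence

# Hypothetical counts for 1,000 files, 20 of which actually have the defect type
tp, fp, fn, tn = 15, 210, 5, 770
print(f"precision={precision(tp, fp):.1%} recall={recall(tp, fn):.1%} "
      f"lift={lift_over_random(tp, fp, fn, tn):.1f}x")
```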


Part 3: Understanding the Breakage and Surprise Models


Determining Important Factors

Example: Breakages R1.1

Quality of fit (deviance explained), adding factor groups in order:
Traditional: 15.6%
+ Co-change: +1.5%
+ Time: +0.4%
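Deviance explained compares a model's residual deviance with that of a null model that predicts the base rate for every file: 1 - residual / null. Assuming the "+1.5%" and "+0.4%" increments are differences in deviance explained between nested models (traditional only, then adding co-change, then time), they can be computed as below. A minimal sketch; the outcomes and the two probability vectors are made-up stand-ins for fitted models.

```python
import math

def binomial_deviance(y: list[int], p: list[float]) -> float:
    """-2 times the log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return -2.0 * sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p)
    )

def deviance_explained(y: list[int], p: list[float]) -> float:
    """1 - residual deviance / null deviance; the null model predicts the base rate."""
    base_rate = sum(y) / len(y)
    null = binomial_deviance(y, [base_rate] * len(y))
    return 1.0 - binomial_deviance(y, p) / null

# Hypothetical outcomes (1 = file has a breakage) and predicted probabilities
y = [1, 0, 0, 0, 1, 0, 0, 0]
p_traditional = [0.6, 0.2, 0.3, 0.1, 0.4, 0.2, 0.3, 0.1]
p_plus_cochange = [0.7, 0.2, 0.3, 0.1, 0.4, 0.2, 0.2, 0.1]

d_trad = deviance_explained(y, p_traditional)
d_more = deviance_explained(y, p_plus_cochange)
print(f"traditional: {d_trad:.1%}, added by co-change: {d_more - d_trad:+.1%}")
```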

Important Factors for High-Impact Defects

Figure: Deviance explained (%) by the Traditional, Co-change and Time factors in each release (R1.1 to R4.1), one panel each for Breakages and Surprises; y-axis from 0 to 40%.


Part 4: Value of Focusing on Breakages and Surprises

Building Specialized Models

General model: train on post-release defects, test on breakages

Specialized model: train on breakages, test on breakages

Compare false positives


Effort Savings Using Specialized Models

Figure: Effort savings (%) of specialized models for Breakages and Surprises, measured in files and in LOC (bar labels: 41, 42, 55, 50).

40-50% Effort Savings Using Specialized Models


Take Home Messages

1. Breakages and surprises are different. They occur in 2% of files and are hard to predict.
2. Achieve a 2-3X improvement in precision, with high recall.
3. Breakages are best explained by traditional metrics; surprises by co-change and time metrics.
4. Building specialized models saves 40-50% effort.

http://research.cs.queensu.ca/home/emads/data/FSE2011/hid_artifact.html


Quantifying Effort Savings

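General model (rows: predicted, columns: actual)
                 Actual Yes   Actual No
Predicted Yes    26           538
Predicted No     7            875

Specialized model (rows: predicted, columns: actual)
                 Actual Yes   Actual No
Predicted Yes    26           320
Predicted No     7            1093

Set recall to be the same (26/33 in both models)

Effort Savings ~41%!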
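The ~41% figure is consistent with comparing false positives at equal recall: assuming the matrix with 538 false positives belongs to the general model and the one with 320 to the specialized model, the specialized model avoids (538 - 320) / 538 ≈ 40.5% of the false positives. A minimal sketch of that arithmetic; the function name is hypothetical.

```python
def false_positive_savings(fp_general: int, fp_specialized: int) -> float:
    """Fraction of false positives (wasted inspection effort) avoided
    by the specialized model at equal recall."""
    return (fp_general - fp_specialized) / fp_general

# Counts from the two confusion matrices; both models find 26 of 33 defective files
general = {"tp": 26, "fp": 538, "fn": 7, "tn": 875}
specialized = {"tp": 26, "fp": 320, "fn": 7, "tn": 1093}
assert general["tp"] == specialized["tp"]  # same recall, as the comparison requires
print(f"{false_positive_savings(general['fp'], specialized['fp']):.1%}")
```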


Remaining Challenges

• “We tend to test features not files”
  - Can we predict defects for features?

• “Without knowing more about the nature of the defect or recommendations for how to fix it, I am not sure how we can use it”
  - Predict the nature of defects
  - Can we provide specific remediation strategies for predicted defects?
    • e.g., surprises mostly relate to incorrectly implemented requirements


Quantifying Effect: An Example

All factors at their medians (median size, median pre-defects, ..., median age) → Prediction Model → 0.1

Size doubled (2 x median size), other factors unchanged → 0.2, i.e., a 100% increase
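The example holds every factor at its median, doubles one factor, and reports the relative change in predicted probability: (0.2 - 0.1) / 0.1 = 100%. A minimal sketch of that procedure for a logistic regression model; the function names, coefficients and medians are hypothetical.

```python
import math

def probability(log_odds: float) -> float:
    return 1.0 / (1.0 + math.exp(-log_odds))

def effect_of_doubling(const: float, betas: list[float],
                       medians: list[float], factor: int) -> float:
    """Percent change in predicted probability when one factor is doubled
    and all other factors are held at their medians."""
    base = const + sum(b * m for b, m in zip(betas, medians))
    doubled = base + betas[factor] * medians[factor]  # beta * (2m - m)
    p0, p1 = probability(base), probability(doubled)
    return 100.0 * (p1 - p0) / p0

# Hypothetical model: factor 0 = size (LOC), factor 1 = pre-release defects
print(f"{effect_of_doubling(-2.5, [0.01, 0.3], [100, 2], 0):.1f}%")
```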


Effect of Factors on Breakages and Surprises

Figure: Effect (%) of each factor (Pre-release defects, Size, No. co-changed files, Churn of co-changed files, Latest change) on Breakages and Surprises; y-axis from -150% to 200%; labeled values: 154, 39, -85, -19, -92.


High Impact Defects: Summary

Can we identify them? Yes: 2-3X precision, ~70% recall

What factors best predict them? Breakages: traditional factors; Surprises: co-change and release schedule factors

What is the value of focusing on them? 40-50% effort savings


Current approaches predict the obvious

Focus on high-impact defects, i.e., surprises and breakages

Pre-defects and size predict breakages

Number and churn of co-changed files and late changes predict surprises

Using specialized models reduces effort by 40-50%


Study Overview

Extract Metrics: 1. Traditional, 2. Co-change, 3. Time

Build Statistical Models: Logistic Regression

Analyze Effect on Quality: 1. Predictive & explanative power, 2. Quantify effect


Breakage Defects

Defects that break existing functionality

Affect an established customer base

Hurt quality image


Surprise Defects

Flag files with defects in unexpected locations

Catch practitioners off guard

Interrupt schedules

High ratio of post-to-pre defects


Predicting Breakages and Surprises: Explanative Power

Breakages: 17.8%, Surprises: 13.1%

State of the art (post-release defects): 17.7-27.9%


Stability of Important Factors: Breakages

Figure: Stability of each important factor (No. co-changed files, Late changes, Pre-defects, Size, Churn of co-changed files) across releases R1.1, R2.1, R3.1, R3 and R4.1, rated as highly stable, mainly stable or not stable.


Stability of Important Factors

Figure: Stability of the important factors (Size, Pre-defects, No. co-changed files, Churn of co-changed files, Late changes) across releases R1.1, R2.1, R3.1, R3 and R4.1, in two panels: Breakages and Surprises.


Defect Prediction Helps Focus Quality Assurance Efforts

Training: Extract Metrics (Size, Complexity, ...) and Post-release defects → Model (e.g., Logistic Regression):

$$D(f) = C + 0.1 \cdot \mathrm{size}(f) + 0.2 \cdot \mathrm{complexity}(f) + \dots$$

Prediction: Extract Metrics (Size, Complexity, ...) for new files → apply D(f) → e.g., D(f) = 0.8, D(f) = 0.6


Factors Used to Model High Impact Defects

Traditional: Size, Pre-release defects, Age

Co-changed files: Number, churn, size, pre-release changes, pre-release defects

Release schedule: Latest changes


Prediction Factors

Traditional: Size, Pre-release defects

Co-Changed Files: # of files, Churn, Size, Pre-release defects, Pre-release changes

Time: Latest Change, Age


Evaluation of Prediction Model

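Confusion matrix (rows: predicted, columns: actual)
                 Actual Yes   Actual No
Predicted Yes    TP           FP
Predicted No     FN           TN

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

Data is split into Training (2/3) and Testing (1/3): build the model on the training data, then input the testing data to obtain the predicted outcome.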
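A minimal sketch of the 2/3 training, 1/3 testing split; the function name, the fixed seed and the file names are illustrative assumptions, and the study may have repeated or stratified the split differently.

```python
import random

def split_two_thirds(items: list, seed: int = 0) -> tuple[list, list]:
    """Randomly split items into a 2/3 training set and a 1/3 testing set."""
    shuffled = items[:]  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

files = [f"file{i}.c" for i in range(30)]
train, test = split_two_thirds(files)
print(len(train), len(test))  # 20 10
```

The model is then fitted on `train` only, and precision and recall are computed on `test`.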
