emad fse2011 final
High-Impact Defects: A Study of Breakage and Surprise Defects
Emad Shihab, Audris Mockus, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan
2
We know that…
Software always has defects
Projects always have limited resources
Q: How can we spend the limited resources to maximize quality?
3
Defect Prediction
Input: Metrics (Size, Pre-release defects, Complexity, Churn, ...)
Prediction Model
Output: Risk [0..1] (e.g., 0.8, 0.1)
Key Predictors: Size and pre-release defects
4
Existing Approaches Aren’t Adding Value
• Obvious to practitioners
• Require a large amount of effort
• Not all defects are equally important
So… what can we do?
FOCUS ON HIGH-IMPACT DEFECTS!
5
Impact Is In The Eye of The Beholder!
Customers: Breakages
• Break existing functionality
• Affect established customers
• Hurt company image
Developers: Surprises
• Occur in unexpected locations
• Low pre-, high post-release defects
• Catch developers off-guard
• Lead to schedule interruptions
6
Case Study
Commercial telecom project
30+ years of development
7+ MLOC
Mainly in C/C++
7
Study Overview
Part 1: Exploratory Study of Breakages and Surprises
Part 2: Prediction of Breakages and Surprises
Part 3: Understanding Prediction Models of Breakages and Surprises
Part 4: Value of Focusing on Breakages and Surprises
8
Exploratory Study of Breakages and Surprises
Share of all files: Post-release defects 10%, Breakages 2%, Surprises 2%
Rare (2% of files): very difficult to model
6% overlap: should study them separately
9
Predicting Breakages and Surprises
10
Prediction Using Logistic Regression
Outcome = Const + β1·factor1 + β2·factor2 + β3·factor3 + … + βn·factorn
Outcome: Breakage? Surprise?
Factors From 3 Dimensions
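A minimal sketch of how a model of this form turns factor values into a risk score: logistic regression passes the linear combination through the logistic function to get a probability. The function name, coefficients, and factor values below are made up for illustration and are not taken from the study.

```python
import math

def predict_risk(const, betas, factors):
    """Return the logistic-regression probability for one file.

    const   -- intercept (the "Const" term)
    betas   -- coefficients beta_1..beta_n
    factors -- factor values for the file, in the same order as betas
    """
    if len(betas) != len(factors):
        raise ValueError("betas and factors must have the same length")
    linear = const + sum(b * x for b, x in zip(betas, factors))
    # The logistic function maps the linear score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients and factor values (illustration only).
risk = predict_risk(const=-4.0, betas=[0.002, 0.5], factors=[800, 3])
print(f"risk = {risk:.3f}")
```

A file would then be flagged as a likely breakage or surprise when its risk exceeds a chosen cut-off.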
11
Factors Used to Model Breakages and Surprises
Traditional: Size, Pre-release defects
Co-changed files: Number, churn, size, pre-release changes, pre-release defects
Time: Latest change, Age
12
Prediction Results
                    Breakages   Surprises
Precision           6.7%        4.7%
Recall              74.1%       71.2%
Random Predictor    2.0%        2.0%
2-3X precision, high recall
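The "2-3X" figure can be checked with a small calculation. A random predictor's expected precision equals the share of affected files (2.0% here), so the improvement is the model's precision divided by that base rate. The helper name is an assumption made for this sketch.

```python
def precision_lift(model_precision, base_rate):
    """How many times better than a random predictor the model's precision is."""
    if base_rate <= 0:
        raise ValueError("base_rate must be positive")
    return model_precision / base_rate

# Random predictor precision = share of affected files (2.0% on the slide).
base_rate = 0.02
# The two model precisions reported on the slide.
for model_precision in (0.067, 0.047):
    lift = precision_lift(model_precision, base_rate)
    print(f"precision {model_precision:.1%} is {lift:.2f}x the random predictor")
```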
13
Understanding Breakages and Surprises Models
14
Determining Important Factors
Quality of fit: Deviance Explained
Example: Breakages R1.1
Traditional: 15.6%
Adding Co-change: +1.5%
Adding Time: +0.4%
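The slides do not spell out how deviance explained is computed. A common definition, assumed here, is one minus the ratio of the model's deviance to the deviance of a null model that always predicts the base rate. The sketch below uses toy labels and made-up predicted probabilities to show how the contribution of an added factor group can be read off as the increase in that value.

```python
import math

def binomial_deviance(y_true, p_pred):
    """-2 * log-likelihood of 0/1 outcomes under predicted probabilities."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -2.0 * total

def deviance_explained(y_true, p_pred):
    """Fraction of the null deviance removed by the model (assumed definition)."""
    base_rate = sum(y_true) / len(y_true)
    null_deviance = binomial_deviance(y_true, [base_rate] * len(y_true))
    return 1.0 - binomial_deviance(y_true, p_pred) / null_deviance

# Toy data (hypothetical): 1 means the file had a breakage.
y = [0, 0, 0, 1, 0, 1]
p_traditional = [0.2, 0.2, 0.3, 0.5, 0.3, 0.5]    # model with traditional factors
p_with_cochange = [0.1, 0.2, 0.3, 0.6, 0.2, 0.6]  # after adding co-change factors

d1 = deviance_explained(y, p_traditional)
d2 = deviance_explained(y, p_with_cochange)
print(f"traditional: {d1:.1%}, adding co-change: +{d2 - d1:.1%}")
```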
15
Important Factors for High-Impact Defects
[Figure: bar charts of deviance explained (%) per release (R1.1, R2.1, R3, R4, R4.1), one panel for Breakages and one for Surprises, broken down by Traditional, Co-change, and Time factors; y-axis from 0 to 40.]
16
Value of Focusing on Breakages and Surprises
17
Building Specialized Models
General model: train on post-release defects, test on breakages
Specialized model: train on breakages, test on breakages
Compare false positives
18
Effort Savings Using Specialized Models
[Figure: bar chart of effort savings (%) for Breakages and Surprises, measured by File and by LOC; y-axis from 0 to 100; bar labels 41, 42, 55, 50.]
40-50% Effort Savings Using Specialized Models
19
Take Home Messages
1. Breakages and Surprises are different. Occur in 2% of files, hard to predict
2. Achieve 2-3X improvement in precision, high recall
3. Breakages: Traditional metrics. Surprises: Co-change and Time metrics
4. Building specialized models saves 40-50% effort
http://research.cs.queensu.ca/home/emads/data/FSE2011/hid_artifact.html
20
21
Quantifying Effort Savings
Set recall to be the same

General model
                 Actual Yes   Actual No
Predicted Yes    26           538
Predicted No     7            875

Specialized model
                 Actual Yes   Actual No
Predicted Yes    26           320
Predicted No     7            1093

Effort Savings ~41%!
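The ~41% figure can be reproduced from the two confusion matrices. This sketch assumes that effort savings means the relative reduction in false positives, and that the matrix with 320 false positives belongs to the specialized model; the slide does not state either explicitly, and the function names are chosen for illustration.

```python
def recall(tp, fn):
    """Share of actual positives that the model flags."""
    return tp / (tp + fn)

def false_positive_savings(fp_general, fp_specialized):
    """Relative reduction in false positives, i.e. files inspected needlessly."""
    return (fp_general - fp_specialized) / fp_general

# Counts from the two confusion matrices on the slide.
# Assumption: the matrix with fewer false positives is the specialized model.
general = {"tp": 26, "fp": 538, "fn": 7, "tn": 875}
specialized = {"tp": 26, "fp": 320, "fn": 7, "tn": 1093}

# The comparison is only fair because both models are set to the same recall.
assert recall(general["tp"], general["fn"]) == recall(specialized["tp"], specialized["fn"])

savings = false_positive_savings(general["fp"], specialized["fp"])
print(f"false-positive reduction: {savings:.1%}")
```

With these counts the reduction is 218/538, roughly 40.5%, in line with the ~41% on the slide.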
22
Remaining Challenges
• “We tend to test features not files”
  – Can we predict defects for features?
• “Without knowing more about the nature of the defect or recommendations for how to fix it, I am not sure how we can use it”
  – Predict the nature of defects
  – Can we provide specific remediation strategies for predicted defects?
    • e.g., surprises mostly relate to incorrectly implemented requirements
23
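Quantifying Effect…An Example…
Baseline input: Median Size, Median Pre-defects, ..., Median age
Prediction Model output: 0.1
Changed input: 2 x Median Size, all other factors at their medians
Prediction Model output: 0.2
100% increase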
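The procedure in this example can be sketched as follows: predict with every factor at its median, double one factor, predict again, and report the relative change. The coefficients, medians, and function names are hypothetical and chosen only to make the sketch runnable.

```python
import math

def predict(const, betas, values):
    """Logistic-regression probability for a dict of factor values."""
    z = const + sum(betas[name] * values[name] for name in betas)
    return 1.0 / (1.0 + math.exp(-z))

def effect_of_doubling(const, betas, medians, factor):
    """Percent change in predicted probability when one factor is doubled
    while all other factors stay at their median values."""
    baseline = predict(const, betas, medians)
    changed = {**medians, factor: 2 * medians[factor]}
    return 100.0 * (predict(const, betas, changed) - baseline) / baseline

# Hypothetical coefficients and medians (illustration only).
const = -3.0
betas = {"size": 0.001, "pre_defects": 0.3, "age": -0.01}
medians = {"size": 500, "pre_defects": 1, "age": 100}

for factor in betas:
    change = effect_of_doubling(const, betas, medians, factor)
    print(f"doubling {factor}: {change:+.0f}% change in predicted risk")
```

A positive value means the factor raises the predicted risk, and a negative value means it lowers it.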
24
Effect of Factors on Breakages and Surprises
[Figure: bar chart of the effect of each factor (Pre-release defects, Size, No. co-changed files, Churn of co-changed files, Latest change) on Breakages and Surprises; axis from -150 to 200; bar labels 154, 39, -85, -19, -92.]
25
High Impact Defects: Summary
Can we identify them?
Yes, 2-3X precision, ~70% recall
What factors best predict them?
Breakages: Traditional
Surprises: Co-change and release schedule
What is the value of focusing on them?
40-50% effort savings
26
Current approaches predict the obvious
Focus on high-impact, i.e. surprises and breakages
Pre-defects and size predict breakages
Number and churn of co-changed files and late changes predict surprises
Using specialized models reduces effort by 40-50%
27
Study Overview
Extract Metrics: 1. Traditional 2. Co-change 3. Time
Build Statistical Models: Logistic Regression
Analyze Effect on Quality: 1. Predictive & explanative power 2. Quantify Effect
28
Breakage Defects
Defects that break existing functionality
Affect an established customer base
Hurt quality image
29
Surprise Defects
Flag files with defects in unexpected locations
Catch practitioners off guard
Interrupt schedules
High ratio of post-to-pre defects
30
Predicting Breakages and Surprises: Explanative Power
Breakages: 17.8%
Surprises: 13.1%
State of the art (post-release): 17.7 – 27.9%
31
Stability of Important Factors: Breakages
[Figure: stability of each factor (Size, Pre-defects, No. co-changed files, Churn co-changed files, Late changes) across releases R1.1, R2.1, R3, R3.1, R4.1; legend: Highly stable, Mainly stable, Not stable.]
32
Stability of Important Factors
[Figure: stability of each factor (Size, Pre-defects, No. co-change files, Churn of co-change files, Late changes) across releases R1.1, R2.1, R3, R3.1, R4.1, one panel for Breakages and one for Surprises.]
35
Defect Prediction Helps Focus Quality Assurance Efforts
Training: Extract Metrics (Size, Complexity, ...) and Post-release defects
Model (e.g. Logistic Regression): D(f) = C + 0.1*size(f) + 0.2*complexity(f) + …
Prediction: Extract Metrics (Size, Complexity, ...)
D(f) = C + 0.1*size(f) + 0.2*complexity(f) + …
D(f) = 0.8
D(f) = 0.6
36
Factors Used to Model High Impact Defects
Traditional: Size, Pre-release defects, Age
Co-changed files: Number, churn, size, pre-release changes, pre-release defects
Release schedule: Latest changes
37
Prediction Factors
Traditional: Size, Pre-release defects
Co-Changed Files: # of files, Churn, Size, Pre-release defects, Pre-release changes
Time: Latest Change, Age
38
Evaluation of Prediction Model
Data: Training 2/3, Testing 1/3
Build Model: Input, Outcome

                 Actual Yes   Actual No
Predicted Yes    TP           FP
Predicted No     FN           TN

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
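A minimal sketch of this evaluation setup: split the data into two thirds for training and one third for testing, then compute precision and recall from the confusion-matrix counts. The slide does not say how the split is drawn, so the random shuffle, the helper names, and the toy labels are assumptions.

```python
import random

def precision(tp, fp):
    """TP / (TP + FP); 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """TP / (TP + FN); 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for two parallel lists of booleans."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fp, fn, tn

def split_train_test(items, train_fraction=2 / 3, seed=0):
    """Shuffle and split into training and testing parts (assumed random split)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy labels (hypothetical): True means the file had a high-impact defect.
actual = [True, False, False, True, False, False]
predicted = [True, True, False, False, False, True]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(f"precision = {precision(tp, fp):.2f}, recall = {recall(tp, fn):.2f}")

train, test = split_train_test(range(9))
print(f"{len(train)} training items, {len(test)} testing items")
```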