
Page 1: A Comparative Analysis of the Efficiency of Change Metrics and Static Code Attributes for Defect Prediction

Raimund Moser, Witold Pedrycz, Giancarlo Succi

Free University of Bolzano-Bozen

University of Alberta

Page 2: Defect Prediction

Defect prediction supports meeting quality standards and achieving customer satisfaction.

Questions:

1. Which metrics are good defect predictors?

2. Which models should be used?

3. How accurate are those models?

4. How much does it cost, and what are the benefits?

Page 3: Approaches for Defect Prediction

1. Product-centric: measures extracted from the static or dynamic structure of the source code, design documents, and requirements

2. Process-centric: change history of source files (number or size of modifications, age of a file); changes in the team structure, testing effort, technology, and other human factors related to software defects

3. Combination of both

Page 4: Previous Work

Two aspects of defect prediction:

1. The relationship between software defects and code metrics: no agreed answer, and no cost-sensitive prediction

2. The impact of the software process on the defectiveness of software

Page 5: Questions to Answer by This Work

Questions:

1. Which metrics are good defect predictors?

2. Which models should be used?

3. How accurate are those models?

4. How much does it cost, and what are the benefits?

Are change metrics more useful?

Which change metrics are good?

How can cost-sensitive analysis be used?

Not how many defects are present in a subsystem, but: is a source file defective?

Page 6: Outline

• Experimental Set-Up
• Assessing Classification Accuracy
• Accuracy Classification Results
• Cost-Sensitive Classification
• Cost-Sensitive Defect Prediction
• Experiment Using a Cost Factor of 5

Page 7: Data & Experimental Set-Up

Public data set from the Eclipse CVS repository (releases 2.0, 2.1, 3.0) by Zimmermann et al.

18 change metrics concerning the change history of files; 31 static code metrics that Zimmermann et al. have used at the file level (correlation analysis, logistic regression, and ranking analysis).
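As an illustration of this set-up, a minimal sketch of preparing such a data set for file-level classification; the file name, separator, and column names (e.g. a `post` count of post-release defects) are assumptions, since the deck does not specify the data layout:

```python
# Hedged sketch: load a per-file metrics table and derive a binary
# "defective" label. File name, separator, and column names are
# assumptions, not guaranteed by the original data set.
import pandas as pd

df = pd.read_csv("eclipse-metrics-files-2.0.csv", sep=";")

# A file counts as defective if it has at least one post-release defect.
df["defective"] = (df["post"] > 0).astype(int)

# Hypothetical split into the two metric families compared in the study,
# assuming both families were merged into the same per-file table.
CHANGE_METRICS = ["REVISIONS", "BUGFIXES", "REFACTORINGS", "MAX_CHANGESET", "AGE"]
CODE_METRICS = [c for c in df.columns
                if c not in CHANGE_METRICS + ["post", "defective"]]
```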

Page 8: One Possible Proposal of Change Metrics

• REFACTORINGS: renaming or moving of software elements

• MAX_CHANGESET: the number of files that have been committed together with file x

• AGE: age of a file in weeks, counted from the release date back to its first appearance
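A minimal sketch of how such metrics could be computed from version-control data; the commit-log format (a list of records with a file set and a bug-fix flag) is a hypothetical stand-in for parsed CVS logs:

```python
from collections import defaultdict

def compute_change_metrics(commits):
    """commits: iterable of {"files": set[str], "is_bugfix": bool}."""
    revisions = defaultdict(int)      # commits touching the file
    bugfixes = defaultdict(int)       # bug-fix commits touching the file
    max_changeset = defaultdict(int)  # largest set of co-committed files

    for commit in commits:
        changeset_size = len(commit["files"])
        for f in commit["files"]:
            revisions[f] += 1
            if commit["is_bugfix"]:
                bugfixes[f] += 1
            # files committed together with f, excluding f itself
            max_changeset[f] = max(max_changeset[f], changeset_size - 1)
    return revisions, bugfixes, max_changeset
```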

Page 9: Experiments

Build three models for predicting the presence or absence of defects in files (sketched below):

1. Change Model: uses the proposed change metrics

2. Code Model: uses static code metrics

3. Combined Model: uses both types of metrics
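A minimal sketch of this comparison, using scikit-learn's DecisionTreeClassifier as a stand-in for the J48 learner named later in the deck (J48 is Weka's C4.5 implementation); `df`, `CHANGE_METRICS`, and `CODE_METRICS` are the hypothetical names from the data-loading sketch, and 10-fold cross-validation is an illustrative choice:

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

feature_sets = {
    "change":   CHANGE_METRICS,                 # 1. Change Model
    "code":     CODE_METRICS,                   # 2. Code Model
    "combined": CHANGE_METRICS + CODE_METRICS,  # 3. Combined Model
}

for name, features in feature_sets.items():
    model = DecisionTreeClassifier()
    scores = cross_val_score(model, df[features], df["defective"], cv=10)
    print(f"{name:8s} mean accuracy: {scores.mean():.3f}")
```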

Page 10: Outline

• Experimental Set-Up
• Assessing Classification Accuracy
• Accuracy Classification Results
• Cost-Sensitive Classification
• Cost-Sensitive Defect Prediction
• Experiment Using a Cost Factor of 5

Page 11: Results: Assessing Classification Accuracy
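The accuracy measures reported on the following slides can be stated precisely; a short sketch of their definitions in terms of a confusion matrix (tp, fp, tn, fn):

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of correctly classified files."""
    return (tp + tn) / (tp + fp + tn + fn)

def recall(tp, fn):
    """Fraction of defective files actually flagged as defective."""
    return tp / (tp + fn)

def fp_rate(fp, tn):
    """Fraction of defect-free files wrongly flagged as defective."""
    return fp / (fp + tn)
```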

Page 12: Accuracy Classification Results

By analyzing the decision trees (rendered as code below):

Defect free: a large MAX_CHANGESET or low REVISIONS; also a smaller MAX_CHANGESET combined with low REVISIONS and REFACTORINGS

Defect prone: a high number of BUGFIXES
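The slide's overlapping tree paths can be compressed into two rules; the numeric thresholds below are placeholders, since the slide only gives directions ("large", "low", "high"), not the learned cut-points:

```python
def likely_defect_free(revisions, max_changeset, refactorings,
                       big_changeset=30, few_revisions=3):
    # Large MAX_CHANGESET alone indicates a defect-free file.
    if max_changeset > big_changeset:
        return True
    # For smaller change sets: low REVISIONS, reinforced by REFACTORINGS.
    return revisions <= few_revisions and refactorings > 0

def likely_defect_prone(bugfixes, many_bugfixes=5):
    # A high number of BUGFIXES signals defect proneness.
    return bugfixes > many_bugfixes
```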

Page 13: Outline

• Experimental Set-Up
• Assessing Classification Accuracy
• Accuracy Classification Results
• Cost-Sensitive Classification
• Cost-Sensitive Defect Prediction
• Experiment Using a Cost Factor of 5

Page 14: Cost-Sensitive Classification

Cost-sensitive classification associates different costs with the different errors a model makes. The classifier minimizes the total misclassification cost, e.g. c_FP · FP + c_FN · FN, with a cost factor λ = c_FN / c_FP > 1: false negatives incur higher costs than false positives. It is more costly to fix an undetected defect in the post-release cycle than to inspect a defect-free file.
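A minimal worked example of this cost model, with c_FP normalized to 1 and the cost factor λ applied to false negatives:

```python
def total_cost(fp, fn, lam=5.0):
    """Misclassification cost with c_FP = 1 and c_FN = lam."""
    return fp + lam * fn

# With lam = 5, 10 missed defective files (FN) cost as much as
# 50 wrongly flagged defect-free files (FP):
print(total_cost(fp=0, fn=10))  # 50.0
print(total_cost(fp=50, fn=0))  # 50.0
```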

Page 15: Cost-Sensitive Defect Prediction: Results for the J48 Learner, Release 2.0

Heuristic to stop increasing the recall: raise the cost factor only while the false positive rate stays below 30% (FP < 30%); for these data this yields λ = 5. A sketch of this search follows.
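In the sketch, `fp_rate_for` is a hypothetical helper that trains a predictor with the given cost factor and returns its false positive rate (in scikit-learn, for instance, the cost factor could be approximated via `class_weight={0: 1, 1: lam}`):

```python
def pick_cost_factor(fp_rate_for, max_fp_rate=0.30, max_lam=20):
    """Raise the cost factor while the FP rate stays below the cap."""
    best = 1
    for lam in range(2, max_lam + 1):
        if fp_rate_for(lam) >= max_fp_rate:
            break  # stop increasing recall: FP rate would reach 30%
        best = lam
    return best
```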

Page 16: Experiment Using a Cost Factor of 5

Result: reject H0; defect predictors based on change data outperform those based on static code attributes.
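The deck does not name the statistical test behind this rejection of H0; purely as an illustration, a paired comparison of per-release misclassification costs of the two predictors could look like this:

```python
from scipy.stats import ttest_rel

def change_beats_code(change_costs, code_costs, alpha=0.05):
    """H0: both predictors have the same mean misclassification cost.

    Reject H0 when the difference is significant and the change-based
    predictor is cheaper on average.
    """
    _, p_value = ttest_rel(change_costs, code_costs)
    return p_value < alpha and sum(change_costs) < sum(code_costs)
```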

Page 17: Limitations

• Dependence on a specific environment
• Conclusions rest on only three data miners
• Choice of code and change metrics
• Reliability of the data:
  - mapping between defects and locations in the source code
  - extraction of code or change metrics from repositories

Page 18: Conclusions

18 change metrics, the J48 learner, and λ = 5 give accurate results for three releases of the Eclipse project:

• > 75% correctly classified files
• > 80% recall
• < 30% false positive rate

Hence, the change metrics contain more discriminatory and meaningful information about the defect distribution than the source code itself.

Important change metrics:

• Defect prone: files with high revision numbers and large bug-fixing activity

• Defect free: files that are part of large CVS commits or have been refactored several times

Page 19: Future Research

Which information in change data is relevant for defect prediction?

How to extract this data automatically?