BeNeVol 2010
DESCRIPTION
I used these slides during my presentation at BeNeVol 2010 in Lille, France. Paper: Vasilescu B, Serebrenik A and van den Brand MGJ (2010), "Comparative study of software metrics' aggregation techniques", In Proceedings of the 9th Belgian-Netherlands Software Evolution Seminar, pp. 80-84.
TRANSCRIPT
Software metrics are usually right-skewed
[Figure: histogram of SLOC for classes in org.argouml.ui; x-axis: SLOC (0 to 500), y-axis: frequency (0 to 25)]
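One way to make "right-skewed" concrete is the sample skewness of the per-class values: it is positive when a long right tail of large classes is present, and the mean then lies above the median. The sketch below assumes the moment-based (Fisher-Pearson) estimator; `sample_skewness` is an illustrative helper, and the SLOC values are invented for illustration, not the ArgoUML measurements.

```python
import statistics

def sample_skewness(values):
    """Moment-based (Fisher-Pearson) skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical SLOC values for the classes of one package (made up, not ArgoUML data):
# many small classes and a few large ones.
sloc = [12, 15, 18, 20, 22, 25, 30, 35, 40, 60, 90, 150, 420]

print("mean:", round(statistics.mean(sloc), 1))      # pulled up by the long right tail
print("median:", statistics.median(sloc))
print("skewness:", round(sample_skewness(sloc), 2))  # positive for a right-skewed sample
```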
2/11
Aggregation of software metrics using the “softnometric” index
Bogdan Vasilescu
Eindhoven University of Technology, The Netherlands
March 9, 2011
3/11 Aggregation techniques
Classical:
- Mean
- Sum
- Cardinality
Distribution fitting:
- Log-normal
- Exponential
- Negative binomial
Inequality indices:
- Theil
- Gini
- Kolm
- Atkinson
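The classical aggregations are one-liners. For distribution fitting, one possible reading is to fit a distribution to a package's metric values and use the fitted parameters as the package-level aggregate. The sketch below illustrates that assumption with closed-form maximum-likelihood estimates for the log-normal and exponential distributions; the negative binomial is omitted because its maximum-likelihood fit has no closed form. The helper names and SLOC values are hypothetical.

```python
import math
import statistics

def classical_aggregates(values):
    """Mean, sum and cardinality (number of values) of one package's metric values."""
    return {"mean": statistics.mean(values), "sum": sum(values), "cardinality": len(values)}

def fit_lognormal(values):
    """Maximum-likelihood log-normal fit: (mu, sigma) of log(values); needs values > 0."""
    logs = [math.log(v) for v in values]
    # The MLE of sigma divides by n, i.e. the population standard deviation.
    return statistics.mean(logs), statistics.pstdev(logs)

def fit_exponential(values):
    """Maximum-likelihood exponential fit: rate = 1 / mean; needs a positive mean."""
    return 1.0 / statistics.mean(values)

# Hypothetical SLOC values of the classes in one package.
sloc = [12, 15, 18, 20, 22, 25, 30, 35, 40, 60, 90, 150, 420]
print(classical_aggregates(sloc))
print("log-normal (mu, sigma):", fit_lognormal(sloc))
print("exponential rate:", fit_exponential(sloc))
```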
4/11 Gini index
The Gini index is based on the Lorenz curve:
- proportion of the total income of the population (y-axis) cumulatively earned by the bottom x% of the people (x-axis).
- Gini = 0: perfect equality, every person receives the same income.
- Gini = 1: perfect inequality, one person receives all the income.
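$I_{\mathrm{Gini}}(X) = \frac{A}{A+B}$, where A is the area between the line of perfect equality and the Lorenz curve, and B is the area under the Lorenz curve.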
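The ratio A/(A+B) can be computed directly from the Lorenz curve of a sample. A minimal sketch, assuming non-negative metric values with a positive total and a piecewise-linear Lorenz curve through the points (i/n, share of the total held by the i smallest values); `lorenz_curve` and `gini` are illustrative helper names, not taken from the paper.

```python
def lorenz_curve(values):
    """Lorenz curve points (x, y): x = share of the population, y = share of the total."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    points = [(0.0, 0.0)]
    cumulative = 0
    for i, v in enumerate(xs, start=1):
        cumulative += v
        points.append((i / n, cumulative / total))
    return points

def gini(values):
    """Gini index A / (A + B) for non-negative values with a positive total."""
    pts = lorenz_curve(values)
    # B: area under the Lorenz curve, exact for a piecewise-linear curve (trapezoids).
    b = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    # A + B is the area under the line of equality y = x, i.e. 1/2.
    a = 0.5 - b
    return a / (a + b)

print(gini([10, 10, 10, 10]))  # 0.0: perfect equality
print(gini([0, 0, 0, 40]))     # 0.75: one value holds the whole total
```

With n values the largest attainable value is (n − 1)/n (0.75 for n = 4), which approaches the upper bound of 1 as the population grows.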
5/11 Theoretical comparison
Criteria:
- Domain → determines applicability
- Range → determines interpretation
- Invariance
  • w.r.t. addition → LOC, ignoring headers
  • w.r.t. multiplication → LOC, percentages vs. absolute values
- Decomposability → explain inequality by partitioning the population into groups
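The two invariance properties can be checked numerically. The sketch below assumes the standard textbook forms of the indices, which the slides do not spell out: the Gini index as the sum of all pairwise absolute differences divided by 2n² times the mean (equivalent to A/(A+B) for a discrete sample), and the Kolm index with a hypothetical inequality-aversion parameter `eps`. Rescaling all values leaves the Gini index unchanged, while adding the same constant to every value (for instance, a fixed-size header counted in every class) leaves the Kolm index unchanged. The SLOC values are made up.

```python
import math

def gini(values):
    """Gini index: sum over all pairs |xi - xj| divided by 2 * n^2 * mean."""
    n = len(values)
    mean = sum(values) / n
    total_diff = sum(abs(xi - xj) for xi in values for xj in values)
    return total_diff / (2 * n * n * mean)

def kolm(values, eps=0.01):
    """Kolm index (1/eps) * ln(mean of exp(eps * (mean - xi))), with eps > 0."""
    n = len(values)
    mean = sum(values) / n
    return math.log(sum(math.exp(eps * (mean - x)) for x in values) / n) / eps

sloc = [12, 15, 18, 20, 22, 25, 30, 35, 40, 60, 90, 150, 420]  # hypothetical values
scaled = [3 * x for x in sloc]    # multiplication, e.g. a change of unit
shifted = [x + 10 for x in sloc]  # addition, e.g. a 10-line header in every class

# Gini: unchanged by scaling (up to rounding), changed by shifting.
print(gini(sloc), gini(scaled), gini(shifted))
# Kolm: unchanged by shifting (up to rounding), changed by scaling.
print(kolm(sloc), kolm(shifted), kolm(scaled))
```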
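6/11 Theoretical comparison

| Agg. technique | Domain | Range | Invariance | Decomposability |
|---|---|---|---|---|
| Mean | ℝ | ℝ | - | N/A |
| Sum | ℝ | ℝ | - | N/A |
| Cardinality | ℝ | ℕ | - | N/A |
| Gini Index | ℝ⁺ | [0, 1] | mult. | - |
| | ℝ | ℝ | mult. | - |
| Theil Index | ℝ⁺ | [0, log n] | mult. | yes |
| Kolm Index | ℝ | ℝ⁺ | add. | yes |
| Atkinson Index | ℝ⁺ | [0, 1 − 1/n] | mult. | - |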
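The Theil and Atkinson indices, and the decomposability that distinguishes Theil and Kolm in the table, can be sketched in a few lines. This assumes the usual definitions, T = (1/n) Σ (x_i/μ) ln(x_i/μ) and, for ε ≠ 1, A_ε = 1 − (power mean of order 1 − ε)/μ; the slides do not say which ε was used. With ε = 0.5 the largest value, reached when one element holds the whole total, is 1 − 1/n, matching the range in the table. The decomposition splits T over groups (e.g. packages) into a within-group part Σ s_g T_g and a between-group part Σ s_g ln(μ_g/μ), where s_g is group g's share of the total. Function names and SLOC values are hypothetical.

```python
import math

def theil(values):
    """Theil index T = (1/n) * sum((x/mean) * ln(x/mean)); needs values > 0."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values) / n

def atkinson(values, eps=0.5):
    """Atkinson index 1 - (power mean of order 1 - eps) / mean, for 0 < eps < 1."""
    n = len(values)
    mean = sum(values) / n
    power_mean = (sum(x ** (1 - eps) for x in values) / n) ** (1 / (1 - eps))
    return 1 - power_mean / mean

def theil_decomposition(groups):
    """Split the Theil index of all values into within-group and between-group parts."""
    everything = [x for g in groups for x in g]
    n = len(everything)
    mean = sum(everything) / n
    within = between = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        share = len(g) * mean_g / (n * mean)  # group's share of the total
        within += share * theil(g)
        between += share * math.log(mean_g / mean)
    return within, between

# Hypothetical SLOC values of the classes in two packages.
pkg_a = [12, 15, 18, 20, 22, 25]
pkg_b = [30, 35, 40, 60, 90, 150, 420]
within, between = theil_decomposition([pkg_a, pkg_b])
print(within + between, theil(pkg_a + pkg_b))  # equal up to rounding
print(atkinson([0, 0, 0, 40]))                 # about 0.75 = 1 - 1/n for n = 4
```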
7/11 Empirical comparison
Research questions:
- Does LOC relate to bugs?
- Do the aggregation techniques influence the presence/strength of this relation?
- Is there any difference between the aggregation techniques? Do they express the same thing?
8/11 Empirical comparison
Case study: ArgoUML
- Open-source, ∼1200 Java classes, ∼100 packages.
Methodology:
- Tool chain to automatically process issue tracker and version control system data.
- Mapped defects to Java classes and then packages.
- Measured SLOC of each class, aggregated to package level.
- For each aggregation technique, statistically studied the correlation with bugs.
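The aggregation-and-correlation step of the methodology can be sketched as follows. This is not the authors' tool chain: the class names, SLOC values and defect counts are made up, a package name is assumed to be the part of a qualified class name before the last dot, and Spearman's rank correlation is an assumption, since the slides do not say which correlation coefficient was used. Any aggregation function (here the mean and a Gini index) can be plugged in.

```python
import statistics
from collections import defaultdict

def aggregate_per_package(class_sloc, aggregator):
    """Map 'package.Class' -> SLOC to package -> aggregator(SLOC values of its classes)."""
    by_package = defaultdict(list)
    for qualified_name, sloc in class_sloc.items():
        by_package[qualified_name.rsplit(".", 1)[0]].append(sloc)
    return {pkg: aggregator(vals) for pkg, vals in by_package.items()}

def gini(values):
    """Gini index via the mean absolute difference (values >= 0, positive total)."""
    n, mean = len(values), sum(values) / len(values)
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mean)

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks (no p-value)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: SLOC per class and defects already mapped to packages.
class_sloc = {
    "p1.A": 10, "p1.B": 12, "p1.C": 11,
    "p2.A": 5, "p2.B": 200,
    "p3.A": 40, "p3.B": 45, "p3.C": 300,
    "p4.A": 20, "p4.B": 22,
}
defects = {"p1": 0, "p2": 4, "p3": 6, "p4": 1}

packages = sorted(defects)
for name, aggregator in [("mean", statistics.mean), ("gini", gini)]:
    per_package = aggregate_per_package(class_sloc, aggregator)
    rho = spearman([per_package[p] for p in packages], [defects[p] for p in packages])
    print(f"{name}: rho = {rho:.3f}")
```

If a p-value is needed as well, `scipy.stats.spearmanr` returns the coefficient together with a two-sided p-value.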
9/11 Results

| | I_Gini | I_Theil | I_Kolm | I_Atkinson | defects |
|---|---|---|---|---|---|
| mean | 0.170 | 0.192 | 0.676 | 0.203 | 0.0096 |
| I_Gini | | 0.908 | 0.467 | 0.903 | 0.27 |
| I_Theil | | | 0.488 | 0.918 | 0.273 |
| I_Kolm | | | | 0.501 | 0.119 |
| I_Atkinson | | | | | 0.229 |

- I_Gini, I_Theil and I_Atkinson show the strongest, statistically significant correlations with the number of defects. However, they are also highly and significantly correlated with each other.
- The mean shows the lowest correlation with the number of defects.

Statistically significant correlations, with two-sided p-values not exceeding 0.01, are typeset in boldface.
10/11 Threats to validity
No control over the issue tracker → mapping of defects to classes:
- bugs missing from the issue tracker.
- bug fixes not showing up in the commit log.
How representative is the case? How about the version?
- replicate on more systems and more versions.
Is LOC the most suitable metric?
- replicate with more metrics.
11/11 Conclusions
Software metrics are not distributed normally.
[Figure: histogram of SLOC for classes in org.argouml.ui; x-axis: SLOC (0 to 500), y-axis: frequency (0 to 25)]
Theoretical comparison: see the summary table on slide 6/11.
Empirical comparison: see the correlation table on slide 9/11.
Classical aggregation techniques have problems when distributions are skewed. Inequality indices look more promising.