An Empirical Study of Identical Function Clones in CRAN
Maëlick Claes, Tom Mens, Narjisse Tabout & Philippe Grosjean
6th February 2014, IWSC 2015
Software Engineering Lab
Numerical Ecology of Aquatic Systems Lab
Introduction
  R
    Statistical environment based on the S language
    Packages with code, doc, examples, tests, datasets
  CRAN (Comprehensive R Archive Network)
    Official R package repository
    Strict policy for package acceptance
    Package quality checked regularly, with an archive process
    Complaints in the community (Hornik 2012, Are there too many R packages?)
  Empirical study of inter-project clones in CRAN
  http://www.r-project.org
Previous work
  Preliminary empirical study using CRAN meta-data
    On the maintainability of CRAN packages (CSMR-WCRE 2014)
    R CMD check results from CRAN:
      Most errors resolved quickly without developer intervention
      Maintenance effort needs to focus on fixing errors caused by others
      Need for a more specific tool to detect problems related to dependency changes
  Web-dashboard for CRAN maintainers
    maintaineR, a web-based dashboard for maintainers of CRAN packages (ICSME 2014)
    Type-1 function clone identification
  http://cran.r-project.org/web/checks/
Identifying cloned R functions
  Parsing R code with R itself
  Assigning a SHA-1 hash to each function's AST
  Ignoring functions with fewer than 6 lines of code
  Identifying Type-1 clones = identifying identical hashes across packages
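The identification steps above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the authors' tooling: the toy packages, the helper names fingerprint() and function_hashes(), and the way lines are counted (on the deparsed function) are all hypothetical. The slides use SHA-1; base R ships no SHA-1 function, so MD5 via tools::md5sum() stands in here.

```r
# Toy "packages": each entry is R source text (hypothetical data, not from CRAN).
pkgs <- list(
  pkgA = "walk <- function(x) {
  # collect the elements one by one
  n <- length(x)
  out <- list()
  for (i in seq_len(n)) {
    out[[i]] <- x[i]
  }
  out
}",
  pkgB = "walk_copy <- function(x) { n <- length(x); out <- list()
  for (i in seq_len(n)) { out[[i]] <- x[i] }
  out }",
  pkgC = "tiny <- function(x) x + 1"
)

# Hash the deparsed AST lines of one function.
# Assumption: MD5 as a stand-in for the SHA-1 hash named in the slides.
fingerprint <- function(code_lines) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(code_lines, tmp)
  unname(tools::md5sum(tmp))
}

# Parse source with R itself and hash every top-level `name <- function(...)`.
function_hashes <- function(src, min_lines = 6) {
  exprs <- parse(text = src, keep.source = FALSE)
  out <- character(0)
  for (e in as.list(exprs)) {
    is_assign <- is.call(e) && length(e) == 3 &&
      (identical(e[[1]], as.name("<-")) || identical(e[[1]], as.name("=")))
    if (!is_assign) next
    rhs <- e[[3]]
    if (!(is.call(rhs) && identical(rhs[[1]], as.name("function")))) next
    code <- deparse(rhs)                 # comments and layout are not part of the AST
    if (length(code) < min_lines) next   # skip short functions (assumed line count)
    out[deparse(e[[2]])] <- fingerprint(code)
  }
  out
}

hashes <- lapply(pkgs, function_hashes)
tab <- data.frame(
  package = rep(names(hashes), lengths(hashes)),
  fun     = unlist(lapply(hashes, names), use.names = FALSE),
  hash    = unlist(hashes, use.names = FALSE),
  stringsAsFactors = FALSE
)

# A clone set is a hash that occurs in more than one package.
pkgs_per_hash <- tapply(tab$package, tab$hash, function(p) length(unique(p)))
clones <- tab[tab$hash %in% names(pkgs_per_hash)[pkgs_per_hash > 1], ]
print(clones)
```

Because the hash is computed on the deparsed AST, the two copies match even though their comments, layout and names differ in the source text, while the one-line function is filtered out by the size threshold.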
Observed clone cases
  Coexisting package versions: plyr and dplyr, lme and nlme, np and npRmpi
  Fork package: Rcmdr and QCAGUI
  Frequently cloned package: distr
  Utility package: DescTools
  Popular package: MASS
  Popular function: permn() from combinat
Research Questions
  How prevalent are (Type-1) function clones in CRAN?
  Why did these clones appear?
  Is it possible to remove them and how?
How prevalent are (Type-1) function clones in CRAN?
Evolution of the number of packages
Evolution of the number of LOC
Evolution of the relative size
Why did clones appear?
Categorizing clones
  All clones on 1st December 2014
    7366 clones
    162k LOC
    1409 packages
    3184 clone sets
  Identifying the origin of each clone set
  Each clone set origin is either
    An anonymous and/or local function
    An archived global function
    A private global function
    A public global function
Anonymous, local and global functions
From DescTools 0.99.8.1 package...
    qbinom.abscont <- function(p, size, x) {
      fun <- function(prob, size, x, p) {
        pbinom.abscont(x, size, prob) - p
      }
      uniroot(fun, interval = c(0, 1), size = size, x = x, p = p)$root
    }
... which could be rewritten as
    qbinom.abscont <- function(p, size, x) {
      uniroot(function(prob, size, x, p) {
        pbinom.abscont(x, size, prob) - p
      }, interval = c(0, 1), size = size, x = x, p = p)$root
    }
NAMESPACE file
Also from DescTools 0.99.8.1
    exportPattern("^[^\\.]")
    importFrom("boot", "boot", "boot.ci", "corr")
    import(tcltk)
    useDynLib(DescTools)
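The exportPattern directive decides which global functions are public: every top-level name matching the regular expression "^[^\\.]" is exported. A small sketch of that rule, where .internal_helper is a hypothetical name used only for illustration:

```r
# Regular expression as written in the NAMESPACE file: names not starting with a dot.
pattern <- "^[^\\.]"

# qbinom.abscont comes from the slides; .internal_helper is a made-up private name.
top_level <- c("qbinom.abscont", ".internal_helper")

is_public <- grepl(pattern, top_level)
names(is_public) <- top_level
print(is_public)
```

Under this pattern a global function is private only if its name starts with a dot, which is what separates the "private global" and "public global" categories for a package like this one.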
Classification of clone origins
Most clones were created because it was not possible to re-use the original function
Is it possible to remove clones and how?
Adding dependency to
  The origin package
    673 out of the 1899 global clone set origins are public functions
    782 functions that could potentially be removed in 332 packages
    48 functions in a package where there is already a direct dependency
    20 functions in a package where a dependency cannot be added without creating cycles
  A non-original clone copy
    Of the 2511 clone sets with a non-public origin function, only 250 have another public copy
    Only 299 functions could be removed by depending on another copy
=> Removing clones in CRAN packages cannot be reduced to code refactoring. Most of the time it would require communication between maintainers of different packages.
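The cycle constraint mentioned for the origin package can be checked with a reachability test on the dependency graph: a package holding a clone cannot depend on the origin package if the origin already depends on it, directly or transitively. The sketch below uses a hypothetical toy graph and hypothetical helper names (reachable(), creates_cycle()); it is not the analysis code behind the figures above.

```r
# Toy dependency graph (hypothetical): package -> direct dependencies.
deps <- list(
  A = c("B"),
  B = c("C"),
  C = character(0),
  D = character(0)
)

# All packages reachable from `from` by following dependencies.
reachable <- function(deps, from) {
  seen <- character(0)
  todo <- from
  while (length(todo) > 0) {
    pkg <- todo[1]
    todo <- todo[-1]
    for (d in deps[[pkg]]) {
      if (!(d %in% seen)) {
        seen <- c(seen, d)
        todo <- c(todo, d)
      }
    }
  }
  seen
}

# Would adding the dependency clone_pkg -> origin_pkg close a cycle?
creates_cycle <- function(deps, clone_pkg, origin_pkg) {
  clone_pkg == origin_pkg || clone_pkg %in% reachable(deps, origin_pkg)
}

creates_cycle(deps, "C", "A")  # A already depends on C through B
creates_cycle(deps, "D", "A")  # D is unrelated to A
```

In the first call, C cannot start depending on A because A already reaches C through B. In the second call, nothing in the graph prevents D from depending on A.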
Conclusion
  Cloned code represents a small fraction of all CRAN code, but still more than 100K LOC across the biggest CRAN packages
  Most clones cannot be removed by adding dependencies without enforcing CRAN policy
  Still, a significant number of clones could in theory be removed easily
  Further work is needed to understand whether the refactorable clones are justified
Future Work
  Asking developers (survey) about their cloning behavior
  Type-2 and Type-3 clones
  Clone patterns
  Inter-project cloning behavior in other languages / ecosystems
Thanks for your attention
Questions?
Slides: http://maelick.net/presentations/iwsc2015/