towards an empirical analysis of the maintainability of cran packages

Post on 25-Dec-2014

124 Views

Category:

Technology

4 Downloads

Preview:

Click to see full reader

DESCRIPTION

 

TRANSCRIPT

TOWARDS ANEMPIRICAL ANALYSIS

OF THEMAINTAINABILITY OF

CRAN PACKAGESMaëlick Claes

Tom Mens & Philippe Grosjean

&

17th

December 2013, BENEVOL

Software Engineering Lab Numerical Ecology of Aquatic

Systems Lab

0

INTRODUCTIONR & CRAN

Statistical environment based on the S languagePackage systemPackages contain code (R, C, Fortran,...), documentation, examples, tests,datasets,...Dependency relations between packages

http://www.r-­project.org

CRAN"Official" repository containing more than 5000 packagesStrict policy for package acceptancePackage quality regularly checked & archive process.Complaints in the community Hornik 2012, Are there too many R packages?

http://cran.r-­project.org

"DESCRIPTION" FILE

PACKAGE DEPENDENCIES

HAS GROWTH BECOME LINEAR?

CRAN CHECKS

R CMD check command tool for checking package correctness and quality

Package status: OK, NOTE, WARNING or ERROR

Every package release must pass the test on two OS to be accepted on

CRAN

R CMD check rerun daily on the whole set of packages

Packages with inactive maintainer and ERROR status are archived when

the next non-­minor version of R is released

A combination of R version, OS, architecture and C compiler forms a

flavor

EXAMPLESPackage with ERROR status

Package listing

NUMBER OF PACKAGES FOR EACHFLAVOR

HOW DO FLAVORSIMPACT ERRORS?

EVOLUTION OF ERRORS FOREACH FLAVOR

NUMBER OF ERRORS EXTRACTED

WHAT IS THECAUSE OF ERRORS

IN CRAN ANDHOW ARE THEY

FIXED?

SOURCES OF STATUS CHANGESPackage Update (PU)

ERROR status change coincides with the release of a new package

version

Dependency Update (DU)ERROR status change coincides with the release of a new package

version of a dependency

External Factors (EF)ERROR status change without a new package version or a new

version of any dependency

SOURCES OF NEW ERRORS ANDERROR FIXES

ARE ERRORS CAUSED AND FIXEDBY DEPENDENCY UPDATE REALLY

CAUSED AND FIXED BY IT?

WHAT IS THE SOURCE OF ERRORSFIXED BY PACKAGE UPDATE?

HOW LONG DOESIT TAKE TO FIX AN

ERROR?

NUMBER OF DAYS NEEDED TO FIXERRORS

NUMBER OF DAYS NEEDED TO FIXERRORS WITH PACKAGE UPDATE

CONCLUSION

CONCLUSIONSome flavors are more error prone than others

High amount of errors caused by external factors

Most errors fixed quickly without developer intervention

Maintenance effort needs to be focused on fixing errors caused by others

Need for a more specific tool for problems related to dependency changes

FUTURE WORKRefine errors by looking at package content

Static analysis of the R code

Function dependency graph

Rerun some part of R CMD checkImpact of the maintainers on the time required to fix packages and

amount of errors for each flavor

Do packages containing tests are more error prone?

How does the dependency network evolve over time?

THANKS FOR YOUR ATTENTION

QUESTIONS?

top related