TOWARDS AN EMPIRICAL ANALYSIS OF THE MAINTAINABILITY OF CRAN PACKAGES

Maëlick Claes, Tom Mens & Philippe Grosjean
Software Engineering Lab & Numerical Ecology of Aquatic Systems Lab
17th December 2013, BENEVOL

INTRODUCTION: R & CRAN

Statistical environment based on the S language

Package system

Packages contain code (R, C, Fortran, ...), documentation, examples, tests, datasets, ...

Dependency relations between packages

http://www.r-project.org

CRAN

"Official" repository containing more than 5000 packages

Strict policy for package acceptance

Package quality regularly checked, with an archive process

Complaints in the community (Hornik 2012, "Are there too many R packages?")

http://cran.r-project.org

"DESCRIPTION" FILE

PACKAGE DEPENDENCIES
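
The DESCRIPTION file is where a package declares its dependencies, so a dependency analysis can start by parsing it. The following is a minimal Python sketch, not the tooling used in this study. It assumes the usual "Field: value" layout with indented continuation lines, and it only reads the Depends and Imports fields. The sample package content and the helper names `parse_description` and `dependency_names` are hypothetical.

```python
import re


def parse_description(text):
    """Parse 'Field: value' text into a dict.

    Lines starting with whitespace continue the previous field.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            fields[current] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def dependency_names(fields, kinds=("Depends", "Imports")):
    """Return the package names listed in the given dependency fields.

    Version constraints such as '(>= 1.0)' are dropped, and 'R' itself
    (the interpreter version requirement) is skipped.
    """
    names = []
    for kind in kinds:
        for entry in fields.get(kind, "").split(","):
            name = re.sub(r"\(.*?\)", "", entry).strip()
            if name and name != "R" and name not in names:
                names.append(name)
    return names


# Hypothetical DESCRIPTION content, invented for illustration only.
EXAMPLE = """\
Package: examplepkg
Version: 0.1-2
Depends: R (>= 2.15.0), methods
Imports: stats,
    utils (>= 2.15.0)
Suggests: testthat
"""

fields = parse_description(EXAMPLE)
deps = dependency_names(fields)
print(deps)
```

Running the parser over every package in a repository snapshot would give the edges of a package dependency graph.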

HAS GROWTH BECOME LINEAR?

CRAN CHECKS

R CMD check: command-line tool for checking package correctness and quality

Package status: OK, NOTE, WARNING or ERROR

Every package release must pass the check on two OSes to be accepted on CRAN

R CMD check is rerun daily on the whole set of packages

Packages with an inactive maintainer and ERROR status are archived when the next non-minor version of R is released

A combination of R version, OS, architecture and C compiler forms a flavor
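
The notions of flavor and package status above can be modelled in a few lines. This is a minimal Python sketch under assumed names: `Flavor`, `worst_status` and `error_rate_by_flavor` are hypothetical helpers, and the flavor labels and check results are invented for illustration, not data from the study.

```python
from collections import namedtuple

# A flavor: a combination of R version, OS, architecture and C compiler.
Flavor = namedtuple("Flavor", ["r_version", "os", "arch", "compiler"])

# Statuses ordered from least to most severe.
SEVERITY = {"OK": 0, "NOTE": 1, "WARNING": 2, "ERROR": 3}


def worst_status(results):
    """Return the most severe status among a package's per-flavor results."""
    return max(results.values(), key=SEVERITY.__getitem__)


def error_rate_by_flavor(check_results):
    """Map each flavor to the fraction of checked packages with ERROR status.

    check_results: dict of package name -> dict of Flavor -> status.
    """
    totals, errors = {}, {}
    for per_flavor in check_results.values():
        for flavor, status in per_flavor.items():
            totals[flavor] = totals.get(flavor, 0) + 1
            errors[flavor] = errors.get(flavor, 0) + (status == "ERROR")
    return {f: errors[f] / totals[f] for f in totals}


# Hypothetical flavors and check results, invented for illustration only.
linux = Flavor("r-devel", "linux", "x86_64", "gcc")
windows = Flavor("r-release", "windows", "x86_64", "gcc")
check_results = {
    "pkgA": {linux: "OK", windows: "ERROR"},
    "pkgB": {linux: "NOTE", windows: "OK"},
}

rates = error_rate_by_flavor(check_results)
print(worst_status(check_results["pkgA"]), rates[windows])
```

Comparing these per-flavor rates is one simple way to see whether some flavors are more error-prone than others.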

EXAMPLES

Package with ERROR status

Package listing

NUMBER OF PACKAGES FOR EACH FLAVOR

HOW DO FLAVORS IMPACT ERRORS?

EVOLUTION OF ERRORS FOR EACH FLAVOR

NUMBER OF ERRORS EXTRACTED

WHAT IS THE CAUSE OF ERRORS IN CRAN AND HOW ARE THEY FIXED?

SOURCES OF STATUS CHANGES

Package Update (PU): ERROR status change coincides with the release of a new version of the package

Dependency Update (DU): ERROR status change coincides with the release of a new version of a dependency

External Factors (EF): ERROR status change without a new package version or a new version of any dependency
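
The three categories above translate into a small decision rule. The Python sketch below rests on two assumptions the slides do not state: "coincides" is taken to mean that a release happened at most `window_days` days before the status change, and a package update takes precedence when both the package and a dependency were released in that window. The function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def classify_status_change(change_date, package_releases, dependency_releases,
                           window_days=1):
    """Classify an ERROR status change as 'PU', 'DU' or 'EF'.

    package_releases: release dates of the package itself.
    dependency_releases: release dates of any of its dependencies.
    window_days: assumed tolerance for "coincides with" (not from the slides).
    """
    window = timedelta(days=window_days)

    def coincides(release_dates):
        # A release counts if it happened on the change date or up to
        # window_days before it; later releases cannot explain the change.
        return any(timedelta(0) <= change_date - d <= window
                   for d in release_dates)

    if coincides(package_releases):
        return "PU"  # assumption: package update wins over dependency update
    if coincides(dependency_releases):
        return "DU"
    return "EF"


# Hypothetical dates, invented for illustration only.
change = date(2013, 10, 2)
print(classify_status_change(change, [date(2013, 10, 1)], []))
print(classify_status_change(change, [], [date(2013, 10, 2)]))
print(classify_status_change(change, [date(2013, 9, 1)], [date(2013, 10, 5)]))
```

The same rule applies to both directions of change, that is, to new errors and to error fixes.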

SOURCES OF NEW ERRORS AND ERROR FIXES

ARE ERRORS CAUSED AND FIXED BY DEPENDENCY UPDATE REALLY CAUSED AND FIXED BY IT?

WHAT IS THE SOURCE OF ERRORS FIXED BY PACKAGE UPDATE?

HOW LONG DOES IT TAKE TO FIX AN ERROR?

NUMBER OF DAYS NEEDED TO FIX ERRORS
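
Because the checks are rerun daily, the time needed to fix an error can be estimated from a package's status history on one flavor. The Python sketch below shows one possible way to measure it and is not necessarily the method behind the figures in this talk. It assumes an error episode starts on the first day with ERROR status and ends on the first later day with any other status. The function name and the sample history are hypothetical.

```python
from datetime import date


def fix_durations(history):
    """Return the number of days each ERROR episode lasted.

    history: list of (date, status) observations sorted by date.
    An episode that is still open at the end of the history is ignored,
    since its fix date is unknown.
    """
    durations = []
    error_start = None
    for day, status in history:
        if status == "ERROR":
            if error_start is None:
                error_start = day  # first day of a new episode
        elif error_start is not None:
            durations.append((day - error_start).days)
            error_start = None
    return durations


# Hypothetical status history, invented for illustration only.
history = [
    (date(2013, 11, 1), "OK"),
    (date(2013, 11, 2), "ERROR"),
    (date(2013, 11, 3), "ERROR"),
    (date(2013, 11, 5), "OK"),
    (date(2013, 11, 6), "ERROR"),
    (date(2013, 11, 7), "NOTE"),
    (date(2013, 11, 8), "ERROR"),
]

durations = fix_durations(history)
print(durations)
```

Restricting the input to episodes that end with a new package release would give the corresponding figure for fixes made by package update.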

NUMBER OF DAYS NEEDED TO FIX ERRORS WITH PACKAGE UPDATE

CONCLUSION

Some flavors are more error-prone than others

High number of errors caused by external factors

Most errors are fixed quickly without developer intervention

Maintenance effort needs to be focused on fixing errors caused by others

Need for a more specific tool for problems related to dependency changes

FUTURE WORK

Refine errors by looking at package content

Static analysis of the R code

Function dependency graph

Rerun some parts of R CMD check

Impact of the maintainers on the time required to fix packages and on the number of errors for each flavor

Are packages containing tests more error-prone?

How does the dependency network evolve over time?

THANKS FOR YOUR ATTENTION

QUESTIONS?