Characterizing the Open Source Software Process: a Horizontal Study

Politecnico di Torino. Andrea Capiluppi, andrea.capiluppi@polito.it. Characterizing the Open Source Software Process: a Horizontal Study. A. Capiluppi, P. Lago, M. Morisio

Page 1: Characterizing the Open Source Software Process:  a Horizontal Study

Politecnico di Torino

Andrea Capiluppi

andrea.capiluppi@polito.it

Characterizing the Open Source Software Process: a Horizontal Study

A. Capiluppi, P. Lago, M. Morisio

Page 2: Characterizing the Open Source Software Process:  a Horizontal Study

Outline

Rationale behind the current study
Methodology
Conclusions
Current and future work

Page 3: Characterizing the Open Source Software Process:  a Horizontal Study

Rationale

Most Open Source analyses focus on a single, flagship project (Linux, Apache, GNOME)
Limitation: the conclusions are based on a ‘vertical’ study
There is a lack of ‘horizontal’ studies:
  a pool of projects
  a wider area of interest

Page 4: Characterizing the Open Source Software Process:  a Horizontal Study

Methodology

Choice of projects
Attributes definition
Coding
Analysis

Page 5: Characterizing the Open Source Software Process:  a Horizontal Study

Choice of projects: repository

We selected the FreshMeat repository
FreshMeat (http://freshmeat.net) has focused on Open Source development since 1996
It gathers thousands of projects, either mirrored on SourceForge (http://sourceforge.net) or hosted on FreshMeat only
FreshMeat lists more than 24000 projects (many inactive)

Page 6: Characterizing the Open Source Software Process:  a Horizontal Study

Choice of projects: sampling I

From 24000 to 406 - how?

FreshMeat organizes projects by filters and categories

Filter = “Topic”
Categories = {“Internet”, “Database”, “Multimedia”,…}

Other filters: Programming language, Topic (i.e. application domain), Status of Evolution, etc.

Page 7: Characterizing the Open Source Software Process:  a Horizontal Study

Choice of projects: sampling II

We randomly picked a number of projects through the “Status” filter

Rationale: this filter has a limited number of associated categories: {“Planning”, “PreAlpha”, “Alpha”, “Beta”, “Stable”, “Mature”}

The overall count is 406 projects
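The sampling step can be sketched roughly as follows. This is only an illustration: the input format (a list of name/status pairs) and the per-category quota `per_status` are assumptions, since the slides do not say how many projects were drawn from each status category.

```python
import random
from collections import defaultdict

# Status categories listed on the slide.
STATUSES = ["Planning", "PreAlpha", "Alpha", "Beta", "Stable", "Mature"]


def sample_by_status(projects, per_status, seed=0):
    """Randomly pick up to `per_status` projects from each status category.

    `projects` is an iterable of (name, status) pairs (hypothetical format).
    ASSUMPTION: an equal quota per category; the study's quota is not stated.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_status = defaultdict(list)
    for name, status in projects:
        by_status[status].append(name)

    sample = {}
    for status in STATUSES:
        pool = by_status.get(status, [])
        k = min(per_status, len(pool))
        sample[status] = rng.sample(pool, k)
    return sample


# Made-up catalogue: 20 fictional projects per status category.
catalogue = [(f"{s.lower()}-{i}", s) for s in STATUSES for i in range(20)]
picked = sample_by_status(catalogue, per_status=5)
total = sum(len(names) for names in picked.values())
```

Drawing per category rather than from the whole repository keeps every stage of evolution represented, which a plain random draw over 24000 mostly inactive projects would not guarantee.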

Page 8: Characterizing the Open Source Software Process:  a Horizontal Study

Attribute definition

Age
Application domain
Programming language
Size [KB]
Number of developers
Stable and transient developers
Number of users
Modularity level
Documentation level
Popularity
Status
Success of project
Vitality

Legend: red = defined by FreshMeat; black = defined by us

Page 9: Characterizing the Open Source Software Process:  a Horizontal Study

Coding

Each attribute was coded twice, to capture evolutive trends

First observation: January 2002
Second observation: July 2002

Page 10: Characterizing the Open Source Software Process:  a Horizontal Study

Analysis

Here we discuss:
Application domain issues
Developers [stable & transient] issues
Subscribers (as users) issues
Code size issues

Page 11: Characterizing the Open Source Software Process:  a Horizontal Study

Application domain distribution

[Figure: bar chart by application domain. X-axis categories: Internet, System, Sw Devel, Communications, Multimedia, Desktop, Database, Games, Security, Utilities, Scient/Eng, Text Editors, Office/Business, Text Processing, Printing, Terminals, other. Y-axis: 0 to 160. Bar labels, in extraction order: 96, 89, 86, 70, 27, 23, 18, 16, 16, 10, 9, 5, 10, 34, 14, 14, 113.]

Page 12: Characterizing the Open Source Software Process:  a Horizontal Study

Attributes: project’s developers

We evaluate how many people write code for an application
External contributions are always credited in special-purpose files, or in the ChangeLog
We distinguish between:
  Stable developers
  Transient developers

Core team: more than one stable developer
Manual inspections and pattern-recognition scripts
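A pattern-recognition script of this kind might look like the sketch below. It assumes GNU-style ChangeLog headers ("YYYY-MM-DD  Name  <email>"), and the `min_entries` threshold that separates stable from transient developers is purely illustrative: the slides do not state the criterion actually used.

```python
import re
from collections import Counter

# GNU-style ChangeLog entry header: "YYYY-MM-DD  Full Name  <email>"
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<([^>]+)>\s*$")


def count_entries(changelog_text):
    """Count ChangeLog entries per contributor name."""
    counts = Counter()
    for line in changelog_text.splitlines():
        match = HEADER.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def split_developers(counts, min_entries=3):
    """Split contributors into (stable, transient) sets.

    ASSUMPTION: `min_entries` is an illustrative threshold, not the
    criterion used in the study.
    """
    stable = {name for name, n in counts.items() if n >= min_entries}
    transient = set(counts) - stable
    return stable, transient


# Made-up ChangeLog fragment with fictional contributors.
sample_log = """\
2002-01-10  Alice Rossi  <alice@example.org>

\t* main.c: Fix crash on empty input.

2002-02-03  Alice Rossi  <alice@example.org>

\t* parser.c: Add new option.

2002-03-21  Bob Bianchi  <bob@example.org>

\t* README: Fix typo.

2002-05-02  Alice Rossi  <alice@example.org>

\t* main.c: Release 1.1.
"""

counts = count_entries(sample_log)
stable, transient = split_developers(counts)
```

Projects that credit contributors in free-form files (AUTHORS, THANKS, CREDITS) do not follow this header pattern, which is presumably why manual inspection was needed alongside the scripts.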

Page 13: Characterizing the Open Source Software Process:  a Horizontal Study

Developers over projects

We observe:
72% of projects have a single stable developer
80% of projects have at most 10 developers

Page 14: Characterizing the Open Source Software Process:  a Horizontal Study

Developers distribution over projects

[Figure: bar chart. X-axis: Developers (0, 1, 2, up to 10, up to 20, up to 50, up to 100, more than 100). Y-axis: Frequency [%], 0% to 80%. Series: Developers, Stable dev, Transient dev.]

Page 15: Characterizing the Open Source Software Process:  a Horizontal Study

Definition: clusters of developers

Cluster 1: 1 to 3 developers (64.5%)
Cluster 2: 4 to 10 developers (20%)
Cluster 3: 11 to 20 developers (9.5%)
  “Average” nr. of stable dev: 2
  “Average” nr. of transient dev: 3
Cluster 4: more than 20 developers (6%)
  “Average” nr. of stable dev: 6
  “Average” nr. of transient dev: 19
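The cluster boundaries translate directly into a small lookup. The sketch below is illustrative only: the developer counts are made up, and rejecting projects with no recorded developer is an assumption, since the cluster definition starts at 1.

```python
from collections import Counter


def developer_cluster(n_developers):
    """Return the cluster number (1-4) for a project's developer count."""
    if n_developers < 1:
        # ASSUMPTION: projects with no recorded developer are not clustered.
        raise ValueError("cluster definition starts at 1 developer")
    if n_developers <= 3:
        return 1
    if n_developers <= 10:
        return 2
    if n_developers <= 20:
        return 3
    return 4


def cluster_shares(developer_counts):
    """Percentage of projects falling into each of the four clusters."""
    tally = Counter(developer_cluster(n) for n in developer_counts)
    total = len(developer_counts)
    return {c: 100 * tally.get(c, 0) / total for c in (1, 2, 3, 4)}


# Made-up developer counts for eight fictional projects.
shares = cluster_shares([1, 1, 2, 3, 5, 8, 12, 25])
```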

Page 16: Characterizing the Open Source Software Process:  a Horizontal Study

Productivity vs. ‘global’ developers

[Figure: bar chart. X-axis: Global developers (Clust1, Clust2, Clust3, Clust4). Y-axis: Code Size [kB], 0 to 800. Bar labels, in extraction order: 605, 733, 621, 656.]

Page 17: Characterizing the Open Source Software Process:  a Horizontal Study

Productivity vs. ‘stable’ developers

[Figure: bar chart. X-axis: Stable developers (1 to 3, 4 to 10, 11 to 20, more than 20). Y-axis: Code size [kB], 0 to 3500. Bar labels, in extraction order: 1867, 2543, 3223, 438.]

Page 18: Characterizing the Open Source Software Process:  a Horizontal Study

Code variation over clusters

[Figure: bar chart. X-axis: Clust1, Clust2, Clust3, Clust4. Y-axis: Code Variation [%], 0% to 25%. Bar labels, in extraction order: 10.94%, 19.58%, 10.40%, 15.83%.]

Page 19: Characterizing the Open Source Software Process:  a Horizontal Study

Attributes: subscribers

We use publicly available data as a proxy for the number of users
Users ~ Mailing List subscribers (public datum)
It is not a monotonic measure: subscribers can leave as well as join
We measured the number of users at two different observations

Page 20: Characterizing the Open Source Software Process:  a Horizontal Study

Distribution of subscribers over project

[Figure: bar chart. X-axis: Number of subscribers (1, 5, 10, 50, 100, More). Y-axis: Frequency [%], 0 to 45. Series: all projects; older than one year.]

Around 42% of projects have at most 1 subscriber-user

Page 21: Characterizing the Open Source Software Process:  a Horizontal Study

Users evolution

[Figure: bar chart. X-axis: Users evolution (Between 1 and 10, More than 10). Y-axis: Frequency [%], 0% to 40%. Series: No Gain; Projects losing users; Projects gaining users. Bar labels, in extraction order: 30.3%, 12.1%, 9.1%, 32.3%, 16.3%.]
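One way to derive the categories of the users-evolution chart from the two subscriber counts is sketched below. The interpretation is an assumption: "no gain" is taken to mean an unchanged count, and the two buckets are applied to the absolute change in subscribers; the slides do not spell out either rule.

```python
def classify_user_change(first, second):
    """Classify the change in subscribers between two observations.

    Returns (direction, magnitude), where magnitude is None for no change.
    ASSUMPTION: buckets are applied to the absolute difference in counts.
    """
    delta = second - first
    if delta == 0:
        return ("no gain", None)
    direction = "gaining" if delta > 0 else "losing"
    magnitude = "between 1 and 10" if abs(delta) <= 10 else "more than 10"
    return (direction, magnitude)


# Made-up (first observation, second observation) subscriber counts.
observations = [(12, 12), (5, 9), (40, 15), (3, 30)]
labels = [classify_user_change(a, b) for a, b in observations]
```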

Page 22: Characterizing the Open Source Software Process:  a Horizontal Study

Attributes: project’s size

We evaluate the code of each project twice
Code evaluated is contained in packages. We exclude from the count:
  Auxiliary files: documentation, configuration files, GIF files, etc.
  Legacy code: inherited libraries (e.g. Gnome macros), internationalization code
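A size measurement along these lines could be sketched as follows, assuming the package has already been unpacked into a directory. The exclusion lists are hypothetical examples of "auxiliary" and "legacy" material; the actual lists used in the study are not given.

```python
import os
import tempfile

# ASSUMPTION: illustrative exclusion lists, not the ones used in the study.
AUXILIARY_EXTENSIONS = {".txt", ".html", ".gif", ".png", ".conf", ".cfg", ".po"}
EXCLUDED_DIRS = {"doc", "docs", "po", "intl", "macros"}


def code_size_kb(root):
    """Sum the size in KB of files under `root`, skipping excluded ones."""
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d.lower() not in EXCLUDED_DIRS]
        for filename in filenames:
            ext = os.path.splitext(filename)[1].lower()
            if ext in AUXILIARY_EXTENSIONS:
                continue
            total_bytes += os.path.getsize(os.path.join(dirpath, filename))
    return total_bytes / 1024


# Tiny made-up package to exercise the function.
with tempfile.TemporaryDirectory() as pkg:
    os.makedirs(os.path.join(pkg, "src"))
    os.makedirs(os.path.join(pkg, "doc"))
    with open(os.path.join(pkg, "src", "main.c"), "wb") as f:
        f.write(b"x" * 2048)  # counted: 2 KB of source
    with open(os.path.join(pkg, "src", "logo.gif"), "wb") as f:
        f.write(b"x" * 4096)  # skipped: auxiliary file
    with open(os.path.join(pkg, "doc", "manual.c"), "wb") as f:
        f.write(b"x" * 1024)  # skipped: inside an excluded directory
    size = code_size_kb(pkg)
```

Files without an extension (README, Makefile and the like) are counted by this sketch, so a real run would need a longer exclusion list or a whitelist of source extensions.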

Page 23: Characterizing the Open Source Software Process:  a Horizontal Study

Distribution of code size over projects

[Figure: chart of projects by size cluster. X-axis: Size clusters [KB] ([0-10], (10-100], (100-1000], (1000-10000], >10000). Y-axis: Frequency [%]. Labels, in extraction order: 39.25%, 7%, 17%, 1%, 35.75%.]
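Assigning a project to one of the size clusters [0-10], (10-100], (100-1000], (1000-10000], >10000 KB is a plain interval lookup. A minimal sketch, with made-up sizes, using a binary search over the upper bounds:

```python
from bisect import bisect_left

# Upper bounds (inclusive) of the first four size clusters, in KB.
UPPER_BOUNDS_KB = [10, 100, 1000, 10000]
LABELS = ["[0-10]", "(10-100]", "(100-1000]", "(1000-10000]", ">10000"]


def size_cluster(size_kb):
    """Return the size-cluster label for a code size in KB."""
    if size_kb < 0:
        raise ValueError("size cannot be negative")
    # bisect_left returns the first index whose bound is >= size_kb,
    # which makes every upper bound inclusive.
    return LABELS[bisect_left(UPPER_BOUNDS_KB, size_kb)]


# Made-up project sizes in KB.
clusters = [size_cluster(s) for s in (0, 10, 10.5, 100, 2500, 10000, 10001)]
```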

Page 24: Characterizing the Open Source Software Process:  a Horizontal Study

Evolutive observations of size changes

[Figure: chart of projects by size variation. X-axis: Range of variation (0%, (0%-10%], (10%-50%], >50%). Y-axis: Frequency [%]. Labels, in extraction order: 59%, 22%, 15%, 5%.]
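The size variation between the two observations, and its range on the chart's x-axis, can be computed as in the sketch below. Two assumptions are made: variation is measured relative to the first observation, and shrinkage counts as variation too (absolute value). The slides do not confirm either choice.

```python
def size_variation_percent(first_kb, second_kb):
    """Absolute size change between two observations, as a percentage.

    ASSUMPTION: relative to the first observation, sign ignored.
    """
    if first_kb <= 0:
        raise ValueError("first observation must be a positive size")
    return abs(second_kb - first_kb) / first_kb * 100


def variation_range(pct):
    """Map a variation percentage to the ranges used on the chart."""
    if pct == 0:
        return "0%"
    if pct <= 10:
        return "(0%-10%]"
    if pct <= 50:
        return "(10%-50%]"
    return ">50%"


# Made-up (first, second) sizes in KB for four fictional projects.
pairs = [(100, 100), (200, 210), (100, 150), (100, 40)]
ranges = [variation_range(size_variation_percent(a, b)) for a, b in pairs]
```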

Page 25: Characterizing the Open Source Software Process:  a Horizontal Study

Conclusions I

The vast majority of projects are developed by only one developer
Adding people to a project has a small effect on productivity (i.e. code per developer)
Open Source software is made by experts for experts (72% of horizontal projects have more than 10 developers)
58% of projects did not change their size
63% of projects had a change within 1%

Page 26: Characterizing the Open Source Software Process:  a Horizontal Study

Conclusions II

Java is relevant for 8% of the projects, C/C++ for 56%, Perl and Python together for 20%
Observations from flagship projects (Apache, Linux, Gnome) are not confirmed for an average Open Source project
Several projects are white noise: to be filtered out
Huge amount of data on public repositories: empirical researchers have an invaluable resource of software data

Page 27: Characterizing the Open Source Software Process:  a Horizontal Study

Current and future work

Eliminating white noise: only projects in clusters 3 and 4 have been selected
Deeper analysis: the whole history of a project is being studied
  What can we say with respect to the conclusions drawn on bigger OS projects?
  What can be said about OSS evolution compared with traditional software evolution?