Evidence for the Pareto principle in open source software (OSS) activity

Mathieu Goeminne & Tom Mens
Service de Génie Logiciel, Institut d’Informatique, Faculté des Sciences, Université de Mons



Research topic

• Study of open source software evolution.

• Taking into account the community (social network) of persons surrounding the software project (developers, users).

• Looking for recurrent behaviour in this community.

Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany


Goals

• Long-term goal

  • Understand how the software community and the software product/project co-evolve.

  • Provide guidelines and tools to support this.

• Short-term goal

  • Study how development activity is distributed over the different stakeholders.

  • Find evidence for the Pareto principle in evolving OSS.


Questions

• Is there a core group (of developers and/or users) that is significantly more active than the others?

• How does the activity distribution evolve over time?

• Is there an overlap between the different activities?

• How does the activity distribution vary across different projects?


Methodology

• Exploit the data available in source code repositories, mailing lists and bug trackers.

• Use econometric indices that measure the (in)equality of a distribution.

• Empirical study of three OSS projects: Brasero, Evince and Wine.


Econometrics

• Gini, Hoover and Theil (normalised) indices: each expresses the inequality of a distribution.

• Values lie between 0 and 1:

  • 0 reflects perfect equality.

  • 1 reflects perfect inequality.

• The three indices behave similarly.
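The three indices listed above can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook formulas, not the authors' tooling. The function names are hypothetical, and dividing the Theil index by ln(n) is an assumption: the slides say only "normalised" and do not state which normalisation was used.

```python
import math

def gini(values):
    """Gini index of a collection of non-negative activity counts."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the values sorted in ascending order.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def hoover(values):
    """Hoover index: share of activity to redistribute to reach equality."""
    xs = list(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    mean = total / n
    return sum(abs(x - mean) for x in xs) / (2 * total)

def theil_normalised(values):
    """Theil T index divided by its maximum ln(n) (assumed normalisation)."""
    xs = list(values)
    n, total = len(xs), sum(xs)
    if n < 2 or total == 0:
        return 0.0
    mean = total / n
    # Zero counts contribute nothing (limit of t * ln(t) as t -> 0).
    t = sum((x / mean) * math.log(x / mean) for x in xs if x > 0) / n
    return t / math.log(n)
```

For a finite sample of n persons, this form of the Gini index reaches at most (n - 1) / n, so it approaches 1 only as the number of persons grows. All three functions return 0 when every person is equally active.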


The three notions of activity we used

• Number of commits made

• Number of mails sent

• Number of bug status changes

All of them relate to typical developer activities.
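To feed an inequality index, each notion of activity first has to be reduced to one count per person. The sketch below shows one way to do that. The `events` records and the activity labels are made-up sample data, and the slides do not describe how the data was actually extracted from the repositories, mailing lists and bug trackers.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (person, activity type).
events = [
    ("alice", "commit"), ("alice", "commit"), ("alice", "mail"),
    ("bob", "mail"), ("bob", "bug_change"), ("carol", "commit"),
]

def activity_counts(events):
    """Return {activity type: Counter({person: number of events})}."""
    counts = defaultdict(Counter)
    for person, activity in events:
        counts[activity][person] += 1
    return dict(counts)

counts = activity_counts(events)
```

Each inner `Counter` is then one activity distribution, for example `counts["commit"].values()`.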


Activity distributions (Gini index)

[Figure: Gini index (vertical axis, 0 to 1) over time, with one curve per activity type (commits, mails, bug report changes); one chart each for Brasero, Evince and Wine.]


Pareto principle

• Most of the activity is carried out by a small group of persons.

• Typically, 20% of the persons do 80% of the work.

• This does not necessarily imply that the activity distribution follows a Pareto law.
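The 20/80 rule of thumb above can be checked directly on a list of per-person activity counts. The following is a minimal sketch. The function name `top_share` and the choice to round the group size to the nearest whole person are assumptions, not something the slides specify.

```python
def top_share(values, fraction=0.2):
    """Share of total activity produced by the most active `fraction` of persons."""
    xs = sorted(values, reverse=True)
    total = sum(xs)
    if not xs or total == 0:
        return 0.0
    # At least one person; the group size is rounded to a whole number.
    k = max(1, round(fraction * len(xs)))
    return sum(xs[:k]) / total

# Ten hypothetical contributors; the top two account for 80 of 100 commits.
commits_per_person = [50, 30, 5, 5, 2, 2, 2, 2, 1, 1]
share = top_share(commits_per_person)
```

A share close to 0.8 for `fraction=0.2` is consistent with the Pareto principle, but it says nothing about whether the whole distribution follows a Pareto law.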


Pareto principle (cont.)

[Figure: one chart each for Brasero, Evince and Wine; both axes run from 0 to 1, with one curve per activity type (commits, mails, bug report changes).]


Core groups

• Display Venn diagrams of the most active persons (top 20), according to each definition of activity.

• For each person, show the percentage of activity attributable to this person.

• Use heuristics to detect and merge multiple identities that represent the same real person.
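The steps above can be sketched with plain sets and counters. The alias table stands in for the identity-merging heuristics, which the slides do not detail. All names, addresses and numbers below are hypothetical.

```python
from collections import Counter

def merge_identities(counts, aliases):
    """Sum activity over identities that `aliases` maps to the same person."""
    merged = Counter()
    for identity, n in counts.items():
        merged[aliases.get(identity, identity)] += n
    return merged

def core_group(counts, size=20):
    """The `size` most active persons, each with their percentage of activity."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {p: 100 * n / total for p, n in Counter(counts).most_common(size)}

# Hypothetical data: one person commits under two identities.
aliases = {"alice@example.org": "alice"}
commits = merge_identities({"alice": 60, "alice@example.org": 20, "bob": 20}, aliases)
mails = merge_identities({"alice": 5, "carol": 15}, aliases)

# Persons in the core group of both activities (one Venn diagram region).
overlap = set(core_group(commits)) & set(core_group(mails))
```

The set intersection corresponds to the overlapping region of two circles in a Venn diagram; the other regions follow from set differences.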


Core groups (cont.)

[Figure: Venn diagrams of the most active persons, one each for Brasero, Evince and Wine.]


Conclusions

• Activity seems to become more and more unequally distributed over time.

• The Pareto principle is clearly present in the studied projects.

• For Brasero and Evince, the activity is led by a limited number of persons involved in 2 or 3 of the defined activities.

• For Wine, this does not seem to be the case.


Future work

• Determine whether the core group of a project evolves over time.

• Use sliding windows to ignore inactive persons and discover new active persons.

• Study the “bus factor” and the persons involved.

• Automatic generation of Venn diagrams including all persons involved in software evolution.
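The sliding-window idea listed above could look roughly like the following. This is a sketch of one possible design, not a description of the authors' planned implementation. The window and step lengths, the `(day, person)` event format and the function names are all assumptions.

```python
from collections import Counter
from datetime import date, timedelta

def gini(values):
    """Gini index of a collection of non-negative activity counts."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def windowed_gini(events, start, end, window_days=180, step_days=90):
    """Return [(window start, Gini index)] for events given as (day, person)."""
    window, step = timedelta(days=window_days), timedelta(days=step_days)
    results = []
    current = start
    while current + window <= end:
        stop = current + window
        # Only persons active inside the window are counted, so persons
        # who became inactive drop out and newcomers appear.
        per_person = Counter(p for day, p in events if current <= day < stop)
        results.append((current, gini(per_person.values())))
        current += step
    return results

# Hypothetical events: alice and bob are active in January, carol in February.
events = [
    (date(2010, 1, 5), "alice"), (date(2010, 1, 10), "alice"),
    (date(2010, 1, 20), "alice"), (date(2010, 1, 15), "bob"),
    (date(2010, 2, 10), "carol"), (date(2010, 2, 20), "carol"),
]
series = windowed_gini(events, date(2010, 1, 1), date(2010, 3, 2),
                       window_days=30, step_days=30)
```

The set of persons seen in each window can be compared across windows in the same way to see whether the core group changes.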


Future work (cont.)

• Study correlation between community structure (social network) and source code quality (as computed using software metrics).

• Automatic statistical analysis to determine the distributions fitting the data.

• Extend and refine types of activity. For instance, different types of:

  • commit activity (doc, source code, test, etc.)

  • mail activity (information, asking, answering, etc.)

  • bug repository activity (bug creation, modification and commenting)
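As an illustration of refining commit activity, the files touched by a commit could be classified by path. The rules below are purely hypothetical heuristics (directory names and file suffixes chosen for the example); the slides do not say how the categories would be assigned.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical suffixes treated as documentation.
DOC_SUFFIXES = {".md", ".txt", ".rst", ".html"}

def classify_path(path):
    """Classify a changed file as 'test', 'doc' or 'source code'."""
    p = PurePosixPath(path)
    parts = {part.lower() for part in p.parts}
    if parts & {"test", "tests"} or p.name.lower().startswith("test_"):
        return "test"
    if parts & {"doc", "docs", "help"} or p.suffix.lower() in DOC_SUFFIXES:
        return "doc"
    return "source code"

def commit_profile(changed_paths):
    """Count the files touched by one commit per category."""
    return Counter(classify_path(path) for path in changed_paths)

profile = commit_profile(["src/main.c", "docs/manual.xml", "tests/test_burn.c"])
```

Per-person counts for each category can then be analysed with the same inequality indices as the undifferentiated commit counts.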


Thank you
