FSE 2012 talk: Finding mentors in software projects
TRANSCRIPT
Who is going to Mentor Newcomers in Open Source Projects?
Gerardo Canfora, Massimiliano Di Penta, Rocco Oliveto, Sebastiano Panichella
Context and Motivations: Software Development
How? Training via Mentoring
Case Study: Exploratory analysis, Recommendation system evaluation
Training Project Newcomers
Newcomer
Competencies
Technical Skills
Organizational Aspects
Zhou and Mockus, ICSE 2011
Previous Work
Low sociability
Better training
Does the Initial Environment Impact the Future of Developers?
Minghui Zhou
School of Electronics Engineering and Computer Science, Peking University
Key Laboratory of High Confidence Software Technologies, Ministry of Education
Beijing 100871, [email protected]
Audris Mockus
Avaya Labs Research
233 Mt Airy Rd, Basking Ridge, [email protected]
ABSTRACT
Software developers need to develop technical and social skills to be successful in large projects. We model the relative sociality of a developer as a ratio between the size of her communication network and the number of tasks she participates in. We obtain both measures from the problem tracking systems. We use her workflow peer network to represent her social learning, and the issues she has worked on to represent her technical learning. Using three open source and three traditional projects we investigate how the project environment reflected by the sociality measure at the time a developer joins, affects her future participation. We find: a) the probability that a new developer will become one of long-term and productive developers is highest when the project sociality is low; b) times of high sociality are associated with a higher intensity of new contributors joining the project; c) there are significant differences between the social learning trajectories of the developers who join in low and in high sociality environments; d) the open source and commercial projects exhibit different nature in the relationship between developer’s tenure and the project’s environment at the time she joins. These findings point out the importance of the initial environment in determining the future of the developers and may lead to better training and learning strategies in software organizations.
Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics - process metrics; D.2.9 [Software Engineering]: Management - productivity
General Terms
Measurement, Performance, Human Factors
Keywords
Socio-technical balance, initial environment, relative sociality, learning trajectory
1. INTRODUCTION
The most critical tasks in software projects require “expertise across multiple areas”, however, “there are few staff to choose from”
according to an expert developer¹. One possibility suggested by a software project manager, is that many developers tend to focus on the modules they are familiar with, and rarely communicate outside their narrow circle of colleagues to gain expertise in other areas. The software engineering literature has investigated the important role of social and communication aspects in a developer’s work. They might impact developer productivity (Cataldo et al [2]) and they might affect software quality (Cataldo et al [3]). Furthermore, cognitive scientists have argued that interacting with partners is significantly better than learning alone [5]. In other words, the developers need both technical and social skills to be capable of solving critical tasks, though that might present two contradicting or at least competing learning goals.
On the other hand, there may be obstacles for the developers to achieve socio-technical balance, even when they have a strong motivation to cultivate their social and technical trajectories, because the project environment, in particular, the environment at the time a developer joins (i.e., the initial environment for the developer), may have a significant impact on the individual. For example, in many offshoring projects, the developers in the offshore location were considered to be incompetent to implement new feature development in legacy projects: “I don’t know if people are “climbing up” (moving from defect fixing to new development) in this site,” because “initially nobody could get trained by experienced mentors”, according to an outsourcing manager. Therefore, “the offshore team really needs time working with onshore developers to gain mature practices,” according to the same manager.
This anecdotal evidence sparked our interest to investigate how the initial environment may impact the developers’ learning trajectories, in particular, the achievement of social and technical balance. Improving this process may help understand how to increase the number of developers capable of solving critical tasks, to improve the developers’ training, and to facilitate the project’s success.
We have to overcome two challenges to proceed with this investigation. First, we need to measure the socio-technical balance, second, we need to determine how the initial environment affects the trajectories of developers. In addition to the challenges of measuring the social and technical achievement in general, we also need to derive these measures from commonly available project data, such as version control system and problem tracking system. Such data are difficult to obtain and even more difficult to interpret. For example, Cataldo et al. [4] compared an MR-induced logical dependency graph on source code files with a graph induced by instant
¹ The quotes, including the latter ones, are obtained from the interviews conducted in our former work [20].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICSE’11, May 21–28, 2011, Waikiki, Honolulu, HI, USA
Copyright 2011 ACM 978-1-4503-0445-0/11/05 ...$10.00
Previous Work
Dagenais et al., ICSE 2010
Mentoring project newcomers highly desirable
Moving into a New Software Project Landscape
Barthélémy Dagenais†*, Harold Ossher‡, Rachel K. E. Bellamy‡, Martin P. Robillard†, Jacqueline P. de Vries‡
†School of Computer Science, McGill University, Montréal, QC, Canada
‡IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598
{bart,martin}@cs.mcgill.ca {ossher,rachel,devries}@us.ibm.com
ABSTRACT
When developers join a software development project, they find themselves in a project landscape, and they must become familiar with the various landscape features. To better understand the nature of project landscapes and the integration process, with a view to improving the experience of both newcomers and the people responsible for orienting them, we performed a grounded theory study with 18 newcomers across 18 projects. We identified the main features that characterize a project landscape, together with key orientation aids and obstacles, and we theorize that there are three primary factors that impact the integration experience of newcomers: early experimentation, internalizing structures and cultures, and progress validation.
Categories and Subject Descriptors
D.2.9 [Software Engineering]: Management
General Terms
Human Factors
1. INTRODUCTION
Software developers working on a project effectively inhabit a project landscape. They are familiar with its features, such as the product architecture, the team communication strategies and the development process, and they know the shortcuts and the commonly-traveled paths. Newcomers are explorers who must orient themselves within an unfamiliar landscape. As they gain experience, they eventually settle in and create their own places within the landscape. Like explorers of the natural landscape, they encounter many obstacles, such as culture shock or getting lost without help.
We conducted a qualitative study to better understand what project landscapes look like and how newcomers explore them. Thinking of a project as a landscape, and integration of newcomers as the process of settling into that landscape, changes what we perceive to be important and helps us see new ways of aiding newcomers. From a newcomer’s perspective, it emphasizes the process of learning about a project, and how that process unfolds over time. From the perspective of someone helping newcomers settle in, the landscape metaphor reveals the need to show them the commonly-traversed routes, to help them learn to interpret aspects of the landscape unique to the project, and to introduce them to the customs of the people who inhabit the landscape. It also suggests that if the community wants to be welcoming to newcomers, they need to be tolerant of cultural faux-pas, be sensitive to mis-steps caused by a newcomer’s lack of understanding, take the time to understand why newcomers get lost in their landscape, add readily-interpretable signposts and move them as things change. Such signposts are especially important at cross-roads, places with choices where others have tended to get lost. Identifying what counts as a cross-roads and what characterizes the parts of a project that need signposts can be aided by studies such as that presented here.
*This research was conducted while the author was working at the IBM T.J. Watson Research Center.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICSE ’10, May 2-8 2010, Cape Town, South Africa
Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00.
Specifically, we were interested in answering three main research questions: what are the key, prominent features in a project landscape, what orientation obstacles do new team members face, and what orientation aids can be provided? We interviewed 18 developers and team leaders across 18 projects at IBM during the last year to answer these questions.
Following these interviews, we theorized that there are three main factors that impact how newcomers settle into a project landscape: early experimentation, internalizing structures and cultures, and progress validation. We also identified the landscape features that newcomers learned while moving into new project landscapes and we observed how the features facilitated or hindered the newcomers’ integration. When we presented the results of our study to seven of the participants, they all agreed that the factors accurately represented their experiences as newcomers and that application of our findings would have eased their integration.
In the past, studies on project integration have been performed with new employees joining their first software development projects [2, 15]. Because these studies were performed with junior and recently-hired developers, many of the difficulties they encountered related to the newness of the corporate culture and the difference between academic and industrial environments. We were interested in understanding specifically the project landscape, independently of the circumstances related to the first-time transition of personnel into an industry environment. To this end, we focused this study on developers with varying degrees of experience in the field and within their company who were joining on-going projects in the company. We reported preliminary results at a workshop [6].
The contributions of this paper include a theory, grounded in empirical data, of how newcomers integrate into a project landscape, and a characterization of project landscapes as seen by newcomers. The landscape features identified are well known; the contribution in this area is the empirical evidence of their impact on integration.
Characteristics of a Good Mentor
enough expertise about the topic of interest for the newcomer…
enough ability to help other people…
Ability to help others
Expertise
Sources of Information
SVN, Git, CVS
Approach for Mentor Identification in Open Source Projects
YODA (Young and newcOmer Developer Assistant)
Our Contribution
YODA: Two phases
1) Identify Mentors in Past Project History (SVN, Git, CVS)
2) Recommend Mentors
What factors can be used to identify mentors?
ArnetMiner (http://arnetminer.org): popular search engine for academic researchers in computer science
identifies relations between students and advisors
RQ1: Identifying mentors in past project history
Similar problem: Identifying advisors in academic collaborations
How does ArnetMiner work?
Ranks pairs of researchers according to four factors:
f1: they published many papers together
f2: advisor published more than the student
f3: advisor is older than the student
f4: student published her first paper(s) with the advisor
Heuristics to identify mentors
Jim is the mentor of Alice IF:
F1: Exchanged emails (counted from the time when Alice joins the project)
F2: Overall amount of emails (Jim's > Alice's)
F3: Age in the project
F4: Newcomer “early” emails (first emails by Alice in the project)
F5: Commits (counted from the time when Alice joins the project)
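The email-based heuristics could be computed from a mailing-list history roughly as follows. This is only a minimal sketch under stated assumptions, not the YODA implementation: the `Email` record, the `pair_factors` function, the window of three "early" emails, and the reading of F3 as "the candidate mentor appeared in the project before the newcomer" are all hypothetical choices made for illustration. F5 (commits) is left out because the slides do not say how commits are compared.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Email:
    """Hypothetical mailing-list record: one sender, one recipient, one date."""
    sender: str
    recipient: str
    sent: date


def pair_factors(emails, mentor, newcomer, early=3):
    """Return illustrative raw values of F1-F4 for one (mentor, newcomer) pair."""
    ordered = sorted(emails, key=lambda e: e.sent)

    def involving(person):
        # All emails sent or received by the given developer, oldest first.
        return [e for e in ordered if person in (e.sender, e.recipient)]

    mentor_mail, newcomer_mail = involving(mentor), involving(newcomer)

    # F1: emails exchanged between the two developers, in either direction.
    f1 = sum(1 for e in ordered if {e.sender, e.recipient} == {mentor, newcomer})
    # F2: the candidate mentor has more emails overall than the newcomer.
    f2 = len(mentor_mail) > len(newcomer_mail)
    # F3 (assumed reading): the candidate mentor was active before the newcomer.
    f3 = bool(mentor_mail and newcomer_mail) and mentor_mail[0].sent < newcomer_mail[0].sent
    # F4: how many of the newcomer's first `early` emails went to the candidate.
    first_sent = [e for e in ordered if e.sender == newcomer][:early]
    f4 = sum(1 for e in first_sent if e.recipient == mentor)

    return {"f1": f1, "f2": f2, "f3": f3, "f4": f4}
```

Calling `pair_factors(history, "jim", "alice")` on a list of `Email` records returns one dictionary per candidate pair, which can then be scored and ranked.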
What factors can be used to identify mentors?
Aggregating the factors
Recommending Mentors
Timeline: project developers and past mentors; Alice joins the project at time t0
Among the past mentors, recommend a mentor with adequate skills for Alice
Candidate ranking: Dice similarity
Inspired by the work on bug triaging by J. Anvik et al., TOSEM 2011
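The Dice similarity named on this slide can be illustrated with a small ranking sketch. The slides do not say which items are compared, so this example assumes, purely for illustration, that the words of a newcomer's message are compared with the words of each past mentor's messages. The function names `tokens`, `dice` and `recommend` are hypothetical.

```python
import re


def tokens(text):
    """Lower-cased set of alphanumeric words in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(a, b):
    """Dice coefficient of two sets: 2 * |A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def recommend(request, mentor_texts, top_k=2):
    """Rank past mentors by Dice similarity to the newcomer's request.

    mentor_texts maps a mentor name to the (assumed) text of past messages.
    Ties are broken alphabetically so the result is deterministic.
    """
    query = tokens(request)
    scored = [(dice(query, tokens(text)), name) for name, text in mentor_texts.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]
```

With `top_k=1` or `top_k=2` this mirrors the "Top 1" and "Top 2" recommendation settings reported in the evaluation.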
Empirical Study
Goal: analyze data from mailing lists and versioning systems
Purpose: investigating which factors can be used to identify mentors
Quality focus: recommend mentors in software projects
Context: mailing lists and versioning systems of five software projects:
• Apache, FreeBSD, PostgreSQL, Python and Samba
Context
Split into a training set and a test set
                               Apache           FreeBSD          PostgreSQL       Python           Samba
Period (Training set)          08/2001-03/2002  11/1998-02/2000  10/1998-05/2001  05/2000-05/2001  04/1998-09/2000
Period (Test set)              04/2002-12/2008  03/2000-10/2008  06/2001-03/2008  06/2001-12/2008  10/2000-12/2008
# of Mentors (Training set)    19               65               10               28               17
# of Newcomers (Training set)  13               33               8                32               33
# of Newcomers (Test set)      13               33               7                31               33
Research Questions
RQ1: How can we identify mentors from the past history of a software project?
RQ2: To what extent would it be possible to recommend mentors to newcomers joining a software project?
RQ1: How can we identify mentors from the past history of a software project?
Factors: F1, F2, F3, F4, F5
Pair scores (ranked): 2.5, 2.5, 1.5, 1.5, 1.0, 1.0
Pairs manually validated ✔
Possible configurations: f1; f1 + f2 + f3; f1 + f2 + f4; f5 (baseline)
[Charts: precision (0%-100%) against the number of newcomer-mentor pairs, per configuration, for Apache, PostgreSQL, Python, FreeBSD and Samba]
Useful factors for mentor identification
f1
0.5*f1 + 0.25*f2 + 0.25*f3
0.5*f1 + 0.25*f2 + 0.25*f4
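The weighted combinations on this slide can be read as a pair score used to rank candidate newcomer-mentor pairs. The sketch below shows that reading under stated assumptions: only the weights come from the slide, while the `CONFIGURATIONS` table, the function names, and the numeric factor values passed in are hypothetical. How each factor is measured or scaled is not specified in the slides.

```python
# Weights taken from the slide; the dictionary layout is an illustrative choice.
CONFIGURATIONS = {
    "f1": {"f1": 1.0},
    "f1+f2+f3": {"f1": 0.5, "f2": 0.25, "f3": 0.25},
    "f1+f2+f4": {"f1": 0.5, "f2": 0.25, "f4": 0.25},
}


def pair_score(factors, weights):
    """Weighted sum of the factor values of one (mentor, newcomer) pair."""
    return sum(weight * factors.get(name, 0.0) for name, weight in weights.items())


def rank_pairs(pairs, weights):
    """Return the pair keys ordered from highest to lowest score."""
    return sorted(pairs, key=lambda pair: pair_score(pairs[pair], weights), reverse=True)
```

Keeping the weights in a table makes it easy to compare the configurations against each other on the same set of candidate pairs.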
RQ2: To what extent would it be possible to recommend mentors to newcomers joining a software project?
Precision of the recommended mentors
         Apache  FreeBSD  PostgreSQL  Python  Samba
Top 1    85%     30%      100%        64%     94%
Top 2    81%     24%      100%        77%     82%
YODA makes it possible to recommend mentors ✔
Why not just use the top committers?
Precision when the top committers are recommended as mentors
         Apache  FreeBSD  PostgreSQL  Python  Samba
Top 1    0%      6%       50%         0%      35%
Top 2    8%      3%       25%         7%      35%
Not all committers are good mentors!
Surveying Project Developers
Newcomer, Mentor
Questions asked:
Done/received mentoring
Perceived importance of mentoring
What makes a good mentor
Sent to 114 Subjects...
Projects: FreeBSD, PostgreSQL, Python, Apache, Samba
Counts shown on the slide: 37, 37, 15, 23, 23
Obtained Answers
Projects: FreeBSD, PostgreSQL, Python, Apache, Samba
Done/received mentoring?
                 YES   NO
Did mentoring?   92%   8%
Had a mentor?    58%   42%
“Yes, I received mentoring. My mentor was…”
“Yes, I did mentoring…”
Perceived importance of mentoring
                 Effect of mentor   Effect on newcomer
Very important   18%                33%
Important        36%                56%
Neutral          45%                11%
Not important    0%                 0%
Useless at all   0%                 0%
“It is very important that a mentor shares knowledge with a mentee…”
What makes a good mentor
Experience             19%
Communication skills   42%
Project knowledge      38%
Others                 0%
“My first mentor had a very strong and technical background”
Conclusions
Future Work...
Considering factors able to better capture the technical skills of mentors.
Replicating the study with different projects.