http://www.flickr.com/photos/jamiemanley/5278662995 Crowdsourced Knowledge Catalyzes Software Development Bogdan Vasilescu, TU Eindhoven @b_vasilescu BeNeVol 2013, Mons, Belgium Based on : Vasilescu, B, Filkov, V and Serebrenik, A (2013), "StackOverflow and GitHub: Associations between software development and crowdsourced knowledge", In SocialCom, pp.188-195. IEEE


Page 1: Crowdsourced Knowledge Catalyzes Software …informatique.umons.ac.be/genlog/benevol2013/presentations...19, no. 4, pp. 14–17, 2012] [Bacchelli et al. “Harnessing Stack Overflow

http://www.flickr.com/photos/jamiemanley/5278662995

Crowdsourced Knowledge Catalyzes Software Development

Bogdan Vasilescu, TU Eindhoven @b_vasilescu

BeNeVol 2013, Mons, Belgium

Based on: Vasilescu, B, Filkov, V and Serebrenik, A (2013), "StackOverflow and GitHub: Associations between software development and crowdsourced knowledge", In SocialCom, pp. 188-195. IEEE

Page 2:

Standing on the shoulders of others

Developers:

• reuse components and libraries

• forage on the Web for information

Page 6:

Writing code vs. seeking and sharing knowledge

Demand for knowledge

Supply of knowledge

Page 7:

Is participation in SO related to productivity of developers?

Page 8:

Is participation in SO related to productivity of developers?

Beneficial:

• good technical solutions

• fast answers (median 11 mins)

[Parnin et al. “Crowd documentation: Exploring the coverage and the dynamics of API discussions on Stack Overflow,” Georgia Institute of Technology, Tech. Rep., 2012]

[Mamykina et al. “Design lessons from the fastest Q&A site in the west,” in CHI. ACM, 2011, pp. 2857–2866]

http://www.flickr.com/photos/dw212/4433157278

Page 9:

Detrimental:

• competes for time

• gamified, thus addictive

• context switches are expensive

http://www.flickr.com/photos/jamiemanley/5278662995

[Storey et al. “The impact of social media on software engineering practices and tools,” FoSER. ACM, 2010, pp. 359–364]

[Deterding, “Gamification: designing for motivation,” Interactions, vol. 19, no. 4, pp. 14–17, 2012]

[Bacchelli et al. “Harnessing Stack Overflow for the IDE,” in RSSE. IEEE, 2012, pp. 26–30]

Is participation in SO related to productivity of developers?

Page 10:

Is participation in SO related to productivity of developers?

Asset or burden?

Page 11:

Dataset

GitHub (largest code host in the world): ~400k users, July 2011 - April 2012

StackOverflow (largest programming Q&A site in the world): ~1.3M users, July 2008 - August 2012

[G. Gousios and D. Spinellis. “GHTorrent: Github’s data from a firehose,” in MSR. IEEE, 2012, pp. 12–21]

[Quarterly StackExchange data dump (August 2012)]

Page 14:

Dataset

GitHub: email address (plain text), July 2011 - April 2012

StackOverflow: email address (MD5 hash), July 2008 - August 2012

~94k users matched (24% of the GitHub users, 7% of the StackOverflow users)

[G. Gousios and D. Spinellis. “GHTorrent: Github’s data from a firehose,” in MSR. IEEE, 2012, pp. 12–21]

[Quarterly StackExchange data dump (August 2012)]

Page 15:

Dataset

GitHub: email address (plain text), July 2011 - April 2012

StackOverflow: email address (MD5 hash), July 2008 - August 2012

~47k users active on both GitHub and StackOverflow between July 2011 and April 2012 (12% of the GitHub users, 4% of the StackOverflow users)

[G. Gousios and D. Spinellis. “GHTorrent: Github’s data from a firehose,” in MSR. IEEE, 2012, pp. 12–21]

[Quarterly StackExchange data dump (August 2012)]
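
The Dataset slides link the two sources through email addresses: plain text on the GitHub side, MD5 hashes on the StackOverflow side. A minimal sketch of such a join is given below. The function names, the input shapes, and the normalization step (trimming and lowercasing before hashing) are assumptions made for illustration; the slides do not state how the addresses were normalized.

```python
import hashlib


def md5_of_email(email: str) -> str:
    """Return the hex MD5 digest of an email address.

    Assumption: the address is trimmed and lowercased before hashing.
    """
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def match_users(github_emails: dict, so_hashes: dict) -> dict:
    """Join GitHub logins to StackOverflow user ids via the email hash.

    github_emails: hypothetical mapping {github_login: plain-text email}
    so_hashes:     hypothetical mapping {md5_hex_digest: stackoverflow_user_id}
    """
    matched = {}
    for login, email in github_emails.items():
        digest = md5_of_email(email)
        if digest in so_hashes:
            matched[login] = so_hashes[digest]
    return matched


# Toy data, invented for this example only.
github = {"dave": "Dave@Example.org ", "kevin": "kevin@example.org"}
stackoverflow = {md5_of_email("dave@example.org"): 42}
print(match_users(github, stackoverflow))
```

Only accounts that use the same address on both platforms can be linked this way, which is in line with a matched set (~94k) much smaller than either user base.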

Page 16:

Is participation in SO related to productivity of developers?

Asset or burden?

Page 17:

Macro: overall activity levels

[Capiluppi et al. “Assessing technical candidates on the social web,” IEEE Software, vol. 30, no. 1, pp. 45–51, 2013]

To what extent can activity (expertise) on one platform be used as a proxy for activity (expertise) on the other?

• social signals (e.g., open source projects, professional social media) ~ career advancement

Page 18:

Is attention focused (bursts of commits) or divided between the two platforms?

• working rhythms of developers ~ software quality

[Eyolfson et al. “Correlations between bugginess and time-based commit characteristics,” Empirical Software Engineering, pp. 1–31, 2013]

Intermediate: working rhythms

See paper

Page 19:

Do StackOverflow activities accelerate or slow down GitHub commits?

Micro: coordination between commits and Q&A

[Storey et al. “The impact of social media on software engineering practices and tools,” FoSER. ACM, 2010, pp. 359–364]

Page 20:

Macro: overall activity

#Commits   #Answers   #Questions
100        5          50
10         75         15
25         10         75

Developers: Stuart, Kevin, Dave

Page 22:

Macro: overall activity

#Commits   #Answers   #Questions
100        5          50
25         10         75
10         75         15

Developers: Stuart, Kevin, Dave

Fix, sort

Quartiles/Deciles, compare

Not restricted to monotonic relations!
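
The "fix, sort, quartiles/deciles, compare" steps can be sketched as follows: fix one activity measure, sort developers by it, cut the sorted list into equally sized groups, and compare another measure across the groups. This is an illustrative sketch only. The record layout, the function names, and the choice of the median as the per-group summary are assumptions, not taken from the slides. Because groups are compared directly, a pattern that rises and then falls across groups stays visible.

```python
from statistics import median


def split_into_groups(records, key, n_groups=4):
    """Sort records by `key` and cut them into n_groups nearly equal parts.

    n_groups=4 gives quartiles, n_groups=10 gives deciles.
    """
    ordered = sorted(records, key=lambda r: r[key])
    size, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups take one additional record each.
        end = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


def compare_across_groups(records, fixed, compared, n_groups=4):
    """Median of `compared` within each group of the `fixed` variable."""
    groups = split_into_groups(records, fixed, n_groups)
    return [median(r[compared] for r in g) for g in groups if g]


# Toy data, invented for this example only.
developers = [
    {"commits": c, "answers": a}
    for c, a in zip(range(1, 9), [0, 1, 2, 3, 10, 11, 20, 30])
]
print(compare_across_groups(developers, fixed="commits", compared="answers"))
```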

Page 25:

Findings

• Active GitHub committers are experienced developers:

• few StackOverflow questions

• many StackOverflow answers

[Figure: #Questions compared across quartiles Q1-Q4 of #Commits]

[Figure: #Answers compared across quartiles Q1-Q4 of #Commits]

Top StackOverflow users are superstars rather than slackers!

GitHub activity ~ StackOverflow willingness to answer technical questions (expertise)

Page 29:

Micro: who benefits from participating in SO

[Figure: activity series for Dave]

[Xuan et al. “Measuring the effect of social communications on individual working rhythms: A case study of open source software,” in Social Informatics. ASE/IEEE, 2012]

Compare actual and shuffled series:

• actual < shuffled: acceleration

• actual > shuffled: impediment
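
One way to read the actual-versus-shuffled comparison is sketched below. It is not the procedure from the paper or from Xuan et al. It assumes numeric timestamps, measures the mean delay from each Q&A event to the next commit, and builds the shuffled baseline by keeping all event times fixed while randomly reassigning which of them are Q&A events. All function names are hypothetical.

```python
import random
from bisect import bisect_right
from statistics import mean


def mean_delay_to_next_commit(qa_times, commit_times):
    """Mean time from each Q&A event to the first commit strictly after it."""
    commits = sorted(commit_times)
    delays = []
    for t in qa_times:
        i = bisect_right(commits, t)
        if i < len(commits):  # skip Q&A events with no later commit
            delays.append(commits[i] - t)
    return mean(delays) if delays else None


def shuffled_baseline(qa_times, commit_times, n_shuffles=1000, seed=0):
    """Average the same measure over series with randomly reassigned event types."""
    rng = random.Random(seed)
    times = sorted(list(qa_times) + list(commit_times))
    n_qa = len(qa_times)
    results = []
    for _ in range(n_shuffles):
        qa_idx = set(rng.sample(range(len(times)), n_qa))
        qa = [t for i, t in enumerate(times) if i in qa_idx]
        commits = [t for i, t in enumerate(times) if i not in qa_idx]
        delay = mean_delay_to_next_commit(qa, commits)
        if delay is not None:
            results.append(delay)
    return mean(results) if results else None


def classify(actual, shuffled):
    """Apply the rule from the slide to the two summary values."""
    if actual is None or shuffled is None:
        return "undetermined"
    if actual < shuffled:
        return "acceleration"
    if actual > shuffled:
        return "impediment"
    return "no difference"


# Toy series, invented for this example: each Q&A event is soon followed by a commit.
qa = [0, 10, 20]
commits = [1, 11, 21, 30, 40, 50]
actual = mean_delay_to_next_commit(qa, commits)
print(classify(actual, shuffled_baseline(qa, commits)))
```

A real analysis would also need a significance test on the difference; this sketch compares the two means only.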

Page 31:

Findings

For active committers, asking and answering questions on StackOverflow catalyses committing on GitHub.

For no group is participating in StackOverflow detrimental!

Page 32:

Summary: Is participation in SO related to productivity of GitHub devs?

Asset or burden?

Page 33:

Summary

Experts are experts everywhere! Active committers are also active answerers (knowledge providers) on StackOverflow.

Different working rhythms for novices (focused attention) and experts.

Participating in StackOverflow reinforces commit activities on GitHub.

Asset or burden

Going to StackOverflow is “costlier” for novices.
