wikipedia structure of collaboration

The structure of social collaboration on Wikipedia. Sorin Adam Matei, Associate Professor of Communication, Purdue U; David Braun, Research Scientist, Envision Lab, Purdue U; Horia Petrache, Assistant Professor of Physics, IUPUI. Presented at Wikimania 2009, Buenos Aires, Argentina, August 25-28, 2009. http://wikimania2009.wikimedia.org/wiki/Proceedings:132


DESCRIPTION

There is no wisdom of crowds on Wikipedia. After 500 edits, articles become quite uneven: a tiny fraction of users become the ad hoc rulers of those articles. Presented at Wikimania 2009. http://wikimania2009.wikimedia.org/wiki/Proceedings:132

TRANSCRIPT

Page 1: Wikipedia structure of collaboration

The structure of social collaboration on

Wikipedia

Sorin Adam Matei, Associate Professor of Communication, Purdue U

David Braun, Research Scientist, Envision Lab, Purdue U

Horia Petrache, Assistant Professor of Physics, IUPUI

Presented at Wikimania 2009, Buenos Aires, Argentina

August 25-28, 2009

http://wikimania2009.wikimedia.org/wiki/Proceedings:132

Page 2: Wikipedia structure of collaboration

2005: A Wikipedian explains Wikipedia as Wisdom of Crowds

The basic premise [of Wisdom of Crowds] that crowds of relatively ignorant individuals make better decisions than small groups of experts. I'm sure everyone here agrees with this as Wikipedia is run this way...

Wikipedia displays emergent properties because each article is better than the contribution of each individual. Similarly, ants display emergence because an ant colony can accomplish things that each individual ant cannot even conceive.

Page 3: Wikipedia structure of collaboration

Implied idea

Fine-grained micro-contributions that are independent, decentralized, and perhaps equal lead to articles that are better than what any single contributor could write

Page 4: Wikipedia structure of collaboration

As expressed in this Wikipedia-l post

I imagine Wikipedia as a massive, active swarm intelligence, supplemented by small roving groups of active editors who admire consistency, elegance, and reasoned discourse. (not unlike certain models of how the brain works :) The swarm does the bulk of the writing, especially finding and providing current facts, starting new articles, and adding neglected POVs. The roving groups are sensitive to dozens of policy pages, and implement them as they rove... they also take on large projects, one at a time, and try to implement certain changes across thousands of pages at once.

Page 5: Wikipedia structure of collaboration

To which “Jimbo” (Wales) answers

I should point out that I like Surowiecki's thesis just fine, it's just that I'm not convinced that "swarm intelligence" is very helpful in understanding how Wikipedia works -- in fact, it might be an impediment, because it leads us away from thinking about how the community interacts in a process of reasoned discourse.

Page 6: Wikipedia structure of collaboration

Jimbo concludes

My research (conducted in December) showed that half the edits by logged in users belong to just 2.5% of logged in users.

Page 7: Wikipedia structure of collaboration

Does the 80/20 rule apply?

Power-law curves are all over the real world …

Adar and Huberman (2000) found 50% of the content on Gnutella is provided by 1% of the users,

O'Mahony and Ferraro (2003) found the curve in the Debian dev key ring, Moon and Sproull (2002) on the Linux Kernel list, Briggs et al. (1997) in group support systems, Krogh, Spaeth and Lakhani (2003) in Freenet.

(By another participant in the 2005 discussion)

Page 8: Wikipedia structure of collaboration

What would the 80/20 rule mean?

Extreme inequality? Elitism?

Structured collaboration?

Interactive exchanges between groups of individuals?

Page 9: Wikipedia structure of collaboration

Previous research

Wikipedia contributions, in all languages, have become more skewed in favor of a small group of editors and old time users (Ortega et al., 2005)

Page 10: Wikipedia structure of collaboration

Top contributors dominate both the edits and the number of words contributed

Page 11: Wikipedia structure of collaboration

Our approach

Increase in inequality => higher level of structuration

Increasing division of labor

From diffuse collaboration to structured collaboration

Emergence of bureaucracy

Emergence of adhocracy

◦ Groups of individuals that become article stewards

Page 12: Wikipedia structure of collaboration

Social entropy and structuration

Social Entropy

As systems become organized (biased), their entropy decreases

Entropy is a measure of meaningful organization

Page 13: Wikipedia structure of collaboration

Entropy and organization

Meaningful messages use words and letters in an uneven manner

Symbol distribution in meaningful messages is uneven

Information (and social) entropy are measures of organization and meaning

As collaboration becomes more biased, the group becomes more organized

Page 14: Wikipedia structure of collaboration

Shannon’s formula

$$S(t) = -\sum_i p_i(t) \ln p_i(t)$$

where the sum is over all users i, and p_i is the fractional contribution of user i. We allow p and S to be functions of time (t).

Page 15: Wikipedia structure of collaboration

Shannon’s formula explained

Social entropy reflects how uneven and lacking in diversity a group/system process is

Example: 10 users and 100 contributions

◦ Each user contributes 10 edits to a Wikipedia article => entropy reaches its highest level

◦ 1 contributor contributes everything => entropy is at its lowest value
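The two cases above can be checked numerically. The following is a minimal sketch, assuming contributions are simple per-user edit counts; the function name `social_entropy` is illustrative and does not come from the slides.

```python
import math

def social_entropy(contributions):
    """Shannon entropy S = -sum(p_i * ln p_i) over per-user contribution shares.

    `contributions` is a list of per-user edit counts (an assumed input format).
    Users with zero contributions are skipped, since p * ln(p) -> 0 as p -> 0.
    """
    total = sum(contributions)
    shares = [c / total for c in contributions if c > 0]
    return sum(-p * math.log(p) for p in shares)

# 10 users, 10 edits each: the most even split, so entropy is at its maximum, ln(10).
equal = social_entropy([10] * 10)

# 1 user makes all 100 edits: entropy is at its minimum, 0.
single = social_entropy([100] + [0] * 9)

print(round(equal, 3))  # 2.303
print(single)           # 0.0
```

With N equally active users the entropy equals ln(N), so the ceiling itself grows with the number of contributors.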

Page 16: Wikipedia structure of collaboration

Analytic strategy

Downloaded latest available dump

Trouble with unzipping (dump corrupted)

Extracted

◦ 792,654 registered users

◦ 234,798 articles

Calculated the number of times each individual contributed to each article and how many words they contributed (not completely finalized)
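The counting step can be sketched as follows. This is only an illustration under stated assumptions: the slides do not describe the dump parser or the data layout, so the `(article, user, words_added)` records and the `tally` function are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical revision records: (article_title, user_name, words_added).
# The real analysis read these from a Wikipedia dump; the format here is assumed.
revisions = [
    ("Entropy", "alice", 120),
    ("Entropy", "bob", 15),
    ("Entropy", "alice", 40),
    ("Ant colony", "carol", 300),
]

def tally(revisions):
    """Return per-article counters of edits and of words contributed, keyed by user."""
    edits = defaultdict(Counter)
    words = defaultdict(Counter)
    for article, user, n_words in revisions:
        edits[article][user] += 1        # one more intervention by this user
        words[article][user] += n_words  # running total of words contributed
    return edits, words

edits, words = tally(revisions)
print(dict(edits["Entropy"]))  # {'alice': 2, 'bob': 1}
print(dict(words["Entropy"]))  # {'alice': 160, 'bob': 15}
```

The per-article counters are exactly the inputs an entropy calculation over contribution shares would need.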

Page 17: Wikipedia structure of collaboration

[Figure: intervention entropy (y-axis) vs. intervention number, in events (x-axis); plot label: ln(x)]

Basic plot: Entropy increases for about the first 500 interventions, then levels off.

• Red: observed entropy values

• Orange: fit curve (takes into account the spread of values)

• Dotted: maximum entropy, the "wisdom of crowds" ceiling
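A curve of this kind can be produced for a single article by recomputing the entropy after every intervention. The sketch below is an assumption-laden illustration, not the authors' code: it takes an ordered list of editor names (a hypothetical input) and compares the entropy after n interventions with ln(n), the value reached only if every intervention came from a different user. Reading the dotted ceiling as ln(n) is an interpretation of the plot's "ln(x)" label.

```python
import math
from collections import Counter

def entropy_trajectory(editors):
    """Return (n, entropy, ceiling) after each intervention.

    `editors` is the ordered list of user names, one per intervention
    (assumed input). The ceiling ln(n) is the entropy obtained when all
    n interventions come from distinct users.
    """
    counts = Counter()
    trajectory = []
    for n, user in enumerate(editors, start=1):
        counts[user] += 1
        # Entropy of the contribution shares c/n observed so far.
        s = sum(-(c / n) * math.log(c / n) for c in counts.values())
        trajectory.append((n, s, math.log(n)))
    return trajectory

# Hypothetical edit history in which one user gradually dominates the article.
history = ["a", "b", "a", "a", "c", "a"]
for n, s, ceiling in entropy_trajectory(history):
    print(n, round(s, 3), round(ceiling, 3))
```

In this toy history the entropy touches the ceiling only while every editor has contributed equally, then falls behind it as user "a" takes over.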

Page 18: Wikipedia structure of collaboration

[Figure: intervention entropy (y-axis) vs. intervention number, in events (x-axis), logged plot]

Logged plot: Average article entropy increasingly and monotonically diverges from the “wisdom of the crowds” ceiling. Wikipedia becomes “cooler” and more and more structured.

• Red: observed entropy values

• Orange: fit curve (takes into account the spread of values)

• Dotted: maximum entropy, the "wisdom of crowds" ceiling

Page 19: Wikipedia structure of collaboration

[Figure: ratio of standard deviation to intervention entropy (y-axis) vs. intervention number, in events (x-axis); plot label: n^(-1/2)]

After the 500th intervention the coefficient of variation (StDev/Mean) becomes constant; all articles tend to behave within the same limits of variability for the next 9,500 iterations
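The coefficient of variation mentioned above is the standard deviation of the entropy values across articles divided by their mean, computed at a given intervention number. A minimal sketch, assuming the population standard deviation (the slides do not say whether the sample or population form was used) and made-up entropy values:

```python
import math
import statistics

def coefficient_of_variation(values):
    """StDev / Mean of a list of entropy values.

    Uses the population standard deviation; this choice is an assumption,
    as the slides only state "StDev/Mean".
    """
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical entropies of three articles at the same intervention number.
entropies_at_event = [1.0, 2.0, 3.0]
cv = coefficient_of_variation(entropies_at_event)
print(round(cv, 3))  # 0.408
```

Repeating this calculation at every intervention number gives the curve that the slide describes as flattening after the 500th intervention.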

Page 20: Wikipedia structure of collaboration

What remains to be done

Entropy decreases, Wikipedia “hardens”

Does it become more structured? In what way?

Will analyze degree of structuration by measuring the structure of co-edits (network analysis)

◦ Expectation: as entropy decreases, network structures become more hierarchical and inflexible (fewer degrees of freedom)

Will analyze distribution of collaboration across formal and informal roles

◦ Who are the nodes of collaboration?

◦ What is their contribution to cooling and hardening Wikipedia?