Wikipedia structure of collaboration
DESCRIPTION
There is no wisdom of crowds on Wikipedia. After 500 edits, articles become quite uneven: a tiny fraction of users become the ad hoc rulers of those articles.
Presented at Wikimania 2009
http://wikimania2009.wikimedia.org/wiki/Proceedings:132
TRANSCRIPT
The structure of social collaboration on Wikipedia
Sorin Adam Matei, Associate Professor of Communication, Purdue University
David Braun, Research Scientist, Envision Lab, Purdue University
Horia Petrache, Assistant Professor of Physics
Presented at Wikimania 2009, Buenos Aires, Argentina
August 25-28 2009
http://wikimania2009.wikimedia.org/wiki/Proceedings:132
2005: A Wikipedian explains Wikipedia as Wisdom of Crowds
The basic premise [of Wisdom of Crowds] that crowds of relatively ignorant individuals make better decisions than small groups of experts. I'm sure everyone here agrees with this as Wikipedia is run this way...
Wikipedia displays emergent properties because each article is better than the contribution of each individual. Similarly, ants display emergence because an ant colony can accomplish things that each individual ant cannot even conceive.
Implied idea
Fine-grained micro-contributions that are independent, decentralized, and perhaps equal lead to articles that are better than what any single contributor could write
As expressed in this Wikipedia-l post
I imagine Wikipedia as a massive, active swarm intelligence, supplemented by small roving groups of active editors who admire consistency, elegance, and reasoned discourse. (not unlike certain models of how the brain works :) The swarm does the bulk of the writing, especially finding and providing current facts, starting new articles, and adding neglected POVs. The roving groups are sensitive to dozens of policy pages, and implement them as they rove... they also take on large projects, one at a time, and try to implement certain changes across thousands of pages at once.
To which “Jimbo” (Wales) answers
I should point out that I like Surowiecki's thesis just fine, it's just that I'm not convinced that "swarm intelligence" is very helpful in understanding how Wikipedia works -- in fact, it might be an impediment, because it leads us away from thinking about how the community interacts in a process of reasoned discourse.
Jimbo concludes
My research (conducted in December) showed that half the edits by logged in users belong to just 2.5% of logged in users.
Does the 80/20 rule apply?
Power-law curves are all over the real world …
Adar and Huberman (2000) found 50% of the content on Gnutella is provided by 1% of the users,
O'Mahony and Ferraro (2003) found the curve in the Debian dev key ring, Moon and Sproull (2002) on the Linux Kernel list, Briggs et al. (1997) in group support systems, Krogh, Spaeth and Lakhani (2003) in Freenet.
(By another participant in the 2005 discussion)
What would the 80/20 rule mean?
Extreme inequality? Elitism?
Structured collaboration?
Interactive exchanges between groups of individuals?
Previous research
Wikipedia contributions, in all languages, have become more skewed in favor of a small group of editors and long-time users (Ortega et al., 2005)
Top contributors dominate both the number of edits and the number of words contributed
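Concentration claims of this kind ("half the edits belong to 2.5% of users") can be checked with a simple ranking. The sketch below is illustrative only: the function name `top_share` and the sample counts are made up for this example and do not come from the study's data.

```python
import math

def top_share(edit_counts, fraction):
    """Share of all edits made by the most active `fraction` of users."""
    total = sum(edit_counts)
    if total == 0:
        return 0.0
    ranked = sorted(edit_counts, reverse=True)
    # Always include at least one user, and round the group size up.
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / total

# Hypothetical per-user edit counts for ten users (100 edits in total).
sample_counts = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]

print(top_share(sample_counts, 0.2))  # share held by the top 20% of users
print(top_share(sample_counts, 0.1))  # share held by the top 10% of users
```

In this made-up sample the top two users hold 70 of the 100 edits, which is the kind of skew the 80/20 question is about.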
Our approach
Increase in inequality => higher level of structuration
Increasing division of labor
From diffuse collaboration to structured collaboration
Emergence of bureaucracy
Emergence of adhocracy
◦ Groups of individuals that become article stewards
Social entropy and structuration
Social Entropy
As systems become organized (biased), their entropy decreases
Entropy is a measure of meaningful organization
Entropy and organization
Meaningful messages use words and letters in an uneven manner
Symbol distribution in meaningful messages is uneven
Information (and social) entropy are measures of organization and meaning
As collaboration becomes more biased, the group becomes more organized
Shannon’s formula
S(t) = -\sum_i p_i(t) \ln p_i(t)
where the sum is over all users i, and p_i(t) is the fractional contribution of user i. We allow p and S to be functions of time (t).
Shannon’s formula explained
Social entropy reflects how evenly contributions are spread across a group/system process: the more uneven and less diverse the process, the lower the entropy
Example: 10 users and 100 contributions
◦ each user contributes 10 edits to a Wikipedia article => entropy reaches its highest level
◦ 1 contributor contributes everything => entropy is at its lowest value
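The two cases above can be worked through numerically. This is a minimal sketch of Shannon's formula, S = -sum of p_i ln p_i, applied to per-user edit counts; the function name `social_entropy` is chosen for this example and is not taken from the authors' code.

```python
import math

def social_entropy(contributions):
    """Shannon entropy of per-user contribution counts, using natural logs."""
    total = sum(contributions)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in contributions:
        if count > 0:  # users with no contributions add nothing (0 * ln 0 := 0)
            p = count / total
            entropy -= p * math.log(p)
    return entropy

equal_split = [10] * 10          # 10 users, 10 edits each
single_author = [100] + [0] * 9  # one user makes all 100 edits

print(social_entropy(equal_split))    # equals ln(10), roughly 2.30
print(social_entropy(single_author))  # 0.0
```

With 10 users the highest possible value is ln(10); any departure from an equal split moves the entropy below that value.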
Analytic strategy
Downloaded latest available dump
Trouble with unzipping (dump corrupted)
Extracted
◦ 792,654 registered users
◦ 234,798 articles
Calculated the number of times each individual contributed to each article and how many words they contributed (not completely finalized)
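The counting step can be sketched as follows, assuming the dump has already been reduced to a chronological list of (article, user) pairs. That input format, the function name `entropy_trajectories`, and the sample events are assumptions for illustration; the slides do not describe the actual processing code.

```python
import math
from collections import defaultdict

def entropy_trajectories(edit_events):
    """Map each article to its entropy after the 1st, 2nd, 3rd... intervention.

    edit_events: iterable of (article, user) pairs in chronological order.
    """
    counts = defaultdict(lambda: defaultdict(int))  # article -> user -> edits
    totals = defaultdict(int)                       # article -> edits so far
    trajectories = defaultdict(list)
    for article, user in edit_events:
        counts[article][user] += 1
        totals[article] += 1
        n = totals[article]
        # p * ln(1/p) summed over users is the same as -sum(p * ln p).
        s = sum((c / n) * math.log(n / c) for c in counts[article].values())
        trajectories[article].append(s)
    return dict(trajectories)

# Hypothetical events: three interventions on article "A", one on "B".
events = [("A", "u1"), ("A", "u2"), ("A", "u1"), ("B", "u3")]
trajectories = entropy_trajectories(events)
print(trajectories["A"])
print(trajectories["B"])
```

Recomputing the sum at every event keeps the sketch short; a full-dump run would likely need an incremental update. To weight by words rather than edits, the per-user increment could be a word count instead of 1.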
[Figure: intervention entropy (y-axis) against intervention number (events) (x-axis); curve label ln(x)]
Basic plot: entropy increases for approximately the first 500 interventions, then levels off.
• Red: observed entropy values
• Orange: fit curve (takes into account the spread of values)
• Dotted: maximum entropy, the wisdom of crowds ceiling
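One way to read the dotted ceiling, offered here as an assumption rather than the authors' stated derivation: if each of the x interventions on an article came from a different user, every share would be p_i = 1/x, and Shannon's formula would give the ln(x) value that appears as the plot label.

```latex
S_{\max}(x) = -\sum_{i=1}^{x} \frac{1}{x}\,\ln\frac{1}{x} = \ln x
```

Any repeat contributor lowers the entropy below this value, so the gap between the observed curve and the ceiling grows as a few users accumulate edits.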
[Figure: intervention entropy (y-axis) against intervention number (events) (x-axis), logged plot]
Logged plot: average article entropy increasingly and monotonically diverges from the “wisdom of the crowds” ceiling. Wikipedia becomes “cooler” and more and more structured.
• Red: observed entropy values
• Orange: fit curve (takes into account the spread of values)
• Dotted: maximum entropy, the wisdom of crowds ceiling
[Figure: ratio of standard deviation to intervention entropy (y-axis) against intervention number (events) (x-axis); curve label n^(-1/2)]
After the 500th intervention the coefficient of variation (StDev/Mean) becomes constant; all articles tend to behave within the same limits of variability for the next 9,500 iterations.
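The coefficient of variation at each intervention number can be sketched as below. The input is assumed to be one entropy series per article; the function name, the use of the sample standard deviation, and the sample values are assumptions for illustration, since the slides do not state these details.

```python
import statistics

def coefficient_of_variation_by_step(trajectories):
    """StDev/Mean of entropy across articles at each intervention number.

    trajectories: dict mapping article -> list of entropy values, where
    index k holds the entropy after intervention k + 1.
    Returns None where the ratio is undefined (fewer than two articles,
    or a mean of zero).
    """
    longest = max((len(t) for t in trajectories.values()), default=0)
    ratios = []
    for k in range(longest):
        values = [t[k] for t in trajectories.values() if len(t) > k]
        mean = statistics.mean(values)
        if len(values) < 2 or mean == 0:
            ratios.append(None)
        else:
            ratios.append(statistics.stdev(values) / mean)
    return ratios

# Hypothetical entropy series for three articles.
sample = {"A": [0.0, 0.6, 0.9], "B": [0.0, 0.4, 0.7], "C": [0.0, 0.5]}
cv = coefficient_of_variation_by_step(sample)
print(cv)
```

The first entry is None because every article has zero entropy after a single intervention, so the mean is zero and the ratio is undefined.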
What remains to be done
Entropy decreases, Wikipedia “hardens”. Does it become more structured? In what way?
Will analyze the degree of structuration by measuring the structure of co-edits (network analysis)
◦ Expectation: as entropy decreases, network structures become more hierarchical and inflexible (fewer degrees of freedom)
Will analyze the distribution of collaboration across formal and informal roles
◦ Who are the nodes of collaboration?
◦ What is their contribution to cooling and hardening Wikipedia?
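A co-edit network of the kind planned here could be built as sketched below. The slides do not define the network, so this example assumes one plausible construction: two users are linked when they have edited the same article, with the link weighted by the number of shared articles. All names and sample events are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def coedit_network(edit_events):
    """Return {(user_a, user_b): number of articles both users edited}."""
    editors = defaultdict(set)  # article -> set of users who edited it
    for article, user in edit_events:
        editors[article].add(user)
    weights = defaultdict(int)
    for users in editors.values():
        # Sorting gives each pair one canonical orientation.
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return dict(weights)

def degrees(weights):
    """Number of distinct co-editors per user."""
    degree = defaultdict(int)
    for a, b in weights:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)

events = [("A", "u1"), ("A", "u2"), ("A", "u3"),
          ("B", "u1"), ("B", "u2"),
          ("C", "u1"), ("C", "u4")]
network = coedit_network(events)
print(network)
print(degrees(network))
```

A strongly uneven degree distribution would be one rough sign of a more centralized network; which measure of hierarchy the authors intend is left open in the slides.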