CHI2007 talk on Conflicts in Wikipedia

Upload: ed-chi

Post on 14-Jun-2015

DESCRIPTION

Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed H. Chi. He Says, She Says: Conflict and Coordination in Wikipedia. In Proc. of ACM Conference on Human Factors in Computing Systems (CHI2007), pp. 453--462, April 2007. ACM Press. San Jose, CA. http://www-users.cs.umn.edu/~echi/papers/2007-CHI/2007-Wikipedia-coordination-PARC-CHI2007.pdf

TRANSCRIPT

Page 1: CHI2007 talk on Conflicts in Wikipedia

He Says, She Says: Conflict and Coordination in Wikipedia

Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed Chi

UCLA Augmented Social Cognition Group

Palo Alto Research Center

Page 2: CHI2007 talk on Conflicts in Wikipedia

What is Wikipedia?

“Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you’re getting the best possible information.”

– Steve Carell, The Office

Page 3: CHI2007 talk on Conflicts in Wikipedia

Spreading conflict

Page 8: CHI2007 talk on Conflicts in Wikipedia

Policy and procedure

“The degree of success that one meets in dealing with conflicts... often depends on the efficiency with which one can quote policy and precedent.” - Wikipedia admin (survey data)

Page 9: CHI2007 talk on Conflicts in Wikipedia

Collaborative work beneath the surface

• Visitors only look at article pages
• But much of Wikipedia is made up of other pages
  – Conflict resolution, coordination, policies and procedures

Page 10: CHI2007 talk on Conflicts in Wikipedia

Characterizing coordination and conflict

Page 12: CHI2007 talk on Conflicts in Wikipedia

Exponential growth

Page 13: CHI2007 talk on Conflicts in Wikipedia

Costs of growth

• Increase in conflict and coordination costs
  – Software development (Boehm, 1981; Brooks, 1975)
  – MUDs/MOOs (Curtis, 1992; Dibbell, 1993)
  – Mailing lists (Sproull & Kiesler, 1991)
• How has growth affected Wikipedia?
  – Millions of new users and articles

Page 14: CHI2007 talk on Conflicts in Wikipedia

Infrastructure

• Analyze entire history of Wikipedia
  – Every edit to every article
• Large amount of data (as of June 2006)
  – 4+ million pages
  – 58+ million revisions
  – 800+ GB
• Distributed processing
  – Hadoop distributed filesystem
  – Map/reduce to process data in parallel
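The map/reduce step above can be illustrated with a small single-process sketch. It is not the authors' Hadoop job: the record layout, the `page_type` rule, and the sample titles are all assumptions made for illustration. The map phase emits one count per revision keyed by (year, page type), the reduce phase sums the counts, and a final pass turns them into yearly edit proportions.

```python
from collections import defaultdict

# Hypothetical revision records: (year, page title).
# Real dump records carry many more fields.
REVISIONS = [
    (2005, "Dokdo"),
    (2005, "Talk:Dokdo"),
    (2006, "Dokdo"),
    (2006, "User talk:Example"),
    (2006, "Wikipedia:Neutral point of view"),
    (2006, "Talk:Dokdo"),
]

# Simplified namespace-to-type mapping (an assumption, not the paper's rule).
KNOWN_PREFIXES = {
    "Talk": "article talk",
    "User": "user",
    "User talk": "user talk",
    "Wikipedia": "procedure",
}

def page_type(title):
    """Classify a page by its namespace prefix; anything else is an article."""
    prefix, sep, _ = title.partition(":")
    if not sep:
        return "article"
    return KNOWN_PREFIXES.get(prefix, "article")

def map_phase(records):
    """Emit ((year, page type), 1) for every revision."""
    for year, title in records:
        yield (year, page_type(title)), 1

def reduce_phase(pairs):
    """Sum the emitted counts per key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

def proportions(counts):
    """Divide each count by the total number of edits in its year."""
    totals = defaultdict(int)
    for (year, _), n in counts.items():
        totals[year] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}

counts = reduce_phase(map_phase(REVISIONS))
shares = proportions(counts)
```

In a real cluster the map and reduce functions would run on many machines over shards of the dump; keeping them as pure functions over key/value pairs is what makes that split possible.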

Page 15: CHI2007 talk on Conflicts in Wikipedia

Types of work

• Direct work: immediately consumable (article pages)
• Indirect work: coordination, conflict (talk, user, procedure pages)
• Maintenance work: reverts, vandalism

Page 16: CHI2007 talk on Conflicts in Wikipedia

Less direct work

• Decrease in proportion of edits to article page

[Figure: proportion of edits to article pages by year, 2001 to 2006; y-axis: edit proportion, 0.5 to 1; callout: 70%]

Page 17: CHI2007 talk on Conflicts in Wikipedia

More indirect work

• Increase in proportion of edits to user talk

[Figure: edit proportion by year, 2001 to 2006; y-axis: edit proportion, 0 to 0.2; callout: 8%]

Page 18: CHI2007 talk on Conflicts in Wikipedia

More indirect work

• Increase in proportion of edits to user talk

• Increase in proportion of edits to procedure

[Figure: edit proportion by year, 2001 to 2006; y-axis: edit proportion, 0 to 0.2; callout: 11%]

Page 19: CHI2007 talk on Conflicts in Wikipedia

More maintenance work

• Increase in proportion of edits that are reverts

[Figure: edit proportion by year, 2001 to 2006; y-axis: edit proportion, 0 to 0.2; callout: 7%]

Page 20: CHI2007 talk on Conflicts in Wikipedia

More wasted work

• Increase in proportion of edits that are reverts

• Increase in proportion of edits reverting vandalism

[Figure: edit proportion by year, 2001 to 2005; y-axis: edit proportion, 0 to 0.03; callout: 1-2%]
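The slides do not say how reverts were identified. One common approach, assumed here purely for illustration, is to treat a revision as a revert when its text is byte-for-byte identical to an earlier, non-adjacent revision of the same page. The function name and the toy history below are hypothetical.

```python
import hashlib

def find_reverts(revision_texts):
    """Return indices of revisions that restore an earlier page state.

    Assumption: a revert is a revision whose text hash equals that of an
    earlier revision, with at least one different revision in between.
    """
    last_seen = {}  # digest -> index of the latest revision with that text
    reverts = []
    for i, text in enumerate(revision_texts):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in last_seen and last_seen[digest] < i - 1:
            reverts.append(i)
        last_seen[digest] = i
    return reverts

# Toy page history (hypothetical): revisions 2 and 5 restore earlier text.
history = ["A", "A B", "A", "A C", "VANDAL", "A C"]
reverts = find_reverts(history)
revert_share = len(reverts) / len(history)
```

Hashing keeps the comparison cheap: each revision is reduced to a fixed-size digest, so a page history can be scanned in one pass without comparing full texts pairwise.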

Page 21: CHI2007 talk on Conflicts in Wikipedia

Global level

• Conflict and coordination costs are growing
  – Less direct work (articles)
  + More indirect work (article talk, user, procedure)
  + More maintenance work (reverts, vandalism)

[Figure: percentage of total edits by year, 2001 to 2006; y-axis: 60% to 100%; series: Article, User, Article Talk, User Talk, Other, Maintenance]

Page 22: CHI2007 talk on Conflicts in Wikipedia

Characterizing coordination and conflict

Page 23: CHI2007 talk on Conflicts in Wikipedia

Conflict at the article level

• What defines conflict in articles?
• Build a characterization model of article conflict
  – Identify page features and metrics associated with conflict
  – Automatically identify high-conflict articles

Page 24: CHI2007 talk on Conflicts in Wikipedia

Page metrics

• Chose metrics for identifying conflict in articles
  – Easily computable, scalable

Metric type                       Page type
Revisions (#)                     Article, talk, article/talk
Page length                       Article, talk, article/talk
Unique editors                    Article, talk, article/talk
Unique editors / revisions        Article, talk
Links from other articles         Article, talk
Links to other articles           Article, talk
Anonymous edits (#, %)            Article, talk
Administrator edits (#, %)        Article, talk
Minor edits (#, %)                Article, talk
Reverts (#, by unique editors)    Article

Page 25: CHI2007 talk on Conflicts in Wikipedia

Defining conflict

• Operational definition for conflict
• Revisions tagged controversial
• Conflict revision count

Page 26: CHI2007 talk on Conflicts in Wikipedia

Machine learning

• Predict conflict from page metrics
  – Training set of “controversial” pages
  – Support vector machine regression predicting # controversial revisions (SMOreg; Smola & Schölkopf, 1998)
• Not just conflict/no conflict, but how much conflict

Page 27: CHI2007 talk on Conflicts in Wikipedia

Performance: Cross-validation

• 5x cross-validation, R² = 0.897

[Figure: scatter plot; x-axis: predicted controversial revisions, 0 to 10000; y-axis: actual controversial revisions, 0 to 10000]
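To make the evaluation procedure concrete, here is a minimal sketch of k-fold cross-validation scored with R². It deliberately swaps in a one-feature least-squares line for the SVM regression (SMOreg) used in the talk, so only the validation loop is illustrated, not the model. The data are invented, and pooling held-out predictions before computing R² is one convention among several.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def cross_validated_r2(xs, ys, k=5):
    """Predict every point from a model trained on the other k-1 folds."""
    predictions = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in test:
            predictions[i] = slope * xs[i] + intercept
    return r_squared(ys, predictions)

# Hypothetical data: talk-page revisions vs. controversial revisions.
talk_revisions = list(range(1, 21))
controversial = [3 * x + (1 if x % 2 else -1) for x in talk_revisions]
score = cross_validated_r2(talk_revisions, controversial, k=5)
```

Because every prediction comes from a model that never saw that article, the score estimates how well the model generalizes rather than how well it memorizes the training set.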

Page 29: CHI2007 talk on Conflicts in Wikipedia

Determinants of conflict

Highly weighted metrics of conflict model:

1. —Revisions (talk)
2. —Minor edits (talk)
3. ˜Unique editors (talk)
4. —Revisions (article)
5. ˜Unique editors (article)
6. —Anonymous edits (talk)
7. ˜Anonymous edits (article)

Page 30: CHI2007 talk on Conflicts in Wikipedia

Identifying untagged articles

• Detect conflicts for unlabeled articles
  – Majority of articles have never been conflict tagged
• Testing model generalization
  – Applied model to untagged articles
  – Sample rated by expert Wikipedians
• Significant positive correlation with predicted scores
  – By rank correlation, p < 0.013 (Spearman’s rho)
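The rank correlation named above can be computed as the Pearson correlation of the two rank vectors, with tied values sharing an average rank. The sketch below shows only the statistic, not the significance test, and the predicted scores and expert ratings are made-up numbers.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        shared = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical values: model scores and expert conflict ratings (1 to 5).
predicted_scores = [120, 45, 300, 10, 80]
expert_ratings = [4, 2, 5, 1, 2]
rho = spearman_rho(predicted_scores, expert_ratings)
```

Working on ranks rather than raw values suits this comparison: predicted revision counts and expert ratings live on different scales, and only their ordering needs to agree.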

Page 31: CHI2007 talk on Conflicts in Wikipedia

Characterizing coordination and conflict

Page 32: CHI2007 talk on Conflicts in Wikipedia

Conflict at the user level

• How can we identify conflict between users?

• Reverts as a proxy for user conflict
• Revert patterns between users
• Force directed layout to cluster users
  – Group similar viewpoints
  – Find conflicts between groups
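The slides do not give the force model behind the layout, so the following is only a toy sketch under stated assumptions: users who reverted each other repel with a force that falls off with distance, every other pair is pulled together like a spring, and positions are updated in small fixed steps. The user names, revert pairs, starting coordinates, and constants are all hypothetical.

```python
import math

def layout(users, reverts, positions, steps=400, step_size=0.05,
           spring=1.0, repulsion=1.0):
    """Toy force-directed layout: reverting pairs repel, others attract."""
    pos = {u: list(positions[u]) for u in users}
    reverting = {frozenset(pair) for pair in reverts}
    for _ in range(steps):
        forces = {u: [0.0, 0.0] for u in users}
        for i, u in enumerate(users):
            for v in users[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                if frozenset((u, v)) in reverting:
                    # Push apart with magnitude repulsion / distance.
                    scale = repulsion / (dx * dx + dy * dy + 1e-9)
                else:
                    # Pull together in proportion to distance.
                    scale = -spring
                forces[u][0] += scale * dx
                forces[u][1] += scale * dy
                forces[v][0] -= scale * dx
                forces[v][1] -= scale * dy
        for u in users:
            pos[u][0] += step_size * forces[u][0]
            pos[u][1] += step_size * forces[u][1]
    return pos

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical example: two camps whose members only revert the other camp.
users = ["a1", "a2", "b1", "b2"]
reverts = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")]
start = {"a1": (0.0, 0.0), "a2": (0.0, 2.0),
         "b1": (1.0, 0.5), "b2": (1.0, 2.5)}
final = layout(users, reverts, start)
```

After the simulation, users who never reverted each other end up close together while the two camps drift apart, which is the visual cue used to read off opinion groups. Treating every non-reverting pair as attracted is a simplification; with many users who never interacted, a real layout would need a weaker or more selective attraction.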

Page 33: CHI2007 talk on Conflicts in Wikipedia

Dokdo/Takeshima opinion groups

[Figure: user clusters labeled Group A, Group B, Group C, Group D]

Page 34: CHI2007 talk on Conflicts in Wikipedia

Terri Schiavo

[Figure: user clusters labeled Mediators, Sympathetic to parents, Sympathetic to husband, Anonymous (vandals/spammers)]

Page 35: CHI2007 talk on Conflicts in Wikipedia

Summary: Characterizing Wikipedia

• Coordination costs and conflict are increasing
• Global-level: Trend identification
  – Decrease in direct article work
  – Increase in indirect coordination work
  – Increase in maintenance work
• Article-level: Prediction using Machine learning
  – Identify characteristics of article conflict
  – Detect conflict-heavy articles needing extra attention
• User-level: User Conflict Visualization
  – Make sense of user conflicts and identify shared viewpoints

Page 36: CHI2007 talk on Conflicts in Wikipedia

Future Work

• Applied to many domains
  – Corporate memory (Socialtext)
  – Intelligence gathering (Intellipedia)
  – Scholarly research (Scholarpedia)
  – Collaborative problem solving (Lostpedia)
• Application: Social Dashboard
  – Identify high conflict articles
  – Surface editing patterns to readers
  – Route attention to articles that need it most

Page 37: CHI2007 talk on Conflicts in Wikipedia

Future work

Page 38: CHI2007 talk on Conflicts in Wikipedia

He Says, She Says: Conflict and Coordination in Wikipedia

Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed Chi

UCLA Augmented Social Cognition Group

Palo Alto Research Center

Thank you!