Post on 02-Aug-2020

Wikipedia Article Curation: Understanding Quality, Recommending Tasks

Morten Warncke-Wang en:User:Nettrom / nettrom@twitter

2014-08-20

Recommending Tasks

Step 1: Interest profiling

Step 2: Find similar articles

Step 3: Filtering

Want: Tasks

Photo: "Gold Pan" by Nate Cull – CC BY

Step 4: Presentation

MOAR INFORMATION!

What information?

» Idea: Show viewership & quality
» Contributors work on popular, low-quality things first?
» Needs data:
  • Article viewership
  • Article quality
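One way to picture the "popular, low-quality first" idea is as a sort over candidate articles. The sketch below is an illustration only: the `Candidate` type, the `prioritise` function and the sample numbers are hypothetical, and the rule used (lowest quality class first, then most viewed) is just one possible way to combine the two signals, not something the slides specify.

```python
from dataclasses import dataclass

# en-WP assessment classes, ordered from lowest to highest quality.
QUALITY_ORDER = ["Stub", "Start", "C", "B", "GA", "A", "FA"]


@dataclass
class Candidate:
    """Hypothetical record for one recommended article."""
    title: str
    daily_views: float
    quality: str  # one of QUALITY_ORDER


def prioritise(candidates):
    """Sort so that low-quality articles come first; ties go to the most viewed."""
    return sorted(
        candidates,
        key=lambda c: (QUALITY_ORDER.index(c.quality), -c.daily_views),
    )


# Made-up sample data, purely for illustration.
sample = [
    Candidate("Alpha", 100, "Stub"),
    Candidate("Beta", 5000, "Stub"),
    Candidate("Gamma", 90000, "FA"),
]
ordered_titles = [c.title for c in prioritise(sample)]
print(ordered_titles)
```

A weighted score (for example views multiplied by a quality gap) would be an equally plausible choice; the lexicographic sort is used here only because it is easy to read.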

Viewership

» Not readily available to contributors
» Readily available to us:
  • Wikimedia Foundation data dumps
  • stats.grok.se

Quality

» More easily available
  • Wikipedia assessments on the talk page
  • Contributor’s own judgement
» Assessment might be lagging or absent
» Contributor judgement requires experience

Problem: Up-to-date Quality

Twist: Actionable Quality

Solution: Machine Learning?

ML Issue: Feature Selection

» Typical features are difficult to change:
  • Editor diversity
  • Meta-features

Photo: Oregon Dept of Transportation – CC BY

Our Approach: Actionable features

» Use features editors can act upon
» Originally 5 main features:
  • Amount of article content
  • Number of citations
  • Number of images
  • Number of wikilinks
  • Number of article sections
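To make the five features concrete, here is a rough sketch of how they could be counted from raw wikitext. It is an assumption-laden illustration, not the extraction used in the talk: the function name `actionable_features` is hypothetical, content is measured simply as character count, and the regular expressions only approximate wikitext syntax (templates, galleries and other citation styles are ignored).

```python
import re


def actionable_features(wikitext: str) -> dict:
    """Approximate the five actionable features with simple pattern counts."""
    return {
        # Amount of content, here simply the length of the wikitext in characters.
        "content_length": len(wikitext),
        # Opening <ref> or <ref name=...> tags; </ref> and <references /> do not match.
        "citations": len(re.findall(r"<ref[\s>]", wikitext)),
        # Embedded files written as [[File:...]] or [[Image:...]].
        "images": len(re.findall(r"\[\[(?:File|Image):", wikitext)),
        # Internal links, excluding file, image and category links.
        "wikilinks": len(re.findall(r"\[\[(?!(?:File|Image|Category):)", wikitext)),
        # Section headings such as "== History ==" or "=== Early years ===".
        "sections": len(
            re.findall(r"^={2,}[^=\n].*?={2,}[ \t]*$", wikitext, flags=re.MULTILINE)
        ),
    }


# Made-up sample article text.
sample = (
    "'''Example''' is a [[thing]] in [[Norway|a country]].<ref>Source A</ref>\n"
    "== History ==\n"
    "[[File:Example.jpg|thumb|A picture with a [[caption link]]]]\n"
    'More text.<ref name="b">Source B</ref>\n'
    "=== Early years ===\n"
    "Text.\n"
    "<references />\n"
)
features = actionable_features(sample)
print(features)
```

Each of these counts corresponds to something an editor can change directly, which is the point of calling them actionable.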

Does It Work?

Seven classes, difficult problem

» Seven assessment classes on en-WP:
  • Featured Article
  • A-class
  • Good Article
  • B-class
  • C-class
  • Start-class
  • Stub-class
» Unclear boundaries between them

Yardstick: Random guesses

Random: 14.3%
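The 14.3% yardstick matches what uniform random guessing over seven classes would give, assuming every class is equally likely to be guessed and the evaluation set is balanced across classes (the slides do not state the evaluation setup, so this is an assumption). A minimal check of the arithmetic, with `random_baseline` as a hypothetical helper name:

```python
# The seven en-WP assessment classes.
ASSESSMENT_CLASSES = ["FA", "A", "GA", "B", "C", "Start", "Stub"]


def random_baseline(n_classes: int) -> float:
    """Expected accuracy of uniform random guessing over n_classes classes."""
    return 1 / n_classes


baseline = random_baseline(len(ASSESSMENT_CLASSES))
print(f"{baseline:.1%}")  # prints 14.3%
```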

Actionable Model Performance

Random: 14.3%
Actionable model: 42.5%

Often off by one class

Random: 14.3%
Actionable model: 42.5%
Actionable model, off-by-one: 76.9%
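The difference between the two model figures can be sketched as two accuracy measures over an ordered class scale. This assumes the off-by-one number counts a prediction as correct when it is exact or at most one class away, which the slides do not spell out. The function names and the toy labels below are hypothetical and are not the talk's data.

```python
# Assessment classes ordered from lowest to highest quality.
CLASSES = ["Stub", "Start", "C", "B", "GA", "A", "FA"]
RANK = {name: position for position, name in enumerate(CLASSES)}


def exact_accuracy(actual, predicted):
    """Share of predictions that hit the assessed class exactly."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def within_one_accuracy(actual, predicted):
    """Share of predictions that are exact or one class away on the ordered scale."""
    hits = sum(abs(RANK[a] - RANK[p]) <= 1 for a, p in zip(actual, predicted))
    return hits / len(actual)


# Made-up labels, purely for illustration.
actual = ["Stub", "Start", "C", "B", "FA"]
predicted = ["Stub", "C", "C", "FA", "A"]
print(exact_accuracy(actual, predicted))       # 0.4
print(within_one_accuracy(actual, predicted))  # 0.8
```

With unclear boundaries between neighbouring classes, the second measure is the more forgiving one, which is consistent with the large gap between the two reported figures.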

Situation Report

» Few features, a lot of information
» Missing information-quality aspects

Show Me The Information!

MOAR BETTER!

Improved Information

» Viewership: numbers?
» Quality: assessments and predictions?
» Turn quality features into tasks?
» Make it sortable?
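Turning quality features into tasks could look something like the sketch below: compare each actionable feature against a target and emit a task for every shortfall. Everything here is hypothetical, including the target values, the task wording and the `suggest_tasks` function; the slides do not say how the mapping is done.

```python
# Hypothetical targets per feature; real thresholds would have to come from data.
HYPOTHETICAL_TARGETS = {
    "content_length": 2000,
    "citations": 5,
    "images": 1,
    "wikilinks": 10,
    "sections": 3,
}

# Hypothetical task wording shown to the contributor.
TASK_TEXT = {
    "content_length": "Expand the article text",
    "citations": "Add citations",
    "images": "Add an image",
    "wikilinks": "Add wikilinks",
    "sections": "Split the article into sections",
}


def suggest_tasks(features: dict) -> list:
    """Return one task for each feature that falls below its target."""
    return [
        TASK_TEXT[name]
        for name, target in HYPOTHETICAL_TARGETS.items()
        if features.get(name, 0) < target
    ]


# Made-up feature counts for one article.
tasks = suggest_tasks(
    {"content_length": 3500, "citations": 0, "images": 2, "wikilinks": 3, "sections": 4}
)
print(tasks)
```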

Quality: Low, Assessed class: Unassessed, Predicted class: Stub

In summary…

» SuggestBot: recommending tasks to contributors
» Quality:
  • Actionable features
  • Predicting article quality
  • Suggesting improvement tasks

Acknowledgements

» NSF grants IIS 08-08692, 09-68483, 08-45351
» WMF and Wikimedia Deutschland
» GroupLens Research

Questions?

Wikipedia: User:Nettrom
Email: morten@cs.umn.edu
Twitter: @nettrom
Web: http://www-users.cs.umn.edu/~morten/
GroupLens: http://www.grouplens.org/