Improving DBpedia (one microtask at a time)
TRANSCRIPT
Elena Simperl
University of Southampton
Google, San Francisco, 21 April 2015
DBpedia
| Class | Instances |
| --- | --- |
| Resource (overall) | 4,233,000 |
| Place | 735,000 |
| Person | 1,450,000 |
| Work | 411,000 |
| Species | 251,000 |
| Organisation | 241,000 |

4.58M things
Crowds or no crowds?
• Study different ways to crowdsource entity typing using paid microtasks.
• Three workflows
– Free associations
– Validating the machine
– Exploring the DBpedia ontology
What to crowdsource
• Entity typing (free associations)
(Figure: an entity E is mapped to a class C.)
What to crowdsource (2)
• Entity typing (from a list of suggestions)
(Figure: entity E with suggested classes City, SportsTeam, Municipality, and PopulatedPlace, one of which is chosen as class C.)
How to crowdsource: no suggestions
Workflow
1. Ask the crowd to suggest classes
2. Take the top k suggestions
3. Ask the crowd to vote for the best match

Pros/cons
+ No biases
+ No pre-processing
– Vocabulary convergence
– Time and costs
– The more classifications, the better
– Two steps
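The two-step workflow above can be sketched as a simple aggregation over crowd answers. This is a minimal illustration, not the study's implementation; the function names are made up, and a real deployment would also handle spam and worker agreement:

```python
from collections import Counter

def top_k_classes(suggestions, k=3):
    """Step 1: normalise free-text class suggestions and keep the k most frequent."""
    normalised = [s.strip().lower() for s in suggestions]
    return [cls for cls, _ in Counter(normalised).most_common(k)]

def best_match(votes):
    """Step 2: majority vote among the shortlisted classes."""
    return Counter(votes).most_common(1)[0][0]

# An entity whose crowd suggestions converge on "city":
shortlist = top_k_classes(["City", "city", "Town", "city", "Town", "Municipality"], k=2)
winner = best_match(["city", "town", "city"])
```

The normalisation step hints at the vocabulary-convergence problem listed as a con: free associations only aggregate well once spelling and casing variants are merged.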
How to crowdsource: with suggestions
Two options
• Generate a shortlist
– Automatically
• Show all available options
– As a tree
Pros/cons
+ Focused, cheap, fast
– Too many classes (685!); see [Miller, 1956]
– The suggestions may not include the right classes
– The automatic tool does not always perform well
– The crowd is not familiar with the classes; see [Rosch et al., 1976], [Tanaka & Taylor, 1991]
How to crowdsource: microtasks
How to crowdsource: microtasks (2)
Experiments: Data
E1: Baseline, 120 entities
• Classified entities in popular categories
• Test workflows, compare crowd and machine performance

E2: Unclassified entities, 120 entities
• Test the three workflows on data that cannot be classified automatically

E3: Unclassified entities, optimized, 120 entities
• Fewer judgements
• Lower level of tool support
Experiments: Methods
• Adjusted precision metric to take into account broader and narrower matches, as well as synonyms
• Gold standard (for E2 and E3)
– Two annotators, Cohen's kappa of 0.7
– Conflicts resolved via a small set of rules and discussion
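One way to realise such an adjusted precision is to give full credit for exact matches and synonyms and partial credit for broader or narrower classes. A minimal sketch; the match sets and the 0.5 weight are illustrative assumptions, not taken from the paper:

```python
def adjusted_precision(answers, gold, broader, narrower, synonyms, partial=0.5):
    """Precision with full credit for exact/synonym matches and partial
    credit for broader/narrower classes (the 0.5 weight is an assumption)."""
    score = 0.0
    for entity, predicted in answers.items():
        target = gold[entity]
        if predicted == target or predicted in synonyms.get(target, set()):
            score += 1.0
        elif predicted in broader.get(target, set()) | narrower.get(target, set()):
            score += partial
    return score / len(answers)
```

For example, answering PopulatedPlace for an entity whose gold class is City would earn partial rather than zero credit.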
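For reference, the inter-annotator agreement reported above (Cohen's kappa of 0.7) can be computed for two annotators as follows; this is a generic sketch, not tied to the study's data:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

A kappa of 0.7 is conventionally read as substantial agreement, which is why conflicts could be resolved with a small set of rules and discussion.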
Overall results
• Shortlists are easy & fast
• Freedom comes with a price
• Working at the basic level of abstraction achieves greatest precision
– Even when there is too much choice
Other observations
• Unclassified entities might be unclassifiable
– Different entity summary
– Free-text or explorative workflow
• Popular classes are not enough
– Alternative approach to browse the taxonomy
• The basic level of abstraction in DBpedia is user-friendly
– But when given the freedom to choose, users suggest more specific classes
– Domain-specific vocabulary is not welcome
Conclusions
• In knowledge engineering, microtask crowdsourcing has focused on improving the results of automatic algorithms
• We know too little about those cases in which algorithms fail
• No optimal workflow in sight
• The DBpedia ontology needs revision
Using microtasks to crowdsource DBpedia entity classification: a study in workflow design
E. Simperl, Q. Bu, Y. Li. Submitted to SWJ, 2015.
Email: [email protected]
Twitter: @esimperl