harnessing manpower for creating semantics (doctoral dissertation) jakub Šimko [email protected]...

Harnessing manpower for creating semantics

(doctoral dissertation)

Jakub Šimko

Institute of Informatics and Software Engineering, Faculty of Informatics and Information Technologies,

Slovak University of Technology in Bratislava

Supervised by: prof. Mária Bieliková

July 4th, 2013

Games with a purpose (GWAP) for semantics acquisition

Games with a purpose

Cheap (once they are created)
Difficult to create

[Quinn & Bederson. Human computation: a survey and taxonomy of a growing field. CHI’11, 2011]

ESP Game: image metadata acquisition

What is in the image?

Player 1: water, sky, bridge
Player 2: Mostar, night, river, bridge, Bosnia

The players must blindly match

Banned words: blue, towers

[Von Ahn & Dabbish: Designing games with a purpose. Commun. ACM, 2008.]
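The blind matching rule can be sketched as below. This is a simplified illustration rather than the ESP Game's implementation: the function name `first_match` and the choice to return Player 1's earliest matching word are assumptions, and the timing of guesses is ignored.

```python
def first_match(words_1, words_2, banned=()):
    """Return the first word of player 1 that player 2 also typed and that is not banned."""
    banned_set = {word.lower() for word in banned}
    seen_by_2 = {word.lower() for word in words_2}
    for word in words_1:
        candidate = word.lower()
        if candidate in seen_by_2 and candidate not in banned_set:
            return candidate
    return None  # no agreement yet, the image yields no tag


# Words taken from the slide; the banned words come from earlier matches.
player_1 = ["water", "sky", "bridge"]
player_2 = ["Mostar", "night", "river", "bridge", "Bosnia"]
match = first_match(player_1, player_2, banned=["blue", "towers"])
```

Because neither player sees the other's input, an agreed word is likely to describe the image, and banning already collected words pushes players toward new tags.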

Motivation
Open issues in semantics acquisition
◦ Modelling of specific domains
◦ Personal multimedia metadata acquisition
◦ Metadata upkeep
Games with a purpose (GWAPs): design issues
◦ In general: no design methodology (young problem area)
◦ Cold start problems
◦ Quality management, effectiveness of work allocation

Thesis Goals

1. Create new, GWAP-based approaches to semantics creation, particularly for specific domains

2. Bring in generally applicable improvements to GWAP design, focusing on selected problems

Work overview
State of the art:
◦ GWAP taxonomy and design space
GWAPs we created:
◦ Little Search Game: term network acquisition
◦ PexAce: (personal) imagery tag acquisition
◦ CityLights: validation of music metadata
General GWAP design improvements:
◦ Helper artifacts: cold start problem reduction
◦ Player competences: improving GWAP output quality

Our taxonomy of GWAPs

GWAP design
A relatively new area (<10 years)
No holistic design methodology exists
◦ GWAPs are created ad hoc
Few works aimed at particular design issues
◦ [Von Ahn, 2008] Player agreement schemes
◦ [Chiou, 2011] Suggested considering player skills in GWAPs
Our contribution: GWAP design dimensions
◦ following the idea of design lenses [Schell, 2008]

[Von Ahn & Dabbish: Designing games with a purpose. Commun. ACM, 2008.]
[Chiou & Hsu. Capability-aligned matching: improving quality of games with a purpose. AAMAS ’11]
[J. Schell. The Art of Game Design: A Book of Lenses. Elsevier/Morgan Kaufmann, 2008.]

Our GWAP design dimensions

Task distribution
◦ Player capability driven
◦ Data (ontology) driven
◦ Task-value driven
◦ Greedy
◦ Random

Anti-cheating measures
◦ Restrictive rules
◦ Mutual player supervision
◦ Anomalous behavior detection
◦ A posteriori cheating detection

Validation of player output
◦ Offline player mutual agreement
◦ Online player mutual agreement
◦ Bootstrapping
◦ Automated exact
◦ Automatic approximative
◦ Helper artifacts

Task difficulty
◦ Equally complex tasks
◦ Gradually complex tasks

Purpose encapsulation
◦ High
◦ Low

Player challenges
◦ Social experience
◦ Self-challenge
◦ Competition
◦ Discovery

Existing GWAPs in our design space

[Table: numbers of existing GWAPs per combination of design dimension values. Columns: player challenges (discovery, competition, self-challenge, social experience), each split by task distribution (greedy, task-value, data-driven, player capability). Rows: validation of player output (bootstrapping, automatic approximative, automatic exact, helper artifacts, online mutual agreement), each split by anti-cheating measure (restrictive rules, mutual supervision, anomaly detection, a posteriori, N/A).]

PexAce
Goal: acquire (personal) image tags
New artifact validation model
Quality management through player modelling

Šimko, J., Tvarožek, M., Bieliková, M.: Human Computation: Single-player Annotation Game for Image Metadata. International Journal of Human-Computer Studies [In press]

Šimko, J., Bieliková, M.: Games with a Purpose: User Generated Valid Metadata for Personal Archives. SMAP 2011 (IEEE CS Press)

Šimko, J., Bieliková, M.: Personal Image Tagging: a Game-based Approach. I-Semantics 2012 (ACM)

PexAce: acquisition of image metadata
Cards: an image-pair-seeking memory game
Players create image annotations to aid their memory

[Diagram labels: players, single-player game, untagged images, free text annotations, general domain tags, personal image tags]

PexAce: general domain deployment
(Standard) Corel 5K dataset: photos + tags + our tags
107 players, 814 games, 2 792 images
22 176 annotations, 5 723 tags
Gold standard comparison: 73% precision
A posteriori evaluation: 94% precision
Automated methods: ~70% *
◦ Limited set of tags

*[Duygulu et al. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. Springer-Verlag, 2002.]

PexAce for personal images
Personal image metadata: virtually impossible to get
Personal images instead of general images in PexAce
◦ Players like that more
◦ They provide specific annotations (metadata)
Experiments: 2 x 2-player groups, 50 images each
Correctness: 94%
◦ 44% specific tags

Persons (53%)

Events (21%)

Places (15%)

Other (11%)

„Benevolent“ artifact validation model

Original mutual player supervision
Less strict heuristics

Annotations decomposed to votes: $(p, t, i) \in P \times T \times I$
P: players, T: terms, I: images
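The decomposition of annotations into votes can be sketched as follows. This is a minimal illustration, not PexAce's actual implementation: the whitespace tokenizer, the `min_votes` threshold, and the function names `decompose` and `validate` are assumptions introduced here.

```python
from collections import defaultdict


def decompose(annotations):
    """Turn (player, image, free text) annotations into (player, term, image) votes."""
    votes = set()
    for player, image, text in annotations:
        for term in text.lower().split():  # assumed tokenizer: lowercase + whitespace split
            votes.add((player, term, image))
    return votes


def validate(votes, min_votes=2):
    """Accept a (term, image) pair once enough distinct players have voted for it."""
    supporters = defaultdict(set)
    for player, term, image in votes:
        supporters[(term, image)].add(player)
    return {pair for pair, players in supporters.items() if len(players) >= min_votes}


annotations = [
    ("p1", "img1", "old bridge"),
    ("p2", "img1", "Bridge river"),
    ("p3", "img2", "river"),
]
validated = validate(decompose(annotations))
```

Working on single terms rather than whole annotations means two players do not have to write identical text to agree, which is one way to read the "less strict" heuristics named above.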

Artifact validation and cold start problem: a general GWAP issue

„How can a result of a human intelligence task be automatically evaluated?“

GWAPs use:
◦ Approximative or exact automated evaluation (case dependent)
◦ Mutual player supervision

Threat to multiplayer validation schemes: COLD START
“The requirement is to have multiple players online at the same time, sometimes with a requirement that they cannot communicate.”

Keep the games single-player

Helper artifacts: a new artifact validation principle

Helper artifacts:
◦ Decouple scoring from task solving; instead, motivate players to solve tasks to help themselves progress in the game
◦ E.g. in PexAce, a player can still win the game without creating annotations
◦ Potential of general applicability (to any existing game)

Quality management in GWAPs: considering differences in player competences

1. Quantify player skills: player model (e.g. player’s task-solving expertise for each sub-domain)
2. Apply the model in
a) “post-processing”: solution filtering (e.g. vote weighting)
b) “pre-processing”: task assignment (e.g. match task subdomain to expertise areas)
3. Speed up the process and/or retrieve higher quality results
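The two ways of applying a player model can be sketched as below. This is an illustrative sketch under assumptions, not the method used in the dissertation: the competence scores, the default weight of 0.5, and the function names `weighted_vote` and `assign_task` are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(votes, competence, default=0.5):
    """Post-processing: choose the answer with the highest competence-weighted support."""
    support = defaultdict(float)
    for player, answer in votes:
        support[answer] += competence.get(player, default)
    return max(support, key=support.get)


def assign_task(task_subdomain, expertise):
    """Pre-processing: choose the player with the highest expertise in the task's sub-domain."""
    return max(expertise, key=lambda player: expertise[player].get(task_subdomain, 0.0))


# Hypothetical player model: one overall score and per-sub-domain scores.
competence = {"ann": 0.9, "bob": 0.3, "cid": 0.4}
votes = [("ann", "castle"), ("bob", "church"), ("cid", "church")]
winner = weighted_vote(votes, competence)

expertise = {"ann": {"images": 0.8, "music": 0.2}, "bob": {"music": 0.7}}
chosen = assign_task("music", expertise)
```

In this example a single highly competent player (0.9) outweighs two less competent ones (0.3 + 0.4), which plain majority voting would not allow.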

Measuring player competences: PexAce data
Usefulness (delivery of correct artifacts)
Consensus ratio (agreement with other players)
Correlation: 0.496

[Chart: consensus ratio and usefulness, axis from 0.4 to 1]
[Chart: weighting with usefulness and weighting with consensus, axis from 0.4 to 1]
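One possible way to compute the two competence measures from (player, term, image) votes is sketched below. The exact definitions are assumptions made for illustration: consensus ratio is taken as the share of a player's votes repeated by at least one other player, and usefulness as the share of a player's votes found in a set of validated tags.

```python
def player_pairs(player, votes):
    """Collect the (term, image) pairs a given player voted for."""
    return {(term, image) for p, term, image in votes if p == player}


def consensus_ratio(player, votes):
    """Share of the player's votes that some other player also cast (assumed definition)."""
    mine = player_pairs(player, votes)
    others = {(term, image) for p, term, image in votes if p != player}
    return len(mine & others) / len(mine) if mine else 0.0


def usefulness(player, votes, correct):
    """Share of the player's votes that are among the correct artifacts (assumed definition)."""
    mine = player_pairs(player, votes)
    return len(mine & correct) / len(mine) if mine else 0.0


votes = {
    ("p1", "bridge", "img1"), ("p1", "old", "img1"),
    ("p2", "bridge", "img1"), ("p2", "river", "img1"),
}
correct = {("bridge", "img1"), ("river", "img1")}
p1_scores = (consensus_ratio("p1", votes), usefulness("p1", votes, correct))
p2_scores = (consensus_ratio("p2", votes), usefulness("p2", votes, correct))
```

Consensus ratio needs no ground truth, so it can be computed during play, while usefulness requires validated artifacts; the moderate correlation reported above suggests the two are related but not interchangeable.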

Little Search Game
Goal: acquire lightweight term network
statistically unsupported, yet valid term relationships
specific domain use

Šimko, J., Tvarožek, M., Bieliková, M.: Semantics Discovery via Human Computation Games. International Journal on Semantic Web and Information Systems, 2011

Šimko, J., Tvarožek, M., Bieliková, M.: Little Search Game: Term Network Acquisition via a Human Computation Game. Hypertext 2011 (ACM)

Little Search Game (negative search game)

Search query: „Star –movie –war –death“

• Creation of lightweight term network
• Player’s task: reduce number of results with negative search

[Figure: term network with nodes war, army, ship, navy, marine, american, blue, sea, ocean, fish, deep; query terms star, movie, war, death]
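How negative search queries could be turned into a term network is sketched below. This is an assumption-laden illustration, not the game's actual pipeline: each query is assumed to come from a different player, and the `min_players` threshold and function names are hypothetical.

```python
from collections import Counter


def parse_query(query):
    """Split 'star -movie -war' into the task term and its negative terms."""
    # Normalize the en dash used on the slide to a plain hyphen.
    tokens = query.lower().replace("\u2013", "-").split()
    task_term = tokens[0]
    negatives = [t[1:] for t in tokens[1:] if t.startswith("-") and len(t) > 1]
    return task_term, negatives


def build_network(queries, min_players=2):
    """Keep (task term, negative term) edges suggested in at least min_players queries."""
    counts = Counter()
    for query in queries:
        task_term, negatives = parse_query(query)
        for negative in set(negatives):  # count each term once per query
            counts[(task_term, negative)] += 1
    return {edge: n for edge, n in counts.items() if n >= min_players}


queries = ["Star –movie –war –death", "star -movie -sky", "star -war -movie"]
network = build_network(queries)
```

A negative term only reduces the result count if it often co-occurs with the task term, so terms that many players pick independently are plausible neighbours in the network.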

LSG term network evaluation
A posteriori evaluation: 91% correctness
A potential to add term relationships to existing bases
◦ 59% of LSG relationships do not exist in the ConceptNet * corpus
◦ …including demanded non-taxonomic relationships

*[Liu & Singh. ConceptNet — A Practical Commonsense Reasoning Tool-Kit. BT Technology Journal 2004]

Hidden term relationships: hard for automated discovery (40% of LSG term network)

LSG modification: TermBlaster
(Harvesting relationships for software design domain)

Specific domain
No text typing

71% correct, 21% „hidden relationships“

CityLights
Goal: validate existing music tags
quality management through confidence expression

Dulačka, P., Šimko, J., Bieliková, M.: Validation of Music Metadata via Game with a Purpose. I-Semantics 2012 (ACM)

CityLights: music tag validation (a concept of validation question)

Validation question: “Which of these tag groups characterizes the music track you hear?”

1. Rockabilly, USA, 60ties
2. Seasonal, rich oldies, xmas
3. February 08 love, oldies, 60 musik

Tag support value:
+ increases when the player selects the group
- decreases when the player doesn’t select the group or rules out the tag

Wrong and correct tags bubble out
Positive and negative thresholds
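The support-value bookkeeping can be sketched as follows. This is a simplified illustration, not the CityLights implementation: the equal step size for all feedback, the threshold values, and the function names are assumptions.

```python
def answer_question(support, groups, selected, ruled_out=(), step=1.0):
    """Update tag support values after one answered validation question."""
    for index, group in enumerate(groups):
        # Tags in the selected group gain support, tags in the other groups lose it.
        delta = step if index == selected else -step
        for tag in group:
            support[tag] = support.get(tag, 0.0) + delta
    # Tags the player rules out individually lose support as well.
    for tag in ruled_out:
        support[tag] = support.get(tag, 0.0) - step
    return support


def bubble_out(support, positive=3.0, negative=-3.0):
    """Return the tags that crossed the positive (correct) or negative (wrong) threshold."""
    correct = {tag for tag, value in support.items() if value >= positive}
    wrong = {tag for tag, value in support.items() if value <= negative}
    return correct, wrong


groups = [
    ["rockabilly", "usa", "60ties"],
    ["seasonal", "rich oldies", "xmas"],
    ["february 08 love", "oldies", "60 musik"],
]
support = {}
for _ in range(3):  # three players give the same answer
    answer_question(support, groups, selected=0, ruled_out=["usa"])
correct, wrong = bubble_out(support)
```

Here "usa" stays undecided because being in the selected group and being ruled out cancel each other, while the remaining tags cross one of the two thresholds.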

CityLights: experiments

LastFM dataset
875 games, 4933 questions, 1492 tags
Feedback actions per tag:
◦ 17.75 implicit
◦ 5.29 explicit
Optimized parameter configuration
◦ 68% correctness

Betting mechanism: measuring competence through confidence

Betting mechanism within a GWAP
Through bet height, the player expresses confidence in their task solution

CityLights case: bet height aligns with impact on tag validity value

Helps with the cold start problem associated with user modeling
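One way bet height could scale the impact on tag validity is sketched below. The linear scaling, the `max_bet` of 10, and the function name are assumptions chosen for illustration; the slides only state that bet height aligns with impact.

```python
def bet_weighted_update(support, selected_group, bet, max_bet=10, step=1.0):
    """Raise the support of the selected tags in proportion to the player's bet."""
    if not 0 < bet <= max_bet:
        raise ValueError("bet must be in the range (0, max_bet]")
    delta = step * bet / max_bet  # assumed linear mapping from bet to impact
    for tag in selected_group:
        support[tag] = support.get(tag, 0.0) + delta
    return support


support = {}
bet_weighted_update(support, ["rockabilly", "usa"], bet=10)  # confident player
bet_weighted_update(support, ["rockabilly", "usa"], bet=5)   # less confident player
```

Since the bet is available from a player's first question, it gives a confidence signal before any long-term competence model has been built.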

Main contributions
Definition of the GWAP design space
GWAPs for semantics acquisition
◦ For specific domains (personal images, SW engineering)
◦ For otherwise hardly discoverable semantics (hidden relationships)
New GWAP design principles
◦ Helper artifacts for cold start reduction
◦ Metrics for long-term player competence modeling
◦ Betting mechanism for short-term player competence acquisition
◦ Metadata validation GWAP concept

Summary
GWAP taxonomy and design dimensions
◦ [survey paper prepared]

Little Search Game: lightweight term network acquisition
Hidden term relationships
◦ Hypertext 2011, ACM
◦ Int. J. on Semantic Web and Information Systems, 2011 (CC, IGI)

PexAce: personal image metadata acquisition
Helper artifacts
Competence measures
◦ SMAP 2011, IEEE
◦ I-Semantics 2012, ACM
◦ Int. J. of Human-Computer Studies, 2013 (CC, Elsevier)

CityLights: music metadata validation
Betting mechanics: player competence through confidence
◦ I-Semantics 2012b, ACM

Selected publications

Semantics Discovery via Human Computation Games. In: International Journal on Semantic Web and Information Systems. 2011

Human Computation: Single-player Annotation Game for Image Metadata. International Journal of Human-Computer Studies. 2012 [In press].

Validation of Music Metadata via Game with a Purpose. I-Semantics 2012 (ACM)

Games with a Purpose: User Generated Valid Metadata for Personal Archives. SMAP 2011 (IEEE CS)

Little Search Game: Term Network Acquisition via a Human Computation Game. Hypertext 2011 (ACM)

Personal Image Tagging: a Game-based Approach. I-Semantics 2012 (ACM)
