Page 1: Task and Workflow Design in Human Computation KSE 652 Social Computing System Design and Analysis Uichin Lee

Task and Workflow Design in Human Computation

KSE 652 Social Computing System Design and Analysis

Uichin Lee

TurKit: Human Computation Algorithms on Mechanical Turk

Greg Little, Lydia B. Chilton, Rob Miller, and Max Goldman

(MIT CSAIL) UIST 2010

Workflow in M-Turk

[Diagram: Requester posts HIT Groups to Mechanical Turk → HITs → Data Collected in CSV File → Data Exported for Use]

Workflow: Pros & Cons

• Easy to run simple, parallelized tasks.

• Not so easy to run tasks in which turkers improve on or validate each other’s work.

• TurKit to the rescue!

The TurKit Toolkit

• Arrows indicate the flow of information.

• Programmer writes 2 sets of source code:
  – HTML files for web servers
  – JavaScript executed by TurKit

• Output is retrieved via a JavaScript database.

[Diagram: Programmer writes *.html for the Web Server and *.js for TurKit; other components: Turkers, Mechanical Turk, JavaScript Database]

Crash-and-rerun programming model

• Observation: local computation is cheap, but external calls cost money

• Managing state over a long-running program is challenging
  – Examples: Computer restarts? Errors?

• Solution: store state in the database (just in case)

• If an error happens, just crash the program and re-run it by following the history in the DB
  – Throw a “crash” exception; the script is automatically re-run.

• New keyword “once”:
  – Removes non-determinism
  – No need to re-execute an expensive operation (when re-run)

• But why should we re-run???
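
The crash-and-rerun idea above can be sketched in a few lines. This is a minimal illustration, not TurKit's actual implementation: `createRuntime`, `CrashError`, the in-memory `Map` standing in for the database, and the fake HIT id are all assumptions made for the example.

```javascript
// Sketch only: an in-memory Map stands in for TurKit's persistent database.
class CrashError extends Error {}

function createRuntime(db = new Map()) {
  let nextStep = 0;

  // Run fn at most once across all re-runs; later runs replay the stored value.
  function once(fn) {
    const key = nextStep++;
    if (db.has(key)) return db.get(key);
    const value = fn();
    db.set(key, value);
    return value;
  }

  // Give up on this run; the caller is expected to re-run the script later.
  function crash() {
    throw new CrashError("result not ready; re-run later");
  }

  function run(script) {
    nextStep = 0; // every run replays the recorded history from the start
    try {
      return { done: true, value: script({ once, crash }) };
    } catch (err) {
      if (err instanceof CrashError) return { done: false, value: undefined };
      throw err;
    }
  }

  return { run, db };
}

// Usage: the "expensive" step runs once even though the script runs twice.
let expensiveCalls = 0;
let answerReady = false; // pretend a worker has not answered yet

function demoScript({ once, crash }) {
  const hitId = once(() => {
    expensiveCalls += 1;
    return "hit-" + Math.floor(Math.random() * 1e6); // non-deterministic
  });
  if (!answerReady) crash();
  return hitId;
}

const runtime = createRuntime();
const firstAttempt = runtime.run(demoScript);
answerReady = true;
const secondAttempt = runtime.run(demoScript);
```

Because the random id is recorded on the first run, the second run sees the same value, which is the sense in which `once` removes non-determinism.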

Example: quicksort
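
The code on this slide did not survive extraction. As a stand-in, here is a rough sketch of a quicksort whose comparisons are delegated to a worker. `humanQuicksort`, `askWorker`, and `pickIndex` are hypothetical names; in a crash-and-rerun setting the pivot choice would presumably be wrapped in `once` so that re-runs pick the same pivot.

```javascript
// Sketch only: askWorker(a, b) stands in for a paid comparison HIT and
// returns true when a should come before b.
function humanQuicksort(items, askWorker, pickIndex) {
  if (items.length <= 1) return items.slice();
  const rest = items.slice();
  const [pivot] = rest.splice(pickIndex(rest.length), 1);
  const before = [];
  const after = [];
  for (const item of rest) {
    (askWorker(item, pivot) ? before : after).push(item);
  }
  return [
    ...humanQuicksort(before, askWorker, pickIndex),
    pivot,
    ...humanQuicksort(after, askWorker, pickIndex),
  ];
}

// Simulated worker: shorter captions come first. A real worker costs money,
// so the number of comparisons is worth counting.
let comparisonCount = 0;
const simulatedWorker = (a, b) => {
  comparisonCount += 1;
  return a.length < b.length;
};
const middlePivot = (n) => Math.floor(n / 2); // deterministic for the demo

const sortedCaptions = humanQuicksort(
  ["a long caption", "tiny", "medium one"],
  simulatedWorker,
  middlePivot
);
```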

Parallelism

• First time the script runs, HITs A and C will be created

• For a given forked branch, if a task fails (e.g., HIT A), TurKit crashes only that forked branch, which is re-run later

• Synchronization w/ join()
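
A small sketch of how fork and join could behave under crash-and-rerun. The slide's figure is missing, so the branch layout (A then B in one branch, C in another) is an assumption, and `runOnce`, `waitForHit`, and `hitBoard` are hypothetical names rather than TurKit APIs.

```javascript
// Sketch only: a crash inside a fork abandons that branch, not the script.
class BranchCrash extends Error {}

function runOnce(script, hits) {
  let crashedBranches = 0;

  // Create the HIT if needed, then crash the branch until it is finished.
  const waitForHit = (name) => {
    hits.created.add(name);
    if (!hits.finished.has(name)) throw new BranchCrash(name);
  };

  const fork = (branch) => {
    try {
      branch();
    } catch (err) {
      if (!(err instanceof BranchCrash)) throw err;
      crashedBranches += 1; // remember it so join() can wait for it
    }
  };

  // join() lets the script continue only when every fork has completed.
  const join = () => {
    if (crashedBranches > 0) throw new BranchCrash("join");
  };

  try {
    script({ fork, join, waitForHit });
    return true;
  } catch (err) {
    if (err instanceof BranchCrash) return false;
    throw err;
  }
}

const hitBoard = { created: new Set(), finished: new Set() };
const forkedScript = ({ fork, join, waitForHit }) => {
  fork(() => { waitForHit("A"); waitForHit("B"); });
  fork(() => { waitForHit("C"); });
  join();
};

const firstRunDone = runOnce(forkedScript, hitBoard);
const createdAfterFirstRun = [...hitBoard.created].sort();
```

On the first run both branches get far enough to post their first HIT, so A and C exist even though neither branch has finished.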

MTurk Functions

• Prompt(message, # of people)
  – mturk.prompt("What is your favorite color?", 100)

• Voting(message, options)

• Sort(message, items)

[Figures: VOTE() and SORT()]
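
To make the voting step concrete, here is a sketch of how collected ballots might be reduced to a winner. `tallyVotes` is a hypothetical helper, not a TurKit function, and the tie-breaking rule (earlier option wins) is an assumption.

```javascript
// Sketch only: count ballots per option and return the most popular one.
function tallyVotes(options, ballots) {
  const counts = new Map(options.map((option) => [option, 0]));
  for (const ballot of ballots) {
    // Ballots for options that were never offered are ignored.
    if (counts.has(ballot)) counts.set(ballot, counts.get(ballot) + 1);
  }
  let winner = options[0];
  for (const option of options) {
    if (counts.get(option) > counts.get(winner)) winner = option;
  }
  return { winner, counts };
}

const colorVote = tallyVotes(["red", "blue"], ["blue", "red", "blue", "green"]);
```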

TurKit: Implementation

• TurKit: written in Java, using Rhino to interpret JavaScript code and E4X to handle XML results from MTurk

• IDE: Google App Engine (GAE)

[Figure: Online IDE]

Exploring Iterative and Parallel Human Computation Processes

Greg Little, Lydia B. Chilton, Max Goldman, Robert C. Miller

HCOMP 2010

HC Task Model

• Dimensions:
  – Dependent (iterative) or independent (parallel) tasks
  – Creation and decision tasks

• Task model examples
  – Creation tasks (creating new content): e.g., writing ideas, imagery solutions, etc.
  – Decision tasks (voting/rating): e.g., rating the quality of a description of an image

HC Task Model

• Combining tasks: iterative and parallel tasks

Iterative pattern: a sequence of creation tasks where the result of each task feeds into the next one, followed by a comparison task

Parallel pattern: a set of creation tasks executed in parallel, followed by a task of choosing the best
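
The two patterns can be contrasted in a short sketch. The worker functions are injected so the control flow is visible; `runIterative`, `runParallel`, and the simulated workers are hypothetical, and placing a comparison after every creation step in the iterative case is one possible reading of the pattern.

```javascript
// Iterative: each creation task starts from the current best result.
function runIterative(seed, rounds, improve, isBetter) {
  let best = seed;
  for (let i = 0; i < rounds; i++) {
    const candidate = improve(best);              // creation task
    if (isBetter(candidate, best)) best = candidate; // decision task
  }
  return best;
}

// Parallel: every creation task starts from the same seed; pick the best.
function runParallel(seed, count, create, isBetter) {
  const candidates = Array.from({ length: count }, () => create(seed));
  return candidates.reduce((best, c) => (isBetter(c, best) ? c : best));
}

// Simulated workers: "longer is better" is a stand-in for a human rating.
const longerIsBetter = (a, b) => a.length > b.length;
const appendWord = (text) => text + "x";
const independentDrafts = ["a", "abc", "ab"];
let draftIndex = 0;
const writeDraft = () => independentDrafts[draftIndex++];

const iterativeResult = runIterative("", 3, appendWord, longerIsBetter);
const parallelResult = runParallel("", 3, writeDraft, longerIsBetter);
```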

Experiment: Writing Image Description

• Iterative vs. parallel: each condition uses 6 creation tasks ($0.02), followed by rating tasks (1-10 scale, $0.01)

Experiment: Writing Image Description

• Turkers in the iterative condition gave better descriptions, while the parallel condition always shows an empty text area.

Experiment: Writing Image Description

• Average rating after n iterations
  – After six iterations: 7.9 (iterative) vs. 7.4 (parallel); t-test, t(29) = 2.1, p = 0.04

[Plot: average rating by iteration, iterative vs. parallel]

Experiment: Writing Image Description

• Length vs. rating: positive correlation

• The two outliers (circled) represent instances of text copied from the Internet (with superficial description)

[Plot: rating vs. length (characters)]

Experiment: Writing Image Description

• Work Quality:
  – 31% mainly append content at the end, and make only minor modifications (if any) to existing content;
  – 27% modify/expand existing content, but it is evident that they use the provided description as a basis;
  – 17% seem to ignore the provided description entirely and start over;
  – 13% mostly trim or remove content;
  – 11% make very small changes (adding a word, fixing a misspelling, etc.);
  – 1% copy-paste superficially related content found on the internet.

• Creating vs. improving: both take about the same time (avg. 211 seconds)

Experiment: Brainstorming

Experiment: Brainstorming

• Iterative work: higher average rating
  – Biased thinking: e.g., tech -> xxtech -> yytech

• Parallel work: diversity, higher deviation (rating)
  – No iteration for brainstorming

[Plots: rating and avg. rating by iteration, iterative vs. parallel]

Example: Blurry Text Recognition

Example: Blurry Text Recognition

• Iterative performs better than parallel

[Plot: accuracy by iteration]

Summary

• TurKit: a flexible programming tool for m-turk

• Various workflows can be designed; e.g., iterative, parallel, and hybrid

• Iterative performs better than parallel in several cases (e.g., image description, brainstorming, text recognition)

Turkalytics: Real-time Analytics for Human Computation

Paul Heymann and Hector Garcia-Molina

WWW'11

Basic Buyer human programming

• A human program generates forms, which are advertised through a marketplace.

• Workers look at posts, and then complete the forms for compensation.

Game Maker human programming

• The programmer writes a human program and a game.

• The game implements features to make it fun and difficult to cheat.

• The human program loads and dumps data from the game.

Human Processing programming

Human Processing programming

• Task description:
  – Input, output, web forms, human driver, other information
  – Human task instance

• Human drivers: interact with workers
  – Functions: initialization (forms, games), retrieving results
  – “Human Program” accesses workers via “human drivers”

• Recruiters: post task instances into the marketplaces (by working with marketplace drivers)
  – Marketplace driver provides an interface to marketplaces

[Figure labels: (description), (instance)]

Turkalytics

• Challenge: collecting reliable data about the workers and the tasks they perform

• Why?
  – If a task is not being completed, is it because no workers are seeing it? Is it because the task is currently being offered at too low a price?
  – How does the task completion time break down?
  – Do workers spend more time previewing tasks or doing them?
  – Do they take long breaks?
  – Which are the more “reliable” workers?

Interaction Model

• Search-Preview-Accept (SPA) model

Interaction Model

• Search-Continue-RapidAccept-Accept-Preview (SCRAP)
  – Continue: continue completing a task that was accepted but not submitted
  – RapidAccept: accept the next task in a HITGroup w/o previewing it

Turkalytics Data Models

Turkalytics Architecture

[Diagram: client-side JavaScript (ta.js) instances send log messages (JSON) via Ajax POST to the Log Server; log messages (JSON) then go from the Log Server to the Analysis Server]

Implementation: client-side Javascript

• Requester embeds a Turkalytics script (ta.js) into a HIT (when designing a HIT)
  – Monitoring: Detect relevant worker data and actions.
  – Sending: Log events by making image requests to the log server (ajax: POST)
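
The slide mentions both image requests and Ajax POST; the sketch below covers only the image-request variant, where an event is serialized into the query string of a URL. The endpoint, the `e` parameter name, and the event fields are assumptions for illustration, not the real ta.js wire format.

```javascript
// Sketch only: pack a log event into a URL that an <img> request can carry.
function buildBeaconUrl(logServerUrl, event) {
  const payload = encodeURIComponent(JSON.stringify(event));
  return `${logServerUrl}?e=${payload}`;
}

// What a log server might do on receipt: recover the JSON from the query.
function decodeBeaconUrl(url) {
  const query = url.slice(url.indexOf("?") + 1);
  return JSON.parse(new URLSearchParams(query).get("e"));
}

const sampleEvent = {
  sessionId: "s-1",           // hypothetical field names
  type: "submit",
  data: { answer: "blue sky" },
};
const beaconUrl = buildBeaconUrl("https://logs.example.org/log", sampleEvent);
const decodedEvent = decodeBeaconUrl(beaconUrl);

// In a browser the request itself could be fired with:
//   new Image().src = beaconUrl;
```

Image requests are a common way to send data to another domain without running into cross-origin restrictions, at the cost of URL length limits.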

Implementation: ta.js -- client-side JavaScript

• ta.js’s monitoring activities:
  – Client Information: Worker’s screen resolution? What plugins are supported? Can ta.js set cookies?
  – DOM Events: Over the course of a page view, the browser emits various events (e.g., load, submit, beforeunload, and unload events)
  – Activity: listens on a second-by-second basis for the mousemove, scroll and keydown events to determine if the worker is active or inactive.
  – Form Contents: examines forms on the page and their contents; logs initial form contents, incremental updates, and final state.
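
The second-by-second activity check can be sketched as bucketing events by whole second. `ActivityTracker` is a hypothetical class, not code from ta.js; timestamps are passed in explicitly so the logic can run outside a browser.

```javascript
// Sketch only: a second counts as "active" if any tracked event fell in it.
const ACTIVITY_EVENTS = new Set(["mousemove", "scroll", "keydown"]);

class ActivityTracker {
  constructor() {
    this.activeBuckets = new Set(); // whole seconds with at least one event
  }

  record(eventType, timestampMs) {
    if (!ACTIVITY_EVENTS.has(eventType)) return; // ignore other DOM events
    this.activeBuckets.add(Math.floor(timestampMs / 1000));
  }

  isActive(second) {
    return this.activeBuckets.has(second);
  }

  activeSeconds() {
    return this.activeBuckets.size;
  }
}

const tracker = new ActivityTracker();
tracker.record("mousemove", 100);
tracker.record("mousemove", 900);  // same second as the previous event
tracker.record("keydown", 2500);
tracker.record("click", 5000);     // not one of the tracked events

// In a browser the events could be wired up with:
//   for (const type of ACTIVITY_EVENTS)
//     document.addEventListener(type, (e) => tracker.record(e.type, Date.now()));
```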

Implementation: log/analysis

• Log Server:
  – Simple web app built on Google’s App Engine.
  – Receives logging events from clients running ta.js and saves them to a data store.
    • IP address, user agent, referer, etc.

• Analysis Server:
  – Periodically polls the log server to download any new events that have been received
  – Events are inserted into the DB, considering the following:
    • Time constraints: data availability to analysis server
    • Dependencies: if events are dependent on one another
    • Incomplete input: if all events are not received yet
    • Unknown input: what if unexpected input is received?
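
One way to handle the dependency, incomplete-input, and unknown-input concerns is to hold back events until what they depend on has arrived. This is a sketch under assumed event fields (`id`, `type`, `dependsOn`); `createEventStore` is hypothetical and says nothing about how the real analysis server is built.

```javascript
// Sketch only: insert events in dependency order, parking the rest.
function createEventStore(knownTypes) {
  const stored = new Map(); // id -> event, i.e. "inserted into the DB"
  const waiting = [];       // events whose dependency has not arrived yet
  const unknown = [];       // events with an unexpected type

  function tryInsert(event) {
    if (event.dependsOn !== undefined && !stored.has(event.dependsOn)) {
      return false;
    }
    stored.set(event.id, event);
    return true;
  }

  function ingest(event) {
    if (!knownTypes.has(event.type)) {
      unknown.push(event); // keep for inspection instead of dropping
      return;
    }
    if (!tryInsert(event)) {
      waiting.push(event);
      return;
    }
    // A newly stored event may unblock events that arrived earlier.
    let progressed = true;
    while (progressed) {
      progressed = false;
      for (let i = waiting.length - 1; i >= 0; i--) {
        if (tryInsert(waiting[i])) {
          waiting.splice(i, 1);
          progressed = true;
        }
      }
    }
  }

  return { ingest, stored, waiting, unknown };
}

const eventStore = createEventStore(new Set(["load", "submit"]));
eventStore.ingest({ id: "e2", type: "submit", dependsOn: "e1" }); // arrives early
eventStore.ingest({ id: "x1", type: "mystery" });                 // unexpected type
const waitingBeforeLoad = eventStore.waiting.length;
eventStore.ingest({ id: "e1", type: "load" });                    // unblocks e2
```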

Implementation: analysis

[Figure: example log message, annotated with: what type of data (event) is sent; actual data for a given type; detailed info about task; session ID]

Experiments

• Tasks:
  – Named Entity Recognition (NER): This task, posted in groups of 200 by a researcher in Natural Language Processing, asks workers to label words in a Wikipedia article if they correspond to people, organizations, locations, or demonyms. (2,000 HITs, 1 HIT Type, more than 500 workers.)
  – Turker Count (TC): This task, posted once a week by a professor of business at U.C. Berkeley, asks workers to push a button, and is designed just to gauge how many workers are present in the marketplace. (2 HITs, 1 HIT Type, more than 1,000 workers each.)
  – Create Diagram (CD): This task, posted by the authors, asked workers to draw diagrams for this paper based on hand-drawn sketches.

Experiments: origin of workers

• GeoLite City DB from MaxMind to geolocate all remote users by IP address

Experiments: worker characteristics

Experiments: states/actions

• RapidAccept is quite popular (Continue is rare)

Experiments: # previews

• Artificial recency for NER/CD (the tasks are repeatedly brought back near the top of the list): NER and CD exhibit a less severe drop than TC

[Plot annotation: Artificial Recency]

Experiments: activity vs. delay

• Average active and total seconds for each worker who completed the NER task (correlation 0.88)

Discussion

• Multi-tasking users? Activity vs. working time

• Privacy??
  – We can collect as much as we can..
  – How about Google Analytics? Any web pages that we visit can collect such information…

• False data injection?

• How can we better utilize the dataset?
  – Re-designing existing tasks, pricing, etc. (or mining user behavior?)