breaking bad - understanding the behavior of crowd workers in categorization microtasks

Breaking Bad - Understanding Behavior of Crowd Workers in Categorization Microtasks

Ujwal Gadiraju, Ricardo Kawase, Patrick Siehndel and Besnik Fetahu

METU NCC, 2nd September 2015

Outline

● Motivation

● Categorization Tasks

● Analysis & Results

● Conclusions

What is the problem?

● Increase in the number of new task requesters on AMT (1000 per month) [Difallah et al., WWW’15].

○ Not all task requesters are familiar with task-specific settings.

○ No tangible guidelines for task design:

■ task length

■ monetary incentive

■ task completion time

Worker Behavior in Categorization Tasks

● Categorization tasks are one of the most common types of crowdsourced tasks. [Gadiraju et al., A taxonomy of Microtasks on the Web, HT’14]

● Experimental Setup:

○ 9 tasks deployed on CrowdFlower

○ Task length: 20, 30, 40 units

○ Monetary reward: 1, 2, 3 USD cents

Task Design

● Clear instructions and help snippets.

● Workers have to select the most suitable category in each set (Set-1 to Set-5); each set consists of 10 different categories.

● Category options were manually tailored to avoid ambiguity.

● Set-1 was made compulsory, Set-2 through Set-5 were optional.

● Tasks were deployed non-concurrently, and the order of units was randomized within each task.

● Tasks designed to facilitate 100% accuracy in responses (with an aim to study worker behavior).

Data Collection

● Responses gathered from 100 workers in each task ; 900 workers in total.

● We collected 27,000 unit judgments in total. In 88% of the cases, workers provided responses for all sets (incl. optional).

● Average Task Completion Time

○ Tasks with length of 20 units: 11.3 mins

○ Tasks with length of 30 units: 16.4 mins

○ Tasks with length of 40 units: 18.6 mins
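The reported totals can be cross-checked with a short calculation. This is only a sketch: it assumes the 9 tasks form the full 3 x 3 grid of task lengths and reward levels, and that every worker judges every unit of the task they take. Neither assumption is stated explicitly on the slides, and the variable names are illustrative.

```python
from itertools import product

TASK_LENGTHS = (20, 30, 40)      # units per task, from the experimental setup
REWARDS_USD_CENTS = (1, 2, 3)    # reward levels, from the experimental setup
WORKERS_PER_TASK = 100           # from the data collection figures

# Assumption: the 9 tasks are every combination of length and reward.
tasks = list(product(TASK_LENGTHS, REWARDS_USD_CENTS))

# Assumption: each worker judges every unit of the task they take.
total_workers = len(tasks) * WORKERS_PER_TASK
total_judgments = sum(length * WORKERS_PER_TASK for length, _reward in tasks)

print(len(tasks), total_workers, total_judgments)  # 9 900 27000
```

Under these assumptions the numbers match the 900 workers and 27,000 unit judgments reported above.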

Definitions

● Tipping Point : The first point (unit-index) at which a worker provides an unacceptable response after having provided at least one acceptable response. [Gadiraju et al., CHI’2015]

● Beaver Workers : Workers who exert additional effort by answering optional questions in order to help task requesters.
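The tipping point definition can be sketched as a small function. The representation is an assumption made for illustration: a worker's responses are given as a list of booleans in unit order (True for acceptable), and the returned index is 1-based. The function name is hypothetical.

```python
from typing import Optional, Sequence


def tipping_point(acceptable: Sequence[bool]) -> Optional[int]:
    """Return the 1-based index of the first unacceptable response that
    follows at least one acceptable response, or None if there is none."""
    seen_acceptable = False
    for index, ok in enumerate(acceptable, start=1):
        if ok:
            seen_acceptable = True
        elif seen_acceptable:
            return index
    return None


print(tipping_point([False, True, True, False, True]))  # 4
print(tipping_point([False, False]))                    # None
```

A worker who never gives an acceptable response has no tipping point under this reading, because the definition requires at least one acceptable response first.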

Consistency of Units within Tasks

● Avg. accuracy of around 90% with little Std. Dev.

● We tolerate 10% incorrect responses from workers, owing to possible drifts in attention spans / boredom.

● Bad Workers : Workers who answer 10% or more of the units within a categorization task incorrectly.

● Poor Starters : Workers whose first 2 responses within a categorization task are incorrect.
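The two worker categories above translate directly into simple checks. This is a minimal sketch, assuming each worker's responses to the compulsory set are available as a list of booleans in unit order (True for correct); the function names are hypothetical.

```python
from typing import Sequence


def is_bad_worker(correct: Sequence[bool]) -> bool:
    """True if 10% or more of the units were answered incorrectly."""
    if not correct:
        return False
    incorrect = sum(1 for c in correct if not c)
    # Integer comparison avoids floating-point rounding at the 10% boundary.
    return 10 * incorrect >= len(correct)


def is_poor_starter(correct: Sequence[bool]) -> bool:
    """True if the first 2 responses in the task are both incorrect."""
    return len(correct) >= 2 and not correct[0] and not correct[1]


responses = [False, False] + [True] * 18   # 20 units, first two wrong
print(is_bad_worker(responses), is_poor_starter(responses))  # True True
```

In a 20-unit task, two incorrect answers are exactly 10%, so such a worker already counts as a bad worker under the "10% or more" wording.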

Poor Starters, Bad Workers, & Tipping Point

Task Completion Time vs Worker Accuracy

Worker Behavior Within a Task

Key Findings

● A worker’s accuracy decreases through the course of a task (optional sets are not considered).

○ This is more prominent as the task length increases.

● Workers that exert additional effort project higher accuracies within tasks.

● The additional effort that workers exert decreases through the course of a task.

○ This is more prominent as the task length increases.

Scrutiny of Additional Responses

● % Correct Additional Responses gradually decreases from Set-1 to Set-5.

● On average, workers skip more optional sets as they proceed from Set-2 to Set-5.

Workers Breaking Bad

Adjusted Tipping Point (ATP) : A worker who responds incorrectly to a consecutive run of units amounting to at least 10% of the units in a task is said to have an ATP. The index of the first unit at which this is observed is called the ATP of the worker. Such a worker is called a BREAKER.
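One possible reading of the ATP definition is sketched below. Several details are assumptions rather than something the slides state: responses are a list of booleans in unit order (True for correct), the 10% threshold is rounded up to a whole number of units, and the ATP is reported as the 1-based index of the unit where the qualifying run of incorrect responses begins. The function name is hypothetical.

```python
from typing import Optional, Sequence


def adjusted_tipping_point(correct: Sequence[bool], percent: int = 10) -> Optional[int]:
    """Return the 1-based start index of the first run of consecutive
    incorrect responses covering at least `percent` of the task, else None."""
    n = len(correct)
    if n == 0:
        return None
    # Assumption: round the threshold up, using integer arithmetic.
    run_needed = max(1, (n * percent + 99) // 100)
    run_start = 0
    run_length = 0
    for index, ok in enumerate(correct, start=1):
        if ok:
            run_length = 0
            continue
        if run_length == 0:
            run_start = index
        run_length += 1
        if run_length >= run_needed:
            return run_start
    return None


responses = [True] * 20
responses[7] = responses[8] = False        # units 8 and 9 are incorrect
print(adjusted_tipping_point(responses))   # 8
```

Under this reading a worker is a breaker exactly when the function returns an index. Isolated mistakes that never form a long enough run do not produce an ATP.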

Conclusions & Future Work

To achieve good quality in categorization tasks…

● It is better to err on the lower side of monetary incentives offered.

● Use minimum time required as a filter, but give ample time for task completion. It is better to err on the higher side of maximum task completion time.

● It is better to err on the shorter side of task length.

● We can gauge worker intentions through the nature of their responses to optional questions.

● We plan to quantify these limits and guidelines in the near future.

Contact Details :

[email protected]

http://www.L3S.de

SLIDES: http://www.slideshare.net/ujwal07/

Removal of Ineligible Workers

Ineligible workers : Workers who do not conform to the previously stated prerequisites.

● We found 9 ineligible workers who used browser-embedded translator tools in order to participate in the task.

● Ineligible workers were not considered in the further analysis.
