budget-optimal task allocation for reliable crowdsourcing

Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems

Sewoong Oh

Massachusetts Institute of Technology

Joint work with David R. Karger and Devavrat Shah

September 28, 2011

1 / 13

Crowdsourcing

Image classification

Character recognition

Transcription

Proofreading

2 / 13

Budget-optimal Crowdsourcing

[Figure: microtasks assigned to workers]

Add redundancy to cope with errors

Objective: Get reliable answers at minimum cost

Challenges

1. Task Allocation

→ Solution: Random Graph

2. Inference Problem

→ Solution: Low-rank Matrix Approximation

3 / 13


Previous Work on Reliable Crowdsourcing

Focuses on the inference problem

EM-based heuristics with no guarantees

- Dawid, Skene (’79)
- Smyth et al. (’95)
- Whitehill et al. (’09)
- Welinder et al. (’10)

4 / 13

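Task Allocation

[Figure: bipartite graph between microtasks and batches]

Random $(\ell, r)$-regular bipartite graphs (each task assigned to $\ell$ workers, each worker given a batch of $r$ tasks) have good properties:

- Locally tree-like → sharp analysis
- Good expander (large spectral gap) → high signal-to-noise ratio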
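One way to sample such a graph is the configuration model: give each task $\ell$ "stubs" and each worker $r$ stubs, then match the stubs uniformly at random. The minimal Python sketch below assumes that construction; the function name `random_regular_bipartite` and its arguments are illustrative, not taken from the talk. With $m$ tasks there are $n = m\ell/r$ workers.

```python
import random


def random_regular_bipartite(m, ell, r, seed=None):
    """Sample an (ell, r)-regular bipartite multigraph via the configuration model.

    Each of the m tasks gets ell edges and each of the n = m*ell/r workers
    gets r edges. Returns (edges, n), where edges is a list of (task, worker).
    """
    if (m * ell) % r != 0:
        raise ValueError("m * ell must be divisible by r")
    n = m * ell // r
    rng = random.Random(seed)
    task_stubs = [i for i in range(m) for _ in range(ell)]  # ell stubs per task
    worker_stubs = [j for j in range(n) for _ in range(r)]  # r stubs per worker
    rng.shuffle(worker_stubs)  # uniformly random matching of the stubs
    return list(zip(task_stubs, worker_stubs)), n


edges, n = random_regular_bipartite(m=12, ell=3, r=4, seed=0)
print(n, len(edges))  # 9 36
```

The configuration model can pair the same task and worker more than once; a practical allocator would resample or rewire such repeated pairs.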

5 / 13

Modeling the Crowd

[Figure: example ±1 responses of workers to tasks]

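Binary tasks: $s_i \in \{+1, -1\}$

Worker reliability: $p_j \in [0, 1]$

Response of worker $j$ to task $i$:

$$A_{ij} = \begin{cases} s_i & \text{with probability } p_j \\ -s_i & \text{with probability } 1 - p_j \end{cases}$$

Assume we know whether $\frac{1}{n}\sum_j p_j > 0.5$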
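To experiment with this model, responses can be sampled edge by edge, each one drawn independently. A minimal Python sketch, where `simulate_responses` is a hypothetical helper and `edges` are (task, worker) pairs from whatever allocation graph is used:

```python
import random


def simulate_responses(edges, s, p, seed=None):
    """Sample A_ij for each (task i, worker j) edge.

    A_ij = s_i with probability p_j and -s_i with probability 1 - p_j.
    Returns a list of (i, j, A_ij) triples.
    """
    rng = random.Random(seed)
    return [(i, j, s[i] if rng.random() < p[j] else -s[i]) for i, j in edges]


edges = [(0, 0), (0, 1), (1, 0), (1, 1)]
s = [+1, -1]
# Worker 0 is always right (p = 1), worker 1 is always wrong (p = 0).
print(simulate_responses(edges, s, p=[1.0, 0.0]))
# [(0, 0, 1), (0, 1, -1), (1, 0, -1), (1, 1, 1)]
```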

6 / 13

Inference Problem

Given: Responses from the crowd $\{A_{ij}\}$

Find: Estimate of the answer $\{s_i\}$

$$\hat{s}_i = \mathrm{sign}\Big(\sum_j \underbrace{W_{ij}}_{\text{reliability}} \, \underbrace{A_{ij}}_{\text{response}}\Big)$$

[Figure: error rate vs. resources per task]

- Majority voting: $W_{ij} = 1$
- Oracle estimator who knows the $p_j$'s: $W_{ij} = \log\frac{p_j}{1 - p_j}$
- Iterative algorithm: learns the $W_{ij}$'s
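Majority voting and the oracle differ only in the weights $W_{ij}$. The minimal Python sketch below applies both to a single task answered by three hypothetical workers: majority voting counts every response equally, while the oracle's log-odds weights let one highly reliable worker outweigh two mediocre ones. Function names are illustrative, and ties go to $+1$ as an arbitrary choice.

```python
import math


def weighted_vote(responses, weights):
    """sign(sum_j W_ij * A_ij) for one task; a tie is broken toward +1."""
    total = sum(w * a for w, a in zip(weights, responses))
    return 1 if total >= 0 else -1


def oracle_weight(pj):
    """Log-odds weight log(p_j / (1 - p_j)); requires 0 < p_j < 1."""
    return math.log(pj / (1 - pj))


p = [0.9, 0.6, 0.6]     # hypothetical reliabilities of three workers
answers = [+1, -1, -1]  # their hypothetical responses A_ij to one task

print(weighted_vote(answers, [1, 1, 1]))                        # -1 (majority)
print(weighted_vote(answers, [oracle_weight(pj) for pj in p]))  # 1 (oracle)
```

Here $\log 9 \approx 2.20$ exceeds $2\log 1.5 \approx 0.81$, so the oracle sides with the reliable worker.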

7 / 13


Inference Problem


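Iteratively learn the weights

Task-likelihood update:

$$\underbrace{L_{ij}}_{\text{likelihood}} = \sum_{j' \neq j} A_{ij'} \underbrace{W_{ij'}}_{\text{reliability}}$$

Worker-reliability update:

$$\underbrace{W_{ij}}_{\text{reliability}} = \sum_{i' \neq i} A_{i'j} \underbrace{L_{i'j}}_{\text{likelihood}}$$

(The sums run over the neighbors of task $i$ and of worker $j$ in the allocation graph.)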
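These two updates can be run as message passing on the edges of the allocation graph. The minimal Python sketch below makes several choices the slide does not state, all of them assumptions: messages start at $W_{ij} = 1$ (so the first task update is a leave-one-out majority vote), worker messages are rescaled by their largest magnitude each round to avoid overflow (positive scaling does not change any sign), and the answer is read off as $\hat{s}_i = \mathrm{sign}(\sum_j W_{ij} A_{ij})$ with ties going to $+1$. The name `iterative_estimate` is hypothetical.

```python
from collections import defaultdict


def iterative_estimate(edges, num_tasks, iterations=10):
    """edges: list of (task i, worker j, response A_ij in {+1, -1})."""
    task_edges = defaultdict(list)    # task i -> indices of its edges
    worker_edges = defaultdict(list)  # worker j -> indices of its edges
    for e, (i, j, _) in enumerate(edges):
        task_edges[i].append(e)
        worker_edges[j].append(e)

    w = [1.0] * len(edges)  # reliability messages W_ij (assumed start)
    for _ in range(iterations):
        # Task-likelihood update: L_ij = sum over j' != j of A_ij' * W_ij'
        lik = [0.0] * len(edges)
        for es in task_edges.values():
            total = sum(edges[e][2] * w[e] for e in es)
            for e in es:
                lik[e] = total - edges[e][2] * w[e]
        # Worker-reliability update: W_ij = sum over i' != i of A_i'j * L_i'j
        new_w = [0.0] * len(edges)
        for es in worker_edges.values():
            total = sum(edges[e][2] * lik[e] for e in es)
            for e in es:
                new_w[e] = total - edges[e][2] * lik[e]
        # Rescale to keep magnitudes bounded; signs are unchanged.
        scale = max((abs(x) for x in new_w), default=0.0)
        w = [x / scale for x in new_w] if scale > 0 else new_w

    # Final estimate: sign(sum_j W_ij * A_ij), ties broken toward +1.
    scores = [0.0] * num_tasks
    for e, (i, _, a) in enumerate(edges):
        scores[i] += w[e] * a
    return [1 if x >= 0 else -1 for x in scores]


s = [1, -1, 1, -1]
ring = [(i, j) for i in range(4) for j in (i, (i + 1) % 4)]  # ell = r = 2
honest = [(i, j, s[i]) for i, j in ring]
adversarial = [(i, j, -s[i]) for i, j in ring]
print(iterative_estimate(honest, 4))       # [1, -1, 1, -1]
print(iterative_estimate(adversarial, 4))  # [-1, 1, -1, 1]
```

The adversarial crowd produces exactly the data an honest crowd would produce for the answers $-s$, so no algorithm can tell the two apart; this is why the model assumes we know whether $\frac{1}{n}\sum_j p_j > 0.5$.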

A task is likely to be ‘+’ if reliable workers agree that it is ‘+’.

A worker is reliable if the worker agreed with our belief on other tasks.

7 / 13

Iterative Algorithm as Singular Vector Computation

$$\underbrace{A}_{\text{data}} = \underbrace{\mathbb{E}[A \mid s, p]}_{\text{low-rank signal}} + \underbrace{\text{Random Perturbation}}_{\text{noise}}$$

1. Why are the singular vectors good for inference?
→ Good expanders have high SNR

2. Why not use the singular vectors directly?
→ Exploit the tree-like structure to prove a sharp bound
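The signal term can be written out: on each edge, $\mathbb{E}[A_{ij} \mid s, p] = p_j s_i - (1 - p_j) s_i = s_i(2p_j - 1)$, i.e. the rank-one matrix $s\,(2p - 1)^\top$ observed on the edges of the graph. For comparison, here is a minimal Python sketch of the "use the singular vectors directly" alternative: plain power iteration on $A$, which is the same pair of sums as the iterative algorithm but without the $j' \neq j$ and $i' \neq i$ exclusions. The global sign is fixed using the assumption that average reliability exceeds 0.5, so the worker-side vector (proportional to $2p_j - 1$ in expectation) should sum to a positive number. The function name and iteration count are illustrative.

```python
import math


def spectral_estimate(edges, num_tasks, num_workers, iterations=50):
    """edges: list of (task i, worker j, A_ij). Sign of the top left singular vector."""
    v = [1.0] * num_workers  # worker side, ~ (2 p_j - 1) in expectation
    u = [0.0] * num_tasks    # task side, ~ s_i in expectation
    for _ in range(iterations):
        u = [0.0] * num_tasks
        for i, j, a in edges:
            u[i] += a * v[j]  # u = A v
        v = [0.0] * num_workers
        for i, j, a in edges:
            v[j] += a * u[i]  # v = A^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            break
        v = [x / norm for x in v]
    # Resolve the global sign: average p_j > 0.5 means sum_j (2 p_j - 1) > 0.
    if sum(v) < 0:
        u = [-x for x in u]
    return [1 if x >= 0 else -1 for x in u]


s = [1, -1, 1, -1]
honest = [(i, j, s[i]) for i in range(4) for j in (i, (i + 1) % 4)]
print(spectral_estimate(honest, num_tasks=4, num_workers=4))  # [1, -1, 1, -1]
```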

8 / 13

Performance Analysis

The performance depends on the worker reliability through

$$q \equiv \frac{1}{n}\sum_{j=1}^{n} (2p_j - 1)^2$$

Theorem. [Karger, O., Shah ’11]

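In the large system limit, for

$$\sigma^2 \equiv \Big(3 + \frac{1}{qr}\Big)\frac{q^2 \ell r}{q^2 \ell r - 1} \quad \text{and} \quad \ell r > 1/q^2,$$

$$P_{\text{error}} \le \exp\Big\{-\frac{q\ell}{2\sigma^2}\Big\}$$

9 / 13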

How Good is the Performance?

[Figure: $P_{\text{error}}$ vs. $q\ell$ for majority voting, EM algorithm, iterative algorithm, and oracle estimator]

Iterative algorithm ($r > 1/q$):

$P_{\text{error}} \le e^{-q\ell/16}$
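A small numerical check of how this rate relates to the large-system bound $\exp\{-q\ell/(2\sigma^2)\}$, taking $\sigma^2 = (3 + \frac{1}{qr})\frac{q^2\ell r}{q^2\ell r - 1}$ (this sketch assumes that form of the theorem). If $r \ge 1/q$ then $3 + 1/(qr) \le 4$; if additionally $q^2\ell r \ge 2$ (an extra assumption made here) then $q^2\ell r/(q^2\ell r - 1) \le 2$, so $\sigma^2 \le 8$ and the bound is at most $e^{-q\ell/16}$. The helper names are hypothetical.

```python
import math


def crowd_quality(p):
    """q = (1/n) * sum_j (2 p_j - 1)^2 for reliabilities p_j in [0, 1]."""
    return sum((2 * pj - 1) ** 2 for pj in p) / len(p)


def theorem_bound(q, ell, r):
    """exp(-q ell / (2 sigma^2)), sigma^2 = (3 + 1/(q r)) * x / (x - 1), x = q^2 ell r."""
    x = q * q * ell * r
    if x <= 1:
        raise ValueError("the bound requires ell * r > 1 / q**2")
    sigma2 = (3 + 1 / (q * r)) * x / (x - 1)
    return math.exp(-q * ell / (2 * sigma2))


def simplified_bound(q, ell):
    """The slide's rate exp(-q * ell / 16)."""
    return math.exp(-q * ell / 16)


q = crowd_quality([0.9, 0.7, 0.5, 0.3])  # hypothetical crowd, q is about 0.24
# q = 0.5, ell = 20, r = 4 gives q r = 2 >= 1 and q^2 ell r = 20 >= 2.
print(theorem_bound(0.5, 20, 4) <= simplified_bound(0.5, 20))  # True
```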

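Matching minimax lower bound (best algorithm and allocation graph with $\ell$ assignments per task, worst case over tasks and reliabilities with quality $q$):

$$\inf_{\text{Alg},\, G \in \mathcal{G}(\ell)} \; \sup_{\{s_i\}, \{p_j\} \in \mathcal{F}(q)} P_{\text{error}} \gtrsim e^{-(q\ell + O(q^2\ell))}$$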

10 / 13

Implications

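$P_{\text{error}} \le e^{-q\ell/16}$

How much do we need to spend to achieve $P_{\text{error}} \le \epsilon$?

- Sufficient to choose $\ell \sim \frac{1}{q}\log\frac{1}{\epsilon}$
- Necessary to have $\ell \sim \frac{1}{q}\log\frac{1}{\epsilon}$
- Need $q$ to determine $\ell$
- Can search for $q$ using bisection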
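Inverting the upper bound gives a concrete sufficient redundancy: $e^{-q\ell/16} \le \epsilon$ exactly when $\ell \ge \frac{16}{q}\log\frac{1}{\epsilon}$, so $m$ tasks need about $m\ell$ assignments in total. A minimal Python sketch, assuming $q$ is already known (the bisection search for $q$ is not sketched here) and that the condition $r > 1/q$ holds; `required_redundancy` is a hypothetical helper.

```python
import math


def required_redundancy(q, eps):
    """Smallest integer ell with exp(-q * ell / 16) <= eps."""
    if not (0 < q <= 1 and 0 < eps < 1):
        raise ValueError("need 0 < q <= 1 and 0 < eps < 1")
    return math.ceil(16 / q * math.log(1 / eps))


ell = required_redundancy(q=0.5, eps=0.01)
print(ell)  # 148
```

Halving $q$ doubles the required $\ell$, which is the $1/q$ scaling in both the sufficient and the necessary condition.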

11 / 13

Resource Allocation

Which crowd is ‘better’?

Cost: $c_1 = \$0.04$, $c_2 = \$0.05$

[Figure: distributions of worker quality $P_1$ and $P_2$ on $[0, 1]$ for the two crowds]

$q_1 = \mathbb{E}[(2P_1 - 1)^2]$, $q_2 = \mathbb{E}[(2P_2 - 1)^2]$

Invest all resources in the crowd $\arg\max_k q_k / c_k$
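The arg max rule follows from the error exponent: spending a budget $B$ per task on crowd $k$ buys $\ell_k = B/c_k$ assignments, and since the error is $e^{-\Theta(q\ell)}$ in both the upper and the lower bound, the exponent is proportional to $q_k B / c_k$. A minimal Python sketch of the rule, where costs are interpreted as cost per task assignment, the reliability samples are hypothetical stand-ins for the distributions on the slide, and `best_crowd` is an illustrative name:

```python
def best_crowd(crowds):
    """crowds: dict name -> (cost per assignment c_k, sample of reliabilities P_k)."""
    def quality(p):
        # Empirical q_k = E[(2 P_k - 1)^2]
        return sum((2 * pj - 1) ** 2 for pj in p) / len(p)

    return max(crowds, key=lambda k: quality(crowds[k][1]) / crowds[k][0])


crowds = {
    "crowd 1": (0.04, [0.6, 0.8, 0.5, 0.7]),   # hypothetical reliabilities
    "crowd 2": (0.05, [0.9, 0.95, 0.5, 0.85]),
}
print(best_crowd(crowds))  # crowd 2
```

Here crowd 1 has $q_1 \approx 0.14$ and crowd 2 has $q_2 \approx 0.485$, so the higher price of crowd 2 is more than paid back in quality.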

12 / 13


Conclusion

Problem: Reliable crowdsourcing with minimum resources

Task allocation: random regular graphs

Inference algorithm: low-rank matrix approximation

Required budget is order-optimal

13 / 13
