![Page 1: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/1.jpg)
CrowdFill: Collecting Structured Data from the Crowd
Hyunjung Park Jennifer Widom
Stanford University
![Page 2: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/2.jpg)
Goal
•Collect high-quality structured data from the crowd, while capping total monetary cost and keeping latency low
6/25/2014 Hyunjung Park 2
| name | nationality | position | caps | goals |
|---|---|---|---|---|
| | Brazil | | | |
| Messi | | FW | | |
| Klose | Germany | | 133 | |
![Page 3: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/3.jpg)
Traditional Microtask-based Approach
1. Decompose the data collection task into a set of microtasks
   e.g., “What position does Klose play?”
   “How many goals has Messi scored?”
2. Each worker provides specific pieces of data via microtasks
3. Assemble the collected pieces of data into the final table
![Page 4: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/4.jpg)
CrowdFill’s Table-filling Approach
1. Present an entire partially-filled table to all participating workers
2. Each worker contributes what they know to the table by filling in empty cells and voting on data entered by others
3. Propagate worker actions in real time to synchronize the table across all workers
![Page 5: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/5.jpg)
CrowdFill’s Table-filling Approach
![Page 6: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/6.jpg)
Outline
•Formal model
•Overall architecture
•Concurrent operations
•Satisfying values constraint
•Compensation scheme
•Experimental evaluation
•Related work
![Page 7: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/7.jpg)
Formal Model: Schema
•Table Specification
  ◦ Column definitions and primary key
  ◦ SoccerPlayer(name, nationality, position, caps, goals)
•Scoring Function
  ◦ Accept a row $r$ if and only if $f(u_r, d_r) > 0$, where $u_r$ and $d_r$ are its upvote and downvote counts
  ◦ e.g., “majority of three or more”:

$$
f(u_r, d_r) =
\begin{cases}
u_r - d_r & \text{if } u_r + d_r \ge 2 \\
0 & \text{otherwise}
\end{cases}
$$
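The scoring rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the vote threshold of 2 that appears in the slide's formula; the function names `majority_score` and `is_accepted` and the `min_votes` parameter are hypothetical, not names from CrowdFill.

```python
def majority_score(upvotes: int, downvotes: int, min_votes: int = 2) -> int:
    """Return the net vote count once enough votes are in, otherwise 0.

    min_votes=2 mirrors the threshold in the slide's formula (an assumption
    about how the extracted formula should be read).
    """
    if upvotes + downvotes >= min_votes:
        return upvotes - downvotes
    return 0


def is_accepted(upvotes: int, downvotes: int) -> bool:
    """A row is accepted if and only if its score is strictly positive."""
    return majority_score(upvotes, downvotes) > 0
```

With this rule, two agreeing upvotes already decide a majority of three, while a 1-1 split or a single vote leaves the row unaccepted.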
![Page 8: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/8.jpg)
Formal Model: Constraints
•Values Constraint
  ◦ Final table S must “match” template T (a partially-filled table)
•Cardinality Constraint
  ◦ Final table S must contain at least N rows
  ◦ Special case of values constraint
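Template T:

| name | nationality | position |
|---|---|---|
| | Argentina | |
| | | FW |

Final table S matching T:

| name | nationality | position |
|---|---|---|
| Messi | Argentina | FW |
| Rooney | England | FW |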
![Page 9: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/9.jpg)
Formal Model: Candidate Table
•Candidate table R
  ◦ Exposed to clients
  ◦ Primary key not enforced
  ◦ Each row annotated with its upvote and downvote counts
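| name | nationality | position | upvotes | downvotes |
|---|---|---|---|---|
| Messi | Argentina | FW | 2 | 0 |
| Ronaldo | Portugal | FW | 3 | 0 |
| Ronaldo | Portugal | MF | 2 | 1 |
| Neymar | Brazil | | 0 | 1 |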
![Page 10: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/10.jpg)
Formal Model: Operations
•Primitive Operations on R
  ◦ Insert a new empty row into R
  ◦ Fill in an empty column of a row with a value
  ◦ Upvote a complete row
  ◦ Downvote a non-empty row
Starting candidate table R:

| name | nationality | position | upvotes | downvotes |
|---|---|---|---|---|
| Messi | Argentina | FW | 2 | 0 |
| Ronaldo | Portugal | FW | 3 | 0 |
| Ronaldo | Portugal | MF | 2 | 1 |
| Neymar | Brazil | | 0 | 1 |

Successive states of a new fifth row as operations are applied (the four rows above stay unchanged):

| step | operation | name | nationality | position | upvotes | downvotes |
|---|---|---|---|---|---|---|
| 1 | insert | | | | 0 | 0 |
| 2 | fill name | Klose | | | 0 | 0 |
| 3 | fill nationality | Klose | Germany | | 0 | 0 |
| 4 | fill position | Klose | Germany | DF | 0 | 0 |
| 5 | upvote | Klose | Germany | DF | 1 | 0 |
| 6 | downvote | Klose | Germany | DF | 1 | 1 |
| 7 | downvote | Klose | Germany | DF | 1 | 2 |
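The four primitive operations can be sketched as methods on a small in-memory table. This is an illustrative sketch only: the class and method names are hypothetical, rows are mutated in place for simplicity, and the actual CrowdFill model (described in the paper) may represent fills and votes differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

COLUMNS = ("name", "nationality", "position")


@dataclass
class Row:
    # None marks an empty cell.
    values: Dict[str, Optional[str]] = field(
        default_factory=lambda: {c: None for c in COLUMNS})
    upvotes: int = 0
    downvotes: int = 0

    def is_complete(self) -> bool:
        return all(v is not None for v in self.values.values())

    def is_empty(self) -> bool:
        return all(v is None for v in self.values.values())


class CandidateTable:
    """Hypothetical in-memory candidate table R (primary key not enforced)."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self) -> int:
        """Insert a new empty row and return its index."""
        self.rows.append(Row())
        return len(self.rows) - 1

    def fill(self, index: int, column: str, value: str) -> None:
        """Fill in an empty column of a row with a value."""
        row = self.rows[index]
        if row.values[column] is not None:
            raise ValueError(f"column {column!r} is already filled")
        row.values[column] = value

    def upvote(self, index: int) -> None:
        """Upvote a row; only complete rows may be upvoted."""
        if not self.rows[index].is_complete():
            raise ValueError("only complete rows can be upvoted")
        self.rows[index].upvotes += 1

    def downvote(self, index: int) -> None:
        """Downvote a row; only non-empty rows may be downvoted."""
        if self.rows[index].is_empty():
            raise ValueError("only non-empty rows can be downvoted")
        self.rows[index].downvotes += 1


# Replay the Klose example from the slide.
table = CandidateTable()
k = table.insert()
table.fill(k, "name", "Klose")
table.fill(k, "nationality", "Germany")
table.fill(k, "position", "DF")
table.upvote(k)
table.downvote(k)
table.downvote(k)
```

After the replay, the new row holds Klose, Germany, DF with one upvote and two downvotes, matching the last state shown on the slide.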
![Page 11: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/11.jpg)
Formal Model: Final Table
•Final table S
  ◦ Derived from candidate table R
  ◦ Contains each complete row $r$ in R such that
    $f(u_r, d_r) > 0$, and
    $f(u_r, d_r)$ is the highest score of any row with the same primary key as $r$
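Candidate table R:

| name | nationality | position | upvotes | downvotes |
|---|---|---|---|---|
| Messi | Argentina | FW | 2 | 0 |
| Ronaldo | Portugal | FW | 3 | 0 |
| Ronaldo | Portugal | MF | 2 | 1 |
| Neymar | Brazil | | 0 | 1 |
| Klose | Germany | DF | 1 | 2 |

Final table S:

| name | nationality | position |
|---|---|---|
| Messi | Argentina | FW |
| Ronaldo | Portugal | FW |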
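The derivation of S from R can be sketched as follows. Assumptions in this sketch: `name` is the primary key, the score uses the threshold of 2 from the scoring-function slide, and ties between equally scored rows go to the first row seen (the slides do not specify tie-breaking). All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (name, nationality, position, upvotes, downvotes); None marks an empty cell.
CandidateRow = Tuple[Optional[str], Optional[str], Optional[str], int, int]
FinalRow = Tuple[str, str, str]


def score(upvotes: int, downvotes: int) -> int:
    """Scoring function f with the threshold of 2 assumed from the slides."""
    return upvotes - downvotes if upvotes + downvotes >= 2 else 0


def final_table(candidate: List[CandidateRow]) -> List[FinalRow]:
    """Keep, per primary key, the complete row with the highest positive score."""
    best: Dict[str, Tuple[int, FinalRow]] = {}
    for name, nationality, position, up, down in candidate:
        if None in (name, nationality, position):
            continue  # incomplete rows never reach the final table
        s = score(up, down)
        if s <= 0:
            continue  # the row is not accepted
        if name not in best or s > best[name][0]:
            best[name] = (s, (name, nationality, position))
    return [row for _, row in best.values()]


R: List[CandidateRow] = [
    ("Messi", "Argentina", "FW", 2, 0),
    ("Ronaldo", "Portugal", "FW", 3, 0),
    ("Ronaldo", "Portugal", "MF", 2, 1),
    ("Neymar", "Brazil", None, 0, 1),
    ("Klose", "Germany", "DF", 1, 2),
]
S = final_table(R)
# S == [("Messi", "Argentina", "FW"), ("Ronaldo", "Portugal", "FW")]
```

The second Ronaldo row is accepted on its own score but loses to the higher-scoring row with the same key; the Neymar row is incomplete and the Klose row has a negative score.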
![Page 12: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/12.jpg)
CrowdFill Architecture
[Figure: architecture diagram. Components: Front-end Server, Back-end Server, Execution Server, Central Client, Database, Web Interface, Crowdsourcing Marketplace, and several Worker Clients. Edge labels: task setup and payment; task acceptance; table specs and payment; results collection; data entry.]
![Page 14: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/14.jpg)
Concurrent Operations
•Model designed to minimize effects of concurrency (details in paper)
  ◦ Operations are easily merged
  ◦ Conflicts are resolved seamlessly
•Convergence theorem
  ◦ Architecture ensures server and all clients apply the same operations, possibly in different orders
  ◦ Theorem guarantees that server and all clients converge to the same candidate table whenever the system quiesces
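The convergence property can be illustrated with a toy model in which every operation commutes with every other. This is not CrowdFill's model, which the slides defer to the paper; the row identifier `r5`, the conflicting fill, and the rule of keeping the smaller string on a conflict are all assumptions chosen only to make the toy operations order-independent.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Op = Tuple[str, ...]


def apply_op(state: Dict[str, dict], op: Op) -> None:
    """Apply one operation; any operation creates its row if it is missing."""
    kind, row_id, *args = op
    row = state.setdefault(row_id, {"cells": {}, "up": 0, "down": 0})
    if kind == "fill":
        column, value = args
        current = row["cells"].get(column)
        # Toy conflict rule (an assumption): keep the smaller string, which
        # is commutative and associative, so arrival order does not matter.
        row["cells"][column] = value if current is None else min(current, value)
    elif kind == "upvote":
        row["up"] += 1
    elif kind == "downvote":
        row["down"] += 1
    # "insert" needs no extra work: setdefault already created the row.


def replay(ops: Tuple[Op, ...]) -> Dict[str, dict]:
    """Apply a sequence of operations to an empty table."""
    state: Dict[str, dict] = {}
    for op in ops:
        apply_op(state, op)
    return state


ops: List[Op] = [
    ("insert", "r5"),
    ("fill", "r5", "name", "Klose"),
    ("fill", "r5", "nationality", "Germany"),
    ("fill", "r5", "nationality", "Poland"),  # hypothetical conflicting fill
    ("upvote", "r5"),
]

# Each permutation stands for one site applying the same operations
# in its own order; all of them should end in the same state.
results = [replay(p) for p in permutations(ops)]
converged = all(r == results[0] for r in results)
```

Here every ordering of the five operations yields the same table, which is the kind of guarantee the convergence theorem states for the real system once it quiesces.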
![Page 15: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/15.jpg)
Satisfying Values Constraint
•Values constraint
  ◦ Final table S must match template T
•Worker clients
  ◦ Perform fill, upvote, and downvote operations
  ◦ Need not be aware of the template T
•Special “Central client”
  ◦ Automatically populates new rows to guide the final table S towards the template T
•Probable Row Invariant (PRI)
  ◦ R always contains just enough “probable” rows matching template T
  ◦ PRI maintained based on maximum bipartite matching
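As a rough illustration of how bipartite matching relates to the values constraint, the sketch below checks whether a set of rows can cover a template. It is not the PRI maintenance algorithm from the paper. It assumes that a row matches a template row when it agrees on every cell the template fixes, and that each template row needs its own distinct table row; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, Optional[str]]


def row_matches(row: Row, template_row: Row) -> bool:
    """True if the row agrees with every non-empty cell of the template row."""
    return all(row.get(col) == val
               for col, val in template_row.items() if val is not None)


def max_matching(rows: List[Row], template: List[Row]) -> int:
    """Size of a maximum bipartite matching (augmenting-path method)."""
    matched_to: Dict[int, int] = {}  # table row index -> template row index

    def try_assign(t: int, seen: Set[int]) -> bool:
        for r, row in enumerate(rows):
            if r in seen or not row_matches(row, template[t]):
                continue
            seen.add(r)
            # Take a free row, or re-route the template row currently using it.
            if r not in matched_to or try_assign(matched_to[r], seen):
                matched_to[r] = t
                return True
        return False

    return sum(try_assign(t, set()) for t in range(len(template)))


def satisfies_template(rows: List[Row], template: List[Row]) -> bool:
    """True if every template row can be paired with a distinct table row."""
    return max_matching(rows, template) == len(template)


T = [
    {"name": None, "nationality": "Argentina", "position": None},
    {"name": None, "nationality": None, "position": "FW"},
]
S = [
    {"name": "Messi", "nationality": "Argentina", "position": "FW"},
    {"name": "Rooney", "nationality": "England", "position": "FW"},
]
ok = satisfies_template(S, T)
```

Matching matters because a greedy pairing can fail: the Messi row fits both template rows, so it has to be assigned to the Argentina row to leave the Rooney row for the FW row.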
![Page 16: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/16.jpg)
Compensation Scheme: Overview
•After data collection
  ◦ Allocate a total monetary budget based on each worker’s overall contribution to the final table
  ◦ Encourage workers to submit useful work
  ◦ Make total monetary cost predictable
•During data collection
  ◦ Provide estimated compensation for individual actions to keep workers engaged
![Page 17: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/17.jpg)
Compensation Scheme: Contribution
•Given final table S, operation op contributed to S if:
  ◦ op filled in a cell in S (“direct” contribution)
  ◦ op first provided a value for S while creating a subset of a row in S (“indirect” contribution)
  ◦ op upvoted a row in S
  ◦ op downvoted a combination of values not present in S
![Page 18: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/18.jpg)
Compensation Scheme: Allocation
•Uniform allocation
  ◦ Each cell and contributing vote has the same compensation
  ◦ Each cell divided into direct and indirect contributions
•Column-weighted allocation
  ◦ Take into account varying difficulty of filling in different columns and casting votes
•Dual-weighted allocation
  ◦ Also take into account that entering new key values can get progressively more difficult as the table fills up
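A minimal sketch of the uniform scheme, under stated assumptions: the budget is split evenly over final-table cells and contributing votes, and each cell's share is divided between its direct and indirect contributor by a fraction `direct_share`. The slides do not give that fraction, so the default of 0.5 is a placeholder, and the worker names and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def uniform_allocation(budget: float,
                       cells: List[Tuple[str, str]],
                       votes: List[str],
                       direct_share: float = 0.5) -> Dict[str, float]:
    """Split a fixed budget evenly over cells and contributing votes.

    cells: one (direct_worker, indirect_worker) pair per final-table cell.
    votes: one worker name per contributing vote.
    direct_share: assumed fraction of a cell's share paid to the direct
        contributor; the remainder goes to the indirect contributor.
    """
    units = len(cells) + len(votes)
    if units == 0:
        return {}
    unit = budget / units
    pay: Dict[str, float] = defaultdict(float)
    for direct_worker, indirect_worker in cells:
        pay[direct_worker] += unit * direct_share
        pay[indirect_worker] += unit * (1 - direct_share)
    for voter in votes:
        pay[voter] += unit
    return dict(pay)


payments = uniform_allocation(
    budget=10.0,
    cells=[("alice", "alice"), ("bob", "alice"), ("bob", "bob")],
    votes=["carol", "carol"],
)
total = sum(payments.values())
```

Because every unit is a fixed fraction of the budget, the payments always add up to the budget, which is what keeps the total monetary cost predictable. The column-weighted and dual-weighted schemes would replace the single `unit` with per-column or per-row weights.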
![Page 19: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/19.jpg)
Experimental Evaluation: Setting
•SoccerPlayer(name, nationality, position, caps, goals, date-of-birth)
•Scoring function: “majority of three or more”
•Goal: information about 20 players with caps between 80 and 99
•Five volunteer workers
•Total monetary budget: $10
•Dual-weighted allocation scheme
![Page 20: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/20.jpg)
Experimental Evaluation: Summary
• In our representative run
  ◦ Overall latency: 10m 44s
  ◦ #Rows in the candidate table: 23
  ◦ Final compensations: $0.51, $1.68, $2.08, $2.24, $3.49
  ◦ No “slowdown” in obtaining new primary keys
![Page 21: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/21.jpg)
Accuracy of Estimated Compensation
![Page 22: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/22.jpg)
Related Work
•Crowdsourcing structured data
  ◦ CrowdDB [Franklin et al. 2011]
  ◦ Deco [Park et al. 2012]
•Real-time cooperative editing systems
  ◦ Convergence [Ellis and Gibbs 1989]
  ◦ Intention preservation [Sun et al. 1998]
•Monetary compensation for crowdsourcing
  ◦ Incentive designs [Shaw et al. 2011]
![Page 23: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/23.jpg)
Summary
•CrowdFill’s novel table-filling approach
  ◦ Real-time collaboration among workers
  ◦ Intuitive data entry interface
  ◦ Compensation based on contribution
• In the paper:
  ◦ Full description of the formal model
  ◦ PRI maintenance algorithm with examples
  ◦ More details about the compensation scheme
  ◦ More experimental results
![Page 24: CrowdFill: Collecting Structured Data from the Crowd](https://reader033.vdocument.in/reader033/viewer/2022042700/554e8691b4c905fc368b460d/html5/thumbnails/24.jpg)
Thank you