![Page 1: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/1.jpg)
WE ARE THE DATA: USING CROWDSOURCING FOR BETTER OPEN DATA
Elena Simperl
@esimperl
July 7th, 2015
![Page 2: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/2.jpg)
EXECUTIVE SUMMARY
Broad participation and ‘open by default’ are key to keeping momentum and improving the quality of open data
Crowdsourcing helps with open data collection, curation, and analysis
However,
there is crowdsourcing and crowdsourcing: pick your favs and mix them
human intelligence is a valuable resource: experiment design is key
sustaining engagement is an art: crowdsourcing analytics may help
computers & humans are a powerful mix: the age of ‘social machines’
![Page 3: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/3.jpg)
THE PROMISE
![Page 4: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/4.jpg)
CROWDSOURCING: PROBLEM SOLVING VIA OPEN CALLS
“Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.”
[Source: Howe, 2006]
![Page 5: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/5.jpg)
CROWDSOURCING FOR OPEN DATA: PARTICIPATION & INCENTIVES
Owners
Aggregators / publishers
Consumers
![Page 6: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/6.jpg)
EXAMPLE: OPENSTREETMAP (OPENSTREETMAP.ORG)
OpenStreetMap coverage of Port-au-Prince, Haiti, (C) Mikel Maron
Panels: before the 2010 earthquake; after the 2010 earthquake
![Page 7: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/7.jpg)
THE CHALLENGE
![Page 8: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/8.jpg)
THERE IS CROWDSOURCING AND CROWDSOURCING
![Page 9: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/9.jpg)
HOW TO SET UP AN EFFECTIVE CROWDSOURCING PROJECT?
![Page 10: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/10.jpg)
EXAMPLE: OPEN ADDRESSES (OPENADDRESSESUK.ORG)
![Page 11: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/11.jpg)
THE ANSWER
![Page 12: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/12.jpg)
BASIC PRINCIPLES
Understand your design space
Decide what you want to do
Avoid major pitfalls
Observe and adjust
Be open
![Page 13: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/13.jpg)
DIMENSIONS OF CROWDSOURCING (I)
WHAT IS OUTSOURCED
Tasks you can’t run in-house or using computers
A matter of time, budget, resources, ethics etc.
WHO IS THE CROWD
Crowdsourcing ≠ ‘turkers’
‘Open’ call, biased by use of platforms and promotion channels
For individuals and groups
No traditional means to manage and incentivise
Crowd has little/no context about your project
![Page 14: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/14.jpg)
EXAMPLE: PROJECT GUTENBERG (GUTENBERG.ORG)
WHAT IS OUTSOURCED
Proofreading eBooks
Procuring eligible paper books
Burning CDs/DVDs
Crowdfunding
WHO IS THE CROWD
Anyone
![Page 15: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/15.jpg)
EXAMPLE: SF OPEN DATA & KAGGLE (DATA.SFGOV.ORG)
WHAT IS OUTSOURCED
Data processing tasks (on predefined data sets)
WHO IS THE CROWD
Anyone with the right skillset
![Page 16: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/16.jpg)
DIMENSIONS OF CROWDSOURCING (II)
HOW ARE THE TASKS OUTSOURCED
Macro vs. microtasks
Complex workflows
Assessment of answers
WHY DO PEOPLE CONTRIBUTE
Money, love, and glory
Aligning incentives
Observe and adjust
![Page 17: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/17.jpg)
TYPOLOGIES OF CROWDSOURCING
Macrotasks
Microtasks
Challenges
Self-organized crowds
Crowdfunding
[Source: Prpić, Shukla, Kietzmann and McCarthy, 2015]
![Page 18: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/18.jpg)
EXAMPLE: STREETBUMP (STREETBUMP.ORG)
HOW ARE THE TASKS OUTSOURCED
‘Participatory sensing’
People install app that collects acceleration and GPS data to detect road hazards
Microtasks
Aggregated data used to detect potholes
App targets people who drive and own a smartphone
Open innovation
InnoCentive challenge to improve pothole prediction
Macrotask
400,000 experts registered with InnoCentive
Prize of $25,000
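
The participatory sensing step above (phones report acceleration and GPS data, and the aggregated reports point to potholes) could be sketched roughly as follows. This is only an illustration, not the app's actual method: the `Bump` record, the function name `candidate_potholes`, and all thresholds are assumptions made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bump:
    """One hypothetical sensor report from a phone (not the app's real schema)."""
    user_id: str
    lat: float
    lon: float
    vertical_accel_g: float  # peak vertical acceleration, in g


def candidate_potholes(reports, accel_threshold_g=0.4, cell_deg=0.0005, min_users=3):
    """Return grid cells where at least `min_users` distinct users hit a strong bump.

    All three parameters are illustrative assumptions.
    """
    users_per_cell = defaultdict(set)
    for r in reports:
        if abs(r.vertical_accel_g) < accel_threshold_g:
            continue  # too weak to count as a bump
        # Snap the GPS fix to a coarse grid cell so nearby reports are grouped.
        cell = (round(r.lat / cell_deg), round(r.lon / cell_deg))
        users_per_cell[cell].add(r.user_id)
    # Keep only cells confirmed by enough independent contributors.
    return {cell: len(users) for cell, users in users_per_cell.items()
            if len(users) >= min_users}
```

Counting distinct users per cell, rather than raw reports, is one simple way to stop a single driver on a daily commute from dominating the result.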
![Page 19: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/19.jpg)
EXAMPLE: DISTRIBUTED PROOFREADERS (PGDP.NET)
HOW ARE THE TASKS OUTSOURCED
eBook divided into pages, proofread independently
People can decide what they work on
Related: Soylent’s ‘Find-Fix-Verify’
For text shortening, proof-reading, open editing
Uses paid microtask platform
[Source: Bernstein et al, 2010]
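
A minimal sketch of the Find-Fix-Verify pattern is given below, to make the three stages concrete. It is not Soylent's implementation: workers are modelled as plain Python callables, and the function name, the agreement threshold, and the majority rule in the Verify stage are assumptions chosen for illustration.

```python
from collections import Counter


def find_fix_verify(sentences, finders, fixers, verifiers, find_agreement=0.2):
    """Run one Find-Fix-Verify pass over a list of sentences.

    finders:   callables, each returns the set of sentence indices it flags
    fixers:    callables, each maps a flagged sentence to a proposed rewrite
    verifiers: callables, each returns True if it accepts (original, rewrite)
    """
    # Find: keep only sentences flagged by enough independent workers.
    flags = Counter(i for find in finders for i in find(sentences))
    needed = find_agreement * len(finders)
    flagged = sorted(i for i, n in flags.items() if n >= needed)

    result = list(sentences)
    for i in flagged:
        # Fix: a separate group of workers proposes rewrites.
        proposals = {fix(sentences[i]) for fix in fixers}
        # Verify: a third group votes on every proposal.
        votes = {p: sum(v(sentences[i], p) for v in verifiers) for p in proposals}
        best = max(sorted(votes), key=votes.get)
        # Assumed rule: accept the top rewrite only with a strict majority.
        if votes[best] > len(verifiers) / 2:
            result[i] = best
    return result
```

Splitting the work into three independent stages means no single worker both chooses what to change and approves the change, which limits the damage one careless or overeager contributor can do.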
![Page 20: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/20.jpg)
INCENTIVES AND MOTIVATION
Successful volunteer crowdsourcing is difficult to predict or replicate
Highly context-specific
Building a community from scratch is not trivial
Not applicable to every task
Reward models often easier to study and control (if performance can be reliably measured)
Different models: pay-per-time, pay-per-unit, winner-takes-it-all
Not always easy to abstract from social aspects (free-riding, social pressure)
May undermine intrinsic motivation
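
The three reward models named above differ only in what is measured and who gets paid. A small sketch of the arithmetic, with hypothetical function names and made-up inputs, might look like this:

```python
def pay_per_time(hours_worked, hourly_rate):
    """Pay every worker for the time spent, regardless of output."""
    return {worker: hours * hourly_rate for worker, hours in hours_worked.items()}


def pay_per_unit(units_done, unit_price):
    """Pay every worker for each completed task."""
    return {worker: units * unit_price for worker, units in units_done.items()}


def winner_takes_all(scores, prize):
    """Pay the full prize to the single best-scoring entry, nothing to the rest."""
    winner = max(scores, key=scores.get)
    return {worker: (prize if worker == winner else 0) for worker in scores}
```

For example, `winner_takes_all({"ann": 0.91, "bob": 0.87}, 25000)` pays the whole prize to `ann`, whereas the other two models pay both workers in proportion to time or output.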
![Page 21: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/21.jpg)
COMBINING INTRINSIC MOTIVATION WITH REWARDS
Task design matters as much as payment
‘Gamification’ can achieve high accuracy at lower cost, with improved engagement
People appreciate social features, but their main motivation is still task-driven
See also [Feyisetan et al., 2015]
![Page 22: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/22.jpg)
CROWDSOURCING ANALYTICS
[Figure: active users in % (y-axis, 0 to 20) by month since registration (x-axis, 1 to 31)]
See also [Luczak-Rösch et al. 2014]
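
A retention curve of this kind (share of users active in each month since registration) can be computed from two simple logs. The sketch below is an assumption-laden illustration, not the analysis behind the slide: the function names, the input shapes, and the convention that month 1 is the calendar month of registration are all choices made here.

```python
from collections import defaultdict
from datetime import date


def month_since_registration(registered, day):
    """Calendar-month offset, counting the registration month as month 1."""
    return (day.year - registered.year) * 12 + (day.month - registered.month) + 1


def active_share_by_month(registrations, activity):
    """Percentage of all registered users active in each month since registration.

    registrations: dict user_id -> registration date
    activity:      iterable of (user_id, date) contribution events
    """
    active = defaultdict(set)
    for user, day in activity:
        # Ignore events from unknown users or dated before registration.
        if user in registrations and day >= registrations[user]:
            active[month_since_registration(registrations[user], day)].add(user)
    total = len(registrations)
    return {month: 100 * len(users) / total for month, users in sorted(active.items())}
```

Note the simplification: the denominator is every registered user, including those who registered too recently to have reached a given month, so later months are understated for a young project.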
![Page 23: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/23.jpg)
SUMMARY AND BEYOND
![Page 24: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/24.jpg)
SUMMARY
• There is crowdsourcing and crowdsourcing: pick your favs and mix them
• Human intelligence is a valuable resource: experiment design is key
• Sustaining engagement is an art: crowdsourcing analytics may help
• Computers & humans are a powerful mix: the age of ‘social machines’
![Page 25: We are the data](https://reader033.vdocument.in/reader033/viewer/2022042818/55c354febb61eb8f4e8b47fd/html5/thumbnails/25.jpg)
THE AGE OF SOCIAL MACHINES