WE ARE THE DATA: USING CROWDSOURCING FOR BETTER OPEN DATA
Elena Simperl
@esimperl
July 7th, 2015
1
EXECUTIVE SUMMARY
Broad participation and ‘open by default’ are key to keeping momentum and improving the quality of open data
Crowdsourcing helps with open data collection, curation, and analysis
However,
There is crowdsourcing and crowdsourcing: pick your favs and mix them
Human intelligence is a valuable resource: experiment design is key
Sustaining engagement is an art: crowdsourcing analytics may help
Computers & humans are a powerful mix: the age of ‘social machines’
2
THE PROMISE
3
CROWDSOURCING: PROBLEM SOLVING VIA OPEN CALLS
“Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.”
[Source: Howe, 2006]
4
CROWDSOURCING FOR OPEN DATA: PARTICIPATION & INCENTIVES
5
Owners, aggregators/publishers, consumers
EXAMPLE: OPENSTREETMAP OPENSTREETMAP.ORG
OpenStreetMap coverage of Port-au-Prince, Haiti, before and after the 2010 earthquake, (C) Mikel Maron
7
THE CHALLENGE
8
THERE IS CROWDSOURCING AND CROWDSOURCING
9
HOW TO SET UP AN EFFECTIVE CROWDSOURCING PROJECT?
10
EXAMPLE: OPEN ADDRESSES OPENADDRESSESUK.ORG
11
THE ANSWER
12
BASIC PRINCIPLES
Understand your design space
Decide what you want to do
Avoid major pitfalls
Observe and adjust
Be open
13
DIMENSIONS OF CROWDSOURCING (I)
WHAT IS OUTSOURCED
Tasks you can’t run in-house or using computers
A matter of time, budget, resources, ethics etc.
WHO IS THE CROWD
Crowdsourcing ≠ ‘turkers’
‘Open’ call, biased by use of platforms and promotion channels
For individuals and groups
No traditional means to manage and incent
Crowd has little/no context about your project
14
EXAMPLE: PROJECT GUTENBERG GUTENBERG.ORG
WHAT IS OUTSOURCED
Proofreading eBooks
Procuring eligible paper books
Burning CDs/DVDs
Crowdfunding
WHO IS THE CROWD
Anyone
15
EXAMPLE: SF OPEN DATA & KAGGLE DATA.SFGOV.ORG
WHAT IS OUTSOURCED
Data processing tasks (on predefined data sets)
WHO IS THE CROWD
Anyone with the right skillset
16
DIMENSIONS OF CROWDSOURCING (II)
HOW ARE THE TASKS OUTSOURCED
Macro vs. microtasks
Complex workflows
Assessment of answers
WHY DO PEOPLE CONTRIBUTE
Money, love, and glory
Aligning incentives
Observe and adjust
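One common way to handle ‘assessment of answers’ for microtasks is to give the same task to several contributors and accept the majority answer. The sketch below is only an illustration of that idea: the function name `aggregate_by_majority` and the `min_agreement` threshold are assumptions, not something the slides prescribe.

```python
from collections import Counter


def aggregate_by_majority(answers, min_agreement=0.5):
    """Pick the most common answer among redundant crowd answers.

    Returns (label, agreement). The label is None when the share of the
    winning answer does not exceed min_agreement (hypothetical threshold),
    which signals that the task should be re-issued or reviewed.
    """
    if not answers:
        return None, 0.0
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(answers)
    if agreement <= min_agreement:
        return None, agreement
    return label, agreement
```

Majority voting treats all contributors as equally reliable; weighting answers by a contributor's track record is a frequent refinement.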
18
TYPOLOGIES OF CROWDSOURCING
19
Macrotasks
Microtasks
Challenges
Self-organized crowds
Crowdfunding
[Source: Prpić, Shukla, Kietzmann and McCarthy, 2015]
EXAMPLE: STREETBUMP STREETBUMP.ORG
HOW ARE THE TASKS OUTSOURCED
‘Participatory sensing’
People install app that collects acceleration and GPS data to detect road hazards
Microtasks
Aggregated data used to detect potholes
App targets people who drive and own a smartphone
Open innovation
InnoCentive challenge to improve pothole prediction
Macrotask
400,000 experts registered to InnoCentive
Prize of $25,000
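The ‘aggregated data used to detect potholes’ step can be sketched as grouping jolt reports into grid cells and keeping only cells confirmed by several distinct devices. This is a minimal illustration and not StreetBump's actual algorithm: the report format, the function name `candidate_potholes`, and all thresholds are hypothetical.

```python
from collections import defaultdict


def candidate_potholes(reports, cell_size=0.001, min_devices=3, min_jolt=0.5):
    """Return grid cells where enough distinct devices reported a strong jolt.

    reports: iterable of (device_id, lat, lon, jolt) tuples (assumed format).
    cell_size is in degrees; min_devices and min_jolt are illustrative values.
    """
    devices_per_cell = defaultdict(set)
    for device_id, lat, lon, jolt in reports:
        if jolt < min_jolt:
            continue  # ignore weak readings, likely ordinary road vibration
        cell = (round(lat / cell_size), round(lon / cell_size))
        devices_per_cell[cell].add(device_id)  # a set counts each device once
    return sorted(
        cell
        for cell, devices in devices_per_cell.items()
        if len(devices) >= min_devices
    )
```

Requiring several independent devices per cell is one simple way to filter out one-off events such as a driver braking hard or a phone being dropped.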
20
EXAMPLE: DISTRIBUTED PROOFREADERS PGDP.NET
HOW ARE THE TASKS OUTSOURCED
eBook divided into pages, proofread independently
People can decide what they work on
Related: Soylent’s ‘Find-Fix-Verify’
For text shortening, proof-reading, open editing
Uses paid microtask platform
21
[Source: Bernstein et al., 2010]
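The three stages named in ‘Find-Fix-Verify’ can be sketched as a small pipeline in which simulated workers are plain functions. This is a simplified illustration under stated assumptions, not the Soylent implementation: the function signature, the `find_threshold` value, and the way votes are counted are all hypothetical.

```python
from collections import Counter


def find_fix_verify(sentences, finders, fixers, verifiers, find_threshold=0.2):
    """Run a simplified Find-Fix-Verify pass over a list of sentences.

    finders:   functions mapping the sentence list to a set of indices to fix
    fixers:    functions mapping one sentence to a proposed rewrite
    verifiers: functions mapping (original, candidates) to the chosen candidate
    """
    # Find: keep only positions flagged by a sufficient share of finders.
    flagged = Counter()
    for find in finders:
        flagged.update(find(sentences))
    patches = [
        index
        for index, count in sorted(flagged.items())
        if count / len(finders) >= find_threshold
    ]

    result = list(sentences)
    for index in patches:
        original = sentences[index]
        # Fix: collect distinct proposals, preserving their order.
        candidates = list(dict.fromkeys(fix(original) for fix in fixers))
        # Verify: an independent group votes on the proposals.
        votes = Counter(verify(original, candidates) for verify in verifiers)
        if votes:
            result[index] = votes.most_common(1)[0][0]
    return result
```

Splitting the work this way keeps each microtask small and lets one group of contributors check the output of another.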
INCENTIVES AND MOTIVATION
Successful volunteer crowdsourcing is difficult to predict or replicate
Highly context-specific
Building a community from scratch is not trivial
Not applicable to every task
Reward models often easier to study and control (if performance can be reliably measured)
Different models: pay-per-time, pay-per-unit, winner-takes-it-all
Not always easy to abstract from social aspects (free-riding, social pressure)
May undermine intrinsic motivation
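The three reward models listed above differ only in what is measured and who gets paid. The sketch below makes that concrete; the function names and the idea of passing per-worker dictionaries are assumptions made for illustration.

```python
def pay_per_time(hours_worked, hourly_rate):
    """Pay each worker for the time spent, regardless of output."""
    return {worker: hours * hourly_rate for worker, hours in hours_worked.items()}


def pay_per_unit(units_done, unit_price):
    """Pay each worker for every completed unit (e.g. one microtask)."""
    return {worker: units * unit_price for worker, units in units_done.items()}


def winner_takes_all(scores, prize):
    """Give the whole prize to the best-scoring entry; everyone else gets 0."""
    if not scores:
        return {}
    winner = max(scores, key=scores.get)  # on ties, the first entry wins
    return {worker: (prize if worker == winner else 0) for worker in scores}
```

Pay-per-unit and winner-takes-all both need a reliable performance measure, which is the condition the slide attaches to reward models.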
22
COMBINING INTRINSIC MOTIVATION WITH REWARDS
Task design matters as much as payment
‘Gamification’ achieves high accuracy for lower costs and improved engagement
People appreciate social features, but their main motivation is still task-driven
23
See also [Feyisetan et al., 2015]
CROWDSOURCING ANALYTICS
24
[Figure: active users in % (y-axis, 0 to 20) by month since registration (x-axis, 1 to 31)]
See also [Luczak-Rösch et al. 2014]
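A chart of active users by month since registration can be derived from a registration table and an activity log. The sketch below shows one possible calculation; the input formats and function names are hypothetical, and it uses all registered users as the denominator for every month, which is a simplification.

```python
from collections import defaultdict
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end (0 if in the same month)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def active_share_by_month(registrations, activity):
    """Return {month since registration: percent of users active}.

    registrations: dict mapping user to registration date (assumed format)
    activity:      iterable of (user, date) contribution events
    Month 1 is the calendar month in which the user registered.
    """
    active = defaultdict(set)
    for user, day in activity:
        offset = months_between(registrations[user], day)
        if offset >= 0:
            active[offset + 1].add(user)  # a set counts each user once per month
    total = len(registrations)
    return {
        month: 100.0 * len(users) / total
        for month, users in sorted(active.items())
    }
```

A fuller analysis would only count users who registered long enough ago to have reached a given month, so that recent sign-ups do not depress the later values.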
SUMMARY AND BEYOND
25
SUMMARY
• There is crowdsourcing and crowdsourcing: pick your favs and mix them
• Human intelligence is a valuable resource: experiment design is key
• Sustaining engagement is an art: crowdsourcing analytics may help
• Computers & humans are a powerful mix: the age of ‘social machines’
26
THE AGE OF SOCIAL MACHINES
27