we are the data

Upload: elena-simperl

Post on 06-Aug-2015

Page 1: We are the data

WE ARE THE DATA USING CROWDSOURCING FOR BETTER OPEN DATA

Elena Simperl

[email protected]

@esimperl

July 7th, 2015

Page 2: We are the data

EXECUTIVE SUMMARY

Broad participation and ‘open by default’ are key to keeping momentum and improving the quality of open data

Crowdsourcing helps with open data collection, curation, and analysis

However,

• There is crowdsourcing and crowdsourcing: pick your favourites and mix them

• Human intelligence is a valuable resource: experiment design is key

• Sustaining engagement is an art: crowdsourcing analytics may help

• Computers & humans are a powerful mix: the age of ‘social machines’

Page 3: We are the data

THE PROMISE

Page 4: We are the data

CROWDSOURCING: PROBLEM SOLVING VIA OPEN CALLS

“Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.”

[Source: Howe, 2006]

Page 5: We are the data

CROWDSOURCING FOR OPEN DATA: PARTICIPATION & INCENTIVES

[Figure: participants in open data: owners, aggregators / publishers, consumers]

Page 6: We are the data

EXAMPLE: OPENSTREETMAP OPENSTREETMAP.ORG

[Figure: OpenStreetMap coverage of Port-au-Prince, Haiti, before and after the 2010 earthquake. (C) Mikel Maron]

Page 7: We are the data

THE CHALLENGE

Page 8: We are the data

THERE IS CROWDSOURCING AND CROWDSOURCING

Page 9: We are the data

HOW TO SET UP AN EFFECTIVE CROWDSOURCING PROJECT?

Page 10: We are the data

EXAMPLE: OPEN ADDRESSES OPENADDRESSESUK.ORG

Page 11: We are the data

THE ANSWER

Page 12: We are the data

BASIC PRINCIPLES

Understand your design space

Decide what you want to do

Avoid major pitfalls

Observe and adjust

Be open

Page 13: We are the data

DIMENSIONS OF CROWDSOURCING (I)

WHAT IS OUTSOURCED

Tasks you can’t run in-house or using computers

A matter of time, budget, resources, ethics etc.

WHO IS THE CROWD

Crowdsourcing ≠ ‘turkers’

‘Open’ call, biased by use of platforms and promotion channels

For individuals and groups

No traditional means to manage and incentivize

The crowd has little or no context about your project

Page 14: We are the data

EXAMPLE: PROJECT GUTENBERG GUTENBERG.ORG

WHAT IS OUTSOURCED

Proofreading eBooks

Procuring eligible paper books

Burning CDs/DVDs

Crowdfunding

WHO IS THE CROWD

Anyone

Page 15: We are the data

EXAMPLE: SF OPEN DATA & KAGGLE DATA.SFGOV.ORG

WHAT IS OUTSOURCED

Data processing tasks (on predefined data sets)

WHO IS THE CROWD

Anyone with the right skillset

Page 16: We are the data

DIMENSIONS OF CROWDSOURCING (II)

HOW ARE THE TASKS OUTSOURCED

Macro vs. microtasks

Complex workflows

Assessment of answers

WHY DO PEOPLE CONTRIBUTE

Money, love, and glory

Aligning incentives

Observe and adjust
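
One common way to handle "assessment of answers" for microtasks is redundancy: give the same task to several workers and aggregate by majority vote. The sketch below assumes that setup; the function name `majority_vote`, the `min_agreement` threshold and the task ids are hypothetical, and real projects often use weighted or probabilistic aggregation instead.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def majority_vote(
    answers: Dict[str, List[str]], min_agreement: float = 0.5
) -> Dict[str, Tuple[Optional[str], float]]:
    """Aggregate redundant crowd answers per task by majority vote.

    `answers` maps a (hypothetical) task id to the labels submitted by
    different workers. For each task, return the most frequent label and its
    share of the votes. The label is None when its share does not exceed
    `min_agreement`, flagging the task for more votes or expert review.
    """
    results: Dict[str, Tuple[Optional[str], float]] = {}
    for task_id, labels in answers.items():
        if not labels:
            results[task_id] = (None, 0.0)
            continue
        label, count = Counter(labels).most_common(1)[0]
        share = count / len(labels)
        results[task_id] = (label if share > min_agreement else None, share)
    return results


if __name__ == "__main__":
    votes = {
        "addr-1": ["10 Downing St", "10 Downing St", "10 Downing Street"],
        "addr-2": ["yes", "no"],  # a tie: no clear majority
    }
    print(majority_vote(votes))
```

Here `addr-1` resolves to "10 Downing St" with two of three votes, while the tied `addr-2` comes back as `None` and would be sent out again.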

Page 17: We are the data

TYPOLOGIES OF CROWDSOURCING

Macrotasks

Microtasks

Challenges

Self-organized crowds

Crowdfunding

[Source: Prpić, Shukla, Kietzmann and McCarthy, 2015]

Page 18: We are the data

EXAMPLE: STREETBUMP STREETBUMP.ORG

HOW ARE THE TASKS OUTSOURCED

‘Participatory sensing’

People install an app that collects acceleration and GPS data to detect road hazards

Microtasks

Aggregated data used to detect potholes

The app targets people who drive and own a smartphone

Open innovation

InnoCentive challenge to improve pothole prediction

Macrotask

400,000 experts registered with InnoCentive

Prize of $25,000
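
The slide does not say how StreetBump turns raw sensor readings into pothole reports (improving that was the point of the InnoCentive challenge). As a rough illustration of "aggregated data used to detect potholes", here is a minimal sketch assuming each phone uploads bump events with a location and an acceleration peak; the `BumpReport` fields, the 0.4 g threshold, the grid-cell rounding and the three-user minimum are all hypothetical choices, not StreetBump's method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class BumpReport:
    """One bump event uploaded by a phone (hypothetical schema)."""
    user_id: str
    lat: float
    lon: float
    peak_accel_g: float  # size of the vertical acceleration spike, in g


Cell = Tuple[float, float]


def candidate_potholes(
    reports: List[BumpReport],
    accel_threshold_g: float = 0.4,
    cell_decimals: int = 4,
    min_distinct_users: int = 3,
) -> Dict[Cell, int]:
    """Return grid cells where enough distinct users recorded a strong bump.

    Locations are snapped to a grid by rounding coordinates to
    `cell_decimals` places (4 decimals is roughly 11 m of latitude).
    Counting distinct users, not raw events, keeps one driver who passes
    the same spot repeatedly from creating a pothole on their own.
    """
    users_per_cell: Dict[Cell, Set[str]] = defaultdict(set)
    for r in reports:
        if r.peak_accel_g < accel_threshold_g:
            continue  # treat small spikes as ordinary road vibration
        cell = (round(r.lat, cell_decimals), round(r.lon, cell_decimals))
        users_per_cell[cell].add(r.user_id)
    return {
        cell: len(users)
        for cell, users in users_per_cell.items()
        if len(users) >= min_distinct_users
    }
```

The distinct-user rule is one simple way to trade sensitivity for robustness; it also shows why the app's reach (people who drive and own a smartphone) limits where hazards can be detected at all.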

Page 19: We are the data

EXAMPLE: DISTRIBUTED PROOFREADERS PGDP.NET

HOW ARE THE TASKS OUTSOURCED

eBook divided into pages, proofread independently

People can decide what they work on

Related: Soylent’s ‘Find-Fix-Verify’

For text shortening, proofreading, open editing

Uses a paid microtask platform

[Source: Bernstein et al, 2010]
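
Find-Fix-Verify splits an open-ended editing job into three microtask stages handled by different workers: some find problem areas, others propose fixes, and a third group votes on the fixes. Below is a minimal sketch of that control flow, with workers simulated as plain Python callables; the function signatures and the `find_agreement` threshold are assumptions for illustration, not Soylent's implementation.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Set

# Hypothetical worker interfaces: in a real project each call would be a
# microtask posted to a crowdsourcing platform.
FindWorker = Callable[[Sequence[str]], Set[int]]  # indices of sentences to fix
FixWorker = Callable[[str], str]                  # a rewrite of one sentence
VerifyWorker = Callable[[str, List[str]], int]    # index of the best candidate


def find_fix_verify(
    sentences: Sequence[str],
    finders: List[FindWorker],
    fixers: List[FixWorker],
    verifiers: List[VerifyWorker],
    find_agreement: float = 0.2,
) -> List[str]:
    # Find: independent workers flag sentences; keep those flagged by at
    # least `find_agreement` of the finders.
    flags = Counter(i for finder in finders for i in finder(sentences))
    min_flags = max(1, math.ceil(find_agreement * len(finders)))
    flagged = sorted(i for i, n in flags.items() if n >= min_flags)

    result = list(sentences)
    for i in flagged:
        # Fix: a separate group of workers proposes rewrites.
        candidates = [fixer(sentences[i]) for fixer in fixers]
        if not candidates or not verifiers:
            continue  # nothing to choose from, keep the original sentence
        # Verify: a third group votes; the most-voted candidate wins.
        votes = Counter(verifier(sentences[i], candidates) for verifier in verifiers)
        best, _ = votes.most_common(1)[0]
        result[i] = candidates[best]
    return result
```

Keeping the three roles separate means no single worker both decides what is wrong and judges their own fix, which is the main point of the pattern.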

Page 20: We are the data

INCENTIVES AND MOTIVATION

Successful volunteer crowdsourcing is difficult to predict or replicate

Highly context-specific

Building a community from scratch is not trivial

Not applicable to every task

Reward models are often easier to study and control (if performance can be reliably measured)

Different models: pay-per-time, pay-per-unit, winner-takes-all

Not always easy to abstract from social aspects (free-riding, social pressure)

May undermine intrinsic motivation
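
The three reward models named above differ only in what is measured and who gets paid. A minimal sketch, assuming hypothetical per-worker records of hours worked, accepted units and a contest score; note that `pay_per_unit` and `winner_takes_all` both depend on performance being measurable, which is the caveat on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Contribution:
    worker: str
    hours: float          # time spent on the project
    accepted_units: int   # units that passed a quality check
    score: float          # quality score used by the contest model


def pay_per_time(contribs: List[Contribution], hourly_rate: float) -> Dict[str, float]:
    # Everyone is paid for time spent, regardless of output.
    return {c.worker: c.hours * hourly_rate for c in contribs}


def pay_per_unit(contribs: List[Contribution], unit_rate: float) -> Dict[str, float]:
    # Pay only for accepted work; requires a reliable quality check.
    return {c.worker: c.accepted_units * unit_rate for c in contribs}


def winner_takes_all(contribs: List[Contribution], prize: float) -> Dict[str, float]:
    # Contest model: the single best-scoring entry receives the whole prize
    # (ties go to the first entry in the list).
    if not contribs:
        return {}
    best = max(contribs, key=lambda c: c.score)
    return {c.worker: (prize if c is best else 0.0) for c in contribs}
```

Running all three on the same records makes the trade-off concrete: the same effort can be paid very differently depending on which signal the requester trusts.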

Page 21: We are the data

COMBINING INTRINSIC MOTIVATION WITH REWARDS

Task design matters as much as payment

‘Gamification’ achieves high accuracy at lower cost, with improved engagement

People appreciate social features, but their main motivation is still task-driven

See also [Feyisetan et al., 2015]

Page 22: We are the data

CROWDSOURCING ANALYTICS

[Figure: active users (%) by month since registration, months 1 to 31; y-axis from 0% to 20%]

See also [Luczak-Rösch et al. 2014]
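
The curve in the chart (active users as a share of a cohort, per month since registration) can be computed from an activity log. A minimal sketch, assuming registration and activity are recorded as integer month counters; the function name and data layout are hypothetical, and this is not necessarily how Luczak-Rösch et al. define "active".

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def active_share_by_month(
    registered: Dict[str, int],
    activity: List[Tuple[str, int]],
    current_month: int,
    horizon: int,
) -> List[float]:
    """Percentage of users active in their k-th month after registration.

    Months are plain integers (e.g. months since the project launched); a
    user's registration month counts as month 1. Only users registered long
    enough to have reached month k are counted in the denominator for k.
    """
    active: Dict[int, Set[str]] = defaultdict(set)
    for user, month in activity:
        if user in registered:
            k = month - registered[user] + 1
            if 1 <= k <= horizon:
                active[k].add(user)

    shares: List[float] = []
    for k in range(1, horizon + 1):
        eligible = {u for u, m in registered.items() if current_month - m + 1 >= k}
        if eligible:
            shares.append(100.0 * len(active[k] & eligible) / len(eligible))
        else:
            shares.append(0.0)
    return shares
```

Restricting the denominator to users who could have reached month k avoids a curve that drops simply because many people joined recently.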

Page 23: We are the data

SUMMARY AND BEYOND

Page 24: We are the data

SUMMARY

• There is crowdsourcing and crowdsourcing: pick your favourites and mix them

• Human intelligence is a valuable resource: experiment design is key

• Sustaining engagement is an art: crowdsourcing analytics may help

• Computers & humans are a powerful mix: the age of ‘social machines’

Page 25: We are the data

THE AGE OF SOCIAL MACHINES

Page 26: We are the data

THANKS

[email protected]

@esimperl

sociam.org