Crowdsourcing Big Data: Industry (June 25, 2015)
TRANSCRIPT
FCPCCS - Big Data and Crowdsourcing
Unstructured data gets structured (bonus: a system that gets smarter over time)
[Diagram: Adaptive System with components Machine Learning, Optimization, Human Annotation, Prediction Engine, Structured Data Reports, and Action]
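To make the diagram concrete, here is a minimal sketch (illustrative only, not the speakers' actual system) of the adaptive loop it describes: humans annotate a seed batch, a model is trained, the prediction engine labels the rest, and low-confidence items are routed back to annotators so the system gets smarter over time. The callables request_human_labels and train_model, and the model.predict(doc) -> (label, confidence) interface, are assumptions made up for the example.

```python
# A minimal sketch of the adaptive loop, assuming hypothetical callables:
#   request_human_labels(docs) -> list of (doc, label) pairs   (Human Annotation)
#   train_model(labeled)       -> model                        (Machine Learning)
#   model.predict(doc)         -> (label, confidence)          (Prediction Engine)
def adaptive_loop(unlabeled_docs, request_human_labels, train_model,
                  batch_size=100, confidence_threshold=0.8):
    # Seed the loop with a human-annotated batch.
    labeled = request_human_labels(unlabeled_docs[:batch_size])
    unlabeled_docs = unlabeled_docs[batch_size:]
    while unlabeled_docs:
        model = train_model(labeled)
        scored = [(doc, *model.predict(doc)) for doc in unlabeled_docs]
        confident = [(doc, label) for doc, label, conf in scored
                     if conf >= confidence_threshold]
        uncertain = [doc for doc, label, conf in scored
                     if conf < confidence_threshold]
        # Optimization: route the hard (low-confidence) cases back to annotators,
        # so each pass of the loop trains on more and better-targeted labels.
        labeled += request_human_labels(uncertain[:batch_size])
        unlabeled_docs = uncertain[batch_size:]
        # Structured Data Reports: everything labeled so far, ready to act on.
        yield labeled + confident
```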
Finding Relevant News Articles
[Bar chart: % analyst time saved and % accuracy (compared to humans) across News Category 1, News Category 2, News Category 4, Manufacturing, and Health Sciences; values range from 73% to 99%]
Efficiency of human time is a major benefit
The importance of definition
• If people can’t agree on what’s-in and what’s-out, it’s hard to train a machine
• In our case toxicity was defined as:
  • ad hominem attacks (directed at specific people)
  • bigoted comments (e.g., sexist, racist, homophobic, etc.)
• Set definitions
• Then see if people are consistent
• Run pilots
• Do inter-annotator agreement
• Iterate
Quick recommendation for inter-annotator agreement
• You can measure consistency; probably the best way is Krippendorff’s alpha
• Don’t use percentage agreement! Particularly when data are skewed towards one category
• If 95% of the data fall under one category label, then random coding would still have two people agree so much that % agreement would make you think you had a reliable study (even though you wouldn’t; the sketch below illustrates this)
• And you can ALSO use models to check these things
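To see the point concretely, here is a small self-contained sketch (Python, illustrative data; not from the talk) that computes both percentage agreement and Krippendorff's alpha for two coders on nominal labels. With 95% of items in one category, two coders labeling independently at random still agree about 90% of the time, while alpha correctly comes out near zero.

```python
# A minimal sketch: Krippendorff's alpha for two coders with nominal labels
# and no missing data, compared against raw % agreement on skewed data.
from collections import Counter
from itertools import product
import random

def percent_agreement(coder_a, coder_b):
    """Fraction of items on which the two coders chose the same label."""
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    # Build the coincidence matrix: each unit contributes both ordered pairs.
    pairs = Counter()
    for a, b in zip(coder_a, coder_b):
        pairs[(a, b)] += 1
        pairs[(b, a)] += 1
    n = sum(pairs.values())          # total pairable values (2 * number of units)
    totals = Counter()
    for (a, _), count in pairs.items():
        totals[a] += count
    # Observed vs. expected disagreement over all unequal label pairs.
    d_obs = sum(count for (a, b), count in pairs.items() if a != b) / n
    d_exp = sum(totals[a] * totals[b]
                for a, b in product(totals, repeat=2) if a != b) / (n * (n - 1))
    return 1.0 - d_obs / d_exp

# Two coders labeling 1,000 items independently at random, where 95% of
# labels are "not_toxic": % agreement looks great, alpha is near zero.
random.seed(0)
pool = ["not_toxic"] * 95 + ["toxic"] * 5
coder_a = [random.choice(pool) for _ in range(1000)]
coder_b = [random.choice(pool) for _ in range(1000)]
print(f"% agreement: {percent_agreement(coder_a, coder_b):.2f}")           # ~0.90
print(f"alpha:       {krippendorff_alpha_nominal(coder_a, coder_b):.2f}")  # ~0.0
```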
Collect data and annotations, then interrogate them
• Human annotations
• Which people/categories should we be wary of? (one way to check is sketched below)
• Which annotations do we select to train a model with?
• A classifier that can predict unseen data
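One simple way to ask "which annotators should we be wary of, and which annotations do we train on" is to score each annotator against the per-item majority vote. The sketch below is an assumed, illustrative approach (annotator IDs, item IDs, and the 0.5 threshold are made up), not the speakers' actual pipeline.

```python
# A minimal sketch of interrogating crowdsourced annotations: score annotators
# against the per-item majority vote, flag outliers, keep majority labels.
from collections import Counter, defaultdict

annotations = [  # (annotator_id, item_id, label) -- toy data
    ("ann_1", "doc_1", "toxic"), ("ann_2", "doc_1", "toxic"), ("ann_3", "doc_1", "not_toxic"),
    ("ann_1", "doc_2", "not_toxic"), ("ann_2", "doc_2", "not_toxic"), ("ann_3", "doc_2", "toxic"),
]

# Majority label per item.
by_item = defaultdict(list)
for annotator, item, label in annotations:
    by_item[item].append(label)
majority = {item: Counter(labels).most_common(1)[0][0] for item, labels in by_item.items()}

# How often does each annotator agree with the majority?
scores = defaultdict(lambda: [0, 0])  # annotator -> [agreements, total]
for annotator, item, label in annotations:
    scores[annotator][1] += 1
    scores[annotator][0] += int(label == majority[item])
reliability = {a: agree / total for a, (agree, total) in scores.items()}

# Annotators to be wary of, and the labels we would keep for model training.
suspect = {a for a, r in reliability.items() if r < 0.5}
training_labels = dict(majority)
print(reliability, suspect, training_labels)
```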
Processing millions of SMS in 12 African languages
• Intent of sender (e.g., report a problem, ask a question or make a suggestion)
• Categorization (e.g., orphans and vulnerable children, violence against children, health, nutrition)
• Language detection (e.g., English, Acholi, Karamojong, Luganda, Nkole, Swahili, Lango)
• Location (e.g., village names)
Top 3 categories in Nigeria
• Health: 39.44%
• U-report support: 17.68%
• Employment: 9.69%
Negative topics in Walmart employee reviews
[Bubble chart of mention counts by topic: Hours/Benefits, Management, Work/life balance, Company Values, Dealing With Customers, Training & Expectations, Low Pay; counts range from 518 to 2,404]
Common Pros and Cons among Employees
[Two bar charts comparing current vs. former employees. Pros: good co-workers, fits my schedule, pay/opportunities (values from 17% to 41%). Cons: management, training/expectations, low pay, work/life balance (values from 12% to 24%)]
Structuring unstructured data lets you combine it with other metadata
How else do you verify?
We assess model accuracy using cross-validation. Instead of using all annotated data to train a model, you hold out a random 10% and build the model with the rest. Then you predict against that held-out 10%. You do this 10 times and average the accuracy.
Precision measures “if we automatically label something as X, how often are we right?” Recall measures “how much of the stuff that SHOULD have label X is actually given label X?”
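As a concrete illustration (assuming scikit-learn and toy stand-in data; not the speakers' code), the 10-fold procedure described above looks roughly like this, with precision and recall reported alongside accuracy:

```python
# A minimal sketch of 10-fold cross-validation for a simple text classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

texts = ["great coworkers", "low pay", "flexible schedule", "bad management"] * 25
labels = ["pro", "con", "pro", "con"] * 25  # toy labels

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())

# 10 folds: each fold holds out ~10% of the data, trains on the rest,
# predicts on the held-out slice; scores are averaged across folds.
scores = cross_validate(
    pipeline, texts, labels, cv=10,
    scoring={"accuracy": "accuracy",
             "precision": "precision_macro",  # "if we label something X, how often are we right?"
             "recall": "recall_macro"},       # "how much of the true X did we actually label X?"
)
for metric in ("accuracy", "precision", "recall"):
    print(metric, scores[f"test_{metric}"].mean())
```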
The system gets smarter
Here’s what happens across the first 2,543 annotations on one REALLY low-signal classification task
By 9,744 annotations, our accuracy is 97%
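A standard way to check that accuracy really climbs with more annotations is a learning curve: retrain on growing slices of the labeled data and measure held-out accuracy at each size. A rough sketch, assuming scikit-learn and synthetic stand-in data (not the task from the slide):

```python
# A minimal sketch of a learning curve: accuracy vs. number of annotations.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for annotated documents.
rng = np.random.default_rng(0)
pos = ["helpful staff good benefits", "great team flexible hours"]
neg = ["rude management low pay", "long hours poor training"]
labels = rng.integers(0, 2, 500)
texts = [str(rng.choice(pos if y else neg)) for y in labels]

pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
sizes, _, test_scores = learning_curve(
    pipeline, texts, labels, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% ... 100% of the training data
)
for n, score in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n:4d} annotations -> held-out accuracy {score:.2f}")
```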
Other tasks are more straightforward
F-scores go up with more annotations
[Line chart: F-score (0.0 to 1.0) vs. number of paragraphs annotated (50 to 200), with one line per extracted field: Disease, Country, Reported_deaths, Reported_cases, Date (elsewhere labeled Issue, Location, People affected, # of deaths, Event date)]
Project workflow
Phase 1: Data
• Data capture, normalization and loading
Phase 2: Discovery
• Topic discovery
• Category creation
• Expert data annotation
• Category verification
Phase 3: Training
• Guideline creation
• Annotator validation
• Model training
Phase 4: Optimization
• Model evaluation
• Category refinement
Phase 5: Model Deployment
• Full system integration
• Model performance
• Metrics reporting