why sentiment analysis is a market for lemons … and how to fix it

Language Intelligence Why Sentiment Analysis is a Market for Lemons … and How to Fix it Robert Munro


Post on 12-Apr-2017


TRANSCRIPT

Page 1: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

Language Intelligence

Why Sentiment Analysis is a Market for Lemons … and How to Fix it

Robert Munro

Page 2: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

With thanks!

Gary King & Jana Thompson:

<- other Idibon people here: Michelle Casbon & Nick Gaylord

Page 3: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

What is a market for lemons?

• Information asymmetry between buyers and sellers, leaving only "lemons" behind (George Akerlof)
• Buyers cannot distinguish good from bad products
• Prices are equally low for all products
• The resulting adverse selection problem drives the high-quality products from the market

Page 4: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

Competition is not increasing accuracy

• 100+ companies offering some form of sentiment analysis
• Accuracy hovering around 70% for real-world applications for almost a decade

Page 5: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

The most honest sentiment analysis results you will see

                                  Recall                        Precision                     F-Score
System         Accuracy  F-Score  Positive  Negative  Neutral   Positive  Negative  Neutral   Positive  Negative  Neutral
Semantria      0.59      0.59     0.56      0.47      0.78      0.68      0.80      0.45      0.62      0.59      0.57
MonkeyLearn    0.50      0.38*    0.84      0.54      0.00      0.45      0.60      0.00      0.59      0.57      0.00
MetaMind       0.66      0.66     0.68      0.46      0.88      0.78      0.88      0.50      0.73      0.60      0.64
Idibon Public  0.68      0.67     0.76      0.75      0.49      0.66      0.69      0.72      0.71      0.72      0.58

• Even within the best results for one domain, there is no clear leader when broken down by category
• All systems could have best results in other domains
• All could adapt here: MonkeyLearn had errors with the ‘Neutral’ category, but we are sure they could update their models

Source: Sentiment 140 corpus, 3-way sentiment on social data: http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip
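The columns in the table above can be related to each other with a short sketch. It assumes the standard definitions of per-class precision, recall and F-score, and it checks one observation from the numbers themselves: the overall F-Score column is within 0.01 of the unweighted mean of the three per-class F-scores. The slide does not say how the overall figure was computed, so that reading is an assumption, and the toy labels are invented for illustration.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")

def per_class_scores(gold, predicted, labels=LABELS):
    """Return {label: (precision, recall, f_score)} for two parallel label lists."""
    gold_count = Counter(gold)
    pred_count = Counter(predicted)
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)
    scores = {}
    for label in labels:
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / gold_count[label] if gold_count[label] else 0.0
        total = precision + recall
        f_score = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f_score)
    return scores

def macro_f(scores):
    """Unweighted mean of the per-class F-scores."""
    return sum(f for _, _, f in scores.values()) / len(scores)

# Toy data, invented for illustration only.
gold = ["positive", "positive", "negative", "neutral", "neutral", "negative"]
pred = ["positive", "neutral", "negative", "neutral", "positive", "negative"]
scores = per_class_scores(gold, pred)
print(scores, macro_f(scores))

# Figures copied from the table: overall F-Score, then per-class F-scores
# (positive, negative, neutral).
reported = {
    "Semantria": (0.59, (0.62, 0.59, 0.57)),
    "MonkeyLearn": (0.38, (0.59, 0.57, 0.00)),
    "MetaMind": (0.66, (0.73, 0.60, 0.64)),
    "Idibon Public": (0.67, (0.71, 0.72, 0.58)),
}
# Gap between the reported overall F-Score and the mean of the per-class values.
macro_gap = {
    name: abs(sum(per_class) / len(per_class) - overall)
    for name, (overall, per_class) in reported.items()
}
print(macro_gap)
```

A class that a system never predicts correctly, such as MonkeyLearn's ‘Neutral’ here, contributes a zero to that mean, which would explain an overall F-score well below the accuracy.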

Page 6: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

Data beats algorithms; feedback beats data

[Chart: "Distinguishing the correct ‘Ford’". Precision, recall and F-value plotted on a 0% to 100% axis; data labels shown: 0.457, 0.473, 0.615, 0.948.]

Distinguishing “Ford” the company from people called “Ford”

Page 7: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

Consumers are uncertain

• When consumers try out-of-domain analysis, they lose confidence from the poor results.
• Domain-dependence means that even bad models will be accurate in some areas
• Consumers can only evaluate anecdotally or by precision, not recall
• Uncertainty prevails
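The point about precision and recall can be made concrete with a small sketch. It assumes a buyer who reviews only the items a system flagged: that review yields a precision estimate, while recall also needs gold labels for items the system never flagged. The function names, document ids and labels are hypothetical.

```python
def precision_from_review(review_agrees):
    """Share of flagged items where the reviewer agreed with the system."""
    return sum(review_agrees) / len(review_agrees)

def recall_from_gold(gold_ids, flagged_ids):
    """Share of truly relevant items that the system flagged."""
    gold = set(gold_ids)
    return len(gold & set(flagged_ids)) / len(gold)

# Hypothetical: the system flags four documents as negative and the
# reviewer agrees with three of them.
flagged_ids = [1, 2, 3, 4]
review_agrees = [True, True, True, False]
precision = precision_from_review(review_agrees)

# Recall needs every truly negative document, including ones the system
# never surfaced (ids 7, 8, 9 here), which a buyer rarely has.
gold_negative_ids = [1, 2, 3, 7, 8, 9]
recall = recall_from_gold(gold_negative_ids, flagged_ids)

print(precision, recall)
```

In this toy case the system looks good on the reviewed output while missing half of the negative documents.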

Page 8: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

Market forces are not breeding innovation

• Can’t innovate through code alone
• More training data!
• But low price-points mean low margins
• Lack of capital to find & label enough training data

Page 9: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

The Solution

• A different economic model for useful sentiment analysis:
• Data-sharing for more accurate training data
• Protecting sensitive data from public release

Page 10: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

[Diagram: hybrid architecture with a firewall. Components: Machine learning, Optimization, Human annotation, Cloud prediction engine, On-site prediction engine, Actionable intelligence. Labelled flows: Copy & Sync Models; App Requests; Ambiguous, Novel & Interesting Items. Legend: Internal Data Flow, Hybrid Model Data Flow, Application Data Flow.]
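One way to read the hybrid flow is sketched below: an on-site engine answers app requests with a model copied from the cloud, and queues only the items it is unsure about for human annotation. This is a minimal illustration, not Idibon's implementation. The class name, the toy word-vote model, the 0.6 threshold and the queue are all hypothetical, and the slide does not specify which data crosses the firewall.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class OnSiteEngine:
    """Hypothetical stand-in for the on-site prediction engine."""
    lexicon: dict = field(default_factory=dict)   # word -> label (toy model)
    threshold: float = 0.6                        # assumed confidence cut-off
    annotation_queue: list = field(default_factory=list)

    def sync_model(self, cloud_lexicon):
        """'Copy & Sync Models': replace the local model with a copy."""
        self.lexicon = dict(cloud_lexicon)

    def handle_request(self, text):
        """'App Requests': return (label, confidence) and queue unsure items."""
        words = text.lower().split()
        votes = Counter(self.lexicon[w] for w in words if w in self.lexicon)
        if votes:
            label, count = votes.most_common(1)[0]
            confidence = count / sum(votes.values())
        else:
            # No known words: treat the item as novel.
            label, confidence = "neutral", 0.0
        if confidence < self.threshold:
            # 'Ambiguous, Novel & Interesting Items' go to human annotation.
            self.annotation_queue.append(text)
        return label, confidence

engine = OnSiteEngine()
engine.sync_model({"love": "positive", "great": "positive", "hate": "negative"})

clear = engine.handle_request("I love this great phone")
mixed = engine.handle_request("love the screen but hate the battery")
novel = engine.handle_request("meh")

print(clear, mixed, novel)
print(engine.annotation_queue)
```

Only the mixed and novel requests end up in the queue, so human effort goes to the items most likely to improve the shared model.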

Page 11: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

The Benefits

• Multiple organizations can share in the benefits of better sentiment analysis, without sacrificing privacy
• Single point of human-contact: no expensive duplicate manual labeling of data
• Keeps lemons out of the market

Page 12: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

Idibon Public: our implementation

• Free product, offered in addition to our enterprise Idibon Studio and Idibon Terminal solutions

Page 13: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

Applies to NLP and Machine Learning more broadly

Every human communication

• Any task can be bundled this way
• Allows margins for use cases that were not otherwise viable
• … including the full diversity of languages, priced out when everyone started in English

Page 14: Why Sentiment Analysis is a Market for Lemons … and How to Fix it

Language Intelligence

Why Sentiment Analysis is a Market for Lemons … and How to Fix it

QUESTIONS?

Robert Munro