Text, Topics, and Turkers. Hypertext 2015 1

Text, Topics, and Turkers: A Consensus Measure for Statistical Topics

Fred Morstatter†, Jürgen Pfeffer‡, Katja Mayer*, Huan Liu†

†Arizona State University, Tempe, Arizona, USA

‡Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

*University of Vienna, Vienna, Austria

Text

• Text is everywhere in research.
• Text is huge:

Source           Size
Wikipedia        36 million pages
World Wide Web   100+ billion static web pages
Social Media     500 million new tweets/day

• Too much data to read.
• How can we understand what is going on in big text data?

Topics

• Topic Modeling
• Latent Dirichlet Allocation (LDA)
  – Most commonly-used topic modeling algorithm
  – Discovers “topics” within a corpus

Corpus → LDA (K)

Topic ID   Words
Topic 1    cat, dog, horse, ...
Topic 2    ball, field, player, ...
...        ...
Topic K    red, green, blue, ...

            Topic 1   Topic 2   ...   Topic K
Document1   0.2       0.1       ...   0.01
Document2   0.7       0.02      ...   0.1
...
Documentn   0.1       0.3       ...   0.01

Topics

LDA (K = 10)

Topic ID   Words
Topic 1    river, lake, island, mountain, area, park, antarctic, south, mountains, dam
Topic 2    relay, athletics, metres, freestyle, hurdles, ret, divisão, athletes, bundesliga, medals
...        ...
Topic 10   courcelles, centimeters, mattythewhite, wine, stamps, oko, perennial, stubs, ovate, greyish

            Topic 1   Topic 2   ...   Topic 10
Document1   0.2       0.1       ...   0.01
Document2   0.7       0.02      ...   0.1
...
Documentn   0.1       0.3       ...   0.01

Topics

• How can we measure the quality of statistical topics?

• We don’t know how well humans can interpret topics.

• Problem: Does their understanding match what is going on in the corpus?

Turkers

• One Solution: Crowdsourcing
• Example: Amazon’s Mechanical Turk
  – Show LDA results to Turkers
  – Gauge their understanding
  – How to effectively measure understanding?

Turkers

• Previous Work: Chang et al. 2009
  – “Word Intrusion”
  – “Topic Intrusion”

[Diagram: the Corpus → LDA (K) pipeline and its two output tables from the earlier “Topics” slide, annotated with “Word Intrusion” and “Topic Intrusion”.]

Word Intrusion

• Show the Turker 6 words in random order
  – Top 5 words from topic
  – 1 “Intruded” word
  – Ask Turker to choose “Intruded” word

Topic i: cat dog bird truck horse snake

[Chang et al. 2009]
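The word-intrusion setup above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from Chang et al. or from this talk: the function names are hypothetical, and the intruder word is simply passed in by the caller, since the slide does not say how it is selected.

```python
import random


def build_word_intrusion_task(topic_words, intruder, rng):
    """Return (shuffled_options, intruder) for one word-intrusion question.

    topic_words: a topic's words, most probable first.
    intruder: a word that does not belong to the topic (chosen by the caller).
    """
    top_five = list(topic_words[:5])
    if intruder in top_five:
        raise ValueError("intruder must not be one of the topic's top words")
    options = top_five + [intruder]
    rng.shuffle(options)  # present the 6 words in random order
    return options, intruder


def intrusion_accuracy(answers, intruder):
    """Fraction of Turker answers that picked the true intruder."""
    return sum(answer == intruder for answer in answers) / len(answers)


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
options, intruder = build_word_intrusion_task(
    ["cat", "dog", "bird", "horse", "snake", "fish"], "truck", rng
)
```

A topic whose intruder is found by most Turkers is presumably easy to interpret; `intrusion_accuracy` is one simple way to summarise that.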

Topic Intrusion

• Show the Turker a document
• Show the Turker 4 topics
  – 3 most probable topics
  – 1 “Intruded” topic
  – Ask Turker to choose “Intruded” topic

Documenti: Topic A, Topic B, Topic C, Topic D

[Chang et al. 2009]
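A topic-intrusion question can be assembled from one row of the document-topic table. The sketch below is illustrative only: the function name is hypothetical, and drawing the intruder uniformly from the remaining topics is an assumption, because the slide does not state how the intruded topic is picked.

```python
import random


def build_topic_intrusion_task(doc_topic_probs, rng):
    """Return (shuffled_options, intruder) for one document.

    doc_topic_probs: dict mapping topic id -> probability for this document.
    """
    ranked = sorted(doc_topic_probs, key=doc_topic_probs.get, reverse=True)
    if len(ranked) < 4:
        raise ValueError("need at least 4 topics to build a task")
    top_three = ranked[:3]             # the 3 most probable topics
    intruder = rng.choice(ranked[3:])  # assumption: uniform over the rest
    options = top_three + [intruder]
    rng.shuffle(options)
    return options, intruder


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
probs = {"A": 0.50, "B": 0.30, "C": 0.15, "D": 0.04, "E": 0.01}  # toy values
options, intruder = build_topic_intrusion_task(probs, rng)
```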

New Measure: Topic Consensus

[Diagram: the Corpus → LDA (K) pipeline and its two output tables from the earlier “Topics” slide, annotated with “Word Intrusion”, “Topic Intrusion”, and the new “Topic Consensus”.]

• Complements existing framework
• Measures topic quality with corpus.

Topic Consensus: Intuition

• Measures the agreement between topics and “sections” they come from.

[Figure: LDA Distribution vs. Turker Distribution]

Topic Consensus: Calculation

• We are comparing probability distributions.
• Jensen-Shannon Divergence.

[Figure: Turker Distribution vs. LDA Distribution]
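To make the comparison concrete, here is a minimal sketch of the Jensen-Shannon divergence between a Turker distribution and an LDA distribution. The vote list and the LDA values are toy numbers invented for illustration. The sketch compares distributions over three sections only and does not reproduce how the talk turns the divergence into the final Topic Consensus score.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical inputs, 1 for disjoint ones."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


sections = ["SH", "LS", "PE"]

# Toy example: 8 Turker answers for one topic (invented for illustration).
votes = ["LS", "LS", "LS", "LS", "LS", "LS", "PE", "SH"]
turker_dist = [votes.count(s) / len(votes) for s in sections]

# Toy example: share of the topic attributed to each section by LDA (invented).
lda_dist = [0.1, 0.8, 0.1]

divergence = jensen_shannon_divergence(turker_dist, lda_dist)
```

With base-2 logarithms the divergence is bounded by 1, so a small value means the Turkers' picture of the topic is close to the LDA distribution.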

Dataset

• Scientific Abstracts
• All available abstracts since 2007.
• Classified into three areas:
  – Social Sciences & Humanities (SH)
  – Life Sciences (LS)
  – Physical Sciences (PE)
• Ran LDA on this dataset:
  – K = [10, 25, 50, 100]
  – 185 topics; 4 topic sets.

Turkers

• One task:

• Turkers have 3 + 1 options.
• Each task solved 8 times.

Results

[Figure: results by Topic Set (ERC-10, ERC-25, ERC-50, ERC-100). Topics called out: “new, group, results, plan, class, ...” and “selection, variation, population, genetic, natural, ...”.]

Other Topic Sets

• LDA Topics
  – Use New York Times dataset from one day.
    25 topics, 1 topic set
• Hand-Picked Topics
  – Pure “Social Science & Humanities”
    • Sampled words that occur only in these documents.
    11 topics, 1 topic set
  – Random Topics
    • Randomly choose topics according to word distribution of corpus.
    25 topics, 1 topic set

Results

[Figure: results by Topic Set (ERC-10, ERC-25, ERC-50, ERC-100, NYT-25, RAND-25, SH-25).]

Overview of the Process

• Topic Consensus can reveal new information about the topics being studied.
  – Can measure topics from a new perspective.
  – Can help reveal topic confusion.
• Drawbacks:
  – Expensive
  – Time Consuming
  – Scalability

Automated Measures

1. Topic Size: Number of tokens assigned to the topic.

2. Topic Coherence: Probability that the top words co-occur in documents in the corpus.

3. Topic Coherence Significance: Significance of Topic Coherence compared to other topics.

4. Normalized Pointwise Mutual Information: Measures the association between the top words in the topics.
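As an illustration of measure 4, the sketch below computes normalized pointwise mutual information from document-level co-occurrence and averages it over all pairs of a topic's top words. This is one common formulation, assumed here for illustration. The talk does not specify the exact variant, and the function names and the edge-case conventions (-1 for words that never co-occur, 1 for words that co-occur in every document) are choices made for this sketch.

```python
import math
from itertools import combinations


def npmi(word_a, word_b, documents):
    """NPMI of two words, estimated from document co-occurrence.

    documents: list of sets of words. Result lies in [-1, 1].
    """
    n = len(documents)
    p_a = sum(word_a in doc for doc in documents) / n
    p_b = sum(word_b in doc for doc in documents) / n
    p_ab = sum(word_a in doc and word_b in doc for doc in documents) / n
    if p_ab == 0:
        return -1.0  # convention: the words never co-occur
    if p_ab == 1:
        return 1.0   # convention: the words co-occur in every document
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def topic_npmi(top_words, documents):
    """Average NPMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, documents) for a, b in pairs) / len(pairs)


# Toy corpus, invented for illustration.
docs = [{"cat", "dog"}, {"cat", "dog"}, {"ball"}, {"field"}]
score = topic_npmi(["cat", "dog", "ball"], docs)
```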

Measures

• Herfindahl-Hirschman Index (HHI)
  – Measures concentration of a market.
  – Used to find monopolies.
  – Viewed from two perspectives:

5. Word Probability HHI [figure: "vaccine", "disease", "cure", "medicine", ...]

6. ERC Section HHI [figure: Social Sciences, Physical Sciences, Life Sciences]
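The HHI itself is the sum of squared shares. A minimal sketch follows, with the input normalised first so that raw counts or probabilities both work. The example vectors are invented, and the talk does not state whether the word-probability variant uses all words of a topic or only the top ones.

```python
def herfindahl_hirschman_index(shares):
    """Sum of squared shares after normalising the input to sum to 1.

    Returns 1.0 when one entry holds everything (a "monopoly") and
    1/n when n entries are perfectly balanced.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must have a positive sum")
    return sum((s / total) ** 2 for s in shares)


# Toy inputs, invented for illustration.
word_probability_hhi = herfindahl_hirschman_index([0.4, 0.3, 0.2, 0.1])
section_hhi = herfindahl_hirschman_index([0.8, 0.1, 0.1])  # SH, PE, LS shares
```

Read this way, a topic concentrated on a few words or on a single ERC section scores close to 1, while an evenly spread topic scores close to 1/n.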

Results - Correlation

Automated Measure                         Correlation
Topic Size                                -0.532
Topic Coherence                           -0.584
Topic Coherence Significance              -0.788
Normalized Pointwise Mutual Information   -0.774
HHI (Word Probability)                    -0.885
HHI (ERC Section)                         -0.478

Results - Prediction

• Build a model to predict the actual Topic Consensus value.

• Build linear regression model:
  – Takes automated measures.
  – Predicts Topic Consensus.

• RMSE: 0.12 ± 0.02.
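The prediction step can be sketched as ordinary least squares followed by an RMSE computation. Everything here is an assumption made for illustration: the function names are hypothetical, the two-feature toy data are invented and noise-free, and the sketch does not reproduce the evaluation protocol behind the reported ± value, which the slide does not describe.

```python
import math


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations.

    rows: list of feature tuples (e.g. automated measures per topic).
    Returns [intercept, w1, w2, ...].
    """
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Augmented system (X^T X | X^T y).
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict(weights, row):
    """Apply the fitted model to one feature tuple."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Toy data, invented for illustration: target = 0.5 - 0.2*x1 + 0.1*x2, no noise.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
targets = [0.5 - 0.2 * x1 + 0.1 * x2 for x1, x2 in rows]

weights = fit_linear_regression(rows, targets)
error = rmse([predict(weights, row) for row in rows], targets)
```

In practice the error would be measured on held-out topics rather than on the training rows used here.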

Acknowledgements

• Members of the DMML lab

• Office of Naval Research through grant N000141410095

• LexisNexis and HPCC Systems

Conclusion

• Introduced a new method for evaluating the interpretability of statistical topics.

• Demonstrated this measure on a real-world dataset.

• Automated this measure for scalability.

Future Work

• How sensitive are measures to top words?
  – Word Intrusion uses 5
  – Topic Intrusion uses 5
  – Topic Consensus uses 25

• How do measures fare on different datasets?

• Other measures that can reveal quality topics?

Auxiliary Slides

User Demographics

[Figure panels: Sex, Education, Age, First Language, Country of Origin]

Results – Confusion Matrix

Dataset Statistics