Trend Detection and Analysis on Twitter

Benjamin Räthlein, Henning Muszynski, Lukas Masuch

Uploaded by lukas-masuch on 14 August 2015

Motivation

• Predicting the stock market in real time
• Detecting influenza epidemics
• Predicting crime automatically

“Successful results of mainly research-based projects helped to open up new business opportunities”

Twitter

Bag of Words

Architecture
• Twitter Streaming API (Twython)
• Early Trend Detector
  • Bag-of-words (hashtags, mentions)

Bag          Count
#newyear     7
#christmas   6
@bigdata     2
@sap         3
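The bag-of-words step can be sketched in a few lines of Python. This is a minimal illustration, assuming tweets arrive as plain strings; the regular expression only approximates Twitter's hashtag and mention rules (the tweet JSON also carries pre-parsed entities.hashtags and entities.user_mentions, which a real pipeline could use instead). The name bag_of_words is hypothetical.

```python
import re
from collections import Counter

# Approximate pattern: "#" or "@" followed by word characters, not preceded by a
# word character (so the "@" inside an e-mail address is not counted).
TOKEN_PATTERN = re.compile(r"(?<!\w)([#@]\w+)")


def bag_of_words(tweets):
    """Count hashtags and user mentions (case-insensitive) across a batch of tweets."""
    bag = Counter()
    for text in tweets:
        for token in TOKEN_PATTERN.findall(text):
            bag[token.lower()] += 1
    return bag


if __name__ == "__main__":
    sample = [
        "Happy #NewYear from @SAP!",
        "#newyear #christmas leftovers",
        "Talking #BigData with @bigdata",
    ]
    for token, count in bag_of_words(sample).most_common():
        print(token, count)
```

Lowercasing merges variants such as #NewYear and #newyear into one bag; whether the original detector did this is not stated.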

Statistical Measurement

Architecture
• Twitter Streaming API (Twython)
• Early Trend Detector
  • Bag-of-words (hashtags, mentions)
  • Statistical measurement (growth, average usage, retweets, participating users…)

Report statistics (every 20 minutes):
• Total hashtags & user mentions
• Count per hashtag/mention
• Usage growth per hashtag/mention
• Participating users per hashtag/mention
• Retweet count per hashtag/mention
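One 20-minute report could be computed roughly as follows. This sketch assumes each tweet has already been reduced to a dict with the hypothetical keys user, tokens and is_retweet; growth is taken here as the relative change against the previous report, which is one plausible reading of "usage growth", not necessarily the authors' definition.

```python
from collections import Counter, defaultdict


def build_report(tweets, previous_counts=None):
    """Summarise one reporting window of tweets.

    Each tweet is assumed to be a dict with the hypothetical keys
    'user' (str), 'tokens' (list of hashtags/mentions) and 'is_retweet' (bool).
    """
    counts = Counter()
    users = defaultdict(set)
    retweets = Counter()
    for tweet in tweets:
        for token in tweet["tokens"]:
            counts[token] += 1
            users[token].add(tweet["user"])
            if tweet["is_retweet"]:
                retweets[token] += 1

    previous_counts = previous_counts or {}
    growth = {}
    for token, count in counts.items():
        before = previous_counts.get(token, 0)
        # Relative growth versus the previous window; None when the token is new.
        growth[token] = (count - before) / before if before else None

    return {
        "total": sum(counts.values()),
        "counts": dict(counts),
        "growth": growth,
        "participants": {t: len(u) for t, u in users.items()},
        "retweets": {t: retweets[t] for t in counts},
    }
```

Calling build_report once per window and passing the previous window's counts as previous_counts yields the per-token series that the time series step works on.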

Time Series Analysis

Architecture
• Twitter Streaming API (Twython)
• Early Trend Detector
  • Bag-of-words (hashtags, mentions)
  • Statistical measurement (growth, average usage, retweets, participating users…)
  • Anomaly detection
  • Time series analysis

Calculated for every hashtag and user mention, every 2 / 4 hours, based on the 20-minute reports

Anomaly detection using:
• Relative & absolute fluctuation
• Total occurrences (sum)
• Minimum occurrences
• Maximum occurrences
• Average occurrences
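These features can be computed per hashtag or mention from a window of report counts (6 reports for 2 hours, 12 for 4 hours, at 20 minutes each). The sketch below is only an illustration under assumptions: the exact definitions of the two fluctuation measures and the thresholds in is_anomalous are hypothetical, since the talk names the features but not the decision rule.

```python
from statistics import mean


def series_features(counts):
    """Summary statistics for one hashtag/mention over a non-empty window of reports."""
    avg = mean(counts)
    lo, hi = min(counts), max(counts)
    return {
        "sum": sum(counts),
        "min": lo,
        "max": hi,
        "avg": avg,
        # Absolute fluctuation (assumed definition): spread between busiest and quietest report.
        "abs_fluctuation": hi - lo,
        # Relative fluctuation (assumed definition): that spread scaled by the average level.
        "rel_fluctuation": (hi - lo) / avg if avg else 0.0,
    }


def is_anomalous(counts, min_total=50, min_rel_fluctuation=1.0):
    """Flag a token whose usage is both large enough and unusually volatile.

    The thresholds are hypothetical placeholders, not values from the talk.
    """
    f = series_features(counts)
    return f["sum"] >= min_total and f["rel_fluctuation"] >= min_rel_fluctuation
```

Requiring a minimum total keeps rare tokens with a jump from 0 to 3 from being reported as trends.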

Stop Word Removal

Architecture
• Twitter Streaming API (Twython)
• Trend Analyzer: text preprocessing (Python NLTK)
  • Lowercasing & tokenizing
  • URL & stopword removal

Example: This sample text shows which words will be removed when applying stop word removal. Mostly words like the, a or and.
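Lowercasing, URL removal, tokenizing and stop word removal can be sketched without external dependencies. The talk used NLTK, whose English stop word list is available through nltk.corpus.stopwords.words("english") once the corpus has been downloaded; the short STOP_WORDS set below is a hypothetical stand-in so the example runs on its own.

```python
import re

# Hypothetical mini stop word list standing in for NLTK's English list.
STOP_WORDS = {"a", "an", "and", "be", "in", "is", "of", "or", "the", "this",
              "to", "when", "which", "will"}
URL_PATTERN = re.compile(r"https?://\S+")
# Keeps a leading "#" or "@" so hashtags and mentions survive tokenizing.
TOKEN_PATTERN = re.compile(r"[#@]?\w+")


def preprocess(text):
    """Lowercase, strip URLs, tokenize and drop stop words."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    tokens = TOKEN_PATTERN.findall(text)
    return [t for t in tokens if t not in STOP_WORDS]
```

URLs are removed before tokenizing so that fragments such as "http" or "co" never become tokens.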

Stemming

Architecture
• Twitter Streaming API (Twython)
• Trend Analyzer: text preprocessing (Python NLTK)
  • Lowercasing & tokenizing
  • URL & stopword removal
  • Word stemming

Example: Amazing, Amazement, Amazed → amaze
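The talk does not name the stemmer. NLTK offers nltk.stem.PorterStemmer, and Porter's algorithm reduces all three example words to "amaz", a stem that need not be a dictionary word (the slide's "amaze" looks more like a lemma). To keep the example dependency-free, the sketch below is a deliberately tiny, hypothetical suffix stripper, not Porter's algorithm; it only illustrates how inflected forms collapse onto one shared stem.

```python
# Hypothetical rule set: suffixes tried longest first.
SUFFIXES = ("ement", "ment", "ing", "ed", "es", "e", "s")
MIN_STEM = 3  # never cut a word down to fewer than 3 characters


def toy_stem(word):
    """Strip the first matching suffix, keeping at least MIN_STEM characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word
```

After stemming, the three variants are counted as one term in word clouds and topic models.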

Sentiment Analysis

Architecture
• Twitter Streaming API (Twython)
• Trend Analyzer
  • Text preprocessing (Python NLTK): lowercasing & tokenizing, URL & stopword removal, word stemming
  • Sentiment analysis

Example: “I love cookies” (positive) vs. “I hate cookies” (negative)
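The talk does not say which sentiment method was used. As an illustration only, the sketch below scores tokens against a tiny hand-written lexicon (LEXICON is hypothetical); NLTK also ships a lexicon-based analyzer, nltk.sentiment.vader.SentimentIntensityAnalyzer, which requires the vader_lexicon resource.

```python
# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"love": 1, "amazing": 1, "great": 1, "happy": 1, "funny": 1,
           "hate": -1, "horrible": -1, "terrible": -1, "stupid": -1, "useless": -1}


def sentiment(tokens):
    """Classify a token list as positive, negative or neutral by summed word polarity."""
    score = sum(LEXICON.get(t.lower(), 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A plain word sum ignores negation and sarcasm, which several of the example tweets in the analyzed trends contain.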

Topic Modeling

Architecture
• Twitter Streaming API (Twython)
• Trend Analyzer
  • Text preprocessing (Python NLTK): lowercasing & tokenizing, URL & stopword removal, word stemming
  • Sentiment analysis
  • Topic modeling (LDA)
• Trend classification

Topics: …, …, …
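LDA can be fitted in many ways; gensim (gensim.models.LdaModel) and scikit-learn (sklearn.decomposition.LatentDirichletAllocation) provide production implementations, and the talk does not say which one was used. To show the mechanics, here is a compact collapsed Gibbs sampler in plain Python, assuming each preprocessed tweet is one document; alpha, beta and iterations are illustrative defaults, not values from the talk.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns one word Counter per topic.

    docs: list of token lists (already preprocessed).
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                       # n(k)
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take the current token out of all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=7):
    """Most frequent words per topic (zero counts skipped)."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]
```

top_words(lda_gibbs(docs, num_topics=5)) produces word lists similar in form to the topic tables shown for each trend. LDA does not name topics, so labels such as "News" or "Cause" presumably come from manual inspection.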

Architecture (overview)
• Twitter Streaming API (Twython)
• Early Trend Detector
  • Bag-of-words (hashtags, mentions)
  • Statistical measurement (growth, average usage, retweets, participating users…)
  • Time series analysis
  • Anomaly detection
• Trend classification
• Trend Analyzer
  • Text preprocessing (Python NLTK): lowercasing & tokenizing, URL & stopword removal, word stemming
  • Sentiment analysis
  • Topic modeling (LDA)
• Wordcloud visualization: Wordfreq.js, Wordcloud2.js
• GeoSpatial visualization: CartoDB
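wordcloud2.js renders a cloud from a list of [word, weight] pairs passed as its list option, while wordfreq.js computes word frequencies in the browser. If the counting were done in the Python back end instead, the list could be prepared as in this sketch; wordcloud_list is a hypothetical helper, and the linear size scaling and the 10 to 60 range are arbitrary assumptions.

```python
import json
from collections import Counter


def wordcloud_list(tokens, top_n=50, min_size=10, max_size=60):
    """Turn tokens into [word, size] pairs in the list format wordcloud2.js accepts.

    Sizes are counts scaled linearly into [min_size, max_size].
    """
    counts = Counter(tokens).most_common(top_n)
    if not counts:
        return []
    hi = counts[0][1]
    lo = counts[-1][1]
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return [
        [word, round(min_size + (count - lo) * (max_size - min_size) / span)]
        for word, count in counts
    ]


if __name__ == "__main__":
    tokens = ["airasia", "missing", "airasia", "flight", "airasia", "missing"]
    # JSON that a page could pass to WordCloud(element, {list: ...}).
    print(json.dumps(wordcloud_list(tokens)))
```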

Analyzed Trends

Limitations

• Tweets collected: 38 million (70 GB)
• Only English tweets from the USA
• Twitter Streaming API: only a sample of all tweets is available
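The talk does not say how the English/USA restriction was applied. One possible post-filter on the v1.1 tweet JSON checks the lang field and the country_code of the attached place; place is null for tweets without a location, so such a filter keeps only geotagged tweets. keep_tweet is a hypothetical helper.

```python
def keep_tweet(tweet):
    """Keep English tweets whose attached place lies in the USA.

    Uses the 'lang' and 'place.country_code' fields of a v1.1 tweet dict;
    'place' may be missing or None.
    """
    place = tweet.get("place") or {}
    return tweet.get("lang") == "en" and place.get("country_code") == "US"
```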

New Year

Time Series

New Year

Word Cloud

New Year

Geospatial Analysis

Map panels: midnight in Los Angeles, midnight in New York
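Tweet timestamps (created_at) are in UTC, so local midnight in New York and in Los Angeles fall three hours apart on a UTC time axis. A small worked sketch, assuming fixed winter offsets (EST = UTC-5, PST = UTC-8, no daylight saving on 1 January) to avoid depending on a time zone database; zoneinfo with "America/New_York" and "America/Los_Angeles" would be the more robust choice.

```python
from datetime import datetime, timedelta, timezone

# Fixed winter offsets (assumption: no DST in effect on 1 January).
EST = timezone(timedelta(hours=-5), "EST")
PST = timezone(timedelta(hours=-8), "PST")


def local_midnight_utc(tz, year=2015):
    """UTC instant at which the given zone reaches midnight on 1 January."""
    return datetime(year, 1, 1, tzinfo=tz).astimezone(timezone.utc)


ny = local_midnight_utc(EST)
la = local_midnight_utc(PST)
print(ny.isoformat(), la.isoformat(), la - ny)
```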

New Year

Sentiment Analysis

Chart legend: Positive, Neutral, Negative

Example tweets:
• Home sick on #nye. Horrible timing stupid cold. Ugh. My date is my couch & pillow watching. #HappyNewYear everyone.
• #HappyNewYear from the Youth for Astronomy and Engineering Program at Space Telescope Science Institute!
• Happy New Year! Last year was amazing, and here’s to another great year of love & happiness! #NYE2015

Air Asia Tragedy

Air Asia Tragedy

Time Series

Air Asia Tragedy

Word Cloud

Air Asia Tragedy

Topic Modeling

Topics (label: top words)
• News: airasia, missing, flight, air, Indonesia, singapore, asia
• Search for the Plane: airasia, missing, plane, find, plane, world, technology
• Sympathy: Prayers, families, thoughts, airasia, crash, thought, airfrance
• Cause: airasia, weather, flight, pilots, fly, bad, path
• International Help: raaf, butterworth, china, australia, Russia, trndnl, trending

Air Asia Tragedy

Sentiment Analysis

Chart legend: Neutral, Negative, Positive

Example tweets:
• Prayers are USELESS! Stop repeating meaningless crap, pretending that you care … #PrayForAirAsia #QZ8501 #GrowABrain #ReligousNonsense
• #BREAKING #AirAsia Flight #8501 likely “at the bottom of the sea” rescue officials says.
• May God’s great love shine on the families and loved ones of all passengers and crew #AirAsia #8501

Air Asia Tragedy

Google Trends Comparison

Chart series: Google Trends vs. Twitter sample
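Google Trends reports search interest relative to the peak of the selected period, which is set to 100. This sketch puts a Twitter count series on the same 0 to 100 scale and measures how closely the two curves move together, assuming both series are already aligned to the same time buckets. The talk shows the curves but does not say whether any such statistic was computed; normalize_to_peak and pearson are hypothetical helpers.

```python
from math import sqrt


def normalize_to_peak(series):
    """Scale a series so its maximum becomes 100, like Google Trends interest values."""
    peak = max(series)
    return [100 * v / peak if peak else 0.0 for v in series]


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # raises ZeroDivisionError for a constant series
```

Correlation is unaffected by the rescaling; normalize_to_peak matters only for plotting both curves on one axis.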

Sony Hack

Sony Hack

Time Series

Sony Hack

Word Cloud

Sony Hack

Topic Modeling

Topics (label: top words)
• Christmas Release: theinterview, christmas, day, theaters, freedom, theater, showing
• Reviews: theinterview, jamesfrancotv, sethrogen, movie, interview, funny, hilarious
• Suspicions: northkorea, sonyhack, korea, north, internet, sony, amp
• News: theinterview, sonypictures, sony, movie, korea, north, interview
• Insider Joke: theinterview, aint, hate, cuz, jealous, anus, peanutbutter

Sony Hack

Geospatial Analysis

Sony Hack

Sentiment Analysis

Chart legend: Neutral, Negative, Positive

Example tweets:
• #TheInterview SUCKS!!! @sethrogen Like I knew it would #Stupid #NotFunny
• #Sony says #TheInterview made more than $1 million at the box office on in 1 single day on Dec. 25.
• Happy I joined my fellow Americans in the great #TheInterview Christmas Day Viewing. Plus it was pretty funny, truth be told.

Network Outage

Network Outage

Time Series

Network Outage

Word Cloud

Network Outage

Topic Modeling

Topics (label: top words)
• Network Error: xbox, psn, sign, connect, live, error, account, issues
• Connection between Hacks: xbox, playstation, watch, movie, fuckcrucifix, north, korea, interview
• Xbox Down: xbox, christmas, play, xboxlivedown, live, xboxlive, xboxsupport, day
• Caused Damage: playstation, dollar, psn, company, lizardsquad, sony, billion, multi
• Hacker Group: fuckcrucifix, lizardmafia, lizardsquad, fuck, lizard, squad, finestsquad, stop
• Restored: psn, back, playstation, online, askplaystation, network, psndown, working

Network Outage

Sentiment Analysis

Chart legend: Neutral, Negative, Positive

Example tweets:
• @XboxSupport f*** your servers, a big ass company like you should handle these teenage kids, terrible
• @AskPlayStation when will the service be back online because it says there’s maintenance?
• @PlayStation thanks for the great year. I am sure this new year will be amazing. Don’t allow yourselves to be hacked ever again.

Conclusion

• High-quality insights into the world’s interests
• Twitter is well suited to detecting and predicting trends
• Maintaining high data quality is important

#Questions

Benjamin Räthlein @B3nRa

Henning Muszynski @henningmus

Lukas Masuch @LukasMasuch