twitter trend detection and analysis

Trend Detection and Analysis on Twitter
Benjamin Räthlein, Henning Muszynski, Lukas Masuch

Upload: henning-muszynski

Post on 08-Jan-2017

TRANSCRIPT

Page 1: Twitter Trend Detection and Analysis

Trend Detection and Analysis on Twitter

Benjamin Räthlein, Henning Muszynski, Lukas Masuch

Page 2: Twitter Trend Detection and Analysis

Agenda

Motivation

Architecture

Data Preparation

Trend Analysis

Analyzed Trends

Conclusion

Page 3: Twitter Trend Detection and Analysis

Motivation

Predict the stock market in real time

Detecting influenza epidemics

Automatic crime prediction

“Successful results of mainly research-based projects helped to open up new business opportunities”

Page 4: Twitter Trend Detection and Analysis

Twitter

Page 5: Twitter Trend Detection and Analysis

Architecture

Twitter Streaming API (Twython)

Early Trend Detector

Bag-of-words (Hashtags, Mentions)

Bag of Words

Bag          Count
#newyear     7
#christmas   6
@bigdata     2
@sap         3
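
The bag-of-words step above can be sketched in a few lines of Python. This is only an illustration: the slides do not show the authors' code, so the regular expression, the function name `bag_of_words`, and the sample tweets are assumptions.

```python
import re
from collections import Counter

# Assumption: a hashtag or mention is '#' or '@' followed by word characters.
TAG_PATTERN = re.compile(r"[#@]\w+")

def bag_of_words(tweets):
    """Count hashtags and user mentions (case-insensitively) across tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in TAG_PATTERN.findall(text))
    return counts

# Made-up sample tweets, for illustration only.
sample = [
    "Happy #NewYear to @sap and @bigdata",
    "#newyear fireworks!",
    "Late #Christmas gift from @SAP",
]
bag = bag_of_words(sample)
for tag, count in bag.most_common():
    print(tag, count)
```

Lowercasing merges variants such as `#NewYear` and `#newyear` into one bag, which seems to be what the table on the slide implies.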

Page 6: Twitter Trend Detection and Analysis

Architecture

Twitter Streaming API (Twython)

Early Trend Detector

Bag-of-words (Hashtags, Mentions)

Statistical Measurement (growth, average usage, retweets, participating users…)

Statistical Measurement

Report statistics (every 20 minutes):
• Total hashtags & user mentions
• Hashtag/mention count
• Usage growth per hashtag/mention
• Participating users per hashtag/mention
• Retweet count per hashtag/mention
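
One way the report statistics listed above could be computed for a single 20-minute window is sketched below. The tweet dictionary keys (`user`, `tags`, `is_retweet`), the function name `build_report`, and the growth formula (relative change against the previous window) are assumptions; the slides do not define them.

```python
from collections import defaultdict

def build_report(tweets, previous_counts=None):
    """Summarise one reporting window.

    Each tweet is a dict with the hypothetical keys 'user', 'tags' and
    'is_retweet'. previous_counts maps a tag to its count in the prior window.
    """
    previous_counts = previous_counts or {}
    counts = defaultdict(int)
    users = defaultdict(set)
    retweets = defaultdict(int)

    for tweet in tweets:
        for tag in set(tweet["tags"]):  # count a tag at most once per tweet
            counts[tag] += 1
            users[tag].add(tweet["user"])
            if tweet["is_retweet"]:
                retweets[tag] += 1

    per_tag = {}
    for tag, count in counts.items():
        prev = previous_counts.get(tag, 0)
        # Relative growth is undefined when the tag was absent before.
        growth = (count - prev) / prev if prev else None
        per_tag[tag] = {
            "count": count,
            "growth": growth,
            "users": len(users[tag]),
            "retweets": retweets[tag],
        }
    return {"total": sum(counts.values()), "tags": per_tag}

# Made-up window of three tweets.
window = [
    {"user": "a", "tags": ["#nye"], "is_retweet": False},
    {"user": "b", "tags": ["#nye", "@sap"], "is_retweet": True},
    {"user": "a", "tags": ["#nye"], "is_retweet": False},
]
report = build_report(window, previous_counts={"#nye": 2})
print(report)
```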

Page 7: Twitter Trend Detection and Analysis

Architecture

Twitter Streaming API (Twython)

Early Trend Detector

Bag-of-words (Hashtags, Mentions)

Statistical Measurement (growth, average usage, retweets, participating users…)

Anomaly Detection

Time Series Analysis

Calculated for every hashtag / user mention
Every 2 / 4 hours, based on reports

Anomaly detection using:
• Relative & absolute fluctuation
• Total occurrences (sum)
• Minimum occurrences
• Maximum occurrences
• Average occurrences
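
The measures listed above could be combined into a simple anomaly check along the following lines. The slides name the measures but not the decision rule, so the baseline (mean of all but the latest report), the thresholds `min_total` and `min_relative`, and the function names are hypothetical.

```python
from statistics import mean

def summarise(series):
    """Summarise the counts of one hashtag/mention over consecutive reports."""
    if len(series) < 2:
        raise ValueError("need at least two reports")
    baseline = mean(series[:-1])  # assumption: earlier reports form the baseline
    latest = series[-1]
    absolute = latest - baseline
    relative = absolute / baseline if baseline else float("inf")
    return {
        "total": sum(series),
        "minimum": min(series),
        "maximum": max(series),
        "average": mean(series),
        "absolute_fluctuation": absolute,
        "relative_fluctuation": relative,
    }

def is_anomaly(series, min_total=50, min_relative=2.0):
    """Flag a series whose latest report jumps well above its baseline.

    Both thresholds are illustrative values, not taken from the slides.
    """
    stats = summarise(series)
    return (stats["total"] >= min_total
            and stats["relative_fluctuation"] >= min_relative)

# Six 20-minute reports cover a two-hour window.
quiet = [10, 12, 11, 9, 10, 11]
burst = [10, 12, 11, 9, 10, 60]
print(is_anomaly(quiet), is_anomaly(burst))
```

Requiring a minimum total keeps rarely used hashtags from being flagged just because their count went from one to three.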

Page 8: Twitter Trend Detection and Analysis

Architecture

Twitter Streaming API (Twython)

Trend Analyzer

Text Preprocessing (Python NLTK)

Lowercasing & tokenizing

URL & stopword removal

Stop Word Removal

This sample text shows which words will be removed when applying stop word removal. Mostly words like the, a or and.
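
A minimal sketch of the preprocessing steps named on this slide (lowercasing, tokenizing, URL removal, stop word removal) follows. The slides use Python NLTK; to stay self-contained, this sketch swaps in plain regular expressions and a small hand-picked stop word set, both of which are assumptions rather than the authors' setup.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
# Keep a leading '#' or '@' so hashtags and mentions survive tokenizing.
TOKEN_PATTERN = re.compile(r"[#@]?\w+")
# Hand-picked stand-in for a full stop word list such as NLTK's.
STOP_WORDS = {"a", "an", "and", "be", "is", "of", "or", "the",
              "this", "to", "when", "which", "will"}

def preprocess(text):
    """Lowercase, strip URLs, tokenize, and drop stop words."""
    text = URL_PATTERN.sub(" ", text.lower())
    tokens = TOKEN_PATTERN.findall(text)
    return [token for token in tokens if token not in STOP_WORDS]

sample = ("This sample text shows which words will be removed when "
          "applying stop word removal. See https://example.com/a")
tokens = preprocess(sample)
print(tokens)
```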

Page 9: Twitter Trend Detection and Analysis

Architecture

Twitter Streaming API (Twython)

Trend Analyzer

Text Preprocessing (Python NLTK)

URL & stopword removal

Lowercasing & tokenizing

Word stemming

Stemming

Amazing, Amazement, Amazed → amaze
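
The idea behind stemming can be illustrated with a deliberately crude suffix stripper. This is not the stemmer the authors used (the slides only mention NLTK), and the suffix list and the name `crude_stem` are assumptions. Note that the shared stem it produces is a truncated form, not necessarily a dictionary word like the one on the slide.

```python
# Longest suffixes first, so "ement" is tried before "ment" and "e".
SUFFIXES = ("ement", "ment", "ing", "ed", "es", "e", "s")

def crude_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

words = ["Amazing", "Amazement", "Amazed", "amaze"]
stems = {crude_stem(word) for word in words}
print(stems)
```

All four forms collapse to one stem, which is what lets later counting steps treat them as the same word.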

Page 10: Twitter Trend Detection and Analysis

Architecture

Twitter Streaming API (Twython)

Trend Analyzer

Text Preprocessing (Python NLTK)

URL & stopword removal

Lowercasing & tokenizing

Word stemming

Sentiment Analysis

Sentiment Analysis

I love cookies

I hate cookies
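
The slides do not say which sentiment method was used. As one possible illustration, a tiny lexicon-based classifier is sketched below; the word lists and the function name `classify` are made up for this example.

```python
import re

# Toy lexicons, assumed for illustration only.
POSITIVE = {"love", "great", "amazing", "happy", "funny"}
NEGATIVE = {"hate", "horrible", "stupid", "terrible", "sucks"}

def classify(text):
    """Label text by the balance of positive and negative lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = (sum(token in POSITIVE for token in tokens)
             - sum(token in NEGATIVE for token in tokens))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for text in ["I love cookies", "I hate cookies", "I eat cookies"]:
    print(text, "->", classify(text))
```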

Page 11: Twitter Trend Detection and Analysis

Architecture

Twitter Streaming API (Twython)

Trend Analyzer

Text Preprocessing (Python NLTK)

URL & stopword removal

Lowercasing & tokenizing

Word stemming

Sentiment Analysis

Topic Modeling (LDA)

Topic Modeling

Topics
• …
• …
• …

Trend Classification
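
The slides name LDA for topic modeling but not the library or settings. The following is a small pure-Python sketch of LDA via collapsed Gibbs sampling, meant only to show the mechanics. The hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are assumptions, and a real pipeline would more likely rely on an established library.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    Returns one Counter of word counts per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                        # n(k)
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        topics = []
        for word in doc:
            k = rng.randrange(num_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(topics)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return topic_word

def top_words(topic_word, n=3):
    """List the n most frequent words with a non-zero count per topic."""
    return [[w for w, c in counts.most_common(n) if c > 0]
            for counts in topic_word]

# Toy documents, already preprocessed into tokens.
docs = [
    ["airasia", "missing", "flight"],
    ["airasia", "weather", "pilots"],
    ["xbox", "psn", "error"],
    ["xbox", "live", "error"],
]
topics = lda_gibbs(docs, num_topics=2, iterations=50)
print(top_words(topics))
```

LDA returns only word groups; the topic names on the later slides (for example "News" or "Sympathy") would have to be assigned by a person reading those groups.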

Page 12: Twitter Trend Detection and Analysis

Architecture

Twitter Streaming API (Twython)

Early Trend Detector

Bag-of-words (Hashtags, Mentions)

Statistical Measurement (growth, average usage, retweets, participating users…)

Anomaly Detection

Time Series Analysis

Trend Classification

Trend Analyzer

Text Preprocessing (Python NLTK)

URL & stopword removal

Lowercasing & tokenizing

Word stemming

Sentiment Analysis

Topic Modeling (LDA)

Wordcloud Visualization

Wordfreq.js

Wordcloud2.js

GeoSpatial Visualization

CartoDB

Page 13: Twitter Trend Detection and Analysis

Analyzed Trends

Page 14: Twitter Trend Detection and Analysis

Limitations

Tweets collected: 38 million (70 GB)

Only English tweets from the USA

Twitter Streaming API
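
The restriction to English tweets from the USA could be applied as a filter on each incoming tweet, as sketched below. The slides do not say how the authors applied it. This sketch assumes tweet dictionaries carrying a `lang` field and a `place` object with a `country_code`, and the function name `is_english_us` is hypothetical.

```python
def is_english_us(tweet):
    """Keep tweets tagged as English whose attached place is in the USA."""
    place = tweet.get("place") or {}  # 'place' may be missing or None
    return tweet.get("lang") == "en" and place.get("country_code") == "US"

# Made-up tweets, for illustration only.
sample = [
    {"text": "Happy #NewYear", "lang": "en", "place": {"country_code": "US"}},
    {"text": "Frohes neues Jahr", "lang": "de", "place": {"country_code": "DE"}},
    {"text": "No location", "lang": "en", "place": None},
]
kept = [tweet["text"] for tweet in sample if is_english_us(tweet)]
print(kept)
```

A filter like this drops every tweet without location data, which is one reason such a sample covers only part of the overall stream.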

Page 15: Twitter Trend Detection and Analysis

New Year

Time Series

Page 16: Twitter Trend Detection and Analysis

New Year

Word Cloud

Page 17: Twitter Trend Detection and Analysis

New Year

Geospatial Analysis

Midnight Los Angeles

Midnight New York

Page 18: Twitter Trend Detection and Analysis

New Year

Sentiment Analysis

Negative: Home sick on #nye. Horrible timing stupid cold. Ugh. My date is my couch & pillow watching. #HappyNewYear everyone.

Neutral: #HappyNewYear from the Youth for Astronomy and Engineering Program at Space Telescope Science Institute!

Positive: Happy New Year! Last year was amazing, and here’s to another great year of love & happiness! #NYE2015

Page 19: Twitter Trend Detection and Analysis

Air Asia Tragedy

Page 20: Twitter Trend Detection and Analysis

Air Asia Tragedy

Time Series

Page 21: Twitter Trend Detection and Analysis

Air Asia Tragedy

Word Cloud

Page 22: Twitter Trend Detection and Analysis

Air Asia Tragedy

Topic Modeling

News: airasia, missing, flight, air, Indonesia, singapore, asia

Search for the Plane: airasia, missing, plane, find, plane, world, technology

Sympathy: Prayers, families, thoughts, airasia, crash, thought, airfrance

Cause: airasia, weather, flight, pilots, fly, bad, path

International Help: raaf, butterworth, china, australia, Russia, trndnl, trending

Page 23: Twitter Trend Detection and Analysis

Air Asia Tragedy

Sentiment Analysis

Negative: Prayers are USELESS! Stop repeating meaningless crap, pretending that you care … #PrayForAirAsia #QZ8501 #GrowABrain #ReligousNonsense

Neutral: #BREAKING #AirAsia Flight #8501 likely “at the bottom of the sea” rescue officials says.

Positive: May God’s great love shine on the families and loved ones of all passengers and crew #AirAsia #8501

Page 24: Twitter Trend Detection and Analysis

Air Asia Tragedy

Google Trends Comparison

Google Trends

Twitter Sample

Page 25: Twitter Trend Detection and Analysis

Air Asia Tragedy

Google Trends Comparison

Google Trends

Twitter Sample

Page 26: Twitter Trend Detection and Analysis

Sony Hack

Page 27: Twitter Trend Detection and Analysis

Sony Hack

Time Series

Page 28: Twitter Trend Detection and Analysis

Sony Hack

Word Cloud

Page 29: Twitter Trend Detection and Analysis

Sony Hack

Topic Modeling

Christmas Release: theinterview, christmas, day, theaters, freedom, theater, showing

Reviews: theinterview, jamesfrancotv, sethrogen, movie, interview, funny, hilarious

Suspicions: northkorea, sonyhack, korea, north, internet, sony, amp

News: theinterview, sonypictures, sony, movie, korea, north, interview

Insider Joke: theinterview, aint, hate, cuz, jealous, anus, peanutbutter

Page 30: Twitter Trend Detection and Analysis

Sony Hack

Geospatial Analysis

Page 31: Twitter Trend Detection and Analysis

Sony Hack

Sentiment Analysis

Negative: #TheInterview SUCKS!!! @sethrogen Like I knew it would #Stupid #NotFunny

Neutral: #Sony says #TheInterview made more than $1 million at the box office on in 1 single day on Dec. 25.

Positive: Happy I joined my fellow Americans in the great #TheInterview Christmas Day Viewing. Plus it was pretty funny, truth be told.

Page 32: Twitter Trend Detection and Analysis

Network Outage

Page 33: Twitter Trend Detection and Analysis

Network Outage

Time Series

Page 34: Twitter Trend Detection and Analysis

Network Outage

Word Cloud

Page 35: Twitter Trend Detection and Analysis

Network Outage

Topic Modeling

Network Error: xbox, psn, sign, connect, live, error, account, issues

Connection between Hacks: xbox, playstation, watch, movie, fuckcrucifix, north, korea, interview

Xbox Down: xbox, christmas, play, xboxlivedown, live, xboxlive, xboxsupport, day

Caused Damage: playstation, dollar, psn, company, lizardsquad, sony, billion, multi

Hacker Group: fuckcrucifix, lizardmafia, lizardsquad, fuck, lizard, squad, finestsquad, stop

Restored: psn, back, playstation, online, askplaystation, network, psndown, working

Page 36: Twitter Trend Detection and Analysis

Network Outage

Sentiment Analysis

Negative: @XboxSupport f*** your servers, a big ass company like you should handle these teenage kids, terrible

Neutral: @AskPlayStation when will the service be back online because it says there’s maintenance?

Positive: @PlayStation thanks for the great year. I am sure this new year will be amazing. Don’t allow yourselves to be hacked ever again.

Page 37: Twitter Trend Detection and Analysis

Conclusion

High-quality insights into the world’s interests

Twitter is very good for detecting and predicting trends

Maintaining a high data quality is important

Page 38: Twitter Trend Detection and Analysis

#Questions

Benjamin Räthlein (@B3nRa)

Henning Muszynski (@henningmus)

Lukas Masuch (@LukasMasuch)