network analytics meets text mining for social media analysis

41
Network Analytics meets Text Mining for Social Media Analysis Dr. Rosaria Silipo

Upload: stu

Post on 26-Feb-2016

114 views

Category:

Documents


5 download

DESCRIPTION

Network Analytics meets Text Mining for Social Media Analysis. Dr. Rosaria Silipo. Social Media Data Water Water Everywhere , and not a drop to drink. Social Media Data Water Water Everywhere , and not a drop to drink. What companies do with it: Download and keep - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Network Analytics meets

Text Mining for Social Media Analysis

Dr. Rosaria Silipo

Page 2: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Social Media DataWater Water Everywhere, and not a drop to drink

2

Page 3: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Social Media DataWater Water Everywhere, and not a drop to drink

What companies do with it:• Download and keep• Topic [Shift] Detection (email content routing,

detect market interest shift, clinical studies, query non structured DBs, ...)

• Sentiment Analysis (marketing, polls, elections, ...)

• Connection Analysis (influencers, risk analysis, ...)

• .... 3

Page 4: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Social Media DataWater Water Everywhere, and not a drop to drink

The Analysis Tools:• Web Crawlers• Visual Exploration• Topic Detection (Text Mining, NLP,

Ontologies)• Sentiment Score (Text Mining, NLP)• Influence Score (Network Analytics)• Find Groups (Predictive Analytics)

4

Page 5: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Case Study Example: Slashdot Data

5

Basic Numbers:

• 24532 users

• 491 threads with • 15 – 843 responses• 12 – 507 users

• 113505 posts

• 60 main topics

• Selected Topic: Politics

Post

Comments

Page 6: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Case Study Example: Slashdot• Very rich data sources about customers

!

• We want to establish:• How users feel about the discussed topic• Whether it matters how users feel• A more general abstraction of the results

6

Sentiment Analysis

Network Analytics

Clustering

Page 7: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Sentiment AnalysisRemove anonymous users, group by PostID Words Tagging

Positive words

Negative words

MPQACorpus

BoW

, Ent

ity F

ilter

, Wor

d Fr

eque

ncy,

A

ttitu

de C

alcu

latio

n by

Doc

umen

t

User Bins

Word cloud for selected users

Total Attitude by User

Page 8: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Slashdot – Text MiningMost Negative User pNutz

Page 9: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Slashdot – Text MiningMost Positive User dada21

Page 10: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Slashdot – Sentiment Analysis• 16016 positive users• 7107 negative users• Most positive user: dada21 (2838 positive/1725 negative

words)• Most negative user: pNutz (43 positive/109 negative

words)• Which Topics have positive users in common ?

– Government– People– Law/s– Money– Market– Parties

Page 11: Network  Analytics  meets  Text  Mining  for Social Media Analysis

11

Network CreationUser

1

User2

User3

User6

User4

User5

Page 12: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Topic Graphs

12

Page 13: Network  Analytics  meets  Text  Mining  for Social Media Analysis

14

Topic Graph: NASA

Page 14: Network  Analytics  meets  Text  Mining  for Social Media Analysis

15

Topic Graph: Sci-Fi

Page 15: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Hubs & Authorities

16

• Hubs = Followers• Authorities = Leaders

Filtering anonymous users and creating network Centrality index to define hub weight

and authority weight

Users with hub and authority weights and

other features

Page 16: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Hubs & Authorities

17

dada21

Doc Ruby

Carl Bialik from the WSJ

pNutz

99BottlesOfBeerInMyF

Tube Steak

Page 17: Network  Analytics  meets  Text  Mining  for Social Media Analysis

18

KNIME: Bringing it all together

Network Analysis

Text Analysis

Users with hub and authority weights and other features

Users bins: positive, negative, neutral

Page 18: Network  Analytics  meets  Text  Mining  for Social Media Analysis

19

Carl Bialik from the WSJ

dada21

Doc Ruby

99BottlesOfBeerInMyFWeb

Hos

ting

Guy

pNutz

Tube Steak

Catbeller

Page 19: Network  Analytics  meets  Text  Mining  for Social Media Analysis

What we have found ...- The positive

leaders- The neutral

leaders- The negative

leaders- The inactive users

20

What identifies each group?How do I identify a new user?How do I handle each user?

Page 20: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Why Clustering?

- No a priori knowledge (not even on a subset of users)

- Prediction and interpretation capabilities required

21

k-Means algorithm

Page 21: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Re-sampling the Training Set

23

k = 10

Page 22: Network  Analytics  meets  Text  Mining  for Social Media Analysis

The k-Means Clusters

24

Page 23: Network  Analytics  meets  Text  Mining  for Social Media Analysis

The k-Means Clusters

25

Superfans

Negativeusers

Neutralusers

Fans

Page 24: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Additional Discoveries• There are only very few real leaders!

Authority and hub scores identify active participants rather than leaders.

• Superfans can be found in cluster_3 • Negative and (sigh!) active users are collected in

cluster_1.• Neutral users are usually inactive (cluster_2,

cluster_7, and cluster_8)• Positive users with different degrees of activity are

scattered across the remaining clusters.26

Page 25: Network  Analytics  meets  Text  Mining  for Social Media Analysis

The operational Workflow

27

Pre-processingCluster Extraction

Assignment of new data

Page 26: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Notes• MPQA Corpus: publicly available

Subjectivity Lexicon (http://www.cs.pitt.edu/mpqa/lexicons.html)

• User Characterization is Sum -> Mean• NLP: No sentence splitting, no negation

identification.• For a more refined syntax-based

sentiment analysis -> „External Tool“ node 28

Page 27: Network  Analytics  meets  Text  Mining  for Social Media Analysis

External Tool NodeThe „External Tool“ node executes any

external program from command line1. Writes input data to an input file2. Calls Tool to run on input file and command

line options and to write results to output file

3. Reads output file and presents data at output port

29

Page 28: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Alternative Sentiment AnalysisFree non-interactive Command Line

running Tools for Sentiment Analysis not found

SentiStrength v2.2 (still interactive)

30External Tool and

Generic Web Service Client

Page 29: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Web Crawling Workflow

31

Community Web Crawler Node

XML Parsing Nodes

Page 30: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Next Steps- Integrate topic information- Integrate user demographic and

behavioural information- Discover [time series] patterns for

early detection of negative users and superfans

- Try other techniques, maybe even on manually segmented data, to discover new user segments 32

Page 31: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Where do I find more?Whitepaper: [email protected] Complete Workflows + Data:

www.knime.com - text mining- network mining- combined analysis

(note the above 3 process huge data and require 16G memory)– clustering

Open Source Software: KNIME www.knime.com 33

Page 32: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Next Appointment

User Day US Boston (free)

October 22nd 2013 10:00 -17:00Microsoft New England R&D Center (NERD) One Memorial Drive, Suite 100, Cambridge

http://www.knime.com/user-day-boston-2013

34

Page 33: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Hands-on Session

1. Download KNIME from www.knime.com

35

Page 34: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Hands-on Session2. Install ExtensionsHelp -> Install New

SoftwareSelect:

• KNIME & Extensions

In KNIME Labs Extensions, select:

• KNIME Network Mining• KNIME Textprocessing

36

Page 35: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Hands-on Session3. Get workflows and Slashdot data

• Get workflows from USB stick (KNIMEBoston2013.zip)• Text Mining• Network Analytics• Text and Network Mining• Social Media Clustering

• Slashdot Raw Data is included in the downloaded workflows

• A smaller set of data is available, Slashdot Reduced Data, for lower memory requirements

• Both data sets are available from USB Stick 37

Page 36: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Hands-on Session3. Import Workflows

38

Page 37: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Hands-on SessionMemory Increase in knime.ini-startupplugins/org.eclipse.equinox.launcher_1.2.0.v20110502.jar--launcher.libraryplugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.100.v20110502-vmargs-Xmx2G-XX:MaxPermSize=256m-server-Dsun.java2d.d3d=false-Dosgi.classloader.lock=classname-XX:+UnlockDiagnosticVMOptions-XX:+UnsyncloadClass-Dknime.enable.fastload=true-Djava.library.path=C:\Users\rosy\Documents\R\win-library\2.15\rJava\jri\x64

39

Page 38: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Hands-on Session

5. Improve Workflows: Text Mining

40

Data Reading

Data Preprocessing

Reading Tag Corpus

Tagging Words

BoW

Scoring and Tag Cloud

Page 39: Network  Analytics  meets  Text  Mining  for Social Media Analysis

Hands-on Session

6. Improve Workflows: Network Analytics

41

Data Reading andpreprocessing

Create Network Object Clean up Network

Visualize Network

Page 40: Network  Analytics  meets  Text  Mining  for Social Media Analysis

zoomba

42

Page 41: Network  Analytics  meets  Text  Mining  for Social Media Analysis

nahdude812

43