Meena Nagarajan: Ph.D. Dissertation Defense

DESCRIPTION
Understanding User-generated Content on Social Media Platforms

TRANSCRIPT
Understanding User-generated Content on Social Media
Meena Nagarajan
Ph.D. Dissertation Defense
Kno.e.sis Center, College of Engineering and Computer Science
Wright State University
1
Introductions and Thank-you!
2
Social Information Needs
• Facts, Networked Public Conversations, Opinions, Emotions, Preferences..
3
Social Information Needs
• Can we use this information to assess a population’s preference?
• Can we study how these preferences propagate in a network of friends?
• Are such crowd-sourced preferences a good substitute for traditional polling methods?
4
Social Information Processing
• "Who says what, to whom, why, to what extent and with what effect?" [Lasswell]
• Network: social structure emerges from the aggregate of relationships (ties)
• People: poster identities, the active effort of accomplishing interaction
• Content: studying the content of communication
ABOUTNESS of textual user-generated content via the lens of TEXT MINING
5
Aboutness Of Text
• One among several terms used to express certain attributes of a discourse, text or document
• characterizing what a document is about, what its content, subject or topic matter are
• A central component of knowledge organization and information retrieval
• For machine and human consumption
6
Aboutness & Subgoals in IE
• Named entity recognition
• Co-reference, anaphora resolution
• e.g., "International Business Machines" and "IBM"; ‘he’ in a passage refers to the mention of ‘John Smith’
• Terminology, key-phrase, lexical chain extraction
• Relationship and fact extraction
• e.g., ‘person works for organization’
7
Text Mining and Aboutness
• Thesis focus: ‘Aboutness’ understanding via Text Mining
• Gleaning meaningful information from natural language text useful for particular purposes
• Indicators of thematic elements for aboutness
• via NER, Key phrase extraction
8
Aboutness & The Role Of Context
• Extracting thematic elements: interpretation of the individual elements in context
• (a) I can hear bass sounds. (b) They like grilled bass.
• Typical context cues that are employed
• Word Associations, Linguistic Cues, Syntactic, Structural Cues, Knowledge Sources..
9
10
1.2. THESIS CONTRIBUTIONS – ‘ABOUTNESS’ OF INFORMAL TEXT August 10, 2010
User-generated content on Twitter during the 2009 Iran Election
show support for democracy in Iran: add green overlay to your Twitter avatar with 1-click - http://helpiranelection.com/
Twitition: Google Earth to update satellite images of Tehran #Iranelection http://twitition.com/csfeo @patrickaltoft
Set your location to Tehran and your time zone to GMT +3.30. Security forces are hunting for bloggers using location/timezone searches
User comments on music artist pages on MySpace
Your music is really bangin!
You’re a genius! Keep droppin bombs!
u doin it up 4 real. i really love the album.
hey just hittin you up showin love to one of chi-town’s own. MADD LOVE.
Comments on Weblogs about movies and video games
I decided to check out Wanted demo today even though I really did not like the movie
It was THE HANGOVER of the year..lasted forever..so I went to the movies..bad choice picking GI Jane worse now
Excerpt from a blog around the 2009 Health Care Reform debate
Hawaii’s Lessons - NY Times. In Hawaii’s Health System, Lessons for Lawmakers. Since 1974, Hawaii has required all employers to provide relatively generous health care benefits to any employee who works 20 hours a week or more. If health care legislation passes in Congress, the rest of the country may barely catch up. Lawmakers working on a national health care fix have much to learn from the past 35 years in Hawaii, President Obama’s native state. Among the most important lessons is that even small steps to change the system can have lasting effects on health. Another is that, once benefits are entrenched, taking them away becomes almost impossible. There have not been any serious efforts in Hawaii to repeal the law, although cheating by employers may be on the rise. But perhaps the most intriguing lesson from Hawaii has to do with costs. This is a state where regular milk sells for $8 a gallon, gasoline costs $3.60 a gallon and the median price of a home in 2008 was $624,000, the second-highest in the nation.
Figure 1.1: Examples of user-generated content from different social media platforms
• Unmediated interpersonal communication
• Informal English domain
• Context is implicit
• Interactions between like-minded people
• Variations and creativity in expression
• Properties of the medium
One solution rarely fits all social media content
Thesis Contributions
• Compensating for informal, highly variable language and lack of context
• Examining the usefulness of multiple context cues for text mining algorithms
• Context cues: document corpus, syntactic and structural cues, the social medium, and external domain knowledge
• End goal: NER, Key Phrase Extraction
11
Thesis Statements
• We show that for two Aboutness understanding tasks, NER and Key Phrase Extraction
• Multiple contextual cues can supplement and improve the reliability and performance of existing NLP/ML algorithms
• Improvements tend to be robust across domains and data sources
12
13
Thesis Contributions
Task: Aboutness of text
Context cues (arranged by text formality): in content; medium metadata, structural cues; external knowledge sources
NER - Movie Names (Weblogs)
• In content: word associations from large corpora
• Structural cues: Blog URL, Title, Post URL
• External knowledge: Wikipedia Infoboxes
NER - Music Album/Track Names (MySpace Music Forum)
• In content: word associations from large corpora, POS tags, syntactic dependencies
• Structural cues: Page URL
• External knowledge: MusicBrainz, UrbanDictionary
Examples:
"I loved your music Yesterday!"
"It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking ‘GI Jane’ worse now"
4.1. KEY PHRASE EXTRACTION - ‘ABOUTNESS’ OF CONTENT August 10, 2010
document that are descriptive of its contents.
The contributions made in this thesis fall under the second category of extracting key phrases
that are explicitly present in the content and are also indicative of what the document is ‘about’.
The focus of previous approaches to key phrase extraction has been on extracting phrases
that summarize a document, e.g. a news article, a web page, a journal article or a book. In
contrast, the focus of this thesis is not in summarizing a document generated by users on social
media platforms but to extract key phrases that are descriptive of information present in multiple
observations (or documents) made by users about an entity, event or topic of interest.
The primary motivation is to obtain an abstraction of a social phenomenon that makes volumes
of unstructured user-generated content easily consumable by humans and agents alike. As an
example of the goals of our work, Table 4.1 shows key phrases extracted from online discussions
around the 2009 Health Care Reform debate and the 2008 Mumbai terror attack, summarizing
hundreds of user comments to give a sense of what the population cared about on a particular day.
2009 Health Care Reform      | 2008 Mumbai Terror Attack
Health care debate           | Foreign relations perspective
Healthcare staffing problem  | Indian prime minister speech
Obamacare facts              | UK indicating support
Healthcare protestors        | Country of India
Party ratings plummet        | Rejected evidence provided
Public option                | Photographers capture images of Mumbai
Table 4.1: Showing summary key phrases extracted from more than 500 online posts on Twitter around two news-worthy events on a single day.
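As a rough illustration of extracting such summary phrases from many short posts about one event, the sketch below ranks n-grams by how many posts they appear in. The stopword list and example posts are invented, and real extraction in this thesis uses richer thematic, spatial and temporal cues; this is a minimal frequency-based stand-in, not the thesis algorithm.

```python
import re
from collections import Counter

# Rough sketch of summary key phrase extraction over many short posts:
# rank n-grams by the number of posts they occur in.  The stopword list
# and posts below are illustrative placeholders.

STOP = {"the", "a", "an", "of", "to", "is", "on", "and", "in", "for"}

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def key_phrases(posts, n=2, top=3):
    df = Counter()  # document (post) frequency of each n-gram
    for post in posts:
        tokens = [t for t in re.findall(r"[a-z']+", post.lower()) if t not in STOP]
        df.update(set(ngrams(tokens, n)))
    return [p for p, _ in df.most_common(top)]

posts = [
    "the health care debate heats up",
    "public option back in the health care debate",
    "protestors rally over the health care debate",
]
top_phrases = key_phrases(posts, n=2, top=2)
```

Phrases shared across many posts ("health care", "care debate") rise to the top, which is the intuition behind summarizing hundreds of posts with a handful of key phrases.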
Solutions to key phrase extraction have ranged from unsupervised techniques that are based on heuristics to identify phrases to supervised learning approaches that learn from human
14
Thesis Contributions
Task: Aboutness of text
Key Phrase Extraction and Key Phrase Elimination (Twitter; Facebook, MySpace Forums)
• In content: n-grams for thematic cues; word associations from large corpora
• Medium metadata, structural cues: spatial, temporal metadata; Page Title
• External knowledge: seeds from a domain knowledge base
15
Thesis Contributions
Building Social Intelligence Applications: WHO, WHAT, WHEN, WHERE, WHY, HOW
Building on results of NER and Key Phrase Extraction:
1. Application of NER results: BBC Sound Index, with IBM Almaden
2. Application of Key Phrase Extraction: Twitris @ Kno.e.sis
Thesis Significance, Impact
• Focuses on relatively less explored content aspects of expression on social media platforms
• Why text on social media is different from what most text mining applications have focused on
• Combination of top-down and bottom-up analysis for informal text
• Statistical NLP and ML algorithms over large corpora
• Models and rich knowledge bases in a domain
16
TALK OUTLINE - In Detail
ABOUTNESS UNDERSTANDING
• Named Entity Identification in Informal Text
TALK OUTLINE - Overviews
• Topical Key Phrase Extraction from Informal Text
• Applications and Consequences of Understanding Content: Social Intelligence Applications
• BBC SoundIndex, Twitris
17
18
Named Entity Recognition
"I loved your music Yesterday!"
"It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking ‘GI Jane’ worse now"
Thesis Contributions
19
Predominant Focus of Prior Work                              | Thesis Focus
Entity Types: PER, LOC, ORGN, DATE, TIME.. [TREC]            | Entity Types: Cultural Entities
Method: Sequential Labeling                                  | Method: Spot and Disambiguate (pre-supposed knowledge)
Document Types: Scientific Literature, News, Blogs (formal)  | Document Types: Social Media Content, Blogs, MySpace Forums
Features: Word-Level, List-lookup, Document and corpus features | Features: Word-Level, List-lookup, Document and corpus features
Cultural Named Entities
20
• NER focus in my work: Cultural Named Entities
• Names of books, music albums, films, video games, etc.
• The Lord of the Rings, Lips, Crash, Up, Wanted, Today, Twilight, Dark Knight...
• Common words in a language
Characteristics of Cultural Entities
• Varied senses, several poorly documented
• "Merry Christmas" covered by 60+ artists; Star Trek: movies, TV series, media franchise.. and cuisines!!
• Changing contexts with recent events: The Dark Knight as a reference to Obama, health care reform
• Unrealistic expectations: comprehensive sense definitions, enumeration of contexts, labeled corpora for all senses..
21
NER: Relaxing the closed-world sense assumptions
A Spot and Disambiguate Paradigm
23
• NER is generally treated as a sequential prediction problem
• e.g., a NER system achieving a 90.8 F1 score on the CoNLL-2003 NER shared task (PER, LOC, ORG entities) [Ratinov and Roth]
• My approach: a Spot and Disambiguate paradigm
• Spot: a dictionary or list of entities we want to spot
• Disambiguate in context (natural language, domain knowledge cues)
• Binary classification
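The spot-and-disambiguate paradigm can be sketched as below. This is a minimal illustration, not the dissertation's implementation: the entity list and cue set are invented, and a toy cue-overlap rule stands in for the trained binary classifier.

```python
import re

# Illustrative sketch of the Spot and Disambiguate paradigm.
# ENTITY_LIST and MOVIE_CUES are hypothetical; a trained binary
# classifier would replace the toy cue-overlap rule below.

ENTITY_LIST = ["Wanted", "Twilight", "Up"]  # dictionary of entities to spot
MOVIE_CUES = {"movie", "movies", "film", "watch", "watching", "demo"}

def spot(text, entities=ENTITY_LIST):
    """Step 1: spot candidate mentions via dictionary lookup."""
    return [(e, m.start())
            for e in entities
            for m in re.finditer(r"\b%s\b" % re.escape(e), text)]

def disambiguate(text, cues=MOVIE_CUES):
    """Step 2: binary decision -- is the spotted mention used in the
    target (movie) sense?  Cue overlap stands in for the classifier."""
    context = {w.lower().strip(".,!?") for w in text.split()}
    return len(context & cues) > 0

post = "I decided to check out the Wanted demo even though I did not like the movie"
labels = [(e, disambiguate(post)) for e, _ in spot(post)]
```

The two stages mirror the paradigm: spotting presupposes knowledge of which entities we care about, and disambiguation is a per-mention binary decision driven by context cues.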
Thesis Contributions
24
Predominant Focus of Prior Work                              | Thesis Focus
Entity Types: PER, LOC, ORGN, DATE, TIME                     | Entity Types: Cultural Entities
Method: Sequential Labeling                                  | Method: Spot and Disambiguate (pre-supposed knowledge)
Document Types: Scientific Literature, News, Blogs (formal)  | Document Types: Informal Social Media Content, Blogs, MySpace Forums, Twitter, Facebook
Features: Word-Level, List-lookup, Document and corpus features | Features: SENSE-BIASED Word-Level, List-lookup, Document and corpus features
NER Algorithmic Contributions: Supervised, Two Flavors
25
3.2. THESIS FOCUS - CULTURAL NER IN INFORMAL TEXT August 10, 2010
(a) Multiple senses in the same music domain
Bands with a song "Merry Christmas": 60
Songs with "Yesterday" in the title: 3,600
Releases of "American Pie": 195
Artists covering "American Pie": 31

(b) Multiple senses in different domains for the same movie entities
Twilight: Novel, Film, Short story, Albums, Places, Comics, Poem, Time of day
Transformers: Electronic device, Film, Comic book series, Album, Song, Toy line
The Dark Knight: Nickname for comic superhero Batman, Film, Soundtrack, Video game, Themed roller coaster ride
Table 3.3: Challenging Aspects of Cultural Named Entities
3.2.3 Two Approaches to Cultural NER
In this thesis, we present two approaches to Cultural NER, both addressing different challenges in
their identification. Cultural entities display two characteristic challenges related to their sense or
meanings – certain Cultural entities are so commonly used that they tend to have multiple senses
in the same domain. The music industry is a great example of this scenario where popular themes
feature in several track/album titles of different artists. Table 3.3(a) shows examples of such cases
– for example, there are more than 3600 songs with the word ‘Yesterday’ in their title.
Connecting mentions of such entities in free text to their actual real-world references is rather
challenging, especially in light of poor contextual information. If a user post mentioned the song
‘Merry Christmas’, as in, “This new Merry Christmas tune is so good!”; it is non-trivial to disambiguate its reference to one among 60 artists who have covered that song.
On the other hand, there are Cultural entities that span multiple domains. The phrase, ‘The
Hangover’ is a named entity in the film and music domain. Movies that are based on novels
or video games are great examples of such cases of sense ambiguity. Resolving the mention of
‘Wanted’ in Figure 3.2 as a reference to the video game entity (and not the movie reference) is a
“I am watching Pattinson scenes in <movie id=2341>Twilight</movie> for the nth time.” “I spent a romantic evening watching the Twilight by the bay..”
“I love <artist id=357688>Lilyʼs</artist> song <track id=8513722>smile</track>”.
NER - Approach 1
26
Approach 1: Multiple Senses, Multiple Domains
• When a Cultural entity appears in multiple senses across domains in the same corpus
27
3.3. CULTURAL NER – MULTIPLE SENSES ACROSS MULTIPLE DOMAINS August 10, 2010
Title: Peter Cullen Talks Transformers: War for Cybertron
Recently, we heard legendary Transformers voice actor Peter Cullen talk not only about becoming a hero to millions for his portrayal of the heroic Autobot leader, Optimus Prime, but also about being the first person to play the role of video game icon Mario. But today, he focuses more on the recent Transformers video game release, War for Cybertron.
Following are some excerpts from an interview Cullen recently conducted with Techland. On how the Optimus Prime seen in War for Cybertron differs from the versions seen in other branches of the franchise and its multiverse...
Figure 3.1: Showing excerpt of a blog discussing two senses of the entity ‘Transformers’
3.3.1 A Feature Based Approach to Cultural NER
In this work, we propose a new feature that represents the complexity of extracting particular entities. We hypothesize that knowing how hard it is to extract an entity is useful for learning better
entity classifiers. With such a measure, entity extractors become ‘complexity aware’, i.e. they can
respond differently to the extraction complexity associated with different target entities.
Suppose that we have two entities, one deemed easy to extract and the other more complex.
When a classifier knows the extraction complexity of the entity, it may require more evidence (or
apply more complex rules) in identifying the more complex entity compared to the easier target.
Consider concretely a movie recognition system dealing with two movies, say, ‘The Curious Case of Benjamin Button’, a title appearing only in reference to a recent movie, and ‘Wanted’, a segment
with wider senses and intentions. With comparable signals a traditional NER system can only
apply the same inference to both cases whereas a ‘complexity aware’ system has the advantage of
Algorithm Preliminaries
• Problem Space
• Corpus: Weblogs; distribution: unknown
• All senses of a cultural entity: unknown
• Problem Definition
• Input: a target sense (e.g., movies); a list of entities to be extracted
• Goal: disambiguate every entity mention as related to the target sense or not
28
Contribution: Improving NER with a feature-based approach
• Improving classifiers using a novel feature: the "complexity of extraction" in a target sense
• Hypothesis: knowing how hard or easy it is to extract an entity in a particular sense will improve the extraction accuracy of learners
• Making classifiers ‘complexity aware’: ‘The Curious Case of Benjamin Button’ vs. ‘Wanted’
29
Overview
Uncharacterized population (blog corpus), target sense (movies)
List of movies to extract: The Curious Case of Benjamin Button, Twilight, Date Night, Death at a Funeral, The Last Song, Up, Angels and Demons
Sample Population

Entity                               | Complexity of Extraction
The Curious Case of Benjamin Button  | 0.2
Date Night                           | 0.5

Use Complexity of Extraction as a feature in named entity classifiers
NOTE: An entity occurring in fewer varied senses (The Curious Case of Benjamin Button) could still have a high complexity of extraction if the distribution is skewed away from the sense of interest!
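A minimal sketch of making a classifier ‘complexity aware’: the per-entity complexity score is simply appended to whatever base features the extractor already uses. The scores below are the illustrative numbers from the slide, and the base feature values are placeholders, not learned features.

```python
# Sketch: appending the complexity-of-extraction score to an entity's
# base feature vector so a downstream classifier can respond to it.
# The scores and base features are illustrative placeholders.

COMPLEXITY = {
    "The Curious Case of Benjamin Button": 0.2,  # well-supported movie sense
    "Date Night": 0.5,                           # more ambiguous usage
}

def featurize(entity, base_features):
    """Base features (word-level, list-lookup, ...) plus the
    complexity-of-extraction feature; unseen entities default to
    maximal complexity."""
    return base_features + [COMPLEXITY.get(entity, 1.0)]

vec = featurize("Date Night", [1.0, 0.0, 1.0])
```

With comparable base signals, the classifier can now demand more evidence before labeling a high-complexity entity like ‘Wanted’ than an unambiguous one.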
Extraction in a Target Sense
• Complexity of extraction in a sense of interest = how much support the corpus shows toward that sense
• How do we find this?
• Documents that mention the entity in word contexts that are biased toward our sense of interest (language models)
• More documents imply more support, which implies the entity is easy to extract: low complexity of extraction
31
Support via Word Associations
• Co-occurring words alone won’t cut it!
• Prolific discussion and comparison of different senses
• Co-occurrence based language models will give us everything unless we bias them to our sense (movies)
32
Complexity of Extraction
• Goal: complexity of extraction in a target sense
• Subgoal: support in terms of sense-biased contexts in documents that mention the entity
• Step 1: Extract a sense-biased LM
• Step 2: Identify documents that mention the entity in the context of the sense-biased LM
33
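The two steps above can be approximated as in this sketch: given a sense-biased LM (here a hand-picked term set, an assumption standing in for the learned model), count how many entity-mentioning documents use the entity in LM contexts, and invert the support fraction. The documents and overlap threshold are invented for illustration.

```python
# Sketch of the complexity-of-extraction measure: of all documents that
# mention the entity, how many use it in contexts drawn from the
# sense-biased language model?  SENSE_LM and the docs are illustrative.

SENSE_LM = {"movie", "film", "theater", "director", "scene"}  # sense-biased LM terms

def complexity_of_extraction(entity, docs, sense_lm=SENSE_LM, min_overlap=1):
    mentions = [d for d in docs if entity.lower() in d.lower()]
    if not mentions:
        return 1.0  # no evidence at all: treat as maximally complex
    supported = sum(
        1 for d in mentions
        if len({w.strip(".,!") for w in d.lower().split()} & sense_lm) >= min_overlap
    )
    # lots of sense-biased support implies low complexity of extraction
    return 1.0 - supported / len(mentions)

docs = [
    "watched the movie Up at the theater",
    "looking Up at the sky today",
    "Up was my favorite film this year",
    "prices are going Up again",
]
score = complexity_of_extraction("Up", docs)
```

Here only two of the four mentions of ‘Up’ occur in movie-sense contexts, so the entity scores a middling complexity even though it is a single short title.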
Knowledge Features to seed Sense-biased Word Association Gathering
34
• Sense Definition (hints) from Wikipedia Infoboxes
• Working definition: Sense is domain of interest
• Use sense hints to derive contextual support
Lots of support and easy extraction imply a low ‘complexity of extraction’ score!
Measuring ‘complexity of extraction’
Two-step framework (unsupervised)
• Step 1: Propagate sense evidence in the contexts of e; extract a sense-biased language model (LM)
• random walks, distributional similarity approaches
• SPREADING ACTIVATION NETWORKS
35
[Figure: documents D mentioning e; sense hint nodes yield a sense-biased language model]
Overview
• Result: clustered documents in similar senses
• Not just similar words!

                    doc 1          doc 2          ...   doc n
  sense LM term 1   SenseRel(t1)   SenseRel(t1)         SenseRel(t1)
  sense LM term 2   SenseRel(t2)
  ...
  sense LM term m   SenseRel(tm)                        SenseRel(tm)

• Step 2: Clustering documents represented by sense-relatedness vectors
• CHINESE WHISPERS CLUSTERING
Constructing the SAN
J. J. Abrams, Damon Lindelof, Roberto Orci, Alex Kurtzman, Paramount Pictures, Chris Pine, Zachary Quinto, Eric Bana, Zoe Saldana, Karl Urban, John Cho, Anton Yelchin, Simon Pegg, Bruce Greenwood, Leonard Nimoy, Kirk, Spock, Nero, Pavel Chekov, Nyota Uhura, ... (indicative of being a Named Entity)
Star Trek, Startrek
10 minutes. That is all it took for JJ Abrams to make a believer out of me. 10 minutes. Let us set the stage for my viewing of Star Trek. IMAX? Check. Perfect seats? Check..not sit well with me was the libidinous Spock. It changed one of the fundamental aspects of the character for no good reason. Other than that, however, none of the changes to Trek canon particularly bothered me in a "get a life" kind of way.………….the special effects were stunning, and the performances were...wow. Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock
Top X keywords (IDF):
• among the context surrounding (but excluding) the entity of interest
• force-include sense-related words
Spock, IMAX, .., Kirk, Karl Urban, James, .., canon, Chris Pine, libidinous
Activation Network
[Figure: co-occurrence subgraph over the context keywords, e.g., edges of weight 1 among Spock, libidinous, imax, Chris Pine, Kirk]
Repeat this procedure for all blogs; we end up with a connected SAN with some sense nodes and other words in the context of the entity.
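The construction described above can be sketched roughly as follows. Keyword selection is simplified (no IDF ranking here), and the node/edge conventions follow the slides: sense nodes get weight 1, other nodes 0.1, and edge weights are co-occurrence counts.

```python
# Minimal sketch of constructing the spreading activation network (SAN):
# per blog, take the keywords around (but excluding) the entity, then add
# co-occurrence edges between keywords appearing in the same blog.
from collections import defaultdict
from itertools import combinations

def build_san(blog_keywords, sense_hints):
    """blog_keywords: one keyword list per blog.
    Returns (node_weights, edge_counts) of the connected SAN."""
    node_w = {}
    edge_c = defaultdict(int)
    for kws in blog_keywords:
        for w in kws:
            # sense-hint nodes seeded at 1.0, all other nodes at 0.1
            node_w.setdefault(w, 1.0 if w in sense_hints else 0.1)
        # each pair of keywords co-occurring in the same blog adds one count
        for a, b in combinations(sorted(set(kws)), 2):
            edge_c[(a, b)] += 1
    return node_w, edge_c

blogs = [["spock", "imax", "libidinous"],
         ["chris pine", "kirk", "spock"]]
nodes, edges = build_san(blogs, sense_hints={"imax"})
```

Edges are keyed by sorted word pairs so that co-occurrence is undirected, matching the slides' use of raw co-occurrence counts as edge weights.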
Node and Edge Semantics
• Pre-adjustment phase
• Node weights: sense nodes: 1; other nodes: 0.1
• Ambiguous sense nodes
• Alternate seeding methods: distributional similarity with unambiguous domain terms (movie, theatre, imax, cinemas)
• Edge weights: co-occurrence counts
38
Propagating Sense Evidences
Constructing the spreading activation network G from words co-occurring with e in D: sense-hint vertices Y, other vertices X
[Figure: SAN over context words of e — Eric Bana, Sulu, Romulan, movie, franchise, Chris Pine, J. J. Abrams, starship, seats — with sense-hint vertices highlighted and non-activated vertices unshaded]
Post propagation of sense evidences: Spreading Activation Theory
Pulse the sense nodes and spread the effect; as many pulses (iterations) as there are sense nodes
39
The final activated portions of the network indicate a word’s relatedness to the sense = the sense-biased LM
At every iteration: a BFS walk starting at a sense node (weight 1), revisiting nodes (not edges), amplifying the weights of visited nodes:
W[j] = W[j] + (W[i] * co-occ[i, j] * α)
Collective spreading is controlled by the damping factor α and co-occurrence thresholds
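A minimal sketch of the pulse procedure, using the update rule W[j] = W[j] + (W[i] * co-occ[i, j] * α) from the slide. The graph encoding (adjacency lists, co-occurrence counts keyed by sorted word pairs) and the once-per-edge traversal detail are assumptions:

```python
# Sketch of one propagation pulse: a BFS walk from a sense node, revisiting
# nodes (not edges), amplifying each reached node's weight by
# W[j] += W[i] * co_occ[i, j] * alpha.
from collections import deque

def pulse(weights, adj, co_occ, source, alpha=0.5):
    """One pulse: spread evidence outward from a sense node via BFS."""
    seen = set()
    q = deque([source])
    visited = {source}
    while q:
        i = q.popleft()
        for j in adj[i]:
            e = tuple(sorted((i, j)))
            if e in seen:
                continue            # traverse each edge once per pulse
            seen.add(e)
            weights[j] += weights[i] * co_occ[e] * alpha
            if j not in visited:    # nodes may be revisited via new edges
                visited.add(j)
                q.append(j)

def propagate(weights, adj, co_occ, sense_nodes, alpha=0.5):
    for s in sense_nodes:           # as many pulses as sense nodes
        pulse(weights, adj, co_occ, s, alpha)
    return weights

adj = {"movie": ["spock"],
       "spock": ["movie", "libidinous"],
       "libidinous": ["spock"]}
co = {("movie", "spock"): 2, ("libidinous", "spock"): 1}
w = {"movie": 1.0, "spock": 0.1, "libidinous": 0.1}
propagate(w, adj, co, sense_nodes=["movie"], alpha=0.5)
```

After the pulse, words closer (and more strongly co-occurring) with the sense node carry higher weights, which is exactly the relatedness signal the sense-biased LM keeps.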
Sense-biased LM
• Entity: Star Trek (movie)
• 20 iterations (pulsed sense nodes)
• 900+ blogs; 35K+ words in the co-occurrence graph; 167 words in the LM
Sense-biased spreading activation already lends one type of clustering (separation of words strongly related to our sense)
40
Documents D Represented in terms of LMe
[Thumbnails of blog documents, e.g., the “10 minutes. That is all it took for JJ Abrams to make a believer out of me...” Star Trek review]
di(LMe) = { w1, LMe(w1) ; ... ; wx, LMe(wx) }
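The representation di(LMe) = {w1, LMe(w1); ...; wx, LMe(wx)} can be sketched as follows; the LM scores here are made up for illustration:

```python
# Sketch: represent a document as a vector over the sense-biased LM.
# Each LM word present in the document gets its sense-relatedness score
# (instead of a TF-IDF score). Scores below are illustrative only.

def doc_vector(doc_tokens, sense_lm):
    """Keep only LM words present in the doc, weighted by sense relatedness."""
    present = set(doc_tokens)
    return {w: score for w, score in sense_lm.items() if w in present}

sense_lm = {"spock": 1.1, "imax": 0.9, "libidinous": 0.65}
tokens = "saw star trek at the imax libidinous spock scene".split()
vec = doc_vector(tokens, sense_lm)
# a document sharing no words with the LM gets an empty vector --
# such documents fall into the "No Representation" bucket
```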
Step 2: Clustering Using the Extracted LM
Clustering documents in D along the same dimensions as the propagation
[Figure: blog documents grouped into clusters by sense relatedness]
Algorithmic Implementations
Cluster scores are as high as the sense-relatedness scores of the terms in the documents in the clusters
41
Vector Space Model
• Typically: (word, TF-IDF score)
• Here: (word, sense-relatedness score)
http://realart.blogspot.com/2009/05/star-trek-balance-of-terror-from.html
http://susanisaacs.blogspot.com/2009/04/quantum-leap-convention.html
http://semioblog.blogspot.com/2009/01/retrofuturo-web.html
http://wilwheaton.net/2006/05/learn_to_swim.php
No Representation
Chinese Whispers
42
*[Biemann 2006] Biemann, C. (2006): Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems. Proceedings of the HLT-NAACL-06 Workshop on TextGraphs, New York, USA.
10 minutes. That is all it took for JJ Abrams to make a believer out of me. 10 minutes. Let us set the stage for my viewing of Star Trek. IMAX? Check. Perfect seats? Check..not sit well with me was the libidinous Spock. It changed one of the fundamental aspects of the character for no good reason. Other than that, however, none of the changes to Trek canon particularly bothered me in a "get a life" kind of way.………….the special effects were stunning, and the performances were...wow. Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock
10 minutes. That is all it took for JJ Abrams to make a believer out of me. 10 minutes. Let us set the stage for my viewing of Star Trek. IMAX? Check. Perfect seats? Check..not sit well with me was the libidinous Spock. It changed one of the fundamental aspects of the character for no good reason. Other than that, however, none of the changes to Trek canon particularly bothered me in a "get a life" kind of way.………….the special effects were stunning, and the performances were...wow. Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock
10 minutes. That is all it took for JJ Abrams to make a believer out of me. 10 minutes. Let us set the stage for my viewing of Star Trek. IMAX? Check. Perfect seats? Check..not sit well with me was the libidinous Spock. It changed one of the fundamental aspects of the character for no good reason. Other than that, however, none of the changes to Trek canon particularly bothered me in a "get a life" kind of way.………….the special effects were stunning, and the performances were...wow. Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock
10 minutes. That is all it took for JJ Abrams to make a believer out of me. 10 minutes. Let us set the stage for my viewing of Star Trek. IMAX? Check. Perfect seats? Check..not sit well with me was the libidinous Spock. It changed one of the fundamental aspects of the character for no good reason. Other than that, however, none of the changes to Trek canon particularly bothered me in a "get a life" kind of way.………….the special effects were stunning, and the performances were...wow. Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock
10 minutes. That is all it took for JJ Abrams to make a believer out of me. 10 minutes. Let us set the stage for my viewing of Star Trek. IMAX? Check. Perfect seats? Check..not sit well with me was the libidinous Spock. It changed one of the fundamental aspects of the character for no good reason. Other than that, however, none of the changes to Trek canon particularly bothered me in a "get a life" kind of way.………….the special effects were stunning, and the performances were...wow. Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock
10 minutes. That is all it took for JJ Abrams to make a believer out of me. 10 minutes. Let us set the stage for my viewing of Star Trek. IMAX? Check. Perfect seats? Check..not sit well with me was the libidinous Spock. It changed one of the fundamental aspects of the character for no good reason. Other than that, however, none of the changes to Trek canon particularly bothered me in a "get a life" kind of way.………….the special effects were stunning, and the performances were...wow. Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock
10 minutes. That is all it took for JJ Abrams to make a believer out of me. 10 minutes. Let us set the stage for my viewing of Star Trek. IMAX? Check. Perfect seats? Check..not sit well with me was the libidinous Spock. It changed one of the fundamental aspects of the character for no good reason. Other than that, however, none of the changes to Trek canon particularly bothered me in a "get a life" kind of way.………….the special effects were stunning, and the performances were...wow. Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock
10 minutes. That is all it took for JJ Abrams to make a believer out of me. 10 minutes. Let us set the stage for my viewing of Star Trek. IMAX? Check. Perfect seats? Check..not sit well with me was the libidinous Spock. It changed one of the fundamental aspects of the character for no good reason. Other than that, however, none of the changes to Trek canon particularly bothered me in a "get a life" kind of way.………….the special effects were stunning, and the performances were...wow. Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock
• Randomized graph-clustering algorithm for undirected, weighted graphs
• Nodes are documents; edges represent dot-product similarity between documents
• Feature vector = language model from Step 1
• Partitions nodes, i.e. documents, based on maximum average similarity
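The clustering step above can be sketched as follows. This is a minimal illustration, not the dissertation's actual implementation: the documents, term weights, and the greedy randomized assignment (attach each document to the cluster with the highest average dot-product similarity, or start a new cluster) are all hypothetical stand-ins for the real algorithm and Step-1 language models.

```python
import random

# Toy language-model vectors (term -> weight) standing in for the Step-1
# output; documents and weights are illustrative, not from the real corpus.
docs = {
    "d1": {"pixar": 0.5, "carl": 0.3, "adventure": 0.2},
    "d2": {"pixar": 0.4, "russell": 0.4, "adventure": 0.1},
    "d3": {"crossword": 0.7, "movie": 0.1},
}

def dot(u, v):
    """Dot-product similarity between two sparse term-weight vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def cluster(docs, threshold=0.05, seed=0):
    """Greedy randomized clustering: visit documents in random order and
    attach each to the cluster with the highest average similarity to its
    members, starting a new cluster when no cluster is similar enough."""
    rng = random.Random(seed)
    order = list(docs)
    rng.shuffle(order)
    clusters = []  # each cluster is a list of document ids
    for d in order:
        best, best_sim = None, threshold
        for c in clusters:
            avg = sum(dot(docs[d], docs[m]) for m in c) / len(c)
            if avg > best_sim:
                best, best_sim = c, avg
        if best is None:
            clusters.append([d])
        else:
            best.append(d)
    return clusters
```

Here d1 and d2 share weighted terms ("pixar", "adventure") and end up in one cluster, while d3 has no overlap and remains a singleton.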
High and Low Scoring Clusters
43
++++++++++CLUSTER++++++++++ Cluster 1032, total of 4 members, score = 7.46775436953479E-05
  http://...mish-mashy-hodge-podge-with-no.html Keywords: adventure 0.0016552556767791 (1), movie 0.000477036040807386 (5)
  http://perchedraven.blogspot.com/2006/06/reading-habits.html Keywords: adventure 0.0016552556767791 (1), movie 0.000477036040807386 (4)
  http:/..9/05/27/new-york-times-crossword-will-shortz-corey-rubin/ Keywords: adventure 0.0016552556767791 (1), movie 0.000477036040807386 (1)
  http://....no-doubt-have-heard-by-now.html Keywords: adventure 0.0016552556767791 (1), movie 0.000477036040807386 (1)
++++++++++CLUSTER++++++++++ Cluster 2382, total of 4 members, score = 0.130057194715825
  http://torontomike.com/2009/05/advanced_screening_of_pixars_u.html Keywords: comedy 0.00256627265885554 (1), adventure 0.0016552556767791 (2), carl 0.020754281327519 (1), pete docter 0.0549975353578234 (1), carl fredricksen 0.133630375166943 (1), russell 0.116638105837327 (1), pixar 0.0048327182073532 (2), digital 5.70733953613266E-05 (1), disney 2.05783382714942E-06 (2), fredricksen 1 (1), docter 0.016306935328765 (1)
  http://theplaylist.blogspot.com/2009/05/up-pixars-latest-is-profoundly.html Keywords: russell 0.116638105837327 (2), carl 0.020754281327519 (3), carl fredricksen 0.133630375166943 (1), pete docter 0.0549975353578234 (1), comedy 0.00256627265885554 (1), animation 0.0164047754350987 (1), pixar 0.0048327182073532 (7), film 1.32766200713231E-05 (4), adventure 0.0016552556767791 (1), movies 0.0399341006575118 (2)
Low-scoring cluster (1032): little evidence of relatedness to the target sense
High-scoring cluster (2382)
From Clusters to Support
Clustering documents in D along the same dimensions of propagation
44
Cluster scores are as high as the sense-relatedness scores of the terms in the documents they contain
• A conservative estimate, a heuristic
• Average strength of all clusters (A) ≈ average sense relatedness
• C* = clusters with score >= A
• Support = number of documents in strongly sense-related clusters |C*| / number of documents mentioning the entity |D|
The higher the proportion of documents in strongly sense-related clusters, the lower the entity’s complexity-of-extraction score
‘complexity of extraction’ of e = 1 − |C*| / |D|
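The support and complexity computation described above can be sketched directly from the definitions on this slide. The cluster scores and document counts below are invented for illustration; only the formulas (support = |C*| / |D|, complexity = 1 − support, with C* the clusters scoring at or above the average cluster strength A) come from the slide.

```python
# Hypothetical cluster strengths: cluster id -> (score, member-document count).
# Values are made up; scores play the role of average sense-relatedness.
clusters = {
    "c1": (0.130, 4),    # strongly sense-related cluster
    "c2": (0.00007, 4),  # weak cluster
    "c3": (0.045, 2),
}

def extraction_complexity(clusters, total_docs):
    """Support = documents in clusters scoring at or above the average
    cluster strength A, over all documents mentioning the entity;
    complexity of extraction = 1 - support."""
    avg = sum(score for score, _ in clusters.values()) / len(clusters)
    strong_docs = sum(n for score, n in clusters.values() if score >= avg)
    support = strong_docs / total_docs
    return 1.0 - support
```

With these toy numbers only c1 clears the average strength (A ≈ 0.058), so 4 of 10 documents support the target sense and the complexity is 0.6.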
Validating the Framework
45
• Intuitively, some (cultural) entities are harder to extract than others: ‘Up’ vs. ‘The Time Traveler’s Wife’
• For a list of X movies: selected blogs from the general corpus, obtained sense definitions from Wikipedia, and generated support for the movie sense using the proposed framework
46
“Up” and “Wanted” will arguably be harder to extract than a movie like “Angels and Demons” – clearly indicating that our approach for computing ‘extraction complexity’ is effective. A note on the confidence of these scores is presented in Section 4.2.
Table 1: Entities and their computed extraction complexities. Each row gives the entity and variations used in obtaining D, the possible senses found in Wikipedia disambiguation pages, and the complexity of extraction.
Twilight (novel, film, time of day, albums..) 0.4
Up (film, relative direction, abbreviations, albums..) 0.352
Wanted (film, common verb in English, video game, music..) 0.161
Star Trek, Startrek (film, tv series, video game, media franchise..) 0.114
Transformers (toy line, film, comic series, electronic device..) 0.085
The Hangover (unpleasant feeling, film, band, song.. ) 0.072
The Dark Knight, Dark Knight (Batman’s nickname, film, comic series, soundtrack..) 0.070
Angels and Demons, Angels & Demons (novel, movie, episode on The Blade: the series…) 0.066
4.2 NER Improvements
The underlying hypothesis behind this work is that knowing how hard an entity is to extract will allow classifiers to weigh cues differently when deciding whether a spotted mention is a valid entity in the target sense or not. In this second set of experiments, we measure the usefulness of our feature in assisting a variety of supervised classifiers.
Labeling Data
We randomly selected documents one after another from the pool of documents D collected for all entities E in Experiment 1 and labeled every occurrence of entity e and its valid variations in a document as either a movie entity or not, for a total of 1500 labeled spots. We also observed 100% inter-annotator agreement between the two authors over a random sample of 15% of the labeled spots, indicating that labeling for the movie vs. not-movie sense is not hard in this data. Figure 3 shows statistics for the percentage of true positives found for each entity. The percentage of entity mentions that appear in the movie sense implicitly indicates how much support there is for the target movie sense for the entities. It is interesting, then, that this order closely matches the extraction-complexity ordering of the entities, an indication that the approach we use for extracting our feature is sound. In the process of random labeling, the entity “Angels and Demons” received only 10 labels and was therefore discarded from this experiment.
Classifiers, Features, Experimental Setup: We used 3 different state-of-the-art entity classifiers for learning entity extractors – decision tree classifiers, bagging and boosting (using 6 nodes and stump base learners for boosting) [1, 4, 21]. The goal of using different classification models was to show that our measure is useful with different underlying prediction approaches rather than for the purpose of finding the most suitable classifier. We trained and tested the classifiers on an extensive list of features (Figure 4).
We used two well-known features: word-level features that indicate whether the spotted entity is capitalized, surrounded by quotes, etc., and contextual syntactic features that encode the Part-of-Speech tags of words surrounding the entity. We also used knowledge-derived features that indicate whether words already known to be relevant to the target sense of the entity (the sense definition of e) are found in the document, the surrounding paragraph, the title of the post, or in the post or blog URL. The intuition is that the presence of such words strengthens a mention's case for being valid. We also encoded similar features using the extracted language model LMe to test the usefulness of the new words we extracted as relevant to the target sense of the entity.
In addition to the basic word-level and syntactic features, we also measure the usefulness of our proposed feature against a strong ‘contextual entropy’ baseline. This baseline measures how specific a word is in terms of the contexts it occurs in. A general word will occur in varied contexts, and will have a high context entropy value [14]. High entropy in context distribution is an indication that extracting the entity in any sense might be hard. This baseline is very similar in spirit to our feature, except that our proposed measure identifies how hard it is to extract an entity in a particular target sense. We evaluated classifier accuracies in labeling test spots with and without our ‘complexity of extraction’ feature as a prior. Specifically, we used the following feature combinations:
a. Basic features: word-level, syntactic, knowledge features obtained from the sense definitions S and Sd.
b. Baseline: Basic + ‘contextual entropy’ feature as a prior.
c. Our measure: Basic + knowledge features obtained from the extracted LMe + ‘complexity of extraction’ feature as a prior.
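The ‘contextual entropy’ baseline can be sketched as follows; the `contextual_entropy` helper and its toy contexts are illustrative assumptions, not the cited implementation [14]:

```python
import math
from collections import Counter

def contextual_entropy(contexts):
    """Shannon entropy (in bits) of a word's context distribution.
    A general word occurs in varied contexts and gets a high value;
    a specific word concentrates in few contexts and gets a low one."""
    counts = Counter(contexts)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A word always seen in the same context is maximally specific:
specific = contextual_entropy(["watch film", "watch film", "watch film"])
# A word spread evenly over four contexts is more general:
general = contextual_entropy(["go up", "up next", "look up", "stand up"])
```

A high value signals that extracting the entity in any sense is likely to be hard, which is how the baseline is used as a prior.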
Figure 5 shows the precision-recall curves using the basic, baseline and our proposed measures for entity classification using the decision tree and boosting classifiers. We verified the stability of these results using 10-fold cross-validation. We see better performance of our measure compared to both the basic setting and the strong ‘contextual entropy’ baseline. Notably, there is overwhelming improvement in entity extraction over traditional extractor settings (basic features). The stability of the suggested improvement is also confirmed across both classifiers.
We see significant improvements using the proposed feature and now turn to confirm that this is indeed a consistent pattern. Here, we show the averaged performance of binary classification over 100 runs, each run using different and random samples of training and test sets (obtained from 50-50 splits).
We measured the F-measure and accuracy of the classifiers using the basic, baseline and our proposed measure features. Accuracy is defined as the number of correct classifications (both true positives and negatives) divided by the total number of classifications. We use accuracy to represent general classification improvement – when we care about classifying both the correct and incorrect cases. The F-measure is the standard harmonic mean of precision and recall of classification results and we use it to represent information retrieval improvement – when we only care about our target sense. We report both of these metrics here for consistency with past literature.
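With hypothetical confusion counts, these two definitions can be checked in a few lines (a sketch, not tied to the reported experiments):

```python
def accuracy(tp, tn, fp, fn):
    # correct classifications (true positives and negatives) over all cases
    return (tp + tn) / (tp + tn + fp + fn)

def f_measure(tp, fp, fn):
    # harmonic mean of precision and recall
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion counts for one classifier run:
acc = accuracy(tp=80, tn=60, fp=20, fn=40)   # 140 / 200 = 0.70
f1 = f_measure(tp=80, fp=20, fn=40)          # precision 0.80, recall 0.667
```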
Figure 3 Labeled Data
Figure 4 Features used in judging NER improvements
Computed Extraction Complexities
Hand-labeled data% true positives
46
“Up” and “Wanted” will arguably be harder to extract than a movie like “Angels and Demons” – clearly indicating that our approach for computing ‘extraction complexity’ is effective. A note on the confidence of these scores is presented in Section 4.2.
Table 1: Entities and their computed extraction complexities (entity and variations used in obtaining D; possible senses found in Wikipedia disambiguation pages)
NER Improvements
• Goal: evaluate existing NE classifiers with and without the ‘complexity of extraction’ feature
• Spot-and-disambiguate paradigm
• Decision tree classifiers, bagging and boosting
• Cultural entity: movie entities in weblogs
• Binary classification: 1500+ multiple-author annotations from a 2130K general blog corpus
47
Feature Set
Basic:
• Boolean word-level features: first letter capitalized, all capitalized, in quotes
• Boolean contextual syntactic features: POS tags of words before and after the spot
• Knowledge features: Infobox sense definitions in the same blog, same paragraph as the entity, title, URL, blog URL
Baseline: Basic + contextual entropy prior
Our Measure: Basic + proposed ‘complexity of extraction’ prior
• Knowledge features: extracted sense-biased LM in the same blog, same paragraph as the entity, title, URL, blog URL
PR Curves (10-fold cross-validation)
basic: word-level, syntactic, sense definitions
baseline: Basic + contextual entropy
our measure: Basic + sense-biased LM + ‘complexity of extraction’
49
Binary Classification (over 100 runs): F-measure, Accuracy
basic: word-level, syntactic, sense definitions
baseline: Basic + contextual entropy
our measure: Basic + sense-biased LM + ‘complexity of extraction’
Basic features: average accuracy 74%; F-measure 69%
Proposed feature: +10% accuracy; +11% F-measure
50
Entity-level improvements: Basic vs. proposed feature sets
Accuracy improvements: The Dark Knight (+26.9%) and The Hangover (+31%)
F-measure improvements: Up (+12.6%), The Dark Knight (+14.9%) and The Hangover (+16.5%)
51
The Polluted LM of Twilight
• “I spent a romantic evening watching the Twilight…”
• “here are photos of the Twilight from the bay..” (photos turned up in the extracted LM: “red carpet photos of the Twilight crew”)
• “I am re-reading Edward Cullen chapters of Twilight” mentions Cullen, a character in the movie.
The extracted sense-biased LM terms (words in bold) were used to derive knowledge features and negatively impacted classifier results
Thesis Statement
• Knowledge features (Wikipedia Infobox/IMDB)
• From statistically significant co-occurrences to a sense-biased LM
• Baseline settings (state-of-the-art for cultural entities): F-score 69% for traditional extractor settings
• 90.8 F-score on the CoNLL-2003 NER shared task (PER, LOC, ORG entities) [Lev Ratinov, Dan Roth]
• Average improvements: +11% (F-score)
• Generic methods / proposed feature
53
APPLICATIONS
54
Several Applications of this Work
55
• Weak indicators for contextual browsing/search; a reduced document set for manual labeling
• Steps 1, 2: clustered sense-biased documents
• IE pipeline: ignore an entity if its extraction complexity is high
• Step 1 (LM generation): unsupervised domain lexicon generation
• restaurants and bars
Related Terms for Topic Classification
Unsupervised generation of associated words in the Restaurants and Bars topic
56
Step 1: Pulse on “Restaurant”
Step 2: Pulse on the top n surfaced terms from Step 1 (review, table, reservation, waiters)
57
restaurant, waiters, tasty, waiter, dish, nutrition, review, cooking, reviews, tibits, vegetarian, chef, sweet, bourdain, waitress, reservations, lunch, dishes, sushi, cuisine, burger, taste, burgers, fries, french, wines, tapas, wineries, wine, café, huang, vietnamese, espresso, anhui, coffee, shops, hotels, cafes, diners, bars, called, hefei, menus, chefs, michelin, dine, establishments, tourist, eateries, chain, meals, culinary, stores, pubs, food, retail, chains, specialty, bakeries, vendors, fuyang, restaurants, entrees, appetizers, salads, menu, assignment, shopper, shoppers, service, delicious, meal, paleo, eating, booths, tables, buffet, shrimp, chopsticks, eat, micah, tierney, dinners, dinner, mkhulu, san, tex, mexican, italian, pizza, brunch, bar, dining, steak, place, seafood, servers, salad, hostess, chinese, sandwich, patrons, bakery, eatery, local, outdoor, diner, mcdonald, greek, fancy, ate, ordering, cheese, business, thai, sandy, dined, hotel, japanese, afternoon, celebrate, birthday, cafe, table, downtown, francisco, good, seating, taco, foods, mex, night, soup, gift, chicken, banquet, anniversary, themed, pizzas, recommend, don, priced, pancakes, burrito, famous, neighborhood, drinks, potato, dessert, sausage, restaurateur, tonight, nearby, german, morton’s, casual, reception, kosher, ranch, favorite, servings, crab, appetizer, steaks, toilets, veggie, grilled, baked, pho, pasta, opened, wonderful, reservation, mussels, quaint, pancake, chinatown, foodies, oasis, swanky, kitchen, enjoyed, patio, work, upscale, friend, plate, cab, corner, coworkers, cooks, valentine, celebrated, arrive, stuffed, owners, discount, bistro, vegan, …
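The two-step “pulse” procedure above can be sketched as follows; the tiny corpus, the document-co-occurrence scoring, and the `pulse_lexicon` helper are all illustrative assumptions, not the system's actual implementation:

```python
from collections import Counter

def cooccurring_terms(corpus, seed, top_n):
    """Rank terms by how many documents they share with `seed`."""
    counts = Counter()
    for doc in corpus:
        words = set(doc.lower().split())
        if seed in words:
            counts.update(words - {seed})
    return [w for w, _ in counts.most_common(top_n)]

def pulse_lexicon(corpus, seed, top_n):
    # Step 1: pulse on the seed term
    lexicon = set(cooccurring_terms(corpus, seed, top_n))
    # Step 2: pulse on the top-n surfaced terms from Step 1
    for term in list(lexicon):
        lexicon.update(cooccurring_terms(corpus, term, top_n))
    return lexicon

corpus = [
    "restaurant waiters menu",
    "restaurant waiters table",
    "restaurant table reservation",
    "waiters table menu tips",
]
lex = pulse_lexicon(corpus, "restaurant", top_n=2)
```

A real run would use a statistical significance test over a large corpus rather than raw document counts.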
NER - Approach 2
58
• When a cultural entity has several senses in the same domain
• Domain: Music; Senses: album, tracks, band, artist name
Approach 2: Multiple senses same Domain
3.4. CULTURAL NER – MULTIPLE SENSES IN THE SAME DOMAIN August 12, 2010
3.4 Cultural NER – Multiple Senses in the Same Domain
Compared to our first contribution in Cultural NER (Section 3.3), which focused on disambiguating entity names that span different domains (e.g., movies vs. video games), the focus of our second contribution, detailed in this chapter, is the identification of Cultural entities that appear in multiple senses even within the same domain.
The occurrence of a person's last name such as ‘Clinton’ in text, even if restricted to documents in the political domain, could refer to Bill, Hillary, or Chelsea Clinton. Figure 3.12 shows another example: the word ‘Celebration’ used as the name of a band, song, album and track title by multiple artists in the music domain.
‘Celebration’ (song), a song by Kool & The Gang, notably covered by Kylie Minogue
‘Celebration’ (Voices With Soul song), the debut single from girl band, Voices With Soul
‘Celebration’, a song by Krokus from Hardware
‘Celebration’ (Simple Minds album), a 1982 album by Simple Minds
‘Celebration’ (Julian Lloyd Webber album), a 2001 album by Julian Lloyd Webber
‘Celebration’ (Madonna album), a 2009 greatest hits album by Madonna
‘Celebration’ (Madonna song), same-titled single by Madonna
‘Celebration’ (band), a Baltimore-based band
‘Celebration’ (Celebration album), a 2006 album by ‘Celebration’
‘Celebration’ (musical), a 1969 musical theater work by Harvey Schmidt and Tom Jones
Figure 3.12: Usages of the word ‘Celebration’ as the name of a band, hit song, album and track title by multiple artists in the music domain.
The goal of the algorithm described in this chapter is the fine-grained identification and disambiguation of such entities (in social media text) that have multiple real-world references within the same domain.
79
Algorithm Preliminaries
60
• Problem Space
• Cultural entity: music albums, tracks
• Smile (Lily Allen), Celebration (Madonna)..
• Corpus: MySpace comments
• Context-poor utterances
• “Happy 25th Lilly, Alfie is funny”
• Goal: semantic annotation of named entities (w.r.t. MusicBrainz)
Using a Knowledge Resource for NER
61
• 60 songs with “Merry Christmas”
• 3600 songs with “Yesterday”
• 195 releases of “American Pie”
• 31 artists covering “American Pie”
“Happy 25th! Loved your song Smile ..”
Semantic Annotation
Using a domain knowledge base is not straightforward
Approach Overview
62
“This new Merry Christmas tune.. SO GOOD!”
Which ‘Merry Christmas’? ‘So Good’ is also a song!
• Scoped relationship graphs
• using context cues from the content, webpage title, URL..
• Reduce potential entity spot size
• Generate candidate entities
• Spot and disambiguate
Scoping via Real-world Restrictions
“I heart your new album Habits”
• Eliminate album releases that are not ‘new’ using metadata in MusicBrainz
• From all of MusicBrainz (281890 artists, 6220519 tracks) to the tracks of one artist
• Restrictions closely follow the distribution of random restrictions, conforming loosely to a Zipf distribution
• Choosing which constraints to implement is simple: pick whatever is easiest first
[Chart: precision of the spotter on a log scale (0.001% to 100%) under different restrictions, from the entire MusicBrainz taxonomy (lowest precision), through restrictions such as artists who released an album in recent years or artists with a given number of albums, up to a spotter trained on only one artist’s songs (highest precision)]
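A minimal sketch of such scoping; the catalog, artist names, and two-year “new release” cutoff below are invented for illustration (the real system applies these restrictions against MusicBrainz metadata):

```python
from datetime import date

# Hypothetical scoped-down catalog: (artist, album, release_date)
catalog = [
    ("Artist A", "Habits", date(2010, 3, 16)),
    ("Artist A", "Older Album", date(2004, 5, 1)),
    ("Artist B", "Habits", date(1999, 1, 1)),
]

def scope(catalog, artist, mentioned_as_new, today=date(2010, 8, 1)):
    """Keep only the commented artist's releases; if the comment says
    'new', also drop releases older than an assumed two-year cutoff."""
    candidates = [r for r in catalog if r[0] == artist]
    if mentioned_as_new:
        candidates = [r for r in candidates if (today - r[2]).days <= 2 * 365]
    return [album for _, album, _ in candidates]

# "I heart your new album Habits" posted on Artist A's page:
result = scope(catalog, "Artist A", mentioned_as_new=True)
```

Each such restriction shrinks the candidate entity list before any spotting is attempted.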
Scoped Entity Lists
64
• User comments are on MySpace artist pages
• Contextual restriction: artist name (e.g., Madonna’s tracks)
• Assumption: no other artist/work is mentioned
• A naive spotter has the advantage of spotting all possible mentions (modulo spelling errors)
• but generates several false positives
“this is bad news, ill miss you MJ”
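A naive dictionary spotter over a scoped entity list might look like this sketch (hypothetical track list; not the system's code):

```python
def naive_spots(comment, scoped_entities):
    """Spot every occurrence of each scoped entity name in the comment,
    case-insensitively; there is no disambiguation, so false positives
    (e.g., the common-noun sense of a track name) are reported too."""
    text = comment.lower()
    spots = []
    for entity in scoped_entities:
        needle = entity.lower()
        start = text.find(needle)
        while start != -1:
            spots.append((entity, start))
            start = text.find(needle, start + 1)
    return spots

madonna_tracks = ["Celebration", "4 Minutes"]  # scoped entity list (illustrative)
spots = naive_spots("Loved Celebration! what a celebration it was", madonna_tracks)
```

Both occurrences are spotted even though only the first is a music mention, which is exactly the false-positive problem the disambiguation stage addresses.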
Non-Music Mentions
• Challenge 1: Several senses in the same domain
• Scoping relationship graphs narrows possible senses
• Challenge 2: Non-music mentions
• Got your new album Smile. Loved it!
• Keep your SMILE on!
valid mention?
65
Hand-labeling: Fairly Subjective
• 1800+ spots in MySpace user comments from artist pages
• “Keep your SMILE on!”: good spot, bad spot, inconclusive?
• 4-way annotator agreement
• Madonna: 90% agreement
• Rihanna: 84% agreement
• Lily Allen: 53% agreement
66
Supervised Learners
67
Table 6. Features used by the SVM learner
Syntactic features (Notation-S):
+POS tag of s (s.POS)
POS tag of one token before s (s.POSb)
POS tag of one token after s (s.POSa)
Typed dependency between s and a sentiment word (s.POS-TDsent*)
Typed dependency between s and a domain-specific term (s.POS-TDdom*)
Boolean typed dependency between s and sentiment (s.B-TDsent*)
Boolean typed dependency between s and a domain-specific term (s.B-TDdom*)
Word-level features (Notation-W):
+Capitalization of spot s (s.allCaps)
+Capitalization of first letter of s (s.firstCaps)
+s in quotes (s.inQuotes)
Domain-specific features (Notation-D):
Sentiment expression in the same sentence as s (s.Ssent)
Sentiment expression elsewhere in the comment (s.Csent)
Domain-related term in the same sentence as s (s.Sdom)
Domain-related term elsewhere in the comment (s.Cdom)
+ marks basic features; the others are advanced features.
* These features apply only to one-word-long spots.
Table 7. Typed Dependencies Example
Valid spot: Got your new album Smile. Simply loved it!
Encoding: nsubj(loved-8, Smile-5), implying that Smile is the nominal subject of the expression loved.
Invalid spot: Keep your smile on. You’ll do great!
Encoding: no typed dependency between smile and great.
Typed Dependencies: We also captured the typed dependency paths (grammatical relations) via the s.POS-TDsent and s.POS-TDdom features. These were obtained between a spot and co-occurring sentiment and domain-specific words by the Stanford parser [12] (see the example in Table 7). We also encode a boolean value indicating whether a relation was found at all, using the s.B-TDsent and s.B-TDdom features. This allows us to accommodate parse errors given the informal and often non-grammatical English in this corpus.
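Encoding the boolean typed-dependency features from a parse might look like this sketch; the dependency triples below are hypothetical parser output rather than an actual Stanford parser call:

```python
def encode_td_features(spot, deps, sentiment_words, domain_words):
    """deps: (relation, governor, dependent) triples from a dependency
    parse. Returns the boolean typed-dependency features for `spot`."""
    related = set()
    for _rel, gov, dep in deps:
        if spot == gov:
            related.add(dep)
        elif spot == dep:
            related.add(gov)
    return {
        "B-TDsent": bool(related & sentiment_words),
        "B-TDdom": bool(related & domain_words),
    }

# "Got your new album Smile. Simply loved it!" (hypothetical parse output)
deps = [("nsubj", "loved", "Smile"), ("amod", "album", "new")]
feats = encode_td_features("Smile", deps, {"loved"}, {"album"})
```

Here the spot Smile is grammatically related to the sentiment word loved but not to the domain word album, so only the sentiment feature fires.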
5.2 Data and Experiments
Our training and test data sets were obtained from the hand-tagged data (see Table 3). Positive and negative training examples were all spots that all four annotators had confirmed as valid or invalid respectively, for a total of 571 positive and 864 negative examples. Of these, we used 550 positive and 550 negative examples for training. The remaining spots were used for test purposes.
Our positive and negative test sets comprised all spots that three annotators had confirmed as valid or invalid spots, i.e., had a 75% agreement. We also included spots where 50% of the annotators had agreement on the validity of the
Generic syntactic, spot-level, domain, knowledge features
[“Multimodal Social Intelligence in a Realtime Dashboard System”, VLDB Journal, Special Issue on "Data Management and Mining on Social Networks and Social Media", 2010]
1. Sentiment expressions: slang sentiment gazetteer using Urban Dictionary
2. Domain-specific terms: music, album, concert..
Efficacy of Features
PR tradeoffs: choosing feature combinations depends on the end application’s requirements
• Recall-intensive: 90-35 (identified 90% of valid spots, eliminated 35% of invalid spots)
• 78-50
• Precision-intensive: 42-91
Thesis statement: feature combinations were the most stable and best performing
Gazetteer-matched domain words and sentiment expressions proved to be useful
Dictionary Spotter + NLP
69
Step 1: Spot with a naive spotter, knowledge base restricted to an artist’s tracks
• Madonna’s track spots: 23% precision
Step 2: Disambiguate using NL features (SVM classifier)
• Madonna’s track spots: ~60% precision
• 42-91: “All features” setting
[Chart: classifier accuracy on valid/invalid splits (precision and recall), showing precision for Lily Allen, Rihanna and Madonna and recall over all three, from the naive spotter baseline across feature combinations]
Thesis Statements
70
• Highlights issues with using domain knowledge for an IE task
• Two-stage approach: chaining NL learners over the results of domain-model-based spotters
• Improves accuracy by up to a further 50%
• allows the more time-intensive NLP analytics to run on less than the full set of input data
APPLICATIONS
71
BBC SoundIndex (IBM Almaden)Pulse of the Online Music Populace
http://www.almaden.ibm.com/cs/projects/iis/sound/
Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: “Multimodal Social Intelligence in a Real-Time Dashboard System”, to appear in a special issue of the VLDB Journal on “Data Management and Mining for Social Networks and Social Media”, 2010
Domain metadata, Artist/Track
unstructured, structured metadata
ETL
UIMA Analytics Environment
Album/Track identification [ISWC09]
Sentiment Identification
Spam and off-topic comments
“U R $o Bad!”, “Thriller is my most fav MJ album”;“this is BAD news, miss u MJ”
ETL
Thriller/NNP is/VBZ my/PRP$ most/RBS fav/JJ MJ/NN album/NN
Extracted concepts into explorable data structures
ETL
What are 18 year olds in London listening to?
Crowd-sourced preferences
ETL
Several Insights..
74
[Charts: spam vs. non-spam and negative vs. positive sentiment splits; negative comments < 4% of the total]
Predictive Power of Data
21
38% of total comments were spam
61% of total comments had positive sentiments
4% of total comments had negative sentiments
35% of total comments had no identifiable sentiments
Table 7 Annotation Statistics
As described in Section 8, the structured metadata
(artist name, timestamp, etc.) and annotation results
(spam/non-spam, sentiment, etc.) were loaded in the
hypercube.
The data represented by each cell of the cube is the number of comments for a given artist. The dimensionality of the cube is dependent on what variables we are examining in our experiments. Timestamp, age and gender of the poster, geography, and other factors are all dimensions in the hypercube, in addition to the measures derived from the annotators (spam, non-spam, number of positive sentiments, etc.).
For the purposes of creating a top-N list, all dimensions except for artist name are collapsed. The cube is then sliced along the spam axis (to project only non-spam comments) and the comment counts are projected onto the artist name axis. Since the percentage of negative comments was very small (4%), the top-N list was prepared by sorting artists on the number of non-spam comments they had received, independent of the sentiment scoring.
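In the simplest case, the slice-and-project step described above reduces to filtering out spam and counting comments per artist. A minimal sketch with hypothetical records (artist names borrowed from Table 8; the real system operates on a multidimensional cube, not a flat list):

```python
from collections import Counter

# Hypothetical annotated comments: structured metadata plus annotator
# results, as loaded into the hypercube.
comments = [
    {"artist": "Soulja Boy", "spam": False, "sentiment": "pos"},
    {"artist": "Soulja Boy", "spam": False, "sentiment": "none"},
    {"artist": "T.I.", "spam": False, "sentiment": "pos"},
    {"artist": "T.I.", "spam": True, "sentiment": "pos"},  # dropped by the spam slice
    {"artist": "Rihanna", "spam": False, "sentiment": "neg"},
]

def top_n(records, n):
    """Collapse all dimensions except artist, slice along the spam axis,
    and project comment counts onto the artist axis. Sentiment is ignored,
    as in the paper (only ~4% of comments were negative)."""
    counts = Counter(r["artist"] for r in records if not r["spam"])
    return [artist for artist, _ in counts.most_common(n)]

ranking = top_n(comments, 3)  # Soulja Boy ranks first with 2 non-spam comments
```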
In Table 8 we show the top 10 most popular Billboard artists and the list generated by our analysis of MySpace for the week of the survey. While some top artists appear on both lists (e.g., Soulja Boy, Timbaland, 50 Cent, and Pink), there are important differences. In some cases, our MySpace Analysis list clearly identified rising artists before they reached the Top-10 list on Billboard (e.g., Fall Out Boy and Alicia Keys both climbed to #1 on Billboard.com shortly after we produced these lists). Overall, we can observe that the Billboard.com list contains more artists with a long history and large body of work (e.g., Kanye West, Fergie, Nickelback), whereas our MySpace Analysis list is more likely to identify "up and coming" artists. This is consistent with our expectations, particularly in light of the aforementioned industry reports which indicate that teenagers are the biggest music influencers (Mediamark, 2004).
11.1.2 The Word on the Street
Using the above lists, we performed a casual preference poll of 74 people in the target demographic. We conducted a survey among students of an after-school program (Group 1), Wright State (Group 2), and Carnegie Mellon (Group 3). Of the three different groups, Group 1 was comprised of respondents between ages 8 and 15, while Groups 2 and 3 were primarily comprised of college students in the 17-22 age group. Table 9 shows statistics pertaining to the three survey groups.
Table 8 Billboard's Top Artists vs. our generated list
Billboard.com  | MySpace Analysis
Soulja Boy     | T.I.
Kanye West     | Soulja Boy
Timbaland      | Fall Out Boy
Fergie         | Rihanna
J. Holiday     | Keyshia Cole
50 Cent        | Avril Lavigne
Keyshia Cole   | Timbaland
Nickelback     | Pink
Pink           | 50 Cent
Colbie Caillat | Alicia Keys
Table 9 Survey Group Statistics
Groups and Age Range | Male respondents | Female respondents
Group 1 (8-15)       | 8                | 9
Group 2 (17-22)      | 21               | 26
Group 3 (17-22)      | 7                | 3
The survey was conducted as follows: the 74 respondents were asked to study the two lists shown in Table 8. One was generated by Billboard and the other through the crawl of MySpace. They were then asked the following question: "Which list more accurately reflects the artists that were more popular last week?" Their response, along with their age, gender and the reason for preferring a list, was recorded.
The sources used to prepare the lists were not shown to the respondents, so they would not be influenced by the popularity of MySpace or Billboard. In addition, we periodically switched the lists while conducting the study to avoid any bias based on which list was presented first.
11.1.3 Results
The raw results of our study immediately suggest the validity of the system, as can be seen in Table 10. The MySpace-data-generated list is preferred more than 2 to 1 over the Billboard list by our 74 test subjects, and the preference is consistently in favor of our list across all three survey groups.
More exactly, 68.9 ± 5.4% of subjects prefer the SI-derived list to the Billboard list. Looking specifically at Group 1, the youngest survey group whose ages range from 8-15, we can see that our list is even more successful. Even with a smaller sample group (resulting in
User study indicated a 2:1, and up to 7:1 (younger age groups), preference for the MySpace list
Billboard's Top 50 Singles chart during the week of Sept 22-28 '07 vs. MySpace popularity charts
Challenging traditional polling methods!
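As a sanity check on the "68.9 ± 5.4%" figure quoted above: assuming it is a binomial proportion with a one-standard-error margin, a hypothetical split of 51 of the 74 respondents preferring the MySpace-derived list reproduces both numbers, as well as the roughly 2:1 preference.

```python
import math

k, n = 51, 74                    # hypothetical split consistent with the reported figure
p = k / n                        # sample proportion, ~0.689
se = math.sqrt(p * (1 - p) / n)  # one standard error, ~0.054
print(f"{100 * p:.1f} +/- {100 * se:.1f}%")     # -> 68.9 +/- 5.4%
print(f"preference ratio {k / (n - k):.1f}:1")  # -> preference ratio 2.2:1
```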
INTERMISSION? Up Next: Overview of Key Phrase Extraction,
Applications, Conclusions
Key Phrase Extraction
• Different from Information Extraction
• Extracting vs. Assigning Key Phrases
• Thesis Focus: Key Phrase Extraction
• Prior work focus: extracting phrases that summarize a document -- a news article, a web page, a journal article, a book..
• Thesis focus: summarize multiple documents (UGC) around same event/topic of interest
Key Phrase Extraction - Aboutness Understanding
• Prominent discussions (key phrases) around the 2009 Health Care Reform debate and 2008 Mumbai Terror Attack on one day
Key Phrase Extraction - Aboutness Understanding
4.1. KEY PHRASE EXTRACTION - ‘ABOUTNESS’ OF CONTENT August 10, 2010
document that are descriptive of its contents.
The contributions made in this thesis fall under the second category of extracting key phrases
that are explicitly present in the content and are also indicative of what the document is ‘about’.
The focus of previous approaches to key phrase extraction has been on extracting phrases that summarize a document, e.g. a news article, a web page, a journal article or a book. In contrast, the focus of this thesis is not in summarizing a document generated by users on social media platforms but to extract key phrases that are descriptive of information present in multiple observations (or documents) made by users about an entity, event or topic of interest.
The primary motivation is to obtain an abstraction of a social phenomenon that makes volumes
of unstructured user-generated content easily consumable by humans and agents alike. As an
example of the goals of our work, Table 4.1 shows key phrases extracted from online discussions
around the 2009 Health Care Reform debate and the 2008 Mumbai terror attack, summarizing
hundreds of user comments to give a sense of what the population cared about on a particular day.
2009 Health Care Reform     | 2008 Mumbai Terror Attack
Health care debate          | Foreign relations perspective
Healthcare staffing problem | Indian prime minister speech
Obamacare facts             | UK indicating support
Healthcare protestors       | Country of India
Party ratings plummet       | Rejected evidence provided
Public option               | Photographers capture images of Mumbai
Table 4.1: Showing summary key phrases extracted from more than 500 online posts on Twitter around two news-worthy events on a single day.
Solutions to key phrase extraction have ranged from unsupervised techniques based on heuristics for identifying phrases to supervised learning approaches that learn from human
105
Key Phrase Extraction on Social Media Content
• Thesis Focus: Summarizing Social Perceptions via key phrase extraction
• Preserving/Isolating the social behind the social data
• Accounting for redundancy, variability, off-topic content
80
“Met up with mom for lunch, she looks lovely as ever, good genes .. Thanks Nike, I love my new Gladiators ..smooth as a feather. I burnt all the calories of Italian joy in one run.. if you are
looking for good Italian food on Main, Buca is the place to go.”
Social and Cultural Logic in UGC
• Thematic components
• similar messages convey similar ideas
• Space, time metadata• role of community and geography in communication
• Poster attributes• age, gender, socio-economic status reflect similar
perceptions
81
Feature Space (in prior work and in thesis)
• Thesis Focus: n-grams, spatio-temporal metadata (social components)
• Syntactic Cues: In quotes, italics, bold; in document headers; phrases collocated with acronyms
• Document and Structural Cues: Two word phrases, appearing in the beginning of a document, frequency, presence in multiple similar documents etc.
• Linguistic Cues: Stemmed form of a phrase, phrases that are simple and compound nouns in sentences etc.
82
Key Phrase Extraction Overview
User-generated Content: textual component tc, temporal parameter tt, spatial parameter tg
Spatio-Temporal Clusters: δs Event Spatial Bias, δt Event Temporal Bias
Key Phrase Generation: n-gram generation
n-gram Weighting: Thematic, Temporal and Spatial Scores
Off-topic Key Phrase Elimination
“President Obama in trying to regain control of the health-care debate will likely shift his pitch in September”
1-grams: President, Obama, in, trying, to, regain, ...
2-grams: “President Obama”, “Obama in”, “in trying”, “trying to”...
3-grams: “President Obama in”, “Obama in trying”; “in trying to”...
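The n-gram generation step on the example sentence above can be sketched as:

```python
def ngrams(text, max_n=3):
    """Enumerate all 1-grams through max_n-grams of a post: the candidate
    key phrases fed to the weighting stage."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]

candidates = ngrams("President Obama in trying to regain control")
# includes "President", "President Obama", "President Obama in", ...
```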
Candidate descriptors: “President”, “President Obama”, “President Obama in”
• A descriptor is an n-gram weighted by:
• Thematic Importance: TFIDF, stop words, noun phrases
• redundancy: statistically discriminatory in nature
• variability: contextually important
• Spatial Importance (local vs. global popularity)
• Temporal Importance (always popular vs. currently trending)
Candidate descriptors: “President”, “President Obama”, “President Obama in”
Higher-order n-grams picked over lower-order n-grams (if same scores)
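One way to read the weighting scheme above as code. The combination below (a TF-IDF thematic score boosted by a local-vs-global ratio and a trending-vs-always ratio, with a higher-order tie-break) is an illustrative guess at the shape of the scoring, not the thesis formula; all counts and names are hypothetical.

```python
import math

def descriptor_score(tf, df, n_docs, local_freq, global_freq, recent_freq, total_freq):
    """Weight an n-gram by thematic (TF-IDF), spatial (local vs. global
    popularity) and temporal (currently trending vs. always popular)
    importance."""
    thematic = tf * math.log((1 + n_docs) / (1 + df))
    spatial = local_freq / (1 + global_freq)    # high when locally popular
    temporal = recent_freq / (1 + total_freq)   # high when trending now
    return thematic * (1 + spatial) * (1 + temporal)

def pick_descriptor(candidates):
    """On equal scores, prefer the higher-order n-gram
    ('President Obama' over 'President')."""
    return max(candidates, key=lambda c: (c["score"], c["order"]))

best = pick_descriptor([
    {"phrase": "President", "order": 1, "score": 3.2},
    {"phrase": "President Obama", "order": 2, "score": 3.2},
])
# best is the 2-gram "President Obama"
```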
Eliminating Off-topic Content [WISE2009]
• Frequency based heuristics will not eliminate off-topic content that is ALSO POPULAR
• “Yeah i know this a bit off topic but the other electronics forum is dead right now. im looking for a good camcorder, somethin not to large that can record in full HD only ones so far that ive seen are sonys”
• “Canon HV20. Great little cameras under $1000.”
87
Approach Overview
• Assume one or more seed words (from a domain knowledge base): C1 = ['camcorder']
• Extracted key words/phrases: C2 = ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']
• Gradually expand C1 by adding phrases from C2 that are strongly associated with C1
• Mutual Information based algorithm [WISE2009]
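A hedged sketch of the seed-expansion loop above. The thesis uses a mutual-information-based algorithm [WISE2009]; the pointwise-MI scoring, the threshold, and all counts below are illustrative assumptions, not the published formulation.

```python
import math

def pmi(co, fx, fy, n):
    """Pointwise mutual information of two phrases from co-occurrence
    counts over n posts; -inf when they never co-occur."""
    if co == 0:
        return float("-inf")
    return math.log((co * n) / (fx * fy))

def expand_seeds(seeds, candidates, cooc, freq, n, threshold=0.0):
    """Grow the topical set C1 by repeatedly adding candidate phrases from
    C2 strongly associated with any phrase already in C1; whatever is
    never added is treated as off-topic."""
    topical = set(seeds)
    grew = True
    while grew:
        grew = False
        for c in candidates:
            if c in topical:
                continue
            score = max(
                pmi(cooc.get((s, c), cooc.get((c, s), 0)), freq[s], freq[c], n)
                for s in topical
            )
            if score > threshold:
                topical.add(c)
                grew = True
    return topical

# Hypothetical counts over n=100 forum posts
freq = {"camcorder": 10, "canon hv20": 6, "cameras": 8, "recipe": 7}
cooc = {("camcorder", "canon hv20"): 5, ("canon hv20", "cameras"): 4}
topical = expand_seeds(["camcorder"], ["canon hv20", "cameras", "recipe"], cooc, freq, 100)
# 'recipe' never associates with the growing seed set, so it is left out
```

Note that "cameras" enters only transitively, through "canon hv20", which is the point of growing the set gradually rather than scoring everything against the original seed alone.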
Key Phrases & Aboutness - Evaluations
• Are the key phrases we extracted topical and good indicators of what the content is about?
• If so, they should act as effective index/search phrases and return relevant content
• Evaluation Application: Targeted Content Delivery
89
Targeted Content Delivery - Evaluations
• 12K posts from MySpace and Facebook Electronics forums
• Baseline phrases: Yahoo Term Extractor
• Our method phrases: Key phrase extraction, elimination
• Targeted Content from Google AdSense
90
Targeted Content for Extracted Key Phrases
91
A. Showing Advertisements generated for phrases identified by the Yahoo Term Extractor (YTE)
B. Showing Advertisements generated for topical phrases extracted by our algorithm
User Studies and Results
92
[User study results (garbled in source): users picked ads relevant to the post, with at least 80% inter-evaluator agreement; across the 50 posts, ads generated for the topical key phrases were picked as relevant far more often than those for the baseline (YTE) phrases]
Extracted, topical phrases yield 2X more relevant content
Thesis Statement
• TFIDF + social contextual cues yield more useful phrases that preserve social perceptions
• Corpus co-occurrence statistics + seeds from a domain knowledge base eliminate off-topic phrases effectively
93
APPLICATIONS
94
http://twitris.knoesis.org/
Twitris (with Kno.e.sis): online pulse around news-worthy events
(Mumbai Terror Attack '08, Health Care Debate 2010)
Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav, "Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Experiences," Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009: 539-553
Chatter around news-worthy events
Hundreds of tweets, Facebook posts and blogs about a single event; multiple narratives, strong opinions, breaking news..
Preserving Social Perceptions
The Health Care Reform Debate
Zooming in on Florida
Summaries of Citizen Reports
Zooming in on Washington
Summaries of Citizen Reports
RT @WestWingReport: Obama reminds the faith-based groups "we're neglecting 2 live up 2 the call" of being R
brother's keeper on #healthcare
Providing Context
twitris: socially influenced browsing
Ashu, Raghava, Wenbo, Pramod, Vinh, Karthik, Meena, Amit, and Ajith
Kno.e.sis Center, Wright State University
Opinion on Iran Election from the US talks about oil, economies, blogging
Opinion on Iran Election from Iran talks about theocracy, oppression, demonstration
Spatial perspective
Capture changing perceptions and issues of interest every day; e.g., "legalize illegal immigrants" surfaced in the healthcare context on September 18.
Temporal perspective
Capture changing perceptions and issues of interest every day; "Nobel is no more the news for Obama!" captured October 12.
Find resources related to social perceptions
News and Wikipedia articles to put extracted descriptors in context
Twitris aggregates social perceptions from Twitter using a spatio-temporal-thematic approach. Twitris captures what was said, when it was said and where it was said. Fetch resources from the Web to explore perceptions further. Browse the Web for issues that matter to people, using people's perceptions as the fulcrum.
What does twitris do?
✓ Exploit spatio-temporal semantics for thematic aggregation
✓ Analyze the anatomy of a tweet: "RT @m33na come back and check new events on twitris #twitris" (RT: a retweet, or repost of a tweet; # hashtags: user-generated metadata; @: references to other users)
✓ Data from diverse sources (Twitter, news services, Wikipedia, and other Web resources)
✓ End user application
A few statistics from Twitris (unit: tweets)
Healthcare ( Aug 19 - Oct 20) : 721 K (US Only)
Obama (Oct 8 - 20): 312 K (US Only)
H1N1 (Oct 5 - 20) : 232 K (US Only)
Iran Election (June 5 - Oct 20) : 2.8 m (Worldwide)
Twitris UI: Concept Cloud, news and related articles
Google News widget and DBpedia widget, each driven by the context plus the selected term
Twitris architecture:
Data Collection: per-event crawlers (event-1 ... event-n) feeding Author Location Lookup and Geocode Lookup services
Data Processing: TFIDF-based descriptor extraction; spatio-temporal-thematic descriptor extraction; extracting storylines around descriptors via Twitter Search
Data Dumpers write through shared memory into the Twitris DB
Parallel crawling to scale. A data processing pipeline to streamline Twitter, geocode services and data analytics, and to handle heterogeneity. Live resource aggregation. Near real time: processing lags by up to a day. Spatio-temporally weighted text analytics.
twitris internals in less than 140 characters
Culled out user observations correlated well with mainstream media (news, blogs)
The fourth estate perspective
Caveats and Future work
1. Handle Twitter constructs such as hashtags, retweets, mentions and replies better
2. Different viz widgets, such as time series to show changing perceptions from a place for an event, and demographic-based visualizations
3. Sentiment analysis
4. Robust computing approaches (Cloud, Hadoop)
5. FB Connect for sharing and personalization
Check us out at: http://twitris.knoesis.org
Follow us @7w17r15
Become a FB Fan and share Twitris with everyone
Twitris: a Tetris-like approach to Twitter to gather aggregated social signals
SOYLENT GREEN and the HEALTH CARE REFORM
Information right where you need it!
CONCLUSIONS
103
Contributions and Summary
104
• Thesis motivation
• Understand characteristics of user-generated textual content on social media.
• Thesis demonstrated that
• UGC is different; variability and lack of context affect the performance of text mining tasks
• Example: 69% F-measure for NER on UGC
Summary
• Described frameworks and implementations
• How contextual knowledge from multiple sources can be integrated to supplement traditional NLP/ML algorithms
• Showed effectiveness of these frameworks for NER and key phrase extraction on UGC
• e.g., +11% average F-score improvement for NER in weblogs
• Building Social Intelligence Applications
Other Contributions
106
UGC Understanding Tasks
Facets: HOW, WHY, WHAT
Named Entity Recognition [ISWC2009]
Key Phrase Extraction [WISE2009, WI2009]
Semantic Document Classification [WWW2007]
Intent Mining [WI2009]
Network Effects [ICWSM2010]
Gendered Language Usage in Online Self-Expression [ICWSM2009]
Domain Models: Disambiguating entities in merging ontologies; applications in conflict-of-interest detection [WWW2006, TWEB2008]
Future Work
• The long-term outlook
• research online social user interactions
• build and design systems to understand and impact how society produces, consumes and shares data
• The near-term goals
• transformative and robust ways of coding, analyzing and interpreting user observations
107
Future Work
• Big Data challenges & availability of domain models
• Computational social science
• 'why' are we seeing what we are seeing
• people-content-network interactions
• building tools that close that loop; slicing and dicing of data, correlation with other media..
108
THANK-YOU! Are there any questions?
109