Topical search in the Twitter OSN
Topical search in the Twitter OSN. Saptarshi Ghosh. Collaborators: Naveen Sharma, Parantapa Bhattacharya, Niloy Ganguly (IITKGP); Muhammad Bilal Zafar, Krishna Gummadi (MPI-SWS). Topical search in Twitter: Twitter has emerged as an important source of information & real-time news.
Topical search in the Twitter OSN
Collaborators:
Naveen Sharma, Parantapa Bhattacharya, Niloy Ganguly (IITKGP); Muhammad Bilal Zafar, Krishna Gummadi (MPI-SWS)
Saptarshi Ghosh
Topical search in Twitter
Twitter has emerged as an important source of information & real-time news
Search for breaking news and trending topics
Topical search: searching for topical experts, searching for information on specific topics
Primary requirement: identify the topical expertise of users
Profile of a Twitter user
Example tweets
Prior approaches to find topic experts
Research studies:
Pal et al. (WSDM 2011) use 15 features from tweets and the network to identify topical experts
Weng et al. (WSDM 2010) use an ML approach
Application systems: Twitter Who To Follow (WTF), Wefollow, …
Methodology not fully public, but reported to utilize several features
Prior approaches use features extracted from:
User profiles: screen-name, bio, …
Tweets posted by a user: hashtags, others retweeting a given user, …
Social graph of a user: number of followers, PageRank, …
Problems with prior approaches
User profiles – screen-name, bio, …
Bio often does not give meaningful information
Tweets posted by a user
Tweets mostly contain day-to-day conversation
Social graph of a user – number of followers, PageRank
Helps to identify authoritative users, but … does not provide topical information
We propose …
Use a completely different feature to infer topics of expertise for an individual Twitter user
Utilize social annotations: how does the Twitter crowd describe a user?
Social annotations obtained through Twitter Lists
Approach essentially relies on crowdsourcing
Twitter Lists
Primarily an organizational feature
Used to organize the people one is following
Create a named List, add an optional List description
Add related users to the List
Tweets posted by these users will be grouped together as a separate stream
How Lists work?
Using Lists to infer topics for users
If U is an expert / authority in a certain topic:
U is likely to be included in several Lists
List names / descriptions provide valuable semantic cues to the topics of expertise of U
Inferring topical attributes of users
Dataset
Collected Lists of 55 million Twitter users who joined in or before 2009
88 million Lists collected in total
All studies consider the 1.3 million users who are included in 10 or more Lists
Most List names / descriptions are in English, but a significant fraction are also in French, Portuguese, …
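The filtering step above can be sketched minimally. The sketch assumes each collected List is available as a hypothetical mapping from a List id to the List's name/description text and its member ids. The input shape and the name `lists_per_user` are illustrative, not the authors' code. Inverting Lists into per-user collections also yields the raw List texts that later make up each user's topic document.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input shape: list_id -> (List name/description text, member user ids)
ListIndex = Dict[str, Tuple[str, List[str]]]


def lists_per_user(lists: ListIndex, min_lists: int = 10) -> Dict[str, List[str]]:
    """Invert List -> members into user -> texts of the Lists that include the user,
    keeping only users included in at least `min_lists` Lists."""
    by_user: Dict[str, List[str]] = defaultdict(list)
    for text, members in lists.values():
        for user in set(members):  # a user counts once per List, even if listed twice
            by_user[user].append(text)
    return {user: texts for user, texts in by_user.items() if len(texts) >= min_lists}


demo = {
    f"list{i}": ("tech news", ["alice"] + (["bob"] if i < 3 else []))
    for i in range(10)
}
print(sorted(lists_per_user(demo)))  # bob is in only 3 of the 10 demo Lists
```

The threshold is a parameter, so the same function can reproduce the "10 or more Lists" cut or explore other cut-offs.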
Mining Lists to infer expertise
Collect Lists containing a given user U
List names / descriptions collected into a ‘topic document’ for the given user
Identify U’s topics from the document:
Ignore domain-specific stopwords
Identify nouns and adjectives
Unify similar words based on edit distance, e.g., journalists and jornalistas, politicians and politicos (not unified by stemming)
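The edit-distance unification step can be sketched as below, under some assumptions. The slides give neither the normalisation nor the threshold, so `max_ratio=0.2` (edit distance divided by the longer word's length) is an arbitrary choice. Words are processed most-frequent first, so the commoner spelling becomes the canonical form. Stopword removal and noun/adjective tagging are omitted.

```python
from collections import Counter
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def unify_similar(counts: Counter, max_ratio: float = 0.2) -> Dict[str, str]:
    """Map every word to a canonical form: a word joins the first (more frequent)
    canonical word whose normalised edit distance is within `max_ratio`."""
    canonical: List[str] = []
    mapping: Dict[str, str] = {}
    for word, _ in counts.most_common():  # most frequent spelling is seen first
        for base in canonical:
            if levenshtein(word, base) / max(len(word), len(base)) <= max_ratio:
                mapping[word] = base
                break
        else:  # no close canonical word: this word starts its own group
            canonical.append(word)
            mapping[word] = word
    return mapping


words = Counter({"journalists": 5, "politics": 3, "jornalistas": 2})
print(unify_similar(words))
```

With this rule, journalists and jornalistas (distance 2, longer length 11) are unified. Politicians and politicos have distance 3 over length 11, exactly like politics and politicians. A length-normalised threshold loose enough to merge the first pair therefore also merges the second.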
Mining Lists to infer expertise
Unigrams and bigrams considered as topics
Extracted from the topic document of U: topics for user U, and frequencies of the topics in the document
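A minimal sketch of turning the texts of the Lists containing a user into weighted unigram and bigram topics. It assumes simple lowercase regex tokenisation and a caller-supplied stopword set. The tokeniser, the name `extract_topics` and the example stopwords are assumptions. Noun/adjective filtering and edit-distance unification are left out. Bigrams are formed within a single List name or description, after stopword removal, so a dropped stopword can make two words adjacent.

```python
import re
from collections import Counter
from typing import Iterable, Set


def extract_topics(list_texts: Iterable[str], stopwords: Set[str]) -> Counter:
    """Count unigram and bigram topics across the texts of Lists containing a user."""
    topics: Counter = Counter()
    for text in list_texts:
        tokens = [t for t in re.findall(r"[a-z0-9#]+", text.lower()) if t not in stopwords]
        topics.update(tokens)                                              # unigrams
        topics.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))  # bigrams
    return topics


topic_document = ["Linux", "open source", "Open Source Software", "my list"]
print(extract_topics(topic_document, stopwords={"my", "list"}).most_common(3))
```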
Topics inferred from Lists
linux, tech, open, software, libre, gnu, computer, developer, ubuntu, unix
politics, senator, congress, government, republicans, Iowa, gop, conservative
politics, senate, government, congress, democrats, Missouri, progressive, women
celebs, actors, famous, movies, comedy, funny, music, hollywood, pop culture
Lists vs. other features
Most common words from tweets: love, daily, people, time, GUI, movie, video, life, happy, game, cool
Most common words from Lists: celeb, actor, famous, movie, stars, comedy, music, Hollywood, pop culture
Profile bio
Lists vs. other features
Most common words from tweets: Fallon, happy, love, fun, video, song, game, hope, #fjoln, #fallonmono
Most common words from Lists: celeb, funny, humor, music, movies, laugh, comics, television, entertainers
Profile bio
Evaluation of inferred topics – 1
Evaluated through a user survey
Each evaluator was shown the top 30 topics for a chosen user: are the inferred attributes (i) accurate, (ii) informative? Binary response for both questions
More than 93% of evaluators judged the topics to be both accurate and informative
The few negative judgments were a result of subjectivity
Evaluation of inferred topics – 2
Comparison with topics identified by Twitter WTF
Obtained the top 20 WTF results for about 200 queries: 3495 distinct users
Topics inferred by us from Lists include the query topic for 2916 users (83.4%)
For the rest:
Case 1 – inferred topics include semantically very similar words, but not the exact query word (18%)
Case 2 – wrong results by WTF, unrelated to the query (58%)
Comparison with Twitter WTF
Case 1
Restaurant dineLA for query “dining”: inferred topics – food, restaurant, recipes, los angeles
Space explorer HubbleHugger77 for query “hubble”: inferred topics – science, tech, space, cosmology, nasa
Case 2
Comedian jimmyfallon for query “astrophysicist”: inferred topics – celebs, comedy, humor, actor
Web developer ScreenOrigami for query “origami”: inferred topics – webdesign, html, designers
Who-is-who service
Developed a Who-is-Who service for Twitter
Shows a word cloud of the major topics for a user
http://twitter-app.mpi-sws.org/who-is-who/
Inferring Who-is-who in the Twitter Social Network, WOSN 2012 (highest-rated paper in the workshop)
Identifying topical experts
Topical experts in Twitter
400 million tweets posted daily
Quality of tweets posted by different users varies widely: news, pointless babble, conversational tweets, spam, …
Challenge: to find topical experts, i.e., sources of authoritative information on specific topics
Basic methodology
Given a query (topic):
Identify experts on the topic using Lists (discussed earlier)
Rank the identified experts w.r.t. their expertise on the given topic
Need a suitable ranking algorithm: commonly used ranking metrics such as number of followers and PageRank do not consider the topic
Ranking experts
Two components of ranking user U w.r.t. query Q: relevance of U to Q, and popularity of U
Relevance of user to query: cover density ranking between the topic document T_U of user U and Q (cover density ranking is preferred for short queries)
Popularity of user: number of Lists including the user
Ranking score = Topic relevance(T_U, Q) × log(#Lists including U)
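This ranking can be sketched in simplified form. The sketch assumes whitespace-tokenised topic documents and requires every query term to appear in a cover. It follows the general cover-density idea: every minimal span of the topic document that contains all query terms contributes to the score, and shorter spans contribute more. The length threshold `k=4`, the tokenisation and the function names are illustrative assumptions, not the parameters Cognos used.

```python
import math
from typing import Iterator, Sequence, Set, Tuple


def minimal_covers(tokens: Sequence[str], terms: Set[str]) -> Iterator[Tuple[int, int]]:
    """Yield (p, q) spans containing every query term that cannot be shrunk from
    either end; spans are scanned left to right and are never nested."""
    last_seen = {}  # query term -> latest position seen so far
    prev_start = -1
    for q, tok in enumerate(tokens):
        if tok not in terms:
            continue
        last_seen[tok] = q
        if len(last_seen) == len(terms):
            p = min(last_seen.values())
            if p > prev_start:  # a repeated start would nest an earlier cover
                prev_start = p
                yield p, q


def cover_density(tokens: Sequence[str], query_terms: Sequence[str], k: int = 4) -> float:
    """Sum over covers: 1 for covers of at most k tokens, k / length for longer ones."""
    total = 0.0
    for p, q in minimal_covers(tokens, set(query_terms)):
        length = q - p + 1
        total += 1.0 if length <= k else k / length
    return total


def expert_score(topic_doc: Sequence[str], query_terms: Sequence[str],
                 n_lists: int, k: int = 4) -> float:
    """Topic relevance(T_U, Q) x log(#Lists including U)."""
    return cover_density(topic_doc, query_terms, k) * math.log(n_lists)


topic_doc = "stem cell research stem cell biology".split()
print(expert_score(topic_doc, ["stem", "cell"], n_lists=50))
```

`math.log` is the natural logarithm. The slide does not state a base, but changing the base multiplies every score by the same constant, so the ranking order is unaffected.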
Cognos
Search system for topical experts in Twitter
Publicly deployed at http://twitter-app.mpi-sws.org/whom-to-follow/
Cognos: Crowdsourcing Search for Topic Experts in Microblogs, ACM International SIGIR Conference 2012
Cognos results for “politics”
Cognos results for “stem cell”
Cognos results for “earthquake”
Evaluation of Cognos
System evaluated ‘in the wild’
People were asked to try the system and give feedback
Evaluators were students & researchers from the home institutes of the researchers
Advantage – a lot of varied queries tried
Disadvantage – subjectivity in relevance judgment
User-evaluation of Cognos
Sample queries for evaluation
Evaluation results
Overall 2136 relevance judgments over 55 queries; 1680 said relevant (78.7%)
Large amount of subjectivity in evaluations: the same result for the same query received both relevant and non-relevant judgments
E.g., for query “cloud computing”, Werner Vogels got 4 relevant judgments and 6 non-relevant judgments
Cognos vs Twitter Who-to-follow
Evaluator shown the top 10 results from both systems
Result sets anonymized
Evaluator judges which is better / both good / both bad
Queries chosen by the evaluators themselves
27 distinct queries were asked at least twice; in total, asked 93 times
Judgment by majority voting
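Per-query majority voting could be sketched as follows. The label strings, the plurality rule and the choice to return no verdict when the top labels are tied are assumptions. The slides only state that judgments were aggregated by majority voting. The query ids in the demo are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_verdict(votes: List[str]) -> Optional[str]:
    """Return the label with the most votes, or None if the top labels are tied."""
    ranked = Counter(votes).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no single winning label
    return ranked[0][0]


def tally(votes_by_query: Dict[str, List[str]]) -> Counter:
    """Count how many queries ended with each majority verdict (None = unresolved)."""
    return Counter(majority_verdict(v) for v in votes_by_query.values())


votes = {
    "q1": ["cognos_better", "cognos_better", "wtf_better"],
    "q2": ["wtf_better", "wtf_better"],
    "q3": ["cognos_better", "both_good"],
}
print(tally(votes))
```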
Cognos vs Twitter WTF
Cognos judged better on 12 queries: Computer science, Linux, mac, Apple, ipad, India, internet, windows phone, photography, political journalist
Twitter WTF judged better on 11 queries: Music, Sachin Tendulkar, Anjelina Jolie, Harry Potter, metallica, cloud computing, IIT Kharagpur
Mostly names of individuals or organizations
Tie on 4 queries: Microsoft, Dell, Kolkata, Sanskrit as an official language
Topical content search
Challenges in topical content search
Services today are limited to keyword search
A search for ‘politics’ gets only tweets that contain the word ‘politics’
Knowing which keywords to search for is itself an issue
Individual tweets are too small to deduce topics
Scalability: 400M tweets posted per day
Tweets may contain spam / rumors / phishing URLs
Our approach
Look at tweets posted by a selected set of topical experts
Inferring the topic of tweets from the tweeters’ expertise
A large fraction of tweets posted by experts are only about day-to-day conversation
Solution: if multiple experts on a topic tweet about something, it is most likely related to the topic
Sampling tweets from experts
We capture all tweets from 585K topical experts
Identified through Lists, with expertise in a wide variety of topics
Scalable: the experts generate 1.46 million tweets per day, 0.268% of all tweets on Twitter
Trustworthiness: experts are not likely to post spam / phishing URLs
Less chance of rumors in what is posted by several experts
Methodology at a glance
Gather tweets from experts on the given topic
Group tweets on the same news story
We use a group of hashtags to represent a news story
Multi-level clustering (cluster: news story): cluster tweets based on the hashtags they contain; cluster hashtags based on co-occurrence
Rank news stories by popularity: number of distinct experts tweeting on the story, then number of tweets on the story
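The two-level grouping and popularity ranking can be sketched compactly under several assumptions:
- Each tweet is reduced to a hypothetical pair: the posting expert's id and the set of hashtags the tweet contains.
- Two hashtags are linked when they co-occur in at least `min_cooccur` tweets, an arbitrary threshold. Linked hashtags are merged into groups with union-find.
- A story's tweets are those that carry any of its hashtags.

Stories are ranked first by the number of distinct experts, then by the number of tweets. This is one plausible reading of the slide, not the system's actual clustering algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Tweet = Tuple[str, Set[str]]  # hypothetical shape: (expert id, hashtags in the tweet)


def _find(parent: Dict[str, str], x: str) -> str:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_hashtags(tweets: List[Tweet], min_cooccur: int = 2) -> Dict[str, str]:
    """Map each hashtag to a representative of its co-occurrence group (one news story)."""
    parent: Dict[str, str] = {}
    pair_counts: Counter = Counter()
    for _, tags in tweets:
        for tag in tags:
            parent.setdefault(tag, tag)
        for a, b in combinations(sorted(tags), 2):
            pair_counts[(a, b)] += 1
    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:  # frequent co-occurrence: merge the two groups
            parent[_find(parent, a)] = _find(parent, b)
    return {tag: _find(parent, tag) for tag in list(parent)}


def rank_stories(tweets: List[Tweet], min_cooccur: int = 2) -> List[Tuple[List[str], int, int]]:
    """Return (hashtags, #distinct experts, #tweets) per story, most popular first."""
    story_of = group_hashtags(tweets, min_cooccur)
    experts: Dict[str, Set[str]] = defaultdict(set)
    n_tweets: Counter = Counter()
    for user, tags in tweets:
        for story in {story_of[t] for t in tags}:  # a tweet counts once per story
            experts[story].add(user)
            n_tweets[story] += 1
    hashtags: Dict[str, List[str]] = defaultdict(list)
    for tag, story in story_of.items():
        hashtags[story].append(tag)
    ranked = sorted(n_tweets, key=lambda s: (len(experts[s]), n_tweets[s]), reverse=True)
    return [(sorted(hashtags[s]), len(experts[s]), n_tweets[s]) for s in ranked]


sample = [
    ("alice", {"#sandy", "#hurricane"}),
    ("bob", {"#sandy", "#hurricane"}),
    ("carol", {"#sandy"}),
    ("alice", {"#debate"}),
    ("alice", {"#debate"}),
]
for story in rank_stories(sample):
    print(story)
```

Ranking by distinct experts before raw tweet count follows the intuition in "Our approach": agreement among several experts counts for more than one prolific account.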
Results for the last week on Politics (a popular topic)
Related tweets grouped together by common hashtags
The most popular tweet in the story is shown
Hashtags that co-occur frequently are grouped together
Our system excels especially on niche topics.
Evaluation – Relevance
Evaluated using human feedback
Used Amazon Mechanical Turk for user evaluation; evaluated the top 10 clusters for 20 topics
Users had to judge whether the tweet shown was relevant to the given topic; options are Relevant / Not Relevant / Can’t Say
Evaluating tweet relevance
We obtained 3150 judgments
80% of tweets marked relevant by majority judgment
Non-relevant results were primarily due to:
Global events that were discussed by experts across all topics, e.g., Hurricane Sandy in the USA
Sometimes, the topic is too specific and several experts tweet on a broader topic (e.g., baseball and ESPN Sports Update)
Effect of global events
Experts on all topics tweeting on #sandy
Most of these got negative judgments
Diversity of topics in Twitter
Topics in Twitter
Discovering thousands of experts on diverse topics
Characterizing the Twitter platform as a whole
On what topics is expert content available in Twitter?
Popular view – few topics such as politics, sports, music, celebs, …
We find – lots of niche topics along with the popular ones
Topics in Twitter – major topics to niche ones (figure labels: “what Twitter is mostly known for”, “wide variety of niche topics”)
Thank You