Topical search in Twitter
Complex Network Research Group
Department of CSE, IIT Kharagpur
Topical search on Twitter
Twitter has emerged as an important source of information & real-time news
Most common search in Twitter: search for trending topics and breaking news
Topical search:
Identifying topical attributes / expertise of users
Searching for topical experts
Searching for information on specific topics
Prior approaches to find topic experts
Research studies:
Pal et al. (WSDM 2011) use 15 features from tweets and the network to identify topical experts
Weng et al. (WSDM 2010) use an ML approach
Application systems: Twitter Who To Follow (WTF), Wefollow, …
Methodology not fully public, but reported to utilize several features
Prior approaches use features extracted from:
User profiles: screen-name, bio, …
Tweets posted by a user: hashtags, others retweeting a given user, …
Social graph of a user: #followers, PageRank, …
Problems with prior approaches
User profiles (screen-name, bio, …):
Bio often does not give meaningful information
Information in user profiles is mostly unvetted
Tweets posted by a user:
Tweets mostly contain day-to-day conversation
Social graph of a user (#followers, PageRank):
Does not provide topical information
We propose …
Use a different way to infer the topics of expertise of an individual Twitter user
Utilize social annotations: how does the Twitter crowd describe a user?
Social annotations are obtained through Twitter Lists
The approach essentially relies on crowdsourcing
Twitter Lists
A feature used to organize the people one is following on Twitter
Create a named List, add an optional List description
Add related users to the List
Tweets posted by these users will be grouped together as a separate stream
How Lists work?
Using Lists to infer topics for users
If U is an expert / authority on a certain topic, U is likely to be included in several Lists
List names / descriptions provide valuable semantic cues to U's topics of expertise
Dataset
Collected the Lists of 55 million Twitter users who joined in or before 2009
88 million Lists collected in total
All studies consider the 1.3 million users who are included in 10 or more Lists
Most List names / descriptions are in English, but a significant fraction are in French, Portuguese, …
Inferring topical attributes of users
Mining Lists to infer expertise
Collect the Lists containing a given user U
List names / descriptions are collected into a ‘document’ for the given user
Identify U’s topics from the document:
Handle CamelCase words, case-folding
Ignore domain-specific stopwords
Identify nouns and adjectives
Unify similar words based on edit distance, e.g., journalists and jornalistas, politicians and politicos (not unified by stemming)
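A minimal Python sketch of the normalization steps above, assuming List names arrive as raw strings. The stopword set `LIST_STOPWORDS`, the 0.3 relative edit-distance threshold, and all function names are hypothetical choices for illustration, not the authors' settings; the noun/adjective filter is omitted because it needs a part-of-speech tagger.

```python
import re

# Hypothetical domain-specific stopwords; the real list is not given in the slides.
LIST_STOPWORDS = {"list", "my", "the", "of", "and", "people", "twitter", "follow"}


def split_camel_case(name: str) -> list[str]:
    """Split a List name such as 'TechNews' or 'open-source' into words."""
    # Insert a space at each lowercase-to-uppercase transition, then split on non-letters.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)
    return [w for w in re.split(r"[^A-Za-z]+", spaced) if w]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def unify(words: list[str], max_ratio: float = 0.3) -> list[str]:
    """Map each word to the first earlier word within a relative edit-distance threshold."""
    canon: list[str] = []
    out = []
    for w in words:
        match = next((c for c in canon
                      if edit_distance(w, c) <= max_ratio * max(len(w), len(c))), None)
        if match is None:
            canon.append(w)
            match = w
        out.append(match)
    return out


def normalize_list_names(names: list[str]) -> list[str]:
    """CamelCase splitting, case-folding, stopword removal, then edit-distance unification."""
    tokens = [w.casefold() for name in names for w in split_camel_case(name)]
    return unify([t for t in tokens if t not in LIST_STOPWORDS])
```

For example, `normalize_list_names(["Journalists", "jornalistas", "TechNews", "my politicians", "politicos"])` maps "jornalistas" onto "journalists" (distance 2) and "politicos" onto "politicians" (distance 3), pairs that stemming would leave apart. A relative threshold is used so that short words are not merged too eagerly.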
Mining Lists to infer expertise
Unigrams and bigrams are considered as topics
Result: topics for U along with their frequencies in the document
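Assuming each List name / description has already been reduced to a sequence of cleaned tokens, the unigram and bigram frequencies for U can be counted with `collections.Counter`. The function name and the choice to form bigrams only within a single List (so words from two different Lists are never paired) are assumptions of this sketch.

```python
from collections import Counter


def topic_frequencies(list_tokens: list[list[str]]) -> Counter:
    """Count unigram and bigram topics over the cleaned tokens of each List that includes U."""
    counts: Counter = Counter()
    for tokens in list_tokens:
        counts.update(tokens)                                          # unigrams
        counts.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))  # bigrams within one List
    return counts
```

With `[["open", "source", "tech"], ["tech", "news"], ["tech"]]` as input, "tech" is the most frequent topic (3), and bigrams such as "open source" and "tech news" are counted once each.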
Topics inferred from Lists
linux, tech, open, software, libre, gnu, computer, developer, ubuntu, unix
politics, senator, congress, government, republicans, Iowa, gop, conservative
politics, senate, government, congress, democrats, Missouri, progressive, women
celebs, actors, famous, movies, comedy, funny, music, hollywood, pop culture
Lists vs. other features
Most common words from tweets: love, daily, people, time, GUI, movie, video, life, happy, game, cool
Most common words from Lists: celeb, actor, famous, movie, stars, comedy, music, Hollywood, pop culture
Profile bio
Lists vs. other features
Most common words from tweets: Fallon, happy, love, fun, video, song, game, hope, #fjoln, #fallonmono
Most common words from Lists: celeb, funny, humor, music, movies, laugh, comics, television, entertainers
Profile bio
Who-is-who service
Developed a Who-is-who service for Twitter
Shows a word cloud of the major topics for a user
http://twitter-app.mpi-sws.org/who-is-who/
Inferring Who-is-who in the Twitter Social Network, WOSN 2012 (highest-rated paper in the workshop)
Identifying topical experts
Topical experts on Twitter
400 million tweets posted daily
Quality of tweets posted by different users varies widely: news, pointless babble, conversational tweets, spam, …
Challenge: finding topical experts, i.e., sources of authoritative information on specific topics
Basic methodology
Given a query (topic):
Identify experts on the topic using Lists (discussed earlier)
Rank the identified experts w.r.t. the given topic (needs a ranking algorithm)
Additional challenge: keeping the system up to date in the face of thousands of users joining Twitter daily
Ranking experts
Used a ranking scheme based solely on Lists
Two components of ranking user U w.r.t. query Q:
Relevance of the user to the query: cover density ranking between the topic document $T_U$ of the user and Q
Popularity of the user: number of Lists including the user
Cover density ranking is preferred for short queries
Ranking score: $\text{TopicRelevance}(T_U, Q) \times \log(\#\text{Lists including } U)$
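The ranking formula can be sketched in Python using the cover density idea of Clarke, Cormack and Tudhope for short queries: find the minimal "covers" (shortest spans of the topic document that contain every query term, with no shorter span inside them doing so) and score each cover as 1 if it is at most K terms long, else K divided by its length. K = 16, the omission of coordination-level ordering, and treating the whole topic document as one token sequence are assumptions of this sketch, not necessarily what Cognos does.

```python
import math


def covers(doc: list[str], terms: set[str]) -> list[tuple[int, int]]:
    """Find the minimal covers [p, q] of doc that contain every query term."""
    if not terms:
        return []
    positions = {t: [i for i, w in enumerate(doc) if w == t] for t in terms}
    found = []
    start = 0
    while True:
        # Smallest q such that every term occurs somewhere in [start, q].
        firsts = [next((i for i in positions[t] if i >= start), None) for t in terms]
        if None in firsts:
            return found
        q = max(firsts)
        # Largest p such that every term still occurs in [p, q].
        p = min(max(i for i in positions[t] if i <= q) for t in terms)
        found.append((p, q))
        start = p + 1


def cover_density(doc: list[str], terms: set[str], k: int = 16) -> float:
    """Sum over covers: 1 for covers of at most k terms, k / length for longer ones."""
    return sum(1.0 if q - p + 1 <= k else k / (q - p + 1) for p, q in covers(doc, terms))


def expert_score(topic_doc: list[str], query: list[str], n_lists: int) -> float:
    """TopicRelevance(T_U, Q) x log(#Lists including U), as on the slide."""
    return cover_density(topic_doc, set(query)) * math.log(n_lists)
```

For a topic document `["stem", "cell", "research", "biology", "stem", "cell"]` and the query "stem cell", the covers are `(0, 1)`, `(1, 4)` and `(4, 5)`, giving a cover density of 3.0 with the default K. The logarithm damps popularity so that a user included in many more Lists does not swamp topical relevance.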
Cognos: search system for topical experts on Twitter
Publicly deployed at http://twitter-app.mpi-sws.org/whom-to-follow/
Cognos: Crowdsourcing Search for Topic Experts in Microblogs, ACM SIGIR 2012
Cognos results for “politics”
Cognos results for “stem cell”
Evaluation of Cognos – 1
Competes favorably with prior research attempts to identify topical experts (Pal et al. [WSDM 2011])
Evaluation of Cognos – 2
Cognos compared with Twitter WTF
Each evaluator was shown the top 10 results from both systems
Result sets anonymized; the evaluator judges which is better / both good / both bad
Queries chosen by the evaluators themselves
27 distinct queries were asked at least twice; in total, queries were asked 93 times
Judgment by majority voting
Cognos vs. Twitter WTF
Cognos judged better on 12 queries: Computer science, Linux, mac, Apple, ipad, India, internet, windows phone, photography, political journalist
Twitter WTF judged better on 11 queries: Music, Sachin Tendulkar, Angelina Jolie, Harry Potter, metallica, cloud computing, IIT Kharagpur
Mostly names of individuals or organizations
Tie on 4 queries: Microsoft, Dell, Kolkata, Sanskrit as an official language
Cognos vs. Twitter WTF
Low overlap between the top 10 results
… in spite of the same topic being inferred for 83% of the experts
Major differences are due to List-based ranking
Top Twitter WTF results: mostly business accounts
Top Cognos results: mostly personal accounts
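One simple way to quantify the "low overlap" between the two top-10 result sets is the fraction of accounts they share, sketched below. The slides do not say which overlap measure was used, so this metric and the function name are assumptions.

```python
def overlap_at_k(results_a: list[str], results_b: list[str], k: int = 10) -> float:
    """Fraction of the top-k results (e.g. user handles) shared by two ranked lists."""
    top_a, top_b = set(results_a[:k]), set(results_b[:k])
    return len(top_a & top_b) / k
```

Because only set membership is compared, the measure ignores the order within the top k; a rank-aware measure would be needed to capture differences in ordering as well.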
Keeping the system up to date
Any search / recommendation system on an OSN platform needs to be kept up to date
Thousands of new users join every day
Need an efficient way of discovering topical experts
Can a brute-force approach be used? Periodically crawl the data (profile, Lists) of all users
Scalability problem
200 million new users joined Twitter during 9 months in 2011
740K new users join daily
Lower-bound estimate: 1480K API calls per day required to crawl their profiles and Lists
Twitter allows only 3.6K API calls per day per IP, or 480K API calls per day from a whitelisted IP
Plus, there are 465 million existing users
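A back-of-the-envelope check of these numbers in Python. The assumption of two API calls per user (one for the profile, one for the Lists) is inferred from 1480K being twice 740K and ignores pagination, so, like the slide's estimate, the results are lower bounds.

```python
import math

# Figures from the slide; CALLS_PER_USER = 2 is an assumption inferred from 1480K = 2 x 740K.
NEW_USERS_PER_DAY = 740_000
CALLS_PER_USER = 2
EXISTING_USERS = 465_000_000
LIMIT_PER_IP = 3_600          # API calls per day from an ordinary IP
LIMIT_WHITELISTED = 480_000   # API calls per day from a whitelisted IP

calls_per_day = NEW_USERS_PER_DAY * CALLS_PER_USER
ordinary_ips = math.ceil(calls_per_day / LIMIT_PER_IP)
whitelisted_ips = math.ceil(calls_per_day / LIMIT_WHITELISTED)
backlog_days = EXISTING_USERS * CALLS_PER_USER / LIMIT_WHITELISTED

print(f"{calls_per_day:,} calls/day for new users")
print(f"{ordinary_ips} ordinary IPs or {whitelisted_ips} whitelisted IPs just to keep up")
print(f"{backlog_days:.1f} days for one whitelisted IP to crawl the existing users once")
```

This reports 1,480,000 calls per day, 412 ordinary IPs or 4 whitelisted IPs just to keep pace with new users, and 1937.5 days (over 5 years) for a single whitelisted IP to crawl the existing users once.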
How many experts on Twitter?
Only 1% of users are listed 10 or more times
Only 0.12% are listed 100 or more times
If experts can be identified efficiently, it is possible to crawl their Lists
Identifying experts efficiently
Hubs: users who follow many experts and add them to Lists
Identified the top hubs in the social network using HITS
Crawled the Lists created by the top 1 million hubs
The top 1M hubs listed 4.1M users; 2.06M of these users are included in 10 or more Lists (50%)
Discovered 65% of the estimated number of experts listed 100 or more times
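HITS assigns each node a hub score and an authority score that reinforce each other: good hubs point to good authorities, and good authorities are pointed to by good hubs. A minimal power-iteration sketch in Python follows; the edge direction (follower to followee), the fixed 50 iterations, and the L2 normalization are assumptions for illustration.

```python
import math


def hits(edges: list[tuple[str, str]], iterations: int = 50):
    """Return (hub, authority) score dicts for a directed graph given as (source, target) edges."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the nodes pointing to it.
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {n: x / norm for n, x in auth.items()}
        # Hub: sum of the authority scores of the nodes it points to.
        hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {n: x / norm for n, x in hub.items()}
    return hub, auth
```

The top hubs would then be `sorted(hub, key=hub.get, reverse=True)[:n]`, and only their Lists need to be crawled.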
Identifying experts efficiently
More than 42% of the users listed by the top hubs joined Twitter after 2009
Discovered several popular experts who joined within the duration of the crawl
All experts reported by Pal et al. were discovered
Discovered all of the Twitter WTF top 20 results for 50% of the queries, and 15 or more of them for 80% of the queries
Topical search in Twitter
Looking for tweets by topic
Services today are limited to keyword search
Knowing which keywords to search for is itself an issue
Keyword search is not context-aware
Tweets are too short to deduce topics
Topic analysis of 400M tweets/day is a challenge
Challenges
Some tweets are more important than others
Millions of tweets are posted on popular topics
Only some are relevant to the intended context
Tweets may contain wrong or misleading info
Twitter has a large population of spammers
Twitter is also a potent source of rumors
Some tweets are outright malicious
Our approach to the issues
Scalability: we only look at tweets from a small subset of users who are experts on different topics
Topic deduction: we map user expertise topics to tweets/hashtags, instead of the other way round
Trustworthiness: our source of tweets is a small subset of users, so it is practical to vet their expertise and reputation
Advantages of List-based methodology
600K experts on 36K distinct topics
Topical Diversity of Expert Sample
CSCW’14
Popular Topics
Niche Topics
Challenges in the approach used
We assign topics to tweets/hashtags
Inferring tweet topics from the tweeter's expertise:
Experts can have multiple topics of expertise
Experts do tweet about topics beyond their expertise
Solution: if multiple experts on a subject tweet about something, it is most likely related to that topic
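The "multiple experts" rule can be sketched as a vote over distinct experts: a hashtag gets topic T only if enough different experts on T used it, so a single multi-topic expert cannot drag an unrelated topic onto a hashtag. The threshold `min_experts = 2`, lower-casing of hashtags, and the input format are assumptions for illustration; the slides do not give the exact criterion.

```python
from collections import defaultdict


def hashtag_topics(tweets, expert_topics, min_experts=2):
    """Assign topic T to a hashtag when at least `min_experts` distinct experts on T used it.

    tweets: iterable of (author, hashtags) pairs
    expert_topics: dict mapping each expert to the set of their expertise topics
    """
    # (hashtag, topic) -> set of distinct experts on that topic who used the hashtag
    support = defaultdict(set)
    for author, hashtags in tweets:
        for tag in hashtags:
            for topic in expert_topics.get(author, ()):
                support[(tag.lower(), topic)].add(author)
    result = defaultdict(set)
    for (tag, topic), experts in support.items():
        if len(experts) >= min_experts:
            result[tag].add(topic)
    return dict(result)
```

If one expert on both weather and politics and another expert on weather both use #sandy, the hashtag is assigned "weather" but not "politics", since only one politics expert used it.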
Sampling tweets from experts
We capture all tweets from 585K topical experts
This set was obtained from our previous study
This is about 0.1% of the whole Twitter population
The experts generate 1.46 million tweets per day
This is 0.268% of all tweets on Twitter
Expertise in diverse topics (36K)
Our topics of expertise are crowdsourced
We will have more topics as more users show interest
Methodology at a glance
Given a topic, we gather tweets from experts
We use hashtags to represent subjects
Clustering tweets by similar hashtags: a cluster represents information on related subjects
Ranking clusters by popularity: number of unique experts tweeting on the subject, number of unique tweets on the subject
Ranking tweets by authority: tweets from the highest-ranked user are shown first
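A compact Python sketch of this pipeline. It simplifies "clustering by similar hashtags" to grouping by identical lower-cased hashtags, counts unique tweets by their text, and takes author authority scores as a given dict (for instance, an expert ranking score); the `Tweet` type, its fields, and these choices are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Tweet(NamedTuple):
    tweet_id: str
    author: str
    text: str
    hashtags: tuple[str, ...]


def rank_clusters(tweets: list[Tweet], authority: dict[str, float]):
    """Group tweets by hashtag, rank groups by popularity, order tweets by author authority."""
    clusters: dict[str, list[Tweet]] = defaultdict(list)
    for tw in tweets:
        # dict.fromkeys de-duplicates hashtags while keeping a deterministic order.
        for tag in dict.fromkeys(h.lower() for h in tw.hashtags):
            clusters[tag].append(tw)

    def popularity(item):
        _, members = item
        # Unique experts first, then unique tweets (compared by text).
        return (len({t.author for t in members}), len({t.text for t in members}))

    ranked = []
    for tag, members in sorted(clusters.items(), key=popularity, reverse=True):
        members = sorted(members, key=lambda t: authority.get(t.author, 0.0), reverse=True)
        ranked.append((tag, members))
    return ranked
```

The first tweet of each ranked group comes from its most authoritative author and could serve as the group's representative; the tweet-popularity criterion mentioned for the deployed service is not modeled here.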
What-is-happening on Twitter
twitter-app.mpi-sws.org/what-is-happening/
Topical search in Microblogs with Cognoscenti, Or: The Wisdom of Crowdsourced Experts,
Results for the last week on Politics (a popular topic)
Related tweets are grouped together by common hashtags.
The number of experts tweeting on the subject and the number of tweets on the subject decide the ranking.
The most popular tweet from the most authoritative user represents the group.
Our system especially excels on niche topics.
Evaluation – Relevance
We used Amazon Mechanical Turk for user evaluation
We chose to evaluate 20 topics
We picked the top 10 tweets and hashtags
We picked results for all 3 time groups
Users had to judge whether the tweet/hashtag was relevant to the given topic
Options were Relevant / Not Relevant / Can’t Say
We chose Master workers only
Every tweet/hashtag was evaluated by at least 4 users
Evaluating tweet relevance
We obtained 3150 judgments
76% of them were Relevant
22% Not Relevant, 2% Can’t Say
80% of the tweets were marked relevant by majority judgment
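The 76% and 80% figures measure different things: the first is the share of individual judgments, the second the share of tweets whose majority vote is Relevant. A small Python sketch computing both from per-item vote lists; the strict-majority rule (a tie does not count as relevant) is an assumption, since the slides do not say how ties were handled.

```python
def summarize(judgments: dict[str, list[str]]) -> tuple[float, float]:
    """Return (share of all judgments that are 'Relevant',
    share of items whose strict-majority judgment is 'Relevant')."""
    all_votes = [v for votes in judgments.values() for v in votes]
    judgment_share = all_votes.count("Relevant") / len(all_votes)
    # An item counts as relevant only if more than half of its votes say so.
    majority_relevant = sum(
        1 for votes in judgments.values() if votes.count("Relevant") > len(votes) / 2
    )
    return judgment_share, majority_relevant / len(judgments)
```

On toy votes (not the study's data) where one tweet gets 4 of 4 Relevant votes, another 3 of 4, a third 2 of 4 and a fourth 3 of 5, the judgment share is 12/17 (about 71%) while 3 of the 4 tweets (75%) are relevant by majority.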
Dissecting negative judgments
iPhone was the topic that received the most negative judgments
Experts on iPhone were generally tweeting on the overall topic (such as Android, tablets, …)
The last-week time group had the most positive results
Scarcity of information led to bad ranking
Evaluating hashtag relevance
3200 judgments in total
62.3% were Relevant, much less than for tweets (76% marked relevant)
Relevance of hashtags is very context-sensitive
Perspectival relevance
The generic hashtag #sandy is very relevant to the topics in the context of the tweet.
These got negative judgments when shown without the tweets.
Generic Hashtags
Some hashtags are generic, but our service brings out their specificity with respect to the topic.
These hashtags received negative judgments when shown without the context of the tweet.
Summary
Simple core observation: users curate experts
Services:
who-is-who (WOSN’12, CCR’12)
whom-to-follow (SIGIR’12)
what-is-happening (in submission)
Sample-stream (CIKM’13, CSCW’14)
Thank You
Complex Network Research Group (CNeRG), CSE, IIT Kharagpur, India
http://cse.iitkgp.ac.in/resgrp/cnerg/