Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter
DESCRIPTION
Slides presented at ISWC 2011, Bonn, Germany. Corresponding paper: http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Research_Paper/12/70310001.pdf
TRANSCRIPT
Delft University of Technology
Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter
ISWC, Bonn, Germany, Oct 27th 2011
Fabian Abel (1), Ilknur Celik (1), Geert-Jan Houben, Patrick Siehndel (2)
(1) Web Information Systems, TU Delft, the Netherlands
(2) L3S Research Center, Hannover, Germany
2Adaptive Faceted Search on Twitter
Personalized Recommendations
Personalized Search Adaptive Systems
What we do: Science and Engineering for the Personal Web
Social Web
Analysis and User Modeling
user/usage data
Semantic Enrichment, Linkage and Alignment
domains: news social media cultural heritage public data e-learning
200,000,000: number of tweets published per day
1: number of tweets that are interesting for me now
Searching on Twitter
Issues with Multiple Keywords Search
Let’s try to search with One Keyword
Page 1
Page 2
Page 3
Page 60!!
tweet I was looking for
Next Saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my hometwon Eindhoven. #realliveshit #iwillspinrecords
about 9 hours ago via Blackberry
Music Artist
Locations
Is there an easier way?
Locations more...
Events more...
Music Artists: + Guilty Simpson + Bryan Adams + Elton John + Golden Earring + Rihanna + The eagles + 3 Doors Down more...
Current Query:
Results:
1. Yskiddd: Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords
2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents
Eindhoven Music
Expand Query:
Faceted Search can help (hypothesis)
Challenges
Facets of a Tweet
@bob: Julian Assange got arrested http://bit.ly/5d4r2t
Facet type: Facet value
Creator: @bob
Location: Delft, the Netherlands
Creation time: Nov 29th 2011
Challenge 1: How to infer facets that describe the content of a tweet?
Faceted Search: selecting facet-value pairs
Locations: + Aachen + Aalborg + Aalesund + Aarhus + Aasiaat + Abaiang + Abakan more...
Events more...
Music Artists more…
Current Query:
Results:
1. Yskiddd: Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords
2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents
Music
Expand Query:
Number of selectable facet values may be very high!
Challenge 2: How to adapt the faceted search interface to the current demands of a user?
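The drill-down idea behind faceted search can be sketched as follows. This is a minimal illustration, assuming each tweet already carries a set of (facet type, value) pairs; the `Tweet` class, the `drill_down` function, and the facet type names are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field

# A facet-value pair (FVP) is modeled here as a (facet_type, value) tuple.
FVP = tuple[str, str]


@dataclass
class Tweet:
    author: str
    text: str
    fvps: set[FVP] = field(default_factory=set)


def drill_down(tweets, selected):
    """Return the tweets that contain every selected facet-value pair."""
    wanted = set(selected)
    return [t for t in tweets if wanted <= t.fvps]


tweets = [
    Tweet("Yskiddd", "Guilty Simpson will be performing at Area51 in Eindhoven",
          {("music_artist", "Guilty Simpson"), ("location", "Eindhoven")}),
    Tweet("bob", "Julian Assange got arrested",
          {("person", "Julian Assange"), ("location", "London")}),
]

# Selecting (location, Eindhoven) narrows the hit list to the first tweet.
hits = drill_down(tweets, [("location", "Eindhoven")])
```

Each additional selected pair can only shrink the hit list, which is why the order in which pairs are offered to the user matters when thousands of values are selectable.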
Adaptive Faceted Search Framework
Adaptive Faceted Search
Twitter posts
Semantic Enrichment
User and Context Modeling
user
How to adapt the facet-value pair ranking to the current demands of the user?
How to represent the content of a tweet? (facet extraction)
Facet Extraction and Semantic Enrichment
@bob: Julian Assange got arrested http://bit.ly/5d4r2t
Tweet-based enrichment: Julian Assange
Linked page: Julian Assange arrested
Julian Assange, the founder of WikiLeaks, is under arrest in London…
Link-based enrichment: Julian Assange, London, WikiLeaks
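The two enrichment steps can be illustrated with a small sketch. It assumes a toy dictionary lookup in place of a real named-entity recognition service; `KNOWN_ENTITIES`, `extract_fvps`, `enrich`, and the facet type names are all hypothetical stand-ins.

```python
# Hypothetical gazetteer standing in for a real entity extraction service.
KNOWN_ENTITIES = {
    "Julian Assange": "person",
    "London": "location",
    "WikiLeaks": "organization",
}


def extract_fvps(text):
    """Return the (facet_type, value) pairs whose value occurs in the text."""
    return {(ftype, name) for name, ftype in KNOWN_ENTITIES.items() if name in text}


def enrich(tweet_text, linked_text=""):
    """Combine tweet-based and link-based enrichment into one set of FVPs."""
    tweet_based = extract_fvps(tweet_text)
    link_based = extract_fvps(linked_text)
    return tweet_based | link_based


tweet = "@bob: Julian Assange got arrested http://bit.ly/5d4r2t"
article = "Julian Assange, the founder of WikiLeaks, is under arrest in London"

tweet_only = enrich(tweet)           # one facet-value pair
with_link = enrich(tweet, article)   # three facet-value pairs
```

In this toy example the linked page contributes two pairs that the tweet text alone does not mention.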
Impact of Link-based enrichment
Representation of tweets: significantly more facets per tweet with link-based enrichment
Faceted Search Strategies
• Challenge: most-relevant facet-value pair should appear at the top of the ranking
• Baseline: hashtag-based keyword search
Locations: 1. Aachen 2. Aalborg 3. Aalesund 4. Aarhus … 2145. Eindhoven
Locations: 1. Eindhoven 2. Delft 3. Amsterdam 4. Rotterdam 5. London …
• Faceted Search Strategies:
1. Occurrence frequency: count occurrence frequencies of FVP (baseline)
Ranking by occurrence frequency: for each facet-value pair (FVP), count the number of tweets in the current hit list of matching tweets that contain the FVP.
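The occurrence-frequency strategy can be sketched in a few lines. The function name `rank_by_frequency`, the input shape (one set of FVPs per matching tweet), and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import Counter


def rank_by_frequency(hit_list):
    """Rank FVPs by the number of tweets in the hit list that contain them.

    hit_list: iterable of sets of (facet_type, value) pairs, one per tweet.
    """
    counts = Counter(fvp for fvps in hit_list for fvp in set(fvps))
    # Descending count; ties are broken alphabetically for a stable order.
    return sorted(counts, key=lambda fvp: (-counts[fvp], fvp))


hit_list = [
    {("location", "Eindhoven"), ("music_artist", "Guilty Simpson")},
    {("location", "Eindhoven")},
    {("location", "Delft")},
]
frequency_ranking = rank_by_frequency(hit_list)
```

Here (location, Eindhoven) comes first because it occurs in two of the three matching tweets.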
2. Personalization: adapt the ranking to the user profile (different user modeling strategies are possible; here: the entire tweeting history of the user)
Personalized FVP ranking strategy: rank score of the FVP = its weight in the user profile
User profile (built from the user's tweets over time; timeline labels: June 27, July 4):
FVP                     weight
(location, Delft)       6
(event, JazzBaltica)    4
(person, ChetBaker)     3
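A minimal sketch of the personalized strategy, assuming the profile weight of an FVP is simply how many of the user's past tweets contain it. `build_profile` and `rank_personalized` are hypothetical names, and the example weights are illustrative values in the spirit of the slide.

```python
from collections import Counter


def build_profile(history):
    """Weight each FVP by the number of the user's tweets that contain it."""
    return Counter(fvp for fvps in history for fvp in set(fvps))


def rank_personalized(candidates, profile):
    """Order candidate FVPs by their weight in the user profile."""
    # FVPs unknown to the profile get weight 0 and fall to the bottom.
    return sorted(set(candidates), key=lambda fvp: (-profile.get(fvp, 0), fvp))


profile = {
    ("location", "Delft"): 6,
    ("event", "JazzBaltica"): 4,
    ("person", "ChetBaker"): 3,
}
candidates = [("event", "JazzBaltica"), ("location", "Eindhoven"), ("location", "Delft")]
personal_ranking = rank_personalized(candidates, profile)
```

Pairs the user has never tweeted about, such as (location, Eindhoven) here, end up below everything in the profile, so a real system would likely need a fallback ordering for them.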
3. Diversification: increase variety among the top-ranked FVPs
Genre: + Blues + Jazz + JazzMusic + Rock more...
Genre: + Blues + Jazz + Rock + Classic more...
Goal: minimize overlaps
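One way to reduce overlap among the top-ranked FVPs is a greedy coverage heuristic: repeatedly pick the FVP that reaches the most tweets not yet reachable through a higher-ranked FVP. This is an assumed technique chosen for illustration and is not necessarily the diversification method used by the authors; `rank_diversified` and the tweet ids are hypothetical.

```python
def rank_diversified(fvp_to_tweets):
    """Greedy ranking that favors FVPs covering not-yet-covered tweets.

    fvp_to_tweets: dict mapping each FVP to the set of tweet ids containing it.
    """
    remaining = dict(fvp_to_tweets)
    covered = set()
    ranking = []
    while remaining:
        # Largest number of newly covered tweets wins; ties broken by name.
        best = min(remaining, key=lambda f: (-len(remaining[f] - covered), f))
        ranking.append(best)
        covered |= remaining.pop(best)
    return ranking


genres = {
    ("genre", "Blues"): {1, 2, 3},
    ("genre", "Jazz"): {4, 5, 6, 7},
    ("genre", "JazzMusic"): {4, 5, 6},
    ("genre", "Rock"): {8, 9},
    ("genre", "Classic"): {10},
}
diverse_ranking = rank_diversified(genres)
```

In this example JazzMusic drops to the end because every tweet it matches is already reachable through Jazz, which mirrors the Jazz / JazzMusic overlap shown on the slide.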
4. Time-sensitivity: adapt FVP ranking to temporal context
• Semantic enrichment: (i) tweet-based and (ii) link-based enrichment
FVP ranking strategy (figure not preserved): occurrence frequency of FVPs over time (June 20, June 27, July 4) for (event, JazzBaltica) and (event, FrenchOpen), counted over the current hit list of matching tweets. At search time the Event facet lists: + JazzBaltica + FrenchOpen more...
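A time-sensitive ranking can be approximated by weighting each occurrence of an FVP by its recency. The sketch below assumes exponential decay with a configurable half-life, which is an illustrative choice and not necessarily the formula used by the authors; `rank_time_sensitive`, `half_life_days`, and the example dates are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def rank_time_sensitive(occurrences, now, half_life_days=7.0):
    """Rank FVPs by recency-weighted occurrence counts.

    occurrences: iterable of (fvp, timestamp) pairs taken from the hit list.
    An occurrence that is one half-life old counts half as much as a new one.
    """
    scores = defaultdict(float)
    for fvp, ts in occurrences:
        age_days = (now - ts).total_seconds() / 86400.0
        scores[fvp] += 0.5 ** (age_days / half_life_days)
    return sorted(scores, key=lambda fvp: (-scores[fvp], fvp))


now = datetime(2011, 7, 4)
occurrences = (
    [(("event", "FrenchOpen"), datetime(2011, 6, 20))] * 3
    + [(("event", "JazzBaltica"), datetime(2011, 7, 4))] * 2
)
time_ranking = rank_time_sensitive(occurrences, now)
```

With these example numbers FrenchOpen has more occurrences overall (3 versus 2), but its mentions are two weeks old, so JazzBaltica is ranked first at search time.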
Research Questions
1. How well does faceted search that is supported by the semantic enrichment perform in comparison to keyword search?
2. What strategy performs best in ranking facet-value pairs that allow users to find relevant tweets on Twitter?
3. How do the different building blocks of the faceted search framework influence the performance?
Dataset
More than:
20,000 Twitter users
30,000,000 tweets
4 months
Timeline: Nov 15, Dec 15, Jan 15, Feb 15 (marked event: Egyptian revolution, Jan 25)
Evaluation Framework
• User Simulation Model [cf. Koren et al., WWW’08]:
• Input: search settings = { (user who searches, relevant target tweet) }
• Drill down the search result list until no more FVPs can be applied or fewer than 10 tweets match the query
• Simulating click behavior: the first matching FVP is selected (the user knows the target resource)
• Ground truth: relevant target tweet = tweet that has been re-tweeted by the user
• Metrics:
• Success@k: probability that a relevant FVP appears in the top k (the higher the Success@k, the faster the search and the lower the user effort)
• MRR: mean reciprocal rank of the target tweet when the user selected it
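Both metrics can be computed from a list of ranks, one per simulated search. The sketch below assumes 1-based ranks and uses `None` when the relevant item never appears; the function names and the sample ranks are hypothetical, and a non-empty list is assumed.

```python
def success_at_k(ranks, k):
    """Fraction of searches in which the relevant item appears in the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all searches; a missing item contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Four simulated searches: found at rank 1, rank 3, not found, rank 2.
ranks = [1, 3, None, 2]
s_at_1 = success_at_k(ranks, 1)
s_at_3 = success_at_k(ranks, 3)
mrr = mean_reciprocal_rank(ranks)
```

For these sample ranks, Success@1 is 0.25, Success@3 is 0.75, and MRR is (1 + 1/3 + 0 + 1/2) / 4, roughly 0.458.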
Faceted-search vs. hashtag-based (keyword) search
Faceted search based on semantic enrichment of tweets significantly outperforms hashtag-based search.
Results: Overview
Personalized strategy achieves ~12% better performance than other semantic strategies (and 2x better than hashtag-based)
Impact of link-based enrichment
Personalized strategy outperforms baseline significantly
Link-based enrichment improves quality for both strategies
Impact of time-sensitivity
Time-sensitivity based ranking improves quality for both frequency and diversification strategies
Application of the Faceted Search Framework
Twitcident.com
Twitter-based crisis management system
Semantic enrichment allows for:
1. Grouping tweets into incidents
2. Faceted search
3. Thematic Views
4. Analysis
Conclusions
What we did:
• Adaptive Faceted Search on Twitter + Evaluation Framework
• Analysis and Evaluation (+ Application in Twitcident)
Findings:
1. Semantic Enrichment allows for structured representation of the content of tweets: the basis for faceted search
2. Faceted search performs significantly better than hashtag-based keyword search
3. Different building blocks for making faceted search on Twitter adaptive improve the search quality:
a) Link-based enrichment: more discoverable tweets, better search performance
b) Personalization leads to significant improvements
c) Time-sensitivity improves performance as well
Thank you!
Twitter: @fabianabel
http://wis.ewi.tudelft.nl/iswc2011/