Thanh Vu
Computing and Communications Department
The Open University
Dynamic User Profiling for Search Personalisation
Classical Search Systems
2
AOL and AltaVista return search results based on the user's input query, regardless of the user's search preferences
Different users who submit the same input query get the same result list
Queries are usually short and ambiguous, e.g., Michael Jordan, Java
Different users have different information needs for the same input query
Search Personalisation
Return search results based on the input query and the user's search interests
Different users who submit the same input query will probably get different search result lists
Even an individual user may get different search results at different search times (e.g., Open US)
3
4
Part I: Dynamic group formation
The performance of search personalisation depends on the richness of a user profile
J. Teevan, M. R. Morris, and S. Bush. Discovering and using groups to improve personalized search. In WSDM’2009.
5
Topic-based user profiles
Use a human-generated ontology (ODP, dmoz.org) to extract topics from all clicked/relevant documents of a specific user to build her profile
1. R. W. White, et al., Enhancing Personalized Search by Mining and Modeling Task Behavior. In WWW’2013.
2. P. N. Bennett, et al., Modeling the impact of short- and long-term behavior on search personalization. In SIGIR’2012.
6
Challenges for Human Generated Ontology
New topics that are not covered by the ontology may emerge over time
Classifying each document into the correct categories, and keeping that classification up to date, takes expensive human effort
7
Enriching a user profile
Use information from the group of users who share common interests
R. W. White, W. Chu, A. Hassan, X. He, Y. Song, and H. Wang. Enhancing personalized search by mining and modeling task behavior. WWW '13, pages 1411-1420, Switzerland, 2013. ACM.
8
Challenges for grouping methods
Groups are constructed statically using predetermined criteria such as common clicked documents
Users in a group may have different interests on different topics w.r.t. the input query
Z. Dou, R. Song, and J.-R. Wen. A large-scale evaluation and analysis of personalized search strategies. WWW '07, pages 581-590, NY, USA, 2007. ACM.
9
Research Question
How can we enrich user profiles with dynamic group formation?
1. How can we dynamically group users who share common interests?
2. How can we enrich user profiles with group information?
3. Can enriched user profiles help to improve search performance?
10
Dynamic group formation
The groups should be dynamically constructed in response to the user’s input query
11
Applying Latent Dirichlet Allocation
12
Constructing a user profile
Average the topic distributions of the user's relevant documents
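The averaging step can be sketched in a few lines of Python. This is a minimal sketch, not the author's implementation: it assumes each relevant document is already represented as a dictionary mapping topic names to probabilities (for example, from an LDA model), and the function name `build_profile` and the topic values are illustrative only.

```python
def build_profile(doc_topic_dists):
    """Average per-document topic distributions into one user profile.

    doc_topic_dists: list of dicts mapping topic -> probability
    (hypothetical representation; the topic model itself is not shown).
    """
    if not doc_topic_dists:
        raise ValueError("need at least one relevant document")
    # Collect every topic that appears in any document.
    topics = set().union(*doc_topic_dists)
    n = len(doc_topic_dists)
    # A topic missing from a document contributes probability 0.
    return {z: sum(d.get(z, 0.0) for d in doc_topic_dists) / n for z in topics}


# Illustrative values only.
docs = [
    {"Football": 0.51, "Law": 0.33, "Health": 0.11, "OS": 0.05},
    {"Football": 0.10, "Law": 0.41, "Health": 0.12, "OS": 0.37},
]
profile = build_profile(docs)
```

Because every input distribution sums to one, the averaged profile also sums to one and can be compared directly with a document's topic distribution.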
13
Query-dependent user grouping
Construct shared user profiles
Use the input query as an indicator for grouping users
14
Constructing a shared user profile
15
Query-dependent user grouping
P(q|z) =
16
Query-dependent user grouping
The 2-nearest users
0.45, 0.35, 0.20
17
Enriching a user profile
Average the profiles of all users in the group over topics
18
Re-ranking search results
For each input query q:
1. Download the top n ranked search results from the search engine
2. Compute a personalised score p(d|u) for each web page d given the current user u
3. Combine the personalised score p(d|u) and the original rank r(q,d) to get a final score f(d|u,q)
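The re-ranking loop can be sketched as follows. The exact way p(d|u) and r(q,d) are combined is not legible here, so this sketch assumes, purely for illustration, that the final score is the personalised score divided by the original rank. The function name `rerank` and the input format are also assumptions.

```python
def rerank(results, personalised_score):
    """Re-rank the top-n results for one query.

    results: URLs in their original rank order (rank 1 first).
    personalised_score: dict mapping URL -> p(d|u).
    """
    scored = []
    for rank, url in enumerate(results, start=1):
        # ASSUMPTION: final score = p(d|u) / r(q,d). Any other monotone
        # combination could be substituted on this line.
        final = personalised_score.get(url, 0.0) / rank
        # -rank breaks ties in favour of the original ordering.
        scored.append((final, -rank, url))
    scored.sort(reverse=True)
    return [url for _, _, url in scored]


reranked = rerank(["a", "b", "c"], {"a": 0.1, "b": 0.6, "c": 0.3})
```

Dividing by the rank keeps some trust in the search engine's ordering: a lower-ranked page has to match the profile clearly better to overtake a higher-ranked one.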
19
Re-ranking search results Query: MU
20
Dataset
Query logs from the Bing search engine covering 15 days, from 1st to 15th July 2012, for 106 anonymous users
A relevant document is a click with a dwell time of at least 30 seconds or the last click in a session (SAT click)
21
Evaluation metrics
Inverse Average Rank (IAR)
Personalisation Gain (P-Gain)
22
Baseline and Personalisation Strategies
Baseline: The original ranked results from Bing
S_Profile: Use only the current user profile
S_Group: Enrich the profile with a static group
D_Group: Enrich the profile with a dynamic group
23
Overall Performance
24
25
Part II: Temporal User Profiles
Challenges for Time-awareness
Previous methods use all the clicked/relevant documents of a user to build her search profile
The documents are treated equally, without considering temporal features (i.e., the time at which documents were clicked and viewed)
The profile is too broad
It cannot fully express the current interest of the user
1. T. T. Vu, et al., Improving search personalisation with dynamic group formation. In SIGIR’2014.
2. K. Raman, et al., Toward whole-session relevance: Exploring intrinsic diversity in web search. In SIGIR’2013.
26
Research Question
How can we build user profiles with time-awareness?
1. How can we build temporal user profiles?
2. Can the time-aware profiles help improve search performance?
27
Building temporal user profiles (1)
Non-temporal method
Clicked documents (distribution over topics)
4th: Football 0.51, Law 0.33, Health 0.11, OS 0.05
3rd: Football 0.55, Law 0.27, OS 0.10, Health 0.08
2nd: Law 0.41, OS 0.37, Health 0.12, Football 0.10
1st: OS 0.65, Law 0.21, Football 0.10, Health 0.04
The topic-based user profile (means over topics)
Football 0.32, Law 0.30, OS 0.29, Health 0.09
28
Building temporal user profiles (2)
Our method
Clicked documents (distribution over topics), with decay weights
1st (weight 0.9^0): Football 0.51, Law 0.33, Health 0.11, OS 0.05
The temporal topic user profile
Football 0.51, Law 0.33, Health 0.11, OS 0.05
29
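Building temporal user profiles (2)
Clicked documents (distribution over topics), with decay weights
2nd (weight 0.9^1): Football 0.51, Law 0.33, Health 0.11, OS 0.05
1st (weight 0.9^0): Football 0.55, Law 0.27, OS 0.10, Health 0.08
The temporal topic user profile
Football 0.53, Law 0.30, Health 0.09, OS 0.08
30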
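Building temporal user profiles (2)
Clicked documents (distribution over topics), with decay weights
3rd (weight 0.9^2): Football 0.51, Law 0.33, Health 0.11, OS 0.05
2nd (weight 0.9^1): Football 0.55, Law 0.27, OS 0.10, Health 0.08
1st (weight 0.9^0): Law 0.41, OS 0.37, Health 0.12, Football 0.10
The temporal topic user profile
Football 0.37, Law 0.34, OS 0.19, Health 0.10
31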
Building temporal user profiles (2)
Clicked documents (distribution over topics), with decay weights
4th (weight 0.9^3): Football 0.51, Law 0.33, Health 0.11, OS 0.05
3rd (weight 0.9^2): Football 0.55, Law 0.27, OS 0.10, Health 0.08
2nd (weight 0.9^1): Law 0.41, OS 0.37, Health 0.12, Football 0.10
1st (weight 0.9^0): OS 0.65, Law 0.21, Football 0.10, Health 0.04
Temporal topic profile
OS 0.32, Law 0.30, Football 0.29, Health 0.09
Non-temporal topic profile
Football 0.32, Law 0.30, OS 0.29, Health 0.09
32
Building temporal user profiles (3)
Du = {d1, d2, …, dn} is the set of relevant documents of the user u
The user profile of u is a distribution over the topics Z (extracted by LDA)
tdi = n indicates that di is the nth most recent relevant (clicked) document of u
α is the decay parameter; K is the normalisation factor
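One way to read these definitions is as an exponentially decayed average: p(z|u) = (1/K) Σ α^(tdi − 1) p(z|di), with K = Σ α^(tdi − 1). The sketch below assumes that form, assumes tdi = 1 for the most recent document, and uses α = 0.9 as an illustrative value. The function name `temporal_profile` and the dictionary representation are hypothetical.

```python
def temporal_profile(docs, alpha=0.9):
    """Decay-weighted topic profile.

    docs: topic distributions (dict topic -> probability), ordered so that
    docs[0] is the most recent relevant document (t = 1).
    alpha: decay parameter; alpha = 1.0 reduces to a plain average.
    """
    if not docs:
        raise ValueError("need at least one relevant document")
    # Weight alpha^(t-1) for t = 1..n, so older documents count less.
    weights = [alpha ** i for i in range(len(docs))]
    k = sum(weights)  # normalisation factor K
    topics = set().union(*docs)
    return {
        z: sum(w * d.get(z, 0.0) for w, d in zip(weights, docs)) / k
        for z in topics
    }


# Illustrative two-document history, most recent first.
history = [
    {"Football": 0.55, "Law": 0.27, "OS": 0.10, "Health": 0.08},
    {"Football": 0.51, "Law": 0.33, "Health": 0.11, "OS": 0.05},
]
profile = temporal_profile(history, alpha=0.9)
```

Restricting `docs` to the whole history, the current day, or the current session yields long-term, daily, and session variants from the same function.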
33
Building temporal user profiles (4)
Long-term user profile
Use relevant documents extracted from the user’s whole search history
Daily user profile
Use relevant documents extracted from the search history of the user in the current searching day
Session user profile
Use relevant documents extracted from the search history of the user in the current search session
34
Re-ranking search results (1)
Original rank
Health 0.51, Law 0.33, Football 0.11, OS 0.05
Football 0.55, Law 0.27, Health 0.13, OS 0.05
Football 0.41, OS 0.37, Health 0.12, Law 0.10
After re-ranking
Health 0.51, Law 0.33, Football 0.11, OS 0.05
Football 0.55, Law 0.27, Health 0.13, OS 0.05
Football 0.41, OS 0.37, Health 0.12, Law 0.10
The user profile (p)
Football 0.47, Law 0.24, OS 0.16, Health 0.12
35
Re-ranking search results (2)
Personalised scores
Use Jensen-Shannon divergence (DJS[d||p])
Returned documents (d)
Health 0.51, Law 0.33, Football 0.11, OS 0.05
Football 0.55, Law 0.27, Health 0.13, OS 0.05
Football 0.41, OS 0.37, Health 0.12, Law 0.10
The user profile (p)
Football 0.47, Law 0.24, OS 0.16, Health 0.12
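The Jensen-Shannon divergence between a document's topic distribution d and the profile p can be computed as sketched below. The sketch uses base-2 logarithms, so the value lies in [0, 1], and it renormalises both inputs because rounded values may not sum exactly to one. Both choices are assumptions, as are the function names. How the divergence is turned into the personalised score is not specified here, so the example only sorts documents by ascending divergence.

```python
import math


def _normalise(dist, topics):
    """Rescale a topic -> probability dict so it sums to 1 over `topics`."""
    total = sum(dist.get(z, 0.0) for z in topics)
    return {z: dist.get(z, 0.0) / total for z in topics}


def _kl(a, b, topics):
    """Kullback-Leibler divergence KL(a || b) in bits; 0 * log(0) counts as 0."""
    return sum(a[z] * math.log2(a[z] / b[z]) for z in topics if a[z] > 0)


def js_divergence(d, p):
    """Jensen-Shannon divergence between two topic distributions."""
    topics = set(d) | set(p)
    d, p = _normalise(d, topics), _normalise(p, topics)
    m = {z: 0.5 * (d[z] + p[z]) for z in topics}  # mixture distribution
    return 0.5 * _kl(d, m, topics) + 0.5 * _kl(p, m, topics)


profile = {"Football": 0.47, "Law": 0.24, "OS": 0.16, "Health": 0.12}
docs = {
    "doc1": {"Health": 0.51, "Law": 0.33, "Football": 0.11, "OS": 0.05},
    "doc2": {"Football": 0.55, "Law": 0.27, "Health": 0.13, "OS": 0.05},
    "doc3": {"Football": 0.41, "OS": 0.37, "Health": 0.12, "Law": 0.10},
}
# Smaller divergence means the document is closer to the user's interests.
by_closeness = sorted(docs, key=lambda name: js_divergence(docs[name], profile))
```

Unlike the Kullback-Leibler divergence, this measure is symmetric and stays finite when a topic has zero probability in one of the two distributions.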
36
Re-ranking search results (3)
Re-ranking Features
Feature: Description
Personalised Features
LongTermScore: Personalised score between document and long-term profile
DailyScore: Personalised score between document and daily profile
SessionScore: Personalised score between document and session profile
Non-personalised Features
DocRank: Rank of document on the original returned list
QuerySim: Cosine similarity score between current and previous queries
QueryNo: Total number of queries that have been submitted in the current search session (including the current query)
Re-Ranking Algorithm: LambdaMART [1]
1. C. J. Burges, et al., Learning to rank with non-smooth cost functions. In NIPS’2007.
37
Evaluation
Dataset
The query logs of 1166 anonymous users over four weeks, from 1st to 28th July 2012
A log entry consists of an anonymous user identifier, a query, the top-10 returned URLs, and clicked documents along with the user’s dwell time
The content of all the URLs is downloaded for learning topics
A search session is demarcated by 30 minutes of user inactivity
A relevant document is a click with a dwell time of at least 30 seconds or the last click in a session (SAT click)
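The session and SAT-click rules can be sketched as below. This is an illustration under stated assumptions: each click is a hypothetical `(timestamp_seconds, url, dwell_seconds)` tuple, clicks are already sorted by time for one user, and inactivity is measured between consecutive clicks only (a real log would also count query submissions as activity).

```python
SESSION_GAP_S = 30 * 60  # 30 minutes of inactivity starts a new session
SAT_DWELL_S = 30         # minimum dwell time for a satisfied click


def split_sessions(clicks):
    """Group one user's time-ordered clicks into search sessions."""
    sessions = []
    for click in clicks:
        previous = sessions[-1][-1] if sessions else None
        if previous is not None and click[0] - previous[0] < SESSION_GAP_S:
            sessions[-1].append(click)
        else:
            sessions.append([click])
    return sessions


def sat_clicks(session):
    """URLs clicked with dwell >= 30 s, plus the last click of the session."""
    last = len(session) - 1
    return [
        url
        for i, (_, url, dwell) in enumerate(session)
        if dwell >= SAT_DWELL_S or i == last
    ]


clicks = [(0, "a", 10), (100, "b", 45), (200, "c", 5), (5000, "d", 3)]
sessions = split_sessions(clicks)
relevant = [sat_clicks(s) for s in sessions]
```

The last click counts as satisfied regardless of dwell time because the log cannot show how long the user stayed on the final page of a session.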
38
Evaluation methodology
Assign a positive (relevant) label to a returned URL if
it is a SAT click in the current query, or
it is a SAT click in one of the other repeated queries in the same search session
Assign negative (irrelevant) labels to the rest of the URLs
39
Personalisation Methods and Baselines
Personalisation Methods
LON uses only LongTermScore from the long-term profile
DAI uses only DailyScore from the daily profile
SES uses only SessionScore from the session profile
ALL uses all personalised scores from the three profiles
Baselines
Default is the default ranking returned by the search engine
Static uses the LongTermScore from the long-term profile without time-awareness (i.e., without the decay function)
40
Results
Evaluation metrics
Mean Average Precision (MAP)
Precision (P@k)
Mean Reciprocal Rank (MRR)
Normalized Discounted Cumulative Gain (nDCG@k)
For each evaluation metric, a higher value indicates a better ranking
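For reference, the per-query versions of these metrics can be sketched with their standard textbook definitions over binary relevance labels. These are not necessarily the exact variants used in the evaluation. `rels` is a hypothetical list of 0/1 labels in ranked order; MAP and MRR are the means of `average_precision` and `reciprocal_rank` over all queries.

```python
import math


def precision_at_k(rels, k):
    """Fraction of the top-k results that are relevant."""
    return sum(rels[:k]) / k


def reciprocal_rank(rels):
    """1 / rank of the first relevant result, or 0 if there is none."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def average_precision(rels):
    """Mean of precision@rank over the ranks of relevant results.

    Assumes every relevant document for the query appears in `rels`.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ndcg_at_k(rels, k):
    """DCG@k divided by the DCG@k of the ideal ordering of the same labels."""
    def dcg(labels):
        return sum(r / math.log2(i + 1) for i, r in enumerate(labels[:k], start=1))

    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal else 0.0


labels = [0, 1, 0, 1]
summary = (
    precision_at_k(labels, 2),
    reciprocal_rank(labels),
    average_precision(labels),
    ndcg_at_k(labels, 4),
)
```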
41
Overall Performance
• All the improvements over the baselines are significant under a paired t-test (p < 0.001)
42
Overall Performance
43
Overall Performance
44
Overall Performance
45
Overall Performance
46
Takeaways
Dynamic Grouping
Grouping improves search performance
Dynamic grouping outperforms static grouping
Temporal profiles
The three temporal profiles help to improve search performance over the default ranking and the non-temporal profile
Using all features (ALL) achieves the highest performance
The short-term profile achieves better performance than the longer-term profile
47
Thank you!
Any questions?
48
Dataset (2)
49
Example of query logs
50
Click Entropies
P(d|q) is the percentage of the clicks on document d among all the clicks for q
A smaller query click entropy value indicates more agreement between users on clicking a small number of web pages
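Click entropy is commonly defined as the Shannon entropy of P(d|q), that is, the sum over clicked documents d of −P(d|q) log2 P(d|q). The sketch below assumes that definition with a base-2 logarithm. The function name and the input format (a flat list of clicked URLs for one query, across all users) are illustrative.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the click distribution for one query."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    # P(d|q) = clicks on d / all clicks for q
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


navigational = click_entropy(["site", "site", "site", "site"])
ambiguous = click_entropy(["a", "a", "b", "b"])
```

A query whose clicks all land on one page has entropy 0, while clicks split evenly over two pages give 1 bit. Higher values suggest more room for personalisation.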
51
Click entropies
52
Query Positions in Search Session
Aim to study whether the position of a query has any effect on the performance of the temporal latent topic profiles
Label the queries by their positions during the search
53
Clicked documents (distribution over topics)
Football 0.51, Law 0.33, Health 0.11, OS 0.05
Football 0.55, Law 0.27, Health 0.13, OS 0.05
Law 0.41, OS 0.37, Health 0.12, Football 0.10
OS 0.65, Law 0.15, Football 0.11, Health 0.09
The topic-based user profile (means over topics)
Football 0.32, Law 0.29, OS 0.28, Health 0.11
54
Re-ranking search results (1) Query: MU
55
Pre-processing
Remove the queries whose positive label set is empty from the dataset
Discard the domain-related queries (e.g., Facebook, Youtube)
56
Overall Performance
57