dynamic user profiling for search personalisation

Thanh Vu

Computing and Communications Department

The Open University

Dynamic User Profiling for Search Personalisation

Classical Search Systems

Classical search engines such as AOL and AltaVista return search results based on the user's input query, regardless of the user's search preferences

Different users who submit the same input query get the same result list

Queries are usually short and ambiguous, e.g., Michael Jordan, Java, etc.

Different users have different information needs for the same input query

Search Personalisation

Return search results based on both the input query and the user's search interests

Different users who submit the same input query will probably get different result lists

Even an individual user will get different search results at different search times (e.g., Open US)

Part I: Dynamic group formation

The performance of search personalisation depends on the richness of a user profile

J. Teevan, M. R. Morris, and S. Bush. Discovering and using groups to improve personalized search. In WSDM’2009

Topic-based user profiles

Use a human-generated ontology (ODP – dmoz.org) to extract topics from all clicked/relevant documents of a specific user to build her profile

1. R. W. White, et al., Enhancing Personalized Search by Mining and Modeling Task Behavior. In WWW’2013

2. P. N. Bennett, et al., Modeling the impact of short- and long-term behavior on search personalization. In SIGIR’2012

Challenges for Human Generated Ontology

New topics that are not covered in the ontology may emerge over time

Classifying each document into the correct categories, and maintaining that classification, requires expensive human effort

Enriching a user profile

Use information from the group of users who share common interests

R. W. White, W. Chu, A. Hassan, X. He, Y. Song, and H. Wang. Enhancing personalized search by mining and modeling task behavior. WWW '13, pages 1411-1420, Switzerland, 2013. ACM

Challenges for grouping methods

Groups are constructed statically using predetermined criteria such as common clicked documents

Users in a group may have different interests on different topics w.r.t. the input query

Z. Dou, R. Song, and J.-R. Wen. A large-scale evaluation and analysis of personalized search strategies. WWW '07, pages 581-590, NY, USA, 2007. ACM.

Research Question

How can we enrich user profiles with dynamic group formation?

1. How can we dynamically group users who share common interests?

2. How can we enrich user profiles with group information?

3. Can enriched user profiles help to improve search performance?

Dynamic group formation

The groups should be dynamically constructed in response to the user’s input query

Applying Latent Dirichlet Allocation

Constructing a user profile

Average the relevant documents over topics

Query-dependent user grouping

Construct shared user profiles

Use the input query as an indicator for grouping users

Constructing a shared user profile

Query-dependent user grouping

P(q|z) =

Query-dependent user grouping

The 2-nearest users

0.45, 0.35, 0.20

Enriching a user profile

Average all users in the group over topics

Re-ranking search results

For each input query q:

Download the top n ranked search results from the search engine

Compute a personalised score p(d|u) for each web page d given the current user u

Combine the personalised score p(d|u) and the original rank r(q,d) to get a final score f(d|u,q)
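The re-ranking steps for one query could be sketched as follows. This is only an illustration under stated assumptions: how p(d|u) and r(q,d) are combined is not specified in the text here, so the default `combine` function (personalised score divided by original rank) is a placeholder chosen for the example, and every name in the block is hypothetical.

```python
from typing import Callable, Dict, List


def rerank(
    ranked_urls: List[str],
    personalised_score: Dict[str, float],
    combine: Callable[[float, int], float] = lambda p, r: p / r,
) -> List[str]:
    """Re-rank the top-n results returned for one query.

    ranked_urls: results in the search engine's original order (rank 1 first).
    personalised_score: a hypothetical p(d|u) for each URL.
    combine: ASSUMPTION. Stands in for the unspecified way of merging
        p(d|u) with the original rank r(q, d).
    """
    final = {
        url: combine(personalised_score[url], rank)
        for rank, url in enumerate(ranked_urls, start=1)
    }
    # Higher final score first; sorted() is stable, so ties keep the original order.
    return sorted(ranked_urls, key=lambda url: final[url], reverse=True)


original = ["a.example", "b.example", "c.example"]
scores = {"a.example": 0.10, "b.example": 0.60, "c.example": 0.15}
reranked = rerank(original, scores)
print(reranked)
```

Passing a different `combine` callable is enough to try another way of mixing the two signals without touching the rest of the function.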

Re-ranking search results

Query: MU

Dataset

Query logs from the Bing search engine for 15 days, from 1st to 15th July 2012, covering 106 anonymous users

A relevant document is a click with a dwell time of at least 30 seconds or the last click in a session (SAT click)

Evaluation metrics

Inverse Average Rank (IAR)

Personalisation Gain (P-Gain)

Baseline and Personalisation Strategies

Baseline: The original ranked results from Bing

S_Profile: Use only the current user profile

S_Group: Enrich the profile with a static group

D_Group: Enrich the profile with a dynamic group

Overall Performance


Part II: Temporal User Profiles

Challenges for Time-awareness

Previous methods use all the clicked/relevant documents of a user to build her search profile

The documents are treated equally, without considering temporal features (i.e., the time at which documents were clicked and viewed)

The profile is too broad

It cannot fully express the current interest of the user

1. T. T. Vu, et al., Improving search personalisation with dynamic group formation. In SIGIR’2014

2. K. Raman, et al., Toward whole-session relevance: Exploring intrinsic diversity in web search. In SIGIR’2013

Research Question

How can we build user profiles with time-awareness?

1. How can we build temporal user profiles?

2. Can the time-aware profiles help improve search performance?

Building temporal user profiles (1)

Non-temporal method

Clicked documents (distribution over topics):

4th: Football 0.51, Law 0.33, Health 0.11, OS 0.05

3rd: Football 0.55, Law 0.27, OS 0.10, Health 0.08

2nd: Law 0.41, OS 0.37, Health 0.12, Football 0.10

1st: OS 0.65, Law 0.21, Football 0.10, Health 0.04

The topic-based user profile (means over topics):

Football 0.32, Law 0.30, OS 0.29, Health 0.09
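The non-temporal profile is a plain mean of the clicked documents' topic distributions. A minimal sketch of that step, using the four example documents from this slide (the function and variable names are illustrative, not taken from the talk):

```python
from typing import Dict, List

TopicDist = Dict[str, float]


def mean_profile(docs: List[TopicDist]) -> TopicDist:
    """Average per-document topic distributions into one user profile."""
    topics = sorted({t for d in docs for t in d})
    return {t: sum(d.get(t, 0.0) for d in docs) / len(docs) for t in topics}


clicked = [
    {"Football": 0.51, "Law": 0.33, "Health": 0.11, "OS": 0.05},
    {"Football": 0.55, "Law": 0.27, "OS": 0.10, "Health": 0.08},
    {"Law": 0.41, "OS": 0.37, "Health": 0.12, "Football": 0.10},
    {"OS": 0.65, "Law": 0.21, "Football": 0.10, "Health": 0.04},
]
profile = mean_profile(clicked)

# Topics ordered from the strongest to the weakest interest.
ranked_topics = sorted(profile, key=profile.get, reverse=True)
print(ranked_topics, profile)
```

The unrounded means are 0.315, 0.305, 0.2925 and 0.0875, which agree with the slide's 0.32, 0.30, 0.29 and 0.09 up to rounding.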

Building temporal user profiles (2)

Our method

Clicked documents (weight; distribution over topics):

1st (0.9^0): Football 0.51, Law 0.33, Health 0.11, OS 0.05

The temporal topic user profile:

Football 0.51, Law 0.33, Health 0.11, OS 0.05

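Building temporal user profiles (2)

Clicked documents (weight; distribution over topics):

2nd (0.9^1): Football 0.51, Law 0.33, Health 0.11, OS 0.05

1st (0.9^0): Football 0.55, Law 0.27, OS 0.10, Health 0.08

The temporal topic user profile:

Football 0.53, Law 0.30, Health 0.09, OS 0.08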

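Clicked documents (weight; distribution over topics):

3rd (0.9^2): Football 0.51, Law 0.33, Health 0.11, OS 0.05

2nd (0.9^1): Football 0.55, Law 0.27, OS 0.10, Health 0.08

1st (0.9^0): Law 0.41, OS 0.37, Health 0.12, Football 0.10

The temporal topic user profile:

Football 0.37, Law 0.34, OS 0.19, Health 0.10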

Clicked documents (weight; distribution over topics):

4th (0.9^3): Football 0.51, Law 0.33, Health 0.11, OS 0.05

3rd (0.9^2): Football 0.55, Law 0.27, OS 0.10, Health 0.08

2nd (0.9^1): Law 0.41, OS 0.37, Health 0.12, Football 0.10

1st (0.9^0): OS 0.65, Law 0.21, Football 0.10, Health 0.04

Temporal topic profile:

OS 0.32, Law 0.30, Football 0.29, Health 0.09

Non-temporal topic profile:

Football 0.32, Law 0.30, OS 0.29, Health 0.09

Building temporal user profiles (3)

Du = {d1, d2, …, dn} is the set of relevant documents of the user u

The user profile of u is a distribution over the topics Z (extracted by LDA)

tdi = n indicates that di is the nth most recent relevant/clicked document of u

α is the decay parameter; K is the normalisation factor
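The profile formula itself does not survive in this text. A reading consistent with the definitions here and with the worked example earlier in the deck is p(z|u) = (1/K) Σ α^(tdi − 1) p(z|di), summed over the documents di in Du, with K = Σ α^(tdi − 1). The sketch below implements that reading; treat it as a reconstruction under that assumption, with illustrative names, and with α = 0.9 chosen because it matches the weights in the worked example.

```python
from typing import Dict, List

TopicDist = Dict[str, float]


def temporal_profile(docs_newest_first: List[TopicDist], alpha: float = 0.9) -> TopicDist:
    """Decay-weighted average of topic distributions.

    docs_newest_first[0] is the most recent relevant document (t = 1), so the
    document at position t receives the weight alpha ** (t - 1).
    """
    weights = [alpha ** i for i in range(len(docs_newest_first))]
    k = sum(weights)  # normalisation factor K
    topics = sorted({t for d in docs_newest_first for t in d})
    return {
        t: sum(w * d.get(t, 0.0) for w, d in zip(weights, docs_newest_first)) / k
        for t in topics
    }


# The four example documents, most recent click first.
clicked_newest_first = [
    {"OS": 0.65, "Law": 0.21, "Football": 0.10, "Health": 0.04},  # 1st
    {"Law": 0.41, "OS": 0.37, "Health": 0.12, "Football": 0.10},  # 2nd
    {"Football": 0.55, "Law": 0.27, "OS": 0.10, "Health": 0.08},  # 3rd
    {"Football": 0.51, "Law": 0.33, "Health": 0.11, "OS": 0.05},  # 4th
]
profile = temporal_profile(clicked_newest_first, alpha=0.9)
print(profile)
```

With these inputs the recent OS-heavy click pulls OS ahead of Football, whereas a plain mean (α = 1) keeps Football on top. That shift towards current interests is the point of the decay.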

Building temporal user profiles (4)

Long-term user profile: use relevant documents extracted from the user’s whole search history

Daily user profile: use relevant documents extracted from the user’s search history in the current search day

Session user profile: use relevant documents extracted from the user’s search history in the current search session

Re-ranking search results (1)

Original rank (returned documents, distribution over topics):

Health 0.51, Law 0.33, Football 0.11, OS 0.05

Football 0.55, Law 0.27, Health 0.13, OS 0.05

Football 0.41, OS 0.37, Health 0.12, Law 0.10

The user profile (p):

Football 0.47, Law 0.24, OS 0.16, Health 0.12

After re-ranking: the same three documents, reordered against the user profile (p)

Re-ranking search results (2)

Personalised scores

Use Jensen-Shannon divergence (DJS[d||p])

Returned documents (d):

Health 0.51, Law 0.33, Football 0.11, OS 0.05

Football 0.55, Law 0.27, Health 0.13, OS 0.05

Football 0.41, OS 0.37, Health 0.12, Law 0.10

The user profile (p):

Football 0.47, Law 0.24, OS 0.16, Health 0.12
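To make the personalised score concrete, the sketch below computes the Jensen-Shannon divergence between each returned document and the user profile, using the example values from this slide. The document names, the renormalisation step (the example profile sums to 0.99) and the use of the natural logarithm are assumptions of the sketch. How the divergence is turned into the final feature value is not stated here, so the block only orders documents by closeness.

```python
import math
from typing import Dict

TopicDist = Dict[str, float]


def normalise(dist: TopicDist) -> TopicDist:
    """Rescale so the probabilities sum to 1 (rounded slide values may not)."""
    total = sum(dist.values())
    return {t: v / total for t, v in dist.items()}


def js_divergence(d: TopicDist, p: TopicDist) -> float:
    """Jensen-Shannon divergence (natural log) between two topic distributions."""
    d, p = normalise(d), normalise(p)
    total = 0.0
    for t in set(d) | set(p):
        dt, pt = d.get(t, 0.0), p.get(t, 0.0)
        m = 0.5 * (dt + pt)  # mixture distribution
        if dt > 0:
            total += 0.5 * dt * math.log(dt / m)
        if pt > 0:
            total += 0.5 * pt * math.log(pt / m)
    return total


profile = {"Football": 0.47, "Law": 0.24, "OS": 0.16, "Health": 0.12}
returned = {
    "doc_a": {"Health": 0.51, "Law": 0.33, "Football": 0.11, "OS": 0.05},
    "doc_b": {"Football": 0.55, "Law": 0.27, "Health": 0.13, "OS": 0.05},
    "doc_c": {"Football": 0.41, "OS": 0.37, "Health": 0.12, "Law": 0.10},
}
# A smaller divergence means the document is closer to the user's interests.
by_closeness = sorted(returned, key=lambda name: js_divergence(returned[name], profile))
print(by_closeness)
```

The Football-and-Law document comes out closest to this profile and the Health-dominated one furthest, which is the intuition behind re-ranking against the profile.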

Re-ranking search results (3)

Re-ranking Features

Personalised Features

LongTermScore: Personalised score between document and long-term profile

DailyScore: Personalised score between document and daily profile

SessionScore: Personalised score between document and session profile

Non-personalised Features

DocRank: Rank of the document in the original returned list

QuerySim: Cosine similarity score between the current and previous queries

QueryNo: Total number of queries that have been submitted in the current search session (including the current query)

Re-Ranking Algorithm: LambdaMART [1]

1. C. J. Burges, et al., Learning to rank with non-smooth cost functions. In NIPS’2007.

Evaluation

Dataset

The query logs of 1166 anonymous users over four weeks, from 1st to 28th July 2012

A log entry consists of an anonymous user identifier, a query, the top-10 returned URLs, and the clicked documents along with the user’s dwell time

The content of all the URLs is downloaded for learning topics

A search session is demarcated by 30 minutes of user inactivity

A relevant document is a click with a dwell time of at least 30 seconds or the last click in a session (SAT click)
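The session boundary and SAT-click rules can be sketched as below. The `Click` record and its field names are hypothetical, since the actual log schema is not shown. The sketch also simplifies by measuring inactivity between clicks only, whereas a real log would include query timestamps as well.

```python
from dataclasses import dataclass
from typing import List

SESSION_GAP_S = 30 * 60  # 30 minutes of inactivity closes a session
SAT_DWELL_S = 30         # dwell-time threshold for a SAT click


@dataclass
class Click:
    timestamp: float  # seconds since some epoch (hypothetical field)
    url: str
    dwell: float      # seconds spent on the page (hypothetical field)


def split_sessions(clicks: List[Click]) -> List[List[Click]]:
    """Group one user's clicks into sessions separated by >= 30 min gaps."""
    sessions: List[List[Click]] = []
    for click in sorted(clicks, key=lambda c: c.timestamp):
        if sessions and click.timestamp - sessions[-1][-1].timestamp < SESSION_GAP_S:
            sessions[-1].append(click)
        else:
            sessions.append([click])
    return sessions


def sat_clicks(session: List[Click]) -> List[Click]:
    """Clicks with dwell time >= 30 s, plus the last click of the session."""
    last = len(session) - 1
    return [c for i, c in enumerate(session) if c.dwell >= SAT_DWELL_S or i == last]


log = [
    Click(0, "a", 10),
    Click(100, "b", 45),
    Click(200, "c", 5),
    Click(200 + 1800, "d", 3),  # exactly 30 minutes later: new session
]
sessions = split_sessions(log)
print([[c.url for c in sat_clicks(s)] for s in sessions])
```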

Evaluation methodology

Assign a positive (relevant) label to a returned URL if

it is a SAT click in the current query

it is a SAT click in one of the other repeated queries in the same search session

Assign negative (irrelevant) labels to the rest of the URLs
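A small sketch of this labelling rule follows. It assumes that a "repeated query" means the same query string issued again in the session, which is an interpretation rather than something the slide states. The data layout and names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def label_urls(
    returned_urls: List[str],
    query: str,
    session_sat: List[Tuple[str, Set[str]]],
) -> Dict[str, int]:
    """Label the URLs returned for `query`: 1 = relevant, 0 = irrelevant.

    session_sat: one (query, SAT-clicked URLs) pair per query issued in the
    same search session, including the current query and any repeats of it.
    """
    positives: Set[str] = set()
    for q, sat in session_sat:
        if q == query:  # the current query or a repeat of it (ASSUMPTION: exact match)
            positives |= sat
    return {url: int(url in positives) for url in returned_urls}


session = [
    ("mu", {"u1.example"}),
    ("java", {"u3.example"}),
    ("mu", {"u2.example"}),
]
labels = label_urls(
    ["u1.example", "u2.example", "u3.example", "u4.example"], "mu", session
)
print(labels)
```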

Personalisation Methods and Baselines

Personalisation Methods

LON uses only LongTermScore from the long-term profile

DAI uses only DailyScore from the daily profile

SES uses only SessionScore from the session profile

ALL uses all personalised scores from the three profiles

Baselines

Default is the default ranking returned by the search engine

Static uses the LongTermScore from the long-term profile without time-awareness (i.e., not using the decay function)

Results

Evaluation metrics

Mean Average Precision (MAP)

Precision (P@k)

Mean Reciprocal Rank (MRR)

Normalized Discounted Cumulative Gain (nDCG@k)

For each evaluation metric, a higher value indicates a better ranking
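For reference, the four metrics can be computed from a ranked list of binary relevance labels as sketched below. These are the standard textbook definitions, not code from the talk: average precision is normalised by the number of relevant results in the list, and DCG uses the label itself as the gain with a log2 discount. MAP and MRR are the means of the per-query values.

```python
import math
from typing import List


def precision_at_k(labels: List[int], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(labels[:k]) / k


def average_precision(labels: List[int]) -> float:
    """Mean of precision values at the rank of each relevant result."""
    hits, total = 0, 0.0
    for i, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def reciprocal_rank(labels: List[int]) -> float:
    """1 / rank of the first relevant result, or 0 if there is none."""
    for i, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / i
    return 0.0


def ndcg_at_k(labels: List[int], k: int) -> float:
    """DCG of the list divided by the DCG of its ideal reordering."""
    def dcg(ls: List[int]) -> float:
        return sum(rel / math.log2(i + 1) for i, rel in enumerate(ls[:k], start=1))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal else 0.0


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Two queries, each with labels listed in ranked order (1 = relevant).
queries = [[0, 1, 0, 1], [1, 0, 0, 0]]
map_score = mean([average_precision(q) for q in queries])
mrr_score = mean([reciprocal_rank(q) for q in queries])
print(map_score, mrr_score, ndcg_at_k(queries[0], 4))
```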

Overall Performance

• All the improvements over the baselines are statistically significant (paired t-test, p < 0.001)


Takeaways

Dynamic Grouping

Grouping improves search performance

Dynamic grouping outperforms static grouping

Temporal profiles

The three temporal profiles help to improve search performance over the default ranking and over the non-temporal profile

Using all features (ALL) achieves the highest performance

The short-term profile achieves better performance than the longer-term profile

Thank you!

Any questions?

Dataset (2)

Example of query logs

Click Entropies

P(d|q) is the percentage of the clicks on document d among all the clicks for q

A smaller query click entropy value indicates more agreement between users on clicking a small number of web pages
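The entropy formula itself is missing from this text. The sketch below uses the usual definition, ClickEntropy(q) = Σ −P(d|q) log2 P(d|q), summed over the documents clicked for q. The definition is assumed here and the function name is illustrative.

```python
import math
from collections import Counter
from typing import Iterable


def click_entropy(clicked_urls: Iterable[str]) -> float:
    """Entropy (in bits) of the click distribution P(d|q) for one query.

    clicked_urls: every clicked URL recorded for the query, across all users.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Everyone clicks the same page: full agreement, entropy 0.
navigational = click_entropy(["a", "a", "a", "a"])
# Clicks spread evenly over four pages: entropy log2(4) = 2 bits.
ambiguous = click_entropy(["a", "b", "c", "d"])
print(navigational, ambiguous)
```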

Click entropies


Query Positions in Search Session

Aim to study whether the position of a query has any effect on the performance of the temporal latent topic profiles

Label the queries by their positions during the search

Clicked documents (distribution over topics):

Football 0.51, Law 0.33, Health 0.11, OS 0.05

Football 0.55, Health 0.27, OS 0.13, Law 0.05

Law 0.41, OS 0.37, Health 0.12, Football 0.10

OS 0.65, Law 0.15, Football 0.11, Health 0.09

The topic-based user profile (means over topics):

Football 0.32, Law 0.29, OS 0.28, Health 0.11

Re-ranking search results (1)

Query: MU

Pre-processing

Remove the queries whose positive label set is empty from the dataset

Discard the domain-related queries (e.g., Facebook, Youtube)

Overall Performance
