i ii x_slides_albakour_online

Diversifying Contextual Suggestions from Location-based Social Networks M-Dyaa Albakour, Romain Deveaud, Craig Macdonald, Iadh Ounis University of Glasgow IIiX 2014, Regensburg, Germany @dyaaa

Upload: dyaa-albakour

Post on 16-Jun-2015

DESCRIPTION

Diversifying Contextual Suggestions from Location-based Social Networks. M-Dyaa Albakour, Romain Deveaud, Craig Macdonald, Iadh Ounis. A talk at the IIiX 2014 conference in Regensburg.

TRANSCRIPT

Page 1: I ii x_slides_albakour_online

Diversifying Contextual Suggestions from Location-based Social Networks

M-Dyaa Albakour, Romain Deveaud, Craig Macdonald, Iadh Ounis

University of Glasgow

IIiX 2014, Regensburg, Germany

@dyaaa

Page 2: I ii x_slides_albakour_online

The Task of Contextual Suggestions

Entertain me!

Elfreths Alley Museum

Eastern State Penitentiary

Round Guys Brewing Co

Reading Terminal Market

Chinatown

Darlings Cafe

Location (Springfield)

This is an important IR task when considering new Smart City environments (recent i-ASC 2014 workshop at ECIR)

Zero-query

Page 3: I ii x_slides_albakour_online

Challenges in Contextual Suggestion

Ambiguity of the zero-query
• Accurately representing the user’s preferences.
• Existing approaches (e.g. [1]) model the direct low-level interests of the user.
• Collaborative Filtering approaches (e.g. [2]) can be employed to infer a higher level of interests (need a large number of users in a social network setting).

[1] P. Yang and H. Fang. Opinion-based User Profile Modeling for Contextual Suggestions. In Proceedings of ICTIR, 2013.
[2] A. Noulas, S. Scellato, N. Lathia, and C. Mascolo. A Random Walk around the City: New Venue Recommendation in Location-based Social Networks. In Proceedings of PASSAT, 2012.

Ranked list of suggestions

Abraham Lincoln Presidential Library & Museum

Illinois State Museum

Dana-Thomas House

Lincoln Home Visitor's Center

President Abraham Lincoln Hotel

Redundancy of suggestions
• If there are lots of museums in an area, then we are likely to recommend many of them to a user who is interested in museums, but would a user like to visit multiple museums in a single trip?

Page 4: I ii x_slides_albakour_online

Contributions

Adapt a diversification approach to deal with ambiguity and redundancy
• We adapt a state-of-the-art approach, the xQuAD framework [3].
• Aim is to balance between matching the user’s low-level interests and covering the inferred high-level venue categories (restaurants, shops, ...).
• Categories obtained from Location-based Social Networks (LBSNs), namely FourSquare and Yelp.

Alleviate the limitations of a social network setting
• We have extended our approach by developing a classifier for predicting the category of a venue from its public profile (a web page)

Thorough evaluation using the TREC 2013 Contextual Suggestion track (it serves as a user study!)

[3] R. L. T. Santos, C. Macdonald, and I. Ounis. Exploiting Query Reformulations for Web Search Result Diversification. In Proceedings of WWW, 2010.

Page 5: I ii x_slides_albakour_online

Outline

• Language Modelling for Contextual Suggestions
• Category Diversification
• Venue Category Prediction
• Evaluation
• Conclusions

Page 6: I ii x_slides_albakour_online

LANGUAGE MODELLING FOR CONTEXTUAL SUGGESTIONS

Page 7: I ii x_slides_albakour_online

Contextual Suggestion

The aim is to rank venues for a location and a given user

− Venues can be obtained from a LBSN or the web.

Ranking venues in a location based on a language model

− Build a language model of the venue (description of the venue from its home page)

− Build a profile of the user from venues they rated explicitly before.

[Figure: a user in the location (Springfield) and a candidate venue; what is the score r(user, venue)?]

Page 8: I ii x_slides_albakour_online


Building the User Profile

user

Elfreths Alley Museum

Eastern State Penitentiary

Round Guys Brewing Co

Reading Terminal Market

Chinatown

Darlings Cafe

Negative User Profile: Bakery, Farmer, Market, Chinatown, ...

Positive User Profile: Museum, Alley, Brewing, History, Elfreths, Beers, ...
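The profile-building step on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokeniser, the `like_threshold` cut-off, the venue descriptions and the ratings are all assumptions made up for the example.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()

def build_profiles(rated_venues, like_threshold=3):
    """Return (positive, negative) term-count profiles for one user.

    rated_venues: list of (description, rating) pairs.
    like_threshold: hypothetical cut-off separating liked from disliked venues.
    """
    positive, negative = Counter(), Counter()
    for description, rating in rated_venues:
        target = positive if rating >= like_threshold else negative
        target.update(tokenize(description))
    return positive, negative

# Made-up descriptions and ratings, loosely echoing the venues on the slide.
rated = [
    ("Elfreths Alley Museum history", 4),
    ("Round Guys Brewing Co beers", 4),
    ("Reading Terminal Market bakery farmer market", 1),
]
positive_profile, negative_profile = build_profiles(rated)
```

Terms from liked venues accumulate in the positive profile and terms from disliked venues in the negative one, which is the split the slide depicts.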

Page 9: I ii x_slides_albakour_online

Ranking Venues

[Figure: the user, the location (Springfield) and the candidate venue Dana Thomas House]

Venue Profile: architecture, museum, house, art glass, historic, preservation

• Divergence between the language model of the venue (the document) and the user profile (the query)

• Linear combination for both profiles to estimate the final score.

r(user, venue) = α · KL( || ) − (1 − α) · KL( || )
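A hedged sketch of this scoring rule follows. The slide text does not show the arguments of the two KL terms, so the reading used here is an assumption: the first term is the divergence of the negative profile from the venue model (larger is better) and the second is the divergence of the positive profile from the venue model (smaller is better). The additive smoothing and the parameter `mu` are also assumptions.

```python
import math

def language_model(counts, vocabulary, mu=1.0):
    """Additively smoothed unigram model over a shared vocabulary."""
    total = sum(counts.get(t, 0) for t in vocabulary) + mu * len(vocabulary)
    return {t: (counts.get(t, 0) + mu) / total for t in vocabulary}

def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same terms."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)

def score(venue_counts, positive_counts, negative_counts, alpha=0.5):
    """Assumed reading: alpha * KL(neg || venue) - (1 - alpha) * KL(pos || venue)."""
    vocabulary = set(venue_counts) | set(positive_counts) | set(negative_counts)
    venue = language_model(venue_counts, vocabulary)
    positive = language_model(positive_counts, vocabulary)
    negative = language_model(negative_counts, vocabulary)
    return (alpha * kl_divergence(negative, venue)
            - (1 - alpha) * kl_divergence(positive, venue))

# Made-up term counts for one user and two candidate venues.
positive = {"museum": 2, "history": 1}
negative = {"bakery": 1, "market": 2}
museum_venue = {"museum": 1, "house": 1, "historic": 1}
bakery_venue = {"bakery": 2, "market": 1}
museum_score = score(museum_venue, positive, negative)
bakery_score = score(bakery_venue, positive, negative)
```

With these toy counts the museum-like venue receives the higher score, since it is close to the positive profile and far from the negative one.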


Page 10: I ii x_slides_albakour_online

CATEGORY DIVERSIFICATION

Our Proposed Enhancements

Page 11: I ii x_slides_albakour_online

Diversified Suggestions

Abraham Lincoln Presidential Library & Museum

National Museum of Surveying

Del's Popcorn Shop

The Globe Tavern

Illinois State Museum

Incorporating Diversity

Recall that, due to bias towards top categories, we may recommend many similar venues
− e.g. lots of museums in Washington

Our diversification approach aims to
− Maximise coverage of venue categories in top ranked results
− Incorporate the user’s preference for specific venue categories (personalised diversification)

Ranked list of suggestions

Abraham Lincoln Presidential Library & Museum

Illinois State Museum

Dana-Thomas House

Lincoln Home Visitor's Center

President Abraham Lincoln Hotel

Page 12: I ii x_slides_albakour_online

xQuAD for diversifying contextual suggestions

Adapt an explicit web search results diversification approach
− Consider the high-level venue categories underlying a user profile to be equivalent to query aspects
− Adapt the xQuAD [3] framework due to its effectiveness in Web Search

[Figure: the xQuAD scoring formula, annotated with its components: venue relevance, venue categories, category importance, category coverage and venue novelty]

Venue relevance: can be estimated using our LM approach

Venue categories: finite set of categories. Categorisation schemes in LBSN (Yelp, FourSquare)

Category importance: Personalised vs. non-personalised

Coverage: it is calculated based on the category of the venue
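The components named on this slide combine in xQuAD's greedy selection, which at each step picks the venue maximising (1 − λ) · relevance + λ · Σ over categories of importance · coverage · novelty. The sketch below illustrates that general scheme under stated assumptions: the function name, the dictionary-based inputs and all numbers are made up, and coverage is taken to be 1.0 for a venue's own category.

```python
def xquad_rerank(relevance, coverage, importance, lam=0.5, k=5):
    """Greedy xQuAD-style selection of k venues.

    relevance:  {venue: relevance score for the user}
    coverage:   {venue: {category: how well the venue covers the category}}
    importance: {category: importance of the category for the user}
    """
    selected = []
    # novelty[c] is the product of (1 - coverage of c) over venues already selected.
    novelty = {c: 1.0 for c in importance}
    candidates = set(relevance)

    def gain(venue):
        diversity = sum(importance[c] * coverage[venue].get(c, 0.0) * novelty[c]
                        for c in importance)
        return (1 - lam) * relevance[venue] + lam * diversity

    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=gain)  # sorted() keeps ties deterministic
        selected.append(best)
        candidates.remove(best)
        for c in importance:
            novelty[c] *= 1.0 - coverage[best].get(c, 0.0)
    return selected

# Made-up scores: three museums outrank a tavern and a shop on relevance alone.
relevance = {"museum_a": 0.9, "museum_b": 0.8, "museum_c": 0.7,
             "tavern": 0.6, "shop": 0.4}
coverage = {"museum_a": {"museum": 1.0}, "museum_b": {"museum": 1.0},
            "museum_c": {"museum": 1.0}, "tavern": {"food": 1.0},
            "shop": {"shopping": 1.0}}
importance = {"museum": 0.5, "food": 0.3, "shopping": 0.2}
top3 = xquad_rerank(relevance, coverage, importance, lam=0.5, k=3)
```

Once a museum has been selected, the museum category is fully covered in this toy setup, so the tavern overtakes the second museum even though its relevance score is lower.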

[3] R. L. T. Santos, C. Macdonald, and I. Ounis. Exploiting Query Reformulations for Web Search Result Diversification. In Proceedings of WWW, 2010.


Page 13: I ii x_slides_albakour_online

Category Importance

To estimate the category importance in the xQuAD framework:

1. Non-personalised diversification: same importance for all categories and all users.

Uniform: with 10 categories, importance = 1/10 for any category and all users.

2. Personalised diversification: infer the categories of interest to the user from her positive and negative profiles.

How? Marginalisation of probabilities over all the venues in the original ranking using the LM approach
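The two options can be sketched as below. The exact estimator is not given on the slide, so the personalised version is an assumption: it normalises non-negative relevance scores into a distribution over venues and sums each venue's category probabilities weighted by that distribution. All names and numbers are illustrative.

```python
def uniform_importance(categories):
    """Non-personalised: every category gets the same weight."""
    return {c: 1.0 / len(categories) for c in categories}

def personalised_importance(relevance, venue_categories):
    """Assumed estimator: sum over venues of P(category | venue) * P(venue | user).

    relevance:        {venue: non-negative relevance score from the initial ranking}
    venue_categories: {venue: {category: P(category | venue)}}
    """
    total = sum(relevance.values())
    importance = {}
    for venue, venue_score in relevance.items():
        p_venue = venue_score / total
        for category, p_category in venue_categories[venue].items():
            importance[category] = importance.get(category, 0.0) + p_category * p_venue
    return importance

uniform = uniform_importance(["cat%d" % i for i in range(10)])
personal = personalised_importance(
    {"venue_a": 3.0, "venue_b": 1.0},
    {"venue_a": {"museum": 1.0}, "venue_b": {"food": 1.0}},
)
```

Scores produced by a divergence-based ranker can be negative, so in practice they would need to be mapped to non-negative values before this normalisation; that mapping is left out here.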

Venue relevance: can be estimated using our LM approach.

Venue category: can be obtained from the LBSN.

What if the venue is not in the LBSN?

Page 14: I ii x_slides_albakour_online

VENUE CATEGORY PREDICTION

Page 15: I ii x_slides_albakour_online

Venue Category Prediction

Predicting the category of a venue
− Venues may not be available in LBSNs (e.g. when we consider the web for recommendation)
− Generalise our approach beyond a single LBSN
− We developed an approach for estimating the category of a venue given a web page that represents the venue

How?
− Using a textual classifier trained with top search results from a large web collection (ClueWeb12) for a large sample of venues in two LBSNs (Yelp and FourSquare)

Page 16: I ii x_slides_albakour_online

Venue Category Prediction

1. Sample: take a venue and its category from the LBSN (e.g. Venue: Tierra Cafe, Category: restaurant)

2. Retrieve: retrieve web documents d1, d2, ..., dk for the venue from the Web Collection, e.g.
− Tierra Cafe - Downtown - Los Angeles, CA | Yelp (www.yelp.com/biz/tierra-cafe-los-angeles)
− Tierra Cafe & Grill, Harrisburg - Restaurant Reviews - TripAdvisor (www.tripadvisor.com/...erra_Cafe_Grill-Harrisburg_Pennsylvania.html)
− Tierra Cafe & Grill - Harrisburg | Urbanspoon (www.urbanspoon.com/r/160/1657133/restaurant)

3. Train: use the learning instances (d1, restaurant), (d2, restaurant), ..., (dk, restaurant) to train a classifier (supervised machine learning)

Features: document terms
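The sample, retrieve and train pipeline ends in a text classifier over document terms. As a rough illustration only, the sketch below trains a small multinomial Naive Bayes model (one of the learner families compared in the talk) using the standard library. The class name, the whitespace tokeniser, the add-one smoothing and the training texts are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over document terms with add-one smoothing."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)  # category -> term frequencies
        self.doc_counts = Counter()              # category -> number of documents
        self.vocabulary = set()

    def train(self, instances):
        """instances: iterable of (document text, category) pairs."""
        for text, category in instances:
            terms = text.lower().split()
            self.term_counts[category].update(terms)
            self.doc_counts[category] += 1
            self.vocabulary.update(terms)

    def predict_proba(self, text):
        """Return {category: probability} for a document; unseen terms are ignored."""
        terms = [t for t in text.lower().split() if t in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.term_counts[category]
            denominator = sum(counts.values()) + len(self.vocabulary)
            log_p = math.log(n_docs / total_docs)
            for t in terms:
                log_p += math.log((counts[t] + 1) / denominator)
            log_scores[category] = log_p
        peak = max(log_scores.values())  # subtract the peak for numerical stability
        exp_scores = {c: math.exp(s - peak) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}

# Made-up learning instances in the (document, category) form shown on the slide.
classifier = NaiveBayesTextClassifier()
classifier.train([
    ("tierra cafe restaurant reviews menu", "restaurant"),
    ("cafe grill restaurant reviews", "restaurant"),
    ("museum art exhibition gallery", "arts"),
    ("art museum tickets exhibition", "arts"),
])
probabilities = classifier.predict_proba("cafe menu reviews")
```

The output is a probability per category, which is the kind of distribution a diversification step could consume for venues that are missing from the LBSN.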

Page 17: I ii x_slides_albakour_online

Venue Category Prediction

Evaluation
• Samples from 2 LBSNs (5000 from FourSquare & 5000 from Yelp)
• Retrieval models: BM25 & LTR approaches (AFS and LambdaMART)
• Supervised learning: Naïve Bayes, J48, Random Forests and SVM.
• Results are consistent on both LBSNs.
− Best accuracy is achieved with LambdaMART (for retrieval) and Random Forests (for supervised learning). F-1=0.60 approximately.

Example: classifying the home page http://artsbma.org/ of a venue v whose category is unknown

Category                 Prob.
Arts and Entertainment   0.5
Shopping                 0.4
Food                     0.05

Page 18: I ii x_slides_albakour_online

EVALUATION

Evaluating our diversification approach for contextual suggestions

Page 19: I ii x_slides_albakour_online

Experimental Setup

Evaluation using the TREC 2013 Contextual Suggestion track
• 223 unique pairs of users and contexts (locations): 115 users in 36 unique locations (city centres)
• Each user has explicitly rated 50 sample venues

Venue Sources & Categories
• Crawled venues from FourSquare and Yelp for the considered city centres using 4 km² grids centred at those locations

Venue Sources                   Categories             # Venues
Specific LBSN: FourSquare       FourSquare Cats. (6)   60,212
Specific LBSN: Yelp             Yelp Cats. (10)        7,096
Web Collection: ClueWeb12 CS    FourSquare Cats. (6)   30,144
Web Collection: ClueWeb12 CS    Yelp Cats. (10)        30,144

Web Collection (ClueWeb12 CS): apply our venue category prediction approach

Models Setup
• α=0.5 (equal weights for the positive and negative profiles)
• λ=0.5 for xQuAD (equal weights for the relevance and diversity components)

Page 20: I ii x_slides_albakour_online


Research Questions

RQ1: Can our diversification approach improve the quality of contextual suggestion over the LM baseline?

RQ2: What is the contribution of the diversity to the effectiveness of recommendation for different types of users?

Page 21: I ii x_slides_albakour_online

Results - FourSquare

[Figure: bar chart of P@3, P@5 and MRR (y-axis from 0.000 to 0.700) for the LM baseline, non-personalised xQuAD and personalised xQuAD. Change labels on the bars, in the order they appear: +4.5% and -2.4%, +6.9% and -1.6%, +2.5% and -0.6%]

• Personalised diversification improves effectiveness over the LM baseline.

• Better Improvements at higher cut-offs.

• Non-personalised diversification harms effectiveness marginally

• Similar patterns observed in the Yelp dataset (details in the paper)
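The measures reported on this slide (P@3, P@5 and MRR) can be computed as in the sketch below. It assumes simplified binary relevance judgements per user; the function names and the toy ranking are illustrative and do not reproduce the track's official evaluation procedure.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k suggestions that are judged relevant."""
    return sum(1 for venue in ranked[:k] if venue in relevant) / k

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant suggestion, or 0.0 if there is none."""
    for rank, venue in enumerate(ranked, start=1):
        if venue in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """runs: list of (ranked list, set of relevant venues), one per user-context pair."""
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)

# Toy ranking for one user-context pair.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "e"}
p_at_3 = precision_at_k(ranked, relevant, 3)
p_at_5 = precision_at_k(ranked, relevant, 5)
mrr = mean_reciprocal_rank([(ranked, relevant), (["x", "y"], {"x"})])
```

Averaging the per-pair values over all user-context pairs gives the system-level figures that such a comparison is based on.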

judged@5
LM Baseline   Non-pers. xQuAD   Pers. xQuAD
67.98%        63.94%            68.43%

Page 22: I ii x_slides_albakour_online

Results – ClueWeb12 CS

Panels: FourSquare Categories, Yelp Categories

j@5
LM Baseline   Non-pers. xQuAD   Pers. xQuAD
26.78%        27.22%            28.10%

LM Baseline   Non-pers. xQuAD   Pers. xQuAD
26.78%        27.04%            26.60%

• As before, consistent improvement for the personalised diversification over the LM baseline for the various measures

• Using either categorisation (FourSquare or Yelp) produces consistent results

[Figure: two bar charts of P@3, P@5 and MRR for the LM baseline, non-personalised xQuAD and personalised xQuAD. Change labels on the first chart (y-axis from 0.000 to 0.250), in the order they appear: +10.17%, -5.86%, +8.89%, +1.23%, +4.47%, -4.71%. Change labels on the second chart: +7.72%, -10.22%, +10.00%, 0.00%, +2.24%, -3.30%]

Page 23: I ii x_slides_albakour_online

Analysis

Users are different in terms of the variety of their interests
• To measure this variation, we measure the entropy of the category probability distribution for a given user
• Low entropy users have few venue categories of interest
• High entropy users have a variety of equal interests in many venue categories
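The entropy measure described above can be illustrated with a short sketch. The base-2 logarithm and the two example distributions are assumptions chosen for the illustration.

```python
import math

def category_entropy(distribution):
    """Shannon entropy (in bits) of a {category: probability} distribution."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

# A user focused on one category versus a user with equal interest in four.
focused_user = {"museum": 0.9, "food": 0.05, "shopping": 0.05}
varied_user = {"museum": 0.25, "food": 0.25, "shopping": 0.25, "nightlife": 0.25}
focused_entropy = category_entropy(focused_user)
varied_entropy = category_entropy(varied_user)
```

Ranking users by this value separates those with few categories of interest from those whose interests are spread evenly across many categories.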

Top 50 users ranked by category entropy
• The difference is mostly negative (86% of users)
• Diversification approach succeeds in providing a diverse list of venues matching the user’s interests

Least 50 users ranked by category entropy
• The difference is minimal for most users.
• However, in 30% of the cases, the original ranking was better

Page 24: I ii x_slides_albakour_online

CONCLUSIONS

Page 25: I ii x_slides_albakour_online

Conclusions

Diversification can improve the effectiveness of contextual suggestions when it is personalised.
• Up to 10% over an LM baseline in P@5
• Consistent results on different datasets

Users with a higher variety of interests benefit most from the diversification of contextual suggestions
• 86% of high-variety users benefited from diversification

Page 26: I ii x_slides_albakour_online


Thanks!

Questions?

@smartfp7 @dyaaa

[email protected]