Post on 27-Jun-2015

Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap

Alejandro Bellogín*, Pablo Castells, Iván Cantador

Information Retrieval Group – Universidad Autónoma de Madrid
*Information Access Group – Centrum Wiskunde & Informatica

Open Research Areas in Information Retrieval (OAIR 2013)

Recommender Systems

A recommender system aims to find and suggest items of likely interest based on the users’ preferences

Examples:

• Amazon – products

• Netflix – TV shows and movies

• LinkedIn – jobs and colleagues

• Last.fm – music artists and tracks

Collaborative Filtering

“You may like classical music if you like heavy metal”

Collaborative Filtering (in the real world)

Neighbour selection

Neighbours are selected according to how similar they are

Neighbour-based collaborative filtering

Depends on user similarity metrics

• Pearson’s correlation

• Spearman’s correlation

• Cosine

Advantages

• Simplicity

• Intuitiveness

• Efficiency

Disadvantages

• Lower accuracy than other methods

• Limited coverage

Research question

Can we find a good surrogate of user similarity for neighbour selection?

We need an alternative to similarity that provides equivalent (or better!) results

We focus on user preference overlap

Relation between similarity and overlap

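Overlap: $|I(u) \cap I(v)|$

Pearson's correlation, $\mathrm{cor}(u, v)$:

$$\mathrm{sim}(u, v) = \frac{\sum_{i} \left(r(u,i) - \bar{r}(u)\right)\left(r(v,i) - \bar{r}(v)\right)}{\sqrt{\sum_{i} \left(r(u,i) - \bar{r}(u)\right)^2 \, \sum_{i} \left(r(v,i) - \bar{r}(v)\right)^2}}$$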
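A minimal sketch of the two quantities on this slide, the overlap and Pearson's correlation. It assumes each user's ratings are stored in a plain dict from item to rating, and that the sums and means run over the co-rated items only (a common variant; the slide does not state the index set). The function names and the toy users are illustrative, not taken from the talk.

```python
from math import sqrt


def overlap(ratings_u, ratings_v):
    """Items rated by both users, i.e. I(u) ∩ I(v)."""
    return set(ratings_u) & set(ratings_v)


def pearson(ratings_u, ratings_v):
    """Pearson's correlation over co-rated items; None when it is undefined."""
    common = overlap(ratings_u, ratings_v)
    if len(common) < 2:
        return None  # no variance can be measured on fewer than two items
    # Assumption: user means are taken over the co-rated items only.
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return None  # a user with constant ratings has no defined correlation
    return num / sqrt(var_u * var_v)


# Hypothetical toy users: item -> rating on a 1-5 scale.
alice = {"a": 5, "b": 3, "c": 4, "d": 1}
bob = {"a": 4, "b": 2, "c": 5, "e": 3}

print(len(overlap(alice, bob)))  # overlap size
print(pearson(alice, bob))       # similarity on the co-rated items
```

Returning None rather than 0 keeps "undefined" apart from "uncorrelated", which matters once neighbours are ranked by this value.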

The larger the overlap, the more likely the users are to agree on their ratings

Low (negative) similarity values are obtained with small overlap sizes

The highest similarity values are obtained by users with tiny overlap

• This is coincidental; such similarities have low reliability
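The reliability problem can be illustrated with a small sketch. It assumes Pearson's correlation is computed on the co-rated items only; the rating pairs are invented for illustration. With exactly two co-rated items the correlation, whenever it is defined, can only be +1 or -1, so a pair of users with a tiny overlap can outrank a pair that agrees closely on many items.

```python
from math import sqrt


def pearson_on_pairs(pairs):
    """Pearson's correlation of a list of (rating_u, rating_v) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return num / sqrt(var_x * var_y)


# Two co-rated items: the two points always lie on a straight line.
tiny_agree = pearson_on_pairs([(5, 4), (3, 1)])
tiny_disagree = pearson_on_pairs([(5, 1), (3, 4)])

# Six co-rated items with close, but not perfect, agreement.
large = pearson_on_pairs([(5, 5), (4, 4), (3, 2), (2, 2), (1, 1), (4, 5)])

print(tiny_agree, tiny_disagree, large)
```

In this toy case the two-item pair reaches the maximum similarity while the six-item pair stays below it, even though the latter is backed by three times as much evidence.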

User overlap for neighbour selection

User-based collaborative filtering:

Select top-k neighbours V according to

• Similarity

• Intersection (overlap)

• Herlocker’s weighting

• McLaughlin’s weighting

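$$\hat{r}(u, i) = \sum_{v \in V} \mathrm{sim}(u, v)\, r(v, i)$$

Herlocker's weighting:

$$H(u, v) = \frac{\min\left(|I(u) \cap I(v)|,\, n\right)}{n}$$

McLaughlin's weighting:

$$ML(u, v) = \frac{\max\left(|I(u) \cap I(v)|,\, n\right)}{n}$$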
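A sketch of overlap-based neighbour selection, under the assumption that each user profile is a set of rated item identifiers. The two weighting functions follow the min/max forms on this slide. How the talk combines them with the similarity value when ranking neighbours is not shown, so that step is left out here. All names and the toy profiles are hypothetical.

```python
def overlap_size(items_u, items_v):
    """|I(u) ∩ I(v)|: number of items rated by both users."""
    return len(set(items_u) & set(items_v))


def herlocker_weight(size, n):
    """min(overlap, n) / n: grows with the overlap and saturates at 1."""
    return min(size, n) / n


def mclaughlin_weight(size, n):
    """max(overlap, n) / n: equals 1 up to n, then grows with the overlap."""
    return max(size, n) / n


def top_k_neighbours(target, profiles, k, score):
    """Rank every other user by score(target items, their items); keep the best k."""
    scored = [
        (score(profiles[target], items), user)
        for user, items in profiles.items()
        if user != target
    ]
    # Highest score first; ties are broken by user id to keep the result stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in scored[:k]]


# Hypothetical toy profiles: user -> set of rated items.
profiles = {
    "u": {"a", "b", "c", "d"},
    "v1": {"a", "b", "c"},
    "v2": {"a"},
    "v3": {"a", "b", "x", "y"},
}

neighbours = top_k_neighbours("u", profiles, 2, overlap_size)
print(neighbours)
print(herlocker_weight(5, 10), mclaughlin_weight(25, 10))
```

Passing the scoring function as an argument lets the same selection routine be reused for similarity-based ranking or for any of the overlap-based variants.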

Performance comparison

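• RMSE: error-based evaluation (the lower, the better)

• P@10: precision-based evaluation (the higher, the better)

$$\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \left(\hat{r}(u,i) - r(u,i)\right)^2}$$

$$\mathrm{P@}N = \frac{|\mathrm{Rel} \cap \mathrm{Top}_N|}{N}$$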
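Both metrics are short enough to sketch directly. The data layout is an assumption: the test set T is a dict keyed by (user, item), and a recommendation is an ordered list of item identifiers. The values below are made up.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error over the (user, item) pairs of the test set."""
    squared_errors = [(predicted[key] - actual[key]) ** 2 for key in actual]
    return sqrt(sum(squared_errors) / len(actual))


def precision_at_n(ranked_items, relevant, n):
    """Fraction of the top-n recommended items that are relevant."""
    return len(set(ranked_items[:n]) & set(relevant)) / n


# Hypothetical test ratings and predictions for a single user.
actual = {("u", "a"): 4, ("u", "b"): 2}
predicted = {("u", "a"): 3, ("u", "b"): 4}

print(rmse(predicted, actual))
print(precision_at_n(["a", "b", "c", "d"], {"a", "d", "z"}, 2))
```

RMSE only looks at pairs that have a known test rating, whereas P@N judges the ranking the user would actually see, which is why the two can disagree about which method is best.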

↑ Performance as good as (or better than) the baseline

↓ Lower coverage for some methods

[Figure: P@10 and RMSE against neighbourhood size (10, 20, 50, 100, 200, 1000, 2000) for similarity, intersection, Herlocker and McLaughlin filtering]

Performance comparison: different values for n

[Figure: P@10 and RMSE against neighbourhood size (10, 20, 50, 100, 200, 1000, 2000) for similarity, intersection, Herlocker and McLaughlin filtering, with one pair of panels per threshold: n = 10, 20, 40, 60, 80, 100. Some panel legends also list trust filtering.]

Conclusions

We compare three approaches based on user preference overlap

• Intersection

• Herlocker’s weighting

• McLaughlin’s weighting

Overlap-based approaches improve accuracy (RMSE) and precision (P@10)

• Especially for small neighbourhoods

• Stable with respect to the threshold parameter n

Future work

Extend our analysis to other similarity functions

• Cosine

• Spearman

Exploit other recommendation input dimensions

• User logs

• Social networks

Understand why overlap is special, in order to improve other recommendation algorithms

Thank you!

Improving Memory-Based Collaborative Filtering by Neighbour Selection Based on User Preference Overlap

Alejandro Bellogín, Pablo Castells, Iván Cantador

@abellogin

Experiments

Dataset: MovieLens 1M

• Users: 6,040

• Items: 3,600

• Number of ratings: 1 million

• Available at http://www.grouplens.org/node/73

Neighbour selection in collaborative filtering

Two main alternatives

• Top-k most similar users

• Thresholding: those users whose similarity is above a threshold
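The two alternatives can be contrasted in a few lines. This is a sketch, assuming the similarities between the target user and every candidate are already available in a dict; the values and user names are invented.

```python
def top_k(similarities, k):
    """The k most similar users, however low their similarity may be."""
    ranked = sorted(similarities.items(), key=lambda pair: (-pair[1], pair[0]))
    return [user for user, _ in ranked[:k]]


def above_threshold(similarities, threshold):
    """Every user whose similarity exceeds the threshold, however many there are."""
    return sorted(user for user, sim in similarities.items() if sim > threshold)


# Hypothetical similarities between a target user and four candidates.
sims = {"v1": 0.9, "v2": 0.4, "v3": 0.75, "v4": -0.2}

print(top_k(sims, 2))
print(above_threshold(sims, 0.3))
```

Top-k fixes the neighbourhood size and lets its quality vary, while thresholding fixes a minimum quality and lets the size vary, possibly down to an empty neighbourhood.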

In [Herlocker et al, 2002]: thresholding obtains worse accuracy (higher error)

Compatible with

• Clustering of users: neighbourhood ~ users in the same cluster [Xue et al, 2005; Bellogin & Parapar, 2013]

• Probabilistic relevance models: p(v | Ru) [Submitted to RecSys ’13]

• Rating normalisation [Desrosiers & Karypis, 2012]

References

[Bellogin & Parapar, 2013] Using Graph Partitioning Techniques for Neighbour Selection in User-Based Collaborative Filtering. RecSys

[Desrosiers & Karypis, 2012] A Comprehensive Survey of Neighborhood-based Recommendation Methods. Recommender Systems Handbook

[Herlocker et al, 2002] An Empirical Analysis of Design Choices in Neighborhood-Based Collaborative Filtering Algorithms. Information Retrieval Journal

[Xue et al, 2005] Scalable Collaborative Filtering Using Cluster-Based Smoothing. SIGIR