Algorithms for Efficient Collaborative Filtering Vreixo Formoso Fidel Cacheda Víctor Carneiro University of A Coruña (Spain)

Post on 22-Dec-2015


Algorithms for Efficient Collaborative Filtering

Vreixo Formoso

Fidel Cacheda

Víctor Carneiro

University of A Coruña (Spain)

Glasgow - 30th March 2008, EIIR 2008

Outline

Introduction
Background in Collaborative Filtering
Proposed algorithms
Experiments
Conclusions


Introduction

More and more information every day
Personalized retrieval systems are quite interesting
– Recommender systems: recommend items that would be more appropriate for the user's needs or preferences
– Useful in e-commerce, but we think they could also be useful in Web IR
Recommender systems store some information about the user's preferences: the user profile
– Explicit or implicit


Introduction

Types of recommender systems:
– Content-based filtering: recommend items based on their content
    – Depends on automatic analysis of the items
    – Unable to determine the item quality
    – Serendipitous find
– Collaborative filtering: based on other users' evaluations
    – It will recommend items well considered by other users with similar interests
    – Problems with computational performance and efficiency


Background

User profile: evaluations carried out by the user
Evaluation: numerical value (e.g. 1 – 5)
Evaluation matrix: contains the evaluations of the users
Types of collaborative filtering algorithms:
– Memory-based: use similarity measures to find related neighbours (users or items) and predict from them
    – The entire matrix is used in each prediction
– Model-based: build a model that represents the user's behaviour and predicts their evaluations
    – The parameters of the model are estimated using the evaluation matrix (off-line)
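A minimal sketch of the memory-based idea, assuming user-based neighbours, Pearson correlation as the similarity measure and a mean-centred weighted average as the prediction rule. None of these choices is stated on the slide; the names `pearson` and `predict_user_based` and the toy data are hypothetical. The point it illustrates is that every single prediction scans the whole evaluation matrix.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items both users have evaluated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * \
          sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def predict_user_based(ratings, user, item):
    """ratings: {user: {item: value}}. Scans every other user's profile."""
    profile = ratings[user]
    user_mean = sum(profile.values()) / len(profile)
    num = den = 0.0
    for other, other_profile in ratings.items():
        if other == user or item not in other_profile:
            continue
        weight = pearson(profile, other_profile)
        other_mean = sum(other_profile.values()) / len(other_profile)
        num += weight * (other_profile[item] - other_mean)
        den += abs(weight)
    # Fall back to the user's own mean when no neighbour is informative.
    return user_mean + num / den if den else user_mean

# Hypothetical toy evaluations on a 1-5 scale.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4, "b": 2, "c": 5},
    "u3": {"a": 1, "b": 5, "c": 2},
}
prediction = predict_user_based(ratings, "u1", "c")
```

A model-based method would instead do the expensive work once, off-line, and answer each prediction from the fitted parameters.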


Background

Memory-based
– Simple, and give reasonably precise results
– Low scalability
– More sensitive to common recommender systems problems: sparsity, cold-start and spam
Model-based
– Finds underlying characteristics in the data
– Faster in prediction time
– Complexity of the models:
    – Sensitive to changes in the data
    – High construction times
    – Model updating when new data are available


Background: Notation

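Evaluation matrix (V): one row per user in Users (U), one column per item in Items (I)

$$
V = \begin{pmatrix}
v_{11} & v_{12} & \cdots & v_{1n} \\
v_{21} & v_{22} & \cdots & v_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
v_{m1} & v_{m2} & \cdots & v_{mn}
\end{pmatrix}
$$

Rows: $u_1, u_2, \dots, u_m$. Columns: $i_1, i_2, \dots, i_n$.

User profile ($I_1$): the items evaluated by user $u_1$

Users that have evaluated $i_1$ ($U_1$)

Prediction of the evaluation of user $m$ for item $n$ ($p_{mn}$)

$v_{u\cdot}$: evaluations of user $u$

$v_{\cdot i}$: evaluations for item $i$

Mean values: $\bar{v}_{u\cdot}$ and $\bar{v}_{\cdot i}$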
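The notation can be mirrored in a few lines of Python. This is only an illustrative sketch: the sparse dictionary layout, the function name `build_notation` and the toy triples are assumptions, not something the slides prescribe.

```python
from collections import defaultdict

# Hypothetical toy data: (user, item, evaluation) triples on a 1-5 scale.
evaluations = [
    ("u1", "i1", 5), ("u1", "i2", 3),
    ("u2", "i1", 4), ("u2", "i3", 2),
    ("u3", "i2", 1), ("u3", "i3", 5),
]

def build_notation(triples):
    """Return V, the profiles I_u, the sets U_i and both kinds of mean."""
    V = {}                  # V[(u, i)] = v_ui, stored only for known evaluations
    I = defaultdict(set)    # I[u] = items evaluated by user u (the user profile)
    U = defaultdict(set)    # U[i] = users that have evaluated item i
    for u, i, v in triples:
        V[(u, i)] = v
        I[u].add(i)
        U[i].add(u)
    # Mean of each row (user) and each column (item), over known entries only.
    user_mean = {u: sum(V[(u, i)] for i in items) / len(items)
                 for u, items in I.items()}
    item_mean = {i: sum(V[(u, i)] for u in users) / len(users)
                 for i, users in U.items()}
    return V, dict(I), dict(U), user_mean, item_mean

V, I, U, user_mean, item_mean = build_notation(evaluations)
```

Storing only the known entries keeps memory proportional to the number of evaluations rather than to m × n, which matters when the matrix is sparse.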


Proposed algorithms

Objectives:
– Good behaviour in low density
– Computational efficiency
– Constant updating
Item mean algorithm
– Our base
– Use the mean of an item as its prediction

$$p_{ui} = \bar{v}_{\cdot i}$$

Simple mean based algorithm
– The item mean is corrected with the mean of the user

$$p_{ui} = \bar{v}_{\cdot i} + \frac{\sum_{j \in I_u} \left( v_{uj} - \bar{v}_{\cdot j} \right)}{|I_u|}$$
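A hedged sketch of the two baseline predictors, reading the simple mean based correction as the user's average deviation from the means of the items they evaluated. The function names and the toy `ratings` dictionary are hypothetical.

```python
def item_means(ratings):
    """ratings: {user: {item: value}} -> {item: mean value of that item}."""
    totals, counts = {}, {}
    for profile in ratings.values():
        for item, value in profile.items():
            totals[item] = totals.get(item, 0.0) + value
            counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}

def predict_item_mean(means, item):
    """Item mean algorithm: the prediction is the item's mean."""
    return means[item]

def predict_simple_mean(ratings, means, user, item):
    """Item mean plus the user's average deviation from the item means."""
    profile = ratings[user]
    correction = sum(v - means[j] for j, v in profile.items()) / len(profile)
    return means[item] + correction

# Hypothetical toy evaluations on a 1-5 scale.
ratings = {
    "u1": {"a": 5, "b": 4},
    "u2": {"a": 3, "b": 2, "c": 3},
    "u3": {"a": 4, "c": 5},
}
means = item_means(ratings)
```

Here `u1` evaluates one point above the item means on average, so the prediction for item `c` rises from 4.0 to 5.0. Both predictors need only one pass over the evaluations to train and a lookup to predict.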


Proposed algorithms

Tendencies based algorithm
– Main idea: users tend to evaluate items positively or negatively
    – Include tendencies in the formula
– Tendency ≠ mean
– Tendency of a user ($ub_u$) and tendency of an item ($ib_i$):

$$ub_u = \frac{\sum_{i \in I_u} \left( v_{ui} - \bar{v}_{\cdot i} \right)}{|I_u|}$$

$$ib_i = \frac{\sum_{u \in U_i} \left( v_{ui} - \bar{v}_{u\cdot} \right)}{|U_i|}$$

– In this algorithm we use the mean of the item and the user as well as their respective tendencies.
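The two tendencies can be computed as below. This is a sketch under the reading that a user's tendency is their average deviation from the item means, and an item's tendency is its average deviation from the means of the users who evaluated it. The function name and the toy data are hypothetical.

```python
def means_and_tendencies(ratings):
    """ratings: {user: {item: value}} -> (user_mean, item_mean, ub, ib)."""
    # Transpose the profiles so each item maps to the users who evaluated it.
    by_item = {}
    for user, profile in ratings.items():
        for item, value in profile.items():
            by_item.setdefault(item, {})[user] = value

    user_mean = {u: sum(p.values()) / len(p) for u, p in ratings.items()}
    item_mean = {i: sum(c.values()) / len(c) for i, c in by_item.items()}

    # User tendency: average of (evaluation - item mean) over the user's items.
    ub = {u: sum(v - item_mean[i] for i, v in p.items()) / len(p)
          for u, p in ratings.items()}
    # Item tendency: average of (evaluation - user mean) over the item's users.
    ib = {i: sum(v - user_mean[u] for u, v in c.items()) / len(c)
          for i, c in by_item.items()}
    return user_mean, item_mean, ub, ib

# Hypothetical toy evaluations on a 1-5 scale.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 3, "b": 1},
}
user_mean, item_mean, ub, ib = means_and_tendencies(ratings)
```

In this toy example `u1` has mean 4.0 but tendency +1.0: the mean says where a user's evaluations sit on the scale, while the tendency says whether they sit above or below what the items usually receive.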


Proposed algorithms

Tendencies based algorithm

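$$p_{ui} = \max\left( \bar{v}_{u\cdot} + ib_i,\ \bar{v}_{\cdot i} + ub_u \right)$$

$$p_{ui} = \min\left( \bar{v}_{u\cdot} + ib_i,\ \bar{v}_{\cdot i} + ub_u \right)$$

$$p_{ui} = \min\left[ \max\left( \bar{v}_{u\cdot},\ (\bar{v}_{\cdot i} + ub_u)\,\beta + (\bar{v}_{u\cdot} + ib_i)(1 - \beta) \right),\ \bar{v}_{\cdot i} \right]$$

$$p_{ui} = \bar{v}_{\cdot i}\,\beta + \bar{v}_{u\cdot}\,(1 - \beta)$$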
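The slide text does not say when each of the four formulas applies, so the sketch below makes its own assumptions: the maximum is used when both tendencies are positive, the minimum when both are negative, the clamped blend when the tendencies disagree but the two means are ordered consistently with them, and the plain weighted mean otherwise. The weighting parameter (written `beta` here, default 0.5) and the symmetric handling of the positive-user, negative-item case are also assumptions, and `predict_tendency` is a hypothetical name.

```python
def predict_tendency(user_mean, item_mean, ub, ib, beta=0.5):
    """Combine means and tendencies into a prediction (assumed case rules)."""
    candidate_from_item = item_mean + ub   # item mean shifted by user tendency
    candidate_from_user = user_mean + ib   # user mean shifted by item tendency

    if ub > 0 and ib > 0:                  # assumed: both tendencies positive
        return max(candidate_from_user, candidate_from_item)
    if ub < 0 and ib < 0:                  # assumed: both tendencies negative
        return min(candidate_from_user, candidate_from_item)

    # Assumed: tendencies disagree; check whether the means agree with them.
    means_agree = (ub < 0 < ib and user_mean < item_mean) or \
                  (ib < 0 < ub and item_mean < user_mean)
    if means_agree:
        blend = candidate_from_item * beta + candidate_from_user * (1 - beta)
        low, high = sorted((user_mean, item_mean))
        return min(max(low, blend), high)  # keep the blend between both means

    # Remaining cases (including zero tendencies): weighted mean of the means.
    return item_mean * beta + user_mean * (1 - beta)
```

For example, `predict_tendency(3.5, 4.0, 0.5, 0.25)` takes the first branch and returns 4.5. Each prediction uses four precomputed numbers, so its cost does not depend on the number of users or items.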


Experiments

Algorithms evaluated
– Memory-based: user-based, item-based and similarity fusion
– Model-based: regression based, slope one, latent semantic indexing and cluster based smoothing
– Hybrid: personality diagnosis
Dataset: MovieLens
– Real ratings of films: 1 (very bad) – 5 (excellent)
– 100,000 evaluations from 943 users for 1,682 movies (1.78 items per user, about 106 evaluations per user on average). Density 6%
– Training set: 10%, 50% and 90%
For each algorithm we evaluated (5 times):
– Training and prediction times
– Quality of the predictions
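The dataset figures can be checked with simple arithmetic, and a training split can be sketched in a few lines. The slides do not say how the training sets were drawn; the seeded random shuffle in `split_evaluations` (a hypothetical helper) is only one plausible way to do it.

```python
import random

USERS, ITEMS, EVALUATIONS = 943, 1_682, 100_000

density = EVALUATIONS / (USERS * ITEMS)      # filled share of the matrix
evaluations_per_user = EVALUATIONS / USERS   # average profile size
items_per_user = ITEMS / USERS               # ratio of items to users

print(f"density: {density:.1%}")                              # density: 6.3%
print(f"evaluations per user: {evaluations_per_user:.0f}")    # evaluations per user: 106
print(f"items per user: {items_per_user:.2f}")                # items per user: 1.78

def split_evaluations(triples, train_fraction, seed=0):
    """Assumed protocol: shuffle once, keep the first share for training."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# The three training sizes used in the experiments.
for fraction in (0.1, 0.5, 0.9):
    train, test = split_evaluations(range(EVALUATIONS), fraction)
    print(fraction, len(train), len(test))
```

With a 10% training set the effective density drops to roughly 0.6%, which is the low density setting the proposed algorithms target.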


Proposed algorithms

Tendencies based algorithm

Only 5% of the predictions with the 10% training set
2% of the predictions with the 90% training set
This case represents some unusual elements
Tendencies seem a good prediction mechanism


Experiments: Computational complexity

Algorithm                  Training complexity   Prediction complexity
User Based                 -                     O(mn)
Item Based                 O(mn²)                O(n)
Similarity Fusion          O(n²m + m²n)          O(mn)
Personality Diagnosis      O(m²n)                O(m)
Regression Based           O(mn²)                O(n)
Slope One                  O(mn²)                O(n)
Latent Semantic Indexing   O((m+n)³)             O(1)
Cluster Based Smoothing    O(mnα + m²n)          O(mn)
Item Mean                  O(mn)                 O(1)
Simple Mean Based          O(mn)                 O(1)
Tendencies Based           O(mn)                 O(1)


Experiments: Training time

Algorithms                      10%      50%      90%
User Based                        0        0        0
Item Based                      415    1,060    1,986
Similarity Fusion               987    3,840    5,474
Personality Diagnosis           257      994    2,213
Regression Based              3,302    4,575    7,780
Slope One                     1,246    2,175    2,541
Latent Semantic Indexing    117,758  115,218  102,855
Cluster Based Smoothing      60,247   71,529   44,635
Item Mean                         2        3        3
Simple Mean Based                 7       10        5
Tendencies Based                 11       15        9


Experiments: Prediction time

Algorithms                      10%      50%      90%
User Based                    6,250   15,597    8,915
Item Based                      221    1,864      909
Similarity Fusion           227,736  756,834  264,951
Personality Diagnosis         1,369    3,845    1,400
Regression Based                205      570      265
Slope One                       319      501      116
Latent Semantic Indexing        162      158       20
Cluster Based Smoothing      70,515  251,595  118,552
Item Mean                        24       12        2
Simple Mean Based                25       11        4
Tendencies Based                 24       16        4


Experiments: Prediction quality

Algorithms                      10%      50%      90%
User Based                     0.99     0.71     0.68
Item Based                     0.92     0.75     0.71
Similarity Fusion              0.84     0.73     0.71
Personality Diagnosis          0.82     0.78     0.78
Regression Based               1.03     0.76     0.74
Slope One                      0.90     0.72     0.70
Latent Semantic Indexing       0.85     0.77     0.73
Cluster Based Smoothing        0.97     0.87     0.80
Item Mean                      0.82     0.79     0.79
Simple Mean Based              0.79     0.72     0.72
Tendencies Based               0.79     0.72     0.71
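The slide does not name the quality measure behind these figures. Mean absolute error (MAE) is a common choice for rating prediction, where lower is better, and the sketch below assumes it purely for illustration. The function name and the sample values are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and real evaluations."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical predictions against held-out evaluations on a 1-5 scale.
predicted = [4.0, 3.0, 5.0]
actual = [5, 3, 4]
error = mean_absolute_error(predicted, actual)
```

Under that assumption, a value of 0.79 would mean predictions are off by about 0.8 points on average on the 1 to 5 scale.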


Conclusions

We have presented a couple of algorithms for collaborative filtering:
– Very simple
– Good response times
– Tendencies based algorithm:
    – Quality of the predictions equivalent to the best algorithms
    – Even better in low density training sets
Next steps: use these algorithms in Web IR
– Problems: dataset?


Thank you!

Questions?