Introduction to Collaborative Filtering

WE KNOW YOU WILL LIKE THIS Introduction to Recommendation Engines Monday, January 14, 13

Upload: nimrod-priell

Post on 07-Jul-2015


DESCRIPTION

An introduction to user-based and item-based collaborative filtering, and a telling of the story of the Netflix Prize.

TRANSCRIPT

Page 1: Introduction to Collaborative Filtering

WE KNOW YOU WILL LIKE THIS: Introduction to Recommendation Engines

Page 2

Machine learning (ML)

Supervised: features X plus a target Y (X + Y)
Regression: numeric Y, e.g. Y = Turnout (30, 12, 25)
Classification: categorical Y, e.g. Y = Class (Spam, Not Spam, Spam)

Unsupervised: features X only
Clustering
Hierarchical Clustering

Page 3

Recommendation

Content/Model-Based

Collaborative Filtering (Agnostic, Behavioural): predicting the rating

          Maraboo  Karnaf  Ima Adama  Liv
Idan         5       ?         3       ?
Shahar       4       3         ?       2
Gadi         ?       1         ?       5

Page 12

Rating Problem (Movies)

Preference Problem (Ads)

Page 14

Related problem: Ranking

Page 15

          Maraboo  Karnaf  Ima Adama  Liv
Idan         1       ?         1       ?
Shahar       1       1         ?       1
Gadi         ?       1         ?       1

          Maraboo  Karnaf  Ima Adama  Liv
Idan         5       ?         3       ?
Shahar       4       3         ?       2
Gadi         ?       1         ?       5

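The two matrices differ only in what each cell stores: a rating, or a 1 for "expressed a preference". A minimal Python sketch of deriving the second from the first, assuming "?" is stored as `None`; `RATINGS`, `USERS`, `ITEMS` and `to_preferences` are illustrative names, not taken from the talk.

```python
from typing import Optional

Matrix = list[list[Optional[int]]]

# Hypothetical layout: rows follow USERS, columns follow ITEMS.
ITEMS = ["Maraboo", "Karnaf", "Ima Adama", "Liv"]
USERS = ["Idan", "Shahar", "Gadi"]

# Rating matrix from the slides; None stands for "?" (no rating).
RATINGS: Matrix = [
    [5, None, 3, None],   # Idan
    [4, 3, None, 2],      # Shahar
    [None, 1, None, 5],   # Gadi
]


def to_preferences(ratings: Matrix) -> Matrix:
    """Keep a 1 wherever a rating exists; unknown entries stay None."""
    return [[None if r is None else 1 for r in row] for row in ratings]


if __name__ == "__main__":
    for user, row in zip(USERS, to_preferences(RATINGS)):
        print(user, row)
```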
Page 16

          Maraboo  Karnaf  Ima Adama  Liv
Idan         1       ?         1       ?
Shahar       1       1         ?       1
Gadi         ?       1         ?       1

Page 17

          Maraboo  Karnaf  Ima Adama  Liv
Idan         5       ?         3       ?
Shahar       4       3         ?       2
Gadi         ?       1         ?       5

Page 18

User-based Collaborative Filtering

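A minimal sketch of one common user-based prediction rule: a similarity-weighted average of other users' ratings for the target item. It assumes cosine similarity computed over co-rated items only; `cosine_on_common` and `predict_user_based` are hypothetical names, and production systems usually add refinements such as mean-centring and a cap on the number of neighbours.

```python
import math
from typing import Optional

Row = list[Optional[float]]

# Rating matrix from the slides (columns: Maraboo, Karnaf, Ima Adama, Liv).
RATINGS: dict[str, Row] = {
    "Idan": [5, None, 3, None],
    "Shahar": [4, 3, None, 2],
    "Gadi": [None, 1, None, 5],
}


def cosine_on_common(a: Row, b: Row) -> float:
    """Cosine similarity restricted to items both users rated (0.0 if none)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b)


def predict_user_based(user: str, item: int) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for column `item`."""
    num = den = 0.0
    for other, row in RATINGS.items():
        if other == user or row[item] is None:
            continue
        sim = cosine_on_common(RATINGS[user], row)
        num += sim * row[item]
        den += abs(sim)
    return num / den if den else None


if __name__ == "__main__":
    # Gadi has no rating for Maraboo (column 0).
    print(predict_user_based("Gadi", 0))
```

Idan shares no rated item with Gadi, so only Shahar contributes and the prediction is simply Shahar's rating of Maraboo. With a single neighbour the similarity cancels out of the average, which is one reason small overlaps make these predictions fragile.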
Page 20

Pearson’s Correlation
1 - Distance
Cosine Similarity
Jaccard Distance: “We share 5 preferences out of 7!”
“Our preferences go in the same direction!” (but only 2 such preferences do...)
Euclidean Distance
Log-Likelihood Ratio: measure of “surprise” at correlation

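The measures above, sketched in Python for dense vectors (user rating vectors or item columns). These follow the textbook definitions; the functions do not guard against empty or constant vectors, and `log_likelihood_ratio` computes Dunning's G² statistic on a 2x2 co-occurrence table (both / only x / only y / neither). A distance d can be turned into a similarity, e.g. 1 - d for Jaccard. All function names are illustrative.

```python
import math


def cosine(x: list[float], y: list[float]) -> float:
    """Cosine of the angle between two vectors (no zero-vector guard)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation: the cosine of the mean-centred vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared preferences over all preferences; Jaccard distance is 1 minus this."""
    return len(a & b) / len(a | b)


def euclidean(x: list[float], y: list[float]) -> float:
    """Straight-line distance between two rating vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G^2 for a 2x2 table: k11 both, k12/k21 only one, k22 neither.

    Larger values mean the co-occurrence is more surprising under independence.
    """
    table = [[k11, k12], [k21, k22]]
    n = k11 + k12 + k21 + k22
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            k = table[i][j]
            if k > 0:  # 0 * log(0) is taken as 0
                expected = rows[i] * cols[j] / n
                g2 += k * math.log(k / expected)
    return 2.0 * g2
```

For example, two users who share 5 of 7 preferred items get `jaccard(set("abcdef"), set("abcdeg")) == 5/7`, while `cosine([4, 2], [2, 1])` is 1.0: the two rating vectors point in exactly the same direction even though they rest on just two co-rated items.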
Page 21

Item-Based Collaborative Filtering

Usually bounded

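A sketch of item-based prediction on the same toy ratings: score an unrated item by a similarity-weighted average of the user's own ratings on other items. Assumptions: plain cosine between item columns, with missing ratings treated as 0 (item-based CF work often uses adjusted cosine instead); the function names are hypothetical.

```python
import math
from typing import Optional

ITEMS = ["Maraboo", "Karnaf", "Ima Adama", "Liv"]

# Rating matrix from the slides, rows: Idan, Shahar, Gadi; None = "?".
RATINGS: list[list[Optional[float]]] = [
    [5, None, 3, None],
    [4, 3, None, 2],
    [None, 1, None, 5],
]


def item_vector(item: int) -> list[float]:
    """Column of the rating matrix, with missing ratings treated as 0 (assumption)."""
    return [0.0 if row[item] is None else float(row[item]) for row in RATINGS]


def item_cosine(i: int, j: int) -> float:
    """Cosine similarity between two item columns (0.0 for an all-zero column)."""
    a, b = item_vector(i), item_vector(j)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_item_based(user: int, item: int) -> Optional[float]:
    """Weighted average of the user's own ratings on items similar to `item`."""
    num = den = 0.0
    for other, rating in enumerate(RATINGS[user]):
        if other == item or rating is None:
            continue
        sim = item_cosine(item, other)
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


if __name__ == "__main__":
    # Gadi (row 2) has not rated Maraboo (column 0).
    print(round(predict_item_based(2, 0), 3))
```

Gadi rated Karnaf 1 and Liv 5; Karnaf is more similar to Maraboo than Liv is, so the prediction lands nearer 1 than 5.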
Page 22

Case study: Amazon

100,000,000 users
2,000,000 items
Each user expresses preference for 10 items
Each item has 500 reviews

User-Based CF:
100,000,000 x 100,000,000 similarity matrix
2,000,000 x 500 sum terms

Item-Based CF:
2,000,000 x 2,000,000 similarity matrix
2,000,000 x 10 sum terms

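The slide's counts follow from simple arithmetic. The sketch below reads "sum terms" as the work needed to score every item for one user (an interpretation, not stated on the slide): user-based CF sums over each item's 500 raters, item-based CF sums over the user's 10 items.

```python
USERS = 100_000_000
ITEMS = 2_000_000
ITEMS_PER_USER = 10      # preferences expressed by each user
RATINGS_PER_ITEM = 500   # reviews per item

# Both views count the same preferences: 10^9 user-item pairs.
assert USERS * ITEMS_PER_USER == ITEMS * RATINGS_PER_ITEM

# Size of the similarity matrix each approach would need.
user_sim_entries = USERS * USERS   # 10^16
item_sim_entries = ITEMS * ITEMS   # 4 * 10^12

# Sum terms to score all items for one user (assumed reading of the slide).
user_based_terms = ITEMS * RATINGS_PER_ITEM   # each item: sum over its 500 raters
item_based_terms = ITEMS * ITEMS_PER_USER     # each item: sum over the user's 10 items

print(f"similarity matrix: {user_sim_entries:.1e} vs {item_sim_entries:.1e} entries")
print(f"sum terms per user: {user_based_terms:,} vs {item_based_terms:,}")
```

Under this reading, the item-based approach needs a 2,500 times smaller similarity matrix and 50 times fewer sum terms per user.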
Page 23

Interpretability

“People who go to La Colombe Torrefaction & FourSquare HQ tend to go here”

“Coffee Shop connoisseurs tend to come here”

Page 24

Evaluation

Rating Problem: predictive accuracy (regression) metrics
RMSE, MAE, etc.

Preference (Binary) Problem: classification accuracy (IR) metrics
Accuracy, Precision, Recall, F-1, ROC, etc.
Benchmark vs. ‘random’ and ‘popular’

Ranking accuracy metrics: similarity of permutations
Pearson’s correlation, Spearman’s rho, Kendall’s tau

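Minimal Python versions of the metrics named above, assuming equal-length inputs, a non-empty recommendation list, and rankings without ties (the closed-form Spearman formula and this Kendall count both rely on that). Function names are illustrative.

```python
import math


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root mean squared error between predicted and true ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted: list[float], actual: list[float]) -> float:
    """Mean absolute error between predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def precision_recall_f1(recommended: list[str], relevant: set[str]) -> tuple[float, float, float]:
    """IR metrics for a binary preference problem."""
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended)
    recall = hits / len(relevant)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def spearman_rho(rank_a: list[int], rank_b: list[int]) -> float:
    """Spearman's rho for two rankings of the same n items (no ties)."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(rank_a: list[int], rank_b: list[int]) -> float:
    """Kendall's tau: (concordant - discordant) pairs over all pairs (no ties)."""
    n = len(rank_a)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
            if product > 0:
                score += 1   # concordant pair
            elif product < 0:
                score -= 1   # discordant pair
    return score / (n * (n - 1) / 2)
```

For instance, predicting ratings [5, 3] against true ratings [4, 3] gives an RMSE of about 0.707 and an MAE of 0.5.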
Page 26

Challenges

Cold-start problems (new item, new user)

“Black” and “Grey” sheep

Exploration-exploitation and reinforcement learning

Scale

Page 27

Advanced Topics

Dimensionality Reduction

Map-Reducible calculations

Content-based (feature-based)

Multiple models

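Dimensionality reduction is only named on the slide. One common form is a truncated singular value decomposition, sketched here with NumPy on the toy preference matrix. Filling "?" with 0 is an assumption; the factorization models used in the Netflix Prize instead fit only the observed ratings. `low_rank` is an illustrative name.

```python
import numpy as np

# Binary preference matrix from the slides (rows: Idan, Shahar, Gadi;
# columns: Maraboo, Karnaf, Ima Adama, Liv), with "?" filled as 0 (assumption).
A = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
], dtype=float)


def low_rank(matrix: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation (in Frobenius norm) via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


if __name__ == "__main__":
    approx = low_rank(A, 2)
    # Cells that were 0 ("?") now carry a real-valued score that can be ranked.
    print(np.round(approx, 2))
```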
Page 28

MapReduce Similarity Calculation: “User-based”

$A$ (users x items; ? = no preference):
          Maraboo  Karnaf  Ima Adama  Liv
Idan         1       ?         1       ?
Shahar       1       1         ?       1
Gadi         ?       1         ?       1

$u_i$ (Gadi): Maraboo ?, Karnaf 1, Ima Adama ?, Liv 1

$A u_i$ (user similarity vector): Idan 0, Shahar 2, Gadi 2

$A^T$:
           Idan  Shahar  Gadi
Maraboo     1      1      ?
Karnaf      ?      1      1
Ima Adama   1      ?      ?
Liv         ?      1      1

$A^T (A u_i)$ for Gadi: Maraboo 2, Karnaf 4, Ima Adama 0, Liv 4

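The two products on this page can be checked with a few lines of plain Python; `mat_vec` and `transpose` are helper names introduced here, and "?" is treated as 0.

```python
# Binary preference matrix A from the slide; "?" is treated as 0.
A = [
    [1, 0, 1, 0],  # Idan
    [1, 1, 0, 1],  # Shahar
    [0, 1, 0, 1],  # Gadi
]


def mat_vec(m: list[list[int]], v: list[int]) -> list[int]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: list[list[int]]) -> list[list[int]]:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


u_i = A[2]                                 # Gadi's preference vector
user_sim = mat_vec(A, u_i)                 # A u_i: overlap of every user with Gadi
scores = mat_vec(transpose(A), user_sim)   # A^T (A u_i): item scores for Gadi

if __name__ == "__main__":
    print(user_sim, scores)
```

Karnaf and Liv score highest, but Gadi already prefers both; a recommender would normally drop known items, leaving Maraboo (2) ahead of Ima Adama (0).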
Page 29

MapReduce Similarity Calculation: “Item-Based”

$A$, $u_i$ (Gadi) and $A^T$ as on Page 28.

$A^T A$ (item similarity matrix):
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo       2       1         1       1
Karnaf        1       2         0       2
Ima Adama     1       0         1       0
Liv           1       2         0       2

$(A^T A) u_i$ for Gadi: Maraboo 2, Karnaf 4, Ima Adama 0, Liv 4

Similarity of item x to item y is $\langle i_x, i_y \rangle$.

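The same check for the item-based order of operations: build $A^T A$ once, then multiply by $u_i$. By associativity $(A^T A) u_i = A^T (A u_i)$, so the user-based and item-based orderings arrive at the same scores. `mat_mul`, `mat_vec` and `transpose` are illustrative helpers.

```python
# Binary preference matrix A from the slide; "?" is treated as 0.
A = [
    [1, 0, 1, 0],  # Idan
    [1, 1, 0, 1],  # Shahar
    [0, 1, 0, 1],  # Gadi
]


def transpose(m: list[list[int]]) -> list[list[int]]:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def mat_mul(x: list[list[int]], y: list[list[int]]) -> list[list[int]]:
    """Plain matrix product: each entry is a row of x dotted with a column of y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def mat_vec(m: list[list[int]], v: list[int]) -> list[int]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


AtA = mat_mul(transpose(A), A)   # entry (x, y) = <i_x, i_y>, co-occurrence count
scores = mat_vec(AtA, A[2])      # (A^T A) u_i for Gadi

if __name__ == "__main__":
    for row in AtA:
        print(row)
    print(scores)
```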
Page 30

MapReduce Similarity Calculation

Recall row outer-product matrix multiplication:

$A^T A = u_{\text{Idan}} u_{\text{Idan}}^T + u_{\text{Shahar}} u_{\text{Shahar}}^T + u_{\text{Gadi}} u_{\text{Gadi}}^T$

$u_{\text{Idan}} u_{\text{Idan}}^T$:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo       1       0         1       0
Karnaf        0       0         0       0
Ima Adama     1       0         1       0
Liv           0       0         0       0

+ $u_{\text{Shahar}} u_{\text{Shahar}}^T$:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo       1       1         0       1
Karnaf        1       1         0       1
Ima Adama     0       0         0       0
Liv           1       1         0       1

+ $u_{\text{Gadi}} u_{\text{Gadi}}^T$:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo       0       0         0       0
Karnaf        0       1         0       1
Ima Adama     0       0         0       0
Liv           0       1         0       1

= $A^T A$:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo       2       1         1       1
Karnaf        1       2         0       2
Ima Adama     1       0         1       0
Liv           1       2         0       2

Only one user’s list of items is used every time!

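A single-process simulation of the map and reduce phases behind the outer-product view, assuming each user's preference list fits in one mapper call. `map_user` and `reduce_counts` are hypothetical names; a real job would run on a framework such as Hadoop, where the shuffle groups equal keys before the reduce.

```python
from collections import Counter
from itertools import product

# Each user's list of preferred items (the rows of A, "?" dropped).
USER_ITEMS = {
    "Idan": ["Maraboo", "Ima Adama"],
    "Shahar": ["Maraboo", "Karnaf", "Liv"],
    "Gadi": ["Karnaf", "Liv"],
}


def map_user(items: list[str]):
    """Map phase: emit ((item_x, item_y), 1) for every nonzero cell of u u^T."""
    for x, y in product(items, repeat=2):
        yield (x, y), 1


def reduce_counts(pairs) -> Counter:
    """Reduce phase: sum the values emitted for each (item_x, item_y) key."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


# Entries of A^T A; missing keys read as 0.
co_occurrence = reduce_counts(
    pair for items in USER_ITEMS.values() for pair in map_user(items)
)

if __name__ == "__main__":
    print(co_occurrence[("Karnaf", "Liv")])
```

Each mapper call sees only one user's item list, matching the observation on the slide.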
Page 31

MapReduce Similarity Calculation

All of the classic similarity functions are made up of 3 stages:

Preprocess (uses only one ELEMENT)
Norm (can be done in a reduce over one VECTOR)
Similarity (uses the $A^T A$ matrix joined with the norm entries)

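A sketch of the three stages for two measures, cosine and Jaccard, on the toy ratings. The layout is an assumption: preprocessing binarizes each rating independently, the norm stage makes one pass per item vector, and the similarity stage joins the co-occurrence counts (off-diagonal entries of $A^T A$) with those norms. The names `prefs`, `counts`, `l2` and `dots` are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

# Raw ratings per user (hypothetical storage layout); "?" entries are absent.
RATINGS = {
    "Idan": {"Maraboo": 5, "Ima Adama": 3},
    "Shahar": {"Maraboo": 4, "Karnaf": 3, "Liv": 2},
    "Gadi": {"Karnaf": 1, "Liv": 5},
}

# Stage 1, preprocess: each element on its own (here: binarize a rating).
prefs = {user: {item: 1 for item in r} for user, r in RATINGS.items()}

# Stage 2, norm: one pass over each item vector.
counts: Counter = Counter()
for items in prefs.values():
    for item, value in items.items():
        counts[item] += value * value   # for 0/1 values, squared norm == count
l2 = {item: math.sqrt(c) for item, c in counts.items()}

# Stage 3, similarity: co-occurrences (entries of A^T A) joined with the norms.
dots: Counter = Counter()
for items in prefs.values():
    for x, y in combinations(sorted(items), 2):
        dots[(x, y)] += items[x] * items[y]


def cosine(x: str, y: str) -> float:
    """Dot product over the product of the two item norms."""
    return dots[tuple(sorted((x, y)))] / (l2[x] * l2[y])


def jaccard(x: str, y: str) -> float:
    """Co-occurrences over the size of the union of the two items' user sets."""
    both = dots[tuple(sorted((x, y)))]
    return both / (counts[x] + counts[y] - both)
```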
Page 33

Thanks!

Nimrod Priell
@nimrodpriell
http://www.educated-guess.com