Introduction to Collaborative Filtering

Post on 07-Jul-2015

DESCRIPTION

An introduction to user-based and item-based collaborative filtering, and the story of the Netflix Prize.

TRANSCRIPT

WE KNOW YOU WILL LIKE THIS

Introduction to Recommendation Engines

Monday, January 14, 13

Machine Learning (ML)

Supervised

Regression: Y=Turnout, 301225 (numeric)

Classification: Y=Class, Spam / Not Spam / Spam (Categorical)

Unsupervised

Clustering

Hierarchical Clustering

Recommendation

Content/Model-Based

(predicting the rating)

         Maraboo  Karnaf  Ima Adama  Liv
Idan     5        ?       3          ?
Shahar   4        3       ?          2
Gadi     ?        1       ?          5

(Agnostic, Behavioural)

Rating Problem (Movies)

Preference Problem (Ads)

Related problem: Ranking

         Maraboo  Karnaf  Ima Adama  Liv
Idan     1        ?       1          ?
Shahar   1        1       ?          1
Gadi     ?        1       ?          1

         Maraboo  Karnaf  Ima Adama  Liv
Idan     5        ?       3          ?
Shahar   4        3       ?          2
Gadi     ?        1       ?          5
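The two tables above have "?" in the same cells, so the preference table can be read as the rating table with each observed rating replaced by 1. A minimal Python sketch of that reading follows. Using `None` for "?" and the helper name `to_preferences` are assumptions made for illustration, not something the slides specify.

```python
# Rating matrix from the slide; None stands in for "?" (no rating observed).
ITEMS = ["Maraboo", "Karnaf", "Ima Adama", "Liv"]
ratings = {
    "Idan":   [5, None, 3, None],
    "Shahar": [4, 3, None, 2],
    "Gadi":   [None, 1, None, 5],
}


def to_preferences(ratings):
    """Keep only whether a rating exists: 1 if observed, None otherwise."""
    return {
        user: [None if r is None else 1 for r in row]
        for user, row in ratings.items()
    }


preferences = to_preferences(ratings)
for user, row in preferences.items():
    print(user, dict(zip(ITEMS, row)))
```

The unknown cells stay unknown in both views: the rating problem tries to fill them with a number, the preference problem with a yes/no.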

User-based Collaborative Filtering

Pearson’s Correlation 1-Distance

Cosine Similarity

Jaccard Distance “We share 5 preferences out of 7!”

“Our preferences go in the same direction!”

(but only 2 such preferences do...)

Euclidean Distance

Log-Likelihood Ratio

Measure of “Surprise” at correlation
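As a rough illustration of four of the measures listed above, here is a small Python sketch using only the standard library. The function names and the toy inputs are hypothetical, and the log-likelihood ratio is left out. The Jaccard example mirrors "5 preferences out of 7". The Pearson example shows how two users who co-rated only two items can still come out almost perfectly correlated.

```python
import math


def jaccard_similarity(a, b):
    """Shared preferences divided by all preferences held by either user."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def pearson_correlation(x, y):
    """Pearson's correlation over co-rated items (equal-length vectors)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sd_x = math.sqrt(sum((p - mean_x) ** 2 for p in x))
    sd_y = math.sqrt(sum((q - mean_y) ** 2 for q in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length rating vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


# Two users with 6 preferences each, 5 of them shared, 7 distinct overall.
user_a = {"a", "b", "c", "d", "e", "f"}
user_b = {"a", "b", "c", "d", "e", "g"}
print("Jaccard:", jaccard_similarity(user_a, user_b))

# Only two co-rated items, both moving in the same direction.
print("Pearson:", pearson_correlation([5, 3], [4, 2]))
print("Cosine:", cosine_similarity([5, 3], [4, 2]))
print("Euclidean:", euclidean_distance([5, 3], [4, 2]))
```

A similarity can be obtained from a distance in several ways. Which one the deck intends by "1-Distance" is not stated, so the sketch leaves distances as distances.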

Item-Based Collaborative Filtering

Usually bounded

Case study: Amazon

100,000,000 users

2,000,000 items

Each user expresses preference for 10 items

Each item has 500 reviews

User-Based CF:

100,000,000 x 100,000,000 similarity matrix

2,000,000 x 500 sum terms

Item-Based CF:

2,000,000 x 2,000,000 similarity matrix

2,000,000 x 10 sum terms
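The figures in this case study can be checked with a few lines of arithmetic. The Python sketch below only restates the slide's numbers; the variable names are made up for illustration.

```python
# Figures as given on the slide.
users = 100_000_000
items = 2_000_000
prefs_per_user = 10
reviews_per_item = 500

# Both ways of counting describe the same set of preferences.
total_from_users = users * prefs_per_user
total_from_items = items * reviews_per_item

# Number of entries in each (dense) similarity matrix.
user_sim_entries = users * users
item_sim_entries = items * items

# Sum terms as listed on the slide for each approach.
user_based_terms = items * reviews_per_item
item_based_terms = items * prefs_per_user

print(f"preferences: {total_from_users:,} vs {total_from_items:,}")
print(f"user-user matrix entries: {user_sim_entries:,}")
print(f"item-item matrix entries: {item_sim_entries:,}")
print(f"matrix size ratio: {user_sim_entries // item_sim_entries:,}x")
print(f"sum-term ratio: {user_based_terms // item_based_terms:,}x")
```

With these numbers the user-user matrix has 2,500 times as many entries as the item-item matrix, and the user-based sum has 50 times as many terms.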

Interpretability

“People who go to La Colombe Torrefaction & FourSquare HQ tend to go here”

“Coffee Shop connoisseurs tend to come here”

Evaluation

Rating Problem: Predictive accuracy (regression) metrics

RMSE, MAE, etc.

Preference (Binary) Problem: Classification accuracy (IR) metrics

Accuracy, Precision, Recall, F-1, ROC, etc.

Benchmark vs. ‘random’ and ‘popular’

Ranking accuracy metrics: Similarity of permutations

Pearson’s correlation, Spearman’s rho, Kendall’s tau
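To make the first two metric families concrete, here is a hedged Python sketch of RMSE and MAE for the rating problem, and of precision, recall and F-1 for the preference problem. The function names and the toy data are illustrative assumptions. Ranking metrics are not covered here.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and held-out ratings."""
    errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(errors) / len(errors))


def mae(predicted, actual):
    """Mean absolute error between predicted and held-out ratings."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


def precision_recall_f1(recommended, relevant):
    """Treat recommendations as retrieved items and held-out likes as relevant."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Rating problem: hypothetical predictions against held-out ratings.
predicted = [4, 2, 5]
actual = [5, 3, 3]
print("RMSE:", rmse(predicted, actual))
print("MAE:", mae(predicted, actual))

# Preference problem: hypothetical top-3 list against two held-out likes.
print(precision_recall_f1(["Karnaf", "Liv", "Maraboo"], ["Karnaf", "Liv"]))
```

The same functions can score a 'random' or 'popular' recommender on the same held-out data, which gives the benchmark comparison the slide calls for.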

Challenges

Cold-start problems (new item, new user)

“Black” and “Grey” sheep

Exploration-exploitation and reinforcement learning

Scale

Advanced Topics

Dimensionality Reduction

Map-Reducible calculations

Content-based (feature-based)

Multiple models

MapReduce Similarity Calculation

“User-based”: A^T (A u_i)

A * u_i = A u_i (user similarity vector)

A:
         Maraboo  Karnaf  Ima Adama  Liv
Idan     1        ?       1          ?
Shahar   1        1       ?          1
Gadi     ?        1       ?          1

u_i:
           Gadi
Maraboo    ?
Karnaf     1
Ima Adama  ?
Liv        1

A u_i:
        Gadi
Idan    0
Shahar  2
Gadi    2

A^T * (A u_i) = A^T (A u_i)

A^T:
           Idan  Shahar  Gadi
Maraboo    1     1       ?
Karnaf     ?     1       1
Ima Adama  1     ?       ?
Liv        ?     1       1

A^T (A u_i):
           Gadi
Maraboo    2
Karnaf     4
Ima Adama  0
Liv        4
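The two multiplications above can be reproduced in a few lines of plain Python. This is a sketch that treats every "?" as 0, which is an assumption consistent with the numbers on the slide. The helper names `mat_vec` and `transpose` are made up for illustration.

```python
ITEMS = ["Maraboo", "Karnaf", "Ima Adama", "Liv"]
USERS = ["Idan", "Shahar", "Gadi"]

# Preference matrix A (users x items); "?" entries are treated as 0.
A = [
    [1, 0, 1, 0],  # Idan
    [1, 1, 0, 1],  # Shahar
    [0, 1, 0, 1],  # Gadi
]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


u_gadi = A[USERS.index("Gadi")]

# Step 1: similarity of Gadi to every user (count of shared items).
user_similarity = mat_vec(A, u_gadi)

# Step 2: weight each user's items by that similarity and add them up.
item_scores = mat_vec(transpose(A), user_similarity)

print(dict(zip(USERS, user_similarity)))
print(dict(zip(ITEMS, item_scores)))
```

The intermediate vector has one entry per user, which is why this order of evaluation is labeled "user-based".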

MapReduce Similarity Calculation

“Item-Based”: (A^T A) u_i

A^T * A = A^T A (item similarity matrix)

A^T:
           Idan  Shahar  Gadi
Maraboo    1     1       ?
Karnaf     ?     1       1
Ima Adama  1     ?       ?
Liv        ?     1       1

A:
         Maraboo  Karnaf  Ima Adama  Liv
Idan     1        ?       1          ?
Shahar   1        1       ?          1
Gadi     ?        1       ?          1

A^T A:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo    2        1       1          1
Karnaf     1        2       0          2
Ima Adama  1        0       1          0
Liv        1        2       0          2

Similarity of item x to item y is <i_x, i_y>

(A^T A) * u_i = (A^T A) u_i

u_i:
           Gadi
Maraboo    ?
Karnaf     1
Ima Adama  ?
Liv        1

(A^T A) u_i:
           Gadi
Maraboo    2
Karnaf     4
Ima Adama  0
Liv        4
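The item-based grouping gives the same scores as the user-based one because matrix multiplication is associative. The difference is that A^T A can be computed once and reused for every user. A small Python sketch, again treating "?" as 0 (an assumption) and using hypothetical helper names:

```python
ITEMS = ["Maraboo", "Karnaf", "Ima Adama", "Liv"]

# Preference matrix A (users x items); "?" entries are treated as 0.
A = [
    [1, 0, 1, 0],  # Idan
    [1, 1, 0, 1],  # Shahar
    [0, 1, 0, 1],  # Gadi
]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


def mat_mul(left, right):
    """Multiply two matrices given as lists of rows."""
    right_columns = transpose(right)
    return [
        [sum(a * b for a, b in zip(row, column)) for column in right_columns]
        for row in left
    ]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Item similarity matrix: entry (x, y) is the inner product of item columns.
item_similarity = mat_mul(transpose(A), A)

u_gadi = A[2]
item_scores = mat_vec(item_similarity, u_gadi)

for name, row in zip(ITEMS, item_similarity):
    print(name, row)
print(dict(zip(ITEMS, item_scores)))
```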

MapReduce Similarity Calculation

Recall row outer-product matrix multiplication:

A^T A = u_Idan u_Idan^T + u_Shahar u_Shahar^T + u_Gadi u_Gadi^T

u_Idan u_Idan^T:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo    1        0       1          0
Karnaf     0        0       0          0
Ima Adama  1        0       1          0
Liv        0        0       0          0

u_Shahar u_Shahar^T:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo    1        1       0          1
Karnaf     1        1       0          1
Ima Adama  0        0       0          0
Liv        1        1       0          1

u_Gadi u_Gadi^T:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo    0        0       0          0
Karnaf     0        1       0          1
Ima Adama  0        0       0          0
Liv        0        1       0          1

A^T A:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo    2        1       1          1
Karnaf     1        2       0          2
Ima Adama  1        0       1          0
Liv        1        2       0          2

Only one user’s list of items is used every time!
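Because each outer product depends on a single user's item list, the sum can be organized as a map step per user followed by a reduce step per item pair. The sketch below imitates that in plain Python, in memory and without any MapReduce framework. The function names `map_user` and `reduce_counts` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Each user's list of preferred items, taken from the preference table.
user_items = {
    "Idan": ["Maraboo", "Ima Adama"],
    "Shahar": ["Maraboo", "Karnaf", "Liv"],
    "Gadi": ["Karnaf", "Liv"],
}


def map_user(items):
    """Map step: emit ((item_x, item_y), 1) for every pair in one user's list."""
    for pair in product(items, repeat=2):
        yield pair, 1


def reduce_counts(emitted):
    """Reduce step: sum the values emitted for each item pair."""
    totals = defaultdict(int)
    for key, value in emitted:
        totals[key] += value
    return dict(totals)


emitted = [kv for items in user_items.values() for kv in map_user(items)]
cooccurrence = reduce_counts(emitted)

for pair in sorted(cooccurrence):
    print(pair, cooccurrence[pair])
```

Pairs that never co-occur are simply absent from the result, which corresponds to the zero entries of A^T A.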

MapReduce Similarity Calculation

All of the classic similarity functions are made up of 3 stages:

Preprocess (uses only one ELEMENT)

Norm (Can be done in reduce on one VECTOR)

Similarity utilizes the A^T A matrix joined with norm entries
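As one possible reading of the three stages above, here is a Python sketch for cosine similarity between items. The split into `preprocess`, `norm` and `similarity` functions, and the choice of an identity preprocess step, are assumptions made for illustration. Other similarity functions would plug different logic into the same three slots.

```python
import math

ITEMS = ["Maraboo", "Karnaf", "Ima Adama", "Liv"]

# Preference matrix A (users x items); "?" entries are treated as 0.
A = [
    [1, 0, 1, 0],  # Idan
    [1, 1, 0, 1],  # Shahar
    [0, 1, 0, 1],  # Gadi
]


def preprocess(value):
    """Stage 1: looks at one element only (identity here, as an assumption)."""
    return value


def norm(vector):
    """Stage 2: computed from one item vector only (L2 norm for cosine)."""
    return math.sqrt(sum(v * v for v in vector))


def similarity(dot, norm_x, norm_y):
    """Stage 3: combine an A^T A entry with the two norms."""
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


prepped = [[preprocess(v) for v in row] for row in A]
item_vectors = [list(column) for column in zip(*prepped)]
norms = [norm(vector) for vector in item_vectors]

n = len(item_vectors)
# Entries of A^T A: inner products between item vectors.
dots = [
    [sum(a * b for a, b in zip(item_vectors[x], item_vectors[y])) for y in range(n)]
    for x in range(n)
]
cosine = [
    [similarity(dots[x][y], norms[x], norms[y]) for y in range(n)]
    for x in range(n)
]

for name, row in zip(ITEMS, cosine):
    print(name, [round(value, 3) for value in row])
```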

Thanks!

Nimrod Priell

nimrod.priell@gmail.com

@nimrodpriell

http://www.educated-guess.com
