Introduction to Collaborative Filtering
An introduction to user-based and item-based collaborative filtering, and a telling of the story of the Netflix Prize.

WE KNOW YOU WILL LIKE THIS
Introduction to Recommendation Engines
Monday, January 14, 13

ML
Supervised
  Regression: Y = Turnout: 30, 12, 25 (numeric)
  Classification: Y = Class: Spam, Not Spam, Spam (categorical)
Unsupervised
  Clustering
  Hierarchical Clustering

Recommendation
Content/Model-Based
(predicting the rating)
           Maraboo  Karnaf  Ima Adama  Liv
Idan          5       ?         3       ?
Shahar        4       3         ?       2
Gadi          ?       1         ?       5
(Agnostic, Behavioural)

Rating Problem (Movies)
Preference Problem (Ads)

Related problem: Ranking

           Maraboo  Karnaf  Ima Adama  Liv
Idan          1       ?         1       ?
Shahar        1       1         ?       1
Gadi          ?       1         ?       1

           Maraboo  Karnaf  Ima Adama  Liv
Idan          5       ?         3       ?
Shahar        4       3         ?       2
Gadi          ?       1         ?       5

           Maraboo  Karnaf  Ima Adama  Liv
Idan          1       ?         1       ?
Shahar        1       1         ?       1
Gadi          ?       1         ?       1

           Maraboo  Karnaf  Ima Adama  Liv
Idan          5       ?         3       ?
Shahar        4       3         ?       2
Gadi          ?       1         ?       5
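
The rating table and the binary preference table above hold the same observations at two levels of detail. A minimal Python sketch of going from one to the other, assuming a nested-dict layout and a helper name (`to_preferences`) that are illustrative and not from the slides:

```python
# Ratings from the example table; None marks an unobserved rating ("?").
ratings = {
    "Idan":   {"Maraboo": 5, "Karnaf": None, "Ima Adama": 3, "Liv": None},
    "Shahar": {"Maraboo": 4, "Karnaf": 3, "Ima Adama": None, "Liv": 2},
    "Gadi":   {"Maraboo": None, "Karnaf": 1, "Ima Adama": None, "Liv": 5},
}


def to_preferences(ratings):
    """Replace every observed rating with 1; unobserved entries stay None."""
    return {
        user: {item: (1 if r is not None else None) for item, r in row.items()}
        for user, row in ratings.items()
    }


preferences = to_preferences(ratings)
print(preferences["Gadi"])
```

The binary form throws away how much a user liked an item and keeps only that the user interacted with it.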

User-based Collaborative Filtering

Pearson’s Correlation
1-Distance
Cosine Similarity
Jaccard Distance: “We share 5 preferences out of 7!”
“Our preferences go in the same direction!”
(but only 2 such preferences do...)
Euclidean Distance
Log-Likelihood Ratio: measure of “surprise” at correlation
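
Four of the measures listed above can be sketched in a few lines of plain Python. The function names and input layouts (sets for Jaccard, equal-length lists of co-rated values for the others) are assumptions made for illustration; the log-likelihood ratio is left out for brevity.

```python
import math


def jaccard_similarity(a, b):
    """Items both users have, divided by items either user has (inputs are sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def pearson_correlation(x, y):
    """Pearson's r over two equal-length lists of co-rated values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sd_x = math.sqrt(sum((p - mean_x) ** 2 for p in x))
    sd_y = math.sqrt(sum((q - mean_y) ** 2 for q in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length rating vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


# Two users who share 5 preferences out of 7 in total.
shared_5_of_7 = jaccard_similarity({1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 5, 7})

# Shahar and Gadi co-rated only Karnaf (3 vs 1) and Liv (2 vs 5).
shahar_vs_gadi = pearson_correlation([3, 2], [1, 5])
print(shared_5_of_7, shahar_vs_gadi)
```

With only two co-rated items, Pearson's correlation can only come out as +1, -1 or undefined, which is one reading of the caveat about how few preferences back a correlation.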

Item-Based Collaborative Filtering
Usually bounded

Case study: Amazon
100,000,000 users
2,000,000 items
Each user expresses preference for 10 items
Each item has 500 reviews
User-Based CF:
100,000,000 x 100,000,000 similarity matrix
2,000,000 x 500 sum terms
Item-Based CF:
2,000,000 x 2,000,000 similarity matrix
2,000,000 x 10 sum terms
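
Multiplying out the figures above makes the gap concrete. This is only arithmetic on the slide's numbers; the variable names are illustrative.

```python
users = 100_000_000
items = 2_000_000
prefs_per_user = 10
reviews_per_item = 500

# Both ways of counting give the same total number of preferences.
total_from_users = users * prefs_per_user
total_from_items = items * reviews_per_item

# Similarity matrix sizes (number of entries).
user_similarity_entries = users * users
item_similarity_entries = items * items

# Sum terms as stated on the slide.
user_based_sum_terms = items * reviews_per_item
item_based_sum_terms = items * prefs_per_user

print(user_similarity_entries // item_similarity_entries)  # ratio of matrix sizes
print(user_based_sum_terms // item_based_sum_terms)        # ratio of sum terms
```

Under these figures the item-item matrix has 2,500 times fewer entries than the user-user one, and the item-based sum has 50 times fewer terms.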

Interpretability
“People who go to La Colombe Torrefaction & FourSquare HQ tend to go here”
“Coffee Shop connoisseurs tend to come here”

Evaluation
Rating Problem: Predictive accuracy (regression) metrics
RMSE, MAE, etc.
Preference (Binary) Problem: Classification accuracy (IR) metrics
Accuracy, Precision, Recall, F-1, ROC, etc.
Benchmark vs. ‘random’ and ‘popular’
Ranking accuracy metrics: Similarity of permutations
Pearson’s correlation, Spearman’s rho, Kendall’s tau
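
One metric from each family above, as a rough pure-Python sketch. The function names and input shapes are assumptions for illustration, and the Kendall's tau version ignores ties.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def precision_recall_f1(recommended, relevant):
    """Precision, recall and F-1 for a set of recommended items vs. relevant items."""
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def kendall_tau(rank_a, rank_b):
    """Kendall's tau for two tie-free rankings, given as position lists (length >= 2)."""
    n = len(rank_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(rmse([4, 3, 2], [5, 3, 2]), mae([4, 3, 2], [5, 3, 2]))
print(precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "e"}))
print(kendall_tau([1, 2, 3], [3, 2, 1]))
```

For the ‘random’ and ‘popular’ benchmarks, the same functions can be run on the output of a recommender that picks items at random or always returns the most popular ones.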

Challenges
Cold-start problems (new item, new user)
“Black” and “Grey” sheep
Exploration-exploitation and reinforcement learning
Scale

Advanced Topics
Dimensionality Reduction
Map-Reducible calculations
Content-based (feature-based)
Multiple models

MapReduce Similarity Calculation
A:
           Maraboo  Karnaf  Ima Adama  Liv
Idan          1       ?         1       ?
Shahar        1       1         ?       1
Gadi          ?       1         ?       1

u_i:
           Gadi
Maraboo      ?
Karnaf       1
Ima Adama    ?
Liv          1

A u_i (user similarity vector):
           Gadi
Idan         0
Shahar       2
Gadi         2

A^T:
           Idan  Shahar  Gadi
Maraboo      1     1      ?
Karnaf       ?     1      1
Ima Adama    1     ?      ?
Liv          ?     1      1

“User-based”: A^T (A u_i)
           Gadi
Maraboo      2
Karnaf       4
Ima Adama    0
Liv          4

MapReduce Similarity Calculation
A:
           Maraboo  Karnaf  Ima Adama  Liv
Idan          1       ?         1       ?
Shahar        1       1         ?       1
Gadi          ?       1         ?       1

u_i:
           Gadi
Maraboo      ?
Karnaf       1
Ima Adama    ?
Liv          1

A^T:
           Idan  Shahar  Gadi
Maraboo      1     1      ?
Karnaf       ?     1      1
Ima Adama    1     ?      ?
Liv          ?     1      1

A^T A (item similarity matrix):
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo       2       1         1       1
Karnaf        1       2         0       2
Ima Adama     1       0         1       0
Liv           1       2         0       2

“Item-Based”: (A^T A) u_i
           Gadi
Maraboo      2
Karnaf       4
Ima Adama    0
Liv          4

Similarity of item x to item y is <i_x, i_y>
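
The user-based and item-based orderings give the same scores because matrix multiplication is associative: A^T (A u_i) = (A^T A) u_i. A small Python check on the example matrix, treating each “?” as 0 (an assumption made for the arithmetic) and using hand-rolled helper functions that are illustrative only:

```python
# Binary preference matrix A.
# Rows: Idan, Shahar, Gadi. Columns: Maraboo, Karnaf, Ima Adama, Liv.
# Unobserved entries ("?") are treated as 0 here, which is an assumption.
A = [
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
]
u_gadi = A[2]  # Gadi's preferences as a vector over items


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(m, n):
    """Matrix-matrix product."""
    n_t = transpose(n)
    return [[sum(a * b for a, b in zip(row, col)) for col in n_t] for row in m]


A_t = transpose(A)

# "User-based": similarity of Gadi to every user, then back to items.
user_similarity = matvec(A, u_gadi)
user_based_scores = matvec(A_t, user_similarity)

# "Item-based": item-item similarity matrix first, then applied to Gadi.
item_similarity = matmul(A_t, A)
item_based_scores = matvec(item_similarity, u_gadi)

print(user_similarity)
print(user_based_scores, item_based_scores)
```

The two orderings differ only in which intermediate result is materialised: a vector over users, or a matrix over item pairs that can be precomputed and reused for every user.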

MapReduce Similarity Calculation
Recall row outer-product matrix multiplication:
A^T A = u_Idan u_Idan^T + u_Shahar u_Shahar^T + u_Gadi u_Gadi^T

u_Idan u_Idan^T:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo       1       0         1       0
Karnaf        0       0         0       0
Ima Adama     1       0         1       0
Liv           0       0         0       0

u_Shahar u_Shahar^T:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo       1       1         0       1
Karnaf        1       1         0       1
Ima Adama     0       0         0       0
Liv           1       1         0       1

u_Gadi u_Gadi^T:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo       0       0         0       0
Karnaf        0       1         0       1
Ima Adama     0       0         0       0
Liv           0       1         0       1

A^T A:
           Maraboo  Karnaf  Ima Adama  Liv
Maraboo       2       1         1       1
Karnaf        1       2         0       2
Ima Adama     1       0         1       0
Liv           1       2         0       2

Only one user’s list of items is used every time!
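
Because each outer product depends on a single user's item list, the sum maps naturally onto a map step per user and a reduce step per item pair. A single-process Python sketch of that idea follows; `map_user` and `reduce_counts` are hypothetical names, and no MapReduce framework is involved.

```python
from collections import defaultdict
from itertools import product

# Each user's list of preferred items from the binary example table.
user_items = {
    "Idan": ["Maraboo", "Ima Adama"],
    "Shahar": ["Maraboo", "Karnaf", "Liv"],
    "Gadi": ["Karnaf", "Liv"],
}


def map_user(items):
    """Map: emit ((item_x, item_y), 1) for every ordered pair in one user's list.

    This is the non-zero part of that user's outer product u u^T.
    """
    for x, y in product(items, repeat=2):
        yield (x, y), 1


def reduce_counts(pairs):
    """Reduce: sum the emitted values per item pair, giving entries of A^T A."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


emitted = (pair for items in user_items.values() for pair in map_user(items))
cooccurrence = reduce_counts(emitted)

print(cooccurrence[("Karnaf", "Liv")])
print(cooccurrence.get(("Ima Adama", "Karnaf"), 0))
```

Item pairs that never occur together are simply never emitted, so the result stays sparse.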

MapReduce Similarity Calculation
All of the classic similarity functions are made up of 3 stages:
Preprocess (uses only one ELEMENT)
Norm (can be done in reduce on one VECTOR)
Similarity utilizes the A^T A matrix joined with norm entries
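
As one instance of the three stages, cosine similarity between items can be split up as below. This is a sketch in plain Python on the binary example data; the function names and the dict-of-dicts layout are assumptions, and for cosine the preprocess step happens to be the identity.

```python
import math

# Item columns of the binary example matrix as {item: {user: value}};
# "?" entries are simply omitted.
item_vectors = {
    "Maraboo": {"Idan": 1, "Shahar": 1},
    "Karnaf": {"Shahar": 1, "Gadi": 1},
    "Ima Adama": {"Idan": 1},
    "Liv": {"Shahar": 1, "Gadi": 1},
}


def preprocess(value):
    """Stage 1: looks at one element at a time (identity for cosine)."""
    return value


def norm(vector):
    """Stage 2: needs one whole item vector (Euclidean norm for cosine)."""
    return math.sqrt(sum(v * v for v in vector.values()))


def dot(x, y):
    """One entry of A^T A: dot product over users who have both items."""
    return sum(v * y[user] for user, v in x.items() if user in y)


def similarity(dot_xy, norm_x, norm_y):
    """Stage 3: combines an A^T A entry with the two items' norms."""
    return dot_xy / (norm_x * norm_y)


prepared = {
    item: {user: preprocess(v) for user, v in vec.items()}
    for item, vec in item_vectors.items()
}
norms = {item: norm(vec) for item, vec in prepared.items()}
cosine = {
    (x, y): similarity(dot(prepared[x], prepared[y]), norms[x], norms[y])
    for x in prepared
    for y in prepared
}

print(cosine[("Maraboo", "Karnaf")])
```

Other measures would swap in different `preprocess`, `norm` and `similarity` functions while the A^T A step in the middle stays the same.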


Thanks!
Nimrod Priell
@nimrodpriell
http://www.educated-guess.com