A quick introduction to item-based collaborative filtering
Pham Cong Dinh (@pcdinh), PHPDay Saigon 2012
Outline
● PHP popularity and challenges to produce engaging content
● Recommendation engine at work
● How to build an item-based collaborative filtering recommendation engine
PHP is everywhere
● W3Techs report in 2012
PHP website distribution
● Reported by builtwith.com in 2012 (more than 28 million sites in PHP)
You have a website. Now what?
Information overload
From http://bethesignal.org/
OR no engaging content?
Why a recommendation system?
Recommendation engine at work
Build a recommendation system
● Collaborative filtering: users and items
– Filtering: automatic predictions about the interests of a user
– Collaborative: many users (preferences or taste information)
Item-based collaborative filtering
● Model-based
– The similarities between different items in the data set are calculated
– Predict ratings for user-item pairs not present in the data set
Steps to do item-based collaborative filtering
● Data collection and representations (preferences/taste …)
● Finding the relationships and determining the similarity
● Recommendation computations - recommendations/suggestions/discoveries (produce engaging content)
Collaborative filtering: data collection
● Data collection and representations (preferences/taste …)
– Clicks
– Likes, favorites
– Watch, read
– Survey
– Ratings
– Others …
● E.g.: Find the set of movies that user X likes
(user, item) pairs:
(X, 1)
(X, 2)
(Y, 1)
(Y, 2)
(Z, 2)
(Z, 3)
Collaborative filtering: Similarity (1)
● Finding the relationships and determining the similarity
– The similarity value between two items is measured by observing all the users who have interacted with (rated) both items
● E.g.: Find a group of movies similar to the set of movies that we know user X likes
Collaborative filtering: Similarity (2)
● Manhattan distance: |x1 – x2| + |y1 – y2|
User(x, y): Amy(5, 5), Bill(2, 5), Jim(1, 4)
Item(x1, x2, x3) → Ratings: Snow Crash (5, 2, 1), Girl with the Dragon Tattoo (5, 5, 1)
Manhattan distance:
→ Amy – Bill: |5 – 2| + |5 – 5| = 3
→ Snow Crash – Girl with the Dragon Tattoo: 3
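The Manhattan distance above is just the sum of absolute coordinate differences. A Python sketch using the slide's numbers:

```python
def manhattan(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# User ratings from the slide: Amy(5, 5), Bill(2, 5)
print(manhattan((5, 5), (2, 5)))  # |5-2| + |5-5| = 3

# Item rating vectors: Snow Crash (5, 2, 1) vs Girl with the Dragon Tattoo (5, 5, 1)
print(manhattan((5, 2, 1), (5, 5, 1)))  # 3
```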
Collaborative filtering: Similarity (3)
● Cosine similarity: the cosine of the angle between the two rating vectors. Values range from -1 (opposite) to 1 (same direction)
Item(x1, x2, x3) → Ratings: Snow Crash (5, 2, 1), Girl with the Dragon Tattoo (5, 5, 1)
Cosine similarity
→ Snow Crash – Girl with the Dragon Tattoo: (5×5 + 2×5 + 1×1) / (√(5² + 2² + 1²) × √(5² + 5² + 1²)) ≈ 0.92
PHP: https://github.com/aoiaoi/CosineSimilarity/blob/master/CosineSimilarity.php
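The linked example is PHP; the same computation in a few lines of Python (dot product divided by the product of the vector magnitudes):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors; 1 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Snow Crash (5, 2, 1) vs Girl with the Dragon Tattoo (5, 5, 1)
sim = cosine_similarity((5, 2, 1), (5, 5, 1))
print(round(sim, 3))  # ≈ 0.92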
Collaborative filtering: Similarity (4)
● Pearson correlation coefficient: from -1 (perfect negative correlation) to +1 (perfect positive correlation)
● How much the ratings by common users for a pair of items deviate from the average ratings for those items
● Correlation is essentially the average product of these deviations
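As described above, Pearson correlation divides the summed product of deviations from the mean by the product of the standard deviations. A Python sketch on the slide's rating vectors:

```python
import math

def pearson(a, b):
    """Pearson correlation: covariance of the two rating vectors
    divided by the product of their standard deviations."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (std_a * std_b)

# Snow Crash (5, 2, 1) vs Girl with the Dragon Tattoo (5, 5, 1)
sim = pearson((5, 2, 1), (5, 5, 1))
print(round(sim, 3))  # ≈ 0.693
```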
Collaborative filtering: Similarity (5)
● Euclidean distance: the "ordinary" straight-line distance between two points
● Converted to a similarity score (e.g. 1 / (1 + distance)), values range from near 0 (unrelated) to 1 (identical)
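A Python sketch, with the distance mapped into the (0, 1] similarity range via 1 / (1 + distance) (one common choice of conversion; the slide does not name a specific one):

```python
import math

def euclidean(a, b):
    """Ordinary straight-line distance between two rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def euclidean_similarity(a, b):
    """Map distance to (0, 1]: 1 means identical, near 0 means far apart."""
    return 1.0 / (1.0 + euclidean(a, b))

# Snow Crash (5, 2, 1) vs Girl with the Dragon Tattoo (5, 5, 1)
print(euclidean((5, 2, 1), (5, 5, 1)))             # sqrt(0 + 9 + 0) = 3.0
print(euclidean_similarity((5, 2, 1), (5, 5, 1)))  # 1 / (1 + 3) = 0.25
```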
Collaborative filtering: Similarity (6)
● Spearman distance: the squared Euclidean distance between two rank vectors
● Spearman rank correlation: ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation)
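Spearman rank correlation is the Pearson correlation applied to rank vectors. For tie-free data it reduces to the shortcut formula 1 − 6·Σd² / (n(n² − 1)), sketched below on hypothetical tie-free rating vectors:

```python
def spearman(a, b):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1)).
    This shortcut formula assumes no tied values in either vector."""
    n = len(a)
    rank = lambda v: [sorted(v).index(x) + 1 for x in v]  # 1 = smallest
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(rank(a), rank(b)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical tie-free rating vectors for two items
print(spearman([5, 2, 1], [4, 5, 1]))    # ranks (3,2,1) vs (2,3,1) -> 0.5
print(spearman([1, 2, 3], [10, 20, 30]))  # identical rank order -> 1.0
```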
Collaborative filtering: Similarity (7)
● Adjusted Euclidean distance: takes the length of the vectors into account (e.g. by normalizing them to unit length first)
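One way to adjust for vector length, sketched below as an assumption since the slide does not spell out the formula: normalize each vector to unit length, then take the ordinary Euclidean distance, so only direction (relative rating pattern) matters:

```python
import math

def normalize(v):
    """Scale a vector to unit length so only its direction matters."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def adjusted_euclidean(a, b):
    """Euclidean distance between the length-normalized vectors."""
    return math.sqrt(sum((x - y) ** 2
                         for x, y in zip(normalize(a), normalize(b))))

# Same rating pattern at different scales -> distance 0 after normalizing
print(adjusted_euclidean([1, 2, 3], [2, 4, 6]))  # 0.0
```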
Collaborative filtering: Recommendation computations
● Calculate the similarity between item A, which user X watched/bought/liked, and the items user X has not watched/bought/liked
● Score all the items (e.g. apply a weighted algorithm: average the user's ratings, weighted by item-item similarity)
● Sort
● Return the top-N items
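The steps above can be sketched as a similarity-weighted scoring pass followed by a sort. The item names and similarity values below are hypothetical, and the weighted-average scoring is one common choice, not the only one:

```python
def recommend(user_ratings, similarity, top_n=3):
    """Score each unrated item as the similarity-weighted average of the
    user's own ratings, then return the top-N items by score."""
    scores, weights = {}, {}
    for rated_item, rating in user_ratings.items():
        for other_item, sim in similarity.get(rated_item, {}).items():
            if other_item in user_ratings:
                continue  # skip items the user already rated
            scores[other_item] = scores.get(other_item, 0.0) + sim * rating
            weights[other_item] = weights.get(other_item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i)
                     for i in scores if weights[i] > 0), reverse=True)
    return [item for _, item in ranked[:top_n]]

# Hypothetical precomputed item-item similarities
similarity = {
    "Snow Crash": {"Dragon Tattoo": 0.92, "Neuromancer": 0.80},
    "Dragon Tattoo": {"Snow Crash": 0.92, "Neuromancer": 0.30},
}
print(recommend({"Snow Crash": 5, "Dragon Tattoo": 4}, similarity))
```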
Collaborative filtering: Other issues
● Accuracy of predicted ratings: to evaluate accuracy when predicting unrated items for the active user, use Mean Absolute Error (MAE).
● Accuracy of recommendations: to evaluate the accuracy of recommendations, use Mean Average Precision (MAP), defined as the mean of the Average Precision (AP) values over a set of queries (in a recommender system, a query can be thought of as a user asking for recommended items).
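MAE is the simpler of the two metrics: the average absolute gap between predicted and actual ratings. A minimal sketch with made-up predicted/actual values:

```python
def mean_absolute_error(predicted, actual):
    """MAE: average absolute difference between predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical predicted ratings vs the user's actual ratings
print(mean_absolute_error([4.5, 3.0, 2.0], [5, 3, 1]))  # (0.5 + 0 + 1) / 3 = 0.5
```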
The End
● Q & A