speaker pham cong dinh

A quick introduction to item-based collaborative filtering Pham Cong Dinh @pcdinh PHPDay Saigon 2012

Upload: aiti-education

Post on 01-Jul-2015


TRANSCRIPT

Page 1: Speaker pham cong dinh

A quick introduction to item-based collaborative filtering

Pham Cong Dinh @pcdinh

PHPDay Saigon 2012

Page 2: Speaker pham cong dinh

Outline

● PHP popularity and challenges to produce engaging content

● Recommendation engine at work

● How to build an item-based collaborative filtering recommendation engine

Page 3: Speaker pham cong dinh

PHP is everywhere

● W3Techs report in 2012

Page 4: Speaker pham cong dinh

PHP website distribution

● Reported by builtwith.com in 2012 (more than 28 million sites in PHP)

Page 5: Speaker pham cong dinh

You have a website. Now what?

Page 6: Speaker pham cong dinh

Information overload

From http://bethesignal.org/

OR no engaging content?

Page 7: Speaker pham cong dinh

Why recommendation system?

Page 8: Speaker pham cong dinh

Recommendation engine at work

Page 9: Speaker pham cong dinh

Recommendation engine at work

Page 10: Speaker pham cong dinh

Build a recommendation system

● Collaborative filtering: user and item

– Filtering: automatic predictions about the interests of a user

– Collaborative: based on preference or taste information collected from many users

Page 11: Speaker pham cong dinh

Item-based collaborative filtering

● Model-based

– The similarities between different items in the data set are calculated

– Predict ratings for user-item pairs not present in the data set

Page 12: Speaker pham cong dinh

Steps to do item-based collaborative filtering

● Data collection and representations (preferences/taste …)

● Find the relationships and determine the similarity

● Recommendation computations - recommendations/suggestions/discoveries (produce engaging content)

Page 13: Speaker pham cong dinh

Collaborative filtering: data collection

● Data collection and representations (preferences/taste …)

– Clicks

– Likes, favorites

– Watch, read

– Survey

– Ratings

– Others …

● E.g.: find the set of movies that user X likes

(user, item)

✗ X, 1

✗ X, 2

✗ Y, 1

✗ Y, 2

✗ Z, 2

✗ Z, 3
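The (user, item) pairs above can be held in two lookup tables, one per direction. This is a minimal sketch: the function name `index_preferences` and the choice of Python sets are assumptions for illustration, not something the talk prescribes.

```python
from collections import defaultdict

# (user, item) pairs copied from the slide
pairs = [("X", 1), ("X", 2), ("Y", 1), ("Y", 2), ("Z", 2), ("Z", 3)]


def index_preferences(pairs):
    """Build user -> items and item -> users lookup tables (hypothetical helper)."""
    items_by_user = defaultdict(set)
    users_by_item = defaultdict(set)
    for user, item in pairs:
        items_by_user[user].add(item)
        users_by_item[item].add(user)
    return dict(items_by_user), dict(users_by_item)


items_by_user, users_by_item = index_preferences(pairs)
print(sorted(items_by_user["X"]))  # the items that user X likes
print(sorted(users_by_item[2]))    # the users who like item 2
```

The item -> users table is the one that item-based filtering leans on, because similarity between two items is computed from the users they share.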


Page 14: Speaker pham cong dinh

Collaborative filtering: Similarity (1)

● Find the relationships and determine the similarity

– The similarity between two items is measured by looking at all the users who have interacted with (rated) both items

● E.g.: find a group of movies similar to the set of movies that we know user X likes

Page 15: Speaker pham cong dinh

Collaborative filtering: Similarity (2)

● Manhattan distance: |x1 - x2| + |y1 - y2|

User(x, y)

Amy(5, 5)

Bill(2, 5)

Jim(1, 4)

Item(x1, x2, x3) → Ratings

Snow Crash(5, 2, 1)

Girl with the Dragon Tattoo(5, 5, 1)

Manhattan distance

→ Amy - Bill: |5 - 2| + |5 - 5| = 3

→ Snow Crash - Girl with the Dragon Tattoo: |5 - 5| + |2 - 5| + |1 - 1| = 3
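A minimal sketch of the Manhattan distance, run on the rating vectors from this slide. The function name `manhattan_distance` is an assumption for illustration.

```python
def manhattan_distance(a, b):
    """Sum of the absolute differences between matching coordinates."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


# User vectors from the slide
amy, bill = (5, 5), (2, 5)
# Item vectors from the slide
snow_crash, dragon_tattoo = (5, 2, 1), (5, 5, 1)

print(manhattan_distance(amy, bill))                  # 3
print(manhattan_distance(snow_crash, dragon_tattoo))  # 3
```

A smaller distance means the two users (or items) are closer in taste; 0 means identical ratings.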

Page 16: Speaker pham cong dinh

Collaborative filtering: Similarity (3)

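● Cosine similarity: the cosine of the angle between the two rating vectors. Value: -1 (opposite) through 0 (not related) to 1 (same direction); with non-negative ratings the value stays between 0 and 1

Item(x1, x2, x3) → Ratings

Snow Crash(5, 2, 1)

Girl with the Dragon Tattoo(5, 5, 1)

Cosine similarity

→ Snow Crash - Girl with the Dragon Tattoo: (5x5 + 2x5 + 1x1) / (sqrt(5x5 + 2x2 + 1x1) x sqrt(5x5 + 5x5 + 1x1)) = 36 / (sqrt(30) x sqrt(51)) ≈ 0.92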

PHP: https://github.com/aoiaoi/CosineSimilarity/blob/master/CosineSimilarity.php
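As a cross-check of the arithmetic, here is a short sketch of cosine similarity: the dot product divided by the product of the two vector lengths (each length is a square root of a sum of squares). The function name and the choice to return 0.0 for an all-zero vector are assumptions for illustration; this is not the code from the linked PHP repository.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero vector as unrelated
    return dot / (norm_a * norm_b)


snow_crash, dragon_tattoo = (5, 2, 1), (5, 5, 1)
print(round(cosine_similarity(snow_crash, dragon_tattoo), 2))  # 0.92
```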

Page 17: Speaker pham cong dinh

Collaborative filtering: Similarity (4)

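● Pearson Correlation Coefficient: from -1 (perfect negative correlation) through 0 (not related) to +1 (perfect positive correlation)

● How much the ratings by common users for a pair of items deviate together from the average ratings for those items

● Correlation is basically the average product of the standardized deviations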
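A minimal sketch of the Pearson correlation between two item rating vectors, written directly from the definition (covariance of the deviations from each mean, divided by the product of their spreads). The function name and the 0.0 fallback for a constant vector are assumptions for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length rating vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    dev_a = [x - mean_a for x in a]  # deviations from the item's average rating
    dev_b = [y - mean_b for y in b]
    cov = sum(da * db for da, db in zip(dev_a, dev_b))
    var_a = sum(da * da for da in dev_a)
    var_b = sum(db * db for db in dev_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant vector carries no correlation signal
    return cov / math.sqrt(var_a * var_b)


print(pearson((1, 2, 3), (2, 4, 6)))  # ratings move together
print(pearson((1, 2, 3), (3, 2, 1)))  # ratings move in opposite directions
```

Unlike cosine similarity, this measure subtracts each item's average first, so a generally high-rated item and a generally low-rated item can still correlate strongly.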

Page 18: Speaker pham cong dinh

Collaborative filtering: Similarity (5)

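● Euclidean distance: the "ordinary" straight-line distance between two points.

● The distance itself runs from 0 (identical) upward. Converted to a similarity, e.g. 1 / (1 + distance), values range from near 0 (not related) to 1 (identical)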
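A short sketch of the Euclidean distance and one common way to turn it into a score between 0 and 1. The conversion 1 / (1 + distance) is an assumption chosen for illustration, since the slide does not say which mapping it uses; the function names are hypothetical too.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean_similarity(a, b):
    """Map a distance in [0, inf) to a similarity in (0, 1] (assumed mapping)."""
    return 1 / (1 + euclidean_distance(a, b))


snow_crash, dragon_tattoo = (5, 2, 1), (5, 5, 1)
print(euclidean_distance(snow_crash, dragon_tattoo))    # 3.0
print(euclidean_similarity(snow_crash, dragon_tattoo))  # 0.25
```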

Page 19: Speaker pham cong dinh

Collaborative filtering: Similarity (6)

● Spearman distance: the square of the Euclidean distance between two rank vectors. It is 0 when both rankings are identical and grows as they diverge.

● Spearman Rank Correlation: ranges from -1 (a perfect negative correlation) to +1 (a perfect positive correlation)
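Both Spearman measures work on ranks instead of raw ratings. The sketch below ranks each vector, then computes the squared Euclidean distance between the rank vectors and the Pearson correlation of the ranks. Giving tied values their average rank is an assumption made here; the function names are hypothetical.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_distance(a, b):
    """Squared Euclidean distance between the two rank vectors."""
    return sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))


def spearman_correlation(a, b):
    """Pearson correlation computed on the ranks."""
    ra, rb = ranks(a), ranks(b)
    mean = (len(ra) + 1) / 2  # ranks 1..n always average to (n + 1) / 2
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var_a = sum((x - mean) ** 2 for x in ra)
    var_b = sum((y - mean) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: all-tied ranks carry no ordering information
    return cov / math.sqrt(var_a * var_b)


print(spearman_distance([1, 2, 3], [3, 2, 1]))            # 8.0
print(spearman_correlation([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0
```

The second call shows why ranks are useful: the relationship is not linear, but the ordering agrees perfectly, so the rank correlation is 1.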

Page 20: Speaker pham cong dinh

Collaborative filtering: Similarity (7)

● Adjusted Euclidean distance: takes the length of the vectors into account

Page 21: Speaker pham cong dinh

Collaborative filtering: Recommendation computations

● Calculate the similarity between each item that user X has watched/bought/liked and the items that user X has not yet watched/bought/liked

● Score all the candidate items (e.g.: apply a weighted algorithm, such as an average of the user's scores weighted by similarity)

● Sort the candidates by score

● Return the top-N items
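The steps above can be sketched end to end as follows. Everything here is an assumption made for illustration: the sample `ratings` data is hypothetical, item similarity is a cosine over the users who rated both items, and the score is an average of the user's own ratings weighted by similarity.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "amy": {"A": 5, "B": 4, "C": 1},
    "bill": {"A": 4, "B": 5, "D": 2},
    "jim": {"A": 1, "C": 5, "D": 4},
    "x": {"A": 5, "D": 1},
}


def item_similarity(ratings, i, j):
    """Cosine similarity of items i and j over the users who rated both."""
    common = [u for u, r in ratings.items() if i in r and j in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in common))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in common))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def recommend(ratings, user, top_n=3):
    """Score unseen items by similarity-weighted average, sort, keep top N."""
    seen = ratings[user]
    all_items = {item for r in ratings.values() for item in r}
    scores = {}
    for candidate in all_items - seen.keys():
        weighted_sum = weight_total = 0.0
        for item, rating in seen.items():
            sim = item_similarity(ratings, candidate, item)
            weighted_sum += sim * rating
            weight_total += abs(sim)
        if weight_total > 0:
            scores[candidate] = weighted_sum / weight_total
    # Highest score first; ties are broken by item name for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


for item, score in recommend(ratings, "x", top_n=2):
    print(item, round(score, 2))
```

In a real system the item-to-item similarities would usually be precomputed offline, since they change far more slowly than individual users' activity.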

Page 22: Speaker pham cong dinh

Collaborative filtering: Other issues

● Accuracy of predicted ratings. To evaluate accuracy when predicting ratings of unrated items for the active user, use Mean Absolute Error (MAE).

● Accuracy of recommendations. To evaluate the accuracy of recommendations, use Mean Average Precision (MAP), defined as the mean of the Average Precision (AP) values over a set of queries (in a recommender system, a query is one user's request for recommended items).
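A minimal sketch of both metrics. The function names are hypothetical, and dividing AP by the total number of relevant items is one common convention assumed here (some evaluations divide by the number of hits in the list instead).

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def average_precision(recommended, relevant):
    """Average of precision@k taken at each position k that holds a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)  # assumed normalization


def mean_average_precision(queries):
    """Mean of AP over (recommended list, relevant set) pairs, one per user."""
    return sum(average_precision(rec, rel) for rec, rel in queries) / len(queries)


print(mean_absolute_error([4, 3, 5], [5, 3, 3]))  # 1.0
queries = [(["a", "b", "c"], {"a", "c"}), (["x", "y"], {"y"})]
print(round(mean_average_precision(queries), 3))
```

MAE rewards rating predictions that are numerically close, while MAP rewards placing relevant items near the top of the list, so the two can disagree about which recommender is better.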

Page 23: Speaker pham cong dinh

The End

● Q & A