collaborative filtering recommendation algorithm based on hadoop
2015/2/27
Scaling-up Item-based Collaborative Filtering Recommendation Algorithm based on Hadoop
Jing Jiang, Jie Lu, Guangquan Zhang, Guodong Long 2011 IEEE World Congress Services
outline
✤ Collaborative Filtering
✤ scaling-up item-based CF
✤ experimentation and evaluation
Collaborative Filtering
✤ Collaborative filtering (CF) techniques have achieved widespread success in E-commerce nowadays.
Collaborative Filtering
✤ Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). (Source: Wikipedia)
Collaborative Filtering
1. Weight all users with respect to similarity with active user
2. Select a subset of users to use as a set of predictors
3. Compute a prediction from a weighted combination of selected neighbors’ ratings
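Simple example for step 1, using each user's ratings of three items:
Nathan [5,1,5]
Joe [5,2,5]
John [2,5,2.5]
Al [2,2,4]
Use cosine similarity to weight each user against the active user, Nathan:
cos(Nathan, Joe) = 0.99
cos(Nathan, John) = 0.64
cos(Nathan, Al) = 0.91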
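The cosine weights above can be reproduced in a few lines of Python. This is a minimal sketch: the function name `cosine` and the `ratings` dictionary are illustrative names, and the vectors are the four users' ratings from the slide. The computed values are about 0.9909, 0.6486 and 0.9147, so the slide's figures appear to be truncated to two decimals rather than rounded.

```python
import math

# Rating vectors from the slide (one entry per item).
ratings = {
    "Nathan": [5, 1, 5],
    "Joe": [5, 2, 5],
    "John": [2, 5, 2.5],
    "Al": [2, 2, 4],
}

def cosine(a, b):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Step 1: weight every other user by similarity to the active user (Nathan).
weights = {
    name: cosine(ratings["Nathan"], vec)
    for name, vec in ratings.items()
    if name != "Nathan"
}

for name, w in weights.items():
    print(f"cos(Nathan, {name}) = {w:.4f}")
```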
Simple example for step 3: weight each neighbor's rating of the target item (4, 3 and 2) by that neighbor's similarity to Nathan:
(0.99*4 + 0.64*3 + 0.91*2) / (0.99 + 0.64 + 0.91) = 3.03
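The weighted average in this example can be checked with a short sketch. The helper name `weighted_prediction` is hypothetical; the similarity and rating values are the ones used on the slide.

```python
def weighted_prediction(neighbors):
    """Similarity-weighted average of the neighbors' ratings.

    `neighbors` is a list of (similarity, rating) pairs.
    """
    total_weight = sum(sim for sim, _ in neighbors)
    if total_weight == 0:
        raise ValueError("no neighbor with non-zero similarity")
    return sum(sim * rating for sim, rating in neighbors) / total_weight

# (similarity to Nathan, rating of the target item) as used on the slide.
neighbors = [(0.99, 4), (0.64, 3), (0.91, 2)]
prediction = weighted_prediction(neighbors)
print(round(prediction, 2))  # 3.03
```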
Collaborative Filtering
✤ User-Based CF: compute similarity based on users
✤ Item-Based CF: compute similarity based on items
Collaborative Filtering
✤ User-Based CF: compute similarity based on users
To predict user A's rating of item4, given that user B rated item4 as 5 and user F rated item4 as 1:
rating(user A, item4) = (5 * similarity(user A, user B) + 1 * similarity(user A, user F)) / (similarity(user A, user B) + similarity(user A, user F))
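A user-based prediction of this kind might be sketched as follows, including the neighbor-selection step (keep only the `k` most similar users). The function name `predict_user_based`, the parameter `k`, and the similarity values 0.8 and 0.2 are hypothetical and chosen only for illustration; the slide does not give the similarities. The sketch also assumes all similarities are positive.

```python
def predict_user_based(active, item, ratings, sims, k=2):
    """Predict `active`'s rating of `item` from the k most similar users
    who have rated that item."""
    # Candidate neighbors: users other than `active` who rated the item.
    candidates = [
        (sims[(active, user)], user_ratings[item])
        for user, user_ratings in ratings.items()
        if user != active and item in user_ratings
    ]
    # Keep the k neighbors with the highest similarity.
    top = sorted(candidates, reverse=True)[:k]
    weight_sum = sum(sim for sim, _ in top)
    if weight_sum == 0:
        raise ValueError("no usable neighbors")
    return sum(sim * r for sim, r in top) / weight_sum

# Ratings from the slide: B gave item4 a 5, F gave it a 1.
ratings = {"A": {}, "B": {"item4": 5}, "F": {"item4": 1}}
# Hypothetical similarities, chosen only for illustration.
sims = {("A", "B"): 0.8, ("A", "F"): 0.2}

prediction = predict_user_based("A", "item4", ratings, sims)
print(prediction)
```

With these assumed similarities the prediction is (5 * 0.8 + 1 * 0.2) / (0.8 + 0.2) = 4.2, which sits closer to user B's rating because B is the more similar neighbor.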
Collaborative Filtering
✤ Item-Based CF: compute similarity based on items
To predict user A's rating of item4, given that user A rated item2 as 1 and item3 as 1:
rating(user A, item4) = (1 * similarity(item2, item4) + 1 * similarity(item3, item4)) / (similarity(item2, item4) + similarity(item3, item4))
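Worked through symbolically, this example has a notable property. Writing $s_{24}$ and $s_{34}$ for the two item similarities (symbols introduced here only for brevity) and assuming their sum is non-zero:

$$
\hat{r}_{A,4} = \frac{1 \cdot s_{24} + 1 \cdot s_{34}}{s_{24} + s_{34}} = 1
$$

Because user A gave both neighboring items the same rating, the prediction equals that rating whatever the similarity values are. The similarities only matter when the neighboring ratings differ.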
scaling-up item-based CF
Divide the CF algorithm into two steps:
1. Similarity computation
2. Prediction and recommendation
Similarity measure: Pearson correlation, which ranges from -1 to 1
The numerator of the Pearson correlation is the covariance term between the ratings of the two items
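For reference, the item-to-item Pearson correlation in its standard textbook form is given below. The notation is an assumption chosen to match the $R_i$ and $R_j$ averages used in the slides: $U_{ij}$ is the set of users who rated both items, $R_{u,i}$ is user $u$'s rating of item $i$, and $\bar{R}_i$ is the average rating of item $i$.

$$
\mathrm{sim}(i,j) = \frac{\sum_{u \in U_{ij}} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u \in U_{ij}} (R_{u,i} - \bar{R}_i)^2}\,\sqrt{\sum_{u \in U_{ij}} (R_{u,j} - \bar{R}_j)^2}}
$$

The numerator is the (unnormalized) covariance of the two items' ratings, and the denominator scales it by the two standard deviations, which keeps the result between -1 and 1.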
scaling-up item-based CF
Similarity computation
        apple   milk   toast
sam       2      0       4
john      5      5       3
tim       2      4       ?
Rows are users u; columns are items, with i = apple and j = toast.
Ri = (2+5+2)/3 = 3
Rj = (4+3)/2 = 3.5
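As a hedged illustration, the sketch below computes the two item averages from the table and then a Pearson correlation between apple and toast. Several choices are assumptions rather than something the slides state: item averages are taken over every user who rated the item (as in Ri and Rj above), the sums run only over users who rated both items, and sam's 0 for milk is treated as a real rating. The names `item_mean` and `pearson` are illustrative.

```python
import math

# Ratings table from the slide; None marks the unknown rating (tim, toast).
ratings = {
    "sam": {"apple": 2, "milk": 0, "toast": 4},
    "john": {"apple": 5, "milk": 5, "toast": 3},
    "tim": {"apple": 2, "milk": 4, "toast": None},
}

def item_mean(item):
    """Average of all known ratings for one item."""
    known = [r[item] for r in ratings.values() if r[item] is not None]
    return sum(known) / len(known)

def pearson(i, j):
    """Pearson correlation of items i and j over users who rated both.

    Assumption: deviations are taken from the per-item means above.
    """
    mean_i, mean_j = item_mean(i), item_mean(j)
    pairs = [
        (r[i] - mean_i, r[j] - mean_j)
        for r in ratings.values()
        if r[i] is not None and r[j] is not None
    ]
    numerator = sum(di * dj for di, dj in pairs)
    denominator = math.sqrt(sum(di * di for di, _ in pairs)) * math.sqrt(
        sum(dj * dj for _, dj in pairs)
    )
    return numerator / denominator if denominator else 0.0

print(item_mean("apple"), item_mean("toast"))  # 3.0 3.5
sim_apple_toast = pearson("apple", "toast")
print(sim_apple_toast)
```

Under these assumptions only sam and john contribute to the sums, and the correlation comes out strongly negative (about -0.95): sam rates toast above its average and apple below, while john does the opposite.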
scaling-up item-based CF
Similarity computation (continued)
Using the same ratings table, now with j = apple and i = toast, and the average rating of user u = sam:
Ru(sam) = (2+0+4)/3 = 2
Rj = (2+5+2)/3 = 3
Ri = (4+3)/2 = 3.5
scaling-up item-based CF
The three parts of intensive computation are:
(1) computing the average rating for each item
(2) computing the similarity between item pairs
(3) computing predicted items for the target user
Input: the rating of item i by user j
(1) Average rating: map by item i
(2) Similarity: computed over the set of users who rated both item k and item l
(3) Prediction: map by user j
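The keying scheme above (by item for averages, by item pair for similarity) can be sketched with a tiny in-memory stand-in for MapReduce. This is not Hadoop API code and not necessarily the job layout used in the paper: `run_job` and all mapper and reducer names are hypothetical, and the records reuse the small ratings table from the earlier slides.

```python
from collections import defaultdict
from itertools import combinations

# (user, item, rating) records; values reuse the small table from the slides.
records = [
    ("sam", "apple", 2), ("sam", "milk", 0), ("sam", "toast", 4),
    ("john", "apple", 5), ("john", "milk", 5), ("john", "toast", 3),
    ("tim", "apple", 2), ("tim", "milk", 4),
]

def run_job(inputs, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle, reduce."""
    groups = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Part 1: key by item, reduce to the item's average rating.
def map_by_item(record):
    user, item, rating = record
    yield item, rating

def reduce_average(item, item_ratings):
    return sum(item_ratings) / len(item_ratings)

# Part 2a: key by user, so one reducer sees a user's full rating list.
def map_by_user(record):
    user, item, rating = record
    yield user, (item, rating)

def reduce_collect(user, item_ratings):
    return sorted(item_ratings)

# Part 2b: emit every co-rated item pair, keyed by the pair (k, l).
def map_item_pairs(user_entry):
    user, item_ratings = user_entry
    for (k, rk), (l, rl) in combinations(item_ratings, 2):
        yield (k, l), (user, rk, rl)

def reduce_pair(pair, co_ratings):
    # co_ratings covers exactly the users who rated both items of the pair;
    # a similarity function would be applied to it here.
    return sorted(co_ratings)

item_means = run_job(records, map_by_item, reduce_average)
by_user = run_job(records, map_by_user, reduce_collect)
co_rated = run_job(by_user.items(), map_item_pairs, reduce_pair)

print(item_means)
print(co_rated[("apple", "toast")])
```

Grouping by user first is one common way to generate item pairs without comparing every item against every other item directly; the prediction step would follow the same pattern with the user as the key.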
experimentation and evaluation
Cluster of 3 nodes
Each node: Intel P4 CPU, 1 GB RAM, 80 GB disk
All the machines were connected with one 100 Mbps switch.