Top-N Recommendation Algorithm Based on Item-Graph
Allen, Zhenjiang LIN CSE, CUHK
Nov 13, 2007
2
Outline
1. Top-N Recommendation Problem
2. Top-N Recommendation Algorithm
3. Item-Graph Model and GCP-based Method
   - Item-Graph Model
   - Generalized Conditional Probability (GCP)-based Recommendation Algorithm
4. Preliminary Experimental Results
5. Conclusion and Future Work
3
1. Top-N Recommendation Problem
The Top-N Recommendation Problem
- Given the preference information of users, recommend to a particular (active) user a set of N items that he or she might be interested in, based on the items the user has already selected.
- E-commerce example: Amazon.com, customers vs. products.

User-Item matrix:

            Item 1   Item 2   Item 3   …   Item m
User 1      1        0        1        …   0
User 2      1        1        0        …   0
…
User n      0        1        0        …   1
New user    1        ?        1        ?   ?

The new user is the active user; the known entries of that row form the basket.
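The matrix above can be sketched in a few lines of Python. This is only an illustration: the item names, user names, and the helper `build_matrix` are hypothetical, and the toy baskets simply mirror the three example rows and the new user's row.

```python
# Hypothetical toy data mirroring the example rows of the user-item matrix.
items = ["item1", "item2", "item3", "item4"]
baskets = {
    "user1": {"item1", "item3"},
    "user2": {"item1", "item2"},
    "user3": {"item2", "item4"},
}

def build_matrix(baskets, items):
    """Return (users, D), where D[u][i] is 1 if user u selected item i, else 0."""
    users = sorted(baskets)
    D = [[1 if item in baskets[u] else 0 for item in items] for u in users]
    return users, D

users, D = build_matrix(baskets, items)

# The active (new) user has selected item1 and item3; every other item
# is a "?" entry, i.e. a candidate for recommendation.
active_basket = {"item1", "item3"}
candidates = [item for item in items if item not in active_basket]
```

Here `candidates` holds the items whose entries are unknown for the active user; a top-N algorithm has to rank exactly these.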
4
Example: Amazon.com
[Figure: Amazon.com example, labeled with the active user, the basket, and the recommendations.]
5
1. Top-N Recommendation Problem
Challenges in E-commerce Systems
- Huge amounts of data: millions of users and/or items.
- Results must be returned in real time.
- Preference information for new users is limited.
- Users' preferences are volatile.
6
2. Top-N Recommendation Algorithm
Two major approaches
- Content-based: recommend items based on the content (textual information) of items.
  - Fab system [Balabanovic97], Syskill & Webert system [Pazzani97].
- Collaborative Filtering (CF): recommend items by collecting taste information from other users.
  - Uses collaborative (correlation) information between users.
  - More popular than content-based recommendation, since in many domains (such as music and restaurants) it is hard to extract useful features from items.
  - Tapestry system [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester system [Goldberg01], Amazon [Linden03].
7
2. Top-N Recommendation Algorithm
CF algorithms classified by how they use the data
- Memory-based: make recommendations based on the entire collection of user preferences.
  - No pre-computation is needed, but these methods suffer from serious scalability problems.
  - E.g., correlation-based [Resnick94], cosine-based [Breese98].
- Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations.
  - The model is built off-line, which makes these methods more scalable.
  - E.g., cluster models [Ungar98], Bayesian network model [Breese98], association rule mining approach [Lin00].
8
2. Top-N Recommendation Algorithm
CF algorithms classified by the objects they compare
- User-centric: look for similar (like-minded) users first and then make recommendations.
  - Similarity between users is relatively dynamic.
  - Pre-computing user neighborhoods may lead to poor predictions.
- Item-centric: look for similar (or related) items first and then make recommendations.
  - Similarity between items is relatively static.
  - This enables pre-computing item-item similarities, which is more scalable.
9
2. Top-N Recommendation Algorithm
Notations
- Item set $I = \{I_1, I_2, \ldots, I_m\}$.
- User set $U = \{U_1, U_2, \ldots, U_n\}$.
- User-Item (binary) matrix $D = (D_{u,i})_{n \times m}$.
- Basket of the active user: $B \subseteq I$.
- Similarity score of $x$ and $y$: $sim(x, y)$.
Formal definition of the top-N recommendation problem
- Given a user-item matrix $D$ and a set of items $B$ that have been selected by the active user, identify an ordered set of items $X$ such that $|X| \le N$ and $X \cap B = \emptyset$.
10
2. Top-N Recommendation Algorithm
Two classical item-item similarity measures
- Cosine-based (symmetric):
  $sim(I_i, I_j) = \cos(D_{*,i}, D_{*,j})$   (1)
  where $D_{*,i}$ is the $i$-th column of $D$.
- Conditional Probability (CP)-based (asymmetric):
  $sim(I_i, I_j) = P(I_j \mid I_i) \approx Freq(\{I_i, I_j\}) / Freq(\{I_i\})$   (2)
  $Freq(X)$: the number of users who have purchased all items in the item set $X$.
The ranking score for item $x$
  $RS(x) = \sum_{b \in B} sim(b, x)$   (3)
  (the sum of the similarity scores between the items in the basket $B$ and $x$)
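Equations (1) to (3) can be sketched as follows. This is a minimal illustration rather than the implementation used in the talk: the toy matrix and the function names (`sim_cosine`, `sim_cp`, `top_n`) are assumptions, items are referred to by column index, and ties in the ranking score are broken by item index.

```python
import math

# Toy binary user-item matrix (rows: users, columns: items); values are illustrative.
D = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
]

def freq(D, item_set):
    """Number of users who have selected every item in item_set."""
    return sum(1 for row in D if all(row[i] for i in item_set))

def sim_cosine(D, i, j):
    """Eq. (1): cosine of the item columns i and j."""
    ci = [row[i] for row in D]
    cj = [row[j] for row in D]
    dot = sum(a * b for a, b in zip(ci, cj))
    # For binary entries the squared norm of a column equals its sum.
    norm = math.sqrt(sum(ci)) * math.sqrt(sum(cj))
    return dot / norm if norm else 0.0

def sim_cp(D, i, j):
    """Eq. (2): P(I_j | I_i) estimated as Freq({I_i, I_j}) / Freq({I_i})."""
    fi = freq(D, {i})
    return freq(D, {i, j}) / fi if fi else 0.0

def top_n(D, basket, n, sim):
    """Eq. (3): rank the items outside the basket by summed similarity."""
    m = len(D[0])
    scores = {x: sum(sim(D, b, x) for b in basket)
              for x in range(m) if x not in basket}
    ranked = sorted(scores, key=lambda x: (-scores[x], x))
    return ranked[:n]

recommended = top_n(D, {0, 2}, 2, sim_cp)
```

On this toy matrix `sim_cp(D, 0, 2)` and `sim_cp(D, 2, 0)` differ, which illustrates the asymmetry of the CP-based measure, while `sim_cosine` gives the same value in both directions.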
11
4. Preliminary Experimental Results
Dataset
- MovieLens (http://www.grouplens.org/data)
  - A web-based movie recommender system.
  - Contains multi-valued ratings that indicate how much each user liked a particular movie.
  - Each user has rated at least 20 movies.
  - We treat a rating only as an indication of whether the user has seen the movie (nonzero) or not (zero).

Table 1: Characteristics of the MovieLens dataset
# of Users   # of Items   Density¹   Average Basket Size
943          1682         6.31%      106.04
¹ Density: the percentage of nonzero entries in the user-item matrix.
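The binarization step and the two summary statistics in Table 1 can be sketched as below. The rating values are made up for illustration (a 1 to 5 scale with 0 meaning "not rated" is an assumption), and the helper names are hypothetical.

```python
# Hypothetical toy ratings matrix: rows are users, columns are movies,
# 0 means the user has not rated the movie.
ratings = [
    [5, 0, 3, 0],
    [0, 0, 4, 1],
    [2, 0, 0, 0],
]

def binarize(ratings):
    """Map every nonzero rating to 1 (seen) and keep 0 (not seen)."""
    return [[1 if r != 0 else 0 for r in row] for row in ratings]

def density(D):
    """Fraction of nonzero entries in the user-item matrix."""
    nonzero = sum(sum(row) for row in D)
    return nonzero / (len(D) * len(D[0]))

def average_basket_size(D):
    """Average number of items per user."""
    return sum(sum(row) for row in D) / len(D)

D = binarize(ratings)
toy_density = density(D)
toy_basket_size = average_basket_size(D)
```

Applied to the real dataset, the same two functions would yield the density and average basket size reported in Table 1.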
12
4. Preliminary Experimental Results-1
Evaluation Design
- Split the dataset into training and test sets: randomly select one rated movie of each user for the test set, and use the remaining rated movies for training.
- Compare the cosine (COS)-based, CP-based, and GCP-based methods; results are averaged over 10 runs.
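A leave-one-out split of this kind could look like the sketch below. The function name `leave_one_out_split`, the toy baskets, and the use of a seeded `random.Random` are assumptions made for reproducibility of the example; the slides do not say how the random selection was implemented.

```python
import random

def leave_one_out_split(baskets, rng):
    """Hold out one randomly chosen item per user; the rest is training data."""
    train, test = {}, {}
    for user, items in baskets.items():
        held_out = rng.choice(sorted(items))  # sort for a reproducible order
        test[user] = held_out
        train[user] = set(items) - {held_out}
    return train, test

# Hypothetical toy baskets (user -> set of rated movies).
baskets = {
    "u1": {"m1", "m2", "m3"},
    "u2": {"m2", "m4"},
    "u3": {"m1", "m3", "m4", "m5"},
}

# One split per run; averaging over 10 runs would use 10 different seeds.
splits = [leave_one_out_split(baskets, random.Random(seed)) for seed in range(10)]
train, test = splits[0]
```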
Evaluation Metrics
- Hit-Rate (HR):
  $HR = \dfrac{\#\text{ of hits}}{n}$   (6)
- Average Reciprocal Hit-Rate (ARHR):
  $ARHR = \dfrac{1}{n} \sum_{i=1}^{h} \dfrac{1}{p_i}$   (7)
- # of hits: the number of items in the test set that also appear in the top-N lists.
- $n$ is the number of users (each contributes one test item).
- $h$ is the number of hits, which occur at positions $p_1, p_2, \ldots, p_h$ within the top-N lists (i.e., $1 \le p_i \le N$).
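Equations (6) and (7) translate directly into code. In the sketch below the function name `hr_and_arhr` and the toy top-N lists are hypothetical; each user is assumed to have exactly one held-out test item, as in the evaluation design described here.

```python
def hr_and_arhr(top_n_lists, test_items):
    """Return (HR, ARHR) for one held-out test item per user."""
    n = len(test_items)
    hits = 0
    reciprocal_sum = 0.0
    for user, held_out in test_items.items():
        ranked = top_n_lists.get(user, [])
        if held_out in ranked:
            position = ranked.index(held_out) + 1  # positions start at 1
            hits += 1
            reciprocal_sum += 1.0 / position
    return hits / n, reciprocal_sum / n

# Toy example with N = 4: three hits at positions 1, 2 and 4, one miss.
top_n_lists = {
    "u1": ["a", "b", "c", "d"],
    "u2": ["b", "a", "c", "d"],
    "u3": ["a", "b", "c", "d"],
    "u4": ["d", "c", "b", "a"],
}
test_items = {"u1": "a", "u2": "a", "u3": "z", "u4": "a"}

hr, arhr = hr_and_arhr(top_n_lists, test_items)
# hr == 0.75 and arhr == (1 + 1/2 + 1/4) / 4 == 0.4375
```

HR treats all hits equally, whereas ARHR rewards hits that appear near the top of the list, which is why the two values differ in the example.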
13
4. Preliminary Experimental Results-1
Performance of Top-N Recommendation Algorithms
[Figure: two plots. Left: HR; x-axis: number of top-N items, y-axis: hit-rate over all users. Right: ARHR; x-axis: number of top-N items, y-axis: average reciprocal hit-rate over all users. The GCP-based method uses d = 2.]
14
4. Preliminary Experimental Results-2
Testing the Parameter d in the GCP Method
- Test the effect of d (d = 1, 2, 3).
Evaluation: Online Shopping Simulation
- Randomly select part of the user records as the training set; use the remaining user records for testing.
- STEP 0: Construct the item-graph from the training set.
- STEP 1: For each user in the test set:
  - randomly remove one item from the user's basket and make recommendations based on the remaining items in the basket;
  - compute the position of the removed item in the recommendation list;
  - update the item-graph.
- STEP 2: Compute the HR and ARHR metrics.
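The simulation loop can be sketched as follows, under several stated assumptions. The item-graph structure and the GCP formula are not given in this transcript, so the sketch substitutes a plain co-occurrence count table and the CP-based similarity of Eq. (2); the class `CooccurrenceGraph`, the function `simulate`, and the toy baskets are all hypothetical. It also assumes that the user records held back from graph construction play the role of incoming shoppers in STEP 1.

```python
import random

class CooccurrenceGraph:
    """Stand-in for the item-graph: item counts on nodes, pair counts on edges."""

    def __init__(self):
        self.item_count = {}
        self.pair_count = {}

    def add_basket(self, basket):
        for a in basket:
            self.item_count[a] = self.item_count.get(a, 0) + 1
            for b in basket:
                if a != b:
                    self.pair_count[(a, b)] = self.pair_count.get((a, b), 0) + 1

    def sim(self, a, b):
        # CP-based similarity, Eq. (2); used here in place of GCP.
        count_a = self.item_count.get(a, 0)
        return self.pair_count.get((a, b), 0) / count_a if count_a else 0.0

    def rank(self, basket):
        candidates = [x for x in self.item_count if x not in basket]
        score = {x: sum(self.sim(b, x) for b in basket) for x in candidates}
        return sorted(candidates, key=lambda x: (-score[x], x))

def simulate(train_baskets, incoming_baskets, n, rng):
    graph = CooccurrenceGraph()
    for basket in train_baskets:              # STEP 0: build the graph
        graph.add_basket(basket)
    hits, reciprocal_sum = 0, 0.0
    for basket in incoming_baskets:           # STEP 1: one simulated shopper each
        held_out = rng.choice(sorted(basket))
        ranked = graph.rank(set(basket) - {held_out})[:n]
        if held_out in ranked:
            hits += 1
            reciprocal_sum += 1.0 / (ranked.index(held_out) + 1)
        graph.add_basket(basket)              # update the graph with the full basket
    total = len(incoming_baskets)             # STEP 2: HR and ARHR
    return hits / total, reciprocal_sum / total

train_baskets = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
incoming_baskets = [{"a", "b"}]
hr, arhr = simulate(train_baskets, incoming_baskets, 1, random.Random(0))
```

Because the counts are updated after every simulated shopper, later recommendations already reflect earlier baskets, which is the "easy to update" property the simulation is meant to exercise.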
15
4. Preliminary Experimental Results-2
Performance of Top-N Recommendation Algorithms
[Figure: two plots. Left: HR; x-axis: number of top-N items, y-axis: hit-rate over all users. Right: ARHR; x-axis: number of top-N items, y-axis: average reciprocal hit-rate over all users.]
16
5. Conclusion and Future Work
Conclusion
- Top-N recommendation problem and item-centric algorithms
  - Cosine-based, conditional probability-based
- Item-Graph model
  - Visualizes the relationships among items.
  - Easy to update.
- Generalized Conditional Probability-based top-N recommendation algorithm
  - Item-centric and based on the Item-Graph model
Future Work
- Clustering items and measuring item-item similarities based on the Item-Graph model
- Speeding up the GCP method
17
References
[Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997.
[Breese98] J. S. Breese, D. Heckerman and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998.
[Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
[Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the Degree of M.S. in Computer Science.
[Linden03] G. Linden, B. Smith and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003.
[Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc. Computer Supported Cooperative Work Conf., pages 175-186, 1994.