Top-N Recommendation Algorithm Based on Item-Graph

Allen, Zhenjiang LIN
CSE, CUHK

Nov 13, 2007

Outline

1. Top-N Recommendation Problem
2. Top-N Recommendation Algorithm
3. Item-Graph Model and GCP-based Method
     Item-Graph Model
     Generalized Conditional Probability (GCP)-based Recommendation Algorithm
4. Preliminary Experimental Results
5. Conclusion and Future Work

1. Top-N Recommendation Problem

The Top-N Recommendation Problem

Given the preference information of users, recommend to a certain user a set of N items that he or she might be interested in, based on the items the user has already selected.

E-commerce system example: Amazon.com, customers vs. products.

User-Item matrix

            Item 1   Item 2   Item 3   …   Item m
User 1        1        0        1      …     0
User 2        1        1        0      …     0
…
User n        0        1        0      …     1
New User      1        ?        1      ?     ?

The New User row is the active user; the items already selected (the 1 entries) form the basket.

Example: Amazon.com

[Figure: Amazon.com example, labelled Active User, Basket, and Recommendations.]

1. Top-N Recommendation Problem

Challenges in E-commerce Systems

Huge amounts of data: millions of users and/or items.
The result set must be returned in real time.
Limited preference information for new users.
Volatile user preference information.

Two major approaches

Content-based: recommend items based on the content (textual information) of items.
  Fab system [Balabanovic97], Syskill & Webert system [Pazzani97].

Collaborative Filtering (CF): recommend items by collecting taste information from other users.
  Uses collaborative (correlation) information between users.
  More popular than content-based recommendation, since in many domains (such as music and restaurants) it is hard to extract useful features from items.
  Tapestry system [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester system [Goldberg01], Amazon [Linden03].


CF algorithms classified by strategy of using data

Memory-based: make recommendations based on the entire collection of user preferences.
  No pre-computing is needed, but the approach suffers from a serious scalability problem.
  E.g., Correlation-based [Resnick94], Cosine-based [Breese98].

Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations.
  The model is built off-line, which makes the approach more scalable.
  E.g., Cluster models [Ungar98], Bayesian network model [Breese98], Association Rule Mining approach [Lin00].


CF algorithms classified by strategy of using objects

User-centric: look for similar (like-minded) users first and then make recommendations.
  Similarity between users is relatively dynamic.
  Pre-computing user neighborhoods may lead to poor predictions.

Item-centric: look for similar (or related) items first and then make recommendations.
  Similarity between items is relatively static.
  Enables pre-computing of item-item similarity.
  More scalable.


Notations

Item set I = {I_1, I_2, …, I_m}.
User set U = {U_1, U_2, …, U_n}.
User-Item (binary) matrix D = (D_{u,i})_{n×m}.
Basket of the active user: B ⊆ I.
Similarity score of x and y: sim(x, y).

Formal definition of top-N recommendation problem

Given a user-item matrix D and a set of items B that have been selected by the active user, identify an ordered set of items X such that |X| ≤ N and X ∩ B = ∅.


Two classical item-item similarity measures

Cosine-based (symmetric)

  sim(I_i, I_j) = cos(D_{*,i}, D_{*,j})    (1)

Conditional Probability (CP)-based (asymmetric)

  sim(I_i, I_j) = P(I_j | I_i) ≈ Freq({I_i, I_j}) / Freq({I_i})    (2)

  Freq(X): the number of users who have purchased the item set X.

The ranking score for item x

  RS(x) = ∑_{b ∈ B} sim(b, x)    (3)

  (the sum of the similarity scores between x and the items in the basket B)
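The two similarity measures and the ranking score can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy matrix D, the 0-based item indices, and the helper names (column, freq, cosine_sim, cp_sim, top_n) are hypothetical and are not taken from the slides.

```python
from math import sqrt

# Toy binary user-item matrix D (rows: users, columns: items); values are made up.
D = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
]

def column(D, j):
    """Return column j of D, i.e. the purchase vector of item j."""
    return [row[j] for row in D]

def freq(D, items):
    """Freq(X): number of users who have purchased every item in X."""
    return sum(1 for row in D if all(row[j] for j in items))

def cosine_sim(D, i, j):
    """Eq. (1): cosine of the two item columns (symmetric)."""
    a, b = column(D, i), column(D, j)
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def cp_sim(D, i, j):
    """Eq. (2): Freq({I_i, I_j}) / Freq({I_i}) (asymmetric)."""
    fi = freq(D, [i])
    return freq(D, [i, j]) / fi if fi else 0.0

def top_n(D, basket, n, sim):
    """Eq. (3): rank items outside the basket by RS(x) = sum of sim(b, x) over b in B."""
    m = len(D[0])
    scores = {x: sum(sim(D, b, x) for b in basket)
              for x in range(m) if x not in basket}
    # Sort by descending score; ties are broken by item index (an arbitrary choice).
    return sorted(scores, key=lambda x: (-scores[x], x))[:n]

print(cp_sim(D, 0, 2), cp_sim(D, 2, 0))   # the CP-based measure is asymmetric
print(top_n(D, {0, 2}, 2, cp_sim))        # [1, 3]
```

Items already in the basket are excluded from the candidates, which enforces X ∩ B = ∅, and slicing to n enforces |X| ≤ N.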


4. Preliminary Experimental Results

Dataset

The MovieLens dataset (http://www.grouplens.org/data)
  A web-based movie recommender system.
  Contains multi-valued ratings that indicate how much each user liked a particular movie.
  Each user has rated at least 20 movies.
  We treat a rating as an indication of whether the user has seen the movie (nonzero) or not (zero).

Table 1: The characteristics of the MovieLens dataset

# of Users   # of Items   Density¹   Average Basket Size
943          1682         6.31%      106.04

¹ Density: the percentage of nonzero entries in the user-item matrix.

4. Preliminary Experimental Results-1

Evaluation Design

Split the dataset into training and test sets by randomly selecting one rated movie of each user to be part of the test set, and use the remaining rated movies for training.
Compare the Cosine (COS)-based, CP-based, and GCP-based methods; results are averaged over 10 runs.
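The split described in the evaluation design can be sketched as follows. This is a minimal illustration, assuming a binary user-item matrix stored as a list of rows; the function name leave_one_out_split and the seed parameter are hypothetical, and running it with ten different seeds would be one way to obtain a 10-run average.

```python
import random

def leave_one_out_split(D, seed=0):
    """Hold out one randomly chosen nonzero entry per user.

    Returns (train, test): train is a copy of D with the held-out entries
    set to 0, and test[u] is the held-out item index of user u
    (None for a user with no items, an edge case added for this sketch).
    """
    rng = random.Random(seed)
    train = [row[:] for row in D]   # copy so the input matrix is left untouched
    test = []
    for u, row in enumerate(D):
        rated = [j for j, v in enumerate(row) if v]
        if not rated:
            test.append(None)
            continue
        held = rng.choice(rated)
        train[u][held] = 0
        test.append(held)
    return train, test

# Toy data (made up): three users, four items.
D = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1]]
train, test = leave_one_out_split(D, seed=1)
```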

Evaluation Metrics

Hit-Rate (HR)

  HR = (# of hits) / n    (6)

Average Reciprocal Hit-Rate (ARHR)

  ARHR = (1/n) ∑_{i=1}^{h} 1/p_i    (7)

n: the number of users (each user contributes one test item).
# of hits: the number of items in the test set that were also in the top-N lists.
h: the number of hits, which occurred at positions p_1, p_2, …, p_h within the top-N lists (i.e., 1 ≤ p_i ≤ N).
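Both metrics can be computed in one pass over the users. The sketch below assumes one held-out test item per user and one top-N list per user; the function name hit_rate_and_arhr and the toy lists are hypothetical.

```python
def hit_rate_and_arhr(top_n_lists, test_items):
    """Return (HR, ARHR) following Eqs. (6) and (7).

    top_n_lists[u] is the ordered top-N list of user u and
    test_items[u] is that user's single held-out item.
    """
    n = len(test_items)
    hits = 0
    reciprocal_sum = 0.0
    for recs, held in zip(top_n_lists, test_items):
        if held in recs:
            hits += 1
            position = recs.index(held) + 1   # 1-based position p_i in the list
            reciprocal_sum += 1.0 / position
    return hits / n, reciprocal_sum / n

# Toy example (made up): user 0 is hit at position 2, user 1 at position 1, user 2 is missed.
hr, arhr = hit_rate_and_arhr([[3, 1, 2], [0, 4, 5], [7, 8, 9]], [1, 0, 6])
# hr = 2/3 and arhr = (1/2 + 1/1) / 3 = 0.5
```

ARHR weights a hit at the top of the list more heavily than a hit near position N, whereas HR treats all hits equally.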

4. Preliminary Experimental Results-1

Performance of Top-N Recommendation Algorithms

[Figure: two plots. Left: HR; x-axis: top-N items, y-axis: hit-rate over all users. Right: ARHR; x-axis: top-N items, y-axis: average reciprocal hit-rate over all users.]

(For the GCP-based method, d = 2.)

4. Preliminary Experimental Results-2

Testing the Parameter d in GCP Method

Testing the effect of d (d = 1, 2, 3).

Evaluation: Online Shopping Simulation

Randomly select part of the user records to be the training set; use the remaining user records as the test set.
STEP 0: Construct the item-graph based on the training set.
STEP 1: For each user in the test set:
  randomly move one item out of the user's basket and make a recommendation based on the remaining items in the basket;
  compute the order of this item in the recommendation list;
  update the item-graph.
STEP 2: Compute the HR and ARHR metrics.
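The simulation loop can be sketched as below, with loud caveats. The item-graph and the GCP score are not defined in this part of the deck, so the sketch substitutes plain item and item-pair purchase counts with the CP-based ranking score of Eqs. (2) and (3). It also assumes that the users processed in STEP 1 are the records held out from the training set, and that "updating" means adding the user's full basket to the counts. All function and parameter names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def update_counts(single, pair, basket):
    """Add one user's basket to the item and item-pair purchase counts."""
    for i in basket:
        single[i] += 1
    for i, j in combinations(sorted(basket), 2):
        pair[(i, j)] += 1

def cp_score(single, pair, basket, x):
    """CP-based ranking score: sum over b in basket of Freq({b, x}) / Freq({b})."""
    total = 0.0
    for b in basket:
        if single[b]:
            total += pair[(min(b, x), max(b, x))] / single[b]
    return total

def simulate(train_baskets, test_baskets, all_items, n_top, seed=0):
    """Run the online simulation and return (HR, ARHR). Baskets must be non-empty sets."""
    rng = random.Random(seed)
    single, pair = Counter(), Counter()
    for basket in train_baskets:              # STEP 0: build the stand-in model
        update_counts(single, pair, basket)
    hits, reciprocal_sum = 0, 0.0
    for basket in test_baskets:               # STEP 1: one simulated shopper at a time
        held = rng.choice(sorted(basket))     # item moved out of the basket
        rest = set(basket) - {held}
        candidates = [x for x in all_items if x not in rest]
        ranked = sorted(candidates,
                        key=lambda x: (-cp_score(single, pair, rest, x), x))
        position = ranked.index(held) + 1     # order of the held-out item
        if position <= n_top:
            hits += 1
            reciprocal_sum += 1.0 / position
        update_counts(single, pair, basket)   # online update (assumed: full basket)
    n = len(test_baskets)
    return hits / n, reciprocal_sum / n       # STEP 2: HR and ARHR
```

Because the counts are updated after every user, later shoppers are scored against a model that already includes earlier ones, which is what distinguishes this setup from the static leave-one-out split.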

4. Preliminary Experimental Results-2

Performance of Top-N Recommendation Algorithms

[Figure: two plots. Left: HR; x-axis: top-N items, y-axis: hit-rate over all users. Right: ARHR; x-axis: top-N items, y-axis: average reciprocal hit-rate over all users.]

5. Conclusion and Future Work

Conclusion
  Top-N Recommendation Problem and item-centric algorithms
    Cosine-based, conditional probability-based
  Item-Graph model
    Visualizing the relationship among items.
    Easy to update.
  Generalized Conditional Probability-based top-N recommendation algorithm
    Item-centric and based on the Item-Graph model

Future Work
  Clustering items and measuring item-item similarities based on the Item-Graph model.
  Speeding up the GCP method.

References

[Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997.

[Breese98] J. S. Breese, D. Heckerman and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998.

[Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.

[Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the Degree of M.S. in Computer Science.

[Linden03] G. Linden, B. Smith and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003.

[Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc. Computer Supported Cooperative Work Conf., pages 175-186, 1994.
