Collaborative Filtering with CCAM
DESCRIPTION
Published at ICMLA'11 in Honolulu, Hawaii.
TRANSCRIPT
COLLABORATIVE FILTERING WITH CCAM
Presenter: Meng-Lun Wu
Author: Meng-Lun Wu, Chia-Hui Chang and Rei-Zhe Liu
Date: 2011/12/21
Outline
• Introduction
• Related Work
• Preliminary
• Collaborative Filtering with CCAM
• Experiment
• Conclusion
Introduction (1/2)
• In any recommender system, the number of ratings already
obtained is usually very small compared to the number of
ratings that need to be predicted.
• A possible solution is dimensionality reduction, which can alleviate data sparsity.
• Clustering is the simplest such technique that can be applied to recommender systems to obtain a compact model and mitigate the sparsity problem.
Introduction (2/2)
• In recent years, co-clustering based on information theory has attracted increasing attention.
• We have extended an information-theoretic co-clustering algorithm to augmented data matrices; the resulting method is called Co-Clustering with Augmented data Matrix (CCAM).
• In this paper, we consider how to alleviate the sparsity problem and achieve precise predictions via collaborative filtering with CCAM.
Related Work
• Information theoretical co-clustering
• Dhillon et al. (2003) derived a co-clustering algorithm from information theory that optimizes an objective based on the loss in mutual information between the clustered random variables.
• Matrix factorization co-clustering
• Chen et al. (2008) linearly combined user-based CF, item-based CF, and matrix factorization results based on ONMTF to make rating predictions.
• Li et al. (2009) presented a cross-domain collaborative filtering method that co-clusters movie information via ONMTF and transfers the resulting knowledge to recommend both books and movies.
Preliminary (1/2)
• Suppose we are given a clicking-information matrix R defined over a user set U = {u1, u2, …, u_nu} and an ad set A = {a1, a2, …, a_na}.
• nu and na denote the number of users and ads, respectively.
• Memory-based CF methods inevitably encounter data sparsity before similar neighbors can be found.
• Dhillon et al. (2003) proposed a co-clustering algorithm that monotonically decreases the information loss of tabular data to form a compact model.
Preliminary (2/2)
• Assume U and A are random variables with joint probability distribution p(U, A) and marginal distributions p(U) and p(A). The mutual information I(U; A) is defined as
  I(U; A) = \sum_{u} \sum_{a} p(u, a) \log \frac{p(u, a)}{p(u)\, p(a)}
• Suppose there are G1 user clusters CU = {cu^(1), cu^(2), …, cu^(G1)} and G2 ad clusters CA = {ca^(1), ca^(2), …, ca^(G2)}. To judge the quality of a co-clustering, we define the loss in mutual information as
  \Delta I = I(U; A) - I(CU; CA)
• PROPOSITION 1. Additional properties of this loss are declared and proven in the paper.
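For reference, the central result proven for information-theoretic co-clustering in Dhillon et al. (2003), which Proposition 1 presumably parallels, states that this loss can be written as a KL divergence between the original joint distribution and its co-cluster approximation:
  I(U; A) - I(CU; CA) = D_{KL}\bigl(p(U, A) \,\|\, q(U, A)\bigr), \quad \text{where } q(u, a) = p(cu, ca)\, p(u \mid cu)\, p(a \mid ca) \text{ for } u \in cu,\; a \in ca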
Co-Clustering with Augmented data Matrix, CCAM (1/4)
• When the optimization of the loss in mutual information was first proposed by Dhillon et al. (2003), it was designed for and applied to a single tabular data set.
• However, in many cases there exist related tables besides the main data set that may provide useful information.
• In our co-clustering approach, Co-Clustering with Augmented data Matrix (CCAM), we simultaneously adjust the co-clusters over multiple augmented data matrices to reduce the information loss.
• The other two component sets, the feature set F = {f1, f2, …, f_nf} and the profile set P = {p1, p2, …, p_np}, provide additional information for ads and users and form the augmented matrices, where nf and np denote the number of features and profiles, respectively.
Co-Clustering with Augmented data Matrix, CCAM (2/4)
• PROPOSITION 2. Further properties hold when p(A, F) and p(U, P) are considered; they are also declared and proven in the paper.
• DEFINITION 1. The optimal co-clustering (CU, CA) we seek is the one that minimizes the combined loss in mutual information; a plausible form is sketched below.
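The objective itself appears as an equation image on the slide. A plausible form, assuming the loss on the user-ad matrix is combined with the losses on the two augmented matrices through the weights λ and φ that appear later in the parameter tuning (the exact formulation in the paper may differ), is
  \min_{CU,\, CA}\; \bigl[I(U; A) - I(CU; CA)\bigr] \;+\; \lambda\, \bigl[I(A; F) - I(CA; F)\bigr] \;+\; \varphi\, \bigl[I(U; P) - I(CU; P)\bigr]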
Co-Clustering with Augmented data Matrix, CCAM (3/4)
Algorithm 1: Co-Clustering with Augmented data Matrix (CCAM)
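The algorithm itself is shown only as an image on the slide. The code below is a rough sketch of the alternating co-clustering loop in the style of Dhillon et al. (2003) that CCAM extends; it ignores the augmented matrices and the λ, φ weights, the update rules are a simplified assumption rather than the paper's exact procedure, and all names are hypothetical.

```python
import numpy as np

def itcc_style_coclustering(p_ua, G1, G2, n_iter=20, seed=0):
    """Rough sketch of an alternating information-theoretic co-clustering loop
    (in the style of Dhillon et al., 2003). p_ua is a joint distribution over
    users x ads (non-negative entries summing to 1)."""
    rng = np.random.default_rng(seed)
    nu, na = p_ua.shape
    cu = rng.integers(0, G1, size=nu)   # user-cluster assignments
    ca = rng.integers(0, G2, size=na)   # ad-cluster assignments
    eps = 1e-12

    def cocluster_joint(cu, ca):
        # p(CU, CA): aggregate the joint distribution over the current co-clusters.
        p_cc = np.zeros((G1, G2))
        for g in range(G1):
            for h in range(G2):
                p_cc[g, h] = p_ua[np.ix_(cu == g, ca == h)].sum()
        return p_cc

    for _ in range(n_iter):
        # --- user step: move each user to the closest user cluster ---
        p_cc = cocluster_joint(cu, ca)
        proto_u = p_cc / (p_cc.sum(axis=1, keepdims=True) + eps)        # p(CA | CU)
        user_prof = np.stack([p_ua[:, ca == h].sum(axis=1) for h in range(G2)], axis=1)
        user_prof /= user_prof.sum(axis=1, keepdims=True) + eps          # p(CA | u)
        kl_u = (user_prof[:, None, :] *
                np.log((user_prof[:, None, :] + eps) / (proto_u[None, :, :] + eps))).sum(axis=2)
        cu = kl_u.argmin(axis=1)

        # --- ad step: symmetric update for ads ---
        p_cc = cocluster_joint(cu, ca)
        proto_a = (p_cc / (p_cc.sum(axis=0, keepdims=True) + eps)).T     # p(CU | CA)
        ad_prof = np.stack([p_ua[cu == g, :].sum(axis=0) for g in range(G1)], axis=1)
        ad_prof /= ad_prof.sum(axis=1, keepdims=True) + eps               # p(CU | a)
        kl_a = (ad_prof[:, None, :] *
                np.log((ad_prof[:, None, :] + eps) / (proto_a[None, :, :] + eps))).sum(axis=2)
        ca = kl_a.argmin(axis=1)

    return cu, ca
```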
Collaborative filtering with CCAM (1/5)
Collaborative filtering with CCAM (2/5)
• DEFINITION 3. Since CCAM is designed on the basis of KL-divergence, the distance metrics take a similar form.
• Here we define the distance between each user and each user cluster, and between each ad and each ad cluster.
• Note that the ad-cluster prototype and user-cluster prototype of CCAM are defined accordingly; a plausible form is sketched below.
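The prototype and distance formulas appear as images on the slide. Assuming they follow the ITCC convention of comparing an object's conditional distribution with its cluster prototype (an assumption, not the paper's stated definition), a plausible form is
  d(u, cu^{(g)}) = D_{KL}\bigl(p(A \mid u) \,\|\, p(A \mid cu^{(g)})\bigr), \qquad d(a, ca^{(h)}) = D_{KL}\bigl(p(U \mid a) \,\|\, p(U \mid ca^{(h)})\bigr)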
Collaborative filtering with CCAM (3/5)
Collaborative filtering with CCAM (4/5)
Collaborative filtering with CCAM (5/5)
Data set
• The data set used in the experiments is obtained from a financial social website, Ad$Mart, and covers the period from 2009/09/01 to 2010/03/31.
• For each test user, 15 observed clicking rates (Given15) are provided to find nearest neighbors and the remaining clicking rates are used for evaluation.
• To ensure that each test user has clicked at least 15 ads, only users with more than 20 clicked ads and ads with more than 10 clicked user-ad pairs are retained.
• User-Ad: The pre-processed clicking data covers 1786 users and 520 ads. After preprocessing, we convert it into a joint probability distribution over users and ads, and also reshape it into a clicking-rate matrix scaled from 1 to 5.
• Ad-Feature: An advertisement feature data set compiling 37 statistics of 530 ads.
• User-Profile: A questionnaire data set provided by 520 users on 24 survey questions.
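The slides do not show the preprocessing itself; the snippet below is only one plausible way to carry out the two transformations described above, assuming the raw input is a matrix of click counts (all arrays here are hypothetical placeholders).

```python
import numpy as np

# counts[i, j] = number of clicks user i made on ad j (hypothetical raw input)
counts = np.random.default_rng(0).poisson(1.0, size=(1786, 520)).astype(float)

# 1) Joint probability distribution over users and ads: normalize to sum to 1.
p_ua = counts / counts.sum()

# 2) Clicking-rate matrix scaled from 1 to 5: min-max rescale the counts.
lo, hi = counts.min(), counts.max()
ratings = 1.0 + 4.0 * (counts - lo) / (hi - lo)
```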
Evaluation methodology (1/2)
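The evaluation formulas appear as images on the slide. Given that later slides report MAE, the metric is presumably the mean absolute error over the held-out clicking rates:
  \mathrm{MAE} = \frac{1}{|T|} \sum_{(u,a) \in T} \bigl|\hat{r}_{u,a} - r_{u,a}\bigr|
where T is the set of held-out user-ad pairs, r_{u,a} the observed clicking rate, and \hat{r}_{u,a} the prediction.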
Evaluation methodology (2/2)
K1 and K2 tuning based on k-NN
G1 and G2 tuning based on K-Means
• We also have to determine which value of G1 yields the best MAE.
• We simply fix G2 = 10 and K1 = K2 = 5 as a strategy to avoid tuning too many parameters.
• We examine the response of k-Means with different G1 values (7, 15, 30, 60) and keep the best one to apply to the other algorithms (a small grid-search sketch is shown below).
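A minimal sketch of this tuning loop, assuming a helper evaluate_mae (hypothetical, not from the paper) that builds the CF predictor from a clustering and returns its MAE on the held-out ratings:

```python
import numpy as np
from sklearn.cluster import KMeans

def tune_g1(rating_matrix, candidate_g1=(7, 15, 30, 60), seed=0):
    """Pick the user-cluster count G1 that gives the lowest MAE with k-Means."""
    best_g1, best_mae = None, np.inf
    for g1 in candidate_g1:
        labels = KMeans(n_clusters=g1, random_state=seed, n_init=10).fit_predict(rating_matrix)
        mae = evaluate_mae(rating_matrix, labels)   # hypothetical evaluation helper
        if mae < best_mae:
            best_g1, best_mae = g1, mae
    return best_g1, best_mae
```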
Parameter tuning with CCAM (1/2)
• To evaluate the co-clustering result, we apply a classification algorithm (Weka J48) to the user data and measure the F-measure under 10-fold cross-validation; the ad side is evaluated analogously.
• We use the clustering result of the user data (user-ad matrix and user-profile matrix) as the target labels when evaluating the user clustering, and similarly for the ad data (ad-user matrix and ad-feature matrix).
• To examine the effectiveness of co-clustering, we reduce the columns of the user-ad matrix to a smaller user-ad-cluster matrix. The reduced data is then appended to the user data for classification, and likewise for the ad data (an illustrative sketch of this evaluation follows the figure below).
[Figure: user data augmented with user-ad cluster columns, labeled by the clustering result of the user-ad and user-profile matrices]
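Weka J48 is a C4.5 decision tree. As a rough illustration of the evaluation described above, the sketch below uses an sklearn decision tree with 10-fold cross-validated macro F-measure; this is an analogue of, not the authors' actual, Weka setup, and the data arrays are hypothetical placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical placeholders:
#   user_features      - user data (e.g. user-profile columns plus the reduced
#                        user-ad-cluster columns described above)
#   user_cluster_label - cluster assignment of each user, used as the target label
rng = np.random.default_rng(0)
user_features = rng.random((1786, 24 + 10))
user_cluster_label = rng.integers(0, 7, size=1786)

clf = DecisionTreeClassifier(random_state=0)          # sklearn analogue of Weka J48 (C4.5)
scores = cross_val_score(clf, user_features, user_cluster_label,
                         cv=10, scoring="f1_macro")   # 10-fold CV, macro F-measure
print("mean F-measure:", scores.mean())
```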
Parameter tuning with CCAM (2/2)
• We find that when G1 = 60, the best setting is λ = 0.2 and φ = 0.1.
• We therefore apply these optimal CCAM parameters in the next section when comparing against the other algorithms.
Results
• Table 3 compares the model-based approaches.
• Table 4 compares the hybrid-model approaches with the previous parameter settings.
Conclusion
• In this paper, we applied Chen et al.'s rating framework to evaluate the performance of hybrid CF with various model constructions.
• To give a fair comparison, we started by tuning each individual approach for its best performance.
• We compared four algorithms: CCAM, ITCC, k-Means, and k-NN. In terms of MAE, CCAM outperformed the other three.
• In the future, for a more thorough discussion, we will investigate our algorithm on other real-world data sets, such as MovieLens, EachMovie, and Book-Crossing, which contain users' movie and book ratings.
THANK YOU FOR LISTENING. Q & A