Collaborative Filtering with CCAM
DESCRIPTION
Published at ICMLA'11 in Honolulu, Hawaii.
TRANSCRIPT
COLLABORATIVE FILTERING WITH CCAM
Presenter: Meng-Lun Wu
Author: Meng-Lun Wu, Chia-Hui Chang and Rei-Zhe Liu
Date: 2011/12/21
Outline
• Introduction
• Related Work
• Preliminary
• Collaborative Filtering with CCAM
• Experiment
• Conclusion
Introduction (1/2)
• In any recommender system, the number of ratings already
obtained is usually very small compared to the number of
ratings that need to be predicted.
• A possible solution is dimensionality reduction, which can alleviate data sparsity.
• Clustering is the simplest such technique that can be applied to recommender systems to obtain a compact model and mitigate the sparsity problem.
Introduction (2/2)
• In recent years, co-clustering based on information theory has attracted increasing attention.
• We have extended an information-theoretic co-clustering algorithm to augmented data matrices; the resulting method is called Co-Clustering with Augmented data Matrix (CCAM).
• In this paper, we consider how to alleviate the sparsity problem and achieve precise predictions via collaborative filtering with CCAM.
Related Work
• Information theoretical co-clustering
• Dhillon et al. (2003) derived a co-clustering algorithm from information theory that optimizes an objective based on the loss in mutual information between the clustered random variables.
• Matrix factorization co-clustering
• Chen et al. (2008) linearly combined user-based CF, item-based CF, and matrix factorization results based on ONMTF to make rating predictions.
• Li et al. (2009) presented a cross-domain collaborative filtering method that co-clusters movie information via ONMTF and transfers the resulting knowledge to recommend both books and movies.
Preliminary (1/2)
• Suppose we are given a clicking-information matrix R defined over a user set U = {u1, u2, …, u_nu} and an ad set A = {a1, a2, …, a_na}.
• nu and na denote the number of users and ads, respectively.
• Memory-based CF methods inevitably encounter data sparsity before similar neighbors can be found.
• Dhillon et al. (2003) proposed a co-clustering algorithm that monotonically decreases the information loss of tabular data to form a compact model.
Preliminary (2/2)
• Assume U and A are random variables with joint probability distribution p(U, A) and marginal distributions p(U) and p(A). The mutual information I(U; A) is defined as
  I(U; A) = \sum_{u} \sum_{a} p(u, a) \log \frac{p(u, a)}{p(u)\, p(a)}
• Suppose there are G1 user clusters CU = {cu^(1), cu^(2), …, cu^(G1)} and G2 ad clusters CA = {ca^(1), ca^(2), …, ca^(G2)}. To judge the quality of a co-clustering, we define the loss in mutual information as
  \Delta I = I(U; A) - I(CU; CA)
• PROPOSITION 1. Additional properties of this loss are declared and proven in the paper.
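For reference, the central result proven for information-theoretic co-clustering in Dhillon et al. (2003), which Proposition 1 presumably parallels, states that this loss can be written as a KL divergence between the original joint distribution and its co-cluster approximation:
  I(U; A) - I(CU; CA) = D_{KL}\bigl(p(U, A) \,\|\, q(U, A)\bigr), \quad \text{where } q(u, a) = p(cu, ca)\, p(u \mid cu)\, p(a \mid ca) \text{ for } u \in cu,\; a \in ca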
Co-Clustering with Augmented data Matrix, CCAM (1/4)
• When the optimization of the loss in mutual information was first proposed by Dhillon et al. (2003), it was designed for and applied to a single tabular data set.
• However, in many cases there exist related tables besides the main data set that may provide useful information.
• In our co-clustering approach, Co-Clustering with Augmented data Matrix (CCAM), we simultaneously adjust the co-clusters over multiple augmented data matrices to reduce the information loss.
• The other two component sets, the feature set F = {f1, f2, …, f_nf} and the profile set P = {p1, p2, …, p_np}, provide additional information for ads and users and form the augmented matrices, where nf and np denote the number of features and profiles, respectively.
Co-Clustering with Augmented data Matrix, CCAM (2/4)
• PROPOSITION 2. Further properties hold when p(A, F) and p(U, P) are considered; they are also declared and proven in the paper.
• DEFINITION 1. The optimal co-clustering (CU, CA) we seek is the one that minimizes the combined loss in mutual information; a plausible form is sketched below.
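The objective itself appears as an equation image on the slide. A plausible form, assuming the loss on the user-ad matrix is combined with the losses on the two augmented matrices through the weights λ and φ that appear later in the parameter tuning (the exact formulation in the paper may differ), is
  \min_{CU,\, CA}\; \bigl[I(U; A) - I(CU; CA)\bigr] \;+\; \lambda\, \bigl[I(A; F) - I(CA; F)\bigr] \;+\; \varphi\, \bigl[I(U; P) - I(CU; P)\bigr]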
Co-Clustering with Augmented data Matrix, CCAM (3/4)
Algorithm 1: Co-Clustering with Augmented data Matrix (CCAM)
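The algorithm itself is shown only as an image on the slide. The code below is a rough sketch of the alternating co-clustering loop in the style of Dhillon et al. (2003) that CCAM extends; it ignores the augmented matrices and the λ, φ weights, the update rules are a simplified assumption rather than the paper's exact procedure, and all names are hypothetical.

```python
import numpy as np

def itcc_style_coclustering(p_ua, G1, G2, n_iter=20, seed=0):
    """Rough sketch of an alternating information-theoretic co-clustering loop
    (in the style of Dhillon et al., 2003). p_ua is a joint distribution over
    users x ads (non-negative entries summing to 1)."""
    rng = np.random.default_rng(seed)
    nu, na = p_ua.shape
    cu = rng.integers(0, G1, size=nu)   # user-cluster assignments
    ca = rng.integers(0, G2, size=na)   # ad-cluster assignments
    eps = 1e-12

    def cocluster_joint(cu, ca):
        # p(CU, CA): aggregate the joint distribution over the current co-clusters.
        p_cc = np.zeros((G1, G2))
        for g in range(G1):
            for h in range(G2):
                p_cc[g, h] = p_ua[np.ix_(cu == g, ca == h)].sum()
        return p_cc

    for _ in range(n_iter):
        # --- user step: move each user to the closest user cluster ---
        p_cc = cocluster_joint(cu, ca)
        proto_u = p_cc / (p_cc.sum(axis=1, keepdims=True) + eps)        # p(CA | CU)
        user_prof = np.stack([p_ua[:, ca == h].sum(axis=1) for h in range(G2)], axis=1)
        user_prof /= user_prof.sum(axis=1, keepdims=True) + eps          # p(CA | u)
        kl_u = (user_prof[:, None, :] *
                np.log((user_prof[:, None, :] + eps) / (proto_u[None, :, :] + eps))).sum(axis=2)
        cu = kl_u.argmin(axis=1)

        # --- ad step: symmetric update for ads ---
        p_cc = cocluster_joint(cu, ca)
        proto_a = (p_cc / (p_cc.sum(axis=0, keepdims=True) + eps)).T     # p(CU | CA)
        ad_prof = np.stack([p_ua[cu == g, :].sum(axis=0) for g in range(G1)], axis=1)
        ad_prof /= ad_prof.sum(axis=1, keepdims=True) + eps               # p(CU | a)
        kl_a = (ad_prof[:, None, :] *
                np.log((ad_prof[:, None, :] + eps) / (proto_a[None, :, :] + eps))).sum(axis=2)
        ca = kl_a.argmin(axis=1)

    return cu, ca
```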
Collaborative filtering with CCAM (1/5)
Collaborative filtering with CCAM (2/5)
• DEFINITION 3. Since CCAM is designed on the basis of KL-divergence, the distance metrics take a similar form.
• Here we define the distance between each user and each user cluster, and between each ad and each ad cluster.
• Note that the ad-cluster prototype and user-cluster prototype of CCAM are defined accordingly; a plausible form is sketched below.
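The prototype and distance formulas appear as images on the slide. Assuming they follow the ITCC convention of comparing an object's conditional distribution with its cluster prototype (an assumption, not the paper's stated definition), a plausible form is
  d(u, cu^{(g)}) = D_{KL}\bigl(p(A \mid u) \,\|\, p(A \mid cu^{(g)})\bigr), \qquad d(a, ca^{(h)}) = D_{KL}\bigl(p(U \mid a) \,\|\, p(U \mid ca^{(h)})\bigr)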
Collaborative filtering with CCAM (3/5)
Collaborative filtering with CCAM (4/5)
Collaborative filtering with CCAM (5/5)
Data set
• The data set used in the experiments is obtained from a financial social website, Ad$Mart, and covers the period from 2009/09/01 to 2010/03/31.
• For each test user, 15 observed clicking rates (Given15) are provided to find nearest neighbors and the remaining clicking rates are used for evaluation.
• To ensure that each test user has clicked at least 15 ads, only users with more than 20 clicked ads and ads with more than 10 clicked user-ad pairs are retained.
• User-Ad: The pre-processed clicking data covers 1786 users and 520 ads. After preprocessing, we convert it into a joint probability distribution over users and ads, and also reshape it into a clicking-rate matrix scaled from 1 to 5.
• Ad-Feature: An advertisement feature data set compiling 37 statistics of 530 ads.
• User-Profile: A questionnaire data set provided by 520 users on 24 survey questions.
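The slides do not show the preprocessing itself; the snippet below is only one plausible way to carry out the two transformations described above, assuming the raw input is a matrix of click counts (all arrays here are hypothetical placeholders).

```python
import numpy as np

# counts[i, j] = number of clicks user i made on ad j (hypothetical raw input)
counts = np.random.default_rng(0).poisson(1.0, size=(1786, 520)).astype(float)

# 1) Joint probability distribution over users and ads: normalize to sum to 1.
p_ua = counts / counts.sum()

# 2) Clicking-rate matrix scaled from 1 to 5: min-max rescale the counts.
lo, hi = counts.min(), counts.max()
ratings = 1.0 + 4.0 * (counts - lo) / (hi - lo)
```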
Evaluation methodology (1/2)
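The evaluation formulas appear as images on the slide. Given that later slides report MAE, the metric is presumably the mean absolute error over the held-out clicking rates:
  \mathrm{MAE} = \frac{1}{|T|} \sum_{(u,a) \in T} \bigl|\hat{r}_{u,a} - r_{u,a}\bigr|
where T is the set of held-out user-ad pairs, r_{u,a} the observed clicking rate, and \hat{r}_{u,a} the prediction.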
Evaluation methodology (2/2)
K1 and K2 tuning based on k-NN
G1 and G2 tuning based on K-Means
• We also have to determine which value of G1 yields the best MAE.
• We simply fix G2 = 10 and K1 = K2 = 5 as a strategy to avoid tuning too many parameters.
• We examine the response of k-Means with different G1 values (7, 15, 30, 60) and keep the best one to apply to the other algorithms (a small grid-search sketch is shown below).
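A minimal sketch of this tuning loop, assuming a helper evaluate_mae (hypothetical, not from the paper) that builds the CF predictor from a clustering and returns its MAE on the held-out ratings:

```python
import numpy as np
from sklearn.cluster import KMeans

def tune_g1(rating_matrix, candidate_g1=(7, 15, 30, 60), seed=0):
    """Pick the user-cluster count G1 that gives the lowest MAE with k-Means."""
    best_g1, best_mae = None, np.inf
    for g1 in candidate_g1:
        labels = KMeans(n_clusters=g1, random_state=seed, n_init=10).fit_predict(rating_matrix)
        mae = evaluate_mae(rating_matrix, labels)   # hypothetical evaluation helper
        if mae < best_mae:
            best_g1, best_mae = g1, mae
    return best_g1, best_mae
```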
Parameter tuning with CCAM (1/2)
• To evaluate the co-clustering result, we apply a classification algorithm (Weka J48) to the user data and measure the F-measure under 10-fold cross-validation; the ad side is evaluated analogously.
• We use the clustering result of the user data (user-ad matrix and user-profile matrix) as the target labels when evaluating the user clustering, and similarly for the ad data (ad-user matrix and ad-feature matrix).
• To examine the effectiveness of co-clustering, we reduce the columns of the user-ad matrix to a smaller user-ad-cluster matrix. The reduced data is then appended to the user data for classification, and likewise for the ad data (an illustrative sketch of this evaluation follows the figure below).
[Figure: user data augmented with user-ad cluster columns, labeled by the clustering result of the user-ad and user-profile matrices]
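Weka J48 is a C4.5 decision tree. As a rough illustration of the evaluation described above, the sketch below uses an sklearn decision tree with 10-fold cross-validated macro F-measure; this is an analogue of, not the authors' actual, Weka setup, and the data arrays are hypothetical placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical placeholders:
#   user_features      - user data (e.g. user-profile columns plus the reduced
#                        user-ad-cluster columns described above)
#   user_cluster_label - cluster assignment of each user, used as the target label
rng = np.random.default_rng(0)
user_features = rng.random((1786, 24 + 10))
user_cluster_label = rng.integers(0, 7, size=1786)

clf = DecisionTreeClassifier(random_state=0)          # sklearn analogue of Weka J48 (C4.5)
scores = cross_val_score(clf, user_features, user_cluster_label,
                         cv=10, scoring="f1_macro")   # 10-fold CV, macro F-measure
print("mean F-measure:", scores.mean())
```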
Parameter tuning with CCAM (2/2)
• We find that when G1 = 60, the best setting is λ = 0.2 and φ = 0.1.
• We therefore apply these optimal CCAM parameters in the next section when comparing against the other algorithms.
Results
• Table 3 compares the model-based approaches.
• Table 4 compares the hybrid-model approaches with the previous parameter settings.
Conclusion
• In this paper, we applied Chen et al.'s rating framework to evaluate the performance of hybrid CF with various model constructions.
• To give a fair comparison, we started by tuning each individual approach for its best performance.
• We compared four algorithms: CCAM, ITCC, k-Means, and k-NN. In terms of MAE, CCAM outperformed the other three.
• In the future, for a more thorough discussion, we will investigate our algorithm on other real-world data sets, such as MovieLens, EachMovie, and Book-Crossing, which contain users' movie and book ratings.
THANK YOU FOR LISTENING. Q & A