Sparsity, robustness, and
diversification of Recommender Systems
Zhuo Zhang
A Dissertation
Presented to the Faculty
of Princeton University
in Candidacy for the Degree
of Doctor of Philosophy
Recommended for Acceptance
by the Department of
Electrical Engineering
Adviser: Sanjeev R. Kulkarni
September 2014
© Copyright by Zhuo Zhang, 2014.
All rights reserved.
Abstract
Recommender systems have played an important role in helping individuals select
useful items or places of interest when they face too many choices. Collaborative
filtering is one of the most popular methods used in recommender systems. The idea
is to recommend to the target user items that users with similar tastes prefer. An important goal of recommender systems is to predict users' preferences accurately. However, prediction accuracy is not the only evaluation metric for recommender systems. In this dissertation, we focus on three other aspects of recommender systems: sparsity, robustness, and diversification.
The dissertation begins with iterative collaborative filtering to overcome sparsity issues in recommender systems. Instead of calculating the similarity matrix from the sparse data only once, we iterate this process until convergence is achieved. To overcome the sparsity, users' ratings in dense areas are estimated first, and these estimates are then used to estimate ratings in sparse areas. Second, the robustness of recommender systems is considered in order to detect shilling attacks. Graph-based algorithms are applied to the user-user similarity graph to detect the highly correlated group, which corresponds to the group of fake users.
Finally, we consider diversification of the types of information used for recommendations. Specifically, geographical, temporal, social network, and tag information are aggregated in a biased random walk algorithm to exploit diversified data in multi-dimensional recommender systems.
Acknowledgements
I would like to express my sincere gratitude to Professor Sanjeev Kulkarni, my advisor
in the EE department. As a new graduate student in summer 2011, I joined Professor Kulkarni's group. I have to admit that at that time I had very little research experience and little idea of what to do. It was Professor Kulkarni who advised me not only on academics but also on career development. Rather than simply assigning a project to me, he suggested that I read widely to explore interesting research areas and follow my passions. He twice encouraged me to take a summer internship in industry to see whether I enjoyed it. Without his tremendous support, I could not have achieved so much and finished this dissertation within four years.
I would also like to thank Professor Paul Cuff for his assistance and guidance in
my academic work, especially for serving as a dissertation reader. He always provided
insightful comments about my ongoing projects and proposed interesting questions
to get me inspired and motivated.
I would also like to thank my internship advisor, my dissertation reader, and
my friend, Professor Pan Hui. During my summer internship at Deutsche Telekom Innovation Lab, I gained many innovative ideas from him in the mobile and social network area, which form part of this dissertation.
I would like to thank Professor Mung Chiang for serving on my general exam
committee, and Professor Peter Ramadge and Professor Mung Chiang for serving on
my thesis committee as non-readers.
My special thanks go to one of my group mates, Shang Shang. I have had very close collaborations with Shang in the recommender system area and benefited greatly from many discussions with her. She contributed much to the location-based recommender system during our last year at Princeton. I would also like to express my gratitude to my friends at Princeton, Tiance (Mark) Wang, Haipeng
Zheng, Guanchun (Arvid) Wang, Jieqi Yu, Pingmei Xu and Zhen (James) Xiang,
and all the other friends in the EE and ORFE departments, for the sleepless nights we spent working together before deadlines, and for all the fun we had over the last four years.
Finally, and most importantly, I would like to thank my parents, Xuan and Tianxing, without whom my life would not be possible. Their support, encouragement, patience, and love pushed me forward during four years of PhD life.
To my parents.
Contents
Abstract
Acknowledgements
List of Tables
List of Figures
1 Introduction
1.1 Iterative Collaborative Filtering in Sparse Recommender Systems
1.2 Shilling Attack Detection using Graph-based Algorithms
1.3 Location-based Multi-dimensional Recommender Systems
2 Iterative Collaborative Filtering in Sparse Recommender Systems
2.1 Background
2.1.1 Problem Formulation
2.1.2 Related Work
2.2 Iterative Collaborative Filtering
2.2.1 Iterative Framework
2.2.2 Selective Processes
2.2.3 Algorithm Description
2.3 Experiments
2.3.1 Experimental Design
2.3.2 Evaluation Metrics
2.3.3 Experimental Results
3 Graph-based Shilling Attack Detection in Recommender Systems
3.1 Background
3.1.1 Attack Models
3.1.2 Related Work
3.2 Graph-based Filtering Algorithm
3.2.1 Problem Formulation
3.2.2 Heuristic Merging
3.2.3 Searching for the Largest Component
3.2.4 Iterative Refinement
3.3 Spectral Clustering Detection
3.3.1 Spectral Clustering
3.3.2 Dealing with Unbalanced Structure
3.3.3 Iterative Refinement
3.3.4 Searching for the Number of Attack Profiles
3.4 Experiments
3.4.1 Experimental Setup
3.4.2 Assumption Validation
3.4.3 Searching for the Number of Attack Profiles
3.4.4 Evaluation Metrics
3.4.5 Experimental Results and Discussion
4 Location-based Recommender Systems
4.1 Introduction to Augmented Reality
4.1.1 AR Ecosystem
4.1.2 Content Fusion in AR
4.2 Background of Location-based Recommender Systems
4.3 Problem Formulation
4.4 Random Walk in Location-based Recommender Systems
4.4.1 Model Construction
4.4.2 Score Computation
4.4.3 Personalized Recommendation through θ
4.5 Experiments
4.5.1 Preliminary Experiments for Geographical Model
4.5.2 Dataset Analysis
4.5.3 Evaluation Metrics
4.5.4 Compared Algorithms
4.5.5 Experimental Results
5 Conclusion
5.1 Contribution of the Dissertation
5.2 Future Research Directions
Bibliography
List of Tables
2.1 Experimental Results
3.1 General Form of Attack Profiles
3.2 Experimental Results
4.1 Average Percentile of Recommendations
4.2 Recall of Top Recommendations
4.3 Average Distance of Top Recommendations
4.4 Statistics of Foursquare Dataset
4.5 Geographic Distance Statistics
4.6 Average Percentile of Recommendations
4.7 Average Percentile of Recommendations
List of Figures
2.1 Iterative Framework
2.2 MAE versus Data Sparsity
2.3 RMSE versus Data Sparsity
2.4 Coverage versus Data Sparsity
3.1 The Average Similarity PDF for Different Group Sizes
3.2 The 99th Percentile Average Similarity Value
3.3 G0(n) and Fitting Curve
3.4 Group Size vs. Average Similarity for 100 Random Attackers
3.5 Group Size vs. Average Similarity for 100 Average Attackers
3.6 Group Size vs. Average Similarity for 100 Bandwagon Attackers
4.1 AR Ecosystem Framework
4.2 Content Fusion Pipeline
4.3 An Example of a Recommendation Graph
4.4 Pairwise Check-in Place Distance Distribution
4.5 Average Distance Distribution for a Single User
4.6 Check-in Time Frequency
4.7 Top-K Hit Ratio
Chapter 1
Introduction
In less than two decades, recommender systems have been widely used on the Internet,
providing users with personalized items and information. They play a very important
role in making profits for companies such as Amazon or Netflix. In recommender
systems, there are lists of both users and items. Each user will use scores or linguistic
terms such as like or dislike to rate a subset of all possible items. With a large amount
of information, users often find it hard to select the useful and relevant information.
Therefore, recommender systems are designed to help select the relevant information
for the specific user [1]. By analyzing the available ratings, collaborative filtering
attempts to make the best predictions or recommendations to the target user. The
underlying principle in collaborative filtering is to find a group of users with similar
tastes and then provide a prediction for the target user based on the preferences of
similar users.
There are several current challenges in recommender systems. First, recommender
systems need to overcome sparsity issues. Only a small proportion of users tend to
rate or leave feedback on the products they have used, while the set of items is usually very large. Thus the rating information is limited and the user-item rating matrix is very sparse. Traditional collaborative filtering algorithms then suffer from overfitting, leading to inaccurate predictions [47].
Second, evaluating recommender systems and their algorithms is inherently difficult for several reasons [29]. High accuracy may not be the only goal that recommender systems want to achieve. Since recommender systems are widely used, people are able to exploit basic knowledge about them. Fake user profiles are generated based on attack models, and target items are pushed to become more or less popular in order to make a profit. Therefore, robustness to attacks is also important in recommender systems.
Last but not least, in recent years researchers have used not only explicit rating information or feedback, but have also aggregated "side information" such as geographical, temporal, tag, and social network information to improve prediction accuracy [2]. With the rapid development of smartphones and wireless networks, location-based services have become more and more popular, so geographic information is very important. Recent research has shown that time, item categories, and even friendship between users are strongly connected to item preferences. Therefore, a multi-dimensional recommender system using all available information is one of the trends for the future.
1.1 Iterative Collaborative Filtering in Sparse
Recommender Systems
Collaborative filtering (CF) is one of the most successful techniques in recommender
systems. By utilizing co-rated items of pairwise users for similarity measurements,
traditional CF uses a weighted summation to predict unknown ratings based on the
available ones. However, in practice, the rating matrix is often too sparse to find sufficiently many co-rated items, leading to inaccurate predictions. In Chapter 2, to address the case of sparse data, we propose an iterative CF that updates the similarity and rating matrices [66]. The improved CF incrementally selects reliable subsets of missing ratings based on an adaptive parameter and therefore produces a more credible similarity-based prediction. Experimental results on the MovieLens dataset show that our algorithm significantly outperforms traditional CF, Default Voting, and SVD when the data is 1% sparse. The results also show that in the dense-data case our algorithm performs as well as state-of-the-art methods.
1.2 Shilling Attack Detection using Graph-based
Algorithms
Collaborative filtering has been widely used in recommender systems as a method
to recommend items to users. However, by using knowledge of the recommendation
algorithm, shilling attackers can generate fake profiles to increase or decrease the
popularity of a targeted set of items. In Chapter 3, we present a spectral clustering
method to make recommender systems resistant to these attacks in the case that
the attack profiles are highly correlated with each other [68, 69]. We formulate the problem as finding a maximum submatrix of the similarity matrix, which is an NP-hard problem. To search for the maximum submatrix, we first translate the matrix into a graph and then use a spectral clustering algorithm to find the min-cut that estimates the highly correlated group. The graph is created based on edge density in order to handle unbalanced clustering. The detection is refined through an iterative process to obtain a better estimate of the group of attack profiles, and some analysis of the stability of the refinement process is provided. Experimental results show that the proposed approach improves detection precision compared to existing methods.
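The spectral step can be illustrated with a minimal sketch, assuming a precomputed user-user similarity matrix `S`; the function name and the simple sign-based split are illustrative simplifications, not the dissertation's full procedure with density-based edges and iterative refinement:

```python
import numpy as np

def spectral_split(S):
    """Split users into two groups via the Fiedler vector of the
    normalized graph Laplacian built from similarity matrix S.
    This sketches only the min-cut step of spectral clustering."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Normalized Laplacian: L = I - D^{-1/2} S D^{-1/2}
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]          # second-smallest eigenvector
    return fiedler >= 0              # one side of the estimated cut
```

On a similarity matrix with a tightly correlated block (as shilling profiles produce), the sign pattern of the Fiedler vector separates that block from the rest.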
1.3 Location-based Multi-dimensional Recommender Systems
Location-based recommender systems have attracted a large number of users in recent years as wireless networks and mobile devices have rapidly developed [67]. Real-time location-based recommender systems should take location, temporal information, and social network information into consideration in order to improve the user experience. In Chapter 4, we first review the development of augmented reality in recent years, serving as an introduction to location-based recommender systems. Then we present an aggregated random walk algorithm incorporating personal preferences, location information, temporal information, and social network information in a layered graph [70]. By adaptively changing the graph edge weights and computing the rank score, the proposed location-based recommender system predicts users' preferences and provides the most relevant recommendations from the aggregated information. Specifically, the geographical information is modeled as an exponential decay function, while the temporal information is abstracted as a time vector incorporated into the final ranking score. The biased random walk algorithm is flexible in that the personalization parameter can be specified to meet different purposes. Experimental results show that the biased random walk algorithm gives better results for location-based multi-dimensional recommender systems than other state-of-the-art methods.
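The ranking idea behind the biased random walk can be sketched as a personalized-PageRank-style power iteration; the transition matrix `W`, restart vector, and parameter names here are hypothetical simplifications of the layered-graph model described in Chapter 4:

```python
import numpy as np

def biased_random_walk(W, restart, theta=0.15, n_iter=100):
    """Rank nodes by a restart-biased walk: W is a column-stochastic
    transition matrix over the graph's nodes, and `restart`
    concentrates probability on the target user's preference nodes.
    theta is the restart (bias) probability."""
    r = restart / restart.sum()
    p = r.copy()
    for _ in range(n_iter):
        p = (1.0 - theta) * (W @ p) + theta * r
    return p  # stationary visiting probabilities used as rank scores
```

Nodes reachable soon after a restart accumulate more probability mass, so items close to the user's preference nodes in the graph rank higher.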
Chapter 2
Iterative Collaborative Filtering in
Sparse Recommender Systems
In this chapter, we describe a method for iterative collaborative filtering applied in
sparse recommender systems. This chapter is organized as follows. Section 2.1 introduces the background and describes the problem formulation and related work. Section 2.2 discusses the details of our proposed iterative collaborative filtering. Experimental results and comparisons with existing methods are given in Section 2.3.
2.1 Background
In recommender systems, there are lists of both users and items. Each user will use
scores or linguistic terms such as like or dislike to rate a subset of all possible items.
With a large amount of information, users often find it hard to select the useful and
relevant information. Therefore, recommender systems such as Netflix and Amazon
are designed to help select the relevant information for the specific user [1]. By
analyzing these ratings, collaborative filtering attempts to make the best predictions
or recommendations to the target user. The underlying principle in collaborative
filtering is to find a group of users with similar tastes and then provide a prediction
for the target user based on such preferences.
Typically there are two kinds of collaborative filtering: memory-based methods and model-based methods. Memory-based methods [50, 19, 61] first calculate a similarity function between pairs of users and then use a weighted summation of available ratings to predict unknown ones. In contrast, model-based methods [30, 36] first make certain assumptions about the data and then fit the existing data to the assumed model to obtain predictions.
Currently there are several challenges in collaborative filtering [55], one of the most
difficult of which is dealing with data sparsity. In recommender systems, extremely
sparse rating data may occur when ratings are available for only a small proportion
of items compared to the actual large item set. Traditional approaches to CF, such as
the Pearson Correlation method or Vector Similarity method [12], calculate two users’
similarity based on their co-rated items. As a result, if the rating data is extremely
sparse, the lack of co-rated items for pairwise users will lead to inaccurate similarity
estimates.
Several approaches have been developed to address the data sparsity problem. One
of them is Default Voting [12], which automatically assumes a default rating value for
some number of additional items and thus extends the aggregation domain. However,
experiments show that even though Default Voting does improve performance to some extent, the approach is still coarse because a single default rating is not specialized to each item.
Another memory-based method formulates a linear model [7] to fit existing ratings
and calculates similarity through a quadratic optimization problem. Although this
approach exploits global information, it still highly depends on the availability of a
sufficient number of co-rated items.
In this chapter, we propose an iterative collaborative filtering framework to deal
with sparse data in recommender systems. Our algorithm first estimates part of the missing ratings based on similarity. Afterwards, it goes back and updates the similarity function using the estimated ratings. This process is repeated iteratively.
2.1.1 Problem Formulation
In recommender systems, there is a set of users U = {u1, . . . , um} and a set of items I = {i1, . . . , in}. For each user u, Iu denotes the subset of items that user u has rated, and Iuv denotes the subset of items that both user u and user v have rated. An m × n rating matrix R is then constructed, in which each element ru,i denotes user u's rating of item i; the data provides only part of this matrix. Let r̄u denote the average rating of user u over all the items u has rated. The goal is to estimate an unknown rating ru,i of a specific user u on a target item i.
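As a concrete illustration of this notation, a sparse rating matrix and the sets Iu and Iuv can be sketched as follows (toy data; np.nan marks unknown ratings):

```python
import numpy as np

# Toy 4-user x 5-item rating matrix R; np.nan marks unknown entries.
R = np.array([
    [5.0, 3.0, np.nan, 1.0, np.nan],
    [4.0, np.nan, np.nan, 1.0, np.nan],
    [np.nan, 1.0, 5.0, np.nan, 4.0],
    [1.0, np.nan, 4.0, np.nan, np.nan],
])

def rated_items(R, u):
    """I_u: indices of the items user u has rated."""
    return np.flatnonzero(~np.isnan(R[u]))

def co_rated(R, u, v):
    """I_uv: items rated by both user u and user v."""
    return np.flatnonzero(~np.isnan(R[u]) & ~np.isnan(R[v]))

def mean_rating(R, u):
    """r-bar_u: user u's average over the items u has rated."""
    return float(np.nanmean(R[u]))
```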
2.1.2 Related Work
As mentioned before, memory-based methods are divided into two steps: similarity
assessment and aggregation. Various approaches have been used in collaborative
filtering. The most commonly-used method, called the Pearson Correlation [50], uses
a similarity function su,v between users u and v, defined as

s_{u,v} = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^2} \sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^2}} \quad (2.1)

The unknown rating ru,i for user u and item i is then computed as an aggregate of known ratings. The aggregation function is defined as follows:

\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in U} s_{u,v} (r_{v,i} - \bar{r}_v)}{\sum_{v \in U} |s_{u,v}|} \quad (2.2)

From Eq.(2.1), we can see that the Pearson Correlation between users u and v depends strongly on Iuv, the set of items co-rated by users u and v. Since unpaired ratings cannot be used, insufficient co-rated items can lead to inaccurate similarity estimation.
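Eqs.(2.1) and (2.2) can be sketched directly in code; the `min_co_rated` fallback of returning zero similarity is an illustrative choice, not part of the original formulation:

```python
import numpy as np

def pearson(R, u, v, min_co_rated=2):
    """Eq.(2.1): Pearson correlation over co-rated items; returns 0.0
    when too few co-rated items exist (a hypothetical fallback)."""
    co = np.flatnonzero(~np.isnan(R[u]) & ~np.isnan(R[v]))
    if len(co) < min_co_rated:
        return 0.0
    du = R[u, co] - np.nanmean(R[u])
    dv = R[v, co] - np.nanmean(R[v])
    denom = np.sqrt((du ** 2).sum()) * np.sqrt((dv ** 2).sum())
    return float(du @ dv / denom) if denom > 0 else 0.0

def predict(R, u, i):
    """Eq.(2.2): weighted aggregation of other users' ratings on item i."""
    num, den = 0.0, 0.0
    for v in range(R.shape[0]):
        if v == u or np.isnan(R[v, i]):
            continue
        s = pearson(R, u, v)
        num += s * (R[v, i] - np.nanmean(R[v]))
        den += abs(s)
    return float(np.nanmean(R[u]) + (num / den if den > 0 else 0.0))
```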
To overcome the sparsity problem, [12] proposed a default voting technique, assuming a default rating value d for some number k of additional items. This extends the aggregation domain from Iu ∩ Iv toward Iu ∪ Iv by replacing unpaired missing values with the default value d. Experiments show that this can improve performance in the sparse case, but the approach is rather ad hoc and depends greatly on the choice of d. Another memory-based method [7] formulates a linear model to fit existing ratings and calculates a similarity wi,j between items i and j through a quadratic optimization problem. For a fixed item i, the objective is

\arg\min_{w} \frac{\sum_{u \in U_i} c_u \left( r_{u,i} - \frac{\sum_{j \in I_u} w_{i,j} r_{u,j}}{\sum_{j \in I_u} w_{i,j}} \right)^2}{\sum_{u \in U_i} c_u}, \quad \text{where } c_u = \Big( \sum_{j \in I_u} w_{i,j} \Big)^2.
Here, Ui denotes all the users that have rated item i while Iu denotes all the items
that have been rated by user u. Although this algorithm takes global information into
consideration, it is still highly dependent on co-rated items. Moreover, it needs to solve a quadratic optimization problem for each individual item, leading to high computational cost.
Model-based methods view ratings as a probabilistic model, fit a model to the
training data, and then make predictions on the testing data. A variety of Singular
Value Decomposition (SVD) related algorithms have been proposed for PCA or low
rank matrix completion [11, 52]. Formally, the SVD of an m × n matrix M is given by

M = U \Sigma V^*,

where (in the thin SVD) U is an m × n matrix with orthonormal columns, Σ is an n × n diagonal matrix with nonnegative real numbers on the diagonal, and V^* is an n × n unitary matrix. We take the largest k singular values Σk of Σ and reconstruct the low-rank matrix Mk with the corresponding Uk and Vk:

M_k = U_k \Sigma_k V_k^*.
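The rank-k reconstruction can be sketched with NumPy's SVD routine:

```python
import numpy as np

def low_rank(M, k):
    """Rank-k reconstruction M_k = U_k Sigma_k V_k^* via the thin SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then multiply by the first k right singular vectors.
    return U[:, :k] * s[:k] @ Vt[:k, :]
```

For a matrix that is exactly rank k, this reconstruction recovers M itself; for noisy or filled-in rating matrices it yields the best rank-k approximation in the least-squares sense.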
Billsus and Pazzani [11] proposed a binary SVD algorithm, regarding prediction as a
classification problem. It first transforms the sparse rating data into a dense binary
matrix. Then it applies SVD to get the low dimensional data as users’ preference
features and trains n feedforward neural networks for classification. Sarwar et al. [52]
proposed a generalized SVD algorithm, which works by first filling in all missing data with the average ratings of each user, then reducing the dimension of the data with SVD, and finally using the Pearson Correlation or N-Nearest-Neighbors for prediction. Later, Kim and Yum [34] proposed an iterative principal component analysis algorithm for collaborative filtering. Instead of filling in the missing ratings with
the missing ratings iteratively, until convergence is achieved. All these SVD related
algorithms address the sparsity problem to varying degrees. However when data gets
increasingly sparse, the performance of SVD degrades significantly due to limited
information. In the next section, we introduce a new algorithm that has improved
performance in the sparse regime.
2.2 Iterative Collaborative Filtering
In this section we present our iterative collaborative filtering. We start by introducing the iterative framework and then go into the details of the selective processes. Finally, we present the algorithm in pseudo-code.
2.2.1 Iterative Framework
In the Pearson Correlation method of Eq.(2.1), the similarity between users u and v is calculated only on the co-rated items in Iuv. Default Voting and SVD algorithms overcome the sparsity problem by filling in missing values with users' average ratings. Here we propose an iterative framework that overcomes the sparsity by first estimating some ratings for which we have sufficient data, and then using these estimates to refine the similarity measure.
We base our iterative collaborative filtering method on the Pearson Correlation,
following the steps presented below:
1. Calculate a preliminary estimate of the similarity between pairs of users:

s_{u,v} = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^2} \sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^2}}, \quad \forall (u, v) \in U \times U \text{ such that } |I_{uv}| > M.

2. Estimate the subset of missing values that can be predicted most reliably using the preliminary similarity estimates:

\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in U} s_{u,v} (r_{v,i} - \bar{r}_v)}{\sum_{v \in U} |s_{u,v}|}, \quad \text{for eligible } (u, i) \in U \times I.

3. Update all users' average ratings and the similarity between pairs of users.

4. Use the updated estimates and the original data for the target prediction \hat{r}_{u_0,i_0}. If |\hat{r}^{new}_{u_0,i_0} - \hat{r}^{old}_{u_0,i_0}| \geq \alpha, go back to Step 2.
Fig.2.1 shows the framework of iterative collaborative filtering.
Figure 2.1: Iterative Framework
2.2.2 Selective Processes
From the iterative framework, we can see that since the rating matrix and similarity
function highly depend on each other, the rough estimation of the missing ratings
in the first round can be very noisy if selected inappropriately, and thus can distort
the similarity measure when it is used for updating. In the similarity measurement
step, the similarity function su,v between u and v is calculated more reliably when
the number of co-rated items |Iuv| is large. In the aggregation step, to estimate ru,i,
it is intuitive that more available ratings ru,j,∀j ∈ I will provide a better estimate
of user u’s mean value ru while more available ratings rv,i,∀v ∈ U will provide a
better estimate of the bias term for item i. Moreover, [28] and [20] show that top-N nearest-neighbor methods achieve higher accuracy than aggregating over all ratings, which means the most similar users are the more reliable ones. Therefore,
our algorithm applies the following selective processes:
• In the preliminary estimate of the similarity measure, we rely only on pairs of users u and v that have more than M co-rated items. In the experimental section, we set M = 5.
• To fill in missing ratings, we set a threshold γ such that if there are more than γ related ratings available for a missing value, we estimate it using the weighted summation prediction; otherwise we leave it blank. We dynamically choose γ as triple the average number of known related ratings. When the data is sparse, γ is small and we pre-estimate many missing ratings; when the data gets dense, a large γ selects fewer regions for estimation.
• For every prediction, we use only the N nearest neighbors instead of all ratings for aggregation. Unless otherwise noted, all experimental results presented here take N = 30.
2.2.3 Algorithm Description
The pseudo-code is shown in Algorithm 1. Note that we do not need to perform the rough-estimate update every time we estimate ru0,i0 for a specific u0 and i0. In fact, for the multiple estimations within each iteration, we need to go through this step only once, which significantly improves the efficiency of the algorithm.
Algorithm 1 Iterative Collaborative Filtering

Given m users u1, . . . , um and n items i1, . . . , in;
Given a sparse rating matrix Rm×n in which ru,i denotes the rating of item i by user u.
Goal: estimate ru0,i0 for specific u0 and i0.

while |r̂_new(u0,i0) − r̂_old(u0,i0)| ≥ 0.01 do
    Calculate r̄u = (Σ_{i∈Iu} ru,i) / |Iu|, ∀u ∈ U
    Calculate su,v by Eq.(2.1), ∀(u, v) ∈ U × U such that |Iuv| > M; zero otherwise
    for u = u1, . . . , um do
        du ← number of available ratings by user u
    end for
    for i = i1, . . . , in do
        ci ← number of available ratings on item i
    end for
    γ ← (3(m + n − 1) / (m × n)) Σ_{u∈U} du
    for u = u1, . . . , um do
        for i = i1, . . . , in do
            if du + ci > γ and ru,i is missing then
                Estimate ru,i by Eq.(2.2)
            end if
        end for
    end for
    r̂(u0,i0) ← r̄_{u0} + (Σ_{u∈U} s_{u0,u}(ru,i0 − r̄u)) / (Σ_{u∈U} |s_{u0,u}|)
end while
return r̂(u0,i0)
2.3 Experiments
This section discusses our experimental results on a real dataset, MovieLens1, comparing our iterative collaborative filtering algorithm with other state-of-the-art methods.
2.3.1 Experimental Design
The MovieLens dataset was first used in [50]. It contains a total of 100,000 ratings
from 943 users on 1,682 items. Each user rates at least 20 movies, based on a scale
from 1 to 5. The sparsity of the rating matrix is 6.3%. Although simple demographic information about users and basic information about the movies are available in the dataset, we do not use them here, because our algorithm is purely collaborative filtering and depends only on users' ratings of items.

1 http://www.grouplens.org
The MovieLens dataset was originally divided into an 80% training set and a 20% testing set. As mentioned by MovieLens, it is already a post-processed dataset that keeps only users who provided at least 20 ratings. The rating sparsity is already 6.3%, while [51] notes that in recommender systems even active users may purchase or rate fewer than 1% of the items. In order to consider more typical sparsity levels, we change the density of the data by randomly picking 20%, 40%, 60%, 80%, or 100% of the original training set, which corresponds to 1% to 5% data sparsity for training, with the rest used for testing.
In our experiments, we compare our algorithm with several other algorithms:
• Baseline. This is the baseline for our algorithm evaluation, using average ratings
of each user as prediction.
• Pearson. We implement user-based collaborative filtering, with the Pearson
Correlation as its similarity function.
• DV. This is the Default Voting algorithm [12], with optimal parameters fit.
• Naive SVD. This is the approach proposed in [52], which fills in missing values with average ratings as preprocessing and uses the reduced-dimension matrix directly as the predicted ratings.
• CF-based SVD. This is also from [52], with the same average-rating preprocessing; it uses the low-dimensional data as users' preference features for similarity measurement.
• Iterative SVD. This is an improved version of Naive SVD, proposed in [34]. It
iteratively calculates SVD and replaces its missing values with the low dimen-
sional data, until convergence is achieved.
• Iterative CF. This is our proposed iterative collaborative filtering algorithm. In our experiments, this algorithm converges very quickly, usually within 2 to 4 iterations.
2.3.2 Evaluation Metrics
Recommender systems are currently evaluated in many ways [29]. In this experiment, we use the commonly-used CF predictive accuracy metrics MAE and RMSE, and an effectiveness metric called Coverage.
MAE & RMSE
The most commonly-used metrics are Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). They are defined as follows:

$$\mathrm{MAE} = \frac{\sum_{(u,i)} |\hat{r}_{u,i} - r_{u,i}|}{n},$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,i)} (\hat{r}_{u,i} - r_{u,i})^2}{n}},$$

where $\hat{r}_{u,i}$ is the estimated value, $r_{u,i}$ is the ground truth, and $n$ denotes the total number of estimated ratings.
Coverage
Although MAE and RMSE are one way to measure accuracy, there are other metrics
to evaluate an algorithm’s effectiveness and robustness. Coverage is the percentage of
items for which predictions are effective. Algorithms with lower coverage may be less valuable to users since they are limited in the areas in which they can help. In the following, we test only on the testing set and count the effective predictions.

$$\mathrm{coverage} = \frac{\text{Number of effective estimated ratings}}{\text{Number of ratings that need to be estimated}}$$
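As a concrete illustration, all three metrics can be computed directly from (estimate, ground truth) pairs; the following is a minimal sketch whose function names are ours, not part of the dissertation:

```python
import math

def mae(pairs):
    """Mean Absolute Error over (estimate, ground truth) pairs."""
    return sum(abs(est - truth) for est, truth in pairs) / len(pairs)

def rmse(pairs):
    """Root Mean Square Error over (estimate, ground truth) pairs."""
    return math.sqrt(sum((est - truth) ** 2 for est, truth in pairs) / len(pairs))

def coverage(n_effective, n_requested):
    """Fraction of requested ratings for which an effective prediction exists."""
    return n_effective / n_requested
```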
2.3.3 Experimental Results
In this subsection, we first show the change of MAE, RMSE and coverage with differ-
ent data sparsity. From Fig.2.2 and Fig.2.3 we can see that when data is 1% sparse,
our algorithm performs the best. Default Voting (DV) is the second best algorithm
in the sparse case. However we should note that the performance of DV greatly de-
pends on the choice of parameters. All the results of DV shown here use the optimal
parameters. When the rating matrix gets denser, even at 3% sparsity, our algorithm still performs better than the others. At 5% sparsity, our algorithm still performs as well as CF-based SVD and the Pearson Correlation method. This is because when
data is dense, the adaptive parameter γ will only choose a very limited region of
missing ratings for estimation, leading to a process similar to the Pearson Correla-
tion method. CF-based SVD is slightly better than our algorithm because it gets
rid of useless information in the original data using PCA. Fig.2.4 shows that when
data is extremely sparse, namely 1% data sparsity, our algorithm still has a very high
coverage at 96%, compared to the Pearson Correlation method’s 69%.
In order to see the performance of our algorithm on sparse data as well as in the normal case, we show all the metrics in Table 2.1 for 1% sparse data and 5% sparse data.
Table 2.1 shows that when data is 1% sparse, iterative collaborative filtering outperforms the other algorithms by 0.03 in MAE and 0.04 in RMSE, respectively. The Pearson Correlation algorithm and the CF-based SVD perform poorly because of insufficient co-rated items. When the data becomes denser, reaching 5% sparsity,
Figure 2.2: MAE versus Data Sparsity
Table 2.1: Experimental Results

                 1% Data Sparsity            5% Data Sparsity
                 MAE    RMSE   Coverage      MAE    RMSE   Coverage
Baseline         0.877  1.107  100%          0.850  1.063  100%
Pearson          0.873  1.118  68.76%        0.753  0.959  99.97%
DV               0.849  1.085  99.38%        0.760  0.965  99.93%
Naive SVD        0.851  1.078  100%          0.791  0.977  100%
CF-based SVD     0.868  1.135  97.63%        0.750  0.958  99.54%
Iterative SVD    0.850  1.078  100%          0.753  0.962  100%
Iterative CF     0.817  1.042  96.23%        0.751  0.960  99.99%
iterative collaborative filtering still performs as well as the CF-based SVD and the
Pearson Correlation method.
Figure 2.3: RMSE versus Data Sparsity
Figure 2.4: Coverage versus Data Sparsity
Chapter 3

Graph-based Shilling Attack Detection in Recommender Systems
This chapter introduces shilling attack detection in recommender systems using graph-based algorithms. The chapter is organized as follows. Section 3.1 describes the background
and related work. Section 3.2 formulates the problem and introduces two graph-
based detection algorithms. Section 3.3 discusses details of the spectral clustering
algorithm, an advanced clustering method for searching the highly correlated group.
Experimental results and comparisons with existing methods are presented in Section
3.4.
3.1 Background
Recommender systems are vulnerable to shilling attacks [37] in which an attacker
signs up as a number of “dummy” users and gives fake ratings in an attempt to
increase or decrease the recommendations of specific items by exploiting knowledge
of the recommender system algorithm. “Push attacks” attempt to make one or more
items popular in the system so that they are recommended to more users. Conversely,
attacks that make a set of items less popular are called “nuke attacks”. One of the
difficult challenges in recommender system design is to find algorithms that are robust
to shilling attacks. For simplicity, in the following we consider only push attacks, though our proposed algorithm can be applied to both cases.
In this chapter, we model shilling attacks as a group of users with highly correlated
ratings. We formulate the detection problem as a spectral clustering problem, namely
to cluster the whole user set and choose the fake profile groups from clusters. However,
since the rating matrix is usually very sparse, we cannot easily define a complete
distance measure for the clustering problem. In order to overcome the sparsity, we
construct a graph based on the similarity matrix. Using that graph we apply a spectral
clustering algorithm, which is based on the similarity measure instead of distance
measure for clustering. Experimental results show that our method performs well for
a range of different attacks.
Our algorithm makes the following contributions over prior work:
• We do not make any assumption on the attack model except that attack users
are highly correlated;
• We start from intra attributes to focus on statistics across user profiles instead
of individual profiles;
• We apply graph-based algorithms and spectral clustering to cluster user profiles based on similarity, namely the pairwise correlations between users, thereby avoiding the need for a distance measure;

• Our algorithms do not need the exact number of attack profiles to be specified; they can estimate it automatically.
Table 3.1: General Form of Attack Profiles

  Item set:  $L_T$          $L_S$                                 $L_F$                               $L_N$
  Items:     $l_t$          $l^S_1 \cdots l^S_s$                  $l^F_1 \cdots l^F_f$                $l^N_1 \cdots l^N_n$
  Ratings:   $\gamma(l_t)$  $\alpha(l^S_1) \cdots \alpha(l^S_s)$  $\beta(l^F_1) \cdots \beta(l^F_f)$  null $\cdots$ null
3.1.1 Attack Models
All notation is defined in Sections 2.1.1 and 2.1.2. A fairly general form of an
attack profile is shown in Table 3.1. First the target item is rated as either highest or
lowest. Then some items are selected to be rated to mimic the real users’ rating so
that the fake profile can be similar to the real user, in order to make some impact on
the final recommendations. Based on the function, the attack profile can be thought
of as four sets of items:
• LT : a singleton target item lt;
• LS: a set of selected items with particular characteristics determined by the
attacker;
• LF : a set of filler items usually chosen randomly;
• LN : a set of unrated items.
In a typical attack, the target item lt is usually set at either the highest score (for
a push attack) or the lowest score (for a nuke attack). However, different choices of
rating functions and selections of LS and LF lead to different attack models, some
of which are described below. Here $N(\mu, \sigma^2)$ denotes the Gaussian distribution with mean $\mu$ and variance $\sigma^2$.

• Random attack: $L_S = \emptyset$ and $\beta(l) \sim N(\bar{r}, \sigma^2)$, where $\bar{r}$ is the overall mean rating.

• Average attack: $L_S = \emptyset$ and $\beta(l) \sim N(\bar{r}_l, \sigma_l^2)$, where $\bar{r}_l$ and $\sigma_l^2$ are the mean and variance of the ratings of item $l$.

• Bandwagon attack: $L_S$ contains some number of popular items, $\alpha(l) = r_{\max}$ and $\beta(l) \sim N(\bar{r}_l, \sigma_l^2)$.
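These attack models can be sketched in code. The following is a hedged illustration, not the exact generator used in the experiments; the parameter names and the integer-clipping choice are ours, and a push attack rating the target item $r_{\max}$ is assumed:

```python
import numpy as np

def make_attack_profile(n_items, target, rng, model="random", selected=(),
                        item_means=None, item_stds=None,
                        overall_mean=3.5, sigma=1.1,
                        filler_frac=0.05, r_max=5, r_min=1):
    """Generate one fake user profile (np.nan marks unrated items).

    model: 'random'    -> filler ratings ~ N(overall_mean, sigma^2)
           'average'   -> filler ratings ~ N(item mean, item variance)
           'bandwagon' -> selected popular items get r_max; fillers as 'average'
    """
    profile = np.full(n_items, np.nan)
    pool = [i for i in range(n_items) if i != target and i not in set(selected)]
    fillers = rng.choice(pool, size=int(filler_frac * n_items), replace=False)
    if model == "random":
        raw = rng.normal(overall_mean, sigma, size=len(fillers))
    else:
        raw = rng.normal(item_means[fillers], item_stds[fillers])
    profile[fillers] = np.clip(np.rint(raw), r_min, r_max)
    if model == "bandwagon":
        profile[list(selected)] = r_max
    profile[target] = r_max  # push attack: rate the target highest
    return profile
```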
3.1.2 Related Work
O’Mahony et al. summarize different types of attack strategies and empirically
evaluate the robustness of memory-based collaborative filtering [43–45]. The work
in [16, 41, 42, 65] extends the robustness analysis to model-based algorithms such as
K-means, Probabilistic Latent Semantic Analysis (PLSA), Singular Value Decomposition (SVD), Principal Component Analysis (PCA), and Matrix Factorization (MF).
In the existing algorithms, three main methods are proposed for attack detection in
recommender systems.
Generic Attributes
Attack profiles usually have low deviation from the mean value for most items, but
high deviation from the mean for the attacked item, and they are highly correlated
with each other. Therefore generic attributes are often used to evaluate the deviation
of rating profiles or the similarity with nearest neighbors. [17] proposes Rating Devi-
ation from Mean Agreement (RDMA) and Degree of Similarity with Top Neighbors
(DegSim) to classify fake profiles. Furthermore, an unsupervised retrieval method
(UnRAP) based on matrix residues was proposed in [13]. These measures are defined
as follows.
• Rating Deviation from Mean Agreement (RDMA):

$$\mathrm{RDMA}_{u_0} = \frac{1}{|L_{u_0}|} \sum_{l \in L_{u_0}} \frac{|r_{u_0,l} - \bar{r}_l|}{|L_l|},$$

where $|L_u|$ is the number of items that user $u$ has rated and $|L_l|$ is the number of ratings provided for item $l$.
• Degree of Similarity with Top Neighbors (DegSim):

$$\mathrm{DegSim}_{u_0} = \frac{1}{k} \sum_{i=1}^{k} s_{u_0, n_i},$$

where $n_i$ is the $i$th nearest neighbor of user $u_0$.
• Unsupervised Retrieval of Attack Profiles (UnRAP):

$$\mathrm{UnRAP}_{u_0} = \frac{\sum_{l \in L_{u_0}} (r_{u_0,l} - \bar{r}_{u_0} - \bar{r}_l + \bar{r})^2}{\sum_{l \in L_{u_0}} (r_{u_0,l} - \bar{r}_l)^2}.$$
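As an illustration, the first two generic attributes can be computed as follows; this is a sketch under our own naming, where `R` holds raw ratings with `np.nan` for missing entries and `S` is the user-user similarity matrix:

```python
import numpy as np

def rdma(R, u):
    """Rating Deviation from Mean Agreement for user u."""
    item_means = np.nanmean(R, axis=0)          # per-item mean rating
    item_counts = (~np.isnan(R)).sum(axis=0)    # |L_l|: ratings per item
    rated = ~np.isnan(R[u])                     # items user u has rated
    dev = np.abs(R[u, rated] - item_means[rated]) / item_counts[rated]
    return dev.sum() / rated.sum()

def degsim(S, u, k):
    """Average similarity of user u with its k nearest neighbors."""
    sims = np.delete(S[u], u)                   # drop self-similarity
    return float(np.sort(sims)[-k:].mean())     # mean of the k largest values
```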
Model-specific Attributes
Prior work has shown that generic attributes are generally insufficient for distinguish-
ing an attack profile from eccentric but authentic profiles [14]. Model-based methods
assume that we have some prior knowledge about the attack model. Based on an
assumed model, ratings can be automatically divided into LS and LF . Finally several
measurements such as Filler Mean Variance (FMV) or Filler Mean Target Differ-
ence (FMTD) [62] can be computed from each subset to evaluate the authenticity of
profiles.
• Filler Mean Variance (FMV):

$$\mathrm{FMV}_{u_0} = \frac{1}{|L_F|} \sum_{l \in L_F} (r_{u_0,l} - \bar{r}_l)^2,$$

where $L_F$ denotes the filler item set.
• Filler Mean Target Difference (FMTD):

$$\mathrm{FMTD}_{u_0} = \left| \frac{\sum_{l \in L_S} r_{u_0,l}}{|L_S|} - \frac{\sum_{l \in L_F} r_{u_0,l}}{|L_F|} \right|,$$

where $L_S$ denotes the selected item set and $L_F$ denotes the filler item set.
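A minimal sketch of these two model-specific attributes, assuming the selected and filler item sets have already been identified (the indexing conventions are ours):

```python
import numpy as np

def fmv(profile, item_means, filler_items):
    """Filler Mean Variance: mean squared deviation of filler ratings
    from the corresponding item means."""
    f = np.asarray(filler_items)
    return float(np.mean((profile[f] - item_means[f]) ** 2))

def fmtd(profile, selected_items, filler_items):
    """Filler Mean Target Difference: absolute gap between the mean
    rating on selected items and the mean rating on filler items."""
    s, f = np.asarray(selected_items), np.asarray(filler_items)
    return float(abs(profile[s].mean() - profile[f].mean()))
```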
Intra Attributes
Model-specific methods usually need some training data to estimate the parameters of the attack model; otherwise $L_S$ and $L_F$ cannot easily be separated. Unlike generic attributes and model-specific attributes, which concentrate on characteristics within a single profile, intra attributes focus on statistics across profiles. As Mehta et al. mention in [39], spam users are highly correlated and often work together. Therefore a PCA-based method can be applied to remove the most correlated users. This approach orders all the user profiles based on their contribution to the principal component (or the top few principal components) and removes the most highly correlated ones. However, [39–41, 72] do not specify how to choose the number of principal components to be considered. In [68], starting from the same assumption, a large component searching algorithm on the similarity graph is proposed to find the highly correlated group. It is unstable when a small random attack is applied, since the algorithm searches only for locally optimal solutions, and in this case the local optimum may be far from the global optimum.
In the remainder of this chapter, we will start from the same assumption that fake
users are highly correlated, but we propose a more robust algorithm to find the most
correlated group of users.
3.2 Graph-based Filtering Algorithm
3.2.1 Problem Formulation
As we mentioned before, shilling attackers generally work as a group and are highly
correlated with each other. Suppose we have m user profiles among which n are fake.
From the original rating matrix Rm×p we can calculate the user correlation matrix
S = (si,j)m×m based on Eq.(2.1). Our final goal is to find the n × n submatrix with
the maximum sum in the original m ×m matrix, with the same columns and rows
selected. We define $\vec{\delta} = (\delta_1, \dots, \delta_m)$, where $\delta_i$ is an indicator function that represents whether column/row $i$ is selected. Therefore the problem is formulated as below.
$$\vec{\delta} = \arg\max_{\|\vec{\delta}\|_1 = n} \frac{1}{|\vec{\delta}|^2} \, \vec{\delta} S \vec{\delta}^T \qquad (3.1)$$
$$= \arg\max_{\sum_{i=1}^m \delta_i = n} \frac{1}{|\vec{\delta}|^2} \sum_{i=1}^m \sum_{j=1}^m s_{i,j} \delta_i \delta_j,$$

where $\delta_i \in \{0, 1\}$ for all $i = 1, \dots, m$.
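For a fixed candidate group, the objective of Eq. (3.1) is simply the average pairwise similarity of the selected submatrix. A small sketch (the function name is ours):

```python
import numpy as np

def group_score(S, members):
    """Objective of Eq. (3.1): (1/n^2) times the sum of the n x n submatrix
    of S obtained by selecting the same rows and columns `members`."""
    idx = np.asarray(members)
    sub = S[np.ix_(idx, idx)]
    return float(sub.sum()) / len(idx) ** 2
```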
Searching for a maximal submatrix is typically referred to as a biclustering prob-
lem [15] and several algorithms have been proposed. Since (3.1) requires selecting the
same columns and rows, it is different from traditional biclustering. However, a data
matrix can be viewed as a weighted graph G = (V,E) where V is the set of vertices
and E is the set of edges. Each vertex in V denotes a corresponding column/row and
each edge between vi and vj has a weight si,j. Therefore we can get a suboptimal
solution from the graph, a subset of points which have a high average weight within
group. In the following subsections, we propose two algorithms, namely heuristic
merging and largest component searching, to derive an approximate solution. After
that, we use an iterative process to refine the attack group. There is always a trade-
off between the group size n and the average similarity within the group. Therefore,
how to set up the stopping criterion is another challenge in this problem and we will
address it later.
3.2.2 Heuristic Merging
A naive greedy algorithm selects one user out of the group with the highest average
similarities with all users in the group, and adds it into the original group. When
the group size is relatively large and most of them are fake profiles, this algorithm
works very well. However, when the group size is relatively small, the initial points
are hard to select. Therefore, we propose a generalized greedy, or heuristic merging
algorithm, regarding each point initially as a separate cluster, merging them step by
step by heuristic functions and finally leading to an optimal cluster with the right
size. Before doing that, we first introduce some notation for convenience.
• $C_i^{(t)}$ is the node set of cluster $i$ after $t$ merging actions; $\bigcup_i C_i^{(t)} = \{1, \dots, m\}$ and $C_i^{(t)} \cap C_j^{(t)} = \emptyset$, $\forall i \neq j$.

• $n_i^{(t)} = |C_i^{(t)}|$ is the number of nodes in cluster $i$ after $t$ merging actions.

• $d_{ij}^{(t)} = \frac{1}{n_i^{(t)} n_j^{(t)}} \sum_{x \in C_i^{(t)},\, y \in C_j^{(t)}} s_{x,y}$ is the average similarity between clusters $i$ and $j$ after $t$ merging actions, $i \neq j$.

• $f_i^{(t)} = \frac{1}{(n_i^{(t)})^2} \sum_{x \in C_i^{(t)},\, y \in C_i^{(t)}} s_{x,y}$ is the average similarity within cluster $i$ after $t$ merging actions.
Initially each node belongs to its own cluster, i.e. $C_i^{(0)} = \{i\}$, $\forall i = 1, \dots, m$. Then at each time $t$ we search for two clusters $C_i^{(t)}, C_j^{(t)}$ based on a heuristic function $h(C_i^{(t)}, C_j^{(t)})$ and merge them together. Our final goal is to find a cluster $C_{i_0}^{(t_0)}$ that maximizes $f_{i_0}^{(t_0)}$ such that $n_{i_0}^{(t_0)} \geq n$.

In the problem above we have two objectives to focus on, namely the size of the cluster $n_{i_0}^{(t_0)}$ and the average utility score $f_{i_0}^{(t_0)}$. Therefore we can choose our heuristic function $h$ based on either the merged size $n_i^{(t+1)}$ or the merged average similarity $f_i^{(t+1)}$. But before the algorithms are introduced, let us first see the relationship after two clusters are combined.
first see the relationship after two clusters are combined.
Claim 1 Suppose in the (t + 1)th merging action, C(t)i and C
(t)j merge, i.e.
C(t+1)i = C
(t)i
⋃C
(t)j . Then
d(t+1)ik =
n(t)i d
(t)ik + n
(t)j d
(t)jk
n(t)i + n
(t)j
,∀k 6= i.
f(t+1)i =
(n(t)i )2f
(t)i + (n
(t)j )2f
(t)j + 2n
(t)i n
(t)j d
(t)ij
(n(t)i + n
(t)j )2
;
Proof.

$$d_{ik}^{(t+1)} = \frac{\sum_{x \in C_i^{(t+1)},\, y \in C_k^{(t+1)}} s_{x,y}}{n_i^{(t+1)} n_k^{(t+1)}} = \frac{\sum_{x \in C_i^{(t)} \cup C_j^{(t)},\, y \in C_k^{(t)}} s_{x,y}}{(n_i^{(t)} + n_j^{(t)}) n_k^{(t)}}$$
$$= \frac{\sum_{x \in C_i^{(t)},\, y \in C_k^{(t)}} s_{x,y}}{(n_i^{(t)} + n_j^{(t)}) n_k^{(t)}} + \frac{\sum_{x \in C_j^{(t)},\, y \in C_k^{(t)}} s_{x,y}}{(n_i^{(t)} + n_j^{(t)}) n_k^{(t)}}$$
$$= \frac{n_i^{(t)} n_k^{(t)} d_{ik}^{(t)}}{(n_i^{(t)} + n_j^{(t)}) n_k^{(t)}} + \frac{n_j^{(t)} n_k^{(t)} d_{jk}^{(t)}}{(n_i^{(t)} + n_j^{(t)}) n_k^{(t)}} = \frac{n_i^{(t)} d_{ik}^{(t)} + n_j^{(t)} d_{jk}^{(t)}}{n_i^{(t)} + n_j^{(t)}}.$$

$$f_i^{(t+1)} = \frac{\sum_{x \in C_i^{(t+1)},\, y \in C_i^{(t+1)}} s_{x,y}}{(n_i^{(t+1)})^2} = \frac{\sum_{x \in C_i^{(t)} \cup C_j^{(t)},\, y \in C_i^{(t)} \cup C_j^{(t)}} s_{x,y}}{(n_i^{(t+1)})^2}$$
$$= \frac{\sum_{x \in C_i^{(t)},\, y \in C_i^{(t)}} s_{x,y}}{(n_i^{(t)} + n_j^{(t)})^2} + \frac{\sum_{x \in C_j^{(t)},\, y \in C_j^{(t)}} s_{x,y}}{(n_i^{(t)} + n_j^{(t)})^2} + \frac{2 \sum_{x \in C_i^{(t)},\, y \in C_j^{(t)}} s_{x,y}}{(n_i^{(t)} + n_j^{(t)})^2}$$
$$= \frac{(n_i^{(t)})^2 f_i^{(t)} + (n_j^{(t)})^2 f_j^{(t)} + 2 n_i^{(t)} n_j^{(t)} d_{ij}^{(t)}}{(n_i^{(t)} + n_j^{(t)})^2}. \qquad \square$$
The heuristic merging algorithm is shown in Algorithm 2.
We can easily verify that when $h(C_i^{(t)}, C_j^{(t)}) = n_i^{(t+1)} = n_i^{(t)} + n_j^{(t)}$ (breaking ties in $n_i^{(t+1)}$ by maximizing $f_i^{(t+1)}$ to avoid trivial cases), the algorithm becomes the naive greedy algorithm, which adds the best node to the original cluster at each step and converges within $n - 1$ steps. When we set $h(C_i^{(t)}, C_j^{(t)}) = f_i^{(t+1)} = \frac{(n_i^{(t)})^2 f_i^{(t)} + (n_j^{(t)})^2 f_j^{(t)} + 2 n_i^{(t)} n_j^{(t)} d_{ij}^{(t)}}{(n_i^{(t)} + n_j^{(t)})^2}$, we allow nodes to merge freely regardless of the current cluster size. Finally, after at most $m - 1$ steps, a cluster with more than $n$ nodes will
Algorithm 2 Heuristic Merging Algorithm

Input: $S_{m \times m}$, a symmetric correlation matrix of real numbers, and $n$, the target size of the cluster.
Output: $C_{i_0}^{(t_0)}$, a cluster of size $n_{i_0}^{(t_0)} \geq n$ with locally maximal average similarity $f_{i_0}^{(t_0)}$.
Initialization: $C_i^{(0)} = \{i\}$, $n_i^{(0)} = 1$, $d_{ij}^{(0)} = s_{ij}$, $f_i^{(0)} = 0$, $\forall i, j = 1, \dots, m$; $t = 0$.
while $\max_i n_i^{(t)} \leq n$ do
  Find clusters $i, j$ such that $h(C_i^{(t)}, C_j^{(t)})$ achieves its maximum;
  $C_i^{(t+1)} \leftarrow C_i^{(t)} \cup C_j^{(t)}$; $n_i^{(t+1)} \leftarrow n_i^{(t)} + n_j^{(t)}$;
  $d_{ik}^{(t+1)} \leftarrow \frac{n_i^{(t)} d_{ik}^{(t)} + n_j^{(t)} d_{jk}^{(t)}}{n_i^{(t)} + n_j^{(t)}}$, $\forall k \neq i$;
  $f_i^{(t+1)} \leftarrow \frac{(n_i^{(t)})^2 f_i^{(t)} + (n_j^{(t)})^2 f_j^{(t)} + 2 n_i^{(t)} n_j^{(t)} d_{ij}^{(t)}}{(n_i^{(t)} + n_j^{(t)})^2}$;
  $t \leftarrow t + 1$;
end while
return $C_i^{(t)}$
appear to be a solution. In the following, we will use $f_i^{(t+1)}$ as the merging heuristic function.
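The merging procedure with the $f^{(t+1)}$ heuristic can be sketched as follows; this is our own illustration built from the update rules of Claim 1, with ties broken by iteration order:

```python
import numpy as np

def heuristic_merge(S, n):
    """Sketch of Algorithm 2: repeatedly merge the pair of clusters that
    maximizes the post-merge average similarity f, until some cluster
    reaches size n; returns that cluster's member indices."""
    m = S.shape[0]
    clusters = [[i] for i in range(m)]
    d = S.astype(float).copy()        # between-cluster average similarity
    f = np.zeros(m)                   # within-cluster average similarity
    sizes = np.ones(m, dtype=int)
    active = set(range(m))

    def merged_f(i, j):
        ni, nj = sizes[i], sizes[j]
        return (ni**2 * f[i] + nj**2 * f[j] + 2 * ni * nj * d[i, j]) / (ni + nj) ** 2

    while max(sizes[k] for k in active) < n and len(active) > 1:
        i, j = max(((a, b) for a in active for b in active if a < b),
                   key=lambda p: merged_f(*p))
        ni, nj = sizes[i], sizes[j]
        for k in active - {i, j}:     # Claim 1 update for between-cluster d
            d[i, k] = d[k, i] = (ni * d[i, k] + nj * d[j, k]) / (ni + nj)
        f[i] = merged_f(i, j)         # Claim 1 update for within-cluster f
        clusters[i] += clusters[j]
        sizes[i] = ni + nj
        active.remove(j)
    return max((clusters[k] for k in active), key=len)
```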
3.2.3 Searching for the Largest Component
The heuristic merging algorithm achieves a local optimal solution step by step. How-
ever, as we mentioned before, we must do it carefully for the first few steps in order
to keep the local optimum not far away from the global optimum. Is there a way to
find a group of highly correlated users in a single step, namely to find a cluster that
maximizes the average similarity $f(C)$? Here $f(C)$ is defined as

$$f(C) = \frac{1}{|C|^2} \sum_{x \in C,\, y \in C} s_{x,y}.$$
A natural way would be to set a threshold to break some edges and find the largest
component from the modified graph.
Here we set a threshold $\gamma$ and convert $S_{m \times m}$ to a graph based on the following rule: if $s_{i,j} > \gamma$, we link the two vertices $v_i$ and $v_j$; otherwise we break the
Algorithm 3 Largest Component Searching Algorithm

Input: $S_{m \times m}$, a symmetric correlation matrix of real numbers, and $n$, the target size of the cluster.
Output: $C$, a cluster with size larger than $n$ and a relatively high average similarity $f(C)$ within the cluster.
Denote $\gamma$ as the translation threshold, $\delta_\gamma$ as the smallest distinguishable threshold step, and $G_{m \times m}$ as the translated graph.
Initialize $\gamma \leftarrow 1$.
repeat
  $\gamma \leftarrow \gamma - \delta_\gamma$
  for each pair $(u, v)$: if $s_{u,v} > \gamma$ then set $u, v$ connected in $G$, else set $u, v$ unconnected in $G$
  Find the largest component in $G$, denoted $C$
until $|C| \geq n$
return $C$
link between two vertices. When γ is close to 1, all the vertices are separate from each
other. As γ decreases, the original separate components connect with each other due
to high correlation. In this case, the largest component in the graph, denoted as C,
can be derived based on the classic algorithm introduced in [18] for an approximate
solution of Eq. (3.1). Here, we can always choose a proper γ to make sure the size of
C is around a prefixed number n. In the following step, we will further refine the set
and automatically determine the final fake profile size n. The algorithm is shown in
Algorithm 3.
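A plain-Python sketch of Algorithm 3, with the thresholded graph stored implicitly and components found by depth-first search; `delta` plays the role of $\delta_\gamma$:

```python
def largest_component(S, n, delta=0.01):
    """Sketch of Algorithm 3: lower the threshold gamma until the largest
    connected component of the thresholded graph has at least n vertices."""
    m = len(S)
    gamma, best = 1.0, []
    while gamma > -1.0:               # similarities lie in [-1, 1]
        gamma -= delta
        seen, best = set(), []
        for start in range(m):        # DFS over components of {s_ij > gamma}
            if start in seen:
                continue
            comp, stack = [], [start]
            seen.add(start)
            while stack:
                u = stack.pop()
                comp.append(u)
                for v in range(m):
                    if v != u and v not in seen and S[u][v] > gamma:
                        seen.add(v)
                        stack.append(v)
            if len(comp) > len(best):
                best = comp
        if len(best) >= n:
            break
    return best, gamma
```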
3.2.4 Iterative Refinement
We expect that the solution derived from either heuristic merging or largest component searching will contain a certain level of noise, since many genuine profiles could be highly similar to at least one of the fake profiles. Under either algorithm, a real user's profile could be classified as an attack profile when a single heavily weighted edge connects it to the attack group. However, the attack profiles share high similarity
Algorithm 4 Iterative Refinement

Input: $S_{m \times m}$, a symmetric correlation matrix of real numbers, and $C$, a cluster with a high average similarity $f(C)$ within the cluster.
Output: $C$, a cluster with a size larger than $n$ and a relatively high average similarity $f(C)$ within the cluster.
while $C \neq C_0$ do
  $C_0 = C$
  for $i = 1$ to $s$ do
    $\mathrm{Cor}_u \leftarrow \frac{\sum_{v \in C} s_{u,v}}{|C|}, \forall u \notin C$
    $C = C \cup \{\arg\max_{u \notin C} \mathrm{Cor}_u\}$
  end for
  for $i = 1$ to $s$ do
    $\mathrm{Cor}_u \leftarrow \frac{\sum_{v \in C} s_{u,v}}{|C|}, \forall u \in C$
    $C = C \setminus \{\arg\min_{u \in C} \mathrm{Cor}_u\}$
  end for
end while
return $C$
with all the other attack profiles. Therefore we need a further refinement process for
this algorithm.
Here a greedy algorithm is applied for refinement. We separate this process into two steps, addition and deletion, with a pre-set number $s$ of addition/deletion moves per round. For the addition step, the average similarity of each profile outside $C$ with the profiles in $C$ is calculated, and the one with the highest average similarity is added; this is repeated $s$ times. For the deletion step, we calculate the average similarity within the set $C$ and remove the least correlated profile from $C$, again repeating $s$ times. We alternate $s$ additions and $s$ deletions until convergence is achieved. Note that we usually do not choose $s = 1$ but a larger number, say $s = 10$ in this case. This helps avoid local optima and pushes the final result toward the global optimum. The algorithm is shown in Algorithm 4.
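A sketch of the refinement loop of Algorithm 4 in Python; for simplicity we include each user's self-similarity in the deletion scores, which does not change the ordering materially:

```python
import numpy as np

def refine(S, group, s=10):
    """Sketch of Algorithm 4: alternately add the s outside users most
    similar to the group and drop its s least-correlated members, until
    the group no longer changes."""
    m = S.shape[0]
    C, prev = set(group), None
    while C != prev:
        prev = set(C)
        for _ in range(s):            # addition step
            outside = [u for u in range(m) if u not in C]
            if not outside:
                break
            score = {u: S[u, list(C)].sum() / len(C) for u in outside}
            C.add(max(score, key=score.get))
        for _ in range(s):            # deletion step
            score = {u: S[u, list(C)].sum() / len(C) for u in C}
            C.remove(min(score, key=score.get))
    return sorted(C)
```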
3.3 Spectral Clustering Detection
3.3.1 Spectral Clustering
A connection between data matrices and graphs has been exploited in Section 3.2.
A data matrix can be viewed as a weighted graph G = (V,E) where V is the set
of vertices and E is the set of edges. Each vertex vi ∈ V denotes a corresponding
column/row, and each edge between vi and vj has a weight si,j. Therefore the adja-
cency matrix is represented by Sm×m. To find a highly correlated group in the graph,
our aim is to maximize the intragroup correlations and to minimize the intergroup
correlations. Given a subset of vertices $C \subset V$ and its complement $\bar{C}$, we define $\mathrm{Cut}(C, \bar{C})$ as the cost function of a graph separation:

$$\mathrm{Cut}(C, \bar{C}) = \sum_{i \in C,\, j \in \bar{C}} s_{i,j}.$$
In order to get approximately equal sizes of the two groups, we incorporate the group sizes and define $\mathrm{RatioCut}(C, \bar{C})$ as the minimization objective:

$$\mathrm{RatioCut}(C, \bar{C}) = \frac{\sum_{i \in C,\, j \in \bar{C}} s_{i,j}}{|C|} + \frac{\sum_{i \in C,\, j \in \bar{C}} s_{i,j}}{|\bar{C}|}. \qquad (3.2)$$
Instead of optimizing Eq.(3.1), here in this section we will optimize Eq.(3.2) to get
the highly correlated group for fake user detection.
To rewrite the RatioCut function, let us first define the degree of a vertex $v_i \in V$ as

$$d_i = \sum_{j=1}^{m} s_{i,j}.$$
The degree matrix $D$ is defined as the diagonal matrix with the degrees $d_1, \dots, d_m$ on the diagonal:

$$D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_m \end{pmatrix}.$$
The unnormalized graph Laplacian matrix is defined as
L = D − S.
We further define the vector $f = (f_1, \dots, f_m)' \in \mathbb{R}^m$ with entries

$$f_i = \begin{cases} \sqrt{|\bar{C}|/|C|}, & \text{if } v_i \in C, \\ -\sqrt{|C|/|\bar{C}|}, & \text{if } v_i \in \bar{C}. \end{cases} \qquad (3.3)$$
The RatioCut function can then be rewritten as follows:

$$f'Lf = f'Df - f'Sf = \sum_{i=1}^m d_i f_i^2 - \sum_{i=1}^m \sum_{j=1}^m f_i f_j s_{i,j}$$
$$= \frac{1}{2} \left( \sum_{i=1}^m d_i f_i^2 - 2 \sum_{i=1}^m \sum_{j=1}^m f_i f_j s_{i,j} + \sum_{j=1}^m d_j f_j^2 \right) = \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m s_{i,j} (f_i - f_j)^2$$
$$= \frac{1}{2} \sum_{i \in C,\, j \in \bar{C}} s_{i,j} \left( \sqrt{\frac{|\bar{C}|}{|C|}} + \sqrt{\frac{|C|}{|\bar{C}|}} \right)^2 + \frac{1}{2} \sum_{i \in \bar{C},\, j \in C} s_{i,j} \left( -\sqrt{\frac{|C|}{|\bar{C}|}} - \sqrt{\frac{|\bar{C}|}{|C|}} \right)^2$$
$$= \mathrm{Cut}(C, \bar{C}) \left( \frac{|\bar{C}|}{|C|} + \frac{|C|}{|\bar{C}|} + 2 \right) = \mathrm{Cut}(C, \bar{C}) \left( \frac{|C| + |\bar{C}|}{|C|} + \frac{|C| + |\bar{C}|}{|\bar{C}|} \right) = |V| \, \mathrm{RatioCut}(C, \bar{C}).$$
We know that

$$\sum_{i=1}^m f_i = \sum_{i \in C} \sqrt{\frac{|\bar{C}|}{|C|}} - \sum_{i \in \bar{C}} \sqrt{\frac{|C|}{|\bar{C}|}} = |C| \sqrt{\frac{|\bar{C}|}{|C|}} - |\bar{C}| \sqrt{\frac{|C|}{|\bar{C}|}} = 0$$

and

$$\|f\|^2 = \sum_{i=1}^m f_i^2 = |C| \frac{|\bar{C}|}{|C|} + |\bar{C}| \frac{|C|}{|\bar{C}|} = |V|.$$
From the derivation above, minimizing the RatioCut function is equivalent to minimizing $f'Lf$. From [59], if we relax the condition on $f$ and allow its entries to be continuous instead of the two predefined values in Eq. (3.3), the nontrivial optimal solution minimizing $f'Lf$ is the eigenvector corresponding to the second smallest eigenvalue of $L$, disregarding the smallest eigenvalue $0$ with its trivial eigenvector $(1, \dots, 1)'$. Based on the second eigenvector, we can map the values back to the two corresponding values in Eq. (3.3) and obtain the clustering.
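The relaxed RatioCut solution translates into a few lines of numpy, sketched below; the sign of the eigenvector is arbitrary, so the two returned index sets are unordered:

```python
import numpy as np

def spectral_bipartition(S):
    """Split vertices by the sign of the eigenvector belonging to the
    second smallest eigenvalue of the unnormalized Laplacian L = D - S."""
    d = S.sum(axis=1)
    L = np.diag(d) - S
    _, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]              # second smallest eigenvalue's vector
    pos = fiedler >= 0
    return np.where(pos)[0], np.where(~pos)[0]
```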
3.3.2 Dealing with Unbalanced Structure
Standard spectral clustering creates a division into two roughly equal-sized clusters based on the RatioCut function in Eq. (3.2). However, in our case the target group size satisfies $|C| \ll |V|$. The final optimization problem would be

$$\min_{C \subset V,\, |C| \leq \eta |V|} \mathrm{Cut}(C, \bar{C}) = \sum_{i \in C,\, j \in \bar{C}} s_{i,j}, \qquad (3.4)$$
where $\eta$ is an upper bound on the attack size. Here, $|V| = m$ and $\eta |V| = n$. To deal with unbalanced data, we adopt the rank-adjusted degree graph for separation [48]. Basically, it can be divided into three steps.
1. Rank Computation: The density rank $R(v_i)$ for each vertex $v_i$ is calculated based on the underlying density function $\rho(\cdot)$:

$$R(v_i) = \frac{1}{|V|} \sum_{v_k \in V} I_{\rho(v_i) \geq \rho(v_k)}.$$

Here, we use the average weight of $v_i$'s top 30 nearest neighbors as $\rho(v_i)$.
2. Graph Construction: Connect each point $v_i$ to its $d(v_i)$ nearest neighbors in graph $G$, where

$$d(v_i) = d_0 (\lambda + 2(1 - \lambda) R(v_i)),$$

and $\lambda$ is a scalar parameter that controls the degree of imbalance. For all other edges, we set the weight to zero. In the following, we take $\lambda = 0.5$.
3. Graph Separation: Calculate the second smallest eigenvector of L and separate
it iteratively for the highly correlated group.
The reason we adjust the degree is that we want to add edges in dense areas while reducing edges in sparse areas. With this adjustment, the penalty for having a smaller cluster is reduced, since edges have already been cut in the sparse area. After the adjustment, when we apply the standard spectral clustering method, the algorithm can automatically cut the graph into two unbalanced parts. The smaller cluster, with higher average similarity, will contain the fake user group $C_F$.
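The rank-adjustment step can be sketched as follows; this is our own illustration, where `k` corresponds to the 30 nearest neighbors used for the density $\rho$ and is reduced automatically for small graphs:

```python
import numpy as np

def adjusted_degrees(S, d0=10, lam=0.5, k=30):
    """Rank-adjusted target degree d(v_i) = d0 * (lam + 2*(1-lam)*R(v_i)),
    where the density rho(v_i) is the average similarity of v_i to its
    top-k nearest neighbors and R(v_i) is rho's empirical rank."""
    m = S.shape[0]
    kk = min(k, m - 1)
    rho = np.array([np.sort(np.delete(S[i], i))[-kk:].mean() for i in range(m)])
    R = np.array([(rho <= rho[i]).mean() for i in range(m)])  # rank in [0, 1]
    return d0 * (lam + 2 * (1 - lam) * R)
```

Dense vertices thus receive up to $1.5\,d_0$ neighbors while sparse ones receive as few as $0.5\,d_0$ when $\lambda = 0.5$.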
Denote by $T$ the pending-processing stack, into which we initially insert the user set $U$. We pop each element $V$ from $T$, apply this algorithm to separate $V$ into two subgroups $V_1$ and $V_2$, and push them onto $T$, until the group size is less than or equal to $n$. Finally, the group of size at most $n$ with the highest average similarity within the group is selected as $C_F$. The complete algorithm is shown in Algorithm 5.
3.3.3 Iterative Refinement
We expect that the solution derived from spectral clustering will contain a certain
level of noise since many genuine profiles could be highly similar with some of the fake
profiles. Therefore sometimes a real user’s profile could be classified as an attack pro-
file when several edges with large weight are connected to the attack group. However
the attack profiles share high similarity with all the other attack profiles while these
real profiles only share high similarity with a few attack profiles. Therefore we need a
further refinement process to remove these real profiles. The procedure is addressed
in Section 3.2.4.
Algorithm 5 Spectral Clustering Attack Detection Algorithm

Input: $S_{m \times m}$, a symmetric correlation matrix.
Goal: Find $C_F$, a cluster of size $\leq n$ with a high average similarity within the cluster.
Push the user set $U$ onto the pending stack $T$
while nonempty($T$) do
  Pop $V$ from $T$
  $R(u_i) = \frac{1}{|V|} \sum_{u_k \in V} I_{\rho(u_i) \geq \rho(u_k)}$ for each $u_i \in V$
  $d(u_i) = d_0 (\lambda + 2(1 - \lambda) R(u_i))$ for each $u_i \in V$
  Set $s_{i,j} = 0$ if $u_j$ is not among the top $d(u_i)$ nearest neighbors of $u_i$
  Calculate the graph Laplacian matrix $L = D - S$
  Compute the singular value decomposition $L = M \times \Sigma \times N$
  Take the second eigenvector of $L$ and separate $V$ into two groups $V_1$ and $V_2$ based on signs
  if $|V_1| \leq n$ or $|V_2| \leq n$ then
    if AvgSim($V_1$) $\geq$ max or AvgSim($V_2$) $\geq$ max then
      $C_F = V_1$ or $V_2$; max = AvgSim($V_1$) or AvgSim($V_2$)
    end if
  else
    Push $V_1$ and $V_2$ onto $T$
  end if
end while
return $C_F$
We can further analyze the robustness of the refinement procedure. If both ui
and uj are fake users, assume that the similarity si,j between them is drawn from
N(µF , σ2); otherwise, the similarity (between two real users or between a real user
and a fake user) is drawn from N(µR, σ2). We further assume that they are all
independent. Here µF > µR due to the high correlation between fake users. Suppose
we have found a highly correlated group of users CR with size n. Let x denote the
current percentage of fake users in the group. Then for a new fake user ui coming
from outside of the group, the total similarity distribution with users in the group
$C_R$ is

$$\mathrm{Cor}^F_{u_i} = \frac{1}{n} \sum_{u_j \in C_R} s_{i,j} = \frac{1}{n} \Biggl( \sum_{\substack{u_j \in C_R \\ u_j \text{ fake}}} s_{i,j} + \sum_{\substack{u_j \in C_R \\ u_j \text{ real}}} s_{i,j} \Biggr) = \frac{1}{n} \Biggl( \sum_{j=1}^{xn} \Omega_j + \sum_{j=1}^{(1-x)n} \omega_j \Biggr) \sim N\Bigl( \mu_R + x(\mu_F - \mu_R),\ \frac{\sigma^2}{n} \Bigr),$$

where $\Omega_j \sim N(\mu_F, \sigma^2)$ and $\omega_j \sim N(\mu_R, \sigma^2)$.
For a real user $u_i$, the total similarity distribution is

$$\mathrm{Cor}^R_{u_i} = \frac{1}{n} \sum_{u_j \in C_R} s_{i,j} = \frac{1}{n} \sum_{j=1}^{n} \omega_j \sim N\Bigl( \mu_R,\ \frac{\sigma^2}{n} \Bigr).$$
Thus, $\mathrm{Cor}^F_{u_i} - \mathrm{Cor}^R_{u_i} \sim N\bigl( x(\mu_F - \mu_R),\ \frac{2\sigma^2}{n} \bigr)$. A fake user is selected in the refinement procedure with probability

$$P(\mathrm{Cor}^F_{u_i} > \mathrm{Cor}^R_{u_i}) = P(\mathrm{Cor}^F_{u_i} - \mathrm{Cor}^R_{u_i} > 0) = \Phi\Bigl( \frac{x \sqrt{n}\, (\mu_F - \mu_R)}{\sqrt{2}\, \sigma} \Bigr), \qquad (3.5)$$

where $\Phi(x)$ is the cumulative distribution function (CDF) of the standard normal distribution,

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt.$$
Similarly, a real user is removed from the target group in the refinement procedure with the same probability as in Eq. (3.5). We can see that the robustness depends on the original fake user percentage $x$, the target group size $n$, and the statistical difference between fake and real users, $\frac{\mu_F - \mu_R}{\sigma}$. A larger proportion of fake users, a larger group size, and a larger difference between fake and real users all make the refinement procedure more robust.
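Eq. (3.5) can be evaluated numerically with the standard normal CDF written via the error function; the parameter values in the usage below are hypothetical:

```python
import math

def fake_selection_prob(x, n, mu_f, mu_r, sigma):
    """Probability in Eq. (3.5) that refinement prefers a fake user:
    Phi(x * sqrt(n) * (mu_f - mu_r) / (sqrt(2) * sigma))."""
    z = x * math.sqrt(n) * (mu_f - mu_r) / (math.sqrt(2) * sigma)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # Phi(z)
```

The probability grows with the fake-user fraction $x$, the group size $n$, and the separation $(\mu_F - \mu_R)/\sigma$, matching the discussion above.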
3.3.4 Searching for the Number of Attack Profiles
Since we do not have any prior knowledge, it is difficult to find the right number of
attack profiles n. We know that when the size of the group gets larger, the average
similarity within the group would become lower. Therefore for the target profiles with
attacks, our algorithm varies the size of the highly correlated group to get a sequence,
G(n), denoting the maximal average similarity of a group with size n. Suppose G0(n)
is the maximal average similarity sequence for profiles without attacks. The optimal
number of fake profiles n∗ is then obtained from Eq.(3.6) as
$$n^* = \arg\max_n \bigl( G(n) - G_0(n) \bigr) \qquad (3.6)$$
Therefore, for the detection task, we first formulate it as an optimization problem in Eq. (3.1). We construct the similarity graph with edge adjustment to deal with the unbalanced clustering, and then apply the spectral clustering algorithm iteratively to find the most correlated group of a fixed size $n$, as shown in Algorithm 5. We then refine the result using the greedy procedure in Algorithm 4. Finally, we vary the size $n$ and find the attack size $n^*$ that maximizes Eq. (3.6).
3.4 Experiments
3.4.1 Experimental Setup
In the experiments, we use the MovieLens dataset [28]. It contains around 100,000
ratings from 943 users on 1,682 items. Each user rates at least 20 movies on a scale
from 1 to 5. The density of the rating matrix is 6.3%. 80% of the ratings are randomly
selected as the training set and the rest as testing.
We randomly pick 20 movies as the attack items for each test and artificially insert
50, 70 or 100 attack profiles (corresponding to 5%, 7% and 10% attack size) with filler
size 5%. We choose such a filler size because it is consistent with the real user profiles.
Each movie is attacked individually and the average is reported in the results.
Since a general random attack cannot have a big impact on the final prediction, we generate ratings based on $N(\bar{r}, (0.7\sigma)^2)$ for an enhanced random attack, while for an average attack we generate ratings based on $N(\bar{r}_l, \sigma_l^2)$. For a bandwagon
for an average attack, we generate ratings based on N(rl, σl2). For a bandwagon
attack, we select movies 50, 56, 100, 127, 174, 181 as the selected set LS and rate
them as rmax = 5. We select these movies because they are rated by more than 300
users and have an average rating larger than 4, which means they are very popular
in the system. We further generate two obfuscated attack models. One is a noisy
bandwagon attack, a special case of Average Over Popular Items (AOP) mentioned
in [31], which randomly selects 3 out of the 6 popular movies mentioned above rated
as rmax to avoid high correlations. The other is a mixed attack [10], which combines
a 3% or 5% average attack together with a 3% or 5% noisy bandwagon attack, to
make the attack model diversified and difficult to identify.
We compare our spectral clustering (SC) algorithm with RDMA [17], DegSim [17],
UnRAP [13] and large component searching (LC) algorithms [68]. In the experiments,
since we have no prior knowledge of the exact number of attack profiles, our algorithm
and LC derive n∗ from Eq.(3.6). For RDMA, DegSim, and UnRAP, we assume the
exact number of attack profiles is known, which yields the same precision and
recall in the results.
3.4.2 Assumption Validation
From [22], if r_{u,i1}, ..., r_{u,ip} and r_{v,i1}, ..., r_{v,ip} are generated from G(r̄, σ²) with no
correlations, the pdf of s_{u,v} is proportional to f(s) = (1 − s²)^((p−4)/2). To obtain the
distribution of the average of multiple independent similarities, we can simply convolve
f(s). In the real case, p is the number of items co-rated by the two users u
and v. We can then plot the average similarity pdf and the 99th percentile of the
average similarity in Fig.3.1 and Fig.3.2.
Figure 3.1: The Average Similarity PDF for Different Group Size
In Fig.3.1, we can see that as the group size grows, the pdf concentrates more
tightly around its mean. We further examine the 99th percentile value for different
group sizes in Fig.3.2; the shape of the curve is similar to the real users’ maximal
Figure 3.2: The 99% Percentile Average Similarity Value
average similarity in Fig.3.3, but a little lower. The reason is that the calculated 99th
percentile of the average similarity assumes the similarities are independent of each
other, whereas in reality high correlations exist.
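The null pdf and its 99th percentile can be checked numerically; a minimal sketch (the grid size and function names are my own):

```python
import numpy as np

def null_pdf(s, p):
    """Unnormalized null pdf of the similarity for p co-rated items."""
    return (1.0 - s**2) ** ((p - 4) / 2.0)

def percentile_99(p, grid=200001):
    """99th percentile of the null similarity distribution via a numeric CDF."""
    s = np.linspace(-1.0, 1.0, grid)
    cdf = np.cumsum(null_pdf(s, p))
    cdf /= cdf[-1]                         # normalize to a proper CDF
    return float(s[np.searchsorted(cdf, 0.99)])

# More co-rated items -> the null distribution tightens around 0,
# so the 99th percentile shrinks.
q10, q100 = percentile_99(10), percentile_99(100)
```

This reproduces the qualitative behavior in Fig.3.1 and Fig.3.2: larger p concentrates the pdf and lowers the 99th percentile.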
3.4.3 Searching for the Number of Attack Profiles
We run the spectral clustering algorithm first on the real user profiles to get the
relationship between maximal average similarity and group size G0(n). G0(n) is
shown as a solid line in Fig.3.3. We can see that G0(n) first decays very fast but
then begins to decay slowly. To analyze the characteristics of G0(n), we segment the
function into two sections and fit an exponential curve and a linear curve, respectively.
The expression of the fitted curve G∗0(n) is shown in Eq.(3.7) and the curve is
Figure 3.3: G0(n) and Fitting Curve
shown by the dashed line in Fig.3.3.
G∗0(n) = 0.686 e^(−0.117n) + 0.412,   if n ∈ (1, 100];
G∗0(n) = −2.26 × 10^(−4) n + 0.378,   if n ∈ (100, 943].   (3.7)
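Eq.(3.7) is straightforward to evaluate; the sketch below simply implements the published fit (it does not re-estimate the coefficients):

```python
import numpy as np

def g0_fit(n):
    """Evaluate the fitted baseline curve G*_0(n) of Eq. (3.7)."""
    n = np.asarray(n, dtype=float)
    expo = 0.686 * np.exp(-0.117 * n) + 0.412     # fast decay, n in (1, 100]
    lin = -2.26e-4 * n + 0.378                    # slow decay, n in (100, 943]
    return np.where(n <= 100, expo, lin)

# The exponential branch dominates for small groups, the nearly flat
# linear branch for large ones, matching the two regimes of G0(n).
```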
The real users’ behavior and the real users plus 100 random/average/bandwagon
attackers’ behaviors are drawn in Fig.3.4, 3.5, 3.6. The difference is also shown by
the dashed line. We can see that the difference is maximized when the group size
is around 100 for random attacks and average attacks. However, for bandwagon
attacks, the group size is around 110. This is because real users who are similar to
the fake user group are counted in as well. Further analysis is given in the
following sections.
Figure 3.4: Group Size Vs Average Similarity for 100 Random Attackers
3.4.4 Evaluation Metrics
For the classification of attack profiles, we use precision and recall to evaluate the
performance of the detection algorithm:
Precision = TP / (TP + FP),
Recall = TP / (TP + FN),
where TP is the number of attack profiles correctly detected, FP is the number of real
user profiles misclassified as attack profiles and FN is the number of attack profiles
misclassified as real user profiles.
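These two metrics can be computed directly from the detected and true attacker sets; a small self-contained sketch:

```python
def precision_recall(detected, attackers):
    """Precision and recall for attack-profile detection.

    detected : set of user ids flagged as attack profiles
    attackers: set of true attack-profile ids
    """
    tp = len(detected & attackers)     # attack profiles correctly detected
    fp = len(detected - attackers)     # real users misclassified as attackers
    fn = len(attackers - detected)     # attack profiles missed
    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / (tp + fn) if attackers else 0.0
    return precision, recall

# 3 true positives, 1 false positive, 1 false negative:
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 4, 5})
```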
Figure 3.5: Group Size Vs Average Similarity for 100 Average Attackers
3.4.5 Experimental Results and Discussion
The experimental results comparing our proposed SC with LC, RDMA, DegSim and
UnRAP are shown in Table 3.2. We can separate all these algorithms into two classes
based on their underlying assumptions. Both UnRAP and RDMA assume fake users
rate items with lower variance. DegSim, LC and SC assume fake users are highly
correlated and work together.
SC performs very well in most cases, especially in the enhanced random attack
case while the other algorithms lose their effectiveness. The reason is that the random
attack group has the lowest correlations among all the attack models. When the
attack size is small, there is no significant difference between genuine profiles and
fake profiles. Therefore even LC cannot easily find the right size n∗ due to the similar
values of G(n) and G0(n). However, our SC algorithm starts from the global effect
Figure 3.6: Group Size Vs Average Similarity for 100 Bandwagon Attackers
and uses the eigenvector of the second-smallest eigenvalue of the graph Laplacian L to cut the graph
with minimal cost. Therefore, it is more stable and more effective.
UnRAP does the worst job in most cases because it assumes a fake user’s ratings
have lower variance and are related to the column mean, row mean, and overall mean.
However, random attacks and bandwagon attacks do not satisfy this assumption,
leading to poor results. The average attacks fit this assumption well so that the
algorithm does a better job but is still not accurate enough. RDMA starts from a
similar assumption to evaluate the genuineness of profiles, but is still not effective
due to the limitation of the assumption.
The DegSim method does not perform well because essentially it starts from the
assumption that fake profiles are highly correlated, but only focuses on the k nearest
neighbors instead of overall neighbors. SC starts from the same assumption but gets
the global optimal solution from min-cut algorithms. Note that SC separates real
and fake profiles quite well but the estimate of n∗ usually contains some error. As a
result, sometimes either high precision or recall is achieved but not both. Even in the
two obfuscated attack models with lower correlations, SC still performs well while the
performance of the other algorithms decreases significantly. Overall, SC outperforms
the existing methods, especially when fake profiles are highly correlated.
Table 3.2: Experimental Results

Attack Model      Enhanced Random Attack   Average Attack           Bandwagon Attack         Average Over Popular Items   Mixed Attack
Attack Size       5%      7%      10%      5%      7%      10%      5%      7%      10%      5%      7%      10%          3%+3%   5%+5%

Precision
SC                99.8%   99.9%   99.0%    99.3%   99.9%   99.9%    92.3%   95.7%   91.7%    88.9%   92.4%   93.9%        99.8%   99.0%
LC                0.3%    21.2%   53.6%    83.2%   99.8%   99.2%    92.7%   91.5%   90.5%    90.6%   89.3%   87.8%        92.1%   95.0%
DegSim            5.2%    5.7%    18.7%    23.1%   36.9%   60.3%    74.5%   77.3%   81.2%    64.3%   72.2%   72.1%        43.4%   66.4%
RDMA              72.3%   74.4%   78.3%    74.2%   76.2%   77.2%    72.5%   79.0%   81.2%    71.4%   78.2%   81.6%        33.3%   57.3%
UnRAP             1.0%    1.4%    3.0%     48.2%   48.8%   68.2%    8.7%    9.6%    28.1%    6.7%    12.6%   23.1%        23.3%   33.3%

Recall
SC                90.7%   92.9%   94.1%    91.0%   92.9%   99.9%    99.7%   96.1%   99.7%    89.2%   93.6%   92.1%        92.3%   94.2%
LC                0.3%    46.3%   64.2%    81.0%   94.6%   99.0%    99.8%   99.9%   99.5%    96.2%   93.1%   92.1%        97.1%   96.8%
DegSim            5.2%    5.7%    18.7%    23.1%   36.9%   60.3%    74.5%   77.3%   81.2%    64.3%   72.2%   72.1%        43.4%   66.4%
RDMA              72.3%   74.4%   78.3%    74.2%   76.2%   77.2%    72.5%   79.0%   81.2%    71.4%   78.2%   81.6%        33.3%   57.3%
UnRAP             1.0%    1.4%    3.0%     48.2%   48.8%   68.2%    8.7%    9.6%    28.1%    6.7%    12.6%   23.1%        23.3%   33.3%

Prediction Shift
SC                0.04    0.05    0.07     0.01    0.03    -0.01    0.02    0.01    0.05     0.10    0.17    0.12         0.02    0.06
LC                0.58    0.57    0.52     0.13    -0.01   0.01     0.01    0.01    0.03     0.07    0.13    0.29         0.01    -0.03
DegSim            0.53    0.65    0.71     0.85    0.91    0.90     0.51    0.65    0.74     0.57    0.64    0.65         0.72    0.79
RDMA              0.21    0.28    0.34     0.38    0.42    0.57     0.51    0.62    0.69     0.52    0.65    0.68         0.93    0.96
UnRAP             0.57    0.70    0.79     0.65    0.70    0.81     1.08    1.28    1.30     1.08    1.17    1.29         1.03    1.15
Chapter 4
Location-based Recommender
Systems
Location-based recommender systems have attracted a large number of users in recent
years since wireless networks and mobile devices have rapidly developed. Real-time
location-based recommender systems should take location, temporal information, and
social network information into consideration, in order to improve the user experi-
ence. In this chapter, we first review the development of augmented reality in recent
years from a content generation perspective, serving as an introduction to location-
based recommender systems. Then we present an aggregated random walk algorithm
incorporating personal preferences, location information, temporal information, and
social network information in a layered graph. By adaptively changing the graph edge
weight and computing the rank score, the proposed location-based recommender sys-
tem predicts users’ preferences and provides the most relevant recommendations with
aggregated information. Section 4.1 reviews recent applications, technologies, and
current trends in augmented reality. Section 4.2 briefly introduces the background
of location-based recommender systems and Section 4.3 formulates the problem as
a multi-dimensional recommender system. In Section 4.4, a biased random walk al-
gorithm is introduced to incorporate all available information. Finally experimental
results are shown in Section 4.5.
4.1 Introduction to Augmented Reality
4.1.1 AR Ecosystem
Augmented Reality (AR) has become an emerging technology in daily life. With
accurate location information, virtual objects can be integrated with the real world,
which allows users to interact between the real and virtual world. In the work of
Azuma in 1997 [5], three characteristics of AR are identified:
• Combine real and virtual objects in a real environment;
• Run interactively in both 3D and real time;
• Align real and virtual objects with each other.
AR technologies, both hardware and software, have rapidly developed in the past
several years [71], and the market has driven the development of more commercial AR
applications (e.g., Layar, Google Glass, and Wikitude). We envision an AR ecosystem
which, with content as the core, brings together content providers, users, AR ap-
plication developers, AR device manufacturers, and industrial and academic researchers,
transforming the current AR landscape (the way iTunes has changed mobile ap-
plication distribution). The AR ecosystem framework is shown in Fig.4.1. Content
providers will aggregate data from third party companies such as Google or Wikipedia,
local broadcasting sources, environmental sensors, and users, generate AR content,
and export general APIs to support a large range of AR applications. Users will
not only consume AR content and services but also will generate their own content
(e.g., locations or local information), thanks to the sensing abilities of their smart devices.
Figure 4.1: AR Ecosystem Framework
They will also contribute to the system in a crowd-sourcing approach, like the current
YouTube model. AR device manufacturers focus on hardware design such as GPS,
sensors, displays, or integration like smartphones or AR glasses. Researchers can con-
tribute by inventing advanced techniques in tracking, computer vision, ad hoc and
opportunistic data/content delivery and dissemination, mobile computing, display,
energy efficiency, etc. With the ecosystem, AR application developers do not need
to collect data, design their AR devices, or propose their own tracking algorithms.
Instead, they can use standard APIs to get data packages from content providers, and
embed existing AR-related algorithms into their devices made by third party manu-
facturers. Meanwhile, users can contribute through interaction with the ecosystem.
Each party in the ecosystem plays its own role, improves the efficiency of the whole
AR environment, and makes it more sustainable and extendable.
Recent advances in hardware and software for AR have been reviewed in several
survey papers [4, 5, 57, 71]. Localization and calibration have been the most difficult
challenges since AR was first proposed in the 1960s. Current sensor networks apply
multi-sensor systems and cooperative localization algorithms to overcome them [60].
With the rapid development of wireless communication such as 3G and WiFi, the
communication and data exchange among the components of the AR ecosystem
are easy to implement. Ad hoc and opportunistic communication further provides a
scalable way to deliver AR content to users, especially in the current era of mobile
data explosion. The maturity of mobile computing along with the development of
AR-related algorithms lays a good foundation for the AR ecosystem in both hardware
and software. However, a natural challenge is how to generate the content for AR.
Therefore, a core component of the AR ecosystem would be content generation [6].
Content providers aggregate data from multiple sources, process and generate struc-
tured content which will be displayed to the users by AR devices, and thereby enable
the interaction with virtual objects.
4.1.2 Content Fusion in AR
A large amount of information is available online. However, display screens of AR
systems are usually small and narrow. Therefore, after gathering enough data from
content providers, an AR system will integrate multiple data streams that represent
the same real-world object, keeping the captured information consistent, accurate,
and informative. Content fusion plays an important role in user experience and
effective methods are necessary to determine what to display on the screen. Limited
by the computing power of local devices, current AR systems usually implement
content fusion offline, select highly related information, and store it in the database
in advance [73]. However, since AR systems operate interactively between real and
virtual objects in real time, the online content selection would be a main approach
to help users get the most relevant information [58,60,70].
For the content fusion pipeline, we can refer to the three-tier model proposed by
Reitmayr and Schmalstieg [49, 53]. The first tier is a database, where data is ac-
quired from third party companies. The second tier is delivery, where the data in the
database is restructured to meet the specific use of the applications. The third tier is
Figure 4.2: Content Fusion Pipeline
for different applications to use, which corresponds to online content recommendation.
The pipeline is shown in Fig.4.2.
Offline Data Preprocessing
Information integration in the database can be regarded as offline content selection or
data preprocessing. Usually AR application developers do not execute the function
of content providers at the same time. They will download standardized format
content packages from third party companies such as Google, Wikipedia, Yelp, etc.
Therefore highly related and structured information is selected and assembled by
these websites in advance. However, some AR application developers still want to
personalize their AR system content by using their own expertise. Zhu et al. [73]
propose an AR shopping assistant providing dynamic contextualization to customers.
Product context is utilized and complementary products are aligned with each other
in the database in advance. When customers are interested in some specific items,
the shopping assistant automatically provides recommendations for closely related
products as well.
Online Content Selection
For online fusion, information is automatically selected in real time, depending on the
particular location, orientation, and user preference. In 2000, Julier et al. [33] introduced
the concept of information filtering to automatically select content for users.
They also specified some characteristics and desirable properties of online content
selection procedures.
• Any object, of any type, at any point in time, can become sufficiently important
when it passes the filtering criteria.
• Certain objects are important to all users at all times.
• Some objects are important only to particular users.
• All things being equal, the amount of information shown to a user about an
object is inversely proportional to the distance of that object from the user.
Filtering criteria help to evaluate whether a certain object is important enough
for a specific user. Based on the filtering criteria, there are three kinds of information
filtering methods.
• Distance-based filtering: It thresholds an object’s visibility based on its distance
from the user. If the distance is larger than a pre-set threshold, information
about the object would be invisible to the user. However, some soft-threshold
methods have been proposed as well. One example is the Touring Machine [21], in
which the brightness of augmented labels decreases as they move farther from the
center.
• Visibility-based filtering: The visibility of virtual objects depends on whether
the real objects are visible to the user at the current time. It automatically prevents
extra information about invisible objects from being displayed on the screen.
• Advanced filtering: Benford et al. establish a spatial model, using focus and
nimbus to determine the importance of objects [9]. [33] proposes hybrid filtering,
which combines a spatial model and logic rules together with knowledge of the
user’s objectives.
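As a toy illustration of distance-based filtering with a soft threshold (the cutoff value and the linear fade are hypothetical choices of mine, not taken from [21] or [33]):

```python
def label_brightness(distance, cutoff=100.0):
    """Brightness of an augmented label under soft distance-based filtering.

    Full brightness at the user's position, fading linearly to invisible
    at `cutoff` meters; beyond the cutoff the label is hidden entirely.
    """
    if distance >= cutoff:
        return 0.0                      # hard threshold: too far, hide it
    return 1.0 - distance / cutoff      # soft region: linear fade
```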
Currently more advanced techniques are being used for online content selection.
When the goal of the user is well defined, location is often one of the most impor-
tant criteria for content selection. [58] proposes a touring system to help reconstruct
archaeological sites using wearable and mobile computers. Based on the different
locations, computers will automatically download related information, providing ar-
chaeological sites and audio narration. The in-car intelligent system [60] updates
surrounding traffic information for drivers in real time to avoid possible accidents.
Specifically, social messages from other drivers such as ”Follow Me” or warnings from
the sensor systems of other cars such as ”Distracted Driver” are augmented to the cur-
rent driver through the intelligent system, improving car-to-car communication. [35]
proposes an AR education system, with an automatic content selection procedure.
Through mobile devices and positioning systems, learners have access to relevant
information as they arrive at certain locations.
Sometimes, when the purpose or preferences of the customers are uncertain, designing
such a content selection criterion can be more challenging. Plenty of historical
data is stored and much calculation is required to better understand a user’s pref-
erence. The shopping assistant in [73] provides personalized item recommendations
based on customer preferences by using the collaborative filtering algorithm. The
most related items are recommended as customers walk around. [70] further extends
the idea into multi-dimensional recommender systems and proposes a graph-based
algorithm to automatically recommend AR users some places of interest, based on
the time, location, user history, and social network information.
In all, content fusion plays an important role in AR systems. Offline data pre-
processing deals with large amounts of data in cloud servers with high speed and the
data is stored with a predefined structure. After that the online content selection is
processed by local AR devices as real-time data is collected from devices. However,
computational power or memory can be limitations that prevent complex algorithm
implementation. Therefore sometimes crowd-sourcing or cloud computing is applied
to address the limited computing power of local devices.
4.2 Background of Location-based Recommender
Systems
As mentioned in the last section, with the rapid development of wireless networks,
location-based services on mobile devices, such as Google Glass, Yelp, and
Foursquare, have gained an enormous number of users in recent years. As infor-
mation increases from multiple sources while the screen size of mobile devices is
limited, it is becoming increasingly important to design location-based recommender
systems to push relevant information to mobile users [46]. Compared with traditional
recommender systems, location-based recommender systems have the following
characteristics.
• Location: Nearby recommendations are usually more interesting than a place
in a remote location [64].
• Timing: Short-term or in-time preferences have high priority. For example, on
Sunday at 11am, a user is more likely looking for a brunch restaurant or coffee
shop rather than a night club [63].
• Cold start: New users are sensitive to application user experience but sparse
data may lead to inaccurate recommendations [26].
• Immediate feedback: Sometimes location-based recommender systems have to
react quickly based on users’ behaviors. Users may click interesting items im-
mediately after a recommendation is made. The recommendation list is then
updated accordingly.
To address these characteristics, extra information may help improve recommen-
dation quality, such as temporal information, location information and social network
information [54,63]. One possible approach for incorporating this additional informa-
tion is multi-dimensional collaborative filtering. [56] and [2] used a reduction-based
algorithm and applied classic 2D collaborative filtering algorithms to produce a final
recommendation. This approach has limitations. For example, when we extend avail-
able ratings into high dimensions, the data will become extremely sparse and many
existing algorithms will lose their effectiveness. Also, when multiple dimensions are
decoupled as pairs of dimensions, relationships among more than two dimensions are
likely to be lost. Gori and Pucci [27] first proposed the ItemRank algorithm using a
random walk to rank all the items for recommendations. [32, 54] further incorporate
friendship and social network information. [38] proposes a random-walk-based entity
ranking on graphs for multidimensional recommendation. Three advan-
tages are pointed out, namely flexibility to incorporate any type of entities, dealing
with data sparsity and indirect relationships, and adaptability with various graph
ranking algorithms. However, it fails to clearly specify the methodology of incorpo-
rating location and temporal information. [70] models geographical information as a
decay function, while [25] and [24] model temporal information as a Gaussian mixture
model and linear regression, respectively.
4.3 Problem Formulation
Assume that there is a set of users U = {u1, ..., um} and a set of places I = {i1, ..., in}.
Traditionally, a two dimensional rating matrix U × I → R can be constructed. Each
element ru,i in R denotes user u’s rating of place i. The ratings can be either explicit,
for example, on a 1-5 scale as in TripAdvisor, or implicit such as “visited” or “not
visited”. The rating data typically specifies only a small number of the elements
of R. We further assume that binary label information and user social information are
given. Let L = {l1, ..., lk} be the set of label information of items. For example, for
places, L can contain restaurants, shopping malls, bus stops, etc. Let Li ∈ {0, 1}^k
denote the features of place i, where k is the total number of labels. Correspondingly,
let S = (U, ε) contain social network information, represented by an undirected or
directed graph, where U is a set of nodes and ε is a set of edges: ∀u, v ∈ U,
(u, v) ∈ ε if v is a friend of u. We
further denote the set of times as T and the set of locations as P. Then the
multidimensional rating matrix is formulated as R = U × I × L × T × P. Given the
target user u, the current time t, and the current location p, our ultimate goal is to
find the optimal place defined below:

∀u ∈ U, t ∈ T, p ∈ P,   i*_{u,t,p} = arg max_i R(u, l, i, t, p).   (4.1)
In the following sections, we will address the questions mentioned above with the
proposed random walk algorithm.
4.4 Random Walk in Location-based Recom-
mender Systems
In this section, we will describe our random walk algorithm in detail, and discuss how
to deal with specific issues in multidimensional AR recommender systems.
4.4.1 Model Construction
Graph Formulation
Let G = (V, E) be a directed graph model for AR recommender systems, as shown
in Fig.4.3. We construct the graph by the following rules. 1) Nodes V represent
constant attributes such as users, places and labels; for each entity all the nodes stay
Figure 4.3: An example of recommendation graph
in the same layer; 2) Edges/Weighted edges E represent variables such as locations
or time, or relationships, e.g. social network information.
In this recommender system, the nodes V = U ∪ I ∪ L form three layers, which
consist of users, places and labels. The edges E are classified into one of five classes
(described below) based on the layers that the nodes belong to. Higher weight means
higher chance to transition from one node to another. We incorporate personal
records, location information, and label information into the graph. Note that we
use the inverse exponential distribution to model human mobility [3]. Let d denote
the distance between the current location and the target place. Then human mobility
is modeled by the distribution (1/Z) exp(−αd), where α ≥ 0 is a decay parameter
and Z is a normalization factor. α is a tunable factor set by users. For example, if
α = 0, distance to the current location will not affect recommendation results. In
contrast, if α is large, only nearby places will be recommended. Experimental results
in Section 4.5.5 show the effect of α on the average distance between the user’s
position and the recommendations.
• For u ∈ U, i ∈ I, (u, i) ∈ E and (i, u) ∈ E if and only if user u has visited i
(assuming only implicit ratings are available), with weights w_{u,i} = exp(−α d(i, p))
and w_{i,u} = 1, where d(·, ·) is the distance between two places and p is the
current location.

• For i ∈ I, l ∈ L, (i, l) ∈ E and (l, i) ∈ E if and only if L_i(l) ≠ 0, i.e., the place i
belongs to label l, with weights w_{i,l} = w_{l,i} = 1.

• For u1, u2 ∈ U, (u1, u2) ∈ E if and only if (u1, u2) ∈ ε, which means u2 is a friend
of u1. Note that the relationship in social networks is not necessarily mutual,
such as “follow” on Twitter.

• For i1, i2 ∈ I, (i1, i2) ∈ E and (i2, i1) ∈ E if and only if i1 ≠ i2, with weight
w_{i1,i2} = exp(−α d(i1, i2)).

• For l1, l2 ∈ L, (l1, l2) ∈ E if and only if the transition probability from label l1
to l2 is greater than 0, which we estimate from the training data set.
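The construction rules above can be sketched as a weighted edge dictionary. This toy version covers the first four rules; the label-to-label transition edges, which require training data, are omitted, and all names, coordinates, and the α value are illustrative:

```python
import math

def build_graph(visits, labels, friends, coords, current_loc, alpha=0.5):
    """Sketch of the layered recommendation graph as {(src, dst): weight}.

    visits : list of (user, place) check-ins
    labels : dict place -> set of labels
    friends: list of directed (u1, u2) friendship edges
    coords : dict place -> (x, y) position
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    E = {}
    for u, i in visits:                               # user <-> place
        E[('u%s' % u, 'i%s' % i)] = math.exp(-alpha * dist(coords[i], current_loc))
        E[('i%s' % i, 'u%s' % u)] = 1.0
    for i, ls in labels.items():                      # place <-> label
        for l in ls:
            E[('i%s' % i, 'l%s' % l)] = 1.0
            E[('l%s' % l, 'i%s' % i)] = 1.0
    for u1, u2 in friends:                            # directed friendship
        E[('u%s' % u1, 'u%s' % u2)] = 1.0
    for i1 in coords:                                 # place <-> place decay
        for i2 in coords:
            if i1 != i2:
                E[('i%s' % i1, 'i%s' % i2)] = math.exp(-alpha * dist(coords[i1], coords[i2]))
    return E

E = build_graph(visits=[(0, 'A'), (1, 'B')],
                labels={'A': {'food'}, 'B': {'cafe'}},
                friends=[(0, 1)],
                coords={'A': (0.0, 0.0), 'B': (3.0, 4.0)},
                current_loc=(0.0, 0.0))
```

Note how user-to-place edges are discounted by distance from the current location, while the reverse edges carry unit weight, exactly as in the first rule.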
Transition Probability
Assume that a random walk with an initial distribution is applied in such a weighted
graph. The path is a sequence of random variables X1, ..Xt, ..., which form a Markov
chain and the future state only depends on the current state. We need to further
normalize the weight to make it a transition probability.
Let Y1, Y2, ..., Ys denote s layers in the graph. We first define the transition prob-
ability Tij between different layers Yi, Yj.
T_ij := Σ_{n1 ∈ Y_i, n2 ∈ Y_j} P(X_{t+1} = n2 | X_t = n1).   (4.2)

Specifically, in this case we have 3 layers U, I, L, and we define T_ij = 1/3, ∀i, j ∈ {1, 2, 3}.
We further define the transition probability between different nodes n_i ∈ Y_x, n_j ∈
Y_y. It is normalized by all weights from n_i into the layer Y_y, times the layer
transition probability T_xy:

P_ij := P(X_{t+1} = n_j | X_t = n_i) = T_xy · w_{n_i,n_j} / Σ_{n ∈ Out_i ∩ Y_y} w_{n_i,n},   (4.3)

where Out_i = {n | (n_i, n) ∈ E}.
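A direct implementation of this two-stage normalization, on a tiny two-layer example (the layer names and weights are made up; in the full system T_ij = 1/3 over the three layers):

```python
def transition_prob(E, layer_of, T):
    """Normalize edge weights into transition probabilities per Eq. (4.3).

    P(n_i -> n_j) = T[(x, y)] * w(i, j) / sum of w(i, n) over out-neighbours
    n of n_i that lie in the same layer y as n_j.
    """
    P = {}
    for (i, j), w in E.items():
        y = layer_of[j]
        denom = sum(w2 for (a, b), w2 in E.items()
                    if a == i and layer_of[b] == y)
        P[(i, j)] = T[(layer_of[i], y)] * w / denom
    return P

# Two layers: users (U) and places (I); half the probability mass goes
# to each layer, then weights are normalized within the target layer.
E = {('u0', 'iA'): 1.0, ('u0', 'iB'): 3.0, ('u0', 'u1'): 2.0}
layer_of = {'u0': 'U', 'u1': 'U', 'iA': 'I', 'iB': 'I'}
T = {('U', 'U'): 0.5, ('U', 'I'): 0.5}
P = transition_prob(E, layer_of, T)
```

The outgoing probabilities of u0 sum to one: 0.5 goes to the user layer (all of it to u1) and 0.5 is split 1:3 between the two places.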
Temporal Information
Another important factor in the AR recommender system is temporal information.
For example, users at noon may look for restaurants rather than nightclubs, while at
around 3pm they might prefer coffee or ice cream rather than fine food. Accordingly,
we will calculate the probability Pu(t) of every label activities within each time slot,
say half an hour or an hour. Here Pu(t) is a k × 1 histogram distribution vector to
denote the probability that a specific user u looks for some places related to label l
at time slot t.
4.4.2 Score Computation
Random Walk
For the recommendation graph G = (V,E), let the |V | × 1 vector θ denote the cus-
tomized probability vector. We will illustrate how to set θ based on a customized
request in the following. We define another parameter β ∈ [0, 1], called the damping
factor. With probability β, the random walk will continue its path in G. Otherwise,
it will go back to the customized probability distribution θ. Let the |V | × |V | matrix
M denote the Markov transition matrix, in which M_ij = P_ji from Eqn.(4.3). We further
define γ as the stationary distribution of this random process. It satisfies the
following equation:
γ = βMγ + (1− β)θ (4.4)
Therefore, we can transform Eqn.(4.4) into
γ = (βM + (1 − β)θ1^T)γ,   (4.5)

where 1 is the |V| × 1 all-ones vector, so that 1^T γ = |γ|_1 = 1.
We define A = βM + (1 − β)θ1^T; the rank score γ is then the principal eigenvector
of A. In the following section we will assign a suitable θ for different purposes. Based
on Eqn.(4.5), we can calculate the rank score γ.
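Rather than solving the eigenproblem directly, γ can be obtained by damped power iteration on Eq.(4.4); a minimal sketch on a three-node toy graph (the graph and β value are illustrative):

```python
import numpy as np

def rank_scores(M, theta, beta=0.85, iters=200):
    """Iterate gamma = beta*M*gamma + (1-beta)*theta to a fixed point.

    M     : column-stochastic |V| x |V| matrix with M[i, j] = P(j -> i)
    theta : customized restart distribution (sums to 1)
    """
    gamma = np.full(len(theta), 1.0 / len(theta))
    for _ in range(iters):
        gamma = beta * (M @ gamma) + (1 - beta) * theta
    return gamma

# Toy cycle 0 -> 1 -> 2 -> 0 with restarts always at node 0: node 0
# accumulates the most score, then node 1, then node 2.
M = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
theta = np.array([1.0, 0.0, 0.0])
gamma = rank_scores(M, theta)
```

Each iteration preserves the total mass (β·1 + (1−β)·1 = 1), and the β-contraction guarantees convergence to the stationary γ of Eq.(4.4).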
Temporal Information
For a specific time t and a target user u at a location p, we aggregate the user
information and location information into the graph. In order to combine Pu(t) and
γ to make an effective recommendation, we let Q be the |V| × k label matrix, where
Q_ij denotes whether v_i belongs to label l_j: Q_ij = 1 if and only if v_i is a place node
(i.e., in the place layer) and v_i belongs to label l_j; otherwise Q_ij = 0. Then our final
recommendation score is

γ(t) = γ ⊙ (Q × Pu(t)),   (4.6)

where ⊙ is the element-wise product between two vectors.
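A tiny numeric example of Eq.(4.6), interpreting the product element-wise so that each place node keeps its own score (an interpretation on my part; a true inner product would collapse γ(t) to a single scalar). All values are illustrative:

```python
import numpy as np

# Rank scores for |V| = 5 nodes: 2 users, 2 places, 1 label (illustrative).
gamma = np.array([0.2, 0.1, 0.3, 0.25, 0.15])

# Q[i, j] = 1 iff node i is a place node carrying label j (k = 2 labels).
Q = np.array([[0, 0],
              [0, 0],
              [1, 0],    # place node 2 has label 0
              [0, 1],    # place node 3 has label 1
              [0, 0]])

# P_u(t): the user's label-preference histogram in the current time slot.
P_u_t = np.array([0.8, 0.2])

# Element-wise product zeroes out non-place nodes and re-weights each
# place by how well its labels match the user's habits at time t.
gamma_t = gamma * (Q @ P_u_t)
```

Here place node 2 wins: its label matches what the user typically looks for at time t, so its rank score is boosted relative to place node 3.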
4.4.3 Personalized Recommendation through θ
Now the only challenge for this random walk algorithm is how to set θ to meet each
user’s personal requirement.
Regular Case
We set
θ = (1/2) e_u + (1/2) e_i,

where e_u and e_i are |V| × 1 unit vectors corresponding to the target user and the
current place. The rank score γ is then calculated with the PageRank algorithm. We
sort all the nodes in V based on γ and select the top n places as the top n
recommendations.
Group Case
We set
θ = (1/n) Σ_{j=1}^{n} e_{u_j},

where u_1, ..., u_n are the n users receiving the recommendation. If the current
locations are known, we can further add them into θ as

θ = (1/2n) Σ_{j=1}^{n} e_{u_j} + (1/2n) Σ_{j=1}^{n} e_{i_j},

where u_1, ..., u_n are the n users and i_1, ..., i_n are their corresponding locations.
Cold Start Case
The cold start problem has been one of the most important issues in recommender
systems for years. It is crucial because new users will not tolerate a bad user experience
for long: if the recommender system cannot give good predictions in the first several
attempts, the new user may well delete the application forever. However, sparse data
may lead to an inaccurate personal rank score γ for new users. Averages based on
relatively low support (small values of |I_u|, i.e., the number of places that user u has
visited) can generally be improved by shrinkage towards a common mean γ̄ [8]. Setting
θ = (1/|U|) 1_{v∈U}, the uniform distribution over all users, we can compute the global
rank score γ̄. We can therefore define the rank score for cold start users as follows:

γ_cold = (|I_u| γ + τ γ̄) / (|I_u| + τ),   (4.7)

where the parameter τ controls the extent of the shrinkage.
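Eq.(4.7) in code: a brand-new user falls back entirely on the global score, while a heavy user is barely shrunk (the default τ is an illustrative value):

```python
def cold_start_score(gamma_u, gamma_bar, n_visits, tau=10.0):
    """Shrinkage estimate of Eq. (4.7) for a possibly new user.

    gamma_u  : personal rank score of a place for user u
    gamma_bar: global (all-user) rank score of that place
    n_visits : |I_u|, number of places user u has visited
    tau      : shrinkage strength (hypothetical default)
    """
    return (n_visits * gamma_u + tau * gamma_bar) / (n_visits + tau)

new_user = cold_start_score(0.9, 0.3, n_visits=0)       # -> global score
heavy_user = cold_start_score(0.9, 0.3, n_visits=1000)  # -> near-personal
```

As |I_u| grows past τ the estimate smoothly transfers trust from the population mean to the user's own score.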
Interaction/Update Case
Users will interact with an AR system in real time. When several selected places are
shown in the AR system, users will use their finger to click on the item, denoted as
i, in which they are interested. Here we propose two methods for further updates,
listed below.
• Label-driven Update: This method is fast and focuses on the label behind the
place i only. Every time user u selects place i among all the available items
the recommender system provides, we regard u’s purpose to be the label set to
which i belongs. We then replace all the top n recommendations by the top n
places that belong to these labels.
• Place-driven Update: This method is relatively slow and needs to recompute
the rank score. Basically, we will set θ = ei to compute the rank score γ for the
recommendations.
4.5 Experiments
4.5.1 Preliminary Experiments for Geographical Model
We downloaded the Gowalla dataset1, which contains 19,183 users, 30,367 places in
NYC and 357,753 check-ins, each of which records a specific place a user has visited.
1 http://code.google.com/p/locrec/downloads/list
Table 4.1: Average Percentile of Recommendations
α              0      0.1    0.5    1
Percentile(%)  93.5   76.3   57.4   50.1

Table 4.2: Recall of Top Recommendations
Recall (%)   α = 0   α = 0.1   α = 0.5   α = 1
Top 10       83.3    32.1      26.6      24.5
Top 30       95.2    75.1      47.1      30.1
Top 50       100.0   81.0      65.3      43.0

Table 4.3: Average Distance of Top Recommendations
Distance (km)  α = 0   α = 0.1   α = 0.5   α = 1
Top 10         18.46   7.52      5.82      3.65
Top 30         20.01   9.30      7.63      6.34
Top 50         22.85   13.32     10.90     9.60
We randomly select 100 users, the 100 most popular places, and the 831 corresponding
check-ins to construct the transition graph. Due to incomplete data, in this experiment
the layered graph contains only users, locations, and check-in information. Then a
target user is randomly picked and the current location is randomly generated. We
use the proposed algorithm to recommend places to the target user and calculate
three measures of the top recommendation list. Percentile is the average position
(in percentage) of the actual visited places out of the whole set of places, where 99%
denotes the top 1%. Recall is the number of hits (i.e., the visited places) of the
top n recommendations divided by the total number of visited places. These two
measures can evaluate the effectiveness of recommendations. The average distance
of the top recommendations can evaluate whether the location factor is taken into
account in the recommendations. In the experiment, we set β = 0.85 and vary α to
see how the performance of the algorithm changes. Note that when α = 0, the algorithm
reduces to the traditional random walk without location information. We compare
the average percentile, recall, and average distance for different values of α. We
repeatedly generate a target user and current location 500 times. The average results
are reported in Tables 4.1, 4.2, and 4.3.
Table 4.4: Statistics of Foursquare Dataset
# of users        636
# of places       1,012
# of check-ins    46,032
# of friendships  674
The results show that incorporating location indeed improves AR applications by
reducing the average distance of the top recommendations. When α becomes larger,
the average distance decreases, but the percentile and recall decrease as well. The
value of α therefore controls the tradeoff between recommendation accuracy and
average distance: a larger α puts higher priority on distance and lower weight on
personal taste.
4.5.2 Dataset Analysis
Overall Analysis
We downloaded the Foursquare dataset2 [23], which contains 18,107 users with
check-ins ranging from March 2010 to January 2011. For each user, we have their
social network, previous check-in locations, and the corresponding check-in times.
Geographical information (e.g., longitude and latitude) of all the places is also
included, which helps us calculate the distance between pairs of locations. Notice
that the check-in places span the whole world. Since our algorithm takes geographical
information into consideration, it is better suited to a certain area such as a city
rather than the whole world. Therefore we take an area with a radius of 50 km. We
also remove all users with fewer than 10 check-ins among the selected places. Some
statistics of the dataset are shown in Table 4.4.
2http://www.public.asu.edu/~hgao16/dataset.html
Table 4.5: Geographic Distance Statistics
Distance (km)       Mean  Median
Overall             19.6  16.5
Single User         11.3  10.7
Consecutive Places  8.73  3.52
Geographical Information
Now let us explore the geographical information of the check-in places. Fig. 4.4
shows the histogram of distances between pairs of check-in places. We can see that
the majority of the distances fall in the interval between 0 and 50 km, which is the
radius of the area. The mean and median distances between pairs of check-in places
are 19.6 km and 16.5 km respectively. However, when we look at the mean and median
distances between check-in places for a single user, the numbers decrease to 11.3 km
and 10.7 km respectively. This distribution is shown in Fig. 4.5. The average distance
between two consecutive check-in places is only 8.73 km, while its median is only
3.52 km. This indicates that users tend to check in at nearby places rather than
distant ones. Incorporating geographical information as an exponential function
decaying with distance should therefore improve prediction accuracy.
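As a sketch of this idea (the function name and the choice of decay constant are assumptions, not taken from the dissertation), the edge weight between two places could decay exponentially with their distance:

```python
import math

def distance_decay_weight(dist_km, scale_km=8.73):
    """Edge weight that decays exponentially with distance.

    scale_km is an assumed decay constant; the mean distance
    between consecutive check-ins (8.73 km) is a natural choice.
    """
    return math.exp(-dist_km / scale_km)

# Nearby places keep close to full weight, distant ones almost none:
w_near = distance_decay_weight(1.0)
w_far = distance_decay_weight(50.0)
```

Normalizing these weights over a place's neighbors would then turn them into transition probabilities that favor nearby destinations.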
Temporal Information
First let us look at the overall check-in time frequency in Fig. 4.6. We can see that
the frequency peaks at midnight and reaches its lowest point at noon, since people
usually stay at the workplace during the daytime and prefer to share their locations
after work. Looking at the check-in frequency across the 24 hours of the day, it varies
considerably and has a relatively large standard deviation. Here the individual’s
temporal preference is neglected due to limited data, so Pu(t) can be rewritten as
P(t). To evaluate the difference between times t, we propose an L1
Figure 4.4: Pairwise Check-in Places Distance Distribution
norm as follows.
L1(P ) =1
242
23∑t1=0
23∑t2=0
|P (t1)− P (t2)|.
For this dataset, L1(P) = 0.829. Therefore, incorporating temporal information
into the final ranking score should help increase the prediction accuracy as well.
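As an illustration (not code from the dissertation), L1(P) can be computed directly from the 24 hourly check-in probabilities:

```python
def temporal_l1(p):
    """Mean absolute difference between hourly check-in probabilities.

    p: list of the 24 values P(0), ..., P(23), summing to 1.
    """
    assert len(p) == 24
    return sum(abs(p[t1] - p[t2])
               for t1 in range(24) for t2 in range(24)) / 24**2

# A perfectly uniform temporal profile gives L1 = 0;
# any variation across hours raises the score.
uniform = [1.0 / 24] * 24   # temporal_l1(uniform) == 0.0
```

A large value, such as the 0.829 reported here, indicates that check-in probability varies strongly over the day, so the hour of the request is informative.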
We randomly select 70% of the check-ins as the training set and the rest as the
testing set. For testing, the actual check-in place is hidden, but the check-in time
and geographic information (we assume each user remains at their previous check-in
location) are known.
4.5.3 Evaluation Metrics
We evaluate our results with two popular evaluation metrics for top-k recommenda-
tions: recall and percentile.
Figure 4.5: Average Distance Distribution for a Single User
Recall: In the top-k recommendations, we consider any item that matches an item
in the testing set as a “hit”, as in [54].
Recall(k) = (# of hits in top-k) / (# of testing data).
Percentile: The individual percentile score is simply the average position (in per-
centage) that the actual check-in place in the test set occupies in the recommendation
list. For example, if the actual check-in place is ranked 15th out of 100 places, the
percentile would be 85%. We calculate the percentiles for all the testing data and
report their average in the following section.
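A minimal sketch of the two metrics (the helper names are hypothetical):

```python
def recall_at_k(recommended, visited, k):
    """Fraction of test check-ins that appear in the top-k list."""
    hits = sum(1 for place in visited if place in recommended[:k])
    return hits / len(visited)

def percentile(recommended, place):
    """Position of the actual check-in place, as 'top x%' of the list.

    A place ranked 15th out of 100 gives 85%, matching the example
    in the text (higher is better).
    """
    rank = recommended.index(place) + 1  # 1-based rank in the list
    return 100.0 * (len(recommended) - rank) / len(recommended)
```

Averaging `percentile` over all test check-ins gives the average percentile reported in the tables below.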
Figure 4.6: Check-in Time Frequency
4.5.4 Compared Algorithms
We compare our proposed temporal random walk algorithm (TRW) with the following
popular algorithms and the results are reported in the following subsection.
• Most Popular Place (MPP): the overall frequency of check-in places is calcu-
lated and the most popular items are recommended to each user. In this case,
all users are treated equally.
• K Nearest Neighbor (KNN): the k nearest neighbors of the target user are com-
puted and their average preference list is recommended to the target user.
• Naive Random Walk (NRW): we use a naive graph without geographical or
temporal information; all transitions between places are equally likely, regardless
of the distance.
Table 4.6: Average Percentile of Recommendations
MPP    KNN    NRW    GRW    TRW
78.7%  77.2%  77.7%  78.1%  88.6%
Table 4.7: Hit Ratio of Top-k Recommendations
HitRatio@k  MPP    KNN    NRW    GRW    TRW
50          38.2%  38.6%  33.8%  39.1%  39.7%
100         53.3%  51.8%  51.5%  55.2%  55.9%
150         60.4%  58.3%  59.3%  62.2%  67.5%
200         66.1%  63.1%  65.0%  66.9%  78.5%
• Geographical Random Walk (GRW): geographical information with an expo-
nential decay model is added to the graph. θ is defined as a combination of the
target user and his current location.
• Temporal Random Walk (TRW): in addition to geographical information, the
temporal information is further incorporated into the ranking score of recom-
mendation list.
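The biased random walk behind GRW and TRW can be sketched as a personalized PageRank iteration over the layered graph, with θ as the restart distribution (e.g., the target user, or e_i after a place-driven update). The function below is an illustrative sketch, not the dissertation’s implementation:

```python
import numpy as np

def biased_random_walk(P, theta, beta=0.85, tol=1e-10, max_iter=1000):
    """Iterate gamma = beta * P^T gamma + (1 - beta) * theta to a fixed point.

    P:     row-stochastic transition matrix over the layered graph
    theta: restart (personalization) vector, e.g. e_u for the target user
    beta:  damping factor (0.85, as in the experiments)
    """
    gamma = np.full(len(theta), 1.0 / len(theta))  # uniform start
    for _ in range(max_iter):
        new = beta * P.T @ gamma + (1 - beta) * theta
        if np.abs(new - gamma).sum() < tol:
            return new
        gamma = new
    return gamma
```

Recommendations are then the places with the highest rank score γ that the target user has not yet visited; GRW biases P by distance, and TRW further reweights the final scores by the temporal statistics.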
4.5.5 Experimental Results
Experimental results comparing with state-of-the-art methods are shown in Table 4.6
and Table 4.7. We can see that MPP and KNN provide the baselines. MPP
performs surprisingly well because in this dataset the top 5% most popular check-in
places account for around 65% of all check-ins. Therefore, even though it does
not take individual preference into consideration, this naive algorithm still performs
well. Moreover, NRW is even worse than MPP in the hit ratios, but when
geographical and temporal information are taken into consideration, the hit
ratio improves significantly. The top-50 hit ratio does not change much for TRW
for the same reason: the top 5% most popular check-in places account for 65% of the
check-in histories. But the average percentile is improved by around 10% compared
to MPP or NRW.
Figure 4.7: Top K Hit Ratio
Chapter 5
Conclusion
Recommender systems have been widely used for many years and generate much profit
for companies such as Netflix and Amazon. This dissertation has mainly focused on
three aspects of recommender systems – sparsity, robustness, and diversification. We
provide a better understanding of the current challenges of recommender systems
and their solutions.
5.1 Contribution of the Dissertation
Chapter 2 proposes iterative collaborative filtering to deal with sparse data in recom-
mender systems. Instead of calculating the similarity function and doing a weighted
summation aggregation only once, our algorithm first calculates the similarity func-
tion in a limited but reliable region. Based on an adaptive parameter, it then selects
a reliable subset of the missing ratings to fill in using the current similarity. It then
uses the new estimates of the ratings to go back and update the similarity. This
iterative process leads to better predictions. Experimental results show our algorithm
performs better than other state-of-the-art methods when the data is relatively sparse.
In Chapter 3, a spectral clustering algorithm is applied to the detection of shilling
attacks in recommender systems. High correlations between fake users are assumed,
and pairwise correlations are calculated to cope with the sparsity of the data. A
submatrix optimization problem is formulated for this detection and then transformed
into a graph. The spectral clustering algorithm is applied to solve the min-cut problem
in the graph with an unbalanced structure. Experimental results show that our
spectral clustering algorithm performs better than other current methods for several
attack models.
In Chapter 4, we first review the state of the art in AR from a content-oriented
perspective. The general concept and main components of the AR ecosystem are
described, and a core component of the AR ecosystem, the content component, is
introduced. We then propose a location-based multi-dimensional recommender
system by applying a random walk algorithm. We incorporate location information
into the weights of edges in the graph and incorporate temporal statistics into the
rank score as well. We thus aggregate the user’s personal preference, location
information, and temporal information into the biased random walk to recommend
places to mobile users. Experimental results show that location information is indeed
effectively incorporated into the layered graph.
5.2 Future Research Directions
For iterative collaborative filtering, though prediction accuracy is improved when
the rating matrix is sparse, the confidence of the estimates is still hard to calculate,
especially when the estimates in the dense areas are used to re-estimate ratings in
the sparse areas. Without clear estimates of confidence, noise can be propagated
through the iterative estimation process.
For the shilling attack detection, spectral clustering minimizes the inter-group
correlations to find the highly correlated group. In this optimization problem, two
relaxations are applied. One is from discrete values to continuous values, which can
make the separation sub-optimal. The other is the unbalanced structure: spectral
clustering tends to produce clusters of similar sizes. Even though adjustments of
edges are applied to obtain the unbalanced structure, the process of tuning parameters
is ad hoc and time-consuming. Moreover, more types of attack models should be
tested in the experimental part.
For the biased random walk algorithm, it is sometimes difficult to combine infor-
mation using a uniform metric. When multiple aspects are taken into consideration,
there are usually conflicts between different dimensions. In this case, if we cannot
evaluate the confidence of our recommendation, we are unable to aggregate
information from the different dimensions together.
Moreover, there are evaluation measures that this thesis does not cover, such as
novelty or confidence. Novelty means that instead of recommending items that are
top sellers and known to everyone, we should recommend items that users would be
unlikely to try without the help of recommender systems. Amazon can recommend
the book “Harry Potter” to every young user; this is statistically accurate but
ineffective in practice. Another interesting topic in recommender systems would be
diversified recommendations. The recommender system should give users personalized
recommendations instead of uniform ones, and metrics are then needed to measure
the overall difference among all the recommendations.
Bibliography
[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Knowledge and Data Engineering, IEEE Transactions on, 17(6):734–749, 2005.

[2] Gediminas Adomavicius, Ramesh Sankaranarayanan, Shahana Sen, and Alexander Tuzhilin. Incorporating contextual information in recommender systems using a multidimensional approach. ACM Transactions on Information Systems, 23(1):103–145, 2005.

[3] Miltiadis Allamanis, Salvatore Scellato, and Cecilia Mascolo. Evolution of a location-based online social network: analysis and models. In Proceedings of the 2012 ACM Internet Measurement Conference, pages 145–158. ACM, 2012.

[4] R. Azuma, Y. Baillot, R. Behringer, S. Feiner, S. Julier, and B. MacIntyre. Recent advances in augmented reality. Computer Graphics and Applications, IEEE, 21(6):34–47, 2001.

[5] R.T. Azuma et al. A survey of augmented reality. Presence: Teleoperators and Virtual Environments, 6(4):355–385, 1997.

[6] P. Belimpasakis, Y. You, and P. Selonen. Enabling rapid creation of content for consumption in mobile augmented reality. In Next Generation Mobile Applications, Services and Technologies, 2010 Fourth International Conference on, pages 1–6. IEEE, 2010.

[7] R. Bell, Y. Koren, and C. Volinsky. Modeling relationships at multiple scales to improve accuracy of large recommender systems. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 95–104. ACM, 2007.

[8] Robert M. Bell and Yehuda Koren. Improved neighborhood-based collaborative filtering. In KDD Cup and Workshop at the 13th ACM SIGKDD, 2007.

[9] S. Benford and L. Fahlen. A spatial model of interaction in large virtual environments. In Proceedings of the Third European Conference on Computer-Supported Cooperative Work, pages 109–124. Kluwer Academic Publishers, 1993.
[10] Runa Bhaumik, Bamshad Mobasher, and R.D. Burke. A clustering approach to unsupervised attack detection in collaborative recommender systems. In Proceedings of 7th IEEE ICML, Las Vegas, USA, pages 181–187, 2011.

[11] D. Billsus and M.J. Pazzani. Learning collaborative information filters. In Proceedings of the Fifteenth International Conference on Machine Learning, volume 54, page 48, 1998.

[12] J.S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann Publishers Inc., 1998.

[13] Kenneth Bryan, Michael O’Mahony, and Padraig Cunningham. Unsupervised retrieval of attack profiles in collaborative recommender systems. In Proceedings of the 2008 ACM RecSys, pages 155–162. ACM, 2008.

[14] R. Burke, B. Mobasher, C. Williams, and R. Bhaumik. Classification features for attack detection in collaborative recommender systems. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 542–547. ACM, 2006.

[15] Y. Cheng and G.M. Church. Biclustering of expression data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, volume 8, pages 93–103, 2000.

[16] Z. Cheng and N. Hurley. Robust collaborative recommendation by least trimmed squares matrix factorization. In Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on, volume 2, pages 105–112. IEEE, 2010.

[17] P.A. Chirita, W. Nejdl, and C. Zamfir. Preventing shilling attacks in online recommender systems. In Proceedings of the 7th Annual ACM International Workshop on Web Information and Data Management, pages 67–74. ACM, 2005.

[18] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 2001.

[19] J. Delgado and N. Ishii. Memory-based weighted-majority prediction. In ACM SIGIR’99 Workshop on Recommender Systems: Algorithms and Evaluation. Citeseer, 1999.

[20] M. Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Transactions on Information Systems (TOIS), 22(1):143–177, 2004.

[21] S. Feiner, B. MacIntyre, T. Hollerer, and A. Webster. A touring machine: Prototyping 3D mobile augmented reality systems for exploring the urban environment. Personal and Ubiquitous Computing, 1(4):208–217, 1997.
[22] Ronald A. Fisher. Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10(4):507–521, 1915.

[23] Huiji Gao and Huan Liu. Location-based social network data repository, 2014.

[24] Huiji Gao, Jiliang Tang, Xia Hu, and Huan Liu. Exploring temporal effects for location recommendation on location-based social networks. In Proceedings of the 7th ACM Conference on Recommender Systems, pages 93–100. ACM, 2013.

[25] Huiji Gao, Jiliang Tang, Xia Hu, and Huan Liu. Modeling temporal effects of human mobile behavior on location-based social networks. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, pages 1673–1678. ACM, 2013.

[26] Huiji Gao, Jiliang Tang, and Huan Liu. Addressing the cold-start problem in location recommendation using geo-social correlations. Data Mining and Knowledge Discovery, pages 1–25, 2014.

[27] Marco Gori, Augusto Pucci, V. Roma, and I. Siena. ItemRank: A random-walk based scoring algorithm for recommender engines. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 2766–2771, 2007.

[28] J.L. Herlocker, J.A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 230–237. ACM, 1999.

[29] J.L. Herlocker, J.A. Konstan, L.G. Terveen, and J.T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS), 22(1):5–53, 2004.

[30] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS), 22(1):89–115, 2004.

[31] N. Hurley, Z. Cheng, and M. Zhang. Statistical attack detection. In Proceedings of the Third ACM Conference on Recommender Systems, pages 149–156. ACM, 2009.

[32] Mohsen Jamali and Martin Ester. TrustWalker: a random walk model for combining trust-based and item-based recommendation. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 397–406. ACM, 2009.

[33] S. Julier, M. Lanzagorta, Y. Baillot, L. Rosenblum, S. Feiner, T. Hollerer, and S. Sestito. Information filtering for mobile augmented reality. In IEEE and ACM International Symposium on Augmented Reality, pages 3–11. IEEE, 2000.
[34] D. Kim and B.J. Yum. Collaborative filtering based on iterative principal component analysis. Expert Systems with Applications, 28(4):823–830, 2005.

[35] Eric Klopfer. Augmented Learning: Research and Design of Mobile Educational Games. MIT Press, 2008.

[36] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426–434. ACM, 2008.

[37] S.K. Lam and J. Riedl. Shilling recommender systems for fun and profit. In Proceedings of the 13th International Conference on World Wide Web, pages 393–402. ACM, 2004.

[38] Sangkeun Lee, Sang-il Song, Minsuk Kahng, Dongjoo Lee, and Sang-goo Lee. Random walk based entity ranking on graph for multidimensional recommendation. In Proceedings of the Fifth ACM Conference on Recommender Systems, pages 93–100. ACM, 2011.

[39] B. Mehta, T. Hofmann, and P. Fankhauser. Lies and propaganda: detecting spam users in collaborative filtering. In Proceedings of the 12th International Conference on Intelligent User Interfaces, pages 14–21. ACM, 2007.

[40] B. Mehta and W. Nejdl. Attack resistant collaborative filtering. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 75–82. ACM, 2008.

[41] B. Mehta and W. Nejdl. Unsupervised strategies for shilling detection and robust collaborative filtering. User Modeling and User-Adapted Interaction, 19(1):65–97, 2009.

[42] B. Mobasher, R. Burke, and J. Sandvig. Model-based collaborative filtering as a defense against profile injection attacks. In Proceedings of the National Conference on Artificial Intelligence, volume 21, page 1388, 2006.

[43] M.P. O’Mahony, N.J. Hurley, and G.C.M. Silvestre. Promoting recommendations: An attack on collaborative filtering. In Database and Expert Systems Applications, pages 213–241, 2002.

[44] M.P. O’Mahony, N.J. Hurley, and G.C.M. Silvestre. An evaluation of neighbourhood formation on the performance of collaborative filtering. Artificial Intelligence Review, 21(3):215–228, 2004.

[45] M.P. O’Mahony, N.J. Hurley, and G.C.M. Silvestre. Recommender systems: Attack types and strategies. In Proceedings of the National Conference on Artificial Intelligence, volume 20, page 334, 2005.
[46] Moon-Hee Park, Jin-Hyuk Hong, and Sung-Bae Cho. Location-based recommendation system using Bayesian user’s preference model in mobile devices. In Ubiquitous Intelligence and Computing, pages 1130–1139. Springer, 2007.

[47] Alexandrin Popescul, David M. Pennock, and Steve Lawrence. Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 437–444. Morgan Kaufmann Publishers Inc., 2001.

[48] Jing Qian and Venkatesh Saligrama. Spectral clustering with unbalanced data. arXiv preprint arXiv:1302.5134, 2013.

[49] Gerhard Reitmayr and Dieter Schmalstieg. Data management strategies for mobile augmented reality. In Proceedings of International Workshop on Software Technology for Augmented Reality Systems, pages 47–52, 2003.

[50] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: an open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, pages 175–186. ACM, 1994.

[51] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295. ACM, 2001.

[52] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of dimensionality reduction in recommender system - a case study. Technical report, DTIC Document, 2000.

[53] D. Schmalstieg, G. Schall, and D. Wagner. Managing complex augmented reality models. pages 48–57, 2007.

[54] Shang Shang, Sanjeev R. Kulkarni, Paul W. Cuff, and Pan Hui. A random walk based model incorporating social information for recommendations. In 2012 International Workshop on Machine Learning and Signal Processing, pages 1–6. IEEE, 2012.

[55] X. Su and T.M. Khoshgoftaar. A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009:4, 2009.

[56] Karen H.L. Tso-Sutter, Leandro Balby Marinho, and Lars Schmidt-Thieme. Tag-aware recommender systems by fusion of collaborative filtering algorithms. In Proceedings of the 2008 ACM Symposium on Applied Computing, pages 1995–1999. ACM, 2008.

[57] D.W.F. Van Krevelen and R. Poelman. A survey of augmented reality technologies, applications and limitations. International Journal of Virtual Reality, 9(2):1, 2010.
[58] V. Vlahakis, J. Karigiannis, M. Tsotros, N. Ioannidis, and D. Stricker. Personalized augmented reality touring of archaeological sites with wearable and mobile computers. In Wearable Computers, 2002 (ISWC 2002), Proceedings of the Sixth International Symposium on, pages 15–22. IEEE, 2002.

[59] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

[60] Chieh-Chih Wang, Jennifer Healey, and Meiyuan Zhao. Augmenting on-road perception: enabling smart and social driving with sensor fusion and cooperative localization. In Proceedings of the 3rd Augmented Human International Conference, page 21. ACM, 2012.

[61] J. Wang, A.P. De Vries, and M.J.T. Reinders. Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 501–508. ACM, 2006.

[62] C.A. Williams, B. Mobasher, and R. Burke. Defending recommender systems: detection of profile injection attacks. Service Oriented Computing and Applications, 1(3):157–170, 2007.

[63] Liang Xiang, Quan Yuan, Shiwan Zhao, Li Chen, Xiatian Zhang, Qing Yang, and Jimeng Sun. Temporal recommendation on graphs via long- and short-term preference fusion. In Proceedings of the 16th ACM SIGKDD, pages 723–732. ACM, 2010.

[64] Mao Ye, Peifeng Yin, and Wang-Chien Lee. Location recommendation for location-based social networks. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 458–461. ACM, 2010.

[65] S. Zhang, Y. Ouyang, J. Ford, and F. Makedon. Analysis of a low-dimensional linear model under recommendation attacks. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 517–524. ACM, 2006.

[66] Zhuo Zhang, Paul Cuff, and Sanjeev Kulkarni. Iterative collaborative filtering for recommender systems with sparse data. In Machine Learning for Signal Processing (MLSP), 2012 IEEE International Workshop on, pages 1–6. IEEE, 2012.

[67] Zhuo Zhang, Pan Hui, Sanjeev R. Kulkarni, and Christoph Peylo. Enabling an augmented reality ecosystem: A content-oriented survey. In Mobile Augmented Reality and Robotic Technology-based Systems.

[68] Zhuo Zhang and Sanjeev R. Kulkarni. Graph-based detection of shilling attacks in recommender systems. In Machine Learning for Signal Processing (MLSP), 2013 IEEE International Workshop on, pages 1–6. IEEE, 2013.
[69] Zhuo Zhang and Sanjeev R. Kulkarni. Detection of shilling attacks in recommender systems via spectral clustering. In 2014 International Conference on Information Fusion, 2014.

[70] Zhuo Zhang, Shang Shang, Sanjeev R. Kulkarni, and Pan Hui. Improving augmented reality using recommender systems. In Proceedings of the 7th ACM Conference on Recommender Systems, pages 173–176. ACM, 2013.

[71] F. Zhou, H.B.L. Duh, and M. Billinghurst. Trends in augmented reality tracking, interaction and display: A review of ten years of ISMAR. In 7th IEEE/ACM International Symposium on Mixed and Augmented Reality, pages 193–202. IEEE, 2008.

[72] Q. Zhou and F. Zhang. A hybrid unsupervised approach for detecting profile injection attacks in collaborative recommender systems. 2012.

[73] W. Zhu, C.B. Owen, H. Li, and J.H. Lee. Personalized in-store e-commerce with the PromoPad: an augmented reality shopping assistant. Electronic Journal for E-commerce Tools and Applications, 1(3):1–19, 2004.