Sparsity, robustness, and
diversification of Recommender Systems
Zhuo Zhang
A Dissertation
Presented to the Faculty
of Princeton University
in Candidacy for the Degree
of Doctor of Philosophy
Recommended for Acceptance
by the Department of
Electrical Engineering
Adviser: Sanjeev R. Kulkarni
September 2014
© Copyright by Zhuo Zhang, 2014.
All rights reserved.
Abstract
Recommender systems have played an important role in helping individuals select
useful items or places of interest when they face too many choices. Collaborative
filtering is one of the most popular methods used in recommender systems. The idea
is to recommend to the target user items that users with similar tastes prefer. An important goal of recommender systems is to predict users' preferences accurately. However, prediction accuracy is not the only evaluation metric for recommender systems. In this dissertation, we focus on three other aspects of recommender systems: sparsity, robustness, and diversification.
The dissertation begins with iterative collaborative filtering to overcome sparsity issues in recommender systems. Instead of calculating the similarity matrix from the sparse data only once, we iterate this process until convergence is achieved. To overcome the sparsity, users' ratings in dense areas are estimated first, and these estimates are then used to estimate ratings in sparse areas. Second, the robustness of recommender systems is considered in order to detect shilling attacks. Graph-based algorithms are applied to the user-user similarity graph to detect the highly correlated group, which corresponds to the group of fake users.
Finally, we consider diversification of the types of information used for recommendations. Specifically, geographical, temporal, social network, and tag information are aggregated in a biased random walk algorithm to exploit diversified data in multi-dimensional recommender systems.
Acknowledgements
I would like to express my sincere gratitude to Professor Sanjeev Kulkarni, my advisor
in the EE department. As a new graduate student in summer 2011, I joined Professor Kulkarni's group. I have to admit that at that time I had very little research experience and little idea of what to do. It was Professor Kulkarni who advised me not only on academics but also on career development. Rather than simply assigning a project to me, he suggested that I read widely to explore interesting research areas and follow my passions. He twice encouraged me to take a summer internship in industry to see whether I enjoyed it. Without his tremendous support, I could not have achieved so much and finished this dissertation within four years.
I would also like to thank Professor Paul Cuff for his assistance and guidance in
my academic work, especially for serving as a dissertation reader. He always provided
insightful comments about my ongoing projects and proposed interesting questions
to get me inspired and motivated.
I would also like to thank my internship advisor, my dissertation reader, and
my friend, Professor Pan Hui. During my summer internship at Deutsche Telekom Innovation Lab, I gained many innovative ideas from him in the mobile and social network area, which form part of this dissertation.
I would like to thank Professor Mung Chiang for serving on my general exam
committee, and Professor Peter Ramadge and Professor Mung Chiang for serving on
my thesis committee as non-readers.
My special thanks go to one of my group mates, Shang Shang. I have had very close collaborations with Shang in the recommender system area and benefited greatly from many discussions with her. She contributed much to the location-based recommender system during our last year at Princeton. I would also like to express my gratitude to my friends at Princeton, Tiance (Mark) Wang, Haipeng
Zheng, Guanchun (Arvid) Wang, Jieqi Yu, Pingmei Xu and Zhen (James) Xiang,
and all the other friends in the EE and ORFE departments, for the sleepless nights we spent working together before deadlines, and for all the fun we had over the last four years.
Finally, and most importantly, I would like to thank my parents, Xuan and Tianxing, without whom my life would not be possible. Their support, encouragement, patience, and love pushed me forward during four years of PhD life.
To my parents.
Contents
Abstract
Acknowledgements
List of Tables
List of Figures
1 Introduction
1.1 Iterative Collaborative Filtering in Sparse Recommender Systems
1.2 Shilling Attack Detection using Graph-based Algorithms
1.3 Location-based Multi-dimensional Recommender Systems
2 Iterative Collaborative Filtering in Sparse Recommender Systems
2.1 Background
2.1.1 Problem Formulation
2.1.2 Related Work
2.2 Iterative Collaborative Filtering
2.2.1 Iterative Framework
2.2.2 Selective Processes
2.2.3 Algorithm Description
2.3 Experiments
2.3.1 Experimental Design
2.3.2 Evaluation Metrics
2.3.3 Experimental Results
3 Graph-based Shilling Attack Detection in Recommender Systems
3.1 Background
3.1.1 Attack Models
3.1.2 Related Work
3.2 Graph-based Filtering Algorithm
3.2.1 Problem Formulation
3.2.2 Heuristic Merging
3.2.3 Searching for the Largest Component
3.2.4 Iterative Refinement
3.3 Spectral Clustering Detection
3.3.1 Spectral Clustering
3.3.2 Dealing with Unbalanced Structure
3.3.3 Iterative Refinement
3.3.4 Searching for the Number of Attack Profiles
3.4 Experiments
3.4.1 Experimental Setup
3.4.2 Assumption Validation
3.4.3 Searching for the Number of Attack Profiles
3.4.4 Evaluation Metrics
3.4.5 Experimental Results and Discussion
4 Location-based Recommender Systems
4.1 Introduction to Augmented Reality
4.1.1 AR Ecosystem
4.1.2 Content Fusion in AR
4.2 Background of Location-based Recommender Systems
4.3 Problem Formulation
4.4 Random Walk in Location-based Recommender Systems
4.4.1 Model Construction
4.4.2 Score Computation
4.4.3 Personalized Recommendation through θ
4.5 Experiments
4.5.1 Preliminary Experiments for Geographical Model
4.5.2 Dataset Analysis
4.5.3 Evaluation Metrics
4.5.4 Compared Algorithms
4.5.5 Experimental Results
5 Conclusion
5.1 Contribution of the Dissertation
5.2 Future Research Directions
Bibliography
List of Tables
2.1 Experimental Results
3.1 General Form of Attack Profiles
3.2 Experimental Results
4.1 Average Percentile of Recommendations
4.2 Recall of Top Recommendations
4.3 Average Distance of Top Recommendations
4.4 Statistics of Foursquare Dataset
4.5 Geographic Distance Statistics
4.6 Average Percentile of Recommendations
4.7 Average Percentile of Recommendations
List of Figures
2.1 Iterative Framework
2.2 MAE versus Data Sparsity
2.3 RMSE versus Data Sparsity
2.4 Coverage versus Data Sparsity
3.1 The Average Similarity PDF for Different Group Sizes
3.2 The 99th Percentile Average Similarity Value
3.3 G0(n) and Fitting Curve
3.4 Group Size vs. Average Similarity for 100 Random Attackers
3.5 Group Size vs. Average Similarity for 100 Average Attackers
3.6 Group Size vs. Average Similarity for 100 Bandwagon Attackers
4.1 AR Ecosystem Framework
4.2 Content Fusion Pipeline
4.3 An Example of a Recommendation Graph
4.4 Pairwise Check-in Place Distance Distribution
4.5 Average Distance Distribution for a Single User
4.6 Check-in Time Frequency
4.7 Top-K Hit Ratio
Chapter 1
Introduction
In less than two decades, recommender systems have been widely used on the Internet,
providing users with personalized items and information. They play a very important
role in making profits for companies such as Amazon or Netflix. In recommender
systems, there are lists of both users and items. Each user will use scores or linguistic
terms such as like or dislike to rate a subset of all possible items. With a large amount
of information, users often find it hard to select the useful and relevant information.
Therefore, recommender systems are designed to help select the relevant information
for the specific user [1]. By analyzing the available ratings, collaborative filtering
attempts to make the best predictions or recommendations to the target user. The
underlying principle in collaborative filtering is to find a group of users with similar
tastes and then provide a prediction for the target user based on the preferences of
similar users.
There are several current challenges in recommender systems. First, recommender
systems need to overcome sparsity issues. Only a small proportion of users tend to
rate or leave feedback on the products they have used, while the set of items is usually very large. Thus the rating information is limited and the user-item rating matrix is very sparse. Traditional collaborative filtering algorithms then suffer from overfitting, leading to inaccurate predictions [47].
Second, evaluating recommender systems and their algorithms is inherently difficult for several reasons [29]. High accuracy may not be the only goal that recommender systems want to achieve. Since recommender systems are widely used, people are able to exploit basic knowledge about them. Fake user profiles are generated based on attack models, and target items are pushed to become more or less popular in order to make a profit. Therefore, robustness to attacks is also important in recommender systems.
Last but not least, in recent years researchers have used not only explicit rating information or feedback, but have also aggregated "side information" such as geographical, temporal, tag, and social network information to improve prediction accuracy [2]. With the rapid development of smartphones and wireless networks, location-based services have become more and more popular, so geographic information is very important. Recent research has shown that time, item categories, and even friendship between users are strongly connected to item preferences. Therefore, a multi-dimensional recommender system using all available information is one of the trends for the future.
1.1 Iterative Collaborative Filtering in Sparse
Recommender Systems
Collaborative filtering (CF) is one of the most successful techniques in recommender
systems. By utilizing co-rated items of pairwise users for similarity measurements,
traditional CF uses a weighted summation to predict unknown ratings based on the
available ones. However, in practice, the rating matrix is often too sparse to find sufficiently many co-rated items, leading to inaccurate predictions. In Chapter 2, to address the case of sparse data, we propose an iterative CF that updates the similarity and rating matrices [66]. The improved CF incrementally selects reliable subsets of missing ratings based on an adaptive parameter and therefore produces a more credible similarity-based prediction. Experimental results on the MovieLens dataset show that our algorithm significantly outperforms traditional CF, Default Voting, and SVD when the data is 1% sparse. The results also show that in the dense-data case our algorithm performs as well as state-of-the-art methods.
1.2 Shilling Attack Detection using Graph-based
Algorithms
Collaborative filtering has been widely used in recommender systems as a method
to recommend items to users. However, by using knowledge of the recommendation
algorithm, shilling attackers can generate fake profiles to increase or decrease the
popularity of a targeted set of items. In Chapter 3, we present a spectral clustering
method to make recommender systems resistant to these attacks in the case that
the attack profiles are highly correlated with each other [68, 69]. We formulate the problem as finding a maximum submatrix of the similarity matrix, which is an NP-hard problem. To search for the maximum submatrix, we first translate the matrix into a graph and then use a spectral clustering algorithm to find the min-cut that estimates the highly correlated group. The graph is created based on edge density in order to handle unbalanced clustering. The detection is refined through an iterative process to obtain a better estimate of the group of attack profiles, and some analysis of the stability of the refinement process is provided. Experimental results show that the proposed approach improves detection precision compared to existing methods.
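The spectral step can be illustrated with a minimal sketch, assuming a precomputed user-user similarity matrix `S`; the function name and the simple sign-based split are illustrative simplifications, not the dissertation's full procedure with density-based edges and iterative refinement:

```python
import numpy as np

def spectral_split(S):
    """Split users into two groups via the Fiedler vector of the
    normalized graph Laplacian built from similarity matrix S.
    This sketches only the min-cut step of spectral clustering."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Normalized Laplacian: L = I - D^{-1/2} S D^{-1/2}
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]          # second-smallest eigenvector
    return fiedler >= 0              # one side of the estimated cut
```

On a similarity matrix with a tightly correlated block (as shilling profiles produce), the sign pattern of the Fiedler vector separates that block from the rest.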
1.3 Location-based Multi-dimensional Recommender Systems
Location-based recommender systems have attracted a large number of users in recent years as wireless networks and mobile devices have rapidly developed [67]. Real-time location-based recommender systems should take location, temporal information, and social network information into consideration in order to improve the user experience. In Chapter 4, we first review the development of augmented reality in recent years, serving as an introduction to location-based recommender systems. Then we present an aggregated random walk algorithm incorporating personal preferences, location information, temporal information, and social network information in a layered graph [70]. By adaptively changing the graph edge weights and computing the rank score, the proposed location-based recommender system predicts users' preferences and provides the most relevant recommendations from the aggregated information. Specifically, the geographical information is modeled as an exponential decay function, while the temporal information is abstracted as a time vector incorporated into the final ranking score. The biased random walk algorithm is flexible in that the personalization parameter can be specified to meet different purposes. Experimental results show that the biased random walk algorithm gives better results for location-based multi-dimensional recommender systems than other state-of-the-art methods.
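The ranking idea behind the biased random walk can be sketched as a personalized-PageRank-style power iteration; the transition matrix `W`, restart vector, and parameter names here are hypothetical simplifications of the layered-graph model described in Chapter 4:

```python
import numpy as np

def biased_random_walk(W, restart, theta=0.15, n_iter=100):
    """Rank nodes by a restart-biased walk: W is a column-stochastic
    transition matrix over the graph's nodes, and `restart`
    concentrates probability on the target user's preference nodes.
    theta is the restart (bias) probability."""
    r = restart / restart.sum()
    p = r.copy()
    for _ in range(n_iter):
        p = (1.0 - theta) * (W @ p) + theta * r
    return p  # stationary visiting probabilities used as rank scores
```

Nodes reachable soon after a restart accumulate more probability mass, so items close to the user's preference nodes in the graph rank higher.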
Chapter 2
Iterative Collaborative Filtering in
Sparse Recommender Systems
In this chapter, we describe a method for iterative collaborative filtering applied in
sparse recommender systems. This chapter is organized as follows. Section 2.1 introduces the background and describes the problem formulation and related work. Section 2.2 discusses the details of our proposed iterative collaborative filtering. Experimental results and comparisons with existing methods are given in Section 2.3.
2.1 Background
In recommender systems, there are lists of both users and items. Each user will use
scores or linguistic terms such as like or dislike to rate a subset of all possible items.
With a large amount of information, users often find it hard to select the useful and
relevant information. Therefore, recommender systems such as Netflix and Amazon
are designed to help select the relevant information for the specific user [1]. By
analyzing these ratings, collaborative filtering attempts to make the best predictions
or recommendations to the target user. The underlying principle in collaborative
filtering is to find a group of users with similar tastes and then provide a prediction
for the target user based on such preferences.
Typically there are two kinds of collaborative filtering: memory-based methods and model-based methods. Memory-based methods [50, 19, 61] first calculate a similarity function between pairs of users and then use a weighted summation of available ratings to predict unknown ones. In contrast, model-based methods [30, 36] first make certain assumptions about the data and then fit the existing data to the assumed model to obtain predictions.
Currently there are several challenges in collaborative filtering [55], one of the most
difficult of which is dealing with data sparsity. In recommender systems, extremely
sparse rating data may occur when ratings are available for only a small proportion
of items compared to the actual large item set. Traditional approaches to CF, such as
the Pearson Correlation method or Vector Similarity method [12], calculate two users’
similarity based on their co-rated items. As a result, if the rating data is extremely
sparse, the lack of co-rated items for pairwise users will lead to inaccurate similarity
estimates.
Several approaches have been developed to address the data sparsity problem. One
of them is Default Voting [12], which automatically assumes a default rating value for
some number of additional items and thus extends the aggregation domain. However,
experiments show that even though Default Voting does improve performance to some extent, the approach is still coarse because a single default rating is not specialized to each item.
Another memory-based method formulates a linear model [7] to fit existing ratings
and calculates similarity through a quadratic optimization problem. Although this
approach exploits global information, it still highly depends on the availability of a
sufficient number of co-rated items.
In this chapter, we propose an iterative collaborative filtering framework to deal
with sparse data in recommender systems. Our algorithm first estimates part of the missing ratings based on similarity. Afterwards, it goes back and updates the similarity function using the estimated ratings. This process is repeated iteratively.
2.1.1 Problem Formulation
In recommender systems, there is a set of users U = {u1, . . . , um} and a set of items I = {i1, . . . , in}. For each user u, Iu denotes the subset of items that user u has rated, and Iuv denotes the subset of items that both user u and user v have rated. An m × n rating matrix R is then constructed, in which each element ru,i denotes user u's rating of item i; the data provides only part of this matrix. Let r̄u denote the average rating of user u over all the items u has rated. The goal is to estimate an unknown rating ru,i of a specific user u on a target item i.
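As a concrete illustration of this notation, a sparse rating matrix and the sets Iu and Iuv can be sketched as follows (toy data; np.nan marks unknown ratings):

```python
import numpy as np

# Toy 4-user x 5-item rating matrix R; np.nan marks unknown entries.
R = np.array([
    [5.0, 3.0, np.nan, 1.0, np.nan],
    [4.0, np.nan, np.nan, 1.0, np.nan],
    [np.nan, 1.0, 5.0, np.nan, 4.0],
    [1.0, np.nan, 4.0, np.nan, np.nan],
])

def rated_items(R, u):
    """I_u: indices of the items user u has rated."""
    return np.flatnonzero(~np.isnan(R[u]))

def co_rated(R, u, v):
    """I_uv: items rated by both user u and user v."""
    return np.flatnonzero(~np.isnan(R[u]) & ~np.isnan(R[v]))

def mean_rating(R, u):
    """r-bar_u: user u's average over the items u has rated."""
    return float(np.nanmean(R[u]))
```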
2.1.2 Related Work
As mentioned before, memory-based methods are divided into two steps: similarity
assessment and aggregation. Various approaches have been used in collaborative
filtering. The most commonly-used method, called the Pearson Correlation [50], uses
a similarity function su,v between users u and v, defined as

s_{u,v} = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^2} \sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^2}} \quad (2.1)

The unknown rating ru,i for user u and item i is then computed as an aggregate of known ratings. The aggregation function is defined as follows:

\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in U} s_{u,v} (r_{v,i} - \bar{r}_v)}{\sum_{v \in U} |s_{u,v}|} \quad (2.2)

From Eq.(2.1), we can see that the Pearson Correlation between users u and v depends strongly on Iuv, the set of items co-rated by users u and v. Since unpaired ratings cannot be used, insufficient co-rated items can lead to inaccurate similarity estimation.
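Eqs.(2.1) and (2.2) can be sketched directly in code; the `min_co_rated` fallback of returning zero similarity is an illustrative choice, not part of the original formulation:

```python
import numpy as np

def pearson(R, u, v, min_co_rated=2):
    """Eq.(2.1): Pearson correlation over co-rated items; returns 0.0
    when too few co-rated items exist (a hypothetical fallback)."""
    co = np.flatnonzero(~np.isnan(R[u]) & ~np.isnan(R[v]))
    if len(co) < min_co_rated:
        return 0.0
    du = R[u, co] - np.nanmean(R[u])
    dv = R[v, co] - np.nanmean(R[v])
    denom = np.sqrt((du ** 2).sum()) * np.sqrt((dv ** 2).sum())
    return float(du @ dv / denom) if denom > 0 else 0.0

def predict(R, u, i):
    """Eq.(2.2): weighted aggregation of other users' ratings on item i."""
    num, den = 0.0, 0.0
    for v in range(R.shape[0]):
        if v == u or np.isnan(R[v, i]):
            continue
        s = pearson(R, u, v)
        num += s * (R[v, i] - np.nanmean(R[v]))
        den += abs(s)
    return float(np.nanmean(R[u]) + (num / den if den > 0 else 0.0))
```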
To overcome the sparsity problem, [12] proposed a default voting technique, assuming a default rating value d for some number k of additional items. This extends the aggregation domain from Iu ∩ Iv toward Iu ∪ Iv by replacing unpaired missing values with the default value d. Experiments show that this can improve performance in the sparse case, but the approach is rather ad hoc and depends greatly on the choice of d. Another memory-based method [7] formulates a linear model to fit existing ratings and calculates a similarity wi,j between items i and j through a quadratic optimization problem. For a fixed item i, the objective is

\arg\min_{w} \frac{\sum_{u \in U_i} c_u \left( r_{u,i} - \frac{\sum_{j \in I_u} w_{i,j} r_{u,j}}{\sum_{j \in I_u} w_{i,j}} \right)^2}{\sum_{u \in U_i} c_u}, \quad \text{where } c_u = \Big( \sum_{j \in I_u} w_{i,j} \Big)^2.
Here, Ui denotes all the users that have rated item i while Iu denotes all the items
that have been rated by user u. Although this algorithm takes global information into
consideration, it is still highly dependent on co-rated items. Moreover, it needs to solve a quadratic optimization problem for each individual item, leading to high computational cost.
Model-based methods view ratings as a probabilistic model, fit a model to the
training data, and then make predictions on the testing data. A variety of Singular
Value Decomposition (SVD) related algorithms have been proposed for PCA or low
rank matrix completion [11, 52]. Formally, the SVD of an m × n matrix M is given by

M = U \Sigma V^*,

where (in the thin SVD) U is an m × n matrix with orthonormal columns, Σ is an n × n diagonal matrix with nonnegative real numbers on the diagonal, and V^* is an n × n unitary matrix. We take the largest k singular values Σk of Σ and reconstruct the low-rank matrix Mk with the corresponding Uk and Vk:

M_k = U_k \Sigma_k V_k^*.
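The rank-k reconstruction can be sketched with NumPy's SVD routine:

```python
import numpy as np

def low_rank(M, k):
    """Rank-k reconstruction M_k = U_k Sigma_k V_k^* via the thin SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then multiply by the first k right singular vectors.
    return U[:, :k] * s[:k] @ Vt[:k, :]
```

For a matrix that is exactly rank k, this reconstruction recovers M itself; for noisy or filled-in rating matrices it yields the best rank-k approximation in the least-squares sense.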
Billsus and Pazzani [11] proposed a binary SVD algorithm, regarding prediction as a
classification problem. It first transforms the sparse rating data into a dense binary
matrix. Then it applies SVD to get the low dimensional data as users’ preference
features and trains n feedforward neural networks for classification. Sarwar et al. [52]
proposed a generalized SVD algorithm, which works by first filling in all missing data with the average ratings of each user, then reducing the dimension of the data with SVD, and finally using the Pearson Correlation or N-Nearest-Neighbors for prediction. Later, Kim and Yum [34] proposed an iterative principal component analysis algorithm for collaborative filtering. Instead of filling in the missing ratings with
the missing ratings iteratively, until convergence is achieved. All these SVD related
algorithms address the sparsity problem to varying degrees. However when data gets
increasingly sparse, the performance of SVD degrades significantly due to limited
information. In the next section, we introduce a new algorithm that has improved
performance in the sparse regime.
2.2 Iterative Collaborative Filtering
In this section we present our iterative collaborative filtering. We start by introducing the iterative framework and then go into the details of the selective processes. Finally, we present the algorithm in pseudo-code.
2.2.1 Iterative Framework
In the Pearson Correlation method of Eq.(2.1), the similarity between users u and v is calculated only on the co-rated items in Iuv. Default Voting and SVD algorithms overcome the sparsity problem by filling in missing values with users' average ratings. Here we propose an iterative framework that overcomes the sparsity by first estimating some ratings for which we have sufficient data, and then using these estimates to refine the similarity measure.
We base our iterative collaborative filtering method on the Pearson Correlation,
following the steps presented below:
1. Calculate a preliminary estimate of the similarity between pairs of users:

s_{u,v} = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^2} \sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^2}}, \quad \forall (u, v) \in U \times U \text{ such that } |I_{uv}| > M.

2. Estimate the subset of missing values that can be predicted most reliably using the preliminary similarity estimates:

\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in U} s_{u,v} (r_{v,i} - \bar{r}_v)}{\sum_{v \in U} |s_{u,v}|}, \quad \text{for eligible } (u, i) \in U \times I.

3. Update all users' average ratings and the similarity between pairs of users.

4. Use the updated estimates and the original data for the target prediction \hat{r}_{u_0,i_0}. If |\hat{r}^{new}_{u_0,i_0} - \hat{r}^{old}_{u_0,i_0}| \geq \alpha, go back to Step 2.
Fig.2.1 shows the framework of iterative collaborative filtering.
Figure 2.1: Iterative Framework
2.2.2 Selective Processes
From the iterative framework, we can see that since the rating matrix and similarity
function highly depend on each other, the rough estimation of the missing ratings
in the first round can be very noisy if selected inappropriately, and thus can distort
the similarity measure when it is used for updating. In the similarity measurement
step, the similarity function su,v between u and v is calculated more reliably when
the number of co-rated items |Iuv| is large. In the aggregation step, to estimate ru,i,
it is intuitive that more available ratings ru,j,∀j ∈ I will provide a better estimate
of user u’s mean value ru while more available ratings rv,i,∀v ∈ U will provide a
better estimate of the bias term for item i. Moreover, [28] and [20] show that top-N nearest-neighbor methods achieve higher accuracy than aggregating over all ratings, which means the most similar users are the more reliable ones. Therefore,
our algorithm applies the following selective processes:
• In the preliminary estimate of the similarity measure, we rely only on pairs of users u and v that have more than M co-rated items. In the experimental section, we set M = 5.
• To fill in missing ratings, we set a threshold γ such that if there are more than γ related ratings available for a missing value, we estimate it using the weighted summation prediction; otherwise we leave it blank. We dynamically choose γ as triple the average number of known related ratings. When the data is sparse, γ is small and we pre-estimate many missing ratings; when the data gets dense, a large γ selects fewer regions for estimation.
• For every prediction, we use only the N nearest neighbors instead of all ratings for aggregation. Unless otherwise noted, all experimental results presented here take N = 30.
2.2.3 Algorithm Description
The pseudo-code is shown in Algorithm 1. Note that we do not need to perform the rough-estimate update every time we estimate ru0,i0 for a specific u0 and i0. In fact, for the multiple estimations within each iteration, we need to go through this step only once, which significantly improves the efficiency of the algorithm.
Algorithm 1 Iterative Collaborative Filtering

Given m users u1, . . . , um and n items i1, . . . , in;
Given a sparse rating matrix Rm×n in which ru,i denotes the rating of item i by user u.
Goal: estimate ru0,i0 for specific u0 and i0.

while |r̂_new(u0,i0) − r̂_old(u0,i0)| ≥ 0.01 do
    Calculate r̄u = (Σ_{i∈Iu} ru,i) / |Iu|, ∀u ∈ U
    Calculate su,v by Eq.(2.1), ∀(u, v) ∈ U × U such that |Iuv| > M; zero otherwise
    for u = u1, . . . , um do
        du ← number of available ratings by user u
    end for
    for i = i1, . . . , in do
        ci ← number of available ratings on item i
    end for
    γ ← (3(m + n − 1) / (m × n)) Σ_{u∈U} du
    for u = u1, . . . , um do
        for i = i1, . . . , in do
            if du + ci > γ and ru,i is missing then
                Estimate ru,i by Eq.(2.2)
            end if
        end for
    end for
    r̂(u0,i0) ← r̄_{u0} + (Σ_{u∈U} s_{u0,u}(ru,i0 − r̄u)) / (Σ_{u∈U} |s_{u0,u}|)
end while
return r̂(u0,i0)
2.3 Experiments
This section discusses our experimental results on a real dataset, MovieLens1, comparing our iterative collaborative filtering algorithm with other state-of-the-art methods.
2.3.1 Experimental Design
The MovieLens dataset was first used in [50]. It contains a total of 100,000 ratings
from 943 users on 1,682 items. Each user rates at least 20 movies, based on a scale
from 1 to 5. The sparsity of the rating matrix is 6.3%. Although simple demographic information about users and basic information about the movies are available in the dataset, we do not use them here, because our algorithm is purely collaborative filtering and depends only on users' ratings of items.

1 http://www.grouplens.org
The MovieLens dataset was originally divided into an 80% training set and a 20% testing set. As mentioned by MovieLens, it is already a post-processed dataset that keeps only users who provided at least 20 ratings. The rating sparsity is already 6.3%, while [51] notes that in recommender systems even active users may purchase or rate fewer than 1% of the items. In order to consider more typical sparsity levels, we change the density of the data by randomly picking 20%, 40%, 60%, 80%, or 100% of the original training set, which corresponds to 1% to 5% data sparsity for training, with the rest used for testing.
In our experiments, we compare our algorithm with several other algorithms:
• Baseline. This is the baseline for our algorithm evaluation, using average ratings
of each user as prediction.
• Pearson. We implement user-based collaborative filtering, with the Pearson
Correlation as its similarity function.
• DV. This is the Default Voting algorithm [12], with optimal parameters fit.
• Naive SVD. This is the approach proposed in [52], which fills in missing values with average ratings as preprocessing and uses the reduced-dimension matrix directly as the predicted ratings.
• CF-based SVD. This is also from [52], with the same average-rating preprocessing; it uses the low-dimensional data as users' preference features for similarity measurement.
• Iterative SVD. This is an improved version of Naive SVD, proposed in [34]. It
iteratively calculates SVD and replaces its missing values with the low dimen-
sional data, until convergence is achieved.
• Iterative CF. This is our proposed iterative collaborative filtering algorithm. In our experiments, this algorithm converges very quickly, usually within 2 to 4 iterations.
2.3.2 Evaluation Metrics
Recommender systems are currently evaluated in many ways [29]. In this experiment, we use the commonly-used CF predictive accuracy metrics MAE and RMSE, and an effectiveness metric called Coverage.
MAE & RMSE
The most commonly-used metrics are Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). They are defined as follows:

$$\mathrm{MAE} = \frac{\sum_{(u,i)} |\hat{r}_{u,i} - r_{u,i}|}{n},$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,i)} (\hat{r}_{u,i} - r_{u,i})^2}{n}},$$

where $\hat{r}_{u,i}$ is the estimated value, $r_{u,i}$ is the ground truth, and $n$ denotes the total number of estimated ratings.
Coverage
Although MAE and RMSE are one way to measure accuracy, there are other metrics
to evaluate an algorithm’s effectiveness and robustness. Coverage is the percentage of
items for which predictions are effective. Algorithms with lower coverage may be less valuable to users since they are limited in the areas in which they can help. In the following, we test only on the testing set and count the effective predictions.

$$\mathrm{coverage} = \frac{\text{Number of effective estimated ratings}}{\text{Number of ratings that need to be estimated}}$$
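As a concrete illustration, all three metrics can be computed directly from (estimate, ground truth) pairs; the following is a minimal sketch whose function names are ours, not part of the dissertation:

```python
import math

def mae(pairs):
    """Mean Absolute Error over (estimate, ground truth) pairs."""
    return sum(abs(est - truth) for est, truth in pairs) / len(pairs)

def rmse(pairs):
    """Root Mean Square Error over (estimate, ground truth) pairs."""
    return math.sqrt(sum((est - truth) ** 2 for est, truth in pairs) / len(pairs))

def coverage(n_effective, n_requested):
    """Fraction of requested ratings for which an effective prediction exists."""
    return n_effective / n_requested
```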
2.3.3 Experimental Results
In this subsection, we first show the change of MAE, RMSE and coverage with differ-
ent data sparsity. From Fig.2.2 and Fig.2.3 we can see that when data is 1% sparse,
our algorithm performs the best. Default Voting (DV) is the second best algorithm
in the sparse case. However we should note that the performance of DV greatly de-
pends on the choice of parameters. All the results of DV shown here use the optimal
parameters. When the rating matrix gets denser, even at 3% sparsity, our algorithm still performs better than the others. At 5% sparsity, our algorithm still performs as well as CF-based SVD and the Pearson Correlation method. This is because when
data is dense, the adaptive parameter γ will only choose a very limited region of
missing ratings for estimation, leading to a process similar to the Pearson Correla-
tion method. CF-based SVD is slightly better than our algorithm because it gets
rid of useless information in the original data using PCA. Fig.2.4 shows that when
data is extremely sparse, namely 1% data sparsity, our algorithm still has a very high
coverage at 96%, compared to the Pearson Correlation method’s 69%.
In order to see the performance of our algorithm on sparse data as well as in the normal case, we show all the metrics in Table 2.1 for 1% sparse data and 5% sparse data.
Table 2.1 shows that when data is 1% sparse, iterative collaborative filtering outperforms the other algorithms by 0.03 in MAE and 0.04 in RMSE, respectively. The Pearson Correlation algorithm and the CF-based SVD perform poorly because of insufficient co-rated items. When the data becomes denser, reaching 5% sparsity,
Figure 2.2: MAE versus Data Sparsity
Table 2.1: Experimental Results

                 1% Data Sparsity            5% Data Sparsity
                 MAE    RMSE   Coverage      MAE    RMSE   Coverage
Baseline         0.877  1.107  100%          0.850  1.063  100%
Pearson          0.873  1.118  68.76%        0.753  0.959  99.97%
DV               0.849  1.085  99.38%        0.760  0.965  99.93%
Naive SVD        0.851  1.078  100%          0.791  0.977  100%
CF-based SVD     0.868  1.135  97.63%        0.750  0.958  99.54%
Iterative SVD    0.850  1.078  100%          0.753  0.962  100%
Iterative CF     0.817  1.042  96.23%        0.751  0.960  99.99%
iterative collaborative filtering still performs as well as the CF-based SVD and the
Pearson Correlation method.
Figure 2.3: RMSE versus Data Sparsity
Figure 2.4: Coverage versus Data Sparsity
Chapter 3

Graph-based Shilling Attack Detection in Recommender Systems
This chapter introduces shilling attack detection in recommender systems using graph-based algorithms. The chapter is organized as follows. Section 3.1 describes the background
and related work. Section 3.2 formulates the problem and introduces two graph-
based detection algorithms. Section 3.3 discusses details of the spectral clustering
algorithm, an advanced clustering method for searching the highly correlated group.
Experimental results and comparisons with existing methods are presented in Section
3.4.
3.1 Background
Recommender systems are vulnerable to shilling attacks [37] in which an attacker
signs up as a number of “dummy” users and gives fake ratings in an attempt to
increase or decrease the recommendations of specific items by exploiting knowledge
of the recommender system algorithm. “Push attacks” attempt to make one or more
items popular in the system so that they are recommended to more users. Conversely,
attacks that make a set of items less popular are called “nuke attacks”. One of the
difficult challenges in recommender system design is to find algorithms that are robust
to shilling attacks. For simplicity, in the following we consider only push attacks, though our proposed algorithm can be applied to both cases.
In this chapter, we model shilling attacks as a group of users with highly correlated
ratings. We formulate the detection problem as a spectral clustering problem, namely
to cluster the whole user set and choose the fake profile groups from clusters. However,
since the rating matrix is usually very sparse, we cannot easily define a complete
distance measure for the clustering problem. In order to overcome the sparsity, we
construct a graph based on the similarity matrix. Using that graph we apply a spectral
clustering algorithm, which is based on the similarity measure instead of distance
measure for clustering. Experimental results show that our method performs well for
a range of different attacks.
Our algorithm makes the following contributions over prior work:
• We do not make any assumption on the attack model except that attack users
are highly correlated;
• We start from intra attributes to focus on statistics across user profiles instead
of individual profiles;
• We apply graph-based algorithms and spectral clustering to cluster user profiles based on similarity, namely the pairwise correlations between users, thereby avoiding the need for a distance measure;

• Our algorithms do not need the exact number of attack profiles to be specified; they can estimate it automatically.
Table 3.1: General Form of Attack Profiles

  Item set:  $L_T$          $L_S$                                 $L_F$                               $L_N$
  Items:     $l_t$          $l^S_1 \cdots l^S_s$                  $l^F_1 \cdots l^F_f$                $l^N_1 \cdots l^N_n$
  Ratings:   $\gamma(l_t)$  $\alpha(l^S_1) \cdots \alpha(l^S_s)$  $\beta(l^F_1) \cdots \beta(l^F_f)$  null $\cdots$ null
3.1.1 Attack Models
All notation is defined in Sections 2.1.1 and 2.1.2. A fairly general form of an
attack profile is shown in Table 3.1. First the target item is rated as either highest or
lowest. Then some items are selected to be rated to mimic the real users’ rating so
that the fake profile can be similar to the real user, in order to make some impact on
the final recommendations. Based on the function, the attack profile can be thought
of as four sets of items:
• LT : a singleton target item lt;
• LS: a set of selected items with particular characteristics determined by the
attacker;
• LF : a set of filler items usually chosen randomly;
• LN : a set of unrated items.
In a typical attack, the target item lt is usually set at either the highest score (for
a push attack) or the lowest score (for a nuke attack). However, different choices of
rating functions and selections of LS and LF lead to different attack models, some
of which are described below. Here $N(\mu, \sigma^2)$ denotes the Gaussian distribution with mean $\mu$ and variance $\sigma^2$.

• Random attack: $L_S = \emptyset$ and $\beta(l) \sim N(\bar{r}, \sigma^2)$, where $\bar{r}$ is the overall mean rating.

• Average attack: $L_S = \emptyset$ and $\beta(l) \sim N(\bar{r}_l, \sigma_l^2)$, where $\bar{r}_l$ and $\sigma_l^2$ are the mean and variance of the ratings of item $l$.

• Bandwagon attack: $L_S$ contains some number of popular items, $\alpha(l) = r_{\max}$ and $\beta(l) \sim N(\bar{r}_l, \sigma_l^2)$.
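These attack models can be sketched in code. The following is a hedged illustration, not the exact generator used in the experiments; the parameter names and the integer-clipping choice are ours, and a push attack rating the target item $r_{\max}$ is assumed:

```python
import numpy as np

def make_attack_profile(n_items, target, rng, model="random", selected=(),
                        item_means=None, item_stds=None,
                        overall_mean=3.5, sigma=1.1,
                        filler_frac=0.05, r_max=5, r_min=1):
    """Generate one fake user profile (np.nan marks unrated items).

    model: 'random'    -> filler ratings ~ N(overall_mean, sigma^2)
           'average'   -> filler ratings ~ N(item mean, item variance)
           'bandwagon' -> selected popular items get r_max; fillers as 'average'
    """
    profile = np.full(n_items, np.nan)
    pool = [i for i in range(n_items) if i != target and i not in set(selected)]
    fillers = rng.choice(pool, size=int(filler_frac * n_items), replace=False)
    if model == "random":
        raw = rng.normal(overall_mean, sigma, size=len(fillers))
    else:
        raw = rng.normal(item_means[fillers], item_stds[fillers])
    profile[fillers] = np.clip(np.rint(raw), r_min, r_max)
    if model == "bandwagon":
        profile[list(selected)] = r_max
    profile[target] = r_max  # push attack: rate the target highest
    return profile
```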
3.1.2 Related Work
O’Mahony et al. summarize different types of attack strategies and empirically
evaluate the robustness of memory-based collaborative filtering [43–45]. The work
in [16, 41, 42, 65] extends the robustness analysis to model-based algorithms such as
K-means, Probabilistic Latent Semantic Analysis (PLSA), Singular Value Decomposition (SVD), Principal Component Analysis (PCA), and Matrix Factorization (MF).
In the existing algorithms, three main methods are proposed for attack detection in
recommender systems.
Generic Attributes
Attack profiles usually have low deviation from the mean value for most items, but
high deviation from the mean for the attacked item, and they are highly correlated
with each other. Therefore generic attributes are often used to evaluate the deviation
of rating profiles or the similarity with nearest neighbors. [17] proposes Rating Devi-
ation from Mean Agreement (RDMA) and Degree of Similarity with Top Neighbors
(DegSim) to classify fake profiles. Furthermore, an unsupervised retrieval method
(UnRAP) based on matrix residues was proposed in [13]. These measures are defined
as follows.
• Rating Deviation from Mean Agreement (RDMA):

$$\mathrm{RDMA}_{u_0} = \frac{1}{|L_{u_0}|} \sum_{l \in L_{u_0}} \frac{|r_{u_0,l} - \bar{r}_l|}{|L_l|},$$

where $|L_u|$ is the number of items that user $u$ has rated and $|L_l|$ is the number of ratings provided for item $l$.
• Degree of Similarity with Top Neighbors (DegSim):

$$\mathrm{DegSim}_{u_0} = \frac{1}{k} \sum_{i=1}^{k} s_{u_0, n_i},$$

where $n_i$ is the $i$th nearest neighbor of user $u_0$.
• Unsupervised Retrieval of Attack Profiles (UnRAP):

$$\mathrm{UnRAP}_{u_0} = \frac{\sum_{l \in L_{u_0}} (r_{u_0,l} - \bar{r}_{u_0} - \bar{r}_l + \bar{r})^2}{\sum_{l \in L_{u_0}} (r_{u_0,l} - \bar{r}_l)^2}.$$
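As an illustration, the first two generic attributes can be computed as follows; this is a sketch under our own naming, where `R` holds raw ratings with `np.nan` for missing entries and `S` is the user-user similarity matrix:

```python
import numpy as np

def rdma(R, u):
    """Rating Deviation from Mean Agreement for user u."""
    item_means = np.nanmean(R, axis=0)          # per-item mean rating
    item_counts = (~np.isnan(R)).sum(axis=0)    # |L_l|: ratings per item
    rated = ~np.isnan(R[u])                     # items user u has rated
    dev = np.abs(R[u, rated] - item_means[rated]) / item_counts[rated]
    return dev.sum() / rated.sum()

def degsim(S, u, k):
    """Average similarity of user u with its k nearest neighbors."""
    sims = np.delete(S[u], u)                   # drop self-similarity
    return float(np.sort(sims)[-k:].mean())     # mean of the k largest values
```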
Model-specific Attributes
Prior work has shown that generic attributes are generally insufficient for distinguish-
ing an attack profile from eccentric but authentic profiles [14]. Model-based methods
assume that we have some prior knowledge about the attack model. Based on an
assumed model, ratings can be automatically divided into LS and LF . Finally several
measurements such as Filler Mean Variance (FMV) or Filler Mean Target Differ-
ence (FMTD) [62] can be computed from each subset to evaluate the authenticity of
profiles.
• Filler Mean Variance (FMV):

$$\mathrm{FMV}_{u_0} = \frac{1}{|L_F|} \sum_{l \in L_F} (r_{u_0,l} - \bar{r}_l)^2,$$

where $L_F$ denotes the filler item set.
• Filler Mean Target Difference (FMTD):

$$\mathrm{FMTD}_{u_0} = \left| \frac{\sum_{l \in L_S} r_{u_0,l}}{|L_S|} - \frac{\sum_{l \in L_F} r_{u_0,l}}{|L_F|} \right|,$$

where $L_S$ denotes the selected item set and $L_F$ denotes the filler item set.
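A minimal sketch of these two model-specific attributes, assuming the selected and filler item sets have already been identified (the indexing conventions are ours):

```python
import numpy as np

def fmv(profile, item_means, filler_items):
    """Filler Mean Variance: mean squared deviation of filler ratings
    from the corresponding item means."""
    f = np.asarray(filler_items)
    return float(np.mean((profile[f] - item_means[f]) ** 2))

def fmtd(profile, selected_items, filler_items):
    """Filler Mean Target Difference: absolute gap between the mean
    rating on selected items and the mean rating on filler items."""
    s, f = np.asarray(selected_items), np.asarray(filler_items)
    return float(abs(profile[s].mean() - profile[f].mean()))
```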
Intra Attributes
Model-specific methods usually need some training data to estimate the parameters of the attack model; otherwise $L_S$ and $L_F$ cannot easily be separated. Unlike generic attributes and model-specific attributes, which concentrate on characteristics within a single profile, intra attributes focus on statistics across profiles. As Mehta et al. mention in [39], spam users are highly correlated and often work together. Therefore a PCA-based method can be applied to remove the most correlated users. This approach orders all the user profiles based on their contribution to the principal component (or the top few principal components) and removes the most highly correlated ones. However, [39–41, 72] do not specify how to choose the number of principal components to be considered. In [68], starting from the same assumption, a large component searching algorithm on the similarity graph is proposed to find the highly correlated group. It is unstable when a small random attack is applied, since the algorithm searches only for locally optimal solutions, and in this case the local optimum may be far from the global optimum.
In the remainder of this chapter, we will start from the same assumption that fake
users are highly correlated, but we propose a more robust algorithm to find the most
correlated group of users.
3.2 Graph-based Filtering Algorithm
3.2.1 Problem Formulation
As we mentioned before, shilling attackers generally work as a group and are highly
correlated with each other. Suppose we have m user profiles among which n are fake.
From the original rating matrix Rm×p we can calculate the user correlation matrix
S = (si,j)m×m based on Eq.(2.1). Our final goal is to find the n × n submatrix with
the maximum sum in the original m ×m matrix, with the same columns and rows
selected. We define $\vec{\delta} = (\delta_1, \dots, \delta_m)$, where $\delta_i$ is an indicator function that represents whether column/row $i$ is selected. Therefore the problem is formulated as below.
$$\vec{\delta} = \arg\max_{\|\vec{\delta}\|_1 = n} \frac{1}{|\vec{\delta}|^2} \, \vec{\delta} S \vec{\delta}^T \qquad (3.1)$$
$$= \arg\max_{\sum_{i=1}^m \delta_i = n} \frac{1}{|\vec{\delta}|^2} \sum_{i=1}^m \sum_{j=1}^m s_{i,j} \delta_i \delta_j,$$

where $\delta_i \in \{0, 1\}$ for all $i = 1, \dots, m$.
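For a fixed candidate group, the objective of Eq. (3.1) is simply the average pairwise similarity of the selected submatrix. A small sketch (the function name is ours):

```python
import numpy as np

def group_score(S, members):
    """Objective of Eq. (3.1): (1/n^2) times the sum of the n x n submatrix
    of S obtained by selecting the same rows and columns `members`."""
    idx = np.asarray(members)
    sub = S[np.ix_(idx, idx)]
    return float(sub.sum()) / len(idx) ** 2
```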
Searching for a maximal submatrix is typically referred to as a biclustering prob-
lem [15] and several algorithms have been proposed. Since (3.1) requires selecting the
same columns and rows, it is different from traditional biclustering. However, a data
matrix can be viewed as a weighted graph G = (V,E) where V is the set of vertices
and E is the set of edges. Each vertex in V denotes a corresponding column/row and
each edge between vi and vj has a weight si,j. Therefore we can get a suboptimal
solution from the graph, a subset of points which have a high average weight within
group. In the following subsections, we propose two algorithms, namely heuristic
merging and largest component searching, to derive an approximate solution. After
that, we use an iterative process to refine the attack group. There is always a trade-
off between the group size n and the average similarity within the group. Therefore,
how to set up the stopping criterion is another challenge in this problem and we will
address it later.
3.2.2 Heuristic Merging
A naive greedy algorithm selects one user out of the group with the highest average
similarities with all users in the group, and adds it into the original group. When
the group size is relatively large and most of them are fake profiles, this algorithm
works very well. However, when the group size is relatively small, the initial points
are hard to select. Therefore, we propose a generalized greedy, or heuristic merging
algorithm, regarding each point initially as a separate cluster, merging them step by
step by heuristic functions and finally leading to an optimal cluster with the right
size. Before doing that, we first introduce some notation for convenience.
• $C_i^{(t)}$ is the node set of cluster $i$ after $t$ merging actions; $\bigcup_i C_i^{(t)} = \{1, \dots, m\}$ and $C_i^{(t)} \cap C_j^{(t)} = \emptyset$, $\forall i \neq j$.

• $n_i^{(t)} = |C_i^{(t)}|$ is the number of nodes in cluster $i$ after $t$ merging actions.

• $d_{ij}^{(t)} = \frac{1}{n_i^{(t)} n_j^{(t)}} \sum_{x \in C_i^{(t)},\, y \in C_j^{(t)}} s_{x,y}$ is the average similarity between clusters $i$ and $j$ after $t$ merging actions, $i \neq j$.

• $f_i^{(t)} = \frac{1}{(n_i^{(t)})^2} \sum_{x \in C_i^{(t)},\, y \in C_i^{(t)}} s_{x,y}$ is the average similarity within cluster $i$ after $t$ merging actions.
Initially each node belongs to its own cluster, i.e. $C_i^{(0)} = \{i\}$, $\forall i = 1, \dots, m$. Then at each time $t$ we search for two clusters $C_i^{(t)}, C_j^{(t)}$ based on a heuristic function $h(C_i^{(t)}, C_j^{(t)})$ and merge them together. Our final goal is to find a cluster $C_{i_0}^{(t_0)}$ that maximizes $f_{i_0}^{(t_0)}$ such that $n_{i_0}^{(t_0)} \geq n$.

In the problem above we have two objectives to focus on, namely the size of the cluster $n_{i_0}^{(t_0)}$ and the average utility score $f_{i_0}^{(t_0)}$. Therefore we can choose our heuristic function $h$ based on either the merged size $n_i^{(t+1)}$ or the merged average similarity $f_i^{(t+1)}$. But before the algorithms are introduced, let us first see the relationship after two clusters are combined.
first see the relationship after two clusters are combined.
Claim 1 Suppose in the (t + 1)th merging action, C(t)i and C
(t)j merge, i.e.
C(t+1)i = C
(t)i
⋃C
(t)j . Then
d(t+1)ik =
n(t)i d
(t)ik + n
(t)j d
(t)jk
n(t)i + n
(t)j
,∀k 6= i.
f(t+1)i =
(n(t)i )2f
(t)i + (n
(t)j )2f
(t)j + 2n
(t)i n
(t)j d
(t)ij
(n(t)i + n
(t)j )2
;
Proof.

$$d_{ik}^{(t+1)} = \frac{\sum_{x \in C_i^{(t+1)},\, y \in C_k^{(t+1)}} s_{x,y}}{n_i^{(t+1)} n_k^{(t+1)}} = \frac{\sum_{x \in C_i^{(t)} \cup C_j^{(t)},\, y \in C_k^{(t)}} s_{x,y}}{(n_i^{(t)} + n_j^{(t)}) n_k^{(t)}}$$
$$= \frac{\sum_{x \in C_i^{(t)},\, y \in C_k^{(t)}} s_{x,y}}{(n_i^{(t)} + n_j^{(t)}) n_k^{(t)}} + \frac{\sum_{x \in C_j^{(t)},\, y \in C_k^{(t)}} s_{x,y}}{(n_i^{(t)} + n_j^{(t)}) n_k^{(t)}}$$
$$= \frac{n_i^{(t)} n_k^{(t)} d_{ik}^{(t)}}{(n_i^{(t)} + n_j^{(t)}) n_k^{(t)}} + \frac{n_j^{(t)} n_k^{(t)} d_{jk}^{(t)}}{(n_i^{(t)} + n_j^{(t)}) n_k^{(t)}} = \frac{n_i^{(t)} d_{ik}^{(t)} + n_j^{(t)} d_{jk}^{(t)}}{n_i^{(t)} + n_j^{(t)}}.$$

$$f_i^{(t+1)} = \frac{\sum_{x \in C_i^{(t+1)},\, y \in C_i^{(t+1)}} s_{x,y}}{(n_i^{(t+1)})^2} = \frac{\sum_{x \in C_i^{(t)} \cup C_j^{(t)},\, y \in C_i^{(t)} \cup C_j^{(t)}} s_{x,y}}{(n_i^{(t+1)})^2}$$
$$= \frac{\sum_{x \in C_i^{(t)},\, y \in C_i^{(t)}} s_{x,y}}{(n_i^{(t)} + n_j^{(t)})^2} + \frac{\sum_{x \in C_j^{(t)},\, y \in C_j^{(t)}} s_{x,y}}{(n_i^{(t)} + n_j^{(t)})^2} + \frac{2 \sum_{x \in C_i^{(t)},\, y \in C_j^{(t)}} s_{x,y}}{(n_i^{(t)} + n_j^{(t)})^2}$$
$$= \frac{(n_i^{(t)})^2 f_i^{(t)} + (n_j^{(t)})^2 f_j^{(t)} + 2 n_i^{(t)} n_j^{(t)} d_{ij}^{(t)}}{(n_i^{(t)} + n_j^{(t)})^2}. \qquad \square$$
The heuristic merging algorithm is shown in Algorithm 2.
We can easily verify that when $h(C_i^{(t)}, C_j^{(t)}) = n_i^{(t+1)} = n_i^{(t)} + n_j^{(t)}$ (breaking ties in $n_i^{(t+1)}$ by maximizing $f_i^{(t+1)}$ to avoid trivial cases), the algorithm becomes the naive greedy algorithm, which adds the best node to the original cluster at each step and converges within $n - 1$ steps. When we set $h(C_i^{(t)}, C_j^{(t)}) = f_i^{(t+1)} = \frac{(n_i^{(t)})^2 f_i^{(t)} + (n_j^{(t)})^2 f_j^{(t)} + 2 n_i^{(t)} n_j^{(t)} d_{ij}^{(t)}}{(n_i^{(t)} + n_j^{(t)})^2}$, we allow nodes to merge freely regardless of the current cluster size. Finally, after at most $m - 1$ steps, a cluster with more than $n$ nodes will
Algorithm 2 Heuristic Merging Algorithm

Input: $S_{m \times m}$, a symmetric correlation matrix of real numbers, and $n$, the target size of the cluster.
Output: $C_{i_0}^{(t_0)}$, a cluster of size $n_{i_0}^{(t_0)} \geq n$ with locally maximal average similarity $f_{i_0}^{(t_0)}$.
Initialization: $C_i^{(0)} = \{i\}$, $n_i^{(0)} = 1$, $d_{ij}^{(0)} = s_{ij}$, $f_i^{(0)} = 0$, $\forall i, j = 1, \dots, m$; $t = 0$.
while $\max_i n_i^{(t)} \leq n$ do
  Find clusters $i, j$ such that $h(C_i^{(t)}, C_j^{(t)})$ achieves its maximum;
  $C_i^{(t+1)} \leftarrow C_i^{(t)} \cup C_j^{(t)}$; $n_i^{(t+1)} \leftarrow n_i^{(t)} + n_j^{(t)}$;
  $d_{ik}^{(t+1)} \leftarrow \frac{n_i^{(t)} d_{ik}^{(t)} + n_j^{(t)} d_{jk}^{(t)}}{n_i^{(t)} + n_j^{(t)}}$, $\forall k \neq i$;
  $f_i^{(t+1)} \leftarrow \frac{(n_i^{(t)})^2 f_i^{(t)} + (n_j^{(t)})^2 f_j^{(t)} + 2 n_i^{(t)} n_j^{(t)} d_{ij}^{(t)}}{(n_i^{(t)} + n_j^{(t)})^2}$;
  $t \leftarrow t + 1$;
end while
return $C_i^{(t)}$
appear to be a solution. In the following, we will use $f_i^{(t+1)}$ as the merging heuristic function.
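The merging procedure with the $f^{(t+1)}$ heuristic can be sketched as follows; this is our own illustration built from the update rules of Claim 1, with ties broken by iteration order:

```python
import numpy as np

def heuristic_merge(S, n):
    """Sketch of Algorithm 2: repeatedly merge the pair of clusters that
    maximizes the post-merge average similarity f, until some cluster
    reaches size n; returns that cluster's member indices."""
    m = S.shape[0]
    clusters = [[i] for i in range(m)]
    d = S.astype(float).copy()        # between-cluster average similarity
    f = np.zeros(m)                   # within-cluster average similarity
    sizes = np.ones(m, dtype=int)
    active = set(range(m))

    def merged_f(i, j):
        ni, nj = sizes[i], sizes[j]
        return (ni**2 * f[i] + nj**2 * f[j] + 2 * ni * nj * d[i, j]) / (ni + nj) ** 2

    while max(sizes[k] for k in active) < n and len(active) > 1:
        i, j = max(((a, b) for a in active for b in active if a < b),
                   key=lambda p: merged_f(*p))
        ni, nj = sizes[i], sizes[j]
        for k in active - {i, j}:     # Claim 1 update for between-cluster d
            d[i, k] = d[k, i] = (ni * d[i, k] + nj * d[j, k]) / (ni + nj)
        f[i] = merged_f(i, j)         # Claim 1 update for within-cluster f
        clusters[i] += clusters[j]
        sizes[i] = ni + nj
        active.remove(j)
    return max((clusters[k] for k in active), key=len)
```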
3.2.3 Searching for the Largest Component
The heuristic merging algorithm achieves a local optimal solution step by step. How-
ever, as we mentioned before, we must do it carefully for the first few steps in order
to keep the local optimum not far away from the global optimum. Is there a way to
find a group of highly correlated users in a single step, namely to find a cluster that
maximizes the average similarity $f(C)$? Here $f(C)$ is defined as

$$f(C) = \frac{1}{|C|^2} \sum_{x \in C,\, y \in C} s_{x,y}.$$
A natural way would be to set a threshold to break some edges and find the largest
component from the modified graph.
Here we set a threshold $\gamma$ and convert $S_{m \times m}$ to a graph based on the following rule: if $s_{i,j} > \gamma$, we link the two vertices $v_i$ and $v_j$; otherwise we break the
Algorithm 3 Largest Component Searching Algorithm

Input: $S_{m \times m}$, a symmetric correlation matrix of real numbers, and $n$, the target size of the cluster.
Output: $C$, a cluster with size larger than $n$ and a relatively high average similarity $f(C)$ within the cluster.
Denote $\gamma$ as the translation threshold, $\delta_\gamma$ as the smallest distinguishable threshold step, and $G_{m \times m}$ as the translated graph.
Initialize $\gamma \leftarrow 1$.
repeat
  $\gamma \leftarrow \gamma - \delta_\gamma$
  for each pair $(u, v)$: if $s_{u,v} > \gamma$ then set $u, v$ connected in $G$, else set $u, v$ unconnected in $G$
  Find the largest component in $G$, denoted $C$
until $|C| \geq n$
return $C$
link between two vertices. When γ is close to 1, all the vertices are separate from each
other. As γ decreases, the original separate components connect with each other due
to high correlation. In this case, the largest component in the graph, denoted as C,
can be derived based on the classic algorithm introduced in [18] for an approximate
solution of Eq. (3.1). Here, we can always choose a proper γ to make sure the size of
C is around a prefixed number n. In the following step, we will further refine the set
and automatically determine the final fake profile size n. The algorithm is shown in
Algorithm 3.
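A plain-Python sketch of Algorithm 3, with the thresholded graph stored implicitly and components found by depth-first search; `delta` plays the role of $\delta_\gamma$:

```python
def largest_component(S, n, delta=0.01):
    """Sketch of Algorithm 3: lower the threshold gamma until the largest
    connected component of the thresholded graph has at least n vertices."""
    m = len(S)
    gamma, best = 1.0, []
    while gamma > -1.0:               # similarities lie in [-1, 1]
        gamma -= delta
        seen, best = set(), []
        for start in range(m):        # DFS over components of {s_ij > gamma}
            if start in seen:
                continue
            comp, stack = [], [start]
            seen.add(start)
            while stack:
                u = stack.pop()
                comp.append(u)
                for v in range(m):
                    if v != u and v not in seen and S[u][v] > gamma:
                        seen.add(v)
                        stack.append(v)
            if len(comp) > len(best):
                best = comp
        if len(best) >= n:
            break
    return best, gamma
```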
3.2.4 Iterative Refinement
We expect that the solution derived from either heuristic merging or largest component searching will contain a certain level of noise, since many genuine profiles could be highly similar to at least one of the fake profiles. Under either algorithm, a real user's profile could be classified as an attack profile when a single heavily weighted edge connects it to the attack group. However, the attack profiles share high similarity
Algorithm 4 Iterative Refinement

Input: $S_{m \times m}$, a symmetric correlation matrix of real numbers, and $C$, a cluster with a high average similarity $f(C)$ within the cluster.
Output: $C$, a cluster with a size larger than $n$ and a relatively high average similarity $f(C)$ within the cluster.
while $C \neq C_0$ do
  $C_0 = C$
  for $i = 1$ to $s$ do
    $\mathrm{Cor}_u \leftarrow \frac{\sum_{v \in C} s_{u,v}}{|C|}, \forall u \notin C$
    $C = C \cup \{\arg\max_{u \notin C} \mathrm{Cor}_u\}$
  end for
  for $i = 1$ to $s$ do
    $\mathrm{Cor}_u \leftarrow \frac{\sum_{v \in C} s_{u,v}}{|C|}, \forall u \in C$
    $C = C \setminus \{\arg\min_{u \in C} \mathrm{Cor}_u\}$
  end for
end while
return $C$
with all the other attack profiles. Therefore we need a further refinement process for
this algorithm.
Here a greedy algorithm is applied for refinement. We separate this process into two steps, addition and deletion, with a pre-set number $s$ of addition/deletion moves per round. For the addition step, the average similarity of each profile outside $C$ with the profiles in $C$ is calculated, and the one with the highest average similarity is added; this is repeated $s$ times. For the deletion step, we calculate the average similarity within the set $C$ and remove the least correlated profile from $C$, again repeating $s$ times. We alternate $s$ additions and $s$ deletions until convergence is achieved. Note that we usually do not choose $s = 1$ but a larger number, say $s = 10$ in this case. This helps avoid local optima and pushes the final result toward the global optimum. The algorithm is shown in Algorithm 4.
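A sketch of the refinement loop of Algorithm 4 in Python; for simplicity we include each user's self-similarity in the deletion scores, which does not change the ordering materially:

```python
import numpy as np

def refine(S, group, s=10):
    """Sketch of Algorithm 4: alternately add the s outside users most
    similar to the group and drop its s least-correlated members, until
    the group no longer changes."""
    m = S.shape[0]
    C, prev = set(group), None
    while C != prev:
        prev = set(C)
        for _ in range(s):            # addition step
            outside = [u for u in range(m) if u not in C]
            if not outside:
                break
            score = {u: S[u, list(C)].sum() / len(C) for u in outside}
            C.add(max(score, key=score.get))
        for _ in range(s):            # deletion step
            score = {u: S[u, list(C)].sum() / len(C) for u in C}
            C.remove(min(score, key=score.get))
    return sorted(C)
```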
3.3 Spectral Clustering Detection
3.3.1 Spectral Clustering
A connection between data matrices and graphs has been exploited in Section 3.2.
A data matrix can be viewed as a weighted graph G = (V,E) where V is the set
of vertices and E is the set of edges. Each vertex vi ∈ V denotes a corresponding
column/row, and each edge between vi and vj has a weight si,j. Therefore the adja-
cency matrix is represented by Sm×m. To find a highly correlated group in the graph,
our aim is to maximize the intragroup correlations and to minimize the intergroup
correlations. Given a subset of vertices $C \subset V$ and its complement $\bar{C}$, we define $\mathrm{Cut}(C, \bar{C})$ as the cost function of a graph separation:

$$\mathrm{Cut}(C, \bar{C}) = \sum_{i \in C,\, j \in \bar{C}} s_{i,j}.$$
In order to get approximately equal sizes of the two groups, we incorporate the group sizes and define $\mathrm{RatioCut}(C, \bar{C})$ as the minimization objective:

$$\mathrm{RatioCut}(C, \bar{C}) = \frac{\sum_{i \in C,\, j \in \bar{C}} s_{i,j}}{|C|} + \frac{\sum_{i \in C,\, j \in \bar{C}} s_{i,j}}{|\bar{C}|}. \qquad (3.2)$$
Instead of optimizing Eq.(3.1), here in this section we will optimize Eq.(3.2) to get
the highly correlated group for fake user detection.
To rewrite the RatioCut function, let us first define the degree of a vertex $v_i \in V$ as

$$d_i = \sum_{j=1}^{m} s_{i,j}.$$
The degree matrix $D$ is defined as the diagonal matrix with the degrees $d_1, \dots, d_m$ on the diagonal:

$$D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_m \end{pmatrix}.$$
The unnormalized graph Laplacian matrix is defined as
L = D − S.
We further define the vector $f = (f_1, \dots, f_m)' \in \mathbb{R}^m$ with entries

$$f_i = \begin{cases} \sqrt{|\bar{C}|/|C|}, & \text{if } v_i \in C, \\ -\sqrt{|C|/|\bar{C}|}, & \text{if } v_i \in \bar{C}. \end{cases} \qquad (3.3)$$
The RatioCut function can then be rewritten as follows:

$$f'Lf = f'Df - f'Sf = \sum_{i=1}^m d_i f_i^2 - \sum_{i=1}^m \sum_{j=1}^m f_i f_j s_{i,j}$$
$$= \frac{1}{2} \left( \sum_{i=1}^m d_i f_i^2 - 2 \sum_{i=1}^m \sum_{j=1}^m f_i f_j s_{i,j} + \sum_{j=1}^m d_j f_j^2 \right) = \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m s_{i,j} (f_i - f_j)^2$$
$$= \frac{1}{2} \sum_{i \in C,\, j \in \bar{C}} s_{i,j} \left( \sqrt{\frac{|\bar{C}|}{|C|}} + \sqrt{\frac{|C|}{|\bar{C}|}} \right)^2 + \frac{1}{2} \sum_{i \in \bar{C},\, j \in C} s_{i,j} \left( -\sqrt{\frac{|C|}{|\bar{C}|}} - \sqrt{\frac{|\bar{C}|}{|C|}} \right)^2$$
$$= \mathrm{Cut}(C, \bar{C}) \left( \frac{|\bar{C}|}{|C|} + \frac{|C|}{|\bar{C}|} + 2 \right) = \mathrm{Cut}(C, \bar{C}) \left( \frac{|C| + |\bar{C}|}{|C|} + \frac{|C| + |\bar{C}|}{|\bar{C}|} \right) = |V| \, \mathrm{RatioCut}(C, \bar{C}).$$
We know that

$$\sum_{i=1}^m f_i = \sum_{i \in C} \sqrt{\frac{|\bar{C}|}{|C|}} - \sum_{i \in \bar{C}} \sqrt{\frac{|C|}{|\bar{C}|}} = |C| \sqrt{\frac{|\bar{C}|}{|C|}} - |\bar{C}| \sqrt{\frac{|C|}{|\bar{C}|}} = 0$$

and

$$\|f\|^2 = \sum_{i=1}^m f_i^2 = |C| \frac{|\bar{C}|}{|C|} + |\bar{C}| \frac{|C|}{|\bar{C}|} = |V|.$$
From the derivation above, minimizing the RatioCut function is equivalent to minimizing $f'Lf$. From [59], if we relax the condition on $f$ and allow its entries to be continuous instead of the two predefined values in Eq. (3.3), the nontrivial optimal solution minimizing $f'Lf$ is the eigenvector corresponding to the second smallest eigenvalue of $L$, disregarding the smallest eigenvalue $0$ with its trivial eigenvector $(1, \dots, 1)'$. Based on the second eigenvector, we can map the values back to the two corresponding values in Eq. (3.3) and obtain the clustering.
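The relaxed RatioCut solution translates into a few lines of numpy, sketched below; the sign of the eigenvector is arbitrary, so the two returned index sets are unordered:

```python
import numpy as np

def spectral_bipartition(S):
    """Split vertices by the sign of the eigenvector belonging to the
    second smallest eigenvalue of the unnormalized Laplacian L = D - S."""
    d = S.sum(axis=1)
    L = np.diag(d) - S
    _, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]              # second smallest eigenvalue's vector
    pos = fiedler >= 0
    return np.where(pos)[0], np.where(~pos)[0]
```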
3.3.2 Dealing with Unbalanced Structure
Standard spectral clustering creates a division into two roughly equal-sized clusters based on the RatioCut function in Eq. (3.2). However, in our case the target group size satisfies $|C| \ll |V|$. The final optimization problem would be

$$\min_{C \subset V,\, |C| \leq \eta |V|} \mathrm{Cut}(C, \bar{C}) = \sum_{i \in C,\, j \in \bar{C}} s_{i,j}, \qquad (3.4)$$
where $\eta$ is an upper bound on the attack size. Here, $|V| = m$ and $\eta |V| = n$. To deal with unbalanced data, we adopt the rank-adjusted degree graph for separation [48]. Basically, it can be divided into three steps.
1. Rank Computation: The density rank $R(v_i)$ for each vertex $v_i$ is calculated based on the underlying density function $\rho(\cdot)$:

$$R(v_i) = \frac{1}{|V|} \sum_{v_k \in V} I_{\rho(v_i) \geq \rho(v_k)}.$$

Here, we use the average weight of $v_i$'s top 30 nearest neighbors as $\rho(v_i)$.
2. Graph Construction: Connect each point $v_i$ to its $d(v_i)$ nearest neighbors in graph $G$, where

$$d(v_i) = d_0 (\lambda + 2(1 - \lambda) R(v_i)),$$

and $\lambda$ is a scalar parameter that controls the degree of imbalance. For all other edges, we set the weight to zero. In the following, we take $\lambda = 0.5$.
3. Graph Separation: Calculate the second smallest eigenvector of L and separate
it iteratively for the highly correlated group.
The reason we adjust the degree is that we want to add edges in dense areas while reducing edges in sparse areas. With this adjustment, the penalty for having a smaller cluster is reduced, since edges have already been cut in the sparse area. After the adjustment, when we apply the standard spectral clustering method, the algorithm can automatically cut the graph into two unbalanced parts. The smaller cluster, with higher average similarity, will contain the fake user group $C_F$.
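The rank-adjustment step can be sketched as follows; this is our own illustration, where `k` corresponds to the 30 nearest neighbors used for the density $\rho$ and is reduced automatically for small graphs:

```python
import numpy as np

def adjusted_degrees(S, d0=10, lam=0.5, k=30):
    """Rank-adjusted target degree d(v_i) = d0 * (lam + 2*(1-lam)*R(v_i)),
    where the density rho(v_i) is the average similarity of v_i to its
    top-k nearest neighbors and R(v_i) is rho's empirical rank."""
    m = S.shape[0]
    kk = min(k, m - 1)
    rho = np.array([np.sort(np.delete(S[i], i))[-kk:].mean() for i in range(m)])
    R = np.array([(rho <= rho[i]).mean() for i in range(m)])  # rank in [0, 1]
    return d0 * (lam + 2 * (1 - lam) * R)
```

Dense vertices thus receive up to $1.5\,d_0$ neighbors while sparse ones receive as few as $0.5\,d_0$ when $\lambda = 0.5$.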
Denote by $T$ the pending-processing stack, into which we initially insert the user set $U$. We pop each element $V$ from $T$, apply this algorithm to separate $V$ into two subgroups $V_1$ and $V_2$, and push them onto $T$, until the group size is less than or equal to $n$. Finally, the group of size at most $n$ with the highest average similarity within the group is selected as $C_F$. The complete algorithm is shown in Algorithm 5.
3.3.3 Iterative Refinement
We expect that the solution derived from spectral clustering will contain a certain
level of noise since many genuine profiles could be highly similar with some of the fake
profiles. Therefore sometimes a real user’s profile could be classified as an attack pro-
file when several edges with large weight are connected to the attack group. However
the attack profiles share high similarity with all the other attack profiles while these
real profiles only share high similarity with a few attack profiles. Therefore we need a
further refinement process to remove these real profiles. The procedure is addressed
in Section 3.2.4.
Algorithm 5 Spectral Clustering Attack Detection Algorithm

Input: $S_{m \times m}$, a symmetric correlation matrix.
Goal: Find $C_F$, a cluster of size $\leq n$ with a high average similarity within the cluster.
Push the user set $U$ onto the pending stack $T$
while nonempty($T$) do
  Pop $V$ from $T$
  $R(u_i) = \frac{1}{|V|} \sum_{u_k \in V} I_{\rho(u_i) \geq \rho(u_k)}$ for each $u_i \in V$
  $d(u_i) = d_0 (\lambda + 2(1 - \lambda) R(u_i))$ for each $u_i \in V$
  Set $s_{i,j} = 0$ if $u_j$ is not among the top $d(u_i)$ nearest neighbors of $u_i$
  Calculate the graph Laplacian matrix $L = D - S$
  Compute the singular value decomposition $L = M \times \Sigma \times N$
  Take the second eigenvector of $L$ and separate $V$ into two groups $V_1$ and $V_2$ based on signs
  if $|V_1| \leq n$ or $|V_2| \leq n$ then
    if AvgSim($V_1$) $\geq$ max or AvgSim($V_2$) $\geq$ max then
      $C_F = V_1$ or $V_2$; max = AvgSim($V_1$) or AvgSim($V_2$)
    end if
  else
    Push $V_1$ and $V_2$ onto $T$
  end if
end while
return $C_F$
We can further analyze the robustness of the refinement procedure. If both ui
and uj are fake users, assume that the similarity si,j between them is drawn from
N(µF , σ2); otherwise, the similarity (between two real users or between a real user
and a fake user) is drawn from N(µR, σ2). We further assume that they are all
independent. Here µF > µR due to the high correlation between fake users. Suppose
we have found a highly correlated group of users CR with size n. Let x denote the
current percentage of fake users in the group. Then for a new fake user ui coming
from outside of the group, the total similarity distribution with users in the group
$C_R$ is

$$\mathrm{Cor}^F_{u_i} = \frac{1}{n} \sum_{u_j \in C_R} s_{i,j} = \frac{1}{n} \Biggl( \sum_{\substack{u_j \in C_R \\ u_j \text{ fake}}} s_{i,j} + \sum_{\substack{u_j \in C_R \\ u_j \text{ real}}} s_{i,j} \Biggr) = \frac{1}{n} \Biggl( \sum_{j=1}^{xn} \Omega_j + \sum_{j=1}^{(1-x)n} \omega_j \Biggr) \sim N\Bigl( \mu_R + x(\mu_F - \mu_R),\ \frac{\sigma^2}{n} \Bigr),$$

where $\Omega_j \sim N(\mu_F, \sigma^2)$ and $\omega_j \sim N(\mu_R, \sigma^2)$.
For a real user $u_i$, the total similarity distribution is

$$\mathrm{Cor}^R_{u_i} = \frac{1}{n} \sum_{u_j \in C_R} s_{i,j} = \frac{1}{n} \sum_{j=1}^{n} \omega_j \sim N\Bigl( \mu_R,\ \frac{\sigma^2}{n} \Bigr).$$
Thus, $\mathrm{Cor}^F_{u_i} - \mathrm{Cor}^R_{u_i} \sim N\bigl( x(\mu_F - \mu_R),\ \frac{2\sigma^2}{n} \bigr)$. A fake user is selected in the refinement procedure with probability

$$P(\mathrm{Cor}^F_{u_i} > \mathrm{Cor}^R_{u_i}) = P(\mathrm{Cor}^F_{u_i} - \mathrm{Cor}^R_{u_i} > 0) = \Phi\Bigl( \frac{x \sqrt{n}\, (\mu_F - \mu_R)}{\sqrt{2}\, \sigma} \Bigr), \qquad (3.5)$$

where $\Phi(x)$ is the cumulative distribution function (CDF) of the standard normal distribution,

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt.$$
Similarly, a real user is removed from the target group in the refinement procedure with the same probability as in Eq. (3.5). We can see that the robustness depends on the original fake user percentage $x$, the target group size $n$, and the statistical difference between fake and real users, $\frac{\mu_F - \mu_R}{\sigma}$. A larger proportion of fake users, a larger group size, and a larger difference between fake and real users all make the refinement procedure more robust.
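Eq. (3.5) can be evaluated numerically with the standard normal CDF written via the error function; the parameter values in the usage below are hypothetical:

```python
import math

def fake_selection_prob(x, n, mu_f, mu_r, sigma):
    """Probability in Eq. (3.5) that refinement prefers a fake user:
    Phi(x * sqrt(n) * (mu_f - mu_r) / (sqrt(2) * sigma))."""
    z = x * math.sqrt(n) * (mu_f - mu_r) / (math.sqrt(2) * sigma)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # Phi(z)
```

The probability grows with the fake-user fraction $x$, the group size $n$, and the separation $(\mu_F - \mu_R)/\sigma$, matching the discussion above.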
3.3.4 Searching for the Number of Attack Profiles
Since we do not have any prior knowledge, it is difficult to find the right number of
attack profiles n. We know that when the size of the group gets larger, the average
similarity within the group would become lower. Therefore for the target profiles with
attacks, our algorithm varies the size of the highly correlated group to get a sequence,
G(n), denoting the maximal average similarity of a group with size n. Suppose G0(n)
is the maximal average similarity sequence for profiles without attacks. The optimal
number of fake profiles n∗ is then obtained from Eq.(3.6) as
$$n^* = \arg\max_n \bigl( G(n) - G_0(n) \bigr) \qquad (3.6)$$
Therefore, for the detection task, we first formulate it as an optimization problem in Eq. (3.1). We construct the similarity graph with edge adjustment to deal with the unbalanced clustering, and then apply the spectral clustering algorithm iteratively to find the most correlated group of a fixed size $n$, as shown in Algorithm 5. We then refine the result using the greedy procedure in Algorithm 4. Finally, we vary the size $n$ and find the attack size $n^*$ that maximizes Eq. (3.6).
3.4 Experiments
3.4.1 Experimental Setup
In the experiments, we use the MovieLens dataset [28]. It contains around 100,000
ratings from 943 users on 1,682 items. Each user rates at least 20 movies on a scale
from 1 to 5. The density of the rating matrix is 6.3%. 80% of the ratings are randomly
selected as the training set and the rest as testing.
We randomly pick 20 movies as the attack items for each test and artificially insert
50, 70 or 100 attack profiles (corresponding to 5%, 7% and 10% attack size) with filler
size 5%. We choose such a filler size because it is consistent with the real user profiles.
Each movie is attacked individually and the average is reported in the results.
Since a general random attack cannot have a big impact on the final prediction, we generate ratings based on $N(\bar{r}, (0.7\sigma)^2)$ for an enhanced random attack, while for an average attack we generate ratings based on $N(\bar{r}_l, \sigma_l^2)$. For a bandwagon
for an average attack, we generate ratings based on N(rl, σl2). For a bandwagon
attack, we select movies 50, 56, 100, 127, 174, 181 as the selected set LS and rate
them as rmax = 5. We select these movies because they are rated by more than 300
users and have an average rating larger than 4, which means they are very popular
in the system. We further generate two obfuscated attack models. One is a noisy
bandwagon attack, a special case of Average Over Popular Items (AOP) mentioned
in [31], which randomly selects 3 out of the 6 popular movies mentioned above rated
as rmax to avoid high correlations. The other is a mixed attack [10], which combines
a 3% or 5% average attack together with a 3% or 5% noisy bandwagon attack, to
make the attack model diversified and difficult to identify.
We compare our spectral clustering (SC) algorithm with RDMA [17], DegSim [17],
UnRAP [13] and large component searching (LC) algorithms [68]. In the experiments,
since we have no prior knowledge of the exact number of attack profiles, our algorithm
and LC derive n∗ from Eq.(3.6). For RDMA, DegSim, and UnRAP, we assume the
exact number of attack profiles is known, which yields the same precision and
recall in the results.
3.4.2 Assumption Validation
From [22], if r_{u,i1}, ..., r_{u,ip} and r_{v,i1}, ..., r_{v,ip} are generated from G(r̄, σ²) with no
correlations, the pdf of s_{u,v} is proportional to f(s) = (1 − s²)^((p−4)/2). To obtain the
distribution of the average of multiple independent similarities, we can simply convolve
f(s). In the real case, p is the number of items co-rated by the two users u
and v. We can then plot the average similarity pdf and the 99th percentile of the
average similarity in Fig.3.1 and Fig.3.2.
Figure 3.1: The Average Similarity PDF for Different Group Size
In Fig.3.1, we can see that as the group size grows, the pdf concentrates more
tightly around its mean. We further examine the 99th percentile value for different
group sizes in Fig.3.2; the shape of the curve is similar to the real users’ maximal
Figure 3.2: The 99% Percentile Average Similarity Value
average similarity in Fig.3.3, but a little lower. The reason is that the calculated 99th
percentile of the average similarity assumes the similarities are independent of each
other, whereas in reality high correlations exist.
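The null pdf and its 99th percentile can be checked numerically; a minimal sketch (the grid size and function names are my own):

```python
import numpy as np

def null_pdf(s, p):
    """Unnormalized null pdf of the similarity for p co-rated items."""
    return (1.0 - s**2) ** ((p - 4) / 2.0)

def percentile_99(p, grid=200001):
    """99th percentile of the null similarity distribution via a numeric CDF."""
    s = np.linspace(-1.0, 1.0, grid)
    cdf = np.cumsum(null_pdf(s, p))
    cdf /= cdf[-1]                         # normalize to a proper CDF
    return float(s[np.searchsorted(cdf, 0.99)])

# More co-rated items -> the null distribution tightens around 0,
# so the 99th percentile shrinks.
q10, q100 = percentile_99(10), percentile_99(100)
```

This reproduces the qualitative behavior in Fig.3.1 and Fig.3.2: larger p concentrates the pdf and lowers the 99th percentile.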
3.4.3 Searching for the Number of Attack Profiles
We run the spectral clustering algorithm first on the real user profiles to get the
relationship between maximal average similarity and group size G0(n). G0(n) is
shown as a solid line in Fig.3.3. We can see that G0(n) first decays very fast but
then begins to decay slowly. To analyze the characteristics of G0(n), we segment the
function into two sections and fit an exponential curve and a linear curve, respectively.
The expression of the fitted curve G∗0(n) is shown in Eq.(3.7) and the curve is
Figure 3.3: G0(n) and Fitting Curve
shown by the dashed line in Fig.3.3.
G∗0(n) = 0.686 e^(−0.117n) + 0.412,   if n ∈ (1, 100];
G∗0(n) = −2.26 × 10^(−4) n + 0.378,   if n ∈ (100, 943].   (3.7)
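Eq.(3.7) is straightforward to evaluate; the sketch below simply implements the published fit (it does not re-estimate the coefficients):

```python
import numpy as np

def g0_fit(n):
    """Evaluate the fitted baseline curve G*_0(n) of Eq. (3.7)."""
    n = np.asarray(n, dtype=float)
    expo = 0.686 * np.exp(-0.117 * n) + 0.412     # fast decay, n in (1, 100]
    lin = -2.26e-4 * n + 0.378                    # slow decay, n in (100, 943]
    return np.where(n <= 100, expo, lin)

# The exponential branch dominates for small groups, the nearly flat
# linear branch for large ones, matching the two regimes of G0(n).
```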
The real users’ behavior and the real users plus 100 random/average/bandwagon
attackers’ behaviors are drawn in Fig.3.4, 3.5, 3.6. The difference is also shown by
the dashed line. We can see that the difference is maximized when the group size
is around 100 for random attacks and average attacks. However, for bandwagon
attacks, the group size is around 110. This is because real users who are similar to
the fake user group are counted in as well. Further analysis is given in the
following sections.
Figure 3.4: Group Size Vs Average Similarity for 100 Random Attackers
3.4.4 Evaluation Metrics
For the classification of attack profiles, we use precision and recall to evaluate the
performance of the detection algorithm:
Precision = TP / (TP + FP),
Recall = TP / (TP + FN),
where TP is the number of attack profiles correctly detected, FP is the number of real
user profiles misclassified as attack profiles and FN is the number of attack profiles
misclassified as real user profiles.
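These two metrics can be computed directly from the detected and true attacker sets; a small self-contained sketch:

```python
def precision_recall(detected, attackers):
    """Precision and recall for attack-profile detection.

    detected : set of user ids flagged as attack profiles
    attackers: set of true attack-profile ids
    """
    tp = len(detected & attackers)     # attack profiles correctly detected
    fp = len(detected - attackers)     # real users misclassified as attackers
    fn = len(attackers - detected)     # attack profiles missed
    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / (tp + fn) if attackers else 0.0
    return precision, recall

# 3 true positives, 1 false positive, 1 false negative:
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 4, 5})
```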
Figure 3.5: Group Size Vs Average Similarity for 100 Average Attackers
3.4.5 Experimental Results and Discussion
The experimental results comparing our proposed SC with LC, RDMA, DegSim and
UnRAP are shown in Table 3.2. We can separate all these algorithms into two classes
based on their underlying assumptions. Both UnRAP and RDMA assume fake users
rate items with lower variance. DegSim, LC and SC assume fake users are highly
correlated and work together.
SC performs very well in most cases, especially in the enhanced random attack
case while the other algorithms lose their effectiveness. The reason is that the random
attack group has the lowest correlations among all the attack models. When the
attack size is small, there is no significant difference between genuine profiles and
fake profiles. Therefore even LC cannot easily find the right size n∗ due to the similar
values of G(n) and G0(n). However, our SC algorithm starts from the global effect
Figure 3.6: Group Size Vs Average Similarity for 100 Bandwagon Attackers
and uses the eigenvector of the second-smallest eigenvalue of the graph Laplacian L to cut the graph
with minimal cost. Therefore, it is more stable and more effective.
UnRAP does the worst job in most cases because it assumes a fake user’s ratings
have lower variance and are related to the column mean, row mean, and overall mean.
However, random attacks and bandwagon attacks do not satisfy this assumption,
leading to poor results. The average attacks fit this assumption well so that the
algorithm does a better job but is still not accurate enough. RDMA starts from a
similar assumption to evaluate the genuineness of profiles, but is still not effective
due to the limitation of the assumption.
The DegSim method does not perform well because essentially it starts from the
assumption that fake profiles are highly correlated, but only focuses on the k nearest
neighbors instead of overall neighbors. SC starts from the same assumption but gets
the global optimal solution from min-cut algorithms. Note that SC separates real
and fake profiles quite well but the estimate of n∗ usually contains some error. As a
result, sometimes either high precision or recall is achieved but not both. Even in the
two obfuscated attack models with lower correlations, SC still performs well while the
performance of the other algorithms decreases significantly. Overall, SC outperforms
the existing methods, especially when fake profiles are highly correlated.
Table 3.2: Experimental Results

Attack Model      Enhanced Random Attack   Average Attack           Bandwagon Attack         Average Over Popular Items   Mixed Attack
Attack Size       5%      7%      10%      5%      7%      10%      5%      7%      10%      5%      7%      10%          3%+3%   5%+5%

Precision
SC                99.8%   99.9%   99.0%    99.3%   99.9%   99.9%    92.3%   95.7%   91.7%    88.9%   92.4%   93.9%        99.8%   99.0%
LC                0.3%    21.2%   53.6%    83.2%   99.8%   99.2%    92.7%   91.5%   90.5%    90.6%   89.3%   87.8%        92.1%   95.0%
DegSim            5.2%    5.7%    18.7%    23.1%   36.9%   60.3%    74.5%   77.3%   81.2%    64.3%   72.2%   72.1%        43.4%   66.4%
RDMA              72.3%   74.4%   78.3%    74.2%   76.2%   77.2%    72.5%   79.0%   81.2%    71.4%   78.2%   81.6%        33.3%   57.3%
UnRAP             1.0%    1.4%    3.0%     48.2%   48.8%   68.2%    8.7%    9.6%    28.1%    6.7%    12.6%   23.1%        23.3%   33.3%

Recall
SC                90.7%   92.9%   94.1%    91.0%   92.9%   99.9%    99.7%   96.1%   99.7%    89.2%   93.6%   92.1%        92.3%   94.2%
LC                0.3%    46.3%   64.2%    81.0%   94.6%   99.0%    99.8%   99.9%   99.5%    96.2%   93.1%   92.1%        97.1%   96.8%
DegSim            5.2%    5.7%    18.7%    23.1%   36.9%   60.3%    74.5%   77.3%   81.2%    64.3%   72.2%   72.1%        43.4%   66.4%
RDMA              72.3%   74.4%   78.3%    74.2%   76.2%   77.2%    72.5%   79.0%   81.2%    71.4%   78.2%   81.6%        33.3%   57.3%
UnRAP             1.0%    1.4%    3.0%     48.2%   48.8%   68.2%    8.7%    9.6%    28.1%    6.7%    12.6%   23.1%        23.3%   33.3%

Prediction Shift
SC                0.04    0.05    0.07     0.01    0.03    -0.01    0.02    0.01    0.05     0.10    0.17    0.12         0.02    0.06
LC                0.58    0.57    0.52     0.13    -0.01   0.01     0.01    0.01    0.03     0.07    0.13    0.29         0.01    -0.03
DegSim            0.53    0.65    0.71     0.85    0.91    0.90     0.51    0.65    0.74     0.57    0.64    0.65         0.72    0.79
RDMA              0.21    0.28    0.34     0.38    0.42    0.57     0.51    0.62    0.69     0.52    0.65    0.68         0.93    0.96
UnRAP             0.57    0.70    0.79     0.65    0.70    0.81     1.08    1.28    1.30     1.08    1.17    1.29         1.03    1.15
Chapter 4
Location-based Recommender
Systems
Location-based recommender systems have attracted a large number of users in recent
years since wireless networks and mobile devices have rapidly developed. Real-time
location-based recommender systems should take location, temporal information, and
social network information into consideration, in order to improve the user experi-
ence. In this chapter, we first review the development of augmented reality in recent
years from a content generation perspective, serving as an introduction to location-
based recommender systems. Then we present an aggregated random walk algorithm
incorporating personal preferences, location information, temporal information, and
social network information in a layered graph. By adaptively changing the graph edge
weight and computing the rank score, the proposed location-based recommender sys-
tem predicts users’ preferences and provides the most relevant recommendations with
aggregated information. Section 4.1 reviews recent applications, technologies, and
current trends in augmented reality. Section 4.2 briefly introduces the background
of location-based recommender systems and Section 4.3 formulates the problem as
a multi-dimensional recommender system. In Section 4.4, a biased random walk al-
gorithm is introduced to incorporate all available information. Finally experimental
results are shown in Section 4.5.
4.1 Introduction to Augmented Reality
4.1.1 AR Ecosystem
Augmented Reality (AR) has become an emerging technology in daily life. With
accurate location information, virtual objects can be integrated with the real world,
which allows users to interact between the real and virtual world. In the work of
Azuma in 1997 [5], three characteristics of AR are identified:
• Combine real and virtual objects in a real environment;
• Run interactively in both 3D and real time;
• Align real and virtual objects with each other.
AR technologies, both hardware and software, have rapidly developed in the past
several years [71], and the market has driven the development of more commercial AR
applications (e.g., Layar, Google Glass, and Wikitude). We envision an AR ecosystem
which, with content as the core, brings together content providers, users, AR ap-
plication developers, AR device manufacturers, and industrial and academic researchers,
transforming the current AR landscape (the way iTunes has changed mobile ap-
plication distribution). The AR ecosystem framework is shown in Fig.4.1. Content
providers will aggregate data from third party companies such as Google or Wikipedia,
local broadcasting sources, environmental sensors, and users, generate AR content,
and export general APIs to support a large range of AR applications. Users will
not only consume AR content and services but also will generate their own content
(e.g., locations or local information), thanks to the sensing abilities of their smart devices.
Figure 4.1: AR Ecosystem Framework
They will also contribute to the system in a crowd-sourcing approach, like the current
YouTube model. AR device manufacturers focus on hardware design such as GPS,
sensors, displays, or integration like smartphones or AR glasses. Researchers can con-
tribute by inventing advanced techniques in tracking, computer vision, ad hoc and
opportunistic data/content delivery and dissemination, mobile computing, display,
energy efficiency, etc. With the ecosystem, AR application developers do not need
to collect data, design their AR devices, or propose their own tracking algorithms.
Instead, they can use standard APIs to get data packages from content providers, and
embed existing AR-related algorithms into their devices made by third party manu-
facturers. Meanwhile, users can contribute through interaction with the ecosystem.
Each party in the ecosystem plays its own role, improves the efficiency of the whole
AR environment, and makes it more sustainable and extendable.
Recent advances in hardware and software for AR have been reviewed in several
survey papers [4, 5, 57, 71]. Localization and calibration have been the most difficult
challenges since AR was first proposed in the 1960s. Current sensor networks apply
multi-sensor systems and cooperative localization algorithms to overcome them [60].
With the rapid development of wireless communication such as 3G and WiFi, the
communication and data exchange among the components of the AR ecosystem
are easy to implement. Ad hoc and opportunistic communication further provides a
scalable way to deliver AR content to users, especially in the current era of mobile
data explosion. The maturity of mobile computing along with the development of
AR-related algorithms lays a good foundation for the AR ecosystem in both hardware
and software. However, a natural challenge is how to generate the content for AR.
Therefore, a core component of the AR ecosystem would be content generation [6].
Content providers aggregate data from multiple sources, process and generate struc-
tured content which will be displayed to the users by AR devices, and thereby enable
the interaction with virtual objects.
4.1.2 Content Fusion in AR
A large amount of information is available online. However, display screens of AR
systems are usually small and narrow. Therefore, after gathering enough data from
content providers, an AR system will integrate multiple data streams that represent
the same real-world object, keeping the captured information consistent, accurate,
and informative. Content fusion plays an important role in user experience and
effective methods are necessary to determine what to display on the screen. Limited
by the computing power of local devices, current AR systems usually implement
content fusion offline, select highly related information, and store it in the database
in advance [73]. However, since AR systems operate interactively between real and
virtual objects in real time, the online content selection would be a main approach
to help users get the most relevant information [58,60,70].
For the content fusion pipeline, we can refer to the three-tier model proposed by
Reitmayr and Schmalstieg [49, 53]. The first tier is a database, where data is ac-
quired from third party companies. The second tier is delivery, where the data in the
database is restructured to meet the specific use of the applications. The third tier is
Figure 4.2: Content Fusion Pipeline
for different applications to use, which corresponds to online content recommendation.
The pipeline is shown in Fig.4.2.
Offline Data Preprocessing
Information integration in the database can be regarded as offline content selection or
data preprocessing. Usually AR application developers do not execute the function
of content providers at the same time. They will download standardized format
content packages from third party companies such as Google, Wikipedia, Yelp, etc.
Therefore highly related and structured information is selected and assembled by
these websites in advance. However, some AR application developers still want to
personalize their AR system content by using their own expertise. Zhu et al. [73]
propose an AR shopping assistant providing dynamic contextualization to customers.
Product context is utilized and complementary products are aligned with each other
in the database in advance. When customers are interested in some specific items,
the shopping assistant automatically provides recommendations for closely related
products as well.
Online Content Selection
For online fusion, information is automatically selected in real time, depending on the
particular location, orientation, and user preference. In 2000, Julier et al. [33] introduced
the concept of information filtering to automatically select content for users.
They also specified some characteristics and desirable properties of online content
selection procedures.
• Any object, of any type, at any point in time, can become sufficiently important
when it passes the filtering criteria.
• Certain objects are important to all users at all times.
• Some objects are important only to particular users.
• All things being equal, the amount of information shown to a user about an
object is inversely proportional to the distance of that object from the user.
Filtering criteria help to evaluate whether a certain object is important enough
for a specific user. Based on the filtering criteria, there are three kinds of information
filtering methods.
• Distance-based filtering: It thresholds an object’s visibility based on its distance
from the user. If the distance is larger than a pre-set threshold, information
about the object would be invisible to the user. However, some soft-threshold
methods have been proposed as well. One example is the Touring Machine [21], in
which the brightness of augmented labels decreases as they move farther from the
center.
• Visibility-based filtering: The visibility of virtual objects depends on whether
the real objects are visible to the user at the current time. It automatically prevents
extra information about invisible objects from being displayed on the screen.
• Advanced filtering: Benford et al. establish a spatial model, using focus and
nimbus to determine the importance of objects [9]. [33] proposes hybrid filtering,
which combines a spatial model and logic rules together with knowledge of the
user’s objectives.
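As a toy illustration of distance-based filtering with a soft threshold (the cutoff value and the linear fade are hypothetical choices of mine, not taken from [21] or [33]):

```python
def label_brightness(distance, cutoff=100.0):
    """Brightness of an augmented label under soft distance-based filtering.

    Full brightness at the user's position, fading linearly to invisible
    at `cutoff` meters; beyond the cutoff the label is hidden entirely.
    """
    if distance >= cutoff:
        return 0.0                      # hard threshold: too far, hide it
    return 1.0 - distance / cutoff      # soft region: linear fade
```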
Currently more advanced techniques are being used for online content selection.
When the goal of the user is well defined, location is often one of the most impor-
tant criteria for content selection. [58] proposes a touring system to help reconstruct
archaeological sites using wearable and mobile computers. Based on the different
locations, computers will automatically download related information, providing ar-
chaeological sites and audio narration. The in-car intelligent system [60] updates
surrounding traffic information for drivers in real time to avoid possible accidents.
Specifically, social messages from other drivers such as ”Follow Me” or warnings from
the sensor systems of other cars such as ”Distracted Driver” are augmented to the cur-
rent driver through the intelligent system, improving car-to-car communication. [35]
proposes an AR education system, with an automatic content selection procedure.
Through mobile devices and positioning systems, learners have access to relevant
information as they arrive at certain locations.
Sometimes, when the purpose or preferences of the customers are uncertain, designing
such a content selection criterion can be more challenging. Plenty of historical
data is stored and much calculation is required to better understand a user’s pref-
erence. The shopping assistant in [73] provides personalized item recommendations
based on customer preferences by using the collaborative filtering algorithm. The
most related items are recommended as customers walk around. [70] further extends
the idea into multi-dimensional recommender systems and proposes a graph-based
algorithm to automatically recommend AR users some places of interest, based on
the time, location, user history, and social network information.
In all, content fusion plays an important role in AR systems. Offline data pre-
processing deals with large amounts of data in cloud servers with high speed and the
data is stored with a predefined structure. After that the online content selection is
processed by local AR devices as real-time data is collected from devices. However,
computational power or memory can be limitations that prevent complex algorithm
implementation. Therefore sometimes crowd-sourcing or cloud computing is applied
to address the limited computing power of local devices.
4.2 Background of Location-based Recommender
Systems
As mentioned in the last section, with the rapid development of wireless networks,
location-based services on mobile devices, such as Google Glass, Yelp, and
Foursquare, have gained an enormous number of users in recent years. As infor-
mation increases from multiple sources while the screen size of mobile devices is
limited, it is becoming increasingly important to design location-based recommender
systems to push relevant information to mobile users [46]. Compared with traditional
recommender systems, location-based recommender systems have the following
characteristics.
• Location: Nearby recommendations are usually more interesting than a place
in a remote location [64].
• Timing: Short-term or in-time preferences have high priority. For example, on
Sunday at 11am, a user is more likely looking for a brunch restaurant or coffee
shop rather than a night club [63].
• Cold start: New users are sensitive to application user experience but sparse
data may lead to inaccurate recommendations [26].
• Immediate feedback: Sometimes location-based recommender systems have to
react quickly based on users’ behaviors. Users may click interesting items im-
mediately after a recommendation is made. The recommendation list is then
updated accordingly.
To address these characteristics, extra information may help improve recommen-
dation quality, such as temporal information, location information and social network
information [54,63]. One possible approach for incorporating this additional informa-
tion is multi-dimensional collaborative filtering. [56] and [2] used a reduction-based
algorithm and applied classic 2D collaborative filtering algorithms to produce a final
recommendation. This approach has limitations. For example, when we extend avail-
able ratings into high dimensions, the data will become extremely sparse and many
existing algorithms will lose their effectiveness. Also, when multiple dimensions are
decoupled as pairs of dimensions, relationships among more than two dimensions are
likely to be lost. Gori and Pucci [27] first proposed the ItemRank algorithm using a
random walk to rank all the items for recommendations. [32, 54] further incorporate
friendship and social network information. [38] proposes a random-walk-based entity
ranking on graphs for multidimensional recommendation. Three advan-
tages are pointed out, namely flexibility to incorporate any type of entities, dealing
with data sparsity and indirect relationships, and adaptability with various graph
ranking algorithms. However, it fails to clearly specify the methodology of incorpo-
rating location and temporal information. [70] models geographical information as a
decay function, while [25] and [24] model temporal information as a Gaussian mixture
model and linear regression, respectively.
4.3 Problem Formulation
Assume that there is a set of users U = {u1, ..., um} and a set of places I = {i1, ..., in}.
Traditionally, a two dimensional rating matrix U × I → R can be constructed. Each
element ru,i in R denotes user u’s rating of place i. The ratings can be either explicit,
for example, on a 1-5 scale as in TripAdvisor, or implicit such as “visited” or “not
visited”. The rating data typically specifies only a small number of the elements
of R. We further assume that binary label information and user social information are
given. Let L = {l1, ..., lk} be the set of label information of items. For example, for
places, L can contain restaurants, shopping malls, bus stops, etc. Let Li ∈ {0, 1}^k
denote the features of place i, where k is the total number of labels. Correspondingly,
let S = (U, ε) contain social network information, represented by an undirected or
directed graph, where U is a set of nodes and ε is a set of edges: ∀u, v ∈ U,
(u, v) ∈ ε if v is a friend of u. We
further denote the set of times as T and the set of locations as P. Then the
multidimensional rating matrix is formulated as R = U × I × L × T × P. Given the
target user u, the current time t, and the current location p, our ultimate goal is to
find the optimal place defined below:

∀u ∈ U, t ∈ T, p ∈ P,   i*_{u,t,p} = arg max_i R(u, l, i, t, p).   (4.1)
In the following sections, we will address the questions mentioned above with the
proposed random walk algorithm.
4.4 Random Walk in Location-based Recom-
mender Systems
In this section, we will describe our random walk algorithm in detail, and discuss how
to deal with specific issues in multidimensional AR recommender systems.
4.4.1 Model Construction
Graph Formulation
Let G = (V, E) be a directed graph model for AR recommender systems, as shown
in Fig.4.3. We construct the graph by the following rules. 1) Nodes V represent
constant attributes such as users, places and labels; for each entity all the nodes stay
Figure 4.3: An example of recommendation graph
in the same layer; 2) Edges/Weighted edges E represent variables such as locations
or time, or relationships, e.g. social network information.
In this recommender system, the nodes V = U ∪ I ∪ L form three layers, which
consist of users, places and labels. The edges E are classified into one of five classes
(described below) based on the layers that the nodes belong to. Higher weight means
higher chance to transition from one node to another. We incorporate personal
records, location information, and label information into the graph. Note that we
use the inverse exponential distribution to model human mobility [3]. Let d denote
the distance between the current location and the target place. Then human mobility
is modeled by the distribution (1/Z) exp(−αd), where α ≥ 0 is a decay parameter
and Z is a normalization factor. α is a tunable factor set by users. For example, if
α = 0, distance to the current location will not affect recommendation results. In
contrast, if α is large, only nearby places will be recommended. Experimental results
in Section 4.5.5 show the effect of α on the average distance between the user’s
position and the recommendations.
• For u ∈ U, i ∈ I, (u, i) ∈ E and (i, u) ∈ E if and only if user u has visited i
(assuming only implicit ratings are available), with weights w_{u,i} = exp(−α d(i, p))
and w_{i,u} = 1, where d(·, ·) is the distance between two places and p is the
current location.

• For i ∈ I, l ∈ L, (i, l) ∈ E and (l, i) ∈ E if and only if L_i(l) ≠ 0, i.e., the place i
belongs to label l, with weights w_{i,l} = w_{l,i} = 1.

• For u1, u2 ∈ U, (u1, u2) ∈ E if and only if (u1, u2) ∈ ε, which means u2 is a friend
of u1. Note that the relationship in social networks is not necessarily mutual,
such as “follow” on Twitter.

• For i1, i2 ∈ I, (i1, i2) ∈ E and (i2, i1) ∈ E if and only if i1 ≠ i2, with weight
w_{i1,i2} = exp(−α d(i1, i2)).

• For l1, l2 ∈ L, (l1, l2) ∈ E if and only if the transition probability from label l1
to l2 is greater than 0, which we estimate from the training data set.
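The construction rules above can be sketched as a weighted edge dictionary. This toy version covers the first four rules; the label-to-label transition edges, which require training data, are omitted, and all names, coordinates, and the α value are illustrative:

```python
import math

def build_graph(visits, labels, friends, coords, current_loc, alpha=0.5):
    """Sketch of the layered recommendation graph as {(src, dst): weight}.

    visits : list of (user, place) check-ins
    labels : dict place -> set of labels
    friends: list of directed (u1, u2) friendship edges
    coords : dict place -> (x, y) position
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    E = {}
    for u, i in visits:                               # user <-> place
        E[('u%s' % u, 'i%s' % i)] = math.exp(-alpha * dist(coords[i], current_loc))
        E[('i%s' % i, 'u%s' % u)] = 1.0
    for i, ls in labels.items():                      # place <-> label
        for l in ls:
            E[('i%s' % i, 'l%s' % l)] = 1.0
            E[('l%s' % l, 'i%s' % i)] = 1.0
    for u1, u2 in friends:                            # directed friendship
        E[('u%s' % u1, 'u%s' % u2)] = 1.0
    for i1 in coords:                                 # place <-> place decay
        for i2 in coords:
            if i1 != i2:
                E[('i%s' % i1, 'i%s' % i2)] = math.exp(-alpha * dist(coords[i1], coords[i2]))
    return E

E = build_graph(visits=[(0, 'A'), (1, 'B')],
                labels={'A': {'food'}, 'B': {'cafe'}},
                friends=[(0, 1)],
                coords={'A': (0.0, 0.0), 'B': (3.0, 4.0)},
                current_loc=(0.0, 0.0))
```

Note how user-to-place edges are discounted by distance from the current location, while the reverse edges carry unit weight, exactly as in the first rule.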
Transition Probability
Assume that a random walk with an initial distribution is applied in such a weighted
graph. The path is a sequence of random variables X1, ..Xt, ..., which form a Markov
chain and the future state only depends on the current state. We need to further
normalize the weight to make it a transition probability.
Let Y1, Y2, ..., Ys denote s layers in the graph. We first define the transition prob-
ability Tij between different layers Yi, Yj.
T_ij := Σ_{n1 ∈ Y_i, n2 ∈ Y_j} P(X_{t+1} = n2 | X_t = n1).   (4.2)

Specifically, in this case we have 3 layers U, I, L, and we define T_ij = 1/3, ∀i, j ∈ {1, 2, 3}.
We further define the transition probability between different nodes n_i ∈ Y_x, n_j ∈
Y_y. It is normalized by all weights from n_i into the layer Y_y, times the layer
transition probability T_xy:

P_ij := P(X_{t+1} = n_j | X_t = n_i) = T_xy · w_{n_i,n_j} / Σ_{n ∈ Out_i ∩ Y_y} w_{n_i,n},   (4.3)

where Out_i = {n | (n_i, n) ∈ E}.
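A direct implementation of this two-stage normalization, on a tiny two-layer example (the layer names and weights are made up; in the full system T_ij = 1/3 over the three layers):

```python
def transition_prob(E, layer_of, T):
    """Normalize edge weights into transition probabilities per Eq. (4.3).

    P(n_i -> n_j) = T[(x, y)] * w(i, j) / sum of w(i, n) over out-neighbours
    n of n_i that lie in the same layer y as n_j.
    """
    P = {}
    for (i, j), w in E.items():
        y = layer_of[j]
        denom = sum(w2 for (a, b), w2 in E.items()
                    if a == i and layer_of[b] == y)
        P[(i, j)] = T[(layer_of[i], y)] * w / denom
    return P

# Two layers: users (U) and places (I); half the probability mass goes
# to each layer, then weights are normalized within the target layer.
E = {('u0', 'iA'): 1.0, ('u0', 'iB'): 3.0, ('u0', 'u1'): 2.0}
layer_of = {'u0': 'U', 'u1': 'U', 'iA': 'I', 'iB': 'I'}
T = {('U', 'U'): 0.5, ('U', 'I'): 0.5}
P = transition_prob(E, layer_of, T)
```

The outgoing probabilities of u0 sum to one: 0.5 goes to the user layer (all of it to u1) and 0.5 is split 1:3 between the two places.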
Temporal Information
Another important factor in the AR recommender system is temporal information.
For example, users at noon may look for restaurants rather than nightclubs, while at
around 3pm they might prefer coffee or ice cream rather than fine food. Accordingly,
we will calculate the probability Pu(t) of every label activities within each time slot,
say half an hour or an hour. Here Pu(t) is a k × 1 histogram distribution vector to
denote the probability that a specific user u looks for some places related to label l
at time slot t.
4.4.2 Score Computation
Random Walk
For the recommendation graph G = (V,E), let the |V | × 1 vector θ denote the cus-
tomized probability vector. We will illustrate how to set θ based on a customized
request in the following. We define another parameter β ∈ [0, 1], called the damping
factor. With probability β, the random walk will continue its path in G. Otherwise,
it will go back to the customized probability distribution θ. Let the |V | × |V | matrix
M denote the Markov transition matrix, in which M_ij = P_ji from Eqn.(4.3). We further
define γ as the stationary distribution of this random process. It satisfies the
following equation:
γ = βMγ + (1− β)θ (4.4)
Therefore, we can transform Eqn.(4.4) into
γ = (βM + (1 − β)θ1^T)γ,   (4.5)

where 1 is the |V| × 1 all-ones vector, so that 1^T γ = |γ|_1 = 1.
We define A = βM + (1 − β)θ1^T; the rank score γ is then the principal eigenvector
of A. In the following section we will assign a suitable θ for different purposes. Based
on Eqn.(4.5), we can calculate the rank score γ.
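Rather than solving the eigenproblem directly, γ can be obtained by damped power iteration on Eq.(4.4); a minimal sketch on a three-node toy graph (the graph and β value are illustrative):

```python
import numpy as np

def rank_scores(M, theta, beta=0.85, iters=200):
    """Iterate gamma = beta*M*gamma + (1-beta)*theta to a fixed point.

    M     : column-stochastic |V| x |V| matrix with M[i, j] = P(j -> i)
    theta : customized restart distribution (sums to 1)
    """
    gamma = np.full(len(theta), 1.0 / len(theta))
    for _ in range(iters):
        gamma = beta * (M @ gamma) + (1 - beta) * theta
    return gamma

# Toy cycle 0 -> 1 -> 2 -> 0 with restarts always at node 0: node 0
# accumulates the most score, then node 1, then node 2.
M = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
theta = np.array([1.0, 0.0, 0.0])
gamma = rank_scores(M, theta)
```

Each iteration preserves the total mass (β·1 + (1−β)·1 = 1), and the β-contraction guarantees convergence to the stationary γ of Eq.(4.4).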
Temporal Information
For a specific time t and a target user u at a location p, we aggregate the user
information and location information into the graph. In order to combine Pu(t) and
γ to make an effective recommendation, we let Q be the |V| × k label matrix, where
Q_ij denotes whether v_i belongs to label l_j: Q_ij = 1 if and only if v_i is a place node
(i.e., in the place layer) and v_i belongs to label l_j; otherwise Q_ij = 0. Then our final
recommendation score is

γ(t) = γ ⊙ (Q × Pu(t)),   (4.6)

where ⊙ is the element-wise product between two vectors.
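A tiny numeric example of Eq.(4.6), interpreting the product element-wise so that each place node keeps its own score (an interpretation on my part; a true inner product would collapse γ(t) to a single scalar). All values are illustrative:

```python
import numpy as np

# Rank scores for |V| = 5 nodes: 2 users, 2 places, 1 label (illustrative).
gamma = np.array([0.2, 0.1, 0.3, 0.25, 0.15])

# Q[i, j] = 1 iff node i is a place node carrying label j (k = 2 labels).
Q = np.array([[0, 0],
              [0, 0],
              [1, 0],    # place node 2 has label 0
              [0, 1],    # place node 3 has label 1
              [0, 0]])

# P_u(t): the user's label-preference histogram in the current time slot.
P_u_t = np.array([0.8, 0.2])

# Element-wise product zeroes out non-place nodes and re-weights each
# place by how well its labels match the user's habits at time t.
gamma_t = gamma * (Q @ P_u_t)
```

Here place node 2 wins: its label matches what the user typically looks for at time t, so its rank score is boosted relative to place node 3.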
4.4.3 Personalized Recommendation through θ
Now the only challenge for this random walk algorithm is how to set θ to meet each
user’s personal requirement.
Regular Case
We set
θ = (1/2) e_u + (1/2) e_i,

where e_u and e_i are |V| × 1 unit vectors corresponding to the target user and the
current place. The rank score γ is then calculated with the PageRank algorithm. We
sort all the nodes in V based on γ and select the top n places as the top n
recommendations.
Group Case
We set
θ = (1/n) Σ_{j=1}^{n} e_{u_j},

where u_1, ..., u_n are the n users receiving the recommendation. If the current
locations are known, we can further add them into θ as

θ = (1/2n) Σ_{j=1}^{n} e_{u_j} + (1/2n) Σ_{j=1}^{n} e_{i_j},

where u_1, ..., u_n are the n users and i_1, ..., i_n are their corresponding locations.
Cold Start Case
The cold start problem has been one of the most important issues in recommender
systems for years. It is crucial because new users will not tolerate a bad user experience
for long: if the recommender system cannot give good predictions in the first several
attempts, the new user may well delete the application forever. However, sparse data
may lead to an inaccurate personal rank score γ for new users. Averages based on
relatively low support (small values of |I_u|, i.e., the number of places that user u has
visited) can generally be improved by shrinkage towards a common mean γ̄ [8]. Setting
θ = (1/|U|) 1_{v∈U}, the uniform distribution over all users, we can compute the global
rank score γ̄. We can therefore define the rank score for cold start users as follows:

γ_cold = (|I_u| γ + τ γ̄) / (|I_u| + τ),   (4.7)

where the parameter τ controls the extent of the shrinkage.
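Eq.(4.7) in code: a brand-new user falls back entirely on the global score, while a heavy user is barely shrunk (the default τ is an illustrative value):

```python
def cold_start_score(gamma_u, gamma_bar, n_visits, tau=10.0):
    """Shrinkage estimate of Eq. (4.7) for a possibly new user.

    gamma_u  : personal rank score of a place for user u
    gamma_bar: global (all-user) rank score of that place
    n_visits : |I_u|, number of places user u has visited
    tau      : shrinkage strength (hypothetical default)
    """
    return (n_visits * gamma_u + tau * gamma_bar) / (n_visits + tau)

new_user = cold_start_score(0.9, 0.3, n_visits=0)       # -> global score
heavy_user = cold_start_score(0.9, 0.3, n_visits=1000)  # -> near-personal
```

As |I_u| grows past τ the estimate smoothly transfers trust from the population mean to the user's own score.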
Interaction/Update Case
Users will interact with an AR system in real time. When several selected places are
shown in the AR system, users will use their finger to click on the item, denoted as
i, in which they are interested. Here we propose two methods for further updates,
listed below.
• Label-driven Update: This method is fast and focuses on the label behind the
place i only. Every time user u selects place i among all the available items
the recommender system provides, we regard u’s purpose to be the label set to
which i belongs. We then replace all the top n recommendations by the top n
places that belong to these labels.
• Place-driven Update: This method is relatively slow and needs to recompute
the rank score. Basically, we will set θ = ei to compute the rank score γ for the
recommendations.
4.5 Experiments
4.5.1 Preliminary Experiments for Geographical Model
We downloaded the Gowalla dataset1, which contains 19,183 users, 30,367 places in
NYC and 357,753 check-ins, each of which records a specific place a user has visited.
1 http://code.google.com/p/locrec/downloads/list
Table 4.1: Average Percentile of Recommendations
α              0      0.1    0.5    1
Percentile(%)  93.5   76.3   57.4   50.1

Table 4.2: Recall of Top Recommendations
Recall (%)   α = 0   α = 0.1   α = 0.5   α = 1
Top 10       83.3    32.1      26.6      24.5
Top 30       95.2    75.1      47.1      30.1
Top 50       100.0   81.0      65.3      43.0

Table 4.3: Average Distance of Top Recommendations
Distance (km)  α = 0   α = 0.1   α = 0.5   α = 1
Top 10         18.46   7.52      5.82      3.65
Top 30         20.01   9.30      7.63      6.34
Top 50         22.85   13.32     10.90     9.60
We randomly select 100 users, the 100 most popular places, and the 831 corresponding
check-ins to construct the transition graph. Due to incomplete data, in this experiment
the layered graph contains only users, locations, and check-in information. Then a
target user is randomly picked and the current location is randomly generated. We
use the proposed algorithm to recommend places to the target user and calculate
three measures of the top recommendation list. Percentile is the average position
(in percentage) of the actual visited places out of the whole set of places, where 99%
denotes the top 1%. Recall is the number of hits (i.e., the visited places) of the
top n recommendations divided by the total number of visited places. These two
measures can evaluate the effectiveness of recommendations. The average distance
of the top recommendations can evaluate whether the location factor is taken into
account in the recommendations. In the experiment, we set β = 0.85 and vary α to
see how the performance of the algorithm changes. Note that when α = 0, the algorithm
reduces to the traditional random walk without location information. We compare
the average percentile, recall, and average distance for different values of α. We
repeatedly generate a target user and current location 500 times. The average results
are reported in Tables 4.1, 4.2, and 4.3.
Table 4.4: Statistics of Foursquare Dataset
# of users        636
# of places       1,012
# of check-ins    46,032
# of friendships  674
The results show that incorporating location indeed improves AR applications by
reducing the average distance of the top recommendations. When α becomes larger,
the average distance decreases, but the percentile and recall decrease as well. The
value of α therefore controls the tradeoff between recommendation accuracy and
average distance: a larger α puts higher priority on distance and lower weight on
personal taste.
4.5.2 Dataset Analysis
Overall Analysis
We downloaded the Foursquare dataset2 [23], which contains 18,107 users with
check-ins ranging from March 2010 to January 2011. For each user, we have their
social network, previous check-in locations, and the corresponding check-in times.
Geographical information (e.g., longitude and latitude) of all the places is also
included, which helps us calculate the distance between pairs of locations. Notice
that the check-in places span the whole world. Since our algorithm takes geographical
information into consideration, it is better suited to a certain area such as a city
rather than the whole world. Therefore we take an area with a radius of 50 km. We
also remove all users with fewer than 10 check-ins among the selected places. Some
statistics of the dataset are shown in Table 4.4.
2http://www.public.asu.edu/~hgao16/dataset.html
Table 4.5: Geographic Distance Statistics
Distance (km)       Mean  Median
Overall             19.6  16.5
Single User         11.3  10.7
Consecutive Places  8.73  3.52
Geographical Information
Now let us explore the geographical information of the check-in places. Fig. 4.4
shows the histogram of distances between pairs of check-in places. We can see that
the majority of the distances fall in the interval between 0 and 50 km, which is the
radius of the area. The mean and median distances between pairs of check-in places
are 19.6 km and 16.5 km respectively. However, when we look at the mean and median
distances between check-in places for a single user, the numbers decrease to 11.3 km
and 10.7 km respectively. This distribution is shown in Fig. 4.5. The average distance
between two consecutive check-in places is only 8.73 km, while its median is only
3.52 km. This indicates that users tend to check in at nearby places rather than
distant ones. Incorporating geographical information as an exponential function
decaying with distance should therefore improve prediction accuracy.
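As a sketch of this idea (the function name and the choice of decay constant are assumptions, not taken from the dissertation), the edge weight between two places could decay exponentially with their distance:

```python
import math

def distance_decay_weight(dist_km, scale_km=8.73):
    """Edge weight that decays exponentially with distance.

    scale_km is an assumed decay constant; the mean distance
    between consecutive check-ins (8.73 km) is a natural choice.
    """
    return math.exp(-dist_km / scale_km)

# Nearby places keep close to full weight, distant ones almost none:
w_near = distance_decay_weight(1.0)
w_far = distance_decay_weight(50.0)
```

Normalizing these weights over a place's neighbors would then turn them into transition probabilities that favor nearby destinations.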
Temporal Information
First let us look at the overall check-in time frequency in Fig. 4.6. We can see that
the frequency peaks at midnight and reaches its lowest point at noon, since people
usually stay at the workplace during the daytime and prefer to share their locations
after work. Looking at the check-in frequency across the 24 hours of the day, it varies
considerably and has a relatively large standard deviation. Here the individual’s
temporal preference is neglected due to limited data, so Pu(t) can be rewritten as
P(t). To evaluate the difference between times t, we propose an L1
Figure 4.4: Pairwise Check-in Places Distance Distribution
norm as follows.
L1(P ) =1
242
23∑t1=0
23∑t2=0
|P (t1)− P (t2)|.
For this dataset, L1(P) = 0.829. Therefore, incorporating temporal information
into the final ranking score should help increase the prediction accuracy as well.
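As an illustration (not code from the dissertation), L1(P) can be computed directly from the 24 hourly check-in probabilities:

```python
def temporal_l1(p):
    """Mean absolute difference between hourly check-in probabilities.

    p: list of the 24 values P(0), ..., P(23), summing to 1.
    """
    assert len(p) == 24
    return sum(abs(p[t1] - p[t2])
               for t1 in range(24) for t2 in range(24)) / 24**2

# A perfectly uniform temporal profile gives L1 = 0;
# any variation across hours raises the score.
uniform = [1.0 / 24] * 24   # temporal_l1(uniform) == 0.0
```

A large value, such as the 0.829 reported here, indicates that check-in probability varies strongly over the day, so the hour of the request is informative.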
We randomly select 70% of the check-ins as the training set and the rest as the
testing set. For testing, the actual check-in place is hidden, but the check-in time
and geographic information (we assume each user remains at their previous check-in
location) are known.
4.5.3 Evaluation Metrics
We evaluate our results with two popular evaluation metrics for top-k recommenda-
tions: recall and percentile.
Figure 4.5: Average Distance Distribution for a Single User
Recall: In the top-k recommendations, we consider any item that matches an item
in the testing set as a “hit”, as in [54].
Recall(k) = (# of hits in top-k) / (# of testing data).
Percentile: The individual percentile score is simply the average position (in per-
centage) that the actual check-in place in the test set occupies in the recommendation
list. For example, if the actual check-in place is ranked 15th out of 100 places, the
percentile would be 85%. We calculate the percentiles for all the testing data and
report their average in the following section.
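A minimal sketch of the two metrics (the helper names are hypothetical):

```python
def recall_at_k(recommended, visited, k):
    """Fraction of test check-ins that appear in the top-k list."""
    hits = sum(1 for place in visited if place in recommended[:k])
    return hits / len(visited)

def percentile(recommended, place):
    """Position of the actual check-in place, as 'top x%' of the list.

    A place ranked 15th out of 100 gives 85%, matching the example
    in the text (higher is better).
    """
    rank = recommended.index(place) + 1  # 1-based rank in the list
    return 100.0 * (len(recommended) - rank) / len(recommended)
```

Averaging `percentile` over all test check-ins gives the average percentile reported in the tables below.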
Figure 4.6: Check-in Time Frequency
4.5.4 Compared Algorithms
We compare our proposed temporal random walk algorithm (TRW) with the following
popular algorithms and the results are reported in the following subsection.
• Most Popular Place (MPP): the overall frequency of check-in places is calcu-
lated and the most popular items are recommended to each user. In this case,
all users are treated equally.
• K Nearest Neighbor (KNN): the k nearest neighbors of the target user are com-
puted and their average preference list is recommended to the target user.
• Naive Random Walk (NRW): we use a naive graph without geographical or
temporal information; all transitions between places are equally likely, regardless
of the distance.
Table 4.6: Average Percentile of Recommendations
MPP    KNN    NRW    GRW    TRW
78.7%  77.2%  77.7%  78.1%  88.6%
Table 4.7: Hit Ratio of Top-k Recommendations
HitRatio@k  MPP    KNN    NRW    GRW    TRW
50          38.2%  38.6%  33.8%  39.1%  39.7%
100         53.3%  51.8%  51.5%  55.2%  55.9%
150         60.4%  58.3%  59.3%  62.2%  67.5%
200         66.1%  63.1%  65.0%  66.9%  78.5%
• Geographical Random Walk (GRW): geographical information with an expo-
nential decay model is added to the graph. θ is defined as a combination of the
target user and his current location.
• Temporal Random Walk (TRW): in addition to geographical information, the
temporal information is further incorporated into the ranking score of recom-
mendation list.
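The biased random walk behind GRW and TRW can be sketched as a personalized PageRank iteration over the layered graph, with θ as the restart distribution (e.g., the target user, or e_i after a place-driven update). The function below is an illustrative sketch, not the dissertation’s implementation:

```python
import numpy as np

def biased_random_walk(P, theta, beta=0.85, tol=1e-10, max_iter=1000):
    """Iterate gamma = beta * P^T gamma + (1 - beta) * theta to a fixed point.

    P:     row-stochastic transition matrix over the layered graph
    theta: restart (personalization) vector, e.g. e_u for the target user
    beta:  damping factor (0.85, as in the experiments)
    """
    gamma = np.full(len(theta), 1.0 / len(theta))  # uniform start
    for _ in range(max_iter):
        new = beta * P.T @ gamma + (1 - beta) * theta
        if np.abs(new - gamma).sum() < tol:
            return new
        gamma = new
    return gamma
```

Recommendations are then the places with the highest rank score γ that the target user has not yet visited; GRW biases P by distance, and TRW further reweights the final scores by the temporal statistics.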
4.5.5 Experimental Results
Experimental results comparing with state-of-the-art methods are shown in Table 4.6
and Table 4.7. We can see that MPP and KNN provide the baselines. MPP
performs surprisingly well because in this dataset the top 5% most popular check-in
places account for around 65% of all check-ins. Therefore, even though it does
not take individual preference into consideration, this naive algorithm still performs
well. Moreover, NRW is even worse than MPP in the hit ratios, but when
geographical and temporal information are taken into consideration, the hit
ratio improves significantly. The top-50 hit ratio does not change much for TRW
for the same reason: the top 5% most popular check-in places account for 65% of the
check-in histories. But the average percentile is improved by around 10% compared
to MPP or NRW.
Figure 4.7: Top K Hit Ratio
Chapter 5
Conclusion
Recommender systems have been widely used for many years and generate much profit
for companies such as Netflix and Amazon. This dissertation has mainly focused on
three aspects of recommender systems – sparsity, robustness, and diversification. We
provide a better understanding of the current challenges of recommender systems
and their solutions.
5.1 Contribution of the Dissertation
Chapter 2 proposes iterative collaborative filtering to deal with sparse data in recom-
mender systems. Instead of calculating the similarity function and doing a weighted
summation aggregation only once, our algorithm first calculates the similarity func-
tion in a limited but reliable region. Based on an adaptive parameter, it then selects
a reliable subset of the missing ratings to fill in using the current similarity. It then
uses the new estimates of the ratings to go back and update the similarity. This
iterative process leads to better predictions. Experimental results show our algorithm
performs better than other state-of-the-art methods when the data is relatively sparse.
In Chapter 3, a spectral clustering algorithm is applied to the detection of shilling
attacks in recommender systems. High correlations between fake users are assumed,
and pairwise correlations are calculated to cope with the sparsity of the data. A
submatrix optimization problem is formulated for this detection and then transformed
into a graph. The spectral clustering algorithm is applied to solve the min-cut problem
in the graph with an unbalanced structure. Experimental results show that our
spectral clustering algorithm performs better than other current methods for several
attack models.
In Chapter 4, we first review the state of the art in AR from a content-oriented
perspective. The general concept and main components of the AR ecosystem are
described, and a core component of the AR ecosystem, the content component, is
introduced. We then propose a location-based multi-dimensional recommender
system by applying a random walk algorithm. We incorporate location information
into the weights of edges in the graph and incorporate temporal statistics into the
rank score as well. We thus aggregate the user’s personal preference, location
information, and temporal information into the biased random walk to recommend
places to mobile users. Experimental results show that location information is indeed
effectively incorporated into the layered graph.
5.2 Future Research Directions
For iterative collaborative filtering, though prediction accuracy is improved when
the rating matrix is sparse, the confidence of the estimates is still hard to calculate,
especially when the estimates in the dense areas are used to re-estimate ratings in
the sparse areas. Without clear estimates of confidence, noise can be propagated
through the iterative estimation process.
For the shilling attack detection, spectral clustering minimizes the inter-group
correlations to find the highly correlated group. In this optimization problem, two
relaxations are applied. One is from discrete values to continuous values, which can
make the separation sub-optimal. The other is the unbalanced structure: spectral
clustering tends to produce clusters of similar sizes. Even though adjustments of
edges are applied to obtain the unbalanced structure, the process of tuning parameters
is ad hoc and time-consuming. Moreover, more types of attack models should be
tested in the experimental part.
For the biased random walk algorithm, it is sometimes difficult to combine infor-
mation using a uniform metric. When multiple aspects are taken into consideration,
there are usually conflicts between different dimensions. In this case, if we cannot
evaluate the confidence of our recommendation, we are unable to aggregate
information from the different dimensions together.
Moreover, there are evaluation measures that this thesis does not cover, such as
novelty or confidence. Novelty means that instead of recommending items that are
top sellers and known to everyone, we should recommend items that users would be
unlikely to try without the help of recommender systems. Amazon can recommend
the book “Harry Potter” to every young user; this is statistically accurate but
ineffective in practice. Another interesting topic in recommender systems would be
diversified recommendations. The recommender system should give users personalized
recommendations instead of uniform ones, and metrics are then needed to measure
the overall difference among all the recommendations.
Bibliography
[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Knowledge and Data Engineering, IEEE Transactions on, 17(6):734–749, 2005.

[2] Gediminas Adomavicius, Ramesh Sankaranarayanan, Shahana Sen, and Alexander Tuzhilin. Incorporating contextual information in recommender systems using a multidimensional approach. ACM Transactions on Information Systems, 23(1):103–145, 2005.

[3] Miltiadis Allamanis, Salvatore Scellato, and Cecilia Mascolo. Evolution of a location-based online social network: analysis and models. In Proceedings of the 2012 ACM Internet Measurement Conference, pages 145–158. ACM, 2012.

[4] R. Azuma, Y. Baillot, R. Behringer, S. Feiner, S. Julier, and B. MacIntyre. Recent advances in augmented reality. Computer Graphics and Applications, IEEE, 21(6):34–47, 2001.

[5] R.T. Azuma et al. A survey of augmented reality. Presence: Teleoperators and Virtual Environments, 6(4):355–385, 1997.

[6] P. Belimpasakis, Y. You, and P. Selonen. Enabling rapid creation of content for consumption in mobile augmented reality. In Next Generation Mobile Applications, Services and Technologies, 2010 Fourth International Conference on, pages 1–6. IEEE, 2010.

[7] R. Bell, Y. Koren, and C. Volinsky. Modeling relationships at multiple scales to improve accuracy of large recommender systems. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 95–104. ACM, 2007.

[8] Robert M. Bell and Yehuda Koren. Improved neighborhood-based collaborative filtering. In KDD Cup and Workshop at the 13th ACM SIGKDD, 2007.

[9] S. Benford and L. Fahlen. A spatial model of interaction in large virtual environments. In Proceedings of the Third European Conference on Computer-Supported Cooperative Work, pages 109–124. Kluwer Academic Publishers, 1993.
[10] Runa Bhaumik, Bamshad Mobasher, and R.D. Burke. A clustering approach to unsupervised attack detection in collaborative recommender systems. In Proceedings of 7th IEEE ICML, Las Vegas, USA, pages 181–187, 2011.

[11] D. Billsus and M.J. Pazzani. Learning collaborative information filters. In Proceedings of the Fifteenth International Conference on Machine Learning, volume 54, page 48, 1998.

[12] J.S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann Publishers Inc., 1998.

[13] Kenneth Bryan, Michael O’Mahony, and Padraig Cunningham. Unsupervised retrieval of attack profiles in collaborative recommender systems. In Proceedings of the 2008 ACM RecSys, pages 155–162. ACM, 2008.

[14] R. Burke, B. Mobasher, C. Williams, and R. Bhaumik. Classification features for attack detection in collaborative recommender systems. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 542–547. ACM, 2006.

[15] Y. Cheng and G.M. Church. Biclustering of expression data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, volume 8, pages 93–103, 2000.

[16] Z. Cheng and N. Hurley. Robust collaborative recommendation by least trimmed squares matrix factorization. In Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on, volume 2, pages 105–112. IEEE, 2010.

[17] P.A. Chirita, W. Nejdl, and C. Zamfir. Preventing shilling attacks in online recommender systems. In Proceedings of the 7th Annual ACM International Workshop on Web Information and Data Management, pages 67–74. ACM, 2005.

[18] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 2001.

[19] J. Delgado and N. Ishii. Memory-based weighted-majority prediction. In ACM SIGIR’99 Workshop on Recommender Systems: Algorithms and Evaluation. Citeseer, 1999.

[20] M. Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Transactions on Information Systems (TOIS), 22(1):143–177, 2004.

[21] S. Feiner, B. MacIntyre, T. Hollerer, and A. Webster. A touring machine: Prototyping 3D mobile augmented reality systems for exploring the urban environment. Personal and Ubiquitous Computing, 1(4):208–217, 1997.
[22] Ronald A. Fisher. Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10(4):507–521, 1915.

[23] Huiji Gao and Huan Liu. Location-based social network data repository, 2014.

[24] Huiji Gao, Jiliang Tang, Xia Hu, and Huan Liu. Exploring temporal effects for location recommendation on location-based social networks. In Proceedings of the 7th ACM Conference on Recommender Systems, pages 93–100. ACM, 2013.

[25] Huiji Gao, Jiliang Tang, Xia Hu, and Huan Liu. Modeling temporal effects of human mobile behavior on location-based social networks. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, pages 1673–1678. ACM, 2013.

[26] Huiji Gao, Jiliang Tang, and Huan Liu. Addressing the cold-start problem in location recommendation using geo-social correlations. Data Mining and Knowledge Discovery, pages 1–25, 2014.

[27] Marco Gori, Augusto Pucci, V. Roma, and I. Siena. ItemRank: A random-walk based scoring algorithm for recommender engines. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 2766–2771, 2007.

[28] J.L. Herlocker, J.A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 230–237. ACM, 1999.

[29] J.L. Herlocker, J.A. Konstan, L.G. Terveen, and J.T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS), 22(1):5–53, 2004.

[30] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS), 22(1):89–115, 2004.

[31] N. Hurley, Z. Cheng, and M. Zhang. Statistical attack detection. In Proceedings of the Third ACM Conference on Recommender Systems, pages 149–156. ACM, 2009.

[32] Mohsen Jamali and Martin Ester. TrustWalker: a random walk model for combining trust-based and item-based recommendation. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 397–406. ACM, 2009.

[33] S. Julier, M. Lanzagorta, Y. Baillot, L. Rosenblum, S. Feiner, T. Hollerer, and S. Sestito. Information filtering for mobile augmented reality. In IEEE and ACM International Symposium on Augmented Reality, pages 3–11. IEEE, 2000.
[34] D. Kim and B.J. Yum. Collaborative filtering based on iterative principal component analysis. Expert Systems with Applications, 28(4):823–830, 2005.

[35] Eric Klopfer. Augmented Learning: Research and Design of Mobile Educational Games. MIT Press, 2008.

[36] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426–434. ACM, 2008.

[37] S.K. Lam and J. Riedl. Shilling recommender systems for fun and profit. In Proceedings of the 13th International Conference on World Wide Web, pages 393–402. ACM, 2004.

[38] Sangkeun Lee, Sang-il Song, Minsuk Kahng, Dongjoo Lee, and Sang-goo Lee. Random walk based entity ranking on graph for multidimensional recommendation. In Proceedings of the Fifth ACM Conference on Recommender Systems, pages 93–100. ACM, 2011.

[39] B. Mehta, T. Hofmann, and P. Fankhauser. Lies and propaganda: detecting spam users in collaborative filtering. In Proceedings of the 12th International Conference on Intelligent User Interfaces, pages 14–21. ACM, 2007.

[40] B. Mehta and W. Nejdl. Attack resistant collaborative filtering. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 75–82. ACM, 2008.

[41] B. Mehta and W. Nejdl. Unsupervised strategies for shilling detection and robust collaborative filtering. User Modeling and User-Adapted Interaction, 19(1):65–97, 2009.

[42] B. Mobasher, R. Burke, and J. Sandvig. Model-based collaborative filtering as a defense against profile injection attacks. In Proceedings of the National Conference on Artificial Intelligence, volume 21, page 1388, 2006.

[43] M.P. O’Mahony, N.J. Hurley, and G.C.M. Silvestre. Promoting recommendations: An attack on collaborative filtering. In Database and Expert Systems Applications, pages 213–241, 2002.

[44] M.P. O’Mahony, N.J. Hurley, and G.C.M. Silvestre. An evaluation of neighbourhood formation on the performance of collaborative filtering. Artificial Intelligence Review, 21(3):215–228, 2004.

[45] M.P. O’Mahony, N.J. Hurley, and G.C.M. Silvestre. Recommender systems: Attack types and strategies. In Proceedings of the National Conference on Artificial Intelligence, volume 20, page 334, 2005.
[46] Moon-Hee Park, Jin-Hyuk Hong, and Sung-Bae Cho. Location-based recommendation system using Bayesian user’s preference model in mobile devices. In Ubiquitous Intelligence and Computing, pages 1130–1139. Springer, 2007.

[47] Alexandrin Popescul, David M. Pennock, and Steve Lawrence. Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 437–444. Morgan Kaufmann Publishers Inc., 2001.

[48] Jing Qian and Venkatesh Saligrama. Spectral clustering with unbalanced data. arXiv preprint arXiv:1302.5134, 2013.

[49] Gerhard Reitmayr and Dieter Schmalstieg. Data management strategies for mobile augmented reality. In Proceedings of International Workshop on Software Technology for Augmented Reality Systems, pages 47–52, 2003.

[50] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: an open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, pages 175–186. ACM, 1994.

[51] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295. ACM, 2001.

[52] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of dimensionality reduction in recommender system - a case study. Technical report, DTIC Document, 2000.

[53] D. Schmalstieg, G. Schall, and D. Wagner. Managing complex augmented reality models. pages 48–57, 2007.

[54] Shang Shang, Sanjeev R. Kulkarni, Paul W. Cuff, and Pan Hui. A random walk based model incorporating social information for recommendations. In 2012 International Workshop on Machine Learning and Signal Processing, pages 1–6. IEEE, 2012.

[55] X. Su and T.M. Khoshgoftaar. A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009:4, 2009.

[56] Karen H.L. Tso-Sutter, Leandro Balby Marinho, and Lars Schmidt-Thieme. Tag-aware recommender systems by fusion of collaborative filtering algorithms. In Proceedings of the 2008 ACM Symposium on Applied Computing, pages 1995–1999. ACM, 2008.

[57] D.W.F. Van Krevelen and R. Poelman. A survey of augmented reality technologies, applications and limitations. International Journal of Virtual Reality, 9(2):1, 2010.
[58] V. Vlahakis, J. Karigiannis, M. Tsotros, N. Ioannidis, and D. Stricker. Personalized augmented reality touring of archaeological sites with wearable and mobile computers. In Wearable Computers, 2002 (ISWC 2002), Proceedings of the Sixth International Symposium on, pages 15–22. IEEE, 2002.

[59] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

[60] Chieh-Chih Wang, Jennifer Healey, and Meiyuan Zhao. Augmenting on-road perception: enabling smart and social driving with sensor fusion and cooperative localization. In Proceedings of the 3rd Augmented Human International Conference, page 21. ACM, 2012.

[61] J. Wang, A.P. De Vries, and M.J.T. Reinders. Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 501–508. ACM, 2006.

[62] C.A. Williams, B. Mobasher, and R. Burke. Defending recommender systems: detection of profile injection attacks. Service Oriented Computing and Applications, 1(3):157–170, 2007.

[63] Liang Xiang, Quan Yuan, Shiwan Zhao, Li Chen, Xiatian Zhang, Qing Yang, and Jimeng Sun. Temporal recommendation on graphs via long- and short-term preference fusion. In Proceedings of the 16th ACM SIGKDD, pages 723–732. ACM, 2010.

[64] Mao Ye, Peifeng Yin, and Wang-Chien Lee. Location recommendation for location-based social networks. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 458–461. ACM, 2010.

[65] S. Zhang, Y. Ouyang, J. Ford, and F. Makedon. Analysis of a low-dimensional linear model under recommendation attacks. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 517–524. ACM, 2006.

[66] Zhuo Zhang, Paul Cuff, and Sanjeev Kulkarni. Iterative collaborative filtering for recommender systems with sparse data. In Machine Learning for Signal Processing (MLSP), 2012 IEEE International Workshop on, pages 1–6. IEEE, 2012.

[67] Zhuo Zhang, Pan Hui, Sanjeev R. Kulkarni, and Christoph Peylo. Enabling an augmented reality ecosystem: A content-oriented survey. In Mobile Augmented Reality and Robotic Technology-based Systems.

[68] Zhuo Zhang and Sanjeev R. Kulkarni. Graph-based detection of shilling attacks in recommender systems. In Machine Learning for Signal Processing (MLSP), 2013 IEEE International Workshop on, pages 1–6. IEEE, 2013.
[69] Zhuo Zhang and Sanjeev R. Kulkarni. Detection of shilling attacks in recommender systems via spectral clustering. In 2014 International Conference on Information Fusion, 2014.

[70] Zhuo Zhang, Shang Shang, Sanjeev R. Kulkarni, and Pan Hui. Improving augmented reality using recommender systems. In Proceedings of the 7th ACM Conference on Recommender Systems, pages 173–176. ACM, 2013.

[71] F. Zhou, H.B.L. Duh, and M. Billinghurst. Trends in augmented reality tracking, interaction and display: A review of ten years of ISMAR. In 7th IEEE/ACM International Symposium on Mixed and Augmented Reality, pages 193–202. IEEE, 2008.

[72] Q. Zhou and F. Zhang. A hybrid unsupervised approach for detecting profile injection attacks in collaborative recommender systems. 2012.

[73] W. Zhu, C.B. Owen, H. Li, and J.H. Lee. Personalized in-store e-commerce with the PromoPad: an augmented reality shopping assistant. Electronic Journal for E-commerce Tools and Applications, 1(3):1–19, 2004.