
Degree Project in Technology, First Cycle, 15 credits
Stockholm, Sweden 2017

Comparison of user and item-based collaborative filtering on sparse data

ALEXANDER SÁNDOR

HARIS ADZEMOVIC

KTH School of Computer Science and Communication


Comparison of user and item-based collaborative filtering on sparse data

ALEXANDER SÁNDOR, HARIS ADZEMOVIC

Bachelor in Computer Science
Date: June 19, 2017
Supervisor: Jens Lagergren
Examiner: Örjan Ekeberg
School of Computer Science and Communication


Abstract

Recommender systems are used extensively today in many areas to help users and consumers with making decisions. Amazon recommends books based on what you have previously viewed and purchased, Netflix presents you with shows and movies you might enjoy based on your interactions with the platform and Facebook serves personalized ads to every user based on gathered browsing information. These systems are based on shared similarities and there are several ways to develop and model them. This study compares two methods, user- and item-based filtering in k nearest neighbours systems. The methods are compared on how much they deviate from the true answer when predicting user ratings of movies based on sparse data.

The study showed that neither method could be considered objectively better than the other and that the choice of system should be based on the data set.


Sammanfattning

Today, recommender systems are used extensively in several areas to help users and consumers with their choices. Amazon recommends books based on what you have viewed and bought, Netflix presents series and movies you will probably like based on your interactions with the platform, and Facebook shows personalised, targeted advertising to each individual user based on previous browsing habits. These systems are based on shared similarities and there are several ways to develop and model them. This report compares two methods, user- and item-based filtering in k nearest neighbours systems. The methods are compared on how much they deviate from the true answer when trying to predict user ratings of movies based on sparse data.

The study showed that no method can be singled out as objectively better; instead, the choice of method should be based on the data set.


Contents

1 Introduction
    1.1 Background
        1.1.1 Recommender systems
        1.1.2 Collaborative based filtering
        1.1.3 Calculating the similarity between users
        1.1.4 k Nearest Neighbours (kNN)
        1.1.5 Evaluation
        1.1.6 Sparse data problem
    1.2 Datasets
    1.3 Surprise
    1.4 Purpose
    1.5 Research question
    1.6 Scope and constraints
2 Method
    2.1 Data handling
        2.1.1 Simulating sparse data
        2.1.2 Formatting data
        2.1.3 Creating test data
    2.2 Conducting the tests
        2.2.1 Building similarity model
        2.2.2 Building the prediction algorithm
        2.2.3 Evaluating the algorithms
3 Results
    3.1 Pearson
    3.2 Cosine
4 Discussion
    4.1 External dependencies
    4.2 State of the art and relevancy
5 Conclusion
Bibliography
A Metrics
    A.1 CiaoDVD Metrics
    A.2 FilmTrust Metrics
    A.3 MovieLens Metrics
B Code
    B.1 main.py
    B.2 split_data.py
    B.3 ciao_dvd_format.py


Chapter 1

Introduction

1.1 Background

In everyday life, it is often necessary to make choices without sufficient personal experience of the alternatives. We then rely on recommendations from other people to make as smart choices as possible. E.g., when shopping at a shoe store, a customer could describe features of previously owned shoes to a clerk and then the clerk would make recommendations for new shoes based on the customer's past experiences. A dedicated clerk could, besides providing recommendations, also remember past choices and experiences of customers. This would allow the clerk to make personalised recommendations to returning customers. The way we transform this experience to the digital era is by using recommender systems [1].

1.1.1 Recommender systems

Recommender systems can be viewed as a digital representation of the clerk in the previous example. The goal of a recommender system is to make predictions of what items users might be interested in by analysing gathered data. Gathering data can be done with an implicit and/or an explicit approach. An implicit approach records users' behaviour when reacting to incoming data (e.g. by recording for how long a user actually watched a movie before switching to something else). This can be done without user knowledge. The explicit approach depends on the user explicitly specifying their preferences regarding items, e.g. by rating a movie.

Input to a recommender system is the gathered data and the output is a prediction or recommendation for the user [2]. A recommender system's predictions will generally be more accurate the more data it can base its predictions on. Having a small amount of data to base predictions on is known as the sparse data problem and is expanded upon in section 1.1.6.

1.1.2 Collaborative based filtering

Collaborative Filtering (CF) is a common algorithm used in recommender systems. CF provides predictions and recommendations based on other users and/or items in the system. We assume that similar users or items in the system can be used to predict each other's ratings. If we know that Haris likes the same things as Alex and Alex also likes candy, then we can predict that Haris will most likely also enjoy candy [3, 4].


Two common methods for implementing collaborative filtering are user- and item-based filtering. Both of these methods create a similarity matrix where the similarities between users (or items) are calculated and stored. The distance (similarity) between users can be calculated in several ways; two common measures are the Pearson correlation coefficient and the cosine similarity.

1.1.3 Calculating the similarity between users

To calculate how similar users are, a matrix is used where the users are rows and the different items are columns. One can then look at how similar users are by comparing their ratings for every item. Below is an example matrix and table with 3 users (Amy, Bill and Jim) and only 2 items (Snow Crash and Girl with the Dragon Tattoo).

Figure 1.1: Comparison matrix [guidetodatamining.com]

Figure 1.2: Comparison table [guidetodatamining.com]

Figures 1.1 and 1.2 show Bill and Jim having more in common than any other pair. There are several ways to give a value to this similarity. Some common approaches are:


Manhattan distance

The Manhattan distance is a simple form of similarity calculation. It is the sum of the differences between ratings in every axis. In the above case, where the matrix is in 2D, the Manhattan distance between Bill, at index 1, and Jim, at index 2, would be:

|x_1 - x_2| + |y_1 - y_2| = |4 - 5| + |1 - 2| = 2

Euclidean distance

The Euclidean distance uses the difference in every axis and applies the Pythagorean theorem to calculate the "straight line distance" between two objects in the matrix.

Pythagorean theorem: a^2 + b^2 = c^2

The Euclidean distance between Jim, at index 1, and Amy, at index 3, is calculated with the equation:

\sqrt{|x_1 - x_3|^2 + |y_1 - y_3|^2} = \sqrt{|4 - 5|^2 + |1 - 5|^2} = \sqrt{17} \approx 4.12
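As a concrete illustration, both distances can be computed in a few lines of Python over the items two users have both rated. The rating dictionaries below are illustrative and not taken from the datasets used later in this study:

def manhattan(r1, r2):
    # r1, r2: dicts mapping item -> rating; compare co-rated items only
    common = set(r1) & set(r2)
    return sum(abs(r1[i] - r2[i]) for i in common)

def euclidean(r1, r2):
    common = set(r1) & set(r2)
    return sum((r1[i] - r2[i]) ** 2 for i in common) ** 0.5

bill = {'Snow Crash': 4, 'Girl with the Dragon Tattoo': 1}
jim = {'Snow Crash': 5, 'Girl with the Dragon Tattoo': 2}
print(manhattan(bill, jim))  # |4-5| + |1-2| = 2
print(euclidean(bill, jim))  # sqrt(1 + 1), approximately 1.41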

Correlation

An issue that isn't visualised by this example is what happens when there is incomplete data, that is, when some users haven't rated some items of the matrix. If users A and B have rated the same 100 items but A and C only have 10 rated items in common, the similarity calculation between A and B should obviously be stronger as it is based on more data. The Manhattan and Euclidean distances do not account for this, making these methods poor when data is missing [5]. To account for it, two other methods, the Pearson correlation coefficient and cosine similarity, can be used.

Pearson correlation coefficient (PCC)

The PCC fits a line to two users' ratings to get a correlation value, where a straight, increasing line represents a high correlation while a decreasing line shows that the compared units do not correlate much.

Figure 1.3: Example of a correlation table [guidetodatamining.com]

Figures 1.3 and 1.4 show an example of positive correlation. The Pearson correlation coefficient takes what is known as "grade inflation" into account [5]. This is the phenomenon of users rating things differently even though they feel the same way about them. In the above example, Weird Al is the band Clara dislikes the most, yet they are still rated at 4. Robert also dislikes Weird Al but gives them a rating of 1.


Figure 1.4: Graphing the table shows a positive correlation [guidetodatamining.com]

In the Manhattan or Euclidean calculations, this would represent a big difference between the users, but the graph shows that they are very much alike. When placing these 5 bands in order of preference, they agree completely.

The formula for calculating PCC is:

r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}    (1.1)
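A minimal Python sketch of equation 1.1, computed over co-rated items. The helper assumes at least two co-rated items; a zero denominator is treated as no correlation:

from math import sqrt

def pearson(r1, r2):
    # r1, r2: dicts mapping item -> rating; use only co-rated items
    common = set(r1) & set(r2)
    xs = [r1[i] for i in common]
    ys = [r2[i] for i in common]
    n = float(len(common))
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs)) * sqrt(sum((y - mean_y) ** 2 for y in ys))
    return num / den if den else 0.0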

Cosine similarity

Cosine similarity is another way of calculating the similarity between users' preferences. Here the users and their ratings of items are represented as two vectors and their similarity is based on the cosine of the angle between them. Cosine similarity is often used for recommender systems since it ignores items which both users haven't rated, so called 0-0 matches, which are in abundance when dealing with sparse data. The cosine similarity is calculated as:

\cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{||\vec{x}|| \times ||\vec{y}||}    (1.2)

where the dot in the numerator represents the dot product and ||\vec{x}|| in the denominator indicates the length of vector \vec{x}.
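A corresponding sketch in Python, representing each user as a rating vector where an unrated item is stored as 0, so 0-0 matches contribute nothing to the dot product or to the vector lengths:

from math import sqrt

def cosine_similarity(xs, ys):
    # xs, ys: equal-length rating vectors; 0 means "not rated"
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = sqrt(sum(x * x for x in xs)) * sqrt(sum(y * y for y in ys))
    return dot / norm if norm else 0.0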

1.1.4 k Nearest Neighbours (kNN)

k nearest neighbours is the method of looking at some number (k) of users or items that are similar in order to make predictions; not all users or items are accounted for when making a prediction. The difference between user and item-based filtering is whether a matrix of similar users or of similar items is created. Similar users are users who often share sentiment/ratings of items. When recommender systems were first developed, user-based filtering was used, but it has issues with scalability: as the amount of data increases, the cost of calculating the similarity matrix grows quickly. To combat this, Amazon developed item-based filtering, which labels similar items into groups so that once a user rates some item highly, the algorithm recommends other similar items from the same group.


Item-based filtering scales better than the user-based approach [3, 5, 6].
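To make the prediction step concrete, the sketch below forms a user-based prediction as a similarity-weighted average over the k most similar users who have rated the item. The data layout and function names are illustrative assumptions, not the exact scheme of any particular library:

def predict(user, item, ratings, sim, k=40):
    # ratings: dict user -> {item: rating}; sim(u, v) returns a similarity score
    neighbours = [(sim(user, other), r[item])
                  for other, r in ratings.items()
                  if other != user and item in r]
    neighbours = sorted(neighbours, reverse=True)[:k]  # k most similar raters
    num = sum(s * r for s, r in neighbours if s > 0)
    den = sum(s for s, r in neighbours if s > 0)
    return num / den if den else None  # None: no usable neighbours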

1.1.5 Evaluation

Two common methods for evaluating recommender systems are used in this study. The Root Mean Squared Error (RMSE) is calculated by:

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^2}    (1.3)

and the Mean Absolute Error (MAE) is calculated by:

MAE = \frac{1}{n}\sum_{i=1}^{n} |d_i|    (1.4)

where n is the number of predictions made and d_i is the distance between the recommender system's prediction and the correct answer. The closer the RMSE and MAE values are to 0, the better the accuracy of the recommender system. RMSE disproportionately penalizes large errors while MAE does not mirror many small errors properly, so both measurements should be used when evaluating accuracy [7, 8, 9].
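Both metrics are straightforward to compute. A small Python sketch with made-up prediction errors shows how RMSE weights a single large error more heavily than MAE:

from math import sqrt

def rmse(predictions, truths):
    errors = [p - t for p, t in zip(predictions, truths)]
    return sqrt(sum(e * e for e in errors) / float(len(errors)))

def mae(predictions, truths):
    errors = [p - t for p, t in zip(predictions, truths)]
    return sum(abs(e) for e in errors) / float(len(errors))

print(rmse([3, 3, 3], [3, 3, 5]))  # sqrt(4/3), approximately 1.15
print(mae([3, 3, 3], [3, 3, 5]))   # 2/3, approximately 0.67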

To provide test data for evaluation, a dataset is divided into two parts. One part is used for building the similarity matrix and the other part is used for evaluation.

1.1.6 Sparse data problem

Sparse data is a common problem in recommender systems where the dataset consists of few ratings compared to the number of users. This issue was simulated by splitting the dataset into two asymmetric parts. The smaller part is then used to make predictions for all objects in the larger part [10].

1.2 Datasets

Three datasets were used in this study. They all involve user ratings of movies and have all been previously used in studies about recommender systems [10]. The datasets are:

FilmTrust

FilmTrust was an old film rating website that has now been shut down. The data was crawled from the FilmTrust website in June 2011 as part of a research paper on recommender systems [11]. The FilmTrust database has 1 508 users and 2 071 items. There is a total of 35 500 ratings where the scale goes from 1 to 5.

CiaoDVD

CiaoDVD was a DVD rating website where users could share their reviews of movies and give recommendations for stores with the best prices. The data was crawled from dvd.ciao.co.uk in December 2013 as part of a research paper on trust prediction [12].


The CiaoDVD database has 920 users and 16 121 items. There is a total of 20 469 ratings and the scale goes from 1 to 5.

MovieLens

MovieLens is a well-known dataset used in many scientific papers. It consists of a collection of movie ratings from the MovieLens web site. The dataset was collected over various periods of time [13]. The MovieLens database has 6 040 users and 3 952 items. There is a total of 1 000 209 ratings and the scale goes from 1 to 5. In this dataset, all users have rated at least 20 items.

1.3 Surprise

There are multiple freely available implementations of recommender systems. The algorithms in this study were implemented using the Python library Surprise [14]. Surprise is licensed under the BSD 3-Clause license [15].

1.4 Purpose

The study compares how well the two collaborative filtering approaches, user-based and item-based, perform when predictions are based on sparse data, known as the sparse data problem. The sparse data problem is a common one in the field of machine learning [16] and understanding how effective these different methods are is of great value for future implementations.

1.5 Research question

How do the two filtering systems user-based and item-based compare when making predictions based on sparse data?

1.6 Scope and constraints

The datasets that were used are from MovieLens, FilmTrust and CiaoDVD. The Python library Surprise was used to conduct all tests.

This study will only compare the correctness of predictions when these are based on sparse data. Other factors such as speed and memory efficiency will not be taken into consideration. The correctness will be measured using the RMSE and MAE.


Chapter 2

Method

Running the two filtering methods, user- and item-based filtering, on a dataset is henceforth referred to as a "test". Every test was conducted 10 times with randomized sets of training and test data. The mean value of these 10 runs represents the result of a test.

2.1 Data handling

Before use, the data needed processing. Following are the methods used to prepare the data for testing.

2.1.1 Simulating sparse data

In the study, sparse data is defined by using 20% of the dataset for training and 80% for verification. This ratio has been used in similar studies [17].

2.1.2 Formatting data

The datasets provided by MovieLens and FilmTrust use a format that Surprise can handle natively. The dataset from CiaoDVD was formatted before use. The Python script in appendix B.3 was used to retrieve only the columns with user id, movie id and rating.

2.1.3 Creating test data

The data was split using a Python script, see appendix B.2, that first read all the data from file into an array. The array was then shuffled after seeding Python's random module with a value ranging from 1 to 10. After that, every fifth rating (20%) was written to one file and the rest was written to another. The smaller file was then used as training data for the recommender system and the bigger file was used as test data. This was repeated 10 times with different seeds for each dataset.

2.2 Conducting the tests

The created test and training datasets were used to build models, run the prediction algorithm and evaluate the result. See appendix B.1 for code.


2.2.1 Building similarity model

A PCC and a cosine similarity model were built for each dataset. Note that the models had to be created for each dataset and only one model could be evaluated in each run. This was configured with built-in functions in the Surprise library.

2.2.2 Building the prediction algorithm

Built-in methods in Surprise were used to create the prediction algorithm. Table 2.1 shows the configurations for the different prediction algorithms. All setups used a minimum of 1 neighbour for predictions.

Test   Filtering method   Similarity model   Max neighbours used
1      Item-based         cosine             40
2      User-based         cosine             40
3      Item-based         pearson            40
4      User-based         pearson            40

Table 2.1: Configurations for prediction algorithms
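The four setups in table 2.1 differ only in the similarity options passed to Surprise's KNNBasic algorithm. A minimal sketch based on the code in appendix B.1 (in the Surprise version used here, the minimum number of neighbours, min_k, defaults to 1):

from surprise import KNNBasic

# Test 4 in table 2.1: user-based filtering with a Pearson similarity matrix.
sim_options = {'name': 'pearson',    # or 'cosine'
               'user_based': True}   # False gives item-based filtering
algo = KNNBasic(k=40, sim_options=sim_options)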

2.2.3 Evaluating the algorithms

Evaluation of the algorithms was done with the built-in function evaluate() in the Surprise library. Each test was run with all 10 test and training data combinations for each dataset. For both similarity models (PCC and cosine similarity) and each dataset, a mean value for the RMSE and MAE score was calculated based on the evaluation of the 10 differently seeded partitions of the data. An average was used to prevent deviating scores from strongly influencing the results in the case of bad data.


Chapter 3

Results

The following structure is used to present the results of the study: two sections show results based on each of the similarity matrix structures, Pearson correlation coefficient (Pearson) and cosine similarity (Cosine). For all datasets, user and item-based filtering are compared side by side in a plot for each metric, MAE and RMSE. Each plot shows the average value of the 10 test runs; the lower the value, the better the predictions.

Following the plot of average scores there is another plot which shows the max deviation for the scores. This is the difference between the highest and lowest score of the 10 test runs for each dataset and filtering method. The lower the difference, the smaller the spread observed between different test runs. This plot is included to give an idea of how much the tests varied, which is relevant as we use an average value.

The full metrics of the tests are presented in appendix A.

3.1 Pearson

The following results were obtained using the Pearson method for the similarity matrix.

Figure 3.1: MAE, Pearson


The plot in figure 3.1 shows the results for the MAE scores. The plot shows a small advantage for item-based filtering on the FilmTrust dataset, while there is an opposite advantage on the MovieLens dataset. For the CiaoDVD dataset, user and item-based filtering score about the same.

Figure 3.2: Max MAE score deviation for Pearson

The difference plot in figure 3.2 shows that the difference between the max and min value is less than 0.0215 for all the datasets. FilmTrust has the highest value, for user-based filtering; those scores have a deviation of around 3%. The plot also shows a big difference between the user and item-based deviations for FilmTrust.

Figure 3.3: RMSE, Pearson

The RMSE scores, plotted in figure 3.3, hint at the same trends as the MAE scores. The dataset for FilmTrust had better accuracy when item-based filtering was used


and MovieLens had better accuracy when user-based filtering was used. CiaoDVD had about the same accuracy for both filtering methods.

Figure 3.4: Max RMSE score deviation for Pearson

The difference plot in figure 3.4 shows the same max deviation for the FilmTrust dataset, with less than 0.025 difference between the max and min values. The difference between the user and item-based approaches for the FilmTrust dataset which was observed in figure 3.2 is present here as well.


3.2 Cosine

The following results were obtained using the cosine similarity method for the similarity matrix.

Figure 3.5: MAE, Cosine

In figure 3.5, the same trend which was observed for the Pearson matrices in figure 3.1 is still visible. However, user and item-based filtering scored slightly closer to each other.

Figure 3.6: Max MAE score deviation for cosine

For the cosine similarity matrix, the differences between the max and min scores are much smaller than for the Pearson similarity matrices. From figure 3.6 we see that the max score deviation is less than 0.01 points. However, there is a slightly smaller deviation for


item-based filtering for all datasets. Notice that the big deviation for user-based filtering on the FilmTrust dataset which was observed when using the Pearson method is not present here.

Figure 3.7: RMSE, Cosine

The RMSE score using the cosine similarity matrix, plotted in figure 3.7, shows the same trends as the RMSE score for the Pearson similarity matrix in figure 3.3.

Figure 3.8: Max RMSE score deviation for cosine

As opposed to the MAE score, we see a slightly smaller deviation of the scores for user-based filtering. The deviation is less than 0.01 points, which is very low.


Chapter 4

Discussion

The discussion is divided into three parts: one part discussing our results and how the study was conducted, one part covering external dependencies, and the last part analysing the current state of the art and the relevancy of the study.

Figures 3.1 - 3.8 show a clear pattern where neither user nor item-based filtering has a clear advantage over the other, independent of error and correlation measurements (MAE, RMSE and Pearson, cosine). The results suggest that the choice of filtering method should be based on the data set. Exactly what properties of the data set one should look for when determining the filtering method is hard to say based on this study, as it only contains 3 datasets with several differences between them (making it hard to pinpoint determining factors).

Our experiments show a clear correlation between the two error measurements, where both give the same result for every dataset on which filtering method performed best. The MAE scores being lower than the respective RMSE ones across the board is expected, as MAE can never produce a higher value than RMSE, only an equal one (if all errors have the same magnitude).

The maximum k value for the k nearest neighbours algorithm, which denotes how many items or users the recommendations are based on, was chosen to be 40 in all tests. Choosing the optimal k value is not a simple task and there are many suggestions for how one should go about doing it, but no agreed upon best method [18]. Using cross validation with different k values and comparing results is one recommended method, but this approach depends on the data set. Since different data sets are used in this study, different k values might be needed for the datasets to enable the system to perform at optimal capacity. Other ways of calculating an optimal k value are discussed in [19]. Calculating an optimal k value for every data set was considered outside of this study's scope and the default value of the Surprise library (40) was used instead. This value is, as stated, the maximum number of neighbours which the algorithm will consider. If there are not 40 users (or items) which are similar enough to be considered neighbours, Surprise will use a lower number (down to a minimum of 1). Using a different maximum k value may have an impact on the results if this study's experiments are repeated; a sketch of how such a k comparison could be run follows below.
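A hedged sketch of such a k comparison, reusing the evaluate() helper and the data object built with load_from_folds() in appendix B.1 (the candidate k values are illustrative):

from surprise import KNNBasic, evaluate

sim_options = {'name': 'pearson', 'user_based': True}
for k in (10, 20, 40, 80):  # candidate maximum neighbourhood sizes
    algo = KNNBasic(k=k, sim_options=sim_options)
    # evaluate() runs the algorithm over every fold and reports RMSE/MAE
    perf = evaluate(algo, data, measures=['RMSE', 'MAE'])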

Every test result is a mean average of 10 runs where the training and test data sets were randomized. This method was used because it was a fair compromise between correctness and the scope of the study. One can naturally get a more statistically sound value by averaging 1000 test runs instead of 10, but running the tests is


time consuming (computationally) and it is hard to set a limit for how many data points are needed for a fair assessment. Our method also does not account for outliers, which can skew the mean considerably. However, only running each test 10 times allowed us to see that no big statistical outliers were present in the mean calculations, as shown in figures 3.2, 3.4, 3.6 and 3.8.

4.1 External dependencies

Two of the datasets, FilmTrust and CiaoDVD, were acquired from a scientific paper and not taken directly from their respective sources. They were both collected by crawling the websites while these were online (both have been shut down at the time of writing). This makes it hard to verify the correctness of the data. The dataset from CiaoDVD came in a format incompatible with the Python program, so the data had to be processed and formatted, which leaves room for human error.

An important attribute of the MovieLens dataset is that all users have made at least 20 ratings. There are no known similar minimum thresholds for the other datasets.

To raise the confidence in the drawn conclusions, more datasets should be used, of varying sizes and from areas other than movie ratings. Initially the paper included a dataset of restaurant reviews from Yelp, but because of its different data format and time restrictions, this dataset could not be used in this study.

We have no reason to doubt the Surprise software. All our tests have returned reasonable results and Surprise looks like a professionally built product for all intents and purposes. It is open source, actively maintained (the latest commit was within 24 hours of writing (25-03-2017)), well documented and written by a Ph.D. student at IRIT (Toulouse Institute of Computer Science Research). To confirm the accuracy of the software, one could run the same data sets and algorithms of this study through another working recommender system and check if the results are identical.

4.2 State of the art and relevancy

Many companies use recommender systems today; some bigger ones are Amazon, Facebook, LinkedIn and YouTube. Finding out exactly what algorithms these companies use and how they are implemented has proven very difficult. There are two major reasons for this. One is that such information is part of their (often) closed source code. The other is that there is no simple answer to the question, as most modern recommender systems are based on a plethora of algorithms. One famous case where this was displayed was the Netflix Prize, a contest for developing a better recommender system for Netflix with a prize pool of a million dollars [20]. The best (winning) algorithms were in fact never implemented by Netflix, as their huge complexity and the engineering effort required overshadowed the slightly better predictions they would bring [21].

The relevancy of the study can be questioned since its scope is quite narrow. Limiting itself to only comparing the accuracy of the two methods and dismissing other factors such as memory efficiency and computational demand/speed may make the results irrelevant if one of the methods can never be feasibly applied because of such limitations. However, even if such limitations do exist, this and similar studies could provide valuable insight into whether pursuing a solution to such limitations is worth the effort.


Chapter 5

Conclusion

The study shows that neither the user nor the item-based filtering approach can be considered better than the other when only comparing prediction accuracy (and ignoring other aspects such as memory usage and speed). When choosing between user- and item-based filtering, the choice should be based on the contents of the data set.


Bibliography

[1] Resnick P and Varian HR. Recommender systems. Communications of the ACM, 40:56–58, 1997.

[2] Lampropoulos AS and Tsihrintzis GA. Machine Learning Paradigms. Intelligent Systems Reference Library. Springer International Publishing, 2015. ISBN 9783319191355.

[3] Ekstrand MD, Riedl JT, and Konstan JA. Collaborative filtering recommender systems. Foundations and Trends in Human-Computer Interaction, 4:81–173, 2010.

[4] Resnick P, Iacovou N, Suchak M, Bergstrom P, and Riedl J. GroupLens: An open architecture for collaborative filtering of netnews. Proceedings of CSCW, 1994.

[5] Zacharski R. A programmer’s guide to data mining, 2015.

[6] Linden G, Smith B, and York J. Amazon.com recommendations: Item-to-item collaborative filtering. Technical report, Amazon.com, 2003.

[7] Shani G and Gunawardana A. Evaluating recommender systems. Technical report, Microsoft Research, 2009.

[8] Wikipedia: Root-mean-square deviation. https://en.wikipedia.org/wiki/root-mean-square_deviation, 2017. Accessed February 27, 2017.

[9] Wikipedia: Mean absolute error. https://en.wikipedia.org/wiki/mean_absolute_error, 2017. Accessed March 25, 2017.

[10] Guo G, Zhang J, and Yorke-Smith N. TrustSVD: Collaborative filtering with both the explicit and implicit influence of user trust and of item ratings. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

[11] Guo G, Zhang J, and Yorke-Smith N. A novel Bayesian similarity measure for recommender systems. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), pages 2619–2625, 2013.

[12] Guo G, Zhang J, Thalmann D, and Yorke-Smith N. ETAF: An extended trust antecedents framework for trust prediction. In Proceedings of the 2014 International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 540–547, 2014.

[13] Harper FM and Konstan JA. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems, 2016.

[14] Nicolas Hug. surpriselib.com, 2017. Accessed March 25, 2017.


[15] Open Source Initiative. opensource.org/licenses/bsd-3-clause, 2017. Accessed March 25, 2017.

[16] Chickering DM and Heckerman D. Fast learning from sparse data. Microsoft Research, 1999.

[17] Evert AK and Mattisson A. Rekommendationssystem med begränsad data. 2016.

[18] ResearchGate. https://www.researchgate.net/post/how_can_we_find_the_optimum_k_in_k-nearest_neighbor, 2017. Accessed April 23, 2017.

[19] Hall P, Park BU, and Samworth RJ. Choice of neighbor order in nearest-neighbor classification. The Annals of Statistics, Volume 36, Number 5, 2008.

[20] Netflix Prize. http://www.netflixprize.com/, 2017. Accessed March 25, 2017.

[21] Mike Masnick. Why Netflix never implemented the algorithm that won the Netflix $1 million challenge. Techdirt.com, 2009.



Appendix A

Metrics

A.1 CiaoDVD Metrics

CiaoDVD, Item-based, Cosine

Seed   MAE      RMSE     Users in trainset   Items in trainset
1      0.8495   1.1120   6287                6365
2      0.8450   1.1072   6359                6267
3      0.8461   1.1084   6417                6369
4      0.8475   1.1095   6362                6398
5      0.8461   1.1079   6321                6340
6      0.8480   1.1079   6344                6298
7      0.8462   1.1093   6329                6318
8      0.8487   1.1092   6341                6301
9      0.8448   1.1038   6368                6292
10     0.8467   1.1067   6372                6303
Mean   0.8469   1.1082   6350                6325.1

CiaoDVD, User-based, Cosine

Seed   MAE      RMSE     Users in trainset   Items in trainset
1      0.8454   1.1051   6287                6365
2      0.8408   1.1015   6359                6267
3      0.8441   1.1044   6417                6369
4      0.8415   1.1015   6362                6398
5      0.8386   1.0975   6321                6340
6      0.8429   1.1026   6344                6298
7      0.8444   1.1035   6329                6318
8      0.8428   1.1023   6341                6301
9      0.8436   1.1044   6368                6292
10     0.8419   1.1005   6372                6303
Mean   0.8426   1.1023   6350                6325.1


CiaoDVD, Item-based, Pearson

Seed   MAE      RMSE     Users in trainset   Items in trainset
1      0.8374   1.0861   6287                6365
2      0.8332   1.0826   6359                6267
3      0.8362   1.0846   6417                6369
4      0.8332   1.0821   6362                6398
5      0.8332   1.0818   6321                6340
6      0.8342   1.0830   6344                6298
7      0.8351   1.0837   6329                6318
8      0.8354   1.0838   6341                6301
9      0.8336   1.0811   6368                6292
10     0.8344   1.0819   6372                6303
Mean   0.8346   1.0831   6350                6325.1

CiaoDVD, User-based, Pearson

Seed   MAE      RMSE     Users in trainset   Items in trainset
1      0.8367   1.0853   6287                6365
2      0.8335   1.0827   6359                6267
3      0.8370   1.0853   6417                6369
4      0.8343   1.0835   6362                6398
5      0.8340   1.0822   6321                6340
6      0.8344   1.0833   6344                6298
7      0.8362   1.0850   6329                6318
8      0.8353   1.0839   6341                6301
9      0.8342   1.0826   6368                6292
10     0.8351   1.0834   6372                6303
Mean   0.8351   1.0837   6350                6325.1

A.2 FilmTrust Metrics

FilmTrust, Item-based, Cosine

Seed   MAE      RMSE     Users in trainset   Items in trainset
1      0.6757   0.8877   1267                875
2      0.6724   0.8813   1245                885
3      0.6780   0.8885   1253                905
4      0.6733   0.8823   1258                894
5      0.6721   0.8852   1266                893
6      0.6776   0.8911   1267                916
7      0.6747   0.8888   1274                913
8      0.6715   0.8811   1250                926
9      0.6768   0.8896   1266                895
10     0.6772   0.8864   1276                916
Mean   0.6749   0.8862   1262.2              901.8


FilmTrust, User-based, Cosine

Seed   MAE      RMSE     Users in trainset   Items in trainset
1      0.7290   0.9309   1267                875
2      0.7300   0.9289   1245                885
3      0.7259   0.9275   1253                905
4      0.7263   0.9266   1258                894
5      0.7274   0.9271   1266                893
6      0.7248   0.9275   1267                916
7      0.7314   0.9333   1274                913
8      0.7233   0.9237   1250                926
9      0.7304   0.9281   1266                895
10     0.7269   0.9273   1276                916
Mean   0.7275   0.9281   1262.2              901.8

FilmTrust, Item-based, Pearson

Seed   MAE      RMSE     Users in trainset   Items in trainset
1      0.6783   0.8907   1267                875
2      0.6768   0.8879   1245                885
3      0.6835   0.8965   1253                905
4      0.6768   0.8872   1258                894
5      0.6783   0.8918   1266                893
6      0.6820   0.8955   1267                916
7      0.6799   0.8962   1274                913
8      0.6788   0.8887   1250                926
9      0.6787   0.8921   1266                895
10     0.6810   0.8937   1276                916
Mean   0.6794   0.8920   1262.2              901.8

FilmTrust, User-based, Pearson

Seed   MAE      RMSE     Users in trainset   Items in trainset
1      0.7643   0.9652   1267                875
2      0.7687   0.9694   1245                885
3      0.7578   0.9588   1253                905
4      0.7674   0.9722   1258                894
5      0.7682   0.9711   1266                893
6      0.7689   0.9723   1267                916
7      0.7669   0.9696   1274                913
8      0.7637   0.9651   1250                926
9      0.7793   0.9814   1266                895
10     0.7642   0.9661   1276                916
Mean   0.7669   0.9691   1262.2              901.8


A.3 MovieLens Metrics

MovieLens, Item-based, Cosine

Seed   MAE      RMSE     Users in trainset   Items in trainset
1      0.8356   1.0503   6036                3466
2      0.8360   1.0503   6037                3473
3      0.8367   1.0503   6038                3477
4      0.8364   1.0504   6037                3462
5      0.8359   1.0504   6037                3476
6      0.8362   1.0503   6037                3468
7      0.8363   1.0517   6033                3473
8      0.8360   1.0496   6034                3473
9      0.8363   1.0512   6037                3458
10     0.8368   1.0513   6037                3467
Mean   0.8362   1.0506   6036.3              3469.3

MovieLens, User-based, Cosine

Seed   MAE      RMSE     Users in trainset   Items in trainset
1      0.7893   0.9926   6036                3466
2      0.7902   0.9931   6037                3473
3      0.7895   0.9926   6038                3477
4      0.7900   0.9932   6037                3462
5      0.7899   0.9925   6037                3476
6      0.7905   0.9935   6037                3468
7      0.7897   0.9928   6033                3473
8      0.7897   0.9923   6034                3473
9      0.7907   0.9934   6037                3458
10     0.7900   0.9933   6037                3467
Mean   0.7900   0.9929   6036.3              3469.3

MovieLens, Item-based, Pearson

Seed   MAE      RMSE     Users in trainset   Items in trainset
1      0.8474   1.0694   6036                3466
2      0.8470   1.0685   6037                3473
3      0.8484   1.0694   6038                3477
4      0.8470   1.0680   6037                3462
5      0.8474   1.0697   6037                3476
6      0.8490   1.0707   6037                3468
7      0.8472   1.0693   6033                3473
8      0.8490   1.0701   6034                3473
9      0.8472   1.0694   6037                3458
10     0.8485   1.0700   6037                3467
Mean   0.8478   1.0695   6036.3              3469.3


MovieLens, User-based, Pearson

Seed   MAE      RMSE     Users in trainset   Items in trainset
1      0.8043   1.0120   6036                3466
2      0.8056   1.0135   6037                3473
3      0.8040   1.0115   6038                3477
4      0.8038   1.0120   6037                3462
5      0.8048   1.0124   6037                3476
6      0.8050   1.0127   6037                3468
7      0.8053   1.0130   6033                3473
8      0.8049   1.0121   6034                3473
9      0.8053   1.0133   6037                3458
10     0.8053   1.0128   6037                3467
Mean   0.8048   1.0125   6036.3              3469.3


Appendix B

Code

B.1 main.py

from surprise import KNNBasic
from surprise import Dataset, Reader
from surprise import evaluate, print_perf
import os

__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
testdata_path = os.path.join(__location__, 'testdata_')
traindata_path = os.path.join(__location__, 'traindata_')

reader = Reader(line_format='user item rating timestamp', sep='::')

# Create a folds list of (train, test) file pairs for the 10 seeded splits
test_folds = []
for num in range(1, 11):
    fold = (traindata_path + '{}'.format(num), testdata_path + '{}'.format(num))
    test_folds.append(fold)

print test_folds

# Load test and train data
print "#### Read test and train data from files ####"

data = Dataset.load_from_folds(test_folds, reader)

print "#### Read complete ####"

sim_options = {'name': 'cosine',
               'user_based': True}

print "#### Creating recommender system ####"
algo = KNNBasic(sim_options=sim_options, k=40)

print "#### Creation complete ####"

for trainset, testset in data.folds():
    print "#### NUMBER OF USERS IN TRAINSET: {} ####".format(trainset.n_users)
    print "#### NUMBER OF ITEMS IN TRAINSET: {} ####".format(trainset.n_items)

# Evaluate performance of our algorithm on the dataset.
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])
print_perf(perf)


B.2 split_data.py

import os
import random

__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))

# Path to datafile to be split
file_path = os.path.join(__location__, 'data/movielens/ml-1m/ratings.dat')

data = []

# Number of lines to skip in the beginning of the file
skip_lines = 0
print "#### Reading data ####"

with open(file_path, 'r') as f:
    count = 0
    for line in f:
        if not (count < skip_lines):
            data.append(line)
        count = count + 1

print "#### Data read complete ####"

for num in range(1, 11):
    random.seed(num)
    print "#### Shuffle data ####"

    random.shuffle(data)

    print "#### Shuffle data complete ####"

    testdata = open('testdata_{}'.format(num), 'w')
    traindata = open('traindata_{}'.format(num), 'w')

    count = 0

    print "#### Writing data ####"

    # Every fifth line (20%) goes to the training file, the rest to the test file
    for line in data:
        if (count == 0):
            traindata.write(line)
        else:
            testdata.write(line)
        count = (count + 1) % 5

    testdata.close()
    traindata.close()
    print "#### Writing complete ####"

B.3 ciao_dvd_format.py

import os

__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))

file_path = os.path.join(__location__, 'data/CiaoDVD/movie-ratings.txt')

output = open('ciaoDVD_formated', 'w')
output.write("userID,movieID,movieRating\n")

# Keep only the user id, movie id and rating columns
with open(file_path, 'r') as f:
    for line in f:
        split_line = line.split(',')
        formated_line = "{},{},{}\n".format(split_line[0], split_line[1], split_line[4])
        output.write(formated_line)

output.close()

