
  • 8/17/2019 KTH Kexjobb User Behavior Prediction


DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2016

    A comparative study of the

    conventional item-based

    collaborative filtering and the

    Slope One algorithms for

    recommender systems

    HENRIK SVEBRANT

    JOHN SVANBERG

    KTH ROYAL INSTITUTE OF TECHNOLOGY

    SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION


    A comparative study of the conventional

    item-based collaborative filtering and the Slope

    One algorithms for recommender systems

    HENRIK SVEBRANT

    JOHN SVANBERG

    Degree Project in Computer Science, DD143X

Supervisor: Jeanette Hellgren Kotaleski

Examiner: Örjan Ekeberg

    CSC KTH 2016-05


    Abstract

Recommender systems are an important research topic in today's society as the amount of data increases across the globe. In order for commercial systems to give their users good and personalized recommendations on what data may be of interest to them in an effective manner, such a system must be able to give recommendations quickly and scale well as data increases. The purpose of this paper is to evaluate two such algorithms with this in mind.

The two algorithm families tested are both classified as item-based collaborative filtering but work very differently.

It is therefore of interest to see how their complexities affect their performance, accuracy and scalability. The Slope One family is much simpler to implement and proves to be equally efficient, if not more efficient, than the conventional item-based ones.

Both families require a precomputation stage before recommendations can be given; this is the stage where Slope One suffers in comparison to the conventional item-based one.

    The algorithms are tested using Lenskit, on data provided

    by GroupLens and their MovieLens project.


    Referat

Recommender systems are an important research area today. As the amount of data in society is growing rapidly, it is important for commercial systems to be able to give their users good, personalized recommendations of interesting items in an efficient manner. Besides giving recommendations quickly, they should also be scalable so that they remain usable as the amount of data grows further. The purpose of this report is to evaluate two types of recommendation algorithms with these points in mind.

The two algorithm families tested both belong to the item-based collaborative filtering type but work very differently at their core. It is therefore interesting to see how the complexity of one compares to the simplicity of the other with regard to performance, accuracy and scalability. The Slope One algorithm is much simpler to implement and proves to be as efficient as, possibly even more efficient than, the conventional item-based one.

Both types require a precomputation stage before recommendations can be given; in this stage Slope One performs worse than its counterpart.

The algorithms were tested using Lenskit, on data from GroupLens and their MovieLens project.


    Contents

1 Introduction
  1.1 Problem definition
  1.2 Scope and constraints
  1.3 Thesis overview

2 Background
  2.1 Applications of recommender systems
  2.2 Items
  2.3 Users
  2.4 Transactions
  2.5 Content-Based filtering
  2.6 Collaborative filtering
    2.6.1 Memory-based collaborative filtering
    2.6.2 Model-based collaborative filtering
  2.7 Tools and algorithms used in this study
    2.7.1 Lenskit
    2.7.2 Conventional item-based CF
    2.7.3 Slope One
    2.7.4 Weighted Slope One

3 Method
  3.1 Datasets
  3.2 Testing hardware
  3.3 Algorithm evaluation testing
    3.3.1 Conventional item-based CF
    3.3.2 Slope One
  3.4 Algorithm recommendation performance testing
  3.5 Validation of results
    3.5.1 K-fold cross-validation
    3.5.2 Root Mean Square Error

4 Results and analysis
  4.1 Algorithm evaluation results
  4.2 Recommendation performance results

5 Discussion
  5.1 Comparison of results with regards to accuracy
  5.2 Criticism of testing methods
    5.2.1 Lenskit
    5.2.2 Cross-folding
    5.2.3 RMSE
    5.2.4 Layout of testing
  5.3 Challenges with limited hardware
  5.4 Final conclusion

Bibliography

Appendices

A Evaluation result plots


    Chapter 1

    Introduction

The rapidly increasing use of the internet supplies its users with an incredible amount of information. Being able to quickly find what you need is becoming increasingly difficult as the amount of information available grows. To mitigate this effect, it is important to develop systems that make personalized recommendations and streamline the search process to make it more effective.

Recommender systems (RS) do exactly this and are mainly used to generate recommendations to the users of a commercial system. The recommendations made vary based upon what the system itself is used for; a few examples are books, movies, music, travel and other e-commerce items. The recommendations are based on the perceived interests of the users, with the intention of giving the user an idea of what the next interesting item could be.

There are two main approaches to how recommender systems are built: through collaborative filtering (CF) or through content-based filtering. Collaborative filtering can further be divided into user-based CF and item-based CF. User-based CF generates recommendations for a user u based on other users similar to u. Item-based CF instead investigates the similarity between items by representing each item's ratings as a vector in n dimensions. The vectors of two items can then be compared with, for example, cosine similarity. This information is later used to generate recommendations to user u based on items similar to u's previously liked items. A content-based RS generates recommendations based on the content of items; the system therefore needs data about the items, e.g. the actors, director and genre of a movie.

Each approach has its own advantages and disadvantages, making this a much-researched subject. The goal is to maximize the quality of the recommendations made in order to achieve greater user satisfaction, resulting in a larger customer base and greater income.

A big problem for recommender systems is to be fast and scalable and yet perform with high accuracy. It has been shown that a user-based RS does not scale very well with large datasets and systems with rapidly growing and changing user bases. Therefore this thesis will discuss implementations of two different item-based algorithms: the conventional item-based CF algorithm and the simpler Slope One algorithm. The focus is on comparing the accuracy of the algorithms with their complexity and run time performance in mind. Both algorithms are easily modified to adjust certain parameters that possibly affect both accuracy and run time performance.

A lot of research has been done in the field of recommender systems. The tests carried out in this thesis are based on Lenskit, a recommender system research toolkit. Lenskit is a credible set of tools developed to be useful in recommender system research. It was developed by researchers at Texas State University and GroupLens Research at the University of Minnesota.

    1.1 Problem definition

The need for faster and more scalable recommender systems is an existing problem that grows as the amount of data increases. It is therefore important to use well-performing algorithms that can manage the demands set by the users of such a system. As item-based recommender systems have been shown to be more efficient than user-based ones, this study will compare the performance and scalability of two algorithms of this type. The algorithms chosen for evaluation are the conventional item-based CF algorithm and the Slope One algorithm. They will be compared with regard to performance, scalability and implementation difficulty. The question to be discussed in this study is the following:

How does the Slope One algorithm compare to the more complex conventional item-based algorithm, with regard to performance and scalability?

    1.2 Scope and constraints

The data used in this study consists of datasets with movie ratings; however, the algorithms used and the results seen are well applicable to recommender systems in other fields, branches of e-commerce and other businesses where recommendations are needed.

The testing software used in this study is an open-source recommender system software developed by researchers at Texas State University and GroupLens Research at the University of Minnesota.

The datasets used are supplied by GroupLens Research, using data collected by their MovieLens project. Item ratings are made on a scale of 1-5 using integer increments.

    1.3 Thesis overview

•  Chapter 2 will provide the background needed to effectively understand the subject, combined with definitions and explanations for a set of field-specific terminology. It will also introduce the algorithms compared in this thesis.

    •  Chapter 3 explains the method used for this study.

•  Chapter 4 will provide the final results from the tests explained in the previous chapter.

•  Chapter 5 discusses the results and presents the problems encountered during this study.

    •  Chapter 6 provides the references used.

    •  The appendix provides plots for various evaluation test results.


    Chapter 2

    Background

To be able to implement a recommender system (RS) and do research about its performance, knowledge about the subject is needed. This section introduces the field of recommender systems and its applications. The focus is on collaborative filtering, especially item-based collaborative filtering; however, alternative approaches and their drawbacks will also be presented briefly.

    2.1 Applications of recommender systems

Recommender systems are widely used on the Internet. Systems of this kind can be implemented using many different kinds of algorithms and be applied in multiple types of fields. Common uses are giving recommendations for movies, music and items in online stores, or suggesting friends in social networks.

    In short, a recommender system is a system that helps its users to find informationof interest in environments where the amount of information is large.

    2.2 Items

The items being referred to in this report are the objects that are recommended by the RS. An item is distinguished by a unique id number and characterized by item-specific values, such as movie titles, genres and directors[3].

    2.3 Users

The users of an RS can have very diverse goals, personalities, tastes and characteristics. Therefore the RS needs to be generic and exploit user information with an unbiased approach that does not exclude any characteristics. A user in the RS has several variables and attributes that correlate with other users or items. One important such attribute is their ratings of previously used items[3].


    2.4 Transactions

The interactions between users and items are called transactions. The interactions are recorded and attributed to the owner of the interaction, namely the user who interacted with the item. Transactions are important data for the RS to be able to generate qualitative recommendations to a user. Transactions may be collected explicitly or implicitly: in a movie RS, typical explicit transactions are system-asked ratings, whereas an implicit transaction could be a user watching a movie[3].

    Transactions could be abstracted in a variety of forms:

•  User ratings 1-5.

•  Binary ratings, i.e. whether the item is good or bad.

•  User expressive ratings, e.g. "good", "beautiful", "emotional".

•  A user's previously seen movies.

    2.5 Content-Based filtering

Content-based (CB) recommender systems attempt to recommend items similar to those a given user has liked in the past. This is done by building a model representation of the user, based on the features of the objects he or she has previously rated. The model is then used to match the attributes of the user model against those of a content object[3].

Content-based filtering systems have some advantages when compared to collaborative filtering. One advantage is that it is user independent: it solely makes use of ratings provided by the given user in order to build its respective user model. The system is also able to give recommendations on items not yet rated by any user, which a collaborative filtering system is not.

It does, however, have a couple of shortcomings as well. One of these is the limited content analysis it provides: to give movie recommendations, the system needs to know, for example, the actors and directors. No system can give reliable recommendations if the data needed to distinguish items is lacking.

Content-based recommender systems also tend to be over-specialized when giving recommendations, as the recommended items are solely those whose scores match highly against the user model. For example, if a user previously liked movies by a certain director, the recommender system will recommend movies by this particular director and reject other movies, because the recommender is following a certain pattern that matches the user profile with the item specifics. Because of this, new users cannot be given accurate recommendations until the system knows and understands the user's preferences[3].

    2.6 Collaborative filtering

Collaborative filtering (CF) recommendation systems typically attempt to identify users whose preferences are similar to those of the given user and recommend items that those users have liked. Collaborative filtering algorithms can also take an item-based approach, and will then identify similarities between different items. Just as with content-based filtering systems, the most used rating representations are numeric scale ratings or binary (like/dislike) systems. In the field of e-commerce, unary ratings such as "has purchased" are also common[3][2].

A common technique in collaborative filtering systems is to build a user-item ratings matrix. The resulting matrix is often very sparse and becomes increasingly difficult to manage as the amount of data grows large[2].

    2.6.1 Memory-based collaborative filtering

One approach to CF systems is the so-called memory-based, or user-based, CF. Memory-based CF makes use of statistical techniques to find a set of users, called neighbors, that have a common preference and a history of similarly liked items.

The memory-based CF RS can then apply different types of algorithms to construct a top-N list of items that are predicted to be appreciated by the neighborhood.

The memory-based CF system has limitations, however. One of these is the fact that similarity values are calculated based on common items, which becomes unreliable when data is sparse and common items are few. Another disadvantage of memory-based CF is its bad scalability: when the dataset used by the RS grows, the computations in a memory-based CF increase with both the number of users and the number of items, so a system with millions of users and items will not scale very well[1].

    2.6.2 Model-based collaborative filtering

Another approach, which improves prediction reliability, is model-based CF, also called item-based CF. This approach builds a model based on rating data for the individual items, commonly using data mining and machine learning techniques. Item-based CF has been shown to scale better than user-based CF because the models are based on items, which are more static than the users in a typical dataset. The scalability is also better due to the better handling of sparse data with an item-based approach. This is of importance today as data grows large and scalability problems are important issues[8]. This approach is further presented in section 2.7.2.


    2.7 Tools and algorithms used in this study

This section introduces the software that was used to evaluate the algorithms studied in this report; both algorithms belong to the family of model-based collaborative filtering.

    2.7.1 Lenskit

Lenskit is free and open-source software developed by researchers at Texas State University and GroupLens Research at the University of Minnesota, with contributions from developers around the world[5]. Lenskit is designed to be useful for building production-quality recommender systems and to support many forms of research, including research on evaluation techniques and algorithms.

Lenskit is based on components that together build a functional recommender. The components can be changed, and it is possible to implement components of your own. It implements effective data structures optimized for sparse data, and linear algebra operations such as dot products. It also provides crossfolding techniques to split the dataset into N partitions for cross-validation[5]. In addition to this, it implements various performance measuring techniques, making it a justified resource for the research conducted in this project.

    2.7.2 Conventional item-based CF

An item-based approach creates a model of user ratings. The goal of the algorithm is a list of similar items based on the active user's previously liked items. This is done by evaluating the similarity to the target item i and selecting the k most similar items. A second list consists of those k items' similarity values to i[1].

The similarity between items can be calculated in several different ways, and there are benefits and disadvantages to every such method. Using a straightforward method would benefit the complexity but could possibly lack the accuracy of a more complex method.

Cosine-based similarity is a proven method to compute the similarity of two vectors in an N-dimensional space. In this case the two compared items are represented as the two vectors, and the similarity is computed by calculating the cosine of the angle between them. The dimension N is based on the number of users who rated the items: m users implies m dimensions.


A similarity function between items i and j, denoted Similarity(i, j), can be calculated by the formula below.

\mathrm{Similarity}(i, j) = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\|\vec{i}\| \times \|\vec{j}\|} \qquad (2.1)

Where \vec{i} and \vec{j} are the rating vectors of the two items and \cdot is the dot product of the two vectors.
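Formula 2.1 can be sketched in a few lines of Python; the function name and the example rating vectors below are our own, chosen for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two item rating vectors (Formula 2.1)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Two items, each rated by the same three users:
print(cosine_similarity([5, 3, 4], [4, 3, 5]))  # ≈ 0.98
```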

An adjusted version of the cosine-based similarity takes each user's average rating into account by subtracting the user's average from each co-rated pair. This can be beneficial because some users are more critical and others more positive in general, and therefore do not share a common rating scale. A co-rated case means that two or more users have all rated both item i and item j. The set of users who co-rated items i and j is denoted U. This adjusted version is given by the following formula.

\mathrm{Similarity}(i, j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2} \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}} \qquad (2.2)

Where R_{u,i} is u's rating of i and \bar{R}_u is the average rating of u.
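A minimal sketch of the adjusted cosine, assuming ratings are stored as user → {item: rating} dictionaries (a data layout chosen here for illustration, not taken from Lenskit):

```python
import math

def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity between items i and j (Formula 2.2).

    Every user's own average rating is subtracted, putting critical and
    generous raters on a common scale.
    """
    co_raters = [u for u, r in ratings.items() if i in r and j in r]
    num = den_i = den_j = 0.0
    for u in co_raters:
        mean_u = sum(ratings[u].values()) / len(ratings[u])
        di = ratings[u][i] - mean_u
        dj = ratings[u][j] - mean_u
        num += di * dj
        den_i += di * di
        den_j += dj * dj
    return num / (math.sqrt(den_i) * math.sqrt(den_j))

# Two users who co-rated items "i" and "j" in opposite ways:
ratings = {"u1": {"i": 5, "j": 3}, "u2": {"i": 2, "j": 4}}
print(adjusted_cosine(ratings, "i", "j"))  # ≈ -1.0
```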

The similarity can also be calculated using the Pearson-r correlation. To make the correlation significant, the co-rated cases must be put in a set, similar to what was done for the previous adjusted cosine formula. This is illustrated by the formula below.

\mathrm{Similarity}(i, j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^2} \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^2}} \qquad (2.3)

Where \bar{R}_i is the average rating of item i.
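The Pearson-r variant differs only in which averages are subtracted; a sketch assuming the same kind of user → {item: rating} dictionaries as above would be our own illustration, not a library API:

```python
import math

def pearson_similarity(ratings, i, j):
    """Pearson-r correlation between items i and j (Formula 2.3).

    Unlike the adjusted cosine, the *item* averages over the co-rating
    users are subtracted, not each user's own average.
    """
    co = [u for u, r in ratings.items() if i in r and j in r]
    mean_i = sum(ratings[u][i] for u in co) / len(co)
    mean_j = sum(ratings[u][j] for u in co) / len(co)
    num = sum((ratings[u][i] - mean_i) * (ratings[u][j] - mean_j) for u in co)
    den = (math.sqrt(sum((ratings[u][i] - mean_i) ** 2 for u in co))
           * math.sqrt(sum((ratings[u][j] - mean_j) ** 2 for u in co)))
    return num / den

# Ratings of item "j" move in lockstep with item "i":
ratings = {"a": {"i": 1, "j": 2}, "b": {"i": 2, "j": 4}, "c": {"i": 3, "j": 6}}
print(pearson_similarity(ratings, "i", "j"))  # ≈ 1.0
```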

Now that we have the similarities calculated, the first stage in the algorithm is done. The second stage in a conventional item-based CF algorithm is to generate the actual recommendations based on the similarities previously calculated and the ratings of the target user.

As with the similarity calculation between items, there are several methods to produce recommendations. One way to generate a prediction on item i for the target user u is to calculate the sum of u's ratings for the items similar to i. All ratings in the sum are weighted so that ratings corresponding to items with high similarity to i affect the prediction for i more. This method is called Weighted Sum and the formula is shown below.

\mathrm{Prediction}(u, i) = \frac{\sum_{\bar{i} \in N} S_{i,\bar{i}} \times R_{u,\bar{i}}}{\sum_{\bar{i} \in N} |S_{i,\bar{i}}|} \qquad (2.4)

Where N is the set of items similar to i and S_{i,\bar{i}} is i's similarity to \bar{i}.
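A hedged sketch of the Weighted Sum (Formula 2.4); the dictionary layout and the example numbers are ours, chosen only to illustrate the weighting:

```python
def weighted_sum(similarities, user_ratings):
    """Weighted Sum prediction for a target item (Formula 2.4).

    similarities: {neighbour item: similarity to the target item}
    user_ratings: {item: the target user's rating}
    Only neighbours the user has actually rated contribute.
    """
    rated = [i for i in similarities if i in user_ratings]
    num = sum(similarities[i] * user_ratings[i] for i in rated)
    den = sum(abs(similarities[i]) for i in rated)
    return num / den

# Hypothetical neighbourhood of the target item ("c" is unrated, so skipped):
print(weighted_sum({"a": 0.5, "b": 0.25, "c": 0.9}, {"a": 4, "b": 2}))  # ≈ 3.33
```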

The Weighted Sum can also be approximated with a regression model; this way there is no need to directly use the ratings of similar items. A regression model can be more accurate than Weighted Sum because the Euclidean distance between the compared rating vectors can be large even when the actual similarity is high. The approximated regression model can use the same formula as the Weighted Sum, where R_{u,\bar{i}} is exchanged for the approximation \acute{R}_{u,\bar{i}}, obtained by the following formula[1].

\acute{R}_{u,i} = \alpha \bar{R}_i + \beta + \epsilon \qquad (2.5)

The parameters \alpha and \beta are acquired from the rating vectors, whereas \epsilon is the error of the model.

    2.7.3 Slope One

The Slope One algorithm is a model-based CF algorithm that is both simple to implement and simple to understand. It was developed and presented by Daniel Lemire and Anna Maclachlan in 2005[4]. Among CF systems it can be argued to be one of the simplest forms of algorithm that is both non-trivial and item-based.

The algorithm uses information about the items that the user has rated previously, together with ratings from other users who have rated the same items. Only the ratings made by users who have some items in common with the predictee user are used, and of those ratings only the ones for items also rated by the predictee. This builds rating pairs which are used for the prediction.

The prediction has the form f(x) = x + b, a simplified version of the linear regression f(x) = ax + b. The constant b is defined as the mean difference between the two items' ratings, and x is a variable representing a rating value.


Table 2.1. Movie rating example

|        | Vikings | Game of Thrones | Breaking Bad | Band of Brothers | Suits |
|--------|---------|-----------------|--------------|------------------|-------|
| John   | 5       | 4               | 5            | -                | 4     |
| Henrik | 5       | 5               | 4            | 3                | 2     |
| Lisa   | 4       | -               | 4            | 2                | 4     |
| Sophie | ?       | 3               | 2            | -                | -     |
| Anna   | 4       | 3               | -            | -                | 5     |

For example, if we wish to predict Sophie's rating for Vikings in Table 2.1, it goes as follows.

Using the common pairs of ratings for Vikings and Game of Thrones we get the mean difference: ((5-4) + (5-5) + (4-3)) / 3 = 2/3. This value is our b; adding b to x, where x is Sophie's rating for Game of Thrones (3), gives the rating 3 + 2/3.

Doing the same for Breaking Bad we get ((5-5) + (5-4) + (4-4))/3 = 1/3, which added to Sophie's Breaking Bad rating of 2 gives us 2 + 1/3.

Using both of those predictions we can get a better one by calculating the mean, resulting in ((3 × (3 + 2/3)) + (3 × (2 + 1/3)))/(3 + 3) = 18/6 = 3. Sophie's predicted rating for Vikings, using the available pairs for Game of Thrones and Breaking Bad, is therefore 3 with the standard Slope One algorithm. The mean difference can more formally be written as

b = \frac{1}{n} \sum_{i=1}^{n} (w_i - v_i) \qquad (2.6)

where v and w are the two items, w is the item being predicted, and v_i and w_i are the ratings given by user i for those items.

To get the best prediction of the form f(x) = x + b, given two arrays v_i and w_i with i = 1, 2, ..., n, we minimize

\sum_{i=1}^{n} (v_i + b - w_i)^2 \qquad (2.7)

Deriving with respect to b and setting the derivative to zero implies that b is equal to Formula 2.6. This result leads us to the following scheme.

Given a user evaluation u with ratings u_j and u_i, where i and j are items, and given a training set X, the average deviation between the two items is defined as:

\mathrm{dev}_{j,i} = \sum_{u \in S_{j,i}(X)} \frac{u_j - u_i}{\mathrm{card}(S_{j,i}(X))} \qquad (2.8)


Where S_{j,i}(X) is the set of all user evaluations in the training set X that contain both i and j. In other words, the deviation only takes into account those users that have given a rating to both of those specific items. The calculated \mathrm{dev}_{j,i} values are saved in a symmetric matrix, which can easily be updated when new data is entered.

Given the fact that \mathrm{dev}_{j,i} + u_i is a prediction for u_j given u_i, a reasonable predictor is the average of all such predictions.

P(u)_j = \frac{\sum_{i \in R_j} (\mathrm{dev}_{j,i} + u_i)}{\mathrm{card}(R_j)} \qquad (2.9)

Where R_j is the set of all relevant items and P(u)_j is the prediction for item j. Worth noting is that this predictor does not depend strongly on how the user has rated individual items, but rather on the user's average rating and on which items the user has rated.
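The full scheme (deviations per Formula 2.8, predictions per Formula 2.9) can be sketched as below. The class and data layout are our own; as a sanity check, feeding it Table 2.1 reproduces the worked example for Sophie and Vikings:

```python
from collections import defaultdict

class SlopeOne:
    """Plain Slope One: precompute pairwise deviations (2.8), then
    predict a rating as the average of (deviation + known rating) (2.9)."""

    def fit(self, evaluations):
        # evaluations: iterable of {item: rating} dicts, one per user.
        sums = defaultdict(float)
        counts = defaultdict(int)
        for u in evaluations:
            for j in u:
                for i in u:
                    if i != j:
                        sums[(j, i)] += u[j] - u[i]
                        counts[(j, i)] += 1
        # dev[(j, i)] is the average of u_j - u_i over co-rating users.
        self.dev = {pair: s / counts[pair] for pair, s in sums.items()}
        return self

    def predict(self, user, j):
        # Average dev_(j,i) + u_i over the items i the user has rated.
        preds = [user[i] + self.dev[(j, i)]
                 for i in user if i != j and (j, i) in self.dev]
        return sum(preds) / len(preds)

# Table 2.1 as training data; "-" entries are simply omitted.
table = [
    {"Vikings": 5, "Game of Thrones": 4, "Breaking Bad": 5, "Suits": 4},   # John
    {"Vikings": 5, "Game of Thrones": 5, "Breaking Bad": 4,
     "Band of Brothers": 3, "Suits": 2},                                   # Henrik
    {"Vikings": 4, "Breaking Bad": 4, "Band of Brothers": 2, "Suits": 4},  # Lisa
    {"Game of Thrones": 3, "Breaking Bad": 2},                             # Sophie
    {"Vikings": 4, "Game of Thrones": 3, "Suits": 5},                      # Anna
]
model = SlopeOne().fit(table)
sophie = {"Game of Thrones": 3, "Breaking Bad": 2}
print(model.predict(sophie, "Vikings"))  # ≈ 3, matching the worked example
```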

    2.7.4 Weighted Slope One

The Slope One scheme suffers from the drawback that the number of ratings observed is not taken into consideration when predicting ratings. Consider the example where we wish to predict user Adam's rating of item A given Adam's ratings of items B and C. If 3000 users have rated the pair of items B and A, whereas only 50 users have rated the pair C and A, then Adam's rating of item B is likely a far better predictor for item A than his rating of item C is. With the Weighted Slope One algorithm it is possible to increase the weight of the more relevant ratings in order to mitigate this effect.

P(u)_j = \frac{\sum_{i \in S(u) - \{j\}} (\mathrm{dev}_{j,i} + u_i) \, c_{j,i}}{\sum_{i \in S(u) - \{j\}} c_{j,i}} \qquad (2.10)

Where c_{j,i} = \mathrm{card}(S_{j,i}(X)) is the number of users who rated both items j and i, considered to be the weight.

Using Table 2.1, the weight between Vikings and Game of Thrones is represented by the number of users that have rated both of those items, formally c_{j,i}. In Table 2.1 this was done by the users John, Henrik and Anna, resulting in a weight of 3. A table representation of all weights from this example is:


Table 2.2. Table of weights between TV shows from Table 2.1

|                  | Vikings | Game of Thrones | Breaking Bad | Band of Brothers | Suits |
|------------------|---------|-----------------|--------------|------------------|-------|
| Vikings          | 4       | 3               | 3            | 2                | 4     |
| Game of Thrones  | 3       | 4               | 3            | 1                | 3     |
| Breaking Bad     | 3       | 3               | 4            | 2                | 3     |
| Band of Brothers | 2       | 1               | 2            | 2                | 2     |
| Suits            | 4       | 3               | 3            | 2                | 4     |

Using this table and formula 2.10, Sophie's predicted rating for Vikings using her ratings for Game of Thrones and Breaking Bad can be calculated as follows:

\frac{3 \times (3 + 2/3) + 3 \times (2 + 1/3)}{3 + 3} = \frac{18}{6} = 3 \qquad (2.11)

Since both co-rating weights happen to be 3, the weighted prediction here coincides with the unweighted one.
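Formula 2.10 can be sketched as follows; the deviation and count values below are hypothetical, chosen so the effect of unequal weights is visible:

```python
def weighted_slope_one(dev, counts, user, j):
    """Weighted Slope One prediction (Formula 2.10): each pairwise
    prediction dev[(j, i)] + u_i is weighted by c_ji, the number of
    co-ratings the deviation was computed from."""
    items = [i for i in user if i != j and (j, i) in dev]
    num = sum((dev[(j, i)] + user[i]) * counts[(j, i)] for i in items)
    den = sum(counts[(j, i)] for i in items)
    return num / den

# Hypothetical deviations and co-rating counts for a target item "t":
dev = {("t", "a"): 1.0, ("t", "b"): 0.0}
counts = {("t", "a"): 3, ("t", "b"): 1}
# The pair ("t", "a") has three co-ratings, so it dominates the result.
print(weighted_slope_one(dev, counts, {"a": 2, "b": 4}, "t"))  # 3.25
```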


    Chapter 3

    Method

This section describes the method of the study. It is mainly a literature study with the addition of running research software in order to test and examine results.

    3.1 Datasets

The data used in this research is collected and supplied by GroupLens, generated using their MovieLens website. MovieLens is a web-based research recommender system launched in 1997. In order to evaluate the chosen algorithms effectively, datasets of different sizes have been used. MovieLens supplies datasets with a total of 100,000, 1M, 10M and 20M ratings. The ratings are on a scale of 1 to 5 and have been collected from 1,000 up to 138,000 users, depending on the dataset. Each user included in those datasets has rated at least 20 movies.

    3.2 Testing hardware

The test results presented in this report were produced using the following computer hardware:

Table 3.1. Computer specifications

| CPU         | Intel Core i5-3570K (4 cores @ 3.4 GHz) |
| GPU         | GeForce GTX 580                         |
| RAM         | DDR3 8 GB 2133 MHz                      |
| SSD         | 128 GB Samsung                          |
| Disk        | 1 TB Western Digital 7200 rpm           |
| Motherboard | ASUS P8Z77-V PRO                        |
| OS          | Windows 10 Pro 64-bit                   |


    3.3 Algorithm evaluation testing

This section presents the evaluation tests that have been done, and which algorithms and parameters were used. The evaluation testing was done using MovieLens's datasets consisting of 100K and 1M ratings[6]. The accuracy of the algorithms is examined and tested with the crossfolding evaluation technique explained in section 3.5.1. The crossfolding produces an error, calculated as a root mean square error (RMSE). The error is a measure of how accurate a prediction is compared with the actual value.

    3.3.1 Conventional item-based CF

The first tests were run on the conventional item-based CF algorithm as presented in section 2.7.2. Tests were run with various algorithm modifications, with neighborhood sizes ranging from 1 up to 250. When too many neighbors are taken into account the results have been shown to be less accurate, which is why the upper limit of 250 was chosen. The algorithm was also modified by testing similarity measurements using either Pearson correlation, cosine similarity or adjusted cosine similarity.
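As a sketch of the adjusted cosine similarity measurement mentioned above, the following Python function is an illustrative implementation under the standard definition (each user's mean rating is subtracted before the cosine between the two item columns is taken); the tiny ratings matrix is hypothetical, and this is not the Lenskit code used in the tests.

```python
import math

def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity between items i and j.

    `ratings` maps user -> {item: rating}; each user's mean rating is
    subtracted first, which compensates for users who rate
    systematically high or low.
    """
    num = si = sj = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            a = user_ratings[i] - mean
            b = user_ratings[j] - mean
            num += a * b
            si += a * a
            sj += b * b
    if si == 0 or sj == 0:
        return 0.0
    return num / (math.sqrt(si) * math.sqrt(sj))

# Tiny hypothetical ratings matrix:
data = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 3},
    "u3": {"A": 1, "B": 5, "C": 2},
}
print(adjusted_cosine(data, "A", "B"))
```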

    3.3.2 Slope One

Both the weighted and unweighted versions of the Slope One algorithm, presented in section 2.7.3 and section 2.7.4, were tested. They were tested using deviation damping levels ranging from 0 to 6. Deviation damping is a value that is added to the number of coratings when calculating the deviation of item pairs. The damping levels used in this study were chosen because increasingly high values appeared to affect the algorithms' accuracy negatively.
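The effect of deviation damping can be illustrated with a small sketch. This is an assumption-laden illustration rather than Lenskit's implementation: `damped_deviation` is a hypothetical helper and the numbers are invented.

```python
def damped_deviation(diff_sum, corating_count, damping=5):
    """Average rating deviation for an item pair, with damping.

    `diff_sum` is the summed rating difference over users who rated
    both items; `damping` is added to the corating count, pulling the
    deviation toward 0 when the pair has few coratings.
    """
    return diff_sum / (corating_count + damping)

# With only 2 coratings, damping=5 shrinks the raw deviation of 1.5:
print(damped_deviation(3.0, 2, damping=0))  # undamped: 1.5
print(damped_deviation(3.0, 2, damping=5))  # ≈ 0.43, pulled toward 0
```

The intuition is that a deviation estimated from very few coratings is unreliable, so damping shrinks it toward a neutral value of 0.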

    3.4 Algorithm recommendation performance testing

The performance evaluation was carried out as a run time comparison between the algorithms and parameter settings. This was done using MovieLens's datasets consisting of 1M and 10M ratings, running each algorithm 100 times in order to calculate a mean value of its run time. The algorithms were set to recommend 10 movies for two randomly chosen users: user u with few rated items and user v with many, where u and v were used as the test users for all tests. Model building times were also noted, as they differ between the algorithms.

    3.5 Validation of results

This section describes the techniques used to validate the results acquired from the tests described in section 3.3.


    3.5.1 K-fold cross-validation

An implementation of k-fold cross-validation is present in the Lenskit evaluation framework. The dataset is partitioned into an arbitrary number of partitions, where one of the k partitions is selected to be the validation partition and the other k-1 partitions are used as training data. Each partition is used for validation exactly once, and the results from the k runs can then be averaged to produce one estimate for the entire dataset [7].

The number of partitions used in our tests was 5 for all tests, meaning that one partition was used for validation and the rest for training. This is also the default value used in Lenskit.

    3.5.2 Root Mean Square Error

During the cross-validation there are several ways to calculate the error between estimates and actual values. All tests in this study used RMSE as the error measure [9]. The formula for RMSE is presented below.

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (w_i - v_i)^2}   (3.1)

where w is the set of actual ratings and v is the set of predicted ratings. The closer the RMSE value is to 0, the smaller the error, meaning that a value closer to 0 is more accurate than a higher one.
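The RMSE formula translates directly into code. A minimal sketch with invented ratings:

```python
import math

def rmse(actual, predicted):
    """Root mean square error between actual and predicted ratings."""
    n = len(actual)
    return math.sqrt(sum((w - v) ** 2 for w, v in zip(actual, predicted)) / n)

print(rmse([3, 4, 5], [3.5, 4, 4]))  # ≈ 0.645
```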


    Chapter 4

    Results and analysis

This chapter presents the results acquired from testing. The tests that have been run are accuracy evaluations and recommendation performance measurements.

    4.1 Algorithm evaluation results

The accuracy of an algorithm is an important measurement to take into account when the overall benefits of an algorithm are assessed. The following plot presents the accuracy, with RMSE on the y-axis, for the most accurate configurations from each algorithm family. See Appendix A for more evaluation results.

    Figure 4.1.   Evaluation results


    4.2 Recommendation performance results

The time it takes to recommend items to a user is a critical factor when choosing which algorithm to use for a project. Therefore it was decided to test the runtime performance, measured in seconds, for the algorithms found to be the most accurate in the previous section.

The following table presents the runtime required to build the prediction model for each of the tested algorithms, for datasets of 1M and 10M ratings.

    Table 4.1.   Performance testing results - model-building runtime

Algorithms                                 1M build time (s)   10M build time (s)
Conventional item-based, 20 neighbors      17.266              300.598
Adjusted cosine item-based, 20 neighbors   17.278              299.992
Slope One, deviation damping 5             23.895              419.467
Weighted Slope One, deviation damping 5    23.661              415.717

The conventional item-based algorithm and the adjusted cosine item-based algorithm perform better than the regular Slope One and the weighted Slope One algorithms, with both the 1M and the larger 10M dataset. The mean difference between the item-based and Slope One families, calculated over both the 1M and 10M datasets, shows that the item-based family only took 72% of the Slope One build time. The build time difference between the item-based and adjusted item-based algorithms is slight, as is the difference between Slope One and the weighted Slope One.

The results presented in the following table were all collected using MovieLens's 1M dataset, where recommendations were made for the user with id 4, who has rated 20 different movies, and for user 53, who has rated 683 movies. The runtime is a mean value computed from running each algorithm for 100 iterations.

    Table 4.2.  Performance testing results on dataset 1M

Algorithms                                 Runtime user 4 (s)   Runtime user 53 (s)
Conventional item-based, 20 neighbors      1.36722              1.60939
Adjusted cosine item-based, 20 neighbors   1.34965              1.59801
Slope One, deviation damping 5             1.21249              1.52287
Weighted Slope One, deviation damping 5    1.21609              1.52383

The results from this test show that the algorithms from the Slope One family are noticeably faster for users with few ratings. The tests run for user 53 show less of a difference. The difference between the conventional item-based algorithm and the adjusted cosine item-based algorithm is small: the adjusted cosine variant only differs by a few milliseconds from the conventional one. A similar pattern is seen in the difference between the weighted Slope One and the regular Slope One.

    Table 4.3.  Performance testing results on dataset 10M

Algorithms                                 Runtime user 4 (s)   Runtime user 53 (s)
Conventional item-based, 20 neighbors      13.62649             13.96945
Adjusted cosine item-based, 20 neighbors   13.52653             13.72841
Slope One, deviation damping 5             13.04692             12.70557
Weighted Slope One, deviation damping 5    12.42521             12.56744

The results from this test show that the performance differences seen in Table 4.2 increase as the amount of data grows. This holds both for the differences between the two conventional algorithms and for the differences between Slope One and the weighted Slope One. The difference between the two algorithm families is also more noticeable.


    Chapter 5

    Discussion

The testing methods chosen have not been perfect; problems have been encountered which may affect our test results and thereby our final conclusions. This chapter discusses the results, the testing methods used in this study, the problems that were encountered and the conclusions that could be drawn.

    5.1 Comparison of results with regards to accuracy

The algorithm evaluation results depicted in figure 4.1 are the best algorithms from the conventional item-based family and the Slope One family. Note that the Pearson correlation version of the conventional item-based family is not listed, as it was not as accurate as the ones depicted. Among the four algorithms listed, the weighted Slope One algorithm has the smallest error and the least spread, meaning that it is the most accurate algorithm, together with the adjusted cosine item-based algorithm, for the datasets and evaluation method previously described. The Slope One algorithm proved to be efficient in run time performance while being both simple to implement and accurate in its recommendations.

Table 4.1 shows that the Slope One algorithms are slower to build, independent of dataset size. It is therefore recommended to use the algorithm that fits your needs, as having to rebuild the model often will affect system performance negatively. If simplicity of implementation is desirable, however, the Slope One algorithm is preferred.

The recommendation performance depicted in tables 4.2 and 4.3 shows that the Slope One algorithms are slightly, sometimes even noticeably, faster than the item-based family. This shows that the much simpler algorithm still performs very well, making it a good alternative to the other.

The most accurate deviation damping parameter value for the Slope One algorithm was found to be 5; values past this point appeared to lose accuracy and were therefore not tested thoroughly. It would be of interest to test how this parameter affects the algorithm's behavior on various other datasets, which could be considered an improvement if the tests were to be remade.

Regarding the conventional item-based algorithm, one parameter which presumably would have affected our results is the minimum neighborhood size, that is, the smallest number of neighbors to consider in the trials. The neighborhood size parameter used only sets an upper limit on the number of neighbors actually considered for each prediction. Testing more specifically defined size intervals would have given interesting results, which would most likely vary heavily depending on the dataset used.

    5.2 Criticism of testing methods

Although the results provided in this study have been thoroughly examined and stem from multiple tests and attempts, they may not be perfect. Problems have been met along the way that have limited the testing capabilities of this study, which may have affected the final results. This section introduces those problems and discusses how they may have interfered with the final results presented.

5.2.1 Lenskit

Lenskit is credible and well documented software that has been cited in a number of published papers. The tests made in this thesis could also have been implemented in other software to avoid biased results.

    5.2.2 Cross-folding

We chose to use the default value of 5 partitions for cross-validation for all tests, independent of data size, because [5] did the same for the same sets of data. Other ratios of training to validation data, as well as different evaluation techniques, might have been interesting to test.

    5.2.3 RMSE

The error metric used in this study was chosen to be root mean square error (RMSE) at an early stage, as it was found that only the size of the error differed between RMSE and MAE (mean absolute error), while their distributions along the y-axis were mostly consistent. Time constraints for this study were also an important factor in this decision. RMSE is also a popular metric in many studies, as well as being the one used in the Netflix Prize competition.


The metric has been criticized by researchers, however, and a combination of RMSE and MAE may have been preferable.

    5.2.4 Layout of testing

The tests could very well be refined further to achieve higher confidence in the results. One improvement would be to run each test more times than was done in this study, as well as running tests on different datasets and on data of larger volume, a problem discussed in the next section.

However, as each of the tests was run under the same conditions with the same hardware and software, it can be argued to be sufficient to draw a conclusion for the study, with respect to the given problem.

    5.3 Challenges with limited hardware

Testing with the hardware presented previously brought along one problem critical to showcasing a confident result: the lack of RAM. It was shown that 8GB of RAM was not enough to run our tests on the bigger datasets, as the testing software requires a large amount of memory to be allocated on the heap. This limited our evaluation testing to the datasets consisting of 100K and 1M ratings, and the recommendation performance testing to datasets of up to 10M ratings, whereas two additional datasets of 10M and 20M ratings were available from the MovieLens project.

The lack of memory resulted in thrashing, as well as some tests not running at all. As a result, some tests had to be run multiple times in order to acquire usable data. There is therefore a risk of errors in the data; however, as all tests were run under the same circumstances and only the good result data have been used, this should not be a major issue.

    5.4 Final conclusion

This study shows that Slope One is a good alternative to the more complex conventional item-based approach. Slope One delivers accurate recommendations and performs well despite its much greater simplicity relative to the other. This has been shown on small datasets as well as on bigger datasets with 10M ratings. The conventional item-based algorithm has a noticeably faster build time, something that may increase its attractiveness compared to Slope One. However, all tests were carried out on datasets from the same source; the results may have differed if tests had been run on other datasets. The lack of hardware support in our testing setup prevented us from including datasets of greater size. This report therefore fails to draw a fully confident conclusion regarding scalability and performance in big data environments, and further testing on greater datasets with appropriate hardware is recommended.


    Bibliography

[1] Sarwar, Badrul et al. "Item-based collaborative filtering recommendation algorithms". In: Proceedings of the 10th International Conference on World Wide Web (WWW '01). 2001, pp. 285–295. doi: http://dx.doi.org/10.1145/371920.372071.

[2] Ekstrand, Michael D., Riedl, John T., and Konstan, Joseph A. "Collaborative Filtering Recommender Systems". In: Foundations and Trends in Human–Computer Interaction 4.2 (2010), pp. 81–173. url: http://files.grouplens.org/papers/FnT%20CF%20Recsys%20Survey.pdf.

[3] Ricci, Francesco et al. Recommender Systems Handbook. Springer, 2011. isbn: 978-0-387-85820-3.

[4] Lemire, Daniel and Maclachlan, Anna. "Slope One Predictors for Online Rating-Based Collaborative Filtering". In: Proceedings of the 2005 SIAM International Conference on Data Mining (2005). url: http://lemire.me/fr/documents/publications/lemiremaclachlan_sdm05.pdf.

[5] Ekstrand, Michael et al. "Rethinking the Recommender Research Ecosystem: Reproducibility, Openness, and LensKit". In: Proceedings of the Fifth ACM Conference on Recommender Systems (RecSys '11). ACM, New York, NY, USA (2011), pp. 130–140. doi: 10.1145/2043932.2043958.

[6] MovieLens Datasets. url: http://grouplens.org/datasets/movielens/.

[7] Schneider, Jeff. Cross Validation. 2007. url: https://www.cs.cmu.edu/~schneide/tut5/node42.html.

[8] Su, Xiaoyuan and Khoshgoftaar, Taghi M. "A Survey of Collaborative Filtering Techniques". In: Advances in Artificial Intelligence 2009, 421425 (2009), pp. 1–19. url: http://www.hindawi.com/journals/aai/2009/421425/.

[9] Chai, T. and Draxler, R. "Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature". In: Geoscientific Model Development 7 (2014). url: http://www.geosci-model-dev.net/7/1247/2014/gmd-7-1247-2014.pdf.


    Appendix A

    Evaluation result plots

    Figure A.1.  Results from the chosen four algorithms


    Figure A.2.   Results from evaluating item-based


    Figure A.3.   Results from evaluating adjusted item-based


Figure A.4. Results from evaluating Pearson item-based


Figure A.5. Results from evaluating adjusted Pearson item-based


Figure A.6. Results from evaluating Slope One


Figure A.7. Results from evaluating weighted Slope One


    Figure A.8.  All results in one plot, for overview purposes
