

    Comparing Performance of Collaborative Filtering Algorithms

    Vandana A. Patil

    Assistant Professor, Department of Information Technology

    St. Francis Institute of Engineering

    Borivali(W), Mumbai, India

    E-mail: [email protected]

    Lata Ragha

    Associate Professor, Department of Computer Engineering

    Terna Engineering College

    Navi Mumbai, India

    E-mail: [email protected]

    Abstract: Recommender systems are widely used nowadays to make personalized recommendations for products or services during live interaction. Collaborative filtering is the most successful and commonly used personalized recommendation technology. The open nature of collaborative recommender systems, however, gives malicious users an opportunity to access the system with multiple fictitious identities and insert a number of fake user profiles in an attempt to bias the recommender system in their favor.

    In the proposed work, we explore combining a user trust mechanism with the collaborative filtering algorithm in order to improve the robustness of the recommendation algorithm and ensure the quality of recommendations. We propose a computational model of trust and then a collaborative filtering algorithm based on it. This User Trust Based Collaborative Filtering algorithm is further modified to account for the impact of time on user ratings. The performance of all three algorithms (traditional user-based collaborative filtering, user trust based collaborative filtering, and its time-based variant) is compared in terms of the Mean Absolute Error between the actual ratings and the ratings predicted by each recommender system.

    Keywords: Personalized Recommendation; Recommender System; Collaborative Filtering (CF); User Trust Model

    I. INTRODUCTION

    The rapid development of e-commerce has brought great convenience. But as e-commerce keeps expanding, more and more goods appear on the Internet, which means customers need to spend a lot of time finding what they like or want. Many customers may lose their patience and interest in online shopping because they cannot find things quickly and spend much of their time scanning irrelevant information and products.

    Personalized recommendation systems in e-commerce emerged in this context to solve this problem. Such a system draws on information about customers, analyses their hobbies and interests, proactively recommends personalized products to them, and helps them make purchase decisions.

    Collaborative filtering is the most successful and commonly used personalized recommendation technology. Its basic idea, based on the principle of user interest similarity, is to recommend to the target user goods in which similar customers are also interested.

    The traditional user-based collaborative recommendation algorithm uses the similarity of users' tastes to generate recommendations. This profile-level similarity method is subject to manipulation by malicious users. If malicious users change their attack strategies, in particular by colluding with one another, this method cannot effectively reduce the negative effect. Because there is no notion of trust between users, the system cannot clearly judge whether a given user can be trusted.

    Thus traditional collaborative recommender systems cannot prevent this kind of malicious attack. Ensuring the quality of recommendations in personalized collaborative recommender systems in the face of profile injection attacks has therefore become an important issue [1].

    Recent research on collaborative recommender systems has focused on techniques that can be used to protect the predictive integrity of collaborative recommenders from malicious profile injection attacks.


    Thus the reliability of users (trust) should also be taken into account in the recommendation process. It has also been observed that a user's present interest does not stay the same in the future [2]. Because interest changes with time, the time factor should also be considered when user trust is calculated from user interactions with item ratings. This should improve the performance of the recommender system.

    This paper compares the traditional collaborative filtering algorithm with the user trust based collaborative filtering algorithm. It also compares both with a modified version, the time based user trust collaborative filtering algorithm.

    2012 International Conference on Communication, Information & Computing Technology (ICCICT), Oct. 19-20, Mumbai, India

    978-1-4577-2078-9/12/$26.002011 IEEE 1


    This paper is organised as follows. Section I introduces collaborative filtering and Section II gives a brief overview of related work. Section III discusses the basic steps in traditional collaborative filtering algorithms, and Section IV describes the proposed system with its three modules. Sections V and VI give the evaluation metrics and the methodology adopted, respectively. Section VII presents the experimental results, and the paper ends with a conclusion.

    II. RELATED WORK

    A personalized recommendation system in e-commerce draws on information about customers, analyses their hobbies and interests, proactively recommends personalized products to them, and helps them make purchase decisions.

    There are two kinds of recommender systems: content-based recommender systems and collaborative filtering recommender systems.

    Content-based recommender systems, such as Libra, CiteSeer and WebMate, are suitable for recommending items whose content a machine can analyze automatically. Research shows, however, that content-based recommender systems are not good enough in most cases.

    Collaborative filtering recommender systems emerged as a new method to overcome the shortcomings of content-based recommender systems. As collaborative filtering recommendation algorithms developed, many improved algorithms were proposed.

    There are three types of traditional collaborative filtering:

    User-based collaborative filtering finds neighbors with similar interests or hobbies and then recommends items to the target customer based on those neighbors.

    Model-based collaborative filtering uses customers' historical data to build a model and then uses this model to predict which resources the target user is interested in.

    Item-based collaborative filtering compares the similarity between items. It recommends to the target user items he has not yet visited, based on the items he has visited.
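The item-based variant can be illustrated with a small sketch. It assumes ratings are held as a nested dictionary `ratings[user][item]` and uses cosine similarity between items over the users who rated both. It then predicts a score as the similarity-weighted average of the user's own ratings. The function names and the choice of cosine similarity are illustrative assumptions, since the paper does not specify an item-based method.

```python
import math


def item_cosine(ratings, a, b):
    """Cosine similarity of items a and b over the users who rated both."""
    common = [u for u, r in ratings.items() if a in r and b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in common)
    norm_a = math.sqrt(sum(ratings[u][a] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings[u][b] ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_item_based(ratings, user, item):
    """Similarity-weighted average of the user's ratings on other items.

    Returns None when the user has rated no positively similar item.
    """
    num = den = 0.0
    for other, r in ratings[user].items():
        if other == item:
            continue
        s = item_cosine(ratings, item, other)
        if s > 0:
            num += s * r
            den += s
    return num / den if den else None
```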

    Montaner, M., Lopez, B., de la Rosa, J.L. [4] introduced a trust model into the recommendation algorithm so that users could get recommendations from a trust-building group. The underlying idea was that the trust factor was based on customers' satisfaction with the recommended items and that the trust value could be adjusted dynamically. The method had two drawbacks: trust information among users was lacking at the beginning of recommendation, and building the trust group was inefficient. It was therefore not an effective defense against malicious noise.

    Paolo, M., Meersman, R., Tari, Z. (eds.) [5] proposed a method in which users who accepted the recommendations would evaluate the recommended items. The active user would then receive a rating that stands for the target user's trust in the active user. This trust information was propagated among users who had a trust relation with the accepter.

    John, O., Barry, S. [6] proposed a model whose basic idea was to build a relation between users and recommended items. Under this model, an active user whose recommendations on items had been more accurate received a higher weight in the recommendation process than users with poor records. They assumed that users with a high authenticity value have less intention to deceive others.

    Item-trust recommendation algorithms were more effective at defending against random attacks [7]. However, if malicious users changed their attack strategies, in particular by colluding with one another, these methods could not effectively reduce the negative effect. Because there was no trust between users, the system could not clearly judge which of the users who accepted an item could be trusted.

    To overcome this drawback, in this paper we exploit trust information explicitly expressed by users to improve the robustness of recommender systems.

    III. BASIC STEPS IN CF ALGORITHM

    In daily life, people tend to consult friends or people they trust about unfamiliar problems and make their own choices based on those judgments and opinions. A typical collaborative filtering algorithm is based on the similarity of users' interests. Its basic principle is to find a user's neighbors from historical rating data and then make recommendations to the target user based on the ratings of the nearest neighbors. The process comprises the following steps [4]:

    A. Data representation

    The traditional collaborative filtering algorithm uses the user-item rating matrix R(m,n) to find the target user's nearest-neighbor set. R(m,n) is an m×n matrix whose m rows represent users and whose n columns represent items. The cell in row i and column j holds the score given by user i to item j.
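Below is a minimal sketch of this representation. It assumes the ratings arrive as (user, item, rating) triples, like the rows of the MovieLens data used later. Marking an unrated cell with 0 and the name `build_matrix` are assumptions for illustration.

```python
def build_matrix(triples):
    """Build the m x n user-item matrix R from (user, item, rating) triples.

    Returns (users, items, R) where R[i][j] is the score of users[i] on
    items[j]; 0 marks an unrated cell (an assumption, not from the paper).
    """
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, i, _ in triples})
    u_idx = {u: k for k, u in enumerate(users)}
    i_idx = {i: k for k, i in enumerate(items)}
    R = [[0] * len(items) for _ in users]  # m rows (users) x n columns (items)
    for u, i, r in triples:
        R[u_idx[u]][i_idx[i]] = r
    return users, items, R
```

For example, `build_matrix([(1, 10, 4), (1, 20, 3), (2, 20, 5)])` returns users `[1, 2]`, items `[10, 20]` and `R = [[4, 3], [0, 5]]`.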

    B. Find the nearest neighbor

    Here, we use the Pearson correlation to calculate the similarity of users.

    Let D = {U, I, R} be the data source of a recommender system, where U = {user1, user2, ..., userm} is the set of users of the system and I = {item1, item2, ..., itemn} is the set of items of the system. R is the user rating matrix, in which $r_{i,j} \in R$ represents the rating of user i on item j. The similarity between user u and user n is given by the following Pearson correlation coefficient:

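    $$\mathrm{Sim}(u,n)=\frac{\sum_{c\in I_{un}}(R_{u,c}-\bar{R}_u)(R_{n,c}-\bar{R}_n)}{\sqrt{\sum_{c\in I_{un}}(R_{u,c}-\bar{R}_u)^2}\,\sqrt{\sum_{c\in I_{un}}(R_{n,c}-\bar{R}_n)^2}} \qquad (1)$$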

    where $R_{u,c}$ and $R_{n,c}$ are the ratings of user u and user n on item c, and $\bar{R}_u$ and $\bar{R}_n$ are the average ratings over all items rated by u and n, respectively. $I_{un}$ is the set of items co-rated by user u and user n. Note that the coefficient can be computed only if there are items rated by both users [1].
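Eq. (1) can be sketched as follows, assuming each user's ratings are a dictionary mapping item to rating. Following the text, the means are taken over all items each user rated, while the sums run only over the co-rated set $I_{un}$. Returning `None` when there are no co-rated items and `0.0` when a variance term is zero are assumptions that the paper does not state.

```python
import math


def pearson_sim(ratings_u, ratings_n):
    """Eq. (1): Pearson correlation between users u and n.

    ratings_u, ratings_n: dicts item -> rating. Means use all of each
    user's ratings; the sums run over co-rated items only.
    """
    common = set(ratings_u) & set(ratings_n)
    if not common:
        return None  # undefined: no co-rated items
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_n = sum(ratings_n.values()) / len(ratings_n)
    num = sum((ratings_u[c] - mean_u) * (ratings_n[c] - mean_n) for c in common)
    den_u = math.sqrt(sum((ratings_u[c] - mean_u) ** 2 for c in common))
    den_n = math.sqrt(sum((ratings_n[c] - mean_n) ** 2 for c in common))
    if den_u == 0 or den_n == 0:
        return 0.0  # assumption: zero variance treated as no correlation
    return num / (den_u * den_n)
```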

    C. Produce Predictions

    $$P_{i,c}=\bar{R}_i+\frac{\sum_{j=1}^{N}\mathrm{sim}(i,j)\,(R_{j,c}-\bar{R}_j)}{\sum_{j=1}^{N}\mathrm{sim}(i,j)} \qquad (2)$$

    Here $P_{i,c}$ is the predicted rating of user i on item c and $\bar{R}_i$ is the average score of user i. $\mathrm{sim}(i,j)$ is the similarity coefficient between user i and user j in the nearest-neighbor set. $R_{j,c}$ is the score of user j on item c, $\bar{R}_j$ is user j's average score, and N is the number of nearest neighbors.
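Below is a sketch of Eq. (2). It assumes the caller has already gathered a triple (sim(i,j), $R_{j,c}$, $\bar{R}_j$) for each of the N nearest neighbors j who rated item c. Falling back to the user's mean when the similarity sum is zero is an assumption added to avoid division by zero.

```python
def predict(mean_i, neighbors):
    """Eq. (2): P_ic = mean_i + sum_j sim(i,j)(R_jc - mean_j) / sum_j sim(i,j).

    neighbors: list of (sim_ij, R_jc, mean_j) for the nearest neighbors
    of user i that rated item c.
    """
    den = sum(s for s, _, _ in neighbors)
    if den == 0:
        return mean_i  # assumption: no usable neighbors -> user's mean
    num = sum(s * (r - m) for s, r, m in neighbors)
    return mean_i + num / den
```

With $\bar{R}_i = 3$ and neighbors `[(1.0, 5, 4.0), (0.5, 2, 3.0)]`, the prediction is 3 + 0.5/1.5 ≈ 3.33.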

    IV. PROPOSED SYSTEM

    The proposed system contains three different models, described below:

    A. User-Based CF Model (UBCF)

    This is the basic CF-based recommendation model. Its results serve as the baseline against which the results of the enhanced CF techniques are compared.

    In the user-based CF recommendation system, user rating data are usually described as a user-item matrix $R_{m \times n}$, in which m is the number of users and n is the number of items. $R_{i,j}$ is the score of item j rated by user i and indicates the user's degree of preference for the item.

    The most important step in user-based CF is searching for the target user's neighbors. Usually, similarity computed from the users' common rating data is adopted to measure how similar their interests and hobbies are.
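The neighbor search can be sketched as ranking the other users by similarity and keeping the top N. The similarity function is passed in, for example a Pearson implementation of Eq. (1). Keeping only positively correlated users is an assumption; it also keeps the denominator of Eq. (2) positive.

```python
def nearest_neighbors(target, ratings, sim, n):
    """Return up to n (user, similarity) pairs, most similar first.

    ratings: dict user -> dict item -> rating.
    sim: function of two rating dicts returning a float, or None when the
    users cannot be compared.
    """
    scored = []
    for other, r in ratings.items():
        if other == target:
            continue
        s = sim(ratings[target], r)
        if s is not None and s > 0:  # assumption: positive similarity only
            scored.append((other, s))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```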

    B. User-Trust Based CF Model (UTCF)

    This is an enhanced version of the user-based CF model, proposed to address the limitations of the previous model. In this model, trust between users is used to predict user ratings, and recommendations are provided based on those predictions.

    To model the degree of trust, we assume that the target user can assign a certain trust value to the active user based on the items both users have rated.

    We use two types of trust: direct trust and recommendation trust. The former can be constructed from users' exchange experiences, such as friendship or good views. The latter is the credit awarded to a user by other users who are publicly regarded as reliable [5].

    Direct Trust:

    Let $DT_n^u$ represent the direct trust of the target user u for the active user n. The direct trust value is given by

    $$DT_n^u=\frac{\sum_{i=1}^{k} t_{n,i}^u}{k} \qquad (3)$$

    where $t_{n,i}^u$ represents the trust value of target user u for active user n on item i, and k is the number of items that the active user and the target user have co-rated.
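Eq. (3) is a plain average over the k co-rated items. The paper does not say how the per-item values $t_{n,i}^u$ are computed, so the hypothetical sketch below simply takes them as input. Returning 0 when there are no co-rated items is also an assumption.

```python
def direct_trust(item_trust):
    """Eq. (3): DT_n^u = (sum of t_{n,i}^u over the k co-rated items) / k.

    item_trust: dict co-rated item -> t_{n,i}^u. How the per-item values
    are obtained is not specified in the paper; they are taken as given.
    """
    k = len(item_trust)
    if k == 0:
        return 0.0  # assumption: no co-rated items -> no direct trust
    return sum(item_trust.values()) / k
```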

    Recommendation Trust:

    The recommendation trust is computed with the help of the target user's trust group, whose members have interacted with the active user. Let m be the trust group of the target user, containing the users who have had a reliable interaction with the active user n.

    Let $IT_n^u$ represent the recommendation trust computed by the target user u for the active user n:

    $$IT_n^u=\frac{\sum_{i=1}^{k} T_{m_i}^u\,Cr_{m_i}(n)}{\sum_{i=1}^{k} T_{m_i}^u} \qquad (4)$$

    where $T_{m_i}^u$ is the trust value of the target user u for the reliable user $m_i$ in the set m. $Cr_{m_i}(n)$ is the credibility of the active user n as judged by the reliable user $m_i$ in the trust group, and k here denotes the number of users in m.
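Eq. (4) is a trust-weighted average of the credibility scores reported by the members of the trust group. In the sketch, each member $m_i$ is represented by a hypothetical pair (T, Cr) holding $T_{m_i}^u$ and $Cr_{m_i}(n)$. How these two values are obtained is outside the scope of the snippet.

```python
def recommendation_trust(group):
    """Eq. (4): IT_n^u = sum_i T^u_{m_i} * Cr_{m_i}(n) / sum_i T^u_{m_i}.

    group: list of (T_um_i, Cr_m_i_n) pairs, one per reliable user m_i in
    the target user's trust group who has interacted with active user n.
    """
    total_trust = sum(t for t, _ in group)
    if total_trust == 0:
        return 0.0  # assumption: empty or zero-trust group
    return sum(t * cr for t, cr in group) / total_trust
```

For instance, a group `[(1.0, 0.9), (0.5, 0.3)]` gives (0.9 + 0.15)/1.5 = 0.7.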

    User Trust Value:

    Let $Trust_n^u$ stand for the combined trust value of the target user u for the active user n. It combines two parts. The first is the direct trust $DT_n^u$, the trust value of the target user u for the active user n. The second is the recommendation trust $IT_n^u$, which is derived from the set of users trusted by the target user u. The combined trust value is given by

    $$Trust_n^u=\alpha\,DT_n^u+\beta\,IT_n^u \qquad (5)$$

    where $\alpha$ and $\beta$ are weighting factors that balance the two parts, constrained by $\alpha + \beta = 1$. After the user similarity and trust value are calculated, a compound weight is generated and used to produce the predicted ratings.
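Eq. (5) reduces to a convex combination. The sketch defaults $\alpha$ to 0.7, the value reported as best in Section VII, and derives $\beta = 1 - \alpha$. The paper does not give the formula for the compound weight that merges similarity and trust, so the sketch stops at the combined trust value.

```python
def combined_trust(dt, it, alpha=0.7):
    """Eq. (5): Trust_n^u = alpha * DT_n^u + beta * IT_n^u, beta = 1 - alpha.

    The default alpha = 0.7 follows the value reported in Section VII.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * dt + beta * it
```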

    C. Time Based User Trust CF Model (TBUTCF)

    This method improves the user-trust based algorithm by weighting each user rating according to when it was made. The weight reflects the change of user interest over time and enhances evaluation accuracy [2].

    As a user's interest may change dynamically over time, the user may give different ratings to the same item at different times. The traditional method, however, treats all user ratings equally when searching for the nearest neighbors. This reduces the accuracy of neighbor identification and results in poor-quality recommendations.

    We therefore introduce a time weight that factors in the change of user interest over time and so improves the accuracy of the recommendations.

    The time weight is given by the following equation:

    $$H(t)=\frac{1}{1+e^{(t_c-t_{r(i,j)})/T}} \qquad (6)$$

    where $t_c$ is the current time at which the recommendation is given, $t_{r(i,j)}$ is the time at which user i rated item j, and T is the time span constant [13]. Because $t_c \ge t_{r(i,j)}$, older ratings receive smaller weights.
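Below is a direct transcription of Eq. (6). It assumes times are expressed in days, so that the 45-day time constant reported in Section VII can serve as the default. The weight is 0.5 for a rating made at the current time and shrinks toward 0 as the rating ages.

```python
import math


def time_weight(t_c, t_r, T=45.0):
    """Eq. (6): H(t) = 1 / (1 + e^((t_c - t_r) / T)).

    t_c: current time in days; t_r: time of the rating in days;
    T: time span constant in days (45 per Section VII).
    """
    if T <= 0:
        raise ValueError("T must be positive")
    return 1.0 / (1.0 + math.exp((t_c - t_r) / T))
```

A rating made 45 days ago gets weight 1/(1 + e) ≈ 0.269.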

    We use the time weight to modify the previous method of calculating the similarity between users. The improved similarity measure is described as follows:

    $$R'_{i,j}=R_{i,j}\,H(t) \qquad (7)$$

    $$\mathrm{Sim}^*(u,v)=\frac{\sum_{i\in I_{uv}}(R'_{u,i}-\bar{R}'_u)(R'_{v,i}-\bar{R}'_v)}{\sqrt{\sum_{i\in I_{uv}}(R'_{u,i}-\bar{R}'_u)^2}\,\sqrt{\sum_{i\in I_{uv}}(R'_{v,i}-\bar{R}'_v)^2}} \qquad (8)$$

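    where $R'_{i,j}$ is the revised rating score for item j by user i and $R_{i,j}$ is the original rating score for item j by user i. $\bar{R}'_u$ and $\bar{R}'_v$ are the average revised ratings of users u and v, and $I_{uv}$ is the set of items co-rated by u and v.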

    To calculate the similarity in the user-trust based CF algorithm, we use the improved rating scores derived with the time weight H(t). This gives an improved similarity between users and hence improves the performance of the recommendation system.
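The time-weighted similarity of Eqs. (7) and (8) can be sketched in two steps: revise each rating with H(t), then compute the Pearson coefficient on the revised values. Ratings are assumed to be stored as item -> (rating, time in days). Centering on the mean of the revised ratings is an assumption about the averages in Eq. (8).

```python
import math


def time_weight(t_c, t_r, T=45.0):
    """Eq. (6): H(t) = 1 / (1 + e^((t_c - t_r) / T)), times in days."""
    return 1.0 / (1.0 + math.exp((t_c - t_r) / T))


def revised(ratings, t_c, T=45.0):
    """Eq. (7): turn item -> (rating, time) into item -> R' = R * H(t)."""
    return {i: r * time_weight(t_c, t, T) for i, (r, t) in ratings.items()}


def time_weighted_sim(ratings_u, ratings_v, t_c, T=45.0):
    """Eq. (8): Pearson correlation computed on the revised ratings R'."""
    ru, rv = revised(ratings_u, t_c, T), revised(ratings_v, t_c, T)
    common = set(ru) & set(rv)
    if not common:
        return None  # no co-rated items, similarity undefined
    mean_u = sum(ru.values()) / len(ru)  # assumption: mean of revised ratings
    mean_v = sum(rv.values()) / len(rv)
    num = sum((ru[i] - mean_u) * (rv[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ru[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((rv[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # assumption: zero variance treated as no correlation
    return num / (den_u * den_v)
```

Pearson correlation is unchanged when all of a user's ratings are scaled by the same factor. The time weight therefore only changes the result when a user's ratings were made at different times.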

    The Mean Absolute Error (MAE) of each of the above models is calculated from the experimental results and compared at the end to evaluate their performance.

    V. EVALUATION METRICS

    Accuracy is a major indicator for evaluating recommender system performance. The mean absolute error (MAE), one of the most commonly used metrics, is adopted here to compare the quality of our proposed approach with that of other collaborative filtering methods.

    Suppose the top-N predicted rating set for the active user is $\{p_1, p_2, \ldots, p_N\}$ and the corresponding actual rating set is $\{q_1, q_2, \ldots, q_N\}$. The MAE is then defined as

    $$MAE=\frac{\sum_{i=1}^{N}\lvert p_i-q_i\rvert}{N} \qquad (9)$$

    where N is the number of items recommended to the active user. A lower MAE indicates that the recommendation system predicts user interest more accurately [1].
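Eq. (9) in code, assuming the predicted and actual ratings are supplied as two equally long sequences in the same item order:

```python
def mae(predicted, actual):
    """Eq. (9): MAE = (1/N) * sum_i |p_i - q_i|."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)
```

For example, predictions `[4, 3, 5]` against actual ratings `[5, 3, 3]` give (1 + 0 + 2)/3 = 1.0.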

    VI. METHODOLOGY ADOPTED

    To evaluate the performance of the algorithms, ratings of different items by actual users were needed, so that the algorithms could be evaluated on realistic information.

    The data were therefore taken from the MovieLens web site. The MovieLens dataset was collected by the GroupLens Research Project at the University of Minnesota. The dataset used in this implementation consists of 100,000 ratings (1-5) from 943 users on 1682 movies [17]. Running the case studies on the entire 943-user dataset was not practical, because completing all the case studies proposed in our implementation would have taken an enormous amount of time.

    Hence, to reduce the time required to run the various cases, networks of different sizes, varying from 200 to 400 nodes, were chosen for analysis.

    The records in each sub-dataset were divided into two groups, a training dataset and a test dataset. The training dataset was used as the information known to the algorithm, and the test dataset was used to evaluate it. The share of training data was varied from 70% to 90%. Finally, the predicted ratings obtained by running the algorithms were compared with the actual ratings, and the Mean Absolute Error (% MAE) was calculated to evaluate the performance of each algorithm.
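The split described above can be sketched as a random partition of the rating records. The per-record random split, the fixed seed and the function name are assumptions; the paper states only the 70% to 90% training share.

```python
import random


def split_ratings(triples, train_fraction=0.8, seed=0):
    """Randomly split (user, item, rating) triples into training and test sets.

    train_fraction corresponds to the 70%-90% training share in the text;
    the random per-record split and the fixed seed are assumptions.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

The test triples then supply the actual ratings $q_i$, and the algorithm's predictions for the same (user, item) pairs supply $p_i$ for Eq. (9).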

    VII. EXPERIMENTAL RESULTS

    The implemented user-based collaborative filtering model was tested for various network sizes. For networks of 250, 350 and 450 nodes, 80% of the total dataset was provided as the training dataset.

    For each network size, the performance of the model was evaluated for different numbers of iterations. The results show that the MAE stabilizes after 800 or more iterations.

    Based on these observations, the number of iterations was fixed at 1200. This leaves a sufficient buffer for a fair analysis of the algorithm.

    Similarly, the analysis shows that the UTCF algorithm performs best with $\alpha = 0.7$, and that fixing the time constant T at 45 days gives the optimum result for the TBUTCF algorithm.


    Various case studies were conducted for varied data sparsity and network sizes on all three algorithms, first separately and then simultaneously.

    Combined Analysis:

    After all the required parameters were fixed, the performance of all three algorithms was evaluated simultaneously.

    The parallel evaluation was done for networks containing 250, 300 and 350 nodes. For each network size, the sparsity of the data was varied from 70% to 90% in steps of 10%, and the performance results were captured. Fig. 1 to Fig. 6 show the trends for networks containing 250, 300 and 350 nodes, with training dataset percentages of 70% and 90%.

    A similar trend is observed in the graphs of all the remaining case studies. To draw the final conclusion, we analyzed how changes in network size and data sparsity affect the performance of each algorithm when the same training and test datasets are used. The parallel evaluation of the three algorithms shows that the TBUTCF algorithm performs far better than the other two. Its performance also remained consistently stable across network sizes and levels of data sparsity.

    Figure 1. Combined performance evaluation for a network with 250 nodes and 70% training data

    Figure 2. Combined performance evaluation for a network with 250 nodes and 90% training data

    Figure 3. Combined performance evaluation for a network with 300 nodes and 70% training data

    Figure 4. Combined performance evaluation for a network with 300 nodes and 90% training data

    Figure 5. Combined performance evaluation for a network with 350 nodes and 70% training data

    Figure 6. Combined performance evaluation for a network with 350 nodes and 90% training data

    CONCLUSION

    We have proposed an improvement to the recommendation system based on traditional user-based collaborative filtering. A user trust computation model and a corresponding user trust prediction algorithm were put forward to be combined with the traditional user-based collaborative filtering algorithm. The captured results show that data sparsity has a large impact on the performance of UBCF, whereas network size plays almost no role in it.

    In contrast, neither data sparsity nor variations in network size have a significant impact on the performance of the UTCF algorithm. This indicates that, by incorporating user trust, the algorithm is able to reduce the negative impact of sparse as well as malicious data on the recommender system.

    In the UTCF implementation, ratings given to various items by various users at different times carry the same weight. In practice, however, a user's preference for an item does not remain constant forever; it is bound to change over time. Hence an enhancement to the UTCF algorithm was also proposed to account for the change in user interest over time. For this, a time weight was devised, which signifies the importance to be given to a particular item rating. This time weight was combined with the existing UTCF algorithm to formulate a new algorithm, called the Time Weight Based User Trust Collaborative Filtering (TBUTCF) algorithm.

    The performance of all the algorithms has been evaluated and compared in terms of the MAE (Mean Absolute Error) calculated for different numbers of nearest-neighbor groups. Analysis of the results reveals that introducing user trust into the UBCF algorithm improves the quality of recommendations and provides stability and consistency to the recommender system. In addition, considering the change of user interest over time further enhances the performance of the UTCF algorithm.

    REFERENCES

    [1] Fuzhi Zhang, Long Bai, and Feng Gao, "A User Trust-Based Collaborative Filtering Recommendation Algorithm," ICICS 2009, LNCS 5927, Springer-Verlag Berlin Heidelberg, 2009, pp. 411-424.

    [2] Zhimin Chen, Yi Jiang, Yao Zhao, "A Collaborative Filtering Recommendation Algorithm Based on User Interest Change and Trust Evaluation," International Journal of Digital Content Technology and its Applications, Volume 4, Number 9, December 2010, pp. 106-113.

    [3] J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen, "Collaborative Filtering Recommender Systems," School of Electrical Engineering and Computer Science, Oregon State University, 2008.

    [4] Montaner, M., Lopez, B., de la Rosa, J.L., "Developing trust in recommender agents," 1st International Conference on Autonomous Agents, ACM Press, Bologna (2002), pp. 304-305.

    [5] Paolo, M., Meersman, R., Tari, Z. (eds.), "Trust-aware collaborative filtering for recommender systems," LNCS, vol. 3290, Springer, Heidelberg (2004), pp. 492-508.

    [6] John, O., Barry, S., "Trust in Recommender Systems," 10th International Conference on Intelligent User Interfaces, ACM Press, New York (2005), pp. 167-174.

    [7] Xinyi Bu, Xiujuan He, "An Optimized Trust Factor Based Collaborative Filtering Recommendation Algorithm in E-Commerce," International Conference of Information Science and Management Engineering, IEEE 2010, pp. 468-471.

    [8] Yanhong Guo, Xuefen Cheng, Dahai Dong, Chunyu Luo, Rishuang Wang, "An Improved Collaborative Filtering Algorithm Based on Trust in E-Commerce Recommendation Systems," research supported by a grant from the Chinese National Science Foundation Key Project, IEEE 2010, pp. 1-4.

    [9] Xiao Cheng Chen, Run Jia Liu, Hui You Chang, "Research of Collaborative Filtering Recommendation Algorithm Based on Trust Propagation Model," International Conference on Computer Application and System Modeling (ICCASM 2010), IEEE 2010, pp. V4-177-V4-183.

    [10] Jia Yubo, Cai Hao, Huang Chengwei, "A Collaborative Filtering Recommendation Algorithm Based on User Trust Model," First International Conference on Networking and Distributed Computing, IEEE 2010, pp. 213-217.

    [11] Yang Huai-Zhen, Li Lei, "An Enhanced Collaborative Filtering Algorithm Based on Time Weight," International Symposium on Information Engineering and Electronic Commerce, IEEE 2009, pp. 262-265.

    [12] Liang He, Faqing Wu, "A Time-context-based Collaborative Filtering Algorithm," International Conference on Granular Computing, IEEE 2010, pp. 209-213.

    [13] Qian Wang, Min Sun, Cong Xu, "An Improved User-model-based Collaborative Filtering Algorithm," Journal of Information & Computational Science 8:10, 2011, pp. 1837-1846.

    [14] François Fouss, Marco Saerens, "Evaluating performance of recommender systems: An experimental comparison," IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2008, pp. 736-738.

    [15] Stephen Naicken, Anirban Basu, Barnaby Livingston and Sethalat Rodhetbhai, "A Survey of Peer-to-Peer Network Simulators," stephennaicken.com/wp-content/.../paper-pgnet2006_p2psimsurvey.pdf

    [16] Bruno DEFUDE, "P2P simulation with PeerSim," ASR option, January/February 2007.

    [17] GroupLens Research, Datasets.
