[acm press the 15th symposium - lisboa, portugal (2011.09.21-2011.09.23)] proceedings of the 15th...

Mining Semantic Data for Solving First-rater and Cold-start Problems in Recommender Systems

María N. Moreno*, Saddys Segrera, Vivian F. López, María Dolores Muñoz and Ángel Luis Sánchez

Department of Computing and Automatic, University of Salamanca, Plaza de los Caídos s/n, 37008 Salamanca

+34 923 294400

*[email protected]

ABSTRACT Recommender systems have become very popular in recent years, mainly on e-commerce sites, although they are gaining importance in other areas such as e-learning, tourism and news pages. These systems are endowed with intelligent mechanisms to personalize recommendations about products or services. However, they present some serious drawbacks that affect user satisfaction. The first-rater and cold-start problems are two important drawbacks that arise, respectively, when new products or new users are introduced into the system. The lack of ratings for these products or from these users prevents the system from making recommendations. Nowadays, traditional collaborative filtering methods are being replaced by web mining techniques in order to deal with scalability and performance problems, but the first-rater and cold-start problems require a different strategy. In this work, we propose a methodology that combines data mining techniques with semantic data in order to overcome these two important shortcomings.

Categories and Subject Descriptors H.2.8 [Database Applications]: Data mining.

General Terms Algorithms, Design, Reliability, Experimentation.

Keywords Recommender Systems, Semantic Web Mining, Cold-Start, First-Rater, Associative Classification.

1. INTRODUCTION Most current e-commerce systems are endowed with some kind of procedure for helping users find the products or services they are interested in. In general terms, recommender systems provide users with intelligent mechanisms to search for items; they can therefore be considered a personalized shop. Numerous recommender methods have been proposed in recent years; however, the recommendations provided by this type of system still have some important drawbacks. For instance, traditional collaborative filtering methods using nearest neighbor techniques present severe performance and scalability problems due to the high computation time required for finding the neighbors, which grows in proportion to the number of users and products in the system. Data mining methods do not present this drawback, since the recommender model is already built when the user accesses the system.

On the other hand, the low precision caused by the sparsity of the data is another important drawback. Sparsity arises because the number of product evaluations (ratings) provided by the users is smaller than the number required for making recommendations. Data mining techniques do not need as many ratings as traditional collaborative filtering methods, but their precision is also affected by sparsity. Although data mining methods help to relieve some of the limitations of recommender systems, by themselves they are not able to solve other important ones, such as the first-rater and cold-start problems.
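As a rough illustration of what sparsity means in practice, the following sketch computes the fraction of user-item pairs that carry no rating. The function name and the toy numbers are assumptions made for this example and are not taken from the study.

```python
def sparsity(num_ratings: int, num_users: int, num_items: int) -> float:
    """Fraction of user-item pairs for which no rating exists."""
    possible = num_users * num_items
    if possible == 0:
        raise ValueError("need at least one user and one item")
    return 1.0 - num_ratings / possible


# Hypothetical toy numbers: 4 users, 5 items, 6 ratings collected.
toy_sparsity = sparsity(num_ratings=6, num_users=4, num_items=5)
print(f"{toy_sparsity:.2f}")
```

The closer this value is to 1, the fewer ratings are available for finding neighbors or inducing a model.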

In this paper, an approach that combines web mining and the semantic web is proposed in order to overcome these problems in the context of a movie recommender system, although it can be easily extended to other application domains. The recommender model is built by applying a data mining algorithm to semantically enriched data.

2. BACKGROUND Recommendation methods can be classified into two main categories [3]: collaborative filtering and content-based approaches. In the content-based approach, text documents are recommended by comparing their contents with user profiles [3]. Collaborative filtering techniques predict product preferences for a user based on the opinions of other users. The opinions can be obtained explicitly from the users as a rating score or by using some implicit measures such as purchase records or timing logs [6]. Currently there are two approaches to collaborative filtering: memory-based (user-based) and model-based (item-based) algorithms. Memory-based algorithms, also known as nearest-neighbor methods, were the earliest used [5]. They process the items of all users by means of statistical techniques in order to find users with similar preferences (neighbors). The advantage of these algorithms is the quick incorporation of the most recent information, but they have the drawback that the search for neighbors in large databases is slow [7]. Model-based collaborative filtering algorithms use data mining techniques in order to develop a model of user ratings, which is used to predict user preferences.
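The memory-based approach can be sketched as follows. This is a minimal illustration, assuming cosine similarity over co-rated items and a similarity-weighted average of the neighbors' scores; the toy ratings and the function names are hypothetical, and other similarity measures are equally possible.

```python
from math import sqrt
from typing import Optional

# Hypothetical toy ratings: user -> {item: score on a 1-5 scale}.
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 5, "D": 4},
    "u3": {"A": 1, "B": 5, "D": 2},
}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(user: str, item: str, k: int = 2) -> Optional[float]:
    """Similarity-weighted mean of the k nearest neighbors' scores for item."""
    # Every other user is scanned at prediction time, which is why the
    # cost grows with the number of users in the system.
    neighbors = [
        (cosine(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    total_similarity = sum(sim for sim, _ in top)
    if total_similarity == 0:
        return None  # nobody comparable has rated the item
    return sum(sim * score for sim, score in top) / total_similarity


print(predict("u1", "D"))  # item rated by both neighbors
print(predict("u1", "E"))  # unrated item: no prediction is possible
```

The second call returns `None` because no user has rated item `E`, which is the situation a purely rating-based method cannot handle.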

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IDEAS11 2011, September 21-23, Lisbon [Portugal] Editors: Bernardino, Cruz, Desai Copyright ©2011 ACM 978-1-4503-0627-0/11/09 $10.00

Collaborative filtering, especially the memory-based approach, has some limitations in the e-commerce environment. Sparsity and scalability are serious weaknesses that can lead to poor recommendations [1]. Sparsity arises because the number of ratings needed for prediction is greater than the number of ratings obtained, since collaborative filtering usually requires users to express their personal preferences for products explicitly. The second limitation is related to performance problems in the search for neighbors in memory-based algorithms. These drawbacks may be minimized by means of data mining methods; however, there are other shortcomings that may occur even with such methods. The first-rater (or early-rater) problem arises when it is impossible to offer recommendations about an item that has just been incorporated into the system and, therefore, has few or no evaluations from users. Analogously, this drawback also occurs when a new user joins the system: since there is no information about the user, it is impossible to determine his or her behavior in order to provide recommendations. This variant of the first-rater problem is also referred to as the cold-start problem. Semantic Web Mining can be used to address these problems. The taxonomic abstraction provided by an ontology allows patterns to be induced at a more abstract level; that is, regularities can be found between categories of products instead of between specific products. These patterns can be used in recommender systems for recommending new products that have not yet been rated by the users [2]. This is a way of dealing with the first-rater problem. The cold-start problem can be solved in a similar way.
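The idea of taxonomic abstraction can be illustrated with a small sketch. It assumes a flat movie-to-genre mapping as a stand-in for an ontology and a threshold of 3 on the mean score per genre; the data, the names and the threshold are hypothetical choices made for this example.

```python
from collections import defaultdict

# Hypothetical taxonomy: movie -> genre (a stand-in for an ontology).
genre_of = {"m1": "Comedy", "m2": "Comedy", "m3": "Drama", "m_new": "Comedy"}

# Hypothetical rating history: (user, movie, score). "m_new" has no ratings.
history = [("u1", "m1", 5), ("u1", "m2", 4), ("u1", "m3", 2)]


def genre_means(user: str) -> dict:
    """Mean score the user gave per genre: a pattern at category level."""
    scores = defaultdict(list)
    for rated_by, movie, score in history:
        if rated_by == user:
            scores[genre_of[movie]].append(score)
    return {genre: sum(vals) / len(vals) for genre, vals in scores.items()}


def would_recommend(user: str, movie: str, threshold: float = 3.0) -> bool:
    """Decide from the movie's genre, so the movie itself needs no ratings."""
    return genre_means(user).get(genre_of[movie], 0.0) >= threshold


print(genre_means("u1"))
print(would_recommend("u1", "m_new"))
```

Because the decision is taken at genre level, the unrated movie `m_new` can still be recommended to a user whose history shows a preference for its category.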

3. RECOMMENDATION FRAMEWORK Although the framework is valid for any recommendation scenario, overcoming the first-rater and cold-start problems is the main target of this proposal. The predictive models built by means of data mining algorithms are the basis of the procedure. They are induced at two different levels: high level models relate semantic information about products and users, and low level models relate specific products and users with preference ratings. The low level models are used in classical recommendation approaches, while the high level models are specifically defined for the first-rater and cold-start problems and require the definition of a domain-specific ontology. The high level model provides recommendations by associating user profiles with categories of products. In this way, the first-rater and cold-start problems are avoided, since neither ratings from the new user nor ratings for the new product are required. When a user who has rated products (an old user) asks for recommendations, both the low level and the high level models are checked in order to find, respectively, rated and unrated products to recommend. This is how the first-rater problem is overcome in the case of old users.
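One possible reading of this two-level procedure is sketched below. Both models are represented as plain lookup tables purely for illustration; in the framework they would be induced by data mining algorithms. All names, profile attributes and table contents are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    gender: str
    age_group: str
    has_ratings: bool  # False for a new (cold-start) user


# Hypothetical high level model: (gender, age_group, genre) -> recommend?
HIGH_LEVEL = {("F", "18-24", "Comedy"): True, ("F", "18-24", "Horror"): False}
# Hypothetical low level model: (user_id, movie_id) -> recommend?
LOW_LEVEL = {("u1", "m1"): True, ("u1", "m2"): False}
# Hypothetical catalogue: movie_id -> genre. "m9" is new and unrated.
CATALOGUE = {"m1": "Comedy", "m2": "Horror", "m9": "Comedy"}


def recommend(user: User) -> list:
    """Use the low level model where it applies, else the high level one."""
    picks = []
    for movie, genre in CATALOGUE.items():
        if user.has_ratings and (user.user_id, movie) in LOW_LEVEL:
            accepted = LOW_LEVEL[(user.user_id, movie)]
        else:
            # New user or new product: fall back on profile and category.
            profile_key = (user.gender, user.age_group, genre)
            accepted = HIGH_LEVEL.get(profile_key, False)
        if accepted:
            picks.append(movie)
    return picks


old_user = User("u1", "F", "18-24", has_ratings=True)
new_user = User("u7", "F", "18-24", has_ratings=False)
print(recommend(old_user))
print(recommend(new_user))
```

In this sketch the new movie `m9` reaches the old user through the high level model, and the new user receives recommendations without having rated anything.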

4. CASE STUDY In the following study, carried out with data from the MovieLens system (http://movielens.umn.edu), several data mining algorithms were applied in order to select the best classifier for this application domain. Data from MovieLens were classified and annotated with semantic metadata according to a domain-specific ontology. The proposed ontology for our application domain must take into account the available data: user (id_user, gender, age, occupation, zip), movie (id_movie, title, genre) and rating (id_user, id_movie, score, rating_bin). The database used for building the high level models by means of web mining methods was designed following the structure given by the defined ontology. The next step focused on analyzing the precision of the high level models, in which there is a loss of information with regard to the low level models. As noted above, one way to deal with the sparsity problem is to apply methods that are only slightly sensitive to data sparsity; we therefore tried associative classification, given the better behavior of these methods in sparse data contexts [4]. Consequently, more reliable recommendations can be obtained with a smaller number of ratings. The associative classification algorithms studied were CBA, CMAR, FOIL and CPAR. They were compared with non-associative classification methods. The class attribute was rating_bin, which takes the values “Recommended” (rating 3-5) and “Not recommended” (rating 1, 2). The results, shown in Figure 1, confirm the better performance of associative classifiers, excluding CPAR, compared to traditional classifiers.

Figure 1. Precision obtained with different algorithms
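To make the class attribute and the notion of a class association rule concrete, the sketch below bins scores into rating_bin and keeps rules of the form (gender, genre) -> label that meet support and confidence thresholds. It is a deliberately simplified illustration and not an implementation of CBA, CMAR, FOIL or CPAR; the records and the thresholds are hypothetical and are not MovieLens data.

```python
from collections import Counter


def rating_bin(score: int) -> str:
    """Map a 1-5 score to the class label used in the case study."""
    return "Recommended" if score >= 3 else "Not recommended"


# Hypothetical records: (gender, genre, score).
records = [
    ("F", "Comedy", 5), ("F", "Comedy", 4), ("F", "Horror", 1),
    ("M", "Horror", 4), ("M", "Comedy", 2), ("F", "Comedy", 3),
]


def class_rules(rows, min_support=0.3, min_confidence=0.8):
    """Return {(gender, genre): (label, support, confidence)}."""
    total = len(rows)
    antecedent_counts = Counter((gender, genre) for gender, genre, _ in rows)
    joint_counts = Counter(
        ((gender, genre), rating_bin(score)) for gender, genre, score in rows
    )
    rules = {}
    for (antecedent, label), count in joint_counts.items():
        support = count / total                          # rule frequency
        confidence = count / antecedent_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules[antecedent] = (label, support, confidence)
    return rules


print(class_rules(records))
```

Only the antecedent ("F", "Comedy") is frequent and consistent enough to yield a rule here; a new comedy could then be classified for a matching profile without any ratings of its own.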

5. CONCLUSIONS In this work, a recommendation framework specifically aimed at overcoming critical drawbacks of recommender systems is proposed. The proposal consists of combining web mining methods and domain-specific ontologies in order to induce models at two abstraction levels. High level models are used for recommending unrated products or for making recommendations to new users, thus avoiding the first-rater and cold-start problems. In addition, the framework also addresses the scalability and sparsity drawbacks. The off-line model induction avoids scalability problems at recommendation time, and the use of associative classification methods provides a way to deal with the sparsity problem, because this kind of method behaves better with sparse datasets.

6. REFERENCES [1] Cho, H.C., Kim, J.K., Kim, S.H. 2002. A personalized recommender system based on web usage mining and decision tree induction. Expert Systems with Applications, 23, 329-342.

[2] Huang, Y. and Bian, L. 2009. A Bayesian network and analytic hierarchy process based personalized recommendations for tourist attraction over the Internet. Expert Systems with Applications, 36, 933-943.

[3] Lee, C.H., Kim, Y.H., Rhee, P.K. 2001. Web personalization expert with combining collaborative filtering and association rule mining technique. Expert Systems with Applications, 21, 131-137.

[4] Moreno, M.N., Pinho, J., López, V. and Polo, M.J. 2010. Multivariate Discretization for Associative Classification in a Sparse Data Application Domain. Lecture Notes in Artificial Intelligence, v. 6076, Springer, 104-111.

[5] Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P. and Riedl, J. 1994. GroupLens: An open architecture for collaborative filtering of netnews. Proc. of ACM Conference on Computer Supported Cooperative Work, 175-186.

[6] Sarwar, B., Karypis, G., Konstan, J., Riedl, J. 2001. Item-based Collaborative Filtering Recommendation Algorithms. Proceedings of the Tenth International World Wide Web Conference, 285-295.

[7] Schafer, J.B., Konstan, J.A. and Riedl, J. 2001. E-Commerce Recommendation Applications. Data Mining and Knowledge Discovery, 5, 115-153.
