Estimating the Magic Barrier of Recommender Systems: A User Study


Upload: alan-said

Post on 08-May-2015


DESCRIPTION

Recommender systems are commonly evaluated by trying to predict known, withheld ratings for a set of users. Measures such as the root-mean-square error are used to estimate the quality of the recommender algorithms. This process does not, however, acknowledge the inherent rating inconsistencies of users. In this paper we present the first results from a noise measurement user study for estimating the magic barrier of recommender systems, conducted on a commercial movie recommendation community. The magic barrier is the expected squared error of the optimal recommendation algorithm, or the lowest error we can expect from any recommendation algorithm. Our results show that the barrier can be estimated by collecting the opinions of users on already rated items.

TRANSCRIPT

Page 1: Estimating the Magic Barrier of Recommender Systems: A User Study

The Magic Barrier

Root-mean-square error (RMSE) is commonly used to evaluate the accuracy of a rating function $f$ on a set $R$ of ratings $r_{ui}$:

$$E(f \mid R) = \sqrt{\frac{1}{|R|} \sum_{r_{ui} \in R} \left(f(u,i) - r_{ui}\right)^2}$$

Given new opinions, we can express the error between an original rating $r_{ui}$ and a new opinion $o_{ui}$ on item $i$ by user $u$ as

$$\varepsilon_{ui} = r_{ui} - o_{ui}$$

We can suppose there is an unknown true rating function $f^*$ that knows the true opinions of each user on each item. We can derive an estimate of the RMSE of $f^*$ as

$$\hat{E}(f^* \mid R) = \sqrt{\frac{1}{|R|} \sum_{r_{ui} \in R} \varepsilon_{ui}^2}$$

which is equal to the standard deviation of $\varepsilon$ provided the errors $\varepsilon_{ui}$ have zero mean.

Rating functions with a lower RMSE than $f^*$ may exist, but these functions tend to overfit, and their lower RMSE does not mean they perform better: they perform within the boundaries of the magic barrier.
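The estimate described above can be sketched in Python as the root of the mean squared difference between each original rating and the corresponding new opinion. This is a minimal sketch, assuming the rating-opinion pairs are available as plain `(rating, opinion)` tuples; the function name and the toy data are illustrative and not part of the study.

```python
import math


def estimate_magic_barrier(pairs):
    """Return the root-mean-square difference between ratings and opinions.

    `pairs` is an iterable of (rating, opinion) tuples, one per user-item pair.
    """
    diffs = [rating - opinion for rating, opinion in pairs]
    if not diffs:
        raise ValueError("need at least one rating-opinion pair")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical toy data on a 0-10 scale (not taken from the study).
toy_pairs = [(8.0, 7.0), (6.0, 6.0), (9.0, 10.0), (4.0, 6.0)]
print(f"estimated magic barrier: {estimate_magic_barrier(toy_pairs):.3f}")
# prints: estimated magic barrier: 1.225
```

With real data, each tuple would hold a user's stored rating of a movie and the opinion that user gave on the same movie later.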

Evaluating Recommender Systems

Recommender systems evaluation generally measures the quality of the algorithm based on some accuracy metric, e.g. precision, or error measure, e.g. root-mean-square error. However, these measures neglect the inherent inconsistencies users – people – are afflicted by.

These are the first results from a noise measurement user study for estimating the magic barrier of recommender systems conducted on a commercial movie recommendation community.

The magic barrier is the expected squared error of the optimal recommendation algorithm, or, the lowest error we can expect from any recommendation algorithm. Our results show that the barrier can be estimated by collecting the opinions of users on already rated items.

Technische Universität Berlin
{alan, jain, narr, till, sahin, scheel}@dai-lab.de
www.dai-lab.de

Alan Said, Brijnesh J. Jain, Sascha Narr, Till Plumbaum, Sahin Albayrak, Christian Scheel


Results & Conclusion

We presented a study on the inherent noise found in rating values given by users in a commercial recommendation system.

Our assumption that the magic barrier of recommender systems can be better assessed by noise estimation seems to hold. We presented an early model for the magic barrier and the level of accuracy a recommender system can achieve without overfitting on the noise in the data. Estimating the magic barrier of a system makes it possible to assess whether the system can be further improved.

We suggest that, in order to estimate a system’s prediction quality, opinion collection for magic barrier estimation should be conducted regularly.
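One way to use such an estimate is to compare a recommender's RMSE on held-out ratings against the barrier. The sketch below assumes a barrier of 1.201 (the value reported for all opinions on this poster) and uses hypothetical predictions; the helper names `rmse` and `can_still_improve` are illustrative only.

```python
import math


def rmse(predictions, ratings):
    """Root-mean-square error between predicted and observed ratings."""
    if len(predictions) != len(ratings) or not ratings:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predictions, ratings)]
    return math.sqrt(sum(squared) / len(squared))


def can_still_improve(system_rmse, barrier):
    """True if the system's error is still above the estimated barrier."""
    return system_rmse > barrier


BARRIER = 1.201  # "all" value reported on the poster

# Hypothetical predictions and withheld ratings (not from the study).
predicted = [7.5, 5.0, 9.0, 3.0]
observed = [8.0, 6.0, 9.0, 5.0]

error = rmse(predicted, observed)
if can_still_improve(error, BARRIER):
    print(f"RMSE {error:.3f} is above the barrier; improvement may be possible")
else:
    print(f"RMSE {error:.3f} is within the barrier; a lower error may reflect overfitting to noise")
```

Because the barrier is itself an estimate, a comparison like this is only as reliable as the opinion sample behind it, which is one reason for collecting opinions regularly.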

The User Study

We asked users of www.moviepilot.de to provide new ratings (so-called opinions) for movies they had rated in the past. We specifically asked for opinions and not re-ratings so as not to suggest a change of heart.

The user interface for collecting opinions was created so that it resembled the regular rating page of moviepilot in order to create a feeling of familiarity for the users and lower rating inconsistencies related to unfamiliarity with the system.

Further Reading

Detailed explanation of the magic barrier

Users and Noise: The Magic Barrier of Recommender Systems [UMAP2012, Said et al.]

Paper version of the poster

Standard deviation of the error, where "all" refers to the deviation over all opinions; "r ≥ avg" and "r < avg" refer to the deviation over ratings at or above average and below average, respectively. Moviepilot’s rating scale is 0-10 stars. A magic barrier of ±1.2 means that rating prediction errors within that boundary can be attributed to users’ rating inconsistencies.
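The three values in the caption could be computed along the following lines. This is a sketch under stated assumptions: "avg" is taken to be the mean of the original ratings in the sample (the poster does not say whether a global or per-user average was used), and the deviation is computed as the root mean square of the rating-opinion differences, which equals the standard deviation when the differences have zero mean. The function names and toy data are hypothetical.

```python
import math
from statistics import mean


def rms(values):
    """Root mean square of a non-empty list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def barrier_by_group(pairs):
    """Split (rating, opinion) pairs at the mean rating and return the RMS
    rating-opinion difference for all pairs and for each half."""
    pairs = list(pairs)
    avg = mean(rating for rating, _ in pairs)  # assumption: sample-wide mean
    groups = {"all": [], "r >= avg": [], "r < avg": []}
    for rating, opinion in pairs:
        diff = rating - opinion
        groups["all"].append(diff)
        groups["r >= avg" if rating >= avg else "r < avg"].append(diff)
    # Skip empty groups so rms() never divides by zero.
    return {name: rms(diffs) for name, diffs in groups.items() if diffs}


# Hypothetical toy data on a 0-10 scale (not taken from the study).
toy_pairs = [(8, 7), (6, 6), (9, 10), (4, 6)]
for name, value in barrier_by_group(toy_pairs).items():
    print(f"{name}: {value:.3f}")
```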

The "rate new movies" page on moviepilot.de

Our interface for collecting new opinions

Calculated Magic Barrier

Data

The study ran in April and May 2011 and resulted in a dataset containing 6,299 opinions on 2,329 movies by 306 users, i.e. 6,299 rating-opinion pairs. All participating users had rated at least 50 movies on moviepilot.de and gave at least 20 new opinions.
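The participation criteria above amount to a simple filter over users. The sketch below assumes a hypothetical data layout, a mapping from user to number of past ratings and a list of `(user, item, rating, opinion)` records; none of these names come from the study.

```python
from collections import Counter

MIN_RATINGS = 50   # past ratings required on the site
MIN_OPINIONS = 20  # new opinions required in the study


def eligible_users(rating_counts, opinions):
    """Return the set of users meeting both participation thresholds.

    rating_counts: dict mapping user -> number of past ratings
    opinions: iterable of (user, item, rating, opinion) records
    """
    opinion_counts = Counter(user for user, *_ in opinions)
    return {
        user
        for user, count in opinion_counts.items()
        if count >= MIN_OPINIONS and rating_counts.get(user, 0) >= MIN_RATINGS
    }


# Hypothetical example: only user "a" meets both thresholds.
rating_counts = {"a": 60, "b": 40, "c": 55}
opinions = (
    [("a", item, 5, 5) for item in range(20)]
    + [("b", item, 5, 5) for item in range(25)]
    + [("c", item, 5, 5) for item in range(19)]
)
print(eligible_users(rating_counts, opinions))
```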

[Bar chart: standard deviation of the error. all: 1.201; r ≥ avg: 1.043; r < avg: 1.417]

SIGIR 2012 – Portland, OR, USA