Estimating the Magic Barrier of Recommender Systems: A User Study
DESCRIPTION
Recommender systems are commonly evaluated by trying to predict known, withheld ratings for a set of users. Measures such as the root-mean-square error (RMSE) are used to estimate the quality of the recommender algorithms. However, this process does not acknowledge the inherent rating inconsistencies of users. In this paper we present the first results from a noise measurement user study for estimating the magic barrier of recommender systems, conducted on a commercial movie recommendation community. The magic barrier is the expected squared error of the optimal recommendation algorithm, or the lowest error we can expect from any recommendation algorithm. Our results show that the barrier can be estimated by collecting the opinions of users on already rated items.
TRANSCRIPT
The Magic Barrier
Root‐mean‐square error (RMSE) is commonly used for accuracy evaluation of a rating function on a set of ratings.
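As a reminder of the standard definition, the RMSE can be written as follows. The symbols are labels chosen here for illustration and may differ from the poster's notation: f is the rating function, R the set of user-item pairs with known ratings, and r_{ui} the rating given by user u to item i.

```latex
\mathrm{RMSE}(f \mid R) = \sqrt{\frac{1}{|R|} \sum_{(u,i) \in R} \bigl(f(u,i) - r_{ui}\bigr)^2}
```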
Given new opinions, we can express the error between an original rating and a new opinion on item i by user u as the difference between the two values.
We can suppose there is an unknown true rating function that knows the true opinion of each user on each item. We can derive an estimate of the RMSE of this true rating function, which is equal to the standard deviation of the error between ratings and opinions.
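One way to write this down, as a sketch only: the symbols r_{ui} (original rating), o_{ui} (new opinion), P (the set of rating-opinion pairs) and \hat{B} (the barrier estimate) are labels chosen here for illustration and may differ from the poster's notation. The sign convention of the error is likewise an assumption; it does not affect the standard deviation.

```latex
\varepsilon_{ui} = r_{ui} - o_{ui},
\qquad
\bar{\varepsilon} = \frac{1}{|P|} \sum_{(u,i) \in P} \varepsilon_{ui},
\qquad
\hat{B} = \sqrt{\frac{1}{|P|} \sum_{(u,i) \in P} \bigl(\varepsilon_{ui} - \bar{\varepsilon}\bigr)^2}
```

When the mean error \bar{\varepsilon} is zero, this standard deviation equals the root-mean-square of the errors.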
It is possible that there are rating functions with a lower RMSE than the true rating function. These functions tend to overfit, and their lower RMSE does not mean they perform better: they perform within the boundaries of the magic barrier.
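A minimal Python sketch of this estimate, assuming the data is available as (rating, opinion) pairs. The function name `estimate_magic_barrier` is hypothetical, and using the population standard deviation (rather than the sample one) is an assumption, not something the poster specifies.

```python
from statistics import pstdev


def estimate_magic_barrier(pairs):
    """Estimate the magic barrier from (rating, opinion) pairs.

    Assumption: the barrier is taken as the population standard
    deviation of the rating-opinion errors.
    """
    # Error between the original rating and the newly collected opinion.
    errors = [rating - opinion for rating, opinion in pairs]
    if not errors:
        raise ValueError("at least one rating-opinion pair is required")
    return pstdev(errors)


if __name__ == "__main__":
    # Made-up pairs on a 0-10 scale, for illustration only.
    example_pairs = [(8, 7), (5, 6), (9, 9), (3, 5)]
    print(estimate_magic_barrier(example_pairs))
```

A prediction error smaller than the returned value would, under this reading, fall within the users' own rating inconsistency.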
Evaluating Recommender Systems
Recommender systems evaluation generally measures the quality of the algorithm based on some accuracy metric, e.g. precision, or error measure, e.g. root‐mean‐square error. However, these measures neglect the inherent inconsistencies users – people – are afflicted by.
These are the first results from a noise measurement user study for estimating the magic barrier of recommender systems conducted on a commercial movie recommendation community.
The magic barrier is the expected squared error of the optimal recommendation algorithm, or the lowest error we can expect from any recommendation algorithm. Our results show that the barrier can be estimated by collecting the opinions of users on already rated items.
Technische Universität Berlin {alan, jain, narr, till, sahin, scheel}@dai‐lab.de www.dai‐lab.de
Alan Said, Brijnesh J. Jain, Sascha Narr, Till Plumbaum, Sahin Albayrak, Christian Scheel
Estimating the Magic Barrier of Recommender Systems: A User Study
Results & Conclusion
We presented a study on the inherent noise found in rating values given by users in a commercial recommendation system.
Our assumption that the magic barrier of recommender systems can be better assessed by noise estimation seems to hold. We presented an early model for the magic barrier and the level of accuracy a recommender system can achieve without over‐fitting on the noise in the data. Estimating the magic barrier of a system makes it possible to assess whether the system can be further improved or not.
We suggest that, in order to estimate a system’s prediction quality, opinion collection for magic barrier estimation should be conducted regularly.
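The comparison between a system's error and its magic barrier could be sketched as follows. This is an illustration under stated assumptions: the helper names `rmse` and `headroom` are hypothetical, and the barrier value is assumed to have been estimated separately from collected opinions.

```python
import math


def rmse(predictions, ratings):
    """Root-mean-square error between predicted and observed ratings."""
    if len(predictions) != len(ratings) or not ratings:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predictions, ratings)]
    return math.sqrt(sum(squared) / len(squared))


def headroom(model_rmse, magic_barrier):
    """Difference between a model's RMSE and the estimated barrier.

    Positive: the error is above the barrier, so improvement may be possible.
    Zero or negative: the error is already within the users' rating noise,
    so further gains may only reflect over-fitting.
    """
    return model_rmse - magic_barrier


if __name__ == "__main__":
    # Made-up predictions and ratings, for illustration only.
    error = rmse([7, 5], [8, 3])
    print(headroom(error, 1.2))
```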
The User Study
We asked users of www.moviepilot.de to provide new ratings (so‐called opinions) for movies they had rated in the past. We specifically asked for opinions and not re‐ratings so as not to suggest a change of heart.
The user interface for collecting opinions was created so that it resembled the regular rating page of moviepilot in order to create a feeling of familiarity for the users and lower rating inconsistencies related to unfamiliarity with the system.
Further Reading
Detailed explanation of the magic barrier:
Users and Noise: The Magic Barrier of Recommender Systems [UMAP 2012, Said et al.]
Paper version of the poster
Standard deviation of the error, where “all” refers to the deviation over all opinions, and “r ≥ avg” and “r < avg” refer to the deviation over all ratings above and below average, respectively. Moviepilot’s rating scale is 0‐10 stars. A magic barrier of ±1.2 means that rating prediction errors within that boundary are part of users’ rating inconsistencies.
The “rate new movies” page on moviepilot.de
Our interface for collecting new opinions
Calculated Magic Barrier
Data
The study ran in April and May 2011 and resulted in a dataset containing 6,299 opinions on 2,329 movies by 306 users, i.e. 6,299 rating‐opinion pairs. All participating users had rated at least 50 movies on moviepilot.de and gave at least 20 new opinions.
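The inclusion criteria can be expressed as a simple filter. The sketch below assumes two hypothetical dictionaries mapping user ids to counts; the function name and data layout are illustrative and not taken from the study.

```python
def eligible_users(rating_counts, opinion_counts, min_ratings=50, min_opinions=20):
    """Return the users meeting both inclusion thresholds.

    rating_counts:  user id -> number of movies rated on the site
    opinion_counts: user id -> number of new opinions given in the study
    """
    return {
        user
        for user, n_opinions in opinion_counts.items()
        if n_opinions >= min_opinions
        and rating_counts.get(user, 0) >= min_ratings
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    ratings = {"a": 120, "b": 49, "c": 60}
    opinions = {"a": 25, "b": 30, "c": 19}
    print(eligible_users(ratings, opinions))
```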
Magic barrier by rating group (bar chart, vertical axis from 0 to 1.6): all = 1.201; r ≥ avg = 1.043; r < avg = 1.417
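The three groups in the chart (all, r ≥ avg, r < avg) could be computed along the following lines. This is a sketch under assumptions that the poster does not confirm: "avg" is taken to be each user's mean original rating within the supplied records, and the deviation is the population standard deviation. The function name `barrier_by_group` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def barrier_by_group(records):
    """Standard deviation of rating-opinion errors per rating group.

    records: list of (user, rating, opinion) tuples.
    Assumption: a rating counts as "above average" when it is at least
    the user's own mean rating over the supplied records.
    """
    ratings_by_user = defaultdict(list)
    for user, rating, _opinion in records:
        ratings_by_user[user].append(rating)
    user_avg = {user: mean(rs) for user, rs in ratings_by_user.items()}

    groups = {"all": [], "r >= avg": [], "r < avg": []}
    for user, rating, opinion in records:
        error = rating - opinion
        groups["all"].append(error)
        key = "r >= avg" if rating >= user_avg[user] else "r < avg"
        groups[key].append(error)

    # Groups without any records yield None instead of a deviation.
    return {name: (pstdev(errs) if errs else None) for name, errs in groups.items()}


if __name__ == "__main__":
    # Made-up records on a 0-10 scale, for illustration only.
    sample = [("a", 8, 7), ("a", 4, 6), ("b", 9, 9), ("b", 5, 4)]
    print(barrier_by_group(sample))
```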
SIGIR 2012 – Portland, OR, USA