Modeling Difficulty in Recommender Systems
DESCRIPTION
Presentation given at the Workshop on Recommendation Utility Evaluation: Beyond RMSE, held in conjunction with the ACM Conference on Recommender Systems (RecSys) on September 9, 2012.
TRANSCRIPT
Competence Center Information Retrieval & Machine Learning
Modeling Difficulty in Recommender Systems
Benjamin Kille (@bennykille)
September 9, 2012
Recommendation Utility Evaluation: Beyond RMSE (2012)
Outline
► Recommender System Evaluation
► Problem definition
► Difficulty in Recommender Systems
► Future work
► Conclusions
Recommender Systems Evaluation
► Definition of an evaluation measure:
  RMSE (rating prediction scenario)
  nDCG (ranking scenario)
  Precision@N (top-N scenario)
► Splitting data into training and test partitions
► Reporting results as average over the full set of users
► Is recommending to all users equally difficult?
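These measures are straightforward to compute. As a minimal sketch (the function names and toy data below are illustrative, not from the talk), RMSE for the rating prediction scenario and Precision@N for the top-N scenario look like this in Python:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error over parallel lists of ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def precision_at_n(recommended, relevant, n):
    """Fraction of the top-N recommended items that are relevant."""
    return sum(1 for item in recommended[:n] if item in relevant) / n

# Toy data for one user: predicted vs. observed ratings, and a top-4 list
print(rmse([3.5, 4.0, 2.0], [4.0, 4.0, 1.0]))
print(precision_at_n(["a", "b", "c", "d"], {"a", "c"}, 4))  # 2 of 4 relevant → 0.5
```

Reporting only the average of such scores over all users is exactly what hides the per-user differences discussed next.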
Observed Differences
► Users differ with respect to:
  demographics (e.g., age, gender, and location)
  taste
  needs
  expectations
  consumption patterns
  …
► Recommendation algorithms do not perform equally well for each single user; users should not all be evaluated in the same way!
Risks of disregarding users' differences
► A subset of users receives worse recommendations than they could get
► Recommendation algorithm optimization targets all users equally:
  "easy" users: costs could be saved
  "difficult" users: insufficient optimization
Control optimization towards those users who really require it!
How to determine difficulty?
Problem Formulation
► Measuring how difficult it will be to recommend items to a user
► Ideally: derive difficulty directly from user attributes
► Problem: unknown correlation between (combinations of) attributes and difficulty
► We need a method to calculate the correlation between user attributes and recommendation difficulty
Difficulty in Information Retrieval
► Target object: query
► Method:
[Diagram: one query is submitted to several IR systems; each system returns its own ranked list of documents (Doc 1, Doc 2, Doc 3, …)]
Difficulty = Diversity of returned list of documents
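This idea can be sketched in code. Below is a minimal illustration (the function names and document IDs are invented for the example) that scores a query's difficulty as the average pairwise dissimilarity, here Jaccard distance, of the top-k lists returned by the different IR systems:

```python
from itertools import combinations

def jaccard(a, b):
    """Set overlap of two result lists, ignoring rank order."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def query_difficulty(result_lists, k=10):
    """Average pairwise dissimilarity of the systems' top-k lists.

    Requires at least two result lists; 0.0 means all systems agree,
    1.0 means no two systems share any document.
    """
    top = [lst[:k] for lst in result_lists]
    pairs = list(combinations(top, 2))
    return sum(1 - jaccard(a, b) for a, b in pairs) / len(pairs)

# Three hypothetical IR systems answering the same query
lists = [["d1", "d2", "d3"], ["d1", "d3", "d4"], ["d5", "d6", "d7"]]
print(query_difficulty(lists, k=3))  # high disagreement → difficulty ≈ 0.83
```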
Difficulty in Recommender Systems
► Select several (state-of-the-art) recommendation methods
► Measure the diversity of their output for a specific user
► Based on the methods' agreement with respect to predicted rating / ranking / top-N items, we conclude:
  high agreement → low difficulty
  low agreement → high difficulty
► Target correlation (user attributes ~ difficulty) can be estimated using the observed difficulties for a sufficiently large set of users
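For the rating-prediction case, the agreement criterion above can be sketched as the spread of several methods' predictions for one user (a minimal illustration; the method names and ratings are made up, and for the ranking case one would instead use a rank correlation such as Kendall's tau):

```python
from statistics import mean, pstdev

def user_difficulty(predictions):
    """predictions: {method_name: {item_id: predicted_rating}} for one user.

    Difficulty = mean, over the items all methods scored, of the spread
    (population std. dev.) of the predicted ratings:
    high spread = low agreement = high difficulty.
    """
    items = set.intersection(*(set(p) for p in predictions.values()))
    spreads = [pstdev([p[i] for p in predictions.values()]) for i in items]
    return mean(spreads)

# Three hypothetical recommenders scoring the same two items for one user
preds = {
    "item_knn": {"i1": 4.0, "i2": 2.0},
    "user_knn": {"i1": 4.2, "i2": 2.4},
    "mat_fact": {"i1": 3.8, "i2": 1.6},
}
print(user_difficulty(preds))
```

Computing this score for a sufficiently large set of users yields the observed difficulties from which the target correlation (user attributes ~ difficulty) could then be estimated, e.g., with a standard correlation coefficient.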
Future Work
► Experimentally verify feasibility of difficulty estimation
► Evaluate observed correlation (user attributes ~ difficulty) on
data sets
► Investigate business rationale (reduced costs through
controlled optimization efforts)
► How to deal with sparsity / cold-start issues
Conclusions
► Users should not be treated equally when evaluating
recommender systems
► Difficulty of recommendation tasks varies between users
► Difficulty will allow controlling optimization towards those users
who require it
► Diversity metrics could be used to estimate difficulty scores
(analogously to information retrieval)
► Proposed method needs to be evaluated
Thank you for your attention!
Questions?
References
[He2008] J. He, M. Larson, and M. de Rijke. Using coherence-based measures to predict query difficulty. ECIR 2008.
[Herlocker2004] J. Herlocker, J. Konstan, L. Terveen, and J. Riedl. Evaluating collaborative filtering recommender systems. ACM TOIS 22(1), 2004.
[Kuncheva2003] L. Kuncheva and C. Whitaker. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning 51, 2003.
[Vargas2011] S. Vargas and P. Castells. Rank and relevance in novelty and diversity metrics for recommender systems. RecSys 2011.