a non-iid framework for collaborative filtering with...
TRANSCRIPT
Kostadin Georgiev, VMware BulgariaPreslav Nakov, Qatar Computing Research Institute
ICML, June 17, 2013, Atlanta
6. Comparison to Previous Work
1. Overview
A NON-IID FRAMEWORKFOR COLLABORATIVE FILTERING WITH RESTRICTED BOLTZMANN MACHINES
5. Experiments & Evaluation
• We propose a non-IID framework for collaborative filtering
• based on Restricted Boltzmann Machines (RBM)
• models both user-user and item-item correlations
• uses real values in the visible layer (as opposed to multinomial) to model user-item ratings, thus taking advantage of the natural order between them
• We further explore the potential of combining the original training data with data generated by the RBM-based model itself in a bootstrapping fashion
• Evaluation: on two MovieLens datasets - with 100K and 1M user-item ratings, respectively
• Results: Rival the state-of-the-art
2. User/Item-based RBM Model
7. Conclusion & Future Work
• Conclusion
• proposed a non-IID RBM framework for collaborative filtering
• rivals the best previous algorithms, which are more complex
3. Non-IID Hybrid RBM Model
MovieLens 100kMovieLens 1M
• The visible layer• represents either all ratings by a user or all rating for an item
• units model ratings as real values (vs. multinomial)
• noise-free reconstruction is better
• Remove the IID assumption for the training data
• Topology: Unit vij is connected to two independent hidden layers: one user-based and another item-based.
• Missing values (ratings): the generated predictions are used during testing, but are ignored during training
• Training procedure: we average the predictions of the user-based and of the item-based RBM models
4. Neighborhood + I-RBM
Use the I-RBM predictions from a neighborhood-based algorithm:
However, compute the averages from the original ratings:• Future work
• add an additional layer to model higher-order correlations
• add content-based features, e.g., demographic
• Two MovieLens datasets:• 100k:
• 1,682 movies assigned
• 943 users
• 100,000 ratings
• sparseness: 93.7%
• Each rating is between 1 (worst) and 5 (best)
Data
• 1M:
• 3,952 movies
• 6,040 users
• 1 million ratings
• sparseness: 95.8%
- Item-based RBM model outperforms user-based, but not by much.- Hybrid item-based RBM + NB model is relatively insensitive to number of units.
In the IID case, multinomial visible units are better than real-valued. In the non-IID case: real-valued visible units outperform multinomial.
• Mean Absolute Error (MAE):
• Cross-validation• 5-fold
• 80%:20% training:testing data splits
Evaluation