Exploring content recommendation
A study of, and experiences with, the Mahout recommender
Felipe Besson (@fmbesson)
March 2013
“A lot of times, people don't know what they want until you show it to them.”
Steve Jobs
“We don't make money when we sell things; we make money when we help customers make purchase decisions.”
Jeff Bezos, Amazon
Why is recommendation important?
Apache Mahout: an Apache project to build scalable machine learning libraries
● Focused on large data sets
● Adaptation of standard machine learning algorithms
● Runs on Apache Hadoop (map/reduce paradigm)
… or on a non-Hadoop node
Who is using Mahout?
Source: https://cwiki.apache.org/MAHOUT/powered-by-mahout.html
Supported core algorithms
● Classification
● Clustering
● Recommendation
● Pattern Mining
● Regression
● Dimension Reduction
● Evolutionary Algorithms
● Vector Similarity
Mahout Recommender
Collaborative filtering
People often get the best recommendations from someone with similar taste
● People tend to like things that are similar to other things they like
● There are patterns in people's likes and dislikes
[Figure: movies liked by John and by Bob. Both lists include movie1 and movie2; movie42, movie4, and movie5 also appear.]
Will Bob like movie4 and movie5?
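The question above can be sketched as a toy user-based collaborative filter. This is a minimal illustration and not Mahout's implementation: the exact likes assigned to John and Bob, the third user Ann, and the choice of Jaccard similarity are all assumptions made for the example.

```python
# Toy user-based collaborative filtering on boolean "likes".
# All names and data are hypothetical, chosen only for illustration.

def jaccard(a, b):
    """Similarity of two sets of liked items: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, likes):
    """Score items the target has not liked yet by the similarity of the users who like them."""
    scores = {}
    for user, items in likes.items():
        if user == target:
            continue
        sim = jaccard(likes[target], items)
        for item in items - likes[target]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

likes = {
    "John": {"movie1", "movie2", "movie4", "movie5"},  # assumed
    "Bob": {"movie1", "movie2", "movie42"},            # assumed
    "Ann": {"movie7", "movie42"},                      # hypothetical extra user
}

print(recommend("Bob", likes))
```

With this data, John shares two of Bob's likes while Ann shares only one, so movie4 and movie5 rank ahead of movie7 for Bob.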
Mahout Recommender
Available recommenders
● Item based
● User based
Execution modes
● Taste: online but not distributed
● Hadoop: offline (batch) but distributed
Parameters
● Many coefficients to calculate user and item similarity and neighborhood
● Data model abstractions
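To make the item-based option concrete, here is a rough sketch that compares items instead of users, using plain co-occurrence counts as the item similarity. It is not how Mahout implements it; the function names, the co-occurrence coefficient, and the sample data are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(likes):
    """Count, for each ordered pair of items, how many users like both."""
    counts = defaultdict(int)
    for items in likes.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend_item_based(target, likes):
    """Score unseen items by how often they co-occur with the target's items."""
    counts = cooccurrence(likes)
    seen = likes[target]
    candidates = set().union(*likes.values()) - seen
    scores = {c: sum(counts[(c, s)] for s in seen) for c in candidates}
    ranked = [(item, score) for item, score in scores.items() if score > 0]
    return sorted(ranked, key=lambda kv: (-kv[1], kv[0]))

# Hypothetical data: user_id -> set of liked item_ids.
likes = {1: {10, 11, 23}, 2: {11, 23}, 9: {10, 15}}

print(recommend_item_based(2, likes))
```

Item 10 is recommended to user 2 because it co-occurs with both items that user already likes; item 15 never co-occurs with them and is dropped.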
Mahout Recommender (Hadoop)
Input: user_id, item_id, preference_value (optional)
1, 23, 0.9
1, 15, 0.5
1, 89, 0.1
2, 11, 0.3
2, 15, 0.2
9, 10, 0.5
9, 99, 0.9
9, 11, 0.1
8, 11, 0.5
...
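A small sketch of reading this kind of input, assuming one comma-separated record per line. The helper name `parse_preferences`, the sample rows, and the default of 1.0 for a missing preference value are assumptions for illustration, not part of Mahout.

```python
import csv
from io import StringIO

def parse_preferences(text, default_value=1.0):
    """Yield (user_id, item_id, preference) tuples from comma-separated lines.

    The preference column is optional; default_value (an assumption here)
    is used when it is missing.
    """
    for row in csv.reader(StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        user_id, item_id = int(row[0]), int(row[1])
        value = float(row[2]) if len(row) > 2 else default_value
        yield user_id, item_id, value

# Illustrative sample: the last record has no preference value.
sample = "1, 23, 0.9\n1, 15, 0.5\n2, 11\n"
print(list(parse_preferences(sample)))
```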
Output: user_id [recommended_item, score]
1: [10, 0.93; 11, 0.84; … ]
2: [23, 0.72; 17, 0.60; … ]
8: [121, 0.98; 23, 0.78; … ]
17: [12, 0.89; 32, 0.56; … ]
42: [129, 0.92; 98, 0.45; … ]
...
1st try!
Movie recommendation
Netflix data set (http://www.netflixprize.com/)
● # of user tastes: 2,817,131
● # of movies: 17,770
● # of users: 472,891
Environment and performance
● Hadoop pseudo-distributed mode
● Computer
  ● Intel® Core™ i5-3317U CPU @ 1.70GHz × 4
  ● 6 GB RAM
● Total time: ~16 minutes
How to run?
1. Copy the input file to HDFS (Hadoop Distributed File System)
hadoop fs -put qualifying.txt /netflix/input/data.txt
2. Run the recommender
hadoop jar core/target/mahout-core-0.8-SNAPSHOT-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=/netflix/input/data.txt \
  -Dmapred.output.dir=/netflix/output \
  --numRecommendations 10 \
  --similarityClassname SIMILARITY_LOGLIKELIHOOD
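The SIMILARITY_LOGLIKELIHOOD option refers to a log-likelihood ratio test on how often two items are liked together. The sketch below computes the standard ratio (G²) for a 2x2 table of counts. The function names are hypothetical, and the final mapping to a score between 0 and 1 is an assumption about how such a ratio can be turned into a similarity, not a statement of what the Mahout job does internally.

```python
from math import log

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of counts.

    k11: users who like both items   k12: users who like only item A
    k21: users who like only item B  k22: users who like neither
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, r, c in cells:
        if k > 0:  # an empty cell contributes nothing to the sum
            total += k * log(k * n / (rows[r] * cols[c]))
    return max(0.0, 2.0 * total)  # clamp tiny negatives from rounding

def llr_similarity(both, only_a, only_b, neither):
    """Map the ratio into [0, 1); this mapping is an assumption for illustration."""
    return 1.0 - 1.0 / (1.0 + llr(both, only_a, only_b, neither))

print(llr_similarity(10, 10, 10, 10))  # independent items
print(llr_similarity(10, 0, 0, 10))    # items always liked together
```

Items that are liked independently of each other get a ratio of zero, while items that always appear together get a large ratio and a similarity close to 1.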
Results
Recommender analyzer
https://github.com/besson/recommender_analyzer
http://rec-analyzer.herokuapp.com/
References
Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman. Mahout in Action. Manning Publications, 2011.
Thanks
Felipe Besson
@fmbesson