Exploring content recommendation

Felipe Besson (@fmbesson)

March, 2013

“A lot of times, people don't know what they want until you show it to them.”

Steve Jobs

“We don't make money when we sell things; we make money when we help customers make purchase decisions.”

Jeff Bezos, Amazon

Why is recommendation important?

Apache Mahout

An Apache project to build scalable machine learning libraries

● Focused on large data sets
● Adaptation of standard machine learning algorithms
● Runs on Apache Hadoop (map/reduce paradigm)

… or on a non-Hadoop node

Who is using Mahout?

Source: https://cwiki.apache.org/MAHOUT/powered-by-mahout.html

Supported core algorithms

● Classification
● Clustering
● Recommendation
● Pattern Mining
● Regression
● Dimension Reduction
● Evolutionary Algorithms
● Vector Similarity

Mahout Recommender

Collaborative filtering

People often get the best recommendations from someone with similar taste

● People tend to like things that are similar to other things they like
● There are patterns in people's likes and dislikes

[Figure: John and Bob both like movie1 and movie2; the diagram also shows movie42, movie4, and movie5.]

Will Bob like movie4? And movie5?
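The question on this slide can be sketched in a few lines of Python. This is a toy illustration of user-based collaborative filtering, not Mahout's implementation. The like sets are assumptions read off the diagram (movie42 is left out because it is unclear who liked it), and the helper names `tanimoto` and `recommend` are hypothetical.

```python
# Toy user-based collaborative filtering over boolean "likes".
# The data below is an assumption based on the John/Bob diagram.
likes = {
    "John": {"movie1", "movie2", "movie4", "movie5"},
    "Bob": {"movie1", "movie2"},
}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity: shared items / all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes):
    """Score items the user has not seen by the similarity of users who like them."""
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        sim = tanimoto(likes[user], items)
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken by item name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(recommend("Bob", likes))
```

With only two users, Bob simply inherits John's remaining movies, each weighted by how much the two like sets overlap.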

Mahout Recommender

Available recommenders
● Item based
● User based

Execution modes
● Taste: online but not distributed
● Hadoop: offline (batch) but distributed

Parameters
● Many coefficients to calculate user and item similarity and neighborhood
● Data model abstractions
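To illustrate the item-based option, here is a minimal sketch that scores items by how often they co-occur with items a user already likes. It is an illustration of the general idea only, not Mahout's code; the preference data and the function names `cooccurrence` and `recommend_item_based` are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical boolean preferences: user -> set of liked items.
prefs = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c"},
}


def cooccurrence(prefs):
    """Count, for each ordered item pair, how many users like both."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in prefs.values():
        for i, j in permutations(sorted(items), 2):
            counts[i][j] += 1
    return counts


def recommend_item_based(user, prefs, counts):
    """Sum co-occurrence counts between the user's items and unseen items."""
    owned = prefs[user]
    scores = defaultdict(int)
    for i in owned:
        for j, n in counts[i].items():
            if j not in owned:
                scores[j] += n
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


counts = cooccurrence(prefs)
print(recommend_item_based("u2", prefs, counts))
```

A raw co-occurrence count is the simplest possible item similarity; the similarity coefficients mentioned on the slide would replace it with a normalized measure.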

Mahout Recommender (Hadoop)

Input: user_id, item_id, preference_value (optional)

1, 23, 0.9
1, 15, 0.5
1, 89, 0.1
2, 11, 0.3
2, 15, 0.2
9, 10, 0.5
9, 99, 0.9
9, 11, 0.1
8, 11, 0.5
...

Output: user_id [recommended_item, score]

1: [10, 0.93; 11, 0.84; … ]

2: [23, 0.72; 17, 0.60; … ]

8: [121, 0.98; 23, 0.78; … ]

17: [12, 0.89; 32, 0.56; … ]

42: [129, 0.92; 98, 0.45; … ]

...
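As a rough sketch of how the input format on this slide could be read outside Hadoop, the Python below groups the comma-separated lines by user. The function name `parse_preferences` is hypothetical, and treating a missing preference value as 1.0 is an assumption made for this example, not documented Mahout behavior.

```python
import csv
import io
from collections import defaultdict

# A few lines in the slide's input format; the last one omits the preference.
sample = """1, 23, 0.9
1, 15, 0.5
2, 11
"""


def parse_preferences(text, default=1.0):
    """Return {user_id: {item_id: preference}} from 'user, item[, pref]' lines."""
    prefs = defaultdict(dict)
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        user_id, item_id = int(row[0]), int(row[1])
        # Assumption: a missing preference value means a plain "like".
        value = float(row[2]) if len(row) > 2 else default
        prefs[user_id][item_id] = value
    return dict(prefs)


print(parse_preferences(sample))
```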

1st try!

Movie recommendation

Netflix base (http://www.netflixprize.com/)

● # of user tastes: 2,817,131
● # of movies: 17,770
● # of users: 472,891

Environment and performance
● Hadoop pseudo-distributed
● Computer
  ● Intel® Core™ i5-3317U CPU @ 1.70GHz × 4
  ● 6 GB RAM
● Total time: ~16 minutes

How to run?

1. Copy the input file to HDFS (Hadoop distributed file system)

hadoop fs -put qualifying.txt /netflix/input/data.txt

2. Run the recommender

hadoop jar core/target/mahout-core-0.8-SNAPSHOT-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=/netflix/input/data.txt \
  -Dmapred.output.dir=/netflix/output \
  --numRecommendations 10 \
  --similarityClassname SIMILARITY_LOGLIKELIHOOD
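The SIMILARITY_LOGLIKELIHOOD option selects a log-likelihood based item similarity. The sketch below shows the general log-likelihood ratio test on a 2×2 co-occurrence table, written from the standard entropy formulation; it is an illustration under that assumption and may differ in detail from what Mahout computes. The function names are hypothetical. Here k11 is the number of users who like both items, k12 and k21 count users who like only one of them, and k22 counts users who like neither.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table; 0 means the items look independent."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(k11, k12, k21, k22):
    """Map the ratio into [0, 1): higher means a stronger association."""
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))


print(llr_similarity(10, 0, 0, 10))  # items always liked together
print(llr_similarity(5, 5, 5, 5))    # no association
```

The appeal of this measure is that it rewards co-occurrence that is surprising given how popular each item is, rather than raw co-occurrence counts.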

Results

Recommender analyzer
https://github.com/besson/recommender_analyzer
http://rec-analyzer.herokuapp.com/

References

Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman. Mahout in Action. Manning Publications, 2011.

Thanks

Felipe Besson (@fmbesson)
