Mendeley Suggest: Engineering a Personalised Article Recommender System
DESCRIPTION
I gave this presentation at the RecSysChallenge workshop (http://2012.recsyschallenge.com/) at Recommender Systems 2012 (http://recsys.acm.org/2012/) in Dublin on 13 September 2012. It describes how we have been using Mahout to power Mendeley Suggest. First, it presents results from tuning Mahout's recommender on AWS, including the cost vs. precision tradeoff. It then concludes with details on how to use other big data technologies and AWS to put a serving layer in place. Acknowledgement to Daniel Jones for making the slides for the serving layer part of the presentation.
TRANSCRIPT
Mendeley Suggest: Engineering a Personalised Article Recommender System
Kris Jack, PhD
Chief Data Scientist
https://twitter.com/_krisjack
➔ What's Mendeley?
➔ What's Mendeley Suggest?
➔ Computation Layer
➔ Serving Layer
  ➔ Architecture
  ➔ Technologies
  ➔ Deployment
➔ Conclusions
Overview
What's Mendeley?
➔ Mendeley is a platform that connects researchers, research data and apps
➔ Startup company with ~20 R&D engineers
Mendeley Open API
What's Mendeley Suggest?
Use Case
➔ Good researchers are on top of their game
➔ Difficult with the amount being produced
➔ There must be a technology that can help
➔ Help researchers by recommending relevant research
Mendeley Suggest
Computation Layer
Mendeley Suggest
Running on Amazon's Elastic MapReduce
On-demand use and easy to cost
Mahout's Performance
1.5M Users, 50M Articles
[Plot: cost against quality. Y-axis: Normalised Amazon Hours (0 to 7K). X-axis: No. Good Recommendations/10 (0 to 3). Quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good.]
➔ Orig. item-based: 6.5K hours, 1.5 good recommendations/10
➔ Cust. item-based: 2.4K hours, 1.5 good recommendations/10
  ➔ -4.1K hours (63%) vs. orig. item-based
  ➔ Partitioners, MR allocation
➔ Orig. user-based: 1K hours, 2.5 good recommendations/10
  ➔ -1.4K hours (58%) and +1 good recommendation (67%) vs. cust. item-based
➔ Cust. user-based: 0.3K hours, 2.5 good recommendations/10
  ➔ -0.7K hours (70%) vs. orig. user-based
➔ Overall, orig. item-based to cust. user-based: -6.2K hours (95%), +1 good recommendation (67%)
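The savings annotated on the plot can be checked with a few lines of arithmetic. The sketch below assumes the four data points as read off the slides; the short run names (`orig_item` and so on) are made up here for illustration.

```python
# Data points from the plot: (normalised Amazon hours, good recommendations per 10).
# The dictionary keys are illustrative names, not identifiers from the talk.
RUNS = {
    "orig_item": (6500, 1.5),
    "cust_item": (2400, 1.5),
    "orig_user": (1000, 2.5),
    "cust_user": (300, 2.5),
}


def compare(before, after):
    """Return (hours saved, % hours saved, gain in good recs, % gain) between two runs."""
    hours_before, quality_before = RUNS[before]
    hours_after, quality_after = RUNS[after]
    saved = hours_before - hours_after
    gain = quality_after - quality_before
    return (
        saved,
        round(100 * saved / hours_before),
        gain,
        round(100 * gain / quality_before),
    )


for before, after in [
    ("orig_item", "cust_item"),
    ("cust_item", "orig_user"),
    ("orig_user", "cust_user"),
    ("orig_item", "cust_user"),
]:
    print(before, "->", after, compare(before, after))
```

Each percentage is relative to the earlier run in the pair, which reproduces the 63%, 58%, 70% and 95% cost reductions and the 67% quality gain shown on the slides.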
Mahout as the Computation Layer
➔ Out of the box, didn't work so well for us
➔ Needed to understand Hadoop better
➔ Contributed patch back to community (user-user)
➔ Next step, the serving layer...
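For readers unfamiliar with the user-based approach compared above, here is a minimal single-machine sketch of the idea. It is not Mahout's implementation or Mendeley's customised version: it assumes each user's library is a set of article IDs and uses Jaccard overlap as the similarity, neither of which the slides confirm. All names and the toy data are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two sets of article IDs, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(libraries, user, k_neighbours=2, n=10):
    """Suggest articles held by the k most similar users but missing from `user`'s library."""
    own = libraries[user]
    scored = [
        (jaccard(own, library), other)
        for other, library in libraries.items()
        if other != user
    ]
    # Keep only neighbours that share at least one article, most similar first.
    neighbours = sorted((pair for pair in scored if pair[0] > 0), reverse=True)[:k_neighbours]
    votes = Counter()
    for similarity, other in neighbours:
        for article in libraries[other] - own:
            votes[article] += similarity
    # Highest total similarity first; ties broken by article ID for stable output.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [article for article, _ in ranked[:n]]


libraries = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a1", "a2", "a4"},
    "carol": {"a2", "a3", "a5", "a6"},
    "dave": {"a9"},
}
print(recommend(libraries, "alice"))
```

An item-based recommender swaps the roles: it compares articles by the users who hold them, then suggests articles similar to those already in the user's library. At 1.5M users and 50M articles the pairwise comparisons have to be distributed, which is where Hadoop comes in.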
Serving Layer
Architecture
[Architecture diagram. Computation Layer: User Libraries, Mendeley Hadoop Cluster (Cascading, Map Reduce). Serving Layer: AWS with DynamoDB and ElasticBeanstalk (shown three times).]
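The split between the two layers can be sketched as a batch job that writes one precomputed list per user into a key-value table, and a serving function that does a single lookup per request. This is only an illustration of the pattern: the slides do not describe the table schema or how the Hadoop output is loaded, and `InMemoryStore`, `publish` and `serve` are hypothetical names, with a plain dictionary standing in for the real table.

```python
class InMemoryStore:
    """Dictionary-backed stand-in for a key-value table (hypothetical, for illustration)."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key, default=None):
        return self._rows.get(key, default)


def publish(store, batch_output):
    """Computation side: write each user's precomputed list under the user's ID."""
    for user_id, article_ids in batch_output.items():
        store.put(user_id, list(article_ids))


def serve(store, user_id, limit=10):
    """Serving side: one key lookup per request, no computation at request time."""
    return store.get(user_id, [])[:limit]


store = InMemoryStore()
publish(store, {"user-1": ["a4", "a5", "a6"], "user-2": ["a1"]})
print(serve(store, "user-1", limit=2))
print(serve(store, "unknown-user"))
```

Keeping all the heavy work in the batch step is what lets the serving layer stay small and cheap to run.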
➔ Spring dependency injection framework
  ➔ Context-wide integration testing is easy, including pre-loading of test data
  ➔ Allows other Spring features (cache, security, messaging)
➔ Spring MVC 3.2.M1
  ➔ Annotated controllers, type conversion 'for free'
  ➔ Asynchronous Servlet 3.0 supports thread 'parking'
➔ AlternatorDB
  ➔ In-memory DynamoDB implementation for testing
Technologies
Recommendation<K>
LongRecommendation UuidRecommendation
DocumentRecommendation GroupRecommendation PersonRecommendation
➔ Build once, employ in several use cases
Technologies
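The "build once" idea behind the generic `Recommendation<K>` type can be sketched as follows. This is a Python stand-in for what would be Java classes in the Spring application. The slide only lists the class names, so the fields (`item_id`, `score`), the helper `top`, and the choice of which key type each use case extends are all assumptions.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar
from uuid import UUID

K = TypeVar("K")


@dataclass(frozen=True)
class Recommendation(Generic[K]):
    """A recommended item identified by a key of type K, with a relevance score."""
    item_id: K
    score: float


class LongRecommendation(Recommendation[int]):
    """Items keyed by an integer ID."""


class UuidRecommendation(Recommendation[UUID]):
    """Items keyed by a UUID."""


# Which key type each use case extends is a guess; the slide only lists the names.
class DocumentRecommendation(UuidRecommendation):
    pass


class GroupRecommendation(LongRecommendation):
    pass


class PersonRecommendation(LongRecommendation):
    pass


def top(recommendations, n):
    """Shared ranking code that works for any Recommendation subclass."""
    return sorted(recommendations, key=lambda r: r.score, reverse=True)[:n]


people = [PersonRecommendation(42, 0.3), PersonRecommendation(7, 0.9)]
print([r.item_id for r in top(people, 1)])
```

Code written against the base type, such as `top`, is reused unchanged for documents, groups and people.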
➔ AWS ElasticBeanstalk
  ➔ Managed, auto-scaling, health-checking .war container
➔ Jenkins continuous integration (CI) server
➔ Maven build tool (useful dependency management)
➔ beanstalk-maven-plugin (push a button to deploy)
  ➔ Deploys to ElasticBeanstalk
  ➔ Replaces existing application version if required
  ➔ 'Zero downtime' updates (tested at ~300ms)
  ➔ Triggered by Jenkins
Deployment
Putting it all together... $$$
➔ Real-time article recommendations for 2 million users
➔ 20 requests per second
➔ $65.84/month
  ➔ $34.24 ElasticBeanstalk
  ➔ $28.17 DynamoDB
  ➔ $2.76 bandwidth
➔ $30 to update the computation layer periodically
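To put the monthly serving bill in perspective, the figures above can be turned into a per-user cost. The sketch uses only the numbers quoted on the slide; the variable names are illustrative, and `Decimal` is used so the cents add up exactly.

```python
from decimal import Decimal

MONTHLY_TOTAL = Decimal("65.84")  # total quoted on the slide
ITEMS = {
    "ElasticBeanstalk": Decimal("34.24"),
    "DynamoDB": Decimal("28.17"),
    "bandwidth": Decimal("2.76"),
}
USERS = 2_000_000

itemised = sum(ITEMS.values())
unaccounted = MONTHLY_TOTAL - itemised
cost_per_1000_users = MONTHLY_TOTAL / USERS * 1000

print("itemised sum:        $", itemised)
print("quoted total:        $", MONTHLY_TOTAL)
print("difference:          $", unaccounted)
print("cost per 1000 users: $", cost_per_1000_users)
```

The three itemised figures sum to $65.17, which is $0.67 less than the quoted $65.84; the slide does not say what accounts for the difference. On the quoted total, serving costs roughly 3.3 cents per thousand users per month.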
Conclusions
➔ Mendeley Suggest is a personalised article recommender
➔ Built by small team for big data
➔ Uses Mahout as computation layer
  ➔ Needs some love out of the box
➔ Serves from AWS
  ➔ Reduces maintenance costs and is reliable
➔ Intend to release Mendeley Suggest to all users this year
We're Hiring!
➔ Data Scientist
  ➔ apply recommender technologies to Mendeley's data
  ➔ work on improving the quality of Mendeley's research catalogue
  ➔ starting in first quarter of 2013
  ➔ 6-month secondment at KNOW Center, TU Graz, Austria as part of the EC FP7 TEAM project (http://team-project.tugraz.at/)
➔ http://www.mendeley.com/careers/
www.mendeley.com