indic threads pune12-recommenders-apache-mahout

How to Build a Recommendation Engine Using Apache Mahout
Viraj Paripatyadar, GS Lab

Upload: indicthreads

Post on 15-May-2015


DESCRIPTION

The 7th Annual IndicThreads Pune Conference was held on 14-15 December 2012. http://pune12.indicthreads.com/

TRANSCRIPT

Page 1: Indic threads pune12-recommenders-apache-mahout

How to Build a Recommendation Engine Using Apache Mahout
Viraj Paripatyadar
GS Lab


Contents

• A recommendation problem
• What is a recommender
• Building a recommender using Mahout
• Tips and tweaks
• Recommender considerations


A book store

• Sells books:
  • by various authors
  • of various categories
  • on different subjects
  • from various publishers

• Readers/buyers are asked to rate
• Readers/buyers can provide reviews

You walk into the store (to buy something for a friend)


The store owner

• Asks you:
  • what your friend reads (already owns)
  • what your friend usually likes more

• Has data on:
  • what his customers buy
  • what his customers rate and review

• Uses a few strategies


1 - Find similar books

Depending on which books your friend has, pick books:
• by the same author
• on the same or similar subjects
• in the same category
• from the same publisher

(preferring those with the highest sales)
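Strategy 1 is plain attribute matching. The following is a minimal Python sketch. It assumes a hypothetical catalogue in which every book carries author, subject, category and publisher fields plus a sales count. All catalogue values are made up for illustration.

```python
# Hypothetical catalogue: every attribute value and sales figure is invented.
CATALOGUE = {
    "b1": {"author": "Author A", "subject": "robots", "category": "sci-fi", "publisher": "Pub X", "sales": 120},
    "b2": {"author": "Author A", "subject": "empires", "category": "sci-fi", "publisher": "Pub X", "sales": 300},
    "b3": {"author": "Author B", "subject": "robots", "category": "sci-fi", "publisher": "Pub Y", "sales": 250},
    "b4": {"author": "Author C", "subject": "murder", "category": "mystery", "publisher": "Pub Z", "sales": 500},
    "b5": {"author": "Author D", "subject": "murder", "category": "mystery", "publisher": "Pub W", "sales": 80},
}
ATTRIBUTES = ("author", "subject", "category", "publisher")


def similar_books(owned, catalogue):
    """Books not owned that share at least one attribute value with an owned
    book, best sellers first."""
    # Every (attribute, value) pair that appears among the owned books.
    wanted = {(attr, catalogue[book][attr]) for book in owned for attr in ATTRIBUTES}
    candidates = [
        book for book, info in catalogue.items()
        if book not in owned and any((attr, info[attr]) in wanted for attr in ATTRIBUTES)
    ]
    return sorted(candidates, key=lambda book: catalogue[book]["sales"], reverse=True)


print(similar_books({"b1"}, CATALOGUE))  # ['b2', 'b3']
```

If your friend owns only b1, the sketch offers b2 (same author) before b3 (same subject), because b2 sells more.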


2 - Find books with similar readership

• Define some similarity
  • e.g. two books are as similar as the number of readers who rated both of them

• Define some limit of relevance
  • e.g. only consider books that share more than 4 readers

• Look for all books that are similar to books your friend owns

Pick books from this set that your friend doesn't own
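Strategy 2 can be sketched by counting co-raters. The sketch below uses the ratings from the talk's example data, with user 1 as your friend. Scoring each candidate by its largest overlap with an owned book is an assumption; the slide does not say how to rank the final set.

```python
from collections import defaultdict

# The talk's example ratings: {user: {book: rating}}.
PREFS = {
    1: {101: 5.0, 102: 3.0, 103: 2.5},
    2: {101: 2.0, 102: 2.5, 103: 5.0, 104: 2.0},
    3: {101: 2.5, 104: 4.0, 105: 4.5, 107: 5.0},
    4: {101: 5.0, 103: 3.0, 104: 4.5, 106: 4.0},
    5: {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0},
}


def readers_by_book(prefs):
    """Map each book to the set of users who rated it."""
    readers = defaultdict(set)
    for user, ratings in prefs.items():
        for book in ratings:
            readers[book].add(user)
    return readers


def similar_readership(prefs, user, more_than=4):
    """Books the user doesn't own that share more than `more_than` readers
    with some owned book, scored by the largest such overlap."""
    readers = readers_by_book(prefs)
    owned = set(prefs[user])
    scores = {}
    for book in owned:
        for other in readers:
            if other in owned:
                continue
            common = len(readers[book] & readers[other])
            if common > more_than:
                scores[other] = max(scores.get(other, 0), common)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(similar_readership(PREFS, 1))               # []
print(similar_readership(PREFS, 1, more_than=1))  # [(104, 4), (105, 2), (106, 2)]
```

With only five readers, the slide's "more than 4" cut-off leaves nothing. The threshold has to suit the size of the data.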


3 - Find people with similar tastes

• Define some similarity
  • e.g. two people are as similar as the number of books they like from the same category

• Define some limit of relevance
  • e.g. only consider the top 3 people, ordered by how similar they are to your friend

• Look for users similar to your friend and see what they read

Pick books that these people like and your friend doesn't own
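Strategy 3 can be sketched on the same example ratings. The sketch rests on three hypothetical choices that the slide does not specify:

- **Categories.** The book-to-category mapping is invented.
- **What counts as liking.** A book is "liked" if it is rated 3.0 or higher.
- **Similarity.** Two people's similarity is the sum, over categories, of the smaller of their liked-book counts in that category. This is one possible reading of the slide's definition.

```python
from collections import Counter

PREFS = {
    1: {101: 5.0, 102: 3.0, 103: 2.5},
    2: {101: 2.0, 102: 2.5, 103: 5.0, 104: 2.0},
    3: {101: 2.5, 104: 4.0, 105: 4.5, 107: 5.0},
    4: {101: 5.0, 103: 3.0, 104: 4.5, 106: 4.0},
    5: {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0},
}
# Hypothetical categories and "like" cut-off.
CATEGORY = {101: "fiction", 102: "fiction", 103: "history", 104: "history",
            105: "science", 106: "science", 107: "fiction"}
LIKE = 3.0


def liked_by_category(ratings):
    """Count the liked books in each category."""
    return Counter(CATEGORY[book] for book, r in ratings.items() if r >= LIKE)


def taste_similarity(a, b):
    """Sum over categories of min(liked in a, liked in b)."""
    ca, cb = liked_by_category(a), liked_by_category(b)
    return sum(min(ca[c], cb[c]) for c in ca)


def similar_people_picks(prefs, user, top=3):
    """Pick the `top` most similar people, then rank the books they like
    (and the user doesn't own) by how many of them like each book."""
    others = [u for u in prefs if u != user]
    ranked = sorted(others, key=lambda u: (-taste_similarity(prefs[user], prefs[u]), u))
    neighbours = ranked[:top]
    votes = Counter(book for u in neighbours for book, r in prefs[u].items()
                    if r >= LIKE and book not in prefs[user])
    return neighbours, sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


print(similar_people_picks(PREFS, 1))  # ([5, 3, 4], [(104, 3), (105, 2), (106, 2), (107, 1)])
```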


Example data (userID,bookID,rating)

1,101,5.0 3,101,2.5 4,106,4.0

1,102,3.0 3,104,4.0 5,101,4.0

1,103,2.5 3,105,4.5 5,102,3.0

2,101,2.0 3,107,5.0 5,103,2.0

2,102,2.5 4,101,5.0 5,104,4.0

2,103,5.0 4,103,3.0 5,105,3.5

2,104,2.0 4,104,4.5 5,106,4.0

• Your friend (user 1) owns three books:
  • Gave 5 stars to book 101 (likes it hugely and talks about it all the time)
  • Gave 3 stars to book 102 (has shown some liking for it)
  • Gave 2.5 stars to book 103 (has read it, but didn't say bad things about it)

Now we need to recommend books that your friend hasn't seen
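The example data uses the simple userID,itemID,value CSV layout that Mahout's FileDataModel reads. The following short Python sketch loads the data into nested dictionaries and lists the candidate books for your friend (user 1): the books someone has rated but he has not.

```python
import csv
import io
from collections import defaultdict

# The example data, one "userID,bookID,rating" triple per line.
DATA = """1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
"""


def load_preferences(text):
    """Parse userID,itemID,value lines into {user: {item: rating}}."""
    prefs = defaultdict(dict)
    for user, item, value in csv.reader(io.StringIO(text)):
        prefs[int(user)][int(item)] = float(value)
    return dict(prefs)


def unseen_items(prefs, user):
    """Items rated by someone but not by `user`: the candidates to recommend."""
    all_items = {item for ratings in prefs.values() for item in ratings}
    return sorted(all_items - set(prefs[user]))


prefs = load_preferences(DATA)
print(prefs[1])                # {101: 5.0, 102: 3.0, 103: 2.5}
print(unseen_items(prefs, 1))  # [104, 105, 106, 107]
```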


A pictorial representation

[Figure: users 1 to 5 against books 101 to 107]
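The picture is essentially a sparse user-by-book rating matrix. The following small sketch prints that matrix as text for the example ratings, with "-" for books a user has not rated.

```python
PREFS = {
    1: {101: 5.0, 102: 3.0, 103: 2.5},
    2: {101: 2.0, 102: 2.5, 103: 5.0, 104: 2.0},
    3: {101: 2.5, 104: 4.0, 105: 4.5, 107: 5.0},
    4: {101: 5.0, 103: 3.0, 104: 4.5, 106: 4.0},
    5: {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0},
}


def matrix_rows(prefs):
    """Render the user x book rating matrix as text rows; '-' marks a missing rating."""
    items = sorted({item for ratings in prefs.values() for item in ratings})
    rows = ["user " + " ".join(f"{item:>5}" for item in items)]
    for user in sorted(prefs):
        cells = []
        for item in items:
            rating = prefs[user].get(item)
            cells.append("    -" if rating is None else f"{rating:>5.1f}")
        rows.append(f"{user:>4} " + " ".join(cells))
    return rows


for row in matrix_rows(PREFS):
    print(row)
```

Most cells are empty even in this tiny example. Real preference data is far sparser.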


Visualize

[Figure: users 1 to 5 against books 101 to 107]


A (slightly) bigger example (userID,bookID,rating)

1,101,5.0 3,111,2.5 6,103,2.0

1,102,3.0 4,101,5.0 6,106,4.0

1,103,2.5 4,103,3.0 6,113,3.0

1,109,3.5 4,104,4.5 6,115,5.0

1,112,4.0 4,106,4.0 7,103,4.5

2,101,2.0 4,109,2.0 7,104,2.5

2,102,2.5 4,111,2.5 7,108,4.0

2,103,5.0 5,101,4.0 7,109,3.5

2,104,2.0 5,102,3.0 7,110,3.5

2,107,4.5 5,103,2.0 7,112,2.5

2,113,3.5 5,104,4.0 8,101,2.0

3,101,2.5 5,105,3.5 8,105,4.0

3,104,4.0 5,106,4.0 8,106,4.5

3,105,4.5 5,109,3.0 8,110,3.0

3,107,5.0 5,112,4.0 8,114,5.0

3,115,4.0 6,101,4.5 8,115,3.5


A pictorial representation

[Figure: users 1 to 8 against books 101 to 115]

Clearly, working this out by hand is not a viable option


Mahout to the rescue


What is Apache Mahout

• Apache Mahout
  • A machine learning library
  • Works with Apache Hadoop

• Use cases:
  • Recommenders
  • Clustering
  • Classification


Recommenders in Mahout

• Recommenders use data culled from user behavior

• Recommending using Mahout
  • Similarity between users or items
    • Expressed as a number, typically between 0 and 1 (correlation-based measures such as Pearson range from -1 to 1)
  • Neighborhood of users/items
  • Recommendation using this information and an algorithm
    • Generic
    • Specialized
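Mahout's Taste classes wire these three steps together, for example PearsonCorrelationSimilarity, NearestNUserNeighborhood and GenericUserBasedRecommender. The following is a hand-rolled Python sketch of the same pipeline, not Mahout's code. It uses three pieces, and the neighborhood size of 2 is an assumption:

- Pearson similarity computed over the books two users have both rated.
- A neighborhood made of the 2 nearest users.
- An estimate for each unseen book: the similarity-weighted average of the neighbors' ratings.

```python
from math import sqrt

PREFS = {
    1: {101: 5.0, 102: 3.0, 103: 2.5},
    2: {101: 2.0, 102: 2.5, 103: 5.0, 104: 2.0},
    3: {101: 2.5, 104: 4.0, 105: 4.5, 107: 5.0},
    4: {101: 5.0, 103: 3.0, 104: 4.5, 106: 4.0},
    5: {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0},
}


def pearson(a, b):
    """Pearson correlation over co-rated items; None when undefined."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return None
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def nearest_neighbours(prefs, user, n):
    """The n users with the highest defined similarity, as (similarity, user)."""
    sims = [(pearson(prefs[user], prefs[u]), u) for u in prefs if u != user]
    sims = [(s, u) for s, u in sims if s is not None]
    sims.sort(key=lambda su: (-su[0], su[1]))
    return sims[:n]


def recommend(prefs, user, n_neighbours=2, how_many=3):
    """Similarity-weighted average of the neighbours' ratings for unseen items."""
    totals, weights = {}, {}
    for sim, other in nearest_neighbours(prefs, user, n_neighbours):
        for item, rating in prefs[other].items():
            if item in prefs[user]:
                continue
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    estimates = [(item, totals[item] / weights[item]) for item in totals if weights[item] > 0]
    estimates.sort(key=lambda e: (-e[1], e[0]))
    return estimates[:how_many]


# 104 first (about 4.26), then 106 (about 4.0), then 105 (about 3.5)
print(recommend(PREFS, 1))
```

User 3 shares only book 101 with user 1, so their correlation is undefined and user 3 is skipped. Users 4 and 5 form the neighborhood.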


Similarity

• Various algorithms:
  • Euclidean distance
  • Pearson correlation
  • Cosine measure
  • Spearman correlation
  • Tanimoto coefficient
  • Log-likelihood

• Effectiveness depends on the input data
• The choice influences running time and memory
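The following Python sketch shows three of the listed measures, each applied to two users' rating dictionaries. A few conventions are assumed:

- **Euclidean.** A distance d is turned into a similarity with 1/(1 + d). This is one common convention; Mahout's own formula may differ.
- **Cosine.** It is computed over co-rated items only.
- **Tanimoto.** It ignores rating values entirely and looks only at which items each user rated.

```python
from math import sqrt

PREFS = {
    1: {101: 5.0, 102: 3.0, 103: 2.5},
    2: {101: 2.0, 102: 2.5, 103: 5.0, 104: 2.0},
    3: {101: 2.5, 104: 4.0, 105: 4.5, 107: 5.0},
    4: {101: 5.0, 103: 3.0, 104: 4.5, 106: 4.0},
    5: {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0},
}


def euclidean_similarity(a, b):
    """1 / (1 + distance) over co-rated items; None if nothing is shared."""
    common = set(a) & set(b)
    if not common:
        return None
    d = sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + d)


def cosine_similarity(a, b):
    """Uncentred cosine of the two rating vectors restricted to co-rated items."""
    common = set(a) & set(b)
    if not common:
        return None
    dot = sum(a[i] * b[i] for i in common)
    na = sqrt(sum(a[i] ** 2 for i in common))
    nb = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (na * nb)


def tanimoto(a, b):
    """|A ∩ B| / |A ∪ B| over the sets of rated items; rating values are ignored."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else None


print(euclidean_similarity(PREFS[1], PREFS[4]))  # about 0.67
print(tanimoto(PREFS[1], PREFS[5]))              # 0.5
```

Because Tanimoto looks only at which items were rated, it suits data without rating values, such as read/not-read logs.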


Neighborhood

• Nearest N neighborhood (say, N = 4)
• Threshold neighborhood (say, similarity > 0.8)

[Figure: user U among users 1 to 5, shown once for each kind of neighborhood]
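Mahout offers both kinds of neighborhood as NearestNUserNeighborhood and ThresholdUserNeighborhood. The following minimal Python sketch shows the difference, using made-up similarity scores between U and users 1 to 5.

```python
# Hypothetical similarities between user U and users 1 to 5.
SIMS = {1: 0.9, 2: 0.3, 3: 0.85, 4: 0.6, 5: 0.95}


def by_similarity(sims):
    """Users ordered from most to least similar (ties broken by user ID)."""
    return sorted(sims, key=lambda u: (-sims[u], u))


def nearest_n(sims, n):
    """The n most similar users, however similar they are."""
    return by_similarity(sims)[:n]


def threshold(sims, min_sim):
    """Every user whose similarity exceeds min_sim, however many that is."""
    return [u for u in by_similarity(sims) if sims[u] > min_sim]


print(nearest_n(SIMS, 4))    # [5, 1, 3, 4]
print(threshold(SIMS, 0.8))  # [5, 1, 3]
```

Nearest N always returns N users, even weakly similar ones (user 4 here). A threshold returns only close users, possibly none.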


Recommender

• Recommenders
  • Generic recommender
    • User based
    • Item based
  • Slope-one recommender
  • Singular Value Decomposition based
  • Linear interpolation based
  • Cluster based

• Recommender rescorer
• Recommender evaluator
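Of the specialized recommenders, slope-one is the easiest to show in a few lines. The following is a minimal weighted slope-one sketch in Python, run on the example ratings. It works in two steps:

1. For each pair of books, record the total rating difference and how many users rated both.
2. Predict a book's rating from the user's own ratings, each shifted by those differences and weighted by the co-rating counts.

Clamping estimates to a 1 to 5 star scale is an assumption. Without it, book 107, which rests on a single co-rating, would be estimated at 7.5.

```python
from collections import defaultdict

PREFS = {
    1: {101: 5.0, 102: 3.0, 103: 2.5},
    2: {101: 2.0, 102: 2.5, 103: 5.0, 104: 2.0},
    3: {101: 2.5, 104: 4.0, 105: 4.5, 107: 5.0},
    4: {101: 5.0, 103: 3.0, 104: 4.5, 106: 4.0},
    5: {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0},
}


def build_deviations(prefs):
    """diff_sum[(j, i)] = sum of (r_j - r_i); count[(j, i)] = users who rated both."""
    diff_sum, count = defaultdict(float), defaultdict(int)
    for ratings in prefs.values():
        for j, rj in ratings.items():
            for i, ri in ratings.items():
                if i != j:
                    diff_sum[(j, i)] += rj - ri
                    count[(j, i)] += 1
    return diff_sum, count


def slope_one(prefs, user, lo=1.0, hi=5.0):
    """Weighted slope-one estimates for every item the user hasn't rated."""
    diff_sum, count = build_deviations(prefs)
    mine = prefs[user]
    all_items = {i for ratings in prefs.values() for i in ratings}
    estimates = {}
    for j in all_items - set(mine):
        num = den = 0.0
        for i, ri in mine.items():
            c = count.get((j, i), 0)
            if c:
                # c * (average deviation + own rating) == diff_sum + c * rating
                num += diff_sum[(j, i)] + c * ri
                den += c
        if den:
            estimates[j] = min(hi, max(lo, num / den))  # clamp to the assumed scale
    return sorted(estimates.items(), key=lambda e: (-e[1], e[0]))


print(slope_one(PREFS, 1))  # 107, 105, 106, 104 in that order
```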


A real-life Web application

• News aggregator-cum-reader
  • Fetches news from a news service
  • Shows the news in a uniform UI
  • Lets readers read, like/dislike and comment on news
  • Lets readers link social networks and share

• Make this a personalized newspaper
  • Track user actions
  • Derive and store preferences
  • Generate recommendations
  • Leverage social accounts, etc.
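The talk does not say how user actions become preference values. The following Python sketch shows one hypothetical scheme. Each action gets a weight; the weights are summed per user and article, totals are capped at 5, and non-positive totals are dropped. The weights, the cap and the numeric article IDs (501, 502) are all illustrative assumptions.

```python
# Hypothetical action weights and cap: not taken from the talk.
ACTION_WEIGHTS = {"read": 1.0, "like": 2.0, "comment": 1.5, "share": 2.5, "dislike": -2.0}
MAX_PREF = 5.0


def derive_preferences(events):
    """Turn (user, article, action) events into sorted (user, article, preference) triples."""
    totals = {}
    for user, article, action in events:
        key = (user, article)
        totals[key] = totals.get(key, 0.0) + ACTION_WEIGHTS[action]
    # Keep only positive totals, capped at MAX_PREF.
    return sorted((u, a, min(MAX_PREF, v)) for (u, a), v in totals.items() if v > 0)


events = [
    (1, 501, "read"), (1, 501, "like"),
    (1, 502, "read"), (1, 502, "dislike"),
    (2, 501, "read"), (2, 501, "share"), (2, 501, "comment"), (2, 501, "like"),
]
print(derive_preferences(events))  # [(1, 501, 3.0), (2, 501, 5.0)]
```

The output triples have the same userID,itemID,value shape as the example data, so they can be stored and fed to a recommender directly.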


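Overall design

• Back-end services, each exposed over REST:
  • User and application data (MySQL)
  • News aggregation and storage (HBase)
  • Preferences and recommender (Mahout)
• Controller API (REST) in front of these services
• Clients of the controller API:
  • Web application
  • Phone/tablet applications
  • Third-party applications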


Recommender

• REST service (Grizzly, Tomcat): input user actions, fetch recommendations
• MySQL database: input table dump for the recommender
• Recommender (offline, run periodically)
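The "input table dump" step could look like the following minimal sketch. It writes a preferences table out as userID,itemID,value lines, the plain-text format that Mahout's FileDataModel reads. Python's built-in sqlite3 stands in for MySQL here. The table and column names (preferences, user_id, item_id, preference) are hypothetical.

```python
import csv
import io
import sqlite3


def dump_preferences(conn, out):
    """Write every row of the (hypothetical) preferences table as userID,itemID,value."""
    writer = csv.writer(out, lineterminator="\n")
    rows = conn.execute(
        "SELECT user_id, item_id, preference FROM preferences ORDER BY user_id, item_id"
    )
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# In-memory stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE preferences (user_id INTEGER, item_id INTEGER, preference REAL)")
conn.executemany("INSERT INTO preferences VALUES (?, ?, ?)",
                 [(1, 101, 5.0), (1, 102, 3.0), (2, 101, 2.0)])
buf = io.StringIO()
dump_preferences(conn, buf)
print(buf.getvalue(), end="")
```

A file-based dump keeps the periodic offline recommender run decoupled from the live database.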


How to extract data – one dimension

[Chart: News article readership. Number of news articles on a log scale (1 to 10000) for x = 1 to 9, with data labels 4299, 511, 128, 51, 13, 4, 4, 1, 2]
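Assume the chart counts how many articles have x distinct readers. The following minimal sketch shows that extraction step on a hypothetical log of (user, article) read events.

```python
def readership_histogram(reads):
    """reads: (user, article) events.
    Returns {x: number of articles read by exactly x distinct users}."""
    readers = {}
    for user, article in reads:
        readers.setdefault(article, set()).add(user)
    counts = {}
    for users in readers.values():
        counts[len(users)] = counts.get(len(users), 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical log: user 3 reading article "c" twice still counts once.
reads = [(1, "a"), (2, "a"), (1, "b"), (3, "c"), (3, "c")]
print(readership_histogram(reads))  # {1: 2, 2: 1}
```

A histogram dominated by x = 1 means most articles have no second reader to link users through.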


How to extract data – add dimensions

[Chart: News article readership vs. topic readership. Number of news articles/topics on a log scale (1 to 10000)]
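Adding a dimension means rolling article-level reads up to coarser items such as topics. The following minimal sketch uses a hypothetical article-to-topic mapping. It shows how the roll-up raises the number of user pairs that have something in common, which are the intersections a recommender needs.

```python
from itertools import combinations

# Hypothetical read log and article-to-topic mapping.
READS = {(1, "a1"), (2, "a2"), (3, "a3"), (3, "a1")}
TOPIC_OF = {"a1": "cricket", "a2": "cricket", "a3": "politics"}


def to_topics(reads, topic_of):
    """Replace each article with its topic; duplicates collapse in the set."""
    return {(user, topic_of[article]) for user, article in reads}


def overlapping_pairs(reads):
    """Count user pairs that share at least one item."""
    items_of = {}
    for user, item in reads:
        items_of.setdefault(user, set()).add(item)
    return sum(1 for a, b in combinations(sorted(items_of), 2) if items_of[a] & items_of[b])


print(overlapping_pairs(READS))                       # 1
print(overlapping_pairs(to_topics(READS, TOPIC_OF)))  # 3
```

At the article level, users 1 and 2 read different cricket stories and look unrelated. At the topic level, they overlap.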


How more data helps

[Chart: No. of readers with x articles each vs. no. of readers with x topics each, for x (number of news articles/topics) from 0 to 800]


How more data helps

[Chart: No. of readers with x articles each vs. no. of readers with x topics each, for x (number of news articles/topics) from 5 to 95]


How more data helps

[Chart: No. of readers with x articles each vs. no. of readers with x topics each, for x (number of news articles/topics) from 95 to 395]


Learnings

• Know thy user
  • Frequency of visits
  • Preference logic with respect to the user

• Know thy items
  • There should be enough items per user
  • Maximize items per action
  • There should be enough intersections
  • Items should not be transient

• Use the tweaking abilities
  • Sharpen the saw


Questions

?