scalable collaborative filtering for commerce recommendation

Posted on 28-Nov-2014


DESCRIPTION

Slides for the DataScience.SG Meetup (23 September 2014): an introduction to a scalable collaborative filtering method, built on Hadoop MapReduce, that we developed for a commerce recommendation application.

TRANSCRIPT

Scalable Collaborative Filtering for Commerce Recommendation

Yiqun Hu & Yew Yap Goh {yiqhu, ygoh}@paypal.com

September, 2014

Agenda

• Our Problem: Commerce Recommendation

• Collaborative Filtering (CF) 101

• Mahout’s Solution and Its Problems

• Our Solution

• Summary & Take Away

Commerce Recommendation

Which One To Choose? "We Accept"

A Win-Win Solution

[Figure: a Recommendation Engine between Consumer and Merchant]

• Consumer: Make good use of my money

• Merchant: Grow my business

CF Input - Interaction Matrix

[Figure: Commerce Interaction Matrix (axes: Consumer, Merchant)]
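To make the interaction matrix concrete, here is a minimal sketch that aggregates raw (consumer, merchant) transaction records into a sparse matrix stored as nested dictionaries. The cell value (a transaction count), the function name `build_interaction_matrix`, and the sample records are assumptions for illustration; the names Lee and Kim are borrowed from the toy example later in the slides.

```python
from collections import defaultdict


def build_interaction_matrix(transactions):
    """Aggregate (consumer, merchant) transaction records into a sparse
    consumer x merchant matrix stored as {consumer: {merchant: count}}.

    Assumption: a cell holds the number of transactions between the pair.
    Pairs that never transacted are not stored (they are implicitly zero).
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for consumer, merchant in transactions:
        matrix[consumer][merchant] += 1
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {consumer: dict(row) for consumer, row in matrix.items()}


transactions = [("Lee", 1), ("Lee", 2), ("Lee", 2), ("Kim", 1)]
print(build_interaction_matrix(transactions))
# {'Lee': {1: 1, 2: 2}, 'Kim': {1: 1}}
```

Storing only observed pairs matters at commerce scale, where almost every consumer/merchant cell is empty.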

CF Input - Implicit Feedback

[Figure: the Commerce Interaction Matrix (axes: Consumer, Merchant), from which a Binary Likeness Matrix and a Confidence Matrix are derived]

* Yifan Hu, Yehuda Koren and Chris Volinsky, Collaborative Filtering for Implicit Feedback Datasets, ICDM 2008
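In the cited Hu, Koren and Volinsky paper, an interaction count r_ui becomes a binary preference p_ui (1 if r_ui > 0, else 0) and a confidence c_ui = 1 + alpha * r_ui. The sketch below applies that mapping to one consumer's sparse row; the function name is hypothetical and the default alpha = 40 is only the value the paper reports working well for its data, not a setting taken from these slides.

```python
def implicit_feedback(row, alpha=40.0):
    """Split one consumer's sparse row {merchant: r_ui} into binary
    preferences p_ui and confidences c_ui = 1 + alpha * r_ui.

    Merchants absent from `row` implicitly have p_ui = 0 and c_ui = 1.
    """
    preference = {m: 1 if r > 0 else 0 for m, r in row.items()}
    confidence = {m: 1.0 + alpha * r for m, r in row.items()}
    return preference, confidence


p, c = implicit_feedback({1: 1, 2: 5}, alpha=2.0)
print(p)  # {1: 1, 2: 1}
print(c)  # {1: 3.0, 2: 11.0}
```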

CF Modeling - Matrix Factorization

• U - the models of every consumer

• V - the models of every merchant

• Find the optimal U/V via optimization

[Figure: the Consumer × Merchant matrix factorized into U and V]

• Iteratively update: fix V and update U; fix U and update V

Alternating Least Squares (ALS)

[Objective terms labeled: Data Fitting and Regularization]

* Yifan Hu, Yehuda Koren and Chris Volinsky, Collaborative Filtering for Implicit Feedback Datasets, ICDM 2008
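The slide's equations did not survive extraction. Assuming they follow the cited paper, the objective and the two ALS half-steps can be written in the slide's U/V notation as below. Here U_u is consumer u's model (row u of U), V_i is merchant i's model (row i of V), lambda is the regularization weight, C^u is the diagonal matrix with C^u_ii = c_ui, and p(u) is the vector of consumer u's preferences p_ui; C^i and p(i) are defined analogously for merchant i. This mapping of the paper's symbols onto U and V is ours.

```latex
\min_{U,V}\;
\underbrace{\sum_{u,i} c_{ui}\,\bigl(p_{ui} - U_u^{\top} V_i\bigr)^2}_{\text{Data Fitting}}
\;+\;
\underbrace{\lambda\Bigl(\sum_{u}\lVert U_u\rVert^2 + \sum_{i}\lVert V_i\rVert^2\Bigr)}_{\text{Regularization}}

\text{Fix } V,\ \text{update } U:\quad
U_u = \bigl(V^{\top} C^{u} V + \lambda I\bigr)^{-1} V^{\top} C^{u}\, p(u)

\text{Fix } U,\ \text{update } V:\quad
V_i = \bigl(U^{\top} C^{i} U + \lambda I\bigr)^{-1} U^{\top} C^{i}\, p(i)
```

Each half-step is an ordinary regularized least-squares problem per consumer (or per merchant), which is what makes the alternation tractable.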

Scalable ALS

Annotations on the update equation:

• Constant in the current iteration

• Only need to consider the nonzero entries (two terms)

* Yifan Hu, Yehuda Koren and Chris Volinsky, Collaborative Filtering for Implicit Feedback Datasets, ICDM 2008
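Assuming the annotations refer to the decomposition used in the cited paper, V^T C^u V = V^T V + V^T (C^u - I) V: the V^T V term is constant for the whole iteration, and both (C^u - I) and C^u p(u) are nonzero only on the merchants the consumer actually interacted with. A minimal NumPy sketch of one consumer's update under that assumption follows; `update_user`, `alpha` and `lam` are hypothetical names and values.

```python
import numpy as np


def update_user(V, VtV, row, alpha=40.0, lam=0.1):
    """One consumer's ALS update that touches only the consumer's nonzero entries.

    V    : (n_merchants, k) merchant factors, fixed during this half-step
    VtV  : V.T @ V, computed once per iteration and shared by all consumers
    row  : {merchant_index: r_ui} holding only the observed (nonzero) ratings
    alpha, lam : confidence scale and regularization weight (assumed values)
    """
    k = V.shape[1]
    idx = np.fromiter(row.keys(), dtype=int, count=len(row))
    r = np.fromiter(row.values(), dtype=float, count=len(row))
    c = 1.0 + alpha * r          # confidences c_ui of the observed entries
    Vs = V[idx]                  # merchant vectors this consumer touched
    # V^T C^u V = V^T V + V^T (C^u - I) V; (C^u - I) is zero off the observed entries.
    A = VtV + (Vs.T * (c - 1.0)) @ Vs + lam * np.eye(k)
    # V^T C^u p(u): p_ui = 1 only on observed entries, so only they contribute.
    b = Vs.T @ c
    return np.linalg.solve(A, b)
```

With n_u observed merchants and k factors, each update costs roughly O(k^2 n_u + k^3) instead of O(k^2 n) for the dense form, which is what makes the per-consumer step cheap.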

Open Source Technologies


Implementation in Mahout

• Aggregate user ratings: (1534,2323,2) (1534,1128,3) (1534,5678,1) … → (1534, {1128:3, 2323:2, 5678:1, …})

• Initialize item models: initialize matrix V using the average ratings

• Update user models

• Update item models

• Run K iterations
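The first two steps of the pipeline above can be sketched on a single machine as follows. This is not Mahout's code: the function names are hypothetical, and filling all but the first component of each item model with small random numbers is our assumption (the slide only says V is initialized from the average ratings). The two update steps are per-user and per-item least-squares solves and are omitted here.

```python
import random
from collections import defaultdict


def aggregate_user_ratings(triples):
    """Step 1: group (user, item, rating) triples into {user: {item: rating}}."""
    users = defaultdict(dict)
    for user, item, rating in triples:
        users[user][item] = rating
    # Sort each row by item id, matching the layout shown on the slide.
    return {user: dict(sorted(row.items())) for user, row in users.items()}


def initialize_item_models(triples, k, seed=0):
    """Step 2: build a k-dimensional model per item whose first component is
    the item's average rating. Filling the remaining k-1 components with
    small random numbers is an assumption of this sketch."""
    rng = random.Random(seed)
    totals, counts = defaultdict(float), defaultdict(int)
    for _, item, rating in triples:
        totals[item] += rating
        counts[item] += 1
    return {
        item: [totals[item] / counts[item]] + [rng.uniform(0.0, 0.01) for _ in range(k - 1)]
        for item in totals
    }


triples = [(1534, 2323, 2), (1534, 1128, 3), (1534, 5678, 1)]
print(aggregate_user_ratings(triples))
# {1534: {1128: 3, 2323: 2, 5678: 1}}
```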

How To Parallelize?

[Figure: the Input Transaction Matrix partitioned across Worker 1, Worker 2, Worker 3, …, Worker K-1, Worker K]

For Every Worker …

[Figure: Worker 1 … Compute, then Solve Least Squares Problem]
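A toy version of this naive scheme, simulated with threads: consumers are split across workers, every worker receives the full merchant matrix V, and each solves one least-squares problem per consumer it owns. The round-robin partitioning, all function names, and the plain dense normal equations (used here for brevity rather than speed) are assumptions of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def solve_user(V, row, alpha, lam):
    """Dense normal equations (V^T C^u V + lam*I) x = V^T C^u p(u) for one consumer."""
    n, k = V.shape
    c = np.ones(n)
    p = np.zeros(n)
    for merchant, r in row.items():
        c[merchant] = 1.0 + alpha * r
        p[merchant] = 1.0
    A = V.T @ (c[:, None] * V) + lam * np.eye(k)
    return np.linalg.solve(A, V.T @ (c * p))


def worker(V, users, alpha, lam):
    """One worker: it holds a full copy of V (the broadcast) and solves a
    least-squares problem for every consumer in its partition."""
    return {user: solve_user(V, row, alpha, lam) for user, row in users.items()}


def parallel_user_update(V, user_rows, num_workers=4, alpha=40.0, lam=0.1):
    users = sorted(user_rows)
    # Round-robin partition of consumers across workers (an assumption).
    parts = [{u: user_rows[u] for u in users[w::num_workers]} for w in range(num_workers)]
    U = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Every task receives the whole V: this is the broadcast in question.
        for result in pool.map(lambda part: worker(V, part, alpha, lam), parts):
            U.update(result)
    return U
```

Because each consumer's solve depends only on V and that consumer's row, the result is the same for any number of workers; only the amount of data shipped to each worker changes.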

Simple Illustration

[Figure: the Commerce Interaction Matrix (axes: Consumer, Merchant) split across Worker 1, Worker 2, Worker 3 and Worker 4]

Anything Wrong?!

• Unnecessary broadcast of all item vectors to every worker;

• The total volume of all item vectors can be huge and may not fit into memory;

[Figure: all item vectors sent to every worker, Worker 1 to Worker K]
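Both problems can be quantified with a little arithmetic. The sketch below computes which merchant vectors a worker actually touches (the union of its consumers' rows, using the Lee/Kim/Ha/Oh toy data from the shard slides) and how many bytes a full broadcast costs. The rank k = 50 and float64 storage are hypothetical; 4,386,744 is the merchant count from the scalability table.

```python
def needed_item_ids(partition):
    """Merchant ids a worker actually touches: the union of its consumers' rows."""
    needed = set()
    for row in partition.values():
        needed.update(row)
    return needed


def broadcast_bytes(num_items, k, bytes_per_value=8):
    """Memory every worker needs to hold ALL item vectors (float64 by default)."""
    return num_items * k * bytes_per_value


partitions = [
    {"Lee": {1: 1, 2: 5, 4: 2, 5: 4}, "Kim": {1: 4, 2: 2, 4: 5, 5: 1, 6: 2}},
    {"Ha": {1: 2, 2: 4, 3: 3, 6: 5}, "Oh": {1: 2, 2: 4, 4: 5, 5: 1}},
]
for w, part in enumerate(partitions, start=1):
    print(f"Worker {w} needs merchants {sorted(needed_item_ids(part))}")
# Worker 1 needs merchants [1, 2, 4, 5, 6]
# Worker 2 needs merchants [1, 2, 3, 4, 5, 6]

print(broadcast_bytes(4_386_744, k=50))
# 1754697600
```

On this toy data the saving is small (Worker 1 never touches merchant 3), but with millions of merchants each worker's consumers touch only a tiny fraction of them, while a full broadcast at k = 50 already costs about 1.75 GB per worker.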


ALS Recap


The Solution of Spotify

* Chris Johnson, Erik Bernhardsson, Algorithmic Music Recommendations at Spotify

For Every Map Job …

[Figure: the JobTracker assigns block 1, block 2 and block 3 to Worker 1, Worker 2 and Worker 3]


Simple Illustration

[Figure: outputs of Mapper 1 to Mapper 5 routed to Reducer 1 to Reducer 4]
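The slides do not say how ratings are assigned to mappers or how mapper output reaches reducers. The toy simulation below assumes a K x L grid of blocks chosen by modulo hashing of integer user and item ids, so each mapper only needs one item block's vectors in memory, and routes mapper output to reducers by user id, as a MapReduce shuffle would. All names are hypothetical, and the "map" step just emits ratings where a real ALS job would emit partial sums.

```python
from collections import defaultdict


def assign_blocks(triples, K, L):
    """Partition (user, item, rating) triples into a K x L grid. Block (a, b)
    holds users with user % K == a and items with item % L == b, so the mapper
    for block (a, b) only needs the item vectors of item block b in memory."""
    blocks = defaultdict(list)
    for user, item, rating in triples:
        blocks[(user % K, item % L)].append((user, item, rating))
    return dict(blocks)


def shuffle_to_reducers(mapper_outputs, num_reducers):
    """Route each (user, value) pair emitted by any mapper to reducer
    user % num_reducers and group the values by user, like the MapReduce shuffle."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for user, value in pairs:
            reducers[user % num_reducers][user].append(value)
    return [dict(r) for r in reducers]


triples = [(0, 0, 1), (0, 3, 2), (1, 1, 5), (2, 2, 1), (3, 0, 4)]
blocks = assign_blocks(triples, K=2, L=2)
# Toy "map": each block emits (user, rating); a real job would emit partial sums.
mapper_outputs = [[(u, r) for u, _, r in block] for block in blocks.values()]
reducers = shuffle_to_reducers(mapper_outputs, num_reducers=2)
print(reducers)
# [{0: [1, 2], 2: [1]}, {1: [5], 3: [4]}]
```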

Inside A Mapper

[Figure: a Mapper loading data from the File System into Memory]

Disadvantages:

• Unnecessary file copy to all nodes before the job;

• Each map function call may cause a context switch with inefficient I/O loading;

Our Solution - Shard Partition

[Figure: the Commerce Interaction Matrix (axes: Consumer, Merchant) split into Shard 1, Shard 2 and Shard 3]

Shard 1: (Lee, {1:1, 2:5}) (Kim, {1:4, 2:2}) (Ha, {1:2, 2:4}) (Oh, {1:2, 2:4})

Shard 2: (Lee, {4:2}) (Kim, {4:5}) (Ha, {3:3}) (Oh, {4:5})

Shard 3: (Lee, {5:4}) (Kim, {5:1, 6:2}) (Ha, {6:5}) (Oh, {5:1})
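The toy data suggests each shard holds a contiguous range of merchant ids (1-2, 3-4, 5-6) for every consumer. Under that assumption, the sketch below splits full consumer rows into shards and reproduces the three shards shown above; `shard_by_merchant` and its `upper_bounds` parameter are hypothetical.

```python
from bisect import bisect_left


def shard_by_merchant(user_rows, upper_bounds):
    """Split every consumer's row into shards by merchant-id range.

    upper_bounds = [2, 4, 6] puts merchants 1-2 in shard 1, 3-4 in shard 2 and
    5-6 in shard 3. Merchant ids are assumed to be integers no larger than the
    last bound; the consumer key is repeated in every shard it appears in.
    """
    shards = [{} for _ in upper_bounds]
    for user, row in user_rows.items():
        for merchant, value in row.items():
            s = bisect_left(upper_bounds, merchant)  # first bound >= merchant
            shards[s].setdefault(user, {})[merchant] = value
    return shards


user_rows = {
    "Lee": {1: 1, 2: 5, 4: 2, 5: 4},
    "Kim": {1: 4, 2: 2, 4: 5, 5: 1, 6: 2},
    "Ha": {1: 2, 2: 4, 3: 3, 6: 5},
    "Oh": {1: 2, 2: 4, 4: 5, 5: 1},
}
for n, shard in enumerate(shard_by_merchant(user_rows, [2, 4, 6]), start=1):
    print(f"Shard {n}:", shard)
```

A job working on one shard then needs only that shard's merchant vectors, rather than all of V.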

Our Solution - Parallel Shard Processing

[Figure: Shard 1, Shard 2 and Shard 3 of the Commerce Interaction Matrix (axes: Consumer, Merchant), each processed by its own MapReduce Job for Shard 1, 2 and 3, followed by a Global MapReduce Job for ALS]

Stage 1: Compute the individual contributions of each rating

Stage 2: Aggregate all contributions for every user and update their models in parallel
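The two stages can be sketched in memory as below. What exactly a rating "contributes" is not stated on the slides; this sketch assumes the implicit-feedback decomposition from the cited Hu et al. paper, where a rating with confidence c adds (c - 1) v v^T to the consumer's left-hand side and c v to the right-hand side. Function names are hypothetical.

```python
import numpy as np


def stage1_contributions(shard, V_shard, alpha):
    """Stage 1 (one job per shard): emit the contribution of every rating to its
    consumer's normal equations. Only this shard's merchant vectors are needed.

    shard   : {consumer: {merchant: r_ui}} for the merchants in this shard
    V_shard : {merchant: k-dim vector} for the same merchants
    """
    emitted = []
    for user, row in shard.items():
        for merchant, r in row.items():
            v = V_shard[merchant]
            c = 1.0 + alpha * r
            # (c - 1) v v^T feeds V^T (C^u - I) V; c v feeds V^T C^u p(u) since p_ui = 1.
            emitted.append((user, (c - 1.0) * np.outer(v, v), c * v))
    return emitted


def stage2_aggregate_and_solve(emitted_by_shard, VtV, lam):
    """Stage 2 (global job): sum each consumer's contributions over all shards,
    add the shared V^T V and the regularizer, and solve for the consumer model."""
    k = VtV.shape[0]
    A_sum, b_sum = {}, {}
    for emitted in emitted_by_shard:
        for user, A, b in emitted:
            A_sum[user] = A_sum.get(user, 0.0) + A
            b_sum[user] = b_sum.get(user, 0.0) + b
    return {u: np.linalg.solve(VtV + A_sum[u] + lam * np.eye(k), b_sum[u]) for u in A_sum}
```

Summing contributions is associative, so the sharded result equals a single-machine solve of the same normal equations; the gain is that no job ever holds more than one shard's merchant vectors plus the small k x k matrix V^T V.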


Scalability Comparison

Method          #Customers   #Merchants   #Transactions   Capacity
Apache Mahout    4,790,651    1,195,890      23,721,103   Fail
Our Method      45,948,109    4,386,744     324,084,408   Success

More than 10x scalability improvement!
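The size of the improvement can be checked directly from the table. Since Mahout's row records the data size at which it failed, these ratios compare our successful run with Mahout's failing point; the "more than 10x" headline presumably refers to the transaction count.

```python
mahout = {"customers": 4_790_651, "merchants": 1_195_890, "transactions": 23_721_103}
ours = {"customers": 45_948_109, "merchants": 4_386_744, "transactions": 324_084_408}

for key in mahout:
    print(f"{key}: {ours[key] / mahout[key]:.2f}x")
# customers: 9.59x
# merchants: 3.67x
# transactions: 13.66x
```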

Summary

• Recommendation Problem

• Collaborative Filtering by Matrix Factorization

• Alternating Least Squares (ALS)

• A Scalable Solution

Thank You!

Question & Answer
