vertical recommendation using collaborative filtering

Large-scale Vertical Recommendation

Upload: gorass

Post on 14-Aug-2015


Large-scale Vertical Recommendation

Prepared by

Goh Yew Yap, Hu Yiqun and Chen Yanhui

© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.

Table of Contents

Collaborative Filtering

Technology

Recommendation

Evaluation Method

Example

Collaborative Filtering 101

Commerce Interaction Matrix

We want to model the affinity between consumers and merchants

The more transactions occur between a consumer and a merchant, the more confident we are in the relationship

[Figure: the consumer × merchant commerce interaction matrix, shown as a Likeness Matrix and a Confidence Matrix]
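A minimal sketch of how the likeness and confidence matrices could be derived from raw transaction counts. It assumes the usual implicit-feedback convention (likeness is 1 wherever any transaction occurred, confidence grows linearly with the count). The counts are made up for illustration, and `alpha = 40` simply mirrors the `--alpha 40` flag passed in gen-block-als.sh.

```python
import numpy as np

# Hypothetical transaction counts: rows are consumers, columns are merchants.
counts = np.array([
    [3, 0, 1],
    [0, 5, 0],
])

alpha = 40.0  # assumed confidence scaling factor

# Likeness: 1 if the consumer ever transacted with the merchant, else 0.
likeness = (counts > 0).astype(float)

# Confidence: every cell starts at 1 and grows with the number of transactions.
confidence = 1.0 + alpha * counts

print(likeness)
print(confidence)
```

Cells with no transactions keep a confidence of 1, so they still contribute weakly as negative evidence rather than being ignored.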

Collaborative Filtering 101: A matrix factorization method

[Equation: objective composed of a Data Fitting term and a Regularization term]
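Assuming the standard confidence-weighted formulation for implicit feedback, an objective with a data-fitting part and a regularization part could be written as follows. The symbols are assumptions: $p_{ij}$ is the likeness entry, $c_{ij}$ the confidence entry, $u_i$ and $v_j$ the consumer and merchant vectors, and $\lambda$ the regularization weight.

```latex
\min_{U,V} \;
\underbrace{\sum_{i,j} c_{ij}\,\bigl(p_{ij} - u_i^{\top} v_j\bigr)^2}_{\text{data fitting}}
\;+\;
\underbrace{\lambda \Bigl(\sum_i \lVert u_i \rVert^2 + \sum_j \lVert v_j \rVert^2\Bigr)}_{\text{regularization}}
```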

Alternating Least Squares

Iteratively update:

Fix V and update U: [equation not preserved]

Fix U and update V: [equation not preserved]

Each update has a Data Fitting part and a Regularization part.
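The alternating updates can be sketched in a few lines of NumPy. This is a small dense illustration under the assumption of a confidence-weighted squared-error objective with L2 regularization, not the production implementation. The function names, the toy counts, and the hyperparameters are all hypothetical.

```python
import numpy as np

def als_implicit(P, C, d=2, lam=0.1, iters=10, seed=0):
    """Alternate closed-form ridge solves for U (V fixed) and V (U fixed)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = P.shape
    U = rng.normal(scale=0.1, size=(n_users, d))
    V = rng.normal(scale=0.1, size=(n_items, d))
    I = np.eye(d)
    for _ in range(iters):
        # Fix V and update U: u_i = (V^T C_i V + lam I)^-1 V^T C_i p_i
        for i in range(n_users):
            Ci = np.diag(C[i])
            U[i] = np.linalg.solve(V.T @ Ci @ V + lam * I, V.T @ Ci @ P[i])
        # Fix U and update V: v_j = (U^T C_j U + lam I)^-1 U^T C_j p_j
        for j in range(n_items):
            Cj = np.diag(C[:, j])
            V[j] = np.linalg.solve(U.T @ Cj @ U + lam * I, U.T @ Cj @ P[:, j])
    return U, V

def loss(P, C, U, V, lam):
    """Confidence-weighted squared error plus L2 regularization."""
    fit = np.sum(C * (P - U @ V.T) ** 2)
    reg = lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return float(fit + reg)

counts = np.array([[3, 0, 1], [0, 5, 0], [2, 0, 4]])  # toy data
P = (counts > 0).astype(float)
C = 1.0 + 40.0 * counts

U1, V1 = als_implicit(P, C, iters=1)
U10, V10 = als_implicit(P, C, iters=10)
print(loss(P, C, U1, V1, 0.1), loss(P, C, U10, V10, 0.1))
```

Because each half-step solves its subproblem exactly, the objective cannot increase from one iteration to the next.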

Scalable Collaborative Filtering: Improve the scalability of ALS

[Figure: the consumer × merchant commerce interaction matrix is split into Shard 1, Shard 2, and Shard 3, each handled by its own MapReduce job]

Stage 1: Compute the individual contributions of each rating

Stage 2: Aggregate all contributions for every user and update their models in parallel
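The two stages can be illustrated for a single consumer. In this sketch (function names, shard layout, and data are hypothetical) each rating emits a small matrix and vector, and the aggregation step sums them and solves one linear system. It assumes the confidence-weighted least-squares update, and it checks that the sharded result matches a direct solve.

```python
import numpy as np

def contribution(v, c, p):
    """Stage 1: one rating's contribution to a consumer's normal equations."""
    return c * np.outer(v, v), c * p * v

def solve_user(contribs, lam, d):
    """Stage 2: sum all contributions for one consumer and solve for the vector."""
    A = lam * np.eye(d)
    b = np.zeros(d)
    for A_k, b_k in contribs:
        A += A_k
        b += b_k
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
d, lam = 2, 0.1
V = rng.normal(size=(6, d))                        # merchant vectors, held fixed
c = np.array([121.0, 1.0, 41.0, 1.0, 81.0, 1.0])   # one consumer's confidence row
p = (c > 1).astype(float)                          # matching likeness row

# Merchants split into three hypothetical shards; each shard emits independently.
shards = [[0, 1], [2, 3], [4, 5]]
contribs = [contribution(V[j], c[j], p[j]) for shard in shards for j in shard]
u_sharded = solve_user(contribs, lam, d)

# Reference: solve the same system without sharding.
u_direct = np.linalg.solve(V.T @ np.diag(c) @ V + lam * np.eye(d), V.T @ (c * p))
print(np.allclose(u_sharded, u_direct))
```

Since the contributions are simply added, the order in which shards finish does not affect the result.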

Reference

Scalable Collaborative Filtering for Commerce Recommendation

http://www.slideshare.net/jekky_yiqun/scalable-collaborative-filtering-for-commerce-recommendation

What can we use the CF model for?


What does CF Learn?

[Figure: example factor matrices U and V^T; the numeric entries are not recoverable from the slide]

Each vector of U models a consumer by d implicit attributes

Each vector of V models a merchant by d implicit attributes

Score = U_i^T · V_j = 1.162

Can we use the existing model to recommend the next best vertical?

Model V

[Figure: merchant model V with example numeric entries (not recoverable from the slide); merchants M1, M37, and M99 are marked]

If we know that M1, M37, and M99 belong to a specific vertical (e.g., electronics), then we can group their feature vectors and use them to calculate the vertical score.
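A minimal sketch of a vertical score computed from grouped merchant vectors. The vectors are made up, and averaging the per-merchant scores is an assumption: the slides do not say how the group is reduced to a single number.

```python
import numpy as np

# Hypothetical merchant model V: merchant id -> d-dimensional feature vector.
V = {
    "M1":  np.array([0.9, 0.1]),
    "M37": np.array([0.7, 0.3]),
    "M99": np.array([0.8, 0.2]),
    "M5":  np.array([0.1, 0.9]),
}
electronics = ["M1", "M37", "M99"]  # merchants known to belong to the vertical

def vertical_score(u, merchant_ids, model):
    """Score each merchant in the group against the consumer, then average.

    Averaging is an assumed aggregation rule, not one stated in the slides.
    """
    scores = [float(u @ model[m]) for m in merchant_ids]
    return sum(scores) / len(scores)

u = np.array([1.0, 0.5])  # hypothetical consumer vector
print(vertical_score(u, electronics, V))
```

Other reductions (maximum, sum of the top few merchants) would fit the same interface.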

Next Best Targeting

[Figure: layered architecture with three layers (Applications, Analytics Services, Analytics Infra). Labeled components: One Touch, Large-scale CF Model, Next Best Merchant, Next Best Vertical, Next Best Corridor, Next Best ……]

Recommendation Flow

Generate CF Module:
• Cleanup old matrix
• Generate weekly matrix
• Merge matrix
• Generate CF Model

Then:
• Generate Recommendation

Next Best Vertical/Corridor Recommendation Flow

Generate CF Module:
• Cleanup old matrix
• Generate weekly matrix
• Merge matrix
• Generate CF Model

Then:
• Grouping Vertical / Corridor
• Generate Recommendation
• Ranking Vertical / Corridor

How to implement this?

Technology Stack

• Hadoop / MapReduce
• Mahout
• Pig
• Python / Shell
• HDFS


Running the steps manually, or putting everything in one script:

./bin/cleanup-old-matrix.sh <parameters...>
pig gen-matrix.pig -param <parameters...>
pig merge-matrix.pig -param <parameters...>
./bin/gen-block-als.sh <parameters...>
./bin/gen-block-recomd-consumer.sh <parameters...>

Define an individual step

{
  "comment": "-----------------Example------------------------------",
  "type": "pig|shell|module",
  "enable": "true|false",
  "script": "script-name|module-name",
  "args": {
    "args-1": "value-1",
    "args-n": "value-n"
  }
}

Define ALS step in Generate CF Module

{
  "comment": "-----------------Gen BLOCK ALS ------------------------------",
  "type": "shell",
  "enable": "true",
  "script": "gen-block-als.sh",
  "args": {
    "auto_delete_output": "$auto_delete_output",
    "hdfs_input_dir": "$base_dir/$app_name/input_matrix",
    "hdfs_output_dir": "$base_dir/$app_name/model",
    "numIterations": "20"
  }
}
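A minimal sketch of how a runner might turn such a step definition into a command line. The slides do not show the runner itself, so everything here is an assumption: the `build_command` helper, the use of `string.Template` for `$variable` substitution, the `./bin/` prefix, and the mapping of each `args` entry to a `--key value` flag (shell) or a `-param key=value` option (Pig).

```python
import json
from string import Template

STEP_JSON = """
{
  "type": "shell",
  "enable": "true",
  "script": "gen-block-als.sh",
  "args": {
    "hdfs_input_dir": "$base_dir/$app_name/input_matrix",
    "hdfs_output_dir": "$base_dir/$app_name/model",
    "numIterations": "20"
  }
}
"""

def build_command(step, variables):
    """Return the argv list for one step, or None if the step is disabled."""
    if step.get("enable", "true") != "true":
        return None
    # Resolve $placeholders in every argument value.
    resolved = {k: Template(v).substitute(variables) for k, v in step["args"].items()}
    if step["type"] == "shell":
        cmd = ["./bin/" + step["script"]]          # assumed script location
        for key, value in resolved.items():
            cmd += ["--" + key, value]             # assumed flag convention
    elif step["type"] == "pig":
        cmd = ["pig"]
        for key, value in resolved.items():
            cmd += ["-param", key + "=" + value]
        cmd.append(step["script"])
    else:
        raise ValueError("unsupported step type: " + step["type"])
    return cmd

step = json.loads(STEP_JSON)
print(build_command(step, {"base_dir": "/data", "app_name": "nbv"}))
```

The resulting list could be handed to `subprocess.run`, which avoids shell quoting issues with HDFS paths.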


gen-block-als.sh

#!/bin/bash
...
while [[ $# -gt 1 ]]; do
  key="$1"
  shift

  case $key in
    --numIterations)
      NUM_ITERATIONS="$1"
      shift
      ;;
    ...

# run distributed ALS-WR to factorize the rating matrix defined by the training set
$MAHOUT blockParallelALS --input ${DATASET_DIR}/trainingSet/ --output ${WORK_DIR}/als/out \
  --tempDir ${WORK_DIR}/als/tmp --numFeatures ${NUM_FEATURES} --numIterations ${NUM_ITERATIONS} \
  --lambda ${LAMBDA} --numThreadsPerSolver 1 \
  --usesLongIDs ${USES_LONG_ID} --alpha 40 --implicitFeedback true --numUserBlocks ${NUM_BLOCKS} \
  --numItemBlocks ${NUM_BLOCKS} --queueName ${QUEUE_NAME}

BlockParallelALSFactorizationJob

public class BlockParallelALSFactorizationJob extends AbstractJob {
  …
  for (int currentIteration = 0; currentIteration < numIterations; currentIteration++) {

    /* broadcast M, read A row-wise, recompute U row-wise */
    log.info("Checking U (iteration {}/{})", currentIteration, numIterations);
    runSolver(pathToUserRatings(), pathToU(currentIteration),
        pathToM(currentIteration - 1), pathToPrefix("UYtY", currentIteration),
        currentIteration, "U", numItems, numUserBlocks, numItemBlocks, fs);

    /* broadcast U, read A' row-wise, recompute M row-wise */
    log.info("Checking M (iteration {}/{})", currentIteration, numIterations);
    runSolver(pathToItemRatings(), pathToM(currentIteration),
        pathToU(currentIteration), pathToPrefix("MYtY", currentIteration),
        currentIteration, "M", numUsers, numUserBlocks, numItemBlocks, fs);
  }

BlockParallelALSFactorizationJob

for (int blockId = 0; blockId < numBlocks2; blockId++) {
  // process each block
  Path blockRatings = new Path(ratings.toString() + "/" + Integer.toString(blockId) + "-r-*");
  Path blockRatingsOutput = new Path(getTempPath(blockOutputName).toString() + "/" +
      Integer.toString(blockId));
  Path blockFixUorM = new Path(pathToUorM.toString() + "/" + Integer.toString(blockId) + "-*-*");

  if (!fs.exists(new Path(blockRatingsOutput.toString() + "/_SUCCESS"))) {
    Job solveBlockUorI = prepareJob(blockRatings, blockRatingsOutput,
        SequenceFileInputFormat.class,
        MultithreadedSharingMapper.class, IntWritable.class,
        ALSContributionWritable.class, SequenceFileOutputFormat.class,
        name + " blockId: " + blockId);

BlockSolveImplicitFeedbackMapper & Solver

@Override
protected void map(IntWritable userOrItemID, VectorWritable ratingsWritable, Context ctx)
    throws IOException, InterruptedException {
  BlockImplicitFeedbackAlternatingLeastSquaresSolver solver = getSharedInstance();
  uiOrmj.setA(solver.solveA(ratingsWritable.get()));
  uiOrmj.setb(solver.solveb(ratingsWritable.get()));
  ctx.write(userOrItemID, uiOrmj);
}

public Matrix solveA(Vector ratings) throws IOException {
  return getYtransponseCuMinusIYPlusLambdaI(ratings);
}

public Matrix solveb(Vector ratings) throws IOException {
  return getYtransponseCuPu(ratings);
}
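Reading the method names literally, `solveA` appears to build the left-hand matrix and `solveb` the right-hand vector of the per-user normal equations in implicit-feedback ALS. Under that reading (an interpretation of the names, not something the slide states), with $Y$ the fixed factor matrix, $C_u$ the diagonal confidence matrix, and $p_u$ the likeness vector of user $u$:

```latex
A_u = Y^{\top}(C_u - I)\,Y + \lambda I, \qquad
b_u = Y^{\top} C_u\, p_u, \qquad
\bigl(Y^{\top} Y + A_u\bigr)\, x_u = b_u
```

The shared $Y^{\top} Y$ term only needs to be computed once per half-iteration, which may be what the `UYtY` and `MYtY` path prefixes in the job refer to.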

Next Best Vertical

Revised Recommendation Flow For Next Best Vertical

Generate CF Model Module:
• Cleanup old matrix
• Generate weekly matrix
• Merge matrix
• Generate CF Model

Then:
• Vertical Grouping
• Merchant Recommendation for respective Vertical
• Vertical Score Aggregation
• Vertical Merchant Merge
• Vertical Ranking
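The aggregation and ranking steps at the end of this flow can be sketched as follows. The function name, the sample scores, and the choice of the mean as the aggregation rule are assumptions made for illustration; the slides name the steps without specifying them.

```python
import heapq
from collections import defaultdict

def rank_verticals(merchant_scores, merchant_vertical, top_n=3):
    """Aggregate per-merchant scores by vertical and return the top_n verticals.

    The mean is an assumed aggregation rule.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for merchant, score in merchant_scores.items():
        vertical = merchant_vertical[merchant]
        totals[vertical] += score
        counts[vertical] += 1
    means = {v: totals[v] / counts[v] for v in totals}
    return heapq.nlargest(top_n, means, key=means.get)

# Hypothetical recommendation scores for one consumer.
merchant_scores = {"M1": 0.95, "M37": 0.85, "M5": 0.40, "M8": 0.70, "M9": 0.20}
merchant_vertical = {
    "M1": "Electronics", "M37": "Electronics",
    "M5": "Fashion", "M8": "Travel", "M9": "Healthcare",
}
print(rank_verticals(merchant_scores, merchant_vertical))
```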

Vertical Grouping

[Figure: the merchant model M is split into per-vertical models: M - Electronic, M - Fashion, M - Travel]

Define Vertical Grouping Step

{
  "comment": "-----------------Group Vertical M ------------------------------",
  "type": "pig",
  "enable": "true",
  "script": "filter-by-verticals.pig",
  "args": {
    "auto_delete_output": "$auto_delete_output",
    "from_week": "$to_week",
    "to_week": "$to_week",
    "vertical_table": "$vertical_table",
    "hdfs_input_dir": "$base_dir/$app_name/model/als/out/M",
    "hdfs_output_dir": "$base_dir/$app_name/model/vertical/M"
  }
}

filter-by-verticals.pig

-- separate model U or M by vertical
...
pairs = LOAD '$hdfs_input_dir' USING $SEQFILE_LOADER (
    '-c $INT_CONVERTER',
    '-c $VECTOR_CONVERTER -- -dense -cardinality $numFeatures'
) AS (key: int, value);

-- Join merchant models with vertical info
dw_vertical = LOAD '$vertical_table' USING PigStorage(',') AS (merchant_id:chararray, vertical:chararray);

shortid_vertical = FOREACH dw_vertical GENERATE (int) LongIdToIndex(merchant_id) as shortid, vertical as vertical;

join_list = JOIN shortid_vertical BY shortid LEFT, pairs BY key;

-- Filter merchant models M for every vertical
electronics_list = FILTER join_list BY shortid_vertical::vertical == 'Electronics';
...

-- Generate merchant models M for every vertical
electronics_list = FOREACH electronics_list GENERATE key, value;
...

STORE electronics_list INTO '$hdfs_output_dir/electronics/' USING $SEQFILE_STORAGE ('-c $INT_CONVERTER', '-c $VECTOR_CONVERTER');
...
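For readers less familiar with Pig, the same join-and-filter idea can be sketched in plain Python on in-memory data. The function name and sample values are hypothetical. Unlike the LEFT join above, this sketch simply skips merchants that have a vertical but no model vector.

```python
from collections import defaultdict

def group_by_vertical(merchant_model, merchant_vertical):
    """Split {merchant_id: vector} into {vertical: {merchant_id: vector}}.

    Merchants listed in the vertical table but missing from the model are skipped.
    """
    groups = defaultdict(dict)
    for merchant_id, vertical in merchant_vertical.items():
        if merchant_id in merchant_model:
            groups[vertical][merchant_id] = merchant_model[merchant_id]
    return dict(groups)

# Hypothetical merchant model and vertical table.
model = {1: [0.9, 0.1], 37: [0.7, 0.3], 5: [0.1, 0.9]}
verticals = {1: "Electronics", 37: "Electronics", 5: "Fashion", 42: "Travel"}
print(group_by_vertical(model, verticals))
```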


How to evaluate?

Evaluation Method

Train
• Predict top 3 verticals for selected group of consumers
• Use 1 year of data to train the CF model (Q4 2013 to Q3 2014)

Validation
• Use 3 months of data for validation (Q4 2014)
• If a consumer transacts with a merchant belonging to one of the predicted verticals, the prediction is considered accurate

Result: 70.47%
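The accuracy criterion can be written down as a short function. This is a sketch with made-up consumers. Counting consumers with no validation-period transactions as misses is an assumption, since the slide does not say how they were treated.

```python
def hit_rate(predicted, actual):
    """Fraction of consumers with at least one predicted vertical seen in validation.

    predicted: {consumer: list of predicted verticals}
    actual:    {consumer: set of verticals transacted with in the validation period}
    """
    if not predicted:
        return 0.0
    hits = sum(
        1 for consumer, verticals in predicted.items()
        if set(verticals) & actual.get(consumer, set())
    )
    return hits / len(predicted)

# Hypothetical predictions and validation-period behavior.
predicted = {
    "c1": ["Healthcare", "Retail", "Fashion"],
    "c2": ["Fashion", "Retail", "Home & Garden"],
    "c3": ["Travel", "Electronics", "Retail"],
}
actual = {
    "c1": {"Retail"},
    "c2": {"Retail", "Fashion"},
    "c3": {"Education"},
}
print(hit_rate(predicted, actual))
```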

Example 1

Previous Purchase Behavior
• Transacted with a healthcare merchant

Next Best Vertical Recommendation: Healthcare, Retail, Fashion
• Healthcare merchant recommendation
• Retail merchant recommendation
• Retail merchant recommendation

Purchase occurred in Validation Period (Q4 2014)
• Transacted with a retail merchant selling health supplies

Example 2

Previous Purchase Behavior
• Transacted with a fashion merchant
• Transacted with an education merchant
• Transacted with a fashion merchant

Next Best Vertical Recommendation: Fashion, Retail, Home & Garden
• Retail merchant recommendation
• Fashion merchant recommendation

Purchase occurred in Validation Period (Q4 2014)
• Transacted with a retail merchant selling baby products
• Transacted with a fashion merchant selling children's clothing

The End