
Large-scale Vertical Recommendation

Prepared by
Goh Yew Yap, Hu Yiqun and Chen Yanhui

© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
Table of Contents
Collaborative Filtering
Technology
Recommendation
Evaluation Method
Example

Collaborative Filtering 101
Commerce Interaction Matrix
We want to model the affinity between consumers and merchants
The more transactions occur, the more confident we are in the relationship
[Figure: consumer × merchant commerce interaction matrix, shown as a Likeness Matrix and a Confidence Matrix]
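A minimal sketch of how the two matrices can be derived from raw transaction counts. It assumes the implicit-feedback convention of Hu, Koren and Volinsky, which Mahout's implicit-feedback ALS follows: likeness p = 1 if the consumer transacted with the merchant at least once (else 0), and confidence c = 1 + alpha * r for r transactions. alpha = 40 is taken from the `--alpha 40` flag in gen-block-als.sh; the class name and sample counts are hypothetical.

```java
import java.util.Arrays;

/** Sketch: derive likeness (preference) and confidence matrices from transaction counts. */
public class InteractionMatrices {

    /** Likeness p_ij: 1 if consumer i transacted with merchant j at least once, else 0. */
    public static double[][] likeness(int[][] txCounts) {
        double[][] p = new double[txCounts.length][];
        for (int i = 0; i < txCounts.length; i++) {
            p[i] = new double[txCounts[i].length];
            for (int j = 0; j < txCounts[i].length; j++) {
                p[i][j] = txCounts[i][j] > 0 ? 1.0 : 0.0;
            }
        }
        return p;
    }

    /** Confidence c_ij = 1 + alpha * r_ij: more transactions, more confidence. */
    public static double[][] confidence(int[][] txCounts, double alpha) {
        double[][] c = new double[txCounts.length][];
        for (int i = 0; i < txCounts.length; i++) {
            c[i] = new double[txCounts[i].length];
            for (int j = 0; j < txCounts[i].length; j++) {
                c[i][j] = 1.0 + alpha * txCounts[i][j];
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int[][] tx = {{3, 0, 1}, {0, 2, 0}};  // hypothetical counts: 2 consumers x 3 merchants
        System.out.println(Arrays.deepToString(likeness(tx)));          // [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
        System.out.println(Arrays.deepToString(confidence(tx, 40.0)));  // [[121.0, 1.0, 41.0], [1.0, 81.0, 1.0]]
    }
}
```

Unobserved pairs keep confidence 1 rather than 0, so the model still learns weakly from the absence of transactions.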

Collaborative Filtering 101: A matrix factorization method
Objective: Data Fitting term + Regularization term
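Assuming the implicit-feedback formulation of Hu, Koren and Volinsky (consistent with the `--implicitFeedback true` and `--alpha` flags passed to Mahout in gen-block-als.sh), the objective can be written as follows, where u_i is consumer i's row of U, v_j is merchant j's row of V, r_ij is the number of transactions, α is the confidence scale and λ the regularization weight. The regularizer is written as a plain λ term, as the solver method name getYtransponseCuMinusIYPlusLambdaI suggests; this is a sketch, not the deck's exact formula.

```latex
\min_{U,V}\;
\underbrace{\sum_{i,j} c_{ij}\bigl(p_{ij} - u_i^{\top} v_j\bigr)^2}_{\text{data fitting}}
\;+\;
\underbrace{\lambda\Bigl(\sum_i \lVert u_i \rVert^2 + \sum_j \lVert v_j \rVert^2\Bigr)}_{\text{regularization}},
\qquad
p_{ij} = \begin{cases} 1 & r_{ij} > 0 \\ 0 & r_{ij} = 0 \end{cases},
\qquad
c_{ij} = 1 + \alpha\, r_{ij}
```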

Alternating Least Squares
Iteratively update:
• Fix V and update U
• Fix U and update V
Each update minimizes the Data Fitting + Regularization objective with the other matrix held fixed.
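With V fixed, the objective splits into an independent regularized least-squares problem per consumer, so each step has a closed form (and symmetrically for merchants). A sketch, assuming the implicit-feedback setup with m consumers and n merchants, rows u_i of U and v_j of V, likeness p_ij = 1 if r_ij > 0 (else 0), confidence c_ij = 1 + α·r_ij and regularization weight λ. The last line is the identity behind getYtransponseCuMinusIYPlusLambdaI and getYtransponseCuPu in the solver code: since c_ij - 1 = 0 whenever r_ij = 0, only the merchants a consumer actually transacted with contribute beyond the shared VᵀV term.

```latex
\begin{aligned}
u_i &= \bigl(V^{\top} C^{i} V + \lambda I\bigr)^{-1} V^{\top} C^{i} p(i),
\qquad C^{i} = \operatorname{diag}(c_{i1}, \dots, c_{in}),\\
v_j &= \bigl(U^{\top} C^{j} U + \lambda I\bigr)^{-1} U^{\top} C^{j} p(j),
\qquad C^{j} = \operatorname{diag}(c_{1j}, \dots, c_{mj}),\\
V^{\top} C^{i} V &= V^{\top} V + \sum_{j:\, r_{ij} > 0} (c_{ij} - 1)\, v_j v_j^{\top},
\qquad V^{\top} C^{i} p(i) = \sum_{j:\, r_{ij} > 0} c_{ij}\, v_j .
\end{aligned}
```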

Scalable Collaborative Filtering: Improving the scalability of ALS
[Figure: the consumer × merchant commerce interaction matrix split into Shard 1, Shard 2 and Shard 3, each processed by its own MapReduce job]
Stage 1: Compute the individual contributions of each rating
Stage 2: Aggregate all contributions for every user and update their models in parallel
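The two stages can be sketched in plain Java for the consumer-side update with V fixed. Assumptions, not taken from the slides: ratings are sharded by merchant; each stage-1 job emits, per consumer, the partial sums a = Σ(c-1)·v_j·v_jᵀ and b = Σ c·v_j over the ratings in its shard (the same A/b split as solveA/solveb in BlockSolveImplicitFeedbackMapper); stage 2 adds those partials, adds the shared VᵀV + λI, and solves the d×d system. Class names, sample data and the dense Gaussian elimination are hypothetical stand-ins, not the production code.

```java
import java.util.*;

/** Sketch: two-stage, sharded update of consumer vectors u_i with merchant vectors V held fixed. */
public class ShardedAlsSketch {

    /** One observed interaction: a consumer transacted r > 0 times with a merchant. */
    public static final class Rating {
        final int consumer, merchant; final double r;
        public Rating(int consumer, int merchant, double r) {
            this.consumer = consumer; this.merchant = merchant; this.r = r;
        }
    }

    /** Partial normal equations for one consumer: a = sum (c - 1) v v^T, b = sum c v. */
    public static final class Contribution {
        final double[][] a; final double[] b;
        Contribution(int d) { a = new double[d][d]; b = new double[d]; }
        void add(Contribution o) {
            for (int x = 0; x < b.length; x++) {
                b[x] += o.b[x];
                for (int y = 0; y < b.length; y++) a[x][y] += o.a[x][y];
            }
        }
    }

    /** Stage 1 (one job per shard): contribution of every rating, keyed by consumer. */
    public static Map<Integer, Contribution> stage1(List<Rating> shard, double[][] v, double alpha) {
        int d = v[0].length;
        Map<Integer, Contribution> out = new HashMap<>();
        for (Rating t : shard) {
            double c = 1 + alpha * t.r;                          // confidence
            double[] vj = v[t.merchant];
            Contribution acc = out.computeIfAbsent(t.consumer, key -> new Contribution(d));
            for (int x = 0; x < d; x++) {
                acc.b[x] += c * vj[x];                           // likeness p = 1 for observed ratings
                for (int y = 0; y < d; y++) acc.a[x][y] += (c - 1) * vj[x] * vj[y];
            }
        }
        return out;
    }

    /** Stage 2: sum contributions per consumer, add V^T V + lambda * I, and solve for u_i. */
    public static Map<Integer, double[]> stage2(List<Map<Integer, Contribution>> shardOutputs, double[][] v, double lambda) {
        int d = v[0].length;
        double[][] vtv = new double[d][d];                       // V^T V, shared by every consumer
        for (double[] row : v)
            for (int x = 0; x < d; x++)
                for (int y = 0; y < d; y++) vtv[x][y] += row[x] * row[y];
        Map<Integer, Contribution> total = new TreeMap<>();
        for (Map<Integer, Contribution> m : shardOutputs)
            m.forEach((i, part) -> total.computeIfAbsent(i, key -> new Contribution(d)).add(part));
        Map<Integer, double[]> u = new TreeMap<>();
        total.forEach((i, sum) -> {
            double[][] a = new double[d][d];
            for (int x = 0; x < d; x++)
                for (int y = 0; y < d; y++)
                    a[x][y] = vtv[x][y] + sum.a[x][y] + (x == y ? lambda : 0.0);
            u.put(i, solve(a, sum.b.clone()));
        });
        return u;
    }

    /** Gaussian elimination with partial pivoting; overwrites a and b. */
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++) if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            double[] rowTmp = a[col]; a[col] = a[piv]; a[piv] = rowTmp;
            double bTmp = b[col]; b[col] = b[piv]; b[piv] = bTmp;
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                for (int k = col; k < n; k++) a[r][k] -= f * a[col][k];
                b[r] -= f * b[col];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int k = r + 1; k < n; k++) s -= a[r][k] * x[k];
            x[r] = s / a[r][r];
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] v = {{0.5, 0.1}, {0.2, 0.9}, {0.7, 0.3}};    // hypothetical merchant vectors, d = 2
        List<Rating> shard1 = List.of(new Rating(0, 0, 3), new Rating(1, 0, 1));  // merchant 0
        List<Rating> shard2 = List.of(new Rating(0, 2, 1), new Rating(1, 1, 2));  // merchants 1 and 2
        Map<Integer, double[]> u = stage2(List.of(stage1(shard1, v, 40), stage1(shard2, v, 40)), v, 0.1);
        u.forEach((i, ui) -> System.out.println("u_" + i + " = " + Arrays.toString(ui)));
    }
}
```

Because every rating contributes an additive term, shard outputs can be computed independently and simply summed, which is what allows one MapReduce job per shard.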

Reference
Scalable Collaborative Filtering for Commerce Recommendation
http://www.slideshare.net/jekky_yiqun/scalable-collaborative-filtering-for-commerce-recommendation

What can we use the CF model for?

What does CF Learn?
[Figure: example numeric factor matrices U and V]
Each vector of U models a consumer by d implicit attributes
Each vector of V models a merchant by d implicit attributes
Score = U_i^T · V_j = 1.162
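A minimal sketch of the score as a plain dot product over the d latent attributes; the class name and vectors are hypothetical.

```java
/** Sketch: CF affinity score between consumer i and merchant j. */
public class CfScore {

    /** score = u_i^T v_j, summed over the d latent attributes. */
    public static double score(double[] ui, double[] vj) {
        if (ui.length != vj.length) throw new IllegalArgumentException("dimension mismatch");
        double s = 0.0;
        for (int k = 0; k < ui.length; k++) s += ui[k] * vj[k];
        return s;
    }

    public static void main(String[] args) {
        double[] ui = {0.3, 0.5, 0.0};  // hypothetical consumer vector, d = 3
        double[] vj = {1.0, 0.2, 0.7};  // hypothetical merchant vector
        System.out.println(score(ui, vj));
    }
}
```

Ranking all merchants by this score for one consumer gives the Next Best Merchant list.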

Can we use the existing model to recommend the next best vertical?

Model V
[Figure: merchant model matrix V, one feature vector per merchant, with merchants M1, M37 and M99 marked]
If we know that M1, M37 and M99 belong to a specific vertical (e.g. electronics), we can group their feature vectors and use them to calculate the vertical score.
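The slides do not say exactly how the grouped vectors become a vertical score. One simple option, assumed here, is to average the feature vectors of a vertical's merchants into a centroid, score the consumer against each centroid, and keep the top verticals (the evaluation uses the top 3). All names and numbers are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Sketch: score verticals for one consumer from grouped merchant vectors. */
public class VerticalScoreSketch {

    /** Average the feature vectors of the merchants in each vertical (assumed grouping rule). */
    public static Map<String, double[]> verticalCentroids(Map<Integer, double[]> merchantVectors,
                                                          Map<Integer, String> merchantVertical) {
        Map<String, double[]> sums = new HashMap<>();
        Map<String, Integer> counts = new HashMap<>();
        merchantVectors.forEach((merchant, vec) -> {
            String vertical = merchantVertical.get(merchant);
            if (vertical == null) return;                        // merchant without a known vertical
            double[] acc = sums.computeIfAbsent(vertical, key -> new double[vec.length]);
            for (int k = 0; k < vec.length; k++) acc[k] += vec[k];
            counts.merge(vertical, 1, Integer::sum);
        });
        sums.forEach((vertical, acc) -> {
            int n = counts.get(vertical);
            for (int k = 0; k < acc.length; k++) acc[k] /= n;
        });
        return sums;
    }

    /** Rank verticals by the consumer's score u_i^T centroid and keep the top n. */
    public static List<String> topVerticals(double[] ui, Map<String, double[]> centroids, int n) {
        return centroids.entrySet().stream()
                .sorted(Comparator.comparingDouble((Map.Entry<String, double[]> e) -> -dot(ui, e.getValue())))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int k = 0; k < a.length; k++) s += a[k] * b[k];
        return s;
    }

    public static void main(String[] args) {
        // Hypothetical 2-d vectors: M1, M37, M99 tagged Electronics, M5 tagged Fashion.
        Map<Integer, double[]> v = Map.of(1, new double[]{0.9, 0.1}, 37, new double[]{0.7, 0.3},
                99, new double[]{0.8, 0.0}, 5, new double[]{0.1, 0.9});
        Map<Integer, String> vertical = Map.of(1, "Electronics", 37, "Electronics",
                99, "Electronics", 5, "Fashion");
        double[] ui = {1.0, 0.2};                                // hypothetical consumer vector
        System.out.println(topVerticals(ui, verticalCentroids(v, vertical), 3));  // [Electronics, Fashion]
    }
}
```

Because the dot product is linear, the centroid score equals the consumer's average score over that vertical's merchants; a different aggregate (for example the best merchant score per vertical) would only change the scoring step.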

Next Best Targeting
[Figure: layered diagram with Applications (One Touch), Analytics Services (Next Best Merchant, Next Best Vertical, Next Best Corridor, Next Best …) and Analytics Infra (Large-scale CF Model)]

Recommendation Flow
Generate CF Module: Cleanup old matrix → Generate weekly matrix → Merge matrix → Generate CF Model
Then: Generate Recommendation

Next Best Vertical/Corridor Recommendation Flow
Generate CF Module: Cleanup old matrix → Generate weekly matrix → Merge matrix → Generate CF Model
Then: Grouping Vertical / Corridor → Generate Recommendation → Ranking Vertical / Corridor

How to implement this?

Technology Stack
Hadoop / MapReduce
Mahout / Pig
Python / Shell
HDFS


Running the steps manually, or putting everything in a script:
./bin/cleanup-old-matrix.sh <parameters...>
pig gen-matrix.pig -param <parameters...>
pig merge-matrix.pig -param <parameters...>
./bin/gen-block-als.sh <parameters...>
./bin/gen-block-recomd-consumer.sh <parameters...>

Define an individual step
{
  "comment": "-----------------Example------------------------------",
  "type": "pig|shell|module",
  "enable": "true|false",
  "script": "script-name|module-name",
  "args": {
    "args-1": "value-1",
    …
    "args-n": "value-n"
  }
}
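A hypothetical driver could translate each enabled step definition into a command line. This sketch assumes the JSON has already been parsed into Step objects, that `$name` placeholders are filled from job-level variables, that pig steps become `pig -param key=value ... script` (Pig's standard parameter substitution), and that shell steps become `./bin/script --key value ...`, matching how gen-block-als.sh parses `--numIterations`. The "module" type (a nested group of steps) is not handled; class names and variable values are hypothetical.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: turn parsed step definitions into command lines (JSON parsing omitted). */
public class StepRunnerSketch {

    /** In-memory form of one step definition. */
    public static final class Step {
        final String type;                 // "pig" or "shell"; "module" is not handled here
        final boolean enable;
        final String script;
        final Map<String, String> args;    // insertion order is kept for the command line
        public Step(String type, boolean enable, String script, Map<String, String> args) {
            this.type = type;
            this.enable = enable;
            this.script = script;
            this.args = new LinkedHashMap<>(args);
        }
    }

    private static final Pattern VAR = Pattern.compile("\\$(\\w+)");

    /** Replace each $name with its value from vars; unknown names are left untouched. */
    public static String substitute(String s, Map<String, String> vars) {
        Matcher m = VAR.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Command line (one argument per element) for one step; disabled steps yield an empty list. */
    public static List<String> command(Step step, Map<String, String> vars) {
        List<String> cmd = new ArrayList<>();
        if (!step.enable) return cmd;
        switch (step.type) {
            case "pig":      // pig -param k=v ... script.pig
                cmd.add("pig");
                step.args.forEach((k, v) -> { cmd.add("-param"); cmd.add(k + "=" + substitute(v, vars)); });
                cmd.add(step.script);
                break;
            case "shell":    // ./bin/script.sh --k v ...
                cmd.add("./bin/" + step.script);
                step.args.forEach((k, v) -> { cmd.add("--" + k); cmd.add(substitute(v, vars)); });
                break;
            default:
                throw new IllegalArgumentException("unsupported step type: " + step.type);
        }
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> alsArgs = new LinkedHashMap<>();
        alsArgs.put("hdfs_input_dir", "$base_dir/$app_name/input_matrix");
        alsArgs.put("numIterations", "20");
        Map<String, String> vars = Map.of("base_dir", "/user/cf", "app_name", "nbv");  // hypothetical values
        Step als = new Step("shell", true, "gen-block-als.sh", alsArgs);
        System.out.println(String.join(" ", command(als, vars)));
        // prints: ./bin/gen-block-als.sh --hdfs_input_dir /user/cf/nbv/input_matrix --numIterations 20
    }
}
```

Returning a list of arguments (rather than one string) lets the driver hand it to a process launcher without extra shell quoting.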

Define the ALS step in the Generate CF Module
{
"comment":"-----------------Gen BLOCK ALS ------------------------------",
"type":"shell",
"enable":"true",
"script": "gen-block-als.sh",
"args": {
"auto_delete_output":"$auto_delete_output",
"hdfs_input_dir": "$base_dir/$app_name/input_matrix",
"hdfs_output_dir": "$base_dir/$app_name/model",
"numIterations": "20"
}
}

gen-block-als.sh
#!/bin/bash
...
# parse "--name value" pairs; [[ ]] is a bash feature, hence the bash shebang
while [[ $# -gt 1 ]]; do
  key="$1"
  shift
  case $key in
    --numIterations)
      NUM_ITERATIONS="$1"
      shift
      ;;
    ...
  esac
done
# run distributed ALS-WR to factorize the rating matrix defined by the training set
$MAHOUT blockParallelALS --input ${DATASET_DIR}/trainingSet/ --output ${WORK_DIR}/als/out \
  --tempDir ${WORK_DIR}/als/tmp --numFeatures ${NUM_FEATURES} --numIterations ${NUM_ITERATIONS} \
  --lambda ${LAMBDA} --numThreadsPerSolver 1 \
  --usesLongIDs ${USES_LONG_ID} --alpha 40 --implicitFeedback true --numUserBlocks ${NUM_BLOCKS} \
  --numItemBlocks ${NUM_BLOCKS} --queueName ${QUEUE_NAME}

BlockParallelALSFactorizationJob
public class BlockParallelALSFactorizationJob extends AbstractJob {
  …
  for (int currentIteration = 0; currentIteration < numIterations; currentIteration++) {
    /* broadcast M, read A row-wise, recompute U row-wise */
    log.info("Checking U (iteration {}/{})", currentIteration, numIterations);
    runSolver(pathToUserRatings(), pathToU(currentIteration), pathToM(currentIteration - 1),
        pathToPrefix("UYtY", currentIteration), currentIteration, "U",
        numItems, numUserBlocks, numItemBlocks, fs);
    /* broadcast U, read A' row-wise, recompute M row-wise */
    log.info("Checking M (iteration {}/{})", currentIteration, numIterations);
    runSolver(pathToItemRatings(), pathToM(currentIteration), pathToU(currentIteration),
        pathToPrefix("MYtY", currentIteration), currentIteration, "M",
        numUsers, numUserBlocks, numItemBlocks, fs);
  }

BlockParallelALSFactorizationJob
…
for (int blockId = 0; blockId < numBlocks2; blockId++) {
  // process each block
  Path blockRatings = new Path(ratings.toString() + "/" + Integer.toString(blockId) + "-r-*");
  Path blockRatingsOutput = new Path(getTempPath(blockOutputName).toString() + "/"
      + Integer.toString(blockId));
  Path blockFixUorM = new Path(pathToUorM.toString() + "/" + Integer.toString(blockId) + "-*-*");
  if (!fs.exists(new Path(blockRatingsOutput.toString() + "/_SUCCESS"))) {
    Job solveBlockUorI = prepareJob(blockRatings, blockRatingsOutput,
        SequenceFileInputFormat.class, MultithreadedSharingMapper.class,
        IntWritable.class, ALSContributionWritable.class,
        SequenceFileOutputFormat.class, name + " blockId: " + blockId);
    …

BlockSolveImplicitFeedbackMapper & Solver
…
@Override
protected void map(IntWritable userOrItemID, VectorWritable ratingsWritable, Context ctx)
    throws IOException, InterruptedException {
  // emit this user's (or item's) partial contribution (A, b) computed from one block of ratings
  BlockImplicitFeedbackAlternatingLeastSquaresSolver solver = getSharedInstance();
  uiOrmj.setA(solver.solveA(ratingsWritable.get()));
  uiOrmj.setb(solver.solveb(ratingsWritable.get()));
  ctx.write(userOrItemID, uiOrmj);
}
…
// A part: Y^T (C_u - I) Y + lambda * I over the given ratings
public Matrix solveA(Vector ratings) throws IOException {
  return getYtransponseCuMinusIYPlusLambdaI(ratings);
}
// b part: Y^T C_u p(u) over the given ratings
public Matrix solveb(Vector ratings) throws IOException {
  return getYtransponseCuPu(ratings);
}

Next Best Vertical

Revised Recommendation Flow For Next Best Vertical
Generate CF Model Module: Cleanup old matrix → Generate weekly matrix → Merge matrix → Generate CF Model
Then: Vertical Grouping → Merchant Recommendation for each vertical → Vertical Score Aggregation → Vertical Merchant Merge → Vertical Ranking

Vertical Grouping
[Figure: the merchant model M is split into per-vertical models: M - Electronic, M - Fashion, M - Travel]

Define Vertical Grouping Step
{
"comment":"-----------------Group Vertical M ------------------------------",
"type":"pig",
"enable":"true",
"script": "filter-by-verticals.pig",
"args": {
"auto_delete_output":"$auto_delete_output",
"from_week":"$to_week", "to_week":"$to_week",
"vertical_table": "$vertical_table",
"hdfs_input_dir" : "$base_dir/$app_name/model/als/out/M",
"hdfs_output_dir": "$base_dir/$app_name/model/vertical/M"
}
}

filter-by-verticals.pig
-- separate model U or M by vertical
...
pairs = LOAD '$hdfs_input_dir' USING $SEQFILE_LOADER ('-c $INT_CONVERTER', '-c $VECTOR_CONVERTER -- -dense -cardinality $numFeatures') AS (key: int, value);
-- Join merchant models with vertical info
dw_vertical = LOAD '$vertical_table' USING PigStorage(',') AS (merchant_id:chararray, vertical:chararray);
shortid_vertical = FOREACH dw_vertical GENERATE (int) LongIdToIndex(merchant_id) as shortid, vertical as vertical;

filter-by-verticals.pig
-- inner join: keep only merchants that have both a vertical label and a model vector
-- (a LEFT join would emit rows with null key/value for merchants that have no model)
join_list = JOIN shortid_vertical BY shortid, pairs BY key;
-- Filter merchant models M for every vertical
electronics_list = FILTER join_list BY shortid_vertical::vertical == 'Electronics';
...
-- Generate merchant models M for every vertical
electronics_list = FOREACH electronics_list GENERATE key, value;
...
STORE electronics_list INTO '$hdfs_output_dir/electronics/' USING $SEQFILE_STORAGE ('-c $INT_CONVERTER', '-c $VECTOR_CONVERTER');
...


How to evaluate?

Evaluation Method
Train
• Predict the top 3 verticals for a selected group of consumers
• Use 1 year of data to train the CF model (Q4 2013 to Q3 2014)
Validation
• Use 3 months of data for validation (Q4 2014)
• If a consumer transacts with merchants belonging to one of the predicted verticals, the prediction is considered accurate
Result: 70.47% accuracy
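The accuracy criterion above can be sketched as a hit rate: a consumer counts as a hit if any vertical they transacted with during the validation period is among their predicted top 3. Assumption: consumers with no validation-period transactions are left out of the denominator, since the slide does not say how they were handled. Names are hypothetical; the first two sample consumers reuse the predictions from Examples 1 and 2, the third is invented.

```java
import java.util.*;

/** Sketch: top-N vertical hit rate as described on the Evaluation Method slide. */
public class VerticalHitRate {

    /**
     * @param predicted  consumer -> predicted top verticals (e.g. top 3)
     * @param validation consumer -> verticals of merchants transacted with in the validation period
     * @return fraction of evaluated consumers with at least one validation vertical among the predictions
     */
    public static double hitRate(Map<Long, List<String>> predicted, Map<Long, Set<String>> validation) {
        int evaluated = 0, hits = 0;
        for (Map.Entry<Long, List<String>> e : predicted.entrySet()) {
            Set<String> actual = validation.get(e.getKey());
            if (actual == null || actual.isEmpty()) continue;  // assumption: no validation purchases, not evaluated
            evaluated++;
            for (String vertical : e.getValue()) {
                if (actual.contains(vertical)) { hits++; break; }
            }
        }
        return evaluated == 0 ? 0.0 : (double) hits / evaluated;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> predicted = Map.of(
                1L, List.of("Healthcare", "Retail", "Fashion"),
                2L, List.of("Fashion", "Retail", "Home & Garden"),
                3L, List.of("Travel", "Electronics", "Retail"));
        Map<Long, Set<String>> validation = Map.of(
                1L, Set.of("Retail"),
                2L, Set.of("Retail", "Fashion"),
                3L, Set.of("Fashion"));
        System.out.println(hitRate(predicted, validation));  // 2 of the 3 consumers are hits
    }
}
```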

Example 1

Previous Purchase Behavior

One Touch
Transacted with Healthcare Merchant

Next Best Vertical Recommendation: Healthcare, Retail, Fashion

One Touch
Healthcare Merchant Recommendation

One Touch
Retail Merchant Recommendation

One Touch
Retail Merchant Recommendation

Purchases made in the validation period (Q4 2014)

One Touch
Transacted with a retail merchant selling health supplies

Example 2

Previous Purchase Behavior

One Touch
Transacted with Fashion Merchant

One Touch
Transacted with Education Merchant

One Touch
Transacted with Fashion Merchant

Next Best Vertical Recommendation: Fashion, Retail, Home & Garden

One Touch
Retail Merchant Recommendation

One Touch
Fashion Merchant Recommendation

Purchases made in the validation period (Q4 2014)

One Touch
Transacted with a retail merchant selling baby products

One Touch
Transacted with a fashion merchant selling children's clothing

The End