Data Science in E-commerce
Data Science in the E-commerce Industry
DSSP, 2016/05/20
Vincent Michel
Big Data Europe, BDD, Rakuten Inc. / PriceMinister
@HowIMetYourData
Short Bio
Education
• ESPCI: engineer in Physics / Biology
• ENS Cachan: MVA Master (Mathematics, Vision and Learning)
• INRIA Parietal team: PhD in Computer Science, "Understanding the visual cortex by using classification techniques"
Experience
• Logilab: development and data science consulting
  • Data.bnf.fr (French National Library open-data platform)
  • Brainomics (platform for heterogeneous medical data)
• Rakuten PriceMinister: senior developer and data scientist; data engineer and data science consulting
Software Engineering
Lessons learned from (painful) experiences
Do not redo it yourself!
Lots of really interesting open-source libraries for all your needs: scikit-learn, pandas, Caffe, scikit-image, OpenCV, …
• Test first on a small POC, then contribute/develop
• Be careful: it is really easy to do something wrong!
Open data: more and more open data is available, e.g. for catalogs, …
E.g. data.bnf.fr: ~2,000,000 authors, ~200,000 works, ~200,000 topics
Contribute to open source:
• Is there a need / a pool of potential developers?
• Do it well (documentation / tests)
• Unless you are doing some kind of super magical algorithm
• It may bring you help, bug fixes, and engineers! But it takes time and energy
Quality in data science software engineering
Never underestimate the integration cost:
• It is really easy to write 20 lines of Python doing some fancy Random Forests…
• …that could be really hard to deploy (data pipeline, packaging, monitoring)
• Developer != DevOps != Sysadmin
Make it clean from the start (> 2 days of dev or > 100 lines of code):
• Tests, tests, tests, tests, tests, tests, tests, …
• Documentation
• Packaging / supervision / monitoring
• Release early, release often
• Agile development, pull requests, code versioning
Choose the right tool: do you really need this super fancy NoSQL database to store your transactions?
Monitoring and metrics
Always monitor:
• Your development: continuous integration (Jenkins)
• Your service: Nagios/Shinken
• Your business data (BI): Kibana
• Your users: tracker
• Your data science process: e.g. A/B tests
Evaluation:
• Choose the right metric: prediction accuracy / precision-recall …
• Always A/B test rather than relying on personal opinions
• A good question leads to a good answer: define your problem
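The slide names precision-recall and A/B testing without going into details. As a minimal sketch (function names and numbers are illustrative, not from the talk), precision@k / recall@k score one ranked recommendation list offline, and a pooled two-proportion z-test is one common way to compare conversion rates between two A/B arms:

```python
from math import erf, sqrt


def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one ranked recommendation list."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def ab_test_z(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided two-proportion z-test on conversion rates (pooled variance).

    Returns (z statistic, p-value).
    """
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

A test like this only says whether a difference is likely to be real; the choice of which rate to compare (clicks, purchases, revenue) is the "choose the right metric" question above.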
Hiring Remarks
Finding the right data scientist
Finding your data scientist
Do not try to find a unicorn!
Define your needs (and unicorns no longer exist…)
A few remarks on hiring (my personal opinion)
Be careful of CVs full of buzzwords! E.g. “IT skills: SVM (linear, non-linear), Clustering (K-means, Hierarchical), Random Forests, Regularization (L1, L2, Elastic net…) …” It is like someone writing “IT skills: Python (for loop, if/else pattern, …)”.
This is often found in junior CVs (OK), but it is a huge warning sign in senior CVs.
Hungry for data?
• Loving data is the most important thing to check
• Open data? Personal projects? Curious about data? (Hackathons?)
• Multidisciplinary == knowing how to handle various datasets
Check for IT skills:
• Should be able to install/develop new libraries/algorithms
• A huge part of the job could be formatting / cleaning up the data
• Experience vs. education -> autonomy
Recommendations @Rakuten
Data science use case
Rakuten Group Worldwide
Recommendation challenges:
• Different languages
• User behavior
• Business areas
Rakuten Group in Numbers
Rakuten in Japan
• > 12,000 employees
• > 48 billion euros of GMS
• > 100,000,000 users
• > 250,000,000 items
• > 40,000 merchants
Rakuten Group
• Kobo: 18,000,000 users
• Viki: 28,000,000 users
• Viber: 345,000,000 users
Rakuten Ecosystem
Rakuten global ecosystem:
• Member-based business model that connects Rakuten services
• Rakuten ID common to various Rakuten services
• Online shopping and services
Main business areas:
• E-commerce
• Internet finance
• Digital content
Recommendation challenges:
• Cross-services
• Aggregated data
• Complex user features
Rakuten’s e-commerce: B2B2C Business Model
Business to Business to Consumer: merchants located in different regions / online virtual shopping mall
Main profit sources:
• Fixed fees from merchants
• Fees based on each transaction and other services
Recommendation challenges:
• Many shops
• Item references
• Global catalog
Big Data Department @ Rakuten
Big Data Department: 150+ engineers (Japan / Europe / US)
Missions
Development and operations of internal systems for:
• Recommendations
• Search
• Targeting
• User behavior tracking
Average traffic:
• > 100,000,000 events / day
• > 40,000,000 item views / day
• > 50,000,000 searches / day
• > 750,000 purchases / day
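To put the daily figures in perspective, dividing by the 86,400 seconds in a day gives rough average rates. This assumes traffic is spread evenly over the day, so real peaks will be higher:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Lower bounds taken from the slide ("> N / day").
daily_volumes = {
    "events": 100_000_000,
    "item views": 40_000_000,
    "searches": 50_000_000,
    "purchases": 750_000,
}


def per_second(volume_per_day: int) -> float:
    """Average rate per second, assuming traffic were spread evenly over the day."""
    return volume_per_day / SECONDS_PER_DAY


for name, volume in daily_volumes.items():
    print(f"{name}: ~{per_second(volume):,.0f} / s on average")
```

This prints averages of roughly 1,157 events, 463 item views, 579 searches and 9 purchases per second.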
Technology stack:
• Java / Python / Ruby
• Solr / Lucene
• Cassandra / Couchbase
• Hadoop / Hive / Pig
• Redis / Kafka
Recommendations on Rakuten Marketplaces
Non-personalized recommendations:
• All-shop recommendations: item to item, user to item
• In-shop recommendations
• Review-based recommendations
Personalized recommendations:
• Purchase history recommendations
• Cart add recommendations
• Order confirmation recommendations
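Purchase history recommendations can be built on top of an item-to-item table. The sketch below assumes such a table already exists (the `item_similarities` dict and the summed-similarity scoring are hypothetical, not the production logic) and ranks items the user does not own yet:

```python
from collections import defaultdict


def history_recommendations(history, item_similarities, k=3):
    """Score candidate items by summing their similarity to each purchased item.

    history: items the user already bought.
    item_similarities: dict item -> dict neighbor -> similarity (hypothetical
    item-to-item table, e.g. built from co-purchases).
    """
    owned = set(history)
    scores = defaultdict(float)
    for item in owned:
        for neighbor, sim in item_similarities.get(item, {}).items():
            if neighbor not in owned:  # never recommend what the user already has
                scores[neighbor] += sim
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

Items related to several past purchases accumulate score, which is what makes this personalized even though the underlying table is item to item.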
System status and scale:
• In production in over 35 services of Rakuten Group worldwide
• Several hundred servers running Hadoop, Cassandra and APIs
Challenges in Recommendations
Four building blocks: items catalogue, items similarity, recommendations engine, evaluation process.
• Items catalogues: a catalogue for multiple shops with different item references? Cross-service aggregation?
• Items similarity / distances: lots of parameters?
• Recommendations engine: best / optimal recommendation logic?
• Evaluation process: offline / online evaluation? Long tail? KPIs?
Recommendations Architecture: Constantly Evolving
[Figure: recommendations architecture with browsing events, purchase events, co-counts storage, catalogue(s), a distribution layer, offline / materialized recommendations and online recommendations (algebra / multi-arm)]
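The diagram only names "multi-arm" for the online layer. Assuming this refers to multi-armed bandits, epsilon-greedy is one of the simplest policies (not necessarily the one used in production). In this sketch the arms are hypothetical recommendation strategies, and the reward is assumed to be 1 for a click and 0 otherwise:

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit choosing among recommendation strategies ("arms")."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.pulls = {arm: 0 for arm in self.arms}
        self.total_reward = {arm: 0.0 for arm in self.arms}
        self.rng = random.Random(seed)

    def mean_reward(self, arm):
        pulls = self.pulls[arm]
        return self.total_reward[arm] / pulls if pulls else 0.0

    def select(self):
        # Explore a random arm with probability epsilon, otherwise exploit the best mean so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self.mean_reward)

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.total_reward[arm] += reward


def simulate(true_ctr, steps=5000, epsilon=0.1, seed=0):
    """Run the bandit against hypothetical click-through rates and return it."""
    bandit = EpsilonGreedy(true_ctr, epsilon=epsilon, seed=seed)
    clicks = random.Random(seed + 1)
    for _ in range(steps):
        arm = bandit.select()
        bandit.update(arm, 1.0 if clicks.random() < true_ctr[arm] else 0.0)
    return bandit
```

Unlike a fixed A/B split, the bandit shifts traffic toward the better strategy while the test is still running, at the cost of a fixed share of exploration traffic.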
Items Catalogues
Use different levels of aggregation to improve recommendations
• Category level (e.g. food, soda, clothes, …)
• Product level (manufactured items)
• Item-in-shop level (a specific product sold by a specific shop)
Coarser levels: increased statistical power in co-event computation
Finer levels: easier business handling (picking the right item)
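The statistical-power argument can be sketched by rolling item-in-shop co-counts up to product level. The listing IDs and the listing-to-product mapping below are hypothetical; in practice the mapping would come from the global catalogue:

```python
from collections import Counter

# Hypothetical mapping from shop-specific listings to a shared product ID.
listing_to_product = {
    "shopA:iphone": "iphone",
    "shopB:iphone": "iphone",
    "shopA:case": "case",
    "shopC:case": "case",
}


def aggregate_cocounts(listing_cocounts, mapping):
    """Roll item-in-shop co-occurrence counts up to product level.

    Listing pairs that map to the same product pair are merged, so sparse
    per-shop counts add up to a larger, more reliable count.
    """
    product_counts = Counter()
    for (a, b), count in listing_cocounts.items():
        pa, pb = mapping.get(a, a), mapping.get(b, b)
        if pa == pb:
            continue  # both listings are the same product
        product_counts[tuple(sorted((pa, pb)))] += count
    return product_counts
```

Three listing pairs each seen once become one product pair seen three times, enough to pass a threshold that each listing pair would fail on its own. Picking the specific listing to show is then done back at the item-in-shop level.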
Enriching Catalogues using Record Linkage
[Figure: Marketplace 1, Marketplace 2 and a reference database]
Record linkage:
• Use external sources (e.g., Wikidata) to align the marketplaces' products
• Fuzzy matching of 600K vs 350K items for the movie alignment use case
• Blocking algorithm
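The slide mentions fuzzy matching with a blocking algorithm. A minimal sketch using only the standard library: records are first grouped by a cheap blocking key so that only records in the same block are compared, then titles are matched with `difflib.SequenceMatcher`. The record format, the blocking key (release year + first letter of the normalized title) and the 0.85 threshold are assumptions for illustration; the talk does not say which key or string metric was used.

```python
import re
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def blocking_key(record: dict) -> tuple:
    """Hypothetical blocking key: release year + first letter of the normalized title."""
    return (record["year"], normalize(record["title"])[:1])


def link(left, right, threshold=0.85):
    """Return (left_id, right_id, score) for the best fuzzy title match within each block."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[blocking_key(rec)].append(rec)
    matches = []
    for rec in left:
        best, best_score = None, 0.0
        # Only records sharing the blocking key are compared.
        for cand in blocks.get(blocking_key(rec), []):
            score = SequenceMatcher(None, normalize(rec["title"]), normalize(cand["title"])).ratio()
            if score > best_score:
                best, best_score = cand, score
        if best is not None and best_score >= threshold:
            matches.append((rec["id"], best["id"], round(best_score, 3)))
    return matches
```

Blocking is what makes 600K vs 350K tractable: comparing every pair would mean 2.1 × 10^11 comparisons. The trade-off is recall, since a pair whose keys differ (e.g. "Matrix, The" vs "The Matrix") is never compared.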
Cross recommendation:
• Global catalog
• Item aggregation
• Helps with cold-start issues
• Improved navigation
Co-occurrences and Similarities Computation
Only unitary data is available (individual purchase / browsing events)
Use co-occurrences to compute item similarity
Multiple possible parameters:
• Size of the time window to consider: do browsing and purchase data reflect similar behavior?
• Threshold on co-occurrences: is one co-occurrence significant enough to be used? Two? Three?
• Symmetric or asymmetric: is the order important in the co-occurrence? Is "A then B" == "B then A"?
• Similarity metrics: which similarity metric should be used on top of the co-occurrences?
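The parameters above map directly onto a co-occurrence counter. A minimal sketch, assuming events arrive as `(user_id, item_id, timestamp)` tuples: `window` is the time window, `symmetric` chooses whether "A then B" and "B then A" are merged, `min_count` is the co-occurrence threshold, and cosine similarity over users is just one possible metric (the talk does not say which one is used). Each pair is counted at most once per user.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def co_occurrences(events, window, symmetric=True):
    """Count, for each item pair, the users who saw both items within `window` time units.

    events: iterable of (user_id, item_id, timestamp) tuples.
    symmetric=True merges (A, B) and (B, A) under a sorted key;
    symmetric=False keeps the order (earlier item, later item).
    """
    by_user = defaultdict(list)
    for user, item, ts in events:
        by_user[user].append((ts, item))
    counts = Counter()
    for history in by_user.values():
        history.sort()
        pairs = set()  # count each pair at most once per user
        for (t1, a), (t2, b) in combinations(history, 2):
            if a == b or t2 - t1 > window:
                continue
            pairs.add(tuple(sorted((a, b))) if symmetric else (a, b))
        counts.update(pairs)
    return counts


def item_user_counts(events):
    """Number of distinct users per item."""
    users = defaultdict(set)
    for user, item, _ in events:
        users[item].add(user)
    return {item: len(u) for item, u in users.items()}


def cosine_similarities(pair_counts, user_counts, min_count=2):
    """co(A, B) / sqrt(n(A) * n(B)), ignoring pairs seen fewer than min_count times."""
    return {
        (a, b): c / sqrt(user_counts[a] * user_counts[b])
        for (a, b), c in pair_counts.items()
        if c >= min_count
    }
```

For example, with `window=3`, a view of item C at t=10 does not co-occur with views of A and B at t=0 and t=2, and with `symmetric=False` a user who saw B before A only contributes to the `("B", "A")` pair.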
Co-occurrences Example
[Figure: browsing and purchase events on a timeline (dates from 08/09/2015 to 24/11/2015), grouped either by session or by time window (time window 1, time window 2)]
Co-occurrences Computation
Classical co-occurrences:
• Co-purchases: complementary items (“You may also want…”)
• Co-browsing: substitute items (“Similar items…”)
Other possible co-occurrences:
• Items browsed and bought together
• Items browsed and not bought together
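Assuming sessions are available as sets of browsed and purchased items (a hypothetical format), the co-occurrence types above can be counted side by side. Here "browsed and not bought together" is interpreted as both items browsed but not both bought, which is one way to collect substitute candidates:

```python
from collections import Counter
from itertools import combinations


def pair_counts(sessions):
    """Count item pairs per co-occurrence type.

    sessions: list of dicts with "browsed" and "purchased" item sets.
    Returns three Counters keyed by sorted item pairs:
    - co_purchase: both items bought (complementary candidates)
    - co_browse: both items browsed
    - browsed_not_bought: both browsed, at most one bought (substitute candidates)
    """
    co_purchase, co_browse, browsed_not_bought = Counter(), Counter(), Counter()
    for s in sessions:
        bought = s["purchased"]
        for pair in combinations(sorted(bought), 2):
            co_purchase[pair] += 1
        for a, b in combinations(sorted(s["browsed"]), 2):
            co_browse[(a, b)] += 1
            if not (a in bought and b in bought):
                browsed_not_bought[(a, b)] += 1
    return co_purchase, co_browse, browsed_not_bought
```

Two phones compared and only one bought end up as substitutes, while a phone and a case bought together end up as complements.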
Recommendation Quality Challenges
Recommendation categories:
• Cold-start issue: external data? Cross-services?
• Hot products (A): top-N items?
• Short tail (B)
• Long tail (C + D)
[Figure: products split into quadrants (A) to (D) along two axes: minor vs. major (popular) product, and new vs. old product]
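For cold-start and hot products, a simple pattern is to fill missing item-to-item results with top-N popular items. The sketch below is an assumption (the `similar` lookup and the fallback order are illustrative), not the talk's method:

```python
from collections import Counter


def top_n_popular(purchases, n):
    """Most purchased items, used as a fallback list."""
    return [item for item, _ in Counter(purchases).most_common(n)]


def recommend(item, similar, popular, k=3):
    """Return up to k items: item-to-item neighbors first, then popular items.

    similar: dict item -> neighbors sorted by decreasing similarity (hypothetical input).
    A brand-new item has no neighbors, so it gets popular items only.
    """
    recs = [x for x in similar.get(item, []) if x != item][:k]
    for p in popular:
        if len(recs) >= k:
            break
        if p != item and p not in recs:
            recs.append(p)
    return recs
```

This handles cold start only crudely; the slide's suggestions (external data, cross-service data) aim at giving new items real neighbors instead of generic popular ones.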
Long Tail is Fat
Long tail numbers:
• Most of the items are in the long tail
• They still represent a large portion of the traffic
Long tail approaches:
• Content-based
• Aggregation / clustering
• Personalization
[Figure: browsing share vs. number of items for popular, short-tail and long-tail items]
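The long-tail statement can be checked on any view log. This sketch ranks items by views and reports, per segment, the share of items and the share of views; the 1% / 20% cut-offs are hypothetical, since the slide does not define where each segment starts:

```python
def tail_shares(views, popular_frac=0.01, short_frac=0.2):
    """Split items by popularity rank; return {segment: (share of items, share of views)}.

    views: dict item -> number of views. Hypothetical cut-offs:
    top `popular_frac` of items = popular, up to `short_frac` = short tail,
    everything else = long tail.
    """
    ranked = sorted(views, key=views.get, reverse=True)
    n, total = len(ranked), sum(views.values())
    cut_popular = max(1, round(n * popular_frac))
    cut_short = max(cut_popular, round(n * short_frac))
    segments = {
        "popular": ranked[:cut_popular],
        "short tail": ranked[cut_popular:cut_short],
        "long tail": ranked[cut_short:],
    }
    return {
        name: (len(items) / n, sum(views[i] for i in items) / total)
        for name, items in segments.items()
    }
```

On a synthetic 1/rank distribution such as `{f"item{i}": 1000 // (i + 1) for i in range(100)}`, the long tail holds 80% of the items and roughly 30% of the views.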
Recommendations Offline Evaluation
Pros / cons:
• Convenient way to try new ideas
• Fast and cheap
• But hard to align with online KPIs
Approaches:
• Rescoring
• Prediction game
• Business simulator
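Of the three offline approaches, the prediction game is the easiest to sketch: hide each user's last interaction, recommend from the earlier ones, and count how often the hidden item appears in the top k. The hit-rate metric and the co-occurrence recommender here are assumptions for illustration:

```python
from collections import Counter, defaultdict
from itertools import combinations


def leave_last_out(histories):
    """Split each user's time-ordered history into (training items, held-out last item)."""
    return {u: (h[:-1], h[-1]) for u, h in histories.items() if len(h) >= 2}


def train_cooccurrence(train_histories):
    """Symmetric item co-occurrence counts built from training items only."""
    co = defaultdict(Counter)
    for items in train_histories:
        for a, b in combinations(set(items), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def hit_rate_at_k(histories, k=2):
    """Share of users whose held-out item appears in their top-k recommendations."""
    split = leave_last_out(histories)
    co = train_cooccurrence(train for train, _ in split.values())
    hits = 0
    for train, target in split.values():
        scores = Counter()
        for item in train:
            scores.update(co[item])
        for item in train:
            scores.pop(item, None)  # do not recommend already-seen items
        top_k = [item for item, _ in scores.most_common(k)]
        hits += target in top_k
    return hits / len(split) if split else 0.0
```

A better hit rate here does not guarantee a better online KPI, which is exactly the alignment problem noted above.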
Public Initiative – Viki Recommendation Challenge
567 submissions from 132 participants
http://www.dextra.sg/challenges/rakuten-viki-video-challenge
Data science everywhere!
Rakuten provides marketplaces worldwide
Specific challenges for recommendations:
• Items catalogue: reinforce the statistical power of co-occurrences across shops and services
• Items similarities: find the right parameters for the different use cases
• Recommendation models: what are the best models for in-shop, all-shop and personalized recommendations?
• Evaluation: how to handle the long tail? How to compare different models?
THANKS!
Questions?
More on Rakuten tech initiatives
http://www.slideshare.net/rakutentech
http://rit.rakuten.co.jp/oss.html
http://rit.rakuten.co.jp/opendata.html
Positions
• http://global.rakuten.com/corp/careers/bigdata/
• http://www.priceminister.com/recrutement/?p=197
We are Hiring!
Big Data Department: team in Paris
http://global.rakuten.com/corp/careers/bigdata/
http://www.priceminister.com/recrutement/?p=197
Data Scientist / Software Developer
• Build algorithms for recommendations, search, targeting
• Predictive modeling, machine learning, natural language processing
• Working close to business
• Python, Java, Hadoop, Couchbase, Cassandra…
Also hiring: search engine developers, big data system administrators, etc.