![Page 1: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/1.jpg)
Machine learning Go’s and No-Go’s
Adrian Foltyn, External Data Science Expert
IoT / Blockchain / AI Expo, Amsterdam
27 June 2018
![Page 2: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/2.jpg)
Perfectly Portioned Ingredients For 3-5 Meals Per Week
Personalised Fresh Food, Locally Sourced
Easily Managed Via Subscription Platform
1 Box Delivered Weekly To The Door
No Planning
No Shopping
No Waste
HelloFresh breaks the dinner routine by continuously innovating both service and product
![Page 3: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/3.jpg)
Disrupting the supply chain by cutting middlemen, ensuring higher margins and fresher products
![Page 4: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/4.jpg)
HelloFresh global footprint
+ Luxembourg and Northern France 2018
![Page 5: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/5.jpg)
How we use data science / machine learning
DATA SCIENCE @ HF
Fraud detection
Marketing attribution
Lifetime / churn prediction
Recommendation engines
Demand forecasting
Minimize cost
Maximize revenue
Generalized Additive Models
Support Vector Regressions
Random Forests
Extreme Gradient Boosting
Bayesian networks
Collaborative filtering
Deep learning CNNs
ARIMA & other time series models
Hidden Markov Models
Graph databases
![Page 6: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/6.jpg)
Myself: a drift between consulting and data science
▪ Quant methods and computational psychoacoustics
▪ Demand forecasting
▪ Market research & business intelligence
▪ Data Science in strategic consulting
▪ Data Science in-house
![Page 7: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/7.jpg)
Data Science Lead’s decision path
In-house <----> vendor
Bottom up <----> top down
Algorithms <----> features
Quality <----> business-valid output
Who is going to do it?
What approach shall we take?
Where shall we focus?
What result is expected?
![Page 8: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/8.jpg)
In-house <----> vendor
Bottom up <----> top down
Algorithms <----> features
Quality <----> business-valid output
![Page 9: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/9.jpg)
Why do I need a vendor as CDO / VP / Director / Head of Data Science?
▪ Actual business reasons, bla bla bla…
and...
▪ I have too few people
▪ My people don’t know that sh*t
▪ I don’t believe my team can do better
▪ I’m easily impressed by tech gimmicks
▪ I want a butt-cushion = evidence I’m right
▪ I’d probably need to pay ridiculous money to hire those PhDs…
▪ …..
![Page 10: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/10.jpg)
Big data landscape 2017
![Page 11: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/11.jpg)
Common fallacies -> collaboration with rose-tinted glasses on
![Page 12: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/12.jpg)
Common fallacies -> methodology & output
![Page 13: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/13.jpg)
In-house <----> vendor
Bottom up <----> top down
Algorithms <----> features
Quality <----> business-valid output
![Page 14: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/14.jpg)
First decision: forecast top-down or bottom-up?
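| CustomerID | Weeks from activation | Weeks from last pause | Weeks from last meal swap | No. of meal swaps total | No. of boxes in total | Box type | ……. | Probability of getting a box |
|---|---|---|---|---|---|---|---|---|
| ….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 0.4 |
| ….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 0.7 |
| ….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 0.5 |
| ….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 0.6 |
| Total | | | | | | | | 0.55 |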
Sales (boxes)** ~ Outlook of actives + Outlook of pauses + ………..
** Dummy data in all charts
Theorem I ☺: given data availability, nearly all problems in ML can be represented by both a bottom-up and top-down approach
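The two views on this slide can be put side by side in a few lines. This is only a minimal sketch: the four probabilities are the dummy values from the table, while `top_down_forecast`, its linear form, and its coefficients are hypothetical placeholders rather than the model actually used.

```python
# Bottom-up: score each customer, then add up the probabilities.
# The four values are the dummy probabilities shown in the slide's table.
box_probabilities = [0.4, 0.7, 0.5, 0.6]

expected_boxes_bottom_up = sum(box_probabilities)
mean_probability = expected_boxes_bottom_up / len(box_probabilities)


def top_down_forecast(active_customers, paused_customers,
                      share_of_actives=0.6, share_of_pauses=0.1):
    """Top-down: model the aggregate directly from aggregate drivers.

    The linear form and both coefficients are assumptions for illustration.
    """
    return share_of_actives * active_customers + share_of_pauses * paused_customers


expected_boxes_top_down = top_down_forecast(active_customers=3, paused_customers=1)

print(f"bottom-up expected boxes: {expected_boxes_bottom_up:.2f}")
print(f"mean probability per customer: {mean_probability:.2f}")
print(f"top-down expected boxes: {expected_boxes_top_down:.2f}")
```

Both routes answer the same question (how many boxes to expect), which is the point of the theorem: the choice is about where the modelling effort and the error sit, not about what can be expressed.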
![Page 15: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/15.jpg)
Why do we need a top-down forecasting model?
Customer decision chain (each step branches Y / N):
CANCEL? Should I stay (or should I go)?
PAUSE? Shall I take a break?
TRUST DEFAULT MEAL CHOICE? Do I care to see my options?
SWAP MEALS? Do I swap my meals?
• Each decision increases variance of final output
• In a bottom-up model those variances could mitigate each other or could explode…
• Top-down model (aggregate number of boxes) is much more stable
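One way to read "mitigate each other or explode" is through the variance of a sum of customer-level errors. The sketch below assumes, purely for illustration, that all customers share the same error variance and the same pairwise correlation; neither assumption comes from the talk.

```python
def variance_of_total(n_customers, error_variance, correlation):
    """Variance of the sum of n equally correlated customer-level errors.

    Var(sum) = n * s^2 * (1 + (n - 1) * rho) for pairwise correlation rho.
    Equal variance and equal correlation are simplifying assumptions.
    """
    return n_customers * error_variance * (1 + (n_customers - 1) * correlation)


n = 1000
s2 = 0.25  # largest possible variance of a single yes/no outcome (p = 0.5)

independent = variance_of_total(n, s2, correlation=0.0)
correlated = variance_of_total(n, s2, correlation=0.3)

# Standard deviation of the total, expressed per customer
rel_independent = independent ** 0.5 / n
rel_correlated = correlated ** 0.5 / n

print(f"independent errors: variance {independent:.0f}, relative sd {rel_independent:.3f}")
print(f"correlated errors:  variance {correlated:.0f}, relative sd {rel_correlated:.3f}")
```

With independent errors the relative spread shrinks as customers are added, so the errors largely cancel. With positively correlated errors (for example, everyone reacting to the same holiday week) the spread stays large no matter how many customers are summed.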
![Page 16: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/16.jpg)
Next step is bottom-up: predicting user-level demand with deep learning
CNNs
Factorization / Word2Vec
![Page 17: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/17.jpg)
Marketing attribution -> again: top-down or bottom-up approach?
| CustomerID | Touchpoint Paid-Social | Touchpt. Affiliates | Touchpt. Bloggers | Touchpt…. | Likelihood of outdoor exposure | Likelihood of TV exposure | ……. | Number of boxes over first year (CLV) |
|---|---|---|---|---|---|---|---|---|
| ….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 10.5 |
| ….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 2.5 |
| ….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 5.3 |
| ….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 7.6 |
| Total | | | | | | | | 9.2 |
Number of boxes from newly acquired customers ~ Activity in TV** + Activity in Paid Social** + ………..
** Dummy data in all charts
![Page 18: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/18.jpg)
In-house <----> vendor
Bottom up <----> top down
Algorithms <----> features
Quality <----> business-valid output
![Page 19: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/19.jpg)
We need to manage the trade-off between devoting resources to feature extraction vs algorithm selection & tuning
• In most situations, a healthy balance is required, tending in the direction indicated by the 4 criteria
• Focus on algorithms does not necessarily entail that we dive straight into deep learning!
The 4 criteria: Data size / How unstructured is your data / Looking for causal explanations / Budget
Focus on algorithms <----> Focus on features
![Page 20: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/20.jpg)
Some problems just get no good solutions without the right features…
Charts: Demand forecasting backtest in country 1; Demand forecasting backtest in country 2
• Neither standard time series nor an average ensemble forecast works
• The best forecast method selected by progressive cross-validation is better (final.forecast)
• Frequent review based on backtesting and root-cause analysis is even better
Y-axis (both charts): Model error based on 16-week progressive cross-validation
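Selecting the best forecast method by progressive cross-validation can be sketched as follows. This is an assumption-laden illustration, not the pipeline behind the charts: the three candidate methods, the dummy series, and all function names are hypothetical, and only the 16-week evaluation window is taken from the slide.

```python
def naive(history):
    """Repeat the last observed value."""
    return history[-1]


def moving_average(history, window=4):
    """Average of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def seasonal_naive(history, season=4):
    """Repeat the value observed one season ago."""
    return history[-season] if len(history) >= season else history[-1]


def progressive_cv_error(series, method, n_test):
    """Mean absolute error of one-step-ahead forecasts over the last n_test points.

    For each test point the method only sees the observations before it.
    """
    errors = []
    for t in range(len(series) - n_test, len(series)):
        forecast = method(series[:t])
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)


def select_best_method(series, methods, n_test=16):
    """Score every candidate and return the name with the lowest error."""
    scores = {name: progressive_cv_error(series, fn, n_test)
              for name, fn in methods.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Dummy weekly series with a repeating 4-week pattern (not real data)
weekly_boxes = [100, 120, 90, 110] * 8

methods = {
    "naive": naive,
    "moving_average": moving_average,
    "seasonal_naive": seasonal_naive,
}
best_name, cv_scores = select_best_method(weekly_boxes, methods, n_test=16)
final_forecast = methods[best_name](weekly_boxes)

print(best_name, cv_scores, final_forecast)
```

The selection step is only as good as the candidates it chooses from, which is why the slide ranks regular backtest review and root-cause analysis above automated selection.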
![Page 21: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/21.jpg)
In-house <----> vendor
Bottom up <----> top down
Algorithms <----> features
Quality <----> business-valid output
![Page 22: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/22.jpg)
Standard data science process is not linear => requires iterations
Source: Microsoft
• The key is to factor in iterations with business stakeholders as indispensable steps in ALL phases of the project timeline, not only at large milestones
![Page 23: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/23.jpg)
Balancing business insight and simulation / prediction power
• Typically, the statistics used don’t align exactly with the desired business outcomes
• There is usually an inverse relationship between how well the model predicts and how interpretable its components are
• In marketing attribution, forcing intuitive constraints (non-negative contribution of channels, convex shape of response = saturation etc.) often affects fit and predictive strength
• Hitting the sweet spot requires an iterative process of refining the model against business assumptions and usability / actionability
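The cost of an intuitive constraint can be shown on a toy case. The sketch below fits two channels with and without a non-negativity constraint and compares the fit. The channel names, the data, and the brute-force solver are all hypothetical and chosen for illustration; a real attribution model would be far richer.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sse(y, x1, x2, b1, b2):
    """Sum of squared errors for the model y ~ b1*x1 + b2*x2 (no intercept)."""
    return sum((yi - b1 * a - b2 * b) ** 2 for yi, a, b in zip(y, x1, x2))


def ols_two_channels(y, x1, x2):
    """Unconstrained least squares via the 2x2 normal equations."""
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def nonnegative_two_channels(y, x1, x2):
    """Least squares with b1 >= 0 and b2 >= 0, by checking every active set."""
    candidates = [(0.0, 0.0)]
    b1, b2 = ols_two_channels(y, x1, x2)
    if b1 >= 0 and b2 >= 0:
        candidates.append((b1, b2))
    only1 = dot(x1, y) / dot(x1, x1)  # fit with channel 2 forced to zero
    if only1 >= 0:
        candidates.append((only1, 0.0))
    only2 = dot(x2, y) / dot(x2, x2)  # fit with channel 1 forced to zero
    if only2 >= 0:
        candidates.append((0.0, only2))
    return min(candidates, key=lambda c: sse(y, x1, x2, *c))


# Dummy data: the unconstrained fit gives the second channel a negative weight
tv = [1, 2, 3, 4, 5]
paid_social = [2, 1, 4, 3, 6]
boxes = [1.0, 3.5, 4.0, 6.5, 7.0]

b_ols = ols_two_channels(boxes, tv, paid_social)
b_nn = nonnegative_two_channels(boxes, tv, paid_social)
sse_ols = sse(boxes, tv, paid_social, *b_ols)
sse_nn = sse(boxes, tv, paid_social, *b_nn)

print("unconstrained:", b_ols, "SSE:", sse_ols)
print("non-negative: ", b_nn, "SSE:", sse_nn)
```

The constrained model is easier to defend in front of stakeholders (no channel "destroys" sales), but its error is higher, which is the trade-off the bullets describe.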
![Page 24: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/24.jpg)
Example: Simulator for marketing attribution & ROI purposes based on a PCA + Bayesian network + GAM model
• MVP alone required 8-9 iterations…
• …and it’s an ongoing process
![Page 25: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/25.jpg)
Conclusions: Go’s for ML
▪ Vendors are your friends but don’t marry them
▪ Combine bottom-up and top-down approaches
▪ Make informed decisions about balancing resources between algorithm dev / selection and feature engineering
▪ Factor in iterations with business and make it part of model building
▪ Keep calm and always be prepared to explain discrepancies, since…
Predicting / forecasting / simulating is the art of saying what will happen and then explaining why it didn’t…
![Page 26: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/26.jpg)
We’re hiring at HelloFresh!
▪ Data Scientists
‒ Python, R, Spark, Scala, ML + computer vision / NLP / other deep learning experience
▪ Machine Learning Engineers
‒ Python, Hadoop, Spark, Kafka, ML productionizing expertise
▪ Data Engineers
‒ Python, Hadoop, Spark, Kafka, Airflow, ETL experience
https://www.hellofresh.com/careers/
![Page 27: Machine learning Go’s and No - IoT Tech Expo World Series · to feature extraction vs algorithm selection & tuning • In most situations, a healthy balance is required, tending](https://reader034.vdocument.in/reader034/viewer/2022042316/5f04be3e7e708231d40f7bc3/html5/thumbnails/27.jpg)
Thanks! Any Questions?