live predictions with schemaless data at scale. mlmu kosice, exponea
TRANSCRIPT
![Page 1: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/1.jpg)
Live predictions with schemaless data at scale
Ondrej Brichta
![Page 2: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/2.jpg)
CONFIDENTIAL
WE ARE HIRING
![Page 3: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/3.jpg)
Our problem at ExponeaWhat we have:
● Terabytes of customer data● Means to process all the data fast● Customers with business understanding (or our consultants)
What we needed:
● Probability classificator ○ For any chosen event or attribute○ Able to process any given data and return a reasonable output without human interaction
![Page 4: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/4.jpg)
Our problem at ExponeaOne specific problem:
● How to choose which is the best banner to show right now?● Web page personalization?
Solution?
In session prediction
● Predict probability of buying for each customer live when they are online
![Page 5: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/5.jpg)
What steps do we need to do and automate?1. Business understanding2. Data understanding3. Data preparation4. Modeling5. Evaluation6. Deployment
![Page 6: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/6.jpg)
1. Business understandingKey questions:
● What problem do we have?● What are our use cases?● Whom are we creating this model for?
![Page 7: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/7.jpg)
1. Business understandingKey questions:
● What problem do we have? - We want to optimize our campaigns, ...● What are our use cases? - Should I show this banner?
○ Translate this use case into machine learning - predict whether customer will buy something
● Whom are we creating this model for? - NOT FOR US, our customers!
![Page 8: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/8.jpg)
1. Business understandingKey questions:
● What problem do we have? - We want to optimize our campaigns, ...● What are our use cases? - Should I show this banner? ...send this email?
○ Translate this use case into machine learning - predict whether customer will buy something
● Whom are we creating this model for? - NOT FOR US, our customers!
Hard or impossible to automate, but!
We can delegate this to customers or consultants as easier problems.
● Create few templates to choose from○ At least few easy to use and few advanced to modify the model
![Page 9: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/9.jpg)
2. Data understanding● Closely related to business understanding● Most important part of machine learning!● Key questions
○ What data do i have?○ Is the data valuable?○ What do I want to predict?○ Should I train on all data or only subset? <- crucial
● Changes here can influence outcome by the largest amount
![Page 10: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/10.jpg)
2. Data understanding
Automation?
Yes, but difficult...
![Page 11: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/11.jpg)
2. Data understanding● Have pre sets for each template
○ We have in session predictions just for this problem
● Experiment with variables (date ranges, data filters,...)○ We have variables set to best values for all projects. Can be manually adjusted by user
● Have a model providing unselected variables
![Page 12: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/12.jpg)
3. Data preparation● Cleaning the data - have some common structure for easy selection● Derive attributes - have general easy aggregates (count, first, last,...)● Dataset balancing - easy rules to determine which technique to use● Aggregates - have a model which selects which complex aggregates to use● Reducing dimensionality - feature selection● Mapping data to “clean” values - most frequent items, vector indexer● Casting data to vectors - vector assembler
![Page 13: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/13.jpg)
3. Data preparation
![Page 14: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/14.jpg)
3. Data preparation
![Page 15: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/15.jpg)
● Customers connect from different platforms● Customers log in after few events
3. Data preparation
![Page 16: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/16.jpg)
3. Data preparation
![Page 17: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/17.jpg)
4. Modeling
![Page 18: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/18.jpg)
4. Modeling
![Page 19: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/19.jpg)
5. Evaluation● How good is my model?● ROC, F1, Lift curve
![Page 20: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/20.jpg)
5. Evaluation
![Page 21: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/21.jpg)
6. DeploymentHow do you want to use it? Must it be fast? Online or Offline?
![Page 22: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/22.jpg)
6. DeploymentFor us, Fast and Online!
![Page 23: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/23.jpg)
6. Deployment
![Page 24: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/24.jpg)
6. DeploymentOther deployments:
● Fast and Offline -> pre computed results for each customer● Slow and Online -> improved results with slower algorithms● Slow and Offline -> you messed up. Rework your solution.
![Page 25: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/25.jpg)
App overview
![Page 27: Live predictions with schemaless data at scale. MLMU Kosice, Exponea](https://reader034.vdocument.in/reader034/viewer/2022051521/5a64ffc87f8b9aa2548b58c3/html5/thumbnails/27.jpg)
CONFIDENTIAL
WE ARE HIRING