Workday: Building Large Scale Machine Learning Pipelines
Posted on 15-Apr-2017
TRANSCRIPT
Building Large Scale Machine Learning Pipelines
Vlad Giverts
Sr Director of Engineering
Workday Confidential
Background
Retention Risk
Architecture
Retention Risk ML Pipeline: Spark, YARN, HDFS
Architecture
Retention Risk ML Pipeline: Spark Streaming, Kafka, Cassandra
Data Pipeline
Raw Data
Feature Engineering
Model Training
Model Validation
Data and Features
Num Promotions
Pay Range Penetration
Time in Current Job
Manager Attrition Rate
Time Between Promotions
Tenure
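A minimal sketch of how a few of the features listed above might be derived from a worker's job history. The record layout (`JobEvent`), the field names, and the pay range penetration formula (position of the salary within its pay range, (salary - min) / (max - min)) are assumptions for illustration, not Workday's schema or feature definitions. Manager Attrition Rate is left out because it needs team-level data that this sketch does not model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class JobEvent:
    """Hypothetical job-history record (not Workday's actual schema)."""
    kind: str  # assumed values: "hire", "promotion", "job_change"
    on: date


def years_between(start: date, end: date) -> float:
    """Elapsed time in years, using an average year length of 365.25 days."""
    return (end - start).days / 365.25


def pay_range_penetration(salary: float, range_min: float, range_max: float) -> float:
    """Assumed definition: 0.0 at the bottom of the pay range, 1.0 at the top."""
    return (salary - range_min) / (range_max - range_min)


def worker_features(events: List[JobEvent], as_of: date) -> dict:
    """Compute features using only events on or before `as_of`.

    Assumes the history contains at least one "hire" event before `as_of`.
    """
    past = sorted((e for e in events if e.on <= as_of), key=lambda e: e.on)
    hire = next(e for e in past if e.kind == "hire")
    promos = [e.on for e in past if e.kind == "promotion"]
    # Gaps between consecutive promotions, in years.
    gaps = [years_between(a, b) for a, b in zip(promos, promos[1:])]
    last_change = past[-1].on  # most recent hire, promotion, or job change
    return {
        "tenure_years": years_between(hire.on, as_of),
        "num_promotions": len(promos),
        "time_in_current_job_years": years_between(last_change, as_of),
        "avg_time_between_promotions_years": sum(gaps) / len(gaps) if gaps else None,
    }
```

Restricting the computation to events on or before `as_of` keeps a feature row from seeing anything that happened after the date it describes.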
Data Pipeline
Raw Data
Feature Engineering
Model Training
Model Validation
Cross Validation
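As a point of reference, a minimal sketch of plain k-fold cross validation, which assigns records to folds at random and ignores when each record was observed. The function name `k_fold_indices`, the fold count, and the seed are illustrative choices, not details from the talk.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, held_out) index lists for k randomly assigned folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices round-robin into k folds.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, held_out


for train, held_out in k_fold_indices(n=10, k=5):
    # Every record is held out exactly once across the 5 folds.
    print(len(train), len(held_out))
```

Because the assignment is random, records from any point in time can land in both the training folds and the held-out fold.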
Data Pipeline
Raw Data
Feature Engineering
Model Training
Model Validation
Partition Data
What are we predicting?
Timeline (2014 to 2016):
Barry: Raise ($1,000)
Ravi: Transferred
John: Left :(
Jin: Promoted!
Yury: Hired
Tejas: Left :(
Roger: Left :(
So what happens?
Results?
95 / 95
What REALLY happens…
Results?
15 / 10
What do we want?
Temporal Validation
Timeline (2014 to 2016):
Barry: Raise ($1,000)
Tejas: Transferred
John: Left :(
Jin: Promoted!
Yury: Hired
Tejas: Left :(
Two 3 mo windows marked on the timeline
Training with Validation
Timeline of six 3 mo windows, labeled TRAINING and VALIDATION; axis marks: Early 2014, Mid 2015
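One way to read the timeline above is as a time-based split: features are computed from events up to a cutoff date, the label is whether the worker leaves in the 3 months after that cutoff, and validation cutoffs come strictly after the training windows. The sketch below assumes that reading. The helper names, the 3-month step between cutoffs, and the example dates are illustrative, not taken from the talk.

```python
from datetime import date
from typing import List, Optional, Tuple


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, returning the first day of the target month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def cutoffs(start: date, end: date, step_months: int = 3) -> List[date]:
    """First-of-month cutoff dates from `start` (inclusive) to `end` (exclusive)."""
    out, d = [], start
    while d < end:
        out.append(d)
        d = add_months(d, step_months)
    return out


def left_within(left_on: Optional[date], cutoff: date, horizon_months: int = 3) -> bool:
    """Label: True if the worker left after `cutoff` and no later than the horizon end."""
    if left_on is None:
        return False
    return cutoff < left_on <= add_months(cutoff, horizon_months)


def temporal_split(
    all_cutoffs: List[date], split_at: date, horizon_months: int = 3
) -> Tuple[List[date], List[date]]:
    """Training cutoffs have label windows ending by `split_at`; validation starts there."""
    train = [c for c in all_cutoffs if add_months(c, horizon_months) <= split_at]
    valid = [c for c in all_cutoffs if c >= split_at]
    return train, valid


quarters = cutoffs(date(2014, 1, 1), date(2016, 1, 1))
train, valid = temporal_split(quarters, split_at=date(2015, 7, 1))
print(len(train), len(valid))
```

Requiring each training label window to end by `split_at` keeps outcomes from the validation period out of the training labels.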
Data Pipeline
Raw Data
Feature Engineering
Model Training
Model Validation
Partition Data
Results?
2x Precision
3x Recall
30 / 30
15 / 10
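For reference, precision and recall can be computed from sets of predicted and actual leavers as sketched below. The function name, the worker IDs, and the toy data are made up for illustration. Reading the slide's number pairs as precision % / recall % is an interpretation suggested by the "2x Precision" and "3x Recall" labels (15 to 30 is 2x, 10 to 30 is 3x), not something the slides state outright.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Precision: share of predicted leavers who left. Recall: share of leavers predicted."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall


# Toy data: four workers flagged as at risk, three who actually left.
flagged = {"w1", "w2", "w3", "w4"}
left = {"w1", "w2", "w5"}
precision, recall = precision_recall(flagged, left)
print(f"precision={precision:.2f} recall={recall:.2f}")
```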
Thank You