When we Spark and when we don’t - QCon.ai

When we Spark and when we don’t:

ML Pipeline Development at Stitch Fix

Talk Flow

● What is Stitch Fix?

● Infrastructure and Tech Stack

● Thoughts on Good Practices for Developing ML Pipelines

● Case Study: Inventory Recommendation Models

● Tooling & Abstractions at Stitch Fix

Share your style, size and price preferences with your personal stylist.

Get 5 hand-selected pieces of clothing delivered to your door.

Try your fix on in the comfort of your home.

Leave feedback and pay for only the items you keep.

Return the other items in the envelope provided.

Stitch Fix

There’s an algorithm for that...

Styling Algorithms

Client/Stylist Matching

Demand Modeling

Human Computation

Pick Path Optimization

New Style Development

Inventory Allocation

State Machines

Warehouse Assignment

Batch Picking

Replenishment

* Find out more at http://algorithms-tour.stitchfix.com/


Our Infrastructure and Tech Stack

[Architecture diagram]

● Labels: Data Acquisition, Data Processing, Data Storage, Data Management, Job Execution, Workflow Management

● Components: Camera (state snapshots), Flotilla, Bumblebee (metadata manager), Metastore, Uhura, AWS S3, AWS ECS clusters

● Environments: Prod, Dev/Research

Some facts

● 1000s of jobs / day

○ Model training, featurization, test analysis, reporting, analytics, ad hoc research

● Production jobs run on

○ Spark: mostly Spark SQL and PySpark

○ Flotilla: Python or R in Docker containers on ECS

● ML pipelines typically consist of several jobs spanning the stack of technologies

● Data scientists own pipelines and implementations end-to-end

Good Practices for Developing ML Pipelines

Pipelines should be designed to support constant iteration

○ Individual pipelines/algorithms/implementations change quickly

○ Tooling and infrastructure should be relatively stable

At scale, failure should be expected

○ Be robust to failure

■ Checkpointing

■ Isolation

■ Automated Retries

■ Alerting

○ Make it easy to debug and diagnose

○ We train 100s of models / day, and expect some # to fail.
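The "Automated Retries" point can be illustrated with a minimal sketch. The helper name `run_with_retries`, its parameters, and the exponential backoff schedule are assumptions made for illustration; the talk does not say how retries are implemented at Stitch Fix.

```python
import time


def run_with_retries(job, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run job() until it succeeds or max_attempts is exhausted.

    Hypothetical helper: waits base_delay * 2**(attempt - 1) seconds between
    attempts and re-raises the last error so that alerting can pick it up.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay}s")
            sleep(delay)


# Usage: a job that fails twice before succeeding.
calls = {"n": 0}


def flaky_training_job():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "model trained"


result = run_with_retries(flaky_training_job, base_delay=0.0)
```

Retrying blindly is only safe when the job itself can be re-run without side effects, which is why retries and idempotency tend to be discussed together.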

Pipelines and jobs should be idempotent.
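One common way to get idempotent jobs is to make the output location a pure function of the job's inputs and to publish results atomically. The sketch below assumes a local filesystem and JSON artifacts; `write_atomically`, `run_extract`, and the `run_date=` path layout are hypothetical names, not something the slides describe.

```python
import json
import os
import tempfile


def write_atomically(path, payload):
    """Write payload as JSON to path via a temp file and an atomic rename."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle, sort_keys=True)
        os.replace(tmp_path, path)  # readers never see a half-written file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def run_extract(output_root, run_date, compute):
    """Idempotent job: the output path depends only on run_date.

    If the artifact already exists it is reused as a checkpoint; otherwise it
    is computed and written once. Re-running yields the same final state.
    """
    path = os.path.join(output_root, f"run_date={run_date}", "extract.json")
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    payload = compute(run_date)
    write_atomically(path, payload)
    return payload


# Usage: the second call reuses the artifact instead of recomputing it.
root = tempfile.mkdtemp()
computed = []


def compute(run_date):
    computed.append(run_date)
    return {"run_date": run_date, "rows": 3}


first = run_extract(root, "2018-04-10", compute)
second = run_extract(root, "2018-04-10", compute)
```

The artifact left on disk doubles as a checkpoint: a retried or re-triggered run finds it and skips the expensive step.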

Make pragmatic choices with respect to technology.

Case Study: Inventory Recommendation Models

[Diagram: the same three-step pipeline repeated for each model: Extract Training Data, Train Model, Upload Model]

Algo_V1_1

Model by Inventory Department

[Pipeline diagram, shown twice in the slides. Nodes:]

● Ingest: User Item Rating Data

● Extract “wide” Client Training Data

● Extract “wide” Item Training Data

● Model A Training Data, Model B Training Data, Model C Training Data, Model D Training Data

● Train Model A, Train Model B, Train Model C, Train Model D

● Upload Model A, Upload Model B, Upload Model C, Upload Model D

client_features: {
  "expanded_colors": {
    "in": [ "client_colors" ],
    "fn": "dummy_expand"
  },
  "X_Y_ratio": {
    "in": [ X, Y ],
    "fn": "compute_scaled_ratio"
  }
  …
},
item_features: {
  "expanded_print": {
    "in": [ colors ],
    "fn": "dummy_expand"
  }
},
interaction_features: {}

Extract Jobs generated from resolution of Model + Feature Definitions

{
  "deptA": {
    "computed_features": [ "example_feature" ],
    "formula": [ "s ~ 1 + f_a + shiny_material_flag + x_y_ratio" ]
  },
  "deptB": {
    "computed_features": [ "example_feature" ],
    "formula": [ "s ~ 1 + f_a + x_y_ratio + client_color_a + expanded_print_x" ]
  }
}
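To make "resolution of Model + Feature Definitions" concrete, here is a rough sketch of how an extract plan could be derived from a formula and a set of feature definitions. The definitions below are simplified and hypothetical (formulas are plain strings, and dummy-expanded columns such as `expanded_print_x` are not handled); the slides do not show the actual resolution logic.

```python
import re

# Hypothetical, simplified definitions in the spirit of the slides.
FEATURE_DEFS = {
    "x_y_ratio": {"in": ["x", "y"], "fn": "compute_scaled_ratio"},
    "expanded_print": {"in": ["print"], "fn": "dummy_expand"},
}

MODEL_DEFS = {
    "deptA": {"formula": "s ~ 1 + f_a + x_y_ratio"},
    "deptB": {"formula": "s ~ 1 + f_a + x_y_ratio + expanded_print"},
}


def formula_terms(formula):
    """Return the right-hand-side variable names of an R-style formula."""
    _, rhs = formula.split("~", 1)
    terms = [term.strip() for term in rhs.split("+")]
    # Drop the intercept ("1") and anything that is not a plain identifier.
    return [t for t in terms if re.fullmatch(r"[A-Za-z_]\w*", t)]


def resolve_extract(model_def, feature_defs):
    """Work out what one model's extract job has to read and compute."""
    raw_columns, computed = set(), {}
    for term in formula_terms(model_def["formula"]):
        if term in feature_defs:
            # Computed feature: record its function and pull in its inputs.
            computed[term] = feature_defs[term]["fn"]
            raw_columns.update(feature_defs[term]["in"])
        else:
            # Anything else is assumed to be a column read as-is.
            raw_columns.add(term)
    return {"raw_columns": sorted(raw_columns), "computed": computed}


plans = {dept: resolve_extract(d, FEATURE_DEFS) for dept, d in MODEL_DEFS.items()}
```

Each entry in `plans` describes one department's extract job, so adding a term to a formula changes what is extracted without anyone hand-editing the extract query.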

Some Observations

1. Spark is utilized heavily for feature engineering.

2. Model fitting occurs in containerized Python and R environments.

3. Individual jobs communicate via data dependencies.

4. Our inventory recommendation algorithms are specified with a high degree of tooling.

5. Pipelines leave behind multiple artifacts for analysis, debugging, and checkpointing. (extract, train, load)

6. Individual models are isolated from one another. (and can fail without impacting the rest of the group)

7. Data is contextual: e.g. item type; business line
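The isolation observation (one model can fail without impacting the rest of the group) can be sketched in a few lines. In the talk the isolation comes from running separate jobs in separate containers; the in-process loop below is only an illustration of the resulting behavior, and `train_all` and `train_one` are hypothetical names.

```python
def train_all(departments, train_one):
    """Train one model per department; a failure affects only that model."""
    succeeded, failed = {}, {}
    for dept in departments:
        try:
            succeeded[dept] = train_one(dept)
        except Exception as exc:  # isolate the failure and keep going
            failed[dept] = repr(exc)
    return succeeded, failed


def train_one(dept):
    # Stand-in for a real training job; deptC simulates a failing model.
    if dept == "deptC":
        raise ValueError("training data missing")
    return f"model-for-{dept}"


succeeded, failed = train_all(["deptA", "deptB", "deptC", "deptD"], train_one)
```

The `failed` mapping is what would feed alerting and debugging, while the other departments still get their models.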

Platform Tooling is Important!

Desirable Properties of Infrastructure & Tooling

● Isolation should be guaranteed by the infrastructure

● It should be obvious what running jobs and services are doing, when, and why

● Access to data should be easy, consistent, and self-service

● Guide rails should enforce, or strongly encourage, idempotent patterns

● Scaling, logging, and security should be baked into infrastructure and tooling

Access to Data

● All data is managed and tracked by the Metastore

○ Hive metastore abstracted by Bumblebee

○ Location, Schema, Format

● Data access for Python and R is a first-class citizen

○ Typically accessed as dataframes

○ df = load_dataframe( namespace, table)

○ store_dataframe(df, namespace, table)
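The two calls above hide location, schema, and format behind a table name. The sketch below shows one way such an interface could work. It is a toy: the metastore is an in-memory dict, the storage format is CSV on local disk, and a list of dicts stands in for a dataframe so the example needs only the standard library. None of this reflects how Bumblebee or the Stitch Fix client libraries are actually built.

```python
import csv
import os
import tempfile

# Hypothetical in-memory stand-in for the metastore: it records the location,
# schema, and format of each (namespace, table) pair.
METASTORE = {}
DATA_ROOT = tempfile.mkdtemp()


def store_dataframe(rows, namespace, table):
    """Persist rows (a non-empty list of dicts) and register the metadata."""
    schema = list(rows[0].keys())
    location = os.path.join(DATA_ROOT, namespace, f"{table}.csv")
    os.makedirs(os.path.dirname(location), exist_ok=True)
    with open(location, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=schema)
        writer.writeheader()
        writer.writerows(rows)
    METASTORE[(namespace, table)] = {
        "location": location,
        "schema": schema,
        "format": "csv",
    }


def load_dataframe(namespace, table):
    """Look the table up by name; callers never handle file paths."""
    entry = METASTORE[(namespace, table)]
    with open(entry["location"], newline="") as handle:
        return list(csv.DictReader(handle))


# Usage: write a table, then read it back by name only.
store_dataframe([{"client_id": "1", "rating": "5"}], "demo", "ratings")
df = load_dataframe("demo", "ratings")
```

Because callers only name a namespace and a table, the storage location or format can change behind the metastore without touching pipeline code.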

the cloud.

embrace elasticity.

Containerized Batch Jobs

● Containerized job execution has many benefits

○ Strong isolation

○ High degree of control over resources and environment

● But, needs abstraction over job definition and management

○ So we developed Flotilla

○ And open sourced it!

https://stitchfix.github.io/flotilla-os/


Questions?

Get in touch: @jeffmagnusson http://www.linkedin.com/in/jmagnuss

