meson: heterogeneous workflows with spark at netflix

Post on 19-Feb-2017

473 Views

Category:

Data & Analytics

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Heterogeneous Workflows with Spark at Netflix

0

Antony Arokiasamy | Kedar Sadekar | Personalization Infrastructure

1

Help members find content to watch and enjoy to maximize member satisfaction and retention

Everything is a Recommendation2

Recommendations are driven by Machine Learning

Ranking

Row

s

Machine Learning Pipeline3

User SelectionFeature

GenerationModel

ValidationPublishModel

Model Training

Machine Learning Pipeline Challenges

4

• Innovation• Heterogeneous Environments

• Spark• Native Support

• Separate Orchestration and Execution

• Multi Tenancy

• Machine Learning Constructs• Parameter Sweep – 30k Dockers

Meson Workflow System5

• General Purpose Workflow Orchestration and Scheduling framework• Delegates execution to resource managers like Mesos

• Optimized for Machine Learning Pipelines and Visualization

• Checkout the Blog• http://bit.ly/mesonws or techblog.netflix.com

• Plan to Open Sourced soon

Meson Architecture6

Standard and Custom Step Types7

Parameter Passing8

Hive Query User DataSet Regional DataSet

Global DataSet

Get Users

Regional Model

Global Model

User DataSet

Wrangle Data

Structured Constructs9

Top Down or Bottom Up10

Two Way Communication11

Spark Step12

Artifacts13

• Step outputs tracked as Artifacts

• Visualization

• Memoization

Multi Tenancy14

• Resource Attributes • spark.cores.max• spark.executor.memory• spark.mesos.constraints• Dynamic Resource Allocation

Cluster Management15

• Red-Black software updates

• Scale up/Scale down

Meson/Spark Cluster16

• 100s of Concurrent Jobs

• 700 Nodes

• 5000 Cores

• 25 TB Memory

• Apps: Meson Workflow System, Spark and Dockers

• Few smaller clusters

17

Antony Arokiasamy

Kedar Sadekar

@aasamy

/aasamy

aarokiasamy@netflix.com

@kedar_sadekar

/kedar-sadekar

ksadekar@netflix.com

top related