How to deploy Apache Spark to Mesos/DCOS with Iulian Dragoș

Upload: typesafeinc

Post on 07-Jan-2017


Page 1: How to deploy Apache Spark to Mesos/DCOS

How to deploy Apache Spark to Mesos/DCOS

with Iulian Dragoș


Agenda

• Intro to Apache Spark

• Apache Mesos

• Why Spark on Mesos

• A look under the hood


Spark - lightning-fast cluster computing

• next-generation Big Data solution

• analytics and data processing

• up to 100x faster than Hadoop MapReduce

• built with Scala and Akka

• Apache top-level project


Spark

• It’s a next-generation compute engine

• Does not replace the whole Hadoop ecosystem, just MapReduce

• Integrates/works with HDFS, Hive, HBase, etc.


Spark API

• Scala distributed collections

• also available from Python and Java

• interactive shell and job submission

• streaming and batch modes

• flourishing ecosystem (Spark SQL, MLlib, GraphX)
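To make the "Scala distributed collections" point concrete, here is a minimal sketch of a word count written against ordinary Scala collections, so it runs without Spark on the classpath. The sample input and the object name are hypothetical. On Spark the same chain of operators is typically applied to an RDD obtained from a SparkContext, usually with a per-key reduction in place of the local groupBy.

```scala
// Word count on plain Scala collections; the operator chain is the same shape
// one would write against a Spark RDD.
object WordCountSketch {
  // Hypothetical sample input standing in for the lines of a text file.
  val lines: Seq[String] = Seq("spark on mesos", "mesos runs spark", "spark")

  def wordCount(input: Seq[String]): Map[String, Int] =
    input
      .flatMap(_.split("\\s+"))   // split each line into words
      .filter(_.nonEmpty)         // drop empty tokens
      .groupBy(identity)          // group equal words together
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit =
    wordCount(lines).toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w: $n") }
}
```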


Spark execution

[Diagram: Spark cluster overview, from http://spark.apache.org/docs/latest/cluster-overview.html]

Spark execution

• local (for experimentation)

• standalone (built-in cluster manager)

• YARN (Hadoop cluster manager)

• Mesos (general cluster manager)
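In practice the choice between these four options comes down to the master URL handed to Spark. The sketch below maps each option to the URL form Spark accepts. Host names are hypothetical, and the ports are only the conventional defaults (7077 for a standalone master, 5050 for a Mesos master). Older 1.x releases spelled the YARN master as yarn-client or yarn-cluster rather than plain yarn.

```scala
// Maps the four execution options from the slide to Spark master URLs.
object MasterUrlSketch {
  sealed trait ClusterManager
  case object Local extends ClusterManager
  final case class Standalone(host: String, port: Int = 7077) extends ClusterManager
  case object Yarn extends ClusterManager
  final case class Mesos(host: String, port: Int = 5050) extends ClusterManager

  def masterUrl(cm: ClusterManager): String = cm match {
    case Local                  => "local[*]"               // all cores of the local machine
    case Standalone(host, port) => s"spark://$host:$port"   // built-in cluster manager
    case Yarn                   => "yarn"                    // cluster located via Hadoop config
    case Mesos(host, port)      => s"mesos://$host:$port"   // Mesos master
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical host names, for illustration only.
    val managers = Seq(Local, Standalone("spark-master.example.com"), Yarn, Mesos("mesos-master.example.com"))
    managers.foreach(cm => println(masterUrl(cm)))
  }
}
```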


Apache Mesos


Why Apache Mesos?

• General (a “distributed kernel”)

• Efficient resource management

• Proven technology (in production at Apple and Twitter)

• Typesafe & Mesosphere maintain the Spark/Mesos framework


“Program against your datacenter as a single pool of resources”


Frameworks running on Mesos

• HDFS

• Cassandra

• Elasticsearch

• YARN (Myriad)

• Marathon, etc.

• and of course, Spark


Resource scheduling with Mesos

• 2-level scheduling

• Mesos offers resources to frameworks

• Frameworks accept or reject offers

• Offers include CPU cores, memory, ports, disk
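The offer cycle can be modelled in a few lines. This is a toy simulation in plain Scala, not the Mesos API: all type and function names are hypothetical, and only CPU and memory are tracked. The numbers mirror the deck's cluster diagram, where slave S1 is offered with 8 CPUs and 32 GB and the framework answers with two tasks of 2 CPUs and 8 GB each.

```scala
// Toy model of two-level scheduling: Mesos makes an offer, the framework decides
// how much of it to use. An empty answer means the offer is declined.
object OfferSketch {
  final case class Resources(cpus: Double, memGb: Double) {
    def fits(other: Resources): Boolean = other.cpus <= cpus && other.memGb <= memGb
    def -(other: Resources): Resources = Resources(cpus - other.cpus, memGb - other.memGb)
  }
  final case class Offer(slaveId: String, available: Resources)
  final case class Task(slaveId: String, needs: Resources)

  // Framework side: carve up to maxTasks tasks of size perTask out of one offer.
  def respond(offer: Offer, perTask: Resources, maxTasks: Int): List[Task] = {
    @scala.annotation.tailrec
    def loop(remaining: Resources, acc: List[Task]): List[Task] =
      if (acc.size < maxTasks && remaining.fits(perTask))
        loop(remaining - perTask, Task(offer.slaveId, perTask) :: acc)
      else acc
    loop(offer.available, Nil)
  }

  def main(args: Array[String]): Unit = {
    val offer = Offer("S1", Resources(cpus = 8, memGb = 32))
    val tasks = respond(offer, perTask = Resources(cpus = 2, memGb = 8), maxTasks = 2)
    println(s"accepted ${tasks.size} task(s) on ${offer.slaveId}")
  }
}
```

Whatever the framework does not claim stays with Mesos and can be offered to another framework, which is what makes the second scheduling level cheap for the master.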

Mesos Cluster

[Diagram, built up over four slides: a Mesos cluster with the Mesos Master on the master node, Mesos Slaves on the worker nodes (with local disks) running HDFS Name Node and Data Node executors, and the HDFS and Spark framework schedulers (each with Job 1) attached as master clients. Key: Mesos, Spark, HDFS. Steps 1 and 2 are labelled (S1, 8CPU, 32GB, ...); step 3 is labelled with two entries, (S1, 2CPU, 8GB, ...) and (S1, 2CPU, 8GB, ...); step 4 adds a Spark Executor with its tasks on the slave node.]

Spark Cluster Abstraction

[Diagram: the Spark Driver (object MyApp { def main() { val sc = new SparkContext(…) … }}) is connected to a Cluster Manager and to Spark Executors on the nodes; each Spark Executor runs several tasks.]

Mesos Coarse Grained Mode

[Diagram: the Spark Framework consists of the Spark Driver (object MyApp { def main() { val sc = new SparkContext(…) … }}) and its Scheduler, connected to the Mesos Master on the master node. On each node, one Mesos Executor hosts one Spark Executor, which runs several tasks.]

• Fast startup for tasks: better for interactive sessions.

• But resources are locked up in a larger Mesos task. (Dynamic allocation changes this in Spark 1.5.)


Mesos Fine Grained Mode

[Diagram: the Spark Framework consists of the Spark Driver (object MyApp { def main() { val sc = new SparkContext(…) … }}) and its Scheduler, connected to the Mesos Master on the master node. On each node, a Mesos Executor contains several "Spark Exec" boxes, each running a single task.]

• Better resource utilization.

• Slower startup for tasks: fine for batch and relatively static streaming.
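Which of the two Mesos modes a job uses is selected with the spark.mesos.coarse property, and spark.cores.max limits how many cores a coarse-grained application holds. The sketch below keeps the settings in a plain Map so it runs without Spark; the helper names are hypothetical. The entries can be passed to spark-submit as --conf flags or set on a SparkConf. Because the default mode has changed between Spark releases, setting the property explicitly is the safer habit.

```scala
// Builds the settings for either Mesos run mode as plain key/value pairs.
object MesosModeSketch {
  sealed trait MesosMode
  case object CoarseGrained extends MesosMode
  case object FineGrained extends MesosMode

  def modeConf(mode: MesosMode, maxCores: Option[Int] = None): Map[String, String] = {
    val base = Map("spark.mesos.coarse" -> (mode == CoarseGrained).toString)
    mode match {
      // Cap the cores a long-lived coarse-grained application keeps locked up.
      case CoarseGrained => base ++ maxCores.map(c => "spark.cores.max" -> c.toString).toList
      case FineGrained   => base
    }
  }

  // Render the settings as spark-submit flags, sorted for stable output.
  def confFlags(conf: Map[String, String]): Seq[String] =
    conf.toSeq.sortBy(_._1).map { case (k, v) => s"--conf $k=$v" }

  def main(args: Array[String]): Unit =
    confFlags(modeConf(CoarseGrained, maxCores = Some(8))).foreach(println)
}
```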


Dynamic allocation

• Mesos support was added in Spark 1.5

• adds and removes executors based on load

• when executors are idle, kills them

• when tasks queue up in the scheduler, adds executors

• needs the external shuffle service to be running on each node
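As a rough illustration of the behaviour described above, the sketch below pairs the two properties that switch the feature on (spark.dynamicAllocation.enabled and spark.shuffle.service.enabled) with a toy scaling rule. The rule is an assumption made for illustration and is not Spark's actual policy, which works with timeouts and requests executors in growing batches. All function names are hypothetical.

```scala
// Settings plus a deliberately simplified model of the add/remove decision.
object DynamicAllocationSketch {
  // Properties that enable dynamic allocation and the external shuffle service.
  val conf: Map[String, String] = Map(
    "spark.dynamicAllocation.enabled" -> "true",
    "spark.shuffle.service.enabled"   -> "true"
  )

  // Toy rule (NOT Spark's algorithm): one more executor while tasks are queued,
  // otherwise release every idle executor; always stay within [min, max].
  def nextExecutorCount(current: Int, pendingTasks: Int, idleExecutors: Int, min: Int, max: Int): Int = {
    val wanted =
      if (pendingTasks > 0) current + 1
      else current - idleExecutors
    math.max(min, math.min(max, wanted))
  }

  def main(args: Array[String]): Unit = {
    println(nextExecutorCount(current = 2, pendingTasks = 5, idleExecutors = 0, min = 1, max = 4))
    println(nextExecutorCount(current = 3, pendingTasks = 0, idleExecutors = 2, min = 1, max = 4))
  }
}
```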


Client vs Cluster mode

• Where does the driver process run?

• client-mode: on the machine that submits the job

• cluster-mode: on a machine in the cluster
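On the command line the difference is a single spark-submit flag, --deploy-mode. The sketch below only assembles the argument list and does not launch anything. The class name, jar locations and host names are hypothetical. On Mesos, a cluster-mode submission is normally addressed to a running MesosClusterDispatcher rather than to the Mesos master, and the application jar has to be reachable from inside the cluster.

```scala
// Assembles a spark-submit command line for either deploy mode.
object SubmitSketch {
  sealed abstract class DeployMode(val flag: String)
  case object Client extends DeployMode("client")    // driver runs on the submitting machine
  case object Cluster extends DeployMode("cluster")  // driver runs on a machine in the cluster

  def submitCommand(master: String, mode: DeployMode, mainClass: String, appJar: String): Seq[String] =
    Seq("spark-submit", "--master", master, "--deploy-mode", mode.flag, "--class", mainClass, appJar)

  def main(args: Array[String]): Unit = {
    // Hypothetical hosts, class name and jar locations.
    println(submitCommand("mesos://mesos-master.example.com:5050", Client,
      "com.example.MyApp", "my-app.jar").mkString(" "))
    println(submitCommand("mesos://dispatcher.example.com:7077", Cluster,
      "com.example.MyApp", "http://repo.example.com/my-app.jar").mkString(" "))
  }
}
```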


Demo


What’s next on Mesos

• Oversubscription (0.23)

• Persistent Volumes

• Dynamic Reservations

• Optimistic Offers

• Isolations

• More…


Closing words on Spark Streaming

• Spark 1.5 improves resiliency by adding back-pressure inside Spark Streaming

• slows down receivers dynamically, based on load

• Spark 1.6 will add the ability to connect to Reactive Streams

• propagate back-pressure outside of Spark
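Back-pressure inside Spark Streaming is switched on with the spark.streaming.backpressure.enabled property. To show the idea of slowing receivers based on load, here is a toy estimator that caps the ingestion rate at what the last batch actually sustained. It is an illustrative assumption, not the estimator Spark ships, and the function name is hypothetical.

```scala
// Toy back-pressure: derive the next receiver rate from the last batch's throughput.
object BackPressureSketch {
  // Property that enables the built-in mechanism (available from Spark 1.5).
  val conf: Map[String, String] = Map("spark.streaming.backpressure.enabled" -> "true")

  // Records per second the job managed to process in the last batch.
  // With no usable measurement, keep the current rate unchanged.
  def nextRate(recordsInBatch: Long, processingMillis: Long, currentRate: Double): Double =
    if (recordsInBatch <= 0 || processingMillis <= 0) currentRate
    else recordsInBatch * 1000.0 / processingMillis

  def main(args: Array[String]): Unit =
    // 1000 records took 2 seconds, so receivers should be slowed to that throughput.
    println(nextRate(recordsInBatch = 1000, processingMillis = 2000, currentRate = 1000.0))
}
```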


Key points

• Spark is a next-generation compute engine for Big Data

• Mesos is a next-generation cluster manager

• better utilization of cluster resources across the organization

• Spark on Mesos is commercially supported by Typesafe

• Typesafe & Mesosphere are the maintainers of Spark/Mesos


EXPERT SUPPORT

Why Contact Typesafe for Your Apache Spark Project?

Ignite your Spark project with 24/7 production SLA, unlimited expert support and on-site training:

• Full application lifecycle support for Spark Core, Spark SQL & Spark Streaming

• Deployment to Standalone, EC2, Mesos clusters

• Expert support from dedicated Spark team

• Optional 10-day “getting started” services package

Typesafe is a partner with Databricks, Mesosphere and IBM.

Learn more about on-site training