AWS re:Invent 2014 talk: Scheduling using Apache Mesos in the Cloud

Upload: sharma-podila

Post on 07-Jul-2015

Category: Software



DESCRIPTION

How can you reliably schedule tasks in an unreliable, autoscaling cloud environment? In this presentation, we'll talk about the design of our scheduler built on top of Apache Mesos that serves as the core of our stream-processing platform, Mantis, designed for real-time insights. We'll focus on the following aspects of the scheduler:

- Coarse-grained vs. fine-grained resource scheduling
- Fault tolerance via a combination of task reconciliation and life cycle event processing
- Scheduling optimizations for bin packing, for stream locality to reduce network bandwidth usage, for task placement to achieve autoscaling of the cluster size, etc.

This talk will also include detailed information about approaches to scheduling in a distributed, autoscaling environment.

TRANSCRIPT

Page 12:

Analytics
System health
Customer experience
Insights
Anomaly detection

Page 16:

[Diagram labels: User, Job 1; User, Job 2; User, Job 3; Discovery]

Page 17:

[Diagram labels: Mantis; Fenzo; Mesos Framework; Apache Mesos; ASG]

Page 18:

[Diagram labels: Job]

Page 22:

[Diagram labels: Mesos master; Standby master (two); Mesos slaves running FrmWrk1 and FrmWrk2 executors with tasks; frameworks FrmWrk1 and FrmWrk2]

Page 24:

[Diagram labels: Instance 1: Task A; Instance 2: Task B; Instance 1: Task A, Task B, Task C, Task D]

Page 35:

[Diagram labels: Data stream; Host A: Task1; Host B: Task2; Host C: Task3]

Page 36:

[Diagram labels: Data stream; Host X: Task1, Task2, Task3. Data stream; Host A: Task1; Host B: Task2; Host C: Task3]

Page 41:

[Diagram labels: Mantis; Fenzo; Mesos Framework]

Page 42:

[Diagram labels: Mesos slave; FrmWrk1 executor. Mesos slave; Framework executor: Task, Task, Task; Framework executor: Task]

Page 43:

[Diagram labels: Apache Mesos; Mesos Master; Framework; Persistence]

Page 44:

[Diagram labels: Apache Mesos; Mesos Master; Framework; Persistence]

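Page 45:

// Fragment of a Mesos FrameworkInfo builder chain, as shown on the slide
    .setName(name)
    .setFailoverTimeout(to)  // seconds the master waits for the framework to re-register
    .setId(id)
    .setCheckpoint(true)     // have the slaves checkpoint framework state to disk
    .build();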

Page 50:

Heterogeneous
Autoscale
Visibility
Plugins for Constraints, Fitness
High speed

Page 52:

[Diagram labels: Mesos master; Mesos framework; Task requests; Available resource offers; Fenzo task scheduler; Persistence]

Page 53:

[Diagram labels: Fitness; Urgency]

Page 54:

Speed vs. accuracy (real-world trade-offs):
First fit assignment: ~ O(1)
Optimal assignment: ~ O(N * M) [1]

[1] Assuming tasks are not reassigned
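The trade-off on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the talk: the `Host` model, the CPU-only fitness score, and the method names are hypothetical. It contrasts stopping at the first host that fits with scanning every host for the highest fitness, which for M tasks over N hosts means on the order of N * M fitness evaluations.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: first-fit vs. exhaustive best-fitness host selection. */
final class AssignmentSketch {

    /** Illustrative host model: a fixed CPU total, part of it already used. */
    static final class Host {
        final String name;
        final double totalCpus;
        final double usedCpus;

        Host(String name, double totalCpus, double usedCpus) {
            this.name = name;
            this.totalCpus = totalCpus;
            this.usedCpus = usedCpus;
        }

        boolean fits(double cpus) {
            return usedCpus + cpus <= totalCpus;
        }

        /** Assumed fitness in [0, 1]: how full the host would be after placement. */
        double fitness(double cpus) {
            return (usedCpus + cpus) / totalCpus;
        }
    }

    /** First fit: return the first host that can hold the task. */
    static Host firstFit(List<Host> hosts, double cpus) {
        for (Host h : hosts) {
            if (h.fits(cpus)) {
                return h;
            }
        }
        return null; // no host can hold the task
    }

    /** Exhaustive scan: evaluate every host and keep the highest fitness. */
    static Host bestFit(List<Host> hosts, double cpus) {
        Host best = null;
        for (Host h : hosts) {
            if (h.fits(cpus) && (best == null || h.fitness(cpus) > best.fitness(cpus))) {
                best = h;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Host> hosts = Arrays.asList(
                new Host("host-a", 8, 1),
                new Host("host-b", 8, 6),
                new Host("host-c", 8, 4));
        System.out.println("first fit: " + firstFit(hosts, 2).name);
        System.out.println("best fit:  " + bestFit(hosts, 2).name);
    }
}
```

With these sample hosts, first fit picks the nearly empty host-a, while the full scan picks host-b, which the task fills completely. A practical scheduler might stop scanning once a "good enough" fitness is found, which sits between the two extremes.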

Page 71:

[Charts: "No bin packing used" vs. "With bin packing"; axis: #Hosts; series: #Full, #Partial, #Empty]

Page 72:

[Charts: "No task runtime-based packer" vs. "Using task runtime-based packer"; axis: #Hosts; series: Different runtimes, Same runtimes, Unused]

Page 78:

ASG/Cluster: mantisagent
MinIdle: 8
MaxIdle: 20
CooldownSecs: 360

Fenzo
ScaleUp action: Cluster, N
ScaleDown action: Cluster, HostList
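One way to read the rule fields above is sketched below. This is a hedged illustration, not Fenzo's implementation: the `Rule` class, the `evaluate` method, and the exact semantics are assumptions. It assumes a scale-up restores the idle count to MinIdle, a scale-down names the idle hosts beyond MaxIdle, and no action fires within CooldownSecs of the previous one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of evaluating a MinIdle/MaxIdle/Cooldown autoscale rule. */
final class AutoscaleSketch {

    /** Rule fields named after the slide; the class itself is illustrative. */
    static final class Rule {
        final String cluster;
        final int minIdle;
        final int maxIdle;
        final long cooldownSecs;

        Rule(String cluster, int minIdle, int maxIdle, long cooldownSecs) {
            this.cluster = cluster;
            this.minIdle = minIdle;
            this.maxIdle = maxIdle;
            this.cooldownSecs = cooldownSecs;
        }
    }

    /** Returns a description of the action to take, or null if none is due. */
    static String evaluate(Rule rule, List<String> idleHosts, long nowSecs, long lastActionSecs) {
        if (nowSecs - lastActionSecs < rule.cooldownSecs) {
            return null; // still cooling down from the previous action
        }
        int idle = idleHosts.size();
        if (idle < rule.minIdle) {
            // Assumed policy: add just enough hosts to get back to MinIdle.
            return "ScaleUp(" + rule.cluster + ", " + (rule.minIdle - idle) + ")";
        }
        if (idle > rule.maxIdle) {
            // Assumed policy: name the idle hosts beyond MaxIdle for termination.
            List<String> excess = new ArrayList<>(idleHosts.subList(rule.maxIdle, idle));
            return "ScaleDown(" + rule.cluster + ", " + excess + ")";
        }
        return null; // idle count is within [MinIdle, MaxIdle]
    }

    /** Builds n placeholder host names: h0, h1, ... */
    static List<String> hosts(int n) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            names.add("h" + i);
        }
        return names;
    }

    public static void main(String[] args) {
        Rule rule = new Rule("mantisagent", 8, 20, 360);
        System.out.println(evaluate(rule, hosts(3), 1000, 0));
        System.out.println(evaluate(rule, hosts(22), 1000, 0));
    }
}
```

Passing a host list on scale-down matters because only the scheduler knows which hosts are actually idle; letting the ASG pick hosts on its own could terminate ones that are running tasks.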

Page 81:

// Fragment of the Fenzo TaskScheduler builder chain, as shown on the slide
    .withLeaseOfferExpirySecs(60)  // reject offers left unused for 60 seconds
    .withLeaseRejectAction((lease) -> {
        // hand the rejected offer back to Mesos
        mesosDriver.declineOffer(lease.getOffer().getId());
    })
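The interplay of the offer expiry and the reject action can be sketched without the Fenzo or Mesos libraries. The class below is a hypothetical stand-in, not Fenzo's code: `OfferExpirySketch`, `HeldOffer`, and `expireOffers` are names invented for illustration. It assumes the scheduler holds unused offers and invokes the caller-supplied reject callback for any offer held at least as long as the expiry.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: hold unused offers, reject them once they expire. */
final class OfferExpirySketch {

    /** An offer together with the time it was received (illustrative model). */
    static final class HeldOffer {
        final String offerId;
        final long receivedAtSecs;

        HeldOffer(String offerId, long receivedAtSecs) {
            this.offerId = offerId;
            this.receivedAtSecs = receivedAtSecs;
        }
    }

    private final long expirySecs;
    private final Consumer<HeldOffer> rejectAction;
    private final List<HeldOffer> held = new ArrayList<>();

    OfferExpirySketch(long expirySecs, Consumer<HeldOffer> rejectAction) {
        this.expirySecs = expirySecs;
        this.rejectAction = rejectAction;
    }

    void addOffer(String offerId, long nowSecs) {
        held.add(new HeldOffer(offerId, nowSecs));
    }

    /** Rejects and drops every offer held for expirySecs or longer; returns the count. */
    int expireOffers(long nowSecs) {
        int rejected = 0;
        Iterator<HeldOffer> it = held.iterator();
        while (it.hasNext()) {
            HeldOffer offer = it.next();
            if (nowSecs - offer.receivedAtSecs >= expirySecs) {
                it.remove();
                rejectAction.accept(offer); // e.g. decline the offer back to the master
                rejected++;
            }
        }
        return rejected;
    }

    int heldCount() {
        return held.size();
    }

    public static void main(String[] args) {
        List<String> declined = new ArrayList<>();
        OfferExpirySketch pool = new OfferExpirySketch(60, o -> declined.add(o.offerId));
        pool.addOffer("offer-1", 0);
        pool.addOffer("offer-2", 30);
        pool.expireOffers(70);
        System.out.println("declined: " + declined + ", still held: " + pool.heldCount());
    }
}
```

Taking the reject action as a callback keeps the scheduling logic independent of the Mesos driver; the framework decides what rejecting an offer means.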

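Page 82:

// Fragment of the Fenzo TaskScheduler builder chain, as shown on the slide
    .withLeaseOfferExpirySecs(60)  // reject offers left unused for 60 seconds
    .withLeaseRejectAction((lease) -> {
        // hand the rejected offer back to Mesos
        mesosDriver.declineOffer(lease.getOffer().getId());
    })
    // prefer hosts whose CPUs are already more fully used
    .withFitnessCalculator(BinPackingFitnessCalculators.cpuBinPacker)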

Page 85:

Heterogeneous
Autoscale
Visibility
Plugins for Constraints, Fitness
High speed

Page 86:

[Diagram labels: Mantis; Fenzo; Mesos Framework; Apache Mesos; ASG]

Page 88:

Talk     Time               Title
PFC-305  Wednesday, 1:15pm  Embracing Failure: Fault Injection and Service Reliability
BDT-403  Wednesday, 2:15pm  Next Generation Big Data Platform at Netflix
PFC-306  Wednesday, 3:30pm  Performance Tuning EC2
DEV-309  Wednesday, 3:30pm  From Asgard to Zuul, How Netflix’s proven Open Source Tools can accelerate and scale your services
ARC-317  Wednesday, 4:30pm  Maintaining a Resilient Front-Door at Massive Scale
PFC-304  Wednesday, 4:30pm  Effective Inter-process Communications in the Cloud: The Pros and Cons of Micro Services Architectures
ENT-209  Wednesday, 4:30pm  Cloud Migration, Dev-Ops and Distributed Systems
APP-310  Friday, 9:00am     Scheduling using Apache Mesos in the Cloud

Page 89:

Please give us your feedback on this presentation