Orchestration: Fancy Buzzword, or the Inevitable Fate of Docker Containers?


Uploaded by connor-doyle

Posted on 18-Aug-2015



Orchestration

Fancy buzzword, or the fate of containers?


© 2015 Mesosphere, Inc.

Agenda

• What problem are we solving?
• Prior art
• Axes of choice
• The allure of two-level scheduling
• To infinity and beyond
• Oversubscription
• Maintenance


The problem space

“Container orchestration” implies horizontal scalability.

Why you need scale varies, and your workload profile has a bearing on how you should run your clusters (e.g., HPC/HTC needs are different from those of a consumer retail website).

Mo’ scale, mo’ problems:

• failure (and cascading failure)
• fault zones
• maintaining SLOs
• maintenance windows
• monitoring/alerts


We want:

- Stability
- Performance
- Flexibility
- Abstractions we can grasp and explain


Orchestration starts with a good scheduler.


We have options :)

• Centralized
  • Batch schedulers (HTCondor, Slurm, Torque)
  • Monolithic schedulers (Borg)
  • Process schedulers (systemd, fleet, Kubernetes)
  • Two-level schedulers (Mesos)
  • Shared-state schedulers (Ω, i.e. Omega)
• Decentralized
  • Completely! Sparrow
  • Hybrid! Mercury


This is a HUGE opportunity


- To get the abstractions right

- To mitigate the next software crisis

- To do better!


Two-level scheduling is a nice model.


Two-level scheduling

Let the cluster manager:

- Keep track of resources
- Offer resources to applications fairly
- Implement low-level isolation
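In Mesos, the default allocator decides who gets offered resources "fairly" using Dominant Resource Fairness (DRF). The sketch below shows DRF's progressive-filling idea on its own, assuming each user runs identical tasks with a fixed per-task demand. The function names are hypothetical. This is not Mesos's allocator code, which makes offers rather than placing tasks itself.

```python
def dominant_share(alloc, total):
    """Largest fraction of any single cluster resource held by one user."""
    return max(alloc[r] / total[r] for r in total)


def drf_allocate(demands, total):
    """DRF progressive filling (simplified sketch).

    demands: user -> per-task demand, e.g. {"cpus": 1.0, "mem": 4.0}
    total:   cluster capacity per resource, e.g. {"cpus": 9.0, "mem": 18.0}
    Repeatedly launches one task for the user with the lowest dominant share.
    A user whose next task no longer fits is dropped, and filling continues
    for the remaining users. Returns the number of tasks launched per user.
    """
    alloc = {u: {r: 0.0 for r in total} for u in demands}
    used = {r: 0.0 for r in total}
    tasks = {u: 0 for u in demands}
    active = list(demands)  # users still eligible for more tasks, in input order
    while active:
        # min() keeps the first candidate on ties, so ties go to input order.
        user = min(active, key=lambda u: dominant_share(alloc[u], total))
        demand = demands[user]
        if any(used[r] + demand[r] > total[r] for r in total):
            active.remove(user)  # next task does not fit: stop filling this user
            continue
        for r in total:
            used[r] += demand[r]
            alloc[user][r] += demand[r]
        tasks[user] += 1
    return tasks


# Classic DRF example: 9 CPUs and 18 GB of memory; user A's tasks need
# <1 CPU, 4 GB> and user B's need <3 CPUs, 1 GB>.
print(drf_allocate({"A": {"cpus": 1.0, "mem": 4.0}, "B": {"cpus": 3.0, "mem": 1.0}},
                   {"cpus": 9.0, "mem": 18.0}))  # → {'A': 3, 'B': 2}
```

Both users end with the same dominant share (2/3): A is memory-bound, B is CPU-bound.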


Let the application-specific scheduler:

- Track its own job queue
- Think about task constraints
- Define task semantics
- Choose appropriate containerization
- Respond to failures
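On the other side of the split, the application-specific scheduler decides what to do with each offer. The sketch below assumes a hypothetical, simplified offer shape (free CPUs, memory and agent attributes), not the real Mesos protobufs. The scheduler walks its own queue, checks each task's constraints, and packs whatever fits.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Offer:
    # Hypothetical, simplified offer: one agent's free resources plus its attributes.
    agent_id: str
    cpus: float
    mem: float
    attributes: dict = field(default_factory=dict)


@dataclass
class Task:
    name: str
    cpus: float
    mem: float
    constraints: dict = field(default_factory=dict)  # attribute -> required value


def match_offer(offer, queue):
    """Greedily pack queued tasks into one offer and return the tasks to launch.

    Tasks that violate a constraint or do not fit stay in the queue, in order,
    for a later offer. Whatever the launched tasks leave unused would be declined.
    """
    launched = []
    cpus, mem = offer.cpus, offer.mem
    for _ in range(len(queue)):
        task = queue.popleft()
        satisfies = all(offer.attributes.get(k) == v for k, v in task.constraints.items())
        if satisfies and task.cpus <= cpus and task.mem <= mem:
            cpus -= task.cpus
            mem -= task.mem
            launched.append(task)
        else:
            queue.append(task)  # requeue and wait for a better offer
    return launched


queue = deque([Task("web", 1.0, 512.0),
               Task("db", 2.0, 4096.0, {"disk": "ssd"}),
               Task("batch", 4.0, 1024.0)])
offer = Offer("agent-1", cpus=4.0, mem=2048.0, attributes={"disk": "hdd"})
print([t.name for t in match_offer(offer, queue)])  # → ['web']
print([t.name for t in queue])                      # → ['db', 'batch']
```

First-fit packing is only one possible policy. A real scheduler might rank its queue first or hold out for a better offer.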


Hey, looks like a managed runtime!

These have been popular lately!

• JVM
• HHVM
• V8
• ...

Why?

They allow high-level, general-purpose programs to benefit from:

- Portable units of execution
- Architecture-dependent optimizations
- Dynamic (de)optimizations based on insights learned at execution time

...and the runtime gets better over time, for free!


A goal: maximize utilization


...safely!


Jobs like to run on underutilized hardware!

Contention for shared resources can hurt other goals, such as tail latency or throughput.

Besides estimating oversubscribable resources, we need to revise those estimates over time!
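One way to revise oversubscription estimates over time is to recompute slack (allocated but unused resources) from a smoothed usage signal. In the Mesos oversubscription design, this policy lives in a pluggable resource estimator on each agent. The class below is a hypothetical stand-in, and its smoothing factor and headroom are assumed values, not recommendations.

```python
class SlackEstimator:
    """Hypothetical estimator of oversubscribable CPU on one agent.

    Slack is allocated-but-unused CPU. Usage is smoothed with an exponentially
    weighted moving average (EWMA), and the estimate is revised on every sample,
    holding back a fixed fraction of the allocation as headroom.
    """

    def __init__(self, allocated_cpus, alpha=0.5, headroom=0.1):
        self.allocated = allocated_cpus
        self.alpha = alpha        # weight given to the newest sample (assumed value)
        self.headroom = headroom  # fraction of the allocation never offered (assumed)
        self.avg_used = None

    def observe(self, used_cpus):
        """Fold in a new usage sample and return the revised estimate."""
        if self.avg_used is None:
            self.avg_used = used_cpus
        else:
            self.avg_used = self.alpha * used_cpus + (1 - self.alpha) * self.avg_used
        return self.estimate()

    def estimate(self):
        if self.avg_used is None:
            return 0.0  # no data yet, so offer nothing
        slack = self.allocated - self.avg_used - self.headroom * self.allocated
        return max(0.0, slack)


est = SlackEstimator(allocated_cpus=8.0)
for used in [2.0, 2.0, 6.0]:
    print(round(est.observe(used), 2))  # prints 5.2, then 5.2, then 3.2
```

A larger `alpha` reacts faster to load spikes but makes the offered slack jumpier. The headroom absorbs some estimation error before contention forces preemption.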


More challenges (er, opportunities!)

Choose victims wisely!

Is killing the only option?
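When contention does appear, something has to give. The sketch below chooses victims among revocable (oversubscribed) tasks. It assumes a hypothetical priority field and uses run time as a proxy for lost work. Neither is a Mesos API.

```python
from dataclasses import dataclass


@dataclass
class RevocableTask:
    name: str
    cpus: float
    priority: int     # hypothetical scale: lower value = less important
    runtime_s: float  # time run so far, a rough proxy for work lost on a kill


def choose_victims(tasks, cpus_needed):
    """Pick revocable tasks to preempt until at least cpus_needed CPUs are reclaimed.

    Hypothetical policy: least important tasks first. Among equals, the youngest
    go first, since restarting them throws away the least work. If every task is
    chosen and the target is still not met, all of them are returned.
    """
    victims, reclaimed = [], 0.0
    for task in sorted(tasks, key=lambda t: (t.priority, t.runtime_s)):
        if reclaimed >= cpus_needed:
            break
        victims.append(task)
        reclaimed += task.cpus
    return victims


tasks = [RevocableTask("etl", 2.0, 1, 3600.0),
         RevocableTask("crawler", 1.0, 0, 600.0),
         RevocableTask("render", 2.0, 1, 60.0),
         RevocableTask("index", 4.0, 2, 10.0)]
print([t.name for t in choose_victims(tasks, cpus_needed=2.5)])  # → ['crawler', 'render']
```

Killing is not the only possible response. The same ranking could instead pick tasks to throttle, for example by shrinking their CPU share, and escalate to a kill only if contention persists.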


Another goal: orderly downtime

“I’m removing this node from the cluster NOW.”

“I’m going to take this node offline in three hours.”


- Tag resource offers with a time horizon
- Give application schedulers a chance to relocate affected tasks
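In the "three hours" case, a time horizon on each offer lets the application scheduler decide for itself. The sketch below assumes a hypothetical offer field giving the seconds until scheduled downtime. It is not the actual Mesos message layout. The scheduler declines work that would outlast the window and flags running tasks that need to move.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    agent_id: str
    cpus: float
    # Hypothetical time horizon: seconds until the agent is taken down for
    # maintenance, or None if no downtime is scheduled.
    unavailable_in_s: Optional[float] = None


def fits_before_downtime(offer, expected_runtime_s):
    """Accept an offer only if the task should finish before the scheduled downtime."""
    return offer.unavailable_in_s is None or expected_runtime_s <= offer.unavailable_in_s


def tasks_to_relocate(running, agent_id, unavailable_in_s):
    """Names of running tasks on the affected agent whose remaining work outlasts the horizon.

    running: task name -> (agent id, expected remaining seconds). Long-running
    services use float("inf").
    """
    return [name for name, (agent, remaining_s) in running.items()
            if agent == agent_id and remaining_s > unavailable_in_s]


three_hours = 3 * 3600.0
offer = Offer("agent-7", cpus=4.0, unavailable_in_s=three_hours)
print(fits_before_downtime(offer, expected_runtime_s=600.0))       # → True
print(fits_before_downtime(offer, expected_runtime_s=5 * 3600.0))  # → False
running = {"web-1": ("agent-7", float("inf")),
           "batch-3": ("agent-7", 1800.0),
           "batch-4": ("agent-2", 99999.0)}
print(tasks_to_relocate(running, "agent-7", three_hours))  # → ['web-1']
```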


References

1. Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing
2. Distributed Computing in Practice: The Condor Experience
3. Heracles: Improving Resource Efficiency at Scale
4. Large-scale cluster management at Google with Borg
5. Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
6. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
7. Mesos Oversubscription Design Document
8. MESOS-1474: Provide cluster maintenance primitives for operators
9. Omega: flexible, scalable schedulers for large compute clusters
10. Quasar: Resource-Efficient and QoS-Aware Cluster Management
11. Reliable Cron across the Planet
12. Sparrow: Distributed, Low Latency Scheduling
