Orchestration: fancy buzzword, or the inevitable fate of Docker containers?
TRANSCRIPT
Orchestration
Fancy buzzword, or the fate of containers?
© 2015 Mesosphere, Inc.
Connor Doyle
Software Engineer
Mesosphere, Inc.
[email protected]
@nor0101
Hi!
Agenda
• What problem are we solving?
• Prior art
• Axes of choice
• The allure of two-level scheduling
• To infinity and beyond
• Oversubscription
• Maintenance
The problem space
“Container orchestration” implies horizontal scalability.
Why you need scale varies, and your workload profile has bearing on how you should run your clusters (e.g. HPC/HTC needs are different from those of a consumer retail website).
Mo’ scale, mo’ problems: failure (and cascading failure), fault zones, maintaining SLOs, maintenance windows, monitoring/alerts.
We want:
- Stability
- Performance
- Flexibility
- Abstractions we can grasp and explain
Orchestration starts with a good scheduler.
We have options :)
• Centralized
  • Batch schedulers (HTCondor, Slurm, Torque)
  • Monolithic schedulers (Borg)
  • Process schedulers (systemd, fleet, Kubernetes)
  • Two-level schedulers (Mesos, Ω)
• Decentralized
  • Completely! Sparrow
  • Hybrid! Mercury
This is a HUGE opportunity
- To get the abstractions right
- To mitigate the next software crisis
- To do better!
Two-level scheduling is a nice model.
Two-level scheduling
Let the cluster manager:
- Keep track of resources
- Offer resources to applications fairly
- Implement low-level isolation
Let the application-specific scheduler:
- Track its own job queue
- Think about task constraints
- Define task semantics
- Choose appropriate containerization
- Respond to failures
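The split of responsibilities above can be sketched as a toy resource-offer loop (all names here are invented for illustration, not the actual Mesos API): the cluster manager tracks free resources and makes offers (level one), while a framework scheduler owns its job queue and decides placement (level two).

```python
# Hypothetical two-level scheduling sketch; not the real Mesos API.

class ClusterManager:
    """Level 1: tracks resources and offers them to frameworks."""

    def __init__(self, agents):
        # agents: dict of agent_id -> free CPUs on that agent
        self.free = dict(agents)

    def offer(self):
        # Offer whatever is currently free.
        return dict(self.free)

    def accept(self, agent_id, cpus):
        # Commit an accepted offer; reject if it is no longer valid.
        if self.free.get(agent_id, 0) < cpus:
            raise ValueError("offer no longer valid")
        self.free[agent_id] -= cpus


class FrameworkScheduler:
    """Level 2: owns its own job queue and task semantics."""

    def __init__(self, queue):
        self.queue = list(queue)  # list of per-task CPU demands
        self.launched = []

    def resource_offers(self, offers):
        # Greedily place queued tasks onto offered agents.
        for agent_id, cpus in offers.items():
            while self.queue and self.queue[0] <= cpus:
                need = self.queue.pop(0)
                cpus -= need
                self.launched.append((agent_id, need))
        return self.launched


manager = ClusterManager({"agent-1": 4, "agent-2": 2})
framework = FrameworkScheduler(queue=[2, 2, 1])
for agent_id, need in framework.resource_offers(manager.offer()):
    manager.accept(agent_id, need)
# All three tasks land; the manager's free pool shrinks accordingly.
```

The point of the split: the manager never needs to know what a "task" means to the framework, and the framework never needs a global view of the cluster, only the offers it receives.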
Hey, looks like a managed runtime!
These have been popular lately:
• JVM
• HHVM
• V8
• ...
Why? They allow high-level general-purpose programs to benefit from:
- Portable units of execution
- Architecture-dependent optimizations
- Dynamic (de)optimizations based on insights learned at execution time
…and it gets better over time, for free!
A goal: maximize utilization
...safely!
Jobs like to run on underutilized hardware!
Contention for shared resources can negatively impact other goals (such as tail latency or throughput).
Besides estimating oversubscribable resources, we need to revise the estimates over time!
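One way to revise estimates over time is an exponential moving average of CPU slack (allocated but unused CPU). This is a minimal sketch with invented names, not the Mesos resource-estimator API:

```python
# Hypothetical slack estimator: smooths oversubscribable-CPU estimates
# over time so they adapt when usage changes.

class SlackEstimator:
    def __init__(self, allocated_cpus, alpha=0.5):
        self.allocated = allocated_cpus
        self.alpha = alpha        # weight of the newest sample
        self.estimate = 0.0       # current oversubscribable-CPU estimate

    def observe(self, used_cpus):
        # Revise the estimate from a fresh usage sample.
        slack = max(self.allocated - used_cpus, 0.0)
        self.estimate = self.alpha * slack + (1 - self.alpha) * self.estimate
        return self.estimate


est = SlackEstimator(allocated_cpus=8.0)
for used in [2.0, 2.0, 6.0]:   # usage spikes in the last sample
    est.observe(used)
# The estimate shrinks toward the new, smaller slack after the spike,
# rather than jumping instantly: 3.0 -> 4.5 -> 3.25.
```

Smoothing matters for safety: a single quiet sample should not cause the cluster to hand out resources that the primary workload is about to reclaim.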
More challenges (read: opportunities)
Choose victims wisely!
Is killing the only option?
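Both questions can be made concrete with a toy victim-selection policy (entirely hypothetical, not an actual Mesos QoS controller): touch only revocable tasks, prefer throttling over killing, and take the smallest victims first to limit disruption.

```python
# Hypothetical revocation planner: choose victims wisely, and only
# kill when throttling is not an option.

def plan_revocation(tasks, cpus_to_reclaim):
    """tasks: list of dicts with 'name', 'cpus', 'revocable', 'throttleable'."""
    plan = []
    reclaimed = 0.0
    # Never touch non-revocable (latency-sensitive) tasks; among the
    # rest, evict smallest first.
    for t in sorted((t for t in tasks if t["revocable"]),
                    key=lambda t: t["cpus"]):
        if reclaimed >= cpus_to_reclaim:
            break
        action = "throttle" if t["throttleable"] else "kill"
        plan.append((action, t["name"]))
        reclaimed += t["cpus"]
    return plan


tasks = [
    {"name": "web",     "cpus": 2.0, "revocable": False, "throttleable": False},
    {"name": "batch-1", "cpus": 1.0, "revocable": True,  "throttleable": True},
    {"name": "batch-2", "cpus": 3.0, "revocable": True,  "throttleable": False},
]
plan = plan_revocation(tasks, cpus_to_reclaim=2.0)
# [('throttle', 'batch-1'), ('kill', 'batch-2')] -- the web task is untouched.
```

A real controller would also weigh priority, how long a task has run, and the specific contended resource, but the shape is the same: killing is the last resort, not the only option.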
Another goal: orderly downtime
“I’m removing this node from the cluster NOW.”
“I’m going to take this node offline in three hours.”
Tag resource offers with a time horizon
Give application schedulers a chance to relocate affected tasks
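The idea might look like the following sketch (field names are assumptions, loosely inspired by the maintenance-primitives proposal in MESOS-1474): offers carry the time at which the agent becomes unavailable, and an application scheduler filters out agents that will drain before a task could finish.

```python
# Hypothetical maintenance-aware offer filter.

def usable_offers(offers, now, task_runtime):
    """Keep offers whose agent stays up long enough to run the task.

    offers: list of dicts with 'agent' and optionally 'unavailable_at'
    (seconds since epoch); a missing/None value means no maintenance
    is scheduled for that agent.
    """
    ok = []
    for o in offers:
        horizon = o.get("unavailable_at")
        if horizon is None or horizon - now >= task_runtime:
            ok.append(o)
    return ok


offers = [
    {"agent": "a1", "unavailable_at": 300.0},   # drains before the task ends
    {"agent": "a2", "unavailable_at": 7200.0},  # plenty of time left
    {"agent": "a3"},                            # no maintenance scheduled
]
names = [o["agent"] for o in usable_offers(offers, now=0.0, task_runtime=600.0)]
# names == ["a2", "a3"]
```

The same horizon also tells the scheduler which already-running tasks sit on draining agents, so it can relocate them before the deadline instead of being surprised by a hard shutdown.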
References
1. Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing
2. Distributed Computing in Practice: The Condor Experience
3. Heracles: Improving Resource Efficiency at Scale
4. Large-scale cluster management at Google with Borg
5. Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
6. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
7. Mesos Oversubscription Design Document
8. MESOS-1474: Provide cluster maintenance primitives for operators
9. Omega: flexible, scalable schedulers for large compute clusters
10. Quasar: Resource-Efficient and QoS-Aware Cluster Management
11. Reliable Cron across the Planet
12. Sparrow: Distributed, Low Latency Scheduling