ContainerCon Presentation
TRANSCRIPT
![Page 2: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/2.jpg)
© 2015 Mesosphere, Inc.
1. Mesosphere and the DCOS
2. Running a Production Cluster: Four Themes
![Page 3: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/3.jpg)
About Me
• Engineer at Mesosphere who wants to make life easier for our users.
• Continuing fascination with datacenters.
• Managed an 800TB cluster once upon a time, now I just talk to people with large clusters!
![Page 4: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/4.jpg)
Dan
• Wants happy Datacenter machines.
• Seeks to always have enough headroom.
• Prefers to avoid 3am wakeup calls.
• Aims to provide top-level services - like app deployment platforms, CI, databases - to everyone else.
• Doesn’t care about what individual workloads are actually doing: that’s for developers to worry about.
DATACENTER OPERATOR
![Page 5: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/5.jpg)
Mesosphere and the DCOS
![Page 6: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/6.jpg)
operating system
“a collection of software that manages the computer hardware resources and provides common services for computer programs”
![Page 7: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/7.jpg)
datacenter operating system
“a collection of software that manages the datacenter computer hardware resources and provides common services for computer programs”
![Page 8: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/8.jpg)
Mesosphere DCOS (architecture diagram)
• Kernel: Mesos
• DCOS Services: Marathon, Chronos, Kubernetes, Spark, YARN, Cassandra, Kafka, ElasticSearch, Jenkins, ...
• Service Discovery: Mesos DNS
• DCOS CLI, DCOS GUI, Repository
• Security
• System Image, Monitoring / Alerting
• Open Source Components
• Operations
![Page 9: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/9.jpg)
Datacenter Operating System
Introduction to DCOS
• Native support for Docker containers
• Built around multiple open source projects:
• Apache Mesos (kernel)
• Mesosphere Marathon (init service)
• Mesos DNS (service discovery)
![Page 10: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/10.jpg)
Datacenter Operating System
Introduction to DCOS
![Page 11: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/11.jpg)
Datacenter Operating System
The Command Line for your Datacenter
• Easiest way to install distributed systems into a cluster
• One-command installs of Spark, Cassandra, HDFS, etc.
• dcos package install spark
• More packages on their way!
• Myriad (YARN scheduler)
• ElasticSearch
• Provides tools to debug and monitor a DCOS cluster
![Page 12: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/12.jpg)
Datacenter Operating System
The Command Line for your Datacenter
• Provides tools to debug and monitor a DCOS cluster
• dcos marathon app list
• dcos service log spark
• Open source (Apache 2 licensed)
• Extensible!
![Page 13: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/13.jpg)
Apache Mesos: Datacenter Kernel
![Page 14: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/14.jpg)
![Page 15: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/15.jpg)
Apache Mesos: Datacenter Kernel
Level of Indirection
Mesos (slaves)
coordinator
Mesos (master)
coordinator
![Page 16: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/16.jpg)
![Page 17: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/17.jpg)
Overview & Users
• A top-level Apache project
• A cluster resource negotiator
• Scalable to 10,000s of nodes
• Fault-tolerant, battle-tested
• An SDK for distributed apps
Apache Mesos: Datacenter Kernel
![Page 18: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/18.jpg)
Marathon: Init System
![Page 19: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/19.jpg)
![Page 20: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/20.jpg)
![Page 21: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/21.jpg)
Features
• Start, stop, scale, update apps
• Nice web interface, API
• Highly available, no SPoF
• Native Docker support
• Rolling deploy / restart
• Application health checks
• Artifact staging
Marathon: Init System
![Page 22: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/22.jpg)
Running a Production Cluster: Four Themes
![Page 23: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/23.jpg)
Running a Production Cluster: Four Themes
1. Dependency Management
2. Deployment
3. Service Discovery
4. Monitoring & Logging
![Page 24: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/24.jpg)
1. Dependency Management
![Page 25: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/25.jpg)
1. Dependency Management
a) Configuration of Servers
b) Application Dependencies
![Page 26: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/26.jpg)
1. Dependency Management
Configuration of Servers
• Still need to configure the underlying system image but it’s now much simpler!
• Use Chef or Puppet.
Use a configuration management system to build your underlying machines.
![Page 27: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/27.jpg)
1. Dependency Management
Application Dependencies
• Docker works really well!
• For non-Dockerized applications, using a tarball is crude but works well.
Application developers should make no assumptions about the underlying system. Containers make this easy.
![Page 28: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/28.jpg)
2. Deployment
![Page 29: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/29.jpg)
2. Deployment
We need two things:
1. An artifact repository
2. A container orchestration system (i.e. Mesos)
![Page 30: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/30.jpg)
2. Deployment
a) Developer Workflow
b) Private Registries
c) Resource Limits
d) Resource Homogeneity
e) Noisy Neighbours
f) High Availability
![Page 31: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/31.jpg)
2. Deployment
Developer Workflow
• Use a source control system to track application and job definitions. These can live either in a central repository or in each project's repository.
Make use of source control and continuous integration tooling to provide an audit log of what's being deployed to your cluster.
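To picture an application definition that lives in source control, here is a minimal sketch that renders a Marathon-style app definition as stable JSON so that commits only show real changes. The field names (`id`, `cmd`, `cpus`, `mem`, `instances`) follow Marathon's app JSON; the helper `render_app_definition`, the app name and the values are hypothetical and not taken from the talk.

```python
import json


def render_app_definition(app_id, cmd, cpus, mem_mb, instances):
    """Return a Marathon-style app definition as stable, diff-friendly JSON."""
    definition = {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem_mb,
        "instances": instances,
    }
    # sort_keys keeps the output identical between runs, so a commit
    # only shows fields that actually changed.
    return json.dumps(definition, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    # Hypothetical application; the rendered file would be committed and reviewed.
    print(render_app_definition("/web/frontend", "./bin/start", 0.5, 256, 3), end="")
```

A CI job could render or validate these files on every commit, which gives the audit trail of what was deployed and when.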
![Page 32: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/32.jpg)
2. Deployment
Private Registries
• Lots of machines pulling down containers. Docker Hub just won't suffice. You'll want to use a private registry backed by something like HDFS or S3.
Run an internal registry backed by a distributed file system.
![Page 33: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/33.jpg)
Think harder about how much of each resource your application really needs.
2. Deployment
Resource Limits
• Containers need to be sized appropriately.
• Running an application on a virtual machine allows the application to grow as much as needed. Container resource limits will be enforced by killing the task.
• Some languages are better at this than others (e.g. Java).
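One way to make sizing explicit for a runtime whose heap is configured separately from the container limit is to derive the heap setting from that limit. The sketch below assumes a JVM application and a 25% allowance for non-heap memory; both the helper `jvm_heap_flag` and the 25% figure are illustrative assumptions, not recommendations from the talk.

```python
def jvm_heap_flag(container_mem_mb, overhead_fraction=0.25):
    """Derive a JVM -Xmx flag that leaves headroom inside a container memory limit."""
    if container_mem_mb <= 0:
        raise ValueError("container_mem_mb must be positive")
    if not 0 <= overhead_fraction < 1:
        raise ValueError("overhead_fraction must be in [0, 1)")
    # Reserve part of the limit for thread stacks, metaspace and native
    # buffers so the process stays under the limit and is not killed.
    heap_mb = int(container_mem_mb * (1 - overhead_fraction))
    return f"-Xmx{heap_mb}m"


if __name__ == "__main__":
    print(jvm_heap_flag(1024))
```

The right overhead fraction depends on the workload; measuring real usage is the only reliable way to pick it.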
![Page 34: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/34.jpg)
Leave some slack in your resource limits when deploying an application to account for performance differences between servers.
2. Deployment
Resource Homogeneity
• CPUs perform at different rates! Generally 1 core = 1 share but one core doesn’t necessarily equal another core.
• Same goes for memory!
![Page 35: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/35.jpg)
Leave some slack in your resource limits when deploying an application to account for noisy neighbours. Consider co-location constraints (or machine roles) to avoid worst-case interference.
2. Deployment
Noisy Neighbours
• Just like VMs, containers suffer from noisy neighbours.
• Colocation between services is more frequent and interference becomes a really big problem. Networking isolation is still poor.
• Stanford’s David Lo has done some great research into which workloads work well with each other.
![Page 36: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/36.jpg)
It's up to the application writer to build in high availability functionality. ZooKeeper is a good start.
2. Deployment
High Availability
• A container-based architecture will not make your applications more resilient.
• Mesos and Marathon are built to handle rolling upgrades.
• However, it's up to the application itself to handle failover and persistence of state.
![Page 37: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/37.jpg)
3. Service Discovery
![Page 38: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/38.jpg)
3. Service Discovery
Two approaches:
1. Static ports
2. Dynamic ports
![Page 39: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/39.jpg)
3. Service Discovery
Static Ports
• Each service instance is given a unique hostname and runs on the same, well-known port.
• In order to co-locate multiple instances of a service on the same physical host, it is necessary to allocate one IP per container.
• Typically uses DNS A records.
Static ports need less manual configuration, but unless you have one IP per container you are limited to one instance of an application per machine.
![Page 40: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/40.jpg)
3. Service Discovery
Dynamic Ports
Routing to services running on unique ports usually requires maintaining a secondary, out-of-band process:
1. Use a DNS server and SRV records. The application must be able to read SRV records. Most languages don't have good support for this (Go does).
2. Use a proxy or iptables that is fed by a secondary process (e.g. ServiceRouter) to remap well-known ports to dynamically allocated ports.
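For applications that do read SRV records, the client still has to choose one of the returned targets. The sketch below shows a simplified selection in the spirit of RFC 2782: the lowest priority value wins, and ties are broken by a weighted random choice. It assumes the records have already been fetched by a resolver; `SrvRecord`, `pick_srv` and the host names are hypothetical.

```python
import random
from collections import namedtuple

# Fields mirror the data carried by a DNS SRV record.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def pick_srv(records, rng=random):
    """Pick one record: lowest priority first, then weighted random choice."""
    if not records:
        raise LookupError("no SRV records returned")
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # All weights are zero: fall back to a uniform choice.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    answers = [
        SrvRecord(10, 60, 31005, "node-1.example.internal"),
        SrvRecord(10, 40, 31812, "node-2.example.internal"),
        SrvRecord(20, 0, 31400, "node-3.example.internal"),
    ]
    chosen = pick_srv(answers)
    print(f"connect to {chosen.target}:{chosen.port}")
```

The returned port is the dynamically allocated one, which is exactly the information an A record cannot carry.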
![Page 41: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/41.jpg)
3. Service Discovery
Dynamic Ports
Applications must be written to accept ports dynamically. This may not be possible with legacy applications, which limits you to running one instance per host.
DNS based approaches work well if your applications can handle SRV records.
A combination of approaches will most likely be required.
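Accepting a port dynamically usually means reading it from the environment at start-up instead of hard-coding it. The following is a minimal sketch; it assumes the scheduler exports the assigned port in an environment variable named `PORT0` or `PORT` (Marathon is one scheduler that does this), and `listen_port` and the default of 8080 are hypothetical.

```python
import os


def listen_port(environ=os.environ, default=8080):
    """Return the port to bind to, preferring the scheduler-assigned one."""
    raw = environ.get("PORT0") or environ.get("PORT")
    if not raw:
        # Not running under a scheduler (e.g. local development).
        return default
    port = int(raw)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


if __name__ == "__main__":
    print(f"binding to port {listen_port()}")
```

A legacy application that cannot be changed this way needs one of the remapping approaches instead.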
![Page 42: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/42.jpg)
3. Service Discovery
Dynamic Ports (ZooKeeper/etcd based)
• Use ZooKeeper or etcd as a directory service / source of truth to store port mapping information.
• Load is significant, and if clients misbehave these services may end up with too many open connections.
Ensure that ZooKeeper/etcd clients are well behaved. Stick a distributed cache in front of ZooKeeper to reduce load.
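The caching idea can be sketched with a small time-to-live cache that wraps whatever function performs the directory lookup. Note the substitution: this is a local, in-process cache rather than the distributed cache suggested above, chosen only because it fits in a few lines. `TtlCache`, the five-second TTL and the key layout are hypothetical.

```python
import time


class TtlCache:
    """Serve repeated lookups from memory until an entry is older than the TTL."""

    def __init__(self, lookup, ttl_seconds=5.0, clock=time.monotonic):
        self._lookup = lookup          # function that queries the directory service
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # key -> (timestamp, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        # Miss or stale entry: go to the backing service once and remember the answer.
        value = self._lookup(key)
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    def fake_directory(key):
        # Stand-in for a real ZooKeeper or etcd read.
        return ("10.0.0.12", 31005)

    cache = TtlCache(fake_directory)
    print(cache.get("/services/web"))
    print(cache.get("/services/web"))  # served from the cache
```

The trade-off is staleness: clients may see an old port mapping for up to one TTL after a task moves.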
![Page 43: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/43.jpg)
3. Service Discovery
Is Not Load Balancing
• Service discovery mechanisms primarily handle reachability of one service by another and don't typically route requests in an intelligent way.
Add some intelligence to your service discovery mechanism or use an external load balancer (e.g. ELB).
![Page 44: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/44.jpg)
4. Monitoring & Logging
![Page 45: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/45.jpg)
4. Monitoring & Logging
Different when using containers:
1. Limited access to runtime environment
2. Metrics are different
![Page 46: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/46.jpg)
4. Monitoring & Logging
Utilisation vs Allocation
• It's hard to size applications correctly!
• Monitor running containers for CPU and memory usage to make sure they're correctly sized.
Monitor CPU and memory of running containers to ensure applications are correctly sized.
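Comparing utilisation with allocation can be as simple as taking the peak of the observed samples and dividing by the allocated amount. The sketch below assumes the usage samples have already been collected by a monitoring system; `sizing_verdict` and the 30% / 90% thresholds are hypothetical and would need tuning per workload.

```python
def sizing_verdict(allocated, used_samples, low=0.3, high=0.9):
    """Classify a container as oversized, undersized or ok from peak usage."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    if not used_samples:
        raise ValueError("need at least one usage sample")
    # Peak rather than average: limits are enforced on the worst moment.
    ratio = max(used_samples) / allocated
    if ratio > high:
        return "undersized"
    if ratio < low:
        return "oversized"
    return "ok"


if __name__ == "__main__":
    # Memory in MB: 1024 allocated, observed usage samples over time.
    print(sizing_verdict(1024, [310, 420, 515]))
```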
![Page 47: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/47.jpg)
4. Monitoring & Logging
Application Metrics
• Applications may be using all of their allocated capacity.
• This doesn't mean that they're necessarily mis-sized though.
Monitor application-level metrics like throughput and latency to get a more meaningful idea of how your application is performing.
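Latency is usually summarised with percentiles rather than averages, since a few slow requests are what users notice. As an illustration, the sketch below computes a nearest-rank percentile over recorded request latencies; `percentile` and the sample values are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = [12, 15, 11, 240, 14, 13, 16, 12, 18, 13]
    print("p50:", percentile(latencies_ms, 50), "ms")
    print("p90:", percentile(latencies_ms, 90), "ms")
```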
![Page 48: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/48.jpg)
4. Monitoring & Logging
Health Checks
• Health checks allow the container management system to automatically cycle and route around tasks that may still be running but are broken at an application level.
• Use these in combination with system/machine level monitoring to keep track of the state of a cluster.
Make health checks a mandatory part of the application deployment process.
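Making health checks mandatory can be enforced mechanically, for example by a pre-deployment check that rejects any app definition without one. The sketch below assumes Marathon-style app definitions, where health checks live under `healthChecks` with fields such as `protocol`, `path`, `portIndex` and `intervalSeconds`; the helper `require_health_checks`, the app and its values are hypothetical.

```python
def require_health_checks(app_definition):
    """Reject app definitions that do not declare at least one health check."""
    checks = app_definition.get("healthChecks") or []
    if not checks:
        app_id = app_definition.get("id", "<unknown>")
        raise ValueError(f"{app_id}: no health checks defined")
    return app_definition


# Hypothetical application with a single HTTP health check.
EXAMPLE_APP = {
    "id": "/web/frontend",
    "cmd": "./bin/start",
    "healthChecks": [
        {
            "protocol": "HTTP",
            "path": "/health",
            "portIndex": 0,
            "gracePeriodSeconds": 30,
            "intervalSeconds": 10,
            "timeoutSeconds": 5,
            "maxConsecutiveFailures": 3,
        }
    ],
}

if __name__ == "__main__":
    require_health_checks(EXAMPLE_APP)
    print("ok to deploy", EXAMPLE_APP["id"])
```

Running such a check in CI keeps the policy out of individual teams' hands without blocking them on manual review.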
![Page 49: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/49.jpg)
4. Monitoring & Logging
Tooling
• It's not feasible to ssh into machines.
• Must provide tooling that allows users to introspect their containers. Mesos allows users to access their tasks' sandboxes (and the new DCOS command line interface provides similar functionality).
Make it easy for developers to access log output.
![Page 51: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/51.jpg)
4. Monitoring & Logging
Consistency FTW
• Logging becomes significantly more important to debug application failures when you're running many containers on various hosts.
• Ensure logging is approached in a standard way across applications and that log output is sufficiently descriptive to debug errors.
Mandate that applications log in a common way, either using a library or enforced best practices.
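One way to get a common format is a small shared library that every application imports. The sketch below uses Python's standard `logging` module to emit one JSON object per line, which aggregation tools can parse easily; `JsonFormatter`, `get_logger` and the field names are hypothetical choices, not something the talk prescribes.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format every record as a single JSON object with a fixed set of fields."""

    def __init__(self, service):
        super().__init__()
        self._service = service

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "service": self._service,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, sort_keys=True)


def get_logger(service):
    """Return a logger that writes the shared JSON format to stderr."""
    logger = logging.getLogger(service)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = get_logger("billing")
    log.info("processed %d invoices", 42)
```

Writing to the standard streams keeps the output in the task's sandbox, where the cluster tooling can already reach it.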
![Page 52: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/52.jpg)
4. Monitoring & Logging
Aggregate Logs
• Good practice in general to view logging output across a cluster.
• If a machine dies, you'll lose logs.
• Aggregate these logs centrally and make them accessible to the user.
Aggregate logs and expose these to your application developers.
![Page 53: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/53.jpg)
Summary
![Page 54: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/54.jpg)
Summary
1. Mesos and Marathon provide a great starting point.
2. Docker with a container orchestration system makes it easier to treat machines as “cattle”.
3. Resource requirements need more thought.
4. Developers need tooling to help debug application failures.
5. No right answer (yet) for service discovery.
![Page 55: ContainerCon Presentation](https://reader033.vdocument.in/reader033/viewer/2022051710/5852c71e1a28abfa398e0de2/html5/thumbnails/55.jpg)
Special thanks to:
• Ben Hindman
• Brenden Matthews
• Sam Eaton
• Tyler Neely
Thank you!
Slides will be online at:
mesosphere.github.io/presentations