DockerCon14: Cluster Management and Containerization
TRANSCRIPT
![Page 1: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/1.jpg)
Cluster Management and Containerization Benjamin Hindman, @benh
Twitter, Inc.
![Page 2: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/2.jpg)
$ whoami
2006 - 2011, 2009 -, 2010 -
![Page 3: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/3.jpg)
cluster management
![Page 4: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/4.jpg)
cluster management (server/IT automation)
![Page 5: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/5.jpg)
cluster management
① configuration/package management
② deployment
③ naming
![Page 6: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/6.jpg)
cluster management
① configuration/package management
② deployment
③ naming
④ monitoring
![Page 7: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/7.jpg)
cluster management
① configuration/package management
② deployment
③ naming
④ monitoring
ops
![Page 8: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/8.jpg)
cluster management
① configuration/package management
② deployment
③ naming
④ monitoring
ops developers
![Page 9: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/9.jpg)
cluster management
configuration/package management
naming deployment
![Page 10: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/10.jpg)
configuration/package management
“what/how do things get installed?”
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
$ ssh host './configure && make install'
![Page 11: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/11.jpg)
configuration/package management
“what/how do things get installed?”
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
$ ssh host rpm -ivh pkg-x.y.z.rpm
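The per-host install on this slide can be sketched as a loop over `hosts.txt`. This is a minimal illustration that only builds the command lines and does not run them; the function name `install_commands` is hypothetical, and the host list and package name are copied from the slide.

```python
import shlex

def install_commands(hosts_text, package):
    """Build one ssh command line per host listed in hosts_text."""
    hosts = hosts_text.split()
    # The remote command is quoted as a single argument so the whole
    # thing runs on the remote host, not partly on the local machine.
    remote = shlex.join(["rpm", "-ivh", package])
    return [shlex.join(["ssh", host, remote]) for host in hosts]

# Contents of hosts.txt as shown on the slide.
HOSTS_TXT = "web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com"

for line in install_commands(HOSTS_TXT, "pkg-x.y.z.rpm"):
    print(line)
```

With 10's of machines this is tolerable; the command list grows linearly with the host file, which is the scaling problem the later slides address.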
![Page 12: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/12.jpg)
deployment
“what should run where?” “how should it be started/stopped?”
$ ssh host nohup myapp
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
![Page 13: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/13.jpg)
deployment
“what should run where?” “how should it be started/stopped?”
$ ssh host monit start myapp
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
![Page 14: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/14.jpg)
deployment
“what should run where?” “how should it be started/stopped?”
$ scp myapp host:
$ ssh host monit myapp
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
![Page 15: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/15.jpg)
deployment
“what should run where?” “how should it be started/stopped?”
(10’s of machines)
$ ssh host 'git pull && monit myapp'
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
![Page 16: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/16.jpg)
“how should apps find each other?”
naming
webhosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
dbhosts.txt
db1.twttr.com db2.twttr.com db3.twttr.com db4.twttr.com
(10’s of machines)
![Page 17: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/17.jpg)
(10’s of machines)
(100’s -> 1000’s of machines)
to scale, need more automation
![Page 18: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/18.jpg)
Twitter, circa 2010
webhosts.txt dbhosts.txt
$ ssh host …
(configuration/package management)
(deployment)
![Page 19: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/19.jpg)
MySQL Cassandra Rails Hadoop memcached
![Page 20: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/20.jpg)
challenges
![Page 21: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/21.jpg)
challenges ① failures
![Page 22: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/22.jpg)
failures
![Page 23: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/23.jpg)
MySQL Cassandra Rails Hadoop memcached
![Page 24: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/24.jpg)
MySQL Cassandra Rails Hadoop memcached
![Page 25: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/25.jpg)
types of failure: fault domains
machine (disk, memory, CPU, etc)
rack (switch, PDU)
datacenter
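One common response to these fault domains is to spread replicas of a service across racks, so that losing a switch or PDU takes out at most one copy. The sketch below is an assumption about how that might look, not something shown in the talk; `place_replicas` and the rack names are hypothetical.

```python
def place_replicas(hosts_by_rack, replicas):
    """Pick hosts round-robin over racks so replicas share racks as little as possible."""
    racks = sorted(hosts_by_rack)
    queues = {rack: list(hosts_by_rack[rack]) for rack in racks}
    placement = []
    while len(placement) < replicas:
        progressed = False
        for rack in racks:
            if queues[rack] and len(placement) < replicas:
                placement.append(queues[rack].pop(0))
                progressed = True
        if not progressed:
            # Every rack is exhausted and we still need more hosts.
            raise ValueError("not enough hosts for the requested replicas")
    return placement

# Hypothetical rack layout for the four web hosts.
RACKS = {
    "rack-a": ["web1.twttr.com", "web2.twttr.com"],
    "rack-b": ["web3.twttr.com"],
    "rack-c": ["web4.twttr.com"],
}

print(place_replicas(RACKS, 3))
```

The same idea applies one level up: treat datacenters as the outer domain and racks as the inner one.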
![Page 26: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/26.jpg)
challenges ② maintenance
(aka “planned failures”)
![Page 27: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/27.jpg)
maintenance
① upgrading software (i.e., installing and uninstalling packages)
ops developers
![Page 28: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/28.jpg)
maintenance
① upgrading software (i.e., installing and uninstalling packages)
② replacing machines, switches, PDUs, etc
![Page 29: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/29.jpg)
MySQL Cassandra Rails Hadoop memcached
![Page 30: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/30.jpg)
MySQL Cassandra Rails Hadoop memcached
![Page 31: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/31.jpg)
challenges ③ utilization
![Page 32: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/32.jpg)
Rails
Hadoop
memcached
utilization
![Page 33: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/33.jpg)
utilization
Rails
Hadoop
memcached
![Page 34: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/34.jpg)
utilization
Rails
Hadoop
memcached
buy fewer machines or run more applications!
![Page 35: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/35.jpg)
challenges ① failures
② maintenance
③ utilization
![Page 36: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/36.jpg)
challenges ① failures
② maintenance
③ utilization
planning for failure?
![Page 37: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/37.jpg)
planning for failure
![Page 38: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/38.jpg)
challenges ① failures
② maintenance
③ utilization
planning for utilization?
![Page 39: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/39.jpg)
planning for utilization
intra-machine resource sharing:
share a single machine’s resources between multiple applications (multi-tenancy)
intra-datacenter resource sharing:
share multiple machines’ resources between multiple applications
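The utilization argument can be made concrete with a little arithmetic. In the sketch below the demand numbers are invented for illustration (they are not Twitter data) and `machines_needed` is a hypothetical helper: a static partition must size each application for its own peak, while a shared pool only has to cover the peak of the combined demand.

```python
def machines_needed(demand_by_app):
    """Return (static, shared) machine counts for per-app demand time series."""
    # Static partitioning: every app owns enough machines for its own peak.
    static = sum(max(series) for series in demand_by_app.values())
    # Shared pool: size for the busiest moment of the summed demand.
    shared = max(sum(step) for step in zip(*demand_by_app.values()))
    return static, shared

# Made-up demand (machines required) at four points in a day.
DEMAND = {
    "Rails": [8, 6, 2, 2],
    "Hadoop": [1, 2, 7, 3],
    "memcached": [3, 3, 3, 3],
}

static, shared = machines_needed(DEMAND)
print(f"static partitions: {static} machines, shared pool: {shared} machines")
```

The gap between the two numbers is the headroom that can either be returned (fewer machines) or filled (more applications).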
![Page 40: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/40.jpg)
Twitter, circa 2010
what software can help me!?
![Page 41: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/41.jpg)
cluster management
industry academia
![Page 42: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/42.jpg)
different software
academia
• MPI (Message Passing Interface)
industry
• Apache (mod_perl, mod_php)
• web services (Java, Ruby, …)
![Page 43: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/43.jpg)
different scale (at first)
academia
• 1,000’s of machines
industry
• 10’s of machines
![Page 44: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/44.jpg)
cluster management
academia
• PBS (Portable Batch System)
• TORQUE
• SGE (Sun Grid Engine)
industry
• ssh
• Puppet/Chef
• Capistrano/Ansible
cluster managers
![Page 45: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/45.jpg)
cluster manager provides a level of indirection between hardware resources (machines) and applications/jobs
![Page 46: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/46.jpg)
MySQL Cassandra Rails Hadoop memcached
![Page 47: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/47.jpg)
cluster manager
Rails Hadoop memcached …
…
![Page 48: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/48.jpg)
cluster management
academia
• PBS (Portable Batch System)
• TORQUE
• SGE (Sun Grid Engine)
industry
• ssh
• Puppet/Chef
• Capistrano/Ansible
batch computation!
![Page 49: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/49.jpg)
Mesos is a modern general-purpose cluster manager (i.e., not just focused on batch scheduling)
![Page 50: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/50.jpg)
Mesos
service batch storage …
…
streaming
support many different types of computation/scheduling
![Page 51: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/51.jpg)
Mesos
service batch storage … streaming
(1) coordinate for resources
![Page 52: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/52.jpg)
Mesos
service batch storage … streaming
(2) launch tasks
![Page 53: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/53.jpg)
Mesos
service batch storage … streaming
(3) launch tasks
![Page 54: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/54.jpg)
Mesos
service batch storage … streaming
![Page 55: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/55.jpg)
Mesos
service batch storage … streaming
![Page 56: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/56.jpg)
Mesos
service batch storage … streaming
(4) task termination
![Page 57: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/57.jpg)
Mesos
service batch storage … streaming
(5) task status update
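The numbered steps on these slides (coordinate for resources, launch tasks, task termination, status update) might be modelled roughly as follows. This is a toy, in-process sketch and not the Mesos API: `Offer`, `Task`, and `ToyScheduler` are hypothetical names, and only the state string `TASK_FINISHED` mirrors a real Mesos task state.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    machine: str
    cpus: float
    mem_mb: int

@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int

class ToyScheduler:
    """A framework scheduler that reacts to offers and status updates."""

    def __init__(self, pending):
        self.pending = list(pending)
        self.running = {}   # task name -> machine
        self.updates = []   # (task name, state), in arrival order

    def resource_offer(self, offer):
        """Accept an offer and launch every pending task that still fits in it."""
        launched = []
        cpus, mem = offer.cpus, offer.mem_mb
        for task in list(self.pending):
            if task.cpus <= cpus and task.mem_mb <= mem:
                cpus -= task.cpus
                mem -= task.mem_mb
                self.pending.remove(task)
                self.running[task.name] = offer.machine
                launched.append(task)
        return launched

    def status_update(self, task_name, state):
        """Record a status update; a finished task frees its slot."""
        self.updates.append((task_name, state))
        if state == "TASK_FINISHED":
            self.running.pop(task_name, None)

scheduler = ToyScheduler([Task("web", 1.0, 512), Task("batch", 4.0, 2048)])
print(scheduler.resource_offer(Offer("machine-1", cpus=2.0, mem_mb=1024)))
scheduler.status_update("web", "TASK_FINISHED")
print(scheduler.running, scheduler.pending)
```

The point of the split is that the cluster manager only hands out resources; each scheduler (service, batch, storage, streaming) decides what to run on them.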
![Page 58: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/58.jpg)
challenges revisited ① failures
② maintenance
③ utilization
![Page 59: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/59.jpg)
challenges revisited ① failures
② maintenance
③ utilization
![Page 60: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/60.jpg)
Mesos
service batch storage … streaming
![Page 61: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/61.jpg)
Mesos
service batch storage … streaming
![Page 62: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/62.jpg)
Mesos
service batch storage … streaming
![Page 63: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/63.jpg)
Mesos
service batch storage … streaming
(5) task status update
![Page 64: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/64.jpg)
challenges revisited ① failures
② maintenance
③ utilization
![Page 65: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/65.jpg)
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
![Page 66: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/66.jpg)
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
![Page 67: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/67.jpg)
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
![Page 68: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/68.jpg)
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
![Page 69: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/69.jpg)
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
![Page 70: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/70.jpg)
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
(2) multi-tenancy on individual machines
![Page 71: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/71.jpg)
Mesos
service batch storage … streaming
(1) when resources become idle, can be scheduled and reused by other schedulers
(2) multi-tenancy on individual machines
![Page 72: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/72.jpg)
multi-tenancy
task!
task!
containers
task!
![Page 73: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/73.jpg)
containerization
started leveraging containerization technology in ~2011
2011
LXC
2012
cgroups
2013
Docker (preliminary)
2014
![Page 74: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/74.jpg)
how Mesos has changed cluster management at Twitter today
![Page 75: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/75.jpg)
configuration/package management
developers
(1) bundle services as jar, tar/gzip
(2) upload to HDFS
![Page 76: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/76.jpg)
configuration/package management (planning)
developers
(1) bundle services as jar, tar/gzip, or Docker image
(2) upload to HDFS (or use registry)
![Page 77: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/77.jpg)
deployment
Apache Aurora (incubating), a scheduler for running stateless services written in any language (but primarily used at Twitter for JVM services)
![Page 78: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/78.jpg)
deployment (via Aurora)
developers
(1) describe service using Python-based DSL
(2) submit service to Aurora using CLI
![Page 79: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/79.jpg)
deployment (via Marathon)
developers
(1) describe services using JSON
(2) submit service to Marathon via REST
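The two Marathon steps can be sketched as building a JSON description and wrapping it in an HTTP POST. The field names (`id`, `cmd`, `cpus`, `mem`, `instances`) and the `/v2/apps` path follow Marathon's v2 REST API as commonly documented, but should be checked against the version in use; the host name, the app values, and the helper `marathon_app_request` are assumptions. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request

def marathon_app_request(base_url, app):
    """Build (but do not send) a POST request that submits an app description."""
    body = json.dumps(app).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/v2/apps",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# (1) describe the service using JSON (illustrative values).
APP = {"id": "myapp", "cmd": "./myapp", "cpus": 0.5, "mem": 256, "instances": 4}

# (2) submit the service via REST; the host below is a placeholder.
request = marathon_app_request("http://marathon.example.com:8080", APP)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would perform the actual submission against a reachable Marathon instance.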
![Page 80: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/80.jpg)
naming
Apache ZooKeeper
using Apache ZooKeeper and server sets (github.com/twitter/commons)
![Page 81: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/81.jpg)
naming
(1) task gets launched on machine
Apache ZooKeeper
using Apache ZooKeeper and server sets (github.com/twitter/commons)
![Page 82: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/82.jpg)
naming
(2) service gets registered in a server set in ZooKeeper
(1) task gets launched on machine
Apache ZooKeeper
using Apache ZooKeeper and server sets (github.com/twitter/commons)
![Page 83: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/83.jpg)
naming
(2) service gets registered in a server set in ZooKeeper
(1) task gets launched on machine
(3) other services use ZooKeeper to find services they need
Apache ZooKeeper
using Apache ZooKeeper and server sets (github.com/twitter/commons)
![Page 84: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/84.jpg)
naming
(2) service gets registered in a server set in ZooKeeper
(1) task gets launched on machine
(3) other services use ZooKeeper to find services they need
(4) services connect directly with one another
Apache ZooKeeper
using Apache ZooKeeper and server sets (github.com/twitter/commons)
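The four naming steps can be illustrated with an in-memory stand-in for a server set. This is not the ZooKeeper client API or the twitter/commons implementation; `ToyServerSet` and its methods are hypothetical. In a real deployment the membership would live in ZooKeeper, where ephemeral nodes disappear when the registering session ends.

```python
class ToyServerSet:
    """In-memory registry: service name -> set of (host, port) endpoints."""

    def __init__(self):
        self._members = {}

    def join(self, service, host, port):
        """Step 2: a freshly launched task registers its endpoint."""
        self._members.setdefault(service, set()).add((host, port))

    def leave(self, service, host, port):
        """Remove an endpoint, e.g. when its task terminates."""
        self._members.get(service, set()).discard((host, port))

    def lookup(self, service):
        """Step 3: clients ask where a service currently lives."""
        return sorted(self._members.get(service, set()))

registry = ToyServerSet()
registry.join("db", "db1.twttr.com", 3306)
registry.join("db", "db2.twttr.com", 3306)

# Step 4: a client picks an endpoint and connects to it directly.
host, port = registry.lookup("db")[0]
print(f"connecting to {host}:{port}")
```

Because clients perform the lookup themselves, every client needs this logic linked in, which is the drawback the next slide raises.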
![Page 85: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/85.jpg)
naming alternative
(2) update HAProxy with new service location
(1) task gets launched on machine
(3) other services send traffic through HAProxy
ZooKeeper/server sets requires injecting code into your clients!
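For the HAProxy alternative, step (2) amounts to regenerating a backend section whenever task locations change. The sketch below renders such a section from a list of endpoints; `backend`, `server`, and `check` are standard HAProxy configuration keywords, while the function name, the server naming scheme, and the port numbers are assumptions. Reloading HAProxy after writing the file is left out.

```python
def haproxy_backend(service, endpoints):
    """Render an HAProxy backend section for the given (host, port) endpoints."""
    lines = [f"backend {service}"]
    for index, (host, port) in enumerate(endpoints):
        # 'check' enables health checks so dead tasks stop receiving traffic.
        lines.append(f"    server {service}-{index} {host}:{port} check")
    return "\n".join(lines)

# Hypothetical locations of two freshly launched tasks.
print(haproxy_backend("web", [("web1.twttr.com", 31000), ("web2.twttr.com", 31001)]))
```

Clients then talk to a fixed HAProxy address and need no discovery code of their own.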
![Page 86: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/86.jpg)
where are we today?
ops developers
![Page 87: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/87.jpg)
where are we today?
ops developers
deploys decoupled from ops (many deploys per day, per service)
maintenance consists of “draining” hosts, getting tasks rescheduled, then pulling the cord
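Draining a host before maintenance can be pictured as moving its tasks onto machines with spare room and only then removing it. The sketch below is a simplified assumption about that flow (fixed task slots per host, first-fit placement); `drain` and the data layout are hypothetical and do not reflect how Mesos or Aurora implement it.

```python
def drain(host, placement, slots_per_host):
    """Return a new placement with every task moved off `host`."""
    remaining = {h: list(tasks) for h, tasks in placement.items() if h != host}
    for task in placement.get(host, []):
        # First fit: the first other host (by name) with a free slot.
        target = next(
            (h for h in sorted(remaining) if len(remaining[h]) < slots_per_host),
            None,
        )
        if target is None:
            raise RuntimeError(f"no capacity left to reschedule {task!r}")
        remaining[target].append(task)
    return remaining

PLACEMENT = {"web1": ["a", "b"], "web2": ["c"], "web3": []}
print(drain("web1", PLACEMENT, slots_per_host=2))
```

Once the returned placement no longer mentions the host, it is safe to "pull the cord".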
![Page 88: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/88.jpg)
wait … don’t virtual machines solve my cluster management
challenges?
![Page 89: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/89.jpg)
wait … don’t virtual machines solve my cluster management
challenges?
No.
VMs are neither sufficient nor necessary!
![Page 90: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/90.jpg)
big computers
small applications
![Page 91: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/91.jpg)
IaaS public private
![Page 92: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/92.jpg)
![Page 93: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/93.jpg)
![Page 94: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/94.jpg)
challenges revisited ① failures
② maintenance
③ utilization
![Page 95: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/95.jpg)
challenges revisited ① failures
② maintenance
③ utilization
with public or private IaaS, failures still occur (on EC2 you have availability zones instead of racks, and regions instead of datacenters)
![Page 96: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/96.jpg)
challenges revisited ① failures
② maintenance
③ utilization
with public IaaS the provider wins; private IaaS gives better resource sharing, but a static partition of VMs is still a static partition!
![Page 97: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/97.jpg)
MySQL Cassandra Rails Hadoop memcached
![Page 98: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/98.jpg)
cluster manager
Rails Hadoop memcached …
…
![Page 99: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/99.jpg)
Mesos: level of abstraction
Mesos
build and run using resources
![Page 100: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/100.jpg)
Mesos: level of abstraction
IaaS
Mesos
provision and manage machines
build and run using resources
![Page 101: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/101.jpg)
Mesos on IaaS
IaaS
Mesos
use OpenStack or EC2 to run Mesos
![Page 102: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/102.jpg)
Mesos on IaaS/hardware
IaaS
Mesos
hardware
use OpenStack, EC2, or physical machines to run Mesos
![Page 103: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/103.jpg)
physical machines virtual machines
aggregation not virtualization
![Page 104: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/104.jpg)
physical machines datacenter computer
aggregation not virtualization
![Page 105: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/105.jpg)
small computers
?
big applications
![Page 106: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/106.jpg)
"small" computers?
![Page 107: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/107.jpg)
power wall
time
complex single core
simple many core
![Page 108: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/108.jpg)
"big" applications?
![Page 109: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/109.jpg)
applications don’t fit on a single computer anymore
![Page 110: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/110.jpg)
![Page 111: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/111.jpg)
"BIG"
![Page 112: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/112.jpg)
![Page 113: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/113.jpg)
(1) lots of data … (2) lots of users …
growing every day
![Page 114: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/114.jpg)
these applications need lots of resources (CPUs, memory, I/O)
![Page 115: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/115.jpg)
these applications need datacenters
![Page 116: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/116.jpg)
the datacenter is the new computer
![Page 117: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/117.jpg)
desktop computer
server datacenter
OS
OS
OS
the datacenter computer needs an OS
![Page 118: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/118.jpg)
operating system
“a collection of software that manages the computer hardware resources and provides common services for computer programs”
- Wikipedia
![Page 119: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/119.jpg)
datacenter operating system
“a collection of software that manages the datacenter computer hardware resources and provides common services for computer programs”
- Wikipedia
![Page 120: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/120.jpg)
datacenter operating system
“a collection of software that manages the datacenter computer hardware resources and provides common services for computer programs”
- Wikipedia
![Page 121: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/121.jpg)
today
Your App
API
tomorrow
datacenter OS provides common functionality every new distributed system re-implements:
• failure detection
• package distribution
• task starting
• resource isolation
• resource monitoring
• task killing, cleanup
• …
![Page 122: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/122.jpg)
today
Your App
API
tomorrow
provides common functionality every new distributed system re-implements:
• failure detection
• package distribution
• task starting
• resource isolation
• resource monitoring
• task killing, cleanup
• …
datacenter OS
Don’t reinvent the wheel!
![Page 123: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/123.jpg)
case studies
distributed in-memory analytics framework
distributed cron scheduler (with dependencies)
github.com/apache/spark
github.com/airbnb/chronos
![Page 124: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/124.jpg)
~Presentation()
![Page 125: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/125.jpg)
cluster management w/ Docker + Mesos
① configuration/package management
② deployment
③ naming
![Page 126: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/126.jpg)
datacenter OS
IaaS
Mesos
hardware
provide common functionality via an API (kernel)
your distributed system
![Page 127: DockerCon14 Cluster Management and Containerization](https://reader034.vdocument.in/reader034/viewer/2022051016/557d8f3fd8b42ab00f8b4c09/html5/thumbnails/127.jpg)
Mesos 0.19.0 released today!
mesos.apache.org
mesos.apache.org/blog
@ApacheMesos