Building/Running Distributed Systems with Apache Mesos
![Page 1](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/1.jpg)
Benjamin Hindman – @benh
Building/Running Distributed Systems with Apache Mesos Philly ETE
April 8, 2015
![Page 2](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/2.jpg)
$ whoami
2007 - 2012 | 2009 - | 2010 - 2014
![Page 3](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/3.jpg)
my other computer is a datacenter
![Page 5](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/5.jpg)
my other computer is a datacenter*
* collection of physical and/or virtual machines
![Page 6](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/6.jpg)
![Page 7](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/7.jpg)
how should we run applications on the datacenter computer?
![Page 8](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/8.jpg)
how do we program applications for the datacenter computer?
![Page 9](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/9.jpg)
what are datacenter applications?
![Page 10](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/10.jpg)
agenda
① what are datacenter applications?
② how should we run datacenter applications?
③ how should we program datacenter applications?
![Page 12](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/12.jpg)
distributed systems!
![Page 13](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/13.jpg)
![Page 14](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/14.jpg)
stateless stateful
![Page 15](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/15.jpg)
other distributed systems?
![Page 16](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/16.jpg)
![Page 17](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/17.jpg)
(micro)services
① do one thing and do it well (UNIX)
② compose!
③ build/commit in isolation, test in isolation, deploy in isolation (with easy rollback)
④ captures organizational structure (many teams working in parallel)
![Page 18](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/18.jpg)
![Page 19](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/19.jpg)
There’s Just No Getting Around It: You’re Building a Distributed System
by Mark Cavage | May 3, 2013
https://queue.acm.org/detail.cfm?id=2482856
![Page 21](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/21.jpg)
![Page 22](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/22.jpg)
![Page 23](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/23.jpg)
[Diagram: Spark cluster. A Driver Program (Spark Context) and two Worker Nodes, each with an Executor, a Cache, and two Tasks.]
![Page 24](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/24.jpg)
![Page 25](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/25.jpg)
agenda
① what are datacenter applications?
② how should we run datacenter applications?
③ how should we program datacenter applications?
![Page 26](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/26.jpg)
considerations
① configuration/package management
② deployment
③ service discovery
![Page 27](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/27.jpg)
considerations
① configuration/package management
② deployment
③ service discovery
④ monitoring
![Page 28](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/28.jpg)
considerations
① configuration/package management
② deployment
③ service discovery
④ monitoring
ops
![Page 29](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/29.jpg)
considerations
① configuration/package management
② deployment
③ service discovery
④ monitoring
ops developers
![Page 30](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/30.jpg)
configuration/package management
service discovery
deployment
![Page 31](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/31.jpg)
configuration/package management
“what/how do things get installed?”
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
$ ssh host './configure && make install'
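At this scale the per-host command simply gets repeated for every entry in hosts.txt. A minimal sketch of that loop in Python, assuming a hypothetical helper named `install_commands` (it is not from the talk). It only builds the ssh invocations and prints them. The remote command is passed as one argument so that `&&` is evaluated on the remote host rather than locally.

```python
import shlex

def install_commands(hosts_text, remote_command="./configure && make install"):
    """Build one ssh argv per host in a whitespace-separated hosts.txt listing."""
    hosts = hosts_text.split()
    # The remote command is a single argv element, so the remote shell sees
    # the whole "a && b" sequence instead of the local shell splitting it.
    return [["ssh", host, remote_command] for host in hosts]

# Host names copied from the slide; the file contents are otherwise assumed.
hosts_txt = "web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com"
for argv in install_commands(hosts_txt):
    # Printable form only; subprocess.run(argv) would actually execute it.
    print(shlex.join(argv))
```

Every new machine means another line in the file and another serial ssh session, which is tolerable for tens of machines and little more.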
![Page 32](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/32.jpg)
configuration/package management
“what/how do things get installed?”
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
$ ssh host rpm -ivh pkg-x.y.z.rpm
![Page 33](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/33.jpg)
deployment
“what should run where?” “how should it be started/stopped?”
$ ssh host nohup myapp
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
![Page 34](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/34.jpg)
deployment
“what should run where?” “how should it be started/stopped?”
$ ssh host monit start myapp
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
![Page 35](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/35.jpg)
deployment
“what should run where?” “how should it be started/stopped?”
$ scp myapp host:
$ ssh host monit restart myapp
(10’s of machines)
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
![Page 36](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/36.jpg)
deployment
“what should run where?” “how should it be started/stopped?”
(10’s of machines)
$ ssh host 'git pull && monit restart myapp'
hosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
![Page 37](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/37.jpg)
“how should apps find each other?”
service discovery
webhosts.txt
web1.twttr.com web2.twttr.com web3.twttr.com web4.twttr.com
dbhosts.txt
db1.twttr.com db2.twttr.com db3.twttr.com db4.twttr.com
(10’s of machines)
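Discovery from static files amounts to reading a list and picking an entry. A rough sketch in Python, where `load_hosts` and `round_robin` are hypothetical names used only for illustration. Nothing in it notices a host that has died or moved, so the files have to be edited and redistributed by hand.

```python
import itertools

def load_hosts(hosts_text):
    """Parse a whitespace-separated host listing such as dbhosts.txt."""
    return hosts_text.split()

def round_robin(hosts):
    """Return an endless iterator that cycles through the static host list."""
    if not hosts:
        raise ValueError("no hosts listed")
    return itertools.cycle(hosts)

# Host names copied from the slide; a web app would read them from dbhosts.txt.
db_hosts = load_hosts("db1.twttr.com db2.twttr.com db3.twttr.com db4.twttr.com")
next_db = round_robin(db_hosts)
first, second = next(next_db), next(next_db)
```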
![Page 38](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/38.jpg)
(10’s of machines)
(100’s -> 1000’s of machines)
to scale, need fewer moving parts, more automation
![Page 39](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/39.jpg)
Twitter, circa 2010
webhosts.txt dbhosts.txt
$ ssh host …
(configuration/package management)
(deployment)
![Page 40](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/40.jpg)
![Page 41](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/41.jpg)
MySQL
![Page 42](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/42.jpg)
MySQL memcached
![Page 43](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/43.jpg)
MySQL Rails memcached
![Page 44](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/44.jpg)
MySQL Cassandra Rails memcached
![Page 45](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/45.jpg)
MySQL Cassandra Rails Hadoop memcached
![Page 46](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/46.jpg)
challenges
![Page 47](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/47.jpg)
challenges
① failures
![Page 48](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/48.jpg)
failures
![Page 49](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/49.jpg)
MySQL Cassandra Rails Hadoop memcached
![Page 51](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/51.jpg)
types of failure: fault domains
machine (disk, memory, CPU, etc)
rack (switch, PDU)
datacenter
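One way to plan around these fault domains is to spread an application's replicas so that a single machine or rack failure cannot take out all of them. The Python sketch below is an illustration under assumptions: the machine-to-rack map and the function name `spread_across_racks` are invented here, and only the rack level is handled.

```python
from collections import defaultdict

def spread_across_racks(machine_racks, replicas):
    """Pick up to `replicas` machines, touching as many racks as possible.

    machine_racks maps a machine name to the rack it sits in.
    """
    by_rack = defaultdict(list)
    for machine, rack in sorted(machine_racks.items()):
        by_rack[rack].append(machine)
    racks = sorted(by_rack)
    chosen = []
    # Take one machine per rack on each pass, so replicas only share a rack
    # once every rack already holds one.
    while len(chosen) < replicas and any(by_rack[rack] for rack in racks):
        for rack in racks:
            if by_rack[rack] and len(chosen) < replicas:
                chosen.append(by_rack[rack].pop(0))
    return chosen

# Hypothetical inventory: two machines in rack r1, one each in r2 and r3.
inventory = {"m1": "r1", "m2": "r1", "m3": "r2", "m4": "r3"}
placement = spread_across_racks(inventory, replicas=3)
```

With this inventory the three replicas land in three different racks, so losing one switch or PDU leaves two replicas running.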
![Page 52](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/52.jpg)
challenges
② maintenance
(aka “planned failures”)
![Page 53](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/53.jpg)
maintenance
① upgrading software (e.g., the kernel)
ops developers
![Page 54](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/54.jpg)
maintenance
① upgrading software (e.g., the kernel)
② replacing machines, switches, PDUs, etc
![Page 55](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/55.jpg)
MySQL Cassandra Rails Hadoop memcached
![Page 57](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/57.jpg)
challenges
③ utilization
![Page 58](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/58.jpg)
Rails
Hadoop
memcached
utilization
![Page 59](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/59.jpg)
utilization
Rails
Hadoop
memcached
![Page 60](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/60.jpg)
utilization
Rails
Hadoop
memcached
buy fewer machines
or run more applications!
![Page 61](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/61.jpg)
challenges
① failures
② maintenance
③ utilization
![Page 62](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/62.jpg)
challenges
① failures
② maintenance
③ utilization
planning for failure?
![Page 63](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/63.jpg)
planning for failure
![Page 64](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/64.jpg)
challenges
① failures
② maintenance
③ utilization
planning for utilization?
![Page 65](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/65.jpg)
planning for utilization
intra-machine resource sharing:
share a single machine’s resources between multiple applications (multi-tenancy)
intra-datacenter resource sharing:
share multiple machines’ resources between multiple applications
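To make the utilization argument concrete, the sketch below compares one machine per application with applications packed onto shared machines. It is a Python illustration with made-up numbers: the core demands, the 8-core capacity, and the function name `machines_needed` are all assumptions, and first-fit decreasing is just one simple packing heuristic, not necessarily optimal.

```python
def machines_needed(demands, capacity):
    """Count machines used when applications share them (first-fit decreasing)."""
    free = []  # remaining cores on each machine already in use
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError("application does not fit on one machine")
        for i, remaining in enumerate(free):
            if demand <= remaining:
                free[i] -= demand
                break
        else:
            # No machine in use had room, so bring another one into service.
            free.append(capacity - demand)
    return len(free)

cores_wanted = [4, 3, 2, 2, 1]   # hypothetical per-application core demands
dedicated = len(cores_wanted)    # one machine per application
shared = machines_needed(cores_wanted, capacity=8)
```

Here five dedicated machines shrink to two shared ones, which is the "buy fewer machines or run more applications" trade from the earlier slide.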
![Page 66](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/66.jpg)
Twitter, circa 2010
I want a cluster manager!
![Page 67](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/67.jpg)
a cluster manager provides a level of indirection between hardware resources (machines) and applications/jobs
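The indirection can be pictured as an interface where a job states what it needs and never names a host. A toy Python sketch follows; the `ClusterManager` class and its `run` method are hypothetical and do not correspond to any real cluster manager's API.

```python
class ClusterManager:
    """Toy cluster manager: callers ask for CPUs, never for a specific machine."""

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)   # machine name -> unallocated CPUs
        self.placements = {}               # job name -> machine it was placed on

    def run(self, job, cpus):
        """Place the job on the first machine with enough free CPUs."""
        for machine, free in self.free_cpus.items():
            if free >= cpus:
                self.free_cpus[machine] -= cpus
                self.placements[job] = machine
                return machine
        raise RuntimeError("no machine has enough free CPUs")

manager = ClusterManager({"m1": 4, "m2": 8})   # hypothetical inventory
manager.run("rails", cpus=3)
manager.run("hadoop", cpus=6)
manager.run("memcached", cpus=1)
```

Because the job-to-machine mapping lives in one place, machines can be added, drained, or replaced without touching a hosts.txt in every application.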
![Page 68](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/68.jpg)
MySQL Cassandra Rails Hadoop memcached
![Page 69](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/69.jpg)
cluster manager
Rails Hadoop memcached …
…
![Page 70](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/70.jpg)
Twitter, circa 2010
I want a cluster manager!
![Page 71](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/71.jpg)
Twitter, circa 2010
I miss Borg!
![Page 72](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/72.jpg)
Twitter, circa 2010
![Page 73](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/73.jpg)
Apache Mesos is a modern, general-purpose cluster manager (i.e., not just focused on batch scheduling)
![Page 74](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/74.jpg)
cluster management
industry academia
![Page 75](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/75.jpg)
different software
academia:
• MPI (Message Passing Interface)
industry:
• Apache (mod_perl, mod_php)
• web services (Java, Ruby, …)
![Page 76](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/76.jpg)
different scale (at first)
academia:
• 100’s of machines
industry:
• 10’s of machines
![Page 77](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/77.jpg)
cluster management
academia:
• PBS (Portable Batch System)
• TORQUE
• SGE (Sun Grid Engine)
industry:
• ssh
• Puppet/Chef
• Capistrano/Ansible
cluster managers
![Page 78](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/78.jpg)
different scale (converging)
academia:
• 100’s of machines
industry:
• 10’s of machines
1,000’s of machines
![Page 79](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/79.jpg)
cluster management
academia:
• PBS (Portable Batch System)
• TORQUE
• SGE (Sun Grid Engine)
industry:
• ssh
• Puppet/Chef
• Capistrano/Ansible
batch computation!
![Page 80](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/80.jpg)
Mesos
batch service storage … streaming
![Page 81](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/81.jpg)
Mesos
batch service storage … streaming
schedulers
![Page 82](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/82.jpg)
Mesos (nodes)
Mesos: level of indirection
scheduler
Mesos (master)
scheduler
![Page 83](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/83.jpg)
Mesos (nodes)
Mesos: level of indirection
scheduler
Mesos (master)
scheduler
responsible for allocation (and reallocation) of resources
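Mesos splits this work in two: the master decides which resources to offer, and each scheduler decides what to launch on the offers it receives. The Python sketch below imitates that split with toy classes. `Offer`, `Master`, and `Scheduler` and their methods are invented for illustration and are not the Mesos API; the real system also tracks memory, disk, ports, and much more.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    node: str
    cpus: int

class Master:
    """Toy master: tracks free CPUs per node and hands them out as offers."""

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)

    def offers(self):
        return [Offer(node, cpus) for node, cpus in self.free_cpus.items() if cpus > 0]

    def launch(self, offer, cpus):
        if cpus > self.free_cpus[offer.node]:
            raise ValueError("task asks for more than the node has free")
        self.free_cpus[offer.node] -= cpus

class Scheduler:
    """Toy scheduler: launches pending tasks on whichever offers fit."""

    def __init__(self, pending):
        self.pending = list(pending)   # (task name, cpus) pairs
        self.running = {}              # task name -> node

    def resource_offers(self, master, offers):
        for offer in offers:
            remaining = offer.cpus
            for task in list(self.pending):
                name, cpus = task
                if cpus <= remaining:
                    master.launch(offer, cpus)
                    remaining -= cpus
                    self.pending.remove(task)
                    self.running[name] = offer.node

master = Master({"node1": 4, "node2": 2})          # hypothetical cluster
scheduler = Scheduler([("web", 3), ("cache", 2)])  # hypothetical tasks
scheduler.resource_offers(master, master.offers())
```

The scheduler never picks a machine by name; it only accepts or ignores what the master offers, which is what lets the master reallocate resources later.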
![Page 85](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/85.jpg)
Mesos (nodes)
scheduler
Mesos (master)
scheduler
responsible for allocation (and reallocation) of resources
challenges: failures/maintenance
![Page 89: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/89.jpg)
Mesos (nodes)
challenges: utilization
scheduler
Mesos (master)
scheduler
responsible for allocation (and reallocation) of resources
![Page 92: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/92.jpg)
two-level scheduling
Mesos is influenced by operating-system-supported user-space scheduling (and scheduler activations)
Mesos is designed less like a “cluster manager” and more like an operating system kernel
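A minimal sketch of the two-level idea, assuming a toy model rather than the Mesos API. The master (first level) decides which scheduler is offered which free resources; each scheduler (second level) decides which of its own tasks to run on an offer. Every class, method, and task name here is hypothetical, and the round-robin policy is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    node: str
    cpus: float
    mem: int  # MB


@dataclass
class Scheduler:
    """Second level: decides what to run on the resources it is offered."""
    name: str
    pending: list = field(default_factory=list)  # (task, cpus, mem) tuples

    def resource_offer(self, offer):
        launched = []
        cpus, mem = offer.cpus, offer.mem
        for task in list(self.pending):
            _, need_cpus, need_mem = task
            if need_cpus <= cpus and need_mem <= mem:
                cpus -= need_cpus
                mem -= need_mem
                self.pending.remove(task)
                launched.append(task[0])
        return launched  # an empty list means the offer is declined


class Master:
    """First level: decides which scheduler is offered which resources."""

    def __init__(self, nodes):
        self.nodes = nodes  # node name -> (cpus, mem)

    def offer_round(self, schedulers):
        placements = {}
        for i, (node, (cpus, mem)) in enumerate(sorted(self.nodes.items())):
            # Naive round-robin allocation policy (an assumption of this sketch).
            scheduler = schedulers[i % len(schedulers)]
            for task in scheduler.resource_offer(Offer(node, cpus, mem)):
                placements[task] = node
        return placements


batch = Scheduler("batch", [("etl", 2.0, 1024)])
service = Scheduler("service", [("web", 0.5, 512), ("api", 0.5, 512)])
master = Master({"node1": (2.0, 2048), "node2": (1.0, 1024)})
placements = master.offer_round([batch, service])
print(placements)
```

The master never needs to know what "etl" or "web" are; it only hands out resources, which is what lets batch and service schedulers share one cluster.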
![Page 93: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/93.jpg)
Mesos: level of abstraction
Mesos: build and run distributed systems using resources
![Page 94: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/94.jpg)
Mesos: level of abstraction
IaaS: provision and manage machines
Mesos: build and run distributed systems using resources
![Page 95: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/95.jpg)
Mesos: level of abstraction
PaaS: deploy and manage applications/services
IaaS: provision and manage machines
Mesos: build and run distributed systems using resources
![Page 96: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/96.jpg)
PaaS on Mesos
PaaS
Mesos
build and run a PaaS on top of Mesos: Marathon
![Page 97: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/97.jpg)
Mesos on IaaS
IaaS
Mesos
use OpenStack or EC2 to run Mesos
![Page 98: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/98.jpg)
Mesos on IaaS/bare metal
IaaS
Mesos
hardware
use OpenStack or EC2 or physical machines to run Mesos
![Page 99: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/99.jpg)
Mesos: datacenter kernel
IaaS
Mesos
hardware
provide common functionality via an API (kernel)
scheduler
![Page 100: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/100.jpg)
but how should we run datacenter applications?
![Page 103: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/103.jpg)
stateless services!
![Page 104: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/104.jpg)
Mesos
service
![Page 105: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/105.jpg)
service scheduler (PaaS)
service
Mesos
orchestrate services on top of Mesos
![Page 106: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/106.jpg)
orchestration vs scheduling
service
Mesos
schedule
![Page 107: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/107.jpg)
orchestration vs scheduling
service
Mesos
orchestrate
schedule
![Page 108: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/108.jpg)
orchestration with Marathon
① configuration/package management
② deployment
③ service discovery
![Page 109: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/109.jpg)
configuration/package management
developers
(1) bundle services as jar, tar/gzip, or using Docker
(2) upload to HDFS (or use a Docker registry)
![Page 110: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/110.jpg)
deployment
developers
(1) describe services using JSON
(2) submit services to Marathon via REST
![Page 111: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/111.jpg)
example-docker.json
{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "libmesos/ubuntu"
    },
    "volumes": [
      {
        "containerPath": "/etc/a",
        "hostPath": "/var/data/a",
        "mode": "RO"
      },
      {
        "containerPath": "/etc/b",
        "hostPath": "/var/data/b",
        "mode": "RW"
      }
    ]
  },
  "id": "ubuntu",
  "instances": 1,
  "cpus": 0.5,
  "mem": 512,
  "cmd": "while sleep 10; do date -u +%T; done"
}
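A sketch of the "submit via REST" step, assuming Marathon is reachable over HTTP and that apps are created by POSTing the JSON to its `/v2/apps` endpoint. The host name `marathon.example.com` is a placeholder, and the request is only built here, not sent.

```python
import json
import urllib.request

# App definition mirroring example-docker.json.
app = {
    "container": {
        "type": "DOCKER",
        "docker": {"image": "libmesos/ubuntu"},
        "volumes": [
            {"containerPath": "/etc/a", "hostPath": "/var/data/a", "mode": "RO"},
            {"containerPath": "/etc/b", "hostPath": "/var/data/b", "mode": "RW"},
        ],
    },
    "id": "ubuntu",
    "instances": 1,
    "cpus": 0.5,
    "mem": 512,
    "cmd": "while sleep 10; do date -u +%T; done",
}


def build_request(marathon_url, app):
    """Build (but do not send) a POST to Marathon's /v2/apps endpoint."""
    return urllib.request.Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder address (an assumption); replace with a real Marathon URL.
request = build_request("http://marathon.example.com:8080", app)
# To actually submit: urllib.request.urlopen(request)
```

Keeping the build and send steps separate makes it easy to inspect the payload before it reaches a live cluster.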
![Page 112: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/112.jpg)
service discovery
Apache ZooKeeper
using Apache ZooKeeper and server sets (github.com/twitter/commons)
![Page 116: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/116.jpg)
service discovery
using Apache ZooKeeper and server sets (github.com/twitter/commons)
(1) task gets launched on machine
(2) service gets registered in a server set in ZooKeeper
(3) other services use ZooKeeper to find services they need
(4) services connect directly with one another
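The registration and lookup flow can be sketched with an in-memory stand-in for ZooKeeper. This is not the ZooKeeper or twitter/commons API; the class, the `/services/web` path, and the addresses are all hypothetical, chosen only to show the shape of the interaction.

```python
class ServerSetRegistry:
    """In-memory stand-in for ZooKeeper; not a real client library."""

    def __init__(self):
        self._sets = {}  # server set path -> {member id: (host, port)}

    def join(self, path, member, host, port):
        # A launched task registers its endpoint in a server set.
        self._sets.setdefault(path, {})[member] = (host, port)

    def leave(self, path, member):
        # Mimics a registration disappearing when the task dies.
        self._sets.get(path, {}).pop(member, None)

    def lookup(self, path):
        # Clients ask the registry where the service currently lives.
        return sorted(self._sets.get(path, {}).values())


registry = ServerSetRegistry()
registry.join("/services/web", "task-1", "10.0.0.1", 31000)
registry.join("/services/web", "task-2", "10.0.0.2", 31005)
endpoints = registry.lookup("/services/web")  # clients then connect directly
registry.leave("/services/web", "task-1")
remaining = registry.lookup("/services/web")
```

Note that the lookup logic lives in every client, which is the cost this approach carries.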
![Page 117: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/117.jpg)
service discovery alternative
ZooKeeper/server sets require injecting code into your clients!
(1) task gets launched on machine
(2) update HAProxy with new service location
(3) other services send traffic through HAProxy
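A sketch of the "update HAProxy" step, assuming some process regenerates a backend section whenever task locations change. The function name, service name, and addresses are hypothetical; only the `backend`, `balance`, and `server ... check` directives are standard HAProxy configuration syntax.

```python
def render_backend(service, endpoints):
    """Render an HAProxy backend section for the given (host, port) endpoints."""
    lines = [f"backend {service}", "    balance roundrobin"]
    for i, (host, port) in enumerate(endpoints):
        lines.append(f"    server {service}-{i} {host}:{port} check")
    return "\n".join(lines) + "\n"


config = render_backend("web", [("10.0.0.1", 31000), ("10.0.0.2", 31005)])
print(config)
```

Clients keep talking to one stable proxy address, so no discovery code has to be linked into them.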
![Page 118: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/118.jpg)
orchestration w/ Kubernetes (on Mesos)
① configuration/package management
② deployment
③ service discovery
![Page 119: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/119.jpg)
multiple schedulers!
![Page 120: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/120.jpg)
multiple schedulers!
…
![Page 122: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/122.jpg)
multiple schedulers!
0.8.2 0.9.0
![Page 123: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/123.jpg)
agenda
① what are datacenter applications?
② how should we run datacenter applications?
③ how should we program datacenter applications?
![Page 124: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/124.jpg)
distributed systems: dev
• leader election
• state management (working set)
• task management (launch, isolate, kill, etc)
• machine management (monitoring)
• …
![Page 125: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/125.jpg)
distributed systems: ops
• what to do if task/machine fails?
• what happens when more resources/tasks are needed? does everything scale proportionally? should tasks take on different or more specialized functions/roles?
![Page 127: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/127.jpg)
distributed systems: dev
everybody keeps reinventing the wheel
![Page 128: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/128.jpg)
distributed systems: ops
UX of operating distributed systems is horrible
![Page 129: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/129.jpg)
① why does each distributed system have to implement the same things?
![Page 130: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/130.jpg)
② why can’t we build distributed systems that incorporate and automate operations?
![Page 131: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/131.jpg)
thesis: we need a common abstraction layer upon which all distributed systems can be built and …
![Page 132: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/132.jpg)
we need to build distributed systems with schedulers (on top of said common abstraction layer)
![Page 133: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/133.jpg)
Apache Mesos is a distributed system for running and building other distributed systems
![Page 134: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/134.jpg)
Apache Mesos: distributed systems kernel
![Page 135: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/135.jpg)
Apache Mesos: datacenter kernel
![Page 136: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/136.jpg)
Mesos (nodes)
Mesos: datacenter kernel
scheduler
Mesos (master)
scheduler
syscall-like API for datacenter
![Page 137: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/137.jpg)
Mesos: datacenter kernel
+ provide common functionality every new distributed system re-implements
![Page 138: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/138.jpg)
Mesos: datacenter kernel
+ enable running multiple distributed systems on the same cluster of machines and dynamically share the resources more efficiently!
![Page 139: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/139.jpg)
build against Mesos:
① abstract cloud, i.e., “hardware”
② leverage primitives to implement/automate failures, maintenance, etc.
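A sketch of what automating failure handling in a scheduler might look like, assuming the scheduler is told about task state changes and receives resource offers. The class and its methods are a self-contained toy, not the Mesos bindings; the state names `TASK_FAILED` and `TASK_LOST` follow Mesos task states, and the node and task names are hypothetical.

```python
class RelaunchingScheduler:
    """Toy scheduler that relaunches failed tasks; not the real Mesos bindings."""

    def __init__(self, tasks):
        self.pending = list(tasks)  # tasks waiting for resources
        self.running = {}           # task -> node

    def resource_offers(self, node, slots):
        # Launch as many pending tasks as the offered node has free slots.
        launched = []
        while self.pending and slots > 0:
            task = self.pending.pop(0)
            self.running[task] = node
            launched.append(task)
            slots -= 1
        return launched

    def status_update(self, task, state):
        # A failed or lost task goes back on the queue for the next offer.
        if state in ("TASK_FAILED", "TASK_LOST") and task in self.running:
            del self.running[task]
            self.pending.append(task)


scheduler = RelaunchingScheduler(["web-1", "web-2"])
scheduler.resource_offers("node1", slots=2)
scheduler.status_update("web-2", "TASK_LOST")  # e.g. its machine went away
relaunched = scheduler.resource_offers("node2", slots=1)
```

The operational response to a failure is encoded once in the scheduler instead of being carried out by hand each time.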
![Page 140: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/140.jpg)
Mesos primitives
• principals, users, roles
• advanced fair-sharing allocation algorithms
• high-availability (even during upgrades)
• resource monitoring
• preemption/revocation
• volume management
• reservations (dynamic/static)
• …
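The slide does not name a specific fair-sharing algorithm. As an illustration only, here is a sketch of one well-known policy, Dominant Resource Fairness, where the next offer goes to the framework whose largest share of any single resource is smallest. The cluster size and framework allocations are made-up numbers.

```python
def dominant_share(allocated, total):
    """Largest fraction of any single resource that a framework holds."""
    return max(allocated[r] / total[r] for r in total)


def next_to_offer(allocations, total):
    """Pick the framework with the smallest dominant share."""
    return min(allocations, key=lambda name: dominant_share(allocations[name], total))


total = {"cpus": 9, "mem": 18}
allocations = {
    "batch": {"cpus": 3, "mem": 3},    # dominant resource: cpus (3 of 9)
    "service": {"cpus": 1, "mem": 4},  # dominant resource: mem (4 of 18)
}
chosen = next_to_offer(allocations, total)
```

Here "service" is chosen, since its dominant share (4/18) is below that of "batch" (3/9).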
![Page 141: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/141.jpg)
maintenance via primitives
propagate machine restart, shutdown, downtime, etc
![Page 142: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/142.jpg)
there is a lot of stuff in a kernel
![Page 143: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/143.jpg)
there is a lot of stuff in a datacenter kernel
![Page 144: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/144.jpg)
built on Mesos:
2009 2010 2013 2014
![Page 145: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/145.jpg)
ported to Mesos:
2011 2012 2013 2014
![Page 146: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/146.jpg)
some of our adopters …
2010 2013 2014 …
![Page 147: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/147.jpg)
conclusion
![Page 148: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/148.jpg)
the datacenter is just another form factor
![Page 150: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/150.jpg)
why can’t we run applications on our datacenters just like we run applications on our mobile phones?
![Page 152: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/152.jpg)
desktop computer
server datacenter
OS
OS
OS
the datacenter computer needs an operating system
![Page 153: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/153.jpg)
today
YOU
API
tomorrow
Mesos: datacenter kernel
provides common functionality every new distributed system re-implements:
• failure detection
• package distribution
• task starting
• resource isolation
• resource monitoring
• task killing, cleanup
• …
don’t reinvent the wheel!
![Page 155: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/155.jpg)
Mesosphere DCOS
Kernel: Mesos
Frameworks: Marathon, Chronos, ...
Modules: mesos-dns
DCOS CLI, DCOS GUI, Repository
![Page 156: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/156.jpg)
Airbnb’s Chronos
Chronos, a scheduler for running cron jobs with dependencies
cron for the datacenter operating system
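A sketch of what "jobs with dependencies" implies for run order, using a topological sort from the Python standard library. This is not the Chronos API; the job names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each job maps to the set of jobs it depends on.
jobs = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# A job may only run after everything it depends on has finished.
run_order = list(TopologicalSorter(jobs).static_order())
```

Plain cron has no notion of "run after"; expressing the dependency explicitly removes the need to guess at safe start times.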
![Page 158: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/158.jpg)
Mesosphere’s Marathon
Marathon, a scheduler for running stateless services written in any language
init for the datacenter operating system
![Page 160: Building/Running* Distributed* Systems*with* Apache*Mesos*€¦ · “howshould!apps!find!each!other?” servicediscovery webhosts.txt* web1.twttr.com* web2.twttr.com* web3.twttr.com*](https://reader034.vdocument.in/reader034/viewer/2022042323/5f0e03587e708231d43d302c/html5/thumbnails/160.jpg)
Q&A
mesos.apache.org mesosphere.com
@ApacheMesos @mesosphere