karl grzeszczak: september docker presentation at mediafly

CoreOSWhat is it and why should I care?

1 / 80

Who amI?

Karl Grzeszczak

Senior Software Engineer - Mediafly

twitter @karl_grz

karlgrz.com2 / 80

At Mediafly, a lot of our infrastructure isservice oriented distributed systemsrunning docker containers

3 / 80

CoreOS seems like an ideal fit for ourneeds, so I decided to investigate

4 / 80

I am -not- affiliated with CoreOS(I'm just curious and wanted to understand it!)

5 / 80

Brief Overview

6 / 80

LightweightCoreOS is designed to be a modern, minimal base to build yourplatform. Consumes 40% less RAM on boot than an averageLinux installation.

https://coreos.com/

7 / 80

https://coreos.com/

Painless UpdatingUtilizes an active/passive dual-partition scheme to updatethe OS as a single unit instead of package by package. Thismakes each update quick, reliable and able to be easily rolledback.

https://coreos.com/

8 / 80

https://coreos.com/

Docker ContainersApplications on CoreOS run as Docker containers. Containersprovide maximum flexibility in packaging and can start inmilliseconds.

https://coreos.com/

9 / 80

https://coreos.com/

Clustered By DefaultCoreOS works well on a single machine, but it's designed tobe clustered. Easily run application containers acrossmultiple machines with fleet and connect them together withservice discovery.

https://coreos.com/

10 / 80

https://coreos.com/

Distributed Systems ToolsBuilt-in primitives such as distributed locking and masterelection are the building blocks for large scale distributedsystems.

https://coreos.com/

11 / 80

https://coreos.com/

Service DiscoveryEasily locate where services are being run within the clusterand be notified when something changes. Essential for acomplex, highly dynamic cluster. Built into CoreOS with highavailability and automatic fail-over.

https://coreos.com/

12 / 80

https://coreos.com/

How is it different from other *NIXes?

13 / 80

No package manager

All your applications should run as a container

Linux kernel, docker, systemd, fleetd, etcd, sshd

According to https://coreos.com, it uses 114MB of RAM atboot, approximately 40% less than average Linux server

Designed specifically for running distributed systems

14 / 80

https://coreos.com/

This is ideal if you already use docker

15 / 80

What do you have to do differently?

16 / 80

What doyou have tododifferently?

etcd service discovery

17 / 80



broadcast your applications keyinfrastructure settings back to etcd

18 / 80



broadcast your applications keyinfrastructure settings back to etcd

use fleet to orchestrate your containers

19 / 80

etcd

20 / 80

http://github.com/coreos/etcd

21 / 80

http://github.com/coreos/etcd

A highly-available key value store forshared configuration and servicediscovery. etcd is inspired by ApacheZooKeeper and doozer

https://github.com/coreos/etcd#readme-version-046

22 / 80


Simple: curl'able user facing API (HTTP+JSON)

Secure: optional SSL client cert authentication

Fast: benchmarked 1000s of writes/s per instance

Reliable: properly distributed using Raft

etcd is written in Go and uses the Raft consensus algorithmto manage a highly-available replicated log.


23 / 80


Raft Concensus Algorithm

24 / 80

In Search of an Understandable Concensus Algorithm byStanford's Diego Ongaro and John Ousterhout

https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf

"As a result, each state machine processes the same seriesof commands and thus produces the same series of resultsand arrives at the same series of states."

http://raftconsensus.github.io/

25 / 80

https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf

http://raftconsensus.github.io/

Basically...

26 / 80

Raft elects a leader, and the leader records a master versionand distributes that to the other nodes in the cluster. It doesnot write a confirmation until it hears back from a concensusof nodes that agree.

If the leader goes AWOL for a certain time, then a newelection process begins to find a new leader and continue.

27 / 80

For now, just understand...Raft is similar to Paxos in fault-tolerance and performanceand it makes sure that etcd and your cluster can continueoperating even if some nodes experience partitions (or areterminated!)

28 / 80

This is an AWESOME animation you should watch because itexplains Raft MUCH better than I can:

http://thesecretlivesofdata.com/raft/

29 / 80

http://thesecretlivesofdata.com/raft/

fleet

30 / 80

http://github.com/coreos/fleet

31 / 80

http://github.com/coreos/fleet

fleet ties systemd and etcd together into a distributed initsystem

32 / 80

Supported Deployment Patterns

33 / 80

Deploy a single unit anywhere on thecluster

https://github.com/coreos/fleet#supported-deployment-patterns

34 / 80


Deploy a unit globally everywhere in thecluster


35 / 80


Automatic rescheduling of units onmachine failure


36 / 80


Ensure that units are deployed togetheron the same machine


37 / 80


Forbid specific units from colocation onthe same machine (anti-affinity)


38 / 80


Deploy units to machines only withspecific metadata


39 / 80


It makes it very easy to know what is running in your cluster,where, and how it's doing

40 / 80

fleet has a LOT of promise, and is myfavorite part of CoreOS

41 / 80

...but...

42 / 80

...it's also my least favorite part ofCoreOS

43 / 80

fleet (0.8) seems very early, rough, andopinionated whereas etcd seems readyfor production

44 / 80

...but it feels like the best option outthere right now

45 / 80

Read this post later:http://lukebond.ghost.io/deploying-docker-containers-on-a-coreos-cluster-with-fleet/

I found this while putting together this presentation, and Ithink it does a great job explaining all this in written form

46 / 80

http://lukebond.ghost.io/deploying-docker-containers-on-a-coreos-cluster-with-fleet/

Show me teh codez

47 / 80

Walkthrough on Vagranthttp://github.com/coreos/coreos-vagrant

https://coreos.com/docs/running-coreos/platforms/vagrant/

48 / 80

http://github.com/coreos/coreos-vagrant

https://coreos.com/docs/running-coreos/platforms/vagrant/

bootstrapping the clusterkarl@karl-mediafly:~$ curl discovery.etcd.io/newhttps://discovery.etcd.io/b9845b31a57793fe9f88137220b7f454

49 / 80

Output gets pasted into user-data:#cloud-config

coreos: etcd: # generate a new token for each unique cluster from https://discovery.etcd.io/new # WARNING: replace each time you 'vagrant destroy' discovery: https://discovery.etcd.io/b9845b31a57793fe9f88137220b7f454 addr: $public_ipv4:4001 peer-addr: $public_ipv4:7001 fleet: public-ip: $public_ipv4 units: - name: etcd.service command: start - name: fleet.service command: start - name: docker-tcp.socket command: start enable: true

50 / 80

show all machines in your clustercore@core-01 ~/share $ fleetctl list-machinesMACHINE IP METADATA78e5ab3e... 172.17.8.103 -adddf8be... 172.17.8.102 -df763c2f... 172.17.8.101 -

51 / 80

service unit[Unit]Description=karlgrz.comAfter=docker.serviceRequires=docker.service

[Service]TimeoutStartSec=0ExecStartPre=-/usr/bin/docker kill karlgrz_webExecStartPre=-/usr/bin/docker rm karlgrz_webExecStartPre=/usr/bin/docker pull karlgrz/ubuntu-14.04-base-nginxExecStartPre=/bin/sh -c "cd /srv/karlgrz.com && \ /usr/bin/docker build -t karlgrz/karlgrz_web ."ExecStart=/usr/bin/docker run --name karlgrz_web -p 8001:8001 karlgrz/karlgrz_webExecStop=/usr/bin/docker stop karlgrz_web

52 / 80

start up some unitscore@core-01 ~/share/karlgrz-docker/fleet $ fleetctl start fantasy_web.service \jcsdoorsolutions_web.service stickfigureninjas_web.service karlgrz_web.serviceUnit fantasy_web.service launched on adddf8be.../172.17.8.102Unit karlgrz_web.service launched on adddf8be.../172.17.8.102Unit jcsdoorsolutions_web.service launched on 78e5ab3e.../172.17.8.103Unit stickfigureninjas_web.service launched on 78e5ab3e.../172.17.8.103

53 / 80

list loaded units and their statuscore@core-01 ~/share/karlgrz-docker/fleet $ fleetctl list-unitsUNIT MACHINE ACTIVE SUBfantasy_web.service adddf8be.../172.17.8.102 activating start-prejcsdoorsolutions_web.service 78e5ab3e.../172.17.8.103 activating start-prekarlgrz_web.service adddf8be.../172.17.8.102 active runningstickfigureninjas_web.service 78e5ab3e.../172.17.8.103 activating start-pre

core@core-01 ~/share/karlgrz-docker/fleet $ fleetctl list-unitsUNIT MACHINE ACTIVE SUBfantasy_web.service adddf8be.../172.17.8.102 active runningjcsdoorsolutions_web.service 78e5ab3e.../172.17.8.103 active runningkarlgrz_web.service adddf8be.../172.17.8.102 active runningstickfigureninjas_web.service 78e5ab3e.../172.17.8.103 active running

54 / 80

discovery sidekick[Unit]Description=Announce karlgrz.comBindsTo=karlgrz_web.service[Service]EnvironmentFile=/etc/environmentExecStart=/bin/sh -c "while true; \ do etcdctl set /apps/karlgrz_web \ '{ \"host\": \"karlgrz.com\", \"appkey\": \"karlgrz_web\", \"ip\" :\"${COREOS_PUBLIC_IPV4}\", \"port\" :\"8001\" }' \ --ttl 60; sleep 45; done"ExecStop=/usr/bin/etcdctl rm /apps/karlgrz_web

[X-Fleet]MachineOf=karlgrz_web.service

55 / 80

run discovery sidekickscore@core-01 ~/share/karlgrz-docker/fleet $ etcdctl ls /apps

core@core-01 ~/share/karlgrz-docker/fleet $ fleetctl start fantasy_discovery.service \jcsdoorsolutions_discovery.service stickfigureninjas_discovery.service \karlgrz_discovery.serviceUnit jcsdoorsolutions_discovery.service launched on 78e5ab3e.../172.17.8.103Unit stickfigureninjas_discovery.service launched on 78e5ab3e.../172.17.8.103Unit fantasy_discovery.service launched on adddf8be.../172.17.8.102Unit karlgrz_discovery.service launched on adddf8be.../172.17.8.102

core@core-01 ~/share/karlgrz-docker/fleet $ etcdctl ls /apps/apps/rethinkdb_services/apps/fantasy_web/apps/karlgrz_web/apps/jcsdoorsolutions_web/apps/stickfigureninjas_web

56 / 80

etcd valuescore@core-01 ~/share/karlgrz-docker/fleet $ etcdctl get /apps/karlgrz_web{ "host": "karlgrz.com", "appkey": "karlgrz_web", "ip" :"172.17.8.102", "port" :"8001" }

57 / 80

list unitscore@core-01 ~/share/karlgrz-docker/fleet $ fleetctl list-unitsUNIT MACHINE ACTIVE SUBfantasy_discovery.service adddf8be.../172.17.8.102 active runningfantasy_web.service adddf8be.../172.17.8.102 active runningjcsdoorsolutions_discovery.service 78e5ab3e.../172.17.8.103 active runningjcsdoorsolutions_web.service 78e5ab3e.../172.17.8.103 active runningkarlgrz_discovery.service adddf8be.../172.17.8.102 active runningkarlgrz_web.service adddf8be.../172.17.8.102 active runningrethinkdb_discovery.service df763c2f.../172.17.8.101 active runningrethinkdb_services.service df763c2f.../172.17.8.101 active runningstickfigureninjas_discovery.service78e5ab3e.../172.17.8.103 active runningstickfigureninjas_web.service 78e5ab3e.../172.17.8.103 active running

58 / 80

run a unit on ONLY one SPECIFIC node[Unit]Description=rethinkdbAfter=docker.serviceRequires=docker.service

[Service]TimeoutStartSec=0ExecStartPre=-/usr/bin/docker kill rethinkdb_servicesExecStartPre=-/usr/bin/docker rm rethinkdb_servicesExecStartPre=/usr/bin/docker pull dockerfile/rethinkdbExecStart=/usr/bin/docker run --name rethinkdb_services \ -p 8080:8080 -p 28015:28015 -p 29015:29105 -v /home/core/rethinkdb:/data \ -t dockerfile/rethinkdb rethinkdb -d /data --bind allExecStop=/usr/bin/docker stop rethinkdb_services

[X-Fleet]MachineID=9f152bf8

59 / 80

see logging output from a running containercore@core-01 ~ $ fleetctl journal fantasy_web-- Logs begin at Wed 2014-09-24 21:32:32 UTC, end at Thu 2014-09-25 19:55:26 UTC. --Sep 25 18:22:08 core-02 docker[1572]: Python version: 2.7.6 (default, Mar 22 2014, 23:03:41) [GCC 4.8.2]Sep 25 18:22:08 core-02 docker[1572]: Python main interpreter initialized at 0xc53540Sep 25 18:22:08 core-02 docker[1572]: python threads support enabledSep 25 18:22:08 core-02 docker[1572]: your server socket listen backlog is limited to100 connectionsSep 25 18:22:08 core-02 docker[1572]: your mercy for graceful operations on workers is60 secondsSep 25 18:22:08 core-02 docker[1572]: mapped 72768 bytes (71 KB) for 1 coresSep 25 18:22:08 core-02 docker[1572]: *** Operational MODE: single process ***Sep 25 18:22:09 core-02 docker[1572]: WSGI app 0 (mountpoint='') ready in 1 seconds oninterpreter 0xc53540 pid: 13 (default app)Sep 25 18:22:09 core-02 docker[1572]: *** uWSGI is running in multiple interpreter mode ***Sep 25 18:22:09 core-02 docker[1572]: spawned uWSGI worker 1 (and the only) (pid: 13, cores: 1)

60 / 80

core@core-03 ~ $ fleetctl journal karlgrz_web-- Logs begin at Wed 2014-09-24 21:32:32 UTC, end at Thu 2014-09-25 19:56:33 UTC. --Sep 25 18:21:58 core-03 sh[1315]: ---> Using cacheSep 25 18:21:58 core-03 sh[1315]: ---> ce8cd32fe157Sep 25 18:21:58 core-03 sh[1315]: Step 6 : RUN cd /srv && make publishSep 25 18:21:58 core-03 sh[1315]: ---> Using cacheSep 25 18:21:58 core-03 sh[1315]: ---> 83f7f333889bSep 25 18:21:58 core-03 sh[1315]: Step 7 : CMD ["nginx"]Sep 25 18:21:58 core-03 sh[1315]: ---> Using cacheSep 25 18:21:58 core-03 sh[1315]: ---> 4cf274f01daeSep 25 18:21:58 core-03 sh[1315]: Successfully built 4cf274f01daeSep 25 18:21:59 core-03 systemd[1]: Started karlgrz.com.

61 / 80

core@core-02 ~/share/karlgrz-docker/fleet $ fleetctl journal classholes_web-- Logs begin at Wed 2014-09-24 21:32:01 UTC, end at Thu 2014-09-25 20:03:55 UTC. --Sep 25 20:01:40 core-02 systemd[1]: Starting classholes.com...Sep 25 20:01:40 core-02 docker[3071]: Error response from daemon: No such container:classholes_webSep 25 20:01:40 core-02 docker[3071]: 2014/09/25 20:01:40 Error: failed to kill one ormore containersSep 25 20:01:40 core-02 docker[3085]: Error response from daemon: No such container:classholes_webSep 25 20:01:40 core-02 docker[3085]: 2014/09/25 20:01:40 Error: failed to remove one ormore containersSep 25 20:01:40 core-02 docker[3095]: Pulling repository karlgrz/ubuntu-14.04-base-nginxSep 25 20:01:42 core-02 systemd[1]: classholes_web.service: control process exited, code=exited status=1Sep 25 20:01:42 core-02 systemd[1]: Failed to start classholes.com.Sep 25 20:01:42 core-02 sh[3110]: /bin/sh: line 0: cd: /home/core/share/classholes: No suchfile or directorySep 25 20:01:42 core-02 systemd[1]: Unit classholes_web.service entered failed state.

62 / 80

terminate a node and see the services running on it moved toanother node in the clusterkarl@karl-mediafly:~/workspace/coreos-vagrant$ vagrant ssh core-03 -- -ALast login: Thu Sep 25 16:37:01 2014 from 10.0.2.2CoreOS (beta)core@core-03 ~ $ shutdown -nshutdown: invalid option -- 'n'core@core-03 ~ $ shutdownMust be root.core@core-03 ~ $ sudo shutdown -nshutdown: invalid option -- 'n'core@core-03 ~ $ sudo shutdownShutdown scheduled for Thu 2014-09-25 16:46:14 UTC, use 'shutdown -c' to cancel.Broadcast message from root@core-03 (Thu 2014-09-25 16:45:14 UTC):

The system is going down for power-off at Thu 2014-09-25 16:46:14 UTC!

63 / 80

core@core-02 ~ $ fleetctl list-unitsUNIT MACHINE ACTIVE SUBfantasy_discovery.service adddf8be.../172.17.8.102 active runningfantasy_web.service adddf8be.../172.17.8.102 active runningkarlgrz_discovery.service adddf8be.../172.17.8.102 active runningkarlgrz_web.service adddf8be.../172.17.8.102 active runningrethinkdb_discovery.service df763c2f.../172.17.8.101 active runningrethinkdb_services.service df763c2f.../172.17.8.101 active running

64 / 80

core@core-02 ~ $ fleetctl list-unitsUNIT MACHINE ACTIVE SUBfantasy_discovery.service adddf8be.../172.17.8.102 active runningfantasy_web.service adddf8be.../172.17.8.102 active runningjcsdoorsolutions_discovery.service df763c2f.../172.17.8.101 active runningjcsdoorsolutions_web.service df763c2f.../172.17.8.101 activating start-prekarlgrz_discovery.service adddf8be.../172.17.8.102 active runningkarlgrz_web.service adddf8be.../172.17.8.102 active runningrethinkdb_discovery.service df763c2f.../172.17.8.101 active runningrethinkdb_services.service df763c2f.../172.17.8.101 active runningstickfigureninjas_discovery.servicedf763c2f.../172.17.8.101 active runningstickfigureninjas_web.service df763c2f.../172.17.8.101 activating start-pre

65 / 80

core@core-02 ~ $ fleetctl list-machinesMACHINE IP METADATAadddf8be... 172.17.8.102 -df763c2f... 172.17.8.101 -

66 / 80

core@core-02 ~ $ fleetctl list-unitsUNIT MACHINE ACTIVE SUBfantasy_discovery.service adddf8be.../172.17.8.102 active runningfantasy_web.service adddf8be.../172.17.8.102 active runningjcsdoorsolutions_discovery.service df763c2f.../172.17.8.101 active runningjcsdoorsolutions_web.service df763c2f.../172.17.8.101 active runningkarlgrz_discovery.service adddf8be.../172.17.8.102 active runningkarlgrz_web.service adddf8be.../172.17.8.102 active runningrethinkdb_discovery.service df763c2f.../172.17.8.101 active runningrethinkdb_services.service df763c2f.../172.17.8.101 active runningstickfigureninjas_discovery.servicedf763c2f.../172.17.8.101 active runningstickfigureninjas_web.service df763c2f.../172.17.8.101 active running

67 / 80

Conclusions

68 / 80

Please keep in mind I ran this cluster on my laptop usingVagrant, not on cloud infrastructure

69 / 80

Clustering just worked(I didn't even really have to think about failover or replicationmyself)

70 / 80

alpha softwarefleet and etcd are great, but they both need some more workbefore being "production ready"

71 / 80

fleet in particular gets into situations sometimes where Ihave destroyed a unit but it still shows in the list of units fora while

72 / 80

fleet doesn't have a nice mechanism to restart all your unitsor groups (at least that I found)

73 / 80

etcd is awesome :-)

74 / 80

not quite ready for Mediafly

75 / 80

...but...

76 / 80

I plan on deploying CoreOS to power myside projects, blog, and the handful ofsites I run for friends soon

77 / 80

I feel that after a bit of work this will bethe OS that powers distributed systemsin the future

78 / 80

Questions?

79 / 80

Fin.

80 / 80

karl grzeszczak: september docker presentation at mediafly

Engineering