Everything You Thought You Already Knew About Orchestration

Laura Frank, Director of Engineering, Codeship

Posted on 21-Jan-2018

TRANSCRIPT

Page 1: Everything You Thought You Already Knew About Orchestration

Laura Frank, Director of Engineering, Codeship

Page 2: Everything You Thought You Already Knew About Orchestration
Page 3: Everything You Thought You Already Knew About Orchestration

Agenda

• Managing Distributed State with Raft: Quorum 101, Leader Election, Log Replication
• Service Scheduling
• Failure Recovery
• …plus bonus debugging tips!

Page 4: Everything You Thought You Already Knew About Orchestration

The Big Problem(s): What are tools like Swarm and Kubernetes trying to do?

They’re trying to get a collection of nodes to behave like a single node.
• How does the system maintain state?
• How does work get scheduled?

Page 5: Everything You Thought You Already Knew About Orchestration

Manager leader

Worker Worker

Manager follower

Manager follower

Worker Worker Worker

raft consensus group

Page 6: Everything You Thought You Already Knew About Orchestration

So You Think You Have Quorum

Page 7: Everything You Thought You Already Knew About Orchestration

Quorum

The minimum number of votes that a consensus group needs in order to be allowed to perform an operation.

Without quorum, your system can’t do work.

Page 8: Everything You Thought You Already Knew About Orchestration

Math!

Managers | Quorum | Fault Tolerance
       1 |      1 |               0
       2 |      2 |               0
       3 |      2 |               1
       4 |      3 |               1
       5 |      3 |               2
       6 |      4 |               2
       7 |      4 |               3

Quorum = (N/2) + 1

In simpler terms, it means a majority.
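The (N/2) + 1 formula uses integer division, and the table above falls out of it directly. A quick sketch to check the math (the helper function names here are illustrative, not part of any tool):

```shell
# Quorum for N managers is (N/2) + 1 with integer division;
# fault tolerance is whatever is left over: N - quorum.
quorum() { echo $(( $1 / 2 + 1 )); }
fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5 6 7; do
  echo "managers=$n quorum=$(quorum $n) tolerates=$(fault_tolerance $n)"
done
```

Note that an even-sized group never buys you anything: 4 managers tolerate the same single failure as 3, while requiring one more vote for quorum.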

Page 9: Everything You Thought You Already Knew About Orchestration

Math! Managers Quorum Fault Tolerance

1 1 0

2 2 0

3 2 1

4 3 1

5 3 2

6 4 2

7 4 3

(N/2) + 1

In simpler terms, it means a majority

Page 10: Everything You Thought You Already Knew About Orchestration

Having two managers instead of one actually doubles your chances of losing quorum.
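One way to see this: with two managers, quorum is 2, so losing *either* manager loses quorum. If each manager fails independently with probability p over some window (a simplifying assumption, and p = 0.01 is just an illustrative number), the chance of losing quorum is roughly double the single-manager case:

```shell
# With 1 manager, quorum is lost when that one manager fails: probability p.
# With 2 managers, quorum is 2, so losing EITHER manager loses quorum:
# probability 1 - (1-p)^2, which is about 2p for small p.
awk 'BEGIN {
  p = 0.01
  printf "one manager:  %.4f\n", p
  printf "two managers: %.4f\n", 1 - (1 - p) * (1 - p)
}'
```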

Page 11: Everything You Thought You Already Knew About Orchestration

Quorum With Multiple Regions

Pay attention to datacenter topology when placing managers.

Manager Nodes | Distribution across 3 Regions
            3 | 1-1-1
            5 | 1-2-2
            7 | 3-2-2
            9 | 3-3-3

…magically works with Docker for AWS

Page 12: Everything You Thought You Already Knew About Orchestration

Let’s talk about Raft!

Page 13: Everything You Thought You Already Knew About Orchestration

I think I’ll just write my own distributed consensus algorithm.

-no sensible person

Page 14: Everything You Thought You Already Knew About Orchestration

Raft is responsible for…

• Leader election
• Log replication
• Safety (won’t talk about this much today)
• Being easier to understand

Page 15: Everything You Thought You Already Knew About Orchestration

Orchestration systems typically use a key/value store backed by a consensus algorithm

In a lot of cases, that algorithm is Raft!

Raft is used everywhere… …that etcd is used

Page 16: Everything You Thought You Already Knew About Orchestration

SwarmKit implements the Raft algorithm directly.

Page 17: Everything You Thought You Already Knew About Orchestration

In most cases, you don’t want to run work on your manager nodes. Participating in a Raft consensus group is work, too. Make your manager nodes unavailable for tasks:

docker node update --availability drain <NODE>

*I will run work on managers for educational purposes

Page 18: Everything You Thought You Already Knew About Orchestration

Leader Election & Log Replication

Page 19: Everything You Thought You Already Knew About Orchestration

Manager leader

Manager candidate

Manager follower

Manager offline
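The transitions between those states can be sketched as a single election round. A follower that stops hearing from the leader becomes a candidate, votes for itself, and asks its peers for votes; it wins once it holds a majority. The vote counts below are hypothetical:

```shell
# One Raft election round in a 5-manager group, with made-up peer replies.
MANAGERS=5
QUORUM=$(( MANAGERS / 2 + 1 ))   # 3 votes needed in a 5-manager group
votes=1                          # the candidate votes for itself
for peer_vote in 1 1 0 0; do     # hypothetical replies from the 4 peers
  votes=$(( votes + peer_vote ))
done
if [ "$votes" -ge "$QUORUM" ]; then
  echo "candidate becomes leader with $votes/$MANAGERS votes"
else
  echo "split vote: candidates retry after a randomized timeout"
fi
```

The randomized retry timeout is what makes split votes rare in practice: two candidates that tie are unlikely to time out at the same moment again.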

Page 20: Everything You Thought You Already Knew About Orchestration

demo.consensus.group

Page 21: Everything You Thought You Already Knew About Orchestration

The log is the source of truth for your application.

Page 22: Everything You Thought You Already Knew About Orchestration

In the context of distributed computing (and this talk), a log is an append-only, time-based record of data.

[ 2 | 10 | 30 | 25 | 5 | 12 ] ← first entry on the left; new entries are appended at the end

This log is for computers, not humans.

Page 23: Everything You Thought You Already Knew About Orchestration

In simple systems, the log is pretty straightforward: a client sends a value (say, 12) to the server, and the server appends it to its log.

Client —12→ Server: [ 2 | 10 | 30 | 25 | 5 | 12 ]

Page 24: Everything You Thought You Already Knew About Orchestration

In a manager group, a log entry can only “become truth” once it is confirmed by a majority of followers (quorum!).

Client —12→ Manager leader → replicated to Manager follower, Manager follower
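That commit rule can be sketched the same way as the quorum math above; the ack pattern here is hypothetical:

```shell
# The leader appends the entry to its own log (one ack), then counts
# follower acks. The entry commits once acks reach quorum.
MANAGERS=3
QUORUM=$(( MANAGERS / 2 + 1 ))   # 2 of 3
acks=1                           # the leader's own log write
for follower_ack in 1 0; do      # one follower acks, one is unreachable
  acks=$(( acks + follower_ack ))
done
if [ "$acks" -ge "$QUORUM" ]; then
  echo "entry 12 committed ($acks/$MANAGERS acks)"
else
  echo "entry 12 pending"
fi
```

This is why the cluster keeps working with one of three managers down, and why it stops accepting changes with two down: the acks can no longer reach quorum.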

Page 25: Everything You Thought You Already Knew About Orchestration

demo.consensus.group

Page 26: Everything You Thought You Already Knew About Orchestration

In distributed computing, it’s essential that you understand log replication.

bit.ly/logging-post

Page 27: Everything You Thought You Already Knew About Orchestration

Debugging Tip

Watch the Raft logs.

Monitor via inotifywait OR just read them directly!

Page 28: Everything You Thought You Already Knew About Orchestration
Page 29: Everything You Thought You Already Knew About Orchestration

Scheduling

Page 30: Everything You Thought You Already Knew About Orchestration

HA application problems

scheduling problems

orchestrator problems

Page 31: Everything You Thought You Already Knew About Orchestration

Scheduling constraints

Restrict services to specific nodes, such as specific architectures, security levels, or types

docker service create \
  --constraint 'node.labels.type==web' \
  my-app

Page 32: Everything You Thought You Already Knew About Orchestration

New in 17.04.0-ce: Topology-aware scheduling!!1!

Implements a spread strategy over nodes that belong to a certain category.

Unlike --constraint, this is a “soft” preference:

--placement-pref 'spread=node.labels.dc'

Page 33: Everything You Thought You Already Knew About Orchestration
Page 34: Everything You Thought You Already Knew About Orchestration

Swarm will not rebalance healthy tasks when a new node comes online

Page 35: Everything You Thought You Already Knew About Orchestration

Debugging Tip

Add a manager to your Swarm running with --availability drain and in Engine debug mode

Page 36: Everything You Thought You Already Knew About Orchestration

Failure Recovery

Page 37: Everything You Thought You Already Knew About Orchestration

Losing quorum

Page 38: Everything You Thought You Already Knew About Orchestration

Regain quorum

• Bring the downed nodes back online (derp)
• On a healthy manager, run docker swarm init --force-new-cluster
• This will create a new cluster with one healthy manager
• You need to promote new managers

Page 39: Everything You Thought You Already Knew About Orchestration

The datacenter is on fire

Page 40: Everything You Thought You Already Knew About Orchestration

Restore from a backup in 5 easy steps!

• Bring up a new manager and stop Docker
• sudo rm -rf /var/lib/docker/swarm
• Copy backup to /var/lib/docker/swarm
• Start Docker
• docker swarm init (--force-new-cluster)

Page 41: Everything You Thought You Already Knew About Orchestration

But wait, there’s a bug… or a feature

• In general, users shouldn’t be allowed to modify IP addresses of nodes
• Restoring from a backup == old IP address for node1
• Workaround is to use elastic IPs with the ability to reassign

Page 42: Everything You Thought You Already Knew About Orchestration

Thank You!

@docker

#dockercon