Nomad - GOTO London
TRANSCRIPT
Nomad
HASHICORP
Armon Dadgar @armon
Nomad
Cluster Manager
Scheduler
Schedulers map a set of work to a set of resources
CPU Scheduler
Work (Input): Web Server - Thread 1, Web Server - Thread 2, Redis - Thread 1, Kernel - Thread 1
Resources: CPU - Core 1, CPU - Core 2
Schedulers In the Wild
Type | Work | Resources
CPU Scheduler | Threads | Physical Cores
AWS EC2 / OpenStack Nova | Virtual Machines | Hypervisors
Hadoop YARN | MapReduce Jobs | Client Nodes
Cluster Scheduler | Applications | Servers
Advantages
Higher Resource Utilization: Bin Packing, Over-Subscription, Job Queueing
Decouple Work from Resources: Abstraction, API Contracts, Standardization
Better Quality of Service: Priorities, Resource Isolation, Pre-emption
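Bin packing, listed among the advantages above, can be illustrated with a small first-fit placement loop. This is only a sketch under stated assumptions: the `Task` and `Node` types, the MHz figures, and the first-fit strategy are all hypothetical and are not taken from the talk or from Nomad's own placement logic.

```go
package main

import "fmt"

// Task is a hypothetical unit of work with a CPU requirement in MHz.
type Task struct {
	Name string
	CPU  int
}

// Node is a hypothetical machine with some free CPU capacity in MHz.
type Node struct {
	Name string
	Free int
}

// firstFit places each task on the first node that still has room for it.
// It returns a map of task name to node name; tasks that fit nowhere are
// left out of the map. The input slices are not modified.
func firstFit(tasks []Task, nodes []Node) map[string]string {
	free := make([]int, len(nodes))
	for i, n := range nodes {
		free[i] = n.Free
	}
	placed := make(map[string]string)
	for _, t := range tasks {
		for i := range nodes {
			if free[i] >= t.CPU {
				free[i] -= t.CPU
				placed[t.Name] = nodes[i].Name
				break
			}
		}
	}
	return placed
}

func main() {
	tasks := []Task{{"web", 500}, {"redis", 500}, {"batch", 800}}
	nodes := []Node{{"node-1", 1000}, {"node-2", 1000}}
	for task, node := range firstFit(tasks, nodes) {
		fmt.Printf("%s -> %s\n", task, node)
	}
}
```

Packing "web" and "redis" onto the same node leaves the second node with enough room for the larger "batch" task, which is the utilization gain the slide refers to.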
Nomad
Cluster Scheduler
Easily Deploy Applications
Operationally Simple
Built for Scale
job "redis" {
  datacenters = ["us-east-1"]

  task "redis" {
    driver = "docker"

    config {
      image = "redis:latest"
    }

    resources {
      cpu    = 500 # MHz
      memory = 256 # MB

      network {
        mbits         = 10
        dynamic_ports = ["redis"]
      }
    }
  }
}
example.nomad
Job Specification
Declares what to run
Nomad determines where and manages how to run
Nomad abstracts work from resources
Containerized: Docker, Rkt, Windows Server Containers
Virtualized: Qemu / KVM, Hyper-V, Xen
Standalone: Java Jar, Static Binaries, C#
Nomad
Declarative Job Specification
Infrastructure-As-Code
Removes Imperative Logic
External Dependencies?
Service Discovery?
Health Monitoring?
Application Secrets?
Stateful Applications?
job "my-app" {
  …
  task "my-app" {
    service {
      port = "http"

      check {
        type     = "http"
        path     = "/health"
        interval = "5s"
      }
    }
  }
}
example.nomad
Diagram: Nomad and Consul integration
Components: Nomad Server, Consul Server, Client (Nomad, Consul, App 1 ... App N)
Steps: Schedule App, Register Service, Monitor Health
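The application side of the HTTP check in the job file can be sketched as a small handler on `/health`. This is an illustrative sketch, not code from the talk: `healthHandler`, `newMux`, and `probe` are hypothetical names, and the example exercises the handler in-process with `httptest` instead of binding a real port.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthHandler answers the health check with 200 OK while the app is healthy.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
	fmt.Fprintln(w, "ok")
}

// newMux wires the handler to the path named in the job's check block.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", healthHandler)
	return mux
}

// probe sends one in-process GET request and returns the status code,
// which is roughly what an HTTP health check would observe.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	mux := newMux()
	fmt.Println(probe(mux, "/health"))  // status for the registered path
	fmt.Println(probe(mux, "/missing")) // status for an unregistered path
}
```

In a real task the same mux would be served with `http.ListenAndServe` on whatever port is allocated for the "http" label.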
Nomad
Secret Distribution:
API Keys
DB Credentials
SSL/TLS Certificates
job "my-app" {
  …
  task "my-app" {
    env {
      DB_USERPASS = "foo:bar"
    }
  }
}
example.nomad
Vault
Secure secret storage
Dynamic secrets
Leasing, renewal, and revocation
Auditing
Rich ACLs
Multiple client authentication methods
Diagram: Vault client flow
Steps: Login, Vault Token (returned), Vault Token + Operation, Op Response (returned)
job "my-app" {
  …
  task "my-app" {
    env {
      VAULT_TOKEN = "b6a10b96-9060-11e6-9c6f-67a52bc6b8d3"
    }
  }
}
example.nomad
job "my-app" {
  …
  task "my-app" {
    vault {
      policies = ["my-app-role"]
    }
  }
}
example.nomad
Diagram: Nomad and Vault integration
Components: Nomad Server, Client (Nomad, App 1 ... App N)
Steps: Submit Job + Vault Token, Verify Vault Token, Schedule App, Generate + Renew Vault Token
Nomad
Native Vault Integration
No Secrets in Jobs
No Secrets on Client Disk
Minimize Trust
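From the task's point of view, the integration means the token arrives at runtime instead of being written into the job file. The sketch below assumes the token is exposed to the task as a `VAULT_TOKEN` environment variable; that delivery mechanism, and the `vaultToken` helper, are assumptions made for illustration and are not stated in the slides.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// vaultToken reads the token through a lookup function so the logic can be
// exercised without touching the real environment. The variable name
// VAULT_TOKEN is an assumption about how the token reaches the task.
func vaultToken(lookup func(string) (string, bool)) (string, error) {
	tok, ok := lookup("VAULT_TOKEN")
	if !ok || tok == "" {
		return "", errors.New("VAULT_TOKEN is not set")
	}
	return tok, nil
}

func main() {
	tok, err := vaultToken(os.LookupEnv)
	if err != nil {
		fmt.Println("no token available:", err)
		return
	}
	// Never log the token itself; its length is enough for a sanity check.
	fmt.Printf("received a Vault token of length %d\n", len(tok))
}
```

The job file then carries only a policy name, so nothing secret has to be committed alongside it.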
Stateful Applications
Spectrum from Stateless to Stateful:
EASY: API, Web, Cache
MEDIUM: HDFS, Cassandra, MongoDB
HARD: *SQL
job "my-app" {
  …
  task "my-app" {
    ephemeral_disk {
      sticky = true
    }
  }
}
example.nomad
Moves data between tasks on the same machine
Copies data between tasks on different machines
Nomad
Easily Deploy Apps:
Declarative Jobs
Flexible Workloads
Consul Integration
Vault Integration
Sticky Volumes
Operationally Simple
Client Server
Built on Experience
GOSSIP CONSENSUS
Serf
Cluster Management
Gossip Based (P2P)
Membership
Failure Detection
Event System
Large Scale
Production Hardened
Simple Clustering and Federation
Consul
Service Discovery
Configuration
Coordination (Locking)
Central Servers + Distributed Clients
Multi-Datacenter
Raft Consensus
Large Scale
Production Hardened
Nomad
Operational Simplicity:
Single Binary
No Dependencies
Highly Available
Built for Scale
Built on Experience
GOSSIP CONSENSUS
Mature Libraries
Proven Design Patterns
Lacking Scheduling Logic
Built on Research
GOSSIP CONSENSUS
Single Region Architecture
Diagram: three servers (FOLLOWER, LEADER, FOLLOWER) linked by REPLICATION and FORWARDING; clients in DC1, DC2, and DC3 reach the servers over RPC.
Multi Region Architecture
Diagram: REGION A and REGION B each run three servers (FOLLOWER, LEADER, FOLLOWER) with REPLICATION and FORWARDING inside the region; the two regions are connected by GOSSIP and REGION FORWARDING.
Nomad
Region is Isolation Domain
1-N Datacenters Per Region
Flexibility to do 1:1 (Consul)
Scheduling Boundary
Hundreds of regions
Tens of thousands of clients per region
Thousands of jobs per region
Nomad
Inspired by Google Omega
Optimistic Concurrency
State Coordination
Service & Batch workloads
Pluggable Architecture
Data Model
ALLOCATION
JOB
EVALUATION
NODE
Evaluation ~= State Change
Evaluations
Create / Update / Delete Job
Node Up / Node Down
Allocation Failed / Finished
Evaluations
SCHEDULER
func(Evaluation) => []AllocationUpdates
Service, Batch, System
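The scheduler signature on the slide, func(Evaluation) => []AllocationUpdates, can be written out as a Go function type. The sketch below is a simplified illustration: the fields on `Evaluation` and `AllocationUpdate`, and the reconcile-by-count logic in `serviceScheduler`, are hypothetical and are not Nomad's actual structs or algorithm.

```go
package main

import "fmt"

// Evaluation is a hypothetical, simplified record of a state change.
type Evaluation struct {
	JobID       string
	TriggeredBy string // e.g. "job-register" or "node-down"
	Desired     int    // instances the job asks for
	Running     int    // instances currently running
}

// AllocationUpdate is a hypothetical instruction produced by a scheduler.
type AllocationUpdate struct {
	JobID  string
	Action string // "place" or "stop"
}

// Scheduler mirrors the slide: a function from an evaluation to updates.
type Scheduler func(Evaluation) []AllocationUpdate

// serviceScheduler closes the gap between desired and running instances.
func serviceScheduler(e Evaluation) []AllocationUpdate {
	var updates []AllocationUpdate
	for i := e.Running; i < e.Desired; i++ {
		updates = append(updates, AllocationUpdate{e.JobID, "place"})
	}
	for i := e.Desired; i < e.Running; i++ {
		updates = append(updates, AllocationUpdate{e.JobID, "stop"})
	}
	return updates
}

func main() {
	var s Scheduler = serviceScheduler
	eval := Evaluation{JobID: "redis", TriggeredBy: "node-down", Desired: 3, Running: 1}
	for _, u := range s(eval) {
		fmt.Println(u.Action, u.JobID)
	}
}
```

Because every scheduler shares one function type, service, batch, and system schedulers can be swapped in behind the same interface.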
External Event
Evaluation Creation
Evaluation Queuing
Evaluation Processing
Optimistic Coordination
State Updates
Nomad
Omega Architecture
Optimistically Schedule
100s of Jobs in Parallel
Controls for Correctness
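Optimistic scheduling with a correctness control can be sketched as follows: schedulers compute plans in parallel against a possibly stale view, and a single authority re-checks each plan before committing it. The `Plan` and `State` types and the capacity check are hypothetical simplifications chosen for illustration; the slides do not describe the actual verification step.

```go
package main

import (
	"fmt"
	"sync"
)

// Plan is a hypothetical proposal a scheduler computed from a snapshot.
type Plan struct {
	Node string
	CPU  int
}

// State is the authoritative record of free CPU (MHz) per node.
type State struct {
	mu   sync.Mutex
	free map[string]int
}

// Apply commits a plan only if the node still has the capacity the
// scheduler assumed. A rejected plan changes nothing, and the scheduler
// is expected to retry against fresher state.
func (s *State) Apply(p Plan) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.free[p.Node] < p.CPU {
		return false
	}
	s.free[p.Node] -= p.CPU
	return true
}

func main() {
	state := &State{free: map[string]int{"node-1": 1000}}
	// Two schedulers planned in parallel against the same snapshot.
	a := Plan{"node-1", 600}
	b := Plan{"node-1", 600}
	fmt.Println(state.Apply(a)) // the first plan fits and is committed
	fmt.Println(state.Apply(b)) // the second conflicts and is rejected
}
```

The schedulers never block each other while planning; conflicts surface only at commit time, which is what makes running many of them in parallel cheap.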
Nomad
Million Container Challenge
1,000 Jobs
1,000 Tasks per Job
5,000 Hosts on GCE
1,000,000 Containers
"640 KB ought to be enough for anybody." - Bill Gates
2nd Largest Hedge Fund
18K Cores
5 Hours
2,200 Containers/second
Nomad
Cluster Scheduler
Easily Deploy Applications
Operationally Simple
Built for Scale
Thanks! Q/A