Lessons in moving from physical hosts to Mesos
Raj Shekhar, Senior Site Reliability Engineer
@ilunatech
Mesos
WHAT
WHY
HOW
NOW WHAT
How most Ops teams run clusters today
Static partitioning has problems
Unequal load distribution on machines
Slower to add capacity
Not fault tolerant
Is there a better way?
Do we want machines or do we want resources?
Mesos
Resource manager - the datacenter is one big pool
Can run multi-tenant workloads
Failure detection
Services are isolated from one another
Why Mesos - Better resource utilization
Run multi-tenant workloads on machines
Dynamic partitioning - no dedicated machines for tasks
Less resource hungry than virtual machines
Why Mesos - all the other good things
Fault tolerant - automatically restart failed jobs
Elasticity - grow and shrink on demand
Faster deploys
T.co - URL shortening
http://example.com/example http://t.co/examp
How
Package Deploy Test Go Live!
Life after Go Live
Lowered operating expense
Fewer routine operational tasks
Faster deploys
Job throttling
Sudden spikes in latencies
What we learned
cgroups and CPU quotas
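The link between CPU quotas and latency spikes can be sketched with a little arithmetic. This is a minimal illustration, not the setup from the talk: it assumes the cgroup uses CFS bandwidth control with the kernel's default 100 ms period, that the quota is set proportional to the task's CPU allocation, and that the task is single-threaded with at most 1 CPU. The function names and the numbers are hypothetical.

```python
# Hypothetical sketch of CFS bandwidth control; all numbers are illustrative.
CFS_PERIOD_US = 100_000  # kernel default for cpu.cfs_period_us (100 ms)


def cfs_quota_us(cpus: float, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time the cgroup may consume per period (assumes quota = cpus * period)."""
    return int(round(cpus * period_us))


def wall_time_us(cpu_needed_us: int, cpus: float,
                 period_us: int = CFS_PERIOD_US) -> int:
    """Wall-clock time for a single-threaded burst that needs cpu_needed_us of CPU.

    Assumes cpus <= 1 and that the burst starts at the beginning of a period
    with the full quota available.
    """
    if cpu_needed_us <= 0:
        return 0
    quota = cfs_quota_us(cpus, period_us)
    full_periods, remainder = divmod(cpu_needed_us, quota)
    if remainder == 0:
        # The burst ends exactly when the last period's quota is used up.
        return (full_periods - 1) * period_us + quota
    # Each exhausted quota forces a wait until the next period begins.
    return full_periods * period_us + remainder


# A task with 0.5 CPUs gets 50 ms of CPU per 100 ms period.
quota = cfs_quota_us(0.5)
# A request needing 40 ms of CPU fits inside one quota: no throttling.
fast = wall_time_us(40_000, 0.5)
# A request needing 80 ms runs 50 ms, sits throttled for 50 ms, then runs 30 ms.
slow = wall_time_us(80_000, 0.5)
```

Under these assumptions, doubling the CPU work of a request more than triples its wall-clock time, which is one way a job that looks lightly loaded on average can still show sudden latency spikes.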
Capacity planning
Max traffic of the cluster was lower than we expected
What we learned
Different CPU variants have different throughput
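A rough sketch of why a mixed fleet falls short of expectations: capacity estimated from the fastest machine overstates what the whole cluster can serve. The host classes, counts, and throughput figures below are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostClass:
    cpu_variant: str      # hypothetical label for a CPU model
    hosts: int            # number of machines with this CPU
    rps_per_host: float   # measured peak requests/second on this variant


def naive_capacity(fleet: List[HostClass]) -> float:
    """Assumes every host performs like the fastest variant."""
    best = max(c.rps_per_host for c in fleet)
    return best * sum(c.hosts for c in fleet)


def measured_capacity(fleet: List[HostClass]) -> float:
    """Sums throughput per variant, using each variant's own benchmark."""
    return sum(c.hosts * c.rps_per_host for c in fleet)


fleet = [
    HostClass("newer-cpu", hosts=40, rps_per_host=1000.0),
    HostClass("older-cpu", hosts=60, rps_per_host=600.0),
]

expected = naive_capacity(fleet)     # 100 hosts * 1000 rps
actual = measured_capacity(fleet)    # 40 * 1000 + 60 * 600
shortfall = 1 - actual / expected    # fraction of expected capacity missing
```

With these made-up numbers the cluster delivers 76,000 requests per second against a naive estimate of 100,000, so benchmarking each CPU variant separately matters when tasks can land on any machine.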
Rethink service discovery
Services get hosts and ports assigned dynamically
What we learned
Use static proxies to forward connections
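The idea can be sketched as a routing table: clients always connect to a fixed, well-known port, and the proxy forwards to whichever hosts and ports the scheduler assigned. The talk does not say which proxy was used or how it was updated, so the class, service name, hosts, and ports below are all hypothetical.

```python
import itertools
from typing import Dict, Iterable, List, Tuple

Backend = Tuple[str, int]  # (host, dynamically assigned port)


class StaticProxyTable:
    """Maps a fixed port per service to the currently assigned backends."""

    def __init__(self, static_ports: Dict[str, int]) -> None:
        self.static_ports = dict(static_ports)      # service -> fixed port
        self.backends: Dict[str, List[Backend]] = {}
        self._cursors: Dict[str, Iterable[Backend]] = {}

    def update(self, service: str, instances: Iterable[Backend]) -> None:
        """Replace the backend list when tasks are started or rescheduled."""
        if service not in self.static_ports:
            raise KeyError(f"unknown service: {service}")
        self.backends[service] = list(instances)
        # Restart round-robin over the new backend list.
        self._cursors[service] = itertools.cycle(self.backends[service])

    def route(self, static_port: int) -> Backend:
        """Pick the next backend (round-robin) for a connection to static_port."""
        for service, port in self.static_ports.items():
            if port == static_port:
                if not self.backends.get(service):
                    raise LookupError(f"no live backends for {service}")
                return next(self._cursors[service])
        raise KeyError(f"no service on port {static_port}")


table = StaticProxyTable({"shortener": 9000})
table.update("shortener", [("host-a", 31001), ("host-b", 31745)])
first = table.route(9000)
second = table.route(9000)

# After a task is rescheduled, only the table changes; clients keep using 9000.
table.update("shortener", [("host-c", 31210)])
third = table.route(9000)
```

The design choice here is that clients never need to learn dynamic ports: only the proxy's table has to track where tasks are running.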
No perfect isolation
Sudden spike in latency
What we learned
Use async ops where possible; noisy neighbours still affect us