Experiences from DevOps production: Deployment, performance, failure

Experiences from production: Deployment, performance, failure. David Mytton, All Your Base, Oct 2014. blog.serverdensity.com

Upload: server-density

Post on 14-Jun-2015

DESCRIPTION

In his All Your Base talk, David Mytton (founder of Server Density) talks through Server Density's experiences in handling large-scale MongoDB deployments.

TRANSCRIPT

Page 1: Experiences from DevOps production: Deployment, performance, failure

Experiences from production

Deployment, performance, failure

David Mytton

All Your Base - Oct 2014

blog.serverdensity.com

Page 2: Experiences from DevOps production: Deployment, performance, failure

David Mytton

Page 3: Experiences from DevOps production: Deployment, performance, failure

serverdensity.com/allyourbase

Page 4: Experiences from DevOps production: Deployment, performance, failure

Slides: twitter.com/davidmytton

Page 5: Experiences from DevOps production: Deployment, performance, failure

Agenda

● Performance

● Architecture

● Downtime

● Preparation

● Where to host?

Page 11: Experiences from DevOps production: Deployment, performance, failure

Server Density Architecture

● ~100 servers - Ubuntu 12.04

● 50:50 virtual/dedicated

● 200TB/m processed data

● Nginx, Python, MongoDB

● Softlayer > 1TB RAM, 5TB SSDs
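
For a rough sense of scale, the 200TB/m figure can be turned into an average sustained rate. This is only a back-of-envelope sketch: it assumes decimal terabytes, a 30-day month, and load spread evenly over roughly 100 servers, none of which the slides state.

```python
# Back-of-envelope throughput for "200TB/m processed data".
# Assumptions (not from the slides): 1 TB = 10**12 bytes, a 30-day month,
# and load spread evenly across roughly 100 servers.

TB = 10**12
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def average_throughput_mb_per_s(tb_per_month: float) -> float:
    """Average sustained rate in MB/s for a given monthly data volume."""
    return tb_per_month * TB / SECONDS_PER_MONTH / 10**6


cluster_rate = average_throughput_mb_per_s(200)
per_server_rate = cluster_rate / 100  # even spread is an assumption

print(f"cluster: {cluster_rate:.1f} MB/s, per server: {per_server_rate:.2f} MB/s")
```

Real traffic is bursty and not every server handles the data path, so peak rates would likely be well above this average.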

Page 13: Experiences from DevOps production: Deployment, performance, failure

Two choices for deployment

● Virtualized

● Bare metal

Page 18: Experiences from DevOps production: Deployment, performance, failure

Advantages of virtualization

● Easy to manage

● Fast boot

● Easier to resize/migrate

● Templating/snapshots

● Containment

Page 22: Experiences from DevOps production: Deployment, performance, failure

Disadvantages of virtualization

● Another layer

● Hypervisor overhead

● Host contention

● i/o performance

Page 26: Experiences from DevOps production: Deployment, performance, failure

Advantages of bare metal

● Dedicated resources

● Direct access to hardware

● Customisable specs

● Performance

Page 30: Experiences from DevOps production: Deployment, performance, failure

Disadvantages of bare metal

● Build/deploy time

● More difficult to resize

● Capex/lifetime

● Difficult to migrate/snapshot

Page 32: Experiences from DevOps production: Deployment, performance, failure

Performance problems?

Easy answer: move to bare metal!

Page 34: Experiences from DevOps production: Deployment, performance, failure

Key performance factors

● Network

● EC2: Cluster compute, high memory, high i/o, high storage

● GCE: Higher CPU instances

Page 36: Experiences from DevOps production: Deployment, performance, failure

Key performance factors

● Network

Location          Ping RTT latency
Within USA        40-80ms
Trans-Atlantic    100ms
Trans-Pacific     150ms
Europe-Japan      300ms
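
To see why these round-trip times matter, consider a request that needs several sequential round trips. The sketch below uses the RTT figures from the slide; the count of 5 round trips is a hypothetical value chosen for illustration, and processing time is ignored.

```python
# Round-trip times from the slide, in milliseconds.
# "Within USA" is given as a range, so both ends are kept.
RTT_MS = {
    "Within USA": (40, 80),
    "Trans-Atlantic": (100, 100),
    "Trans-Pacific": (150, 150),
    "Europe-Japan": (300, 300),
}

ROUND_TRIPS = 5  # hypothetical: sequential round trips needed per request


def network_wait_ms(rtt_ms: int, round_trips: int) -> int:
    """Time spent waiting on the network for sequential round trips."""
    return rtt_ms * round_trips


for location, (low, high) in RTT_MS.items():
    best = network_wait_ms(low, ROUND_TRIPS)
    worst = network_wait_ms(high, ROUND_TRIPS)
    span = f"{best}ms" if best == worst else f"{best}-{worst}ms"
    print(f"{location}: {span} of network wait")
```

Under these assumptions a Europe-Japan request spends 1.5 seconds on the network alone, which is why location shows up as a performance factor.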

Page 37: Experiences from DevOps production: Deployment, performance, failure

Networking performance

AWS

GCE

bit.ly/googlevsamazon

Page 38: Experiences from DevOps production: Deployment, performance, failure

Key performance factors

● Memory

Page 39: Experiences from DevOps production: Deployment, performance, failure

http://blog.pythonisito.com/2011/12/mongodbs-write-lock.html

Page 41: Experiences from DevOps production: Deployment, performance, failure

Key performance factors

● Memory is expensive

Page 43: Experiences from DevOps production: Deployment, performance, failure

Key performance factors

● Disk

● SSDs!

GCE: 256GB = $83.20/m

EC2: 256GB = $35.32/m

SL: 200GB = $81/m
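
The three quotes use different capacities, so a per-GB figure makes them easier to compare. A minimal sketch using only the numbers on the slide; it compares price alone and assumes nothing about IOPS or throughput, which the slide does not give.

```python
# SSD quotes as shown on the slide: (capacity in GB, USD per month).
SSD_QUOTES = {
    "GCE": (256, 83.20),
    "EC2": (256, 35.32),
    "SL": (200, 81.00),
}


def price_per_gb(capacity_gb: float, monthly_usd: float) -> float:
    """Monthly cost of one GB at the quoted price."""
    return monthly_usd / capacity_gb


per_gb = {name: price_per_gb(*quote) for name, quote in SSD_QUOTES.items()}
cheapest = min(per_gb, key=per_gb.get)

# Print providers from cheapest to most expensive per GB.
for name, price in sorted(per_gb.items(), key=lambda item: item[1]):
    print(f"{name}: ${price:.3f}/GB/month")
```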

Page 47: Experiences from DevOps production: Deployment, performance, failure

Why cloud?

● Flexible

● Unlimited resources

● Cheap to get started

● Other products

Page 50: Experiences from DevOps production: Deployment, performance, failure

Why colo?

● Vastly cheaper

● Complete control

Page 51: Experiences from DevOps production: Deployment, performance, failure

Let’s talk about downtime

Page 52: Experiences from DevOps production: Deployment, performance, failure

2013 Spend: ~$5bn

Page 53: Experiences from DevOps production: Deployment, performance, failure

2013 Spend: ~$6bn

Page 54: Experiences from DevOps production: Deployment, performance, failure

2013 Spend: ~$4bn

Page 55: Experiences from DevOps production: Deployment, performance, failure

You will have downtime

How much do you spend?

Page 56: Experiences from DevOps production: Deployment, performance, failure

Preparation

Page 60: Experiences from DevOps production: Deployment, performance, failure

Preparation - On Call

● Off call

● Rotations

● Work the next day?

● Reachability - train, 3G/4G (EDGE?!), Do Not Disturb mode, system updates

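
One way to make rotations and off-call time explicit is to generate the schedule rather than maintain it by hand. This is a sketch only: the primary/secondary split, the weekly cadence, and the engineer names are assumptions for illustration, not details from the talk.

```python
from datetime import date, timedelta


def weekly_rotation(engineers, start, weeks):
    """Yield (week_start, primary, secondary, off_call) in round-robin order.

    The primary/secondary roles and weekly hand-over are assumptions.
    """
    if len(engineers) < 2:
        raise ValueError("need at least two engineers for primary and secondary")
    n = len(engineers)
    for week in range(weeks):
        primary = engineers[week % n]
        secondary = engineers[(week + 1) % n]
        # Everyone not holding a role this week is explicitly off call.
        off_call = [e for e in engineers if e not in (primary, secondary)]
        yield start + timedelta(weeks=week), primary, secondary, off_call


# Hypothetical team and start date.
schedule = list(weekly_rotation(["ana", "ben", "chi"], date(2014, 10, 6), 4))
for week_start, primary, secondary, off_call in schedule:
    print(week_start, "primary:", primary, "secondary:", secondary, "off:", off_call)
```

Listing who is off call each week makes it harder for the same person to end up covering by default.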

Page 65: Experiences from DevOps production: Deployment, performance, failure

Preparation - Documentation

● Searchable

● Easy to edit

● Independent of your infrastructure

● Up to date

Page 70: Experiences from DevOps production: Deployment, performance, failure

Unexpected failures

● Communication systems

● Network connectivity

● Access to support

Page 76: Experiences from DevOps production: Deployment, performance, failure

ALERT!

1. Load up incident response checklist

2. Log incident in JIRA

3. Log into Ops War Room

4. Public status post

5. Initial investigation
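
The five response steps on this slide, taken in numbered order, can be sketched as a checklist that refuses to skip ahead. The step names come from the slide; the `IncidentRun` class and its behaviour are hypothetical, not a description of Server Density's tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Steps 1 to 5 from the slide, in numbered order.
CHECKLIST = (
    "Load up incident response checklist",
    "Log incident in JIRA",
    "Log into Ops War Room",
    "Public status post",
    "Initial investigation",
)


@dataclass
class IncidentRun:
    """Hypothetical tracker for one pass through the checklist."""

    completed: list = field(default_factory=list)

    def next_step(self):
        """Return the next outstanding step, or None when all are done."""
        index = len(self.completed)
        return CHECKLIST[index] if index < len(CHECKLIST) else None

    def complete(self, step):
        """Record a step with a UTC timestamp; reject out-of-order steps."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append((datetime.now(timezone.utc), step))


run = IncidentRun()
run.complete("Load up incident response checklist")
print("next:", run.next_step())
```

Enforcing the order is a design choice in this sketch; it keeps the public status post from being forgotten once investigation starts.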

Page 81: Experiences from DevOps production: Deployment, performance, failure

Key response principles

● Log everything

● Frequent public status updates

● Gather the team

● Escalate!
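
"Log everything" and "frequent public status updates" can be combined in a small timestamped log that also flags when a public update is overdue. A minimal sketch: the `IncidentLog` class and the 30-minute interval are assumptions for illustration, since the slides do not say how often to post.

```python
from datetime import datetime, timedelta

STATUS_UPDATE_INTERVAL = timedelta(minutes=30)  # hypothetical interval


class IncidentLog:
    """Hypothetical timestamped log of actions taken during an incident."""

    def __init__(self):
        self.entries = []
        self.last_public_update = None

    def log(self, when, message, public=False):
        """Record an entry; public entries reset the status-update clock."""
        self.entries.append((when, message, public))
        if public:
            self.last_public_update = when

    def status_update_due(self, now):
        """True if no public update has been posted within the interval."""
        if self.last_public_update is None:
            return True
        return now - self.last_public_update >= STATUS_UPDATE_INTERVAL


start = datetime(2014, 10, 1, 12, 0)
incident = IncidentLog()
incident.log(start, "Alert received, checklist opened")
incident.log(start + timedelta(minutes=5), "Investigating elevated errors", public=True)
print(incident.status_update_due(start + timedelta(minutes=40)))
```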

Page 82: Experiences from DevOps production: Deployment, performance, failure

Summary

● Performance

● Architecture

● Downtime

● Preparation

● Where to host?

Page 83: Experiences from DevOps production: Deployment, performance, failure

Thank you very much

@davidmytton

[email protected]

blog.serverdensity.com

serverdensity.com/allyourbase