stop worrying about prodweb001 and start loving i-98fb9856 (arc201) | aws re:invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. They Don't Hug Back! Or Why You Need To Stop Worrying About prodweb001 And Start Loving i-98fb9856 Chris Munns, Amazon Web Services November 13, 2013

Upload: amazon-web-services

Post on 08-Sep-2014


DESCRIPTION

Traditionally, IT organizations have treated infrastructure components like family pets. We name them, we worry about them, and we let them wake us up at 4:00 am. Amazon CTO Werner Vogels has dubbed these behaviors as server hugging and antiquated in today's cloud infrastructures. In this breakout session, we will discuss methods and methodology to get away from server hugging and be concerned more with the overall status and life of our entire infrastructure. From making use of toss-away-able on-demand infrastructure, to monitoring services and not individual servers, to getting away from naming instances, this session helps you see your infrastructure for what it is, technology that you control.

TRANSCRIPT

Page 1: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

They Don't Hug Back! Or Why You Need To Stop Worrying About prodweb001 And Start Loving i-98fb9856

Chris Munns, Amazon Web Services

November 13, 2013

Page 2: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Why are we here? Old-school IT practices continue to weigh us down in the cloud. We need a way out.

Page 4: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

“Everything now is a programmable resource. There are no physical things anymore. Things that you needed to do by walking to the datacenter, by hugging your servers, and believe me I’ve hugged servers enough in my life. They DO NOT hug you back.” - Dr. Werner Vogels (Re:Invent 2012)

Page 5: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

“But I love my servers!” - You (now)

https://secure.flickr.com/photos/schluesselbein/4157426778/

Page 7: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

“They hate you, actually, I honestly believe that they hate you. At least that is how they behaved towards me.” - Dr. Werner Vogels (Re:Invent 2012)

Page 8: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

“But I love my servers!” “Well now I’m kind of sad.”

- You (now)

https://secure.flickr.com/photos/bensonkua/2687804310/

Page 9: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

So where does server hugging come from?

Page 10: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

NAMING THEM

https://secure.flickr.com/photos/quinnanya/4464205726

Page 13: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

So where does server hugging come from?

Why do we name them? Because we have to know where to find them. Where do we need to find them?

Page 15: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Here

Or here?

https://secure.flickr.com/photos/arthur-caranta/2925352521

Page 16: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

IF THIS THING IS OUT OF TAPE, YOU HAD A REALLY BAD DAY.

https://secure.flickr.com/photos/stephendotcarter/6587082437

Page 19: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

So where does server hugging come from?

Why did we need to find them in person? Because we HAD to fix them. WHY?

Page 24: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

So where does server hugging come from?

We fixed them because:
Dead servers == dead space
Dead space == wasted $$$
Dead servers == worse performance
Worse performance == lost $$$

Page 25: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

So where else does server hugging come from?

Page 26: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

SERVERS != OUR PETS

https://secure.flickr.com/photos/thegirlsny/3877243166/

Page 28: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

What we name our pets
• Greek gods: Zeus, Thor, Hercules…
• Elements: Hydrogen, Helium, Lithium…
• Comic book heroes: Superman, Ironman…
• Musicians, Cities, Countries, Movies
• Prodweb01, Prodapi01…
• Web01.prod, Web01.test…
• Tacotruck01
• P1cfw01v03

Page 29: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

P1cfw01v03 https://secure.flickr.com/photos/75898532@N00/3243666946/

Page 30: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

[Slide graphic: a group of icons, each labelled EC2]

P1cfw01v03 https://secure.flickr.com/photos/verylastexcitingmoment/3118396767/

Page 31: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Waking when they cry:

*** Nagios ***
Notification Type: PROBLEM
Service: Web CPU
Host: web03.example.com
Address: 10.167.10.51
State: CRITICAL
Date/Time: Thu Oct 24 08:14:13 UTC 2013
Additional Info: CRITICAL – CPU LOAD 29

Page 32: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Hugging server babies and you
• Is the site performing worse?
• Are your customers impacted?
• How impacted are they?
• What are the other 20 web instances doing?
• Did I really need to wake up at 4am for this?
• If a server uses 100% of its CPU, should I care?
• If this server is bad, how much work is there in fixing it?
• Is there something custom about this server?

Page 33: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Server hugging bad practices
• “Pet-ting” – caring about a server’s “name,” its well-being, its individual status
• “Snowflakes” – unique hosts in a common pool
• “Model T-ing” – hand-built one-off servers
• “Names In Stone” – overuse of host names as a source of truth

Page 34: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

In short, a lot of old-school, dated habits are being carried over to cloud infrastructure. Once you’ve brought them to the cloud, you lose out on many of its benefits, such as:
• Dynamic scale up/down
• Self-healing infrastructures
• Increased flexibility
• Automation

Page 35: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

https://secure.flickr.com/photos/tolomea/5113266973/

Page 36: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Letting go involves moving forward with some of the best of what AWS can offer you in terms of services and how you can work with them in some pretty incredible ways.

Page 37: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Letting go and loving the new way

• Using Auto Scaling for everything
• ENIs and EIPs
• Tags are the new DNS
• Deployment tools
• Host-based configuration
• Service registries

Page 38: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Sleeping through Infrastructure Recovery

https://secure.flickr.com/photos/dominiqs/331702231

Page 39: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

The things that should never wake you up

• High CPU usage on anything
• High memory usage on anything
• Thread/process exhaustion
• Filled disks
• Not running software
• Failed instances
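One way to read the list above is that paging should depend on the state of the whole service rather than on any single host. The sketch below is a hypothetical illustration of that rule, not something from the talk: the `FleetStatus` fields and the two thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FleetStatus:
    healthy: int        # instances currently passing health checks
    total: int          # instances the service is expected to have
    error_rate: float   # fraction of customer requests failing (0.0 to 1.0)


def should_page(status: FleetStatus,
                min_healthy_fraction: float = 0.5,
                max_error_rate: float = 0.05) -> bool:
    """Page a human only when the service as a whole is hurting.

    A single hot or dead instance does not trip this check; it is left
    for automation (for example Auto Scaling) to replace.
    """
    if status.total == 0:
        return True  # nothing is serving traffic at all
    healthy_fraction = status.healthy / status.total
    return (healthy_fraction < min_healthy_fraction
            or status.error_rate > max_error_rate)
```

With these assumed thresholds, one failed instance out of 21 does not page anyone, while a fleet that has lost most of its instances or is failing a noticeable share of requests does.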

Page 40: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Metrics:

Page 44: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Common actions taken when paged

1. Look at logs (looking at past data)
2. Look at graphs (looking at past data)
3. Reboot/restart related application/instance

Why do this manually?

Page 46: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

[Chart: Traffic to our site vs. manually provisioned capacity. Labels: provisioned capacity, 76%, 24%]

Page 47: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

[Chart: Traffic to our site vs. provisioned capacity with Auto Scaling. Label: provisioned capacity]

Page 48: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

STONITH: “Shoot the other node in the head”

Don’t be afraid to kill a node that has something wrong with it as a resolution to failure!

With Auto Scaling it’s fine!

Page 49: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

STONITH

[Diagram, built up across slides 49-58. Components shown: an AWS Cloud containing a Virtual Private Cloud spread over three Availability Zones; an Internet Gateway and ELB in front of three Web Instances in an Auto Scaling group (min=3). Later builds add CloudWatch, an Alarm, Amazon SNS, Amazon SQS, a Watcher Instance, and the EC2 API. One build shows only two Web Instances; the final build shows three again.]
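The watcher in the STONITH diagram can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: it assumes the alarm arrives as JSON with a `NewStateValue` field and a `Trigger.Dimensions` list of `name`/`value` pairs, that the caller has already unwrapped it from the SNS/SQS envelope, and that `autoscaling_client` behaves like a boto3 Auto Scaling client. The slides show the watcher calling the EC2 API; this sketch swaps in the Auto Scaling `terminate_instance_in_auto_scaling_group` call so the group keeps its desired capacity and launches a replacement.

```python
import json


def instance_from_alarm(alarm_json: str):
    """Return the instance ID named in an alarm message, or None.

    Assumed layout: {"NewStateValue": "ALARM",
                     "Trigger": {"Dimensions": [{"name": ..., "value": ...}]}}
    """
    alarm = json.loads(alarm_json)
    if alarm.get("NewStateValue") != "ALARM":
        return None  # OK / INSUFFICIENT_DATA: nothing to shoot
    for dimension in alarm.get("Trigger", {}).get("Dimensions", []):
        if dimension.get("name") == "InstanceId":
            return dimension.get("value")
    return None


def handle_alarm(autoscaling_client, alarm_json: str) -> bool:
    """Terminate the alarming instance; return True if one was terminated."""
    instance_id = instance_from_alarm(alarm_json)
    if instance_id is None:
        return False
    # Keep desired capacity unchanged so Auto Scaling launches a replacement.
    autoscaling_client.terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id,
        ShouldDecrementDesiredCapacity=False,
    )
    return True
```

Passing the client in as a parameter keeps the sketch free of credentials and region settings; in practice the caller would likely construct it with something like `boto3.client("autoscaling")`.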

Page 59: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Auto Scaling for everything!
• You can use Auto Scaling for singular instances that don’t scale up or down
  – min = 1, max = 1
• Auto Scaling gives you the ability to specify multiple Availability Zones, even if you only need a single host
  – gives you multi-AZ failover
• Auto Scaling supports notifications on instance creation/termination
  – Useful for configuring other resources, bootstrapping, and provisioning
• Auto Scaling is free!
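The "min = 1, max = 1" pattern above can be sketched as a small helper that builds the request parameters for such a group. The helper name and the example values are hypothetical; the keys are the parameter names boto3's `create_auto_scaling_group` accepts, and a launch configuration with the given name is assumed to exist already.

```python
def single_instance_group(name, launch_configuration, subnet_ids):
    """Build parameters for an Auto Scaling group that always holds one instance.

    Listing subnets from several Availability Zones lets the group relaunch
    the instance in another zone if the current one fails.
    """
    if not subnet_ids:
        raise ValueError("at least one subnet is required")
    return {
        "AutoScalingGroupName": name,
        "LaunchConfigurationName": launch_configuration,
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # Auto Scaling takes the subnets as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }
```

A caller would then pass the result to the API, for example `boto3.client("autoscaling").create_auto_scaling_group(**params)`.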

Page 60: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Auto Scaling for everything!

• Make use of user data or configuration management tools to do things like:
  – Re-attach an Amazon Elastic Block Store (EBS) volume with application data
  – Re-attach an Elastic Network Interface (ENI)
  – Update service registries
  – Update DNS
  – Notify other reliant applications of the new host

Page 61: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Elastic Network Interfaces/Elastic IPs

ENI:
• Add additional interfaces to an instance
• One or more secondary private IP addresses
• Has its own MAC address
• Can have Security Groups assigned
• Tag-able
• Free

EIP:
• A static public IP address
• Can be assigned to either an instance or an ENI
• Doesn’t replace private IP
• Small hourly charge when not attached to an instance

Page 62: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Elastic Network Interfaces

Attaching multiple network interfaces to an instance is useful when you want to:
• Create a management network.
• Use network and security appliances in your Amazon Virtual Private Cloud (VPC).
• Create dual-homed instances with workloads/roles on distinct subnets.
• Create a low-budget, high-availability solution.

Page 64: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Healing a single instance

[Diagram, built up across slides 64-75. Components shown: AWS CloudFormation and the EC2 API; an AWS Cloud containing a Virtual Private Cloud with one Availability Zone, an Internet Gateway, and a NAT Instance; an App Instance inside an Auto Scaling group, with an Elastic Network Interface (ENI) and an EBS Volume. Later builds add the label “Instances”.]

Page 76: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Healing a single instance

"myENI" : {
  "Type" : "AWS::EC2::NetworkInterface",
  "Properties" : {
    "Tags": [{"Key":"Name","Value":"AppENI"}, {"Key":"Project","Value":"Blog"}],
    "Description": "Blog One Off App Server ENI.",
    "SubnetId": "subnet-d2286cb9",
    "PrivateIpAddress": "192.168.11.100"
  }
}

Page 78: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Healing a single instance

import boto.ec2
import boto.utils

# Connect to API
conn = boto.ec2.connect_to_region('us-west-2')

# Find the right ENI (by its Name and Project tags)
myfilters = {'tag:Name': 'AppENI', 'tag:Project': 'Blog'}
myEni = conn.get_all_network_interfaces(filters=myfilters)

# Attach ENI to instance (this instance's ID comes from instance metadata)
myInstance = boto.utils.get_instance_metadata()['instance-id']
conn.attach_network_interface(myEni[0].id, myInstance, device_index=1, dry_run=False)

Page 79: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

https://secure.flickr.com/photos/cambodia4kidsorg/260004685

Use tags as a source of “truth” in your infrastructure

Page 80: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

DNS bad. Tags good.

DNS
• 30-year-old technology
• Only tells us a single thing about a host: a hostname-to-IP mapping
• Potential for split brain/broken replicas
• Caching issues, caching issues, caching issues

Tags
• Set by you the user, held in AWS and available via APIs
• Key:Value is totally up to you
• Can have several per resource
• Free to implement and query

Page 81: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

DNS bad. Tags good.

DNS
Web03.example.com:
– 10.167.10.51

Tags
i-933f81a4:
– Name:Web
– Env:Prod
– Project:Blog
– Owner:BobSmith
– aws:autoscaling:groupName: ProdBlogWebsASG
– aws:cloudformation:stack-name: BlogSiteProd
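Looking hosts up by tag rather than by hostname can be sketched as below. The helper names are hypothetical, and `ec2_client` is assumed to behave like a boto3 EC2 client; `describe_instances` with `tag:<key>` filters is the underlying call. The sketch ignores result pagination for brevity.

```python
def tag_filters(**tags):
    """Turn Env="Prod", Project="Blog" into EC2 tag filters."""
    return [{"Name": f"tag:{key}", "Values": [value]}
            for key, value in sorted(tags.items())]


def find_instance_ids(ec2_client, **tags):
    """Return the IDs of instances that carry all of the given tags."""
    response = ec2_client.describe_instances(Filters=tag_filters(**tags))
    return [instance["InstanceId"]
            for reservation in response["Reservations"]
            for instance in reservation["Instances"]]
```

For the example instance above, `find_instance_ids(client, Env="Prod", Project="Blog")` would be expected to include `i-933f81a4`, with no hostname involved.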

Page 82: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Tags as a source of truth

• Tie various resources together
• Billing reports
• IAM resource-level permissions
• Build automation
• Deploy automation
• Security resource grouping

Page 83: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Stop hand-crafting servers!

https://secure.flickr.com/photos/ndrwfgg/115898387

Page 84: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Use automation!

https://secure.flickr.com/photos/genewolf/147722350

Page 85: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

AWS management tools

[Spectrum graphic: AWS Elastic Beanstalk, AWS OpsWorks, AWS CloudFormation, placed on a scale running from “Higher-level services” (convenience) to “Do it yourself” (control)]

Page 86: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Host-based configuration management

Fabric

Page 87: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Host-based configuration management

• All more or less accomplish the same things
  – File configuration, package/software installation, user management, run commands, interface with OS, process management
• All have their own syntax that isn’t too dissimilar
• Some rely on agents, some are agentless
• Use HBCM alongside one of the tools from the previous slide
• Spend the time required to learn them
• Can’t scale easily without HBCM

Page 89: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

“I don’t have time to learn Chef!?”

“I wrote custom shell scripts instead!”

https://secure.flickr.com/photos/45909111@N00/9374169461/

Page 90: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

https://secure.flickr.com/photos/45909111@N00/9374169461/

Go visit the AWS & Partner exhibits and ask for more info!

Page 91: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Making Use of Service Registries

https://secure.flickr.com/photos/fringedbenefit/9178086713

Page 93: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

NOT THAT KINDA REGISTRY!

https://secure.flickr.com/photos/smartfinn/2651755337/

Page 94: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

“A service registry is one of the fundamental pieces of service-oriented architecture (SOA) for achieving reuse. It refers to a place in which service providers can impart information about their offered services and potential clients can search for services.” - www.architecturejournal.net, Sept 2009

Page 95: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Service registry workflow

1. A new instance boots.
2. It registers itself with our “service registry.”
3. Changes to the service registry kick off changes on other systems related to the new instance.
4. Other instances now know about our new instance.
5. On instance termination, instance is deregistered, and other instances remove it from use.
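The workflow above can be illustrated with a toy in-memory registry. This is a hypothetical stand-in for a real registry such as those listed on the next slide; the class and method names are invented for the example, and a production registry would also need persistence, health checking, and replication.

```python
class ServiceRegistry:
    """Toy registry: instances register, watchers are told about changes."""

    def __init__(self):
        self._services = {}   # service name -> {instance_id: address}
        self._watchers = []   # callbacks invoked on every change

    def watch(self, callback):
        """Subscribe callback(service, instances) to registry changes (step 3)."""
        self._watchers.append(callback)

    def register(self, service, instance_id, address):
        """Called by a new instance after it boots (steps 1 and 2)."""
        self._services.setdefault(service, {})[instance_id] = address
        self._notify(service)

    def deregister(self, service, instance_id):
        """Called on instance termination (step 5)."""
        self._services.get(service, {}).pop(instance_id, None)
        self._notify(service)

    def instances(self, service):
        """Current view of a service, as other instances would see it (step 4)."""
        return dict(self._services.get(service, {}))

    def _notify(self, service):
        snapshot = self.instances(service)
        for callback in self._watchers:
            callback(service, snapshot)
```

A load balancer configuration generator, for example, could subscribe with `watch` and rewrite its backend list whenever the snapshot changes.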

Page 96: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Service registry examples:

• Zookeeper
• MuleSoft Anypoint Service Registry
• Netflix Eureka
• IBM WebSphere Service Registry and Repository
• Airbnb SmartStack

Page 97: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Zookeeper “is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.” - zookeeper.apache.org

– leader election
– group membership
– configuration maintenance
– event notification
– locking
– priority queue mechanism

Page 98: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Zookeeper

[Diagram. Components shown: an AWS Cloud containing a Virtual Private Cloud spread over three Availability Zones; three Zookeeper Instances; two Worker Instances in an Auto Scaling group (min=2); a Leader Host.]

Page 99: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Enough from me!

Page 100: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Customer Story: Airbnb SmartStack Martin Rhoads

Page 101: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Martin Rhoads SRE @ Airbnb November 13, 2013

Airbnb SmartStack Helping you build Service Oriented Architectures

Page 102: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Intros

Igor Serebryany (not at Re:Invent)
+ SRE at Airbnb since 2012
+ Built datacenter automation at SingleHop
+ Scientific computing at University of Chicago
+ Hobbies: welding, biking, long walks on the beach

Page 103: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Intros

Martin Rhoads (This guy is even more bearded than the last!)
+ SRE at Airbnb
+ User of AWS since 2006
+ First 10 employees at RightScale
+ Previously worked at Cloudscaling deploying OpenStack at Tier1s and Telcos
+ BioInformatics at UCSB
+ Obsessed with making things easier

Page 104: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

SmartStack Helping you build SOA

Page 105: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Why do I need SOA?

What are you trying to sell me?

+ The definitive way to scale your architecture
+ Allow different people to work on different code without stepping on toes
+ Separate deployment schedules
+ Separate machine and data requirements
+ Fail separately -- so you can have graceful degradation

Page 106: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

How SOA happens

When customers love a service very, very much...

Page 112: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Here’s how it ends up

A certain kind of fun

Page 113: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

To sum up

1. Services help you scale
2. SOA is an architecture style designed around services
3. A SOA is hard to manage
4. SmartStack makes managing SOA a breeze

Page 114: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

What is SmartStack? And how does it help?

Page 115: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

1 Service(s) you want to deliver

2 Zookeeper registry to track everything

3 Nerve checks health and updates Zookeeper

4 Synapse routes between services

[Diagram: SERVICE, ZOOKEEPER, NERVE, SYNAPSE]
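
The four pieces above can be sketched as a toy, in-memory model. This is not the actual nerve or synapse code and it does not talk to a real Zookeeper: the `Registry` class and the `nerve_tick` and `synapse_backends` functions are hypothetical names chosen only to illustrate the flow.

```python
from typing import Callable, Dict, List, Tuple

Backend = Tuple[str, int]  # (host, port)


class Registry:
    """Hypothetical in-memory stand-in for the Zookeeper registry."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Backend] = {}

    def register(self, path: str, backend: Backend) -> None:
        self._nodes[path] = backend

    def unregister(self, path: str) -> None:
        # Removing a path that is not registered is a no-op.
        self._nodes.pop(path, None)

    def children(self, prefix: str) -> List[Backend]:
        # Return backends under a prefix, ordered by path for determinism.
        return [b for p, b in sorted(self._nodes.items()) if p.startswith(prefix)]


def nerve_tick(
    registry: Registry,
    path: str,
    backend: Backend,
    is_healthy: Callable[[], bool],
) -> None:
    """One nerve-style pass: check the local service, then update the registry."""
    if is_healthy():
        registry.register(path, backend)
    else:
        registry.unregister(path)


def synapse_backends(registry: Registry, env: str, service: str) -> List[Backend]:
    """Synapse-style lookup: list the currently registered backends of a service."""
    return registry.children(f"/{env}/{service}/services/")
```

A consumer never asks for a particular machine by name; it asks for "whatever is currently registered for this service", which is what lets instances come and go freely.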

Page 116: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

[Diagram: MONORAIL (NERVE, SYNAPSE), MOBILE WEB (NERVE, SYNAPSE), ZOOKEEPER]

ZOOKEEPER

+ /production/monorail/services/i-1234567 => {'host': 1.2.3.4, 'port': 5678}

+ /production/mobile_web/services/i-0abcdef => {'host': 5.6.7.8, 'port': 5678}
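
As a small illustration of the registry layout on this slide, the sketch below splits a path into its parts. It assumes the layout is always `/<environment>/<service>/services/<instance-id>`, which is inferred from the two entries shown; `RegistryKey` and `parse_registry_path` are hypothetical names, not part of SmartStack.

```python
from typing import NamedTuple


class RegistryKey(NamedTuple):
    environment: str
    service: str
    instance_id: str


def parse_registry_path(path: str) -> RegistryKey:
    """Split a registry path of the assumed form /<env>/<service>/services/<id>."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[2] != "services":
        raise ValueError(f"unexpected registry path: {path!r}")
    environment, service, _, instance_id = parts
    return RegistryKey(environment, service, instance_id)
```

Note that the last path component is the instance ID, not a hand-picked hostname.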

Page 117: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

haproxy

At the core of synapse

We get myriad benefits from haproxy

+ Stable and well-tested

+ Performs in-process connectivity checks

+ Great introspection and logging

+ Lots of load-balancing algorithms (RR, least-conn)

+ Somewhat dynamically reconfigurable (stats socket)
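
A minimal sketch of how a list of discovered backends could be turned into an haproxy `listen` section. The function name `render_listen_section`, the local port, and the exact layout are assumptions for illustration; the slides do not show the configuration that synapse actually writes. The directives used (`listen`, `bind`, `balance`, `server ... check`) are standard haproxy configuration keywords.

```python
from typing import Iterable, Tuple


def render_listen_section(
    service: str,
    local_port: int,
    backends: Iterable[Tuple[str, str, int]],
    balance: str = "roundrobin",
) -> str:
    """Render one haproxy `listen` block for a service.

    `backends` is an iterable of (instance_id, host, port) tuples.
    """
    lines = [
        f"listen {service}",
        f"    bind 127.0.0.1:{local_port}",
        f"    balance {balance}",
    ]
    for instance_id, host, port in backends:
        # `check` turns on haproxy's own connectivity checks for this server.
        lines.append(f"    server {instance_id} {host}:{port} check")
    return "\n".join(lines) + "\n"
```

Switching from round-robin to least-connections is then a one-argument change, for example `balance="leastconn"`.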

Page 118: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

To Recap

SmartStack in action

Page 119: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Why SmartStack?

Introspection

Abstraction and DRY

Distributed by design

Automatic failure detection

Page 120: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Abstraction

+ The same code in the same language is always doing discovery/registration

+ Your application doesn’t know about nerve/synapse -- it only knows about its dependencies

+ Always consistent across your infrastructure
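
To make "it only knows about its dependencies" concrete, here is a hedged sketch of what application code might look like when each dependency is reached through a local port. The port table and the `dependency_url` helper are hypothetical, and the assumption that every dependency is exposed on 127.0.0.1 is ours rather than something stated on the slide.

```python
# Hypothetical mapping of dependency name to the local port it is exposed on.
LOCAL_PORTS = {"monorail": 3212, "mobile_web": 3213}


def dependency_url(name: str, path: str = "/") -> str:
    """Build the URL the application uses to reach a dependency locally."""
    port = LOCAL_PORTS[name]
    return f"http://127.0.0.1:{port}{path}"
```

The application never sees an instance ID or a remote address; which backends sit behind the local port can change without any code change.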

Page 121: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Automatic Failure Handling

You don’t have to wake up

+ Bad backends are automatically taken out of rotation

+ Useful during both problems and routine maintenance/deploys

+ Push-based => very rapid detection; avoid those little blips

+ haproxy even routes around network partitions!
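
One common way to take a bad backend out of rotation without flapping is to require several consecutive failed checks before removing it and several consecutive passes before restoring it. The sketch below illustrates that idea only; `HealthTracker` and its `fall` and `rise` thresholds are hypothetical and are not taken from nerve, synapse, or the slides.

```python
class HealthTracker:
    """Track whether one backend should be in rotation.

    The backend leaves rotation after `fall` consecutive failed checks and
    returns after `rise` consecutive passed checks.
    """

    def __init__(self, fall: int = 2, rise: int = 2) -> None:
        self.fall = fall
        self.rise = rise
        self.in_rotation = True
        self._streak = 0  # consecutive results that disagree with the current state

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the resulting rotation state."""
        if check_passed == self.in_rotation:
            # The result agrees with the current state, so reset the streak.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.rise if check_passed else self.fall
            if self._streak >= threshold:
                self.in_rotation = check_passed
                self._streak = 0
        return self.in_rotation
```

A single failed check followed by a pass leaves the backend in rotation, which avoids pulling a server over one dropped request.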

Page 122: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Introspection

See what’s REALLY going on

Leverage the power of haproxy

+ status page that lets you see local state

+ lots of available integrations to gather global state

+ world-class logging for large-scale analysis

Page 123: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Distributed by Design

No central point of failure

+ Traffic flows directly between boxes -- no routing layer

+ Even if SmartStack is stopped or broken, haproxy keeps traffic flowing

+ Zookeeper helps to avoid common pitfalls (like different backends in different network segments)

Page 124: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

The Impact

How SmartStack has changed Airbnb

[Statistics graphic. Labels: Services using SmartStack, Requests per second, LOC deleted, Engineers using SmartStack. Values as extracted, pairing not recoverable: 100+, 2K, 3K, 30]

Page 125: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Our engineers love SmartStack

Ben: “SmartStack is great! It helped me to discover services – and quit smoking”

Phillippe: “Distributed computing? And all this time I thought everything was running on one machine”

Spike: “Nerve and Synapse have greatly simplified my life as an application developer, and have enabled me to launch our first Node.js services with very little ops overhead.”

Barbara: “I love it!”

Sean: “SmartStack has made deployment of new Java services a matter of beer and 20 lines of Ruby”

Page 126: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Future Direction

Is this project, like, done...?

1 Better resiliency: more graceful handling of zookeeper edge cases

2 Better testing: improve on the current integration test suite

3 Dynamic registration: for services running on Mesos et al.

4 A push API for nerve: allow services to communicate coming downtime

5 An auto-scaling layer: use nerve information to determine load levels

Page 127: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

I’m sold! How do I get started?

Page 128: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Getting Started

1 install Vagrant

2 git clone https://github.com/airbnb/smartstack-cookbook.git

3 vagrant up

Page 129: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Where is the code?

https://github.com/airbnb/nerve.git

https://github.com/airbnb/synapse.git

Page 130: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

AWS re:Invent Pub Crawl

Join the AWS Startup Team this evening at the AWS Pub Crawl

When: Wednesday November 13, 5:30pm - 7:30pm

Where: Canaletto at The Venetian, 2nd Floor

Who Will Be There: Startups, the AWS Startup Team, Startup Launch Companies, and AWS re:Invent Hackathon winners

Page 131: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Startup Spotlight Sessions with Dr. Werner Vogels

Thurs. Nov 14, Marcello Room 4406

SPOT 203 – Fireside Chats – Startup Founders, 1:30-2:30pm
– Eliot Horowitz, CTO of MongoDB
– Jeff Lawson, CEO of Twilio
– Valentino Volonghi, Chief Architect of AdRoll

SPOT 204 – Fireside Chats – Startup Influencers, 3:00-4:00pm
– Albert Wenger, Managing Partner at Union Square Ventures
– David Cohen, Founder and CEO of TechStars

SPOT 101 – Startup Launches, 4:15-5:15pm
– 5 companies powered by AWS launching at AWS re:Invent 2013

Page 132: Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.