more nines for your dimes: improving availability and lowering costs using auto scaling and amazon...

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

More Nines for Your Dimes: Improving Availability and Lowering Costs using Auto Scaling

Ran Tessler, AWS Solutions Architecture

September 17, 2014

Topics We’ll Cover Today

AWS
• Auto Scaling introduction
• Console demo

The Weather Channel
• Maintaining application response times and fleet utilization
• Handling cyclical demand, unexpected “weather events”

Nokia
• Auto Scaling for 99.9% Uptime
• Single-instance groups

Dreambox
• The opportunity cost of NOT scaling
• Auto Scaling to reduce costs

Ways You Can Use Auto Scaling

• Launch EC2 instances and groups from reusable templates
• Scale up and down as needed automatically
• Auto-replace instances and maintain EC2 capacity

Launch Configurations

Auto Scaling Groups

Auto Scaling Policies

Demo

Learn the new terms:

Launch Configuration

Auto Scaling Group

Scaling Policy

Amazon CloudWatch Alarm

Amazon SNS Notification
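
One way to see how these terms fit together: a launch configuration is the instance template, a group holds the minimum, maximum, and desired capacity, and an alarm triggers a scaling policy that changes the desired capacity. The following is a minimal Python sketch of that relationship, not the AWS API. All class names, field names, the AMI ID, and the numbers are hypothetical, and the SNS notification is not modeled.

```python
from dataclasses import dataclass


@dataclass
class LaunchConfiguration:
    """Reusable instance template (hypothetical fields)."""
    name: str
    image_id: str
    instance_type: str


@dataclass
class ScalingPolicy:
    """Change in capacity applied when the linked alarm fires."""
    name: str
    adjustment: int  # e.g. +2 to scale out, -1 to scale in


@dataclass
class AutoScalingGroup:
    """Group of instances kept between min_size and max_size."""
    name: str
    launch_configuration: LaunchConfiguration
    min_size: int
    max_size: int
    desired_capacity: int

    def apply(self, policy: ScalingPolicy) -> int:
        """Apply a policy, clamping the result to the group's bounds."""
        target = self.desired_capacity + policy.adjustment
        self.desired_capacity = max(self.min_size, min(self.max_size, target))
        return self.desired_capacity


def alarm_breached(metric_value: float, threshold: float) -> bool:
    """Stand-in for a CloudWatch alarm: true when the metric exceeds the threshold."""
    return metric_value > threshold


lc = LaunchConfiguration("web-lc", "ami-00000000", "m3.medium")
group = AutoScalingGroup("web-asg", lc, min_size=2, max_size=6, desired_capacity=2)
scale_out = ScalingPolicy("scale-out-on-cpu", adjustment=2)

# Average CPU of 82% against a 70% threshold breaches the alarm.
if alarm_breached(metric_value=82.0, threshold=70.0):
    group.apply(scale_out)  # desired capacity goes from 2 to 4
```

Clamping to the group's bounds is the point of the sketch: a policy can fire repeatedly, but the group never drops below its minimum or grows past its maximum.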

What’s New in Auto Scaling

Better integration
• EC2 console support
• Scheduled scaling policies in CloudFormation templates
• ELB connection draining
• Auto-assign public IPs in VPC
• Spot + Auto Scaling

More APIs
• Create groups based on running instances
• Create launch configurations based on running instances
• Attach or detach running instances from a group
• Perform lifecycle actions on group instances
• Place instances in standby state for troubleshooting

Why Auto Scaling?

Scale Up

Control Costs

Improve Availability

The Weather Company
• Top 30 web property in the U.S.
• 2nd most viewed television channel in the U.S.
• 85% of U.S. airlines depend on our forecasts
• Major retailers base marketing spend and store displays on our forecasts
• 163 million unique visitors across TV and web

Wunderground Radar and Maps
• 100 million hits a day
• One billion data points per day
• 30,000 personal weather stations
• Migrated the wunderground.com real-time radar mapping system to the AWS Cloud

Source: Wunderground, Inc. 2013

Why Auto Scaling?

Hurricane Sandy

Before Migration: Traditional IT Model Doesn’t Scale Well
• Server count: 110 servers
• Avg. CPU load (chart)
• HTTP response latency: ~6000 ms

After Migration: Wunderground Radar App
• Server count: from 110 to 170 instances
• Avg. CPU load (chart)
• HTTP response latency: 5-15 ms

Radar on AWS: Auto Scaling Architecture

Radar on AWS: CPU Utilization

Radar on AWS: Host Count

Radar on AWS: Scale up to ensure consistent performance during periods of high demand

Why Auto Scaling?


Auto Scaling for 99.9% Uptime

Here.com Local Search Application

• Local Search app
• First customer-facing application on AWS
• Obvious need for uptime

Here.com Local Search Architecture

Regions: US-East-1, US-West-2, EU-West-1, AP-Southeast-1

US-East-1a
• Zookeeper1, Zookeeper2, Zookeeper3
• Frontend Group
• Backend Groups

US-East-1b
• Zookeeper1, Zookeeper2, Zookeeper3
• Frontend Group
• Backend Groups


Single-Instance Auto Scaling Groups (Zookeeper)

1. Auto-healing: Instances auto-register in DNS via Route 53
2. Dynamic: Auto Scaling group names are used for cluster-node lookups (cluster1-zookeeper1)
3. Used standard tools such as DNS instead of queries or Elastic IPs
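
The slides do not show how the instances register themselves, so the following is only a sketch of one possible approach: derive a stable hostname from the Auto Scaling group name and build a Route 53 change batch that points it at the replacement instance's IP. The zone name `example.internal`, the IP address, the TTL, and both helper names are assumptions made for illustration.

```python
def node_hostname(group_name: str, zone: str) -> str:
    """Build a fully qualified DNS name from an Auto Scaling group name."""
    return f"{group_name}.{zone.rstrip('.')}."


def upsert_change_batch(hostname: str, private_ip: str, ttl: int = 60) -> dict:
    """Build a Route 53 change batch that points hostname at private_ip.

    UPSERT creates the record if it is missing and overwrites it otherwise,
    so a replacement instance takes over the name of the one it replaces.
    """
    return {
        "Comment": "auto-registered by instance at boot",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": hostname,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": private_ip}],
                },
            }
        ],
    }


# Group name taken from the slide; zone and IP are hypothetical.
hostname = node_hostname("cluster1-zookeeper1", "example.internal")
batch = upsert_change_batch(hostname, "10.0.1.25")
```

A boot script could pass `batch` as the `ChangeBatch` argument of Route 53's ChangeResourceRecordSets call. Because each group holds exactly one instance, the group name maps to exactly one record, and clients can look up cluster nodes by name without querying the EC2 API.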

Here.com Local Search Success

• Increased uptime to 99.9%
• All instances with detected health problems have been replaced by Auto Scaling with zero intervention
• Zookeeper setup has performed flawlessly

“We’ve been paranoid so it still pages us; it’s beginning to feel silly.”
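
To put the 99.9% figure in concrete terms, the downtime budget it leaves can be worked out directly. A small sketch follows; the function name and the 30-day month are choices made here for illustration, not figures from the talk.

```python
def allowed_downtime_minutes(availability_pct: float, period_hours: float) -> float:
    """Minutes of downtime permitted in a period at a given availability."""
    return (1 - availability_pct / 100) * period_hours * 60


HOURS_PER_MONTH = 30 * 24   # assumes a 30-day month
HOURS_PER_YEAR = 365 * 24

per_month = allowed_downtime_minutes(99.9, HOURS_PER_MONTH)  # about 43 minutes
per_year = allowed_downtime_minutes(99.9, HOURS_PER_YEAR)    # about 8.8 hours

print(f"99.9% uptime allows {per_month:.1f} min/month, {per_year / 60:.2f} h/year")
```

With a budget this small, replacing unhealthy instances automatically matters more than it might first appear: a single manual recovery can consume most of a month's allowance.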

Why Auto Scaling?


A little background on our application

• Ruby on Rails
• Unicorn
• We teach kids math!

A workload well suited for auto scaling

The opportunity cost of NOT scaling
• Our usage curve from 3/20
• Low of about 5 concurrent users
• High of about 10,000 concurrent users

The opportunity cost of NOT scaling
• No autoscaling
• 672 instance hours
• $302.40 at on-demand prices

The opportunity cost of NOT scaling
• Autoscaling four times per day
• 360 instance hours
• $162 at on-demand prices
• 46% savings vs no autoscaling

The opportunity cost of NOT scaling
• Autoscaling as needed, twelve times per day
• 272 instance hours
• $122.40 at on-demand prices
• 24% savings vs scaling 4 times per day
• 60% savings vs no autoscaling

The opportunity cost of NOT scaling

$302/day

$162/day

$122/day
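
The daily figures above can be reproduced from the instance hours alone. In this sketch the hourly rate is not stated on the slides; it is derived here as $302.40 / 672 hours, which works out to $0.45 per instance hour. The function names are illustrative.

```python
def daily_cost(instance_hours: float, hourly_rate: float) -> float:
    """On-demand cost for one day of instance hours."""
    return instance_hours * hourly_rate


def savings_pct(baseline_cost: float, new_cost: float) -> float:
    """Percentage saved relative to the baseline cost."""
    return (1 - new_cost / baseline_cost) * 100


# Derived from the slide figures, not quoted from them: $0.45 per instance hour.
HOURLY_RATE = 302.40 / 672

no_scaling = daily_cost(672, HOURLY_RATE)     # fixed fleet, all day
four_per_day = daily_cost(360, HOURLY_RATE)   # scaling four times per day
twelve_per_day = daily_cost(272, HOURLY_RATE) # scaling twelve times per day

for label, cost in [("no scaling", no_scaling),
                    ("4x per day", four_per_day),
                    ("12x per day", twelve_per_day)]:
    print(f"{label:12s} ${cost:7.2f}/day  "
          f"{savings_pct(no_scaling, cost):5.1f}% saved vs no scaling")
```

The same helper gives the step-to-step comparison: `savings_pct(four_per_day, twelve_per_day)` is about 24%, matching the slide.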

Demand curve hugs the usage curve…

“Auto Scaling saves us a lot of money; with a little bit of math, flexibility of AWS allows us to further save by aligning our demand curve with usage curve.” -- Dreambox

Why Auto Scaling?


Key Takeaways

The Weather Channel
• Maintaining application response times and fleet utilization
• Scaling up and handling unexpected “weather events”

Nokia
• Auto Scaling for 99.9% Uptime
• Single-instance groups

Dreambox
• The opportunity cost of NOT scaling
• Auto Scaling to reduce costs

Common Scenarios

• Prepare for a Big Launch: Schedule a one-time scale out and flip to production
• Fit Capacity to Demand: Follow daily, weekly, or monthly cycles
• Be Ready for Spikes: Provision capacity dynamically by scaling on CPU, memory, request rate, queue depth, users, etc.
• Simplify Cost Allocation: Auto-tag instances with cost center, project, version, stage
• Maintain Stable Capacity: Auto-replace instances that fail ELB or EC2 checks
• Go Multi-AZ: Auto-balance instances across multiple zones
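
As a rough illustration of fitting capacity to demand, the sketch below turns a series of concurrent-user counts into a per-hour instance plan, bounded by a group minimum and maximum. The per-instance capacity of 400 users, the sample demand values, and the group bounds are all hypothetical; only the low of about 5 and the high of about 10,000 concurrent users echo numbers from the talk.

```python
import math


def instances_needed(concurrent_users: int, users_per_instance: int,
                     min_size: int, max_size: int) -> int:
    """Instances required for the load, clamped to the group's bounds."""
    needed = math.ceil(concurrent_users / users_per_instance)
    return max(min_size, min(max_size, needed))


USERS_PER_INSTANCE = 400  # hypothetical capacity of one instance

# Illustrative hourly samples spanning the low and high seen in the talk.
hourly_users = [5, 50, 2000, 10000, 6000, 300]

plan = [instances_needed(u, USERS_PER_INSTANCE, min_size=2, max_size=30)
        for u in hourly_users]

print(plan)       # instance count for each sampled hour
print(sum(plan))  # instance hours consumed across the sampled hours
```

Keeping a minimum of two instances even at near-zero load reflects the availability side of the trade-off: the floor is set by redundancy, while everything above it follows demand.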

Thank You!

Ran Tessler
