Transcript
Page 1: Managing RightScale on RightScale

Managing RightScale on RightScale

Rafael H. Saavedra

VP of Engineering

Page 2: Managing RightScale on RightScale

Topics

• RightScale managed by RightScale

• Meta, production, staging & development

• An overview of the production system

• Quis Custodiet Ipsos Custodes

• Deploying RightScale – best practices

• What we love about using RightScale

• Features that are difficult to use

Page 3: Managing RightScale on RightScale

RightScale Production

RightScale: Cloud Management Platform

Customer A, Customer B, Customer C, Customer D

Page 4: Managing RightScale on RightScale

RightScale Meta Production

RightScale Production

RightScale: Cloud Management Platform

RightScale Staging

RightScale Development

Customer A, Customer D

Page 5: Managing RightScale on RightScale

A multitude of RightScale systems

• Meta Production currently lives outside the cloud
  • Used only to manage the production system
  • Only RightScale ops accounts

• Production: my.rightscale.com
  • Reaching 200 servers, a large fraction in EC2 us-east
  • Servers in every cloud to achieve high availability
  • Servers allocated in well-defined availability zones

• A few staging systems used for integration and QA
  • Ad hoc systems for performance testing, demos, betas

• Many development systems with simplified configurations
  • A development system at the click of a button

Page 6: Managing RightScale on RightScale

Significant increase in cloud usage

[Figure: two charts of EC2 usage by month, November 2008 through October 2010]

Page 7: Managing RightScale on RightScale

Some interesting RightScale numbers

• 1.65M servers launched by RightScale

• RightScale continuously monitors more than 60k servers

• Every day at RightScale:
  • 2,000 array resize actions are executed
  • 35,000 alert escalations are triggered
  • 20,000 escalation emails are sent to users
  • 9.0TB of monitoring data is exchanged with our servers
  • 1.6TB of logging data is sent to our servers
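To put the daily figures above in per-second terms, the sketch below converts them to average rates. It assumes decimal units (1 TB = 1,000,000 MB) and a load spread evenly over 24 hours; the slide states neither, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily figures quoted on the slide.
DAILY_COUNTS = {
    "array resize actions": 2_000,
    "alert escalations": 35_000,
    "escalation emails": 20_000,
}
DAILY_TERABYTES = {"monitoring data": 9.0, "logging data": 1.6}


def per_second(daily_total: float) -> float:
    """Average rate per second, assuming an even spread over the day."""
    return daily_total / SECONDS_PER_DAY


def megabytes_per_second(terabytes_per_day: float) -> float:
    """Average MB/s, assuming decimal units (1 TB = 1,000,000 MB)."""
    return per_second(terabytes_per_day * 1_000_000)


if __name__ == "__main__":
    for name, count in DAILY_COUNTS.items():
        print(f"{name}: {per_second(count):.3f} per second on average")
    for name, terabytes in DAILY_TERABYTES.items():
        print(f"{name}: {megabytes_per_second(terabytes):.1f} MB/s on average")
```

Under these assumptions the monitoring stream averages roughly 104 MB/s; real traffic is likely burstier than an even spread suggests.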

Page 8: Managing RightScale on RightScale

RightScale production – simplified

[Diagram labels: Front Ends, Main App, dashboard, API, others, daemons, databases, DB Master, DB Slave, mirrors, logging, monitoring]

Page 9: Managing RightScale on RightScale

What is it that our users do?

• Dashboard, API, monitoring graphs & event notifications

• Most of the requests are monitoring updates: 85% of requests (70% of bandwidth)

• Dashboard and API represent 7% of requests but 26% of traffic

Distribution by Requests: Monitoring 85%, Notifications 8%, API 6%, Dashboard 1%

Distribution by Bandwidth: Monitoring 70%, Notifications 4%, API 15%, Dashboard 11%
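The gap between the two distributions can be made concrete by dividing each category's bandwidth share by its request share. The sketch below does that with the percentages from the slide; the "relative request size" measure is an illustration added here, not something the talk defines.

```python
# Shares taken from the two charts, in percent.
REQUEST_SHARE = {"Monitoring": 85, "Notifications": 8, "API": 6, "Dashboard": 1}
BANDWIDTH_SHARE = {"Monitoring": 70, "Notifications": 4, "API": 15, "Dashboard": 11}


def relative_request_size(category: str) -> float:
    """Bandwidth share divided by request share.

    1.0 means requests of average size; larger values mean
    heavier-than-average requests.
    """
    return BANDWIDTH_SHARE[category] / REQUEST_SHARE[category]


if __name__ == "__main__":
    for category in REQUEST_SHARE:
        print(f"{category}: {relative_request_size(category):.2f}x average size")
```

By this measure a dashboard request carries about 11 times the average payload and an API request 2.5 times, while monitoring updates are slightly smaller than average.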

Page 10: Managing RightScale on RightScale

We eat our own dog food

• Production servers organized into independent deployments

• Core servers: frontends, core/api servers, databases, daemons

Page 11: Managing RightScale on RightScale

We eat our own dog food

• Extensive use of security groups to isolate servers

• ServerTemplates are maintained for each major release
  • Ability to launch exact configurations of past versions

Page 12: Managing RightScale on RightScale

Monitoring, alerts & escalations

• Monitor as much as possible, pick out what is relevant, and display it in insightful ways

• The need to quickly detect patterns and abnormalities

• Proactively eliminate the conditions that raise critical alerts
  • No broken windows policy

[Graphs: APIs, Cores]

Page 13: Managing RightScale on RightScale

Quis Custodiet Ipsos Custodes?*

• The need to monitor the monitoring and alerting systems

• Extensive use of alerts to monitor the responsiveness of all the RightScale servers

• Instance and EBS failures give us headaches

• Decoupling the meta & production monitoring and alerting systems

* Who watches the watchmen?
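One common way to watch the watchers is a heartbeat check that runs on a separate system and raises a flag when the monitoring or alerting service goes quiet. The sketch below illustrates that idea only; it is not RightScale's implementation, and the class, the system names, and the 60-second limit are all hypothetical.

```python
import time
from typing import Callable, Dict, List


class HeartbeatWatchdog:
    """Flags systems whose last heartbeat is older than max_age seconds."""

    def __init__(self, max_age: float, clock: Callable[[], float] = time.monotonic):
        self.max_age = max_age
        self.clock = clock  # injectable so the check can be tested without waiting
        self.last_seen: Dict[str, float] = {}

    def beat(self, name: str) -> None:
        """Record that `name` reported in just now."""
        self.last_seen[name] = self.clock()

    def stale(self) -> List[str]:
        """Return the names that have been silent for longer than max_age."""
        now = self.clock()
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.max_age
        )


if __name__ == "__main__":
    fake_now = [0.0]  # hypothetical clock for the demonstration
    watchdog = HeartbeatWatchdog(max_age=60, clock=lambda: fake_now[0])
    watchdog.beat("production-monitoring")
    watchdog.beat("production-alerting")
    fake_now[0] = 45.0
    watchdog.beat("production-alerting")
    fake_now[0] = 90.0
    print(watchdog.stale())
```

Running the watchdog outside the system it observes is what keeps a single failure from silencing both the monitor and the check on the monitor.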

Page 14: Managing RightScale on RightScale

How to monitor hundreds of servers?

• Starting to use stacked graphs & heat maps

• The need to quickly detect patterns and abnormalities
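Graphs help a person spot the odd server out; the same question can also be asked numerically. The sketch below flags servers whose metric sits far from the fleet median, using a modified z-score based on the median absolute deviation. This is a standard statistical technique swapped in for illustration, not something the slides describe, and the server names and the 3.5 threshold are assumptions.

```python
from statistics import median
from typing import Dict, List


def outliers(values: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return servers whose value is far from the fleet median.

    Uses the modified z-score: 0.6745 * |x - median| / MAD.
    """
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        # At least half the fleet sits exactly on the median, so the
        # score is undefined; report nothing rather than divide by zero.
        return []
    return sorted(
        name for name, v in values.items()
        if 0.6745 * abs(v - center) / mad > threshold
    )


if __name__ == "__main__":
    # Hypothetical CPU utilisation (percent) for five front ends.
    cpu = {"fe-1": 20, "fe-2": 22, "fe-3": 21, "fe-4": 19, "fe-5": 95}
    print(outliers(cpu))
```

A median-based score is used here because a single runaway server would distort a mean and standard deviation computed over the same fleet.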

Page 15: Managing RightScale on RightScale

Our favorite RightScale features

• RightImages: never again the need to build custom images

• Input inheritance: makes it easy to keep the configurations of dozens of servers in sync

• ServerTemplates: very easy to reproduce configurations in production, staging and development

• The Library: there is always an example of something new that can be adapted to our needs

• Monitoring: easy to write collectd plugins to monitor just about anything
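As an illustration of the last point, a custom collectd metric can be as small as a script run by collectd's exec plugin that prints PUTVAL lines to standard output. The sketch below is a minimal example of that approach, not one of RightScale's own plugins; the plugin name `example` and the metric `load_1min` are hypothetical.

```python
import os
import time
from typing import Optional


def putval_line(host: str, plugin: str, type_instance: str,
                value: float, interval: int) -> str:
    """Format one gauge reading in collectd's plain-text PUTVAL syntax."""
    identifier = f"{host}/{plugin}/gauge-{type_instance}"
    return f'PUTVAL "{identifier}" interval={interval} N:{value}'


def read_load_1min() -> float:
    """Example metric: the 1-minute load average (Unix only)."""
    return os.getloadavg()[0]


def run(iterations: Optional[int] = None) -> None:
    """Emit one reading per interval; loop forever when iterations is None."""
    # The exec plugin exports these two variables to the script.
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = int(float(os.environ.get("COLLECTD_INTERVAL", "10")))
    count = 0
    while iterations is None or count < iterations:
        line = putval_line(host, "example", "load_1min", read_load_1min(), interval)
        print(line, flush=True)  # flush so collectd sees each line promptly
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
```

To use it as a plugin, the script would end with a call to `run()` and be registered in the exec plugin's configuration; `run(iterations=1)` prints a single reading for a quick manual check.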

Page 16: Managing RightScale on RightScale

Our not so favorite features

• ServerTemplate inputs: powerful, but too many of them make templates difficult to use

• Revision management: still a way to go in making users aware of new revisions and versions and how to update

• The Library: checking out new resources from the Library is not easy

• Alerts: they work pretty well but are not easy to configure, custom ones in particular

Page 17: Managing RightScale on RightScale

Best practices: upgrading RightScale

• Avoid upgrading existing servers; instead launch fresh ones with new software (fail forward)
  • Not possible on some components, e.g. monitoring servers, which are in the hundreds

• The cost of duplicating servers is minimal

• Old servers can take over in case something goes wrong

• Launch additional slaves to capture recovery points
  • One slave continues to replicate in case of master failure
  • Another slave is frozen at upgrade point – can roll back by failing over
  • Don’t forget to take snapshots in case of major failure
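The recovery points listed above amount to a small decision table: each kind of failure maps to the copy of the data prepared for it. The sketch below writes that table out. The failure categories and their wording are an interpretation of the slide, not a documented RightScale procedure, and "major failure" is assumed here to mean that the master and both slaves are unusable.

```python
from enum import Enum


class Failure(Enum):
    MASTER_FAILURE = "master database fails"
    BAD_UPGRADE = "upgrade must be rolled back"
    MAJOR_FAILURE = "master and slaves are all unusable"  # assumed meaning


# Recovery point prepared for each failure, following the slide's bullets.
RECOVERY = {
    Failure.MASTER_FAILURE: "promote the slave that kept replicating",
    Failure.BAD_UPGRADE: (
        "fail over to the slave frozen at the upgrade point "
        "and let the old servers take over"
    ),
    Failure.MAJOR_FAILURE: "restore the database from the snapshot",
}


def recovery_action(failure: Failure) -> str:
    """Look up the prepared recovery path for a given failure."""
    return RECOVERY[failure]


if __name__ == "__main__":
    for failure in Failure:
        print(f"{failure.value}: {recovery_action(failure)}")
```

Writing the table down before an upgrade makes it easy to confirm that every failure mode has a recovery point in place before access to the site is cut.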

Page 18: Managing RightScale on RightScale

Upgrading RightScale: Step by Step

[Diagram: Front Ends; two Main App groups (servers with old code, servers with new code); Databases: DB Master and two DB Slaves]

Annotations: take snapshot at cutoff; stop replication; cut access to site; stop all access to DB

Page 19: Managing RightScale on RightScale

Upgrading RightScale: Step by Step

[Diagram: Front Ends; two Main App groups (servers with old code, servers with new code); Databases: DB Master and two DB Slaves]

Annotations: reconnect all servers; snapshot at cutoff; stop replication

Page 20: Managing RightScale on RightScale

Upgrading RightScale: Step by Step

[Diagram: Front Ends; two Main App groups (servers with old code, servers with new code); Databases: DB Master and two DB Slaves]

Annotations: open access to site; reconnect all servers
