managing rightscale on rightscale

1 Managing RightScale on RightScale Rafael H. Saavedra VP of Engineering

Author: rightscale

Post on 20-Aug-2015






  1. 1. Managing RightScale on RightScale
    Rafael H. Saavedra
    VP of Engineering
  2. 2. Topics
    RightScale managed by RightScale
    Meta, production, staging & development
    An overview of the production system
    Deploying RightScale best practices
    What we love about using RightScale
    Features that are difficult to use
  3. 3. RightScale: Cloud Management Platform
    RightScale Production
    Customer A
    Customer D
    Customer B
    Customer C
  4. 4. RightScale: Cloud Management Platform
    RightScale Meta Production
    Customer A
    Customer D
  5. 5. A multitude of RightScale systems
    Meta Production currently lives outside the cloud
    Used only to manage the production system
    Only RightScale ops accounts
    Reaching 200 servers, a large fraction in EC2 us-east
    Servers in every cloud to achieve high availability
    Servers allocated in well-defined availability zones
    A few staging systems used for integration and QA
    Ad hoc systems for performance testing, demos, betas
    Many development systems with simplified configurations
    A development system at the click of a button
  6. 6. Significant increase in cloud usage
  7. 7. Some interesting RightScale numbers
    1.65M servers launched by RightScale
    RightScale continuously monitors more than 60k servers
    Every day at RightScale:
    2,000 array resize actions are executed
    35,000 alert escalations are triggered
    20,000 escalation emails are sent to users
    9.0TB of monitoring data is exchanged with our servers
    1.6TB of logging data is sent to our servers
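To put the daily figures above in perspective, the following sketch converts them into sustained rates. It is plain arithmetic on the numbers quoted on the slide; the only assumption is decimal units (1 TB = 1,000,000 MB), which the slide does not specify.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MINUTES_PER_DAY = 24 * 60       # 1,440

# Daily figures quoted on the slide.
daily_counts = {
    "array resize actions": 2_000,
    "alert escalations": 35_000,
    "escalation emails": 20_000,
}
daily_terabytes = {
    "monitoring data": 9.0,
    "logging data": 1.6,
}


def per_minute(count_per_day: float) -> float:
    """Average number of events per minute for a daily count."""
    return count_per_day / MINUTES_PER_DAY


def megabytes_per_second(tb_per_day: float) -> float:
    """Sustained throughput in MB/s, assuming 1 TB = 1,000,000 MB."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


for name, count in daily_counts.items():
    print(f"{name}: {per_minute(count):.1f} per minute")
for name, tb in daily_terabytes.items():
    print(f"{name}: {megabytes_per_second(tb):.1f} MB/s sustained")
```

Under that assumption, 9.0TB a day works out to roughly 104 MB/s of monitoring data around the clock, and 35,000 escalations a day to about 24 per minute.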
  8. 8. RightScale production simplified
    Main App
    Front Ends
    DB Master
    DB Slave
  9. 9. What is it that our users do?
    Dashboard, API, monitoring graphs & event notifications
    Most requests are monitoring updates: 85% of requests (70% of traffic)
    Dashboard and API represent 7% of requests but 26% of traffic
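The request and traffic shares above imply that an average dashboard or API request is much heavier than an average monitoring update. The sketch below makes that explicit. It assumes the parenthesized 70% on the slide is the monitoring share of traffic, by analogy with the dashboard/API line; the slide does not say so outright.

```python
def relative_request_size(traffic_share: float, request_share: float) -> float:
    """Average size of a request in this class relative to the overall average.

    A value above 1.0 means requests in this class are heavier than average.
    """
    return traffic_share / request_share


# Shares taken from the slide; the 0.70 figure is an assumed reading of "(70%)".
monitoring = relative_request_size(traffic_share=0.70, request_share=0.85)
dashboard_api = relative_request_size(traffic_share=0.26, request_share=0.07)

print(f"monitoring update: {monitoring:.2f}x the average request")
print(f"dashboard/API request: {dashboard_api:.2f}x the average request")
print(f"dashboard/API vs monitoring: {dashboard_api / monitoring:.1f}x")
```

With these numbers a dashboard or API request carries about 4.5 times the traffic of a monitoring update, which is why a small share of requests accounts for a quarter of the traffic.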
  10. 10. We eat our own dog food
    Production servers organized into independent deployments
    Core servers: frontends, core/api servers, databases, daemons
  11. 11. We eat our own dog food
    Extensive use of security groups to isolate servers
    ServerTemplates are maintained for each major release
    Ability to launch exact configurations of past versions
  12. 12. Monitoring, alerts & escalations
    Monitor as much as possible, focus on what is relevant, and display it in insightful ways
    The need to quickly detect patterns and abnormalities
    Proactively eliminate the conditions that raise critical alerts
    No broken windows policy
  13. 13. Quis Custodiet Ipsos Custodes?*
    The need to monitor the monitoring and alerting systems
    Extensive use of alerts to monitor the responsiveness of all the RightScale servers
    Instance and EBS failures give us headaches
    Decoupling the meta & production monitoring and alerting systems
    * Who watches the watchmen?
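One common way to watch the watchmen is a heartbeat check run from a separate system: the monitoring stack reports in periodically, and an independent watcher raises an alarm when it goes quiet. The sketch below illustrates that idea only. `HeartbeatWatchdog` and its fields are hypothetical names, not part of RightScale, and the slides do not describe how the meta system actually checks production.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatWatchdog:
    """Runs outside the monitored system and flags it when heartbeats stop."""

    max_silence_seconds: float
    last_heartbeat: Optional[float] = None

    def record_heartbeat(self, now: float) -> None:
        """Called whenever the monitoring system reports that it is alive."""
        self.last_heartbeat = now

    def is_stale(self, now: float) -> bool:
        """True if no heartbeat was ever seen or the last one is too old."""
        if self.last_heartbeat is None:
            return True
        return now - self.last_heartbeat > self.max_silence_seconds


# Timestamps are plain seconds so the example stays deterministic.
watchdog = HeartbeatWatchdog(max_silence_seconds=120)
watchdog.record_heartbeat(now=1000)
print(watchdog.is_stale(now=1060))  # 60 s of silence, within the limit
print(watchdog.is_stale(now=1200))  # 200 s of silence, over the limit
```

Keeping the watchdog in a system that shares no servers with the one it observes matches the decoupling of meta and production monitoring described above.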
  14. 14. How to monitor hundreds of servers?
    Starting to use stacked graphs & heat maps
    The need to quickly detect patterns and abnormalities
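A heat map scales to hundreds of servers because it plots how many servers fall into each value range per time slot instead of drawing one line per server. The sketch below builds such a grid from raw samples. The function name, the bucket count, and the sample CPU values are all illustrative assumptions; the slides do not show how RightScale computes its heat maps.

```python
def heat_map(samples, num_buckets, max_value):
    """Count servers per value bucket for each time slot.

    samples: list of time slots, each a list of per-server values
             (for example CPU utilization in percent).
    Returns grid[t][b] = number of servers in bucket b during slot t.
    """
    width = max_value / num_buckets
    grid = []
    for slot in samples:
        row = [0] * num_buckets
        for value in slot:
            # Clamp so that value == max_value lands in the last bucket.
            bucket = min(int(value / width), num_buckets - 1)
            row[bucket] += 1
        grid.append(row)
    return grid


# Two time slots, five servers each (made-up CPU percentages).
cpu_by_slot = [
    [5, 12, 48, 51, 97],
    [7, 15, 52, 95, 99],
]
for row in heat_map(cpu_by_slot, num_buckets=4, max_value=100):
    print(row)
```

Each row is one time slot and each column a 25-point CPU range, so a cell that suddenly fills up in the rightmost column is the kind of abnormality the slide refers to.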
  15. 15. Our favorite RightScale features
    RightImages: never again the need to build custom images
    Input inheritance: makes it easy to keep the configurations of dozens of servers in sync
    ServerTemplates: very easy to reproduce configurations in production, staging and development
    The Library: there is always an example of something new that can be adapted to our needs
    Monitoring: easy to write collectd plugins to monitor just about anything
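Input inheritance, mentioned above, can be pictured as a layered lookup in which a more specific level overrides a more general one. The sketch below models that with `collections.ChainMap`. The precedence order (server over deployment over template default), the input names, and the host name are assumptions made for illustration; this is not RightScale's API.

```python
from collections import ChainMap

# Hypothetical inputs at three levels, from most general to most specific.
template_defaults = {"DB_PORT": "3306", "LOG_LEVEL": "info", "APP_ENV": "development"}
deployment_inputs = {"APP_ENV": "production", "DB_HOST": "db-master.example.internal"}
server_overrides = {"LOG_LEVEL": "debug"}


def resolve_inputs(server, deployment, template):
    """Merge inputs so the most specific level wins for each key.

    ChainMap searches its maps left to right, so the server level is
    consulted first, then the deployment, then the template defaults.
    """
    return dict(ChainMap(server, deployment, template))


resolved = resolve_inputs(server_overrides, deployment_inputs, template_defaults)
for key in sorted(resolved):
    print(f"{key} = {resolved[key]}")
```

Changing `APP_ENV` or `DB_HOST` once at the deployment level updates every server that does not override it, which is how dozens of servers stay in sync.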
  16. 16. Our not so favorite features
    ServerTemplate inputs: powerful but too many of them make templates difficult to use
    Revision management: still a way to go in making users aware of new revisions and versions and how to update
    The Library: checking out new resources from the Library is not easy
    Alerts: they work pretty well but are not easy to configure, custom ones in particular
  17. 17. Best practices: upgrading RightScale
    Avoid upgrading existing servers; instead launch fresh ones with new software (fail forward)
    Not possible on some components, e.g. monitoring servers, which are in the hundreds
    The cost of duplicating servers is minimal
    Old servers can take over in case something goes wrong
    Launch additional slaves to capture recovery points
    One slave continues to replicate in case of master failure
    Another slave is frozen at the upgrade point; we can roll back by failing over to it
    Don't forget to take snapshots in case of major failure
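The two slave roles above can be illustrated with a toy model: one slave keeps replicating so it can replace a failed master, while the other stops at the upgrade point so it can serve as a rollback target. The classes below are a hypothetical in-memory sketch and do not reflect the actual database tooling used.

```python
class Database:
    def __init__(self, name):
        self.name = name
        self.rows = []
        self.replicating = True

    def apply(self, row):
        self.rows.append(row)


class Cluster:
    def __init__(self):
        self.master = Database("master")
        self.live_slave = Database("live-slave")
        self.frozen_slave = Database("frozen-slave")

    def write(self, row):
        """Write to the master and copy to every slave still replicating."""
        self.master.apply(row)
        for slave in (self.live_slave, self.frozen_slave):
            if slave.replicating:
                slave.apply(row)

    def freeze_recovery_point(self):
        """Stop replication on one slave so it keeps the pre-upgrade state."""
        self.frozen_slave.replicating = False

    def roll_back(self):
        """Fail over to the frozen slave, discarding post-upgrade writes."""
        self.master = self.frozen_slave
        return self.master


cluster = Cluster()
cluster.write("pre-upgrade order")
cluster.freeze_recovery_point()
cluster.write("post-upgrade order")

print(cluster.live_slave.rows)   # follows the master through the upgrade
print(cluster.roll_back().rows)  # state as of the upgrade point
```

The live slave holds both writes, while the frozen slave holds only the pre-upgrade one, so failing over to it undoes the upgrade at the cost of any writes made afterwards.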
  18. 18. Upgrading RightScale: Step by Step
    Diagram: Front Ends; two Main App groups (servers with old code, servers with new code); DB Master; two DB Slaves
    Labels shown: take snapshot at cutoff; stop replication; cut access to site; stop all access to DB
  19. 19. Upgrading RightScale: Step by Step
    Diagram: Front Ends; two Main App groups (servers with old code, servers with new code); DB Master; two DB Slaves
    Labels shown: snapshot at cutoff; reconnect all servers; stop replication
  20. 20. Upgrading RightScale: Step by Step
    Diagram: Front Ends; two Main App groups (servers with old code, servers with new code); DB Master; two DB Slaves
    Labels shown: reconnect all servers; open access to site
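The three step-by-step slides can be read as one ordered checklist. The sketch below runs such a checklist and stops at the first failing step, so the operator knows how far the upgrade got before deciding whether to roll back. The step names come from the diagram labels, but their order within a slide is an assumption, and `run_upgrade` and the `execute` callback are hypothetical.

```python
# Step names from the slides; the ordering is an assumed reading of the diagrams.
UPGRADE_STEPS = [
    "cut access to site",
    "stop all access to DB",
    "stop replication",
    "take snapshot at cutoff",
    "reconnect all servers",
    "open access to site",
]


def run_upgrade(steps, execute):
    """Run steps in order and stop at the first one that fails.

    execute: callable taking a step name and returning True on success.
    Returns (completed_steps, failed_step); failed_step is None on success.
    """
    completed = []
    for step in steps:
        if not execute(step):
            return completed, step
        completed.append(step)
    return completed, None


# Simulate a failure while reconnecting servers to the new code.
completed, failed = run_upgrade(
    UPGRADE_STEPS, lambda step: step != "reconnect all servers"
)
print("completed:", completed)
print("failed at:", failed)
```

Because the site is reopened only as the last step, a failure anywhere earlier leaves users locked out rather than exposed to a half-upgraded system, and the old servers can take over again.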