rightscale webinar: how rightscale architects its databases (for worldwide scale, ha and dr...

#rightscale
How RightScale Architects Its Databases
(for Worldwide Scale, HA and DR Scenarios)
January 30, 2013
Watch the recording of this webinar

# 2
Your Panel Today
Presenting
• Rafael H. Saavedra, VP Engineering, RightScale
• Josep Blanquer, Chief Architect, RightScale
Q&A
• Jared Marcell, Account Manager, RightScale
• David Manriquez, Account Manager, RightScale
Please use the “Questions” window to ask questions any time!

# 3
Menu
Intro
Data Taxonomy
Data Storage Design: Scale, HA and DR
Conclusion

# 4
Intro: Expectations and scope
What this is and what it is not
• IS a talk about:
  • How RightScale has designed and implemented its backing datastores
  • …for a few of the most representative internal systems
  • …with the rationale behind it
• Is NOT a talk about:
  • RightScale’s overall architecture
  • Nodes or hosts; it’s about systems
  • RightScale’s data modeling
Note: Most of the design is implemented and in production, but some of the most advanced pieces are still in beta or still being worked on.

# 5
Intro: Tools and Technologies
• RightScale uses a mix of RDBMS and NoSQL technologies:
  • MySQL, Cassandra and S3 (for backups and archiving)
• Transactionality:
  • MySQL: strong ACID properties
  • Cassandra: no Atomicity, eventually Consistent, some Isolation, Durable
• Availability:
  • MySQL: async replication, Master to N slaves or Master-Master
  • Cassandra: distributed, master-less, highly replicated (multi-DC)
• Sharding:
  • MySQL: no explicit inter-node tools (sharding done by the application)
  • Cassandra: partitions data internally across nodes

# 6
Glossary: Examples we will use
• Marketplace Assets (RightScripts, ServerTemplates): Configuration data objects that are user-generated, private or shared
• Tags: Resource data that drives automation and reporting
• Events: Data used to communicate recent events and news feeds to users
• Cloud Polling and Gateway: Data that records actions and states of external API-linked services
• Routing: Data used to locate and transport messages across instances and/or our services
• Monitoring: Infrastructure monitoring data recorded and presented on behalf of users

# 7
Taxonomy of RightScale’s Data
Representative systems with different data semantics:
• Global Objects, Marketplace Assets
• Dashboard Objects, Audits, Tags, Recent Events
• Cloud Polling Data
• Routing Data
• Monitoring/Syslog

# 8
Taxonomy of RightScale’s Data
Global Objects, Marketplace Assets
Common across accounts:
• Users
• Account Plans
• Settings
• MultiCloud Marketplace: Published Assets, Sharing Groups, …

# 9
Taxonomy of RightScale’s Data
Dashboard Objects, Audits, Tags, Recent Events
Private to each account:
• Deployments
• Imported assets
• Alert Specifications
• Server Inputs
• Audit
• Tags
• User Events
• …

# 10
Taxonomy of RightScale’s Data
Cloud Polling Data
Private to each account:
• Cloud resource states (cache)
• Cloud credentials

# 11
Taxonomy of RightScale’s Data
Routing Data
Private to each account:
• Instance agents location
• Core agents location
• Agent action registry
• …

# 12
Taxonomy of RightScale’s Data
Monitoring/Syslog
Private to each account:
• Collected metric data
• Collected syslog data
• …

# 13
Taxonomy of RightScale’s Data
Data scope and containment
Which data do we need?
• Data for all accounts
• Data for a single account
[Diagram: the five data categories placed along an X-acct / Account axis]
• X-acct: Data shared between accounts
• Account: Data required within scope of a single account

# 14
Taxonomy of RightScale’s Data
Data Placement
Who uses the data?
• Users through the Dash/API
• Instances from the Cloud
[Diagram: the five data categories placed along a Users / Instances axis]
• Users: Data close to the Users
• Instances: Data close to the Cloud

# 15
Taxonomy of RightScale’s Data
Who uses the data? Proximity to User vs. Cloud
Which data do we need? Scope of data available
[Diagram: the five data categories on a grid with axes Users / Instances and X-acct / Account]
• Close to cloud resources: Account-shardable* data
• Close to user: Account-shardable data
• Close to user: Globally accessible data

# 16
[Diagram: empty grid with axes Users / Instances and Account / X-Account]

# 17
[Diagram: the "global" store on the Users, X-Account side of the grid, with custom replication]
Global: MySQL
• ACID semantics
• Master-Slave replication
Custom replication
Why custom? More control
• Multiple sources
• Individual columns
• Apply transformations
• Smart re-sync features
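To make the "why custom" list concrete, here is a minimal sketch of a replication step that reads from multiple sources, copies only selected columns and applies a transformation on the way. It is an illustration only, not RightScale's replicator: the function `replicate`, the sample rows and the `lowercase_email` transformation are all hypothetical, and smart re-sync is not modeled.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def replicate(sources: Iterable[List[Row]],
              columns: List[str],
              transform: Callable[[Row], Row]) -> List[Row]:
    """Merge rows from several sources, keep selected columns, then transform."""
    replica: List[Row] = []
    for source in sources:                       # multiple sources
        for row in source:
            projected = {name: row[name] for name in columns}  # individual columns
            replica.append(transform(projected))                # apply transformations
    return replica


def lowercase_email(row: Row) -> Row:
    """Hypothetical transformation: normalize the email column."""
    return {**row, "email": str(row["email"]).lower()}


# Hypothetical source tables; the password_hash column is deliberately not replicated.
source_a = [{"id": 1, "email": "Ann@Example.com", "password_hash": "x1"}]
source_b = [{"id": 2, "email": "bob@example.com", "password_hash": "x2"}]

replica = replicate([source_a, source_b], ["id", "email"], lowercase_email)
print(replica)
```

Selecting columns explicitly means sensitive or irrelevant fields never leave the source, which is one kind of control that stock master-slave replication does not offer.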

# 18
[Diagram: the grid now holds the global store (X-Account) plus the dash, events, tags, audit and S3 stores (Account)]
Dashboard: MySQL
• ACID semantics
• Master to N slaves replication
• Slave reads
• Rows tagged by account
Other systems: Cassandra
• Simpler Key-Value access
• Great scalability
• Great replica control
• High write availability
• Time-to-live expiration as cache
• Rows tagged by account
Data archive: S3
• Low read rate
• Globally accessible
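The "time-to-live expiration as cache" point can be illustrated with a small in-memory sketch: each value is stored with an expiry time and silently disappears once that time passes. This is not the Cassandra API; the `TTLCache` class, the injectable clock and the account-prefixed key are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Key-value store whose entries expire after a per-entry time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry time)

    def put(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:   # expired: drop it and report a miss
            del self._data[key]
            return None
        return value


# A fake clock keeps the example deterministic.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])

# Rows are tagged by account, here through a hypothetical key prefix.
cache.put("account:42:event:1", "server launched", ttl_seconds=10)
fresh = cache.get("account:42:event:1")
now[0] = 10.0                              # the TTL has elapsed
stale = cache.get("account:42:event:1")
print(fresh, stale)
```

With expiry handled by the store, the application needs no cleanup job for short-lived data such as recent events.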

# 19
[Diagram: the Account-scoped dashboard stores (dash, events, tags, audit) shown duplicated next to the global and S3 stores]
So we can horizontally scale our dashboard by partitioning objects based on account groups: Clusters

# 20
[Diagram: RightScale Accounts split into Account Set 1, Account Set 2, …; users reach Cluster 1, Cluster 3, … Cluster N, each with its own dash, S3, events, tags and audit stores, in regions such as US East, EU and Japan]
Features:
• 1 cluster: N accounts
• 1 account: 1 home
• Migratable accounts
Benefits:
• Great horizontal growth
• Better failure isolation
• Independent scale
• Load rebalancing
• Versionable code
• Differentiated service
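The three cluster features (one cluster serves N accounts, each account has exactly one home, accounts are migratable) can be sketched as a small directory that the application consults before touching a datastore. This is a hypothetical illustration of application-level sharding, not RightScale's code; `ClusterDirectory` and the account and cluster names are invented.

```python
from typing import Dict, List


class ClusterDirectory:
    """Maps each account to the single cluster that is its home."""

    def __init__(self) -> None:
        self._home: Dict[int, str] = {}    # account id -> cluster name

    def assign(self, account_id: int, cluster: str) -> None:
        self._home[account_id] = cluster   # 1 account: 1 home

    def home_of(self, account_id: int) -> str:
        return self._home[account_id]

    def migrate(self, account_id: int, new_cluster: str) -> None:
        if account_id not in self._home:
            raise KeyError(f"unknown account {account_id}")
        self._home[account_id] = new_cluster   # migratable accounts

    def accounts_in(self, cluster: str) -> List[int]:
        # 1 cluster: N accounts
        return sorted(a for a, c in self._home.items() if c == cluster)


directory = ClusterDirectory()
directory.assign(101, "cluster-1")
directory.assign(102, "cluster-1")
directory.assign(203, "cluster-3")

directory.migrate(102, "cluster-3")        # e.g. load rebalancing
print(directory.home_of(102), directory.accounts_in("cluster-3"))
```

Because the mapping is an explicit lookup rather than a hash of the account id, a single account can be moved without reshuffling any other account.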

# 21
[Diagram: grid with axes Users / Instances and Account / X-Account holding the global, dash, S3, events, tags, audit, routing, polling and monitor stores]

# 22
[Diagram: the routing, polling and monitor stores shown duplicated next to the global, dash, S3, events, tags and audit stores]
And partition our cloud objects based on the cloud the instances of an account run on: Islands

# 23
[Diagram: Cloud 1, Cloud 2, … Cloud N host an account's instances; Island 1, Island 2, … Island N each run routing, polling and monitor services co-located with the resources]
Features:
• 1 instance: 1 home island
• 1 Island can serve N clouds
• Core Agents: global data
Benefits:
• Close to cloud resources
• Good failure isolation
  • As good as cloud
• Good scale: global replicas across Cassandra DCs
Polling Clouds: MySQL
• Master-Slave replication
• Can port to NoSQL easily
• Mostly a resource cache
• But cloud partitionable
Monitoring: Custom
• Replicated files
• Backup to S3
• Archive to S3
Routing: Cassandra
• Simpler Key-Value access
• Very high availability
• Great scalability
• Great replica control
• Plus cross DC replication*
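The island features can be read as a two-step lookup: an instance runs in one cloud, and each cloud is served by one island, so every instance ends up with exactly one home island while an island may serve several clouds. The sketch below assumes invented cloud, island and instance names and is not RightScale's implementation.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical assignment: 1 island can serve N clouds.
ISLAND_FOR_CLOUD: Dict[str, str] = {
    "cloud-a": "island-1",
    "cloud-b": "island-1",
    "cloud-c": "island-2",
}

# Hypothetical instances and the cloud each one runs on.
CLOUD_FOR_INSTANCE: Dict[str, str] = {
    "i-001": "cloud-a",
    "i-002": "cloud-b",
    "i-003": "cloud-c",
}


def home_island(instance_id: str) -> str:
    """1 instance: 1 home island, derived from the cloud it runs on."""
    return ISLAND_FOR_CLOUD[CLOUD_FOR_INSTANCE[instance_id]]


def clouds_served() -> Dict[str, List[str]]:
    """Invert the mapping to list the clouds each island serves."""
    served: Dict[str, List[str]] = defaultdict(list)
    for cloud, island in sorted(ISLAND_FOR_CLOUD.items()):
        served[island].append(cloud)
    return dict(served)


print(home_island("i-002"), clouds_served())
```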

# 24
[Diagram: users reach Cluster 1, Cluster 3, … Cluster N (each with dash, S3, events, tags and audit stores) in different geographies: US East, West, EU, Japan. Instances reach Island 1, Island 2, … Island N (each with routing, polling and monitor) in different clouds: Azure, AWS East, Private]
What if the cloud where the cluster is deployed… fails?

# 25
[Diagram: users reach Cluster 1, Cluster 3, … Cluster N in US East, West, EU and Japan; instances reach Island 1, Island 2, … Island N in Azure, AWS East and Private clouds; clusters are paired as Sister Clusters, each holding a full replica]
Sister Clusters: full replica
Features:
• Each master has an extra remote slave
• Each cluster in a pair is a DC replica of the other’s local ring
At Disaster Recovery time:
• Apps are told to start serving an extra shard
• No need to provision more infrastructure to recover (best avoided, since everybody is in the same boat)
• New resources can be allocated over time to help offload existing ones
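The disaster recovery steps can be sketched as follows: because each cluster already holds a full replica of its sister's data, recovery amounts to telling the surviving cluster's apps to serve one extra shard. The pairing table, the `ClusterApps` class and `fail_over` are hypothetical names for illustration; promoting the remote slave and redirecting users are not modeled.

```python
from typing import Dict, Set

# Hypothetical sister pairing: each cluster replicates the other's data.
SISTER: Dict[str, str] = {"cluster-1": "cluster-3", "cluster-3": "cluster-1"}


class ClusterApps:
    """Application tier of one cluster and the shards it currently serves."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.serving: Set[str] = {name}    # normally only its own shard

    def adopt(self, shard: str) -> None:
        self.serving.add(shard)            # start serving an extra shard


def fail_over(failed: str, apps: Dict[str, ClusterApps]) -> str:
    """Hand the failed cluster's shard to its sister; no new hosts are provisioned."""
    sister = SISTER[failed]
    apps[sister].adopt(failed)
    return sister


apps = {name: ClusterApps(name) for name in SISTER}
taken_over_by = fail_over("cluster-1", apps)
print(taken_over_by, sorted(apps["cluster-3"].serving))
```

Extra capacity can still be added to the surviving cluster afterwards, matching the last bullet above, but the takeover itself does not wait for it.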

# 26
Conclusions
• Shown that RightScale uses multiple database technologies
  • RDBMS: MySQL for the ACID semantics and ‘queryability’
    • Using a Master to N slaves for read-only (RO) scale and quick failure recovery
    • And ReadOnly Provisioning, to increase RO availability and scale remote systems
  • NoSQL: Cassandra for availability and scalability
    • For higher read/write availability within a cluster
    • For fully replicated regions across the globe (for read/write!)
• Shown how RightScale uses them in different techniques
  • It partitions resource data into Islands based on cloud proximity
    • Can achieve in-cloud polling, and keep monitoring/syslog data storage next to instances
    • Can provide routing availability, colocated with instances for any world region
  • It partitions core data into Clusters based on account groups
    • To scale the core horizontally and independently, and achieve account isolation/differentiation
    • Enhances fault isolation: assigning accounts to Clusters deployed away from their cloud resources
  • It maintains cluster pairs (sister sites)
    • To recover from full cloud region failures
    • It doesn’t require massive amounts of new resources to recover

# 27
Next Steps
1. Learn: Building Scalable Applications in the Cloud Whitepaper, www.rightscale.com/whitepapers
2. Analyze: Deployment review of your environment, www.rightscale.com/contact
3. Try: Free Edition, www.rightscale.com/free
Contact RightScale
(866) [email protected]
The next big RightScale Community Event!
April 25-26 in San Francisco
www.RightScaleCompute.com
• Attend technical breakout sessions
• Get RightScale training
• Talk with RightScale customers
• Ask questions at the Expert Bar