architectures for open and scalable clouds
DESCRIPTION
My presentation for 2012's Cloud Connect covering architectural and design patterns for open and scalable clouds. A technical deck targeted at business audiences with a technical bent.
TRANSCRIPT
CC Attribution-NoDerivs 3.0 Unported License: usage OK, no modifications, full attribution
Architectures for open and scalable clouds
February 14, 2012
Randy Bias, CTO & Co-founder
Our Perspective on Cloud Computing
It came from the large Internet players.
A Story of Two Clouds
Tenets of Open & Scalable Clouds
1. Avoid vendor lock-in like the bubonic plague
• See also Open Cloud Initiative (opencloudinitiative.org)
2. Simplicity scales, complexity fails
• 10x bigger == 100x more complex
3. TCO matters; measuring ROI is critical to success
4. Security is paramount ... but different
5. Risk acceptance over risk mitigation
6. Agility & iteration over big bang
This is a BIG Topic
• What I am covering today is patterns in:
• Hardware and software
• Networking, storage, and compute
• NOT covered today:
• Cloud operations
• Infrastructure software engineering
• Measuring success through operational excellence
• Security
Open Clouds (briefly)
A Word on ‘Open’
Here we go ...
• Elements:
• Open APIs & protocols
• Open hardware
• Open networking
• Open source software (OSS)
• Combined with:
• Architectural patterns, best practices, & de facto standards
• Operational excellence
Open APIs & Protocols
Open Hardware
Open Networking
Published Networking Blueprints
Open Source Software
Open Cloud OS
Open & Scalable Cloud Patterns
Threads
• Small failure domains have less impact
• Loose-coupling minimizes cascade failures
• Scale-out over scale-up with exceptions
• More AND cheaper
• State synchronization is dangerous (remember CAP)
• Everything has an API
• Automation ONLY works w/ homogeneity & modularity
• Lowest common denominator (LCD) services (LBaaS vs F5aaS)
• People are the number one source of failures
Pattern: Loose coupling
Synchronous, blocking calls mean cascading failures.
Asynchronous, non-blocking calls mean failures in isolation.
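The loose-coupling claim can be sketched in a few lines. A minimal asyncio example, with hypothetical service names: gathering non-blocking calls with `return_exceptions=True` isolates each failure, so the healthy services still answer instead of the whole request cascading down.

```python
# Sketch of the loose-coupling pattern: fan out to dependent services
# asynchronously so one failed dependency does not cascade.
# The service names and the fail_on set are illustrative, not a real API.
import asyncio

async def call_service(name: str, fail_on: set) -> str:
    # Stand-in for a real network call.
    if name in fail_on:
        raise ConnectionError(f"{name} unavailable")
    await asyncio.sleep(0)  # yield to the event loop, simulating I/O
    return f"{name}: ok"

async def fan_out(services: list, fail_on: set) -> dict:
    # return_exceptions=True keeps one failure from aborting the batch:
    # each exception is returned in place of that service's result.
    results = await asyncio.gather(
        *(call_service(s, fail_on) for s in services),
        return_exceptions=True,
    )
    return {
        s: (r if isinstance(r, str) else f"degraded: {r}")
        for s, r in zip(services, results)
    }

out = asyncio.run(fan_out(["billing", "images", "auth"], fail_on={"images"}))
```

With a blocking, sequential call chain, the `images` failure would have aborted the request; here `billing` and `auth` still return results and only `images` degrades.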
Pattern: Open source software
Excessive software taxation is a thing of the past.
Black boxes create lock-in.
You can always fork.
Pattern: Uptime in software - self-management
Hardware fails. Software fails. People fail.
Only software can measure itself & respond to failure in near real-time.
Applications designed for 99.999% uptime can run anywhere
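A minimal sketch of software-managed uptime, assuming a hypothetical `Service` class: a supervisor loop measures health on every pass and responds by restarting failed instances, with no human in the loop.

```python
# "Uptime in software" sketch: the supervisor detects failure and
# responds automatically in near real-time. The Service class, check
# count, and interval are illustrative assumptions.
import time

class Service:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.restarts = 0

    def restart(self):
        # In a real cloud this would reprovision or reschedule the instance.
        self.restarts += 1
        self.healthy = True

def supervise(services, checks=3, interval=0.0):
    # Poll each service; any failed health check triggers an automatic
    # restart instead of a page to an operator.
    for _ in range(checks):
        for svc in services:
            if not svc.healthy:
                svc.restart()
        time.sleep(interval)

web = Service("web001")
web.healthy = False          # simulate a hardware or software fault
supervise([web])             # supervisor restores the service
```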
Pattern: Scale-out, not UP
attrib: Bill Baker, Distinguished Engineer, Microsoft (* added by yours truly ...)
Scale Up: (Virtual*) Servers are like pets
You name them, and when they get sick, you nurse them back to health.
garfield.company.com
Scale Out: (Virtual*) Servers are like cattle
You number them, and when they get sick, you shoot them.
web001.company.com
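The cattle model can be sketched as a reconcile loop; the fleet structure and `webNNN` naming scheme are made up for illustration. Sick servers are terminated and identical, newly numbered replacements spawned:

```python
# "Cattle, not pets" sketch: servers are numbered, not named, and an
# unhealthy one is replaced rather than nursed back to health.
import itertools

counter = itertools.count(1)

def new_server():
    # Every server is identical except its number -- no garfield here.
    return {"name": f"web{next(counter):03d}.company.com", "healthy": True}

def reconcile(fleet):
    # Shoot the sick cattle, then spawn replacements to restore capacity.
    survivors = [s for s in fleet if s["healthy"]]
    while len(survivors) < len(fleet):
        survivors.append(new_server())
    return survivors

fleet = [new_server() for _ in range(3)]   # web001..web003
fleet[1]["healthy"] = False                # web002 gets "sick"
fleet = reconcile(fleet)                   # web002 gone, web004 spawned
```

Note this only works because the servers are homogeneous and stateless: any replacement is interchangeable with what it replaced.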
Pattern: Buy from ODMs
ODMs operate their businesses on 3-10% margins.
AMZN, GOOG, and Facebook buy direct, without a middleman.
Only a few enterprise vendors are pivoting to compete.
Pattern: Less enterprise “value” in x86 servers
Generic servers rule. Full stop. Nothing is better, because nothing else is *generic*.
“... a data center full of vanity free servers ... more efficient ... less expensive to build and run ...” - OCP
Pattern: Flat networking
The largest cloud operators all run layer-3 routed, flat networks with no VLANs.
Cloud-ready apps don’t need or want VLANs.
Enterprise apps can be supported on open clouds using Software-defined Networking (SDN).
Pattern: Software-defined Networking (SDN)
• x86 server is the new Linecard
• network switch is the new ASIC
• VXLAN (or NVGRE) is the new Chassis
• SDN Controller is the new SUP Engine
“Network Virtualization”
Pattern: Flat networking + SDN
Flat + SDN co-exist & thrive together
[Diagram: one availability zone in which VMs behind a standard security group on the flat network run alongside VMs on a virtual L2 network inside a Virtual Private Cloud, behind a VPC security group and VPC gateway to the Internet, all on the same physical nodes]
Pattern: RAIS instead of HA pairs/clusters
• Redundant arrays of inexpensive services (RAIS)
• Load balanced
• No state sharing
• On failure, connections are lost, but failures are rare
• Ridiculously simple & scalable
• Most things retry anyway
• Hardware failures are infrequent & impact only a subset of traffic
• Remaining capacity is (N-F)/N, where N = total instances, F = failed
• Cascade failures are unlikely and failure domains are small
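The capacity claim above is easy to make concrete. This sketch just evaluates (N-F)/N for a wide service array versus a classic active/passive HA pair:

```python
# RAIS capacity arithmetic: with N load-balanced, stateless instances
# and F failed, (N - F) / N of capacity remains. A hardware failure
# degrades a small slice of traffic instead of halving an HA pair.
def remaining_capacity(total: int, failed: int) -> float:
    if failed > total:
        raise ValueError("cannot fail more instances than exist")
    return (total - failed) / total

# A 10-wide array losing one node keeps 90% of capacity...
wide = remaining_capacity(10, 1)   # 0.9
# ...while an active/passive pair losing its active node drops to 50%.
pair = remaining_capacity(2, 1)    # 0.5
```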
Service array (RAIS) example:
[Diagram: traffic flows from backbone routers through cloud access switches and AZ (spine) switches to a RAIS tier (NAT, LB, VPN); public IP blocks are announced via OSPF route announcements; return traffic uses default or source NAT; the cloud control plane drives the array through an API]
Pattern: Lots of inexpensive 1RU switches
1RU spine: 6K-30K VMs / AZ
Simple spine-and-leaf flat routed network
[Diagram: three racks of leaf switches uplinked to a 1RU spine]
Modular: 40K-200K VMs / AZ
[Diagram: modular chassis spine switches, each aggregating multiple racks]
Pattern: Direct-attached Storage (DAS)
Cloud-ready apps manage their own data replication.
DAS is the smallest failure domain possible with reasonable storage I/O.
SAN == massive failure domain.
SSDs will be the great equalizer.
Pattern: Elastic Block Device Services
EBS/EBD is a crutch for poorly written apps.
Bigger failure domains (AWS outage, anyone?), added complexity, and inflated expectations.
Sometimes you need a crutch. When you do, overbuild the network, and make sure you have a smart scheduler.
Pattern: More servers == more storage I/O
>1M writes/second, triple-redundancy w/ Cassandra on AWS
Linear scale-out == linear costs for performance
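A back-of-envelope version of the linear-cost claim. The per-node write rate and node cost below are hypothetical placeholders, not Cassandra benchmarks; the point is only that both node count and cost scale linearly with the target:

```python
# Rough sizing sketch for "linear scale-out == linear costs": estimate
# node count for a target write rate under triple redundancy.
# per_node_writes_per_sec and cost_per_node are assumed figures.
import math

def nodes_needed(target_writes_per_sec, per_node_writes_per_sec, replication=3):
    # Each logical write fans out to `replication` physical writes,
    # so required raw throughput scales with the replication factor.
    raw = target_writes_per_sec * replication
    return math.ceil(raw / per_node_writes_per_sec)

def cluster_cost(nodes, cost_per_node):
    return nodes * cost_per_node  # cost grows linearly with node count

# 1M writes/s, triple redundancy, assumed 10K writes/s per node:
n = nodes_needed(1_000_000, per_node_writes_per_sec=10_000)
```

Doubling the target write rate simply doubles `n` and the cluster cost; there is no scale-up cliff where the next increment requires a bigger, disproportionately priced box.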
Pattern: Hypervisors are a commodity
Cloud end-users want an OS of choice, not hypervisors.
Level up! Managing iron is for mainframe operators.
The hypervisor of the future is open source, easily modifiable, & extensible.