architectures for open and scalable clouds

CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution

Architectures for open and scalable clouds

February 14, 2012

Randy Bias, CTO & Co-founder

Our Perspective on Cloud Computing

It came from the large Internet players.

A Story of Two Clouds

Tenets of Open & Scalable Clouds

1. Avoid vendor lock-in like the bubonic plague

• See also Open Cloud Initiative (opencloudinitiative.org)

2. Simplicity scales, complexity fails

• 10x bigger == 100x more complex

3. TCO matters; measuring ROI is critical to success

4. Security is paramount ... but different

5. Risk acceptance over risk mitigation

6. Agility & iteration over big bang

This is a BIG Topic

• What I am covering today is patterns in:

• Hardware and software

• Networking, storage, and compute

• NOT covered today:

• Cloud operations

• Infrastructure software engineering

• Measuring success through operational excellence

• Security

Open Clouds (briefly)

A Word on ‘Open’

Here we go ...

• Elements:

• Open APIs & protocols

• Open hardware

• Open networking

• Open source software (OSS)

• Combined with:

• Architectural patterns, best practices, & de facto standards

• Operational excellence

Open APIs & Protocols

Open Hardware

Open Networking

Published Networking Blueprints

Open Source Software

Open Cloud OS

Open & Scalable Cloud Patterns

Threads

• Small failure domains limit the impact of any single failure

• Loose-coupling minimizes cascade failures

• Scale-out over scale-up with exceptions

• More AND cheaper

• State synchronization is dangerous (remember CAP)

• Everything has an API

• Automation ONLY works w/ homogeneity & modularity

• Lowest common denominator (LCD) services (LBaaS vs F5aaS)

• People are the number one source of failures

Pattern: Loose coupling

Synchronous, blocking calls mean cascading failures.

Async, non-blocking calls mean failure in isolation.
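A minimal sketch of the non-blocking side of this pattern, using Python's asyncio. The service names, delays, and the `guarded` helper are hypothetical and exist only for illustration. Each call gets its own timeout, so a slow or broken dependency degrades one result instead of stalling the whole request.

```python
import asyncio


async def call_backend(name: str, delay: float, fail: bool = False) -> str:
    """Hypothetical backend call; delay and fail simulate a slow or broken service."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} is down")
    return f"{name}: ok"


async def guarded(name: str, delay: float, fail: bool = False,
                  timeout: float = 0.2) -> str:
    """Bound the call with a timeout and contain its failure locally."""
    try:
        return await asyncio.wait_for(call_backend(name, delay, fail), timeout)
    except (asyncio.TimeoutError, RuntimeError):
        # The failure stays isolated: callers get a degraded answer, not an exception.
        return f"{name}: degraded"


async def fan_out() -> list:
    # All three calls run concurrently; none waits on another.
    return await asyncio.gather(
        guarded("catalog", 0.0),
        guarded("recommendations", 5.0),    # too slow, cut off by the timeout
        guarded("reviews", 0.0, fail=True),  # raises, handled inside guarded()
    )


def run_demo() -> list:
    return asyncio.run(fan_out())
```

Calling `run_demo()` returns one entry per service. The healthy call succeeds while the slow and failing ones come back marked as degraded, and the whole request finishes in roughly one timeout interval rather than waiting on the slowest dependency.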

Pattern: Open source software

Excessive software taxation is a thing of the past.

Black boxes create lock-in.

You can always fork.

Pattern: Uptime in software - self management

Hardware fails. Software fails. People fail.

Only software can measure itself & respond to failure in near real-time.

Applications designed for 99.999% uptime can run anywhere
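One way to picture software that measures itself and responds to failure is a small supervisor loop. This is a sketch under assumptions: the `Supervisor` class, the `probe` and `restart` callbacks, and the three-strikes threshold are all hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Supervisor:
    probe: Callable[[str], bool]      # hypothetical health check: True means healthy
    restart: Callable[[str], None]    # hypothetical remediation, e.g. replace the instance
    max_failures: int = 3             # assumed threshold of consecutive failed probes
    failures: Dict[str, int] = field(default_factory=dict)

    def tick(self, services: Iterable[str]) -> List[str]:
        """Run one monitoring pass and return the services that were restarted."""
        restarted = []
        for name in services:
            if self.probe(name):
                self.failures[name] = 0   # healthy again, reset the counter
                continue
            self.failures[name] = self.failures.get(name, 0) + 1
            if self.failures[name] >= self.max_failures:
                self.restart(name)        # respond without waiting for a human
                self.failures[name] = 0
                restarted.append(name)
        return restarted
```

In practice `tick` would be called on a short timer. Requiring several consecutive failures before acting is a design choice that keeps a single missed probe from triggering a restart.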

Pattern: Scale-out, not UP

attrib: Bill Baker, Distinguished Engineer, Microsoft (* added by yours truly ...)

Scale Up: (Virtual*) Servers are like pets

You name them and when they get sick, you nurse them back to health

garfield.company.com

Scale Out: (Virtual*) Servers are like cattle

You number them and when they get sick, you shoot them

web001.company.com

Pattern: Buy from ODMs

ODMs operate their businesses on 3-10% margins.

AMZN, GOOG, and Facebook buy direct without a middleman.

Only a few enterprise vendors are pivoting to compete.

Pattern: Less enterprise “value” in x86 servers

Generic servers rule. Full stop. Nothing is better because nothing else is *generic*.

“... a data center full of vanity free servers ... more efficient ... less expensive to build and run ...” - OCP

Pattern: Flat Networking

The largest cloud operators all run layer-3 routed, flat networks with no VLANs.

Cloud-ready apps don’t need or want VLANs.

Enterprise apps can be supported on open clouds using Software-defined Networking (SDN).

Pattern: Software-defined Networking (SDN)

• x86 server is the new Linecard

• Network switch is the new ASIC

• VXLAN (or NVGRE) is the new Chassis

• SDN Controller is the new SUP Engine

“Network Virtualization”

Pattern: Flat Networking + SDNs

Flat + SDN co-exist & thrive together

[Diagram labels: Standard Security Group; Availability Zone; Virtual L2 Network; Virtual Private Cloud Networking; VPC Security Group; Internet; VPC Gateway; Physical Node]

Pattern: RAIS instead of HA pairs/clusters

• Redundant arrays of inexpensive services (RAIS)

• Load balanced

• No state sharing

• On failure, connections are lost, but failures are rare

• Ridiculously simple & scalable

• Most things retry anyway

• Hardware failures are infrequent & impact only a subset of traffic

• Surviving capacity is (N-F)/N, where N = total instances in the array, F = failed instances

• Cascade failures are unlikely and failure domains are small
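The (N-F)/N figure above can be worked through in a few lines. This is a sketch for illustration only; the function name and the array sizes in the usage note are assumptions, not numbers from the talk.

```python
def surviving_fraction(total: int, failed: int) -> float:
    """Fraction of a stateless, load-balanced service array still serving traffic.

    Implements (N - F) / N, where N is the array size and F the failed instances.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return (total - failed) / total


def affected_fraction(total: int, failed: int) -> float:
    """Fraction of traffic that loses its connection when instances fail."""
    return failed / total if total > 0 and 0 <= failed <= total else float("nan")
```

For example, `surviving_fraction(4, 1)` is 0.75 while `surviving_fraction(10, 1)` is 0.9, so a wider array of smaller instances shrinks the share of traffic that any one failure can touch.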

Service array (RAIS) example:

[Diagram labels: Backbone Routers; Cloud Access Switches; AZ (Spine) Switches; RAIS (NAT, LB, VPN); OSPF Route Announcements; Return Traffic (default or source NAT); Public IP Blocks; Cloud Control Plane]

Pattern: Lots of inexpensive 1RU Switches

1RU: 6K-30K VMs / AZ

Simple spine-and-leaf flat routed network

[Diagram: Rack 1, Rack 2, Rack 3]

Modular: 40K-200K VMs / AZ

[Diagram: three groups of multiple racks, each labeled Rack 1, Rack 2, ...]

Pattern: Direct-attached Storage (DAS)

Cloud-ready apps manage their own data replication.

DAS is the smallest failure domain possible with reasonable storage I/O.

SAN == massive failure domain.

SSDs will be the great equalizer.
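As a rough illustration of an app managing its own replication over direct-attached disks, the sketch below writes each value to several nodes and succeeds once a quorum acknowledges. The `Node` class, the in-memory `disk` dict, and the quorum rule are hypothetical stand-ins and do not describe how any particular datastore works.

```python
from typing import Dict, List


class Node:
    """Hypothetical server with its own local (direct-attached) storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.disk: Dict[str, str] = {}   # stands in for the local disk
        self.up = True

    def write(self, key: str, value: str) -> None:
        if not self.up:
            raise OSError(f"{self.name} is unreachable")
        self.disk[key] = value


def replicated_write(nodes: List[Node], key: str, value: str, quorum: int) -> bool:
    """Write to every replica; report success if at least `quorum` acknowledge."""
    acks = 0
    for node in nodes:
        try:
            node.write(key, value)
            acks += 1
        except OSError:
            # A failed node only costs its own copy: the failure domain is one server.
            continue
    return acks >= quorum
```

With three replicas and a quorum of two, losing any single node still lets writes succeed, which is the sense in which each server's DAS is a small failure domain.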

Pattern: Elastic Block Device Services

EBS/EBD is a crutch for poorly written apps.

Bigger failure domains (AWS outage anyone?), complex, sets high expectations.

Sometimes you need a crutch. When you do, overbuild the network, and make sure you have a smart scheduler.

Pattern: More Servers == More Storage I/O

>1M writes/second, triple-redundancy w/ Cassandra on AWS

Linear scale-out == linear costs for performance

Pattern: Hypervisors are a commodity

Cloud end-users want OS of choice, not HVs.

Level up! Managing iron is for mainframe operators.

Hypervisor of the future is open source, easily modifiable, & extensible.

Open Cloud System: Production Ready, Simply Scaled

randyb@cloudscaling.com

@randybias
