architectures for open and scalable clouds
DESCRIPTION
My presentation for 2012's Cloud Connect covering architectural and design patterns for open and scalable clouds. A technical deck targeted at business audiences with a technical bent.
TRANSCRIPT
CC Attribution-NoDerivs 3.0 Unported License: usage OK, no modifications, full attribution
Architectures for open and scalable clouds
February 14, 2012
Randy Bias, CTO & Co-founder
Our Perspective on Cloud Computing
It came from the large Internet players.
A Story of Two Clouds
Tenets of Open & Scalable Clouds
1. Avoid vendor lock-in like the bubonic plague
• See also Open Cloud Initiative (opencloudinitiative.org)
2. Simplicity scales, complexity fails
• 10x bigger == 100x more complex
3. TCO matters; measuring ROI is critical to success
4. Security is paramount ... but different
5. Risk acceptance over risk mitigation
6. Agility & iteration over big bang
This is a BIG Topic
• What I am covering today is patterns in:
• Hardware and software
• Networking, storage, and compute
• NOT covered today:
• Cloud operations
• Infrastructure software engineering
• Measuring success through operational excellence
• Security
Open Clouds (briefly)
A Word on ‘Open’
Here we go ...
• Elements:
• Open APIs & protocols
• Open hardware
• Open networking
• Open source software (OSS)
• Combined with:
• Architectural patterns, best practices, & de facto standards
• Operational excellence
Open APIs & Protocols
Open Hardware
Open Networking
Published Networking Blueprints
Open Source Software
Open Cloud OS
Open & Scalable Cloud Patterns
Threads
• Small failure domains have less impact
• Loose-coupling minimizes cascade failures
• Scale-out over scale-up with exceptions
• More AND cheaper
• State synchronization is dangerous (remember CAP)
• Everything has an API
• Automation ONLY works w/ homogeneity & modularity
• Lowest common denominator (LCD) services (LBaaS vs F5aaS)
• People are the number one source of failures
Pattern: Loose coupling
Synchronous, blocking calls mean cascading failures.
Asynchronous, non-blocking calls mean failures in isolation.
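The loose-coupling claim can be sketched in a few lines. A minimal asyncio example, with hypothetical service names: gathering non-blocking calls with `return_exceptions=True` isolates each failure, so the healthy services still answer instead of the whole request cascading down.

```python
# Sketch of the loose-coupling pattern: fan out to dependent services
# asynchronously so one failed dependency does not cascade.
# The service names and the fail_on set are illustrative, not a real API.
import asyncio

async def call_service(name: str, fail_on: set) -> str:
    # Stand-in for a real network call.
    if name in fail_on:
        raise ConnectionError(f"{name} unavailable")
    await asyncio.sleep(0)  # yield to the event loop, simulating I/O
    return f"{name}: ok"

async def fan_out(services: list, fail_on: set) -> dict:
    # return_exceptions=True keeps one failure from aborting the batch:
    # each exception is returned in place of that service's result.
    results = await asyncio.gather(
        *(call_service(s, fail_on) for s in services),
        return_exceptions=True,
    )
    return {
        s: (r if isinstance(r, str) else f"degraded: {r}")
        for s, r in zip(services, results)
    }

out = asyncio.run(fan_out(["billing", "images", "auth"], fail_on={"images"}))
```

With a blocking, sequential call chain, the `images` failure would have aborted the request; here `billing` and `auth` still return results and only `images` degrades.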
Pattern: Open source software
Excessive software taxation is a thing of the past.
Black boxes create lock-in.
You can always fork.
Pattern: Uptime in software - self-management
Hardware fails. Software fails. People fail.
Only software can measure itself & respond to failure in near real-time.
Applications designed for 99.999% uptime can run anywhere
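A minimal sketch of software-managed uptime, assuming a hypothetical `Service` class: a supervisor loop measures health on every pass and responds by restarting failed instances, with no human in the loop.

```python
# "Uptime in software" sketch: the supervisor detects failure and
# responds automatically in near real-time. The Service class, check
# count, and interval are illustrative assumptions.
import time

class Service:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.restarts = 0

    def restart(self):
        # In a real cloud this would reprovision or reschedule the instance.
        self.restarts += 1
        self.healthy = True

def supervise(services, checks=3, interval=0.0):
    # Poll each service; any failed health check triggers an automatic
    # restart instead of a page to an operator.
    for _ in range(checks):
        for svc in services:
            if not svc.healthy:
                svc.restart()
        time.sleep(interval)

web = Service("web001")
web.healthy = False          # simulate a hardware or software fault
supervise([web])             # supervisor restores the service
```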
Pattern: Scale-out, not UP
attrib: Bill Baker, Distinguished Engineer, Microsoft (* added by yours truly ...)
Scale Up: (Virtual*) Servers are like pets
You name them, and when they get sick, you nurse them back to health.
garfield.company.com
Scale Out: (Virtual*) Servers are like cattle
You number them, and when they get sick, you shoot them.
web001.company.com
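The cattle model can be sketched as a reconcile loop; the fleet structure and `webNNN` naming scheme are made up for illustration. Sick servers are terminated and identical, newly numbered replacements spawned:

```python
# "Cattle, not pets" sketch: servers are numbered, not named, and an
# unhealthy one is replaced rather than nursed back to health.
import itertools

counter = itertools.count(1)

def new_server():
    # Every server is identical except its number -- no garfield here.
    return {"name": f"web{next(counter):03d}.company.com", "healthy": True}

def reconcile(fleet):
    # Shoot the sick cattle, then spawn replacements to restore capacity.
    survivors = [s for s in fleet if s["healthy"]]
    while len(survivors) < len(fleet):
        survivors.append(new_server())
    return survivors

fleet = [new_server() for _ in range(3)]   # web001..web003
fleet[1]["healthy"] = False                # web002 gets "sick"
fleet = reconcile(fleet)                   # web002 gone, web004 spawned
```

Note this only works because the servers are homogeneous and stateless: any replacement is interchangeable with what it replaced.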
Pattern: Buy from ODMs
ODMs operate their businesses on 3-10% margins.
AMZN, GOOG, and Facebook buy direct, without a middleman.
Only a few enterprise vendors are pivoting to compete.
Pattern: Less enterprise “value” in x86 servers
Generic servers rule. Full stop. Nothing is better, because nothing else is *generic*.
“... a data center full of vanity free servers ... more efficient ... less expensive to build and run ...” - OCP
Pattern: Flat networking
The largest cloud operators all run layer-3 routed, flat networks with no VLANs.
Cloud-ready apps don’t need or want VLANs.
Enterprise apps can be supported on open clouds using Software-defined Networking (SDN).
Pattern: Software-defined Networking (SDN)
• x86 server is the new Linecard
• network switch is the new ASIC
• VXLAN (or NVGRE) is the new Chassis
• SDN Controller is the new SUP Engine
“Network Virtualization”
Pattern: Flat networking + SDN
Flat + SDN co-exist & thrive together
[Diagram: one availability zone in which VMs behind a standard security group on the flat network run alongside VMs on a virtual L2 network inside a Virtual Private Cloud, behind a VPC security group and VPC gateway to the Internet, all on the same physical nodes]
Pattern: RAIS instead of HA pairs/clusters
• Redundant arrays of inexpensive services (RAIS)
• Load balanced
• No state sharing
• On failure, connections are lost, but failures are rare
• Ridiculously simple & scalable
• Most things retry anyway
• Hardware failures are infrequent & impact only a subset of traffic
• Remaining capacity is (N-F)/N, where N = total instances, F = failed
• Cascade failures are unlikely and failure domains are small
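The capacity claim above is easy to make concrete. This sketch just evaluates (N-F)/N for a wide service array versus a classic active/passive HA pair:

```python
# RAIS capacity arithmetic: with N load-balanced, stateless instances
# and F failed, (N - F) / N of capacity remains. A hardware failure
# degrades a small slice of traffic instead of halving an HA pair.
def remaining_capacity(total: int, failed: int) -> float:
    if failed > total:
        raise ValueError("cannot fail more instances than exist")
    return (total - failed) / total

# A 10-wide array losing one node keeps 90% of capacity...
wide = remaining_capacity(10, 1)   # 0.9
# ...while an active/passive pair losing its active node drops to 50%.
pair = remaining_capacity(2, 1)    # 0.5
```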
Service array (RAIS) example:
[Diagram: traffic flows from backbone routers through cloud access switches and AZ (spine) switches to a RAIS tier (NAT, LB, VPN); public IP blocks are announced via OSPF route announcements; return traffic uses default or source NAT; the cloud control plane drives the array through an API]
Pattern: Lots of inexpensive 1RU switches
1RU spine: 6K-30K VMs / AZ
Simple spine-and-leaf flat routed network
[Diagram: three racks of leaf switches uplinked to a 1RU spine]
Modular: 40K-200K VMs / AZ
[Diagram: modular chassis spine switches, each aggregating multiple racks]
Pattern: Direct-attached Storage (DAS)
Cloud-ready apps manage their own data replication.
DAS is the smallest failure domain possible with reasonable storage I/O.
SAN == massive failure domain.
SSDs will be the great equalizer.
Pattern: Elastic Block Device Services
EBS/EBD is a crutch for poorly written apps.
Bigger failure domains (AWS outage, anyone?), added complexity, and inflated expectations.
Sometimes you need a crutch. When you do, overbuild the network, and make sure you have a smart scheduler.
Pattern: More servers == more storage I/O
>1M writes/second, triple-redundancy w/ Cassandra on AWS
Linear scale-out == linear costs for performance
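A back-of-envelope version of the linear-cost claim. The per-node write rate and node cost below are hypothetical placeholders, not Cassandra benchmarks; the point is only that both node count and cost scale linearly with the target:

```python
# Rough sizing sketch for "linear scale-out == linear costs": estimate
# node count for a target write rate under triple redundancy.
# per_node_writes_per_sec and cost_per_node are assumed figures.
import math

def nodes_needed(target_writes_per_sec, per_node_writes_per_sec, replication=3):
    # Each logical write fans out to `replication` physical writes,
    # so required raw throughput scales with the replication factor.
    raw = target_writes_per_sec * replication
    return math.ceil(raw / per_node_writes_per_sec)

def cluster_cost(nodes, cost_per_node):
    return nodes * cost_per_node  # cost grows linearly with node count

# 1M writes/s, triple redundancy, assumed 10K writes/s per node:
n = nodes_needed(1_000_000, per_node_writes_per_sec=10_000)
```

Doubling the target write rate simply doubles `n` and the cluster cost; there is no scale-up cliff where the next increment requires a bigger, disproportionately priced box.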
Pattern: Hypervisors are a commodity
Cloud end-users want an OS of choice, not hypervisors.
Level up! Managing iron is for mainframe operators.
The hypervisor of the future is open source, easily modifiable, & extensible.