Architectures for open and scalable clouds

Post on 16-Jan-2015


DESCRIPTION

My presentation for Cloud Connect 2012, covering architectural and design patterns for open and scalable clouds. A technical deck aimed at business audiences with a technical bent.

TRANSCRIPT

CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution

Architectures for open and scalable clouds

February 14, 2012

Randy Bias, CTO & Co-founder

Our Perspective on Cloud Computing

It came from the large Internet players.

A Story of Two Clouds

Tenets of Open & Scalable Clouds

1. Avoid vendor lock-in like the bubonic plague

• See also Open Cloud Initiative (opencloudinitiative.org)

2. Simplicity scales, complexity fails

• 10x bigger == 100x more complex

3. TCO matters; measuring ROI is critical to success

4. Security is paramount ... but different

5. Risk acceptance over risk mitigation

6. Agility & iteration over big bang

This is a BIG Topic

• What I am covering today is patterns in:

• Hardware and software

• Networking, storage, and compute

• NOT covered today:

• Cloud operations

• Infrastructure software engineering

• Measuring success through operational excellence

• Security

Open Clouds (briefly)

A Word on ‘Open’

Here we go ...

• Elements:

• Open APIs & protocols

• Open hardware

• Open networking

• Open source software (OSS)

• Combined with:

• Architectural patterns, best practices, & de facto standards

• Operational excellence

Open APIs & Protocols

Open Hardware

Open Networking

Published Networking Blueprints

Open Source Software

Open Cloud OS

Open & Scalable Cloud Patterns

Threads

• Small failure domains limit the impact of any one failure

• Loose coupling minimizes cascading failures

• Scale-out over scale-up, with exceptions

• More AND cheaper

• State synchronization is dangerous (remember CAP)

• Everything has an API

• Automation ONLY works w/ homogeneity & modularity

• Lowest common denominator (LCD) services (LBaaS vs F5aaS)

• People are the number one source of failures

Pattern: Loose coupling

Synchronous, blocking calls mean cascading failures.

Asynchronous, non-blocking calls mean failure in isolation.
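As a minimal sketch of the non-blocking half of this pattern, the Python below issues several calls concurrently and caps each with a timeout, so a slow or failing dependency degrades only its own result. The service names, delays, and the `call_service` stand-in are hypothetical and not from the deck.

```python
import asyncio

async def call_service(name, delay, fail=False):
    """Hypothetical stand-in for a remote call; delay and fail simulate its behavior."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} is down")
    return f"{name}: ok"

async def fetch_all(calls, timeout=0.05):
    """Run every call concurrently; a failure or timeout stays in its own slot."""
    async def guarded(name, delay, fail):
        try:
            return await asyncio.wait_for(call_service(name, delay, fail), timeout)
        except (RuntimeError, asyncio.TimeoutError) as exc:
            # Degrade this one result instead of failing the whole request.
            return f"{name}: degraded ({type(exc).__name__})"
    return await asyncio.gather(*(guarded(*c) for c in calls))

def loose_coupling_demo():
    # One healthy call, one failing call, one call slower than the timeout.
    calls = [("catalog", 0.0, False), ("reviews", 0.0, True), ("ads", 1.0, False)]
    return asyncio.run(fetch_all(calls))
```

In a blocking design the slow "ads" call would hold up the whole request; here the caller still gets the healthy result and two degraded placeholders.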

Pattern: Open source software

Excessive software taxation is the past.

Black boxes create lock-in.

You can always fork.

Pattern: Uptime in software - self management

Hardware fails. Software fails. People fail.

Only software can measure itself & respond to failure in near real-time.

Applications designed for 99.999% uptime can run anywhere.
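One way to picture software measuring itself is a monitoring pass that probes each worker and restarts any that fail, with no person in the loop. This is only a sketch: the probe and restart callables, the worker names, and the in-memory state are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

def monitor_pass(probes: Dict[str, Callable[[], bool]],
                 restart: Callable[[str], None]) -> List[str]:
    """Probe every worker once; restart and report the ones that are unhealthy."""
    restarted = []
    for name, probe in probes.items():
        try:
            healthy = probe()
        except Exception:
            healthy = False  # a probe that raises counts as a failure
        if not healthy:
            restart(name)
            restarted.append(name)
    return restarted

def self_heal_demo() -> List[List[str]]:
    # Hypothetical in-memory health state: web002 starts out unhealthy.
    state = {"web001": True, "web002": False}
    probes = {name: (lambda n=name: state[n]) for name in state}

    def restart(name: str) -> None:
        state[name] = True  # stands in for relaunching the worker

    # The first pass repairs web002; the second pass finds nothing to do.
    return [monitor_pass(probes, restart), monitor_pass(probes, restart)]
```

A real system would run such a pass on a short interval, which is what makes the response near real-time.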

Pattern: Scale-out, not UP

Attribution: Bill Baker, Distinguished Engineer, Microsoft (* added by yours truly ...)

Scale Up: (Virtual*) Servers are like pets

You name them and when they get sick, you nurse them back to health

garfield.company.com

Scale Out: (Virtual*) Servers are like cattle

You number them and when they get sick, you shoot them

web001.company.com
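The cattle model can be sketched as a pool of numbered servers in which a sick member is destroyed and replaced, never repaired. The `Herd` class and its methods are hypothetical names for illustration; only the `web001.company.com` naming style comes from the slide.

```python
import itertools

class Herd:
    """A pool of interchangeable, numbered servers (illustrative only)."""

    def __init__(self, prefix: str, domain: str, size: int):
        self._ids = itertools.count(1)  # numbers are never reused
        self.prefix = prefix
        self.domain = domain
        self.members = [self._new_name() for _ in range(size)]

    def _new_name(self) -> str:
        return f"{self.prefix}{next(self._ids):03d}.{self.domain}"

    def replace(self, sick: str) -> str:
        """Remove the sick server and add a fresh one; there is no repair step."""
        self.members.remove(sick)
        fresh = self._new_name()
        self.members.append(fresh)
        return fresh
```

For example, `Herd("web", "company.com", 3)` starts with `web001` through `web003`, and replacing `web002.company.com` yields `web004.company.com`.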

Pattern: Buy from ODMs

ODMs operate their businesses on 3-10% margins.

AMZN, GOOG, and Facebook buy direct without a middleman.

Only a few enterprise vendors are pivoting to compete.

Pattern: Less enterprise “value” in x86 servers

Generic servers rule. Full stop. Nothing is better because nothing else is *generic*.

“... a data center full of vanity free servers ... more efficient ... less expensive to build and run ...” - OCP

Pattern: Flat Networking

The largest cloud operators all run layer-3 routed, flat networks with no VLANs.

Cloud-ready apps don’t need or want VLANs.

Enterprise apps can be supported on open clouds using Software-defined Networking (SDN).

Pattern: Software-defined Networking (SDN)

• x86 server is the new Linecard

• network switch is the new ASIC

• VXLAN (or NVGRE) is the new Chassis

• SDN Controller is the new SUP Engine

“Network Virtualization”

Pattern: Flat Networking + SDNs

Flat + SDN co-exist & thrive together

[Diagram labels: Availability Zone; Physical Node; VMs; Standard Security Group; Virtual Private Cloud Networking; Virtual L2 Network; VPC Security Group; VPC Gateway; Internet]

Pattern: RAIS instead of HA pairs/clusters

• Redundant arrays of inexpensive services (RAIS)

• Load balanced

• No state sharing

• On failure, connections are lost, but failures are rare

• Ridiculously simple & scalable

• Most things retry anyway

• Hardware failures are infrequent & impact only a subset of traffic

• Unaffected share is (N-F)/N, where N = total instances, F = failed instances

• Cascade failures are unlikely and failure domains are small
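The (N-F)/N arithmetic and the "no state sharing" idea can be illustrated with a short sketch. The function names and the `lb1`..`lb3` instance names are hypothetical, and the modulo-based routing is one simple stateless choice, not necessarily what the author deployed.

```python
from typing import List, Set

def surviving_fraction(total: int, failed: int) -> float:
    """Share of the array still serving traffic: (N - F) / N."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return (total - failed) / total

def route(request_id: int, members: List[str], down: Set[str]) -> str:
    """Pick a healthy instance without shared state; a retry simply picks again."""
    healthy = [m for m in members if m not in down]
    if not healthy:
        raise RuntimeError("no healthy instances left")
    return healthy[request_id % len(healthy)]
```

With one failure, a two-node HA pair keeps `surviving_fraction(2, 1)` = 0.5 of its capacity, while a ten-instance array keeps `surviving_fraction(10, 1)` = 0.9, which is the sense in which failure domains stay small.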

Service array (RAIS) example:

[Diagram labels: Backbone Routers; Cloud Access Switches; AZ (Spine) Switches; RAIS (NAT, LB, VPN); OSPF Route Announcements; Return Traffic (default or source NAT); Public IP Blocks; API; Cloud Control Plane]

Pattern: Lots of inexpensive 1RU Switches

1RU: 6K-30K VMs / AZ

Simple spine-and-leaf flat routed network

[Diagram: Rack 1, Rack 2, Rack 3]

Modular: 40K-200K VMs / AZ

[Diagram: three groups, each labeled Rack 1, Rack 2, Multiple Racks]

Pattern: Direct-attached Storage (DAS)

Cloud-ready apps manage their own data replication.

DAS is the smallest failure domain possible with reasonable storage I/O.

SAN == massive failure domain.

SSDs will be the great equalizer.

Pattern: Elastic Block Device Services

EBS/EBD is a crutch for poorly written apps.

Bigger failure domains (AWS outage, anyone?), more complexity, and high expectations.

Sometimes you need a crutch. When you do, overbuild the network, and make sure you have a smart scheduler.

Pattern: More Servers == More Storage I/O

>1M writes/second, triple-redundancy w/ Cassandra on AWS

Linear scale-out == linear costs for performance
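The linear relationship can be worked through with a rough sizing sketch. It assumes every client write is stored on each replica and that each node sustains a fixed write rate; the per-node figure used in the example is a hypothetical placeholder, not a number from the deck or from any benchmark.

```python
def nodes_needed(client_writes_per_sec: int,
                 per_node_writes_per_sec: int,
                 replication_factor: int = 3) -> int:
    """Nodes required if each client write lands on every replica (rough model)."""
    if per_node_writes_per_sec <= 0 or replication_factor <= 0:
        raise ValueError("rates and replication factor must be positive")
    total_replica_writes = client_writes_per_sec * replication_factor
    # Integer ceiling division: round up to whole nodes.
    return -(-total_replica_writes // per_node_writes_per_sec)
```

Under the assumed 10,000 writes/second per node, `nodes_needed(1_000_000, 10_000)` gives 300 nodes and doubling the target to 2,000,000 gives 600, so node count, and therefore cost, grows in step with throughput.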

Pattern: Hypervisors are a commodity

Cloud end-users want their OS of choice, not hypervisors (HVs).

Level up! Managing iron is for mainframe operators.

Hypervisor of the future is open source, easily modifiable, & extensible.

Open Cloud System

Production Ready

Simply Scaled

randyb@cloudscaling.com

@randybias
