1 scalability terminology: farms, clones, partitions, and packs: racs and raps bill devlin, jim...

31
1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999 Based on presentation by Hongwei Zhang, Ohio State

Upload: esmond-dickerson

Post on 28-Dec-2015

219 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

1

Scalability Terminology:Farms, Clones, Partitions, and Packs:

RACS and RAPS

Bill Devlin, Jim Cray, Bill Laing, George Spix

Microsoft Research

Dec. 1999

Based on presentation by Hongwei Zhang, Ohio State

Page 2: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

2

Outline

• Introduction

• Basic scalability terminology / techniques

• Software requirements for these scalable systems

• Cost/performance metrics to consider

• Summary

Page 3: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

3

Why need to Scale ?

• Server systems must be able to start small– Small-size company (garage-scale) v.s. international

company (kingdom-scale)

• System should be able to grow as demand grows e.g.– eCommerce made system growth more rapid & dynamic– ASP also need dynamic growth

Page 4: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

4

How to scale?

• Scale up - expanding a system by incrementally adding more devices to an existing node – CPUs, discs, NICS, etc. – inherently limited

• Scale Out – expanding the system by adding more nodes – convenient (computing capacity can be purchased incrementally), no theoretical scalability limit – slogans:

• Buy computing by the slice• Build systems from CyberBricks

– slice, cyberbricks: fundamental building blocks for a scalable system

Page 5: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

5

Basic terminology associated with scale-out

• Ways to organize massive computation– Farm– Geoplex

• Ways to scale a farm– Clone (RACS)– Partition (RAPS)

• Pack

Page 6: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

6

Farm/Geoplex

• Farm - the collection of servers, applications and data at a particular site– features:

• functionally specialized services (email, WWW, directory, database, etc.)

• administered as a unit (common staff, management policies, facilities, networking)

• Geoplex – a replicated (duplicated?) farm at two or more sites– disaster protection– may be

• active-active: all farms carry some of the load;• active-passive: one or more are hot-standbys (waiting for

fail-over of corresponding active farms)

Page 7: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

7

Page 8: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

8

Clone

• A replica of a server or a service• Allows load balancing• External to the clones - IP sprayer like Cisco LocalDirectorTM

• LocalDirectorTM dispatches (sprays) requests to different nodes in the clone to achieve load-balancing

• Internal to the clones - IP sieve like Network Load Balancing in Windows 2000

• Every requests arrive at every node in the clone, but each node intelligently accepts a part of these requests;

• Distributed coordination among nodes

Page 9: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

9

RACS

• DFN:– The collection of clones for a particular service is is called a

RACS (Reliable Array of Cloned Services).• Two types of RACS (Fig. 2)

– Shared-nothing RACS• Each node duplicate all the storage locally

– Shared-disk RACS (also called a cluster)• All the nodes (clones) share a common storage

manager. Stateless servers at different nodes access a common backend storage server

Page 10: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

10

Page 11: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

11

• Advantages of cloning and RACS– Offer both scalability and availability

• Scalability: excellent ways to add processing power, network bandwidth, and storage bandwidth to a farm;

• Nodes can act as backup for one another: one node fail, other nodes continue to offer service (probably with degraded performance)

• Failures could be masked, if node- and application-failure detection mechanisms are integrated with the load-balancing system or with client applications

– Easy to manage• Administrative operations on one service instance at one

node could be replicated to all others.

RACS (contd.)

Page 12: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

12

• Challenges– Shared-nothing RACS

• not a good way to grow storage capacity: updates at one node’s must be applied to all other nodes’ storage

• problematic for write-intensive services: all clones must perform all writes (no throughput improvement) and need subtle coordination

– Shared-disk RACS could ameliorate (to some extent) this cost and complexity of cloned storage;

– Shared-disk RACS• Storage server should be fault-tolerant for availability (only

one copy of data)• Still require subtle algorithms to manage updates (such as

cache validation, lock managers, transaction logs, etc.)

RACS (contd.)

Page 13: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

13

Partition

• DFN:– To grow a service by duplicating the hardware and software but

dividing the data among the nodes (Fig. 3).• Features

– Only the software is cloned, data is divided among the nodes (unlike shared-nothing clone)

– Transparent to applications– Simple partitioning has only one copy of data, thus not improving

availability:• Geoplex to guard against loss of storage• More common: locally duplex (raid 1) or parity protect (raid

5) the storage

Page 14: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

14

Page 15: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

15

• Example– Typically, the application middleware partitions the data and

workload by object:• Mail servers partition by mailboxes• Sales systems partition by customer accounts or product

lines• Challenge

– When a partition (node) is added, the data should be automatically repartitioned among the nodes to balance the storage and computational load.

– The partitioning should automatically adapt as new data is added and as the load changes.

Partition (contd.)

Page 16: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

16

Pack

• Purpose– To deal with hardware/software failure at a partition

• DFN:– Each partition is implemented as a pack of two or more

nodes that provide access to the storage (Fig. 3).

Page 17: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

17

• Two types of Pack– Shared-disk pack

• All members of the pack may access all the disks in the pack;

• Similar to shared-disk clone, except that the pack is serving just one part of the total database.

– Shared-nothing pack• Each member of the pack may serve just one partition of

the disk pool during normal conditions, but serve a failed partition if the partition’s primary server fails;

Pack (contd.)

Page 18: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

18

– Two modes:• Active-active pack:

– each member of the pack can have primary responsibility for one or more partitions;

– When a node in the pack fails, the service of its partition migrates to another node of the pack.

• Active-passive pack:– Just one node of the pack is actively serving

requests while the other nodes are acting as hot-standbys

Shared-nothing Pack (contd.)

Page 19: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

19

RAPS

• DFN:– The collection of nodes that support a packed-partitioned

service are called a RAPS (Reliable Array of Partitioned Services).

• Advantage– Provides both scalability and availability;– Better performance than RACS for write-intensive services.

Page 20: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

20

Brief Summary of Scalability Design

Page 21: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

21

Summary (contd.)

• Clones and RACS– For read-mostly applications with low consistency and modest

storage requirements (<= 100 GB)

• Web/file/security/directory servers

• Partitions and RAPS

– For update-intensive and large database applications (routing requests to specific partitions)

• Email/instant messaging/ERP/record keeping

Page 22: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

22

Example

• Multi-tier applications

(Functional separation)– front-tier:

• Web and firewall services (read mostly)– middle-tier:

• File servers (read mostly)– data-tier:

• SQL servers (update intensive)

Page 23: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

23

Page 24: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

24

Example (contd.)

• Load balancing and routing at each tier– Front-tier

• IP-level load distribution scheme– Middle-tier

• Data and process specific load steering, according to request semantics

– Data-tier

• Routing to the correct partition

Page 25: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

25

Software Requirements for Geoplex, Farms, RACS and RAPS

(more of a wish listwish list than a reflection of current tools and capabilities)

• Be able to manage everything from a single remote console, treating RACS and RAPS as entities– Automated operation software to deal with “normal events”

(summarizing, auditing, etc.) and to help the operator manage exceptional events (detects failures and orchestrates repair, etc.): reduce operator error (thus, enhancing site availability)

Page 26: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

26

Software requirements (contd.)

• Both the software and hardware components must allow online maintenance and replacement– Tools to support versioned software deployment and staging

across a site• Good tools to design user interfaces, services, and databases• Good tools to configure and then load balance the system as it

evolves

Page 27: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

27

• RACS– Automatic replication of software and data to new nodes– Automatic request routing to load balance the work and to

route around failures– Recognize repaired and new nodes

• RAPS– Automatic routing requests to nodes dedicated to serving a

partition of data (affinity routing)– Middleware to provide transparent partitioning and load-

balancing (a application-level service)– Similar manageability features of cloned system (for Pack)

Software requirements (contd.)

Page 28: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

28

Price/Performance Metrics

• Why need cloning/partitioning ? – One cannot buy a single 60 billion-instructions per second

processor or a single 100TB server– So, at least some degree of cloning and partitioning is

required

• What is the right building block for a site?

Page 29: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

29

Right building block

• Mainframe vendors– Mainframe is the right choice!

• Their hardware and software offer high availability• Easier to manage their systems than to manage coned

PCs

– But, mainframe prices are fairly high ! • 3x to 10x more expensive

Page 30: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

30

• Commodity servers and storage– Less costly to use inexpensive clones for those CPU

intensive services, such as web service.– Commodity software is easier to manage than the traditional

services that require skilled operators and administrators• Consensus

– Much easier to manage homogeneous sites (all NT, all FreeBSD, etc.) than to manage heterogeneous sites

• Stats: middleware (such as Netscape, IIS, Notes, Exchange) are where the administrators spend most of their time

Right building block (contd.)

Page 31: 1 Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS Bill Devlin, Jim Cray, Bill Laing, George Spix Microsoft Research Dec. 1999

31

Summary

• Scalability technique– Replicate a service at many nodes

• Simpler forms of replication– Duplicate both programs and data: RACS

• For large databases or update-intensive services– Data partitioned: RAPS– Packs make partitions highly available

• Against disaster– The entire farm is replicated to form a geoplex