Systems Issues for Scalable, Fault Tolerant Internet Services

Yatin Chawathe, Eric Brewer

To appear in Middleware ’98
http://www.cs.berkeley.edu/~yatin/papers/sns-crc.ps

Motivation
• Proliferation of network-based services
• Two critical issues must be addressed by Internet services:
  – System scalability
    • Incremental and linear scalability
  – Availability and fault tolerance
    • 24x7 operation

A Reusable SNS Framework
• Clusters of workstations are ideal for Internet services [FGC+97]
• However, clusters are difficult to manage
  – To ensure linear scalability, the service must distribute load across the cluster
  – The service must grow the cluster as load increases
  – Partial failures within a cluster complicate fault management
Isolate the common requirements of cluster-based Internet applications into a reusable substrate: the Scalable Network Services (SNS) framework

Architecture
[Figure: SNS architecture, showing the SNS Manager, Workers with their Worker Drivers, the Internal Network, and the Outside World.]

Workers
• Workers are grouped into classes. Within a class, workers are identical
• Workers can receive tasks from the outside world or from other workers
• Workers have a simple serial interface for tasks
  – The originator sends a task to the consumer by specifying the class and inputs for the task
  – Tasks are atomic and restartable
  – Worker Drivers present a narrow interface between the SNS substrate and the worker application
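The narrow worker interface can be pictured as follows. This is a minimal sketch, not the SNS code, and it assumes three things. First, a task carries only a worker-class name and a dictionary of inputs. Second, the application supplies a single handler function. Third, the driver runs tasks one at a time. `Task`, `WorkerDriver`, `handler` and `max_attempts` are hypothetical names. The retry loop is safe only because tasks are atomic and restartable.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Task:
    """Hypothetical task: the target worker class plus the task's inputs."""
    worker_class: str
    inputs: Dict[str, Any] = field(default_factory=dict)


class WorkerDriver:
    """Hypothetical driver sitting between the SNS substrate and one application.

    The application only provides `handler`, a function from inputs to a
    result; the driver runs tasks serially and re-runs a failed attempt,
    relying on tasks being atomic and restartable.
    """

    def __init__(self, worker_class: str,
                 handler: Callable[[Dict[str, Any]], Any],
                 max_attempts: int = 3) -> None:
        self.worker_class = worker_class
        self.handler = handler
        self.max_attempts = max_attempts

    def run(self, task: Task) -> Any:
        if task.worker_class != self.worker_class:
            raise ValueError(
                f"task for {task.worker_class!r} sent to {self.worker_class!r}")
        last_error: Optional[Exception] = None
        for _ in range(self.max_attempts):
            try:
                # One task at a time: the serial task interface.
                return self.handler(task.inputs)
            except Exception as exc:
                # Restartable: simply run the same task again.
                last_error = exc
        raise RuntimeError("task failed after retries") from last_error
```

Keeping the application side down to one function is what makes the interface narrow. Launching, locating and retrying workers stay inside the substrate.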

Centralized SNS Manager
• The SNS Manager is intentionally centralized
  – Makes it easier to reason about and implement the various policies
  – “All” we need to do is ensure the fault tolerance of the manager and make sure it is not a performance bottleneck
• Three key functions
  – Resource location
  – Load balancing and scalability
  – Fault tolerance

Resource Location
[Figure: Resource location, showing Workers, multicast beacons, Register and Find/Found messages, and a persistent connection.]
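The figure gives only the message names, so the following is a minimal sketch under stated assumptions. It assumes that the manager advertises its address in a beacon and keeps a table from worker class to worker addresses. It also assumes that Register adds an entry and that Find returns the matching addresses as the Found reply. `Manager`, `beacon`, `register` and `find` are hypothetical names. Real multicast and persistent connections are replaced by plain method calls.

```python
from collections import defaultdict
from typing import Dict, List


class Manager:
    """Hypothetical in-memory model of the SNS Manager's resource table."""

    def __init__(self, address: str) -> None:
        self.address = address
        self.workers: Dict[str, List[str]] = defaultdict(list)

    def beacon(self) -> Dict[str, str]:
        # In SNS the beacon is multicast; here it is just a message value
        # telling listeners where the manager can be reached.
        return {"manager": self.address}

    def register(self, worker_class: str, worker_address: str) -> None:
        # "Register": a worker announces itself under its class.
        if worker_address not in self.workers[worker_class]:
            self.workers[worker_class].append(worker_address)

    def find(self, worker_class: str) -> List[str]:
        # "Find" -> "Found": current addresses of workers in that class.
        return list(self.workers.get(worker_class, []))
```

Usage: a new worker hears `beacon()`, calls `register("distiller", addr)` on the advertised manager, and any originator can then `find("distiller")`.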

Load Balancing
• Load measurement and reporting
  – Each worker examines incoming requests and estimates the “load” they would generate
  – Simplest load metric: queue length at workers
  – Workers periodically report their current load to the SNS Manager
  – The SNS Manager maintains a load history and aggregates load reports from all workers
  – Load reports are piggybacked on manager beacons to the rest of the system
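Here is a minimal sketch of the manager's side of load reporting. It uses queue length as the metric, which is the slide's "simplest load metric". The aggregation rule is an assumption: the mean of each worker's last few reports. The slide does not say how the history is combined. `LoadTable`, `history_len` and the beacon layout are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class LoadTable:
    """Hypothetical manager-side load history, keyed by worker id."""

    def __init__(self, history_len: int = 5) -> None:
        # Keep only the most recent `history_len` reports per worker.
        self.history: Dict[str, Deque[int]] = defaultdict(
            lambda: deque(maxlen=history_len))

    def report(self, worker_id: str, queue_length: int) -> None:
        # Workers call this periodically with their current queue length.
        self.history[worker_id].append(queue_length)

    def aggregate(self) -> Dict[str, float]:
        # Assumed aggregation: mean of the retained reports per worker.
        return {w: sum(h) / len(h) for w, h in self.history.items() if h}

    def beacon(self, manager_address: str) -> dict:
        # Piggyback the aggregated loads on the manager's periodic beacon.
        return {"manager": manager_address, "loads": self.aggregate()}
```

Averaging over a short history smooths out momentary spikes in a single report, at the cost of reacting a little more slowly to real changes.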

Load Balancing
• Each worker makes its own load-balancing decisions locally
• Use lottery scheduling: the number of tickets is inversely proportional to worker load
• Stale load reports can cause oscillations
  – Use a correction factor based on the number of requests sent since the last load report
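Lottery scheduling with a staleness correction can be sketched as follows. This is a minimal version running on an originating worker. The correction is an assumption: every request sent to a worker since its last report counts as one extra queued request. Tickets are set to `1 / (1 + load)`; the `+1` only avoids division by zero for an idle worker. `LotteryBalancer` and its methods are hypothetical names.

```python
import random
from typing import Dict, Optional


class LotteryBalancer:
    """Hypothetical originator-side balancer using lottery scheduling."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self.reported: Dict[str, float] = {}
        self.sent_since_report: Dict[str, int] = {}
        self.rng = rng or random.Random()

    def update_reports(self, loads: Dict[str, float]) -> None:
        # A manager beacon arrived with fresh load reports: reset corrections.
        self.reported = dict(loads)
        self.sent_since_report = {w: 0 for w in loads}

    def tickets(self, worker: str) -> float:
        # Correction factor: requests we already sent since the last report
        # are treated as extra queued work, so a stale "idle" report does
        # not attract every request until the next beacon.
        effective = self.reported[worker] + self.sent_since_report[worker]
        return 1.0 / (1.0 + effective)

    def pick(self) -> str:
        # Draw one winner with probability proportional to its tickets.
        workers = list(self.reported)
        weights = [self.tickets(w) for w in workers]
        choice = self.rng.choices(workers, weights=weights, k=1)[0]
        self.sent_since_report[choice] += 1
        return choice
```

Without the correction, every originator holding the same stale report would favour the same lightly loaded worker until the next beacon. That worker would overload, and the next report would swing traffic away again, which is the oscillation the slide mentions.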

Auto-launch for Scalability
• Worker replication to handle short traffic bursts
  – Multiple workers handle requests in parallel
  – If the load on a class of workers gets too high, the SNS Manager launches a new worker of that class
• Overflow pool for long bursts
  – Non-dedicated set of machines (e.g. users’ desktop machines)
  – When all dedicated nodes are exhausted, harness an overflow node; release it after the burst subsides
  – Useful for incremental scalability
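The following is a minimal sketch of the auto-launch policy for one worker class. It makes two assumptions. First, the decision is driven by the class's aggregated queue length. Second, there are separate high and low thresholds, which keep a load hovering near one threshold from repeatedly launching and releasing workers. The slides do not specify thresholds. The policy draws from dedicated nodes first and uses the overflow pool only after those are exhausted. When load drops, it releases overflow nodes. `Launcher`, `high_water` and `low_water` are hypothetical.

```python
from typing import Dict, List


class Launcher:
    """Hypothetical auto-launch policy for a single worker class."""

    def __init__(self, dedicated: List[str], overflow: List[str],
                 high_water: float, low_water: float) -> None:
        self.free_dedicated = list(dedicated)
        self.free_overflow = list(overflow)
        self.running: Dict[str, str] = {}  # node -> "dedicated" or "overflow"
        self.high_water = high_water
        self.low_water = low_water

    def on_load(self, avg_queue_length: float) -> str:
        """React to the class's aggregated load; return the action taken."""
        if avg_queue_length > self.high_water:
            if self.free_dedicated:
                node = self.free_dedicated.pop(0)
                self.running[node] = "dedicated"
                return f"launch {node}"
            if self.free_overflow:
                # All dedicated nodes are in use: harness an overflow node.
                node = self.free_overflow.pop(0)
                self.running[node] = "overflow"
                return f"launch {node} (overflow)"
            return "saturated"
        if avg_queue_length < self.low_water:
            # The burst has subsided: give an overflow node back.
            for node, kind in list(self.running.items()):
                if kind == "overflow":
                    del self.running[node]
                    self.free_overflow.append(node)
                    return f"release {node}"
        return "steady"
```

Releasing overflow nodes one per decision is a design choice, not something the slides specify. It lets the policy back off gradually if the burst turns out to be only partly over.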

Fault Tolerance
• Starfish fault tolerance
  – “Peer” monitoring as opposed to primary/secondary fault tolerance
• Two mechanisms:
  – Timeouts and retries
  – Preemptive detection and component restart
• Reliance on soft state simplifies crash recovery
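Preemptive detection and restart under peer monitoring can be sketched as follows, with two assumptions. Any message or beacon from a peer counts as a sign of life. A peer that stays silent longer than a fixed timeout is restarted by whichever component is watching it. Time is passed in explicitly so that the behaviour is deterministic. `PeerMonitor`, `heard_from`, `check` and the `restart` callback are hypothetical.

```python
from typing import Callable, Dict, List


class PeerMonitor:
    """Hypothetical starfish-style monitor: components watch their peers."""

    def __init__(self, timeout: float, restart: Callable[[str], None]) -> None:
        self.timeout = timeout
        self.restart = restart
        self.last_seen: Dict[str, float] = {}

    def heard_from(self, peer: str, now: float) -> None:
        # Any beacon or message from a peer counts as a sign of life.
        self.last_seen[peer] = now

    def check(self, now: float) -> List[str]:
        """Restart peers that have been silent for longer than the timeout."""
        dead = [p for p, t in self.last_seen.items() if now - t > self.timeout]
        for peer in dead:
            self.restart(peer)
            # Give the restarted peer a fresh grace period.
            self.last_seen[peer] = now
        return dead
```

Under the peer arrangement, the manager could watch the workers while the workers watch the manager, so no dedicated standby copy is needed, unlike a primary/secondary pair.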

Fault Tolerance
[Figure: Crash recovery, showing Workers, an “AmRestarting” message, and ReRegister messages.]
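The figure shows only the labels AmRestarting and ReRegister. One plausible reading follows, and it is an assumption rather than the documented protocol. A restarted manager comes back with an empty table, announces AmRestarting, and each worker answers by registering again, which rebuilds the soft state. The sketch models that reading. `SoftStateManager`, `Worker` and `on_message` are hypothetical names.

```python
from typing import Dict, Set


class SoftStateManager:
    """Hypothetical manager whose worker table is soft state only."""

    def __init__(self) -> None:
        self.table: Dict[str, Set[str]] = {}

    def register(self, worker_class: str, worker_id: str) -> None:
        self.table.setdefault(worker_class, set()).add(worker_id)

    def crash_and_restart(self) -> str:
        # Nothing was persisted: the restarted manager starts empty and
        # announces itself so that the workers can rebuild its state.
        self.table = {}
        return "AmRestarting"


class Worker:
    """Hypothetical worker that re-registers when the manager restarts."""

    def __init__(self, worker_class: str, worker_id: str) -> None:
        self.worker_class = worker_class
        self.worker_id = worker_id

    def on_message(self, msg: str, manager: SoftStateManager) -> None:
        if msg == "AmRestarting":
            # ReRegister: resend exactly the state the manager lost.
            manager.register(self.worker_class, self.worker_id)
```

Because the manager's table can always be rebuilt from the workers, nothing needs to be written to stable storage or recovered from a log after a crash.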

Example Applications
• TranSend
  – Web proxy for on-the-fly content distillation
• Wingman
  – The world’s only graphical web browser for the 3COM PalmPilot
• TopGun Mediaboard
  – PDA groupware: shared electronic whiteboard for the 3COM PalmPilot
• MARS
  – MBone archive server

Evaluation
[Figure: Load (queue length) of Worker 1 and Worker 2 versus time (0 to 70 seconds).]

Evaluation
[Figure: Queue length of Workers 1 to 5 and offered load (requests/second) versus time (0 to 800 seconds); annotations mark when Worker 2, Worker 3, and Workers 4 & 5 were started.]

Summary
• Reusable architectural substrate for building Internet service applications
• Application developers program their services to a well-defined, narrow interface
• SNS takes care of resource location, spawning, load balancing, and fault tolerance
• A number of interesting applications have been built on top of the SNS substrate
• Next step: SNSv2 (NINJA)
