Post on 13-Dec-2015

Cluster-Based Scalable Network Services

Authors: Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer, Paul Gauthier

Presenter: Kang Cao

Overview

• Introduction
• Cluster-Based Scalable Service Architecture
• Service Implementation
• Measurements
• Discussion
• Conclusion

Introduction

• Goal
• Advantages of clusters
• Challenges of cluster computing
• BASE semantics

Goal

• Scalability
  – Keep the same per-user cost as load increases
• Availability
  – Run 24 hours a day, 7 days a week
• Cost effectiveness

Advantages

• Scalability
  – Clusters are well suited to Internet service workloads
  – Incremental scalability
• High availability
• Commodity building blocks
  – Cheap commodity PCs
  – Get service quickly and cheaply

Challenges

• Administration
• Component vs. system replication
• Partial failures
• Shared state

BASE Semantics

In contrast to ACID (atomicity, consistency, isolation, durability):

• Stale data
• Soft state
• Approximate answers

Cluster-Based Scalable Service Architecture

• Layered architecture
• Separate network services from their implementation
• Stateless workers

Cluster-Based Scalable Service Architecture

• SNS
• TACC
• Service

Scalable Network Service (SNS)

• Incremental and absolute scalability
• Worker load balancing and overflow management
• Front-end availability and fault-tolerance mechanisms
• System monitoring and logging

SNS

[Figure: SNS architecture. Labels: SNS Manager, Internal Network, Front End, MS, Worker Driver, Worker, $, Internet]

Load Balancing

• Centralized load balancing
• Easy to implement

How to Handle Bursts

• An overflow pool is available
• The manager can spawn workers on overflow machines on demand
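
The burst-handling policy on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the SNS implementation: the `Manager` class, the fixed per-worker capacity, and the node names are all hypothetical.

```python
class Manager:
    """Toy manager that spawns workers on overflow nodes during bursts."""

    def __init__(self, dedicated, overflow, capacity_per_worker):
        self.free_dedicated = list(dedicated)   # nodes reserved for the service
        self.free_overflow = list(overflow)     # nodes borrowed only under load
        self.capacity = capacity_per_worker     # assumed requests/s per worker
        self.workers = []                       # (node, is_overflow) pairs

    def rebalance(self, offered_load):
        """Spawn workers until capacity covers the load.

        Returns the overflow nodes currently in use.
        """
        while len(self.workers) * self.capacity < offered_load:
            if self.free_dedicated:
                self.workers.append((self.free_dedicated.pop(0), False))
            elif self.free_overflow:
                self.workers.append((self.free_overflow.pop(0), True))
            else:
                break  # cluster is saturated; nothing left to recruit
        return [node for node, is_overflow in self.workers if is_overflow]


manager = Manager(["n1", "n2"], ["o1", "o2"], capacity_per_worker=10)
normal = manager.rebalance(15)   # fits on the two dedicated nodes
burst = manager.rebalance(25)    # exceeds dedicated capacity, recruits "o1"
```

If the returned list is non-empty most of the time, the dedicated pool is too small for the steady-state load.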

Scalability

• Components are replicated
• The additional resources required are a linear function of the increase in offered load
• Partition functionality between front ends and workers
• Keep workers as simple as possible

Fault Tolerance and Availability

• Process-peer fault tolerance
• Soft state
• Timeouts as an additional fault-tolerance mechanism
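
One way to picture soft state combined with timeouts is a registry that is built only from periodic beacons and forgets any peer that goes quiet. The sketch below is an assumption-laden illustration: `SoftStateRegistry`, the beacon interface, and the 5-second timeout are hypothetical names and values, not taken from the system described here.

```python
class SoftStateRegistry:
    """Tracks live peers purely from periodic beacons; nothing is persisted."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds of silence before a peer is presumed dead
        self.last_seen = {}         # peer name -> time of most recent beacon

    def beacon(self, peer, now):
        self.last_seen[peer] = now  # a beacon is the only way an entry is created

    def live_peers(self, now):
        # Drop peers whose beacons have timed out, then report the rest.
        self.last_seen = {p: t for p, t in self.last_seen.items()
                          if now - t <= self.timeout}
        return sorted(self.last_seen)


registry = SoftStateRegistry(timeout=5.0)
registry.beacon("worker-a", now=0.0)
registry.beacon("worker-b", now=0.0)
registry.beacon("worker-a", now=4.0)      # worker-b stays silent
alive = registry.live_peers(now=6.0)      # worker-b has expired
```

Because the table is rebuilt from beacons, a restarted registry needs no recovery step: it simply waits for the next round of beacons.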

TACC

TACC: Transformation, Aggregation, Caching, Customization

• API for composing stateless data-transformation and content-aggregation modules
• Uniform caching of original, post-aggregation, and post-transformation data
• Transparent access to the customization database
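
The composition idea can be sketched in a few lines. Everything named here (`compose`, `strip_blank_lines`, `truncate`, `CachingService`, the `max_chars` preference) is hypothetical and chosen only for illustration; the real TACC API is not shown in these slides. For brevity the sketch caches only post-transformation data, whereas the slide also mentions caching original and post-aggregation data.

```python
def compose(*stages):
    """Chain stateless workers into a single pipeline, applied left to right."""
    def pipeline(data, profile):
        for stage in stages:
            data = stage(data, profile)
        return data
    return pipeline


def strip_blank_lines(text, profile):
    # Transformation: a stateless function of its input only.
    return "\n".join(line for line in text.splitlines() if line.strip())


def truncate(text, profile):
    # Customization: per-user preferences arrive as arguments, not worker state.
    return text[: profile["max_chars"]]


class CachingService:
    """Caches post-transformation results keyed by (url, user preferences)."""

    def __init__(self, fetch, pipeline):
        self.fetch, self.pipeline, self.cache = fetch, pipeline, {}
        self.misses = 0

    def get(self, url, profile):
        key = (url, tuple(sorted(profile.items())))
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.pipeline(self.fetch(url), profile)
        return self.cache[key]


pages = {"http://example.com/": "hello\n\nworld\n"}   # stand-in for the Internet
service = CachingService(pages.__getitem__, compose(strip_blank_lines, truncate))
result = service.get("http://example.com/", {"max_chars": 8})
```

Since the stages keep no state of their own, any replica of a worker can serve any request, which is what makes them easy to restart and load-balance.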

TACC

A programming model for Internet services

• Transformation
• Aggregation
• Caching
• Customization

Service Implementation

• Workers that present a human interface to what the TACC modules do, including device-specific presentation
• User interface for controlling the service
• Most services can be built at the Service and TACC layers

Example: TranSend

[Figure: TranSend deployment. Labels: modem pool, switch, workstations, Internet]

TranSend

• Front ends
• Load-balancing manager
• User profile database
• Cache nodes
• Datatype-specific distillers
• Graphical monitor

Load Balancing Manager

• Client-side JavaScript balances load across multiple front ends
• A centralized manager handles internal load balancing

Load Balancing

• Components register with the manager
• A front end asks the manager for a worker when it has a task
• The manager locates a worker for the front end
• The manager may create a new distiller
• Workers report their load to the manager

Load Balancing

• The manager periodically broadcasts load information
• Front ends cache this information
• Front ends use the cached information to dispatch requests to workers
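
The broadcast-and-cache scheme above might look like the sketch below. The `FrontEnd` class, the queue-length metric, the least-loaded selection rule, and the local increment between broadcasts are all assumptions made for illustration; the slides do not say which selection policy TranSend uses.

```python
class FrontEnd:
    """Dispatches requests using load hints cached from manager broadcasts."""

    def __init__(self):
        self.load_hints = {}   # worker name -> last reported queue length

    def on_broadcast(self, report):
        # The manager's periodic broadcast replaces the cached (possibly stale) view.
        self.load_hints = dict(report)

    def dispatch(self):
        if not self.load_hints:
            raise RuntimeError("no workers known; ask the manager for one")
        # Assumed policy: pick the worker with the smallest cached load.
        worker = min(self.load_hints, key=self.load_hints.get)
        # Count our own request locally so one worker is not flooded
        # between broadcasts.
        self.load_hints[worker] += 1
        return worker


front_end = FrontEnd()
front_end.on_broadcast({"distiller-1": 2, "distiller-2": 0})
choices = [front_end.dispatch() for _ in range(4)]
```

The cached hints are stale by design, which is acceptable under BASE semantics: a slightly suboptimal choice costs little, and the next broadcast corrects the view.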

Fault Tolerance and Crash Recovery

• BASE semantics simplify crash recovery
• The manager reports worker failures to the front ends
• The manager detects and restarts a crashed front end
• A front end detects and restarts a crashed manager
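
The mutual-restart relationship between the manager and a front end can be modeled very simply. This is a toy sketch: `Process` and `watch` are hypothetical names, and real failure detection (for example, by missed beacons) is reduced to reading an `alive` flag.

```python
class Process:
    """Stand-in for a cluster process that can crash and be restarted."""

    def __init__(self, name):
        self.name, self.alive, self.restarts = name, True, 0

    def restart(self):
        self.alive = True
        self.restarts += 1


def watch(watcher, peer):
    """If the watcher is up and its peer is down, restart the peer."""
    if watcher.alive and not peer.alive:
        peer.restart()   # nothing to restore: soft state is rebuilt after restart


manager, front_end = Process("manager"), Process("front-end")
manager.alive = False               # simulate a manager crash
watch(front_end, manager)           # the front end notices and restarts it
front_end.alive = False             # later, the front end crashes
watch(manager, front_end)           # the restarted manager returns the favor
```

Because each side watches the other, neither is a single point of failure as long as both do not fail at the same moment.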

Performance: Load balancing

Conclusions

• Layered architecture for cluster-based scalable network services
• The architecture is reusable
• Cluster-based value-added network services will become an important Internet-service paradigm

Performance: Scalability

Question 1

• Why are cluster-based network services well suited to Internet services?

Answer:

• The workload is highly parallel (many independent simultaneous users)
• The grain size typically corresponds to at most a few CPU seconds on a commodity PC

Question 2

• Why do cluster-based network services use BASE semantics?

Answer:

• BASE semantics allow us to handle partial failures in clusters with less complexity and cost.

Question 3

• What should be done when the overflow machines are being recruited unusually often?

Answer:

• It is time to add new machines.

Question 4

• Does a front-end crash lose any information? If so, what kind of information is lost?

Answer:

• In-flight user requests are lost; the user must handle the timeout and resend the request.
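
On the client side, handling the timeout and resending could be sketched like this. The helper name `send_with_retry`, the three-attempt limit, and the simulated `flaky_send` transport are assumptions for illustration only.

```python
def send_with_retry(send, request, attempts=3):
    """Resend a request whenever the front end fails to answer in time."""
    for attempt in range(1, attempts + 1):
        try:
            return send(request), attempt
        except TimeoutError:
            continue   # the request was lost with the crashed front end; resend
    raise TimeoutError(f"no response after {attempts} attempts")


# Simulated transport: the first attempt times out, the second succeeds.
outcomes = iter([TimeoutError(), "distilled page"])

def flaky_send(request):
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome

response, attempts_used = send_with_retry(flaky_send, "GET /index.html")
```

Resending is safe here because the workers are stateless, so a repeated request has no side effects to undo.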
