scalabilityavailability


Scalability & Availability

Paul Greenfield


Building Real Systems

Scalable
– Handle expected load with acceptable levels of performance
– Grow easily when load grows
Available
– Available enough of the time
Performance and availability cost
– Aim for ‘enough’ of each but not more
– Have to be ‘architected’ in… not added


Scalable

Scale-up or…
– Use bigger and faster systems
… Scale-out
– Systems working together to handle load
    Server farms
    Clusters
Implications for application design
– Especially state management
– And availability as well


Available

Goal is 100% availability
– 24x7 operations
– Including time for maintenance
Redundancy is the key to availability
– No single points of failure
– Spare everything
    Disks, disk channels, processors, power supplies, fans, memory, ..
    Applications, databases, …
– Hot standby, quick changeover on failure
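A rough way to see why "no single points of failure" matters is to combine component availabilities. The sketch below is not from the slides: it assumes component failures are independent (rarely true in practice), and the function names and the 99% figures are illustrative only.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Availability of a chain in which every component must be up.

    Each component is a single point of failure, so the values multiply.
    """
    return prod(availabilities)


def redundant(availability: float, copies: int) -> float:
    """Availability of `copies` identical spares where one survivor is enough.

    Assumes failures are independent, which is an optimistic assumption.
    """
    return 1 - (1 - availability) ** copies


# Three 99% components in a chain, with and without a spare for each one.
unspared = series(0.99, 0.99, 0.99)
spared = series(*(redundant(0.99, 2) for _ in range(3)))
print(f"no spares: {unspared:.4%}, one spare each: {spared:.4%}")
```

The chain without spares is less available than any single component, while sparing every component pushes the chain above the availability of its parts.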


Performance

How fast is this system?
– Not the same as scalability but related
– Measured by response time and throughput
How scalable is this system?
– Scalability is concerned with the upper limits to performance
– How big can it grow?
– How does it grow? (evenly, lumpily?)


Performance Measures

Response time
– What delay does the user see?
– Instantaneous is good
    95% under 2 seconds is acceptable?
    Consistency is important psychologically
– Response time varies with ‘heaviness’ of transactions
    Fast read-only transactions
    Slower update transactions
    Effects of resource/database contention
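A target such as "95% under 2 seconds" can be checked directly against measured response times. The sketch below uses the nearest-rank percentile definition, which is one of several in common use; the function names and the millisecond units are assumptions made for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_target(samples_ms, pct=95, limit_ms=2000):
    """True if pct percent of response times are at or under limit_ms."""
    return percentile(samples_ms, pct) <= limit_ms
```

A mean can hide a slow tail that a percentile exposes: 18 fast responses and 2 very slow ones may average under the limit and still fail a 95% target.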


Response Time

Each transaction takes…
– Processor time
    Application, system services, database, …
    Shared amongst competing processes
– I/O time
    Largely disk reads/writes
    Large DB caches reduce # of I/Os
        – 2TB in IBM’s top TPC-C entry
– Wait time for shared resources
    Locks, shared structures, …


Response Times

[Chart: "Performance with contention". Response time (ms, axis 0 to 14000) against number of clients (1 to 1000). Series: Buy, Create, Get HS, Query C, Query ID, Sell, Update.]


Response Times

[Chart: "Performance with no contention". Response time (ms, axis 0 to 3000) against number of clients (1 to 1000). Series: Buy, Create, Get HS, Query C, Query ID, Sell, Update.]


Response Times

[Chart: "C++ response times, remote db - identity & keytable". Response time (ms, axis 0 to 10000) against number of clients (0 to 1200). Series: Read ident, Update ident, Average ident, Read key, Update key, Average key.]


Throughput

How many transactions can be handled in some period of time
– Transactions/second or tpm, tph or tpd
– A measure of overall capacity
– Inverse of response time
Transaction Processing Performance Council (TPC)
– Standard benchmarks for TP systems
– www.tpc.org
– TPC-C models typical transaction system
    Current record is 4,092,799 tpmC (HP)
– TPC-E approved as TPC-C replacement (2/07)
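"Inverse of response time" holds exactly for one client issuing requests back to back. For several clients in a closed system, a standard queueing result (the interactive response time law, not taken from the slides) relates throughput X, client count N, response time R and think time Z as X = N / (R + Z). A minimal sketch, with illustrative names and figures:

```python
def closed_system_throughput(clients: int,
                             response_time_s: float,
                             think_time_s: float = 0.0) -> float:
    """Throughput in transactions/second for a closed system.

    Interactive response time law: X = N / (R + Z).
    With one client and no think time this reduces to 1 / R.
    """
    return clients / (response_time_s + think_time_s)


def per_minute(tps: float) -> float:
    """Convert transactions/second to transactions/minute (tpm)."""
    return tps * 60


# One client, 0.5 s per transaction: 2 tps, the inverse of response time.
print(closed_system_throughput(1, 0.5))
# 100 clients, 0.5 s response time, 1.5 s think time between requests.
print(per_minute(closed_system_throughput(100, 0.5, 1.5)))
```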


Throughput

Increases until resource saturation
– Start waiting for resources
    Processor, disk & network bandwidth
    Increasing response time with load
– Slowly decreases with contention
    Overheads of sharing, interference
– Some resources share/overload badly
    Contention for shared locks
    Ethernet network performance degrades
    Disk degrades with sharing


Throughput

[Chart: "RI x100 closed". TPS (axis 0 to 1500) against number of threads (0 to 1400). Series: ops/avg thread, from HT RT, from run time, observed TPS (listed twice in the legend).]


System Capacity?

How many clients can you support?
– Name an acceptable response time
– Average 95% under 2 secs is common
    And what is ‘average’?
– Plot response time vs # of clients
    Great if you can run benchmarks
– Reason for prototyping and proving proposed architectures before leaping into full-scale implementation
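Once a benchmark has produced a response-time figure at each client count, reading off the capacity is mechanical. The sketch below assumes the benchmark reports a 95th-percentile response time per load level; the function name and the sample figures are invented for illustration and are not measurements from these slides.

```python
def max_supported_clients(p95_ms_by_clients: dict[int, float],
                          limit_ms: float = 2000.0) -> int:
    """Largest tested client count for which this load, and every lighter
    tested load, kept the 95th-percentile response time within limit_ms.

    Returns 0 if even the lightest tested load misses the limit.
    """
    supported = 0
    for clients in sorted(p95_ms_by_clients):
        if p95_ms_by_clients[clients] > limit_ms:
            break  # stop at the first load level that misses the target
        supported = clients
    return supported


# Hypothetical benchmark results: clients -> p95 response time in ms.
results = {1: 120, 50: 300, 200: 900, 400: 1900, 600: 3500, 800: 9000}
print(max_supported_clients(results))
```

Stopping at the first failing load level is a deliberate choice: a lucky run at a heavier load should not count as supported capacity.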


System Capacity

[Chart: "RI 100x 1-100". Plotted against number of threads (0 to 1400), with one vertical axis running 0 to 1600 and a second running 0.0 to 1200.0. Series: tps (listed twice), rt (listed three times).]


Scaling Out

More boxes at every level
– Web servers (handling user interface)
– App servers (running business logic)
– Database servers (perhaps… a bit tricky?)
– Just add more boxes to handle more load
Spread load out across boxes
– Load balancing at every level
– Partitioning or replication for database?
– Impact on application design?
– Impact on system management
– All have impacts on architecture & operations


Scaling Out

[Diagram: UI tier, Business tier, Data tier]


‘Load Balancing’

A few different but related meanings
– Distributing client bindings across servers or processes
    Needed for stateful systems
    Static allocation of client to server
– Balancing requests across server systems or processes
    Dynamically allocating requests to servers
    Normally only done for stateless systems


Static Load Balancing

[Diagram: three clients, a name server and two server processes. Labelled interactions: Advertise service; Request server reference; Return server reference; Call server object’s methods; Get server object reference.]

Load balancing across application process instances within a server


Load Balancing in CORBA

Client calls on name server to find the location of a suitable server
– CORBA terminology for object directory
Name server can spread client objects across multiple servers
– Often ‘round robin’
Client is bound to server and stays bound forever
– Can lead to performance problems if server loads are unbalanced
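The static, round-robin binding described above can be sketched in a few lines. This is a toy model only: it is not the CORBA Naming Service API, and the class and method names (`NameServer`, `advertise`, `resolve`) are hypothetical.

```python
class NameServer:
    """Toy name server that binds each client to one server, round robin."""

    def __init__(self):
        self._servers = []    # servers in the order they advertised
        self._bindings = {}   # client id -> server, fixed once assigned
        self._assigned = 0    # number of bindings handed out so far

    def advertise(self, server: str) -> None:
        """Called by a server process during its initialisation."""
        self._servers.append(server)

    def resolve(self, client: str) -> str:
        """Return the client's server, choosing one only on first contact."""
        if client not in self._bindings:
            if not self._servers:
                raise LookupError("no servers have advertised")
            chosen = self._servers[self._assigned % len(self._servers)]
            self._bindings[client] = chosen
            self._assigned += 1
        return self._bindings[client]
```

Because the binding never changes, one busy client keeps loading the same server however idle the others are, which is the imbalance the slide warns about.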


Name Servers

Server processes call name server as part of their initialisation
– Advertising their services/objects
Clients call name server to find the location of a server process/object
– Up to the name server to match clients to servers
Client then directly calls server process to create or link to objects
– Client-object binding usually static


Dynamic Stateful?

Dynamic load balancing with stateful servers/objects?
– Clients can throw away server objects and get new ones every now and again
    In application code or middleware
    Have to save & restore state
– Or object replication in middleware
    Identical copies of objects on all servers
    Replication of changes between servers
    Clients have references to all copies


BEA WLS Load Balancing

[Diagram: clients, an EJB cluster spanning Machine A and Machine B (each running EJB server instances), a heartbeat via multicast backbone, and a DBMS.]


Threaded Servers

No need for load-balancing within a single system
– Multithreaded server process
    Thread pool servicing requests
– All objects live in a single process space
– Any request can be picked up by any thread
Used by modern app servers
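A minimal sketch of the thread-pool model, using Python's standard `concurrent.futures.ThreadPoolExecutor`. The `accounts` dictionary stands in for the shared object space, and the request format is an assumption made for illustration; real app servers manage this inside the middleware.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Shared object space: every worker thread sees the same objects.
accounts = {"alice": 0, "bob": 0}
accounts_lock = threading.Lock()


def handle_request(request):
    """Apply one (account, amount) request and return the new balance."""
    name, amount = request
    # The lock serialises updates; waiting here is the 'wait time for
    # shared resources' component of response time.
    with accounts_lock:
        accounts[name] += amount
        return accounts[name]


def serve(requests, workers=4):
    """Any idle thread in the pool picks up the next request."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, requests))
```

No routing decision is needed because all threads share one process space; the cost is that shared objects must be protected against concurrent updates.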


Threaded Servers

[Diagram: three clients and a COM+ process. Labels: App DLL, COM+, COM+ process, Thread pool, Shared object space, Application code.]

COM+ using thread pools rather than load balancing within a single system


Dynamic Load Balancing

Dynamically balance load across servers
– Requests from a client can go to any server
Requests dynamically routed
– Often used for Web Server farms
– IP sprayer (Cisco etc)
– Network Load Balancer etc
Routing decision has to be fast & reliable
– Routing in main processing path
Applications normally stateless
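One possible per-request routing policy is "least outstanding requests". The sketch below is illustrative only: the slides do not say which policy any particular IP sprayer or Network Load Balancer uses, and the class and method names are hypothetical.

```python
class LeastLoadedBalancer:
    """Routes each request to the server with the fewest requests in flight."""

    def __init__(self, servers):
        self._in_flight = {server: 0 for server in servers}

    def acquire(self) -> str:
        """Pick a server for a new request (ties go to the first listed)."""
        server = min(self._in_flight, key=self._in_flight.get)
        self._in_flight[server] += 1
        return server

    def release(self, server: str) -> None:
        """Record that a request on `server` has completed."""
        self._in_flight[server] -= 1
```

The decision is a cheap lookup because it sits in the main processing path of every request, and it only works this simply when any server can handle any request, that is, when the application is stateless.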


Web Server Farms

Web servers are highly scalable
– Web applications are normally stateless
    Next request can go to any Web server
    State comes from client or database
– Just need to spread incoming requests
    IP sprayers (hardware, software)
    Or >1 Web server looking at same IP address with some coordination


Clusters

A group of independent computers acting like a single system
– Shared disks
– Single IP address
– Single set of services
– Fail-over to other members of cluster
– Load sharing within the cluster
– DEC, IBM, MS, …


Clusters

[Diagram: Client PCs, Server A, Server B, Disk cabinet A, Disk cabinet B, Heartbeat, Cluster management.]


Clusters

Address scalability
– Add more boxes to the cluster
– Replication or shared storage
Address availability
– Fail-over
– Add & remove boxes from the cluster for upgrades and maintenance
Can be used as one element of a highly-available system


Scaling State Stores?

Scaling stateless logic is easy…
…but how are state stores scaled?
Bigger, faster box (if this helps at all)
– Could hit lock contention or I/O limits
Replication
– Multiple copies of shared data
– Apps access their own state stores
– Change anywhere & send to everyone


Scaling State Stores

Partitioning
– Multiple servers, each looking after a part of the state store
    Separate customers A-M & N-Z
    Split customers according to state
– Preferably transparent to apps
    e.g. SQL Server partitioned views
Or combination of these approaches
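The A-M / N-Z split can be sketched as a routing function that maps a customer name to the server holding that partition. The server names below are hypothetical, and in practice this mapping would ideally live in middleware or the database so that it stays transparent to applications.

```python
def partition_for(customer_name: str) -> str:
    """Return the (hypothetical) database server for a customer,
    splitting on the first letter of the name: A-M or N-Z."""
    first = customer_name.strip()[:1].upper()
    if not first.isalpha():
        raise ValueError(f"cannot partition name: {customer_name!r}")
    # Letters that sort after 'M' (including non-ASCII ones) go to N-Z.
    return "db-a-m" if first <= "M" else "db-n-z"


print(partition_for("Greenfield"))
print(partition_for("Nguyen"))
```

A split by initial letter is easy to explain but can be uneven, since names are not uniformly distributed across the alphabet.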


Scaling Out Summary

[Diagram: UI tier: Web server farm (Network Load Balancing). Business tier: Application farm (Component Load Balancing). Data tier: Database servers (Cluster Services and partitioning), labelled Districts 1-10 and Districts 11-20.]


Scale-up

No need for load-balancing
– Just use a bigger box
– Add processors, memory, ….
– SMP (symmetric multiprocessing)
– May not fix problem!
Runs into limits eventually
Could be less available
– What happens on failures? Redundancy?
Could be easier to manage


Scale-up

eBay example
– Server farm of Windows boxes (scale-out)
– Single database server (scale-up)
    64-processor SUN box (max at time)
– More capacity needed?
    Easily add more boxes to Web farm
    Faster DB box? (not available)
        – More processors? (not possible)
        – Split DB load across multiple DB servers?
– See eBay presentation…


Available System

[Diagram: Web clients; Web server farm, load balanced using WLB; App servers farm using COM+ LB; Database installed on cluster for high availability.]


Availability

How much?
– 99%     87.6 hours of downtime a year
– 99.9%   8.76 hours of downtime a year
– 99.99%  0.876 hours of downtime a year
Need to consider operations as well
– Not just faults and recovery time
– Maintenance, software upgrades, backups, application changes
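The downtime figures above follow from the number of hours in a (non-leap) year, 365 x 24 = 8760. A small sketch of the arithmetic, with an illustrative function name:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours per year a system may be down at the given availability."""
    return (100 - availability_pct) / 100 * HOURS_PER_YEAR


for target in (99, 99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% available: {hours:.3f} hours of downtime a year")
```

This budget has to cover planned work such as maintenance, upgrades and backups as well as faults.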


Availability

Often a question of application design
– Stateful vs stateless
    What happens if a server fails?
    Can requests go to any server?
– Synchronous method calls or asynchronous messaging?
    Reduce dependency between components
    Failure tolerant designs
– And manageability decisions to consider


Redundancy = Availability

Passive or active standby systems
– Re-route requests on failure
– Continuous service (almost)
    Recover failed system while alternative handles workload
    May be some hand-over time (db recovery?)
    Active standby & log shipping reduce this
        – At the expense of 2x system cost…
What happens to in-flight work?
– State recovers by aborting in-flight ops & doing db recovery but …
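Re-routing on failure needs some way to decide that the primary is down; a heartbeat with a timeout is one common approach. The sketch below is a simplified model under that assumption. The class name, the 3-second timeout and the explicit `now` parameter (used so the logic can be exercised without real clocks) are all illustrative.

```python
class FailoverRouter:
    """Routes to the primary while its heartbeat is fresh, else the standby."""

    def __init__(self, primary: str, standby: str, timeout_s: float = 3.0):
        self.primary = primary
        self.standby = standby
        self.timeout_s = timeout_s
        self._last_heartbeat = {}  # server name -> time of last heartbeat

    def heartbeat(self, server: str, now: float) -> None:
        """Record that `server` reported itself alive at time `now`."""
        self._last_heartbeat[server] = now

    def is_alive(self, server: str, now: float) -> bool:
        last = self._last_heartbeat.get(server)
        return last is not None and now - last <= self.timeout_s

    def route(self, now: float) -> str:
        """Choose the server that should receive new requests at `now`."""
        if self.is_alive(self.primary, now):
            return self.primary
        if self.is_alive(self.standby, now):
            return self.standby
        raise RuntimeError("no server available")
```

The timeout is the hand-over window: requests sent to a dead primary before the heartbeat expires are the in-flight work that still has to be recovered or retried.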


Transaction Recovery

Could be handled by middleware
– Persistent queues of accepted requests
– Still a failure window though
Large role for client apps/users
– Did the request get lost on failure?
– Retry on error?
Large role for server apps
– What to do with duplicate requests?
– Try for idempotency (repeated txns OK)
– Or track and reject duplicates
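One way to make client retries safe is to have the client attach a unique request id and have the server remember the result for each id it has applied. The sketch below shows the idea with an in-memory table; the class and parameter names are hypothetical, and a real server would need to store the table durably and update it in the same transaction as the data.

```python
class PaymentService:
    """Applies each request id at most once, so retries are harmless."""

    def __init__(self):
        self.balance = 0
        self._results = {}  # request id -> result already returned

    def apply(self, request_id: str, amount: int) -> int:
        """Apply a payment and return the new balance.

        A repeated request id returns the saved result without applying
        the payment again, which makes the operation idempotent.
        """
        if request_id in self._results:
            return self._results[request_id]
        self.balance += amount
        self._results[request_id] = self.balance
        return self.balance
```

Returning the saved result, rather than an error, lets a client that never saw the first reply treat the retry as a normal success.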


Fragility

Large, distributed, synchronous systems are not robust
– Many independent systems & links…
    Everything always has to be working
– Rationale for Asynchronous Messaging
    Loosen ‘coupling’ between components
    Rely on guaranteed delivery instead
    May just defer error handling though
        – Could be much harder to handle later
To be discussed next time…


Example

[Diagram: example distributed system. Labels: Customer, Payment, Order, Goods, Stock and Delivery components; DCOM/COM; MTS; Database server; Application server; Warehouse manager; Print invoices; Web server (x2); Local client workstations; Internet/Intranet; Remote warehouses; Scripted Web pages; Central Office; User client apps (x2); Remote sales offices; Router; Remote sites; HTTP; ORPC; MQ; CICS interface; External company accounting system.]
