1
Scalability & Availability
Paul Greenfield
2
Building Real Systems
Scalable
– Handle expected load with acceptable levels of performance
– Grow easily when load grows
Available
– Available enough of the time
Performance and availability cost
– Aim for 'enough' of each but not more
– Have to be 'architected' in… not added
3
Scalable
Scale-up or…
– Use bigger and faster systems
… scale-out
– Systems working together to handle load
  Server farms
  Clusters
Implications for application design
– Especially state management
– And availability as well
4
Available
Goal is 100% availability
– 24x7 operations
– Including time for maintenance
Redundancy is the key to availability
– No single points of failure
– Spare everything
  Disks, disk channels, processors, power supplies, fans, memory, …
  Applications, databases, …
– Hot standby, quick changeover on failure
5
Performance
How fast is this system?
– Not the same as scalability, but related
– Measured by response time and throughput
How scalable is this system?
– Scalability is concerned with the upper limits to performance
– How big can it grow?
– How does it grow? (evenly, lumpily?)
6
Performance Measures
Response time
– What delay does the user see?
– Instantaneous is good
  95% under 2 seconds is acceptable?
  Consistency is important psychologically
– Response time varies with 'heaviness' of transactions
  Fast read-only transactions
  Slower update transactions
  Effects of resource/database contention
7
Response Time
Each transaction takes…
– Processor time
  Application, system services, database, …
  Shared amongst competing processes
– I/O time
  Largely disk reads/writes
  Large DB caches reduce the number of I/Os
    2TB in IBM's top TPC-C entry
– Wait time for shared resources
  Locks, shared structures, …
8
Response Times
[Chart: performance with contention — response time (ms, 0–14,000) vs number of clients (1–1000); series: Buy, Create, Get HS, Query C, Query ID, Sell, Update]
9
Response Times
[Chart: performance with no contention — response time (ms, 0–3,000) vs number of clients (1–1000); series: Buy, Create, Get HS, Query C, Query ID, Sell, Update]
10
Response Times
[Chart: C++ response times, remote db, identity & key-table — response time (ms, 0–10,000) vs clients (0–1200); series: Read, Update and Average for both identity and key approaches]
11
Throughput
How many transactions can be handled in some period of time
– Transactions/second, or tpm, tph or tpd
– A measure of overall capacity
– Inverse of response time
Transaction Processing Performance Council
– Standard benchmarks for TP systems
– www.tpc.org
– TPC-C models a typical transaction system
  Current record is 4,092,799 tpmC (HP)
– TPC-E approved as TPC-C replacement (2/07)
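The "inverse of response time" bullet can be made precise with Little's law for a closed system: with N concurrent clients, an average response time of R seconds, and think time Z between requests, throughput is X = N / (R + Z). A minimal sketch (the function name and figures are illustrative, not from the slides):

```python
def throughput_tps(concurrent_clients: int, avg_response_time_s: float,
                   think_time_s: float = 0.0) -> float:
    """Little's law for a closed system: X = N / (R + Z)."""
    return concurrent_clients / (avg_response_time_s + think_time_s)

# 200 clients each seeing a 2-second average response time, no think time:
print(throughput_tps(200, 2.0))  # 100.0 tps
```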
12
Throughput
Increases until resource saturation
– Start waiting for resources
  Processor, disk & network bandwidth
  Increasing response time with load
– Slowly decreases with contention
  Overheads of sharing, interference
– Some resources share/overload badly
  Contention for shared locks
  Ethernet network performance degrades
  Disk degrades with sharing
13
Throughput
[Chart: RI x100 closed — TPS (0–1500) vs threads (0–1400); series: ops/avg thread, from HT RT, from run time, observed TPS]
14
System Capacity?
How many clients can you support?
– Name an acceptable response time
– Average 95% under 2 secs is common
  And what is 'average'?
– Plot response time vs number of clients
  Great if you can run benchmarks
– Reason for prototyping and proving proposed architectures before leaping into full-scale implementation
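Plotting response time against client count also lets capacity be read off mechanically. A hypothetical sketch, assuming benchmark samples of (clients, 95th-percentile response time in ms) sorted by increasing load — the sample figures are invented for illustration:

```python
def max_clients(samples, limit_ms=2000):
    """Return the largest benchmarked client count whose 95th-percentile
    response time stays under the limit (default: 2 seconds)."""
    supported = 0
    for clients, p95 in samples:
        if p95 <= limit_ms:
            supported = clients
        else:
            break  # response time only grows with load beyond this point
    return supported

bench = [(50, 300), (100, 700), (200, 1500), (400, 2600), (600, 5200)]
print(max_clients(bench))  # 200
```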
15
System Capacity
[Chart: RI 100x 1-100 — TPS (left axis, 0–1600) and response time (right axis, 0–1200) vs threads (0–1400); series: tps, rt]
16
Scaling Out
More boxes at every level
– Web servers (handling user interface)
– App servers (running business logic)
– Database servers (perhaps… a bit tricky?)
– Just add more boxes to handle more load
Spread load out across boxes
– Load balancing at every level
– Partitioning or replication for database?
– Impact on application design?
– Impact on system management
– All have impacts on architecture & operations
17
Scaling Out
[Diagram: farms of servers in three tiers — UI tier, Business tier, Data tier]
18
'Load Balancing'
A few different but related meanings
– Distributing client bindings across servers or processes
  Needed for stateful systems
  Static allocation of client to server
– Balancing requests across server systems or processes
  Dynamically allocating requests to servers
  Normally only done for stateless systems
19
Static Load Balancing
[Diagram: clients, a name server and server processes. Server processes advertise their services with the name server; a client requests and is returned a server reference (getting a server object reference), then calls the server object's methods directly. Load balancing is across application process instances within a server.]
20
Load Balancing in CORBA
Client calls on name server to find the location of a suitable server
– CORBA terminology for object directory
Name server can spread client objects across multiple servers
– Often 'round robin'
Client is bound to server and stays bound forever
– Can lead to performance problems if server loads are unbalanced
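The round-robin binding described above can be sketched as follows — a toy name server, not a real ORB API, with illustrative server names:

```python
from itertools import cycle

class NameServer:
    """Toy name server: servers advertise themselves, and clients are
    handed references round-robin. The binding is static: a client keeps
    its reference forever, so a 'heavy' client can leave one server
    overloaded while others sit idle."""
    def __init__(self):
        self._servers = []
        self._rr = None

    def advertise(self, server_ref):
        self._servers.append(server_ref)
        self._rr = cycle(self._servers)  # restart rotation over all servers

    def resolve(self):
        return next(self._rr)

ns = NameServer()
ns.advertise("serverA")
ns.advertise("serverB")
bindings = [ns.resolve() for _ in range(4)]
print(bindings)  # ['serverA', 'serverB', 'serverA', 'serverB']
```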
21
Name Servers
Server processes call name server as part of their initialisation
– Advertising their services/objects
Clients call name server to find the location of a server process/object
– Up to the name server to match clients to servers
Client then directly calls server process to create or link to objects
– Client-object binding usually static
22
Dynamic Stateful?
Dynamic load balancing with stateful servers/objects?
– Clients can throw away server objects and get new ones every now and again
  In application code or middleware
  Have to save & restore state
– Or object replication in middleware
  Identical copies of objects on all servers
  Replication of changes between servers
  Clients have references to all copies
23
BEA WLS Load Balancing
[Diagram: clients calling an EJB cluster spanning machines A and B, each running EJB server instances, with a heartbeat via a multicast backbone and a shared DBMS]
24
Threaded Servers
No need for load-balancing within a single system
– Multithreaded server process
  Thread pool servicing requests
– All objects live in a single process space
– Any request can be picked up by any thread
  Used by modern app servers
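In Python the same pattern might look like this, using the standard library's ThreadPoolExecutor as a stand-in for an app server's request pool (the request handler is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# All objects live in one process space, so any worker thread in the
# pool can pick up any incoming request.
def handle_request(request_id: int) -> str:
    return f"handled {request_id}"

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, range(8)))

print(results[0])    # handled 0
print(len(results))  # 8
```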
25
Threaded Servers
[Diagram: clients calling into a COM+ process containing a thread pool, a shared object space and the application code (App DLL) — COM+ uses thread pools rather than load balancing within a single system]
26
Dynamic Load Balancing
Dynamically balance load across servers
– Requests from a client can go to any server
Requests dynamically routed
– Often used for Web server farms
– IP sprayer (Cisco etc)
– Network Load Balancer etc
Routing decision has to be fast & reliable
– Routing is in the main processing path
Applications normally stateless
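One common per-request routing policy is 'least connections': send each request to the server with the fewest requests currently in flight. A toy sketch with illustrative server names (real sprayers do this in hardware or in the kernel, since routing sits in the main processing path):

```python
class Router:
    """Toy dynamic load balancer using a least-connections policy."""
    def __init__(self, servers):
        self.in_flight = {s: 0 for s in servers}

    def route(self):
        # Pick the least-loaded server; ties go to the earliest-listed one.
        server = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[server] += 1
        return server

    def done(self, server):
        self.in_flight[server] -= 1

r = Router(["web1", "web2", "web3"])
first = r.route()   # all idle: picks web1
second = r.route()  # web1 now busy: picks web2
print(first, second)  # web1 web2
```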
27
Web Server Farms
Web servers are highly scalable
– Web applications are normally stateless
  Next request can go to any Web server
  State comes from client or database
– Just need to spread incoming requests
  IP sprayers (hardware, software)
  Or >1 Web server looking at the same IP address with some coordination
28
Clusters
A group of independent computers acting like a single system
– Shared disks
– Single IP address
– Single set of services
– Fail-over to other members of cluster
– Load sharing within the cluster
– DEC, IBM, MS, …
29
Clusters
[Diagram: client PCs connected to servers A and B, which share disk cabinets A and B and exchange a heartbeat for cluster management]
30
Clusters
Address scalability
– Add more boxes to the cluster
– Replication or shared storage
Address availability
– Fail-over
– Add & remove boxes from the cluster for upgrades and maintenance
Can be used as one element of a highly-available system
31
Scaling State Stores?
Scaling stateless logic is easy…
…but how are state stores scaled?
Bigger, faster box (if this helps at all)
– Could hit lock contention or I/O limits
Replication
– Multiple copies of shared data
– Apps access their own state stores
– Change anywhere & send to everyone
32
Scaling State Stores
Partitioning
– Multiple servers, each looking after a part of the state store
  Separate customers A-M & N-Z
  Split customers according to state
– Preferably transparent to apps
  e.g. SQL Server partitioned views
Or a combination of these approaches
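The A-M / N-Z split above amounts to a partitioning function that maps a customer to the server holding their slice of the state store. A minimal sketch (server names are hypothetical):

```python
def partition_for(customer_name: str) -> str:
    """Route a customer to one of two state-store partitions by the
    first letter of their name: A-M on one server, N-Z on the other."""
    first = customer_name[0].upper()
    return "server-AM" if first <= "M" else "server-NZ"

print(partition_for("Alice"))  # server-AM
print(partition_for("Zoe"))    # server-NZ
```

In practice this mapping is best hidden behind the data layer (as SQL Server partitioned views do) so application code never sees which server it hit.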
33
Scaling Out Summary
[Diagram: UI tier — Web server farm (Network Load Balancing); Business tier — application farm (Component Load Balancing); Data tier — database servers (Cluster Services and partitioning), partitioned into districts 1-10 and districts 11-20]
34
Scale-up
No need for load-balancing
– Just use a bigger box
– Add processors, memory, …
– SMP (symmetric multiprocessing)
– May not fix problem!
  Runs into limits eventually
Could be less available
– What happens on failures? Redundancy?
Could be easier to manage
35
Scale-up
eBay example
– Server farm of Windows boxes (scale-out)
– Single database server (scale-up)
  64-processor Sun box (the maximum at the time)
– More capacity needed?
  Easily add more boxes to the Web farm
  Faster DB box? (not available)
  More processors? (not possible)
  Split DB load across multiple DB servers?
– See eBay presentation…
36
Available System
[Diagram: Web clients → Web server farm (load balanced using WLB) → app server farm (using COM+ LB) → database installed on a cluster for high availability]
37
Availability
How much?
– 99%     87.6 hours downtime a year
– 99.9%    8.76 hours a year
– 99.99%   0.876 hours a year
Need to consider operations as well
– Not just faults and recovery time
– Maintenance, software upgrades, backups, application changes
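The downtime figures above follow directly from the 8,760 hours in a (non-leap) year:

```python
HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_hours(availability_pct: float) -> float:
    """Annual downtime permitted by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

print(round(downtime_hours(99.0), 2))   # 87.6
print(round(downtime_hours(99.9), 2))   # 8.76
print(round(downtime_hours(99.99), 2))  # 0.88
```

Each extra 'nine' cuts the downtime budget by a factor of ten, which is why the cost of availability climbs so steeply.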
38
Availability
Often a question of application design
– Stateful vs stateless
  What happens if a server fails?
  Can requests go to any server?
– Synchronous method calls or asynchronous messaging?
  Reduce dependency between components
  Failure-tolerant designs
– And manageability decisions to consider
39
Redundancy = Availability
Passive or active standby systems
– Re-route requests on failure
– Continuous service (almost)
  Recover failed system while the alternative handles the workload
  May be some hand-over time (db recovery?)
  Active standby & log shipping reduce this
– At the expense of 2x system cost…
What happens to in-flight work?
– State recovers by aborting in-flight ops & doing db recovery, but…
40
Transaction Recovery
Could be handled by middleware
– Persistent queues of accepted requests
– Still a failure window though
Large role for client apps/users
– Did the request get lost on failure?
– Retry on error?
Large role for server apps
– What to do with duplicate requests?
– Try for idempotency (repeated txns OK)
– Or track and reject duplicates
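The 'track and reject duplicates' option can be sketched as a server that remembers client-chosen request ids, so a retry after a timeout is a safe no-op (all names here are illustrative):

```python
class PaymentServer:
    """Toy duplicate-rejecting server: each request carries a unique id
    chosen by the client; a retried request is detected and ignored."""
    def __init__(self):
        self.processed = set()
        self.balance = 0

    def apply_payment(self, request_id: str, amount: int) -> bool:
        if request_id in self.processed:
            return False  # duplicate: already applied, reject safely
        self.processed.add(request_id)
        self.balance += amount
        return True

s = PaymentServer()
s.apply_payment("req-1", 100)
s.apply_payment("req-1", 100)  # client retried after a timeout
print(s.balance)  # 100, not 200
```

In a real system the processed-id set would itself need to be persistent and to survive failover, which is where the middleware's persistent queues come in.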
41
Fragility
Large, distributed, synchronous systems are not robust
– Many independent systems & links…
  Everything always has to be working
– Rationale for asynchronous messaging
  Loosen 'coupling' between components
  Rely on guaranteed delivery instead
  May just defer error handling though
– Could be much harder to handle later
To be discussed next time…
42
Example
[Diagram: an example end-to-end system. At the central office, an application server (MTS, with components linked by DCOM/COM) hosts Customer, Payment, Order, Goods, Stock and Delivery components, backed by a database server and an invoice-printing function; Web servers and local client workstations sit alongside. Remote sites connect through routers: remote warehouses (warehouse manager), remote sales offices (user client apps) and an external company accounting system (via a CICS interface). Scripted Web pages reach the Web servers over HTTP via the Internet/intranet; other traffic flows over ORPC and MQ.]