Post on 22-Dec-2015
Advanced Distributed Software Architectures and Technology group
ADSaT
Scalability & Availability
Paul Greenfield, CSIRO
Building Real Systems
• Scalable
  – Fast enough to handle expected load
  – Grow easily when load grows
• Available
  – Available enough of the time
• Performance and availability cost
  – Aim for ‘enough’ of each but not more
Scalable
• Scale-up
  – Bigger and faster systems
• Scale-out
  – Multiple systems working together to handle load
  – Server farms
  – Clusters
• Implications for application design
Available
• Goal is 100% availability
  – 24x7 operations
• Redundancy is the key
  – No single points of failure
  – Spare everything
    • Disks, disk channels, processors, power supplies, fans, memory, …
• Automated fail-over and recovery
Performance
• How fast is this system?
  – Not the same as scalability but related
    • Scalability is concerned with the limits to possible performance
  – Measured by response time and throughput
  – Aim for enough performance
    • Have a performance target
    • Tune and add hardware until target hit
    • Then worry about tomorrow…
Performance Measures
• Response time
  – What delay does the user see?
  – Instantaneous is good but 95% under 2 seconds is acceptable
  – Response time varies with ‘heaviness’ of transactions
    • Fast read-only transactions
    • Slower update transactions
    • Effects of database contention
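A minimal sketch of how the "95% under 2 seconds" target could be checked against measured response times. The function names and the sample values are illustrative assumptions, not data from this talk; the percentile uses the nearest-rank method.

```python
import math

def fraction_under(samples_ms, limit_ms):
    """Fraction of response times at or below limit_ms."""
    return sum(1 for s in samples_ms if s <= limit_ms) / len(samples_ms)

def percentile_nearest_rank(samples_ms, pct):
    """Smallest sample that has at least pct% of all samples at or below it."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def meets_target(samples_ms, pct=95, limit_ms=2000):
    """True if the pct-th percentile response time is within limit_ms."""
    return percentile_nearest_rank(samples_ms, pct) <= limit_ms

# Made-up mix: many fast reads, some slower updates, a few contended ones.
samples = [120] * 90 + [900] * 6 + [3500] * 4
print(fraction_under(samples, 2000), meets_target(samples))
```

A plain average would hide the slow tail here, which is why the target is stated as a percentile.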
Response Times
[Figure: Keytable performance. Response time (ms, 0 to 14000) vs. number of clients (1 to 1000). Series: Buy, Create, Get HS, Query C, Query ID, Sell, Update.]
Response Times
[Figure: Identity performance. Response time (ms, 0 to 3000) vs. number of clients (1 to 1000). Series: Buy, Create, Get HS, Query C, Query ID, Sell, Update.]
Response Times
[Figure: C++ response times, remote db, identity & keytable. Response time (ms, 0 to 10000) vs. number of clients (0 to 1200). Series: Read ident, Update ident, Average ident, Read key, Update key, Average key.]
Throughput
• How many transactions can be handled in some period of time
  – Transactions/second or tpm, tph or tpd
  – A measure of overall capacity
• Transaction Processing Performance Council (TPC)
  – Standard benchmarks for TP systems
  – TPC-C for typical transaction system
  – www.tpc.org
  – Current record is 227,000 tpmC
Throughput
• Throughput increases until some resource limit is hit
  – Adding more clients just increases the response time
  – Run out of processor, disk bandwidth, network bandwidth
  – Some resources overload badly
    • Ethernet network performance degrades
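The saturation behaviour described above can be sketched with the standard asymptotic bounds for a closed system (Little's law applied to clients that think, submit, and wait). This is a rough model under stated assumptions, not a description of the measured systems: `capacity_tps`, `base_response_s` and `think_time_s` are hypothetical parameters, and the model ignores resources that degrade when overloaded.

```python
def throughput_bound(clients, capacity_tps, base_response_s, think_time_s):
    """Upper bound on throughput: grows with clients until the bottleneck is full."""
    return min(clients / (base_response_s + think_time_s), capacity_tps)

def response_time_bound(clients, capacity_tps, base_response_s, think_time_s):
    """Lower bound on response time: flat at first, then linear in the client count."""
    return max(base_response_s, clients / capacity_tps - think_time_s)

# Assumed figures: bottleneck handles 100 tps, 0.5 s unloaded response, 1.5 s think time.
for n in (100, 400, 800):
    print(n,
          throughput_bound(n, 100, 0.5, 1.5),
          response_time_bound(n, 100, 0.5, 1.5))
```

With these assumed figures, going from 400 to 800 clients leaves the throughput bound at 100 tps while the response-time bound rises from 2.5 s to 6.5 s.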
Throughput
[Figure: C++ transaction rates. TPS (0 to 500) vs. number of client threads (0 to 1200). Series: Local keytable, Local Identity, Remote identity 10M, Remote identity 100M, Remote keytable 100M.]
System Capacity
• How many clients can you support?
  – Name an acceptable response time
  – Average 95% under 2 secs is common
    • And what is ‘average’?
  – Plot response time vs # of clients
    • Great if you can run benchmarks
  – Reason for prototyping and proving proposed architectures before leaping into full-scale implementation
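One way to turn a response-time-versus-clients benchmark into a capacity figure is sketched below. The function name and the measurements are made up for illustration; they are not read off the charts in this talk.

```python
def max_clients_within_target(measurements, target_ms):
    """Largest tested client count whose response time, and that of every
    smaller tested count, stays within target_ms. Returns None if none does.

    measurements: iterable of (clients, response_ms) pairs in any order.
    """
    best = None
    for clients, response_ms in sorted(measurements):
        if response_ms > target_ms:
            break
        best = clients
    return best

# Hypothetical benchmark results: (client count, response time in ms).
runs = [(1, 40), (50, 60), (100, 150), (200, 700), (400, 2600), (800, 7000)]
print(max_clients_within_target(runs, target_ms=2000))
```

The answer is only as fine-grained as the tested client counts, so the real limit lies somewhere between the last passing run and the first failing one.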
System Capacity
[Figure: C++ average response times. Response time (ms, 0 to 9000) vs. number of client threads (0 to 1200). Series: Local keytable, Remote keytable, Local identity, Remote identity.]
Load Balancing I
• A few different but related meanings
• 1. Balancing across server processes
  – CORBA-style where clients use objects that live inside server processes
  – Want all server processes to be busy
  – Client calls have to go to the process containing their object, even if this process is busy and others are idle
Load Balancing I
[Figure: Simple load balancing. %Load (0 to 14) vs. number of servers (0 to 30). Series: No Load Balancing, Load Balanced.]
Load Balancing I
• Client calls on name server to find the location of a suitable server
• Name server can spread client objects across multiple servers
  – Often ‘round robin’
• Client is bound to server and stays bound forever
  – Can lead to performance problems
Load Balancing I
Initial

Server object reference   Client numbers                    Total clients per server object
1                         1-100                             100
2                         101-200                           100
3                         201-300                           100
4                         301-400                           100
5                         401-500                           100

Later

Server object reference   Client numbers                    Total clients per server object
1                         1-100, 201, 206, 211, …, 496      160
2                         101-200, 202, 207, 212, …, 497    160
3                         203, 208, 213, …, 498             60
4                         204, 209, 214, …, 499             60
5                         205, 210, 215, …, 500             60
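The numbers in the "Later" table can be reproduced with a short simulation. This is one reading of the tables, stated as an assumption: clients 1-200 stay bound to their original server objects, while clients 201-500 obtain fresh references that are handed out round robin across all five server objects.

```python
from collections import Counter
from itertools import cycle

SERVERS = [1, 2, 3, 4, 5]

# Initial binding: blocks of 100 clients per server object.
binding = {client: (client - 1) // 100 + 1 for client in range(1, 501)}

# Assumed event: clients 201-500 ask again and are spread round robin,
# while clients 1-200 remain bound to server objects 1 and 2.
next_server = cycle(SERVERS)
for client in range(201, 501):
    binding[client] = next(next_server)

load = Counter(binding.values())
print(sorted(load.items()))
```

Round robin spreads the new bindings evenly, but it takes no account of the clients that are already bound, so server objects 1 and 2 end up with 160 clients each against 60 for the others.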
Load Balancing I
• Solution to static allocation problem is for clients to throw away their server objects and get new ones every now and again
• Application coding problem
  – And can objects be discarded?
  – What kind of ‘objects’ are they if they can be discarded?
Name Servers
• Server processes call name server when they come up
  – Advertising their services
• Clients call name server to find the location of a server process
  – Up to the name server to match clients to servers
• Client calls server process to create objects
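The advertise and lookup steps above could look roughly like this. It is an in-process sketch only: `NameServer`, `advertise` and `resolve` are hypothetical names, not the CORBA Naming Service API, and round robin is assumed as the matching policy.

```python
class NameServer:
    """Toy registry: servers advertise a service, clients resolve it to a location."""

    def __init__(self):
        self._servers = {}  # service name -> list of advertised locations
        self._next = {}     # service name -> index of the next location to hand out

    def advertise(self, service, location):
        """Called by a server process when it comes up."""
        self._servers.setdefault(service, []).append(location)
        self._next.setdefault(service, 0)

    def resolve(self, service):
        """Called by a client; hands out advertised locations round robin."""
        servers = self._servers.get(service)
        if not servers:
            raise LookupError(f"no server advertises {service!r}")
        index = self._next[service] % len(servers)
        self._next[service] = index + 1
        return servers[index]

names = NameServer()
names.advertise("orders", "host-a:9000")
names.advertise("orders", "host-b:9000")
print([names.resolve("orders") for _ in range(3)])
```

After `resolve` returns, the client talks to the server process directly; the name server takes no part in later calls.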
Load Balancing I
[Figure: Load balancing across processes within a server. Elements: three clients, a name server, two server processes. Labeled steps: advertise service; request server reference; return server reference; get server object reference; call server object’s methods.]
Load Balancing II
• What happens when our single system is full?
  – Use faster systems
    • Scale-up
  – Use additional systems
    • Scale-out
    • Now load-balancing is used to spread load across systems
Load Balancing II
• CORBA world…
  – Name server can distribute across server processes running on different systems
  – Scales well…
    • Name server only involved when handing out a reference to a server, not on every method call
Load Balancing II
[Figure: Load balancing across multiple systems. Elements: three clients, a name server, two server processes. Labeled steps: advertise service; request server reference; return server reference; get server object reference; call server object’s methods.]
Load Balancing II
• COM+ world…
  – No need for load-balancing within a system
    • Multithreaded server process
    • All objects live in a single process space
  – Component load balancing across systems
    • Client calls router when creating object
    • Router returns reference to an object in a COM+ server process
    • Load balanced at time of object creation
Load Balancing II
[Figure: COM+/MTS using thread pools rather than load balancing within a single system. Elements: three clients, DCOM/MTS, an MTS process containing a thread pool, a shared object space and application code (App DLL).]
COM+ Component Load Balancing
[Figure: COM+ CLB balancing load across multiple systems. Elements: three clients, a router with a response time tracker. Labeled steps: create object; pass request to server; create object and pass back reference; call object’s methods.]
Load Balancing II
• COM+ scales well…
  – Router only involved when object is created
    • May change in later release to support dynamic re-balancing as server load changes
  – Method calls direct from client to server
  – Allocation based on response time rather than round-robin
    • Allocate to least-loaded server
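Allocation by response time rather than round robin can be sketched as follows. This is not the COM+ CLB algorithm or API: `ResponseTimeRouter`, its methods, and the use of an exponentially weighted moving average as the load measure are all assumptions made for illustration.

```python
class ResponseTimeRouter:
    """Toy router: consulted only at object creation, picks the fastest server."""

    def __init__(self, servers, smoothing=0.5):
        self._avg_ms = {server: 0.0 for server in servers}
        self._smoothing = smoothing

    def report(self, server, response_ms):
        """Fold a new response-time sample into the server's moving average."""
        old = self._avg_ms[server]
        self._avg_ms[server] = (1 - self._smoothing) * old + self._smoothing * response_ms

    def create_object(self):
        """Return the server with the lowest tracked response time.
        Ties go to the server listed first."""
        return min(self._avg_ms, key=self._avg_ms.get)

router = ResponseTimeRouter(["app1", "app2", "app3"])
router.report("app1", 100)
router.report("app2", 20)
router.report("app3", 60)
print(router.create_object())
```

Because the router is consulted only when an object is created, a server that slows down later keeps the objects it already hosts; only new objects are steered elsewhere.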
Load Balancing II
• No name server in COM world?
  – COM/MTS clients ‘know’ the name of the server
    • Set at client installation time
    • Can change using GUI tools
    • Admin problem if server app is moved
  – COM+ uses Active Directory to find services
Load Balancing II
• Some systems involve the router in every method call/request
  – Request goes to router process, which then passes it on to a server process
  – Scales poorly as the router can be a major bottleneck
  – Some availability concerns as well
    • What happens if the router fails?
Load Balancing II
[Figure: Load balancing with router in main call path. Elements: three clients, a router, two server processes.]
Scale-up
• No need for load-balancing across systems
• Just use a bigger box
  – Add processors, memory, …
  – SMP (symmetric multiprocessing)
• Runs into limits eventually
• Could be less available
Scale-up
• Example from the Web
  – Large auction site
  – Server farm of NT boxes (scale-out)
  – Single database server (scale-up)
    • 64-processor Sun box
  – More capacity needed?
    • Add more NT boxes easily
    • Sun box is full so have to shift some databases to another box
Clusters
• A group of independent computers acting like a single system
  – Shared disks
  – Single IP address
  – Single set of services
  – Fail-over to other members of cluster
  – Load sharing within the cluster
  – DEC, IBM, MS, …
Clusters
[Figure: Cluster layout. Elements: client PCs, Server A, Server B, Disk cabinet A, Disk cabinet B, a heartbeat link, cluster management.]
Clusters
• Address scalability
  – Add more boxes to the cluster
• Address availability
  – Fail-over
  – Add & remove boxes from the cluster for upgrades and maintenance
• Can be used as one element of a highly-available system
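Heartbeat-driven fail-over between cluster members can be sketched like this. It is a toy model, not how any particular cluster product works: `HeartbeatMonitor`, its methods and the timeout value are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per node and decides which node owns a service."""

    def __init__(self, nodes, timeout_s):
        self._timeout_s = timeout_s
        self._last_seen = {node: 0.0 for node in nodes}

    def heartbeat(self, node, now_s):
        """Record that a node was alive at time now_s."""
        self._last_seen[node] = now_s

    def live_nodes(self, now_s):
        """Nodes whose last heartbeat is no older than the timeout."""
        return [node for node, seen in self._last_seen.items()
                if now_s - seen <= self._timeout_s]

    def owner(self, preferred, now_s):
        """The preferred node if it is alive, else the first surviving node, else None."""
        live = self.live_nodes(now_s)
        if preferred in live:
            return preferred
        return live[0] if live else None

monitor = HeartbeatMonitor(["Server A", "Server B"], timeout_s=3.0)
monitor.heartbeat("Server A", now_s=10.0)
monitor.heartbeat("Server B", now_s=10.0)
monitor.heartbeat("Server B", now_s=14.0)    # Server A has gone quiet
print(monitor.owner("Server A", now_s=15.0)) # fails over to Server B
```

Taking a box out for maintenance looks the same to the monitor as a failure: its heartbeats stop and the service moves to a surviving node.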
Web Server Farms
• Web servers are highly scalable
  – Web applications are normally stateless
    • Next request can go to any Web server
    • State comes from client or database
  – Just need to spread incoming requests
    • IP sprayers (hardware, software)
    • >1 Web server looking at same IP address with some coordination (see MS WLB docs)
  – Same technique for other network apps
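A small sketch of why stateless Web servers are easy to spread load across: each request carries its session identifier and all state lives in a shared store, so any server can take the next request. `WebServer`, `Sprayer` and the dictionary standing in for the database are illustrative assumptions.

```python
import itertools

class WebServer:
    """Keeps no per-client state; everything comes from the request or the database."""

    def __init__(self, name, database):
        self.name = name
        self._db = database  # shared store, here just a dict

    def handle(self, request):
        cart = self._db.setdefault(request["session"], [])
        cart.append(request["item"])
        return {"served_by": self.name, "cart": list(cart)}

class Sprayer:
    """Spreads incoming requests across the servers in round-robin order."""

    def __init__(self, servers):
        self._servers = itertools.cycle(servers)

    def dispatch(self, request):
        return next(self._servers).handle(request)

database = {}
farm = Sprayer([WebServer("web1", database), WebServer("web2", database)])
first = farm.dispatch({"session": "s1", "item": "book"})
second = farm.dispatch({"session": "s1", "item": "pen"})
print(first, second)
```

The second request lands on a different server and still sees the first item, because the cart lives in the shared store rather than in either server.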
Available System
[Figure: Web clients connect to Web servers load balanced using Convoy; app servers use COM+ LB, with a COM+ LBS router node; the database is installed on a Wolfpack cluster for high availability.]
Availability
• How much?
  – 99% means 87.6 hours of downtime a year
  – 99.9% means 8.76 hours of downtime a year
  – 99.99% means 0.876 hours (about 53 minutes) of downtime a year
• Need to consider operations as well
  – Maintenance, software upgrades, backups, application changes
  – Not just faults and recovery time
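The downtime figures above follow from simple arithmetic on a year of 365 × 24 = 8760 hours. A minimal sketch (the function name is an assumption, and leap years are ignored):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_hours_per_year(availability_pct):
    """Hours per year a system may be unavailable at the given availability."""
    return (100 - availability_pct) / 100 * HOURS_PER_YEAR

for pct in (99, 99.9, 99.99):
    print(f"{pct}% -> {downtime_hours_per_year(pct):.3f} hours a year")
```

The budget covers planned outages too, so an upgrade window of a few hours can use up most of a 99.9% allowance on its own.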
Availability and Scalability
• Often a question of application design
  – Stateful vs stateless
    • What happens if a server fails?
    • Can requests go to any server?
  – What language and database API
    • Balance cost vs speed (VB/C++, ODBC/ADO)
  – Synchronous method calls or asynchronous messaging?
    • Reduce dependency between components
    • Failure tolerant designs
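The difference asynchronous messaging makes can be sketched with an in-process queue standing in for a message queue product. All names here are illustrative assumptions; the point is that the sender only needs the queue to be available, not the component that does the work.

```python
import queue
import threading

def start_worker(inbox, handle):
    """Start a thread that processes messages from inbox until it sees None."""
    def run():
        while True:
            message = inbox.get()
            if message is None:      # sentinel: stop the worker
                inbox.task_done()
                break
            handle(message)
            inbox.task_done()
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker

inbox = queue.Queue()
processed = []

# The sender queues its requests and carries on; no worker is running yet.
for order in ("order-1", "order-2", "order-3"):
    inbox.put(order)

# The worker comes up later and drains the backlog in order.
worker = start_worker(inbox, processed.append)
inbox.put(None)
inbox.join()
worker.join(timeout=5)
print(processed)
```

With a synchronous method call the sender would have failed or blocked while the worker was down; here the requests simply wait in the queue.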
Next Week
• Distributed application architectures
  – How to design systems that will work, scale and be available
  – Web-based systems
  – Web technology