Architecture of Distributed Systems 2011/2012
22-Sep-11 Johan J. Lukkien, [email protected]
TU/e Informatica, System Architecture and Networking
Distributed systems concepts & concerns
Johan Lukkien
Related to chapters 1-2: Tanenbaum & van Steen, Distributed Systems, Principles and Paradigms
Goals of this presentation
• Understanding extra-functional concerns guiding
design choices in distributed systems.
• In particular, understanding the concept of
scalability and the various forms it can take.
Definition
• Distributed system (Tanenbaum, van Steen): a collection of independent computers that appears to its users as a single, coherent system
• Distributed system (Lukkien): the hard- and software of a collection of independent computers that cooperate to realize some functionality
• Notes
– may be: serving a user, but users may not be involved directly
• e.g. DNS, DHCP
– independent:
• Concurrent
• Independent failure
• No shared clock
• Autonomous
• Heterogeneous (different capabilities and capacities)
• (Spatial distribution)
– individual parts cannot realize that functionality just by themselves
Hence,
• Distributed system (Lamport): a system in which the failure of a computer you didn't even know existed can render your own computer unusable
– quote from a May ‘87 email
Motivation for distributed systems
• As requirement
– distribution of the physical environment is given as part of the technical environment (a constraint)
• physical separation of data and access to data
– different administrative domains
– physical need for separation (body sensors, video surveillance)
• ‘fact of life’ – internet connected machines
• As solution
– introduce distribution for addressing extra-functional properties
• concurrency – e.g. performance
• replication – e.g. reliability
• security – e.g. separate access control from web-page access
• modularity, specialization – ease of design
Extra-functional aims in architectures of distributed systems
• Sharing, and independent development
– share functionality and expertise
• compose applications from distributed functionality, maintained by specialized parties
– share resources and data
• e.g. storage, files, centralized compute power
– ....particularly when the distribution is a given constraint
• Transparency, when needed
– hide internal details with respect to some viewpoint
– ....particularly the consequences of distribution and multiple access
• Scalability
– in size: dealing with large numbers of machines, users, tasks, ...
– in location: dealing with geographical distribution and mobility
– in administration: addressing data passing through different regions of ownership
• Performance
– by exploiting concurrency
– balancing storage, communication, computation
Possible transparencies in a Distributed System
Transparency – Description
Access – Hide differences in data representation and how a resource is accessed
Location – Hide where a resource is located
Migration – Hide that a resource may move to another location
Relocation – Hide that a resource may be moved to another location while in use
Replication – Hide that a resource is replicated
Concurrency – Hide that a resource may be shared by several competitive users
Failure – Hide the failure and recovery of a resource
Persistence – Hide whether a (software) resource is in memory or on disk
Scalability
• .... being able to size things up....
• Examples: scalable algorithms, scalable networks, scalable filesystems
– what does: ‘this parallel algorithm scales linearly’ mean?
– when is a computer network scalable? or a filesystem?
Example
• “For this discussion, file system scalability is defined as the ability to
support very large file systems, large files, large directories and large
numbers of files while still providing I/O performance. (A more
detailed discussion of performance is contained in a later section.)”
• “By comparison, the UFS file system was designed in the early 1980’s
at UC Berkeley when the scalability requirements were much different
than they are today. File systems at that time were designed as much
to conserve the limited available disk space as to maximize
performance. This file system is also frequently referred to as FFS or
the "fast" file system. While numerous enhancements have been
made to UFS over the years to overcome limitations that appeared as
technology marched on, the fundamental design still limits its
scalability in many areas.”
• http://linux-xfs.sgi.com/projects/xfs/papers/xfs_white/xfs_white_paper.html
Scalability framework
• scale parameter, or size: k
– k is carried through into all considered system aspects of interest together
– e.g. # clients, # servers
• scalability metric, m(k): measure of the system at scale k
– measure of a quality property (of a system, of an algorithm, ....)
• e.g. response time, network diameter, bandwidth between pairs, bisection bandwidth, reliability, utilization, number of operations, cost (money)
• scalability criterion, Z(k)
• scalability is defined as a relation between m(k) and Z(k)
– e.g. m(k) ≥ Z(k), m(k) ~ Z(k), m(k) ≤ Z(k), m(k) → Z(k)
– this relation may have bounds – a range for which the scaling is considered
• Often, Z(k) is not made explicit
– e.g. “system 1 scales better than system 2”: m1(k) ≤ m2(k)
– or: “this system does not scale”: the shape of m is (subjectively) discouraging
Example: Google File System
• Above: model in the deployment view (process view) of GFS
• GFS aims at efficiently and reliably managing many extremely large files for many clients, using commodity hardware
– hence, many parameters for which scalability questions can be asked
Picture from ‘The Google File System’, by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, published at http://labs.google.com/papers/gfs-sosp2003.pdf (download June 2011)
GFS: scaling with number of clients
• k: # clients
• m(k): aggregated read (write, append) speed, assuming random file access
• Z(k): (not explicitly mentioned): the closer to network limit, the better
• Notes:
– scalability here says something about how efficiently resources are used (utilization)
– explain the shape of the Network limit curve (think of the physical view)
– what are shapes that indicate bad scalability?
– what are alternative scalability metrics (mention three)?
Scalability framework
• Scalability is always with respect to the scalability metric and the criterion (and, in fact, on the way the scaling factor k is taken into account).
– ‘This system is scalable’ is a rather pointless expression (or underspecified)
– always investigate ‘what scales with what’
• reference: compare with k = k0 to see the dependence on k:
– examine m(k)/m(k0) or m(k0)/m(k)
• (depending on behavior, e.g. whether m is increasing or decreasing with k)
• linear scalability: m(k)/m(k0) ≥ f·k/k0 (where f is a positive number)
– dividing by m(k0) can be regarded as normalization (e.g. k0 = 1)
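The framework above can be made concrete with a short sketch in Python. All names, the constant f, and the sample metric are illustrative assumptions for this sketch, not part of the slides:

```python
# Scalability framework sketch: a metric m(k) is compared against a
# criterion Z(k); here Z(k) = f * (k/k0) * m(k0) expresses linear
# scalability after normalizing by the reference scale k0.

def is_linearly_scalable(m, k0, ks, f=1.0):
    """Check m(k)/m(k0) >= f * (k/k0) for all scales k in ks."""
    base = m(k0)
    return all(m(k) / base >= f * k / k0 for k in ks)

# Illustrative metric: aggregate throughput with diminishing returns.
def throughput(k):
    return k / (1 + 0.01 * k)

# Scales linearly (with f = 0.5) up to roughly k = 100, but not beyond.
assert is_linearly_scalable(throughput, 1, range(1, 100), f=0.5)
assert not is_linearly_scalable(throughput, 1, range(1, 200), f=0.5)
```

The factor f and the bounded range of k correspond exactly to the slide's remark that the scaling relation may only hold over a range.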
Example: matrix multiplication
• Executed on a network of connected (processor, memory) pairs
• Each process(or) performs the computations required for the part of C it is assigned to
for i := 0 to N-1 do
for j := 0 to N-1 do
C[i,j] := DotProduct (A[i,*], B[*,j])
• To that end it needs to receive parts of A and B stored elsewhere
• Then, an approximation of the time to execute on P processors is
T(P,N) = 2fN³/P + 2cN²
• f = time for a computation step (a multiplication and an addition)
• c = time for communication of an element – simple communication model
• assumes that entire A and B matrices need to be communicated (ignores local storage)
• additional delays, including possible communication delays, ignored
• assumes serialization of computation and communication
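As a quick sanity check, the model T(P,N) = 2fN³/P + 2cN² can be evaluated directly; the constants f and c below are invented for illustration:

```python
# Execution-time model from the slide: the computation term scales down
# with P, the communication term does not (all of A and B must still be
# moved over the network).

def T(P, N, f=1e-9, c=1e-7):
    """Approximate time for the distributed matrix multiplication."""
    return 2 * f * N**3 / P + 2 * c * N**2

# Adding processors helps, but the communication term 2cN^2 is a floor
# that no number of processors can push below.
assert T(4, 1000) < T(1, 1000)
assert T(10**6, 1000) > 2 * 1e-7 * 1000**2
```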
Scalability metrics
• T(P,N) is a scalability metric: examine scaling as function of P and N
– as function of N this represents the complexity of the parallel algorithm
• m(N) = T(P,N), follows theoretically the performance of the sequential algorithm
• though a ‘limited-scalable’ implementation might limit values of N (see below)
– as function of P, normalization leads to the speedup
• S(P,N) = T(1,N) / T(P,N)
• T(1,N) has no communication costs
• The speedup can be examined in both dimensions
– S(P,N) = 2fN³ / T(P,N) = P / (1 + cP/(fN))
• Scalability as function of P:
– e.g. require S(P,N) ≥ Z(P) = 0.6P; this gives a range for each N
• Scalability questions as function of N:
– will the speedup converge to P for large N?
– How fast? How large must N be to be close to P?
– How many processors can reasonably be used for a size N problem?
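The range of P for which S(P,N) ≥ 0.6P can also be found numerically. The sketch below assumes an illustrative platform ratio c/f = 100; the function names are made up for this sketch:

```python
# Speedup model from the slide: S(P,N) = P / (1 + (c/f) * P / N).
# The criterion Z(P) = 0.6 * P holds only up to some maximum P,
# and that maximum grows with the problem size N.

def speedup(P, N, cf_ratio=100.0):
    return P / (1 + cf_ratio * P / N)

def max_processors(N, cf_ratio=100.0, target=0.6):
    """Largest P with speedup(P, N) >= target * P."""
    P = 1
    while speedup(P + 1, N, cf_ratio) >= target * (P + 1):
        P += 1
    return P

# Analytically P_max ~ N * (1/target - 1) / (c/f); doubling N
# roughly doubles the usable processor range.
assert max_processors(1600) == 10
assert max_processors(3200) == 21
```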
Scalability analysis
• Possible scalability criteria:
– how large P can become or N must be while having e.g. S(P,N) ≥ 0.6P
– how fast S(P,N) converges to P as function of N; in particular, how fast the initial rise is (then the network may even be used for small N)
• Shape depends on speed of communication relative to computation
– picture shows slow and faster communication
– c/f is an important platform parameter for this system
A value preserving perspective
• Looking only at a single dependency is limited
– typically, a larger system is designed to work with larger problems
– hence, when going to size k, increase a number of relevant, dependent parameters, collectively
– Examples:
• more servers are installed to address a larger number of clients
• a larger processor network is used to perform a larger matrix multiplication
• note: a decision must be made how to jointly change these numbers!
• The value of the system, v(k), is a metric representing the system’s benefit at scale k
– e.g. effective #transactions/sec, effective computations/sec
• The cost of the system at scale k, C(k), represents a cost measure for creating the system at scale k
– e.g. # processors including additional ones for increasing connectivity, network or memory resources, or real money
Joint changes
• Scalability question: if we scale system parameters jointly, do we retain value for the investment?
• Website
– increase #servers such that #clients / #servers is roughly constant
• Matrix multiplication
– increase #processors with factor p
– how should N change?
• increasing N with the same factor p would generate a ~p³ increase in work
• hence, increasing N with factor p^(1/3) gives an increase in work request that is comparable to the increase in work capacity
• The metric v(k)/C(k) represents value for money (“efficiency”)
– scalable: must be constant, or increasing with k
– note: C(k) plays the role of Z(k) in our scalability framework
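The p^(1/3) rule can be checked in a couple of lines (Python, with illustrative numbers): since the work is ~2N³ operations, scaling N by p^(1/3) scales the work by exactly p.

```python
# Work of the matrix multiplication grows cubically in N.
def work(N):
    return 2 * N**3

N0, p = 100, 8
N1 = N0 * p ** (1 / 3)        # scale the problem size by p^(1/3)...
ratio = work(N1) / work(N0)   # ...and the work scales by p

assert abs(ratio - p) < 1e-9  # matches the factor-p increase in processors
```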
Our example: matrix multiplication
• Start situation (k=1)
– matrix dimension N0
• Scale with factor k:
– matrix dimension: k^(1/3)·N0
– k processors
• Value v(k), options:
– we choose: speedup:
v(k) = T(1, k^(1/3)N0) / T(k, k^(1/3)N0) = k / (1 + ck/(f·k^(1/3)N0))
– alternative: effective number of operations per second:
v(k) = 2kN0³ / T(k, k^(1/3)N0)
• Cost, C(k): # processors
• v(k)/C(k):
– ~ 1 / (1 + ck^(2/3)/(fN0))
Example (cnt’d)
• v(k)/C(k) ~ 1 / (1 + ck^(2/3)/(fN0))
– When k increases, this approaches 0
– From this perspective, the matrix multiplication is not scalable
• We need to increase the problem size faster in order to have scalability
– the analysis shows that only if the problem size is scaled linearly with k is the value-for-money metric constant
– this is because only then is the overhead term from the communication ‘overcome’
• If the communication time also decreased with the #processors
– e.g. T(P,N) = (2fN³ + 2cN²)/P
– then v(k)/C(k) = 1 / (1 + c/(f·k^(1/3)N0)), which goes to 1 as k → ∞
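Both variants can be compared numerically. The sketch below uses c/f = 100 and N0 = 100 as illustrative values, and takes T(1,N) = 2fN³ (no communication when sequential), as on the slides:

```python
# Value for money v(k)/C(k) with N scaled as k^(1/3)*N0 and C(k) = k processors.
# comm_parallel=False: T(P,N) = 2fN^3/P + 2cN^2    (communication not parallelized)
# comm_parallel=True:  T(P,N) = (2fN^3 + 2cN^2)/P  (communication parallelized)

def value_per_cost(k, N0=100.0, c_over_f=100.0, comm_parallel=False):
    N = k ** (1 / 3) * N0
    T1 = 2 * N**3                      # sequential time, in units of f
    if comm_parallel:
        Tk = (2 * N**3 + 2 * c_over_f * N**2) / k
    else:
        Tk = 2 * N**3 / k + 2 * c_over_f * N**2
    return (T1 / Tk) / k               # speedup per processor

# Non-parallelized communication: value for money decays towards 0.
assert value_per_cost(1000) < value_per_cost(1)
# Parallelized communication: value for money climbs towards 1.
assert value_per_cost(1, comm_parallel=True) < value_per_cost(1000, comm_parallel=True) < 1
```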
Reviewing some assumptions
• Assumptions in our example
– communication time can be seen as an overhead, proportional to the size of
the matrices
– no idle time is incurred, e.g. by waiting on communication resources
– communication and computation don’t overlap
• Possible realization:
– block distribution of A, B and C
– mapped on a torus
• note that this architecture admits
concurrency in the communication
– circle matrix blocks around in
both dimensions
– use latency hiding
• communicating while computing
• Results in better scalability
Non-Scaling
• Suppose communications in the matrix multiplication are done via a shared medium, without local buffering
– e.g. bus, wireless medium
• Exercise: model T(P,N) again and examine the scalability. Mind the order of local computations.
Architecture scalability: balance usage
• A scalable architecture can accommodate changes in usage, maintaining value for money.
• Two types of parameters that are scaled
– usage parameters
• e.g. number of users, input size, average behavior, number of tasks, number of connected stations
– architecture parameters
• e.g. number of servers, number of links and bandwidth, topology, distribution of tasks, number of processors
• Scalability of the architecture: the extent to which architecture parameters can and must be changed to sustain a given metric under changes in usage parameters
– i.e., whether the architecture can be adjusted to accommodate the changes, and how much the ‘cost’ will be
• Example:
– usage parameter: #clients, architecture parameter: #servers, metric: response time
– what would this mean for the Google File System example?
Amdahl’s law
• Consider a system described with scaling parameters
– N (problem size, number of jobs or clients per second) and
– P (number of servers, processors)
• In order to improve performance, more machines are added
• Typically, the performance metric (for one processor) has two parts:
– m(1,N) = seq(N) + par(N)
• e.g., seq(N) = 2cN² and par(N) = 2fN³ in the previous slides
• The first term cannot be improved by adding more processors, which limits potential improvements
– m(P,N) = seq(N) + par(N)/P
– S(P,N) ≤ m(1,N)/seq(N) = 1 + par(N)/seq(N)
– However, notice that the second term may be increasing in N
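Amdahl's bound from the slide can be checked directly; seq(N) = 2cN² and par(N) = 2fN³ follow the earlier model, with illustrative constants:

```python
# Amdahl's law: with m(P,N) = seq(N) + par(N)/P, the speedup
# S(P,N) = m(1,N)/m(P,N) is bounded by 1 + par(N)/seq(N).

def amdahl_speedup(P, N, f=1e-9, c=1e-7):
    seq, par = 2 * c * N**2, 2 * f * N**3
    return (seq + par) / (seq + par / P)

N = 1000
bound = 1 + (2 * 1e-9 * N**3) / (2 * 1e-7 * N**2)  # = 1 + (f/c)*N = 11
for P in (10, 100, 10_000):
    assert amdahl_speedup(P, N) <= bound

# The bound itself grows with N: larger problems leave more room for
# parallelism, which is the slide's final remark.
assert amdahl_speedup(100, 10 * N) > amdahl_speedup(100, N)
```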
Amdahl’s law, general lesson
• Typically, improvements focus on a part of the problem, reducing a cost function only partly
– TotalCost = Cost (Static part) + Cost (Part subject to improvement)
• The improvement of TotalCost by addressing the second term is limited
• Therefore, whenever the cost contribution has become negligible, move your attention somewhere else
• Examples:
– single server dispatches requests
• the server can become the bottleneck
– improving performance of a subtask
– what about improving code?
Scalability Problems
Concept – Example
Centralized services – A single server for all users
Centralized data – A single on-line telephone book
Centralized algorithms – Doing routing based on complete information
Scalability principles
• Rules of thumb (towards scalability): limit dependencies
– No machine needs/has complete information
– Decisions are based on local information
– Failure of one machine does not ruin results
– No assumption of a global, shared clock
• Techniques & mechanisms
– Hide latency: do something useful while waiting
– Introduce concurrency
• e.g. obtain parts of a webpage concurrently
– Replicate or relocate data, but also work
• bring them together
• distribute evenly
• compute same results by multiple machines
– Distribute tasks
• using the locality principle: tasks should depend only on a small number of neighboring tasks
Scaling Techniques (1)
Scaling Techniques (2)
• Dividing the DNS space into zones
• Benefits from:
– locality principle
– task distribution
– information replication (caching)
Extra-functional properties and architecture
• Question of the architect: which architectural choices (design decisions) enhance some given extra-functional properties?
– e.g. how does the scalability of a client-server system compare to that of a peer-to-peer system?
– well, then we must know usage parameters, architecture parameters and metrics of choice
• e.g. #clients, #servers, response time
– and we must model the behavior and/or perform experiments
• This question also relates to finer-grained choices to be made
– e.g. the nature of the connectors (message passing, RPC, ....)
• Main views addressed in the following presentations:
– process and deployment views,
– and logical organization of the development view
Questions
• Does Amdahl’s law mean that it does not make sense to make parallel systems beyond a certain size? Explain your answer.
Literature
• Further reading:
– Grama, Gupta, Karypis and Kumar, Introduction to Parallel Computing, 2nd edition, Chapter 5
– Foster, Designing and Building Parallel
Programs, Chapter 3.
• (available on the Web, http://www.mcs.anl.gov/dbpp)
– See the web site