03 December 2002 (Gary McKee) ECEN 5053 Engineering Distributed Systems 1
Introduction
• Lecture originally prepared by: Gary McKee
– Lecturer in Computer Science at CU Boulder
– Independent Consultant for 14 years
– Working in the software industry for over thirty years
– Currently working as a Principal Engineer (in algorithm design) for a systems engineering company in Colorado Springs
• Today’s topic: Distributed Data
– Reading assignment for today
• Chapter 15 in Coulouris text – Replication
• Chapter 18 in Coulouris text – Distributed Shared Memory (DSM)
– Optional Homework
• See “handout” on web site
Overview
• The topics to be presented:
– Data Distribution
– Risk Analysis
– Distributed Shared Memory
– Distributed Data Base Systems
– Engineering Concerns
– Homework: Characterizing Distributed Data Architectures
Business Requirements
• ‘Data Distribution’ is about putting the data for a system in the correct location based on the needs of the business.
• The best technology in the world is useless if:
– It is not applied APPROPRIATELY
– It isn’t what the customer NEEDS
– It isn’t what the customer ASKED for
• Carefully examine the requirements associated with a particular problem.
– Develop understanding of the ‘Business goals’
– Talk to the customer
This is NOT what I asked for, I don’t care WHAT the requirements said!!
Data Distribution
• What can we accomplish with the correct selection of technology for Data Distribution?
– Fault tolerance – hardware & software failures are not catastrophic
– High availability – downtime is short, infrequent, not disruptive
– Reduction of bandwidth consumption – a shared resource
• There are a number of ways to accomplish data distribution in computer-based systems
– Replication (passive data systems)
– Distributed shared memory (passive data systems)
– Distributed data bases (active data systems)
Replication
• There are a number of ways to accomplish data distribution in computer-based systems using data replication
• Passive replication - this is the most traditional approach and is fairly easy to design, implement, and manage
– distinguished replica, primary backup
• Active replication - this approach is more robust
– but at the cost of more effort in design and implementation and significantly more bandwidth consumption
– multicast to all replicas
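The contrast between the two replication styles can be sketched in a few lines of Python. This is a minimal illustration, not a real replication library: the class names (`Replica`, `PassiveGroup`, `ActiveGroup`) and the message-counting model (one request and one reply per contacted replica, one update and one acknowledgement per backup) are assumptions made only to show where the extra bandwidth of active replication comes from.

```python
class Replica:
    """One replica manager holding its own copy of the data."""
    def __init__(self):
        self.data = {}


class PassiveGroup:
    """Primary-backup: clients talk only to the distinguished primary."""
    def __init__(self, n):
        self.primary = Replica()
        self.backups = [Replica() for _ in range(n - 1)]
        self.messages = 0  # assumed cost model, see lead-in

    def write(self, key, value):
        self.primary.data[key] = value
        for backup in self.backups:
            backup.data[key] = value
        # client request + reply, plus update + ack for each backup
        self.messages += 2 + 2 * len(self.backups)

    def read(self, key):
        self.messages += 2  # request + reply with the primary only
        return self.primary.data.get(key)


class ActiveGroup:
    """Active replication: every request is multicast to all replicas."""
    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.messages = 0

    def write(self, key, value):
        for replica in self.replicas:
            replica.data[key] = value
        self.messages += 2 * len(self.replicas)  # request + reply per replica

    def read(self, key):
        answers = [replica.data.get(key) for replica in self.replicas]
        self.messages += 2 * len(self.replicas)
        return answers[0]  # all correct replicas give the same answer


passive, active = PassiveGroup(3), ActiveGroup(3)
for group in (passive, active):
    group.write("balance", 100)
    for _ in range(10):
        group.read("balance")
```

Under these assumptions a read-heavy workload costs the active group noticeably more messages, while the passive group depends on a single primary staying up.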
Architectures
• Criteria
– Correctness
– Sequential consistency
• The interleaved sequence provides a single correct copy of the objects
• Program order is sustained
– Linearizability
• The interleaved sequence provides a single correct copy of the objects
• The order of operations is consistent with the real times of the operation or request.
• Effect is the same as serialized operations from a single replica manager, regardless of the order of arrival
• Examples (Gossip, Bayou, Coda)
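The two sequential consistency conditions can be checked mechanically for one proposed interleaving. The sketch below is illustrative only: the function name and the operation encoding (`("write", key, value)` and `("read", key, observed_value)`) are assumptions, and it tests a single given interleaving rather than searching for one. Linearizability would add a real-time constraint that this sketch does not model.

```python
def is_valid_interleaving(interleaving, programs):
    """Check one interleaving against the two sequential-consistency conditions.

    interleaving: list of (client, op) pairs in the proposed global order
    programs:     dict mapping client -> list of ops in program order
    op:           ("write", key, value) or ("read", key, observed_value)
    """
    # Condition 1: each client's operations appear in its program order.
    for client, ops in programs.items():
        seen = [op for c, op in interleaving if c == client]
        if seen != ops:
            return False

    # Condition 2: replaying against a single copy explains every read.
    store = {}
    for _, (kind, key, value) in interleaving:
        if kind == "write":
            store[key] = value
        elif store.get(key) != value:
            return False
    return True


programs = {
    "c1": [("write", "x", 1), ("write", "x", 2)],
    "c2": [("read", "x", 1), ("read", "x", 2)],
}
good = [
    ("c1", ("write", "x", 1)),
    ("c2", ("read", "x", 1)),
    ("c1", ("write", "x", 2)),
    ("c2", ("read", "x", 2)),
]
stale = [
    ("c1", ("write", "x", 1)),
    ("c1", ("write", "x", 2)),
    ("c2", ("read", "x", 1)),   # a single copy would already hold 2
    ("c2", ("read", "x", 2)),
]
```

Here `good` satisfies both conditions, while `stale` keeps program order but cannot be explained by a single copy of `x`.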
Stages of an Action
• The stages of an action are these:
– Request
– Coordination: FIFO, causal, total ordering
– Execution
– Agreement
– Response
Ordering coordination types
• FIFO
– If a front end issues request r then r’, then any correct replica manager that handles r’ handles r before it.
• Causal
– If the issue of request r happened-before the issue of request r’, then any correct replica manager that handles r’ handles r before it.
• Total
– If a correct replica manager handles r before request r’, then any correct replica manager that handles r’ handles r before it.
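FIFO ordering is the simplest of the three to enforce. One common way, sketched here as an assumption rather than as the lecture's method, is for each front end to number its requests and for each replica manager to hold back any request that arrives ahead of its predecessors. The class name `FifoReceiver` and the convention that sequence numbers start at 0 are hypothetical.

```python
class FifoReceiver:
    """Replica-manager side of FIFO ordering using per-sender sequence numbers."""

    def __init__(self):
        self.next_seq = {}    # sender -> next sequence number to handle
        self.held = {}        # sender -> {seq: request} waiting for predecessors
        self.delivered = []   # requests in the order they were handled

    def receive(self, sender, seq, request):
        expected = self.next_seq.get(sender, 0)
        if seq < expected:
            return  # duplicate of a request that was already handled
        self.held.setdefault(sender, {})[seq] = request
        # Handle every request that is now in order for this sender.
        while expected in self.held[sender]:
            self.delivered.append(self.held[sender].pop(expected))
            expected += 1
        self.next_seq[sender] = expected


rm = FifoReceiver()
rm.receive("fe1", 1, "r'")   # arrives early, so it is held back
rm.receive("fe1", 0, "r")    # releases both r and r'
```

Causal ordering needs more state (for example vector timestamps), and total ordering needs agreement among the replica managers, for example through a sequencer; neither is shown here.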
03 December 2002 (Gary McKee) ECEN 5053 Engineering Distributed Systems 10
user requestto RM7 which multicasts to
RM6
RM43
RM97
coordination
agreement
execution
(bandwidth concerns are here)
response
Risk Analysis
• What are the effects of various failure modes?
– Consider the needs of the business and the cost of failure
• Analytical principles
– What can fail? For how long?
• Transaction failure modes
– Network, node, software, security
– Transient, permanent
– Accidental, malicious
Failure Modes
• Network
– transient
– partitioning
– disconnected
– corrupt/attacked
• Node
– transient
– corrupt
– dead
Figure 14.11 Available copies (from text by Coulouris et al)
[Figure labels: two clients with front ends running transactions T and U; replica managers X, Y, M, N, P holding copies of A and B; operations getBalance(A), deposit(B,3), getBalance(B), deposit(A,3)]
Network failure modes
• transient
– short time interval
– temporary partitioning
• partitioning – some neighbors
– some nodes not available
• disconnected – no neighbors
• corrupt/attacked – network is unreliable
– dangerous to access
[Figure labels: node failure, network failure]
Node failure modes
• transient
– data contention
– process hangs and is restarted
– OS crashes and restarts
• corrupt – erroneous process
– malicious process
– security failure
• dead
– hardware failed
– hardware disconnected
Distributed Shared Memory
• An abstract model
– DSM is an example of a data replication system
– Competes with message passing technologies for the same kind of problems (e.g. CORBA)
• Summary
– Programming model is that DSM involves directly shared data with no marshalling or formatting
– Synchronization is via locks & semaphores as in direct shared memory applications
– DSM systems can be implemented to be persistent
– For small, high speed networks (2-10 CPUs), DSM can be implemented as efficiently as direct shared memory
[Figure: two nodes, each with cpu, ram, and VM (Disk); the <desired data> is held by one of them]
The CPU makes a request for memory that is not in RAM or VM. It’s in DSM. The request is sent across to the other CPU that is managing that data, which sends the data across the network (in pages).
Principles of DSM
• Uses the same programming protocols as physical shared memory systems
• Much more difficult with a heterogeneous network
– Especially if the data structures are different
• Communications control (and cost) is hidden from the developer
– Development is easier
– Efficiency is harder
– Debugging can be very difficult
Concepts & protocols
• Invalidation
– Based on ‘paged virtual memory’ concepts
– Presumes that every page is ‘owned’ by some process
– The page owner has the most up-to-date copy of the page
– Protocol manager is a performance bottleneck
• Update
– A write request generates a page fault; all other ‘read-only’ copies of the page are invalidated immediately
– The current owner sends the most current page and transfers ownership to the requester
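The ownership and invalidation steps described on this slide can be sketched as a toy, single-process model. Everything here is illustrative: `Page` and `InvalidationManager` are hypothetical names, a real DSM would trap page faults in the virtual memory system rather than call methods, and the central manager object stands in for the protocol manager that the slide identifies as a bottleneck.

```python
class Page:
    def __init__(self, owner, content=0):
        self.owner = owner        # process holding the most up-to-date copy
        self.content = content
        self.readers = set()      # processes holding read-only copies


class InvalidationManager:
    """Toy central protocol manager for a write-invalidate scheme."""

    def __init__(self):
        self.pages = {}
        self.invalidations = 0    # read-only copies discarded so far

    def create(self, page_id, owner, content=0):
        self.pages[page_id] = Page(owner, content)

    def read(self, page_id, process):
        page = self.pages[page_id]
        if process != page.owner:
            page.readers.add(process)   # owner ships a read-only copy
        return page.content

    def write(self, page_id, process, content):
        page = self.pages[page_id]
        # A write "faults": every read-only copy held by another process
        # is invalidated before the write proceeds.
        self.invalidations += len(page.readers - {process})
        page.readers.clear()
        # The current owner hands over the page and its ownership.
        page.owner = process
        page.content = content


dsm = InvalidationManager()
dsm.create("p0", owner="A", content=10)
dsm.read("p0", "B")
dsm.read("p0", "C")
dsm.write("p0", "B", 11)   # C's copy is invalidated, B becomes the owner
```

After the write, a read by A or C has to fetch the page again from the new owner B, which is where the network cost of this protocol shows up.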
Consistency Models
• This model will achieve Sequential Consistency, which is what most programmers ASSUME that they have.
– But it is costly to implement and uses a lot of bandwidth
– It also requires the programmer to do things the right way
• Release Consistency can reduce DSM overhead by considering the locks and semaphores used by the programmers to manage the data.
– This is weaker than sequential consistency but much easier to accomplish
– Examples: Munin, TreadMarks
Distributed Data Base System
• Principles
– A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network
• Coupling
– Multiprocessors are ‘tightly coupled’ if they share primary memory
– Multiprocessors are ‘loosely coupled’ if they share secondary memory but not primary memory
• Problem areas
– Note, the data itself is distributed across the network, not just the users
Architecture Concepts
• Transparencies
– Refers to the separation of the higher level semantics of a system from the lower level implementation details
– In other words, the implementation details are hidden
• Data independence
– Refers to the immunity of user applications to changes in the definition and organization of data (and vice versa)
• Network transparency
– The user should be protected from the details of the network distribution
• Replication
– Data replication should be transparent to the user
Standard models
• Database reference models
– Components, Functions, Data
• Database architecture
– Autonomy, Heterogeneity, Distribution
• homogeneous-heterogeneous
• federated-integrated
• Centralized - distributed – cached - replicated
Control
• Data Control
– User authentication
– Centralized-distributed
• Concurrency
– Serializability
• Same as in other forms of data distribution
– Control mechanisms
• Centralized, primary copy, decentralized
– Two-phase locking (2PL)
• No transaction can request a lock after it releases one of its locks
• Hence, there is a growing locks phase and a shrinking locks phase
– Optimistic & pessimistic concurrency control
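The two-phase rule is easy to enforce in code. The sketch below is a minimal, single-threaded illustration with exclusive locks only; the class name `TwoPhaseTransaction` and the shared `lock_table` dictionary are assumptions made for the example, not part of any particular database product.

```python
class TwoPhaseTransaction:
    """Enforces the 2PL rule: no lock request after the first release."""

    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table   # shared dict: item -> holder name
        self.held = set()
        self.shrinking = False         # True once any lock has been released

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        holder = self.lock_table.get(item)
        if holder not in (None, self.name):
            return False               # conflict: caller must wait or abort
        self.lock_table[item] = self.name
        self.held.add(item)            # growing phase
        return True

    def unlock(self, item):
        if item in self.held:
            self.shrinking = True      # shrinking phase begins
            self.held.discard(item)
            del self.lock_table[item]


table = {}
t1 = TwoPhaseTransaction("T1", table)
t2 = TwoPhaseTransaction("T2", table)
t1.lock("A")
t1.lock("B")
blocked = not t2.lock("A")   # T1 still holds A
t1.unlock("A")               # T1 enters its shrinking phase
granted = t2.lock("A")       # now T2 can proceed
```

Any later call to `t1.lock(...)` raises an error, which is exactly the constraint that gives 2PL its growing and shrinking phases.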
Engineering Concerns
• The engineer’s obligation is to identify and characterize each advantage and disadvantage
• Tradeoffs
– All technologies have advantages and disadvantages
• Architecture
– The architecture should reflect knowledge about the business requirements
• Design for Changeability
– All systems change, and the data distribution aspects of a system change in response to factors outside the engineer’s control such as existing hardware, politics, funding availability, increased volume, new requirements, changing performance expectations
Bandwidth Analyses & Control
• Bandwidth is the key to a successful data distribution characterization
• The more you have available, the more flexibility you have in your decision process
• The less you use, the less vulnerable you are to external changes in network characteristics
Debugging Distributed Data Systems
• Identifying potential problems
– Planning ahead will facilitate both design and debugging
• Knowing the possible failure modes is key to determining what has actually happened
– Determine what can fail, what can change, what can be temporarily unavailable
– Determine the cost of each failure mode
– Develop a risk mitigation and recovery strategy for each failure mode
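One simple way to compare failure modes is to rank them by expected annual cost. The sketch below assumes a deliberately crude model (failures per year × hours down per failure × cost per hour), and every number in the example is hypothetical; real figures would come from the business requirements and operational history.

```python
def rank_failure_modes(modes):
    """Rank failure modes by expected annual cost, highest first.

    modes: list of (name, failures_per_year, hours_down_per_failure, cost_per_hour)
    """
    ranked = []
    for name, rate, hours, cost_per_hour in modes:
        ranked.append((name, rate * hours * cost_per_hour))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical inputs, for illustration only.
modes = [
    ("network transient", 50, 0.25, 1000),
    ("network partitioning", 4, 2, 1000),
    ("node dead", 1, 24, 1000),
]
ranking = rank_failure_modes(modes)
```

With these made-up inputs the rare but long node failure dominates, so it would be the first candidate for a mitigation and recovery strategy.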
Homework
– The homework assignment is to develop one of the problems further and produce a documented characterization.
– It is intended that each of the first three problems can be implemented using at least two different data distribution technologies -- choose an implementation; demonstrate that you understand the advantages, disadvantages, and consequences of the choice.
– You may make any reasonable assumption about unspecified characteristics of the system but all such assumptions should be documented in your written report.
– ----------------------4th problem not in your assignment------------------
– The fourth problem is intended to use DSM. If you select this one, show the disadvantages of another data distribution technology.
Choices
1. Design a database of teachers in the USA that are employed by any government entity. This system should be locally available in the event of network outages.
2. Port a legacy set of existing IT applications for a large national company that is decentralizing its corporate offices in response to the 9/11 terrorist event. The amount of legacy code is dauntingly large and anything that can be done to simplify the conversion process should be considered as important.
3. Design the data distribution strategy for an educational self-study company that runs on the Internet, serves four million customers per year worldwide in seven languages and has 4000 course offerings. Consider the bandwidth, congestion, and network delays inherent in using the public Internet for this operation.
------------------------- problem below not in assignment -------------------------
4. Identify, specify, and characterize a system of your own that requires or obviously benefits from a distributed shared memory system as described in chapter 16. Describe the DSM technology chosen and the benefits accrued thereby. Also identify the distribution constraints.
The End
Presentation copyright 2002, Gary McKee
For use with ECEN 5053
Students and instructors who contribute to or attend lectures for ECEN 5053 are welcome to use this presentation however they see fit, so long as this copyright notice remains intact.
Some artwork in the presentation is used with permission from Corel Gallery Clipart Catalog (copyright Corel Corporation, 3G Graphics Inc., Archive Arts, Cartesia Software)