
03 December 2002 (Gary McKee) ECEN 5053 Engineering Distributed Systems 1

Introduction

• Lecture originally prepared by: Gary McKee

– Lecturer in Computer Science at CU Boulder

– Independent Consultant for 14 years

– Working in the software industry for over thirty years

– Currently working as a Principal Engineer (in algorithm design) for a systems engineering company in Colorado Springs

• Today’s topic: Distributed Data

– Reading assignment for today:

• Chapter 15 in Coulouris text – Replication

• Chapter 18 in Coulouris text – Distributed Shared Memory (DSM)

– Optional homework

• See the “handout” on the web site


Overview

• The topics to be presented:

– Data Distribution

– Risk Analysis

– Distributed Shared Memory

– Distributed Database Systems

– Engineering Concerns

– Homework: Characterizing Distributed Data Architectures


Business Requirements

• ‘Data Distribution’ is about putting the data for a system in the correct location based on the needs of the business.

• The best technology in the world is useless if:

– it is not applied APPROPRIATELY

– it isn’t what the customer NEEDS

– it isn’t what the customer ASKED for

• Carefully examine the requirements associated with a particular problem:

– develop an understanding of the ‘business goals’

– talk to the customer

“This is NOT what I asked for. I don’t care WHAT the requirements said!!”


Data Distribution

• What can we accomplish with the correct selection of technology for data distribution?

– Fault tolerance: hardware and software failures are not catastrophic

– High availability: downtime is short, infrequent, and not disruptive

– Reduced consumption of bandwidth, which is a shared resource

• There are a number of ways to accomplish data distribution in computer-based systems:

– Replication (passive data systems)

– Distributed shared memory (passive data systems)

– Distributed databases (active data systems)


Replication

• There are a number of ways to accomplish data distribution in computer-based systems using data replication.

• Passive replication: the most traditional approach, and fairly easy to design, implement, and manage

– A distinguished replica (the primary) executes requests and sends the updated data to the backups (primary-backup)

• Active replication: more robust, but at the cost of more design and implementation effort and significantly more bandwidth consumption

– Each request is multicast to all replicas
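The two replication styles above can be contrasted in a small in-memory sketch. All class and method names are hypothetical, "failure" is just an `alive` flag, and the sketch deliberately omits what real systems need: active replication requires totally ordered multicast so every replica handles requests in the same order, and passive failover requires a membership protocol to agree on the new primary.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaManager:
    """Hypothetical replica manager holding a small key-value store."""
    name: str
    alive: bool = True
    store: dict = field(default_factory=dict)

    def execute(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.store[key] = value


class PassiveGroup:
    """Primary-backup: the front end sends each request only to the primary,
    which executes it and then sends the new state to every live backup."""

    def __init__(self, replicas):
        self.replicas = list(replicas)  # replicas[0] is the distinguished primary

    def write(self, key, value):
        if not self.replicas[0].alive:
            # Failover: drop failed replicas and promote the first live backup.
            self.replicas = [rm for rm in self.replicas if rm.alive]
            if not self.replicas:
                raise RuntimeError("no live replica manager")
        primary = self.replicas[0]
        primary.execute(key, value)
        for backup in self.replicas[1:]:
            if backup.alive:
                backup.execute(key, primary.store[key])  # state update to backup


class ActiveGroup:
    """Active replication: the front end multicasts each request to all
    replica managers, and every live replica executes it independently."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, key, value):
        acks = 0
        for rm in self.replicas:  # multicast to all replicas
            try:
                rm.execute(key, value)
                acks += 1
            except ConnectionError:
                pass  # a failed replica is skipped; the others carry on
        return acks


# Passive: the primary RM1 crashes between two writes.
rms = [ReplicaManager("RM1"), ReplicaManager("RM2"), ReplicaManager("RM3")]
passive = PassiveGroup(rms)
passive.write("x", 1)
rms[0].alive = False
passive.write("x", 2)  # RM2 has been promoted to primary

# Active: one replica is down, the other two still execute the request.
active = ActiveGroup([ReplicaManager("A"), ReplicaManager("B", alive=False),
                      ReplicaManager("C")])
print(active.write("y", 5))  # 2
```

Note the bandwidth difference the slide mentions: in the active group the front end sends one message per replica for every request, while in the passive group it sends one message to the primary.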


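Architectures

• Criteria

– Correctness

– Sequential consistency

• The interleaved sequence of operations meets the specification of a single correct copy of the objects

• The order of operations is consistent with the program order of each individual client

– Linearizability

• The interleaved sequence of operations meets the specification of a single correct copy of the objects

• The order of operations is consistent with the real times at which the operations (requests) occurred

• The effect is the same as if a single replica manager had serialized the operations, regardless of their order of arrival

• Examples of systems that relax these criteria in exchange for higher availability: the gossip architecture, Bayou, Coda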
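The difference between the two criteria can be made concrete with a brute-force checker for a single shared register (initial value 0). This is a sketch for tiny histories only: it tries every interleaving that preserves each client's program order, and for linearizability it additionally refuses to place an operation before one that had already finished in real time. The `Op` record and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass(frozen=True)
class Op:
    """One completed operation on a single shared register (initially 0)."""
    client: str
    kind: str     # "write" or "read"
    value: int    # value written, or value the read returned
    start: float  # real time the request was issued
    end: float    # real time the response arrived


def _interleaving_exists(queues, current, real_time):
    """Search for an interleaving that keeps each client's program order
    and behaves like a single correct copy of the register."""
    if not any(queues):
        return True
    for i, queue in enumerate(queues):
        if not queue:
            continue
        op = queue[0]
        # Linearizability only: op cannot go next if some other pending
        # operation had already finished (in real time) before op started.
        if real_time and any(other.end < op.start
                             for other in chain.from_iterable(queues)
                             if other is not op):
            continue
        if op.kind == "read" and op.value != current:
            continue  # this read could not have seen the current value
        next_value = op.value if op.kind == "write" else current
        rest = queues[:i] + [queue[1:]] + queues[i + 1:]
        if _interleaving_exists(rest, next_value, real_time):
            return True
    return False


def sequentially_consistent(histories, initial=0):
    return _interleaving_exists([list(h) for h in histories], initial, False)


def linearizable(histories, initial=0):
    return _interleaving_exists([list(h) for h in histories], initial, True)


# Client A's write finishes (t=1) before client B's read starts (t=2),
# yet B reads the old value 0.
history = [[Op("A", "write", 1, 0.0, 1.0)],
           [Op("B", "read", 0, 2.0, 3.0)]]
print(sequentially_consistent(history))  # True: B's read can be ordered first
print(linearizable(history))             # False: real time forbids that order
```

The search is exponential in the number of operations, which is fine for illustrating the definitions but not for checking real traces.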



Stages of an Action

• The stages of an action are:

– Request

– Coordination: FIFO, causal, total ordering

– Execution

– Agreement

– Response


Ordering Coordination Types

• FIFO

– If a front end issues request r and then request r’, then any correct replica manager that handles r’ handles r before it.

• Causal

– If the issue of request r happened-before the issue of request r’, then any correct replica manager that handles r’ handles r before it.

• Total

– If a correct replica manager handles r before request r’, then any correct replica manager that handles r’ handles r before it.
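FIFO ordering, the weakest of the three, is commonly implemented with per-sender sequence numbers and a hold-back queue. The sketch below assumes each front end numbers its requests 0, 1, 2, … and that no message is lost; the class name is hypothetical. It gives no causal or total guarantee across different front ends.

```python
from collections import defaultdict


class FifoReplicaManager:
    """Handles each front end's requests in the order that front end issued
    them. Front ends number their requests 0, 1, 2, ... (an assumption);
    early arrivals wait in a per-sender hold-back queue."""

    def __init__(self):
        self.next_expected = defaultdict(int)  # front end -> next sequence number
        self.hold_back = defaultdict(dict)     # front end -> {seq: request}
        self.handled = []                      # requests in the order handled

    def receive(self, front_end, seq, request):
        self.hold_back[front_end][seq] = request
        # Handle every consecutive request from this front end now available.
        while self.next_expected[front_end] in self.hold_back[front_end]:
            n = self.next_expected[front_end]
            self.handled.append(self.hold_back[front_end].pop(n))
            self.next_expected[front_end] = n + 1


rm = FifoReplicaManager()
rm.receive("FE1", 1, "r'")  # r' overtakes r in the network: held back
rm.receive("FE2", 0, "q")   # independent front end: handled at once
rm.receive("FE1", 0, "r")   # r arrives, so r and then r' are handled
print(rm.handled)           # ['q', 'r', "r'"]
```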


(Diagram: a user request goes to RM7, which multicasts it to RM6, RM43, and RM97. The diagram marks the coordination, execution, agreement, and response stages and is annotated “bandwidth concerns are here”.)
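For the coordination stage in the diagram, one common way to get total ordering is a sequencer that stamps every request with a global number; each replica manager then handles requests strictly in stamp order. This is a sketch under assumptions: the sequencer role (which RM7 could play) and all names are hypothetical, and a single sequencer is both a bottleneck and a single point of failure.

```python
import itertools


class Sequencer:
    """Stamps each request with a global sequence number (hypothetical role)."""

    def __init__(self):
        self._numbers = itertools.count()

    def stamp(self, request):
        return next(self._numbers), request


class TotalOrderRM:
    """Handles stamped requests strictly in sequence-number order."""

    def __init__(self, name):
        self.name = name
        self.next_seq = 0
        self.hold_back = {}
        self.handled = []

    def receive(self, seq, request):
        self.hold_back[seq] = request
        while self.next_seq in self.hold_back:
            self.handled.append(self.hold_back.pop(self.next_seq))
            self.next_seq += 1


sequencer = Sequencer()
first = sequencer.stamp("r1")
second = sequencer.stamp("r2")
rm6, rm43, rm97 = TotalOrderRM("RM6"), TotalOrderRM("RM43"), TotalOrderRM("RM97")
rm6.receive(*first)    # arrives in order
rm6.receive(*second)
rm43.receive(*second)  # the network reorders the multicast
rm43.receive(*first)
rm97.receive(*second)  # r2 is held back until r1 arrives
rm97.receive(*first)
print(rm6.handled, rm43.handled, rm97.handled)  # all ['r1', 'r2']
```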


Risk Analysis

• What are the effects of the various failure modes?

– Consider the needs of the business and the cost of failure

• Analytical principles

– What can fail? For how long?

• Transaction failure modes

– Network, node, software, security

– Transient, permanent

– Accidental, malicious
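The questions "What can fail? For how long?" combine with the business cost of failure into a simple expected-cost estimate per failure mode: frequency × duration × hourly cost. The sketch below is a hypothetical risk register; all figures are invented for illustration, and it assumes cost grows linearly with downtime, which real business impact often does not.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a hypothetical risk register."""
    name: str
    failures_per_year: float  # "What can fail?" and how often
    hours_per_failure: float  # "For how long?"
    cost_per_hour: float      # business cost while the service is impaired

    def annual_cost(self):
        # Expected annual cost = frequency x duration x hourly cost.
        return self.failures_per_year * self.hours_per_failure * self.cost_per_hour


def rank_risks(modes):
    """Order failure modes by expected annual cost, most expensive first."""
    return sorted(modes, key=lambda m: m.annual_cost(), reverse=True)


# All figures below are made up for illustration.
modes = [
    FailureMode("transient network partition", 12, 0.5, 2000),
    FailureMode("node hardware dead", 1, 8, 2000),
    FailureMode("malicious corruption", 0.5, 24, 4000),
]
for mode in rank_risks(modes):
    print(f"{mode.name}: {mode.annual_cost():,.0f} per year")
```

Ranking by expected cost shows where mitigation effort pays off first; a rare but long malicious outage can outweigh frequent short transients.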


Failure Modes

• Network

– transient

– partitioning

– disconnected

– corrupt/attacked

• Node

– transient

– corrupt

– dead


Figure 14.11 Available copies (from the text by Coulouris et al.): two clients, each with a front end, run transactions T (getBalance(A); deposit(B,3)) and U (getBalance(B); deposit(A,3)) against replica managers X and Y, which hold copies of A, and M, N, and P, which hold copies of B.
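The available copies scheme in the figure reads from any one available replica and writes to all available replicas of an item. A minimal sketch for a single account (say B, held at M, N, and P) follows; the names are hypothetical, and it leaves out the local validation step that Coulouris et al. add so that failures and recoveries cannot make concurrent transactions non-serializable.

```python
class AccountCopy:
    """Hypothetical replica manager holding one copy of an account balance."""

    def __init__(self, name, balance=0):
        self.name = name
        self.balance = balance
        self.available = True


class AvailableCopies:
    """Read one available copy; write to every available copy."""

    def __init__(self, copies):
        self.copies = copies

    def _available(self):
        live = [c for c in self.copies if c.available]
        if not live:
            raise RuntimeError("no available copy")
        return live

    def get_balance(self):
        return self._available()[0].balance  # any one available copy will do

    def deposit(self, amount):
        for copy in self._available():  # every copy that is currently available
            copy.balance += amount


b = AvailableCopies([AccountCopy("M"), AccountCopy("N"), AccountCopy("P")])
b.deposit(3)                     # written at M, N, and P
b.copies[1].available = False    # replica manager N fails
b.deposit(3)                     # written at M and P only
print([c.balance for c in b.copies], b.get_balance())  # [6, 3, 6] 6
```

N's copy is now stale, which is why a recovering replica manager must bring its copy up to date before it serves reads again.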


Network Failure Modes

• Transient

– short time interval

– temporary partitioning

• Partitioning: only some neighbors are reachable

– some nodes are not available

• Disconnected: no neighbors are reachable

• Corrupt/attacked: the network is unreliable

– dangerous to access

(Diagram contrasting a node failure with a network failure)
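One standard way to keep replicas consistent across a partition is quorum consensus (after Gifford): a side of the partition may read or write only if it can gather a quorum of votes. The sketch assumes one vote per replica and borrows the replica names from the available copies figure purely for illustration.

```python
def quorums_are_safe(n, r, w):
    """Every read quorum must overlap every write quorum (r + w > n), and
    any two write quorums must overlap (2 * w > n)."""
    return r + w > n and 2 * w > n


def partition_can(reachable, group, quorum):
    """A partition may proceed only if it can still collect `quorum` votes
    from members of the replica group (one vote per replica here)."""
    return len(set(reachable) & set(group)) >= quorum


group = ["X", "Y", "M", "N", "P"]
print(quorums_are_safe(5, 3, 3))                 # True
# A partition splits the group into {X, Y} and {M, N, P}:
print(partition_can({"X", "Y"}, group, 3))       # minority side: False
print(partition_can({"M", "N", "P"}, group, 3))  # majority side: True
```

The price is availability: the minority side of a partition must refuse updates even though its replicas are running.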


Node Failure Modes

• Transient

– data contention

– process hangs and is restarted

– OS crashes and restarts

• Corrupt

– erroneous process

– malicious process

– security failure

• Dead

– hardware failed

– hardware disconnected
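Telling a transient node failure from a dead node is usually approximated with heartbeats and timeouts. In the sketch below the two thresholds are hypothetical parameters, and time is passed in explicitly to keep it deterministic. In an asynchronous network a slow node cannot be distinguished from a crashed one, so the results are suspicions rather than facts, and a corrupt node that keeps sending heartbeats is not detected at all.

```python
class HeartbeatDetector:
    """Timeout-based failure detector (thresholds are assumptions).

    Silent longer than `suspect_after` seconds: suspected (perhaps transient).
    Silent longer than `dead_after` seconds: presumed dead.
    """

    def __init__(self, suspect_after=2.0, dead_after=30.0):
        self.suspect_after = suspect_after
        self.dead_after = dead_after
        self.last_seen = {}  # node -> time of last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def status(self, node, now):
        if node not in self.last_seen:
            return "unknown"
        silence = now - self.last_seen[node]
        if silence > self.dead_after:
            return "presumed dead"
        if silence > self.suspect_after:
            return "suspected"
        return "alive"


detector = HeartbeatDetector()
detector.heartbeat("node-a", now=0.0)
print(detector.status("node-a", now=1.0))   # alive
print(detector.status("node-a", now=5.0))   # suspected
print(detector.status("node-a", now=60.0))  # presumed dead
```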


Distributed Shared Memory

• An abstract model

– DSM is an example of a data replication system

– It competes with message-passing technologies (e.g., CORBA) for the same kinds of problems

• Summary

– In the DSM programming model, data is shared directly, with no marshalling or formatting

– Synchronization is via locks and semaphores, as in physically shared memory applications

– DSM systems can be implemented to be persistent

– On small, high-speed networks (roughly 2-10 CPUs), DSM programs can be made to perform about as well as equivalent message-passing programs


(Diagram: two computers, each with a CPU, RAM, and disk-backed virtual memory; the desired data resides on the other computer.)

The CPU requests memory that is in neither its RAM nor its virtual memory, because the data is in DSM. The request goes to the other CPU that manages that data, which sends the data back across the network in pages.
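The remote fetch described above can be sketched as a page-granular cache: an address is split into a page number and an offset, and a miss (the "page fault") pulls the whole page from the node that manages it. The 4096-byte page size and all names are assumptions; writes and coherence are left to the protocols on the following slides.

```python
PAGE_SIZE = 4096  # bytes per page (an assumed size)


class ManagingNode:
    """Hypothetical node that manages some DSM pages."""

    def __init__(self, pages):
        self.pages = pages     # page number -> bytearray of PAGE_SIZE bytes
        self.pages_sent = 0

    def fetch_page(self, page_no):
        self.pages_sent += 1
        return bytearray(self.pages[page_no])  # a copy crosses the "network"


class DsmNode:
    """Local node: a miss in local memory triggers a whole-page fetch."""

    def __init__(self, manager_of):
        self.local_pages = {}         # pages currently held locally
        self.manager_of = manager_of  # page number -> ManagingNode

    def read_byte(self, address):
        page_no, offset = divmod(address, PAGE_SIZE)
        if page_no not in self.local_pages:  # the "page fault"
            self.local_pages[page_no] = self.manager_of[page_no].fetch_page(page_no)
        return self.local_pages[page_no][offset]


manager = ManagingNode({1: bytearray(range(256)) * 16})  # page 1: 4096 bytes
local = DsmNode({1: manager})
print(local.read_byte(PAGE_SIZE + 10))   # page fault, whole page fetched: 10
print(local.read_byte(PAGE_SIZE + 300))  # same page, no network traffic: 44
print(manager.pages_sent)                # 1
```

Fetching whole pages amortizes network cost over nearby accesses, but it also causes false sharing when unrelated data lands on the same page.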



Principles of DSM

• Uses the same programming protocols as physical shared-memory systems

• Much more difficult on a heterogeneous network

– especially if data structures are represented differently on different machines

• Communications control (and cost) is hidden from the developer

– Development is easier

– Efficiency is harder

– Debugging can be very difficult
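Why heterogeneity is hard for DSM can be shown with byte order alone, using Python's standard `struct` module: if a page is copied byte for byte between machines with different layouts, the same bytes decode to different values. Padding and word size can differ in the same way.

```python
import struct

# The integer 1 as stored by a little-endian machine...
raw = struct.pack("<I", 1)            # b'\x01\x00\x00\x00'

# ...read back by a big-endian machine that assumes its own layout.
misread = struct.unpack(">I", raw)[0]
correct = struct.unpack("<I", raw)[0]
print(misread, correct)               # 16777216 1
```

Message-passing systems avoid this by marshalling into an agreed external format; DSM skips marshalling, so it must either restrict itself to homogeneous machines or translate shared data on transfer.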


Concepts & Protocols

• Invalidation

– Based on ‘paged virtual memory’ concepts

– Presumes that every page is ‘owned’ by some process

– The page owner has the most up-to-date copy of the page

– A centralized protocol manager can become a performance bottleneck

• Updating a page

– A write request generates a page fault, and all other read-only copies of the page are invalidated immediately

– The current owner sends the most current page and transfers ownership to the requester
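The invalidation protocol above can be sketched as a centralized manager that tracks, for one page, its owner and the set of processes holding copies (its copy set). A read fault hands out a read-only copy; a write fault invalidates every other copy and transfers ownership. The class and method names are hypothetical, and messages are simulated by dictionary updates.

```python
class PageManager:
    """Hypothetical centralized manager for one DSM page (write-invalidate)."""

    def __init__(self, owner, data):
        self.owner = owner         # process holding the most recent copy
        self.copyset = {owner}     # processes holding valid copies
        self.copies = {owner: data}
        self.writable = owner      # at most one writer at a time

    def read_fault(self, proc):
        """Give proc a read-only copy of the owner's page."""
        self.writable = None       # the owner's copy becomes read-only too
        self.copies[proc] = self.copies[self.owner]
        self.copyset.add(proc)

    def write_fault(self, proc):
        """Invalidate every other copy and transfer ownership to proc."""
        latest = self.copies[self.owner]
        for other in self.copyset - {proc}:
            del self.copies[other]  # invalidation message
        self.copies[proc] = latest
        self.copyset = {proc}
        self.owner = proc
        self.writable = proc

    def read(self, proc):
        if proc not in self.copyset:
            self.read_fault(proc)
        return self.copies[proc]

    def write(self, proc, data):
        if self.writable != proc:
            self.write_fault(proc)
        self.copies[proc] = data


page = PageManager(owner="P1", data="v0")
print(page.read("P2"))                   # read fault: P2 gets a copy of v0
page.write("P3", "v1")                   # write fault: P1, P2 invalidated
print(page.read("P1"))                   # v1, copied from the new owner
print(page.owner, sorted(page.copyset))  # P3 ['P1', 'P3']
```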


Consistency Models

• This model achieves sequential consistency, which is what most programmers ASSUME they have.

– But it is costly to implement and uses a lot of bandwidth

– It also requires the programmer to do things the right way (for example, protecting shared data with locks)

• Release consistency can reduce DSM overhead by taking into account the locks and semaphores that programmers use to manage the data.

– This is weaker than sequential consistency but much easier to achieve

– Examples: Munin, TreadMarks
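A minimal sketch of the release consistency idea, assuming an eager variant in the spirit of Munin: writes made while holding a lock are applied locally at once but propagated to the other replicas only when the lock is released, so several writes to the same location cost one update. Replicas are plain dictionaries and all names are hypothetical; TreadMarks instead uses a lazy variant that defers propagation until the next acquire.

```python
import threading


class ReleaseConsistentStore:
    """Eager release consistency sketch: buffer writes, flush at release."""

    def __init__(self, replicas):
        self.replicas = replicas  # one dict per process
        self.lock = threading.Lock()
        self.pending = {}

    def acquire(self):
        self.lock.acquire()
        self.pending = {}

    def write(self, me, key, value):
        self.replicas[me][key] = value  # visible locally at once
        self.pending[key] = value       # remote propagation deferred

    def release(self):
        for replica in self.replicas:   # one batched update per replica
            replica.update(self.pending)
        self.pending = {}
        self.lock.release()


replicas = [{}, {}, {}]
store = ReleaseConsistentStore(replicas)
store.acquire()
store.write(0, "x", 1)
store.write(0, "x", 2)  # overwrites the buffered value; only x=2 is sent
store.write(0, "y", 3)
print(replicas[1])      # {}: other processes see nothing before release
store.release()
print(replicas[1])      # {'x': 2, 'y': 3}
```

Correctness still depends on every process accessing the shared data only while holding the lock; otherwise it may observe stale values.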


Distributed Database System

• Principles

– A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network

• Coupling

– Multiprocessors are ‘tightly coupled’ if they share primary memory

– Multiprocessors are ‘loosely coupled’ if they share secondary memory but not primary memory

• Problem areas

– Note that the data itself is distributed across the network, not just the users


Architecture Concepts

• Transparencies

– Refers to the separation of the higher-level semantics of a system from the lower-level implementation details

– In other words, the implementation details are hidden

• Data independence

– Refers to the immunity of user applications to changes in the definition and organization of data (and vice versa)

• Network transparency

– The user should be protected from the details of how the data is distributed across the network

• Replication transparency

– Data replication should be transparent to the user
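Network and replication transparency can be pictured as a front end that lets users name data only by key, while a catalog (hidden from them) records which sites hold copies. This is a sketch: the class, the catalog layout, and the site names are hypothetical, and a real system would also have to handle failed sites and concurrent updates.

```python
class TransparentFrontEnd:
    """Users see only keys; sites and replicas stay hidden behind a catalog."""

    def __init__(self, catalog):
        self.catalog = catalog  # key -> list of sites (dicts) holding a copy

    def put(self, key, value):
        for site in self.catalog[key]:  # replication hidden from the user
            site[key] = value

    def get(self, key):
        return self.catalog[key][0][key]  # any copy will do; location hidden


boulder, springs = {}, {}               # two hypothetical sites
front_end = TransparentFrontEnd({"teacher:42": [boulder, springs]})
front_end.put("teacher:42", "Smith")
print(front_end.get("teacher:42"))      # Smith
```

Because applications call only `get` and `put`, sites can be added or data moved by changing the catalog, without changing user code.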


Standard Models

• Database reference models

– Components, Functions, Data

• Database architecture

– Autonomy, Heterogeneity, Distribution

• homogeneous vs. heterogeneous

• federated vs. integrated

• centralized vs. distributed

– cached vs. replicated


Control

• Data control

– User authentication

– Centralized vs. distributed

• Concurrency

– Serializability

• Same as in other forms of data distribution

– Control mechanisms

• Centralized, primary copy, decentralized

– Two-phase locking (2PL)

• No transaction can request a lock after it has released one of its locks

• Hence, there is a growing phase (locks are acquired) and a shrinking phase (locks are released)

– Optimistic and pessimistic concurrency control
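The two-phase rule can be enforced mechanically: once a transaction releases any lock, it has entered its shrinking phase and may not acquire another. The sketch uses `threading.Lock` from the Python standard library as a stand-in for a lock manager; the `Transaction` class and error type are hypothetical, and many real systems use strict 2PL, holding all locks until commit.

```python
import threading


class TwoPhaseLockingError(RuntimeError):
    pass


class Transaction:
    """Enforces 2PL: no lock may be acquired after any lock is released."""

    def __init__(self, name):
        self.name = name
        self.held = {}          # item -> lock currently held
        self.shrinking = False  # becomes True at the first release

    def lock(self, item, lock):
        if self.shrinking:
            raise TwoPhaseLockingError(
                f"{self.name} cannot lock {item} after releasing a lock")
        lock.acquire()
        self.held[item] = lock

    def unlock(self, item):
        self.shrinking = True   # the growing phase is over
        self.held.pop(item).release()

    def unlock_all(self):
        for item in list(self.held):
            self.unlock(item)


locks = {"A": threading.Lock(), "B": threading.Lock()}
t = Transaction("T")
t.lock("A", locks["A"])         # growing phase
t.lock("B", locks["B"])
t.unlock("A")                   # shrinking phase begins
try:
    t.lock("A", locks["A"])     # would violate 2PL
except TwoPhaseLockingError as err:
    print(err)
t.unlock_all()
```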


Engineering Concerns

• The engineer’s obligation is to identify and characterize each advantage and disadvantage

• Tradeoffs

– All technologies have advantages and disadvantages

• Architecture

– The architecture should reflect knowledge of the business requirements

• Design for changeability

– All systems change, and the data distribution aspects of a system change in response to factors outside the engineer’s control, such as existing hardware, politics, funding availability, increased volume, new requirements, and changing performance expectations


Bandwidth Analyses & Control

• Bandwidth is the key to a successful characterization of data distribution

• The more you have available, the more flexibility you have in your decisions

• The less you use, the less vulnerable you are to external changes in network characteristics
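A first-cut bandwidth characterization for replicated updates is simple arithmetic: updates per second × bytes per update × 8 bits × (number of replicas - 1), compared against the capacity of the shared link. The load figures below are hypothetical, and the formula ignores protocol headers, acknowledgements, and retransmissions, so real demand will be higher.

```python
def replication_bandwidth_bps(updates_per_sec, bytes_per_update, replicas):
    """Steady-state bits per second needed to push each update to every
    other replica. Ignores headers, acknowledgements, and retransmissions."""
    return updates_per_sec * bytes_per_update * 8 * (replicas - 1)


def link_utilization(demand_bps, capacity_bps):
    """Fraction of a shared link that the replication traffic consumes."""
    return demand_bps / capacity_bps


# Hypothetical load: 200 updates/s of 1 KiB each, 4 replicas, 10 Mbit/s link.
demand = replication_bandwidth_bps(200, 1024, 4)
print(demand)                                # 4915200 bits per second
print(link_utilization(demand, 10_000_000))  # about half of the link
```

Using about half of a shared link leaves little headroom for growth or for other traffic, which is the vulnerability the slide warns about.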


Debugging Distributed Data Systems

• Identifying potential problems

– Planning ahead will facilitate both design and debugging

• Knowing the possible failure modes is key to determining what has actually happened

– Determine what can fail, what can change, and what can be temporarily unavailable

– Determine the cost of each failure mode

– Develop a risk mitigation and recovery strategy for each failure mode


Homework

– The homework assignment is to develop one of the problems further and produce a documented characterization.

– Each of the first three problems is intended to be implementable with at least two different data distribution technologies. Choose an implementation and demonstrate that you understand the advantages, disadvantages, and consequences of that choice.

– You may make any reasonable assumption about unspecified characteristics of the system, but all such assumptions should be documented in your written report.

– Note: the fourth problem is not part of your assignment.

– The fourth problem is intended to use DSM. If you select it, show the disadvantages of another data distribution technology.


Choices

1. Design a database of teachers in the USA who are employed by any government entity. This system should be locally available in the event of network outages.

2. Port the existing legacy IT applications of a large national company that is decentralizing its corporate offices in response to the 9/11 terrorist event. The amount of legacy code is dauntingly large, and anything that can simplify the conversion process should be considered important.

3. Design the data distribution strategy for an educational self-study company that runs on the Internet, serves four million customers per year worldwide in seven languages, and has 4000 course offerings. Consider the bandwidth, congestion, and network delays inherent in using the public Internet for this operation.

(The problem below is not part of the assignment.)

4. Identify, specify, and characterize a system of your own that requires or obviously benefits from a distributed shared memory system as described in chapter 16. Describe the DSM technology chosen and the benefits it provides. Also identify the distribution constraints.


The End

Presentation copyright 2002, Gary McKee. For use with ECEN 5053.

Students and instructors who contribute to or attend lectures for ECEN 5053 are welcome to use this presentation however they see fit, so long as this copyright notice remains intact.

Some artwork in the presentation is used with permission from Corel Gallery Clipart Catalog (copyright Corel Corporation, 3G Graphics Inc., Archive Arts, Cartesia Software).