Information Resources Management

April 17, 2001

Agenda

Administrivia
Database Architectures

Administrivia

Homework #8

Database Architectures

Centralized
Client-Server
Parallel - single site
Distributed - multiple sites

Database Architectures

[Diagram labels: Centralized, (Parallel), Distributed, Client-Server; Function, Data]

Centralized

PC, Mini, or Mainframe
Single Database
Single Database Manager
One or More Users
Data and Function in One Place

Client-Server

PCs to Mainframes to Minis
PC to PC
Mainframe to Mainframe

Use Desktop Processing Power
Better User Interface
Greater Functionality
Retain Centralized Control of Data

Client-Server: Basic Model

[Diagram: several Clients connected to one Server; a Request goes to the Server and a Result comes back]

Servers

Supercomputer
Mainframe
Mini
PC Server

All retain all data

Client-Server Architecture

[Diagram labels: Data, Function; Server (Back-End), Client (Front-End); Thin Client, Fat Client]

Functionality

Presentation
I/O Processing
Validation
Business Rules
Application Logic
Data Management
Validation
Error Handling

“Thin” Client

Presentation Services Only
  Accept Input
  Format Output
  Display

Server does all processing

“Fat” Client

Presentation
Validation
Application Logic - Programs
Data Management
Send SQL to Server

Server is just DBMS

“In Between” Client

Client
  Presentation
  Some Application Logic
Server
  Some Application Logic
  Data Management and Services
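The thin and fat extremes on the preceding slides can be sketched in a few lines. This is only an illustration under stated assumptions: `Server`, `thin_client`, `fat_client`, and the `employee` table are hypothetical names, and an in-memory SQLite database stands in for the server's DBMS.

```python
import sqlite3


class Server:
    """Back end that owns the data. All names here are hypothetical."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE employee (name TEXT, salary REAL)")
        self.db.executemany(
            "INSERT INTO employee VALUES (?, ?)",
            [("Ann", 50000.0), ("Bob", 62000.0)],
        )

    def run_sql(self, sql, params=()):
        # Fat-client path: the server is "just DBMS" and runs the SQL it is sent.
        return self.db.execute(sql, params).fetchall()

    def highest_paid(self):
        # Thin-client path: the application logic lives on the server.
        rows = self.run_sql(
            "SELECT name, salary FROM employee ORDER BY salary DESC LIMIT 1"
        )
        return rows[0]


def thin_client(server):
    # Presentation only: request the answer, then format it for display.
    name, salary = server.highest_paid()
    return f"{name}: {salary:.0f}"


def fat_client(server):
    # Application logic on the client: fetch rows via SQL, compute locally.
    rows = server.run_sql("SELECT name, salary FROM employee")
    name, salary = max(rows, key=lambda r: r[1])
    return f"{name}: {salary:.0f}"
```

Both clients produce the same display string; what differs is where the work is done and how much data crosses the network. An "in between" client would move only part of `highest_paid` to the client side.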

Benefits of Client-Server

Use Local Processing Power
Better User Interface
Some Functionality if System Down
Use Sunk Costs of PCs
Support Reengineering
Support Intranets
Flexibility, Scalability, Customizability

Challenges of Client-Server

Cost of (Upgraded) PCs
Network Reliance
Distributing Application Updates
Management of Complex System
Problem Identification & Resolution
Application Partitioning

Other Client-Server Architectures

Traditional is Two-Tiered (client-server)
Three-Tiered
  Client - Application Server - DB Server
  (PC - Mini - Mainframe)
  (PC - PC Server - Mainframe)
Beyond Three
  PC - PC Server - Web Server - Mini - Mainframe

Client-Server vs. Distributed

Client-Server: Application Distribution

Distributed: Data Distribution

Often, “client-server” is used to refer to either application distribution or data distribution or both.

Middleware

What if
  Multiple databases (sources) need to be accessed from a single client?
  Different kinds of clients?
  Mix of clients and servers?
  Want to take advantage of existing base of applications (legacy systems)?

Middleware

Fat Clients just send SQL transactions
Other types of transactions may be needed based on the server (system)

Middleware

Software that shields applications from the complexity of the operating environment.

[Diagram: three Clients connect through Middleware to two (Legacy) Systems]

Types of Middleware

Transaction Processing (TP) Monitor
Database Middleware
Remote Procedure Call (RPC)
Message-Oriented Middleware (MOM)
Object Request Brokers (CORBA - ORB)

TP Monitor

Synchronous - sender must wait
Queuing
Message Delivery
Insured Delivery
Either Direction

Database Middleware

Variety of Clients/Platforms
Variety of Servers/DBMSs/Platforms
Specific to DB transactions (SQL)

Message-Oriented Middleware (MOM)

Asynchronous - clients do not wait
Queues & Queue Management/Recovery
Message Delivery
Insured Delivery
Either Direction
(like email or EDI, only for transactions)
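The synchronous/asynchronous distinction between the TP Monitor and MOM slides can be illustrated with a small sketch. `MessageQueue` and `synchronous_call` are hypothetical names, not any real product's API; Python's standard `queue.Queue` stands in for the queue manager, and recovery and guaranteed delivery are left out.

```python
import queue


class MessageQueue:
    """Hypothetical stand-in for a MOM queue manager."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, message):
        # Asynchronous: the sender enqueues the message and returns at once.
        self._q.put(message)

    def deliver_all(self, handler):
        # Later, the receiver drains the queue in arrival order.
        results = []
        while not self._q.empty():
            results.append(handler(self._q.get()))
        return results


def synchronous_call(handler, message):
    # Synchronous, for contrast: the caller waits until the result is back.
    return handler(message)
```

With `send`, the client can keep working while messages sit in the queue; with `synchronous_call`, it cannot proceed until the handler has finished.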

Advantages of Middleware

Leverage sunk costs (legacy systems)
Reduce development cost
Reduce development time
Increase responsiveness
Improve overall systems management
Consolidate diffuse information

Challenges of Middleware

Cost
Session management - Transaction state
Security
Network reliance
Diversity of systems - lack of standards
Constant technology change
Availability of talent
Middleware Management

Parallel and Distributed

Client-Server is an attempt to improve performance

Reduce time to execute a transaction
  Parallel

Reduce time to get the data
  Distributed

Parallel Systems

Single site for data
Very Large databases
Operations performed simultaneously

Parallel Database Architectures

Shared Memory
Shared Disk
Shared Nothing
Hierarchical

Shared Memory

[Diagram: three processors (P) sharing one memory (M)]

Shared Memory

Advantages
  Extremely efficient communications
Disadvantages
  Max of 32/64 processors
  Bus becomes bottleneck

Shared Disk

[Diagram: three processors (P), each with its own memory (M), sharing disk]

Shared Disk

Advantages
  No bus bottleneck
  Fault tolerance provided
Disadvantages
  Disk access becomes bottleneck

Shared Nothing

[Diagram: three processors (P), each with its own memory (M) and its own disk]

Shared Nothing

Advantages
  No disk bottleneck
  Highly scalable
Disadvantages
  High communication overhead/cost
    Between processors
    To another processor’s data

Hierarchical

[Diagram: five processors (P) and three memories (M) arranged as a hierarchy of subsystems]

Hierarchical

Advantages
  Best of all worlds
Disadvantages
  Worst of all worlds
  Some high communication overhead/cost
    Between subsystems
  Complexity

Distributed Databases

Client-Server - distribute functionality

What about distributing data?

Distributed Databases

Overview
Distributed Storage
Distributed Queries
Distributed Transactions
Multidatabase (Middleware)

Distributed Databases

Multiple locations
Single logical database
Several physical databases
Network connections

Advantages

Sharing across locations
Local control
Availability

Challenges

Development costs
People & Equipment
Testing
Problem identification & resolution
Technical expertise
Network dependence
Increased processing overhead

Distributed Data Storage

Replication
Fragmentation
Both

Replication

Data is repeated
Spectrum of options available
  Temporary replication of specific rows
  Replicate infrequently changed data
  Replicate by site
    Central site - all / each local site - their data only
  Full replication
    Everything everywhere

Concerns with Replication

Availability needed
Amount of parallelism in reads
Overhead of updates
Keeping replicas updated
Conflicting updates

Fragmentation

Partitioning
Divide data into subsets based on need
Have to be able to pull back together to get original tables

Fragmentation

Horizontal
  by rows
  specified conditions
Vertical
  by column
  each requires primary key (or created key)
Mixed
  by row and column
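A short sketch shows why the fragments must be able to be "pulled back together": horizontal fragments recombine by union, vertical fragments by a join on the repeated primary key. The `EMPLOYEES` rows and all function names are made up for illustration.

```python
# Hypothetical employee table: (emp_id, name, city, salary)
EMPLOYEES = [
    (1, "Ann", "NY", 50000),
    (2, "Bob", "Pgh", 62000),
    (3, "Cy", "NY", 58000),
]


def horizontal_fragments(rows):
    """Split by rows on a specified condition (here: the city column)."""
    frags = {}
    for row in rows:
        frags.setdefault(row[2], []).append(row)
    return frags


def vertical_fragments(rows):
    """Split by column; each fragment repeats the primary key (emp_id)."""
    personal = [(r[0], r[1], r[2]) for r in rows]
    payroll = [(r[0], r[3]) for r in rows]
    return personal, payroll


def rebuild_from_horizontal(frags):
    # The union of the row subsets gives back the original table.
    return sorted(row for frag in frags.values() for row in frag)


def rebuild_from_vertical(personal, payroll):
    # A join on the primary key gives back the original table.
    pay = dict(payroll)
    return sorted((i, n, c, pay[i]) for i, n, c in personal)
```

Without the key in every vertical fragment, `rebuild_from_vertical` would have no way to match a salary to its employee, which is why the slide insists on a primary (or created) key.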

Fragmentation & Replication

Repeat as necessary:
  Replicate fragments
  Fragment replicas

Don’t lose track of what you have and where it is!

Network Transparency

Distributing data should not require that the user know where or how it’s been distributed.

The database should be seen as a single entity no matter how fragmented and replicated it becomes.

Network Transparency

Some DBMSs are starting to provide this level of functionality so transparency exists even at the program level, but in many cases this “transparency” must be programmed into the applications.

It must always be designed into the database.

Distributed Queries

How do you query data that is everywhere?

Efficiency vs. Overhead
  Splitting the query apart
  Keeping track of the data/locations
  Making sure everything gets executed
  Putting the results back together
  Generating network traffic
  Handling partial results

Distributed Queries

Full replication can avoid the overhead
  Huge increase in update overhead
  Parallel execution no longer possible
  Additional costs of replication

Example

5 sites - NY, Pgh, Chicago, Dallas, Los Angeles

Data fragmented by site - no replication

Query (in Pgh):

SELECT Name, MAX(Salary) FROM Employee

Option 1 - High Bandwidth

1. Have all sites send their full employee tables to Pgh.

2. Build a temporary employee table.

3. Run the query against this table.

Option 2 - Not so High Bandwidth

1. Examine the query and determine it can be run separately at each location and the results combined.

2. Submit just the query to each location.

3. Wait for the results from each city.

4. As results return, build a temporary table (5 rows only).

5. Find the max using the temporary table.
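The two options can be compared with a small simulation. The per-site data below is invented for illustration, and the row counts treat Pgh's own fragment as "shipped" for simplicity. Option 2 works here because a maximum can be computed per site and then combined: the max of the per-site maxima is the global max.

```python
# Hypothetical per-site employee fragments: site -> [(name, salary), ...]
SITES = {
    "NY": [("Ann", 50000), ("Cy", 58000)],
    "Pgh": [("Bob", 62000)],
    "Chicago": [("Dee", 71000), ("Eli", 43000)],
    "Dallas": [("Fay", 66000)],
    "Los Angeles": [("Gus", 59000), ("Hal", 64000)],
}


def option_1(sites):
    """Ship every full fragment to the querying site, then run the query."""
    temp = [row for rows in sites.values() for row in rows]
    rows_shipped = len(temp)
    return max(temp, key=lambda r: r[1]), rows_shipped


def option_2(sites):
    """Run the query at each site and ship back one result row per site."""
    temp = [max(rows, key=lambda r: r[1]) for rows in sites.values() if rows]
    rows_shipped = len(temp)
    return max(temp, key=lambda r: r[1]), rows_shipped
```

Both options return the same employee, but Option 1 moves every row while Option 2 moves one row per site (the "5 rows only" of step 4). Not every query decomposes this way, which is why step 1 of Option 2 has to examine the query first.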

Distributed Transactions

Transaction Types
Coordinators
Commit Protocols
Concurrency Controls
Deadlocks

Transaction Types

Local - transaction only needs local data
Global - transaction uses non-local data
  My global becomes someone else’s local

Either type of transaction must still have ACID properties - global is the concern

System Structure

Things to do:

1. Process local transactions (transaction manager)

2. Process and track global transactions (transaction coordinator)

Global Processing

1. Recognize as global

2. Break up transaction

3. Distribute pieces

4. Assemble results

5. Coordinate termination

6. Handle problems

Coordinator of Coordinators

Coordinate among sites
Detect problems
Attempt to fix
Share status with others

Coordinator Failure

Backup Coordinator
  receives all messages - maintains state
  monitors coordinator
  automatically takes over if coordinator down
  avoids delays - increases overhead
Election
  highest pre-assigned number
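The election rule on this slide is simple enough to state as code. This is a sketch of the selection rule only, with hypothetical inputs (`site_numbers`, `is_up`); it says nothing about how sites exchange the messages needed to learn who is still up.

```python
def elect_coordinator(site_numbers, is_up):
    """Pick the live site with the highest pre-assigned number.

    site_numbers: dict of site name -> pre-assigned number (hypothetical).
    is_up: dict of site name -> bool, True if the site is reachable.
    Returns the winning site name, or None if no site is up.
    """
    live = [site for site in site_numbers if is_up.get(site, False)]
    if not live:
        return None
    return max(live, key=lambda site: site_numbers[site])
```

For example, if the failed coordinator held the highest number, the live site with the next-highest number wins the election.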

Commit Protocols

Two-Phase
Three-Phase

All sites must commit or all sites have to rollback

Replicated data only

Two-Phase Commit

Phase 1
  Send PREPARE to all sites
  Sites respond READY or ABORT
Phase 2
  If all sites READY,
    COMMIT locally - Send COMMITs
  If not READY or time expires
    ROLLBACK locally - Send ROLLBACK

Two-Phase Commit

[Diagram sequence: a Coordinator connected to three Sites, one caption per step]

Start: Site requests commit
Phase 1: Send PREPARE - all sites
Phase 1: Sites respond READY
Phase 2: COMMIT locally
Phase 2: Send COMMIT - all sites

Failure case
Phase 1: Site responds ABORT or does not respond
Phase 2: ROLLBACK locally
Phase 2: Send ROLLBACK - all sites
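The message flow in the two-phase commit slides can be traced with a minimal in-process simulation. `Site`, `Vote`, and `two_phase_commit` are illustrative names only; real implementations also write log records at each step and use network timeouts, which this sketch models simply as a vote of `None`.

```python
from enum import Enum


class Vote(Enum):
    READY = "READY"
    ABORT = "ABORT"


class Site:
    """Hypothetical participant; a `will_vote` of None models a timeout."""

    def __init__(self, name, will_vote=Vote.READY):
        self.name = name
        self.will_vote = will_vote
        self.state = "ACTIVE"

    def prepare(self):
        # Phase 1 at the site: answer the coordinator's PREPARE.
        if self.will_vote is Vote.READY:
            self.state = "READY"
        return self.will_vote

    def finish(self, decision):
        # Phase 2 at the site: apply the coordinator's decision.
        self.state = decision


def two_phase_commit(sites):
    # Phase 1: send PREPARE to all sites and collect the votes.
    votes = [site.prepare() for site in sites]
    # Phase 2: commit only if every site answered READY; an ABORT or a
    # missing answer (None) forces a rollback everywhere.
    decision = "COMMIT" if all(v is Vote.READY for v in votes) else "ROLLBACK"
    for site in sites:
        site.finish(decision)
    return decision
```

A single ABORT or silent site is enough to roll back every participant, which is the all-or-nothing property the Commit Protocols slide asks for.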

Site Failure - Recovery

COMMIT and ROLLBACK as normal
If READY only
  Check with coordinator or other sites
  Either COMMIT or ROLLBACK
  If no one found, ROLLBACK

Coordinator Failure

Ask the sites
  If one has COMMIT, then REDO
  If one has ROLLBACK, then UNDO
  If one doesn’t have READY, UNDO
If all READY only
  Coordinator must decide
  Sites must wait and locks are held
  “Blocking” occurs
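The rules on this slide amount to a small decision function over the states found at the surviving sites. The state names and the `"BLOCKED"` return value are assumptions chosen for this sketch; `"ACTIVE"` stands for a site that has no READY record for the transaction.

```python
def resolve_after_coordinator_failure(site_states):
    """Decide a transaction's fate from the states logged at reachable sites.

    site_states: list of "COMMIT", "ROLLBACK", "READY", or "ACTIVE"
    (hypothetical labels; "ACTIVE" means no READY record was written).
    """
    if "COMMIT" in site_states:
        # Some site committed, so the decision was COMMIT: redo everywhere.
        return "COMMIT"
    if "ROLLBACK" in site_states:
        # Some site rolled back, so the decision was ROLLBACK: undo.
        return "ROLLBACK"
    if any(state != "READY" for state in site_states):
        # A site never voted READY, so the coordinator cannot have
        # decided to commit: undo is safe.
        return "ROLLBACK"
    # Every site is READY and none knows the outcome: only the failed
    # coordinator can decide, so sites wait and keep their locks.
    return "BLOCKED"
```

The last branch is the blocking case that motivates three-phase commit.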

Three-Phase Commit

Phase 1
  Send PREPARE
  Sites respond READY or ABORT
Phase 2
  If all sites READY, send PRECOMMIT
  Else, ROLLBACK
  Sites must ACKNOWLEDGE
Phase 3
  If at least K sites ACKNOWLEDGE, send COMMIT
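The three phases can be traced as the sequence of messages the coordinator sends. This sketch follows only the steps on the slide; the slide does not say what happens when fewer than K sites acknowledge, so that case is deliberately left undecided here. The function name and its inputs are hypothetical.

```python
def three_phase_commit(votes, acks, k):
    """Return the coordinator's messages, in order, for one transaction.

    votes: phase 1 answer per site, "READY", "ABORT", or None (no answer).
    acks: one bool per site, True if it acknowledged PRECOMMIT.
    k: minimum number of acknowledgements required before COMMIT.
    """
    messages = ["PREPARE"]
    # Phase 1 -> Phase 2: any ABORT or missing vote means ROLLBACK.
    if not all(vote == "READY" for vote in votes):
        messages.append("ROLLBACK")
        return messages
    messages.append("PRECOMMIT")
    # Phase 3: COMMIT only once at least k sites have acknowledged.
    if sum(acks) >= k:
        messages.append("COMMIT")
    # Fewer than k acknowledgements: left open, as on the slide.
    return messages
```

The extra PRECOMMIT round means that at least K sites know the intended outcome before anyone commits, which is what lets a new coordinator finish the job if the original one fails.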

Coordinator Failure

Three-Phase Commit prevents blocking
If coordinator fails
  New coordinator is selected
  Sites queried to determine status
  New coordinator resumes

Network Partitioning

Network split creates two separate networks
  Each “half” selects a coordinator
  Coordinators make independent decisions
  Result could be different decisions
  Resolution of network problem may create need to resolve database problems

Concurrency Control

Single Lock Manager
Multiple Lock Managers

Single Lock Manager

One site for all locking
All other sites must go to it
Can read from anywhere
Updates must be to all copies

Advantages: Simple, Easy deadlock detection
Disadvantages: Bottleneck, Vulnerability

Simple Multiple Lock Mgrs

Each site locks a unique partition of the data
  non-replicated data

Advantages: Fairly simple, reduced bottlenecks
Disadvantages: Complicated deadlock detection

Majority Protocol

Each site locks its own data
  replication possible
Request owner for lock on data that isn’t local
When multiple owners, n/2 + 1 (majority) must provide the lock

Advantages: No bottlenecks
Disadvantages: More messages sent, Complicated deadlock detection, More deadlocks (each gets 1/2)
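The majority rule is plain arithmetic, sketched below with hypothetical function names. Reading n/2 as integer division, 5 replicas need 3 grants and 4 replicas also need 3.

```python
def majority_needed(n_replicas):
    """Locks required under the majority protocol: n/2 + 1 (integer division)."""
    return n_replicas // 2 + 1


def majority_lock_granted(grants):
    """grants: one bool per replica site, True if that site granted the lock."""
    return sum(grants) >= majority_needed(len(grants))
```

With 4 replicas, two transactions that each hold 2 grants are both short of the 3 they need and neither can proceed, which is the "each gets 1/2" deadlock noted in the disadvantages.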

Biased Protocol

Reduced form of Majority Protocol
For a READ, only need any single lock
For a WRITE, need all locks

Advantages: No bottlenecks, Reduced traffic
Disadvantages: Update traffic, Deadlocks
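The biased rule differs from the majority rule only in the grant test: one lock is enough for a READ, every lock is required for a WRITE. The function name and the `"READ"`/`"WRITE"` mode strings are assumptions made for this sketch.

```python
def biased_lock_granted(mode, grants):
    """grants: one bool per replica site, True if that site granted the lock."""
    if not grants:
        return False
    if mode == "READ":
        # Any single replica's lock is enough to read.
        return any(grants)
    if mode == "WRITE":
        # Every replica must grant the lock before a write.
        return all(grants)
    raise ValueError(f"unknown lock mode: {mode!r}")
```

Reads get cheaper than under the majority protocol, while writes get more expensive, which matches the "Reduced traffic" advantage and the "Update traffic" disadvantage on the slide.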

Primary Copy

Site designated to hold “primary” copy
  Multiple sites
  Replicated Data
All locks through that site

Advantages: Fairly simple, reduced bottlenecks
Disadvantages: Vulnerability, Complicated deadlock detection

Other Than Locking

Timestamps
  Centralized generation
  Local generation

Timestamp tests determine ability to read or write
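The slide does not say which timestamp tests were intended. As one possibility, the sketch below uses the basic timestamp-ordering checks commonly presented in database textbooks: each data item remembers the largest timestamps that have read and written it, and an operation arriving "too late" is refused. `Item`, `try_read`, and `try_write` are hypothetical names, and timestamps are assumed to be positive integers.

```python
class Item:
    """A data item with the largest read and write timestamps seen so far."""

    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0


def try_read(item, ts):
    # Refuse a read by a transaction older than the last writer:
    # the value it should have seen is already overwritten.
    if ts < item.write_ts:
        return False
    item.read_ts = max(item.read_ts, ts)
    return True


def try_write(item, ts):
    # Refuse a write if a younger transaction already read or wrote the item.
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.write_ts = ts
    return True
```

A refused operation would normally cause its transaction to be rolled back and restarted with a new timestamp. No locks are held, so this scheme does not deadlock.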

Deadlocks & Distributed Data

Centralized
  One Site
Distributed

Centralized - same advantages and disadvantages as other centralized control (database or locking)

Distributed Deadlock Detection

Each site tracks all transactions accessing its own data
Dummy transaction for transactions that originated here but are executing elsewhere
If deadlock found that includes dummy transaction
  Must send deadlock information to other sites
  They check for deadlock
  May have to pass on to another site
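One way to picture the check at a single site is a wait-for graph in which a dummy node stands for transactions running elsewhere. The sketch below is an assumption-laden illustration: the graph representation, the `DUMMY` node name, and the three result strings are all invented here, and the actual exchange of deadlock information between sites is not modeled.

```python
DUMMY = "T_ext"  # hypothetical node standing for transactions at other sites


def find_cycle(wait_for, start):
    """Return a cycle through `start` in the wait-for graph, or None.

    wait_for: dict mapping a transaction to the set of transactions it waits for.
    """
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in wait_for.get(node, ()):
            if nxt == start:
                return path
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None


def check_site(wait_for):
    """Classify one site's graph: local deadlock, possible global one, or none."""
    # A cycle that avoids the dummy node is a deadlock this site can see alone.
    local = {n: set(s) - {DUMMY} for n, s in wait_for.items() if n != DUMMY}
    if any(find_cycle(local, n) for n in local):
        return "local deadlock"
    # A cycle through the dummy node may be a deadlock spanning sites,
    # so the information has to be sent on to other sites.
    if find_cycle(wait_for, DUMMY):
        return "ask other sites"
    return "no deadlock"
```

A site that receives such information would merge it with its own graph and run the same check, possibly passing the result on again.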

Homework #9

Continuing with the Carnegie Library
  Client/Server
  Distributed Database