
Post on 11-Jan-2016

IMS 4212: Distributed Databases

Dr. Lawrence West, Management Dept., University of Central Florida, lwest@bus.ucf.edu

Distributed Databases

• Business needs for distributed databases

• Introduction to distributed databases

• Subscriber / Publisher Model

• Snapshots

• Transactional Replication

• Merge Replication

• Dissimilar Databases

• Implementing Distributed DB

• Design Implications

• Advantages & Disadvantages

Business Needs for Distributed Databases

• The concept of a central database to handle all of the organization’s needs has several potential limitations

– Geographically dispersed organization requires extensive database traffic

• Large organization creates congestion at the server

• Large volumes of data must be moved across the network

– The entire organization can be vulnerable to a problem with a single server

– Data communications interruptions can disrupt the entire organization’s operations

Business Needs for Distributed Databases (cont.)

• Central database limitations (cont.)

– Dissimilar operating units create differing data access needs

• Local units require autonomy over the design and implementation of DB systems

• Information sharing across the organization still requires connectivity

• Local unit DB designers will not be allowed to design against the entire DB

Business Needs for Distributed Databases (cont.)

• Central database limitations (cont.)

– Mergers and acquisitions create ad-hoc integration of dissimilar DB systems

• Different business units may have fully developed DB and applications on dissimilar platforms, DBMS, etc.

• The organization still requires information sharing for organizational effectiveness

• Rewriting the whole system in a single DB is impractical (or may take time to implement)

Distributed Databases

• Distributed Databases are characterized by decisions made regarding:

– Distribution of data schema

• All nodes share same schema or not

– Update rights on objects (especially table data)

– Latency / concurrency requirements

– Commonality of DBMS

Subscriber/Publisher Model

• A subscriber / publisher model is often used to describe database updates

• Nodes allowed to change data & objects are publishers

• Nodes needing to be aware of changes are subscribers

• Decisions are made on methods for making subscribers aware of changes and of getting changes to them

– Near real time

– On demand

– Batch

– On schedule
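
The publisher and subscriber roles above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular DBMS implements replication: the `Publisher` and `Subscriber` classes, the mode labels, and the sample price are all assumptions made for the example. One subscriber applies changes in near real time, the other queues them until a batch runs.

```python
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    """A node that needs to be aware of changes (hypothetical class)."""
    name: str
    mode: str  # "near_real_time" or "batch" (illustrative labels)
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def receive(self, key, value):
        if self.mode == "near_real_time":
            self.data[key] = value             # apply immediately
        else:
            self.pending.append((key, value))  # hold until the next batch

    def apply_batch(self):
        """Apply all queued changes, e.g. on a schedule or on demand."""
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


@dataclass
class Publisher:
    """A node allowed to change data; it notifies every subscriber."""
    data: dict = field(default_factory=dict)
    subscribers: list = field(default_factory=list)

    def update(self, key, value):
        self.data[key] = value
        for sub in self.subscribers:
            sub.receive(key, value)


hq = Publisher()
store_a = Subscriber("store_a", mode="near_real_time")
store_b = Subscriber("store_b", mode="batch")
hq.subscribers.extend([store_a, store_b])

hq.update("widget_price", 9.99)
# store_a already sees the new price; store_b sees it only after its batch runs.
store_b_before_batch = dict(store_b.data)
store_b.apply_batch()
```

The choice between the two modes is the latency decision from the slide: near-real-time delivery keeps subscribers current, while batch delivery tolerates stale data in exchange for less traffic.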

Snapshots

• Distribution of databases (except when connecting existing databases) usually starts with a snapshot of all or part of a DB

– Copy of structures, data, stored procedures (SPs), triggers, etc.

• The snapshot is distributed to all nodes

– May be different snapshots to different nodes
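
As a rough sketch of the idea, the Python below takes a snapshot of one table from a central SQLite database and loads it into a new node, optionally with a row filter so that different nodes get different snapshots. SQLite stands in for the real DBMS only because it ships with Python; the `take_snapshot` helper, the `customers` table, and its rows are assumptions for illustration, and the sketch copies table structure and data only (no stored procedures or triggers).

```python
import sqlite3


def take_snapshot(source, table, where="1=1", params=()):
    """Copy one table's structure and selected rows into a new in-memory DB.

    The table name and WHERE text are trusted values in this sketch.
    """
    node = sqlite3.connect(":memory:")
    # sqlite_master stores the original CREATE TABLE statement for each table.
    (create_sql,) = source.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    node.execute(create_sql)
    rows = source.execute(
        f"SELECT * FROM {table} WHERE {where}", params
    ).fetchall()
    if rows:
        placeholders = ", ".join("?" for _ in rows[0])
        node.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    node.commit()
    return node


central = sqlite3.connect(":memory:")
central.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
)
central.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme", "East"), (2, "Bolt", "West"), (3, "Cobb", "East")],
)
central.commit()

east_node = take_snapshot(central, "customers", "region = ?", ("East",))
full_node = take_snapshot(central, "customers")
```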

A Scenario

• Corporate HQ is the central site

• Regional HQ or even ‘retail’ locations are remote sites

• Remote sites execute frequent transactions

• Q: What data is needed in each location for the organization’s business needs?

Transactional Replication

• In transactional replication, as each transaction is executed on any node it is ‘published’ to all subscribing nodes, which also execute the transaction

• Data integrity rules are checked at each node

• Violation of a data integrity rule at any node can roll back the transaction at all nodes

• Data is kept relatively current at all nodes
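
The behaviour described above can be approximated with two in-memory SQLite databases standing in for nodes. This is a simplified sketch, not a real distributed commit protocol: the `make_node` and `replicate` helpers, the `inventory` table, and the quantities are assumptions for illustration. Each node checks its own integrity rule (stock may not go negative), and a violation at any node rolls the statement back on all of them.

```python
import sqlite3


def make_node():
    """Create one node with an integrity rule: stock may not go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory (item TEXT PRIMARY KEY, "
        "qty INTEGER NOT NULL CHECK (qty >= 0))"
    )
    conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
    conn.commit()
    return conn


def replicate(nodes, sql, params=()):
    """Run one statement on every node; undo it everywhere if any node rejects it."""
    try:
        for node in nodes:
            node.execute(sql, params)   # each node checks its own constraints
    except sqlite3.IntegrityError:
        for node in nodes:
            node.rollback()             # a violation at any node rolls back all
        return False
    for node in nodes:
        node.commit()
    return True


def qty(node):
    """Read the current widget quantity at one node."""
    return node.execute(
        "SELECT qty FROM inventory WHERE item = 'widget'"
    ).fetchone()[0]


hq, store = make_node(), make_node()
store.execute("UPDATE inventory SET qty = 4")    # the store copy has drifted
store.commit()
nodes = [hq, store]

sale = "UPDATE inventory SET qty = qty - ? WHERE item = ?"
ok_small = replicate(nodes, sale, (1, "widget"))  # valid on both nodes
ok_large = replicate(nodes, sale, (4, "widget"))  # would go negative at the store
```

In the second call the update is valid at HQ but violates the rule at the store, so neither node keeps it.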

Transactional Replication (cont.)

• Application (“business”) needs control urgency and frequency of updates

• Some data is read only at some nodes

– Price schedule might be set centrally and only read locally

– Sales transactions are probably executed locally and rolled up centrally

Transactional Replication (cont.)

• When is Transactional Replication appropriate?

– Higher interaction between actions at nodes (easier to cause conflicts with out-of-date data)

– Decision making requires updated information

– Frequent changes can cause concurrency problems

– Connectivity is not an issue

• Detected problems can result in near-real-time rollbacks

Merge Replication

• In Merge Replication subscribers may receive a partition of the data

– Certain rows

• Only customers or employees in their region

– Certain columns

• Employee contact info but not salary info

• Subscribers may add, update, or delete rows to which they have write access

• Changes made at subscribers are committed (published) in a batch and merged back into the central DB
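
Row and column partitioning can be illustrated with a filtered query. The sketch below is an assumption-laden example rather than a replication tool: the `employees` table, its sample rows, and the `partition_for` helper are invented for illustration. A regional subscriber receives only its own rows, and only the contact columns, so salary data never leaves the central site.

```python
import sqlite3

central = sqlite3.connect(":memory:")
central.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, "
    "region TEXT, phone TEXT, salary REAL)"
)
central.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ana", "East", "555-0101", 72000.0),
        (2, "Ben", "West", "555-0102", 65000.0),
        (3, "Cal", "East", "555-0103", 58000.0),
    ],
)
central.commit()


def partition_for(region, columns=("id", "name", "region", "phone")):
    """Return only the rows for one region and only the listed columns."""
    column_list = ", ".join(columns)  # column names are trusted constants here
    cursor = central.execute(
        f"SELECT {column_list} FROM employees WHERE region = ? ORDER BY id",
        (region,),
    )
    return cursor.fetchall()


east_rows = partition_for("East")
```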

Merge Replication (cont.)

• System is able to detect when the remote site copy of data has changed (including new records)

• Changed data is marked for updating in the central copy during merge
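
One simple way to picture this change tracking is a set of "dirty" keys kept alongside the remote copy. The sketch below is only an illustration: the `RemoteCopy` class, the `merge_into` function, and the sample rows are assumptions, deletes are not handled, and a real DBMS tracks changes with its own internal metadata.

```python
class RemoteCopy:
    """A remote site's copy of some rows, keyed by primary key (hypothetical)."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.changed = set()   # keys inserted or updated since the last merge

    def upsert(self, key, value):
        self.rows[key] = value
        self.changed.add(key)  # mark the row for the next merge


def merge_into(central, remote):
    """Copy only the marked rows into the central copy, then clear the marks."""
    merged = sorted(remote.changed)
    for key in merged:
        central[key] = remote.rows[key]
    remote.changed.clear()
    return merged


central_copy = {101: "Ana", 102: "Ben"}
store = RemoteCopy(central_copy)
store.upsert(102, "Benjamin")   # changed record
store.upsert(103, "Cal")        # new record
merged_keys = merge_into(central_copy, store)
```

Only the two marked rows travel during the merge; the unchanged row 101 is never sent.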

Merge Replication (cont.)

• When is merge replication appropriate?

– Few chances for node operations to create conflicts

• Highly autonomous activities

• Different lines of business

– Infrequent changes requiring immediate awareness by all subscribers

– Physical connectivity issues

• May create more complex problems when a conflict does occur

– Rolling back already committed transactions
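
To see why a late conflict is awkward, consider a merge in which each row carries a version number. This is one possible detection scheme, chosen for the example and not taken from the slides; the `merge_with_conflicts` function, the tuple layout, and the prices are all assumptions. A remote change is accepted only if the central row is still at the version the remote site started from.

```python
def merge_with_conflicts(central, remote_changes):
    """Apply remote changes unless the central row moved on since it was read.

    central maps key -> (version, value); each remote change is
    (key, base_version, new_value), where base_version is the version
    the remote site started from.
    """
    conflicts = []
    for key, base_version, new_value in remote_changes:
        current_version, _ = central.get(key, (0, None))
        if current_version != base_version:
            conflicts.append(key)          # someone else changed it first
        else:
            central[key] = (current_version + 1, new_value)
    return conflicts


central_prices = {"widget": (1, 9.99), "gadget": (1, 19.99)}
# Meanwhile another site has already merged a gadget price change.
central_prices["gadget"] = (2, 21.99)
conflicts = merge_with_conflicts(
    central_prices,
    [("widget", 1, 10.49), ("gadget", 1, 18.99)],
)
```

The gadget change was already committed at the remote site before the merge detected the conflict, which is why resolving it may mean undoing work the remote users consider finished.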

Dissimilar Databases

• Distributed DB nodes may be dissimilar on two dimensions

– DB architecture (table structure, field data types/names, etc.)

– DBMS and OS (may not even be relational data)

• “Messages” sent between nodes to inform them of updates must be translated somewhere

• Imposes new layers of complexity for connectivity

• SQL Server provides support for this process

• Many third party products for logical integration
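
The translation step can be pictured as a mapping applied to each update message. The sketch below assumes two hypothetical nodes that name and format the same customer fields differently; the field names, the `translate` function, and the date formats are invented for illustration and do not describe SQL Server's or any third-party product's actual mechanism.

```python
# Field names used by two hypothetical nodes for the same customer record.
NODE_A_TO_NODE_B = {
    "cust_id": "CustomerID",
    "cust_name": "FullName",
    "signup": "SignupDate",
}


def translate(message, field_map, converters=None):
    """Rename fields and convert values so the target node can apply the update."""
    converters = converters or {}
    translated = {}
    for source_field, value in message.items():
        target_field = field_map[source_field]
        convert = converters.get(source_field, lambda v: v)
        translated[target_field] = convert(value)
    return translated


def us_to_iso_date(text):
    """Convert 'MM/DD/YYYY' to 'YYYY-MM-DD'."""
    month, day, year = text.split("/")
    return f"{year}-{month}-{day}"


update_from_a = {"cust_id": "42", "cust_name": "Acme Corp", "signup": "01/11/2016"}
update_for_b = translate(
    update_from_a,
    NODE_A_TO_NODE_B,
    converters={"cust_id": int, "signup": us_to_iso_date},
)
```

Every pair of dissimilar nodes needs a mapping like this somewhere, which is the extra layer of complexity the slide refers to.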

Implementing DB Distribution

• SQL Server comes with a wealth of distributed DB management tools

– Specify publication schedules, rights, update frequencies, etc.

– Manage conflicts when they occur and notify clients

– Perform translations between DBMS

– Perform translations between structures

Design Implications

• Some DB designs may change when the DB is replicated

– Relationships may not be enforced in remote nodes because matching parent rows may not exist

– GUID attributes may be needed for PKs since independently generated Identity attributes could conflict when rolled up

– Triggers or constraints may be different

• May violate locally but be OK globally

• Vice-versa
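
The primary key point is easy to demonstrate. In the sketch below, two hypothetical nodes each generate identity-style keys from their own counter, which collide when the rows are rolled up; the same rows keyed by GUIDs do not. The `new_rows` helper and the sample names are assumptions for illustration, and Python's `uuid.uuid4` stands in for a DBMS GUID generator.

```python
import itertools
import uuid


def new_rows(names, key_source):
    """Create rows at one node, taking each primary key from key_source."""
    return {key_source(): name for name in names}


# Each node has its own identity counter starting at 1.
east_counter = itertools.count(1)
west_counter = itertools.count(1)
east_identity = new_rows(["Ana", "Ben"], lambda: next(east_counter))
west_identity = new_rows(["Cal", "Dee"], lambda: next(west_counter))
colliding_keys = sorted(east_identity.keys() & west_identity.keys())

# With GUID keys the two nodes can be rolled up without renumbering.
east_guid = new_rows(["Ana", "Ben"], uuid.uuid4)
west_guid = new_rows(["Cal", "Dee"], uuid.uuid4)
rolled_up = {**east_guid, **west_guid}
```

The trade-off is that GUID keys are larger and less readable than integers, so the design change is worth making mainly when rows are created independently at several nodes.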

Database Distribution Advantages & Tradeoffs

• Key advantages of distributed DB

– Increased reliability

– Local access and control

– Modular growth

– Lower communication costs

– Faster response

What are the mechanisms that give rise to these advantages?

Database Distribution Advantages & Tradeoffs (cont.)

• Disadvantages of distributed DB

– Software cost & complexity

• Keeping data current

• Maintaining data integrity

• Integrating multiple sites and applications

– Processing overhead

– Data integrity

– Slow response from poor design

Distributed DBMS (cont.)

• Distributed DBMS attempts to achieve “Location Transparency”

– User or application will not need to know that the query is going to multiple nodes

– User has one integrated DB schema

– Distributed DBMS performs all network operations

• It also seeks to achieve “Replication Transparency”

– Replication operations are performed automatically

– Manages multiple updates against different copies of replicated data
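
Location transparency can be sketched as a thin layer that hides the nodes from the caller. The example below is a toy illustration under stated assumptions: the `DistributedSales` class, the `make_node` helper, the `sales` table, and the amounts are invented, and in-memory SQLite databases stand in for remote nodes. The caller asks one question of one object and never learns that two nodes were queried.

```python
import sqlite3


def make_node(rows):
    """Create one node holding part of the sales data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


class DistributedSales:
    """Presents several nodes as one 'sales' table (hypothetical facade)."""

    def __init__(self, nodes):
        self._nodes = nodes

    def total_by_region(self):
        """Query every node and combine the results; callers never see the nodes."""
        totals = {}
        query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
        for node in self._nodes:
            for region, amount in node.execute(query):
                totals[region] = totals.get(region, 0.0) + amount
        return totals


db = DistributedSales([
    make_node([("East", 100.0), ("East", 50.0)]),
    make_node([("West", 75.0), ("East", 25.0)]),
])
totals = db.total_by_region()
```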