IMS 4212: Distributed Databases
Dr. Lawrence West, Management Dept., University of Central Florida
Distributed Databases
• Business needs for distributed databases
• Introduction to distributed databases
• Subscriber / Publisher Model
• Snapshots
• Transactional Replication
• Merge Replication
• Dissimilar Databases
• Implementing Distributed DB
• Design Implications
• Advantages & Disadvantages
Business Needs for Distributed Databases
• The concept of a central database to handle all of the organization’s needs has several potential limitations
– Geographically dispersed organization requires extensive database traffic
• Large organization creates congestion at the server
• Large volumes of data must be moved across the network
– The entire organization can be vulnerable to a problem with a single server
– Data communications interruptions can disrupt the entire organization’s operations
Business Needs for Distributed Databases (cont.)
• Central database limitations (cont.)
– Dissimilar operating units create differing data access needs
• Local units require autonomy over the design and implementation of DB systems
• Information sharing across the organization still requires connectivity
• Local unit DB designers will not be allowed to design against the entire DB
Business Needs for Distributed Databases (cont.)
• Central database limitations (cont.)
– Mergers and acquisitions create ad-hoc integration of dissimilar DB systems
• Different business units may have fully developed DB and applications on dissimilar platforms, DBMS, etc.
• The organization still requires information sharing for organizational effectiveness
• Rewriting the whole system in a single DB is impractical (or may take time to implement)
Distributed Databases
• Distributed Databases are characterized by decisions made regarding:
– Distribution of data schema
• All nodes share same schema or not
– Update rights on objects (especially table data)
– Latency / concurrency requirements
– Commonality of DBMS
Subscriber/Publisher Model
• A subscriber / publisher model is often used to describe database updates
• Nodes allowed to change data & objects are publishers
• Nodes needing to be aware of changes are subscribers
• Decisions are made on methods for making subscribers aware of changes and of getting changes to them
– Near real time
– On demand
– Batch
– On schedule
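The publisher and subscriber roles above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Publisher` and `Subscriber` classes, the `immediate` and `batched` lists, and the `flush` method are hypothetical names, not part of any DBMS. It shows one publisher delivering the same change near real time to one subscriber and in a batch to another.

```python
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    """A node that needs to be aware of changes (hypothetical class)."""
    name: str
    data: dict = field(default_factory=dict)

    def apply(self, changes: list) -> None:
        for key, value in changes:
            self.data[key] = value


@dataclass
class Publisher:
    """A node allowed to change data (hypothetical class)."""
    data: dict = field(default_factory=dict)
    immediate: list = field(default_factory=list)  # near-real-time subscribers
    batched: list = field(default_factory=list)    # batch / scheduled subscribers
    pending: list = field(default_factory=list)    # changes waiting for the next batch

    def write(self, key, value) -> None:
        self.data[key] = value
        change = (key, value)
        for sub in self.immediate:
            sub.apply([change])       # pushed as soon as the change is made
        self.pending.append(change)   # queued for batch subscribers

    def flush(self) -> None:
        """Deliver queued changes; a schedule or an on-demand request would call this."""
        for sub in self.batched:
            sub.apply(self.pending)
        self.pending = []


hq = Publisher()
store_a = Subscriber("store_a")
store_b = Subscriber("store_b")
hq.immediate.append(store_a)
hq.batched.append(store_b)

hq.write("widget_price", 9.99)
print(store_a.data)  # already holds the new price
print(store_b.data)  # still empty until the batch runs
hq.flush()
print(store_b.data)
```

The only difference between the delivery options in this sketch is when `apply` is called, which is the decision the slide describes.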
Snapshots
• Distribution of a database (except when connecting existing databases) usually starts with a snapshot of all or part of a DB
– Copy of structures, data, SP, triggers, etc.
• The snapshot is distributed to all nodes
– May be different snapshots to different nodes
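As a rough illustration of a snapshot carrying structures, data, and triggers together, the sketch below uses SQLite through Python's standard `sqlite3` module rather than SQL Server. The `product` and `price_audit` tables and the `log_price` trigger are made-up names for the example.

```python
import sqlite3

# Central database with two tables, a trigger, and some data (hypothetical schema).
central = sqlite3.connect(":memory:")
central.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE price_audit (product_id INTEGER, old_price REAL, new_price REAL);
    CREATE TRIGGER log_price AFTER UPDATE OF price ON product
    BEGIN
        INSERT INTO price_audit VALUES (OLD.id, OLD.price, NEW.price);
    END;
    INSERT INTO product VALUES (1, 'widget', 9.99), (2, 'gadget', 24.50);
""")
central.commit()

# Take the snapshot: Connection.backup copies the whole database to the node.
node = sqlite3.connect(":memory:")
central.backup(node)

# The trigger travelled with the snapshot, so it fires on the node's copy.
node.execute("UPDATE product SET price = 10.99 WHERE id = 1")
node.commit()
print(node.execute("SELECT * FROM price_audit").fetchall())
# The central copy is unaffected until some form of replication runs.
print(central.execute("SELECT price FROM product WHERE id = 1").fetchone())
```

After the snapshot the two copies drift apart, which is why one of the replication methods is needed next.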
A Scenario
• Corporate HQ is the central site
• Regional HQ or even ‘retail’ locations are remote sites
• Remote sites execute frequent transactions
• Q: What data is needed in each location for the organization’s business needs?
Transactional Replication
• In transactional replication, as each transaction is executed on any node, it is ‘published’ to all subscribing nodes, which also execute the transaction
• Data integrity rules are checked at each node
• Violation of a data integrity rule at any node can roll back the transaction at all nodes
• Data is kept relatively current at all nodes
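The all-or-nothing behaviour described above can be sketched with two SQLite connections standing in for nodes. This is a simplified illustration, not how SQL Server implements transactional replication: the `account` table, its `CHECK` rule, and the `replicate` helper are assumptions made for the example, and a real system would also need a proper commit protocol to survive a failure partway through the commits.

```python
import sqlite3


def make_node() -> sqlite3.Connection:
    """Create one node with the same schema and integrity rule (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, "
        "balance REAL NOT NULL CHECK (balance >= 0))"
    )
    conn.commit()
    return conn


def replicate(nodes, statements) -> bool:
    """Run the same transaction on every node; commit everywhere or nowhere."""
    try:
        for node in nodes:
            for sql, params in statements:
                node.execute(sql, params)  # each node checks its own rules
    except sqlite3.IntegrityError:
        for node in nodes:
            node.rollback()                # a violation at any node undoes all nodes
        return False
    for node in nodes:
        node.commit()
    return True


hq, store = make_node(), make_node()
nodes = [hq, store]
replicate(nodes, [("INSERT INTO account VALUES (?, ?)", (1, 100.0))])

# Simulate the store being out of step: a local change that was never published.
store.execute("UPDATE account SET balance = 20 WHERE id = 1")
store.commit()

# The withdrawal is valid at HQ (100 - 50) but violates the rule at the store (20 - 50).
ok = replicate(nodes, [("UPDATE account SET balance = balance - 50 WHERE id = 1", ())])
print(ok)
print(hq.execute("SELECT balance FROM account").fetchone())
print(store.execute("SELECT balance FROM account").fetchone())
```

The withdrawal is rejected at both nodes even though it would have been acceptable at HQ alone.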
Transactional Replication (cont.)
• Application (“business”) needs control urgency and frequency of updates
• Some data is read only at some nodes
– Price schedule might be set centrally and only read locally
– Sales transactions are probably executed locally and rolled up centrally
Transactional Replication (cont.)
• When is Transactional Replication appropriate?
– Higher interaction between actions at nodes (easier to cause conflicts with out-of-date data)
– Decision making requires updated information
– Frequent changes can cause concurrency problems
– Connectivity is not an issue
• Detected problems can result in near-real-time rollbacks
Merge Replication
• In Merge Replication subscribers may receive a partition of the data
– Certain rows
• Only customers or employees in their region
– Certain columns
• Employee contact info but not salary info
• Subscribers may add, update, or delete rows to which they have write access
• Changes are committed (published) to the subscribers in a batch (merged back into the subscriber DB)
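The row and column partitioning described above amounts to a filter on rows plus a projection on columns. A minimal sketch, using invented employee records and a hypothetical `partition` helper:

```python
# Central copy of the data (made-up records for illustration).
employees = [
    {"id": 1, "name": "Ana", "region": "East", "phone": "555-0101", "salary": 82000},
    {"id": 2, "name": "Ben", "region": "West", "phone": "555-0102", "salary": 67000},
    {"id": 3, "name": "Cy", "region": "East", "phone": "555-0103", "salary": 91000},
]


def partition(rows, row_filter, columns):
    """Return only the rows that pass row_filter, keeping only the listed columns."""
    return [{col: row[col] for col in columns} for row in rows if row_filter(row)]


# The East region subscriber gets its own employees, with contact info but no salary.
east_copy = partition(
    employees,
    row_filter=lambda row: row["region"] == "East",
    columns=["id", "name", "region", "phone"],
)
print(east_copy)
```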
Merge Replication (cont.)
• System is able to detect when remote site copy of data has changed (including new records)
• Changed data is marked for updating in central copy during merge
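One simple way to picture the change detection and marking is a set of changed keys kept at the remote site. The sketch below assumes that approach. The `RemoteSite` class and its methods are hypothetical, and deletes and conflict handling are left out on purpose.

```python
class RemoteSite:
    """Hypothetical remote node that marks every row it changes or adds."""

    def __init__(self, snapshot: dict):
        # Start from a copy of the central data, as a snapshot would provide.
        self.rows = {key: dict(row) for key, row in snapshot.items()}
        self.changed = set()  # keys changed or added since the last merge

    def upsert(self, key, row) -> None:
        self.rows[key] = dict(row)
        self.changed.add(key)  # mark the row for the next merge

    def merge_into(self, central: dict) -> list:
        """Copy only the marked rows into the central copy, as one batch."""
        merged = sorted(self.changed)
        for key in merged:
            central[key] = dict(self.rows[key])
        self.changed.clear()
        return merged


central = {
    1: {"name": "Ana", "phone": "555-0101"},
    2: {"name": "Ben", "phone": "555-0102"},
}
site = RemoteSite(central)
site.upsert(2, {"name": "Ben", "phone": "555-0199"})  # changed record
site.upsert(3, {"name": "Cy", "phone": "555-0103"})   # new record

merged = site.merge_into(central)
print(merged)
print(central[2]["phone"])
```

Row 1 was never touched at the remote site, so it is not sent during the merge.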
Merge Replication (cont.)
• When is merge replication appropriate?
– Few chances for node operations to create conflicts
• Highly autonomous activities
• Different lines of business
– Infrequent changes requiring immediate awareness by all subscribers
– Physical connectivity issues
• May create more complex problems when a conflict does occur
– Rolling back already committed transactions
Dissimilar Databases
• Distributed DB nodes may be dissimilar on two dimensions
– DB architecture (table structure, field data types/names, etc.)
– DBMS and OS (may not even be relational data)
• “Messages” sent between nodes to inform them of updates must be translated somewhere
• Imposes new layers of complexity for connectivity
• SQL Server provides support for this process
• Many third party products for logical integration
Implementing DB Distribution
• SQL Server comes with a wealth of distributed DB management tools
– Specify publication schedules, rights, update frequencies, etc.
– Manage conflicts when they occur and notify clients
– Perform translations between DBMS
– Perform translations between structures
Design Implications
• Some DB designs may change when the DB is replicated
– Relationships may not be enforced in remote nodes because matching parent rows may not exist
– GUID attributes may be needed for PKs since independently generated Identity attributes could conflict when rolled up
– Triggers or constraints may be different
• May violate locally but be OK globally
• Vice-versa
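The primary key point can be demonstrated directly. In this sketch two nodes generate Identity-style integer keys independently and then roll up, and the same roll-up is repeated with GUID keys from Python's standard `uuid` module. The order data is invented for illustration.

```python
import itertools
import uuid

# Each node hands out Identity-style keys on its own, both starting at 1.
node_a_ids = itertools.count(1)
node_b_ids = itertools.count(1)
orders_a = {next(node_a_ids): f"A-order-{i}" for i in range(3)}
orders_b = {next(node_b_ids): f"B-order-{i}" for i in range(3)}

# The same key values exist at both nodes, so the roll-up loses rows.
collisions = orders_a.keys() & orders_b.keys()
rolled_up = {**orders_a, **orders_b}
print(sorted(collisions), len(rolled_up))

# With GUID keys each node can still generate keys without coordination.
guid_a = {uuid.uuid4(): f"A-order-{i}" for i in range(3)}
guid_b = {uuid.uuid4(): f"B-order-{i}" for i in range(3)}
guid_rollup = {**guid_a, **guid_b}
print(len(guid_rollup))
```

With integer keys, three of the six orders are silently overwritten in the roll-up. With GUID keys all six survive.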
Database Distribution Advantages & Tradeoffs
• Key advantages of distributed DB
– Increased reliability
– Local access and control
– Modular growth
– Lower communication costs
– Faster response
What are the mechanisms that give rise to these advantages?
Database Distribution Advantages & Tradeoffs (cont.)
• Disadvantages of distributed DB
– Software cost & complexity
• Keeping data current
• Maintaining data integrity
• Integrating multiple sites and applications
– Processing overhead
– Data integrity
– Slow response from poor design
Distributed DBMS (cont.)
• Distributed DBMS attempts to achieve “Location Transparency”
– User or application will not need to know that the query is going to multiple nodes
– User has one integrated DB schema
– Distributed DBMS performs all network operations
• Also seek to achieve “Replication Transparency”
– Replication operations are performed automatically
– Manages multiple updates against different copies of replicated data
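Location transparency can be pictured as a thin layer that accepts one query and handles the per-node work itself. The sketch below is a deliberately naive illustration: the `DistributedCatalog` class and the `customer` schema are hypothetical, and simply concatenating per-node results only works for plain selections, not for joins or aggregates that span nodes.

```python
import sqlite3


def make_node(rows) -> sqlite3.Connection:
    """Create one node holding part of the customer data (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


class DistributedCatalog:
    """Hypothetical facade: callers see one customer table, not the nodes behind it."""

    def __init__(self, nodes):
        self._nodes = nodes

    def query(self, sql, params=()):
        rows = []
        for node in self._nodes:  # the fan-out is hidden from the caller
            rows.extend(node.execute(sql, params).fetchall())
        return rows


east = make_node([(1, "Ana", "East"), (2, "Ben", "East")])
west = make_node([(3, "Cy", "West")])
db = DistributedCatalog([east, west])

# The application issues one query and never names a node.
names = db.query("SELECT name FROM customer WHERE id >= ?", (2,))
print(names)
```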