WHY NOT USE FEDERATED APPROACH FOR DATABASE MANAGEMENT SYSTEM (DBMS)?

Yan Cui

ITK478

Position paper

CRUCIAL ISSUES IN ENTERPRISES

“…organizations merge or takeover since the existing systems have been designed for different corporate needs, the resulting enterprise will have to face information inconsistency, heterogeneity and incompatible overlap”. Wijegunaratne, Fernandez and Valtoudis in [1]

“…a large modern enterprise, it is also inevitable that …use different database systems to store and search their critical data. Competition, evolving technology, mergers, acquisitions, geographic distribution, and … decentralization of growth…” Haas and Lin in [2]

DBMS APPROACHES

Compare two major approaches:
- Federated database system approach
- Distributed database system approach

The comparison covers their architectures/designs, transparency, integration, autonomy, and other features.

DISTRIBUTED DBMS

Definition of distributed database (DDBS) and distributed database management system (DDBMS)

Centralized and distributed databases conversion

Distributed DBMS design

DISTRIBUTED DBMS (CONT)

Definition of distributed database (DDBS) and distributed database management system (DDBMS):
- Distributed database: “a collection of multiple, logically interrelated databases distributed over a computer network” (M. Özsu and P. Valduriez in [3]).
- Distributed DBMS: “the software system that permits the management of the DDBS and makes the distribution transparent to the users” (M. Özsu and P. Valduriez in [3]).

DISTRIBUTED DBMS (CONT)

Centralized and distributed databases conversion:
- Compared with a centralized database, a distributed DBMS offers more “local autonomy, improved performance, improved reliability/availability, economics, expandability, and shareability” [3].

Fig. 1 - Central Database on a Network [3]
Fig. 2 - DDBS Environment [3]

DISTRIBUTED DBMS (CONT)

Distributed DBMS design: F. A. Baião, M. Mattoso and G. Zaverucha state in [4] that “Distribution design involves making decisions on the fragmentation and placement of data across the sites of a computer network”. It has two parts:
- Fragmentation
- Allocation

DISTRIBUTED DBMS (CONT)

Distributed DBMS design – Fragmentation
Defined as “clustering fragments the information accessed simultaneously by applications” [4]. There are three types:
- vertical fragmentation
- horizontal fragmentation
- mixed fragmentation

DISTRIBUTED DBMS (CONT)

Distributed DBMS design – Fragmentation
- horizontal fragmentation: class instances are distributed across fragments, so a horizontal fragment of a class contains a subset of the whole class extension [4]
  - Primary (round-robin, hash partition, and range partition)
  - Derived fragment

Fig. 3 - Round-robin [5]
Fig. 4 - Hash-partition [5]
Fig. 5 - Range partition [5]
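
The three primary horizontal strategies named above can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample student rows, the function names, and the choice of partitioning keys are hypothetical and do not come from [4] or [5].

```python
import bisect

# Hypothetical sample data: (student_id, name, gpa) tuples.
STUDENTS = [
    (1, "Ana", 3.9),
    (2, "Ben", 2.7),
    (3, "Cy", 3.2),
    (4, "Di", 1.8),
    (5, "Eli", 3.5),
]


def round_robin(rows, n):
    """Round-robin: the i-th row goes to fragment i mod n."""
    fragments = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        fragments[i % n].append(row)
    return fragments


def hash_partition(rows, n, key):
    """Hash partition: a row goes to fragment key(row) mod n.

    Assumption: key(row) is already an integer; a real system would
    apply a hash function to arbitrary key values first.
    """
    fragments = [[] for _ in range(n)]
    for row in rows:
        fragments[key(row) % n].append(row)
    return fragments


def range_partition(rows, bounds, key):
    """Range partition: `bounds` is a sorted list of split points.

    Fragment 0 holds keys below bounds[0], fragment j holds keys in
    [bounds[j-1], bounds[j]), and the last fragment holds the rest.
    """
    fragments = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        fragments[bisect.bisect_right(bounds, key(row))].append(row)
    return fragments


by_turn = round_robin(STUDENTS, 2)
by_id = hash_partition(STUDENTS, 3, key=lambda r: r[0])
by_gpa = range_partition(STUDENTS, [2.0, 3.0], key=lambda r: r[2])
```

Round-robin balances fragment sizes but gives no help locating a row; hash and range partitioning let a query on the key go straight to one fragment, and range partitioning additionally keeps neighbouring key values together.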

DISTRIBUTED DBMS (CONT)

Distributed DBMS design – Fragmentation
- horizontal fragmentation
  - Derived fragment

Fig. 5 - Range partition [5]

DISTRIBUTED DBMS (CONT)

Distributed DBMS design – Fragmentation
- vertical fragmentation: attributes and methods are distributed across fragments, for example fragment 1 (name, GPA) and fragment 2 (address, bDate, picture) from the student class in Fig. 7
- mixed fragmentation: a combination of vertical and horizontal fragmentation

Fig. 7 – Vertical fragmentation [5]
Fig. 8 – Mixed fragmentation [5]
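
The split of the student class into fragment 1 (name, GPA) and fragment 2 (address, bDate, picture), shown as vertical fragmentation in Fig. 7, might look like the following sketch. The sample records, the `id` key carried in every fragment, and the helper names are assumptions added for illustration; the slide does not say how fragments are rejoined.

```python
# Hypothetical student records; "id" is an assumed key that is not on the slide.
STUDENTS = [
    {"id": 1, "name": "Ana", "GPA": 3.9, "address": "1 Elm St",
     "bDate": "1990-01-01", "picture": "ana.png"},
    {"id": 2, "name": "Ben", "GPA": 2.7, "address": "2 Oak St",
     "bDate": "1991-02-02", "picture": "ben.png"},
]


def vertical_fragment(rows, attributes, key="id"):
    """Project every row onto `attributes`, keeping the key for rejoining."""
    return [{key: r[key], **{a: r[a] for a in attributes}} for r in rows]


def rejoin(frag_a, frag_b, key="id"):
    """Rebuild full rows by matching the two fragments on the key."""
    by_key = {r[key]: r for r in frag_b}
    return [{**r, **by_key[r[key]]} for r in frag_a]


# Vertical fragmentation: the two attribute groups named on the slide.
fragment1 = vertical_fragment(STUDENTS, ["name", "GPA"])
fragment2 = vertical_fragment(STUDENTS, ["address", "bDate", "picture"])

# Mixed fragmentation: split one vertical fragment horizontally as well
# (the 3.5 GPA threshold is an arbitrary illustrative predicate).
fragment1_high = [r for r in fragment1 if r["GPA"] >= 3.5]
fragment1_low = [r for r in fragment1 if r["GPA"] < 3.5]
```

Repeating the key in each fragment is what makes the decomposition lossless: `rejoin(fragment1, fragment2)` reconstructs the original records.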

DISTRIBUTED DBMS (CONT)

Distributed DBMS design – Allocation
According to M. Özsu and P. Valduriez in [3], allocation distributes all resources/fragments across the nodes/sites of a computer network.
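
As a rough illustration of an allocation decision, the sketch below places each fragment at the site that accesses it most often. This greedy rule is an assumption chosen for brevity, not the allocation method of [3]; the fragment names, site names, and access counts are invented.

```python
# Hypothetical access statistics: fragment -> {site: accesses per day}.
ACCESS_COUNTS = {
    "students_low_gpa": {"advising": 120, "registrar": 40},
    "students_high_gpa": {"advising": 15, "registrar": 90},
    "student_contacts": {"advising": 5, "registrar": 60, "alumni": 200},
}


def allocate(access_counts):
    """Greedy allocation: store each fragment at its most frequent reader."""
    return {
        fragment: max(sites, key=sites.get)
        for fragment, sites in access_counts.items()
    }


placement = allocate(ACCESS_COUNTS)
```

A realistic allocator would also weigh storage limits, update costs, and replication; the point here is only that allocation maps each fragment to one or more sites.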

FEDERATED DBMS

Definition: all data sources, from heterogeneous DBMSs and different locations, relevant or irrelevant, structured or unstructured, are federated and linked together into a unified system by the DBMS (L.M. Haas, E.T. Lin and M.A. Roth in [6]).

Characteristics of a federated DBMS: transparency, heterogeneity, a high degree of function, extensibility, openness, autonomy, and optimized performance [2,6].

FEDERATED DBMS

DB2 architecture for database federation:
- User-defined function (UDF): scalar and table UDFs
- Wrapper

Fig. 9 – DB2 architecture of database federation [6]

FEDERATED DBMS

DB2 architecture for database federation
- UDF: takes input parameters and returns either a scalar result or a table of data.
  - Scalar UDF: is called from an SQL statement and returns a scalar result.
  - Table UDF: produces a table as output for any SQL statement that references it.

Example 1 - Scalar UDF [6]

    SELECT db2mq.mqsend(a.headline)
    FROM Articles a
    WHERE a.article_timestamp >= CURRENT TIMESTAMP

Example 2 - Table UDF [6]

    SELECT a.first, a.last, a.phone, a.email
    FROM TABLE(addressbook()) AS a, Company_Profiles c
    WHERE c.industry = 'FINANCIAL'
      AND c.revenue > 50000000
      AND c.name = a.company_name
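
The scalar UDF idea in Example 1 can be tried without DB2. The sketch below uses SQLite through Python's standard `sqlite3` module, whose `Connection.create_function` registers a Python function as a scalar SQL function. It is an analogy, not DB2's mechanism: the `mqsend` stand-in, the `outbox` list that plays the message queue, the sample rows, and the fixed cutoff timestamp are all assumptions.

```python
import sqlite3

outbox = []  # stand-in for a message queue (assumption, not MQSeries)


def mqsend(message):
    """Hypothetical stand-in for db2mq.mqsend: record the message, return 1."""
    outbox.append(message)
    return 1


conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument scalar SQL function.
conn.create_function("mqsend", 1, mqsend)

conn.execute("CREATE TABLE Articles (headline TEXT, article_timestamp TEXT)")
conn.executemany(
    "INSERT INTO Articles VALUES (?, ?)",
    [
        ("Old news", "2007-01-01 00:00:00"),
        ("Fresh news", "2007-10-01 09:00:00"),
    ],
)

# Same shape as Example 1, with a fixed cutoff instead of CURRENT TIMESTAMP
# so that the result does not depend on when the script runs.
rows = conn.execute(
    "SELECT mqsend(a.headline) FROM Articles a WHERE a.article_timestamp >= ?",
    ("2007-09-01 00:00:00",),
).fetchall()
```

The query engine calls the function for each qualifying row, which is how a UDF lets an ordinary SELECT reach functionality that lives outside the database.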

FEDERATED DBMS

DB2 architecture for database federation
- Wrapper: described as a “powerful and flexible infrastructure for federation” in [6] because it integrates both scalar UDF functions and table UDF data.

Example 3 – Wrapper [6]

    SELECT c.name, a.URL
    FROM Compounds c, Experiments e, Articles a
    WHERE e.result < 1.1e-p
      AND e.id = c.id
      AND search(a.subject, c.name) > 0
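
To make the wrapper idea concrete, here is a minimal sketch in which two different kinds of source sit behind one common interface and a federated query joins them. It is loosely modelled on Example 3 but is not DB2's wrapper architecture: the `Wrapper` protocol, both wrapper classes, the toy `search` scoring function, and the sample data are all hypothetical.

```python
import sqlite3
from typing import Iterable, Protocol


class Wrapper(Protocol):
    """Common interface every data source is wrapped in (assumed design)."""

    def rows(self) -> Iterable[dict]: ...


class SQLiteWrapper:
    """Wraps one table of a relational source."""

    def __init__(self, conn, table):
        self.conn, self.table = conn, table  # table name is trusted here

    def rows(self):
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, r)) for r in cur]


class DocumentWrapper:
    """Wraps a non-relational source: a plain list of document records."""

    def __init__(self, docs):
        self.docs = docs

    def rows(self):
        return list(self.docs)


def search(text, term):
    """Toy relevance score: how often `term` occurs in `text`."""
    return text.lower().count(term.lower())


def federated_query(compounds: Wrapper, articles: Wrapper):
    """Join the two wrapped sources the way Example 3 joins its tables."""
    return [
        (c["name"], a["URL"])
        for c in compounds.rows()
        for a in articles.rows()
        if search(a["subject"], c["name"]) > 0
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Compounds (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO Compounds VALUES (?, ?)",
                 [(1, "aspirin"), (2, "caffeine")])
articles = DocumentWrapper([
    {"subject": "Aspirin and heart health", "URL": "http://example.org/a1"},
    {"subject": "Sleep study", "URL": "http://example.org/a2"},
])
result = federated_query(SQLiteWrapper(conn, "Compounds"), articles)
```

The query code never needs to know which source is relational and which is not; adding a new kind of source means writing one more class with a `rows()` method.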

COMPARISON TABLE

| Comparison | Distributed DBMS | Federated DBMS |
|---|---|---|
| Transparency | Limited: the distributed databases must be interrelated through a communication network and each site holds its own database, so users or applications need to know how to interact with the database system. | High: it masks from the user the differences, idiosyncrasies, and implementations of the underlying data sources [2], so users need not be aware of location, invocation, dialect, fragmentation, etc. |
| Heterogeneity | Very hard to handle if the multiple databases are not interrelated or sit on different networks. | Can handle different hardware, network protocols, software, query languages, and data models. |
| Autonomy | Local autonomy: each department has the authority to manage its own data. | Does not disturb local operations; data need not be moved or modified; existing applications and interfaces remain in place. |
| Data integration | Hard when network protocols differ, multiple DBMSs are involved, or the databases are not interrelated. It also increases query cost and traffic. | Data from different protocols and DBMSs can be integrated easily using wrappers. |
| Database access | Accessed through adapters such as ODBC and JDBC. Each adapter may differ by database system: Oracle using Oracleadapter, SQL using SQLadapter, and Access using OLEadapter. Each programming language has its own embedded SQL. | Uses Xperanto as a middleware layer to access any DBMS with a simple programming model. Applications can push XML as a standard SQL statement for query execution across sources. |
| Other features | Economical; reflects organizational structure. | A high degree of function, extensibility and openness of the federation, and optimized performance. |

CONCLUSION/POSITION

- The disadvantages of a distributed DBMS are complexity, cost, and difficulty maintaining data integration and database access [3].
- A federated database system provides transparency, autonomy, optimized performance, accessibility, and a standard way to query across multiple DBMSs.
- The federated approach is an efficient way to integrate multiple DBMSs when enterprises merge or use different DBMSs, and it provides efficient data sharing and processing throughout the enterprise.

REFERENCES

[1] I. Wijegunaratne, G. Fernandez, J. Valtoudis. 2000. “A Federated Architecture for Enterprise Data Integration”. 2000 Australian Software Engineering Conference. Retrieved September 12, 2007 from http://portal.acm.org.proxy.lib.ilstu.edu:2048/citation.cfm?id=787253&coll=Portal&dl=GUIDE&CFID=5277637&CFTOKEN=95867344.

[2] L. Haas, E. Lin. 2002. “IBM Federated Database Technology”. IBM. Retrieved September 10, 2007 from http://www.ibm.com/developerworks/db2/library/techarticle/0203haas/0203haas.html.

[3] M. Özsu, P. Valduriez. 1999. Principles of Distributed Database Systems, 2nd edition (1st edition 1991). New Jersey, Prentice-Hall.

[4] F.A. Baião, M. Mattoso, G. Zaverucha. 1998. “Towards an Inductive Design of Distributed Object Oriented Databases”. Proceedings of the 3rd IFCIS International Conference on Cooperative Information Systems, p. 188-197, August 20-22. Retrieved September 28, 2007 from http://csdl.computer.org/dl/proceedings/coopis/1998/8380/00/83800188.pdf.

[5] F. Baião, M. Mattoso, G. Zaverucha. “An Algorithm for the Design of Distributed Object Databases”. PowerPoint. Retrieved September 14, 2007 from http://www-db.cs.wisc.edu/dbseminar/spring00/talks/fernanda_slides.pdf.

[6] L.M. Haas, E.T. Lin, M.A. Roth. 2002. “Data integration through database federation”. IBM Systems Journal, Volume 41, Issue 4. Retrieved October 1, 2007 from http://www.research.ibm.com/journal/sj/414/haas.pdf.

QUESTIONS?