BBIT4/SEM4 Advanced Database Systems
© Stephen Mc Kearney, 2002.

Schema Integration

• Conceptual Database Design, Batini, Ceri, Navathe, Ch. 5
• “A Comparative Analysis of Methodologies for Database Schema Integration”, Batini, Lenzerini, Navathe, ACM Computing Surveys, Vol 18, No 4, Dec 1986
• Fundamentals of Database Systems, Elmasri/Navathe, sec. 16.2.2




Overview

• Definition
  – What is schema integration?
  – Merging two or more database schemas (models).

• Problems
  – Different methods of representing the same concepts, for example, naming concepts.

• Strategies
  – Schemas may be merged (i) in one go or (ii) in stages.

• Process
  – Three stages: (i) identify problems, (ii) resolve problems, (iii) merge schemas.

• Resolving Conflicts
  – Changing the names and the structure of entities.

• Example
  – See paper.


Definition

[Figure: Database 1 and Database 2 are combined into an Integrated Database.]

• A database schema is the description of a database, for example, the entity-relationship model.

• Batini et al. define schema integration as “the process of merging several conceptual schemas into a global conceptual schema that represents all the requirements of the application”.

• Schema integration is used to merge two or more database schemas into a single schema that can store data from both the original databases.

• Schema integration is used when two or more existing databases must be combined, for example, when a new management information system is being developed.

• Schema integration may be used when the process of database design is too large to be carried out by one individual. Two or more designers will build models of different parts of the database and use schema integration to merge the resulting models.

• There are two major types of schema integration:

  • View Integration: View integration takes place during the design of a new database when user requirements may be different for each user group. View integration is used to merge different viewpoints into a single data model.

  • Database Integration: Database integration is used when two or more databases must be combined to produce a single schema, called a global schema.

Ref: Batini, p119.



Problems - Different Perspectives

[Figure: Perspective 1 shows a relationship between Employee and Project. Perspective 2 shows Employee, Department and Project.]

• When two database schemas are designed by different designers using different user requirements, the resulting schemas will often present contrasting views of the same data.

• In the example above, the relationship between employee and project in one database is represented as a relationship between employee, department and project in another database.

• This situation might occur in an organisation that allows different departments to have different rules as to how employees are allocated to projects. For example, in one department employees may be assigned to projects while in another department employees may not be considered to be directly related to a project.

Ref: Batini, sec. 5.1.


Problems - Equivalent Concepts

[Figure: Database 1 has the entities Book (Title) and Publisher (Name). Database 2 has the entity Book with the attributes Title and Publisher.]

• Different databases may treat the same concepts in different ways.

• In the above example, the publisher concept is an entity in database 1 but an attribute in database 2.

• There are two situations that must be dealt with during schema integration:

  1. When different concepts are modelled in the same way. For example, in a university database staff and students may be represented by the entity person even though they are different concepts.

  2. When the same concepts are modelled in different ways. For instance, the above example models the concept of a publisher as an entity and as an attribute.

Ref: Batini, sec. 5.1.


Problems - Incompatible Designs

[Figure: In Database 1 the relationship between Employee and Project is one-to-many (1:N). In Database 2 it is many-to-many (N:M).]

• Two database designs may be incompatible because mistakes were made in the initial design or there are different constraints placed on the data.

• For instance, in the above example, the relationship between employee and project is represented as a one-to-many relationship in database 1 and as a many-to-many relationship in database 2.

• This problem may be caused by mistakes made during the initial database analysis task or because users of the system have different working practices. For example, one department in an organisation, which works on small projects, may allocate one employee to a project but a different department, which works on large projects, may allocate many employees to a project.

• During schema integration these different viewpoints must be reconciled.

Ref: Batini, sec. 5.1.
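
A cardinality mismatch of this kind can be listed before any merging is attempted. The sketch below assumes relationships are stored as a dictionary from a pair of entity names to a cardinality label such as "1:N". The encoding and the function name `cardinality_conflicts` are assumptions made for illustration.

```python
# Hypothetical encoding: (entity, entity) -> cardinality label.
db1_relationships = {("Employee", "Project"): "1:N"}
db2_relationships = {("Employee", "Project"): "N:M"}


def cardinality_conflicts(rels_a, rels_b):
    """Relationships present in both schemas whose cardinalities differ."""
    shared = rels_a.keys() & rels_b.keys()
    return {
        pair: (rels_a[pair], rels_b[pair])
        for pair in shared
        if rels_a[pair] != rels_b[pair]
    }


incompatible = cardinality_conflicts(db1_relationships, db2_relationships)
print(incompatible)
```

The function only reports the disagreement. Deciding which cardinality the integrated schema should use remains a design decision.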


Strategies - All-In-One

[Figure: Database Schema 1, Database Schema 2 and Database Schema 3 are merged directly into the Integrated Database Schema.]

• The first strategy for integrating a set of schemas is to merge them all into a single large schema (called the global schema).

• This approach would be difficult when the schemas are large or when there are a large number of schemas.

Ref: Batini, sec. 5.2.


Strategies - Stages

[Figure: Database Schema 1 and Database Schema 2 are merged into a Partially Integrated Database Schema, which is then merged with Database Schema 3 into the Integrated Database Schema.]

• The second strategy for schema integration is to integrate some of the schemas (e.g. two) and then to integrate the resulting schema with the remaining schemas.

• This approach would be more appropriate when the schemas are complex or when there are a large number of schemas.

Ref: Batini, sec. 5.2.
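
The two strategies can be contrasted in a short sketch. It assumes a deliberately naive merge (a union of entities and attributes with no conflict handling) and a dictionary encoding of schemas. The function names and the sample schemas are hypothetical.

```python
from itertools import accumulate


def merge_pair(a, b):
    """Naive merge (assumption): union of entities and of their attributes."""
    merged = {name: set(attrs) for name, attrs in a.items()}
    for name, attrs in b.items():
        merged.setdefault(name, set()).update(attrs)
    return merged


def integrate_all_in_one(schemas):
    """All-in-one: fold every schema into the global schema in one pass."""
    global_schema = {}
    for schema in schemas:
        for name, attrs in schema.items():
            global_schema.setdefault(name, set()).update(attrs)
    return global_schema


def integrate_in_stages(schemas):
    """Stages: merge two schemas, then merge each further schema into the
    partial result. All intermediate schemas are returned for inspection."""
    return list(accumulate(schemas, merge_pair))


schema_1 = {"Employee": {"Name"}}
schema_2 = {"Employee": {"Grade"}, "Project": {"Title"}}
schema_3 = {"Department": {"Name"}}
schemas = [schema_1, schema_2, schema_3]

global_schema = integrate_all_in_one(schemas)
stages = integrate_in_stages(schemas)
partially_integrated = stages[1]  # schema 1 merged with schema 2
integrated = stages[-1]           # partial result merged with schema 3
```

With a conflict-free union both strategies give the same result. The practical difference is that the staged version exposes each partially integrated schema, so conflicts can be analysed two schemas at a time.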


Process

[Figure: Schema A and Schema B enter Conflict Analysis, which produces Schema A, Schema B and a List of Conflicts. Conflict Resolution produces Schema A, Schema B and Interschema Properties. Schema Merging produces the Integrated Schema.]

• The schema integration process starts with two or more schemas and involves three main stages:

  1. Conflict Analysis: During conflict analysis differences in the schemas are identified, for example, similar concepts that are represented in different ways.

  2. Conflict Resolution: During conflict resolution the conflicts identified during conflict analysis are resolved. For example, a common method of representing equivalent concepts will be decided upon. This process may involve discussing the problems with the users or correcting errors in the schemas.

  3. Schema Merging: During schema merging the schemas are merged into a single schema using the decisions made during the conflict resolution.

Ref: Batini, sec. 5.2.
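
The three stages can be sketched as a small pipeline. This is only an illustration under strong assumptions: schemas are dictionaries from entity name to attribute names, and conflict analysis uses a simple heuristic (differently named entities with identical attributes are candidate synonyms) that the notes do not prescribe. All names and sample data are hypothetical.

```python
def conflict_analysis(a, b):
    """Stage 1: list candidate synonyms, i.e. differently named entities
    with identical attribute sets (a heuristic assumed for this sketch)."""
    return [
        (name_a, name_b)
        for name_a in sorted(a)
        for name_b in sorted(b)
        if name_a != name_b and a[name_a] == b[name_b]
    ]


def conflict_resolution(b, conflicts):
    """Stage 2: rename entities of schema b to the names used in schema a.
    In practice each candidate would first be confirmed with the users."""
    renames = {name_b: name_a for name_a, name_b in conflicts}
    return {renames.get(name, name): set(attrs) for name, attrs in b.items()}


def schema_merging(a, b):
    """Stage 3: union the entities and attributes of the resolved schemas."""
    merged = {name: set(attrs) for name, attrs in a.items()}
    for name, attrs in b.items():
        merged.setdefault(name, set()).update(attrs)
    return merged


schema_a = {"Passenger": {"Name", "Ticket"}, "Route": {"Code"}}
schema_b = {"Customer": {"Name", "Ticket"}, "Vehicle": {"Registration"}}

conflicts = conflict_analysis(schema_a, schema_b)
resolved_b = conflict_resolution(schema_b, conflicts)
integrated = schema_merging(schema_a, resolved_b)
```

Keeping the stages as separate functions mirrors the process described above: the list of conflicts is an explicit intermediate result that can be reviewed before anything is merged.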


Conflicts - Names

• Synonyms
  – Objects that are the same but have different names.
  – For example, passenger and customer.

• Homonyms
  – Objects that are different but have the same names.
  – For example, publication(=book) and publication(=journal).

• There are two types of name conflict that occur in a database schema:

  • Synonyms: When two similar concepts occur with different names. For example, two public transport databases may have entities called passenger and customer. These entities may be the same entity.

  • Homonyms: When two different concepts occur with the same name. For example, two publishing databases may have entities called publication but in one database a publication may be a book while in the other database a publication may be a journal.

• Name conflicts cause a problem because information may be duplicated in the integrated database. It is important to identify those data items in each schema that actually represent the same concept or that should be represented using different structures in the integrated schema.

• Synonyms may be removed from the database by renaming the concepts so that they have the same name.

• Homonyms may be removed from the database by renaming the concepts so that they have different names.

• It may be possible to use a superclass/subclass relationship to avoid synonyms or homonyms.

Ref: Batini, sec. 5.3.1.
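
Both renaming rules can be illustrated with one helper. The sketch assumes schemas are dictionaries from entity name to attribute names. The function `rename_entities` and the attributes (`ISBN`, `Volume`) are hypothetical and chosen only to make the example concrete.

```python
def rename_entities(schema, renames):
    """Return a copy of the schema with entity names replaced per `renames`."""
    return {renames.get(name, name): set(attrs) for name, attrs in schema.items()}


# Synonyms: same concept, different names -> give both the same name.
transport_1 = {"Passenger": {"Name"}}
transport_2 = {"Customer": {"Name"}}
transport_2_fixed = rename_entities(transport_2, {"Customer": "Passenger"})

# Homonyms: different concepts, same name -> give them different names.
publishing_1 = {"Publication": {"Title", "ISBN"}}    # a book
publishing_2 = {"Publication": {"Title", "Volume"}}  # a journal
publishing_1_fixed = rename_entities(publishing_1, {"Publication": "Book"})
publishing_2_fixed = rename_entities(publishing_2, {"Publication": "Journal"})

shared_after_synonym_fix = set(transport_1) & set(transport_2_fixed)
shared_after_homonym_fix = set(publishing_1_fixed) & set(publishing_2_fixed)
```

After the renames, the synonym pair shares one entity name and can be merged, while the homonym pair no longer collides.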


Conflicts - Structural

• Identical concepts
  – Merged

• Compatible concepts
  – Representations are adapted and merged

• Incompatible concepts
  – Different cardinalities
  – Different identifiers
  – Reverse subset relationships

• Structural conflicts occur when the actual method of representing the same concept in different databases is different or incompatible.

• There are three cases:

  • Identical Concepts: When the same concept in different databases is represented in the same way they may be merged. For example, when an entity publication has the same structure and means the same in two databases the entities may be merged.

  • Compatible Concepts: When the same concept in different databases is represented in compatible ways they may be merged. For example, when an entity publication is represented by an attribute in one database and an entity in another database they may be merged by converting the attribute into an entity.

  • Incompatible Concepts: When the same concept in different databases is represented using different structures then it may be difficult to merge them directly. For example:

    - Relationships may have different cardinalities (i.e. one-to-many and many-to-many).

    - Primary keys may be different.

    - Set relationships may be reversed (e.g. projects contain programmes and programmes contain projects).

    Incompatible designs must be resolved by re-analysing the data and adapting one or more of the schemas or by constructing a new, common representation.

Ref: Batini, sec. 5.3.2.
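
Two of the incompatibilities listed above, different identifiers and reversed set relationships, can be detected with simple comparisons. The sketch below assumes primary keys are stored as tuples of attribute names and containment as (container, contained) pairs. The key attributes and function names are hypothetical.

```python
def identifier_conflicts(keys_a, keys_b):
    """Entities present in both schemas whose primary keys differ."""
    shared = keys_a.keys() & keys_b.keys()
    return {
        entity: (keys_a[entity], keys_b[entity])
        for entity in shared
        if keys_a[entity] != keys_b[entity]
    }


def reversed_subsets(contains_a, contains_b):
    """Pairs (x, y) where schema A says x contains y but B says y contains x."""
    return {(x, y) for (x, y) in contains_a if (y, x) in contains_b}


# Hypothetical primary keys for the same entity in two databases.
keys_1 = {"Employee": ("EmployeeNo",)}
keys_2 = {"Employee": ("Surname", "DateOfBirth")}

# Containment as (container, contained) pairs.
contains_1 = {("Project", "Programme")}
contains_2 = {("Programme", "Project")}

key_conflicts = identifier_conflicts(keys_1, keys_2)
reversed_pairs = reversed_subsets(contains_1, contains_2)
```

Detection is the easy part. As the notes say, resolving these cases requires re-analysing the data rather than applying a mechanical rule.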


Schema Integration - Example

Example adapted from “A Comparative Analysis of Methodologies for Database Schema Integration”, Batini, Lenzerini, Navathe, ACM Computing Surveys, Vol 18, No 4, Dec 1986.


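Original Schemas

[Figure: the two original schemas. Labels in the first schema: Publisher, Book, University, Topics, Title, Name, Name, State, Surname, Address. Labels in the second schema: Publication, Keyword, Title, Code, Publisher, Title, Code, Research Area.]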


Step 1

Rename ‘Keywords’ to ‘Topics’.

[Figure: the two schemas after the rename. The ‘Keyword’ label in the second schema now reads ‘Topics’.]


Step 2

Make the ‘publisher’ attribute an entity.

[Figure: the two schemas after the change. ‘Publisher’ in the second schema is now an entity with the attribute ‘Name’.]


Step 3

Two ‘Topic’ entities merged together.

Two ‘Publisher’ entities merged together.

[Figure: a single merged schema labelled Publisher, Book, University, Topics and Publication.]
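
Steps 1 to 3 can be traced on a simplified version of the example. The slide figures do not survive in this transcript, so the attribute placement below is an assumption, as are the dictionary encoding and the helper names `rename_entity`, `promote_attribute` and `merge`.

```python
def rename_entity(schema, old, new):
    """Step 1: give an entity a new name."""
    return {
        (new if name == old else name): set(attrs)
        for name, attrs in schema.items()
    }


def promote_attribute(schema, owner, attribute, key="Name"):
    """Step 2: turn `owner.attribute` into an entity of its own.
    The new entity gets one attribute, `key`. The relationship between the
    owner and the new entity is not modelled in this sketch."""
    result = {name: set(attrs) for name, attrs in schema.items()}
    result[owner].discard(attribute)
    result[attribute] = {key}
    return result


def merge(a, b):
    """Step 3: entities with the same name are merged (attributes unioned)."""
    merged = {name: set(attrs) for name, attrs in a.items()}
    for name, attrs in b.items():
        merged.setdefault(name, set()).update(attrs)
    return merged


# Simplified schemas (attribute placement is assumed, not taken from the slides).
schema_1 = {"Book": {"Title"}, "Publisher": {"Name", "Address"}, "Topics": {"Name"}}
schema_2 = {"Publication": {"Title", "Publisher"}, "Keyword": {"Code"}}

step_1 = rename_entity(schema_2, "Keyword", "Topics")
step_2 = promote_attribute(step_1, "Publication", "Publisher")
step_3 = merge(schema_1, step_2)
```

The order matters: the rename and the promotion make the two schemas use the same names and structures, which is what allows the final merge to be a plain union.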


Step 4

‘Book’ is a ‘Publication’.

[Figure: the merged schema with ‘Book’ shown as a subset of ‘Publication’.]


Step 5

Remove relationships inherited from ‘Publication’.

[Figure: the final integrated schema, labelled Publisher, Book, University, Topics and Publication.]
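
Steps 4 and 5 can be sketched as follows. The relationship set is hypothetical, because the figures are not recoverable from this transcript. The sketch assumes relationships are stored as (entity, other entity) pairs with the possible subclass in the first position.

```python
def remove_inherited(relationships, superclass_of):
    """Drop a subclass relationship when its superclass already has the
    same relationship, since the subclass inherits it."""
    return {
        (entity, other)
        for (entity, other) in relationships
        if (superclass_of.get(entity), other) not in relationships
    }


# Hypothetical relationships after the merge in step 3.
relationships = {
    ("Publication", "Topics"),
    ("Publication", "Publisher"),
    ("Book", "Topics"),
    ("Book", "Publisher"),
    ("Book", "University"),
}

# Step 4: 'Book' is a 'Publication'.
superclass_of = {"Book": "Publication"}

# Step 5: remove the relationships 'Book' inherits from 'Publication'.
reduced = remove_inherited(relationships, superclass_of)
```

Only the relationships that duplicate the superclass are removed. A relationship that is specific to the subclass, here the assumed one between Book and University, is kept.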