evolution of data models

Upload: sunil-kumar

Post on 07-Apr-2018


  • 8/6/2019 Evolution of Data Models

    Topic 2.4: The Evolution of Data Models

    The quest for better data management has led to different models that attempt to resolve the file system's critical shortcomings. Because each data model evolved from its predecessors, it is essential to examine the major data models in roughly chronological order.

    2.4.1 The Hierarchical Model

    The first data model, known as the hierarchical model, was developed by Rockwell and IBM in the 1960s. The hierarchical database is a collection of records that is logically organized to conform to the upside-down tree (hierarchical) structure. Within the hierarchy, the top layer (the root) is perceived as the parent of the segment directly beneath it. While this model represents 1:M relationships well, it does not represent M:N relationships.

    Basic Structure

    Given its manufacturing heritage, the hierarchical model's basic logical structure is best understood when you examine a manufacturing process. For example, let's examine a somewhat simplified production process that creates a filing cabinet:

    1. A filing cabinet has many components: a frame, a set of drawers, and sliding bars for those drawers.

    2. A component may be composed of many smaller assemblies. For example, each drawer has a handle with a latching mechanism, a set of rollers that fits into the frame's sliding bars, and a divider blade.

    3. An assembly may contain many parts. For instance, each roller is composed of a small wheel, an axle, and a brace.

    4. The production process is based on data relationships that remain fixed over time. Whether a given filing cabinet model is produced today or tomorrow, the same parts are put together in the same ways to produce the same assemblies, which are combined to produce the same components, which are assembled in the same way to create the filing cabinet.

    Tracking the parts, the assemblies, and the components we have just described is facilitated by understanding the logical process that is represented by the upside-down tree, known as a hierarchical structure, shown in Figure 2.1. We have labeled the structure's components to help you understand the basic hierarchical model's vocabulary.

    As you examine Figure 2.1, note that the user perceives the hierarchical database as a hierarchy of segments. A segment is the equivalent of a file system's record type. In other words, the hierarchical database is a collection of record segment structures that is logically organized to conform to the upside-down tree (hierarchical) structure shown in Figure 2.1. Within the hierarchy, the top layer (the root) is perceived as the parent of the segment directly beneath it.

    For example, in Figure 2.1, the root segment is the parent of the level 1 segments, which in turn are the parents of the level 2 segments, and so on. Conversely, the segments below other segments are the children of the segment above them. In short:

    • Each parent can have many children
    • Each child has only one parent

    In this hierarchical structure, it is easy to trace both the database's components and the 1:M relationships among them.
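    The parent/child rules of the hierarchical structure can be sketched in a few lines of Python. This is only an illustration, not how a hierarchical DBMS actually stores data: the Segment class, its method names, and the segment names (borrowed from the filing cabinet example) are assumptions made for this sketch.

```python
class Segment:
    """Hypothetical in-memory stand-in for one segment of a hierarchical database."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # each child has only one parent
        self.children = []        # each parent can have many children

    def add_child(self, name):
        child = Segment(name, parent=self)
        self.children.append(child)
        return child

    def path_from_root(self):
        """Follow the single-parent links up to the root and return the root-first path."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


# Root -> component -> assembly -> part, following the filing cabinet example.
cabinet = Segment("FILING_CABINET")
frame = cabinet.add_child("FRAME")
drawer = cabinet.add_child("DRAWER")
roller = drawer.add_child("ROLLER")
wheel = roller.add_child("WHEEL")
axle = roller.add_child("AXLE")

print(" -> ".join(wheel.path_from_root()))
# FILING_CABINET -> DRAWER -> ROLLER -> WHEEL
```

    Because a child stores exactly one parent link, the path from any segment back to the root is unique, which is what makes the 1:M relationships easy to trace. The same design choice is also why a segment cannot belong to two parents at once.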

    Advantages

    • Conceptual simplicity
    • Database security
    • Data independence (the data characteristics of the database structure are not defined in the programs accessing the database; instead, the database structure and its data characteristics are defined in the data dictionary component of the DBMS, so the programs accessing the database become independent of the database)
    • Database integrity (data duplication, or data redundancy, is minimized as a result of relating the segments or records)
    • Efficiency (the hierarchical DBMS file storage organization and access methods are based on the hierarchical database structure, which is much faster than the file storage organization and access methods used in the old file system)

    Disadvantages

    • Complex implementation
    • Difficult to manage
    • Lacks structural independence (because the programmer still needs to write instructions on how and where to find the data stored on the computer disk, which depends on the database structure)
    • Complex applications programming and use
    • Implementation limitations (because the hierarchical data model does not support entities or record segments having multiple parents, which are modeled as M:M relationships between two or more entities)
    • Lack of standards among the implementation software (DBMS) developed by various software vendors

    2.4.2 The Network Model

    The network model was created to represent complex data more effectively than the hierarchical model could, to improve database performance, and to impose a database standard.

    Basic Structure

    In many respects the network model resembles the hierarchical model. For example, as in the hierarchical model, the user perceives the network database as a collection of records in 1:M relationships. However, unlike the hierarchical model, the network model allows a record to have more than one parent. This feature allows the network model to handle complex (M:M) relationships between two or more entities, such as the commonly encountered M:M relationship depicted in Figure 2.2.

    In Figure 2.2, the M:M relationship between ORDER and PART is resolved by the introduction of the ORDER_LINE bridge entity.

    In network database terminology, a relationship is called a set. Each set is composed of at least two record types: an owner record and a member record. The difference between the hierarchical model and the network model is that the latter might include a condition in which a record can appear (as a member) in more than one set. In other words, a member may have several owners. A set represents a 1:M relationship between the owner and the member. An example of such a relationship is depicted in Figure 2.3.
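    The idea of a set, and of one member record appearing in more than one set, can be sketched in Python as follows. This is a rough illustration under stated assumptions rather than real network DBMS code: the Record and NetworkSet classes and the set names ORDER_CONTAINS and PART_APPEARS_ON are hypothetical, while ORDER, PART, and ORDER_LINE come from the bridge-entity example in the text.

```python
class Record:
    """Hypothetical record occurrence with a type, a key, and its owners by set name."""

    def __init__(self, record_type, key):
        self.record_type = record_type
        self.key = key
        self.owners = {}   # set name -> owner record

    def __repr__(self):
        return f"{self.record_type}({self.key!r})"


class NetworkSet:
    """A named 1:M relationship between an owner record type and a member record type."""

    def __init__(self, name, owner_type, member_type):
        self.name = name
        self.owner_type = owner_type
        self.member_type = member_type
        self.members = {}   # owner key -> list of member records

    def connect(self, owner, member):
        if (owner.record_type, member.record_type) != (self.owner_type, self.member_type):
            raise TypeError("record types do not match this set")
        if self.name in member.owners:
            # 1:M inside one set: a member has a single owner per set.
            raise ValueError("a member has only one owner within a given set")
        member.owners[self.name] = owner
        self.members.setdefault(owner.key, []).append(member)


order = Record("ORDER", 1001)
part = Record("PART", "BOLT")
line = Record("ORDER_LINE", (1001, 1))

order_set = NetworkSet("ORDER_CONTAINS", "ORDER", "ORDER_LINE")
part_set = NetworkSet("PART_APPEARS_ON", "PART", "ORDER_LINE")
order_set.connect(order, line)   # ORDER owns the line in one set
part_set.connect(part, line)     # PART owns the same line in another set

print(sorted(line.owners))
# ['ORDER_CONTAINS', 'PART_APPEARS_ON']
```

    The ORDER_LINE record is a member of two sets and therefore has two owners, which is how two 1:M sets together express the M:M relationship between ORDER and PART.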

    Advantages

    • Conceptual simplicity
    • Handles more relationship types
    • Data access flexibility
    • Promotes database integrity
    • Data independence
    • Conformance to standards

    Disadvantages

    • System complexity
    • Lack of structural independence (because the programmer still needs to write instructions on how and where to find the data stored on the computer disk, which depends on the database structure)

    2.4.3 The Relational Model

    The basic building block of the relational model is the table, which is a matrix of rows and columns. Tables are related to each other via a common entity characteristic or attribute (the primary key in the parent table is a foreign key in the child table). The parent table maps to the entity on the "1" side of the relationship, and the child table maps to the entity on the "many" side of the relationship between the two tables.

    All three relationship types (1:1, 1:M, and M:N) are easily represented in this model. One of the disadvantages of the relational model is that it requires substantial system overhead to run the relational DBMS (RDBMS). However, with currently available computer hardware and software, the processing requirements of relational databases are no longer an overhead problem.

    Basic Structure

    The relational data model is implemented through a very sophisticated relational database management system (RDBMS). The RDBMS performs the same basic functions provided by the hierarchical and network database systems, plus a host of other functions that make the relational data model easier to understand and to implement.

    The most important advantage of the RDBMS is its ability to let the user/designer operate in a human logical environment. The RDBMS manages all of the complex physical details. Thus, the relational database is perceived by the user to be a collection of tables in which data are stored.

    Each table is a matrix consisting of a series of row/column intersections. Tables, also called relations, are related to each other by sharing a common entity characteristic/attribute. For example, the CUSTOMER table in Figure 2.4 might contain a sales agent's number, which maintains a common link to the AGENT table.

    The common link between the CUSTOMER and AGENT tables thus enables us to match the customer to his/her sales agent, even though the customer data are stored in another table. Although the tables are completely independent of one another, we can easily connect the data between tables. The relational model thus provides a minimum level of controlled redundancy to eliminate most of the redundancies found in old file systems.
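    The link between CUSTOMER and AGENT can be tried out with Python's built-in sqlite3 module. The sketch below is an assumption-laden illustration, not the contents of Figure 2.4: only the table names and AGENT_CODE come from the text, while the other column names (CUS_CODE, CUS_NAME, AGENT_NAME) and all sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.executescript("""
    CREATE TABLE AGENT (
        AGENT_CODE  INTEGER PRIMARY KEY,
        AGENT_NAME  TEXT NOT NULL
    );
    CREATE TABLE CUSTOMER (
        CUS_CODE    INTEGER PRIMARY KEY,
        CUS_NAME    TEXT NOT NULL,
        AGENT_CODE  INTEGER REFERENCES AGENT (AGENT_CODE)  -- the common attribute
    );
    INSERT INTO AGENT VALUES (501, 'Alby'), (502, 'Hahn');
    INSERT INTO CUSTOMER VALUES
        (10010, 'Ramas', 502),
        (10011, 'Dunne', 501),
        (10012, 'Smith', 502);
""")

# Match each customer to his/her agent through the shared AGENT_CODE value.
rows = conn.execute("""
    SELECT c.CUS_NAME, a.AGENT_NAME
    FROM CUSTOMER AS c
    JOIN AGENT AS a ON a.AGENT_CODE = c.AGENT_CODE
    ORDER BY c.CUS_CODE
""").fetchall()

for cus_name, agent_name in rows:
    print(cus_name, "is served by", agent_name)
```

    The only data repeated across the two tables is the AGENT_CODE value itself. That repetition is the controlled redundancy the text refers to: the agent's other details are stored once, in AGENT.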

    The relationship type (1:1, 1:M, or M:N) is often shown in a relational schema, an example of which is depicted in Figure 2.5. A relational schema is a visual representation of the relational database's entities, the attributes within those entities, and the relationships between those entities.

    As you examine Figure 2.5, note that the relational schema shows the connecting fields (in this case, AGENT_CODE) and the relationship type, 1:M. The MS Access DBMS software used to generate Figure 2.5 employs the ∞ symbol to indicate the "many" side. In this example, CUSTOMER represents the "many" side because an AGENT can serve many CUSTOMERs. AGENT represents the "1" side because each CUSTOMER is served by only one AGENT.
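    A 1:M relationship of this kind is usually enforced with a foreign key on the "many" side. The following sketch uses SQLite through Python's sqlite3 module rather than MS Access, so it is a stand-in for the software named in the text; the CUS_CODE column and the sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE AGENT (AGENT_CODE INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE CUSTOMER (
        CUS_CODE   INTEGER PRIMARY KEY,
        AGENT_CODE INTEGER NOT NULL REFERENCES AGENT (AGENT_CODE)
    )
""")

conn.execute("INSERT INTO AGENT VALUES (501)")
# The "1" side: one AGENT row. The "many" side: several CUSTOMER rows pointing at it.
conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?)", [(1, 501), (2, 501)])

served = conn.execute(
    "SELECT COUNT(*) FROM CUSTOMER WHERE AGENT_CODE = ?", (501,)
).fetchone()[0]
print("customers served by agent 501:", served)

# A customer that refers to a nonexistent agent is rejected by the foreign key.
try:
    conn.execute("INSERT INTO CUSTOMER VALUES (3, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan customer rejected:", rejected)
```

    Each CUSTOMER row holds exactly one AGENT_CODE, so a customer has only one agent, while nothing stops many customers from holding the same code.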

    Advantages

    • Structural independence
    • Improved conceptual simplicity
    • Easier database design, implementation, management, and use
    • Ad hoc query capability
    • Powerful database management system

    Disadvantages

    • Substantial hardware and system software overhead (no longer an issue because of currently available hardware and software)
    • Can facilitate poor design and implementation (less experienced database designers may develop poor database designs)
    • May promote "islands of information" problems (because various users in different departments will be developing their own database applications)

    2.4.4 The Entity Relationship Model

    An alternate model is the Entity Relationship (ER) model. In this model, entities are drawn by using diagrams with line connectors that depict their relationships. This model has the advantage of visually depicting relationships. A disadvantage is that there is no corresponding data manipulation language (DML).

    The ER model, or ERM, is a widely accepted and adapted graphical tool for data modeling. Peter Chen first introduced the ER data model in 1976 in his landmark paper "The Entity Relationship Model: Toward a Unified View of Data." The ERM yielded a graphical representation that popularized the use of ER diagrams as a tool for conceptual-level data modeling. Better yet, the ER model complemented the relational model concepts, thus providing the foundation for a tightly structured database design environment to ensure the proper design of relational databases.

    Basic Structure

    ER models are normally represented in an entity relationship diagram (ERD), which uses graphical representations to model the database requirements.

    • An entity is represented in the ERD by a rectangle, also known as an entity box. The name of the entity, a noun, is written in the center of the rectangle. The name of the entity is generally written in capital letters and in singular form: PAINTER rather than PAINTERS. Normally, when applying the ERD to the relational model, an entity is mapped to a relational table. Each row in the relational table is known as an entity instance or entity occurrence in the ER model. Each entity is described by a set of attributes that describe particular characteristics of the entity. For example, the entity EMPLOYEE will have attributes such as a Social Security number, a last name, and a first name.

    • Relationships describe associations among data entities. Most relationships describe associations between two entities. ERD modelers use the term connectivity to label the types of relationships (1:M, M:N, 1:1). The entity connectivity is written next to each entity box. Relationships are represented by a diamond connected to the related entities through a relationship line. The name of the relationship, an active or passive verb, is written inside the diamond. For example, each of the company's DEPARTMENTs has many EMPLOYEEs, and a PAINTER paints many PAINTINGs.

    Figure 2.6 shows some basic ERD models that illustrate these relationships and connectivity types.

    The ERD shown in Figure 2.6 is based on the so-called Chen model. Although the entities and relationships are shown in a horizontal format in Figure 2.6, they also may be oriented vertically. The entity location and the order in which the entities are presented are immaterial; just remember to always read a 1:M relationship from the "1" side to the "M" side.
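    The parts of a Chen-style relationship (two entity boxes, a verb in the diamond, and a connectivity label) can also be written down as plain data. The short Python sketch below is purely illustrative: the Relationship class and its read method are hypothetical helpers, and the entity names follow the singular, capitalized naming convention described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """Hypothetical record of one relationship as it would appear in a Chen ERD."""
    left: str          # entity read first; for 1:M this is the "1" side
    verb: str          # relationship name written inside the diamond
    right: str         # entity read second; for 1:M this is the "M" side
    connectivity: str  # "1:1", "1:M", or "M:N"

    def read(self):
        """Spell out the relationship from the left side to the right side."""
        left_n, right_n = self.connectivity.split(":")
        left_word = "Each" if left_n == "1" else "Many"
        right_word = "one" if right_n == "1" else "many"
        # Entity names stay singular, so the sentence is schematic rather than polished.
        return f"{left_word} {self.left} {self.verb} {right_word} {self.right}"


paints = Relationship("PAINTER", "paints", "PAINTING", "1:M")
has = Relationship("DEPARTMENT", "has", "EMPLOYEE", "1:M")

print(paints.read())
# Each PAINTER paints many PAINTING
print(has.read())
# Each DEPARTMENT has many EMPLOYEE
```

    Storing the "1" side first mirrors the reading rule for 1:M relationships, whatever the layout of the boxes on the page.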

    A more current version of the ERD is the Crow's Foot model shown in Figure 2.7. The label "Crow's Foot" is derived from the three-pronged symbol used to represent the "many" side of the relationship. The Crow's Foot model places the relationship name on the relationship line.

    As you examine the basic Crow's Foot ERD in Figure 2.7, note that the connectivity is represented by symbols. For example, the "1" is represented by a short line segment and the "M" is represented by the three-pronged crow's foot. Like the Chen ERD, the entities and the relationships may be represented horizontally or vertically. And, again like the Chen ERD, the location and the order in which the entities are presented in a Crow's Foot ERD are immaterial.

    Advantages

    • Exceptional conceptual simplicity
    • Visual representation
    • Effective communication tool
    • Integrated with the relational data model

    Disadvantages

    • Limited constraint representation
    • Limited relationship representation (relationships between attributes cannot be modeled)
    • No data manipulation language
    • Loss of information content (limited space is available to draw a large number of entities in the original Chen notation of the ERD technique)

    2.4.5 The Object Oriented Model

    In the Object Oriented model, entities are represented as objects that contain both data and operations. An advantage of this model is the addition of semantic content. A disadvantage is the steeper learning curve.

    The semantic data model (SDM) modeled both data and their relationships in a single structure known as an object. Because its basic modeling structure is an object, the SDM is said to be an object oriented data model (OODM). In turn, the OODM becomes the basis for the object oriented database management system (OODBMS).

    An OODM reflects a very different way to define and use entities. Like the relational model's entity, an object is described by its factual content. But, quite unlike an entity, an object includes information about relationships between the facts within the object, as well as information about relationships with other objects. Therefore, the facts within the objects are given greater meaning.

    Basic Structure

    The object oriented data model is based on the following components:

    • An object is an abstraction of a real-world thing. An object class is a representation of a set of objects with shared attributes and behavior. For example, an object class student is a model of all students in an educational institution. An object class may be considered equivalent to an ER model's entity. More precisely, an object represents only one individual occurrence of an entity.

    • Attributes describe the properties of an object. For example, a PERSON object class includes the attributes ID, Name, Social Security Number, and Date of Birth.

    • Objects that share similar characteristics are grouped in classes. A class is a collection of similar objects with shared structure (attributes) and behavior (methods). In a general sense, a class resembles the ER model's entity set. However, a class is different from an entity in that it contains a set of procedures known as methods. A class's method represents a real-world action such as finding a selected PERSON's name, changing a PERSON's name, or printing a PERSON's address. In other words, methods are the equivalent of procedures in traditional programming languages. In object oriented terms, methods define an object's behavior.

    • Classes are organized in a class hierarchy. The class hierarchy resembles an upside-down tree in which each class normally has only one parent. For example, the CUSTOMER and EMPLOYEE classes share a parent PERSON class. However, it is possible for a child class to have multiple parents.

    • Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it. For example, we can create two classes, CUSTOMER and EMPLOYEE, as subclasses of the class PERSON. In this case, CUSTOMER and EMPLOYEE will inherit all attributes and methods from PERSON.
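    The components listed above map closely onto classes in an object oriented programming language. The Python sketch below is a minimal illustration and not an OODBMS: the PERSON, CUSTOMER, and EMPLOYEE classes come from the text, while the attribute names, the job_title attribute, and the sample values are assumptions made for the example.

```python
from datetime import date


class Person:
    """Class: shared structure (attributes) plus shared behavior (methods)."""

    def __init__(self, person_id, name, date_of_birth):
        self.person_id = person_id
        self.name = name
        self.date_of_birth = date_of_birth

    def find_name(self):
        """Method standing in for the real-world action of finding a PERSON's name."""
        return self.name

    def change_name(self, new_name):
        """Method standing in for the real-world action of changing a PERSON's name."""
        self.name = new_name


class Customer(Person):
    """Subclass: inherits every attribute and method of Person unchanged."""


class Employee(Person):
    """Subclass that adds one attribute of its own (hypothetical)."""

    def __init__(self, person_id, name, date_of_birth, job_title):
        super().__init__(person_id, name, date_of_birth)
        self.job_title = job_title


# Each object is one individual occurrence of its class.
customer = Customer(1, "Ana Ruiz", date(1985, 3, 14))
employee = Employee(2, "Sam Cole", date(1990, 7, 2), "Analyst")

employee.change_name("Sam Cole-Reed")   # inherited from Person
print(customer.find_name(), "|", employee.find_name(), "|", employee.job_title)
```

    Neither subclass redefines find_name or change_name; both get them from the PERSON class above them in the hierarchy, which is inheritance as described in the last item of the list.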

    To illustrate the difference between the OO model and the ER model, let's examine their graphic representations in the simple invoicing problem shown in Figure 2.8.

    As you examine Figure 2.8, note that:

    • The OO data model represents an object class as a box; all of the object's attributes and relationships to other objects are included within the object class box. The object class representation of the INVOICE includes all related objects within the same object class box.
    • The ER model uses three separate entities and two relationships to represent an invoice transaction. Because customers can buy more than one item at a time, each invoice references one or more lines, one item per line. And, because invoices are generated by customers, the data-modeling requirements include a customer entity and a relationship between the customer and the invoice.
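    The OO view of the invoice, in which one object carries its related objects inside it, might look like the following Python sketch. It is an illustration under stated assumptions, not a reproduction of Figure 2.8: the INVOICE, CUSTOMER, and LINE names follow the text, while the attribute names, the total method, and the sample data are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    name: str


@dataclass
class Line:
    item: str          # one item per line
    quantity: int
    unit_price: float


@dataclass
class Invoice:
    """One object class box: attributes plus the related CUSTOMER and LINE objects."""
    number: int
    customer: Customer                          # relationship held inside the object
    lines: list = field(default_factory=list)   # one or more LINE objects

    def total(self):
        """Behavior kept with the data: sum quantity * price over all lines."""
        return sum(line.quantity * line.unit_price for line in self.lines)


invoice = Invoice(
    number=1001,
    customer=Customer("Ana Ruiz"),
    lines=[Line("DESK", 1, 200.0), Line("LAMP", 2, 25.5)],
)
print(invoice.customer.name, len(invoice.lines), invoice.total())
```

    In the ER version the same facts would be spread over three entities (customer, invoice, line) joined by two relationships, whereas here they are reached by navigating from the single invoice object.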

    Advantages

    • Adds semantic content
    • Visual presentation includes semantic content
    • Database integrity
    • Both structural and data independence

    Disadvantages

    • Slow pace of OODM standards development
    • Complex navigational data access
    • Steep learning curve
    • High system overhead slows transactions
    • Lack of market penetration

    2.4.6 Other Models

    Another semantic data model, the extended relational data model (ERDM), was developed in response to the increasing complexity of applications. The ERDM, championed by many relational database researchers, constitutes the relational model's response to the OODM challenge. This model includes many of the OO model's best features within an inherently simpler relational database structural environment. That's why a DBMS based on the ERDM is often described as an object/relational database management system (O/RDBMS).

    The OODM and ERDM are similar in the sense that each attempts to address the demand for more semantic information to be incorporated into the model. However, the OODM and the ERDM differ substantially both in underlying philosophy and in the nature of the problem to be addressed.

    Although the ERDM includes a strong semantic component, it is primarily based on the relational data model's concepts. In contrast, the OODM is wholly based on the OO semantic data model concepts. The ERDM is primarily geared to business applications, while the OODM tends to focus on very specialized engineering and scientific applications. In the database arena, the most likely scenario appears to be an ever-increasing merging of OO and relational data model concepts and procedures.

    2.4.7 Data Models: Summary

    The evolution of database management systems has always been driven by the search for new ways of modeling increasingly complex real-world data. A summary of the most commonly recognized data models is shown in Figure 2.9.

    Concept Check

    What are the major types of data models?

    How does the hierarchical data model address the problem of data redundancy?

    What are the features of relational data models?