module 3: the relational database modellagardemics.weebly.com/uploads/1/1/3/5/11356329/mod3.pdf ·...


  • Module 3: The Relational Database Model

    OBJECTIVES:

    In this chapter, you will learn:

    • That the relational database model offers a logical view of data

    • About the relational model’s basic component: relations

    • That relations are logical constructs composed of rows (tuples) and columns (attributes)

    • That relations are implemented as tables in a relational DBMS

    • About relational database operators, the data dictionary, and the system catalog

  • Relational Model

    • The Relational Model has 3 well-defined components:

    a) A logical data structure represented by relations

    b) A set of integrity rules to enforce that the data are and remain consistent over time

    c) A set of operations that defines how data are manipulated.

  • • In the relational model, you can think of related records as being stored in independent tables, which makes it easier to understand than the hierarchical or network model.

    • This is important because logical simplicity leads to simple and effective database design methodologies.

    • Because the table plays such a prominent role in the relational model, it deserves a closer look.

    • Therefore, our discussion begins with an exploration of the details of table structure and contents.

  • Tables and their characteristics

    • The logical view of the relational database is facilitated by the creation of data relationships based on a logical construct known as a relation.

    • Because a relation is a mathematical construct, end users find it much easier to think of a relation as a table.

    • A table contains a group of related entity occurrences (a.k.a. an entity set).

  • Table 3.1 Characteristics of a Relational Table

    1. A table is perceived as a two-dimensional structure composed of rows and columns.

    2. Each table row (tuple) represents a single entity occurrence within the entity set.

    3. Each table column represents an attribute, and each column has a distinct name.

    4. Each row/column intersection represents a single data value.

    5. All values in a column must conform to the same data format.

    6. Each column has a specific range of values known as the attribute domain.

    7. The order of the rows and columns is immaterial to the DBMS.

    8. Each table must have an attribute or a combination of attributes that uniquely identifies each row.

  • Fig 3.1 STUDENT table attribute values

    • To illustrate the characteristics of a relational table, consider this example:

    • STU_NUM = Student number
    • STU_LNAME = Student last name
    • STU_FNAME = Student first name
    • STU_INIT = Student middle initial
    • STU_DOB = Student date of birth
    • STU_HRS = Credit hours earned
    • STU_CLASS = Student classification
    • STU_GPA = Grade point average
    • STU_TRANSFER = Student transferred from another institution
    • DEPT_CODE = Department code
    • STU_PHONE = 4-digit campus phone extension
    • PROF_NUM = Number of the professor who is the student’s advisor

  • • Using the STUDENT table shown in Fig. 3.1, we draw the following conclusions corresponding to the points in Table 3.1:

    1. The STUDENT table is perceived to be a two-dimensional structure composed of eight rows (tuples) and twelve columns (attributes).

    2. Each row in the STUDENT table describes a single entity occurrence within the entity set. For example, row 4 in Figure 3.1 describes a student named Walter H. Oblonski. Given the table contents, the STUDENT entity set includes eight distinct entities (rows), or students.

  • 3. Each column represents an attribute, and each column has a distinct name.

    4. All of the values in a column match the attribute’s characteristics. For example, the grade point average (STU_GPA) column contains only STU_GPA entries for each of the table rows.

    * Data must be classified according to their format and function.

  • Common Data Types

    Although various DBMSs can support different data types, most support at least the following:

    a) Numeric - Numeric data are data on which you can perform meaningful arithmetic procedures.

    For example, in Figure 3.1, STU_HRS and STU_GPA are numeric attributes.

    b) Character - Character data, also known as text data or string data, can contain any character or symbol not intended for mathematical manipulation.

    In Figure 3.1, STU_CLASS and STU_PHONE are examples of character attributes.

  • c) Date - Date attributes contain calendar dates stored in a special format known as the Julian date format.

    For example, STU_DOB in Figure 3.1 is a date attribute.

    d) Logical - Logical data can only have true or false (yes or no) values.

    In Figure 3.1, the STU_TRANSFER attribute uses a logical data format.

  • 5. The column’s range of permissible values is known as its domain. Because the STU_GPA values are limited to the range 0–4, inclusive, the domain is [0,4].

    6. The order of rows and columns is immaterial to the user.

    7. Each table must have a primary key.

    In general terms, the primary key (PK) is an attribute (or a combination of attributes) that uniquely identifies any given row.

    In this case, STU_NUM (the student number) is the primary key. Using the data presented in Figure 3.1, observe that a student’s last name (STU_LNAME) would not be a good primary key because it is possible to find several students whose last name is Smith.

  • Keys

    - used to ensure that each row in a table is uniquely identifiable.

    - used to establish relationships among tables and to ensure the integrity of the data.

    - consist of one or more attributes that determine other attributes.

    For example, an invoice number identifies all of the invoice attributes, such as the invoice date and the customer name.

  • Determination

    • The key’s role is based on a concept known as determination.

    • In the context of a database table, the statement “A determines B” (shorthand notation: A → B) indicates that if you know the value of attribute A, you can look up (determine) the value of attribute B.

    • For example, knowing the STU_NUM in the STUDENT table means that you are able to look up that student’s last name, grade point average, phone number, and so on. Thus,

    STU_NUM → STU_LNAME, STU_FNAME, STU_INIT, STU_DOB, STU_TRANSFER

  • • The principle of determination is very important because it is used in the definition of a central relational database concept known as functional dependence.

    The term functional dependence can be defined most easily this way: the attribute B is functionally dependent on A if A determines B. More precisely:

    The attribute B is functionally dependent on the attribute A if each value in column A determines one and only one value in column B.

  • • Using the contents of the STUDENT table in Figure 3.1, it is appropriate to say that STU_PHONE is functionally dependent on STU_NUM.

    For example, the STU_NUM value 321452 determines the STU_PHONE value 2134.

    On the other hand, STU_NUM is not functionally dependent on STU_PHONE because the STU_PHONE value 2267 is associated with two STU_NUM values: 324274 and 324291.

  • • The functional dependence definition can be generalized to cover the case in which the determining attribute values occur more than once in a table. Functional dependence can then be defined this way:

    Attribute A determines attribute B (that is, B is functionally dependent on A) if all of the rows in the table that agree in value for attribute A also agree in value for attribute B.
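    The generalized definition translates almost directly into a check over table rows. The sketch below is illustrative only: the function name determines is made up for this example, and the rows contain just the three STU_NUM / STU_PHONE pairs quoted in the text rather than the full Figure 3.1 table.

```python
def determines(rows, a, b):
    """Return True if all rows that agree on attribute a also agree on attribute b."""
    seen = {}
    for row in rows:
        # setdefault stores the first b value met for this a value and returns it
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


# Only the pairs mentioned in the text; every other column is omitted.
students = [
    {"STU_NUM": 321452, "STU_PHONE": "2134"},
    {"STU_NUM": 324274, "STU_PHONE": "2267"},
    {"STU_NUM": 324291, "STU_PHONE": "2267"},
]

print(determines(students, "STU_NUM", "STU_PHONE"))  # True
print(determines(students, "STU_PHONE", "STU_NUM"))  # False
```

    A check like this can only refute a dependency on the data at hand; whether a dependency really holds is a design decision about all possible rows.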

  • • Be careful when defining the dependency’s direction.

    ▫ For example, UPLB determines its student classification based on the number of units (credit hours) completed.

    ▫ Therefore, you can write: STU_HRS → STU_CLASS

    ▫ But the specific number of hours is not dependent on the classification. It is quite possible to find a junior with 62 completed hours or one with 84 completed hours. In other words, the classification (STU_CLASS) does not determine one and only one value for completed hours (STU_HRS).

    UNITS COMPLETED    CLASSIFICATION
    Less than 30       Fr
    30 – 59            So
    60 – 89            Jr
    90 or more         Sr
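    The direction of this dependency can be made concrete with a small sketch. The function name classify is hypothetical, and the code assumes that exactly 30 completed units counts as a sophomore.

```python
def classify(hours):
    """Map completed units to a classification.

    Assumption: exactly 30 units is treated as "So".
    """
    if hours < 30:
        return "Fr"
    if hours < 60:
        return "So"
    if hours < 90:
        return "Jr"
    return "Sr"


# Forward direction: each hours value yields exactly one classification.
print(classify(62), classify(84))  # Jr Jr

# Reverse direction: "Jr" maps back to many hours values,
# so STU_CLASS does not determine STU_HRS.
junior_hours = [h for h in range(0, 121) if classify(h) == "Jr"]
print(len(junior_hours))  # 30
```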

  • • It might take more than a single attribute to define functional dependence.

    Composite Key – a key composed of more than one attribute (multi-attribute key).

    • Any attribute that is part of a key is known as a key attribute.

    • For example, in the STUDENT table, the student’s last name would not be sufficient to serve as a key.

    • On the other hand, the combination of last name, first name, initial, and phone is very likely to produce unique matches for the remaining attributes.

  • • STU_LNAME, STU_FNAME, STU_INIT, STU_PHONE → STU_HRS, STU_CLASS, STU_GPA, STU_DOB

    • Given the possible existence of a composite key, the notion of functional dependence can be further refined by specifying full functional dependence:

    If the attribute (B) is functionally dependent on a composite key (A) but not on any subset of that composite key, the attribute (B) is fully functionally dependent on (A).

  • • Within the broad key classification, several specialized keys can be defined:

    superkey – is any key that uniquely identifies each row.

    • In the STUDENT table, the superkey could be any of the following:

    STU_NUM
    STU_NUM, STU_LNAME
    STU_NUM, STU_LNAME, STU_INIT

    • In fact, STU_NUM, with or without additional attributes, can be a superkey even when the additional attributes are redundant.

  • candidate key – can be described as a superkey without unnecessary attributes, that is, a minimal superkey.

    • Using this distinction, note that the composite key STU_NUM, STU_LNAME is a superkey, but it is not a candidate key because STU_NUM by itself is a candidate key!

    The combination

    STU_LNAME, STU_FNAME, STU_INIT, STU_PHONE

    might also be a candidate key, as long as you discount the possibility that two students share the same last name, first name, initial, and phone number.
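    The difference between a superkey and a candidate key can be sketched as two small checks. The function names and the three sample rows are invented for illustration and are not taken from Figure 3.1.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def is_candidate_key(rows, attrs):
    """True if attrs is a superkey and no proper subset of attrs is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Hypothetical rows, not the contents of the STUDENT table in the figure.
rows = [
    {"STU_NUM": 1, "STU_LNAME": "Smith", "STU_INIT": "A"},
    {"STU_NUM": 2, "STU_LNAME": "Smith", "STU_INIT": "B"},
    {"STU_NUM": 3, "STU_LNAME": "Jones", "STU_INIT": "A"},
]

print(is_superkey(rows, ["STU_NUM", "STU_LNAME"]))       # True
print(is_candidate_key(rows, ["STU_NUM", "STU_LNAME"]))  # False
print(is_candidate_key(rows, ["STU_NUM"]))               # True
print(is_superkey(rows, ["STU_LNAME"]))                  # False
```

    Uniqueness in sample data is only evidence: a real key is a rule the designer imposes on every row the table may ever hold.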

  • • The selection of STU_NUM as the primary key would be driven by the designer’s choice or by end-user requirements.

    • In short, the primary key is the candidate key chosen to be the unique row identifier.

    • Note, incidentally, that a primary key is a superkey as well as a candidate key.

  • • Within a table, each primary key value must be unique to ensure that each row is uniquely identified by the primary key. In that case, the table is said to exhibit entity integrity.

    • To maintain entity integrity, a null (that is, no data entry at all) is not permitted in the primary key.

    • A null is no value at all. It does not mean a zero or a space. A null is created when you press the Enter key or the Tab key to move to the next entry without making a prior entry of any kind.

  • • Nulls can never be part of a primary key, and they should be avoided, to the greatest extent possible, in other attributes, too.

    • There are rare cases in which nulls cannot be reasonably avoided when you are working with nonkey attributes.

    • For example, STU_INIT:

    Some students do not have a middle initial. Therefore, some of the STU_INIT values may be null.

  • • Even if nulls cannot always be avoided, they must be used sparingly. In fact, the existence of nulls in a table is often an indication of poor database design.

    • Nulls, if used improperly, can create problems because they have many different meanings. For example, a null can represent:

    An unknown attribute value.

    A known, but missing, attribute value.

    A “not applicable” condition.

  • • Nulls can create problems when functions such as COUNT, AVERAGE, and SUM are used.

    • In addition, nulls can create logical problems when relational tables are linked.
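    How nulls affect aggregate functions can be seen in a short experiment. The sketch below uses Python's built-in sqlite3 module with a made-up three-row table; it shows SQLite's behavior, and other DBMSs may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stu_num INTEGER PRIMARY KEY, stu_gpa REAL)")
# Hypothetical rows; the third student has no GPA recorded (a null).
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(1, 4.0), (2, 2.0), (3, None)],
)

count_rows, count_gpa, avg_gpa, sum_gpa = conn.execute(
    "SELECT COUNT(*), COUNT(stu_gpa), AVG(stu_gpa), SUM(stu_gpa) FROM student"
).fetchone()

print(count_rows, count_gpa)  # 3 2  (the null row is skipped by COUNT(stu_gpa))
print(avg_gpa, sum_gpa)       # 3.0 6.0  (the average is taken over two rows, not three)
```

    The average here is 3.0 rather than 2.0 because the null row is left out of the calculation, which is easy to overlook when reading a report.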

    Controlled redundancy – makes the relational database work.

    • Tables within the database share common attributes that enable the tables to be linked together.

  • Quiz 1:

    1 – 3 Enumerate 3 characteristics of a relational table.

    4 – 6 Enumerate the 3 meanings of a null value, that is, what a null value can represent.

    7 Any key that uniquely identifies each row.

    8 A key composed of more than one attribute (multi-attribute key).

    9 Can be described as a superkey without unnecessary attributes, that is, a minimal superkey.

  • Fig. 3.2 An example of a simple relational database

    Table Name: Vendor
    Primary Key: Vend_Code
    Foreign Key: none

    Table Name: Product
    Primary Key: Prod_Code
    Foreign Key: Vend_Code

  • • Because the PRODUCT table is related to the VENDOR table through these VEND_CODE values, the multiple occurrence of the values is required to make the 1:M relationship between VENDOR and PRODUCT work.

    • Each VEND_CODE value in the VENDOR table is unique: VENDOR is the “1” side in the VENDOR-PRODUCT relationship.

    • But any given VEND_CODE value from the VENDOR table may occur more than once in the PRODUCT table, thus providing evidence that PRODUCT is the “M” side of the VENDOR-PRODUCT relationship.

  • • In database terms, the multiple occurrences of the VEND_CODE values in the PRODUCT table are not redundant because they are required to make the relationship work.

    • Recall that data redundancy exists only when there is unnecessary duplication of attribute values.

    • Remember the naming convention: the prefix PROD was used to indicate that the attributes “belong” to the PRODUCT table. Therefore, the prefix VEND in the PRODUCT table’s VEND_CODE indicates that VEND_CODE points to some other table in the database.

    • In this case, the VEND prefix is used to point to the VENDOR table in the database.

  • • A relational database can also be represented by a relational schema.

    relational schema – is a textual representation of the database tables where each table is listed by its name followed by the list of its attributes in parentheses. The primary key attribute(s) is (are) underlined.

    For example, the relational schema for Figure 3.2 would be shown as:

    VENDOR (VEND_CODE, VEND_CONTACT, VEND_AREACODE, VEND_PHONE)

    PRODUCT (PROD_CODE, PROD_DESCRIPT, PROD_PRICE, PROD_ON_HAND, VEND_CODE)

  • Fig. 3.3 Relational Diagram

    • Note that the link is the equivalent of the relationship line in an ERD. This link is created when two tables share an attribute with common values. More specifically, the primary key of one table (VENDOR) appears as the foreign key in a related table (PRODUCT).

    • A foreign key (FK) is an attribute whose values match the primary key values in the related table.

  • • For example, in Figure 3.2, the VEND_CODE is the primary key in the VENDOR table, and it occurs as a foreign key in the PRODUCT table.

    • Because the VENDOR table is not linked to a third table, the VENDOR table does not contain a foreign key.

    • If the foreign key contains either matching values or nulls, the table that makes use of that foreign key is said to exhibit referential integrity.

  • referential integrity - means that if the foreign key contains a value, that value refers to an existing valid tuple (row) in another relation.

    • Note that referential integrity is maintained between the PRODUCT and VENDOR tables.

    • A secondary key is defined as a key that is used strictly for data retrieval purposes.

  • • Suppose customer data are stored in a CUSTOMER table in which the customer number is the primary key. Do you suppose that most customers will remember their numbers?

    • Data retrieval for a customer can be facilitated when the customer’s last name and phone number are used.

    • In that case, the primary key is the customer number; the secondary key is the combination of the customer’s last name and phone number.

  • Table 3.1 Relational Database Keys

    Key Type         Definition

    Superkey         An attribute (or combination of attributes) that uniquely identifies each row in a table.

    Candidate key    A minimal (irreducible) superkey. A superkey that does not contain a subset of attributes that is itself a superkey.

    Primary key      A candidate key selected to uniquely identify all other attribute values in any given row. Cannot contain null entries.

    Secondary key    An attribute (or combination of attributes) used strictly for data retrieval purposes.

    Foreign key      An attribute (or combination of attributes) in one table whose values must either match the primary key in another table or be null.

  • Integrity Rules

    • Relational database integrity rules are very important to good database design.

    • Many RDBMSs enforce integrity rules automatically. However, it is much safer to make sure that your application design conforms to the entity and referential integrity rules summarized below.

  • Entity Integrity

    Requirement: All primary key entries are unique, and no part of a primary key may be null.

    Purpose: Each row will have a unique identity, and foreign key values can properly reference primary key values.

    Example: No invoice can have a duplicate number, nor can it be null. (All invoices are uniquely identified by their invoice number.)

    Referential Integrity

    Requirement: A foreign key may have either a null entry, as long as it is not a part of its table’s primary key, or an entry that matches the primary key value in a table to which it is related. (Every non-null foreign key value must reference an existing primary key value.)

    Purpose: It is possible for an attribute NOT to have a corresponding value, but it will be impossible to have an invalid entry. The enforcement of the referential integrity rule makes it impossible to delete a row in one table whose primary key has mandatory matching foreign key values in another table.

    Example: A customer might not yet have an assigned sales representative (number), but it will be impossible to have an invalid sales representative (number).

  • Fig 3.4 An illustration of integrity rules

    Table name: CUSTOMER
    Primary key: CUS_CODE
    Foreign key: AGENT_CODE

    Table name: AGENT
    Primary key: AGENT_CODE
    Foreign key: none

  • 1. Entity integrity. The CUSTOMER table’s primary key is CUS_CODE. The CUSTOMER primary key column has no null entries, and all entries are unique. Similarly, the AGENT table’s primary key is AGENT_CODE, and this primary key column is also free of null entries.

    2. Referential integrity. The CUSTOMER table contains a foreign key, AGENT_CODE, which links entries in the CUSTOMER table to the AGENT table. The CUS_CODE row that is identified by the (primary key) number 10013 contains a null entry in its AGENT_CODE foreign key because Mr. Paul F. Olowski does not yet have a sales representative assigned to him. The remaining AGENT_CODE entries in the CUSTOMER table all match the AGENT_CODE entries in the AGENT table.
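    Both rules can be tried out in a small experiment. The sketch below uses Python's built-in sqlite3 module; customer 10013 (Olowski) comes from the text, while agent 501, customer 10010, and the helper name violates are made up for illustration. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.execute("CREATE TABLE agent (agent_code INTEGER PRIMARY KEY)")
conn.execute(
    """CREATE TABLE customer (
           cus_code   INTEGER PRIMARY KEY,
           cus_lname  TEXT NOT NULL,
           agent_code INTEGER REFERENCES agent(agent_code)  -- nullable foreign key
       )"""
)
conn.execute("INSERT INTO agent VALUES (501)")                        # hypothetical agent
conn.execute("INSERT INTO customer VALUES (10010, 'Example', 501)")   # hypothetical customer
conn.execute("INSERT INTO customer VALUES (10013, 'Olowski', NULL)")  # no agent assigned yet


def violates(sql):
    """Return True if the statement is rejected by an integrity rule."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


dup_pk = violates("INSERT INTO customer VALUES (10013, 'Duplicate', 501)")  # entity integrity
bad_fk = violates("INSERT INTO customer VALUES (10099, 'Orphan', 999)")     # no agent 999
del_parent = violates("DELETE FROM agent WHERE agent_code = 501")           # still referenced
print(dup_pk, bad_fk, del_parent)  # True True True
```

    The null foreign key for customer 10013 is accepted, while the duplicate primary key, the invalid agent code, and the deletion of a referenced agent are all rejected.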

  • • To avoid nulls, some designers use special codes, known as flags, to indicate the absence of some value.

    • Using Figure 3.4 as an example, the code -99 could be used as the AGENT_CODE entry of the fourth row of the CUSTOMER table to indicate that customer Paul Olowski does not yet have an agent assigned to him. If such a flag is used, the AGENT table must contain a dummy row with an AGENT_CODE value of -99.

  • • Other integrity rules that can be enforced in the relational model are the NOT NULL and UNIQUE constraints.

    • The NOT NULL constraint can be placed on a column to ensure that every row in the table has a value for that column.

    • The UNIQUE constraint is a restriction placed on a column to ensure that no duplicate values exist for that column.
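    A minimal sketch of both constraints, again using Python's built-in sqlite3 module. The choice of which VENDOR columns carry the constraints, the sample values, and the helper name try_insert are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE vendor (
           vend_code    INTEGER PRIMARY KEY,
           vend_contact TEXT NOT NULL,   -- every vendor must have a contact
           vend_phone   TEXT UNIQUE      -- no two vendors may share a phone
       )"""
)
conn.execute("INSERT INTO vendor VALUES (1, 'First Contact', '555-0100')")


def try_insert(row):
    """Insert a row and report 'ok' or the constraint error message."""
    try:
        conn.execute("INSERT INTO vendor VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


missing_contact = try_insert((2, None, "555-0101"))            # breaks NOT NULL
duplicate_phone = try_insert((3, "Other Contact", "555-0100"))  # breaks UNIQUE
valid_row = try_insert((4, "Third Contact", "555-0102"))
print(missing_contact)
print(duplicate_phone)
print(valid_row)
```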

  • Relational Set Operators

    • The data in relational tables are of limited value unless the data can be manipulated to generate useful information.

    • Now we will learn the basic data manipulation capabilities of the relational model.

    • Relational algebra defines the theoretical way of manipulating table contents using the eight relational operators:

    SELECT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT, and DIVIDE.

  • • The degree of relational completeness can be defined by the extent to which relational algebra is supported.

    • To be considered minimally relational, the DBMS must support the key relational operators SELECT, PROJECT, and JOIN.

    • Very few DBMSs are capable of supporting all eight relational operators.

  • • The relational operators have the property of closure.

    • Closure – the use of relational algebra operators on existing relations (tables) produces new relations.

    • There is no need to examine the mathematical definitions, properties, and characteristics of those relational algebra operators. However, their use can easily be illustrated as follows:

  • 1. SELECT, also known as RESTRICT, yields values for all rows found in a table that satisfy a given condition.

    SELECT can be used to list all of the row values, or it can yield only those row values that match a specified criterion.

    In other words, SELECT yields a horizontal subset of a table. The effect of a SELECT is shown below:

  • SELECT ALL yields

    SELECT only PRICE less than $2.00 yields

    SELECT only P_CODE = 311452 yields

    Original Table New Table

  • 2. PROJECT yields all values for selected attributes. In other words, PROJECT yields a vertical subset of a table. The effect of a PROJECT is shown here:

    Original Table New Table

    PROJECT PRICE yields

    PROJECT P_CODE and PRICE yields
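    SELECT and PROJECT can be sketched over a table held as a list of rows. The function names select and project and the sample rows are illustrative only: P_CODE 311452 appears in the text, but the descriptions, prices, and other codes are invented.

```python
def select(rows, predicate):
    """SELECT (RESTRICT): keep the rows that satisfy the condition (horizontal subset)."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """PROJECT: keep only the named attributes (vertical subset), dropping duplicate rows."""
    result = []
    for row in rows:
        narrowed = {a: row[a] for a in attrs}
        if narrowed not in result:
            result.append(narrowed)
    return result


# Hypothetical product rows.
products = [
    {"P_CODE": 123456, "P_DESCRIPT": "Item A", "PRICE": 5.26},
    {"P_CODE": 311452, "P_DESCRIPT": "Item B", "PRICE": 34.99},
    {"P_CODE": 254467, "P_DESCRIPT": "Item C", "PRICE": 1.09},
]

cheap = select(products, lambda r: r["PRICE"] < 2.00)
one_product = select(products, lambda r: r["P_CODE"] == 311452)
prices = project(products, ["PRICE"])
codes_and_prices = project(products, ["P_CODE", "PRICE"])

print([r["P_CODE"] for r in cheap])  # [254467]
print(one_product[0]["P_DESCRIPT"])  # Item B
print(prices)                        # [{'PRICE': 5.26}, {'PRICE': 34.99}, {'PRICE': 1.09}]
```

    Because both functions return a new list of rows, their output can be fed into another operator, which is the closure property in miniature.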

  • 3. UNION combines all rows from two tables, excluding duplicate rows.

    • The tables must have the same attribute characteristics (the columns and domains must be compatible) to be used in the UNION.

    • When two or more tables share the same number of columns, and when their corresponding columns share the same (or compatible) domains, they are said to be union-compatible.

    • The effect of a UNION is shown below:

  • UNION

    yields

  • 4. INTERSECT yields only the rows that appear in both tables. As was true in the case of UNION, the tables must be union-compatible to yield valid results.

    • For example, you cannot use INTERSECT if one of the attributes is numeric and one is character-based.

    INTERSECT

    yields

  • 5. DIFFERENCE yields all rows in one table that are not found in the other table; that is, it subtracts one table from the other.

    • As was true in the case of UNION, the tables must be union-compatible to yield valid results. The effect of a DIFFERENCE is shown below.

    • However, note that subtracting the first table from the second table is not the same as subtracting the second table from the first table.

  • DIFFERENCE

    yields
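    Because a relation is a set of rows, UNION, INTERSECT, and DIFFERENCE map directly onto Python's set operators. In this sketch each row is a tuple and both tables have the same two columns, so they are union-compatible; the names in the rows are invented.

```python
# Two union-compatible tables: (first name, last name) in both.
table_1 = {("George", "Jones"), ("Jane", "Smith"), ("Peter", "Robinson")}
table_2 = {("Jane", "Smith"), ("Alan", "Chang")}

union = table_1 | table_2      # all rows from both tables, duplicates removed
intersect = table_1 & table_2  # only rows that appear in both tables
diff_1_2 = table_1 - table_2   # rows of table_1 not found in table_2
diff_2_1 = table_2 - table_1   # not the same as diff_1_2

print(len(union))         # 4
print(sorted(intersect))  # [('Jane', 'Smith')]
print(sorted(diff_1_2))   # [('George', 'Jones'), ('Peter', 'Robinson')]
print(sorted(diff_2_1))   # [('Alan', 'Chang')]
```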

  • 6. PRODUCT yields all possible pairs of rows from two tables, also known as the Cartesian product. Therefore, if one table has six rows and the other table has three rows, the PRODUCT yields a list composed of 6 × 3 = 18 rows. The effect of a PRODUCT is shown here:

    PRODUCT

    yields
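    The row count of a PRODUCT is easy to confirm with itertools.product from the Python standard library. The placeholder row values below are made up; only the sizes (six rows and three rows) come from the text.

```python
from itertools import product

table_a = [f"a{i}" for i in range(1, 7)]  # six placeholder rows
table_b = [f"b{i}" for i in range(1, 4)]  # three placeholder rows

pairs = list(product(table_a, table_b))   # every row of A paired with every row of B

print(len(pairs))           # 18
print(pairs[0], pairs[-1])  # ('a1', 'b1') ('a6', 'b3')
```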

  • 7. JOIN allows information to be combined from two or more tables.

    • JOIN is the real power behind the relational database, allowing the use of independent tables linked by common attributes.

    • A natural join links tables by selecting only the rows with common values in their common attribute(s).

    • A natural join is the result of a three-stage process:

  • a. First, a PRODUCT of the tables is created.

    b. Second, a SELECT is performed on the output of Step a to yield only the rows for which the AGENT_CODE values are equal. The common columns are referred to as the join columns.

    c. Third, a PROJECT is performed on the results of Step b to yield a single copy of each attribute, thereby eliminating duplicate columns.

  • Two Tables that will be used to illustrate JOIN

    Table Name: Customer Table Name: Agent

  • Natural Join, Step 1: Product

  • Natural Join, Step 2: Select

    SELECT only Customer.Agent_Code = Agent.Agent_Code yields

  • Natural Join, Step 3: Project

    • The final outcome of a natural join yields a table that does not include unmatched pairs and provides only the copies of the matches.

    PROJECT AGENT_CODE yields

  • Note a few crucial features of the natural join operation:

    • If no match is made between the table rows, the new table does not include the unmatched row. In that case, neither AGENT_CODE 421 nor the customer whose last name is Smithson is included. Smithson’s AGENT_CODE 421 does not match any entry in the AGENT table.

    • The column on which the join was made (that is, AGENT_CODE) occurs only once in the new table.

    • If the same AGENT_CODE were to occur several times in the AGENT table, a customer would be listed for each match.

    For example, if the AGENT_CODE 167 were to occur three times in the AGENT table, the customer named Rakowski, who is associated with AGENT_CODE 167, would occur three times in the resulting table. (A good AGENT table cannot, of course, yield such a result because it would contain unique primary key values.)
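    The three stages of the natural join (PRODUCT, SELECT, PROJECT) can be followed step by step in a short sketch. Rakowski with AGENT_CODE 167 and Smithson with AGENT_CODE 421 come from the text; every other value, including the customer codes and phone numbers, is invented for illustration.

```python
from itertools import product

customers = [
    {"CUS_CODE": 1, "CUS_LNAME": "Rakowski", "AGENT_CODE": 167},
    {"CUS_CODE": 2, "CUS_LNAME": "Smithson", "AGENT_CODE": 421},
    {"CUS_CODE": 3, "CUS_LNAME": "Sample", "AGENT_CODE": 231},
]
agents = [
    {"AGENT_CODE": 167, "AGENT_PHONE": "555-0167"},
    {"AGENT_CODE": 231, "AGENT_PHONE": "555-0231"},
    {"AGENT_CODE": 333, "AGENT_PHONE": "555-0333"},
]

# Step a: PRODUCT. Column names are prefixed so both AGENT_CODE copies survive.
pairs = [
    {**{f"CUSTOMER.{k}": v for k, v in c.items()},
     **{f"AGENT.{k}": v for k, v in a.items()}}
    for c, a in product(customers, agents)
]

# Step b: SELECT only the pairs whose join columns are equal.
matched = [p for p in pairs if p["CUSTOMER.AGENT_CODE"] == p["AGENT.AGENT_CODE"]]

# Step c: PROJECT a single copy of each attribute.
joined = [
    {
        "CUS_CODE": p["CUSTOMER.CUS_CODE"],
        "CUS_LNAME": p["CUSTOMER.CUS_LNAME"],
        "AGENT_CODE": p["CUSTOMER.AGENT_CODE"],
        "AGENT_PHONE": p["AGENT.AGENT_PHONE"],
    }
    for p in matched
]

print(len(pairs), len(matched))         # 9 2
print([r["CUS_LNAME"] for r in joined])  # ['Rakowski', 'Sample']
```

    Smithson (agent 421) and the unused agent 333 both drop out, matching the first feature noted above.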

  • • Another form of join, known as equijoin, links tables on the basis of an equality condition that compares specified columns of each table.

    • The outcome of the equijoin does not eliminate duplicate columns, and the condition or criterion used to join the tables must be explicitly defined. The equijoin takes its name from the equality comparison operator (=) used in the condition.

    • If any other comparison operator is used, the join is called a theta join.

  • • Each of the preceding joins is often classified as an inner join. An inner join is a join that only returns matched records from the tables that are being joined.

    • In an outer join, the matched pairs would be retained, and any unmatched values in the other table would be left null.

  • • If an outer join is produced for tables CUSTOMER and AGENT, two scenarios are possible:

    • A left outer join yields all of the rows in the CUSTOMER table, including those that do not have a matching value in the AGENT table, as shown here:

  • • A right outer join yields all of the rows in the AGENT table, including those that do not have matching values in the CUSTOMER table, as shown here:

    • Generally speaking, outer joins operate like equijoins. The outer join does not drop one copy of the common attribute, and it requires the specification of the join condition.
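    Both outer joins can be tried with Python's built-in sqlite3 module. In this sketch the right outer join is written as a LEFT JOIN with the tables swapped, because older SQLite versions lack RIGHT JOIN. No foreign key is declared, so the unmatched code 421 can be stored; apart from Rakowski (167) and Smithson (421), the values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE agent (agent_code INTEGER, agent_phone TEXT);
    CREATE TABLE customer (cus_lname TEXT, agent_code INTEGER);
    INSERT INTO agent VALUES (167, '555-0167'), (333, '555-0333');
    INSERT INTO customer VALUES ('Rakowski', 167), ('Smithson', 421);
    """
)

# Left outer join: every CUSTOMER row, with nulls where no AGENT row matches.
left = conn.execute(
    """SELECT c.cus_lname, c.agent_code, a.agent_code, a.agent_phone
       FROM customer c LEFT JOIN agent a ON c.agent_code = a.agent_code
       ORDER BY c.cus_lname"""
).fetchall()

# Right outer join, emulated by swapping the tables: every AGENT row is kept.
right = conn.execute(
    """SELECT c.cus_lname, c.agent_code, a.agent_code, a.agent_phone
       FROM agent a LEFT JOIN customer c ON c.agent_code = a.agent_code
       ORDER BY a.agent_code"""
).fetchall()

print(left)   # [('Rakowski', 167, 167, '555-0167'), ('Smithson', 421, None, None)]
print(right)  # [('Rakowski', 167, 167, '555-0167'), (None, None, 333, '555-0333')]
```

    Both copies of the join column are kept, as in an equijoin, and the null in Smithson's row points straight at the foreign key value that has no matching agent.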

  • • Outer joins are especially useful when you are trying to determine what value(s) in related tables cause(s) referential integrity problems. Such problems are created when foreign key values do not match the primary key values in the related table(s).

    8. The DIVIDE operation uses one single-column table (e.g., column “a”) as the divisor and one two-column table (i.e., columns “a” and “b”) as the dividend. The tables must have a common column (e.g., column “a”). The output of the DIVIDE operation is a single column containing the values of column “b” that are paired, in the dividend table, with every value of column “a” in the divisor table.

  • • DIVIDE is illustrated here:

    • Using the example shown, note that:

    a. Table 1 is “divided” by Table 2 to produce Table 3. Tables 1 and 2 both contain the column CODE but do not share LOC.

    b. To be included in the resulting Table 3, a value in the unshared column (LOC) must be associated, in the dividend (Table 1), with every CODE value in the divisor (Table 2).

    c. The only value associated with both A and B is 5.

    DIVIDE yields
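    DIVIDE is the least intuitive operator, so a small sketch may help. The function name divide and most of the rows are invented; the only facts taken from the text are that the divisor holds the CODE values A and B and that 5 is the only LOC value paired with both.

```python
def divide(dividend, divisor, shared, unshared):
    """Return the unshared values that are paired with every shared value in the divisor."""
    required = {row[shared] for row in divisor}
    seen = {}
    for row in dividend:
        # collect, per unshared value, the set of shared values it is paired with
        seen.setdefault(row[unshared], set()).add(row[shared])
    return sorted(value for value, codes in seen.items() if required <= codes)


# Hypothetical Table 1 (dividend) and Table 2 (divisor).
table_1 = [
    {"CODE": "A", "LOC": 5},
    {"CODE": "A", "LOC": 9},
    {"CODE": "B", "LOC": 5},
    {"CODE": "B", "LOC": 3},
    {"CODE": "C", "LOC": 6},
]
table_2 = [{"CODE": "A"}, {"CODE": "B"}]

table_3 = divide(table_1, table_2, shared="CODE", unshared="LOC")
print(table_3)  # [5]
```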

  • Data Dictionary and System Catalog

    Data dictionary – provides a detailed description of all tables found within the user/designer-created database.

    • Thus, the data dictionary contains at least all of the attribute names and characteristics for each table in the system.

    • In short, the data dictionary contains metadata (data about data).

    • The data dictionary is sometimes described as “the database designer’s database” because it records the design decisions about tables and their structures.

  • • Like the data dictionary, the system catalog contains metadata.

    System catalog - detailed system data dictionary that describes all objects within the database, including data about table names, the table’s creator and creation date, the number of columns in each table, the data type corresponding to each column, index filenames, index creators, authorized users, and access privileges.

  • Table Name  Attribute Name   Contents                         Type          Format        Range        Reqd  PK or FK  FK Referenced Table
    Customer    CUS_CODE         Customer account code            CHAR(5)       99999         10000-99999  Y     PK
                CUS_LNAME        Customer last name               VARCHAR(20)   Xxxxxxxx                   Y
                CUS_FNAME        Customer first name              VARCHAR(20)   Xxxxxxxx                   Y
                CUS_INITIAL      Customer initial                 CHAR(1)       X
                CUS_RENEW_DATE   Customer insurance renewal date  DATE          dd-mmm-yyyy
                AGENT_CODE       Agent code                       CHAR(3)       999                              FK        AGENT_CODE
    Agent       AGENT_CODE       Agent code                       CHAR(3)       999                        Y     PK
                AGENT_AREACODE   Agent area code                  CHAR(3)       999                        Y
                AGENT_PHONE      Agent telephone number           CHAR(8)       999-9999                   Y
                AGENT_LNAME      Agent last name                  VARCHAR(20)   Xxxxxxxx                   Y
                AGENT_YTD_SLS    Agent year-to-date sales         NUMBER(9,2)   9,999,999.99               Y

    A Sample Data Dictionary

  • • FK = Foreign key

    • PK = Primary key

    • CHAR = Fixed character length data (1−255 characters)

    • VARCHAR = Variable character length data (1−2,000 characters)

    • NUMBER = Numeric data (NUMBER(9,2)) are used to specify numbers with two decimal places and up to nine digits, including the decimal places.

    • Some RDBMSs permit the use of a MONEY or CURRENCY data type.

  • • Because the system catalog contains all required data dictionary information, the terms system catalog and data dictionary are often used interchangeably.

    • In fact, current relational database software generally provides only a system catalog, from which the designer’s data dictionary information may be derived.

    • The system catalog is actually a system-created database whose tables store the user/designer-created database characteristics and contents.

    • Therefore, the system catalog tables can be queried just like any user/designer-created table.
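    As one concrete case, SQLite keeps its catalog in a table named sqlite_master, which can be queried with ordinary SQL. The sketch below uses Python's built-in sqlite3 module and a cut-down AGENT table; catalog names and layouts differ from one DBMS to another, so treat this as an example of the idea rather than a general recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE agent (
           agent_code  CHAR(3) PRIMARY KEY,
           agent_lname VARCHAR(20) NOT NULL
       )"""
)

# The catalog is itself a table, so it is queried like any other table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(agent)").fetchall()

print(tables)                                      # ['agent']
print([(c[1], c[2]) for c in columns])             # [('agent_code', 'CHAR(3)'), ('agent_lname', 'VARCHAR(20)')]
print([c[1] for c in columns if c[5]])             # ['agent_code']  (primary key column)
```

    The column names, types, and key flags returned here are the same kind of information recorded in the sample data dictionary.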

  • • In effect, the system catalog automatically produces database documentation. As new tables are added to the database, that documentation also allows the RDBMS to check for and eliminate homonyms and synonyms.

    • In general terms, homonyms are similar-sounding words with different meanings, such as boar and bore, or identically spelled words with different meanings, such as fair (meaning “just”) and fair (meaning “festival”).

    • In a database context, the word homonym indicates the use of the same attribute name to label different attributes.

    • For example, you might use C_NAME to label a customer name attribute in a CUSTOMER table and also use C_NAME to label a consultant name attribute in a CONSULTANT table.

    • To lessen confusion, you should avoid database homonyms; the data dictionary is very useful in this regard.

    • In a database context, a synonym is the opposite of a homonym and indicates the use of different names to describe the same attribute. For example, car and auto refer to the same object. Synonyms must be avoided.

  • Relationships within the relational database

    • You already know that relationships are classified as one-to-one (1:1), one-to-many (1:M), and many-to-many (M:N or M:M).

    • This section explores those relationships further to help you apply them properly when you start developing database designs, focusing on the following points:

    1) The 1:M relationship is the relational modeling ideal. Therefore, this relationship type should be the norm in any relational database design.

    2) The 1:1 relationship should be rare in any relational database design.

    3) M:N relationships cannot be implemented as such in the relational model.

    Later in this section, you will see how any M:N relationship can be changed into two 1:M relationships.

  • • The 1:M relationship is the relational database norm. Consider this example:

    Table name: PAINTER
    Primary key: PAINTER_NUM
    Foreign key: none

    Table name: PAINTING
    Primary key: PAINTING_NUM
    Foreign key: PAINTER_NUM

    The 1:M Relationship

  • • Each painting is painted by one and only one painter, but each painter could have painted many paintings.

    • Note that painter 123 (Georgette P. Ross) has three paintings stored in the PAINTING table.

    • There is only one row in the PAINTER table for any given row in the PAINTING table, but there may be many rows in the PAINTING table for any given row in the PAINTER table.

    • The 1:M relationship is found in any database environment.
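The PAINTER and PAINTING structure above can be sketched as follows, assuming SQLite via Python's sqlite3 module. The key names and painter 123 come from the slides; the PAINTER_NAME and PAINTING_TITLE columns, the painting numbers, and the titles are placeholders invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE PAINTER (
        PAINTER_NUM  INTEGER PRIMARY KEY,
        PAINTER_NAME TEXT NOT NULL              -- hypothetical column
    );
    CREATE TABLE PAINTING (
        PAINTING_NUM   INTEGER PRIMARY KEY,
        PAINTING_TITLE TEXT NOT NULL,           -- hypothetical column
        PAINTER_NUM    INTEGER NOT NULL REFERENCES PAINTER (PAINTER_NUM)
    );
""")

# One row on the "1" side ...
conn.execute("INSERT INTO PAINTER VALUES (123, 'Georgette P. Ross')")
# ... and many rows on the "many" side, each carrying the painter's key.
conn.executemany(
    "INSERT INTO PAINTING VALUES (?, ?, ?)",
    [(1, "Placeholder title A", 123),
     (2, "Placeholder title B", 123),
     (3, "Placeholder title C", 123)],
)

painting_count = conn.execute(
    "SELECT COUNT(*) FROM PAINTING WHERE PAINTER_NUM = 123"
).fetchone()[0]
print(painting_count)
```

The foreign key lives in the table on the "many" side, so a painter can gain paintings without any change to the PAINTER row.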

  • • Students in a typical college or university will discover that each COURSE can generate many CLASSes but that each CLASS refers to only one COURSE.

    • For example, a CS102B course might yield two classes: one offered on MF from 3pm to 4pm and one offered on TTh from 8am to 9am.

    • Therefore, the 1:M relationship between COURSE and CLASS might be described this way:

  • • Each COURSE can have many CLASSes, but each CLASS references only one COURSE.

    • There will be only one row in the COURSE table for any given row in the CLASS table, but there can be many rows in the CLASS table for any given row in the COURSE table.

    • Here is the ERM (entity relationship model) for the 1:M relationship between COURSE and CLASS:

  • Table name: COURSE
    Primary key: CRS_CODE
    Foreign key: none

    Table name: CLASS
    Primary key: CLASS_CODE
    Foreign key: CRS_CODE

  • • Note that CLASS_CODE in the CLASS table uniquely identifies each row. Therefore, CLASS_CODE has been chosen to be the primary key.

    • However, the combination CRS_CODE and CLASS_SECTION will also uniquely identify each row in the CLASS table.

    • In other words, the composite key composed of CRS_CODE and CLASS_SECTION is a candidate key. Any candidate key must have the not null and unique constraints enforced.
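A minimal sketch of enforcing the candidate key described above, assuming SQLite via Python's sqlite3 module. The class codes and section values are invented sample data, and the foreign key to COURSE is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CLASS (
        CLASS_CODE    INTEGER PRIMARY KEY,       -- the chosen primary key
        CRS_CODE      TEXT NOT NULL,
        CLASS_SECTION TEXT NOT NULL,
        UNIQUE (CRS_CODE, CLASS_SECTION)         -- the composite candidate key
    )
""")
conn.execute("INSERT INTO CLASS VALUES (10012, 'CS102B', '1')")
conn.execute("INSERT INTO CLASS VALUES (10013, 'CS102B', '2')")

# A new CLASS_CODE does not help if the course/section pair already exists.
try:
    conn.execute("INSERT INTO CLASS VALUES (10014, 'CS102B', '2')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

class_count = conn.execute("SELECT COUNT(*) FROM CLASS").fetchone()[0]
print(duplicate_rejected, class_count)
```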

  • • As the 1:1 label implies, in this relationship, one entity can be related to only one other entity, and vice versa.

    • For example, one department chair (a professor) can chair only one department, and one department can have only one department chair.

    • The entities PROFESSOR and DEPARTMENT thus exhibit a 1:1 relationship.

    The 1:1 Relationship

  • The 1:M DEPARTMENT employs PROFESSOR relationship is implemented through the placement of the DEPT_CODE foreign key in the PROFESSOR table.

    The 1:1 PROFESSOR chairs DEPARTMENT relationship is implemented through the placement of the EMP_NUM foreign key in the DEPARTMENT table.

    Table name: DEPARTMENT
    Primary key: DEPT_CODE
    Foreign key: EMP_NUM

    Table name: PROFESSOR
    Primary key: EMP_NUM
    Foreign key: DEPT_CODE

  • • Each professor is an employee. Therefore, the professor identification is through the EMP_NUM. (However, note that not all employees are professors; there is another optional relationship.)

    • The 1:1 PROFESSOR chairs DEPARTMENT relationship is implemented by having the EMP_NUM foreign key in the DEPARTMENT table. Note that the 1:1 relationship is treated as a special case of the 1:M relationship in which the “many” side is restricted to a single occurrence. In this case, DEPARTMENT contains the EMP_NUM as a foreign key to indicate that it is the department that has a chair.

    • Also note that the PROFESSOR table contains the DEPT_CODE foreign key to implement the 1:M DEPARTMENT employs PROFESSOR relationship. This is a good example of how two entities can participate in two (or even more) relationships simultaneously.
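The two relationships between PROFESSOR and DEPARTMENT can be sketched together, assuming SQLite via Python's sqlite3 module. Using a UNIQUE constraint on DEPARTMENT.EMP_NUM to restrict the "many" side to a single occurrence is one possible technique and an assumption of this sketch, not something the slides prescribe. The department codes and employee numbers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE DEPARTMENT (
        DEPT_CODE TEXT PRIMARY KEY NOT NULL,
        -- 1:1 "chairs": UNIQUE lets a professor chair at most one department
        EMP_NUM   INTEGER UNIQUE REFERENCES PROFESSOR (EMP_NUM)
    );
    CREATE TABLE PROFESSOR (
        EMP_NUM   INTEGER PRIMARY KEY,
        -- 1:M "employs": many professors may share one DEPT_CODE
        DEPT_CODE TEXT NOT NULL REFERENCES DEPARTMENT (DEPT_CODE)
    );
""")

# Departments start without a chair (null foreign key), then professors are hired.
conn.execute("INSERT INTO DEPARTMENT VALUES ('CIS', NULL)")
conn.execute("INSERT INTO DEPARTMENT VALUES ('ACCT', NULL)")
conn.execute("INSERT INTO PROFESSOR VALUES (101, 'CIS')")
conn.execute("INSERT INTO PROFESSOR VALUES (102, 'CIS')")
conn.execute("UPDATE DEPARTMENT SET EMP_NUM = 101 WHERE DEPT_CODE = 'CIS'")

# Professor 101 already chairs CIS, so chairing ACCT as well is rejected.
try:
    conn.execute("UPDATE DEPARTMENT SET EMP_NUM = 101 WHERE DEPT_CODE = 'ACCT'")
    second_chair_rejected = False
except sqlite3.IntegrityError:
    second_chair_rejected = True

professors_in_cis = conn.execute(
    "SELECT COUNT(*) FROM PROFESSOR WHERE DEPT_CODE = 'CIS'"
).fetchone()[0]
print(second_chair_rejected, professors_in_cis)
```

Because each table references the other, the departments are inserted first with a null chair and updated once the professor rows exist.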

  • The M:N Relationship

    • A many-to-many (M:N) relationship is not supported directly in the relational environment.

    • However, M:N relationships can be implemented by creating a new entity in 1:M relationships with the original entities.

    • To explore the many-to-many (M:N) relationship, consider a rather typical college environment in which each STUDENT can take many CLASSes, and each CLASS can contain many STUDENTs.

  • • Each CLASS can have many STUDENTs, and each STUDENT can take many CLASSes.

    • There can be many rows in the CLASS table for any given row in the STUDENT table, and there can be many rows in the STUDENT table for any given row in the CLASS table.

    • To examine the M:N relationship more closely, imagine a small college with two students, each of whom takes three classes.

    STUDENT’S LAST NAME    SELECTED CLASSES

    Bowser                 Accounting 1, ACCT-211, code 10014
                           Intro to Microcomputing, CIS-220, code 10018
                           Intro to Statistics, QM-261, code 10021

    Smithson               Accounting 1, ACCT-211, code 10014
                           Intro to Microcomputing, CIS-220, code 10018
                           Intro to Statistics, QM-261, code 10021

  • • Given such a data relationship and the sample data, you could wrongly assume that you could implement this M:N relationship by simply adding a foreign key in the many side of the relationship that points to the primary key of the related table:

    The wrong implementation of the M:N relationship between STUDENT and CLASS

    Table name: STUDENT
    Primary key: STU_NUM
    Foreign key: none

    Table name: CLASS
    Primary key: CLASS_CODE
    Foreign key: STU_NUM

  • • However, the M:N relationship should not be implemented as shown for two good reasons:

    1) The tables create many redundancies. Those redundancies lead to anomalies.

    For example, note that the STU_NUM values occur many times in the STUDENT table.

    2) Given the structure and contents of the two tables, the relational operations become very complex and are likely to lead to system efficiency errors and output errors.

  • • Fortunately, the problems inherent in the many-to-many (M:N) relationship can easily be avoided by creating a composite entity (also referred to as a bridge entity or an associative entity).

    • Because such a table is used to link the tables that were originally related in an M:N relationship, the composite entity structure includes, as foreign keys, at least the primary keys of the tables that are to be linked.

    • The database designer has two main options when defining a composite table’s primary key:

    a) use the combination of those foreign keys or
    b) create a new primary key.

    • Therefore, you can create the composite ENROLL table to link the tables CLASS and STUDENT.

    • Because the ENROLL table links two tables, STUDENT and CLASS, it is also called a linking table. In other words, a linking table is the implementation of a composite entity.
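The ENROLL linking table can be sketched as follows, assuming SQLite via Python's sqlite3 module and option (a), a primary key built from the two foreign keys. The student names and class codes come from the sample data above; the STU_NUM values 1 and 2 and the STU_LNAME column name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE STUDENT (
        STU_NUM   INTEGER PRIMARY KEY,
        STU_LNAME TEXT NOT NULL                  -- hypothetical column name
    );
    CREATE TABLE CLASS (
        CLASS_CODE INTEGER PRIMARY KEY,
        CRS_CODE   TEXT NOT NULL
    );
    -- Composite (bridge) entity: one row per student/class pairing.
    CREATE TABLE ENROLL (
        CLASS_CODE INTEGER NOT NULL REFERENCES CLASS (CLASS_CODE),
        STU_NUM    INTEGER NOT NULL REFERENCES STUDENT (STU_NUM),
        PRIMARY KEY (CLASS_CODE, STU_NUM)
    );
""")

conn.executemany("INSERT INTO STUDENT VALUES (?, ?)",
                 [(1, "Bowser"), (2, "Smithson")])
conn.executemany("INSERT INTO CLASS VALUES (?, ?)",
                 [(10014, "ACCT-211"), (10018, "CIS-220"), (10021, "QM-261")])
# Each student enrolls in all three classes: six ENROLL rows in total.
conn.executemany("INSERT INTO ENROLL VALUES (?, ?)",
                 [(code, stu) for stu in (1, 2) for code in (10014, 10018, 10021)])

# Two 1:M joins recover the original M:N view.
enrollments = conn.execute("""
    SELECT S.STU_LNAME, C.CRS_CODE
    FROM ENROLL AS E
    JOIN STUDENT AS S ON S.STU_NUM = E.STU_NUM
    JOIN CLASS   AS C ON C.CLASS_CODE = E.CLASS_CODE
    ORDER BY S.STU_LNAME, C.CLASS_CODE
""").fetchall()
for row in enrollments:
    print(row)
```

Each student and each class is stored exactly once; only the pairings repeat, and they live in ENROLL.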

  • Module 3 Summary

    • Tables are the basic building blocks of a relational database.

    • A grouping of related entities, known as an entity set, is stored in a table.

    • The relational table is composed of intersecting rows (tuples) and columns. Each row represents a single entity, and each column represents the characteristics (attributes) of the entities.

  • • Keys are central to the use of relational tables. Keys define functional dependencies; that is, other attributes are dependent on the key and can, therefore, be found if the key value is known.

    • A key can be classified as a superkey, a candidate key, a primary key, a secondary key, or a foreign key.

    • Each table row must have a primary key. The primary key is an attribute or a combination of attributes that uniquely identifies all remaining attributes found in any given row.

    • Because a primary key must be unique, no null values are allowed if entity integrity is to be maintained.
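Entity integrity can be sketched as follows, assuming SQLite via Python's sqlite3 module. The CRS_TITLE column and the sample values are invented. One SQLite-specific detail: a non-integer PRIMARY KEY column accepts nulls unless NOT NULL is declared, so the constraint is written out explicitly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE COURSE (
        CRS_CODE  TEXT PRIMARY KEY NOT NULL,     -- unique and never null
        CRS_TITLE TEXT                           -- hypothetical column
    )
""")
conn.execute("INSERT INTO COURSE VALUES ('CS102B', 'Placeholder title')")

def try_insert(values):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO COURSE VALUES (?, ?)", values)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = try_insert(("CS102B", "Another title"))  # repeats a key value
null_accepted = try_insert((None, "No key"))                  # null key value
print(duplicate_accepted, null_accepted)
```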

  • • Although the tables are independent, they can be linked by common attributes. Thus, the primary key of one table can appear as the foreign key in another table to which it is linked.

    • Referential integrity dictates that the foreign key must contain values that match the primary key in the related table or must contain nulls.

    • The relational model supports relational algebra functions: SELECT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT, and DIVIDE.

    • A relational database performs much of the data manipulation work behind the scenes. For example, when you create a database, the RDBMS automatically produces a structure to house a data dictionary for your database. Each time you create a new table within the database, the RDBMS updates the data dictionary, thereby providing the database documentation.
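The referential integrity rule summarized above (a foreign key value must match a primary key value in the related table or be null) can be sketched as follows, assuming SQLite via Python's sqlite3 module. The foreign key is deliberately left nullable here to show the null case, and the course code 'XX999' is an invented value with no matching COURSE row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE COURSE (CRS_CODE TEXT PRIMARY KEY NOT NULL);
    CREATE TABLE CLASS (
        CLASS_CODE INTEGER PRIMARY KEY,
        CRS_CODE   TEXT REFERENCES COURSE (CRS_CODE)   -- nullable in this sketch
    );
    INSERT INTO COURSE VALUES ('CS102B');
""")

def try_insert(class_code, crs_code):
    """Return True if the CLASS row is accepted, False if it is rejected."""
    try:
        conn.execute("INSERT INTO CLASS VALUES (?, ?)", (class_code, crs_code))
        return True
    except sqlite3.IntegrityError:
        return False

matching_ok = try_insert(1, "CS102B")  # matches an existing primary key value
null_ok = try_insert(2, None)          # a null foreign key is permitted
dangling_ok = try_insert(3, "XX999")   # no such course, so the row is rejected
print(matching_ok, null_ok, dangling_ok)
```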

  • • Once you know the relational database basics, you can concentrate on design.

    • Good design begins by identifying appropriate entities and their attributes and then the relationships among the entities.

    • Those relationships (1:1, 1:M, and M:N) can be represented using ERDs. The use of ERDs allows you to create and evaluate simple logical designs.

    • The 1:M relationship is most easily incorporated in a good design; you just have to make sure that the primary key of the “1” is included in the table of the “many.”