MC0067 (Database Management System)


Master of Computer Application (MCA) – Semester 2
MC0067 – Database Management System

Assignment Set – 1

Que 1. Explain the functions and advantages of a DBMS over a traditional file system.

Ans:-

DBMS – A Database is a collection of interrelated data and a Database Management System is a set of programs to use and/or modify this data.

File-Based Systems

Conventionally, before the Database systems evolved, data in software systems was stored in and represented using flat files.

Database Systems

Database Systems evolved in the late 1960s to address common issues in data-intensive applications handling large volumes of data. Some of these issues could be traced back to the disadvantages of file-based systems.

Functions of a DBMS

The functions performed by a typical DBMS are the following:

· Data Definition

The DBMS provides functions to define the structure of the data in the application. These include defining and modifying the record structure, the type and size of fields and the various constraints/conditions to be satisfied by the data in each field.
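As a minimal sketch of the data-definition function, the following uses SQLite through Python's standard sqlite3 module. The table name and fields are hypothetical illustrations, not part of the original text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Define the record structure, the type of each field, and a field constraint.
conn.execute("""
    CREATE TABLE customer (
        cust_id   INTEGER PRIMARY KEY,
        cust_name TEXT NOT NULL,               -- field-level constraint
        cust_city TEXT,
        credit    REAL CHECK (credit >= 0)     -- condition on field values
    )
""")
# Modifying the record structure later is also a data-definition function.
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")
cols = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(cols)
```

Running this lists all five column names, including the one added by the ALTER TABLE.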

· Data Manipulation

Once the data structure is defined, data needs to be inserted, modified or deleted. The functions which perform these operations are also part of the DBMS. These functions can handle planned and unplanned data manipulation needs. Planned queries are those which form part of the application. Unplanned queries are ad-hoc queries which are performed on a need basis.


· Data Security & Integrity

The DBMS contains functions which handle the security and integrity of data in the application. These can be easily invoked by the application and hence the application programmer need not code these functions in his/her programs.

· Data Recovery & Concurrency

Recovery of data after a system failure and concurrent access of records by multiple users are also handled by the DBMS.

· Data Dictionary Maintenance

Maintaining the Data Dictionary which contains the data definition of the application is also one of the functions of a DBMS.

· Performance

Optimizing the performance of the queries is one of the important functions of a DBMS. Hence the DBMS has a set of programs forming the Query Optimizer which evaluates the different implementations of a query and chooses the best among them.

Thus the DBMS provides an environment that is both convenient and efficient to use when there is a large volume of data and many transactions to be processed.

Advantages of Database Systems

· Minimal Data Redundancy

Since the whole data resides in one central database, the various programs in the application can access data from this common store. Hence data present in one file need not be duplicated in another. This reduces data redundancy. However, this does not mean all redundancy can be eliminated. There could be business or technical reasons for having some amount of redundancy. Any such redundancy should be carefully controlled and the DBMS should be aware of it.

· Data Consistency

Reduced data redundancy leads to better data consistency.

· Data Integration

Since related data is stored in one single database, enforcing data integrity is much easier. Moreover, the functions in the DBMS can be used to enforce the integrity rules with minimum programming in the application programs.


· Data Sharing

Related data can be shared across programs since the data is stored in a centralized manner. Even new applications can be developed to operate against the same data.

· Enforcement of Standards

Enforcing standards in the organization and structure of data files is required and also easy in a Database System, since it is one single set of programs which is always interacting with the data files.

· Application Development Ease

The application programmer need not build the functions for handling issues like concurrent access, security, data integrity, etc. The programmer only needs to implement the application business rules. This brings in application development ease. Adding additional functional modules is also easier than in file-based systems.

· Better Controls

Better controls can be achieved due to the centralized nature of the system.

· Data Independence

The architecture of the DBMS can be viewed as a 3-level system comprising the following:

– The internal or the physical level where the data resides.

– The conceptual level which is the level of the DBMS functions

– The external level which is the level of the application programs and the end users

Data Independence is isolating an upper level from the changes in the organization or structure of a lower level. For example, if changes in the file organization of a data file do not demand for changes in the functions in the DBMS or in the application programs, data independence is achieved. Thus Data Independence can be defined as immunity of applications to change in physical representation and access technique. The provision of data independence is a major objective for database systems.

· Reduced Maintenance

Maintenance is less and easier, again due to the centralized nature of the system.


Que. 2. Describe indexing and clustering techniques with relevant real time examples.

Ans:-

Indexing:-

Indexing is another common method for making retrievals faster.

Consider the example of a CUSTOMER table. The following query is based on the customer's city.

“Retrieve the records of all customers who reside in Delhi”

Here a sequential search on the CUSTOMER table has to be carried out and all records with the value ‘Delhi’ in the Cust_City field have to be retrieved. The time taken for this operation depends on the number of pages to be accessed. If the records are randomly stored, the page accesses depends on the volume of data. If the records are stored physically together, the number of pages depends on the size of each record also.

If such queries based on Cust_City field are very frequent in the application, steps can be taken to improve the performance of these queries. Creating an Index on Cust_City is one such method. This results in the scenario as shown below.

A new index file is created. The number of records in the index file is same as that of the data file. The index file has two fields in each record. One field contains the value of the Cust_City field and the second contains a pointer to the actual data record in the CUSTOMER table.

Whenever a query based on the Cust_City field occurs, a search is carried out on the index file. This search will be much faster than a sequential search in the CUSTOMER table if the index records are stored physically together. This is because of the much smaller size of the index record, due to which each page will be able to contain more records.

When the records with value ‘Delhi’ in the Cust_City field in the index file are located, the pointer in the second field of the records can be followed to directly retrieve the corresponding CUSTOMER records.

Thus the access involves a Sequential access on the index file and a Direct access on the actual data file.

Retrieval Speed vs. Update Speed: Though indexes help make retrievals faster, they slow down updates on the table, since an update on the base table demands an update on the index as well.

It is possible to create an index with multiple fields i.e., index on field combinations. Multiple indexes can also be created on the same table simultaneously though there may be a limit on the maximum number of indexes that can be created on a table.
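The effect of creating an index can be observed with SQLite's EXPLAIN QUERY PLAN facility. This is a sketch; the table and index names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, "Delhi" if i % 3 == 0 else "Pune") for i in range(1, 101)])

# Create an index on Cust_City, as described above.
conn.execute("CREATE INDEX idx_cust_city ON customer (cust_city)")

# The query plan confirms the city lookup now goes through the index
# instead of scanning the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customer WHERE cust_city = 'Delhi'"
).fetchall()
detail = plan[0][-1]
print(detail)
```

The plan's detail line mentions idx_cust_city, showing a search through the index rather than a full table scan.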

Clustering:-

In the above process, if the page containing the requested record is already in the memory, retrieval from the disk is not necessary. In such a situation, time taken for the whole operation will be less. Thus, if records which are frequently used together are placed physically together, more records will be in the same page. Hence the number of pages to be retrieved will be less and this reduces the number of disk accesses which in turn gives a better performance.

This method of storing logically related records, physically together is called clustering.


Eg: Consider CUSTOMER table as shown below.

Cust ID Cust Name Cust City …

10001 Raj Delhi …

10002 … … …

10003 … … …

10004 … … …

… … … …

… … … …

If queries retrieving Customers with consecutive Cust_IDs frequently occur in the application, clustering based on Cust_ID will help improve the performance of these queries. This can be explained as follows.

Assume that the Customer record size is 128 bytes and the typical size of a page retrieved by the File Manager is 1 Kb (1024 bytes).

If there is no clustering, it can be assumed that the Customer records are stored at random physical locations. In the worst-case scenario, each record may be placed in a different page. Hence a query to retrieve 100 records with consecutive Cust_Ids (say, 10001 to 10100), will require 100 pages to be accessed which in turn translates to 100 disk accesses.

But, if the records are clustered, a page can contain 8 records. Hence the number of pages to be accessed for retrieving the 100 consecutive records will be ceil(100/8) = 13, i.e., only 13 disk accesses will be required to obtain the query results. Thus, in the given example, clustering improves the speed by a factor of about 7.7.
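The page arithmetic above can be checked directly; the constants are the ones assumed in the text:

```python
import math

RECORD_SIZE = 128   # bytes per CUSTOMER record
PAGE_SIZE = 1024    # bytes per page fetched by the file manager
N_RECORDS = 100     # consecutive records requested

records_per_page = PAGE_SIZE // RECORD_SIZE                 # 8 records fit in a page
pages_unclustered = N_RECORDS                               # worst case: one page per record
pages_clustered = math.ceil(N_RECORDS / records_per_page)   # ceil(100/8)
speedup = pages_unclustered / pages_clustered
print(pages_clustered, round(speedup, 1))                   # 13 7.7
```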

· Intra-file Clustering – Clustered records belong to the same file (table) as in the above example.

· Inter-file Clustering – Clustered records belong to different files (tables). This type of clustering may be required to enhance the speed of queries retrieving related records from more than one table. Here interleaving of records is used.


Que 3. Describe various integrity rules with a relevant example.

Ans:-

Integrity Rules

The following are the integrity rules to be satisfied by any relation.

• No Component of the Primary Key can be null.

• The Database must not contain any unmatched Foreign Key values. This is called the referential integrity rule.

Unlike the case of Primary Keys, there is no integrity rule saying that no component of the foreign key can be null. This can be logically explained with the help of the following example:

Consider the relations Employee and Account as given below.

Employee

Emp# EmpName EmpCity EmpAcc#

X101 Shekhar Bombay 120001

X102 Raj Pune 120002

X103 Sharma Nagpur Null

X104 Vani Bhopal 120003

Account

ACC# OpenDate BalAmt

120001 30-Aug-1998 5000

120002 29-Oct-1998 1200

120003 01-Jan-1999 3000

120004 04-Mar-1999 500


EmpAcc# in Employee relation is a foreign key creating reference from Employee to Account. Here, a Null value in EmpAcc# attribute is logically possible if an Employee does not have a bank account. If the business rules allow an employee to exist in the system without opening an account, a Null value can be allowed for EmpAcc# in Employee relation.

In the case example given, Cust# in Ord_Aug cannot accept Null if the business rule insists that the Customer No. needs to be stored for every order placed.

The next issue related to foreign key references is the handling of deletes/updates of the parent record.

In the case example, can we delete the record with Cust# value 002, 003 or 005?

The default answer is NO, as long as there is a foreign key reference to these records from some other table. Here, the records are referenced from the order records in the Ord_Aug relation. Hence the default strategy is to Restrict the deletion of the parent record.

Deletion can still be carried out if we use the Cascade or Nullify strategies.

Cascade: Delete/Update all the references successively or in a cascaded fashion and finally delete/update the parent record. In the case example, Customer record with Cust#002 can be deleted after deleting order records with Ord# 101 and 104. But these order records, in turn, can be deleted only after deleting those records with Ord# 101 and 104 from Ord_Items relation.

Nullify: Update the referencing field to Null and then delete/update the parent record. In the above example of the Employee and Account relations, an account record may have to be deleted if the account is to be closed. For example, if Employee Raj decides to close his account, the Account record with Acc# 120002 has to be deleted. But this deletion is not possible as long as the Employee record of Raj references it. Hence the strategy can be to update the EmpAcc# field in the employee record of Raj to Null and then delete the Account parent record of 120002. After the deletion the data in the tables will be as follows:

Employee

Emp# EmpName EmpCity EmpAcc#

X101 Shekhar Bombay 120001

X102 Raj Pune Null

X103 Sharma Nagpur Null

X104 Vani Bhopal 120003

Account

ACC# OpenDate BalAmt

120001 30-Aug-1998 5000

120003 01-Jan-1999 3000

120004 04-Mar-1999 500
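The Nullify strategy can be declared directly in SQL as ON DELETE SET NULL, so the DBMS performs the update automatically. This sketch reproduces the Raj/Account scenario in SQLite (note SQLite enforces foreign keys only when the pragma is on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE account (acc INTEGER PRIMARY KEY, bal REAL)")
# Declare the Nullify strategy on the foreign key.
conn.execute("""
    CREATE TABLE employee (
        emp TEXT PRIMARY KEY,
        emp_acc INTEGER REFERENCES account(acc) ON DELETE SET NULL
    )
""")
conn.execute("INSERT INTO account VALUES (120002, 1200)")
conn.execute("INSERT INTO employee VALUES ('X102', 120002)")

# Closing Raj's account: the referencing field is nullified automatically.
conn.execute("DELETE FROM account WHERE acc = 120002")
result = conn.execute("SELECT emp_acc FROM employee WHERE emp = 'X102'").fetchone()
print(result)
```

After the delete, Raj's EmpAcc# is Null, matching the table above.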

Que 4. Explain the three-level architecture of a DBMS with a labeled diagram.

Ans:-

Three Level Architecture of a Database

A commonly used view of data approach is the three-level architecture suggested by ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee). ANSI/SPARC produced an interim report in 1972 followed by a final report in 1977. The reports proposed an architectural framework for databases. Under this approach, a database is considered as containing data about an enterprise. The three levels of the architecture are three different views of the data:

1. External – individual user view
2. Conceptual – community user view
3. Internal – physical or storage view

The three level database architecture allows a clear separation of the information meaning (conceptual view) from the external data representation and from the physical data structure layout. A database system that is able to separate the three different views of data is likely to be flexible and adaptable. This flexibility and adaptability is data independence that we have discussed earlier.

We now briefly discuss the three different views.

The external level is the view that the individual user of the database has. This view is often a restricted view of the database and the same database may provide a number of different views for different classes of users. In general, the end users and even the applications programmers are only interested in a subset of the database. For example, a department head may only be interested in the departmental finances and student enrolments but not the library information. The librarian would not be expected to have any interest in the information about academic staff. The payroll office would have no interest in student enrolments.

The conceptual view is the information model of the enterprise and contains the view of the whole enterprise without any concern for the physical implementation. This view is normally more stable than the other two views. In a database, it may be desirable to change the internal view to improve performance while there has been no change in the conceptual view of the database. The conceptual view is the overall community view of the database and it includes all the information that is going to be represented in the database. The conceptual view is defined by the conceptual schema, which includes definitions of each of the various types of data.

The internal view is the view about the actual physical storage of data. It tells us what data is stored in the database and how. At least the following aspects are considered at this level:

1. Storage allocation, e.g. B-trees, hashing, etc.
2. Access paths, e.g. specification of primary and secondary keys, indexes and pointers, and sequencing.
3. Miscellaneous, e.g. data compression and encryption techniques, optimization of the internal structures.

Efficiency considerations are the most important at this level and the data structures are chosen to provide an efficient database. The internal view does not deal with the physical devices directly. Instead it views a physical device as a collection of physical pages and allocates space in terms of logical pages.

The separation of the conceptual view from the internal view enables us to provide a logical description of the database without the need to specify physical structures. This is often called physical data independence. Separating the external views from the conceptual view enables us to change the conceptual view without affecting the external views. This separation is sometimes called logical data independence.

If there is no faculty name field in the conceptual schema, there can be no faculty name field in an external schema. The conceptual schema can change (like adding fields, relationships, etc.) without affecting the external schemas. An external schema would have to be changed only if some field it required is deleted from the conceptual schema.
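Logical data independence can be illustrated with a SQL view acting as an external schema over a conceptual schema. The table and view names here are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual schema: the full community view of faculty data.
conn.execute("CREATE TABLE faculty (facno TEXT, facname TEXT, salary REAL, dept TEXT)")
conn.execute("INSERT INTO faculty VALUES ('58A', 'Hamid', 5000, 'IT')")

# External schema: a restricted payroll view exposing only name and salary.
conn.execute("CREATE VIEW payroll_view AS SELECT facname, salary FROM faculty")

# The conceptual schema can grow without touching the external view:
conn.execute("ALTER TABLE faculty ADD COLUMN phone TEXT")
row = conn.execute("SELECT * FROM payroll_view").fetchone()
print(row)
```

The view still returns exactly the two columns it was defined with, even after the underlying table changed.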

An internal schema can be changed without affecting the conceptual schema. For example, storage details and access strategies can change. A given field may have earlier been stored as decimal, whereas we may now want to store it in a binary format. Earlier we might have used direct access to a faculty member, whereas now we may want to use another scheme.

The DBA can thus make changes to the internal schema to improve the performance of the database. The DBA can also make changes to the conceptual schema to respond to new requirements within the organization. In these two cases, the external schemas can remain the same, subject to the condition that the data they require is still present in the new structure. Database processing also has some disadvantages, which are given below.

1. Size: For supporting all the complex functions it must provide to users, a DBMS must be a large program occupying a substantial amount of disk space and internal memory.

2. Complexity: The functions furnished by a DBMS make it a complex product. Programmers and analysts must understand the features of the system so that they can take full advantage of it. With several choices to make while designing and implementing a new system using a DBMS, it is possible that these choices are made incorrectly, which may spell disaster for the entire project.

3. Cost: A good mainframe DBMS may be an expensive product.

Que 5. Describe the relational algebra operations with relevant real time examples.

Ans:-

Relational Algebra

Relational Algebra has operations, which act on relations and produce new relations. Relational Algebra is relationally complete. Anything, which can be achieved using relational calculus, can also be achieved using relational algebra.

SELECT COMMAND:-

This command in relational algebra takes a horizontal subset of a relation. It causes only certain rows to be included in the new relation. Give all information about the faculty whose number is 58A:

SELECT FACULTY WHERE facultynumber = ’58A’ GIVING ANSWER

PROJECT COMMAND:-


This command in relational algebra takes a vertical subset of a relation. It causes only certain columns to be included in the new relation.

- Give the facultyname and salary of all faculties.

PROJECT FACULTY OVER (facultyname, salary) GIVING ANSWER

JOIN:-

This command allows us to pull together data from more than one relation. We join two tables together based on a common attribute. Join forms a new table containing the columns of both tables which have been joined. Rows in this table will be the concatenation of a row from the first table and a row from the second which match on the common attribute, called the JOIN column.

Consider the following relations

FACULTY

Facultyno facultyname deptno

1241 Hamid 2

1349 Tom 3

1436 John 1

1514 Amit 5

DEPARTMENT

DeptNo DeptName

1 Maths

2 IT

3 MT

4 TDD

Suppose we want to JOIN the above tables on deptno (the join column) and create a new relation called TEMP.

TEMP

facultyno facultyname deptno deptname

1241 Hamid 2 IT


1349 Tom 3 MT

1436 John 1 Maths

The column on which the tables are joined appears only once. All columns from both tables are present in the result. If there is a row in one table which does not match any row in the other table, it will not appear in the result. The department (deptno 4, deptname TDD) and the faculty (facultyno 1514, facultyname Amit, deptno 5) do not appear in the result.

Give facultyno, facultyname together with Deptname to which the faculty is attached

JOIN FACULTY DEPARTMENT

WHERE FACULTY.DeptNo=DEPARTMENT.Deptno GIVING TEMP

PROJECT TEMP OVER (facultyno, facultyname, deptname) GIVING ANSWER

This type of join is called a natural join. In a natural join, the column on which the tables are joined appears only once. If we follow the same process but leave both copies of the join column in the table, we have an equijoin. An equijoin for the above example would contain two deptno columns, one for Deptno from the FACULTY relation and the other for Deptno from the DEPARTMENT relation. There is also the theta join, where the joining condition may use any comparison operator, not just equality.
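The FACULTY/DEPARTMENT join above can be expressed in SQL and run in SQLite. This is a sketch using the sample data from the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faculty (facultyno INTEGER, facultyname TEXT, deptno INTEGER)")
conn.execute("CREATE TABLE department (deptno INTEGER, deptname TEXT)")
conn.executemany("INSERT INTO faculty VALUES (?,?,?)",
                 [(1241, 'Hamid', 2), (1349, 'Tom', 3),
                  (1436, 'John', 1), (1514, 'Amit', 5)])
conn.executemany("INSERT INTO department VALUES (?,?)",
                 [(1, 'Maths'), (2, 'IT'), (3, 'MT'), (4, 'TDD')])

# Natural join on deptno: unmatched rows (Amit, TDD) drop out of the result.
rows = conn.execute("""
    SELECT f.facultyno, f.facultyname, f.deptno, d.deptname
    FROM faculty f JOIN department d ON f.deptno = d.deptno
    ORDER BY f.facultyno
""").fetchall()
for r in rows:
    print(r)
```

Only three rows come back, matching the TEMP relation shown above.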

Give the computer ID and manufacturer name of all computers which either have a 386SX processor or have been assigned for home use, or both.

SELECT COMPUTER WHERE ProcType = ‘386SX’ GIVING TEMP1

PROJECT TEMP1 OVER Compid, Mfgname GIVING TEMP2

JOIN COMPUTER, PC

WHERE COMPUTER.Compid = PC.compid GIVING TEMP3

SELECT TEMP3 WHERE LOCATION = ‘Home’ GIVING TEMP4

PROJECT TEMP4 OVER Compid, MfgName GIVING TEMP5

UNION TEMP2 WITH TEMP5 GIVING ANSWER

Give the computer ID and the name of the manufacturer of all computers which have 386SX processor and which have been assigned for home use.

Change the last line to

INTERSECT TEMP2 WITH TEMP5 GIVING ANSWER


Give the computer id and manufacturer name of all computers, which have a 386SX processor but have not been assigned for home use.

Change the last line to

SUBTRACT TEMP5 FROM TEMP2 GIVING ANSWER

The Product of two relations is the relation obtained by concatenating every row of the first relation with every row of the second relation.

Product of FACULTY and DEPARTMENT looks something as follows.

ANSWER

Facultyno facultyname deptno deptno deptname

1241 Hamid 2 1 Maths

1241 Hamid 2 2 IT

1241 Hamid 2 3 MT

1241 Hamid 2 4 TDD

1349 Tom 3 1 Maths

—- — — – —-

—- — — – —-

—- — — – —-

There will be 16 rows. Every row of FACULTY is matched with every row of DEPARTMENT. If one table has m rows and the other has n rows, then the product will have m × n rows.
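The m × n row count can be verified with SQL's CROSS JOIN, which is the Cartesian product, using the four rows of each sample table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faculty (facultyno INTEGER)")
conn.execute("CREATE TABLE department (deptno INTEGER)")
conn.executemany("INSERT INTO faculty VALUES (?)",
                 [(1241,), (1349,), (1436,), (1514,)])
conn.executemany("INSERT INTO department VALUES (?)",
                 [(1,), (2,), (3,), (4,)])

# CROSS JOIN pairs every faculty row with every department row: 4 * 4 = 16.
count = conn.execute(
    "SELECT COUNT(*) FROM faculty CROSS JOIN department"
).fetchone()[0]
print(count)
```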

DIVISION

Consider the following relations

SOFTWARE

Packid TagNumber

MSOffice 1548

Lotus 1231

Lotus 1342

Dbase 1411

FrontPage 1548

PACKAGE

Packid

MSOffice

FrontPage

When we divide SOFTWARE by PACKAGE, we obtain a new relation with a single column, Tagnumber. The rows of this relation consist of those Tagnumbers from SOFTWARE which are matched with all the package IDs appearing in the PACKAGE relation. The result is the following:

ANSWER

Tagnumber

1548

Note: The SOFTWARE table has a row with 1548 as Tagnumber and MSOffice as packid and also a row with 1548 as Tagnumber and FrontPage as packid. No other Tagnumber has this property.
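SQL has no direct division operator, but division is commonly expressed with a double NOT EXISTS: keep each Tagnumber for which there is no package in PACKAGE that it is missing. A sketch using the tables above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE software (packid TEXT, tagnumber TEXT)")
conn.execute("CREATE TABLE package (packid TEXT)")
conn.executemany("INSERT INTO software VALUES (?,?)",
                 [('MSOffice', '1548'), ('Lotus', '1231'),
                  ('Lotus', '1342'), ('Dbase', '1411'), ('FrontPage', '1548')])
conn.executemany("INSERT INTO package VALUES (?)", [('MSOffice',), ('FrontPage',)])

# Division: tag numbers matched with ALL packids listed in PACKAGE.
rows = conn.execute("""
    SELECT DISTINCT s.tagnumber FROM software s
    WHERE NOT EXISTS (
        SELECT 1 FROM package p
        WHERE NOT EXISTS (
            SELECT 1 FROM software s2
            WHERE s2.tagnumber = s.tagnumber AND s2.packid = p.packid))
""").fetchall()
print(rows)
```

Only 1548 carries both MSOffice and FrontPage, so it is the sole row in the answer.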

Que .6. Write about the database system environment.

Ans:-

The Database System Environment

A DBMS is a complex software system. In this section, we discuss the types of software components that constitute a DBMS and the types of computer system software with which the DBMS interacts.


DBMS Component Modules

The database and the DBMS catalog are usually stored on disk. Access to the disk is controlled primarily by the operating system (OS), which schedules disk input/output. A higher-level stored data manager module of the DBMS controls access to DBMS information that is stored on disk, whether it is part of the database or the catalog. The stored data manager may use basic OS services for carrying out low-level data transfer between the disk and computer main storage, but it controls other aspects of data transfer, such as handling buffers in main memory. Once the data is in main memory buffers, it can be processed by other DBMS modules, as well as by application programs. Some DBMSs have their own buffer manager module, while others use the OS for handling the buffering of disk pages.

The DDL compiler processes schema definitions, specified in the DDL, and stores descriptions of the schemas (meta-data) in the DBMS catalog. The catalog includes information such as the names and sizes of files, names and data types of data items, storage details of each file, mapping information among schemas, and constraints, in addition to many other types of information that are needed by the DBMS modules. DBMS software modules then look up the catalog information as needed.

The runtime database processor handles database accesses at runtime; it receives retrieval or update operations and carries them out on the database. Access to disk goes through the stored data manager, and the buffer manager keeps track of the database pages in memory. The query compiler handles high-level queries that are entered interactively. It parses, analyzes, and compiles or interprets a query by creating database access code, and then generates calls to the runtime processor for executing the code.

The precompiler extracts DML commands from an application program written in a host programming language. These commands are sent to the DML compiler for compilation into object code for database access. The rest of the program is sent to the host language compiler. The object codes for the DML commands and the rest of the program are linked, forming a canned transaction whose executable code includes calls to the runtime database processor.

It is now common to have the client program that accesses the DBMS running on a separate computer from the computer on which the database resides. The former is called the client computer, and the latter is called the database server. In some cases, the client accesses a middle computer, called the application server, which in turn accesses the database server.


Figure: Component Modules of a DBMS and their interactions

Que 7. Describe Entity Types, Entity Sets, Attributes and Keys.

Ans:-

Entities and Attributes

Entities and Their Attributes. The basic object that the ER model represents is an entity, which is a "thing" in the real world with an independent existence. An entity may be an object with a physical existence (for example, a particular person, car, house, or employee) or it may be an object with a conceptual existence (for example, a company, a job, or a university course). Each entity has attributes – the particular properties that describe it. For example, an employee entity may be described by the employee's name, age, address, salary, and job. A particular entity will have a value for each of its attributes. The attribute values that describe each entity become a major part of the data stored in the database.

Figure: An ER schema diagram for the company database

The figure shows two entities and the values of their attributes. The employee entity e has four attributes: Name, Address, Age, and HomePhone; their values are "Mahesh Kumar," "2311 Ameerpet, Hyderabad, AP 500001," "55," and "402459672," respectively. The company entity c has three attributes: Name, Headquarters, and President; their values are "CDAC," "Hyderabad," and "Mahesh Kumar," respectively.

Several types of attributes occur in the ER model: simple versus composite, single-valued versus multivalued, and stored versus derived. We first define these attribute types and illustrate their use via examples. We then introduce the concept of a null value for an attribute.

Composite versus Simple (Atomic) Attributes:

Composite attributes can be divided into smaller subparts, which represent more basic attributes with independent meanings. For example, the Address attribute of the employee entity shown in Figure 2.3 can be subdivided into StreetAddress, City, State, and Zip, with the values "2311 Ameerpet," "Hyderabad," "AP," and "500001." Attributes that are not divisible are called simple or atomic attributes. Composite attributes can form a hierarchy. For example, StreetAddress can be further subdivided into three simple attributes: Number, Street, and ApartmentNumber, as shown in Figure 2.4. The value of a composite attribute is the concatenation of the values of its constituent simple attributes.

Figure: A hierarchy of composite attributes

Composite attributes are useful to model situations in which a user sometimes refers to the composite attribute as a unit but at other times refers specifically to its components. If the composite attribute is referenced only as a whole, there is no need to subdivide it into component attributes. For example, if there is no need to refer to the individual components of an address (zip code, street, and so on), then the whole address can be designated as a simple attribute.

Single-Valued versus Multivalued Attributes:

Most attributes have a single value for a particular entity; such attributes are called single-valued. For example, Age is a single-valued attribute of a person. In some cases an attribute can have a set of values for the same entity – for example, a Colors attribute for a car, or a CollegeDegrees attribute for a person. Cars with one color have a single value, whereas two-tone cars have two values for Colors. Similarly, one person may not have a college degree, another person may have one, and a third person may have two or more degrees; therefore, different persons can have different numbers of values for the CollegeDegrees attribute. Such attributes are called multivalued. A multivalued attribute may have lower and upper bounds to constrain the number of values allowed for each individual entity. For example, the Colors attribute of a car may have between one and three values, if we assume that a car can have at most three colors.

Stored versus Derived Attributes. In some cases, two (or more) attribute values are related – for example, the Age and DOB attributes of a person. For a particular person entity, the value of Age can be determined from the current (today's) date and the value of that person's date of birth (DOB). The Age attribute is hence called a derived attribute and is said to be derivable from the DOB attribute, which is called a stored attribute. Some attribute values can be derived from related entities. For example, an attribute NumberOfEmployees of a department entity can be derived by counting the number of employees related to (working for) that department.
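A derived attribute is computed on demand from stored values. This small sketch derives Age from a stored DOB; the sample date of birth is hypothetical:

```python
from datetime import date

def derive_age(dob: date, today: date) -> int:
    """Derive the Age attribute from the stored DOB attribute."""
    # Subtract one if this year's birthday has not yet occurred.
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

# One day before the birthday vs. on the birthday.
print(derive_age(date(1968, 6, 15), date(2023, 6, 14)))  # 54
print(derive_age(date(1968, 6, 15), date(2023, 6, 15)))  # 55
```

Because Age changes over time while DOB does not, storing DOB and deriving Age avoids data that silently goes stale.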

Null Values:

In some cases a particular entity may not have an applicable value for an attribute. For example, the ApartmentNumber attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences, such as single-family homes. Similarly, a CollegeDegrees attribute applies only to persons with college degrees. For such situations, a special value called null is created. An address of a single-family home would have null for its ApartmentNumber attribute, and a person with no college degree would have null for CollegeDegrees. Null can also be used if we do not know the value of an attribute for a particular entity; for example, if we do not know the home phone of "Mahesh Kumar". The meaning of the former type of null is not applicable, whereas the meaning of the latter is unknown. The "unknown" category of null can be further classified into two cases. The first case arises when it is known that the attribute value exists but is missing; for example, if the Height attribute of a person is listed as null. The second case arises when it is not known whether the attribute value exists; for example, if the HomePhone attribute of a person is null.

Complex Attributes:

Notice that composite and multivalued attributes can be nested in an arbitrary way. We can represent arbitrary nesting by grouping components of a composite attribute between parentheses () and separating the components with commas, and by displaying multivalued attributes between braces {}. Such attributes are called complex attributes. For example, if a person can have more than one residence and each residence can have multiple phones, an attribute AddressPhone for a person can be specified as shown in Figure 2.5.
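As an illustration only (the layout of Figure 2.5 is not reproduced here), this nesting can be sketched in Python: multivalued parts (braces) map to lists, and composite parts (parentheses) map to dictionaries. All field values below are made up for the sketch.

```python
# A complex attribute AddressPhone for a person: a multivalued attribute
# whose members are composites of (Address, {Phone}).
# Multivalued -> Python list; composite -> Python dict.
address_phone = [
    {
        "Address": {"Street": "621 Dabagardens", "City": "Visakhapatnam"},
        "Phones": ["0891-2754321", "0891-2754322"],  # multivalued: {Phone}
    },
    {
        "Address": {"Street": "6357 Bhanu Street", "City": "Hyderabad"},
        "Phones": ["040-23456789"],
    },
]

# Each residence is one composite value; each residence may carry several phones.
for residence in address_phone:
    print(residence["Address"]["City"], len(residence["Phones"]))
```

The same person thus has two residences, the first with two phones and the second with one, exactly the situation the AddressPhone attribute is meant to capture.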

Entity Types and Entity Sets:

A database usually contains groups of entities that are similar. For example, a company employing hundreds of employees may want to store similar information concerning each of the employees. These employee entities share the same attributes, but each entity has its own value(s) for each attribute. An entity type defines a collection (or set) of entities that have the same attributes. Each entity type in the database is described by its name and attributes. Figure 2.6 shows two entity types, named EMPLOYEE and COMPANY, and a list of attributes for each. A few individual entities of each type are also illustrated, along with the values of their attributes. The collection of all entities of a particular entity type in the database at any point in time is called an entity set; the entity set is usually referred to using the same name as the entity type. For example, EMPLOYEE refers to both a type of entity and the current set of all employee entities in the database.

An entity type is represented in ER diagrams as a rectangular box enclosing the entity type name. Attribute names are enclosed in ovals and are attached to their entity type by straight lines. Composite attributes are attached to their component attributes by straight lines. Multivalued attributes are displayed in double ovals.

An entity type describes the schema or intension for a set of entities that share the same structure. The collection of entities of a particular entity type are grouped into an entity set, which is also called the extension of the entity type.

Key Attributes of an Entity Type:

An important constraint on the entities of an entity type is the key or uniqueness constraint on attributes. An entity type usually has an attribute whose values are distinct for each individual entity in the entity set. Such an attribute is called a key attribute, and its values can be used to identify each entity uniquely. For example, the Name attribute is a key of the COMPANY entity type in Figure 2.6, because no two companies are allowed to have the same name. Sometimes, several attributes together form a key, meaning that the combination of the attribute values must be distinct for each entity. If a set of attributes possesses this property, the proper way to represent this in the ER model that we describe here is to define a composite attribute and designate it as a key attribute of the entity type.

Figure: Two entity types, EMPLOYEE and COMPANY, and some member entities of each.

Specifying that an attribute is a key of an entity type means that the preceding uniqueness property must hold for every entity set of the entity type. Hence, it is a constraint that prohibits any two entities from having the same value for the key attribute at the same time. It is not the property of a particular extension; rather, it is a constraint on all extensions of the entity type. This key constraint (and other constraints we discuss later) is derived from the constraints of the miniworld that the database represents.


Some entity types have more than one key attribute. For example, each of the VehicleID and Registration attributes of the entity type CAR is a key in its own right. The Registration attribute is an example of a composite key formed from two simple component attributes, RegistrationNumber and State, neither of which is a key on its own. An entity type may also have no key, in which case it is called a weak entity type.
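A minimal sketch of this situation using Python's sqlite3 module: Registration is modeled as the composite key (RegNumber, State), and VehicleID is declared UNIQUE as a second key in its own right. The column names and values are assumptions for illustration, not the book's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE car (
        VehicleID TEXT UNIQUE,          -- a key in its own right
        RegNumber TEXT,
        State     TEXT,
        PRIMARY KEY (RegNumber, State)  -- composite key: neither part alone is unique
    )
""")
conn.execute("INSERT INTO car VALUES ('V1', 'ABC-123', 'AP')")
# Same registration number in a different state: allowed, since only the
# combination (RegNumber, State) must be unique.
conn.execute("INSERT INTO car VALUES ('V2', 'ABC-123', 'TS')")
try:
    # Duplicates the full composite key: rejected.
    conn.execute("INSERT INTO car VALUES ('V3', 'ABC-123', 'AP')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Only the first two rows survive; the third insertion violates the composite key constraint even though its VehicleID is new.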

Figure: The CAR entity type with two key attributes, Registration and VehicleID.

Value Sets (Domains) of Attributes:

Each simple attribute of an entity type is associated with a value set (or domain of values), which specifies the set of values that may be assigned to that attribute for each individual entity. In Figure 2.6, if the range of ages allowed for employees is between 16 and 70, we can specify the value set of the Age attribute of EMPLOYEE to be the set of integer numbers between 16 and 70. Similarly, we can specify the value set for the Name attribute as being the set of strings of alphabetic characters separated by blank characters, and so on. Value sets are not displayed in ER diagrams. Value sets are typically specified using the basic data types available in most programming languages, such as integer, string, Boolean, float, enumerated type, subrange, and so on. Additional data types to represent date, time, and other concepts are also employed.
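Value sets map naturally onto declared column domains. As a sketch (table and column names are illustrative assumptions), the 16-to-70 value set of Age can be written as a CHECK constraint using sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        ENO  TEXT PRIMARY KEY,
        Name TEXT,
        Age  INTEGER CHECK (Age BETWEEN 16 AND 70)  -- the value set of Age
    )
""")
# Inside the declared domain: accepted.
conn.execute("INSERT INTO employee VALUES ('999887777', 'Valli', 28)")
try:
    # Outside the declared domain: the DBMS rejects the tuple.
    conn.execute("INSERT INTO employee VALUES ('123456789', 'Raju', 14)")
except sqlite3.IntegrityError as e:
    print("domain violation:", e)
```

The second insertion fails the CHECK constraint, which is how a relational DBMS enforces a domain constraint at data-entry time.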

Que 8. Explain the following database operations with one query example for each: a) Insert b) Delete c) Update

Ans:-

The Insert Operation

The Insert operation provides a list of attribute values for a new tuple t that is to be inserted into a relation R. Insert can violate any of the four types of constraints discussed in the previous section. Domain constraints can be violated if an attribute value is given that does not appear in the corresponding domain. Key constraints can be violated if a key value in the new tuple t already exists in another tuple in the relation r(R). Entity integrity can be violated if the primary key of the new tuple t is null. Referential integrity can be violated if the value of any foreign key in t refers to a tuple that does not exist in the referenced relation. Here are some examples to illustrate this discussion.

1. Insert <'Raju', 'V', 'Kanaka', null, '05-04-1960', '621, Dabagardens, Visakhapatnam, AP', 'M', 28000, null, 4> into EMPLOYEE.

· This insertion violates the entity integrity constraint (null for the primary key ENO), so it is rejected.

2. Insert <'Valli', 'V', 'Anjana', '999887777', '05-04-1960', '6357 Bhanu Street, Hyderabad, AP', 'F', 28000, '987654321', 4> into EMPLOYEE.

· This insertion violates the key constraint because another tuple with the same ENO value already exists in the employee relation, and so it is rejected.

If an insertion violates one or more constraints, the default option is to reject the insertion. In this case, it would be useful if the DBMS could explain to the user why the insertion was rejected. Another option is to attempt to correct the reason for rejecting the insertion, but this is typically not used for violations caused by Insert; rather, it is used more often in correcting violations for Delete and Update. In operation 1 above, the DBMS could ask the user to provide a value for ENO and could accept the insertion if a valid ENO value were provided. For an insertion that violates referential integrity (say, a new employee tuple whose DNO value 7 refers to a non-existent department), the DBMS could either ask the user to change the value of DNO to some valid value (or set it to null), or it could ask the user to insert a DEPARTMENT tuple with DNUMBER = 7 and accept the original insertion only after such an operation was accepted. Notice that in the latter case the insertion violation can cascade back to the EMPLOYEE relation if the user attempts to insert a tuple for department 7 with a value for MGRENO that does not exist in the EMPLOYEE relation.
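The two rejected insertions above can be reproduced with a small sqlite3 sketch; the schema is a pared-down assumption (only ENO, Name, and Salary), not the full EMPLOYEE relation of the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        ENO    TEXT PRIMARY KEY NOT NULL,  -- NOT NULL: entity integrity
        Name   TEXT,
        Salary INTEGER
    )
""")
conn.execute("INSERT INTO employee VALUES ('999887777', 'Anjana', 28000)")

# 1. Entity integrity: a null primary key is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (NULL, 'Raju', 28000)")
except sqlite3.IntegrityError as e:
    print("entity integrity:", e)

# 2. Key constraint: a duplicate ENO is rejected.
try:
    conn.execute("INSERT INTO employee VALUES ('999887777', 'Valli', 28000)")
except sqlite3.IntegrityError as e:
    print("key constraint:", e)
```

Both offending insertions raise IntegrityError and leave the relation unchanged, which is the default "reject" option discussed above. (The explicit NOT NULL is needed because sqlite3, unlike standard SQL, otherwise permits NULL in a non-INTEGER primary key.)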

The Delete Operation

The Delete operation can violate only referential integrity, if the tuple being deleted is referenced by the foreign keys from other tuples in the database. To specify a deletion, a condition on the attributes of the relation selects the tuple (or tuples) to be deleted. Here are some examples.

1. Delete the WORKS_ON tuple with ENO = '999887777' and PNO = 10.

· This deletion is acceptable.

2. Delete the EMPLOYEE tuple with ENO = '999887777'.

· This deletion is not acceptable, because tuples in WORKS_ON refer to this tuple. Hence, if the tuple is deleted, referential integrity violations will result.

3. Delete the EMPLOYEE tuple with ENO = '333445555'.

· This deletion will result in even worse referential integrity violations, because the tuple involved is referenced by tuples from the EMPLOYEE, DEPARTMENT, WORKS_ON, and DEPENDENT relations.

Several options are available if a deletion operation causes a violation. The first option is to reject the deletion. The second option is to attempt to cascade (or propagate) the deletion by deleting tuples that reference the tuple that is being deleted. For example, in operation 2, the DBMS could automatically delete the offending tuples from WORKS_ON with ENO = '999887777'. A third option is to modify the referencing attribute values that cause the violation; each such value is either set to null or changed to reference another valid tuple. Notice that if a referencing attribute that causes a violation is part of the primary key, it cannot be set to null; otherwise, it would violate entity integrity.

Combinations of these three options are also possible. For example, to avoid having operation 3 cause a violation, the DBMS may automatically delete all tuples from WORKS_ON and DEPENDENT with ENO = '333445555'. Tuples in EMPLOYEE with SUPERENO = '333445555' and the tuple in DEPARTMENT with MGRENO = '333445555' can have their SUPERENO and MGRENO values changed to other valid values or to null.

Although it may make sense to delete automatically the WORKS_ON and DEPENDENT tuples that refer to an employee tuple, it may not make sense to delete other employee tuples or a DEPARTMENT tuple.

In general, when a referential integrity constraint is specified in the DDL, the DBMS will allow the user to specify which of the options applies in case of a violation of the constraint.
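As a sketch of the cascade option, the following sqlite3 fragment declares WORKS_ON with ON DELETE CASCADE; the column lists are trimmed to the essentials, and note that sqlite3 enforces foreign keys only after an explicit PRAGMA:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite3 checks FKs only when enabled
conn.execute("CREATE TABLE employee (ENO TEXT PRIMARY KEY NOT NULL)")
conn.execute("""
    CREATE TABLE works_on (
        ENO TEXT REFERENCES employee(ENO) ON DELETE CASCADE,
        PNO INTEGER,
        PRIMARY KEY (ENO, PNO)
    )
""")
conn.execute("INSERT INTO employee VALUES ('999887777')")
conn.execute("INSERT INTO works_on VALUES ('999887777', 10)")

# Deleting the employee cascades to the referencing WORKS_ON tuple
# instead of being rejected.
conn.execute("DELETE FROM employee WHERE ENO = '999887777'")
remaining = conn.execute("SELECT COUNT(*) FROM works_on").fetchone()[0]
print(remaining)  # 0
```

Replacing ON DELETE CASCADE with ON DELETE SET NULL (or omitting it, which defaults to rejecting the delete) selects the other options described above.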

The Update Operation

The Update (or Modify) operation is used to change the values of one or more attributes in a tuple (or tuples) of some relation R. It is necessary to specify a condition on the attributes of the relation to select the tuple (or tuples) to be modified. Here are some examples.

1. Update the SALARY of the EMPLOYEE tuple with ENO = '999887777' to 28000.

· Acceptable.

2. Update the DNO of the EMPLOYEE tuple with ENO = '999887777' to 1.

· Acceptable.

3. Update the DNO of the EMPLOYEE tuple with ENO = '999887777' to 7.

· Unacceptable, because it violates referential integrity.

4. Update the ENO of the EMPLOYEE tuple with ENO = '999887777' to '987654321'.

· Unacceptable, because it violates primary key and referential integrity constraints.

Updating an attribute that is neither a primary key nor a foreign key usually causes no problems; the DBMS need only check to confirm that the new value is of the correct data type and domain. Modifying a primary key value is similar to deleting one tuple and inserting another in its place, because we use the primary key to identify tuples. If a foreign key attribute is modified, the DBMS must make sure that the new value refers to an existing tuple in the referenced relation (or is null). Similar options exist to deal with referential integrity violations caused by Update as those discussed for the Delete operation. In fact, when a referential integrity constraint is specified in the DDL, the DBMS will allow the user to choose separate options to deal with a violation caused by Delete and a violation caused by Update.
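The update cases above can be sketched the same way: the change of DNO to existing department 1 succeeds, while the change to non-existent department 7 is rejected. The schema is trimmed to the relevant columns as an illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK checking in sqlite3
conn.execute("CREATE TABLE department (DNUMBER INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE employee (
        ENO TEXT PRIMARY KEY NOT NULL,
        DNO INTEGER REFERENCES department(DNUMBER)
    )
""")
conn.execute("INSERT INTO department VALUES (1)")
conn.execute("INSERT INTO department VALUES (4)")
conn.execute("INSERT INTO employee VALUES ('999887777', 4)")

# Acceptable: department 1 exists in the referenced relation.
conn.execute("UPDATE employee SET DNO = 1 WHERE ENO = '999887777'")

# Unacceptable: department 7 does not exist, so referential integrity
# rejects the update and the tuple keeps DNO = 1.
try:
    conn.execute("UPDATE employee SET DNO = 7 WHERE ENO = '999887777'")
except sqlite3.IntegrityError as e:
    print("referential integrity:", e)
```

After both statements the employee still belongs to department 1, showing that a rejected Update leaves the database state untouched.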
