rdbms

1.1. Introduction - Approach to Data Management- from files to Database

A file is a collection of records or documents dealing with an organization, person, area or subject. In the computer world, a file is a self-contained piece of information available to the operating system or to any number of individual programs. A file system is the collection of files in a computer, placed logically for storage and retrieval.

File-Based System

Advantages of File Based system

File-based systems were the first method used to store data in computers. The data was stored on and retrieved sequentially from the disk.

File based systems are an early attempt to computerise the manual filing system (organizing

the files with the help of papers). For example, a manual file can be set up to hold all the details

relating to a particular matter as a project, product, task, client or employee. In an organisation

there could be such files which may be labeled and stored.

The manual filing system works well when the number of items to be stored is small. It even works quite well when the number of items is large, as long as they only need to be stored and retrieved. However, a manual filing system breaks down when files must be cross-referenced or when the information in them must be processed.

Drawbacks of File Based system

There are certain drawbacks in using the file-based approach, which are discussed in detail using the example below.

Example:

In a university, a large number of students are enrolled and they have the option of choosing various courses. The personal details of students, the fees paid by them, the number and details of the courses available, and the number and details of faculty members in various departments are kept in separate files in the university system. Consider the effort needed to answer the following queries.

Figure 1.0 : File Based System

Annual fees paid by the students of Computer science department.

Number of students requiring transportation facility from a particular area.

This year's revenue from students as compared to last year.

Number of students choosing different courses in different departments.

As shown in Figure 1.0, in a file-based system different programs in the same application may be interacting with different private data files.

Thus the drawbacks of the file system are:

a. Data Redundancy and Inconsistency

Since data resides in different private data files, the file system leads to uncontrolled duplication of data. This duplication wastes storage space, and entering the same data more than once costs time and money. For example, a student's address may have to be duplicated in the transport details data file (Figure 1.0). Duplicated data can also become inconsistent when several people modify the copies independently. For example, if a student changes residence and the change is recorded only in the student's own file and not in the bus list, the two files disagree. Entering wrong data is another cause of inconsistency.
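The address example can be sketched in a few lines of Python. This is only an illustration: the two "files" are modelled as in-memory dictionaries, and every name and value in it (`student_file`, `transport_file`, the student ID and the addresses) is hypothetical.

```python
# Two private "files", modelled here as in-memory dictionaries.
# All names and values are hypothetical illustrations.
student_file = {"S001": {"name": "Asha", "address": "12 Lake Road"}}
transport_file = {"S001": {"bus_route": "R7", "address": "12 Lake Road"}}


def change_address_in_student_file(student_id, new_address):
    """Update only the student file, as a single-purpose program would."""
    student_file[student_id]["address"] = new_address


def find_inconsistent_addresses():
    """Return the IDs whose address differs between the two files."""
    return [
        student_id
        for student_id, record in student_file.items()
        if student_id in transport_file
        and transport_file[student_id]["address"] != record["address"]
    ]


change_address_in_student_file("S001", "48 Hill Street")
mismatches = find_inconsistent_addresses()
print(mismatches)
```

Because the address is stored twice, the program that updates one copy has no way of knowing that a second copy exists, and the mismatch is only found if someone writes an extra program to look for it.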

b. Unanticipated Queries

Handling sudden (ad hoc) queries can be difficult in a file-based system because it requires changes to the existing programs. For example, finding the number of faculty members (Faculty Details data file in Figure 1.0) who used the transport facility (Transport Details data file in Figure 1.0) this year may be difficult, because the details are stored in separate files.

c. Data Isolation

Though data used by different programs in the application may be related, they reside in isolated

data files. For example, there is a relationship between the course details (Course Details data file)

and student details (Student Personal Details data File), but they stand isolated as shown in Figure

1.0.

d. Concurrent Access Anomalies

In large multi-user systems the same file or record may need to be accessed by multiple users simultaneously. Handling this in a file-based system is difficult. For example, the transport details may be accessed by both faculty and students simultaneously (Figure 1.0).

e. Security Issues

In data-intensive applications, security of data is a major concern. Users should be given access

only to required data and not the whole database. The data in a file-based system can be made

secure only by additional programming in each application. For example, the student marks should

be accessed only by the faculty members and not by the fellow students (Figure 1.0).

f. Integrity Issues

In any application, there are some data integrity rules that need to be maintained. These could be in

the form of certain conditions/constraints on the elements of the data records. In a file-based system,

all these rules need to be explicitly programmed in the application program. For example, the salary structure of each faculty member depends on his/her designation as well as on other conditions, all of which need to be explicitly programmed (Figure 1.0).

g. Recovery Issues

The file system must deal with system failures and with loss of connection to remote systems. After an operating system failure or a "soft" power failure, special routines in the file system must be invoked, much as they are after an individual program failure. The file system must also repair structures damaged by a power failure, a reset, or an operating system failure that the OS was not able to report to it. Finally, the file system must record events so that systemic issues, as well as problems with specific files or directories, can be analysed. For example, if the system fails during the student admission process and the data cannot be recovered, the admissions already entered may be lost (Figure 1.0).

Note that issues such as concurrent access, integrity and security can be handled in a file-based system. However, although these are general concerns for any data-intensive application (an application that processes a huge volume of data), each application has to handle them on its own. The application programmer must therefore worry not only about implementing the business process but also about handling these general issues.

Database Approach

To overcome the limitations of the file-based approach, the concept of the database and the Database Management System (DBMS) was introduced in the 1960s.

The database is the stored collection of data, and the Database Management System (DBMS) is the software that works on it. The user interacts with the database through the DBMS. The purpose of a DBMS is to provide an efficient way of storing and retrieving data for both single-user and multi-user systems.

Database Management largely involves:

Storage of Data

Manipulation of the data

Access restriction for unauthorized users

We shall discuss this in detail in the next section.


1.2. Types of Databases

The following are the types of databases:

Hierarchical Database Management System (HDBMS)

Network Database Management System (NDBMS)

Relational Database Management System (RDBMS)

Hierarchical Database Management System(HDBMS)

Figure 1.1 : Hierarchical Database Management System

A hierarchical database is organized in pyramid fashion, like the branches of a tree extending downwards.

Related fields or records are grouped together so that there are higher-level records and lower-level records.

The root record is the parent record at the top of the pyramid (i.e., the segment that has no parent record).

A "leaf" is a segment with no children. In Figure 1.2 the leaf segments are at the bottom.

A child record maps to only one parent record, to which it is linked. In contrast, a parent record may have one or more child records linked to it.

A hierarchical database is navigated from top to bottom.

A record search is performed by starting at the top of the pyramid and working down through the tree from parent to child until the correct child record is found.

Example of HDBMS

Figure 1.2 : Animals Hierarchy

In Figure 1.2, Animals is the root record, and Fish and Mammals are the child records of Animals. Fresh Water and Marine sit one level further down, and Halibut, Rainbow Trout, Dog and Cat are the leaves.
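The top-down search described above can be sketched with a small tree in Python. The figure is not reproduced here, so the exact placement of the nodes is an assumption (Fresh Water and Marine are assumed to be children of Fish, with Rainbow Trout and Halibut beneath them); `Segment`, `find_path` and `leaf_names` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One record in the hierarchy; it has exactly one parent (or none)."""
    name: str
    children: List["Segment"] = field(default_factory=list)


# Assumed layout of the Animals hierarchy (the figure is not reproduced).
animals = Segment("Animals", [
    Segment("Fish", [
        Segment("Fresh Water", [Segment("Rainbow Trout")]),
        Segment("Marine", [Segment("Halibut")]),
    ]),
    Segment("Mammals", [Segment("Dog"), Segment("Cat")]),
])


def find_path(node: Segment, target: str) -> Optional[List[str]]:
    """Search from the root downwards and return the path to the target."""
    if node.name == target:
        return [node.name]
    for child in node.children:
        path = find_path(child, target)
        if path is not None:
            return [node.name] + path
    return None


def leaf_names(node: Segment) -> List[str]:
    """Collect the names of all segments that have no children."""
    if not node.children:
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names


print(find_path(animals, "Halibut"))
print(leaf_names(animals))
```

Every lookup starts at the root, which is why access along the predefined parent-child links is fast but any relationship that does not follow those links cannot be expressed.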

Advantages

This database type can be accessed and updated rapidly because of the tree-like

structure and the relationships between records are defined in advance.

Disadvantages

Each child in a tree may have only one parent and relationships or linkages between the

children are not permitted, even if they seem right from a logical standpoint.

Hierarchical databases are so rigid in their design that adding a new field or record requires the entire database to be redefined.

Network Database Management System(NDBMS)

Figure 1.3 : Network Database Management System

A network database resembles a cobweb or an interconnected network of records.

Parents are termed 'owners' and children are termed 'members'.

In a network database, each child or member can have one or more parents (or owners).

Example of NDBMS

In Figure 1.4, the Store node is the parent or owner. The Customer, Manager and Salesman nodes are the children or members of the Store node. They in turn act as parents or owners of the Order node, so that Store, Customer, Manager and Salesman are interconnected. The Items node has a single parent or owner, the Salesman node.

Figure 1.4 : Network representation in a Store Management
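The owner/member links of the store example can be written down as a simple mapping. This is a minimal sketch that follows the description of Figure 1.4 in the text; the dictionary layout and the helper `owners_of` are hypothetical and are not how a real network DBMS stores its sets.

```python
# owner -> members, following the textual description of the store example
owner_to_members = {
    "Store": ["Customer", "Manager", "Salesman"],
    "Customer": ["Order"],
    "Manager": ["Order"],
    "Salesman": ["Order", "Items"],
}


def owners_of(member):
    """Return every owner (parent) that a member (child) is linked to."""
    return sorted(
        owner
        for owner, members in owner_to_members.items()
        if member in members
    )


print(owners_of("Order"))   # a member with several owners
print(owners_of("Items"))   # a member with a single owner
```

Unlike the hierarchical model, the Order node here has more than one owner, which is the defining difference between the two models.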

Advantages

Many connections can be made between the different types of data, so network databases work more flexibly than hierarchical ones.

Disadvantages

Network databases must be described in advance.

There is a limit to the number of connections which can be made between records.

Relational Database Management System(RDBMS)

Relational databases connect the information (data) in different files by using common data elements, or key fields.

Data is stored in different tables, each having a key field that uniquely identifies each row.

These key fields can also be used to connect one table of data to another.

In relational databases, a relation is a table or file filled with data, rows or records are termed tuples, and columns are termed attributes or fields.

Figure 1.5 : Relational Database Management System

Figure 1.5 shows the relationship between the Student table and the Course table. The primary key of the Student table (Student ID) and the primary key of the Course table (Course ID) are both referenced in the StudentCourse table (refer to Section 2.2 for more details about primary keys). This table maps students to courses and is used to produce reports such as the number of students taking a particular course.
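The three-table arrangement can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table names follow the text, but the column names other than the two IDs, and all of the sample rows, are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Course  (CourseID  INTEGER PRIMARY KEY, Title TEXT NOT NULL);

-- Mapping table: each row links one student to one course.
CREATE TABLE StudentCourse (
    StudentID INTEGER NOT NULL REFERENCES Student(StudentID),
    CourseID  INTEGER NOT NULL REFERENCES Course(CourseID),
    PRIMARY KEY (StudentID, CourseID)
);

INSERT INTO Student VALUES (1, 'Scott'), (2, 'Nancy'), (3, 'Lindsey');
INSERT INTO Course  VALUES (10, 'Databases'), (20, 'Networks');
INSERT INTO StudentCourse VALUES (1, 10), (2, 10), (3, 20);
""")

# Report: number of students taking each course.
students_per_course = conn.execute("""
    SELECT c.Title, COUNT(sc.StudentID)
    FROM Course AS c
    LEFT JOIN StudentCourse AS sc ON sc.CourseID = c.CourseID
    GROUP BY c.CourseID, c.Title
    ORDER BY c.CourseID
""").fetchall()
print(students_per_course)
```

The report is produced by joining the tables on their key columns; no student or course detail is stored more than once.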


Example of RDBMS

Figure 1.6 : Tables in Relational Model

Figure 1.6 illustrates an RDBMS. It has three tables (Database1, Database2 and Database3). All three tables have a column called Social Security No. (SSN*), which is the key field that uniquely identifies each row and links the three tables. This represents the relationship between the tables.

Advantage

The database server and application tools can be easily installed and upgraded.

RDBMS permits multiple database users to access a database simultaneously.

Authorization and privilege control features in an RDBMS permit the DBA (Database

Administrator) to restrict the access to authorized users and grant privileges to individual users

derived from the types of database tasks they need to perform.

RDBMSs support a generic language called "Structured Query Language" (SQL). The SQL

syntax is simple and the language uses standard English language keywords and phrasing,

making it fairly intuitive and easy to learn.

An RDBMS supports data independence.

An RDBMS is simple and flexible.

An RDBMS reduces redundancy of information.

Disadvantage

Searching for data can take extra time compared to the other database types.















1.4. Functions of Database

The following are the functions which are performed by a typical DBMS:

a. Data Definition

The DBMS provides the functions to describe the structure of the data in the application. These

include defining and modifying the record structure, the type and size of fields and the various

constraints/conditions to be satisfied by the data in each field.

Figure 1.9 : Example of Data Definition

Example:

In Figure 1.9, attributes such as Name, Email Address, ..., State are the defined columns of the Student table. Each column has a specific length, type and set of rules. Here the attributes (i.e., columns) represent the data elements, and Length, Type and Rules represent the data definitions.
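A data definition of this kind can be sketched with Python's `sqlite3` module. The figure itself is not reproduced here, so the lengths, types and rules below are assumptions chosen only to show the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: column names, declared types/lengths and rules.
# The specific lengths and rules are assumed for illustration.
conn.execute("""
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,
        Name      VARCHAR(50)  NOT NULL,
        Email     VARCHAR(100) UNIQUE,
        State     VARCHAR(30)
    )
""")

# Read the definition back: (column name, declared type, NOT NULL flag).
columns = [
    (row[1], row[2], bool(row[3]))
    for row in conn.execute("PRAGMA table_info(Student)")
]
for column in columns:
    print(column)
```

Note that SQLite records the declared length (for example `VARCHAR(50)`) but does not enforce it; many other relational systems do.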

b. Data Manipulation

The DBMS must be able to handle requests from users to retrieve, update and delete existing data in the database, and to add new data to it. The DBMS performs these operations on the database.
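The four operations (add, retrieve, update, delete) look as follows in a minimal `sqlite3` sketch. The table layout and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, City TEXT)"
)

# Add new data.
conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (1, "Scott", "Chennai"))
conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (2, "Nancy", "Mumbai"))

# Update existing data.
conn.execute("UPDATE Student SET City = ? WHERE StudentID = ?", ("Pune", 1))

# Delete existing data.
conn.execute("DELETE FROM Student WHERE StudentID = ?", (2,))

# Retrieve what is left.
remaining = conn.execute(
    "SELECT StudentID, Name, City FROM Student ORDER BY StudentID"
).fetchall()
print(remaining)
```

The application only states what should happen to the data; how the rows are located and rewritten on disk is left to the DBMS.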

c. Data Dictionary Management

The data dictionary stores the definitions of data elements and their relationships. This information is called metadata. The metadata contains the definition of the data, data types, integrity constraints, relationships between data elements, etc. Any change made to the database structure is automatically reflected in the data dictionary. In this way the DBMS provides data abstraction and removes structural and data dependency from the system.
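As an illustration, SQLite keeps its metadata in a built-in catalog table called `sqlite_master`, which plays the role of a data dictionary. The sketch below (the table and the helper `table_definition` are hypothetical) shows that a structural change appears in the catalog without any extra work by the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
)


def table_definition(table_name):
    """Look up a table's stored definition in SQLite's catalog."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row[0] if row else None


definition_before = table_definition("Student")

# Change the structure; the catalog entry is updated by the DBMS itself.
conn.execute("ALTER TABLE Student ADD COLUMN Email TEXT")
definition_after = table_definition("Student")

print(definition_before)
print(definition_after)
```

Other database systems expose the same kind of information under different names, so the catalog query shown here is specific to SQLite.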

d. Data Security & Integrity

Data integrity is an important component of information security. It refers to the consistency and accuracy of the data stored in a database. Data integrity ensures that the data entered into the database is accurate, valid and consistent. The DBMS includes functions that manage the integrity and security of data in the application. These can be easily invoked by the application, so the application programmer need not code them in his/her programs.
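A short `sqlite3` sketch of an integrity rule declared once in the database instead of being coded in every program. The Faculty table and the rule that a salary must be positive are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The integrity rule is part of the table definition.
conn.execute("""
    CREATE TABLE Faculty (
        FacultyID   INTEGER PRIMARY KEY,
        Designation TEXT NOT NULL,
        Salary      REAL NOT NULL CHECK (Salary > 0)
    )
""")

conn.execute("INSERT INTO Faculty VALUES (1, 'Professor', 90000)")

try:
    # Violates the CHECK rule, so the DBMS refuses the row.
    conn.execute("INSERT INTO Faculty VALUES (2, 'Lecturer', -500)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Faculty").fetchone()[0]
print(rejected, row_count)
```

Any program that writes to this table is subject to the same rule, which is the contrast with the file-based approach described earlier.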

e. Data Concurrency & Consistency

The DBMS makes sure that multiple users can access the database concurrently without compromising the integrity of the database. In a single-user database, the user can modify data without being concerned that other users might modify the same data at the same time. In a multi-user database, however, several simultaneous transactions can update the same data at the same time. Transactions executing at the same time should produce meaningful and consistent results. Hence, the control of data concurrency and data consistency is vital in a multi-user database.

Data concurrency means that many users can access data at the same time.

Data consistency means that each user sees a consistent view of the data, including visible changes made by the user's own transactions and by the transactions of other users.
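The idea of a consistent view can be sketched with two connections to the same SQLite database file, standing in for two users. The file name, table and values are hypothetical, and the behaviour shown (a reader does not see another user's uncommitted change) is how SQLite behaves with its default settings; other systems offer several isolation levels.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "university.db")
    writer = sqlite3.connect(path)   # first user
    reader = sqlite3.connect(path)   # second user

    writer.execute(
        "CREATE TABLE Transport (StudentID INTEGER PRIMARY KEY, Route TEXT)"
    )
    writer.commit()

    # The INSERT starts a transaction that is not yet committed.
    writer.execute("INSERT INTO Transport VALUES (1, 'R7')")
    seen_before_commit = reader.execute(
        "SELECT COUNT(*) FROM Transport"
    ).fetchall()[0][0]

    # Once the writer commits, the change becomes visible to the reader.
    writer.commit()
    seen_after_commit = reader.execute(
        "SELECT COUNT(*) FROM Transport"
    ).fetchall()[0][0]

    writer.close()
    reader.close()

print(seen_before_commit, seen_after_commit)
```

The second user never sees a half-finished change: the row is either not there at all or fully committed.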

f. Data Backup & Recovery

The DBMS provides backup and data recovery procedures to ensure data safety and integrity. It provides special utilities which allow the DBA to perform routine and special backup and restore procedures. Recovery management handles the recovery of the database after a failure.
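As a small illustration, Python's `sqlite3` module exposes SQLite's online backup facility through `Connection.backup` (available from Python 3.7). The sketch below copies a database, simulates a data loss, and restores from the copy; the Admission table and its contents are hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE Admission (StudentID INTEGER PRIMARY KEY, Name TEXT)")
source.execute("INSERT INTO Admission VALUES (1, 'Scott')")
source.commit()

# Routine backup: copy the whole database into a second database.
backup_copy = sqlite3.connect(":memory:")
source.backup(backup_copy)

# Simulate a failure that wipes the live data.
source.execute("DELETE FROM Admission")
source.commit()
rows_after_failure = source.execute("SELECT COUNT(*) FROM Admission").fetchone()[0]

# Restore: copy the backup over the damaged database.
backup_copy.backup(source)
restored = source.execute("SELECT StudentID, Name FROM Admission").fetchall()
print(rows_after_failure, restored)
```

In a production system the backup would be written to separate storage rather than to another in-memory database.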

g. Performance

Optimizing the performance of queries is one of the important functions of a DBMS. The DBMS has a set of programs that form the Query Optimizer, which evaluates the different ways of executing a query and selects the best among them.
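The optimizer's choice can be observed in SQLite with `EXPLAIN QUERY PLAN`. In this sketch the table, the index name `idx_student_city` and the helper `plan` are hypothetical, and the exact wording of the plan text differs between SQLite versions, so only the presence of the index name is relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, City TEXT)"
)
query = "SELECT Name FROM Student WHERE City = ?"


def plan(sql, params):
    """Return the optimizer's chosen plan as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the description


plan_before = plan(query, ("Chennai",))   # no index yet: the table is scanned

conn.execute("CREATE INDEX idx_student_city ON Student (City)")
plan_after = plan(query, ("Chennai",))    # the optimizer now picks the index

print(plan_before)
print(plan_after)
```

The query text is identical in both cases; only the access path chosen by the optimizer changes.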

Thus the DBMS provides a convenient and effective environment when a large volume of data and many transactions are to be processed.


2.1. Entity Relationship Modeling

Introduction

The Entity-Relationship model (or ER model) is a way of graphically representing the logical

relationships of objects in order to create a database. Creation of an ER diagram is the first step in

designing a database. It helps the designer(s) to understand and to specify the desired components

of the database and the relationships among those components. An ER model is a graphical

representation which contains entities or "items", relationships among the entities and attributes of

the entities and relationships.

The following are the three basic elements in the ER model.

Entities: any objects or items

Attributes: the properties of an entity

Relationships: the links between various entities

Let us take a University database as an example and see how an ER model is arrived at.

Example:

A university consists of a number of departments. Each department offers several courses. Each

course includes a number of modules. Students enroll in a particular course and study modules

towards the completion of that course. Each module is taught by a lecturer from the appropriate

department, and each lecturer teaches a group of students.

Entities

Entities are real world items or concepts that exist on their own and are represented as objects or

things of interest. An entity type is a collection of entities that share a common definition.

Identify all the nouns in the university example above.

This scenario consists of students, lecturers, modules, courses and departments. Physical things (things that exist in the world and that we can touch and feel), such as students and lecturers, and abstract things (ideas or concepts that cannot be physically touched, smelled, heard, tasted or seen), such as modules and departments, each make an entity type. If we take Student as an entity type, then each student in the university is an entity. Entities appear as nouns in the description because they are objects or things.

An entity type is simply an idea, whereas an entity is a particular instance of it. Student is an entity type for physical things, while Scott, Nancy, Lindsey and Mackenzie are entities. Department is an entity type for abstract things, while IT, CSE, ECE and CIVIL are entities.

Entity Diagrams

In an E-R Diagram, an entity is usually drawn as a rectangle.

The box is labeled with the name of the entity type. The entities identified in our example

are shown in Figure 2.1.

Figure 2.1 : Entities

Weak Entity

An entity that depends on the existence of another entity is considered weak. A weak entity cannot be identified by its own attributes alone. In an E-R diagram a weak entity is represented by a double rectangle.

Example:

SubModule is a good example of a weak entity. A SubModule is meaningless without a Module entity, so it depends on the existence of Module, as shown in Figure 2.2.

Figure 2.2 : Weak Entity

Attributes

Attributes represent properties, facts, aspects or details of an entity. There are attributes or particular

properties that describe each entity.

In our University database each student will have a Student ID, Name, Course taken, etc. Similarly, each lecturer will have his/her own properties such as ID, Name and Department.

Each attribute has a name and an associated entity, and describes a property of that entity. Attributes, too, are often nouns.

Attributes in ER diagram

In an E/R Diagram attributes are represented by an oval.

A line is used to link an attribute to its entity.

The figure below represents the entities and their corresponding attributes in the University

database.

Figure 2.3 : Entities and Attributes

Multivalued Attribute

A multivalued attribute is an attribute that can hold more than one value. For instance, if phone number and degree are attributes of an entity called Person, those attributes could have multiple values, since a person could have several phone numbers or hold several degrees. A multivalued attribute is represented by a double oval in an E-R diagram.

A single-valued attribute holds a single value. In our example, Student attributes such as Roll Number, Age, Date of Birth and City can have only a single value.

A Student can have multiple phone numbers, so Phone Number is a multivalued attribute.

Figure 2.4 : Multivalued Attributes

Relationships

The association between two or more entities is called a relationship. In our University database, each Student studies several Modules and each Lecturer teaches several Students. Here the entity types Student and Module, and Lecturer and Student, have a relationship. Verbs most often describe the relationships between entities.

Identify the verbs (relationships) in the University database example above, such as offers, includes, enroll, study, is taught by and teaches.

Each relationship has a name, a set of entities that participate in it, a degree and a cardinality ratio. The degree is the number of entity types that participate in the relationship. Most relationships have degree 2. For example, in Figure 2.3 each Lecturer teaches several Students; two entity types take part in this relationship, so its degree is 2.

Relationships in an ER diagram

Relationships are drawn as links between two entities.

The name of the relationship is given in a diamond box (for example, Belongs to, as shown in Figure 5.1).

Cardinality Ratio

A relationship between two entities can have one of three cardinality ratios, as shown:

One to One (1:1)

Each student belongs to one University. We can illustrate this ratio by writing ones on the

lines indicating the relationship as shown in Figure 2.5.

Figure 2.5 : One-one Mapping

The notation for the 1:1 relationship is shown in Figure 2.6.

Figure 2.6 : One-one Mapping

One to Many (1:M)

A lecturer teaches many students, and this One to Many relationship is illustrated in figure

2.7.

Figure 2.7 : One-Many

The notation for the 1:M relationship is shown in Figure 2.8.

Figure 2.8 : One-Many

Many to Many (M:M)

Each student takes many modules, and each module is taken by many students as shown

in figure 2.9.

Figure 2.9 : Many-Many

Making E/R Models

So far we have seen how to identify the basic elements in an ER diagram. To make an E/R model you need to identify:

Entities

Attributes

Relationships

Cardinality ratios

Now let's see what an ER model looks like when all these elements are put together. The final ER model of our University database is shown in Figure 2.10. It is an integrated ER model containing the entities Department, Course, Module, Lecturer and Student, and the relationships between them.

The relationships in Figure 2.10 are defined as follows. A Department offers many Courses (one to many). A Department assigns many Lecturers (one to many). Each Lecturer teaches many Students (one to many). Every Student takes several Modules (many to many). Modules and Courses are linked by Includes (many to many). A Course is taken by many Students (one to many).

Figure 2.10 : University ER Model
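The relationships of the University model can also be recorded as plain data, which makes the degree and cardinality ratio of each one explicit. This is only a sketch: the `Relationship` class and its field names are hypothetical, and the cardinalities are copied from the description of the model in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Relationship:
    name: str
    participants: Tuple[str, ...]   # entity types taking part
    cardinality: str                # "1:1", "1:M" or "M:M"

    @property
    def degree(self) -> int:
        """Number of entity types that participate in the relationship."""
        return len(self.participants)


relationships = [
    Relationship("offers", ("Department", "Course"), "1:M"),
    Relationship("assigns", ("Department", "Lecturer"), "1:M"),
    Relationship("teaches", ("Lecturer", "Student"), "1:M"),
    Relationship("takes", ("Student", "Module"), "M:M"),
    Relationship("includes", ("Module", "Course"), "M:M"),
    Relationship("enrolled by", ("Course", "Student"), "1:M"),
]

# The entity types are exactly the names that appear as participants.
entities = sorted({name for r in relationships for name in r.participants})
many_to_many = [r.name for r in relationships if r.cardinality == "M:M"]

print(entities)
print(many_to_many)
```

Listing the model this way is a quick check that every entity takes part in at least one relationship and that every relationship has a stated cardinality ratio.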

Summary

ER diagrams play a major role in database design.

The ER diagram acts as a non-technical communication tool.

It is used by both technical and non-technical users.

Entities represent real-world things; they can be conceptual, such as a transaction, or physical, such as a bank.

Figure 2.11 : ER Model Summary

For further reading you may refer to the websites below.

http://lynnbob.com/bob/articles/EntityRelationship1.htm

http://www.studytonight.com/dbms/er-diagram.php

2.2. Normalization - First Normal Form, Second Normal Form and Third Normal Form

Normalization is the database design technique used to organize tables in a manner that reduces redundancy and dependency of data. It is a systematic process of decomposing complex tables (relations) into smaller, easily manageable tables. The aim of normalization is to keep the stored data accurate and easy to access. Without normalization, database systems can be inaccurate, redundant, slow and inefficient, and they might not produce the data that is expected. Listed below are the advantages of normalization.

Advantages

Smaller, simpler and well-structured relations.

Avoids unnecessary duplication of data. That is, it helps to reduce redundancy.

Provides data integrity.

Helps to avoid update anomalies. That is, it isolates data so that additions, deletions, and

modifications of a field can be made in just one table. The changes are then propagated to the

rest of the database through the defined relationships.

Saves storage space.

Edgar Codd invented the relational model and proposed the theory of normalization with the introduction of First Normal Form. He continued to extend the theory with Second and Third Normal Forms. Later Codd worked with Raymond F. Boyce to develop Boyce-Codd Normal Form (BCNF).

The theory of normalization is still developing. For example, discussions on Sixth Normal Form are in progress. However, in most practical applications normalization achieves its best in Third Normal Form. The evolution of normalization theory is illustrated below:

Figure 2.12 : Normalization Evolution

Let's understand a few things before we proceed.

What is a KEY?

A KEY is a column, or a combination of columns, whose value uniquely identifies a row in a table.

Note: The columns in a table that are NOT used to uniquely identify a record or row are called non-key columns.

What is a primary key?

A primary key is the key chosen to uniquely identify each record in a table. In the simplest case it is a single column.

Figure 2.13 : Primary Key

The primary key column in a table must always have a value, including at the moment a new record is inserted.

The primary key column cannot have duplicate values; each primary key value must be unique.

The primary key value of a record should not be modified.

Example:

The table below contains the details of students. Here StudentId is the primary key, which uniquely identifies each student's details in the table.

Figure 2.14 : Primary Key Illustration

Composite Key

If two or more columns together are used to uniquely identify a record, the combination of those columns constitutes a composite key.

In the Student table given below, we have StudentId, TestId and Mark. One student can take multiple tests and one test can be taken by multiple students, so to uniquely identify the mark of a student in a test we need both StudentId and TestId. Together they form a composite key.

Student Table

Table 2.1
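A composite key can be declared directly in SQL. In this `sqlite3` sketch the column names follow the text, while the table name `StudentMark` and the sample marks are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE StudentMark (
        StudentId INTEGER NOT NULL,
        TestId    INTEGER NOT NULL,
        Mark      INTEGER,
        PRIMARY KEY (StudentId, TestId)   -- composite key
    )
""")

# The same student may take several tests, and the same test may be
# taken by several students; each (StudentId, TestId) pair is unique.
conn.executemany(
    "INSERT INTO StudentMark VALUES (?, ?, ?)",
    [(1, 101, 78), (1, 102, 85), (2, 101, 64)],
)

try:
    # A second mark for the same student in the same test is refused.
    conn.execute("INSERT INTO StudentMark VALUES (1, 101, 90)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

mark = conn.execute(
    "SELECT Mark FROM StudentMark WHERE StudentId = ? AND TestId = ?", (1, 102)
).fetchone()[0]
print(duplicate_rejected, mark)
```

Neither column is unique on its own, but the pair is, which is exactly what makes it a composite key.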

Functional Dependency

In simple terms, a functional dependency exists when knowing the value of one attribute is enough to determine the value of another. The second attribute is then said to be functionally dependent on the first. In the Student table given below, we can get the attribute 'Name' if we know the attribute 'StudentId', so Name is functionally dependent on StudentId. Here StudentId is the determinant and Name is the dependent.
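Whether a functional dependency is contradicted by a set of rows can be checked mechanically. The function `holds` and the sample rows below are hypothetical; note that sample data can only disprove a dependency, since the dependency itself is a rule about all possible data.

```python
def holds(rows, determinant, dependent):
    """Return True if no two rows share a determinant value
    while disagreeing on the dependent value."""
    seen = {}
    for row in rows:
        key = row[determinant]
        if key in seen and seen[key] != row[dependent]:
            return False
        seen[key] = row[dependent]
    return True


# Hypothetical sample rows.
students = [
    {"StudentId": 1, "Name": "Scott", "Language": "English"},
    {"StudentId": 1, "Name": "Scott", "Language": "Hindi"},
    {"StudentId": 2, "Name": "Nancy", "Language": "English"},
]

name_depends_on_id = holds(students, "StudentId", "Name")
language_depends_on_id = holds(students, "StudentId", "Language")
print(name_depends_on_id, language_depends_on_id)
```

Here StudentId determines Name, but it does not determine Language, because the same student appears with two different languages.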

For example, let's consider the Student table given below. Table 2.2 stores student details (StudentId, Name, Languages Known), the student's department details (Dept_No, Dept_Name) and lecturer details (LecturerInCharge, Designation).

In this approach, several languages are stored in the same field and the department details are repeated for every student. Such a table is called an unnormalized table. Instead of storing the same data again and again, we can normalize the data and create related tables.

Let's see how to normalize the Student table (which is not normalized), create related tables and learn the normal forms along the way:

Student Table (UnNormalized Table):

Table 2.2

First Normal Form

To move from unnormalized form to first normal form (1NF), all multi-valued attributes (called repeating groups) must be eliminated, so that every attribute is atomic.

Table 2.2 is not in 1NF since it has repeating groups (more than one value in a field). The column "Languages Known" holds (English, Hindi and Tamil) in row (tuple) 1 and (English and Hindi) in row (tuple) 2. To satisfy 1NF we can create a separate row for each value in Languages Known by duplicating the values in the remaining columns. Table 2.3 shows the result.

1NF Rules

Each column in a table should contain a single value.

Each record needs to be unique, as shown in Table 2.3.

Table 2.3 : 1NF Form
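The move to 1NF described above can be sketched in a few lines. The language lists match the text (English, Hindi, Tamil for the first row and English, Hindi for the second); the student ids, names and the function name are hypothetical.

```python
# Unnormalized rows: "LanguagesKnown" holds several values in one field.
unnormalized = [
    {"StudentId": 101, "Name": "Asha", "LanguagesKnown": ["English", "Hindi", "Tamil"]},
    {"StudentId": 102, "Name": "Ravi", "LanguagesKnown": ["English", "Hindi"]},
]


def to_first_normal_form(rows):
    """Emit one row per language, duplicating the remaining columns."""
    flat = []
    for row in rows:
        for language in row["LanguagesKnown"]:
            flat.append(
                {"StudentId": row["StudentId"], "Name": row["Name"], "Language": language}
            )
    return flat


first_nf = to_first_normal_form(unnormalized)
for row in first_nf:
    print(row)
```

Every field now holds a single value, at the cost of repeating StudentId and Name once per language.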

Second Normal Form

Partial functional dependencies must be removed. If two attributes of a table are combined to form a composite key, then the non-key attributes of that table must depend on both attributes of the composite key. They must not depend on just one of the attributes that make up the composite key.

2NF Rules

Rule 1- The table should be in 1NF.

Rule 2- No non-key column may depend on only part of the primary key. A table with a single-column primary key satisfies this automatically.

A relation in 1NF will be in second normal form (2NF) if there are no partial dependencies.

Partial dependency

It is the functional dependency on part of the primary key instead of the entire primary key.

We cannot bring our simple database to second normal form unless we partition the columns in Table 2.3. Here, assume that StudentId and Dept_No together act as the key (composite key). As per 2NF, all non-key attributes must depend on the whole key.

In Table 2.3 this is not the case. The attribute 'Dept_Name' can be obtained from Dept_No alone, and student details such as Name can be identified by providing just 'StudentId'. Each of these is a partial dependency on the composite key. So split the table as given below to satisfy 2NF, so that StudentId acts as the primary key for the student columns and Dept_No acts as the primary key for the department columns.

Student

Table 2.4

Department

Table 2.5

Languages

Table 2.6

Introducing Foreign Key

A foreign key is a field in a table that matches the primary key column of another table. Foreign keys are used to cross-reference tables.

In Table 2.7, Dept_No is the foreign key.

Table 2.7

Figure 2.15 : Foreign Key

A foreign key refers to the primary key of another table. It helps to connect the two tables.

A foreign key column can have a different name from the primary key column it refers to.

The foreign key ensures that a row in a table is mapped to a corresponding row in another table.

A foreign key does not have to be unique; most often it is not.

Foreign Key

Figure 2.16 : Foreign Key Illustration

Why do you need a foreign key?

A foreign key is required in an RDBMS to enforce referential integrity.

Referential integrity

Referential integrity is a concept used in databases to ensure that relationships between tables stay consistent. If one table has a foreign key to another table, referential integrity states that you cannot add a record to the table that contains the foreign key unless there is a corresponding record in the linked table.

For example, consider Figure 2.16 above, where Dept_No in the Student table is a foreign key referring to Dept_No in the Department table. Let's try to add a student with StudentId "103" and Dept_No "D003" to the Student table as shown below. There is no entry for Dept_No "D003" in the Department table, which means we would be adding a student to a department that does not exist. This leads to inconsistency of data across related tables. Hence an RDBMS enforces referential integrity, which does not allow a record to be added to the table that contains the foreign key unless there is a corresponding record in the table to which it is linked.

Student

Table 2.8

Department

Table 2.9
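The rejected insert can be reproduced with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in database (SQLite enforces foreign keys only after PRAGMA foreign_keys = ON); StudentId 103 and Dept_No D003 come from the text, while the other ids and names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
conn.execute("CREATE TABLE Department (Dept_No TEXT PRIMARY KEY, Dept_Name TEXT)")
conn.execute(
    """
    CREATE TABLE Student (
        StudentId INTEGER PRIMARY KEY,
        Name      TEXT,
        Dept_No   TEXT REFERENCES Department(Dept_No)
    )
    """
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)", [("D001", "CSE"), ("D002", "ECE")]
)
conn.execute("INSERT INTO Student VALUES (101, 'Asha', 'D001')")

# D003 does not exist in Department, so this insert breaks referential integrity.
try:
    conn.execute("INSERT INTO Student VALUES (103, 'Kiran', 'D003')")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("Insert rejected:", exc)

student_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
```

Only the student whose department exists is stored; the row for D003 never reaches the table.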

Transitive functional dependencies

When changing a non-key column might cause any of the other non-key columns to change, it is called a transitive functional dependency. Attributes that are not part of the key must not depend on any other non-key attribute.

Consider Table 2.9. Changing the non-key column 'Lecturer In Charge' may change 'Designation'. Here Dept_No acts as the key and all other columns are non-key attributes. As per 3NF, non-key attributes should not depend on other non-key attributes, but 'Designation' is dependent on 'Lecturer In Charge', and both are non-key attributes. This forms a transitive dependency. So, to satisfy 3NF, the table has to be split, as shown shortly.

Third Normal Form

Third normal form (3NF) is the third step in database normalization and it builds on the first (1NF) and second (2NF) normal forms.

3NF states that all column references in the referenced data that are not dependent on the primary key should be removed. Another way of putting this is that only foreign key columns should be used to reference another table, and the other columns from the parent table should not exist in the referencing table.

2NF covers the case of multi-column primary keys. 3NF is meant to cover single-column keys as well, by removing the transitive functional dependencies described above.

3NF Rules

Rule 1- The table should be in 2NF.

Rule 2- The table has no transitive functional dependencies, as explained above.

We need to divide our table to move it from second normal form (2NF) to third normal form (3NF). In the Department table (Table 2.9), Dept_No acts as the key and all other columns are non-key attributes. As per third normal form, non-key attributes should not depend on other non-key attributes. Here 'Designation' depends on 'Lecturer In Charge', and both are non-key attributes, which forms a transitive dependency. So, to satisfy 3NF, split the table as follows.

Student

Table 2.10

Department

Table 2.11

Lecturer

Table 2.12

Languages

Table 2.13

The example given above cannot be decomposed further to attain higher normal forms because it is already normalized to the highest level. Normally only complex databases need the next levels of normalization.

2.3. Joins

What are Joins?

A join is a technique where records from two or more tables are retrieved through a single SQL query and shown as a single output. As the result forms a set, it can be saved as a table or used as it is. A join combines columns from two tables by using values common to both tables, which allows us to combine data from more than one table into a single result set. A join condition can be used in the WHERE clause of SELECT, UPDATE and DELETE queries.

Note: If the join condition is omitted, the query returns the Cartesian product of the two tables (a Cartesian product is all possible combinations of rows from all the tables): each row of the first table is joined with every row of the second table. For example, if the first table has 30 rows and the second table has 10 rows, the result will have 30 * 10, or 300 rows. On large tables such a query can take a long time to execute.
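The 30 * 10 arithmetic in the note can be checked directly. This is a minimal sketch assuming Python's built-in sqlite3 module; the table names a and b and their single id column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(30)])  # 30 rows
conn.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(10)])  # 10 rows

# No join condition: every row of a is paired with every row of b.
cross_rows = conn.execute("SELECT COUNT(*) FROM a, b").fetchone()[0]

# With a join condition only the matching pairs remain.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM a, b WHERE a.id = b.id"
).fetchone()[0]

print(cross_rows, joined_rows)  # 300 10
```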

Let's use the two tables below to explain the join conditions.

Table "Student"

Table 2.14

Table "Department"

Table 2.15

In the above example the column that is common to both tables is Dept_No. Using Dept_No, the Student and Department tables can be joined to combine data from both tables as shown below.

Figure 2.17 : Joining of tables

Let's consider a scenario where we need to retrieve the details of the students who belong to the 'CSE' department. We have to join the two tables based on the column that is common to both.

Figure 2.18 : Mapping data

Result: After joining two tables:

Table 2.16
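The 'CSE' scenario can be sketched as a query. The example assumes Python's built-in sqlite3 module; the table and column names follow the text, while the rows are hypothetical and are not the contents of Tables 2.14 to 2.16.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (Dept_No TEXT PRIMARY KEY, Dept_Name TEXT)")
conn.execute(
    "CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Name TEXT, Dept_No TEXT)"
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)", [("D001", "CSE"), ("D002", "ECE")]
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Asha", "D001"), (102, "Ravi", "D002"), (103, "Kiran", "D001")],
)

# Join on the common column Dept_No, then keep only the CSE department.
cse_students = conn.execute(
    """
    SELECT Student.StudentId, Student.Name, Department.Dept_Name
    FROM Student, Department
    WHERE Student.Dept_No = Department.Dept_No
      AND Department.Dept_Name = 'CSE'
    ORDER BY Student.StudentId
    """
).fetchall()

print(cse_students)
```

The department name is stored only in Department, yet it appears next to each matching student in the result.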

3.1. Relational Database Management System (RDBMS)

A Database Management System (DBMS) that is based on the relational model is called an RDBMS. The relational model is the most widely and successfully used DBMS model.

Relational model represents data in the form of a table. A table is a two dimensional array which

contains rows and columns.

Consider a college that needs to maintain a large number of student details. All these student details are stored in a table as shown in Figure 3.1.

In Figure 3.1 (as discussed in Section 2, ER model), students is the entity and Name is one of the attributes of this entity. The other attributes are RollNo and Phone. The table given below contains rows and columns. Each row contains the data of one student. Each column contains the data related to one attribute.

Figure 3.1 : Student Table

Figure 3.1 shows the data represented in relational model and the terms that are used to refer to

various components of a table. The terms mentioned below are used in relational model.

Tuple / Row

A single row in the table is called a tuple. Each row in the table represents the data of a single entity. For example, in Figure 3.1, (s1, Louis Figo, 454333) represents a row.

Attribute / Column

A column in the table stores an attribute of the entity. For example, in the Students table (Figure 3.1), Louis Figo, Rahul, etc. are the values of the Name attribute, as highlighted in the figure.

Column Name

Each column that is available in the table is given a name. This name is used to refer to values in the

column. In Students table (Figure 3.1), RollNo, Name and Phone are the column names of the table.

Table Name

Each table is provided with a name. The name that is provided is used to refer to the table. The

name of the table depicts the contents of the table. In the above Figure 3.1, Students is the name of

the table.

Structured Query Language (SQL)

Relational database management systems (RDBMS) use SQL (Structured Query Language) for data manipulation and retrieval. SQL is the standard language for relational database systems. It is a non-procedural language.

Non-procedural language requires the programmer to specify what the program should do, rather

than providing the sequential steps indicating how the program should perform a task.

SQL Commands are divided into three categories, depending upon what they do:

DDL (Data Definition Language)

DML (Data Manipulation Language)

DCL (Data Control Language)

Related Video/Material Links:

http://www.trainsignal.com/blog/videos/free-video-sql-101-data-definition-language

http://www.sqlcourse.com/index.html

http://www.w3schools.com/sql/sql_intro.asp

http://www.studytonight.com/dbms/rdbms-concept.php

3.2. Introduction to Data Definition Language(DDL)

Data Definition Language (DDL) statements are used to create and modify the structure of tables and other objects in the database. Some of the DDL commands are CREATE, ALTER and DROP.

CREATE statement

A CREATE statement in SQL is used to create a table.

The general syntax of the CREATE statement is given below:

Syntax:

CREATE TABLE table_name

( column_name1 data_type constraints,

column_name2 data_type constraints,

...

column_nameN data_type constraints

);

where,

table_name - is the name of the table

column_name1, column_name2, ..., column_nameN - are the names of the columns

data_type - is the data type for the column like char, date, number etc.

constraints - constraints are used to validate or limit the type of data that can go into a

table.

Constraints are optional for the columns.

We will focus on a few constraints now:

NOT NULL

PRIMARY KEY

FOREIGN KEY

UNIQUE

NOT NULL: The NOT NULL constraint enforces a column to not accept NULL values. This means

that this column must contain some value while inserting or updating a record.

PRIMARY KEY: A primary key uniquely identifies each record in a table, so a primary key column cannot contain NULL or duplicate values. (Refer to Section 2 for more details about Primary Key.)

FOREIGN KEY: A foreign key is a column in a table that matches the primary key column of another

table. The foreign key can be used to map two tables. (Refer Section 2 for more details about

Foreign Key).

UNIQUE: Unique constraints are used to make sure that no duplicate values are entered in specific

columns that do not participate in a primary key. A column defined as UNIQUE can contain NULL

values.

Basic SQL DATA types :

CHAR: The CHAR data type is used for storing fixed length character strings with a

maximum size of 2000 bytes. The CHAR(n) holds fixed length of n characters.

DATE: It allows date attributes to be defined as date fields in the database. Here the DATE data type stores year, month, and day values.

NUMBER: It allows a column to be defined as a number field. Only numeric values can be stored in the column.

Now let's see how to implement the SQL queries with examples:

Example1:

With the help of CREATE statement, let's create Students table with columns as RollNo, Name and

Phone as shown below.

CREATE TABLE Students (

RollNo NUMBER PRIMARY KEY,

Name CHAR(25) NOT NULL,

Phone NUMBER

);

Here "Students" is the name of the table. RollNo, Name and Phone are the columns of the table.

NUMBER and CHAR(25) are the data types which convey what kind of data that particular column

will hold.

Here RollNo is given as PRIMARY KEY, which means that this column will not accept duplicate or NULL values. Name is defined as NOT NULL, which means that it will not accept NULL values. Phone has no constraint, so it can be left empty.

Note: NULL specifies that the column doesn't have any value or the column is empty.
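The effect of these constraints can be tried out with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for the Oracle-style database used in these notes (SQLite accepts the NUMBER and CHAR type names but is more permissive about types); the helper try_insert is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Students (
        RollNo NUMBER PRIMARY KEY,
        Name   CHAR(25) NOT NULL,
        Phone  NUMBER
    )
    """
)
conn.execute("INSERT INTO Students VALUES (100, 'David', 9830028200)")


def try_insert(sql):
    """Run one INSERT and report 'ok' or the constraint error message."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = {
    "duplicate RollNo": try_insert("INSERT INTO Students VALUES (100, 'Peter', NULL)"),
    "NULL Name": try_insert("INSERT INTO Students VALUES (200, NULL, NULL)"),
    "NULL Phone": try_insert("INSERT INTO Students VALUES (200, 'Peter', NULL)"),
}
for case, outcome in results.items():
    print(case, "->", outcome)
```

Only the last insert succeeds: a repeated RollNo and a missing Name are refused, while a missing Phone is accepted.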

Example2:

Let's see another example using the Create Statement.

The query below is used to create "employees" table with columns such as employee_id, first_name,

etc.

CREATE TABLE employees (

employee_id NUMBER PRIMARY KEY,

first_name CHAR(10) NULL,

last_name CHAR(10) NOT NULL,

email CHAR(25) NOT NULL,

phone_number NUMBER NOT NULL,

hire_date DATE NOT NULL,

job_id CHAR(10),

salary NUMBER,

commission_pct NUMBER,

manager_id NUMBER,

department_id NUMBER

);

Example3:

In Example1 and Example2 we saw how to create PRIMARY KEY and NOT NULL constraints.

Now let's see how to implement foreign key constraints. To implement a foreign key we need two related tables, one of which refers to the other.

In Example 2 we have employees table which contains the department_id as one of the columns but

does not have department details. Now let's create a department table which contains details of the

department such as department_id and department name.

CREATE TABLE department(

department_id NUMBER PRIMARY KEY,

department_name CHAR(10)

);

Consider a scenario where we need to identify the department_name of an employee. In this case

the employees table is dependent on department table to get the department name based on the

common column department_id. Foreign key constraint comes into picture in this case. The syntax

below creates foreign key. (For more details about Foreign key refer Section 2).

CREATE TABLE employees (

employee_id NUMBER PRIMARY KEY,

first_name CHAR(10) NULL,

last_name CHAR(10) NOT NULL,

email CHAR(25) NOT NULL,

phone_number NUMBER NOT NULL,

hire_date DATE NOT NULL,

job_id CHAR(10),

salary NUMBER,

commission_pct NUMBER,

manager_id NUMBER,

department_id NUMBER REFERENCES department(department_id)

);

DROP statement

The DROP command is used to remove a table from the database. If you drop a table, all the rows in the table are deleted and the table structure is removed from the database permanently. Once a table is dropped using the DROP command, the data and the table cannot be retrieved, so we should be careful while using this command.

Syntax:

DROP TABLE table_name;

Example1:

The following command is used to permanently remove the Students table structure/definition along

with the data that was created.

DROP TABLE Students;

After execution of the above command the entire Students table is removed from the database. We

cannot get back any data about Students table.

Example2:

Let's see how the employees table definition/structure is removed from the database.

DROP TABLE employees;

After execution of the above command the entire employees table is removed from the database.

We cannot get back any data about employees table.

ALTER statement

The ALTER statement helps to modify the structure of an existing table in the database.

Once you've created a table within a database, you may wish to modify its definition at some point. The ALTER statement allows you to make changes to the structure of a table without deleting or recreating it.

General syntax of Alter statement is given below.

Syntax for adding a column to the existing table:

ALTER TABLE table_name ADD column_name data_type;

Example:

Let's see how we can alter the structure of the Students table that we created using the CREATE statement in Section 3.2.1. Let's assume that we have to add a new column called 'gender' to the existing Students table.

ALTER TABLE Students ADD gender CHAR(10);

In the above example the "gender" column with the data type as CHAR(10) has been added to the

existing Students table.
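A quick way to see what ALTER TABLE does to existing data is sketched below, assuming Python's built-in sqlite3 module as a stand-in database. PRAGMA table_info is SQLite-specific and is used here only to list the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (RollNo NUMBER PRIMARY KEY, Name CHAR(25) NOT NULL, Phone NUMBER)"
)
conn.execute("INSERT INTO Students VALUES (100, 'David', 9830028200)")

# Add the new column without dropping or recreating the table.
conn.execute("ALTER TABLE Students ADD gender CHAR(10)")

# The second field of each table_info row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Students)")]
existing_row = conn.execute("SELECT * FROM Students").fetchone()

print(columns)
print(existing_row)
```

The row inserted before the change is kept, and its value in the new gender column is NULL (None in Python).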

Syntax for adding constraints:

ALTER TABLE table_name ADD CONSTRAINT clause ;

where: A CONSTRAINT clause is optional in the above ALTER TABLE statement for defining the

constraint.

Example:

In the example below a UNIQUE constraint is applied to the phone_number column in order to prevent duplicate phone numbers from being inserted into the table.

ALTER TABLE employees ADD UNIQUE (phone_number);

The UNIQUE constraint has been added on the phone_number column of the employees table, so that the column holds only unique values.

3.3. Introduction to Data Manipulation Language (DML)

Data manipulation language (DML) is a family of computer languages that includes commands which permit users to manipulate data in a database. This manipulation involves inserting data into database tables, retrieving existing data, deleting data from existing tables and modifying existing data.

DML commands:

INSERT - To add a new row into a table

UPDATE - To update existing records within a table

DELETE - To delete records in a table

SELECT - To retrieve records from a table

INSERT Statement

The INSERT statement inserts new rows into an existing table.

The syntax for INSERT statement is as follows.

Syntax:

INSERT INTO table_name (col1,col2,col3,....)

VALUES (value1,value2,value3,.....);

Example 1:

Now let's see how to insert the details of students into Students table. The following is the structure

of the Students table. Let's see how to insert the details of a student named David.

Table 3.1

In the query given below RollNo, Name, Phone and Gender are the columns defined in the Students

table. Using INSERT statement the corresponding values 100, David, 9830028200, Male are

inserted into those columns.

INSERT INTO Students(RollNo, Name, Phone, Gender)

VALUES (100,'David',9830028200,'Male');

Similarly, we can insert the details of another student named 'Peter'. This time let's omit a column which accepts NULL values.

INSERT INTO Students(RollNo, Name, Gender) VALUES (200,'Peter','Male');

In the above query we have given values only for three columns (RollNo, Name, Gender). Though

we didn't mention Phone, the record will be successfully inserted because it is not mandatory to

provide values for the columns which can accept NULL values during insertion. In Students table

Phone and Gender are the columns which can accept NULL values. For Peter's record Phone

column will be empty.

The data we inserted is represented in the table below:

Table 3.2

Example 2:

Let us assume employees table structure as below:

Table 3.3

Query:

INSERT INTO employees (First_Name,Last_Name,Email,Phone_Number,

Hire_Date,Job_ID,Salary,Commission_PCT,Manager_ID,Age,Department_ID)

VALUES ('George', 'Gordon','GGORDON',6505062222,

'01-JAN-07','SA_REP',9000,.1,148,25,80);

Result:

ERROR at line 1:

ORA-01400: cannot insert NULL into ("E668292"."EMPLOYEES"."EMPLOYEE_ID")

In the above query we are trying to insert the details of an employee without providing a value for the Employee_ID column. Employee_ID is a NOT NULL column, so it is mandatory to provide a value for it. Since no value was given, the value that would go into the table is NULL, so the query fails with the error "cannot insert NULL into EMPLOYEE_ID column".

Let's now insert a row by providing Employee_ID column value.

INSERT INTO employees
(Employee_ID,First_Name,Last_Name,Email,Phone_Number,
Hire_Date,Job_ID,Salary,Commission_PCT,Manager_ID,Age,Department_ID)
VALUES (10,'George', 'Gordon','GGORDON',6505062222,
'01-JAN-07','SA_REP',9000,.1,148,25,80);

Inserting another employee:

INSERT INTO employees
(Employee_ID,First_Name,Last_Name,Email,Phone_Number,
Hire_Date,Job_ID,Salary,Commission_PCT,Manager_ID,Age,Department_ID)
VALUES (11,'James', 'Keats','j_keats@gm',6505062221,
'01-JAN-07','SA_REP',7000,.1,148,25,80);

The inserted data is represented in table format below:

Table 3.4

SELECT Statement

Select statement is used to retrieve the data from the database table.

General syntax of the Select statement is given below:

Syntax:

SELECT column_list FROM table_name WHERE search_condition

where

column_list includes one or more columns from which data is retrieved.

table_name is the name of the table from which the information is retrieved.

search_condition specifies the conditions based on which rows will be retrieved.

The three clauses used in the SELECT statement:

Table 3.5

Example 1:

If we want to view the details of all students after inserting the values in the Students table, the query

below can be executed.

SELECT * FROM Students;

Result:

Table 3.6

Here * denotes all the columns of the table. Since there is no WHERE clause, all the rows are returned.

Example 2:

Let's assume that we want to retrieve the name of the student whose roll number is 200 from the Students table.

To retrieve this the following query is executed.

SELECT name FROM Students WHERE RollNo=200;

Result:

Table 3.7

Example 3:

Now let's consider a scenario where we need to retrieve the salary of the employee whose first name is George from the employees table. The query for the scenario will be as follows:

SELECT Salary FROM employees WHERE first_name= 'George' ;

There may be more than one employee with the first name George. The above query will retrieve the salaries of all employees whose first name is George. If we need only one specific employee, we can add one more condition to the WHERE clause to retrieve exactly the required data.

SELECT Salary FROM employees

WHERE first_name= 'George' AND employee_id= 10 ;

Result:

Table 3.8

In the above query we have added two conditions with the help of the "AND" keyword. AND checks both conditions and retrieves only the records that match both. So the salary of the employee (i.e. George, as shown in the result) whose FIRST_NAME is "George" and EMPLOYEE_ID is equal to 10 is retrieved from the employees table.

So far we saw how to retrieve data from one table. Now let's see how to retrieve data from more

than one table.

To retrieve or combine data from more than one table we use Joins.

Joins:

A join is used to combine records from two or more tables in a database. It creates a result set that can be saved as a table or used as it is.

A join is a means of combining fields from two tables by using values common to both.

A join condition can be used in the WHERE clause of SELECT, UPDATE and DELETE statements.

(Refer to Section 2 in this document for more details.)

The following is the syntax for joining two tables:

SELECT col1, col2, col3... FROM table_name1, table_name2

WHERE table_name1.col2 = table_name2.col1;

Example:

Let's assume that Department table has the following data.

Table 3.9

The employees table has the following data:

Table 3.10

The column that is common between the two tables is Department_ID. So using Department_ID we

can join Department table and employees table. Please find below query for the same.

SELECT employee_id, first_name, department_name, department.department_id
FROM department, employees
WHERE department.department_id = employees.department_id;

Result:

Table 3.11

Here data from employees table and department table are joined and displayed.

Department_Id from the department table is compared with Department_Id from the employees table, and the records that have the same value for Department_Id (i.e. 80) are displayed.

UPDATE Statement

Let's see how to modify the existing rows in a table.

In Section 3.3.1 we saw how to insert records into a table. Here we will see how to update the inserted records.

The UPDATE statement modifies existing rows in a table.

General syntax for the UPDATE statement is given below.

Syntax:

UPDATE table_name

SET column_name1 = value, column_name2 = value, ...

WHERE condition;

Note: The WHERE clause in the above syntax specifies which record or records should be updated.

All records will be updated, if we omit the WHERE clause in UPDATE statement.

Let's see a few examples for the UPDATE statement.

Example 1:

Table 3.12

Let's update the Name "David" in the Students table to "John".

We use the below query for the same.

UPDATE students SET name = 'John' WHERE rollno = 100;

Result:

1 row updated.

Students table will look like this now:

Table 3.13

Example 2:

employees table has the following data:

Table 3.14

Now we want to update the salary of the employees whose manager_ID is 148.

We use the below query for the same.

UPDATE employees SET salary = 10500 WHERE manager_id = 148;

Result:

2 rows updated.

In the above query employees table is updated with the salary value 10500 for the rows which have

the manager_id as 148.

There were two rows which had manager_id as 148 and they have been updated with salary value

as 10500 from 9000 and 7000 respectively.

employees table will look like this now:

Table 3.15
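The same update can be run end to end in a short sketch. It assumes Python's built-in sqlite3 module and a cut-down employees table; employees 10 and 11 follow the text, while employee 12 is a hypothetical extra row added to show that rows not matching the WHERE clause are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, first_name TEXT, salary INTEGER, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(10, "George", 9000, 148), (11, "James", 7000, 148), (12, "Maria", 8000, 150)],
)

cursor = conn.execute("UPDATE employees SET salary = 10500 WHERE manager_id = 148")
rows_updated = cursor.rowcount  # number of rows changed by the UPDATE

salaries = conn.execute(
    "SELECT employee_id, salary FROM employees ORDER BY employee_id"
).fetchall()
print(rows_updated, salaries)
```

Leaving out the WHERE clause would set the salary of all three employees to 10500.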

DELETE Statement

The DELETE statement is used to delete the rows from a table.

The DELETE statement syntax is given below.

Syntax:

DELETE FROM table_name WHERE condition ;

If we include the WHERE clause, the statement deletes only those records that satisfy the condition.

If we omit the WHERE clause, the statement deletes all records from the table, but the table still

exists without records.

Example 1:

Students table has the data as shown below:

Table 3.16

If we want to delete a row from the above table whose RollNo is 100, we use the below query.

DELETE FROM Students WHERE ROLLNO = 100;

Result:

1 row deleted.

After deleting the record, Students table will look like this:

Table 3.17

Example 2:

employees table has the following data:

Table 3.18

Now let's delete the employee details whose hire date is 1st Jan 07.

DELETE FROM employees WHERE hire_date = '01-JAN-07';

Result:

2 rows deleted.

Two rows which have the Hire_Date value '01-JAN-07' have been deleted from the employees table (refer to Example 2 of Sec 3.2.1).

Note: The DELETE statement is different from the DROP statement. The DELETE statement deletes some (or all) data from the table, but the table still exists in the database. The DROP statement removes the table permanently from the database.
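The difference between DELETE and DROP can be observed with a small sketch, assuming Python's built-in sqlite3 module. The helper table_exists is hypothetical and relies on SQLite's sqlite_master catalog; other databases expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (RollNo INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)", [(100, "David"), (200, "Peter")])


def table_exists(name):
    """Check SQLite's catalog for a table with the given name."""
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row[0] == 1


# DELETE without WHERE removes every row but keeps the table definition.
conn.execute("DELETE FROM Students")
rows_after_delete = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
exists_after_delete = table_exists("Students")

# DROP removes the table definition itself.
conn.execute("DROP TABLE Students")
exists_after_drop = table_exists("Students")

print(rows_after_delete, exists_after_delete, exists_after_drop)
```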

3.4. Introduction to DCL(Data Control Language)

Data Control Language (DCL) is the component of SQL that is used to give users privileges to access or manipulate the database. The following are its two main commands:

GRANT - This command is used to grant privileges to a user.

REVOKE - This command is used to revoke (remove) privileges from a user.

GRANT command

In order to do anything within a database you must be given the appropriate privileges. A database operates as a closed system where you cannot perform any action at all unless you have been authorized to do so. This includes logging on to the database, creating tables, manipulating data (i.e. select, insert, update and delete) in tables created by other users, etc.

Syntax:

GRANT privilege_name ON table_name TO user_name;

Where,

privilege_name is the access right or privilege granted to the user.

table_name is the name of the table in the database.

user_name is the name of the user to whom an access right is being granted.

Example:

GRANT SELECT ON employees TO user10;

This command grants a SELECT permission on employees table to user10.

REVOKE command

The REVOKE command is used to revoke a privilege on a table from a user.

Syntax:

REVOKE privilege_name ON table_name FROM user_name;

Where,

privilege_name is the access right or privilege revoked from the user.

table_name is the name of the table in the database.

user_name is the name of the user from whom an access right is being revoked.

Example:

REVOKE SELECT ON employees FROM user10;

This command will REVOKE a SELECT privilege on employees table from user10. If you REVOKE

SELECT privilege on a table from a user, then the user is not able to SELECT data from that table

anymore.
