DBMS - Data Models

Database Model

A Database model defines the logical design of data. The model describes the relationships between

different parts of the data. In the history of database design, three models have been in use.

Hierarchical Model

Network Model

Relational Model

Hierarchical Model

In this model, each entity has only one parent but can have several children. At the top of the hierarchy there is only one entity, which is called the Root.

Network Model

In the network model, entities are organised in a graph, in which some entities can be accessed through several paths.

Relational Model

In this model, data is organised in two-dimensional tables called relations. The tables (relations) are related to each other.

The most popular data model in DBMS is the Relational Model. It has a more formal foundation than the others: it is based on first-order predicate logic and defines a table as an n-ary relation.

The main highlights of this model are −

Data is stored in tables called relations.

Relations can be normalized.

In normalized relations, values saved are atomic values.

Each row in a relation is unique.

Each column in a relation contains values from the same domain.

Entity-Relationship Model

The Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships among them. When a real-world scenario is translated into a database model, the ER Model produces entity sets, relationship sets, general attributes and constraints.

ER Model is best used for the conceptual design of a database.

ER Model is based on −

Entities and their attributes.

Relationships among entities.

These concepts are explained below.

Entity − An entity in an ER Model is a real-world entity having properties

called attributes. Every attribute is defined by its set of values

called domain. For example, in a school database, a student is considered as

an entity. Student has various attributes like name, age, class, etc.

Relationship − The logical association among entities is called a relationship. Relationships are mapped with entities in various ways. Mapping cardinalities define the number of associations between two entities.

Mapping cardinalities −

o one to one

o one to many

o many to one

o many to many

RDBMS Concepts

A Relational Database Management System (RDBMS) is a database management system based on the relational model introduced by E. F. Codd. In the relational model, data is represented in terms of tuples (rows).

An RDBMS is used to manage a relational database. A relational database is an organized collection of tables from which data can be accessed easily. It is the most commonly used kind of database. It consists of a number of tables, and each table has its own primary key.

What is a Table?

In a relational database, a table is a collection of data elements organised in terms of rows and columns. A table is also considered a convenient representation of a relation, but a table can have duplicate tuples while a true relation cannot. The table is the simplest form of data storage. Below is an example of an Employee table.

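+----+--------+-----+--------+
| ID | Name   | Age | Salary |
+----+--------+-----+--------+
|  1 | Adam   |  34 |  13000 |
|  2 | Alex   |  28 |  15000 |
|  3 | Stuart |  20 |  18000 |
|  4 | Ross   |  42 |  19020 |
+----+--------+-----+--------+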

What is a Record?

A single entry in a table is called a Record or Row. A record in a table represents a set of related data. For example, the above Employee table has 4 records. Following is an example of a single record.

1 Adam 34 13000

What is a Field?

A table consists of several records (rows), and each record can be broken into several smaller entities known as Fields. The above Employee table consists of four fields: ID, Name, Age and Salary.

What is a Column?

In a relational table, a column is a set of values of a particular type. The term Attribute is also used to refer to a column. For example, in the Employee table, Name is a column that represents the names of employees.

Name

Adam

Alex

Stuart

Ross

Database Keys

Keys are a very important part of a relational database. They are used to establish and identify relationships between tables. They also ensure that each record within a table can be uniquely identified by a combination of one or more fields within the table.

Super Key

A Super Key is a set of attributes within a table that uniquely identifies each record within the table. Every super key contains a candidate key as a subset.

Candidate Key

Candidate keys are the sets of fields from which the primary key can be selected. A candidate key is an attribute or minimal set of attributes that can act as a primary key for a table, uniquely identifying each record in that table.
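The difference between a super key and a candidate key can be checked mechanically on sample data. The sketch below is only an illustration: the helper names `is_super_key` and `candidate_keys` are hypothetical, and the rows are the four Employee records shown earlier. A real key is a rule about every possible state of the table, so a small sample can only suggest or rule out keys, never prove them.

```python
from itertools import combinations

# Employee rows from the example table; column order is ID, Name, Age, Salary.
COLUMNS = ("ID", "Name", "Age", "Salary")
EMPLOYEE = [
    (1, "Adam", 34, 13000),
    (2, "Alex", 28, 15000),
    (3, "Stuart", 20, 18000),
    (4, "Ross", 42, 19020),
]

def is_super_key(rows, columns, attrs):
    """True if no two rows agree on every attribute in attrs."""
    idx = [columns.index(a) for a in attrs]
    seen = set()
    for row in rows:
        key = tuple(row[i] for i in idx)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, columns):
    """Minimal super keys of this particular sample, smallest first."""
    found = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            if any(set(k) <= set(attrs) for k in found):
                continue  # contains a smaller key, so it is not minimal
            if is_super_key(rows, columns, attrs):
                found.append(attrs)
    return found
```

In this tiny sample every single column happens to be unique, so each one is reported as a candidate key; adding a second employee named Adam would remove Name from the list. This is why keys are chosen from the meaning of the data, not from one snapshot.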

Primary Key

The primary key is the candidate key that is most appropriate to become the main key of the table. It is a key that uniquely identifies each record in a table.

Composite Key

A key that consists of two or more attributes that together uniquely identify an entity occurrence is called a Composite key. No attribute that makes up the composite key is a key on its own.

Secondary or Alternative key

The candidate keys that are not selected as the primary key are known as secondary keys or alternative keys.

Non-key Attribute

Non-key attributes are attributes other than candidate key attributes in a table.

Non-prime Attribute

Non-prime attributes are attributes other than prime attributes, that is, attributes that are not part of any candidate key.

E-R Diagram

An ER diagram is a visual representation of data that describes how data items are related to each other.

Symbols and Notations

Components of E-R Diagram

The E-R diagram has three main components.

1) Entity

An Entity can be any object, place, person or class. In E-R Diagram, an entity is represented using

rectangles. Consider an example of an Organisation. Employee, Manager, Department, Product and

many more can be taken as entities from an Organisation.

Weak Entity

A weak entity is an entity that depends on another entity. A weak entity doesn't have a key attribute of its own. A double rectangle represents a weak entity.

2) Attribute

Entities are represented by means of their properties, called attributes. All

attributes have values. For example, a student entity may have name,

class, and age as attributes.

Types of Attributes

Simple attribute − Simple attributes are atomic values, which cannot be

divided further. For example, a student's phone number is an atomic value of

10 digits.

Composite attribute − Composite attributes are made of more than one

simple attribute. For example, a student's complete name may have first_name

and last_name.

Derived attribute − Derived attributes are attributes that do not exist in the physical database; their values are derived from other attributes present in the database. For example, average_salary in a department should not be saved directly in the database; instead, it can be derived. As another example, age can be derived from date_of_birth.

Single-value attribute − Single-value attributes contain single value. For

example − Social_Security_Number.

Multi-value attribute − Multi-value attributes may contain more than one value. For example, a person can have more than one phone number, email_address, etc.

These attribute types can come together in a way like −

simple single-valued attributes

simple multi-valued attributes

composite single-valued attributes

composite multi-valued attributes
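A derived attribute, as described in the list of attribute types, can be illustrated with a short sketch. The helper `age_on` and the sample dates are hypothetical and not part of the original notes; the point is only that date_of_birth is stored and age is computed when needed.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Derive age in whole years from a stored date_of_birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in today's year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

# Hypothetical sample values.
dob = date(2000, 6, 15)
age_before_birthday = age_on(dob, date(2024, 6, 14))
age_on_birthday = age_on(dob, date(2024, 6, 15))
```

Because the value is recomputed from the stored attribute, it cannot go stale the way a stored age column would.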

An Attribute describes a property or characteristic of an entity. For example, Name, Age, Address etc. can be attributes of a Student. An attribute is represented using an ellipse.

Key Attribute

A key attribute represents the main characteristic of an Entity. It is used to represent the primary key. An ellipse with the attribute name underlined represents a key attribute.

Composite Attribute

An attribute can also have its own attributes. Such attributes are known as composite attributes.

3) Relationship

A Relationship describes relations between entities. A relationship is represented using a diamond.

There are three types of relationships that exist between entities.

Binary Relationship

Recursive Relationship

Ternary Relationship

Binary Relationship

Binary Relationship means a relation between two entities. This is further divided into four types.

1. One to One : This type of relationship is rarely seen in the real world. For example, one student can enroll in only one course, and a course has only one student. This is not what you will usually see in a relationship.

2. One to Many : It reflects the business rule that one entity is associated with many entities of another type. The example for this relation might sound a little odd, but it means that one student can enroll in many courses, while each course has only one student.

3. Many to One : It reflects the business rule that many entities can be associated with just one entity. For example, a student enrolls in only one course, but a course can have many students.

4. Many to Many : Many students can enroll in more than one course, and each course can have many students.

Recursive Relationship

When an Entity is related with itself it is known as Recursive Relationship.

Ternary Relationship

Relationship of degree three is called Ternary relationship.

Relationship

The association among entities is called a relationship. For example, an employee works_at a department, and a student enrolls in a course. Here, Works_at and Enrolls are called relationships.

Relationship Set

A set of relationships of similar type is called a relationship set. Like

entities, a relationship too can have attributes. These attributes are

called descriptive attributes.

Degree of Relationship

The number of participating entities in a relationship defines the degree of

the relationship.

Binary = degree 2

Ternary = degree 3

n-ary = degree n

Mapping Cardinalities

Cardinality defines the number of entities in one entity set, which can be

associated with the number of entities of other set via relationship set.

One-to-one − One entity from entity set A can be associated with at most one

entity of entity set B and vice versa.

One-to-many − One entity from entity set A can be associated with more than one entity of entity set B; however, an entity from entity set B can be associated with at most one entity of entity set A.

Many-to-one − More than one entity from entity set A can be associated with at most one entity of entity set B; however, an entity from entity set B can be associated with more than one entity from entity set A.

Many-to-many − One entity from A can be associated with more than one

entity from B and vice versa.
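The four mapping cardinalities can be told apart by looking at a relationship set as a collection of (A, B) pairs. The sketch below is a minimal illustration; the function name `mapping_cardinality` and the student/course identifiers are hypothetical. It reports the cardinality that the given pairs actually exhibit, which may be narrower than what the schema allows.

```python
from collections import defaultdict

def mapping_cardinality(pairs):
    """Classify a relationship set given as (a, b) pairs."""
    a_to_b = defaultdict(set)
    b_to_a = defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)
    # Does some entity of A relate to several entities of B, and vice versa?
    a_has_many = any(len(bs) > 1 for bs in a_to_b.values())
    b_has_many = any(len(as_) > 1 for as_ in b_to_a.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"

# Hypothetical enrolments: (student, course).
enrolments = [("s1", "c1"), ("s1", "c2"), ("s2", "c1")]
kind = mapping_cardinality(enrolments)
```

Here student s1 takes two courses and course c1 has two students, so the sample is many-to-many.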

Enhanced Entity Relationship Diagram

An enhanced entity-relationship diagram, or EERD, is a

specialized model that deviates from traditional ERDs. It

uses several concepts that are closely related to object-

oriented design and programming.

What is an Enhanced ERD?

An enhanced entity-relationship model, also known as an extended entity-relationship

model, is a type of database diagram that's similar to regular ERDs. Enhanced ERDs are

high-level conceptual models that accurately represent the requirements of complex

databases.

Enhanced ERDs include the same concepts that ordinary ER diagrams encompass. In

addition, EERDs include:

Subtypes and supertypes (sometimes known as subclasses and superclasses)

Specialization or generalization

Category or union type

Attribute and relationship inheritance

Enhanced ERD Definitions and Examples

The modeling concepts of EERDs differ somewhat from those of ERDs. The list below defines the concepts that are unique to enhanced entity-relationship diagrams. It assumes familiarity with ER diagram symbols and their meanings; once you understand ERD structure, you are ready to work with enhanced entity-relationship diagrams.

SUPERTYPES & SUBTYPES

Supertype - an entity type that has a relationship with one or more subtypes.

Subtype - a subgroup of entities with unique attributes.

Inheritance - the idea that subtype entities inherit the values of all supertype attributes.

Remember that a subtype instance is also classified as a supertype instance.

GENERALIZATION & SPECIALIZATION

Generalization - the process of defining a general entity type from a collection of

specialized entity types.

Specialization - the inverse of generalization, since it defines subtypes of the supertype and forms relationships between supertype and subtype.

CONSTRAINTS

Disjointness constraints - decide whether a supertype instance may simultaneously be a member of two or more subtypes. The disjoint rule forces subclasses to have disjoint sets of entities. The overlap rule allows subclasses to have overlapping sets of entities, so a supertype instance may belong to more than one subtype.

Completeness constraints - decide whether a supertype instance must also be a member of at least one subtype. The total specialization rule demands that every entity in the superclass belong to some subclass. Just as with a regular ERD, total specialization is symbolized with a double line connection between entities. The partial specialization rule allows an entity to not belong to any of the subclasses. It is represented with a single line connection.
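The disjointness and completeness constraints can be expressed as two small set checks. This is a sketch under stated assumptions: the functions `is_disjoint` and `is_total`, the EMPLOYEE supertype and its ENGINEER and MANAGER subtypes are all hypothetical examples, with instances represented by integer ids.

```python
def is_disjoint(subtypes):
    """Disjoint rule: no instance appears in more than one subtype."""
    seen = set()
    for members in subtypes.values():
        if seen & members:
            return False
        seen |= members
    return True

def is_total(supertype, subtypes):
    """Total specialization: every supertype instance is in some subtype."""
    covered = set()
    for members in subtypes.values():
        covered |= members
    return supertype <= covered

# Hypothetical instances: employee ids and two subtypes.
EMPLOYEE = {1, 2, 3, 4}
SUBTYPES = {"ENGINEER": {1, 2}, "MANAGER": {2, 3}}

overlapping = not is_disjoint(SUBTYPES)     # employee 2 is in both subtypes
partial = not is_total(EMPLOYEE, SUBTYPES)  # employee 4 is in no subtype
```

This sample therefore follows the overlap rule and the partial specialization rule.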

SUBTYPE DISCRIMINATORS

A subtype discriminator is an attribute of the supertype that indicates an entity's subtype. The attribute's values are what determine the target subtype.

Disjoint subtypes - simple attributes that must have alternative values to indicate any

possible subtypes.

Overlapping subtypes - composite attributes whose subparts pertain to various subtypes.

Each subpart has a Boolean value that indicates whether or not the instance belongs to the

associated subtype.

How to Create an Effective EERD

Just like entity-relationship diagrams, a well-designed EERD will help you build storage

systems that are long-lasting and useful. When evaluating the effectiveness of an entity

relationship diagram, be sure that you’re modeling a system design that will meet

important business requirements. Possible considerations are:

Stability - will the diagram support business needs that change over time?

Breadth - can this diagram accommodate all of the data we need to store?

Flexibility - can data in this model be re-allocated to support additional information

requirements?

Efficiency - does this model represent the simplest solution? Is data modeled with the

appropriate symbols?

Accessibility - can both creators and end users of the ERD easily understand it?

Conformity - will the design integrate seamlessly with any existing database structure?

Generalization

Generalization is a bottom-up approach in which two or more lower-level entities combine to form a higher-level entity. In generalization, the higher-level entity can also combine with other lower-level entities to form an even higher-level entity.

Specialization

Specialization is the opposite of generalization. It is a top-down approach in which one higher-level entity can be broken down into two or more lower-level entities. In specialization, some higher-level entities may not have lower-level entity sets at all.

Aggregation

Aggregation is a process in which a relationship between two entities is treated as a single entity. Here the relationship between Center and Course acts as an entity in a relationship with Visitor.

Relational Algebra

Relational algebra is a procedural query language, which takes instances of relations as input and yields instances of relations as output. It uses operators to perform queries. An operator can be either unary or binary. Operators accept relations as their input and yield relations as their output. Relational algebra is performed recursively on a relation, and intermediate results are also considered relations.

The fundamental operations of relational algebra are as follows −

Select

Project

Union

Set difference

Cartesian product

Rename

We will discuss all these operations in the following sections.

Select Operation (σ)

It selects tuples that satisfy the given predicate from a relation.

Notation − σ_p(r)

Where σ denotes selection, p is the selection predicate and r stands for the relation. p is a propositional logic formula which may use connectors like and, or, and not. Its terms may use relational operators like =, ≠, ≥, <, >, ≤.

For example −

σ_{subject = "database"}(Books)

Output − Selects tuples from Books where subject is 'database'.

σ_{subject = "database" and price = "450"}(Books)

Output − Selects tuples from Books where subject is 'database' and price is 450.

σ_{subject = "database" and price = "450" or year > "2010"}(Books)

Output − Selects tuples from Books where subject is 'database' and price is 450, or those books published after 2010.
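The three selections above can be mimicked in a few lines of Python. This is a minimal sketch, not a database implementation: the `select` helper and the three sample Books tuples (titles A, B, C) are hypothetical, and a relation is represented as a list of dictionaries.

```python
# Hypothetical sample tuples for the Books relation.
BOOKS = [
    {"title": "A", "subject": "database", "price": 450, "year": 2008},
    {"title": "B", "subject": "database", "price": 300, "year": 2012},
    {"title": "C", "subject": "networks", "price": 450, "year": 2015},
]

def select(relation, predicate):
    """Selection: keep the tuples for which the predicate is true."""
    return [t for t in relation if predicate(t)]

db_books = select(BOOKS, lambda t: t["subject"] == "database")
db_450 = select(BOOKS, lambda t: t["subject"] == "database" and t["price"] == 450)
# Python's `and` binds tighter than `or`, which matches the reading
# "(subject is database and price is 450) or published after 2010".
db_450_or_recent = select(
    BOOKS,
    lambda t: t["subject"] == "database" and t["price"] == 450 or t["year"] > 2010,
)
```

The result of each call is again a list of tuples, so selections can be chained or fed into other operators.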

Project Operation (∏)

It projects the listed column(s) of a relation.

Notation − ∏_{A1, A2, ..., An}(r)

Where A1, A2, ..., An are attribute names of relation r.

Duplicate rows are automatically eliminated, as relation is a set.

For example −

∏_{subject, author}(Books)

Projects the columns named subject and author from the relation Books.
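Projection, including the automatic removal of duplicate rows, might look like this in Python. The `project` helper and the sample Books tuples are hypothetical and serve only as an illustration.

```python
def project(relation, attributes):
    """Projection: keep only the named columns and drop duplicate rows."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:  # a relation is a set, so duplicates vanish
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

# Hypothetical sample tuples; two books share the same subject and author.
BOOKS = [
    {"title": "A", "subject": "database", "author": "Ann"},
    {"title": "B", "subject": "database", "author": "Ann"},
    {"title": "C", "subject": "networks", "author": "Bob"},
]

subject_author = project(BOOKS, ["subject", "author"])
```

Three input tuples yield two output tuples, because the first two become identical once title is dropped.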

Union Operation (∪)

It performs binary union between two given relations and is defined as −

r ∪ s = { t | t ∈ r or t ∈ s }

Notation − r ∪ s

Where r and s are either database relations or relation result sets (temporary relations).

For a union operation to be valid, the following conditions must hold −

r and s must have the same number of attributes.

Attribute domains must be compatible.

Duplicate tuples are automatically eliminated.

∏_{author}(Books) ∪ ∏_{author}(Articles)

Output − Projects the names of the authors who have either written a book

or an article or both.

Set Difference (−)

The result of the set difference operation is the set of tuples that are present in one relation but not in the second relation.

Notation − r − s

Finds all the tuples that are present in r but not in s.

∏_{author}(Books) − ∏_{author}(Articles)

Output − Provides the name of authors who have written books but not

articles.
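Union and set difference map directly onto Python's set operators when a relation is held as a set of tuples. The sketch below assumes hypothetical author names and helper functions; it checks only that both relations have the same number of attributes, not that their domains are compatible.

```python
def _check_same_arity(r, s):
    """Both relations must have the same number of attributes."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations must have the same number of attributes")

def union(r, s):
    _check_same_arity(r, s)
    return r | s  # duplicates disappear because the result is a set

def difference(r, s):
    _check_same_arity(r, s)
    return r - s

# Hypothetical one-attribute relations: authors of books and of articles.
book_authors = {("Ann",), ("Bob",)}
article_authors = {("Ann",), ("Cy",)}

wrote_anything = union(book_authors, article_authors)
books_only = difference(book_authors, article_authors)
```

Ann appears once in the union even though she is in both inputs, and only Bob remains in the difference.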

Cartesian Product (×)

Combines information of two different relations into one.

Notation − r × s

Where r and s are relations and their output will be defined as −

r × s = { q t | q ∈ r and t ∈ s }

σ_{author = 'tutorialspoint'}(Books × Articles)

Output − Yields a relation which shows all the books and articles written by tutorialspoint.
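The Cartesian product followed by a selection can be sketched as follows. The relations, their (title, author) layout and the helper `cartesian_product` are hypothetical; each result tuple is the concatenation of one Books tuple and one Articles tuple.

```python
from itertools import product

def cartesian_product(r, s):
    """Every tuple of r concatenated with every tuple of s."""
    return {q + t for q, t in product(r, s)}

# Hypothetical relations with attributes (title, author).
BOOKS = {("DB Basics", "Ann"), ("Net Basics", "Bob")}
ARTICLES = {("Joins", "Ann"), ("Indexes", "Cy"), ("Routing", "Bob")}

pairs = cartesian_product(BOOKS, ARTICLES)

# Selection on the product: both the book and the article are by Ann.
# Positions 1 and 3 hold the two author attributes of the combined tuple.
by_ann = {row for row in pairs if row[1] == "Ann" and row[3] == "Ann"}
```

The product has 2 × 3 = 6 tuples, which is why it is normally combined with a selection that keeps only the meaningful combinations.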

Rename Operation (ρ)

The results of relational algebra are also relations, but without any name. The rename operation allows us to rename the output relation. The rename operation is denoted by the small Greek letter rho (ρ).

Notation − ρ_x(E)

Where the result of expression E is saved with the name x.

Additional operations are −

Set intersection

Assignment

Natural join

Relational Calculus

In contrast to Relational Algebra, Relational Calculus is a non-procedural query language, that is, it tells what to retrieve but never explains how to retrieve it.

Relational calculus exists in two forms −

Tuple Relational Calculus (TRC)

The filtering variable ranges over tuples.

Notation − { T | Condition }

Returns all tuples T that satisfy a condition.

For example −

{ T.name | Author(T) AND T.article = 'database' }

Output − Returns the 'name' of each tuple in Author for an author who has written an article on 'database'.

TRC can be quantified. We can use Existential (∃) and Universal Quantifiers

(∀).

For example −

{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}

Output − The above query will yield the same result as the previous one.
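Set comprehensions in Python read much like tuple relational calculus, which makes them a convenient way to try out the two queries above. The Author tuples below are hypothetical sample data, and the second query uses `any` to play the role of the existential quantifier.

```python
from collections import namedtuple

Author = namedtuple("Author", ["name", "article"])

# Hypothetical contents of the Author relation.
AUTHORS = {
    Author("Ann", "database"),
    Author("Bob", "networks"),
    Author("Cy", "database"),
}

# { T.name | Author(T) AND T.article = 'database' }
names = {T.name for T in AUTHORS if T.article == "database"}

# { R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }
candidates = {T.name for T in AUTHORS}
names_exists = {
    R for R in candidates
    if any(T.article == "database" and R == T.name for T in AUTHORS)
}
```

Both comprehensions state a condition on the result without describing any access path, and both produce the same set of names.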

Domain Relational Calculus (DRC)

In DRC, the filtering variable uses the domain of attributes instead of entire

tuple values (as done in TRC, mentioned above).

Notation −

{ < a1, a2, a3, ..., an > | P(a1, a2, a3, ..., an) }

Where a1, a2, ..., an are attributes and P stands for a formula built from these attributes.

For example −

{ < article, page, subject > | < article, page, subject > ∈ TutorialsPoint ∧ subject = 'database' }

Output − Yields Article, Page, and Subject from the relation TutorialsPoint,

where subject is database.

Just like TRC, DRC can also be written using existential and universal

quantifiers. DRC also involves relational operators.

The expressive power of Tuple Relational Calculus and Domain Relational Calculus is equivalent to that of Relational Algebra.

Relational Calculus

Relational calculus is a non-procedural query language. It uses mathematical predicate calculus instead of algebra. It describes the result the query should return, whereas relational algebra gives the method to obtain the result. It informs the system what to do with the relation, but does not inform how to perform it.

For example, the steps involved in listing all the students who attend the 'Database' course in relational algebra would be

SELECT the tuples from COURSE relation with COURSE_NAME = 'DATABASE'

PROJECT the COURSE_ID from above result

SELECT the tuples from STUDENT relation with the COURSE_ID resulting from the above.
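The contrast between the procedural and the non-procedural style can be made concrete with a small sketch. The COURSE and STUDENT rows below are hypothetical, as is the STUDENT_NAME attribute; the first version follows the three relational algebra steps in order, while the second only states the condition the result must satisfy.

```python
# Hypothetical sample relations.
COURSE = [
    {"COURSE_ID": 10, "COURSE_NAME": "DATABASE"},
    {"COURSE_ID": 20, "COURSE_NAME": "NETWORKS"},
]
STUDENT = [
    {"STUDENT_NAME": "Ann", "COURSE_ID": 10},
    {"STUDENT_NAME": "Bob", "COURSE_ID": 20},
    {"STUDENT_NAME": "Cy", "COURSE_ID": 10},
]

# Procedural: one relational algebra step after another.
step1 = [c for c in COURSE if c["COURSE_NAME"] == "DATABASE"]  # SELECT
step2 = {c["COURSE_ID"] for c in step1}                        # PROJECT
step3 = [s for s in STUDENT if s["COURSE_ID"] in step2]        # SELECT

# Declarative: describe the students wanted, not the order of the work.
declarative = [
    s for s in STUDENT
    if any(
        c["COURSE_ID"] == s["COURSE_ID"] and c["COURSE_NAME"] == "DATABASE"
        for c in COURSE
    )
]
```

Both versions return the same students; a real DBMS is free to choose the evaluation order for the declarative form.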

There are two types of relational calculus - Tuple Relational Calculus (TRC) and Domain Relational

Calculus (DRC).

A tuple relational calculus is a non-procedural query language which specifies which tuples to select from a relation. It can select tuples with a range of values, tuples with certain attribute values, and so on. The resulting relation can have one or more tuples. It is denoted as below:

{t | P (t)} or {t | condition (t)} -- this is also known as an expression of relational calculus

Where t is the resulting tuple and P(t) is the condition used to fetch t.

{t | EMPLOYEE (t) AND t.SALARY > 10000} - selects the tuples from the EMPLOYEE relation such that the resulting employee tuples have a salary greater than 10000. It is an example of selecting a range of values.

{t | EMPLOYEE (t) AND t.DEPT_ID = 10} - selects all the employee tuples for employees who work for Department 10.

The variable used in the condition is called a tuple variable. In the above examples, t is the tuple variable, and t.SALARY and t.DEPT_ID refer to attributes of the tuple it ranges over.

A tuple variable that is quantified by 'For All' (∀) or 'there exists' (∃) is called a bound variable. The meaning of a formula does not change if a bound variable is consistently replaced by another variable name.

A tuple variable that is not quantified by 'For All' or 'there exists' is called a free variable. In both examples above, t is free: it is the variable whose values make up the result, so the conditions t.SALARY > 10000 and t.DEPT_ID = 10 decide which employee tuples are displayed. If we change the attribute tested in the second condition from DEPT_ID to some other attribute, say EMP_ID = 10, the query returns a different result set.

All the conditions used in a tuple expression are called well-formed formulas (WFF). The conditions in the expression are combined by using logical operators like AND, OR and NOT, and quantifiers like 'For All' (∀) or 'there exists' (∃). If the tuple variables in a WFF are all bound variables, it is called a closed WFF. An open WFF has at least one free variable.

Domain Relational Calculus

In contrast to tuple relational calculus, domain relational calculus uses a list of attributes to be selected from the relation based on the condition. It is the same as TRC, but differs by selecting attributes rather than whole tuples. It is denoted as below:

{< a1, a2, a3, … an > | P(a1, a2, a3, … an)}

Where a1, a2, a3, … an are attributes of the relation and P is the condition.

For example, select EMP_ID and EMP_NAME of employees who work for department 10

{<EMP_ID, EMP_NAME> | <EMP_ID, EMP_NAME> ∈ EMPLOYEE ∧ DEPT_ID = 10}

Get the name of the department that Alex works for.

{<DEPT_NAME> | <DEPT_NAME> ∈ DEPT ∧ ∃ DEPT_ID (<DEPT_ID> ∈ EMPLOYEE ∧ EMP_NAME = Alex)}

Here the quantified subexpression, ∃ DEPT_ID (...), is evaluated to get the department ID of Alex, and then it is used to get the department name from the DEPT relation.

Let us consider another example, which selects EMP_ID, EMP_NAME and ADDRESS of the employees from the department where Alex works. What will be done here?

{<EMP_ID, EMP_NAME, ADDRESS, DEPT_ID> | <EMP_ID, EMP_NAME, ADDRESS, DEPT_ID> ∈ EMPLOYEE ∧ ∃ DEPT_ID (<DEPT_ID> ∈ EMPLOYEE ∧ EMP_NAME = Alex)}

First, the quantified subexpression, ∃ DEPT_ID (...), is evaluated to get the department ID of Alex, and then all the employees in that department are searched for by the outer part of the expression.

Other concepts of TRC, like free variable, bound variable, WFF etc., remain the same in DRC too. The only difference is that DRC is based on the attributes of a relation.

What is RDBMS?

RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and for many widely used database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.

A Relational database management system (RDBMS) is a database

management system (DBMS) that is based on the relational model as

introduced by E. F. Codd.

What is table?

The data in RDBMS is stored in database objects called tables. A table is a collection of related data entries and it consists of columns and rows.

Remember, a table is the most common and simplest form of data storage in a relational database. Following is an example of a CUSTOMERS table:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

What is field?

Every table is broken up into smaller entities called fields. The fields in the

CUSTOMERS table consist of ID, NAME, AGE, ADDRESS and SALARY.

A field is a column in a table that is designed to maintain specific

information about every record in the table.

What is record or row?

A record, also called a row of data, is each individual entry that exists in a

table. For example there are 7 records in the above CUSTOMERS table.

Following is a single row of data or record in the CUSTOMERS table:

+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
+----+----------+-----+-----------+----------+

A record is a horizontal entity in a table.

What is column?

A column is a vertical entity in a table that contains all information

associated with a specific field in a table.

For example, a column in the CUSTOMERS table is ADDRESS, which

represents location description and would consist of the following:

+-----------+
| ADDRESS   |
+-----------+
| Ahmedabad |
| Delhi     |
| Kota      |
| Mumbai    |
| Bhopal    |
| MP        |
| Indore    |
+-----------+

What is NULL value?

A NULL value in a table is a value in a field that appears to be blank, which means a field with a NULL value is a field with no value.

It is very important to understand that a NULL value is different than a zero

value or a field that contains spaces. A field with a NULL value is one that

has been left blank during record creation.
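The difference between NULL and zero is easy to see with Python's built-in sqlite3 module. The sketch below uses an in-memory database; the table layout and the salary values (a zero for Khilan, a NULL for Komal) are hypothetical and chosen only to show the contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO customers (id, name, salary) VALUES (?, ?, ?)",
    [
        (1, "Ramesh", 2000.0),
        (2, "Khilan", 0.0),   # a real value that happens to be zero
        (3, "Komal", None),   # Python None is stored as SQL NULL
    ],
)

null_rows = conn.execute("SELECT name FROM customers WHERE salary IS NULL").fetchall()
zero_rows = conn.execute("SELECT name FROM customers WHERE salary = 0").fetchall()
# Comparing with `= NULL` is never true in SQL, so this query finds nothing.
eq_null_rows = conn.execute("SELECT name FROM customers WHERE salary = NULL").fetchall()
```

Only `IS NULL` finds the missing salary, and the zero salary is found by an ordinary comparison.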

SQL Constraints:

Constraints are the rules enforced on the data columns of a table. These are used to limit the type of data that can go into a table. This ensures the accuracy and reliability of the data in the database.

Constraints can be column level or table level. Column-level constraints are applied only to one column, whereas table-level constraints are applied to the whole table.

Following are commonly used constraints available in SQL:

NOT NULL Constraint: Ensures that a column cannot have NULL value.

DEFAULT Constraint: Provides a default value for a column when none is

specified.

UNIQUE Constraint: Ensures that all values in a column are different.

PRIMARY Key: Uniquely identifies each row/record in a database table.

FOREIGN Key: Identifies a row/record in another database table by referring to that table's key.

CHECK Constraint: The CHECK constraint ensures that all values in a column

satisfy certain conditions.

INDEX: Used to retrieve data from the database very quickly.
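Most of the constraints listed above can be tried out with Python's built-in sqlite3 module. The following is a minimal sketch using an in-memory SQLite database; the table layout, the email column, the age limit of 18 and the default salary are hypothetical choices made for the illustration, and other database systems may report violations differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        id     INTEGER PRIMARY KEY,        -- PRIMARY KEY
        name   TEXT NOT NULL,              -- NOT NULL
        email  TEXT UNIQUE,                -- UNIQUE
        age    INTEGER CHECK (age >= 18),  -- CHECK
        salary REAL DEFAULT 5000.00        -- DEFAULT
    )
""")
conn.execute("CREATE INDEX idx_customers_name ON customers (name)")  # INDEX
conn.execute(
    "INSERT INTO customers (id, name, email, age) "
    "VALUES (1, 'Ramesh', 'r@example.com', 32)"
)

def rejected(sql):
    """Run a statement and report whether a constraint refused it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# No salary was given for Ramesh, so the DEFAULT value is used.
default_salary = conn.execute("SELECT salary FROM customers WHERE id = 1").fetchone()[0]
```

For example, `rejected("INSERT INTO customers (id, name) VALUES (2, NULL)")` returns True because the NOT NULL constraint on name refuses the row.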

Data Integrity:

The following categories of data integrity exist with each RDBMS:

Entity Integrity: There are no duplicate rows in a table.

Domain Integrity: Enforces valid entries for a given column by restricting the

type, the format, or the range of values.

Referential Integrity: Rows that are referenced by other records cannot be deleted.

User-Defined Integrity: Enforces some specific business rules that do not fall

into entity, domain or referential integrity.
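Referential integrity in particular can be demonstrated with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the customers and orders tables are hypothetical, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (id)
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ramesh')")
conn.execute("INSERT INTO orders VALUES (100, 1)")

def violates_integrity(sql):
    """Run a statement and report whether the database refused it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Customer 1 is still used by order 100, so the row cannot be deleted.
delete_refused = violates_integrity("DELETE FROM customers WHERE id = 1")
# Customer 99 does not exist, so an order for it cannot be inserted.
orphan_refused = violates_integrity("INSERT INTO orders VALUES (101, 99)")
```

Both statements are refused, which keeps every order pointing at an existing customer.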

Functional Dependency

A functional dependency (FD) is a constraint between two sets of attributes in a relation. A functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must have the same values for attributes B1, B2, ..., Bn.

Functional dependency is represented by an arrow sign (→) that is, X→Y,

where X functionally determines Y. The left-hand side attributes determine

the values of attributes on the right-hand side.
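The definition translates directly into a check over the rows of a relation: whenever two tuples agree on the left-hand side, they must agree on the right-hand side. The function `fd_holds` and the sample rows below are hypothetical. Note that a sample can show that an FD is violated, but it cannot prove that the FD holds for every future state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """True if the given rows satisfy the functional dependency lhs → rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first right-hand side seen for each left-hand side;
        # a different one later means the dependency is violated.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample rows.
EMPLOYEE = [
    {"EMP_ID": 1, "DEPT_ID": 10, "DEPT_NAME": "Sales"},
    {"EMP_ID": 2, "DEPT_ID": 10, "DEPT_NAME": "Sales"},
    {"EMP_ID": 3, "DEPT_ID": 20, "DEPT_NAME": "HR"},
]

dept_determines_name = fd_holds(EMPLOYEE, ["DEPT_ID"], ["DEPT_NAME"])
dept_determines_emp = fd_holds(EMPLOYEE, ["DEPT_ID"], ["EMP_ID"])
```

Here DEPT_ID → DEPT_NAME holds, while DEPT_ID → EMP_ID fails because department 10 has two different employees.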

Armstrong's Axioms

If F is a set of functional dependencies then the closure of F, denoted as F+, is the set of all functional dependencies logically implied by F. Armstrong's Axioms are a set of rules that, when applied repeatedly, generate the closure of a set of functional dependencies.

Reflexive rule − If alpha is a set of attributes and beta is a subset of alpha, then alpha → beta holds.

Augmentation rule − If a → b holds and y is attribute set, then ay → by also

holds. That is adding attributes in dependencies, does not change the basic

dependencies.

Transitivity rule − Same as the transitive rule in algebra: if a → b holds and b → c holds, then a → c also holds. a → b is read as: a functionally determines b.
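In practice, whether an FD follows from a set F is usually tested with the attribute closure rather than by enumerating all of F+. This technique is swapped in here for illustration; it gives the same answers as applying the three rules repeatedly. The function names and the FDs a → b and b → c are hypothetical examples.

```python
def attribute_closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already determined, so is the right.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def implies(fds, lhs, rhs):
    """True if lhs → rhs is logically implied by fds."""
    return set(rhs) <= attribute_closure(lhs, fds)

# Hypothetical dependencies: a → b and b → c.
FDS = [({"a"}, {"b"}), ({"b"}, {"c"})]

transitive = implies(FDS, {"a"}, {"c"})               # transitivity
augmented = implies(FDS, {"a", "y"}, {"b", "y"})      # augmentation with y
reflexive = implies(FDS, {"a", "b"}, {"a"})           # reflexivity
```

The closure of {a} is {a, b, c}, so a → c is implied, whereas c → a is not.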

Trivial Functional Dependency

Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X,

then it is called a trivial FD. Trivial FDs always hold.

Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is

called a non-trivial FD.

Completely non-trivial − If an FD X → Y holds, where X ∩ Y = ∅, it is said to be a completely non-trivial FD.
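The three cases depend only on how X and Y overlap, so they can be told apart with set operations. The function `classify_fd` below is a hypothetical helper that returns the most specific label that applies.

```python
def classify_fd(lhs, rhs):
    """Classify X → Y by the overlap between X (lhs) and Y (rhs)."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # Y is a subset of X
    if lhs & rhs:
        return "non-trivial"             # Y not a subset of X, but they overlap
    return "completely non-trivial"      # X and Y share no attribute

# Hypothetical attribute names.
case_trivial = classify_fd({"Stu_ID", "Stu_Name"}, {"Stu_Name"})
case_non_trivial = classify_fd({"Stu_ID", "Stu_Name"}, {"Stu_Name", "Age"})
case_complete = classify_fd({"Stu_ID"}, {"Stu_Name"})
```

A completely non-trivial FD is also non-trivial by the definition above; the function simply reports the narrower category.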

Normalization

If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a database with anomalies is next to impossible.

Update anomalies − If data items are scattered and are not linked to each

other properly, then it could lead to strange situations. For example, when we

try to update one data item having its copies scattered over several places, a

few instances get updated properly while a few others are left with old values.

Such instances leave the database in an inconsistent state.

Deletion anomalies − We tried to delete a record, but parts of it were left

undeleted because, unknown to us, the data was also saved somewhere else.

Insert anomalies − We tried to insert data into a record that does not exist at

all.

Normalization is a method to remove all these anomalies and bring the

database to a consistent state.

First Normal Form

First Normal Form is defined in the definition of relations (tables) itself. This

rule defines that all the attributes in a relation must have atomic domains.

The values in an atomic domain are indivisible units.

We re-arrange the relation (table) as below, to convert it to First Normal

Form.

Each attribute must contain only a single value from its pre-defined domain.
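As an illustration of the atomic-value rule, the sketch below splits rows whose attribute holds several values into one row per value. The Course/Content table is a made-up example, not the relation from the original figure, and the function name is an assumption.

```python
def to_first_normal_form(rows, multi_valued_attr):
    """Return rows in which `multi_valued_attr` holds exactly one value each."""
    flat = []
    for row in rows:
        for value in row[multi_valued_attr]:
            new_row = dict(row)                 # copy the other attributes
            new_row[multi_valued_attr] = value  # keep a single, atomic value
            flat.append(new_row)
    return flat


# Hypothetical unnormalized relation: Content holds a list of values.
courses = [
    {"Course": "Programming", "Content": ["Java", "C++"]},
    {"Course": "Web", "Content": ["HTML", "PHP", "ASP"]},
]

for row in to_first_normal_form(courses, "Content"):
    print(row)
```

The five resulting rows repeat the Course value, which is the price of atomic domains and the reason the later normal forms are needed.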

Second Normal Form

Before we learn about the second normal form, we need to understand the

following −

Prime attribute − An attribute that is a part of the prime key (that is, of a

candidate key) is known as a prime attribute.

Non-prime attribute − An attribute that is not a part of the prime key is

said to be a non-prime attribute.

If we follow second normal form, then every non-prime attribute should be

fully functionally dependent on the prime key. That is, if X → A holds for a key X,

then there should not be any proper subset Y of X for which Y → A also

holds true.

We see here in the Student_Project relation that the prime key attributes are

Stu_ID and Proj_ID. According to the rule, the non-key attributes, i.e.

Stu_Name and Proj_Name, must be dependent upon both and not on any one of

the prime key attributes individually. But we find that Stu_Name can be

identified by Stu_ID and Proj_Name can be identified by Proj_ID

independently. This is called partial dependency, which is not allowed in

Second Normal Form.

We broke the relation in two as depicted in the above picture. So there

exists no partial dependency.
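A partial dependency can be spotted by looking for a dependency whose left-hand side is a proper subset of the key and whose right-hand side contains a non-prime attribute. The sketch below does this for the Student_Project attributes named above. It is a simplification that inspects only the dependencies it is given, not everything they imply, and the function name is an assumption.

```python
def partial_dependencies(key, non_prime, fds):
    """Return (part_of_key, attribute) pairs that violate Second Normal Form."""
    violations = []
    for lhs, rhs in fds:
        if lhs < key:                              # proper subset of the key
            for attr in sorted(rhs & non_prime):
                violations.append((sorted(lhs), attr))
    return violations


key = {"Stu_ID", "Proj_ID"}
non_prime = {"Stu_Name", "Proj_Name"}
fds = [
    ({"Stu_ID"}, {"Stu_Name"}),
    ({"Proj_ID"}, {"Proj_Name"}),
]

print(partial_dependencies(key, non_prime, fds))
# [(['Stu_ID'], 'Stu_Name'), (['Proj_ID'], 'Proj_Name')]
```

Each reported pair suggests a relation to split off, with the listed part of the key as its own key.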

Third Normal Form

For a relation to be in Third Normal Form, it must be in Second Normal Form

and must satisfy the following −

No non-prime attribute is transitively dependent on a prime key attribute.

For any non-trivial functional dependency X → A, either −

o X is a superkey, or

o A is a prime attribute.

We find that in the above Student_detail relation, Stu_ID is the key and

the only prime key attribute. We find that City can be identified by Stu_ID as

well as by Zip itself. Zip is not a superkey, nor is City a prime attribute.

Additionally, Stu_ID → Zip → City, so there exists a transitive dependency.

To bring this relation into third normal form, we break the relation into two

relations as follows −

Boyce-Codd Normal Form

Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form.

BCNF states that −

For any non-trivial functional dependency, X → A, X must be a super-key.

In the above image, Stu_ID is the super-key in the relation Student_Detail

and Zip is the super-key in the relation ZipCodes. So,

Stu_ID → Stu_Name, Zip

and

Zip → City

Which confirms that both the relations are in BCNF.
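The Third Normal Form and BCNF conditions can both be checked with a superkey test based on attribute closure. The sketch below applies them to the Student_Detail example, first to the original relation with City included and then to the two relations after the split. Function names are illustrative assumptions, and the checks look only at the listed dependencies.

```python
def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    return closure(attrs, fds) >= relation


def third_nf_violations(relation, fds, prime):
    """Pairs (X, A) where X is not a superkey and A is not a prime attribute."""
    bad = []
    for lhs, rhs in fds:
        if is_superkey(lhs, relation, fds):
            continue
        for attr in sorted(rhs - lhs):          # ignore the trivial part
            if attr not in prime:
                bad.append((sorted(lhs), attr))
    return bad


def bcnf_violations(relation, fds):
    """Non-trivial FDs X -> Y where X is not a superkey."""
    return [(sorted(lhs), sorted(rhs)) for lhs, rhs in fds
            if not rhs <= lhs and not is_superkey(lhs, relation, fds)]


# Original relation, before decomposition.
student_detail = {"Stu_ID", "Stu_Name", "City", "Zip"}
fds = [({"Stu_ID"}, {"Stu_Name", "Zip"}), ({"Zip"}, {"City"})]
print(third_nf_violations(student_detail, fds, prime={"Stu_ID"}))
# [(['Zip'], 'City')]

# After decomposition into Student_Detail and ZipCodes.
print(bcnf_violations({"Stu_ID", "Stu_Name", "Zip"}, [({"Stu_ID"}, {"Stu_Name", "Zip"})]))  # []
print(bcnf_violations({"Zip", "City"}, [({"Zip"}, {"City"})]))                              # []
```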

Multivalued Dependencies

1. Functional dependencies rule out certain tuples from appearing in a relation.

If A → B, then we cannot have two tuples with the same A value but

different B values.

2. Multivalued dependencies do not rule out the existence of certain tuples.

Instead, they require that other tuples of a certain form be present in the

relation.

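3. Let R be a relation schema, and let α ⊆ R and β ⊆ R.

The multivalued dependency

α →→ β

holds on R if in any legal relation r(R), for all pairs of tuples t1 and t2 in r such

that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:

t1[α] = t2[α] = t3[α] = t4[α]

t3[β] = t1[β]

t3[R − β] = t2[R − β]

t4[β] = t2[β]

t4[R − β] = t1[R − β]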

Normalization Using Multivalued Dependencies (not to be covered)

1. Suppose that in our banking example, we had an alternative design including

the schema:

2. BC-schema = (loan#, cname, street, ccity)

3. We can see this is not BCNF, as the functional dependency

cname → street ccity

holds on this schema, and cname is not a superkey.

4. If we have customers who have several addresses, though, then we no longer

wish to enforce this functional dependency, and the schema is in BCNF.

5. However, we now have the repetition of information problem. For each

address, we must repeat the loan numbers for a customer, and vice versa.
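The repetition described here is exactly what a multivalued dependency requires: whenever two tuples agree on cname, the tuples obtained by swapping their address parts must also be present. The sketch below checks that condition on a small relation over BC-schema. The rows are invented sample data, `loan` stands in for loan#, and the function name is an assumption.

```python
from itertools import product


def satisfies_mvd(rows, alpha, beta):
    """Check the multivalued dependency alpha ->> beta on a list of dict rows."""
    if not rows:
        return True
    alpha, beta = set(alpha), set(beta)
    rest = set(rows[0]) - alpha - beta
    present = {tuple(sorted(r.items())) for r in rows}
    # Ordered pairs cover both required tuples (t3 and t4 of the definition).
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in alpha):
            # Required tuple: alpha and beta values from t1, the rest from t2.
            needed = {a: t1[a] for a in alpha | beta}
            needed.update({a: t2[a] for a in rest})
            if tuple(sorted(needed.items())) not in present:
                return False
    return True


# Invented data: one customer with two loans and two addresses.
rows = [
    {"loan": "L-23", "cname": "Smith", "street": "North", "ccity": "Rye"},
    {"loan": "L-23", "cname": "Smith", "street": "Main", "ccity": "Manchester"},
    {"loan": "L-93", "cname": "Smith", "street": "North", "ccity": "Rye"},
    {"loan": "L-93", "cname": "Smith", "street": "Main", "ccity": "Manchester"},
]

print(satisfies_mvd(rows, {"cname"}, {"street", "ccity"}))       # True
print(satisfies_mvd(rows[:3], {"cname"}, {"street", "ccity"}))   # False
```

Dropping any one of the four rows breaks the dependency, which shows why every loan number has to be repeated for every address.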

DBMS – Joins

Join is a combination of a Cartesian product followed by a selection

process. A Join operation pairs two tuples from different relations, if and

only if a given join condition is satisfied.

We will briefly describe various join types in the following sections.

Theta (θ) Join

Theta join combines tuples from different relations provided they satisfy the

theta condition. The join condition is denoted by the symbol θ.

Notation

R1 ⋈θ R2

R1 and R2 are relations having attributes (A1, A2, ..., An) and (B1, B2, ...,

Bn) such that the attributes don't have anything in common, that is R1 ∩

R2 = Φ.

Theta join can use all kinds of comparison operators.

Student

SID Name Std

101 Alex 10

102 Maria 11

Subjects

Class Subject

10 Math

10 English

11 Music

11 Sports

Student_Detail −

STUDENT ⋈Student.Std = Subject.Class SUBJECT

Student_detail

SID Name Std Class Subject

101 Alex 10 10 Math

101 Alex 10 10 English

102 Maria 11 11 Music

102 Maria 11 11 Sports

Equijoin

When Theta join uses only the equality comparison operator, it is said to be

an equijoin. The above example corresponds to an equijoin.
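Since a join is a Cartesian product followed by a selection, a theta join can be sketched in a few lines. The example below uses the Student and Subjects data from the tables above. Representing a relation as a list of dictionaries, and the function name, are assumptions made for illustration.

```python
from itertools import product


def theta_join(r1, r2, condition):
    """Cartesian product of r1 and r2, keeping only pairs that satisfy `condition`."""
    return [{**t1, **t2} for t1, t2 in product(r1, r2) if condition(t1, t2)]


student = [
    {"SID": 101, "Name": "Alex", "Std": 10},
    {"SID": 102, "Name": "Maria", "Std": 11},
]
subjects = [
    {"Class": 10, "Subject": "Math"},
    {"Class": 10, "Subject": "English"},
    {"Class": 11, "Subject": "Music"},
    {"Class": 11, "Subject": "Sports"},
]

# Equijoin: theta is the equality Student.Std = Subjects.Class.
student_detail = theta_join(student, subjects, lambda s, c: s["Std"] == c["Class"])
for row in student_detail:
    print(row)

# Any other comparison operator can serve as theta, for example "<".
lower_std = theta_join(student, subjects, lambda s, c: s["Std"] < c["Class"])
print(len(lower_std))   # 2
```

Real database systems avoid materializing the full product. The nested enumeration here is only meant to mirror the definition.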

Natural Join (⋈)

Natural join does not use any comparison operator. It does not concatenate

the way a Cartesian product does. We can perform a Natural Join only if

there is at least one common attribute that exists between two relations. In

addition, the attributes must have the same name and domain.

Natural join acts on those matching attributes where the values of the

attributes in both the relations are the same.

Courses

CID Course Dept

CS01 Database CS

ME01 Mechanics ME

EE01 Electronics EE

HoD

Dept Head

CS Alex

ME Maya

EE Mira

Courses ⋈ HoD

Dept CID Course Head

CS CS01 Database Alex

ME ME01 Mechanics Maya

EE EE01 Electronics Mira
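A natural join can be sketched by finding the attribute names the two relations share, matching tuples on them, and keeping each shared attribute only once. The example below uses the Courses and HoD data above. The function name is an assumption, and raising an error when there is no common attribute follows the restriction stated in this section.

```python
def natural_join(r1, r2):
    """Join two lists of dict rows on all attribute names they have in common."""
    if not r1 or not r2:
        return []
    common = [a for a in r1[0] if a in r2[0]]
    if not common:
        raise ValueError("natural join needs at least one common attribute")
    result = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in common):
                result.append({**t1, **t2})    # shared attributes appear once
    return result


courses = [
    {"CID": "CS01", "Course": "Database", "Dept": "CS"},
    {"CID": "ME01", "Course": "Mechanics", "Dept": "ME"},
    {"CID": "EE01", "Course": "Electronics", "Dept": "EE"},
]
hod = [
    {"Dept": "CS", "Head": "Alex"},
    {"Dept": "ME", "Head": "Maya"},
    {"Dept": "EE", "Head": "Mira"},
]

for row in natural_join(courses, hod):
    print(row)
```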

Outer Joins

Theta Join, Equijoin, and Natural Join are called inner joins. An inner join

includes only those tuples with matching attributes and the rest are

discarded in the resulting relation. Therefore, we need to use outer joins to

include all the tuples from the participating relations in the resulting

relation. There are three kinds of outer joins − left outer join, right outer

join, and full outer join.

Left Outer Join (R ⟕ S)

All the tuples from the Left relation, R, are included in the resulting relation.

If there are tuples in R without any matching tuple in the Right relation S,

then the S-attributes of the resulting relation are made NULL.

Left

A B

100 Database

101 Mechanics

102 Electronics

Right

A B

100 Alex

102 Maya

104 Mira

Courses ⟕ HoD

A B C D

100 Database 100 Alex

101 Mechanics --- ---

102 Electronics 102 Maya

Right Outer Join (R ⟖ S)

All the tuples from the Right relation, S, are included in the resulting

relation. If there are tuples in S without any matching tuple in R, then the

R-attributes of the resulting relation are made NULL.

Courses ⟖ HoD

A B C D

100 Database 100 Alex

102 Electronics 102 Maya

--- --- 104 Mira

Full Outer Join (R ⟗ S)

All the tuples from both participating relations are included in the resulting

relation. If a tuple has no matching tuple in the other relation, the attributes

of the other relation are made NULL.

Courses ⟗ HoD

A B C D

100 Database 100 Alex

101 Mechanics --- ---

102 Electronics 102 Maya

--- --- 104 Mira
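The three outer joins differ only in which unmatched tuples are kept, so one function with a mode switch can sketch all of them. The data is the Left and Right example above, with the right-hand attributes named C and D as in the result tables. `None` stands in for NULL, and the function name and `how` parameter are assumptions.

```python
def outer_join(left, right, left_key, right_key, how="full"):
    """Left, right, or full outer join of two lists of dict rows."""
    left_attrs, right_attrs = list(left[0]), list(right[0])
    result = []
    matched_right = set()
    for lt in left:
        partners = [i for i, rt in enumerate(right) if lt[left_key] == rt[right_key]]
        for i in partners:
            matched_right.add(i)
            result.append({**lt, **right[i]})
        if not partners and how in ("left", "full"):
            # Unmatched left tuple: pad the right-hand attributes with NULL.
            result.append({**lt, **{a: None for a in right_attrs}})
    if how in ("right", "full"):
        for i, rt in enumerate(right):
            if i not in matched_right:
                # Unmatched right tuple: pad the left-hand attributes with NULL.
                result.append({**{a: None for a in left_attrs}, **rt})
    return result


left = [
    {"A": 100, "B": "Database"},
    {"A": 101, "B": "Mechanics"},
    {"A": 102, "B": "Electronics"},
]
right = [
    {"C": 100, "D": "Alex"},
    {"C": 102, "D": "Maya"},
    {"C": 104, "D": "Mira"},
]

for mode in ("left", "right", "full"):
    print(mode, outer_join(left, right, "A", "C", how=mode))
```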

Integrity Constraints

1. Integrity constraints provide a way of ensuring that changes made to the

database by authorized users do not result in a loss of data consistency.

2. We saw a form of integrity constraint with E-R models:

o key declarations: stipulation that certain attributes form a candidate key

for the entity set.

o form of a relationship: mapping cardinalities 1-1, 1-many and many-

many.

3. An integrity constraint can be any arbitrary predicate applied to the database.

4. They may be costly to evaluate, so we will only consider integrity constraints

that can be tested with minimal overhead.

Domain Integrity

Domain integrity means the definition of a valid set of values for an attribute. You define

- the data type,

- the length or size,

- whether a null value is allowed,

- whether the value must be unique

for an attribute.

You may also define the default value, the range (values in between) and/or specific values

for the attribute. Some DBMS allow you to define the output format and/or input mask for

the attribute.

These definitions ensure that a specific attribute will have a right and proper value in the

database.
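Most of these definitions map directly onto column constraints in SQL. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The Car table with Reg_No and Rate follows the example used in these notes, while the Status column, the length limit and the rate range are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Car (
        Reg_No TEXT NOT NULL UNIQUE CHECK (length(Reg_No) <= 7),  -- type, size, null, unique
        Rate   REAL CHECK (Rate BETWEEN 0 AND 500),               -- range; NULL is allowed
        Status TEXT NOT NULL DEFAULT 'available'                  -- default value
               CHECK (Status IN ('available', 'rented', 'broken'))  -- specific values
    )
""")
conn.execute("INSERT INTO Car (Reg_No, Rate) VALUES ('ABC-112', 45.0)")


def try_insert(sql):
    """Run an INSERT and report whether the domain constraints accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert("INSERT INTO Car (Reg_No, Rate) VALUES ('ABC-112', 50.0)"))  # duplicate Reg_No
print(try_insert("INSERT INTO Car (Reg_No, Rate) VALUES ('ABC-122', -5)"))    # Rate out of range
print(try_insert("INSERT INTO Car (Reg_No, Rate) VALUES ('ABC-123', NULL)"))  # unknown Rate
```

The first two inserts are rejected and the third is accepted, because a null Rate is permitted by the definition.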

Entity Integrity Constraint

The entity integrity constraint states that primary keys can't be null. There must be a proper

value in the primary key field.

This is because the primary key value is used to identify individual rows in a table. If there

were null values for primary keys, it would mean that we could not identify those rows.

On the other hand, there can be null values in fields other than the primary key. A null value means

that one doesn't know the value for that field. A null value is different from a zero value or a

space.

In the Car Rental database in the Car table each car must have a proper and unique

Reg_No. There might be a car whose rate is unknown - maybe the car is broken or it is

brand new - i.e. the Rate field has a null value. See the picture below.

The entity integrity constraints assure that a specific row in a table can be identified.

Picture. Car and CarType tables in the Rent database

Referential Integrity Constraint

The referential integrity constraint is specified between two tables and it is used to maintain

the consistency among rows between the two tables.

The rules are:

1. You can't delete a record from a primary table if matching records exist in a related table.

2. You can't change a primary key value in the primary table if that record has related

records.

3. You can't enter a value in the foreign key field of the related table that doesn't exist in

the primary key of the primary table.

4. However, you can enter a Null value in the foreign key, specifying that the records are

unrelated.

Examples

Rule 1. You can't delete any of the rows in the CarType table that are visible in the picture

since all the car types are in use in the Car table.

Rule 2. You can't change any of the model_ids in the CarType table since all the car types

are in use in the Car table.

Rule 3. The values that you can enter in the model_id field in the Car table must be in the

model_id field in the CarType table.

Rule 4. The model_id field in the Car table can have a null value, which means that the car

type of that car is not known.
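The four rules can be observed with a foreign key declared in SQLite through Python's `sqlite3` module. This is a sketch under assumptions: the `model` column and the helper name are invented, only one car type and one car are loaded, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")    # off by default in SQLite
conn.execute("CREATE TABLE CarType (model_id INTEGER PRIMARY KEY, model TEXT)")
conn.execute("""
    CREATE TABLE Car (
        Reg_No   TEXT PRIMARY KEY NOT NULL,
        model_id INTEGER REFERENCES CarType (model_id)
    )
""")
conn.execute("INSERT INTO CarType VALUES (1, 'Ford Focus')")
conn.execute("INSERT INTO Car VALUES ('ABC-112', 1)")


def attempt(sql):
    """Return True if the statement is accepted, False if it breaks integrity."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


print(attempt("DELETE FROM CarType WHERE model_id = 1"))                # rule 1: False
print(attempt("UPDATE CarType SET model_id = 100 WHERE model_id = 1"))  # rule 2: False
print(attempt("INSERT INTO Car VALUES ('ABC-999', 42)"))                # rule 3: False
print(attempt("INSERT INTO Car VALUES ('XYZ-001', NULL)"))              # rule 4: True
```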

Foreign Key Integrity Constraint

There are two foreign key integrity constraints: cascade update related fields and cascade

delete related rows. These constraints affect the referential integrity constraint.

Cascade Update Related Fields

Any time you change the primary key of a row in the primary table, the foreign key values

are updated in the matching rows in the related table. This constraint overrules rule 2 in

the referential integrity constraints.

If this constraint is defined in the relationship between the tables Car and CarType, it is

possible to change the model_id in the CarType table. If one should change the model_id 1

(Ford Focus) to model_id 100 in the CarType table, the model_ids in the Car table would

change from 1 to 100 (cars ABC-112, ABC-122, ABC-123).

Cascade Delete Related Rows

Any time you delete a row in the primary table, the matching rows are automatically

deleted in the related table. This constraint overrules rule 1 in the referential integrity

constraints.

If this constraint is defined in the relationship between the tables Car and CarType, it is

possible to delete rows from the CarType table. If one should delete the Ford Focus row

from the CarType table, the cars ABC-112, ABC-122, ABC-123 would be deleted from the

Car table, too.

Source: Gillette Cynthia. 2001. MSCE SQL 2000 Database Design. Chapter

2: Data Modelling. Coriolis Group.
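Both cascade options correspond to the standard SQL clauses ON UPDATE CASCADE and ON DELETE CASCADE. The sketch below replays the Ford Focus example with Python's `sqlite3` module. The `model` column is an assumption, and the other table contents are reduced to what the example mentions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")    # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE CarType (model_id INTEGER PRIMARY KEY, model TEXT)")
conn.execute("""
    CREATE TABLE Car (
        Reg_No   TEXT PRIMARY KEY NOT NULL,
        model_id INTEGER REFERENCES CarType (model_id)
                 ON UPDATE CASCADE ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO CarType VALUES (1, 'Ford Focus')")
conn.executemany("INSERT INTO Car VALUES (?, 1)",
                 [("ABC-112",), ("ABC-122",), ("ABC-123",)])

# Cascade update: changing the primary key rewrites the matching foreign keys.
conn.execute("UPDATE CarType SET model_id = 100 WHERE model_id = 1")
ids_after_update = [row[0] for row in conn.execute("SELECT DISTINCT model_id FROM Car")]

# Cascade delete: removing the car type removes its cars as well.
conn.execute("DELETE FROM CarType WHERE model_id = 100")
cars_left = conn.execute("SELECT COUNT(*) FROM Car").fetchone()[0]

print(ids_after_update, cars_left)   # [100] 0
```

Because a cascading delete can silently remove many rows, it is usually enabled only where the related rows have no meaning without their parent.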

Importance of Security in Database Environment

Database security is the protection of the database against intentional and unintentional

threats that may be computer-based or non-computer-based. Database security is the

business of the entire organization as all people use the data held in the organization's

database and any loss or corruption to data would affect the day-to-day operation of the

organization and the performance of the people. Therefore, database security

encompasses hardware, software, infrastructure, people and data of the organization.

Now there is greater emphasis on database security than in the past, as the amount of

data stored in corporate databases is increasing and people are depending more on the

corporate data for decision-making, customer service management, supply chain

management and so on. Any loss or unavailability of the corporate data will cripple

today's organization and will seriously affect its performance. The unavailability of

the database for even a few minutes could result in serious losses to the organization.

Data Security Risks

We have seen that database security is the concern of the entire organization. The

organization should identify all the risk factors and weak elements from the database

security perspective and find solutions to counter and neutralize each such threat.

A threat is any situation, event or personnel that will adversely affect the database

security and the smooth and efficient functioning of the organization. A threat may be

caused by a situation or event involving a person, action or circumstance that is likely to

bring harm to the organization. The harm may be tangible, such as loss of data, damage

to hardware, loss of software or intangible such as loss of customer goodwill or

credibility and so on.

Data Tampering

Privacy of communications is essential to ensure that data cannot be modified or viewed

in transit. The chances of data tampering are high in case of distributed environments as

data moves between sites. In a data modification attack, an unauthorized party on the

network intercepts data in transit and changes that data before retransmitting it. An

example of this is changing the amount of a banking transaction from Rs. 1000 to Rs.

10000.

Data Theft

Data must be stored and transmitted securely, so that information such as credit card

numbers cannot be stolen. Over the Internet and Wide Area Network (WAN)

environments, both public carriers and private network owners often route portions of

their network through insecure landlines, extremely vulnerable microwave and satellite

links, or a number of servers. This situation leaves valuable data open to view by any

interested party. In Local Area Network (LAN) environments within a building or

campus, insiders with access to the physical wiring can potentially view data not

intended for them.

Falsifying User Identities

In a distributed environment, it becomes more feasible for a user to falsify an identity to

gain access to sensitive and important information. Criminals attempt to steal users'

credit card numbers, and then make purchases against the accounts. Or they steal

other personal data, such as bank account numbers and driver's license numbers, and

set up bogus credit accounts in someone else's name.

Password-Related Threats

In large systems, users must remember multiple passwords for the different

applications and services that they use. Users typically respond to the problem of

managing multiple passwords in several ways:

• They may select easy-to-guess passwords.

• They may also choose to standardize passwords so that they are the same on all

machines or websites.

All these strategies compromise password secrecy and service availability. Moreover,

administration of multiple user accounts and passwords is complex, time-consuming,

and expensive.

Unauthorized Access to Tables and Columns

The database may contain confidential tables, or confidential columns in a table,

which should not be available indiscriminately to all users authorized to access the

database. It should be possible to protect data on a column level.

Unauthorized Access to Data Rows

Certain data rows may contain confidential information that should not be available

indiscriminately to users authorized to access the table. For example, in a shared

environment, businesses should have access only to their own data; customers should be

able to see only their own orders.

Lack of Accountability

If the system administrator is unable to track users' activities, then users cannot be held

responsible for their actions. There must be some reliable ways to monitor who is

performing what operations on the data.

Complex User Management Requirements

Systems must often support a large number of users and therefore must be scalable.

In such large-scale environments, the burden of managing user accounts and passwords

makes your system vulnerable to error and attack.

Security Levels

To protect the database, we must take security measures at several levels:

• Physical: The sites containing the computer systems must be secured against armed

or surreptitious entry by intruders.

• Human: Users must be authorized carefully to reduce the chance of any such user

giving access to an intruder in exchange for a bribe or other favors.

• Operating System: No matter how secure the database system is, weakness

in operating system security may serve as a means of unauthorized access to the

database.

• Network: Since almost all database systems allow remote access through terminals or

networks, software-level security within the network software is as important as

physical security, both on the Internet and in networks private to an enterprise.

• Database System: Some database-system users may be authorized to access only a

limited portion of the database. Other users may be allowed to issue queries, but may be

forbidden to modify the data. It is the responsibility of the database system to ensure that

these authorization restrictions are not violated.

Security at all these levels must be maintained if database security is to be ensured. A

weakness at a low level of security (physical or human) allows circumvention of strict

high level (database) security measures.

Data Security Requirements

We should use technology to ensure a secure computing environment for the

organization. Although it is not possible to find a technological solution for all problems,

most of the security issues could be resolved using appropriate technology. The basic

security standards which technology can ensure are confidentiality, integrity and

availability.

Confidentiality

A secure system ensures the confidentiality of data. This means that it allows individuals

to see only the data they are supposed to see. Confidentiality has several aspects like

privacy of communications, secure storage of sensitive data, authenticated users and

authorization of users.

Privacy of Communications

The DBMS should be capable of controlling the spread of confidential personal

information such as health, employment, and credit records. It should also keep the

corporate data such as trade secrets, proprietary information about products and

processes, competitive analyses, as well as marketing and sales plans secure and away

from the unauthorized people.

Secure Storage of Sensitive Data

Once confidential data has been entered, its integrity and privacy must be protected on

the databases and servers wherein it resides.

Authentication

One of the most basic concepts in database security is authentication, which is quite

simply the process by which a system verifies a user's identity. A user can respond to a

request to authenticate by providing a proof of identity, or an authentication token.

You're probably already familiar with this concept. If you have ever been asked to show a

photo ID (for example, when opening a bank account), you have been presented with

a request for authentication. You proved your identity by showing your driver's license

(or other photo ID). In this case, your driver's license served as your authentication

token.

Despite what you see in the movies, most software programs cannot use futuristic

systems such as face recognition for authentication. Instead most authentication

requests ask you to provide a user ID and a password. Your user ID represents your

claim to being a person authorized to access the environment, and the password is

protected and you are the only person who knows it.
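A user ID and password check might be sketched as follows. The notes do not say how passwords are stored, so storing a salted PBKDF2 hash instead of the password itself is an assumption (a common practice). The user table, the names and the iteration count are all illustrative.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Derive a salted hash so the password itself is never stored."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)   # constant-time comparison


# Hypothetical user store: user ID -> (salt, iterations, digest).
users = {"alice": hash_password("correct horse")}


def authenticate(user_id, password):
    """The user ID is the claim of identity; the password is the proof."""
    record = users.get(user_id)
    if record is None:
        return False
    return verify_password(password, *record)


print(authenticate("alice", "correct horse"))   # True
print(authenticate("alice", "wrong guess"))     # False
print(authenticate("mallory", "anything"))      # False
```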

Authorization

An authenticated user goes through the second layer of security, authorization.

Authorization is the process through which the system obtains information about the

authenticated user, including which database operations that user may perform and

which data objects that user may access.

Your driver's license is a perfect example of an authorization document. Though it can

be used for authentication purposes, it also authorizes you to drive a certain class of car.

Furthermore, the type of authorization you have gives you more or fewer privileges as

far as driving a vehicle goes.

A user may have several forms of authorization on parts of the database. There are the

following authorization rights.

• Read authorization allows reading, but not modification, of data.

• Insert authorization allows insertion of new data, but not modification of existing data.

• Update authorization allows modification, but not deletion of data.

• Delete authorization allows deletion of data.

A user may be assigned all, none, or a combination of these types of authorization. In

addition to these forms of authorization for access to data, a user may be granted

authorization to modify the database schema:

• Index authorization allows the creation and deletion of indexes.

• Resource authorization allows the creation of new relations.

• Alteration authorization allows the addition or deletion of attributes in a relation.

• Drop authorization allows the deletion of relations.

The drop and delete authorization differ in that delete authorization allows deletion of

tuples only. If a user deletes all tuples of a relation, the relation still exists, but it is

empty. If a relation is dropped it no longer exists. The ability to create new relations is

regulated through resource authorization. A user with resource authorization who

creates a relation is given a privilege on that relation automatically. Index authorization

is given to a user to allow fast access to data on the basis of some key field.
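Because a user may hold any combination of these rights, they can be modelled as a set of flags per user and relation. The following is a minimal sketch. The user names, the grants and the function name are made up for illustration and are not how any particular DBMS stores privileges.

```python
from enum import Flag, auto


class Privilege(Flag):
    NONE = 0
    READ = auto()
    INSERT = auto()
    UPDATE = auto()
    DELETE = auto()
    INDEX = auto()
    RESOURCE = auto()
    ALTERATION = auto()
    DROP = auto()


# Hypothetical grants: (user, relation) -> combination of privileges.
grants = {
    ("clerk", "Car"): Privilege.READ | Privilege.INSERT,
    ("manager", "Car"): (Privilege.READ | Privilege.INSERT
                         | Privilege.UPDATE | Privilege.DELETE),
}


def is_authorized(user, relation, needed):
    """True if `user` holds every privilege in `needed` on `relation`."""
    held = grants.get((user, relation), Privilege.NONE)
    return (held & needed) == needed


print(is_authorized("clerk", "Car", Privilege.READ))      # True
print(is_authorized("clerk", "Car", Privilege.DELETE))    # False
print(is_authorized("manager", "Car", Privilege.DELETE))  # True
print(is_authorized("manager", "Car", Privilege.DROP))    # False: delete is not drop
```

Keeping DELETE and DROP as separate flags mirrors the distinction drawn above: deleting all tuples leaves an empty relation, while dropping removes the relation itself.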

Integrity

A secure system ensures that the data it contains is valid. Data integrity means that

data is protected from deletion and corruption, both while it resides within the

database, and while it is being transmitted over the network. The detailed discussion on

integrity is in the next section.

Availability

A secure system makes data available to authorized users, without delay. Denial of

service attacks are attempts to block authorized users' ability to access and use the

system when needed.
