DBMS - Data Models
Database Model
A database model defines the logical design of data and describes the relationships between
different parts of the data. Historically, three models have been used in database design:
Hierarchical Model
Network Model
Relational Model
Hierarchical Model
In this model, each entity has only one parent but can have several children. At the top of the hierarchy
there is a single entity, which is called the Root.
Network Model
In the network model, entities are organised in a graph, in which some entities can be accessed
through several paths.
Relational Model
In this model, data is organised in two-dimensional tables called relations. The tables (relations) are
related to each other.
The most popular data model in DBMS is the Relational Model. It is a more
scientific model than the others: it is based on first-order predicate
logic and defines a table as an n-ary relation.
The main highlights of this model are −
Data is stored in tables called relations.
Relations can be normalized.
In normalized relations, values saved are atomic values.
Each row in a relation is unique.
Each column in a relation contains values from the same domain.
Entity-Relationship Model
The Entity-Relationship (ER) Model is based on the notion of real-world entities
and relationships among them. While formulating a real-world scenario into
the database model, the ER Model creates entity sets, relationship sets,
general attributes and constraints.
ER Model is best used for the conceptual design of a database.
ER Model is based on −
Entities and their attributes.
Relationships among entities.
These concepts are explained below.
Entity − An entity in an ER Model is a real-world entity having properties
called attributes. Every attribute is defined by its set of values
called domain. For example, in a school database, a student is considered as
an entity. Student has various attributes like name, age, class, etc.
Relationship − The logical association among entities is called a relationship.
Relationships are mapped with entities in various ways. Mapping cardinalities
define the number of associations between two entities.
Mapping cardinalities −
o one to one
o one to many
o many to one
o many to many
RDBMS Concepts
A Relational Database Management System (RDBMS) is a database management system based
on the relational model introduced by E. F. Codd. In the relational model, data is represented in terms of
tuples (rows).
An RDBMS is used to manage a relational database. A relational database is an organized
collection of tables from which data can be accessed easily. It is the most commonly used
kind of database. It consists of a number of tables, and each table has its own primary key.
What is a Table?
In a relational database, a table is a collection of data elements organised in terms of rows and
columns. A table is also considered a convenient representation of a relation. However, a table can have
duplicate tuples, while a true relation cannot. A table is the simplest form
of data storage. Below is an example of an Employee table.
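+----+--------+-----+--------+
| ID | Name   | Age | Salary |
+----+--------+-----+--------+
| 1  | Adam   | 34  | 13000  |
| 2  | Alex   | 28  | 15000  |
| 3  | Stuart | 20  | 18000  |
| 4  | Ross   | 42  | 19020  |
+----+--------+-----+--------+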
What is a Record?
A single entry in a table is called a Record or Row. A record in a table represents a set of related
data. For example, the above Employee table has 4 records. Following is an example of a single
record.
1 Adam 34 13000
What is a Field?
A table consists of several records (rows); each record can be broken into several smaller entities
known as Fields. The above Employee table consists of four fields: ID, Name, Age and Salary.
What is a Column?
In a relational table, a column is a set of values of a particular type. The term Attribute is also used to
represent a column. For example, in the Employee table, Name is a column that represents the names of
employees.
Name
Adam
Alex
Stuart
Ross
Database Keys
Keys are a very important part of a relational database. They are used to establish and identify relationships
between tables. They also ensure that each record within a table can be uniquely identified by
a combination of one or more fields within the table.
Super Key
A Super Key is a set of attributes within a table that uniquely identifies each record within the
table. Every candidate key is a super key, and every super key contains a candidate key.
Candidate Key
Candidate keys are the minimal super keys of a table, from which the primary key is selected. A candidate key is an
attribute or set of attributes that can act as a primary key for a table, uniquely identifying each record in
that table.
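As a rough illustration, the Python sketch below checks which column combinations uniquely identify rows in the sample Employee table. The helper names (`is_super_key`, `candidate_keys`) and the fifth row are hypothetical and added only so that some columns repeat. Uniqueness in one sample only suggests a key; a real key is a rule that must hold for every valid state of the table.

```python
from itertools import combinations

COLUMNS = ("ID", "Name", "Age", "Salary")
ROWS = [
    (1, "Adam", 34, 13000),
    (2, "Alex", 28, 15000),
    (3, "Stuart", 20, 18000),
    (4, "Ross", 42, 19020),
    (5, "Alex", 34, 15000),  # hypothetical extra row, not in the original table
]

def is_super_key(attrs):
    """True if the given columns take distinct values on every row."""
    idx = [COLUMNS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)

def candidate_keys():
    """Minimal super keys: no smaller key found earlier is contained in them."""
    keys = []
    for size in range(1, len(COLUMNS) + 1):
        for attrs in combinations(COLUMNS, size):
            if is_super_key(attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys

print(candidate_keys())
```

Here ("ID", "Name") is a super key but not a candidate key, because ID alone is already sufficient.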
Primary Key
The primary key is the candidate key that is most appropriate to become the main key of the table. It is a key
that uniquely identifies each record in a table.
Composite Key
A key that consists of two or more attributes that together uniquely identify an entity occurrence is
called a Composite key. No single attribute that makes up the composite key is a key on its
own.
Secondary or Alternative key
The candidate keys that are not selected as the primary key are known as secondary keys or
alternative keys.
Non-key Attribute
Non-key attributes are attributes that are not part of any candidate key in a table.
Non-prime Attribute
Non-prime attributes are attributes other than prime (primary key) attributes.
E-R Diagram
An ER diagram is a visual representation of data that describes how data items are related to each other.
Symbols and Notations
Components of E-R Diagram
The E-R diagram has three main components.
1) Entity
An Entity can be any object, place, person or class. In E-R Diagram, an entity is represented using
rectangles. Consider an example of an Organisation. Employee, Manager, Department, Product and
many more can be taken as entities from an Organisation.
Weak Entity
A weak entity is an entity that depends on another entity. A weak entity doesn't have a key attribute of its
own. A double rectangle represents a weak entity.
2) Attribute
Entities are represented by means of their properties, called attributes. All
attributes have values. For example, a student entity may have name,
class, and age as attributes.
Types of Attributes
Simple attribute − Simple attributes are atomic values, which cannot be
divided further. For example, a student's phone number is an atomic value of
10 digits.
Composite attribute − Composite attributes are made of more than one
simple attribute. For example, a student's complete name may have first_name
and last_name.
Derived attribute − Derived attributes are the attributes that do not exist in
the physical database, but their values are derived from other attributes present
in the database. For example, average_salary in a department should not be
saved directly in the database, instead it can be derived. For another example,
age can be derived from date_of_birth.
Single-value attribute − Single-value attributes contain a single value. For
example − Social_Security_Number.
Multi-value attribute − Multi-value attributes may contain more than one
value. For example, a person can have more than one phone number,
email_address, etc.
These attribute types can come together in a way like −
simple single-valued attributes
simple multi-valued attributes
composite single-valued attributes
composite multi-valued attributes
An Attribute describes a property or characteristic of an entity. For example, Name, Age, Address
etc. can be attributes of a Student. An attribute is represented using an ellipse.
Key Attribute
A key attribute represents the main characteristic of an Entity. It is used to represent the Primary key.
An ellipse with the attribute name underlined represents a Key Attribute.
Composite Attribute
An attribute can also have its own attributes. Such an attribute is known as a Composite attribute.
3) Relationship
A Relationship describes relations between entities. Relationship is represented using diamonds.
There are three types of relationship that exist between Entities.
Binary Relationship
Recursive Relationship
Ternary Relationship
Binary Relationship
Binary Relationship means a relation between two Entities. This is further divided into four types.
1. One to One : This type of relationship is rarely seen in the real world.
The above example describes that one student can enroll in only one course and a course will
also have only one Student. This is not what you will usually see in a relationship.
2. One to Many : It reflects the business rule that one entity is associated with many instances of another
entity. The example for this relation might sound a little odd, but it means that one student
can enroll in many courses, while one course will have only one Student.
The arrows in the diagram describe that one student can enroll for only one course.
3. Many to One : It reflects the business rule that many entities can be associated with just one entity.
For example, a Student enrolls in only one Course, but a Course can have many Students.
4. Many to Many :
The above diagram represents that many students can enroll in more than one course.
Recursive Relationship
When an Entity is related with itself it is known as Recursive Relationship.
Ternary Relationship
Relationship of degree three is called Ternary relationship.
Relationship
The association among entities is called a relationship. For example, an
employee works_at a department, a student enrolls in a course. Here,
Works_at and Enrolls are called relationships.
Relationship Set
A set of relationships of similar type is called a relationship set. Like
entities, a relationship too can have attributes. These attributes are
called descriptive attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of
the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree n
Mapping Cardinalities
Cardinality defines the number of entities in one entity set that can be
associated with entities of the other set via a relationship set.
One-to-one − One entity from entity set A can be associated with at most one
entity of entity set B and vice versa.
One-to-many − One entity from entity set A can be associated with more than
one entity of entity set B; however, an entity from entity set B can be
associated with at most one entity of entity set A.
Many-to-one − An entity from entity set A can be associated with
at most one entity of entity set B; however, an entity from entity set B can be
associated with more than one entity from entity set A.
Many-to-many − One entity from A can be associated with more than one
entity from B and vice versa.
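The four cases can be told apart mechanically for a given set of associations. The following Python sketch is only an illustration: the function name `mapping_cardinality` and the sample pairs are made up, and it classifies one observed instance, whereas a cardinality in a design is a rule that every instance must obey.

```python
def mapping_cardinality(pairs):
    """Classify a relationship instance given as (a, b) pairs."""
    a_to_b = {}
    b_to_a = {}
    for a, b in set(pairs):
        a_to_b.setdefault(a, set()).add(b)
        b_to_a.setdefault(b, set()).add(a)
    # Does some A entity relate to several B entities, and the other way round?
    a_has_many = any(len(bs) > 1 for bs in a_to_b.values())
    b_has_many = any(len(as_) > 1 for as_ in b_to_a.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"

# Hypothetical (student, course) enrollments
print(mapping_cardinality([("s1", "c1"), ("s1", "c2"), ("s2", "c1")]))
```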
Enhanced Entity Relationship Diagram
An enhanced entity-relationship diagram, or EERD, is a
specialized model that deviates from traditional ERDs. It
uses several concepts that are closely related to
object-oriented design and programming.
What is an Enhanced ERD?
An enhanced entity-relationship model, also known as an extended entity-relationship
model, is a type of database diagram that's similar to regular ERDs. Enhanced ERDs are
high-level conceptual models that accurately represent the requirements of complex
databases.
Enhanced ERDs include the same concepts that ordinary ER diagrams encompass. In
addition, EERDs include:
Subtypes and supertypes (sometimes known as subclasses and superclasses)
Specialization or generalization
Category or union type
Attribute and relationship inheritance
Enhanced ERD Definitions and Examples
The modeling concepts of EERDs differ somewhat from those of ERDs. See the list below
for definitions of concepts that are unique to enhanced entity-relationship diagrams.
SUPERTYPES & SUBTYPES
Supertype - an entity type that has a relationship with one or more subtypes.
Subtype - a subgroup of entities with unique attributes.
Inheritance - the idea that subtype entities inherit the values of all supertype attributes.
Remember that a subtype instance is also classified as a supertype instance.
GENERALIZATION & SPECIALIZATION
Generalization - the process of defining a general entity type from a collection of
specialized entity types.
Specialization - the inverse of generalization, since it defines subtypes of the supertype
and forms relationships between supertype and subtype.
CONSTRAINTS
Disjointness constraints - decide whether a supertype instance may simultaneously be a
member of two or more subtypes. The disjoint rule forces subclasses to have disjoint sets
of entities. The overlap rule allows subclasses to have overlapping sets of entities, so a
supertype instance may belong to more than one subtype.
Completeness constraints - decide whether a supertype instance must also be a member
of at least one subtype. The total specialization rule demands that every entity in the
superclass belong to some subclass. Just as with a regular ERD, total specialization is
symbolized with a double line connection between entities. The partial specialization
rule allows an entity to not belong to any of the subclasses. It is represented with a single
line connection.
SUBTYPE DISCRIMINATORS
A subtype discriminator is an attribute of the supertype that indicates an entity's subtype.
The attribute's values are what determine the target subtype.
Disjoint subtypes - simple attributes that must have alternative values to indicate any
possible subtypes.
Overlapping subtypes - composite attributes whose subparts pertain to various subtypes.
Each subpart has a Boolean value that indicates whether or not the instance belongs to the
associated subtype.
How to Create an Effective EERD
Just like entity-relationship diagrams, a well-designed EERD will help you build storage
systems that are long-lasting and useful. When evaluating the effectiveness of an entity
relationship diagram, be sure that you’re modeling a system design that will meet
important business requirements. Possible considerations are:
Stability - will the diagram support business needs that change over time?
Breadth - can this diagram accommodate all of the data we need to store?
Flexibility - can data in this model be re-allocated to support additional information
requirements?
Efficiency - does this model represent the simplest solution? Is data modeled with the
appropriate symbols?
Accessibility - can both creators and end users of the ERD easily understand it?
Conformity - will the design integrate seamlessly with any existing database structure?
Generalization
Generalization is a bottom-up approach in which two or more lower-level entities combine to form a
higher-level entity. In generalization, the higher-level entity can also combine with other lower-level
entities to make a still higher-level entity.
Specialization
Specialization is the opposite of Generalization. It is a top-down approach in which one higher-level
entity can be broken down into two or more lower-level entities. In specialization, some higher-level
entities may not have lower-level entity sets at all.
Aggregation
Aggregation is a process in which a relation between two entities is treated as a single entity. For example, the
relation between Center and Course acts as an entity in relation with Visitor.
Relational Algebra
Relational algebra is a procedural query language, which takes instances of
relations as input and yields instances of relations as output. It uses
operators to perform queries. An operator can be either unary or binary.
They accept relations as their input and yield relations as their output.
Relational algebra is performed recursively on a relation and intermediate
results are also considered relations.
The fundamental operations of relational algebra are as follows −
Select
Project
Union
Set difference
Cartesian product
Rename
We will discuss all these operations in the following sections.
Select Operation (σ)
It selects tuples that satisfy the given predicate from a relation.
Notation − σp(r)
Where σ is the selection operator, p is the selection predicate and r stands for the relation. p is a
propositional logic formula which may use connectors like and, or, and not.
These terms may use relational operators like − =, ≠, ≥, <, >, ≤.
For example −
σsubject = "database"(Books)
Output − Selects tuples from books where subject is 'database'.
σsubject = "database" and price = "450"(Books)
Output − Selects tuples from books where subject is 'database' and 'price'
is 450.
σsubject = "database" and price = "450" or year > "2010"(Books)
Output − Selects tuples from books where subject is 'database' and 'price'
is 450 or those books published after 2010.
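The three selections above can be mimicked in Python by filtering a list of rows with a predicate. This is only a sketch: the `select` helper and the contents of `BOOKS` are invented for illustration, and prices and years are stored as numbers rather than quoted strings.

```python
# Hypothetical Books relation; the rows are made up for illustration.
BOOKS = [
    {"title": "DB Basics", "subject": "database", "price": 450, "year": 2008},
    {"title": "SQL Guide", "subject": "database", "price": 300, "year": 2012},
    {"title": "Networks", "subject": "networking", "price": 450, "year": 2015},
]

def select(predicate, relation):
    """Selection: keep only the tuples for which the predicate holds."""
    return [row for row in relation if predicate(row)]

db_books = select(lambda t: t["subject"] == "database", BOOKS)
db_450 = select(lambda t: t["subject"] == "database" and t["price"] == 450, BOOKS)
# "and" binds tighter than "or", matching the reading of the third expression.
db_450_or_recent = select(
    lambda t: t["subject"] == "database" and t["price"] == 450 or t["year"] > 2010,
    BOOKS,
)
```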
Project Operation (∏)
It projects (keeps) the listed columns of a relation and discards the others.
Notation − ∏A1, A2, ..., An (r)
Where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as a relation is a set.
For example −
∏subject, author (Books)
Selects and projects columns named as subject and author from the
relation Books.
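A small Python sketch of projection, including the removal of duplicates, might look as follows. The `project` helper and the sample rows are assumptions made for this example.

```python
# Hypothetical Books relation; two rows share the same subject and author.
BOOKS = [
    {"title": "DB Basics", "subject": "database", "author": "Rao"},
    {"title": "DB Advanced", "subject": "database", "author": "Rao"},
    {"title": "Networks", "subject": "networking", "author": "Lee"},
]

def project(attrs, relation):
    """Projection: keep the named attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result

subject_author = project(("subject", "author"), BOOKS)
```

Three input rows yield two output rows, because the two database books by the same author collapse into one tuple.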
Union Operation (∪)
It performs binary union between two given relations and is defined as −
r ∪ s = { t | t ∈ r or t ∈ s}
Notation − r ∪ s
Where r and s are either database relations or relation result set
(temporary relation).
For a union operation to be valid, the following conditions must hold −
r, and s must have the same number of attributes.
Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have either written a book
or an article or both.
Set Difference (−)
The result of the set difference operation is the set of tuples that are present in one
relation but not in the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
∏ author (Books) − ∏ author (Articles)
Output − Provides the name of authors who have written books but not
articles.
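Because relations are sets of tuples, union and set difference map directly onto Python's set operators. In this sketch the author names are made up, and `union_compatible` is a hypothetical helper that checks only the "same number of attributes" condition, not domain compatibility.

```python
# Hypothetical results of projecting author from Books and from Articles.
book_authors = {("Rao",), ("Lee",), ("Khan",)}
article_authors = {("Lee",), ("Mehta",)}

def union_compatible(r, s):
    """Both relations must have the same number of attributes."""
    arities = {len(t) for t in r | s}
    return len(arities) <= 1

compatible = union_compatible(book_authors, article_authors)

either = book_authors | article_authors      # r ∪ s: wrote a book, an article, or both
books_only = book_authors - article_authors  # r − s: wrote books but not articles
```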
Cartesian Product (×)
Combines information of two different relations into one.
Notation − r × s
Where r and s are relations and their output will be defined as −
r × s = { q t | q ∈ r and t ∈ s}
σauthor = 'tutorialspoint'(Books × Articles)
Output − Yields a relation, which shows all the books and articles written
by tutorialspoint.
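A sketch of the product followed by a selection, using `itertools.product`. The two relations and their column layout are hypothetical. Since both relations have an author column, this example assumes the selection requires both authors to be 'tutorialspoint'.

```python
from itertools import product

# Hypothetical relations: Books(title, author) and Articles(headline, author)
books = [("DB Basics", "tutorialspoint"), ("Networks", "Lee")]
articles = [("Intro to SQL", "tutorialspoint"), ("Routing", "Lee")]

# Cartesian product: every tuple of books concatenated with every tuple of articles.
cartesian = [q + t for q, t in product(books, articles)]

# Selection on the product: keep rows where both author columns match.
by_tp = [
    row for row in cartesian
    if row[1] == "tutorialspoint" and row[3] == "tutorialspoint"
]
```

The product has len(books) * len(articles) rows, which is why it is usually combined with a selection.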
Rename Operation (ρ)
The results of relational algebra are also relations but without any name.
The rename operation allows us to rename the output relation. The 'rename'
operation is denoted with the small Greek letter rho (ρ).
Notation − ρ x (E)
Where the result of expression E is saved with the name x.
Additional operations are −
Set intersection
Assignment
Natural join
Relational Calculus
In contrast to Relational Algebra, Relational Calculus is a non-procedural
query language, that is, it tells what to do but never explains how to do it.
Relational calculus exists in two forms −
Tuple Relational Calculus (TRC)
Filtering variable ranges over tuples
Notation − {T | Condition}
Returns all tuples T that satisfy a condition.
For example −
{ T.name | Author(T) AND T.article = 'database' }
Output − Returns tuples with 'name' from Author who has written article
on 'database'.
TRC can be quantified. We can use Existential (∃) and Universal Quantifiers
(∀).
For example −
{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}
Output − The above query will yield the same result as the previous one.
Domain Relational Calculus (DRC)
In DRC, the filtering variable uses the domain of attributes instead of entire
tuple values (as done in TRC, mentioned above).
Notation −
{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where a1, a2 are attributes and P stands for formulae built by inner
attributes.
For example −
{< article, page, subject > | < article, page, subject > ∈ TutorialsPoint ∧ subject = 'database'}
Output − Yields Article, Page, and Subject from the relation TutorialsPoint,
where subject is database.
Just like TRC, DRC can also be written using existential and universal
quantifiers. DRC also involves relational operators.
The expressive power of Tuple Relational Calculus and Domain Relational
Calculus is equivalent to that of Relational Algebra.
Relational Calculus
Relational calculus is a non-procedural query language. It uses mathematical predicate calculus
instead of algebra. It describes the result that the query should return, whereas relational
algebra gives the method to get the result. It informs the system what to do with the relation, but
does not inform how to perform it.
For example, the steps involved in listing all the students who attend the ‘Database’ course in relational
algebra would be:
SELECT the tuples from the COURSE relation with COURSE_NAME = ‘DATABASE’
PROJECT the COURSE_ID from the above result
SELECT the tuples from the STUDENT relation with the COURSE_ID obtained above.
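The three steps can be traced in Python one after another. The contents of `COURSE` and `STUDENT` and the `NAME` column are hypothetical. The step-by-step order is the point here, since that is what makes the algebra procedural.

```python
# Hypothetical COURSE and STUDENT relations
COURSE = [
    {"COURSE_ID": 1, "COURSE_NAME": "DATABASE"},
    {"COURSE_ID": 2, "COURSE_NAME": "NETWORKS"},
]
STUDENT = [
    {"NAME": "Asha", "COURSE_ID": 1},
    {"NAME": "Ben", "COURSE_ID": 2},
    {"NAME": "Chen", "COURSE_ID": 1},
]

# Step 1: SELECT the COURSE tuples with COURSE_NAME = 'DATABASE'
db_courses = [c for c in COURSE if c["COURSE_NAME"] == "DATABASE"]

# Step 2: PROJECT the COURSE_ID from the result of step 1
course_ids = {c["COURSE_ID"] for c in db_courses}

# Step 3: SELECT the STUDENT tuples whose COURSE_ID came out of step 2
db_students = [s for s in STUDENT if s["COURSE_ID"] in course_ids]
```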
There are two types of relational calculus - Tuple Relational Calculus (TRC) and Domain Relational
Calculus (DRC).
Tuple relational calculus is a non-procedural query language which specifies the tuples to select from
a relation. It can select tuples with a range of values, tuples with certain attribute values, etc. The
resulting relation can have one or more tuples. It is denoted as below:
{t | P (t)} or {t | condition (t)} -- this is also known as an expression of relational calculus
Where t is the resulting tuple and P(t) is the condition used to fetch t.
{t | EMPLOYEE (t) and t.SALARY>10000} - implies that it selects the tuples from the EMPLOYEE
relation such that the resulting employee tuples have a salary greater than 10000. It is an example of
selecting a range of values.
{t | EMPLOYEE (t) AND t.DEPT_ID = 10} – this selects all the tuples of employees who work for
Department 10.
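Python's set comprehensions read almost like the {t | P(t)} notation, so the two queries above can be sketched as follows. The `Employee` tuple type and the three sample employees are assumptions made for this example.

```python
from collections import namedtuple

Employee = namedtuple("Employee", ["EMP_ID", "EMP_NAME", "SALARY", "DEPT_ID"])

# Hypothetical EMPLOYEE relation
EMPLOYEE = {
    Employee(1, "Alex", 12000, 10),
    Employee(2, "Mira", 9000, 10),
    Employee(3, "John", 15000, 20),
}

# {t | EMPLOYEE(t) and t.SALARY > 10000}
high_paid = {t for t in EMPLOYEE if t.SALARY > 10000}

# {t | EMPLOYEE(t) AND t.DEPT_ID = 10}
dept_10 = {t for t in EMPLOYEE if t.DEPT_ID == 10}
```

The comprehension states which tuples are wanted and leaves the evaluation order to the interpreter, which is the sense in which the calculus is non-procedural.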
The variable that ranges over tuples in the condition is called a tuple variable. In the above examples, t
is the tuple variable, and t.SALARY and t.DEPT_ID refer to attributes of the tuple it stands for.
A tuple variable that is quantified by ‘For All’ (∀) or ‘there exists’ (∃) is called a bound variable.
The meaning of a formula does not change if a bound variable is consistently replaced by another
variable name: ∃s (EMPLOYEE(s) AND s.SALARY > 10000) means the same when s is renamed to u
throughout.
A tuple variable without any ‘For All’ or ‘there exists’ quantifier is called a free variable. In both
examples above, t is free: it appears to the left of the bar, and the result of the query is the set of
tuples that t can stand for. Changing what is asked of a free variable changes the meaning of the
query. For example, if we replace the condition t.DEPT_ID = 10 with t.EMP_ID = 10, the query returns
a different result set.
All the conditions used in the tuple expression are called well-formed formulas (WFF). The
conditions in the expression are combined using logical operators like AND, OR and NOT, and
quantifiers like ‘For All’ (∀) or ‘there exists’ (∃). If the tuple variables in a WFF are all bound variables, it
is called a closed WFF. An open WFF has at least one free variable.
Domain Relational Calculus
In contrast to tuple relational calculus, domain relational calculus uses a list of attributes to be selected
from the relation based on the condition. It is the same as TRC, but differs by selecting attributes
rather than whole tuples. It is denoted as below:
{< a1, a2, a3, … an > | P(a1, a2, a3, … an)}
Where a1, a2, a3, … an are attributes of the relation and P is the condition.
For example, select EMP_ID and EMP_NAME of employees who work for department 10:
{<EMP_ID, EMP_NAME> | <EMP_ID, EMP_NAME> ∈ EMPLOYEE ∧ DEPT_ID = 10}
Get the name of the department that Alex works for.
{<DEPT_NAME> | <DEPT_NAME> ∈ DEPT ∧ ∃ DEPT_ID (<DEPT_ID> ∈ EMPLOYEE ∧
EMP_NAME = Alex)}
Here the quantified sub-expression, ∃ DEPT_ID (…), is evaluated to get the department ID of Alex, and then it is used to get
the department name from the DEPT relation.
Let us consider another example: select EMP_ID, EMP_NAME and ADDRESS of the employees
from the department where Alex works.
{<EMP_ID, EMP_NAME, ADDRESS, DEPT_ID> | <EMP_ID, EMP_NAME, ADDRESS,
DEPT_ID> ∈ EMPLOYEE ∧ ∃ DEPT_ID (<DEPT_ID> ∈ EMPLOYEE ∧ EMP_NAME = Alex)}
First, the quantified sub-expression is evaluated to get the department ID of Alex, and then all the employees
in that department are retrieved.
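The two Alex queries can be approximated in Python by unpacking each tuple into attribute-level variables and using `any(...)` for the 'there exists' part. The relation contents, column order and function names below are assumptions made for this sketch.

```python
# Hypothetical relations as sets of attribute tuples:
# EMPLOYEE(EMP_ID, EMP_NAME, ADDRESS, DEPT_ID) and DEPT(DEPT_ID, DEPT_NAME)
EMPLOYEE = {
    (1, "Alex", "Delhi", 10),
    (2, "Mira", "Kota", 10),
    (3, "John", "Mumbai", 20),
}
DEPT = {(10, "Sales"), (20, "Design")}

def dept_names_of(emp_name, employee, dept):
    """Department names for which some employee tuple with this name exists."""
    return {
        dept_name
        for (dept_id, dept_name) in dept
        if any(d == dept_id and name == emp_name for (_, name, _, d) in employee)
    }

def colleagues_of(emp_name, employee):
    """Employees whose DEPT_ID matches that of some employee with this name."""
    return {
        (emp_id, name, address)
        for (emp_id, name, address, dept_id) in employee
        if any(d == dept_id and n == emp_name for (_, n, _, d) in employee)
    }

alex_dept = dept_names_of("Alex", EMPLOYEE, DEPT)
alex_colleagues = colleagues_of("Alex", EMPLOYEE)
```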
Other concepts of TRC, like free variables, bound variables and WFFs, remain the same in DRC. The
only difference is that DRC is based on the attributes of a relation.
What is RDBMS?
RDBMS stands for Relational Database Management System. RDBMS is the
basis for SQL, and for relational database systems such as MS SQL Server,
IBM DB2, Oracle, MySQL, and Microsoft Access.
A Relational database management system (RDBMS) is a database
management system (DBMS) that is based on the relational model as
introduced by E. F. Codd.
What is table?
The data in RDBMS is stored in database objects called tables. The table is
a collection of related data entries and it consists of columns and rows.
Remember, a table is the most common and simplest form of data storage
in a relational database. Following is the example of a CUSTOMERS table:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
What is field?
Every table is broken up into smaller entities called fields. The fields in the
CUSTOMERS table consist of ID, NAME, AGE, ADDRESS and SALARY.
A field is a column in a table that is designed to maintain specific
information about every record in the table.
What is record or row?
A record, also called a row of data, is each individual entry that exists in a
table. For example there are 7 records in the above CUSTOMERS table.
Following is a single row of data or record in the CUSTOMERS table:
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
+----+----------+-----+-----------+----------+
A record is a horizontal entity in a table.
What is column?
A column is a vertical entity in a table that contains all information
associated with a specific field in a table.
For example, a column in the CUSTOMERS table is ADDRESS, which
represents location description and would consist of the following:
+-----------+
| ADDRESS |
+-----------+
| Ahmedabad |
| Delhi |
| Kota |
| Mumbai |
| Bhopal |
| MP |
| Indore |
+-----------+
What is NULL value?
A NULL value in a table is a value in a field that appears to be blank, which
means a field with a NULL value is a field with no value.
It is very important to understand that a NULL value is different from a zero
value or a field that contains spaces. A field with a NULL value is one that
has been left blank during record creation.
SQL Constraints:
Constraints are the rules enforced on the data columns of a table. These are
used to limit the type of data that can go into a table. This ensures the
accuracy and reliability of the data in the database.
Constraints can be column level or table level. Column level constraints
are applied only to one column, whereas table level constraints are applied
to the whole table.
Following are commonly used constraints available in SQL:
NOT NULL Constraint: Ensures that a column cannot have NULL value.
DEFAULT Constraint: Provides a default value for a column when none is
specified.
UNIQUE Constraint: Ensures that all values in a column are different.
PRIMARY Key: Uniquely identifies each row/record in a database table.
FOREIGN Key: Links a row/record to a uniquely identified row/record in another database table.
CHECK Constraint: The CHECK constraint ensures that all values in a column
satisfy certain conditions.
INDEX: Used to create and retrieve data from the database very quickly.
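Most of these constraints can be tried out with Python's built-in `sqlite3` module. The sketch below uses an in-memory database. The age limit in the CHECK, the UNIQUE rule on ADDRESS, the default salary and the `try_insert` helper are all hypothetical choices for illustration, not part of the CUSTOMERS table described earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUSTOMERS (
        ID      INTEGER PRIMARY KEY,
        NAME    TEXT    NOT NULL,
        AGE     INTEGER NOT NULL CHECK (AGE >= 18),
        ADDRESS TEXT    UNIQUE,
        SALARY  REAL    DEFAULT 5000.00
    )
""")
conn.execute("CREATE INDEX idx_customers_name ON CUSTOMERS (NAME)")
conn.execute(
    "INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS) VALUES (1, 'Ramesh', 32, 'Ahmedabad')"
)

def try_insert(sql):
    """Return the error message if a constraint rejects the row, else None."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

errors = [
    # duplicate primary key
    try_insert("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (1, 'Khilan', 25)"),
    # NOT NULL violated
    try_insert("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (2, NULL, 25)"),
    # CHECK violated
    try_insert("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (3, 'Komal', 12)"),
    # UNIQUE violated
    try_insert("INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS) VALUES (4, 'Muffy', 24, 'Ahmedabad')"),
]

# SALARY was not supplied for row 1, so the DEFAULT applies.
default_salary = conn.execute("SELECT SALARY FROM CUSTOMERS WHERE ID = 1").fetchone()[0]
```

Each rejected insert raises `sqlite3.IntegrityError`; the exact message text depends on the SQLite version.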
Data Integrity:
The following categories of data integrity exist with each RDBMS:
Entity Integrity: There are no duplicate rows in a table.
Domain Integrity: Enforces valid entries for a given column by restricting the
type, the format, or the range of values.
Referential integrity: Rows that are referenced by other
records cannot be deleted.
User-Defined Integrity: Enforces some specific business rules that do not fall
into entity, domain or referential integrity.
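Referential integrity can be observed with `sqlite3` as well. In this sketch the DEPT and EMPLOYEE tables are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE DEPT (DEPT_ID INTEGER PRIMARY KEY, DEPT_NAME TEXT)")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        EMP_ID  INTEGER PRIMARY KEY,
        DEPT_ID INTEGER REFERENCES DEPT (DEPT_ID)
    )
""")
conn.execute("INSERT INTO DEPT VALUES (10, 'Sales')")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 10)")

# Deleting a department that an employee row still refers to is rejected.
try:
    conn.execute("DELETE FROM DEPT WHERE DEPT_ID = 10")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True
```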
Functional Dependency
Functional dependency (FD) is a constraint between two sets of attributes in
a relation. Functional dependency says that if two tuples have the same values
for attributes A1, A2,..., An, then those two tuples must also have the same
values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→) that is, X→Y,
where X functionally determines Y. The left-hand side attributes determine
the values of attributes on the right-hand side.
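The definition translates into a simple check over one relation instance. The `fd_holds` function and the sample rows are hypothetical. A passing check shows only that this instance does not violate X→Y, not that the dependency is a rule of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y on one relation instance given as a list of dicts."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical student rows
rows = [
    {"Stu_ID": 1, "Zip": "110001", "City": "Delhi"},
    {"Stu_ID": 2, "Zip": "110001", "City": "Delhi"},
    {"Stu_ID": 3, "Zip": "400001", "City": "Mumbai"},
]

zip_determines_city = fd_holds(rows, ["Zip"], ["City"])
city_determines_id = fd_holds(rows, ["City"], ["Stu_ID"])
```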
Armstrong's Axioms
If F is a set of functional dependencies then the closure of F, denoted as F+,
is the set of all functional dependencies logically implied by F. Armstrong's
Axioms are a set of rules that, when applied repeatedly, generate the closure
of functional dependencies.
Reflexive rule − If alpha is a set of attributes and beta is a subset of alpha,
then alpha → beta holds.
Augmentation rule − If a → b holds and y is an attribute set, then ay → by also
holds. That is, adding attributes to both sides of a dependency does not change the basic
dependency.
Transitivity rule − Same as the transitive rule in algebra: if a → b holds and b → c
holds, then a → c also holds. a → b is read as a functionally determines b.
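Enumerating all of F+ is rarely practical. A common shortcut, shown here as a sketch, is to compute the closure of an attribute set: X → Y is in F+ exactly when Y is contained in the closure of X. The function name `closure` and the representation of FDs as (lhs, rhs) pairs of sets are choices made for this example. The two dependencies are the Stu_ID and Zip dependencies used in the normal form examples.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already implied, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [
    ({"Stu_ID"}, {"Stu_Name", "Zip"}),
    ({"Zip"}, {"City"}),
]

stu_closure = closure({"Stu_ID"}, fds)
zip_closure = closure({"Zip"}, fds)
```

City ends up in the closure of Stu_ID by transitivity, through Zip.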
Trivial Functional Dependency
Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X,
then it is called a trivial FD. Trivial FDs always hold.
Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is
called a non-trivial FD.
Completely non-trivial − If an FD X → Y holds, where X ∩ Y = ∅, it is
said to be a completely non-trivial FD.
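The three cases depend only on how X and Y overlap, as this small sketch shows. The function name `classify_fd` is made up for the example.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y by comparing the attribute sets X and Y."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # Y is a subset of X
    if lhs & rhs:
        return "non-trivial"             # Y is not a subset of X, but they overlap
    return "completely non-trivial"      # X and Y share no attribute

kinds = [
    classify_fd({"Stu_ID", "Zip"}, {"Zip"}),
    classify_fd({"Stu_ID", "Zip"}, {"Zip", "City"}),
    classify_fd({"Zip"}, {"City"}),
]
```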
Normalization
If a database design is not perfect, it may contain anomalies, which are like
a bad dream for any database administrator. Managing a database with
anomalies is next to impossible.
Update anomalies − If data items are scattered and are not linked to each
other properly, then it could lead to strange situations. For example, when we
try to update one data item having its copies scattered over several places, a
few instances get updated properly while a few others are left with old values.
Such instances leave the database in an inconsistent state.
Deletion anomalies − We tried to delete a record, but parts of it were left
undeleted because, unknown to us, the data was also saved somewhere else.
Insert anomalies − We tried to insert data in a record that does not exist at
all.
Normalization is a method to remove all these anomalies and bring the
database to a consistent state.
First Normal Form
First Normal Form is defined in the definition of relations (tables) itself. This
rule defines that all the attributes in a relation must have atomic domains.
The values in an atomic domain are indivisible units.
We re-arrange the relation (table) as below, to convert it to First Normal
Form.
Each attribute must contain only a single value from its pre-defined domain.
Second Normal Form
Before we learn about the second normal form, we need to understand the
following −
Prime attribute − An attribute, which is a part of the prime-key, is known as a
prime attribute.
Non-prime attribute − An attribute, which is not a part of the prime-key, is
said to be a non-prime attribute.
If we follow second normal form, then every non-prime attribute should be
fully functionally dependent on prime key attribute. That is, if X → A holds,
then there should not be any proper subset Y of X, for which Y → A also
holds true.
We see here in Student_Project relation that the prime key attributes are
Stu_ID and Proj_ID. According to the rule, non-key attributes, i.e.
Stu_Name and Proj_Name must be dependent upon both and not on any of
the prime key attribute individually. But we find that Stu_Name can be
identified by Stu_ID and Proj_Name can be identified by Proj_ID
independently. This is called partial dependency, which is not allowed in
Second Normal Form.
We broke the relation in two as depicted in the above picture. So there
exists no partial dependency.
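The partial-dependency test for Student_Project can be sketched in a few lines. This is only an illustration under simplifying assumptions: the relation has a single candidate key, only the listed FDs are inspected (not those implied by them), and the function name partial_dependencies is hypothetical.

```python
def partial_dependencies(key, fds):
    """Return FDs whose left side is a proper subset of the key and whose
    right side contains at least one non-prime attribute."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        non_prime = frozenset(rhs) - key
        if lhs < key and non_prime:          # '<' means proper subset
            found.append((sorted(lhs), sorted(non_prime)))
    return found


student_project_fds = [
    ({"Stu_ID"}, {"Stu_Name"}),
    ({"Proj_ID"}, {"Proj_Name"}),
]

# Both FDs depend on only part of the key (Stu_ID, Proj_ID), so both are reported.
print(partial_dependencies({"Stu_ID", "Proj_ID"}, student_project_fds))

# In a relation whose key is Stu_ID alone, Stu_ID -> Stu_Name is a full dependency.
print(partial_dependencies({"Stu_ID"}, [({"Stu_ID"}, {"Stu_Name"})]))
```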
Third Normal Form For a relation to be in Third Normal Form, it must be in Second Normal form
and the following must hold −
No non-prime attribute is transitively dependent on prime key attribute.
For any non-trivial functional dependency X → A, either −
o X is a superkey or,
o A is prime attribute.
We find that in the above Student_detail relation, Stu_ID is the key and
only prime key attribute. We find that City can be identified by Stu_ID as
well as Zip itself. Zip is not a superkey, and City is not a prime attribute.
Additionally, Stu_ID → Zip → City, so there exists transitive dependency.
To bring this relation into third normal form, we break the relation into two
relations as follows −
Boyce-Codd Normal Form Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on
strict terms. BCNF states that −
For any non-trivial functional dependency, X → A, X must be a super-key.
In the above image, Stu_ID is the super-key in the relation Student_Detail
and Zip is the super-key in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
Which confirms that both the relations are in BCNF.
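One common way to test the superkey condition is attribute closure, a technique the notes do not name: X is a superkey when the closure of X under the given FDs contains every attribute of the relation. The sketch below assumes FDs are given as (left, right) pairs of sets and checks only the listed FDs; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a superkey of `schema`."""
    schema = set(schema)
    violations = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue                                  # trivial FD, always allowed
        if not schema <= closure(lhs, fds):
            violations.append((sorted(lhs), sorted(rhs)))
    return violations


# Before decomposition: Zip -> City violates BCNF because Zip is not a superkey.
before = bcnf_violations(
    {"Stu_ID", "Stu_Name", "Zip", "City"},
    [({"Stu_ID"}, {"Stu_Name", "Zip"}), ({"Zip"}, {"City"})],
)
print(before)

# After decomposition both relations pass the check.
print(bcnf_violations({"Stu_ID", "Stu_Name", "Zip"}, [({"Stu_ID"}, {"Stu_Name", "Zip"})]))
print(bcnf_violations({"Zip", "City"}, [({"Zip"}, {"City"})]))
```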
Multivalued Dependencies
1. Functional dependencies rule out certain tuples from appearing in a relation.
If A → B, then we cannot have two tuples with the same A value but
different B values.
2. Multivalued dependencies do not rule out the existence of certain tuples.
Instead, they require that other tuples of a certain form be present in the
relation.
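3. Let R be a relation schema, and let α ⊆ R and β ⊆ R.
The multivalued dependency
α →→ β
holds on R if in any legal relation r(R), for all pairs of tuples t1 and t2 in r such
that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]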
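A multivalued dependency α →→ β requires that whenever two tuples agree on α, the tuple that takes its β values from the first and its remaining values from the second is also present. The sketch below checks this on a small in-memory relation. The sample rows and the function name satisfies_mvd are assumptions for illustration; the attribute loan# is written as loan.

```python
def satisfies_mvd(rows, schema, alpha, beta):
    """Check whether alpha ->> beta holds on `rows` (a list of dicts over `schema`)."""
    rest = [a for a in schema if a not in alpha and a not in beta]
    existing = {tuple(row[a] for a in schema) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if any(t1[a] != t2[a] for a in alpha):
                continue                      # the pair does not agree on alpha
            # Required tuple: alpha and beta values from t1, the rest from t2.
            # The symmetric tuple is covered when the loop visits (t2, t1).
            t3 = tuple(t2[a] if a in rest else t1[a] for a in schema)
            if t3 not in existing:
                return False
    return True


schema = ["loan", "cname", "street", "ccity"]
rows = [
    {"loan": "L-23", "cname": "Smith", "street": "North", "ccity": "Rye"},
    {"loan": "L-23", "cname": "Smith", "street": "Main", "ccity": "Manchester"},
    {"loan": "L-27", "cname": "Smith", "street": "North", "ccity": "Rye"},
    {"loan": "L-27", "cname": "Smith", "street": "Main", "ccity": "Manchester"},
]

print(satisfies_mvd(rows, schema, ["cname"], ["street", "ccity"]))      # True
print(satisfies_mvd(rows[:3], schema, ["cname"], ["street", "ccity"]))  # False
```

Dropping one of the four rows breaks the dependency, which shows how an MVD forces extra tuples to be present rather than ruling tuples out.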
Normalization Using Multivalued
Dependencies (not to be covered)
1. Suppose that in our banking example, we had an alternative design including
the schema:
2. BC-schema = (loan#, cname, street, ccity)
3. We can see this is not BCNF, as the functional dependency
cname → street ccity
holds on this schema, and cname is not a superkey.
4. If we have customers who have several addresses, though, then we no longer
wish to enforce this functional dependency, and the schema is in BCNF.
5. However, we now have the repetition of information problem. For each
address, we must repeat the loan numbers for a customer, and vice versa.
DBMS – Joins Join is a combination of a Cartesian product followed by a selection
process. A Join operation pairs two tuples from different relations, if and
only if a given join condition is satisfied.
We will briefly describe various join types in the following sections.
Theta (θ) Join Theta join combines tuples from different relations provided they satisfy the
theta condition. The join condition is denoted by the symbol θ.
Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, ..., An) and (B1, B2, ...,
Bm) such that the attributes don't have anything in common, that is R1 ∩
R2 = Φ.
Theta join can use all kinds of comparison operators.
Student
SID   Name    Std
101   Alex    10
102   Maria   11
Subjects
Class   Subject
10      Math
10      English
11      Music
11      Sports
Student_Detail −
STUDENT ⋈(Student.Std = Subjects.Class) SUBJECTS
Student_detail
SID   Name    Std   Class   Subject
101   Alex    10    10      Math
101   Alex    10    10      English
102   Maria   11    11      Music
102   Maria   11    11      Sports
Equijoin When Theta join uses only equality comparison operator, it is said to be
equijoin. The above example corresponds to equijoin.
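The Student and Subjects example can be reproduced with a small sketch that follows the definition directly: form the Cartesian product, then keep the pairs that satisfy the condition. Representing relations as lists of dictionaries and the function name theta_join are assumptions of this sketch.

```python
def theta_join(r1, r2, condition):
    """Cartesian product of r1 and r2 followed by a selection on `condition`."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2 if condition(t1, t2)]


student = [
    {"SID": 101, "Name": "Alex", "Std": 10},
    {"SID": 102, "Name": "Maria", "Std": 11},
]
subjects = [
    {"Class": 10, "Subject": "Math"},
    {"Class": 10, "Subject": "English"},
    {"Class": 11, "Subject": "Music"},
    {"Class": 11, "Subject": "Sports"},
]

# Equijoin: the theta condition uses only the equality operator.
student_detail = theta_join(student, subjects, lambda s, c: s["Std"] == c["Class"])
for row in student_detail:
    print(row)
```

Replacing the lambda with a comparison such as `s["Std"] < c["Class"]` gives a theta join that is not an equijoin.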
Natural Join (⋈)
Natural join does not use any comparison operator. It does not concatenate
the way a Cartesian product does. We can perform a Natural Join only if
there is at least one common attribute that exists between two relations. In
addition, the attributes must have the same name and domain.
Natural join acts on those matching attributes where the values of
attributes in both the relations are same.
Courses
CID    Course        Dept
CS01   Database      CS
ME01   Mechanics     ME
EE01   Electronics   EE
HoD
Dept   Head
CS     Alex
ME     Maya
EE     Mira
Courses ⋈ HoD
Dept   CID    Course        Head
CS     CS01   Database      Alex
ME     ME01   Mechanics     Maya
EE     EE01   Electronics   Mira
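A natural join can be sketched the same way: find the attribute names the two relations share, match tuples on those attributes, and keep each shared attribute only once. The list-of-dictionaries representation and the function name natural_join are assumptions; the sketch also assumes both relations are non-empty so that their attribute names can be read from the first tuple.

```python
def natural_join(r1, r2):
    """Join r1 and r2 on all attributes that have the same name in both."""
    common = [a for a in r1[0] if a in r2[0]]
    if not common:
        raise ValueError("natural join needs at least one common attribute")
    result = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in common):
                result.append({**t1, **t2})   # shared attributes appear once
    return result


courses = [
    {"CID": "CS01", "Course": "Database", "Dept": "CS"},
    {"CID": "ME01", "Course": "Mechanics", "Dept": "ME"},
    {"CID": "EE01", "Course": "Electronics", "Dept": "EE"},
]
hod = [
    {"Dept": "CS", "Head": "Alex"},
    {"Dept": "ME", "Head": "Maya"},
    {"Dept": "EE", "Head": "Mira"},
]

for row in natural_join(courses, hod):
    print(row)
```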
Outer Joins Theta Join, Equijoin, and Natural Join are called inner joins. An inner join
includes only those tuples with matching attributes and the rest are
discarded in the resulting relation. Therefore, we need to use outer joins to
include all the tuples from the participating relations in the resulting
relation. There are three kinds of outer joins − left outer join, right outer
join, and full outer join.
Left Outer Join (R ⟕ S) All the tuples from the Left relation, R, are included in the resulting relation.
If there are tuples in R without any matching tuple in the Right relation S,
then the S-attributes of the resulting relation are made NULL.
Left
A     B
100   Database
101   Mechanics
102   Electronics
Right
A     B
100   Alex
102   Maya
104   Mira
Courses ⟕ HoD
A     B             C     D
100   Database      100   Alex
101   Mechanics     ---   ---
102   Electronics   102   Maya
Right Outer Join (R ⟖ S) All the tuples from the Right relation, S, are included in the resulting
relation. If there are tuples in S without any matching tuple in R, then the
R-attributes of resulting relation are made NULL.
Courses ⟖ HoD
A     B             C     D
100   Database      100   Alex
102   Electronics   102   Maya
---   ---           104   Mira
Full Outer Join (R ⟗ S) All the tuples from both participating relations are included in the resulting
relation. If a tuple has no matching tuple in the other relation, the
attributes of that other relation are made NULL.
Courses ⟗ HoD
A     B             C     D
100   Database      100   Alex
101   Mechanics     ---   ---
102   Electronics   102   Maya
---   ---           104   Mira
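The three outer joins differ only in which unmatched tuples are kept and padded with NULL. The sketch below reproduces the tables above, using None for NULL. The function name outer_join and its how parameter are assumptions, and the attributes of the Right relation are named C and D as in the result tables.

```python
def outer_join(left, right, left_key, right_key, how="left"):
    """Equality join that also keeps unmatched tuples, padded with None (NULL)."""
    left_attrs, right_attrs = list(left[0]), list(right[0])
    result, matched_right = [], set()
    for l in left:
        found = False
        for i, r in enumerate(right):
            if l[left_key] == r[right_key]:
                result.append({**l, **r})
                matched_right.add(i)
                found = True
        if not found and how in ("left", "full"):
            result.append({**l, **{a: None for a in right_attrs}})
    if how in ("right", "full"):
        for i, r in enumerate(right):
            if i not in matched_right:
                result.append({**{a: None for a in left_attrs}, **r})
    return result


courses = [
    {"A": 100, "B": "Database"},
    {"A": 101, "B": "Mechanics"},
    {"A": 102, "B": "Electronics"},
]
hod = [
    {"C": 100, "D": "Alex"},
    {"C": 102, "D": "Maya"},
    {"C": 104, "D": "Mira"},
]

for how in ("left", "right", "full"):
    print(how, outer_join(courses, hod, "A", "C", how))
```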
Integrity Constraints
1. Integrity constraints provide a way of ensuring that changes made to the
database by authorized users do not result in a loss of data consistency.
2. We saw a form of integrity constraint with E-R models:
o key declarations: stipulation that certain attributes form a candidate key
for the entity set.
o form of a relationship: mapping cardinalities 1-1, 1-many and many-many.
3. An integrity constraint can be any arbitrary predicate applied to the database.
4. They may be costly to evaluate, so we will only consider integrity constraints
that can be tested with minimal overhead.
Domain Integrity
Domain integrity means the definition of a valid set of values for an attribute. You define
- the data type,
- the length or size,
- whether null values are allowed,
- whether the value must be unique
for an attribute.
You may also define the default value, the range (values in between) and/or specific values
for the attribute. Some DBMS allow you to define the output format and/or input mask for
the attribute.
These definitions ensure that a specific attribute will have a right and proper value in the
database.
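Most of these domain rules can be declared when a table is created. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the Car columns, the length limit, the rate range and the colour list are all assumptions chosen for illustration. Note that SQLite treats declared column types as affinities rather than strict types, so other DBMS products may enforce the data type more rigidly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Car (
        Reg_No TEXT NOT NULL UNIQUE CHECK (length(Reg_No) <= 7),  -- size, null, unique
        Rate   REAL CHECK (Rate BETWEEN 0 AND 500),                -- range
        Colour TEXT DEFAULT 'white'
               CHECK (Colour IN ('white', 'black', 'red'))         -- default, specific values
    )
""")


def try_insert(reg_no, rate):
    """Insert a car and report whether the domain rules accepted it."""
    try:
        conn.execute("INSERT INTO Car (Reg_No, Rate) VALUES (?, ?)", (reg_no, rate))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


print(try_insert("ABC-112", 45.0))    # valid row, Colour takes its default
print(try_insert("ABC-112", 50.0))    # duplicate Reg_No
print(try_insert(None, 50.0))         # null not allowed
print(try_insert("ABC-113", -5.0))    # outside the allowed range
print(try_insert("ABC-1234", 50.0))   # longer than 7 characters
```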
Entity Integrity Constraint
The entity integrity constraint states that primary keys can't be null. There must be a proper
value in the primary key field.
This is because the primary key value is used to identify individual rows in a table. If there
were null values for primary keys, it would mean that we could not identify those rows.
On the other hand, there can be null values other than primary key fields. Null value means
that one doesn't know the value for that field. Null value is different from zero value or
space.
In the Car Rental database in the Car table each car must have a proper and unique
Reg_No. There might be a car whose rate is unknown - maybe the car is broken or it is
brand new - i.e. the Rate field has a null value. See the picture below.
The entity integrity constraint ensures that a specific row in a table can be identified.
Picture. Car and CarType tables in the Rent database
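The entity integrity rule for the Car table can be tried out with Python's built-in sqlite3 module. This is a sketch with made-up rates, not the contents of the pictured tables. The explicit NOT NULL on the key is deliberate: SQLite, for historical reasons, does not reject NULL in a non-integer PRIMARY KEY column unless NOT NULL is also declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car (Reg_No TEXT NOT NULL PRIMARY KEY, Rate REAL)")

conn.execute("INSERT INTO Car VALUES ('ABC-112', 45.0)")
conn.execute("INSERT INTO Car VALUES ('ABC-122', NULL)")   # unknown rate is allowed

errors = []
for reg_no in (None, "ABC-112"):       # a null key, then a duplicate key
    try:
        conn.execute("INSERT INTO Car VALUES (?, ?)", (reg_no, 50.0))
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

print(errors)
print(conn.execute("SELECT Reg_No, Rate FROM Car ORDER BY Reg_No").fetchall())
```

Both rejected inserts would have made a row impossible to identify, while the null Rate is accepted because Rate is not part of the key.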
Referential Integrity Constraint
The referential integrity constraint is specified between two tables and it is used to maintain
the consistency among rows between the two tables.
The rules are:
1. You can't delete a record from a primary table if matching records exist in a related table.
2. You can't change a primary key value in the primary table if that record has related
records.
3. You can't enter a value in the foreign key field of the related table that doesn't exist in
the primary key of the primary table.
4. However, you can enter a Null value in the foreign key, specifying that the records are
unrelated.
Examples
Rule 1. You can't delete any of the rows in the CarType table that are visible in the picture
since all the car types are in use in the Car table.
Rule 2. You can't change any of the model_ids in the CarType table since all the car types
are in use in the Car table.
Rule 3. The values that you can enter in the model_id field in the Car table must be in the
model_id field in the CarType table.
Rule 4. The model_id field in the Car table can have a null value which means that the car
type of that car is not known.
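The four rules can be observed with a foreign key declared in SQLite through Python's built-in sqlite3 module. This is a sketch under assumptions: the table layouts are reduced to the columns mentioned in the text, and the registration number XYZ-999 is invented. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enforcement is off by default in SQLite
conn.execute("CREATE TABLE CarType (model_id INTEGER PRIMARY KEY, model TEXT)")
conn.execute("""
    CREATE TABLE Car (
        Reg_No   TEXT NOT NULL PRIMARY KEY,
        model_id INTEGER REFERENCES CarType(model_id)
    )
""")
conn.execute("INSERT INTO CarType VALUES (1, 'Ford Focus')")
conn.execute("INSERT INTO Car VALUES ('ABC-112', 1)")


def attempt(sql):
    """Run a statement and report whether referential integrity allowed it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


rule1 = attempt("DELETE FROM CarType WHERE model_id = 1")                 # type in use
rule2 = attempt("UPDATE CarType SET model_id = 100 WHERE model_id = 1")   # key in use
rule3 = attempt("INSERT INTO Car VALUES ('XYZ-999', 42)")                 # no such type
rule4 = attempt("INSERT INTO Car VALUES ('XYZ-999', NULL)")               # type unknown
print(rule1, rule2, rule3, rule4)   # False False False True
```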
Foreign Key Integrity Constraint
There are two foreign key integrity constraints: cascade update related fields and cascade
delete related rows. These constraints affect the referential integrity constraint.
Cascade Update Related Fields
Any time you change the primary key of a row in the primary table, the foreign key values
are updated in the matching rows in the related table. This constraint overrules rule 2 in
the referential integrity constraints.
If this constraint is defined in the relationship between the tables Car and CarType, it is
possible to change the model_id in the CarType table. If one should change the model_id 1
(Ford Focus) to model_id 100 in the CarType table, the model_ids in the Car table would
change from 1 to 100 (cars ABC-112, ABC-122, ABC-123).
Cascade Delete Related Rows
Any time you delete a row in the primary table, the matching rows are automatically
deleted in the related table. This constraint overrules rule 1 in the referential integrity
constraints.
If this constraint is defined in the relationship between the tables Car and CarType, it is
possible to delete rows from the CarType table. If one should delete the Ford Focus row
from the CarType table, the cars ABC-112, ABC-122, ABC-123 would be deleted from the
Car table, too.
Source: Gillette Cynthia. 2001. MSCE SQL 2000 Database Design. Chapter 2: Data Modelling. Coriolis Group.
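In SQL, the two cascade options are usually written as ON UPDATE CASCADE and ON DELETE CASCADE on the foreign key. The sketch below replays the Ford Focus example with Python's built-in sqlite3 module; the table layouts are reduced to the columns named in the text, which is an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enforcement is off by default in SQLite
conn.execute("CREATE TABLE CarType (model_id INTEGER PRIMARY KEY, model TEXT)")
conn.execute("""
    CREATE TABLE Car (
        Reg_No   TEXT NOT NULL PRIMARY KEY,
        model_id INTEGER REFERENCES CarType(model_id)
                 ON UPDATE CASCADE
                 ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO CarType VALUES (1, 'Ford Focus')")
conn.executemany(
    "INSERT INTO Car VALUES (?, 1)",
    [("ABC-112",), ("ABC-122",), ("ABC-123",)],
)

# Cascade update: changing the primary key rewrites the matching foreign keys.
conn.execute("UPDATE CarType SET model_id = 100 WHERE model_id = 1")
after_update = [row[0] for row in conn.execute("SELECT model_id FROM Car")]
print(after_update)   # [100, 100, 100]

# Cascade delete: removing the car type removes the cars that refer to it.
conn.execute("DELETE FROM CarType WHERE model_id = 100")
remaining = conn.execute("SELECT COUNT(*) FROM Car").fetchone()[0]
print(remaining)      # 0
```

Cascade delete removes data silently, so it is worth weighing it against the plain referential integrity rule that simply rejects the deletion.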
Importance of Security in Database Environment
Database security is the protection of the database against intentional and unintentional
threats that may be computer-based or non-computer-based. Database security is the
business of the entire organization as all people use the data held in the organization's
database and any loss or corruption to data would affect the day-to-day operation of the
organization and the performance of the people. Therefore, database security
encompasses hardware, software, infrastructure, people and data of the organization.
Now there is greater emphasis on database security than in the past, as the amount of
data stored in corporate databases is increasing and people are depending more on
corporate data for decision-making, customer service management, supply chain
management and so on. Any loss or unavailability of corporate data will cripple
today's organization and seriously affect its performance. The unavailability of
the database for even a few minutes could result in serious losses to the organization.
Data Security Risks
We have seen that the database security is the concern of the entire organization. The
organization should identify all the risk factors and weak elements from the database
security perspective and find solutions to counter and neutralize each such threat.
A threat is any situation, event or personnel that will adversely affect the database
security and the smooth and efficient functioning of the organization. A threat may be
caused by a situation or event involving a person, action or circumstance that is likely to
bring harm to the organization. The harm may be tangible, such as loss of data, damage
to hardware, loss of software or intangible such as loss of customer goodwill or
credibility and so on.
Data Tampering
Privacy of communications is essential to ensure that data cannot be modified or viewed
in transit. The chances of data tampering are high in case of distributed environments as
data moves between sites. In a data modification attack, an unauthorized party on the
network intercepts data in transit and changes that data before retransmitting it. An
example of this is changing the amount of a banking transaction from Rs. 1000 to Rs.
10000.
Data Theft
Data must be stored and transmitted securely, so that information such as credit card
numbers cannot be stolen. Over the Internet and Wide Area Network (WAN)
environments, both public carriers and private network owners often route portions of
their network through insecure landlines, extremely vulnerable microwave and satellite
links, or a number of servers. This situation leaves valuable data open to view by any
interested party. In Local Area Network (LAN) environments within a building or
campus, insiders with access to the physical wiring can potentially view data not
intended for them.
Falsifying User Identities
In a distributed environment, it becomes more feasible for a user to falsify an identity to
gain access to sensitive and important information. Criminals attempt to steal users'
credit card numbers, and then make purchases against the accounts. Or they steal
other personal data, such as bank account numbers and driver's license numbers, and
set up bogus credit accounts in someone else's name.
Password-Related Threats
In large systems, users must remember multiple passwords for the different
applications and services that they use. Users typically respond to the problem of
managing multiple passwords in several ways:
• They may select easy-to-guess passwords.
• They may also choose to standardize passwords so that they are the same on all
machines or websites.
All these strategies compromise password secrecy and service availability. Moreover,
administration of multiple user accounts and passwords is complex, time-consuming,
and expensive.
Unauthorized Access to Tables and Columns
The database may contain confidential tables, or confidential columns in a table,
which should not be available indiscriminately to all users authorized to access the
database. It should be possible to protect data on a column level.
Unauthorized Access to Data Rows
Certain data rows may contain confidential information that should not be available
indiscriminately to users authorized to access the table. For example, in a shared
environment, businesses should have access only to their own data; customers should be
able to see only their own orders.
Lack of Accountability
If the system administrator is unable to track users' activities, then users cannot be held
responsible for their actions. There must be some reliable ways to monitor who is
performing what operations on the data.
Complex User Management Requirements
Systems must often support large numbers of users and therefore must be scalable.
In such large-scale environments, the burden of managing user accounts and passwords
makes the system vulnerable to error and attack.
Security Levels
To protect the database, we must take security measures at several levels:
• Physical: The sites containing the computer systems must be secured against armed
or surreptitious entry by intruders.
• Human: Users must be authorized carefully to reduce the chance of any such user
giving access to an intruder in exchange for a bribe or other favors.
• Operating System: No matter how secure the database system is, weakness
in operating system security may serve as a means of unauthorized access to the
database.
• Network: Since almost all database systems allow remote access through terminals or
networks, software-level security within the network software is as important as
physical security, both on the Internet and in networks private to an enterprise.
• Database System: Some database-system users may be authorized to access only a
limited portion of the database. Other users may be allowed to issue queries, but may be
forbidden to modify the data. It is the responsibility of the database system to ensure that
these authorization restrictions are not violated.
Security at all these levels must be maintained if database security is to be ensured. A
weakness at a low level of security (physical or human) allows circumvention of strict
high level (database) security measures.
Data Security Requirements
We should use technology to ensure a secure computing environment for the
organization. Although it is not possible to find a technological solution for all problems,
most of the security issues could be resolved using appropriate technology. The basic
security standards which technology can ensure are confidentiality, integrity and
availability.
Confidentiality
A secure system ensures the confidentiality of data. This means that it allows individuals
to see only the data they are supposed to see. Confidentiality has several aspects like
privacy of communications, secure storage of sensitive data, authenticated users and
authorization of users.
Privacy of Communications
The DBMS should be capable of controlling the spread of confidential personal
information such as health, employment, and credit records. It should also keep the
corporate data such as trade secrets, proprietary information about products and
processes, competitive analyses, as well as marketing and sales plans secure and away
from the unauthorized people.
Secure Storage of Sensitive Data
Once confidential data has been entered, its integrity and privacy must be protected on
the databases and servers wherein it resides.
Authentication
One of the most basic concepts in database security is authentication, which is quite
simply the process by which a system verifies a user's identity. A user can respond to a
request to authenticate by providing a proof of identity, or an authentication token.
You're probably already familiar with this concept. If you have ever been asked to show a
photo ID (for example, when opening a bank account), you have been presented with
a request for authentication. You proved your identity by showing your driver's license
(or other photo ID). In this case, your driver's license served as your authentication
token.
Despite what you see in the movies, most software programs cannot use futuristic
systems such as face recognition for authentication. Instead most authentication
requests ask you to provide a user ID and a password. Your user ID represents your
claim to being a person authorized to access the environment, and the password is
protected and you are the only person who knows it.
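A user ID and password check can be sketched as follows. The notes do not say how the password is protected; storing a salted PBKDF2 hash instead of the password itself is an assumption of this example, as are the in-memory user table and the function names. The library calls used (hashlib.pbkdf2_hmac, hmac.compare_digest, os.urandom) are part of the Python standard library.

```python
import hashlib
import hmac
import os

users = {}   # user ID -> (salt, password hash); stands in for a real user table


def hash_password(password, salt):
    """Derive a hash from the password so the password itself is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(user_id, password):
    salt = os.urandom(16)                       # random salt per user
    users[user_id] = (salt, hash_password(password, salt))


def authenticate(user_id, password):
    """Return True only if the user ID is known and the password matches."""
    if user_id not in users:
        return False
    salt, stored = users[user_id]
    candidate = hash_password(password, salt)
    return hmac.compare_digest(stored, candidate)   # constant-time comparison


register("alex", "s3cret")
print(authenticate("alex", "s3cret"))   # True
print(authenticate("alex", "guess"))    # False
print(authenticate("maria", "s3cret"))  # False
```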
Authorization
An authenticated user goes through the second layer of security, authorization.
Authorization is the process through which the system obtains information about the
authenticated user, including which database operations that user may perform and
which data objects that user may access.
Your driver's license is a perfect example of an authorization document. Though it can
be used for authentication purposes, it also authorizes you to drive a certain class of car.
Furthermore, the type of authorization you have gives you more or fewer privileges as
far as driving a vehicle goes.
A user may have several forms of authorization on parts of the database. There are the
following authorization rights.
• Read authorization allows reading, but not modification, of data.
• Insert authorization allows insertion of new data, but not modification of existing data.
• Update authorization allows modification, but not deletion of data.
• Delete authorization allows deletion of data.
A user may be assigned all, none, or a combination of these types of authorization. In
addition to these forms of authorization for access to data, a user may be granted
authorization to modify the database schema:
• Index authorization allows the creation and deletion of indexes.
• Resource authorization allows the creation of new relations.
• Alteration authorization allows the addition or deletion of attributes in a relation.
• Drop authorization allows the deletion of relations.
The drop and delete authorization differ in that delete authorization allows deletion of
tuples only. If a user deletes all tuples of a relation, the relation still exists, but it is
empty. If a relation is dropped it no longer exists. The ability to create new relations is
regulated through resource authorization. A user with resource authorization who
creates a relation is given a privilege on that relation automatically. Index authorization
is given to a user to allow fast access to data on the basis of some key field.
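The authorization rights listed above can be modelled as a set of rights per user and relation. The sketch below is only an illustration of the idea; the class name Authorizer, the user names and the relation name Car are hypothetical, and a real DBMS would manage these grants itself (in SQL, through GRANT and REVOKE).

```python
DATA_RIGHTS = {"read", "insert", "update", "delete"}
SCHEMA_RIGHTS = {"index", "resource", "alteration", "drop"}


class Authorizer:
    """Keeps track of which rights each user holds on each relation."""

    def __init__(self):
        self._grants = {}   # (user, relation) -> set of rights

    def grant(self, user, relation, *rights):
        unknown = set(rights) - DATA_RIGHTS - SCHEMA_RIGHTS
        if unknown:
            raise ValueError(f"unknown rights: {sorted(unknown)}")
        self._grants.setdefault((user, relation), set()).update(rights)

    def is_allowed(self, user, relation, right):
        return right in self._grants.get((user, relation), set())


auth = Authorizer()
auth.grant("clerk", "Car", "read", "insert")          # may add cars, not change them
auth.grant("manager", "Car", "read", "update", "delete", "drop")

print(auth.is_allowed("clerk", "Car", "insert"))    # True
print(auth.is_allowed("clerk", "Car", "update"))    # False
print(auth.is_allowed("manager", "Car", "drop"))    # True
print(auth.is_allowed("visitor", "Car", "read"))    # False
```

Keeping delete and drop as separate rights mirrors the distinction made above: the clerk or manager may remove tuples only if granted delete, while removing the relation itself needs drop.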
Integrity
A secure system ensures that the data it contains is valid. Data integrity means that
data is protected from deletion and corruption, both while it resides within the
database, and while it is being transmitted over the network. The detailed discussion on
integrity is in the next section.
Availability
A secure system makes data available to authorized users, without delay. Denial of
service attacks are attempts to block authorized users' ability to access and use the
system when needed.