rdbms

142

Upload: api-3746880

Post on 13-Nov-2014


Page 1: Rdbms
Page 2: Rdbms

2

Page 3: Rdbms

Introduction to

Database Management Systems

(DBMS)

Page 4: Rdbms

4

Database Management System (DBMS)

Definitions:
• Data: Known facts that can be recorded and that have implicit meaning
• Database: Collection of related data
  • Ex. the names, telephone numbers and addresses of all the people you know
• Database Management System: A computerized record-keeping system

Page 5: Rdbms

5

DBMS (Contd.)

Goals of a Database Management System:
• To provide an efficient as well as convenient environment for accessing data in a database
• To enforce information security: database security, concurrency control, crash recovery

It is a general-purpose facility for:
• Defining a database
• Constructing a database
• Manipulating a database

Page 6: Rdbms

6

Benefits of database approach

• Redundancy can be reduced
• Inconsistency can be avoided
• Data can be shared
• Standards can be enforced
• Security restrictions can be applied
• Integrity can be maintained
• Data independence can be provided

Page 7: Rdbms

7

DBMS Functions

• Data Definition
• Data Manipulation
• Data Security and Integrity
• Data Recovery and Concurrency
• Data Dictionary
• Performance

Page 8: Rdbms

8

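Database System

[Figure: a database system environment. Users issue Application Programs/Queries to the DBMS Software, which comprises software to process queries/programs and software to access stored data. That software works on the Stored Data Definition (meta-data) and the Stored Database. The whole is labeled DATABASE SYSTEM.]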

Page 9: Rdbms

9

Database System

[Figure: components of a database system. Labels: user query Q1; application program query Q2; database scheme; DDL compiler; query processor; compiled query Q2; database description; database manager; file manager; physical database.]

Page 10: Rdbms

10

Categories of Data Models

• Conceptual
• Physical
• Representational

Data Model
• A set of concepts used to describe the structure of a database
• By structure, we mean the data types, relationships, and constraints that should hold for the data

Page 11: Rdbms

11

Database Architecture

Internal level (storage view)

Conceptual level (community user view)

External level (individual user views)

Database

Page 12: Rdbms

12

An example of the three levels

External View 1:  SNo  FName  LName  Age  Salary
External View 2:  SNo  LName  BranchNo
Conceptual View:  SNo  FName  LName  Age  Salary  BranchNo
Internal View:

    struct STAFF {
        int staffNo;
        int branchNo;
        char fName[15];
        char lName[15];
        struct date dateOfBirth;
        float salary;
        struct STAFF *next;   /* pointer to next Staff record */
    };
    index staffNo; index branchNo;   /* define indexes for staff */

Page 13: Rdbms

13

Schema

• Schema: Description of data in terms of a data model
• Three-level DB Architecture defines the following schemas:
  • External Schema (or sub-schema): written using external DDL
  • Conceptual Schema (or schema): written using conceptual DDL
  • Internal Schema: written using internal DDL or storage structure definition

Page 14: Rdbms

14

Data Independence

• Change the schema at one level of a database system without a need to change the schema at the next higher level
  • Logical data independence: Refers to the immunity of the external schemas to changes in the conceptual schema, e.g., adding a new record or field
  • Physical data independence: Refers to the immunity of the conceptual schema to changes in the internal schema, e.g., adding a new index should not void existing ones

Page 15: Rdbms

15

TYPES OF DATABASE MODELS

• HIERARCHICAL
• NETWORK
• RELATIONAL

[Figure labels: TABLE, ROW, COLUMN, VALUE]

Page 16: Rdbms

16

DATABASE DESIGN PHASES

• DATA ANALYSIS: Entities - Attributes - Relationships - Integrity Rules
• LOGICAL DESIGN: Tables - Columns - Primary Keys - Foreign Keys
• PHYSICAL DESIGN: DDL for Tablespaces, Tables, Indexes

Page 17: Rdbms

Introduction to Relational Databases: RDBMS

Page 18: Rdbms

18

Definition: RDBMS

It is a system in which, at a minimum:
• The data is perceived by the user as tables (and nothing but tables); and
• The operators at the user’s disposal, e.g. for data retrieval, are operators that generate new tables from old, and those include at least SELECT, PROJECT, and JOIN.

Page 19: Rdbms

19

Features of an RDBMS

• The ability to create multiple relations (tables) and enter data into them
• An interactive query language
• Retrieval of information stored in more than one table
• Provides a Catalog or Dictionary, which itself consists of tables (called system tables)

Page 20: Rdbms

20

Some Important Terms

• Relation: a table
• Tuple: a row in a table
• Attribute: a column in a table
• Degree: number of attributes
• Cardinality: number of tuples
• Primary Key: a unique identifier for the table
• Domain: a pool of values from which specific attributes of specific relations draw their values

Page 21: Rdbms

21

Properties of Relations (Tables)

• There are no duplicate rows (tuples)
• Tuples are unordered, top to bottom
• Attributes are unordered, left to right
• All attribute values are atomic (or scalar)
• Relational databases do not allow repeating groups

Page 22: Rdbms

22

Keys

• Key
• Super Key
• Candidate Keys
• Primary Key
• Alternate Key
• Secondary Keys

Page 23: Rdbms

23

Keys and Referential Integrity

Enrolled
sid    cid          grade
53666  carnatic101  C
53688  reggae203    B
53650  topology112  A
53666  history105   B

Student
sid    name   login       age  gpa
53666  Jones  Jones@cs    18   3.4
53688  Smith  Smith@eecs  18   3.2
53650  Smith  Smith@math  19   3.8

sid in Student: primary key.
sid in Enrolled: foreign key referring to sid of the STUDENT relation.
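The foreign-key relationship between Enrolled and Student can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the column types and the composite primary key (sid, cid) on Enrolled are assumptions not stated on the slide, and `try_enroll` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE Student ("
    " sid INTEGER PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)
# Assumption: (sid, cid) identifies an enrolment; the slide does not say so.
conn.execute(
    "CREATE TABLE Enrolled ("
    " sid INTEGER, cid TEXT, grade TEXT,"
    " PRIMARY KEY (sid, cid),"
    " FOREIGN KEY (sid) REFERENCES Student (sid))"
)

conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?, ?)",
    [
        (53666, "Jones", "Jones@cs", 18, 3.4),
        (53688, "Smith", "Smith@eecs", 18, 3.2),
        (53650, "Smith", "Smith@math", 19, 3.8),
    ],
)


def try_enroll(sid, cid, grade):
    """Insert an Enrolled row; return False if an integrity rule rejects it."""
    try:
        conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", (sid, cid, grade))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_enroll(53666, "carnatic101", "C")   # sid exists in Student
rejected = try_enroll(99999, "history105", "B")    # no such student
```

The second insert is refused because its sid does not appear in Student, which is what referential integrity requires.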

Page 24: Rdbms

24

Page 25: Rdbms

Relational Algebra

Page 26: Rdbms

26

Relational Query Languages

• Query languages: Allow manipulation and retrieval of data from a database.
• Relational model supports simple, powerful QLs:
  • Strong formal foundation based on logic.
  • Allows for much optimization.
• Query Languages != programming languages!

Page 27: Rdbms

27

Example Instances

R1
sid  bid  day
22   101  10/10/99
58   103  11/12/99

S1
sid  sname  rating  age
22   Deepa  7       45.0
31   Laxmi  8       55.5
58   Roopa  10      35.0

S2
sid  sname   rating  age
28   Yamuna  9       35.0
31   Laxmi   8       55.5
44   Geeta   5       35.0
58   Roopa   10      35.0

Page 28: Rdbms

28

Relational Algebra

Basic operations:
• Selection (σ)
• Projection (π)
• Cross-product (×)
• Set-difference (−)
• Union (∪)

Page 29: Rdbms

29

Projection

π_{sname, rating}(S2)
sname   rating
Yamuna  9
Laxmi   8
Geeta   5
Roopa   10

π_{age}(S2)
age
35.0
55.5

Page 30: Rdbms

30

Selection

σ_{rating > 8}(S2)
sid  sname   rating  age
28   Yamuna  9       35.0
58   Roopa   10      35.0

π_{sname, rating}(σ_{rating > 8}(S2))
sname   rating
Yamuna  9
Roopa   10
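Selection and projection can be sketched in a few lines of Python. The functions `select` and `project` and the list-of-dicts representation are illustrative choices, not part of the slides; the data is instance S2.

```python
# Instance S2, one dict per tuple.
S2 = [
    {"sid": 28, "sname": "Yamuna", "rating": 9, "age": 35.0},
    {"sid": 31, "sname": "Laxmi", "rating": 8, "age": 55.5},
    {"sid": 44, "sname": "Geeta", "rating": 5, "age": 35.0},
    {"sid": 58, "sname": "Roopa", "rating": 10, "age": 35.0},
]


def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep the named columns and drop duplicate rows."""
    result = []
    for row in relation:
        projected = {name: row[name] for name in attributes}
        if projected not in result:
            result.append(projected)
    return result


ages = project(S2, ["age"])
high_rated = project(select(S2, lambda r: r["rating"] > 8), ["sname", "rating"])
```

Note that `project` removes duplicates, which is why projecting S2 on age yields two rows rather than four.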

Page 31: Rdbms

31

Union, Intersection, Set Difference

S1 ∪ S2
sid  sname   rating  age
22   Deepa   7       45.0
31   Laxmi   8       55.5
58   Roopa   10      35.0
44   Geeta   5       35.0
28   Yamuna  9       35.0

S1 − S2
sid  sname  rating  age
22   Deepa  7       45.0

S1 ∩ S2
sid  sname  rating  age
31   Laxmi  8       55.5
58   Roopa  10      35.0
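Because relations are sets of tuples, the three operations map directly onto Python's set operators. The sketch below assumes each tuple is written as (sid, sname, rating, age); the variable names are illustrative.

```python
# Instances S1 and S2 as sets of (sid, sname, rating, age) tuples.
S1 = {
    (22, "Deepa", 7, 45.0),
    (31, "Laxmi", 8, 55.5),
    (58, "Roopa", 10, 35.0),
}
S2 = {
    (28, "Yamuna", 9, 35.0),
    (31, "Laxmi", 8, 55.5),
    (44, "Geeta", 5, 35.0),
    (58, "Roopa", 10, 35.0),
}

union = S1 | S2          # tuples in S1 or S2, duplicates merged
intersection = S1 & S2   # tuples in both
difference = S1 - S2     # tuples in S1 but not in S2
```

All three operations require the two relations to have the same columns, which holds here.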

Page 32: Rdbms

32

Cross-Product

S1 × R1
(sid)  sname  rating  age   (sid)  bid  day
22     Deepa  7       45.0  22     101  10/10/99
22     Deepa  7       45.0  58     103  11/12/99
31     Laxmi  8       55.5  22     101  10/10/99
31     Laxmi  8       55.5  58     103  11/12/99
58     Roopa  10      35.0  22     101  10/10/99
58     Roopa  10      35.0  58     103  11/12/99

Page 33: Rdbms

33

Joins

Condition Join:

(sid)  sname  rating  age   (sid)  bid  day
22     Deepa  7       45.0  22     101  10/10/99
31     Laxmi  8       55.5  58     103  11/12/99

Page 34: Rdbms

34

Equi-Join

(sid)  sname  rating  age   bid  day
22     Deepa  7       45.0  101  10/10/99
58     Roopa  10      35.0  103  11/12/99
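A cross-product pairs every row of one relation with every row of the other, and an equi-join keeps only the pairs that agree on a column. The following is a minimal sketch using instances S1 and R1; `cross_product` and `equi_join` are hypothetical helper names, and rows are plain tuples.

```python
from itertools import product

# S1 rows are (sid, sname, rating, age); R1 rows are (sid, bid, day).
S1 = [
    (22, "Deepa", 7, 45.0),
    (31, "Laxmi", 8, 55.5),
    (58, "Roopa", 10, 35.0),
]
R1 = [
    (22, 101, "10/10/99"),
    (58, 103, "11/12/99"),
]


def cross_product(left, right):
    """Pair every left row with every right row."""
    return [l + r for l, r in product(left, right)]


def equi_join(left, right, left_col, right_col):
    """Keep pairs whose join columns are equal; drop the repeated column."""
    return [
        l + r[:right_col] + r[right_col + 1:]
        for l, r in product(left, right)
        if l[left_col] == r[right_col]
    ]


pairs = cross_product(S1, R1)
joined = equi_join(S1, R1, 0, 0)   # join on sid
```

The equi-join result has one sid column instead of two, matching the table on the slide.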

Page 35: Rdbms

35

Division

A
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B1: pno = p2
B2: pno = p2, p4
B3: pno = p1, p2, p4

A/B1: sno = s1, s2, s3, s4
A/B2: sno = s1, s4
A/B3: sno = s1

• Not supported as a primitive operator, but useful for expressing queries like:
• Find sailors who have reserved all boats.
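Division can be read as "which sno values are paired with every pno in B". The sketch below computes it directly from that reading; `divide` is an illustrative name and the data is relation A with B1, B2, B3 from the slide.

```python
# Relation A as a set of (sno, pno) pairs.
A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}


def divide(a, b):
    """Return each sno that appears in `a` together with every pno in `b`."""
    candidates = {sno for sno, _ in a}
    return {sno for sno in candidates if all((sno, pno) in a for pno in b)}


a_by_b1 = divide(A, B1)
a_by_b2 = divide(A, B2)
a_by_b3 = divide(A, B3)
```

The same idea answers "find sailors who have reserved all boats" when A holds (sailor, boat) reservations and B holds all boats.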

Page 36: Rdbms

36

Page 37: Rdbms

Introduction to Query Optimization

Page 38: Rdbms

38

Processing A High-level Query

Query in a high-level language
  -> SCANNING, PARSING AND VALIDATING
Intermediate form of query
  -> QUERY OPTIMIZER
Execution plan
  -> QUERY CODE GENERATOR
Code to execute the query
  -> RUNTIME DATABASE PROCESSOR
Result of query

Typical steps when processing a high-level query.

Page 39: Rdbms

39

Two Main Techniques for Query Optimization

• Heuristic Rules: A heuristic is a rule that works well in most cases, but not always. General idea:
  • Many different relational algebra expressions (and thus query trees) are equivalent.
  • Transform the initial query tree of a query into an equivalent final query tree that is efficient to execute.
• Cost-based query optimization
  • Estimate the cost for each execution plan, and choose the one with the lowest cost.
• Can we get the best execution plan?
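One common heuristic is to apply a selection before a join instead of after it. The sketch below compares the two equivalent plans by counting tuple comparisons in a simple nested-loop join. The relations, their sizes, and the column `b` are made-up illustration data, and the comparison count is only a crude stand-in for a real cost model.

```python
def nested_loop_join(left, right, condition):
    """Join two lists of dicts, returning (rows, number_of_comparisons)."""
    comparisons = 0
    rows = []
    for l in left:
        for r in right:
            comparisons += 1
            if condition(l, r):
                rows.append({**l, **r})
    return rows, comparisons


# Hypothetical data: 100 rows in R1 (two of them with a = 5000), 10 rows in R2.
R1 = [{"r1no": i, "r2no": i % 10, "a": 5000 if i % 50 == 0 else i} for i in range(100)]
R2 = [{"r2no": j, "b": j * j} for j in range(10)]


def same_r2no(l, r):
    return l["r2no"] == r["r2no"]


# Plan A: join everything, then apply the selection a = 5000.
joined, cost_a = nested_loop_join(R1, R2, same_r2no)
result_a = [row for row in joined if row["a"] == 5000]

# Plan B: apply the selection first, then join the few remaining rows.
filtered = [row for row in R1 if row["a"] == 5000]
result_b, cost_b = nested_loop_join(filtered, R2, same_r2no)
```

Both plans return the same rows, but plan B compares far fewer tuple pairs, which is the kind of difference an optimizer tries to exploit.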

Page 40: Rdbms

40

Motivating Example

select *
from R1, R2, R3
where R1.r2no = R2.r2no
  and R2.r3no = R3.r3no
  and R1.a = 5000

Plan:
NLJ
  NLJ
    SS(R2)
    SS(R3)
  SS(R1, “a=5000”)

Page 41: Rdbms

41

Alternative Plans 1 (No Indexes)

select *
from R1, R2, R3
where R1.r2no = R2.r2no
  and R2.r3no = R3.r3no
  and R1.a = 5000

Plan:
NLJ
  NLJ
    SS(R1, “a=5000”)
    SS(R2)
  SS(R3)

Page 42: Rdbms

42

Alternative Plans 2 (With Indexes)

select *
from R1, R2, R3
where R1.r2no = R2.r2no
  and R2.r3no = R3.r3no
  and R1.a = 5000

Plan:
NLJ
  NLJ
    IS(R1, “a=5000”)
    SS(R2)
  SS(R3)

Page 43: Rdbms

43

Page 44: Rdbms

Conceptual Design Using the Entity-Relationship Model

Page 45: Rdbms

45

Overview of Database Design

• Conceptual design: (ER Model is used at this stage.)
• Schema Refinement: (Normalization)
• Physical Database Design and Tuning

Page 46: Rdbms

46

E R Modeling

• Conceptual Schema Design
• Relational Calculus: Formal Language for Relational D/B.

Relational Calculus

Predicate Calculus Domain Calculus

SQL / Tuple Based Query By Examples

Page 47: Rdbms

47

Design Phases…

• Requirements Collection & Analysis
  • Data Requirements
  • Functional Requirements: User Defined Operations, Data Flow Diagrams, Sequence Diagrams, Scenarios
• Conceptual Design: Entity Types, Constraints, Relationships. No Implementation Details. Ensures the Design Meets the Requirements
• Logical Design: Data Model Mapping; Type of Database is identified
• Physical Design: Internal Storage Structures / Access Paths / File Organizations

Page 48: Rdbms

48

E-R Modeling

• Entity: anything that exists and is distinguishable
• Entity Set: a group of similar entities
• Attribute: properties that describe an entity
• Relationship: an association between entities

Page 49: Rdbms

49

Notations

ENTITY TYPE ( REGULAR )

WEAK ENTITY TYPE

RELATIONSHIP TYPE

WEAK RELATIONSHIP TYPE

Page 50: Rdbms

50

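CREATE TABLE Employees (
    ssn  CHAR(11),
    name CHAR(20),
    lot  INTEGER,
    PRIMARY KEY (ssn)
)

[Figure: entity set Employee with attributes ssn, name, lot. In the table below, each row is an Entity, the whole table is the Entity Set, and the columns are the Attributes.]

SSN          NAME       LOT
123-22-3666  Attishoo   48
231-31-5368  Smiley     22
131-24-3650  Smethurst  35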

Page 51: Rdbms

51

Types of Relationships

• 1:1  student [is issued] ID card
• 1:M  students [enrol in] course
• M:M  students [take] tests

Page 52: Rdbms

52

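ER Model

[Figure: ER diagram with Employee (ssn, name, lot), relationship Works_in (since), and Department (did, dname, budget). A second relationship, Reports_To, links Employee to itself with the roles supervisor and subordinate.]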

Page 53: Rdbms

53

ER Model (Contd.)

CREATE TABLE Works_In (
    ssn   CHAR(11),
    did   INTEGER,
    since DATE,
    PRIMARY KEY (ssn, did),
    FOREIGN KEY (ssn) REFERENCES Employees,
    FOREIGN KEY (did) REFERENCES Departments
)

Works_In
SSN          DID  SINCE
123-22-3666  51   1/1/91
123-22-3666  56   3/3/93
231-31-5368  51   2/2/92

Page 54: Rdbms

54

Key Constraints

[Figure: ER diagram with Employee (ssn, name, lot), relationship Manages (since), and Department (did, dname, budget).]

Page 55: Rdbms

55

Key Constraints for Ternary Relationships

[Figure: ER diagram with a ternary relationship Works_in (since) among Employee (ssn, name, lot), Department (did, dname, budget), and Location (address, capacity).]

Page 56: Rdbms

56

Participation Constraints

[Figure: ER diagram with Employee (ssn, name, lot) and Department (did, dname, budget) connected by two relationships, Manages (since) and Works_in (since).]

Page 57: Rdbms

57

Weak Entities

[Figure: ER diagram with Employee (ssn, name, lot), relationship Policy (cost), and weak entity Dependent (pname, age).]

Page 58: Rdbms

58

ISA (‘is a’) Hierarchies

[Figure: ER diagram with Employee (ssn, name, lot) and an ISA node leading to Hourly_Emp (Hrs_worked, Hrly_wages) and Contract_Emp (contractid).]

Page 59: Rdbms

59

Aggregation

[Figure: ER diagram. Labels: Employee (ssn, name, lot); monitors; until; project (pid, pbudget, Started on); sponsors; department (did, dname, budget).]

Page 60: Rdbms

60

Entity vs. Attribute

Works_In does not allow an employee to work in a department for two or more periods (why?)

[Figure: ER diagram with Employee (ssn, name, lot), relationship Works_in (from, to), and Department (did, dname, budget).]
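One way to see the answer is to look at the table that results from Works_In: a relationship instance is identified by the (ssn, did) pair, so the same pair cannot be stored twice with different dates. The sketch below uses Python's `sqlite3`; it keeps the earlier single `since` column for brevity, omits the foreign keys, and `add_period` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified Works_In: foreign keys left out, (ssn, did) is the primary key.
conn.execute(
    "CREATE TABLE Works_In ("
    " ssn CHAR(11), did INTEGER, since DATE,"
    " PRIMARY KEY (ssn, did))"
)


def add_period(ssn, did, since):
    """Try to record that an employee works in a department from `since`."""
    try:
        conn.execute("INSERT INTO Works_In VALUES (?, ?, ?)", (ssn, did, since))
        return True
    except sqlite3.IntegrityError:
        return False


first = add_period("123-22-3666", 51, "1991-01-01")
other_dept = add_period("123-22-3666", 56, "1993-03-03")
second_period = add_period("123-22-3666", 51, "1995-05-05")  # same pair again
```

The third insert is rejected because the pair (123-22-3666, 51) already exists, which is why a second period for the same employee and department cannot be represented.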

Page 61: Rdbms

61

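Entity vs. Attribute (Contd.)

[Figure: ER diagram with Employee (ssn, name, lot), relationship Works_in, Department (did, dname, budget), and an additional entity Duration (from, to) attached to Works_in.]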

Page 62: Rdbms

62

Entity vs. Relationship

[Figure: ER diagram with Employee (ssn, name, lot), relationship manages (since, DB), and Department (did, dname, budget). DB = Dbudget.]

Page 63: Rdbms

63

Entity vs. Relationship

[Figure: ER diagram. Labels: Employee (ssn, name, lot); manages; Department (did, dname, budget); Mgr_appt; Appt num; since; DBudget.]

Page 64: Rdbms

64

Binary vs. Ternary Relationships

[Figure: ER diagram with a ternary relationship covers among Employee (ssn, name, lot), Policy (policyid, cost), and Dependent (pname, age).]

Page 65: Rdbms

65

Binary vs. Ternary Relationships

Better Design

[Figure: ER diagram with Employee (ssn, name, lot), Policy (policyid, cost), and Dependent (pname, age), connected by two binary relationships, purchaser and Beneficiary.]

Page 66: Rdbms

66

• Some constraints cannot be captured in ER diagrams:

• Functional dependencies

• Inclusion dependencies

• General constraints

Constraints Beyond the ER Model

Page 67: Rdbms

67

E-R Diagram

[Figure: E-R diagram. Entities: DEPARTMENT, EMPLOYEE, DEPENDENT, PROJECT, SUPPLIER, PART. Relationships: DEPT_EMP, EMP_DEP, PROJ_WORK, PROJ_MGR, SUPP_PART_PROJ, SUPP_PART, PART_STRUCTURE. The connecting lines carry cardinality labels 1 and M.]

Page 68: Rdbms

68

Example to Start with ….

• An example database application called COMPANY, which serves to illustrate the ER Model concepts and their use in schema design.
• The following requirements were collected from the client.

Page 69: Rdbms

69

Analysis…

Company:
The company is organized into departments. Each department has a name, a number and a manager who manages the department. The company keeps track of the date on which that employee began managing the department. A department may have several locations.

Page 70: Rdbms

70

Analysis…

Department:
A department controls a number of projects, each of which has a unique name, a number and a single location.

Employee:
Name, Age, Gender, BirthDate, SSN, Address, Salary. An employee is assigned to one department, and may work on several projects which need not be controlled by that department. The number of hours per week that an employee works on each project is also tracked.

Page 71: Rdbms

71

Analysis….

Keep track of the dependents of each employee for insurance policies: for each dependent we keep the first name, gender, date of birth and relationship to the employee.

Page 72: Rdbms

72

Now to our Company…

DEPARTMENT ( Name , Number , { Locations } , Manager, Start Date )

PROJECT( Name, Number, Location , Controlling Department )

EMPLOYEE ( Name ( Fname, Lname ), SSN, Gender, Address, Salary, Birthdate, Department, Supervisor, ( Workson ( Project, Hrs ) ) )

DEPENDENT ( Employee, Name, Gender, Birthdate , Relationship )

Page 73: Rdbms

73

Example …

• Manage:
  • Department and Employee
  • Partial Participation
  • Relation Attribute: StartDate.
• Works For:
  • Department and Employee
  • Total Participation

Page 74: Rdbms

74

Example…

• Control:
  • Department, Project
  • Partial Participation from Department
  • Total Participation from Project
  • Control Department is a RKA.
• Supervisor:
  • Employee, Employee
  • Partial and Recursive

Page 75: Rdbms

75

Example …

• Works-On:
  • Project, Employee
  • Total Participation
  • Hours Worked is a RKA.
• Dependants of:
  • Employee, Dependant
  • Dependant is a weak entity
  • Dependant is Total, Employee is Partial.

Page 76: Rdbms

76

One Possible mapping of the Problem Statement

[Figure: ER diagram. Entities: Employee (Name (Fname, Lname), SSN, Sex, Address, Sal, Bdate); Department (Name, No, Loc); Project (Name, No, Loc); Dependent (Name, Sex, Bdate, Relationship). Relationships: Works For, manages (Sdate), Controls, WorksOn (Hours), Supervises, Depend On.]

Page 77: Rdbms

77

Page 78: Rdbms

78

Page 79: Rdbms

79

Page 80: Rdbms

80

Page 81: Rdbms

Schema Refinement and Normalization

Page 82: Rdbms

82

Normalization and Normal Forms

• Normalization:
  • Decomposing a larger, complex table into several smaller, simpler ones.
  • Move from a lower normal form to a higher normal form.
• Normal Forms:
  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • *Higher Normal Forms (BCNF, 4NF, 5NF ....)
• In practice, 3NF is often good enough.

Page 83: Rdbms

83

Why Normal Forms

• The first question to ask is whether any refinement is needed!
• If a relation is in a certain normal form (BCNF, 3NF etc.), it is known that certain kinds of problems are avoided/minimized. This can be used to help us decide whether decomposing the relation will help.

Page 84: Rdbms

84

The Evils of Redundancy

• Redundancy is at the root of several problems associated with relational schemas
  • More seriously, data redundancy causes several anomalies: insert, update, delete
  • Wastage of storage.
• Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).

Page 85: Rdbms

85

Refining an ER Diagram - Before

[Figure: ER diagram. Labels: Employee (ssn, name, lot); Works_in (since); Department (did, dname, budget).]

Page 86: Rdbms

86

Refining an ER Diagram - After

[Figure: ER diagram. Labels: Employee (ssn, name); lot; Works_in (since); Department (did, dname, budget).]

Page 87: Rdbms

87

First Normal Form

• A table is in 1NF if every row contains exactly one value for each attribute.
• Disallow multivalued attributes, composite attributes and their combinations.
• 1NF states that:
  • domains of attributes must include only atomic (simple, indivisible) values, and the value of any attribute in a tuple must be a single value from the domain of that attribute.
• By definition, any relational table must be in 1NF.

Page 88: Rdbms

88

Functional Dependencies (FDs)

• Provide a formal mechanism to express constraints between attributes
• Given a relation R, attribute Y of R is functionally dependent on the attribute X of R if and only if each X-value in R has associated with it precisely one Y-value in R.
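The definition can be checked mechanically against a relation instance: scan the rows and confirm that no X-value is seen with two different Y-values. The function `fd_holds` and the sample rows (rating, wage, hours) are illustrative. Keep in mind that an instance can only refute a dependency; whether it holds in general is a statement about the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, each lhs value maps to one rhs value."""
    seen = {}
    for row in rows:
        x = tuple(row[name] for name in lhs)
        y = tuple(row[name] for name in rhs)
        # setdefault stores y the first time x is seen and returns the
        # stored value afterwards, so a mismatch means two y values for x.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Illustrative rows: wage is determined by rating, hours is not.
rows = [
    {"rating": 8, "wage": 10, "hours": 40},
    {"rating": 8, "wage": 10, "hours": 30},
    {"rating": 5, "wage": 7, "hours": 30},
    {"rating": 5, "wage": 7, "hours": 32},
    {"rating": 8, "wage": 10, "hours": 40},
]

rating_to_wage = fd_holds(rows, ["rating"], ["wage"])
rating_to_hours = fd_holds(rows, ["rating"], ["hours"])
```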

Page 89: Rdbms

89

Full Dependency

• Concept of full functional dependency
  • An FD X -> Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.

Page 90: Rdbms

90

Partial Dependency

• An FD X -> Y is a partial dependency if there is some attribute A in X that can be removed from X and the dependency will still hold.

Page 91: Rdbms

91

Example: Constraints on Entity Set

S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40

Decomposed into:

R  W
8  10
5  7

S            N          L   R  H
123-22-3666  Attishoo   48  8  40
231-31-5368  Smiley     22  8  30
131-24-3650  Smethurst  35  5  30
434-26-3751  Guldu      35  5  32
612-67-4134  Madayan    35  8  40

Page 92: Rdbms

92

Second Normal Form (2NF)

• A relation schema R is in 2NF if:
  • it is in 1NF and
  • every non-prime attribute A in R is fully functionally dependent on the primary key of R.
• 2NF prohibits partial dependencies.

Page 93: Rdbms

93

2NF: An Example

• Emp {Eno, Dept, ProjCode, Hours}
  • Primary key: {Eno, ProjCode}
  • {Eno} -> {Dept}, {Eno, ProjCode} -> {Hours}
• Test of 2NF
  • {Eno} -> {Dept}: partial dependency.
  • Emp is in 1NF, but not in 2NF.
• Decomposition:
  • Emp {Eno, Dept}
  • Proj {Eno, ProjCode, Hours}

Page 94: Rdbms

94

Transitive Dependency

• An FD X -> Y in a relation schema R is a transitive dependency if
  • there is a set of attributes Z that is not a subset of any key of R, and
  • both X -> Z and Z -> Y hold.

Page 95: Rdbms

95

Third Normal Form

• A relation schema R is in 3NF if
  • it is in 2NF and
  • no nonprime attribute of R is transitively dependent on the primary key.
• 3NF means that each non-key attribute value in any tuple is truly dependent on the Primary Key and not even partially on other attributes.
• 3NF prohibits transitive dependencies.

Page 96: Rdbms

96

3NF: An Example

• Emp {Eno, Dept, Dept_Head}
  • Primary key: {Eno}
  • {Eno} -> {Dept}, {Dept} -> {Dept_Head}
• Test of 3NF
  • {Eno} -> {Dept} -> {Dept_Head}: transitive dependency.
  • Emp is in 2NF, but not in 3NF.
• Decomposition:
  • Emp {Eno, Dept}
  • Dept {Dept, Dept_Head}

Page 97: Rdbms

97

Boyce-Codd Normal Form

The motivation for BCNF is that 3NF does not satisfactorily handle the case of a relation possessing two or more composite or overlapping candidate keys.

Page 98: Rdbms

98

BCNF (Boyce-Codd Normal Form)

A relation is said to be in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a candidate key.

Page 99: Rdbms

99

Decomposition of a Relation Scheme

• Suppose that relation R contains attributes A1 ... An. A decomposition of R consists of replacing R by two or more relations such that:
  • Each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R), and
  • Every attribute of R appears as an attribute of one of the new relations.

Page 100: Rdbms

100

Page 101: Rdbms

101

Page 102: Rdbms

102

Page 103: Rdbms

103

Page 104: Rdbms

104

Page 105: Rdbms

105

Page 106: Rdbms

106

Page 107: Rdbms

Transaction, Concurrency Control and Recovery

Page 108: Rdbms

108

Transaction

• A sequence of many actions which are considered to be one atomic unit of work.
  • Read, write, commit, abort
• Governed by four ACID properties:
  • Atomicity, Consistency, Isolation, Durability
• Has a unique starting point, some actions and one end point

Page 109: Rdbms

109

The ACID Properties

• Atomicity: All actions in the transaction happen, or none happen.
• Consistency: If each transaction is consistent, and the DB starts consistent, it ends up consistent.
• Isolation: Execution of one transaction is isolated from that of other transactions.
• Durability: If a transaction commits, its effects persist.

Page 110: Rdbms

110

Atomicity

• All-or-nothing, no partial results. An event either happens and is committed or fails and is rolled back.
  – e.g. in a money transfer, debit one account, credit the other. Either both the debiting and crediting operations succeed, or neither of them does.
• Transaction failure is called Abort
• Commit and abort are irrevocable actions. There is no undo for these actions.
• An Abort undoes operations that have already been executed
  – For database operations, restore the data's previous value from before the transaction (roll it back); a Rollback command will undo all actions taken since the last commit for that user.
  – But some real world operations are not undoable. Examples - transfer money, print ticket, fire missile
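The money transfer example can be tried with Python's built-in sqlite3 module. This is a minimal sketch: the `account` table, its CHECK constraint and the function names are assumptions made for illustration. The credit is applied first on purpose, so that a failing debit leaves a partial result for the rollback to undo.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit of work; roll back both on failure."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # undoes the credit that was already applied
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

Transferring 500 from A fails the CHECK constraint on the debit, and after the rollback both balances are unchanged.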

Page 111: Rdbms

111

Consistency

• Every transaction should maintain DB consistency
  – Referential integrity - e.g. each order references an existing customer number and existing part numbers
  – The books balance (debits = credits, assets = liabilities)
• Consistency preservation is a property of a transaction, not of the database mechanisms for controlling it (unlike the A, I, and D of ACID)
• If each transaction maintains consistency, then a serial execution of transactions does also

Page 112: Rdbms

112

Isolation

Intuitively, the effect of a set of transactions should be the same as if they ran independently.
• Formally, an interleaved execution of transactions is serializable if its effect is equivalent to a serial one.
• Implies a user view where the system runs each user's transaction stand-alone.
Of course, transactions in fact run with lots of concurrency, to use device parallelism; this will be covered later.
• Transactions can use common data (shared data)
• They can use the same data processing mechanisms (time sharing)

Page 113: Rdbms

113

Durability

• When a transaction commits, its results will survive failures (e.g. of the application, OS, DB system … even of the disk).
• Makes it possible for a transaction to be a legal contract.
• Implementation is usually via a log
  – DB system writes all transaction updates to a log file
  – to commit, it adds a record "commit(Ti)" to the log
  – when the commit record is on disk, the transaction is committed.
  – system waits for disk ack before acknowledging to user

Page 114: Rdbms

114

Transaction processing

• Can be automatic (controlled by the RDBMS) or programmatic (programmed using SQL or other supported programming languages, like PL/SQL)

Page 115: Rdbms

115

Why Have Concurrent Processes?

• Better transaction throughput
• Improved response time
• Done via better utilization of resources:
  – While one process is doing a disk read, another can be using the CPU or reading another disk.

Page 116: Rdbms

116

Typical situations requiring concurrency control

• Exclusive access to an external device or shared service (e.g., managing printer queues)
• Coordination of applications which process parallel data (e.g. parallel DB servers)
• Disabling or enabling execution of the client programs in a specific moment (typically for database administration - e.g. database backups, enforcing resource occupation, etc.)
• Detection of transaction ends when managing multiple sessions for connection to the database (client/server architectures, Web access)

Page 117: Rdbms

117

Problems with Concurrency (in absence of locking)

• Lost Update problem - an update is lost because a write operation from an overlapping transaction overwrites it
• Temporary Update problem - a transaction uses a change made by an overlapping transaction that is later discarded by a rollback
• Incorrect Summary problem - values used for a calculation are overwritten by write operations from other transactions while the calculation is in progress

Page 118: Rdbms

118

Lost Update Problem

Time  Transaction A      Transaction B      Value
T0    Start A                               6
T1    Read Value (6)     Start B            6
T2    Add 2 (6+2=8)      Read Value (6)     6
T3    Write Value (8)    Add 3 (6+3=9)      8
T4    End A              Write Value (9)    9
T5                       End B              9

What should the final Order Value be?
Which update has been lost?
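The lost update timeline can be replayed step by step in plain Python. This is only a sketch: the variable names are made up, and ordinary assignments stand in for database reads and writes.

```python
def lost_update_interleaving():
    """Replay the interleaved steps of transactions A and B on one shared value."""
    value = 6
    a_local = value      # A reads 6
    a_local += 2         # A computes 6 + 2 = 8
    b_local = value      # B reads 6, before A has written
    value = a_local      # A writes 8
    b_local += 3         # B computes 6 + 3 = 9 from its stale copy
    value = b_local      # B writes 9, overwriting A's update
    return value


def serial_execution():
    """Run A to completion, then B."""
    value = 6
    value += 2           # A
    value += 3           # B
    return value
```

The interleaving ends with 9, while either serial order ends with 11, so it is A's "+2" that has been lost.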

Page 119: Rdbms

119

Temporary Update Problem

Time  Transaction A        Transaction B       Value
T0    Start A                                  6
T1    Read Value (6)                           6
T2    Add 2 (8)                                6
T3    Write Value (8)      Start B             8
T4    Failure: Rollback!   Read Value (8)      8
T5    Write Value (6)      Add 3 (8+3=11)      6
T6    End A                Write Value (11)    11
T7                         End B               11

What should the final Order Value be?
Where is the temporary update?
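The same kind of replay works for the temporary update (dirty read) timeline. Again this is a sketch with made-up variable names, where `before_image` stands for the old value that A restores when it rolls back.

```python
def temporary_update_interleaving():
    """Replay A's failed update and B's read of the uncommitted value."""
    value = 6
    before_image = value       # A remembers the old value for rollback
    value = before_image + 2   # A writes 8, not yet committed
    b_local = value            # B reads the temporary value 8
    value = before_image       # A fails and rolls back, restoring 6
    value = b_local + 3        # B writes 11, built on a value that never committed
    return value
```

B ends with 11, although A's change was rolled back and the result of B alone would have been 6 + 3 = 9.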

Page 120: Rdbms

120

Incorrect Summary Problem

Time  Transaction A          Transaction B         Values
T0
T1    Read 1st Value (6)                           6, 3
T2    Add 2 (6+2=8)                                6, 3
T3    Write 1st Value (8)                          8, 3
T4    Read 2nd Value (3)     Read 1st Value (8)    8, 3
T5    Add 2 (3+2=5)          Read 2nd Value (3)    8, 3
T6    Write 2nd Value (5)    Total Sum = 11        8, 5

What should the total Order Value be?
Which order was accumulated before update, and which after?
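A short replay of the incorrect summary timeline, as a sketch with hypothetical names. Transaction A adds 2 to each of two values while transaction B sums them.

```python
def incorrect_summary_interleaving():
    """B sums two values while A is halfway through updating them."""
    values = [6, 3]
    values[0] = values[0] + 2   # A updates the 1st value: 6 -> 8
    total = values[0]           # B reads the 1st value after A's update (8)
    total += values[1]          # B reads the 2nd value before A's update (3)
    values[1] = values[1] + 2   # A updates the 2nd value: 3 -> 5
    return total, values
```

B reports 11, which matches neither the sum before A ran (6 + 3 = 9) nor the sum after A finished (8 + 5 = 13).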

Page 121: Rdbms

121

3.1 Database State and Changes

[Diagram: transaction T takes the database from State D1 at time t1 to State D2 at time t2]

D1, D2 - Logically consistent states of the database data
T - Transaction for changing the database
t1, t2 - Absolute time before and after the transaction

Page 122: Rdbms

122

3.2 Transaction State and Progress

[State diagram: states active, partially committed, committed, aborted, terminated; transitions BEGIN, READ/WRITE, END, COMMIT, ROLLBACK]

A transaction reaches its commit point when all operations accessing the database are completed and the result has been recorded in the log. It then writes a [commit, <transaction-id>] entry and terminates.

When a system failure occurs, search the log file for entries [start, <transaction-id>]. If there is no logged entry [commit, <transaction-id>] for a transaction, undo all of its operations that have logged entries [write, <transaction-id>, X, old_value, new_value].
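The undo procedure described above can be sketched in a few lines. The log is modelled as a list of tuples in the shapes given in the text, and the database as a dict; both are simplifying assumptions for illustration. Writes are undone newest first so that the oldest before-value wins.

```python
def undo_uncommitted(log, db):
    """Restore old values for every transaction that started but never committed."""
    started = {entry[1] for entry in log if entry[0] == "start"}
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    unfinished = started - committed
    for entry in reversed(log):  # undo in reverse order
        if entry[0] == "write" and entry[1] in unfinished:
            _kind, _txn, item, old_value, _new_value = entry
            db[item] = old_value
    return unfinished


# Hypothetical log at the time of the crash: T1 committed, T2 did not.
crash_log = [
    ("start", "T1"),
    ("write", "T1", "X", 6, 8),
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "X", 8, 11),
    ("write", "T2", "Y", 3, 5),
]
```

Starting from `{"X": 11, "Y": 5}`, only T2's writes are undone, which leaves X at T1's committed value 8 and Y at 3.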

Page 123: Rdbms

123

Schedules

[Schedule diagram over transactions T1 and T2, with actions in order: R(A), W(A), R(B), W(B), R(C), W(C)]

• Schedule: Actions of transactions as seen by the DBMS

Page 124: Rdbms

124

Serializable Schedule

• A schedule whose effect on the DB "state" is the same as that of some serial schedule
• All serial schedules are serializable
• But the reverse may not be true

Page 125: Rdbms

125

Serializability Violations

T1: Transfer Rs.10,000 from A to B
T2: Add 6% interest to A & B

T1        T2
R(A)
W(A)
          R(A)
          W(A)
          R(B)
          W(B)
          commit
R(B)
W(B)
commit

Database is inconsistent!
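To see why this interleaving is inconsistent, the sketch below compares it with both serial orders. It assumes the usual reading of the slide, in which T1 updates A, then T2 updates both A and B and commits, then T1 updates B. The starting balances of Rs.100,000 each are an assumption for the example, and interest is computed in whole rupees.

```python
def add_interest(balance):
    """Add 6% interest using integer arithmetic."""
    return balance * 106 // 100


def serial_t1_then_t2(a, b, amount=10_000):
    a, b = a - amount, b + amount          # T1: transfer
    return add_interest(a), add_interest(b)  # T2: interest


def serial_t2_then_t1(a, b, amount=10_000):
    a, b = add_interest(a), add_interest(b)  # T2: interest
    return a - amount, b + amount            # T1: transfer


def interleaved(a, b, amount=10_000):
    a = a - amount        # T1: R(A), W(A)
    a = add_interest(a)   # T2: R(A), W(A)
    b = add_interest(b)   # T2: R(B), W(B), commit
    b = b + amount        # T1: R(B), W(B), commit
    return a, b
```

With these starting balances both serial orders end with a total of 212,000, while the interleaved schedule ends with 211,400: the transferred Rs.10,000 earned no interest in either account, so the schedule matches no serial execution.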

Page 126: Rdbms

126

Cascading Aborts

T1        T2
R(A)
W(A)
          R(A)
          W(A)
abort

When T1 aborts, T2 has already read the value of A written by T1, so T2 must be aborted as well.

Page 127: Rdbms

127

Recoverable Schedules

Unrecoverable Schedule

T1        T2
R(A)
W(A)
          R(A)
          W(A)
          commit
abort

Recoverable Schedule

T1        T2
R(A)
W(A)
          R(A)
          W(A)
commit
          commit
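The difference between the two schedules can be stated as a rule: a transaction may commit only after every transaction whose uncommitted data it read has committed. The checker below is a simplified sketch of that rule. The tuple encoding of a schedule is an assumption, and the checker does not model values being restored after an abort.

```python
def is_recoverable(schedule):
    """schedule: list of (txn, action, item) with action in R, W, commit, abort."""
    last_writer = {}   # item -> transaction that wrote it most recently
    reads_from = {}    # reader -> writers whose uncommitted data it read
    finished = {}      # txn -> "commit" or "abort"
    for txn, action, item in schedule:
        if action == "W":
            last_writer[item] = txn
        elif action == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in finished:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "commit":
            for writer in reads_from.get(txn, ()):
                if finished.get(writer) != "commit":
                    return False  # committed before a transaction it depends on
            finished[txn] = "commit"
        elif action == "abort":
            finished[txn] = "abort"
    return True


unrecoverable = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "A"), ("T2", "W", "A"),
    ("T2", "commit", None), ("T1", "abort", None),
]
recoverable = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "A"), ("T2", "W", "A"),
    ("T1", "commit", None), ("T2", "commit", None),
]
```

In the first schedule T2 commits while T1, whose write it read, is still undecided; when T1 then aborts, T2's commit can no longer be undone.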

Page 128: Rdbms

128

Locking

• The concept of locking data items is one of the main techniques for controlling the concurrent execution of transactions.
• A lock is a variable associated with a data item in the database.
  – Generally there is a lock for each data item in the database.
• A lock describes the status of the data item with respect to possible operations that can be applied to that item
  – used for synchronising the access by concurrent transactions to the database items.
• A transaction locks an object before using it
• When an object is locked by another transaction, the requesting transaction must wait

Page 129: Rdbms

129

Locking Granularity

• A database item which can be locked could be
  – a database record
  – a field value of a database record
  – the whole database
• Trade-offs
  – coarse granularity: the larger the data item size, the lower the degree of concurrency
  – fine granularity: the smaller the data item size, the more locks to be managed and stored, and the more lock/unlock operations needed.

Page 130: Rdbms

130

Locking: A Technique for Concurrency Control

Compatibility matrix for lock types X and S (rows: lock held, columns: lock requested)

      --    S     X
--    Yes   Yes   Yes
S     Yes   Yes   No
X     Yes   No    No

S: Shared lock
X: Exclusive lock
--: No lock

• Locks are automatically obtained by the DBMS.
• Guarantees serializability when locks are acquired and released under a two-phase protocol.
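The standard compatibility rule for shared and exclusive locks (S is compatible only with S, X with nothing) fits in a small lookup table. The names below are hypothetical; an empty collection of held modes stands for "no lock".

```python
# (mode already held, mode requested) -> can both be held at once?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_modes):
    """True if the requested mode is compatible with every lock currently held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

For instance, `can_grant("S", ["S", "S"])` holds, while `can_grant("X", ["S"])` does not; any request succeeds on an item with no locks.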

Page 131: Rdbms

131

Two-Phase Locking (2PL)

Strict 2PL:
– If T wants to read an object, it first obtains an S lock.
– If T wants to modify an object, it first obtains an X lock.
– Hold all locks until end of transaction.
– Guarantees serializability, and recoverable schedules, too! It also avoids WW problems!

2PL:
– Slight variant of strict 2PL
– Transactions can release locks before the end (commit or abort), but after releasing any lock a transaction can acquire no new locks
– Guarantees serializability
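The two-phase rule (no lock may be acquired once any lock has been released) is easy to check for a single transaction's sequence of operations. A minimal sketch, with a made-up encoding of operations as `("lock" | "unlock", item)` pairs:

```python
def is_two_phase(ops):
    """True if every lock operation comes before the first unlock."""
    released_any = False
    for op, _item in ops:
        if op == "unlock":
            released_any = True   # the shrinking phase has begun
        elif op == "lock" and released_any:
            return False          # acquiring after a release violates 2PL
    return True
```

`[("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]` is two-phase, whereas `[("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]` is not. Under strict 2PL all the unlocks would additionally happen only at commit or abort.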

Page 132: Rdbms

132

Handling a Lock Request

[Flowchart: a Lock Request (XID, OID, Mode) passes through the decisions "Currently Locked?", "Mode == X / Mode == S", "Empty Wait Queue?" and "Currently X-locked?" (each Yes/No) and ends in either "Put on Queue" or "Grant Lock"]
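The flowchart's arrows did not survive in this transcript, so the following is one plausible reading of it rather than the slide's exact logic: an unlocked object is granted at once, an X request on a locked object waits, and an S request is granted only if nobody is waiting and the object is not X-locked. The class and method names are hypothetical, and lock release and upgrades are left out.

```python
from collections import deque


class LockTable:
    """Toy lock table keyed by object id (OID)."""

    def __init__(self):
        self.holders = {}   # oid -> {xid: mode}
        self.queues = {}    # oid -> deque of waiting (xid, mode) requests

    def request(self, xid, oid, mode):
        holders = self.holders.setdefault(oid, {})
        queue = self.queues.setdefault(oid, deque())
        if not holders:                        # Currently locked? No
            holders[xid] = mode
            return "granted"
        x_locked = "X" in holders.values()     # Currently X-locked?
        if mode == "S" and not queue and not x_locked:
            holders[xid] = mode                # share with the existing S holders
            return "granted"
        queue.append((xid, mode))              # Put on queue
        return "queued"
```

Making an S request wait whenever the queue is non-empty keeps a waiting X request from being overtaken indefinitely by later readers.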

Page 134: Rdbms

134

Recovery

• Occurs in case of transaction failures.
• Database (DB) is restored to the most recent consistent state just before the time of failure.
• To do this, the DB system needs information about changes applied by various transactions. This information is kept in the system log.

Page 135: Rdbms

135

Recovery: Motivation

[Timeline diagram: transactions T1 to T5 shown relative to the time of a crash]

• Atomicity: Undoing actions of transactions that do not commit
• Durability: Making sure all actions of committed transactions survive system crashes
• The Recovery Manager guarantees Atomicity & Durability.

Page 136: Rdbms

136

Recovery Outline

• Restore to most recent "consistent" state just before time of failure
  – Use data in the log file
• Catastrophic Failure
  – Restore database from backup
  – Replay transactions from log file
• Database becomes inconsistent (non-catastrophic errors)
  – Undo or Redo last transactions until consistent state is restored

Page 137: Rdbms

137

Logging

• Record REDO and UNDO information, for every update, in a log.
  – Sequential writes to log (put it on a separate disk).
  – Minimal info (diff) written to log, so multiple updates fit in a single log page.

Page 138: Rdbms

138

Handling the Buffer Pool

• When is buffer written back to disk?
• Steal/No-steal
  – Can it be written before commit? (steal)
  – Or does it have to wait till after commit? (no-steal)
• Force/No-force
  – Is it written "immediately" after commit? (force)
  – Or can it remain in memory? (no-force)

           NoSteal    Steal
NoForce               Desired
Force      Trivial

Page 139: Rdbms

139

Write-Ahead Logging (WAL)

• The Write-Ahead Logging Protocol:
  – Must force the log record for an update before the corresponding data page gets to disk.
  – Must write all log records for a transaction before commit.
• What goes into log:
  – BFIM (before image) needed for UNDO type algorithms
  – AFIM (after image) needed for REDO type algorithms
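The two WAL rules can be illustrated with a toy in-memory model. Everything here (the class, its fields, and the use of list positions as log sequence numbers) is an assumption made for the sketch; a real system forces log pages to stable storage instead of updating a counter.

```python
class WalSketch:
    """Toy model: 'disk' and 'log' are in-memory stand-ins for stable storage."""

    def __init__(self):
        self.log = []           # log records; the list index serves as the LSN
        self.flushed_lsn = -1   # highest LSN considered safely on disk
        self.page_lsn = {}      # page -> LSN of the latest update to that page
        self.buffer = {}        # page -> value in the buffer pool
        self.disk = {}          # page -> value on disk

    def update(self, txn, page, new_value):
        old_value = self.buffer.get(page, self.disk.get(page))
        # The record carries both the before image and the after image.
        self.log.append(("write", txn, page, old_value, new_value))
        lsn = len(self.log) - 1
        self.page_lsn[page] = lsn
        self.buffer[page] = new_value
        return lsn

    def flush_log(self, up_to_lsn):
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)

    def write_page(self, page):
        # Rule 1: force the log record before the data page reaches disk.
        lsn = self.page_lsn.get(page, -1)
        if lsn > self.flushed_lsn:
            self.flush_log(lsn)
        self.disk[page] = self.buffer[page]

    def commit(self, txn):
        # Rule 2: all log records of the transaction are on disk before commit returns.
        self.log.append(("commit", txn))
        self.flush_log(len(self.log) - 1)
```

Writing a dirty page first pushes the log up to that page's latest record, and a commit pushes the whole log including the commit record.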

Page 140: Rdbms

140

Checkpoints in the System Log

• Checkpoint record written in log when all updated DB buffers written out to disk
• Any committed transaction occurring before checkpoint in log can be considered permanent (won't have to be redone after crash)
• Actions
  – suspend execution of all transactions
  – force-write all modified buffers to disk
  – write checkpoint entry in log and force-write log
  – resume transactions
• Fuzzy checkpointing
  – resume transactions as soon as buffers written
