yazici database concepts ek
Post on 15-Jan-2016
225 Views
Preview:
DESCRIPTION
TRANSCRIPT
1
DATABASE MODELING
Prof.Dr. Adnan YAZICIDepartment of Computer Engineering
Middle East Technical University (METU)
Ankara - Turkey(February 2008)
2
CONTENT
� Introduction� Data Modeling� Database Design� Database Models� EER Database Model� Relational Database Model� Nested Relational Database Model� Object-Relation Database Model� Object-Oriented Database Model
3
Models of Reality
REALITY
• structures
• processes
DATABASE MNG SYSTEM
DATABASE
DML
DDL
l A database is a model of structures of reality
l The use of a database reflect processes of realityl Ex: data collected, maintained, and used in airline reservation system
l A database management system is a software systemwhich supports the definition and the use of a database
l DDL: Data Definition Language
l DML: Data Manipulation Language
4
Why Use Models?� Models can be useful when we want to examine or
manage part of the real world.
� The costs of using a model are often considerably lower than the costs of using or experimenting with the real world itself.
� Examples:� airplane simulator
� nuclear power plant simulator
� flood warning system
� model of Turkish economy
� model of a heat reservoir
� map
5
Definition
Manipulation
Construction
• We define the database
• Initial state when database ispopulated (loaded)
• Current state changes with each insert, delete, update
• Hopefully, the database goes from one correct/valid state toanother
Database architecture
6
Three-schema architecture
External
View-1External
View-n
Conceptual
Schema
Internal
Schema
Physical storage
structures and
details
Describes the
whole database
for all users
A specific user or
groups view of the
database
Data residing in
disks
7
Use a DBMS when this is important
� persistent storage of data
� centralized control of data
� control of redundancy
� control of consistency and integrity
� multiple user support
� sharing of data
� data documentation
� data independence
� control of access and security
� backup and recovery
Do not use a DBMS when
l the initial investment in hardware, software, and training is high
l the generality a DBMS provides is not needed
l the overhead for security, concurrency control, and recovery is high
l data and applications are simple and stable
l real-time requirementscannot be met by it
l multiple user access is not needed
8
Defn: Ability to change the database at one level with no impact to the next higher level
1.Physical data independence - the ability to change the physical schema without affecting the conceptual schema• Example: add a new index structure
2.Logical data independence - the ability to change the conceptual schema without affecting existing external views or application programs• Example: add an attribute
Data Independence
9
Data Modeling
REALITY
• structures
• processes
DATABASE MNG SYSTEM
DB MODEL
data modeling
l The database mdatabase mdatabase mdatabase modelodelodelodel represents a perception of structures of reality
l The data modelingdata modelingdata modelingdata modeling process is to fix a perception of structures of reality and represent this perception
l In the data modeling process we selectselectselectselect aspects and we abstractabstractabstractabstract
10
Process Modeling
REALITY
• structures
• processes
DATABASE MNG SYSTEM
DB MODELprocess modeling
l The use of theuse of theuse of theuse of the modelmodelmodelmodel reflects processes of reality
l Processes may be represented by adadadad----hoc database hoc database hoc database hoc database queriesqueriesqueriesqueries and updates at run-time
l Processes may be represented by programs with programs with programs with programs with embedded database queriesembedded database queriesembedded database queriesembedded database queries and updates
DML DML
PROG
11
Database Design
� is a model of structures of reality
� supports queries and updates modeling processes of reality
� runs efficiently
The purpose of database design is to create a database which
12
Abstraction
� Classification
� Aggregation
� Generalization
It is very important that the language used for data representation supports abstraction
We will discuss three kinds of abstraction:
13
Classification
In a classification we form a concept in a way which allows us to decide whether or not a given phenomena is a member of the extension of the concept.
Ali Ed Veli ... Liz Joe Can
CUSTOMER
14
Aggregation
In an aggregation we form a concept from existing concepts. The phenomena that are members of the new concept’s extension are composed of phenomena from the extensions of the existing concepts.
AIRPLANE
COCKPIT
ENGINE
WING
15
Generalization
In a generalization we form a new concept by emphasizing common aspects of existing concepts, leaving out special aspects
EMPLOYEE
ENGINEERMANAGERSECRETARY
16
Generalization (cont.)
Subclasses may overlap
EMPLOYEE
MANAGERENGINEER
Subclasses may have multiple superclasses
MOTORIZED
VEHICLES
AIRBORNE
VEHICLES
TRUCKS HELICOPTERS GLIDERS
17
Relationships Between Abstractions
Abstraction Concretization
classification exemplificationaggregation decompositiongeneralization specialization
T TT
OO O
aggregation generalization
clas
sifi
cati
on intension
extension
18
Data Model
� data structures
� integrity constraints
� operations
A data modeldata modeldata modeldata model consists of notations for expressing:
19
Data Model - Data Structures
� attribute types
� entity types
� relationship types
FLIGHT# AIRLINE WEEKDAY PRICE
FLIGHT-SCHEDULE
101 delta mo 156
545 american we 110
912 scandinavian fr 450
242 usair mo 231
DEPT-AIRPORT
FLIGHT# AIRPORT-CODE
101 atl
912 cph
545 lax
All data models have notation for defining:
20
Data Model - Constraints
� Static constraints apply to database state
� Dynamic constraints apply to change of database state
E.g., “All FLIGHT-SCHEDULE entities must have precisely one DEPT-AIRPORT relationship”
FLIGHT# AIRLINE WEEKDAY PRICE
FLIGHT-SCHEDULE
101 delta mo 156
545 american we 110
912 scandinavian fr 450
242 usair mo 231
Constraints express rules that cannot be expressed by the data structures alone:
DEPT-AIRPORT
FLIGHT# AIRPORT-CODE
101 atl
912 cph
545 lax
242 bos
21
Data Model - Operations
� Insert FLIGHT-SCHEDULE(97, delta, tu, 258);
� Insert DEPT-AIRPORT(97, atl);
� Select FLIGHT#, WEEKDAY
From FLIGHT-SCHEDULE
Where AIRLINE=‘delta’;
Operations support change and retrieval of data:
FLIGHT# AIRLINE WEEKDAY PRICE
FLIGHT-SCHEDULE
101 delta mo 156
545 american we 110
912 scandinavian fr 450
242 usair mo 231
97 delta tu 258
DEPT-AIRPORT
FLIGHT# AIRPORT-CODE
101 atl
912 cph
545 lax
242 bos
97 atl
22
Data Model - Operations from Programs
declare C cursor for
select FLIGHT#, WEEKDAY
from FLIGHT-SCHEDULE
where AIRLINE=‘delta’;
open C;
repeat
fetch C into :FLIGHT#, :WEEKDAY;
do your thing;
until done;
close C;
FLIGHT# AIRLINE WEEKDAY PRICE
FLIGHT-SCHEDULE
101 delta mo 156
545 american we 110
912 scandinavian fr 450
242 usair mo 231
97 delta tu 258
23
Keys and Identifiers
� A key on FLIGHT# in FLIGHT-SCHEDULE will force all FLIGHT#’s to be unique in FLIGHT-SCHEDULE
� Consider the following keys on DEPT-AIRPORT:
Keys (or identifiers) are uniqueness constraints
FLIGHT# AIRPORT-CODE FLIGHT# AIRPORT-CODE FLIGHT# AIRPORT-CODEFLIGHT# AIRPORT-CODE
DEPT-AIRPORT
FLIGHT# AIRPORT-CODE
101 atl
912 cph
545 lax
242 bos
FLIGHT# AIRLINE WEEKDAY PRICE
FLIGHT-SCHEDULE
101 delta mo 156
545 american we 110
912 scandinavian fr 450
242 usair mo 231
24
Integrity and Consistency� Integrity: does the model reflect reality well?
� a FLIGHT# in FLIGHT-SCHEDULE cannot be null because it models the existence of an entity in the real world.
� Consistency: is the model without internal conflicts?
� a FLIGHT# in DEPT-AIRPORT must exist in FLIGHT-SCHEDULE because it doesn’t make sense for a non-existing FLIGHT-SCHEDULE entity to have a DEPT-AIRPORT.
DEPT-AIRPORT
FLIGHT# AIRPORT-CODE
101 atl
912 cph
545 lax
242 bos
FLIGHT# AIRLINE WEEKDAY PRICE
FLIGHT-SCHEDULE
101 delta mo 156
545 american we 110
912 scandinavian fr 450
242 usair mo 231
25
Triggers and Stored Procedures
� Triggers can be defined to enforce constraints on a database, e.g.,
� DEFINE TRIGGER DELETE-FLIGHT-SCHEDULE
ON DELETE FROM FLIGHT-SCHEDULE WHERE FLIGHT#=‘X’
ACTION DELETE FROM DEPT-AIRPORT WHERE FLIGHT#=‘X’;
DEPT-AIRPORT
FLIGHT# AIRPORT-CODE
101 atl
912 cph
545 lax
242 bos
FLIGHT# AIRLINE WEEKDAY PRICE
FLIGHT-SCHEDULE
101 delta mo 156
545 american we 110
912 scandinavian fr 450
242 usair mo 231
Yazici, ODTÜ, Bilgisayar Müh. Bl
Triggers
� Trigger: procedure that starts automatically if specified changes occur to the DBMS
� Common use is to maintain db consistent and activated by specific kind of stmt (Insert, Delete,
� Three parts:� Event (activates the trigger)
� Condition (tests whether the triggers should run)
� Action (what happens if the trigger runs)
Yazici, ODTÜ, Bilgisayar Müh. Bl
Triggers: Example (SQL2)
CREATE TRIGGER init_count BEFORE INSERT ON Students /*Event*/DECLARE
count INTEGERBEGIN /*Action*/
count := 0;END
CREATE TRIGGER incr_count AFTER INSERT ON Students /* Event */ WHEN (new.age < 18) /*Condition; ‘new’ is just inserted tuple*/FOR EACH ROW
BEGIN /*Action; a procedure in Oracle’s PL/SQL syntax*/
count := count + 1;END
Yazici, ODTÜ, Bilgisayar Müh. Bl
Triggers: Example (SQL2)
CREATE TRIGGER youngSailorUpdate
AFTER INSERT ON SAILORS
REFERENCING NEW TABLE NewSailors
FOR EACH STATEMENT
INSERT
INTO YoungSailors(sid, name, age, rating)
SELECT sid, name, age, rating
FROM NewSailors N
WHERE N.age <= 18
29
DATABASE MODELS -- OVERVIEW
� EER Database Model
� Relational Database Model
� Nested Relational Database Model
� Object-Relation Database Model
� Object-Oriented Database Model(s)
30
ER-Model - Data Structures
entity type
relationship
type
attribute
multivalued
attribute
derived
attribute
subset relationship
type
composite
attribute
31
ER-Model - Integrity Constraints
RE1 E21 n
RE1 E2
RE1 E2
(min,max)
RE1 E2R
cardinality: 1:n for E1:E2 in R
(min,max) participation of E2 in R
total participation of E2 in R
weak entity type E2; identifying relationship type R
E
E1 E3E2
A
key attribute
disjointd
o overlap
32
ER Model - Example
deptairport
internationalflight
domesticflight
d
flightinstance
flightschedule
instanceof
arrivairport
airport
1
n
n
1
street
depttime
airportcode
arrivtime
cityzip
airportaddr
airportname
date
reserva-tion
nn
1
n
customer
flight#
customername
customer#
seat#
weekdays
visarequired
33
Specialization Example
34
ER-Model - Operations
� Several navigational query languages have been proposed
� A closed query language as powerful as relational languages has not been developed
� None of the proposed query languages has been generally accepted
35
EER Database Model
l The EER-Model is successful as a database design model
l Translation algorithms to many databasemodels
l Commercial database design tools, e.g., Erwin
l No DBMS is based on the model
36
Relational Model
� Data Structures
� Integrity Constraints
� Operations
l Commercial systems include: ACCESS,
ORACLE 7i, DB2, SYBASE 8, INFORMIX, INGRES, SQL Server, MYSQL
l Dominates the database market on all platforms
37
Relational Model - Data Structures
� domains
� attributes
� relations
Flight-Scheduleflight#:
integer
airline:
char(20)
weekday:
char(2)
price:
dec(6,2)
relation nameattribute names
domain names
38
DDL - Tables
CREATE TABLE AIRLINE.FLT-SCHEDULE
(FLT# FLIGHTNUMBER NOT NULL,
AIRLINE VARCHAR(25),
FROM-AIRPORTCODE AIRPORT-CODE,
DTIME TIME,
TO-AIRPORTCODE AIRPORT-CODE,
ATIME TIME,
PRIMARY KEY (FLT#),
FOREIGN KEY (FROM-AIRPORTCODE) REFERENCES AIRPORT(AIRPORTCODE),
FOREIGN KEY (TO-AIRPORTCODE)
REFERENCES AIRPORT(AIRPORTCODE));
� To create a table in the AIRLINE schema:
39
Relational Model - Operations
� Powerful set-oriented query languages� Relational Algebra:
Procedural; describes how to compute a query Operators; Union, Intersect, Select, Project, Cartesian Product, Difference, Join, Division, Outer Join, Outer Union, etc.
� Relational Calculus: declarative; describes the desired result, e.g. SQL, QBE
� Insert, delete, and update capabilities
40
Relational Calculus
� The Relational Calculus is non-procedural. It allows you to express a result relation using a predicate on tuple variables (tuple calculus):
{ t | P(t) }
or on domain variables (domain calculus):
{ <x1, x
2, ..., x
n> | P(<x
1, x
2, ..., x
n>) }
� You tell the system which result you want, but not how to construct it
41
Projection
� “find FLT# for all flights scheduled for Mondays”
{ t.FLT# | FLT-WEEKDAY(t) ∧ t.WEEKDAY = MO}
FLT-WEEKDAY
flt# weekday
42
Relational Model - Operations
� tuple calculus example (SQL)
select flight#, date
from reservation R, customer C
where R.customer# = C.customer# and customer-name=‘Ali’;
� algebra example (ISBL)
((reservation join customer) where customer-name=‘Ali’) [flight#, date];
� domain calculus example (QBE)
customer
customer# customer-name
_c Ali
date
reservation
flight# customer#
_c.P .P
43
Interactive DML - set operations
� ““““Find FLT# for flights on Tuesdays in FLT-WEEKDAY and FLT# with more than 100 seats in FLT-INSTANCE ”
SELECTSELECTSELECTSELECT FLT#FROMFROMFROMFROM FLT-WEEKDAYWHEREWHEREWHEREWHERE WEEKDAY = “TU”UNIONUNIONUNIONUNIONSELECTSELECTSELECTSELECT FLT#FROMFROMFROMFROM FLT-INSTANCEWHEREWHEREWHEREWHERE #AVAIL-SEATS > 100;
� UNION ALLUNION ALLUNION ALLUNION ALL preserves duplicates
TS∪S T
S union T
44
Interactive DML - set operation
� ““““Find FLT# for flights on Tuesdays in FLT-WEEKDAY with more than 100 seats in FLT-INSTANCE”
SELECTSELECTSELECTSELECT FLT#FROMFROMFROMFROM FLT-WEEKDAYWHEREWHEREWHEREWHERE WEEKDAY = “TU”INTERSECTINTERSECTINTERSECTINTERSECTSELECTSELECTSELECTSELECT FLT#FROMFROMFROMFROM FLT-INSTANCEWHEREWHEREWHEREWHERE #AVAIL-SEATS > 100;
� INTERSECT ALLINTERSECT ALLINTERSECT ALLINTERSECT ALL preserves duplicates
TS∩S T
S intersect T
45
Interactive DML - set operation
� ““““Find FLT# for flights on Tuesdays in FLT-WEEKDAY except FLT# with more than 100 seats in FLT-INSTANCE”
SELECTSELECTSELECTSELECT FLT#FROMFROMFROMFROM FLT-WEEKDAYWHEREWHEREWHEREWHERE WEEKDAY = “TU”EXCEPTEXCEPTEXCEPTEXCEPTSELECTSELECTSELECTSELECT FLT#FROMFROMFROMFROM FLT-INSTANCEWHEREWHEREWHEREWHERE #AVAIL-SEATS > 100;
� EXCEPT ALLEXCEPT ALLEXCEPT ALLEXCEPT ALL preserves duplicates
TSS - T
S minus T
46
Interactive DML - joins
� cross join: Cartesian product� [inner] join: only keeps rows that satisfy the join
condition� natural or on-condition [or Equi Join] must be
specified for all inner and outer joins� natural: equi-join on columns with same name; one
column preserved� left outer join: keeps all rows from left table; fills in
nulls as needed� right outer join: keeps all rows from right table; fills in
nulls as needed� full outer join: keeps all rows from both tables; fills in
nulls as needed
47
Interactive DML - joins
� “Find all two-leg, one-day trips out of Istanbul; show also a leg-one even if there is no connecting leg-two the same day”
SELECT X.FLT# LEG-ONE, Y.FLT# LEG-TWO
FROM
((FLT-SCHEDULE NATURAL JOIN FLT-INSTANCE) X
LEFT OUTER JOIN
(FLT-SCHEDULE NATURAL JOIN FLT-INSTANCE) Y
ON (X.TO-AIRPORTCODE=Y.FROM-AIRPORTCODE
AND X.DATE=Y.DATE AND X.ATIME<Y.DTIME))
WHERE X.FROM-AIRPORTCODE=“IST”;
48
Interactive DML - built-in functions
� count (COUNT), sum (SUM), average (AVG), minimum (MIN), maximum (MAX)
� “Count flights scheduled for Tuesdays from FLT-WEEKDAY”
SELECT COUNT( *)
FROM FLT-WEEKDAYWHERE WEEKDAY = “TU”;
� “Find the average ticket price by airline from FLT-SCHEDULE”
SELECT AIRLINE, AVG(PRICE)
FROM FLT-SCHEDULEGROUP BY AIRLINE;
49
Relational Model - Integrity Constraints
� Keys
� Primary Keys
� Entity Integrity
� Referential Integrity
reservation
flight# date customer#
flight-schedule
flight#
p
customer
customer# customer name
p
50
Finding a Key for a RelationAlgorithm: Finding a Key K for R, given a set of FDs
1. Set K = R.2. For each attribute A in K
{Compute (K-A)+ wrt F;If (K-A)+ contains all the attributes in R, Then set K = K – {A}};
Example: R = Ssn, Pnumber, Ename, Pname, Plocation, HoursF = {Ssn � Ename, Pnumber � {Pname, Plocation},
{Ssn,Pnumber} � Hours}
The Key is {Ssn,Pnumber}, since {Ssn,Pnumber}+ = {Ssn, Pnumber, Ename, Pname, Plocation, Hours}
51
Good Database Design
� no redundancy of FACT (!)
� no inconsistency
� no insertion, deletion or update anomalies
� no information loss
� no dependency loss
52
� Informal measures of quality for relation schemas design.
1. Semantics of the attributes: it should be easy to explain the meaning of the schema. If a schema correspond to one entity type or one relationship type, its meaning tends to be clear.
2. Reducing the redundant values in tuples: no anomalies.
3. Reducing the null values in tuples: nulls in exceptional cases only.
4. Disallowing the possibility of generating spurious tuples.
Informal Design Guidelines for Good Relation Schemas
53
Null Values
123-45-6789
234-56-7890
345-67-8901
CUSTOMER
Lisa Smith
George Foreman
unknown
Lisa Jones
inapplicable
Mary Blake
inapplicable
drafted
inapplicable
CUSTOMER# NAME MAIDEN NAME DRAFT STATUS
l Null-value unknownunknownunknownunknown (unk)(unk)(unk)(unk) reflects that the attribute does apply, but the value is currently unknown. That’s ok!
l Null-value inapplicableinapplicableinapplicableinapplicable (dne)(dne)(dne)(dne) indicates that the attribute does not apply. That’s bad!
l Null-value inapplicableinapplicableinapplicableinapplicable (dne)(dne)(dne)(dne) results from the direct use of “catch all forms” in database design.
l “Catch all forms” are ok in reality, but detrimental in database design.
54
Bad Database Design- fact clutter
� insertion anomalies: how do we represent that TK912 is flown by Turkish Airline without there being a date and a plane assigned?
� deletion anomalies: cancelling AA411 on 10/22/00 makes us lose that it is flown by American.
� update anomalies: if DL242 is flown by KLM, we must change it everywhere.
FLIGHTS
flt# date airline plane#
DL242 10/23/00 Delta k-yo-33297
DL242 10/24/00 Delta t-up-73356
DL242 10/25/00 Delta o-ge-98722
AA121 10/24/00 American p-rw-84663
AA121 10/25/00 American q-yg-98237
AA411 10/22/00 American h-fe-65748
55
Normalization
FLIGHT# AIRLINE PRICE
FLIGHT-SCHEDULE
101 delta 156
545 american 110
912 scandinavian 450
FLIGHT# AIRLINE WEEKDAY PRICE
FLIGHT-SCHEDULE
101 delta mo
545 american mo 110
912 scandinavian fr 450
156
101 delta fr 156
545 american we 110
545 american fr 110
FLIGHT# WEEKDAY
FLIGHT-WEEKDAY
101 mo
545 mo
912 fr
101 fr
545 we
545 fr
FLIGHT# AIRLINE WEEKDAYS PRICE
FLIGHT-SCHEDULE
101 delta mo,fr 156
545 american mo,we,fr 110
912 scandinavian fr 450
unnormalized
redundant
just right!
56
Overview of Normal FormsNF2
1NF
2NF
3NF
BCNF
4NF
5NF
57
Normal Forms - definitions
� NF2: non-first normal form
� 1NF: R is in 1NF. iff all domain values are atomic2
� 2NF: R is in 2. NF. iff R is in 1NF and every nonkeyattribute is fully dependent on the key
� 3NF: R is in 3NF iff R is 2NF and every nonkey attribute is non-transitively dependent on the key
� BCNF: R is in BCNF iff every determinant is a candidate key
� Determinant: an attribute on which some other attribute(s) is fully functionally dependent.
58
Example of Normalization
flt# date plane# airline from to miles
FLT-INSTANCE
flt#dateplane#
airline
fromto
miles
59
flt#date
plane#
airline
fromto
miles
flt#date
plane# flt#
airline
fromto
miles
Example of Normalization
1NF:
2NF:
60
from
to
miles
flt#
airline
from
to
flt#
dateplane#
Example of Normalization
3NF & BCNF:
61
3NF that is not BCNF
A
B C
A decomposition:
Lossless, but not dependency preserving!
A B C
R
C B
R1
A C
R2
Candidate keys: {A,B} and {A,C}
Determinants: {A,B} and {C}
62
How to guarantee lossless joins
• Decompose relation, R, with functional dependencies, F, into relations, R1 and R2, with attributes, A1 and A2, and associated FDs, F1 and F2.
• The decomposition is lossless iff:
• R1∩R2 → R1 - R2 is in F+, or
• R1∩R2 → R2 - R1 is in F+
R1 R2 = R
63
How to guarantee preservation of FDs
� Decompose relation, R, with functional dependencies, F, into relations, R1,..., Rk, with associated functional dependencies, F1,..., Fk.
� The decomposition is dependency preserving iff:
F+=(F1∪... ∪ Fk)+
top related