data models
DESCRIPTION
Data Models. How to structure data. What is a Data Model?. Having formed a model of the enterprise, we now need to represent the data. The data model tells us the structure of the database. Historically, three data models: Hierarchical data model Network data model Relational data model. - PowerPoint PPT PresentationTRANSCRIPT
2
What is a Data Model?
Having formed a model of the enterprise, we now need to represent the data.
The data model tells us the structure of the database.
Historically, three data models: Hierarchical data model Network data model Relational data model
3
Hierarchical and Network Data Models
Hierarchical and network data models have been superseded by the relational data model.
Reasons: Lack of expressive power
E.g., one cannot express many-to-many relationships in the hierarchical model
More closely tied to the underlying implementation. Hence, less data independence.
Relational data model has a clean mathematical basis.
4
The Relational Model
Due to Codd. Everything is represented as a
relation in the mathematical sense. Also called tables.
A database therefore is a collection of tables, each of which has a unique name, and each of which is described by a schema.
In addition, Codd defined a data manipulation language.
5
Example of Schemas in the Relational Model
Example of a representation of entity sets:Student(sid,name,addr)
Course(cid,title,eid)
Empl(eid, ename, deptid)
Dept(deptid, dname, loc) Primary keys are underlined. Recall that a primary key is one
that uniquely identifies an entity. An entity is a row in a table.
6
More Example Schemas
Relationship sets between entity sets are also represented in tables.
Example of a table corresponding to a relationship:Enrol(sid, cid, grade)
Again, a relationship is represented by a row (or a tuple) in a relation.
7
Relational Databases:Basic Concepts I
Attribute: A column in a table
Domain The set of values from which the
values of an attribute are drawn. Null value
A special value, meaning “not known” or “not applicable”.
Relation schema A set of attribute names
8
Relational Databases: Basic Concepts II
Tuple A set of values, one for each
attribute in the relation scheme over which the tuple is defined, i.e. a mapping from attributes to the appropriate domains
Relation instance A set of tuples over the scheme
of the relation
9
Relational Databases:Basic Concepts III
Relational Database A set of relations, each with a
unique name Normalized Relation
A relation in which every value is atomic (non-decomposable). Hence, every attribute in every tuple has a single value.
10
Keys
Candidate Key A minimal set of attributes that
uniquely identifies a tuple Primary Key
The candidate key chosen as the identifying key of the relation
Alternate Key Candidate keys which are not
primary keys
11
Foreign Key An attribute (or set of attributes)
in table R1 which also occurs as the primary key of relation R2.
R2 is called the referenced relation.
Foreign keys are also called connection keys or reference attributes.
12
Integrity Rules: Entity Constraint
Entity constraint All attributes in a primary key
must be non-null. Motivation: If the primary key
uniquely identifies an entity in an entity set, then we must ensure that we have all the relevant information
13
Integrity Rules:Referential Integrity
Referential integrity A database cannot contain a
tuple with a value for a foreign key that does not match a primary key value in the referenced relation.
Or, a foreign key must refer to a tuple that exists.
Motivation: If referential integrity were violated, we could have relationships between entities that we do not have any information about.
14
Data Manipulation Languages
In order for a database to be useful, it should be possible to store and retrieve information from it. This is the role of the data manipulation language.
One of the attractions of the relational data model is that it comes with a well-defined data manipulation language.
15
Types of DML
Two types of data manipulation languages Navigational (procedural)
The query specifies (to some extent) the strategy used to find the desired result e.g. relational algebra.
Non-navigational(non-procedural)
The query only specifies what data is wanted, not how to find it e.g. relational calculus.
16
Relational Algebra
Codd defined a number of algebraic operations for the relational model.
Unary operations take as input a single table and produce as output another table.
Binary operations take as input two tables and produce as output another table.
17
Unary Operations: Select
Select produces a table that only contains the tuples that satisfy a particular condition, in other words a “horizontal” subset.
Appearance:C(R) where C is a selection condition and R is the relation over which
the selection takes place
18
Example of Select
Student
sid name addr
123 Fred 3 Oxford
345 John 6 Hope Rd.
567 Ann 5 Garden
sid > 300(Student) yields
345 John 6 Hope Rd.
567 Ann 5 Garden
19
Unary Operations:Project
Project produces a table consisting of only some of the attributes. It creates a “vertical” subset.
Note that a project eliminates duplicates.
Appearance:
ПA(R) where A is a set of attributes of R and R is the relation over which
the project takes place.
20
Example of Project
Enrol
sid cid grade
123 CS51T 76
234 CS52S 50
345 CS52S 55
Пcid(Enrol) yields
CS51T
CS52S
21
Binary Operations
Two relations are (union) compatible if they have the same set of attributes.
Example, one table may represent suppliers in one country, while another table with same schema represents suppliers in another country.
For the union, intersection and set-difference operations, the relations must be compatible.
22
Union, Intersection, Set-difference
R1 R2 The union is the table comprised
of all tuples in R1 or R2. R1 R2
The intersection is the table comprised of all tuples in R1 and R2
R1 - R2 The set-difference between R1
and R2 is the table consisting of all tuples in R1 but not in R2.
23
Cartesian Product
R1 R2 The Cartesian product is the
table consisting of all tuples formed by concatenating each tuple in R1 with a tuple in R2, for all tuples in R2.
24
Example of a Cartesian Product
R1 A B
1 x
2 y
R2 C D
a s
b t
c u
R1 R2 A B C D
1 x a s
1 x b t
1 x c u
2 y a s
2 y b t
2 y c u
25
Natural Join
R1 R2 Assume R1 and R2 have
attributes A in common. Natural join is formed by concatenating all tuples from R1 and R2 with same values for A, and dropping the occurrences of A in R2
R1 R2 = ПA’(C(R1 R2)) where C is the condition that the
values for R1 and R2 are the same for all attributes in A and A’ is all attributes in R1 and R2 apart from the occurrences of A in R2.
hence, natural join is syntactic sugar
26
Example of a Natural Join I
Course
cid title eid
CS51T DBMS 123
CS52S OS 345
CS52T Networking 345
CS51S ES 456
Instructor
eid ename
123 Rao
345 Allen
456 Mansingh
27
Example of a Natural Join II
Course Instructorcid title eid ename
CS51T DBMS 123 Rao
CS52S OS 345 Allen
CS52T Net... 345 Allen
CS51S ES 456 Mansingh
28
Division
R1 R2 Assume that the schema for R2
is a proper subset of the one for R1.
We form the division by Ordering the tuples in R1 so that
all the tuples with the same value for the non-common attributes are grouped together.
Each group contributes a tuple to the result if the group’s values on the common attributes form a superset of the values of these attributes in R2.
29
Example of Division I
Enrol cid sid grade
CS51T 123 A
CS52S 123 A
CS51T 234 C
CS52S 234 B
CS51T 345 C
CS52S 345 C
Temp sid grade
123 A
234 B
30
Example of Division II
Enrol cid sid grade
CS51T 123 A
CS51T 234 C
CS51T 345 C
CS52S 123 A
CS52S 234 B
CS52S 345 C
Enrol Temp cid
CS52S
Thus, the division gives all courses for which 123 got an A and 234 a B.
31
Assignment
Allows the expression to be written in parts.
Assigns the part to a temporary variable.
This variable can be used in subsequent expressions.
E.g. sid(title = ‘DBMS’ (Enrol Course)
Could be re-written as: r Enrol Course sid(title = ‘DBMS’(r))
32
Rename Operation
Names the result of an expression.
x(A1,A2,…,An) (E)
returns the result of expression E under the name x with the attributes renamed as A1,A2,…,An.
E.g. S (Student) Renames Student table to S.
33
Database Modification
Insert r r E e.g.
Course Course {(‘CS51T’,’DBMS’)}
Delete r r - E e.g.
Student Student - sid=‘1’(Student)
Update r
F1,F2,…,Fn (r)
e.g. Enrol sid,cid,grade grade + 2 (Enrol)
34
Examples
Assume the following schema: Student(sid,sname,saddr)
Course(cid,title,lid)Enrol(sid, cid, grade)Lecturer(lid,lname,deptname)
Query 1: Find the name of all students that have taken the course entitled ‘Expert Systems’.
Query 2: Find the titles of all courses that student ‘Mark Smith’ has done.
Query 3: Find the id of students that have enrolled in all the courses that lecturer with id. = ‘234’ has taught.
Query 4: Find the highest grade for ‘CS51T’.
35
Relational Calculus
A relational calculus expression defines a new relation in terms of other relations.
A tuple variable ranges over a named relation. So, its values are tuples from that relation.
Example: Get grades for CS51T e(Enrol)
{<e.grade>: e.cid = ‘CS51T’ }
36
Basic Syntax for Relational Calculus Expressions
r(R),…,s(S)
{ <target> : predicate} where
R,..,S are tables r,..,s are tuple variables target specifies the attributes of
the resulting relation predicate is a formula giving a
condition that tuples must satisfy to qualify for the resulting relation.
37
The Predicate
Predicate is constructed from attribute names constants comparison operators
logical connectives
quantified tuple variables
t(R), t(R)
38
Examples of Relational Calculus
Example 2 Get names and grades for
students enrolled in CS51T e(Enrol), s(Student)
{<s.name, e.grade>:
e.cid = ‘CS51T’ s.sid = e.sid}
In relation algebraПcid, name( CID =‘ CS51T’(Grade Student))
39
Example 3
Give the names of all students who got at least one A. s(Student)
{<s.name>:
e(Enrol)
(e.grade = ‘A’ s.sid = e.sid)}
Tuple variables not mentioned in the target list must be bound in the predicate.
40
Example 4
Get the names of all students who only got A’s s(Student)
{<s.name>:
e(Enrol)( s.sid = e.sid e.grade
= ‘A’)
e2(Enrol) (s.sid = e2.sid)}
41
Example 5
Get the names of all students who got an A and a B s(Student)
{<s.name>:
e(Enrol) (e.grade = ‘B’ s.sid =
e.sid)
e2(Enrol) (e2.grade = ‘A’
s.sid = e2.sid)}