database management system review
Database Management System CS157A SJSU Fall 2015 Kaya
What is a DB? Definition of a database:
A collection of information organized to afford efficient retrieval.
** Not necessarily a relational database (RDB) **
Why do we need a DB?
1. Sharing = support concurrent access by multiple users (read and write)
2. Data Model Enforcement = make sure all apps see clean and organized data
3. Scale = work with datasets too large to fit in memory
4. Flexibility = use data in new and unanticipated ways
Data Models
Database Model: kinds of database model
1. Relational data model
2. Object oriented relational data model
3. Semi-structured data model
Relational data model: Excel-like, i.e. working with tables
Has operations: union, intersection, difference, selection, projection, product, join, renaming
Example table (one row per person):
PersonID  LastName  FirstName  DateOfBirth  HomeAddrStreet  HomeAddrCity  HomeAddrZip  HomeAddrState  WorkAddrStreet  WorkAddrCity  WorkAddrZip  WorkAddrState
1         Yamada    taro       4/15         aaa             bbb           111          CA             eee             fff           222          CA
Object Oriented Relational Data Model
Similar to the relational data model.
Added: objects, classes, and inheritance, directly supported in the DB schema and query language.
OBJ: Person
  Last Name
  First Name
  Date of Birth
  Home Address (refers to OBJ: Address)
  Work Address (refers to OBJ: Address)
OBJ: Address
  Street
  City
  Zip
  State
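Instance: Person
  Last Name: yamada
  First Name: taro
  Date of Birth: 4/15
  Home Address: refers to Instance: Home Address
  Work Address: refers to Instance: Work Address
Instance: Home Address
  Street: aaa
  City: bbb
  Zip: 111
  State: CA
Instance: Work Address
  Street: ccc
  City: ddd
  Zip: 222
  State: CA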
Semi Structured Data Model
Data are represented by a graph or tree.
To implement it, use XML.
Example tree:
Movies
  title: Gone with the wind
    Year: 1939
    Length: 281
    Genre: drama
  title: Star Wars
    Year: 1977
    Length: 124
    Genre: scifi
XML representation:
<Movies>
  <movie title="Gone with the wind">
    <year>1939</year><length>281</length><genre>drama</genre>
  </movie>
  <movie title="Star Wars">
    <year>1977</year><length>124</length><genre>scifi</genre>
  </movie>
</Movies>
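As a minimal sketch of how such a tree can be read back by a program, the snippet below parses a well-formed version of the movie data (quoted attribute values, matching tag case, Star Wars dated 1977 as in the tree) with Python's standard xml.etree.ElementTree module. The variable names XML_DOC and movies are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the Movies document (assumed cleanup of the slide's XML).
XML_DOC = """
<Movies>
  <movie title="Gone with the wind">
    <year>1939</year><length>281</length><genre>drama</genre>
  </movie>
  <movie title="Star Wars">
    <year>1977</year><length>124</length><genre>scifi</genre>
  </movie>
</Movies>
"""

root = ET.fromstring(XML_DOC)  # root is the <Movies> element

# Walk the tree: each <movie> child becomes one dictionary.
movies = [
    {
        "title": m.get("title"),
        "year": int(m.findtext("year")),
        "length": int(m.findtext("length")),
        "genre": m.findtext("genre"),
    }
    for m in root.findall("movie")
]

for movie in movies:
    print(movie)
```

Each movie is a subtree under the Movies root, which is why the data needs no fixed table schema.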
Other Data Model: Hierarchical model
Can be used to represent a taxonomy (classification).
☆ Each record has its parent's ID as metadata.
(Figures: pictorial representation and relational representation of the hierarchy.)
Other Data Model: Network model
Differs from the relational model in that:
Data are represented by collections of records.
Relationships among data are represented by links.
(Figure: schema linking Customer and Account.)
Defining Schema in SQL
DATA TYPE -letters-
Character string
CHAR(n): a fixed-length string of n chars is stored. If you KNOW the number of chars that will be stored, use this.
VARCHAR(n): up to n chars are stored. If you do NOT know the number of chars that will be stored, use this.
Bit string
BIT(n): like CHAR(n), a fixed-length string of n bits
BIT VARYING(n): like VARCHAR(n), up to n bits
Data types -math-
BOOLEAN = {TRUE, FALSE}
INTEGER
SHORTINT: smaller range than INTEGER (spelled SMALLINT in standard SQL)
FLOAT
DOUBLE
DECIMAL(n, d): fixed-point number with n digits in total, d of them after the decimal point
NUMERIC(n, d): same as DECIMAL
Data type -time-
DATE: written as 'yyyy-mm-dd'
TIME: written as 'HH:mm:ss' or 'HH:mm:ss.d', where d is a fraction of a second
TIMESTAMP: written as 'yyyy-mm-dd HH:mm:ss'
Creating Tables
Syntax in SQL (reserved words are written in uppercase):
In general:
CREATE TABLE table_name (
  Attribute1 data_type PRIMARY KEY,
  Attribute2 data_type DEFAULT value,
  Attribute3 data_type,
  ...
);
In example:
CREATE TABLE Movie (
  title VARCHAR(50) PRIMARY KEY,
  year INT DEFAULT 0000,
  length INT
);
PRIMARY KEY sets title to be the unique key.
DEFAULT 0000 sets the initial value of year to 0000 (i.e. 0) when no value is given.
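A minimal sketch of the Movie table in action, assuming SQLite (through Python's standard sqlite3 module) as the database. It shows the DEFAULT filling in a missing year and the PRIMARY KEY rejecting a duplicate title. The inserted rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    """
    CREATE TABLE Movie (
        title  VARCHAR(50) PRIMARY KEY,
        year   INT DEFAULT 0,
        length INT
    )
    """
)

# No year is given, so the DEFAULT value is stored.
conn.execute("INSERT INTO Movie (title, length) VALUES ('Star Wars', 124)")
row = conn.execute("SELECT title, year, length FROM Movie").fetchone()
print(row)

# A second row with the same title violates the PRIMARY KEY.
try:
    conn.execute("INSERT INTO Movie (title, length) VALUES ('Star Wars', 999)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```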
Relational Operations
UNION
Basic rules of union:
1. The number of columns and the order of columns MUST be the SAME in every query.
2. The data types of the corresponding columns in each query MUST be the SAME or compatible.
3. The returned column names are usually taken from the first table.
Compatible:     (Title varchar(), Year Int, Length Int) U (Title varchar(), Year Int, Length Int)
Not compatible: (Title varchar(), Year Time, Length Int) U (Title varchar(), Year Int, Length Int)
Syntax
In general:
SELECT attribute1, attribute2 FROM Table1
UNION
SELECT attribute1, attribute2 FROM Table2
In example:
SELECT prod_code, prod_name, com_name FROM Product
UNION
SELECT prod_code, prod_name, com_name FROM Purchase
Example: tables
Product:
PROD_CODE  PROD_NAME     COM_NAME  LIFE
PR001      TV            SONY      7
PR002      DVD player    LG        9
PR003      iPod          PHILIPS   9
PR004      Sound system  CREATIVE  8
PR005      mobile        NOKIA     6
Purchase:
PUR_#  PROD_CODE  PROD_NAME     COM_NAME  PUR_QTY  PUR_AMOUNT
2      PR001      TV            SONY      15       450000
1      PR003      iPod          PHILIPS   20       60000
3      PR007      laptop        HP        6        240000
4      PR005      mobile        NOKIA     100      300000
5      PR002      DVD player    LG        10       30000
6      PR006      Sound system  CREATIVE  8        40000
Example: output of Product UNION Purchase
PROD_CODE  PROD_NAME     COM_NAME
PR001      TV            SONY
PR002      DVD player    LG
PR003      iPod          PHILIPS
PR004      Sound system  CREATIVE
PR005      mobile        NOKIA
PR006      Sound system  CREATIVE
PR007      laptop        HP
Rows that appear in both tables are listed only once.
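The union above can be reproduced with a small sketch, assuming SQLite via Python's sqlite3 module. The column name pur_no stands in for PUR_# (an assumption, since # is not valid in an unquoted identifier), and UNION ALL is shown next to UNION only to make the duplicate elimination visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (prod_code TEXT, prod_name TEXT, com_name TEXT, life INT);
    CREATE TABLE purchase (pur_no INT, prod_code TEXT, prod_name TEXT,
                           com_name TEXT, pur_qty INT, pur_amount INT);
    INSERT INTO product VALUES
        ('PR001', 'TV', 'SONY', 7),
        ('PR002', 'DVD player', 'LG', 9),
        ('PR003', 'iPod', 'PHILIPS', 9),
        ('PR004', 'Sound system', 'CREATIVE', 8),
        ('PR005', 'mobile', 'NOKIA', 6);
    INSERT INTO purchase VALUES
        (2, 'PR001', 'TV', 'SONY', 15, 450000),
        (1, 'PR003', 'iPod', 'PHILIPS', 20, 60000),
        (3, 'PR007', 'laptop', 'HP', 6, 240000),
        (4, 'PR005', 'mobile', 'NOKIA', 100, 300000),
        (5, 'PR002', 'DVD player', 'LG', 10, 30000),
        (6, 'PR006', 'Sound system', 'CREATIVE', 8, 40000);
    """
)

QUERY = """
    SELECT prod_code, prod_name, com_name FROM product
    {op}
    SELECT prod_code, prod_name, com_name FROM purchase
    ORDER BY prod_code
"""

# UNION removes duplicate rows; UNION ALL keeps every row from both queries.
union_rows = conn.execute(QUERY.format(op="UNION")).fetchall()
union_all_rows = conn.execute(QUERY.format(op="UNION ALL")).fetchall()

for r in union_rows:
    print(r)
print(len(union_rows), len(union_all_rows))
```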
Union with different column names
SELECT prod_code, prod_name, life FROM product WHERE life > 6
UNION
SELECT prod_code, prod_name, pur_qty FROM purchase WHERE pur_qty < 20
Product:  PROD_CODE  PROD_NAME  COM_NAME  LIFE (int)
Purchase: PUR_#  PROD_CODE  PROD_NAME  COM_NAME  PUR_QTY (int)  PUR_AMOUNT
The two queries use two different criteria (LIFE and PUR_QTY) and a different third column.
BUT NOTE that both columns hold INTEGER values, so the union is allowed.
Union with different column names: output
PROD_CODE  PROD_NAME     LIFE  (value comes from)
PR001      TV            7     PRODUCT.LIFE
PR001      TV            15    PURCHASE.PUR_QTY
PR002      DVD player    9     PRODUCT.LIFE
PR002      DVD player    10    PURCHASE.PUR_QTY
PR003      iPod          9     PRODUCT.LIFE
PR004      Sound system  8     PRODUCT.LIFE
PR006      Sound system  8     PURCHASE.PUR_QTY
PR007      laptop        6     PURCHASE.PUR_QTY
BE CAREFUL: in most cases this is an unwelcome result, because the LIFE column now mixes two unrelated quantities.
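One way to see where each value came from is to tag every row with its source, as in the sketch below (SQLite via Python's sqlite3 is an assumption, and the tables are cut down to the three columns the query uses). The extra source column is an illustrative addition, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (prod_code TEXT, prod_name TEXT, life INT);
    CREATE TABLE purchase (prod_code TEXT, prod_name TEXT, pur_qty INT);
    INSERT INTO product VALUES
        ('PR001', 'TV', 7), ('PR002', 'DVD player', 9), ('PR003', 'iPod', 9),
        ('PR004', 'Sound system', 8), ('PR005', 'mobile', 6);
    INSERT INTO purchase VALUES
        ('PR001', 'TV', 15), ('PR003', 'iPod', 20), ('PR007', 'laptop', 6),
        ('PR005', 'mobile', 100), ('PR002', 'DVD player', 10),
        ('PR006', 'Sound system', 8);
    """
)

cur = conn.execute(
    """
    SELECT prod_code, prod_name, life, 'PRODUCT.LIFE' AS source
    FROM product WHERE life > 6
    UNION
    SELECT prod_code, prod_name, pur_qty, 'PURCHASE.PUR_QTY'
    FROM purchase WHERE pur_qty < 20
    ORDER BY prod_code, life
    """
)
# Column names of a UNION are taken from the first SELECT.
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()

print(column_names)
for r in rows:
    print(r)
```

The third column is still called life even for rows that actually hold a purchase quantity, which is exactly why the result is misleading.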
INTERSECTION
Selection and Projection
Selection
Notation: R1 := σ_C (R2)
C is a condition (as in an if-statement) that refers to attributes of R2.
R1 is all those tuples of R2 that satisfy C.
SQL form: SELECT * FROM R2 WHERE C
Selection example
R2:
Bar    Beer    Price
Joe’s  Bud     2.50
Joe’s  Miller  2.75
Sue’s  Bud     2.50
Sue’s  Miller  3.00
C: Bar = Joe’s
R1:
Bar    Beer    Price
Joe’s  Bud     2.50
Joe’s  Miller  2.75
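The selection example can be sketched as a WHERE clause, assuming SQLite via Python's sqlite3 module. The lowercase column names bar, beer, and price are assumptions chosen for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R2 (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO R2 VALUES (?, ?, ?)",
    [
        ("Joe's", "Bud", 2.50),
        ("Joe's", "Miller", 2.75),
        ("Sue's", "Bud", 2.50),
        ("Sue's", "Miller", 3.00),
    ],
)

# Selection: keep only the tuples of R2 that satisfy C (bar = Joe's).
# A parameter is used so the apostrophe needs no manual escaping.
r1 = conn.execute(
    "SELECT * FROM R2 WHERE bar = ? ORDER BY beer", ("Joe's",)
).fetchall()

for row in r1:
    print(row)
```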
Projection
Notation: R1 := π_L (R2)
R1 is constructed by looking at each tuple of R2, extracting the attributes on list L in the order specified, and creating from those components a tuple for R1.
Eliminate duplicate tuples, if any.
SQL form: SELECT DISTINCT L FROM R2 (DISTINCT is what eliminates the duplicates)
Projection example (L = Beer, Price)
R2:
Bar    Beer    Price
Joe’s  Bud     2.50
Joe’s  Miller  2.75
Sue’s  Bud     2.50
Sue’s  Miller  3.00
After extracting Beer and Price:
Beer    Price
Bud     2.50
Miller  2.75
Bud     2.50
Miller  3.00
R1, after deleting the duplicate (Bud 2.50):
Beer    Price
Bud     2.50
Miller  2.75
Miller  3.00
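A short sketch of the two projection steps, assuming SQLite via Python's sqlite3 module: a plain SELECT of the listed columns keeps the duplicate, and SELECT DISTINCT removes it. Column names are lowercase assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R2 (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO R2 VALUES (?, ?, ?)",
    [
        ("Joe's", "Bud", 2.50),
        ("Joe's", "Miller", 2.75),
        ("Sue's", "Bud", 2.50),
        ("Sue's", "Miller", 3.00),
    ],
)

# Step 1: extract the attributes on list L = (beer, price); duplicates remain.
extracted = conn.execute("SELECT beer, price FROM R2").fetchall()

# Step 2: DISTINCT eliminates the duplicated tuple (Bud, 2.50).
r1 = conn.execute(
    "SELECT DISTINCT beer, price FROM R2 ORDER BY beer, price"
).fetchall()

print(len(extracted), r1)
```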
PRODUCT and JOIN
CROSS PRODUCT
Consider ALL possible combinations of rows from two or more tables.
# of rows in Table1 = x
# of rows in Table2 = y
# of rows in the result table = x * y
Syntax
In general:
SELECT T1.A1, T1.A2, T2.A1, T2.A2, ... FROM T1 CROSS JOIN T2
In example:
SELECT Eats.pizza, Eats.name, Person.age, Person.gender, Person.name FROM Eats CROSS JOIN Person
Eats has 9 rows and Person has 20 rows, so the result has 9 * 20 = 180 rows.
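The row-count rule can be checked with a small sketch, assuming SQLite via Python's sqlite3 module. The slides do not show the contents of Eats and Person, so the rows below are generated placeholders. Only the row counts (9 and 20) come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Eats (name TEXT, pizza TEXT)")
conn.execute("CREATE TABLE Person (name TEXT, age INT, gender TEXT)")

# Placeholder data (hypothetical): 9 Eats rows and 20 Person rows.
eats = [(f"person{i % 3}", f"pizza{i}") for i in range(9)]
people = [(f"person{i}", 20 + i, "female" if i % 2 else "male") for i in range(20)]
conn.executemany("INSERT INTO Eats VALUES (?, ?)", eats)
conn.executemany("INSERT INTO Person VALUES (?, ?, ?)", people)

# CROSS JOIN pairs every Eats row with every Person row.
result = conn.execute(
    """
    SELECT Eats.pizza, Eats.name, Person.age, Person.gender, Person.name
    FROM Eats CROSS JOIN Person
    """
).fetchall()

print(len(result))
```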
EQUI-JOIN
An equi-join joins tables on equality of matching column values in the associated tables.
An equal sign (=) is used as the comparison operator in the WHERE clause to express equality:
SELECT * FROM t1, t2 WHERE t1.attr1 = t2.attr2
An equi-join can also be written using JOIN followed by ON, specifying the names of the columns, qualified by their tables, to check for equality.
EQUI-JOIN example
T1:
ID  Attribute1
2   A2
5   A5
3   A3
1   -----
4   A4
T2:
ID  Attribute2
5   B5
1   B1
3   -----
6   B6
2   B2
5   C4
SELECT * FROM T1 JOIN T2 ON T1.ID = T2.ID
SELECT * FROM T1, T2 WHERE T1.ID = T2.ID
Result (same for both queries):
ID  Attribute1  ID  Attribute2
1   -----       1   B1
2   A2          2   B2
3   A3          3   -----
5   A5          5   B5
5   A5          5   C4
NOTE: neither of the two ID columns is eliminated.
ID 5 in T1 matches two rows with ID 5 in T2, so the T1 row with ID 5 appears twice.
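A minimal sketch of this equi-join, assuming SQLite via Python's sqlite3 module and assuming that the ----- entries stand for missing values (stored here as NULL / None). It runs both query forms and compares them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (ID INT, Attribute1 TEXT)")
conn.execute("CREATE TABLE T2 (ID INT, Attribute2 TEXT)")
# None is an assumption for the '-----' cells of the slide.
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?)",
    [(2, "A2"), (5, "A5"), (3, "A3"), (1, None), (4, "A4")],
)
conn.executemany(
    "INSERT INTO T2 VALUES (?, ?)",
    [(5, "B5"), (1, "B1"), (3, None), (6, "B6"), (2, "B2"), (5, "C4")],
)

order = " ORDER BY T1.ID, T2.Attribute2"
cur = conn.execute("SELECT * FROM T1 JOIN T2 ON T1.ID = T2.ID" + order)
columns = [d[0] for d in cur.description]  # both ID columns are kept
join_on = cur.fetchall()
join_where = conn.execute(
    "SELECT * FROM T1, T2 WHERE T1.ID = T2.ID" + order
).fetchall()

print(columns)
for row in join_on:
    print(row)
```

IDs 4 (only in T1) and 6 (only in T2) have no partner, so they do not appear in the result.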
Natural Join
A natural join is a type of EQUI-JOIN.
It is structured in such a way that columns with the same name in the associated tables appear only once: no duplicated column names.
Guidelines:
1. The associated tables have one or more pairs of identically named columns.
2. The columns MUST be of the same data type.
3. Do not use an ON clause in a natural join.
Natural join example
T1:
ID  Attribute1
2   A2
5   A5
3   A3
1   -----
4   A4
T2:
ID  Attribute2
5   B5
1   B1
3   -----
6   B6
2   B2
5   C4
SELECT * FROM T1 NATURAL JOIN T2;
Result:
ID  Attribute1  Attribute2
1   -----       B1
2   A2          B2
3   A3          -----
5   A5          B5
5   A5          C4
NOTE: one of the two ID columns IS eliminated.
Compare with SELECT * FROM T1, T2 WHERE T1.ID = T2.ID, which returns the same rows but keeps both ID columns.
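The natural join can be tried with the same data, again assuming SQLite via Python's sqlite3 module and None for the ----- cells. The point of the sketch is the column list: ID shows up once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (ID INT, Attribute1 TEXT)")
conn.execute("CREATE TABLE T2 (ID INT, Attribute2 TEXT)")
# None is an assumption for the '-----' cells of the slide.
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?)",
    [(2, "A2"), (5, "A5"), (3, "A3"), (1, None), (4, "A4")],
)
conn.executemany(
    "INSERT INTO T2 VALUES (?, ?)",
    [(5, "B5"), (1, "B1"), (3, None), (6, "B6"), (2, "B2"), (5, "C4")],
)

# No ON clause: the join condition is implied by the shared column name ID.
# ORDER BY uses column positions (1 = ID, 3 = Attribute2).
cur = conn.execute("SELECT * FROM T1 NATURAL JOIN T2 ORDER BY 1, 3")
columns = [d[0] for d in cur.description]
natural_rows = cur.fetchall()

print(columns)
for row in natural_rows:
    print(row)
```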
Theta-Join
A theta join allows an arbitrary comparison relation, such as {<=, >=, <, >, =, !=}.
Relational algebra notation: R3 := R1 ⋈_C R2
where C = any Boolean-valued condition
Take R1 × R2, then apply Selection with condition C.
Theta join example
R1:
Bar    Beer    Price
Joe’s  Bud     2.50
Joe’s  Miller  2.75
Sue’s  Bud     2.50
Sue’s  Coors   3.00
R2:
Name   ADDR
Joe’s  Maple St
Sue’s  River St
C: R1.Bar = R2.Name
Result:
Bar    Beer    Price  Name   ADDR
Joe’s  Bud     2.50   Joe’s  Maple St
Joe’s  Miller  2.75   Joe’s  Maple St
Sue’s  Bud     2.50   Sue’s  River St
Sue’s  Coors   3.00   Sue’s  River St
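The sketch below, assuming SQLite via Python's sqlite3 module, checks that a join with condition C gives the same rows as taking the full product and then selecting with C. The table names Sells and Bars are assumed stand-ins for R1 and R2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")  # R1
conn.execute("CREATE TABLE Bars (name TEXT, addr TEXT)")              # R2
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [
        ("Joe's", "Bud", 2.50),
        ("Joe's", "Miller", 2.75),
        ("Sue's", "Bud", 2.50),
        ("Sue's", "Coors", 3.00),
    ],
)
conn.executemany(
    "INSERT INTO Bars VALUES (?, ?)",
    [("Joe's", "Maple St"), ("Sue's", "River St")],
)

order = " ORDER BY Sells.bar, Sells.beer"

# Theta join with condition C: Sells.bar = Bars.name.
theta = conn.execute(
    "SELECT * FROM Sells JOIN Bars ON Sells.bar = Bars.name" + order
).fetchall()

# Same thing in two steps: the product R1 x R2, then a selection with C.
product_size = conn.execute(
    "SELECT COUNT(*) FROM Sells CROSS JOIN Bars"
).fetchone()[0]
product_then_select = conn.execute(
    "SELECT * FROM Sells CROSS JOIN Bars WHERE Sells.bar = Bars.name" + order
).fetchall()

print(product_size, len(theta), theta == product_then_select)
```

Replacing = in the condition with another comparison such as < or != gives a theta join that is not an equi-join.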
Other Join --
Normalization
Why do we need to normalize data?
To reduce redundancy and dependency.
Problems without normalization
Anomalies (contradictions / inconsistencies) can happen:
1. Update anomaly
2. Insertion anomaly
3. Deletion anomaly
Solution: normalization!
We need data normalization to reduce anomalies.
Update anomaly
An update anomaly is a data inconsistency that results from data redundancy and a partial update.
Table: Employee
EmployeeID  Name           Department  StudentGroup
123         J. Longfellow  Accounting  Beta Alpha Psi
234         B. Rech        Marketing   Marketing Club
234         B. Rech        Marketing   Marketing Manage Club
456         A.Bruchs       CIS         Technology Org
456         A.Bruchs       CIS         Beta Alpha Psi
What happens if you update like below?
UPDATE Employee SET Department = 'ECON' WHERE StudentGroup = 'Technology Org'
Table: Employee (after the update)
EmployeeID  Name           Department  StudentGroup
123         J. Longfellow  Accounting  Beta Alpha Psi
234         B. Rech        Marketing   Marketing Club
234         B. Rech        Marketing   Marketing Manage Club
456         A.Bruchs       ECON        Technology Org
456         A.Bruchs       CIS         Beta Alpha Psi
When A.Bruchs’s department is updated, say from CIS to ECON, the 5th row’s department has to be updated too. Otherwise the data is no longer consistent.
The two rows for EmployeeID 456 can no longer describe the same person!!!
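The partial update can be reproduced in a few lines, assuming SQLite via Python's sqlite3 module. After the UPDATE, employee 456 is listed under two departments at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmployeeID INT, Name TEXT, Department TEXT, StudentGroup TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (123, "J. Longfellow", "Accounting", "Beta Alpha Psi"),
        (234, "B. Rech", "Marketing", "Marketing Club"),
        (234, "B. Rech", "Marketing", "Marketing Manage Club"),
        (456, "A.Bruchs", "CIS", "Technology Org"),
        (456, "A.Bruchs", "CIS", "Beta Alpha Psi"),
    ],
)

# Partial update: only one of the two rows for employee 456 is touched.
conn.execute(
    "UPDATE Employee SET Department = 'ECON' WHERE StudentGroup = 'Technology Org'"
)

departments_456 = [
    d for (d,) in conn.execute(
        "SELECT DISTINCT Department FROM Employee "
        "WHERE EmployeeID = 456 ORDER BY Department"
    )
]
print(departments_456)  # more than one department means the data is inconsistent
```

Updating with WHERE EmployeeID = 456 instead would touch both rows, but the redundant storage still makes this kind of mistake possible.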
Another Update Anomaly
S_id  S_name  S_address  Suj_opted
401   Adam    Noida      Bio
402   Alex    Panipat    Math
403   Stuart  Jammu      Math
404   Adam    Noida      Physics
Updating the address of a student who appears in 2 or more rows:
We need to check ALL ROWS for the update.
If one of the rows is not updated, Adam lives in two different places: an inconsistency.
Insertion Anomaly
An insertion anomaly is the inability to add data to the DB due to the absence of other data.
Table: Employee
EmployeeID  Name           Department  StudentGroup
123         J. Longfellow  Accounting  Beta Alpha Psi
234         B. Rech        Marketing   Marketing Club
234         B. Rech        Marketing   Marketing Manage Club
456         A.Bruchs       CIS         Technology Org
456         A.Bruchs       CIS         Beta Alpha Psi
This company hires Roy, who has not decided on a student group yet:
INSERT INTO Employee (EmployeeID, Name, Department, StudentGroup) VALUES (125, 'Roy', 'Math', )
ERROR: there is no StudentGroup value to insert.
We need a smaller table that only holds employees, not employees AND their student group, department, etc.
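A sketch of the insertion anomaly and its fix, assuming SQLite via Python's sqlite3 module. The slides do not state the table's constraints, so the sketch assumes StudentGroup is part of the key and therefore NOT NULL. The table names EmployeeOnly and Membership are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: StudentGroup belongs to the primary key, so it is required.
conn.execute(
    """
    CREATE TABLE Employee (
        EmployeeID   INT  NOT NULL,
        Name         TEXT,
        Department   TEXT,
        StudentGroup TEXT NOT NULL,
        PRIMARY KEY (EmployeeID, StudentGroup)
    )
    """
)

# Roy has no student group yet, so the row cannot be stored.
try:
    conn.execute(
        "INSERT INTO Employee VALUES (?, ?, ?, ?)", (125, "Roy", "Math", None)
    )
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True

# Smaller tables: employees on their own, group memberships separately.
conn.executescript(
    """
    CREATE TABLE EmployeeOnly (EmployeeID INT PRIMARY KEY, Name TEXT, Department TEXT);
    CREATE TABLE Membership (EmployeeID INT, StudentGroup TEXT NOT NULL);
    INSERT INTO EmployeeOnly VALUES (125, 'Roy', 'Math');
    """
)
roy = conn.execute(
    "SELECT Name, Department FROM EmployeeOnly WHERE EmployeeID = 125"
).fetchone()

print(insert_failed, roy)
```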
Deletion Anomaly
A deletion anomaly is the unintended loss of data due to the deletion of other data.
Table: Employee
EmployeeID  Name           Department  StudentGroup
123         J. Longfellow  Accounting  Beta Alpha Psi
234         B. Rech        Marketing   Marketing Club
234         B. Rech        Marketing   Marketing Manage Club
456         A.Bruchs       CIS         Technology Org
456         A.Bruchs       CIS         Beta Alpha Psi
What happens if you execute:
DELETE FROM Employee WHERE StudentGroup = 'Beta Alpha Psi'
Table: Employee (after the delete)
EmployeeID  Name           Department  StudentGroup
234         B. Rech        Marketing   Marketing Club
234         B. Rech        Marketing   Marketing Manage Club
456         A.Bruchs       CIS         Technology Org
J. Longfellow no longer exists (as data)!!!
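The loss of J. Longfellow can be checked with a short sketch, assuming SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmployeeID INT, Name TEXT, Department TEXT, StudentGroup TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (123, "J. Longfellow", "Accounting", "Beta Alpha Psi"),
        (234, "B. Rech", "Marketing", "Marketing Club"),
        (234, "B. Rech", "Marketing", "Marketing Manage Club"),
        (456, "A.Bruchs", "CIS", "Technology Org"),
        (456, "A.Bruchs", "CIS", "Beta Alpha Psi"),
    ],
)

# The intent is to remove a student group, not an employee.
conn.execute("DELETE FROM Employee WHERE StudentGroup = 'Beta Alpha Psi'")

remaining = conn.execute(
    "SELECT EmployeeID, Name, StudentGroup FROM Employee "
    "ORDER BY EmployeeID, StudentGroup"
).fetchall()
remaining_names = {name for _, name, _ in remaining}

print(remaining)
print("J. Longfellow" in remaining_names)
```

A.Bruchs survives only because a second row exists for that employee. J. Longfellow had a single row, so all data about that employee is gone.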
Functional Dependencies (relation with attributes A, B, C)
Trivial functional dependency: B → B
B determines B == knowing B, can find B
Partial functional dependency: B → C, where B is only part of a composite key such as {A, B}
B determines C == knowing B, can find C
Full functional dependency: A → B, C
A determines B AND C == knowing A, can find every non-key attribute
Transitive dependency: A → B → C
A determines B and B determines C
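Whether a dependency holds in a given table instance can be tested mechanically: X → Y holds if no two rows agree on X but differ on Y. The helper fd_holds below is a hypothetical name for a minimal sketch of that check, run against the Employee rows used in the anomaly examples. Note that a check on one instance can only refute a dependency, not prove it for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True


ATTRS = ["EmployeeID", "Name", "Department", "StudentGroup"]
employee = [
    dict(zip(ATTRS, r))
    for r in [
        (123, "J. Longfellow", "Accounting", "Beta Alpha Psi"),
        (234, "B. Rech", "Marketing", "Marketing Club"),
        (234, "B. Rech", "Marketing", "Marketing Manage Club"),
        (456, "A.Bruchs", "CIS", "Technology Org"),
        (456, "A.Bruchs", "CIS", "Beta Alpha Psi"),
    ]
]

# Trivial: an attribute always determines itself.
print(fd_holds(employee, ["Name"], ["Name"]))
# EmployeeID alone determines Name and Department ...
print(fd_holds(employee, ["EmployeeID"], ["Name", "Department"]))
# ... but it does not determine StudentGroup.
print(fd_holds(employee, ["EmployeeID"], ["StudentGroup"]))
```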
First Normalization Form
Definition of 1NF
A relation is in 1NF if it satisfies the following condition:
No row may contain a repeating group of information, i.e. each column must hold an atomic (single) value, so that the same kind of information is not stored in multiple columns of one row.
2nd normalization form
Definition: a relation is in 2NF if it satisfies the following conditions:
1. It is in 1NF.
2. All non-key attributes are fully functionally dependent on the primary key.
The whole primary key, not just a part of it, has to determine all other attributes.
A functional dependency that holds in a relation is partial when removing one of the determining attributes gives a functional dependency that still holds in the relation.
If {A,B} → {C} but also {A} → {C}, then {C} is partially functionally dependent on {A,B}.
☆ A relation in 2NF can still contain transitive dependencies.
3rd Normalization Form
A relation is in 3NF if it satisfies the following conditions:
1. It is in 2NF.
2. There is no transitive dependency.
Transitive dependency in a relation with attributes A, B, C:
A determines B: B = f(A)
B determines C: C = h(B)
Transitive: C = h(f(A))
BCNF
Determinant: any attribute (simple or composite) on which some other attribute is fully functionally dependent.
BCNF definition: a relation R is in BCNF if and only if every determinant is a candidate key.
Note: 3NF does not deal with the case where all of the following hold:
1. The relation has multiple candidate keys.
2. Those candidate keys are composite.
3. The candidate keys overlap.
BCNF eliminates the anomalies of those cases, which 3NF cannot handle.
BCNF Example
Table = Supplier(supplier_no, supplier_name, city, zip)
supplier_no and supplier_name are each unique, so both are candidate keys.
Functional dependencies, written as functions:
H1(supplier_no) = city = G1(supplier_name)
H2(supplier_no) = zip = G2(supplier_name)
H3(supplier_no) = supplier_name
G3(supplier_name) = supplier_no
Possible Anomalies in BCNF
INSERT: We cannot record the city for a supplier_no without also knowing the supplier_name.
DELETE: If we delete the row for a given supplier_name, we lose the information that the supplier_no is associated with a given city.
UPDATE: Since supplier_name is a candidate key (unique), there are none.
http://www2.york.psu.edu/~lxn/IST_210/normal_form_definitions.html
Possible solution
Decompose Supplier into two tables:
SUPPLIER_INFO (supplier_no, city, zip)
SUPPLIER_NAME (supplier_no, supplier_name)
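A sketch of the decomposition, assuming SQLite via Python's sqlite3 module. The supplier rows (Acme, San Jose, Oakland, and the zip codes) are made-up sample data, and the constraints are assumptions. It shows that a city can now be recorded without a name, and that deleting a name no longer loses the city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SUPPLIER_INFO (
        supplier_no INTEGER PRIMARY KEY,
        city        TEXT,
        zip         TEXT
    );
    CREATE TABLE SUPPLIER_NAME (
        supplier_no   INTEGER PRIMARY KEY REFERENCES SUPPLIER_INFO (supplier_no),
        supplier_name TEXT UNIQUE NOT NULL
    );
    -- Hypothetical data: supplier 2 has a city but no known name yet.
    INSERT INTO SUPPLIER_INFO VALUES (1, 'San Jose', '95112'), (2, 'Oakland', '94601');
    INSERT INTO SUPPLIER_NAME VALUES (1, 'Acme');
    """
)

# Rebuild the original Supplier view by joining on supplier_no.
suppliers = conn.execute(
    """
    SELECT i.supplier_no, n.supplier_name, i.city, i.zip
    FROM SUPPLIER_INFO i
    LEFT JOIN SUPPLIER_NAME n ON n.supplier_no = i.supplier_no
    ORDER BY i.supplier_no
    """
).fetchall()

# Deleting the name row keeps the supplier_no / city association.
conn.execute("DELETE FROM SUPPLIER_NAME WHERE supplier_name = 'Acme'")
city_after_delete = conn.execute(
    "SELECT city FROM SUPPLIER_INFO WHERE supplier_no = 1"
).fetchone()[0]

print(suppliers)
print(city_after_delete)
```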
Representation
SQL Representation
SELECT movietitle
FROM (SELECT starname, movietitle FROM starln) a,
     (SELECT name FROM moviestar WHERE birthdate LIKE '%1974%') b
WHERE a.starname = b.name
Relational Algebra
The 3 different representations (SQL, relational algebra, query tree) show the same query.
Query Tree (root first)
π movietitle
  ⋈ starname = name
    π starname, movietitle
      Starln
    π name
      σ birthdate LIKE '%1974%'
        MovieStar
Disk
Structure of disk
Reads one piece of data at a time, because reading with magnetic current is not reliable.
If there is a failure, it has to go back to recover.
Cylinder (non-physical)
All references
https://iamcam.wordpress.com/2006/03/17/storing-hierarchical-data-in-a-database-part-1/
http://codex.cs.yale.edu/avi/db-book/db6/appendices-dir/d.pdf
http://www.w3resource.com/sql/sql-union.php
http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
http://infolab.stanford.edu/~ullman/fcdb/aut07/slides/ra.pdf