Post on 17-Jan-2016
© D. Wong 2003
Normalization
Purpose: a process to eliminate redundancy in relations due to functional or multivalued dependencies.
Decompose a relation schema into normal forms:
– Boyce-Codd Normal Form (BCNF)
– Third Normal Form (3NF)
– Fourth Normal Form (4NF)
To obtain the new relations, project the original relation (e.g. Movie) onto the new schemas.
To recover the information (i.e. Movie) from the new relations: natural join the new relations.
BCNF Decomposition (Example 3.24, p. 104)
Relation: Movie(title, year, length, filmType, studioName, starName)
Key: {title, year, starName}
FD’s: title year → length filmType studioName is a BCNF violation (its left side is not a superkey), so Movie is not in BCNF
Decomposition:
Schema 1: {title, year, length, filmType, studioName}
Schema 2: {title, year, starName}
To obtain the new relations, project Movie onto the two schemas.
To recover the information (i.e. Movie) from the new relations: natural join the new relations. The decomposition does not lose information.
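A minimal sketch of this decomposition, run with Python's built-in sqlite3 module. The sample rows and the table names Movie1 and Movie2 are invented for illustration; the slide names only the schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (title TEXT, year INT, length INT, filmType TEXT,
                    studioName TEXT, starName TEXT);
-- Illustrative rows only (not taken from the slides).
INSERT INTO Movie VALUES
  ('Star Wars', 1977, 124, 'color', 'Fox', 'Carrie Fisher'),
  ('Star Wars', 1977, 124, 'color', 'Fox', 'Mark Hamill'),
  ('Star Wars', 1977, 124, 'color', 'Fox', 'Harrison Ford'),
  ('Mighty Ducks', 1991, 104, 'color', 'Disney', 'Emilio Estevez');

-- Project Movie onto the two schemas (DISTINCT gives set semantics).
CREATE TABLE Movie1 AS
  SELECT DISTINCT title, year, length, filmType, studioName FROM Movie;
CREATE TABLE Movie2 AS
  SELECT DISTINCT title, year, starName FROM Movie;
""")

cols = "title, year, length, filmType, studioName, starName"
original = set(conn.execute(f"SELECT {cols} FROM Movie"))
rejoined = set(conn.execute(f"SELECT {cols} FROM Movie1 NATURAL JOIN Movie2"))
movie1_rows = conn.execute("SELECT COUNT(*) FROM Movie1").fetchone()[0]

print(rejoined == original)  # the natural join recovers Movie
print(movie1_rows)           # length/filmType/studioName stored once per movie
```

In this instance the per-movie facts are stored once in Movie1 instead of once per star, and the natural join gives back exactly the original rows.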
Functional Dependencies (FD)
Given: relation schema R(A1, …, An), and let X and Y be subsets of {A1, …, An}.
FD: X → Y means X functionally determines Y
e.g. A1A2…An → B1B2…Bm
A1A2…An → B1B2…Bm is an assertion about R that two attributes or sets of attributes in R are dependent on one another: any two tuples of R that agree on all the A’s must also agree on all the B’s.
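As a rough illustration, the definition can be checked against one relation instance. The helper name fd_holds and the sample rows are assumptions made for this sketch. Note that an instance can only refute an FD; it cannot prove that the FD holds for every legal instance of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this relation instance.

    rows is a list of dicts (one per tuple); lhs and rhs are attribute lists.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Illustrative rows only.
movie = [
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Mark Hamill"},
    {"title": "Mighty Ducks", "year": 1991, "length": 104, "starName": "Emilio Estevez"},
]

print(fd_holds(movie, ["title", "year"], ["length"]))    # True
print(fd_holds(movie, ["title", "year"], ["starName"]))  # False
```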
Multivalued Dependencies (MVD)
Given: relation schema R, and let A1A2…An and B1B2…Bm be subsets of the attributes of R.
MVD: A1A2…An ↠ B1B2…Bm holds in R if:
For each pair of tuples t and u of relation R that agree on all the A’s, we can find in R some tuple v that agrees:
1. With both t and u on the A’s,
2. With t on the B’s, and
3. With u on all attributes of R that are not among the A’s or B’s
A1A2…An ↠ B1B2…Bm is an assertion about R that two attributes or sets of attributes in R are independent of one another.
MVD’s cause redundancy not related to FD’s in a BCNF schema.
Most common source: putting 2 or more many-many relationships in a single relation.
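The three conditions above can be sketched as a brute-force check on a relation instance. The function name mvd_holds and the sample Star_Star_In rows are assumptions for illustration, not part of the slides.

```python
def mvd_holds(rows, attrs, a, b):
    """Check the MVD a ->> b on a relation instance (list of dicts over attrs)."""
    def key(row, cols):
        return tuple(row[c] for c in cols)

    present = {key(r, attrs) for r in rows}
    for t in rows:
        for u in rows:
            if key(t, a) != key(u, a):
                continue  # the condition only concerns pairs agreeing on the A's
            # Build v: A's shared by t and u, B's from t, everything else from u.
            v = dict(u)
            v.update({c: t[c] for c in b})
            if key(v, attrs) not in present:
                return False
    return True


ATTRS = ["name", "street", "city", "title", "year"]
# Illustrative rows: one star, two addresses, two movies.
star_star_in = [dict(zip(ATTRS, r)) for r in [
    ("C. Fisher", "123 Maple St.", "Hollywood", "Star Wars", 1977),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Star Wars", 1977),
    ("C. Fisher", "123 Maple St.", "Hollywood", "Empire Strikes Back", 1980),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Empire Strikes Back", 1980),
]]

print(mvd_holds(star_star_in, ATTRS, ["name"], ["street", "city"]))      # True
print(mvd_holds(star_star_in[:3], ATTRS, ["name"], ["street", "city"]))  # False
```

Dropping the fourth row breaks the MVD because the combination (5 Locust Ln., Empire Strikes Back) is then missing. This is the redundancy the slide refers to: every address has to be repeated for every movie.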
MVD Rules
Trivial dependencies rule
If A1A2…An ↠ B1B2…Bm holds for R, then A1A2…An ↠ C1C2…Ck holds, where the C’s are the B’s plus one or more of the A’s. The converse also holds.
Transitive rule
If A1A2…An ↠ B1B2…Bm and B1B2…Bm ↠ C1C2…Ck, then A1A2…An ↠ C1C2…Ck (any C’s that are also B’s must be dropped from the right side)
Splitting rule does not hold
E.g. name ↠ street city holds, but name ↠ street does not
So always keep the whole set of attributes on the right side (R.S.), because the splitting rule does not hold.
More MVD Rules
Every FD is an MVD
Because if the FD A1A2…An → B1B2…Bm holds, then swapping B’s between tuples that agree on the A’s doesn’t create new tuples.
Complementation rule
If X ↠ Y, then X ↠ Z, where Z is all attributes not in X or Y
e.g. Star_Star_In {name, street, city, title, year}
name ↠ street city
name ↠ title year
[Figure: tuples t and u shown against columns for the A’s and the B’s]
Nontrivial MVD
A1A2…An ↠ B1B2…Bm for a relation R is nontrivial if:
1. B1B2…Bm is not a subset of A1A2…An
2. A1A2…An ∪ B1B2…Bm is not all attributes of R
Fourth Normal Form (4NF)
Decompose relations that have MVD’s into 4NF to eliminate MVD’s.
Definition:
R is in 4NF if, whenever A1A2…An ↠ B1B2…Bm is a nontrivial MVD, {A1A2…An} is a superkey.
Since every FD is an MVD, 4NF is more stringent than BCNF
Only nontrivial MVD’s have the potential to violate 4NF
4NF Decomposition
Given: relation R, and a nontrivial MVD X ↠ Y that violates 4NF
1. Decompose R using X ↠ Y into the schemas XY and X ∪ (R − Y)
2. Produce the relations by projecting R onto XY and X ∪ (R − Y)
3. Reconstruct R from the new relations using natural join
e.g. Star_Star_In {name, street, city, title, year} and name ↠ street city
Decompose Star_Star_In using name ↠ street city into {name, street, city} and {name, title, year}
[Figure: X and Y shown as sets of attributes within R]
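The three steps can be sketched in plain Python on a small instance. The helper names (project, decompose_4nf, natural_join) and the sample rows are hypothetical and serve only to make the steps concrete.

```python
def project(rows, cols):
    """Set-semantics projection of a list of dicts onto cols."""
    return {tuple(r[c] for c in cols) for r in rows}


def decompose_4nf(rows, attrs, x, y):
    """Split on the MVD x ->> y into the schemas XY and X ∪ (R − Y)."""
    xy = x + [c for c in y if c not in x]
    x_rest = x + [c for c in attrs if c not in x and c not in y]
    return (xy, project(rows, xy)), (x_rest, project(rows, x_rest))


def natural_join(s1, r1, s2, r2):
    """Natural join of two relations given as (schema, set of tuples)."""
    common = [c for c in s1 if c in s2]
    out_schema = s1 + [c for c in s2 if c not in s1]
    out = set()
    for t in r1:
        for u in r2:
            dt, du = dict(zip(s1, t)), dict(zip(s2, u))
            if all(dt[c] == du[c] for c in common):
                merged = {**dt, **du}
                out.add(tuple(merged[c] for c in out_schema))
    return out_schema, out


ATTRS = ["name", "street", "city", "title", "year"]
# Illustrative rows: one star, two addresses, two movies.
rows = [dict(zip(ATTRS, r)) for r in [
    ("C. Fisher", "123 Maple St.", "Hollywood", "Star Wars", 1977),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Star Wars", 1977),
    ("C. Fisher", "123 Maple St.", "Hollywood", "Empire Strikes Back", 1980),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Empire Strikes Back", 1980),
]]

(s1, r1), (s2, r2) = decompose_4nf(rows, ATTRS, ["name"], ["street", "city"])
joined_schema, joined = natural_join(s1, r1, s2, r2)

print(s1, len(r1))  # ['name', 'street', 'city'] 2
print(s2, len(r2))  # ['name', 'title', 'year'] 2
print(joined == project(rows, ATTRS))  # True
```

Four redundant rows become two address rows plus two movie rows, and the natural join rebuilds the original four.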
Relationships among normal forms
4NF is the most stringent
4NF ⇒ BCNF ⇒ 3NF (a relation in 4NF is also in BCNF, and a relation in BCNF is also in 3NF)
Lossless-join decomposition
Given: relation R, decomposed into schemes R1, R2, …, Rk, and D is a set of dependencies.
Definition: R1, R2, …, Rk is a lossless-join decomposition (w.r.t. D) if for every relation r for R satisfying D:
r = π_R1(r) ⋈ π_R2(r) ⋈ … ⋈ π_Rk(r)
i.e. every relation r for R is the natural join of its projections onto the Ri’s.
The lossless-join property is necessary if the decomposed relation is to be recoverable from its decomposition.
However, joins are expensive. So, don’t over-decompose!
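A small made-up instance shows what goes wrong when the property fails. The relation r over (A, B, C) below is an assumption for illustration; in it A → B holds, while B determines neither A nor C.

```python
# Relation r over attributes (A, B, C); illustrative values only.
r = {(1, 10, 100), (2, 10, 200)}

# Projections onto the candidate schemas.
ab = {(a, b) for a, b, c in r}
bc = {(b, c) for a, b, c in r}
ac = {(a, c) for a, b, c in r}

# Natural joins of the projections.
join_ab_bc = {(a, b, c) for a, b in ab for b2, c in bc if b == b2}
join_ab_ac = {(a, b, c) for a, b in ab for a2, c in ac if a == a2}

spurious = join_ab_bc - r
print(sorted(spurious))  # [(1, 10, 200), (2, 10, 100)]
print(join_ab_ac == r)   # True
```

Joining {A, B} with {B, C} produces tuples that were never in r, so that decomposition loses information for this instance. Joining {A, B} with {A, C} returns r exactly, which is consistent with the shared attribute A functionally determining B.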
Structured Query Language (SQL)
A DDL and DML for relational DBMSs
History: ANSI SQL, SQL-92 (SQL2), SQL-99 (SQL3)
SQL-99 extends SQL2 with object-relational features and other new features
Most DBMS vendors implement the core, and then add bells and whistles and variations
Query capability is close to relational algebra, with lots of extensions.
Case insensitive except for characters inside quoted strings ' '
e.g. 'Smith' ≠ 'SMITH'
; as statement delimiter
Example database schema
Movie(title, year, length, inColor, studioName, producerC#)
StarsIn(movieTitle, movieYear, starName)
MovieStar(name, address, gender, birthdate)
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)
SQL Queries – basic form
SELECT attribute(s)
FROM relations / views / subquery
WHERE conditional expression;
SQL query examples
1. Example 1:
SELECT *
FROM Movie; -- * => all attributes of Movie
2. Example 2:
SELECT *
FROM Movie
WHERE studioName = 'Disney' AND year = 1990;
3. Example 3:
SELECT title, length
FROM Movie
WHERE studioName = 'Disney' AND year = 1990;
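To try Examples 1 and 3, one option is an in-memory SQLite database driven from Python. This is a sketch under assumptions: the rows are invented, and the producerC# column is left out to avoid quoting the # character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (title TEXT, year INT, length INT, inColor INT, studioName TEXT);
-- Illustrative rows only.
INSERT INTO Movie VALUES
  ('Pretty Woman', 1990, 119, 1, 'Disney'),
  ('Dick Tracy',   1990, 105, 1, 'Disney'),
  ('Star Wars',    1977, 124, 1, 'Fox');
""")

# Example 1: every attribute of every tuple.
all_rows = conn.execute("SELECT * FROM Movie;").fetchall()

# Example 3: project title and length from the selected tuples.
# The query has no ORDER BY, so the rows are sorted in Python for a stable result.
disney_1990 = sorted(conn.execute("""
    SELECT title, length
    FROM Movie
    WHERE studioName = 'Disney' AND year = 1990;
"""))

print(len(all_rows))  # 3
print(disney_1990)    # [('Dick Tracy', 105), ('Pretty Woman', 119)]
```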
Duplicates
SQL generally operates using bags instead of sets
Exception: UNION, INTERSECT, EXCEPT operations
To eliminate duplicates, add the keyword DISTINCT to the SELECT clause
e.g. SELECT DISTINCT starName
FROM StarsIn;
Duplicate elimination is costly. Use judiciously.
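The bag-versus-set difference is easy to see on a tiny table. The sketch below uses SQLite from Python; the StarsIn rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StarsIn (movieTitle TEXT, movieYear INT, starName TEXT);
-- Illustrative rows: one star appears in two movies.
INSERT INTO StarsIn VALUES
  ('Star Wars', 1977, 'Harrison Ford'),
  ('Raiders of the Lost Ark', 1981, 'Harrison Ford'),
  ('Star Wars', 1977, 'Carrie Fisher');
""")

bag = conn.execute("SELECT starName FROM StarsIn;").fetchall()
as_set = sorted(conn.execute("SELECT DISTINCT starName FROM StarsIn;"))

print(len(bag))  # 3 (the duplicate name is kept)
print(as_set)    # [('Carrie Fisher',), ('Harrison Ford',)]
```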
SQL Correspondence to Relational Algebra
SELECT L -- R.A. project
FROM R -- R.A. operands
WHERE C; -- R.A. select
R.A. expression: π_L(σ_C(R))
When reading and writing queries:
1. FROM -- what relations are involved
2. WHERE -- what the tuple selection criteria are
3. SELECT -- what columns to output
Union, Intersection, Difference of Queries
UNION: R1 UNION R2 or (Q1) UNION (Q2)
e.g. (SELECT title, year FROM Movie)
UNION
(SELECT movieTitle AS title, movieYear AS year FROM StarsIn);
INTERSECT: R1 INTERSECT R2 or (Q1) INTERSECT (Q2)
EXCEPT: R1 EXCEPT R2 or (Q1) EXCEPT (Q2) -- difference
Union, Intersection, Difference of Queries (continued)
Q1 and Q2 are queries that produce relations
R1 and R2, or the results of Q1 and Q2, should have the same list of attributes and attribute types. Rename if necessary.
Duplicates are eliminated automatically
Add the keyword ALL after UNION, INTERSECT, or EXCEPT to prevent duplicate elimination
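A hedged sketch of these operators using SQLite from Python, with invented rows. Two SQLite-specific departures from the slide syntax are assumed here: SQLite does not accept parentheses around the operand queries, and of the ALL variants it supports only UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (title TEXT, year INT);
CREATE TABLE StarsIn (movieTitle TEXT, movieYear INT, starName TEXT);
-- Illustrative rows only.
INSERT INTO Movie VALUES ('Star Wars', 1977), ('Alien', 1979);
INSERT INTO StarsIn VALUES
  ('Star Wars', 1977, 'Carrie Fisher'),
  ('Star Wars', 1977, 'Mark Hamill');
""")

LEFT = "SELECT title, year FROM Movie"
RIGHT = "SELECT movieTitle AS title, movieYear AS year FROM StarsIn"


def run(op):
    """Combine the two queries with a set operator and return sorted rows."""
    return sorted(conn.execute(f"{LEFT} {op} {RIGHT}"))


union = run("UNION")          # set semantics: duplicates removed
union_all = run("UNION ALL")  # bag semantics: duplicates kept
intersect = run("INTERSECT")
except_ = run("EXCEPT")

print(len(union), len(union_all))  # 2 4
print(intersect)                   # [('Star Wars', 1977)]
print(except_)                     # [('Alien', 1979)]
```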
SQL and Relational Algebra
The six independent operations of relational algebra are implemented by SQL
SQL is relationally complete
Some data values in SQL
1. Strings
2. Dates and Times
3. Null values
4. Truth value of Unknown
1. Strings
Comparison operators (according to lexicographical order): <, >, <=, >=, =
LIKE -- pattern matching
% -- matches any sequence of 0 or more characters
_ -- matches any one character
E.g.: title LIKE 'Star ____'
E.g.: title LIKE '%''s%'
Can specify an escape character
E.g. title LIKE 'x%%x%' ESCAPE 'x'
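The three patterns can be tried against a few made-up titles with SQLite from Python. The titles and the helper name titles are assumptions for this sketch. SQLite's LIKE is case-insensitive for ASCII by default, which does not affect these results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT)")
# Illustrative titles only.
conn.executemany(
    "INSERT INTO Movie VALUES (?)",
    [("Star Wars",), ("Star Trek",), ("Star Search",), ("Logan's Run",), ("%off%",)],
)


def titles(condition):
    """Return the sorted titles that satisfy a WHERE condition."""
    sql = "SELECT title FROM Movie WHERE " + condition
    return sorted(t for (t,) in conn.execute(sql))


# 'Star ' followed by exactly four characters.
four_after_star = titles("title LIKE 'Star ____'")
# Contains 's; the apostrophe is doubled inside the SQL string.
possessive = titles("title LIKE '%''s%'")
# Begins and ends with a literal %, using x as the escape character.
percent_both_ends = titles("title LIKE 'x%%x%' ESCAPE 'x'")

print(four_after_star)    # ['Star Trek', 'Star Wars']
print(possessive)         # ["Logan's Run"]
print(percent_both_ends)  # ['%off%']
```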
2. Dates and Times
Date constant: DATE '2002-10-01'
Time constant: TIME '15:00:02.5'
Timestamp (combines dates and times):
TIMESTAMP '2002-10-01 15:00:02.5'
(beware of implementation differences!)
Comparison operators apply
3. Null Values
NULL to represent:
1. Value unknown
2. Value inapplicable
3. Value withheld
Operations involving NULL
1. Arithmetic operation: result is NULL
2. Comparison: result is UNKNOWN
NULL is not a constant, therefore NULL cannot be used explicitly as an operand.
IS NULL and IS NOT NULL checks
Read "Pitfalls Regarding Nulls", p. 250
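A short sketch of these rules using SQLite from Python; the table contents are invented. The second query is the kind of pitfall the reading warns about: a condition that looks as if it must be true for every row still drops the row whose length is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (title TEXT, length INT);
-- Illustrative rows; movie C has an unknown length.
INSERT INTO Movie VALUES ('A', 90), ('B', 130), ('C', NULL);
""")


def q(sql):
    return conn.execute(sql).fetchall()


# Arithmetic on NULL yields NULL (None in Python).
arithmetic = q("SELECT length + 1 FROM Movie WHERE title = 'C'")

# Both comparisons are UNKNOWN for C, so C is not selected.
either_way = q(
    "SELECT title FROM Movie WHERE length <= 120 OR length > 120 ORDER BY title"
)

# IS NULL is the way to test for a NULL value.
is_null = q("SELECT title FROM Movie WHERE length IS NULL")

print(arithmetic)  # [(None,)]
print(either_way)  # [('A',), ('B',)]
print(is_null)     # [('C',)]
```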
4. UNKNOWN
Consider TRUE = 1, FALSE = 0, UNKNOWN = 0.5
1. AND of 2 truth values = min. of the 2 values
2. OR of 2 truth values = max. of the 2 values
3. Negation of v = 1 − v
Refer to Fig. 6.2, p. 250, for the truth table for 3-valued logic
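The numeric encoding translates directly into code. The function names and3, or3 and not3 are made up for this sketch; the loop prints the combinations of the three truth values.

```python
TRUE, UNKNOWN, FALSE = 1.0, 0.5, 0.0
NAMES = {TRUE: "TRUE", UNKNOWN: "UNKNOWN", FALSE: "FALSE"}


def and3(a, b):
    """Three-valued AND: the minimum of the two truth values."""
    return min(a, b)


def or3(a, b):
    """Three-valued OR: the maximum of the two truth values."""
    return max(a, b)


def not3(v):
    """Three-valued negation: 1 - v."""
    return 1 - v


for x in (TRUE, UNKNOWN, FALSE):
    for y in (TRUE, UNKNOWN, FALSE):
        print(NAMES[x], NAMES[y], NAMES[and3(x, y)], NAMES[or3(x, y)], NAMES[not3(x)])
```

For example, TRUE AND UNKNOWN is UNKNOWN, FALSE AND UNKNOWN is FALSE, and NOT UNKNOWN stays UNKNOWN.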
The Six Clauses in SQL Queries
1. SELECT -- required
2. FROM -- required
3. WHERE
4. GROUP BY
5. HAVING -- if used, must follow a GROUP BY clause
6. ORDER BY
Subqueries may appear in the FROM clause and the WHERE clause
Comments begin with '--'
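For orientation, here is one query that uses all six clauses in the required order, run with SQLite from Python. The rows, the aliases n and avgLength, and the chosen thresholds are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (title TEXT, year INT, length INT, studioName TEXT);
-- Illustrative rows only.
INSERT INTO Movie VALUES
  ('A', 1990, 100, 'Disney'), ('B', 1991, 120, 'Disney'),
  ('C', 1990,  90, 'Fox'),    ('D', 1985, 200, 'Fox'),
  ('E', 1992, 110, 'MGM');
""")

rows = conn.execute("""
    SELECT studioName, COUNT(*) AS n, AVG(length) AS avgLength  -- columns to output
    FROM Movie                                                  -- relations involved
    WHERE year >= 1990                                          -- filter tuples
    GROUP BY studioName                                         -- form groups
    HAVING AVG(length) >= 100                                   -- filter groups
    ORDER BY studioName;                                        -- sort the result
""").fetchall()

print(rows)  # [('Disney', 2, 110.0), ('MGM', 1, 110.0)]
```

WHERE removes movie D before grouping, and HAVING then removes the Fox group, whose remaining average length is 90.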
Table level SQL (ref. 6.6, p. 292)
Create table – to define the schema of a base table (ref. 6.6.1 for data type syntax)
E.g. create table EMP (
  empno int not null,
  lastName varchar(30) not null,
  firstName varchar(30) not null,
  num_of_children int,
  constraint pk_EMP primary key (empno));
Drop table – to destroy a base table
e.g. drop table EMP;
Tuple Modification Statements (ref. 6.5, p. 286)
Insert – to add a row
Syntax: insert into R(A1, …, An) values (v1, …, vn)
– E.g. insert into emp(empno, lastName, firstName, num_of_children) values (12345, 'Doe', 'John', 1)
– Or insert into emp values (12345, 'Doe', 'John', 1)
Delete – to remove a row
Syntax: delete from R where <condition>
– E.g. delete from emp where empno = 12345
Update – to modify the contents of a row
Syntax: update R set Ai = value where Aj = targetValue
– E.g. update emp set num_of_children = 2 where empno = 12345
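The statements can be exercised end to end against an in-memory SQLite database from Python. This is a sketch: SQLite accepts the EMP definition from the table-level slide, though its type handling is looser than most DBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table EMP (
      empno int not null,
      lastName varchar(30) not null,
      firstName varchar(30) not null,
      num_of_children int,
      constraint pk_EMP primary key (empno))
""")

# Insert: add a row (string constants use plain single quotes).
conn.execute("insert into emp values (12345, 'Doe', 'John', 1)")
after_insert = conn.execute("select * from emp").fetchall()

# Update: modify the contents of the row.
conn.execute("update emp set num_of_children = 2 where empno = 12345")
after_update = conn.execute(
    "select num_of_children from emp where empno = 12345"
).fetchone()[0]

# Delete: remove the row.
conn.execute("delete from emp where empno = 12345")
after_delete = conn.execute("select count(*) from emp").fetchone()[0]

print(after_insert)  # [(12345, 'Doe', 'John', 1)]
print(after_update)  # 2
print(after_delete)  # 0
```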
Some JOINS in SQL (ref. p. 270)
CROSS JOIN -- R.A. cartesian product
e.g. Movie CROSS JOIN StarsIn;
JOIN … ON -- R.A. theta-join
e.g. Movie JOIN StarsIn ON title = movieTitle AND year = movieYear;
NATURAL JOIN -- R.A. natural join
e.g. MovieStar NATURAL JOIN MovieExec;
(keep the NATURAL keyword: many DBMSs reject a bare MovieStar JOIN MovieExec without ON, or treat it as a cross join)
OUTERJOINS -- joins that include dangling tuples
OUTERJOINS
An operator to augment the result of a join with the dangling tuples, padded with null values.
Full outerjoin of R1 and R2 is a join that includes all rows from R1 and R2, matched or not. Unmatched rows are padded with NULLs.
LEFT outerjoin of R1 and R2 is a join that includes all rows from R1, matched or not, plus the matching values from R2. Unmatched rows of R1 are padded with NULLs.
RIGHT outerjoin of R1 and R2 is a join that includes all rows from R2, matched or not, plus the matching values from R1. Unmatched rows of R2 are padded with NULLs.
The joining may be NATURAL or theta join
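A sketch contrasting an ordinary theta-join with a LEFT outerjoin, using SQLite from Python and invented rows. Only LEFT is shown because older SQLite versions (before 3.39) do not support RIGHT or FULL outerjoins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (title TEXT, year INT);
CREATE TABLE StarsIn (movieTitle TEXT, movieYear INT, starName TEXT);
-- Illustrative rows: 'Alien' has no matching StarsIn tuple (it dangles).
INSERT INTO Movie VALUES ('Star Wars', 1977), ('Alien', 1979);
INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Carrie Fisher');
""")

CONDITION = "title = movieTitle AND year = movieYear"

inner = conn.execute(
    f"SELECT title, starName FROM Movie JOIN StarsIn ON {CONDITION}"
).fetchall()

left = conn.execute(
    f"SELECT title, starName FROM Movie LEFT OUTER JOIN StarsIn ON {CONDITION} "
    "ORDER BY title"
).fetchall()

print(inner)  # [('Star Wars', 'Carrie Fisher')]
print(left)   # [('Alien', None), ('Star Wars', 'Carrie Fisher')]
```

The dangling Movie tuple disappears from the ordinary join and reappears in the outerjoin with starName padded with NULL (None in Python).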
Outerjoins Syntax
1. R1 NATURAL {FULL | LEFT | RIGHT} OUTER JOIN R2;
E.g. 1. MovieStar NATURAL FULL OUTER JOIN MovieExec;
E.g. 2. MovieStar NATURAL LEFT OUTER JOIN MovieExec;
E.g. 3. MovieStar NATURAL RIGHT OUTER JOIN MovieExec;
Outerjoins Syntax (continued)
2. R1 {FULL | LEFT | RIGHT} OUTER JOIN R2 ON conditional expression;
E.g. 1. Movie FULL OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
E.g. 2. Movie LEFT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
E.g. 3. Movie RIGHT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
Use result of joins as subqueries in queries
E.g.
SELECT title, year, length, inColor, studioName, producerC#, starName
FROM Movie JOIN StarsIn ON
title = movieTitle AND year = movieYear;