
Post on 25-Jun-2020


Functional Dependencies

CMPT 354: Database I -- Functional Dependencies 2

Redundancy in Database Design

• A table Students-take-courses(stud-id, name, address, phone, crs-id, instructor-name, office)
  – Students(stud-id, name, address, phone, …)
  – Instructors(name, office, …)
• Redundant information
  – If a student takes 20 courses, her/his name, address, and phone number have to be repeated 20 times
  – If an instructor teaches 2 courses with 120 students in total, her/his office number is repeated 120 times


Why Could Redundancy Be Bad?

• Space cost
• Maintenance overhead
  – If a student updates her/his address, 20 records need to be updated
  – If an instructor moves to a new office, 120 records need to be updated
  – What if an inconsistency arises during the update?
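The inconsistency risk can be illustrated with a minimal sketch. The rows, the student id s1, the name, and the addresses are made up for illustration, and the limit parameter is a hypothetical way to simulate an update that stops part-way. The point is only that a partial update of a redundant table leaves two different addresses on record for one student.

```python
# Hypothetical rows of Students-take-courses (only a few columns shown).
rows = [
    {"stud_id": "s1", "name": "Ann", "address": "12 Oak St", "crs_id": "CMPT 354"},
    {"stud_id": "s1", "name": "Ann", "address": "12 Oak St", "crs_id": "CMPT 300"},
    {"stud_id": "s1", "name": "Ann", "address": "12 Oak St", "crs_id": "CMPT 310"},
]

def update_address(table, stud_id, new_address, limit=None):
    """Update the address in up to `limit` matching rows (None = all of them).

    A finite `limit` simulates an update that is interrupted part-way.
    """
    updated = 0
    for row in table:
        if row["stud_id"] == stud_id and (limit is None or updated < limit):
            row["address"] = new_address
            updated += 1
    return updated

def addresses_of(table, stud_id):
    """All distinct addresses recorded for one student."""
    return {row["address"] for row in table if row["stud_id"] == stud_id}

# The update stops after two of the three rows.
update_address(rows, "s1", "7 Elm Ave", limit=2)
inconsistent = len(addresses_of(rows, "s1")) > 1
print(addresses_of(rows, "s1"), inconsistent)
```

With a separate Students table the address would be stored in one row, so an update could not leave the data half-changed in this way.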


Why Could Redundancy Be Good?

• Students-take-courses(stud-id, name, crs-id)
  – Student name is redundant if we have the table Students(stud-id, name, address, phone, …)
  – Only need Students-take-courses(stud-id, crs-id)
• What if we often need to generate class rosters?
  – Fast query answering: avoid joining the two tables many times


Requirements of Good Design

• Correctness: no information loss
  – Must be guaranteed
• Efficiency
  – Minimum (or as little as possible) redundant (repeated) information
  – Good performance with respect to the (expected) typical workload
  – May have to trade off between space and query answering time
    • Redundant information may help query answering


Atomic Domains

• A domain is atomic if its elements are considered to be indivisible units
  – Course-id consisting of a department code and a course number, e.g., CMPT 354
  – Bad examples: all of a customer’s accounts, all owners of an account
• Non-atomic values complicate storage and query answering, and encourage redundant (repeated) storage of data
  – Storage and redundancy: a set of accounts stored with each customer, and a set of owners stored with each account


First Normal Form

• Normal form: a quality criterion that the database design should meet
• A relational schema R is in first normal form if the domains of all attributes of R are atomic
  – All relations are assumed to be in first normal form
• Atomicity is a property of how the elements of the domain are used
  – Strings would normally be considered indivisible
  – Course-id is not atomic since two pieces of information are encoded


Combine Schemas?

• Combine borrower and loan to get bor_loan = (customer_id, loan_number, amount)

• Result is possible repetition of information (L-100 in example below)


Why Decomposition?

• Suppose we had started with bor_loan. How would we know to split it up (decompose it) into borrower and loan?

• Write a rule “if there were a schema (loan_number, amount), then loan_number would be a candidate key”


Why Decomposition?

• Denote as a functional dependency loan_number → amount

• In bor_loan, because loan_number is not a candidate key, the amount of a loan may have to be repeated
  – This indicates the need to decompose bor_loan
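A minimal sketch of this decomposition, assuming a made-up instance of bor_loan (the customer ids, loan L-200, and the amounts are hypothetical). It projects bor_loan onto borrower = (customer_id, loan_number) and loan = (loan_number, amount), then checks that a natural join on loan_number gives back the original rows.

```python
# Hypothetical instance of bor_loan = (customer_id, loan_number, amount).
bor_loan = {
    ("c1", "L-100", 1000),
    ("c2", "L-100", 1000),  # the amount of L-100 is repeated for a second borrower
    ("c3", "L-200", 2500),
}

# Project onto borrower = (customer_id, loan_number) and loan = (loan_number, amount).
borrower = {(cust, loan_no) for cust, loan_no, _ in bor_loan}
loan = {(loan_no, amount) for _, loan_no, amount in bor_loan}

# Natural join of borrower and loan on loan_number.
rejoined = {
    (cust, loan_no, amount)
    for cust, loan_no in borrower
    for loan_no2, amount in loan
    if loan_no == loan_no2
}

print(len(loan))             # number of (loan_number, amount) pairs after projection
print(rejoined == bor_loan)  # whether the join reproduces the original rows
```

Because loan_number → amount holds on this instance, each amount is stored once in loan and the join adds no extra rows.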


Combined Schema w/o Repetition

• Consider combining loan_branch and loan:
  loan_amt_br = (loan_number, amount, branch_name)
• No repetition


Decomposition Is Not Always Good

• Suppose we decompose employee into
  employee1 = (employee_id, employee_name)
  employee2 = (employee_name, telephone_number, start_date)
• We cannot reconstruct the original employee relation if there are two employees having the same name


A Lossy Decomposition

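Getting more tuples after re-joining the tables counts as a loss of information, not a gain: the spurious tuples make it impossible to tell which of the joined tuples were in the original relation.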
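A small sketch of the lossy case, assuming two made-up employees who share a name (all ids, names, phone numbers, and dates are hypothetical). Joining employee1 and employee2 on employee_name pairs each id with both phone records.

```python
# Hypothetical employee rows: (employee_id, employee_name, telephone_number, start_date).
employee = {
    (1, "Kim", "555-0101", "2019-01-07"),
    (2, "Kim", "555-0199", "2020-03-02"),  # a second employee with the same name
}

# employee1 = (employee_id, employee_name)
employee1 = {(eid, name) for eid, name, _, _ in employee}
# employee2 = (employee_name, telephone_number, start_date)
employee2 = {(name, phone, start) for _, name, phone, start in employee}

# Natural join of employee1 and employee2 on employee_name.
rejoined = {
    (eid, name, phone, start)
    for eid, name in employee1
    for name2, phone, start in employee2
    if name == name2
}

# Tuples produced by the join that were never in the original relation.
spurious = rejoined - employee
print(len(employee), len(rejoined), len(spurious))
```

Every original tuple is still present in the join, but so are the spurious ones, and nothing in employee1 or employee2 says which is which.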


Designing by Decomposition

• Start from a wide table – the universal table
  – Containing all pieces of information
• Decide whether a particular relation R is in “good” form
• If a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that
  – Each relation is in good form
  – The decomposition does not lose information


Functional Dependencies

• Constraints on the set of legal relations
• Require that the value for a certain set of attributes uniquely determines the value for another set of attributes

• A functional dependency is a generalization of the notion of a key


Functional Dependencies

• Let R be a relation schema, α ⊆ R and β ⊆ R
• The functional dependency α → β holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β
  – t1[α] = t2[α] ⇒ t1[β] = t2[β]


Example

• Example: Consider r(A, B) with the following instance of r.

  A  B
  1  4
  1  5
  3  7

• On this instance, A → B does NOT hold, but B → A does hold
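The check on this instance can be written out as a short sketch. The function name satisfies and the dictionary representation of tuples are assumptions made for illustration. It applies the definition directly: for every pair of tuples, agreement on α must imply agreement on β.

```python
from itertools import combinations

def satisfies(rows, alpha, beta):
    """Return True if every pair of rows that agrees on alpha also agrees on beta."""
    for t1, t2 in combinations(rows, 2):
        agree_alpha = all(t1[a] == t2[a] for a in alpha)
        agree_beta = all(t1[b] == t2[b] for b in beta)
        if agree_alpha and not agree_beta:
            return False
    return True

# The instance of r(A, B) from the example.
r = [
    {"A": 1, "B": 4},
    {"A": 1, "B": 5},
    {"A": 3, "B": 7},
]

print(satisfies(r, ["A"], ["B"]))  # False: the two rows with A = 1 differ on B
print(satisfies(r, ["B"], ["A"]))  # True: no two rows share a B value
```

This only tests one instance. Whether a dependency holds on the schema is a statement about all legal instances and cannot be established from a single table.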


Super Keys and Candidate Keys

• K is a superkey for relation schema R if and only if K → R

• K is a candidate key for R if and only if K → R and for no α ⊂ K, α → R
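A sketch of both tests on the bor_loan schema, assuming the single dependency loan_number → amount. The slides do not introduce attribute closure; it is used here as a standard way to decide whether K → R follows from a given set of dependencies, and the function names are illustrative.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Set of attributes determined by `attrs` under the dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs of attribute sets.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def is_superkey(k, schema, fds):
    """K is a superkey if K -> R, i.e. the closure of K covers the schema."""
    return attribute_closure(k, fds) >= set(schema)

def is_candidate_key(k, schema, fds):
    """K is a candidate key if it is a superkey and no proper subset is."""
    if not is_superkey(k, schema, fds):
        return False
    k = list(k)
    return not any(
        is_superkey(subset, schema, fds)
        for size in range(len(k))
        for subset in combinations(k, size)
    )

schema = {"customer_id", "loan_number", "amount"}
fds = [({"loan_number"}, {"amount"})]

print(is_superkey({"loan_number"}, schema, fds))
print(is_candidate_key({"customer_id", "loan_number"}, schema, fds))
print(is_candidate_key({"customer_id", "loan_number", "amount"}, schema, fds))
```

Under this assumption loan_number alone is not a superkey, (customer_id, loan_number) is a candidate key, and the full attribute set is a superkey but not a candidate key.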


Dependencies and Constraints

• Functional dependencies can express constraints that cannot be expressed using superkeys

• Consider the schema
  bor_loan = (customer_id, loan_number, amount)
  – We expect loan_number → amount
  – We do not expect amount → customer_id


Use of Functional Dependencies

• Testing relations to see if they are legal under a given set of functional dependencies
  – If a relation r is legal under a set F of functional dependencies, we say that r satisfies F
• Specifying constraints on the set of legal relations
  – We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F
• A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances
  – For example, a specific instance of loan may, by chance, satisfy amount → customer_name


Trivial Functional Dependencies

• A functional dependency is trivial if it is satisfied by all instances of a relation
  – Example:
    • customer_name, loan_number → customer_name
    • customer_name → customer_name

• In general, α → β is trivial if β ⊆ α


Closure

• A set of functional dependencies may logically imply other functional dependencies
  – If A → B and B → C, then A → C
• The set of all functional dependencies logically implied by F is the closure of F
• We denote the closure of F by F+
  – F+ is a superset of F


Armstrong’s Axioms

• Finding F+
  – (reflexivity) If β ⊆ α, then α → β
  – (augmentation) If α → β, then γα → γβ
  – (transitivity) If α → β and β → γ, then α → γ
• These rules are
  – Sound: generate only functional dependencies that actually hold
  – Complete: generate all functional dependencies that hold


Example

• R = (A, B, C, G, H, I)
  F = { A → B
        A → C
        CG → H
        CG → I
        B → H }
• Some members of F+
  – A → H
    • By using transitivity from A → B and B → H
  – AG → I
    • By augmenting A → C with G, to get AG → CG, and then using transitivity with CG → I
  – CG → HI
    • By augmenting CG → I with CG to infer CG → CGI, augmenting CG → H with I to infer CGI → HI, and then using transitivity
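The three derivations can be replayed mechanically. The sketch below represents a dependency as a pair of frozensets and writes augmentation and transitivity as small functions. The names fd, augment, and transitive are illustrative, not part of the slides.

```python
def fd(lhs, rhs):
    """Build a functional dependency from attribute strings, e.g. fd("CG", "H")."""
    return (frozenset(lhs), frozenset(rhs))

def augment(f, gamma):
    """Augmentation: from alpha -> beta infer gamma alpha -> gamma beta."""
    lhs, rhs = f
    return (lhs | frozenset(gamma), rhs | frozenset(gamma))

def transitive(f1, f2):
    """Transitivity: from alpha -> beta and beta -> gamma infer alpha -> gamma."""
    if f1[1] != f2[0]:
        raise ValueError("right side of f1 must equal left side of f2")
    return (f1[0], f2[1])

# A -> H: transitivity from A -> B and B -> H.
a_to_h = transitive(fd("A", "B"), fd("B", "H"))

# AG -> I: augment A -> C with G, then transitivity with CG -> I.
ag_to_cg = augment(fd("A", "C"), "G")
ag_to_i = transitive(ag_to_cg, fd("CG", "I"))

# CG -> HI: augment CG -> I with CG, augment CG -> H with I, then transitivity.
cg_to_cgi = augment(fd("CG", "I"), "CG")
cgi_to_hi = augment(fd("CG", "H"), "I")
cg_to_hi = transitive(cg_to_cgi, cgi_to_hi)

print(a_to_h == fd("A", "H"), ag_to_i == fd("AG", "I"), cg_to_hi == fd("CG", "HI"))
```

Using sets for both sides makes the order of attributes irrelevant, so CG and GC denote the same left-hand side.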


Procedure for Computing F+

F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further
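A naive Python sketch of this fixpoint procedure, under a few stated assumptions: dependencies are pairs of frozensets, reflexivity is applied once up front to every subset of the schema (including the empty set), and the example reuses F = {A → B, B → C} on a three-attribute schema. The running time is exponential in the number of attributes, so this is only meant for tiny schemas.

```python
from itertools import combinations

def subsets(attrs):
    """All subsets of `attrs`, including the empty set, as frozensets."""
    attrs = sorted(attrs)
    return [frozenset(c) for n in range(len(attrs) + 1) for c in combinations(attrs, n)]

def closure_of_fds(schema, fds):
    """Naive fixpoint computation of F+ following the procedure above."""
    all_subsets = subsets(schema)
    f_plus = {(frozenset(lhs), frozenset(rhs)) for lhs, rhs in fds}
    # Reflexivity: alpha -> beta for every beta that is a subset of alpha.
    f_plus |= {(alpha, beta) for alpha in all_subsets for beta in subsets(alpha)}
    while True:
        new = set()
        # Augmentation: gamma alpha -> gamma beta for every gamma.
        for lhs, rhs in f_plus:
            for gamma in all_subsets:
                new.add((lhs | gamma, rhs | gamma))
        # Transitivity: combine f1 and f2 when f1's right side is f2's left side.
        for lhs1, rhs1 in f_plus:
            for lhs2, rhs2 in f_plus:
                if rhs1 == lhs2:
                    new.add((lhs1, rhs2))
        # Stop once a full pass adds nothing new.
        if new <= f_plus:
            return f_plus
        f_plus |= new

# Schema (A, B, C) with F = {A -> B, B -> C}.
f_plus = closure_of_fds("ABC", [("A", "B"), ("B", "C")])

print((frozenset("A"), frozenset("C")) in f_plus)  # A -> C is implied
print((frozenset("C"), frozenset("A")) in f_plus)  # C -> A is not
```

Even for three attributes F+ contains dozens of dependencies, most of them trivial, which is why manual computation benefits from shortcut rules.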


Auxiliary Rules

• We can further simplify manual computation of F+ by using the following additional rules
  – (union) If α → β holds and α → γ holds, then α → βγ holds
  – (decomposition) If α → βγ holds, then α → β holds and α → γ holds
  – (pseudotransitivity) If α → β holds and γβ → δ holds, then αγ → δ holds

• The above rules can be inferred from Armstrong’s axioms


Summary

• First normal form
• Decomposition in database design
• Functional dependencies
• Armstrong’s axioms and auxiliary rules for closure computation


To-Do-List

• Please prove the auxiliary rules using Armstrong’s Axioms
