normalization (codd, 1972) practical information for real world database design
TRANSCRIPT
Normalization(Codd, 1972)
Practical Information
For Real World Database Design
Requirements for Relational DB
• Table format• Supports Boolean Algebra
– Selects, joins, projects + 5 other operations to define queries
• Supports mathematical, relational, and logical operators (And-Or-Not)
• Codd’s Twelve Rules (abstracted)– Null values can be present except in primary key– All data represented in tables– Must be able to update views– Access data with table name, field name, or value– Data and programs should be independent
– Should enforce integrity and validity constraints
Normalization Defined
• Normalization– Purpose is to avoid potential update problems called
anomalies– Assigned attributes to entities using 1NF, 2NF, 3NF,
4NF, 5NF
• Denormalization– Moving back a level to gain better performance in the
real-world database; in practice 3NF is most common, however, to gain efficiency and speed, minor changes may need to be made
Why Normalization Is Important
• If not done, updates are less efficient (larger tables, possibly more than one update per data item change)
• If not done, indexing is more cumbersome – impractical to build large databases
• If not done, no simple strategies for creating views required by users
Design Rules
• Determine Business Rules– A company manages many different projects
– Each project requires the services of many employees
– Employees may be assigned to work on more than one project
– Each employee has a job classification
– Many employees have the same job classification
• Translate business rules to validity constraints and relationships
Design Rules, Continued
• Analyze documents, interview key users, etc. to develop a field list
• Determine entities to be used (see next slide for definition)
• Determine relationships between entities• Assign the attributes to the entities• Identify primary and foreign keys• Check for 1NF, 2NF, and 3 NF
Definitions
• Entity – the subject to be modeled by the database file (table or relation)
• Primary Key – the field value that uniquely identifies the entity entry (row, tuple, record); all other attributes are functionally dependent on it; can’t be null
• Foreign Key – the field (attribute, column) that relates the table to a pre-existing table
• Functional Dependence - determines or depends on, e.g. advisor name depends on advisor ID
Definitions, Continued
• Views– Selected group of records (select)– Selected group of fields (project)– Selected group of records and fields from two
or more tables (join)– A query– A report– A set of labels
Definitions, Continued
• Determinant – Determines value of another attribute; e.g. primary key
• Indexes– Tables that contain record numbers only arranged in an order
based on some field value
• Entity Integrity– Every table must have a field to uniquely identity each
record and there must be a field value for every record
• Referential Integrity– If a record has a value in a foreign key field, it must match
an exiting value in the original table to which it is linked
1 Normal Form
• Table does not contain repeating groups– To put it another way, each record has at least one field
that differentiates it from every other record in the file; e.g. a unique primary key
Examples:Faculty ID is primary key and the same faculty id is associated with two or more courses
Solve by creating a course fileFaculty ID is primary key and the same faculty id is associated with two or more offices
Solve by redesigning database to include offices as a separate table
2 Normal Form
• Table must be in 1 Normal Form
• No non-key attribute is dependent on only part of the concatenated key– Concatenated key (two or more fields taken
together represent primary key)• In course table, concatenated key is faculty ID and
Catalog No – every field in table must be dependent on both faculty ID and catalog number
Anomalies Avoided By 2 NF
• Only have one data item to change when update is made
• Avoids “loose” data when deletes are made– When a part number is deleted, could lose reference to
invoice
– How do you add a new course when there is no associated faculty ID?
– A new office with no assigned faculty?
• Avoids inconsistent data
3 Normal Form
• The only determinants are candidate keys– Candidate keys in student file are social
security number and patron ID (both are unique)
• To put it another way, there are no transitive dependencies– If student file contains Dept ID (foreign key)
and department name, this is a transitive dependency
4 and 5 Normal Form
• 4 Normal Form– There are no multivalued dependencies – is like
Boyce Codd
• 5 Normal Form– Holds only theoretical interest