Data Modeling Overview
By: Dave Wentzel
What we will accomplish
Review of DBMS
Issues related to DBMS
Entity Relationship Modeling
– Process flow
– Model types
– Component definition
Selecting entities and attributes
Defining relationships
Defining Cardinality
Selecting Primary Keys
Review of recursive relationships, weak entities, and ternary relationships
Participation constraints
Erwin Notation
NULL issues
The Physical Model
Generalization / Specialization
Transaction processing
Normalization Rules
History issues
What is data?
Data
– Raw facts. Can be described, observed, and measured.
Information
– Data organized in a form that is useful for decision making. The meaning behind the data.
– New thing not previously observed that is created based on the data.
Knowledge
– Information that is used for decision making.
What is a Database?
Collection of interrelated data
Data which can be visualized in a table format
Contains relationships between data
Can be of any size and varying complexity
Can be maintained manually or by computer
Data Base Management System (DBMS)
Collection of programs (software) that allows users to create and maintain a database
Supports data:
– Definition - specification of data types, structures, and constraints
– Construction - storing of the data itself
– Manipulation - updating & querying of the data
Defines itself. Contains a catalog which describes its data.
Components of a DBMS
Catalog
– Maintains information about the data in the database
– Considered data about data (metadata)
Databases
– Collection of related tables
Tables
– Rows and columns containing data
Issues in DBMS
Data independence
Query optimization
– Improve efficiency
– Faster responses
Transaction management
– Sequence of operations that are treated as a unit
– Once 1st step is completed, 2nd step must also be completed otherwise 1st step is aborted (ROLLBACK mechanism)
Example: Transferring Bank Funds
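The bank transfer can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database through Python's sqlite3 module; the account table, the transfer and balances helpers, and the no-overdraft rule are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, from_id, to_id, amount):
    """Debit one account and credit another as a single unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, from_id))
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, to_id))
            # Hypothetical business rule: an account may not go negative.
            (balance,) = conn.execute("SELECT balance FROM account WHERE id = ?",
                                      (from_id,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False  # both updates were rolled back together

def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM account"))
```

A failed transfer leaves both balances exactly as they were, which is the point of treating the two steps as one transaction.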
Issues in DBMS continued
Transaction management
– Concurrency
– Recovery
Controlled redundancy
– Goal of database design is to minimize redundancy (duplicate data)
Integrity constraints
– Includes business rules and data rules
Issues in DBMS continued
Security and privacy
– Protect against unauthorized access
Data / database administration
– Involves managing people, data, performance, security, etc.
Entity Relationship Modeling
[Figure: example entities Person, Account, Transaction, Employee]
Data Model
Tool for describing data, its relationships, semantics, and integrity constraints
Provides for data abstraction
Hides details of data storage
Why use an ER Model?
Easy to use for modeling DB design
Succinct representation of database layout
Good communication tool among project team members
Most CASE tools support ER modeling
Implementation independent
Categories of Data Models
Logical model
– Conceptual data model
– High level model
– Closest view user has of the data
Physical model
– Low level model
– Defines how data is stored
Steps in Database Design
[Figure: database design flow. Labels: Mini World; Requirements Collection and Analysis; Database Requirements; Functional Requirements; Functional Analysis; Logical Model; Data Model Mapping; Physical Design; Internal Schema; High Level Transaction Requirement; Application Program Design; Transaction Implementation; Application Programs; API; DBMS Independent; DBMS Specific]
ER Modeling composed of
Entity (table)
Attribute (field)
Relationship
– Binary Relationships
– Cardinality of relationships
What is an entity?
Conceptual definition
– Distinguishable object that exists
Operational definition
– Business object that has properties we are interested in storing
Physical definition
– Set of related data forming a table composed of attributes (fields)
Entities
Primary THINGS of a business about which users need to record data
Objects about which the business is interested in tracking information
When an ER Diagram is translated into a relational model, the entities become the tables.
Selecting Entities
Nouns are candidate entities
Possible classes of entities:
– People who carry out some function (employees, students, customers)
– Places (cities, offices, routes)
– Things which are tangible physical objects (equipment, products, buildings)
– Organizations (teams, suppliers, departments)
Selecting Entities Continued
Events which occur at a given date/time or have steps (employee promotions, project phases, account payments)
Concepts which are intangible ideas used to keep track of business activities (projects, accounts, complaints)
Questions to ask...
What things do we need to keep data about?
What things are essential to the organization?
What things do we talk about in the organization?
What questions do we have that reports can help answer?
What information should the reports contain?
Naming entities
Use a SINGULAR noun
Meaningful but intuitive
Avoid names which may be misinterpreted within the problem domain
Follow organizational / industry trends
Do not try to rename entities within an organization
Avoid abused names such as Task, Form, Operation, Schedule...
Is it an entity to worry about?
Decide if an entity is relevant to your problem domain by determining if it has attributes you need to track
If it does not have attributes you need to track, it is NOT a valid entity for your problem
Is it really an entity?
Can you define attributes for it? An attribute is a piece of information that we are interested in tracking about an entity. It is a property of an entity.
In general, if two objects differ by one attribute, they are separate entities.
Does it participate in a relationship? Two entities that are related somehow interact with one another.
Attributes
Properties of an object (entity)
Each attribute has a data type (char, int, datetime)
Each attribute in an RDBMS (relational database management system) has only one value at a time (atomic)
Categories of Attributes
Descriptive
– Property of the entity that helps describe the entity
Identifying (key attributes)
– Property of the entity that helps uniquely identify the entity
– Normally short
– If one does not exist it MUST be created
– If creating a key, use a numeric/integer data type
Types of Attributes
Atomic
– Indivisible value
– Most desired state
Composite
– Can be divided into smaller parts
– Need to convert into atomic
Types of Attributes Continued
Multi-valued
– Multiple instances of an attribute
– Normally create another entity
Derived
– Can be determined by the value of another attribute or attributes
– In most cases, do NOT store derived attributes
Naming Attributes
Use a noun, adjective, or adverb
Name should be unique database wide
Use attribute names consistently
Use singular names
Define a naming convention for the organization
Rules for Entity Analysis
Every noun is a candidate for an entity
Every entity should be relevant to the problem
If an object has only one property of importance, then it should be considered an attribute of another entity
If an object has only one data instance (1 row) then do not model as an entity
If an object needs a unique identifier then model it as an entity
Relationships
Way entities interact with one another
An association between two or more entities
Depicts business interactions between entities
They DO NOT represent business flow
Relationships Continued
Number of entities associated through a relationship defines its degree (unary, binary, ternary, n-ary)
Cardinality defines the maximum number of entities that can participate in the relationship
How to Identify a Relationship
Ask what is the action or verb used to describe how one entity interacts with another
Three types of relationships to consider:
– Existence (Employee HAS Children)
– Functional (Professor TEACHES Course)
– Event (Customer PLACES Order)
Ignore verbs not important to the organization
More on Relationships
Relationships and cardinality constraints represent business rules
When naming a relationship use an active verb in the present tense
Relationships are read bi-directionally
Example notes:
Together the customer and account tables form a schema - the structure / layout of a logical database design
Note the attributes. Order DOES NOT MATTER but convention puts the primary key first.
No duplicates for attributes.
No duplicate tuples (rows)
Relationship - same attribute name (or different attribute name with the same meaning) in 2 tables.
Cardinality Constraints
Express the MAXIMUM number of entities that can be associated with another entity via a relationship
Also known as mapping constraints
Types:
– 1:1 (one to one)
– 1:N (one to many)
– N:M (many to many)
The Key to It All
Identifiers...
Attribute(s) which uniquely identify a record
An entity may have multiple identifiers
Every entity MUST have at least one
Can be made up of more than one attribute
Candidate vs. Primary Keys
Both are identifiers
Candidate keys are all the identifiers that uniquely identify the record; you choose among them
Primary key is the one candidate key which is selected to always uniquely identify the record
Selecting the Primary Key
In general we create a primary key however...
Choose the attribute most widely used in the query
Select the shorter data type
If one does not exist, must create one
Select a MINIMUM key if using compound attributes (not recommended)
Key Requirements and Preferences
Known at all times
Can NOT be null
Should not be changed
Shorter is better
Numeric / integer is better
Avoid keys containing letters O, I, Z, S - can be confused with numbers
If key includes time, it should be in 24hr format
Avoid carrying meaning
With this all said...
It is difficult to come up with a primary key based on real attributes which will not change over time (phone numbers, SSN, addresses, driver’s license numbers…)
In most cases it is best to create the primary key
In SQL Server can use the identity column which creates a sequential number
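A created (surrogate) key can be sketched as follows. The example uses SQLite through Python's sqlite3 module, where an INTEGER PRIMARY KEY column is assigned automatically; that stands in here for the SQL Server identity column and is not the same feature. The customer table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,  -- surrogate key, assigned by the database
        name        TEXT NOT NULL,
        phone       TEXT                  -- real-world attribute, free to change
    )
""")

# The key is never supplied by the application and carries no meaning.
first = conn.execute("INSERT INTO customer (name, phone) VALUES (?, ?)",
                     ("Bob", "555-0100")).lastrowid
second = conn.execute("INSERT INTO customer (name, phone) VALUES (?, ?)",
                      ("Mary", "555-0101")).lastrowid

# Changing the phone number does not disturb the key or anything that references it.
conn.execute("UPDATE customer SET phone = ? WHERE customer_id = ?", ("555-0199", first))
```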
Primary Keys and Relationships
In a 1:1 relationship, the primary key of either one of the entities must migrate to the other entity
In a 1:N, the primary key of the 1 side must migrate to the entity on the N side
In a M:N, the keys of both entities are used to identify a new entity which resolves the M:N into two 1:N relationships
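Resolving an M:N relationship can be sketched like this, again assuming SQLite through Python's sqlite3 module. The student, course, and enrollment tables and the courses_for helper are hypothetical; enrollment is the new entity whose key is built from the keys of both parents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- New entity: the keys of both parents migrate here and together form the PK,
    -- turning one M:N into two 1:N relationships.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Ann'), (2, 'Raj');
    INSERT INTO course  VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

def courses_for(student_id):
    """List course titles for one student by walking the two 1:N relationships."""
    rows = conn.execute(
        "SELECT c.title FROM enrollment e "
        "JOIN course c ON c.course_id = e.course_id "
        "WHERE e.student_id = ? ORDER BY c.title",
        (student_id,),
    )
    return [title for (title,) in rows]
```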
Foreign Key
When a key migrates to another entity it is called a Foreign Key
A foreign key CAN BE null if it is not part of an entity’s primary key
If the FK value is NOT null, then that value MUST exist in the table in which it is the primary key. This is called Referential Integrity (RI)
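The two foreign key rules can be checked with a small sketch, assuming SQLite through Python's sqlite3 module (SQLite only enforces foreign keys when the foreign_keys pragma is on). The department and employee tables and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)  -- FK outside the PK, so nullable
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'Research')")

def try_insert(emp_id, name, dept_id):
    """Return True if the row is accepted, False if referential integrity rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

A NULL dept_id is accepted, an existing dept_id is accepted, and a dept_id with no matching department row is rejected.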
Recursive Relationships
An entity having a relationship with itself
Same entity participates more than once in a relationship type in different roles
Same cardinality examples exist in recursive relationships
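A common recursive relationship is an employee who manages other employees. The sketch below assumes SQLite through Python's sqlite3 module; the employee table, the manager_id column, and the reports_of helper are hypothetical. The same table plays two roles, manager and report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        manager_id INTEGER REFERENCES employee(emp_id)  -- NULL at the top of the hierarchy
    )
""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ada", None), (2, "Bob", 1), (3, "Cy", 1)])

def reports_of(manager_name):
    """Join the table to itself: alias m is the manager role, alias e the report role."""
    rows = conn.execute(
        "SELECT e.name FROM employee e "
        "JOIN employee m ON e.manager_id = m.emp_id "
        "WHERE m.name = ? ORDER BY e.name",
        (manager_name,),
    )
    return [name for (name,) in rows]
```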
Weak Entity Type
Entity that does not have a key attribute of its own
Identified by its relationship with another entity
Created for multi-valued attributes and time dependent attributes
Weak entity has EXISTENCE dependence on the parent. Only exists if the owner entity exists.
Primary Keys of Weak Entities
Can use the primary key of the owner entity along with a qualifier such as sequence number or date/time
Can create a surrogate key but make sure you migrate the key of the parent
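The first option, owner key plus a sequence number, might look like the sketch below. It assumes SQLite through Python's sqlite3 module; the employee and dependent tables and the add_dependent helper are hypothetical, and ON DELETE CASCADE is one way to express the existence dependence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Weak entity: no key of its own, identified by owner key + qualifier.
    CREATE TABLE dependent (
        emp_id INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
        seq_no INTEGER NOT NULL,
        name   TEXT NOT NULL,
        PRIMARY KEY (emp_id, seq_no)
    );

    INSERT INTO employee VALUES (1, 'Bob');
""")

def add_dependent(emp_id, name):
    """Insert a dependent under the next sequence number for that owner."""
    with conn:
        (next_seq,) = conn.execute(
            "SELECT COALESCE(MAX(seq_no), 0) + 1 FROM dependent WHERE emp_id = ?",
            (emp_id,),
        ).fetchone()
        conn.execute("INSERT INTO dependent VALUES (?, ?, ?)", (emp_id, next_seq, name))
    return next_seq
```

Deleting the owner row removes its dependents as well, since a weak entity only exists while its owner does.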
Ternary Relationship
Relationship between 3 entities
Differs from 3 binary relationships
States that all three entities occur at the same time
Must be converted to binary relationships
Creating Binary Relationships from a Ternary Relationship
Participation Constraints
Specifies whether the existence of an entity depends on its being related to another entity via a relationship
Notes the minimum cardinality
Total participation (mandatory)
Partial participation (optional)
Identifying Participation Constraints
Can entity A exist without entity B?
– If no, A has total participation in the relationship
– If yes, entity A has partial participation in the relationship
Identifying Relationships In Erwin
An identifying relationship is a relationship between two tables in which an instance of a child table is identified through its association with a parent table, which means the child table is dependent on the parent table for its identity, and cannot exist without it. In an identifying relationship, one instance of the parent table is related to multiple instances of the child.
Non-Identifying Relationship In Erwin
A non-identifying relationship is a relationship between two tables in which an instance of the child table is not identified through its association with a parent table, which means the child table is not dependent on the parent table for its identity, and can exist without it. In a non-identifying relationship, one instance of the parent table is related to multiple instances of the child.
Optional Non-Identifying
In an optional non-identifying relationship, the columns that are migrated into the non-key area of the child table are not required in the child table. This means that nulls are allowed in the foreign key. ERwin draws an optional non-identifying relationship differently depending on the notation for your diagram.
Mandatory Non-Identifying
In a mandatory non-identifying relationship, the columns that are migrated into the non-key area of the child table are required in the child table. This means that the foreign key cannot be null.
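Erwin Notation
[Figure: Erwin notation legend with columns Cardinality and Description, showing Identifying and Non-Identifying (Nulls / No Nulls) relationship lines; cardinality shown is one to 0, 1, or M]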
To Null or Not to Null….
NULL means no value
Two types of null values
– Unknown
– None (does not exist or not applicable)
Null Examples
Employee

e#  name   salary  spouse
1   Bob    10,000  Mary
2   Jack   20,000  Kate
3   Mary   30,000  NULL
4   Kelly  NULL    John
Questions:
• How many people make more than 15K?
• What is the average salary?
• Is Mary married?
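The three questions can be run against the Employee rows to see how a database treats the NULLs. This sketch assumes SQLite through Python's sqlite3 module; the column name e_no stands in for e#, and the variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (e_no INTEGER PRIMARY KEY, name TEXT, "
             "salary INTEGER, spouse TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (1, "Bob", 10000, "Mary"),
    (2, "Jack", 20000, "Kate"),
    (3, "Mary", 30000, None),   # spouse NULL: unmarried, or unknown?
    (4, "Kelly", None, "John"),  # salary NULL: unknown, or none?
])

# Kelly is silently left out: NULL > 15000 is neither true nor false.
over_15k = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE salary > 15000").fetchone()[0]

# AVG skips NULL salaries, so the divisor is 3 rows, not 4.
avg_salary = conn.execute("SELECT AVG(salary) FROM employee").fetchone()[0]

# The stored NULL cannot distinguish "no spouse" from "spouse not recorded".
mary_spouse = conn.execute(
    "SELECT spouse FROM employee WHERE name = 'Mary'").fetchone()[0]
```

Each query returns an answer, but none of the answers says whether the missing value was unknown or not applicable.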
Problems with NULL
Null values are ambiguous
More programming is required to deal with NULL values
Try to use UNKNOWN or NONE if applicable
Getting Physical…
Converting the logical data model into the physical data model
Things to do when converting
Identify data type
– Is it a string (character field) or a number?
– Use of varchar() or char()?
– Dates are dates not strings
Identify data length
– Consider growth over time and maximum size requirements
Identify value constraints (valid ranges, values, etc.)
Things to do when converting
Follow proper naming conventions
Determine indexes
Consider combining 1:1 relationship entities
Roll-up generalization / specialization hierarchies
Add organizational attributes if any
Indexes
Index is a physical access structure
Makes queries more efficient
Things to consider when creating
– Create an index for each PK
– Create an index for each FK
– Create an index for each AK which will be used in queries
– Try to minimize number of indexes (update overhead)
Specialization / Generalization
Inheritance / Abstraction
Subclasses / Superclasses
Specialization / Generalization
Two processes resulting in the same model
Specialization is top-down approach. Can a high level entity be broken down?
Generalization is bottom-up approach. Can entities be combined at a higher level?
Example
Notes on Generalization/Specialization
Key of subclass is always key of superclass
Subclasses can participate in their own relationships
Participation in a subclass can either be inclusive or exclusive
Exclusive subclasses should be defined by a type
Multiple inheritance not allowed in most modeling tools
When converting to physical could combine into one entity
Database Operations
CRUD
– Create (Insert)
– Read
– Update (Modify)
– Delete
Transactions cannot violate any integrity constraints
Several may be grouped into a transaction
May propagate to maintain integrity constraints
If update violations occur
Cancel the operation (Restrict)
Perform additional updates / deletes so the violation is corrected (Cascade)
Execute a user specified operation to correct (Trigger)
Perform the operation but inform the user
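The Restrict and Cascade responses can be compared side by side. This sketch assumes SQLite through Python's sqlite3 module and its ON DELETE RESTRICT / ON DELETE CASCADE clauses; the customer, invoice, and note tables and the delete_customer helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);

    -- Restrict: a customer with invoices cannot be deleted.
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES customer(customer_id) ON DELETE RESTRICT
    );

    -- Cascade: deleting a customer also deletes that customer's notes.
    CREATE TABLE note (
        note_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES customer(customer_id) ON DELETE CASCADE
    );

    INSERT INTO customer VALUES (1), (2);
    INSERT INTO invoice VALUES (100, 1);
    INSERT INTO note VALUES (500, 2);
""")

def delete_customer(customer_id):
    """Return True if the delete went through, False if it was cancelled."""
    try:
        with conn:
            conn.execute("DELETE FROM customer WHERE customer_id = ?", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Customer 1 has an invoice, so the delete is cancelled; customer 2 has only a note, so the delete succeeds and the note goes with it.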
Normalization - What’s normal...
Normalization
Process to design a highly desirable relational schema using functional dependencies
Guidelines for relational database design which
– Minimize redundancy
– Avoid potential inconsistency
– Help predict data behavior problems
– Avoid update anomalies
Update Anomalies
Insert extra values
Add redundant records
Delete records not intended
Change a fact more than once, possibly in multiple tables
Miss changing a fact which is repeated multiple times
Normal Forms
First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form
Fourth Normal Form
Fifth Normal Form
[Figure labels: # of Tables, Joins]
First Normal Form
A relation is in 1NF if it contains only scalar (atomic) values
– One value for an attribute
– No repeating groups
– No composite attributes
– No multi-valued attributes
To convert to 1NF
– Create 1 table for each repeating group by adding the PK of the original table
– Remove the repeating group from the original table
Example of Non-1NF w/ Conversion

Non-1NF

Dname           Dnumber  DMGRSSN    Dlocations
Research        5        333445555  {Bellaire, Sugarland, Houston}
Administration  4        987654321  {Stafford, Voorhees}
Headquarters    1        888665555  Houston

1NF (note redundancy)

Dname           Dnumber  DMGRSSN    Dlocations
Research        5        333445555  Bellaire
Research        5        333445555  Sugarland
Research        5        333445555  Houston
Administration  4        987654321  Stafford
Administration  4        987654321  Voorhees
Headquarters    1        888665555  Houston
Example of Non-1NF

EmployeeProject - NON-1NF

SSN        Ename            Pnumber  Hours
123456789  Smith, John      1        32.5
                            2        7.5
666885555  Narayan, Ramesh  3        40
453223344  English, Joyce   1        20
                            2        20

Conversion

SSN        Ename
123456789  Smith, John
666885555  Narayan, Ramesh
453223344  English, Joyce

SSN        Pnumber  Hours
123456789  1        32.5
123456789  2        7.5
666885555  3        40
453223344  1        20
453223344  2        20
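The conversion steps can be expressed as a small function. This is an illustrative sketch in plain Python using the EmployeeProject rows from the example; the to_1nf function and the variable names are hypothetical.

```python
# Non-1NF rows: each employee carries a repeating group of (Pnumber, Hours).
employee_project = [
    ("123456789", "Smith, John", [(1, 32.5), (2, 7.5)]),
    ("666885555", "Narayan, Ramesh", [(3, 40.0)]),
    ("453223344", "English, Joyce", [(1, 20.0), (2, 20.0)]),
]

def to_1nf(rows):
    """Split the repeating group into its own table, carrying the original PK (SSN)."""
    # Original table with the repeating group removed.
    employees = [(ssn, ename) for ssn, ename, _ in rows]
    # New table: one row per group member, prefixed with the owner's PK.
    works_on = [(ssn, pnumber, hours)
                for ssn, _, group in rows
                for pnumber, hours in group]
    return employees, works_on

employees, works_on = to_1nf(employee_project)
```

Every cell in both result tables now holds a single atomic value, and SSN links the two tables back together.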
Second Normal Form
All attributes in the relation have a functional dependency on the complete PK
Each non-key attribute is uniquely defined by all components of the primary key
Example of Non-2NF w/ Conversion

EmployeeProject (SSN, Pnumber, Hours, Ename, Pname, Plocation)
FD1: {SSN, Pnumber} -> Hours
FD2: SSN -> Ename
FD3: Pnumber -> Pname, Plocation

Conversion to 2NF

EP1 (SSN, Pnumber, Hours)
EP2 (SSN, Ename)
EP3 (Pnumber, Pname, Plocation)
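The split into EP1, EP2, and EP3 amounts to projecting the original relation onto each group of columns and dropping duplicates. The sketch below is plain Python; the project helper is hypothetical, and the Pname and Plocation values are made up for illustration (the slides give none).

```python
# EmployeeProject(SSN, Pnumber, Hours, Ename, Pname, Plocation)
# Pname / Plocation values are invented for this example.
rows = [
    ("123456789", 1, 32.5, "Smith, John", "ProductX", "Bellaire"),
    ("123456789", 2, 7.5, "Smith, John", "ProductY", "Sugarland"),
    ("666885555", 3, 40.0, "Narayan, Ramesh", "ProductZ", "Houston"),
    ("453223344", 1, 20.0, "English, Joyce", "ProductX", "Bellaire"),
    ("453223344", 2, 20.0, "English, Joyce", "ProductY", "Sugarland"),
]

def project(rows, cols):
    """Keep only the given column positions and remove duplicate rows."""
    return sorted({tuple(row[c] for c in cols) for row in rows})

ep1 = project(rows, (0, 1, 2))  # SSN, Pnumber, Hours  (depends on the full key)
ep2 = project(rows, (0, 3))     # SSN, Ename           (depends on SSN only)
ep3 = project(rows, (1, 4, 5))  # Pnumber, Pname, Plocation (depends on Pnumber only)

# Joining the three tables back reproduces the original rows, so nothing was lost.
names = dict(ep2)
projects = {pnumber: (pname, ploc) for pnumber, pname, ploc in ep3}
rejoined = sorted((ssn, p, h, names[ssn], *projects[p]) for ssn, p, h in ep1)
```

Each employee name and each project description is now stored once instead of once per assignment.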
Third Normal Form
Every non-key attribute (one that does not participate in the primary key) is mutually independent of the other non-key attributes
Every non-key attribute is irreducibly dependent on the primary key
Example of Non-3NF w/ Conversion

Lots (PropertyID#, CountyName, Lot#, Area, Price, TaxRate)

2NF

Lots1 (PropertyID#, CountyName, Lot#, Area, Price)
Lots2 (CountyName, TaxRate)

3NF

Lots1A (PropertyID#, CountyName, Lot#, Area)
Lots1B (Area, Price)
Maintaining History
Maintaining History can serve one of two purposes:
– Tracking changes in the entity over time
– Tracking record history in order to maintain inactive records over time and maintain RI
Tracking changes in an entity over time is very difficult and requires significant storage
Tracking inactive records is our standard here and provides value to the end user
Examples of History…