Copyright Irwin/McGraw-Hill 1998
Data Modeling
Introduction
The presentation will address the following questions:
• What is systems modeling, and what is the difference between logical and physical system models?
• What is data modeling, and what are its benefits?
• Can you recognize and understand the basic concepts and constructs of a data model?
• Can you read and interpret an entity relationship data model?
• When in a project are data models constructed, and where are they stored?
• Can you discover entities and relationships?
• Can you construct an entity-relationship context diagram?
• Can you discover or invent keys for entities?
• Can you construct a fully attributed entity relationship diagram and describe all data structures and attributes to the repository or encyclopedia?
An Introduction to Systems Modeling
Systems Modeling
One way to structure unstructured problems is to draw models.
• A model is a representation of reality. Just as a picture is worth a thousand words, most system models are pictorial representations of reality.
Models can be built for existing systems as a way to better understand those systems, or for proposed systems as a way to document business requirements or technical designs.
What are Logical Models?
• Logical models show what a system ‘is’ or ‘does’. They are implementation-independent; that is, they depict the system independent of any technical implementation. As such, logical models illustrate the essence of the system.
What are Physical Models?
Physical models show not only what a system ‘is’ or ‘does’, but also how the system is physically and technically implemented. They are implementation-dependent because they reflect technology choices, and the limitations of those technology choices.
Systems analysts use logical system models to depict business requirements, and physical system models to depict technical designs.
Systems analysis activities tend to focus on the logical system models for the following reasons:
• Logical models remove biases that are the result of the way the current system is implemented or the way that any one person thinks the system might be implemented.
• Logical models reduce the risk of missing business requirements because we are too preoccupied with technical details.
• Logical models allow us to communicate with end-users in non-technical or less technical languages.
Data modeling is a technique for defining business requirements for a database.
• Data modeling is a technique for organizing and documenting a system’s DATA. Data modeling is sometimes called database modeling because a data model is usually implemented as a database. It is sometimes called information modeling.
Many experts consider data modeling to be the most important of the modeling techniques.
Why is data modeling considered crucial?
• Data is viewed as a resource to be shared by as many processes as possible. As a result, data must be organized in a way that is flexible and adaptable to unanticipated business requirements, and that is the purpose of data modeling.
• Data structures and properties are reasonably permanent, certainly a great deal more stable than the processes that use the data. Often the data model of a current system is nearly identical to that of the desired system.
• Data models are much smaller than process and object models and can be constructed more rapidly.
• The process of constructing data models helps analysts and users quickly reach consensus on business terminology and rules.
Figure: an example data model
CUSTOMER
Customer Number (PK)
Customer Name
Shipping Address
Billing Address
Balance Due
ORDER
Order Number (PK)
Order Date
Order Total Cost
Customer Number (FK)
INVENTORY PRODUCT
Product Number (PK)
Product Name
Product Unit of Measure
Product Unit Price
ORDERED PRODUCT
Ordered Product ID (PK)
. Order Number (FK)
. Product Number (FK)
Quantity Ordered
Unit Price at Time of Order
Relationship labels on the connecting lines: has placed; sold; sold as
System Concepts for Data Modeling
System Concepts
Most systems analysis techniques are strongly rooted in systems thinking.
• Systems thinking is the application of formal systems theory and concepts to systems problem solving.
There are several notations for data modeling, but the actual model is frequently called an entity relationship diagram (ERD).
• An ERD depicts data in terms of the entities and relationships described by the data.
Entities
All systems contain data. Data describes ‘things’.
A concept to abstractly represent all instances of a group of similar ‘things’ is called an entity.
An entity is something about which we want to store data. Synonyms include entity type and entity class.
• An entity is a class of persons, places, objects, events, or concepts about which we need to capture and store data.
• An entity instance is a single occurrence of an entity.
Figure: STUDENT (an entity)
Attributes
The pieces of data that we want to store about each instance of a given entity are called attributes.
• An attribute is a descriptive property or characteristic of an entity. Synonyms include element, property, and field.
Some attributes can be logically grouped into super-attributes called compound attributes.
• A compound attribute is one that actually consists of more primitive attributes. Synonyms in different data modeling languages are numerous: concatenated attribute, composite attribute, and data structure.
Figure: attributes and compound attributes
STUDENT
Name
. Last Name
. First Name
. Middle Initial
Address
. Street Address
. City
. State or Province
. Country
. Postal Code
Phone Number
. Area Code
. Exchange Number
. Number Within Exchange
Date of Birth
Gender
Race
Major
Grade Point Average
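One way to picture a compound attribute is as a nested record. The following is a minimal sketch in Python, assuming that dataclasses stand in for entities and compound attributes; the class names, field names, and sample values are illustrative and do not come from the presentation.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Name:
    """Compound attribute made of three primitive attributes."""
    last_name: str
    first_name: str
    middle_initial: str


@dataclass
class PhoneNumber:
    """Compound attribute: area code, exchange, number within exchange."""
    area_code: str
    exchange_number: str
    number_within_exchange: str


@dataclass
class Student:
    """One object of this class plays the role of one entity instance."""
    name: Name                  # compound attribute
    phone_number: PhoneNumber   # compound attribute
    date_of_birth: str          # primitive attribute
    major: str                  # primitive attribute


def primitive_attributes(instance, prefix=""):
    """Flatten an entity instance into the names of its primitive attributes."""
    names = []
    for f in fields(instance):
        value = getattr(instance, f.name)
        if is_dataclass(value):
            # Descend into the compound attribute.
            names.extend(primitive_attributes(value, prefix + f.name + "."))
        else:
            names.append(prefix + f.name)
    return names


# Made-up sample instance, for illustration only.
student = Student(
    name=Name("Doe", "Jane", "Q"),
    phone_number=PhoneNumber("317", "555", "0100"),
    date_of_birth="01151980",
    major="CS",
)
print(primitive_attributes(student))
```

Flattening the instance shows that a compound attribute adds structure, not new data: every value still lives in a primitive attribute.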
Domains:
The values for each attribute are defined in terms of three properties: data type, domain, and default.
• The data type for an attribute defines what class of data can be stored in that attribute.
• For purposes of systems analysis and business requirements definition, it is useful to declare logical (non-technical) data types for our business attributes.
• An attribute’s data type determines its domain.
– The domain of an attribute defines what values an attribute can legitimately take on.
• Every attribute should have a logical default value.
– The default value for an attribute is that value which will be recorded if not specified by the user.
Logical Data Type: Logical Business Meaning
NUMBER: Any number, real or integer.
TEXT: A string of characters, inclusive of numbers. When numbers are included in a TEXT attribute, it means we do not expect to perform arithmetic or comparisons with those numbers.
MEMO: Same as TEXT but of an indeterminate size. Some business systems require the ability to attach a potentially lengthy note to a given database record.
DATE: Any date in any format.
TIME: Any time in any format.
YES/NO: An attribute that can only assume one of these two values.
VALUE SET: A finite set of values. In most cases, a coding scheme would be established (e.g., FR=freshman, SO=sophomore, JR=junior, SR=senior, etc.).
IMAGE: Any picture or image.
Data Type, Domain, Examples
NUMBER
Domain: For integers, specify the range: {minimum - maximum}. For real numbers, specify the range and precision: {minimum.precision - maximum.precision}.
Examples: {10 - 99}; {1.000 - 799.999}
TEXT
Domain: TEXT (maximum size of attribute). Actual values are usually infinite; however, users may specify certain narrative restrictions.
Examples: TEXT (30)
MEMO
Domain: Not applicable. There are no restrictions on size or content.
Examples: Not applicable.
DATE
Domain: Variation on the MMDDYYYY format. To accommodate the year 2000, do not abbreviate year to YY. Formatting characters are rarely stored; therefore, do not include hyphens or slashes.
Examples: MMDDYYYY; MMYYYY; YYYY
TIME
Domain: For AM/PM times: HHMMT - or -
Examples: HHMMT; HHMM
Default Value, Interpretation, Examples
A legal value from the domain (as described above)
Interpretation: For an instance of the attribute, if the user does not specify a value, then use this value.
Examples: 0; 1.00; FR
NONE or NULL
Interpretation: For an instance of the attribute, if the user does not specify a value, then leave it blank.
Examples: NONE; NULL
REQUIRED or NOT NULL
Interpretation: For an instance of the attribute, require the user to enter a legal value from the domain. (This is used when no value in the domain is common enough to be a default, but some value must be entered.)
Examples: REQUIRED; NOT NULL
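The three properties of an attribute (data type, domain, and default) can be sketched as a small record with a validation rule. This is an illustrative Python sketch, not a repository format: the `AttributeSpec` class, the `REQUIRED` sentinel, and the three sample attributes are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable

REQUIRED = object()  # sentinel for a REQUIRED / NOT NULL default


@dataclass
class AttributeSpec:
    name: str
    data_type: str                     # logical type, e.g. "NUMBER" or "VALUE SET"
    in_domain: Callable[[Any], bool]   # returns True when a value is legal
    default: Any = None                # None plays the role of NONE / NULL

    def resolve(self, value=None):
        """Return the value to record, applying the default rules."""
        if value is None:
            if self.default is REQUIRED:
                raise ValueError(f"{self.name}: a value is required")
            return self.default  # a legal default value, or None (leave blank)
        if not self.in_domain(value):
            raise ValueError(f"{self.name}: {value!r} is outside the domain")
        return value


# NUMBER with the integer range {10 - 99} and a legal default (hypothetical attribute).
quantity = AttributeSpec(
    "Quantity", "NUMBER",
    lambda v: isinstance(v, int) and 10 <= v <= 99, default=10)

# VALUE SET with a coding scheme; FR is the default.
class_standing = AttributeSpec(
    "Class Standing", "VALUE SET",
    lambda v: v in {"FR", "SO", "JR", "SR"}, default="FR")

# TEXT (30) where the user must always supply a value.
last_name = AttributeSpec(
    "Last Name", "TEXT",
    lambda v: isinstance(v, str) and len(v) <= 30, default=REQUIRED)

print(quantity.resolve())            # no value given, so the default is used
print(class_standing.resolve("JR"))  # a legal value from the domain
```

Keeping the domain as a predicate rather than a fixed list lets the same structure cover ranges, size limits, and value sets.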
Identification:
An entity typically has many instances, perhaps thousands or millions, and there exists a need to uniquely identify each instance based on the data value of one or more attributes.
Every entity must have an identifier or key.
• A key is an attribute, or a group of attributes, which assumes a unique value for each entity instance. It is sometimes called an identifier.
Sometimes more than one attribute is required to uniquely identify an instance of an entity.
• A group of attributes that uniquely identifies an instance of an entity is called a concatenated key. Synonyms include composite key and compound key.
Frequently, an entity may have more than one key. Each of these attributes is called a candidate key.
• A candidate key is a ‘candidate to become the primary identifier’ of instances of an entity. It is sometimes called a candidate identifier. (Note: A candidate key may be a single attribute or a concatenated key.)
• A primary key is that candidate key which will most commonly be used to uniquely identify a single entity instance.
• Any candidate key that is not selected to become the primary key is called an alternate key.
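The distinction between candidate, primary, and alternate keys can be illustrated with a uniqueness check over sample instances. This is a sketch under assumptions: the sample STUDENT rows and attribute names are made up, and uniqueness in a sample can only rule a candidate out. Whether a combination is truly unique remains a business rule to confirm with users.

```python
def is_key(instances, attributes):
    """True if the attribute combination is unique across all given instances."""
    seen = set()
    for row in instances:
        value = tuple(row[a] for a in attributes)
        if value in seen:
            return False
        seen.add(value)
    return True


# Hypothetical STUDENT instances (made-up sample data).
students = [
    {"student_number": 1001, "last_name": "Doe", "first_name": "Jane", "major": "CS"},
    {"student_number": 1002, "last_name": "Doe", "first_name": "John", "major": "CS"},
    {"student_number": 1003, "last_name": "Roe", "first_name": "Jane", "major": "MA"},
]

candidates = [
    ("student_number",),           # single-attribute key
    ("last_name", "first_name"),   # concatenated key
    ("major",),                    # repeats across instances, so not a key
]

candidate_keys = [c for c in candidates if is_key(students, c)]
primary_key = candidate_keys[0]       # the analyst's choice; here simply the first
alternate_keys = candidate_keys[1:]   # every other candidate key
print(primary_key, alternate_keys)
```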
Sometimes, it is also necessary to identify a subset of entity instances as opposed to a single instance.
• For example, we may require a simple way to identify all male students, and all female students.
• A subsetting criteria is an attribute (or concatenated attribute) whose finite values divide all entity instances into useful subsets. Some methods call this an inversion entry.
Figure: keys and subsetting criteria
STUDENT
Student Number (Primary Key 1)
Name (Alternate Key 1)
. Last Name
. First Name
. Middle Initial
Address
. Street Address
. City
. State or Province
. Country
. Postal Code
Phone Number
. Area Code
. Exchange Number
. Number Within Exchange
Date of Birth
Gender (Subsetting Criteria 1)
Race (Subsetting Criteria 2)
Major (Subsetting Criteria 3)
Grade Point Average
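A subsetting criteria behaves like a grouping attribute: each of its finite values selects one subset of instances. The sketch below assumes a handful of made-up STUDENT instances and a hypothetical helper named `subset_by`.

```python
from collections import defaultdict


def subset_by(instances, criterion):
    """Group entity instances (by primary key) on the value of one attribute."""
    subsets = defaultdict(list)
    for row in instances:
        subsets[row[criterion]].append(row["student_number"])
    return dict(subsets)


# Made-up sample instances, for illustration only.
students = [
    {"student_number": 1001, "gender": "F", "major": "CS"},
    {"student_number": 1002, "gender": "M", "major": "CS"},
    {"student_number": 1003, "gender": "F", "major": "MA"},
]

by_gender = subset_by(students, "gender")   # Subsetting Criteria 1
by_major = subset_by(students, "major")     # Subsetting Criteria 3
print(by_gender)
print(by_major)
```

Unlike a key, a subsetting criteria is expected to repeat across instances; that repetition is what makes the subsets useful.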
Relationships
Conceptually, entities and attributes do not exist in isolation. Entities interact with, and impact, one another via relationships to support the business mission.
• A relationship is a natural business association that exists between one or more entities. The relationship may represent an event that links the entities, or merely a logical affinity that exists between the entities.
A connecting line between two entities on an ERD represents a relationship.
A verb phrase describes the relationship.
• All relationships are implicitly bidirectional, meaning that they can be interpreted in both directions.
Figure: STUDENT is enrolled in CURRICULUM; CURRICULUM is being studied by STUDENT.
Cardinality:
Each relationship on an ERD also depicts the complexity or degree of each relationship and this is called cardinality.
• Cardinality defines the minimum and maximum number of occurrences of one entity for a single occurrence of the related entity. Because all relationships are bi-directional, cardinality must be defined in both directions for every relationship.
Figure 5.3
Cardinality Interpretation: Minimum Instances, Maximum Instances
Exactly one: minimum 1, maximum 1
Zero or one: minimum 0, maximum 1
One or more: minimum 1, maximum many (>1)
Zero, one, or more: minimum 0, maximum many (>1)
More than one: minimum >1, maximum >1
Graphic Notation: symbols shown in the original figure
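The rows of the cardinality table can be expressed as minimum and maximum bounds. The following Python sketch is one possible encoding, assuming `None` stands for "many" and that "more than one" means a minimum of 2; the `Cardinality` class and the cardinalities chosen for the STUDENT and CURRICULUM example are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    minimum: int
    maximum: Optional[int]   # None stands for "many"

    def allows(self, count: int) -> bool:
        """True if `count` related instances satisfy this cardinality."""
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum


EXACTLY_ONE = Cardinality(1, 1)
ZERO_OR_ONE = Cardinality(0, 1)
ONE_OR_MORE = Cardinality(1, None)
ZERO_ONE_OR_MORE = Cardinality(0, None)
MORE_THAN_ONE = Cardinality(2, None)

# Cardinality is stated in both directions of a relationship.
# The particular choices below are assumptions made for this example.
is_enrolled_in = {
    ("STUDENT", "CURRICULUM"): ONE_OR_MORE,        # each student: one or more curricula
    ("CURRICULUM", "STUDENT"): ZERO_ONE_OR_MORE,   # each curriculum: zero, one, or more students
}

print(is_enrolled_in[("STUDENT", "CURRICULUM")].allows(0))   # a student with no curriculum
```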
Degree:
The degree of a relationship is the number of entities that participate in the relationship.
• A binary relationship has a degree = 2, because two different entities participate in the relationship.
Relationships may also exist between different instances of the same entity.
• This is called a recursive relationship (sometimes called a unary relationship; degree = 1).
Figure: a recursive relationship
COURSE
Course ID (Primary Key)
. Subject Abbreviation
. Course Number
Course Title
Course Credit
Relationship from COURSE to COURSE: is a prerequisite for; has as a prerequisite
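A recursive relationship can be sketched as a set of pairs whose two ends both refer to the same entity. The example below is an assumption-laden illustration: the course identifiers and titles are made up, and the traversal assumes the prerequisite data contains no cycles.

```python
# COURSE instances keyed by the concatenated key
# (subject abbreviation, course number). Sample data is made up.
courses = {
    ("CS", 101): "Introduction to Programming",
    ("CS", 201): "Data Structures",
    ("CS", 301): "Database Design",
}

# Each pair reads "<first> is a prerequisite for <second>".
# Both ends refer to COURSE, so the degree of the relationship is 1.
is_prerequisite_for = [
    (("CS", 101), ("CS", 201)),
    (("CS", 201), ("CS", 301)),
]


def prerequisites_of(course_id):
    """Read the relationship in the other direction: 'has as a prerequisite'."""
    return [pre for pre, post in is_prerequisite_for if post == course_id]


def all_prerequisites(course_id):
    """Follow the recursive relationship transitively (assumes no cycles)."""
    result = []
    for pre in prerequisites_of(course_id):
        result.extend(all_prerequisites(pre))
        result.append(pre)
    return result


print(all_prerequisites(("CS", 301)))
```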
Relationships can also exist between more than two different entities.
• These are sometimes called N-ary relationships.
• A relationship existing among three entities is called a 3-ary or ternary relationship.
• An N-ary relationship may be associated with an associative entity.
– An associative entity is an entity that inherits its primary key from more than one other entity (parents). Each part of that concatenated key points to one and only one instance of each of the connecting entities.
Figure: an N-ary relationship with an associative entity
COURSE
Course ID (Primary Key)
. Subject Abbreviation
. Course Number
Course Title
Credit
INSTRUCTOR
Instructor ID Code (Primary Key)
Instructor Name
. Last Name
. First Name
. Middle Initial
ROOM
Classroom ID
. Building Abbreviation
. Room Number
Number of Seats
SCHEDULED CLASS
Scheduled Class ID (Primary Key)
. Course ID
. Instructor ID
. Room ID
Division Number
Days of Week
Start Time
End Time
Relationship labels on the connecting lines: meets as; is assigned to; is assigned to
Foreign Keys:
A relationship implies that instances of one entity are related to instances of another entity.
To be able to identify those instances for any given entity, the primary key of one entity must be migrated into the other entity as a foreign key.
• A foreign key is a primary key of one entity that is contributed to (duplicated in) another entity for the purpose of identifying instances of a relationship. A foreign key (always in a child entity) always matches the primary key (in a parent entity).
Figure: a foreign key
DEPARTMENT
Department Number (Primary Key)
Department Name
CURRICULUM
Program of Study Code (Primary Key)
Title of Program
Type of Degree Awarded (Subsetting Criteria 1)
Department Number (Foreign Key)
Relationship: DEPARTMENT offers CURRICULUM; CURRICULUM is offered by DEPARTMENT.
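Although a logical model is independent of technology, a small physical sketch can make the parent/child rule concrete. The example below uses Python's built-in `sqlite3` module as an assumed implementation choice; the table names, column names, and sample rows are illustrative and are not part of the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Parent entity: its primary key will be migrated into the child.
conn.execute("""
    CREATE TABLE department (
        department_number INTEGER PRIMARY KEY,
        department_name   TEXT NOT NULL
    )""")

# Child entity: department_number is the foreign key.
conn.execute("""
    CREATE TABLE curriculum (
        program_of_study_code TEXT PRIMARY KEY,
        title_of_program      TEXT NOT NULL,
        type_of_degree        TEXT,
        department_number     INTEGER NOT NULL
            REFERENCES department (department_number)
    )""")

# Made-up sample rows.
conn.execute("INSERT INTO department VALUES (10, 'Computer Technology')")
conn.execute("INSERT INTO curriculum VALUES ('CPT-BS', 'Computer Technology', 'BS', 10)")

# A child row whose foreign key matches no parent primary key is rejected.
try:
    conn.execute("INSERT INTO curriculum VALUES ('XX-BS', 'Orphan Program', 'BS', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Reading the relationship: which department offers which curriculum.
offered = conn.execute("""
    SELECT d.department_name, c.title_of_program
    FROM curriculum AS c JOIN department AS d USING (department_number)
""").fetchall()
print(orphan_rejected, offered)
```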
When you cannot differentiate between parent and child in a relationship, it is called a non-specific relationship.
• A non-specific relationship (or many-to-many relationship) is one in which many instances of one entity are associated with many instances of another entity. Such relationships are suitable only for preliminary data models, and should be resolved as quickly as possible.
• All non-specific relationships can be resolved into a pair of one-to-many relationships by inserting an associative entity between the two original entities.
FIGURE (a): a non-specific relationship
STUDENT
Student Number (Primary Key 1)
Name (Alternate Key 1)
. Last Name
. First Name
. Middle Initial
Address
. Street Address
. City
. State or Province
. Country
. Postal Code
Phone Number
. Area Code
. Exchange Number
. Number Within Exchange
Date of Birth
Gender (Subsetting Criteria 1)
Race (Subsetting Criteria 2)
Grade Point Average
CURRICULUM
Program of Study Code (Primary Key)
Title of Program
Type of Degree Awarded (Subsetting Criteria 1)
Relationship labels on the connecting line: applies to; is enrolled in
FIGURE (b): the relationship resolved with an associative entity
STUDENT and CURRICULUM have the same attributes as in FIGURE (a).
MAJOR
Major ID (Primary Key)
. Student Number (Foreign Key)
. Program of Study Code (Foreign Key)
Date Enrolled
Current Candidate for Degree?
Relationship labels on the connecting lines: has declared; is being studied by
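Resolving a non-specific relationship amounts to creating one associative-entity instance per related pair. The sketch below assumes plain Python dictionaries for instances; the student numbers, program codes, and helper function names are made up for illustration.

```python
# Non-specific (many-to-many) pairs between STUDENT and CURRICULUM.
enrollment_pairs = [
    (1001, "CS-BS"),
    (1001, "MA-BS"),   # one student, two curricula
    (1002, "CS-BS"),   # one curriculum, two students
]

# Resolved form: one MAJOR instance per pair. Each part of the concatenated
# key points to exactly one STUDENT and exactly one CURRICULUM.
majors = [
    {
        "major_id": (student_number, program_code),   # concatenated primary key
        "student_number": student_number,             # foreign key to STUDENT
        "program_of_study_code": program_code,        # foreign key to CURRICULUM
        "date_enrolled": None,                        # descriptive attribute, left blank here
    }
    for student_number, program_code in enrollment_pairs
]


def curricula_declared_by(student_number):
    """STUDENT has declared MAJOR (one-to-many)."""
    return [m["program_of_study_code"] for m in majors
            if m["student_number"] == student_number]


def students_studying(program_code):
    """CURRICULUM is being studied by MAJOR (one-to-many)."""
    return [m["student_number"] for m in majors
            if m["program_of_study_code"] == program_code]


print(curricula_declared_by(1001), students_studying("CS-BS"))
```

The associative entity also gives attributes such as Date Enrolled a home, since they describe the pairing, not the student or the curriculum alone.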
Generalization:
Generalization is an approach that seeks to discover and exploit the commonalities between entities.
• Generalization is a technique wherein the attributes that are common to several types of an entity are grouped into their own entity, called a supertype.
• An entity supertype is an entity whose instances store attributes that are common to one or more entity subtypes.
– The entity supertype will have one or more one-to-one relationships to entity subtypes. These relationships are sometimes called IS A relationships (or WAS A, or COULD BE A) because each instance of the supertype ‘is also an’ instance of one or more subtypes.
• An entity subtype is an entity whose instances inherit some common attributes from an entity supertype, and then add other attributes that are unique to an instance of the subtype.
An entity can be both a supertype and subtype.
Through inheritance, the concept of generalization in data models permits the reduction of the number of attributes through the careful sharing of common attributes.
• The subtypes not only inherit the attributes, but also the data types, domains, and defaults of those attributes.
• In addition to inheriting attributes, subtypes also inherit relationships to other entities.
Figure: a generalization hierarchy
PERSON
Personal ID Number (Primary Key)
Name
. Last Name
. First Name
. Middle Initial
Gender (Subsetting Criteria 1)
Race (Subsetting Criteria 2)
Marital Status (Subsetting Criteria 3)
STUDENT
Personal ID Number = Student Number (Primary Key)
all attributes from PERSON
EMPLOYEE
Personal ID Number = Social Security Number (Primary Key)
all attributes from PERSON plus
Pension Plan Code
Life Insurance Plan Code
Medical Insurance Plan Code
Vacation Days Accumulated
Sick Days Accumulated
PROSPECT
all attributes from PERSON and STUDENT plus
First Contact Date
Last Contact Date
Has Visited Campus?
ALUMNUS
all attributes from PERSON and STUDENT plus
Member of Alumni Association?
Job in Field of Study?
Last Known Salary
FORMER STUDENT
all attributes from PERSON and STUDENT plus
Reason for Withdrawal
Plans to Return?
CURRENT STUDENT
all attributes from PERSON and STUDENT plus
Number of Credits Earned
Grade Point Average
Encumbrance Status
Financial Aid Eligibility Status
Other entities shown: ADDRESS (can be contacted at); CONTRACT (is bound by); AWARDED DEGREE (has earned)
Subtype relationship labels: is a (four times); could be a (twice)
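Supertypes and subtypes map naturally onto class inheritance. The following Python sketch assumes dataclasses as a stand-in for entities and models only a few attributes from the figure; the field names, default values, and the `attribute_names` helper are illustrative assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:                      # supertype
    personal_id_number: str
    last_name: str
    first_name: str
    gender: str


@dataclass
class Student(Person):             # subtype of PERSON and supertype of CurrentStudent
    pass                           # all attributes from PERSON


@dataclass
class Employee(Person):            # subtype that adds its own attributes
    pension_plan_code: str = ""
    vacation_days_accumulated: int = 0


@dataclass
class CurrentStudent(Student):     # all attributes from PERSON and STUDENT plus its own
    number_of_credits_earned: int = 0
    grade_point_average: float = 0.0


def attribute_names(entity):
    """List the attributes of an entity, inherited ones first."""
    return [f.name for f in fields(entity)]


# Made-up sample instance.
jane = CurrentStudent("P-1001", "Doe", "Jane", "F", 45, 3.2)
print(attribute_names(CurrentStudent))
print(isinstance(jane, Person))   # each subtype instance 'is a' supertype instance
```

Because the common attributes are declared once on the supertype, each subtype lists only what is unique to it, which is the reduction in attributes that generalization aims for.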
The Process of Logical Data Modeling
Strategic Data Modeling
Many organizations select application development projects based on strategic information system plans.
Strategic planning is a separate project. This project produces an information systems strategy plan that defines an overall vision and architecture for information systems.
• Almost always, the architecture includes an enterprise data model.
An enterprise data model typically identifies only the most fundamental of entities.
The entities are typically defined (as in a dictionary) but they are not described in terms of keys or attributes.
The enterprise data model may or may not include relationships (depending on the planning methodology’s standards and the level of detail desired by executive management).
• If relationships are included, many of them will be non-specific.
The enterprise data model is usually stored in a corporate repository.
Data Modeling During Systems Analysis
The data model for a single system or application is usually called an application data model.
Logical data models have a DATA focus and a SYSTEM USER perspective.
Logical data models are typically constructed as deliverables of the study and definition phases of a project.
Although logical data models are not concerned with implementation details or technology, they may be constructed (through reverse engineering) from existing databases.
Data models are rarely constructed during the survey phase of systems analysis.
Figure: INFORMATION SYSTEMS FRAMEWORK (FAST Methodology)
Perspectives: SYSTEM OWNERS (scope); SYSTEM USERS (requirements); SYSTEM DESIGNERS (specification); SYSTEM BUILDERS (components); SYSTEM ANALYSTS
Focuses: SYSTEM DATA; SYSTEM PROCESSES; SYSTEM INTERFACES; SYSTEM GEOGRAPHY
Phases: Survey Phase (establish scope and project plan); Study Phase (establish system improvement objectives); Definition Phase (establish and prioritize business system requirements)
Data focus: Business Subjects (entities and definitions); Data Requirements (data models); Existing Databases and Technology, with optional Reverse Engineering
Example entities: CUSTOMER (customer-no, customer-name, customer-rating, balance-due); PRODUCT (product-no, product-name, unit-of-measure, unit-price, quantity-available); ORDER (order-no, order-date, products-ordered, quantities-ordered)
Example rules: Customers order zero, one, or more products. Products may be ordered by zero, one, or more customers.
Data modeling is rarely associated with the study phase of systems analysis. Most analysts prefer to draw process models to document the current system.
Many analysts report that data models are far superior for the following reasons:
• Data models help analysts to quickly identify business vocabulary more completely than process models.
• Data models are almost always built more quickly than process models.
• A complete data model can be fit on a single sheet of paper. Process models often require dozens of sheets of paper.
• Process modelers too easily get hung up on unnecessary detail.
• Data models for existing and proposed systems are far more similar than process models for existing and proposed systems. Consequently, there is less work to throw away as you move into later phases.
A study phase model should include only entities and relationships, but no attributes. Such a model is called a context data model.
• The intent is to refine the understanding of scope, not to get into details about the entities and business rules.
The definition phase data model will be constructed in at least two stages:
1. A key-based data model will be drawn.
• This model will eliminate non-specific relationships, add associative entities, and include primary, alternate, and foreign keys, plus precise cardinalities and any generalization hierarchies.
2. A fully attributed data model will be constructed.
• The fully attributed model includes all remaining descriptive attributes and subsetting criteria.
– Each attribute is defined in the repository with data types, domains, and defaults.
The completed data model represents all of the business requirements for a system’s database.
Looking Ahead to Systems Configuration and Design
The logical data model from systems analysis describes business data requirements, not technical solutions.
The purpose of the configuration phase is to determine the best way to implement those requirements with database technology.
During system design, the logical data model will be transformed into a physical data model (called a database schema) for the chosen database management system.
• This model will reflect the technical capabilities and limitations of that database technology, as well as the performance tuning requirements suggested by the database administrator.
The physical data model will also be analyzed for adaptability and flexibility through a process called normalization.
Fact-Finding and Information Gathering for Data Modeling
Data models cannot be constructed without appropriate facts and information as supplied by the user community.
• These facts can be collected by a number of techniques such as sampling of existing forms and files; research of similar systems; surveys of users and management; and interviews of users and management.
The fastest method of collecting facts and information, and simultaneously constructing and verifying the data models is Joint Application Development (JAD).
Purpose: Candidate Questions
Discover the system entities: What are the subjects of the business? In other words, what types of persons, organizations, organizational units, places, things, materials, or events are used in, or interact with this system, about which data must be captured or maintained? How many instances of each subject exist?
Discover the entity keys: What unique characteristic (or characteristics) distinguishes an instance of each subject from other instances of the same subject? Are there any plans to change this identification scheme in the future?
Discover entity subsetting criteria: Are there any characteristics of a subject that divide all instances of the subject into useful subsets? Are there any subsets of the above subjects for which you have no convenient way to group instances?
Discover attributes and domains: What characteristics describe each subject? For each of these characteristics: (1) what type of data is stored? (2) who is responsible for defining legitimate values for the data? (3) what are the legitimate values for the data? (4) is a value required? and (5) is there any default value that should be assigned if you don’t specify otherwise?
Discover security and control needs: Are there any restrictions on who can see or use the data? Who is allowed to create the data? Who is allowed to update the data? Who is allowed to delete the data?
Discover data timing needs: How often does the data change? Over what period of time is the data of value to the business? How long should we keep the data? Do you need historical data or trends? If a characteristic changes, must you know the former values?
Discover generalization hierarchies: Are all instances of each subject the same? That is, are there special types of each subject that are described or handled differently? Can any of the data be consolidated for sharing?
Discover relationships and degrees: What events occur that imply associations between subjects? What business activities or transactions involve handling or changing data about several different subjects of the same or a different type?
Discover cardinalities: Is each business activity or event handled the same way or are there special circumstances? Can an event occur with only some of the associated subjects, or must all the subjects be involved?
Computer-Aided Systems Engineering (CASE) for Data Modeling
Data models are stored in the repository.
• In a sense, the data model is metadata; that is, data about the business’ data.
Computer-aided systems engineering (CASE) technology provides the repository for storing the data model and its detailed descriptions.
Using a CASE product, you can easily create professional, readable data models without the use of paper, pencil, erasers, and templates.
The models can be easily modified to reflect corrections and changes suggested by end-users.
Most CASE products provide powerful analytical tools that can check your models for mechanical errors, completeness, and consistency.
Not all data model conventions are supported by all CASE products.
• Any given CASE product is likely to force the company to adapt its methodology’s data modeling symbols or approach so that it is workable within the limitations of the CASE tool.
How to Construct Data Models
1st Step - Entity Discovery
The first task in data modeling is to discover those fundamental entities in the system that are or might be described by data.
There are several techniques that may be used to identify entities.
• During interviews or JAD sessions with system owners and users, pay attention to key words in their discussion.
• During interviews or JAD sessions, specifically ask the system owners and users to identify things about which they would like to capture, store, and produce information.
• Study existing forms and files.
• Some CASE tools can reverse engineer existing files and databases into physical data models.
A true entity has multiple instances: dozens, hundreds, thousands, or more!
Entities should be named with nouns that describe the person, event, place, or tangible thing about which we want to store data.
• Try not to abbreviate or use acronyms.
• Names should be singular so as to distinguish the logical concept of the entity from the actual instances of the entity.
Define each entity in business terms.
• Don’t define the entity in technical terms, and don’t define it as ‘data about …’.
• Your entity names and definitions should establish an initial glossary of business terminology that will serve both you and future analysts and users for years to come.
Entity Name: Business Definition
AGREEMENT: A contract whereby a member agrees to purchase a certain number of products within a certain time. After fulfilling that agreement, the member becomes eligible for bonus credits that are redeemable for free or discounted products.
Note: A major system improvement objective is to make agreements more flexible with respect to other clubs. Currently, only purchases within the club that issued an agreement count toward credits. Another system improvement objective would award bonus credits for each purchase leading up to fulfillment of the agreement, with accelerated bonuses after fulfillment of the agreement.
CLUB: A SoundStage membership group to which members can belong. Clubs tend to be organized according to product interests such as music versus movies versus games; or specialized media interests such as Digital Video Disks (DVD) or Nintendo.
Note: Cross-club interaction is a desired objective for the new system.
MEMBER: An active member of one or more clubs.
Note: A target system objective is to re-enroll inactive members as opposed to deleting them.
MEMBER ORDER: An order generated for a member as part of a monthly promotion, or an order initiated by a member.
Note: The current system only supports orders generated from promotions; however, customer-initiated orders have been given a high priority as an added option in the proposed system.
PRODUCT: An inventoried product available for promotion and sale to members.
Note: System improvement objectives include (1) compatibility with the new bar code system being developed for the warehouse, and (2) adaptability to a rapidly changing mix of products.
PROMOTION: A monthly or quarterly event whereby dated orders are generated for all members in a club. Members then have some period of time to cancel or accelerate fulfillment of that order, after which the order is automatically filled.
2nd Step - The Context Data Model
The second task in data modeling is to construct the context data model.
The context data model includes the fundamental or independent entities that were previously discovered.
• An independent entity is one which exists regardless of the existence of any other entity. Its primary key contains no attributes that would make it dependent on the existence of another entity.
• Independent entities are almost always the first entities discovered in your conversations with the users.
Relationships should be named with verb phrases that, when combined with the entity names, form simple business sentences or assertions.
• Always name the relationship from parent to child.
[Figure: context data model]
Entities: MEMBER ORDER, PRODUCT, MEMBER, PROMOTION, AGREEMENT, CLUB
Relationship labels: responds to, is featured in, places, establishes, sponsors, belongs to, sells, binds, generates
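The naming rule for relationships can be illustrated with a minimal Python sketch. The entity and relationship names come from the context data model, but the pairings below are assumptions made for illustration, since the transcript lists the relationship labels without showing which entities each one connects.

```python
from typing import List, Tuple

# Each tuple is (parent entity, verb phrase, child entity).
# The pairings are hypothetical; only the names come from the slides.
RELATIONSHIPS: List[Tuple[str, str, str]] = [
    ("MEMBER", "places", "MEMBER ORDER"),
    ("CLUB", "sponsors", "PROMOTION"),
    ("PROMOTION", "generates", "MEMBER ORDER"),
]


def business_sentence(parent: str, verb_phrase: str, child: str) -> str:
    """Read a relationship from parent to child as a simple assertion."""
    return f"{parent} {verb_phrase} {child}"


for parent, verb_phrase, child in RELATIONSHIPS:
    print(business_sentence(parent, verb_phrase, child))
```

If a sentence produced this way does not read as a plausible business statement, the relationship name or its direction probably needs another look.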
3rd Step - The Key-Based Data Model
The third task is to identify the keys of each entity.
The following guidelines are suggested for keys:
• The value of a key should not change over the lifetime of each entity instance.
• The value of a key cannot be null.
• Controls must be installed to ensure that the value of a key is valid.
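The kind of control mentioned above might look like the following Python sketch. It checks only two things: that no part of a key is null, and that no key value repeats. The uniqueness check is an assumption drawn from the general definition of a key rather than from this slide, and the function name and sample values are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# A key is modeled as a tuple so that concatenated keys are also covered.
Key = Tuple[Optional[str], ...]


def key_violations(keys: Iterable[Key]) -> List[str]:
    """Return a description of each problem found in a set of key values."""
    problems: List[str] = []
    seen = set()
    for key in keys:
        # The value of a key cannot be null, in whole or in part.
        if any(part is None or part == "" for part in key):
            problems.append(f"null key part in {key}")
            continue
        # Assumed rule: a key value identifies exactly one entity instance.
        if key in seen:
            problems.append(f"duplicate key {key}")
        seen.add(key)
    return problems


print(key_violations([("M001",), (None,), ("M001",)]))
```

The guideline that a key value should not change over time is not something a one-time check can enforce; it would need to be handled when updates are made.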
3rd Step - The Key-Based Data Model
The following guidelines are suggested for keys: (continued)
Some experts suggest that you avoid intelligent keys because the key may change over the lifetime of the entity instance.
• An intelligent key is a business code whose structure communicates data about an entity instance (such as its classification, size, or other properties).
• A code is a group of characters and/or digits that identifies and describes something in the business system.
Other experts disagree, arguing that business codes can return value to the organization because they can be quickly processed by humans without the assistance of a computer.
3rd Step - The Key-Based Data Model
The following guidelines are suggested for keys: (continued)
Consider inventing a surrogate key to substitute for large concatenated keys of independent entities.
• This suggestion is not practical for associative entities because each part of the concatenated key is a foreign key that must precisely match its parent entity’s primary key.
If you cannot define keys for an entity, it may be that the entity doesn’t really exist; that is, multiple occurrences of the so-called entity do not exist.
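As a rough illustration of the surrogate key idea, the Python sketch below hands out a short, system-assigned number in place of a large concatenated key. The class name and the warehouse-style sample key are hypothetical and do not come from the SoundStage model.

```python
from itertools import count
from typing import Dict, Tuple


class SurrogateKeyRegistry:
    """Map a large concatenated key to a short, system-assigned number."""

    def __init__(self, start: int = 1) -> None:
        self._next = count(start)
        self._by_natural_key: Dict[Tuple[str, ...], int] = {}

    def key_for(self, natural_key: Tuple[str, ...]) -> int:
        # Reuse the existing surrogate so that the value never changes
        # over the lifetime of the entity instance.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next)
        return self._by_natural_key[natural_key]


registry = SurrogateKeyRegistry()
first = registry.key_for(("WAREHOUSE-7", "AISLE-12", "BIN-0345"))
second = registry.key_for(("WAREHOUSE-7", "AISLE-12", "BIN-0346"))
```

The surrogate carries no business meaning, which is why it can stay stable even if the parts of the original concatenated key are later recoded.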
3rd Step - The Key-Based Data Model
Business Codes
There are several types of codes and they can be combined to form effective means for entity instance identification.
• Serial codes assign sequentially generated numbers to entity instances.
– Many database management systems can generate and constrain serial codes to a business’ requirements.
• Block codes are similar to serial codes except that serial numbers are divided into groups that have some business meaning.
• Alphabetic codes use finite combinations of letters (and possibly numbers) to describe entity instances.
– Alphabetic codes must usually be combined with serial or block codes in order to uniquely identify instances of most entities.
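The first three code types can be sketched in Python as follows. The block boundaries, the block names, and the `IN` prefix are hypothetical values chosen for illustration, not part of any coding scheme described in the presentation.

```python
from itertools import count
from typing import Dict, Iterator


def serial_codes(start: int = 1) -> Iterator[int]:
    """Serial code: sequentially generated numbers."""
    return count(start)


# Block code: serial numbers divided into groups with business meaning.
# These ranges are assumed for the example.
BLOCKS: Dict[str, range] = {
    "AUDIO": range(1000, 2000),
    "VIDEO": range(2000, 3000),
    "GAME": range(3000, 4000),
}


def block_of(code: int) -> str:
    """Return the name of the block that a serial number falls into."""
    for name, block in BLOCKS.items():
        if code in block:
            return name
    raise ValueError(f"{code} is outside every defined block")


def alphabetic_serial(prefix: str, serial: int, width: int = 4) -> str:
    """Combine an alphabetic code with a serial number for uniqueness."""
    return f"{prefix}{serial:0{width}d}"
```

For example, `block_of(2500)` falls in the assumed VIDEO block, and `alphabetic_serial("IN", 42)` pads the serial part to four digits.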
3rd Step - The Key-Based Data Model
Business Codes
There are several types of codes and they can be combined to form effective means for entity instance identification. (continued)
• In significant position codes, each digit or group of digits describes a measurable or identifiable characteristic of the entity instance.
– Significant position codes are frequently used to code inventory items.
• Hierarchical codes provide a top-down interpretation for an entity instance.
– Every item coded is factored into groups, subgroups, and so forth.
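A small Python sketch may help show how these two code types are read. The eight-character layout and the dot-separated hierarchy format are assumptions invented for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout of a significant position code:
# characters 0-1 give the product type, 2-3 the media, 4-7 a serial part.
LAYOUT: Dict[str, Tuple[int, int]] = {
    "type": (0, 2),
    "media": (2, 4),
    "serial": (4, 8),
}


def parse_significant_position_code(code: str) -> Dict[str, str]:
    """Split a code into the characteristic described by each position."""
    if len(code) != 8:
        raise ValueError("expected an 8-character code")
    return {name: code[start:end] for name, (start, end) in LAYOUT.items()}


def hierarchy_levels(code: str, separator: str = ".") -> List[str]:
    """Expand a hierarchical code into its group, subgroup, and so forth."""
    parts = code.split(separator)
    return [separator.join(parts[: i + 1]) for i in range(len(parts))]
```

Under these assumptions, `parse_significant_position_code("AUCD0042")` separates the type, media, and serial parts, and `hierarchy_levels("2.4.1")` lists each level from the top group down to the item.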
3rd Step - The Key-Based Data Model
Business Codes
The following guidelines are suggested when creating a business coding scheme:
• Codes should be expandable to accommodate growth.
• The full code must result in a unique value for each entity instance.
• Codes should be large enough to describe the distinguishing characteristics, but small enough to be interpreted by people without a computer.
• Codes should be convenient. A new instance should be easy to create.
[Figure: key-based data model]
Entity: Key Data
PRODUCT ON ORDER: Order-Number [PK1] [FK], Member-Number [PK2] [FK], Product-Number [PK3] [FK], Universal-Product-Code [PK4] [FK]
PROMOTION: Product-Number [PK2] [FK], Club-Name [PK1] [FK], Universal-Product-Code [PK3] [FK]
PRODUCT: Product-Number [PK1], Universal-Product-Code [PK2]
MEMBER ORDER: Order-Number [PK1], Member-Number [PK2] [FK]
AGREEMENT: Club-Name [PK2] [FK], Agreement-Number [PK1]
CLUB MEMBERSHIP: Member-Number [PK2] [FK], Club-Name [PK3] [FK]
MEMBER: Member-Number [PK1]
CLUB: Club-Name [PK1]
Relationship labels: sold as, sells, enrolls in, sponsors, binds, responds to, is featured in, places, establishes, sponsors, generates
4th Step - Generalized Hierarchies
At this time, it would be useful to identify any generalization hierarchies in a business problem.
[Figure: key-based data model with a generalization hierarchy]
Entity: Key Data
PRODUCT ON AN ORDER: Order-Number [PK1] [FK], Product-Number [PK2] [FK], Universal-Product-Code [PK3] [FK]
CLUB MEMBERSHIP: Club-Name [PK1] [FK], Member-Number [PK2] [FK], Agreement-Number [PK3] [FK]
TITLE: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
MERCHANDISE: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
VIDEO TITLE: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
GAME TITLE: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
AUDIO TITLE: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
PRODUCT: Product-Number [PK1], Universal-Product-Code [PK2]
AGREEMENT: Club-Name [PK2] [FK], Agreement-Number [PK1]
MEMBER: Member-Number [PK1]
MEMBER ORDER: Order-Number [PK1]
PROMOTION: Club-Name [PK1] [FK]
CLUB: Club-Name [PK1]
Relationship labels: generates, sold as, sells, responds to, placed, sponsors, sponsors, binds, establishes, enrolls in, generates, is a, is a
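One way to picture the "is a" relationships is as a class hierarchy, sketched here in Python. The arrangement below (TITLE and MERCHANDISE as subtypes of PRODUCT, with the audio, video, and game titles as subtypes of TITLE) is an assumption inferred from the shared keys and the two "is a" labels; the transcript does not show the connections explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Product:
    """Supertype: every subtype inherits the key of PRODUCT."""
    product_number: str
    universal_product_code: str


@dataclass(frozen=True)
class Merchandise(Product):
    pass


@dataclass(frozen=True)
class Title(Product):
    pass


@dataclass(frozen=True)
class AudioTitle(Title):
    pass


@dataclass(frozen=True)
class VideoTitle(Title):
    pass


@dataclass(frozen=True)
class GameTitle(Title):
    pass


def supertypes(entity: type) -> List[str]:
    """List the supertypes of an entity, nearest first."""
    return [cls.__name__ for cls in entity.__mro__[1:] if cls is not object]
```

Here `supertypes(VideoTitle)` lists `Title` and then `Product`, mirroring how a subtype in the diagram carries the supertype's key as both its primary key and a foreign key.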
5th Step - The Fully Attributed Data Model
The fifth task is to identify the remaining data attributes.
The following guidelines are offered for attribution.
• Many organizations have naming standards and approved abbreviations.
– The data or repository administrator usually maintains such standards.
• Many attributes share common base names such as NAME, ADDRESS, and DATE.
– Unless the attributes can be generalized into a supertype, it is best to give each variation a unique name such as:
CUSTOMER NAME vs SUPPLIER NAME
– Names must be distinguishable across projects.
• Logical attribute names should not be abbreviated.
5th Step - The Fully Attributed Data Model
The following guidelines are offered for attribution. (continued)
• For attributes that have only YES or NO values, name them as questions.
– For example, CANDIDATE FOR A DEGREE?
• Each attribute should be mapped to only one entity.
– Foreign keys are the exception: they identify associated instances of related entities.
• An attribute’s domain should not be based on logic.
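The guideline that each attribute maps to only one entity, with foreign keys as the exception, can be checked mechanically. The Python sketch below is one possible approach; the function name and the small sample model (including the deliberately misplaced Order-Status on CLUB) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_attributes(
    model: Dict[str, List[str]],
    foreign_keys: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Find attributes mapped to more than one entity, ignoring foreign keys."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for entity, attributes in model.items():
        for attribute in attributes:
            # Foreign keys are the exception: they legitimately repeat.
            if attribute in foreign_keys.get(entity, set()):
                continue
            owners[attribute].append(entity)
    return {a: sorted(e) for a, e in owners.items() if len(e) > 1}


# Hypothetical sample; Order-Status on CLUB is a deliberate mistake.
sample_model = {
    "MEMBER": ["Member-Number", "Member-Status"],
    "MEMBER ORDER": ["Order-Number", "Member-Number", "Order-Status"],
    "CLUB": ["Club-Name", "Order-Status"],
}
sample_foreign_keys = {"MEMBER ORDER": {"Member-Number"}}
print(shared_attributes(sample_model, sample_foreign_keys))
```

Member-Number is not reported because its second appearance is a foreign key, while Order-Status is flagged because two entities both claim it as their own attribute.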
[Figure: fully attributed data model]
MEMBER
Key Data: Member-Number [PK1]
Non-Key Data: Member-Name (composed of Last-Name, First-Name, Middle-Initial), Member-Status, Member-Street-Address, Member-Post-Office-Box, Member-City, Member-State, Member-Zip-Code, Member-Daytime-Phone-Number (composed of Area-Code, Phone-Number, Extension ()), Member-Date-of-Last-Order, Member-Balance, Member-Credit-Card-Type, Member-Credit-Card-Number, Member-Credit-Card-Expire-Date, Member-Bonus-Balance
MEMBER ORDER
Key Data: Order-Number [PK1]
Non-Key Data: Order-Creation-Date, Order-Fill-Date, Shipping-Address-Name, Shipping-Street-Address, Shipping-City, Shipping-State, Shipping-Zip, Shipping-Instructions, Order-Sub-Total, Order-Sales-Tax, Order-Shipping-Method, Order-Shipping-&-Handling-Cost, Order-Status, Order-Prepaid-Amount, Order-Prepayment-Method, Member-Number [FK], Club-Name [FK], Promotion-Number
VIDEO TITLE
Key Data: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
Non-Key Data: Producer, Director, Video-Category, Video-Sub-Category, Closed-Captioned, Language, Running-Time, Video-Media-Type, Video-Encoding, Screen-Aspect, MPA-Rating-Code
GAME TITLE
Key Data: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
Non-Key Data: Manufacturer, Game-Category, Game-Sub-Category, Game-Platform, Game-Media-Type, Number-of-Players, Parent-Advisory-Code
PRODUCT
Key Data: Product-Number [PK1], Universal-Product-Code [PK2]
Non-Key Data: Product-Quantity-in-Stock, Product-Type, Manf-Suggested-Price, Club-Default-Price, Special-Price, Units-Sold-Month-to-Date, Units-Sold-Year-to-Date, Units-Sold-Lifetime
PRODUCT ON AN ORDER
Key Data: Order-Number [PK1] [FK], Product-Number [PK2] [FK], Universal-Product-Code [PK3] [FK]
Non-Key Data: Quantity-Ordered, Quantity-Shipped, Quantity-Backordered, Purchase-Unit-Price, Credits-Earned
AUDIO TITLE
Key Data: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
Non-Key Data: Artist, Audio-Category, Audio-Sub-Category, Number-of-Units-in-Package, Audio-Media-Code, Content-Advisory-Code
TITLE
Key Data: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
Non-Key Data: Title-of-Work, Title-Cover, Catalog-Description, Copyright-Date, Entertainment-Category, Credit-Value
PROMOTION
Key Data: Club-Name [PK1] [FK]
Non-Key Data: Promotion-Number, Promotion-Release-Date, Promotion-Status, Promotion-Type, Automatic-Fill-Delay, Product-Number [FK], Universal-Product-Code [FK]
MERCHANDISE
Key Data: Product-Number [PK1] [FK], Universal-Product-Code [PK2] [FK]
Non-Key Data: Merchandise-Name, Merchandise-Description, Merchandise-Type, Unit-of-Measure
CLUB MEMBERSHIP
Key Data: Club-Name [PK1] [FK], Member-Number [PK2] [FK], Agreement-Number [PK3] [FK]
Non-Key Data: Date-Enrolled, Expiration-Date, Number-of-Credits-Required, Number-of-Credits-Earned
AGREEMENT
Key Data: Club-Name [PK2] [FK], Agreement-Number [PK1]
Non-Key Data: Agreement-Active-Date, Agreement-Expire-Date, Fulfillment-Period, Required-Number-of-Credits
CLUB
Key Data: Club-Name [PK1]
Non-Key Data: Club-Description, Club-Charter-Date
Relationship labels: sold as, sells, responds to, placed, sponsors, sponsors, binds, establishes, enrolls in, generates, generates, is a, is a
6th Step - The Fully Described Model
The last task is to fully describe the data model.
This task is the most time-consuming.
This task can be started in parallel with the key-based model or fully attributed model, but it is usually the last data modeling task completed.
At this time the descriptions for the attributes are still incomplete; they require domains.
• Most CASE tools provide extensive facilities for describing the data types, domains, and defaults for all attributes to the repository.
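To make the idea of a data type, domain, and default concrete, here is a minimal Python sketch of what one attribute description might hold. It is not the format of any particular CASE tool, and the allowed values and default shown for Member-Status are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class AttributeDescription:
    """A simplified stand-in for a repository entry describing an attribute."""
    name: str
    data_type: type
    allowed_values: Optional[Tuple[Any, ...]] = None  # domain as a value list
    default: Any = None

    def accepts(self, value: Any) -> bool:
        """Check a value against the data type and, if given, the domain."""
        if not isinstance(value, self.data_type):
            return False
        return self.allowed_values is None or value in self.allowed_values


# Hypothetical domain and default for an attribute from the model.
member_status = AttributeDescription(
    name="Member-Status",
    data_type=str,
    allowed_values=("ACTIVE", "INACTIVE"),
    default="ACTIVE",
)
```

With this description, `member_status.accepts("ACTIVE")` passes, while a value outside the listed domain or of the wrong data type is rejected.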
6th Step - The Fully Described Model
Additional descriptive properties may be recorded for attributes, such as:
• Who should be able to create, delete, update, and access each attribute?
• How long should each attribute (or entity) be kept before the data is deleted or archived?
The Next Generation
Data modeling should remain a value-added skill for many years.
The demand for data modeling as a skill is dependent on two factors: (1) the need for databases, and (2) the use of relational database management system technology to implement those databases.
• There is some belief that relational database technology will eventually be replaced by object technology.
• If that were to happen, data modeling would be replaced by object modeling techniques.
• Even as object database technology becomes available, we expect the relational database industry to add object features and technologies to their product lines.
CASE technology will continue to improve.
Today’s better CASE tools provide a two-way synchronization between the logical data models and their database designs.
This synchronization will likely extend as CASE vendors enable their tools to directly communicate and interoperate with database management systems and working databases.
Summary
Introduction
An Introduction to Systems Modeling
System Concepts for Data Modeling
The Process of Logical Data Modeling
How to Construct Data Models
The Next Generation