review of the entity-relationship model slides courtesy of amol deshpande material from ch. 2 of...
Post on 21-Dec-2015
217 views
TRANSCRIPT
Review of the Entity-Relationship Model
Slides courtesy of Amol Deshpandematerial from ch. 2 of
Korth & Silberschatz Database System Concepts,
Data Modeling
Goals:Conceptual representation of the data“Reality” meets “bits and bytes”Must make sense, and be usable by other people
Review:Entity-relationship ModelRelational Model
Motivation
You’ve just been hired by Bank of America as their DBA for their online banking web site.
You are asked to create a database that monitors:customersaccountsloansbranchestransactions, …
Now what??!!!
Database Design Steps
Three Levels of Modeling
info
Conceptual Data Model
Logical Data Model
Physical Data Model
Conceptual DB design
Logical DB design
Physical DB design
Entity-relationship Model Typically used for conceptual database design
Relational Model Typically used for logical database design
Entity-Relationship Model
Two key conceptsEntities:
• An object that exists and is distinguishable from other objects– Examples: Bob Smith, BofA, CMSC424
• Have attributes (people have names and addresses)• Form entity sets with other entities of the same type that share the
same properties– Set of all people, set of all classes
• Entity sets may overlap– Customers and Employees
Entity-Relationship Model
Two key conceptsRelationships:
• Relate 2 or more entities – E.g. Bob Smith has account at College Park Branch
• Form relationship sets with other relationships of the same type that share the same properties
– Customers have accounts at Branches
• Can have attributes:– has account at may have an attribute start-date
• Can involve more than 2 entities– Employee works at Branch at Job
ER Diagram: Starting Example
Rectangles: entity setsDiamonds: relationship setsEllipses: attributes
customer has
cust-street
cust-id
cust-name
cust-city
account
balance
numberaccess-date
Review Roadmap
Details of the ER ModelHow to represent various types of constraints/semantic
information etc.
Design issues
A detailed example
Relationship CardinalitiesWe may know:
One customer can only open one accountOROne customer can open multiple accounts
Representing this is importantWhy ?
Better manipulation of dataCan enforce such a constraintRemember: If not represented in conceptual model, the domain knowledge
may be lost
Mapping Cardinalities
Express the number of entities to which another entity can be associated via a relationship set
Most useful in describing binary relationship sets
Mapping Cardinalities
One-to-One
One-to-Many
Many-to-One
Many-to-Many
customer has account
customer has account
customer has account
customer has account
Types of Attributes
Simple vs CompositeSingle value per attribute ?
Single-valued vs Multi-valuedE.g. Phone numbers are multi-valued
DerivedIf date-of-birth is present, age can be derived
Can help in avoiding redundancy, enforcing constraints etc…
Types of Attributes
customer has
cust-street
cust-id
cust-name
cust-city
account
balance
numberaccess-date
Types of Attributes
customer
cust-street
cust-id
cust-name
cust-city
has account
balance
numberaccess-date
phone no.
date-of-birth
age
multi-valued (double ellipse)
derived (dashed ellipse)
Types of Attributes
customer
cust-street
cust-id
cust-name
cust-city
has account
balance
numberaccess-date
phone no.
date-of-birth
age
month day year
Composite Attribute
Next: Keys
Key = set of attributes identifying individual entities or relationships
customer
cust-street
cust-id
cust-name
cust-city phone no.
age
date-of-birthPossible Keys:
{cust-id}
{cust-name, cust-city, cust-street}
{cust-id, age}
cust-name ?? Probably not.
Domain knowledge dependent !!
Entity Keys
Entity KeysSuperkey
any attribute set that can distinguish entities
Candidate keya minimal superkey
• Can’t remove any attribute and preserve key-ness– {cust-id, age} not a superkey – {cust-name, cust-city, cust-street} is
» assuming cust-name is not unique
Primary keyCandidate key chosen as the key by DBAUnderlined in the ER Diagram
Entity Keys
{cust-id} is a natural primary key
Typically, SSN forms a good primary key
Try to use a candidate key that rarely changes
e.g. something involving address not a great idea
customer
cust-street
cust-id
cust-name
cust-city phone no.
age
date-of-birth
Relationship Set Keys
What attributes are needed to represent a relationship completely and uniquely ?Union of primary keys of the entities involved, and relationship attributes
{cust-id, access-date, account number} describes a relationship completely
customer has
cust-id
account
numberaccess-date
Relationship Set Keys
Is {cust-id, access-date, account number} a candidate key ?No. Attribute access-date can be removed from this set without losing key-ness
customer has
cust-id
account
numberaccess-date
Relationship Set Keys
Is {cust-id, account-number} a candidate key ?Depends
customer has
cust-id
account
numberaccess-date
Relationship Set Keys
Is {cust-id, account-number} a candidate key ?Depends
customer has
cust-id
account
numberaccess-date
If one-to-one relationship, either {cust-id} or {account-number} sufficientSince a given customer can only have one account, she can only participate in one
relationship
Ditto account
Relationship Set Keys
Is {cust-id, account-number} a candidate key ?Depends
customer has
cust-id
account
numberaccess-date
If one-to-many relationship (as shown), {account-number} is a candidate keyA given customer can have many accounts, but at most one account holder per account
allowed
Relationship Set Keys
General rule for binary relationshipsone-to-one: primary key of either entity setone-to-many: primary key of the entity set on the many sidemany-to-many: union of primary keys of the associate entity
sets
n-ary relationshipsMore complicated rules
Data Constraints
Representing semantic data constraintsWe already saw constraints on relationship cardinalities
Participation Constraint
Given an entity set E, and a relationship R it participates in:If every entity in E participates in at least one relationship in R,
it is total participation
partial otherwise
Participation Constraint
customer has
cust-street
cust-id
cust-name
cust-city
account
balance
numberaccess-date
Total participation
Cardinality Constraints
customer has
cust-id
account
numberaccess-date
0..* 1..1
How many relationships can an entity participate in ?
Minimum - 0Maximum – no limit
Minimum - 1 Maximum - 1
Recursive Relationships
Sometimes a relationship associates an entity set to itself
Recursive Relationships
Must be declared with roles
employee works-for
emp-street
emp-id
emp-name
emp-city
manager
worker
Weak Entity Sets
An entity set without enough attributes to have a primary key
E.g. Transaction EntityAttributes:
• transaction-number, transaction-date, transaction-amount, transaction-type
• transaction-number: may not be unique across accounts
Weak Entity Sets
A weak entity set must be associated with an identifying or owner entity set
Account is the owner entity set for Transaction
Weak Entity Sets
account
balance
number
Transactionhas
trans-type
trans-number
trans-date
trans-amt
Still need to be able to distinguish between different weak entities associated with the same strong entity
Weak Entity Sets
account
balance
number
Transactionhas
trans-type
trans-number
trans-date
trans-amt
Discriminator: A set of attributes that can be used for that
Weak Entity Sets
Primary key:Primary key of the associated strong entity +
discriminator attribute set
For Transaction:
• {account-number, transaction-number}
Specialization
Consider entity person:Attributes: name, street, city
Further classification:customer
• Additional attributes: customer-id, credit-ratingemployee
• Additional attributes: employee-id, salary
Note similarities to object-oriented programming
Specialization: Example
Aggregation
No relationships between relationshipsE.g.: Associate account officers with has account relationship set
customer has account
account officer
employee
?
Aggregation
Associate an account officer with each account ? What if different customers for the same account can have different account
officers ?
customer has account
account officer
employee
?
Aggregation
Solution: Aggregation
customer has account
account officer
employee
More…
Read Chapter 2 for:Specialization/Aggregation details
• Different types of specialization’s etc
Generalization: opposite of specialization
Lower- and higher-level entities
Attribute inheritance
…
An Example: Employees can have multiple phones
E/R Data ModelDesign Issue #1: Entity Sets vs. Attributes
Employee Employee PhoneUsesvs(a) (b)
To resolve, determine how phones are used1. Can many employees share a phone?
(If yes, then (b))
2. Can employees have multiple phones?
(if yes, then (b), or (a) with multivalued attributes)
3. Else
(a), perhaps with composite attributes
phone_no phone_loc no loc
Employee
no loc
phone
An Example: How to model bank loans
E/R Data ModelDesign Issue #2: Entity Sets vs. Relationship Sets
vs
(a)
To resolve, determine how loans are issued1. Can there be more than one customer per loan?
• If yes, then (a). Otherwise, loan info must be replicated for each customer (wasteful, potential update anomalies)
2. Is loan a noun or a verb?• Both, but more of a noun to a bank. (hence (a) probably more appropriate)
Customer
(b)
Loans BranchCustomer LoanBorrows
lno amt ssn name ssn name
lno amt
bname bcity
An Example: Works_At
E/R Data ModelDesign Issue #3: N-ary vs Binary Relationship Sets
Employee Works_at Branch
Dept
Employee WAE Branch
Dept
WABinary:
Ternary:
WAB
WAD
vs(Joe, Moody, Acct) Works_At
(Joe, w3) WAE
(Moody, w3) WAB
(Acct, w3) WAD
Choose n-arywhen possible!(Avoids redundancy,update anomalies)
Example Design
We will model a university databaseMain entities:
• Professor
• Projects
• Departments
• Graduate students
• etc…
professorarea
name
SSN
rank
projectstart
sponsor
proj-number
budget
deptoffice
name
dept-no
homepage
gradage
name
SSN
degree
professorarea
name
SSN
rank
projectstart
sponsor
proj-number
budget
deptoffice
name
dept-no
homepage
gradage
name
SSN
degree
professorarea
name
SSN
rank
projectstart
sponsor
proj-number
budget
deptoffice
name
dept-no
homepage
gradage
name
SSN
degree
PI
Co-PI
RA
Major
ChairSupervises
Mentorad
vise
e
advi
sor
Appt
Time (%)
professorarea
name
SSN
rank
projectstart
sponsor
proj-number
budget
deptoffice
name
dept-no
homepage
gradage
name
SSN
degree
PI
Co-PI
RA
Major
Chair ApptSupervises
Mentorad
vise
e
advi
sor
Time (%)
professorarea
name
SSN
rank
projectstart
sponsor
proj-number
budget
deptoffice
name
dept-no
homepage
gradage
name
SSN
degree
PI
Co-PI
RA
Major
Chair ApptSupervises
Mentorad
vise
e
advi
sor
Time (%)
And so on…
Summary
Entity-relationship ModelIntuitive diagram-based representation of domain knowledge, data
properties etc…Two key concepts:
• Entities• Relationships
Additional Details:• Relationship cardinalities• Keys• Participation Constraints• …
Database Design Steps
Three Levels of Modeling
info
Conceptual Data Model
Logical Data Model
Physical Data Model
Conceptual DB design
Logical DB design
Physical DB design
Entity-relationship Model Typically used for conceptual database design
Relational Model Typically used for logical database design
Review: Entity-Relationship Model
Basics
E1 Entity set
R Relationship set
Attribute (primary key if underlined)
E1
……
…
R E2
a1 an
c1 ck
b1 bm
a
Thoughts…
Nothing about actual data
How is it stored ?
No talk about the query languages
How do we access the data ?
Semantic vs Syntactic Data ModelsRemember: E/R Model is used for conceptual modeling
Many conceptual models have the same properties
They are much more about representing the knowledge than about database storage/querying
Thoughts…
Basic design principles
FaithfulMust make sense
Satisfies the application requirements
Models the requisite domain knowledgeIf not modeled, lost afterwards
Avoid redundancyPotential for inconsistencies
Go for simplicity
Typically an iterative process that goes back and forth
Relational Data Model
• Before = “Network Data Model” (Cobol as DDL, DML)• Very contentious: Database Wars (Charlie Bachman vs. Mike Stonebraker)
Introduced by Ted Codd (late 60’s – early 70’s)
1. Separation of logical, physical data models (data independence)2. Declarative query languages3. Formal semantics4. Query optimization (key to commercial success)
Relational data model contributes:
Key Abstraction: Relation
bname acct_no balance
Downtown
Brighton
Brighton
A-101
A-201
A-217
500
900
500
Account =
Terms:
• Tables (aka: Relations)
Why called Relations?
Why Called Relations?
Given sets: R = {1, 2, 3}, S = {3, 4}
• R S = { (1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4) }
• A relation on R, S is any subset () of R S (e.g: { (1, 4), (3,
4)})
Mathematical relations
Account Branches Accounts Balances{ (Downtown, A-101, 500),
(Brighton, A-201, 900),(Brighton, A-217, 500) }
Database relationsGiven attribute domains
Branches = { Downtown, Brighton, … }Accounts = { A-101, A-201, A-217, … }Balances = R
Relations
bname acct_no balance
Downtown
Brighton
Brighton
A-101
A-201
A-217
500
900
500
Account =
Relational database semantics defined in terms of mathematical relations
{ (Downtown, A-101, 500), (Brighton, A-201, 900), (Brighton, A-217, 500) }
Considered equivalent to…
Relations
bname acct_no balance
Downtown
Brighton
Brighton
A-101
A-201
A-217
500
900
500
• Rows (aka: tuples)
Account =
Terms:
• Columns (aka: attributes)
{ (Downtown, A-101, 500), (Brighton, A-201, 900), (Brighton, A-217, 500) }
Considered equivalent to…
• Tables (aka: Relations)
• Schema (e.g.: Acct_Schema = (bname, acct_no, balance))
Definitions
1. Relation Schema (or Schema) A list of attributes and their domains
We will require the domains to be atomic
E.g. account(account-number, branch-name, balance)
• Relation Instance A particular instantiation of a relation with actual values
Will change with time
bname acct_no balance
Downtown
Brighton
Brighton
A-101
A-201
A-217
500
900
500
Programming language equivalent: A variable (e.g. x)
Programming language equivalent: Value of a variable
Rest of the Class
• Converting from an E/R diagram to a relational schema
– Remember: We still use E/R models for conceptual modeling of the database
• Relational Algebra– Data retrieval language
E/R Diagrams Relations
Convert entity sets into a relational schema with the same set of attributes
Customer
cname ccity cstreet
Customer_Schema(cname, ccity, cstreet)
Branch
bname bcity assetsBranch_Schema(bname, bcity, assets)
E/R Diagrams Relations
Convert relationship sets also into a relational schema
Remember: A relationship is completely described by primary keys of associate entities and its own attributes
Depositor_Schema(cname, acct-no, access-date)
Account
Customer
Depositor
acct-no balance
cname ccity cstreet
access-date
Well… Not quite. We can do better.It depends on the relationship cardinality
Customer_Schema(cname, ccity, cstreet)
Account_Schema(acct-no, balance)
E/R Diagrams Relations
Say One-to-Many Relationship from Customer to Account
Many accounts per customer
Account
Customer
Depositor
acct-no balance
cname ccity cstreet
access-date
Customer_Schema(cname, ccity, cstreet)
Account_Schema(acct-no, balance,cname, access-date)
Exactly same information, fewer tables
E/R Diagrams Relations
E/R Relational SchemaEntity Sets
E = (a1, …, an)E1
…a1 an
E/R Diagrams Relations
E/R Relational SchemaEntity Sets
E = (a1, …, an)
Relationship Sets
R = (a1, b1, c1, …, cn)
a1: E1’s key
b1: E2’s key
c1, …, ck: attributes of R
Not the whole story for Relationship Sets …
E1
…a1 an
E1
……
…
R E2
a1 an
c1 ck
b1 bm
E/R Diagrams Relations
Relationship Cardinality Relational Schema
n:m E1 = (a1, …, an)E2 = (b1, …, bm)R = (a1, b1, c1, …, cn)
R
E1
……
…
R E2
a1 an
c1 ck
b1 bm
E/R Diagrams Relations
Relationship Cardinality Relational Schema
n:m E1 = (a1, …, an)E2 = (b1, …, bm)R = (a1, b1, c1, …, cn)
n:1 E1 = (a1, …, an, b1, c1, …, cn)E2 = (b1, …, bm)
R
R
E1
……
…
R E2
a1 an
c1 ck
b1 bm
E/R Diagrams Relations
Relationship Cardinality Relational Schema
n:m E1 = (a1, …, an)E2 = (b1, …, bm)R = (a1, b1, c1, …, cn)
n:1 E1 = (a1, …, an, b1, c1, …, cn)E2 = (b1, …, bm)
1:n E1 = (a1, …, an)E2 = (b1, …, bm,, a1, c1, …, cn)
R
R
R
E1
……
…
R E2
a1 an
c1 ck
b1 bm
E/R Diagrams Relations
Relationship Cardinality Relational Schema
n:m E1 = (a1, …, an)E2 = (b1, …, bm)R = (a1, b1, c1, …, cn)
n:1 E1 = (a1, …, an, b1, c1, …, cn)E2 = (b1, …, bm)
1:n E1 = (a1, …, an)E2 = (b1, …, bm,, a1, c1, …, cn)
1:1
Treat as n:1 or 1:n
R
R
R
R
E1
……
…
R E2
a1 an
c1 ck
b1 bm
Translating E/R Diagrams to Relations
Acct-BranchAccount Branch
BorrowerCustomer Loan
Depositor Loan-Branch
Q. How many tables does this get translated into?A. 6 (account, branch, customer, loan, depositor, borrower)
acct_no balance bname bcity assets
cname ccity cstreet lno amt
Bank DatabaseAccount
bname acct_no balance
Depositor
cname acct_no
Customer
cname cstreet ccity
Branch
bname bcity assets
Borrower
cname lno
Loan
bname lno amt
Bank DatabaseAccount
bname acct_no balance
DowntownMianusPerryR.H.
BrightonRedwoodBrighton
A-101A-215A-102A-305A-201A-222A-217
500700400350900700750
Depositor
cname acct_no
JohnsonSmithHayesTurner
JohnsonJones
Lindsay
A-101A-215A-102A-305A-201A-217A-222
Customer
cname cstreet ccity
JonesSmithHayesCurry
LindsayTurner
WilliamsAdamsJohnsonGlennBrooksGreen
MainNorthMainNorthPark
PutnamNassauSpringAlma
Sand HillSenatorWalnut
HarrisonRye
HarrisonRye
PittsfieldStanfordPrincetonPittsfieldPalo AltoWoodsideBrooklynStanford
Branch
bname bcity assets
DowntownRedwood
PerryMianus
R.H.Pownel
N. TownBrighton
BrooklynPalo AltoHorseneckHorseneckHorseneckBennington
RyeBrooklyn
9M2.1M1.7M0.4M8M
0.3M3.7M7.1M
Borrower
cname lno
JonesSmithHayes
JacksonCurrySmith
WilliamsAdams
L-17L-23L-15L-14L-93L-11L-17L-16
Loan
bname lno amt
DowntownRedwood
PerryDowntown
MianusR.H.Perry
L-17L-23L-15L-14L-93L-11L-16
1000200015001500500900
1300
E/R Diagrams & Relations
E/R Relational Schema
Weak Entity Sets
E1 = (a1, …, an)
E2 = (a1, b1, …, bm)E1
……
IR E2
a1 anb1 bm
E/R Diagrams & Relations
E/R Relational SchemaMultivalued Attributes
Emp = (ssn, name)
Emp-Phones = (ssn, phone)
Emp
ssn name
001
…
Smith
…
Emp-Phones
ssn phone
001
001
…
4-1234
4-5678
…
Employee
ssn name phone
E/R Diagrams & Relations
E/R Relational Schema
Subclasses
Method 1:
E = (a1, …, an)
E1 = (a1, b1, …, bm)
E2 = (a1, c1, …, ck)
E
E2
ISA
E1
……b1bm c1 ck
…a1an
E/R Diagrams & Relations
E/R Relational SchemaSubclasses
Method 1:
E = (a1, …, an)
E1 = (a1, b1, …, bm)
E2 = (a1, c1, …, ck)
Method 2:
E1 = (a1, …, an, b1, …, bm)
E2 = (a1, …, an, c1, …, ck)
E
E2
ISA
E1
……b1bm c1 ck
…a1an
E/R Diagrams & Relations
Subclasses example:
Method 1:Account = (acct_no, balance)SAccount = (acct_no, interest)CAccount = (acct_no, overdraft)
Method 2:SAccount = (acct_no, balance, interest)CAccount = (acct_no, balance, overdraft)
Q: When is method 2 not possible?
A: When subclassing is partial
Keys and Relations
1. Superkeys• set of attributes of table for which every row has distinct set of values
2. Candidate keys•“minimal” superkeys
3. Primary keys•DBA-chosen candidate keys
As in the E/R Model:
Act as Integrity Constraintsi.e., guard against illegal/invalid instance of given schema
e.g., Branch = (bname, bcity, assets)
bname bcity assets
Brighton
Brighton
Brooklyn
Boston
5M
3M
Invalid!!