
Database Conceptual and Logical Design

Zachary G. Ives

University of Pennsylvania

CIS 550 – Database & Information Systems

October 4, 2005

Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan


Administrivia

Homework 2 due now

Homework 3 handed out (due on the 13th)


Modifying the Database: Inserting Data

Inserting a new literal tuple is easy, if wordy:

    INSERT INTO PROFESSOR(fid, name)
    VALUES (4, 'Simpson')

But we can also insert the results of a query!

    INSERT INTO PROFESSOR(fid, name)
    SELECT sid AS fid, name
    FROM STUDENT
    WHERE sid < 20
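To try both forms of INSERT, here is a minimal sketch that runs them against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for whatever DBMS the course uses, and the sample STUDENT rows and the function name insert_demo are assumptions made for illustration.

```python
import sqlite3

def insert_demo():
    # In-memory database; the schema mirrors the slide's STUDENT and PROFESSOR relations.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE STUDENT (sid INTEGER PRIMARY KEY, name VARCHAR(15));
        CREATE TABLE PROFESSOR (fid INTEGER PRIMARY KEY, name VARCHAR(15));
        -- Hypothetical sample rows
        INSERT INTO STUDENT VALUES (1, 'Sam'), (23, 'Nitin'), (45, 'Jill');
    """)
    # Insert a literal tuple.
    conn.execute("INSERT INTO PROFESSOR(fid, name) VALUES (4, 'Simpson')")
    # Insert the result of a query: every student with sid < 20 becomes a professor.
    conn.execute("""
        INSERT INTO PROFESSOR(fid, name)
        SELECT sid AS fid, name FROM STUDENT WHERE sid < 20
    """)
    rows = conn.execute("SELECT fid, name FROM PROFESSOR ORDER BY fid").fetchall()
    conn.close()
    return rows

print(insert_demo())
```

With these sample rows only student 1 qualifies, so PROFESSOR ends up with two tuples.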


Deleting and Modifying Tuples

Deletion is a fairly simple operation:

    DELETE
    FROM STUDENT S
    WHERE S.sid < 25

So is modification:

    UPDATE STUDENT S
    SET S.sid = 1 + S.sid, S.name = 'Janet'
    WHERE S.name = 'Jane'
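A small sketch of the same two statements, again using Python's sqlite3 as a stand-in DBMS. The tuple variable S is dropped here because not every system accepts an alias in DELETE and UPDATE; the sample rows and the function name are hypothetical.

```python
import sqlite3

def delete_and_update_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE STUDENT (sid INTEGER, name VARCHAR(15));
        -- Hypothetical sample rows
        INSERT INTO STUDENT VALUES (1, 'Sam'), (30, 'Jane'), (45, 'Jill');
    """)
    # Delete every student whose sid is below 25.
    deleted = conn.execute("DELETE FROM STUDENT WHERE sid < 25").rowcount
    # Modify the tuples named 'Jane': bump the sid and change the name.
    updated = conn.execute(
        "UPDATE STUDENT SET sid = 1 + sid, name = 'Janet' WHERE name = 'Jane'"
    ).rowcount
    rows = conn.execute("SELECT sid, name FROM STUDENT ORDER BY sid").fetchall()
    conn.close()
    return deleted, updated, rows

print(delete_and_update_demo())
```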


I’m Building an App: How Do I Talk to the DB?

Generally, apps are in a different (“host”) language with embedded SQL statements

  • Static: SQLJ, embedded SQL in C

  • Dynamic: ODBC, JDBC, ADO, OLE DB, …

Typically, predefined mappings between host language types and SQL types (e.g., VARCHAR → String or char[])


The Impedance Mismatch and Cursors

SQL is set-oriented – it returns relations

There’s no relation type in most languages!

Solution: result sets and cursors that are opened and read, as if from a file
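The cursor idea can be sketched in Python with the standard sqlite3 module (used here only as an example host-language API; the table contents and the function name read_students are assumptions). The query produces a relation, but the host program consumes it one tuple at a time, much like reading lines from a file until end-of-file.

```python
import sqlite3

def read_students():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE STUDENT (sid INTEGER, name VARCHAR(15))")
    # Hypothetical sample rows
    conn.executemany(
        "INSERT INTO STUDENT VALUES (?, ?)",
        [(1, "Sam"), (23, "Nitin"), (45, "Jill")],
    )
    cur = conn.cursor()                      # "open" the cursor
    cur.execute("SELECT sid, name FROM STUDENT ORDER BY sid")
    names = []
    while True:
        row = cur.fetchone()                 # "read" the next tuple
        if row is None:                      # no more tuples, like end-of-file
            break
        sid, name = row                      # SQL values arrive as host-language types
        names.append(name)
    cur.close()
    conn.close()
    return names

print(read_students())
```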


JDBC: Dynamic SQL for Java

See Chapter 6 of the text for more info

    import java.sql.*;

    Connection conn = DriverManager.getConnection(…);
    PreparedStatement stmt =
        conn.prepareStatement("SELECT * FROM STUDENT");
    …
    ResultSet rs = stmt.executeQuery();
    while (rs.next()) {
        sid = rs.getInt(1);  // column indexes start at 1
        …
    }


Database-Backed Web Sites

We all know traditional static HTML web sites:

[Figure: a web browser sends an HTTP request (GET ...) to the web server, which loads an HTML file from the file system and returns the HTML file to the browser.]


Interaction Is Achieved via HTML Forms

    <html>
    <form action="http://my.com/some-handler-url" method="POST">
      <input type="text" name="value1" />
      <input type="submit" value="Send" />
      <input type="reset" value="Cancel" />
    </form>
    </html>


DB Access with Java Applets and Server Processes

[Figure: a Java applet running in the browser's JVM talks over TCP/UDP and IP to a Java server process; the server process uses the JDBC driver manager and one JDBC driver per database to reach Sybase, Oracle, ...]


Java Applets: Discussion

Advantages:

  • Can take advantage of client processing

  • Platform independent – assuming standard Java

Disadvantages:

  • Requires JVM on client; self-contained

  • Inefficient: loading can take a long time ...

  • Resource intensive: client needs to be state of the art

  • Restrictive: can only connect to server where applet was loaded from (for security … can be configured)

A common alternative is to run code on the server side

  • CGI, ASP/PHP/JSP, ASP.Net, servlets


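Server Pages (*P) and Servlets (IIS, Tomcat, …)

[Figure: the web server receives an HTTP request and loads a file from the file system. If the file is HTML, it is returned as the HTML response; if it is a script, a server extension runs it (with access to I/O, network, and DB) and its output is returned as HTML.]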


ASP/JSP/PHP “Escapes”

    <html>
    <head><title>Sample</title></head>
    <body>
    <h1>Sample</h1>
    <% myClass.Process(request.getParameter("test")); %>
    <%= request.getParameter("value") %>
    </body>
    </html>


Servlets

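    import javax.servlet.http.*;

    class MyClass extends HttpServlet {
        public void doGet(HttpServletRequest req, HttpServletResponse res) … {
            // The response has no println(); output goes through its PrintWriter
            res.getWriter().println("<html><head><title>Test</title></head></html>");
        }
    }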


ASP/JSP/PHP Versus Servlets

The goal: combine direct HTML (or XML) output with program code that’s executed at the server

The code is responsible for generating more HTML, e.g., to output the results of a database table as HTML table elements

How might I do this?

  • HTML with embedded code (*P)

  • Code that prints out HTML (Servlets)
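As an illustration of the second style (code that prints out HTML), the sketch below turns the rows of a table into HTML table elements. It is written in Python with sqlite3 rather than as a servlet, purely so it is self-contained; the function name students_as_html and the sample rows are assumptions.

```python
import html
import sqlite3

def students_as_html(conn):
    """Render the STUDENT relation as an HTML table, one <tr> per tuple."""
    rows = conn.execute("SELECT sid, name FROM STUDENT ORDER BY sid")
    parts = ["<table>", "<tr><th>sid</th><th>name</th></tr>"]
    for sid, name in rows:
        # Escape the data so that characters such as & or < cannot break the page.
        parts.append(f"<tr><td>{sid}</td><td>{html.escape(name)}</td></tr>")
    parts.append("</table>")
    return "\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (sid INTEGER, name VARCHAR(15))")
# Hypothetical sample rows
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [(1, "Sam"), (2, "Tom & Jerry")])
page = students_as_html(conn)
conn.close()
print(page)
```

The embedded-code style would instead keep the surrounding HTML as literal text and escape into code only for the loop over the rows.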


Now: How Do We Get the Database in the First Place?

Database design theory!

Neat outcome: we can actually prove that we have an optimal design, in a manner of speaking…

But first we need to understand how to visualize designs in pretty pictures…


Databases Anonymous: A 6-Step Program

1. Requirements Analysis: what data, apps, critical operations

2. Conceptual DB Design: high-level description of data and constraints – typically using ER model

3. Logical DB Design: conversion into a schema

4. Schema Refinement: normalization (eliminating redundancy)

5. Physical DB Design: consider workloads, indexes and clustering of data

6. Application/Security Design


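Entity-Relationship Diagram (based on our running example)

[Figure: ER diagram with entity sets STUDENTS (sid, name), COURSES (serno, subj, cid), and PROFESSORS (fid, name); relationship set Takes (attribute exp-grade) connects STUDENTS and COURSES, and relationship set Teaches (attribute semester) connects PROFESSORS and COURSES. Callouts label an entity set, a relationship set, and attributes (recall these have domains).]

Underlined attributes are keys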


Conceptual Design Process

What are the entities being represented?

What are the relationships?

What info (attributes) do we store about each?

What keys & integrity constraints do we have?

[Figure: entity set STUDENTS with attributes sid and name, connected to relationship set Takes with attribute exp-grade.]


Translating Entity Sets to Logical Schemas & SQL DDL

    CREATE TABLE STUDENTS (
      sid INTEGER,
      name VARCHAR(15),
      PRIMARY KEY (sid) )

    CREATE TABLE COURSES (
      serno INTEGER,
      subj VARCHAR(30),
      cid CHAR(15),
      PRIMARY KEY (serno) )

Fairly straightforward to generate a schema…


Translating Relationship Sets

Generate schema with attributes consisting of:

  • Key(s) of each associated entity (foreign keys)

  • Descriptive attributes

    CREATE TABLE Takes (
      sid INTEGER,
      serno INTEGER,
      exp_grade CHAR(1),
      PRIMARY KEY (?),
      FOREIGN KEY (serno) REFERENCES COURSES,
      FOREIGN KEY (sid) REFERENCES STUDENTS)

(The attribute exp-grade is written exp_grade in the DDL because a hyphen is not legal in an unquoted SQL identifier.)


… OK, But What about Connectivity in the E-R Diagram?

Attributes can only be connected to entities or relationships

Entities can only be connected via relationships

As for the edges, let’s consider kinds of relationships and integrity constraints…

[Figure: entity sets PROFESSORS and COURSES connected through the relationship set Teaches.]

(warning: the book has a slightly different notation here!)


Logical Schema Design

Roughly speaking, each entity set or relationship set becomes a table (this will not always be the case; see Thursday)

Attributes associated with each entity set or relationship set become attributes of the relation; the key is also copied (ditto with foreign keys in a relationship set)


Binary Relationships & Participation

Binary relationships can be classified as 1:1, 1:Many, or Many:Many, as in:

[Figure: three example mappings between two entity sets, labeled 1:1, 1:n, and m:n.]


1:Many (1:n) Relationships

Place an arrow in the many → one direction, i.e., towards the entity that is referenced via a foreign key

Suppose profs teach multiple courses, but may not have taught yet:

[Figure: PROFESSORS, Teaches, COURSES, with the edge drawn for partial participation (0 or more…)]

Suppose profs must teach to be on the roster:

[Figure: PROFESSORS, Teaches, COURSES, with the edge drawn for total participation (1 or more…)]


Many-to-Many Relationships

Many-to-many relationships have no arrows on edges

The “relationship set” relation has a key that includes the foreign keys, plus any other attributes specified as key

[Figure: STUDENTS and COURSES connected by the relationship set Takes, with no arrows on either edge.]
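A minimal sketch of what such a key buys us, using Python's sqlite3 as a stand-in DBMS. The key of Takes is taken to be the pair of foreign keys (sid, serno); the serial numbers, sample rows, the exp_grade spelling, and the function name takes_demo are assumptions for illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

def takes_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves FK checks off by default
    conn.executescript("""
        CREATE TABLE STUDENTS (sid INTEGER PRIMARY KEY, name VARCHAR(15));
        CREATE TABLE COURSES (serno INTEGER PRIMARY KEY, subj VARCHAR(30), cid CHAR(15));
        CREATE TABLE Takes (
            sid INTEGER, serno INTEGER, exp_grade CHAR(1),
            PRIMARY KEY (sid, serno),            -- key includes both foreign keys
            FOREIGN KEY (serno) REFERENCES COURSES,
            FOREIGN KEY (sid) REFERENCES STUDENTS);
        -- Hypothetical sample rows
        INSERT INTO STUDENTS VALUES (1, 'Sam'), (23, 'Nitin');
        INSERT INTO COURSES VALUES (100, 'DB', '550'), (101, 'AI', '570');
        -- Many-to-many: Sam takes two courses, and course 100 has two students.
        INSERT INTO Takes VALUES (1, 100, 'A'), (1, 101, 'B'), (23, 100, 'A');
    """)
    outcomes = {}
    attempts = [("duplicate pair", (1, 100, "C")), ("unknown student", (99, 100, "A"))]
    for label, row in attempts:
        try:
            conn.execute("INSERT INTO Takes VALUES (?, ?, ?)", row)
            outcomes[label] = "accepted"
        except sqlite3.IntegrityError:
            outcomes[label] = "rejected"
    count = conn.execute("SELECT COUNT(*) FROM Takes").fetchone()[0]
    conn.close()
    return outcomes, count

print(takes_demo())
```

The composite key allows any number of (student, course) combinations but only one tuple per pair, and the foreign keys reject pairs that mention a nonexistent student or course.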


Examples

Suppose courses must be taught to be on the roster

Suppose students must have enrolled in at least one course


Representing 1:n Relationships in Tables

• Key of relationship set:

    CREATE TABLE Teaches(
      fid INTEGER,
      serno INTEGER,
      semester CHAR(4),
      PRIMARY KEY (serno),
      FOREIGN KEY (fid) REFERENCES PROFESSORS,
      FOREIGN KEY (serno) REFERENCES COURSES)

• Or embed relationship in “many” entity set:

    CREATE TABLE Teaches_Course(
      serno INTEGER,
      subj VARCHAR(30),
      cid CHAR(15),
      fid INTEGER,
      semester CHAR(4),
      PRIMARY KEY (serno),
      FOREIGN KEY (fid) REFERENCES PROFESSORS)
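To see why serno alone works as the key of a 1:n Teaches relation, here is a sketch in Python with sqlite3 (a stand-in DBMS). It assumes integer fid and serno columns and a COURSES parent for the serno foreign key; the sample rows, the second professor's name, and the function name teaches_demo are hypothetical.

```python
import sqlite3

def teaches_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE PROFESSORS (fid INTEGER PRIMARY KEY, name VARCHAR(15));
        CREATE TABLE COURSES (serno INTEGER PRIMARY KEY, subj VARCHAR(30), cid CHAR(15));
        CREATE TABLE Teaches (
            fid INTEGER, serno INTEGER, semester CHAR(4),
            PRIMARY KEY (serno),                 -- each course appears at most once
            FOREIGN KEY (fid) REFERENCES PROFESSORS,
            FOREIGN KEY (serno) REFERENCES COURSES);
        -- Hypothetical sample rows
        INSERT INTO PROFESSORS VALUES (4, 'Simpson'), (5, 'Smith');
        INSERT INTO COURSES VALUES (100, 'DB', '550'), (101, 'AI', '570');
        -- One professor may teach many courses.
        INSERT INTO Teaches VALUES (4, 100, 'F05'), (4, 101, 'F05');
    """)
    try:
        # A second professor for course 100 would violate the key on serno.
        conn.execute("INSERT INTO Teaches VALUES (5, 100, 'F05')")
        second_prof = "accepted"
    except sqlite3.IntegrityError:
        second_prof = "rejected"
    courses_of_4 = conn.execute(
        "SELECT COUNT(*) FROM Teaches WHERE fid = 4"
    ).fetchone()[0]
    conn.close()
    return courses_of_4, second_prof

print(teaches_demo())
```

Because each serno can appear only once, the relationship tuple could equally live inside the course's own row, which is the idea behind the embedded Teaches_Course alternative.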


1:1 Relationships

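If you borrow money or have credit, you might get:

[Figure: ER diagram with entity sets CreditReport and Borrower joined by the relationship set Describes; the attributes shown are rid, delinquent?, ssn, name, and debt.]

What are the table options?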


Roles: Labeled Edges

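Sometimes a relationship connects the same entity, and the entity has more than one role:

[Figure: entity set Parts (id, name) connected to itself by the relationship set Includes (attribute qty); the two edges are labeled with the roles Assembly and Subpart.]

This often indicates the need for recursive queries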


DDL for Role Example

    CREATE TABLE Parts (
      Id INTEGER,
      Name CHAR(15),
      …
      PRIMARY KEY (Id) )

    CREATE TABLE Includes (
      Assembly INTEGER,
      Subpart INTEGER,
      Qty INTEGER,
      PRIMARY KEY (Assembly, Subpart),
      FOREIGN KEY (Assembly) REFERENCES Parts,
      FOREIGN KEY (Subpart) REFERENCES Parts)
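The need for recursive queries can be made concrete with a sketch: finding every direct or indirect subpart of an assembly. The example uses a recursive common table expression (WITH RECURSIVE) in SQLite via Python's sqlite3; the part names, quantities, the CTE name Reachable, and the helper names are all assumptions for illustration.

```python
import sqlite3

def build_parts_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Parts (Id INTEGER PRIMARY KEY, Name CHAR(15));
        CREATE TABLE Includes (
            Assembly INTEGER, Subpart INTEGER, Qty INTEGER,
            PRIMARY KEY (Assembly, Subpart),
            FOREIGN KEY (Assembly) REFERENCES Parts,
            FOREIGN KEY (Subpart) REFERENCES Parts);
        -- Hypothetical data: a bike has wheels and a frame; a wheel has spokes.
        INSERT INTO Parts VALUES (1, 'bike'), (2, 'wheel'), (3, 'spoke'), (4, 'frame');
        INSERT INTO Includes VALUES (1, 2, 2), (1, 4, 1), (2, 3, 32);
    """)
    return conn

def all_subparts(conn, assembly_id):
    """Names of all parts reachable from assembly_id through Includes."""
    rows = conn.execute("""
        WITH RECURSIVE Reachable(part) AS (
            SELECT Subpart FROM Includes WHERE Assembly = ?
            UNION
            SELECT I.Subpart
            FROM Includes I JOIN Reachable R ON I.Assembly = R.part
        )
        SELECT P.Name FROM Reachable R JOIN Parts P ON P.Id = R.part
        ORDER BY P.Name
    """, (assembly_id,)).fetchall()
    return [name for (name,) in rows]

conn = build_parts_db()
print(all_subparts(conn, 1))
```

A single non-recursive join over Includes finds only direct subparts; each extra level of nesting would need another join, which is why the role structure points towards recursion.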


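Roles vs. Separate Entities

[Figure: two alternative ER diagrams. In the first, separate entity sets Husband (id, name) and Wife (id, name) are connected by the relationship set Married. In the second, a single entity set Person (id, name) is connected to itself by Married, with the edges labeled with the roles Husband and Wife.]

What is the difference between these two representations?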


ISA Relationships: Subclassing (Structurally)

Inheritance states that one entity is a “special kind” of another entity: “subclass” should be member of “base class”

[Figure: entity set People (id, name) connected through an ISA node to the subclass entity set Employees (salary).]


But How Does this Translate into the Relational Model?

Compare these options:

  • Two tables, disjoint tuples

  • Two tables, disjoint attributes

  • One table with NULLs

  • Object-relational databases
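One of these options, two tables with disjoint attributes, can be sketched as follows (Python's sqlite3 as a stand-in DBMS; the People/Employees columns follow the ISA example, while the sample rows and the function name isa_demo are assumptions). Employees stores only the subclass-specific attribute and shares its key with People; an outer join then reproduces what the one-table-with-NULLs option would store.

```python
import sqlite3

def isa_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE People (id INTEGER PRIMARY KEY, name VARCHAR(15));
        -- Subclass table: same key, plus only the attributes People lacks.
        CREATE TABLE Employees (
            id INTEGER PRIMARY KEY,
            salary REAL,
            FOREIGN KEY (id) REFERENCES People ON DELETE CASCADE);
        -- Hypothetical sample rows: Jill is an employee, Sam is not.
        INSERT INTO People VALUES (1, 'Sam'), (2, 'Jill');
        INSERT INTO Employees VALUES (2, 50000.0);
    """)
    # Full employee records require a join back to the base-class table.
    employees = conn.execute("""
        SELECT P.id, P.name, E.salary
        FROM People P JOIN Employees E ON E.id = P.id
    """).fetchall()
    # The single-table alternative corresponds to an outer join with NULL salaries.
    one_table = conn.execute("""
        SELECT P.id, P.name, E.salary
        FROM People P LEFT JOIN Employees E ON E.id = P.id
        ORDER BY P.id
    """).fetchall()
    conn.close()
    return employees, one_table

print(isa_demo())
```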


Weak Entities

A weak entity can only be identified uniquely using the primary key of another (owner) entity.

  • Owner and weak entity sets in a one-to-many relationship set, 1 owner : many weak entities

  • Weak entity set must have total participation

[Figure: owner entity set People (ssn, name) connected by the relationship set Feeds (weeklyCost) to the weak entity set Pets (name, species).]


Translating Weak Entity Sets

Weak entity set and identifying relationship set are translated into a single table; when the owner entity is deleted, all owned weak entities must also be deleted

    CREATE TABLE Feed_Pets (
      pname VARCHAR(20),
      species INTEGER,
      weeklyCost REAL,
      ssn CHAR(11) NOT NULL,
      PRIMARY KEY (pname, ssn),
      FOREIGN KEY (ssn) REFERENCES People
        ON DELETE CASCADE)
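The deletion rule can be tried out with a short sketch in Python's sqlite3 (a stand-in DBMS, where foreign keys must be switched on explicitly). It assumes the owner table is People, keyed on ssn, and that the pet's name column is called pname; the sample rows and the function name cascade_demo are hypothetical.

```python
import sqlite3

def cascade_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # required for ON DELETE CASCADE in SQLite
    conn.executescript("""
        CREATE TABLE People (ssn CHAR(11) PRIMARY KEY, name VARCHAR(20));
        CREATE TABLE Feed_Pets (
            pname VARCHAR(20), species INTEGER, weeklyCost REAL,
            ssn CHAR(11) NOT NULL,
            PRIMARY KEY (pname, ssn),            -- pet name is unique only per owner
            FOREIGN KEY (ssn) REFERENCES People ON DELETE CASCADE);
        -- Hypothetical sample rows: two owners each have a pet named Rex.
        INSERT INTO People VALUES ('111-11-1111', 'Sam'), ('222-22-2222', 'Jill');
        INSERT INTO Feed_Pets VALUES
            ('Rex', 1, 10.0, '111-11-1111'),
            ('Rex', 1, 12.5, '222-22-2222'),
            ('Tom', 2, 7.0, '111-11-1111');
    """)
    before = conn.execute("SELECT COUNT(*) FROM Feed_Pets").fetchone()[0]
    # Deleting the owner removes all weak entities that depend on it.
    conn.execute("DELETE FROM People WHERE ssn = '111-11-1111'")
    after = conn.execute("SELECT pname, ssn FROM Feed_Pets").fetchall()
    conn.close()
    return before, after

print(cascade_demo())
```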


N-ary Relationships

Relationship sets can relate an arbitrary number of entity sets:

[Figure: relationship set IndepStudy connecting the three entity sets Student, Project, and Advisor.]


Summary of ER Diagrams

One of the primary ways of designing logical schemas

CASE tools exist built around ER (e.g. ERWin, PowerBuilder, etc.)

  • Translate the design automatically into DDL, XML, UML, etc.

  • Use a slightly different notation that is better suited to graphical displays

  • Some tools support constraints beyond what ER diagrams can capture

Can you get different ER diagrams from the same data?


Schema Refinement & Design Theory

ER Diagrams give us a start in logical schema design

Sometimes we need to refine our designs further

  • There’s a system and theory for this

  • Focus is on redundancy of data

Let’s briefly touch on one key concept in preparation for Thursday’s lecture on normalization…


Not All Designs are Equally Good

Why is this a poor schema design?

    Stuff(sid, name, cid, subj, grade)

And why is this one better?

    Student(sid, name)
    Course(cid, subj)
    Takes(sid, cid, exp-grade)


Focus on the Bad Design

Certain items (e.g., name) get repeated

Some information requires that a student be enrolled (e.g., courses) due to the key

    sid   name    cid   subj   exp-grade
    1     Sam     570   AI     B
    23    Nitin   550   DB     A
    45    Jill    505   OS     A
    1     Sam     505   OS     C
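The repetition can be made visible with a small sketch that splits this instance into the three-relation design Student(sid, name), Course(cid, subj), Takes(sid, cid, exp-grade) and joins it back. It is plain Python over tuples; the function names decompose and rejoin are assumptions for illustration.

```python
# The instance of Stuff(sid, name, cid, subj, exp-grade) from the table above.
STUFF = [
    (1, "Sam", 570, "AI", "B"),
    (23, "Nitin", 550, "DB", "A"),
    (45, "Jill", 505, "OS", "A"),
    (1, "Sam", 505, "OS", "C"),
]

def decompose(stuff):
    """Project Stuff onto Student, Course, and Takes, removing duplicates."""
    student = sorted({(sid, name) for sid, name, _, _, _ in stuff})
    course = sorted({(cid, subj) for _, _, cid, subj, _ in stuff})
    takes = sorted({(sid, cid, grade) for sid, _, cid, _, grade in stuff})
    return student, course, takes

def rejoin(student, course, takes):
    """Join the three relations back together on sid and cid."""
    names = dict(student)
    subjects = dict(course)
    return sorted((sid, names[sid], cid, subjects[cid], grade)
                  for sid, cid, grade in takes)

student, course, takes = decompose(STUFF)
print(student)   # Sam's name is now stored once instead of twice
print(course)    # a course can be listed here even with no student enrolled
print(rejoin(student, course, takes) == sorted(STUFF))
```

For this instance the join of the three smaller relations gives back exactly the original tuples, so nothing is lost by storing each name and subject only once.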


Functional Dependencies Describe “Key-Like” Relationships

A key is a set of attributes where:
If keys match, then the tuples match

A functional dependency (FD) is a generalization:
If an attribute set determines another, written A → B, then if two tuples agree on A, they must agree on B:

sid → Address

What other FDs are there in this data?

FDs are independent of our schema design choice


Formal Definition of FD’s

Def. Given a relation scheme R (a set of attributes) and subsets X, Y of R:

An instance r of R satisfies FD X → Y if, for any two tuples t1, t2 ∈ r, t1[X] = t2[X] implies t1[Y] = t2[Y]

For an FD to hold for scheme R, it must hold for every possible instance r of R

(Can a DBMS verify this? Can we determine this by looking at an instance?)
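The definition translates almost directly into code. The sketch below checks whether one given instance satisfies X → Y; tuples are represented as Python dicts, and the function name satisfies_fd and the sample instance (the enrollment data used earlier in the lecture) are assumptions for illustration.

```python
def satisfies_fd(rows, X, Y):
    """Return True if the instance `rows` satisfies the FD X -> Y.

    rows: list of dicts mapping attribute name to value
    X, Y: sequences of attribute names
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

instance = [
    {"sid": 1, "name": "Sam", "cid": 570, "subj": "AI", "exp-grade": "B"},
    {"sid": 23, "name": "Nitin", "cid": 550, "subj": "DB", "exp-grade": "A"},
    {"sid": 45, "name": "Jill", "cid": 505, "subj": "OS", "exp-grade": "A"},
    {"sid": 1, "name": "Sam", "cid": 505, "subj": "OS", "exp-grade": "C"},
]

print(satisfies_fd(instance, ["sid"], ["name"]))
print(satisfies_fd(instance, ["sid"], ["cid"]))
```

A check like this can show that an instance violates an FD, but a passing result only says that this particular instance happens to satisfy it, not that the FD holds for the scheme.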


General Thoughts on Good Schemas

We want all attributes in every tuple to be determined by the tuple’s key attributes

What does this say about redundancy?

But:

  • What about tuples that don’t have keys (other than the entire value)?

  • What about the fact that every attribute determines itself?

Stay tuned for Thursday!