Introduction to Database Design, Donghui Zhang, CCIS, Northeastern University
Post on 14-Dec-2015
Introduction to Database Design
Donghui Zhang
CCIS, Northeastern University
Outline
• Database and DBMS
• Architecture of Database Applications
• Database Design
• Database Application Programming
Database, DBMS
• A Database is a very large, integrated collection of data.
• A Database Management System (DBMS) is software designed to store and manage databases.
• A Database Application is software that enables users to access the database.
Why DBMS?
• We live in a world experiencing an information explosion.
• To manage the huge amount of data: DBMS.
• The total RDBMS market in 2003 was $7 billion in license revenues.
• Much more money was spent on developing database applications.
[Figure: RDBMS new license revenue in millions of dollars, 2002 vs. 2003, for IBM, Oracle, Microsoft, NCR Teradata, and others; vertical axis from 0 to 3000.]
Total revenue: $7.1 billion in 2003.
• The worldwide database management software market saw double-digit growth in 2004.
• The five-year forecast calls for a compound annual growth rate of nearly 6 percent, bringing the market to $12.7 billion in new license revenue by 2009.
Title: Forecast: Database Management Systems Software, Worldwide, 2003-2009
Author: Colleen Graham, Gartner
Time: April 21, 2005
DBMS can Provide …
• Data independence and efficient access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.
DBMS Historic Points
• The first DBMS was developed by Turing Award winner Charles Bachman in the early 1960s.
• In 1970, Turing Award winner Edgar Codd proposed the relational data model.
• In the 1970s, IBM developed SQL (originally called SEQUEL); it became an ANSI standard in 1986.
Outline
• Database and DBMS
• Architecture of Database Applications
• Database Design
• Database Application Programming
Components of Data-Intensive Systems
Three separate types of functionality:
• Data management
• Application logic
• Presentation
Example: Course Enrollment
Build a system that students can use to enroll in courses:
• Data Management
  • Student info, course info, instructor info, course availability, pre-requisites, etc.
• Application Logic
  • Logic to add a course, drop a course, create a new course, etc.
• Presentation
  • Log in different users (students, staff, faculty), display forms and human-readable output
The Three-Tier Architecture
• Presentation tier: Client Program (Web Browser)
• Middle tier: Application Server
• Data management tier: Database System
E.g. What we use
• Presentation tier: Client Program (Web Browser)
• Middle tier: Application Server (Apache, JSP)
• Data management tier: Database System (MySQL)
HTML: An Example
<HTML>
<HEAD></HEAD>
<BODY>
<h1>Barns and Nobble Internet Bookstore</h1>
Our inventory:
<h3>Science</h3>
<b>The Character of Physical Law</b>
<UL>
<LI>Author: Richard Feynman</LI>
<LI>Published 1980</LI>
<LI>Hardcover</LI>
</UL>
<h3>Fiction</h3>
<b>Waiting for the Mahatma</b>
<UL>
<LI>Author: R.K. Narayan</LI>
<LI>Published 1981</LI>
</UL>
<b>The English Teacher</b>
<UL>
<LI>Author: R.K. Narayan</LI>
<LI>Published 1980</LI>
<LI>Paperback</LI>
</UL>
</BODY>
</HTML>
HTML: static vs dynamic
• Static: you create an HTML file, which is sent to the client’s web browser upon request. E.g.:
  • your CCIS login is ‘donghui’,
  • your HTML file is /home/donghui/.www/index.html
  • the URL is http://www.ccs.neu.edu/home/donghui
• Dynamic: the HTML file is generated dynamically via your ASP.NET code.
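To make the static/dynamic distinction concrete, here is a minimal sketch in Python (not the server technology named on the slides) that builds the page at request time instead of reading a fixed file. The `BOOKS` list and the `render_inventory` function are hypothetical names used only for illustration; a real application would fetch the rows from the database.

```python
from html import escape

# Hypothetical in-memory inventory; a real application would query the database.
BOOKS = [
    {"title": "The Character of Physical Law", "author": "Richard Feynman", "year": 1980},
    {"title": "Waiting for the Mahatma", "author": "R.K. Narayan", "year": 1981},
]

def render_inventory(books):
    """Build the HTML page as a string each time it is requested."""
    items = []
    for book in books:
        items.append(
            f"<b>{escape(book['title'])}</b>\n"
            f"<UL>\n"
            f"<LI>Author: {escape(book['author'])}</LI>\n"
            f"<LI>Published {book['year']}</LI>\n"
            f"</UL>"
        )
    body = "\n".join(items)
    return f"<HTML>\n<BODY>\nOur inventory:\n{body}\n</BODY>\n</HTML>"

page = render_inventory(BOOKS)
print(page)
```

Adding a book to `BOOKS` changes the generated page without editing any HTML file, which is the point of the dynamic approach.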
Another View
• Client Machines: Client browser 1, Client browser 2, Client browser 3
• Machine 2: Apache, Your JSP Code
• Machine 1: MySQL, Your database
Client-Server Architecture
• Data Management: DBMS @ Server.
• Presentation: Client program.
• Application Logic: can go either way.
  • If combined with server: thin-client architecture
  • If combined with client: thick-client architecture
[Diagram: Server connected to Client]
Thin-Client Architecture
• Database server and web server are too closely coupled.
• E.g., it does not allow the application logic to access multiple databases on different servers.
[Diagram: one Server connected to three Clients]
Thick-Client Architecture
• No central place to update the business logic
• Security issues: Server needs to trust clients
• Does not scale to more than several hundred clients
[Diagram: one Server connected to three Clients]
Advantages of the Three-Tier Architecture
• Heterogeneous systems
  • Tiers can be independently maintained, modified, and replaced
• Thin clients
  • Only presentation layer at clients (web browsers)
• Integrated data access
  • Several database systems can be handled transparently at the middle tier
  • Central management of connections
• Scalability
  • Replication at middle tier permits scalability of business logic
• Software development
  • Code for business logic is centralized
  • Interaction between tiers through well-defined APIs: Can reuse standard components at each tier
Outline
• Database and DBMS
• Architecture of Database Applications
• Database Design
• Database Application Programming
ER-Model
• Entity: Real-world object distinguishable from other objects. E.g. Students, Courses.
• An entity has multiple attributes. E.g. Students have ssn, name, phone.
• Entities have relationships with each other. E.g. Students enroll in Courses.
Example of ER Diagram
[Diagram: entity Students (ssn, name, phone) and entity Courses (cid, title, unit), connected by relationship Enroll (time).]
To implement the above design, store three tables in the database.

Students
ssn    name     phone
1111   John     617-373-5120
2222   Alice    781-322-6084
3333   Victor   617-442-7798

Courses
cid      title                    unit
CSU430   Database Design          4
CSG131   Transaction Processing   4
CSG339   Data Mining              4

Enroll
ssn    cid      time
1111   CSU430   Fall’03
1111   CSG339   Spring’04
2222   CSG131   Winter’03
2222   CSG339   Spring’04
3333   CSU430   Winter’01
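The three tables can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the slides use MySQL, the column types are guesses, and the composite primary key on Enroll is an assumption that happens to fit the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT, phone TEXT);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, title TEXT, unit INTEGER);
-- The Enroll relationship becomes its own table linking the two entities.
CREATE TABLE Enroll (
    ssn  TEXT REFERENCES Students(ssn),
    cid  TEXT REFERENCES Courses(cid),
    time TEXT,
    PRIMARY KEY (ssn, cid)   -- assumed key: one row per student and course
);
""")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    ("1111", "John", "617-373-5120"),
    ("2222", "Alice", "781-322-6084"),
    ("3333", "Victor", "617-442-7798"),
])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)", [
    ("CSU430", "Database Design", 4),
    ("CSG131", "Transaction Processing", 4),
    ("CSG339", "Data Mining", 4),
])
conn.executemany("INSERT INTO Enroll VALUES (?, ?, ?)", [
    ("1111", "CSU430", "Fall'03"),
    ("1111", "CSG339", "Spring'04"),
    ("2222", "CSG131", "Winter'03"),
    ("2222", "CSG339", "Spring'04"),
    ("3333", "CSU430", "Winter'01"),
])

# Which courses did the student with ssn 1111 take, and when?
john_courses = conn.execute("""
    SELECT C.title, E.time
    FROM Enroll E JOIN Courses C ON C.cid = E.cid
    WHERE E.ssn = ?
    ORDER BY C.cid
""", ("1111",)).fetchall()
print(john_courses)
```

Because Enroll is many-to-many (a student takes several courses, a course has several students), it needs its own table.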
Key Constraint in ER Diagram
[Diagram: entity Students (ssn, name, phone) and entity Departments (did, dname, address), connected by relationship BelongsTo; each student belongs to at most one department.]
A many-to-one relationship does not need its own table: store the department's did in Students.

Students
ssn    name     phone          did
1111   John     617-373-5120   1
2222   Alice    781-322-6084   1
3333   Victor   617-442-7798   3

Departments
did   dname                    address
1     Computer Science         #161 Cullinane
2     Electrical Engineering   #300 Egan
3     Physics                  #112 Richard
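A many-to-one relationship like this is usually expressed as a foreign key column. The sketch below uses Python's `sqlite3` module rather than the MySQL setup from the slides; the rejected student row (ssn 4444, name Eve, department 9) is made up purely to show the constraint firing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, address TEXT);
CREATE TABLE Students (
    ssn TEXT PRIMARY KEY, name TEXT, phone TEXT,
    did INTEGER REFERENCES Departments(did)  -- BelongsTo, folded into Students
);
INSERT INTO Departments VALUES
    (1, 'Computer Science', '#161 Cullinane'),
    (2, 'Electrical Engineering', '#300 Egan'),
    (3, 'Physics', '#112 Richard');
INSERT INTO Students VALUES
    ('1111', 'John', '617-373-5120', 1),
    ('2222', 'Alice', '781-322-6084', 1),
    ('3333', 'Victor', '617-442-7798', 3);
""")

# A student pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO Students VALUES ('4444', 'Eve', '000-000-0000', 9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

per_department = conn.execute("""
    SELECT D.dname, COUNT(S.ssn)
    FROM Departments D LEFT JOIN Students S ON S.did = D.did
    GROUP BY D.did, D.dname
    ORDER BY D.did
""").fetchall()
print(rejected, per_department)
```

Since each student row holds exactly one `did`, the "at most one department" rule is enforced by the table layout itself.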
Some Other Design Concepts
• Primary key
• Participation constraint
• Normal forms (BCNF, 3-NF, etc.)
• IS-A hierarchy
• Ternary relationships
Outline
• Database and DBMS
• Architecture of Database Applications
• Database Design
• Database Application Programming
SQL Query
Find the students in the Computer Science Department.
• if we know the did is 1:
    SELECT S.name
    FROM Students S
    WHERE S.did=1
• otherwise:
    SELECT S.name
    FROM Students S, Departments D
    WHERE D.did=S.did AND D.dname='Computer Science'
SQL in Application Code
• SQL commands can be called from within a host language (e.g., C++, Java) program.
• Two main integration approaches:
  • Embed SQL in the host language (Embedded SQL, SQLJ)
  • Create special API to call SQL commands (JDBC)
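The API approach can be sketched in Python, whose built-in `sqlite3` module plays a role comparable to JDBC in Java: the program hands an SQL string plus parameters to a connection object. The function name `students_in_department` is hypothetical, and the query is the department lookup from the SQL Query slide with the department name passed as a parameter.

```python
import sqlite3

def students_in_department(conn, dname):
    """Return the names of students whose department is called `dname`."""
    sql = """
        SELECT S.name
        FROM Students S, Departments D
        WHERE D.did = S.did AND D.dname = ?
        ORDER BY S.name
    """
    # The ? placeholder lets the driver bind the value instead of
    # splicing it into the SQL string.
    return [row[0] for row in conn.execute(sql, (dname,))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, address TEXT);
CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT, phone TEXT, did INTEGER);
INSERT INTO Departments VALUES
    (1, 'Computer Science', '#161 Cullinane'),
    (3, 'Physics', '#112 Richard');
INSERT INTO Students VALUES
    ('1111', 'John', '617-373-5120', 1),
    ('2222', 'Alice', '781-322-6084', 1),
    ('3333', 'Victor', '617-442-7798', 3);
""")

cs_students = students_in_department(conn, "Computer Science")
print(cs_students)
```

Binding parameters rather than concatenating strings keeps user input from being interpreted as SQL.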
Implementation of Database System
Introduction
Donghui Zhang
Partially using Prof. Hector Garcia-Molina’s slides (Notes01) http://www-db.stanford.edu/~ullman/dscb.html
Isn’t Implementing a Database System Simple?
Relations -> Statements -> Results
Introducing the Megatron 3000 Database Management System
• The latest from Megatron Labs
• Incorporates latest relational technology
• UNIX compatible
Megatron 3000 Implementation Details
• Relations stored in files (ASCII)
  e.g., relation R is in /usr/db/R
    Smith # 123 # CS
    Jones # 522 # EE
    ...
Megatron 3000 Implementation Details
• Directory file (ASCII) in /usr/db/directory
    R1 # A # INT # B # STR …
    R2 # C # STR # A # INT …
    ...
Megatron 3000 Sample Sessions
    % MEGATRON3000
    Welcome to MEGATRON 3000!
    &
    ...
    & quit
    %
Megatron 3000 Sample Sessions
    & select * from R #

    Relation R
    A       B     C
    SMITH   123   CS

    &
Megatron 3000 Sample Sessions
    & select A,B from R,S where R.A = S.A and S.C > 100 #

    A     B
    123   CAR
    522   CAT

    &
Megatron 3000
To execute “select * from R where condition”:
(1) Read directory file to get R attributes
(2) Read R file, for each line:
    (a) Check condition
    (b) If OK, display
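The selection steps above can be sketched in Python. Lists of strings stand in for the directory file and the relation file, and the schema line for R (A as STR, B as INT, C as STR) is an assumption chosen to fit the sample tuples; the function names are hypothetical.

```python
def parse_directory(lines):
    """Map relation name -> [(attribute, type), ...] from 'R # A # STR # B # INT' lines."""
    schema = {}
    for line in lines:
        fields = [f.strip() for f in line.split("#")]
        name, rest = fields[0], fields[1:]
        schema[name] = list(zip(rest[0::2], rest[1::2]))
    return schema

def select_all(relation, directory_lines, data_lines, condition):
    """Scan every line of the relation and yield the rows that pass `condition`."""
    attrs = parse_directory(directory_lines)[relation]    # step (1)
    for line in data_lines:                               # step (2): full file scan
        values = [v.strip() for v in line.split("#")]
        row = {a: (int(v) if t == "INT" else v) for (a, t), v in zip(attrs, values)}
        if condition(row):                                # step (a)
            yield row                                     # step (b)

DIRECTORY = ["R # A # STR # B # INT # C # STR"]   # assumed schema for R
R_FILE = ["Smith # 123 # CS", "Jones # 522 # EE"]

matches = list(select_all("R", DIRECTORY, R_FILE, lambda row: row["B"] > 200))
print(matches)
```

Every query reads the whole file, however few rows match, which is the cost the later slides criticize.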
Megatron 3000
To execute “select A,B from R,S where condition”:
(1) Read directory file to get R, S attributes
(2) Read R file, for each line:
    (a) Read S file, for each line:
        (i) Create join tuple
        (ii) Check condition
        (iii) Display if OK
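This join procedure is a nested-loop join. A minimal Python sketch follows, with in-memory lists standing in for the R and S files; the sample tuples are made up so that the result matches the sample session shown earlier.

```python
def nested_loop_join(r_rows, s_rows, condition):
    """Pair every R tuple with every S tuple and keep the pairs that pass `condition`."""
    for r in r_rows:                                   # step (2): each line of R
        for s in s_rows:                               # step (a): rescan all of S
            joined = {f"R.{k}": v for k, v in r.items()}
            joined.update({f"S.{k}": v for k, v in s.items()})   # step (i)
            if condition(joined):                      # step (ii)
                yield joined                           # step (iii)

# Hypothetical contents of the two relation files.
R = [{"A": 123, "B": "CAR"}, {"A": 522, "B": "CAT"}, {"A": 777, "B": "DOG"}]
S = [{"A": 123, "C": 150}, {"A": 522, "C": 300}, {"A": 777, "C": 50}]

def where(t):
    return t["R.A"] == t["S.A"] and t["S.C"] > 100

result = [(t["R.A"], t["R.B"]) for t in nested_loop_join(R, S, where)]
print(result)
```

The inner loop runs once per R tuple, so the work grows with the product of the two relation sizes.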
What’s wrong with the Megatron 3000 DBMS?
• Expensive update and search, e.g.,
  - To locate an employee with a given SSN, file scan.
  - To change “Cat” to “Cats”, complete file write.
• Solution: Indexing!
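As a rough illustration of what an index buys, the Python sketch below compares a scan with a lookup through a dictionary that maps SSN to row position. The dictionary is only an in-memory stand-in; real systems typically keep disk-based structures such as B+-trees or hash files. The employee rows and function names are made up.

```python
def build_index(rows, key):
    """Map each key value to the position of its row (assumes the key is unique)."""
    return {row[key]: pos for pos, row in enumerate(rows)}

def find_by_scan(rows, ssn):
    """Megatron-style search: look at rows one by one until the SSN matches."""
    for row in rows:
        if row["ssn"] == ssn:
            return row
    return None

def find_by_index(rows, index, ssn):
    """Indexed search: one dictionary lookup, then one row access."""
    pos = index.get(ssn)
    return rows[pos] if pos is not None else None

# Hypothetical employee relation.
employees = [{"ssn": f"{n:04d}", "name": f"employee{n}"} for n in range(10000)]
ssn_index = build_index(employees, "ssn")

by_scan = find_by_scan(employees, "9999")
by_index = find_by_index(employees, ssn_index, "9999")
print(by_scan == by_index, by_index)
```

The scan touches up to 10,000 rows to find the last employee, while the indexed lookup goes straight to it; the price is that the index must be maintained on every update.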
What’s wrong with the Megatron 3000 DBMS?
• Brute force query processing, e.g.,
    select *
    from R,S
    where R.A = S.A and S.B > 1000
  - Do select first?
  - More efficient join?
• Solution: Query optimization!
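Both questions on this slide can be explored with a small Python sketch that counts how many tuples or pairs each plan examines. The data is synthetic, and the "better" plan shown here (filter S first, then a hash join) is just one of many choices an optimizer might consider.

```python
def join_then_select(r_rows, s_rows):
    """Brute force: examine every (r, s) pair, then test the whole condition."""
    examined, out = 0, []
    for r in r_rows:
        for s in s_rows:
            examined += 1
            if r["A"] == s["A"] and s["B"] > 1000:
                out.append((r["A"], s["B"]))
    return out, examined

def select_then_hash_join(r_rows, s_rows):
    """Apply S.B > 1000 first, then join through a hash table on A."""
    examined = len(s_rows)                      # one pass over S for the selection
    by_a = {}
    for s in s_rows:
        if s["B"] > 1000:
            by_a.setdefault(s["A"], []).append(s)
    out = []
    for r in r_rows:                            # one pass over R to probe the table
        examined += 1
        for s in by_a.get(r["A"], []):
            out.append((r["A"], s["B"]))
    return out, examined

# Synthetic relations: 100 tuples each.
R = [{"A": i} for i in range(100)]
S = [{"A": i, "B": i * 20} for i in range(100)]

slow_rows, slow_cost = join_then_select(R, S)
fast_rows, fast_cost = select_then_hash_join(R, S)
print(slow_cost, fast_cost, slow_rows == fast_rows)
```

Both plans return the same answer, but the second examines a few hundred tuples instead of ten thousand pairs; choosing between such plans is the job of the query optimizer.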
What’s wrong with the Megatron 3000 DBMS?
• No concurrency control or reliability, e.g.,
  - if two client programs read your bank balance ($5000) and add $1000 to it…
  - Crash.
• Solution: Transaction management!
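The bank-balance problem is often called a lost update. The Python sketch below first replays the bad interleaving step by step, then shows two threads depositing under a lock. The lock is only a simplified stand-in for what a DBMS transaction manager provides (it does nothing about crashes), and the `Account` class is hypothetical.

```python
import threading

# Part 1: the bad interleaving, replayed deterministically.
balance = 5000
read_a = balance            # client A reads 5000
read_b = balance            # client B reads 5000 before A writes
balance = read_a + 1000     # A writes 6000
balance = read_b + 1000     # B also writes 6000, overwriting A's update
lost_update_balance = balance

# Part 2: make read-modify-write one indivisible step.
class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        with self._lock:    # only one client at a time may read and then write
            current = self.balance
            self.balance = current + amount

account = Account(5000)
clients = [threading.Thread(target=account.deposit, args=(1000,)) for _ in range(2)]
for t in clients:
    t.start()
for t in clients:
    t.join()
print(lost_update_balance, account.balance)
```

Without coordination one deposit disappears (6000 instead of 7000); with the read and write grouped into one protected unit, both deposits survive.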