cs3223: database systems implementationtankl/cs3223/slides/intro.pdf · cs3223: a “second” db...

18
CS3223: Database Systems Implementation (http://www.comp.nus.edu.sg/~tankl/cs3223) Knowledge is of two kinds: we know a subject ourselves, or we know where we can find information upon it.” -- Samuel Johnson (1709-1784) CS3223 - Introduction 1

Upload: ledan

Post on 05-Apr-2018

236 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

CS3223: Database Systems Implementation(http://www.comp.nus.edu.sg/~tankl/cs3223)

“Knowledge is of two kinds: we know a subject ourselves, or we know where we can find information upon it.”

-- Samuel Johnson (1709-1784)CS3223 - Introduction 1

Page 2: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

Course Admin

Course Instructor: Professor TAN Kian‐LeeEmail: [email protected]: COM1, #03‐23URL: http://www.comp.nus.edu.sg/~tankl/cs3223

TA: TBA

CS3223 - Introduction 2

Page 3: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

CS3223: A “Second” DB Course

• Database design– ER design and 

normalization theory– Integrity management by 

design– Relational model

• Database programming– Relational algebra and 

calculus– Query languages  

• SQL• Embedded languages

CS2102 (Application) CS3223 (Implementation)

CS3223 - Introduction 3

Database Implementation• Storage management

Disk‐based data organization for efficient data access

Indexes Buffer management

• Query processing• Sorting • Query processing and 

optimization• Transaction management

– Concurrency control and recovery

Page 4: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

Database Courses @ SoC

CS3223 - Introduction 4

CS6220Advanced Topics in Data Mining

CS5228Knowledge 

Discovery & Data Mining

CS5226Database Tuning

CS3223Database Systems Implementation

CS2102Database Systems

CS4221 Database 

Applications Design & Tuning

CS6203Advanced Topics in DB Systems

CS4224Distributed Databases

CS4225 Massive Data Processing 

Techniques in Data Science

Page 5: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

Course Policies• Students are responsible for the following:

– Attending lectures & tutorials– Studying referenced materials– Checking course website/IVLE/emails for course‐related 

announcements/updates• Late assignment/project submissions will not be accepted 

without prior approval from lecturer• Students can seek clarifications on lecture materials as 

follows:– Post questions to IVLE Discussion Forum– Consult the lecturer during consultation hours– Email the lecturer

• Emailed questions may be posted to IVLE Forum and answered there or emailed to all

CS3223 - Introduction 5

Page 6: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

Course Policies (cont.)

• Zero‐tolerance for plagiarism– Plagiarism resources

• http://www.cdtl.nus.edu.sg/ug/resources/plagiarism.htm

– Plagiarism prevention• http://cit.nus.edu.sg/plagiarism‐prevention/

– Plagiarism E‐Module• http://emodule.nus.edu.sg/ac/

CS3223 - Introduction 6

Page 7: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

Course Policies (Cont.)

http://www.comp.nus.edu.sg/undergraduates/plagiarism.html

All students share the responsibility for upholding the academic standards and reputation of the University. Academic honesty is a prerequisite condition in the pursuit and acquisition of knowledge. Academic dishonesty is any misrepresentation with the intent to deceive or failure to acknowledge the source or falsification of information or inaccuracy of statements or cheating at examinations/tests or inappropriate use of resources. There are many forms of academic dishonesty and plagiarism is one of them. Plagiarism is generally defined as the practice of taking someone else’s work or ideas and passing them off as one’s own (The New Oxford Dictionary of English). The University does not condone plagiarism.

CS3223 - Introduction 7

Page 8: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

Course Policies (Cont.)

http://www.nus.edu.sg/registrar/adminpolicy/acceptance.html

• Students should adopt this rule ‐ You have the obligation to make clear to the assessor which is your own work, and which is the work of others. Otherwise, your assessor is entitled to assume that everything being presented for assessment is being presented as entirely your own work. This is a minimum standard.

• A student may not knowingly intend to plagiarise, but that should not be used as an excuse for plagiarism. Students should seek clarification from their instructors or supervisors if they are unsure whether or not they are plagiarising the work of another person.

CS3223 - Introduction 8

Page 9: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

Introduction

CS3223 - Introduction 9

Page 10: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

Isn’t Implementing a (Relational) Database (Management) System Simple?

Relations(Tables)

+Algorithms

Statements(SQL Queries)

Results(Output)

CS3223 - Introduction 10

Page 11: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

myDBMS: A Simple Implementation• Relations stored in files (ASCII), e.g., relation R is in /usr/db/R

• Directory file (ASCII) in /usr/db/directory

Smith # 123 # CSJones # 522 # EE

CS3223 - Introduction 11

R1 # A # INT # B # STR …R2 # C # STR # A # INT …

Page 12: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

Sample Sessions

% myDBMSWelcome to myDBMS!

&

& quit%

...

CS3223 - Introduction 12

Page 13: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

& select * from R where R.B = 123 #

Relation RA B CSMITH 123 CS

&

Sample Sessions

CS3223 - Introduction 13

Page 14: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

• To execute “select * from R where condition”:(1) Read dictionary to get R attributes(2) Read R file, for each line:

(a) Check condition(b) If OK, display

Sample evalutation strategy

CS3223 - Introduction 14

Page 15: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

What’s wrong with myDBMS?

• Tuple layout on disk– e.g., Change string from ‘Cat’ to ‘Cats’ and we have to rewrite file– ASCII storage is expensive– Deletions are expensive

• Search expensive; no indexes – Cannot find tuple with a given key quickly– Always have to read/scan full relation

• Brute force query processing

CS3223 - Introduction 15

Page 16: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

What’s wrong with myDBMS?

• No buffer manager– Need caching

• No concurrency control• No reliability

– Can lose data– Can leave operations half done

• No security– File system insecure– File system security is coarse

CS3223 - Introduction 16

• No application program interface (API)– How can a payroll program get at 

the data?

• No GUI• Poor dictionary facilities

• Cannot interact with other DBMS

Page 17: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

CS3223 - Introduction 17

Architecture of a DBMS

Files & Access Methods

Buffer Manager

Disk Space Manager

TransactionManager

LockManager

Concurrency Control

RecoveryManager

(Logging & Recovery)

Operator Evaluator

Plan Executor Parser

OptimizerQuery Execution Engine

DBMS

Index FilesData Files System Catalog Database

Applications (Web Forms, SQL Interface)

IntegrityManager

AccessControlManager

Page 18: CS3223: Database Systems Implementationtankl/cs3223/slides/intro.pdf · CS3223: A “Second” DB Course • Database design – ER design and normalization theory – Integrity management

Capabilities of a Modern (R)DBMS

• Persistence ‐ permanent storage of data• Efficiency ‐manage large volumes of data efficiently• High‐level access ‐ data model & language for defining database 

structures, retrieval and manipulation• Transaction management ‐ provide correct, concurrent access to 

the database by many users at once• Access control ‐ limit access by unauthorized users• Integrity management ‐ assure compliance to known constraints 

imposed by application semantics• Resiliency ‐ ability to recover from system failures without losing 

dataCS3223 - Introduction 18