fall 2008, infs614 1 database management systems infs 614-001 fall 08 instructor: carlotta...

50
Fall 2008, INFS614 Database Management Systems Database Management Systems INFS 614-001 INFS 614-001 Fall 08 Fall 08 Instructor: Carlotta Domeniconi carlotta @cs.gmu.edu http://www.cs.gmu.edu/~carlotta/teaching/ INFS-614-s08/info.html Lecture One Lecture One - - Introduction Introduction

Upload: caroline-fox

Post on 28-Dec-2015

223 views

Category:

Documents


0 download

TRANSCRIPT

Fall 2008, INFS614 1

Database Management Database Management SystemsSystems

INFS 614-001INFS 614-001Fall 08Fall 08

Instructor: Carlotta [email protected]

http://www.cs.gmu.edu/~carlotta/teaching/INFS-614-s08/info.html

Lecture OneLecture One - - IntroductionIntroduction

Fall 2008, INFS614 2

OutlineOutline

Course syllabus– Course Schedule– Homeworks & exams

Satisfaction of prerequisites– Strictly enforced: GMU HONOR CODE

applies!

Introduction to DB & DBMS– Outline of the entire course material

Fall 2008, INFS614 3

Front mattersFront matters To communicate with me:

– Email: [email protected], I will try to reply promptly.

– Office hours: by appointment.

Sign up for your Mason account. You may forward all your Mason emails to your favorite email address.

Fall 2008, INFS614 4

Front matters (cont.)Front matters (cont.)

GTA:– Huaming Liu and Chun-Kit Ngan– Email: [email protected]– Office hours: Tuesday, 4-6pm

(Huaming)– Room: 330 -ST2.

Fall 2008, INFS614 5

Front matters (cont.)Front matters (cont.) Required textbooks:

– Database Management Systems, 3rd ed. by Raghu Ramakrishnan & Johannes Gehrke, McGraw-Hill.

– Oracle 9i Programming: a Primer, by Rajshkhar Sunderraman, Addison Wesley, ISBN 0-321-19498-5

On-Line Course Resources: http://www.cs.gmu.edu/~carlotta/teaching/

INFS-614-f08/info.html– You are required to read all the material

there. The content will be updated frequently. So check the web site periodically, at least once every week, and every time before class!

Fall 2008, INFS614 6

Date Topic (chapter/section) HW assignment HW due

Aug 28 Introduction (chapter 1)    

Sep 4 ER model (chapter 2) 1  

Sep 11 Relational Model (chapter 3)     

Sep 18Relational Algebra (sections 4.1-4.2)

2 1

Sep 25 Relational Algebra (continued)    

Oct 2 SQL (sections 3.4, 5.1-5.5) 3 2

Oct 9 Review    3  

Oct 16 Midterm Exam

Oct 23 SQL (sections 5.6-5.15) 4

Oct 30 Functional Dependencies  

Nov 6Functional Dependencies (continued)

 5 4 

Nov 13 Decomposition and Normal Forms   

Nov 20 Advanced topics (or catch up)   5 

Dec 4 Review

Dec 11 Final Exam    

Fall 2008, INFS614 7

Submission and GradingSubmission and Grading Late submissions are not accepted: no exceptions! No make-up exams! On-time: within 5 minutes after the class

begins. Important: your homeworks must run

properly under the Oracle system in the labs. Final grades:

– homework assignments (20%)– Project (15%)– midterm exam (25%) – final exam (40%)

Fall 2008, INFS614 8

Honor Code SystemHonor Code System

GMU honor Codehttp://honorcode.gmu.edu/

For this class– Homeworks & exams require individual work. Study

groups are encouraged, but homeworks’ solutions and write up must be individual.

– Exams: individual effort, closed books

Satisfaction of prerequisites:– Honor code invoked.

Fall 2008, INFS614 9

Satisfaction of prerequisitesSatisfaction of prerequisites Prerequisites (strictly enforced)

– INFS-501 (Discrete mathematics)– INFS-515 (Computer architecture/organization)– INFS-590 (Program Design and data structures)

Specifically: – Good background in discrete mathematics

(e.g., set theory, mathematical logic, relations and functions);

– Programming (good knowledge of either C, C++ or Java);

– Data structures and algorithms, Computer architecture, and Operating systems.

Fall 2008, INFS614 10

Satisfaction of prerequisitesSatisfaction of prerequisitesFor INFS/SWE/ISA students:

Consult your letter of acceptance. It specifies your status with respect to these foundation courses. For each course, it must be that either – You were waved from the course (the

evidence should be either in the acceptance letter or in a subsequent official document).

– You took the course and received a grade of B or better.

Fall 2008, INFS614 11

Satisfaction of prerequisitesSatisfaction of prerequisites

For non-IS/SWE/ISA students, MUST DO THE FOLLOWING (by next week):– Consult the

description of each of the prerequisite courses in the university catalog.

– For each course, provide a list of one or more courses taken, that cover the subjects of that course, as follows: course-number, course-title, institution, year, final-grade;

– Syllabus of each course taken;– Copy of transcripts that shows equivalent classes taken

(with grade B or better);

– Current status;

Fall 2008, INFS614 12

Useful links for your Useful links for your computing needscomputing needs

http://labs.ite.gmu.edu/ (click on FAQ’s) for IT&E computing labs, IT&E cluster account, and Oracle DBMS information.

http://cs.gmu.edu/~ami/teaching/infs614/current/oracle.html

for information on our particular computing environment.

Fall 2008, INFS614 13

Introduction to DB and Introduction to DB and DBMSDBMS

Fall 2008, INFS614 14

What is a Database?What is a Database?

Database : A very large, integrated collection of data.

Data : Known facts about the real-world that can be recorded and have implicit meaning;

A database models real-world scenarios :– Entities – Relationships between entities

Fall 2008, INFS614 15

University DatabaseUniversity Database

Information about university environment

• Relationships :•Students’ enrollment course•Faculty teaching courses•Use of classroom for course•Prerequisite courses

• Entities :

• Students• Faculty• Courses• Classrooms

Fall 2008, INFS614 16

What is a DBMS?What is a DBMS?

A Database Management System (DBMS) is a software package designed to store, provide access and manage databases– One DBMS, many databases;

Database System: A database and a DBMS.

Why use a DBMS?

Fall 2008, INFS614 17

A simple problem: A simple problem: address listaddress list

Solution 1Solution 1: a blank notebook, entries recorded with a pen, in time order.

Advantages: simple, private, reliable, space efficient.

Disadvantages:– Hard to search;– Hard to add information (e.g., e-mail);– Hard to update information;– Hard to extract information (print Christmas cards);– Integrity and consistency (Mary Jones: see P. Jones

address, P. Jones-Smith entry);– Loosing it is a catastrophe!

Fall 2008, INFS614 18

Solution 2: a loose-leaf Solution 2: a loose-leaf notebook with n entries per notebook with n entries per

pagepage

Better:– Can keep it sorted by key;– Insertions & deletions can be done;

Same as Sol. 1 in other aspects:– No search by other keys (e.g., phone

number).

Fall 2008, INFS614 19

Solution 3: Text Solution 3: Text format,managed by text format,managed by text

editoreditor Advantages:

– Free format;– “Unlimited” size;– Easily copied (for backup);– Easily shared;– Sub-string searchable;– Easy Update.

Disadvantages:– Change requirements?

Fall 2008, INFS614 20

Complications with Complications with Solution 3Solution 3

File gets very large– Search gets slow and possibly imprecise.(E.g., search for “Elm Street” may yield “Wilhelm Street”)

SolutionSolution: structure entries into records with fields and add indexes over fields.

Database Concepts: Record Organization, Keys, Indexes

Fall 2008, INFS614 21

Complication 2: need to Complication 2: need to separate families from separate families from

addressesaddresses Why?

– People move;– Might forget to update addresses;– Want space economy: single point of update;– Important to separate for applications: 1 Christmas

card per residence! SolutionSolution: two files (one: people, one:

addresses). How do we link them? How many residences a person can have?

Database Concepts: Consistency, Normalization, Foreign Keys

Fall 2008, INFS614 22

Complication 3: multiple Complication 3: multiple associationassociation

People own, rent, manage residences May want to impose constraints in the

number of residences per person or vice versa.

Examples: Many to many (rich people); Many to one (single family); One to many (Builder); One to one (legal residence)Database Concepts: Relationships; Cardinality

Fall 2008, INFS614 23

Complication 4: dynamic Complication 4: dynamic nature of the datanature of the data

Add new information:– Cards sent and received– Zip+4

Requirements:– Adding fields– Summarizing

Database Concepts: Data Abstraction; Data Evolution

Fall 2008, INFS614 24

Complication 5: Ad hoc Complication 5: Ad hoc analysis and retrievalanalysis and retrieval

Example: – Find who sent me cards over the past 5

years, but received less than 3 cards from me.

Requirements:– A language– An implementation of retrieval functions

(correct and efficient).

Database Concepts: Query languages; Query optimization

Fall 2008, INFS614 25

Complication 6: SharingComplication 6: Sharing Different users, different organizations

– Other family members want to see names and addresses together

– You don’t want to give update access over your business contacts to anybody.

Solutions:– Use stored queries as ”windows” or “views” over the

database.– Ability to “reunite” data from different files.– Data not selected by the query is “not there”– Permissions

Database Concepts: Joins; Views; Security

Fall 2008, INFS614 26

Complication 7: Required Complication 7: Required existence of associated existence of associated

datadata Examples:

– Can’t send Christmas card to somebody without an address

– Names are not unique; only when associated with residence.

Solution:– Don’t insert a name if there is no address and vice

versa– Or tolerate multiple non-unique names

Database Concepts: Referential Integrity; Weak entity sets

Fall 2008, INFS614 27

Complication 8: Multiple Complication 8: Multiple updates in an all or none updates in an all or none

basisbasis Examples

– Two households merge (marriage)– Need to change residences (or other data)

for a group of people– Computer crashes in the middle of updates

Solution– Illusion of updates being done

simultaneously– Commit or rollback an entire chunk of workDatabase Concepts: Transactions; ACID

properties; Recovery

Fall 2008, INFS614 28

Complication 9: computer Complication 9: computer crashescrashes

Will I have my data after the crash?– Uncorrupted?– Consistent?

Solution:– Make sure data is available uncorrupted at a

point in the past (checkpoint)

Database Concepts: Durability; Consistency (ACID); Recovery

Fall 2008, INFS614 29

Complication 10: Complication 10: multimediamultimedia

Pictures, Audio, Text, … Requirements:

– Ability to store new data types– Content search– Integration with text and numeric data

Database Concepts: Multimedia databases; Query by content

Fall 2008, INFS614 30

Complication 11: You Complication 11: You become President! …become President! …

Of something …(US, Corporation, Local chapter of charity, your household)– Your address list grows exponentially– You realize some of the information is useful!

Examples:– zip codes in states where there are less than 5%

difference in Rep./Dem. Votes in 2004?– Which combinations of products sold best last year?

Database Concepts: Data Warehousing; Data Mining

Fall 2008, INFS614 31

Files vs. DBMSFiles vs. DBMS

Application must store large datasets between main memory and secondary storage (e.g., buffering, page-oriented access, etc.);

Special code to answer different queries;

Must protect data from inconsistency due to multiple concurrent users;

Crash recovery; Security and access control.

Fall 2008, INFS614 32

Why Use a DBMS?Why Use a DBMS?

Easier and More Efficient

Data independence and efficient access; Reduced application development time; Data integrity and security; Uniform data administration; Concurrent access, recovery from

crashes.

Fall 2008, INFS614 33

Data ModelsData Models A data model is a collection of concepts for

describing data. A schema is a description of a particular

collection of data, using a given data model. The relational model of data is the most

widely used model today.– Main concept: relation, basically a table with

rows and columns.– Every relation has a schema, which describes

the columns, or fields.

Fall 2008, INFS614 34

Relational ModelRelational Model

The main concept is a relation:– A table with rows and columns

Each row in the table is called a tuple

sid name login age gpa

5366 J ones J [email protected] 18 2.4

5498 Boon [email protected] 19 3.9

6756 Gioff rey Gioff [email protected] 23 3.5

.. … … … ..

Fall 2008, INFS614 35

Relational Model (cont.)Relational Model (cont.)

The relation schema specifies: – name of the relation,

– name of each attribute (column,field) and its type. Every attribute has an atomic type.

Relation Name

Student(sid:string, login:string, age:integer, gpa:real);

Attribute Name A Relation (Relation instance): a set of tuples.

Fall 2008, INFS614 36

Levels of AbstractionLevels of Abstraction

Many views, single conceptual (logical) schema and physical schema.– Views describe how

users see the data.

– Conceptual schema defines logical structure

– Physical schema describes the files and indexes used. Schemas are defined using DDL (Data Definition Language);

Data is modified/queried using DML (Data Manipulation Language).

Physical Schema

Conceptual Schema

View 1 View 2 View 3

Fall 2008, INFS614 37

Levels of AbstractionLevels of Abstraction

* Conceptual Schema : the data is described through the data model. It describes structure and constraints for the whole database.

* External Schema : how the users see and use the data. Many views of the data.

* Physical schema : describes the physical structure of the DB

* Mappings among schema levels are also needed. Programs and applications refer to an external schema, and are mapped by the DBMS to the conceptual schema for execution.

Conceptual, External Schemas are defined using Data Definition Language (DDL) : specification for defining the database schema

Fall 2008, INFS614 38

Example: University Example: University DatabaseDatabase

Conceptual schema:– Student (sid: string, name: string, login:

string, age: integer, gpa: real)– Courses (cid: string, cname: string, credits:

integer)– Enrolled (sid: string, cid: string, grade:

string) Physical schema:

– Relations stored as unordered files.– Index on first column of Students…

External schema (View):– Course_info (cid: string, enrollment: integer)

Fall 2008, INFS614 39

Data IndependenceData Independence

Applications insulated from how data is structured and stored.

Logical data independence: Protection from changes in logical structure of data.

Physical data independence: Protection from changes in physical structure of data. One of the most important benefits of using a DBMS!

Fall 2008, INFS614 40

Easy Manipulation & Access of the Easy Manipulation & Access of the InformationInformation

A DBMS enables users to create , update and query the data through Data Languages

Data Definition Language (DDL)

– Specification notation to create the Database schema

Data Manipulation Language (DML)

– A language for manipulating the data : updating the data and accessing the data

– The portion of a DML that allows to access the information through formulating queries is called the Query Language

Query : Request for retrieving data stored in a DBMS.

Fall 2008, INFS614 41

Concurrency ControlConcurrency Control Concurrent execution of user programs

is essential for good DBMS performance– Because disk accesses are frequent, and

relatively slow, it is important to keep the cpu not idle by working on several user programs concurrently.

Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is being computed.

DBMS ensures such problems don’t arise: users can pretend they are using a single-user system.

Fall 2008, INFS614 42

Concurrency Control Concurrency Control (cont.)(cont.)

Example: One course still has space for one

more student. Two students are trying to enroll in

that course at the same time. The DBMS executes the two

requests in a serial order. Thus, only one student will be

enrolled.

Fall 2008, INFS614 43

TransactionTransaction

An execution of a DB program Key concept is transaction, which is an

atomic sequence of database actions (reads/writes).

ACID properties– A – Atomicity– C – Consistency– I – Isolation– D – Durability

How: log and concurrency control sub-system

Fall 2008, INFS614 44

Scheduling Concurrent TransactionsScheduling Concurrent Transactions

DBMS ensures that execution of {T1,…, Tn} is equivalent to some serial execution T1’…Tn’. – Before reading/writing an object, a transaction

requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2PL locking protocol.)

– Idea: If an action of Ti (say, writing X) affects Tj (e.g., reads X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions.

Fall 2008, INFS614 45

Ensuring AtomicityEnsuring Atomicity

DBMS ensures atomicity (all-or-nothing property) even if system crashes in the middle of a transaction.

Idea: Keep a log (history) of all actions carried out by the DBMS while executing a set of transactions:– Before a change is made to the database, the

corresponding log entry is forced to a safe location. (WAL – Write-Ahead Log – protocol);

– After a crash, the effects of partially executed transactions are undone using the log.

Fall 2008, INFS614 46

The LogThe Log

The following actions are recorded in the log:– Ti writes an object: the old and new value.

Log record must go to disk before the changed page!

– Ti commits/aborts: a log record indicating this action.

Log records chained together by the transaction id, so it’s easy to undo a specific transaction (e.g., resolve a deadlock)

All log related activities are handled transparently by the DBMS.

Fall 2008, INFS614 47

StructureStructure of a DBMS of a DBMS

A typical DBMS has a layered architecture

Each layer is composed of several modules

The architecture varies from vendor to vendor

Fall 2008, INFS614 48

• High initial investment and possible need for additional hardware.

• Overhead for providing generality, security, recovery, integrity and concurrency control.

Main cost of using aMain cost of using a DBMSDBMS

• If the Database and application are simple, well-defined and not expected to change.

• If there are stringent real-time requirements, that may not be met due to DBMS overhead.

• If access to data by multiple users is not required.

When aWhen a DBMS DBMS may be may be unnecessaryunnecessary

Fall 2008, INFS614 49

Database UsersDatabase Users End users (or DB application users) DB application programmers (more precisely,

they are DBMS users)– E.g. smart webmasters– This course is mostly to learn how to (start to)

be a DB application programmer. Database administrator (DBA)

– Designs logical /physical schemas– Handles security and authorization– Data availability, crash recovery – Database tuning as needs evolve

Must understand how a DBMS works!

Fall 2008, INFS614 50

SummarySummary DBMS used to maintain, query large datasets. Benefits include recovery from system

crashes, concurrent access, quick application development, data integrity and security.

Levels of abstraction give data independence. We will learn how to

– Set up a database Design (ERD and Relational Model), and refine (Relational

Normalization Theory)

– Use to query the database Relational Algebra and SQL