introduction to database systems ruoming jin tth 9:15 – 10:30pm spring 2009 rm msb115

54
Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Post on 20-Dec-2015

228 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Introduction to Database Systems

Ruoming JinTTH 9:15 – 10:30pm

Spring 2009rm MSB115

Page 2: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Course GoalsThis course is an introduction to the design, use, and internal workings of database management system.

1. Understanding Relational Model/Algebra: data is represented as a set of two dimensional tables and operators are defined by manipulate the data.

2. Mastering SQL programming and mySQL database: you will master the operations on a database system.

3. Database design: how to build a real database for an real application?

4. Web+Database: PHP and C++ access of mySQL. 5. Database System Implementation:

Storage/Indexing/Transactions 1 & 5 separate a CS major with an IT professional 1, 2, 5 are the basics of a database system.

Page 3: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

References

• A. Silberschatz, H. F. Korth, S Sudarshan,

Database System Concepts, 5th Ed., McGrow Hill, 2005

http://www.db-book.com

• Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom,

Database Systems, The Complete Book, Prentice Hall, 2002

http://www-db.stanford.edu/~ullman/dscb.html

• Class notes

Page 4: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Prerequisites

• CS 33001 – Data Structures

• CS31011 – Discrete Structures

• Structured Programming Language (C++)

• Software engineering topics related to project documentation and project design

Page 5: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Workload & Requirements

• Project• 3 Exams during the semester and Final Exam• Project 20% of the final grade 2 Exams 15% of the final grade per each exam 1 Exam 20% of the final grade Final 25% of the final grade Attendance 5%• No late projects are accepted• A – 91 – 100; B – 80-90; C – 70–79; D - >64

Page 6: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Class Schedule

• Week 1 – Database Overview

• Week 2,3 – Relational Model (exam 1)

• Week 4,5,6 –SQL+ mySQL database (2 labs+exam 2)

• Week 7,8, 9 – ER model+ Relational Database Design Theory (1 lab+ exam 3) -> project assignment

• Week 10, 11,12 – PHP programming + C++ database access (1 lab)

• Web 13, 14,15 – Advanced topic (indexing/Query Procesing/Transaction management) + (1 lab)

Page 7: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Database Overview

• File Management vs Database Management (why do we need database?) – Advantages of Database systems: storage persistence, programming interface, transaction management

• Data Model (What is Data?) • Database Language (How to manipulate data?) • DBMS Architecture and Database System Components

(How can you build a billion-dollar software, like Oracle? Or you can get it free, mySQL?)

• Users classification (What you can do and what you cannot do?)

Page 8: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Where are databases?

• You cannot avoid it and it’s everywhere!

• You can say it actually makes the current society and your life work!

• Banking/Credit card /Social Security Info…

• Online shopping/booking…

Page 9: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115
Page 10: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115
Page 11: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115
Page 12: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115
Page 13: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

The DBMS Marketplace

• Relational DBMS companies – Oracle, Sybase – are among the largest software companies in the world.

• IBM offers its relational DB2 system. With IMS, a nonrelational system, IBM is by some accounts the largest DBMS vendor in the world.

• Microsoft offers SQL-Server, plus Microsoft Access for the cheap DBMS on the desktop, answered by “lite” systems from other competitors.

• OpenSource: mySQL, postgreSQL

Page 14: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Pre-Database Era: Stone Age of Data

• Imagine you want build an online shopping website– Maintain products/categories (price, picture, properties, …)

– Customers accounts

• File is uninterpreted, unstructured collection of information• File operations: delete, catalog, create, rename, open, close,

read, write, find, …• Access methods: Algorithms to implement operations along

with internal file organization• Examples: File of Customers, File of Products; Access

method: implementation of a set of operations on those files

Page 15: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

C++ file programming

• open - open a file- specify how its opened (read/write) and type (binary/text)

• close - close an opened file

• read - read from a file

• write - write to a file

• seek - move a file pointer to somewhere in a file

Page 16: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

File Management System Problems

• Any question (access) on the data is a small program!!

• Data redundancy

• Data is not isolated from the access implementation (different format…)

• Multiple application (concurrent program) on the same file

Page 17: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Concurrent Program Execution

Program1

What is the final value of the account AC?

Program2

AC=AC-50

AC=AC-100

AC #103 450

Page 18: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Security Problems

• Allow access to the file only to the authorized personnel

• Ability to restrict access to parts of the record

• Ability to control operation usage by different users

• Protection from unauthorized use

• Protection from the derivation of unauthorized information

Page 19: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Data Integrity

• A database constraint is a logical constraint about the data expressed in a logical language.

– STUDENT.AGE >15

– If (STUDENT.CLASS ==cs43005) then (STUDENT.PRIOR_CLASS ==cs31001)

• Database is consistent if data at each time satisfies all integrity constraints.

• Input to any application is a set of consistent data. An application output is a set of consistent data.

Page 20: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Collection of Files

Hierarchical

60’s

70's

80's

90’s

now

RelationalChoice for most new

applications

Object Bases

Knowledge Bases

Network

Page 21: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Advantages of Databases

• Persistent Storage – Database not only provides persistent storage but also efficient access to large amounts of data

• Programming Interface – Database allows users to access and modify data using powerful query language. It provides flexibility in data management

• Transaction Management – Database supports a concurrent access to the data

Page 22: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Early Database Applications

• Airline Reservation Systems – Data items are: single passenger reservations; Information about flights and airports; Information about ticket prices and tickets restrictions.

• Banking Systems – Data items are accounts, customers, loans, mortgages, balances, etc. Failures are not tolerable. Concurrent access must be provided

• Corporate Records – Data items are: sales, accounts, bill of materials records, employee and their dependents

Page 23: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Modern Database Applications

• Client – Server architecture– DBMS serves as a server and client queries are sent to servers– Where to locate servers

• Multimedia Applications• Multidatabase Applications• Data Warehouses• It’s everywhere!!

Page 24: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Three Aspects to Studying DBMS's

1. Modeling and design of databases.

– Allows exploration of issues before committing to an implementation.

2. Programming: queries and DB operations like update.

3. DBMS implementation.

Page 25: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

What Is Data ?

• Different view points:

– A sequence of characters stored in computer memory or storage

– Interpreted sequence of characters stored in computer memory or storage

– Interpreted set of objects

• This maybe one of the most profound questions in computer science! It is still open and keep evolving!!

Page 26: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Data Levels and their Roles

• Physical – corresponds to the first view of data: How data is stored, how is it accessed, how data is modified, is data ordered, how data is allocated to computer memory and/or peripheral devices, how data items are actually represented (ASCI, EBCDIC,…)

• Conceptual – corresponds to the second view of data: What we want the data to express and what relationships between data we must express, what “ story” data tells, are all data necessary for the “story’ are discussed.

• View – corresponds to the third view of data:What part of the data is seen by a specific application

Page 27: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Physical Data - Example

• Physical

benjamin 63

10 3

0000035000 Jjames

6 10

3

3 6

000375 . . . . . . . . .

Page 28: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Examples

• Conceptual– TA

Name char(10), Age char (3), Salary Fixed Dec(6); - Student Name char(10), Year-of_study char(3) GPA Fixed Dec(5,2);

Page 29: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Examples

- STUDENTS-TA

Name char(25),

Age char (3),

Salary Fixed Dec(8,2),

Year-of_study char(3)

GPA Fixed Dec(3,2);

A view

Page 30: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Three Level Data View –Data Abstractions

View1 View k

Conceptual View Of Data

Phyisal Data Storage

. . . . .

Page 31: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Logical Data Models

• A collection of tools for describing – data – data relationships– data semantics– data constraints

Page 32: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

A Break-through : Relational Model

• An enterprise is represented as a set of relations

• Domain – is a set of atomic values. Each domain has a NULL value.

• Data type – Description of a form that domain values can be represented.

• Relation is a subset of a cartesian product of one or more domains

• The elements of relations are called tuples. Each element in the cartesian product is called attribute.

Page 33: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Relational Model

• Example of tabular data in the relational modelNameStudent-id

Street City gpa

Johnson

Smith

Johnson

Jones

Smith

192-83-7465

019-28-3746

192-83-7465

321-12-3123

019-28-3746

Alma

North

Alma

Main

North

Palo Alto

Rye

Palo Alto

Harrison

Rye

3.6

2.7

3.2

4.0

3.45

Attributes

Page 34: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Object – Oriented Model

• An enterprise is described as a collection of objects and a collection of algorithms that work with objects

• Example: Person is an object.• Object is characterized by a set of public attributes.

Applications may refer only to public attributes; private attributes . Algorithms that implement the object may refer to private attributes; a set of protected attributes and a set of methods

• Attribute of an object can be another object• Objects are nested into a hierarchy and can inherit

attributes of their parents

Page 35: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Object Oriented Model

OBJECT DATA MODEL

1. Complex Objects – Nested Structure (pointers or references)

2. Encapsulation, set of Methods/Access functions3. Object Identity4. Inheritance – Defining new classes like old classes

Object model: usually find objects via explicit navigationAlso query language in some systems

Page 36: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Example

• Class Person{ public: Person(); ~Person(); float GetSalary(); float PutSalary(float&); string Name; int SSN; date BirthDate; private: float salary; }

Page 37: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Object-Oriented ModelData Encapsulation

• An object contains both data and methods to work with the data

• The physical data representation is visible only to the object creator.

• The implementation details of methods are not visible to object users

• An interface of the object consists of public attributes and methods

• Each object is characterized by an object identity

Page 38: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Data Manipulation Language

• Language for accessing and manipulating the data organized by the appropriate data model

• Two classes of languages – Procedural – user specifies what data is required and

how to get those data

– Nonprocedural – user specifies what data is required without specifying how to get those data

• SQL is the most widely used query language

Page 39: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Database Languages

Faculty

Name Dept

Department

Dept Chair

SQL

SELECT Chair

FROM Faculty, DepartmentWHERE Faculty.name = “Ken Noname”

AND Faculty.Dept = Department.Dept

Data definition language (DDL) ~ like type definitions in C or C++

Data Manipulation Language (DML)Query (SELECT)UPDATE < relation name >SET <attribute> = < new-value>WHERE <condition>

Page 40: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Data Definition Language

• Specification notation for defining the database schema– E.g.

create table account ( account-number char(10), balance integer)

• DDL compiler generates a set of tables stored in a data dictionary

• Data dictionary contains metadata (i.e., data about data)– database schema – Data storage and definition language

• language in which the storage structure and access methods used by the database system are specified

• Usually an extension of the data definition language

Page 41: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Database Host Languages C, C++, Fortran, Lisp, COBOL

Application prog.

Local Vars

DBMSCalls to

DB

Host language is completely general

Query language—less general "non procedural" andoptimizable

(Memory)

(Storage)

Page 42: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Database Definition

• A database is a collection of stored operational data used by various applications and/or users by some particular enterprise or by a set of outside authorized applications and authorized users

• A DataBase Management System (DBMS) is a software system that manages execution of users applications to access and modify database data so that the data security, data integrity, and data reliability is guaranteed for each application and each application is written with an assumption that it is the only application active in the database.

Page 43: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

DBMS Architecture

Page 44: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Logical and Physical Database Components

• Data Definition Language (DDL)• Data Manipulation Language (DML)• Host Language Interface• Data Administrator• Users• Query Processor

– Compiler– Optimizer

• Management– Transaction Manager– File Manager– Buffer Manager– Authorization and Integrity Manager

Logical

Physical

Page 45: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Query Processor

• Compiler – verifies whether a program or query is written in accordance with DDL and DML rules

• Optimizer – Finds the most effective way to access the required data and supply it in a user requested form. Monitors the query execution and modifies a query evaluation plan if necessary.

Page 46: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Storage Management

• Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.

• The storage manager is responsible to the following tasks: – interaction with the file manager – efficient storing, retrieving and updating of data

Page 47: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Transaction Manager

• A transaction is a collection of operations that performs a single logical function in a database application

• Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.

• Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.

Page 48: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

File Manager

• File Manager is responsible for mapping logical database units (objects, relations, etc.) into a set of low level files.

• It is responsible for maintenance of files and indexes on them. It should be able to create and destroy index and collect unused storage space to eliminate an unneeded gaps on disks.

Page 49: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Buffer Manager

• Buffer Manager is responsible for the allocation and maintenance buffer space in a memory to facilitate processing database data by several concurrent applications.

• Buffer Manager decides when to load data from a buffer to a database or discard the data and under what conditions a new data should be put into a buffer

Page 50: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Authorization and Integrity Manager

• This manager is responsible for granting an access to database or portions thereof only to authorized users and preventing the access to unauthorized users

• Integrity manager must assure data integrity during normal database operations as well as during the database failures

Page 51: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Data Administrator

• Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs.

• Database administrator's duties include:– Schema definition– Storage structure and access method definition– Schema and physical organization modification– Granting user authority to access the database– Specifying integrity constraints– Acting as liaison with users– Monitoring performance and responding to

changes in requirements

Page 52: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Database Users

• Naïve – do not know about database too much, invoke application programs that are prepared already

• Application Programmers – know how to interact with the system but may not know how DBMS is designed

• Sophisticated users that know advanced use of the system and can use the system and packages on the top of the system

• DBMS system users – write specialized database applications that do not fit into the traditional data processing framework

Page 53: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

A Little Design Methodology: Entity-Relationship Model

• The enterprise data can be described as a set of entities and a set of relationships between them.

• Entity – a data that pertains to, or describes some component of the enterprise

• Each entity is characterized by a set of attributes

• Relationship – describes an interconnection between different entities

• Entity Set – a set of entities that are characterized by the same entity definition

• Relationship Set – a set of relationships of the same type

Page 54: Introduction to Database Systems Ruoming Jin TTH 9:15 – 10:30pm Spring 2009 rm MSB115

Entity-Relationship Model

Example of schema in the entity-relationship model