first 4 weeks
DESCRIPTION
First 4 Weeks. Introduction to Databases Course Information Grading and Other Things Questionnaire The Relational Data Model Relational Algebra / SQL Part1 The E/R Data Model SQL Part2. Week1/2. Weeks 2-4. Textbooks for COSC 6340. - PowerPoint PPT PresentationTRANSCRIPT
Christoph F. Eick Introduction File and Database Systems
First 4 Weeks
1. Introduction to Databases2. Course Information 3. Grading and Other Things4. Questionnaire5. The Relational Data Model6. Relational Algebra / SQL Part17. The E/R Data Model 8. SQL Part2
Week1/2
Weeks 2-4
Christoph F. Eick Introduction File and Database SystemsTextbooks for COSC 6340
Required Text: Raghu Ramakrishnan and Johannes Gehrke, Data Management Systems, McGraw Hill, Third Edition, 2002
Other books with relevant material: Ramez Elmasri and Shamkant Navathe, Fundamentals of Database Systems, Fourth Edition
Christoph F. Eick Introduction File and Database Systems
Lectures in COSC 34801. Basic Concepts of Database Management (2-3 classes; Chapter 1, 2.1,2.2, 2.3;
instructor teaching material) 2. Introduction to the Relational Data Model (1.5 classes; Chapter 3.1, 3.2, 3.3, 3.4, 3.6)3. Introduction to the Relational Algebra and SQL (3-4 classes; Chapter 4.2, Chapter 5)4. Conceptual Schema Design using the Entity Relationship Data Model (2-3 classes;
instructor material; Chapters 2.4, 2.5) 5. Relational Database Design and Normalization (2-3 classes; instructor material,
Chapter 19) 6. Introduction to KDD and Data Warehousing (2 classes; instructor material, Chapters
25 and 26) 7. Disks, Files, Storage Structures, Index Structures and Physical Database Design (4
classes, Chapter 8, 9, 10, 11.1, 11.2, 13, 20) 8. Internet Databases and XML (1-2 classes; Chapter 7 and 27) 9. Query Optimization (1 class; Chapter 12, only if enough time left)10. Summary: Where Do We Stand? (1 class; instructor material)
Christoph F. Eick Introduction File and Database Systems
Databases
Definition: A database is a collection of data with the following properties:
It represents certain aspect of the real-world. Its data are logically related. It is created for a specific purpose.
Christoph F. Eick Introduction File and Database Systems
Definition: A database management system (DBMS) is a set of software that are used to define, store, manipulate and control the data in a database.
define --- define data types, structures and constraints. store --- store data; provide efficient access. manipulate --- perform retrieval and update operations using a
query language. control --- control access to data.
Database System = Database + DBMS
DBMS
Christoph F. Eick Introduction File and Database Systems
A Brief History Note
Database technology has a history of about 40 years. Database technology has gone through several generations .
First Generation: File systems, 50's -- 60's A typical file system consists of a set of independent files, and
a number of application programs
Definition: A file stores a set of record (on a disk drive) all of
which have the same format.
Christoph F. Eick Introduction File and Database Systems
An Example File System
A banking system may have
files for customers, saving accounts and checking
accounts;
application programs to deposit and withdraw money, to
find balance, etc.
different files are used for customers, saving and checking
accounts
Christoph F. Eick Introduction File and Database Systems
Problems of File Systems (1)
It is difficult to support new applications. Two existing
application programs:
(i) find customers who have a checking account
(ii) find customers who have a saving account Need a new program to find the customers who have a
checking account and a saving account.
Christoph F. Eick Introduction File and Database Systems
Problems of File Systems (2)
It has no centralized control of all data. Files are often created for a particular application. Files are created and managed independently.
Christoph F. Eick Introduction File and Database Systems
Problems of File Systems (3)
There often exists severe data redundancy and inconsistency.
Checking-Account: Acct#, Owner-name, Owner-SSN, Owner-Addr, Balance, ...
Saving-Account: Acct#, Owner-name, Owner-SSN, Owner-Addr, Balance, Interest, …
Christoph F. Eick Introduction File and Database Systems
Problems of File Systems (4)
It lacks concurrency control.
Concurrency control: prevent mutual interference of concurrent requests.
Example (Airplane ticket reservation): Consider the situation
when two customers are trying to book the only ticket left
for a flight through two operators at about the same time.
Christoph F. Eick Introduction File and Database Systems
Problems of File Systems (5)
Weak security
Can not provide multiple views of the same data Lack isolation between program and data
Lack self-describing feature
Christoph F. Eick Introduction File and Database Systems
Database History (Continued)
Second Generation: Hierarchical database systems (HDBS),
late 60's -- early 70's Best known HDBS: IMS (Information Management System
of IBM). One-to-many relationships between parent records and
child records which can have different types. Data are organized in trees Records are connected by pointers.
Christoph F. Eick Introduction File and Database Systems
An IMS Query
Query: find all Binghamton University students whose major is computer science and whose GPA is higher than 3.5.
GU University (Name = `Binghamton University')
Department (Name = `Computer Science')
Student (GPA > 3.5)
L1: GNP Student (GPA > 3.5)
Goto L1
Christoph F. Eick Introduction File and Database Systems
History of Database (Continued)
Third Generation: Network database systems (NDBS), late 60's -- early 70's
Some commercial NDBSs: IDS II (Honeywell), DMS II
(UNISYS). In NDBS, record types are organized into an acyclic graph.
Main problem with HDBS and NDBS: difficult to use.
Christoph F. Eick Introduction File and Database Systems
History of Database (Continued)
Fourth Generation: Relational database systems (RDBS), early 70's -- now
Example relational DBSs: Oracle 7, Sybase, Informax, DB2, Ingres, ...
In RDBS, data are organized into tables (relations).
Christoph F. Eick Introduction File and Database Systems
History of Database (Continued)
Fifth Generation: Object-oriented and Object-Relational database systems (OODBS), 80's -- now
Example OODBSs: O2, Objectivity, ObjectStore, Versant, … Example ORDBSs: Oracle 8, Informix, UniSQL/X.
Christoph F. Eick Introduction File and Database Systems
Database Languages
Data Definition Language (DDL): used by DBA or database designer to define database schemas.
Data Manipulation Language (DML): used by database users to retrieve, insert, delete and update data in the database.Query language: The part of DML that is used to retrieve data.
Data Control Language (DCL): used by database owners and DBA to control the access of data.
Christoph F. Eick Introduction File and Database Systems
Persons Involving DBS (1)
DBMS developers: Those who design and implement
DBMS software: buffer manager, query processor,
transaction manager, interface, ... Database designers: Those who are responsible for
determining
what data should be stored in the database;how data in the database should be organized; the design of customized views; the design of special data structures to improve
the performance of the system.
Christoph F. Eick Introduction File and Database Systems
Persons Involving DBS (2)
Database administrator (DBA): Those who manage and monitor the daily operation of a database system.
authorization for database access, e.g., who can access what data in what mode.
routine maintenance: backup, install new tools, ...modification to existing database design.
Christoph F. Eick Introduction File and Database Systems
Persons Involving DBS (3)
End-users:
Casual users: those who access the database using SQL directly.
Naive users: those who access the database using pre-prepared packages.
Application programmers: Those who write menu applications for naive users, typically, through database calls embedded in a program.
Christoph F. Eick Introduction File and Database Systems
After This Course, You Will Be
familiar with the relational data model; a decent database designer; a sophisticated casual user; a good application programmer; knowledgeable with major aspects on how to use a DBMS knowledge in a few advanced topics with respect to
database systems This course will not teach you to become a database
administrator
Christoph F. Eick Introduction File and Database Systems
Popular Topics in Databases Efficient algorithms for data collections that reside on disks (or
which are distributed over multiple disk drives, multiple computers or over the internet).
Study of data models (knowledge representation, mappings, theoretical properties)
Algorithms to run a large number of transactions on a database in parallel; finding efficient implementation for queries that access large databases; database backup and recovery,…
Database design How to use database management systems as an application
programmer / end user. How to use database management systems as database
administrator How to implement database management systems Data summarization, knowledge discovery, and data mining Special purpose databases (genomic, geographical, internet,…)
Christoph F. Eick Introduction File and Database Systems
Review: Why are integrated databases popular?
IntegratedDatabase
BookkeepingDevice
Car Salesman
Christoph F. Eick Introduction File and Database Systems
Review: Why are integrated databases popular?
Avoidance of uncontrolled redundancy Making knowledge accessible that would otherwise not be
accessible Standardization --- uniform representation of data facilitating
import and export Reduction of software development (though the availability of
data management systems) Support for Parallel Access and Data Security
IntegratedDatabase
BookkeepingDevice
Car Salesman
Christoph F. Eick Introduction File and Database Systems
Data Model
Data Model
Schema (definesa set of database
states)
Current Database State
is used to define
Christoph F. Eick Introduction File and Database Systems
Schema for the Library Example using the E/R Data Model
Many-to-Many1-to-1 1-to Many Many-to-1
title
authorB#
when
phone
name
ssn
Check_out Person Book(0,35) (0,1)
Christoph F. Eick Introduction File and Database Systems
Relational Schema for Library Example in SQL/92
CREATE TABLE Book (B# INTEGER, title CHAR(30), author CHAR(20), PRIMARY KEY (B#));
CREATE TABLE Person (ssn CHAR(9), name CHAR(30), phone INTEGER, PRIMARY KEY (ssn));
CREATE TABLE Checkout( book INTEGER, person CHAR(9), since DATE, PRIMARY KEY (book), FOREIGN KEY (book) REFERENCES Book, FOREIGN KEY (person) REFERENCES Person));
Christoph F. Eick Introduction File and Database Systems
Example Instances
Person(name, ssn, phone): (Eick,111111111.33345), (Miller, 222222222,33337)
Book(B#,title, author): (1,Today,Yu), (2, Today, Yu), (7, Blue, Xu)
Checkout(book,person,since): (2,222222222.8/8/05), (7,222222222,8/8/05)
Christoph F. Eick Introduction File and Database Systems
Referential Integrity in SQL/92
SQL/92 supports all 4 options on deletes and updates. Default is NO ACTION
(delete/update is rejected) CASCADE (also delete all tuples
that refer to deleted tuple) SET NULL / SET DEFAULT (sets
foreign key value of referencing tuple)
CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCES Students
ON DELETE CASCADEON UPDATE SET DEFAULT )
Christoph F. Eick Introduction File and Database Systems
Example of an Internal Schemafor the Library Example
INTERNAL Schema Library12 references Library. Book is stored sequentially, index on B# using hashing, index on Author using hashing. Person is stored using hashing on ssn. Check_out is stored sequentially, index on since using B+-tree.
Christoph F. Eick Introduction File and Database Systems
Example: Stored Database
Relation Book
(11, Y,W)(30, Z, B)
(20, Y,W)(51, C, B)
(1, C,W)
11151
Index on B#Block#= B# mod 10
0
1
2030
(101,…)Relation PersonBlock#= sss mod 10
0
1
(200,…)(500,…)
W,…
Index on Author
RelationCheckout
Index on since
Christoph F. Eick Introduction File and Database Systems
3 Schema Architecture
ExternalSchema
ExternalSchema
External Schema
ConceptualSchema
Internal Schema
How usersSee the Data
What the databasecontains and whatconstraints hold withrespect to the database
How the data arephysically stored
Christoph F. Eick Introduction File and Database Systems
Data Independence
Data Independence: the ability to modify the lower level descriptions of a database without causing application programs to be rewritten.
Logical Data Independence: the ability to modify the conceptual schema without causing application programs to be rewritten.
Physical Data Independence: the ability to modify the internal schema without causing application programs to be rewritten.
Data independence is achieved through proper manipulation of the
above two mappings.
Modern Relational DBMSModern Relational DBMS
Modern DBMS
Modern DBMS
Support for Web-Interfaces, XML, andData Exchange
Efficient Implementation of Queries (Query Optimization, Join & Selection & Indexing techniques)
Transaction Concepts; capability
of running manytransactions in
parallel; support forbackup and recovery.
Support for specialData-types: long fields,images, html-links, DNA-sequences,spatial information,…
Support for higher level user interfaces:graphical, natural language, form-based,…
Support for OLAP and Data Warehousing
Support for Data Mining
operations
Support for OO; capability to store operations
Supportfor data-driven computing
Christoph F. Eick Introduction File and Database Systems
Disks and Files
DBMS stores information on (“hard”) disks. This has major implications for DBMS design!
READ: transfer data from disk to main memory (RAM). WRITE: transfer data from RAM to disk. Both are high-cost operations, relative to in-memory
operations, so must be planned carefully!
Christoph F. Eick Introduction File and Database Systems
Remark: All reported disk performance/prize data are as of middle of 2003
Why Not Store Everything in Main Memory?
Costs too much. $100 will buy you either 512MB of RAM or 50GB of disk today --- that is disk storage 100 times cheaper (but it is approx. 10000 times slower).
Main memory is volatile. We want data to be saved between runs. (Obviously!)
Typical storage hierarchy: Main memory (RAM) for currently used data. Disk for the main database (secondary storage). Tapes for archiving older versions of the data
(tertiary storage).
Components of a Disk
Platters
The platters spin (say, 90rps).
Spindle
The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!).
Disk head
Arm movement
Arm assembly
Only one head reads/writes at any one time.
Tracks
Sector
Block size is a multiple of sector size (which is fixed).
Christoph F. Eick Introduction File and Database Systems
Accessing a Disk Page
Time to access (read/write) a disk block: seek time (moving arms to position disk head on
track) rotational delay (waiting for block to rotate under
head) transfer time (actually moving data to/from disk
surface) Seek time and rotational delay dominate.
Seek time varies from about 1 to 20msec Rotational delay varies from 0 to 10msec Transfer rate is about 1msec per 32KB page
Christoph F. Eick Introduction File and Database Systems
Support for Transactions
Database management systems provide powerful transaction concepts that “guarantee” ACID properties
Transaction:
Begin_Transaction
<set of operations that read and modify the database>
End_Transaction Usually 2 commands are available to terminate a transaction:
Abort and Commit
Christoph F. Eick Introduction File and Database Systems
Review: The ACID properties
AA tomicity: All actions in the Xact happen, or none happen. CC onsistency: If each Xact is consistent, and the DB starts
consistent, it ends up consistent. II solation: Execution of one Xact is isolated from that of
other Xacts. D D urability: If a Xact commits, its effects persist.
The Recovery Manager guarantees Atomicity & Durability.
Christoph F. Eick Introduction File and Database Systems
Example
Consider two transactions (Xacts):
T1: BEGIN A=A+100, B=B-100 ENDT2: BEGIN A=1.06*A, B=1.06*B END
Intuitively, the first transaction is transferring $100 from B’s account to A’s account. The second is crediting both accounts with a 6% interest payment.
There is no guarantee that T1 will execute before T2 or vice-versa, if both are submitted together. However, the net effect must be equivalent to these two transactions running serially in some order.
Christoph F. Eick Introduction File and Database Systems
Atomicity of Transactions
A transaction might commit after completing all its actions, or it could abort (or be aborted by the DBMS) after executing some actions.
A very important property guaranteed by the DBMS for all transactions is that they are atomic.
DBMS logs all actions so that it can undo the actions of aborted transactions and redo the actions of successful transactions.
Christoph F. Eick Introduction File and Database Systems
Concurrency in a DBMS
Users submit transactions, and can think of each transaction as executing by itself. Concurrency is achieved by the DBMS, which interleaves
actions (reads/writes of DB objects) of various transactions. Each transaction must leave the database in a consistent
state if the DB is consistent when the transaction begins. DBMS will enforce some ICs, depending on the ICs
declared in CREATE TABLE statements. Beyond this, the DBMS does not really understand the
semantics of the data. (e.g., it does not understand how the interest on a bank account is computed).
Issues: Effect of interleaving transactions, and crashes.
Christoph F. Eick Introduction File and Database Systems
Example (Contd.)
Consider a possible interleaving (schedule):
T1: A=A+100, B=B-100 T2: A=1.06*A, B=1.06*B
This is OK. But what about:T1: A=A+100, B=B-100 T2: A=1.06*A, B=1.06*B
The DBMS’s view of the second schedule:
T1: R(A), W(A), R(B), W(B)T2: R(A), W(A), R(B), W(B)
Christoph F. Eick Introduction File and Database Systems
Summary
Concurrency control and recovery are among the most important functions provided by a DBMS.
Users need not worry about concurrency. System automatically inserts lock/unlock requests and
schedules actions of different transactions in such a way as to ensure that the resulting execution is equivalent to executing the transactions one after the other in some order.
Write-ahead logging (WAL) is used to undo the actions of aborted transactions and to restore the system to a consistent state after a crash.