Database Management System: Introduction
Warning
•This class is a lot of work.
•But it is worth it.
•Of all courses you take at CS, this may be the one that gets you a job.
Syllabus
• The background and history of database management systems.
• The fundamentals of using a database management system.
• Relational model.
• Queries and Updates.
• Relational Algebra.
• Normalization
• Transactions and Security.
• Object-oriented, object-relational, semi-structured and XML database systems.
What Is a Database System?
• Database: a very large, integrated collection of data.
• Models a real-world enterprise
• Entities (e.g., teams, games)
• Relationships (e.g., Abo Teraka plays for Al Ahly)
• More recently, also includes active components, often called “business logic” (e.g., the BCS ranking system).
• A Database Management System (DBMS) is a software system designed to store, manage, and facilitate access to databases.
Why Study Databases??
• Shift from computation to information
• always true for corporate computing
• Web made this point for personal computing
• more and more true for scientific computing
• The need for DBMSs has exploded in recent years
• Corporate: retail swipe/clickstreams, “customer relationship”, “supply chain”, “data warehouses”, etc.
• Scientific: digital libraries, Human Genome project, NASA Mission to Planet Earth, physical sensors, grid physics network
• DBMS encompasses much of CS in a practical discipline
• OS, languages, theory, AI, multimedia, logic
• Yet traditional focus on real-world apps
Databases You May Use
Database Applications
• These examples are what we call traditional database applications (the first part of the book focuses on traditional applications)
• More Recent Applications:
• Youtube
• iTunes
• Geographic Information Systems (GIS)
• Data Warehouses
• Many other applications
Database Systems: Then
History of Database Systems
• 1950’s and early 1960’s:
• Data processing using magnetic tapes for storage
• Tapes provide only sequential access
• Punched cards for input
• Late 1960’s and 1970’s:
• Hard disks allow direct access to data
• Network and hierarchical data models in widespread use
• Ted Codd defines the relational data model
• Would win the ACM Turing Award for this work
• IBM Research begins System R prototype
• UC Berkeley begins Ingres prototype
• High-performance (for the era) transaction processing
History (cont.)
• 1980s:
• Research relational prototypes evolve into commercial systems
• SQL becomes industry standard
• Parallel and distributed database systems
• Object-oriented database systems
• 1990s:
• Large decision support and data-mining applications
• Large multi-terabyte data warehouses
• Emergence of Web commerce
• 2000s:
• XML and XQuery standards
• Automated database administration
• Increasing use of highly parallel database systems
• Web-scale distributed data storage systems
Is a File System a DBMS?
• Thought Experiment 1:
• You and your project partner are editing the same file.
• You both save it at the same time.
• Whose changes survive?
A) Yours B) Partner’s C) Both D) Neither E) ???
• Thought Experiment 2:
• You’re updating a file.
• The power goes out.
• Which of your changes survive?
A) All B) None C) All Since Last Save D) ???
Q: How do you write programs over a subsystem when it promises you only “???”?
A: Very, very carefully!!
Can we do it without a DBMS ?
Sure we can! Start by storing the data in files:
students.txt courses.txt professors.txt
Now write C or Java programs to implement specific tasks
Doing it without a DBMS...
• Enroll “Mary Johnson” in “CSE444”:
Write a C program to do the following:
Read ‘students.txt’
Read ‘courses.txt’
Find & update the record “Mary Johnson”
Find & update the record “CSE444”
Write “students.txt”
Write “courses.txt”
Enter a DBMS
“Two-tier database system”: data files, a database server (someone else’s C program), and applications.
Problems without a DBMS...
• System crashes:
Read ‘students.txt’
Read ‘courses.txt’
Find & update the record “Mary Johnson”
Find & update the record “CSE444”
Write “students.txt”
Write “courses.txt”
CRASH!
• What is the problem?
• Large data sets (say 50GB)
• What is the problem?
• Simultaneous access by many users
• Need locks: we know them from OS, but now the data is on disk; and is it any fun to re-implement them?
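The system-crash problem on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the one-record-per-line layout, the `enroll` function, and the `SimulatedCrash` exception are hypothetical, and the files are appended to rather than read and rewritten in full. Only the file names come from the slides.

```python
import os
import tempfile


class SimulatedCrash(Exception):
    """Stands in for a power failure or system crash (hypothetical)."""


def enroll(data_dir, student, course, crash_between_writes=False):
    """Enroll a student by updating two flat files, one after the other."""
    students_path = os.path.join(data_dir, "students.txt")
    courses_path = os.path.join(data_dir, "courses.txt")

    # Step 1: record the course on the student's side.
    with open(students_path, "a") as f:
        f.write(f"{student},{course}\n")

    if crash_between_writes:
        raise SimulatedCrash("crashed after writing students.txt")

    # Step 2: record the student on the course's side.
    with open(courses_path, "a") as f:
        f.write(f"{course},{student}\n")


def read_lines(path):
    """Return the stripped lines of a file, or [] if it was never written."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [line.strip() for line in f]


data_dir = tempfile.mkdtemp()
try:
    enroll(data_dir, "Mary Johnson", "CSE444", crash_between_writes=True)
except SimulatedCrash:
    pass

students = read_lines(os.path.join(data_dir, "students.txt"))
courses = read_lines(os.path.join(data_dir, "courses.txt"))
print(students)  # the student file records the enrollment
print(courses)   # the course file does not: the two files now disagree
```

Nothing in the file system ties the two writes together, so after the crash the program has no record that the enrollment is half done.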
Why Use a DBMS?
• Without a DBMS, we'd have:
• data stored as bits on disks, organized as files
• access by a collection of ad hoc programs in C++, Java, PHP, etc.
• users of the data
• There is no control or coordination of what these programs do with the data.
Why Use a DBMS?
• With a DBMS, we have:
• data stored as bits on disks, organized as files
• DBMS
• applications
• users of the data
• The DBMS provides control and coordination to protect the data.
Database definition
• A database is “data” (facts) supplied by a “base” (software).
• Files contain data with the same structure.
• A database is an integration of different kinds of data.
Database Systems
• The big commercial database vendors:
• Oracle
• IBM (with DB2) bought Informix recently
• Microsoft (SQL Server)
• Sybase
• Some free database systems (Unix):
• Postgres
• MySQL
• Predator
DBMS Functions
1. Define the database
2. Construct the database
3. Manipulate the database
4. Data security and integrity
5. Concurrency
6. Recovery
Disadvantages of a DBMS
• Expensive
• Often incompatible with other DBMSs
Concurrency
•A DBMS supports access by concurrent users
• concurrent = happening at the same time
• concurrent access, particularly writes (data changes), can result in inconsistent states (even when the individual operations are correct)
• the DBMS can check the actual operations of concurrent users, to prevent activity that will lead to inconsistent states
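A minimal sketch of how individually correct operations can still produce an inconsistent state. The seat counter and the helper names are hypothetical. The interleaving is written out by hand instead of using real threads, so the outcome is deterministic.

```python
# Two "users" each read a seat count, then write back their own result.

seats_taken = {"CSE444": 10}  # hypothetical shared data


def read_count(course):
    return seats_taken[course]


def write_count(course, value):
    seats_taken[course] = value


# Interleaved schedule: both users read before either one writes.
user_a_sees = read_count("CSE444")      # A reads 10
user_b_sees = read_count("CSE444")      # B reads 10
write_count("CSE444", user_a_sees + 1)  # A writes 11
write_count("CSE444", user_b_sees + 1)  # B writes 11, overwriting A's update
interleaved_result = seats_taken["CSE444"]

# Serial schedule: B starts only after A has finished.
seats_taken["CSE444"] = 10
write_count("CSE444", read_count("CSE444") + 1)  # A: 10 -> 11
write_count("CSE444", read_count("CSE444") + 1)  # B: 11 -> 12
serial_result = seats_taken["CSE444"]

print(interleaved_result, serial_result)
```

Each user's read-then-write is correct in isolation; only the interleaving loses an update. This is the kind of schedule a DBMS's concurrency control is meant to prevent.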
Access Control
•A DBMS can restrict access to authorized users
• security policies often require control that is more fine-grained than that provided by a file system
• since the DBMS understands the data structure, it can enforce fairly sophisticated and detailed security policies
• on subsets of the data
• on subsets of the available operations
Redundancy Control
•A DBMS can assist in controlling redundancy
• redundancy = multiple copies of the same data
• with file storage, it's often convenient to store multiple copies of the same data, so that it's "local" to other data and applications
• this can cause many problems:
• wasted disk space
• inconsistencies
• need to enter the data multiple times
Backup and Recovery
•A DBMS can provide backup and recovery
• backup = snapshots of the data at particular times
• recovery = restoring the data to a consistent state after a system crash
• the higher level semantics (relationships and constraints) can make it difficult to restore a consistent state
• transaction analysis can allow a DBMS to reconstruct a consistent state from a number of backups
Views and Interfaces
•A DBMS can support multiple user interfaces and user views
• since the DBMS provides a well-defined data model and a persistent data dictionary, many different interfaces can be developed to access the same data
• data independence ensures that these UIs will not be made invalid by most changes to the data
• new user views can be supported as new schemas defined against the conceptual schema
Database Components
• User Interface Applications and Application Programs
• DBMS
• Design tools: Table Creation, Form Creation, Query Creation, Report Creation, Procedural language compiler (4GL)
• Run time: Form processor, Query processor, Report Writer, Language Run time
• Database contains: User’s Data, Metadata, Indexes, Application Metadata
Actors on DBMS
• Database Administrator
• Systems analyst
• Database designer
• Application programmer
• End user
Actors on the Scene
•Database Administrators
• acquiring a DBMS
• managing the system
• acquiring HW and SW to support the DBMS
• authorizing access (security policies)
• managing staff, including DB designers
Actors on the Scene
•Database Designers
• identifying the information of interest in the Universe of Discourse (UoD)
• designing the database conceptual schema
• designing views for particular users
• designing the physical data layout and logical schema
• adjusting data parameters for performance
Actors on the Scene
•Systems Analysts and Application Programmers (generic database developers)
• provide specialized knowledge to optimize database usage
• provide generic (canned) application programs
Actors on the Scene
•End Users
• casual users: ad-hoc queries
• naïve or parametric users: canned queries such as menus for a phone company customer service agent
• sophisticated users: people who understand the system and the data and use it in many novel ways
• standalone users: people who use personal easy-to-use databases for personal data
Three-Schema Architecture
External View, External View, External View (user-specific views)
Conceptual Schema (generic view)
Internal Schema (physical view)
Levels of Abstraction
• Views describe how users see the data.
• Conceptual schema defines logical structure
• Physical schema describes the files and indexes used.
• (sometimes called the ANSI/SPARC model)
Diagram: Users → View 1, View 2, View 3 → Conceptual Schema → Physical Schema → DB
Example: University Database
• Conceptual schema:
• Students(sid: string, name: string, login: string, age: integer, gpa: real)
• Courses(cid: string, cname: string, credits: integer)
• Enrolled(sid: string, cid: string, grade: string)
• External Schema (View):
• Course_info(cid:string,enrollment:integer)
• Physical schema:
• Relations stored as unordered files.
• Index on first column of Students.
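The three levels of this example can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions: the slide gives only the signature of `Course_info`, so the view definition (a count of `Enrolled` rows per course) is a guess at its intent. The index name and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema (types mapped to SQLite's TEXT/INTEGER/REAL)
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Physical-level choice: an index on the first column of Students
    CREATE INDEX students_sid_idx ON Students (sid);

    -- External schema: a view exposing only per-course enrollment counts
    -- (assumed definition; the slide shows only the view's signature)
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment
        FROM Enrolled
        GROUP BY cid;
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("s1", "c1", "A"), ("s2", "c1", "B"), ("s1", "c2", "A")])

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
print(course_info)
```

A user of `Course_info` never sees grades or student ids, and dropping or adding the index changes nothing about the query.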
Levels of Abstraction
• Physical level: describes how a record (e.g., customer) is stored.
• Logical level: describes data stored in database, and the relationships among the data.
type customer = record
    customer_id : string;
    customer_name : string;
    customer_street : string;
    customer_city : string;
end;
• View level: application programs hide details of data types. Views can also hide information (such as an employee’s salary) for security purposes.
Conceptual Data Models
•A data model describes the possible schemas (essentially the meta-schema)
•A DBMS is designed around a particular data model
• this is what allows all system components (and humans) to understand the schema and data
•possible data models
• relational, object-oriented, object-relational, entity-relationship, semantic, network, hierarchical, etc.
Physical Data Models
•A physical data model describes the way in which data is stored in the computer
• typically only of interest to database designers, implementers and maintainers …not end users
• must provide a well-defined structure that can be mapped to the conceptual schema
• allows optimization strategies to be defined generically
Instances and Schemas
• Similar to types and variables in programming languages
• Schema – the logical structure of the database
• Example: The database consists of information about a set of customers and accounts and the relationship between them
• Physical schema: database design at the physical level
• Logical schema: database design at the logical level
• Instance – the actual content of the database at a particular point in time
Classification
• DBMSs are classified by three criteria:
• Data model (relational, object, ...)
• Number of users (single-user or multi-user)
• Number of sites (centralized or distributed)
Data model
• A technique for organizing data, together with concepts for describing the structure of data, relationships, and integrity constraints.
Database models
1. Relational data model: Oracle, Access
2. Hierarchical data model (as a tree): IMS DBMS
3. Network data model (as a graph): IDMS DBMS
4. Object-oriented model: VERSANT DBMS
5. Object-relational data model: UNISQL DBMS
Data Models
• Hierarchical Model (1960’s and 1970’s)
• Similar to data structures in programming languages.
Books(id, title)
Publisher, Subjects, Authors(first, last)
Data Models
• Network Model (1970’s)
• Provides for single entries of data and navigational “links” through chains of data.
Subjects Books
Authors
Publishers
Data Models
• Object Oriented Data Model (1990’s)
• Encapsulates data and operations as “Objects”
Books(id, title)
Publisher, Subjects, Authors(first, last)
Relational Model
• Example of tabular data in the relational model
Attributes
A Sample Relational Database
Relational data model
• Based on the relations between data
• Each relation or table (entity) is a data structure or a collection of attributes describing data
• An attribute or field is a column in the table
• A tuple or record is a row in the table
Relational data model
• A null value assigned to an attribute means that the attribute's value is not yet known
• A primary key is a unique identifier for the rows of a table: one attribute or a combination of attributes
Relational data model
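•A foreign key is an attribute (or combination of attributes) in one relation whose values are required to match those of the primary key of some relation
•A candidate key is any minimal set of attributes that uniquely identifies each tuple; the primary key is the candidate key chosen as the main identifier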
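A small sketch of how a DBMS enforces keys, using Python's built-in `sqlite3`. The tables, columns, and rows are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE Students (
        sid   TEXT PRIMARY KEY,      -- primary key
        login TEXT UNIQUE NOT NULL,  -- also unique, so it could have served as the key
        name  TEXT
    );
    CREATE TABLE Enrolled (
        sid   TEXT REFERENCES Students (sid),  -- foreign key
        cid   TEXT,
        grade TEXT,                            -- may be NULL: not yet known
        PRIMARY KEY (sid, cid)                 -- key made of two attributes
    );
""")

conn.execute("INSERT INTO Students VALUES ('s1', 'mary', 'Mary Johnson')")
conn.execute("INSERT INTO Enrolled VALUES ('s1', 'CSE444', NULL)")


def rejected(sql):
    """Return True if the DBMS refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_pk = rejected("INSERT INTO Students VALUES ('s1', 'mj', 'M. J.')")
duplicate_login = rejected("INSERT INTO Students VALUES ('s2', 'mary', 'Mary K.')")
dangling_fk = rejected("INSERT INTO Enrolled VALUES ('s9', 'CSE444', 'A')")
print(duplicate_pk, duplicate_login, dangling_fk)
```

All three inserts are refused: the first repeats a primary key value, the second repeats a unique login, and the third refers to a student who does not exist.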
New Trends in Databases
• Object-relational databases
• Main memory database systems
• XML XML XML!
• Relational databases with XML support
• Middleware between XML and relational databases
• Native XML database systems
• Lots of research here at UW on XML and databases
• Data integration
• Peer to peer, stream data management – still research
SQL
• SQL: widely used non-procedural language
• Example: Find the name of the customer with customer-id 192-83-7465
select customer.customer_name
from customer
where customer.customer_id = '192-83-7465'
• Example: Find the balances of all accounts held by the customer with customer-id 192-83-7465
select account.balance
from depositor, account
where depositor.customer_id = '192-83-7465' and
      depositor.account_number = account.account_number
• Application programs generally access databases through one of
• Language extensions to allow embedded SQL
• Application program interface (e.g., ODBC/JDBC) that allows SQL queries to be sent to a database
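As a rough illustration of the API route, the two queries on this slide can be sent to a database from a program. The sketch uses Python's built-in `sqlite3` module in place of ODBC/JDBC. The table layouts are inferred from the queries, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_id TEXT, customer_name TEXT);
    CREATE TABLE account   (account_number TEXT, balance REAL);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);

    -- Made-up rows for illustration
    INSERT INTO customer  VALUES ('192-83-7465', 'Johnson');
    INSERT INTO account   VALUES ('A-101', 500.0), ('A-201', 900.0);
    INSERT INTO depositor VALUES ('192-83-7465', 'A-101'),
                                 ('192-83-7465', 'A-201');
""")

customer_id = "192-83-7465"

# Query 1: the customer's name. The "?" placeholder passes the id as a
# parameter instead of pasting it into the SQL text.
names = conn.execute(
    "SELECT customer.customer_name FROM customer "
    "WHERE customer.customer_id = ?", (customer_id,)).fetchall()

# Query 2: balances of all accounts held by that customer.
balances = conn.execute(
    "SELECT account.balance FROM depositor, account "
    "WHERE depositor.customer_id = ? "
    "AND depositor.account_number = account.account_number "
    "ORDER BY account.balance", (customer_id,)).fetchall()

print(names)
print(balances)
```

The program states what it wants and never says how to scan or join the tables; that is the sense in which SQL is non-procedural.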
The Entity-Relationship Model
• Models an enterprise as a collection of entities and relationships
• Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects
• Described by a set of attributes
• Relationship: an association among several entities
• Represented diagrammatically by an entity-relationship diagram:
Transaction Management
• A transaction is a collection of operations that performs a single logical function in a database application
• Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
• Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
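A minimal sketch of the all-or-nothing behaviour of a transaction, using Python's built-in `sqlite3`. The `account` table, the `transfer` helper, and the amounts are hypothetical. A failed `CHECK` constraint stands in for a transaction failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_number TEXT PRIMARY KEY,
                          balance REAL CHECK (balance >= 0));
    INSERT INTO account VALUES ('A-101', 500.0), ('A-201', 900.0);
""")


def transfer(conn, source, target, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? "
                         "WHERE account_number = ?", (amount, target))
            conn.execute("UPDATE account SET balance = balance - ? "
                         "WHERE account_number = ?", (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account_number: balance} for every account."""
    return dict(conn.execute("SELECT account_number, balance FROM account"))


first_ok = transfer(conn, "A-101", "A-201", 200.0)
after_first = balances(conn)

# This transfer overdraws A-101: the credit to A-201 has already been
# applied when the debit fails, so the rollback must undo it.
second_ok = transfer(conn, "A-101", "A-201", 1000.0)
after_second = balances(conn)

print(first_ok, after_first)
print(second_ok, after_second)
```

After the failed transfer the balances match those after the first transfer, so the partial credit to A-201 did not survive.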
A UNIVERSITY Example
• A UNIVERSITY database for maintaining information concerning students, courses, and grades in a university environment
• We have:
STUDENT file stores data on each student
COURSE file stores data on each course
SECTION file stores data on each section of each course
GRADE_REPORT file stores the grades that students receive
PREREQUISITE file stores the prerequisites
Example of a simple database
COMPANY Database
• The company is organized into DEPARTMENTs. Each department has a name, number, and an employee who manages the department. We keep track of the start date of the department manager. A department may have several locations.
• Each department controls a number of PROJECTs. Each project has a name, number, and is located at a single location.
COMPANY Database
• We store each EMPLOYEE's social security number, address, salary, sex, and birth date. Each employee works for one department but may work on several projects. We keep track of the number of hours per week that an employee currently works on each project. We also keep track of the direct supervisor of each employee.
• Each employee may have a number of DEPENDENTs. For each dependent, we keep their name, sex, birth date, and relationship to the employee.
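One possible relational sketch of the COMPANY requirements, written as SQLite DDL run through Python's `sqlite3`. Every table name, column name, type, and key choice here is an assumption made for illustration. The description above fixes only which facts are kept, not how they are laid out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dnumber        INTEGER PRIMARY KEY,
        dname          TEXT NOT NULL,
        mgr_ssn        TEXT REFERENCES employee (ssn),  -- managing employee
        mgr_start_date TEXT
    );
    CREATE TABLE dept_location (   -- a department may have several locations
        dnumber  INTEGER REFERENCES department (dnumber),
        location TEXT,
        PRIMARY KEY (dnumber, location)
    );
    CREATE TABLE project (         -- one location, one controlling department
        pnumber  INTEGER PRIMARY KEY,
        pname    TEXT NOT NULL,
        location TEXT,
        dnumber  INTEGER REFERENCES department (dnumber)
    );
    CREATE TABLE employee (
        ssn        TEXT PRIMARY KEY,
        address    TEXT,
        salary     REAL,
        sex        TEXT,
        birth_date TEXT,
        dnumber    INTEGER REFERENCES department (dnumber),  -- works for one department
        super_ssn  TEXT REFERENCES employee (ssn)            -- direct supervisor
    );
    CREATE TABLE works_on (        -- hours per week on each project
        ssn     TEXT REFERENCES employee (ssn),
        pnumber INTEGER REFERENCES project (pnumber),
        hours   REAL,
        PRIMARY KEY (ssn, pnumber)
    );
    CREATE TABLE dependent (
        ssn          TEXT REFERENCES employee (ssn),
        name         TEXT,
        sex          TEXT,
        birth_date   TEXT,
        relationship TEXT,
        PRIMARY KEY (ssn, name)
    );
""")

tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

The "several" facts (locations per department, projects per employee, dependents per employee) each get their own table, while the "one" facts (an employee's department, a project's location) become plain columns.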