Overview of Databases Gerome Miklau CMPSCI 645 – Database Design & Implementation UMass Amherst Feb 1, 2006 Some slide content courtesy of Zack Ives, Ramakrishnan & Gehrke, Dan Suciu, Ullman & Widom


Page 1: Overview of Databases - avid.cs.umass.eduavid.cs.umass.edu/courses/645/s2006/lectures/645-Lec1-CourseIntro.pdf · Databases & DBMS’s • A database is a large, integrated collection

Overview of Databases

Gerome Miklau
CMPSCI 645 – Database Design & Implementation

UMass Amherst
Feb 1, 2006

Some slide content courtesy of Zack Ives, Ramakrishnan & Gehrke, Dan Suciu, Ullman & Widom


Today

• Student information form
• Overview of databases
• Course topics
• Course requirements


Databases & DBMSs

• A database is a large, integrated collection of data.

• A database management system (DBMS) is a software package designed to store and manage databases. It allows users to:
  – Define the kind of data stored
  – Query and update data through a common interface
  – Store and recover 100s of GB of data reliably
  – Control access to data from many concurrent users


Can filesystems do it?

• Schema for files is limited
• No query language for data in files
• Files can store large amounts of data, but
  – no recovery from failure
  – no efficient access to items within file
• Concurrent access not safe

No


Evolution

• Early DBMSs (1960s) evolved from file systems.

• Data with many small items & many queries or modifications:
  – Airline reservations
  – Banking


Early DB systems

• Tree-based hierarchical data model
• Graph-based network data model

• Encouraged users to think about data the way it was stored.

• No high-level query language

Data model: the data model includes basic assumptions about what an “item” of data is, and how to represent and interpret it.


The Relational Model

• The relational data model (Codd, 1970):
  – Data independence: details of physical storage are hidden from users
  – High-level declarative query language
    • say what you want, not how to compute it
    • mathematical foundation
  – A theory of normalization guides the design of relations

Side-note: Turing Awards in Databases
  1973: Bachman, network data model
  1981: Codd, relational model
  1998: Jim Gray, transaction processing


DBMS Benefit #1: Generality and Declarativity

• The programmer or user does not need to know details like indices, sort orders, machine speeds, disk speeds, concurrent users, etc.

• Instead, the programmer/user programs with a logical model in mind

• The DBMS “makes it happen” based on an understanding of relative costs of different methods


Benefit #2: Efficiency and Scale

• Efficient storage of hundreds of GBs of data

• Efficient access to data

• Rapid processing of transactions


Benefit #3: Management of Concurrency and Reliability

• Simultaneous transactions handled safely.
• Recovery of system data after system failure.

• More formally: the ACID properties
  – Atomicity - all or nothing
  – Consistency - a sensible (valid) state is never violated
  – Isolation - each transaction is separated from the effects of other concurrent transactions
  – Durability - once completed, never lost
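
As a minimal sketch of atomicity (the "all or nothing" property), the snippet below uses Python's standard-library sqlite3 module, with SQLite standing in for a full DBMS. The account table, its CHECK constraint, and the transfer helper are hypothetical and do not come from the course material.

```python
import sqlite3

# Hypothetical two-account table; balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # The connection context manager commits if the body succeeds
        # and rolls back every statement in it if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds: both updates are committed
failed = transfer(conn, 1, 2, 500)  # debit violates the CHECK constraint
balances = conn.execute(
    "SELECT id, balance FROM account ORDER BY id"
).fetchall()
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 has already been applied when the debit is rejected; the rollback undoes it, so no partial transfer is ever visible.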


How Does One Build a Database?

• Start with a conceptual model
• Design & implement schema
• Write applications using DBMS and other tools
  – Many ways of doing this (DBMS, API writers, library authors, web server, etc.)
  – Common applications include PHP/JSP/servlet-driven web sites
• The DBMS takes care of query optimization and execution


Conceptual Design

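[ER diagram: entities STUDENT (sid, name), COURSE (cid, name), and PROFESSOR (fid, name); relationships Takes (between STUDENT and COURSE) and Teaches (between PROFESSOR and COURSE); attribute semester]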


Designing a Schema (Set of Relations)

• Convert to tables + constraints

• Then need to do “physical” design: the layout on disk, indices, etc.

STUDENT
sid  name
1    Jill
2    Bo
3    Maya

Takes
sid  cid
1    645
1    683
3    635

COURSE
cid  name  sem
645  DB    F05
683  AI    S05
635  Arch  F05

PROFESSOR
fid  name
1    Diao
2    Saul
8    Weems

Teaches
fid  cid
1    645
2    683
8    635
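
One possible rendering of these tables as a schema with constraints, sketched with Python's standard-library sqlite3 module. The slide gives only table names, columns, and rows; the column types, primary keys, foreign keys, and NOT NULL constraints below are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;  -- SQLite checks foreign keys only when enabled

CREATE TABLE STUDENT   (sid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE COURSE    (cid INTEGER PRIMARY KEY, name TEXT NOT NULL, sem TEXT);
CREATE TABLE PROFESSOR (fid INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Relationship tables: each row links two existing entities.
CREATE TABLE Takes (
    sid INTEGER NOT NULL REFERENCES STUDENT(sid),
    cid INTEGER NOT NULL REFERENCES COURSE(cid),
    PRIMARY KEY (sid, cid)
);
CREATE TABLE Teaches (
    fid INTEGER NOT NULL REFERENCES PROFESSOR(fid),
    cid INTEGER NOT NULL REFERENCES COURSE(cid),
    PRIMARY KEY (fid, cid)
);
""")

# Sample rows copied from the slide.
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)",
                 [(1, "Jill"), (2, "Bo"), (3, "Maya")])
conn.executemany("INSERT INTO COURSE VALUES (?, ?, ?)",
                 [(645, "DB", "F05"), (683, "AI", "S05"), (635, "Arch", "F05")])
conn.executemany("INSERT INTO PROFESSOR VALUES (?, ?)",
                 [(1, "Diao"), (2, "Saul"), (8, "Weems")])
conn.executemany("INSERT INTO Takes VALUES (?, ?)",
                 [(1, 645), (1, 683), (3, 635)])
conn.executemany("INSERT INTO Teaches VALUES (?, ?)",
                 [(1, 645), (2, 683), (8, 635)])
conn.commit()

# A constraint in action: student 9 does not exist, so this row is rejected.
try:
    conn.execute("INSERT INTO Takes VALUES (?, ?)", (9, 645))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

takes_count = conn.execute("SELECT COUNT(*) FROM Takes").fetchone()[0]
print(rejected, takes_count)
```

Physical design choices such as indices and on-disk layout are left out here on purpose; they can change later without touching these table definitions.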


Queries

• Find all courses that “Mary” takes

• What happens behind the scenes?
  – The query processor figures out how to answer the query efficiently.

SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name = 'Mary' AND S.sid = T.sid AND T.cid = C.cid
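
The query can be tried out with Python's standard-library sqlite3 module, as in the sketch below. The table names Students, Takes, and Courses follow the query text, and the rows are the sample data from the schema slide. That data has no student named Mary, so the sketch also looks up Jill; the column types and the courses_taken_by helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Courses  (cid INTEGER PRIMARY KEY, name TEXT, sem TEXT);
CREATE TABLE Takes    (sid INTEGER, cid INTEGER);
""")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [(1, "Jill"), (2, "Bo"), (3, "Maya")])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                 [(645, "DB", "F05"), (683, "AI", "S05"), (635, "Arch", "F05")])
conn.executemany("INSERT INTO Takes VALUES (?, ?)",
                 [(1, 645), (1, 683), (3, 635)])


def courses_taken_by(conn, student_name):
    """Run the slide's three-way join for one student name."""
    rows = conn.execute(
        """
        SELECT C.name
        FROM Students S, Takes T, Courses C
        WHERE S.name = ? AND S.sid = T.sid AND T.cid = C.cid
        """,
        (student_name,),
    )
    # SQL gives no ordering guarantee without ORDER BY, so sort here.
    return sorted(name for (name,) in rows)


jill = courses_taken_by(conn, "Jill")
mary = courses_taken_by(conn, "Mary")
print(jill)  # ['AI', 'DB']
print(mary)  # []
```

The query states only which rows are wanted; nothing in it says which table to scan first or how to perform the joins.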


Queries, behind the scenes

Declarative SQL query:

SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name = 'Mary' AND S.sid = T.sid AND T.cid = C.cid

Query execution plan:

[Figure: operator tree over Students, Takes, and Courses, with joins on sid=sid and cid=cid, a selection name=“Mary”, and a projection onto sname]

The optimizer chooses the best execution plan for a query
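
To look at an optimizer's choice directly, SQLite offers EXPLAIN QUERY PLAN, used below through Python's standard-library sqlite3 module. This is only a sketch: SQLite stands in for the systems discussed in the course, the students_name index and both helper functions are hypothetical, and the plan text differs between SQLite versions, so the code prints it without assuming its contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Courses  (cid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Takes    (sid INTEGER, cid INTEGER);
INSERT INTO Students VALUES (1, 'Jill'), (2, 'Bo'), (3, 'Maya');
INSERT INTO Courses  VALUES (645, 'DB'), (683, 'AI'), (635, 'Arch');
INSERT INTO Takes    VALUES (1, 645), (1, 683), (3, 635);
""")

QUERY = """
SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name = ? AND S.sid = T.sid AND T.cid = C.cid
"""


def plan(conn, name):
    """Return the optimizer's plan as text; the last column holds the detail."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (name,))
    return [row[-1] for row in rows]


def answer(conn, name):
    """Run the query itself and return the course names, sorted."""
    return sorted(r[0] for r in conn.execute(QUERY, (name,)))


plan_before, answer_before = plan(conn, "Jill"), answer(conn, "Jill")

# A physical design change: the query text stays exactly the same.
conn.execute("CREATE INDEX students_name ON Students(name)")

plan_after, answer_after = plan(conn, "Jill"), answer(conn, "Jill")

print("\n".join(plan_before))
print("\n".join(plan_after))
```

The answer is the same before and after the index is created, while the optimizer is free to pick a different plan. That is data independence and declarativity in practice.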


An Issue: 80% of the World’s Data is Not in a DB!

Examples:
  – Scientific data (large images, complex programs that analyze the data)
  – Personal data
  – WWW and email (some of it is stored in something resembling a DBMS)

Data management is expanding to tackle these problems


DBMSs in the Real World

A huge industry for 20% of the world’s data!

• Big, mature relational databases
  – IBM DB2, Oracle, Microsoft SQL Server
  – Adding advanced features, including “native XML” support
• “Middleware” above these systems
  – SAP, Siebel, PeopleSoft, dozens of special-purpose apps
• Integration and warehousing systems
  – BEA AquaLogic, DB2 Information Integrator
• Current trends:
  – Web services; XML everywhere
  – Smarter, self-tuning systems


Database Research

• One of the broadest, most exciting areas in CS!
• A microcosm of CS in general
  – languages, operating systems, concurrent programming, data structures, algorithms, theory, distributed systems, statistical techniques.

• Theory and systems well-integrated.


Recent Trends in Databases

• XML
  – Relational databases with XML support
  – Middleware between XML and relational databases
  – Large-scale XML message systems
• Main memory database systems
• Peer data management
• Stream data management
• Model management, provenance
• Security and privacy
• Modeling uncertainty, probabilistic databases


What is the Field of Databases?

• To an applied researcher (SIGMOD/VLDB/ICDE)
  – Query optimization
  – Query processing (yet-another join algorithm)
  – Transaction processing, recovery (but most stuff is already done)
  – Novel applications: data mining, high-dimensional search
• To a theoretical researcher (PODS/ICDT/LICS)
  – Focus on the query languages
  – Query language = logic = complexity classes


Course topics

• Fundamentals: relational design, query languages.

• Database internals: storage, indexing, query processing, query optimization, transaction management.

• Theory: expressiveness of query languages, static analysis, complexity.

• XML and semi-structured data models.
• Security: access control, privacy.
• Advanced topics: streaming data, information integration.


Prerequisites

• Official: undergrad course in DB or OS
• Also:
  – Elementary complexity theory


Grading

• Homework: 20%
• Paper reviews: 10%
• Project: 25%
• Midterm: 20%
• Final: 25%


Homework: 20%

• 4 assignments throughout the course
  – written problem sets
  – practical experience with SQL, XQuery


Paper Reviews: 10%

• Approximately 5 classic papers will be assigned

• Short written reviews are due before the day of class. Email to:
  – [email protected]

First paper review: Read Sec 1 of Codd’s paper. Due Wed Feb 8th


Project: 25%

• General theme: apply database principles to a new problem
• Suggested topics will be discussed next Monday
• Groups of 2 preferred. 3 possible.
• Project work will include:
  – Reading some of the research literature
  – Implementation
  – Written report
  – In-class presentation
• Periodic consultation with one of the instructors


Exams

• Midterm (20%)
  – in-class, Monday, Apr 3
• Final (25%)
  – Thursday, May 25, 10:30am


Textbook

Database Management Systems (Ramakrishnan and Gehrke)


Other useful resources

• Database systems: the complete book (Ullman, Widom and Garcia-Molina)
• Readings in Database Systems (Stonebraker and Hellerstein)
• Foundations of Databases (Abiteboul, Hull, Vianu)
• Data on the Web (Abiteboul, Buneman, Suciu)
• Parallel and Distributed DBMS (Ozsu and Valduriez)
• Transaction Processing (Gray and Reuter)
• Data and Knowledge based Systems (volumes I, II) (Ullman)
• Proceedings of SIGMOD, VLDB, PODS conferences.


Communication

• Instructors
  – Office hours: (see website)
  – Email: [email protected]
• Check the course webpage often
• Course bulletin board, mailing list


Questions about the course?
