Advanced Databases: Introduction

dr. Toon Calders
prof. dr. Jan Paredaens

Outline

• Motivation for the course
• Other DH courses
• Practical organization
• Course topics
• Project
• Overview of changes

Motivation for the Course

• A database is a piece of software that handles data:
  − storing,
  − maintaining, and
  − querying it
• The most suitable system depends on the situation:
  − data type: simple / semi-structured / complex / …
  − types of queries: simple lookup / analytical / …
  − type of usage: multi-user / single-user / distributed / …
  − …

Motivation for the Course

• Relational databases are tuned towards:
  − simple data
  − simple, ad-hoc queries
  − multiple users
• Other models are more suitable for other types of data:
  − object-oriented databases
  − deductive databases
  − semi-structured databases
  − data warehouses

Motivation for the Course

• Study different data models
  − advantages and disadvantages
  − conceptual level: what are the important notions?
  − what's underneath?
• In a scientific way
  − exact, not just claims

Motivation for the Course

• The student knows:
  − the different database models
• understands:
  − why they were introduced
  − the conceptual notions
• is able to:
  − quickly master vendor-specific products

Other DH Courses

• Relational database systems
  − (2ID05) Databases and Data Modelling
  − (2ID35) Database Technology: transactions, indexing, query optimization, distributed DBs
• Other database models: (2ID45) Advanced Databases
• (2II15) Data Mining
• (2ID25) Information Retrieval
• (2ID99) Capita Selecta DH

Practical Organization

In principle …

• Wed 8:45-10:30: practical session, M 1.46
  − no new material
  − opportunity to practice and ask questions
  − solve exercises together
• Fri 10:45-12:30: lectures, HG 6.09
  − XML: Paredaens (6 lectures)
  − other parts: Calders

Practical Organization

• Important information:
  http://wwwis.win.tue.nl/~tcalders/teaching/advancedDB/
• Subscribe to 2ID45 on studyweb!
  − messages to the whole class group: lecture postponed, room changes, …

[email protected]

Practical Organization

• Course material
  − Book: Silberschatz, Korth, Sudarshan. Database System Concepts, 5th edition. McGraw-Hill International.
  − Lots of additional material on the course webpage: papers, slides, solutions to exercises, …

Practical Organization

• Grades:
  − 70% written exam
  − 30% group project
• No project = no grade
  − the grade for the project can be transferred to August, and so can the grade for the exam
  − grades expire in August

Course Topics

• Limitations of the relational model

• Deductive databases

• Object-Oriented Databases

• Data Warehousing & OLAP

• Semi-Structured data

Limitations of the relational model

• Not every query can be expressed
  − Transitive closure cannot be expressed in Relational Algebra:
    − Give all cities reachable from Antwerp by plane
    − Give all smallest components of a part
    − Give all descendants of person X
  − Not even if you're very smart … (proof)
  − Extension to other relational query languages
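A small illustration of the intuition (not the proof), as a Python sketch over a made-up flight relation. A single RA expression mentions the relation a fixed number of times, so `bounded_reach` with depth `k` mimics such a query: it unions 1, 2, …, k self-joins and misses any destination more than `k` flights away. `transitive_closure` instead keeps joining until no new pairs appear, so the number of joins depends on the data. The city pairs and all function names are hypothetical.

```python
# Hypothetical flight relation: (from_city, to_city) pairs, invented for this sketch.
FLIGHTS = {
    ("Antwerp", "London"),
    ("London", "New York"),
    ("New York", "Chicago"),
    ("Chicago", "Denver"),
}


def compose(r, s):
    """Join r and s on r.to = s.from and keep (r.from, s.to): one join + projection."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def bounded_reach(flights, k):
    """Pairs connected by a path of at most k flights.

    Mimics one fixed RA expression: a union of 1, 2, ..., k self-joins,
    where k is fixed when the query is written, not by the data.
    """
    result = set(flights)
    layer = set(flights)
    for _ in range(k - 1):
        layer = compose(layer, flights)
        result |= layer
    return result


def transitive_closure(flights):
    """Add paths one flight longer until nothing changes (a fixpoint)."""
    result = set(flights)
    while True:
        extended = result | compose(result, flights)
        if extended == result:
            return result
        result = extended


def reachable_from(pairs, city):
    """Destinations paired with the given source city."""
    return {dest for (src, dest) in pairs if src == city}
```

With depth 2, only London and New York are found from Antwerp; the fixpoint loop also finds Chicago and Denver. Making the chain longer defeats any fixed `k`.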

Deductive Databases

• Motivation is two-fold:
  − add deductive capabilities to databases; the database contains:
    − facts (extensional relations)
    − rules to generate derived facts (intensional relations)
    so the database is a knowledge base
  − extend the querying: Datalog allows for recursion

Deductive Databases

• Datalog as the engine of deductive databases
  − similarities with Prolog
  − has facts and rules
  − rules define (possibly recursive) views
• Semantics not always clear
  − safety
  − negation
  − recursion

Deductive Databases

g(a,b). g(b,c). g(a,d).

reach(X,X) :- g(X,Y).

reach(X,Y) :- g(X,Y).

reach(X,Z) :- reach(X,Y), reach(Y,Z).

node(X) :- g(X,Y).

node(Y) :- g(X,Y).

unreach(X,Y) :- node(X), node(Y), not reach(X,Y).
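To make the meaning of this program concrete, here is a minimal Python sketch (not a Datalog engine) of stratified bottom-up evaluation: the recursive `reach` rules are applied naively until a fixpoint, and only then is `not reach(X,Y)` evaluated for `unreach`. The function name `evaluate` and the set-of-tuples encoding are our own choices.

```python
# EDB facts from the program: g(a,b). g(b,c). g(a,d).
G = {("a", "b"), ("b", "c"), ("a", "d")}


def evaluate(g):
    # Stratum 1: positive, recursive rules for reach, naive iteration to a fixpoint.
    reach = set()
    while True:
        new = set(reach)
        new |= {(x, x) for (x, _y) in g}  # reach(X,X) :- g(X,Y).
        new |= set(g)                     # reach(X,Y) :- g(X,Y).
        new |= {(x, z)                    # reach(X,Z) :- reach(X,Y), reach(Y,Z).
                for (x, y1) in reach for (y2, z) in reach if y1 == y2}
        if new == reach:
            break
        reach = new

    # node(X) :- g(X,Y).   node(Y) :- g(X,Y).
    nodes = {x for (x, _y) in g} | {y for (_x, y) in g}

    # Stratum 2: negation only reads the completed reach relation.
    unreach = {(x, y) for x in nodes for y in nodes if (x, y) not in reach}
    return reach, nodes, unreach


reach, nodes, unreach = evaluate(G)
```

On these facts, `reach` ends up with six pairs, (a,a), (b,b), (a,b), (b,c), (a,d) and (a,c), and the other ten node pairs land in `unreach`. Among them are (c,c) and (d,d), because the first rule only makes a node reachable from itself when it has an outgoing edge. Evaluating the negation after the first round, before `reach` is complete, would wrongly put (a,c) in `unreach`, which is one reason negation combined with recursion needs care.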

Deductive Databases

• In this topic we study:
  − how to handle negation and recursion in the same program
  − how to evaluate Datalog queries efficiently
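One standard technique for evaluating recursive rules more efficiently is semi-naive evaluation: in each round, only the facts derived in the previous round (the delta) are joined again. The Python sketch below applies it to a linear transitive-closure program of our own, tc(X,Y) :- g(X,Y). tc(X,Z) :- tc(X,Y), g(Y,Z). As a rough cost measure it counts the candidate (fact, edge) pairs each method examines; the function names and the chain data are hypothetical.

```python
def naive_tc(g):
    """Naive evaluation: every round re-joins all tc facts with g."""
    tc, work = set(g), 0
    while True:
        work += len(tc) * len(g)  # candidate (tc fact, edge) pairs examined
        new = tc | {(x, z) for (x, y) in tc for (y2, z) in g if y == y2}
        if new == tc:
            return tc, work
        tc = new


def seminaive_tc(g):
    """Semi-naive evaluation: only last round's new facts (delta) are re-joined."""
    tc, delta, work = set(g), set(g), 0
    while delta:
        work += len(delta) * len(g)  # only delta facts are examined
        derived = {(x, z) for (x, y) in delta for (y2, z) in g if y == y2}
        delta = derived - tc  # keep only genuinely new facts
        tc |= delta
    return tc, work


# A chain 0 -> 1 -> ... -> 20 of 20 edges.
CHAIN = {(i, i + 1) for i in range(20)}
tc_naive, work_naive = naive_tc(CHAIN)
tc_semi, work_semi = seminaive_tc(CHAIN)
```

Both return the same 210 pairs. The semi-naive version examines each derived fact against the edges exactly once (210 × 20 = 4200 candidate pairs), while the naive version re-examines old facts in every round.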

OO Databases

• Many applications require the storage and manipulation of complex data:
  − design databases
  − geometric databases
  − …
• Object-oriented programming languages manipulate complex objects:
  − classes, methods, inheritance, polymorphism

OO Databases

• Very simple example: class Book
  − set of authors
  − title
  − set of keywords

Extremely simple to model in an OO language

Hard in a relational database!
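In Python, for instance, the Book example fits in one small class. The sketch below (class and function names are our own) also shows what happens when such an object is forced into a single relation of atomic values: every author has to be paired with every keyword.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Book:
    title: str
    authors: set = field(default_factory=set)   # set-valued attribute
    keywords: set = field(default_factory=set)  # set-valued attribute


def to_single_relation(book):
    """Flatten a Book into (Title, Author, Keyword) tuples of atomic values."""
    return {(book.title, author, keyword)
            for author, keyword in product(book.authors, book.keywords)}


dbsc = Book("Database System Concepts",
            {"Silberschatz", "Korth", "Sudarshan"},
            {"Database", "Storage"})
rows = to_single_relation(dbsc)
```

Three authors and two keywords already give 3 × 2 = 6 tuples, and every extra keyword adds three more.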

OO Databases

• In many applications, persistence of the data is nevertheless required:
  − protection against system failure
  − consistency of the data
• Mapping objects in an OO language to tuples of atomic values in a relational database is often problematic

OO Databases

• Either we ignore the multivalued dependencies

• This table is in 3NF, BCNF

Title                    | Author       | Keyword
Database System Concepts | Silberschatz | Database
Database System Concepts | Korth        | Database
Database System Concepts | Sudarshan    | Database
Database System Concepts | Silberschatz | Storage
Database System Concepts | Korth        | Storage
Database System Concepts | Sudarshan    | Storage

OO Databases

• Or we go to 4NF

Title                    | Author
Database System Concepts | Silberschatz
Database System Concepts | Korth
Database System Concepts | Sudarshan

Title                    | Keyword
Database System Concepts | Database
Database System Concepts | Storage
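A quick Python check, using the rows from the tables above, that the 4NF decomposition loses nothing: projecting onto (Title, Author) and (Title, Keyword) and joining back on Title returns exactly the original six tuples. This works because the multivalued dependency Title ↠ Author (equivalently Title ↠ Keyword) holds. The helper names are our own.

```python
# The single-table design from the "ignore the multivalued dependencies" slide.
BOOK_TABLE = {
    ("Database System Concepts", "Silberschatz", "Database"),
    ("Database System Concepts", "Korth", "Database"),
    ("Database System Concepts", "Sudarshan", "Database"),
    ("Database System Concepts", "Silberschatz", "Storage"),
    ("Database System Concepts", "Korth", "Storage"),
    ("Database System Concepts", "Sudarshan", "Storage"),
}


def project(rows, *cols):
    """Relational projection onto the given column positions (duplicates vanish)."""
    return {tuple(row[c] for c in cols) for row in rows}


def join_on_title(title_author, title_keyword):
    """Natural join of (Title, Author) and (Title, Keyword) on Title."""
    return {(t1, author, keyword)
            for (t1, author) in title_author
            for (t2, keyword) in title_keyword
            if t1 == t2}


title_author = project(BOOK_TABLE, 0, 1)   # 3 tuples
title_keyword = project(BOOK_TABLE, 0, 2)  # 2 tuples
```

The two 4NF tables store 3 + 2 = 5 tuples instead of 6, and adding a keyword now costs one tuple instead of one per author.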

OO Databases

• Basically, OODB = persistent OO programming language
  − very important concept
  − rather uninteresting scientifically
• This topic will mainly be self-study
  − reading a book chapter + Q & A session

Data Warehousing & OLAP

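[Figure: data warehouse architecture, from data sources to front-end tools. Data Sources: Operational DBs, other sources. Extract / Transform / Load / Refresh, with Monitor & Integrator and Metadata. Data Storage: Data Warehouse, Data Marts. OLAP Server: OLAP Engine, ROLAP Server, serving the Front-End Tools: Analysis, Query/Reporting, Data Mining.]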

Data Warehousing & OLAP

Transaction processing
• Operational setting
• Up-to-date = critical
• Simple data
• Simple queries; only « touch » a small part of the database

Example: flight reservations
• Ticket sales: do not sell a seat twice
• Reservation, date, name
• Give flight details of X; list flights to Y
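The "do not sell a seat twice" requirement is typically enforced by the DBMS itself. A minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical `reservation` table in which a UNIQUE constraint on (flight, flight_date, seat) makes a second sale of the same seat fail; each sale is a short transaction touching a single row. Table, column and flight names are our own.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE reservation (
            flight      TEXT NOT NULL,
            flight_date TEXT NOT NULL,
            seat        TEXT NOT NULL,
            name        TEXT NOT NULL,
            UNIQUE (flight, flight_date, seat)  -- one passenger per seat per flight
        )""")
    return conn


def sell_seat(conn, flight, flight_date, seat, name):
    """Try to sell a seat; return False if it was already sold."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO reservation VALUES (?, ?, ?, ?)",
                         (flight, flight_date, seat, name))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
first = sell_seat(conn, "XY123", "2008-02-01", "12A", "Alice")  # hypothetical data
second = sell_seat(conn, "XY123", "2008-02-01", "12A", "Bob")
```

The first sale succeeds and the second is rejected by the constraint, so the table keeps exactly one reservation for seat 12A.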

Data Warehousing & OLAP

Decision support
• Off-line setting
• « Historical » data
• Summarized data
• Integrate different databases
• Statistical queries

Example: flight company
• Evaluate the ROI of flights
• Flights of last year
• # passengers per carrier for destination X
• Passengers, fuel costs, maintenance info
• Average % of seats sold/month/destination
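As an illustration of such a statistical query, a sketch of "average % of seats sold per month per destination" in SQL, run through Python's `sqlite3`. The `flight` table, its columns and the numbers are hypothetical; in a warehouse they would come from integrated, historical data.

```python
import sqlite3

# Hypothetical flight data: (date, destination, seats on the plane, seats sold).
FLIGHT_ROWS = [
    ("2008-01-05", "Rome", 200, 150),
    ("2008-01-19", "Rome", 200, 190),
    ("2008-01-07", "Oslo", 100, 60),
    ("2008-02-02", "Rome", 200, 100),
]


def avg_pct_sold_per_month_destination(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flight (flight_date TEXT, destination TEXT,"
                 " seats INTEGER, sold INTEGER)")
    conn.executemany("INSERT INTO flight VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT substr(flight_date, 1, 7) AS month,   -- 'YYYY-MM'
               destination,
               AVG(100.0 * sold / seats) AS avg_pct_sold
        FROM flight
        GROUP BY month, destination
        ORDER BY month, destination"""
    result = conn.execute(query).fetchall()
    conn.close()
    return result


report = avg_pct_sold_per_month_destination(FLIGHT_ROWS)
```

For the rows above this gives 60.0 for Oslo and 85.0 for Rome in 2008-01, and 50.0 for Rome in 2008-02. Unlike the reservation transaction, the query reads every row of the table.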

Data Warehousing & OLAP

• In this topic we will study:
  − conceptual models for decision support
  − the database explosion problem
  − efficient implementation strategies: indexing, view materialization
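One way to get a feel for why precomputing aggregates can take far more space than the data itself: the sketch below materializes SUM(passengers) for every one of the 2^d group-bys (the "data cube") over d = 3 dimensions of a tiny, hypothetical fact table. Four fact rows already yield 24 aggregate cells. The table contents and the function name `full_cube` are our own.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("carrier", "destination", "month")

# Hypothetical fact table: one row per (carrier, destination, month).
FACTS = [
    {"carrier": "A", "destination": "Rome", "month": "2008-01", "passengers": 150},
    {"carrier": "A", "destination": "Oslo", "month": "2008-02", "passengers": 60},
    {"carrier": "B", "destination": "Rome", "month": "2008-02", "passengers": 100},
    {"carrier": "B", "destination": "Nice", "month": "2008-01", "passengers": 80},
]


def full_cube(facts, dims, measure):
    """Return {group-by attributes: {group key: SUM(measure)}} for all subsets of dims."""
    cube = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            totals = defaultdict(int)
            for row in facts:
                totals[tuple(row[d] for d in group)] += row[measure]
            cube[group] = dict(totals)
    return cube


cube = full_cube(FACTS, DIMS, "passengers")
cells = sum(len(cuboid) for cuboid in cube.values())
```

Each added dimension doubles the number of group-bys, which is why implementations typically materialize only a selection of views and compute the rest on demand.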

XML

• Why is XML important?
  − simple, open, non-proprietary, widely accepted data exchange format
• XML is like HTML, but:
  − no fixed set of tags (X = “extensible”)
  − no fixed semantics (or representation) of tags: the representation is determined by a separate ‘stylesheet’, the semantics by the application
  − no fixed structure: user-defined schemas

XML

<PersonList Type="Student" Date="2004-12-12">
  <Title Value="Student List"/>
  <Contents>
    <Person>
      <Name>Jan Vijs</Name>
      <Id>11</Id>
      <Address>
        <Number>123</Number>
        <Street>Turnstreet</Street>
      </Address>
    </Person>
    <Person>
      <Id>66</Id>
      <Address>
        <Street>Hole Rd</Street>
      </Address>
    </Person>
  </Contents>
</PersonList>
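A short Python sketch, using the standard `xml.etree.ElementTree` module, that reads the PersonList document above. It illustrates the point about "no fixed structure": the second Person has no Name and no Number, so those lookups simply return None instead of failing. The function name `people` is our own.

```python
import xml.etree.ElementTree as ET

DOC = """<PersonList Type="Student" Date="2004-12-12">
  <Title Value="Student List"/>
  <Contents>
    <Person>
      <Name>Jan Vijs</Name>
      <Id>11</Id>
      <Address><Number>123</Number><Street>Turnstreet</Street></Address>
    </Person>
    <Person>
      <Id>66</Id>
      <Address><Street>Hole Rd</Street></Address>
    </Person>
  </Contents>
</PersonList>"""


def people(xml_text):
    """Return (Id, Name, Number, Street) per Person; missing elements give None."""
    root = ET.fromstring(xml_text)
    return [
        (p.findtext("Id"), p.findtext("Name"),
         p.findtext("Address/Number"), p.findtext("Address/Street"))
        for p in root.iter("Person")
    ]


rows = people(DOC)
```

A relational table for these persons would need NULLs for the missing fields; in XML the elements are simply absent.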

XML

• In this topic:
  − XML
  − XQuery, XSLT
  − LiXQuery
• Taught by prof. Paredaens

Project

• Pick one of the 4 topics:
  − deductive databases / rule-based systems
  − object-oriented databases
  − data warehouses
  − semi-structured databases
• Formulate your own project
  − illustrating the different course concepts
  − showing that you have mastered the technology

Project

• Make a project proposal (WEEK 10)
  − examples from last year will be given
  − fulfilling certain constraints
  − listing the technologies to be used
• Status report (WEEK 15)
• Final report (WEEK 20)
• Project presentations (WEEKS 21 & 22)

Overview of Changes

• First, some facts and figures regarding Spring 2008:
  − heterogeneous group: outside NL, HBO, BSc TU/e
  [Chart labels: CSE, BIS]

Overview of Changes

• Some suggestions I decided to act upon:

1. Start with the difficult material:
   − expressiveness of RA
   − Gaifman locality
2. Too much time is being spent on XML
   − (5+5) → (6+3) & topic (XSLT) has been added
3. Disproportionate weight given to XML in the exam
   − project no longer exclusively XML

Overview of Changes

• Some suggestions I decided to act upon:

4. Some material and instruction was just too hard
   − extra exercises will be added; more modular
5. The course was split into lots of individual subjects with no apparent relation to one another
   − tried to address that in the course motivation

Overview of Changes

• Some suggestions that were ignored:

"A Google search for 'advanced databases' returns quite a few courses from other universities that look interesting to me. Perhaps the lecturers could take a look at those."

− When (re-)constructing the course last year, other universities' ADB courses were surveyed. Many of the interesting topics are already covered in other courses (Data Mining, Information Retrieval, Database Technology).

Overview of Changes

• Some suggestions that were ignored:

"Don't discuss prerequisite knowledge too much; it is prerequisite."

− Heterogeneous group.

"Balance the course subjects more; TC (transitive closure) was discussed very specifically, while the other 3 subjects were treated globally."

− Time spent on TC is justified by its difficulty and its importance for database theory; it also motivates OODB & Deductive DB.

Overview of Changes

• Take-away message
  − (some?) lecturers do act on questionnaires
  − filling out the questionnaires is useful

Summary

• The relational model has limitations: it is geared to simple queries and simple data
• OODBs allow complex data types
• Deductive databases: Datalog allows complex queries
• Somewhere in between: data warehouses and OLAP
  − special requirements, special data structures
• Semi-structured data can be stored in XML
• The project complements the theoretical lectures
• Practical sessions for clarification

!! See you on Friday !!