introduction to databases: relational and xml models and languages instructors: bertram ludaescher...
TRANSCRIPT
![Page 1: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/1.jpg)
Introduction to Databases:Introduction to Databases:Relational and XML Models Relational and XML Models and Languagesand Languages
Introduction to Databases:Introduction to Databases:Relational and XML Models Relational and XML Models and Languagesand Languages
Instructors:
Bertram LudaescherKai Lin
Instructors:
Bertram LudaescherKai Lin
![Page 2: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/2.jpg)
2Introduction to Databases, B. Ludaescher & K. Lin
Overview
• 09:15-10:20 Relational Databases (1h05’)• 10:20-10:30 BREAK (10’)• 10:30-11:50 Relational Databases (1h20’)• 11:50-13:15 LUNCH (1h25’)• 13:15-13:45 Demo & Hands-on (30’)• 13:45-15:10 XML: Basics (1h25’)• 15:10-15:30 BREAK (20’)• 15:30-16:30 XML: Querying (1h)• 16:30-17:00 Demo & Hands-on (30’)
![Page 3: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/3.jpg)
3Introduction to Databases, B. Ludaescher & K. Lin
Scope• Today: Introduction to Databases, in
particular– Relational database model– Relational Operations and Queries– Constraints– XML “data” model– Querying and transforming XML– Some demos & simple hands-on exercise
• Tomorrow: – Introduction to Knowledge Representation and
Ontologies
• But first: … déjà vu …
![Page 4: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/4.jpg)
4Introduction to Databases, B. Ludaescher & K. Lin
![Page 5: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/5.jpg)
5Introduction to Databases, B. Ludaescher & K. Lin
What is a Database System?
• Database (system) = – Database Instance (set of tables of rows)– Database Management System (DBMS)
• Origins in the commercial world: – to organize, query, and manipulate data
more effectively, efficiently, and independently
• Scientific databases– often special features:
• spatial, temporal, spatiotemporal, GIS, units, uncertainty, raw & derived data, …
![Page 6: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/6.jpg)
6Introduction to Databases, B. Ludaescher & K. Lin
Why not just use files as “databases”?• For some applications: yeah… why not?• But in general:
– scanning & ’grep’ing large files can be very inefficient– no language support for selecting desired data, joining
them, etc.• cannot express the kinds of questions/queries you’d like to ask• ‘grep’ is no substitute for a query language
– redundant and/or inconsistent storage of data– no transaction management and concurrency control
among multiple users– no security– no recovery– no data independence (application data)– no data modeling support– …
![Page 7: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/7.jpg)
7Introduction to Databases, B. Ludaescher & K. Lin
Features of a Database System• A data model (relational, object-
oriented, XML) prescribes how data can be organized:– as relations (tables) of tuples (rows)– as classes of (linked) objects– as XML trees
• A (database) schema (stored in the “data dictionary”) defines the structure of a specific database instance:– Relational schema– OO schema– XML Schema (or XML DTD)
![Page 8: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/8.jpg)
8Introduction to Databases, B. Ludaescher & K. Lin
Features of a Database System• Data is treated uniformly and separately from the
application• Efficient data access • Queries and views are expressed over the schema• Integrity constraints (checking and enforcement)• Transactions combine sets of operations into
logical units (all-or-nothing)• Synchronization of concurrent user transactions• Recovery (after system crash)
– not to be confused w/ backup– instead: guarantee consistency by “roll-back” of partially
executed transactions (how? Hint: logging)
• …
![Page 9: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/9.jpg)
9Introduction to Databases, B. Ludaescher & K. Lin
DB features, e.g., Concurrency Control
• Concurrent execution of simultaneous requests – long before web servers where around... – transaction management guarantees consistency
despite concurrent/interleaved execution• Transaction (= sequence of read/write operations)
– Atomicity: a transaction is executed completely or not at all
– Consistency: a transaction creates a new consistent DB state, i.e., in which all integrity constraints are maintained
– Isolation: to the user, a transaction seems to run in isolation
– Durability: the effect of a successful (“committed”) transaction remains even after system failure
![Page 10: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/10.jpg)
10Introduction to Databases, B. Ludaescher & K. Lin
Levels of Abstraction: Architecture Overview
DBinstances
Physical level
Logical (“conceptual”) level
View 1 View 2 View nConceptual…
Level
Index structures
Tables
Export schemas
ER-Model(Entity-Relationship) OO Models (Classes…)
part of DB design conceptual design
… often lost in the process…
logical dataindependence
physical data independence
User
![Page 11: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/11.jpg)
11Introduction to Databases, B. Ludaescher & K. Lin
Database Design: Entity-Relationship (ER) Model
• Entities:• Relationships:• Attributes:• ER Model:
– initial, high-level DB design (conceptual model)– easy to map to a relational schema (database tables)– comes with more constraints (cardinalities, aggregation) and
extensions: EER (is-a => class hierarchies)– related: UML (Unified Modeling Language) class diagrams
Employee Departmentworks-for
Name Salary ManagerName
since
![Page 12: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/12.jpg)
12Introduction to Databases, B. Ludaescher & K. Lin
The Relational Model
• Relation/Table Name:– employee, dept
• Attributes = Column Names:– Emp, Salary, DeptNo, Name, Mgr
• Relational Schema:– employee(Emp:string,
Salary:integer, DeptNo:integer), ...
• Tuple = Row of the table:– (“tom”, “60000”, “1”)
• Relation = Set of tuples:– {(...), (...), ...}
Emp Salary DNotom 60k 1tim 57k 1sally 45k 3carol 30k 1carol 35k 2….
Employee
DNo Name Mgr 1 Toys carol2 Comp. carol3 Shoes sam
Department
FK: foreign key, pointing to another key
![Page 13: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/13.jpg)
13Introduction to Databases, B. Ludaescher & K. Lin
Ex: Creating a Relational Database in SQL
CREATE TABLE employee (ssn CHAR(11),name VARCHAR(30),deptNo INTEGER,PRIMARY KEY (ssn),
FOREIGN KEY (deptNo) REFERENCES department )
CREATE TABLE department (deptNo INTEGER,name VARCHAR(20),manager CHAR(11),PRIMARY KEY (deptNo),
FOREIGN KEY (manager) REFERENCES employee(ssn) )
![Page 14: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/14.jpg)
14Introduction to Databases, B. Ludaescher & K. Lin
What is a Query?• Intuitively:
– An “executable question” in terms of a database schema
– Evaluating a query Q against a database instance D yields a set of answer objects:
• Relational tuples or XML elements
• Example:– Who are the employees in the ‘Toys’ dept.?– Who is (are) the manager(s) of ‘Tom’?– Show all pairs (Employee, Mgr)
• Technically: – A mapping from an input schema (the given table
schemas) to a result schema (the new columns you are interested in) defined in some query language
![Page 15: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/15.jpg)
15Introduction to Databases, B. Ludaescher & K. Lin
Why (Declarative) Query Languages?
• Things we talk and think about in PLs and QLs … – Assembly languages:
• registers, memory locations, jumps, ...
– C and the likes: • if-then-else, for, while, memory (de-)allocation,
pointers, ...
– Object-oriented languages:• C++: C plus objects, methods, classes, ...• Java: objects, methods, classes, references, ... • Smalltalk: objects, objects, objects, ... • OQL: object-query language
,,Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt.”“The limits of my language mean the limits of my world.”
Ludwig Wittgenstein, Tractatus Logico-Philosophicus
“If you have a hammer, everything looks like a nail.”
![Page 16: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/16.jpg)
16Introduction to Databases, B. Ludaescher & K. Lin
Why (Declarative) Query Languages?
• Things we talk and think about in PLs and QLs …– Functional languages (Haskell, ML):
• (higher-order) functions, fold(l|r), recursion, patterns, ...
=> Relational languages (SQL, Datalog)• relations (tables), tuples (rows); conceptual level: ER• relational operations: , , , , ..., ,,,,,..., , , |X|
=> Semistructured/XML (Tree) & Graph Query Languages
• trees, graphs, nodes, edges, children nodes, siblings, … • XPath, XQuery, …
• Also: – Focus on what, and not how!
![Page 17: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/17.jpg)
17Introduction to Databases, B. Ludaescher & K. Lin
Example: Querying a Relational Database
Emp SalaryDeptNoanne 62k 2john 60k 1
Employee
DeptNoMgr
1 anne2 anne
Department
SELECT e.Emp, d.MgrFROM Employee e, Department dWHERE e.DeptNo = d.DeptNo
Emp Mgrjohn anneanne anne
result
join
input tables
SQL query (or view def.)
answer (or view)
we don’t say how to evaluate this expression
![Page 18: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/18.jpg)
18Introduction to Databases, B. Ludaescher & K. Lin
Example Query: SQL vs DATALOG
• “List all employees and their managers”
• In SQL:SELECT e.name, d.managerFROM Employee e, Department dWHERE e.deptNo = d.deptNo
• In DATALOG:q(E, M) :- employee(E, S, D), department(D, N,
M).
a “join” operation
![Page 19: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/19.jpg)
19Introduction to Databases, B. Ludaescher & K. Lin
Important Relational Operations• select(R, Condition)
– filter rows of a table wrt. a condition
• project(R, Attr) – remove unwanted columns; keep rest
• join(R1, A2, R2, A2, Condition) – find “matches” in a “related” table – e.g. match R1.foreign key = R2.primary key
• cartesian product(R1, R2)• union (“OR”), intersection (“AND”)• set-difference (“NOT IN”)
![Page 20: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/20.jpg)
20Introduction to Databases, B. Ludaescher & K. Lin
Relational Operations (in DATALOG)
condition
Y1=Y2Y
independent
same
multiple rules union results
(query) output :– (query) input(query) output :– (query) input
![Page 21: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/21.jpg)
21Introduction to Databases, B. Ludaescher & K. Lin
Demo
Relational Queries in DATALOG
![Page 22: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/22.jpg)
22Introduction to Databases, B. Ludaescher & K. Lin
Queries, Views, Integrity Constraints• … can all be seen as “special queries”
• Query q(…) :- … ad-hoc queries
• View v(…) :- … exported views;
• Integrity Constraints– ic (…) :- …. MgrSal < EmpSal … – say what shouldn’t happen– if it does: alert the user (or refuse an update,
…)
![Page 23: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/23.jpg)
23Introduction to Databases, B. Ludaescher & K. Lin
Query Evaluation vs Reasoning• Query evaluation
– Given a database instance D and a query Q, run Q(D)– What databases do all the time
• Reasoning (aka “Semantic Query Optimization”)– Given a query Q and a constraint C, “optimize” Q&C
(e.g., given C, Q might be unsatisfiable)– Given Q1 and Q2 decide whether Q1 Q2– Given Q1,Q2, C decide whether Q1 Q2 | C
– Note: we are NOT given a database instance D here; just the schema and the query/IC expressions
![Page 24: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/24.jpg)
24Introduction to Databases, B. Ludaescher & K. Lin
Summary QLs for Relational Databases
Natural Hoin: same attribute name add condition that values must match
![Page 25: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/25.jpg)
25Introduction to Databases, B. Ludaescher & K. Lin
Relational Algebra
![Page 26: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/26.jpg)
26Introduction to Databases, B. Ludaescher & K. Lin
Relational Algebra
![Page 27: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/27.jpg)
27Introduction to Databases, B. Ludaescher & K. Lin
Relational Algebra
![Page 28: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/28.jpg)
28Introduction to Databases, B. Ludaescher & K. Lin
Relational Algebra
![Page 29: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/29.jpg)
29Introduction to Databases, B. Ludaescher & K. Lin
Relational Algebra
![Page 30: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/30.jpg)
30Introduction to Databases, B. Ludaescher & K. Lin
Hands-on Part
DBDesigner 4
![Page 31: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/31.jpg)
31Introduction to Databases, B. Ludaescher & K. Lin
DBDesigner 4 • An open source (GPL) database design tool 1. Goto http://www.fabforce.net/dbdesigner4/ 2. Download and install (5 min)3. Open the example schema Order (File -> Open -> Order), and
examine the relations between the tables forumtopic and forumpost. Find the foreign key used in the relation postHasTopic (5 min)
4. Connect to a sample MySQL database host: geon07.sdsc.edu port: 3306 database: summer_institute username: root password: [blank]
(5 min)5. Select all records in the table forumtopic (2 min)6. Select all records in the table forumpost, and sort the result
according to their idforumpost (3min)7. Find all forum posts with the topic Cars (5 min)8. Insert a record into the table forumpost (5 min)
![Page 32: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/32.jpg)
32Introduction to Databases, B. Ludaescher & K. Lin
Additional Material
(not presented)
![Page 33: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/33.jpg)
33Introduction to Databases, B. Ludaescher & K. Lin
Non-Relational Data Models• Relational model is “flat”: atomic data values
– nesting is modeled via “pointers” (foreign keys) and “Skolem-ids”
– extension: nested relational model (“tables within tables”, cf. nested HTML tables)
– values can be nested lists {...}, tuples (...), sets [...]– ISO standard(s): SQL– identity is value based
• Object-oriented data model:– complex (structured) objects with object-identity (oid)– class and type hierarchies (sub-/superclass, sub-/supertype)– OODB schema may be very close to “world model” (no
translation into tables)(+) queries fit your OO schema(-) (new) queries that don’t fit nicely
– ODMG standard, OQL (Object Query Language)
![Page 34: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/34.jpg)
34Introduction to Databases, B. Ludaescher & K. Lin
Example: Object Query Language (OQL)
• Q: what does this OQL query compute?• Note the use of path expressions like e.manager.children=> Semistructured/Graph Databases
SELECT DISTINCT STRUCT( E: e.name, C: e.manager.name, M: ( SELECT c.name FROM c IN e.children
WHERE FOR ALL d IN e.manager.children: c.age > d.age ) ) FROM e IN Employees;
![Page 35: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/35.jpg)
35Introduction to Databases, B. Ludaescher & K. Lin
A Graph Database
![Page 36: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/36.jpg)
36Introduction to Databases, B. Ludaescher & K. Lin
Querying Graphs with OO-Path Expressions
?- dblp."Inf. Systems".L.P, substr("Volume",L), P : person.spouse[lives_in = P.lives_in].
?- dblp."Inf. Systems".L."Michael E. Senko".Answer:
L="Volume 1, 1975”;L="Volume 5, 1980".
![Page 37: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/37.jpg)
37Introduction to Databases, B. Ludaescher & K. Lin
Constructs for Querying Graphs
Example: ?- dblp . any* . (if(vldb)| if(sigmod))
![Page 38: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/38.jpg)
38Introduction to Databases, B. Ludaescher & K. Lin
Keys, Keys, and more Keys• A key is a minimal set of attributes that:
– uniquely identify a tuple– determine every other attribute value in a tuple
• There may be many keys for a relation; we designate one as the primary key
• The phrase candidate key is used in place of “key” where “the key” denotes the primary key
• A superkey is a superset of a key (i.e., not necessarily minimal)
![Page 39: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/39.jpg)
39Introduction to Databases, B. Ludaescher & K. Lin
Example of “good” design
Employee(EName, SSN, BDate, Address, DNumber)Employee(EName, SSN, BDate, Address, DNumber)
Example of “bad” design (why is it bad?)
Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)
Normalization of Relations
Department(DName, DNumber, DMgrSSN)Department(DName, DNumber, DMgrSSN)
![Page 40: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/40.jpg)
40Introduction to Databases, B. Ludaescher & K. Lin
The description of the department (DName, DMgrSSN) is repeated for every employee that works in that department.
Redundancy!
The department is described redundantly.This leads to update anomalies! (… and wastes space)
Digression (for experts only): • redundancy can be used to increase performance; e.g. “materialized views”)
What’s Wrong?
Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)
![Page 41: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/41.jpg)
41Introduction to Databases, B. Ludaescher & K. Lin
Update Anomalies (caused by redundancy)
Insertion AnomaliesIf you insert an employee
Need to know which department he/she worksNeed to know the description information for that department
If you want to insert a department, you can’t … until there is at least one employee
Deletion Anomalies: if you delete an employee, is that dept. gone? was this the last employee in that dept?
* Modification Anomalies: if you change DName, for example, it needs to be changed everywhere!
![Page 42: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/42.jpg)
42Introduction to Databases, B. Ludaescher & K. Lin
Null values also cause problemsNull values might help in special cases, but are
not a general solution to update anomalies
For example, they may:– Waste space– Make it hard to specify and understand joins– Make it hard to aggregate (count, sum, etc.)– Have different meanings:
• Attribute does not apply to this tuple• Attribute value is unkown• Value is known but absent (not yet recorded)
– Causes problems with queries (can’t interpret query answers)
![Page 43: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/43.jpg)
43Introduction to Databases, B. Ludaescher & K. Lin
Why worry about normalization?
• To … • … reduce redundancy & update anomalies
• … reduce the need for null values
• Last not least: it’s a solved problem:•all algorithms & proofs have been worked out
Here: Normalization based on FDs (there’s more…)
![Page 44: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/44.jpg)
44Introduction to Databases, B. Ludaescher & K. Lin
Functional DependenciesStatement that if two tuples agree on attributes A
they must agree on attributes B
A B
If the value of the first attribute(s), A, is known, then the value of the second attribute(s), B, is known (read “A determines B”)
We want to know if it is always true in the application
![Page 45: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/45.jpg)
45Introduction to Databases, B. Ludaescher & K. Lin
Functional Dependencies
Examples of functional dependencies: social-security-number employee-name course-number course-title
Examples that are NOT functional dependencies
course-number - book course-number - car-color
![Page 46: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/46.jpg)
46Introduction to Databases, B. Ludaescher & K. Lin
Remember what it means to be a function:
x f(x) x g(x) x h(x)1 2 1 2 1 101 3 2 2 2 202 5 3 5 3 303 5
we are looking for functional relationships(that must occur in a relation) among attribute values
What is a functional dependency?
f is not a functionfor x=1, f(x) is not unique
![Page 47: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/47.jpg)
47Introduction to Databases, B. Ludaescher & K. Lin
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
What are the FDs?
![Page 48: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/48.jpg)
48Introduction to Databases, B. Ludaescher & K. Lin
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
12
3
4
5
6
7
8
9
10
What are the FDs?
![Page 49: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/49.jpg)
49Introduction to Databases, B. Ludaescher & K. Lin
EMPLOYEE (SSN, NAME, SALARY, JOB_DESC)
Why do we care?
If all FDs are “implied by the key”
it means that the DBMS only enforces keys (not FDs) and the DBMS is going to enforce the keys anyway
otherwise, we have to perform expensive operations to maintain consistency (code or check statements)
![Page 50: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/50.jpg)
50Introduction to Databases, B. Ludaescher & K. Lin
Decomposition
Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD) based on the projection operator.
Decomposition should be used judiciously:– Is there reason to decompose a relation?
(via Normal Forms)– What problems (if any) does the
decomposition cause? (lost information or dependencies?)
![Page 51: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/51.jpg)
51Introduction to Databases, B. Ludaescher & K. Lin
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
EMP1(SSN, ENAME, BDATE, ADDRESS, DNUM)
X(DNUM, DNAME, DMGRSSN)
EMP2(SSN, ENAME)
X(PNUM, PNAME, PLOCATION)
Y(SSN, PNUM, HOURS)
12
3 4
5 6
7
8 9
10
How can we decompose using the Project Operator?
![Page 52: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/52.jpg)
52Introduction to Databases, B. Ludaescher & K. Lin
Correct DecompositionsHow do we know if a decomposition is correct?That we haven’t lost anything?
• We have three goals:
lossless-join decomposition (don’t throw any information away)(be able to reconstruct the original relation)
dependency preservationall of the FDs end up in just one relation (not split across two or more relations)
Boyce-Codd Normal Form (BCNF) - no redundancy
![Page 53: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/53.jpg)
53Introduction to Databases, B. Ludaescher & K. Lin
Lossless Decompositions• What is a lossless decomposition?• What is a lossy decomposition?
When R is decomposed into R1 and R2
Check to see if (R1 R2) = R
if it is a lossy decomposition, then R1 R2
gives you TOO MANY tuples.
Note: we are doing a natural join
![Page 54: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/54.jpg)
54Introduction to Databases, B. Ludaescher & K. Lin
Example: a lossy decomposition
Now join them using natural join: we get extra tuples!!!
1 smith p2 billing
Original: Employee SSN Name Project Ptitle
1 Smith p1 accounting 2 Jones p1 accounting 3 Smith p2 billing
Decomposition: Employee SSN Name Project PID Ptitle Name
1 Smith p1 accounting Smith 2 Jones p1 accounting Jones 3 Smith p2 billing Smith
![Page 55: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/55.jpg)
55Introduction to Databases, B. Ludaescher & K. Lin
Testing for a Lossless DecompositionFor decomposition into two relations
Let R1 and R2 form a decomposition of R.
R1 and R2 are both sets of attributes from R.
For the decomposition to be lossless ... The attributes in common must be a key for 1 of
the relations!
Note: You are joining a key and a foreign key
![Page 56: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/56.jpg)
56Introduction to Databases, B. Ludaescher & K. Lin
Example: test for a lossless decomposition
Employee(SSN, name, project, p-title)
decomposition: Employee (SSN, name) Project (project, p-title, name)
Which attribute is in common? Employee.Name and Project.Name
Is name a key for either of these two tables?NO! We have a problem.
![Page 57: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/57.jpg)
57Introduction to Databases, B. Ludaescher & K. Lin
Example: test for a lossless decomposition
Employee(SSN, name, project, p-title)
decomposition: Employee (SSN, name) Project (project, p-title)
Which attribute is in common? None
Is this decomposition lossless? NO! We have a problem.
![Page 58: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/58.jpg)
58Introduction to Databases, B. Ludaescher & K. Lin
Example: test for a lossless decomposition
Employee(SSN, name, project, p-title)
decomposition: Employee (SSN, name, project) Project (project, p-title)
Which attribute is in common? Employee.project and Project.project
Is project a key for either of these two tables?YES! We have a lossless decomposition.
![Page 59: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/59.jpg)
59Introduction to Databases, B. Ludaescher & K. Lin
Some Preliminary Normal Form Definitions
Prime Attribute - an attribute A is prime if it is a member of a key, otherwise it is nonprime
Partial Dependency - given an FD XY, Y is partially dependent on X if there is a proper subset X of X such that XY.
Transitive Dependency - A is transitively
dependent upon X if XY, Y-/X, and YA and AXY.
![Page 60: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/60.jpg)
60Introduction to Databases, B. Ludaescher & K. Lin
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
12
3
4
5
6
7
8
9
10
What are the FDs?
![Page 61: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/61.jpg)
61Introduction to Databases, B. Ludaescher & K. Lin
Normal Forms Based on FDs
1NF - all attribute values (domain values) are atomic(part of the definition of the relational model)
2NF - 1NF + no key partial dependencies (every nonprime attribute fully depends on every key of R)
R(A B C D) B C (not allowed)
3NF - 2NF + no nonprime attribute is transitively dependent upon any key
R(A B C D) AB C, C D (not allowed)
BCNF - 3NF + no attribute is transitively dependent upon any key (all FDs determined by a key)
R(A B C D) AB C, C B (not allowed)
![Page 62: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/62.jpg)
62Introduction to Databases, B. Ludaescher & K. Lin
BCNF, Lossless, and Dependency-Preserving
(first choice)
3NF, Lossless and Dependency-Preserving
(second choice)
because sometimes you can’t preserve all dependencies
What’s the Goal?
![Page 63: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/63.jpg)
63Introduction to Databases, B. Ludaescher & K. Lin
Finding all of the FDsArmstrong’s Axioms
Reflexivity: If X Y, then X YTrivially, all attributes A A, e.g., name name and gender gender
Augmentation: If X Y, then XZ YZ for any ZTransitivity: If X Y and Y Z, then X Z
X, Y, and Z are sets of attributes in RArmstrong’s Axioms are a sound & complete set of inference rules
Union: If X Y and X Z, then X YZDecomposition: If X YZ, then X Y and X Z
![Page 64: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/64.jpg)
64Introduction to Databases, B. Ludaescher & K. Lin
Finding all FDs: the closure of a set of FDs
The closure of a set of FDs:
Let F be a set of FDs and F+ the closure of F.
F+ is the set of all FDs implied (or derivable) from F using Armstrong’s Axioms
F+ is computed by applying the inference rules until no new FDs can be found
![Page 65: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/65.jpg)
65Introduction to Databases, B. Ludaescher & K. Lin
What is “Dependency Preserving”
Suppose F is the original set of FDs and G is the set of FDs after decomposition
If we compute F+ and G+, and F+ = G+ then the decomposition is dependency preserving
![Page 66: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/66.jpg)
66Introduction to Databases, B. Ludaescher & K. Lin
Other Results (textbook)
Algorithms To:– Compute F+
– Compute the Minimal Cover for F– Find a dependency-preserving
decomposition into 3NF– Find a lossless-join decomposition into
BCNF– Find a lossless-join & dependency
preserving decomposition into 3NF
![Page 67: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/67.jpg)
67Introduction to Databases, B. Ludaescher & K. Lin
addr(number, street, city, state, zip)
number, street, city, state zipzip state
If we decompose, this FD won’t occur within one relation. Thus, since the DBMS only enforces keys (and not FDs directly), this can’t be enforced.
(If we leave it alone, it is in 3NF)
Example: Not dependency preserving
![Page 68: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/68.jpg)
68Introduction to Databases, B. Ludaescher & K. Lin
Lossless join decomposition algorithm1. Set D = {R} (the current set of relations)
2. While there is a relation in D that’s not in BCNFChoose a relation scheme Q that is not in BCNFFind a FD X Y in Q that violates BCNFReplace Q in D by (Q – Y) and (X Y)
End While;
3. Identify dependences that are not preserved (X A).
4. Add XA as a table to the set D
![Page 69: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/69.jpg)
69Introduction to Databases, B. Ludaescher & K. Lin
Tuning the Conceptual Schema
The choice of conceptual schema should be guided by the workload, in addition to redundancy issues:
– We may settle for a 3NF schema rather than BCNF.
– E.g., the workload may influence our choice to decompose a relation into 3NF or BCNF.
– We may further decompose a BCNF schema!
– We might denormalize (i.e., undo a decomposition step), or add fields to a relation. We must take care to avoid the problems caused by the redundancy!
![Page 70: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/70.jpg)
70Introduction to Databases, B. Ludaescher & K. Lin
Decomposition of one table into several
A relation is replaced by a collection of relations that are projections. Most important case. (Vertical partitioning)
• Sometimes, we might want to replace a relation by a collection of relations that are selections. (Horizontal partitioning)
– Each new relation has the same schema as the original, but a subset of the rows.
– Collectively, new relations contain all rows of the original. Typically, the new relations are disjoint.
Why might we do this?
![Page 71: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/71.jpg)
71Introduction to Databases, B. Ludaescher & K. Lin
Tuning the Conceptual Schema
Vertical decomposition using the project operator
Coursec# cname instructor room days
Course2c# cname
Course1c# room days
Course3c# instructor
![Page 72: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/72.jpg)
72Introduction to Databases, B. Ludaescher & K. Lin
Tuning the Conceptual Schema
Horizontal partition/decomposition using the select operator
Coursec# cname instructor room days
Undergraduate-Coursec# cname instructor room days
Graduate-Coursec# cname instructor room days
![Page 73: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/73.jpg)
73Introduction to Databases, B. Ludaescher & K. Lin
Tuning the Conceptual Schema
Denormalizing……always introduces redundancy!Course-Offering (offering#, quarter, c#, instructor, time, days)
Course-Offering (offering#, quarter, c#, cname, instructor, room, time, days)
This introduces redundancy - which must be managed.
Course (c#, cname) Instructor-Room (instructor, room)
![Page 74: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin](https://reader036.vdocument.in/reader036/viewer/2022081514/56649cd95503460f949a2cd8/html5/thumbnails/74.jpg)
74Introduction to Databases, B. Ludaescher & K. Lin
Denormalization interferes with dependency preservation
Course-Offering (offering#, quarter, c#, cname, instructor, room, time,
days)
Note that having the FD instructor room (and nothing else) in a single table allows the FD to be enforced (by enforcing the candidate key of instructor).
Instructor-Room (instructor, room)
Denormalization