neu 221: neuroinformatics seminar introduction to databases bertram ludäscher ludaesch@sdsc.edu san...
Post on 02-Jan-2016
221 Views
Preview:
TRANSCRIPT
NEU 221: Neuroinformatics Seminar Introduction to Databases
Bertram Ludäscherludaesch@sdsc.edu
San Diego Supercomputer CenterU.C. San Diego
2 B. Ludäscher: Introduction to Databases
Outline
• What is a DB and why should I care? • DB Basics & Architecture• Relational Model (SQL, ER)• Extended/Other Models
– Deductive Databases
– Object-Oriented Databases
– Semistructured/Graph-Databases
3 B. Ludäscher: Introduction to Databases
What is a Database?
• The term database can stand for ...– a concrete collection of data (books@amazon, CCDB@NCMIR) – a system (software & hardware) for storing and managing databases
(=> Database Management System: DBMS + DB)
• Underlying data model => Type of DBMS (short: DB) – relational model: based on relations (“tables”) and entities– object-oriented model: complex objects, classes – object-relational model: relations + objects– XML: “semistructured” model, trees
• Specialized/extended models– deductive DBs– multimedia DBs– GIS (Geographic Information Systems)
4 B. Ludäscher: Introduction to Databases
Functions of a DBMS (aka what does it buy me?)
• Persistent Data Storage – but don’t forget to backup!
• Efficient & High-Level Querying of Very Large Datasets– file systems + your homegrown “scans” won’t do for VLDBs!!
• Same for Updates: insert, delete, and modify• Data Integrity, Security
– Checking/enforcement of integrity constraints
– Access control
• Concurrent (multi-user) Access, Transactions, Recovery• Robust, Scalable Data Management Solutions
5 B. Ludäscher: Introduction to Databases
3-Level ANSI/SPARC Database Architecture
• external (user) level • conceptual (logical) level • internal (physical) level
=> Data independence– logical data independence
– physical data independence
View -1 View -2 View -n
physical schema
conceptual/logical schema
6 B. Ludäscher: Introduction to Databases
Concurrency Control
• Concurrent execution of simultaneous requests – long before web servers where around...
– transaction management guarantees consistency despite concurrent/interleaved execution
• Transactions= Sequence of DB operations (read/write)
– Atomicity: a transaction is executed completely or not at all
– Consistency: a transaction creates a new consistent DB state, i.e., in which all integrity constraints are maintained
– Isolation: to the user, a transaction seems to run in isolation
– Durability: the effect of a successful (“committed”) transaction remains even after system failure
7 B. Ludäscher: Introduction to Databases
The Relational Model
• Relation/Table Name:– employee, dept
• Attributes = Column Names:– Emp, Salary, Deptno, Name,
Mgr
• Relational Schema:– employee(Emp:string,
Salary:float, DeptNo:integer), ...
• Tuple = Row of the table:– (“anne”, “62000”, “2”)
• Relation = Set of tuples:– {(...), (...), ...}
Emp Salary Deptnojohn 60k 1anne 62k 2bob 57k 1jane 45k 3
employee
DeptNo NameMgr 1 Toys anne2 Sales anne3 Shoes tim
dept
8 B. Ludäscher: Introduction to Databases
Database Design: Entity-Relationship (ER) Model
• Entities:• Relationships:• Attributes:• ER Model:
– initial, high-level DB design (conceptual model)– easy to map to a relational schema (database tables)– comes with more constraints (cardinalities, aggregation) and extensions: EER
(is-a => class hierarchies)– related: UML (Unified Modeling Language) class diagrams
Employee Departmentworks-for
Name Salary ManagerName
since
9 B. Ludäscher: Introduction to Databases
Example: Creating a Relational Database in SQL
CREATE TABLE employee (
ssn CHAR(11),
name VARCHAR(30),
deptNo INTEGER,
PRIMARY KEY (ssn),
FOREIGN KEY (deptNo) REFERENCES department )
CREATE TABLE department (
deptNo INTEGER,
name VARCHAR(20),
manager CHAR(11),
PRIMARY KEY (deptNo),
FOREIGN KEY (manager) REFERENCES employee(ssn) )
10 B. Ludäscher: Introduction to Databases
Important Relational Operations
• Select(Relation, Condition) – filter rows of a table wrt. a condition
• Project(Relation, Attributes) – keep the columns of interest
• Join(Rel1, Att1, Rel2, Att2, Condition) – find “matches” in a “related” table
– e.g. match Rel1.foreign key = Rel2.primary key
• Union (“OR”), Intersection (“AND”)• Set-Difference (“NOT IN”)
11 B. Ludäscher: Introduction to Databases
Why (Declarative) Query Languages?
• Things we talk and think about in PLs and QLs– Assembly languages: registers, memory locations, jumps, ...
– C: if-then-else, for, while, memory (de-)allocation, pointers, ...
– Object-oriented languages:• C++: C plus objects, methods, classes, ...
• Java: objects, methods, classes, references, ...
• Smalltalk: objects, objects, objects, ...
• OQL: object-query language
– Functional languages (Haskell, ML): • (higher-order) mappings, recursion/induction, patterns, ...
=> Relational languages (SQL, Prolog)• relations, relational operations: , , , , ..., ,,,,,..., , ,
=> Semistructured/XML (Tree) & Graph Query Languages
,,Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt.”“The limits of my language mean the limits of my world.”
Ludwig Wittgenstein, Tractatus Logico-Philosophicus
“If you have a hammer, everything looks like a nail.”
12 B. Ludäscher: Introduction to Databases
Example: Querying a Relational Database
Emp Salary Deptnoanne 62k 2john 60k 1
employee
DeptNoMgr
1 anne2 anne
dept
SELECT Emp, MgrFROM employee, deptWHERE employee.DeptNo = dept.DeptNo
Emp Mgrjohn anneanne anne
result
join
input tables
SQL query (or view def.)
answer (or view)
13 B. Ludäscher: Introduction to Databases
Query Languages for Relational Databases
14 B. Ludäscher: Introduction to Databases
Deductive Databases (DATALOG) Syntax
15 B. Ludäscher: Introduction to Databases
DATALOG: Examples for Relational Operations
16 B. Ludäscher: Introduction to Databases
Recursive DATALOG Example: Transitive Closure
17 B. Ludäscher: Introduction to Databases
Non-Relational Datamodels• Relational model is “flat”: atomic data values
– extension: nested relational model (“tables within tables”, cf. nested HTML tables)
– values can be nested lists {...}, tuples (...), sets [...]– ISO standard(s): SQL– identity is value based
• Object-oriented data model:– complex (structured) objects with object-identity (oid)– class and type hierarchies (sub-/superclass, sub-/supertype)– OODB schema may be very close to “world model” (no translation
into tables)(+) queries fit your OO schema
(-) (new) queries that don’t fit nicely
– ODMG standard, OQL (Object Query Language)
18 B. Ludäscher: Introduction to Databases
Example: Object Query Language (OQL)
• Q: what does this OQL query compute?• Note the use of path expressions like e.manager.children
=> Semistructured/Graph Databases
SELECT DISTINCT STRUCT( E: e.name, C: e.manager.name, M: ( SELECT c.name FROM c IN e.children
WHERE FOR ALL d IN e.manager.children: c.age > d.age ) ) FROM e IN Employees;
19 B. Ludäscher: Introduction to Databases
A Semistructured (Graph) Database
20 B. Ludäscher: Introduction to Databases
Querying Graphs with OO-Path Expressions
?- dblp."Inf. Systems".L.P, substr("Volume",L), P : person.spouse[lives_in = P.lives_in].
?- dblp."Inf. Systems".L."Michael E. Senko".Answer:
L="Volume 1, 1975”;L="Volume 5, 1980".
21 B. Ludäscher: Introduction to Databases
Constructs for Querying Graphs
Example: ?- dblp . any* . (if(vldb)| if(sigmod))
top related