Ghislain Fourny · Information Systems for Engineers · Fall 2018 f…
Database System
Database
Database Management System (DBMS)
Database system (Database + Database Management System)
Data Independence (Edgar Codd)
[Figure: the same content shown at two levels, Logical data model and Physical storage]
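Relations (the math, for database scientists)
A relation R is made of:
1. A set of attributes: $\mathit{Attributes}_R \subseteq \mathbb{S}$
2. An extension (a set of tuples): $\mathit{Extension}_R \subseteq (\mathbb{S} \nrightarrow \mathbb{V})$
such that:
$\forall t \in \mathit{Extension}_R,\ \mathrm{support}(t) = \mathit{Attributes}_R$
Here $\mathbb{S}$ is the set of strings (used as attribute names), $\mathbb{V}$ the set of values, and $\mathbb{S} \nrightarrow \mathbb{V}$ the set of partial functions from $\mathbb{S}$ to $\mathbb{V}$; the support of a tuple $t$ is the set of attributes on which $t$ is defined.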
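To make the definition concrete, here is a minimal Python sketch, assuming a tuple is modelled as a dict (a finite partial function from attribute names to values) and the extension as a list of such dicts, since dicts are not hashable and cannot sit in a Python set. The function name `is_relation` is our own.

```python
def is_relation(attributes: set[str], extension: list[dict]) -> bool:
    """Check that every tuple's support (its set of keys) equals the attribute set."""
    return all(set(t.keys()) == attributes for t in extension)


attributes = {"ahv", "name"}
extension = [
    {"ahv": "1234567890123", "name": "Albert Einstein"},
    {"ahv": "1234567890124", "name": "Kurt Gödel"},
]

print(is_relation(attributes, extension))  # True
# A tuple defined only on "ahv" has the wrong support, so this is not a relation.
print(is_relation(attributes, extension + [{"ahv": "1234567890125"}]))  # False
```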
Summary of standardized SQL types
Kind | Types
Character | char(p), varchar(p), clob
Binary | binary(p), varbinary(p), blob
Number, exact | numeric(p,s), decimal(p,s), smallint, integer, bigint
Number, approximate | float(p), real, double precision
Boolean | boolean
Date and time | date, time, time with time zone, timestamp, timestamp with time zone
Intervals | interval year to month, interval day to second
Primary key / Foreign key
persons
ahv | name
varchar(13) | text
1234567890123 | Albert Einstein
1234567890124 | Kurt Gödel
1234567890125 | Alan Turing

taxes
ahv | fiscal residence
varchar(13) | char(2)
1234567890123 | CH
1234567890124 | AT
1234567890126 | GB
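The effect of the two constraints can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: `ahv` is declared as the primary key of `persons`, `taxes.ahv` as a foreign key referencing it, and `fiscal_residence` is our own spelling of the column (an unquoted identifier cannot contain a space). SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE persons (ahv VARCHAR(13) PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE taxes ("
    " ahv VARCHAR(13) REFERENCES persons(ahv),"  # foreign key to persons
    " fiscal_residence CHAR(2))"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?)",
    [("1234567890123", "Albert Einstein"),
     ("1234567890124", "Kurt Gödel"),
     ("1234567890125", "Alan Turing")],
)
conn.executemany(
    "INSERT INTO taxes VALUES (?, ?)",
    [("1234567890123", "CH"), ("1234567890124", "AT")],
)


def try_insert(sql: str) -> bool:
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# 1234567890126 does not appear in persons, so the foreign key rejects it.
print(try_insert("INSERT INTO taxes VALUES ('1234567890126', 'GB')"))  # False
# The primary key forbids a second person with the same AHV number.
print(try_insert("INSERT INTO persons VALUES ('1234567890123', 'Someone Else')"))  # False
```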
Relational algebra
Filter queries: Selection, Projection
Binary queries: Cartesian product, Natural join, Theta join
Renaming queries: Relation renaming, Attribute renaming
Set queries: Union, Intersection, Subtraction
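A minimal Python sketch of several of these operators, assuming relations are lists of dicts with no duplicate tuples. The function names are our own, and `fiscal_residence` is a hypothetical spelling of the taxes column; this illustrates the semantics, not how a DBMS executes them.

```python
def select(rel, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in rel if predicate(t)]


def project(rel, attrs):
    """Projection: keep only `attrs`, dropping duplicate tuples (relations are sets)."""
    out = []
    for t in rel:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def rename(rel, old, new):
    """Attribute renaming: old -> new in every tuple."""
    return [{(new if a == old else a): v for a, v in t.items()} for t in rel]


def natural_join(r, s):
    """Natural join: merge tuples that agree on all shared attributes.

    With no shared attributes this degenerates into the Cartesian product;
    a theta join is a selection applied on top of that product.
    """
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in t.keys() & u.keys())]


def union(r, s):
    return r + [u for u in s if u not in r]


def intersection(r, s):
    return [t for t in r if t in s]


def difference(r, s):
    return [t for t in r if t not in s]


persons = [
    {"ahv": "1234567890123", "name": "Albert Einstein"},
    {"ahv": "1234567890124", "name": "Kurt Gödel"},
    {"ahv": "1234567890125", "name": "Alan Turing"},
]
taxes = [
    {"ahv": "1234567890123", "fiscal_residence": "CH"},
    {"ahv": "1234567890124", "fiscal_residence": "AT"},
    {"ahv": "1234567890126", "fiscal_residence": "GB"},
]

joined = natural_join(persons, taxes)
residents_ch = project(select(joined, lambda t: t["fiscal_residence"] == "CH"), ["name"])
print(residents_ch)  # [{'name': 'Albert Einstein'}]

persons_ahv, taxes_ahv = project(persons, ["ahv"]), project(taxes, ["ahv"])
print(difference(persons_ahv, taxes_ahv))  # [{'ahv': '1234567890125'}]
```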
Languages
Imperative languages ("Here's how to do what I want."): C, Java, C++, Python, ...
Declarative languages ("Here's what I want."): Haskell, CAML, SQL, XQuery, ...
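The contrast can be sketched in Python with the standard `sqlite3` module: the same question (which 24th-century persons are captains?) answered once with an explicit loop and once with a SQL query. The rows are a subset of the persons table used in the SQL examples; the column names are assumptions for this sketch.

```python
import sqlite3

rows = [
    ("James", "Kirk", 23, True),
    ("Beverly", "Crusher", 24, False),
    ("Jean-Luc", "Picard", 24, True),
    ("Kathryn", "Janeway", 24, True),
]

# Imperative: spell out *how* to compute the answer, step by step.
captains_24_imperative = []
for first, last, century, captain in rows:
    if century == 24 and captain:
        captains_24_imperative.append(last)
captains_24_imperative.sort()

# Declarative: state *what* is wanted and let the engine decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT, last_name TEXT, century INTEGER, captain BOOLEAN)")
conn.executemany("INSERT INTO persons VALUES (?, ?, ?, ?)", rows)
captains_24_declarative = [
    r[0] for r in conn.execute(
        "SELECT last_name FROM persons WHERE century = 24 AND captain ORDER BY last_name")
]

print(captains_24_imperative)   # ['Janeway', 'Picard']
print(captains_24_declarative)  # ['Janeway', 'Picard']
```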
SQL
SELECT century AS c
FROM persons
GROUP BY century
HAVING COUNT(*) > 2

persons
name | middle_initial | last_name | century | captain
varchar(30) | char(1) | text | integer | boolean
James | T | Kirk | 23 | TRUE
Beverly | C | Crusher | 24 | FALSE
Jean-Luc | NULL | Picard | 24 | TRUE
Kathryn | NULL | Janeway | 24 | TRUE
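The query can be run as-is against the table above with Python's `sqlite3` module (a sketch; SQLite stores the booleans as 0/1). Only the 24th century has more than two persons, so it is the only group that survives `HAVING`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons (name VARCHAR(30), middle_initial CHAR(1),"
    " last_name TEXT, century INTEGER, captain BOOLEAN)"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?, ?)",
    [("James", "T", "Kirk", 23, True),
     ("Beverly", "C", "Crusher", 24, False),
     ("Jean-Luc", None, "Picard", 24, True),   # None is stored as NULL
     ("Kathryn", None, "Janeway", 24, True)],
)

# GROUP BY forms one group per century; HAVING filters groups, not rows.
result = conn.execute(
    "SELECT century AS c FROM persons GROUP BY century HAVING COUNT(*) > 2"
).fetchall()
print(result)  # [(24,)]
```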
Full outer joins
SELECT *
FROM persons FULL OUTER JOIN spaceships
ON persons.last_name = spaceships.captain_name

persons
first_name | middle_initial | last_name | century | captain
varchar(30) | char(1) | text | integer | boolean
James | T | Kirk | 23 | TRUE
Beverly | C | Crusher | 24 | FALSE
Kathryn | NULL | Janeway | 24 | TRUE

spaceships
warp | spaceship_name | captain_name | code
numeric | varchar(30) | text | text
5 | USS Enterprise A | Kirk | NCC-1701-A
4 | USS Enterprise | Kirk | NCC-1701
9.2 | USS Enterprise D | Picard | NCC-1701-D
9.975 | USS Voyager | Janeway | NCC-74656

Result
first_name | middle_initial | last_name | century | captain | warp | spaceship_name | captain_name | code
varchar(30) | char(1) | text | integer | boolean | numeric | varchar(30) | text | text
James | T | Kirk | 23 | TRUE | 5 | USS Enterprise A | Kirk | NCC-1701-A
James | T | Kirk | 23 | TRUE | 4 | USS Enterprise | Kirk | NCC-1701
NULL | NULL | NULL | NULL | NULL | 9.2 | USS Enterprise D | Picard | NCC-1701-D
Beverly | C | Crusher | 24 | FALSE | NULL | NULL | NULL | NULL
Kathryn | NULL | Janeway | 24 | TRUE | 9.975 | USS Voyager | Janeway | NCC-74656
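What the full outer join computes can be sketched in plain Python, assuming both tables are lists of dicts with disjoint attribute names and using `None` for NULL. Only a few columns of each table are kept for brevity; the function name `full_outer_join` is our own.

```python
def full_outer_join(left, right, on):
    """Full outer join of two lists of dicts under the join condition `on(lt, rt)`.

    Tuples without a partner are padded with None, which plays the role of NULL.
    Assumes the two sides have disjoint attribute names (true for this example).
    """
    left_attrs = list(left[0]) if left else []
    right_attrs = list(right[0]) if right else []
    result, matched_right = [], set()
    for lt in left:
        partners = [i for i, rt in enumerate(right) if on(lt, rt)]
        for i in partners:
            result.append({**lt, **right[i]})
        matched_right.update(partners)
        if not partners:  # left tuple with no match: pad the right side
            result.append({**lt, **dict.fromkeys(right_attrs)})
    for i, rt in enumerate(right):
        if i not in matched_right:  # right tuple with no match: pad the left side
            result.append({**dict.fromkeys(left_attrs), **rt})
    return result


persons = [
    {"first_name": "James", "last_name": "Kirk", "century": 23},
    {"first_name": "Beverly", "last_name": "Crusher", "century": 24},
    {"first_name": "Kathryn", "last_name": "Janeway", "century": 24},
]
spaceships = [
    {"spaceship_name": "USS Enterprise A", "captain_name": "Kirk", "code": "NCC-1701-A"},
    {"spaceship_name": "USS Enterprise", "captain_name": "Kirk", "code": "NCC-1701"},
    {"spaceship_name": "USS Enterprise D", "captain_name": "Picard", "code": "NCC-1701-D"},
    {"spaceship_name": "USS Voyager", "captain_name": "Janeway", "code": "NCC-74656"},
]

rows = full_outer_join(persons, spaceships, lambda p, s: p["last_name"] == s["captain_name"])
for row in rows:
    print(row)
```

Kirk appears twice because he matches two ships, Crusher keeps NULL ship columns, and the Picard ship keeps NULL person columns, mirroring the five result rows above.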
Database Design Theory
sales
product | customer | quantity
varchar(30) | text | integer
Phone | John | 1
Phone | Peter | 2
Phone | Mary | 1
Laptop | John | 3
Laptop | Mary | 1
HDTV | Mary | 2

products
product | price
varchar(30) | numeric
Phone | 800
Laptop | 2000
HDTV | 1000
USB Stick | 10

customers
customer
text
John
Peter
Mary
Bill
Integrity
Schema of sales:
product | price | customer | quantity
varchar(30) | numeric | text | integer

Nested levels of integrity, from the broadest set to the narrowest:
$\mathcal{P}(\mathbb{S} \nrightarrow \mathbb{V})$: all sets of partial functions
$\mathbb{T}$: all tables (identical support) [fundamental integrity]
Tables with the desired attributes [attribute integrity]
Tables with the desired domains [domain integrity]
Tables with further constraints [check/unique integrity]
Normal forms
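The nesting can be illustrated with a small Python function that reports the narrowest level a table reaches. This is a sketch under loud assumptions: the SQL domains are approximated by Python types, the CHECK constraint `quantity > 0` is hypothetical, and uniqueness constraints are not modelled.

```python
# Hypothetical mapping of the sales schema's SQL domains to Python types:
# varchar(30)/text -> str, numeric -> int or float, integer -> int.
SCHEMA = {"product": str, "price": (int, float), "customer": str, "quantity": int}


def integrity_level(table, schema, checks=()):
    """Return the narrowest integrity level that `table` (a list of dicts) satisfies."""
    supports = {frozenset(t) for t in table}
    if len(supports) > 1:
        return "no integrity (not a table)"      # tuples have different supports
    if table and supports != {frozenset(schema)}:
        return "fundamental integrity"           # a table, but not the desired attributes
    if not all(isinstance(t[a], schema[a]) for t in table for a in schema):
        return "attribute integrity"             # desired attributes, wrong domains
    if not all(check(t) for t in table for check in checks):
        return "domain integrity"                # desired domains, a check fails
    return "check integrity"


rows = [
    {"product": "Phone", "price": 800, "customer": "John", "quantity": 1},
    {"product": "Laptop", "price": 2000, "customer": "Mary", "quantity": 1},
]
checks = [lambda t: t["quantity"] > 0]  # hypothetical CHECK (quantity > 0)

bad_domain = rows + [{"product": "HDTV", "price": "1000", "customer": "Mary", "quantity": 2}]
bad_support = rows + [{"product": "HDTV", "customer": "Mary"}]

print(integrity_level(rows, SCHEMA, checks))         # check integrity
print(integrity_level(bad_domain, SCHEMA, checks))   # attribute integrity
print(integrity_level(bad_support, SCHEMA, checks))  # no integrity (not a table)
```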
Functional dependency
sales
A1 A2 C B1 B2
text text text boolean integer
foo bar lorem true 1
foo bar ipsum true 1
bar foobar false 1
bar foobar true 1
$(A1, A2) \longrightarrow_R (B1, B2)$
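Checking a functional dependency on a given instance amounts to verifying that tuples agreeing on the left-hand side also agree on the right-hand side. A minimal Python sketch, using only the two complete rows of the table above; note that an instance can refute an FD but cannot prove that it holds for every valid instance. The name `fd_holds` is our own.

```python
def fd_holds(relation, lhs, rhs):
    """True if any two tuples that agree on `lhs` also agree on `rhs`."""
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in lhs)
        value = tuple(t[a] for a in rhs)
        # setdefault stores the first rhs value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


r = [
    {"A1": "foo", "A2": "bar", "C": "lorem", "B1": True, "B2": 1},
    {"A1": "foo", "A2": "bar", "C": "ipsum", "B1": True, "B2": 1},
]

print(fd_holds(r, ["A1", "A2"], ["B1", "B2"]))  # True
print(fd_holds(r, ["A1", "A2"], ["C"]))         # False: same (A1, A2), different C
```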
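Closure of a set of attributes: example
Starting set: $\{a, b\}$
Functional dependencies:
$a \longrightarrow c,\quad a,b \longrightarrow c,\quad a,b \longrightarrow d,\quad a,c \longrightarrow b,\quad a,d \longrightarrow e,\quad f,g \longrightarrow h,$
$a,b,c \longrightarrow d,\quad a,b,c \longrightarrow i,\quad i \longrightarrow j,\quad i \longrightarrow k,\quad c,j \longrightarrow l$
$\{a, b\}^+ = \{a, b, c, d, e, i, j, k, l\}$
hence $a,b \longrightarrow a,b,c,d,e,i,j,k,l$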
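The closure is computed by a fixpoint iteration: keep applying any FD whose left side is already contained in the current set until nothing new is added. A minimal Python sketch with the FDs of this example (the function name `closure` is our own):

```python
def closure(attrs, fds):
    """Compute attrs+ under a list of FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs  # lhs is determined, hence so is rhs
                changed = True
    return result


fds = [
    ({"a"}, {"c"}), ({"a", "b"}, {"c"}), ({"a", "b"}, {"d"}), ({"a", "c"}, {"b"}),
    ({"a", "d"}, {"e"}), ({"f", "g"}, {"h"}), ({"a", "b", "c"}, {"d"}),
    ({"a", "b", "c"}, {"i"}), ({"i"}, {"j"}), ({"i"}, {"k"}), ({"c", "j"}, {"l"}),
]

print(sorted(closure({"a", "b"}, fds)))
# ['a', 'b', 'c', 'd', 'e', 'i', 'j', 'k', 'l']
```

The FD $f,g \longrightarrow h$ never fires, which is why $h$ is missing from the closure.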
Boyce-Codd Normal Form
A relation R is in Boyce-Codd Normal Form if, for any non-trivial functional dependency $F \longrightarrow_R G$, $F$ is a super-key.

A | C | D
a | foo | true
a | bar | false
b | foobar | true

(FDs such as $AC \longrightarrow_R A$ can still hold, but they are trivial and irrelevant.)
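A BCNF test follows directly from the definition: skip trivial FDs, and check that the left side of each remaining FD determines every attribute (is a super-key) by computing its closure. A minimal Python sketch; the two FD sets below are hypothetical examples over the attributes A, C, D. Checking only the given FDs suffices for the relation they are declared on.

```python
def closure(attrs, fds):
    """Compute attrs+ under a list of FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(attributes, fds):
    """R is in BCNF if the left side of every non-trivial FD is a super-key."""
    for lhs, rhs in fds:
        if rhs <= lhs:  # trivial FD, e.g. {A, C} -> {A}
            continue
        if not closure(lhs, fds) >= attributes:  # lhs does not determine everything
            return False
    return True


R = {"A", "C", "D"}
print(is_bcnf(R, [({"A", "C"}, {"D"})]))                     # True: {A, C} is a key
print(is_bcnf(R, [({"A", "C"}, {"D"}), ({"D"}, {"A"})]))     # False: D is not a super-key
```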
Indices
On intuitionistic arithmetic and number theory
Drawn from the Phenomena of Capillarity
A Theory of the Foundations of Thermodynamics
An Example of a New Type of Cosmological Solutions of Einstein's Field Equations of Gravitation
Mechanical Intelligence
Mathematical Logic
On the Possibility of a New Test of the Relativity Principle
On the Electrodynamics of Moving Bodies
On the General Molecular Theory of Heat
On the Theory of Brownian Motion
The Theory of Relativity
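An index keeps search keys in sorted order so that a lookup takes about $\log_2 n$ comparisons instead of a scan over every record. A minimal Python sketch, assuming an in-memory sorted list of (title, record id) pairs searched with the standard `bisect` module; real systems typically use structures such as B+-trees, which this does not model, and the record ids are hypothetical.

```python
import bisect

titles = [
    "On intuitionistic arithmetic and number theory",
    "A Theory of the Foundations of Thermodynamics",
    "Mechanical Intelligence",
    "On the Electrodynamics of Moving Bodies",
    "On the Theory of Brownian Motion",
    "The Theory of Relativity",
]

# Build the index: (key, record id) pairs sorted by key; ids are positions in `titles`.
index = sorted((title, rid) for rid, title in enumerate(titles))
keys = [k for k, _ in index]


def lookup(title):
    """Binary search over the sorted keys; return the record id or None."""
    i = bisect.bisect_left(keys, title)
    if i < len(keys) and keys[i] == title:
        return index[i][1]
    return None


print(lookup("Mechanical Intelligence"))  # 2
print(lookup("Mathematical Logic"))       # None (not in this sample)
```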
Data models
Relational model (SQL)
Arborescent model (XML, JSON)
Graph model (RDF, ...)
Object-oriented model (programming)
NoSQL
Data cube model (OLAP)
In this course
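Slicers and Dicers
 | 2014 | 2015 | 2016
Peter | 1,000,000$ | 1,500,000$ | 1,400,000$
Mary | 2,000,000$ | 2,300,000$ | 2,200,000$

Slicers (fixed values of the other dimensions): Servers, World, USD
Dicers: the dimensions laid out as rows and columns (people and years)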
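Slicing and dicing can be sketched in Python over a list of fact rows: slicing filters on fixed dimension values, dicing arranges the remaining dimensions as a grid. The fact layout (person, year, product, region, currency, amount) and the extra "Laptops" row are hypothetical, added only to show that the slicer removes it.

```python
from collections import defaultdict

# Hypothetical fact rows: (person, year, product, region, currency, amount).
facts = [
    ("Peter", 2014, "Servers", "World", "USD", 1_000_000),
    ("Peter", 2015, "Servers", "World", "USD", 1_500_000),
    ("Peter", 2016, "Servers", "World", "USD", 1_400_000),
    ("Mary", 2014, "Servers", "World", "USD", 2_000_000),
    ("Mary", 2015, "Servers", "World", "USD", 2_300_000),
    ("Mary", 2016, "Servers", "World", "USD", 2_200_000),
    ("Mary", 2016, "Laptops", "World", "USD", 500_000),  # removed by the slicer
]

# Slicing: fix product = Servers, region = World, currency = USD.
sliced = [f for f in facts if f[2:5] == ("Servers", "World", "USD")]

# Dicing: lay out the remaining dimensions as rows (person) and columns (year).
grid = defaultdict(dict)
for person, year, *_, amount in sliced:
    grid[person][year] = grid[person].get(year, 0) + amount

for person, by_year in grid.items():
    print(person, [by_year[y] for y in (2014, 2015, 2016)])
# Peter [1000000, 1500000, 1400000]
# Mary [2000000, 2300000, 2200000]
```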
Materializing a full (rolled up) cube
SELECT t.Year, p.Brand, SUM(s.Quantity)
FROM Sales s, Time t, Product p
WHERE s.Date = t.Date
AND s.Product = p.Name
GROUP BY CUBE (t.Year, p.Brand)

Year | Brand | Quantity
2017 | Apple | 6
2017 | Samsung | 5
2016 | Apple | 9
2016 | Samsung | 5
2015 | Apple | 5
2015 | Samsung | 4
2017 | NULL | 11
2016 | NULL | 14
2015 | NULL | 9
NULL | Apple | 20
NULL | Samsung | 14
NULL | NULL | 34
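`GROUP BY CUBE` aggregates over every subset of the grouping columns. A minimal Python sketch of that semantics, assuming the six detailed (Year, Brand) quantities of the table above as input and `None` standing in for NULL; it reproduces the twelve rows, including the subtotals and the grand total. The function name `cube` is our own.

```python
from collections import defaultdict
from itertools import combinations

# Per-(Year, Brand) quantities, read off the detailed rows of the result table above.
base = [
    (2017, "Apple", 6), (2017, "Samsung", 5),
    (2016, "Apple", 9), (2016, "Samsung", 5),
    (2015, "Apple", 5), (2015, "Samsung", 4),
]


def cube(rows, n_dims):
    """Sum the last column over every subset of the first n_dims columns.

    This mimics GROUP BY CUBE: a column outside the current grouping set is
    reported as None, playing the role of SQL NULL.
    """
    totals = defaultdict(int)
    for k in range(n_dims, -1, -1):
        for kept in combinations(range(n_dims), k):
            for *keys, qty in rows:
                group = tuple(keys[i] if i in kept else None for i in range(n_dims))
                totals[group] += qty
    return dict(totals)


result = cube(base, 2)
print(result[(2017, None)])     # 11
print(result[(None, "Apple")])  # 20
print(result[(None, None)])     # 34
print(len(result))              # 12
```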
Database Architecture
Memory (RAM)
Disk (Secondary storage)
Tapes, DVDs (Tertiary storage)
Cache (CPU), level 1 and 2
Exam Material
Slides, Recordings, Textbook
Theoretical exercises, Practical exercises, Clicker questions
Big Data Lectures
Fall 2018 Spring 2019
Computer Science, Data Science, CBB MSc with CS background
Other departments
Big Data
Information Systems for Engineers, Big Data (for Engineers)
Data Structure
[Figure: sample text shown as Unstructured data, Semi-structured data and Structured data]
What is Big Data (my definition)?
Big Data is a portfolio of technologies that were designed to store, manage and analyze data that is too large to fit on a single machine, while accommodating the issue of a growing discrepancy between capacity, throughput and latency.
§ Data in the large
  § Key-value stores (S3, DynamoDB)
  § Distributed file systems (HDFS)
  § Distributed query processing (MapReduce, Spark)
  § Resource management (YARN)
  § Column stores (HBase)
§ Data in the small
  § Document stores (MongoDB)
  § Syntax (XML, JSON)
  § Data models, Schemas, Querying
§ Data in the very small
  § Graph databases (RDF)
Lecture Overview