Ghislain Fourny Information Systems for Engineers Fall 2018 11. Outlook


What we learned

Database System

Database

Database Management System (DBMS)

Database system (Database + Database Management System)

Data Independence (Edgar Codd)


Physical storage

Logical data model

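Relations (the math, for database scientists)

A relation R is made of

1. A set of attributes

$\mathit{Attributes}_R \subseteq \mathbb{S}$

2. An extension (set of tuples)

$\mathit{Extension}_R \subseteq \mathbb{S} \rightharpoonup \mathbb{V}$

such that:

$\forall t \in \mathit{Extension}_R,\ \mathrm{support}(t) = \mathit{Attributes}_R$

Here $\mathbb{S} \rightharpoonup \mathbb{V}$ denotes the set of partial functions from $\mathbb{S}$ to $\mathbb{V}$.

[Figure: a table annotated with $\mathit{Attributes}_R$ and $\mathit{Extension}_R$]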
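The definition above can be sketched in Python, assuming each tuple is modelled as a dict (a partial function from attribute names to values). The helper name `is_relation` is hypothetical; the sample rows are borrowed from the persons table used later in the slides.

```python
def is_relation(attributes, extension):
    """Return True if the support of every tuple equals the attribute set."""
    return all(set(t.keys()) == set(attributes) for t in extension)


attributes = {"ahv", "name"}
extension = [
    {"ahv": "1234567890123", "name": "Albert Einstein"},
    {"ahv": "1234567890124", "name": "Kurt Gödel"},
]

# Every tuple is defined on exactly {"ahv", "name"}: this is a relation.
print(is_relation(attributes, extension))  # True

# A tuple with a missing attribute has a different support: not a relation.
broken = extension + [{"ahv": "1234567890125"}]
print(is_relation(attributes, broken))  # False
```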

Summary of standardized SQL types

Kind                 Types
Character            char(p), varchar(p), clob
Binary               binary(p), varbinary(p), blob
Number, exact        numeric(p,s), decimal(p,s), smallint, integer, bigint
Number, approximate  float(p), real, double precision
Boolean              boolean
Date and time        date, time, time with time zone, timestamp, timestamp with time zone
Intervals            interval year to month, interval day to second

DDL on tables

Create a table

Drop a table

Modify a table

Data Manipulation Language

Inserting tuples

Deleting tuples

Updating tuples

Querying tuples
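As a minimal sketch of the DDL and DML operations listed above, the following uses Python's built-in `sqlite3` module with an in-memory database. The `birth_year` column and the updated name are hypothetical, chosen only for illustration; SQLite's dialect differs from standard SQL in places.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create a table, then modify it by adding a column.
cur.execute("CREATE TABLE persons (ahv VARCHAR(13), name TEXT)")
cur.execute("ALTER TABLE persons ADD COLUMN birth_year INTEGER")

# DML: insert, update, delete and query tuples.
cur.execute("INSERT INTO persons VALUES ('1234567890123', 'Albert Einstein', 1879)")
cur.execute("INSERT INTO persons VALUES ('1234567890125', 'Alan Turing', 1912)")
cur.execute("UPDATE persons SET name = 'A. Einstein' WHERE ahv = '1234567890123'")
cur.execute("DELETE FROM persons WHERE ahv = '1234567890125'")
rows = cur.execute("SELECT ahv, name, birth_year FROM persons").fetchall()
print(rows)  # [('1234567890123', 'A. Einstein', 1879)]

# DDL: drop the table; it no longer appears in the catalog.
cur.execute("DROP TABLE persons")
tables = cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []
```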

Primary key / Foreign key

persons
ahv            name
varchar(13)    text
1234567890123  Albert Einstein
1234567890124  Kurt Gödel
1234567890125  Alan Turing

taxes
ahv            fiscal residence
varchar(13)    char(2)
1234567890123  CH
1234567890124  AT
1234567890126  GB
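A possible way to try this out, assuming `persons.ahv` is the primary key and `taxes.ahv` is a foreign key referencing it (the slide does not spell out the constraints, and the column name `fiscal_residence` is an adaptation). With SQLite, the last row of taxes is then rejected because no person has that ahv.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.execute("CREATE TABLE persons (ahv VARCHAR(13) PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE taxes ("
    " ahv VARCHAR(13) REFERENCES persons(ahv),"
    " fiscal_residence CHAR(2))"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?)",
    [
        ("1234567890123", "Albert Einstein"),
        ("1234567890124", "Kurt Gödel"),
        ("1234567890125", "Alan Turing"),
    ],
)
conn.executemany(
    "INSERT INTO taxes VALUES (?, ?)",
    [("1234567890123", "CH"), ("1234567890124", "AT")],
)

# 1234567890126 does not exist in persons, so the foreign key is violated.
try:
    conn.execute("INSERT INTO taxes VALUES ('1234567890126', 'GB')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```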

Relational algebra

Filter queries: selection, projection
Binary queries: Cartesian product, natural join, theta join
Renaming queries: relation renaming, attribute renaming
Set queries: union, intersection, subtraction

Joining

Grouping

Sorting

Languages

Imperative languages: "Here's how to do what I want." (C, Java, C++, Python, ...)

Declarative languages: "Here's what I want." (Haskell, CAML, SQL, XQuery, ...)

SQL

SELECT century AS c
FROM persons
GROUP BY century
HAVING COUNT(*) > 2

persons
name         middle_initial  last_name  century  captain
varchar(30)  char(1)         text       integer  boolean
James        T               Kirk       23       TRUE
Beverly      C               Crusher    24       FALSE
Jean-Luc     NULL            Picard     24       TRUE
Kathryn      NULL            Janeway    24       TRUE
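The query can be run against the persons table as a quick check, here sketched with Python's `sqlite3` module and an in-memory database (SQLite is an assumption; any SQL engine should behave the same for this query). Only century 24 has more than two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons ("
    " name VARCHAR(30), middle_initial CHAR(1), last_name TEXT,"
    " century INTEGER, captain BOOLEAN)"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?, ?)",
    [
        ("James", "T", "Kirk", 23, True),
        ("Beverly", "C", "Crusher", 24, False),
        ("Jean-Luc", None, "Picard", 24, True),
        ("Kathryn", None, "Janeway", 24, True),
    ],
)

# Group by century and keep only the groups with more than two persons.
result = conn.execute(
    "SELECT century AS c FROM persons GROUP BY century HAVING COUNT(*) > 2"
).fetchall()
print(result)  # [(24,)]
```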

Full outer joins

SELECT *
FROM persons FULL OUTER JOIN spaceships
ON persons.last_name = spaceships.captain_name

persons
first_name   middle_initial  last_name  century  captain
varchar(30)  char(1)         text       integer  boolean
James        T               Kirk       23       TRUE
Beverly      C               Crusher    24       FALSE
Kathryn      NULL            Janeway    24       TRUE

spaceships
warp     spaceship_name    captain_name  code
numeric  varchar(30)       text          text
5        USS Enterprise A  Kirk          NCC-1701-A
4        USS Enterprise    Kirk          NCC-1701
9.2      USS Enterprise D  Picard        NCC-1701-D
9.975    USS Voyager       Janeway       NCC-74656

Result
first_name   middle_initial  last_name  century  captain  warp     spaceship_name    captain_name  code
varchar(30)  char(1)         text       integer  boolean  numeric  varchar(30)       text          text
James        T               Kirk       23       TRUE     5        USS Enterprise A  Kirk          NCC-1701-A
James        T               Kirk       23       TRUE     4        USS Enterprise    Kirk          NCC-1701
NULL         NULL            NULL       NULL     NULL     9.2      USS Enterprise D  Picard        NCC-1701-D
Beverly      C               Crusher    24       FALSE    NULL     NULL              NULL          NULL
Kathryn      NULL            Janeway    24       TRUE     9.975    USS Voyager       Janeway       NCC-74656
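The following sketch reproduces the full outer join with Python's `sqlite3` module, assuming the join condition compares the person's last name with the captain name. Because older SQLite versions lack `FULL OUTER JOIN`, it is emulated here as a left join plus the unmatched spaceships; this emulation is a substitute technique, not the query from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons (first_name TEXT, middle_initial TEXT,"
    " last_name TEXT, century INTEGER, captain BOOLEAN)"
)
conn.execute(
    "CREATE TABLE spaceships (warp NUMERIC, spaceship_name TEXT,"
    " captain_name TEXT, code TEXT)"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?, ?)",
    [
        ("James", "T", "Kirk", 23, True),
        ("Beverly", "C", "Crusher", 24, False),
        ("Kathryn", None, "Janeway", 24, True),
    ],
)
conn.executemany(
    "INSERT INTO spaceships VALUES (?, ?, ?, ?)",
    [
        (5, "USS Enterprise A", "Kirk", "NCC-1701-A"),
        (4, "USS Enterprise", "Kirk", "NCC-1701"),
        (9.2, "USS Enterprise D", "Picard", "NCC-1701-D"),
        (9.975, "USS Voyager", "Janeway", "NCC-74656"),
    ],
)

# Left join keeps every person; the second branch adds spaceships without a match.
query = """
SELECT p.*, s.*
FROM persons p LEFT JOIN spaceships s ON p.last_name = s.captain_name
UNION ALL
SELECT NULL, NULL, NULL, NULL, NULL, s.*
FROM spaceships s
WHERE NOT EXISTS (SELECT 1 FROM persons p WHERE p.last_name = s.captain_name)
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Five rows come back: two for Kirk, one for Janeway, one for Crusher padded with NULLs on the spaceship side, and one for the USS Enterprise D padded with NULLs on the person side.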

Database Design Theory

sales
product      customer  quantity
varchar(30)  text      integer
Phone        John      1
Phone        Peter     2
Phone        Mary      1
Laptop       John      3
Laptop       Mary      1
HDTV         Mary      2

products
product      price
varchar(30)  numeric
Phone        800
Laptop       2000
HDTV         1000
USB Stick    10

customers
customer
text
John
Peter
Mary
Bill

Integrity

sales
Schema:
product      price    customer  quantity
varchar(30)  numeric  text      integer

All sets of partial functions: $\mathcal{P}(\mathbb{S} \rightharpoonup \mathbb{V})$
All tables (identical support) [fundamental integrity]: $\mathbb{T}$
Tables with the desired attributes [attribute integrity]
Tables with the desired domains [domain integrity]
Tables with further constraints [check/unique integrity]
Normal forms

Functional dependency

sales

A1 A2 C B1 B2

text text text boolean integer

foo bar lorem true 1

foo bar ipsum true 1

bar foobar false 1

bar foobar true 1

$(A1, A2) \to_R (B1, B2)$
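A functional dependency can be checked on concrete data: whenever two rows agree on the left-hand side, they must agree on the right-hand side. The sketch below is one way to test this; the function name `fd_holds` and the sample rows are hypothetical (the first two rows follow the slide, the third is made up).

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand side seen for this key;
        # any later row with the same key must carry the same value.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"A1": "foo", "A2": "bar", "C": "lorem", "B1": True, "B2": 1},
    {"A1": "foo", "A2": "bar", "C": "ipsum", "B1": True, "B2": 1},
    {"A1": "bar", "A2": "foo", "C": "dolor", "B1": False, "B2": 1},  # made-up row
]

print(fd_holds(rows, ["A1", "A2"], ["B1", "B2"]))  # True
print(fd_holds(rows, ["A1", "A2"], ["C"]))         # False: lorem vs ipsum
```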

Closure of a set of attributes: example

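$\{a, b\}$

$a \to c$
$a, b \to c$
$a, b \to d$
$a, c \to b$
$a, d \to e$
$f, g \to h$
$a, b, c \to d$
$a, b, c \to i$
$i \to j$
$i \to k$
$c, j \to l$

$\{a, b\}^+ = \{a, b, c, d, e, i, j, k, l\}$

$a, b \to a, b, c, d, e, i, j, k, l$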
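The closure in this example can be recomputed with the usual fixpoint iteration: keep applying every dependency whose left-hand side is already covered until nothing new is added. This is a minimal sketch; the function name `closure` and the encoding of attribute sets as strings of single-letter attributes are assumptions made for brevity.

```python
def closure(attributes, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs if lhs is covered and rhs adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The functional dependencies of the example, one letter per attribute.
fds = [
    ("a", "c"), ("ab", "c"), ("ab", "d"), ("ac", "b"), ("ad", "e"),
    ("fg", "h"), ("abc", "d"), ("abc", "i"), ("i", "j"), ("i", "k"),
    ("cj", "l"),
]

print(sorted(closure("ab", fds)))
# ['a', 'b', 'c', 'd', 'e', 'i', 'j', 'k', 'l']
```

The dependency $f, g \to h$ never fires because neither f nor g is reachable from $\{a, b\}$.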

Minimal basis: example

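$a \to c$
$b \to d$
$a, b \to e$
$a, b \to f$

$a \to c$
$a, b, c \to c, d$
$b \to d$
$a \to a$
$a, b, c, d \to a, d, e, f$
$a, b \to b$

The first four dependencies form a minimal basis: every dependency in the second group either follows from them or is trivial.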

Boyce-Codd Normal Form

A relation R is in Boyce-Codd Normal Form if, for any non-trivial functional dependency $F \to_R G$, F is a super-key.

A  C       D
a  foo     true
a  bar     false
b  foobar  true

(FDs such as $AC \to_R A$ can still hold, but they are trivial and irrelevant.)
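The definition translates into a simple check: a non-trivial dependency violates BCNF when the closure of its left-hand side is not the full attribute set. The sketch below assumes the dependencies are given explicitly; the names `closure` and `bcnf_violations` are hypothetical, and the dependency $D \to A$ in the second call is made up to show a violation (it does not hold in the table above).

```python
def closure(attributes, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attributes, fds):
    """Return the non-trivial dependencies whose left-hand side is not a super-key."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        super_key = closure(lhs, fds) == set(all_attributes)
        if not trivial and not super_key:
            violations.append((lhs, rhs))
    return violations


# In the table above, C determines A and D, so C is a key: no violation.
print(bcnf_violations("ACD", [("C", "A"), ("C", "D")]))  # []

# Hypothetical extra dependency D -> A: D is not a super-key, so BCNF is violated.
print(bcnf_violations("ACD", [("C", "A"), ("C", "D"), ("D", "A")]))  # [('D', 'A')]
```

Trivial dependencies such as $AC \to A$ are skipped, in line with the remark above.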

Impossibility triangle

Boyce-Codd normal form

Lossless join Dependency preserving

Three-tier stack

Database

Database servers

Application servers

Web servers

Internet

Indices

On intuitionistic arithmetic and number theory

Drawn from the Phenomena of Capillarity

A Theory of the Foundations of Thermodynamics

An Example of a New Type of Cosmological Solutions of Einstein's Field Equations of Gravitation

Mechanical Intelligence

Mathematical Logic

On the Possibility of a New Test of the Relativity Principle

On the Electrodynamics of Moving Bodies

On the General Molecular Theory of Heat

On the Theory of Brownian Motion

The Theory of Relativity

Data models

Relational model (SQL)

Arborescent model (XML, JSON)

Graph model (RDF, ...)

Object-oriented model (programming)

NoSQL

Data cube model (OLAP)

In this course

Slicers and Dicers

       2014        2015        2016
Peter  1,000,000$  1,500,000$  1,400,000$
Mary   2,000,000$  2,300,000$  2,200,000$

Servers

World

USD

Dicers

Slicers

Materializing a full (rolled up) cube

SELECT t.Year, p.Brand, SUM(s.Quantity)
FROM Sales s, Time t, Product p
WHERE s.Date = t.Date
  AND s.Product = p.Name
GROUP BY CUBE (t.Year, p.Brand)

Year  Brand    Quantity
2017  Apple    6
2017  Samsung  5
2016  Apple    9
2016  Samsung  5
2015  Apple    5
2015  Samsung  4
2017  NULL     11
2016  NULL     14
2015  NULL     9
NULL  Apple    20
NULL  Samsung  14
NULL  NULL     34
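To see where the NULL rows come from, the cube can be recomputed by hand: every base row contributes to each combination of kept and rolled-up dimensions. The sketch below does this in plain Python, since not every SQL engine supports `GROUP BY CUBE`. It starts from the six (Year, Brand) rows of the result, because the underlying Sales, Time and Product tables are not shown; the function name `cube` is hypothetical and `None` plays the role of NULL.

```python
from collections import defaultdict
from itertools import combinations

# (Year, Brand, Quantity): the six detail rows of the result table.
base = [
    (2017, "Apple", 6), (2017, "Samsung", 5),
    (2016, "Apple", 9), (2016, "Samsung", 5),
    (2015, "Apple", 5), (2015, "Samsung", 4),
]


def cube(rows, n_dims):
    """Sum the measure for every subset of the first n_dims columns."""
    totals = defaultdict(int)
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                # Dimensions that are not kept are rolled up to None (NULL).
                key = tuple(dims[i] if i in kept else None for i in range(n_dims))
                totals[key] += measure
    return dict(totals)


result = cube(base, 2)
print(len(result))                 # 12
print(result[(2017, None)])        # 11
print(result[(None, "Apple")])     # 20
print(result[(None, None)])        # 34
```

The twelve keys correspond to the twelve rows of the table: six detail rows, three per-year totals, two per-brand totals and one grand total.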

Database Architecture

Cache (CPU), level 1 and 2

Memory (RAM)

Disk (Secondary storage)

Tapes, DVDs (Tertiary storage)

Exam Material

Slides Recordings Textbook

Theoretical exercises Practical exercises Clicker questions

Big Data Lectures

                                                           Fall 2018                          Spring 2019
Computer Science, Data Science, CBB MSc with CS background  Big Data
Other departments                                           Information Systems for Engineers  Big Data (for Engineers)

Data Structure

Unstructured data

Semi-structured data

Structured data

2000s: The NoSQL Era

Key-value stores

Triple stores

Column stores

Document stores

The Three Vs of Big Data

Volume (TB to ZB)

Variety

Velocity

What is Big Data (my definition)?

Big Data is a portfolio of technologies that were designed to store, manage and analyze data that is too large to fit on a single machine, while accommodating the growing discrepancy between capacity, throughput and latency.

Lecture scope

Databases, Machine Learning, AI

- Data in the large
  - Key-value stores (S3, DynamoDB)
  - Distributed file systems (HDFS)
  - Distributed query processing (MapReduce, Spark)
  - Resource management (YARN)
  - Column stores (HBase)
- Data in the small
  - Document stores (MongoDB)
  - Syntax (XML, JSON)
  - Data models, Schemas, Querying
- Data in the very small
  - Graph databases (RDF)

Lecture Overview