1 cs351 introduction to database systems quan.wang@wsu.edu

Post on 27-Dec-2015

223 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

1

CS351 Introduction to Database systems

quan.wang@wsu.edu

2

About me

Quan Wang– developer at Oracle Corp.

• JDBC• Web Services• Cloud

3

Content

Introduction– Why databases?

Data Model

4

Why databases?

It used to be about boring stuff: employee records, bank records, etc.

Today, the field covers all the largest sources of data, with many new ideas. Web search. Data mining. Scientific and medical databases. Integrating information.

5

Why databases?

You may not notice it, but databases are behind almost everything you do on the Web. Google searches. Queries at Amazon, eBay, etc.

6

Why databases?

Databases often have unique concurrency-control problems. Many activities (transactions) at the

database at all times. Must not confuse actions, e.g., two

withdrawals from the same account must each debit the account.

7

Why databases?

Changing landscape System R: Oracle, MySQL, IBM, MS NoSQL: MangoDB NewSQL: VoltDB Cloud: Amazon RDS, Google Bigquery Bigdata: Hadoop, MapR

8

Why databases?

Job perspectives– Startups– Massive industry

9

Content

Introduction– Why database?– What to cover?

Data Model

10

What to cover?

Database – a set of inter-related data

pertaining to some organization– airline, university, store

Database Management System (DBMS)

– one or more databases – algorithms accessing the databases

11

12

What to cover?

Database Theory– relational model and algebra– norm forms

Design of databases.– normalization– E/R, UML

13

What to cover?

Database programming– SQL, stored procedures.

Integration?– ODBC, JDBC– O-R Mapping

14

What not covered?

Not covered– DB tuning– DB administration(DBA)– DB implementation– Programming languages

• May need to learn languages and the DB related aspects

15

Content

Introduction– Why database?– What to cover?– Logistics

Data Model

16

Logistics

Email: quan.wang@wsu.edu Office hour

– Tuesday & Thursday – 1 pm to 2pm– 201L

17

Logistics

No TA No detailed comments on

assignments Distribute answer keys when

appropriate Check discrepancies Raise issues

18

Logistics

Course page: – Http://encs.vancouver.wsu.edu/~

quwang– weekly schedule, assignments

Submission page– http://lms.wsu.edu – Use drop box for assignments and

project submission

19

Logistics Assignments 30% Team project 20% Midterms 25% Final 25%

20

Logistics

Team project– mini-ebay– Max 3 people each team– Single grade for each team

Assignments and project – Late submission 10% penalty

21

Logistics

Databases– MySQL– SQLite

• /usr/bin/sqlite3

22

Logistics

Exams– Lectures, textbook, and assigned

readings– Only areas covered in the

lectures– Not curved

23

Content

Introduction Why database? What to cover? Logistics

Data Model– Why not flat files?

24

Why not flat files?

datasavings.dat

NAME,ADDR,BALANCE,OPENED,INTEREST

David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!

Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

operations – check balance– deposit or withdraw– change address

25

Why not flat files?

dataDavid, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!

Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

problems – code coupled with data format

• changing delimiter would require code change

26

Why not flat files?

dataSavings.data:

David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!

Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

Checkings.data:

David, 789 NE Sany Blvd, 1500, 08/20/1998, 0.001!

Lisa, 432 NE Alberta St, 3500, 09/20/1990, 0

problems – redundancy leads to

inconsistency

27

Why not flat files?

dataDavid, 123 NW Flanders St, 500.75, 08/20/2010, 0.005

Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

problems – concurrency

• withdraw calls for file lock– withdraw $100 from the same account at

the same time, synchronized

– withdraw $100 from different accounts at the same time, synchronized unnecessarily

28

Why not flat files?

dataDavid, 123 NW Flanders St, 500.75, 08/20/2010, 0.005

Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

problems – user interface

• file based algorithms

29

Why not flat files?

summary – code coupled with data– redundancy leading to

inconsistency – low concurrency– unfriendly user interface

30

Why not flat files?

database as abstraction data model hides data format SQL hides algorithms runtime handles concurrency

31

Content

Introduction Data Model

– why not flat files?– relational data model

• relations• schema• keys

32

What is a Data Model?

1. Underlying structure of a database.2. Mathematical representation of data.

Examples: relational model = tables; semistructured model = trees/graphs; key-value model=key value pairs.

3. Operations on data.4. Constraints.

33

A Relation is a Table

name manfWinterbrew Pete’sBud Lite Anheuser-Busch

Beers

Attributes(columnheaders)

Tuples(rows)

Relation name

34

Schemas

Relation schema = relation name and attribute list. Optionally: types of attributes. Example: Beers(name, manf) or

Beers(name: string, manf: string) Database = collection of relations. Database schema = set of all

relation schemas in the database.

35

Why Relations?

Very simple model. Often matches how we think

about data. Abstract model that underlies SQL,

the most important database language today.

37

Example

Sells( bar, beer, price ) Bars( name, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors3.00

38

Database Schemas in SQL

SQL is primarily a query language, for getting information from a database.

But SQL also includes a data-definition component, DDL, for describing database schemas.

39

Creating (Declaring) a Relation

Simplest form is:CREATE TABLE <name> (<list of elements>);

To delete a relation:DROP TABLE <name>;

40

Elements of Table Declarations

Most basic element: an attribute and its type.

The most common types are: INT or INTEGER (synonyms). REAL or FLOAT (synonyms). CHAR(n ) = fixed-length string of n

characters. VARCHAR(n ) = variable-length string

of up to n characters.

41

Example: Create Table

CREATE TABLE Sells (

bar CHAR(20),

beer VARCHAR(20),

price REAL

);

42

SQL Values

Integers and reals are represented as you would expect.

Strings are too, except they require single quotes. Two single quotes = real quote, e.g., ’Joe’’s Bar’.

Any value can be NULL.

43

Dates and Times

DATE and TIME are types in SQL. The form of a date value is:

DATE ’yyyy-mm-dd’ Example: DATE ’2007-09-30’ for

Sept. 30, 2007.

44

Times as Values

The form of a time value is:TIME ’hh:mm:ss’

with an optional decimal point and fractions of a second following. Example: TIME ’15:30:02.5’ = two

and a half seconds after 3:30PM.

45

Declaring Keys

An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE.

Either says that no two tuples of the relation may agree in all the attribute(s) on the list.

There are a few distinctions to be mentioned later.

46

Declaring Single-Attribute Keys

Place PRIMARY KEY or UNIQUE after the type in the declaration of the attribute.

Example:CREATE TABLE Beers (

name CHAR(20) UNIQUE,

manf CHAR(20)

);

47

Declaring Multiattribute Keys

A key declaration can also be another element in the list of elements of a CREATE TABLE statement.

This form is essential if the key consists of more than one attribute. May be used even for one-attribute

keys.

48

Example: Multiattribute Key

The bar and beer together are the key for Sells:CREATE TABLE Sells (

bar CHAR(20),

beer VARCHAR(20),

price REAL,

PRIMARY KEY (bar, beer)

);

49

PRIMARY KEY vs. UNIQUE

1. There can be only one PRIMARY KEY for a relation, but several UNIQUE attributes.

2. No attribute of a PRIMARY KEY can ever be NULL in any tuple. But attributes declared UNIQUE may have NULL’s, and there may be several tuples with NULL.

50

Content

Introduction Data Model

– why not flat files?– relational data model

relations schema keys

– why relations?

51

Why relations?

datasavings.dat

NAME,ADDR,BALANCE,OPENED,INTEREST

David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!

Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

52

Terminalogy

DDL: Data Definition Language CREATE TABLE

DML: Data Manipulation Language INSERT

SQL: Structured Query Language SELECT...FROM...WHERE

53

Why relations?

schema Savings( name CHAR(50),

addr CHAR(50),

balance REAL,

opened DATE,

interest REAL);

54

Why relations?

relation

operations – check balance– deposit or withdraw– change address

name addr balance opened interest

David 123 NW Flander St 500.75 2010 0.005

Lisa 432 Alberta St 4000.00 2001 0.006

55

Why relations?

relation

advantage – code coupled with data format?

name addr balance opened interest

David 123 NW Flander St 500.75 2010 0.005

Lisa 432 Alberta St 4000.00 2001 0.006

name addr balance opened interest

David 123 NW Flander St 500.75 2010 0.005

Lisa 432 Alberta St 4000.00 2001 0.006

56

Why relations?

relation

advantage – concurrency

• simultaneous withdraw calls

name addr balance opened interest

David 123 NW Flander St 500.75 2010 0.005

Lisa 432 Alberta St 4000.00 2001 0.006

57

Why relations?

relations

advantage – redundancy?

Savings(name, addr, balance, opened, interest)

Checkings(name, addr, balance, opened, interest)

58

Why relations?

relations

advantage – redundancy?

name addr balance opened interest

David 123 NW Flander St 500.75 2010 0.005

Lisa 432 Alberta St 4000.00 2001 0.006

name addr balance opened interest

David 123123 Sandy Blvd 1500 1998 0.001

Lisa 432 Alberta St 3500 1990 0

Savings

Checkings

59

Why relations?

relations

advantage – redundancy?

Customers(cust_id, name, addr)

Savings(cust_id, balance, opened, interest)

Checkings(cust_id, balance, opened, interest)

60

Why relations?

relations

advantage – redundancy?

cust_id name addr

c1 David 123 NW Flander St

c2 Lisa 432 Alberta Stcust_id

balance opened interest

c1 1200 1998 0.001

c2 3500 1990 0

cust_id

balance opened interest

c1 500.75 2010 0.005

c2 4000.00 2001 0.006

Customers

Savings

Checkings

61

Why relations?

relation

advantage – user interface

• natural language like support desirable

name addr balance opened interest

David 123 NW Flander St 500.75 2010 0.005

Lisa 432 Alberta St 4000.00 2001 0.006

62

Why relations?

relation

advantage – user interface SELECT name, addr

FROM savingsWHERE balance>2000

NAME ADDR BALANCE OPENED INTEREST

David 123 NW Flander St

500.75 2010 0.005

Lisa 432 Alberta St

4000.00 2001 0.006

63

Why relations?

relation

advantage – user interface

SELECT name, addr, balance FROM savingsORDER BY balance

NAME ADDR BALANCE OPENED INTEREST

David 123 NW Flander St

500.75 2010 0.005

Lisa 432 Alberta St

4000.00 2001 0.006

64

Content

Introduction Data Model

– why not flat files?– relational data model– why relations?

65

Content

Introduction Data Model

– why not flat files?– relational data model– why relations?– semi-structured data models

• XML and JSON

66

<Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author>Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <Publisher>Springer</Publisher></Book>

XML

67

JSON{ "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" }}

68

XML and JSON, side-by-side

<Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author>Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <Publisher>Springer</Publisher></Book>

{ "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" }}

69

XML Schema<xs:element name="Book"> <xs:complexType> <xs:sequence> <xs:element name="Title" type="xs:string" /> <xs:element name="Authors"> <xs:complexType> <xs:sequence> <xs:element name="Author" type="xs:string" maxOccurs="5"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Date" type="xs:gYear" /> <xs:element name="Publisher" minOccurs="0"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Springer" /> <xs:enumeration value="MIT Press" /> <xs:enumeration value="Harvard Press" /> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType></xs:element>

70

JSON Schema{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}

71

String type{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}

<xs:element name="Title" type="xs:string" />

72

List type{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}

<xs:element name="Authors"> <xs:complexType> <xs:sequence> <xs:element name="Author" type="xs:string" maxOccurs="5"/> </xs:sequence> </xs:complexType></xs:element>

73

Year type{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}

<xs:element name="Date" type="xs:gYear" />

74

Enumeration type{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}

<xs:element name="Publisher" minOccurs="0"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Springer" /> <xs:enumeration value="MIT Press" /> <xs:enumeration value="Harvard Press" /> </xs:restriction> </xs:simpleType></xs:element>

75

Semi-Structured Data Models

XML Adoption– Web Services (SOAP)– XML DB

JSON Adoption– Web Services (REST)– MongoDB

76

Content

Introduction Data Model

– why not flat files?– relational data model– why relations?– semi-structured data models– history

77

History Hierarchical (IMS): late 1960’s and 1970’s Network (CODASYL): 1970’s Relational: 1970’s and early 1980’s Entity-Relationship: 1970’s Object-oriented: late 1980’s and early 1990’s

Object-relational: late 1980’s and early 1990’s Semi-structured (XML): late 1990’s to the present

top related