1 cs351 introduction to database systems quan.wang@wsu.edu

CS351 Introduction to Database systems

quan.wang@wsu.edu

About me

Quan Wang– developer at Oracle Corp.

• JDBC• Web Services• Cloud

Content

Introduction– Why databases?

Data Model

Why databases?

It used to be about boring stuff: employee records, bank records, etc.

Today, the field covers all the largest sources of data, with many new ideas. Web search. Data mining. Scientific and medical databases. Integrating information.

Why databases?

You may not notice it, but databases are behind almost everything you do on the Web. Google searches. Queries at Amazon, eBay, etc.

Why databases?

Databases often have unique concurrency-control problems. Many activities (transactions) at the

database at all times. Must not confuse actions, e.g., two

withdrawals from the same account must each debit the account.

Why databases?

Changing landscape System R: Oracle, MySQL, IBM, MS NoSQL: MangoDB NewSQL: VoltDB Cloud: Amazon RDS, Google Bigquery Bigdata: Hadoop, MapR

Why databases?

Job perspectives– Startups– Massive industry

Content

Introduction– Why database?– What to cover?

Data Model

What to cover?

Database – a set of inter-related data

pertaining to some organization– airline, university, store

Database Management System (DBMS)

– one or more databases – algorithms accessing the databases

What to cover?

Database Theory– relational model and algebra– norm forms

Design of databases.– normalization– E/R, UML

What to cover?

Database programming– SQL, stored procedures.

Integration?– ODBC, JDBC– O-R Mapping

What not covered?

Not covered– DB tuning– DB administration(DBA)– DB implementation– Programming languages

• May need to learn languages and the DB related aspects

Content

Introduction– Why database?– What to cover?– Logistics

Data Model

Logistics

Email: quan.wang@wsu.edu Office hour

– Tuesday & Thursday – 1 pm to 2pm– 201L

Logistics

No TA No detailed comments on

assignments Distribute answer keys when

appropriate Check discrepancies Raise issues

Logistics

Course page: – Http://encs.vancouver.wsu.edu/~

quwang– weekly schedule, assignments

Submission page– http://lms.wsu.edu – Use drop box for assignments and

project submission

Logistics Assignments 30% Team project 20% Midterms 25% Final 25%

Logistics

Team project– mini-ebay– Max 3 people each team– Single grade for each team

Assignments and project – Late submission 10% penalty

Logistics

Databases– MySQL– SQLite

• /usr/bin/sqlite3

Logistics

Exams– Lectures, textbook, and assigned

readings– Only areas covered in the

lectures– Not curved

Content

Introduction Why database? What to cover? Logistics

Data Model– Why not flat files?

Why not flat files?

datasavings.dat

NAME,ADDR,BALANCE,OPENED,INTEREST

David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!

Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

operations – check balance– deposit or withdraw– change address

Why not flat files?

dataDavid, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!

problems – code coupled with data format

• changing delimiter would require code change

Why not flat files?

dataSavings.data:

Checkings.data:

David, 789 NE Sany Blvd, 1500, 08/20/1998, 0.001!

Lisa, 432 NE Alberta St, 3500, 09/20/1990, 0

problems – redundancy leads to

inconsistency

Why not flat files?

dataDavid, 123 NW Flanders St, 500.75, 08/20/2010, 0.005

problems – concurrency

• withdraw calls for file lock– withdraw $100 from the same account at

the same time, synchronized

– withdraw $100 from different accounts at the same time, synchronized unnecessarily

Why not flat files?

dataDavid, 123 NW Flanders St, 500.75, 08/20/2010, 0.005

problems – user interface

• file based algorithms

Why not flat files?

summary – code coupled with data– redundancy leading to

inconsistency – low concurrency– unfriendly user interface

Why not flat files?

database as abstraction data model hides data format SQL hides algorithms runtime handles concurrency

Content

Introduction Data Model

– why not flat files?– relational data model

• relations• schema• keys

What is a Data Model?

1. Underlying structure of a database.2. Mathematical representation of data.

Examples: relational model = tables; semistructured model = trees/graphs; key-value model=key value pairs.

3. Operations on data.4. Constraints.

A Relation is a Table

name manfWinterbrew Pete’sBud Lite Anheuser-Busch

Attributes(columnheaders)

Tuples(rows)

Relation name

Schemas

Relation schema = relation name and attribute list. Optionally: types of attributes. Example: Beers(name, manf) or

Beers(name: string, manf: string) Database = collection of relations. Database schema = set of all

relation schemas in the database.

Why Relations?

Very simple model. Often matches how we think

about data. Abstract model that underlies SQL,

the most important database language today.

Example

Sells( bar, beer, price ) Bars( name, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors3.00

Database Schemas in SQL

SQL is primarily a query language, for getting information from a database.

But SQL also includes a data-definition component, DDL, for describing database schemas.

Creating (Declaring) a Relation

Simplest form is:CREATE TABLE <name> (<list of elements>);

To delete a relation:DROP TABLE <name>;

Elements of Table Declarations

Most basic element: an attribute and its type.

The most common types are: INT or INTEGER (synonyms). REAL or FLOAT (synonyms). CHAR(n ) = fixed-length string of n

characters. VARCHAR(n ) = variable-length string

of up to n characters.

Example: Create Table

CREATE TABLE Sells (

bar CHAR(20),

beer VARCHAR(20),

price REAL

SQL Values

Integers and reals are represented as you would expect.

Strings are too, except they require single quotes. Two single quotes = real quote, e.g., ’Joe’’s Bar’.

Any value can be NULL.

Dates and Times

DATE and TIME are types in SQL. The form of a date value is:

DATE ’yyyy-mm-dd’ Example: DATE ’2007-09-30’ for

Sept. 30, 2007.

Times as Values

The form of a time value is:TIME ’hh:mm:ss’

with an optional decimal point and fractions of a second following. Example: TIME ’15:30:02.5’ = two

and a half seconds after 3:30PM.

Declaring Keys

An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE.

Either says that no two tuples of the relation may agree in all the attribute(s) on the list.

There are a few distinctions to be mentioned later.

Declaring Single-Attribute Keys

Place PRIMARY KEY or UNIQUE after the type in the declaration of the attribute.

Example:CREATE TABLE Beers (

name CHAR(20) UNIQUE,

manf CHAR(20)

Declaring Multiattribute Keys

A key declaration can also be another element in the list of elements of a CREATE TABLE statement.

This form is essential if the key consists of more than one attribute. May be used even for one-attribute

Example: Multiattribute Key

The bar and beer together are the key for Sells:CREATE TABLE Sells (

bar CHAR(20),

beer VARCHAR(20),

price REAL,

PRIMARY KEY (bar, beer)

PRIMARY KEY vs. UNIQUE

1. There can be only one PRIMARY KEY for a relation, but several UNIQUE attributes.

2. No attribute of a PRIMARY KEY can ever be NULL in any tuple. But attributes declared UNIQUE may have NULL’s, and there may be several tuples with NULL.

Content

– why not flat files?– relational data model

relations schema keys

– why relations?

Why relations?

datasavings.dat

NAME,ADDR,BALANCE,OPENED,INTEREST

Terminalogy

DDL: Data Definition Language CREATE TABLE

DML: Data Manipulation Language INSERT

SQL: Structured Query Language SELECT...FROM...WHERE

Why relations?

schema Savings( name CHAR(50),

addr CHAR(50),

balance REAL,

opened DATE,

interest REAL);

Why relations?

relation

operations – check balance– deposit or withdraw– change address

name addr balance opened interest

David 123 NW Flander St 500.75 2010 0.005

Lisa 432 Alberta St 4000.00 2001 0.006

Why relations?

relation

advantage – code coupled with data format?

Lisa 432 Alberta St 4000.00 2001 0.006

Why relations?

relation

advantage – concurrency

• simultaneous withdraw calls

Lisa 432 Alberta St 4000.00 2001 0.006

Why relations?

relations

advantage – redundancy?

Savings(name, addr, balance, opened, interest)

Checkings(name, addr, balance, opened, interest)

Why relations?

relations

Lisa 432 Alberta St 4000.00 2001 0.006

David 123123 Sandy Blvd 1500 1998 0.001

Lisa 432 Alberta St 3500 1990 0

Savings

Checkings

Why relations?

relations

Customers(cust_id, name, addr)

Savings(cust_id, balance, opened, interest)

Checkings(cust_id, balance, opened, interest)

Why relations?

relations

cust_id name addr

c1 David 123 NW Flander St

c2 Lisa 432 Alberta Stcust_id

balance opened interest

c1 1200 1998 0.001

c2 3500 1990 0

cust_id

balance opened interest

c1 500.75 2010 0.005

c2 4000.00 2001 0.006

Customers

Savings

Checkings

Why relations?

relation

advantage – user interface

• natural language like support desirable

Lisa 432 Alberta St 4000.00 2001 0.006

Why relations?

relation

advantage – user interface SELECT name, addr

FROM savingsWHERE balance>2000

NAME ADDR BALANCE OPENED INTEREST

David 123 NW Flander St

500.75 2010 0.005

Lisa 432 Alberta St

4000.00 2001 0.006

Why relations?

relation

advantage – user interface

SELECT name, addr, balance FROM savingsORDER BY balance

NAME ADDR BALANCE OPENED INTEREST

David 123 NW Flander St

500.75 2010 0.005

Lisa 432 Alberta St

4000.00 2001 0.006

Content

– why not flat files?– relational data model– why relations?

Content

– why not flat files?– relational data model– why relations?– semi-structured data models

• XML and JSON

<Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author>Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <Publisher>Springer</Publisher></Book>

JSON{ "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" }}

XML and JSON, side-by-side

<Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author>Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <Publisher>Springer</Publisher></Book>

{ "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" }}

XML Schema<xs:element name="Book"> <xs:complexType> <xs:sequence> <xs:element name="Title" type="xs:string" /> <xs:element name="Authors"> <xs:complexType> <xs:sequence> <xs:element name="Author" type="xs:string" maxOccurs="5"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Date" type="xs:gYear" /> <xs:element name="Publisher" minOccurs="0"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Springer" /> <xs:enumeration value="MIT Press" /> <xs:enumeration value="Harvard Press" /> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType></xs:element>

JSON Schema{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}

String type{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}

<xs:element name="Title" type="xs:string" />

List type{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}

<xs:element name="Authors"> <xs:complexType> <xs:sequence> <xs:element name="Author" type="xs:string" maxOccurs="5"/> </xs:sequence> </xs:complexType></xs:element>

Year type{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}

<xs:element name="Date" type="xs:gYear" />

Enumeration type{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}

<xs:element name="Publisher" minOccurs="0"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Springer" /> <xs:enumeration value="MIT Press" /> <xs:enumeration value="Harvard Press" /> </xs:restriction> </xs:simpleType></xs:element>

Semi-Structured Data Models

XML Adoption– Web Services (SOAP)– XML DB

JSON Adoption– Web Services (REST)– MongoDB

Content

– why not flat files?– relational data model– why relations?– semi-structured data models– history

History Hierarchical (IMS): late 1960’s and 1970’s Network (CODASYL): 1970’s Relational: 1970’s and early 1980’s Entity-Relationship: 1970’s Object-oriented: late 1980’s and early 1990’s

Object-relational: late 1980’s and early 1990’s Semi-structured (XML): late 1990’s to the present

1 cs351 introduction to database systems quan.wang@wsu.edu

Documents

disaster planning: an overview eileen e. brady washington...

the human side of relationship building april 28, 2010...

the dynamic land ecosystem model (dlem) and its applications...

joe valacich washington state university august 3, 2003...

presentations demonstrations gui design intro cs351 –...

mark hennessy cs351 dept computer science nui maynooth 1...

cs351, fall 2009, hw2 solutions by salim sarımurat...

codling moth orchard sampling protocol wendy jones wsu -...

cs351 paper

verification and validation. cs351 - software engineering...

cs 351 design of large programs...

cs351 © 2003 ray s. babcock software testing what is it?

marina tolmachЁva washington state university...

robert’orpet,phdstudent,robert.orpet@wsu.edu’...

cs 351 computer graphics, fall 2008 - colby...

cs 351 design of large programs architectural design...

t&l 317 initial practicum - college of education · email...

stress management allan sanders, mn, arnp asanders@wsu.edu

cs351 software engineering (software engineering in the...

john fouts 509-477-2176 fouts@wsu.edu maintaining your...