
Databases

John Jannotti

/course/cs0320/www/lectures/

Feb 27, 2018

John Jannotti (cs32) Databases Feb 27, 2018 1 / 1


Course Announcements

Bacon is due in under two weeks.
  - Gear Up session tonight, 7pm, here in Metcalf.

Then at 8pm Tonight Team-forming mixer, Faunce Underground.

Database lab due tomorrow.


Files are more (or less?) than you think.

Files are not just streams of char (or String).
  - That’s a nice abstraction, but limited for some use cases.

Sometimes you don’t want a stream.
  - You want random access.

And sometimes you don’t want characters.
  - You want bytes.

The “true” abstraction that is provided by most modern operating systems is an ordered list of bytes.
  - And some metadata: creation time, owner, etc.


Bytes are ordered, but accessible randomly

The OS is generally as good at reading byte 518,251 as the 1st.

In Java, use a RandomAccessFile.
  - You can seek(long position) to a position.
  - You might have calculated position or obtained it from getFilePointer().
  - And then int ch = raf.read() or readFully(byte[] buffer).
  - Steer clear of readChar(), readLine(), readUTF(), and writeUTF().
  - Really, avoid all of the readX and writeX methods.
    - They follow strange (sometimes Java-only) conventions.

With disks, large sequential reads/writes are much more efficient, but flash storage is making that less relevant.

Nonetheless, if your program must read, then compute to decide what to read next, think about latency.
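A minimal sketch of the seek-then-read pattern described above. The class name, the helper readAt, and the temp-file contents are assumptions made for illustration; only RandomAccessFile, seek, and readFully come from the slide.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class RandomAccessDemo {
  /** Reads exactly {@code length} bytes starting at byte offset {@code position}. */
  static byte[] readAt(Path file, long position, int length) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      raf.seek(position);          // jump straight to the offset
      byte[] buffer = new byte[length];
      raf.readFully(buffer);       // fills the buffer or throws EOFException
      return buffer;
    }
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical sample file, created only for this demo.
    Path tmp = Files.createTempFile("raf-demo", ".txt");
    Files.write(tmp, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
    byte[] got = readAt(tmp, 10, 3);
    System.out.println(new String(got, StandardCharsets.US_ASCII)); // prints "abc"
    Files.delete(tmp);
  }
}
```

The bytes come back raw; turning them into text is a separate, explicit step.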


Bytes are not Characters

The OS only stores bytes. Applications decide what they mean.

A “Character” is a surprisingly complicated issue.
  - Often, an n byte file contains n characters.
  - But you should consider that a coincidence.

Unicode “code points” are numbers that have a defined mapping to an abstract character (not a particular glyph).
  - And there are a lot of code points. Too many for one, or even two, bytes.
  - For efficiency and backward compatibility, code points are almost never stored in a file “simply” as binary integers in a sequence.
  - “Simply” is in quotes, because even that would not be obviously simple.
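A small sketch of why "n bytes = n characters" is a coincidence. The class and helper names and the sample strings are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

final class BytesVsChars {
  /** Number of bytes the string occupies when stored as UTF-8. */
  static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  /** Number of Unicode code points in the string. */
  static int codePoints(String s) {
    return s.codePointCount(0, s.length());
  }

  public static void main(String[] args) {
    String ascii = "naive";
    String accented = "na\u00EFve"; // same word with U+00EF in the middle
    System.out.println(utf8Length(ascii) + " bytes, " + codePoints(ascii) + " code points");       // 5 and 5
    System.out.println(utf8Length(accented) + " bytes, " + codePoints(accented) + " code points"); // 6 and 5
  }
}
```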


Character encodings

We must translate from bytes to String (or characters, generally).

A Character encoding specifies those rules.

For example, UCS-2 says that two bytes equal a code point.
  - Strings just got twice as big.
  - And you have “endianness” problems.
  - And Unicode (now) has more than 2^16 code points.
  - “Byte order marks” and “surrogate pairs” address the latter two issues. (Clunkily)
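A sketch of these issues using Java's UTF-16 charsets. UTF-16 is the two-byte scheme extended with surrogate pairs, so it stands in for UCS-2 here; the class name, the hex helper, and the sample characters are assumptions.

```java
import java.nio.charset.StandardCharsets;

final class Utf16Demo {
  /** Formats bytes as space-separated two-digit hex. */
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02X ", b));
    }
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    String a = "A"; // U+0041
    System.out.println(hex(a.getBytes(StandardCharsets.UTF_16BE))); // 00 41
    System.out.println(hex(a.getBytes(StandardCharsets.UTF_16LE))); // 41 00
    // Java's plain UTF-16 encoder writes a big-endian byte order mark first.
    System.out.println(hex(a.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41

    // U+1D11E does not fit in 16 bits, so it becomes a surrogate pair.
    String clef = new String(Character.toChars(0x1D11E));
    System.out.println(clef.length());                                 // 2 (UTF-16 code units)
    System.out.println(clef.codePointCount(0, clef.length()));         // 1 (code point)
    System.out.println(hex(clef.getBytes(StandardCharsets.UTF_16BE))); // D8 34 DD 1E
  }
}
```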


UTF-8 is a clever encoding

UTF-8 is an encoding with many nice properties.
  - Code points from 0-127 are encoded in a single byte.
  - Bytes 0-127 never appear except when they mean code points 0-127.
  - It is defined without regard to the reader/writer’s endianness.
  - The first byte of an encoded character indicates how many more bytes must be read.
  - “Continuation” bytes begin with the bits 10.

Practically, these properties mean:
  - ASCII text is encoded the same as it was before Unicode.
  - If you find yourself in the middle of a UTF-8 sequence, you can find where the next character begins.
  - You never need to read the next byte in the stream to know if you’re done reading the current character.
  - Sorting UTF-8 bytes is the same as sorting Unicode lexicographically (by code point).
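The "find where the next character begins" property can be sketched as a loop that skips continuation bytes. The class and method names are hypothetical, and the input is assumed to be valid UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Resync {
  /** Returns the index of the first byte at or after {@code from} that starts a character. */
  static int nextCharStart(byte[] utf8, int from) {
    int i = from;
    // Continuation bytes look like 10xxxxxx, i.e. (b & 0xC0) == 0x80.
    while (i < utf8.length && (utf8[i] & 0xC0) == 0x80) {
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    byte[] bytes = "a\u20ACb".getBytes(StandardCharsets.UTF_8); // 61 E2 82 AC 62
    System.out.println(nextCharStart(bytes, 2)); // 4: skips 82 and AC, lands on 'b'
    System.out.println(nextCharStart(bytes, 1)); // 1: E2 is a leading byte
  }
}
```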


UTF-8’s encoding

One-byte codes are used only for the ASCII values 0 through 127.

Code points larger than 127 are multi-byte sequences.
  - A leading byte.
    - High-order bit is 1.
    - n more high bits are 1 to indicate n continuation bytes.
    - Next highest bit is 0.
  - n continuation bytes.
    - High-order bit is 1.
    - Next highest bit is 0.
  - All bits in the sequence not defined by the above are concatenated to form an integer equal to the code point.


Examples

Character  Code point  Decimal  Binary code point   Binary UTF-8
$          U+0024      36       0100100             00100100
¢          U+00A2      162      000 10100010        11000010 10100010
€          U+20AC      8364     00100000 10101100   11100010 10000010 10101100
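For illustration only, the bit layout can be written out as code and checked against the rows above. This is a sketch with hypothetical names; it assumes a valid code point and does not reject surrogates or out-of-range values.

```java
final class Utf8ByHand {
  /** Encodes one code point following the leading-byte / continuation-byte layout. */
  static byte[] encode(int cp) {
    if (cp < 0x80) {                      // 0xxxxxxx
      return new byte[] {(byte) cp};
    }
    if (cp < 0x800) {                     // 110xxxxx 10xxxxxx
      return new byte[] {
        (byte) (0xC0 | (cp >> 6)),
        (byte) (0x80 | (cp & 0x3F))};
    }
    if (cp < 0x10000) {                   // 1110xxxx 10xxxxxx 10xxxxxx
      return new byte[] {
        (byte) (0xE0 | (cp >> 12)),
        (byte) (0x80 | ((cp >> 6) & 0x3F)),
        (byte) (0x80 | (cp & 0x3F))};
    }
    return new byte[] {                   // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      (byte) (0xF0 | (cp >> 18)),
      (byte) (0x80 | ((cp >> 12) & 0x3F)),
      (byte) (0x80 | ((cp >> 6) & 0x3F)),
      (byte) (0x80 | (cp & 0x3F))};
  }

  /** Renders bytes as space-separated 8-bit binary strings. */
  static String bits(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      String s = Integer.toBinaryString(b & 0xFF);
      sb.append(String.format("%8s", s).replace(' ', '0')).append(' ');
    }
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    System.out.println(bits(encode(0x24)));   // 00100100
    System.out.println(bits(encode(0xA2)));   // 11000010 10100010
    System.out.println(bits(encode(0x20AC))); // 11100010 10000010 10101100
  }
}
```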


Don’t encode or decode UTF-8 “by hand”!

But know when to ask libraries to do it.

Often, a Reader or Writer will do it for you.
  - Use the two-argument constructor, with StandardCharsets.UTF_8.
  - Because UTF-8 is clever and common, you may not have noticed the need.
  - FindBugs notices!

Similarly, if you have a byte array to convert to a String, use new String(bytes, charset).
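A short sketch of both library routes. InputStreamReader is used here as one example of a Reader with a two-argument (stream, charset) constructor; the class and method names are assumptions.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class DecodeWithLibraries {
  /** Reads the first line of a byte stream, decoding it explicitly as UTF-8. */
  static String firstLine(InputStream in) throws IOException {
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      return reader.readLine();
    }
  }

  /** Converts raw bytes to a String with an explicit charset. */
  static String decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    byte[] raw = {0x31, 0x20, (byte) 0xE2, (byte) 0x82, (byte) 0xAC, 0x0A}; // "1 €\n"
    System.out.println(firstLine(new ByteArrayInputStream(raw)).length()); // 3 characters
    System.out.println(decode(raw).length());                              // 4, newline included
  }
}
```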


Let’s think about UTF-8 binary search

What if we gave you the actor list for Bacon as a CSV file?
Each line is an ID, an actor name, other info (all UTF-8).
You need to create an Actor object for, say, ID 0n4dkpq.
  - Use a RandomAccessFile to start reading at the mid-point.
  - Read bytes until you find a newline=10. (Why is that safe?)
  - Acquire more bytes until the next newline.
  - Convert to String, check the ID.
  - Repeat on half the file until you find it.

    010hn,Amy Grant,1960
    010p3,Adam Carolla,1964
    010q3,Fred Rogers,1928
    .
    0n4dk,Nelson Varela,1972
    .
    0zhb4,Alberto Giacometti,1901
    0zjpz,Richie Sambora,1959
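The steps above can be sketched roughly as follows. This is one possible shape, not the course's reference solution. Assumptions: lines end in "\n", the file is sorted by ID, IDs are ASCII (so String.compareTo agrees with byte order), and the class and method names are made up. It reads one byte at a time for clarity, which is slow in practice.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class CsvBinarySearch {
  /** Raw bytes from {@code start} up to (not including) the next newline or EOF. */
  static byte[] lineAt(RandomAccessFile raf, long start) throws IOException {
    raf.seek(start);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int b;
    while ((b = raf.read()) != -1 && b != '\n') {
      out.write(b);
    }
    return out.toByteArray();
  }

  /** Returns the full line whose first field equals {@code id}, or null if absent. */
  static String find(RandomAccessFile raf, String id) throws IOException {
    long lo = 0;            // always the offset of a line start
    long hi = raf.length(); // exclusive bound on where the target line may start
    while (lo < hi) {
      long mid = lo + (hi - lo) / 2;
      // Skip the (probably partial) line under mid; a full line starts after its newline.
      long start = mid + lineAt(raf, mid).length + 1;
      if (start >= hi) {
        start = lo; // no line starts after mid within range, so test the first line instead
      }
      byte[] raw = lineAt(raf, start);
      String line = new String(raw, StandardCharsets.UTF_8);
      int comma = line.indexOf(',');
      String lineId = comma < 0 ? line : line.substring(0, comma);
      int cmp = id.compareTo(lineId);
      if (cmp == 0) {
        return line;
      } else if (cmp < 0) {
        hi = start;                  // target starts before this line
      } else {
        lo = start + raw.length + 1; // target starts after this line
      }
    }
    return null;
  }
}
```

Scanning for byte 10 is safe because, in UTF-8, bytes 0-127 only ever mean code points 0-127, so a newline byte cannot be part of another character.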


A CSV file can only be sorted in one way

What would you do if you had to look up an actor by name?

The actors are not sorted that way. No binary search.
  - Add another file, sorted by name.
  - Binary search it for a given name to get id.
  - Then look up that id in the main file.

But let’s “work smarter.” We’ve invented an index. Let’s use a well-developed software system, a database, that offers them, and much more, for organizing data.


How many disk reads?

Suppose that a disk read is 8 KB. Suppose you have 50,000 actors, and each actor row in the CSV is about 100 bytes long. Further, suppose each line in the index we invented is about 50 bytes long. About how many disk reads would it take to find an actor’s age if you only know their name? (Birth year is stored in the main actor CSV.)

A) 19

B) 30

C) 9

D) 42

E) 3
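One rough way to estimate it, as a sketch under stated assumptions: 8 KB means 8192 bytes, every binary-search probe costs one disk read, nothing is cached, and the search stops once the range fits in a single block. The class and method names are hypothetical.

```java
final class DiskReadEstimate {
  /** Probes needed to binary-search a file when each probe reads one block. */
  static int probes(long rows, long bytesPerRow, long blockBytes) {
    long blocks = (rows * bytesPerRow + blockBytes - 1) / blockBytes; // ceiling division
    int reads = 0;
    while (blocks > 1) {         // each probe roughly halves the candidate blocks
      blocks = (blocks + 1) / 2;
      reads++;
    }
    return reads;
  }

  public static void main(String[] args) {
    long block = 8 * 1024;
    int indexReads = probes(50_000, 50, block);  // name index: about 2.5 MB
    int mainReads = probes(50_000, 100, block);  // main CSV: about 5 MB
    System.out.println(indexReads + " + " + mainReads + " = " + (indexReads + mainReads));
    // prints "9 + 10 = 19"
  }
}
```

Under these assumptions the estimate lands near option A; a different cost model would shift it by a read or two.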


“Database” means a lot of things

A database is a set of related data that can be accessed and related in various ways.

For many applications, flexibility of data access is the prime motivator.

But ACID properties are often just as important, when considering correctness.
  - Atomicity: Transactions (related changes) either all occur or none do. Think of moving money between accounts.
  - Consistency: The database enforces rules that check data validity. Think of “not null”, “positive”.
  - Isolation: It is impossible to see a transaction’s changes until it completes. Think about counting the bank’s money during transfers.
  - Durability: Once changes are made, they are permanent. Think of crashes, power loss, etc.

We’ll be going quickly through this from the perspective of “consumers.” Take CS127 to learn how it all works.


Relational Model

Today, “Database” is usually shorthand for “Relational Database.”
  - “NoSQL” databases aim to be simpler, faster, more scalable.

All information is represented as relations.

A table is a set of relations of a single kind.
  - No inherent ordering.

A single relation is a row or tuple.

All tuples of a relation have the same named columns.
  - No collections. (In the sense of a variable number of columns.)
  - Pretending a table is a CSV file is not bad for first intuition.

SQL is a language for organizing, accessing, and manipulating these relations.


Schemas

Schemas are the “type system” of the relational database world.

A database’s schema describes what tables exist.

And the columns of each table (including types and constraints).

Schemas do not change at runtime (in the vast majority of apps).

Although the schema is set up with SQL, these bits of SQL are generally not used in applications themselves.

create, drop, alter

    CREATE TABLE company (
      name VARCHAR(50) NOT NULL,
      symbol VARCHAR(5) NOT NULL UNIQUE,
      price NUMERIC(8,4),
      employees INTEGER
    );

Sidebar: REAL and FLOAT exist, but DO NOT use them for handling money.


A Company Table

Name       Symbol  Price   Employees
Microsoft  MSFT    51.99   35010
Alphabet   GOOG    713.00  20100
Oracle     ORCL    37.60   18350


Using SQL to access data

> sqlite3 company.db

sqlite> select * from company where price < 60.0;

Microsoft|MSFT|51.99|35010

Oracle|ORCL|37.60|18350

sqlite> select employees from company where price < 60.0;

35010

18350

sqlite> select employees/price from company

...> where price < 40.0;

1134.85

699.85


Indexes (Indices if you prefer)

SQL is declarative. You state what you want, not how to get it.

A good database will determine the fastest way to evaluate a query.

To perform well, it may need an index.

The “Actor by Name” file we proposed is an index.

Unfortunately, you generally must decide to add them yourself.

You can index a column, multiple columns, or even expressions over columns.

The cost is disk space and a little bit of time at insert/delete.

The benefit is faster queries if an index can be employed.
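An in-memory analogy for the trade-off, not how a real database stores indexes. The sketch assumes actor names are unique, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TinyActorTable {
  private final List<String[]> rows = new ArrayList<>();          // each row: {id, name, birthYear}
  private final Map<String, Integer> nameIndex = new TreeMap<>(); // name -> row number

  void insert(String id, String name, String birthYear) {
    rows.add(new String[] {id, name, birthYear});
    nameIndex.put(name, rows.size() - 1); // extra space and insert-time work: the cost
  }

  /** "Table scan": look at every row until the name matches. */
  String[] findByNameScan(String name) {
    for (String[] row : rows) {
      if (row[1].equals(name)) {
        return row;
      }
    }
    return null;
  }

  /** "Index lookup": search the sorted map, then fetch the row directly. */
  String[] findByNameIndexed(String name) {
    Integer rowNumber = nameIndex.get(name);
    return rowNumber == null ? null : rows.get(rowNumber);
  }
}
```

Both lookups return the same row; the indexed one avoids touching every row, which is the benefit.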


Table Scan vs Index Lookup

# explain select * from orders where email = '[email protected]';
Seq Scan on orders (cost=0.00..132812.74 rows=64 width=971)
  Filter: ((email)::text = '[email protected]'::text)

# explain select * from users where email = '[email protected]';
Seq Scan on users (cost=0.00..10552.00 rows=1 width=284)
  Filter: ((email)::text = '[email protected]'::text)

# explain select * from users where lower(email) = '[email protected]';
Index Scan using users_email_ignorecase_key on users (cost=0.00..8.33 rows=1 width=284)
  Index Cond: (lower((email)::text) = '[email protected]'::text)


Rows as Objects

To a rough approximation, rows can be mapped to objects.
  - table ⇔ class
  - row ⇔ object
  - column (“normal”) ⇔ simple field
  - column (foreign key) ⇔ object reference

An “ORM” is a software layer to make that mapping easy.
  - Object-Relational Mapping.

But there’s a LOT more to it.
  - And it should be said, some argue against ORMs.
  - Learn more: “object relational impedance mismatch”.

You won’t use an ORM in Bacon, but you should think about abstraction. Read about them for inspiration.
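A hand-written sketch of that mapping, with no ORM involved. The class and field names are assumptions modeled on the company and employee tables in this lecture.

```java
final class RowsAsObjects {
  /** table "company" <-> class; one row <-> one instance. */
  static final class Company {
    final int id;        // primary key column
    final String name;   // "normal" column -> simple field
    final String symbol;

    Company(int id, String name, String symbol) {
      this.id = id;
      this.name = name;
      this.symbol = symbol;
    }
  }

  /** table "employee" <-> class. */
  static final class Employee {
    final int id;
    final String name;
    final int salary;
    final Company company; // foreign key column -> object reference

    Employee(int id, String name, int salary, Company company) {
      this.id = id;
      this.name = name;
      this.salary = salary;
      this.company = company;
    }
  }

  public static void main(String[] args) {
    Company google = new Company(2, "Google", "GOOG");
    Employee roger = new Employee(234, "Roger Williams", 98293, google);
    System.out.println(roger.name + " works at " + roger.company.name);
  }
}
```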


Relationships Between Objects

An employee table

Name               Salary  Company
Roger Williams     98293   ?
Ida Lewis          87293   ?
Sissieretta Jones  129349  ?

What goes in the Company column?


Primary Keys

First, add a primary key to your company table.

ID  Name       Symbol  Price   Employees
1   Microsoft  MSFT    30.85   35010
2   Google     GOOG    595.03  20100
3   Oracle     ORCL    26.22   18350

Then, use a foreign key in your employee table.

ID   Name               Salary  Company
234  Roger Williams     98293   2
129  Ida Lewis          87293   3
233  Sissieretta Jones  129349  2

    select name from employee
    where employee.company in
      (select id from company as c where c.name = 'Google')


Joins query across tables

How can we see Microsoft’s entire payroll?
  - The name of a company is in the company table.
  - The salary of an employee is in the employee table.

Tables may be joined in queries to construct a logical table that SQL can operate on.

    select sum(salary) from employee e, company c
    where e.company = c.id and c.name = 'Microsoft';

The select clause operates as if there is a table with all of the columns from the employee and company tables. What is in that table?
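One way to picture that logical table is a nested loop over both tables that keeps only the pairs satisfying the join condition. This is an in-memory sketch with hypothetical class names; a real database is free to evaluate the join quite differently.

```java
import java.util.ArrayList;
import java.util.List;

final class JoinSketch {
  static final class Company {
    final int id;
    final String name;
    Company(int id, String name) { this.id = id; this.name = name; }
  }

  static final class Employee {
    final String name;
    final int salary;
    final int company; // foreign key: a company id
    Employee(String name, int salary, int company) {
      this.name = name; this.salary = salary; this.company = company;
    }
  }

  /** One row of the logical table: every employee column plus every company column. */
  static final class Joined {
    final Employee e;
    final Company c;
    Joined(Employee e, Company c) { this.e = e; this.c = c; }
  }

  /** Every (employee, company) pair, filtered by "e.company = c.id". */
  static List<Joined> join(List<Employee> employees, List<Company> companies) {
    List<Joined> out = new ArrayList<>();
    for (Employee e : employees) {
      for (Company c : companies) {
        if (e.company == c.id) {
          out.add(new Joined(e, c));
        }
      }
    }
    return out;
  }

  /** Mirrors "sum(salary) ... and c.name = ?", except an empty match yields 0 where SQL yields NULL. */
  static int payroll(List<Employee> employees, List<Company> companies, String companyName) {
    int sum = 0;
    for (Joined row : join(employees, companies)) {
      if (row.c.name.equals(companyName)) {
        sum += row.e.salary;
      }
    }
    return sum;
  }
}
```

With the sample rows from the primary-key slide, the joined table has three rows, one per employee, each carrying its company's columns alongside its own.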


Difficulties with Collections

A row may not contain an arbitrary number of anything.

To encode a many-to-one relationship between companies and employees:
  - The company ID is embedded in the employee row.
  - May feel “backwards”.
  - This is a “foreign key” and there are consistency checks.

Foreign keys provide associated sets, but not lists, and not many-many relations.


Lists

Simple foreign keys only provide a set abstraction.
  - Your users probably expect their data to maintain a consistent order.

To get lists, add a “position” column, alongside the reference.

Your code (or framework) must manage the position.

Or, use SQL ordering constructs to obtain the order desired.
  - For example, add a 'created' column, and use: ORDER BY created

The “position” technique is not checked by the database. Programming error = data corruption.


Managing (Ordered) Lists

ID   Name               Salary  Company  Employee Position
234  Roger Williams     98293   2        1
129  Ida Lewis          87293   3        0
233  Sissieretta Jones  129349  2        0

or

ID   Name               Salary  Company  Hired
234  Roger Williams     98293   2        4/12/1636
129  Ida Lewis          87293   3        3/29/1869
233  Sissieretta Jones  129349  2        4/5/1888

What happens if Ida quits? (It’s why I recommend the second style.)


Many-to-many relations

Suppose employees might work for multiple companies.

More columns? company1, company2
  - Not very general.
  - A huge pain to write code for.
  - Every query that involves a company has to check both columns.
  - Another example where the “simple” code is harder to write, and error-prone.

Better solution: a table to represent the relationship.


Employees with multiple companies

ID  Name       Symbol  Price   Employees
1   Microsoft  MSFT    30.85   35010
2   Google     GOOG    595.03  20100
3   Oracle     ORCL    26.22   18350

ID   Name               Salary
234  Roger Williams     98293
129  Ida Lewis          87293
233  Sissieretta Jones  129349

Employee ID  Company ID
234          2
129          3
233          2
129          1
233          3


Think hard about your many-many relationships

Maybe the relationship itself deserves “promotion” to a type.

Does this relationship deserve its own abstraction? “Employment”
  - Then it has one-to-many relationships to employees and companies.
  - More importantly, a natural place for associated data.
  - For example, salary belongs there, not in employee.

Same schema, except the intermediate table is now a Java object.
  - Now you think of the schema as representing two one-to-many relationships, instead of one many-to-many.
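A sketch of the promoted relationship as a Java type. The (employee, company) pairs passed to it in the usage note come from the link table in this lecture; the per-employment salary figures are invented purely for illustration, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;

final class EmploymentSketch {
  /** One row of the intermediate table, now a first-class object. */
  static final class Employment {
    final int employeeId; // foreign key to employee
    final int companyId;  // foreign key to company
    final int salary;     // associated data lives on the relationship

    Employment(int employeeId, int companyId, int salary) {
      this.employeeId = employeeId;
      this.companyId = companyId;
      this.salary = salary;
    }
  }

  /** Company ids for one employee, in table order. */
  static List<Integer> companiesOf(List<Employment> all, int employeeId) {
    List<Integer> out = new ArrayList<>();
    for (Employment e : all) {
      if (e.employeeId == employeeId) {
        out.add(e.companyId);
      }
    }
    return out;
  }

  /** Total pay across all of one employee's employments. */
  static int totalSalary(List<Employment> all, int employeeId) {
    int sum = 0;
    for (Employment e : all) {
      if (e.employeeId == employeeId) {
        sum += e.salary;
      }
    }
    return sum;
  }
}
```

For example, given employments (129, 3) and (129, 1), companiesOf returns [3, 1] for employee 129, and each of those employments can carry its own salary.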


Advice for database-backed applications

Identify your “units of work” (often a single web or API request).

Use transactions to isolate those units.

You can abort cleanly because of atomicity.

Keep your state in the database (not in memory).
  - Scaling is more straightforward.
  - A (single!) caching layer makes it fast enough.
  - Large binary data (e.g., images) may be an exception.
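One possible shape for "a unit of work in a transaction", using only the standard java.sql.Connection methods setAutoCommit, commit, and rollback. The UnitOfWork class and its Work interface are hypothetical helpers, and obtaining a Connection is left to the caller.

```java
import java.sql.Connection;
import java.sql.SQLException;

final class UnitOfWork {
  /** The statements that make up one unit, e.g. everything a single request does. */
  interface Work {
    void run(Connection conn) throws SQLException;
  }

  /** Runs the work so that either all of its changes are committed or none are. */
  static void inTransaction(Connection conn, Work work) throws SQLException {
    conn.setAutoCommit(false); // group the following statements into one transaction
    try {
      work.run(conn);
      conn.commit();
    } catch (SQLException | RuntimeException e) {
      conn.rollback();         // abort cleanly: atomicity undoes the partial changes
      throw e;
    } finally {
      conn.setAutoCommit(true);
    }
  }
}
```

A request handler would wrap its database calls in one inTransaction call, so a failure partway through leaves nothing half-written.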
