jump to first page relational databases prepared for sys364 systems design by joyce walton

24
Jump to first page Relational Databases Prepared for SYS364 Systems Design by Joyce Walton

Upload: drake-gratton

Post on 16-Dec-2015

216 views

Category:

Documents


0 download

TRANSCRIPT

Jump to first page

Relational Databases

Prepared for SYS364

Systems Design

by Joyce Walton

Jump to first page

Relational Databases

In the beginning: Hierarchical and Network DBMS were awkward to design, and hopelessly unreliable

e.g., IBM’s DBMS called IMS (Information Management System) was in use at U of T for Student records … two identical inquiries would get the same answer only 14% of the time!

Jump to first page

DB: Dinosaur Base File Processing

Program centric – the code knows how to make information from data

Collections of data organized sequentially, direct (random) with hashing, and by index.

Hierarchical Structure centric – programs must know how to

navigate data Tree structure, parent-child relationships

Network Files with records & fields contain data Set of records provide one to many relationships

(similar to RDBMS Join) High performance but changes mean recompiling

Jump to first page

Relational Databases, continued

Meanwhile, at U of T and other grad schools of Computer Science around the world, people were using a new way of talking about data: the Relational Calculus

Great for discussion, but no one thought that a commercially viable DBMS could be built to use the Relational Calculus!

Then a student did it, not for a big mainframe but for a PC! (dBASEII)

Jump to first page

The Relational Calculus

A scary name for some very reasonable stuff! But unfortunately, it has a lot of fancy words

we need to memorize ! A relation is a named table of data, arranged

in rows (called tuples) and columns (called attributes)

Jump to first page

The Relational Calculus,continued

It may help to think of file equivalents to learn this vocabulary: think of a Relation as being a file and a Tuple as being a record and an Attribute as being a field

Theoretically, there are no duplicate tuples in a Relation. However, most real-life applications use them, and most Relational DBMS allow them !

Jump to first page

The Relational Calculus,continued

Each Attribute (column, like field) has a “Domain”, the range or list of values it may contain. This is quite different from the “Data Type” idea we use with files, because it is more restrictive.

Now let’s talk about some of the operations we can perform on Relations...

Jump to first page

The Relational Calculus,continued The Selection operation consists of using one

Relation to create another which has only some of the Tuples of the original

These Tuples in the new Relation are “selected” from the original on the basis of some criterion -- really just a “condition” like those we use in the Control Constructs in programming languages, based on the “Relational Operators”: == != < <= > >=

And just like conditions, criteria can be combined into more complex criteria using the “Logical Operators”: AND, OR, NOT

Jump to first page

The Relational Calculus,continued

Projection is another operation we can perform on one Relation to create a second

The new “projected” relation will contain only some of the attributes (columns) of the original, and these are selected by name

I usually declare a student contest to come up with a “Memory Crutch” to help you remember that you Select Rows, but Project Columns

Jump to first page

The Relational Calculus,continued

In addition to the operations you can perform on a single Relation, there are several you can do to two Relations to create a third.

The simplest of these is the Union, and it defines a new term: Union-Compatible or UC

The Union of two Relations which have the same attributes with the same domains (UC) is formed by appending all the tuples of the second after all the tuples of the first

Jump to first page

The Relational Calculus,continued

The Intersection of two UC Relations is a third which contains only those tuples in the second which are identical to some in the first. (Compare the Intersection to the AND operation on Venn diagrams!)

The Difference of two UC Relations is a third which contains only the tuples which are present in the first but not in the second. (Just like subtraction!)

Jump to first page

The Relational Calculus,continued And now we come to a truly ugly operation! But

we’ll see later that it can be useful… The Product of two Relations (not necessarily

UC) is a third relation which has every tuple of the second appended to each tuple of the first.

So if you had one Relation which was 3 tuples (rows) by 4 attributes (columns) and another which was 2 by 2, their product would have 6 tuples (3 times 2) each having 6 attributes (4+2), truly big and ugly !

Jump to first page

Relational Calculus Examples Here’s a relation called DOGS:

NAME BREED SEX AGE OWNER

Rover Mongrel M 2 Li

Spot Cocker M 3 Sue

Blue Beagle F 4 Mary

Jump to first page

Relational Calculus Examples And another relation called CATS:

NAME BREED SEX AGE OWNER

Puss DSH F 18 Jane

Rocky DSH M 10 Jane

Maggie Maine Coon

F 12 Li

Kurt DSH M 3 Sue

Jump to first page

Relational Calculus Examples

As you can see, the DOGS and CATS relations are Union-Compatible (UC) because they have the same Attributes with the same Domains

Since DOGS has 5 Attributes with 3 Tuples and CATS has the same 5 Attributes with 4 Tuples, their Union would have the same 5 Attributes with 7 Tuples

Jump to first page

Relational Calculus Examples Now let’s look at another Relation, called FOLK:

NAME NUMBER

Li 215

Jane 184

Sue 182

Mary 214

FOLK is not UC with either DOGS or CATS

However, we can make a Product of DOGS and FOLK

It will be Huge! And useless. 7 Attributes, by 12 Tuples !

Jump to first page

Relational Calculus Examples However ugly that Product is, we can make it

useful! All we have to do is Select the Tuples for which DOG’s OWNER is identical to FOLK’s NAME, and then Project only the Attributes of Name and Number! This will let us determine the Number (address on Street) for all the dogs!

What we’ve just done is called a Join, a Product which has been Selected and Projected

Jump to first page

Relational Calculus, continued

The Join operation is what makes the whole Relational idea useful

It allows us to keep data in separate tables where they can easily be maintained, and then combine them whenever we need their data united

Jump to first page

Files vs. Relations: Records in files can have “repeating

groups”; I.e., several fields which are repeated, and in some records, some groups are omitted

In Relations, there are no repeating groups; every tuple is the same size, and each tuple carries enough data to recombine them, so that there are no variable-sized records, or blank records

Jump to first page

Files vs. Relations, continued Files can be “chained” in much the same way as

Relations can be Joined However, with files, you can only do “Equijoin”

operations, or put together records from the two files which have matching fields (as we did with DOGS and FOLK)

In Relations, however, you can Select a Product to create a Join using any criterion (condition)

For example, you could Join a Payroll file with a TaxTable to link the appropriate Tax percentage for a Salary between two values in the TaxTable

Jump to first page

Files vs. Relations, continued

This often means that you can keep “Dictionary” data (which is relatively static and unchanging) separate from “Master” data (which is very volatile, constantly changing)

With Transaction records, you can keep the “Header and Footer” data in one Table, and all the Detail data in another Table, which can then be indexed several ways, and re-united with its Header and Footer data any time it’s needed

Jump to first page

File Media and Storage

PCs: floppy disk, CD-ROM, cartridges, hard drives.

Large systems: Archival storage RAID 1 and 5 Magnetic tape Sizing: a few megabytes…who

cares? Giga/Tera/Petabytes…you care.

Jump to first page

DB transaction features

Important in client/server architectureI.e. e-commerce on the web

Journaling: DB changes recorded Rollback: use the Journal to “undo” a change Commitment control (2 phase commit) Multiple record locking.

Jump to first page

Next Steps: Continue to read more about DBMS, especially

Relational ones in Chapter 8 Also, read about Normalization and Normal

Forms With the new efficient Relational DBMS

approach, an entirely different approach to System Design is feasible: start with Entity Relationships Chart build a Relational DBMS (3NF) Experiment to create necessary reports with

simple DBMS queries, and input dialogues using available DBMS tools !