database technology introduction,...

77
Database Technology, EDA636 1 Database Technology Introduction, SQL Slides courtesy of Per Holm and EDA216 2 Database Technology, EDA636 What Is a Database? A database is a collection of data: Persistent Structured, and the database contains a description of the structure (metadata) A database management system (DBMS): Manages the data (very large amounts of them, usually) Allows users to specify the logical structure of the database (the database “schema”) and to enter, alter, and delete data Allows users to query the data via a query language (SQL most common) Supports concurrent access to the data Handles backup, crash recovery, … 3 Database Technology, EDA636 Why Not Simple Files Instead? Consider: The file structure is determined by the application program: Difficult to change the logical structure of the data A new kind of query => a new program must be written Difficult to implement efficient queries and updates Difficult to handle concurrent access, crash recovery, … 4 Database Technology, EDA636 Developing a Database Application Several steps: Model the data. Entity-Relationship (E/R) modeling (also called conceptual modeling, semantic modeling, …). Very similar to object-oriented modeling (static model) Translate the E/R model into a relational model Implement the relational model in a DBMS Write the necessary programs to implement the desired queries and updates

Upload: others

Post on 28-Oct-2019

18 views

Category:

Documents


0 download

TRANSCRIPT

Database Technology, EDA636 1

Database TechnologyIntroduction, SQL

Slides courtesy of Per Holm and EDA216

2 Database Technology, EDA636

What Is a Database?A database is a collection of data:

• Persistent• Structured, and the database contains a description of

the structure (metadata)A database management system (DBMS):

• Manages the data (very large amounts of them, usually)• Allows users to specify the logical structure of the

database (the database “schema”) and to enter, alter,and delete data

• Allows users to query the data via a query language(SQL most common)

• Supports concurrent access to the data• Handles backup, crash recovery, …

3 Database Technology, EDA636

Why Not Simple Files Instead?

Consider:• The file structure is determined by the application

program:• Difficult to change the logical structure of the data• A new kind of query => a new program must be

written• Difficult to implement efficient queries and updates• Difficult to handle concurrent access, crash recovery, …

4 Database Technology, EDA636

Developing a Database Application

Several steps:• Model the data. Entity-Relationship (E/R) modeling (also

called conceptual modeling, semantic modeling, …).Very similar to object-oriented modeling (static model)

• Translate the E/R model into a relational model• Implement the relational model in a DBMS• Write the necessary programs to implement the desired

queries and updates

5 Database Technology, EDA636

The Relational Data Model

Suggested in 1970 by E.F. Codd. Today used in almost allDBMS’s.

• A relational database consists of relations (tables)• A relation has attributes (columns in the table)• A relation is a set of tuples (rows in the table)

Earlier, hierarchical or network databases were used.

6 Database Technology, EDA636

The Relational Model Illustrated

title year length filmTypeStar Wars 1977 124 colorMighty Ducks 1991 104 colorWayne’s World 1992 95 color 

Relation

Tuple

Attribute

7 Database Technology, EDA636

Sample Relation

An instance of a relation Accounts that records bank accountswith their balance, type, and owner:+-----------+---------+----------+-------------+| accountNo | balance | type | ownerPNo |+-----------+---------+----------+-------------+| 12345 | 1000 | savings | 730401-2334 || 67890 | 2846.92 | checking | 790614-2921 |+-----------+---------+----------+-------------+

Four attributes: accountNo, balance, type, and ownerPNo. Twotuples: the first says that account number 12345 has abalance of one thousand dollars, and is a savings accountowned by a person with the person number 730401-2334.Relation schemas are usually described in a more conciseform. The relation name is given first, followed by theattribute names inside parentheses:

Accounts(accountNo, balance, type, ownerPNo)

8 Database Technology, EDA636

SQL

• SQL stands for “Structured Query Language”,pronounced “ess-queue-ell”

• Current standard SQL:2003, revision of SQL3 (‘99)• Declarative — just tell SQL what should be done; it is up

to the SQL compiler to figure out details about how todo it

• Two major parts: a DDL (data definition language), aDML (data manipulation language)

• reserved words, case-insensitive

9 Database Technology, EDA636

SQL Usage

SQL can be used in different ways:• Write SQL statements directly at a terminal or in a file:

select accountNo from Accounts where balance > 1000;This is for developers, not for end users.

• Write an application program (here in Java) and pass the SQLstatements to the DBMS:

ResultSet rs = stmt.executeQuery(” select accountNo from ” +         ”Accounts where balance > 1000”);

/* present the results to the user */This is the most common usage, and the way that most peoplecome into contact with databases.

In either case, the application developer has to know SQL.

10 Database Technology, EDA636

Different DBMS’s

Huge, commercial:• Oracle, MS SQL Server, DB2, Sybase, Informix, …

Smaller, free:• MySQL, PostgreSQL, mSQL, SQLite, …

Earlier, we used MS SQL Server in this course. Now, we use MySQL.Advantages with MySQL: free, easy to install (you can do it on your owncomputer), fast, supports most of the SQL standard (from version 5),enormous user community, …Disadvantages: we’ll see …Differences between MySQL and standard SQL are noted on some slides.

11 Database Technology, EDA636

Simple SQL Queries

Find the balance of account 67890:select balancefrom Accountswhere accountNo = 67890;

Find all savings accounts with negative balances:select accountNofrom Accountswhere type = ’savings’ and balance < 0;

Typical structure:select SOMETHINGfrom SOMEWHEREwhere CERTAIN CONDITIONS HOLD;

12 Database Technology, EDA636

SQL ExamplesThe following examples are similar, but not always identical, to theexamples in the english text-book.A movie database: a movie has a title, a production year, a length,is in color or black-and-white, is produced by a studio, and theproducer has a certificate number. Schema:

Movies(title, year, length, filmType, studioName, prodNbr)Example relation instance:

+---------------+------+--------+----------+------------+---------+| title | year | length | filmType | studioName | prodNbr |+---------------+------+--------+----------+------------+---------+| Pretty Woman | 1990 | 119 | color | Disney | 99912 || Star Wars | 1977 | 124 | color | Fox | 12345 || Wayne's World | 1992 | 95 | color | Paramount | 34129 |+---------------+------+--------+----------+------------+---------+

13 Database Technology, EDA636

Another SQL Example

Find all movies produced by Disney in 1990:select *from Movieswhere studioName = ’Disney’ and year = 1990;

Sample result:+--------------+------+--------+----------+------------+---------+| title | year | length | filmType | studioName | prodNbr |+--------------+------+--------+----------+------------+---------+| Pretty Woman | 1990 | 119 | color | Disney | 99912 |+--------------+------+--------+----------+------------+---------+MySQL: string comparisons are not case sensitive.

14 Database Technology, EDA636

Projection (select clause)Select only certain attributes:

select title, lengthfrom Movieswhere studioName = ’Disney’ and year = 1990;+--------------+--------+| title | length |+--------------+--------+| Pretty Woman | 119 |+--------------+--------+

The same with new names for the attributes (as is optional):select title as name, length as duration

from Movieswhere studioName = ’Disney’ and year = 1990;+--------------+----------+| name | duration |+--------------+----------+| Pretty Woman | 119 |+--------------+----------+

15 Database Technology, EDA636

Selection (where clause)

Expressions in the where clause (note single quotesaround string literals):

select titlefrom Movieswhere year > 1970 and filmType <> ’color’;

Wildcards:select titlefrom Movieswhere title like ’Star%’;

% matches 0-many characters, _ matches exactly onecharacter.

16 Database Technology, EDA636

Ordering the Output

Sort the output, first by length, then by title:select *from Moviesorder by length, title;

May (naturally) be combined with where:select *from Movieswhere studioName = ’Disney’ and year = 1990order by length, title;

17 Database Technology, EDA636

More Than One Relation

A relation describing movie studio executives who have aname, address, certificate number, and a net worth:

MovieExecs(name, address, certNbr, netWorth)

Find the producer of Star Wars:select prodNbrfrom Movieswhere title = ’Star Wars’;

(Output: 12345, remember this)

select namefrom MovieExecswhere certNbr = 12345;

Can be done in a better way, see next slide.

18 Database Technology, EDA636

Several Relations in One Query

The better way to find the producer of Star Wars:

select namefrom Movies, MovieExecswhere title = ’Star Wars’ and

prodNbr = certNbr;

The relations mentioned in the from clause are joinedaccording to the condition in the where clause.In the select and where clauses, all attributes in all therelations mentioned in the from clause are available.If attribute names are not unique, they must be prefixed withthe relation name and a dot.

19 Database Technology, EDA636

Join Example II

The banking database (see an early slide, “Sample Relation”)also keeps track of bank customers.

• A customer is described by his person number (Swedishcivic registration number), his name, and his address

• Each customer can own many accounts• Each account is owned by exactly one customer

This is described by two relations:Customers(persNo, name, address)Accounts(accountNo, balance, type, ownerPNo)

The attribute ownerPNo in Accounts refers to the attribute persNoin one of the Customers tuples.

20 Database Technology, EDA636

Join Example II, cont’d

Find the name of the customer who owns the account withnumber 12345:

select namefrom Accounts, Customerswhere accountNo = 12345 and

  ownerPNo = persNo;

accountNo balance type ownerPNo67890 2846.92 checking 790614-292112345 1000.00 savings 730401-2334… … …  … 

persNo name address730401-2334 Bo Ek Malmö801206-4321 Eva Alm Lund… … … 

21 Database Technology, EDA636

Join Example III

Two relations (describe movies and that stars appear in movies):Movies(title, year, length, filmType, studioName, prodNbr)StarsIn(title, year, starName)

SQL join:select Movies.title, Movies.year,

     length, filmType, studioName, starNamefrom Movies, StarsInwhere Movies.title = StarsIn.title and

Movies.year = StarsIn.year;The resulting table is shown on the next slide.

22 Database Technology, EDA636

Join Example III, cont’d

title year length filmType studioName starNameStar Wars 1977 124 color Fox Carrie FisherStar Wars 1977 124 color Fox Mark HamillStar Wars 1977 124 color Fox Harrison FordMighty Ducks 1991 104 color Disney Emilio EstevezWayne’s World 1992 95 color Paramount Dana CarveyWayne’s World 1992 95 color  Paramount Mike Meyers

title year length filmType studioNameStar Wars 1977 124 color FoxMighty Ducks 1991 104 color DisneyWayne’s World 1992 95 color Paramount

title year starNameStar Wars 1977 Carrie FisherStar Wars 1977 Mark HamillStar Wars 1977 Harrison FordMighty Ducks 1991 Emilio EstevezWayne’s World 1992 Dana CarveyWayne’s World 1992 Mike Meyers

23 Database Technology, EDA636

Set Operations, Union

The common set operations, union, intersection, and difference,are available in SQL.The relation operands must be compatible in the sense that theyhave the same attributes (same data types) in the same order.The relations Movies and StarsIn again:

Movies(title, year, length, filmType, studioName, prodNbr)StarsIn(title, year, starName)

Find all movies mentioned in either relation:select title, year from Movie

unionselect title, year from StarsIn;

24 Database Technology, EDA636

Intersection, DifferenceStars(name, address, gender, birthDate)MovieExecs(name, address, certNbr, netWorth)

Female movie stars who are also movie executives and whoare rich:

(select name, address from Stars where gender = ’F’)

intersect(select name, address from MovieExecs where netWorth > 10000000);

Movie stars who are not also movie executives:(select name, address from Stars)

except(select name, address from MovieExecs);

MySQL: doesn’t have intersect or except.

25 Database Technology, EDA636

Subqueries

Find the producer of Star Wars (same as before):select namefrom Movies, MovieExecswhere title = ’Star Wars’ and prodNbr = certNbr;

Another way to do this, using a subquery:select namefrom MovieExecswhere certNbr =

(select prodNbr from Movies where title = ’Star Wars’);

The subquery in parentheses produces a single tuple like(12345). It would be an error if it produced more than onetuple (then you cannot test with =).

26 Database Technology, EDA636

Conditions Involving Relations

Other tests on relations:• exists R• s in R• s > all R (or <, <=, =, < >, >=)• s > any R (or <, …)

Prefix with not to get the negated condition.

Examples follow.

27 Database Technology, EDA636

Conditions Involving Tuples

Movies(title, year, length, filmType, studioName, prodNbr)StarsIn(title, year, starName)MovieExecs(name, address, certNbr, netWorth)

Find all producers of movies where Harrison Ford stars:select namefrom MovieExecswhere certNbr in

(select prodNbr from Movies where (title, year) in

(select title, year from StarsIn where starName = ’Harrison Ford’)

);

28 Database Technology, EDA636

Eliminating Subqueries

Many queries that use subqueries can be rewritten to usejoins.A simpler way to find producers of Harrison Ford movies:

select namefrom MovieExecs, Movies, StarsInwhere certNbr = prodNbr and

Movies.title = StarsIn.title andMovies.year = Starsin.year andstarName = ’Harrison Ford’;

That is, join three relations.

29 Database Technology, EDA636

Testing For Membership

Find the producers who haven’t produced any movies.This cannot be written as a join:

select namefrom MovieExecs, Movieswhere certNbr <> prodNbr; -- WRONG

Must do it this way:select namefrom MovieExecswhere certNbr not in (select prodNbr from Movies);

30 Database Technology, EDA636

Correlated SubqueriesFind titles that have been used for two or more movies(produced in different years). Using a correlated subquery:

select title, yearfrom Movies Old -- ”rename” the relationwhere year <> any

(select year from Movies where title = Old.title);

With a join:select Old.title, Old.yearfrom Movies Old, Movies Newwhere Old.title = New.title and

Old.year <> New.year;This will list the same title more than once. Use select distinctto avoid this.

31 Database Technology, EDA636

Cross Join (Cartesian Product)Two tables, R and S:+------+------+ +------+------+------+| a | b | | b | c | d |+------+------+ +------+------+------+| 1 | 2 | | 2 | 5 | 6 || 3 | 4 | | 4 | 7 | 8 |+------+------+ | 9 | 10 | 11 | +------+------+------+select * from R, S;     or    select * from R cross join S;+------+------+------+------+------+| a | b | b | c | d |+------+------+------+------+------+| 1 | 2 | 2 | 5 | 6 || 3 | 4 | 2 | 5 | 6 || 1 | 2 | 4 | 7 | 8 || 3 | 4 | 4 | 7 | 8 || 1 | 2 | 9 | 10 | 11 || 3 | 4 | 9 | 10 | 11 |+------+------+------+------+------+

32 Database Technology, EDA636

Natural (“normal”) Joinselect * from R, S where R.b = S.b;+------+------+------+------+------+| a | b | b | c | d |+------+------+------+------+------+| 1 | 2 | 2 | 5 | 6 || 3 | 4 | 4 | 7 | 8 |+------+------+------+------+------+This is called “equijoin” (equality join). A natural join, whereattributes with the same names are joined, is almost thesame:

select * from R natural join S;+------+------+------+------+| b | a | c | d |+------+------+------+------+| 2 | 1 | 5 | 6 || 4 | 3 | 7 | 8 |+------+------+------+------+

33 Database Technology, EDA636

Variations on Join SyntaxJoining two tables with a condition in the where clause iscommon. The following syntaxes are equivalent:

select *from R, Swhere <some-condition>;

select *from R inner join S on <some-condition>;

The two forms can be combined, often like this:

select *from R inner join S on R.b = S.bwhere a = 1;

34 Database Technology, EDA636

Outer Joins

Sometimes, you wish to join but also include tuples fromone table which do not match with any tuple in the othertable. A left outer join includes tuples from the “left” table(mentioned first), a right outer join from the “right” table.select *from R right outer join S on R.b = S.b;+------+------+------+------+------+| a | b | b | c | d |+------+------+------+------+------+| 1 | 2 | 2 | 5 | 6 || 3 | 4 | 4 | 7 | 8 || NULL | NULL | 9 | 10 | 11 |+------+------+------+------+------+

35 Database Technology, EDA636

Aggregation OperatorsFive operators that can be applied to a column name:

SUM, AVG, MIN, MAX, COUNTExamples:

select avg(netWorth)from MovieExecs;select count(*) -- counts nullsfrom StarsIn;select count(starName) -- does not count nullsfrom StarsIn;select count(distinct starName)from StarsIn;

The aggregation operators may not be used in whereclauses — they operate on a whole relation aftertuples have been selected with where.

36 Database Technology, EDA636

Grouping

It is possible to first group tuples from a query with group byand then apply an operator to the tuples of each group.Examples:

Find the sum of the lengths of all movies for each studio:select studioName, sum(length)from Moviesgroup by studioName;

Find the total length of all movies produced by each producer:select name, sum(length)from MovieExecs, Movieswhere prodNbr = certNbrgroup by name;

37 Database Technology, EDA636

Grouping IllustratedOperas(composer, opera).

select composer, count(*)from Operasgroup by composer;

composer operaVerdi NabuccoVerdi La TraviataBellini NormaPuccini BohemeVerdi RigolettoPuccini Madama ButterflyRossini Guglielmo TellPuccini Il TritticoVerdi OtelloBellini Il Pirata

composer operaBellini NormaBellini Il PirataPuccini BohemePuccini Madama ButterflyPuccini Il TritticoRossini Guglielmo TellVerdi NabuccoVerdi La TraviataVerdi RigolettoVerdi Otello

composer count(*)Bellini 2Puccini 3Rossini 1Verdi 4

38 Database Technology, EDA636

Having Clauses

You can control which groups that should be present in theoutput by introducing a condition about the group.Find the composers, except Verdi, that have written morethan two operas:

select composer, count(*)from Operaswhere composer <> ’Verdi’group by composerhaving count(*) > 2:

Note the order of selection:1) first where (determine which tuples to include)2) then group by3) last having (determine which groups to include)

39 Database Technology, EDA636

Database Modifications – Insertion

Insert a new tuple into the relation StarsIn(title, year, starName):insert into StarsIn(title, year, starName)

values (’The Maltese Falcon’, 1942, ’Sidney Greenstreet’);Since we here give new values for all the attributes we mayinstead write:

insert into StarsInvalues (’The Maltese Falcon’, 1942, ’Sidney Greenstreet’);

When this form is used, the attribute values must be in thesame order as in the definition of the relation schema. This issensitive to changes in the relation schema, so it is better touse the first form.

40 Database Technology, EDA636

Deletion

Delete Sidney Greenstreet as a star in The Maltese Falcon:delete from StarsInwhere title = ’The Maltese Falcon’ and

year = 1942 andstarName = ’Sidney Greenstreet’;

Note that the tuple must be described precisely. This woulddelete all occurrences of Sidney Greenstreet as a star:

delete from StarsInwhere starName = ’Sidney Greenstreet’;

And this would delete all tuples from the relation:delete from StarsIn;

41 Database Technology, EDA636

Updates

Update the relation MovieExecs(name, address, certNbr, netWorth).Double the net worth of all executives:

update MovieExecsset netWorth = 2 * netWorth;

Attach the title ’Pres. ’ in front of the name of every movieexecutive who is the president of a studio (described by therelation Studios(name, address, presCNbr)):

update MovieExecsset name = ’Pres. ’ || namewhere certNbr in (select presCNbr from Studios);

|| is string concatenation.MySQL: use concat(string1, string2) to concatenate strings.

42 Database Technology, EDA636

Defining a Relation Schema

A table is created in SQL with the create table statement. Youlist all attributes with their data types.Example:

create table Stars (name char(20),address varchar(256),gender char(1) default ’?’,birthdate date

);See next slide for possible data types.

43 Database Technology, EDA636

Data Types

char(n) A character string of fixed length, n characters,padded with blanks.

varchar(n) A character string of varying length, at most ncharacters.

boolean True or false or unknown (!). In MySQL, booleanis a synonym for tinyint and true is 1, false is 0,unknown is null.

int, integer Integers.float, real Floating-point numbers.decimal(n,d) Real numbers, n positions, d decimals.date, time Date and time.blob “Binary Large Objects”

44 Database Technology, EDA636

Modifying Relation SchemasDelete a table:

drop table Stars;

Add a column:alter table Stars add phone char(16);

Note: all tuples in the relation will have null as value for the newattribute. Null values should be avoided; can do this instead:

alter table Stars add phone char(16) default ’unlisted’;

Delete a column:alter table Stars drop birthdate;

Another note: adding and deleting columns is common —databases are long-lived and changed many times during theirlifetime.

45 Database Technology, EDA636

Indexes

An index on an attribute of a relation is a data structure thatmakes it efficient to find those tuples that have a fixed valuefor the attribute.Example. It would speed up queries like the following if wecould find the movies made in a specific year quickly:

select *from Movieswhere studioName = ’Disney’ and year = 1990;

An index on the year attribute is created with:create index YearIndex on Movies(year);

and deleted with:drop index YearIndex on Movies;

46 Database Technology, EDA636

Index Selection

Naturally:• An index on an attribute speeds up queries where a

value for the attribute is specified• An index makes insertions, deletions, and updates more

time consumingIndex selection is a difficult task!Often, the DBMS automatically creates an index on the keyattributes of a relation. More about keys later.

47 Database Technology, EDA636

View Definitions

Tables created with create table actually exist in the database.Another class of relations, views, do not exist physically.They are created with create view as the answer to a queryand are materialized when they are accessed. Views can bequeried as other relations, and sometimes also modified.Example:

create view ParamountMovies asselect title, yearfrom Moviewhere studioName = ’Paramount’;

select titlefrom ParamountMovieswhere year = 1976;

There are rules for modification of relations via views.

48 Database Technology, EDA636

View Advantages

Views are good for several things:• to restrict what data a user sees. You can create a view

and give a user access rights to this view only, not tothe underlying tables

• to simplify queries where another query is used morethan once

49 Database Technology, EDA636

Constraints

The data in a database should be correct …Data can be checked by the application program when it isentered, but some of these checks can be automaticallyperformed by the DBMS.Integrity constraints in SQL:

• Key constraints• Foreign-key constraints• Assertions (not in MySQL)• Triggers

Key constraints and foreign-key constraints are coveredhere, triggers later.

50 Database Technology, EDA636

Primary Key ConstraintsA key on a relation is an attribute, or a set of attributes, that uniquelyidentifies each tuple in a relation.Example: Persons(persNo, name, address). persNo is a key, name isnot a key, address is not a key, name + address is not a key, …SQL (two equivalent variants):

create table Persons (persNo char(11),name varchar(40),address varchar(60),primary key (persNo)

);create table Persons (

persNo char(11) primary key,name varchar(40),address varchar(60)

);

51 Database Technology, EDA636

Invented Keys

For many relations, the selection of attributes that are to be used asa primary key is straightforward. In the Persons example, the personnumber is an obvious key (actually, person numbers were inventedfor this purpose.)But consider the Movies relation that we have used:

Movies(title, year, filmType, studioName, prodNbr)We will show later that {title, year} is a key to the Movies relation.But it may be costly to use a string and an integer as a key, and alsoimpractical if you later find that there actually exist two movies withthe same name that are made in the same year.In practice, you would probably invent a “movie identificationnumber” to use as a key for the relation. It would be advantageous ifthis number could be accepted as a standard by the whole movieindustry.

52 Database Technology, EDA636

Invented Keys in MySQLIn MySQL, you usually use an auto increment column when you needan invented key. Example:

create table T (x integer auto_increment,y varchar(10),primary key (x)

);insert into T(y) values (’first’);insert into T values (0, ’second’);insert into T values(null, ’third’);select * from T;+---+--------+| x | y |+---+--------+| 1 | first || 2 | second || 3 | third |+---+--------+

select last_insert_id( );+------------------+| last_insert_id() |+------------------+| 3 |+------------------+

53 Database Technology, EDA636

Primary Key Consequences

If you declare an attribute (or a set of attributes) S as aprimary key, you state that:

• Two tuples cannot agree on all of the attributes in S• Attributes in S cannot have null as a value

Any insertion or update that violates one of these conditionswill be rejected by the DBMS.

54 Database Technology, EDA636

Declaring Keys with Unique

Keys may be declared with unique instead of with primary key.This means almost the same, but:

• There may be any number of unique declarations, butonly one primary key

• unique permits nulls in the attribute values, and nulls arealways considered unique

55 Database Technology, EDA636

Foreign KeysA foreign key is an attribute in a relation that references a key(primary key or unique) in another relation.Example:

Studios(name, address, presCNbr)MovieExecs(name, address, certNbr, netWorth)

We expect that each studio president is also a movieexecutive (or null, if the studio doesn’t have a president at thepresent time):

name address certNbr netWorth… … … …… … … …… … … …… … … …

name address presCNbr… … …… … …… … … 

56 Database Technology, EDA636

Declaring Foreign KeysThe Studios relation on the previous slide can be written:

create table Studios (name char(30),address varchar(255),presCNbr int,primary key (name),foreign key (presCNbr) references MovieExecs(certNbr)

);or, alternatively, in standard SQL but NOT in MySQL:

create table Studios (name char(30) primary key,address varchar(255),presCNbr int references MovieExecs(certNbr)

);certNbr must be a key (primary or unique) in MovieExecs.MySQL: only the InnoDB storage engine supports foreign keys.

57 Database Technology, EDA636

Foreign Key Checks

We wish to state that Steven Spielberg, certNbr = 12345, is thepresident of Fox studios (he isn’t). The following is ok:

insert into Studios values(’Fox’, ’Hollywood’, 12345);This will fail, since there is no movie executive with certNbr =99999:

insert into Studios values(’Fox’, ’Hollywood’, 99999);The following fails — foreign keys are always checked.

update Studios set presCNbr = 99999 where name = ’Fox’;update MovieExecs set certNbr = 99999 where certNbr = 12345;delete from MovieExecs where certNbr = 12345;

More on next slide.

58 Database Technology, EDA636

More on Deletes and UpdatesThere are alternatives on what to do on deletes and updates.Suppose that Steven Spielberg retires — then we must setpresCNbr = null in the Fox studio before we delete Steven’s rowin the MovieExecs table.This is simpler:

foreign key (presCNbr) references MovieExecs(certNbr)on delete set null;

Or we can delete the studio if the president retires (not realistic): foreign key (presCNbr) references MovieExecs(certNbr)

on delete cascade;Similar with updates — what if Steven’s certificate number ischanged? Do it like this:

foreign key (presCNbr) references MovieExecs(certNbr)on delete set nullon update cascade;

59 Database Technology, EDA636

Constraints on Attributes

You can disallow tuples in which an attribute is null bydeclaring the attribute as not null:

presCNbr int not null,You can also make simple checks on attribute values:

presCNbr int check (presCNbr > 100000),The following constraint is correct, but it cannot replace theforeign-key declaration:

presCNbr int check(presCNbr in (select certNbr from MovieExecs)),

since the constraint isn’t checked when a tuple from theMovieExecs relation is deleted or updated.MySQL: check on attributes is not supported (not null is).

Database Technology, EDA636 60

Database Design,E/R Modeling

61 Database Technology, EDA636

E/R Modeling, Introduction

What information must the database hold? What are therelationships between the information components?These questions are answered during the analysis phase ofdatabase development, when an analysis model is developed.This is called Entity-Relationship (E/R) modeling. Entities are“information pieces”, “things” (compare with objects inobject-oriented modeling).After the analysis, the E/R model is translated into a relationalmodel with tables, attributes, keys, etc. (compare with thedesign phase in object-oriented modeling).Finally, the relational model is expressed in a data definitionlanguage and the queries in a data manipulation language(compare with the implementation phase in object-orientedmodeling).

62 Database Technology, EDA636

Database Design, Overview

Movie Star

Studio

Movies(title, year, …)Stars(name, …)…

create table Movies (   title varchar(40),   year int,   …);

Ideas

E/R design

Relational schema

Relational DBMS

63 Database Technology, EDA636

Elements of the E/R Model

Element types in the E/R model:Entity sets Similar to OO classes but contain only data,

no operations. An entity set is a collectionof entities (objects).NOTE: I use singular names for entity sets(as for classes).

Attributes The data in the entity sets. Attributes areatomic, i.e. “simple values” (ints, chars,strings, but not arrays, structs, …).

Relationships Connections between entity sets(associations in OO modeling).

64 Database Technology, EDA636

Entity-Relationship Diagrams

An E/R model is expressed in diagram form. There areseveral notations available, none of which is standardized.The book uses boxes for entity sets, ovals for attributes, anddiamonds for relationships. Multiplicity of relationships isspecified on top of the edges if not 1-to-1.

Movie Star

Studio

Stars-in

Owns

title year

filmTypelength

name address

address

name

attribute

relationship

entity set

1N

65 Database Technology, EDA636

The First Diagram Explained• Entity sets Movie, Star, and Studio.• Movies have a title, a production year, a length, and a

type, “color” or “B/W”. Stars have a name and anaddress. Studios have a name and an address.

• A movie has several stars, a star can star in severalmovies. A movie is owned by one studio, a studio canown several movies.

Movie Star

Studio

Stars-in

Owns

title year

filmTypelength

name address

address

name1N

66 Database Technology, EDA636

UML NotationWe will use UML (Unified Modeling Language) instead of theconventional E/R notation. Advantages:

• Standardized• Compact• Multiplicity is explicit

   MovietitleyearlengthfilmType

    Starnameaddress

  Studionameaddress

0..* 0..*

0..*

1

stars-in

owns

67 Database Technology, EDA636

UML Notation Details

   MovietitleyearlengthfilmType

    Starnameaddress

  Studionameaddress

0..* 0..*

0..*

1

stars-in

ownsentity set –cf. ”class”

attributes

relationshipwith name

multiplicity –zero to many

68 Database Technology, EDA636

Instances of an E/R Diagram

In object-oriented modeling, a class diagram can beinstantiated to show objects instead of classes. Similarly, anE/R diagram can be instantiated to show entities instead ofentity sets.Example (not UML notation, and most attribute values havebeen omitted):

BasicInstinct

TotalRecall

SharonStone

ArnoldSchwarzenegger

CarolcoPictures

MichaelDouglas

owns

owns

stars-in

stars-in

stars-in

stars-in

69 Database Technology, EDA636

MultiplicityA company has manyemployees; each employeemay work in severalcompanies.

Company Personemploys0..*0..*

A company has manyemployees; each employeeworks in one company. Company Personemploys

0..*1

A company has manyemployees; each employeeworks in one company or isunemployed.

Company Personemploys0..*0..1

70 Database Technology, EDA636

Multiway RelationshipsA star signs a contract with a studio for a movie:

Star Movie0..*0..*

Studio

0..1

contract

Often simpler to promote the relationship to an entity set:

Star Movie0..*0..*

Studio

1

Contract1 1

0..*

71 Database Technology, EDA636

Roles

In addition to naming a relationship, you can specify the rolesthat the entities play in the relationship.A role name is written at the end of the relationship line.

Roles are especially useful when an entity set has arelationship to itself:

Company Personemployer0..*0..*

employeeemploys

Person0..1husband

0..1wife married-to

72 Database Technology, EDA636

Attributes on RelationshipsReconsider the model with the company that has employees.Employees are paid salaries. Where does the salary attributebelong?

• Not in Company, since the company has manyemployees with different salaries.

• Not in Person, since a person may be employed by manycompanies and thus have many different salaries, or beunemployed and have no salary.

The attribute belongs to the relationship, employs. In UML:

Company Personemploys 0..*0..*

Employmentsalary

73 Database Technology, EDA636

SubclassesThere are special kinds of movies: cartoons and murdermysteries. Expressed with the “is-a” relationship(generalization/specialization):

   MovietitleyearlengthfilmType

Cartoon   Murder-Mysteryweapon

Star

voices

This is similar to, but not the same as, inheritance in object-oriented modeling. In E/R modeling, a movie may be both acartoon and a murder mystery.

74 Database Technology, EDA636

An Idea From OO-modeling

We have not considered how we go about finding the entitysets and relationships in a system, just said that they should“reflect reality”.In object-oriented modeling one common technique to findclasses is to pick out the nouns from the requirementsspecification. This will result in a long list of names, and thenthere are rules for determining which of the names that aregood classes. Many associations can also be found in therequirements specification.This technique works also in database modeling. Some of thenouns in the list will become entities, some will becomeattributes, some are irrelevant, etc.

75 Database Technology, EDA636

Design PrinciplesIf you are used to object-oriented modeling, you have alreadymet most of the principles of good design.Faithfulness:

• The entity sets and their attributes should reflect realityIn order to meet this requirement you have to study thecurrent system in detail.Example, where we have entity sets Course and Instructor,with a relationship teaches between them. What is themultiplicity of this relationship?

• Do instructors teach more than one course? Are thereinstructors that do not teach?

• Can courses be taught by more than one instructor?The answers to these and similar questions may varydepending on the policies and needs of the institution that isbeing modeled.

76 Database Technology, EDA636

Choosing Relationships

We have the relationship contract — do we still need therelationships stars-in (Star <–> Movie) and owns (Studio <–>Movie)?

• If a star can appear in a movie only if there’s a contract wedon’t need stars-in

• If all movies have at least one star under contract wedon’t need owns

Star Movie0..*0..*

Studio

0..1

contract

77 Database Technology, EDA636

Expressing RelationshipsThis is not a correct model:

The attribute studioName expresses the same thing as therelationship owns that we had earlier, but in the wrong way:

• one of the main purposes of the E/R model is to makerelationships clear and visible

• so relationships must not be hidden in attributesLater, when the E/R model is translated into relations, therelation Movies will (maybe) contain an attribute studioNameas a foreign key, but that is an implementation detail.

   MovietitleyearlengthfilmTypestudioName

  Studionameaddress

78 Database Technology, EDA636

The Right Kind of Element

Why is Studio an entity set? Couldn’t we put the name andaddress of the studio as attributes in Movie instead?The answer is no, since

• Studios are probably important real-world entities thatdeserve their own description

• To do so would lead to redundancy: we would have torepeat the studio address for each movie

The case would be different if we did not record the addressof studios, but even then it would probably be a good idea tokeep the entity set Studio during analysis, and maybe removeit later, during design.

   MovietitleyearlengthfilmType

  Studionameaddress

10..*owns

79 Database Technology, EDA636

Modeling Constraints

Constraints give additional information on a model. Differentkinds:

• Key constraints• Single-value constraints• Referential integrity constraints• Domain constraints• General constraints

Constraints are part of the database schema and may beenforced by a DBMS.

80 Database Technology, EDA636

Key Constraints

A key is an attribute, or a set of attributes, that uniquelyidentifies an entity.

• A key can consist of more than one attribute. Forinstance, (title,year) uniquely identifies a movie. (Notjust title by itself, since there may be several movieswith the same title, but hopefully made in differentyears.)

• There can be more than one possible key. Pick one anduse it as the primary key.

• In a generalization hierarchy, the key must be containedin the “root” entity set.

NOTE: in database modeling, keys are essential! Not so inobject-oriented modeling, where each object has a uniqueidentity.

81 Database Technology, EDA636

Keys in the E/R Model

To show keys in a UML diagram, you underline the attributesbelonging to a key for an entity set:

   MovietitleyearlengthfilmType

  Studionameaddress

10..*owns

82 Database Technology, EDA636

Referential IntegrityReferential integrity means that an entity surely exists at “theother end” of a relationship. Like this, where the multiplicity“1” tells us that every movie is owned by a studio:

   MovietitleyearlengthfilmType

  StudionameAddress

10..*owns

   MovietitleyearlengthfilmType

  Studionameaddress

0..10..*owns

Note that if a studio is deleted, all movies owned by thatstudio must also be deleted.The following is another case — there may be movies that arecurrently not owned by any studio.

83 Database Technology, EDA636

Weak Entity Sets

An entity set which does not contain enough information toform a key is called a weak entity set. Some attributes fromanother (related) entity set must be used to form a key.Example: a movie studio has several film crews, where crewsare numbered 1, 2, … Other studios may use the samenumbering, so to identify a crew we must first identify thestudio (using the name as a key), then we can use thenumber to identify the crew.

<< weak >>     Crewnumber

  Studionameaddress

10..*unit-of

84 Database Technology, EDA636

Weak Entity Sets, cont’d

We use the stereotype <<weak>> to designate a weak entityset. (This is not a standard stereotype, but UML allows you toinvent your own stereotypes.)The relationship to the entity set whose key is used to formthe key for the weak entity set is called a supportingrelationship. We don’t have any notation to express this.The example shows a common source of weak entity sets,where an entity is existent-dependent on another entity.

Database Technology, EDA636 85

The Relational DataModel

86 Database Technology, EDA636

The Relational Data Model

Suggested in 1970 by E.F. Codd. Today used in almost allDBMS’s.

• A relational database consists of relations (tables)• A relation has attributes (columns in the table)• A row in a table is called a tuple

The relational model is very well understood, and high level,very efficient, query languages (e.g., SQL) are supported.During the analysis phase, however, it is better to use amodel (E/R, for instance), that is richer and more expressive.After analysis, the E/R model is translated into relations.

87 Database Technology, EDA636

Before, and After, RelationsHierarchical databases (1965–1985)

• Data is structured in trees• Operations: tree traversal and node manipulation

Network databases (1965–1985)• Data is structured in graphs• Operations: graph traversal and node manipulation

Relational databases (1975–?)• Data is structured in tables• Operations: create new tables, modify tables, …

Object-oriented databases (1990–?)• Data is structured in objects, with references• Operations: methods in the objects

88 Database Technology, EDA636

Relational Model Basics

Relation schema: Movies(title, year, length, filmType)Relations are sets (order between tuples is immaterial)Order between attributes also is immaterial.

title year length filmTypeStar Wars 1977 124 colorMighty Ducks 1991 104 colorWayne’s World 1992 95 color 

Relation

Tuple

Attribute

89 Database Technology, EDA636

E/R Model –> Relational Model

In short:• Each entity set becomes a relation with the same

attributes as the entity set• Each relationship becomes a relation whose attributes

are the keys for the connected entity sets, plus theattributes of the relationship

But:• Special cases for relationships with multiplicity 1• Special treatment of weak entity sets• Careful treatment of generalization hierarchies (“is-a”)

And:• Relations must be normalized to avoid redundancy

90 Database Technology, EDA636

E/R Entity Sets –> Relations

Movies(title, year, length, filmType)Stars(name, address)Studios(name, address)

   MovietitleyearlengthfilmType

    Starnameaddress

  Studionameaddress

0..* 0..*

0..*

1

stars-in

owns

91 Database Technology, EDA636

E/R Relationships –> Relations   MovietitleyearlengthfilmType

    Starnameaddress

  Studionameaddress

0..* 0..*

0..*

1

stars-in

owns

Owns(title, year, studioName) [will be deleted later]StarsIn(title, year, starName)

Note the keys in the relations, they are related to themultiplicities in the E/R model.

92 Database Technology, EDA636

Foreign Keys

Foreign keys in relations that come from relationships:Owns(title, year, studioName)

In SQL, the corresponding table will look like this:create table Owns (

title varchar(20),year int,studioName varchar(20) not null,primary key (title, year),foreign key (title, year) references Movies(title, year),foreign key (studioName) references Studios(name),

);Note not null: this is because there is a “1” on the studioside of the relationship. Had the multiplicity been “0..1”we wouldn’t have specified not null.

93 Database Technology, EDA636

Example Relationstitle year length filmTypeStar Wars 1977 124 colorMighty Ducks 1991 104 colorWayne’s World 1992 95 color 

name addressCarrie Fisher 123 Maple St., HollywoodMark Hamill 456 Oak Rd., BrentwoodHarrison Ford 789 Palm Dr., Beverly Hills

title year starNameStar Wars 1977 Carrie FisherStar Wars 1977 Mark HamillStar Wars 1977 Harrison FordMighty Ducks 1991 Emilio EstevezWayne’s World 1992 Dana CarveyWayne’s World 1992 Mike Meyers

   MovietitleyearlengthfilmType

    Starnameaddress

0..*

0..*

stars-in

94 Database Technology, EDA636

Combining RelationsRelations that arise from relationships with multiplicity 1 (or0..1) can be deleted by modifying the relation on the “many”side.

   MovietitleyearlengthfilmType

   Studionameaddress

0..* 1owns

Earlier, we used the schema:Movies(title, year, length, filmType)Owns(title, year, studioName)Studios(name, address)

The studio is determined uniquely by the key for Movies, sowe can use the following, simpler, schema instead:

Movies(title, year, length, filmType, studioName)Studios(name, address)

95 Database Technology, EDA636

Nulls

The combining of relations on the previous slides works alsofor relationships with multiplicity “0..1” instead of “1”:

   MovietitleyearlengthfilmType

   Studionameaddress

0..* 0..1owns

Results in:Movies(title, year, length, filmType, studioName)Studios(name, address)

In this case, studioName will be null if a movie doesn’t havean owning studio.

96 Database Technology, EDA636

Several Alternatives

Persons(name), Dogs(name), Owns(pName, dName)Good if there are lots of persons who don’t own a dog,and lots of dogs without an owner, but introduces a newtable

Persons(name, dName), Dogs(name)Good if all (most) persons own a dog

Persons(name), Dogs(name, pName)Good if all (most) dogs have an owner

PersonsDogs(pName, dName)Good if all (most) persons own a dog and all (most) dogshave an owner

   Dogname

0..1 0..1owns   Personname

97 Database Technology, EDA636

When Not to Combine RelationsMany-many (0..*–0..*) relationships may not be combined.Example:

   MovietitleyearlengthfilmType

    Starnameaddress

0..* 0..*stars-in

title year length filmType starNameStar Wars 1977 124 color Carrie FisherStar Wars 1977 124 color Mark HamillStar Wars 1977 124 color Harrison FordMighty Ducks 1991 104 color Emilio EstevezMighty Ducks 1991 104 color Dana CarveyWayne’s World 1992 95 color  Mike Meyers

If we combined the relations we would have to repeat all theinformation about a movie for each star!

98 Database Technology, EDA636

Handling Weak Entity Sets

Recall: A weak entity set does not contain enoughinformation to form a key.

• The relation for a weak entity set must include alsothe key attributes of the other (supporting) entity setsthat help form the key

• A supporting relationship need not be converted to arelation

<< weak >>     Crewnumber

  Studionameaddress

10..*unit-of

This becomes:Studios(name, address)Crews(number, studioName)

99 Database Technology, EDA636

Relationships with AttributesCreate relations from the following model:

Company Personemploys 0..*0..*

Employmentsalary

gives: Company(… attributes …)Person(… attributes …)Employment(salary, Company-key, Person-key)

Note that we get exactly the same result if we promote theemploys relationship to a weak entity set:

Company Person0..*0..*   <<weak>>

Employmentsalary

11

100 Database Technology, EDA636

Subclass Structures –> Relations   MovietitleyearlengthfilmType

Cartoon   Murder-Mysteryweapon

Star

voices

Different options:• E/R: create a relation for each entity set• Object-oriented: create a relation for each possible

“concrete” entity• Null values: create one relation containing all attributes,

use null where an attribute is not applicable

101 Database Technology, EDA636

Subclass ExamplesE/R:

• Movies(title, year, length, filmType)• MurderMysteries(title, year, weapon)• Cartoons(title, year)

Object-Oriented:• Movies(title, year, length, filmType)• MurderMysteries(title, year, length, filmType, weapon)• Cartoons(title, year, length, filmType)

Nulls:• Movies(title, year, length, filmType, weapon)

(where weapon is null for regular movies and cartoons)• Often necessary to introduce a type-attribute

Database Technology, EDA636 102

SQL in Programs,JDBC

103 Database Technology, EDA636

SQL in ProgramsWe have started by using SQL interactively, i.e. by writingSQL statements in a client and watching the results. AllDBMS’s contain a facility for doing this, but it is not intendedfor end-users.Instead, the SQL statements are specified within programs.

DBMS

client (mysql)

Application

Web ServerWeb Browser

SQL

SQL

SQLHTTP

104 Database Technology, EDA636

Advantages and Problems

The most important advantage with SQL in programs:• You get access to a powerful programming language

with advanced program and data structures, graphics,etc.

But there are also problems: SQL uses relations as the onlydata structure, “normal” programming languages use otherstructures: classes, arrays, lists, … There must be a way toovercome this mismatch:

• How are values passed from the program into SQLcommands?

• How are results of SQL commands returned intoprogram variables?

• How do we deal with relation-valued data?

105 Database Technology, EDA636

Embedded SQLOne way to interface to a DBMS is to embed the SQLstatements in the program. Example (C language, Java’sSQLJ is similar):

void createStudio( ) {EXEC SQL BEGIN DECLARE SECTION;

char studioName[80], studioAddr[256];char SQLSTATE[6];

EXEC SQL END DECLARE SECTION;/* read studioName and studioAddr from terminal */EXEC SQL INSERT INTO Studios(name, address)

       VALUES (:studioName, :studioAddr);}

This code must be preprocessed to produce a normal Cprogram with special DBMS function calls. The C program isthen compiled and linked with a special DBMS library.

106 Database Technology, EDA636

Dynamic SQLAnother way to interface to a DBMS is to let the program assemblethe SQL statements at runtime, as strings, and use the strings asparameters to the library function calls (CLI, Call Level Interface).Then, the preprocessing step can be skipped.Example (Java, JDBC):

void createStudio( ) {String studioName, studioAddr;/* read studioName and studioAddr from terminal */Statement stmt = con.createStatement( );stmt.executeUpdate(”INSERT INTO Studios(name, address) ” +

         ”VALUES ('” + studioName + ”', '” +         studioAddr + ” ')”);

}(” is the Java string delimiter, ' is the SQL string delimiter.)

107 Database Technology, EDA636

JDBC

JDBC (just a name, sometimes said to mean Java DatabaseConnectivity) is a call level interface that allows Javaprograms to access SQL databases. JDBC is a part of theJava 2 Platform, Standard Edition (packages java.sql, basics,and javax.sql, advanced and new features).In addition to the Java classes in J2SE you need a vendor-specific driver class for your DBMS.ODBC is an earlier standard, not only for Java.

108 Database Technology, EDA636

Loading the Database Driver

To load a database driver (in this example, a MySQL driver),you use the following statement:

try {Class.forName(”com.mysql.jdbc.Driver”);

}catch (ClassNotFoundException e) {

// … if the class cannot be found}

You must only load the driver once in the program.

109 Database Technology, EDA636

Establishing a ConnectionWhen a driver has been loaded you connect to the database with astatement of the following form:

Connection con = DriverManager.getConnection(        url, username, password);

Connection is a class in package java.sqlurl is a vendor- and installation-specific stringusername is your database login namepassword is your database login password

Example (for our local installation):try {

Connection con = DriverManager.getConnection(”jdbc:mysql://srv016.hbg.lu.se/” + ”db01”,”db01”, ”abc123de”);

}catch (SQLException e) {

// … if the connection couldn’t be established}

110 Database Technology, EDA636

A Note on SQLExceptions

All JDBC calls can throw SQLExceptions (package java.sql). Inthe following, we will not show the necessary try-catchstructure that is necessary to handle these exceptions:

try {// … JDBC calls

}catch (SQLException e) {

// … do something to handle the error}

Do not leave the catch block empty. As a minimum, writesomething like this:

catch (SQLException e) {e.printStackTrace( );

}

111 Database Technology, EDA636

Creating JDBC Statement Objects

A JDBC Statement object (package java.sql) is used to sendSQL statements to the DBMS. An active connection is neededto create a Statement object:

Statement stmt = con.createStatement( );At this point stmt exists, but it does not have an SQLstatement to pass on to the DBMS. You must supply that, asa string, to the method that is used to execute the statement.After the statement object has been used you must close thestatement object:

stmt.close( );The closing is absolutely necessary, since there is a limit onthe number of active Statement objects.

112 Database Technology, EDA636

Creating Tables

Consider a database with the following schema (bars selldifferent beers at different prices):

Bars(name, address)Sells(barName, beerName, price)

It is possible to create these tables from JDBC, butdatabase tables are usually not created or deleted inapplication programs (instead, in mysql or a similardatabase client).

113 Database Technology, EDA636

Insert/Update Statements

Insert some data into the database, then raise the price of Urquellat Bishop’s Arms:

stmt.executeUpdate(”insert into Bars ” +”values ('Bishop''s Arms', 'Lund')”);

stmt.executeUpdate(”insert into Sells ” +”values ('Bishop''s Arms', 'Urquell', 45.00)”);

int n = stmt.executeUpdate(”update Sells ” +”set price = price + 2.00 where ” +”barName = 'Bishop''s Arms' and beerName = 'Urquell')”);

Comments on next slide.

114 Database Technology, EDA636

Insert/Update Comments

Notice:• No semicolon after the SQL statement string• Two consecutive ’-s inside an SQL string become one ’• executeUpdate returns the number of affected tuples when an

update statement is executed. It returns zero on create andinsert statements.

115 Database Technology, EDA636

Update ExampleA more general method to raise the price of a specific beer at aspecific bar. Essential to get all ”-s and '-s and +-s right!

boolean raiseBeerPrice(String bar, String beer, double amount) {try {

Statement stmt = con.createStatement( );int n = stmt.executeUpdate(”update Sells ” +

”set price = price + ” + Double.toString(amount) +” where ” +”barName = '” + bar + ”' and ” +”beerName = '” + beer + ”')”);

stmt.close( );if (n!=1)

return false;}catch (SQLException e) {

return false;}return true;

}116 Database Technology, EDA636

Select Statements

To issue a select query you execute the statement withexecuteQuery. This method returns an object of classResultSet that functions as an iterator for the returned bag oftuples. A tuple is fetched with the next( ) method, andattributes of a tuple are fetched with getXXX( ) methods.Example:

ResultSet rs = stmt.executeQuery(”select * from Sells”);while (rs.next( )) {

String bar = rs.getString(”barName”);String beer = rs.getString(”beerName”);double price = rs.getDouble(”price”);System.out.println(bar + ” sells ” + beer +

      ” for ” + price + ” kr.”);}

next( ) returns false if there are no more tuples.

117 Database Technology, EDA636

Result Sets

Notice: there is at most one ResultSet object associated witheach Statement object. So the following sequence ofstatements is wrong:

ResultSet rs = stmt.executeQuery(”select * from Sells”);…stmt.close( );String bar = rs.getString(”barName”);// the statement is closed, the result set no longer exists

This is also wrong:ResultSet rs1 = stmt.executeQuery(”select * from Sells”);ResultSet rs2 = stmt.executeQuery(”select * from Bars”);…String bar = rs1.getString(”barName”);// rs1 no longer exists, it has been replaced by rs2

118 Database Technology, EDA636

Reading Invented Keys

Consider the following table definition:

create table T (x integer auto_increment primary key,y varchar(10)

);How can I find out the value of the auto generated keys?

stmt.executeUpdate( ”insert into T(y) values (’first’)”,Statement.RETURN_GENERATED_KEYS);

…ResultSet rs = stmt.getGeneratedKeys();// work with rs as with a normal ResultSet

The result set will contain the invented keys.

119 Database Technology, EDA636

Different get Methods

There are overloaded versions of the get methods that accessa column not by name but by ordinal number. The followinggives the same result as the get’s on the previous slide:

String bar = rs.getString(1);String beer = rs.getString(2);double price = rs.getDouble(3);

This form is not recommended, since it presumes knowledgeof the column order.There are several different get methods: getByte, getShort,getInt, getLong, getFloat, getDouble, getBoolean, getString, …

120 Database Technology, EDA636

Prepared StatementsThe SQL query inside a Statement object is compiled everytime the statement is executed. If you issue many similarrequests, it may be more efficient to do the compilation once,and supply parameters when the statement is to be executed.This you can do with PreparedStatement objects:

PreparedStatement updatePrice = con.prepareStatement(”update Sells set price = price + ? where ” +”barName = ? and beerName = ?”)

The question marks are placeholders for the parameters.These are plugged in as follows:

updatePrice.setFloat(1, 2.00);updatePrice.setString(2, ”Bishop''s Arms”);updatePrice.setString(3, ”Urquell”);

Now the statement is ready for execution:int n = updatePrice.executeUpdate( );

121 Database Technology, EDA636

A Note on SecurityBe careful to check all input that you request from a user. Donot use unchecked input as part of an SQL string.Example:

String accntNbr = requestAccntNbrFromUser( );double amount = requestAmountFromUser( );stmt.executeUpdate(”update BankAccounts ” +

”set balance = balance + ” + Double.toString(amount) +” where ” +”accntNbr = '” + accntNbr + ”')”);

A user who knows SQL could enter the following string whenhe is requested to enter the account number (this is called”SQL injection”):

' or 'x' = 'xThe resulting condition always evaluates to true! The problemdoesn’t occur if you use PreparedStatements.

122 Database Technology, EDA636

Let the DBMS Do the Work

Do not write code like the following:ResultSet rs = stmt.executeQuery(”select * from Sells”);while (rs.next( ))

if (rs.getDouble(”price”) < 40.00 &&    rs.getString(”beerName”).equals(”Urquell”))    System.out.println(”Cheap Urquell at: ” +

  rs.getString(”barName”));

Instead, let the DBMS do the work it is designed to do:ResultSet rs = stmt.executeQuery(”select barName ” +

     ”from Sells where price < 40.00 and ” +     ”beerName = 'Urquell'”);

while (rs.next( ))System.out.println(”Cheap Urquell at: ” +

                  rs.getString(”barName”));

123 Database Technology, EDA636

TransactionsNormally, databases are used by several users simultaneously. TheDBMS must ensure that updates from different users do notinterfere with each other. It does this by locking appropriate itemsin the database, but it must receive hints from the user on what tolock and when to do it.The user gives these hints by grouping statements in transactions.The DBMS guarantees that a transaction is ACID:

Atomic either performed in its entirety or not performed at allConsistent transforms the database from one consistent state to

another consistent stateIsolated executes independently of other transactions, so the

partial effects of an incomplete transaction isinvisible

Durable the effects of a successfully completed transactionare permanently recorded in the database

124 Database Technology, EDA636

Transactions in MySQLOnly the InnoDB storage engine supports transactions.As default, a connection runs in auto-commit mode. This meansthat each SQL statement is treated as a transaction — allchanges performed by the statement are immediately committed(saved permanently in the database).With the command start transaction you turn off auto-commitmode. Then you issue a series of statements and can choose toeither commit or to rollback (undo) the effects of the statements.

delete from A;start transaction;insert into A values (1);insert into A values (2);commit;-- A contains 1 and 2start transaction;insert into A values (3);rollback;-- A still contains 1 and 2

125 Database Technology, EDA636

Transaction Isolation LevelsFor a single user, transactions are mostly a help in crashrecovery: if the DBMS crashes in the middle of a transaction,the transaction will be rolled back when the DBMS is restarted.The main use of transactions is to isolate actions from differentuser from each other. By setting the transaction isolation levelclients can control what kind of changes made by othertransactions that they want to see:

READ UNCOMMITTED Can see modifications even beforethey are committed

READ COMMITTED Can only see committed modificationsREPEATABLE READ If a transaction issues the same query

twice, the results are identicalSERIALIZABLE Rows examined by one transaction

cannot be modified by othertransactions

126 Database Technology, EDA636

Transactions in JDBC

Transaction control in JDBC is performed by the Connectionobject. When a connection is created, by default it is in auto-commit mode.Turn off auto-commit mode (start a transaction) with:

con.setAutoCommit(false);// … a series of statements treated as a transaction

The series of statements can then be committed with:con.commit( );

or rolled back with:con.rollback( );

After a transaction, auto-commit mode must be turned on again.

127 Database Technology, EDA636

Communication Java–Database

Guidelines:• The database communication should be in one class. Do not

spread SQL statements all over your program!• Write methods in the class that return results of SQL queries,

or perform updates• select address from Bars where barName = ?;

=> public String getAddress(String barName) { … }• select barName from Bars;

=> public ArrayList<String> getBarNames( ) { … }• select * from Bars where barName = ?;

=> public Bar getBar(String barName) { … }and write a class Bar with the same attributes as the relation.

Database Technology, EDA636 128

Web ServerTechnologies,Java Servlets

129 Database Technology, EDA636

Web Servers

A web server services requests from web clients (“browsers”).Servers must be able to handle several clients simultaneously.

Client Server

HTTP GET (or other) request

HTML code

Most HTML pages are static. Then, the server’s task is easy:find the requested file, return it unchanged.

130 Database Technology, EDA636

Dynamic Web Pages

There is a need for web pages with dynamic content:• Answers to database queries• Animated web pages• User dialogs• Checking user input• Etc.

Dynamic content may be handled by the client (Java applets,JavaScript, VBScript, Flash, Shockwave, …) or by the server(next slide).

131 Database Technology, EDA636

Server-Side Technologies

Many systems of plug-in modules or scripting languages:• CGI — Common Gateway Interface• ASP — Active Server Pages• PHP — PHP Hypertext Preprocessor (earlier Personal

Home Page)• Java Servlets• Java Server Pages

When the server handles dynamic requests from a client,parameters are transmitted to the server (as part of the URLor as part of a POST message).

132 Database Technology, EDA636

Supplying Parameters, HTML FormsThis is a static HTML page containing a form:

<html><head><title>CGI-Example</title></head><body>

<form method = ”get” action = ”/cgi-bin/storeaddress.pl”>Your name: <input name = ”name” type = ”text”><br>Your email: <input name = ”email” type = ”text”><br><input type = ”submit”>

</form></body></html>

133 Database Technology, EDA636

Parameters in HTTP Requests

GET /cgi-bin/storeaddress.pl?name=Per+Holm&email=Per.Holm%40cs.lth.se&Submit=Submit HTTP 1.0

The form on the previous slide generates the following HTTPrequest:

If the request method POST had been used, the following requestwould have been generated:

POST /cgi-bin/storeaddress.pl HTTP 1.0Content-type: application/x-www-form-urlencodedContent-length: 54

name=Per+Holm&email=Per.Holm%40cs.lth.se&Submit=Submit

134 Database Technology, EDA636

GET or POST

You can transfer parameters via GET or POST:• GET is used when you write a URL in the address field of a

browser• GET transfers the parameters as part of the URL• POST transfers the parameters as part of the HTTP request

Usually, you use POST to transfer parameters from HTML forms.Advantageous if you have many long parameters.

135 Database Technology, EDA636

CGI – Common Gateway Interface

1. The web server receives a request for a web page with aspecial URL (/cgi-bin/program-name)

2. The “CGI script” is started in a new process. The scriptmay be written in any programming language

3. The script reads the parameters via stdin4. The script writes output (HTML-code) to stdout5. The script terminatesMany problems with this:• Expensive to start a new process for each request• Difficult to save state between program executions• Difficult to synchronize between processes

136 Database Technology, EDA636

Reply to HTTP RequestA reply to a HTTP request contains:

1. The MIME type, typically “Content-type: text/html”2. A blank line3. HTML code

Example:Content-type: text/html

<html><head><title>Registration completed</title></head><body><h1>Registration completed</h1>Per Holm ([email protected]) has been added tothe user database.</body>

</html>

137 Database Technology, EDA636

Java Servlets

A Java servlet is a “small server” (applet = “small application”).Similar to CGI, but:

• Written in Java• System independent• The servlets are run (in separate threads) by a JVM inside

the web server• No overhead for process creation/termination• Easy to save state• Easy to synchronize between servlets

138 Database Technology, EDA636

The Servlet Life Cycle

The server loadsthe servlet

One or more servletobjects are created

The server calls the init( )method in each object

A HTTP request isreceived by the server

The server picks a servletobject and calls itsservice( ) method

The service( ) methodhandles the request andgenerates an HTML page

The servlet waits forthe next request

The server stops theservlet and calls itsdestroy( ) method

139 Database Technology, EDA636

Servlet Classes

Packages:javax.servlet.*javax.servlet.http.*

Classes:HttpServlet — superclass for web servletsHttpServletRequest — the request from the clientHttpServletResponse — the response from the servletServletConfig — server informationHttpSession — session information

140 Database Technology, EDA636

init( ), destroy( ), service( )

public class MyServlet extends HttpServlet {/** initialize the servlet */public void init( ) {

// initialize, e.g. open database connection}/** terminate the servlet */public void destroy( ) {

// terminate, e.g. close database connection}/** service a request */protected void service(HttpServletRequest request,

      HttpServletResponse response)throws ServletException, IOException {// service a request, see next slide

}}

141 Database Technology, EDA636

More on service( )

The servlet may receive different kinds of requests: GET,POST, … The default implementation of service( ) distributesthese different requests to methods doGet( ), doPost( ), …So, usually, you don’t implement service( ). Instead, youimplement doGet( ) and/or doPost( ):

/** service a GET request */protected void doGet(HttpServletRequest request,

    HttpServletResponse response)throws ServletException, IOException {// service a GET request

}

142 Database Technology, EDA636

HttpServletRequest

Get information about the request, typically parameters:public String getParameter(String name);

But also:public String getMethod( );public String getRemoteAddr( );public String getRemoteHost( );public HttpSession getSession(boolean create);

And several more …

143 Database Technology, EDA636

Responding to a HTTP Request

Set the MIME type of the response:response.setContentType(”text/html”);

Get a print writer for the HTML code:PrintWriter out = response.getWriter( );

Write HTML code:out.println(”<html>”);…;out.println(”</html>”);

Close the print writer:out.close( );

144 Database Technology, EDA636

Servlet Example

145 Database Technology, EDA636

HTML for First Page

<html><head><title>Form Data Entry Page</title></head><body>

<h1 align = "center">Fill in some data</h1>

<p>The demo servlet works out square roots, so feed it a number.<p>

<form method = "get" action = "/demo/sqrtservlet"><input type = "text" name = "number"><input type = "submit" value = "Compute root">

</form></body></html>

146 Database Technology, EDA636

ServletCode,part 1

import java.io.*;import javax.servlet.*;import javax.servlet.http.*;

public class SqrtServlet extends HttpServlet {public void doGet(HttpServletRequest request,                             HttpServletResponse response)throws ServletException, IOException {

// set the response typeresponse.setContentType("text/html");

// get a print writer, print html headerPrintWriter out = response.getWriter( );out.println("<html>");out.println("<head><title> Square roots </title></head>");out.println("<body>");

// continued on the next slide

147 Database Technology, EDA636

ServletCode,part 2

// get the parameterString data = request.getParameter("number");

// convert parameter to double (checks omitted), print square rootdouble value = Double.parseDouble(data);out.println("The square root of " + value + " is " +                 Math.sqrt(value));

// print a new form (note \" for a " inside a string)out.println("<p>Try another:<p>");out.println("<form method=\"get\" action=\"/demo/sqrtservlet\">");out.println("<input type=\"text\" name=\"number\">");out.println("<input type=\"submit\" value=\"Compute root\">");out.println("</form>");

// print html footer, close the print writerout.println("</body></html>");out.close();

}}

148 Database Technology, EDA636

Servlet Discussion

In the square roots example, it is overkill to use servlets! ACGI solution (using, e.g., Perl), would have been muchsimpler.But servlets are not intended for simple applications like this.They are used, e.g., in web shops. There, you have to save alot of state (shopping carts) and also need to fetch and savedata from/to a database.

149 Database Technology, EDA636

Saving State

Since servlets are “permanent” objects (during their life time)you can save state in attributes. This is good for, e.g.,database connections. (You may need a pool of opendatabase connections if you have many simultaneousrequests.)But, you have to be able to distinguish between differentclients (sessions). Different ways to accomplish this:

• Cookies (for small amounts of data)• URL rewriting• Hidden fields in forms• HttpSession objects

150 Database Technology, EDA636

HttpSession

A HttpSession object provides a way to identify a user acrossmore than one page request or visit to a web site and to storeinformation about that user.Create/retrieve a session object:

HttpSession session = request.getSession(true/false);Store/retrieve an attribute value (here, a databaseconnection):

session.setAttribute("db", db);Database db = (Database) session.getAttribute("db");

Internally, HttpSession uses cookies (or URL rewriting, ifcookies are not enabled) to transfer a “session id” from theclient to the server.

151 Database Technology, EDA636

Servlet Discussion, part 2

Advantages:• Servlets are powerful and scale well for large

applicationsBut also disadvantages:

• You need a programmer to develop and maintainservlets

• The HTML code is hidden inside print statements• So you cannot use web design tools to design pages• All changes (even just to HTML layout) force

recompilation and reinstallation

152 Database Technology, EDA636

JSP, Java Server Pages

To overcome the disadvantages, Java Server Pages weredeveloped.

• Basic idea: put the Java code inside the HTML codeinstead of the other way around

• Special tags mark insertion of Java code• When a JSP (.jsp) page is requested, the JSP page is

converted into a servlet and compiled (only on the firstrequest)

• Often use Java BeansASP (Microsoft) and PHP use the same idea.

Database Technology, EDA636 153

FunctionalDependencies,Normalization

154 Database Technology, EDA636

Normalization, Background

This “feels right”:

    MovietitleyearlengthfilmTypestudioName

    StarstarName

0..* 0..*stars-in

This doesn’t:

    MovietitleyearlengthfilmTypestudioNamestarName

What happens if we convert the second model into a relationwith the schema Movies(title, year, length, filmType, studioName,starName)? Anomalies occur, see next slide.

155 Database Technology, EDA636

Anomalies

• Redundancy — information is repeated in several tuples(length, filmType, studioName)

• Update anomalies — we must be careful to change everyoccurrence of a value (change length of Star Wars)

• Deletion anomalies — if a set of values becomes empty wemay lose other information (delete Estevez => allinformation about Mighty Ducks is lost)

title year length filmType studioName starNameStar Wars 1977 124 color Fox Carrie FisherStar Wars 1977 124 color Fox Mark HamillStar Wars 1977 124 color Fox Harrison FordMighty Ducks 1991 104 color Disney Emilio EstevezWayne’s World 1992 95 color Paramount Dana CarveyWayne’s World 1992 95 color  Paramount Mike Meyers

156 Database Technology, EDA636

Normalization

If we have a relation, e.g. Movies on the previous slide, thereexists a formal procedure to:

1) Discover anomalies2) Decompose (“split”) the relation into two (or more)

relations without anomaliesThis procedure is called normalization.Normalization builds on the theory of functionaldependencies.(In the example on the previous slides, common sense wouldhave led us right — it seems “unnatural” to put the starNameattribute in the Movies entity. However, there are otherexamples where common sense may not suffice.)

157 Database Technology, EDA636

Functional DependenciesA functional dependency, FD, on a relation R is a statementof the form:

If two tuples of R agree on the attributes A1, A2, …, An,then they must also agree on another attribute, B.

A1 A2 … An → B

A functional dependency is a constraint on a schema andmust hold for all instances of a schema.Example (where persNo is a key which functionallydetermines all other attributes):

Person(persNo, name, address)

persNo → namepersNo → address

or, shorter:

persNo → name address158 Database Technology, EDA636

FD Example

Relation Movies(title, year, length, filmType, studioName, starName).The following FD’s hold:

title year → lengthtitle year → filmTypetitle year → studioName

or:

title year → length filmType studioName

but not:

title year → starName (naturally, since there may be severalstars in a movie)

159 Database Technology, EDA636

FD’s Are About the Schema

Notice again that you cannot find FD’s by looking at onespecific instance of a relation. FD’s are semantic propertiesthat concern the meaning of the attributes.Example: by looking at the instance of the Movies schemabelow, you might be led to believe that the FD title → filmTypeholds. This is not true (e.g. the three versions of King Kong,two in color and one in black-and-white).

title year length filmType studioName starNameStar Wars 1977 124 color Fox Carrie FisherStar Wars 1977 124 color Fox Mark HamillStar Wars 1977 124 color Fox Harrison FordMighty Ducks 1991 104 color Disney Emilio EstevezWayne’s World 1992 95 color Paramount Dana CarveyWayne’s World 1992 95 color  Paramount Mike Meyers

160 Database Technology, EDA636

Keys of RelationsWe have already defined the concept of a key for a relationinformally. A formal definition:A set of one or more attributes {A1, A2, …, An} is a key for arelation R if:

1. Those attributes functionally determine all other attributesof R.

2. No proper subset of {A1, A2, …, An} functionally determinesall other attributes of R.

The last point means that a key must be minimal.Example: {title, year, starName} is a key for the relation Movies.Notice that a relation may have more than one key. In that case,one of the keys is chosen as the primary key.A set of attributes that contains a key is called a superkey(“superset of key”).

161 Database Technology, EDA636

Simple Rules About FD’s

title year → lengthtitle year → filmType title year → length filmType studioNametitle year → studioName

combining

splitting

2. Trivial dependencies:

A trivial FD is an FD where the right hand side is one of theattributes on the left hand side:

title year → title

3. The transitive rule (the most important rule):If the FD’s A → B and B → C hold, then also A → C holds

1. The splitting/combining rule:

162 Database Technology, EDA636

Closure of Attributes

The closure of a set of attributes, {A1, A2, …, An}, under a set S ofFD’s is the set of attributes B such that A1 A2 … An → B. That is,A1 A2 … An → B follows from the FD’s in S.

The closure of {A1, A2, …, An} is denoted {A1, A2, …, An}+.

This is used for finding keys, and in normalizing, examples later.The algorithm for computing closures works by adding attributeson the right side of FD’s to an initial set.Example on next slide.

163 Database Technology, EDA636

Closure Computation ExampleRelation R(A, B, C, D, E, F).

FD’s: AB → C, BC → AD, D → E, CF → B.

Compute X = {A, B}+:

1. Start with X = {A, B}

2. AB → C, so C can be added, X = {A, B, C}

3. BC → AD, so AD can be added, X = {A, B, C, D}

4. D → E, so E can be added, X = {A, B, C, D, E}

5. Nothing more can be added, so {A, B}+  = {A, B, C, D, E}

From this we can infer that AB → D and AB → E follows from theinitial set of FD’s. (But not AB → F, so A,B is not a key for R.)

164 Database Technology, EDA636

Key Computation IThe same relation and FD’s as in the previous example:

R(A, B, C, D, E, F)

FD1. AB → CFD2. BC → ADFD3. D → EFD4. CF → B

What are the keys in this relation? To answer this you have tocompute the closures of all subsets of attributes. The keys are thesubsets whose closures contains all five attributes.Initial observation: F is not on the right-hand side of any FD. Thismeans that all keys must contain F.One-attribute subsets containing F:

{F}+ = {F}

165 Database Technology, EDA636

Key Computation II

Two-attribute subsets containing F:{AF}+ => {AF}{BF}+ => {BF}{CF}+ => {CF} => {CFB} (FD4) => {CFBAD} (FD2) => {CFBADE} (FD3){DF}+ => {DF} => {DFE} (FD3){EF}+ => {EF}

This shows that {CF} is a key. So, when we examine three-attributesubsets we don’t have to consider subsets that contain {CF} (suchsubsets would be superkeys).

166 Database Technology, EDA636

Key Computation III

Three-attribute subsets containing F but not C:{ABF}+ => {ABFCDE}{ADF}+ => {ADFE}{AEF}+ => {AEF}{BDF}+ => {BDFE}{BEF}+ => {BEF}{DEF}+ => {DEF}

{ABF} is a key. The four-attribute subsets {ADEF} and {BDEF} are notkeys, so the keys are:

{CF}{ABF}

167 Database Technology, EDA636

Projecting FD’s

An unnormalized relation is normalized by splitting therelation in two (or more) relations. This is done by eliminatingcertain attributes from the relation schema. This is calledprojection.Question: what FD’s hold in the projected relation? Example:

R(A, B, C, D) with FD’s A → B, B → C, C → D.

What FD’s hold in the new relation S(A, C, D)?

{A}+ = {A,B,C,D} gives A → C, A → D

{C}+ = {C,D} gives C → D

{D}+ = {D} gives nothing

{C,D}+ = {C,D} gives nothing

168 Database Technology, EDA636

A Note on Projecting FD’sNotice again that FD’s are semantic statements about thedata in the schema, and that FD’s hold regardless of thedecomposition of data into relations. FD’s cannot disappearjust because data items are split over several relations.Previous example:

R(A, B, C, D) with FD’s A → B, B → C, C → D.

What FD’s hold in the new relation S(A, C, D)?The following reasoning is wrong:

“A → B cannot hold since B is not in S, B → C cannothold since B is not in S, so C → D is the only FD thatholds in S”

You must take dependencies that have been derived from thetransitive rule into account!

169 Database Technology, EDA636

Normalization again

There exists a formal procedure to:1) Discover anomalies (see next slide)2) Decompose (“split”) the relation into two (or more)

relations without anomaliesThis procedure is called normalization.There are several “normal forms”. The most important isBoyce-Codd normal form.

170 Database Technology, EDA636

Anomalies again

• Redundancy — information is repeated in several tuples(length, filmType)

• Update anomalies — we must be careful to change everyoccurrence of a value (change length of Star Wars)

• Deletion anomalies — if a set of values become empty wemay lose other information (delete Estevez => allinformation about Mighty Ducks is lost)

title year length filmType studioName starNameStar Wars 1977 124 color Fox Carrie FisherStar Wars 1977 124 color Fox Mark HamillStar Wars 1977 124 color Fox Harrison FordMighty Ducks 1991 104 color Disney Emilio EstevezWayne’s World 1992 95 color Paramount Dana CarveyWayne’s World 1992 95 color  Paramount Mike Meyers

171 Database Technology, EDA636

BCNF, Boyce-Codd Normal FormDefinition of BCNF:

• A relation R is in BCNF if and only if: whenever there isa nontrivial FD A1 A2 … An → B for R, it is the case that{A1, A2, …, An} is a superkey for R.

Example:The relation Movies(title, year, length, filmType,studioName, starName) has the FD’s:

title year starName → length filmType studioName{title, year, starName} is the key

title year → length filmType studioName

{title, year} is not a superkey, i.e. not a superset of {title,year, starName}.Movies is not in BCNF.

172 Database Technology, EDA636

A Relation in BCNF

Consider the relation Movies1(title, year, length, filmType,studioName), i.e. Movies without the starName.This relation has the only FD:

title year → length filmType studioName

{title, year} is a key.There are no other non-trivial FD’s, so Movies1 is in BCNF.

173 Database Technology, EDA636

Decomposition into BCNFBy repeatedly applying suitable decompositions, we can breakany relation into smaller relations that are in BCNF.Important: we must be able to reconstruct the original relationinstance exactly by joining the decomposed relation instances.The reconstruction will be shown later.Decomposition algorithm that meets this goal:

1. Start with a BCNF-violating FD, A1 A2 … An → B1 B2 … Bm,optionally expand the right-hand side as much as possible(closure)

2. Create a new relation with all the attributes of the FD, i.e. allthe A’s and all the B’s

3. Create a new relation with the left-hand side of the FD, i.e.all the A’s, plus all the attributes not involved in the FD

4. Repeat if necessary

174 Database Technology, EDA636

Example BCNF Decomposition

Movies(title, year, length, filmType, studioName, starName).

BCNF-violating FD: title year → length filmType studioName

New relation: Movies1(title, year, length, filmType, studioName)

New relation: Movies2(title, year, starName)

We have already checked that Movies1 is in BCNF. Equally easy tocheck that Movies2 is in BCNF (it contains no FD’s at all, so the key isall three attributes).

175 Database Technology, EDA636

Another BCNF Decomposition

Consider the relationMovieStudios(title, year, length, filmType, studioName, studioAddr).

{title, year} is a key.

An obvious FD is studioName → studioAddr, and studioName is not asuperkey.Decomposition:

BCNF-violating FD: studioName → studioAddr

New relation: MovieStudios2(studioName, studioAddr)

New relation: MovieStudios1(title, year, length, filmType, studioName)

176 Database Technology, EDA636

What Causes non-BCNF?

Common causes of BCNF-violations:• Putting a many-many relationship in one relation (the

first example, with Movies containing starNames)• Transitive dependencies (the second example, where

studioName → studioAddr)

177 Database Technology, EDA636

Common Sense IWe started our discussion of normalization by saying that thefollowing E/R diagram “feels right”, but then we did not takeit as a basis for the relations:

    MoviestitleyearlengthfilmTypestudioName

    StarstarName

0..* 0..*stars-in

If we had followed the rules for translating an E/R model intorelations we would have ended up with the followingrelations:

Movies(title, year, length, filmType, studioName)Stars(starName) [superfluous if every star is in a movie]StarsIn(title, year, starName)

i.e. exactly the same relations as Movies1 and Movies2.178 Database Technology, EDA636

Common Sense II

The example with movies and their owning studios should bemodeled like this:

   MoviestitleyearlengthfilmType

     StudiostudioNamestudioAddr

0..* 1owns

By the rules, this would have resulted in the followingrelations:

Movies(title, year, length, filmType, studioName)Studios(studioName, studioAddr)

i.e. exactly the same relations as MovieStudios1 andMovieStudios2.

179 Database Technology, EDA636

Common Sense III

Conclusions:• Careful E/R modeling is essential for proper

understanding of a problem• But careful E/R modeling also results in relations that

are better normalized than if you start with relationsdirectly

This is not to say that careful E/R modeling always results innormalized relations. Consider the following model, which“feels right” but gives a relation that isn’t in BCNF:

   PersonpersNonamestreetAddrpostCodecity

180 Database Technology, EDA636

Reconstructing Information

We earlier said that “we must be able to reconstruct theoriginal relation instance exactly from the decomposedrelation instances”.The reconstruction is performed by joining two relations. Twotuples can be joined if they agree on the values of an attribute(or values of sets of attributes).The decomposition algorithm presented earlier, which isbased on FD’s, yields relations that may be joined on theattributes on the left-hand sides of the FD.Other “ad hoc” algorithms may yield relations that, whenjoined, contain spurious (false) tuples.

181 Database Technology, EDA636

A Reconstruction ExampleA relation R(A, B, C) with only one FD:

FD1. A → B

is not in BCNF. {A, C} is the only key, FD1 violates the BCNFcondition. A decomposition of R that is not done according tothe BCNF rules is R1(A, B) and R2(B, C). Both R1 and R2 arein BCNF, since they have only two attributes each.Example projection and reconstruction:

A B C1 2 32 2 4

A B C1 2 31 2 42 2 32 2 4

A B1 22 2

B C2 32 4

project=>

reconstruct(join on B)

=>

The tuples (1, 2, 4) and (2, 2, 3) were not in the originalrelation, and are spurious. Exercise: decompose according tothe BCNF rules and then check the same example.

182 Database Technology, EDA636

More Reconstruction

The same example as on the previous slide, with betternames for the attributes. People have names and own cars. Aperson may own many cars, and a car may be owned bymany persons. This is described by the relation R(pNo, name,carNo), with the FD:

FD1. pNo → name

{pNo, carNo} is the only key, FD1 violates the BCNF condition.A decomposition of R that is not done according to the BCNFrules is R1(pNo, name) and R2(name, carNo). Both R1 and R2are in BCNF, since they have only two attributes each.With these attributes, it is obvious that the decomposition inR1 and R2 is stupid. If we instead decompose according tothe BCNF rules, we get the relations S1(pNo, name) andS2(pNo, carNo), which can be joined on pNo.

183 Database Technology, EDA636

Another Reconstruction Example

title year length filmType studioName starNameStar Wars 1977 124 color Fox Carrie FisherStar Wars 1977 124 color Fox Mark HamillStar Wars 1977 124 color Fox Harrison FordMighty Ducks 1991 104 color Disney Emilio EstevezWayne’s World 1992 95 color Paramount Dana CarveyWayne’s World 1992 95 color  Paramount Mike Meyers

title year length filmType studioNameStar Wars 1977 124 color FoxMighty Ducks 1991 104 color DisneyWayne’s World 1992 95 color Paramount

title year starNameStar Wars 1977 Carrie FisherStar Wars 1977 Mark HamillStar Wars 1977 Harrison FordMighty Ducks 1991 Emilio EstevezWayne’s World 1992 Dana CarveyWayne’s World 1992 Mike Meyers

184 Database Technology, EDA636

Other Normal Forms, Motivation

Sometimes, BCNF is “too strong”, in the sense that we maylose important FD’s if we decompose into BCNF.Example, a relation Bookings(title, theater, city) that describesthat a movie plays in a theater in a city. FD’s:

FD1: theater → cityFD2: title city → theater [assumed, no two theaters in the

                    same city show the same movie]Keys:

{title, city} [from FD2]{theater, title} [FD1 augmented with title]

So, FD1 is a BCNF violation. Attempt at decomposition onnext slide.

185 Database Technology, EDA636

Other Normal Forms, cont’dIf we decompose the relation Bookings into BCNF we get therelations:

R1(theater, city)R2(theater, title)

These relations can be updated independently of each other.But when we do that, we cannot check that FD2, title city →theater, holds. I.e., when we join the relations (on theater) wemay get tuples for which FD2 does not hold.

theater cityGuild Menlo ParkPark Menlo Park

theater titleGuild The NetPark The Net

theater city titleGuild Menlo Park The NetPark Menlo Park The Net

186 Database Technology, EDA636

3NF, Third Normal Form

Third normal form is a relaxation of BCNF. Definition:A relation R is in third normal form if and only if:whenever there is a nontrivial FD A1 A2 … An → B for R,it is the case that {A1, A2, …, An} is a superkey for R, or Bis a member of some key.

The definition is the same as for BCNF with the addition “or Bis a member of some key”.In the relation Bookings(title, theater, city) this allows theBCNF-violating FD:

theater → city

Since city is a member of the key {title, city}, Bookings is in3NF.

187 Database Technology, EDA636

Another Example

The following relation is not in BCNF:Persons(persNo, name, street, postCode, city)

The problem is with the postal code. FD’s:

FD1: persNo → name street postCode cityFD2: postCode → cityFD3: street city → postCode

FD2 and FD3 are BCNF violations. Notice that the relation isnot even in 3NF, since neither city nor postCode is a memberof a key.

188 Database Technology, EDA636

Decomposition 1

We can decompose the Persons relation into BCNF relations.If we start with FD1, we get:

R1(postCode, city)R2(persNo, name, street, postCode)

Both R1 and R2 are in BCNF.This decomposition eliminates the redundancy that the cityhas to be mentioned several times for each postal code. Andif we change the city for a postal code, we only have toperform the change in one place.in the decomposition we have lost the possibility to checkFD3, street city → postCode.

189 Database Technology, EDA636

Decomposition 1 Comments

In practice, you probably wouldn’t bother to decompose thePersons relation:

• How often do you change postal codes? Probably neveror very seldom.

• How costly is the extra storage requirement? Probablynot very costly.

• How often do you wish to check FD3? Probably never.There is also an advantage to keeping the original relation. Ifyou decompose, you have to perform a join on two tablesevery time you wish to access a person’s full address.

190 Database Technology, EDA636

Decomposition 2

If we instead start decomposing with FD2 we get:R1(street, city, postCode)R2(persNo, name, street, city)

R1 is in 3NF, R2 is in BCNF. We do not wish to decompose R1further — we would need to join three tables in order toaccess a person’s full address.Conclusions:

• Sometimes, unnormalized relations and the resultingredundancy are acceptable

• But you must be able to motivate why you choose touse unnormalized relations (and to be able to realizethat you are …)

191 Database Technology, EDA636

More Redundancy

Some redundancy is not detected by the BCNF condition.Consider the relation StarMovie(name, street, city, title, year): astar has a name, may have several addresses, and may starin several movies. Example instance:

name street city title yearC. Fisher 123 Maple St. Hollywood Star Wars 1977C. Fisher 5 Locust Ln. Malibu Star Wars 1977C. Fisher 123 Maple St. Hollywood Empire Strikes Back 1980C. Fisher 5 Locust Ln. Malibu Empire Strikes Back 1980C. Fisher 123 Maple St. Hollywood Return of the Jedi 1983C. Fisher 5 Locust Ln. Malibu Return of the Jedi 1983

There is obvious redundancy in this relation, but the relationis still in BCNF (it has no FD’s at all).

192 Database Technology, EDA636

Common Sense Again

In the relation StarMovie we have tried to put two many-manyrelationships in one relation:

     Starname

      Movietitleyear

0..*    Addressstreetcity

0..*0..*0..*

Naturally, this is not a reasonable thing to do, but it willillustrate the need for yet another normal form.

193 Database Technology, EDA636

Multivalued Dependencies

In the relation StarMovie there are no FD’s at all. For example,name → street city does not hold, since a star may haveseveral addresses. However, for each star there is a well-defined set of addresses. Also, for each star there is a well-defined set of movies. Furthermore, these sets areindependent of each other.This is called a multivalued dependency (MVD) and isdenoted:

name –>> street cityname –>> title year

MVD’s always come in pairs.

194 Database Technology, EDA636

A Formal Definition of MVDIn a relation R, the MVD

A1 A2 … An –>> B1 B2 … Bm

holds if: for each pair of tuples t and u of relation R that agreeon all the A’s, we can find in R some tuple v that agrees:

1. With both t and u on the A’s,2. With t on the B’s,3. With u on all attributes of R that are not among the A’s

or the B’s.Note that v may be = t or = u.

a1 b1 c1

a1 b1 c2

a1 b2 c2

t

v

u

195 Database Technology, EDA636

Reasoning About MVD’s

Some rules about MVD’s are similar to the FD rules:• trivial dependencies• transitivity

The splitting rule for FD’s does not hold for MVD’s.Every FD is a MVD.

196 Database Technology, EDA636

Fourth Normal Form

Definition of fourth normal form (4NF):• A relation R is in fourth normal form if and only if:

whenever there is a nontrivial MVD A1 A2 … An –>> B forR, it is the case that {A1, A2, …, An} is a superkey for R.

i.e. the same definition as BCNF but FD is changed to MVD.The decomposition into 4NF is analogous to BCNFdecomposition. The decomposed relations become:

R1(name, street, city)R2(name, title, year)

197 Database Technology, EDA636

Normal Forms Hierarchy

4NF —> BCNF —> 3NF (—> 2NF —> 1NF).Properties among normal forms:

Property 3NF BCNF 4NFEliminates redundancy due to FD's Most Yes YesEliminates redundancy due to MVD's No No YesPreserves FD's Yes Maybe MaybePreserves MVD's Maybe Maybe Maybe

You should aim for at least BCNF for all relations.In some cases, 3NF is acceptable.Again: you have to know what normal form your relations arein, and to know why if you choose a lower form than BCNF.

Database Technology, EDA636 198

Stored Programs,Triggers

199 Database Technology, EDA636

Stored ProgramsSQL is not “Turing complete” so there are things that youcannot express in SQL, for instance if statements or whilestatements.In most practical situations such constructs are necessary.Solutions:

1. Write a “normal program” in Java or some otherprogramming language, call SQL from this language.This can result in a lot of data being shuffled from thedatabase to the application program, and this isexpensive.

2. Write the program and store the program in thedatabase. This is called SQL/PSM, Persistent StoredModules.In PSM, you can mix “normal” programming languagestatements with SQL.

200 Database Technology, EDA636

ExampleEmployees in a company are described by the followingrelation:

Employees(nbr, name, bossNbr)nbr is the employee number, name is the employee’s (unique)name. bossNbr is the employee number of the employee’sboss. A boss may have a boss, … A top-level boss has ‘0’ asbossNbr. There may be many top-level bosses.Example:

1/Chris/0

2/Josefin/1 3/Per/1 4/Sven/1

5/Johan/2 6/Alvar/3 7/Bosse/3

8/Anna/0

9/Jan/11

10/Peter/8 11/Eva/8

12/Erik/9

13/Andrzej/11 14/Ali/11

201 Database Technology, EDA636

Questions

• Who are the top-level bosses?select name from Employees where bossNbr = 0;

• Who is Jan’s immediate boss?select e1.namefrom Employees e1, Employees e2where e2.name = ’Jan’

and e2.bossNbr = e1.nbr;• Who is Jan’s top-level boss?

… cannot answer in SQL(In the latest SQL standard there is a provision forrecursive queries which would solve this problem. Thisis not available in MySQL)

202 Database Technology, EDA636

JDBC Solution

The answer to the last question is easy to obtain if you use Java (detailsregarding statement creation and deletion, and exceptions, are omitted):

public String getTopLevelBossName(String empName) {ResultSet rs = stmt.executeQuery

(”select bossNbr from Employees where name = '” + empName + ”'”);String bName = empName;int bNbr = rs.getInt(”bossNbr”);while (bNbr != 0) {

rs = stmt.executeQuery(”select name, bossNbr from Employees where nbr = ” + bNbr);

bName = rs.getString(”name”);bNbr = rs.getInt(”bossNbr”);

}return bName;

}

203 Database Technology, EDA636

PSM Solution

create function getTopLevelBossName(empName varchar(10))return varchar(10)

begindeclare bNbr int;declare bName varchar(10);select bossNbr into bNbr from Employees where name = empName;set bName = empName;while bNbr <> 0 do

select name, bossNbrinto bName, bNbrfrom Employeeswhere nbr = bNbr;

end while;return bName;

end;

204 Database Technology, EDA636

Comments

Note the following:• Many syntactic differences from other programming

languages (declare to declare local variables, set toassign a value, no parentheses around the condition inthe while loop, etc.)

• You can mix PSM statements with SQL statements, andyou use select into to assign the result of an SQL queryto a PSM variable

• The data types are the usual SQL types• You cannot have relations as parameters (the relations

Employees in the example is “global” and must existwhen the function is defined)

205 Database Technology, EDA636

Running in MySQL

(This you must try at home; you don’t have the privileges tocreate functions at the LTH installation.)mysql and PSM both use a semicolon as delimiter. You mustchange the mysql delimiter before you define the function, andrestore it afterwards.

delimiter //create function getTopLevelBossName(empName varchar(10))

return varchar(10)begin

…end;//delimiter ;

Call the function:select getTopLevelBossName(’Jan’);

206 Database Technology, EDA636

CursorsWhen you wish to examine all tuples in a relation, you use acursor. A cursor is a variable that runs through the tuples ofa relation. Compare with ResultSet objects in JDBC, and thenext( ) function.

• Cursors must be declared as local variables• Cursors must be opened and closed• A tuple is fetched with the fetch statement• To detect that there are no more tuples, you declare a

continue handler that checks for a specific SQL errorcode (SQLSTATE). Similar to an exception handler.

The procedure copyToStaff on the next slide copies allemployee names from Employees to the Staff table. It is anexample only, you could have copied like this in pure SQL:

insert into Staff select name from Employees;

207 Database Technology, EDA636

A Cursor Examplecreate procedure copyToStaff( )begin

declare done int default 0;declare eName varchar(10);declare empCursor cursor for

select name from Employees;declare continue handler for SQLSTATE '02000' set done = 1;

delete from Staff;open empCursor;fetch empCursor into eName;while not done do

insert into Staff values(eName);fetch empCursor into eName;

end while;close empCursor;

end;

208 Database Technology, EDA636

Recursive Procedures

PSM allows recursive procedures.Example:

Find all staff that are managed by a boss. Store the staffnames in a relation Staff(name).

8/Anna/0

9/Jan/11

10/Peter/8 11/Eva/8

12/Erik/9

13/Andrzej/11 14/Ali/11

findStaff(’Eva’)=> Staff = {’Andrzej’, ’Ali’, ’Jan’, ’Erik’}

findStaff(’Jan’)=> Staff = {’Erik’}

findStaff(’Peter’)=> Staff = { }

209 Database Technology, EDA636

Solution, 1

Do the following in mysql:1. Create the Staff table (once):

create table Staff (name varchar (10)

);2. Create the procedures findStaff and findStaffRecursive

(once, see next slide)3. Execute the procedure findStaff:

call findStaff(’Eva’);4. Study the result:

select * from Staff;

210 Database Technology, EDA636

Solution, 2

Set up for the recursion (clear the Staff table and call therecursive procedure):

create procedure findStaff(in bName varchar(10))begin

declare bNbr int;delete from staff;select nbr into bNbr from Employees where name = bName;call findStaffRecursive(bNbr);

end;

211 Database Technology, EDA636

Solution, 3create procedure findStaffRecursive(in bNbr int)begin

declare done int default 0;declare eName varchar(10);declare eNbr int;declare empCursor cursor for

select nbr, name from Employees where bossNbr = bNbr;declare continue handler for SQLSTATE '02000' set done = 1;

open empCursor;fetch empCursor into eNbr, eName;while not done do

insert into Staff values (eName);call findStaffRecursive(eNbr);fetch empCursor into eNbr, eName;

end while;close empCursor;

end;212 Database Technology, EDA636

PSM Types and Statements

Types Same as SQL data types. Local variables mustbe declared with declare.

Assignment set empName = ’Bosse’;select name into empNamefrom Employees where nbr = 10;

If statement if <condition> then<statements>

else<statements>

end if;Use elseif for else if.

213 Database Technology, EDA636

PSM Loops

Loop [label:] loop…if … leave label;…

end loop;While loop while <condition> do

<statements>end while;

For loop Not implemented in MySQL.

214 Database Technology, EDA636

Debugging Stored Procedures

It is not easy to debug stored procedures. It may be difficulteven to find compilation errors, since the error messages arenot very informative (usually “you have an error in your SQLsyntax”).There is no “stored procedure debugger” where you canfollow the execution of a stored procedure. What you can dois to insert “normal” select statements in a procedure (notselect into). The results from such statements are sent directlyto the database client (usually mysql).

215 Database Technology, EDA636

Why Use PSM?

Some reasons for using stored procedures instead of clientcode:

• Essential to have the “business logic” in one place,instead of spread out over client programs. (Forinstance, banks often don’t give applications access todatabase tables directly, they must perform all databaseactions via stored procedures.)

• More efficient• Lets clients in different languages on different platforms

perform the same database actions

216 Database Technology, EDA636

Triggers

A trigger is an active database element that is executedwhenever a triggering event occurs. The triggering event canbe an insertion, deletion or modification of a tuple in arelation.Typical uses of triggers:

• make sure that an attribute contains a reasonable value(you can also use check on an attribute value when atable is defined, but only for simple cases)

• insert an audit tuple in another table when something ismodified

• perform checks so new information is consistent, and ifit is not, roll back the transaction

217 Database Technology, EDA636

Trigger Example

A table which records texts with author andcreation/modification date:

create table Comments (author varchar(10),text varchar(1000),modified date

);A trigger that ensures that the modified field alwayscontains the date of the modification:

create trigger commentsModifiedbefore insert or update on Comments

for each rowbegin

new.modified := now( );end;

218 Database Technology, EDA636

MySQL Trigger Syntax

create trigger <trigger-name>[ before | after ] [ insert | update | delete ] on <table-name>for each row<trigger body in PSM>

Notes:• for each row means “for each modified row”. The trigger

body is executed for each modified tuple. The new andold tuples are accessed with new and old.

• Note that the syntax before insert or update is notavailable in MySQL. You have to write one insert triggerand one update trigger. If the trigger body is complex,write a stored procedure and call it from both triggers.

• MySQL trigger support is still rudimentary. StandardSQL provides more possibilities.

Database Technology, EDA636 219

Object-Oriented andObject-RelationalDatabases

220 Database Technology, EDA636

Object-Oriented Databases

Object-oriented databases are better than relationaldatabases at handling complex data that arise in manyapplications:

• Images, video, sound, … (multimedia in general)• Spatial data (GIS, Geographical Information Systems)• Biological data (DNA strings)• CAD data (Computer Aided Design)• Virtual worlds, games, …

Object-oriented databases provide persistent storage forobjects. The ODBMS and the application programminglanguage are integrated.Object-oriented databases may provide a query language,indexing, transaction support, distributed objects, …

221 Database Technology, EDA636

Shortcomings of RDMBS

In a relational database system:• Everything must be a relation — the logical model must

be “flattened”. E.g. a many-many relationship becomesa relation.

• There are no complex objects, apart from BLOB’s(Binary Large Objects). BLOB’s cannot be type checked.

• There is no inheritance.• There is an impedance mismatch between the data

access language (SQL) and the host language (Java,C++, …). You need a lot of time-consuming code toconvert from tuples to objects, and vice-versa.

222 Database Technology, EDA636

Future of ODBMS• When should you use an ODBMS?

When you have a need for high performance on complexdata.

• Are ODBMS’s ready for production applications?Yes, many new applications use ODBMS’s.

• Are there any commercial ODBMS products?Objectivity, Gemstone, POET, Jasmine, … All are verysmall compared to Oracle or IBM …

• Will ODBMS’s replace RDBMS’s?Certainly not. ODBMS’s are mainly used in newdevelopment.

• Can ODBMS’s and RDBMS’s coexist?Certainly, and this is what you usually see. OO is goodfor many things, relational also but for different things.

223 Database Technology, EDA636

Standards?

There is not much standardization in the object-orienteddatabase world. Most vendors provide their own solutions todifferent problems.One standards body:

ODMG, Object Data Management Group. Have workedsince about 1990 trying to standardize ODL (ObjectDefinition Language) and OQL (Object Query Language).

Java standards:JDO (Java Data Objects). Has the goal to standardizedata store access in Java, but is broader in scope thanthe ODMG attempts and does not follow ODL or OQLstandards.JDBC and SQLJ will continue to exist but are limited torelational databases with SQL as the query language.

224 Database Technology, EDA636

Object-Relational?

The latest SQL standard has extended SQL with object-oriented features.

• Permits definition of objects with data and methods• Objects may contain explicit references to other objects• Can store an object as the value of an attribute in a tuple

MySQL does not support objects.

225 Database Technology, EDA636

Persistence

When you use a programming language together with anODBMS the objects you create are (may be) persistent, i.e.,they outlive the execution of a program.In an object-oriented database the objects are stored in“object format”, instead of being stored in tuples in arelation, or even worse spread out over several relations.Objects are loaded from the database when they areaccessed by the program. Pointers (references) areautomatically translated (“swizzled”) back and forth betweentwo representations: memory address or disk address.

226 Database Technology, EDA636

Approaches to Persistence

Class based:• persistent objects must be of a class that inherits from a

Persistent classObject based:

• any object may be marked as persistentReachability based:

• one “root” object is marked as persistent, all objectsthat can be reached from this object are also persistent

In JDO (Java Data Objects) reachability-based persistence isimplemented.

227 Database Technology, EDA636

Query LanguagesSQL is a powerful query language for relational databases. Inan object-oriented database, you sometimes need a querylanguage as well. OQL (Object Query Language) is a proposalfor a standardized object-oriented query language.OQL example:

select c.addressfrom Person p, p.children cwhere p.address.street = ”Main Street” and

count(p.children) >= 2 andp.address.city != c.address.city;

Notice the similarity to SQL. This is, of course, intentional.Also note “p.children”: children is a set-valued referenceattribute in the class Person.SQL is both a data definition language and a query language.The object-oriented data definition language is called ODL.

228 Database Technology, EDA636

Java Data Objects (JDO)

Persistent objects in Java:• serialization — save/restore objects with explicit

commands• JDBC, SQLJ — interface to a relational database using

standard SQL• JDO — transparent persistence. Automatic persistence,

persistent objects are treated the same as transientobjects. The underlying data store may be a file system,a spreadsheet, a relational DBMS, an object-orientedDBMS, …

229 Database Technology, EDA636

JDO Interface

The primary participants in the JDO model are:• PersistenceCapable classes — the actual objects that are

stored and fetched• PersistenceManager — negotiates accesses, stores,

transactions, and queries between applications and theunderlying data store

• Transaction — handles ACID transactions• Query — handles language-independent queries

230 Database Technology, EDA636

PersistenceCapable Classes

All user-defined classes can be made persistent. Somesystem classes are persistent, e.g. java.util.Collection classes.Persistent classes must implement the PersistenceCapableinterface, but this is not visible in the user code. Instead:

• the class author provides an XML file with details aboutthe class, e.g. which of the attributes that are persistent

• a Class Enhancer tool processes the class fileDuring runtime, objects of PersistenceCapable classes can bemade persistent by calling the PersistenceManager. Eachpersistent object has its own unique identity in the data storeand hence can be shared between different applicationsconcurrently.

231 Database Technology, EDA636

PersistenceManager

The PersistenceManager class contains the following methodsto make objects persistent:

void makePersistent(Object pc);void makePersistent(Object[ ] pcs);void makePersistent(Collection pcs);

And methods to find and fetch persistent objects:Object getObjectById(Object oid_or_pc, boolean validate);Object getObjectId(Object pc);

And methods to delete persistent objects and to makepersistent objects transient:

void deletePersistent(Object pc);void makeTransient(Object pc);

232 Database Technology, EDA636

Transaction

Transactions are handled by the class Transaction. Methods:void begin( );void commit( );void rollback( );

Almost the same as in SQL.

233 Database Technology, EDA636

QueryQueries in JDO are handled by the Query class. The methodsfor specifying ’select’, ’from’, and ’where’ are languageindependent (SQL, OQL, …). Example:

class Employee {String name;Integer salary;Employee manager;

}Collection extent = persistMgr.getExtent

(Class.forName(”Employee”), false);Query q = persistMgr.newQuery(

Class.forName(”Employee”), // classextent, // candidates”salary > 50000”); // filter

Collection resultSet = q.execute( );Compare: select * from Employees where salary > 50000;

234 Database Technology, EDA636

More Complex Queries

SQL query:select e1.namefrom Employees e1, Employees e2where e1.salary > ? and

e1.manager.name = e2.name ande2.salary > ?;

JDO query:q.declareParameters(”int sal”);q.setFilter(”salary > sal and manager.salary > sal”);resultSet = query.execute(new Integer(50000)); // sal = 50000

Notice that joins are handled transparently. It is up to JDO totranslate the filter specification into SQL (or OQL, …).

235 Database Technology, EDA636

ODL, Object Definition Language

ODL is a language, defined by ODMG, the Object DatabaseManagement Group, for defining the structure of an object-oriented database. This “object model” is on the sameconceptual level as the relational model, and ODL is theobject-oriented counterpart to the data definition language inSQL.It is possible to convert object models into correspondingrelational models. I don’t see why anyone should want to dothis. You start with an E/R model, and if you intend to use anRDBMS you convert the E/R model into a relational model. Ifyou intend to use an ODBMS you instead convert the E/Rmodel into an object model.

236 Database Technology, EDA636

ODL FeaturesIn ODL, you can have many things that you cannot have in arelational model:

• Complex attributes, e.g. structs (“records”):class Star {

attribute string name;attribute Struct Addr

(string street, string city) address;};

• Relationships expressed as collections (sets, arrays,etc.):

class Movie {attribute string title;attribute integer year;attribute integer length;relationship Set<Star> stars;

};

237 Database Technology, EDA636

More ODL Features

• Referential integrity with inverse relationships:class Movie {

attribute string title;attribute integer year;attribute integer length;relationship Set<Star> stars

inverse Star::starredIn;};class Star {

attribute string name;attribute Struct Addr

(string street, string city) address;relationship Set<Movie> starredIn

inverse Movie::stars;};

238 Database Technology, EDA636

Even More ODL Features• Methods:

class Movie {attribute string title;attribute integer year;attribute integer length;relationship Set<Star> stars

inverse Star::starredIn;float lengthInHours( ) raises (NoLengthFound);void starNames(out Set<string>);

};• Inheritance:

class Cartoon extends Movie {relationship Set<Star> voices;

};class MurderMystery extends Movie {

attribute string weapon;};

239 Database Technology, EDA636

ODL Summary

• As has been shown, ODL looks like the data definition partof an object-oriented programming language.

• ODL is programming-language independent.• ODL methods are intended to be implemented in an object-

oriented programming language: C++, Java, etc. Thebinding between ODL and the programming language ishandled by some tool.

• Note that JDO has chosen not to use ODL as the datadefinition language.

• ODL is a superset of ODMG’s IDL (Interface DefinitionLanguage).

240 Database Technology, EDA636

OQL, Object Query Language

OQL is a query language for object-oriented databases. AsODL, it is defined by ODMG.OQL is designed to operate on data described in ODL.OQL has many similarities to SQL, including the familiar“select-from-where” structure of queries. But since ODL hasa richer notion of types, and relationships in ODL aredescribed by (collections of) references, several changeshave been made.

241 Database Technology, EDA636

OQL Features

When was Gone With the Wind produced?select m.yearfrom Movies mwhere m.title = ”Gone With the Wind”

Movies m is a variable declaration — the set of objects of theclass Movie (the “extent” of the Movie class).Which stars appeared in Casablanca?

select s.namefrom Movies m, m.stars swhere m.title = ”Casablanca”

242 Database Technology, EDA636

Object-Relational Databases

In the latest SQL standard (SQL3, SQL-99), object-orientedfeatures have been incorporated.

• relations are still the basic structural concept• User-Defined Types, UDT’s, are similar to classes• a UDT can be the type of a table• a UDT can be the type of an attribute in a table• UDT’s may have methods

243 Database Technology, EDA636

Defining UDT’s

A simple UDT definition:create type AddressType as (

street char(50),city char(20),

)method houseNumber( ) returns char(10);

The method houseNumber is defined as:create method houseNumber( ) returns char(10)

for AddressTypebegin

… (PSM code)end;

244 Database Technology, EDA636

Declaring Relations

A star type:create type StarType as (

name char(30),address AddressType

);A relation containing stars:

create table MovieStar of StarType;

245 Database Technology, EDA636

References

A table in SQL3 may have a reference column that serves asits “identity”. The column may be:

• the primary key of the table, if there is one, or• a column with values that are generated and maintained

to be unique by the DBMS.Supposing that the UDT MovieType has a reference column,other tuples may refer to movie tuples as in the followingexample:

create type StarType as (name char(30),address AddressType,bestMovie ref(MovieType) scope Movie

);

246 Database Technology, EDA636

Declaring Reference ColumnsIn order to give the table Movie a reference column, areference declaration is added to the creation code:

create type MovieType as (title char(30),year integer,inColor boolean

);create table Movie of MovieType (

ref is movieID system generated,primary key (title, year)

);The table Movie now has four columns: title, year, inColor, andmovieID.“System generated” means that the DBMS will create aunique identity for the Movie tuples. If “derived” is usedinstead, the identity will be computed based on the primarykey of the table.

247 Database Technology, EDA636

More References

The many-many relationship between stars and movies is stilldefined as a table. But now that we have references we nolonger need the primary key values in the relation.First, the MovieStar table must be made referenceable:

create table MovieStar of StarType (ref is starId system generated;

);Then, the relation StarsIn can be defined as follows:

create table StarsIn (star ref (StarType) scope MovieStar,movie ref (MovieType) scope Movie

);

248 Database Technology, EDA636

Following References

If x is a value of type ref (T), x refers to a tuple t of type T. Wecan obtain the tuple t, or its components, in the followingway:

• deref(x) == t. Compare with * in C or C++.• t–>a is the value of the attribute a in tuple t.

Example:select deref(movie)from StarsInwhere star–>name = ’Mel Gibson’;

Note: not “select movie” — this would give a list of system-generated movieID’s, not a list of movie tuples.

249 Database Technology, EDA636

Observer Methods

Each UDT has an implicitly defined observer method(“getter”) for each attribute.Example:

select m.year( )from Movie mwhere m.title( ) = ’King Kong’;

Since Movie is a “table of MovieType” each tuple in Movie isa UDT, and we need the observer method to access theirattributes.

250 Database Technology, EDA636

Generator and Mutator Functions

In addition to the observer methods, two kinds of methodsare automatically created for a UDT:

• A generator method (cf. default constructor in Java). If Tis a UDT, then T( ) returns an empty object of type T.

• Mutator methods (“setters”). The value of an attribute xof a UDT can be set to the new value v with x(v).

Example (body of a PSM procedure to create a new star; s, c,and n are string parameters):

set newAddr = AddressType( );set newStar = StarType( );newAddr.street(s);newAddr.city(c);newStar.name(n);newStar.address(newAddr);insert into MovieStar values (newStar);

Database Technology, EDA636 251

XML

252 Database Technology, EDA636

XML

XML (eXtensible Markup Language) is a World-Wide WebConsortium (www.w3.org) standard for defining the structureand meaning of data stored in text documents.New, interesting, and still under development.Some Google search results (January 9, 2007):

XML: 381 million hitsXML & Tutorial: 69,4 million (!)XML & Database: 248 million

java.sun.com search:XML: 26 899

253 Database Technology, EDA636

Digesting the Alphabet Soup

The names of the following important XML standards andtechnologies have been fetched from the XML tutorial atjava.sun.com:XML, SAX, DOM, JDOM, dom4j, DTD, XSL, XSLT, XPath, XMLSchema, RELAX NG, TREX, SOX, XML Linking, XML Base,XPointer, XHTML, RDF, RDF Schema, XTM, XLink, XPointer,SMIL, MathML, SVG, DrawML, ICE, ebXML, cxml, CBL, UBL.So there is a lot to learn… This is an introduction to get a feelof what’s it all about.

254 Database Technology, EDA636

The Structure of Data

In the relational, object-oriented and object-relational datamodels data is structured according to a schema (or class,…). This makes searchable databases possible and isimportant for efficiency.In the real world data often is unstructured. It can be of anytype and it doesn’t necessarily follow any organized format orsequence.You sometimes need to handle unstructured data, but yourprograms must know something of the data to be able tohandle it.A new invention is semistructured data.

255 Database Technology, EDA636

Semistructured Data

Semistructured data is organized enough to be predictable:• Data is organized in semantic entities• Similar entities are grouped together

But:• Entities in the same group do not necessarily have the

same attributes• The order of the attributes is not necessarily important• The presence of some attributes may not always be

required• The type and size of attributes of entities in the same

group may not be the sameAn HTML document is an example of semistructured data.

256 Database Technology, EDA636

Semistructured Data, Example

cf mh sw

starstar

movie

name name title yearaddressstreet

citystreet

city

Maple st. H’wood

Carrie Fisher MarkHamill

Oak rd. B’wood Star Wars 1977

starOf

starOf

starsIn

257 Database Technology, EDA636

XML Basics

• A language, like HTML, for markup of data• The markup is, unlike HTML, for the semantic meaning of

data, not just for presentation• No predefined tags• Extensible tags can be defined and extended based on

applications and needs• tags• attributes• example: <BOOK page = ”453”> … </BOOK>

258 Database Technology, EDA636

Rules for XML Documents

• An XML document must have one root• XML is case sensitive• All tags must be terminated:

<FirstName>Jennifer</FirstName>• Tags must be properly nested:

<Author> <name>Widom</name> </Author>

259 Database Technology, EDA636

Two Modes of XML

Well-formed XML:Allows you to invent your own tags (cf. labels insemistructured data). Entirely schemaless.

Valid XML:Involves a Document Type Definition (DTD) thatspecifies the allowable tags and how they may benested. That is, a schema, but more flexible than arelational schema.

260 Database Technology, EDA636

Well-Formed XML Example

<? xml version = ”1.0” standalone = ”yes” ?><Star-Movie-Data>

<Star> <Name>Carrie Fisher</Name><Address> <Street>123 Maple St.</Street>

<City>Hollywood</City></Address>

</Star><Star> <Name>Mark Hamill</Name>

<Street>456 Oak Rd.</Street> <City>Brentwood</City></Star><Movie> <Title>Star Wars</Title> <Year>1977</Year></Movie>

</Star-Movie-Data>

261 Database Technology, EDA636

Document Type Definitions<!DOCTYPE Stars [

<!ELEMENT Stars (Star*)><!ELEMENT Star (Name, Address+, Movies)><!ELEMENT Name (#PCDATA)><!ELEMENT Address (#PCDATA | (Street, City))><!ELEMENT Street (#PCDATA)><!ELEMENT City (#PCDATA)><!ELEMENT Movies (Movie*)><!ELEMENT Movie (Title, Year)><!ELEMENT Title (#PCDATA)><!ELEMENT Year (#PCDATA)>

] >

262 Database Technology, EDA636

A Document Following a DTD<? xml version = ”1.0” standalone = ”no” ?><! DOCTYPE Stars SYSTEM ”star.dtd”><Stars>

<Star> <Name>Carrie Fisher</Name><Address> <Street>123 Maple St.</Street>

<City>Hollywood</City></Address><Address>5 Locust Ln. Malibu�</Address><Movies>

<Movie><Title>Star Wars</Title><Year>1977</Year> </Movie>

<Movie><Title>Empire Strikes Back</Title><Year>1980</Year> </Movie>

</Movies></Star>…

</Stars>

263 Database Technology, EDA636

More About DTD’s

• Any element or attribute not defined in the DTD generatesan error

• There is a need to parse XML documents and validate themagainst the DTD

• SYSTEM indicates that the DTD is intended for private use,PUBLIC references a public DTD

• Importing external DTD’s:<!DOCTYPE rss PUBLIC

”–//Netscape Communications//DTD RSS 0.91//EN””http://my.netscape.com/publish/formats/rss-0.91.dtd”>

(Rich Site Summary, XML format for sharing headlines andother web content.)

264 Database Technology, EDA636

Attribute Lists

It should be clear that well-formed XML can describe a tree ofdata. However, our initial example of semistructured data wasa graph.Opening XML tags can have attributes that appear within thetag (similar to HTML <a href = ”…”>). One common use ofattributes is to associate single, labeled values with a tag.But you can also use attributes to express “links” in the data,using ID’s and IDREF’s.

265 Database Technology, EDA636

A DTD with ID’s and IDREFS

<!DOCTYPE Stars-Movies [<!ELEMENT Stars-movies (Star* Movie*)<!ELEMENT Star (Name, Address+)>

<!ATTLIST StarstarId IDstarredIn IDREFS>

<!ELEMENT Movie (Title, Year)><!ATTLIST Movie

movieId IDstarsOf IDREFS>

<!-- elements Name, … as before -->] >

266 Database Technology, EDA636

Document with ID’s and IDREF’s

<Stars-Movies><Star starId = ”cf” starredIn = ”sw, esb, rj”>

<Name>Carrie Fisher</Name><Address> <Street>123 Maple St.</Street>

<City>Hollywood</City> </Address></Star><!-- similar for Mark Hamill, ”mh” --><Movie movieId = ”sw” starsOf = ”cf, mh”>

<Title>Star Wars</Title><Year>1977</Year>

</Movie><!-- similar for Empire Strikes Back, ”esb”, and Return ofthe Jedi, ”rj” -->

</Stars-Movies>

267 Database Technology, EDA636

SAX and DOMXML documents are intended to be handled by programs.In a program that processes an XML document you need toconvert the XML text into program data structures. There areseveral ways to do this (in Java and in other languages thathave implemented the standards). Two examples:SAX (Simple API for XML):

A “serial access” protocol. Event-driven: you register ahandler with a SAX parser, and the parser invokes yourcallback methods whenever it sees a new XML tag, orencounters an error, or wants to tell you anything else.

DOM (Document Object Model):Converts an XML document into a tree of objects in yourprogram. You can then manipulate the data in any waythat makes sense: modify the data, remove it, or insertnew data.

268 Database Technology, EDA636

SAX ExampleThe SAX parser calls methods from the ContentHandlerinterface, which you have to implement in your program(similar to a listener interface in AWT or Swing).Example:

<priceList>  [parser calls startElement]<coffee>  [parser calls startElement]

<name>  [parser calls startElement]Mocha Java  [parser calls characters]

</name>  [parser calls endElement]<price>11.95</price>  [parser calls startElement,

        characters, and endElement]</coffee>  [parser calls endElement]…

Notice: with SAX, you can only read an XML document, notmodify it in any way. The parser can check that an XMLdocument is valid (follows a given DTD).

269 Database Technology, EDA636

More about DOM

DOM defines a standard structure of XML documents inmemory. The structure is a tree, with nodes for the differentkinds of XML elements.DOM contains:

• a parser, so you can parse an existing XML documentand build a DOM representation in memory. Many DOMparsers use SAX parsers internally,

• an API to manipulate the DOM tree,• an API to create a new XML document from a DOM tree.

Naturally, you can also create a DOM tree in your programand programmatically create an XML document.

270 Database Technology, EDA636

Locating Data in a Document

Suppose that you have XML documents containinginteresting data, and you want only specific parts of that data.You could write a program that parses a document, builds aDOM tree, and then searches that tree using the DOM API.But this is often not flexible enough.Another example: in order to write a program that processesdifferent parts of an XML data structure in different ways, youneed to be able to specify the part of the structure you aretalking about at any given time.The XML Path Language, XPath, provides a syntax forlocating specific parts of an XML document.

271 Database Technology, EDA636

XPathXPath is an addressing mechanism that lets you specify apath to an element so that, for example, <article><title> can bedistinguished from <person><title>. That way, you candescribe different kinds of translations for the different <title>elements.Examples:/h1/h2 select all h2 elements under a h1 tag/h1[4]/h2[5] select the fourth h1 element, then the fifth h2

element under that/books/book/translation[ . = ’Japanese’]/../title

select the title element node for each book thathas a Japanese translation

As you can see from the examples, XPath expressions looklike search paths in a tree structured file system.(There is much more to XPath than this …)

272 Database Technology, EDA636

Transforming XML Documents

XML specifies how to identify data, but you often need totransform the data in predefined ways.Examples:

• present the data in a readable form, e.g., in HTML,XHTML, plain text, … Note that XML is text, but it is notintended to be read

• create another XML document, maybe in a differentformat

For this, you use XSLT, Extensible Stylesheet Language forTransformations. XSLT uses XPath to match nodes.XSLT is the first part of XSL, Extensible Stylesheet Language.The second part is XSL formatting objects.

273 Database Technology, EDA636

An XSLT ExampleOne common use for XSLT is to transform XML documents intoHTML. Internet Explorer can do this.A simple template stylesheet (intro.xsl):

<?xml version = "1.0"?><xsl:stylesheet version = "1.0"

xmlns:sxl = "http://www.w3.org/1999/XSL/Transform"><xsl:template match = "myMessage">

<html><body><xsl:value-of select = "message"/></body>

</html></xsl:template>

</xsl:stylesheet>

Uses XPath to find the myMessage element, then the messageelement, then puts the value inside HTML <body> tags.

274 Database Technology, EDA636

A Stylesheet Applied

An XML document that uses this stylesheet:<?xml version = "1.0"?>

<?xml stylesheet type = "text/xsl" href = "intro.xsl"?>

<myMessage><message>Welcome to XSLT!</message>

</myMessage>This is a normal XML document, with the addition of the “xmlstylesheet” tag.More in XSLT:

• variables• iteration, sorting• conditional processing

275 Database Technology, EDA636

Another XSLT Example

Another XML document that we will process using XSLT:

<?xml version="1.0"?><ARTICLE>

<TITLE>A Sample Article</TITLE><SECT>The First Major Section

<PARA>This section will introduce a subsection.</PARA>

<SECT>The Subsection Heading<PARA>This is the text of the subsection.</PARA>

</SECT></SECT>

</ARTICLE>

276 Database Technology, EDA636

The XSLT TemplateProcess the root element:<xsl:template match="/">

<html><body><xsl:apply-templates/>

</body></html></xsl:template>

Process the TITLE and SECT elements:<xsl:template match="/ARTICLE/TITLE">

<h1> align="center"> <xsl:apply-templates/> </h1></xsl:template><xsl:template match="/ARTICLE/SECT">

<h1> <xsl:apply-templatesselect="text()|B|I|U|DEF|LINK"/> </h1><xsl:apply-templates select="SECT|PARA|LIST|NOTE"/>

</xsl:template>

277 Database Technology, EDA636

XSLT Processors

The work necessary for transforming an XML document isperformed by an XSLT processor.Some alternatives:

• Write your own processor. Many standard classes inJava to do this

• If you want HTML presented in a web browser, useInternet Explorer and and the processor ’msxml’ fromMicrosoft

278 Database Technology, EDA636

Custom Markup Languages

XML is used as a markup language in many application areas.Examples:

MathML mathematical formulas. Like LaTeX, but processorindependent

CML chemical formulas, molecules, …SMIL multimedia presentationSVG scalable vector graphics. Lines, curves, etc.XBRL business information

279 Database Technology, EDA636

XML and Databases

An XML document is a collection of data, so in the strictestsense it is a database. It has some advantages as a database:

• self-describing, portable, can define data in trees orgraphs

• flexible schemas (DTD’s, XML Schema)• query languages (XPath, XQuery, …)• programming interfaces (SAX, DOM, …)

Also disadvantages:• verbose, access to data is slow (needs parsing)• lacks indexes, security, transactions, multi-user access,

triggers, queries across multiple documents, …

280 Database Technology, EDA636

XML for Transporting Data

One important use of XML in database systems is not to storedata, but to transport data extracted from a (relational)database.In commercial database systems there are nowadaysapplications for this “serializing” of data.Example, e-commerce:Use a relational database to store information aboutproducts, customers, etc. Use XML documents to transportthis information. Use an XSL stylesheet to convert theinformation for presentation.

281 Database Technology, EDA636

Extracting XML from a Database

An example of an XSLT template for a query returning XMLfrom a relational database:

<?xml version=”1.0”?><FlightInfo>

<Intro>The following flights are available:</Intro><Select>SELECT airline, flightNumber, depart, arrive

   FROM Flights</Select><Flight>

<Airline>$airline</Airline><FltNumber>$flightNumber</FltNumber><Depart>$depart</Depart><Arrive>$arrive</Arrive>

<Flight></FlightInfo>

When the template is processed, the query is executed andan XML document with the appropriate format is produced.

282 Database Technology, EDA636

XML for Storing Data

If your data is not structured in a way so it can beconveniently described by a relational (or object-oriented)schema, you can use a native XML database.As an example, suppose you have a Web site built from anumber of XML documents, and you would like to provide away for users to search the contents of the site. In this case,you will use a native XML database and execute queries in anXML query language.

283 Database Technology, EDA636

Native XML Databases

One possible definition of a native XML database is that it:• defines a logical model for an XML document, and

stores and retrieves documents according to that model• has an XML document as its fundamental unit of logical

storage, just as a relational database has a tuple in arelation as its fundamental unit of logical storage

• is not required to have any particular physical storagemodel. For example, it can be built on a relationaldatabase, or an object-oriented database, or use anormal file system

Note that it is not required that XML documents be stored astext. They may equally well be stored in some other format,such as the DOM model.

Database Technology, EDA636 284

Relational Algebra

285 Database Technology, EDA636

Relational Algebra

SQL and other query languages have a theoretical basis. Thisbasis is called relational algebra, “computing with relations”.Relational algebra is an algebra that operates on sets oftuples. It must be modified somewhat to handles bags(multivalued sets), which are used in commercial DBMS’s.Relational algebra is good for:

• understanding what queries that can be expressed• expressing queries non-ambiguously and compactly• reasoning about queries, e.g., which queries that are

equivalent to each other• planning and optimizing query execution (only of

interest for query language implementers)

286 Database Technology, EDA636

Sets or Bags?

SQL handles bags instead of sets. In a bag, each value mayappear several times.The motivation behind this is that bags are more efficient.Consider the “union” operation:

• Take the union of two sets — you have to check eachtuple in the result so it only appears once

• Take the union of two bags — you just have toconcatenate the bags

287 Database Technology, EDA636

Basics of Relational Algebra

The operands in a relational algebra operation are relations.Basic operations:

• Set operations — union, intersection, difference• Operations that remove parts of a relation: “selection”

(remove tuples) and “projection” (remove attributes)• Operations that combine the tuples of two relations:

Cartesian product, different “join” operations• A renaming operation that changes the name of a

relation or the names of attributes

288 Database Technology, EDA636

Set Operations

The set operations operate on two relations R and S:

R U S UnionR ∩ S IntersectionR − S Difference

Naturally, R and S must be compatible in the sense that theyhave the same attribute names with the same types in thesame order.In SQL:

R union SR intersect SR except S

289 Database Technology, EDA636

ProjectionProjection: removal of attributes.

Operator π (“pi” for “project”).

title year length inColor studioName producerC#Star Wars 1977 124 true Fox 12345Mighty Ducks 1991 104 true Disney 67890Wayne's World 1992 95 true Paramount 99999

title year lengthStar Wars 1977 124Mighty Ducks 1991 104Wayne's World 1992 95

πtitle,year,length(Movie)

inColortrue

πinColor(Movie)

(A set, not a bag)

290 Database Technology, EDA636

Projection in SQLThe projection operator corresponds to the “select list” of anSQL statement:

select title, year, lengthfrom Movie;select inColorfrom Movie;

But note that the second SQL statement produces a bag withthree tuples instead of a set:

“select distinct” produces a set.

inColortruetruetrue

291 Database Technology, EDA636

SelectionSelection: choose tuples based on some condition.

Operator σ (“sigma” for “select”).

title year length inColor studioName producerC#Star Wars 1977 124 true Fox 12345Mighty Ducks 1991 104 true Disney 67890Wayne's World 1992 95 true Paramount 99999

σlength≥100(Movie)

title year length inColor studioName producerC#Star Wars 1977 124 true Fox 12345Mighty Ducks 1991 104 true Disney 67890

292 Database Technology, EDA636

Selection in SQL

The selection operator corresponds to the “where condition”of an SQL statement:

select *from Moviewhere length >= 100;

Note that the selection operator of relational algebra hasnothing to do with the select clause of SQL …

293 Database Technology, EDA636

Cartesian ProductCartesian product, “cross product”: combine every tuplefrom a relation with every tuple from another relation.

Operator ×. Two relations, R and S:

B C D2 5 64 7 89 10 11R × S

A B1 23 4

A R.B S.B C D1 2 2 5 61 2 4 7 81 2 9 10 113 4 2 5 63 4 4 7 83 4 9 10 11

294 Database Technology, EDA636

Cartesian Product in SQL

In SQL, when selection from two relations with a ’*’ in theselect clause, and no where-condition, is performed, theresult is the Cartesian product between the relations:

select *from R, S;

Alternatively:select *from R cross join S;

There is usually not much use for the “unrestricted”Cartesian product. When you restrict the product withconditions in the where clause, you get a join instead.

295 Database Technology, EDA636

Natural JoinNatural join: take the Cartesian product of two relations, butkeep only the tuples whose values match on attributes withthe same name.Operator         .

B C D2 5 64 7 89 10 11

A B1 23 4

A B C D1 2 5 63 4 7 8

R     S

296 Database Technology, EDA636

Natural Join in SQL

In SQL, a natural join is performed when you select from tworelations, give a where-condition that expresses equality, andrestrict the select list:

select A, R.B, C, Dfrom R, Swhere R.B = S.B;

orselect *from R natural join S;

297 Database Technology, EDA636

Another Natural Join Example

Two relations U and V:

A B C1 2 36 7 89 7 8

A B C D1 2 3 41 2 3 56 7 8 109 7 8 10

U     V

B C D2 3 42 3 57 8 10

298 Database Technology, EDA636

Theta JoinTheta join: join relations on an arbitrary condition.Operator           (the condition is C).C

A B C1 2 36 7 89 7 8

A U.B U.C V.B V.C D1 2 3 2 3 41 2 3 2 3 51 2 3 7 8 106 7 8 7 8 109 7 8 7 8 10

B C D2 3 42 3 57 8 10

U     VA<D

Note U.B ≠ V.B

299 Database Technology, EDA636

Combining Operations

Relation Movies(title, year, length, filmType, studioName).What are the titles and years of movies made by Fox that areat least 100 minutes long?

σlength ≥ 100 σstudioName = ’Fox’

πtitle, year

MoviesMovies

300 Database Technology, EDA636

Equivalent Queries

πtitle, year (σlength ≥ 100 (Movies) ∩ σstudioName = ’Fox’ (Movies))

πtitle, year (σlength ≥ 100 AND studioName = ’Fox’ (Movies))

It is the task of the query optimizer to rewrite queries in themost efficient form. To do this, it uses:

• relational algebra rules• ad hoc rules, e.g., perform selections as early as

possible• usage statistics collected by the DBMS

301 Database Technology, EDA636

Join Example

Relations:Movies1(title, year, length, filmType, studioName)Movies2(title, year, starName)

Find the stars of movies that are at least 100 minutes long.

πstarName (σlength ≥ 100 (Movies1          Movies2))

302 Database Technology, EDA636

RenamingRenaming: rename a relation and invent new names for theattributes.

Operator ρ (“rho” for “rename”).

B C D2 5 64 7 89 10 11

R × ρS(X,C,D)(S)

A B1 23 4

A B X C D1 2 2 5 61 2 4 7 81 2 9 10 113 4 2 5 63 4 4 7 83 4 9 10 11

303 Database Technology, EDA636

Independent Operations

The following operations are “independent”, i.e. basic:• union• difference• selection• projection• Cartesian product• renaming

The other operations can be expressed in terms of the basicoperations. Examples:

C

R ∩ S  =  R − (R − S)

R         S  =  σC (R × S)

304 Database Technology, EDA636

Relational Operations on Bags

Relational algebra has sets of tuples as operands.Commercial query languages use bags instead of sets.The relational algebra has to be extended to handle bags. Theextensions are mostly straight-forward and present nosurprises.

305 Database Technology, EDA636

Constraints on RelationsAmong other constraints, referential integrity constraints can beexpressed in relational algebra:

Movies(title, year, length, filmType, studioName, prodNbr)MovieExecs(name, address, certNbr, netWorth)

Here, prodNbr must be one of the certNbr’s (every movie musthave a producer).In relational algebra:

πprodNbr (Movies) is-a-subset-of  πcertNbr (MovieExecs)

In SQL:

create table Movies (…foreign key (prodNbr) references MovieExecs(certNbr)

);

306 Database Technology, EDA636

Extended Operators

The basic select-from-where structure of SQL statements can beexpressed with the relational algebra operators that we havedescribed on the previous slides.However, there are other features of SQL that need differentoperators:

• “select distinct” — extended operator δ

• “group by” — extended operator γ

• expressions in the select list — extended projection operator π

• “order by” — extended operator τ

307 Database Technology, EDA636

Outerjoins

As described earlier, the purpose of the join operation is tomatch tuples from two relations that agree on the values of twoattributes.There are cases when you wish to include “dangling” tuples inthe output, i.e. tuples that fail to match with a tuple in the otherrelation. The missing attributes are padded with NULL.These cases are handled by different kinds of outerjoins.Operator:In practice, you normally consider one of the relations as the“basis”, the tuples of which you want included in the outputeven if they have no match in the other relation. Then, you use aleft or right outerjoin, indicated with L or R on the operator.

308 Database Technology, EDA636

Outerjoin Example

Two relations U and V, and the result of their natural left outerjoin:

A B C1 2 34 5 67 8 9

A B C D1 2 3 101 2 3 114 5 6 null7 8 9 null

B C D2 3 102 3 116 7 12

U       VL