Data Structures, Algorithms and Database Programming
Semester 2/ Weeks 13-24
Database Programming
Nick Rossiter/Emma-Jane Phillips-Tait
Semester 2/ Week 13
Introduction
• Database Programming
  – A program is defined simply as:
    • a sequence of instructions that a computer can interpret and execute
  – So SQL (Structured Query Language)
    • the ISO standard language for relational databases
  – is a programming language
SQL - Classification
• SQL is the basis of all database programming
• As a language SQL is:
  – Non-procedural
    • Specify the target, not the mechanism (what not how)
  – Safe
    • Negations limited by context
  – Set-oriented
    • All operations are on entire sets
  – Relationally complete
    • Has the power of the relational algebra
  – Functionally incomplete
    • Does not have the power of a programming language like Java
SQL – Program Constructions
SELECT id, name, address
FROM student
WHERE name = 'Mary Brown';
• FROM clause specifies tables to be queried (source/range)
• WHERE clause specifies restriction on values to be processed (predicate)
• SELECT clause specifies what is to be retrieved (target)
Some properties of SQL re-visited
• Non-procedural
  – No loops or tests for end of file
• Set-oriented
  – The operation is automatically applied to all the rows in STUDENT
• Relationally complete
  – Restrict shown here (all others are available)
• Functionally incomplete
  – Does not matter here if just want information displayed
SQL Program - Example of Natural Join
SELECT student.id, name, address, year
FROM student, module_choice
WHERE name = 'Mary Brown'
AND module = 'CM503'
AND student.id = module_choice.id;
• Last line does primary key – foreign key match
• “Give details of when a student called Mary Brown took module CM503”
Student: Id * | Name | Address
Module_choice: Module * | Id * | year
Module_choice.Id is foreign key to Student.id; represents a path along which joins are made
* Indicates component of primary key
Data names in SQL are case insensitive; SQL values are case sensitive.
Student
Id * | Name       | Address
127  | Mary Brown | Hexham
296  | John Brown | Morpeth
654  | Mary Brown | Newcastle

Module_choice
Module * | Id * | year
CM503    | 127  | 2003
CM503    | 654  | 2001
cm503    | 127  | 2002
For the above data values, what’s the answer?
Id  | Name       | Address   | Year
127 | Mary Brown | Hexham    | 2003
654 | Mary Brown | Newcastle | 2001
Column name contains only one value (as would a module column)
Why only 2 rows? Why is one ‘127’ match missing?
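The join can be tried out directly. The following is a minimal sketch, assuming an in-memory SQLite database driven from Python as a stand-in for Oracle; the table definitions are inferred from the slide, and SQLite (like Oracle) compares these string values case-sensitively by default.

```python
import sqlite3

# SQLite in memory stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE module_choice (
        module TEXT,
        id     TEXT REFERENCES student(id),
        year   INTEGER,
        PRIMARY KEY (module, id)
    );
    INSERT INTO student VALUES ('127', 'Mary Brown', 'Hexham');
    INSERT INTO student VALUES ('296', 'John Brown', 'Morpeth');
    INSERT INTO student VALUES ('654', 'Mary Brown', 'Newcastle');
    INSERT INTO module_choice VALUES ('CM503', '127', 2003);
    INSERT INTO module_choice VALUES ('CM503', '654', 2001);
    INSERT INTO module_choice VALUES ('cm503', '127', 2002);
""")

# The join from the slide; 'cm503' does not equal 'CM503', so that row drops out.
join_rows = conn.execute("""
    SELECT student.id, name, address, year
    FROM student, module_choice
    WHERE name = 'Mary Brown'
      AND module = 'CM503'
      AND student.id = module_choice.id
    ORDER BY student.id
""").fetchall()

for row in join_rows:
    print(row)
```

Only two rows come back because the value 'cm503' is a different string from 'CM503', even though the column name module could be written in any case.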
Rewriting Joins as Intersections
• SQL is not necessarily run in the way you enter it
• You (or the system) could rewrite the join earlier as:
SELECT id, name, address
FROM student
WHERE name = 'Mary Brown' AND id IN
(SELECT id FROM module_choice
WHERE module = 'CM503');
• There’s one difference. Why?
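A sketch of the rewritten query, again assuming SQLite from Python in place of Oracle, with the sample data from the earlier slide. It suggests one way to see the difference: the subquery form can only return columns of student.

```python
import sqlite3

# SQLite in memory stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE module_choice (module TEXT, id TEXT, year INTEGER,
                                PRIMARY KEY (module, id));
    INSERT INTO student VALUES ('127', 'Mary Brown', 'Hexham');
    INSERT INTO student VALUES ('296', 'John Brown', 'Morpeth');
    INSERT INTO student VALUES ('654', 'Mary Brown', 'Newcastle');
    INSERT INTO module_choice VALUES ('CM503', '127', 2003);
    INSERT INTO module_choice VALUES ('CM503', '654', 2001);
    INSERT INTO module_choice VALUES ('cm503', '127', 2002);
""")

# Same students are found, but year belongs to module_choice, which is only
# visible inside the subquery, so it cannot appear in the outer target list.
in_rows = conn.execute("""
    SELECT id, name, address
    FROM student
    WHERE name = 'Mary Brown'
      AND id IN (SELECT id FROM module_choice WHERE module = 'CM503')
    ORDER BY id
""").fetchall()

for row in in_rows:
    print(row)
```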
SQL controls the filing cabinet
• Defines data structures (CREATE TABLE, CREATE VIEW, …)
• Handles updates (INSERT, DELETE, COMMIT, ROLLBACK, …)
• Provides retrieval (SELECT)
• But it is not functionally complete in its interactive form
Functional Incompleteness in SQL
• No control statements such as:
  – Case, Repeat, If, While
• No substitution at run time:
  – e.g. … WHERE id = :idread
  – where idread is a program variable
• You don’t see travel agents typing in SQL statements to search for holiday vacancies
  – although they will be searching a relational database
• There is SQL underneath
  – But its functionality is increased through additional features
Getting More out of Basic SQL
• To overcome functional incompleteness:
  – Pre-defined Functions
  – Procedures (e.g. PL/SQL)
  – User-defined Functions
  – Embedded SQL
  – Web-based Servers (e.g. Microsoft/ASP, Oracle/JSP, Oracle/JDBC, MySQL/PHP)
Pre-defined Functions in SQL
• An SQL function:
  – Is a method applied to a particular type
  – Returns a single value
• There are many pre-defined functions:
  – Can be used without any knowledge of how they are implemented
  – All can be used in target (SELECT) and some in predicate (WHERE)
  – Used in areas such as string handling, simple statistics, date manipulation, type casting
• Use pre-defined functions where available to avoid writing your own code
Example of Predefined Function
SELECT student.id, name, address, year
FROM student, module_choice
WHERE name = 'Mary Brown'
AND upper(module) = 'CM503'
AND student.id = module_choice.id;
• Upper(char) takes a value of type char and forces it to upper case
• In the example above it does not update module values in the module_choice table
• So how many rows are retrieved by this join?
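The effect of upper() can be checked with a small sketch, assuming SQLite from Python as a stand-in for Oracle and the sample data from week 13. The stored value 'cm503' is left unchanged; it is only folded to upper case for the comparison.

```python
import sqlite3

# SQLite in memory stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE module_choice (module TEXT, id TEXT, year INTEGER,
                                PRIMARY KEY (module, id));
    INSERT INTO student VALUES ('127', 'Mary Brown', 'Hexham');
    INSERT INTO student VALUES ('296', 'John Brown', 'Morpeth');
    INSERT INTO student VALUES ('654', 'Mary Brown', 'Newcastle');
    INSERT INTO module_choice VALUES ('CM503', '127', 2003);
    INSERT INTO module_choice VALUES ('CM503', '654', 2001);
    INSERT INTO module_choice VALUES ('cm503', '127', 2002);
""")

upper_rows = conn.execute("""
    SELECT student.id, name, address, year
    FROM student, module_choice
    WHERE name = 'Mary Brown'
      AND upper(module) = 'CM503'
      AND student.id = module_choice.id
    ORDER BY student.id, year
""").fetchall()

# The table itself still holds the lower-case value.
stored_modules = sorted(r[0] for r in conn.execute("SELECT module FROM module_choice"))

print(len(upper_rows), "rows:", upper_rows)
```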
Other String-Handling Functions
• Include
  – CONCAT(arg1, arg2)
    • concatenation of the two arguments arg1, arg2
  – TRIM(arg1)
    • removes leading and trailing blanks from arg1
  – LOWER(arg1)
    • translates arg1 to lower case
  – UPPER(arg1)
    • translates arg1 to upper case
  – SUBSTR(arg1,n,m)
    • returns m characters of arg1 starting at position n, that is positions n…(n+m-1)
• arg1, arg2 are char types; n, m are integer types
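A short sketch of these string functions, assuming SQLite from Python rather than Oracle. The || operator is used for concatenation because it is standard SQL and available in both systems; the literal values are made up for illustration.

```python
import sqlite3

# SQLite in memory stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")

string_row = conn.execute("""
    SELECT 'Mary' || ' ' || 'Brown',    -- concatenation
           TRIM('  Hexham  '),          -- strip leading/trailing blanks
           LOWER('CM503'),
           UPPER('cm503'),
           SUBSTR('Newcastle', 4, 6)    -- 6 characters starting at position 4
""").fetchone()

print(string_row)
```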
Predefined Aggregation Functions
• Functions operating on collections include:
  – AVG(setN)
    • returns average of setN
  – SUM(setN)
    • returns sum of setN
  – COUNT(setR)
    • returns count of setR
  – MAX(setT)
    • returns maximum of setT
• setN is a set of type number, setR is a set of type row, setT is a set of any type
• Sets may be formed as columns of values via:
  – a SELECT command on a table or
  – a GROUP BY on a table
Predefined Date Functions
• Include:
  – SYSDATE
    • no arguments, returns current date
  – MONTH(arg3)
    • returns month component of arg3
  – YEAR(arg3)
    • returns year component of arg3
  – MONTHS_BETWEEN(arg3,arg4)
    • the number of months between arg3 and arg4
  – In some versions of Oracle, need to use syntax e.g. {fn MONTH(arg3)}
• arg3, arg4 are date types
• (arg3-arg4) gives number of days between the two dates
• time handling is also available within date type
Semester 2/ Week 14
Programming with SQL*Plus
• Predefined functions play an increasing role
  – Sometimes termed built-in functions
• Somewhat unstable
  – New functions in Oracle 9i
  – Some redefinitions of Oracle 8i functions
  – Not always upwards compatible
  – What works in one system may not work in another without some tweaking
Reference Material
• So programmers must always consult reference material
• Text books and lecture notes are not reference manuals
• All database manuals are available on-line e.g.
  – DB2 notes cited in exercises for week 13 (should be very similar to Oracle as same standard)
  – Oracle 9i notes available from (prefix id by unn/)
  – http://cgweb1.unn.ac.uk/SubjectAreaResources/database/oracle/doc/
• Sound advice is: Read the Manual!
Reading Variable Values into SQL Programs
• Makes programs dynamic
• Number of methods
  – Substitution variables
    • &variable in programs
  – Accept statements
    • User-defined prompt and type for a variable value
  – Script parameters
    • Variables assigned values on executing a script file
• Such reads make programs versatile
  – Can run with values specified by user at run-time
Substitution Variables
select *
from patient
where pid = '&pno';
• when run, prompts user for value for pno
• quotes indicate a char value
select *
from patient
where pid <= '&&pno'
and pid >= '&&pno';
• Double && means only one prompt is made
  – even if same variable occurs more than once in program
• In all input operations quotes are also used for dates but not for numbers.
Accept Statement
• ACCEPT sp_variable type PROMPT 'string';
  – where upper case is literal
  – sp_variable is the SQL*Plus variable being assigned
  – type is the type of the SQL*Plus variable
  – string is the prompt
• Example:
accept pat_no char prompt 'Enter patient id:';
select *
from patient
where pid = '&pat_no';
• Value entered for pat_no is available for rest of session.
Script Parameters
• Can run a script file called S.sql (type sql is required by default) by:
@S
• Say S contains
select *
from patient
where pname = '&1' AND pid = '&2';
• Then can run the script by:
@S 'Fred' '1';
• 'Fred' is parameter 1 and '1' is parameter 2
• May need to use file/open in SQL*Plus to set directory
Nulls
• In more serious programming with SQL
  – often need to know whether a variable has been initialised or not.
• An un-initialised variable has a null value
  – unless a default has been supplied
• Cannot search for nulls as '' or ""
select * from patient where pname IS NULL;
• finds rows where patient name is null
select * from patient where pname IS NOT NULL;
• finds rows where patient name is not null
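A minimal sketch of null searching, assuming SQLite from Python in place of Oracle and a cut-down patient table with only pid and pname (the sample rows are made up). It shows that a comparison with = never finds nulls, while IS NULL and IS NOT NULL do.

```python
import sqlite3

# SQLite in memory stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (pid TEXT PRIMARY KEY, pname TEXT)")
conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [("1", "Smith"), ("2", None)])  # None is stored as NULL

def pids(where_clause):
    """Return the pids selected by the given WHERE clause."""
    sql = "SELECT pid FROM patient WHERE " + where_clause + " ORDER BY pid"
    return [row[0] for row in conn.execute(sql)]

equals_null = pids("pname = NULL")     # comparison with NULL is never true
equals_empty = pids("pname = ''")      # an empty string is not a match either
is_null = pids("pname IS NULL")
is_not_null = pids("pname IS NOT NULL")

print(equals_null, equals_empty, is_null, is_not_null)
```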
Spooling
• Useful to record a whole session
• From the File menu in SQL*Plus:
  – Can set a (text) file as the recipient of all output including commands
  – Need
    • SET ECHO ON
  – to have a record of everything
Intermediate Results
• Building up results in stages is often a good idea:
  – Can check intermediate results for correctness
  – May be able to re-use intermediate results in more than one way
• Views may be used for this purpose
  – No data storage costs for a view
  – Updated automatically as data changes (in effect)
    • Reflects latest data position
• Tables are less satisfactory
  – Duplicate data storage
  – Out of date as snapshot of data held
Examples of Views
create view pv as
(select patient.*, visits.did, visits.vdate
from patient, visits
where patient.pid = visits.pid);
• Natural join of patient and visits, contains:
  – pid, pname, address, dobirth, date_reg, did, vdate
create view dv as
(select doctor.*, visits.pid, visits.vdate
from doctor, visits
where doctor.did = visits.did);
• Natural join of doctor and visits, contains:
  – did, dname, date_start, pid, vdate
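The claim that a view reflects the latest data position can be checked with a small sketch. It assumes SQLite from Python as a stand-in for Oracle, cut-down patient and visits tables, and made-up sample rows; only the view definition follows the slide.

```python
import sqlite3

# SQLite in memory stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (pid TEXT PRIMARY KEY, pname TEXT);
    CREATE TABLE visits (pid TEXT, did TEXT, vdate TEXT);
    INSERT INTO patient VALUES ('5', 'Smith');
    INSERT INTO visits VALUES ('5', 'd1', '2002-01-10');
    CREATE VIEW pv AS
        SELECT patient.*, visits.did, visits.vdate
        FROM patient, visits
        WHERE patient.pid = visits.pid;
""")

before = conn.execute("SELECT * FROM pv ORDER BY vdate").fetchall()

# A later insert into a base table shows up in the view with no further action.
conn.execute("INSERT INTO visits VALUES ('5', 'd2', '2002-03-05')")
after = conn.execute("SELECT * FROM pv ORDER BY vdate").fetchall()

print(len(before), "row before,", len(after), "rows after")
```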
Operations on Views 1
• Can select and search as if they were tables
• Updates may cause problems (not considered here)
• Examples
a) select * from pv;
b) select * from pv where pid = '5';
c) create view pvv as
(select pv.*, action, vaccinated
from pv, vaccinations vc
where pv.pid = vc.pid
and pv.vdate = vc.vdate);
• vc is alias for vaccinations
• Does natural join of pv and vaccinations over pid and vdate.
• Natural join pvv contains:
  – pid, pname, address, dobirth, date_reg, did, vdate, action, vaccinated
Operations on Views 2
• View pvv is the natural join of patient, visits and vaccinations
• It can be presented to users as a structure:
  – For ease of searching (just where clause)
  – In which no knowledge of joins is required
select distinct pid, pname from pvv
where upper(vaccinated) = 'TYPHOID'
and address = 'Heaton'
and vdate < '25-apr-2002';
• Searches for values in all three base tables with joins (that is logical connections) already made in pvv.
Setting the SQL*Plus Environment
• Over 50 variables control the environment in which SQL*Plus runs.
• Can see all their current settings through:
show all;
• Cover appearance of prompts, formats on screen, transaction settings, recovery, escape, compatibility, …
• A potential pitfall for imported applications if different environment assumed
SHOW var;
• shows the current setting for a particular variable
Examples of Environment Variables 1
• Autocommit
  – If on, updates are committed after each update command (insert, delete, update)
  – If off, updates are not committed after each update command
• Time
  – If on, all prompts are preceded by the time, giving time stamping
  – If off, time is not displayed with the prompt
Examples of Environment Variables 2
• Linesize
  – Set to an integer giving width of line display
• SQLPrompt
  – Can vary default prompt from SQL>
• Feedback
  – If on, report on number of rows found
  – If off, give no report
• Echo
  – If on, echo input on screen
  – If off, do not echo input
Setting Environment Variables
• SET variable value;
  – variable is the environment variable
  – value is the new value
• Examples:
  – set autocommit on;
  – set sqlprompt input>;
  – set linesize 100;
Semester 2/ Week 15
SQL*Plus Scripting 1
• Plus points:
  – Same SQL language as in interactive mode
    • Can test programs interactively first
    • Includes predefined (built-in) functions
  – Fast development possible
    • Rapid prototyping
    • Get results and feedback quickly
    • No cumbersome environment
  – Variable inputs
    • Parameters, substitution, accept
SQL*Plus Scripting 2
• Plus points (continued)
  – Can have multiple script files
    • Each file created by simple text editor
  – Can have master script file
    • calling others in sequence
  – Or can nest script files more generally
    • scripts can call other scripts (default file type sql)
      – @S6
      – start S6
SQL*Plus Scripting 3
• Problems with scripts:
  – interpreted each time they are run
    • not verified and compiled
    • optimisation of SQL code done each run time
    • poor performance
  – no control environment
    • procedural actions lacking (case, if, while, for, repeat)
    • no error handling
      – resulting in outright failures or ignoring of messages
SQL*Plus Scripting 4
• Problems with scripts (continued):
  – Lack of control by business (via DBA -- DataBase Administrator)
  – How do we permit scripts for usage by particular people?
    • Can anyone write a script to do anything they like?
  – If people write scripts themselves to handle business rules
    • how do we know they’ve implemented the rules in the same way?
Example: Different Representation of Same Rule
• update patient set dobirth = '20-feb-1932' where pid = '3';
• select round(months_between(sysdate,dobirth)/12) as age_mb from patient;
• select trunc((sysdate-dobirth)/365.25) as age_dd from patient;
• select trunc((sysdate-dobirth)/365) as age_ddnl from patient;
• select (extract(year from sysdate)-extract(year from dobirth)) as age_yr from patient;
• Above:
  – Alter dobirth for patient '3'
  – Run four queries, each one, according to the user, calculating the age.
Results 14-Feb-2004
pid      | 1  | 2  | 3  | 4
Age_mb   | 34 | 20 | 72 | 22
Age_dd   | 33 | 20 | 71 | 22
Age_ddnl | 33 | 20 | 72 | 22
Age_yr   | 34 | 21 | 72 | 22
Comments on Table
• Minor differences such as these
  – often more of a problem than major differences
• If big differences
  – these rapidly become obvious
  – e.g. paying an interest rate ten times more than others
• Age differences here could play havoc with:
  – social security benefits
• Later we will create a function to calculate the age precisely
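The column for patient 3 can be reproduced outside the database. The sketch below is plain Python, not Oracle: months_between is a hand-written approximation of Oracle's MONTHS_BETWEEN (whole months plus a fraction based on a 31-day month, ignoring the last-day-of-month special case), and exact_age is a hypothetical helper added for comparison.

```python
from datetime import date

def months_between(later, earlier):
    """Approximate Oracle's MONTHS_BETWEEN (assumption: 31-day month fraction)."""
    return ((later.year - earlier.year) * 12
            + (later.month - earlier.month)
            + (later.day - earlier.day) / 31)

def exact_age(today, born):
    """Completed years: subtract one if this year's birthday is still to come."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)

today = date(2004, 2, 14)     # the run date on the results slide
dobirth = date(1932, 2, 20)   # patient 3 after the update
days = (today - dobirth).days

age_mb = round(months_between(today, dobirth) / 12)
age_dd = int(days / 365.25)   # int() truncates, like trunc()
age_ddnl = int(days / 365)
age_yr = today.year - dobirth.year
age_exact = exact_age(today, dobirth)

print(age_mb, age_dd, age_ddnl, age_yr, age_exact)
```

Three of the four rules report 72 although the patient's 72nd birthday is still six days away; only the 365.25-day rule agrees with the exact calculation in this case.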
Production Environment
• Encourages:
  – business rules in one place
    • application of rules then controlled by DBA
  – users need permission to apply rules
    • permission is granted/revoked by DBA
• Discourages:
  – duplicated, potentially inconsistent, rules
  – access by users to anything they like
SQL Procedure
• An important technique
• Part of PL/SQL in Oracle
  – Procedural Language/Structured Query Language
• Part of the SQL standard
  – approximate portability from one system to another
• Techniques are available for:
  – procedural control (case, if, while, …)
  – parameterised input/output
  – security
Oracle PL/SQL
• Not available in Oracle 8i Lite
• Available in Oracle 9i at Northumbria
• Available in Oracle 9i Personal Edition for Windows (XP/NT/2000/98) and Linux.
  – http://otn.oracle.com/software/products/oracle9i/index.html
  – c. 1.4 GB download -- needs Broadband -- 3 CDs
• Useful guide to PL/SQL:
  – http://www-db.stanford.edu/~ullman/fcdb/oracle/or-plsql.html
  – Using Oracle PL/SQL -- Jeffrey Ullman, Stanford University
Procedures are First-class Database Objects
• Procedures are held in database tables
  – under the control of the database system
    • in the data dictionary
select object_type, object_name
from user_objects
where object_type = 'PROCEDURE';
• user_objects is data dictionary table maintained by Oracle
• object_type is attribute of table user_objects holding value 'PROCEDURE' (upper case) for procedures
  – other values for object_type include 'TABLE', 'VIEW'
• object_name is user assigned name for object e.g. 'PATIENT'
Procedures aid Security
• Privileges on Tables:
  – Select
    • query the table with a select statement.
  – Insert
    • add new rows to the table with insert statement.
  – Update
    • update rows in the table with update statement.
  – Delete
    • delete rows from the table with delete statement.
  – References
    • create a constraint that refers to the table.
  – Alter
    • change the table definition with the alter table statement.
  – Index
    • create an index on the table with the create index statement
Privileges on Tables
• SQL statement -- issued by DBA:
  – GRANT select, insert, update, delete ON patient TO cgnr2;
  – 'no grants to cgnr3 for table access'
• allows user cgnr2 to issue SQL commands:
  – beginning with SELECT, INSERT, UPDATE, DELETE on table patient
• but this user cannot issue SQL commands
  – beginning with REFERENCES, ALTER, INDEX on table patient
• User cgnr3 does not know that table patient exists
Privileges on Procedures
• The SQL statement
  – GRANT execute ON add_patient TO cgnr3;
• allows user cgnr3 to execute the procedure called add_patient
• So user cgnr3 can add patients
  – presumably the task of add_patient
• but cannot do any other activity on the patient table
  – including SELECT
• So procedures give security based on tasks
  – powerful task-based security system
Semester 2/ Week 16
SQL Procedure Construction
• Simple example, SQL*Plus Window:
SQL> create or replace procedure add_patient as
  2  begin
  3  insert into patient values('99','Smith','Newcastle','12-mar-1980');
  4  end;
  5  /
Warning: Procedure created with compilation errors
SQL> show errors
Errors for PROCEDURE ADD_PATIENT
LINE/COL ERROR
-------- -----------------------------------------
3/1      PL/SQL: SQL Statement ignored
3/13     PL/SQL: ORA-00947: not enough values
Technique
• Have procedure code in text file managed by simple editor
  – E.g. Notepad
create or replace procedure add_patient as
begin
insert into patient values('99','Smith','Newcastle','12-mar-1980');
end;
/
• Copy and paste code from text file into SQL*Plus window
• Oracle does keep a copy in its data dictionary
• Many users work from text files
Features of Procedure
• CREATE OR REPLACE PROCEDURE add_patient AS
  – Either add or over-write procedure called add_patient
    • Needs care
    • Could over-write existing procedure
  – IS is alternative for AS
• BEGIN and END
  – Start and finish block
• INSERT is standard SQL statement
• / means compile
Error Tracking
• 'with compilation errors'
  – Problem(s) encountered in compilation
• Look at these through SQL command
  – SHOW ERRORS (SHO ERR abbreviation)
• Diagnostics:
  – Statement at line 3 ignored
    • As not enough values at line 3, column 13 for patient
    • Five columns in patient, four in insert statement
  – So in compilation tables are checked for compatibility with procedure operations
• ORA-00947 is an Oracle return code for 'not enough values'
• Only execute procedures compiled without errors
Try again
SQL> create or replace procedure add_patient (reg in char) as
  2  begin
  3  insert into patient values('99','Smith','Newcastle','12-mar-1980',reg);
  4  end;
  5  /
Procedure created.
Parameters
• Have added 5th variable to values
• Also added a parameter
  – reg
    • type char (as in SQL types) and in (input, read-only)
  – Other types at this level are number, date
• Message 'Procedure created' means:
  – No errors found
  – Procedure can be executed
  – Procedure is held in Oracle’s data dictionary
Data Dictionary entry for procedure
SQL> select object_type, object_name
  2  from user_objects
  3  where object_type = 'PROCEDURE';
OBJECT_TYPE
------------------
OBJECT_NAME
--------------------------------------------------
PROCEDURE
ADD_PATIENT
Executing Procedure
SQL> execute add_patient('14-feb-2002');
PL/SQL procedure successfully completed.
SQL> select * from patient where pid = '99';
PID    PNAME
------ --------------------
ADDRESS
-----------------------------------------------
DOBIRTH   DATE_REG
--------- ---------
99     Smith
Newcastle
12-MAR-80 14-FEB-02
Features of Execution
• '14-feb-2002' is value for parameter reg
  – declared as char in this version; converted to a date when inserted into date_reg
• Other values are hard-wired in procedure
• Message '… successfully completed'
  – No errors during run
• Subsequent SELECT confirms
  – New data entered for patient with pid = '99'
Now run procedure again
SQL> execute add_patient('14-feb-2002');
BEGIN add_patient('14-feb-2002'); END;
*
ERROR at line 1:
ORA-00001: unique constraint (CGNR1.PKP) violated
ORA-06512: at "CGNR1.ADD_PATIENT", line 3
ORA-06512: at line 1
Error – why?
• Attempt to add row with same primary key as last run (’99’).
• So violation at line 3 of procedure of primary key constraint CGNR1.PKP
  – CGNR1 is id
  – PKP is constraint from CREATE TABLE
    • create table patient (
    • pid char(6) constraint pkp primary key, ….
• 'ORA-00001: unique constraint violated'
  – Oracle return code and associated message
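The same failure can be provoked in a small sketch. It assumes SQLite from Python rather than Oracle, so the error arrives as the Python exception sqlite3.IntegrityError instead of ORA-00001; the Python function add_patient and the two-column patient table are illustrative stand-ins for the stored procedure and its table.

```python
import sqlite3

# SQLite in memory stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (pid TEXT CONSTRAINT pkp PRIMARY KEY, pname TEXT)")

def add_patient(pid, pname):
    """Hypothetical Python stand-in for the add_patient procedure."""
    conn.execute("INSERT INTO patient VALUES (?, ?)", (pid, pname))

add_patient("99", "Smith")          # first run succeeds

try:
    add_patient("99", "Smith")      # second run repeats the primary key
    second_run = "inserted"
except sqlite3.IntegrityError as exc:
    second_run = "rejected: " + str(exc)

row_count = conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]
print(second_run, "| rows in patient:", row_count)
```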
All values from parameters
SQL> CREATE OR REPLACE PROCEDURE add_patient (pid in char, pname in char, address in char, dobirth in date, regdate in date)
  2  AS
  3  BEGIN
  4  insert into patient values(pid,pname,address,dobirth,regdate);
  5  DBMS_OUTPUT.PUT_LINE ('Insert attempted');
  6  END;
  7  /
Procedure created.
No data hard-wired/output strings
• Usually meaningless to have hard-wired data values
  – Need dynamic input at run-time
  – Note two types: char, date
  – Values may be captured through SQL Forms
• Output strings
  – Varies from system to system
  – In Oracle
    • Use DBMS_OUTPUT.PUT_LINE
    • Needs earlier SQL command:
      – Set serveroutput on
  – Note output here is unconditional and vague
Execution with all values as parameters
SQL> execute add_patient('124','Smith','Edinburgh','13-nov-1980','27-dec-2002');
Insert attempted
PL/SQL procedure successfully completed.
SQL> select * from patient where pid = '124';
PID    PNAME
------ --------------------
ADDRESS
--------------------------------------------------------------------------------
DOBIRTH   DATE_REG
--------- ---------
124    Smith
Edinburgh
13-NOV-80 27-DEC-02
Make columns explicit
SQL> CREATE OR REPLACE PROCEDURE add_patient (pat_id in char, pat_name in char, pat_address in char, pat_dobirth in date, pat_regdate in date)
  2  AS
  3  BEGIN
  4  insert into patient(pid,pname,address,dobirth,date_reg) values(pat_id,pat_name,pat_address,pat_dobirth,pat_regdate);
  5  DBMS_OUTPUT.PUT_LINE ('Insert attempted');
  6  END;
  7  /
Procedure created.
• Specifying columns for patient makes procedure immune to any later changes in order of columns in patient
Semester 2/ Week 17
Transactions -- Rationale
• Consider two clients booking airline tickets
• There are 2 seats left on a flight
• Client A wants 2 seats:
  – time 12:02 makes initial request
  – 12:06 confirms purchase through booking form
  – 12:08 authorises credit card payment
• Client B wants 2 seats:
  – time 12:03 makes initial request
  – 12:05 confirms purchase through booking form
  – 12:09 authorises credit card payment
• Situation needs careful control
Some Possibilities
• Clients A and B are both told 2 seats are free in initial enquiries
• B confirms purchase before A
  – But A may still proceed
• A attempts credit card debit first
  – If successful A secures tickets at 12:08
• B then attempts credit card debit
  – If successful B secures tickets at 12:09
    • potentially over-writing A’s tickets
    • A has paid for tickets no longer his/hers
Requirements 1
• When client A beats B in the initial enquiry:
  – they should form a queue (serialisability)
  – B must wait for A to finish
• Different kinds of finish for A:
  – successful
    • completes booking form
    • makes credit card debit
    • store results (commit)
      – number of seats available is now zero
    • write transaction log and finish
    • B cannot proceed with purchase as no tickets left
Requirements 2
  – unsuccessful
    • may not complete booking form
    • may not have funds on credit card
    • undo any database changes (rollback) and finish
      – number of seats available is still 2
    • B can now proceed to attempt to purchase the 2 tickets left
• Techniques required emulate business practice
Transactions -- ACID
• A transaction is a unit of operation on a database.
  – typically comprises a collection of individual actions
    • e.g. in SQL INSERT, UPDATE, DELETE, SELECT
• Satisfies ACID requirements:
  – Atomicity
    • Collection of operations is viewed as a single data process
  – Consistency
    • Data integrity is preserved
  – Isolation
    • No interaction between one transaction and another
    • Intermediate results not viewable by others
  – Durability
    • Once completed, effect of transaction is guaranteed to persist
Transactions in SQL
• Logical units of work
• A group of related operations that
  – must be performed successfully
    • before any changes to the database are finalised.
• Variable size:
  – entire run on SQL*Plus
    • e.g. spend 2 hours inserting data
  – single command in SQL*Plus
    • e.g. one insert command
  – one execution of a procedure
    • e.g. one run of add_patient (week 16)
SQL approach may be informal
• No explicit
  – BEGIN transaction, END transaction
• With autocommit OFF
  – SET AUTOCOMMIT OFF
• Implicit BEGIN transaction by:
  – start of SQL*Plus session
• Implicit END transaction by:
  – end of SQL*Plus session
SQL Transaction commands
• Commit;– saves current database state
– releases resources held
– equivalent to Save and Exit in MS Word
• Rollback;– returns database state to that at start of transaction
– releases resources held
– equivalent to dismiss/ do not save changes in MS Word
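Commit and rollback can be seen in action with the seat-booking scenario. The sketch below assumes SQLite from Python in place of Oracle; the flight table and flight number NR101 are made up for illustration.

```python
import sqlite3

# SQLite in memory stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flight (flight_no TEXT PRIMARY KEY, seats_left INTEGER)")
conn.execute("INSERT INTO flight VALUES ('NR101', 2)")
conn.commit()                       # starting state: 2 seats left

def seats_left():
    return conn.execute(
        "SELECT seats_left FROM flight WHERE flight_no = 'NR101'").fetchone()[0]

book = "UPDATE flight SET seats_left = seats_left - 2 WHERE flight_no = 'NR101'"

# Unsuccessful finish: the change is visible inside the transaction,
# then rollback returns the database to its state at the start.
conn.execute(book)
during = seats_left()
conn.rollback()
after_rollback = seats_left()

# Successful finish: commit saves the new state.
conn.execute(book)
conn.commit()
after_commit = seats_left()

print(during, after_rollback, after_commit)
```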
Use of Commands
• Commit/Rollback
  – explicitly entered in:
    • SQL*Plus window interactively
    • PL/SQL code including procedures
  – implicitly entered
    • on normal EXIT from Oracle (commit 9i, rollback 8i Lite)
    • on abnormal exit from Oracle e.g. dismiss (rollback)
    • after each update command in SQL*Plus (commit)
      – when autocommit is ON (or IMMEDIATE)
    • after change to data definition, e.g. alter table (commit)
      – whatever autocommit setting
ACID in Oracle 1
• Atomicity
  – all commands in a transaction form a single logical group
• Consistency
  – integrity checks within transaction
• Isolation
  – data modified by transaction not visible to others until end of transaction
ACID in Oracle 2
• Durability
  – On Commit
    • database state is first saved
    • transaction log file is then updated
      – this log file may be held in several locations
    • confirmation of log file writes ends transaction
  – If crash (e.g. of disk) after commit
    • restore last save of database file
    • run transaction log on database forward from:
      – save point to last transaction that committed
Partial Rollbacks
• Savepoints can be declared in SQL*Plus window or PL/SQL:
SAVEPOINT label; (label is a character string)
• The command ROLLBACK TO label;
  – undoes changes back to the label in the program or window
• Many different savepoints can be declared
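A partial rollback can be sketched as follows, assuming SQLite from Python as a stand-in for Oracle (SQLite also supports SAVEPOINT and ROLLBACK TO). The savepoint label after_first and the sample rows are made up; the connection is opened with isolation_level=None so that the transaction statements are issued explicitly.

```python
import sqlite3

# isolation_level=None: Python does not open transactions implicitly,
# so BEGIN / SAVEPOINT / ROLLBACK TO / COMMIT are issued by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE patient (pid TEXT PRIMARY KEY, pname TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO patient VALUES ('1', 'Smith')")
conn.execute("SAVEPOINT after_first")          # label for partial rollback
conn.execute("INSERT INTO patient VALUES ('2', 'Jones')")
conn.execute("ROLLBACK TO after_first")        # undoes only the second insert
conn.execute("COMMIT")                         # first insert is kept

saved_pids = [row[0] for row in
              conn.execute("SELECT pid FROM patient ORDER BY pid")]
print(saved_pids)
```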
Locks
• Resources are held by locks
• In SQL lock management is done:
  – automatically with COMMIT and ROLLBACK
• Users and programmers can rely on defaults
• However, some knowledge is useful for:
  – tuning in production systems using LOCK command for efficiency
  – understanding problems in running concurrent transactions
Example Lock Table
Task    | Table   | Row | Lock type
CGNR1-1 | Patient | 8   | W
CGNR1-2 | Patient | 1   | R
CGSA1-1 | Patient | 7   | W
CGSA1-2 | Patient | 1   | R
Locking Modes
• R (read) or shared
  – any number of tasks can read the same data items concurrently
  – CGSA1-2 and CGNR1-2 both read patient 1
• W (write) or exclusive
  – when writing data need exclusive access
  – otherwise values can change while in use by others
  – So CGNR1-1 is only task that can access Patient 8
  – and CGSA1-1 is only task that can access Patient 7
• None -- no entry in table
Locking Granularity
• Can lock at level of:
  – table
  – page (unit of disk storage)
  – row
• Coarse locks:
  – for instance a whole table
  – give small lock tables (not that many tables)
  – much contention for resources (many users queue for table access)
• Fine locks:
  – for instance a single row
  – give large lock tables (many rows included)
  – less contention for resources (few users queue for row access)
• Oracle defaults give fine locking
Semester 2/ Week 18
Transactions in Procedures
• On the surface, very easy.
• If everything goes well (at end):
  – COMMIT;
• If things go badly (at end):
  – ROLLBACK;
• Problem is controlling bad outcomes:
  – Handling exceptions
  – Giving useful feedback
Week 16 Example – with Commit
SQL> CREATE OR REPLACE PROCEDURE add_patient (pat_id in char, pat_name in char, pat_address in char, pat_dobirth in date, pat_regdate in date)
  2  AS
  3  BEGIN
  4  insert into patient(pid,pname,address,dobirth,date_reg) values(pat_id,pat_name,pat_address,pat_dobirth,pat_regdate);
  5  DBMS_OUTPUT.PUT_LINE ('Insert attempted');
  6  COMMIT;
  7  END;
  8  /
Procedure created.
Review of Assignment Procedure
• Asked to add a procedure to add vaccination data
• Generate:
  – one successful run
  – three unsuccessful runs
• Here we review the results of each run closely
ADD_VACC procedure
SQL> CREATE OR REPLACE PROCEDURE add_vacc (pat_id in char, vis_vdate in date, vis_act in number, vac_vacc in char)
  2  AS
  3  BEGIN
  4  insert into vaccinations(pid,vdate,action,vaccinated) values(pat_id,vis_vdate,vis_act,vac_vacc);
  5  DBMS_OUTPUT.PUT_LINE ('Insert attempted');
  6  END;
  7  /
Procedure created.
Successful Run 1a
SQL> execute add_vacc('2','16-dec-1999',3,'cholera');
PL/SQL procedure successfully completed.
SQL> select * from vaccinations
  2  where pid = '2' and action = 3;
PID    VDATE         ACTION VACCINATED
------ --------- ---------- --------------------
2      06-AUG-91          3 polio
2      16-DEC-99          3 cholera
SQL> commit;
Commit complete.
Successful Run 1b
• No error messages
• Message 'PL/SQL procedure successfully completed' is significant. It means:
  – Any exception raised during run has been properly handled
  – Does not necessarily mean data has been added successfully
• COMMIT should have been last line in procedure
• Here user has made decision to commit
Unsuccessful Run 1a
SQL> execute add_vacc('2','16-dec-1999',1,'cholera');
BEGIN add_vacc('2','16-dec-1999',1,'cholera'); END;
*
ERROR at line 1:
ORA-00001: unique constraint (CGNR1.PKVAC) violated
ORA-06512: at "CGNR1.ADD_VACC", line 4
ORA-06512: at line 1
Unsuccessful Run 1b
• Error message returned:
  – ORA-00001 indicates non-unique primary key value
  – Text message 'unique constraint violated' spells out nature of problem
  – CGNR1.PKVAC is name of constraint in CREATE TABLE definition for Vaccinations
    • constraint pkvac primary key (pid,vdate,action)
• Note no message about successful completion.
  – Does not necessarily mean unsuccessful addition
  – Means that exception raised in INSERT operation has not been handled within the procedure
Unsuccessful Run 2a
SQL> execute add_vacc('2','17-dec-1999',1,'cholera');
BEGIN add_vacc('2','17-dec-1999',1,'cholera'); END;
*
ERROR at line 1:
ORA-02291: integrity constraint (CGNR1.SYS_C0080698) violated - parent key not found
ORA-06512: at "CGNR1.ADD_VACC", line 4
ORA-06512: at line 1
Unsuccessful Run 2b
• Error message returned:
  – ORA-02291 indicates foreign key entered does not match a primary key value (in visits)
  – Text message 'parent key not found' spells out nature of problem
  – foreign key(pid,vdate) REFERENCES visits(pid,vdate);
  – CGNR1.SYS_C0080698 is name of constraint
  – Named constraints give more information
• Again no message about successful completion
  – As exception not handled
Attempted unsuccessful run
SQL> execute add_vacc('2','16-dec-1999','4','cholera');
PL/SQL procedure successfully completed.
SQL> select * from vaccinations
  2  where pid = '2' and action = 4;
PID    VDATE         ACTION VACCINATED
------ --------- ---------- --------------------
2      16-DEC-99          4 cholera
• Worked as '4' char value entered for numeric attribute action was type cast (automatically) to a number
Unsuccessful Run 3a
SQL> execute add_vacc('2','16-dec-1999','4',cholera);
BEGIN add_vacc('2','16-dec-1999','4',cholera); END;
*
ERROR at line 1:
ORA-06550: line 1, column 38:
PLS-00201: identifier 'CHOLERA' must be declared
ORA-06550: line 1, column 7:
PL/SQL: Statement ignored
Unsuccessful Run 3b
• Error message returned:
  – PLS-00201 indicates non-declared identifier (ORA-06550 gives the line and column)
  – Parameter value CHOLERA is not in quotes
  – Therefore taken as variable
  – Not declared to system
• Again no message about successful completion
  – As exception not handled
Exception Handling PL/SQL
• Essential part of any program
• Particularly needed for updates
  – open-ended nature of user inputs
• But also needed for searches
  – e.g. may not find any matching data
• An exception is raised when an operation:
  – fails to perform normally
• A non-handled exception leads to program failure
Exceptions Raised
• With input particularly
– Cannot specify all Oracle error codes in advance
– Too many codes to specify
– Some rule exceptions though can be emphasised
• Need specific exceptions
• And general (catch-all) exceptions
Complete PL/SQL procedure
CREATE OR REPLACE PROCEDURE proc_name (parameters) AS
[DECLARE] local_vars
BEGIN
executable_code
EXCEPTION
exception_code
END;
/
Explanation
• Upper case: literal (as is)
• Lower case: to be substituted
• [DECLARE] omitted in procedures but part of full definition for PL/SQL
• executable_code
– SQL commands, assignments, condition checking, text output, transactions
• exception_code
– event handling, transactions
• proc_name is procedure name
• local_vars are variables declared for use within procedure (standard SQL types + Boolean types)
Example Procedure - part 1
CREATE OR REPLACE PROCEDURE add_patient (pat_id in char, pat_name in char,
pat_address in char, pat_dobirth in date, pat_regdate in date) AS
pid_too_high exception;
PRAGMA EXCEPTION_INIT(pid_too_high,-20000);
BEGIN
insert into patient(pid,pname,address,dobirth,date_reg)
values(pat_id,pat_name,pat_address,pat_dobirth,pat_regdate);
DBMS_OUTPUT.PUT_LINE ('Insert attempted');
IF pat_id > '500' THEN
RAISE pid_too_high;
END IF;
COMMIT;
Example Procedure - part 2
EXCEPTION
WHEN pid_too_high THEN
DBMS_OUTPUT.PUT_LINE ('pid too high');
ROLLBACK;
END;
/
Explanation 1
• pid_too_high exception;
– declares pid_too_high as a user-defined exception (a named error condition that is either raised or not)
• PRAGMA EXCEPTION_INIT(pid_too_high,-20000);
– instruction to compiler – associates the exception name pid_too_high with Oracle error number -20000
• IF pat_id > '500' THEN RAISE pid_too_high; END IF;
– IF .. THEN … END IF construction – enforces a business rule that pid <= 500 by
• raising exception pid_too_high when this state occurs
Explanation 2
• EXCEPTION
– opens exception handling part of procedure
• WHEN … THEN …;
– defines actions when a particular exception occurs
Flow of Action 1
• If no exception raised
– insert is performed
– commit takes place
– procedure terminates with ‘successful’ message
• If specific exception for business rule raised
– insert is performed
– exception pid_too_high is raised in IF code
– execution of main code immediately finishes
– code in EXCEPTION section after WHEN pid_too_high is executed
• including rollback
– procedure terminates with ‘successful’ message
Flow of Action 2
• If another exception raised (on insert e.g. primary key violation)
– insert is not performed
– exception is raised in procedure
– execution of main code immediately finishes
– As no further exception handlers are declared
• procedure terminates with:
– error reports
– no ‘successful’ message
• Need catch-all exception handlers
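A catch-all handler could be added to the example as sketched below. This is a minimal sketch, not the course solution: the procedure name add_patient2 is hypothetical, and the use of SQLERRM alongside SQLCODE is an assumption about what is useful to display.

```sql
CREATE OR REPLACE PROCEDURE add_patient2   -- hypothetical name
  (pat_id IN char, pat_name IN char, pat_address IN char,
   pat_dobirth IN date, pat_regdate IN date) AS
  pid_too_high EXCEPTION;
BEGIN
  INSERT INTO patient(pid, pname, address, dobirth, date_reg)
    VALUES (pat_id, pat_name, pat_address, pat_dobirth, pat_regdate);
  IF pat_id > '500' THEN
    RAISE pid_too_high;          -- business rule broken
  END IF;
  COMMIT;
EXCEPTION
  WHEN pid_too_high THEN
    DBMS_OUTPUT.PUT_LINE('pid too high');
    ROLLBACK;
  WHEN OTHERS THEN
    -- catch-all: any exception not named above, e.g. a primary key violation
    DBMS_OUTPUT.PUT_LINE('Error ' || SQLCODE || ': ' || SQLERRM);
    ROLLBACK;
END;
/
```

With WHEN OTHERS in place, a primary key violation would be reported through DBMS_OUTPUT and the procedure would end with the ‘successful’ message rather than an error report.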
Semester 2/ Week 19
Selects in Procedures 1
SQL> CREATE OR REPLACE PROCEDURE sel_patient AS
  2  BEGIN
  3  select * from patient;
  4  END;
  5  /
Warning: Procedure created with compilation errors.
SQL> sho err
……
3/1 PLS-00428: an INTO clause is expected in this SELECT statement
Selects in Procedures 2
• A procedure is no substitute for scripting here
– Cannot put in simple selects
• SELECT is used in procedures to:
– Fetch one row at a time
• An exception may be generated when we fetch:
– no rows
– multiple rows
• A cursor construction is used to handle multiple rows in an orderly fashion
Selects in Procedures 3
• SELECT attribute_list INTO variable_list
– is the basic format
• Lists can be singular or multiple
• Multiple entries are comma delimited
• Attribute 1 goes into variable 1, 2 into 2, …
– Variables are declared in Declare section
• Must be of compatible type to that in CREATE TABLE …
• Single row retrieval is guaranteed
– When WHERE clause searches only on primary key or alternate key
– No need for cursor here
SELECT – Single Row –Partial Attributes
SQL> CREATE OR REPLACE PROCEDURE sel_patient
(pat_id in char)
AS
pname_var char(20);
BEGIN
select pname into pname_var
from patient
where pid = pat_id;
DBMS_OUTPUT.PUT_LINE (pname_var);
END;
/
Procedure created.
Explanation
• Input is pid
• Local variable pname_var is declared of same type
– as in CREATE TABLE for patient
• Attribute value pname is passed into:
– Variable pname_var
– For the one row where pid = ‘1’
• The value of pname_var is then displayed
Run of Single Record Procedure
SQL> set serveroutput on
SQL> execute sel_patient('1');
Fred
PL/SQL procedure successfully completed.
• Above gives patient name ‘Fred’ for patient with id ‘1’
Automatic variable typing
• In declarations
pname_var patient.pname%TYPE;
• Gives pname_var same type as pname in patient
• Good practice:
– ensures types of table attributes and procedure variables are exactly the same
Exceptions – no rows found
• Exception which needs handling is:
– when no rows are found
• PL/SQL provides a pre-defined exception:
– NO_DATA_FOUND
– Can test with WHEN clause in EXCEPTION part of procedure
• To avoid procedure error at run time:
– Include this exception handler
– Or use an equivalent technique (cursor attributes)
Example – Single Row Retrieval with Exception
SQL> CREATE OR REPLACE PROCEDURE sel_patient
(pat_id in char)
AS
pname_var patient.pname%TYPE;
BEGIN
select pname into pname_var
from patient
where pid = pat_id;
DBMS_OUTPUT.PUT_LINE (pname_var);
EXCEPTION
WHEN no_data_found THEN
DBMS_OUTPUT.PUT_LINE ('pid does not exist');
END;
/
Procedure created.
Run with exception
SQL> execute sel_patient('77');
pid does not exist
PL/SQL procedure successfully completed.
• Error message comes from exception
• Exception handled so successful completion
– Even though nothing useful achieved (no pid ’77’)
Retrieval of Complete Row
• Declare variable (instead of pname_var):
pat_row patient%ROWTYPE;
• pat_row is a rowtype
– Holds one row of patient data
– Types as in patient table
– Refer to columns by pat_row.column
• e.g. pat_row.pname addresses:
– Column pname in row pat_row
• Use separator || for multiple items in output
Revised Procedure with Rowtype
SQL> CREATE OR REPLACE PROCEDURE sel_patient2
(pat_id in char) AS
pat_row patient%ROWTYPE;
BEGIN
select * into pat_row from patient
where pid = pat_id;
DBMS_OUTPUT.PUT_LINE ('Name is:' || pat_row.pname
|| 'Address is:' || pat_row.address);
EXCEPTION
WHEN no_data_found THEN
DBMS_OUTPUT.PUT_LINE ('pid does not exist');
END;
/
Execution of Procedure
SQL> execute sel_patient2('1');
Name is:Fred Address is:Newcastle
PL/SQL procedure successfully completed.
Selections of Multiple Rows
• PL/SQL deals with one row at a time
• If SELECT potentially retrieves more than one row:
– Procedure still compiles
– Will work with retrieval of 1 row (or 0 rows if NO_DATA_FOUND is handled)
– Will fail with more than 1 row
• Consider retrieval on patient name
Procedure for Retrieval on Name
SQL> CREATE OR REPLACE PROCEDURE sel_patient3
  2  (pat_name in char) AS
  3  pat_row patient%ROWTYPE;
  4  BEGIN
  5  select * into pat_row from patient
  6  where pname = pat_name;
  7  DBMS_OUTPUT.PUT_LINE ('Id is:' || pat_row.pid
  8  || 'Address is:' || pat_row.address);
  9  EXCEPTION
 10  WHEN no_data_found THEN
 11  DBMS_OUTPUT.PUT_LINE ('pname does not exist');
 12  END;
 13  /
Execution of Procedure
SQL> execute sel_patient3('Fred');
Id is:1 Address is:Newcastle
PL/SQL procedure successfully completed.
*********************************
SQL> execute sel_patient3('smith');
BEGIN sel_patient3('smith'); END;
*
ERROR at line 1:
ORA-01422: exact fetch returns more than requested number of rows
ORA-06512: at "CGNR1.SEL_PATIENT3", line 5
ORA-06512: at line 1
• ‘Fred’ appears once in patient; ‘smith’ appears twice (in my current data).
• Can use predefined exception too_many_rows to avoid error.
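A handler for too_many_rows might look as follows. This is a sketch only: the procedure name sel_patient3b and the message text are hypothetical, and the rest follows sel_patient3.

```sql
CREATE OR REPLACE PROCEDURE sel_patient3b (pat_name IN char) AS  -- hypothetical name
  pat_row patient%ROWTYPE;
BEGIN
  SELECT * INTO pat_row FROM patient
    WHERE pname = pat_name;
  DBMS_OUTPUT.PUT_LINE('Id is:' || pat_row.pid
    || ' Address is:' || pat_row.address);
EXCEPTION
  WHEN no_data_found THEN
    DBMS_OUTPUT.PUT_LINE('pname does not exist');
  WHEN too_many_rows THEN
    -- raised when SELECT ... INTO matches more than one row
    DBMS_OUTPUT.PUT_LINE('more than one patient has this pname');
END;
/
```

This avoids the run-time error but still reports no patient details when the name is duplicated; a cursor is needed to process every matching row.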
Cursors
• Cannot rely on luck with searches which may retrieve multiple rows
• Declare cursor (before BEGIN) as the select statement
• Have in executable part:
– Open cursor
– Process set, row by row, until exit
– Close cursor
• Can have multiple cursors
Cursor Declaration
CURSOR p IS
select * from patient
where pname = pat_name;
• The variable p addresses
– the set defined by the SELECT statement
• No INTO clause is needed here
Cursor Executable
OPEN p;
LOOP
FETCH p INTO pat_row;
EXIT WHEN p%NOTFOUND;
DBMS_OUTPUT.PUT_LINE ('Id is:' || pat_row.pid
|| 'Address is:' || pat_row.address);
END LOOP;
CLOSE p;
Explanation
• OPEN p
– Retrieves set of rows satisfying select statement
– Sets pointer to 1st row
• LOOP
– Start of instructions for processing a row
• FETCH
– Transfers data from current row to variables
– Sets pointer to next row
• EXIT WHEN p%NOTFOUND
– Exits loop when no row was transferred in last fetch
• END LOOP
– Ends processing of current row; returns to LOOP
• CLOSE p
– Closes cursor and releases resources
Processing of Data
• Between FETCH and END LOOP
– Can do any processing required for application
• Statistical calculations
• Re-packaging of data
• Complex reports
• Transfers to other tables
• Integrity checks
• Amalgamations of data from other cursors
• Exception handling is through cursor attribute %notfound
– not in SELECT statement
Complete Procedure
CREATE OR REPLACE PROCEDURE sel_patient4 (pat_name in char) AS
pat_row patient%ROWTYPE;
CURSOR p IS
select * from patient
where pname = pat_name;
BEGIN
OPEN p;
LOOP
FETCH p INTO pat_row;
EXIT WHEN p%NOTFOUND;
DBMS_OUTPUT.PUT_LINE ('Id is:' || pat_row.pid
|| 'Address is:' || pat_row.address);
END LOOP;
CLOSE p;
END;
/
Execution
SQL> execute sel_patient4('smith');
Id is:42 Address is:grimsby
Id is:43 Address is:grimsby
PL/SQL procedure successfully completed.
SQL> execute sel_patient4('Fred');Id is:1 Address is:Newcastle
PL/SQL procedure successfully completed.
SQL> execute sel_patient4('Nigel');
PL/SQL procedure successfully completed.
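The run for 'Nigel' completes silently because no row is fetched. One possible way to report this, sketched here under the assumption that a message is wanted, is to test the cursor attribute %ROWCOUNT after the loop. The procedure name sel_patient5 is hypothetical.

```sql
CREATE OR REPLACE PROCEDURE sel_patient5 (pat_name IN char) AS  -- hypothetical name
  pat_row patient%ROWTYPE;
  CURSOR p IS
    SELECT * FROM patient
    WHERE pname = pat_name;
BEGIN
  OPEN p;
  LOOP
    FETCH p INTO pat_row;
    EXIT WHEN p%NOTFOUND;
    DBMS_OUTPUT.PUT_LINE('Id is:' || pat_row.pid
      || ' Address is:' || pat_row.address);
  END LOOP;
  -- %ROWCOUNT must be read while the cursor is still open
  IF p%ROWCOUNT = 0 THEN
    DBMS_OUTPUT.PUT_LINE('pname does not exist');
  END IF;
  CLOSE p;
END;
/
```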
Useful Reference Book
• Oracle 9i: PL/SQL Programming
– Develop Powerful PL/SQL Applications
• by Scott Urman
• Oracle Press
• McGraw-Hill (2002)
Semester 2/ Week 20
PL/SQL Review
• From perspective of assignment 5
• Plus previous exercises
Exercise week 19 - Declare Section
• CREATE OR REPLACE PROCEDURE ...
– age number;
– mondiff number;
– daydiff number;
– action_too_high exception;
– vdate_too_early exception;
– pragma ...
• Local variables for use within procedure– age, mondiff, daydiff for age calculations
• Exceptions to be RAISEd during run if business rule broken
• Pragma for compiler instructions (not important to logic)
Executable - Age Calculation
• select * INTO pat_row from patient where pname = pat_name;
– places a row from table into PL/SQL variable pat_row
– if row not found, exception NO_DATA_FOUND is raised by system
age:=extract(year from sysdate) - extract(year from pat_row.dobirth);
mondiff:=extract(month from sysdate) - extract(month from pat_row.dobirth);
daydiff:=extract(day from sysdate) - extract(day from pat_row.dobirth);
IF mondiff < 0 THEN age:= age - 1; END IF;
IF mondiff = 0 AND daydiff < 0 THEN age:= age - 1; END IF;
• Note use of SQL predefined function extract
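As an alternative to the three EXTRACT steps, age can be sketched with the Oracle function MONTHS_BETWEEN. This is a different technique from the one on the slide; the dates and variable names below are hypothetical test values, not course data.

```sql
DECLARE
  -- hypothetical values, fixed so the result does not depend on sysdate
  dob   date := TO_DATE('17-12-1950', 'DD-MM-YYYY');
  today date := TO_DATE('16-12-2003', 'DD-MM-YYYY');
  age   number;
BEGIN
  -- whole years = whole months elapsed divided by 12, fraction discarded
  age := TRUNC(MONTHS_BETWEEN(today, dob) / 12);
  DBMS_OUTPUT.PUT_LINE('Age is: ' || age);   -- 52: birthday not yet reached
END;
/
```

MONTHS_BETWEEN treats two month-end dates as a whole number of months apart, so for birthdays on the last day of a month the result can differ by a day from the EXTRACT version.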
Executable - Transfer of Data
IF age >= 50 THEN
DBMS_OUTPUT.PUT_LINE ('Inserting pid: ' || pat_row.pid);
INSERT INTO patover50(pid,pname,address,age)
VALUES(pat_row.pid, pat_row.pname, pat_row.address, age);
END IF;
• If PL/SQL variable age is 50 or over then:
– output message
– insert into table
– inserted values are
• pat_row variables from patient
• PL/SQL variable -- age
Assignment 5
• Assignment 5: CM503/CM517
• Set: end week 19 (Friday, 19th March 2004)
• Assessed: in seminars during week 21 (Monday, 29th March - Thursday, 1st April 2004)
• The assignment extends work done in weeks 18 and 19. The solutions for these exercises are on Blackboard.
• The client now wishes to revise the add vaccination procedure (call it say add_vacc2) so that it does the following in total:
Business Rules
• To raise exceptions when rules broken:
1. No more than two vaccinations ... per day.
2. If over 75 in age, no more than one vaccination … per day.
3. The vaccination date >= 1st January 2003.
4. The combination of cholera and typhoid on same day ... is not permitted.
5. A vaccination which is within safe period ... shall not be given.
Exceptions
• Exception:
– “We have a problem”
– Fatal error for procedure
• Need to get out
• Give useful info to user
– In PL/SQL never attempt to recover position
• Rollback
– undo any changes made
– release resources
Strategy
• Insert data first
• Then look at consistency (against rules)
• So add vaccination data
– Then look at business rules
• For instance:
– cannot assess number of actions
– or see whether both cholera and typhoid given
– until new data is added
• If new data breaks rules, then raise exception and undo changes (rollback)
Types of Exception
• Business Rules
– Typically rules not specified in CREATE TABLE
• may vary slightly from procedure to procedure
– Determined by inspecting data when provisionally added
– RAISE exception by code in procedure when rule broken
– Give error message to user
– Rollback (Undo changes)
• Table (General) Constraints
– Specified in CREATE TABLE
– Exception is raised by system automatically
– Can handle with WHEN OTHERS …
– Display error code (SQLCODE) and Rollback
Parameters
• “Values in/out to/from the outside world”
• Typed as:
– IN (input), OUT (output), IN OUT (both)
– char, number, date (broad-brush types)
• Input as for add_vacc:
– patient id, visit date, action number, vaccine given.
• So procedure always runs with 4 input values
Data Transfers
• Common to write verified values to variety of tables (logs, checks, safety)
• For each vaccination given (i.e. each validated treatment):
a. update the vaccinations table
b. insert into the table VACC_RECORD (which you should create) the following:
pid, pname, address, age, vdate, action, vaccinated, expiry_date
– 1st, 5th, 6th, 7th given as input parameters
Remaining Values?
• 2nd, 3rd from look up in Patient table
• 4th by calculation on dobirth in Patient table
• 8th by calculation on valid_for
– could use SQL function Add_Months(date,lasting_year*12)
• Need to retrieve patient data for supplied pid
– Calculate age for pid
• Need to retrieve information on vaccine in valid_for table
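The expiry date calculation can be sketched with ADD_MONTHS as below. The visit date and the value of lasting_year are hypothetical; in the assignment they would come from the input parameters and from the valid_for table, whose layout is not shown here.

```sql
DECLARE
  vdate        date   := TO_DATE('16-12-2003', 'DD-MM-YYYY');  -- hypothetical visit date
  lasting_year number := 3;                                    -- hypothetical validity in years
  expiry_date  date;
BEGIN
  -- ADD_MONTHS works in months, so years are multiplied by 12
  expiry_date := ADD_MONTHS(vdate, lasting_year * 12);
  DBMS_OUTPUT.PUT_LINE('Expiry date: ' || TO_CHAR(expiry_date, 'DD-MM-YYYY'));
END;
/
```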
Implementing Business Rules
• Need to write SQL statement to determine whether rule holds or fails
• Often declared as CURSOR (e.g. c, d, e, or descriptive name)
– Could have one cursor per rule
• but may have multiple rules on one cursor
– Then OPEN a cursor (after INSERT of new data made):
• FETCH data
• Look at cursor attributes
Cursor Attributes
• c%FOUND
– means last FETCH from cursor c successful
• c%NOTFOUND
– means last FETCH from cursor c unsuccessful
• c%ROWCOUNT
– gives running total of number of rows retrieved in FETCHes so far
Testing of Cursor Attributes
• EXIT WHEN c%NOTFOUND;
– terminates current loop when rows finished
• IF d%FOUND THEN RAISE vacc_already; END IF;
– if find safe vaccination, raise exception
• IF e%ROWCOUNT > 12 THEN RAISE too_many_modules; END IF;
– on 13th fetch of module row, raise exception
– immediately goto EXCEPTION part
– too_many_modules will be of exception type, have pragma and an exception handler
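The same pattern could be applied to business rule 1 (no more than two vaccinations per day), as sketched below. This is an illustration rather than the assignment solution: the procedure name check_vacc_count, the exception name too_many_vacc and the parameter names are hypothetical, and it assumes the new row has already been inserted into vaccinations.

```sql
CREATE OR REPLACE PROCEDURE check_vacc_count          -- hypothetical name
  (pat_id IN char, visit_date IN date) AS
  vacc_row vaccinations%ROWTYPE;
  too_many_vacc EXCEPTION;                            -- hypothetical exception
  CURSOR c IS
    SELECT * FROM vaccinations
    WHERE pid = pat_id AND vdate = visit_date;
BEGIN
  OPEN c;
  LOOP
    FETCH c INTO vacc_row;
    EXIT WHEN c%NOTFOUND;
    -- on the 3rd row fetched for this patient and day, the rule is broken
    IF c%ROWCOUNT > 2 THEN
      RAISE too_many_vacc;
    END IF;
  END LOOP;
  CLOSE c;
EXCEPTION
  WHEN too_many_vacc THEN
    CLOSE c;                                          -- cursor is still open here
    DBMS_OUTPUT.PUT_LINE('more than two vaccinations on this day');
    ROLLBACK;
END;
/
```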
Basic Procedure Structure
• CREATE … name procedure, parameters .. AS
• [DECLARE] … PL/SQL variables, exception variables, pragmas, cursors
• BEGIN … general messages, INSERTs, deriving data, testing of cursors against rules, commit (if data satisfies rules)
• EXCEPTION … handle business rule problems and general constraint violations, rollback
• END
Oracle’s PL/SQL Approach
• Fairly typical for relational databases
• Previous slide shows general structure
• Can use experience here in:
– other SQL systems (procedure is standard)
– scripting systems (e.g. PHP/Oracle or PHP/MySQL)
• Useful for placements and final-year projects
• In JDBC and other Java-based embedded approaches:
– PL/SQL type system would be Java type system
Semester 2/ Week 21
Database Design
• After producing logical design with elegant maintainable structures:
• Need to do physical design to make it run fast.
• Performance is often more important in database applications than in more general information system design:
– Emphasis on number of transactions per second (throughput)
Database Design Methodologies
• Produce default storage schema
• May be adequate for small applications
• For large applications, much further tuning required
• Physical design is the technique
• Concepts: memory (main/disk), target disk architecture, blocks, access methods, indexing, clustering.
Aims of Physical Design
• Fast retrieval – usually taken as <= 5 disk accesses
– Since disk access is very long compared to other access times, number of disk accesses is often used as indicator of performance
• Fast placement – within 5 disk accesses
– Insertion of data, may be in middle of file not at end
– Deleting data, actual removal or tombstone
– Updating data, including primary key and other data
Retrieval/Placement
• Distinguish between actions involving primary and secondary keys
• Primary key is that determined by normalisation
– May be single or multiple attributes
– Only one per table
• Secondary keys
– Again may be single or multiple attributes
– Many per table
– Include attributes other than the primary key
• Complications such as candidate keys are omitted in this part of the course
Access Method
• An access method is the software responsible for storage and retrieval of data on disk
• Handles page I/O between disk and main memory
– A page is a unit of storage on disk
– Pages may be blocked so that many are retrieved in a single disk access
• Many different access methods exist
– Each has a particular technique for relating a primary key value to a page number
Processing of Data
• All processing by software is done in main memory
• Blocks of pages are moved
– from disk to main memory for retrieval by user
– from main memory to disk for storage by user
• Access method drives the retrieval/storage process
Cost Model
• Identify cost of each retrieval/storage operation
• Access time to disk to read/write page = D =
seek time (time to move head to required cylinder)
+ rotational delay (time to rotate disk once the head is over the required track)
+ transfer time (time to transfer data from disk to main memory)
• Typical value for D = 15 milliseconds (msecs) or 15 x 10^-3 secs
Other times:
• C = average time to process a record in main memory = 100 nanoseconds (nsecs) = 100 x 10^-9 secs = 10^-7 secs
• R = number of records/page
• B = number of pages in file
• Note that D > C by roughly 10^5 times
Access Method I: the Heap
• Records (tuples) are held on file:
– in no particular order
– with no indexing
– that is in a ‘heap’ – unix default file type
• Strategy:
– Insertions usually at end
– Deletions are marked by tombstones
– Searching is exhaustive from start to finish until required record found
The Heap – Cost Model 1
• Cost of complete scan: B(D+RC)
– For each page, one disk access D and process of R records taking C each.
– If R=1000, B=1000 (file contains 1,000,000 records)
– Then cost = 1000(0.015+(1000*10^-7)) = 1000(0.0150+0.0001) = 1000(0.0151) = 15.1 secs
• Cost for finding particular record: B(D+RC)/2 (scan half file on average) = 7.55 secs
• Cost for finding a range of records e.g. student names beginning with sm: B(D+RC) (must search whole file) = 15.1 secs
The Heap – Cost Model 2
• Insertion: 2D + C
– Fetch last page (D), process record (C), write last page back again (D).
– Assumes:
• all insertions at end
• system can fetch last page in one disk access
• Cost = (2*0.015)+10^-7 ≈ 0.030 secs
The Heap – Cost Model 3
• Deletions: B(D+RC)/2 + C + D
– Find particular record (scan half file - B(D+RC)/2), process record on page (C), write page back (D).
– Record will be flagged as deleted (tombstone)
– Record will still occupy space
– If reclaim space need potentially to read/write many more pages
• Cost = 7.550 + 10^-7 + 0.015 ≈ 7.565 secs
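The heap cost formulas can be checked with a short anonymous block. This is only a sketch for reproducing the arithmetic; the variable scan_cost is introduced for illustration, while D, C, R and B take the values used on the slides.

```sql
DECLARE
  D CONSTANT number := 0.015;   -- disk access time in secs
  C CONSTANT number := 1E-7;    -- time to process one record in memory
  R CONSTANT number := 1000;    -- records per page
  B CONSTANT number := 1000;    -- pages in file
  scan_cost number;
BEGIN
  scan_cost := B * (D + R * C);
  DBMS_OUTPUT.PUT_LINE('Complete scan:   ' || scan_cost);                 -- about 15.1 secs
  DBMS_OUTPUT.PUT_LINE('Find one record: ' || scan_cost / 2);             -- about 7.55 secs
  DBMS_OUTPUT.PUT_LINE('Insertion:       ' || (2 * D + C));               -- about 0.030 secs
  DBMS_OUTPUT.PUT_LINE('Deletion:        ' || (scan_cost / 2 + C + D));   -- about 7.565 secs
END;
/
```

Run with set serveroutput on, as for the earlier procedures.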
Pros and Cons of Heaps
• Pros:
– Cost effective where many records processed in a single scan (can process 1,000,000 records in 15.1 secs)
– Simple access method to write and maintain
• Cons:
– Very expensive for searching for single records in large files (1 record could take 15.1 secs to find)
– Expensive for operations like sorting as no inherent order
Usage of Heaps
• Where much of the file is to be processed:
– Batch mode (collect search and update requests into batches)
– Reports
– Statistical summaries
– Program source files
• Files which occupy a small number of pages
Semester 2/ Week 22
Hashing
• One of the big two Access Methods
• Very fast potentially
– One disk access only in ideal situation
• Used in many database and more general information systems:
– where speed is vital
• Many variants to cope with certain problems
Meaning of Hash
• Definition 3. to cut into many small pieces; mince (often fol. by up).
• Example: He chopped up some garlic.
• Synonyms: dice, mince, hash
• Similar words: chip, cut up, carve, crumble, cube, divide
– From http://www.wordsmyth.net/live/home.php?content=wotw/2001.0521/wotw_esl
Hash Function
• Takes key value of record to be stored
• Applies some function (often including a chop) delivering an integer
• This integer is a page number on the disk. So– input is key– output is a page number
Simple Example
• Have:
– B=10 (ten pages for disk file)
– R=2,000 (2,000 records/page)
– Keys {S12, S27, S30, S42}
• Apply function ‘chop’ to keys giving:
– {12, 27, 30, 42} so that initial letter is discarded
Simple Example
• Then take remainder of dividing chopped key by 10.
• Why divide?
– Gives integer remainder
• Why 10?
– Output numbers from 0 … 9
– 10 possible outputs correspond with 10 pages for storage
• In this case, numbers returned are:
– {2, 7, 0, 2}
Disk File: hash table
Page | Records (only keys shown)
0    | S30
1    |
2    | S12, S42
3    |
4    |
5    |
6    |
7    | S27
8    |
9    |
Retrieval
• Say user looks for record with key S42
• Apply hash function to key:
– Discard initial letter, divide by 10, take remainder
– Gives 2
• Transfer page 2 to buffer in main memory
• Search buffer looking for record with key S42
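The hash function of the simple example can be sketched as an anonymous block. The collection type key_list and the variable names are hypothetical; the chop and remainder steps follow the slides.

```sql
DECLARE
  TYPE key_list IS TABLE OF varchar2(10);
  key_vals key_list := key_list('S12', 'S27', 'S30', 'S42');
  page_no  number;
BEGIN
  FOR i IN 1 .. key_vals.COUNT LOOP
    -- chop: discard the initial letter; then take the remainder on division by 10
    page_no := MOD(TO_NUMBER(SUBSTR(key_vals(i), 2)), 10);
    DBMS_OUTPUT.PUT_LINE(key_vals(i) || ' -> page ' || page_no);
  END LOOP;
END;
/
```

For the four keys this gives pages 2, 7, 0 and 2, so S12 and S42 share page 2.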
Cost Model
• Retrieval of a particular record:
D+0.5RC (one disk access + time taken to search half a page for the required record)
= 0.015+(0.5*2000*10^-7) = 0.0151 secs (very fast)
• Insertion of a record:
Fetch page (D) + Modify in main memory (C) + Write back to disk (D)
= 0.015+10^-7+0.015 ≈ 0.0300 secs
• Deletions same as insertions
Effectiveness
• Looks very good:
– Searches in one disk access
– Insertions and deletions in two disk accesses
– So:
• Searching faster than heap and sorted
• Insertions and deletions similar to heaped, much faster than sorted
Minor Problem
• Complete scan:
– Normally do not fill pages to leave space for new records to be inserted
– 80% initial loading
– So records occupy 1.25 times the number of pages needed if densely packed
1.25B(D+RC) = 1.25*10(0.015+2000*10^-7) = 0.190 secs (against 0.152 if packed densely)
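The hashing costs can be reproduced in the same way as for the heap. A minimal sketch, using the slide values B=10 and R=2,000; the labels are for illustration only.

```sql
DECLARE
  D CONSTANT number := 0.015;   -- disk access time in secs
  C CONSTANT number := 1E-7;    -- time to process one record in memory
  R CONSTANT number := 2000;    -- records per page
  B CONSTANT number := 10;      -- pages in file if densely packed
BEGIN
  DBMS_OUTPUT.PUT_LINE('Retrieval:      ' || (D + 0.5 * R * C));        -- about 0.0151 secs
  DBMS_OUTPUT.PUT_LINE('Insertion:      ' || (2 * D + C));              -- about 0.030 secs
  DBMS_OUTPUT.PUT_LINE('Scan (dense):   ' || (B * (D + R * C)));        -- about 0.152 secs
  DBMS_OUTPUT.PUT_LINE('Scan (80 pct):  ' || (1.25 * B * (D + R * C))); -- about 0.190 secs
END;
/
```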
Larger Problems
• Scan for groups of records say S31-S37 will be very slow
• Each record will be in different page, not in same page.
• So 7 disk accesses instead of 1 with sorted file (once page located holding S31-S37).
Larger Problems
• What happens if page becomes full?
• This could happen if
– Hash function poorly chosen e.g. all keys end in 0 and hash function is the remainder of a number divided by 10
• All records go in same page
– Simply too many records for initial space allocated to file
Overflow Area
• Have extra pages in an overflow area to hold records
– Mark overflowed pages in main disk area
• Retrieval now may take 2 disk accesses to search expected page and overflow page.
• If have overflow on overflow page, may take 3+ disk accesses for retrieval.
• Insertions may also be slow – collisions on already full pages.
• Performance degrades
At Intervals in Static Hashing
• The Data Base Administrator’s lost weekend
• He/she comes in
• Closes system down to external use
• Runs a utility expanding number of pages and re-hashing all records into the new space
Alternatives to Static Hashing
• Automatic adjustment – Dynamic Hashing
• Extendible Hashing
– Have index (directory) to pages which adjusts to number of records in the system
• Linear Hashing
– Have family of hash functions
Pros and Cons of Hashing
• Pros:
– Very fast for searches on search key (may be 1 disk access)
– Very fast for insertions and deletions (often 2 disk accesses)
– No indexes to search/maintain (in most variants)
• Cons:
– Slightly slower than sorted files for scan of complete file
– Requires periodic off-line maintenance in static hashing as pages become full and collisions occur
– Poor for range searches
Usage of Hashing
• Applications involving:
– Searching (on key) in files of any size for single records – very fast
– Insertions and deletions of single records
• So typical in On-line Transaction Processing (OLTP)
Semester 2/ Week 23
B-trees
• What does the B stand for?
• Balanced, Bushy or Bayer (apparently not clear)
• Balanced means distance from root to leaf node is same for all of tree
• Bushy is a gardening term meaning all strands of similar length
• Bayer is a person who wrote a seminal paper on them
B+-Trees
• There are some variants on B-trees.
• We deal here with B+-trees where all data entries are held in leaf nodes of tree
• The two terms are interchangeable for our purposes.
• B+-trees are dynamic index structures
• The tree automatically adjusts as the data changes
B+-tree
• A B+-tree is a:
– Multiway tree (fan-out or branching factor >= 2, binary tree has fan-out 2) with
– Efficient self-balancing operations
• Minimum node occupancy is 50%
• Typical node occupancy can be greater than this but initially keep it around 60%
B+-tree Diagram: index+sequence set
[Figure: B+-tree with d = 2. The index set (internal structure as in root) sits above the sequence set (the data entries).]
Structure
• Index set (tree, other than leaf pages) is sparse
– Contains key-pointer pairs
– Not all keys included
• Sequence set (leaf pages) is dense
– All data entries included
– Data held is key + non-key data
– Pages are connected by double linked lists
– Can navigate in either direction
– Pages are usually in sequence of primary key
– No index pointers (to lower levels) are held at this level
Parameters
• d = 2 says that the order of the tree is 2
• Each non-terminal node contains between d and 2d index entries (except root node 1…2d entries)
• Each leaf node contains between d and 2d data entries
• So tree shown can hold 2, 3 or 4 index values in each node
• How many index pointers?
• Format is always one more pointer than value. So 3, 4 or 5 pointers per node.
Capacity of Tree
• d = 2
– One root node – can hold 2*d = 4 records
– Next level – 5 pointers from root, each node holds maximum 4 records = 20 records in 5 nodes
– Next level – 5 pointers from each of the 5 nodes above (25 nodes), each node maximum 4 records = 100 records in leaf nodes
• In practice will not pack as closely
• But d=2, 3-levels – potentially addresses 100 records
• If all held on disk, 3 disk accesses
Capacity of Tree
• High branching factors (fan-out) increase capacity dramatically (see seminar).
• So tree structure with high branching factor can be kept to a small number of levels
• Bushy trees (heavily pruned!) mean fewer disk accesses
• Root node at least will normally be held in main memory
Example of a B+-tree Order (d) = 2
[Figure: index (multi-level) nodes 20 40; 7 13; 26 33; 52 65. Data entries shown: 1 2 3 4 7 9 13 14]
Search Times - Single Record
• Always go to leaf node (data entries) to retrieve actual data.
• Root node in main memory
• Cost = (T-1)*(D+0.5RC) for single record
– T = height of tree
– (T-1) as root node is already in main memory
• If d=2, then R=4 (max); with T=3, cost = (3-1)*(0.015+(0.5*4*10^-7)) = 2*0.0150002 = 0.0300004 secs
Search Times - Whole File/Ranges
• Lowest cost is B(D+RC)
– B is number of pages assuming data is packed to 100% capacity, D is time for disk access, R is number of records/page, C is cost for processing each record in memory
• If Sequence Set is packed at 60%, then cost is: (100/60) * B(D+RC)
• Ranges are held in proximity in Sequence Set -- search for these is fast
Insertions
• First add into sequence set
– Search for node
– If space add to node
• Leave index untouched
– If no space
• Try re-distributing records in sibling nodes
– Sibling node is adjacent node with same parent
• Or split node and share entries amongst the two nodes
• Will involve alteration to index
• Insertions tend to push index entries up the tree
• If splits all the way up and root node overflows, then height of tree T increases by one.
Deletions
• First delete from sequence set
– Search for node
– Delete record from node
– If node still has at least d entries
• Leave index untouched
– If node has fewer than d entries
• Try re-distributing records in sibling nodes
– Sibling node is adjacent node with same parent
• Or merge sibling nodes
• Will involve alteration to index
• Deletions tend to push index entries down the tree
• If merges all the way up and root node underflows, then T decreases by one.
Usage of B+-trees
• General Purpose
• Single-record Searching
– not as fast as hashing but acceptable with usually five or fewer disk accesses
• Processing ranges of key values
– faster than hashing as records held in order of key in Sequence Set
• Robust while data changes
– algorithms for inserts/deletes give automatic self-balancing of tree (no re-organisations)
Semester 2/ Week 24
Revision
• First, remarks on B+-tree Exercises.
• Can think of B+-tree as generalised binary search tree
• Binary search trees are:
– good memory structures
• fast tree traversal
– poor disk structures
• if every pointer access involves a disk access
Binary Search Tree compared with B+-tree
• d = 1
• then root node has:
– 1...2 data entries
– 2…3 pointers
• Leaf node has 1…2 data entries, no pointers
• Intermediate node has 1…2 data entries, 2…3 pointers
• So binary search tree is special case of B+-tree with d=1 and the properties highlighted above (in red on the original slide).
Maximum Capacity (records) of B+-trees
Order (d) | Fan-out | Max records/node | Capacity at 5 levels
1         | 2…3     | 2                | 2*3*3*3*3 = 162
2         | 3…5     | 4                | 4*5*5*5*5 = 2500
10        | 11…21   | 20               | 20*21*21*21*21 = c. 4 x 10^6
50        | 51…101  | 100              | 100*101*101*101*101 ≈ 10^10
200       | 201…401 | 400              | 400*401*401*401*401 ≈ 10^13
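The capacity column follows the pattern 2d * (2d+1)^(levels-1): maximum records per node multiplied by the maximum fan-out at each level above. A sketch for reproducing it, with hypothetical names order_list, order_vals and num_levels:

```sql
DECLARE
  TYPE order_list IS TABLE OF number;
  order_vals order_list := order_list(1, 2, 10, 50, 200);
  num_levels CONSTANT number := 5;
  d        number;
  capacity number;
BEGIN
  FOR i IN 1 .. order_vals.COUNT LOOP
    d := order_vals(i);
    -- 2d records per node, fan-out of 2d+1 at each of the levels above the leaves
    capacity := 2 * d * POWER(2 * d + 1, num_levels - 1);
    DBMS_OUTPUT.PUT_LINE('d = ' || d || ': capacity = ' || capacity);
  END LOOP;
END;
/
```

For d = 1 and d = 2 this gives 162 and 2500, matching the first two rows of the table.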
Cost of Searching B+-trees (order = d)
• Number of disk accesses is number of levels (T) minus 1:
– all data held in leaves of tree
– top level (root) is in main memory
• Assume search half of each node on average to find a particular record (0.5RC)
• Cost = (T-1)*(D+0.5RC) where D = 0.015 secs, C = 10^-7 secs, R = a number from d … 2*d
Insertions
• Put record into sequence in leaf nodes
• If inserted node has <=2*d records, OK
• If inserted node has >2*d records:
– first try to redistribute records between inserted node, its parent and immediate sibling
– otherwise split inserted node into two nodes and pass intermediate record (key) up one level in tree
Balance of Course - 1
• SQL Scripting
– bridging level 1 and level 2
– reinforcing SQL knowledge
– use of variables
– pre-defined functions
– Important area for:
• prototyping applications
• some production environments (web-based)
Balance of Course - 2
• Database Fundamentals
– Transactions
– Concurrency
– Security
– Procedures
– Important for:
• production multi-user systems
Balance of Course - 3
• SQL Procedures (PL/SQL)
– Differences from scripting
– Business rules
– Constructions
– Parameterised SQL statements
– Exception handling
– Important for:
• handling business rules across an enterprise
Balance of Course - 4
• Access Methods
– physical side – placement and retrieval of data
– choice of algorithms
– two main types
• B+-trees
• Hashing
– Important for:
• efficient access to data
Exam Paper
• Database paper counts 40%
– database assignments 10%
– java paper 35%
– java assignments 15%
• Database paper is independent of 1st semester work
– exam will be on 2nd semester material
• No previous database paper in this area
Type of exam
• Closed book
– no materials can be carried in
• Two hours duration
• Four questions on paper
• Three questions should be attempted
Type of question
• 20 marks in total on each question
– 8-10 marks typically for testing basic knowledge of a subject (definitions, simple derivations, material from lecture notes)
– 10-12 marks typically for problem solving i.e. taking a scenario and providing a solution in terms of code or algorithm.
Recommended Strategy
• Be familiar with the lecture notes
• Be familiar with existing exercises and their solutions (including assignments) on BB
• You should assume that the exam will test your understanding of the lecture notes and exercises
• Problems in the exam environment will generally be simpler than those in the assignment environment.
Problem Solutions
• Under desk-bound exam conditions it is not possible to test program code exhaustively.
• Also student does not have feedback from compiler or run-time system.
• Ideal expectation is that code will:
– provide basis of implementation
– with feedback from live system, be readily modified to provide an acceptable deliverable.