©Borgida/Ramakrishnan & Gehrke 2003 1
Views (and SQL)
Levels of Abstraction
• Many views, single conceptual (logical) schema and physical schema.
» Views describe how users see the data.
» Logical/Conceptual schema defines logical structure
» Physical schema describes the files and indexes used.
[Figure: View 1, View 2 and View 3 layered over the Logical Schema, which is layered over the Physical Schema]
Example: University Database
• Logical schema:
  » Students(sid: string, name: string, login: string, age: integer, gpa: real)
  » Courses(cid: string, cname: string, credits: integer)
  » Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
  » Relations stored as unordered files.
  » Index on first column of Students.
• External Schema (View):
  » Course_info(cid: string, enrollment: integer)
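A minimal sketch of the external schema as an actual view, using SQLite through Python's sqlite3 module (an assumed choice of system). The slide gives only the signature of Course_info, so defining enrollment as the number of Enrolled rows per course is an assumption, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses(cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled(sid TEXT, cid TEXT, grade TEXT);
-- Hypothetical sample data.
INSERT INTO Courses VALUES ('cs336', 'Databases', 4), ('cs344', 'Algorithms', 4);
INSERT INTO Enrolled VALUES ('s1', 'cs336', 'A'), ('s2', 'cs336', 'B'), ('s1', 'cs344', 'B');

-- Assumption: enrollment = number of Enrolled rows for the course.
CREATE VIEW Course_info(cid, enrollment) AS
    SELECT C.cid, COUNT(E.sid)
    FROM Courses C LEFT JOIN Enrolled E ON E.cid = C.cid
    GROUP BY C.cid;
""")

# Users of the external schema query the view like any table.
course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
print(course_info)
```

The view hides both the join and the grade column, so the Enrolled table could later be reorganized without changing queries written against Course_info.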
Data Independence
• Applications insulated from how data is structured and stored.
• Physical data independence: Protection from changes in physical organization of data. (e.g., keep students sorted by name or sid?) Supports optimization.
• Logical data independence: Protection from changes in logical structure of data (e.g., later deciding not to store a gpa column, since it is redundant and the rules change). Supports evolution of the domain.
One of the most important benefits of using a DBMS!
Views
• A view is a virtual relation (as opposed to a table, which is a stored relation) described by a query
  » (Saw it in Datalog as a derived predicate)
• Often not stored but ...
• Can be queried just like any table
• May be updated in certain circumstances
ISSUES:
• Syntax
• Query processing over views
• Updates of views
• <“Materialized views” and their maintenance>
Views in SQL
• Can query the view as if it were an ordinary table
View:
    CREATE VIEW OKStudent(sid, name, login, age, gpa) AS
      SELECT S.sid, S.name, S.login, S.age, S.gpa
      FROM Students S, Enrolled E
      WHERE S.sid = E.sid AND E.grade >= 'B'
Query1:
    SELECT X.name, login
    FROM OKStudent X, Enrolled E
    WHERE (X.age > 22) ...
Query2:
    SELECT X.age, avg(X.gpa)
    FROM OKStudent X
    GROUP BY X.age
View processing by query expansion
View:
    CREATE VIEW OKStudent(sid, name, login, age, gpa) AS
      SELECT S.sid, S.name, S.login, S.age, S.gpa
      FROM Students S, Enrolled E
      WHERE S.sid = E.sid AND E.grade >= 'B'
Query:
    SELECT X.age, avg(X.gpa)
    FROM OKStudent X
    GROUP BY X.age
Modified Query:
    SELECT X.age, avg(X.gpa)
    FROM (SELECT S.sid, S.name, S.login, S.age, S.gpa
          FROM Students S, Enrolled E
          WHERE S.sid = E.sid AND E.grade >= 'B') AS X
    GROUP BY X.age
• (The modified query is actually valid SQL syntax, but the expansion happens without the user knowing about it.)
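As a rough check of query expansion, the sketch below runs the query against the view and the hand-expanded query in SQLite (via Python's sqlite3, an assumed choice of system) and compares the answers. The sample rows are hypothetical. Note that grade is compared as a string, so 'A' does not satisfy grade >= 'B'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students(sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Enrolled(sid TEXT, cid TEXT, grade TEXT);
-- Hypothetical sample data.
INSERT INTO Students VALUES
    ('s1', 'ann', 'ann@cs', 20, 3.0),
    ('s2', 'bob', 'bob@cs', 20, 4.0),
    ('s3', 'cy',  'cy@cs',  23, 2.0);
INSERT INTO Enrolled VALUES ('s1', 'c1', 'B'), ('s2', 'c1', 'C'), ('s3', 'c1', 'A');

CREATE VIEW OKStudent(sid, name, login, age, gpa) AS
    SELECT S.sid, S.name, S.login, S.age, S.gpa
    FROM Students S, Enrolled E
    WHERE S.sid = E.sid AND E.grade >= 'B';
""")

# Query written against the view.
via_view = conn.execute("""
    SELECT X.age, avg(X.gpa) FROM OKStudent X
    GROUP BY X.age ORDER BY X.age
""").fetchall()

# Same query with the view name replaced by its definition.
expanded = conn.execute("""
    SELECT X.age, avg(X.gpa)
    FROM (SELECT S.sid, S.name, S.login, S.age, S.gpa
          FROM Students S, Enrolled E
          WHERE S.sid = E.sid AND E.grade >= 'B') AS X
    GROUP BY X.age ORDER BY X.age
""").fetchall()

print(via_view, expanded)
```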
View processing by query expansion
Easier way to see what happens: in Relational Algebra
  » a view definition V provides an expression (tree) T_V for the view
  » a query Q using the view V is another expression T_Q
  » query expansion substitutes the tree T_V for each occurrence of name V in T_Q
Uses of views
1. Support logical data independence
2. Facilitate asking frequently occurring queries (the following are toy examples for later use)
     oldStudents(sid,sname,age,gpa) :- Students(sid,sname,_,age,gpa), age>25
3. Restrict access of some users to data (e.g., sid is not published together with sname to protect from identity theft)
     classList(sname,cname) :- Students(sid,sname,...), Enrollment(sid,cid,_), Course(cid,cname,...)
View updates
• Since users are not supposed to be aware of the “real content” of the database, they may want to update the view like a regular relation
    oldStudents(sid,sname,age,gpa) :- Students(sid,sname,login,age,gpa), age>25
    insert into oldStudents(sid,...) values (455, ...)
• Since views are not stored tables
  » updates need to be translated to changes to base tables
  » then the view needs to be recomputed
  » expect that at this point the update “shows up” in the view
View updates 1
Let's look at some examples
    oldStudents(sid,sname,age,gpa) :- Students(sid,sname,login,age,gpa), age>25
• delete from oldStudents where sid = 937
• insert into oldStudents(sid,sname,age,gpa) values (455, 'alicia', 28, 3.9)
  » what value to give to login? NULL
• update oldStudents set sname='fred' where sid = 937
• update oldStudents set age=26 where sid=937
• update oldStudents set age=19 where sid=937
  » as a result the tuple will disappear from the view (may puzzle the user, if not aware of the definition of the view!)
Updates to single table views seem pretty clear
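The "disappearing tuple" case can be reproduced with a small sketch in SQLite (through Python's sqlite3, an assumed choice of system; the sample row is made up). SQLite views are read-only unless an INSTEAD OF trigger is defined, so the update is applied to the base table, as its translation would be, and the direct insert into the view is expected to be rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students(sid INTEGER PRIMARY KEY, sname TEXT, login TEXT,
                      age INTEGER, gpa REAL);
INSERT INTO Students VALUES (937, 'fred', 'fred@cs', 30, 3.1);  -- hypothetical row
CREATE VIEW oldStudents(sid, sname, age, gpa) AS
    SELECT sid, sname, age, gpa FROM Students WHERE age > 25;
""")

before = conn.execute("SELECT sid FROM oldStudents").fetchall()

# Base-table translation of: update oldStudents set age=19 where sid=937
conn.execute("UPDATE Students SET age = 19 WHERE sid = 937")
after = conn.execute("SELECT sid FROM oldStudents").fetchall()

# Without an INSTEAD OF trigger, SQLite refuses to modify a view.
try:
    conn.execute("INSERT INTO oldStudents VALUES (455, 'alicia', 28, 3.9)")
    direct_insert_allowed = True
except sqlite3.OperationalError:
    direct_insert_allowed = False

print(before, after, direct_insert_allowed)
```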
View updates 2
Another example
    oldStds2(sid,sname,gpa) :- Students(sid,sname,_,age,_), age>25
• insert into oldStds2(sid,sname,gpa) values (455, 'alicia', 3.9)
  » what value to give to login? NULL
  » what value to give to age? NULL => BUT the tuple won't show up in oldStds2, even after insertion into Students??!
  » interestingly, if the view were
        StudentsX(sid,sname,gpa) :- Students(sid,sname,_,age,_), age=10
    there would have been an obvious value to assign to age, namely 10.
View updates 3
Yet another example
    oldStudents3(sname,gpa) :- Students(sid,sname,login,age,gpa), age>25
• delete from oldStudents3 where sname='ann'
  » delete all tuples with name 'ann', even if only one tuple shows in oldStudents3?
  » change these names to null?
• insert into oldStudents3(sname,gpa) values ('bob', 2.0)
  » what value to give to sid? NULL => but key cannot be null??!!
Moral: Even for simple views, things can get complicated!
View updates 4
Consider the following simple example view definition involving joins
    mayVisit(person,store) :- buys(person,food), soldBy(food,store)
and suppose that the database has
    buys: (bob,pie), (ann,pie)
    soldBy: (pie,iga), (pie,sfwy)
• insert mayVisit(bob,foodtown):
  » what food would bob buy?
  » what would foodtown sell?
• delete mayVisit(bob,sfwy):
  » if you delete buys(bob,pie), you delete mayVisit(bob,iga) too!
  » if you delete soldBy(pie,sfwy), you delete mayVisit(ann,sfwy) too!
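The side effects of the two candidate translations can be checked with a few lines of Python, modelling the relations as sets of tuples and the view as a join. The function name may_visit is a hypothetical helper introduced only for this sketch.

```python
# Base relations from the slide.
buys = {("bob", "pie"), ("ann", "pie")}
sold_by = {("pie", "iga"), ("pie", "sfwy")}

def may_visit(buys, sold_by):
    """mayVisit(person,store) :- buys(person,food), soldBy(food,store)."""
    return {(p, s) for (p, f) in buys for (f2, s) in sold_by if f == f2}

view = may_visit(buys, sold_by)
target = ("bob", "sfwy")  # the view tuple the user asked to delete

# Translation 1: delete buys(bob,pie).
option1 = may_visit(buys - {("bob", "pie")}, sold_by)
# Translation 2: delete soldBy(pie,sfwy).
option2 = may_visit(buys, sold_by - {("pie", "sfwy")})

# View tuples lost in addition to the requested one.
side_effects1 = view - option1 - {target}
side_effects2 = view - option2 - {target}
print(side_effects1, side_effects2)
```

Both translations remove the requested tuple, and each also removes one tuple the user never mentioned, which is why neither can be picked automatically.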
View updates 5
But not all view definitions involving joins are problematic:
    StdCrsB(sid,sname,crsid) :- Student(sid,sname,...), Enrol(sid,crsid,grade), grade='B'
• insert StdCrsB(4034,'zubin',cs399)
  » has clear meaning of inserting Student, if necessary, and Enrol
• delete StdCrsB(3555,'shen',cs399)
  » probably does not mean delete the student, only the enrolment
Problem is that it is hard for the system to figure out the right meaning!
• SQL systems have their own rules
• Better solution coming: TRIGGERS
The general view-update problem
• Sometimes a view update has a “natural” translation into base updates: usually when there is a single minimal change that can accomplish it. (At other times, the domain semantics seems to give a natural meaning -- see previous example)
• So, starting from Person(name,age)
  » NAMES(Name) :- (SELECT name FROM Person)
    insert, delete & update all seem to have obvious translations. DBMSs support such updates directly.
  » YOUNG(Name) :- (SELECT name FROM Person WHERE age=30)
    deletion, update, & insertion have an obvious and unique translation into the same operation on Person. But not when (age<30) replaces (age=30)!
  » AGES(Age) :- (SELECT age FROM Person)
    more doubts arise. (To delete age 5, you have only one choice: delete all rows in Person with age 5. If there is only one such tuple, that makes sense. If there are many, it is a bit too drastic as an automatic change?? There is also the alternative of setting age=5 to null.)
The general view-update problem
  » And for a view like
        Coeval(X,Y) :- Person(X,A), Person(Y,A).
    although some updates, like changing the value of X, affect only one tuple (and hence seem to be the natural choice), other actions, like deleting (b,c), can be accomplished in many alternative ways, with no reason to choose one over the other (e.g., if the common age of b and c was 25: delete (b,25)? delete (c,25)? both? or even alter the age of b and/or c?).
DBMSs support a small subset of updates with simple and unambiguous semantics (based on an easy syntactic check of the view definition) and leave the rest to be “programmed” (using triggers) by application developers.
Materialized Views
• A view whose tuples are stored in the database is said to be materialized.
  » Provides fast access, like a (very high-level) cache.
  » At a cost: need to maintain the view as the underlying tables change.
  » Alternative techniques for view maintenance:
    • Re-compute view after every update to the base relations used in its definition
    • Incremental updates based on changes to the base relations. (Can be expressed with triggers - see later.)
Triggers (and SQL)
Triggers
• Constraints:
  » How does one carry out kinds of repairs other than propagation?
  » Or repairs for kinds of constraints other than foreign keys?
  » How can one check dynamic integrity constraints: “Salaries do not decrease”? Or check efficiently when general constraints are violated?
• View updates: How to choose one among many reasonable actions to take when views are updated?
• Alerts:
  » How to invoke a program to reorder merchandise when stock gets too low?
  » To remind someone of an unfinished task when a deadline passes?
Triggers
Trigger: procedure that starts automatically if specified changes occur in the DBMS
• Three parts:
  » Event (activates the trigger)
  » Condition (tests whether the trigger should run)
  » Action (what happens if the trigger runs)
Issues:
• Event specification language:
  » primitive changes to table (insert/delete/update),
  » (other alternatives: time alarms, “event patterns”)
• Action
  » can refer to data in event and condition
  » carried out for each tuple changed vs entire update statement
SQL’99 Triggers: example 1
e.g., dynamic integrity constraint checking (and exception handling)
    CREATE TRIGGER dynamicSalaryCheck
    AFTER UPDATE OF salary ON Employee
    REFERENCING OLD ROW AS oldTuple NEW ROW AS newTuple
    FOR EACH ROW
    WHEN (oldTuple.salary > newTuple.salary)
      INSERT INTO WarningTable(empId#, oldSal, newSal)
      VALUES (oldTuple.empId#, oldTuple.salary, newTuple.salary)
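A runnable approximation of the salary trigger, assuming SQLite driven from Python's sqlite3 module. SQLite has no REFERENCING clause (the old and new rows are always called OLD and NEW) and wraps the action in BEGIN ... END. The column empId# is renamed empid here and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee(empid INTEGER PRIMARY KEY, salary INTEGER);
CREATE TABLE WarningTable(empid INTEGER, oldSal INTEGER, newSal INTEGER);
INSERT INTO Employee VALUES (1, 1000), (2, 2000);  -- hypothetical rows

CREATE TRIGGER dynamicSalaryCheck
AFTER UPDATE OF salary ON Employee        -- event
FOR EACH ROW
WHEN OLD.salary > NEW.salary              -- condition
BEGIN                                     -- action
    INSERT INTO WarningTable(empid, oldSal, newSal)
    VALUES (OLD.empid, OLD.salary, NEW.salary);
END;
""")

conn.execute("UPDATE Employee SET salary = 900 WHERE empid = 1")   # a pay cut
conn.execute("UPDATE Employee SET salary = 2500 WHERE empid = 2")  # a raise
warnings = conn.execute("SELECT * FROM WarningTable").fetchall()
print(warnings)
```

Only the pay cut is logged, since the WHEN condition is false for the raise.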
Triggers - example 2
“Whenever students are inserted into the YoungStudents view, set their age to 18 in the Students table”
    CREATE VIEW YoungStudents(sid, name) AS
      SELECT sid, name FROM Students WHERE age <= 19
    CREATE TRIGGER youngStudentInsert
    INSTEAD OF INSERT ON YoungStudents
    REFERENCING NEW TABLE AS NewYngStudents
    FOR EACH STATEMENT
      INSERT INTO Students(sid, name, age, rating)
      SELECT sid, name, 18, null FROM NewYngStudents
(NewYngStudents is the set of tuples inserted by the statement which caused the trigger to activate.)
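The same idea can be tried in SQLite (via Python's sqlite3, an assumed choice of system). SQLite supports only row-level triggers, so this sketch uses FOR EACH ROW and NEW in place of the statement-level NEW TABLE; the inserted row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students(sid INTEGER PRIMARY KEY, name TEXT, age INTEGER, rating INTEGER);
CREATE VIEW YoungStudents(sid, name) AS
    SELECT sid, name FROM Students WHERE age <= 19;

-- Row-level stand-in for the statement-level trigger on the slide.
CREATE TRIGGER youngStudentInsert
INSTEAD OF INSERT ON YoungStudents
FOR EACH ROW
BEGIN
    INSERT INTO Students(sid, name, age, rating)
    VALUES (NEW.sid, NEW.name, 18, NULL);
END;
""")

# The insert targets the view; the trigger redirects it to the base table.
conn.execute("INSERT INTO YoungStudents(sid, name) VALUES (455, 'alicia')")
students = conn.execute("SELECT * FROM Students").fetchall()
young = conn.execute("SELECT * FROM YoungStudents").fetchall()
print(students, young)
```

Because the trigger supplies age 18, the new tuple satisfies the view condition and shows up in YoungStudents.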
SQL3 syntax
    CREATE TRIGGER <id>
    {BEFORE | AFTER | INSTEAD OF} <trigger event> ON <table name>
    REFERENCING <reference>
    FOR EACH {ROW | STATEMENT}
    WHEN (<condition>)
    <SQL procedure statement>

    <trigger event> ::= INSERT | DELETE | UPDATE [OF <col name>]
    <reference> ::= OLD ROW AS <id> | NEW ROW AS <id> | OLD TABLE AS <id> | NEW TABLE AS <id>
NEW TABLE: the set of rows inserted/updated (as they are after the update)
OLD TABLE: the set of rows deleted/updated (as they were before the update)
Triggers - example 3
• Simulating referential integrity constraint maintenance: what if you wanted to have the following, but SQL did not provide syntax for it?
    table Enrolment(sid foreign key references Student on delete cascade, cid ... ...)
  Row-level version:
    CREATE TRIGGER StudentDelete1
    AFTER DELETE ON Students
    REFERENCING OLD ROW AS GoneStdnt
    FOR EACH ROW
      DELETE FROM Enrolment E WHERE E.sid = GoneStdnt.sid
  Statement-level version:
    CREATE TRIGGER StudentDelete2
    AFTER DELETE ON Students
    REFERENCING OLD TABLE AS StudentsGone
    FOR EACH STATEMENT
      DELETE FROM Enrolment E WHERE E.sid IN (SELECT sid FROM StudentsGone)
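A sketch of the row-level variant in SQLite (through Python's sqlite3, an assumed choice of system; SQLite offers no statement-level triggers, so only StudentDelete1 is shown). The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students(sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Enrolment(sid INTEGER, cid TEXT);   -- no declared foreign key
INSERT INTO Students VALUES (1, 'ann'), (2, 'bob');
INSERT INTO Enrolment VALUES (1, 'cs336'), (1, 'cs344'), (2, 'cs336');

-- Simulates ON DELETE CASCADE by hand.
CREATE TRIGGER StudentDelete1
AFTER DELETE ON Students
FOR EACH ROW
BEGIN
    DELETE FROM Enrolment WHERE sid = OLD.sid;
END;
""")

conn.execute("DELETE FROM Students WHERE sid = 1")
remaining = conn.execute("SELECT sid, cid FROM Enrolment").fetchall()
print(remaining)
```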
Triggers for Integrity Checking & Maintenance
• Integrity constraints (ICs) are often not used because they are too expensive to check -- theoretically this needs to be done after every update to the database
• Of course, a constraint like (salary <10000) has no chance of being made false by updates to table Courses, or even to field age of table Employee
• So DBMS implementations would like to check for violations of constraints on as small a set of changes as possible. (DBMS optimizers are not very good at finding this.)
• Moreover, once we find these, users may want to specify how to react to them: sometimes disallow the operation (“abort”); other times invoke repair actions (e.g., “if customer exceeds her credit limit, increase limit by 10% and send email”.)
• Triggers are good for checking for violations and specifying violation handling!
Example of IC
Schema:
    cheque(Check#,Client,Status,Amount)   // Client sent Check# for Amount; Status: cashed,...
    hasAccount(Client,Bank)
    processes(Check#,Bank,State,Date)
IC: “if Calif. bank processes a check for over $100 then client must have account at that bank”
SQL query looking for errors: “Calif. bank processing a check for over $100 but client does not have account at it”
    exists (select *
            from cheque c, processes p
            where c.check#=p.check# and c.amount>100 and p.state='Calif'
              and not exists (select * from hasAccount h
                              where h.client=c.client and h.bank=p.bank))
Example of IC maintenance using trigger
Schema:
    cheque(Check#,Client,Status,Amount)
    hasAccount(Client,Bank)
    processes(Check#,Bank,State,Date)
IC: “if Calif. bank processes a check for over $100 then client must have account at that bank”
IC was true before but could become violated because of a change in the Amount of a cheque
    CREATE TRIGGER AFTER UPDATE OF Amount ON cheque
    FOR EACH STATEMENT
    WHEN exists (select * from cheque c, processes p ...)
      /* action1: locate offending tuple(s) and fix them */
This is an expensive test -- needs a join on cheque and processes
Better way to check and maintain same IC using trigger
    AFTER UPDATE OF amount ON cheque
    REFERENCING NEW ROW AS newTuple
    FOR EACH ROW
    WHEN (newTuple.amount>100) and exists
         (select * from processes p
          where newTuple.check#=p.check# and p.state='Calif'
            and not exists (select * from hasAccount h
                            where newTuple.client=h.client and h.bank=p.bank))
      /* action2: ask newTuple.Amount to be lowered */
Avoids an outer loop over cheque, by using newTuple
Automatic derivation of triggers for Integrity Checking [Ceri&Widom]
• How do we remember what triggers to set?
A simple example first
Schema:
    student(sid,name,age,rank)
    course(cid,cname,cnumber,limit)
    enrolment(sid,cid,term,grade)
IC: “only existing students can enroll in courses”
• (1) Formulate query in Datalog using “error form”:
    error :- enrolment(SID,CID,TERM,GRADE), not student(SID,NAME,AGE,RANK).
Algorithm to find all events that can “falsify” a constraint = make error become true
• (2) eliminate un-needed variables (occurring only once)
    error :- enrolment(SID,_,_,_), not student(SID,_,_,_).
• (3) “positively” occurring predicates (enrolment) can cause new violations on insert
• (4) “negatively” occurring predicates (student) can cause violations on delete
• (5) changes to any columns corresponding to variable occurrences can cause violations on update ( student.{sid} enrolment.{sid} )
What conditions to check? How to fix?
• Check if the entire constraint has become invalid
    AFTER UPDATE OF sid ON enrolment
    FOR EACH STATEMENT
    WHEN exists ((select e.sid from enrolment e) except (select s.sid from student s))
      /* action1: locate offending tuple(s) and fix them */
  This is a merge sort, assuming enrol,std stored sorted
• Check if the part dealing with new value has become invalid
    AFTER UPDATE OF sid ON enrolment
    REFERENCING NEW ROW AS newTuple
    FOR EACH ROW
    WHEN newTuple.sid not in (select s.sid from student s)
      /* action2: fix newTuple */
  This is a search (in sorted list). With only 1 or 2 tuples updated it is fast. With all tuples updated it is slower.
Automatic derivation of triggers (more complex example)
Schema:
    cheque(Check#,Client,Status,Amount)   // Client sent Check# for Amount; Status: cashed,...
    hasAccount(Client,Bank)
    processes(Check#,Bank,State,Date)
IC: “if Calif. bank processes a check for over $100 then client must have account at that bank”
• (1) Formulate query in Datalog using “error form”:
    error :- cheque(Check#,Client,Status,Amount), Amount>100, processes(Check#,Bank,State,Date), State='Calif', not hasAccount(Client,Bank).
Algorithm to find all events that can “falsify” a constraint = make error become true
(2) eliminate un-needed variables (occurring only once)
    error :- cheque(Check#,Client,_,Amount), Amount > 100, processes(Check#,Bank,State,_), State='Calif', not hasAccount(Client,Bank).
(3) “positively” occurring predicates (cheque, processes) can cause new violations on insert
(4) “negatively” occurring predicates (hasAccount) can cause violations on delete
(5) changes to any columns corresponding to variable occurrences can cause violations on update ( cheque.{Check#,Client,Amount} processes.{Check#,Bank,State} hasAccount.{Client,Bank})
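Steps (2) to (5) are mechanical enough to sketch in Python. This is an illustrative reading of the slides, not the algorithm as published by Ceri and Widom; the function name falsifying_events and the encoding of a rule as (table, [(column, variable), ...]) pairs are assumptions made for this example.

```python
from collections import Counter

def falsifying_events(positive, negative, compared=()):
    """Return (event, table, columns) triples that may make `error` true.

    positive / negative: lists of (table, [(column, variable), ...]) for the
    un-negated / negated predicates of the error rule.
    compared: variables that also occur in comparisons such as Amount > 100.
    """
    literals = positive + negative
    # Step 2: a variable occurring only once is un-needed (it becomes "_").
    occurrences = Counter(var for _, args in literals for _, var in args)
    occurrences.update(compared)
    events = []
    # Step 3: inserts into positively occurring predicates.
    events += [("INSERT", table, None) for table, _ in positive]
    # Step 4: deletes from negatively occurring predicates.
    events += [("DELETE", table, None) for table, _ in negative]
    # Step 5: updates of columns whose variables are still needed.
    for table, args in literals:
        cols = {col for col, var in args if occurrences[var] > 1}
        if cols:
            events.append(("UPDATE", table, cols))
    return events

# Simple example: error :- enrolment(SID,CID,TERM,GRADE), not student(SID,NAME,AGE,RANK).
simple = falsifying_events(
    [("enrolment", [("sid", "SID"), ("cid", "CID"), ("term", "TERM"), ("grade", "GRADE")])],
    [("student", [("sid", "SID"), ("name", "NAME"), ("age", "AGE"), ("rank", "RANK")])],
)

# Cheque example, with Amount and State also used in comparisons.
cheque_events = falsifying_events(
    [("cheque", [("Check#", "C"), ("Client", "Cl"), ("Status", "St"), ("Amount", "Am")]),
     ("processes", [("Check#", "C"), ("Bank", "B"), ("State", "S"), ("Date", "D")])],
    [("hasAccount", [("Client", "Cl"), ("Bank", "B")])],
    compared=("Am", "S"),
)
print(simple)
print(cheque_events)
```

Each returned triple corresponds to one trigger event that would need a checking trigger.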
Using triggers to spec view updates
Recall
    StdCrsB(sid,sname,crsid) :- Student(sid,sname,...), Enrol(sid,crsid,grade), grade='B'
• insert StdCrsB(4034,'zubin',cs399) had clear meaning
    CREATE TRIGGER INSTEAD OF INSERT ON StdCrsB
    REFERENCING NEW ROW AS new
    FOR EACH ROW
    BEGIN
      INSERT INTO Student(sid,sname) VALUES (new.sid, new.sname);
      INSERT INTO Enrol(sid,crsid,grade) VALUES (new.sid, new.crsid, 'B');
    END
Aside: on the need for ‘transactions’ for IC
• Consider the situation where you have constraints that
  » every course must have a student enrolled in it
  » enrolment has foreign key constraints to student and course
• How can you order
  » insert (cs333,...) into course;
  » insert (yuan,cs333) into enrolment;
  without violating a constraint in between the two updates?
• Solution: “transaction”: group sequences of updates
    begin
      insert (cs333,...) into course;
      insert (yuan,cs333) into enrolment;
      (commit)
    end
  and check constraints only at the end of the transaction.
• Transactions are also useful for many other things! (e.g., if some IC is violated at the end, all updates are undone automatically.)
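End-of-transaction checking can be observed with a deferred foreign key in SQLite (via Python's sqlite3, an assumed choice of system). This sketch covers only the foreign key from enrolment to course; the "every course must have a student" constraint has no declarative form in SQLite and is left out. The course id cs999 is a made-up value.

```python
import sqlite3

# isolation_level=None: transactions are controlled by explicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE course(cid TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE enrolment(
        sid TEXT,
        cid TEXT REFERENCES course(cid) DEFERRABLE INITIALLY DEFERRED)
""")

# The enrolment row is inserted before its course exists; the constraint
# is checked only at COMMIT, by which time it holds again.
conn.execute("BEGIN")
conn.execute("INSERT INTO enrolment VALUES ('yuan', 'cs333')")
conn.execute("INSERT INTO course VALUES ('cs333')")
conn.execute("COMMIT")

# A transaction that still violates the constraint at the end cannot commit.
conn.execute("BEGIN")
conn.execute("INSERT INTO enrolment VALUES ('yuan', 'cs999')")
try:
    conn.execute("COMMIT")
    committed = True
except sqlite3.IntegrityError:
    committed = False
    conn.execute("ROLLBACK")  # undo the offending update

enrolments = conn.execute("SELECT sid, cid FROM enrolment").fetchall()
print(committed, enrolments)
```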
Using triggers to spec view updates (2)
    StdCrsB(sid,sname,crsid) :- Student(sid,sname,...), Enrol(sid,crsid,grade), grade='B'
• delete StdCrsB(4034,'zubin',cs399) means remove Enrolment (rather than remove Student or change grade from 'B', which would have had the same effect)
    INSTEAD OF DELETE ON StdCrsB
    REFERENCING OLD TABLE AS Oldies
    FOR EACH STATEMENT
      DELETE FROM Enrol e WHERE (e.sid,e.crsid) IN (SELECT sid, crsid FROM Oldies)
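Both StdCrsB triggers (insert and delete) can be exercised together in SQLite, here through Python's sqlite3 (an assumed choice of system). The trigger names are hypothetical, the triggers are row-level because SQLite has no statement-level triggers, and INSERT OR IGNORE is one possible reading of "inserting Student, if necessary".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student(sid INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE Enrol(sid INTEGER, crsid TEXT, grade TEXT);
CREATE VIEW StdCrsB(sid, sname, crsid) AS
    SELECT S.sid, S.sname, E.crsid
    FROM Student S JOIN Enrol E ON E.sid = S.sid
    WHERE E.grade = 'B';

CREATE TRIGGER StdCrsB_insert INSTEAD OF INSERT ON StdCrsB
FOR EACH ROW
BEGIN
    -- Add the student only if not already present.
    INSERT OR IGNORE INTO Student(sid, sname) VALUES (NEW.sid, NEW.sname);
    INSERT INTO Enrol(sid, crsid, grade) VALUES (NEW.sid, NEW.crsid, 'B');
END;

CREATE TRIGGER StdCrsB_delete INSTEAD OF DELETE ON StdCrsB
FOR EACH ROW
BEGIN
    -- Remove the enrolment, keep the student.
    DELETE FROM Enrol
    WHERE sid = OLD.sid AND crsid = OLD.crsid AND grade = 'B';
END;
""")

conn.execute("INSERT INTO StdCrsB VALUES (4034, 'zubin', 'cs399')")
conn.execute("INSERT INTO StdCrsB VALUES (4034, 'zubin', 'cs336')")
after_inserts = conn.execute("SELECT * FROM StdCrsB ORDER BY crsid").fetchall()

conn.execute("DELETE FROM StdCrsB WHERE sid = 4034 AND crsid = 'cs399'")
after_delete = conn.execute("SELECT * FROM StdCrsB ORDER BY crsid").fetchall()
student_rows = conn.execute("SELECT * FROM Student").fetchall()
print(after_inserts, after_delete, student_rows)
```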
Triggers for specifying view update - e.g. 2
VIEW:
    criticalCheques(Check#,Client,Amt) :- cheques(Check#,Client,Status,Amt), Status<>'approved', Amt>50, not hasAccount(Client,Bank).
What should happen when the amt of a tuple in criticalCheques is updated?
  » if <50, disallow
  » if >50, propagate to cheques
Triggers for specifying view update 2
VIEW:
    criticalCheques(Check#,Client,Amt) :- cheques(Check#,Client,Status,Amt), Status<>'approved', Amt>50, not hasAccount(Client,Bank).
    INSTEAD OF UPDATE OF amt ON criticalCheques
    REFERENCING NEW ROW AS new
    FOR EACH ROW
    WHEN new.amt > 50   // if the amt<50 then it is an “illegal” update - in this case ignored
      UPDATE cheques c   // the base relation
      SET c.amt = new.amt
      WHERE c.check#=new.check#
This is like an “application program”, except it is kept by the DBMS and automatically invoked at all the right times.
Triggers for Materialized View Maintenance
• After changing anything in one of the tables appearing in the view definition, we want to make sure the view is up-to-date
  » Use FOR EACH STATEMENT plus OLD TABLE and NEW TABLE in SQL: before and after values of changed tuples
• (Note: real DBMSs that support the declaration of “materialized views” should actually do this for you, so this is only to explain what happens underneath.)
Example
• Sample view definition in Datalog
“Critical cheques are ones that are not approved, are over $50, and are for clients who have no bank accounts”
    criticalCheques(Check#,Client,Amt) :- cheques(Check#,Client,Status,Amt), Status<>'approved', Amt>50, not hasAccount(Client,Bank).
    AFTER INSERT ON cheques
    REFERENCING NEW ROW AS newChq
    FOR EACH ROW
    WHEN newChq.amt > 50 AND newChq.status <> 'approved'
         AND not exists (SELECT * FROM hasAccount h WHERE h.client=newChq.client)
      INSERT INTO criticalCheques VALUES (newChq.check#, newChq.client, newChq.amt)

    AFTER DELETE ON hasAccount
    REFERENCING OLD ROW AS old
    FOR EACH ROW
    WHEN exists (SELECT * FROM cheques c WHERE old.client = c.client AND c.amt > 50 AND c.status <> 'approved')
      /* assumes the deleted account was the client's only one */
      INSERT INTO criticalCheques
        SELECT c.check#, c.client, c.amt FROM cheques c
        WHERE c.client = old.client AND c.amt > 50 AND c.status <> 'approved'
    AFTER DELETE ON cheques
    REFERENCING OLD TABLE AS oldChqs
    FOR EACH STATEMENT
      DELETE FROM criticalCheques c WHERE c.check# IN (SELECT o.check# FROM oldChqs o)

    AFTER INSERT ON hasAccount
    REFERENCING NEW TABLE AS newActs
    FOR EACH STATEMENT
      DELETE FROM criticalCheques c WHERE c.client IN (SELECT a.client FROM newActs a)
+ triggers on updates of columns check#, ...
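The four insert/delete triggers can be tested together against a full recomputation of the view. The sketch below assumes SQLite through Python's sqlite3, stores criticalCheques as an ordinary table, renames check# to chk, and uses row-level triggers throughout (SQLite has no statement-level triggers). Trigger names and sample rows are hypothetical, and update triggers are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cheques(chk INTEGER PRIMARY KEY, client TEXT, status TEXT, amt INTEGER);
CREATE TABLE hasAccount(client TEXT, bank TEXT);
CREATE TABLE criticalCheques(chk INTEGER, client TEXT, amt INTEGER);  -- materialized view

CREATE TRIGGER chq_ins AFTER INSERT ON cheques FOR EACH ROW
WHEN NEW.amt > 50 AND NEW.status != 'approved'
     AND NOT EXISTS (SELECT 1 FROM hasAccount h WHERE h.client = NEW.client)
BEGIN
    INSERT INTO criticalCheques VALUES (NEW.chk, NEW.client, NEW.amt);
END;

CREATE TRIGGER chq_del AFTER DELETE ON cheques FOR EACH ROW
BEGIN
    DELETE FROM criticalCheques WHERE chk = OLD.chk;
END;

CREATE TRIGGER acct_ins AFTER INSERT ON hasAccount FOR EACH ROW
BEGIN
    DELETE FROM criticalCheques WHERE client = NEW.client;
END;

-- Fires only when the client is left with no account at all.
CREATE TRIGGER acct_del AFTER DELETE ON hasAccount FOR EACH ROW
WHEN NOT EXISTS (SELECT 1 FROM hasAccount h WHERE h.client = OLD.client)
BEGIN
    INSERT INTO criticalCheques
        SELECT c.chk, c.client, c.amt FROM cheques c
        WHERE c.client = OLD.client AND c.amt > 50 AND c.status != 'approved';
END;
""")

RECOMPUTE = """
    SELECT c.chk, c.client, c.amt FROM cheques c
    WHERE c.status != 'approved' AND c.amt > 50
      AND NOT EXISTS (SELECT 1 FROM hasAccount h WHERE h.client = c.client)
    ORDER BY c.chk
"""

def in_sync():
    """True if the maintained table equals the view recomputed from scratch."""
    stored = conn.execute("SELECT * FROM criticalCheques ORDER BY chk").fetchall()
    return stored == conn.execute(RECOMPUTE).fetchall()

steps = [
    "INSERT INTO cheques VALUES (1, 'ann', 'pending', 100)",
    "INSERT INTO cheques VALUES (2, 'bob', 'pending', 20)",
    "INSERT INTO cheques VALUES (3, 'ann', 'approved', 500)",
    "INSERT INTO hasAccount VALUES ('ann', 'iga bank')",
    "INSERT INTO cheques VALUES (4, 'ann', 'pending', 70)",
    "DELETE FROM hasAccount WHERE client = 'ann'",
    "DELETE FROM cheques WHERE chk = 1",
]
sync_after_each_step = []
for sql in steps:
    conn.execute(sql)
    sync_after_each_step.append(in_sync())

final_view = conn.execute("SELECT * FROM criticalCheques ORDER BY chk").fetchall()
print(sync_after_each_step, final_view)
```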
General algorithm for deriving triggers
• From view definition rule
      V :- P1,...,Pn, C
  get 2n ECA rules
      on {del|ins} Pi, if P1,...,Pi-1, Pi+1,...,Pn, C then {del|ins} V
• But we also need to worry about multiple derivations of the same fact (even if Pittsburgh airport closes, can still find routes via Houston). So need rules to re-insert values!
      on del V, if P1,...,Pi-1, Pi, Pi+1,...,Pn, C then ins V
• order: (delete*) reinsert (insert*)
• extend to deal with negation
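The delete-then-reinsert order can be illustrated on a tiny view with two derivations of the same fact. The view two_hop, the flights relation and the cities other than Pittsburgh and Houston are hypothetical, chosen to mirror the airport remark above.

```python
def two_hop(flights):
    """twoHop(X,Y) :- flight(X,Z), flight(Z,Y)."""
    return {(x, y) for (x, z) in flights for (z2, y) in flights if z == z2}

flights = {("nyc", "pit"), ("pit", "lax"), ("nyc", "hou"), ("hou", "lax")}
view = two_hop(flights)            # materialized view before the change

removed = {("nyc", "pit")}         # Pittsburgh leg is cancelled
remaining = flights - removed

# Delete phase: drop every view fact that has a derivation using a removed tuple.
candidates = (
    {(x, y) for (x, z) in removed for (z2, y) in flights if z == z2}
    | {(x, y) for (x, z) in flights for (z2, y) in removed if z == z2}
)
after_delete = view - candidates

# Reinsert phase: put back candidates that are still derivable another way.
still_derivable = two_hop(remaining)
after_reinsert = after_delete | {fact for fact in candidates if fact in still_derivable}

print(after_delete, after_reinsert)
```

Without the reinsert phase the route from nyc to lax would be lost, even though it can still be derived via Houston.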
Example
    route(X,Y) :- train(X,Y)
    route(X,Y) :- route(X,Z), route(Z,Y)
    reachCal(C) :- station(C,S), S='calif'
    reachCal(C) :- route(C,D), reachCal(D).
    unconnected(X,Y) :- station(X,_), station(Y,_), not route(X,Y).
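To see what these rules compute, here is a naive fixpoint evaluation in Python. The train and station facts are invented for illustration, and recomputing from scratch like this is the baseline that incremental trigger-based maintenance tries to avoid.

```python
# Hypothetical base facts: train(from, to) and station(city, state).
train = {("nb", "nyc"), ("nyc", "chi"), ("chi", "la")}
station = {("nb", "nj"), ("nyc", "ny"), ("chi", "il"), ("la", "calif")}

def route_closure(train):
    """route(X,Y) :- train(X,Y).  route(X,Y) :- route(X,Z), route(Z,Y)."""
    route = set(train)
    while True:
        new = {(x, y) for (x, z) in route for (z2, y) in route if z == z2} - route
        if not new:          # fixpoint reached
            return route
        route |= new

route = route_closure(train)

# reachCal(C) :- station(C,S), S='calif'.  reachCal(C) :- route(C,D), reachCal(D).
reach_cal = {c for (c, s) in station if s == "calif"}
while True:
    new = {c for (c, d) in route if d in reach_cal} - reach_cal
    if not new:
        break
    reach_cal |= new

# unconnected(X,Y) :- station(X,_), station(Y,_), not route(X,Y).
cities = {c for (c, _) in station}
unconnected = {(x, y) for x in cities for y in cities if (x, y) not in route}

print(sorted(route), sorted(reach_cal), len(unconnected))
```

The negated route in the last rule is evaluated only after route is complete, which is the stratification that "extend to deal with negation" has to respect.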
Triggers: summary
• Pros
  » Like with ICs, takes stuff out of application code and makes it uniformly available, enforcing enterprise-wide policies.
  » Allows optimization
• Cons
  » Difficulty of writing large sets of triggers, like programming
    • termination
    • confluence