O-O, What Are They Doing to Relational Databases?
(The Evolution of DB2 Universal Database)
Michael J. Carey
IBM Almaden
January 1999
Plan for Today's Presentation
The relational DBMS revolution
The object-relational DBMS evolution
O-R features in DB2 Universal Database V5.2
Some O-R implementation tradeoffs (V5.2)
What lies ahead for DB2 UDB & O-R databases?
Questions (and possibly answers)
Please ask questions throughout...!
The Relational DBMS Revolution
The pre-relational era (1970's)
  Graph-based data models
    Hierarchical (IMS), network (CODASYL)
  Low-level, navigational interfaces
    Labor-intensive and error-prone
The relational era (1980's)
  Simple, abstract data model
    Database = set of relations ("tables")
    3 schema levels: views, base tables, physical schema
    Algebra of set-oriented operations
  High-level, declarative interfaces
    SQL, QBE, et al
    Embedded languages, 4GLs
The Relational Model (in one slide)
Employees and departments
Department
  dno  name
  10   Toy
  20   Shoe

Employee
  eno  name   salary    dept
  1    Lou    10000000  10
  7    Laura  150000    20
  22   Mike   80000     20
select E.name, E.salary, D.dno
from Employee E, Department D
where E.salary < 100000
  and D.name = 'Shoe'
  and E.dept = D.dno;
?
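The question mark invites the audience to work out the answer. As a minimal sketch, the slide's two tables and query can be run in SQLite from Python. SQLite stands in for DB2 here purely for convenience, and the department number column is written D.dno to match the Department table.

```python
import sqlite3


def run_shoe_query():
    """Load the slide's sample rows and run the slide's query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table Department (dno integer primary key, name text);
        create table Employee (
            eno    integer primary key,
            name   text,
            salary integer,
            dept   integer references Department(dno)
        );
        insert into Department values (10, 'Toy'), (20, 'Shoe');
        insert into Employee values
            (1,  'Lou',   10000000, 10),
            (7,  'Laura', 150000,   20),
            (22, 'Mike',  80000,    20);
    """)
    rows = conn.execute("""
        select E.name, E.salary, D.dno
        from Employee E, Department D
        where E.salary < 100000
          and D.name = 'Shoe'
          and E.dept = D.dno
    """).fetchall()
    conn.close()
    return rows


print(run_shoe_query())  # [('Mike', 80000, 20)]
```

Only Mike qualifies: Laura is in the Shoe department but earns more than 100000, and Lou is in the Toy department.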
Relational Databases: A Success Story
The relational model has been a big success
  Simplicity has made research tractable
  Data independence yields productivity gains
  Both academia and industry have benefitted
Relational DBMS "goodies" include
  Efficient query optimization and execution
  Well-defined transaction semantics and support
  Excellent multi-user performance and robustness
  Views for data independence, authorization
  Constraints, triggers, and stored procedures for (shared) business rule capture/enforcement
"The" success story for parallel computing
We've Achieved Nirvana ... Right?
The world is becoming increasingly complex
  New data types are appearing (e.g., multimedia)
  Real-world data doesn't fit neatly into tables
    Entities and relationships (vs. tables)
    Variance among entities (vs. homogeneity)
    Set-valued attributes (vs. normalization)
Advanced applications bring complex data
  E.g., CAD/CAM data management, web data management, geographic information management, medical data management, (your favorite application goes here)
So maybe objects are the answer...?
  Yes, if we can keep all the relational "goodies"!
The Object-Relational DBMS Evolution
O-R extension #1: Abstract data types (ADTs)
  New column types and functions/methods
    E.g., text, image, audio, video, time series, point, line, OLE...
  For modeling new kinds of facts about enterprise entities
  Infrastructure for extenders/datablades/cartridges
O-R extension #2: Row types
  Types and functions/methods for rows of tables
    Desirable features include references, inheritance, methods, late binding, and collection-valued attributes
  For modeling enterprise entities with relationships & behavior
  Infrastructure for DBMS-native object management
Recent SQL3 merger: Structured types
  Can use for types of columns and/or tables
"Not Your Father's Employee Type"
Beyond name, rank, and serial number
  New attribute types
    Location (2-d point), job description (text), photo (image), ...
  Associated functions
    Distance(point, point), contains(text, string), ...
Beyond your basic employee record
  Employees come in different flavors
    Emp, RSM, Programmer, Manager, Temp, ...
  Employees have many known relationships
    Manager, department, projects, ...
  Employees have behavior
    Age(Emp), qualified(Emp, Job), hire(Emp), ...
An Employee is a simple "business object"
The OSF Project at IBM Almaden
OSF stands for "Object Strike Force"
  Semi-autonomous group "outside" UDB development
Focus: object-relational extensions for DB2 UDB
  Both near-term and longer-term interests
  Collaborate with our Toronto and Santa Teresa labs
Significant activities to date
  Prototyped "row type" support for DB2 UDB
  Delivered in DB2 UDB Version 5.2 (9/98)
  Significantly revised SQL3 draft standard
  Working on next step plus future technologies
DB2 Universal Database, Version 5
DB2 for Common Servers (Version 2)
  User-defined column types (UDTs/distinct types)
  User-defined functions (UDFs)
  Binary/character large objects (BLOBs/CLOBs)
Distinct types: new data types for columns
  Ex: create distinct type US_Dollar as Real with comparisons;
  US_Dollar is an available UDT with functions =, <>, <, <=, >, >=, US_Dollar(Real), Real(US_Dollar)
User-defined functions: associated operations
  create function CA_Tax (US_Dollar) returns US_Dollar external name 'money!US_Dollar' language C;
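The distinct type and user-defined function above can be approximated outside the database. The following is a rough Python analogy, not DB2 behavior: USDollar, to_real, ca_tax and the 10% rate are all hypothetical names and values chosen for illustration. It shows the two properties the slide lists: comparisons are defined between values of the distinct type, and conversion to and from the base type is explicit.

```python
from dataclasses import dataclass

# Hypothetical flat rate, used only to make the example concrete.
ASSUMED_TAX_RATE = 0.10


@dataclass(frozen=True, order=True)
class USDollar:
    """Stand-in for a distinct type over Real: same representation, separate type."""
    amount: float

    def to_real(self) -> float:
        # Plays the role of the generated cast function Real(US_Dollar).
        return self.amount


def ca_tax(price: USDollar) -> USDollar:
    """Stand-in for the UDF CA_Tax(US_Dollar) returns US_Dollar."""
    if not isinstance(price, USDollar):
        raise TypeError("ca_tax expects a USDollar, not a bare number")
    return USDollar(round(price.to_real() * ASSUMED_TAX_RATE, 2))


print(USDollar(5.0) < USDollar(7.5))  # comparing two USDollar values is allowed
print(ca_tax(USDollar(100.0)))

try:
    USDollar(5.0) < 7.5  # ordering against a bare float is rejected
except TypeError as exc:
    print("rejected:", exc)
```

The design point carried over from the slide is strong typing: a dollar amount cannot be silently ordered against, or passed in place of, an arbitrary real number.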
DB2 Universal Database, Version 5 (cont.)
Lots of other interesting features as well, e.g.:
  Constraints and triggers
  Recursive queries
  OLAP support (cube and rollup)
  Extenders (based on UDTs/UDFs)
Wide range of hardware/software platforms
  PCs: Windows95, NT, OS/2, SCO
  Unix workstations: AIX, Solaris, HP/UX
  Parallel platforms: SMPs, MPPs (e.g., SP2)
Descended from Almaden's Starburst system
  Extensible query compiler (with rule-based query rewrite and query optimizer components)
New O-R Features in DB2 UDB V5.2
Structured types and references
  Named types with attributes, O-O subtyping model
  Ref(T) for directly modelling relationships
Typed tables and table hierarchies
  Oid (user-provided) plus a column per attribute of T
  Subtables for querying and managing subtype instances
Query language extensions
  Substitutability for queries/updates (data independence ++)
  Path expressions for querying relationships easily
  Functions/predicates for runtime type inquiries
Object views (via a novel approach)
  Virtual table hierarchies for flexible access control
  Also facilitates O-O views of legacy tables
A Simple Example
Employee and department tables in the (late) 90's
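[Figure: table hierarchy with person at the root, emp and student as subtables, and exec under emp; a separate dept table; references labeled dept and mgr connect emp and dept]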
Structured Types and References
Create structured types (and references)
create type Person_t as ( name Varchar(40), birthyear Integer);
create type Emp_t under Person_t as ( salary Integer, dept Ref(Dept_t));
create type Exec_t under Emp_t as ( bonus Integer);
create type Student_t under Person_t as ( major Varchar(20));
Structured Types and References (cont.)
Create structured types (cont.)
create type Dept_t as (name Varchar(20),
budget Integer, headcount Integer, mgr Ref(Emp_t));
Okay, so I lied (a little) on the last slide...
alter type Emp_t add attribute dept Ref(Dept_t);
Typed Tables and Table Hierarchies
Now create typed tables (and subtables)
create table person of Person_t (ref is oid user generated);
create table emp of Emp_t under person (dept with options scope dept);
create table exec of Exec_t under emp;
create table student of Student_t under person;
create table dept of Dept_t (ref is oid user generated, mgr with options scope emp);
SQL Query Extensions (by example)
Substitutability
  select E.* from emp E where E.birthyear > 1970 and E.salary > 50000;
Data modification (insert; update/delete)
  insert into emp values (Emp_t('e100'), 'John Smith', 1968, 65000, (select oid from dept where name = 'Database'));
  update person set birthyear = birthyear + 1 where name = 'John Smith';
Path expressions
  select E.name, E.dept->name from emp E where E.dept->mgr->dept->mgr->name = 'Lou Gerstner';
Querying Table Hierarchies: An Example
Person
  oid  name    birthyear
  P1   Harold  1970
  P2   Carol   1958

Emp
  oid  name   birthyear  dept
  P3   Hamid  1956
  P4   Lou    1940       _

Dept
  oid  name       .....
  D1   Databases  .....

select * from Person where name like 'H%'
select name, dept->name from Emp where birthyear < 1960
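The two queries in this example can be traced by hand with a small Python model of the hierarchy. This is an illustrative sketch rather than how DB2 evaluates them: the class names, scan and deref_name are hypothetical, and Hamid's dept reference is assumed to point at D1, which the extracted figure does not show.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dept:
    oid: str
    name: str


@dataclass
class Person:
    oid: str
    name: str
    birthyear: int


@dataclass
class Emp(Person):
    dept: Optional[str] = None  # oid of a Dept row, or None for a null reference


DEPTS = {"D1": Dept("D1", "Databases")}
ROWS = [
    Person("P1", "Harold", 1970),
    Person("P2", "Carol", 1958),
    Emp("P3", "Hamid", 1956, dept="D1"),  # ASSUMPTION: reference target not in source
    Emp("P4", "Lou", 1940, dept=None),
]


def scan(table_type, only=False):
    """Rows visible through a typed table: subtable rows are included unless only=True."""
    for row in ROWS:
        visible = type(row) is table_type if only else isinstance(row, table_type)
        if visible:
            yield row


def deref_name(dept_oid):
    """Follow a dept reference and return the department name, or None if null."""
    dept = DEPTS.get(dept_oid) if dept_oid is not None else None
    return dept.name if dept else None


# select * from Person where name like 'H%'
people_h = [(p.oid, p.name, p.birthyear) for p in scan(Person) if p.name.startswith("H")]

# select name, dept->name from Emp where birthyear < 1960
older_emps = [(e.name, deref_name(e.dept)) for e in scan(Emp) if e.birthyear < 1960]

print(people_h)    # [('P1', 'Harold', 1970), ('P3', 'Hamid', 1956)]
print(older_emps)  # [('Hamid', 'Databases'), ('Lou', None)]
```

The first result shows substitutability: Hamid is stored as an Emp but still appears in a query on Person, projected onto Person's columns. The second shows a path expression over a null reference yielding a null rather than dropping the row.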
SQL Query Extensions (cont.)
Support for type-dependent queries
select * from only (emp) E where dept->budget > 10000000;
select name from person P where deref(oid) is of type (only Emp_t, Student_t);
select type_name(deref(E.oid)), E.* from outer (emp) E where E.oid = Emp_t('e13');
Other Data Definition Features
ref is for object id column
  Unique, user-generated (on insert)
scope clause for reference columns
  Provides critical information to the query optimizer
not null constraints
  Definable at any level of a table hierarchy
  Enforced for indicated table and its subtables
unique constraints
  Root level (and columns) only
create index for physical schema
  Unique or non-unique index on root table
  Non-unique index on subtable
Other Data Definition Features (cont.)
Authorization model for table hierarchies
  grant and revoke on table or subtables
  Substitutability: implicit subtable authorization on columns inherited from an authorized supertable
    Ex #1: select privilege on person table
    Ex #2: update privilege on salary column of emp table
  Some operations require authorization everywhere
    deref function, is of type predicate, and type_xxx functions
  SQL3 also supports granting of table/subtable privileges with hierarchy option
Object Views in DB2 UDB
Typed views and view hierarchies
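[Figure: view hierarchy with vperson at the root and vemp and vstudent as subviews; a separate vdept view; references labeled dept and mgr connect vemp and vdept]
Requirements: virtual table hierarchies
  Typed rows with (derived) object ids
  Views may be quite different from base data
  Support for interconnected "view schemas"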
Types For Object Views
Create types for use in views
create type VPerson_t as ( name Varchar(40));
create type VEmp_t under VPerson_t as ( dept Ref(VDept_t));
create type VStudent_t under VPerson_t as ( kind Varchar(8));
create type VDept_t as (name Varchar(20), mgr Ref(VEmp_t));
Typed View Hierarchies
Now create typed views (and subviews)
create view vperson of VPerson_t (ref is oid user generated) as select VPerson_t(Varchar(oid)), name from only (person);
create view vemp of VEmp_t under vperson (dept with options scope vdept) as select VEmp_t(Varchar(oid)), name, VDept_t(Varchar(dept)) from emp where salary > 0;
create view vstudent of VStudent_t under vperson as select VStudent_t(Varchar(oid)), name, case when major like '%Engineer%' then 'Geek' else 'non-Geek' end from student;
create view vdept of VDept_t ...;
O-R Implementation Issues/Tradeoffs
Some guiding principles for DB2 UDB V5.2
  Performance must equal/exceed relational equivalents
  Design amenable to future plans w.r.t. type evolution
  Structured types must be supported in columns (someday)
  Localize initial changes to query compiler where possible
    Want "free" indexing, rewrites, optimization, parallelization, ...
Influenced by discussions with a CAD/CAM vendor
  Information on existing approach and installations
  Requirements for efficiency of new products
Let's look briefly at two areas
  Table hierarchy representation
  References and path query processing
Implementing Table Hierarchies
Implementation table approach
  One physical table per table hierarchy with:
    Type tag column (to distinguish subtable rows)
    Object id column
    Columns for all columns of the root table and its subtables
Vertical partitioning approach
  One physical root table with:
    Type tag column
    Object id column
    Columns for each root table column
  N physical delta tables (one per subtable) with:
    Object id column
    Columns for each column introduced by this subtable
Implementing Table Hierarchies (cont.)
Horizontal partitioning approach
  N separate physical tables with:
    Object id column
    Columns for every subtable column (inherited or not)
So what did we do for UDB V5.2...?
  Vertical partitioning approach rejected quickly
    Too many joins to materialize subtable rows
    Multi-column constraints and indices problematic
  Horizontal partitioning approach rejected eventually
    Uniqueness issue for user-generated object ids
    Query complexity for multi-hierarchy join queries
      Ex: select p.name, q.name from Person p, Person q where ...
  Implementation table approach taken for V5.2
    Appeared to give us the most "free" functionality
    Adopted despite row size (null columns) downside
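To make the implementation table approach concrete, here is a small sketch in SQLite (via Python) of one physical table holding the whole person hierarchy. It is an emulation under stated assumptions, not DB2's internal design: the table name person_impl, the type tag values, the helper views, and the sample salary, bonus and major values are all hypothetical, and DB2 resolves subtable access inside the query compiler rather than through user-visible views.

```python
import sqlite3


def build_hierarchy():
    """Create one physical table for the person hierarchy plus logical views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table person_impl (
            type_tag  text not null,     -- which (sub)table the row belongs to
            oid       text primary key,  -- one index keeps oids unique hierarchy-wide
            name      text,              -- Person_t attributes
            birthyear integer,
            salary    integer,           -- Emp_t attributes; null for non-employees
            dept      text,
            bonus     integer,           -- Exec_t attribute
            major     text               -- Student_t attribute
        );
        -- Querying person sees every row (substitutability).
        create view person as
            select oid, name, birthyear from person_impl;
        -- Querying emp sees emp rows and exec rows.
        create view emp as
            select oid, name, birthyear, salary, dept from person_impl
            where type_tag in ('Emp_t', 'Exec_t');
        -- ONLY (emp) is a plain filter on the type tag.
        create view only_emp as
            select oid, name, birthyear, salary, dept from person_impl
            where type_tag = 'Emp_t';
    """)
    conn.executemany(
        "insert into person_impl values (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            ("Person_t", "p1", "Harold", 1970, None, None, None, None),
            ("Emp_t", "e100", "John Smith", 1968, 65000, "d1", None, None),
            ("Exec_t", "x1", "Lou", 1940, 900000, "d1", 50000, None),
            ("Student_t", "s1", "Carol", 1958, None, None, None, "Math"),
        ],
    )
    return conn


def names(conn, table):
    # table is one of the fixed view names above, never user input.
    return [r[0] for r in conn.execute(f"select name from {table} order by oid")]


conn = build_hierarchy()
print(names(conn, "person"))    # ['John Smith', 'Harold', 'Carol', 'Lou']
print(names(conn, "emp"))       # ['John Smith', 'Lou']
print(names(conn, "only_emp"))  # ['John Smith']

try:
    conn.execute(
        "insert into person_impl (type_tag, oid, name) values ('Student_t', 'e100', 'Dup')"
    )
except sqlite3.IntegrityError:
    print("duplicate oid rejected across the hierarchy")
```

The sketch shows both sides of the tradeoff on the slide. Subtable access and hierarchy-wide oid uniqueness fall out of ordinary filters and a single index, while every row carries null columns for attributes of types it does not belong to.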
References and Path Expressions
Reference values in tables should have a scope
  "Other end" info for query rewrite and join optimization
  Ditto for authorization checking (static vs. dynamic)
  Schema makes overly wide references unnecessary
  Uniqueness is hierarchy-relative, enforced with an index
V5.2 self-references (object ids) are user-generated
  CAD/CAM vendor had "legacy references" in files
  Different users have different id generation schemes
  Loading cyclic data (e.g., emp <-> dept) is messy and slow
  Ditto for creating objects from an object cache
References and Path Expressions (cont.)
Path expressions are logically equivalent to subqueries
  select E.name, E.dept->name, E.dept->mgr->name from emp E where E.dept->headcount > 10;
Actual approach: shared subquery generation (QGM)
  Compute common paths (prefixes) once to save work
  Not every SQL context accepts an actual subquery
  Also need to handle non-serializable locking levels
Efficiency obtained through query rewrite, e.g.:
  Subquery to outer-join transformation
  Outer-join to join transformation where possible
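The rewrite chain named above can be illustrated with plain SQL run in SQLite from Python. This is a sketch of the logical equivalences only, not of DB2's QGM rewrite rules: the flat emp and dept tables and the sample rows are hypothetical, and the arrows in the slide's query are spelled out by hand as scalar subqueries, then outer joins, then an inner join.

```python
import sqlite3

# Form 1: each path expression written as a correlated scalar subquery.
SUBQUERY_FORM = """
    select E.name,
           (select D.name from dept D where D.oid = E.dept),
           (select M.name from emp M
             where M.oid = (select D2.mgr from dept D2 where D2.oid = E.dept))
    from emp E
    where (select D3.headcount from dept D3 where D3.oid = E.dept) > 10
    order by E.oid
"""

# Form 2: the shared prefix E.dept is computed once, as a left outer join.
OUTER_JOIN_FORM = """
    select E.name, D.name, M.name
    from emp E
    left join dept D on D.oid = E.dept
    left join emp M on M.oid = D.mgr
    where D.headcount > 10
    order by E.oid
"""

# Form 3: the predicate on D.headcount rejects nulls, so the first outer
# join can become an inner join; the join to the manager must stay outer.
INNER_JOIN_FORM = """
    select E.name, D.name, M.name
    from emp E
    join dept D on D.oid = E.dept
    left join emp M on M.oid = D.mgr
    where D.headcount > 10
    order by E.oid
"""


def load():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table dept (oid text primary key, name text, headcount integer, mgr text);
        create table emp  (oid text primary key, name text, dept text);
    """)
    conn.executemany("insert into dept values (?, ?, ?, ?)", [
        ("d1", "Databases", 25, "e1"),
        ("d2", "Toys", 5, "e2"),
        ("d3", "Shoe", 12, None),   # department with no manager
    ])
    conn.executemany("insert into emp values (?, ?, ?)", [
        ("e1", "Lou", "d1"),
        ("e2", "Laura", "d2"),
        ("e3", "Mike", "d1"),
        ("e4", "Hamid", "d3"),
        ("e5", "Carol", None),      # employee with a null dept reference
    ])
    return conn


conn = load()
results = [conn.execute(q).fetchall()
           for q in (SUBQUERY_FORM, OUTER_JOIN_FORM, INNER_JOIN_FORM)]
print(results[0])
print(results[0] == results[1] == results[2])
```

All three forms return the same rows on this data. Hamid's row keeps a null manager name, which is why the second join stays an outer join, while Carol's null dept reference is filtered out by the headcount predicate, which is what makes the first conversion to an inner join safe.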
Where We Are Today in UDB
V5.2 of UDB contains new O-R features
  Structured types with inheritance
  Object tables and table hierarchies
  References and path expressions
  Object views and view hierarchies
Moreover, so does the SQL3 standard
  Includes object views and user-defined references
  IBM, Oracle, Informix heading in same general direction
Work continuing on O-R extensions
  Let's have a brief look...
Additional Object Table Support
Business rule mechanisms for typed tables
  Check constraints on tables/subtables (w/inheritance)
  Referential integrity constraints to and from tables/subtables
  Triggers on tables/subtables
Object modeling and management extensions
  User-defined reference types (ref using)
  More flexible object view definitions
  Type and instance (i.e., row) evolution
Structured types for attributes/columns
  Work in progress at IBM Santa Teresa Lab
  Functions/methods just around the corner as well
Other Exploratory O-R Work
Efficient support for collection types
  Multivalued attributes (e.g., Project.team)
  Flavors: set, multiset, array, list, ...
  Need to integrate into SQL, support querying well
  Some experience from a first prototype
Other activities (and open problems)
  Java mappings & bindings for O-R data
  XML & data-centric web sites ("d-commerce")
  Business object servers (caching/consistency)
  Heterogeneous data & O-R database systems
  User-defined and/or external indexing
  Optimizer "hooks" for new data types
  Etc.!
Partial List of UDB O-R Contributors
Almaden Research Center
  Mike Carey, Don Chamberlin, Srinivasa Narayanan, Bennet Vance; C.M. Park; Guido Moerkotte
Santa Teresa Lab
  Nelson Mattos
  Gene Fuh, Michelle Jou, Brian Tran
Toronto Lab
  Doug Doole, Serge Rielau, Rick Swagerman
  Leo Lao, Walid Rjaibi, Calisto Zuzarte
  Cheryl Greene, various other consultants/hecklers
And as for future versions of UDB
  Your name could appear here! (MS/PhD)
What About Object-Oriented DBMSs?
OOPL + DBMS = OO-DBMS
  Commonly based on C++ or Smalltalk
  Persistence, collections, queries, versions, ...
Lots of interesting and useful research results
  O-O data models and query languages
  O-O query processing, system architecture, performance
  Various products (O2, ObjectStore, Versant, Objectivity, ...)
No widespread commercial acceptance
  Many differences across systems (despite ODMG-93)
  Never really caught up to RDBMS technology
    Schema compilation, evolution painful
    Missing many of the relational "goodies"
    Single-language focus, lack of (relational) tools