Post on 21-Dec-2015
ORDB Implementation Discussion
Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.
From RDB to ORDB
Issues to address when adding OO extensions to a DBMS
Layout of Data
• Deal with large data types: ADTs/BLOBs
  • Special-purpose file space for such data, with special access methods
• Large fields in one tuple:
  • A single tuple may not even fit on one disk page
  • Must break it into sub-tuples and link them via disk pointers
• Flexible layout:
  • Constructed types may have flexibly sized sets, e.g., one attribute can be a set of strings
  • Need to provide metadata inside each type concerning the layout of fields within the tuple
  • Insertion/deletion will cause problems when a contiguous layout of ‘tuples’ is assumed
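
The idea of breaking an oversized tuple into linked sub-tuples can be sketched as follows. This is only an illustration under stated assumptions: `PAGE_PAYLOAD`, `SubTuple`, `store_large_tuple`, and `load_large_tuple` are hypothetical names, a Python dict stands in for the disk, and page numbers play the role of disk pointers.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_PAYLOAD = 16  # hypothetical usable bytes per page (real pages hold kilobytes)


@dataclass
class SubTuple:
    payload: bytes
    next_page: Optional[int]  # "disk pointer" to the next fragment, None at the end


def store_large_tuple(pages: dict, data: bytes, first_page: int) -> int:
    """Split data into page-sized sub-tuples chained by page numbers."""
    chunks = [data[i:i + PAGE_PAYLOAD] for i in range(0, len(data), PAGE_PAYLOAD)] or [b""]
    for offset, chunk in enumerate(chunks):
        page_no = first_page + offset
        is_last = offset == len(chunks) - 1
        pages[page_no] = SubTuple(chunk, None if is_last else page_no + 1)
    return first_page  # the head page identifies the whole tuple


def load_large_tuple(pages: dict, head: int) -> bytes:
    """Follow the pointer chain and reassemble the original tuple."""
    parts = []
    page_no: Optional[int] = head
    while page_no is not None:
        sub = pages[page_no]
        parts.append(sub.payload)
        page_no = sub.next_page
    return b"".join(parts)
```

A 40-byte value stored this way would occupy three chained pages; reading it back costs one page access per fragment, which is why such data often gets its own file space and access methods.
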
Layout of Data
• More layout design choices (clustering on disk):
  • Lay out a complex object nested and clustered on disk (if it is nested and not pointer-based)
  • Where to store objects that are referenced (shared) by several different structures
  • Many design options for objects that are in a type hierarchy with inheritance
  • Constructed types such as arrays require novel methods, like chunking an array into (4x4) subarrays for non-contiguous access
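
Array chunking can be illustrated with a small address computation. The sketch below assumes a 2-D array, row-major numbering of chunks, and row-major order inside each chunk; `chunk_address` and `chunks_touched` are hypothetical helper names.

```python
CHUNK = 4  # chunk edge length, matching the (4x4) subarrays mentioned above


def chunk_address(row: int, col: int, n_cols: int) -> tuple:
    """Map a 2-D array cell to (chunk number, offset within the chunk)."""
    chunks_per_row = (n_cols + CHUNK - 1) // CHUNK  # ceiling division
    chunk_no = (row // CHUNK) * chunks_per_row + (col // CHUNK)
    offset = (row % CHUNK) * CHUNK + (col % CHUNK)
    return chunk_no, offset


def chunks_touched(rows: range, cols: range, n_cols: int) -> set:
    """Chunks that must be read to fetch a rectangular sub-array."""
    return {chunk_address(r, c, n_cols)[0] for r in rows for c in cols}
```

With this layout, reading one full column of an 8x8 array touches two chunks, because neighbouring rows share a chunk instead of being spread over separate row-major pages.
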
Objects/OIDs
• OID generation: uniqueness across time and system
• Object reference handling:
  • Must avoid dangling references
  • Semantics for object manipulation for shared objects
ADTs
• Type representation: size/storage
• Type access: import/export
• Type manipulation: special methods to serve as filter predicates and join predicates
• Special-purpose index structures: efficiency
ADTs
• Mechanism to add index support along with ADT:
  • External storage of index file outside DBMS
  • Provide “access method interface” à la:
    • Open(), close(), search(x), retrieve-next()
    • Plus, statistics on external index
  • Or, generic ‘template’ index structure
    • Generalized Search Tree (GiST), user-extensible
    • Concurrency/recovery provided
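
An access method interface of this kind might look like the sketch below. It is an assumption-laden illustration, not the API of any real DBMS: `AccessMethod` and `SortedListIndex` are hypothetical names, and a sorted in-memory list stands in for an index file stored outside the DBMS.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left
from typing import Optional


class AccessMethod(ABC):
    """Hypothetical interface the DBMS would call on an external index."""

    @abstractmethod
    def open(self) -> None: ...

    @abstractmethod
    def close(self) -> None: ...

    @abstractmethod
    def search(self, key) -> None:
        """Position a cursor at the first entry matching key."""

    @abstractmethod
    def retrieve_next(self) -> Optional[tuple]:
        """Return the next matching (key, rid) entry, or None when exhausted."""


class SortedListIndex(AccessMethod):
    """Toy in-memory stand-in for an externally stored index."""

    def __init__(self, entries):
        self._entries = sorted(entries)  # list of (key, rid) pairs
        self._cursor = None
        self._key = None
        self._is_open = False

    def open(self) -> None:
        self._is_open = True

    def close(self) -> None:
        self._is_open = False
        self._cursor = None

    def search(self, key) -> None:
        if not self._is_open:
            raise RuntimeError("index not open")
        self._key = key
        keys = [k for k, _ in self._entries]
        self._cursor = bisect_left(keys, key)  # first position with key >= search key

    def retrieve_next(self) -> Optional[tuple]:
        if self._cursor is None or self._cursor >= len(self._entries):
            return None
        entry = self._entries[self._cursor]
        if entry[0] != self._key:
            return None  # ran past the last match
        self._cursor += 1
        return entry
```

The query executor only needs the four calls; how the index is organised internally stays hidden behind the interface.
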
Query Processing
• Query Parsing:
  • Type checking for methods
  • Subtyping/Overriding
• Query Rewriting:
  • May translate path expressions into join operators
  • Deal with collection hierarchies (UNION?)
  • Indices or extraction out of collection hierarchy
Query Optimization Core
• New algebra operators must be designed:
  • Such as nest, unnest, array-ops, values/objects, etc.
• Query optimizer must integrate them into the optimization process:
  • New rewrite rules
  • New costing
  • New heuristics
Query Optimization Revisited
• Existing algebra operators revisited: SELECT
  • WHERE clause expressions can be expensive
  • So SELECT pushdown may be a bad heuristic
Selection Condition Rewriting
• EXAMPLE:
  • (tuple.attribute < 50)
    • Only CPU time (on the fly)
  • (tuple.location OVERLAPS lake-object)
    • Possibly complex CPU-heavy computations
    • May involve both I/O and CPU costs
• State of the art:
  • Consider reduction factor only
• Now, we must consider both factors:
  • Cost factor: dramatic variations
  • Reduction factor: unrelated to cost factor
Operator Ordering
[Figure: two operators, op1 and op2]
Ordering of SELECT Operators
• Cost factor: dramatic variations
• Reduction factor: orthogonal to cost factor
• We want: maximal reduction and minimal cost
• Rank(operator) = (reduction) * (1/cost)
• Order operators by rank, applying the highest-ranked operator first
  • High rank (good) -> low in cost, and large reduction
  • Low rank (bad) -> high in cost, and small reduction
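
A minimal sketch of rank-based ordering follows. It assumes that "reduction" means the fraction of tuples a predicate eliminates (1 minus its selectivity), that predicates are independent, and that costs are per-tuple in arbitrary units; under that reading, high-rank predicates are applied first. The `Predicate` class and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    name: str
    cost: float         # per-tuple evaluation cost (arbitrary units)
    selectivity: float  # fraction of tuples that pass, in [0, 1]

    @property
    def rank(self) -> float:
        # Assumption: "reduction" = fraction of tuples eliminated.
        return (1.0 - self.selectivity) / self.cost


def order_by_rank(preds):
    """Highest rank first: cheap, highly reducing predicates run early."""
    return sorted(preds, key=lambda p: p.rank, reverse=True)


def expected_cost_per_tuple(preds) -> float:
    """Expected cost of evaluating preds in the given order on one input tuple."""
    total, surviving = 0.0, 1.0
    for p in preds:
        total += surviving * p.cost   # only surviving tuples reach this predicate
        surviving *= p.selectivity
    return total
```

For a cheap comparison (cost 1, selectivity 0.5) and an expensive OVERLAPS test (cost 100, selectivity 0.1), running the comparison first gives an expected cost of 51 per tuple, versus about 100.1 the other way round, even though the OVERLAPS test filters out more tuples.
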
Access Methods (on what?)
• Indexes that are ADT-specific
• Indexes on navigation paths
• Indexes on methods, not just on columns
• Indexes over collection hierarchies (trade-offs)
• Indexes for new WHERE clause expressions: not just =, <, >, but also “overlaps”
Registering New Index (to Optimizer)
• What WHERE conditions it supports
• Estimated cost for “matching tuple”
  • Given by index designer (user?)
  • Monitor statistics; even construct test plans
• Estimation of reduction factors/join factors:
  • Register auxiliary function to estimate factor
  • Provide simple defaults
• Estimation of method costs (~IO/CPU)
Methods
• Dynamic linking of methods (outside DB)
• Overriding methods in a type hierarchy
• Use of “methods” with implied semantics
• Incorporation of methods into query processing: termination?
• “Untrusted” methods: methods may corrupt the server or modify DB content (side effects)
• Handling of “untrusted” methods:
  • Restrict the language; interpret vs. compile; run in a separate address space from the DB server
Query Optimization with Methods
• Estimation of “costs” of method predicates
• Optimization of method execution:
  • Similar idea to handling correlated nested subqueries; must recognize repetition and rewrite the physical plan
  • Provide some level of precomputation and reuse
• Optimization of method execution:
  • 1. If called on the same input, cache that one result
  • 2. If called on a full column, presort the column first (group-by)
  • 3. Or, precompute the method's results for each possible value in the domain and put them in a hash table: fct(val)
  • Look up in the hash table during query processing, or even join with it, instead of recomputing: val → fct(val)
Query Processing
• User-defined aggregate functions:
  • E.g., “second largest” or “second yellowest”
• Distributive aggregates: incremental computation
• Provide:
  • Initialize(): set up state space
  • Iterate(): update the state per tuple
  • Terminate(): compute final result based on state, and clean up state
• For example: “second largest”
  • Initialize(): allocate 2 fields
  • Iterate(): per tuple, compare numbers
  • Terminate(): return the result and remove the 2 fields
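
The three-call protocol for “second largest” might be sketched like this. The class and the driver function `run_aggregate` are hypothetical; the sketch also assumes that duplicate values count separately, so the second largest of 5, 5, 3 is 5.

```python
class SecondLargest:
    """User-defined aggregate following the Initialize/Iterate/Terminate protocol."""

    def initialize(self) -> None:
        # State space: the two largest values seen so far.
        self.largest = None
        self.second = None

    def iterate(self, value) -> None:
        if self.largest is None or value > self.largest:
            self.second, self.largest = self.largest, value
        elif self.second is None or value > self.second:
            self.second = value

    def terminate(self):
        result = self.second
        del self.largest, self.second  # clean up the state
        return result


def run_aggregate(agg, values):
    """Drive the aggregate the way a query executor would, one tuple at a time."""
    agg.initialize()
    for v in values:
        agg.iterate(v)
    return agg.terminate()
```

Because the state is constant-size and updated per tuple, the aggregate never needs to see the whole input at once; with fewer than two input values it returns `None`.
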
Following Disk Pointers?
• Complex object structures with object pointers may exist (~ disk pointers)
• Navigate a complex object into memory for a long-running transaction, as in CAD
• What to do about “pointers” between subobjects or related objects?
  • Swizzle = replace OID references with in-memory pointers, and unswizzle back at the end
  • Issues: in-memory table of OIDs and their state; a bit in each object pointer indicates its state
  • Different policies for swizzling: on access, attached to object brought in, etc.
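
Swizzling on access can be sketched as follows. All names (`Ref`, `ObjectCache`, `deref`, `unswizzle`) are hypothetical, a Python dict stands in for the object store on disk, and a boolean field plays the role of the per-pointer state bit.

```python
class Ref:
    """Object pointer that is either an OID or a swizzled in-memory pointer."""

    def __init__(self, oid: int):
        self.oid = oid
        self.target = None     # in-memory object once swizzled
        self.swizzled = False  # the per-pointer state bit


class ObjectCache:
    def __init__(self, disk: dict):
        self.disk = disk    # stand-in for the object store: OID -> object
        self.resident = {}  # in-memory table of loaded OIDs

    def deref(self, ref: Ref):
        """Swizzle on access: the OID lookup happens only on first dereference."""
        if not ref.swizzled:
            if ref.oid not in self.resident:
                self.resident[ref.oid] = self.disk[ref.oid]  # bring the object in
            ref.target = self.resident[ref.oid]
            ref.swizzled = True
        return ref.target

    def unswizzle(self, ref: Ref) -> None:
        """Restore the OID form before the object is written back."""
        ref.target = None
        ref.swizzled = False
```

After the first dereference, following the pointer is a plain memory access, which is the point of swizzling for long-running, navigation-heavy transactions.
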
Models of Persistence
• Different models of persistence for OODB implementations:
  • Parallel type systems:
    • E.g., int and dbint
    • User must make the decision at object creation time
    • Allow for user control by “casting” types
  • Persistence by container management:
    • Objects must be placed into “persistent containers” such as relations in order to stay around
    • E.g., Insert o into Collection MyBooks;
    • Could be rather dynamic control without casting
  • Persistence by reachability:
    • Use global variable names for objects and structures
    • Objects referenced by other objects that are reachable by the application are, by transitivity, also persistent
    • Needs garbage collection
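
Persistence by reachability amounts to a graph traversal from the named roots. The sketch below is a simplified illustration: objects are identified by hypothetical string ids, `refs` maps each object to the objects it references, and `reachable` and `collect_garbage` are assumed helper names.

```python
def reachable(roots, refs) -> set:
    """Return every object reachable from the named persistent roots.

    roots: iterable of object ids bound to global (persistent) names.
    refs:  dict mapping an object id to the ids it references.
    """
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue  # already visited; also guards against reference cycles
        seen.add(obj)
        stack.extend(refs.get(obj, ()))
    return seen


def collect_garbage(all_objects, roots, refs) -> set:
    """Objects not reachable from any root are not persistent and can be reclaimed."""
    return set(all_objects) - reachable(roots, refs)
```

An object referenced only by an unreachable object is itself garbage, which is why this model needs a collector instead of explicit deletes.
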
Summary
• A lot of work to get there: from physical database design/layout issues up to logical query optimizer extensions
• ORDB: reuses the existing implementation base and incrementally adds new features on top (but the relation remains the first-class citizen)