Data Binding Workshop, Avaya Labs, June 2003
LegoDB: Cost-based XML to Relational "Shredding"
Jerome Simeon
Bell Labs – Lucent Technologies
Joint work with:
Juliana Freire (Bell Labs, OGI)
Jayant Haritsa (IISc, Bangalore)
Maya Ramanath (IISc, Bangalore)
Prasan Roy (Bell Labs, IIT Bombay)
XML Data Management
Application is based on XML:
– XML documents to represent information
– XML Schema to model information
– XPath / XQuery to access and manipulate information
[Figure: diagram with boxes labelled XML documents, XML queries, XQuery engine, RDBMS and XML results]
Find the title, year and box office proceeds for all 2001 movies:
for $v in document("imdbdata")/imdb/show
where $v/year = 2001
return ($v/title, $v/year, $v/box_office)
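To make the example query concrete, here is a minimal Python sketch that evaluates the same selection over a small in-memory document. The sample data is invented for illustration, and the standard-library ElementTree API merely stands in for an XQuery engine; it is not how LegoDB evaluates queries.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names follow the query on the slide.
IMDB = """
<imdb>
  <show><title>Movie A</title><year>2001</year><box_office>100</box_office></show>
  <show><title>Movie B</title><year>1999</year><box_office>50</box_office></show>
  <show><title>Series C</title><year>2001</year><seasons>3</seasons></show>
</imdb>
"""

def shows_in_year(xml_text, year):
    """Return (title, year, box_office) for every show of the given year."""
    root = ET.fromstring(xml_text)                 # root element is <imdb>
    results = []
    for show in root.findall("show"):              # for $v in .../imdb/show
        if show.findtext("year") == str(year):     # where $v/year = 2001
            results.append((show.findtext("title"),
                            show.findtext("year"),
                            show.findtext("box_office")))  # None when absent
    return results

if __name__ == "__main__":
    print(shows_in_year(IMDB, 2001))
```

Note that a show without a box_office element yields None here, whereas XQuery would simply return nothing for that path.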
Why Store XML in an RDBMS?
For some applications it makes sense to use a relational database backend:
– Leverage many years of development of relational technology
– Concurrency control/transaction support
– Scalability
– Safety (crash recovery, duplication)
– Integrate with existing data stored in an RDBMS
– Performance matters!!
But storing and querying XML data in an RDBMS is a non-trivial task
XML and Relational Databases
Mismatch between the relational model and XML.
How to store XML data in relational tables?
– Mapping ("shredding") XML data into flat and regular tables
How to evaluate XML queries over relational tables?
– Mapping XQuery into SQL (or SQL/XML?)...
Literature filled with various mapping proposals:
– Many variations over binary tables (Florescu et al., Grust et al.)
– Shanmugasundaram et al. try to inline as many nested elements as possible in the same table
There are many alternative mappings!
Various mappings...
TABLE Show
(show_id INT,
title STRING,
year INT,
box_office INT,
seasons INT)
TABLE Review
(review_id INT,
tilde STRING,
review STRING,
parent_Show INT)
(I) Inline as many elements as possible
TABLE Show
(show_id INT,
title STRING,
year INT,
box_office INT,
seasons INT)
TABLE NYTReview
(review_id INT,
review STRING,
parent_Show INT)
TABLE Review
(review_id INT,
tilde STRING,
review STRING,
parent_Show INT)
(II) Partition the reviews table: one for NYT, one for the rest
TABLE Show1
(show1_id INT,
title STRING,
year INT,
box_office INT)
TABLE Show2
(show2_id INT,
title STRING,
year INT,
seasons INT)
TABLE Review
(review_id INT,
tilde STRING,
review STRING,
parent_Show INT)
(III) Split Show table into TV and Movies
define group Show {
  element show {
    element title type string,
    element year type integer?,
    element review { element * type string* },
    (element box type string | element seasons type string)
  }
}
...have various performances
[Figure: bar chart comparing mappings (I), (II) and (III) on Q1, Q2, Q3, Q4, W1 and W2; vertical axis from 0 to 1.4]
No given mapping is best
Q1: Simple selection query
Q2: Join involving a fragment of the reviews
Q3: Publishing query
The LegoDB Storage "Shredding" Engine
An optimization approach:
– automatically explores a space of possible mappings
– selects the mapping which has the lowest cost for a given application
Important features:
– Application-driven: takes into account schema, data statistics and query workload
– Logical/physical independence: interface is XML-based (XML Schema, XQuery, XML data statistics)
– Leverage existing technology: XML standards; XML-specific operations for generating space of mappings; relational optimizer for evaluating configurations
Mapping an XML-Schema into Relations
define group Show {
  element show {
    element title type string,
    element year type integer,
    group Reviews*,
    ...
  }
}
define group Reviews {
  element review type string
}
XML Schema
TABLE Show_Table( Show_id INT,
title STRING,
year INT )
TABLE Review_Table ( Review_id INT,
review STRING,
parent_Show INT )
Relational schema
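As an illustration of what "shredding" into these two tables could look like, the sketch below loads a tiny document into an in-memory SQLite database. The sample document, its `<imdb>` root element and the `shred` helper are assumptions made for this example; LegoDB's actual loader is not shown on the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; the <imdb> wrapper is assumed for illustration.
DOC = """
<imdb>
  <show><title>Show A</title><year>2001</year>
    <review>Great</review><review>Too long</review></show>
  <show><title>Show B</title><year>1999</year></show>
</imdb>
"""

def shred(xml_text):
    """Load every <show> into Show_Table and every <review> into Review_Table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Show_Table "
                 "(Show_id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
    conn.execute("CREATE TABLE Review_Table "
                 "(Review_id INTEGER PRIMARY KEY, review TEXT, parent_Show INTEGER)")
    for show in ET.fromstring(xml_text).findall("show"):
        cur = conn.execute(
            "INSERT INTO Show_Table (title, year) VALUES (?, ?)",
            (show.findtext("title"), int(show.findtext("year"))))
        show_id = cur.lastrowid                    # key generated for this show
        for review in show.findall("review"):
            # Each repeated child becomes a row pointing back to its parent.
            conn.execute(
                "INSERT INTO Review_Table (review, parent_Show) VALUES (?, ?)",
                (review.text, show_id))
    return conn

if __name__ == "__main__":
    db = shred(DOC)
    print(db.execute("SELECT * FROM Show_Table").fetchall())
    print(db.execute("SELECT * FROM Review_Table").fetchall())
```

The parent_Show column carries the nesting that the XML document expressed implicitly, which is why queries across the two tables need a join.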
A word about mapping queries
XQuery
for $x in //show/review
where contains($x/review, "Potter")
SQL
select review
from Show_Table, Review_Table
where parent_Show = Show_id   -- join here!
and review like '%Potter%'
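The translated query can be tried out with a small sketch on SQLite. The rows below are invented, and `LIKE` stands in for XQuery's `contains()`; note that SQLite's `LIKE` is case-insensitive for ASCII text, so this is only an approximation of the XQuery semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Show_Table (Show_id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE Review_Table (Review_id INTEGER PRIMARY KEY, review TEXT, parent_Show INTEGER);
-- Hypothetical sample rows
INSERT INTO Show_Table VALUES (1, 'Show A', 2001), (2, 'Show B', 2002);
INSERT INTO Review_Table VALUES
  (1, 'Potter is back', 1),
  (2, 'A quiet drama', 1),
  (3, 'More Potter', 2);
""")

def reviews_mentioning(conn, word):
    """Return the text of every review of a show that mentions `word`."""
    sql = """
        SELECT review
        FROM Show_Table, Review_Table
        WHERE parent_Show = Show_id            -- join between parent and child
          AND review LIKE '%' || ? || '%'      -- substring test
        ORDER BY Review_id
    """
    return [row[0] for row in conn.execute(sql, (word,))]

if __name__ == "__main__":
    print(reviews_mentioning(conn, "Potter"))
```

The join is the price of storing reviews in their own table; a mapping that inlined them would avoid it, which is the kind of trade-off the cost-based approach weighs.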
Transforming Schemas
Key idea: a given document can be validated by different XML Schemas:
– Different but equivalent regular expressions can be used to define an element
– The presence or absence of a type name does not change the semantics of an XML Schema
Applying transformations that manipulate the types (but preserve the element structure of the schema) leads to a space of distinct relational configurations
Define XML Schema transformations that:
– exploit the structure of the schema, and
– lead to useful relational configurations
Adding a group corresponds to splitting a table
define group Show {
element show {
element title type string,
element year type integer,
element review type string*,
...
Original schema
define group Show {
  element show {
    element title type string,
    element year type integer,
    group Reviews*,
    ...
  }
}
define group Reviews {
  element review type string
}
Transformed schema
TABLE Show( Show_id INT,
title STRING,
year INT )
TABLE Review (Review_id INT,
review STRING,
parent_Show INT )
Transformed Relational schema
TABLE Show( Show_id INT,
title STRING,
year INT,
review STRING)
Relational schema
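The rule "one named group, one table" can be sketched in a few lines of Python. The tuple encoding of a schema and the `tables_for` helper are hypothetical, chosen only for this example; the group is called Review here so that the generated names match the tables on the slide.

```python
def tables_for(groups):
    """Map each named group to one table.

    `groups` maps a group name to a list of (kind, name, sql_type) fields,
    where kind is "element" (inlined as a column) or "group" (a reference
    to another named group, which gets a foreign key back to the parent).
    """
    tables = {name: [(f"{name}_id", "INT")] for name in groups}
    # First pass: scalar elements become columns of their own group's table.
    for name, fields in groups.items():
        for kind, ident, sql_type in fields:
            if kind == "element":
                tables[name].append((ident, sql_type))
    # Second pass: each group reference adds a parent key to the child table.
    for name, fields in groups.items():
        for kind, ident, _ in fields:
            if kind == "group":
                tables[ident].append((f"parent_{name}", "INT"))
    return tables

# Original schema: review is declared inline, so it stays in the Show table.
original = {"Show": [("element", "title", "STRING"),
                     ("element", "year", "INT"),
                     ("element", "review", "STRING")]}

# Transformed schema: review is wrapped in its own named group.
transformed = {"Show": [("element", "title", "STRING"),
                        ("element", "year", "INT"),
                        ("group", "Review", None)],
               "Review": [("element", "review", "STRING")]}

if __name__ == "__main__":
    print(tables_for(original))
    print(tables_for(transformed))
```

Both schemas accept the same documents; only the naming of types differs, and that naming is what decides how many tables are produced.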
Regular expression rewritings
(group A | group B), group C  ==>  group D | group E    (horizontal table partition)
    define group D = (group A, group C)
    define group E = (group B, group C)
group A+  ==>  group A, group A*    (put first item in a table, put the rest in a different table)
element * type string  ==>  element a type string | element (^a) type string    (extracts certain element in separate table)
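The first two rewritings can be sketched as functions over a small expression tree. The nested-tuple encoding below is an assumption made for this example and not LegoDB's internal representation.

```python
# Content models as nested tuples (hypothetical encoding):
#   ("group", "A")   reference to a named group
#   ("seq", x, y)    x followed by y
#   ("union", x, y)  x or y
#   ("plus", x)      one or more x
#   ("star", x)      zero or more x

def distribute_union(expr):
    """(A | B), C  ==>  (A, C) | (B, C): horizontal partition."""
    if expr[0] == "seq" and expr[1][0] == "union":
        _, (_, a, b), c = expr
        return ("union", ("seq", a, c), ("seq", b, c))
    return expr                      # rule does not apply; leave unchanged

def split_repetition(expr):
    """A+  ==>  A, A*: the first item is separated from the rest."""
    if expr[0] == "plus":
        return ("seq", expr[1], ("star", expr[1]))
    return expr                      # rule does not apply; leave unchanged

A, B, C = ("group", "A"), ("group", "B"), ("group", "C")

if __name__ == "__main__":
    print(distribute_union(("seq", ("union", A, B), C)))
    print(split_repetition(("plus", A)))
```

Each rewriting produces an expression that accepts exactly the same sequences as the input, so documents remain valid while the derived tables change.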
LegoDB: Storage Design
[Figure: architecture diagram. Inputs: XML Schema, XML data statistics, XQuery workload. Components: Physical Schema Generation, Physical Schema Transformation, Query/Schema/Stats Translation, Traditional Relational Query Optimizer. Data exchanged: physical schema; relational schema, stats and workload; cost estimate. Output: good configuration.]
Searching for a good configuration
Cost is key: use a relational optimizer as a black box
– Support different cost models
– Quality of selected configuration depends on the accuracy of the optimizer!
Set of possible configurations that result from applying the rewritings is very large, possibly infinite!
How to search for the optimal solution?
– LegoDB uses a greedy search
Importance of statistics for cost evaluation:
– Large collections vs. small collections (e.g., many New York Times reviews or not?)
– Selectivity of predicates
– XML structure distribution (distribution of children / parent relation)
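A greedy search with a black-box cost function can be sketched as follows. This is a generic hill-climbing loop, not LegoDB's actual code; the configuration encoding (the set of elements stored in their own table), the `neighbours` function and the `toy_cost` numbers are all made up for illustration, where a real system would ask the relational optimizer for the cost.

```python
def greedy_search(initial, neighbours, cost):
    """Repeatedly move to the cheapest neighbour until none improves the cost."""
    current, current_cost = initial, cost(initial)
    while True:
        candidates = [(cost(c), c) for c in neighbours(current)]
        if not candidates:
            return current, current_cost
        best_cost, best = min(candidates, key=lambda pair: pair[0])
        if best_cost >= current_cost:          # no rewriting helps: stop
            return current, current_cost
        current, current_cost = best, best_cost

# Toy setting: a configuration is the set of elements stored in their own table.
ELEMENTS = ("review", "aka", "box_office")

def neighbours(config):
    """Configurations reachable by outlining or inlining a single element."""
    return [config ^ {e} for e in ELEMENTS]

def toy_cost(config):
    """Invented cost model standing in for the optimizer's estimate."""
    cost = 10.0
    for element in config:
        cost += -4.0 if element == "review" else 1.0
    return cost

if __name__ == "__main__":
    print(greedy_search(frozenset(), neighbours, toy_cost))
```

Greedy search only guarantees a local optimum, which is the usual trade-off when the space of configurations is too large to enumerate.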
LegoDB: Runtime Support
[Figure: runtime diagram. Labels: good configuration and mapping specification; XQuery workload; DB Loader; XML document; Query Translation; mapping specification; XML result; Commercial RDBMS; tuples; SQL query/results.]
Conclusion
Cost-based approach for generating relational storage for XML
– takes application characteristics into account
  » schema + data statistics + queries
Experiments show that the choice of storage yields significant performance improvements
The same is likely to be true for other indexing techniques:
– Full text index on XML (best for text queries?)
– Native XML indexes (best for XPath queries without schema?)
– Files!!! (best when there are a lot of small documents and no need for concurrency control)
There is no one best way to store XML...
We should try to hide that from the user!
A few other things I've done...
Mapping XQuery to SQL/XML
– Work with Jonathan Robie
– SQL/XML is an extension of SQL to XML
  » Standard mapping from table to XML document
  » Standard mapping from relational schema to XML Schema
  » Extension of SQL query language to build XML elements
– Goal is to evaluate XQueries on top of an ODBC driver supporting XQuery
– Approach: identify a fragment of XQuery which has a direct syntactic mapping into SQL/XML
– Surprise: the syntactic approach worked really well, because of the way XQuery is designed (i.e., FLWOR close to SQL statement).
Galax: XQuery 1.0 implementation
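To illustrate why a FLWOR expression sits close to a SQL statement, the sketch below turns a very restricted FLWOR-like description into a SQL/XML query string: the for clause becomes FROM, where becomes WHERE, and return becomes a SELECT list of `XMLELEMENT` calls. The `Flwor` class and `to_sqlxml` function are hypothetical and cover only this trivial fragment; they are not the mapping developed in the work mentioned above.

```python
from dataclasses import dataclass

@dataclass
class Flwor:
    var: str       # variable name, e.g. "v" for $v
    table: str     # relation the for clause iterates over (assumed known)
    where: list    # list of (column, operator, literal) triples
    returns: list  # child elements returned, in order

def to_sqlxml(q):
    """Build a SQL/XML query string clause by clause."""
    # return ($v/title, ...)  ->  SELECT XMLELEMENT(NAME "title", v.title), ...
    select = ", ".join(f'XMLELEMENT(NAME "{c}", {q.var}.{c})' for c in q.returns)
    # for $v in ...           ->  FROM table v
    sql = f"SELECT {select} FROM {q.table} {q.var}"
    # where $v/year = 2001    ->  WHERE v.year = 2001
    if q.where:
        # repr() is enough for this sketch; real code should use parameters.
        conds = " AND ".join(f"{q.var}.{c} {op} {lit!r}" for c, op, lit in q.where)
        sql += f" WHERE {conds}"
    return sql

if __name__ == "__main__":
    query = Flwor(var="v", table="Show",
                  where=[("year", "=", 2001)],
                  returns=["title", "year", "box_office"])
    print(to_sqlxml(query))
```

Each FLWOR clause maps to one SQL clause without any reordering, which is the sense in which the mapping is syntactic.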
My take on the Data Binding problem
The main problem will be the mismatch between type systems:
SQL vs. XML Schema vs. Object-Oriented
I've never seen a good proposal that brings any two type systems together (then builds a language on top of it)
Where I see the best chance of success:
– Use XML as the data model (can represent pretty much anything).
– XML Schema can represent relational schema (see SQL/XML standard)
– Can it represent Object-Oriented?