Bridging Relational Technology and XML
Jayavel Shanmugasundaram
University of Wisconsin & IBM Almaden Research Center
Business to Business Interactions
[Diagram: Tires R Us (Order Fulfillment Application on a Relational Database System) and Cars R Us (Purchasing Application on a Relational Database System) exchange eXtensible Markup Language (XML) over the Internet.]
Shift in Application Developers' Conceptual Data Model
[Diagram labels. XML: XML application development. Relations: code to convert XML data to relational data, relational data manipulation, code to convert relational data to XML data.]
Are XML Database Systems the Answer?
[Diagram: Tires R Us (Order Fulfillment Application) and Cars R Us (Purchasing Application) exchange eXtensible Markup Language (XML) over the Internet, each running on an XML Database System.]
Why use Relational Database Systems?
• Highly reliable, scalable, optimized for performance, advanced functionality
  – Result of 30+ years of Research & Development
  – XML database systems are not "industrial strength" … and not expected to be in the foreseeable future
• Existing data and applications
  – XML applications have to inter-operate with existing relational data and applications
  – Not enough incentive to move all existing business applications to XML database systems
• Remember object-oriented database systems?
A Solution
[Diagram: Tires R Us (Order Fulfillment Application) and Cars R Us (Purchasing Application) exchange eXtensible Markup Language (XML) over the Internet. Each side keeps its Relational Database System and adds an XML Translation Layer.]
XML Translation Layer (Contributions)
• Store and query XML documents
  – Harnesses relational database technology for this purpose
• Publish existing relational data as XML documents
  – Allows relational data to be viewed in XML terms
Bridging Relational Technology and XML
[Diagram labels. XML: XML application development. Relations: code to convert XML data to relational data, relational data manipulation, code to convert relational data to XML data. An XML Translation Layer is added to the picture.]
Outline
• Motivation & High-level Solution
• Background (Relations, XML)
• Storing and Querying XML Documents
• Publishing Relational Data as XML Documents
• Conclusion
Relational Schema and Data

PurchaseOrder

| Id | Customer | Day | Month | Year |
|---|---|---|---|---|
| 200I | Cars R Us | 10 | June | 1999 |
| 300I | Bikes R Us | null | July | 1999 |

Item

| Pid | Name | Quantity | Cost |
|---|---|---|---|
| 200I | Firestone Tire | 50 | 2000.00 |
| 200I | Goodyear Tire | 200 | 8000.00 |
| 300I | Trek Tire | 20 | 2500.00 |
| 300I | Schwinn Tire | 100 | 400.00 |

Payment

| Pid | Installment | Percentage |
|---|---|---|
| 200I | 1 | 40% |
| 200I | 2 | 60% |
| 300I | 1 | 100% |
XML Document

    <PurchaseOrder id="200I" customer="Cars R Us">
      <Date>
        <Day> 10 </Day>
        <Month> June </Month>
        <Year> 1999 </Year>
      </Date>
      <Item name="Firestone Tire" cost="2000.00">
        <Quantity> 50 </Quantity>
      </Item>
      <Item name="Goodyear Tire" cost="8000.00">
        <Quantity> 200 </Quantity>
      </Item>
      <Payment> 40% </Payment>
      <Payment> 60% </Payment>
    </PurchaseOrder>

Features highlighted on the document:
• Self-describing tags
• Nested structure
• Nested sets
• Order
XML Schema

    PurchaseOrder → <PurchaseOrder id={integer} customer={string}> Date (Item)* (Payment)* </PurchaseOrder>
    Date → <Date> Day? Month Year </Date>
    Day → <Day> {integer} </Day>
    Month → <Month> {string} </Month>
    Year → <Year> {integer} </Year>
    Item → <Item name={string} cost={float}> Quantity </Item>
    … and so on
XML Schema (contd.)

    PurchaseOrder → <PurchaseOrder id={integer} customer={string}> Date? (Item | Payment)* </PurchaseOrder>
    PurchaseOrder → <PurchaseOrder id={integer} customer={string}> (Date | Payment*) (Item (Item Item)* Payment)* </PurchaseOrder>
    PurchaseOrder → <PurchaseOrder id={integer} customer={string}> Date Item (PurchaseOrder)* Payment </PurchaseOrder>
XML Query
Find all the items bought by "Cars R Us" in 1999

    For $po in /PurchaseOrder
    Where $po/@customer = "Cars R Us" and $po/Date/Year = 1999
    Return $po/Item
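To make the semantics of the query concrete, here is a minimal Python sketch that evaluates the same For/Where/Return logic directly over an in-memory document with the standard `xml.etree.ElementTree` module. The wrapping `<Orders>` element, the second purchase order, and the function name `items_bought` are assumptions made for illustration; the slides do not prescribe an evaluator.

```python
import xml.etree.ElementTree as ET

# Sample data modeled on the slides; the <Orders> wrapper is an assumption
# so that several purchase orders fit into one well-formed document.
DOC = """
<Orders>
  <PurchaseOrder id="200I" customer="Cars R Us">
    <Date><Day>10</Day><Month>June</Month><Year>1999</Year></Date>
    <Item name="Firestone Tire" cost="2000.00"><Quantity>50</Quantity></Item>
    <Item name="Goodyear Tire" cost="8000.00"><Quantity>200</Quantity></Item>
  </PurchaseOrder>
  <PurchaseOrder id="300I" customer="Bikes R Us">
    <Date><Month>July</Month><Year>1999</Year></Date>
    <Item name="Trek Tire" cost="2500.00"><Quantity>20</Quantity></Item>
  </PurchaseOrder>
</Orders>
"""


def items_bought(root, customer, year):
    """For $po in /PurchaseOrder Where ... Return $po/Item."""
    result = []
    for po in root.findall("PurchaseOrder"):
        same_customer = po.get("customer") == customer
        same_year = po.findtext("Date/Year", default="").strip() == str(year)
        if same_customer and same_year:
            result.extend(po.findall("Item"))
    return result


root = ET.fromstring(DOC)
names = [item.get("name") for item in items_bought(root, "Cars R Us", 1999)]
print(names)
```

This evaluates the query over parsed XML; the rest of the talk is about answering the same query from relational tables instead.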
XML Query (contd.)
//Item
/(Item/(Item/Payment)*/(Payment | Item))*/Date
//Item[5]
//Item Before //Payment
Outline
• Motivation & High-level Solution
• Background (Relations, XML)
• Storing and Querying XML Documents
• Publishing Relational Data as XML Documents
• Conclusion
Storing and Querying XML Documents [Shanmugasundaram et al., VLDB'99]
[Diagram: an XML Translation Layer on top of a Relational Database System, mapping:
• XML Schema → Relational Schema, plus Translation Information
• XML Documents → Tuples
• XML Query → SQL Query
• Relational Result → XML Result]
Outline
• Motivation & High-level Solution
• Background (Relations, XML)
• Storing and Querying XML Documents
  – Relational Schema Design and XML Storage
  – Query Mapping and Result Construction
• Publishing Relational Data as XML Documents
• Conclusion
XML Schema

    PurchaseOrder → <PurchaseOrder id={integer} customer={string}> (Date | (Payment)*) (Item (Item Item)* Payment)* </PurchaseOrder>
Desired Properties of Generated Relational Schema R
• All XML documents conforming to XML schema should be “mappable” to tuples in R
• All queries over XML documents should be “mappable” to SQL queries over R
• Not Required: Ability to re-generate XML schema from R
Simplifying XML Schemas
• XML schemas can be "simplified" for translation purposes
• All without undermining storage and query functionality!

Original production:

    PurchaseOrder → <PurchaseOrder id={integer} customer={string}> (Date | (Payment)*) (Item (Item Item)* Payment)* </PurchaseOrder>

Simplified production:

    PurchaseOrder → <PurchaseOrder id={integer} customer={string}> Date? (Item)* (Payment)* </PurchaseOrder>
Why is Simplification Possible?
• Structure in XML schemas can be captured:
  – Partly in relational schema
  – Partly as data values

    PurchaseOrder → <PurchaseOrder id={integer} customer={string}> Date? (Item)* (Payment)* </PurchaseOrder>

• Order field to capture order among siblings
• Sufficient to answer ordered XML queries
  – PurchaseOrder/Item[5]
  – PurchaseOrder/Item AFTER PurchaseOrder/Payment
• Sufficient to reconstruct XML document
Simplification Desiderata
• Simplify structure, but preserve differences that matter in relational model
  – Single occurrence (attribute)
  – Zero or one occurrences (nullable attribute)
  – Zero or more occurrences (relation)

Original production:

    PurchaseOrder → <PurchaseOrder id={integer} customer={string}> (Date | (Payment)*) (Item (Item Item)* Payment)* </PurchaseOrder>

Simplified production:

    PurchaseOrder → <PurchaseOrder id={integer} customer={string}> Date? (Item)* (Payment)* </PurchaseOrder>
Translation Normal Form
• An XML schema production is either of the form:

    P → <P attr_1={type_1} … attr_m={type_m}> a_1 … a_p a_{p+1}? … a_q? a_{q+1}* … a_r* </P>

• … or of the form:

    P → <P attr_1={type_1} … attr_m={type_m}> {type} </P>

where a_i ≠ a_j for i ≠ j
Example Simplification Rules

Rule: (e1 | e2) → e1? e2?

    (Date | (Payment)*) (Item (Item Item)* Payment)*
    → Date? (Payment)*? (Item (Item Item)* Payment)*

Rule: e*? → e*

    Date? (Payment)*? (Item (Item Item)* Payment)*
    → Date? (Payment)* (Item (Item Item)* Payment)*
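The two example rules can be sketched as a small term rewriter. The following Python is an illustrative sketch only: the tuple-based representation of content models and the helper names (`elem`, `seq`, `alt`, `star`, `opt`, `simplify`, `show`) are assumptions, and only the two rules named on the slide are implemented, so the output is not yet in the fully simplified form.

```python
# Content models as nested tuples: ("elem", name), ("seq", parts),
# ("alt", parts), ("star", inner), ("opt", inner).
def elem(name): return ("elem", name)
def seq(*parts): return ("seq", parts)
def alt(*parts): return ("alt", parts)
def star(inner): return ("star", inner)
def opt(inner): return ("opt", inner)


def simplify(e):
    kind = e[0]
    if kind == "elem":
        return e
    if kind == "seq":
        parts = []
        for part in e[1]:
            s = simplify(part)
            # Flatten nested sequences produced by rule 1.
            parts.extend(s[1] if s[0] == "seq" else [s])
        return ("seq", tuple(parts))
    if kind == "alt":
        # Rule 1: (e1 | e2) -> e1? e2?
        return simplify(seq(*[opt(part) for part in e[1]]))
    if kind == "star":
        return star(simplify(e[1]))
    # kind == "opt"; Rule 2: e*? -> e*
    inner = simplify(e[1])
    return inner if inner[0] == "star" else opt(inner)


def show(e):
    kind = e[0]
    if kind == "elem":
        return e[1]
    if kind == "seq":
        return " ".join(show(part) for part in e[1])
    if kind == "alt":
        return "(" + " | ".join(show(part) for part in e[1]) + ")"
    if kind == "star":
        return "(" + show(e[1]) + ")*"
    inner = e[1]
    text = show(inner)
    if inner[0] not in ("elem", "star", "alt"):
        text = "(" + text + ")"
    return text + "?"


model = seq(
    alt(elem("Date"), star(elem("Payment"))),
    star(seq(elem("Item"), star(seq(elem("Item"), elem("Item"))), elem("Payment"))),
)
before = show(model)
after = show(simplify(model))
print(before)
print(after)
```

Reaching the final form `Date? (Item)* (Payment)*` would need further rules (for example, flattening the starred group), which the slide does not list and which are therefore left out here.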
Simplified XML Schema

    PurchaseOrder → <PurchaseOrder id={integer} customer={string}> Date (Item)* (Payment)* </PurchaseOrder>
    Date → <Date> Day? Month Year </Date>
    Day → <Day> {integer} </Day>
    Month → <Month> {string} </Month>
    Year → <Year> {integer} </Year>
    Item → <Item name={string} cost={float}> Quantity </Item>
    … and so on
Relational Schema Generation
[Schema graph, each edge labeled with its occurrence indicator:
• PurchaseOrder (id, customer) → Date (1), Item (*), Payment (*)
• Date → Day (?), Month (1), Year (1)
• Item (name, cost) → Quantity (1)]
Minimize: Number of joins for path expressions
Satisfy: Fourth normal form
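One way to read the schema graph is that children reached through a `1` or `?` edge are stored inline in the parent's relation, while children reached through a `*` edge get a relation of their own with a parent id and an order column. The Python below is a hedged sketch of that reading, not the algorithm from the cited paper: the dictionary layout, the function names, and the column names `pid`, `order`, and `value` are assumptions, and the sketch assumes any recursion in the schema passes through a `*` edge.

```python
# Schema graph from the slide: attributes plus (child, occurrence) edges.
SCHEMA = {
    "PurchaseOrder": {"attrs": ["id", "customer"],
                      "children": [("Date", "1"), ("Item", "*"), ("Payment", "*")]},
    "Date": {"attrs": [], "children": [("Day", "?"), ("Month", "1"), ("Year", "1")]},
    "Item": {"attrs": ["name", "cost"], "children": [("Quantity", "1")]},
    "Day": {"attrs": [], "children": []},
    "Month": {"attrs": [], "children": []},
    "Year": {"attrs": [], "children": []},
    "Quantity": {"attrs": [], "children": []},
    "Payment": {"attrs": [], "children": []},
}


def inline_columns(schema, name, nested):
    """Columns stored inline for `name`; '*' children are collected in `nested`."""
    node = schema[name]
    cols = list(node["attrs"])
    for child, occurrence in node["children"]:
        child_node = schema[child]
        if occurrence == "*":
            nested.append(child)            # zero or more: separate relation
        elif child_node["attrs"] or child_node["children"]:
            cols += inline_columns(schema, child, nested)
        else:
            cols.append(child.lower())      # leaf value; nullable when '?'
    return cols


def generate_relations(schema, root):
    relations = {}
    todo = [(root, False)]
    while todo:
        name, is_child = todo.pop(0)
        if name in relations:
            continue
        nested = []
        cols = inline_columns(schema, name, nested)
        if not cols:
            cols = ["value"]                # set-valued leaf such as Payment
        if is_child:
            cols = ["pid", "order"] + cols  # link to parent, position among siblings
        relations[name] = cols
        todo += [(child, True) for child in nested]
    return relations


relations = generate_relations(SCHEMA, "PurchaseOrder")
for name, cols in relations.items():
    print(name, cols)
```

Inlining single-occurrence children keeps path expressions such as `PurchaseOrder/Date/Year` answerable without a join, which is the quantity the slide says to minimize.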
Generated Relational Schema and Shredded XML Document

PurchaseOrder

| Id | Customer | Day | Month | Year |
|---|---|---|---|---|
| 200I | Cars R Us | 10 | June | 1999 |

Item

| Pid | Order | Name | Cost | Quantity |
|---|---|---|---|---|
| 200I | 1 | Firestone Tire | 2000.00 | 50 |
| 200I | 2 | Goodyear Tire | 8000.00 | 200 |

Payment

| Pid | Order | Value |
|---|---|---|
| 200I | 3 | 40% |
| 200I | 4 | 60% |
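Shredding can be illustrated with a short Python sketch that walks the purchase order document and emits one tuple per relation row. This is a sketch under stated assumptions: the function name `shred` is hypothetical, and the `Order` value is taken to be a running position over the set-valued children (`Item` and `Payment`) only, which is one plausible reading of the slide.

```python
import xml.etree.ElementTree as ET

DOC = """<PurchaseOrder id="200I" customer="Cars R Us">
  <Date><Day>10</Day><Month>June</Month><Year>1999</Year></Date>
  <Item name="Firestone Tire" cost="2000.00"><Quantity>50</Quantity></Item>
  <Item name="Goodyear Tire" cost="8000.00"><Quantity>200</Quantity></Item>
  <Payment>40%</Payment>
  <Payment>60%</Payment>
</PurchaseOrder>"""


def shred(po):
    """Turn one <PurchaseOrder> element into relational tuples."""
    pid = po.get("id")
    day = po.findtext("Date/Day")            # Day is optional, so it may be NULL
    order_row = (
        pid,
        po.get("customer"),
        int(day) if day is not None else None,
        po.findtext("Date/Month"),
        int(po.findtext("Date/Year")),
    )
    items, payments = [], []
    position = 0                             # order among set-valued siblings
    for child in po:
        if child.tag == "Item":
            position += 1
            items.append((pid, position, child.get("name"),
                          float(child.get("cost")),
                          int(child.findtext("Quantity"))))
        elif child.tag == "Payment":
            position += 1
            payments.append((pid, position, child.text.strip()))
    return order_row, items, payments


order_row, items, payments = shred(ET.fromstring(DOC))
print(order_row)
print(items)
print(payments)
```

Because every set-valued child carries its position, the original sibling order can be restored when the document is reconstructed from the tuples.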
Recursive XML Schema
[Schema graph: PurchaseOrder (id, customer) → Item (name) (*); Item → Quantity (1); Item → PurchaseOrder (*)]

PurchaseOrder

| Id | Customer | Pid | Order |
|---|---|---|---|
| 200I | Cars R Us | null | null |
| 900I | Us Again | 5 | 1 |

Item

| Id | Pid | Order | Name | Quantity |
|---|---|---|---|---|
| 5 | 200I | 1 | Firestone Tire | 50 |
| 6 | 200I | 2 | Goodyear Tire | 200 |
Relational Schema Generation and XML Document Shredding (Completeness and Optimality)
• Any XML Schema X can be mapped to a relational schema R, and …
• Any XML document XD conforming to X can be converted to tuples in R
• Further, XD can be recovered from the tuples in R
• Also minimizes the number of joins for path expressions (given fourth normal form)
Outline
• Motivation & High-level Solution
• Background (Relations, XML)
• Storing and Querying XML Documents
  – Relational Schema Design and XML Storage
  – Query Mapping and Result Construction
• Publishing Relational Data as XML Documents
• Conclusion
XML Query
Find all the items bought by "Cars R Us" in 1999

    For $po in /PurchaseOrder
    Where $po/@customer = "Cars R Us" and $po/Date/Year = 1999
    Return $po/Item
Path Expression Automata (Moore Machines)
[Diagram: an automaton for /PurchaseOrder/Item with transitions PurchaseOrder, then Item; an automaton for //Item with transitions # and Item.]
XML Schema Automaton (Mealy Machine)
[Diagram: automaton over the schema graph PurchaseOrder (id, customer), Date (Day, Month, Year), Item (name, cost), Quantity, Payment.]
Intersected Automaton
[Diagram: the part of the schema automaton matched by the query: PurchaseOrder (customer), Date, Year, Item (name, cost), Quantity, with the predicates = "Cars R Us" and = 1999 attached.]
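As a simplified stand-in for the automata intersection, the following Python sketch walks a path expression and the schema graph in lockstep and returns every schema path the expression can match. It is an illustration under assumptions, not the construction used in the talk: the schema is reduced to a plain child map, the function names are hypothetical, only the `/` and `//` axes are supported, and the schema is assumed to be acyclic.

```python
SCHEMA = {
    "PurchaseOrder": ["Date", "Item", "Payment"],
    "Date": ["Day", "Month", "Year"],
    "Item": ["Quantity"],
    "Day": [], "Month": [], "Year": [], "Quantity": [], "Payment": [],
}


def parse(path):
    """Split '/A//B' into [('/', 'A'), ('//', 'B')]."""
    steps, i = [], 0
    while i < len(path):
        axis = "//" if path.startswith("//", i) else "/"
        i += len(axis)
        j = path.find("/", i)
        j = len(path) if j == -1 else j
        steps.append((axis, path[i:j]))
        i = j
    return steps


def schema_paths(schema, root, path):
    """Schema paths matched by `path`; assumes an acyclic schema."""
    steps = parse(path)
    found = []

    def walk(candidates, prefix, i):
        axis, name = steps[i]
        for tag in candidates:
            here = prefix + [tag]
            if tag == name:
                if i == len(steps) - 1:
                    found.append("/".join(here))
                else:
                    walk(schema[tag], here, i + 1)
            if axis == "//":
                # Descendant axis: skip this element and keep looking below it.
                walk(schema[tag], here, i)

    walk([root], [], 0)
    return found


item_paths = schema_paths(SCHEMA, "PurchaseOrder", "//Item")
year_paths = schema_paths(SCHEMA, "PurchaseOrder", "/PurchaseOrder/Date/Year")
print(item_paths)
print(year_paths)
```

Each matched schema path tells the translation layer which relations and columns a path expression touches, which is the information needed to emit the SQL joins and predicates.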
Generated SQL Query

    Select i.name, i.cost, i.quantity
    From PurchaseOrder p, Item i
    Where p.customer = 'Cars R Us' and p.year = 1999   -- predicates
      and p.id = i.pid                                 -- join condition
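The generated query can be tried out against the sample relations with Python's built-in `sqlite3` module. This is a minimal sketch: SQLite stands in for the relational database system, the column types are assumptions, and the `ORDER BY` clause is added only to make the printed result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PurchaseOrder (id TEXT, customer TEXT, day INTEGER, month TEXT, year INTEGER);
    CREATE TABLE Item (pid TEXT, name TEXT, cost REAL, quantity INTEGER);
    INSERT INTO PurchaseOrder VALUES ('200I', 'Cars R Us', 10, 'June', 1999),
                                     ('300I', 'Bikes R Us', NULL, 'July', 1999);
    INSERT INTO Item VALUES ('200I', 'Firestone Tire', 2000.00, 50),
                            ('200I', 'Goodyear Tire', 8000.00, 200),
                            ('300I', 'Trek Tire', 2500.00, 20);
""")

rows = conn.execute("""
    SELECT i.name, i.cost, i.quantity
    FROM PurchaseOrder p, Item i
    WHERE p.customer = 'Cars R Us' AND p.year = 1999   -- predicates
      AND p.id = i.pid                                 -- join condition
    ORDER BY i.name
""").fetchall()
print(rows)
```

Because Date was inlined into PurchaseOrder, the path `$po/Date/Year` becomes a plain column predicate and the only join is the one between PurchaseOrder and Item.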
Recursive XML Query
[Recursive schema graph: PurchaseOrder (id, customer), Item (name), Quantity]
Find all items (directly or indirectly) under a "Cars R Us" purchase order

    For $po in /PurchaseOrder
    Where $po/@customer = "Cars R Us"
    Return $po//Item
Recursive Automata Intersection
[Diagram: the recursive schema graph PurchaseOrder (id, customer), Item (name), Quantity, intersected with the query automaton to give PurchaseOrder (customer) with predicate = "Cars R Us", Item (name), Quantity and a recursive PurchaseOrder node.]
Recursive SQL Generation
[Diagram: intersected automaton with PurchaseOrder (customer) = "Cars R Us", Item (name), Quantity and a recursive PurchaseOrder node.]

    With ResultItems (id, name, quantity) as (
        Select it.id, it.name, it.quantity
        From PurchaseOrder po, Item it
        Where po.customer = 'Cars R Us' and po.id = it.pid
      Union all
        Select it.id, it.name, it.quantity
        From ResultItems rit, PurchaseOrder po, Item it
        Where rit.id = po.pid and po.id = it.pid
    )
    Select * From ResultItems
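The recursive query can be run with SQLite's `WITH RECURSIVE` support through Python's `sqlite3` module. The sketch below is illustrative: SQLite stands in for the relational engine, and the nested purchase order `X1` and its item are hypothetical rows added so that the recursive branch has something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- PurchaseOrder.pid points at the parent Item (NULL for top-level orders);
    -- Item.pid points at the enclosing PurchaseOrder.
    CREATE TABLE PurchaseOrder (id TEXT, customer TEXT, pid INTEGER);
    CREATE TABLE Item (id INTEGER, pid TEXT, name TEXT, quantity INTEGER);
    INSERT INTO PurchaseOrder VALUES ('200I', 'Cars R Us', NULL),
                                     ('X1', 'Hypothetical Supplier', 5);
    INSERT INTO Item VALUES (5, '200I', 'Firestone Tire', 50),
                            (6, '200I', 'Goodyear Tire', 200),
                            (7, 'X1', 'Hypothetical Valve', 10);
""")

result_items = conn.execute("""
    WITH RECURSIVE ResultItems (id, name, quantity) AS (
        -- Base case: items directly under a 'Cars R Us' purchase order.
        SELECT it.id, it.name, it.quantity
        FROM PurchaseOrder po, Item it
        WHERE po.customer = 'Cars R Us' AND po.id = it.pid
      UNION ALL
        -- Recursive case: items of purchase orders nested under a result item.
        SELECT it.id, it.name, it.quantity
        FROM ResultItems rit, PurchaseOrder po, Item it
        WHERE rit.id = po.pid AND po.id = it.pid
    )
    SELECT id, name, quantity FROM ResultItems ORDER BY id
""").fetchall()
print(result_items)
```

Item 7 is reached only through the nested purchase order, so it appears in the result only because of the recursive branch.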
SQL Generation for Path Expressions (Completeness)
• (Almost) all path expressions can be translated to SQL
• SQL does not support seamless querying across data and meta-data (schema)
  – SQL based on first-order logic (with least fix-point recursion)
• Meta-data query capability provided in the XML translation layer
  – Implemented and optimized using a higher-order operator
Constructing XML Results

Relational result:

    ("Firestone Tire", 2000.00, 50)
    ("Goodyear Tire", 8000.00, 200)

XML result:

    <item name="Firestone Tire" cost="2000.00">
      <quantity> 50 </quantity>
    </item>
    <item name="Goodyear Tire" cost="8000.00">
      <quantity> 200 </quantity>
    </item>
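Turning flat result tuples into tagged XML is mostly string assembly with proper escaping. A minimal Python sketch, assuming the tuple layout `(name, cost, quantity)` shown above and a hypothetical helper name `tag_item`:

```python
from xml.sax.saxutils import escape, quoteattr


def tag_item(row):
    """Wrap one (name, cost, quantity) tuple in <item> tags."""
    name, cost, quantity = row
    cost_text = format(cost, ".2f")          # keep two decimals, as on the slide
    return (
        "<item name=" + quoteattr(name) + " cost=" + quoteattr(cost_text) + ">"
        "<quantity>" + escape(str(quantity)) + "</quantity>"
        "</item>"
    )


rows = [("Firestone Tire", 2000.00, 50), ("Goodyear Tire", 8000.00, 200)]
xml_result = [tag_item(row) for row in rows]
for line in xml_result:
    print(line)
```

`quoteattr` and `escape` take care of characters such as `&` or `<` in the data, which plain string concatenation would pass through unescaped.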
Complex XML Construction
[Diagram: schema graph PurchaseOrder (id, customer), Date (Day, Month, Year), Item (name, cost), Quantity, Payment, with the predicates = "Cars R Us" and = 1999 attached.]
Outline
• Motivation & High-level Solution
• Background (Relations, XML)
• Storing and Querying XML Documents
• Publishing Relational Data as XML Documents
• Conclusion
Relational Schema and Data

PurchaseOrder

| Id | Customer | Day | Month | Year |
|---|---|---|---|---|
| 200I | Cars R Us | 10 | June | 1999 |

Item

| Pid | Name | Quantity | Cost |
|---|---|---|---|
| 200I | Firestone Tire | 50 | 2000.00 |
| 200I | Goodyear Tire | 200 | 8000.00 |

Payment

| Pid | Installment | Percentage |
|---|---|---|
| 200I | 1 | 40% |
| 200I | 2 | 60% |
XML Document

    <PurchaseOrder id="200I" customer="Cars R Us">
      <Date>
        <Day> 10 </Day>
        <Month> June </Month>
        <Year> 1999 </Year>
      </Date>
      <Item name="Firestone Tire" cost="2000.00">
        <Quantity> 50 </Quantity>
      </Item>
      <Item name="Goodyear Tire" cost="8000.00">
        <Quantity> 200 </Quantity>
      </Item>
      <Payment> 40% </Payment>
      <Payment> 60% </Payment>
    </PurchaseOrder>
Naïve Approach
• Issue many SQL queries that mirror the structure of the XML document to be constructed
• Tag nested structures as they are produced
[Diagram: separate PurchaseOrder, Item and Payment queries are sent to the DBMS Engine, which returns:

    (200I, "Cars R Us", 10, June, 1999)
    ("Firestone Tire", 2000.00, 50)
    ("Goodyear Tire", 8000.00, 200)
    (40%)
    (60%)
]
Problem 1: Too many SQL queries
Problem 2: Fixed (nested loop) join strategy
Relations to XML: Issues [Shanmugasundaram et al., VLDB'00]
• Two main differences:
  – Ordered nested structures
  – Self-describing tags
• Space of alternatives:
[Grid: Early Tagging vs. Late Tagging against Early Structuring vs. Late Structuring, with each cell split into Inside Engine and Outside Engine.]
Naïve (Stored Procedure) Approach
Early Tagging, Early Structuring, Outside Engine
• Issue queries for sub-structures and tag them
• Could be a Stored Procedure to maximize performance
[Diagram: separate PurchaseOrder, Item and Payment queries are sent to the DBMS Engine, which returns:

    (200I, "Cars R Us", 10, June, 1999)
    ("Firestone Tire", 2000.00, 50)
    ("Goodyear Tire", 8000.00, 200)
    (40%)
    (60%)
]
Problem 1: Too many SQL queries
Problem 2: Fixed (nested loop) join strategy
CLOB Approach
Early Tagging, Early Structuring, Inside Engine
• Push down computation inside engine
• Use extensibility features of SQL
  – Scalar functions for tagging
  – Aggregate functions for creating nested structures
• Character Large Objects (CLOBs) to hold intermediate tagged results
Problem: CLOBs during query execution (storage, size estimation, copying)
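The idea of tagging and nesting inside the engine can be imitated with stock SQLite features. In the sketch below, SQLite's `||` string concatenation stands in for the scalar tagging functions and the built-in `group_concat` aggregate stands in for the aggregate functions that build nested structures; both substitutions are assumptions made for illustration, since the slides refer to engine extensibility features rather than to these particular functions. No XML escaping is done here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PurchaseOrder (id TEXT, customer TEXT);
    CREATE TABLE Item (pid TEXT, name TEXT, quantity INTEGER);
    INSERT INTO PurchaseOrder VALUES ('200I', 'Cars R Us');
    INSERT INTO Item VALUES ('200I', 'Firestone Tire', 50),
                            ('200I', 'Goodyear Tire', 200);
""")

row = conn.execute("""
    SELECT '<PurchaseOrder id="' || p.id || '" customer="' || p.customer || '">'
           || coalesce(
                  (SELECT group_concat(
                              '<Item name="' || i.name || '"><Quantity>'
                              || i.quantity || '</Quantity></Item>', '')
                   FROM Item i
                   WHERE i.pid = p.id),
                  '')
           || '</PurchaseOrder>'
    FROM PurchaseOrder p
""").fetchone()
clob = row[0]       # one tagged string per purchase order, built inside the engine
print(clob)
```

The whole tagged fragment travels through the query as a single growing string value, which mirrors the problem the slide names: large character objects have to be stored, sized, and copied during query execution. Note that `group_concat` does not guarantee the order of the concatenated items.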
Late Tagging, Late Structuring
• XML document content produced without tags or structure (in arbitrary order)
• Tagger adds tags and structure as final step
[Pipeline: Relational Query Processing → Unstructured content → Tagging → Result XML Document]
Outer Union Approach
Late Tagging, Late Structuring

Inputs:

    PurchaseOrder: (200I, "Cars R Us", 10, June, 1999)
    Item:          (200I, "Firestone Tire", 2000.00, 50)
                   (200I, "Goodyear Tire", 8000.00, 200)
    Payment:       (200I, 1, 40%)
                   (200I, 2, 60%)

Result of the (outer) Union:

    (200I, null, null, null, null, null, null, null, 1, 40%, 2)
    (200I, "Cars R Us", 10, June, 1999, null, null, null, null, null, 0)
    (200I, null, null, null, null, null, null, null, 2, 60%, 2)
    (200I, null, null, null, null, "Firestone Tire", 2000.00, 50, null, null, 1)
    (200I, null, null, null, null, "Goodyear Tire", 8000.00, 200, null, null, 1)

Problem: Wide tuples (having many columns)
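An outer union of this shape can be written in plain SQL by padding each branch with NULLs and appending a type column. The sketch below runs such a query in SQLite through Python's `sqlite3`; the table definitions and the column names of the wide tuple (`installment`, `percentage`, `type`) are assumptions chosen to match the eleven-column tuples shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PurchaseOrder (id TEXT, customer TEXT, day INTEGER, month TEXT, year INTEGER);
    CREATE TABLE Item (pid TEXT, name TEXT, cost REAL, quantity INTEGER);
    CREATE TABLE Payment (pid TEXT, installment INTEGER, percentage TEXT);
    INSERT INTO PurchaseOrder VALUES ('200I', 'Cars R Us', 10, 'June', 1999);
    INSERT INTO Item VALUES ('200I', 'Firestone Tire', 2000.00, 50),
                            ('200I', 'Goodyear Tire', 8000.00, 200);
    INSERT INTO Payment VALUES ('200I', 1, '40%'), ('200I', 2, '60%');
""")

# Every branch produces the same 11 columns; the last one records which
# relation the row came from (0 = PurchaseOrder, 1 = Item, 2 = Payment).
wide_rows = conn.execute("""
    SELECT p.id AS id, p.customer AS customer, p.day AS day, p.month AS month,
           p.year AS year, NULL AS name, NULL AS cost, NULL AS quantity,
           NULL AS installment, NULL AS percentage, 0 AS type
    FROM PurchaseOrder p
    UNION ALL
    SELECT i.pid, NULL, NULL, NULL, NULL, i.name, i.cost, i.quantity, NULL, NULL, 1
    FROM Item i
    UNION ALL
    SELECT y.pid, NULL, NULL, NULL, NULL, NULL, NULL, NULL, y.installment, y.percentage, 2
    FROM Payment y
""").fetchall()
for wide_row in wide_rows:
    print(wide_row)
```

A single query now returns all the content of the document, at the price of wide, mostly NULL tuples, which is the drawback named on the slide.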
Hash-based Tagger
Late Tagging, Late Structuring
• Outer union results not structured early
  – In arbitrary order
• Tagger has to enforce nesting order during tagging
  – Essentially a nested group-by operation
• Hash-based approach
  – Hash on ancestor ids
  – Group siblings together and tag them
• Inside/Outside engine tagger
Problem: Requires memory for entire document
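The grouping step of such a tagger can be sketched in a few lines of Python. This is an illustrative sketch, not the tagger evaluated in the talk: the wide-tuple layout follows the outer union example, the function names are hypothetical, and the Date columns are left out of the output for brevity.

```python
from collections import defaultdict
from xml.sax.saxutils import escape, quoteattr

# Wide tuples in arbitrary order; the last column is the type
# (0 = PurchaseOrder, 1 = Item, 2 = Payment).
ROWS = [
    ("200I", None, None, None, None, None, None, None, 1, "40%", 2),
    ("200I", "Cars R Us", 10, "June", 1999, None, None, None, None, None, 0),
    ("200I", None, None, None, None, None, None, None, 2, "60%", 2),
    ("200I", None, None, None, None, "Firestone Tire", 2000.00, 50, None, None, 1),
    ("200I", None, None, None, None, "Goodyear Tire", 8000.00, 200, None, None, 1),
]


def tag_child(row):
    if row[10] == 1:
        return ("<Item name=" + quoteattr(row[5])
                + " cost=" + quoteattr(format(row[6], ".2f")) + ">"
                + "<Quantity>" + str(row[7]) + "</Quantity></Item>")
    return "<Payment>" + escape(row[9]) + "</Payment>"


def hash_tagger(rows):
    parents = {}
    siblings = defaultdict(list)
    for row in rows:                       # one pass; every row is buffered
        if row[10] == 0:
            parents[row[0]] = row
        else:
            # Hash on the ancestor id (plus type) to group siblings together.
            siblings[(row[0], row[10])].append(row)
    docs = []
    for pid, po in parents.items():
        items = siblings[(pid, 1)]
        payments = sorted(siblings[(pid, 2)], key=lambda r: r[8])  # installment order
        body = "".join(tag_child(r) for r in items + payments)
        docs.append("<PurchaseOrder id=" + quoteattr(pid)
                    + " customer=" + quoteattr(po[1]) + ">"
                    + body + "</PurchaseOrder>")
    return docs


docs = hash_tagger(ROWS)
print(docs[0])
```

The hash tables hold every row until the end of the input, which makes the memory requirement named on the slide visible: the entire document has to fit in memory before any of it can be emitted.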
Late Tagging, Early Structuring
• Structured XML document content produced
  – In document order
  – "Sorted Outer Union" approach
• Tagger just adds tags
  – In constant space
  – Inside/outside engine
[Pipeline: Relational Query Processing → Structured content → Tagging → Result XML Document]
Sorted Outer Union Approach
Late Tagging, Early Structuring

Inputs:

    PurchaseOrder: (200I, "Cars R Us", 10, June, 1999)
    Item:          (200I, "Firestone Tire", 2000.00, 50)
                   (200I, "Goodyear Tire", 8000.00, 200)
    Payment:       (200I, 1, 40%)
                   (200I, 2, 60%)

After Union:

    (200I, null, null, null, null, null, null, null, 1, 40%, 2)
    (200I, "Cars R Us", 10, June, 1999, null, null, null, null, null, 0)
    (200I, null, null, null, null, null, null, null, 2, 60%, 2)
    (200I, null, null, null, null, "Firestone Tire", 2000.00, 50, null, null, 1)
    (200I, null, null, null, null, "Goodyear Tire", 8000.00, 200, null, null, 1)

After Sort (sort keys marked 1, 2, 3 in the figure):

    (200I, "Cars R Us", 10, June, 1999, null, null, null, null, null, 0)
    (200I, null, null, null, null, "Firestone Tire", 2000.00, 50, null, null, 1)
    (200I, null, null, null, null, "Goodyear Tire", 8000.00, 200, null, null, 1)
    (200I, null, null, null, null, null, null, null, 1, 40%, 2)
    (200I, null, null, null, null, null, null, null, 2, 60%, 2)

Problem: Enforces Total Order
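Once the wide tuples arrive in document order, tagging becomes a streaming pass that only has to remember whether a purchase order is currently open. The Python sketch below illustrates this under assumptions: Python's `sorted` stands in for the relational sort, the sort key (parent id, type, then sibling order) is one plausible choice rather than the one on the slide, the function names are hypothetical, and the Date columns are omitted from the output for brevity.

```python
from xml.sax.saxutils import escape, quoteattr

# Wide tuples as produced by the outer union, in arbitrary order.
ROWS = [
    ("200I", None, None, None, None, None, None, None, 1, "40%", 2),
    ("200I", "Cars R Us", 10, "June", 1999, None, None, None, None, None, 0),
    ("200I", None, None, None, None, None, None, None, 2, "60%", 2),
    ("200I", None, None, None, None, "Firestone Tire", 2000.00, 50, None, None, 1),
    ("200I", None, None, None, None, "Goodyear Tire", 8000.00, 200, None, None, 1),
]


def sort_key(row):
    # Parent id first, then type (the parent row has type 0 and so comes
    # first), then the sibling order within each type.
    return (row[0], row[10], row[8] or 0, row[5] or "")


def stream_tagger(sorted_rows):
    """Emit tags row by row; the only state kept is whether an order is open."""
    open_order = False
    for row in sorted_rows:
        kind = row[10]
        if kind == 0:
            if open_order:
                yield "</PurchaseOrder>"
            yield ("<PurchaseOrder id=" + quoteattr(row[0])
                   + " customer=" + quoteattr(row[1]) + ">")
            open_order = True
        elif kind == 1:
            yield ("<Item name=" + quoteattr(row[5])
                   + " cost=" + quoteattr(format(row[6], ".2f")) + ">"
                   + "<Quantity>" + str(row[7]) + "</Quantity></Item>")
        else:
            yield "<Payment>" + escape(row[9]) + "</Payment>"
    if open_order:
        yield "</PurchaseOrder>"


xml_text = "".join(stream_tagger(sorted(ROWS, key=sort_key)))
print(xml_text)
```

In a real deployment the sort would be an `ORDER BY` executed by the database, so the tagger itself never needs more than the current row.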
Classification of Alternatives

| | Early Tagging | Late Tagging |
|---|---|---|
| Late Structuring, Inside Engine | | Unsorted Outer Union (Tagging inside) |
| Late Structuring, Outside Engine | | Unsorted Outer Union (Tagging outside) |
| Early Structuring, Inside Engine | CLOB | Sorted Outer Union (Tagging inside) |
| Early Structuring, Outside Engine | Stored Procedure | Sorted Outer Union (Tagging outside) |
Performance Evaluation
[Diagram: a tree of tables, TABLE0 at the root, TABLE00 and TABLE01 below it, and TABLE000, TABLE001, TABLE010, TABLE011 at the leaves, annotated with Query Depth and Query Fan Out.]
Database Size = 10MB – 100MB
366 MHz Pentium II, 256MB
Where Does Time Go?
[Bar chart: Time (in seconds), axis from 0 to 35, for Stored Proc, CLOB, Unsorted OU (Out), Unsorted OU (In), Sorted OU (Out) and Sorted OU (In), broken down into Execution, Bind Out, Tagging and XML File.]
Database Size = 10MB, Query Fan Out = 2, Query Depth = 2
Effect of Query Depth
[Chart: Time (in seconds), axis from 0 to 60, against Query Depth (2, 3, 4) for CLOB, Unsorted OU and Sorted OU.]
Database Size = 10MB, Query Fan Out = 2
Memory Considerations
• Sufficient memory
  – Unsorted outer union is method of choice
  – Efficient hash-based tagger
• Limited memory (large documents)
  – Sorted outer union is method of choice
  – Relational sort is highly scalable
XML Document Construction (Completeness and Performance)
• Any nested XML document can be constructed using "outer union" approaches
• 9x faster than previous approaches
  – 10 MB of data
  – 17 seconds for sorted outer union approach
  – 160 seconds for "naïve XML application developer" approach
Outline
• Motivation & High-level Solution
• Background (Relations, XML)
• Storing and Querying XML Documents
• Publishing Relational Data as XML Documents
• Conclusion
Conclusion
• XML has emerged as the Internet data format
• But relational database systems will continue to be used for data management tasks
• Internet application developers currently have to explicitly bridge this “data model gap”
• Can we design a system that automatically bridges this gap for application developers?
    For $cust in /Customer
    Where $cust/name = "Jack"
    Return $cust
    // First prepare all the SQL statements to be executed and create cursors for them
    Exec SQL Prepare CustStmt From "select cust.id, cust.name from Customer cust where cust.name = 'Jack'"
    Exec SQL Declare CustCursor Cursor For CustStmt
    Exec SQL Prepare AcctStmt From "select acct.id, acct.acctnum from Account acct where acct.custId = ?"
    Exec SQL Declare AcctCursor Cursor For AcctStmt
    Exec SQL Prepare PorderStmt From "select porder.id, porder.acct, porder.date from PurchOrder porder where porder.custId = ?"
    Exec SQL Declare PorderCursor Cursor For PorderStmt
    Exec SQL Prepare ItemStmt From "select item.id, item.desc from Item item where item.poId = ?"
    Exec SQL Declare ItemCursor Cursor For ItemStmt
    Exec SQL Prepare PayStmt From "select pay.id, pay.desc from Payment pay where pay.poId = ?"
    Exec SQL Declare PayCursor Cursor For PayStmt

    // Now execute SQL statements in nested order of XML document result. Start with customer
    XMLResult = ""
    Exec SQL Open CustCursor
    while (CustCursor has more rows) {
      Exec SQL Fetch CustCursor Into :custId, :custName
      XMLResult += "<customer id=" + custId + "><name>" + custName + "</name><accounts>"

      // For each customer, issue sub-query to get account information and add to custAccts
      Exec SQL Open AcctCursor Using :custId
      while (AcctCursor has more rows) {
        Exec SQL Fetch AcctCursor Into :acctId, :acctNum
        XMLResult += "<account id=" + acctId + "> " + acctNum + "</account>"
      }
      XMLResult += "</accounts><porders>"

      // For each customer, issue sub-query to get purchase order information and add to custPorders
      Exec SQL Open PorderCursor Using :custId
      while (PorderCursor has more rows) {
        Exec SQL Fetch PorderCursor Into :poId, :poAcct, :poDate
        XMLResult += "<porder id=" + poId + " acct=" + poAcct + "><date>" + poDate + "</date><items>"

        // For each purchase order, issue a sub-query to get item information and add to porderItems
        Exec SQL Open ItemCursor Using :poId
        while (ItemCursor has more rows) {
          Exec SQL Fetch ItemCursor Into :itemId, :itemDesc
          XMLResult += "<item id=" + itemId + ">" + itemDesc + "</item>"
        }
        XMLResult += "</items><payments>"

        // For each purchase order, issue a sub-query to get payment information and add to porderPays
        Exec SQL Open PayCursor Using :poId
        while (PayCursor has more rows) {
          Exec SQL Fetch PayCursor Into :payId, :payDesc
          XMLResult += "<payment id=" + payId + ">" + payDesc + "</payment>"
        }
        XMLResult += "</payments></porder>"
      } // End of looping over all purchase orders associated with a customer

      XMLResult += "</porders></customer>"
      Return XMLResult as one result row; reset XMLResult = ""
    } // loop until all customers are tagged and output
Conclusion (Contd.)
• Yes! XPERANTO is the first such system
• Allows users to …
  – Store and query XML documents using a relational database system
  – Publish existing relational data as XML documents
  … using a high-level XML query language
• Also provides a dramatic improvement in performance
Relational Database System Vendors
• IBM, Microsoft, Oracle, Informix, …
  – SQL extensions for XML
• XML Translation Layer
  – "Pure XML" philosophy … provides high-level XML query interface
    • SQL extensions for XML, while better than writing applications, are still low-level
  – More powerful than XML-extended SQL
    • SQL was just not designed with nifty XML features in mind
Related Research
• Storing and querying "schema-less" XML documents using a relational database system
  – STORED [Deutsch et al.]
  – Binary decomposition [Florescu & Kossmann]
• XML Views of relational databases
  – SilkRoute [Fernandez et al.]
  – Mediators
Future Work
• Functionality
  – Update XML documents
  – Insert/Update XML views of relational data
  – Information-retrieval style queries
• Performance
  – Query-aware relational schema generation
  – Caching
  – Relational engine extensions