Post on 01-Jan-2016
IBM Almaden Research Center
© 2006 IBM Corporation
On the Path to Efficient XML Queries
Andrey Balmin, Kevin Beyer, Fatma Özcan
IBM Almaden Research Center
Matthias Nicola
IBM Silicon Valley Lab
New languages = new abilities + new pitfalls
XQuery
– A new query language designed specifically for XML data
SQL / XML
– Added XML as a data type, including XQuery sequences
– Added XQuery as a sublanguage
Purpose
Teach users of the new XML query languages
Teach the teachers
Share our users’ experiences
Influence languages
Focus
Large databases with many moderately sized XML documents
Schema flexibility is required
– Many schemas in one collection
– No schema validation used
– Schemas with xs:any
– Documents like Atom Syndication and RSS that allow any extension
Therefore
– Document filtering is primary concern
– Limited type inference
– Any data is possible
Index eligibility
An index I is eligible to answer predicate P of query Q if, for any collection of XML documents D, the following holds:
Q(D) = Q(I(P, D)),
where I(P, D) is the set of XML documents produced by probing index I with predicate P.
This is not as obvious as it is in relational databases.
XML indexes in DB2
Index a linear XPath pattern over a column as a particular datatype
CREATE INDEX index-name ON table(xml-column) USING XMLPATTERN 'pattern' AS type
pattern ::= namespace-decls? (( / | // ) axis? ( name-test | kind-test ))+
axis ::= @ | child:: | attribute:: | self:: | descendant:: | descendant-or-self::
name-test ::= qname | * | ncname:* | *:ncname
kind-test ::= node() | text() | comment() | processing-instruction()
type ::= varchar | double | date | timestamp
Query pattern must be at least as restrictive as the index pattern
CREATE INDEX li_price ON orders(orddoc)
  USING XMLPATTERN '//lineitem/@price' AS double
Can use the index: more restrictive
for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@price > 100 ]
return $i
Cannot use the index: less restrictive
for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@* > 100 ]
return $i
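The eligibility condition Q(D) = Q(I(P, D)) can be checked mechanically on a toy model. The sketch below is plain Python, not DB2: each document is reduced to a list of line items with numeric attributes, probe_price_index stands in for I(P, D) on the li_price index, and the documents d1 to d3 and the weight attribute are invented for illustration. One collection can only refute eligibility; the definition requires equality for every D.

```python
# Toy model (assumption): a document is a list of line items,
# and a line item is a dict mapping attribute name -> number.
DOCS = {
    "d1": [{"price": 150}],
    "d2": [{"price": 20, "weight": 500}],  # no price > 100, but weight > 100
    "d3": [{"price": 30}],
}

def probe_price_index(docs, threshold):
    """I(P, D): documents with at least one lineitem/@price > threshold."""
    return {
        name: items
        for name, items in docs.items()
        if any("price" in li and li["price"] > threshold for li in items)
    }

def q_price(docs):
    """Models //order[ lineitem/@price > 100 ]."""
    return sorted(
        name for name, items in docs.items()
        if any("price" in li and li["price"] > 100 for li in items)
    )

def q_any_attribute(docs):
    """Models //order[ lineitem/@* > 100 ]."""
    return sorted(
        name for name, items in docs.items()
        if any(value > 100 for li in items for value in li.values())
    )

def same_answer_after_probe(query, docs):
    """Checks Q(D) == Q(I(P, D)) for this one collection D."""
    return query(docs) == query(probe_price_index(docs, 100))

print(same_answer_after_probe(q_price, DOCS))          # True
print(same_answer_after_probe(q_any_attribute, DOCS))  # False: d2 is lost
```

In this model the wildcard query loses d2 once the collection is pre-filtered through the @price index, which is why the less restrictive pattern cannot use it.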
Match index and query predicate data type
CREATE INDEX li_price ON orders(orddoc)
  USING XMLPATTERN '//lineitem/@price' AS double
Can use the index: numeric predicate and index
for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@price > 100 ]
return $i
Cannot use the index: string predicate
for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@price > "100" ]
return $i
Data Types for Joins
CREATE INDEX o_custid ON orders(orddoc) USING XMLPATTERN '//custid' AS double
CREATE INDEX c_custid ON customer(cdoc) USING XMLPATTERN '/customer/id' AS double
Cannot use the indexes: unknown comparison type
for $i in db2-fn:xmlcolumn("ORDERS.ORDDOC")/order
for $j in db2-fn:xmlcolumn("CUSTOMER.CDOC")/customer
where $i/custid = $j/id
return $i
Can use the indexes: at least one cast required
for $i in db2-fn:xmlcolumn("ORDERS.ORDDOC")/order
for $j in db2-fn:xmlcolumn("CUSTOMER.CDOC")/customer
where $i/custid/xs:double(.) = $j/id/xs:double(.)
return $i
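Why the uncast join has an unknown comparison type: in an XQuery general comparison, two untyped operands are compared as strings, while an untyped operand paired with a number is cast to double. The Python sketch below models only those cases; the (type, value) pairs and the sample values "1001" and "1001.0" are illustrative assumptions, not data from the slides.

```python
def general_eq(left, right):
    """Pick the comparison type the way XQuery's '=' does for these cases.

    Operands are (type_name, value) pairs; only "untyped" and "double"
    are modelled.
    """
    ltype, lval = left
    rtype, rval = right
    if ltype == "untyped" and rtype == "untyped":
        return lval == rval                # both untyped: string comparison
    if ltype == "untyped" and rtype == "double":
        return float(lval) == rval         # untyped side is cast to double
    if ltype == "double" and rtype == "untyped":
        return lval == float(rval)
    if ltype == "double" and rtype == "double":
        return lval == rval
    raise TypeError(f"not modelled: {ltype} = {rtype}")

def as_double(value):
    """Models the explicit xs:double(.) cast."""
    return ("double", float(value[1]))

custid = ("untyped", "1001")       # hypothetical $i/custid
customer_id = ("untyped", "1001.0")  # hypothetical $j/id

print(general_eq(custid, customer_id))                        # False
print(general_eq(as_double(custid), as_double(customer_id)))  # True
```

A double index would report these two values as equal, but the uncast query must not, so the index can only be trusted once the cast fixes the comparison type.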
SQL/XML Query Functions
XMLQuery
Scalar function that returns a (possibly empty) XQuery sequence for every row
XMLExists
Predicate that returns true iff the XQuery sequence produced is not empty
XMLTable
Produces a table with one row for each item in the row-producing XQuery sequence, and with one column per column-producing XQuery expression. The columns may be XQuery sequences or cast to simple SQL types.
XMLQuery does not filter rows (usually)
Cannot use the index:
SELECT XMLQuery('$order//lineitem[ @price > 100 ]'
                passing orddoc as "order")
FROM orders
Result rows: (LI1, LI2); (); (LI3); ()
Can use the index:
VALUES (XMLQuery('db2-fn:xmlcolumn("ORDERS.ORDDOC")//lineitem[ @price > 100 ]'))
Result row: (LI1, LI2, LI3)
Can use the index:
db2-fn:xmlcolumn('ORDERS.ORDDOC')//lineitem[ @price > 100 ]
Result items: LI1; LI2; LI3
XMLExists filters rows (usually)
Can use the index
SELECT ordid, orddoc
FROM orders
WHERE XMLExists('$order//lineitem[ @price > 100 ]'
                passing orddoc as "order")
Cannot use the index: a false result still exists
SELECT ordid, orddoc
FROM orders
WHERE XMLExists('$order//lineitem/@price > 100'
                passing orddoc as "order")
Need XMLTest which uses XQuery’s Effective Boolean Value
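The difference between the two XMLExists queries comes down to what sequence the XQuery expression returns. A minimal Python model follows, with made-up prices per order id (an assumption; only the matching items follow the result tables on the other slides): the filtering expression returns the matching items, possibly none, while the comparison always returns exactly one boolean.

```python
# Made-up line item prices per order id (assumption for illustration).
ORDER_PRICES = {1: [175, 150, 80], 2: [60, 45], 3: [201, 90]}

def xml_exists(sequence):
    """XMLExists: true iff the XQuery result sequence is not empty."""
    return len(sequence) > 0

def filtered_lineitems(prices):
    """Models $order//lineitem[ @price > 100 ]: the matching items."""
    return [p for p in prices if p > 100]

def boolean_comparison(prices):
    """Models $order//lineitem/@price > 100: always a single boolean."""
    return [any(p > 100 for p in prices)]

rows_with_filter = [
    oid for oid, prices in ORDER_PRICES.items()
    if xml_exists(filtered_lineitems(prices))
]
rows_with_boolean = [
    oid for oid, prices in ORDER_PRICES.items()
    if xml_exists(boolean_comparison(prices))
]

print(rows_with_filter)   # [1, 3]
print(rows_with_boolean)  # [1, 2, 3]
```

Because the sequence (false) is not empty, the second form keeps every row, so there is nothing for an index to filter.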
XMLQuery + XMLExists vs. XMLTable
Can use the index
SELECT ordid, XMLQuery('$order//lineitem[@price > 100]'
                       passing orddoc as "order")
FROM orders
WHERE XMLExists('$order//lineitem[@price > 100]'
                passing orddoc as "order")
XMLTable: More efficient and less redundant
SELECT o.ordid, t.lineitem
FROM orders o,
     XMLTable('$order//lineitem[@price > 100]'
              passing o.orddoc as "order"
              COLUMNS "lineitem" XML BY REF PATH '.')
     as t(lineitem)
Predicates in XMLTable column expressions
Can use the index
SELECT o.ordid, t.lineitem
FROM orders o,
     XMLTable('$order//lineitem[@price > 100]'
              passing o.orddoc as "order"
              COLUMNS "lineitem" XML BY REF PATH '.')
     as t(lineitem)
Result:
ordid  lineitem
1      LI1
1      LI2
3      LI3
Cannot use the index
SELECT o.ordid, t.lineitem, t.price
FROM orders o,
     XMLTable('$order//lineitem'
              passing o.orddoc as "order"
              COLUMNS "lineitem" XML BY REF PATH '.',
                      "price" DECIMAL(6,3) PATH '@price[. > 100]')
     as t(lineitem, price)
Result:
ordid  lineitem  price
1      LI4       null
1      LI1       175
1      LI2       150
2      LI5       null
2      LI6       null
3      LI3       201
3      LI7       null
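The two result tables can be reproduced with a small Python model. The prices of LI4 to LI7 are invented (any value of 100 or less would do); the rest follows the slide. A predicate in the row-producing expression removes rows, while the same predicate in a column expression only turns the column value into null.

```python
# (ordid, lineitem, price); prices for LI4..LI7 are assumptions.
LINEITEMS = [
    (1, "LI4", 80), (1, "LI1", 175), (1, "LI2", 150),
    (2, "LI5", 60), (2, "LI6", 45),
    (3, "LI3", 201), (3, "LI7", 90),
]

def predicate_in_row_expression(items):
    """Models XMLTable('$order//lineitem[@price > 100]' ...)."""
    return [(ordid, li) for ordid, li, price in items if price > 100]

def predicate_in_column_expression(items):
    """Models XMLTable('$order//lineitem' ... PATH '@price[. > 100]')."""
    return [
        (ordid, li, price if price > 100 else None)
        for ordid, li, price in items
    ]

for row in predicate_in_row_expression(LINEITEMS):
    print(row)
for row in predicate_in_column_expression(LINEITEMS):
    print(row)
```

Every line item survives in the second form, so every document has to be read and the index on @price cannot narrow the scan.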
Joining XML Values in SQL/XML
Can use index on product/id, but not p.id
SELECT p.name, o.orddoc
FROM products p, orders o
WHERE XMLExists('$order//lineitem/product[ id eq $pid ]'
                passing o.orddoc as "order", p.id as "pid")
Can use index on p.id, but not product/id
SELECT p.name, o.orddoc
FROM products p, orders o
WHERE p.id = XMLCast(XMLQuery('$order//lineitem/product/id'
                              passing o.orddoc as "order")
                     as VARCHAR(13))
Need to unify XQuery and SQL data types
Joining XML Values in SQL/XML
Probably cannot use XML indexes: SQL types differ from XML
SELECT c.name, o.orddoc
FROM orders o, customer c
WHERE XMLCast(XMLQuery('$order/order/custid'
                       passing o.orddoc as "order") as DOUBLE)
    = XMLCast(XMLQuery('$cust/customer/id'
                       passing c.cdoc as "cust") as DOUBLE)
Can use XML indexes
SELECT c.name, o.orddoc
FROM orders o, customer c
WHERE XMLExists('$order/order[ custid/xs:double(.) = $cust/customer/id/xs:double(.) ]'
                passing o.orddoc as "order", c.cdoc as "cust")
XQuery Let Clauses
Can use the index
for $doc in db2-fn:xmlcolumn('ORDERS.ORDDOC')
for $item in $doc//lineitem[ @price > 100 ]
return <result>{ $item }</result>
Result: R/L1; R/L2; R/L3
for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
return $ord/lineitem[ @price > 100 ]
Cannot use the index
for $doc in db2-fn:xmlcolumn('ORDERS.ORDDOC')
let $item := $doc//lineitem[ @price > 100 ]
return <result>{ $item }</result>
for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
return <result>{ $ord/lineitem[ @price > 100 ] }</result>
Result: R/L1,L2; R; R/L3; R
Every document produces a <result>, even when no lineitem qualifies, so no document can be filtered out.
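The for/let difference can be mimicked with list comprehensions. This is a Python sketch with four hypothetical orders; the item names L1 to L3 follow the slide's results, and L5, L6 and all prices are invented. A for clause iterates over matching items, so orders without a match contribute nothing; a let clause binds the whole (possibly empty) sequence, so every order yields a result.

```python
# Four hypothetical orders as lists of (lineitem name, price).
ORDERS = [
    [("L1", 175), ("L2", 150)],
    [("L5", 60)],
    [("L3", 201)],
    [("L6", 45)],
]

def with_for(orders):
    """Models: for $doc ... for $item in $doc//lineitem[@price > 100]
    return <result>{ $item }</result>"""
    return [[name] for doc in orders for name, price in doc if price > 100]

def with_let(orders):
    """Models: for $doc ... let $item := $doc//lineitem[@price > 100]
    return <result>{ $item }</result>"""
    return [[name for name, price in doc if price > 100] for doc in orders]

print(with_for(ORDERS))  # [['L1'], ['L2'], ['L3']]
print(with_let(ORDERS))  # [['L1', 'L2'], [], ['L3'], []]
```

The empty lists in the second output correspond to the empty <result> elements that force the let form to visit every document.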
XQuery Let Clauses
Can use the index
for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
where $ord/lineitem/@price > 100
return <result>{ $ord/lineitem }</result>
Same as above
for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
let $price := $ord/lineitem/@price
where $price > 100
return <result>{ $ord/lineitem }</result>
Result: R/L4,L1,L2; R/L3,L7
Context is everything
$i is bound to the document node
for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')
return $i/order/lineitem
$j is bound to the <my_order> element, so the step my_order finds nothing
for $j in ( for $o in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
            return <my_order>{ $o/* }</my_order> )
return $j/my_order/lineitem
Remember the dot
Produces a type error: no document node at root
let $order := <new_order>{ db2-fn:xmlcolumn('ORDERS.ORDDOC')/order[custid > 1001] }</new_order>
return $order[ //customer/name ]
An absolute path expression is shorthand for a path that starts at
fn:root(.) treat as document-node().
Absolute path expressions are bad style; with the dot, the predicate becomes $order[ .//customer/name ].
Construction and View Composition
Want to rewrite this…
let $view :=
  for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order/lineitem
  return <ordered>{ $i/@quantity,
                    $i/product/@price,
                    <pid>{ $i/product/id/data(.) }</pid> }</ordered>
for $j in $view
where $j/pid = '17'
return $j/@price
Construction and View Composition
… into this
for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order/lineitem
where $i/product/id/data(.) = '17'
return $i/product/@price
but…
Construction and View Composition
Data type changed to untypedAtomic
– id is string: comparison is now an error
– id is long: comparison is now as double instead of long
List types are concatenated
Error for duplicate @price attributes lost
New node identity lost
Parent axis is broken
Any sequence should live in a tree without change.
Separate identity from construction?
Remember the namespaces
Index definition and query must match namespaces
CREATE INDEX li_price ON orders(orddoc)
  USING XMLPATTERN '//lineitem/@price' AS double
Cannot use the index. Which is right?
declare default element namespace "http://ournamespaces.com/order";
for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@price > 100 ]
return $i
Elements and text nodes differ
CREATE INDEX PRICE_TEXT ON orders(orddoc)
  USING XMLPATTERN '//price' AS varchar
Cannot use the index
for $ord in db2-fn:xmlcolumn("ORDERS.ORDDOC")/order[ lineitem/price/text() = "99.50" ]
return $ord
Element might have more data than just text
<price>99.50<unit>USD</unit></price>
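The gap between an element's value and its text node is easy to see with Python's standard xml.etree.ElementTree, used here only as an illustration and assuming, as the slide implies, that an index on //price is keyed on the element's full string value. The sample element is the one from the slide.

```python
import xml.etree.ElementTree as ET

price = ET.fromstring("<price>99.50<unit>USD</unit></price>")

# The first text node directly under <price>, as price/text() would select.
text_node = price.text

# The element's string value: all descendant text concatenated.
string_value = "".join(price.itertext())

print(text_node)     # 99.50
print(string_value)  # 99.50USD
```

A predicate on price/text() matches "99.50" here, but the value of the price element itself is "99.50USD", so an index on the element cannot answer the text() predicate.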
Attributes are shy
No attributes
//*
//node()
Only attributes
//@*
//attribute::node()
Empty result due to “principal node kind”
//@*/self::*
Between predicates are not obvious
Might not be between: multiple prices
lineitem[ price > 100 and price < 200 ]
Between or error
lineitem[ price gt 100 and price lt 200 ]
Always between
lineitem/price/data()[ . > 100 and . < 200 ]
Between if not list type
lineitem/price[ . > 100 and . < 200 ]
Between if not list type
lineitem[ @price > 100 and @price < 200 ]
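The first three variants can be compared on a line item with two prices. The Python sketch below models general comparisons as "some value satisfies", value comparisons as "at most one value, else error", and the data() form as a per-value test; the function names and the prices 50 and 250 are assumptions chosen for the example.

```python
def general_between(prices, low, high):
    """Models lineitem[ price > low and price < high ].

    Each general comparison is existential and evaluated on its own.
    """
    return any(p > low for p in prices) and any(p < high for p in prices)

def value_between(prices, low, high):
    """Models lineitem[ price gt low and price lt high ].

    Value comparisons reject operands with more than one item.
    """
    if len(prices) > 1:
        raise TypeError("value comparison needs at most one item")
    return len(prices) == 1 and low < prices[0] < high

def per_value_between(prices, low, high):
    """Models lineitem/price/data()[ . > low and . < high ] being non-empty."""
    return any(low < p < high for p in prices)

prices = [50, 250]  # two prices, neither between 100 and 200
print(general_between(prices, 100, 200))    # True: 250 > 100 and 50 < 200
print(per_value_between(prices, 100, 200))  # False
```

Only the per-value form tests both bounds against the same price, which is what makes it a true between predicate.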