Indexing & Querying XML Data for ../Regular Path Expressions/*
Quanzhong Li, Bongki Moon. Presented by Sundar and Supriya.
![Page 1: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/1.jpg)
<AUTHORS> <NAME ID=1>QUANZHONG LI</NAME> <NAME ID=2>BONGKI MOON</NAME></AUTHORS>
<TITLE>Indexing & Querying XML Data for
../Regular Path Expressions/*</TITLE>
<PRESENTERS> <NAME UFID=1234567>SUNDAR</NAME> <NAME UFID=7654321>SUPRIYA</NAME></PRESENTERS>
![Page 2: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/2.jpg)
Need for this paper
XML has emerged as a popular standard for data representation and data exchange on the Internet
XML query languages use regular path expressions to query the data
Conventional approaches to indexing and searching this data are based on tree traversals, which perform poorly under heavy access requests
Traversing the XML hierarchy becomes a significant overhead when the path lengths are long or unknown
What can be done???
![Page 3: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/3.jpg)
Try our System and the Algorithms !!!
XISS: a new system for indexing & storing XML data
New numbering scheme for elements and attributes
  Quick in figuring out the ‘ancestor-descendant’ relationship
New index structures
  Easier to find all elements and attributes with a given name string
Join algorithms for processing regular path expression queries
  EE-Join: to search paths from element to element
  EA-Join: to find element-attribute pairs
  KC-Join: to find the Kleene closure (*) on repeated paths or elements
![Page 4: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/4.jpg)
Go XISS!!!
In general, XML data can be queried for a particular value (or) a structure
By Value: get me “document”; get me “element=‘node1’ ” or “attribute=10”
By Structure: get me parent and child elements/attributes for a given element
Components:
  Index structures (element index, attribute index and structure index)
  Data Loader
  Query Processor
Numbering scheme first…
![Page 5: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/5.jpg)
Dietz vs. Li-Moon…
Dietz says, “If x and y are nodes of a tree T, x is an ancestor of y iff x comes before y in the pre-order traversal of T and after y in the post-order traversal,” and shows us his scheme
The ancestor-descendant relationship can be determined in constant time
Li-Moon say, “but this lacks flexibility”: many order values must be recomputed when a new node is inserted
Hmm… let us check out Li-Moon’s…
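Before moving on, Dietz's test can be sketched in a few lines of Python. This is only an illustration: the dictionary-based tree layout and the names `number_tree` and `is_ancestor` are assumptions made for this sketch, not anything taken from the slides or the paper.

```python
def number_tree(tree, root):
    """Assign (preorder, postorder) ranks to every node.

    `tree` maps a node name to the list of its children
    (a hypothetical layout chosen for this sketch).
    """
    pre, post = {}, {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        counters["pre"] += 1
        pre[node] = counters["pre"]          # rank on the way down
        for child in tree.get(node, []):
            visit(child)
        counters["post"] += 1
        post[node] = counters["post"]        # rank on the way back up

    visit(root)
    return pre, post


def is_ancestor(x, y, pre, post):
    # x is an ancestor of y iff x comes first in preorder and last in postorder.
    return pre[x] < pre[y] and post[x] > post[y]


tree = {"book": ["chapter1", "chapter2"], "chapter1": ["figure1"], "chapter2": []}
pre, post = number_tree(tree, "book")
print(is_ancestor("book", "figure1", pre, post))      # True
print(is_ancestor("chapter2", "figure1", pre, post))  # False
```

The check itself is two comparisons, but inserting a node shifts the ranks of every node that follows it in either traversal, which is the inflexibility Li-Moon point out.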
![Page 6: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/6.jpg)
Li-Moon’s Numbering…
Hey folks, we are going to extend this preorder so that each node covers a range of descendants
Just associate a pair of numbers <order, size> with each node
Parent node x says to its child node y, “I come before you, so my order is less than yours, and (my order + my size) >= (your order + your size), so your interval is always contained in my interval”
If there are siblings x & y (same parent), say, x is before y, then order(x) + size(x) < order(y)
![Page 7: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/7.jpg)
Voila!
Here it goes,
So, for any node x, size(x) >= the sum of the sizes of all its direct children [size(x) can be Laarrrge!]
That being said, “Given nodes x and y of a tree T, x is an ancestor of y iff
order(x) < order(y) <= order(x) + size(x)”
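A minimal sketch of how such <order, size> pairs might be assigned and tested is given here. The `slack` parameter (spare order values reserved inside each node's interval) and the dictionary tree layout are assumptions of this sketch; the slides do not say how much space the authors actually reserve.

```python
def assign_order_size(tree, root, slack=2):
    """Label every node with an <order, size> pair via an extended preorder walk.

    `tree` maps a node name to its list of children (hypothetical layout).
    `slack` is an assumed amount of reserved space per node.
    """
    labels = {}

    def visit(node, order):
        next_free = order + 1
        for child in tree.get(node, []):
            # Each child starts just past the end of the previous child's interval.
            next_free = visit(child, next_free) + 1
        size = (next_free - 1 - order) + slack   # space used by descendants + reserve
        labels[node] = (order, size)
        return order + size                      # last order value this node covers

    visit(root, 1)
    return labels


def is_ancestor(x, y, labels):
    (ox, sx), (oy, _) = labels[x], labels[y]
    return ox < oy <= ox + sx


tree = {"book": ["chapter1", "chapter2"], "chapter1": ["figure1"], "chapter2": []}
labels = assign_order_size(tree, "book")
print(labels["chapter1"])                          # (2, 5)
print(is_ancestor("chapter1", "figure1", labels))  # True
print(is_ancestor("chapter2", "figure1", labels))  # False
```

With `slack=2`, chapter1 covers the order values 2 through 7 while its only descendant uses 3 through 5, so a new child can later be given an unused value without renumbering anything else.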
![Page 8: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/8.jpg)
Good news!
Easy accommodation of future insertions – more flexible
Global reordering is not necessary until the reserved space runs out
order in the <order, size> pair is a unique identifier for each element and attribute in the document
Attribute nodes are placed before their sibling elements in the order – why?
How does this scheme help? – wait till the algorithms!
Switching back to XISS…
![Page 9: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/9.jpg)
Internals of XISS
Index Structure Overview
![Page 10: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/10.jpg)
More structures…
Element Index
Structure Index
![Page 11: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/11.jpg)
Path Join Algorithms
Conventional approaches (top down, bottom up and hybrid traversals) – not effective
Main idea of the proposed algorithm: for a given query “chapter/-*/figure”,
- find all ‘chapter’ elements
- find all ‘figure’ elements
- join the qualified ‘chapter-figure’ pairs
without traversing XML data trees (possible if the ancestor-descendant relationship can be obtained quickly)
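The three steps above can be illustrated with a toy in-memory index. The record layout `(doc_id, name, order, size)` and the dictionary standing in for the element index are assumptions made for this sketch; the slides indicate that the prototype uses B+-trees, and this nested-loop join is only the simplest possible stand-in for the join algorithms that follow.

```python
from collections import defaultdict

# Hypothetical labelled nodes: (doc_id, name, order, size)
nodes = [
    (1, "book", 1, 11),
    (1, "chapter", 2, 5),
    (1, "figure", 3, 2),
    (1, "chapter", 8, 2),
]

# Toy element index: name -> list of (doc_id, order, size)
element_index = defaultdict(list)
for doc_id, name, order, size in nodes:
    element_index[name].append((doc_id, order, size))


def is_ancestor(a, d):
    # Same document, and order(a) < order(d) <= order(a) + size(a)
    return a[0] == d[0] and a[1] < d[1] <= a[1] + a[2]


chapters = element_index["chapter"]   # step 1: all 'chapter' elements
figures = element_index["figure"]     # step 2: all 'figure' elements
pairs = [(c, f) for c in chapters for f in figures if is_ancestor(c, f)]  # step 3
print(pairs)  # [((1, 2, 5), (1, 3, 2))]
```

No tree is walked at any point: the name lookups return the candidates and the interval test decides each pair.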
![Page 12: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/12.jpg)
Complex -> Simple
Complex path expression decomposed to many simple path expressions
Intermediate results are joined to get the final result.
Different types of sub-expressions
![Page 13: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/13.jpg)
EA-Join Algorithm
To join intermediate results from sub-expressions with a list of elements and a list of attributes
E.g. “figure[@caption=‘flowchart’]”
Attributes should be placed before sibling elements in the order by the numbering scheme
![Page 14: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/14.jpg)
EA-Join Algorithm
Input: list of “figure” elements and list of “caption” attributes, grouped by document
Steps (2 stages):
  Element sets and attribute sets are merged by doc. ID (single scan)
  Elements and attributes are merged by figuring out the parent-child relationship using the <order> value (single scan)
Output: A set of (e, a) pairs where e is the parent of a
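A rough sketch of a single-scan merge of this kind follows. It assumes each attribute record carries the order value of its parent element (a `parent_order` field, which is an assumption of this sketch and not something the slides specify), and it folds the two stages into one pass by comparing composite `(doc_id, order)` keys. Filtering on the attribute value (‘flowchart’) is left out.

```python
def ea_join(elements, attributes):
    """Merge-join elements with their attributes in one pass.

    elements:   list of (doc_id, order), sorted
    attributes: list of (doc_id, order, parent_order), sorted by (doc_id, order)
    `parent_order` is a hypothetical field used here for the parent test.
    Because attributes are numbered right after their parent element and
    before any sibling elements, sorting attributes by order also sorts
    them by parent order, so neither cursor ever needs to back up.
    """
    result = []
    i = j = 0
    while i < len(elements) and j < len(attributes):
        e_key = elements[i]                    # (doc_id, order)
        a = attributes[j]
        a_key = (a[0], a[2])                   # (doc_id, parent_order)
        if e_key < a_key:
            i += 1                             # element has no more attributes
        elif e_key > a_key:
            j += 1                             # attribute's parent is not in the list
        else:
            result.append((elements[i], a))
            j += 1                             # the element may own more attributes
    return result


figures = [(1, 3), (1, 9), (2, 4)]
captions = [(1, 4, 3), (1, 7, 6), (2, 5, 4)]
print(ea_join(figures, captions))
# [((1, 3), (1, 4, 3)), ((2, 4), (2, 5, 4))]
```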
![Page 15: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/15.jpg)
EE-Join Algorithm
To join intermediate results each of which is a list of elements from a sub-expression
E.g. “chapter/-*/figure”
Input: list of “chapter” elements and list of “figure” elements
Steps (2 stages) are similar to the EA-Join algorithm:
  Both element sets are merged by doc. ID (single scan)
  Chapter elements and figure elements are merged by finding the ancestor-descendant relationship using <order, size> values
Output: A set of (e, f) pairs where e is the ancestor of f
![Page 16: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/16.jpg)
EE-Join Algorithm
The second stage cannot be done in a single scan
In this example, a “figure” element can be a descendant of more than one “chapter” element (see book1.xml)
order(figure) will then lie in more than one chapter interval [order(chapter), order(chapter) + size(chapter)]
Even with multiple scans, this is still far more effective than conventional tree traversals when searching paths of long or unknown length.
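To illustrate why one figure can pair with several chapters, here is a small sketch of an ancestor-descendant join. Note the substitution: instead of rescanning the list as the slides describe, it uses a binary search per ancestor to locate the descendants whose order falls inside that ancestor's interval. The record layout `(doc_id, order, size)` is likewise an assumption.

```python
from bisect import bisect_right
from collections import defaultdict


def ee_join(ancestors, descendants):
    """Return (a, d) pairs where a is an ancestor of d in the same document.

    Both inputs are lists of (doc_id, order, size) records (hypothetical layout).
    """
    by_doc = defaultdict(list)
    for doc_id, order, size in sorted(descendants):
        by_doc[doc_id].append((order, size))   # per document, sorted by order

    result = []
    for doc_id, order, size in sorted(ancestors):
        candidates = by_doc.get(doc_id, [])
        # Slice of candidates with order(a) < order(d) <= order(a) + size(a)
        lo = bisect_right(candidates, (order, float("inf")))
        hi = bisect_right(candidates, (order + size, float("inf")))
        for d_order, d_size in candidates[lo:hi]:
            result.append(((doc_id, order, size), (doc_id, d_order, d_size)))
    return result


# A chapter nested inside another chapter: the figure at order 7 lies in both intervals.
chapters = [(1, 2, 10), (1, 5, 4)]
figures = [(1, 3, 1), (1, 7, 1), (1, 20, 1)]
for pair in ee_join(chapters, figures):
    print(pair)
```

The figure at order 7 is reported twice, once for each enclosing chapter, which is exactly the case that rules out a single forward scan over both lists.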
![Page 17: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/17.jpg)
KC-Join Algorithm
Processes a regular path expression with zero, one or more occurrences of a subexpression
E.g. “chapter*”, “chapter+”
Input: set of elements from an XML document
Steps:
  Each stage applies the EE-Join algorithm to the previous stage’s result
  Repeat until the result no longer changes
Output: Kleene Closure of all elements in the given input set
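The repeat-until-no-change loop can be sketched as below for a “chapter+” style expression within one document. Several things here are assumptions of the sketch rather than details from the slides: the `(order, size, depth)` record layout, the use of a `depth` field to narrow ancestor-descendant down to parent-child, and returning the closure as a list of chains.

```python
def is_parent(a, b):
    # a, b: (order, size, depth); `depth` is an assumed extra field.
    # b is a child of a if it lies in a's interval and sits one level deeper.
    return a[0] < b[0] <= a[0] + a[1] and b[2] == a[2] + 1


def kc_join(elements):
    """Return every chain e1/e2/.../ek (k >= 1) of parent-child steps
    built only from `elements`."""
    current = [(e,) for e in elements]      # stage 1: chains of length 1
    closure = list(current)
    while current:                          # stop when a stage adds nothing new
        current = [
            chain + (e,)
            for chain in current
            for e in elements
            if is_parent(chain[-1], e)      # join previous stage with the input set
        ]
        closure.extend(current)
    return closure


chapters = [(2, 10, 1), (5, 4, 2), (15, 2, 1)]
for chain in kc_join(chapters):
    print(chain)
```

For this input the loop finds three chains of length one and a single chain of length two (the chapter at order 2 containing the chapter at order 5), then stops because no chain can be extended further.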
![Page 18: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/18.jpg)
Experiments..
A prototype of XISS was implemented
  Query interface: C++; XML parsing: Gnome XML Parser; B+-tree: GiST C++ Library
Workstation:
  Sun UltraSPARC-II running Solaris 2.7
  RAM: 256 MB; hard disk: 20 GB
Data sets:
  Shakespeare’s Plays
  SIGMOD Record
  NITF100 and NITF1
![Page 19: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/19.jpg)
Performance Comparison
EE-Join query:
  Outperformed the bottom-up method by a wide margin
  Real-world data set: an order of magnitude faster
  Synthetic data set: 6 to 10 times faster
  Disk I/O was the dominant cost factor: 60% to 90% of total elapsed time
EA-Join query:
  Comparatively better than the top-down and bottom-up approaches
KC-Join query:
  Performance was not measured; it depends on EE-Join’s performance
![Page 20: Indexing & Querying XML Data for ../Regular Path Expressions/*](https://reader036.vdocument.in/reader036/viewer/2022062321/56813afe550346895da39521/html5/thumbnails/20.jpg)
THE END!
Hope this presentation was useful
THANKS!