xpath query evaluation
![Page 1: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/1.jpg)
Xpath Query Evaluation
![Page 2: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/2.jpg)
Goal
• Evaluating an Xpath query against a given document
– To find all matches
• We will also consider the use of types
• Complexity is important
– Huge documents
![Page 3: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/3.jpg)
Data complexity vs. Combined Complexity
• Two inputs to the query evaluation problem
– Data (XML document) of size |D|
– Query (Xpath expression) of size |Q|
– Usually |Q| << |D|
• Polynomial data complexity
– Complexity that is polynomial in |D|, possibly exponential in |Q|
• Polynomial combined complexity
– Complexity that is polynomial in |D| and |Q|
• Fixed-parameter tractable complexity
– Complexity Poly(|D|)*f(|Q|)
![Page 4: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/4.jpg)
Xpath Query Evaluation
• Input: XML Document D, Xpath query Q
• Output: A subset of the nodes of D, as defined by Q
• We will follow Efficient Algorithms for Processing Xpath Queries / Gottlob, Koch, Pichler, TODS 2005
![Page 5: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/5.jpg)
Simple algorithm
process-location-step(n, Q) {
  S := apply Q.first to n;
  if |Q| > 1 then
    for each node n' in S do
      process-location-step(n', Q.next)
}
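The recursion above can be sketched in Python. This is a minimal illustration, not the code of any real XPath engine: the `Node` class, the `AXES` table (only `child` and `descendant`), and the `result` list that collects the nodes reached by the last step are all assumptions added here, since the slide does not say how the output is gathered.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal XML node: a label and an ordered list of children."""
    label: str
    children: list = field(default_factory=list)


def descendants(n):
    """All proper descendants of n, in document order."""
    out = []
    for c in n.children:
        out.append(c)
        out.extend(descendants(c))
    return out


# Only two axes are modelled in this sketch.
AXES = {
    "child": lambda n: list(n.children),
    "descendant": descendants,
}


def process_location_step(n, steps, result):
    """Apply the first (axis, node test) step to n, then recurse on every match."""
    axis, test = steps[0]
    s = [m for m in AXES[axis](n) if test in ("*", m.label)]
    if len(steps) > 1:
        for m in s:
            process_location_step(m, steps[1:], result)
    else:
        # Assumption: matches of the last step are the query answer.
        result.extend(s)


# A chain r -> a -> b -> c, queried with descendant::*/descendant::*
doc = Node("r", [Node("a", [Node("b", [Node("c")])])])
out = []
process_location_step(doc, [("descendant", "*"), ("descendant", "*")], out)
print([m.label for m in out])  # node c is reported twice
```

Node `c` is reached once through `a` and once through `b`, which already hints at the repeated work this strategy performs.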
![Page 6: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/6.jpg)
Complexity
• Worst case: in each step of Q the axis is “following”
• So we apply the query in each step on O(|D|) nodes
• And we get Time(|Q|)= |D|*Time(|Q|-1)
• I.e. the complexity is O(|D|^|Q|)
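Unrolling the recurrence makes the exponential bound explicit. The base case T(1) = O(|D|), i.e. a single location step costs at most linear time, is an assumption made here for the sketch; q abbreviates |Q|.

```latex
\begin{align*}
T(q) &= |D| \cdot T(q-1) \\
     &= |D|^{2} \cdot T(q-2) \\
     &= \dots \\
     &= |D|^{q-1} \cdot T(1) \;=\; O\!\left(|D|^{q}\right),
     \qquad q = |Q|,\quad T(1) = O(|D|).
\end{align*}
```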
![Page 7: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/7.jpg)
Early Systems Performance
Figure taken from Gottlob, Koch, Pichler '05
![Page 8: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/8.jpg)
Internet Explorer 6
Figure taken from Gottlob, Koch, Pichler '05
![Page 9: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/9.jpg)
IE6 – performance as a function of document size
Figure taken from Gottlob, Koch, Pichler '05
![Page 10: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/10.jpg)
Polynomial data complexity
• Poly data complexity is sometimes considered good even if exponential in the query size
• But can we have polynomial combined complexity for Xpath query evaluation?
• Yes!
![Page 11: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/11.jpg)
Two main principles
• Query parse trees: the query is divided into parts according to its structure (not to be confused with the XML tree structure)
• Context-value tables: for every expression e occurring in the parse tree, compute a table of all valid combinations of context c and value v such that e evaluates to v in c.
![Page 12: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/12.jpg)
Xpath query parse tree
descendant::b/following-sibling::*[position() != last()]
![Page 13: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/13.jpg)
Bottom-up vs. Top-down evaluation
• We will discuss two kinds of query evaluation algorithms:
– Bottom-up means that the query parse tree is processed from the leaves up to the root
– Top-down means that the parse tree is processed from the root to the leaves
• When processing we will fill in the context-value table
![Page 14: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/14.jpg)
Bottom-up evaluation
• Main idea: compute the value for each leaf for every possible context
• Propagate upwards until the root
• Dynamic programming algorithm to avoid re-evaluation of queries in the same context
![Page 15: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/15.jpg)
Operational semantics
• Needed as a first step for evaluation algorithms
• Similar ideas are used in compiler design
• Here the semantics is based on the notion of contexts
![Page 16: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/16.jpg)
Contexts
• The domain of contexts is C = dom × {<k,n> | 1 ≤ k ≤ n ≤ |dom|}
• A context is c = <x,k,n>, where
– x is the context node
– k is the context position
– n is the context size
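To get a feel for the size of this domain, the contexts can simply be enumerated. The sketch below assumes the bounds 1 ≤ k ≤ n ≤ |dom| and represents a context as a plain tuple `(x, k, n)`; the node names are placeholders.

```python
def contexts(dom):
    """Enumerate all contexts (x, k, n) with x in dom and 1 <= k <= n <= |dom|."""
    size = len(dom)
    return [
        (x, k, n)
        for x in dom
        for n in range(1, size + 1)
        for k in range(1, n + 1)
    ]


dom = ["r", "a", "b"]  # hypothetical three-node document
cs = contexts(dom)
# |dom| choices for x times |dom| * (|dom| + 1) / 2 choices for (k, n)
print(len(cs))
```

The count grows cubically in |dom|, which is where the cubic factor in the table sizes discussed later comes from.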
![Page 17: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/17.jpg)
Semantics for Xpath expressions
• The semantics of evaluating an expression is a 4-tuple where the first 3 elements are the context, and the fourth is the value obtained by evaluation in the context
![Page 18: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/18.jpg)
Some notations
• T(t): all nodes satisfying a predicate t
• E(e): all nodes satisfying a regular exp. e (applied with respect to a given axis)
• idx_χ(x, S) is the index of a node x in the set S with respect to a given axis χ and the document order
![Page 19: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/19.jpg)
![Page 20: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/20.jpg)
Context-value Table
• Given a query sub-expression e, the context-value table of e specifies all combinations of context c and value v, such that computing e on the context c results in v
• Bottom-up algorithm follows: compute the context-value table in a bottom-up fashion with respect to the query
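As a small illustration, here are the context-value tables for the predicate `position() != last()` from the example query, built from its two leaves upward. The dictionary representation and the function names are assumptions for this sketch; the algorithm on the slides is stated more generally.

```python
def contexts(dom):
    """All contexts (x, k, n) with x in dom and 1 <= k <= n <= |dom|."""
    size = len(dom)
    return [
        (x, k, n)
        for x in dom
        for n in range(1, size + 1)
        for k in range(1, n + 1)
    ]


def table_position(ctxs):
    """Leaf table for position(): the value is the context position k."""
    return {c: c[1] for c in ctxs}


def table_last(ctxs):
    """Leaf table for last(): the value is the context size n."""
    return {c: c[2] for c in ctxs}


def table_not_equal(left, right):
    """Table of the parent node e1 != e2, combined row by row from its children."""
    return {c: left[c] != right[c] for c in left}


ctxs = contexts(["x", "y"])  # hypothetical two-node document
table = table_not_equal(table_position(ctxs), table_last(ctxs))
for c, v in table.items():
    print(c, v)
```

Each table is computed once for all contexts, so a sub-expression is never re-evaluated in the same context.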
![Page 21: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/21.jpg)
Bottom-up algorithm
![Page 22: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/22.jpg)
Example
4 times
![Page 23: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/23.jpg)
Complexity
• O(|D|^3*|Q|) space ignoring strings and numbers
– O(|Q|) tables, with 3 columns, each including values in 1…|D|, thus O(|D|^3*|Q|)
– An extra O(|D|*|Q|) multiplicative factor for strings and numbers
• O(|D|^5*|Q|) time ignoring strings and numbers
– It can take O(|D|^2) to combine two nodesets
– Extra O(|Q|) in case of strings and numbers
![Page 24: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/24.jpg)
Optimization
• Represent contexts as pairs of current and previous node
• This brings the time complexity down to O(|D|^4*|Q|^2)
• Further optimizations bring the space complexity down to O(|D|^2*|Q|^2)
![Page 25: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/25.jpg)
Top-down evaluation
• Similar idea
• But computes values only for the contexts that are actually needed
• Same worst-case bounds
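One way to picture the top-down idea is a memoized recursion that starts at the root of the query and only visits the contexts it actually reaches. The sketch below is a simplification and not the algorithm of the paper: a context is reduced to just the context node, every step is `descendant::*`, and the document is a hypothetical four-node chain.

```python
from functools import lru_cache

# Hypothetical toy document: node id -> list of child ids (a chain 0 -> 1 -> 2 -> 3).
CHILDREN = {0: [1], 1: [2], 2: [3], 3: []}


def descendants(n):
    """All proper descendants of node n, in document order."""
    out = []
    for c in CHILDREN[n]:
        out.append(c)
        out.extend(descendants(c))
    return out


@lru_cache(maxsize=None)
def eval_from(n, steps_left):
    """Nodes reached from n by `steps_left` consecutive descendant::* steps."""
    if steps_left == 0:
        return frozenset([n])
    result = set()
    for m in descendants(n):
        # Each (node, steps_left) pair is computed at most once thanks to the cache.
        result |= eval_from(m, steps_left - 1)
    return frozenset(result)


print(sorted(eval_from(0, 2)))
print(eval_from.cache_info())
```

Only pairs that are reachable from the initial call end up in the cache, whereas a bottom-up pass would fill in a row for every node regardless.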
![Page 26: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/26.jpg)
Top-down or bottom-up?
• General question in processing XML trees
• The tradeoff:
– Usually easier to combine results computed in children to obtain the result at the parent
• So bottom-up traversal is usually easier to design
– On the other hand, some of the bottom-up computation is redundant, since we don’t know in advance whether it will become relevant
• So top-down traversal may be more efficient
![Page 27: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/27.jpg)
Linear-time fragment
• Core Xpath includes only navigation
– / and //
• Core Xpath can be evaluated in O(|D|*|Q|)
• Observation: no need to consider the entire context triple, only the current context node
• Top-down or bottom-up evaluation with essentially the same algorithm
• But smaller tables (for every query node, all document nodes and values of evaluation) are maintained.
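A rough sketch of why navigation-only queries stay linear: each location step maps a whole set of context nodes to a set of result nodes in one pass over the document, so |Q| steps cost O(|D|*|Q|) overall. The dictionary-based tree, the two supported axes and the absence of predicates are simplifying assumptions of this example.

```python
def step_child(children, nodes):
    """Children of all nodes in the set; every tree edge is looked at once."""
    return {c for n in nodes for c in children[n]}


def step_descendant(children, nodes):
    """Proper descendants of all nodes in the set, via a single traversal."""
    seen = set()
    stack = [c for n in nodes for c in children[n]]
    while stack:
        n = stack.pop()
        if n in seen:
            continue  # already expanded, do not walk this subtree again
        seen.add(n)
        stack.extend(children[n])
    return seen


AXES = {"child": step_child, "descendant": step_descendant}


def evaluate(children, labels, root, steps):
    """Evaluate a predicate-free path, one node set per location step."""
    current = {root}
    for axis, test in steps:
        current = AXES[axis](children, current)
        current = {n for n in current if test == "*" or labels[n] == test}
    return current


# Hypothetical document: r(a(b, c), b), with node ids 0..4
children = {0: [1, 4], 1: [2, 3], 2: [], 3: [], 4: []}
labels = {0: "r", 1: "a", 2: "b", 3: "c", 4: "b"}
print(evaluate(children, labels, 0, [("descendant", "a"), ("child", "b")]))
```

Because the intermediate result is a set, a node reached along several paths is processed only once per step, unlike in the simple recursive algorithm.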
![Page 28: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/28.jpg)
Types are helpful
• Can direct the search
– In some parts of the tree there is no hope to get a match to a given sub-expression of the query
– As a result we may have tables with fewer entries.
• Whiteboard discussion
![Page 29: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/29.jpg)
Type Checking and Inference
• Type checking a single document: straightforward
– Polynomial combined complexity if the automaton representing the type is deterministic; otherwise exponential in the automaton size but polynomial in the document size
• Type checking the results of an (Xpath) query
• Inferring the type of the results of a query
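For the first case, checking one document against a type, a minimal sketch is shown below. It assumes a DTD-style type in which each element label has a regular expression over the sequence of its children's labels; this representation, the `CONTENT_MODEL` table and the element names are illustrative assumptions, and Python's backtracking `re` engine does not reflect the automaton-based complexity bounds stated above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal XML node: a label and an ordered list of children."""
    label: str
    children: list = field(default_factory=list)


# Hypothetical type: label -> regex over the children's labels,
# where every child label is written followed by a single space.
CONTENT_MODEL = {
    "library": r"(book )*",
    "book": r"title (author )+",
    "title": r"",
    "author": r"",
}


def conforms(node, models):
    """True if every node's child sequence matches the content model of its label."""
    model = models.get(node.label)
    if model is None:
        return False  # label not declared in the type
    word = "".join(child.label + " " for child in node.children)
    if re.fullmatch(model, word) is None:
        return False
    return all(conforms(child, models) for child in node.children)


good = Node("library", [Node("book", [Node("title"), Node("author")])])
bad = Node("library", [Node("book", [Node("title")])])  # book without an author
print(conforms(good, CONTENT_MODEL), conforms(bad, CONTENT_MODEL))
```

Type checking the results of a query is a different problem: it asks about all documents of the input type at once, not about a single tree.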
![Page 30: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/30.jpg)
Type Inference
• An (incomplete) algorithm for type inference can work its way to the top of the query parse tree to infer a type in a bottom-up fashion
– Start by inferring a type for the leaves (simple queries), then use it for their parents
• Type Inference is inherently incomplete.
• Can be performed for some languages that are “regular” in a sense.
![Page 31: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/31.jpg)
Restricted language allowing for type inference
• Axes: child, descendant, parent, ancestor, following-sibling, etc.
• Variables can be bound to nodes in the input tree, then passed as parameters
• An equality test can be performed between node IDs, but not between node values.
![Page 32: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/32.jpg)
Type Checking
• In addition to inferring a type we need to verify containment in another type.
• Type Inference can be used as a tool for Type Checking.
• Type Checking was shown to be decidable for the same language fragment, but with high complexity.
![Page 33: Xpath Query Evaluation](https://reader036.vdocument.in/reader036/viewer/2022081519/568144fb550346895db1c64b/html5/thumbnails/33.jpg)
Intuitive connection to text
• Queries => regular expressions
• Types (tree automata) => context-free languages
• Type Inference => intersection of context-free and regular languages, resulting in a context-free one
• Type checking => Type Inference + inclusion of context-free languages (with some restrictions to guarantee decidability)