g. gottlob, c. koch & r. pichler tu wien, vienna, austria elias politarhos advanced databases...
Post on 21-Dec-2015
G. Gottlob, C. Koch & R. Pichler
TU Wien, Vienna, Austria
Elias Politarhos
Advanced Databases
M.Sc. in Information Systems
Athens University of Economics & Business
Efficient Algorithms for Processing XPath Queries
XPath: XML Path Language
XPath Queries2
Presentation Outline
XPath Overview
XPath Engines Efficiency
Query Evaluation Algorithms
MINCONTEXT Algorithm
Linear-time fragments of XPath
Linear-space fragments of XPath
Conclusions
XPath Overview
Proposed by W3C
Selects nodes from XML document trees
Importance
▫ XML query language (XML: markup language)
▫ Core of XML technologies
  • XSLT (eXtensible Stylesheet Language Transformations)
  • XPointer
  • XQuery
Implementation approaches
▫ Highly inefficient
  • Exponential time
XPath Efficiency
XPath engines efficiency (query complexity)
▫ XT (J. Clark): exponential
▫ XALAN (Apache foundation): exponential
▫ Saxon (M. Kay): exponential
▫ Internet Explorer 6 (Microsoft): exponential
  • Quadratic data complexity for simple path queries
Slide callouts: XSLT processor, XSLT processor, Web browser, XPath engine
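One way exponential query complexity can arise is sketched below in Python. The sketch assumes a naive engine that evaluates each location step once per context node and never merges duplicate results. The tree encoding and the names `naive_visits`, `set_based_visits` and `query` are hypothetical; nothing here is taken from the engines listed above.

```python
# Toy document <a><b/><b/></a>, encoded as child/parent maps (illustrative only).
CHILDREN = {"a": ["b1", "b2"], "b1": [], "b2": []}
PARENT = {"b1": "a", "b2": "a"}

def child(node):
    return CHILDREN[node]

def parent(node):
    return [PARENT[node]] if node in PARENT else []

def naive_visits(steps, node):
    """Recurse per context node without merging duplicates.
    Returns how many times a step is applied to a node."""
    if not steps:
        return 0
    visits = 1
    for nxt in steps[0](node):
        visits += naive_visits(steps[1:], nxt)
    return visits

def set_based_visits(steps, start):
    """Apply each step once to a duplicate-free set of context nodes."""
    current, visits = {start}, 0
    for step in steps:
        visits += len(current)
        current = {m for n in current for m in step(n)}
    return visits

def query(k):
    # child::b followed by k repetitions of /parent::a/child::b
    return [child] + [parent, child] * k

for k in (1, 5, 10):
    print(k, naive_visits(query(k), "a"), set_based_visits(query(k), "a"))
```

On this toy input the naive count roughly doubles with every added `/parent::a/child::b` pair, while the set-based count grows by a constant per pair.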
Query Evaluation Algorithms (1/4)
Context-value table
▫ Bottom-up
  • Stores values in “data pool”
  • No recalculation
XML: <a><b/><b/><b/><b/></a>
Query: descendant::b/following-sibling::*[position()!=last()]
Polynomial complexity
▫ Time: O(|D|^5 * |Q|^2)
▫ Space: O(|D|^4 * |Q|^2)
Query Evaluation Algorithms (2/4)
Bottom-up Algorithm
Bottom-up semantics function: computes the semantics of expressions; used here to compute the semantics of the whole query
Compute the semantics of every leaf of the query parse tree and add them to R
Compute the remaining expressions from the semantics already in R, until the root of the query has been reached
Query Evaluation Algorithms (3/4)
Consider the document <a><b/><b/><b/><b/></a>.
Let dom = {r, a, b1, b2, b3, b4}, where b1…b4 denote the children of a in document order.
We want to evaluate the XPath query Q = descendant::b/following-sibling::*[position()!=last()] over the input context <a, 1, 1>.
[Figure: Q parse tree]
Calculate the context-value tables for the leaves E1, E3, E5 and E6
From E5 and E6 calculate E4; from E3 and E4 calculate E2
Finally, calculate the result (Q) from E2 and E1
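The steps above can be traced with a small Python sketch. It rests on two assumptions. First, the mapping of E1…E6 to subexpressions (E1 = descendant::b, E3 = following-sibling::*, E5 = position(), E6 = last(), E4 = E5 != E6, E2 = E3[E4]) is inferred from the order of the steps, not read off the parse-tree figure. Second, each table is keyed only by the part of the context it actually uses, which is shorter than a full table over <x, p, s>. The dictionaries and helper names are hypothetical.

```python
# Example document <a><b/><b/><b/><b/></a>; r is the document root.
DOM = ["r", "a", "b1", "b2", "b3", "b4"]  # document order
LABEL = {"r": "#root", "a": "a", "b1": "b", "b2": "b", "b3": "b", "b4": "b"}
CHILDREN = {"r": ["a"], "a": ["b1", "b2", "b3", "b4"],
            "b1": [], "b2": [], "b3": [], "b4": []}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}

def descendants(x):
    out = []
    for c in CHILDREN[x]:
        out.append(c)
        out.extend(descendants(c))
    return out

def following_siblings(x):
    if x not in PARENT:
        return []
    sibs = CHILDREN[PARENT[x]]
    return sibs[sibs.index(x) + 1:]

# All (position, size) pairs that can occur over this document.
n = len(DOM)
contexts = [(p, s) for s in range(1, n + 1) for p in range(1, s + 1)]

# Leaves of the parse tree (assumed labelling, see lead-in).
E1 = {x: [d for d in descendants(x) if LABEL[d] == "b"] for x in DOM}
E3 = {x: following_siblings(x) for x in DOM}
E5 = {c: c[0] for c in contexts}   # position()
E6 = {c: c[1] for c in contexts}   # last()

# Inner nodes, computed only from tables that already exist.
E4 = {c: E5[c] != E6[c] for c in contexts}
E2 = {x: [y for i, y in enumerate(E3[x], start=1) if E4[(i, len(E3[x]))]]
      for x in DOM}
Q = {x: sorted({z for y in E1[x] for z in E2[y]}, key=DOM.index) for x in DOM}

result = Q["a"]   # value of Q for the input context <a, 1, 1>
print(result)
```

Every table is filled for all possible contexts before its parent is computed, which is the source of the irrelevant intermediate results in a bottom-up evaluation: only the entry for context node a is needed in the end.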
Evaluation of XPath Queries (4/4)
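Bottom-up: quick but not practical
▫ Irrelevant intermediate results
Top-down approach
▫ Vector computation
▫ $Op^{\langle\rangle}(\langle x_{1,1},\dots,x_{1,k}\rangle,\dots,\langle x_{m,1},\dots,x_{m,k}\rangle) = \langle Op(x_{1,1},\dots,x_{1,k}),\dots,Op(x_{m,1},\dots,x_{m,k})\rangle$
Polynomial complexity
▫ Time: O(|D|^4 * |Q|^2)
▫ Space: O(|D|^3 * |Q|^2)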
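The vector form of an operation can be illustrated with a short Python sketch. It follows the formula on this slide: the lifted operation receives m argument tuples of length k and applies Op once per tuple. The helper name `vectorize` is hypothetical.

```python
from operator import add

def vectorize(op):
    """Lift a k-ary operation so that it is applied once per argument tuple."""
    def lifted(*tuples):
        return tuple(op(*t) for t in tuples)
    return lifted

plus = vectorize(add)
print(plus((1, 2), (3, 4), (5, 6)))
```

In a top-down evaluation the tuples would come from a vector of contexts, so a subexpression is evaluated for all of them in one call rather than once per context.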
MINCONTEXT Algorithm (1/6)
Context-value table
▫ Q parse tree: |N|^2 entries/node
Improved by top-down
▫ Result depends on context info
MINCONTEXT Algorithm
▫ Context information = small
▫ Restrict context
  • Relevant context
MINCONTEXT Algorithm (2/6)
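Relevant context (‘cn’ = context node, ‘cp’ = context position, ‘cs’ = context size)
Base cases (N is a leaf node)
▫ Constant | Boolean: Relev(N) = ∅
▫ position(): Relev(N) = {‘cp’}; last(): Relev(N) = {‘cs’}
▫ Location step | Function: Relev(N) = {‘cn’}
Compound expressions (N is an inner node with children N_1, …, N_k)
▫ Location step: Relev(N) = {‘cn’}
▫ Others: $Relev(N) = \bigcup_{i=1}^{k} Relev(N_i)$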
Context-value table
▫ |N|^2 entries/node
▫ Every possible context node calculated
MINCONTEXT
▫ Results ⊆ N
▫ Set of nodes x_j ∈ N reachable from any previous x_i ∈ N
Polynomial complexity
▫ Time: O(|D|^4 * |Q|^2)
▫ Space: O(|D|^2 * |Q|^2)
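The relevant-context rules on this slide can be written as a short recursive function. This is a minimal Python sketch, assuming a simplified parse tree in which each node carries only a kind; the `Node` class and the kind names are hypothetical, and leaf functions (which the slide maps to {‘cn’}) are not modelled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # "const", "position", "last", "step" or "op"
    children: list = field(default_factory=list)

def relev(n):
    """Return the parts of the context that the expression at n depends on."""
    if n.kind == "step":           # location step, leaf or inner node
        return {"cn"}
    if n.kind == "position":
        return {"cp"}
    if n.kind == "last":
        return {"cs"}
    if n.kind == "const":          # constants and Boolean literals
        return set()
    out = set()                    # any other compound expression:
    for c in n.children:           # union over the children
        out |= relev(c)
    return out

# position() != last(), used as the predicate of a location step
pred = Node("op", [Node("position"), Node("last")])
step = Node("step", [pred])
print(relev(pred), relev(step))
```

The predicate depends on position and size, but the enclosing location step depends only on the context node, so its table needs one entry per node rather than one per full context.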
MINCONTEXT Algorithm (3/6)
Eval_out: evaluates the input expression if it is a location path. Input: a parse-tree node N and a node set X ⊆ dom. Output: the set Y of nodes that can be reached via the expression from any context node x ∈ X.
Eval_by: takes a node N in the parse tree and a set X of possible context nodes. It does not return a result value. For every node M in the subtree rooted at N, it computes table(M) if expr(M) does not depend on the context position or size.
Eval_single: evaluates an XPath expression for a single context <x,p,s>. Input: a node N in the parse tree and a context <x,p,s>. Output: the result value for this context. This procedure is called after Eval_by has been called for the node N.
MINCONTEXT Algorithm (4/6)
Example
Q = /descendant::*/descendant::*[position() > last()*0.5 or self::* = 100]
Linear-time fragments of XPath (1/2)
Core XPath
▫ Fragment of XPath
▫ Clean logical core
▫ Only sets of nodes
▫ No arithmetical ops
▫ No string ops
▫ Set ops (∩, ∪, -, χ)
Time: O(|D|*|Q|)
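The O(|D|*|Q|) bound relies on applying each axis to a whole node set in time linear in the document. The Python sketch below illustrates that idea for two forward axes only. It assumes nodes are numbered in document order, so a parent always precedes its children. The tree, `PARENT`, `LABEL` and `eval_path` are hypothetical, and predicates are left out.

```python
# A hypothetical 6-node tree; node i has parent PARENT[i] and label LABEL[i].
PARENT = [None, 0, 1, 1, 0, 4]
LABEL = ["r", "a", "b", "c", "a", "b"]

def child(S):
    """child axis on a node set: one pass over the document."""
    return {v for v, p in enumerate(PARENT) if p in S}

def descendant(S):
    """descendant axis on a node set: one pass in document order."""
    out = set()
    for v, p in enumerate(PARENT):
        if p is not None and (p in S or p in out):
            out.add(v)
    return out

def name_test(S, name):
    return {v for v in S if name == "*" or LABEL[v] == name}

def eval_path(steps, start):
    """Evaluate a predicate-free path: |Q| steps, each linear in |D|."""
    current = set(start)
    for axis, name in steps:
        current = name_test(axis(current), name)
    return current

# descendant::a/child::b evaluated from the root
print(eval_path([(descendant, "a"), (child, "b")], {0}))
```

Because every intermediate result is a set of at most |D| nodes, no step can cost more than a constant number of passes over the document, whatever the earlier steps selected.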
Linear-time fragments of XPath (2/2)
XPatterns
▫ Contained in XPath
▫ Extends Core XPath
ID axis relation
▫ π1/id(π2)/π3 is treated as π1/π2/id/π3
π = location path
Time: O(|D|*|Q|)
Linear-space fragments of XPath
Extended Wadler Fragment of XPath
▫ No select, count or sum data functions
▫ No expressions “nodeSet RelOp nodeSet”
▫ In expressions id(id(…(string)…))
  • String does not depend on context
OPTMINCONTEXT
▫ Space: O(|D| * |Q|^2)
▫ Time: O(|D|^2 * |Q|^2)
Conclusions
XPath query evaluation algorithms
▫ Context-value table based
  • Bottom-up
  • Top-down
▫ MINCONTEXT
▫ OPTMINCONTEXT
▫ Polynomial time
Linear complexity fragments of XPath
Query evaluation can be further optimized