
Post on 21-Dec-2015


G. Gottlob, C. Koch & R. Pichler

TU Wien, Vienna, Austria

Elias Politarhos
Advanced Databases
M.Sc. in Information Systems
Athens University of Economics & Business

XML Path Language: Efficient Algorithms for Processing XPath Queries


Presentation Outline

• XPath Overview
• XPath Engines Efficiency
• Query Evaluation Algorithms
• MINCONTEXT Algorithm
• Linear-time fragments of XPath
• Linear-space fragments of XPath
• Conclusions


XPath Overview

• Proposed by W3C
• Selects nodes from XML document trees
• Importance
  ▫ XML query language
  ▫ Core of XML technologies: XSLT (eXtensible Stylesheet Language Transformations), XPointer, XQuery
• Implementation approaches
  ▫ Highly inefficient: exponential time

Callout: XML markup language


XPath Efficiency

• XPath engines efficiency (query complexity)
  ▫ XT (J. Clark): exponential
  ▫ Xalan (Apache foundation): exponential
  ▫ Saxon (M. Kay): exponential
  ▫ Internet Explorer 6 (Microsoft): exponential
• Quadratic data complexity for simple path queries

Callouts: XSLT processor; XSLT processor; Web browser; XPath engine


Query Evaluation Algorithms (1/4)

• Context-value table
  ▫ Bottom-up
• XML: <a><b/><b/><b/><b/></a>
• Query: descendant::b/following-sibling::*[position()!=last()]
• Polynomial complexity
  ▫ Time: $O(|D|^5 \cdot |Q|^2)$
  ▫ Space: $O(|D|^4 \cdot |Q|^2)$
    • Stores values in a "data pool"
    • No recalculation


Query Evaluation Algorithms (2/4)

Bottom-up Algorithm

• Bottom-up semantics function: finds the semantics of expressions; used here to find the semantics of the query.
• Find the semantics of every leaf of the query parse tree and add them to R.
• Calculate the remaining expressions from the semantics already in R, until the root of the query has been reached.


Query Evaluation Algorithms (3/4)

Consider the document <a><b/><b/><b/><b/></a>. Let dom = {r, a, b1, b2, b3, b4}, where b1…b4 denote the children of a in document order. We want to evaluate the XPath query Q = descendant::b/following-sibling::*[position()!=last()] over the input context <a, 1, 1>.

[Figure: Q parse tree, with subexpressions E1 to E6]

• Calculate the context-value tables for the leaves E1, E3, E5 and E6.
• From E5 and E6 calculate E4, and from E3 and E4 calculate E2.
• Finally, calculate the result (Q) from E2 and E1.
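The walkthrough above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' algorithm listing: the document is modelled as plain dictionaries, the node labels r, a, b1…b4 follow the slide, and the helper names (`descendants`, `following_siblings`, `step_table`, `evaluate`) are hypothetical. The step following-sibling::*[position()!=last()] is tabulated once for every possible context node, in the bottom-up spirit, and then combined with descendant::b.

```python
# Toy model of <a><b/><b/><b/><b/></a>; r is the document root.
CHILDREN = {"r": ["a"], "a": ["b1", "b2", "b3", "b4"],
            "b1": [], "b2": [], "b3": [], "b4": []}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}

def tag(node):
    """Element name of a node label, e.g. 'b3' -> 'b' (labelling is an assumption)."""
    return node.rstrip("0123456789")

def descendants(node):
    """Descendants of node in document order."""
    out = []
    for child in CHILDREN[node]:
        out.append(child)
        out.extend(descendants(child))
    return out

def following_siblings(node):
    """Following siblings of node in document order."""
    if node not in PARENT:
        return []
    siblings = CHILDREN[PARENT[node]]
    return siblings[siblings.index(node) + 1:]

# Table for the step following-sibling::*[position()!=last()],
# computed once for every possible context node.
step_table = {}
for x in CHILDREN:
    candidates = following_siblings(x)
    size = len(candidates)  # context size seen by the predicate
    step_table[x] = [y for pos, y in enumerate(candidates, start=1) if pos != size]

def evaluate(context_node):
    """descendant::b/following-sibling::*[position()!=last()] from context_node."""
    result = []
    for x in descendants(context_node):
        if tag(x) != "b":
            continue
        for y in step_table[x]:
            if y not in result:
                result.append(y)
    order = descendants("r")
    return sorted(result, key=order.index)  # document order

print(evaluate("a"))
```

For the context node a, the sketch selects b2 and b3: b4 is always the last following sibling, so the predicate filters it out.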


Evaluation of XPath Queries (4/4)

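• Bottom-up: quick but not practical
  ▫ Irrelevant intermediate results
• Top-down approach
  ▫ Vector computation

$Op^{\langle\rangle}(\langle x_{1,1},\dots,x_{1,k}\rangle,\dots,\langle x_{m,1},\dots,x_{m,k}\rangle) = \langle Op(x_{1,1},\dots,x_{1,k}),\dots,Op(x_{m,1},\dots,x_{m,k})\rangle$

• Polynomial complexity
  ▫ Time: $O(|D|^4 \cdot |Q|^2)$
  ▫ Space: $O(|D|^3 \cdot |Q|^2)$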
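The vector computation above applies a k-ary operation Op to a whole list of argument tuples at once, one tuple per context. A minimal sketch, assuming Python lists stand in for the vectors; the name `lift` is hypothetical and does not come from the slides:

```python
import operator

def lift(op):
    """Return the vectorized version of a k-ary operation op."""
    def lifted(argument_tuples):
        # One result per input tuple, in the same order.
        return [op(*args) for args in argument_tuples]
    return lifted

# Example: evaluate position() != last() for three contexts at once.
# Each tuple is (context position, context size).
not_equal = lift(operator.ne)
contexts = [(1, 3), (2, 3), (3, 3)]
flags = not_equal(contexts)
print(flags)
```

Here the predicate holds for the first two contexts and fails for the third, where the position equals the size.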


MINCONTEXT Algorithm (1/6)

• Context-value table
  ▫ Q parse tree: $|N|^2$ entries/node
• Improved by top-down
  ▫ Result depends on context info
• MINCONTEXT algorithm
  ▫ Context information = small
  ▫ Restrict context
    • Relevant context


MINCONTEXT Algorithm (2/6)

Relevant context

• Base cases (N is a leaf node)
  ▫ Constant | Boolean: Relev(N) = ∅
  ▫ Position | Last: Relev(N) = {'cp'} | {'cs'}
  ▫ Location step | Function: Relev(N) = {'cn'}
• Compound expressions (N is an inner node)
  ▫ Location step: Relev(N) = {'cn'}
  ▫ Others: $\mathrm{Relev}(N) = \bigcup_{i=1}^{k} \mathrm{Relev}(N_i)$

• Context-value table
  ▫ $|N|^2$ entries/node
  ▫ Every possible context node calculated
• MINCONTEXT
  ▫ Results ⊆ N
  ▫ Set of nodes x_j ∈ N from any previous x_i ∈ N
• Polynomial complexity
  ▫ Time: $O(|D|^4 \cdot |Q|^2)$
  ▫ Space: $O(|D|^2 \cdot |Q|^2)$
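The Relev rules on this slide amount to a short recursion over the query parse tree. The following is a sketch under assumptions, not the authors' code: the `Node` class and its `kind` labels ("const", "position", "last", "step", "context_function", "op") are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                                   # hypothetical node kinds, see lead-in
    children: list = field(default_factory=list)

def relev(node):
    """Relevant context of a parse-tree node: subset of {'cn', 'cp', 'cs'}."""
    if node.kind == "const":                    # constants and Boolean literals
        return set()
    if node.kind == "position":                 # position()
        return {"cp"}
    if node.kind == "last":                     # last()
        return {"cs"}
    if node.kind in ("step", "context_function"):
        return {"cn"}                           # depends on the context node only
    result = set()                              # any other compound expression:
    for child in node.children:                 # union over the children
        result |= relev(child)
    return result

# position() != last() depends on context position and context size.
predicate = Node("op", [Node("position"), Node("last")])
print(sorted(relev(predicate)))
```

A location step reports only 'cn' even when its predicate uses position() or last(), which matches the rule that an inner location step has Relev(N) = {'cn'}.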


MINCONTEXT Algorithm (3/6)

eval_out: used when the input expression is a location path. Input: a node N and a node set X ⊆ dom. Output: the set Y of nodes that can be reached via the expression from any context node x ∈ X.

eval_by: takes a node N in the parse tree and a set X of possible context nodes. It does not return a result value. For every node M in the subtree rooted at N, it computes table(M) if expr(M) does not depend on the context position or context size.

eval_single: evaluates XPath expressions for a single context <x, p, s>. Input: a node N in the parse tree and a context <x, p, s>. Output: the result value for this context. This procedure is called after eval_by has been called for the node N.


MINCONTEXT Algorithm (4/6)

Example

Q = /descendant::*/descendant::*[position() > last()*0.5 or self::* = 100]


MINCONTEXT Algorithm (5/6)

Example


MINCONTEXT Algorithm (6/6)

Example MINCONTEXT


Linear-time fragments of XPath (1/2)

• Core XPath
  ▫ Fragment of XPath
  ▫ Clean logical core
  ▫ Only sets of nodes
    • No arithmetical ops
    • No string ops
    • Set ops (∩, ∪, -, χ)
• Time: $O(|D| \cdot |Q|)$
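One way to read the linear bound: if every location step maps a whole node set to a node set in time linear in the document, a path with |Q| steps costs on the order of |D|·|Q|. The sketch below illustrates that idea for the child and descendant axes only. It assumes a toy tree stored in a dictionary, and the names `child_axis`, `descendant_axis` and `evaluate_path` are hypothetical.

```python
# Toy document tree: r is the root; labels are illustrative only.
CHILDREN = {"r": ["a"], "a": ["b1", "b2"], "b1": ["c"], "b2": [], "c": []}

def child_axis(context_set):
    """Children of any node in context_set; each edge is visited at most once."""
    return {c for n in context_set for c in CHILDREN[n]}

def descendant_axis(context_set, root="r"):
    """Descendants of any node in context_set, found in a single tree traversal."""
    result = set()
    stack = [(root, False)]          # (node, has an ancestor in context_set)
    while stack:
        node, below_context = stack.pop()
        if below_context:
            result.add(node)
        inherited = below_context or node in context_set
        for child in CHILDREN[node]:
            stack.append((child, inherited))
    return result

def evaluate_path(steps, context_set):
    """Apply one set-to-set axis function per location step."""
    for axis in steps:
        context_set = axis(context_set)
    return context_set

# child::*/descendant::* starting from the root
print(sorted(evaluate_path([child_axis, descendant_axis], {"r"})))
```

Working on node sets instead of one context node at a time is what avoids re-traversing the same subtree for every context.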


Linear-time fragments of XPath (2/2)

• XPatterns
  ▫ Contained in XPath
  ▫ Extends Core XPath
    • ID axis relation: π1/id(π2)/π3 is rewritten as π1/π2/id/π3
    • π = location path
• Time: $O(|D| \cdot |Q|)$


Linear-space fragments of XPath

• Extended Wadler Fragment of XPath
  ▫ No select, count or sum data functions
  ▫ No expressions "nodeSet RelOp nodeSet"
  ▫ In expressions id(id(…(string)…)), string does not depend on context
• OPTMINCONTEXT
  ▫ Space: $O(|D| \cdot |Q|^2)$
  ▫ Time: $O(|D|^2 \cdot |Q|^2)$


Conclusions

• XPath query evaluation algorithms
  ▫ Context-value table based
    • Bottom-up
    • Top-down
  ▫ MINCONTEXT
  ▫ OPTMINCONTEXT
  ▫ Polynomial time
• Linear complexity fragments of XPath
• Query evaluation can be further optimized


dilu 2004