Holistic Twig Joins: Optimal XML Pattern Matching
ACM SIGMOD 2002
In this lecture
The Problem
Idea
Preliminaries
PathStack Algorithm
TwigStack Algorithm
Conclusions
The problem
Find semantically connected data in an XML document efficiently.
Many intermediate results are produced that do not participate in the final answers.
The problem (example)
For example, consider this XQuery path expression: book[title = ‘XML’]//author[fn = ‘jane’ and ln = ‘doe’]
We can translate it into a twig (small tree) pattern:
book
  title
    XML
  author
    fn
      jane
    ln
      doe
The problem (example)
To solve this problem we have to:
Find all binary relationships, such as (book, title) and (author, fn)
Connect all the matches we have found into complete answers
The problem is that every book has a title, but only some of them have the title ‘XML’, so we produce many intermediate answers that do not participate in the final answer.
Idea
The main idea of the paper is to store intermediate results in a compact way.
The goal is an algorithm whose cost is independent of the size of the intermediate results.
A family of stack-based algorithms was developed for this purpose.
Representing position of elements
Every node in the XML document is represented as a 3-tuple:
Leaf: (DocId, LeftPos, LevelNum)
Node: (DocId, LeftPos : RightPos, LevelNum)
Representing position of elements
For example (LevelNum is the depth of the node, with book at level 1):
book (1,1:31,1)
  title (1,2:4,2)
    XML (1,3,3)
  authors (1,5:30,2)
    author (1,6:13,3)
      fn (1,7:9,4)
        jane (1,8,5)
      ln (1,10:12,4)
        poe (1,11,5)
    author (1,14:21,3)
      fn (1,15:17,4)
        john (1,16,5)
      ln (1,18:20,4)
        doe (1,19,5)
    author (1,22:29,3)
      fn (1,23:25,4)
        jane (1,24,5)
      ln (1,26:28,4)
        doe (1,27,5)
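The LeftPos and RightPos values of this example can be reproduced with a single depth-first traversal that takes one position when a node opens and one when an element closes. The sketch below is only an illustration: the `(label, children)` tree layout and the names `encode` and `author` are assumptions, and LevelNum is simply taken to be the depth of the node.

```python
def encode(tree, doc_id=1):
    """Return [(label, (DocId, LeftPos, RightPos, LevelNum)), ...] in document order.

    `tree` is a hypothetical (label, children) pair; a node without children
    is treated as a text leaf, so its LeftPos and RightPos coincide.
    """
    out = []
    counter = 0

    def visit(node, level):
        nonlocal counter
        label, children = node
        counter += 1                  # position taken when the node opens
        left = counter
        slot = len(out)
        out.append(None)              # reserve the slot to keep document order
        for child in children:
            visit(child, level + 1)
        if children:
            counter += 1              # position taken when the element closes
        right = counter if children else left
        out[slot] = (label, (doc_id, left, right, level))

    visit(tree, 1)
    return out


def author(fn, ln):
    # helper that builds one author subtree of the example document
    return ("author", [("fn", [(fn, [])]), ("ln", [(ln, [])])])


book = ("book", [
    ("title", [("XML", [])]),
    ("authors", [author("jane", "poe"), author("john", "doe"), author("jane", "doe")]),
])
positions = encode(book)
```

With this numbering, book spans 1:31 and the third author spans 22:29.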
Representing position of elements
Benefits: structural relationships are easy to determine
Ancestor-descendant relationship: a node n1 (D1, L1:R1, N1) is a descendant of node n2 (D2, L2:R2, N2) iff D1 = D2, L2 < L1 and R1 < R2
Parent-child relationship: a node n1 (D1, L1:R1, N1) is a child of node n2 (D2, L2:R2, N2) iff D1 = D2, L2 < L1, R1 < R2 and N1 = N2 + 1
Example: fn (1,7:9,4) is a descendant of book (1,1:31,1); poe (1,11,5) is a child of ln (1,10:12,4)
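The two tests can be written directly as predicates. This is a small sketch with assumed names (`Pos`, `is_descendant`, `is_child`); a leaf is given RightPos = LeftPos, and the levels used are the depths of the nodes in the example tree with book at level 1.

```python
from typing import NamedTuple


class Pos(NamedTuple):
    doc: int
    left: int
    right: int   # equal to left for a leaf
    level: int


def is_descendant(n1: Pos, n2: Pos) -> bool:
    """True if n1 lies strictly inside the region of n2 in the same document."""
    return n1.doc == n2.doc and n2.left < n1.left and n1.right < n2.right


def is_child(n1: Pos, n2: Pos) -> bool:
    """True if n1 is a descendant of n2 and exactly one level deeper."""
    return is_descendant(n1, n2) and n1.level == n2.level + 1


book = Pos(1, 1, 31, 1)
fn = Pos(1, 7, 9, 4)
ln = Pos(1, 10, 12, 4)
poe = Pos(1, 11, 11, 5)
```

Both checks need only the two tuples, so no tree traversal is required at query time.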
Representing position of elements
Available cases:
Matching stream
A stream Tq contains the positional representations of the database nodes that match the query node q
The nodes in the stream are sorted by (DocId, LeftPos)
Matching stream (example)
For the book document of the previous example:
Tauthor: author (1,6:13,3), author (1,14:21,3), author (1,22:29,3)
Tjane: jane (1,8,5), jane (1,24,5)
The operations available on the streams: eof, advance, next, nextL, nextR
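A stream with these five operations can be sketched as a cursor over a sorted list. The class below is an assumption made for illustration (including the reading of nextL and nextR as the LeftPos and RightPos of the current element); it is not code from the paper.

```python
class Stream:
    """Cursor over the nodes matching one query node, sorted by (DocId, LeftPos)."""

    def __init__(self, nodes):
        # each node is a (DocId, LeftPos, RightPos, LevelNum) tuple
        self._nodes = sorted(nodes, key=lambda n: (n[0], n[1]))
        self._i = 0

    def eof(self):
        return self._i >= len(self._nodes)

    def advance(self):
        self._i += 1

    def next(self):
        return self._nodes[self._i]

    def next_l(self):
        return self.next()[1]

    def next_r(self):
        return self.next()[2]


def drain(stream):
    """Consume the stream and return the LeftPos values in stream order."""
    lefts = []
    while not stream.eof():
        lefts.append(stream.next_l())
        stream.advance()
    return lefts


# the two streams of the example, deliberately given out of order
t_author = Stream([(1, 22, 29, 3), (1, 6, 13, 3), (1, 14, 21, 3)])
t_jane = Stream([(1, 24, 24, 5), (1, 8, 8, 5)])
```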
Linked stacks
Idea:
Repeatedly construct stacks that encode partial and total answers
Remove partial answers that cannot be extended to total answers
Linked stacks (example)
Data: a chain of nested elements A1, B1, A2, B2, C1 (A1 outermost, C1 innermost)
Query: A // B // C
Stack encoding:
SA: A1, A2
SB: B1, B2
SC: C1
Query results:
A1 B1 C1
A2 B2 C1
A1 B2 C1
Stack based algorithms
The stack-based algorithms use a chain of linked stacks to compactly represent partial and total results
PathStack algorithm
Data: A1 (1:9), B1 (2:8), A2 (3:7), B2 (4:6), C1 (5), nested in this order
Query: A // B // C
Streams:
TA: A1 (1:9), A2 (3:7)
TB: B1 (2:8), B2 (4:6)
TC: C1 (5)
PathStack algorithm
Data: A1 (1:9), B1 (2:8), A2 (3:7), B2 (4:6), C1 (5)
Query: A // B // C
Streams: TA, TB, TC
Stack encoding:
SA: A1 (1:9), A2 (3:7)
SB: B1 (2:8), B2 (4:6)
SC: C1 (5)
Query results:
A1 B1 C1
A1 B2 C1
A2 B2 C1
Always take the element with the smallest LeftPos
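The run above can be reproduced with a short Python sketch of PathStack for a path query that uses only ancestor-descendant edges. The function name `path_stack`, the `(name, LeftPos, RightPos)` tuples and the use of list indices as stack pointers are illustrative assumptions, not the paper's pseudocode; the sketch also assumes each label occurs only once in the query.

```python
def path_stack(query, streams):
    """query: labels from root to leaf; streams: label -> [(name, left, right)] sorted by left."""
    pos = {q: 0 for q in query}       # cursor into each stream
    stacks = {q: [] for q in query}   # entries: (name, left, right, pointer into parent stack)
    leaf = query[-1]
    results = []

    def expand(depth, upto, suffix):
        # every entry at index <= upto in this stack is an ancestor of the suffix
        for name, _left, _right, ptr in stacks[query[depth]][:upto + 1]:
            if depth == 0:
                results.append((name,) + suffix)
            else:
                expand(depth - 1, ptr, (name,) + suffix)

    while pos[leaf] < len(streams[leaf]):
        # take the stream element with the smallest LeftPos
        live = [q for q in query if pos[q] < len(streams[q])]
        qmin = min(live, key=lambda q: streams[q][pos[q]][1])
        name, left, right = streams[qmin][pos[qmin]]
        # pop every stack entry that ends before the new element starts
        for q in query:
            while stacks[q] and stacks[q][-1][2] < left:
                stacks[q].pop()
        i = query.index(qmin)
        ptr = len(stacks[query[i - 1]]) - 1 if i > 0 else -1
        stacks[qmin].append((name, left, right, ptr))
        pos[qmin] += 1
        if qmin == leaf:
            expand(len(query) - 1, len(stacks[leaf]) - 1, ())
            stacks[leaf].pop()
    return results


example = {
    "A": [("A1", 1, 9), ("A2", 3, 7)],
    "B": [("B1", 2, 8), ("B2", 4, 6)],
    "C": [("C1", 5, 5)],
}
results = path_stack(["A", "B", "C"], example)
```

The three results are never materialized as intermediate pairs; they are read off the stacks only when the leaf element C1 arrives.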
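PathStack algorithm
Data: A1 (1:10), B1 (2:9), A2 (3:7), B2 (4:6), C1 (5), C2 (8)
Query: A // B // C
Streams: TA, TB, TC
Stack encoding before C2 is added:
SA: A1 (1:10), A2 (3:7)
SB: B1 (2:9), B2 (4:6)
Add C2 (8) here: A2 and B2 are popped first, because their RightPos < LeftPos of C2
Query results:
A1 B1 C1
A1 B2 C1
A2 B2 C1
A1 B1 C2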
PathStack algorithm problems
To match a twig we have to divide it into several paths
Again we have intermediate results that do not participate in the final result
Data:
authors (5:30)
  author (6:13)
    fn (7:9)
      jane (8)
    ln (10:12)
      poe (11)
  author (14:21)
    fn (15:17)
      john (16)
    ln (18:20)
      doe (19)
  author (22:29)
    fn (23:25)
      jane (24)
    ln (26:28)
      doe (27)
Query:
author
  fn
    jane
  ln
    doe
TwigStack Algorithm
Idea:
Before adding a node to the stack, check that it has descendants that satisfy the twig pattern
When checking these descendants, their own descendants are checked too
Now we can be sure that every path result is joinable with at least one other path result and participates in at least one full answer.
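The condition being checked can be illustrated with a naive recursive test. This is not the paper's streaming procedure: it rescans the streams instead of advancing cursors, and the names `has_extension`, `children` and `streams` are assumptions made for this sketch. The data is the authors example, with positions given as (name, LeftPos, RightPos).

```python
def has_extension(node, q, children, streams):
    """True if `node` (a match for query node q) has, for every child of q in the
    twig, a descendant in that child's stream that itself has an extension."""
    _name, left, right = node
    return all(
        any(
            left < c[1] and c[2] < right and has_extension(c, qc, children, streams)
            for c in streams[qc]
        )
        for qc in children.get(q, [])
    )


# twig: author with fn = jane and ln = doe
children = {"author": ["fn", "ln"], "fn": ["jane"], "ln": ["doe"]}
streams = {
    "author": [("author", 6, 13), ("author", 14, 21), ("author", 22, 29)],
    "fn": [("fn", 7, 9), ("fn", 15, 17), ("fn", 23, 25)],
    "ln": [("ln", 10, 12), ("ln", 18, 20), ("ln", 26, 28)],
    "jane": [("jane", 8, 8), ("jane", 24, 24)],
    "doe": [("doe", 19, 19), ("doe", 27, 27)],
}
survivors = [a for a in streams["author"] if has_extension(a, "author", children, streams)]
```

Only the author at 22:29 passes the test, whereas a path-by-path evaluation would also produce path matches under the authors at 6:13 and 14:21 that never join into a full answer.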
TwigStack Algorithm
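Data: authors (5:30) with three authors:
author (6:13): fn (7:9) jane (8), ln (10:12) poe (11)
author (14:21): fn (15:17) john (16), ln (18:20) doe (19)
author (22:29): fn (23:25) jane (24), ln (26:28) doe (27)
Query: author with fn = jane and ln = doe
Only author (22:29) has both a matching fn and a matching ln, so it is the only author pushed onto the stack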
Conclusions
The PathStack and TwigStack algorithms are efficient in terms of the amount of intermediate results
But:
They are only this efficient for finding ancestor-descendant relationships
If the twig also contains parent-child relationships, then not all nodes that are inserted into the stacks participate in the final result
Break?
Querying Structured Text in an XML Database
ACM SIGMOD 2003
In this lecture
Abstract
Introduction
Motivation
Algebra
Access methods
Conclusions
Abstract
XML databases often contain documents with structured text
It is important to integrate “information retrieval” style query evaluation
It is well studied for natural-language documents
But in the case of XML the relevant text can reside in the descendants of an element
Introduction
Boolean-style queries (XQuery)
Useful when users are aware of the underlying schema
But
Users often don’t know the schema
And collections of XML documents are frequently heterogeneous
Introduction
So we have to use relevance ranking in order to define IR on XML
Problem: traditional IR is “document-centric”
XML IR should
Be much more fine-grained
Take document structure into account
Allow more complex analysis than the determination of relevance
Motivation
We have the following XML document named article.xml:
article #a1
  article-title #a2: Internet Technologies
  author #a3
    fname #a4: Jane
    sname #a5: Doe
  chapter #a6
    ct #a7: Caching and Replication
    …
  chapter #a10
    ct #a11: Search and Retrieval
    section #a12
      section-title #a13: Search Engine
      …
    section #a14
      section-title #a15: Information Retrieval
    section #a16
      section-title #a17: Examples
      p #a18: … Here are some IR based Search Engines: …
      p #a19: … search engine NewSearch uses a new information retrieval technology
      p #a20: semantic information retrieval techniques are also being incorporated into some search engines
Motivation
Consider the query:
Find document components in articles.xml that are about “search engine”. Relevance to “internet” and “information retrieval” is desirable but not necessary.
Using AND and OR predicates will not give us the desired result
Motivation
Illustrating the granularity problem: which elements should be ranked?
If we rank articles
The user will see the whole article, while the relevant information is concentrated only in the third chapter
If we rank paragraphs
The paragraphs of the last section will be returned separately
• The semantic linkage is broken and has to be reconstructed by the user
Motivation
IR-style XML queries don’t have to stand alone
If the user knows the structure of the XML document, they can add structural constraints and limit the number of uninteresting results
Algebra
We want to fold into a database framework the notion of relevance scoring and ranking
Algebra
Scored Data Tree
Definition:
A rooted ordered tree, such that each node has attribute-value pairs, including at least a tag and a real-valued score
The score of a tree is the score of its root node
Example:
article[3.6] #a1
  author #a3
    sname #a5
  section[3.6] #a16
Algebra
Scored Pattern Tree
Definition: P = (T, F, S)
• T: a node-labeled and edge-labeled tree
• F: a formula, a boolean combination of predicates applicable to nodes
• S: a set of scoring functions
Algebra
Scored Pattern Tree
Example:
Query2: Find document components in article.xml that are part of an article written by an author with last name “Doe” and are about “search engine”. Relevance to “internet” and “information retrieval” is desirable but not necessary.
T:
$1
  $2 (pc)
    $3 (pc)
  $4 (ad*)
F:
$1.tag = article & $2.tag = author & $3.tag = sname & $3.content = “Doe”
S:
$4.score = { ScoreFoo({“search engine”}, {“internet”, “information retrieval”}) }
$1.score = $4.score
Algebra
Common operators
Selection => Scored Selection
Projection => Scored Projection
Join => Scored Join
New operators
Threshold
Pick
Algebra (New Operators)
Threshold
Pattern: the scored pattern tree of Query2 (T, F and S as in the Scored Pattern Tree example)
Threshold condition: TC%a > ...
[Figure: example scored trees article[3.6] #a1 (author #a3, sname #a5, section[3.6] #a16) and article[3.6] #a3 (author #a23, sname #a25, section[3.6] #a36), shown before and after the threshold is applied]
Algebra (New Operators)
Pick
Pattern: the scored pattern tree of Query2 (T, F and S as in the Scored Pattern Tree example), together with a pick criterion PC
[Figure: input scored trees article[3.6] #a1 (author #a3, sname #a5, section[3.6] #a16), article[3.6] #a2 (author #a13, sname #a15, section[3.6] #a26) and article[3.6] #a3 (author #a23, sname #a25, section[3.6] #a36); the output shown contains the trees rooted at #a1 and #a3]
Algebra (New Operators)
Pick Example:
Data Tree
article[5.6] #a1
  title[0.6] #a2
  sname #a5
  chapter[5.0] #a10
    section[0.8] #a12
      title[0.8] #a13
    section[0.6] #a14
      title[0.6] #a15
    section[3.6] #a16
      p[0.8] #a18
      p[1.4] #a19
      p[1.4] #a20
Pick Condition
Data is relevant if:
1. score > 0.8
2. more than 50% of children are relevant
3. its direct parent node is not picked
Translating to XQuery
Query1:
Find document components in articles.xml that are about “search engine”. Relevance to “internet” and “information retrieval” is desirable but not necessary
XQuery:
For $a in document(“articles.xml”)//article/descendant-or-self::*
Score $a using ScoreFoo($a, {“search engine”}, {“internet”, “information retrieval”})
Pick $a using PickFoo($a)
Return
<result><score>$a</score></result>
Sortby(score)
Threshold @score >= 4 stop after 5
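The final clause of the query keeps only results with a score of at least 4 and stops after 5 of them. A minimal Python sketch of that behaviour follows; the function name `threshold`, its parameters and the sample scores are hypothetical and serve only to illustrate the two kinds of condition (a minimum score and a maximum number of results).

```python
def threshold(scored, min_score=None, top_k=None):
    """Keep (node, score) pairs with score >= min_score, best first, at most top_k of them."""
    kept = [item for item in scored if min_score is None or item[1] >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)   # highest score first
    return kept if top_k is None else kept[:top_k]


# hypothetical scored components, not taken from the slides
scored = [("#a16", 3.6), ("#a10", 5.0), ("#a1", 5.6), ("#a12", 0.8), ("#a19", 4.0)]
best = threshold(scored, min_score=4, top_k=5)
```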
Access Methods
Score-Generating Methods
TermJoin
Score-Utilizing Methods
Pick
Score-Generating Methods
How do we assign initial scores to the data tree?
The score of every node should be computed according to the number of occurrences of the search terms in the node or its descendants.
Naïve algorithm
For every node, recompute the scores of all its ancestors
a
b a
c a
The running time is poor, because every node triggers an update of its whole chain of ancestors
TermJoin
Stack-based algorithm
Use a stack to store the ancestors of the current node
Now all ancestors on the stack are affected by the node
TermJoin
Phrase: “a”
[Figure: a document tree with nodes at positions (1:9), (2:7), (3:5), (4), (6) and (8) whose text contains the terms a, b and c; the matching stream Ta; and the encoding stack, which keeps a running count of “a” for each ancestor]
If we have more than one word in the phrase, we process several matching streams simultaneously
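The stack idea can be sketched as follows: elements and term occurrences are processed in position order, each occurrence is counted once on the innermost open element, and a count is passed to the parent only when an element is popped. This is an illustration in the spirit of TermJoin rather than the paper's algorithm; the name `term_join`, the tuple layout and the sample data are assumptions.

```python
def term_join(elements, hits):
    """elements: (name, left, right) regions; hits: positions of term occurrences.
    Returns {name: number of occurrences inside the element's region}."""
    totals = {}
    stack = []   # currently open elements as [name, right, count]

    def close_until(position):
        # pop elements that end before `position` and pass their count to the parent
        while stack and stack[-1][1] < position:
            name, _right, count = stack.pop()
            totals[name] = count
            if stack:
                stack[-1][2] += count

    events = [(left, 0, name, right) for name, left, right in elements]
    events += [(p, 1, None, None) for p in hits]
    events.sort(key=lambda e: (e[0], e[1]))

    for position, kind, name, right in events:
        close_until(position)
        if kind == 0:
            stack.append([name, right, 0])
        elif stack:
            stack[-1][2] += 1        # count the hit once, on the innermost element
    close_until(float("inf"))
    return totals


# hypothetical document: an article with one chapter and two paragraphs
elements = [("article", 1, 12), ("chapter", 2, 7), ("p1", 3, 5), ("p2", 8, 11)]
hits = [4, 6, 9, 10]   # hypothetical positions where the term occurs
counts = term_join(elements, hits)
```

Each element is pushed and popped once, so the work no longer grows with the depth of every occurrence as in the naive approach.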
Score-Utilizing Methods
Methods that help us filter the data according to their scores
Two such methods are
Threshold
Pick
Pick can be quite a challenge to implement
Score-Utilizing Methods
Pick algorithm
The most complex part of the algorithm is removing redundancy
There is vertical (parent-child) and horizontal (among the siblings, e.g. return the first author from the relevant article) redundancy
The problem is solved with a stack-based algorithm
Pick algorithm
Data (nodes numbered in document order):
1 chapter
  2 title: Search and retrieval
  3 section
    4 p: … IR … Search engine
    5 p: … Search engine retrieval of syntactic information
  6 section
  7 section
Pick condition: score >= 2, percentage >= 50%
Ancestor stack: contains elements not yet fully explored
Main stack: contains elements that cannot yet be eliminated
[Figure: node scores (1, 2, 2, 0, 4, 5, 0) and the successive states of the two stacks, with counters of relevant children such as 0/1, 1/1, 2/2, 1/2, 1/3 and 1/4]
Algebra (New Operators)
Pick
T:
$1
  $4 (ad*)
F:
$1.tag = article
S:
$4.score = { ScoreFoo({“search engine”}) }
$1.score = $4.score
PC: score >= 2, percentage >= 50%
Input:
1 chapter
  2 title: Search and retrieval
  3 section
    4 p: … IR … Search engine
    5 p: … Search engine retrieval of syntactic information
  6 section
  7 section
Result:
section
  p: … IR … Search engine
  p: … Search engine retrieval of syntactic information
Conclusion
Stack-based algorithms are used to implement the new ideas efficiently
A usable algebra is presented that deals with scoring and relevance in XML keyword search
It is a possible extension of XQuery