approximate string search in spatial databaseslifeifei/papers/sas.pdfapproximate string search is...
Post on 11-Mar-2020
3 Views
Preview:
TRANSCRIPT
1-1
Approximate String Search in Spatial Databases
Bin Yao
Florida StateUniversity
Kun Hou
Florida StateUniversity
Marios Hadjieleftheriou
AT&T Labs Research
Feifei Li
Florida StateUniversity
2-1
Introduction
Approximate string search is important
data cleaning
data integration
online search engine
2-2
Introduction
Approximate string search is important
Also, spatial database
data cleaning
data integration
online search engine
map service
strategic planning of resources
2-3
Introduction
Approximate string search is important
Also, spatial database
Our work: the approximate string search in spatial database(Spatial Approximate String (SAS) queries).
data cleaning
data integration
online search engine
map service
strategic planning of resources
3-1
Example of a SAS range query
String similarity metric: edit distance with threshold τ .
3-2
Example of a SAS range query
String similarity metric: edit distance with threshold τ .
3-3
Example of a SAS range query
String similarity metric: edit distance with threshold τ .
4-1
A straightforward solution
R-tree solution:
4-2
A straightforward solution
R-tree solution:
4-3
A straightforward solution
R-tree solution:
4-4
A straightforward solution
R-tree solution:
Problem: only utilize the spatial dimension for the pruning.
5-1
Background: q-grams based edit distance pruning
Lemma: For strings σ1 and σ2 of length |σ1| and |σ2|, ifε(σ1, σ2) = τ , then |Gσ1 ∩Gσ2 | ≥ max(|σ1|, |σ2|)− 1− (τ −1) ∗ q.
5-2
Background: q-grams based edit distance pruning
Example: q = 2, τ = 2σ1 : theatre; σ2 : theater.
Lemma: For strings σ1 and σ2 of length |σ1| and |σ2|, ifε(σ1, σ2) = τ , then |Gσ1 ∩Gσ2 | ≥ max(|σ1|, |σ2|)− 1− (τ −1) ∗ q.
5-3
Background: q-grams based edit distance pruning
Example: q = 2, τ = 2σ1 : theatre; σ2 : theater.
Lemma: For strings σ1 and σ2 of length |σ1| and |σ2|, ifε(σ1, σ2) = τ , then |Gσ1 ∩Gσ2 | ≥ max(|σ1|, |σ2|)− 1− (τ −1) ∗ q.
Gσ1{#t, th, he, ea, at, tr, re, e$},Gσ2{#t, th, he, ea, at, te, er, r$}.
5-4
Background: q-grams based edit distance pruning
Example: q = 2, τ = 2σ1 : theatre; σ2 : theater.
Lemma: For strings σ1 and σ2 of length |σ1| and |σ2|, ifε(σ1, σ2) = τ , then |Gσ1 ∩Gσ2 | ≥ max(|σ1|, |σ2|)− 1− (τ −1) ∗ q.
Gσ1{#t, th, he, ea, at, tr, re, e$},Gσ2{#t, th, he, ea, at, te, er, r$}.
6-1
Background: estimating set resemblance
Set resemblance: two sets A and Bρ(A,B) = |A∩B|
|A∪B| .
6-2
Background: estimating set resemblance
Set resemblance: two sets A and Bρ(A,B) = |A∩B|
|A∪B| .
The min-wise signature:Any set X, any x ∈ X,π ∈ F (a family of min-wise independentpermutations),
Pr(min{π(X)}) = π(x)) = 1|X| .
6-3
Background: estimating set resemblance
Set resemblance: two sets A and Bρ(A,B) = |A∩B|
|A∪B| .
s(A) = {min{π1(A)}, . . . ,min{π`(A)}},s(A ∪B) = {min{π1(A), π1(B)}, . . . ,min{π`(A), π`(B)}}.
The min-wise signature:Any set X, any x ∈ X,π ∈ F (a family of min-wise independentpermutations),
Pr(min{π(X)}) = π(x)) = 1|X| .
6-4
Background: estimating set resemblance
Unbiased estimator for ρ(A,B):
ρ(A,B) = Pr(min{π(A)} = min{π(B)}).
Set resemblance: two sets A and Bρ(A,B) = |A∩B|
|A∪B| .
s(A) = {min{π1(A)}, . . . ,min{π`(A)}},s(A ∪B) = {min{π1(A), π1(B)}, . . . ,min{π`(A), π`(B)}}.
The min-wise signature:Any set X, any x ∈ X,π ∈ F (a family of min-wise independentpermutations),
Pr(min{π(X)}) = π(x)) = 1|X| .
6-5
Background: estimating set resemblance
Unbiased estimator for ρ(A,B):
ρ(A,B) = Pr(min{π(A)} = min{π(B)}).
Set resemblance: two sets A and Bρ(A,B) = |A∩B|
|A∪B| .
s(A) = {min{π1(A)}, . . . ,min{π`(A)}},s(A ∪B) = {min{π1(A), π1(B)}, . . . ,min{π`(A), π`(B)}}.
ρ(A,B) =|{i|min{πi(A)} = min{πi(B)}}|
`.
The min-wise signature:Any set X, any x ∈ X,π ∈ F (a family of min-wise independentpermutations),
Pr(min{π(X)}) = π(x)) = 1|X| .
7-1
Background: estimating set resemblance
hashes 1 2 4h1 1 3 4h2 3 1 2h3 2 4 3h4 4 2 1
Set A = {1, 2, 4}
7-2
Background: estimating set resemblance
hashes 1 2 4h1 1 3 4h2 3 1 2h3 2 4 3h4 4 2 1
Set A = {1, 2, 4}
signatureof A
1121
7-3
Background: estimating set resemblance
hashes 1 2 4h1 1 3 4h2 3 1 2h3 2 4 3h4 4 2 1
Set A = {1, 2, 4}
signatureof A
1121
Set B = {2, 3}hashes 2 3h1 3 2h2 1 4h3 4 1h4 2 3
signatureof B
2112
7-4
Background: estimating set resemblance
hashes 1 2 4h1 1 3 4h2 3 1 2h3 2 4 3h4 4 2 1
Set A = {1, 2, 4}
signatureof A
1121
Set B = {2, 3}hashes 2 3h1 3 2h2 1 4h3 4 1h4 2 3
signatureof B
2112
7-5
Background: estimating set resemblance
hashes 1 2 4h1 1 3 4h2 3 1 2h3 2 4 3h4 4 2 1
Set A = {1, 2, 4}
signatureof A
1121
Set B = {2, 3}hashes 2 3h1 3 2h2 1 4h3 4 1h4 2 3
signatureof B
2112
ρ(A,B) = 1/4
8-1
The MHR-tree: basic idea
8-2
The MHR-tree: basic idea
Lemma 2 Let Gu be the set for the union of q-grams of strings inthe subtree of node u. For a SAS query (r, σ, τ), if |Gu∩Gσ| <|σ|− 1− (τ − 1) ∗ q, then the subtree of node u does not containany element from As.
9-1
The MHR-tree: union of q-grams (q = 2)
· · ·ab cd ab ef gh ij kl nr
9-2
The MHR-tree: union of q-grams (q = 2)
· · ·
· · ·
ab cd ab ef gh ij kl nr
ab, cd, ef gh, ij, kl, nr
9-3
The MHR-tree: union of q-grams (q = 2)
· · ·
· · ·
· · ·
ab cd ab ef gh ij kl nr
ab, cd, ef, gh, ij, kl, nr
ab, cd, ef gh, ij, kl, nr
9-4
The MHR-tree: union of q-grams (q = 2)
· · ·
· · ·
· · ·
ab cd ab ef gh ij kl nr
ab, cd, ef, gh, ij, kl, nr
ab, cd, ef gh, ij, kl, nr
MBR
10-1
The MHR-tree
Min-wise signature with linear hashing R-tree (MHR-tree)
10-2
The MHR-tree
2, 1, 3 4, 1, 2 5, 2, 1 7, 2, 5 6, 2, 1 · · ·1, 6, 3 2, 7, 4 8, 3, 2
Min-wise signature with linear hashing R-tree (MHR-tree)
10-3
The MHR-tree
2, 1, 3 4, 1, 2 5, 2, 1 7, 2, 5 6, 2, 1 · · ·1, 6, 3 2, 7, 4 8, 3, 2
Min-wise signature with linear hashing R-tree (MHR-tree)
10-4
The MHR-tree
2, 1, 3 4, 1, 2 5, 2, 1 7, 2, 5 6, 2, 1 · · ·1, 6, 3 2, 7, 4 8, 3, 2
2, 1, 1 1, 2, 2 · · ·
Min-wise signature with linear hashing R-tree (MHR-tree)
10-5
The MHR-tree
2, 1, 3 4, 1, 2 5, 2, 1 7, 2, 5 6, 2, 1 · · ·1, 6, 3 2, 7, 4 8, 3, 2
2, 1, 1 1, 2, 2 · · ·
Min-wise signature with linear hashing R-tree (MHR-tree)
10-6
The MHR-tree
2, 1, 3 4, 1, 2 5, 2, 1 7, 2, 5 6, 2, 1 · · ·1, 6, 3 2, 7, 4 8, 3, 2
2, 1, 1 1, 2, 2
1, 1, 1
· · ·
Min-wise signature with linear hashing R-tree (MHR-tree)
10-7
The MHR-tree
2, 1, 3 4, 1, 2 5, 2, 1 7, 2, 5 6, 2, 1 · · ·1, 6, 3 2, 7, 4 8, 3, 2
2, 1, 1 1, 2, 2
1, 1, 1
· · ·
Min-wise signature with linear hashing R-tree (MHR-tree)
MBR
11-1
Query algorithms for the MHR-tree
Range-MHR(MHR-tree R, Range r, String σ, int τ)
Follow the range query algorithm on R-tree,
If u is a leaf nodeFor every point p ∈ up
If p is contained in r and |Gp ∩Gσ| ≥ max(|σp|, |σ|)−1− (τ − 1) ∗ q and ε(σp, σ) < τ then Insert p in A;
ElseFor every child node wi of u
If r and MBR(wi) intersect, and |Gwi ∩Gσ| ≥ |σ| −1− (τ − 1) ∗ qinsert wi into queue;
Return A
11-2
Query algorithms for the MHR-tree
Range-MHR(MHR-tree R, Range r, String σ, int τ)
Follow the range query algorithm on R-tree,
If u is a leaf nodeFor every point p ∈ up
If p is contained in r and |Gp ∩Gσ| ≥ max(|σp|, |σ|)−1− (τ − 1) ∗ q and ε(σp, σ) < τ then Insert p in A;
ElseFor every child node wi of u
If r and MBR(wi) intersect, and |Gwi ∩Gσ| ≥ |σ| −1− (τ − 1) ∗ qinsert wi into queue;
Return A
11-3
Query algorithms for the MHR-tree
Range-MHR(MHR-tree R, Range r, String σ, int τ)
Follow the range query algorithm on R-tree,
If u is a leaf nodeFor every point p ∈ up
If p is contained in r and |Gp ∩Gσ| ≥ max(|σp|, |σ|)−1− (τ − 1) ∗ q and ε(σp, σ) < τ then Insert p in A;
ElseFor every child node wi of u
If r and MBR(wi) intersect, and |Gwi ∩Gσ| ≥ |σ| −1− (τ − 1) ∗ qinsert wi into queue;
Return A
11-4
Query algorithms for the MHR-tree
Range-MHR(MHR-tree R, Range r, String σ, int τ)
Follow the range query algorithm on R-tree,
If u is a leaf nodeFor every point p ∈ up
If p is contained in r and |Gp ∩Gσ| ≥ max(|σp|, |σ|)−1− (τ − 1) ∗ q and ε(σp, σ) < τ then Insert p in A;
ElseFor every child node wi of u
If r and MBR(wi) intersect, and |Gwi ∩Gσ| ≥ |σ| −1− (τ − 1) ∗ qinsert wi into queue;
Return A
11-5
Query algorithms for the MHR-tree
Range-MHR(MHR-tree R, Range r, String σ, int τ)
Follow the range query algorithm on R-tree,
If u is a leaf nodeFor every point p ∈ up
If p is contained in r and |Gp ∩Gσ| ≥ max(|σp|, |σ|)−1− (τ − 1) ∗ q and ε(σp, σ) < τ then Insert p in A;
ElseFor every child node wi of u
If r and MBR(wi) intersect, and |Gwi ∩Gσ| ≥ |σ| −1− (τ − 1) ∗ qinsert wi into queue;
Return A
12-1
Query algorithms for the MHR-tree
Key issue: estimating |Gu ∩Gσ| using s(Gu) and σ.
12-2
Query algorithms for the MHR-tree
Key issue: estimating |Gu ∩Gσ| using s(Gu) and σ.
σ: crab Gσ: #c,cr,ra,ab,b$s(Gσ): 3, 1, 4, 5 s(Gu): 3, 1, 3, 5
12-3
Query algorithms for the MHR-tree
Key issue: estimating |Gu ∩Gσ| using s(Gu) and σ.
σ: crab Gσ: #c,cr,ra,ab,b$s(Gσ): 3, 1, 4, 5 s(Gu): 3, 1, 3, 5Let G = Gu ∪Gσ, s(G) = s(Gu ∪Gσ) = 3, 1, 3, 5
12-4
Query algorithms for the MHR-tree
Key issue: estimating |Gu ∩Gσ| using s(Gu) and σ.
σ: crab Gσ: #c,cr,ra,ab,b$s(Gσ): 3, 1, 4, 5 s(Gu): 3, 1, 3, 5Let G = Gu ∪Gσ, s(G) = s(Gu ∪Gσ) = 3, 1, 3, 5ρ(G,Gσ) = |{i|min{hi(G)}=min{hi(Gσ)}}|
` = 3/4.
12-5
Query algorithms for the MHR-tree
Key issue: estimating |Gu ∩Gσ| using s(Gu) and σ.
σ: crab Gσ: #c,cr,ra,ab,b$s(Gσ): 3, 1, 4, 5 s(Gu): 3, 1, 3, 5Let G = Gu ∪Gσ, s(G) = s(Gu ∪Gσ) = 3, 1, 3, 5ρ(G,Gσ) = |{i|min{hi(G)}=min{hi(Gσ)}}|
` = 3/4.
ρ(G,Gσ) = |G∩Gσ||G∪Gσ| = |(Gu∪Gσ)∩Gσ|
|(Gu∪Gσ)∪Gσ| = |Gσ||Gu∪Gσ| .
12-6
Query algorithms for the MHR-tree
Key issue: estimating |Gu ∩Gσ| using s(Gu) and σ.
σ: crab Gσ: #c,cr,ra,ab,b$s(Gσ): 3, 1, 4, 5 s(Gu): 3, 1, 3, 5Let G = Gu ∪Gσ, s(G) = s(Gu ∪Gσ) = 3, 1, 3, 5ρ(G,Gσ) = |{i|min{hi(G)}=min{hi(Gσ)}}|
` = 3/4.
ρ(G,Gσ) = |G∩Gσ||G∪Gσ| = |(Gu∪Gσ)∩Gσ|
|(Gu∪Gσ)∪Gσ| = |Gσ||Gu∪Gσ| .
12-7
Query algorithms for the MHR-tree
Key issue: estimating |Gu ∩Gσ| using s(Gu) and σ.
σ: crab Gσ: #c,cr,ra,ab,b$s(Gσ): 3, 1, 4, 5 s(Gu): 3, 1, 3, 5Let G = Gu ∪Gσ, s(G) = s(Gu ∪Gσ) = 3, 1, 3, 5ρ(G,Gσ) = |{i|min{hi(G)}=min{hi(Gσ)}}|
` = 3/4.
ρ(G,Gσ) = |G∩Gσ||G∪Gσ| = |(Gu∪Gσ)∩Gσ|
|(Gu∪Gσ)∪Gσ| = |Gσ||Gu∪Gσ| .
|Gu ∪Gσ| = |Gσ|ρ(G,Gσ) = 5/(3/4) = 20/3.
12-8
Query algorithms for the MHR-tree
Key issue: estimating |Gu ∩Gσ| using s(Gu) and σ.
σ: crab Gσ: #c,cr,ra,ab,b$s(Gσ): 3, 1, 4, 5 s(Gu): 3, 1, 3, 5Let G = Gu ∪Gσ, s(G) = s(Gu ∪Gσ) = 3, 1, 3, 5ρ(G,Gσ) = |{i|min{hi(G)}=min{hi(Gσ)}}|
` = 3/4.
ρ(G,Gσ) = |G∩Gσ||G∪Gσ| = |(Gu∪Gσ)∩Gσ|
|(Gu∪Gσ)∪Gσ| = |Gσ||Gu∪Gσ| .
|Gu ∪Gσ| = |Gσ|ρ(G,Gσ) = 5/(3/4) = 20/3.
ρ(Gu, Gσ) = |{i|min{hi(Gu)}=min{hi(Gσ)}}|` = 3/4.
12-9
Query algorithms for the MHR-tree
Key issue: estimating |Gu ∩Gσ| using s(Gu) and σ.
σ: crab Gσ: #c,cr,ra,ab,b$s(Gσ): 3, 1, 4, 5 s(Gu): 3, 1, 3, 5Let G = Gu ∪Gσ, s(G) = s(Gu ∪Gσ) = 3, 1, 3, 5ρ(G,Gσ) = |{i|min{hi(G)}=min{hi(Gσ)}}|
` = 3/4.
ρ(G,Gσ) = |G∩Gσ||G∪Gσ| = |(Gu∪Gσ)∩Gσ|
|(Gu∪Gσ)∪Gσ| = |Gσ||Gu∪Gσ| .
|Gu ∪Gσ| = |Gσ|ρ(G,Gσ) = 5/(3/4) = 20/3.
ρ(Gu, Gσ) = |{i|min{hi(Gu)}=min{hi(Gσ)}}|` = 3/4.
|Gu ∩Gσ| = ρ(Gu, Gσ) ∗ |Gu ∪Gσ| = 3/4 ∗ 20/3 = 5.
13-1
Duplicate q-grams in strings
Issue 1: duplicate q-grams in one string (for both query and data).
13-2
Duplicate q-grams in strings
Issue 1: duplicate q-grams in one string (for both query and data).
example:s: aabbaabb, {1#a,1aa, 1ab, 1bb, 1ba, 2aa, 2ab, 2bb, 1b$}q: aabcaa, {1#a, 1aa, 1ab, 1bc, 1ca, 2aa, 1a$}
13-3
Duplicate q-grams in strings
Issue 1: duplicate q-grams in one string (for both query and data).
example:s: aabbaabb, {1#a,1aa, 1ab, 1bb, 1ba, 2aa, 2ab, 2bb, 1b$}q: aabcaa, {1#a, 1aa, 1ab, 1bc, 1ca, 2aa, 1a$}
13-4
Duplicate q-grams in strings
Issue 1: duplicate q-grams in one string (for both query and data).
example:s: aabbaabb, {1#a,1aa, 1ab, 1bb, 1ba, 2aa, 2ab, 2bb, 1b$}q: aabcaa, {1#a, 1aa, 1ab, 1bc, 1ca, 2aa, 1a$}
13-5
Duplicate q-grams in strings
Issue 1: duplicate q-grams in one string (for both query and data).
example:s: aabbaabb, {1#a,1aa, 1ab, 1bb, 1ba, 2aa, 2ab, 2bb, 1b$}q: aabcaa, {1#a, 1aa, 1ab, 1bc, 1ca, 2aa, 1a$}
Issue 2: duplicate q-grams between strings.
13-6
Duplicate q-grams in strings
Issue 1: duplicate q-grams in one string (for both query and data).
example:s: aabbaabb, {1#a,1aa, 1ab, 1bb, 1ba, 2aa, 2ab, 2bb, 1b$}q: aabcaa, {1#a, 1aa, 1ab, 1bc, 1ca, 2aa, 1a$}
Issue 2: duplicate q-grams between strings.
Do not distinguish q-grams from different nodes.node 1: pizz (#p, pi, iz, zz, z$);node 2: zza (#z, zz, za, a$)parent node 3:union of signatures corresponding to (#p, pi, iz, zz, z$, #z,za, a$)
14-1
Selectivity estimation for SAS range queries
Combine the range query selectivity estimator with the stringselectivity estimator (V Sol [Mazeika et al.2007] based on min-wise signatures of inverted lists of q-grams).
14-2
Selectivity estimation for SAS range queries
Combine the range query selectivity estimator with the stringselectivity estimator (V Sol [Mazeika et al.2007] based on min-wise signatures of inverted lists of q-grams).
14-3
Selectivity estimation for SAS range queries
Combine the range query selectivity estimator with the stringselectivity estimator (V Sol [Mazeika et al.2007] based on min-wise signatures of inverted lists of q-grams).
b1
b2
b3
14-4
Selectivity estimation for SAS range queries
Combine the range query selectivity estimator with the stringselectivity estimator (V Sol [Mazeika et al.2007] based on min-wise signatures of inverted lists of q-grams).
b1
b2
b3
Θ(b2)
14-5
Selectivity estimation for SAS range queries
Combine the range query selectivity estimator with the stringselectivity estimator (V Sol [Mazeika et al.2007] based on min-wise signatures of inverted lists of q-grams).
b1
b2
b3
Θ(b2)
14-6
Selectivity estimation for SAS range queries
Combine the range query selectivity estimator with the stringselectivity estimator (V Sol [Mazeika et al.2007] based on min-wise signatures of inverted lists of q-grams).
V Sol1b1
b2
b3
V Sol2
V Sol3
Θ(b2)
14-7
Selectivity estimation for SAS range queries
Combine the range query selectivity estimator with the stringselectivity estimator (V Sol [Mazeika et al.2007] based on min-wise signatures of inverted lists of q-grams).
V Sol1b1
b2
b3
V Sol2
V Sol3
r
Θ(b2)
14-8
Selectivity estimation for SAS range queries
Combine the range query selectivity estimator with the stringselectivity estimator (V Sol [Mazeika et al.2007] based on min-wise signatures of inverted lists of q-grams).
V Sol1b1
b2
b3
V Sol2
V Sol3
r
Θ(b2)
Θ(b2, r)
14-9
Selectivity estimation for SAS range queries
Combine the range query selectivity estimator with the stringselectivity estimator (V Sol [Mazeika et al.2007] based on min-wise signatures of inverted lists of q-grams).
V Sol1b1
b2
b3
V Sol2
V Sol3
r
Θ(b2)
Θ(b2, r)
|Abi | = niΘ(bi,r)Θ(bi)
ρiLMni
= Θ(bi,r)Θ(bi)
ρiLM . (ρiLM estimates the number
of similar strings with query string in bi.)
14-10
Selectivity estimation for SAS range queries
Combine the range query selectivity estimator with the stringselectivity estimator (V Sol [Mazeika et al.2007] based on min-wise signatures of inverted lists of q-grams).
V Sol1b1
b2
b3
V Sol2
V Sol3
r
Θ(b2)
Θ(b2, r)
|Abi | = niΘ(bi,r)Θ(bi)
ρiLMni
= Θ(bi,r)Θ(bi)
ρiLM . (ρiLM estimates the number
of similar strings with query string in bi.)
Improvements:Minimum number of neighborhoods principle,Spatial uniformity principle.
15-1
Two improvements for selectivity estimation
Minimum number of neighborhoods principle:
15-2
Two improvements for selectivity estimation
Minimum number of neighborhoods principle:
toyτ = 1
boycoy
min
men
mine
too
15-3
Two improvements for selectivity estimation
Minimum number of neighborhoods principle:
toyτ = 1
boycoy
min
men
mine
tooη = 4
15-4
Two improvements for selectivity estimation
Minimum number of neighborhoods principle:
toyτ = 1
boycoy
min
men
mine
tooη = 2
15-5
Two improvements for selectivity estimation
Minimum number of neighborhoods principle:
toyτ = 1
boycoy
min
men
mine
tooη = 2
A smaller η give a more accurate estimator.
15-6
Two improvements for selectivity estimation
Minimum number of neighborhoods principle:
Spatial uniformity principle:
toyτ = 1
boycoy
min
men
mine
tooη = 2
A smaller η give a more accurate estimator.
15-7
Two improvements for selectivity estimation
Minimum number of neighborhoods principle:
Spatial uniformity principle:
toyτ = 1
boycoy
min
men
mine
tooη = 2
A smaller η give a more accurate estimator.
15-8
Two improvements for selectivity estimation
Minimum number of neighborhoods principle:
Spatial uniformity principle:
toyτ = 1
boycoy
min
men
mine
tooη = 2
A smaller η give a more accurate estimator.
15-9
Two improvements for selectivity estimation
Minimum number of neighborhoods principle:
Spatial uniformity principle:
toyτ = 1
boycoy
min
men
mine
tooη = 2
A smaller η give a more accurate estimator.
16-1
The partitioning metric
Neighborhood and uniformity quality of b:∆(b) = ηbnb
∑1,...,dXi,
{X1, . . . , Xd}: the side lengths of b in each dimension.
16-2
The partitioning metric
Neighborhood and uniformity quality of b:∆(b) = ηbnb
∑1,...,dXi,
{X1, . . . , Xd}: the side lengths of b in each dimension.
Our goal:Minimize
∑ki=1 ∆(bi), where k is the number of buckets specified
by the user.
16-3
The partitioning metric
Neighborhood and uniformity quality of b:∆(b) = ηbnb
∑1,...,dXi,
{X1, . . . , Xd}: the side lengths of b in each dimension.
The greedy algorithm;The adaptive R-tree algorithm.
Our goal:Minimize
∑ki=1 ∆(bi), where k is the number of buckets specified
by the user.
17-1
The greedy algorithm
b1
b2
b3
17-2
The greedy algorithm
b1
b2
b3 ?
17-3
The greedy algorithm
b1
b2
b3
η3 · n3 ·Π(b3)
?
ηi: the number of neighborhoods of strings in bi.ni: the number of points in bi.
Π(bi): the perimeter of bi.
17-4
The greedy algorithm
b1
b2
b3
η3 · n3 ·Π(b3)
?
P
ηi: the number of neighborhoods of strings in bi.ni: the number of points in bi.
Π(bi): the perimeter of bi.
17-5
The greedy algorithm
b1
b2
b3
η3 · n3 ·Π(b3)
?
Θ(P )
ηi: the number of neighborhoods of strings in bi.ni: the number of points in bi.
Π(bi): the perimeter of bi.
17-6
The greedy algorithm
b1
b2
b3
η3 · n3 ·Π(b3)
?
k − 3 · ( ηk−3 ·
|P |k−3 ·
(Θ(P )k−3
)1/d
· d)
Θ(P )
ηi: the number of neighborhoods of strings in bi.ni: the number of points in bi.η: number of neighborhoods of strings for remaining points.
Π(bi): the perimeter of bi.
k − 3 buckets.
17-7
The greedy algorithm
b1
b2
b3
η3 · n3 ·Π(b3)
?
U(bi) = ηi · ni ·Π(bi) + ηk−i · |P | ·
(Θ(P )k−i
)1/d
· d(Π(bi) is the perimeter of bucket bi)
k − 3 · ( ηk−3 ·
|P |k−3 ·
(Θ(P )k−3
)1/d
· d)
Θ(P )
ηi: the number of neighborhoods of strings in bi.ni: the number of points in bi.η: number of neighborhoods of strings for remaining points.
Π(bi): the perimeter of bi.
k − 3 buckets.
17-8
The greedy algorithm
b1
b2
b3
η3 · n3 ·Π(b3)
U(bi) = ηi · ni ·Π(bi) + ηk−i · |P | ·
(Θ(P )k−i
)1/d
· d(Π(bi) is the perimeter of bucket bi)
k − 3 · ( ηk−3 ·
|P |k−3 ·
(Θ(P )k−3
)1/d
· d)
Θ(P )
ηi: the number of neighborhoods of strings in bi.ni: the number of points in bi.η: number of neighborhoods of strings for remaining points.
Π(bi): the perimeter of bi.
k − 3 buckets.
17-9
The greedy algorithm
b1
b2
b3
U(bi) = ηi · ni ·Π(bi) + ηk−i · |P | ·
(Θ(P )k−i
)1/d
· d(Π(bi) is the perimeter of bucket bi)
17-10
The greedy algorithm
b1
b2
b3
b4
17-11
The greedy algorithm
b1
b2
b3
b4 b5
18-1
The adaptive R-tree algorithm
k = 6
· · ·· · · · · · · · · · · ·
R-tree root
18-2
The adaptive R-tree algorithm
k = 6
· · ·· · · · · · · · · · · ·
level3
R-tree root
18-3
The adaptive R-tree algorithm
k = 6
· · ·· · · · · · · · · · · ·
level3
R-tree root
18-4
The adaptive R-tree algorithm
k = 6
· · ·· · · · · · · · · · · ·
level3
R-tree root
b3
18-5
The adaptive R-tree algorithm
k = 6
· · ·· · · · · · · · · · · ·
level3
R-tree root
b3
?
18-6
The adaptive R-tree algorithm
k = 6
· · ·· · · · · · · · · · · ·
level3
R-tree root
η3 · n3 ·Π(b3)
ηi: the number of neighborhoods of strings in bi.ni: the number of points in bi.Π(bi): the perimeter of bi.
18-7
The adaptive R-tree algorithm
k = 6
· · ·· · · · · · · · · · · ·
level3
R-tree root
η3 · n3 ·Π(b3) ηk−3 · n · (
rk−3 )1/d · d
r
ηi: the number of neighborhoods of strings in bi.ni: the number of points in bi.Π(bi): the perimeter of bi.η: number of neighborhoods of strings for remaining points.
18-8
The adaptive R-tree algorithm
k = 6
· · ·· · · · · · · · · · · ·
level3
R-tree root
U ′(bi) = ηi · ni ·Π(bi) + ηk−i · n · (
rk−i )
1/d · d.
η3 · n3 ·Π(b3) ηk−3 · n · (
rk−3 )1/d · d
r
ηi: the number of neighborhoods of strings in bi.ni: the number of points in bi.Π(bi): the perimeter of bi.η: number of neighborhoods of strings for remaining points.
18-9
The adaptive R-tree algorithm
k = 6
· · ·· · · · · · · · · · · ·
level3
R-tree root
18-10
The adaptive R-tree algorithm
k = 6
· · ·· · · · · · · · · · · ·
level3
R-tree root
19-1
Experiment setup
All experiments were executed on a Linux machine with an IntelXeon CPU at 2GHz and 2GB of memory.
19-2
Experiment setup
All experiments were executed on a Linux machine with an IntelXeon CPU at 2GHz and 2GB of memory.Real data sets:The road-networks for states (Texas, California) in USA, each pointis associated with the names of the state, county and town.Synthetic data sets: uniform points and random clustered points.Assign strings from real data sets randomly to the point generated.
19-3
Experiment setup
All experiments were executed on a Linux machine with an IntelXeon CPU at 2GHz and 2GB of memory.Real data sets:The road-networks for states (Texas, California) in USA, each pointis associated with the names of the state, county and town.Synthetic data sets: uniform points and random clustered points.Assign strings from real data sets randomly to the point generated.
The default experimental parameters are summarized below.
Symbol Definition Default Valueθ query area percentage of data space 3%N size of points set 2,000,000l signature length 50τ edit distance threshold 2d dimensionality 2
20-1
SAS range queries: impact of the signature size
20-2
SAS range queries: impact of the signature size
20-3
SAS range queries: impact of the signature size
20-4
SAS range queries: impact of the signature size
21-1
SAS range queries: query performance
21-2
SAS range queries: query performance
21-3
SAS range queries: query performance
21-4
SAS range queries: query performance
22-1
Selectivity estimation: relative errors
CA data set
22-2
Selectivity estimation: relative errors
TX data set
22-3
Selectivity estimation: relative errors
22-4
Selectivity estimation: relative errors
22-5
Selectivity estimation: relative errors
22-6
Selectivity estimation: relative errors
23-1
Selectivity estimation: cost of the adaptive estimator
23-2
Selectivity estimation: cost of the adaptive estimator
24-1
Conclusions
We designed MHR-tree for spatial approximate string queries.
24-2
Conclusions
We designed MHR-tree for spatial approximate string queries.
We designed novel selectivity estimator for SAS range queries,which take into account both the spatial and string distribu-tions.
24-3
Conclusions
We designed MHR-tree for spatial approximate string queries.
We designed novel selectivity estimator for SAS range queries,which take into account both the spatial and string distribu-tions.
Future work includes examining spatial approximate sub-stringqueries, and using the KMV synopsis to improve the perfor-mance.
25-1
The End
THANK Y OU
Q and A
top related