Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions
![Page 1: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/1.jpg)
Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions
Yufei Tao, Reynold Cheng, Xiaokui Xiao, Wang Kai Ngai, Ben Kao, Sunil Prabhakar
City University of Hong Kong
Hong Kong Polytechnic University
University of Hong Kong
Purdue University
![Page 2: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/2.jpg)
Multi-dimensional Uncertain Data
Moving objects: an object sends its location to a server whenever its distance from the previously reported location exceeds a certain threshold.
Sensor readings: each sensor periodically reports the temperature, humidity, UV index, …, in its neighborhood.
Querying the (uncertain) data stored in the server directly is meaningless.
![Page 3: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/3.jpg)
Uncertainty Modeling
[Figure: Client 1, showing its recorded location in the database, the distance threshold, and the resulting uncertainty region.]
An object’s location is described by a probability density function.
![Page 4: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/4.jpg)
Probabilistic Range Search
[Figure: the uncertainty regions of Clients 1 to 6 and the query region rq (the area of CityU).]
Probabilistic range query: find the clients that are currently in CityU with at least 50% probability (the probability threshold).
![Page 5: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/5.jpg)
Appearance Probability
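appearance probability: $P_{app}(o, q) = \int_{rq \cap ur} pdf(x)\, dx$
[Figure: Client 1 with its uncertainty region ur, the query region rq, and their intersection rq ∩ ur.]
E.g., uniform pdf: $P_{app}(o, q) = \mathrm{area}(rq \cap ur) / \mathrm{area}(ur)$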
![Page 6: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/6.jpg)
Appearance Probability
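[Figure: object o with its uncertainty region o.ur, the query region rq, and their intersection o.ur ∩ rq.]
For an arbitrary pdf, the appearance probability over o.ur ∩ rq must be calculated numerically.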
Calculation time of an appearance probability in 2D space: 1.3ms
Time for a random access: 10ms
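The numerical calculation mentioned on this slide can be illustrated with Monte Carlo sampling. This is only a sketch under stated assumptions, not the evaluation method used by the authors: the function names, the rectangle encoding `(x_lo, y_lo, x_hi, y_hi)`, the sample count, and the circular uniform pdf are all hypothetical choices made for the example.

```python
import math
import random

def sample_uniform_disk(cx, cy, radius, rng):
    """Draw one point from a uniform pdf over a circular uncertainty region."""
    r = radius * math.sqrt(rng.random())   # sqrt gives uniform density over the area
    theta = 2.0 * math.pi * rng.random()
    return (cx + r * math.cos(theta), cy + r * math.sin(theta))

def appearance_probability(sample_pdf, rq, n_samples=20000, seed=0):
    """Estimate the probability mass of o.pdf that falls inside rq."""
    rng = random.Random(seed)
    x_lo, y_lo, x_hi, y_hi = rq
    hits = 0
    for _ in range(n_samples):
        x, y = sample_pdf(rng)
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            hits += 1
    return hits / n_samples

def client(rng):
    # Hypothetical object: uniform pdf over a disk of radius 250 centred at the origin.
    return sample_uniform_disk(0.0, 0.0, 250.0, rng)

# rq covers the right half of the uncertainty region, so the estimate is close to 0.5.
p_half = appearance_probability(client, (0.0, -250.0, 250.0, 250.0))
```

Every estimate costs thousands of pdf samples, which is why the rest of the talk tries to decide queries without computing this value.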
![Page 7: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/7.jpg)
A good solution should…
Support any pdf.
Minimize the number of page accesses.
Minimize the number of appearance probability calculations.
Minimize the total cost (I/O + CPU).
![Page 8: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/8.jpg)
Main Idea
Pre-compute some “auxiliary information” that can be used to efficiently decide whether an object appears in a region with at least a certain probability, without calculating its actual appearance probability.
![Page 9: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/9.jpg)
Quick Examples
[Figure: two examples of an uncertainty region o.ur and a query region rq, with pq = 20%.]
![Page 10: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/10.jpg)
Probabilistically Constrained Regions (PCR)
[Figure: o.ur cut by four lines l1-, l1+, l2-, l2+. The appearance probability of o is 0.2 on the left of l1-, on the right of l1+, below l2-, and above l2+. The rectangle bounded by the four lines is o.pcr(0.2).]
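One way to picture how the four lines are obtained is to take, in each dimension, the p-quantile and the (1 - p)-quantile of the object's pdf. The sketch below estimates them from sample points; `pcr_from_samples` and the grid that stands in for a uniform pdf are assumptions made for illustration, not the authors' procedure.

```python
def pcr_from_samples(points, p):
    """Return o.pcr(p) as a list of (lo, hi) intervals, one per dimension.

    Assumes 0 <= p <= 0.5 and that `points` are samples drawn from o.pdf.
    """
    if not 0.0 <= p <= 0.5:
        raise ValueError("p must lie in [0, 0.5]")
    n = len(points)
    k = min(int(p * n), (n - 1) // 2)   # samples left outside the box on each side
    box = []
    for i in range(len(points[0])):
        coords = sorted(pt[i] for pt in points)
        box.append((coords[k], coords[n - 1 - k]))
    return box

# A 101 x 101 grid standing in for a uniform pdf over [0, 100] x [0, 100].
grid = [(x, y) for x in range(101) for y in range(101)]
pcr_02 = pcr_from_samples(grid, 0.2)   # [(20, 80), (20, 80)]
pcr_00 = pcr_from_samples(grid, 0.0)   # [(0, 100), (0, 100)], i.e. the MBR
```

As p grows from 0 to 0.5 the box shrinks from the bounding rectangle towards a single point, so the PCRs of one object are nested.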
![Page 11: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/11.jpg)
Probabilistically Constrained Regions (PCR)
For a query q with search region rq and probability pq = 0.2
Observation 1.1 (pruning)
an object o cannot satisfy q if rq does not intersect o.pcr(0.2)
[Figure: o.pcr(0.2) and several placements of rq that do not intersect it, e.g. entirely beyond l1-, l1+, or l2-, where the appearance probability is 0.2.]
![Page 12: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/12.jpg)
Probabilistically Constrained Regions (PCR)
For a query q with search region rq and probability pq = 0.8 (= 1 - 0.2)
Observation 1.2 (pruning)
an object o cannot satisfy q if rq does not fully contain o.pcr(0.2)
[Figure: o.pcr(0.2) and an rq that does not fully contain it; the appearance probability is 0.2 on the right of l1+ and 0.8 on the left of l1+.]
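The two pruning observations reduce to rectangle tests. The following is a minimal sketch, assuming rectangles are lists of `(lo, hi)` intervals and that a hypothetical callable `pcr(p)` returns the exact o.pcr(p) for any p in [0, 0.5]; none of these names come from the talk.

```python
def intersects(a, b):
    """True if axis-parallel rectangles a and b share at least one point."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))

def contains(outer, inner):
    """True if rectangle `outer` fully contains rectangle `inner`."""
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))

def can_prune(rq, pq, pcr):
    """Return True only when o certainly fails the query (rq, pq)."""
    if pq <= 0.5:
        return not intersects(rq, pcr(pq))       # Observation 1.1
    return not contains(rq, pcr(1.0 - pq))       # Observation 1.2

def uniform_pcr(p):
    # Hypothetical object with a uniform pdf over [0, 100] x [0, 100].
    return [(100.0 * p, 100.0 * (1.0 - p))] * 2

pruned = can_prune([(0.0, 10.0), (0.0, 100.0)], 0.2, uniform_pcr)   # True
```

A `False` result does not mean the object qualifies; it only means this test cannot rule it out.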
![Page 13: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/13.jpg)
Probabilistically Constrained Regions (PCR)
A query q with search region rq and probability pq = 0.2
Observation 1.3 (validating)
an object o definitely satisfies q if rq fully contains the part of o.MBR on the left of l1- (or on the right of l1+, or below l2-, or above l2+)
[Figure: o.MBR, o.pcr(0.2), and an rq covering the part of o.MBR on the left of l1-, where the appearance probability is 0.2.]
![Page 14: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/14.jpg)
Probabilistically Constrained Regions (PCR)
A query q with search region rq and probability pq = 0.8
Observation 1.4 (validating)
an object o definitely satisfies q if rq fully contains the part of o.MBR on the left of l1+ (or on the right of l1-, or below l2+, or above l2-)
[Figure: o.MBR and an rq covering the part of o.MBR on the left of l1+; the appearance probability is 0.8 on the left of l1+ and 0.2 on its right.]
![Page 15: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/15.jpg)
Probabilistically Constrained Regions (PCR)
A query q with search region rq and probability pq = 0.6 (= 1 - 2 * 0.2)
Observation 1.5 (validating)
an object o must satisfy q if rq fully contains the part of o.MBR between l1- and l1+ (or between l2- and l2+)
[Figure: o.MBR split by l1- and l1+; the appearance probability is 0.2 on the left of l1-, 0.6 between l1- and l1+, and 0.2 on the right of l1+; rq covers the middle part.]
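The three validating observations can be sketched the same way. The code assumes rectangles are lists of `(lo, hi)` intervals and that a hypothetical callable `pcr(p)` returns the exact o.pcr(p); `slab` and `can_validate` are illustrative names, not part of the talk.

```python
def contains(outer, inner):
    """True if rectangle `outer` fully contains rectangle `inner`."""
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))

def slab(mbr, i, lo, hi):
    """Copy of mbr whose extent along dimension i is replaced by [lo, hi]."""
    return [(lo, hi) if j == i else side for j, side in enumerate(mbr)]

def can_validate(rq, pq, mbr, pcr):
    """Return True only when o certainly satisfies the query (rq, pq)."""
    side = pcr(pq) if pq <= 0.5 else pcr(1.0 - pq)
    middle = pcr((1.0 - pq) / 2.0)
    for i, (m_lo, m_hi) in enumerate(mbr):
        s_lo, s_hi = side[i]
        if pq <= 0.5:    # Observation 1.3: a part holding probability pq
            parts = [slab(mbr, i, m_lo, s_lo), slab(mbr, i, s_hi, m_hi)]
        else:            # Observation 1.4: a part holding probability pq
            parts = [slab(mbr, i, m_lo, s_hi), slab(mbr, i, s_lo, m_hi)]
        parts.append(slab(mbr, i, *middle[i]))   # Observation 1.5
        if any(contains(rq, part) for part in parts):
            return True
    return False

def uniform_pcr(p):
    # Hypothetical object with a uniform pdf over [0, 100] x [0, 100].
    return [(100.0 * p, 100.0 * (1.0 - p))] * 2

mbr = [(0.0, 100.0), (0.0, 100.0)]
validated = can_validate([(15.0, 85.0), (0.0, 100.0)], 0.6, mbr, uniform_pcr)  # True
```

Objects that are neither pruned nor validated still need an actual appearance probability calculation.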
![Page 16: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/16.jpg)
Probabilistically Constrained Regions (PCR)
o.pcr(0.2) provides 5 heuristics to reduce CPU cost
In general, for a prob-range query with probability threshold pq:
if pq <= 0.5
o may be pruned using o.pcr(pq) (Observation 1.1)
o may be validated using o.pcr(pq) (Observation 1.3)
o may be validated using o.pcr((1 - pq)/2) (Observation 1.5)
if pq > 0.5
o may be pruned using o.pcr(1 - pq) (Observation 1.2)
o may be validated using o.pcr(1 - pq) (Observation 1.4)
o may be validated using o.pcr((1 - pq)/2) (Observation 1.5)
pq in [0, 1] → infinite number of possible pq → infinite number of PCRs: impractical!
It is possible to use a finite number of PCRs to achieve pruning and validating.
![Page 17: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/17.jpg)
Using PCRs in a Conservative Way
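E.g., U-catalog: { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }
For a query q with search region rq and probability pq = 0.25
Observation 1.1
an object o cannot satisfy q if rq does not intersect o.pcr(0.25)
Observation 2.1
an object o cannot satisfy q if rq does not intersect o.pcr(0.2)
o.pcr(0.25) is not stored because 0.25 is not in the U-catalog; o.pcr(0.2) contains o.pcr(0.25), so the test on o.pcr(0.2) is still correct, only weaker.
[Figure: o.pcr(0.2), o.pcr(0.25), o.pcr(0.3), and the query region rq.]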
![Page 18: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/18.jpg)
Using PCRs in a Conservative Way
U-catalog: { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }
For a query q with search region rq and probability pq = 0.75
Observation 1.2
an object o cannot satisfy q if rq does not fully contain o.pcr(0.25)
Observation 2.2
an object o cannot satisfy q if rq does not fully contain o.pcr(0.3)
o.pcr(0.3) lies inside o.pcr(0.25), so an rq that fails to contain o.pcr(0.3) also fails to contain o.pcr(0.25).
[Figure: o.pcr(0.2), o.pcr(0.25), o.pcr(0.3), and the query region rq.]
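Choosing the conservative catalog entry for pruning amounts to rounding down in one case and rounding up in the other. The sketch below assumes the example U-catalog from these slides; the helper names are hypothetical.

```python
import bisect

U_CATALOG = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

def catalog_floor(p, catalog=U_CATALOG):
    """Largest catalog value <= p (assumes p >= catalog[0])."""
    return catalog[bisect.bisect_right(catalog, p) - 1]

def catalog_ceil(p, catalog=U_CATALOG):
    """Smallest catalog value >= p (assumes p <= catalog[-1])."""
    return catalog[bisect.bisect_left(catalog, p)]

def pruning_pcr_value(pq, catalog=U_CATALOG):
    """Catalog value whose stored PCR can be used to prune for threshold pq."""
    if pq <= 0.5:
        return catalog_floor(pq, catalog)        # Observation 2.1: a larger box
    return catalog_ceil(1.0 - pq, catalog)       # Observation 2.2: a smaller box

low = pruning_pcr_value(0.25)    # 0.2
high = pruning_pcr_value(0.75)   # 0.3
```

Rounding in the other direction would make the test unsound, since it could prune objects that actually qualify.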
![Page 19: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/19.jpg)
U-catalog Size m
{0, 0.5}, m = 2
{0, 0.25, 0.5}, m = 3
{0, 0.1, 0.2, 0.3, 0.4, 0.5}, m = 6
…
larger m → more PCRs → greater pruning/validating power → less CPU cost
larger m → higher space consumption → larger I/O cost
m = 9
![Page 20: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/20.jpg)
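Conservative Functional Boxes (CFB)
U-catalog: { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }
[Figure: o.pcr(…) at each U-catalog value, drawn in the plane with axes p (0 to 0.5) and x, together with o.cfbxout and o.cfbxin.]
o.pcr: 2m values for each dimension
o.cfbout: 4 values for each dimension
o.cfbin: 4 values for each dimension
total: 8 values for each dimension
m = 9: 8 values (CFBs) vs. 18 values (PCRs) for each dimension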
![Page 21: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/21.jpg)
Conservative Functional Boxes (CFB)
U-catalog: { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }
For a query q with search region rq and probability pq = 0.25
Observation 1.1
an object o cannot satisfy q if rq does not intersect o.pcr(0.25)
Observation 2.1
an object o cannot satisfy q if rq does not intersect o.pcr(0.2)
Observation 3.1
an object o cannot satisfy q if rq does not intersect o.cfbout(0.2)
[Figure: o.pcr(0.2), o.cfbout, o.cfbout(0.2), and rq, with the p axis running from 0 to 0.5.]
![Page 22: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/22.jpg)
Conservative Functional Boxes (CFB)
U-catalog: { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }
For a query q with search region rq and probability pq = 0.75
Observation 1.2
an object o cannot satisfy q if rq does not fully contain o.pcr(0.25)
Observation 2.2
an object o cannot satisfy q if rq does not fully contain o.pcr(0.3)
Observation 3.2
an object o cannot satisfy q if rq does not fully contain o.cfbin(0.3)
[Figure: o.pcr(0.3), o.cfbin, o.cfbin(0.3), and rq, with the p axis running from 0 to 0.5.]
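A CFB can be pictured as a box whose faces move linearly with p. The sketch below assumes each face has the form alpha - beta * p; that parameterization matches the four values per dimension mentioned earlier, but the sign convention, the `LinearFace` class, and the example coefficients are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class LinearFace:
    alpha: float
    beta: float

    def at(self, p):
        """Position of this face for probability value p (assumed linear form)."""
        return self.alpha - self.beta * p

def cfb_box(faces, p):
    """faces holds one (lower, upper) pair of LinearFace per dimension."""
    return [(lo.at(p), hi.at(p)) for lo, hi in faces]

def intersects(a, b):
    """True if axis-parallel rectangles a and b share at least one point."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))

def prune_with_cfb_out(rq, catalog_p, cfb_out):
    """Observation 3.1: prune if rq misses o.cfbout at the chosen catalog value."""
    return not intersects(rq, cfb_box(cfb_out, catalog_p))

# Hypothetical o.cfbout that happens to coincide with the PCRs of a uniform pdf
# over [0, 100] x [0, 100]: lower face 100 * p, upper face 100 - 100 * p.
cfb_out = [(LinearFace(0.0, -100.0), LinearFace(100.0, 100.0))] * 2
box = cfb_box(cfb_out, 0.2)   # approximately [(20, 80), (20, 80)]
```

Only the alpha and beta values are stored; the box for any catalog value is recomputed when a query arrives.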
![Page 23: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/23.jpg)
Comparing CFBs with PCRs
CFBs have weaker pruning/validating power than PCRs
But CFBs require less space than PCRs
Using PCRs: PCR1, PCR2, …, PCRm → 2·m·d values
Using CFBs: CFBout, CFBin → 8·d values
[Figure: o.pcr at each U-catalog value with o.cfbout and o.cfbin, drawn in the plane with axes p (0 to 0.5) and x.]
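The space comparison on this slide is simple arithmetic, worked here for m = 9. The function names are hypothetical; the formulas are the ones stated above.

```python
def pcr_values(m, d):
    """Values stored per object when keeping one PCR per U-catalog entry."""
    return 2 * m * d

def cfb_values(d):
    """Values stored per object when keeping only o.cfbout and o.cfbin."""
    return 8 * d

for d in (2, 3):
    print(d, pcr_values(9, d), cfb_values(d))
# prints "2 36 16" and then "3 54 24"
```

With these formulas, CFBs take less space than PCRs whenever m is larger than 4.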
![Page 24: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/24.jpg)
Finding Conservative Functional Boxes
goal: minimize
for the i-th dimension, minimize
with the following constraints:
Linear Programming: Simplex Method
[Figure: for the i-th dimension, the lines o.cfbi-out and o.cfbi+out in the plane with axes p (0 to 0.5) and x, annotated with αi-out, αi+out, arctan(-βi-out), and arctan(βi+out).]
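The fitting problem for one face can be sketched as a two-variable linear program. The objective and constraints on this slide did not survive extraction, so the version below is an assumption: it fits the lower face of o.cfbout as the line a + b * p that stays at or below the PCR lower face at every U-catalog value while minimizing the total gap. Instead of the Simplex method named on the slide, it enumerates candidate vertices (lines through two of the points), which is enough for a small catalog.

```python
from itertools import combinations

def fit_lower_line(ps, ys, eps=1e-9):
    """Fit a + b*p with a + b*p_j <= y_j for all j, maximizing sum_j (a + b*p_j).

    Maximizing that sum is the same as minimizing the total gap to the points.
    A bounded two-variable LP attains its optimum at a vertex, i.e. a line
    through two of the points, so all pairs are tried.
    """
    best = None
    for (p1, y1), (p2, y2) in combinations(zip(ps, ys), 2):
        if p1 == p2:
            continue
        b = (y2 - y1) / (p2 - p1)
        a = y1 - b * p1
        if all(a + b * p <= y + eps for p, y in zip(ps, ys)):
            total = sum(a + b * p for p in ps)
            if best is None or total > best[0]:
                best = (total, a, b)
    return best[1], best[2]

# Hypothetical PCR lower faces at catalog values 0, 0.25, 0.5.
a, b = fit_lower_line([0.0, 0.25, 0.5], [0.0, 40.0, 50.0])   # (0.0, 100.0)
```

The fitted line sits 15 units below the middle point, which is the loss of pruning power accepted in exchange for storing two numbers instead of one per catalog value.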
![Page 25: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/25.jpg)
More in Our Paper
The U-tree: a dynamic index designed to accelerate prob-range queries.
![Page 26: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/26.jpg)
Experimental Results
data space: [0, 10000]^d
uncertainty region shape: circle (sphere)
uncertainty region radius: 250
data sets:
Long Beach County (LB): 53k 2D objects, uniform pdf
California (CA): 62k 2D objects, Gaussian pdf
Aircraft: 100k 3D objects, uniform pdf
query set: 100 queries for each data set, with various sizes of rq and different pq
![Page 27: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/27.jpg)
Experimental Results
![Page 28: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/28.jpg)
Experimental Results
Query performance vs. search region size (LB, pq = 0.6)
![Page 29: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/29.jpg)
Experimental Results
Query performance vs. search region size (CA, pq = 0.6)
![Page 30: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/30.jpg)
Experimental Results
Query performance vs. search region size (Aircraft, pq = 0.6)
![Page 31: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/31.jpg)
Experimental Results
Query performance vs. probability threshold (LB, qs = 1500)
![Page 32: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/32.jpg)
Experimental Results
Query performance vs. probability threshold (CA, qs = 1500)
![Page 33: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/33.jpg)
Experimental Results
Query performance vs. probability threshold (Aircraft, qs = 1500)
![Page 34: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions](https://reader035.vdocument.in/reader035/viewer/2022062809/568159a6550346895dc708b1/html5/thumbnails/34.jpg)
Summary
A fast method for answering probabilistic range search queries.