
Improving Min/Max Aggregation over Spatial Objects

Donghui Zhang, Vassilis J. Tsotras

University of California, Riverside

ACM GIS’01

Outline

• Problem Definition

• Straightforward Solutions

• Our Solution

• Performance Results

• By-Product: Optimized the MSB-tree

• Conclusions


Problem Definition

• Consider a collection of spatial objects.

• Each object: rectangle r, value v.

• Spatial aggregation: find the aggregate value over all objects that intersect a given query rectangle. We focus on MAX.

• E.g., in a database of rainfall over geographic areas, find the maximum rainfall in the Los Angeles area.
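The query can be pinned down with a brute-force reference implementation that examines every object. This is a minimal Python sketch, not the authors' code. The names `Rect`, `SpatialObject` and `max_aggregate` are hypothetical, and so is the rule that rectangles are closed, so touching edges count as intersecting.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rect:
    """Axis-parallel rectangle with closed boundaries (assumption)."""
    xlo: float
    ylo: float
    xhi: float
    yhi: float

    def intersects(self, other: "Rect") -> bool:
        # Closed rectangles: touching edges count as intersecting.
        return (self.xlo <= other.xhi and other.xlo <= self.xhi
                and self.ylo <= other.yhi and other.ylo <= self.yhi)


@dataclass(frozen=True)
class SpatialObject:
    box: Rect      # the object's rectangle r
    value: float   # the object's value v


def max_aggregate(objects: Iterable[SpatialObject], query: Rect) -> Optional[float]:
    """MAX over the values of all objects whose box intersects `query`.

    Brute force: every object is examined. Returns None if nothing intersects.
    """
    best: Optional[float] = None
    for o in objects:
        if o.box.intersects(query) and (best is None or o.value > best):
            best = o.value
    return best


# Hypothetical rainfall data: each area is a rectangle with a rainfall value.
rain = [
    SpatialObject(Rect(0, 0, 4, 4), 5.0),
    SpatialObject(Rect(3, 3, 8, 8), 7.0),
    SpatialObject(Rect(10, 10, 12, 12), 9.0),
]
print(max_aggregate(rain, Rect(2, 2, 5, 5)))  # 7.0
```

Every index-based method in the rest of the talk must return the same answer as this linear scan. The indexes only differ in how many objects and nodes they have to examine.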


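Straightforward Solutions

• Use an R*-tree [BKS+90] to index the objects.

• Reduce the problem to a range search: find all objects that intersect the query rectangle, then take the MAX of their values.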


• Better approach: the aR-tree [PKZ+01, LM01], which stores the MAX of each sub-tree in the internal nodes;

• If the query rectangle fully contains a sub-tree's bounding box, the stored MAX is used and there is no need to search that sub-tree.
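The aR-tree pruning rule can be illustrated on a tiny in-memory tree. This is a sketch under stated assumptions, not the aR-tree implementation from [PKZ+01, LM01]. `Node`, `make_leaf`, `make_internal` and `ar_max_query` are hypothetical names, and real aR-trees are disk-based.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rect:
    xlo: float
    ylo: float
    xhi: float
    yhi: float

    def intersects(self, o: "Rect") -> bool:
        return self.xlo <= o.xhi and o.xlo <= self.xhi and self.ylo <= o.yhi and o.ylo <= self.yhi

    def contains(self, o: "Rect") -> bool:
        return self.xlo <= o.xlo and o.xhi <= self.xhi and self.ylo <= o.ylo and o.yhi <= self.yhi


def bbox(rects: List[Rect]) -> Rect:
    return Rect(min(r.xlo for r in rects), min(r.ylo for r in rects),
                max(r.xhi for r in rects), max(r.yhi for r in rects))


@dataclass
class Node:
    """Hypothetical aR-tree node: bounding box plus the MAX of its sub-tree."""
    box: Rect
    max_value: float
    children: List["Node"] = field(default_factory=list)             # internal node
    objects: List[Tuple[Rect, float]] = field(default_factory=list)  # leaf node


def make_leaf(objects: List[Tuple[Rect, float]]) -> Node:
    return Node(bbox([b for b, _ in objects]), max(v for _, v in objects), objects=objects)


def make_internal(children: List[Node]) -> Node:
    return Node(bbox([c.box for c in children]), max(c.max_value for c in children), children=children)


def ar_max_query(node: Node, q: Rect) -> Optional[float]:
    if not node.box.intersects(q):
        return None                  # nothing in this sub-tree can intersect q
    if q.contains(node.box):
        return node.max_value        # whole sub-tree inside q: use the stored MAX
    best: Optional[float] = None
    for b, v in node.objects:        # leaf: check the objects directly
        if b.intersects(q) and (best is None or v > best):
            best = v
    for child in node.children:      # internal: recurse into the children
        v = ar_max_query(child, q)
        if v is not None and (best is None or v > best):
            best = v
    return best


leaf1 = make_leaf([(Rect(0, 0, 2, 2), 5.0), (Rect(1, 1, 3, 3), 4.0)])
leaf2 = make_leaf([(Rect(6, 6, 8, 8), 7.0), (Rect(7, 0, 9, 2), 1.0)])
root = make_internal([leaf1, leaf2])
print(ar_max_query(root, Rect(-1, -1, 4, 4)))  # 5.0: leaf1 is fully contained
```

With the query `Rect(5, 5, 10, 10)`, `leaf2` is only partly covered, so its objects must still be scanned. That is the weakness the k-max optimization targets.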


Our Solution -- overview

• The MR-tree: a specialized index for MIN/MAX aggregation. It is based on the R*-tree and adds four optimization techniques:


k-max: increase the chance that the search algorithm stops at higher tree levels;

box-elimination: erase information from the tree that cannot contribute to any query;

union: do not insert an object that cannot contribute to any query;

area-reduction: reduce the area of the object to be inserted.

The k-max Optimization

• Motivation: the aR-tree is not efficient when the query rectangle intersects, but does not fully contain, a sub-tree rectangle.


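The k-max Optimization

• Along with each index record r, store the k objects with the largest values in sub-tree(r).

• Upon query, if the query rectangle intersects any of the k objects stored at r, omit sub-tree(r). The highest-valued intersecting object among the k is the sub-tree's answer: any object with a larger value is one of the k stored objects ranked above it, and none of those intersects the query.

• Trade-off: a larger k means more sub-trees can be omitted during a query, but it also costs more space and more update work.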
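The k-max query rule can be sketched on the same kind of toy tree. This is an illustration, not the MR-tree code. `KNode`, `make_leaf`, `make_internal` and `kmax_query` are hypothetical names. The top-k list of a node is assumed to be kept sorted by value in descending order.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rect:
    xlo: float
    ylo: float
    xhi: float
    yhi: float

    def intersects(self, o: "Rect") -> bool:
        return self.xlo <= o.xhi and o.xlo <= self.xhi and self.ylo <= o.yhi and o.ylo <= self.yhi


Obj = Tuple[Rect, float]  # (box, value)


def bbox(rects: List[Rect]) -> Rect:
    return Rect(min(r.xlo for r in rects), min(r.ylo for r in rects),
                max(r.xhi for r in rects), max(r.yhi for r in rects))


def top(objs: List[Obj], k: int) -> List[Obj]:
    # The k highest-valued objects, sorted by value in descending order.
    return heapq.nlargest(k, objs, key=lambda o: o[1])


@dataclass
class KNode:
    """Hypothetical node that also keeps the k max-value objects of its sub-tree."""
    box: Rect
    top_k: List[Obj]
    children: List["KNode"] = field(default_factory=list)
    objects: List[Obj] = field(default_factory=list)


def make_leaf(objects: List[Obj], k: int) -> KNode:
    return KNode(bbox([b for b, _ in objects]), top(objects, k), objects=objects)


def make_internal(children: List[KNode], k: int) -> KNode:
    # The top k of a sub-tree always lies within the children's top-k lists.
    pooled = [o for c in children for o in c.top_k]
    return KNode(bbox([c.box for c in children]), top(pooled, k), children=children)


def kmax_query(node: KNode, q: Rect) -> Optional[float]:
    if not node.box.intersects(q):
        return None
    # k-max shortcut: the first stored object (highest value first) that q
    # intersects answers the whole sub-tree, because every object with a larger
    # value is stored before it and does not intersect q.
    for box, value in node.top_k:
        if box.intersects(q):
            return value
    # Otherwise fall back to an ordinary search of this node.
    best: Optional[float] = None
    for box, value in node.objects:
        if box.intersects(q) and (best is None or value > best):
            best = value
    for child in node.children:
        v = kmax_query(child, q)
        if v is not None and (best is None or v > best):
            best = v
    return best


objs1 = [(Rect(0, 0, 2, 2), 5.0), (Rect(1, 1, 3, 3), 4.0)]
objs2 = [(Rect(6, 6, 8, 8), 7.0), (Rect(7, 0, 9, 2), 1.0)]
root = make_internal([make_leaf(objs1, k=1), make_leaf(objs2, k=1)], k=1)
print(kmax_query(root, Rect(2.5, 2.5, 6.5, 6.5)))  # 7.0, answered at the root
```

For any k ≥ 1, this check subsumes the aR-tree's containment test. If the query contains a node's box, it necessarily intersects the sub-tree's max-value object, which lies inside that box.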


The box-elimination Optimization

• Motivation: for objects o1 and o2, if o1.box contains o2.box and o1.value ≥ o2.value, then o2 is obsolete. That is, it does not contribute to any query and can be deleted: any query that intersects o2 also intersects o1, whose value is at least as large.

[Figure: object o1 (value 7) whose box contains object o2 (value 5)]
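The obsolescence test is a one-line predicate. A brute-force filter built on it can serve as a reference for what removing all obsolete objects would mean. This is a hypothetical sketch, not the MR-tree's insertion logic. `is_obsolete` and `drop_obsolete` are illustrative names, and boxes are assumed to have positive area.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    xlo: float
    ylo: float
    xhi: float
    yhi: float

    def contains(self, o: "Rect") -> bool:
        return self.xlo <= o.xlo and o.xhi <= self.xhi and self.ylo <= o.ylo and o.yhi <= self.yhi

    def area(self) -> float:
        return (self.xhi - self.xlo) * (self.yhi - self.ylo)


@dataclass(frozen=True)
class SpatialObject:
    box: Rect
    value: float


def is_obsolete(o2: SpatialObject, o1: SpatialObject) -> bool:
    """True if o1 makes o2 obsolete for MAX queries.

    Any query that intersects o2.box also intersects o1.box, and o1's value is
    at least as large, so dropping o2 never changes a MAX answer.
    """
    return o1.box.contains(o2.box) and o1.value >= o2.value


def drop_obsolete(objects: List[SpatialObject]) -> List[SpatialObject]:
    """Brute-force O(n^2) filter that removes obsolete objects."""
    # Visit likely dominators first (higher value, then larger area), so that
    # of two identical (box, value) objects exactly one survives.
    ordered = sorted(objects, key=lambda o: (-o.value, -o.box.area()))
    kept: List[SpatialObject] = []
    for o in ordered:
        if not any(is_obsolete(o, k) for k in kept):
            kept.append(o)
    return kept


o1 = SpatialObject(Rect(0, 0, 10, 10), 7.0)
o2 = SpatialObject(Rect(2, 2, 5, 5), 5.0)
print(is_obsolete(o2, o1), drop_obsolete([o2, o1]) == [o1])  # True True
```

Domination is transitive, so dropping an object whose only dominator was itself dropped is still safe. The kept object that dominated the dropper also dominates it.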

The box-elimination Optimization

• Similarly for an object o1 and an index record r2: if o1.box contains r2.box and o1.value ≥ the max value in sub-tree(r2), the whole sub-tree is obsolete.

• The optimization: at insertion, remove obsolete objects and sub-trees along the insertion path.

• Ideally, all obsolete objects and sub-trees would be removed, but that is too expensive. Instead, pick c paths (c: a constant).

• Trade-off: a larger c gives a smaller index and faster queries, but also more update time.
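Pruning along the insertion path can be sketched by treating each visited node as a list of entries. Leaf objects and index records are handled uniformly through a stored MAX. This is a hypothetical simplification, not the MR-tree's node layout. `Entry`, `prune_entries` and `prune_insertion_path` are illustrative names. Re-tightening parent boxes and handling node underflow are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    xlo: float
    ylo: float
    xhi: float
    yhi: float

    def contains(self, o: "Rect") -> bool:
        return self.xlo <= o.xlo and o.xhi <= self.xhi and self.ylo <= o.ylo and o.yhi <= self.yhi


@dataclass(frozen=True)
class Entry:
    """A leaf object (max_value = its value) or an index record (max_value = MAX of its sub-tree)."""
    box: Rect
    max_value: float


def prune_entries(entries: List[Entry], new_box: Rect, new_value: float) -> List[Entry]:
    """Drop every entry, object or whole sub-tree, that the new object makes obsolete."""
    return [e for e in entries
            if not (new_box.contains(e.box) and new_value >= e.max_value)]


def prune_insertion_path(path: List[List[Entry]], new_box: Rect, new_value: float) -> List[List[Entry]]:
    """Apply box-elimination to each node visited by the insertion, root to leaf."""
    return [prune_entries(node, new_box, new_value) for node in path]


path = [
    [Entry(Rect(0, 0, 5, 5), 9.0), Entry(Rect(1, 1, 2, 2), 3.0)],  # root
    [Entry(Rect(3, 3, 4, 4), 6.0), Entry(Rect(0, 0, 1, 1), 8.0)],  # leaf
]
print(prune_insertion_path(path, Rect(0, 0, 4, 4), 6.0))
```

Only the nodes on the path are examined, which keeps the extra cost proportional to the tree height. Obsolete entries elsewhere in the tree survive until some later insertion passes through them.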


The union Optimization

• Motivation 1: if a new object o1 is obsolete due to an existing object o2, o1 should not be inserted.


• Motivation 2: a new object o1 may be obsolete due to the union of several existing objects.

[Figure: new object o1 (value 2) and existing objects with values 8 and 7]


The union Optimization

• Along with each index record r, store the union of the boxes of all objects in sub-tree(r); also store the MIN value of all these objects.

• Do not perform the insertion of object o1 if:

o1.box is contained in r.union, and

o1.value ≤ r.min.

(Every point of o1.box then lies in some object whose value is at least r.min ≥ o1.value.)

• Question: how is the union computed and stored?
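The skip-insertion rule needs a test that a rectangle lies inside a union of rectangles. This is a sketch, assuming the stored union is a small list of boxes, each contained in the true union. `UnionSummary`, `box_covered` and `skip_insertion` are hypothetical names, and the grid-cell coverage test is an exact method chosen for illustration, not necessarily the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    xlo: float
    ylo: float
    xhi: float
    yhi: float


def box_covered(box: Rect, union_boxes: List[Rect]) -> bool:
    """Exact test that `box` lies inside the union of `union_boxes`.

    The edges of the union boxes, clipped to `box`, cut it into grid cells.
    Coverage is constant inside each cell, so testing each cell's centre
    suffices. Assumes `box` has positive width and height.
    """
    xs = sorted({box.xlo, box.xhi} | {min(max(c, box.xlo), box.xhi)
                                      for b in union_boxes for c in (b.xlo, b.xhi)})
    ys = sorted({box.ylo, box.yhi} | {min(max(c, box.ylo), box.yhi)
                                      for b in union_boxes for c in (b.ylo, b.yhi)})
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
            if not any(b.xlo <= cx <= b.xhi and b.ylo <= cy <= b.yhi for b in union_boxes):
                return False
    return True


@dataclass
class UnionSummary:
    """Hypothetical per-record data for the union optimization."""
    union_boxes: List[Rect]  # approximation contained in the true union of sub-tree(r)
    min_value: float         # MIN value over all objects in sub-tree(r)


def skip_insertion(box: Rect, value: float, r: UnionSummary) -> bool:
    # o1 is obsolete if its box is covered and its value is at most r.min.
    return box_covered(box, r.union_boxes) and value <= r.min_value


r = UnionSummary([Rect(0, 0, 5, 5), Rect(4, 0, 10, 5)], min_value=7.0)
print(skip_insertion(Rect(2, 1, 8, 4), 2.0, r))  # True: covered, and 2.0 <= 7.0
```

Because the stored boxes lie inside the true union, a positive answer is always safe. A poorer approximation only causes some obsolete objects to be inserted anyway.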


The union Optimization

• Store an approximate representation of the union using t boxes (t: a constant).

• The approximation should be fully contained in the actual union, and should cover as much space as possible.


• Def: given a set of n boxes $S=\{s_1,\ldots,s_n\}$, the covered t-union of S is a set of t boxes $A=\{a_1,\ldots,a_t\}$ such that $\bigcup_i s_i$ covers $\bigcup_j a_j$, and $\bigcup_j a_j$ covers the maximum possible area.


The union Optimization


• To compute the exact covered t-union: $O(n^{2t+4})$.

• We propose a much faster approximate algorithm: $O(n \log n)$.


• Idea of our algorithm: pick the t largest boxes and expand them.
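The slides give only the idea, "pick the t largest boxes and expand them", not the expansion rule. The sketch below therefore supplies a hypothetical rule: push each side of a chosen box outward, one candidate edge coordinate at a time, for as long as the box stays inside the union of S. This is not the authors' algorithm, and with the exact coverage test it costs more than $O(n \log n)$. The function names are illustrative. It does satisfy the covered requirement, since every returned box lies inside the union.

```python
import heapq
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Rect:
    xlo: float
    ylo: float
    xhi: float
    yhi: float

    def area(self) -> float:
        return (self.xhi - self.xlo) * (self.yhi - self.ylo)


def box_covered(box: Rect, boxes: List[Rect]) -> bool:
    """Exact test that `box` (positive area) lies inside the union of `boxes`."""
    xs = sorted({box.xlo, box.xhi} | {min(max(c, box.xlo), box.xhi) for b in boxes for c in (b.xlo, b.xhi)})
    ys = sorted({box.ylo, box.yhi} | {min(max(c, box.ylo), box.yhi) for b in boxes for c in (b.ylo, b.yhi)})
    return all(
        any(b.xlo <= (x0 + x1) / 2 <= b.xhi and b.ylo <= (y0 + y1) / 2 <= b.yhi for b in boxes)
        for x0, x1 in zip(xs, xs[1:])
        for y0, y1 in zip(ys, ys[1:])
    )


def _grow(a: Rect, side: str, candidates: List[float], boxes: List[Rect]) -> Rect:
    # Move one side outward through the candidates, nearest first. Once a
    # candidate fails, every farther one fails too, since it gives a superset box.
    for c in candidates:
        trial = replace(a, **{side: c})
        if not box_covered(trial, boxes):
            break
        a = trial
    return a


def approx_t_union(boxes: List[Rect], t: int) -> List[Rect]:
    """Pick the t largest boxes, then grow each one while it stays inside the union."""
    xs = sorted({c for b in boxes for c in (b.xlo, b.xhi)})
    ys = sorted({c for b in boxes for c in (b.ylo, b.yhi)})
    result = []
    for a in heapq.nlargest(t, boxes, key=lambda b: b.area()):
        a = _grow(a, "xlo", [c for c in reversed(xs) if c < a.xlo], boxes)
        a = _grow(a, "xhi", [c for c in xs if c > a.xhi], boxes)
        a = _grow(a, "ylo", [c for c in reversed(ys) if c < a.ylo], boxes)
        a = _grow(a, "yhi", [c for c in ys if c > a.yhi], boxes)
        result.append(a)
    return result


S = [Rect(0, 0, 4, 4), Rect(4, 0, 6, 4), Rect(10, 10, 11, 11)]
print(approx_t_union(S, 1))  # [Rect(xlo=0, ylo=0, xhi=6, yhi=4)]
```

This rule may return overlapping or duplicate boxes, for example with t = 2 on the data above. A practical version would also try to spread the t boxes over different parts of the union.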

The area-reduction Optimization

• Motivation: the box of a new object o1 can be reduced if an existing object o2 intersects it with a larger or equal value.

[Figure: existing object o2 (value 8) intersecting new object o1 (value 6)]

The area-reduction Optimization

• Reduce the area of new object o1 when:

there exists an index record r such that r.union intersects o1.box and r.min ≥ o1.value, or

one of the k max-value objects intersects o1 with a larger or equal value, or

there exists a leaf object o2 such that o2.box intersects o1.box and o2.value ≥ o1.value.
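The third case, reduction against a single leaf object o2, can be sketched as a rectangle cut. This is a hypothetical illustration, assuming the reduced object must remain a rectangle, so its new box is the bounding box of the part of o1.box not covered by o2.box. `reduce_box` and `area_reduce` are illustrative names. The cut is valid because any query that touches only the removed part intersects o2, whose value is at least o1's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    xlo: float
    ylo: float
    xhi: float
    yhi: float

    def contains(self, o: "Rect") -> bool:
        return self.xlo <= o.xlo and o.xhi <= self.xhi and self.ylo <= o.ylo and o.yhi <= self.yhi


def reduce_box(a: Rect, b: Rect) -> Optional[Rect]:
    """Bounding box of the part of `a` not covered by `b`.

    Returns None if `b` covers `a` entirely. `a` can only shrink when `b` spans
    it completely in one dimension and overlaps one of its ends in the other.
    """
    if b.contains(a):
        return None
    xlo, ylo, xhi, yhi = a.xlo, a.ylo, a.xhi, a.yhi
    if b.ylo <= a.ylo and a.yhi <= b.yhi:      # b spans a vertically
        if b.xlo <= a.xlo < b.xhi:
            xlo = b.xhi                        # cut off a's left part
        elif b.xlo < a.xhi <= b.xhi:
            xhi = b.xlo                        # cut off a's right part
    elif b.xlo <= a.xlo and a.xhi <= b.xhi:    # b spans a horizontally
        if b.ylo <= a.ylo < b.yhi:
            ylo = b.yhi                        # cut off a's bottom part
        elif b.ylo < a.yhi <= b.yhi:
            yhi = b.ylo                        # cut off a's top part
    return Rect(xlo, ylo, xhi, yhi)


def area_reduce(new_box: Rect, new_value: float, old_box: Rect, old_value: float) -> Optional[Rect]:
    """Area reduction of a new object o1 against one existing leaf object o2."""
    if old_value < new_value:
        return new_box                         # o2 needs a larger or equal value
    return reduce_box(new_box, old_box)


o1_box = Rect(0, 0, 10, 4)
print(area_reduce(o1_box, 6.0, Rect(-1, -1, 3, 5), 8.0))  # Rect(xlo=3, ylo=0, xhi=10, yhi=4)
```

The same cut could presumably be applied with each stored union box of a record r when r.min ≥ o1.value. A `None` result means the new object is fully covered, and therefore obsolete.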


The area-reduction Optimization

• Benefit 1: reduce overlap among sibling nodes.

[Figure: a new object (value 8) and index records r1 (min=9) and r2 (min=7)]


• Benefit 2: increase the chance of making new objects obsolete.

[Figure: the same example, showing the actual (reduced) object inserted]

Performance Results

• Dataset: 5 million square objects with sizes chosen randomly from 10 to 10,000 (the space in each dimension ranges from 1 to 1,000,000).

• Implemented algorithms:


R*: the R*-tree [BKS+90];

aR: the aR-tree [PKZ+01, LM01];

kaR: the aR-tree with k-max optimization;

MR: the MR-tree (with all the optimizations).

Index Sizes

[Figure: bar chart of index sizes (MB, axis from 0 to 150) for R*, aR, kaR, and MR]

Query Performance (log scale)

• Query time is the total of 100 random queries of the same query rectangle size.

[Figure: query time (sec, log scale from 0.01 to 10000) vs. query rectangle area (%, from 0.0001 to 50) for R*, aR, kaR, and MR]

Optimizing the MSB-tree

• The MSB-tree [YW00] efficiently maintains and computes MIN/MAX aggregates over one-dimensional interval data.

• Insertion/Query: $O(\log_B m)$, where B is the page capacity and m is the number of leaf records.

• [YW00] periodically reconstructs the whole tree to keep m small. During reconstruction, the index is offline.

• Reconstruction can be avoided by applying the box-elimination optimization. Idea: if a new interval contains all intervals in a sub-tree and has a larger (or equal) value, the sub-tree is obsolete.
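The MSB-tree's internal layout is not described on these slides. The sketch below therefore only illustrates the one-dimensional containment test on plain (lo, hi, value) tuples, plus a brute-force MAX range query for checking that dropping an obsolete sub-tree leaves answers unchanged. `subtree_obsolete` and `max_query` are hypothetical names, and the tuple layout is an assumption, not the MSB-tree's record format.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float, float]  # (lo, hi, value); closed interval


def subtree_obsolete(new: Interval, subtree: List[Interval]) -> bool:
    """1-D box-elimination: the new interval contains every interval in the
    (non-empty) sub-tree, and its value is at least the sub-tree's MAX.

    An index record could store the sub-tree's extent and MAX to make this
    O(1); here both are recomputed from the list.
    """
    lo, hi, value = new
    ext_lo = min(s[0] for s in subtree)
    ext_hi = max(s[1] for s in subtree)
    sub_max = max(s[2] for s in subtree)
    return lo <= ext_lo and ext_hi <= hi and value >= sub_max


def max_query(intervals: List[Interval], qlo: float, qhi: float) -> Optional[float]:
    """Brute-force MAX over the intervals that intersect [qlo, qhi]."""
    hits = [v for lo, hi, v in intervals if lo <= qhi and qlo <= hi]
    return max(hits, default=None)


subtree = [(2, 3, 4.0), (4, 6, 5.0)]
new = (1, 7, 6.0)
print(subtree_obsolete(new, subtree))  # True: the sub-tree can be dropped
```

Any query that reaches an interval in the sub-tree also reaches the new interval, which carries a value at least as large. Removing the sub-tree therefore shrinks m without changing any MAX answer.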


Conclusions

• Addressed the MIN/MAX aggregation problem over spatial objects;

• Four optimization techniques;

• The MR-tree;

• Much smaller index size and query time;

• By-product: optimized the MSB-tree.
