Improving Min/Max Aggregation over Spatial Objects

Donghui Zhang, Vassilis J. Tsotras

University of California, Riverside

ACM GIS’01


Outline

• Problem Definition

• Straightforward Solutions

• Our Solution

• Performance Results

• By-Product: Optimized the MSB-tree

• Conclusions


Problem Definition

• Consider a collection of spatial objects.

• Each object has a rectangle r and a value v.

• Spatial aggregation: find the aggregate value over the objects that intersect a given query rectangle. We focus on MAX.

• Example: in a database of rainfall over geographical areas, find the maximum rainfall in the Los Angeles area.
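
To make the query semantics concrete, here is a minimal brute-force sketch in Python. It assumes each object is a pair (box, value) with box = (xlo, ylo, xhi, yhi), treats rectangles as closed (touching counts as intersecting), and uses made-up coordinates with the values 5, 4, 2, 7 and 1. It scans every object, which is the cost an index is meant to avoid.

```python
from typing import List, Optional, Tuple

# A box is (xlo, ylo, xhi, yhi); rectangles are treated as closed.
Box = Tuple[float, float, float, float]


def intersects(a: Box, b: Box) -> bool:
    """True if closed boxes a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def max_aggregate(objects: List[Tuple[Box, float]], query: Box) -> Optional[float]:
    """MAX value over all objects whose box intersects the query (None if none do)."""
    values = [v for box, v in objects if intersects(box, query)]
    return max(values, default=None)


# Hypothetical objects; only the values come from the slide.
objects = [((0, 0, 2, 2), 5), ((3, 3, 5, 5), 4), ((1, 1, 4, 4), 2),
           ((6, 6, 8, 8), 7), ((9, 0, 10, 1), 1)]
print(max_aggregate(objects, (0, 0, 3, 3)))  # 5
```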

[Figure: example objects with values 5, 4, 2, 7 and 1]


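Straightforward Solutions

• Use an R*-tree [BKS+90] to index the objects.

• Reduce the aggregation to a range search: retrieve all objects that intersect the query rectangle and take the MAX of their values.

• Better approach: the aR-tree [PKZ+01, LM01], which stores the MAX of each sub-tree in the internal nodes.

• If the query rectangle contains a sub-tree's rectangle, the stored MAX is used and the sub-tree need not be searched.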
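
The aR-tree pruning rule can be sketched in a few lines of Python. This is a simplified in-memory model, not the aR-tree's disk layout: the `Node` class and `ar_max` function are hypothetical names, a node with no children stands for a data object, and boxes are closed (xlo, ylo, xhi, yhi) tuples.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi), closed


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Node:
    box: Box                 # MBR of the sub-tree (or the object's own box)
    max_value: float         # MAX over the sub-tree (or the object's value)
    children: List["Node"] = field(default_factory=list)  # empty => data object


def ar_max(node: Node, q: Box) -> Optional[float]:
    """MAX over the objects in node's sub-tree that intersect q."""
    if not intersects(node.box, q):
        return None                      # nothing below can intersect q
    if not node.children or contains(q, node.box):
        return node.max_value            # object hit, or whole sub-tree inside q
    results = [ar_max(c, q) for c in node.children]
    hits = [r for r in results if r is not None]
    return max(hits, default=None)


a = Node((0, 0, 4, 4), 5, [Node((0, 0, 2, 2), 5), Node((2, 2, 4, 4), 3)])
b = Node((6, 6, 10, 10), 9, [Node((6, 6, 7, 7), 9), Node((9, 9, 10, 10), 1)])
root = Node((0, 0, 10, 10), 9, [a, b])
print(ar_max(root, (0, 0, 5, 5)))  # 5: sub-tree a lies fully inside the query
```

When the query only partially overlaps a sub-tree rectangle, the recursion must descend, which is the inefficiency the k-max optimization targets.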


Our Solution: Overview

• The MR-tree: a specialized index for MIN/MAX aggregation. It builds on the R*-tree and adds four optimization techniques:

k-max: increase the chance that the search algorithm stops at higher tree levels;

box-elimination: erase information from the tree that will not contribute to any query;

union: do not insert an object that will not contribute to any query;

area-reduction: reduce the area of the object to be inserted.


Optimization Techniques

The k-max Optimization

• Motivation: the aR-tree is not efficient if the query rectangle intersects, but does not fully contain, a sub-tree rectangle.

[Figure: an example aR-tree annotated with object values and stored MAX values]


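The k-max Optimization

• Along with each index record r, store the k max-value objects in sub-tree(r).

• During a query, if the query rectangle intersects any of the k objects stored at r, omit sub-tree(r): the largest-valued stored object that intersects the query already gives the sub-tree's answer, because the stored objects with larger values miss the query and every other object in sub-tree(r) has a value no larger than it.

• Trade-off: a larger k lets more sub-trees be omitted during a query, but costs more space and update time.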
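
A minimal in-memory sketch of the k-max query rule, assuming the same closed-box convention as before. `Obj`, `Node`, `make_node` and `kmax_query` are hypothetical names; each index record keeps its k largest-valued objects sorted from largest to smallest.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Box = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi), closed


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Obj:
    box: Box
    value: float


@dataclass
class Node:
    box: Box                            # MBR of the sub-tree
    children: List[Union["Node", Obj]]
    topk: List[Obj]                     # k max-value objects of the sub-tree, largest first


def objects_in(n: Union[Node, Obj]) -> List[Obj]:
    return [n] if isinstance(n, Obj) else [o for c in n.children for o in objects_in(c)]


def make_node(children: List[Union[Node, Obj]], k: int) -> Node:
    """Build an index record: MBR of the children plus the k largest-valued objects."""
    boxes = [c.box for c in children]
    mbr = (min(b[0] for b in boxes), min(b[1] for b in boxes),
           max(b[2] for b in boxes), max(b[3] for b in boxes))
    objs = [o for c in children for o in objects_in(c)]
    return Node(mbr, children, heapq.nlargest(k, objs, key=lambda o: o.value))


def kmax_query(n: Union[Node, Obj], q: Box) -> Optional[float]:
    if not intersects(n.box, q):
        return None
    if isinstance(n, Obj):
        return n.value
    for o in n.topk:                    # scanned from the largest value down
        if intersects(o.box, q):
            # Earlier top-k objects miss q and every other object in the
            # sub-tree has a value <= o.value, so the sub-tree is omitted.
            return o.value
    results = [kmax_query(c, q) for c in n.children]
    hits = [v for v in results if v is not None]
    return max(hits, default=None)


o1, o2, o3 = Obj((0, 0, 1, 1), 8), Obj((2, 2, 3, 3), 7), Obj((8, 8, 9, 9), 4)
o4, o5 = Obj((5, 0, 6, 1), 5), Obj((9, 0, 10, 1), 9)
root = make_node([make_node([o1, o2, o3], k=2), make_node([o4, o5], k=2)], k=2)
print(kmax_query(root, (0, 0, 2.5, 2.5)))  # 8, answered from root.topk alone
```

Note that the aR-tree containment test is subsumed here: if the query contains a record's MBR, the record's top object lies inside it and is found by the top-k scan.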


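The box-elimination Optimization

• Motivation: if, for objects o1 and o2, o1.box contains o2.box and o1.value ≥ o2.value, then o2 is obsolete: it does not contribute to any query and can be deleted. (Any query that intersects o2 also intersects o1, whose value is at least as large.)

[Figure: o1 (value 7) and o2 (value 5), with o1.box containing o2.box]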


The box-elimination Optimization

• The same holds for an object o1 and an index record r2: if o1.box contains r2.box and o1.value ≥ the maximum value in sub-tree(r2), the whole sub-tree is obsolete.

• The optimization: at insertion, remove obsolete objects and sub-trees along the insertion path.

• Ideally, all obsolete objects and sub-trees would be removed, but that is too expensive. Instead, examine only c paths (c is a constant).

• Trade-off: a larger c gives a smaller index and faster queries, but also more update time.
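
The elimination test along an insertion path might look like the Python sketch below. It is a simplification under stated assumptions: a node is just a list of hypothetical `Entry` records, each carrying a box and a MAX (the object's value for a leaf entry, the sub-tree MAX for an index record), and the node underflow handling and parent MBR/MAX updates a real R*-tree needs are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi)


@dataclass
class Entry:
    box: Box
    max_value: float  # object value for a leaf entry, sub-tree MAX for an index record


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def is_obsolete(e: Entry, new_box: Box, new_value: float) -> bool:
    """The entry can never change a MAX answer once the new object is present."""
    return contains(new_box, e.box) and new_value >= e.max_value


def eliminate_along_path(path: List[List[Entry]], new_box: Box, new_value: float) -> int:
    """Drop obsolete entries from every node on the path (in place); return how many."""
    removed = 0
    for node in path:
        kept = [e for e in node if not is_obsolete(e, new_box, new_value)]
        removed += len(node) - len(kept)
        node[:] = kept
    return removed


leaf = [Entry((1, 1, 2, 2), 5), Entry((0, 0, 20, 20), 3), Entry((3, 3, 4, 4), 9)]
print(eliminate_along_path([leaf], (0, 0, 10, 10), 7), len(leaf))  # 1 2
```

Only the first entry goes: the second is not contained in the new box, and the third has a larger value (9 > 7).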


The union Optimization

• Motivation 1: if a new object o1 is obsolete due to an existing object o2, o1 should not be inserted.

• Motivation 2: a new object o1 may be obsolete due to the union of several existing objects.

[Figure: new object o1 (value 2) lying within the union of two existing objects with values 8 and 7]


The union Optimization

• Along with each index record r, store the union of the boxes of all objects in sub-tree(r); also store the MIN value of all these objects (r.min).

• Do not perform the insertion of object o1 if:

o1.box is contained in r.union, and

o1.value ≤ r.min.

• Question: how is the union computed and stored?
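
The skip test on this slide can be sketched in Python, assuming r.union is available as a plain list of boxes with positive area. `covered_by_union` and `skip_insertion` are hypothetical names, and the containment check is an exact grid-based test rather than anything taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi), positive area assumed


def covered_by_union(q: Box, boxes: List[Box]) -> bool:
    """Exact check that q lies inside the union of `boxes`.

    Every box edge, clipped to q, becomes a grid line; each grid cell is then
    either entirely inside a box or disjoint from its interior, so testing
    the cell centre suffices.
    """
    xs = sorted({q[0], q[2]} | {min(max(b[i], q[0]), q[2]) for b in boxes for i in (0, 2)})
    ys = sorted({q[1], q[3]} | {min(max(b[i], q[1]), q[3]) for b in boxes for i in (1, 3)})
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
            if not any(b[0] <= cx <= b[2] and b[1] <= cy <= b[3] for b in boxes):
                return False
    return True


def skip_insertion(o_box: Box, o_value: float, r_union: List[Box], r_min: float) -> bool:
    """Union rule for MAX: o is obsolete if it lies inside r.union and o.value <= r.min."""
    return o_value <= r_min and covered_by_union(o_box, r_union)


# Hypothetical union of two existing objects with values 8 and 7 (so r.min = 7).
r_union = [(0, 0, 5, 10), (5, 0, 10, 10)]
print(skip_insertion((2, 2, 8, 8), 2, r_union, r_min=7))  # True
```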


The union Optimization

• Store an approximate union representation using t boxes (t is a constant).

• The approximation should be fully contained in the actual union and should cover as much space as possible.

• Definition: given a set of n boxes S = {s1, …, sn}, the covered t-union of S is a set of t boxes A = {a1, …, at} such that s1 ∪ … ∪ sn covers a1 ∪ … ∪ at, and a1 ∪ … ∪ at covers the maximum possible area.


The union Optimization

• Computing the exact covered t-union takes O(n^(2t+4)) time.

• We propose a much faster approximate algorithm that runs in O(n log n) time.

• Idea of our algorithm: pick the t largest boxes and expand them.
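
The slides only say to pick the t largest boxes and expand them; the expansion rule is not described. The Python sketch below fills that gap with a hypothetical greedy rule: each side of a chosen box is pushed outward to the farthest box-edge coordinate that keeps the enlarged box inside the union of S. This is not the authors' algorithm, and with its naive exact containment check it is far slower than O(n log n); it only illustrates the "contained in the union, as large as possible" goal.

```python
import heapq
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi), positive area assumed


def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def covered_by_union(q: Box, boxes: List[Box]) -> bool:
    """Exact test that q lies inside the union of boxes (grid of clipped box edges)."""
    xs = sorted({q[0], q[2]} | {min(max(b[i], q[0]), q[2]) for b in boxes for i in (0, 2)})
    ys = sorted({q[1], q[3]} | {min(max(b[i], q[1]), q[3]) for b in boxes for i in (1, 3)})
    return all(
        any(b[0] <= (x0 + x1) / 2 <= b[2] and b[1] <= (y0 + y1) / 2 <= b[3] for b in boxes)
        for x0, x1 in zip(xs, xs[1:]) for y0, y1 in zip(ys, ys[1:]))


def approx_t_union(boxes: List[Box], t: int) -> List[Box]:
    """Pick the t largest boxes, then greedily push each side outward.

    HYPOTHETICAL expansion rule: a side may move to any box-edge coordinate as
    long as the enlarged box stays inside the union of all boxes, so every
    returned box is contained in the actual union.
    """
    xs = sorted({b[i] for b in boxes for i in (0, 2)})
    ys = sorted({b[i] for b in boxes for i in (1, 3)})
    result = []
    for chosen in heapq.nlargest(t, boxes, key=area):
        a = list(chosen)
        for side, coords in ((0, xs), (1, ys), (2, xs), (3, ys)):
            # Candidates beyond the current side, farthest first.
            if side < 2:
                candidates = [c for c in coords if c < a[side]]            # ascending
            else:
                candidates = [c for c in reversed(coords) if c > a[side]]  # descending
            for c in candidates:
                trial = list(a)
                trial[side] = c
                if covered_by_union(tuple(trial), boxes):
                    a = trial
                    break
        result.append(tuple(a))
    return result


print(approx_t_union([(0, 0, 3, 2), (3, 0, 4, 2), (0, 0, 1, 1)], t=1))  # [(0, 0, 4, 2)]
```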


The area-reduction Optimization

• Motivation: the box of a new object o1 can be reduced if an existing object o2 intersects it and has a larger or equal value.

[Figure: new object o1 (value 6) partially overlapped by existing object o2 (value 8)]


The area-reduction Optimization

• Reduce the area of a new object o1 when:

there is an index record r such that r.union intersects o1.box and r.min ≥ o1.value, or

one of the k max-value objects stored at an index record intersects o1 and has a larger or equal value, or

there is a leaf object o2 such that o2.box intersects o1.box and o2.value ≥ o1.value.
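
The slides do not say how the reduced box is computed. One natural reading, assumed in the Python sketch below, is to replace o1.box by the bounding box of the part of o1.box not covered by o2.box: any query that touches only the trimmed part must intersect o2, whose value is at least as large, so MAX answers are unchanged. Only the leaf-object rule is shown; `reduce_box` and `area_reduce` are hypothetical names.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi), closed


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def reduce_box(a: Box, b: Box) -> Optional[Box]:
    """Smallest box containing every point of a that is not covered by b.

    Returns None when b covers a entirely (the new object is then obsolete).
    """
    if not intersects(a, b):
        return a
    pieces = []
    if a[0] < b[0]:
        pieces.append((a[0], a[1], b[0], a[3]))      # slab left of b
    if b[2] < a[2]:
        pieces.append((b[2], a[1], a[2], a[3]))      # slab right of b
    xlo, xhi = max(a[0], b[0]), min(a[2], b[2])      # x-range shared with b
    if a[1] < b[1]:
        pieces.append((xlo, a[1], xhi, b[1]))        # below b
    if b[3] < a[3]:
        pieces.append((xlo, b[3], xhi, a[3]))        # above b
    if not pieces:
        return None
    return (min(p[0] for p in pieces), min(p[1] for p in pieces),
            max(p[2] for p in pieces), max(p[3] for p in pieces))


def area_reduce(new_box: Box, new_value: float, old_box: Box, old_value: float) -> Optional[Box]:
    """Leaf-object rule for MAX: shrink the new box only if old_value >= new_value."""
    if old_value >= new_value:
        return reduce_box(new_box, old_box)
    return new_box


print(area_reduce((0, 0, 10, 10), 6, (5, -1, 11, 11), 8))  # (0, 0, 5, 10)
```

If o2 only clips a corner of o1, the uncovered L-shaped region still spans the full box, so no reduction happens; the box shrinks only when o2 covers a full-width or full-height strip.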


The area-reduction Optimization

• Benefit 1: reduce the overlap among sibling nodes.

[Figure: a new object with value 8 next to index records r1 (min = 9) and r2 (min = 7)]


• Benefit 2: increase the chance that new objects become obsolete.

[Figure: the same example, showing the reduced box that is actually inserted]


Performance Results

• Dataset: 5 million square objects, with sizes chosen randomly from 10 to 10,000 (the space in each dimension ranges from 1 to 1,000,000).

• Implemented algorithms:

R*: the R*-tree [BKS+90];

aR: the aR-tree [PKZ+01, LM01];

kaR: the aR-tree with k-max optimization;

MR: the MR-tree (with all the optimizations).
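
A scaled-down version of the dataset can be generated with the Python sketch below. Several details are assumptions, not taken from the slides: "size" is read as the side length, side lengths and positions are drawn uniformly, and each object's value is uniform in [0, 1) because the slides give no value distribution. `make_squares` is a hypothetical helper.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi)


def make_squares(n: int, seed: int = 0, lo: float = 1.0, hi: float = 1_000_000.0,
                 min_side: float = 10.0, max_side: float = 10_000.0) -> List[Tuple[Box, float]]:
    """n random squares inside [lo, hi]^2, each paired with a random value.

    Assumptions: "size" is the side length, drawn uniformly; positions are
    uniform; values are uniform in [0, 1).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        side = rng.uniform(min_side, max_side)
        x = rng.uniform(lo, hi - side)   # keep the square inside the space
        y = rng.uniform(lo, hi - side)
        data.append(((x, y, x + side, y + side), rng.random()))
    return data


sample = make_squares(1000)  # the experiments used 5 million objects
print(len(sample))  # 1000
```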


Index Sizes

[Figure: index sizes (MB, axis from 0 to 150) for R*, aR, kaR and MR]


Query Performance (log scale)

• Query time is the total over 100 random queries with the same query rectangle size.

[Figure: query time (sec, log scale, 0.01 to 10,000) versus query rectangle area (%, 0.0001 to 50) for R*, aR, kaR and MR]


Optimizing the MSB-tree

• The MSB-tree [YW00] efficiently maintains and computes MIN/MAX aggregates over 1-dimensional interval data.

• Insertion and query cost O(log_B m), where B is the page capacity and m is the number of leaf records.

• [YW00] periodically reconstructs the whole tree to keep m small; during reconstruction, the index is offline.

• Reconstruction can be avoided by applying the box-elimination optimization. Idea: if a new interval contains all intervals in a sub-tree and has a larger value, the sub-tree is obsolete.


Conclusions

• Addressed the MIN/MAX aggregation problem over spatial objects;

• Proposed four optimization techniques;

• Combined them in the MR-tree;

• Achieved a much smaller index size and query time;

• By-product: optimized the MSB-tree.
