Performance Comparison of xBR-trees and R*-trees for Single Dataset Spatial Queries


George Roumelis (Master in Information Systems, Open University of Cyprus, Cyprus)

Michael Vassilakopoulos (*)

(Dept. of Computer Science and Biomedical Informatics, University of Central Greece, Greece)

Antonio Corral (Department of Languages and Computing, University of Almeria, Spain)

* speaker

Fifteenth East-European Conference on Advances in Databases and Information Systems

September 19th–23rd, 2011, Vienna, Austria

Outline

Motivation

Contribution and Background

R-tree and R*-tree

xBR-tree, Internal Nodes and Leaf Nodes

Single Dataset query processing on xBR-trees

Experimental Results: Settings, Tree Building, Single Dataset Queries

Conclusions and Future Work

Motivation

In applications, a variety of Spatial Queries arise,

involving one spatial dataset, like Point Location, Window, Distance Range, and Nearest Neighbor Queries

involving two spatial datasets, like Distance Join, Closest Pair, and All-Nearest Neighbor Queries

Usually, there is no overall performance winner. Many researchers have compared different Access Methods regarding their I/O and execution time performance for a variety of Spatial Queries

In this work, we compare the popular R*-tree and the External Balanced Regular (xBR) tree for Single Dataset Spatial Queries


Contribution

We implement the xBR-tree and present conclusions arising from the (real data based) experimental comparison of xBR-trees and R*-trees regarding I/O performance and execution time for

Tree building

Point Location Queries (PLQs)

Window Queries (WQs)

Distance Range Queries (DRQs)

K-Nearest Neighbor Queries (K-NNQs)

Constrained K-Nearest Neighbor Queries (CK-NNQs)

Background (1)

Given an index I and a query point q, the PLQ returns true if q belongs to I and false otherwise

Given an index I and a query rectangle r, the result of the WQ is the set of all points in I that are completely inside r

Given an index I, a query point q and a distance threshold delta >= 0, the DRQ returns all points of I that are within distance delta from q (according to a distance function)
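As a reference for the three definitions above, the following Python sketch answers them by brute force over a plain list of points, with no index at all. It assumes 2-d points, a closed query rectangle given as (xmin, ymin, xmax, ymax) and Euclidean distance. The function names are illustrative only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def plq(points: List[Point], q: Point) -> bool:
    """Point Location Query: True if q is one of the indexed points."""
    return q in points


def wq(points: List[Point], r: Rect) -> List[Point]:
    """Window Query: all points lying inside the rectangle r (borders included)."""
    xmin, ymin, xmax, ymax = r
    return [p for p in points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]


def drq(points: List[Point], q: Point, delta: float) -> List[Point]:
    """Distance Range Query: all points within Euclidean distance delta of q."""
    return [p for p in points if math.dist(p, q) <= delta]
```

An index such as the R*-tree or the xBR-tree must return exactly these answers while reading far fewer points.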


Background (2)

Given an index I, a query point q, and a value K > 0, the K-NNQ returns the K points of I that are closest to q (according to a distance function)

Given an index I, a query point q, a value K > 0 and a distance threshold delta >= 0, the CK-NNQ returns the K closest points of I that are within distance delta from q (according to a distance function)
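The two nearest-neighbor definitions can be sketched in the same brute-force style. The assumptions are the same: a plain point list, Euclidean distance, and illustrative names. When fewer than K points lie within delta, this CK-NNQ version simply returns the ones that do. That is one reasonable reading of the definition, not necessarily the one used in the experiments.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]


def knnq(points: List[Point], q: Point, k: int) -> List[Point]:
    """K-NNQ: the k points closest to q, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))


def cknnq(points: List[Point], q: Point, k: int, delta: float) -> List[Point]:
    """CK-NNQ: the k points closest to q among those within distance delta of q."""
    in_range = [p for p in points if math.dist(p, q) <= delta]
    return heapq.nsmallest(k, in_range, key=lambda p: math.dist(p, q))
```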


R-Tree

Clusters of spatial objects can be recursively grouped into Minimum Bounding Rectangles (MBRs)

Nested MBRs can be organized as a tree (R-tree)

[Figure: nested MBRs R1 to R12 and the corresponding R-tree; R10, R11 and R12 form the upper level, and R1 to R9 are the nodes that contain points]
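The grouping step can be sketched with two helpers. The first computes the MBR of a cluster of points, which gives a leaf-level entry. The second computes the MBR that encloses several child MBRs, which gives the entry one level up. Representing rectangles as (xmin, ymin, xmax, ymax) tuples is an assumption of this sketch. How points are clustered into nodes (the insertion and split policy) is not shown.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr_of_points(points: Iterable[Tuple[float, float]]) -> Rect:
    """Minimum Bounding Rectangle of a cluster of points (a leaf-level MBR)."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_of_rects(rects: Iterable[Rect]) -> Rect:
    """MBR enclosing several child MBRs (an upper-level, nested MBR)."""
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))
```

Applying `mbr_of_rects` level by level, up to a single root MBR, yields the nested structure shown in the figure.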

R*-Tree


The R*-tree is the most popular R-tree variation

It adds two major enhancements to the original R-tree, both applied when a node overflows

First, rather than considering only the area, the node-splitting algorithm of the R*-tree also minimizes the perimeter and the overlap enlargement of the MBRs

Second, an overflowing node is not split immediately; instead, a portion of its entries is reinserted from the top of the R*-tree (forced reinsertion)
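The split criteria can be illustrated with a simplified, in-memory sketch of an R*-tree-style split. The split axis is the one whose candidate distributions have the smallest total perimeter (margin). On that axis, the distribution with the least overlap is chosen, with ties broken by total area. The minimum fill m, the (xmin, ymin, xmax, ymax) rectangle layout and all function names are assumptions of this sketch. Forced reinsertion is not shown.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    """Perimeter of r (called the margin in the R*-tree literature)."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def cover(rs: List[Rect]) -> Rect:
    return (min(r[0] for r in rs), min(r[1] for r in rs),
            max(r[2] for r in rs), max(r[3] for r in rs))


def rstar_split(entries: List[Rect], m: int) -> Tuple[List[Rect], List[Rect]]:
    """Split an overflowing node's entries into two groups of at least m entries each."""
    def distributions(axis: int):
        # Sort by lower then upper coordinate on the axis, and by upper then lower.
        for key in (lambda r: (r[axis], r[axis + 2]), lambda r: (r[axis + 2], r[axis])):
            s = sorted(entries, key=key)
            for k in range(m, len(s) - m + 1):
                yield s[:k], s[k:]

    # 1. Axis whose distributions have the smallest sum of margins (perimeters).
    best_axis = min((0, 1), key=lambda ax: sum(
        margin(cover(g1)) + margin(cover(g2)) for g1, g2 in distributions(ax)))
    # 2. Distribution with the least overlap; ties broken by total area.
    return min(distributions(best_axis), key=lambda d: (
        overlap(cover(d[0]), cover(d[1])),
        area(cover(d[0])) + area(cover(d[1]))))
```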

xBR-Tree

xBR-trees can be defined for various dimensions

For ease of exposition, we assume 2 dimensions

For 2 dimensions, the hierarchical decomposition of space is that of Quadtrees

The space indexed by an xBR-tree is a square, expressed in a coordinate system of real numbers

The nodes of xBR-trees are disk pages of two kinds: leaves, which store the actual multidimensional points, and internal nodes, which provide a multiway indexing mechanism for these data

Internal Nodes

Each internal node consists of a variable (not predefined) number of entries of the form <shape, address, REG, pointer>

An address is formed by directional digits (NW, NE, SW, SE); it determines the region of a child node and is accompanied by the pointer to this child

Shape is a flag that determines whether the region of the child is a complete square (this is widely used in queries)

REG stores the coordinates of the region referenced by the address (storing them is cheaper than recalculating them from the address)

The region of a child is the subquadrant of its address minus the subquadrants of the next addresses in the same node

An xBR-tree example

The region of the root is the original space (a quadrangle)

The * symbol denotes the end of an address

The address of the right child is 0*, since the region of this child is the NW quadrant of the original space

The address of the left child is *: its region is the whole space minus the region of the right child
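The addressing rules of the internal nodes and this root example can be sketched in Python. Several details are assumptions of the sketch. The digit encoding 0=NW, 1=NE, 2=SW, 3=SE is assumed, since the example only confirms that 0 denotes NW. Square regions are half-open and stored as (xmin, ymin, side). `in_region` is a literal reading of the rule that a child's region is its subquadrant minus the subquadrants of the next addresses in the same node.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Square = Tuple[float, float, float]  # (xmin, ymin, side)

# Assumed digit encoding (only 0 = NW is confirmed): 0=NW, 1=NE, 2=SW, 3=SE,
# as (dx, dy) offsets in units of the halved side; y grows northwards.
OFFSETS = {"0": (0, 1), "1": (1, 1), "2": (0, 0), "3": (1, 0)}


def subquadrant(address: str, space: Square) -> Square:
    """Square named by an address: each digit halves the side and picks a quadrant; '*' ends it."""
    x, y, side = space
    for digit in address.rstrip("*"):
        side /= 2
        dx, dy = OFFSETS[digit]
        x, y = x + dx * side, y + dy * side
    return (x, y, side)


def in_square(p: Point, sq: Square) -> bool:
    """Half-open test [x, x+side) x [y, y+side), so sibling quadrants do not overlap."""
    x, y, side = sq
    return x <= p[0] < x + side and y <= p[1] < y + side


def in_region(addresses: List[str], i: int, p: Point, space: Square) -> bool:
    """Region of entry i = its subquadrant minus the subquadrants of the next entries."""
    return in_square(p, subquadrant(addresses[i], space)) and not any(
        in_square(p, subquadrant(a, space)) for a in addresses[i + 1:])
```

Take the root of the example as the entry list ["*", "0*"] over a square of side 8. A point in the NW quadrant then belongs to the right child (0*), and every other point belongs to the left child (*).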


Processing PLQs and WQs

Queries are processed in a top-down manner on the xBR-tree, as on the R*-tree

During a PLQ for a specified point, the appropriate leaf is determined by descending the tree from the root (unlike in the R*-tree, a single path is followed)

During a WQ, we examine whether the subquadrant of the current internal-node entry intersects the query window and, if so, follow the pointer to the related child

We repeat until we have examined all entries of the internal node, or until the query window is completely inscribed inside the region of the entry that we examine
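Below is a hypothetical in-memory sketch of PLQ and WQ processing on a tiny xBR-like tree. It rests on several assumptions: addresses are interpreted relative to the whole original space, digits use the encoding 0=NW, 1=NE, 2=SW, 3=SE, and the entries of a node partition its region. `Leaf`, `Internal`, `plq` and `wq` are illustrative names, and the shape/REG fields and disk pages are omitted. Scanning entries from the last one is a choice of this sketch. With that order, the stop test ("the window is inscribed in the region of the entry") reduces to a test against the entry's subquadrant.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Point = Tuple[float, float]
Square = Tuple[float, float, float]        # (xmin, ymin, side), half-open
Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax), closed
# Assumed digit encoding: 0=NW, 1=NE, 2=SW, 3=SE, as (dx, dy) offsets in half-sides.
OFFSETS = {"0": (0, 1), "1": (1, 1), "2": (0, 0), "3": (1, 0)}


@dataclass
class Leaf:
    points: List[Point] = field(default_factory=list)


@dataclass
class Internal:
    entries: List[Tuple[str, "Node"]]      # (address, child) pairs in node order


Node = Union[Leaf, Internal]


def subquadrant(address: str, space: Square) -> Square:
    """Square named by an address such as '0*' or '*', relative to the whole space."""
    x, y, side = space
    for digit in address.rstrip("*"):
        side /= 2
        dx, dy = OFFSETS[digit]
        x, y = x + dx * side, y + dy * side
    return (x, y, side)


def plq(node: Node, q: Point, space: Square) -> bool:
    """Descend a single path: the child is the last entry whose subquadrant holds q."""
    while isinstance(node, Internal):
        for address, child in reversed(node.entries):
            x, y, s = subquadrant(address, space)
            if x <= q[0] < x + s and y <= q[1] < y + s:
                node = child
                break
        else:
            return False                   # q lies outside the indexed space
    return q in node.points


def wq(node: Node, r: Rect, space: Square) -> List[Point]:
    """Collect the points inside window r, skipping entries whose subquadrant misses r."""
    if isinstance(node, Leaf):
        return [p for p in node.points if r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]]
    found: List[Point] = []
    # Scanning from the last entry: once r lies inside an entry's subquadrant,
    # no earlier entry's region can meet r, so the scan can stop.
    for address, child in reversed(node.entries):
        x, y, s = subquadrant(address, space)
        if x <= r[2] and r[0] < x + s and y <= r[3] and r[1] < y + s:
            found += wq(child, r, space)
            if x <= r[0] and r[2] < x + s and y <= r[1] and r[3] < y + s:
                break
    return found
```

Consider the example root (entries * and 0* over a square of side 8). A window that lies inside the NW quadrant is answered from the 0* child alone, because the scan stops there.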


Processing DRQs

The DRQ follows the same strategy as the WQ

First, the query circle is replaced by its MBR (this makes the calculations faster); if the subquadrant of the current entry intersects the query MBR, we follow the pointer to the related child at the next lower level

If we reach a leaf with a region that intersects the query MBR, we select the points of the leaf that are inside the query circle
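The two DRQ steps can be sketched as follows: prune with the MBR of the query circle, then refine with the exact circle at the leaves. For brevity, the sketch skips the descent and receives the candidate leaves as (subquadrant, points) pairs. Using a leaf's subquadrant instead of its exact region only over-approximates the region, so no answer is lost. All names and the (xmin, ymin, side) square layout are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)
Square = Tuple[float, float, float]        # (xmin, ymin, side), half-open


def circle_mbr(q: Point, delta: float) -> Rect:
    """MBR of the query circle: a cheap stand-in for the circle during the descent."""
    return (q[0] - delta, q[1] - delta, q[0] + delta, q[1] + delta)


def square_meets_rect(sq: Square, r: Rect) -> bool:
    """Intersection test between a half-open subquadrant and a closed rectangle."""
    x, y, s = sq
    return x <= r[2] and r[0] < x + s and y <= r[3] and r[1] < y + s


def drq_leaves(leaves: List[Tuple[Square, List[Point]]], q: Point, delta: float) -> List[Point]:
    """DRQ over candidate leaves: filter with the circle's MBR, refine with the circle."""
    box = circle_mbr(q, delta)
    result: List[Point] = []
    for quad, points in leaves:
        if square_meets_rect(quad, box):                              # filter step
            result += [p for p in points if math.dist(p, q) <= delta]  # refinement step
    return result
```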


Processing K-NNQs

The K-NNQ algorithm follows a depth-first (DF) tree traversal

In an internal node, entries are visited in ascending order of their mindist from the query point

The process is repeated recursively until the leaf level is reached, where a potential next NN is found

On reaching a leaf, the next NN may still lie in a neighboring region. Thus, we use a global max K-heap and insert into it every point of this leaf that is nearer to the query point than the root of the heap

When the heap is full, the search is stopped according to a set of conditions*

*Roumelis et al., Nearest Neighbor Algorithms Using xBR-Trees, PCI 2011
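Below is a sketch of a depth-first K-NN search with a global max-heap of size K. Each internal entry is assumed to be described by its subquadrant, stored as (xmin, ymin, side). The subquadrant contains the entry's region, so its mindist is a valid lower bound. The pruning rule is the classic branch-and-bound rule, not the exact set of stopping conditions of the cited PCI 2011 paper: stop visiting children whose mindist is not smaller than the current K-th distance. An optional threshold delta turns the same search into a CK-NNQ. `Leaf`, `Internal`, `mindist` and `knn` are illustrative names.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Tuple, Union

Point = Tuple[float, float]
Square = Tuple[float, float, float]        # (xmin, ymin, side)


@dataclass
class Leaf:
    points: List[Point]


@dataclass
class Internal:
    entries: List[Tuple[Square, "Node"]]   # (subquadrant, child) pairs


Node = Union[Leaf, Internal]


def mindist(q: Point, sq: Square) -> float:
    """Smallest distance from q to the square (0 if q lies inside it)."""
    x, y, s = sq
    dx = max(x - q[0], 0.0, q[0] - (x + s))
    dy = max(y - q[1], 0.0, q[1] - (y + s))
    return math.hypot(dx, dy)


def knn(root: Node, q: Point, k: int, delta: float = math.inf) -> List[Point]:
    """Depth-first K-NNQ (a CK-NNQ when delta is finite), nearest point first."""
    heap: List[Tuple[float, Point]] = []   # max-heap of size <= k, stored as (-dist, point)

    def visit(node: Node) -> None:
        if isinstance(node, Leaf):
            for p in node.points:
                d = math.dist(p, q)
                if d > delta:
                    continue               # CK-NNQ: outside the distance threshold
                if len(heap) < k:
                    heapq.heappush(heap, (-d, p))
                elif d < -heap[0][0]:      # nearer than the root (current K-th neighbor)
                    heapq.heapreplace(heap, (-d, p))
            return
        # Visit children in ascending mindist order; later children are at least as far.
        for dist, child in sorted(((mindist(q, sq), c) for sq, c in node.entries),
                                  key=lambda t: t[0]):
            if dist > delta or (len(heap) == k and dist >= -heap[0][0]):
                break                      # no point of this or later children can qualify
            visit(child)

    visit(root)
    return [p for _, p in sorted(heap, reverse=True)]
```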


Experimental Results

Experimental settings

We used 5 real datasets of different sizes (CSN: 98022 line-segments, NApp: 24493 points, NAcl: 9203 points, Narr: 191637 line-segments, Nard: 569120 line-segments)

To create 2d point datasets from non-point datasets, we used the centroids of the line-segment MBRs

Environment used: Linux machine, Intel Core Duo 2x2GHz processor, 3 GB of RAM, gcc

Performance measurements: I/O activity (page accesses) and execution time

An extended set of experimental results is accessible from: http://delab.csd.auth.gr/~michalis/xBRsys/results
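The point-generation step can be sketched as below. For a single line segment, the centre of its MBR coincides with the segment's midpoint. The tuple layout and the function name are assumptions of this sketch.

```python
from typing import Tuple

Segment = Tuple[Tuple[float, float], Tuple[float, float]]


def segment_mbr_centroid(seg: Segment) -> Tuple[float, float]:
    """Centre of the MBR of a line segment, used as the 2-d point that represents it."""
    (x1, y1), (x2, y2) = seg
    xmin, xmax = min(x1, x2), max(x1, x2)
    ymin, ymax = min(y1, y2), max(y1, y2)
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)
```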


Tree Construction

In all cases, the xBR-tree requires less space (i.e., it is more compact) and less construction time than the R*-tree (the R*-tree is built more slowly partly because of forced reinsertion, which improves searching efficiency)

The difference in construction time grows as the node size increases

Point Location Query

The xBR-tree needs fewer read accesses and executes every query faster than the R*-tree

The xBR-tree needs a number of disk accesses equal to its tree height, while the R*-tree needs at least this number of accesses and, in most cases, more (since more than one path may have to be followed)

Window Query (1)

The xBR-tree needs more page accesses than the R*-tree

As the node size increases, the I/O difference between the two trees becomes smaller

In both trees, the number of accesses shows a linear dependence on the node size

This is due to the reduction of tree height as the node size increases

Window Query (2)

Query windows containing points (non-empty windows) were used

For all node sizes, the xBR-tree is clearly faster in execution time, and the I/O efficiency of the two trees is closer

Main memory processing is simpler (and thus faster) for the xBR-tree

Distance Range Query

The xBR-tree needs fewer disk accesses and is faster than the R*-tree, in all cases and for all datasets

The results were even better when the DRQs addressed only non-empty regions

K-Nearest Neighbor Query

The xBR-tree needs more disk accesses than the R*-tree, but the difference gets smaller as the node size increases

Regarding execution time, the xBR-tree performs better than its I/O difference from the R*-tree would suggest

Both trees show their worst time performance for the larger node sizes. This is because, as the node size increases, the trees become very wide and very short

Constrained K-NN Query

Studying the results of several CK-NNQ experiments, we see that, compared with K-NNQs, the xBR-tree performs better in both performance categories (I/O and execution time)

Depending on the dataset, for non-empty regions the CK-NNQ time performance of the xBR-tree is almost the same as, or much better than, that of the R*-tree

Conclusions

We performed an extensive (real data based) experimental comparison of the performance of xBR-trees and R*-trees for spatial queries where a single index is involved

The conclusions arising from this comparison show that the two structures are competitive

The xBR-tree is smaller and is built faster than the R*-tree

The performance of the xBR-tree is higher for PLQs and DRQs, and for WQs when the query window is non-empty

The R*-tree is better for K-NNQs and needs fewer disk accesses for CK-NNQs

The execution time winner for CK-NNQs depends on whether the query returns result points or not

Future Work

To extend the xBR-tree to model empty regions too

To use in the xBR-tree an internal-node address representation that facilitates main-memory computation

To study the relative performance of the two structures using memory-consuming best-first (BF) algorithms

To study the relative performance of the two trees for two-dataset (join) queries

To study the relative performance of the two structures in the presence of buffering

Thank you for your attention

http://users.ucg.gr/~mvasilako
