efficient olap operations for spatial data using p-trees baoying wang, fei pan, dongmei ren, yue...

26
Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Upload: april-lawrence

Post on 05-Jan-2016

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Efficient OLAP Operations

for Spatial Data Using P-Trees

Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding

William Perrizo

North Dakota State University

Page 2: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

OUTLINE

Introduction Review Of Peano Trees (P-trees) OLAP Operations Using P-trees

Peano Data Cubes (PD-Cubes) OLAP Operations

Performance Analysis Conclusion

Page 3: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

INTRODUCTION

Efficient OLAP for spatial data warehouses is in great demand

Spatial warehouses is growing with more and more spatial data, such as remotely sensed images, geographical information, digital sky survey data

The data in a warehouse are conceptually modeled as data cubes (Gray et al, 1997)

Page 4: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

INTRODUCTION (Cont.)

OLAP queries are complex and time consuming

Two major approaches to speed up OLAP Using index structures Operating on compressed data.

Bitmap index are space inefficient for high cardinality attributes, and are only suitable for narrow domains.

Page 5: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Our Approach on OLAP

A new data warehousing structure, PD-cube, is developed to facilitate OLAP operations and queries

Fast logical operations of P-Trees are used to accomplish OLAP operations.

Predicate P-trees are used to efficiently reduce data accesses by filtering out “big holes” consisting of consecutive 0’s

Page 6: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

REVIEW OF PEANO TREES (Ptrees)

The Ptree is a quadrant-based tree structure (assuming a 2-dimensional image; more generally, for n-dimensional data, an n-polytant tree)

It is used to facilitate compression and very fast logical operations on bit sequential (bSQ) data (Perrizo, 2001) Ptrees can be 1-dimensional, 2-dimensional, 3-dimensional…

The most useful form of a Ptree is the predicate-Ptree: e.g., Pure1 Ptree (P1tree) and NonPure0 Ptree (NP0-tree)

Page 7: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

bSQ File and a Pure1 tree (P1-tree)

0

1 0 0 0

0 0 1 0 1 1 0 1

1 1 1 0 0 0 1 0 1 1 0 1

0

01 1 1 1 1 1 0 01 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0

1 0

0 1 1 1 1

1 1 1 0

0

0 0 1 0

0

1 1

0

0 1

0

P1-tree: Tree node=1 iff that sub-quadrant is purely 1-bits.

1111110011111000111111001111111011110000111100001111000001110000

Page 8: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

55

16 8 15 16

3 0 4 1 4 4 3 4

1 1 1 0 0 0 1 0 1 1 0 1

An Count Ptree(NOTE: usually counts are the ultimate goal, but Pure1 trees are easier to work with and produce the needed counts quite quickly)

Peano or Z-ordering Pure (Pure-1/Pure-0) quadrant Root Count

Level Fan-out QID (Quadrant ID)

1 1 1 1 1 1 0 01 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1

0 1 2 3

111

( 7, 1 ) ( 111, 001 ) 10.10.11

2

3

2 . 2 . 3

001

Page 9: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

BSQ File and a NP0-tree

NP0-tree: Node=1 iff that sub-quadrant is not pure zero. (more general; <predicate>-Ptree: node=1 iff sub-quad satisfies <predicate>

1 1 1 1 1 1 0 01 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0

1

1 1 1 0

1 0 1 1 1 1 1 1

1 1 1 0 0 0 1 0 1 1 0 1

Page 10: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Logical Operations of P-trees

Operations are level by level Consecutive 0’s holes can be filtered out We only need to load quadrant with Qid 2 for

ANDing NP0-tree1 and NP0-tree2.

Page 11: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

OLAP OPERATIONS USING P-TREES

1. Peano Data Cube (PD-cube)

2. OLAP Operations

1) Slice/Dice

2) Rollup

3. Performance Analysis

Page 12: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Peano Data Cube (PD-cube)

The data cube is partitioned by bit position

Each bit-wised data cube is in Peano order Take advantage of the continuity and sparseness of

spatial data

An example: a 3-D data cube representing the crop yield with three dimensions: X-coordinate, Y-coordinate, and time T.

Page 13: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

A Fact Table and the PD-cubesX Y T Yield

0 0 0 15 (1111)

1 0 0 15 (1111)

0 1 0 15 (1111)

1 1 0 15 (1111)

0 0 1 15 (1111)

1 0 1 15 (1111)

0 1 1 15 (1111)

1 1 1 15 (1111)

2 0 0 15 (1111)

3 0 0 4 (0100)

2 1 0 1 (0001)

3 1 0 12 (1100)

2 0 1 12 (1100)

3 0 1 2 (0010)

2 1 1 12 (1100)

3 1 1 12 (1100)

0 2 0 15 (1111)

1 2 0 15 (1111)

0 3 0 2 (0010)

1 3 0 0 (0000)

0 2 1 15 (1111)

1 2 1 15 (1111)

0 3 1 2 (0010)

Page 14: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

OLAP Query Examples

“Find all galaxies brighter than magnitude 22.”

“Find average crop yield in a field. ”

“Find area of the region with the color red.”

“Find total traffic flow during a given period.”

Page 15: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Slice/Dice Operations

Typical select statements may have a number of predicates in their “where” clause.

The predicates may include “=”, “<” and “>”.

These predicates lead to two different query scenarios: equal queries (“=”) and range queries (“<” or “>”).

Page 16: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Equal Select Slice Example

Suppose we have a 3-D data cube representing crop yield with dimensions X, Y and T, where X = {0, 1}, Y = {0, 1} and T = {0, 1}.

X Y T Yield

0 0 0 111

1 0 0 011

0 1 0 001

1 1 0 100

0 0 1 011

1 0 1 010

0 1 1 100

1 1 1 001

1 4

7 3

3 2

1

Page 17: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Ptrees for 3-D Cube Example

Pij is a Ptree for the jth bit of the ith attribute.

Page 18: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Slice: “Get yield where Y = 1”

First get Ptree masks, and then trim all Ptrees accordingly.

1 4

7 3

3 2

1

Page 19: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Range Slice: Get yield where Y >1001

Data set {“Y > 1001”} consists of two subsets {“Y = 11**”} and {“Y = 101*”}, where * is 1 or 0.

The query clause can be written as “where Y = 11** || Y = 101*”.

The query is retrieved by Ptree mask

PMgt = PM1 || PM2.

PM1 = P21 & P22

PM2 = P21 & P’22 & P27

11111110 1101110010111010-------1001

11**

101*

P21 & P22

P21 & P’22 & P23

Page 20: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Other Properties of Range Queries

Combination of an Equality Query and a Range Query

Divide {“T01011001”} into 2 subsets {“T>01011001”}, {“T=01011001”}

Complement of a Range Query

Data set {“T01011001”} is the complement of {“T>01011001”}

With the result of query “Get yield where T>01011001, we can easily retrieve query “Get yield where T01011001” by making the complement, i.e. PMle = PM’gt.

Page 21: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Rollup Operations

PD-cube is stored in Peano order rather than in raster order. Therefore, the rollup of PD-cube is accomplished differently from the traditional data cube as a result of different storage models.

According to the Peano storage of PD-cube, we develop the recursive rollup algorithm.

Page 22: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Rollup of “Yield” along Dimension T

S2[ ] = {1, 1, 1, 0} S1 [ ] = {0, 1, 1, 1} S0 [ ] = {2, 1, 1, 1} S [ ] = {8, 7, 7, 7}

S[i] = S2[i] x 22 + S1[i] x 21 + S0[i] x 20

0 1

1 0

0

0 0

1 1

0

1 0

1 1

1

001 100

111 011

011 010 0 0 1 1 1 0

001

1 1

0 1

2 1

4 4

3

1

1

7

0 1

1 0

0

0 0

1 1

0

1 0

1 1

1

8 7

0 0 1 1 1 0 3 2

1 0 1 1 1 1 7 7

Page 23: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

PERFORMANCE ANALYSIS:

Compare our algorithm with bitmap indexed data cube method

The data is prepared in five sizes, 128x128, 256x256, 512x512, 1024x1024, and 2048x2048.

Response Time Comparison

100

1000

10000

100000

100 1000 10000 100000 1000000

Cube Size (KB)

Res

po

nse

Tim

e (m

s)

Bitmap PD-cube

When cube size > 1900KB, our method outperforms bitmap indexed data cube method.

As the cube size increases, there is a drastic increase in response time for bitmap indexed data cube method.

Page 24: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

CONCLUSION

A general spatial data warehousing structure, PD-cube, is presented to facilitate OLAP operations.

The fast logical operations of Ptrees are used to accomplish these operations.

Predicate Ptrees are used to find the “big holes” of consecutive 0’s by performing logical operations.

Experiments indicate OLAP operations using Ptrees is much faster than traditional data cube methods.

Page 25: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

FUTURE WORK

One future research direction is to extend our PD-cube into parallel data warehouse systems. It appears to be particularly promising to partition

large cubes horizontally or vertically (or both) into small cubes to improve the query performance through parallelism.

Page 26: Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

Thank you.