Indexing for Multidimensional Data An Introduction

Upload: paul-horn

Post on 31-Dec-2015


Page 1: Indexing for Multidimensional Data An Introduction

Indexing for Multidimensional Data

An Introduction

Page 2: Indexing for Multidimensional Data An Introduction

Advanced Data Structures, Jaruloj Chongstitvatana

Applications of Multidimensional Databases

• Databases with multiple-attribute key

• Spatial databases

• Geographic information system (GIS)

• Computer-aided design (CAD)

• Multimedia databases

• Medical applications

Page 3: Indexing for Multidimensional Data An Introduction

Characteristics of Good Index Structures

• Dynamic
• Operations
  – Queries
    • Point queries
    • Range queries
    • Spatial queries
  – Insert
  – Delete
• Simplicity
• Performance
  – Disk accesses
  – Running time
  – Storage utilization
    • Low % of wasted space
    • Memory
    • Disk
• Scalability
  – Data size
  – Data dimension
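To make the query types concrete, here is a minimal sketch of a point query and a range query over 2-D points. It uses a plain linear scan with no index at all, so it only shows what each query returns, not how an index would answer it. The function names and the sample points are illustrative assumptions, not taken from the slides.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def point_query(points: List[Point], target: Point) -> bool:
    """Exact-match query: is `target` one of the stored points?"""
    return target in points

def range_query(points: List[Point], low: Point, high: Point) -> List[Point]:
    """Return every point inside the axis-aligned box [low, high]."""
    return [p for p in points
            if all(lo <= c <= hi for c, lo, hi in zip(p, low, high))]

# Hypothetical sample data.
points = [(1, 1), (2, 5), (4, 3), (7, 8)]
print(point_query(points, (4, 3)))          # True
print(range_query(points, (1, 1), (4, 4)))  # [(1, 1), (4, 3)]
```

An index structure aims to return the same answers while touching far fewer points than this scan does.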

Page 4: Indexing for Multidimensional Data An Introduction

Why Hierarchical Structures

ADVANTAGES

• Allow the search to focus on the interesting subset of the data
• Eliminate useless search
• Clean and simple implementation

DISADVANTAGES

• Parallelism is harder to achieve

Page 5: Indexing for Multidimensional Data An Introduction

Types of Data

• Multidimensional point data
  – Database with multiple-attribute key
  – Point in 2D or 3D
• Interval data
• Multidimensional region data
• High-dimensional point data
  – Data mining

Page 6: Indexing for Multidimensional Data An Introduction

Comparison

Binary tree
• Binary (two-way) branching
• Unbalanced
• Organizes the data
• Memory-based index
  – Cost measured by running time
• Practical memory size

B+ tree
• N-ary tree
• Height-balanced
• Organizes the data space
• Disk-based index
  – Cost measured by the number of disk accesses
• Disk page size

Page 7: Indexing for Multidimensional Data An Introduction

Binary tree

[Figure: example binary tree with keys 10, 4, 9, 20, 6, 7]

Page 8: Indexing for Multidimensional Data An Introduction

B+ tree

[Figure: example B+ tree; extracted key values: 6, 11, 14, 48, 19, 22 and 16, 31]

Page 9: Indexing for Multidimensional Data An Introduction

B+ tree

• N-ary tree
• Increases the breadth of the tree to decrease its height
• Used for indexing large amounts of data (stored on disk)
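The effect of breadth on height can be checked with a short calculation. The sketch below assumes an idealised tree in which every node has exactly `fanout` children, and counts the levels needed to reach a given number of items. The function name and the figures of one million items and a fan-out of 100 are illustrative assumptions.

```python
def min_height(num_items: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= num_items."""
    height, capacity = 0, 1
    while capacity < num_items:
        capacity *= fanout
        height += 1
    return height

for fanout in (2, 100):
    print(fanout, min_height(1_000_000, fanout))
# 2 20
# 100 3
```

If each level costs roughly one disk access, the wider tree reaches an item in 3 accesses where the binary tree needs 20.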

Page 10: Indexing for Multidimensional Data An Introduction

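Example

[Figure: example B+ tree with three levels]

Root: 12 52 78

Internal nodes (left to right): 4 8 | 19 26 37 46 | 60 69 | 83 91

Leaves (left to right, one line per internal node):
0 1 2 | 5 6 7 | 8 9 11 12
13 14 17 19 | 20 21 22 26 | 27 28 31 35 | 38 44 45 | 49 50
54 56 57 59 60 | 61 62 66 67 | 70 71 76 77
79 80 81 82 83 | 85 86 90 | 93 95 97 98 99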

Page 11: Indexing for Multidimensional Data An Introduction

Properties of B+ trees

For an M-ary B+ tree:
• The root has up to M children.
• Non-leaf nodes store up to M-1 keys and have between M/2 and M children, except the root.
• All data items are stored at the leaves.
• All leaves have the same depth and store between L/2 and L data items.
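The occupancy rules above can be written as simple checks. This is a sketch under one assumption the slide does not settle: M/2 and L/2 are rounded up when M or L is odd, which is a common textbook choice. The function and parameter names are hypothetical.

```python
import math

def max_keys(m: int) -> int:
    """A non-leaf node of an M-ary tree stores at most M-1 keys."""
    return m - 1

def internal_node_ok(num_children: int, m: int, is_root: bool = False) -> bool:
    """Child-count rule: at most M children; at least M/2 unless the node is the root."""
    if num_children > m:
        return False
    # Assumption: M/2 is rounded up.
    return is_root or num_children >= math.ceil(m / 2)

def leaf_ok(num_items: int, leaf_capacity: int) -> bool:
    """Leaf rule: between L/2 (rounded up, by assumption) and L data items."""
    return math.ceil(leaf_capacity / 2) <= num_items <= leaf_capacity

print(max_keys(5))                            # 4
print(internal_node_ok(3, 5))                 # True
print(internal_node_ok(2, 5))                 # False
print(internal_node_ok(2, 5, is_root=True))   # True
print(leaf_ok(6, 5))                          # False
```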

Page 12: Indexing for Multidimensional Data An Introduction

Search

[Figure: the example B+ tree with root keys 12, 52, 78]

Search for 66
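The search for 66 can be traced with a short sketch. The tree below is transcribed from the extracted figure, so the key values are only as reliable as that extraction. The representation (tuples for internal nodes, lists for leaves) and the convention that keys less than or equal to a separator lie to its left are assumptions inferred from the figure. The leaf 8 9 11 12 is copied as extracted even though key 8 does not fit that convention.

```python
from bisect import bisect_left

# Internal node: (separator keys, children). Leaf: sorted list of keys.
TREE = ([12, 52, 78], [
    ([4, 8], [[0, 1, 2], [5, 6, 7], [8, 9, 11, 12]]),
    ([19, 26, 37, 46], [[13, 14, 17, 19], [20, 21, 22, 26],
                        [27, 28, 31, 35], [38, 44, 45], [49, 50]]),
    ([60, 69], [[54, 56, 57, 59, 60], [61, 62, 66, 67], [70, 71, 76, 77]]),
    ([83, 91], [[79, 80, 81, 82, 83], [85, 86, 90], [93, 95, 97, 98, 99]]),
])

def search(node, key):
    """Return (found, path); path holds the keys of each visited node, ending with the leaf."""
    path = []
    while isinstance(node, tuple):
        keys, children = node
        path.append(keys)
        # Assumed convention: keys <= separator are in the child to its left.
        node = children[bisect_left(keys, key)]
    path.append(node)
    return key in node, path

found, path = search(TREE, 66)
print(found)  # True
print(path)   # [[12, 52, 78], [60, 69], [61, 62, 66, 67]]
```

Only one node per level is read, so the search visits three nodes in total.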

Page 13: Indexing for Multidimensional Data An Introduction

Insert

[Figure: the example B+ tree with root keys 12, 52, 78]

Insert 55: the leaf 54 56 57 59 60 is full, so split the leaf
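A leaf split can be sketched as follows. The slide does not show the result, so two choices here are assumptions: the left half keeps the extra item when the count is odd, and the separator passed to the parent is the largest key of the left half. The function name and the capacity of 5 are also assumptions, the latter read off the fullest leaves in the figure.

```python
from bisect import insort

def insert_into_leaf(leaf, key, capacity):
    """Insert key into a sorted leaf and split it if it overflows.

    Returns (left, right, separator); right and separator are None
    when no split was needed.
    """
    leaf = list(leaf)
    insort(leaf, key)
    if len(leaf) <= capacity:
        return leaf, None, None
    mid = (len(leaf) + 1) // 2        # left half gets the extra item, if any
    left, right = leaf[:mid], leaf[mid:]
    return left, right, left[-1]      # separator: largest key on the left

left, right, sep = insert_into_leaf([54, 56, 57, 59, 60], 55, capacity=5)
print(left, right, sep)  # [54, 55, 56] [57, 59, 60] 56
```

Under these assumptions the parent node 60 69 would receive the new separator 56 and still has room for it.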

Page 14: Indexing for Multidimensional Data An Introduction

Insert

[Figure: the example B+ tree with root keys 12, 52, 78; the leaf 27 28 31 35 now also holds 36]

Insert 32: split leaf
Insert key 31 in the parent node: split node
Insert key 31 in the root
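When the parent also overflows, the internal node is split and its middle key moves up one level. The sketch below assumes the parent 19 26 37 46 has just received key 31 and now holds five keys, one more than the four that the figure suggests is the maximum. The function name and the child labels `c0` to `c5` are hypothetical placeholders.

```python
def split_internal(keys, children):
    """Split an overflowing internal node; the middle key moves up to the parent."""
    mid = len(keys) // 2
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, keys[mid], right

keys = [19, 26, 31, 37, 46]                       # parent after receiving 31
children = ["c0", "c1", "c2", "c3", "c4", "c5"]   # placeholder child pointers
left, up, right = split_internal(keys, children)
print(left)   # ([19, 26], ['c0', 'c1', 'c2'])
print(up)     # 31
print(right)  # ([37, 46], ['c3', 'c4', 'c5'])
```

Unlike the leaf split, the key that moves up is not kept in either half, since internal nodes hold only separators.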

Page 15: Indexing for Multidimensional Data An Introduction

Handling multiple attributes

• Separate index structure for each attribute
  – All index structures must be updated for each record update.
  – Data are scattered across many disk pages.

[Figure: one index per attribute (a1, a2, a3, a4) over the data on disk]
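The cost of keeping one index per attribute can be seen in a small in-memory sketch. It uses Python dictionaries in place of real index structures, and the class, its methods and the sample records are all hypothetical. The point is only that each insert touches every index and that a multi-attribute query has to combine the per-attribute results.

```python
from collections import defaultdict

class MultiIndexTable:
    """Records indexed separately on each attribute (illustrative only)."""

    def __init__(self, attributes):
        self.records = {}                                    # record id -> record
        self.indexes = {a: defaultdict(set) for a in attributes}

    def insert(self, rid, record):
        self.records[rid] = record
        for attr, index in self.indexes.items():             # one update per index
            index[record[attr]].add(rid)

    def update(self, rid, attr, value):
        old = self.records[rid][attr]
        self.indexes[attr][old].discard(rid)
        self.indexes[attr][value].add(rid)
        self.records[rid][attr] = value

    def lookup(self, **conditions):
        """Intersect the per-attribute hits to answer a multi-attribute query."""
        hits = [self.indexes[a][v] for a, v in conditions.items()]
        return set.intersection(*hits) if hits else set(self.records)

table = MultiIndexTable(["a1", "a2"])
table.insert(1, {"a1": 5, "a2": "x"})
table.insert(2, {"a1": 5, "a2": "y"})
print(table.lookup(a1=5, a2="y"))  # {2}
```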

Page 16: Indexing for Multidimensional Data An Introduction


Handling multiple attributes

• Bit interleaving

• Attribute interleaving
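Bit interleaving can be sketched for two integer attributes as below. The result is a single key, often called a Z-order or Morton key, that could then be stored in a one-dimensional index. Which attribute supplies the lowest bit and the 16-bit width are arbitrary assumptions made for the example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Combine two keys: bit i of x goes to position 2*i, bit i of y to 2*i + 1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

# x = 3 (binary 011), y = 5 (binary 101) -> interleaved 100111
print(interleave_bits(3, 5))       # 39
print(bin(interleave_bits(3, 5)))  # 0b100111
```

Points that are close in both attributes often receive nearby keys, though not always, which is one reason the ordering is only an approximation of spatial proximity.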

Page 17: Indexing for Multidimensional Data An Introduction

Multiple-attribute indexing

• Quad-tree
• k-d tree
• k-d-B tree
• Grid file
• hB-tree

Issues
• Non-linear relationship
• Distance measure
• k-nearest-neighbor queries
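As one illustration of the structures listed above, here is a minimal in-memory k-d tree with a range search. It is a sketch, not a disk-based implementation: it splits on the median and cycles through the axes, which is one common construction among several. All names and the sample points are assumptions.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

class KDNode:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return KDNode(points[mid], axis,
                  build(points[:mid], depth + 1),
                  build(points[mid + 1:], depth + 1))

def range_search(node, low, high, out=None):
    """Collect points inside the box [low, high], skipping subtrees that cannot overlap it."""
    if out is None:
        out = []
    if node is None:
        return out
    if all(l <= c <= h for c, l, h in zip(node.point, low, high)):
        out.append(node.point)
    if low[node.axis] <= node.point[node.axis]:     # box reaches into the left side
        range_search(node.left, low, high, out)
    if node.point[node.axis] <= high[node.axis]:    # box reaches into the right side
        range_search(node.right, low, high, out)
    return out

# Hypothetical sample data.
pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(sorted(range_search(tree, (3, 1), (8, 5))))  # [(5, 4), (7, 2), (8, 1)]
```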

Page 18: Indexing for Multidimensional Data An Introduction

Spatial Indexing

• R-tree
• R*-tree
• SKD-tree

Issues
• Non-linear ordering
• Spatial queries
• High cost of determining spatial relationships
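One common way to limit the cost of spatial tests, used by the R-tree family, is to compare bounding rectangles first and run the exact geometric test only on the candidates that pass. The sketch below shows that filter step for 2-D objects given as point lists. The names and the two sample triangles are assumptions.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def bounding_rect(points) -> Rect:
    """Minimum bounding rectangle of a set of 2-D points."""
    xs, ys = zip(*points)
    return Rect(min(xs), min(ys), max(xs), max(ys))

def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)

# Two hypothetical triangles that do not touch each other.
tri_a = [(0, 0), (4, 0), (0, 4)]
tri_b = [(3, 3), (5, 3), (5, 5)]
print(intersects(bounding_rect(tri_a), bounding_rect(tri_b)))  # True
```

The rectangles overlap even though the triangles do not, so the rectangle test is only a filter: it can rule pairs out cheaply but cannot confirm a true intersection.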

Page 19: Indexing for Multidimensional Data An Introduction

High-dimensional Indexing

• SS-tree
• TV-tree

Issues: Curse of dimensionality
• Volume grows exponentially with dimension
• Partitions in higher dimensions are coarser
• Distance measurement in higher dimensions is not practical
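Two small calculations illustrate the first two issues. They assume data in a unit hypercube and are meant as a back-of-the-envelope sketch, with hypothetical function names. The first counts the cells produced by splitting every dimension once. The second gives the side length a cubic query needs in order to cover a fixed fraction of the volume.

```python
def cells_after_one_split_per_axis(d: int) -> int:
    """Splitting each of d dimensions in half once yields 2**d cells."""
    return 2 ** d

def query_side_for_fraction(fraction: float, d: int) -> float:
    """Side length of a cubic query covering `fraction` of the unit cube's volume."""
    return fraction ** (1.0 / d)

for d in (2, 10, 100):
    print(d, cells_after_one_split_per_axis(d),
          round(query_side_for_fraction(0.01, d), 3))
```

In 2 dimensions a query covering 1% of the volume spans 10% of each axis. In 100 dimensions it spans more than 95% of each axis, and a single split per dimension would already require 2^100 cells, far more than any data set can fill.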