
Upload: oswald-collins

Post on 05-Jan-2016


Page 1: Physical Index Structures Logically, the index is a sorted list. Physically, the sorted order is normally maintained by pointers in a table. Tree-structured

Physical Index Structures

• Logically, the index is a sorted list.

• Physically, the sorted order is normally maintained by pointers in a table.

• Tree-structured indexes:
– Binary tree
– B-tree
– B+-tree

Tree Structure

[Figure: a tree with the ROOT NODE at the top, a level of NODEs below it, and the LEAF NODES at the bottom.]

Node: branching point


Binary Tree Index

• Each index entry is a node of the tree.

• The index is a table with four fields:
– the true index fields, i.e. key value and address,
– a left, or less-than, pointer that points to a node with a smaller key value, and
– a right, or greater-than, pointer that points to a node with a larger key value.

[Figure: A binary tree node, with four fields: left pointer, key value, data pointer (i.e. data file address), right pointer.]


Binary Tree Index Example

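Data file

Cell       1   2   3   4   5   6   7
Key value  16  87  13  54  22  35  39

Index as a table (row 1 is the root node)

Row  LP  Key  Add  RP
1    3   16   1    2
2    4   87   2    -
3    -   13   3    -
4    5   54   4    -
5    -   22   5    6
6    -   35   6    7
7    -   39   7    -

[Figure: the same index drawn as a binary tree, with the root node (key value 16) at the top; only key values shown.]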
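The example can be reproduced with a short sketch. This is only an illustration, assuming rows are numbered from 1 in order of arrival, row 1 is the root node, and equal keys (not present here) would go to the right. The names `IndexRow`, `build_index` and `lookup` are made up for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexRow:
    key: int
    address: int                 # cell number of the record in the data file
    left: Optional[int] = None   # row number of a node with a smaller key
    right: Optional[int] = None  # row number of a node with a larger key


def build_index(data_file):
    """Load keys top-down, in order of arrival; rows are numbered from 1."""
    rows = {}
    for address, key in enumerate(data_file, start=1):
        new_row = len(rows) + 1
        rows[new_row] = IndexRow(key, address)
        if new_row == 1:
            continue  # the first arrival becomes the root node
        current = 1
        while True:
            node = rows[current]
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, new_row)  # hang the new node here
                break
            current = child
    return rows


def lookup(rows, key):
    """Return the data file address for key, or None if it is absent."""
    current = 1 if rows else None
    while current is not None:
        node = rows[current]
        if key == node.key:
            return node.address
        current = node.left if key < node.key else node.right
    return None


index = build_index([16, 87, 13, 54, 22, 35, 39])
for number, row in index.items():
    print(number, row.left, row.key, row.address, row.right)
print(lookup(index, 35))  # 6
```

Each printed line is one row of the index table in the order LP, Key, Add, RP.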


Binary Tree Index Problems

• Data pointers are dispersed throughout every level of the tree. This results in:
– unequal access times
– complex tree traversal programming

• A binary tree is normally unbalanced:
– For the tree to be balanced (i.e. equal branch lengths), the key value at each node must be the median of the values in its sub-trees.
– This is virtually impossible, as the tree is loaded top-down, i.e. in order of arrival of key values.
– Hence the tree becomes unbalanced, and unequal access times are the result.
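The effect of arrival order can be checked with a small sketch. It assumes distinct key values and counts the root as level 1; the helper name `depths` is made up for this illustration.

```python
def depths(arrival_order):
    """Level of each key when keys are inserted in order of arrival."""
    children = {}  # key -> [left child key, right child key]
    depth = {}
    root = None
    for key in arrival_order:
        children[key] = [None, None]
        if root is None:
            root = key
            depth[key] = 1
            continue
        current, level = root, 1
        while True:
            side = 0 if key < current else 1
            nxt = children[current][side]
            level += 1
            if nxt is None:
                children[current][side] = key  # attach below current
                depth[key] = level
                break
            current = nxt
    return depth


arrival = [16, 87, 13, 54, 22, 35, 39]       # order used in the example
medians_first = [35, 16, 54, 13, 22, 39, 87]  # each key is the median of its sub-tree

print(max(depths(arrival).values()))          # 6
print(max(depths(sorted(arrival)).values()))  # 7
print(max(depths(medians_first).values()))    # 3
```

The same seven keys need between 3 and 7 levels depending only on the order in which they arrive, which is why access times vary so much.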


Solution to Balance Problem in Index Tree Structures

• Load the tree “bottom-up”. That is, after a certain number of key values have been input, choose the median value to be promoted to a higher level so that it can point evenly to its left and right.

• This leads to the concepts of:
– multi-value nodes, i.e. multiple key values stored in sequence in each index node, and
– node-splitting, i.e. division of an overfull node into two nodes, taking respectively the low-end and high-end values of the split node.


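A B-tree Node

[Figure: a B-tree node holding key values K1, K2, K3 with their data file addresses A1, A2, A3. The left pointer points to a node with key values less than K1; the next pointer points to a node whose key values are >K1 and <K2; and so on up to the right pointer.]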

• Multiple key values per node

• K1<K2<K3 - i.e. key values in sequence

• Pointers all point to other nodes, and therefore to ALL of the key values in those nodes
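How a search uses one such node can be sketched as follows. The function name `find_in_node`, the pointer labels P0 to P3 and the key and address values are all invented for illustration; the sketch assumes a node with n keys carries n + 1 pointers.

```python
from bisect import bisect_left


def find_in_node(keys, addresses, children, target):
    """Search a single B-tree node.

    Returns ("found", address) if target is one of the node's keys,
    otherwise ("descend", child), where child is the pointer to follow
    (None if the node has no children).
    """
    i = bisect_left(keys, target)  # position of the first key >= target
    if i < len(keys) and keys[i] == target:
        return ("found", addresses[i])
    return ("descend", children[i] if children else None)


keys = [20, 40, 60]                  # K1 < K2 < K3
addresses = [7, 3, 9]                # A1, A2, A3
children = ["P0", "P1", "P2", "P3"]  # one more pointer than keys

print(find_in_node(keys, addresses, children, 10))  # ('descend', 'P0')
print(find_in_node(keys, addresses, children, 30))  # ('descend', 'P1')
print(find_in_node(keys, addresses, children, 40))  # ('found', 3)
```

Because the keys are in sequence, a binary search within the node is enough to pick the one pointer that covers the target.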


B-tree Node Splitting

Existing node values: 12 23 27 38

New value to be inserted: 19

The split: 12 19 | 23 | 27 38

– 12 and 19 stay in the old node.
– Key value 23 is promoted to the next highest level to point to the other two nodes.
– 27 and 38 move to a new node.
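A minimal sketch of the split, assuming a node is full at four key values (as in this slide) and that only key values are tracked. The function name `insert_and_split` and the `capacity` parameter are made up for this illustration.

```python
from bisect import insort


def insert_and_split(node_keys, new_key, capacity=4):
    """Insert new_key into a sorted node and split it if it overflows.

    Returns (old_node, promoted, new_node). promoted and new_node are
    None when the node still had room and no split was needed.
    """
    keys = list(node_keys)
    insort(keys, new_key)  # keep the key values in sequence
    if len(keys) <= capacity:
        return keys, None, None
    mid = len(keys) // 2   # position of the median value
    return keys[:mid], keys[mid], keys[mid + 1:]


print(insert_and_split([12, 23, 27, 38], 19))  # ([12, 19], 23, [27, 38])
```

The median leaves the node entirely: in a B-tree it lives only in the parent, together with its data file address.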


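B-tree Node Split Example

The data file has two records, so the root node of the index is now full.

Data file (cells 1 to 4): 87 in cell 1, 36 in cell 2

Root node: 36 (address 2), 87 (address 1)

Then a new data file record of key value 27 is stored in cell 3.

The split: 27 | 36 | 87, with 36 promoted.

New root node: 36 (address 2)

Nodes below the new root: 27 (address 3) and 87 (address 1)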


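Current State of Index

[Figure: the index after the split. The root node holds key value 36 (address 2) and points to two nodes, one holding 27 (address 3) and one holding 87 (address 1). The same index is also shown as a table with rows 1 to 4 and columns K1, A1, K2, A2.]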


B-tree Pros and Cons

• Balanced, i.e. every branch is the same length and descends to the same level. Therefore,

• the wild variation in access times observable in binary trees is avoided.

• However, the key values (and associated addresses) are still dispersed throughout all levels of the structure, leading to:
– unequal path lengths to individual key values, and therefore unequal access times, and
– complex tree-traversal algorithms for logically sequential reading/unloading of the data file.


Solution to the Key Dispersal Problem

• Prohibit storage of data file addresses at all levels above leaf level.

• Consequently:
– all accesses follow the same path length, resulting in equal access times, and
– logically sequential reading of the data file requires access to only the leaf level. That is, complex tree-traversal algorithms are not required.


Implementing the Solution

• Since all key values must appear at leaf level, some key values appear more than once in the index, and therefore,

• upper-level nodes don’t need address fields, and leaf-level nodes don’t need downward index pointers,

• the median value to be promoted when a node split occurs must belong to one of the ‘halves’, i.e. the rightmost value of the left half (leading to less-than-or-equal pointers), or the leftmost value of the right half (greater-than-or-equal pointers).
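A leaf split under the first convention (rightmost value of the left half, less-than-or-equal pointers) might look like the sketch below. The function name `split_leaf` is made up, and the input is assumed to be an already overfull leaf of (key value, data file address) pairs in key sequence.

```python
def split_leaf(entries):
    """Split an overfull leaf of (key, address) pairs sorted by key.

    The separator is the rightmost key of the left half, so the parent's
    pointer to the left leaf means "less than or equal to the separator".
    The key also stays in the leaf, so it appears twice in the index.
    """
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    separator = left[-1][0]  # copied upwards, not moved
    return left, separator, right


left, separator, right = split_leaf([(9, 2), (41, 4), (56, 1), (72, 3)])
print(left)       # [(9, 2), (41, 4)]
print(separator)  # 41
print(right)      # [(56, 1), (72, 3)]
```

Unlike the B-tree split, nothing leaves the leaf level: only a copy of the key value goes up, without its address.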


The B+-tree

The data file (cells 1 to 5): 56, 9, 72, 41, 34

The root node: 41

Leaf level nodes:
– left-hand node: 9 (address 2), 34 (address 5), 41 (address 4)
– right-hand node: 56 (address 1), 72 (address 3)

The left-hand node split when 41 was inserted. The high-order end went to the right-hand node. Hence, the leaf-node pointer.


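The B+-tree: Insertion of data file record of key value 25

The data file: 56, 9, 72, 41, 34 in cells 1 to 5, with the new record 25 stored in cell 6

The split: 9 25 | 34 41, with 25 promoted to the root node

The root node: 25, 41

Leaf level nodes:
– 9 (address 2), 25 (address 6)
– 34 (address 5), 41 (address 4)
– 56 (address 1), 72 (address 3)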
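The final state of this index can be written down directly to show both benefits of the B+-tree. This is a sketch only: the leaf chain is modelled as a plain Python list, the root is assumed to use less-than-or-equal pointers, and the names `leaves`, `root_keys`, `lookup` and `sequential_read` are invented.

```python
from bisect import bisect_left

# Leaf nodes hold (key value, data file address) pairs, chained left to right.
leaves = [
    [(9, 2), (25, 6)],
    [(34, 5), (41, 4)],
    [(56, 1), (72, 3)],
]
root_keys = [25, 41]  # key values only; no data file addresses above leaf level


def lookup(key):
    """Every lookup goes root -> leaf, so all path lengths are equal."""
    leaf = leaves[bisect_left(root_keys, key)]  # less-than-or-equal pointers
    for k, address in leaf:
        if k == key:
            return address
    return None


def sequential_read():
    """Logically sequential reading only needs the leaf level."""
    return [address for leaf in leaves for _, address in leaf]


print(lookup(25))         # 6
print(lookup(30))         # None
print(sequential_read())  # [2, 6, 5, 4, 1, 3]
```

Reading the data file cells in the order returned by `sequential_read` yields the key values 9, 25, 34, 41, 56, 72, i.e. the file in key sequence, without visiting the root node at all.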