
Realization of DBS

8. Secondary and Hierarchical Access Paths

Theo Härder
www.haerder.de

© 2011 AG DBIS

Realization of Database Systems – SS 2011

Main reference:
Theo Härder, Erhard Rahm: Datenbanksysteme – Konzepte und Techniken der Implementierung, Springer, 2001, Chapter 8.

Patrick O’Neil, Elizabeth O’Neil: Database – Principles, Programming, Performance, 2nd edition, Morgan Kaufmann Publ., 200, Chapter 8.

Secondary and Hierarchical Access Paths

Sections: Secondary key access · Bit list compression · Optimal codes · Hierarchical access paths · Generalized access paths · Join & path indexes

Goals
• Design principles for access paths to all qualified records of a table
• Evaluation of search predicates by set-theoretic operations
• Mapping choices for hierarchical access requirements

Access via secondary keys
• Entry structure and link structure
• Use of pointer lists and compressed bit lists

Bit list compression
• Run length compression, null-sequence compression, Golomb codes
• Multi-mode compression, block compression
• Huffman codes

Hierarchical access paths

Merged indexes (generalized access path structure)

Join and path index

Connection Structures for Record Sets

Materialized storage
1. Physical contiguity of records (clustering, lists)
2. Chaining of records

Referenced storage
3. Physical contiguity of pointers (inversion)
4. Chaining of pointers

[Figure: the four connection structures, each illustrated with record 1, record 2, record 3]

Access Paths for Secondary Keys

Search for records having given values of non-identifying attributes (secondary keys)

Result is a record set

Eno    Dno  Loc  Salary
12345  A02  KL   45000
23456  A02  MA   51000
34567  A03  KL   48000
45678  A02  MA   55000
56789  A03  F    65000
67890  A12  KL   50000

[Figure: inversion on Dno (A02, A03, A12, . . .) and on Loc (KL, MA, F, . . .), each value referring to its qualifying records]

Realization: entry structure + link structure
• Primary key access paths are applicable as entry structures to record sets
• In principle, all connection structures can be used for record sets
• Most frequently: use of B*-trees and inversion techniques

The standard solution for inversion is sequential reference lists (often called OID lists or TID lists)
• efficient processing of set operations
• cost-effective maintenance
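The inversion idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the list position of a row stands in for its TID, and the helper names `build_inversion`, `intersect`, and `union` are made up for this sketch. It uses the example table above and evaluates Dno = 'A02' AND Loc = 'KL' by intersecting two sorted TID lists.

```python
from collections import defaultdict

# Example table from the slide; the list position stands in for a TID (assumption).
EMP = [
    (12345, "A02", "KL", 45000),
    (23456, "A02", "MA", 51000),
    (34567, "A03", "KL", 48000),
    (45678, "A02", "MA", 55000),
    (56789, "A03", "F", 65000),
    (67890, "A12", "KL", 50000),
]
DNO, LOC = 1, 2  # column positions of the two secondary keys


def build_inversion(rows, column):
    """Map each secondary-key value to an ascending list of TIDs."""
    index = defaultdict(list)
    for tid, row in enumerate(rows):
        index[row[column]].append(tid)  # TIDs arrive in ascending order
    return dict(index)


def intersect(a, b):
    """AND of two ascending TID lists in a single merge pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """OR of two ascending TID lists, duplicates removed."""
    return sorted(set(a) | set(b))


dno_index = build_inversion(EMP, DNO)
loc_index = build_inversion(EMP, LOC)
hits = intersect(dno_index["A02"], loc_index["KL"])  # Dno = 'A02' AND Loc = 'KL'
print([EMP[tid][0] for tid in hits])  # prints [12345]
```

Keeping the TID lists sorted is what makes the set operations cheap: intersection is a linear merge over both lists.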

Access Paths for Secondary Keys (2)

Frequent realization: inversion
• Separation of access path data and data records (referenced storage)
• Reference Z realized as TID, DBK/PPP, ...
• Two representation methods are possible:

a) Combined representation of lookup structure and pointer lists

key   list length   pointer list
K02   4             Z Z Z Z
K03   3             Z Z Z
K12   1             Z
. . .

Relatively short pointer lists are assumed!

b) In the lookup structure, there exists (similar to access paths for primary keys) only a single reference per key value, which points to a list with references to records (pointer list)

key   list length   referenced pointer list
K02   4             Z Z Z Z
K03   3             Z Z Z
K12   1             Z
. . .

Pointer lists are managed in separate “containers”

Access Paths for Secondary Keys (3)

Table Emp (Eno, Name, Dno, …)

E1   Müller   K55  …
E17  Maier    K51
E25  Schmitt  K55
…

[Figure: B*-tree IEmp(Dno) with root keys K25 K61 K99 and next-level keys K8 K13 K25 K33 K45 K61 K75 K86 K99; leaf entries hold a key, the number of references, and the TID list, e.g. K55 n TID1 TID2 . . . TIDn]

B*-tree
• as access path for secondary key Dno
• represents sort order of secondary keys and forward/backward chaining

Complex search operations
• Range search
• Generic search
• Mask search (LIKE)
• Phonetic search

Access Paths for Secondary Keys (4)

Use for Information Retrieval
• Unformatted data: documents
• Inversion by means of descriptors (no assignment to attributes!)

[Figure: descriptor “system” with the reference list ZD1 ZD29 . . . ZD1234 pointing to documents D1, D29, . . ., D1234; further descriptors with reference lists such as ZD57 ZD302 . . . and ZD777 ZD1595; alternatives for long lists: bit list, bit list compression]

Very many and very few references are possible

Inversion using bit lists
• Addressing of data records or documents
  - Via allocation table AT
  - Directly in case of fixed length and contiguous storage
• Markings in the bit list correspond to entries of AT or computable addresses (b records per page)
• Attribute A has j attribute values a1, ..., aj

Access Paths for Secondary Keys (5)

Bit matrix for A

      1 2 3 . . . n
a1    0 1 0 0 1 0 0 . . .
a2    1 0 0 0 0 0 1 . . .
. . .
aj    0 0 0 1 0 1 0 . . .

Storage as vertical bit lists enables indexing of multi-valued attributes (example: shopping cart with products)

Bit lists of fixed length
• ji bit lists of attribute Ai
• Simple update operations
• Fast comparison
• Very space consuming
• Only for small j

Often long null sequences: compression
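A small sketch of uncompressed bit lists follows, assuming Python integers as bit lists (bit i belongs to record i, counted from 0). The helper names `build_bit_lists` and `positions` and the sample attribute values are illustrative choices, not part of the lecture.

```python
def build_bit_lists(values):
    """One bit list per attribute value; bit i is set when record i carries the value."""
    bit_lists = {}
    for pos, value in enumerate(values):
        bit_lists[value] = bit_lists.get(value, 0) | (1 << pos)
    return bit_lists


def positions(bits):
    """0-based record positions marked in a bit list."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Attribute values of six records (sample data).
dno = build_bit_lists(["A02", "A02", "A03", "A02", "A03", "A12"])
loc = build_bit_lists(["KL", "MA", "KL", "MA", "F", "KL"])

both = dno["A02"] & loc["MA"]    # conjunction: bitwise AND
either = dno["A12"] | loc["F"]   # disjunction: bitwise OR
print(positions(both), positions(either))  # prints [1, 3] [4, 5]
```

The fast comparison noted on the slide shows up here as a single bitwise operation per predicate; the price is one bit per record and attribute value, which is why the technique suits small j.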

Access Paths for Secondary Keys (6)

Compressed bit lists of variable length
• Space saving
• Reduction of I/O time
• Additional overhead for coding and decoding
• Fast comparison
• Cumbersome update operations

Application areas of compression
• Data Warehouse (inversion of Fact table)
• Transfer/storage of
  - Multimedia objects (Image, Audio, Video, ...)
  - Sparse matrices
  - Objects in Geo-DBs, ...

Many compression techniques available

Compression of Bit Lists

Run length compression
A “run” is a bit sequence of uniform bit marks. The uncompressed bit list is divided into consecutive alternating sequences of ‘0’s and ‘1’s. The compression technique represents each run in a coding sequence by its length (stored as a binary number). A coding sequence can be composed of several coding units of fixed length (k bits). If a run is longer than $2^k - 1$ bits, a coding sequence with more than one coding unit has to be used for the mapping. Compression of a run of length L with

$(n-1)(2^k-1) < L \le n(2^k-1), \quad n = 1, 2, \ldots$

requires n coding units, where the first (n-1) coding units are completely filled with ‘0’s (low value), which makes it possible to recognize that subsequent coding units belong to the same coding sequence. Checking each coding unit for low values needs an extra test during decompression; this implicit continuation mark prevents the method from failing for runs longer than $2^k - 1$.

Example (k=6):

run length   coding
1            000001
2            000010
63           111111
64           000000 000001
65           000000 000010

list of marks: 4, 5, 50, 115
000011 000010 101100 000001 000000 000001 000001
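The run length scheme can be traced with a short Python sketch. It assumes that marks are given as ascending 1-based positions of ‘1’ bits and that the bit list starts with a ‘0’ run; the function names are chosen for this illustration only.

```python
def runs_from_marks(marks):
    """Alternating run lengths, '0' run first, for ascending 1-based mark positions.

    Assumption: the bit list starts with a '0' run, i.e. marks[0] > 1.
    """
    runs, prev, ones = [], 0, 0
    for m in marks:
        gap = m - prev - 1          # number of '0' bits before this mark
        if gap > 0:
            if ones:                # close the pending run of '1's
                runs.append(ones)
                ones = 0
            runs.append(gap)
        ones += 1
        prev = m
    if ones:
        runs.append(ones)
    return runs


def encode_run(length, k):
    """n coding units of k bits: (n-1) all-zero units, then the rest in binary."""
    cap = 2 ** k - 1
    n = (length + cap - 1) // cap   # smallest n with length <= n * cap
    last = length - (n - 1) * cap
    return ["0" * k] * (n - 1) + [format(last, f"0{k}b")]


def rle_encode(marks, k=6):
    units = []
    for run in runs_from_marks(marks):
        units.extend(encode_run(run, k))
    return units


print(" ".join(rle_encode([4, 5, 50, 115])))
# prints 000011 000010 101100 000001 000000 000001 000001
```

The runs for the example are 3, 2, 44, 1, 64, 1; only the run of length 64 exceeds one coding unit and therefore gets the all-zero continuation unit.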

Compression of Bit Lists (2)

Null sequence compression
A null sequence is a sequence of ‘0’ bits between two ‘1’ bits in the uncompressed bit list. The basic idea of the method is to represent the bit list only by consecutive null sequences, where a ‘1’ bit is implicitly expressed after each of them. Because a null sequence can now have length L=0, the following coding can be chosen (k=6), which corresponds to adding up binary numbers of at most $2^k - 1$:

length of null sequence   coding
0                         000000
1                         000001
62                        111110
63                        111111 000000
64                        111111 000001

Because a coding sequence can be composed in an additive way from several coding units, null sequences of arbitrary length can be represented. n coding units are required if for L holds:

$(n-1)(2^k-1) \le L < n(2^k-1), \quad n = 1, 2, \ldots$

list of marks: 4, 5, 50, 115 (k=6)
000011 000000 101100 111111 000001
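A minimal encoder and decoder for null sequence compression might look as follows. It assumes ascending 1-based mark positions and treats an all-ones coding unit as "add $2^k - 1$ and continue"; the function names are illustrative.

```python
def nullseq_encode(marks, k=6):
    """Code the null sequence in front of every mark with additive k-bit units."""
    cap = 2 ** k - 1
    units, prev = [], 0
    for m in marks:
        gap = m - prev - 1
        while gap >= cap:               # full unit: contributes cap, sequence continues
            units.append("1" * k)
            gap -= cap
        units.append(format(gap, f"0{k}b"))
        prev = m
    return units


def nullseq_decode(units, k=6):
    """Rebuild the mark positions; a unit below the maximum ends a coding sequence."""
    cap = 2 ** k - 1
    marks, pos, acc = [], 0, 0
    for unit in units:
        value = int(unit, 2)
        acc += value
        if value < cap:                 # end of coding sequence, implicit '1' follows
            pos += acc + 1
            marks.append(pos)
            acc = 0
    return marks


codes = nullseq_encode([4, 5, 50, 115])
print(" ".join(codes))          # prints 000011 000000 101100 111111 000001
print(nullseq_decode(codes))    # prints [4, 5, 50, 115]
```

Compared with run length compression, the runs of ‘1’s need no coding units of their own, which is why the same list of marks takes five units here instead of seven.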

Compression of Bit Lists (3)

Golomb coding (for null sequence compression)
A null sequence of length L is represented by a coding sequence consisting of a variable-length prefix, a separator bit, and a remainder field of fixed length using $\log_2 m$ bits. The prefix is composed of $\lfloor L/m \rfloor$ ‘1’ bits followed by a ‘0’ bit as separator. The remainder contains (as a binary number) the number of remaining ‘0’ bits of the null sequence: $L - m \cdot \lfloor L/m \rfloor$. This method enables the compression of arbitrarily long null sequences (improved by Exp-Golomb), independent of the chosen parameters. If p is the ‘0’-bit probability in the bit list, parameter m should be chosen such that $p^m \approx 0.5$.

Example (m=4):

. . . 0 1 | 0000 0000 0000 0000 000 | 1 0 . . .
null sequence: four groups of m ‘0’ bits plus 3 remaining ‘0’ bits

coding: 1111 0 11 (prefix, separator, remainder)

list of marks: 4, 5, 50, 115 (m=8)
0011 0000 111110100 111111110000
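The Golomb rule can be checked with a short sketch. It assumes that m is a power of two, so that the remainder field has exactly $\log_2 m$ bits; `golomb_encode` and `encode_marks` are names invented for this example.

```python
def golomb_encode(length, m):
    """Code one null sequence of `length` zeros; m is assumed to be a power of two."""
    r_bits = m.bit_length() - 1              # log2(m) for a power of two
    q, r = divmod(length, m)                 # q full groups of m zeros, r zeros remain
    remainder = format(r, f"0{r_bits}b") if r_bits else ""
    return "1" * q + "0" + remainder         # prefix, separator, remainder


def encode_marks(marks, m):
    """Golomb codes for the null sequences in front of ascending 1-based marks."""
    codes, prev = [], 0
    for mark in marks:
        codes.append(golomb_encode(mark - prev - 1, m))
        prev = mark
    return codes


print(golomb_encode(19, 4))                       # prints 1111011
print(" ".join(encode_marks([4, 5, 50, 115], 8)))
```

For the list of marks, the null sequences have lengths 3, 0, 44, and 64. With m=8 the last one consists of eight full groups and no remainder, so its code is eight ‘1’ bits, the separator, and the remainder 000.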

Compression of Bit Lists (4)

Multi-mode compression
Some bits of a coding sequence of fixed length k are reserved as so-called type bits to mark different modes of a coding sequence. A single type bit enables two modes:

1 : k-1 bits of the sequence are stored as “bit pattern”;
0 : up to $2^{k-1}-1$ bits of a null sequence are expressed by a binary number

Example
list of marks: 4, 5, 50, 115 (k=6, single type bit)

100011 011111 001101 110000 011111 011101 110000
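Decoding the single-type-bit variant is unambiguous and can be sketched as follows (encoding leaves choices open, so only the decoder is shown). The function name and the convention of 1-based positions are assumptions of this sketch.

```python
def multimode_decode(units):
    """Decode coding sequences with one leading type bit into 1-based mark positions."""
    marks, pos = [], 0
    for unit in units:
        type_bit, payload = unit[0], unit[1:]
        if type_bit == "1":                 # mode 1: payload is a literal bit pattern
            for bit in payload:
                pos += 1
                if bit == "1":
                    marks.append(pos)
        else:                               # mode 0: payload counts '0' bits
            pos += int(payload, 2)
    return marks


example = "100011 011111 001101 110000 011111 011101 110000".split()
print(multimode_decode(example))  # prints [4, 5, 50, 115]
```

In the example, the null sequence of 44 bits between positions 5 and 50 is split into 31 + 13, and the one of 64 bits between positions 50 and 115 into 4 pattern bits plus 31 + 29, because a single coding sequence with k=6 can express at most 31 ‘0’ bits.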

Compression of Bit Lists (5)

Multi-mode compression (cont.)
Because of the restricted k, long null sequences require the use of several consecutive coding sequences. Furthermore, isolated ‘1’s in a bit list need a separate coding sequence to code them as a bit pattern. Reserving a further type bit enables greater flexibility with, for example, the following four modes:

11 : k-2 bits of the sequence are stored as bit pattern;
10 : up to $2^{k-2}-1$ bits are encoded as a sequence of ‘1’s by a binary number;
01 : up to $2^{k-2}-1$ bits are encoded as null sequence by a binary number;
00 : up to $2^{2k-2}-1$ bits are encoded as null sequence in a doubled coding sequence

If a ‘00’-sequence is large enough to compress any null sequence, isolated ‘1’s could be implicitly expressed

Example
list of marks: 4, 5, 50, 115 (k=8, two type bits)

11000110 01101011 00000000 01000000 …

Compression of Bit Lists (6)

Block compression
The uncompressed bit list is divided into blocks of length k. A first method replaces the individual blocks by codes of variable length. If the probabilities of specific bit patterns are known or can be estimated, Huffman codes can be used. Using block length k, $2^k$ different patterns require $2^k$ code words of variable length (use of a translation table with optimally assigned code words).

A second method stores only blocks where at least one ‘1’ bit occurs. To mark the blocks (low value blocks) which are not stored, a second bit list is used as a directory, in which each ‘1’ mark corresponds to a stored block. Because long null sequences may occur in the directory, it can in turn be compressed using null-sequence or multi-mode compression.

The idea of applying block compression again to the directory leads to hierarchical block compression. It can be continued recursively until the elimination of null sequences is no longer worthwhile. Starting from the highest hierarchy level, the uncompressed bit list (index depth d) can be easily reconstructed.

Compression of Bit Lists (7)

Example
• node size l = 4 and index depth d = 3
• indexed set S = {2, 3, 9, 12, 13, 14, 38, 40}

level 1 (root):         1010
level 2 (inner nodes):  1011 0000 0100 0000
level 3 (leaves):       0110 0000 1001 1100 0000 0000 0000 0000 0000 0101 0000 0000 0000 0000 0000 0000
bit positions:          1 to 64, four per leaf

• physical storage

[Figure: only the non-empty nodes are stored, starting at addresses 500010, 500020, 500030, 500040, . . .: root 1010, inner nodes 1011 and 0100, then the non-empty leaves]
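The hierarchy for this example can be recomputed with a few lines of Python. The sketch assumes 1-based positions and a bit list of exactly l^d bits; `build_levels`, `nodes`, and `stored_nodes` are hypothetical helper names.

```python
def build_levels(marks, node_size=4, depth=3):
    """Return the levels from root to leaves as flat lists of bits."""
    bits = [0] * node_size ** depth
    for m in marks:
        bits[m - 1] = 1                      # marks are 1-based positions
    levels = [bits]
    while len(levels[0]) > node_size:
        lower = levels[0]
        # One directory bit per node of the level below: 1 if the node is not all zero.
        upper = [int(any(lower[i:i + node_size]))
                 for i in range(0, len(lower), node_size)]
        levels.insert(0, upper)
    return levels


def nodes(level, node_size=4):
    """Split a level into nodes and render each node as a bit string."""
    return ["".join(map(str, level[i:i + node_size]))
            for i in range(0, len(level), node_size)]


def stored_nodes(levels, node_size=4):
    """Per level, only the nodes that contain at least one '1' are kept."""
    return [[n for n in nodes(level, node_size) if "1" in n] for level in levels]


levels = build_levels([2, 3, 9, 12, 13, 14, 38, 40])
print(nodes(levels[0]))       # prints ['1010']
print(nodes(levels[1]))       # prints ['1011', '0000', '0100', '0000']
print(stored_nodes(levels))
```

Seven nodes of four bits each remain to be stored instead of the 64 bits of the uncompressed list.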

Optimal Codes

Extended binary trees with minimal external path length can be used to design optimal codes for n+1 characters

Sequence to be coded: A A B C A A B B C A D B A B A (15 characters)
• Codes of fixed length: 2 bit
  A = 00
  . . .
  D = 11

  C2Bit = 15 * 2 = 30

Are there better codings?

[Table: character, frequency, code]

Ew = CCode (the weighted external path length equals the cost of the code)

No code word is a prefix of another one

Decoding can be performed with the same extended binary tree used to determine the codes

Proceeding: A A B C A . . . = 0 0 1 0 1 1 0 0 . . .
0|0|10|110|0| . . . = A A B C A . . .
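Prefix-free decoding can be illustrated with a small sketch. The code words A = 0, B = 10, and C = 110 follow from the decoding line above; D = 111 is an assumption made here to complete the code, and the function names are illustrative.

```python
# A, B, C as in the decoding example; D = 111 is an assumed completion of the code.
CODES = {"A": "0", "B": "10", "C": "110", "D": "111"}


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Read bits left to right; a code word is complete as soon as it matches."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:      # safe because no code word is a prefix of another
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends inside a code word")
    return "".join(out)


print(decode("00101100", CODES))                   # prints AABCA
print(len(encode("AABCAABBCADBABA", CODES)))       # prints 26
```

Under this assumed code, the 15-character sequence needs 26 bits instead of the 30 bits of the fixed-length coding.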

Huffman Algorithm

The minimal coding can be derived using extended binary trees having minimal weighted external path length. The resulting codes are called Huffman codes.

Algorithm for the construction of binary trees with minimal weighted external path length

Given:
List of trees which initially consists of n external nodes as roots. The frequencies qi are carried by the roots of the trees.

Idea:
Determine the two trees with the lowest frequencies and remove them from the list. By means of a new root, both trees found are combined as left and right subtree into a new tree, which is inserted into the list. n external nodes lead to n-1 newly built trees, i.e., n-1 internal nodes.

Algorithm:
Huffman (TreeList list, int n)
for (i = 1; i < n; i += 1)
{
    p1 = “smallest element from list”
    “remove p1 from list”
    p2 = “smallest element from list”
    “remove p2 from list”
    “create node p”
    “attach p1 and p2 as subtrees to p”
    “determine the weight of p as sum of the weights p1 and p2”
    “insert p into list”
}
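A runnable variant of the algorithm is sketched below. It departs from the slide in one respect, stated plainly: instead of searching a list for the smallest element, it keeps the trees in a binary heap (`heapq`), which changes the cost of each step but not the resulting tree weights. Leaves are represented by their symbols and internal nodes by pairs; these representations are choices of this sketch.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman code for a mapping symbol -> frequency."""
    tiebreak = count()                       # keeps heap entries comparable on equal weights
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, p1 = heapq.heappop(heap)      # smallest element
        w2, _, p2 = heapq.heappop(heap)      # second smallest element
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (p1, p2)))  # new root p

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node: left edge 0, right edge 1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"      # single-symbol input still gets one bit

    walk(heap[0][2], "")
    return codes


freqs = Counter("AABCAABBCADBABA")           # the 15-character example sequence
codes = huffman_codes(freqs)
cost = sum(freqs[s] * len(codes[s]) for s in freqs)
print(sorted((s, len(c)) for s, c in codes.items()), cost)
# prints [('A', 1), ('B', 2), ('C', 3), ('D', 3)] 26
```

The concrete bit patterns depend on tie-breaking and on which subtree is attached left or right, so only the code lengths and the total cost are fixed by the frequencies.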

Huffman Algorithm (2)

Execution example

[Figure: execution example with external nodes 1 to 6 and frequencies qi, intermediate trees T1, ..., T5, and the resulting weighted external path length Ew]

Cost:

$C = C_0 + C_1 \sum_{i=1}^{n-1} (n-i) = C_0 + C_1 \frac{n(n-1)}{2} = O(n^2)$

Assignment of Huffman-Codes – Example

First assignment:

Bitstring   Li   Oi value range
0000001     48   [-2.8x10^14, -4.3x10^9]
0000010     32   [-4.3x10^9, -69977]
0000011     16   [-69976, -4441]
000010      12   [-4440, -345]
000011      8    [-344, -89]
00010       6    [-88, -25]
00011       4    [-24, -9]
001         3    [-8, -1]
01          3    [0, 7]
100         4    [8, 23]
101         6    [24, 87]
1100        8    [88, 343]
1101        12   [344, 4439]
11100       16   [4440, 69975]
11101       32   [69976, 4.3x10^9]
11110       48   [4.3x10^9, 2.8x10^14]

Second assignment:

Bitstring   Li   Oi value range
000000001   20   [-1118485, -69910]
00000001    16   [-69909, -4374]
0000001     12   [-4373, -278]
000001      8    [-277, -22]
00001       4    [-21, -6]
0001        2    [-5, -2]
001         1    [-1, 0]
01          0    [1, 1]
10          1    [2, 3]
110         2    [4, 7]
1110        4    [8, 23]
11110       8    [24, 279]
111110      12   [280, 4375]
1111110     16   [4376, 69911]
11111110    20   [69912, 1118487]

[Figure: the two code trees belonging to the tables, with edges labeled 0 and 1]

Hierarchical Access Paths

Realization of functional relationships among two record types
• Owner → Member: Set types according to the network model
• Each instance of an Owner record type is linked to 0..n instances of the Member record type

Logical view: illustration of navigation options

[Figure: Owner record type Dept (Dno, Mno, D-Loc) with pointers FIRST/NEXT and LAST/PRIOR; Member record type Emp (Eno, Dno, Loc, Salary) with pointers NEXT, PRIOR, and OWNER. Example occurrences: Dept (K02, ABEL, KL) with Emp records (1234, K02, DA, 40000), (4488, K02, KL, 50000), (5678, K02, KL, 60000); Dept (K03, SCHULZ, DA) with Emp records (6927, K03, DA, 45000), (4711, K03, FR, 55000)]

Three implementations for different performance requirements

Hierarchical Access Paths – Implementation

Sequential list based on pages

[Figure: SET OWNER with a Last pointer, followed by SET MEMBER 1, SET MEMBER 2, SET MEMBER 3, and SET MEMBER 4 stored as a sequential list]

Chained list

[Figure: SET OWNER with a Last/PRIOR pointer, chained with SET MEMBER 1, SET MEMBER 2, SET MEMBER 3, and SET MEMBER 4; some of the pointers are marked as optional]

Hierarchical Access Paths – Implementation (2)

Pointer array structure

[Figure: SET OWNER referring to a POINTER ARRAY with POINTER-ARRAY ENTRY 1 to ENTRY 4; each entry points to one of SET MEMBER 1 to SET MEMBER 4; some of the pointers are marked as optional]
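The difference between a chained list and a pointer array can be sketched with two small in-memory classes. This is a simplified illustration: Python object references stand in for record addresses, only the NEXT direction is modeled, and the class and method names are invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Member:
    key: str
    next: Optional["Member"] = None      # NEXT pointer, used by the chained list only


@dataclass
class ChainedOwner:
    """Owner with FIRST and LAST pointers; members are linked among themselves."""
    first: Optional[Member] = None
    last: Optional[Member] = None

    def append(self, member: Member) -> None:
        if self.last is None:
            self.first = member
        else:
            self.last.next = member
        self.last = member

    def members(self) -> Iterator[Member]:
        current = self.first
        while current is not None:       # every step touches the previous member record
            yield current
            current = current.next


@dataclass
class PointerArrayOwner:
    """Owner with an array of member references; members carry no set pointers."""
    entries: List[Member] = field(default_factory=list)

    def append(self, member: Member) -> None:
        self.entries.append(member)

    def members(self) -> Iterator[Member]:
        return iter(self.entries)

    def nth(self, n: int) -> Member:
        return self.entries[n]           # direct access without following a chain


chained, array = ChainedOwner(), PointerArrayOwner()
for key in ["1234", "4488", "5678"]:
    chained.append(Member(key))
    array.append(Member(key))
print([m.key for m in chained.members()], array.nth(2).key)
```

Reaching the n-th member in the chained list requires reading all members before it, whereas the pointer array keeps all references in one place.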

Hierarchical Access Paths – Evaluation of Implementation Techniques

Pointer array
• Stable performance behavior
• Behavior independent of Set growth and Set sequence
• “Standard” method in case of imprecise information concerning Set size and access frequency

Sequential list
• Restricted to a single Set type per Member record type (clustering)
• Fast location / insertion in Set sequence
• Updates more expensive than for pointer array

Chained list
• Advantages in case of membership of the Member record type in several Sets
• Cheap switch to other Set occurrences
• Sequential access faster than for pointer array
• Only useful in small Set occurrences

Generalized Access Path Structure

Idea:
Shared exploitation of an index structure (B*-tree) for several record types for which the relationships (1:1, 1:n, n:m) are defined over the same domain (e.g., for Dno) and represented by equality of attribute values

[Figure: tables Dept, Emp, Mgr, and Equipment; all tables carry an attribute (e.g., Dno) which is defined on domain Deptno]

Use of the index structure for
• primary key access, e.g., as IDept(Dno)
• secondary key access, e.g., as IEmp(Dno)
• hierarchical access, e.g., from Dept(Dno) to Emp(Dno) or vice versa
• join operations (Join), e.g., Dept.Dno = Emp.Dno

Combined realization of primary key, secondary key, and hierarchical access paths using an extended B*-tree
• Inner tree nodes remain unchanged
• Leaves contain references for primary and secondary access paths

B*-Tree as Combined Access Path Structure

[Figure: B*-tree IEmp(Dno) with root keys K25 K61 K99 and next-level keys K8 K13 K25 K33 K45 K61 K75 K86 K99; leaf entries hold a key, the number of references, and the TID list, e.g. K55 n TID1 TID2 . . . TIDn]

[Figure: B*-tree IEmp/Dept(Dno) with the same inner nodes; the leaf entry for K55 holds 1 and TID0, followed by n and TID1 TID2 . . . TIDn]

Structure contains index for Dept, Emp and link for Dept-Emp with direct access from
1. OWNER to each MEMBER
2. Each MEMBER to each other MEMBER
3. Each MEMBER to the OWNER

B*-Tree as Generalized Access Path Structure

[Figure: B*-tree IEmp/Dept/Mgr/Equip(Dno) with root keys K25 K61 K99 and next-level keys K8 K13 K25 K33 K45 K61 K75 K86 K99. The leaf entry for K55 carries PRIOR and NEXT pointers, the reference counts (1, 3, 1, 4 in the example), the TIDs for Dept, the TIDs for Emp, the TIDs for Mgr, the TIDs for Equipment, and an optional reference to an overflow page]

Access path structure comprises
- 4 index structures
- 6 link structures
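A toy model of such leaf entries shows where the numbers 4 and 6 come from. The dictionary layout, the TID strings, and the helper names `lookup` and `join` are assumptions of this sketch; a real leaf would live in a B*-tree page.

```python
from itertools import combinations

TABLES = ("Dept", "Emp", "Mgr", "Equipment")

# One leaf entry per key value: key -> {table: [TIDs]}. All TID strings are made up.
leaves = {
    "K51": {"Dept": ["d5"], "Emp": ["e17", "e40"], "Mgr": [], "Equipment": []},
    "K55": {"Dept": ["d7"], "Emp": ["e1", "e25", "e31"], "Mgr": ["m2"],
            "Equipment": ["q3", "q4", "q8", "q9"]},
}


def lookup(key, table):
    """Primary or secondary key access: the TIDs of one table for one key value."""
    return leaves.get(key, {}).get(table, [])


def join(left, right):
    """Equi-join on the shared key: pair the TIDs stored in the same leaf entry."""
    return [(l, r) for entry in leaves.values()
            for l in entry[left] for r in entry[right]]


index_structures = len(TABLES)                          # one index per table
link_structures = len(list(combinations(TABLES, 2)))    # one link per pair of tables
print(index_structures, link_structures)                # prints 4 6
print(lookup("K55", "Emp"))
print(join("Dept", "Emp"))
```

Because the TIDs of all tables for one key value sit in the same entry, hierarchical access and the join need no further index lookups once the entry is found.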

Generalized Access Path Structure – Evaluation

• Keys are stored only once: saving of storage space
• Uniform structure for all access path types: simplification of implementation
• Support of join operation and certain statistical queries
• Simple checking of referential integrity and further integrity constraints (e.g., cardinality restrictions)
• Increased number of leaf pages: more page accesses in case of scanning all records of a record type in sort order
• Height of the tree remains stable in most cases: similar performance behavior for locating data and update

Join and Path Indexes

Join index
• The join index VI between two tables V and S (not necessarily disjoint) with the join attributes A and B is defined as follows:

  $VI = \{(v.TID, s.TID) \mid f(v.A, s.B) \text{ is TRUE},\ v \in V,\ s \in S\}$

• f denotes a Boolean function which defines the join predicate, which may be very complex. In particular, θ-joins (θ ∈ {=, ≠, <, ≤, >, ≥}) can be specified in this way.
• Application of selection predicates and parallelism for the join

Logical view:

V      S
TIDv2  TIDs4
TIDv1  TIDs3
TIDv2  TIDs2
TIDv2  TIDs6

VIV (index on TIDV):

V      S
TIDv1  TIDs3
TIDv2  TIDs2
TIDv2  TIDs4
TIDv2  TIDs6

VIS (index on TIDS):

S      V
TIDs2  TIDv2
TIDs3  TIDv1
TIDs4  TIDv2
TIDs6  TIDv2
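The definition of the join index translates directly into code. In this sketch the tables are reduced to mappings from TID to join attribute value; the attribute values are made up so that the result matches the example tables, and `build_join_index` is a hypothetical name.

```python
def build_join_index(V, S, f):
    """All TID pairs (v, s) whose join attribute values satisfy predicate f."""
    return [(v_tid, s_tid)
            for v_tid, a in V.items()
            for s_tid, b in S.items()
            if f(a, b)]


# TID -> join attribute value; the values are invented for this illustration.
V = {"TIDv1": 10, "TIDv2": 20}
S = {"TIDs2": 20, "TIDs3": 10, "TIDs4": 20, "TIDs5": 30, "TIDs6": 20}

VI = build_join_index(V, S, lambda a, b: a == b)   # equi-join as a special theta-join
VI_V = sorted(VI)                                  # copy ordered by the TIDs of V
VI_S = sorted((s, v) for v, s in VI)               # copy ordered by the TIDs of S
print(VI_V)
print(VI_S)
```

Any other Boolean function can be passed as `f`, for example `lambda a, b: a < b` for a θ-join with θ being <. The two sorted copies correspond to the two index variants, one for entering the join from V and one for entering it from S.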

Join and Path Indexes (2)

Multi-join index
• Generalization of the idea to efficiently process join operations via a statically computed join index (compile time instead of runtime)
• The index for a two-way join is used to determine the join partners in a third table T and to extend the index table by a column for the TIDti
• If two index tables for VS and ST already exist, they can be combined directly into an extended index table VST
• If the VST join should contain only attributes of V and T, a VT index can be created. Column S is nevertheless indispensable for computing the join

Multi-join index (example)
Index tables for the join: logical view

V      S
TIDv1  TIDs3
TIDv2  TIDs4
TIDv2  TIDs2

S      T
TIDs2  TIDt1
TIDs3  TIDt2
TIDs3  TIDt3
TIDs4  TIDt4
TIDs4  TIDt5

V      S      T
TIDv1  TIDs3  TIDt2
TIDv1  TIDs3  TIDt3
TIDv2  TIDs4  TIDt4
TIDv2  TIDs4  TIDt5
TIDv2  TIDs2  TIDt1
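Combining the two index tables of the example into the extended table can be sketched as follows; `extend_join_index` and `project_vt` are names chosen for this illustration.

```python
from collections import defaultdict

VS = [("TIDv1", "TIDs3"), ("TIDv2", "TIDs4"), ("TIDv2", "TIDs2")]
ST = [("TIDs2", "TIDt1"), ("TIDs3", "TIDt2"), ("TIDs3", "TIDt3"),
      ("TIDs4", "TIDt4"), ("TIDs4", "TIDt5")]


def extend_join_index(vs, st):
    """Combine the index tables VS and ST into VST via the shared S column."""
    partners = defaultdict(list)
    for s, t in st:
        partners[s].append(t)
    return [(v, s, t) for v, s in vs for t in partners[s]]


def project_vt(vst):
    """VT index: drop column S once the join has been computed."""
    return sorted({(v, t) for v, _, t in vst})


VST = extend_join_index(VS, ST)
for row in VST:
    print(*row)
print(project_vt(VST))
```

The S column is the only connection between the two input tables, which is why it is needed during the computation even if the final VT index no longer contains it.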

Join and Path Indexes (4)

Example
Given are the tables Dept, Emp, Proj and EP (Eno, Jno), which embodies an (n:m) relationship between Emp (Eno, Dno, ...) and Proj (Jno, ..., Loc).

Q2: SELECT D.Dno, D.ANAME
    FROM Dept D, Emp E, EP M, Proj J
    WHERE D.Dno = E.Dno
    AND E.Eno = M.Eno
    AND M.Jno = J.Jno
    AND J.Loc = :X

Path index
• Integration of an index on Loc into the multi-join index DEMJ

Dept   Emp    EP     Proj   Loc
TIDa1  TIDp1  TIDm1  TIDj1  Berlin
TIDa1  TIDp2  TIDm3  TIDj1  Berlin
TIDa1  TIDp2  TIDm4  TIDj2  Köln
TIDa2  TIDp3  TIDm5  TIDj3  Bonn
. . .

• Extension to n tables possible
• Enables evaluation of special queries on the index
• Assumption: multi-valued reference attributes in ORDBMS
• Analogous path expression to Q2: Dept.Employs-Emp.Works-at.Loc = :X
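How a query like Q2 can be answered from the path index alone is sketched below, using the four example rows. The in-memory list, the `build_loc_index` helper, and the function `depts_for_loc` are assumptions of this sketch; only the Dept TIDs are determined, fetching the Dept records themselves is left out.

```python
from collections import defaultdict

# Rows of the path index: Dept, Emp, EP, and Proj TIDs plus the indexed Loc value.
PATH_INDEX = [
    ("TIDa1", "TIDp1", "TIDm1", "TIDj1", "Berlin"),
    ("TIDa1", "TIDp2", "TIDm3", "TIDj1", "Berlin"),
    ("TIDa1", "TIDp2", "TIDm4", "TIDj2", "Köln"),
    ("TIDa2", "TIDp3", "TIDm5", "TIDj3", "Bonn"),
]


def build_loc_index(rows):
    """Integrated index on Loc: value -> positions of the matching path index rows."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[4]].append(pos)
    return index


LOC_INDEX = build_loc_index(PATH_INDEX)


def depts_for_loc(loc):
    """Dept TIDs qualifying for J.Loc = :X, without touching Emp, EP, or Proj."""
    seen, result = set(), []
    for pos in LOC_INDEX.get(loc, []):
        dept_tid = PATH_INDEX[pos][0]
        if dept_tid not in seen:         # several paths may lead to the same Dept
            seen.add(dept_tid)
            result.append(dept_tid)
    return result


print(depts_for_loc("Berlin"))  # prints ['TIDa1']
```

All three joins of Q2 are precomputed in the index rows, so the selection on Loc is the only work left at query time.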

Summary

Access paths for secondary keys
• Entry structure: B*-tree etc.
• Link structure: pointer lists, bit lists
• Many compression methods available

Support of set-theoretic operations

Compression of bit lists
• Support of variable-length keys and entries required
• Bit lists are highly efficient in case of low domain cardinality
• Huffman codes allow for flexible adaptation to value distributions

Hierarchical access paths
• Support of join operations (relational model)
• Efficient processing of Set operations (network model)
• Link structure: chains, pointer lists, lists (adjustment to special workloads)

Generalized access path structure
• Support of primary key, secondary key, and hierarchical accesses
• Also applicable as special join index

Join and path indexes
• Explicit construction of join results and their indexing
• Path indexes only enable optimization of special queries