Realization of DBS
8. Secondary and Hierarchical Access Paths
Theo Härder, www.haerder.de
© 2011 AG DBIS
Realization of Database Systems – SS 2011
Main reference: Theo Härder, Erhard Rahm: Datenbanksysteme – Konzepte und Techniken der Implementierung, Springer, 2001, Chapter 8.
Patrick O’Neil, Elizabeth O’Neil: Database – Principles, Programming, Performance, 2nd edition, Morgan Kaufmann Publ., 200, Chapter 8.
Secondary and Hierarchical Access Paths
Goals
• Design principles for access paths to all qualified records of a table
• Evaluation of search predicates by set-theoretic operations
• Mapping choices for hierarchical access requirements

Access via secondary keys
• Entry structure and link structure
• Use of pointer lists and compressed bit lists

Bit list compression
• Run length compression, null sequence compression, Golomb codes
• Multi-mode compression, block compression
• Huffman codes

Hierarchical access paths

Merged indexes (generalized access path structure)

Join and path index
Connection Structures for Record Sets
Materialized storage
1. Physical contiguity of records (clustering, lists)
2. Chaining of records

Referenced storage
3. Physical contiguity of pointers (inversion)
4. Chaining of pointers

[Figure: each of the four variants is illustrated with record 1, record 2, record 3]
Access Paths for Secondary Keys
Search for records having given values of non-identifying attributes (secondary keys)
• Result is a record set

Table with access paths for the secondary keys Dno and Loc:

Eno    Dno  Loc  Salary
12345  A02  KL   45000
23456  A02  MA   51000
34567  A03  KL   48000
45678  A02  MA   55000
56789  A03  F    65000
67890  A12  KL   50000

Entries for Dno: A02, A03, A12, . . .
Entries for Loc: KL, MA, F, . . .

Realization: entry structure + link structure
• Primary key access paths are applicable as entry structure to record sets
• In principle, all connection structures can be used for record sets; most frequently: use of B*-trees and inversion techniques

The standard solution for inversion are sequential reference lists (often called OID lists or TID lists)
• efficient processing of set operations
• cost-effective maintenance
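The following is a minimal sketch of inversion with sequential TID lists, using the table shown above. It assumes that Eno can stand in for the TID of a record; the function names are hypothetical and only illustrate how set operations on sorted reference lists evaluate a search predicate.

```python
from collections import defaultdict

# Rows of the example table: (Eno, Dno, Loc, Salary).
# Assumption: Eno is used as a stand-in for the TID of each record.
EMP = [
    (12345, "A02", "KL", 45000),
    (23456, "A02", "MA", 51000),
    (34567, "A03", "KL", 48000),
    (45678, "A02", "MA", 55000),
    (56789, "A03", "F", 65000),
    (67890, "A12", "KL", 50000),
]


def build_inversion(rows, column):
    """Map each secondary-key value to a sorted list of TIDs."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row[0])
    for tids in index.values():
        tids.sort()
    return dict(index)


def intersect(a, b):
    """AND of two sorted TID lists in a single merge pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """OR of two sorted TID lists (duplicates removed)."""
    return sorted(set(a) | set(b))


i_dno = build_inversion(EMP, 1)
i_loc = build_inversion(EMP, 2)
and_result = intersect(i_dno["A02"], i_loc["KL"])  # Dno = 'A02' AND Loc = 'KL'
or_result = union(i_dno["A02"], i_loc["KL"])       # Dno = 'A02' OR Loc = 'KL'
```

Only the TID lists are touched while the predicate is evaluated; the records themselves are fetched afterwards for the qualifying TIDs.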
Access Paths for Secondary Keys (2)
Frequent realization: inversion
• Separation of access path data and data records (referenced storage)
• Reference Z realized as TID, DBK/PPP, ...
• Two representation methods are possible:

a) Combined representation of lookup structure and pointer lists

key  (length)  pointer list
K02  4         Z Z Z Z
K03  3         Z Z Z
K12  1         Z
. . .

relatively short pointer lists assumed!

b) In the lookup structure, there exists (similar to access paths for primary keys) only a single reference per key value, which points to a list with references to records (pointer list)

key  (length)  separately stored pointer list
K02  4         Z Z Z Z
K03  3         Z Z Z
K12  1         Z
. . .

pointer lists are managed in separate “containers”
Access Paths for Secondary Keys (3)
table Emp ( Eno, Name, Dno, … )

E1   Müller   K55  …
E17  Maier    K51
E25  Schmitt  K55
…

[Figure: B*-tree IEmp(Dno); root with keys K25 K61 K99; next level with keys K8 K13 K25 K33 K45 K61 K75 K86 K99; leaf entries consist of key, number of TIDs and TID list, e.g. K55 n TID1 TID2 . . . TIDn; the entries for K51 and K61 carry 2 TIDs each]

B*-tree
• as access path for secondary key Dno
• represents sort order of secondary keys and forward/backward chaining

Complex search operations
• Range search
• Generic search
• Mask search (LIKE)
• Phonetic search
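Range search and generic search both rely on the sort order of the secondary keys. The sketch below mimics only the leaf level of such an access path with a sorted key list and binary search; it is not a B*-tree implementation, the class and method names are hypothetical, and Eno again stands in for the TID.

```python
from bisect import bisect_left, bisect_right


class SecondaryIndex:
    """Sorted keys with one TID list per key (stand-in for the leaf level)."""

    def __init__(self, entries):
        # entries: iterable of (key, tid) pairs
        lists = {}
        for key, tid in entries:
            lists.setdefault(key, []).append(tid)
        self.keys = sorted(lists)
        self.lists = lists

    def range_search(self, low, high):
        """TIDs of all records with low <= key <= high, in key order."""
        lo = bisect_left(self.keys, low)
        hi = bisect_right(self.keys, high)
        return [tid for key in self.keys[lo:hi] for tid in self.lists[key]]

    def generic_search(self, prefix):
        """TIDs of all records whose key starts with the given prefix."""
        start = bisect_left(self.keys, prefix)
        out = []
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break  # keys sharing a prefix are adjacent in sort order
            out.extend(self.lists[key])
        return out


# The three Emp rows listed above: (Dno, Eno)
i_emp_dno = SecondaryIndex([("K55", "E1"), ("K51", "E17"), ("K55", "E25")])
in_range = i_emp_dno.range_search("K52", "K60")
by_prefix = i_emp_dno.generic_search("K5")
```

Mask search and phonetic search cannot be answered by a key range alone; they would need a scan of the keys or a dedicated key transformation, which this sketch leaves out.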
Access Paths for Secondary Keys (4)
Use for Information Retrieval
• Unformatted data: documents
• Inversion by means of descriptors (no assignment to attributes!)
• Very many and very few references are possible

[Figure: descriptor “system” with reference list ZD1 ZD29 . . . ZD1234 pointing to the documents D1, D29, . . ., D1234; further reference lists ZD57 ZD302 . . . and ZD777 ZD1595; alternative representations: bit list, bit list compression]

Inversion using bit lists
• Addressing of data records or documents
  - Via allocation table AT
  - Directly in case of fixed length and contiguous storage
• Markings in the bit list correspond to entries of AT or computable addresses (b records per page)
• Attribute A has j attribute values a1, ..., aj
Access Paths for Secondary Keys (5)
Bit matrix for A

      1 2 3 . . . n
a1    0 1 0 0 1 0 0 . . .
a2    1 0 0 0 0 0 1 . . .
. . .
aj    0 0 0 1 0 1 0 . . .

Storage as vertical bit lists enables indexing of multi-valued attributes (example: shopping cart with products)

Bit lists of fixed length
• ji bit lists of attribute Ai
• Simple update operations
• Fast comparison
• Very space consuming
• Only for small j

Often long null sequences: compression
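A small sketch of inversion with uncompressed bit lists follows. It assumes that records are numbered 1..n and represents each bit list as a Python integer (bit i-1 stands for record i); the attribute values are those of the Dno and Loc columns of the six-row example table from the first slide on secondary keys. All names are hypothetical.

```python
def build_bit_lists(values):
    """One bit list per attribute value; bit i-1 is set if record i has that value."""
    lists = {}
    for pos, value in enumerate(values):
        lists[value] = lists.get(value, 0) | (1 << pos)
    return lists


def marks(bit_list):
    """Record numbers (1-based) whose bit is set."""
    return [i + 1 for i in range(bit_list.bit_length()) if (bit_list >> i) & 1]


dno = build_bit_lists(["A02", "A02", "A03", "A02", "A03", "A12"])
loc = build_bit_lists(["KL", "MA", "KL", "MA", "F", "KL"])
n = 6
all_records = (1 << n) - 1  # mask that keeps a complement within n bits

both = dno["A02"] & loc["KL"]                       # Dno = 'A02' AND Loc = 'KL'
either = dno["A02"] | loc["KL"]                     # Dno = 'A02' OR Loc = 'KL'
kl_not_a02 = loc["KL"] & ~dno["A02"] & all_records  # Loc = 'KL' AND NOT Dno = 'A02'
```

Each set operation is a single bitwise instruction per machine word, which is the "fast comparison" property; the price is one bit per record and attribute value, regardless of how few marks a list contains.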
Access Paths for Secondary Keys (6)
Compressed bit lists of variable length
• Space saving
• Reduction of I/O time
• Additional overhead for coding and decoding
• Fast comparison
• Cumbersome update operations

Application areas of compression
• Data warehouse (inversion of the fact table)
• Transfer/storage of
  - Multimedia objects (image, audio, video, ...)
  - Sparse matrices
  - Objects in geo-DBs, ...

Many compression techniques available
Compression of Bit Lists
Run length compression
A “run” is a bit sequence of uniform bit marks. The uncompressed bit list is divided into subsequent alternating sequences of ‘0’s and ‘1’s. The compression technique represents each run in a coding sequence by its length (stored as a binary number). A coding sequence can be composed of several coding units of fixed length (k bits). In case of a run length larger than $2^k-1$ bits, a coding sequence having more than one coding unit has to be used for the mapping. Compression of a run of length L with

$(n-1)(2^k-1) < L \le n(2^k-1)$, n = 1, 2, …

requires n coding units, where the first (n-1) coding units are completely filled with ‘0’s (low value), which makes it possible to recognize that subsequent coding units belong to the same coding sequence. Checking each coding unit for low values needs an extra test during decompression; such an implicit continuation mark prevents the method from failing for sequences of lengths > $2^k$.

Example (k=6):

run length   coding
1            000001
2            000010
63           111111
64           000000 000001
65           000000 000010

list of marks: 4, 5, 50, 115
000011 000010 101100 000001 000000 000001 000001
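The coding rule can be sketched as follows. The sketch assumes, as in the example, that the bit list starts with a run of ‘0’s (a leading ‘1’ would need an additional convention that the slide does not specify); bit lists are plain strings, and all function names are hypothetical.

```python
from itertools import groupby


def encode_run(length, k):
    """Coding units for one run; all-zero units act as continuation marks."""
    if length < 1:
        raise ValueError("a run has at least one bit")
    max_len = 2**k - 1
    n = (length + max_len - 1) // max_len  # number of coding units needed
    last = length - (n - 1) * max_len      # value stored in the final unit
    return ["0" * k] * (n - 1) + [format(last, f"0{k}b")]


def run_length_encode(bits, k):
    """Encode a bit string that starts with a run of '0's."""
    if not bits.startswith("0"):
        raise ValueError("this sketch assumes that the first run consists of '0's")
    units = []
    for _, run in groupby(bits):
        units += encode_run(len(list(run)), k)
    return units


def run_length_decode(units, k):
    """Inverse mapping: runs alternate, starting with '0'."""
    max_len = 2**k - 1
    out, pending, bit = [], 0, "0"
    for unit in units:
        value = int(unit, 2)
        if value == 0:              # low value: the run continues in the next unit
            pending += max_len
        else:
            out.append(bit * (pending + value))
            pending = 0
            bit = "1" if bit == "0" else "0"
    return "".join(out)


def bit_list(marks, n):
    """Uncompressed bit list of length n with '1' at the given 1-based positions."""
    return "".join("1" if i in marks else "0" for i in range(1, n + 1))


example = bit_list({4, 5, 50, 115}, 115)
coded = run_length_encode(example, 6)
```

The test for an all-zero unit in `run_length_decode` is the extra check mentioned in the text.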
Compression of Bit Lists (2)
Null sequence compression
A null sequence is a sequence of ‘0’ bits between two ‘1’ bits in the uncompressed bit list. The basic idea of the method is to represent the bit list only by subsequent null sequences, where a ‘1’ bit is implicitly expressed in each case. Because a null sequence of length L=0 can now occur, the following coding can be chosen (k=6), which corresponds to the addition of binary numbers (each at most $2^k-1$):

length of null sequence   coding
0                         000000
1                         000001
62                        111110
63                        111111 000000
64                        111111 000001

Because a coding sequence can be composed in an additive way from several coding units, null sequences of arbitrary lengths can be represented. n coding units are required if the following holds for L:

$(n-1)(2^k-1) \le L < n(2^k-1)$, n = 1, 2, …

list of marks: 4, 5, 50, 115 (k=6)
000011 000000 101100 111111 000001
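A minimal sketch of this additive coding, assuming that the bit list is given as an ascending list of 1-based mark positions. A unit filled with ‘1’s contributes $2^k-1$ zeros and signals that the coding sequence continues; the function names are hypothetical.

```python
def null_sequences(marks):
    """Lengths of the '0' runs preceding each '1' mark (marks are 1-based, ascending)."""
    lengths, prev = [], 0
    for m in marks:
        lengths.append(m - prev - 1)
        prev = m
    return lengths


def encode_null_sequence(length, k):
    """Additive coding: full units of 2^k - 1, then the remainder (possibly 0)."""
    max_val = 2**k - 1
    full, rest = divmod(length, max_val)
    return ["1" * k] * full + [format(rest, f"0{k}b")]


def null_encode(marks, k):
    units = []
    for length in null_sequences(marks):
        units += encode_null_sequence(length, k)
    return units


def null_decode(units, k):
    """Rebuild the mark positions; every unit below 2^k - 1 ends with an implicit '1'."""
    max_val = 2**k - 1
    marks, pos = [], 0
    for unit in units:
        value = int(unit, 2)
        pos += value
        if value < max_val:
            pos += 1
            marks.append(pos)
    return marks


coded = null_encode([4, 5, 50, 115], 6)
```

Compared with run length coding, the ‘1’ runs need no units of their own, which pays off when marks are isolated.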
Compression of Bit Lists (3)
Golomb coding (for null sequence compression)
A null sequence of length L is represented by a coding sequence consisting of a variable-length prefix, a separator bit, and a remainder field of fixed length using $\log_2 m$ bits. The prefix is composed of $\lfloor L/m \rfloor$ ‘1’ bits, followed by a ‘0’ bit as separator. The remainder contains (as a binary number) the number of remaining ‘0’ bits of the null sequence: $L - m \cdot \lfloor L/m \rfloor$. This method enables the compression of arbitrarily long null sequences (improved by Exp-Golomb), independent of the chosen parameters. If p is the ‘0’-bit probability in the bit list, parameter m should be chosen such that $p^m \approx 0.5$.

Example (m=4):
. . . 01 0000 0000 0000 0000 000 10 . . .   (null sequence of length 19 = 4 · m + 3)
coding: 1111 0 11   (prefix 1111, separator 0, remainder 11)

list of marks: 4, 5, 50, 115 (m=8)
0011 0000 111110100 111111110000
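The definition translates almost directly into code. The sketch below assumes that m is a power of two (so that the remainder fits exactly into $\log_2 m$ bits) and codes one null sequence at a time; the function names are hypothetical.

```python
def golomb_encode(length, m):
    """Prefix of floor(L/m) '1' bits, '0' separator, remainder in log2(m) bits."""
    if m < 2 or m & (m - 1):
        raise ValueError("this sketch assumes that m is a power of two >= 2")
    width = m.bit_length() - 1  # log2(m)
    q, r = divmod(length, m)
    return "1" * q + "0" + format(r, f"0{width}b")


def golomb_decode(code, m):
    """Length of the null sequence represented by one coding sequence."""
    width = m.bit_length() - 1
    q = code.index("0")  # number of leading '1' bits
    r = int(code[q + 1:q + 1 + width], 2)
    return q * m + r


single = golomb_encode(19, 4)
coded = [golomb_encode(length, 8) for length in (3, 0, 44, 64)]
```

Because the prefix is unary, the code length grows linearly in L/m; this is why m should match the expected length of null sequences, as expressed by the condition on p.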
Compression of Bit Lists (4)
Multi-mode compression
Some bits of a coding sequence of fixed length k are reserved as so-called type bits to mark different modes of a coding sequence. A single type bit enables two modes:
1 : k-1 bits of the sequence are stored as “bit pattern”;
0 : up to $2^{k-1}-1$ bits of a null sequence are expressed by a binary number

Example
list of marks: 4, 5, 50, 115 (k=6, single type bit)
100011 011111 001101 110000 011111 011101 110000
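The text does not state when an encoder should switch between the two modes. The sketch below uses one possible greedy policy, which is an assumption: if at least k-1 ‘0’ bits follow the current position, a null sequence unit is emitted, otherwise the next k-1 bits are stored as a pattern (padded with ‘0’s at the end of the list). With this policy the coding of the example is reproduced. Function names are hypothetical.

```python
def multi_mode_encode(bits, k):
    """Single type bit: '1' + (k-1)-bit pattern, or '0' + length of a null sequence."""
    width = k - 1
    max_null = 2**width - 1
    units, pos = [], 0
    while pos < len(bits):
        nxt = bits.find("1", pos)
        zeros = (len(bits) if nxt < 0 else nxt) - pos
        if zeros >= width:
            # Assumed policy: a null sequence unit pays off from k-1 zeros on.
            run = min(zeros, max_null)
            units.append("0" + format(run, f"0{width}b"))
            pos += run
        else:
            units.append("1" + bits[pos:pos + width].ljust(width, "0"))
            pos += width
    return units


def multi_mode_decode(units, n):
    """Rebuild the bit list of length n (padding of the last pattern is cut off)."""
    parts = []
    for unit in units:
        if unit[0] == "1":
            parts.append(unit[1:])
        else:
            parts.append("0" * int(unit[1:], 2))
    return "".join(parts)[:n]


def bit_list(marks, n):
    """Uncompressed bit list of length n with '1' at the given 1-based positions."""
    return "".join("1" if i in marks else "0" for i in range(1, n + 1))


example = bit_list({4, 5, 50, 115}, 115)
coded = multi_mode_encode(example, 6)
```

The example shows the weakness that motivates further modes: each isolated ‘1’ costs a full pattern unit, and each long null sequence needs two units because of the small k.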
Compression of Bit Lists (5)
Multi-mode compression (cont.)
Because of the restricted k, long null sequences require the use of several subsequent coding sequences. Furthermore, isolated ‘1’s in a bit list need a separate coding sequence to code them as a bit pattern. Reserving a further type bit enables greater flexibility with, for example, the following four modes:
11 : k-2 bits of the sequence are stored as bit pattern;
10 : up to $2^{k-2}-1$ bits are encoded as a sequence of ‘1’s by a binary number;
01 : up to $2^{k-2}-1$ bits are encoded as null sequence by a binary number;
00 : up to $2^{2k-2}-1$ bits are encoded as null sequence in a doubled coding sequence

If a ‘00’ sequence is large enough to compress any null sequence, isolated ‘1’s could be implicitly expressed.

Example
list of marks: 4, 5, 50, 115 (k=8, two type bits)
11000110 01101011 00000000 01000000 …
Compression of Bit Lists (6)
Block compression
The uncompressed bit list is divided into blocks of length k. A first method replaces the individual blocks by codes of variable length. If the probabilities of specific bit patterns are known or can be estimated, Huffman codes can be used. Using block length k, $2^k$ different patterns require $2^k$ code words of variable length (use of a translation table with optimally assigned code words).

A second method stores only blocks where at least one ‘1’ bit occurs. To mark the blocks (low value blocks) which are not stored, a second bit list is used as a directory, in which each ‘1’ mark corresponds to a block stored. Because long null sequences may occur in the directory, it can again be compressed using null sequence or multi-mode compression.

The idea of applying block compression again to the directory leads to hierarchical block compression. It can be continued recursively until the elimination of null sequences is no longer worthwhile. Starting from the highest hierarchy level, the uncompressed bit list (index depth d) can be easily reconstructed.
Compression of Bit Lists (7)
Example
• node size l = 4 and index depth d = 3
• indexed set S = {2, 3, 9, 12, 13, 14, 38, 40}

root (level 1):         1010
inner nodes (level 2):  1011 0000 0100 0000
leaves (level 3):       0110 0000 1001 1100 0000 0000 0000 0000 0000 0101 0000 0000 0000 0000 0000 0000
                        (bit positions 1 to 64)

• physical storage (addresses 500010, 500020, 500030, 500040, . . .): only nodes containing at least one ‘1’ are stored
  root: 1010
  inner nodes: 1011, 0100
  leaves: 0110, 1001, 1100, 0101
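Hierarchical block compression for this example can be sketched as follows. The sketch assumes a bit list of exactly $l^d$ positions and returns the stored nodes level by level; the order in which nodes are physically placed on pages is not modeled, and the function names are hypothetical.

```python
def hbc_compress(marks, l, d):
    """Stored (non-zero) nodes per level, from the root (level 1) down to the leaves."""
    level = ["1" if i in marks else "0" for i in range(1, l**d + 1)]
    stored = []
    for _ in range(d):
        blocks = ["".join(level[i:i + l]) for i in range(0, len(level), l)]
        stored.append([b for b in blocks if "1" in b])       # drop low value blocks
        level = ["1" if "1" in b else "0" for b in blocks]   # directory for next level
    return stored[::-1]


def hbc_decompress(stored, l):
    """Rebuild the set of marks, starting from the highest hierarchy level."""
    bits = "".join(stored[0])
    for nodes in stored[1:]:
        it = iter(nodes)
        # Each '1' in the directory stands for the next stored node,
        # each '0' for a block of l zeros that was not stored.
        bits = "".join(next(it) if b == "1" else "0" * l for b in bits)
    return [i + 1 for i, b in enumerate(bits) if b == "1"]


S = {2, 3, 9, 12, 13, 14, 38, 40}
levels = hbc_compress(S, 4, 3)
```

For the example, 7 nodes of 4 bits each (28 bits) replace the 64-bit list; the gain grows with the length of the null sequences.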
Optimal Codes
Extended binary trees with minimal external path length can be used to design optimal codes for n+1 characters

Sequence to be coded: A A B C A A B B C A D B A B A (15 characters)
• Codes of fixed length: 2 bit
  A = 00
  . . .
  D = 11
  C_2Bit = 15 * 2 = 30

Are there better codings?
[Figure: table with the columns character, frequency, code and the corresponding extended binary tree]
• no code is a prefix of another one
• E_w = C_Code (the weighted external path length equals the cost of the code)

Decoding can be performed with the same extended binary tree used to determine the codes

Proceeding:
A A B C A . . . = 0 0 1 0 1 1 0 0 . . .
0|0|10|110|0| . . . = A A B C A . . .
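The gain over the fixed-length code can be checked with a short sketch. The codes A = 0, B = 10 and C = 110 follow from the decoding example; D = 111 is an assumption, chosen as the only remaining 3-bit word that keeps the code prefix-free. Decoding uses a lookup table instead of a walk through the tree, which is equivalent for a prefix-free code. All names are hypothetical.

```python
from collections import Counter

# A, B, C as in the decoding example; D = 111 is an assumption.
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Prefix-free decoding: emit a character as soon as a code word is complete."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends inside a code word")
    return "".join(out)


text = "AABCAABBCADBABA"
frequencies = Counter(text)
coded = encode(text, CODE)
cost_variable = len(coded)
cost_fixed = 2 * len(text)
```

With the frequencies obtained by counting (A: 7, B: 5, C: 2, D: 1), the variable-length code needs 7·1 + 5·2 + 2·3 + 1·3 = 26 bits instead of 30.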
Huffman Algorithm
The minimal coding can be derived using extended binary trees having minimal weighted external path length. The resulting codes are called Huffman codes.

Algorithm for the construction of binary trees with minimal weighted external path length

Given: List of trees which initially consists of n external nodes as roots. The frequencies qi are carried by the roots of the trees.

Idea: Determine the two trees with the lowest frequencies and remove them from the list. By means of a new root, both trees found are composed as left and right subtree to a new tree, which is inserted into the list.
n external nodes ⇒ n-1 new trees = internal nodes

Algorithm: Huffman (TreeList list, int n)
for (i = 1; i < n; i += 1)
{ p1 = “smallest element from list”
  “remove p1 from list”
  p2 = “smallest element from list”
  “remove p2 from list”
  “create node p”
  “attach p1 and p2 as subtrees to p”
  “determine the weight of p as sum of the weights p1 and p2”
  “insert p into list”
}
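A runnable sketch of the construction follows. It deviates from the pseudocode in one respect, stated here as a design choice: the list is replaced by a binary heap (Python's `heapq`), so that finding and removing the smallest element takes logarithmic time. Instead of explicit tree nodes, each heap entry carries the depth of every character below it, which is all that is needed for the code lengths. The function names are hypothetical.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Code length per character for a tree with minimal weighted external path length."""
    tie = count()  # unique tie-breaker, so that the dictionaries are never compared
    heap = [(weight, next(tie), {ch: 0}) for ch, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                   # n - 1 merge steps, one per internal node
        w1, _, d1 = heapq.heappop(heap)    # p1 = smallest element
        w2, _, d2 = heapq.heappop(heap)    # p2 = next smallest element
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


def weighted_external_path_length(freqs, lengths):
    return sum(freqs[ch] * lengths[ch] for ch in freqs)


freqs = {"A": 7, "B": 5, "C": 2, "D": 1}
lengths = huffman_code_lengths(freqs)
e_w = weighted_external_path_length(freqs, lengths)
```

The concrete bit strings depend on which subtree is labeled 0 and which 1; the code lengths and the weighted external path length do not.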
Huffman Algorithm (2)
Execution example
[Figure: six external nodes 1 to 6 with frequencies qi; the trees T1 to T5 are built step by step; E_w denotes the resulting weighted external path length]

Cost:
$C(n) = C_0 + C_1 \sum_{i=0}^{n-1} (n-i)$, with $\sum_{i=0}^{n-1} (n-i) = \frac{n(n+1)}{2}$, hence $C(n) = O(n^2)$
Assignment of Huffman-Codes – Example
First code assignment:

Bitstring   Li   Oi value range
0000001     48   [-2.8x10^14, -4.3x10^9]
0000010     32   [-4.3x10^9, -69977]
0000011     16   [-69976, -4441]
000010      12   [-4440, -345]
000011      8    [-344, -89]
00010       6    [-88, -25]
00011       4    [-24, -9]
001         3    [-8, -1]
01          3    [0, 7]
100         4    [8, 23]
101         6    [24, 87]
1100        8    [88, 343]
1101        12   [344, 4439]
11100       16   [4440, 69975]
11101       32   [69976, 4.3x10^9]
11110       48   [4.3x10^9, 2.8x10^14]

Second code assignment:

Bitstring   Li   Oi value range
000000001   20   [-1118485, -69910]
00000001    16   [-69909, -4374]
0000001     12   [-4373, -278]
000001      8    [-277, -22]
00001       4    [-21, -6]
0001        2    [-5, -2]
001         1    [-1, 0]
01          0    [1, 1]
10          1    [2, 3]
110         2    [4, 7]
1110        4    [8, 23]
11110       8    [24, 279]
111110      12   [280, 4375]
1111110     16   [4376, 69911]
11111110    20   [69912, 1118487]

[Figure: for each assignment, the extended binary tree whose edge labels 0/1 yield the bit strings]
Hierarchical Access Paths
Realization of functional relationships among two record types
• Owner-Member: Set types according to the network model
• Each instance of an Owner record type is linked to 0..n instances of the Member record type

Logical view: illustration of navigation options

Owner Dept (Dno, Mno, D-Loc)
K02  ABEL    KL
K03  SCHULZ  DA

Member Emp (Eno, Dno, Loc, Salary)
1234  K02  DA  40000
4488  K02  KL  50000
5678  K02  KL  60000
6927  K03  DA  45000
4711  K03  FR  55000

Pointers per Set occurrence: FIRST/NEXT and LAST/PRIOR at the owner, NEXT and PRIOR between the members, OWNER from each member to its owner
Three implementations for different performance requirements
Hierarchical Access Paths – Implementation
Sequential list based on pages
[Figure: SET OWNER followed by SET MEMBER 1, SET MEMBER 2, SET MEMBER 3 and SET MEMBER 4 in physically contiguous storage; the owner holds a Last pointer]

Chained list
[Figure: SET OWNER with pointers to the first member and to the last member (Last/PRIOR); SET MEMBER 1, SET MEMBER 2, SET MEMBER 3 and SET MEMBER 4 are chained; some of the pointers are optional]
Hierarchical Access Paths – Implementation (2)
Pointer array structure
[Figure: SET OWNER references a POINTER ARRAY with POINTER-ARRAY ENTRY 1 to ENTRY 4; each entry points to one of SET MEMBER 1 to SET MEMBER 4; some of the pointers are optional]
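The chained list and the pointer array can be contrasted in a small in-memory sketch. Object references stand in for TIDs, page placement is ignored (so the sequential list is not modeled), and the class and function names are hypothetical.

```python
class Member:
    def __init__(self, tid):
        self.tid = tid
        self.next = None    # NEXT pointer of the chained list
        self.prior = None   # optional PRIOR pointer
        self.owner = None   # optional OWNER pointer


class Owner:
    def __init__(self, tid):
        self.tid = tid
        self.first = None        # FIRST/NEXT pointer of the chained list
        self.last = None         # LAST/PRIOR pointer
        self.pointer_array = []  # pointer array: one entry per member


def connect(owner, member):
    """Append a member to the Set occurrence of owner in both structures."""
    member.owner = owner
    member.prior = owner.last
    if owner.last is None:
        owner.first = member
    else:
        owner.last.next = member
    owner.last = member
    owner.pointer_array.append(member)


def scan_chain(owner):
    """Sequential access via NEXT pointers: every member record is visited."""
    member = owner.first
    while member is not None:
        yield member.tid
        member = member.next


def scan_array(owner):
    """Access via the pointer array: all references are known up front."""
    return [member.tid for member in owner.pointer_array]


dept = Owner("K02")
for eno in ("1234", "4488", "5678"):
    connect(dept, Member(eno))
```

In the chained variant the i-th member can only be reached through its predecessors, whereas the pointer array allows direct access to any member; this is one reason why the two behave differently when Set occurrences grow.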
Hierarchical Access Paths – Evaluation of Implementation Techniques

Pointer array
• Stable performance behavior
• Behavior independent of Set growth and Set sequence
• “Standard” method in case of imprecise information concerning Set size and access frequency

Sequential list
• Restricted to a single Set type per Member record type (clustering)
• Fast location / insertion in Set sequence
• Updates more expensive than for pointer array

Chained list
• Advantages in case of membership of the Member record type in several Sets
• Cheap switch to other Set occurrences
• Sequential access faster than for pointer array
• Only useful in small Set occurrences
Generalized Access Path Structure
Idea: Shared exploitation of an index structure (B*-tree) for several record types for which the relationships (1:1, 1:n, n:m) are defined over the same domain (e.g. for Dno) and represented by equality of attribute values

[Figure: tables Dept, Emp, Mgr and Equipment; all tables carry an attribute (e.g. Dno) which is defined on domain Deptno]

Use of the index structure for
• primary key access, e.g. as IDept(Dno)
• secondary key access, e.g. as IEmp(Dno)
• hierarchical access, e.g. of Dept(Dno) to Emp(Dno) or vice versa
• join operations (Join), e.g. of Dept.Dno = Emp.Dno

Combined realization of primary key, secondary key, and hierarchical access paths using an extended B*-tree
• Inner tree nodes remain unchanged
• Leaves contain references for primary and secondary access paths
B*-Tree as Combined Access Path Structure

[Figure: B*-tree IEmp(Dno); root with keys K25 K61 K99; next level with keys K8 K13 K25 K33 K45 K61 K75 K86 K99; leaf entries consist of key, number of TIDs and TID list, e.g. K55 n TID1 TID2 . . . TIDn]

[Figure: B*-tree IEmp/Dept(Dno) with the same keys; leaf entry for K55: 1 n TID0 TID1 TID2 . . . TIDn]

Structure contains index for Dept, Emp and link for Dept-Emp with direct access from
1. OWNER to each MEMBER,
2. each MEMBER to each other MEMBER,
3. each MEMBER to the OWNER
B*-Tree as Generalized Access Path Structure
[Figure: B*-tree IEmp/Dept/Mgr/Equip(Dno); root with keys K25 K61 K99; next level with keys K8 K13 K25 K33 K45 K61 K75 K86 K99; leaf entry for K55 with PRIOR and NEXT pointers: 1 TID (TIDs for Dept), 3 TID TID TID (TIDs for Emp), 1 TID (TIDs for Mgr), 4 TID TID TID TID (TIDs for Equipment), followed by an optional reference to an overflow page]

Access path structure comprises
- 4 index structures
- 6 link structures
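A generalized access path can be sketched as one entry per key value that holds a separate TID list for each record type. The sketch below keeps the entries in a dictionary instead of a B*-tree and uses hypothetical TIDs; the class name and its methods are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


class GeneralizedIndex:
    """One entry per key value; per entry one TID list for each record type."""

    def __init__(self, record_types):
        self.record_types = list(record_types)
        self.entries = defaultdict(lambda: {rt: [] for rt in self.record_types})

    def insert(self, record_type, key, tid):
        self.entries[key][record_type].append(tid)

    def lookup(self, record_type, key):
        """Primary or secondary key access for one record type."""
        if key not in self.entries:
            return []
        return list(self.entries[key][record_type])

    def join(self, left, right):
        """Equi-join over the shared key: TID pairs found in the same entry."""
        return [(l_tid, r_tid)
                for key in sorted(self.entries)
                for l_tid in self.entries[key][left]
                for r_tid in self.entries[key][right]]


index = GeneralizedIndex(["Dept", "Emp", "Mgr", "Equipment"])
index.insert("Dept", "K55", "TID_d55")  # hypothetical TIDs
index.insert("Emp", "K55", "TID_e1")
index.insert("Emp", "K55", "TID_e25")
index.insert("Emp", "K51", "TID_e17")

links = list(combinations(index.record_types, 2))
dept_emp = index.join("Dept", "Emp")
```

With four record types, the structure acts as four indexes and, counting every pair of record types, as six link structures, which matches the numbers given above.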
Generalized Access Path Structure – Evaluation
Keys are stored only once
• Saving of storage space

Uniform structure for all access path types
• Simplification of implementation

Support of join operation and certain statistical queries

Simple checking of referential integrity and further integrity constraints (e.g., cardinality restrictions)

Increased number of leaf pages
• More page accesses in case of scanning all records of a record type in sort order

Height of the tree remains stable in most cases
• Similar performance behavior for locating data and update
Join and Path Indexes
Join index
• The join index VI between two tables V and S (not necessarily disjoint) with the join attributes A and B is defined as follows:
  $VI = \{(v.TID, s.TID) \mid f(v.A, s.B) \text{ is TRUE}, v \in V, s \in S\}$
• f denotes a Boolean function which defines the join predicate, which may be very complex. In particular, θ-joins (θ ∈ {=, ≠, <, ≤, >, ≥}) can be specified in this way.
• Application of selection predicates and parallelism for the join

VI (logical view)     VIV (index on TIDV)     VIS (index on TIDS)
V      S              V      S                S      V
TIDv2  TIDs4          TIDv1  TIDs3            TIDs2  TIDv2
TIDv1  TIDs3          TIDv2  TIDs2            TIDs3  TIDv1
TIDv2  TIDs2          TIDv2  TIDs4            TIDs4  TIDv2
TIDv2  TIDs6          TIDv2  TIDs6            TIDs6  TIDv2
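The definition of VI can be sketched with a nested loop over both tables. The attribute values below are hypothetical and chosen so that an equi-join yields exactly the TID pairs of the example; the two sorted copies correspond to the indexes on TIDV and TIDS. Function and variable names are assumptions.

```python
def build_join_index(v_table, s_table, f):
    """VI = {(v.TID, s.TID) | f(v.A, s.B) is TRUE}; tables map TID -> join attribute."""
    return [(v_tid, s_tid)
            for v_tid, a in v_table.items()
            for s_tid, b in s_table.items()
            if f(a, b)]


# Hypothetical attribute values for V.A and S.B
V = {"TIDv1": 10, "TIDv2": 20}
S = {"TIDs1": 30, "TIDs2": 20, "TIDs3": 10, "TIDs4": 20, "TIDs5": 40, "TIDs6": 20}

vi = build_join_index(V, S, lambda a, b: a == b)  # equi-join
vi_v = sorted(vi)                                 # clustered by TIDv
vi_s = sorted((s_tid, v_tid) for v_tid, s_tid in vi)  # clustered by TIDs

theta = build_join_index(V, S, lambda a, b: a < b)  # a theta-join with '<'
```

Because f is an arbitrary Boolean function, the same construction covers θ-joins; only the cost of building and maintaining the index changes.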
Join and Path Indexes (2)
Multi-join index
• Generalization of the idea to efficiently process join operations via a statically computed join index (compile time instead of runtime)
• Index for a two-way join is used to determine the join partners in a third table T and to extend the index table by a column for the TIDti.
• If two index tables for VS and ST already exist, these can be immediately used to combine them to an extended index table VST
• If the VST join should contain only attributes of V and T, a VT index can be created. Column S is indispensable for the join computation

Multi-join index (example)
Index tables for the join: logical view

V      S            S      T            V      S      T
TIDv1  TIDs3        TIDs2  TIDt1        TIDv1  TIDs3  TIDt2
TIDv2  TIDs4        TIDs3  TIDt2        TIDv1  TIDs3  TIDt3
TIDv2  TIDs2        TIDs3  TIDt3        TIDv2  TIDs4  TIDt4
                    TIDs4  TIDt4        TIDv2  TIDs4  TIDt5
                    TIDs4  TIDt5        TIDv2  TIDs2  TIDt1
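Combining the two index tables of the example is itself a join on the S column. A minimal sketch, with hypothetical function names:

```python
def extend_join_index(vs, st):
    """Combine index tables VS and ST into the extended index table VST."""
    partners = {}
    for s_tid, t_tid in st:
        partners.setdefault(s_tid, []).append(t_tid)
    return [(v_tid, s_tid, t_tid)
            for v_tid, s_tid in vs
            for t_tid in partners.get(s_tid, [])]


def project_vt(vst):
    """VT index: column S is needed for the computation, but not in the result."""
    return list(dict.fromkeys((v_tid, t_tid) for v_tid, _, t_tid in vst))


VS = [("TIDv1", "TIDs3"), ("TIDv2", "TIDs4"), ("TIDv2", "TIDs2")]
ST = [("TIDs2", "TIDt1"), ("TIDs3", "TIDt2"), ("TIDs3", "TIDt3"),
      ("TIDs4", "TIDt4"), ("TIDs4", "TIDt5")]

vst = extend_join_index(VS, ST)
vt = project_vt(vst)
```

No base table is accessed while VST is built; only TIDs are matched.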
Join and Path Indexes (4)
Example
Given are the tables Dept, Emp, Proj and EP (Eno, Jno), which embodies an (n:m) relationship between Emp (Eno, Dno, ...) and Proj (Jno, ..., Loc).

Q2: SELECT D.Dno, D.ANAME
    FROM Dept D, Emp E, EP M, Proj J
    WHERE D.Dno = E.Dno
    AND E.Eno = M.Eno
    AND M.Jno = J.Jno
    AND J.Loc = :X

Path index
• Integration of an index on Loc into the multi-join index DEMJ
• Extension to n tables possible
• Enables evaluation of special queries on the index

Dept    Emp     EP      Proj    Loc
TIDa1   TIDp1   TIDm1   TIDj1   Berlin
TIDa1   TIDp2   TIDm3   TIDj1   Berlin
TIDa1   TIDp2   TIDm4   TIDj2   Köln
TIDa2   TIDp3   TIDm5   TIDj3   Bonn
. . .   . . .   . . .   . . .   . . .

• Assumption: multi-valued reference attributes in ORDBMS
• Analogous path expression to Q2: Dept.Employs-Emp.Works-at.Loc = :X
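How a query like Q2 can be answered on the path index alone is sketched below with the rows of the example table. The lookup structure on Loc is modeled as a dictionary; the function names are hypothetical.

```python
# Rows of the path index: (Dept, Emp, EP, Proj, Loc)
PATH_INDEX = [
    ("TIDa1", "TIDp1", "TIDm1", "TIDj1", "Berlin"),
    ("TIDa1", "TIDp2", "TIDm3", "TIDj1", "Berlin"),
    ("TIDa1", "TIDp2", "TIDm4", "TIDj2", "Köln"),
    ("TIDa2", "TIDp3", "TIDm5", "TIDj3", "Bonn"),
]


def build_loc_index(rows):
    """Integrated index on Loc: location value -> rows of the multi-join index."""
    index = {}
    for row in rows:
        index.setdefault(row[4], []).append(row)
    return index


def depts_for_location(loc_index, x):
    """TIDs of the Dept records qualifying for J.Loc = :X, without duplicates."""
    return list(dict.fromkeys(row[0] for row in loc_index.get(x, [])))


loc_index = build_loc_index(PATH_INDEX)
berlin = depts_for_location(loc_index, "Berlin")
```

Only the Dept records behind the returned TIDs have to be fetched to produce the output columns; Emp, EP and Proj are not accessed at all. The index helps only queries that follow this particular path.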
Summary
Access paths for secondary keys
• Entry structure: B*-tree etc.
• Link structure: pointer lists, bit lists
• Many compression methods available

Support of set-theoretic operations

Compression of bit lists
• Support of variable-length keys and entries required
• Bit lists are highly efficient in case of low domain cardinality
• Huffman codes allow for flexible adaptation to value distributions

Hierarchical access paths
• Support of join operations (relational model)
• Efficient processing of Set operations (network model)
• Link structure: chains, pointer lists, lists (adjustment to special workloads)

Generalized access path structure
• Support of primary key, secondary key and hierarchical accesses
• Also applicable as special join index

Join and path indexes
• Explicit construction of join results and their indexing
• Path indexes only enable optimization of special queries