indexes on sequential files
DESCRIPTION
Indexes on Sequential Files. Source: our textbook, slides by Hector Garcia-Molina. How to Represent a Relation. Suppose we scatter its records arbitrarily among the blocks of the disk How to answer SELECT * FROM R? Scan every block: ridiculously slow - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/1.jpg)
1
Indexes on Sequential Files
Source: our textbook, slides by Hector Garcia-Molina
![Page 2: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/2.jpg)
2
How to Represent a Relation
Suppose we scatter its records arbitrarily among the blocks of the disk
How to answer SELECT * FROM R? Scan every block:
ridiculously slow would require lots of overhead info in
each block and each record header
![Page 3: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/3.jpg)
3
How to Represent a Relation
Reserve some blocks for the relation No need to scan entire disk How to answer SELECT * FROM R
WHERE cond ? Scan all the records in the reserved
blocks Still ridiculously slow
![Page 4: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/4.jpg)
4
Indexes Use indexes -- special data
structures -- that allow us to find all the records that satisfy a condition "efficiently"
Possible data structures: simple indexes on sorted files secondary indexes on unsorted files B-trees hash tables
![Page 5: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/5.jpg)
5
Sorted Files Sorted file: records (tuples) of the
file (relation) are in sorted order of the field (attribute) of interest.
This field might or might not be a key of the relation.
This field is called the search key. A sorted file is also called a
sequential file.
![Page 6: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/6.jpg)
6
Index on Sequential File An index is another file containing
key-pointer pairs of the form (K,a) K is a search key a is an address (pointer) The record at address a has search
key K Particularly useful when the search
key is the primary key of the relation
![Page 7: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/7.jpg)
7
Dense Indexes An index with one entry for every key in
the data file What's the point? Index is much smaller than data file when
record contains much more than just the search key
If index is small enough to fit in main memory, record with a certain search key can be found quickly: binary search in memory, followed by only one disk I/O
![Page 8: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/8.jpg)
8
Example of a Dense IndexSequential File
2010
4030
6050
8070
10090
Dense Index102030405060708090
100110120
![Page 9: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/9.jpg)
9
Some Numbers relation with 1,000,000 tuples block size is 4096 bytes 10 records per block thus 100,000 blocks, > 400 Mbytes key field is 30 bytes pointer is 8 bytes thus at least 100 key-pointer pairs per block thus dense index size is 10,000 blocks, about 40
Mbytes since log(10,000) = 13, takes at most 14 disk
I/O's for a search
![Page 10: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/10.jpg)
10
Sparse Index Uses less space than a dense index Requires more time to find a
record with a given key In a sparse index, there is just one
(key,pointer) pair per data block. The key is for the first record in the
block.
![Page 11: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/11.jpg)
11
Sparse Index ExampleSequential File
2010
4030
6050
8070
10090
Sparse Index1030507090
110130150170190210230
![Page 12: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/12.jpg)
12
Using a Sparse Index To find the record with key K,
search the index for the largest key ≤ K
Use binary search to do this Retrieve the indicated data block Search the block for the record
with key K
![Page 13: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/13.jpg)
13
Comparing Sparse and Dense Indexes
Sparse index uses much less space In the previous numeric example,
sparse index size is now only 1000 index blocks, about 4 Mbytes
Dense index, unlike sparse, lets us answer "is there a record with key K?" without having to retrieve a data block
![Page 14: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/14.jpg)
14
Multiple Levels of Index Make an index for the index Can continue this idea for more
levels, but usually only two levels in practice
Second and higher level indexes must be sparse, otherwise no savings
![Page 15: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/15.jpg)
15
Two-Level Index ExampleSequential File
2010
4030
6050
8070
10090
Sparse 2nd level1030507090110130150170190210230
1090
170250
330410490570
![Page 16: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/16.jpg)
16
Numeric Example Again Suppose we put a second-level index on
the first-level sparse index Since first-level index uses 1000 blocks
and 100 key-pointer pairs fit per block, we need 10 blocks for second-level index
Very likely to keep the second-level index in memory
Thus search requires at most two disk I/O's (one for block of first-level index, one for data block)
![Page 17: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/17.jpg)
17
Duplicate Search Keys What if more than one record has a
given search key value? (Then the search key is not a key of the relation.)
Solution 1: Use a dense index and allow duplicate search keys in it.
To find all data records with search key K, follow all the pointers in the index with search key K
![Page 18: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/18.jpg)
18
Solution 1 Example
1010
2010
3020
3030
4540
10101020
20303030
1010
2010
3020
3030
4540
10101020
20303030
![Page 19: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/19.jpg)
19
Duplicate Search Keys with Dense Index
Solution 2: only keep record in index for first data record with each search key value (saves some space in the index)
To find all data records with search key K, follow the one pointer in the index and then move forward in the data file
![Page 20: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/20.jpg)
20
Solution 2 Example
1010
2010
3020
3030
4540
10203040
![Page 21: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/21.jpg)
21
Duplicate Search Keys with Sparse Index
Recall that index has an entry for just the first data record in each block
To find all data records with key K: find last entry (E1) in index with key ≤ K move toward front of index until reaching
entry (E2) with key < K Check data blocks pointed to by entries
from E2 to E1 for records with search key K
![Page 22: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/22.jpg)
22
Dupl. Keys w/ Sparse Index
1010
2010
3020
3030
4540
10102030
care
ful i
f loo
king
for 2
0 or
30!
![Page 23: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/23.jpg)
23
Variation on Previous Scheme
Index entry for a data block holds smallest search key that is new (did not appear in a previous block)
If there is no new search key in that block, then index entry holds the lone search key in the block
To find all data record with key K: search index for first entry whose key is either
K, or < K but next key is > K if a record with key K is in that block then scan
forward from there
![Page 24: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/24.jpg)
24
Variation Example
1010
2010
3020
3030
4540
10203030
shouldthis be40?
![Page 25: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/25.jpg)
25
Inserting and Deleting Data
Recall three main techniques: create/delete overflow blocks
overflow blocks do not have entries in a sparse index
may be able to insert new blocks in sequential order new block needs an entry in a sparse index changing an index can create same problems
make room in a full block by sliding some data to an adjacent block; combine adjacent blocks if they get too empty
![Page 26: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/26.jpg)
26
General Strategy When data file changes, index must
adapt Details depend on whether index is
sparse or dense and how data file modifications are implemented
Index file is itself sequential, so same strategies as for modifying data files can be applied to index files
![Page 27: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/27.jpg)
27
Effects of Actions on IndexAction Dense Index Sparse IndexCreate empty overflow block
none none
Delete empty overflow block
none none
Create empty (main) block
none insert
Delete empty (main) block
none delete
Insert record insert maybe updateDelete record delete maybe updateSlide record update maybe update
![Page 28: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/28.jpg)
28
Explanations for Actions create/destroy empty overflow block has
no effect on dense index since it refers to records sparse index since it refers to main records
create/destroy empty main block: no effect on dense index as above insert/delete entry in sparse index
insert/delete/slide record: insert/delete/update entry in dense index only change sparse index if affects first
record in block
![Page 29: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/29.jpg)
29
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
![Page 30: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/30.jpg)
30
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete record 40
![Page 31: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/31.jpg)
31
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete record 30
4040
![Page 32: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/32.jpg)
32
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete records 30 & 40
5070
![Page 33: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/33.jpg)
33
Deletion from dense index
2010
4030
6050
8070
10203040
50607080
![Page 34: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/34.jpg)
34
Deletion from dense index
2010
4030
6050
8070
10203040
50607080
– delete record 30
4040
![Page 35: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/35.jpg)
35
Insertion, sparse index case
2010
30
5040
60
10304060
![Page 36: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/36.jpg)
36
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 34
34
• our lucky day! we have free space where we need it!
![Page 37: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/37.jpg)
37
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 15
1520
3020
• Illustrated: Immediate reorganization• Variation:
– insert new block (chained file)– update index
![Page 38: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/38.jpg)
38
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 2525
overflow blocks(reorganize later...)
![Page 39: Indexes on Sequential Files](https://reader035.vdocument.in/reader035/viewer/2022062222/56815c20550346895dc9f61e/html5/thumbnails/39.jpg)
39
Insertion, dense index case
• Similar• Often more expensive . . .