tree structured indexing
-
8/6/2019 Tree Structured Indexing
1/27
TREE STRUCTURED INDEXING
Dr. Hari Om Gupta
Professor, Department of Electrical Engineering
IIT Roorkee
-
Indexed Sequential Access Method (ISAM)
P0 K1 P1 K2 P2 ... Km Pm
Format for an Index Page or Block
-
[Figure: One-level index structure. The index file holds keys K1, K2, ..., Kn, which point to the data file's pages Page 1, Page 2, Page 3, ..., Page n+1.]
-
ISAM INDEX STRUCTURE
[Figure: Non-leaf pages at the top; the leaf pages below them are the primary pages, with overflow pages chained to the primary pages.]
-
Non-leaf pages contain index entries of the form (search key value, page id).
Data entries can use alternative (1), (2), or (3), but alternative (2) or (3) is commonly used: (key, rid).
[Figure: Page allocation in ISAM, in order: data pages, index pages, overflow pages.]
-
Sample ISAM Tree
Root: 42
Non-leaf pages below the root: 18 33 | 51 64
Leaf pages: 10* 15* | 20* 27* | 33* 37* | 42* 46* | 51* 55* | 64* 97*
-
ISAM Tree after inserts (23*, 52*, 63*, 53*)
Root: 42
Non-leaf pages: 18 33 | 51 64
Leaf pages: 10* 15* | 20* 27* | 33* 37* | 42* 46* | 51* 55* | 64* 97*
Overflow pages: 23* (chained to 20* 27*); 52* 63*, then 53* (chained to 51* 55*)
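The overflow behaviour in the tree above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: pages hold two entries (as drawn), the two index levels are flattened into one sorted list of separator keys, and the names `Leaf`, `find_leaf`, `insert` and `contains` are hypothetical. The insert order 23, 52, 63, 53 is also assumed.

```python
import bisect

PAGE_CAPACITY = 2                   # assumed: two entries per page, as drawn
SEPARATORS = [18, 33, 42, 51, 64]   # index keys of the sample tree, flattened

class Leaf:
    def __init__(self, keys):
        self.primary = list(keys)   # primary page, allocated once at build time
        self.overflow = []          # chain of overflow pages (lists of keys)

def build_sample():
    pages = ([10, 15], [20, 27], [33, 37], [42, 46], [51, 55], [64, 97])
    return [Leaf(p) for p in pages]

def find_leaf(leaves, key):
    # A key equal to a separator belongs to the page on its right.
    return leaves[bisect.bisect_right(SEPARATORS, key)]

def insert(leaves, key):
    leaf = find_leaf(leaves, key)
    if len(leaf.primary) < PAGE_CAPACITY:
        bisect.insort(leaf.primary, key)
    else:
        # The index never changes: a full primary page grows an overflow chain.
        if not leaf.overflow or len(leaf.overflow[-1]) == PAGE_CAPACITY:
            leaf.overflow.append([])
        leaf.overflow[-1].append(key)

def contains(leaves, key):
    leaf = find_leaf(leaves, key)
    return key in leaf.primary or any(key in page for page in leaf.overflow)

leaves = build_sample()
for k in (23, 52, 63, 53):
    insert(leaves, k)
```

After these inserts, the page 20* 27* has one overflow page holding 23, and the page 51* 55* has a chain of two overflow pages, which is why long chains slow down retrieval.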
-
To search for a record, the number of disk I/Os equals the number of levels of the tree, which is log_F(N), where N = number of primary leaf pages and F = fan-out.
Example: a file of 1,000,000 records with 10 records/page and F = 100 gives N = 1,000,000/10 = 100,000.
Levels = log_100(100,000) = 2.5, rounded up to 3. Thus 3 I/Os are needed to reach the required page.
If we use binary search on the sorted file, the number of steps is log_2(100,000) ≈ 17.
Deletion in ISAM: blank overflow pages are released, whereas blank primary pages are retained.
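The I/O estimate above can be checked with a small calculation. This sketch uses integer arithmetic instead of a floating-point logarithm; the helper name `levels` is an assumption made for illustration.

```python
def levels(num_leaf_pages, fan_out):
    """Smallest h such that fan_out**h >= num_leaf_pages."""
    h, reach = 0, 1
    while reach < num_leaf_pages:
        reach *= fan_out
        h += 1
    return h

records, records_per_page, F = 1_000_000, 10, 100
N = records // records_per_page      # number of primary leaf pages
index_ios = levels(N, F)             # tree levels with fan-out 100
binary_steps = levels(N, 2)          # binary search halves the range each step
```

With the slide's numbers this gives N = 100,000, 3 levels for the index, and 17 steps for binary search.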
-
ISAM Tree after deletes (53, 33, 37)
Root: 42
Non-leaf pages: 18 33 | 51 64
Leaf pages: 10* 15* | 20* 27* | (empty) | 42* 46* | 51* 55* | 64* 97*
Overflow pages: 23* (chained to 20* 27*); 52* 63* (chained to 51* 55*)
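The deletion rule (empty overflow pages are released, empty primary pages are retained) might look like this in code. It is a minimal sketch: the dictionary layout and the function name `isam_delete` are hypothetical.

```python
def isam_delete(leaf, key):
    """Delete key from one leaf (primary page plus overflow chain)."""
    if key in leaf["primary"]:
        leaf["primary"].remove(key)       # page stays allocated even if empty
        return True
    for page in leaf["overflow"]:
        if key in page:
            page.remove(key)
            if not page:
                leaf["overflow"].remove(page)   # empty overflow page released
            return True
    return False

leaf_33 = {"primary": [33, 37], "overflow": []}
leaf_51 = {"primary": [51, 55], "overflow": [[52, 63], [53]]}
isam_delete(leaf_51, 53)
isam_delete(leaf_33, 33)
isam_delete(leaf_33, 37)
```

After the three deletes, `leaf_33` still exists with an empty primary page, while the second overflow page of `leaf_51` is gone.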
-
Disadvantages
Long overflow chains: retrieval time increases. To overcome this:
(a) Initially 20% of each page is kept free.
(b) Eliminate overflow chains by a complete reorganization of the file.
There may be too many blank pages if the data shrinks.
Advantages
No need to lock index-level pages, so queues and waiting time to get access to a page are reduced in comparison to a B+ tree.
The structure is static; response time will be low if overflow pages and blank pages are few.
-
B+ TREE (A DYNAMIC INDEX STRUCTURE)
Operations (insert, delete) on the tree keep it balanced.
A minimum occupancy of 50% is guaranteed for each node except the root node.
Searching for a record requires just a traversal from the root node to the appropriate leaf.
Leaf pages are sorted and have pointers linking one page to the next.
Height is normally 3 or 4.
Index-level pages have to be locked because the tree changes dynamically.
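The root-to-leaf traversal can be sketched as follows. The `Node` class and `find_leaf` function are hypothetical names, and the small tree is built by hand for illustration; each loop iteration stands for one page read.

```python
import bisect

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys
        self.children = children    # None marks a leaf page

def find_leaf(root, key):
    """Walk from the root to the leaf that may hold key; count pages read."""
    node, pages_read = root, 1
    while node.children is not None:
        # Keys equal to a separator are routed to the right-hand child.
        node = node.children[bisect.bisect_right(node.keys, key)]
        pages_read += 1
    return node, pages_read

leaves = [Node(k) for k in ([1, 3], [5, 7, 8], [15, 16],
                            [19, 20, 23], [24, 26, 29], [33, 34, 39, 41])]
root = Node([17], [Node([5, 13], leaves[:3]), Node([24, 30], leaves[3:])])
leaf, pages_read = find_leaf(root, 8)
```

For this three-level tree every search reads exactly three pages, whichever key is requested, because the tree is balanced.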
-
[Figure: One-level index structure. The index file holds keys K1, K2, ..., Kn, which point to the data file's pages Page 1, Page 2, Page 3, ..., Page n+1.]
-
[Figure: B+ tree structure (index file). Non-leaf nodes hold the index entries; the leaf pages Page 1, Page 2, Page 3, ..., Page n hold the data entries and form a sequence set, i.e. a sorted file.]
-
B+ Tree
A node other than the root node may contain m entries such that d ≤ m ≤ 2d, where d is the order of the tree.
-
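Insert
B+ tree, order d = 2
Root: 14 17 24 30
Leaf pages: 1 3 5 7 | 15 16 | 19 20 23 | 24 26 29 | 33 34 39 41
Insert record with key value 8.
Split of the leaf page during insert of entry 8: the overfull leaf 1 3 5 7 8 is split into a left side 1 3 (d entries) and a right side 5 7 8 (d+1 entries); key 5 is the entry to be inserted in the parent node.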
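The leaf split can be written out as a short function. This is a sketch assuming order d = 2 and leaves represented as plain sorted lists; the function name `insert_into_leaf` is hypothetical.

```python
D = 2  # order of the tree: a leaf holds between D and 2*D entries

def insert_into_leaf(leaf, key):
    """Return (left, right, copied_up_key); right is None when no split occurs."""
    entries = sorted(leaf + [key])
    if len(entries) <= 2 * D:
        return entries, None, None
    # Overfull leaf: keep D entries on the left, D+1 on the right, and copy
    # the smallest key of the right page up to the parent.
    left, right = entries[:D], entries[D:]
    return left, right, right[0]

left, right, copied_up = insert_into_leaf([1, 3, 5, 7], 8)
```

The key is copied rather than moved, so 5 still appears in the right leaf after the split.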
-
After inserting record with key value 8
Root: 17
Non-leaf nodes: 5 13 | 24 30
Leaf pages: 1 3 | 5 7 8 | 15 16 | 19 20 23 | 24 26 29 | 33 34 39 41
-
Insert
B+ tree after inserting entry 8 using redistribution
Root: 8 17 24 30
Leaf pages: 1 3 5 7 | 8 15 16 | 19 20 23 | 24 26 29 | 33 34 39 41
For redistribution
We have to retrieve the sibling with an empty cell.
Checking for redistribution increases the I/O for index node splits, so redistribution may not be advantageous.
For growing files
(a) Do not redistribute for non-leaf vacancies.
(b) Limited redistribution (only with neighbours) for leaf pages.
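Redistribution with the right-hand neighbour can be sketched as below. The assumptions are a leaf capacity of 4 entries (order d = 2), leaves as sorted lists, and a hypothetical function name; only the right sibling is checked, in the spirit of limited redistribution.

```python
def insert_with_redistribution(leaf, right_sibling, key, capacity=4):
    """Return (leaf, right_sibling, new_separator), or None if a split is needed.

    new_separator is None when the parent entry does not change.
    """
    entries = sorted(leaf + [key])
    if len(entries) <= capacity:
        return entries, right_sibling, None
    if len(right_sibling) < capacity:
        moved = entries.pop()                   # largest entry moves right
        new_sibling = [moved] + right_sibling
        return entries, new_sibling, new_sibling[0]
    return None                                 # sibling is full: split instead

result = insert_with_redistribution([1, 3, 5, 7], [15, 16], 8)
```

Here entry 8 moves into the neighbouring page and the parent's separator becomes 8, which matches the root 8 17 24 30 of the redistributed tree.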
-
Delete
Search and delete, with the following restriction:
If a node is at minimum occupancy before the deletion, the deletion causes it to go below the occupancy threshold. When this happens, we must either redistribute entries from an adjacent sibling or merge the node with a sibling in order to maintain minimum occupancy.
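The choice between redistribution and merging can be sketched for a leaf and its right sibling. This assumes order d = 2 and leaves as sorted lists; the function name and the `prefer_merge` flag are hypothetical, added because both options can be feasible at once.

```python
D = 2  # order: every non-root node keeps between D and 2*D entries

def delete_from_leaf(leaf, right_sibling, key, prefer_merge=False):
    """Return (action, leaf, right_sibling) after deleting key from leaf."""
    remaining = [k for k in leaf if k != key]
    if len(remaining) >= D:
        return "ok", remaining, right_sibling
    can_borrow = len(right_sibling) > D
    can_merge = len(remaining) + len(right_sibling) <= 2 * D
    if can_borrow and not (prefer_merge and can_merge):
        # Move the sibling's smallest entry over; the parent separator must
        # then be replaced by the sibling's new smallest key.
        return "redistribute", remaining + right_sibling[:1], right_sibling[1:]
    # Otherwise the two pages fit into one; the parent loses one entry.
    return "merge", remaining + right_sibling, None

no_underflow = delete_from_leaf([19, 20, 23], [24, 26, 29], 23)
borrowed = delete_from_leaf([19, 20], [24, 26, 29], 20)
merged = delete_from_leaf([19, 20], [24, 26, 29], 20, prefer_merge=True)
```

Deleting 23 needs no repair, while deleting 20 afterwards leaves a single entry, so the page must either borrow 24 or merge into 19 24 26 29.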
-
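After deleting record with key value 23
Root: 17
Non-leaf nodes: 5 13 | 24 30
Leaf pages: 1 3 | 5 7 8 | 15 16 | 19 20 | 24 26 29 | 33 34 39 41
Partial B+ Tree during deletion of entry 20
Before merging: non-leaf node 24 30 with leaf pages 19 | 24 26 29 | 33 34 39 41
After merging the leaf pages: non-leaf node 30 with leaf pages 19 24 26 29 | 33 34 39 41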
-
B+ Tree during deletion of entry 20
Root: 5 13 17 30
Leaf pages: 1 3 | 5 7 8 | 15 16 | 19 24 26 29 | 33 34 39 41
-
Duplicates
ISAM: overflow pages.
B+ Tree: use (key, rid); the record id along with the key may be used as the search key.
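Treating (key, rid) as the search key makes every data entry unique even when keys repeat. A minimal sketch, assuming integer keys and integer rids (both hypothetical sample values) and a sorted list standing in for the leaf level:

```python
import bisect

# Data entries as (key, rid) pairs; sorting compares the key first, then the rid.
entries = sorted([(42, 7), (42, 3), (18, 1), (42, 9), (51, 2)])

def rids_for(entries, key):
    """All rids stored under key, found with two binary searches."""
    lo = bisect.bisect_left(entries, (key,))      # (key,) sorts before (key, rid)
    hi = bisect.bisect_left(entries, (key + 1,))  # assumes integer keys
    return [rid for _, rid in entries[lo:hi]]

matches = rids_for(entries, 42)
```

The three entries with key 42 stay adjacent and ordered by rid, so a specific duplicate can be located or deleted directly.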
-
B+ Tree in Practice
Key Compression and Code
Fan-out of the tree = F
Height of the tree = log_F(# of data entries)
Thus the higher the fan-out, the lower the tree height and the shorter the response time.
To increase the fan-out, use a small search key. A code is normally used to reduce the size of the search key.
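The effect of key size on fan-out and height can be illustrated numerically. All figures here are assumptions chosen for the example (4096-byte pages, 8-byte page pointers, 1,000,000 data entries, 40-byte versus 8-byte keys), and the helper names are hypothetical.

```python
def fan_out(page_bytes, key_bytes, pointer_bytes):
    """Index entries that fit on one page, each a key plus a page pointer."""
    return page_bytes // (key_bytes + pointer_bytes)

def height(num_entries, f):
    """Smallest h such that f**h >= num_entries (integer arithmetic)."""
    h, reach = 0, 1
    while reach < num_entries:
        reach *= f
        h += 1
    return h

long_key_fan_out = fan_out(4096, 40, 8)    # e.g. a name used as the key
short_key_fan_out = fan_out(4096, 8, 8)    # e.g. a short code as the key
long_key_height = height(1_000_000, long_key_fan_out)
short_key_height = height(1_000_000, short_key_fan_out)
```

Under these assumptions the shorter key raises the fan-out from 85 to 256 and lowers the height from 4 to 3.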
-
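[Figure: Key compression.
Index entries before key compression: Dani LalMan | Data RamVerma | Devand Sagar
Data entries below: Dani LalMan, Dara Singh, ...
Index entries after key compression: Dan | Data | Dev]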
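One common way to compress an index key is to keep only the shortest prefix that still separates the two neighbouring subtrees. The sketch below shows that variant; it is not necessarily the rule used to produce the entries in the figure, and it can yield shorter keys than the ones shown there. The function name is hypothetical.

```python
def shortest_separator(left_max, right_min):
    """Shortest prefix of right_min that is still greater than left_max.

    left_max: largest key in the subtree to the left of the index entry.
    right_min: smallest key in the subtree to its right (left_max < right_min).
    """
    for length in range(1, len(right_min) + 1):
        prefix = right_min[:length]
        if prefix > left_max:
            return prefix
    return right_min

separator_a = shortest_separator("Dara Singh", "Data RamVerma")
separator_b = shortest_separator("Data RamVerma", "Devand Sagar")
```

Any search key still routes correctly, because the prefix is greater than everything on the left and no greater than everything on the right.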
-
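B+ Tree Bulk Loading
Sorted pages of data entries: 1 5 | 6 8 | 9 10 | 12 14 | 15 16 | 20 22 | 31 35 | 39 41
Start: the root is an empty index page pointing to the first leaf page (1 5); the remaining sorted pages of data entries are not yet in the B+ tree.
After adding two more pages: Root: 6 9. Leaf pages 1 5, 6 8 and 9 10 are in the tree; the remaining sorted pages of data entries are not yet in the B+ tree.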
-
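Sorted pages of data entries: 1 5 | 6 8 | 9 10 | 12 14 | 15 16 | 20 22 | 31 35 | 39 41
After adding page 12 14 (the root splits): Root: 9. Index pages below the root: 6 | 12. The sorted pages of data entries from 15 16 onwards are not yet in the B+ tree.
After adding pages 15 16, 20 22 and 31 35: Root: 9 15. Index pages below the root: 6 | 12 | 20 31. The page of data entries 39 41 is not yet in the B+ tree.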
-
Complete B+ Tree
Root: 15
Second level: 9 | 31
Third level: 6 | 12 | 20 | 39
Leaf pages: 1 5 | 6 8 | 9 10 | 12 14 | 15 16 | 20 22 | 31 35 | 39 41
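The bulk-loading trace can be reproduced with a short program. This is a sketch, assuming index pages that hold at most two keys (as the trace suggests) and leaf pages represented as plain lists; the names `Index`, `bulk_load` and `index_levels` are hypothetical. Each new leaf page is attached to the rightmost index page just above the leaf level, and an overfull index page is split with its middle key pushed up.

```python
MAX_KEYS = 2  # assumed capacity of an index page, inferred from the trace

class Index:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

def bulk_load(pages):
    """Build the index over already sorted leaf pages, left to right."""
    root = Index([], [pages[0]])
    path = [root]                       # rightmost path: root ... lowest index page
    for page in pages[1:]:
        key, child, level = page[0], page, len(path) - 1
        while True:
            node = path[level]
            node.keys.append(key)
            node.children.append(child)
            if len(node.keys) <= MAX_KEYS:
                break
            mid = len(node.keys) // 2   # overfull: push the middle key up
            key = node.keys[mid]
            child = Index(node.keys[mid + 1:], node.children[mid + 1:])
            node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
            path[level] = child         # the new right half is now rightmost
            if level == 0:              # the root itself was split: grow the tree
                root = Index([], [node])
                path.insert(0, root)
            else:
                level -= 1
    return root

def index_levels(root):
    """Keys of every index page, level by level from the root down."""
    levels, frontier = [], [root]
    while isinstance(frontier[0], Index):
        levels.append([n.keys for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return levels

pages = [[1, 5], [6, 8], [9, 10], [12, 14], [15, 16], [20, 22], [31, 35], [39, 41]]
tree = bulk_load(pages)
```

Calling `index_levels(tree)` gives root 15, then 9 | 31, then 6 | 12 | 20 | 39, the same shape as the complete tree. Because leaf pages arrive in sorted order, only the rightmost path is ever touched.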
-