tree structured indexing

Upload: lokesh-bhojwani

Post on 08-Apr-2018


  • 8/6/2019 Tree Structured Indexing

    1/27

    TREE STRUCTURED INDEXING

    Dr. Hari Om Gupta

    Professor, Department of Electrical Engineering

    IIT Roorkee


    Indexed Sequential Access Method (ISAM)

    P0 | K1 | P1 | K2 | P2 | ... | Km | Pm

    Format for an Index Page or Block
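
The page format above can be read as "m keys separating m+1 pointers". A minimal sketch of choosing the pointer to follow, assuming the convention that pointer Pi leads to keys k with Ki <= k < K(i+1); the function name and the placeholder page ids are illustrative and not from the slides:

```python
from bisect import bisect_right

def child_pointer(keys, pointers, search_key):
    """Pick the pointer to follow in an index page laid out as P0 K1 P1 ... Km Pm.

    Assumed convention: P(i) leads to keys k with K(i) <= k < K(i+1).
    """
    assert len(pointers) == len(keys) + 1
    return pointers[bisect_right(keys, search_key)]

keys = [18, 33]                   # K1, K2
pointers = ["P0", "P1", "P2"]     # placeholder page ids
print(child_pointer(keys, pointers, 10))
print(child_pointer(keys, pointers, 18))
print(child_pointer(keys, pointers, 40))
```

Binary search within the page is a design choice here; a linear scan gives the same answer for small pages.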


    [Figure: One-level index structure. The index file holds keys K1, K2, ..., Kn; its entries point to pages Page 1, Page 2, Page 3, ..., Page n+1 of the data file.]


    ISAM INDEX STRUCTURE

    [Figure: ISAM index structure. Non-leaf pages form the upper levels; the leaf pages are the primary pages, and overflow pages are chained to them.]


    Non-leaf pages contain index entries of the form (search key value, page id).

    Data entries can use Alternative (1), (2), or (3), but Alternative (2) or (3) is commonly used, i.e. (key, rid).

    Page allocation in ISAM

    [Figure: Page allocation in ISAM, showing data pages, index pages, and overflow pages.]


    Sample ISAM Tree

    Non-leaf pages:  42 (root); 18 33 | 51 64
    Leaf pages:      10* 15* | 20* 27* | 33* 37* | 42* 46* | 51* 55* | 64* 97*


    ISAM Tree after inserts

    Non-leaf pages:  42 (root); 18 33 | 51 64
    Leaf pages:      10* 15* | 20* 27* | 33* 37* | 42* 46* | 51* 55* | 64* 97*
    Overflow pages:  23* (chained to leaf 20* 27*); 52* 63*, then 53* (chained to leaf 51* 55*)


    To search for a record, the number of disk I/Os equals the number of levels of the tree, which is

    $\lceil \log_F N \rceil$, where N = number of primary leaf pages and F = fan-out.

    Example: a file of 1,000,000 records with 10 records/page and F = 100 has N = 1,000,000/10 = 100,000 primary leaf pages.

    Levels = $\lceil \log_{100} 100{,}000 \rceil = \lceil 2.5 \rceil = 3$. Thus 3 I/Os are needed to reach the required page.

    If we instead use binary search on the sorted file, the number of steps is

    $\lceil \log_2 100{,}000 \rceil = 17$.
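
The arithmetic above can be checked with a few lines of code. This is a small sketch that uses integer multiplication instead of floating-point logarithms to avoid rounding surprises; the function name `levels` is illustrative:

```python
def levels(n_pages: int, fan_out: int) -> int:
    """Smallest h such that fan_out**h >= n_pages (integer arithmetic only)."""
    h, reach = 0, 1
    while reach < n_pages:
        reach *= fan_out
        h += 1
    return h

records, per_page, fan_out = 1_000_000, 10, 100
n_pages = records // per_page          # 100,000 primary leaf pages
print(levels(n_pages, fan_out))        # index levels for F = 100
print(levels(n_pages, 2))              # steps for binary search over the pages
```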

    Deletion in ISAM: empty overflow pages are released, whereas empty primary pages are retained.


    ISAM Tree after deletes (53, 33, 37)

    Non-leaf pages:  42 (root); 18 33 | 51 64
    Leaf pages:      10* 15* | 20* 27* | (empty) | 42* 46* | 51* 55* | 64* 97*
    Overflow pages:  23* (chained to leaf 20* 27*); 52* 63* (chained to leaf 51* 55*)
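
The insert and delete behaviour of the sample tree can be reproduced with a small simulation. This is only a sketch under stated assumptions: pages hold two entries, the index levels are flattened into one sorted list of separator keys, and the insertion order 23, 52, 63, 53 is assumed (the slides do not give it). All names are illustrative.

```python
from bisect import bisect_right

CAPACITY = 2  # entries per page, as in the sample tree

# Static separator keys between primary leaf pages (index levels flattened).
separators = [18, 33, 42, 51, 64]
# Each leaf is a chain: element 0 is the primary page, the rest are overflow pages.
leaves = [[[10, 15]], [[20, 27]], [[33, 37]], [[42, 46]], [[51, 55]], [[64, 97]]]

def chain_for(key):
    return leaves[bisect_right(separators, key)]

def insert(key):
    chain = chain_for(key)
    for page in chain:
        if len(page) < CAPACITY:
            page.append(key)
            return
    chain.append([key])          # every page is full: allocate a new overflow page

def delete(key):
    chain = chain_for(key)
    for i, page in enumerate(chain):
        if key in page:
            page.remove(key)
            if not page and i > 0:
                del chain[i]     # empty overflow page: release it
            return               # an empty primary page (i == 0) is retained

for k in (23, 52, 63, 53):       # assumed insertion order
    insert(k)
for k in (53, 33, 37):
    delete(k)
print(leaves)
```

The index itself never changes in this sketch, which is the sense in which ISAM is static.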


    Disadvantages

    Long overflow chains increase retrieval time. To overcome this:

    (a) Initially, 20% of each page is kept free.

    (b) Overflow chains are eliminated by a complete reorganization of the file.

    There may be too many blank pages if the data shrinks.

    Advantages

    No need to lock index-level pages, so queues and waiting time to get access to a page are reduced in comparison to a B+ tree.

    It is static; response time will be low if overflow pages and blank pages are few.


    B+ TREE (A DYNAMIC INDEX STRUCTURE)

    Operations (insert, delete) on the tree keep it balanced.

    A minimum occupancy of 50% is guaranteed for each node except the root node.

    Searching for a record requires just a traversal from the root node to the appropriate leaf.

    Leaf pages are sorted and have pointers that link one page to another.

    Height is normally 3 or 4.

    Index-level pages must be locked because the tree changes dynamically.
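
The root-to-leaf traversal can be sketched on a small hand-built tree. The in-memory layout (tuples for index nodes, plain lists for leaves) and the function name are assumptions made for illustration; a real B+ tree works on disk pages.

```python
from bisect import bisect_right

# Hypothetical layout: an index node is (keys, children); a leaf is a plain list.
leaf_pages = [[1, 3], [5, 7, 8], [15, 16], [19, 20, 23], [24, 26, 29], [33, 34, 39, 41]]
left = ([5, 13], leaf_pages[0:3])
right = ([24, 30], leaf_pages[3:6])
root = ([17], [left, right])

def search(node, key):
    """Return (found, pages_visited) for a root-to-leaf traversal."""
    visited = 1
    while isinstance(node, tuple):            # descend through index nodes
        keys, children = node
        node = children[bisect_right(keys, key)]
        visited += 1
    return key in node, visited

print(search(root, 20))
print(search(root, 13))   # 13 is only a separator key, not a data entry
```

Every search visits the same number of pages because the tree is balanced.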



    B+ tree structure: index file

    [Figure: B+ tree structure. Index entries are held in the non-leaf nodes; data entries are held in the leaf pages Page 1, Page 2, Page 3, ..., Page n, which form a sequence set (a sorted file).]


    B+ Tree

    A node other than the root node may contain m entries such that d ≤ m ≤ 2d, where d is the order of the tree.


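    Insert

    B+ tree, order d = 2. Insert record with key value 8.

    Root:    13 17 24 30
    Leaves:  1 3 5 7 | 15 16 | 19 20 23 | 24 26 29 | 33 34 39 41

    Split leaf pages during insert of entry 8

    The overfull leaf 1 3 5 7 8 is split into a left side 1 3 (d entries) and a right side 5 7 8 (d+1 entries); the key 5 is copied up into the parent.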
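
The leaf split for entry 8 can be sketched as follows, assuming the usual copy-up rule in which the left leaf keeps d entries, the right leaf keeps d+1, and the smallest key of the right leaf is copied into the parent. The function name is illustrative.

```python
def split_leaf(entries, d):
    """Split an overfull leaf (2d + 1 sorted entries) into two leaves."""
    assert len(entries) == 2 * d + 1
    left, right = entries[:d], entries[d:]
    return left, right, right[0]   # right[0] is copied up and also stays in the leaf

leaf = sorted([1, 3, 5, 7] + [8])  # the full leaf plus the new entry
left, right, copied_up = split_leaf(leaf, d=2)
print(left, right, copied_up)
```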


    After inserting record with key value 8

    Root:         17
    Index nodes:  5 13 | 24 30
    Leaves:       1 3 | 5 7 8 | 15 16 | 19 20 23 | 24 26 29 | 33 34 39 41
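
The new root 17 comes from splitting an overfull index node. A minimal sketch, assuming the usual push-up rule for index nodes (the middle key moves to the parent and appears in neither half); the page ids are placeholders and the function name is illustrative:

```python
def split_index(keys, children, d):
    """Split an overfull index node holding 2d + 1 keys and 2d + 2 child pointers."""
    assert len(keys) == 2 * d + 1 and len(children) == 2 * d + 2
    left = (keys[:d], children[:d + 1])
    right = (keys[d + 1:], children[d + 1:])
    return left, keys[d], right    # keys[d] is pushed up, not copied

keys = [5, 13, 17, 24, 30]                          # old root after 5 was copied up
children = ["L1", "L2", "L3", "L4", "L5", "L6"]     # placeholder leaf page ids
left, pushed_up, right = split_index(keys, children, d=2)
print(left, pushed_up, right)
```

Unlike the leaf split, the pushed-up key is removed from the index level it came from, since index keys only direct the search.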


    B+ tree after inserting entry 8 using redistribution

    Root:    8 17 24 30
    Leaves:  1 3 5 7 | 8 15 16 | 19 20 23 | 24 26 29 | 33 34 39 41

    For redistribution

    The sibling with an empty cell has to be retrieved.

    Checking for redistribution increases I/O for index node splits. Thus redistribution may not be advantageous for index nodes.

    For growing files

    (a) Do not redistribute for non-leaf vacancies.

    (b) Use limited redistribution (only with neighbours) for leaf pages.
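
Redistribution on insert can be sketched for the leaf case. The sketch assumes the leaf is already full and only looks at the right neighbour; the function name and return convention are illustrative.

```python
def insert_with_redistribution(leaf, right_sibling, key, d):
    """Insert key into a full leaf by moving its largest entry to the right sibling.

    Returns the new separator key for the parent, or None if the sibling is
    also full (the caller would then have to split instead).
    """
    if len(right_sibling) >= 2 * d:
        return None
    entries = sorted(leaf + [key])
    leaf[:] = entries[:-1]                  # the leaf stays full
    right_sibling.insert(0, entries[-1])    # the largest entry moves right
    return right_sibling[0]

leaf, sibling = [1, 3, 5, 7], [15, 16]
new_separator = insert_with_redistribution(leaf, sibling, 8, d=2)
print(leaf, sibling, new_separator)
```

The extra read of the sibling page is the I/O cost mentioned above.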


    Delete

    Search and delete, with the following restriction:

    If a node is at minimum occupancy before a deletion, the deletion causes it to go below the occupancy threshold. When this happens, we must either redistribute entries from an adjacent sibling or merge the node with a sibling, in order to maintain minimum occupancy.
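
The choice between redistribution and merging can be sketched for two neighbouring leaves. Trying redistribution first and merging as a fallback is one possible policy and is an assumption here, as are the key values and the function name; updating the parent's separator key is left out.

```python
def delete_entry(leaf, right_sibling, key, d):
    """Delete key from leaf and repair an underflow using the right sibling."""
    leaf.remove(key)
    if len(leaf) >= d:
        return "ok"                          # still at or above minimum occupancy
    if len(right_sibling) > d:               # sibling can spare an entry
        leaf.append(right_sibling.pop(0))
        return "redistribute"
    leaf.extend(right_sibling)               # sibling is at minimum: merge
    right_sibling.clear()
    return "merge"

a, b = [15, 16], [19, 20, 23]
first = delete_entry(a, b, 16, d=2)    # a -> [15, 19], b -> [20, 23]
c, e = [15, 16], [19, 20]
second = delete_entry(c, e, 16, d=2)   # c -> [15, 19, 20], e -> []
print(first, second)
```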


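    After deleting record with key value 23

    Root:         17
    Index nodes:  5 13 | 24 30
    Leaves:       1 3 | 5 7 8 | 15 16 | 19 20 | 24 26 29 | 33 34 39 41

    Partial B+ Tree during deletion of Entry 20

    Before the merge:  index node 24 30; leaves 19 | 24 26 29 | 33 34 39 41
    After the merge:   index node 30; leaves 19 24 26 29 | 33 34 39 41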


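    B+ Tree during deletion of Entry 20

    Root:    5 13 17 30
    Leaves:  1 3 | 5 7 8 | 15 16 | 19 24 26 29 | 33 34 39 41

    (The index nodes 5 13 and 30 have been merged, pulling 17 down from the old root.)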


    Duplicates

    ISAM: duplicates are handled with overflow pages.

    B+ Tree: use (key, rid); the record id, along with the key, may be used as the search key.
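
Treating (key, rid) as the search key makes every entry unique even when key values repeat. A minimal sketch, using a sorted Python list to stand in for the leaf level; the rid values and the function name are made up for illustration:

```python
from bisect import bisect_left, insort

entries = []  # sorted list of (key, rid) pairs standing in for the leaf level

for key, rid in [(20, 7), (20, 3), (15, 9), (20, 5)]:   # made-up rids
    insort(entries, (key, rid))

def rids_for(key):
    """All rids stored under key, found by scanning from the first (key, ...) entry."""
    i = bisect_left(entries, (key,))   # (key,) sorts before every (key, rid)
    out = []
    while i < len(entries) and entries[i][0] == key:
        out.append(entries[i][1])
        i += 1
    return out

print(rids_for(20))
```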


    B+ Tree in Practice

    Key Compression and Code

    Fan-out of the tree = F

    Height of the tree = $\log_F$(number of data entries). Thus the higher the fan-out, the lower the tree height and the shorter the response time.

    To increase the fan-out, use a small search key. A code is normally used to reduce the size of the search key.
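
The effect of key size on fan-out and height can be illustrated numerically. The page size, pointer size, key sizes, and entry count below are assumptions chosen only for the example:

```python
PAGE_BYTES = 4096     # assumed page size
POINTER_BYTES = 8     # assumed size of a page pointer

def fan_out(key_bytes: int) -> int:
    """Index entries (key + pointer) that fit in one page."""
    return PAGE_BYTES // (key_bytes + POINTER_BYTES)

def height(n_entries: int, f: int) -> int:
    """Smallest h such that f**h >= n_entries."""
    h, reach = 0, 1
    while reach < n_entries:
        reach *= f
        h += 1
    return h

for key_bytes in (40, 8):             # a long key versus a short coded key
    f = fan_out(key_bytes)
    print(key_bytes, f, height(1_000_000, f))
```

Under these assumptions the shorter key removes one level from the tree, which is one fewer page read per search.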


    [Figure: Key compression. Full key values shown in the figure include Dani LalMan, Data RamVerma, Devand Sagar and Dara Singh; after key compression the index entries are Dan, Data, Dev.]
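
One way to compress an index key is to keep only the shortest prefix that still separates the two subtrees. The sketch below assumes the compressed key must be greater than the largest key on the left and no greater than the smallest key on the right. Which names fall in which subtree is an assumption here, so the results need not match the figure exactly.

```python
def shortest_separator(left_max: str, right_min: str) -> str:
    """Shortest prefix p of right_min with left_max < p <= right_min."""
    assert left_max < right_min
    for n in range(1, len(right_min) + 1):
        prefix = right_min[:n]
        if prefix > left_max:
            return prefix
    return right_min

print(shortest_separator("Dani LalMan", "Dara Singh"))
print(shortest_separator("Dara Singh", "Devand Sagar"))
```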


    B+ Tree Bulk Loading

    Sorted pages of data entries: 1 5 | 6 8 | 9 10 | 12 14 | 15 16 | 20 22 | 31 35 | 39 41

    Initially: the root has no keys; the sorted pages of data entries are not yet in the B+ tree.

    Next: root 6 9; the pages from 12 14 onward are not yet in the B+ tree.


    Bulk loading, continued (data pages: 1 5 | 6 8 | 9 10 | 12 14 | 15 16 | 20 22 | 31 35 | 39 41)

    Root 9; index nodes 6 | 12; the pages from 15 16 onward are not yet in the B+ tree.

    Root 9 15; index nodes 6 | 12 | 20 31; the page 39 41 is not yet in the B+ tree.


    Complete B+ Tree

    Root:         15
    Index nodes:  9 | 31
    Index nodes:  6 | 12 | 20 | 39
    Leaves:       1 5 | 6 8 | 9 10 | 12 14 | 15 16 | 20 22 | 31 35 | 39 41
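
The bulk-loading steps can be sketched in code. The sketch assumes index nodes hold at most two keys (as the figures suggest), that each new page is always entered into the rightmost index node above the leaf level, and that an overfull node is split by pushing its middle key up. The dictionary layout and names are illustrative.

```python
CAP = 2  # max keys per index node, as the figures suggest

def bulk_load(pages):
    """pages: sorted leaf pages. Returns the root index node as {'keys', 'kids'}."""
    root = {"keys": [], "kids": [pages[0]]}
    path = [root]                          # rightmost path, root first
    for page in pages[1:]:
        key, child, level = page[0], page, len(path) - 1
        while True:
            node = path[level]
            node["keys"].append(key)
            node["kids"].append(child)
            if len(node["keys"]) <= CAP:
                break
            mid = CAP // 2                 # overflow: push the middle key up
            key = node["keys"][mid]
            right = {"keys": node["keys"][mid + 1:], "kids": node["kids"][mid + 1:]}
            node["keys"], node["kids"] = node["keys"][:mid], node["kids"][:mid + 1]
            path[level] = right            # the new node joins the rightmost path
            child = right
            if level == 0:                 # the root split: grow a new root
                path.insert(0, {"keys": [], "kids": [node]})
            else:
                level -= 1
    return path[0]

pages = [[1, 5], [6, 8], [9, 10], [12, 14], [15, 16], [20, 22], [31, 35], [39, 41]]
root = bulk_load(pages)
print(root["keys"], [kid["keys"] for kid in root["kids"]])
```

Because only the rightmost path is ever touched, each leaf page is visited once, which is the main saving over repeated single inserts.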
