Hashing and File Structure

Uploaded by satvik-khara, posted on 09-Apr-2018.

8/8/2019 Hashing and File Structure

http://slidepdf.com/reader/full/hashing-and-file-structure 1/35

The best search method introduced so far is the binary search technique, whose search time is proportional to log2 n.

A hash search is a search in which the key, through an algorithmic function, determines the location of the data.

A function that transforms a key into a table index is called a hash function.

If h is a hash function and KEY is a key, then h(KEY) is called the hash of the key.

If r is a record whose key hashes to hr, then hr is called the hash key of r.

Position   Key      Record
0          41823    A
1          41824    B
...        ...      ...
1000       989851   Z

If we have a table of n employee records, each identified by an employee number key whose value lies between 1 and n, the key value directly locates the employee record.

A good hash function minimizes collisions.

A good hash function spreads keys evenly across the array.

A good hash function is easy to compute.

Direct method

Subtraction method

Modulo division method

Digit extraction method

Mid-square method

Folding method

Rotation method

Pseudorandom method

In the direct method, the key is the address, without any algorithmic manipulation.

Example: a small organization has fewer than 100 employees, and each employee is assigned an employee number from 1 to 100. If we create an array of 100 employee records, the employee number can be used directly as the address of the individual record.

This method is practical only for a small number of records, i.e. a small range of key values.
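A minimal Python sketch of direct hashing, assuming employee numbers 1 to 100 as in the example. The names `employees`, `direct_hash`, `store` and `fetch` are illustrative, not from the slides; slot 0 is simply left unused so the key can serve as the index unchanged.

```python
# Direct hashing: the key itself is the array index.
# Assumption: employee numbers run from 1 to 100, so 101 slots are allocated
# and slot 0 is never used.
employees = [None] * 101

def direct_hash(key):
    # No computation at all: the address is the key.
    return key

def store(key, record):
    employees[direct_hash(key)] = record

def fetch(key):
    return employees[direct_hash(key)]

store(42, "record of employee 42")
print(fetch(42))  # record of employee 42
```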

Sometimes the keys are consecutive but do not start from 1; the subtraction method handles this case.

For example, a company may have only 100 employees, but the employee numbers run from 1000 to 1100. Here a very simple hash function that subtracts 1000 from the key determines the address.

Like the direct method, it can be used only for a small number of records.
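A small sketch of the subtraction method for the 1000 to 1100 example. `BASE`, `subtraction_hash` and `table` are hypothetical names chosen for illustration.

```python
# Subtraction hashing: consecutive keys that start at BASE map to 0, 1, 2, ...
BASE = 1000  # lowest employee number in the example

def subtraction_hash(key):
    return key - BASE

# One slot for each employee number from 1000 to 1100 inclusive.
table = [None] * 101
table[subtraction_hash(1057)] = "record of employee 1057"

print(subtraction_hash(1057))         # 57
print(table[subtraction_hash(1057)])  # record of employee 1057
```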

The modulo division method, also known as the division remainder method, divides the key by the array size and uses the remainder plus 1 as the address:

address = key mod listsize + 1

For example, let key = 23 and listsize = 10. Then

address = 23 % 10 + 1
        = 3 + 1
        = 4

so the key 23 is placed at address 4.
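The formula translates directly into code. This is a sketch; the function name `modulo_division_hash` is hypothetical, and the 1-based addressing follows the slide.

```python
def modulo_division_hash(key, listsize):
    """Division remainder method: address = key mod listsize + 1.

    Addresses run from 1 to listsize; in a 0-based language the table
    needs listsize + 1 slots (slot 0 unused), or the "+ 1" can be dropped.
    """
    return key % listsize + 1

print(modulo_division_hash(23, 10))  # 4, as in the worked example
```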

In the digit extraction method, selected digits are extracted from the key and used as the address.

Example: if the employee number is 12345, we can select the 1st, 2nd and 3rd digits as the address, so the address is 123.
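A sketch of digit extraction. Which digit positions to use is a design choice, so here they are passed in as a list of 1-based positions; the function name and that parameter are assumptions for illustration.

```python
def digit_extraction_hash(key, positions):
    """Build an address from the digits of key at the given 1-based positions."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))

print(digit_extraction_hash(12345, [1, 2, 3]))  # 123
print(digit_extraction_hash(12345, [1, 3, 5]))  # 135
```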

In mid-square hashing, the key is squared and the address is selected from the middle of the squared number.

The main limitation of this method is the size of the key: the square of a 6-digit key can have up to 12 digits, which exceeds the range of a typical 32-bit integer.

For example, 15 * 15 = 225, so the mid-square address is 2.
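A sketch of mid-square hashing that takes `width` digits from the middle of the square; `mid_square_hash` and `width` are hypothetical names. When the middle is not exactly centered, this version picks the window just left of center. Python integers do not overflow, so the size limitation above does not appear here; it matters in languages with fixed-width integers such as C.

```python
def mid_square_hash(key, width):
    """Square the key and take `width` digits from the middle of the square."""
    squared = str(key * key)
    start = (len(squared) - width) // 2
    return int(squared[start:start + width])

print(mid_square_hash(15, 1))      # 225 -> 2
print(mid_square_hash(123456, 3))  # 15241383936 -> 138
```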

In the folding method (fold shift), the key value is divided into parts whose size matches the size of the required address; the left and right parts are then shifted and added to the middle part. For example, consider a 9-digit roll number, 128364612, divided into 3 parts of 3 digits:

128 | 364 | 612

Sum = 128 + 364 + 612 = 1104

The resulting sum is greater than 999, so the leading digit is discarded: Address = 104.
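A sketch of fold shift for the example above. Discarding the leading digit of a 4-digit sum is done here as `sum % 1000` (more generally `% 10 ** width`); the function name and `width` parameter are illustrative assumptions.

```python
def fold_shift_hash(key, width=3):
    """Fold shift: split the key into parts of `width` digits, add them as they are,
    and keep only the last `width` digits of the sum."""
    digits = str(key)
    parts = [digits[i:i + width] for i in range(0, len(digits), width)]
    total = sum(int(part) for part in parts)
    return total % 10 ** width  # drops the leading digit(s) of an oversized sum

print(fold_shift_hash(128364612))  # 128 + 364 + 612 = 1104 -> 104
```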

In fold boundary, while creating the groups, the left and right parts are reversed at the fixed boundary between them, the center part is kept as it is, and the three are added. If the resulting number exceeds 999, the leading digit is ignored.

Example: consider 128 364 612. Reversing the outer parts gives

Sum = 821 + 364 + 216 = 1401

The resulting sum is greater than 999, so the leading digit is discarded: Address = 401.
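A sketch of fold boundary on the same roll number. As in the fold shift case, dropping the leading digit is implemented as `% 10 ** width`; the function name is hypothetical, and the code assumes the key splits into at least two parts.

```python
def fold_boundary_hash(key, width=3):
    """Fold boundary: reverse the outer parts, keep the middle part(s) as they are,
    add everything and keep the last `width` digits of the sum.
    Assumes the key has at least two parts."""
    digits = str(key)
    parts = [digits[i:i + width] for i in range(0, len(digits), width)]
    parts[0] = parts[0][::-1]    # left part reversed at its boundary
    parts[-1] = parts[-1][::-1]  # right part reversed at its boundary
    total = sum(int(part) for part in parts)
    return total % 10 ** width   # drop the leading digit(s) of an oversized sum

print(fold_boundary_hash(128364612))  # 821 + 364 + 216 = 1401 -> 401
```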

Rotation hashing is generally used together with another hashing method. It is most useful when keys are assigned serially. The algorithm rotates the last character of the key to the front. Example:

Original key    Key after rotation
6100101         1610010
6100102         2610010
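A sketch of the rotation step. Keys are handled as strings so that characters, not just digits, can be rotated; `rotate_key` is a hypothetical name. The rotated key would then be passed to another hashing method, which this sketch does not show.

```python
def rotate_key(key):
    """Move the last character of the key to the front."""
    s = str(key)
    return s[-1] + s[:-1]

for k in ("6100101", "6100102"):
    print(k, "->", rotate_key(k))
# 6100101 -> 1610010
# 6100102 -> 2610010
```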

In the pseudorandom method, let x be the key. Using y = a*x + c, we multiply x by a and add c, where a and c are randomly selected constants. The result is then reduced with the modulo division method:

Address = (a * x + c) mod listsize + 1

For example, with a = 3, x = 100, c = 7 and listsize = 10:

Address = (3 * 100 + 7) % 10 + 1
        = 307 % 10 + 1
        = 7 + 1
        = 8
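The same computation as code, a sketch with a hypothetical function name. The parentheses matter: without them, `%` binds only to `c` and the result is wrong.

```python
def pseudorandom_hash(key, a, c, listsize):
    """y = a*key + c, reduced with modulo division: address = y mod listsize + 1."""
    return (a * key + c) % listsize + 1

print(pseudorandom_hash(100, 3, 7, 10))  # (300 + 7) % 10 + 1 = 8
```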

Two records cannot occupy the same position. The situation in which two keys hash to the same position is called a hash clash or hash collision.

When a collision occurs, rehashing (or another collision resolution method) is used to find a different position.

Primary clustering: keys that hash to different values compete with each other for the same positions in successive rehashes.

Secondary clustering: different keys that hash to the same value follow the same rehash path.

Rehashing (open addressing)

Chaining

Bucket hashing

Quadratic probe

Pseudorandom collision resolution

Rehashing applies a secondary hash function to the hash key of the item. The rehash function is applied repeatedly until an empty position is found where the item can be inserted.
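A sketch of open addressing with rehashing. The slides do not fix a particular rehash function; this one assumes the simplest choice, rehash(i) = (i + 1) % size (linear probing), with 0-based home address key % size. The function name is hypothetical.

```python
def insert_with_rehash(table, key):
    """Open addressing: start at key % size and apply rehash(i) = (i + 1) % size
    to the previous address until an empty slot is found. Returns the slot used."""
    size = len(table)
    index = key % size
    for _ in range(size):
        if table[index] is None:
            table[index] = key
            return index
        index = (index + 1) % size  # rehash the previous address
    raise RuntimeError("table is full")

table = [None] * 10
print([insert_with_rehash(table, k) for k in (23, 33, 43, 7)])  # [3, 4, 5, 7]
```

Keys 23, 33 and 43 now occupy slots 3 to 5, so a later key whose home address is 4 or 5 competes for the same positions: this is the primary clustering defined earlier.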

Chaining builds a linked list of all items whose keys hash to the same value. During a search, this short linked list is traversed sequentially for the desired key.
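A sketch of chaining, assuming the hash function key % size. A Python list stands in for each linked list; the sequential scan of one short chain is the same idea. The class and method names are illustrative.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds the chain of keys that hash to it."""

    def __init__(self, size):
        self.slots = [[] for _ in range(size)]

    def _hash(self, key):
        return key % len(self.slots)

    def insert(self, key):
        chain = self.slots[self._hash(key)]
        if key not in chain:
            chain.append(key)

    def search(self, key):
        # Only the short chain for this key's slot is scanned sequentially.
        return key in self.slots[self._hash(key)]

t = ChainedHashTable(10)
for k in (23, 33, 43, 7):
    t.insert(k)
print(t.slots[3])    # [23, 33, 43]
print(t.search(33))  # True
print(t.search(53))  # False
```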

In bucket hashing, each address is a bucket that can accommodate multiple data occurrences (records).
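A sketch of bucket hashing. The slides do not say how large a bucket is or what happens when one fills up, so this assumes a capacity of 3 records and, when a bucket is full, moves on to the next bucket. The function name and `capacity` parameter are hypothetical.

```python
def bucket_insert(buckets, key, capacity=3):
    """Place key in bucket key % len(buckets); if that bucket is full,
    try the following buckets in turn (an assumed overflow policy)."""
    size = len(buckets)
    index = key % size
    for _ in range(size):
        if len(buckets[index]) < capacity:
            buckets[index].append(key)
            return index
        index = (index + 1) % size
    raise RuntimeError("all buckets are full")

buckets = [[] for _ in range(5)]
print([bucket_insert(buckets, k) for k in (10, 15, 20, 25, 3)])  # [0, 0, 0, 1, 3]
print(buckets)  # [[10, 15, 20], [25], [], [3], []]
```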

In the quadratic probe, the increment is the square of the collision probe number: for the 1st collision add 1*1, for the 2nd collision add 2*2, for the 3rd collision add 3*3, and so on.
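A sketch of the quadratic probe. It assumes each squared increment is added to the home address key % size (the common textbook form, probe i at (h + i*i) % size); some texts instead add it to the previous probe address. The function name is hypothetical.

```python
def quadratic_probe_insert(table, key):
    """Probe (home + i*i) % size for i = 0, 1, 2, ... until an empty slot is found.
    Quadratic probing need not visit every slot, so it may give up early."""
    size = len(table)
    home = key % size
    for i in range(size):
        index = (home + i * i) % size  # i = 0 is the home address
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no empty slot found on the probe sequence")

table = [None] * 11
print([quadratic_probe_insert(table, k) for k in (22, 33, 44, 55)])  # [0, 1, 4, 9]
```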

In pseudorandom collision resolution, the keys are first placed using the modulo division method; when a collision occurs, the pseudorandom method is applied repeatedly until an empty location is found.
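A sketch of pseudorandom collision resolution, assuming 0-based addresses, the constants a = 3 and c = 7 from the earlier example, and that the colliding address is fed back in as x. The function name is hypothetical.

```python
def pseudorandom_probe_insert(table, key, a=3, c=7):
    """Home address by modulo division; on each collision, next = (a * current + c) % size."""
    size = len(table)
    index = key % size
    for _ in range(size):
        if table[index] is None:
            table[index] = key
            return index
        index = (a * index + c) % size  # collision: the current address becomes x
    raise RuntimeError("no empty slot found on the pseudorandom sequence")

table = [None] * 10
print([pseudorandom_probe_insert(table, k) for k in (23, 33, 43)])  # [3, 6, 5]
```

Keys 33 and 43 share home address 3 and therefore walk the same sequence 3, 6, 5, which is the secondary clustering defined earlier.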

Four terms are used when we discuss files:

1) Field: the basic element of data, e.g. a student's first name is a field having some data type.

2) Record: a collection of related fields that can be treated as a unit by some program or application, e.g. an employee record contains fields such as name, address and job type.

3) File: a collection of similar records. Files have unique names and may be created or deleted.

4) Database: a collection of related data.

File organization is the permanent logical structure of a file.

It tells the computer how to retrieve records from the file.

Rapid access

Ease of update

Less storage space

Simple maintenance

Reliability

Pile

Sequential

Indexed sequential

Indexed

Hashed

The pile is the simplest possible organization: data are simply collected in the file, and the records are not required to have the same format.

Thus each record must be self-describing, including field names as well as values.
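A sketch of a pile file, assuming each record is one line of self-describing `field=value` pairs separated by `;`. That syntax, the sample records and the helper names are hypothetical. Because there is no ordering or index, a search has to examine every record.

```python
# Pile file sketch: records need not share a format because each one
# carries its own field names. The line syntax here is an assumption.
pile = [
    "name=Asha;dept=Sales",
    "name=Ravi;phone=5551234;city=Pune",
    "id=17;name=Meena",
]

def parse(line):
    # Field names travel with the values, so each record decodes on its own.
    return dict(pair.split("=", 1) for pair in line.split(";"))

def find(field, value):
    # No ordering and no index: every record has to be examined.
    return [rec for rec in map(parse, pile) if rec.get(field) == value]

print(find("name", "Ravi"))  # [{'name': 'Ravi', 'phone': '5551234', 'city': 'Pune'}]
```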

The sequential file is a common structure for large files: the records are stored in the file in order of their key.

The key must uniquely identify a record, so different records have different keys. However, adding and deleting records becomes somewhat difficult.

In the indexed sequential organization, an index file is used to speed up the search process and to overcome the difficulty mentioned above.

The single-level index is the simplest structure, in which each index record contains a key and a pointer.

The pointer gives the position of the record in the data file.
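A sketch of a single-level index over a key-ordered data file. A Python list stands in for the data file (list position plays the role of file position), and the index is assumed to be sparse, holding one (key, pointer) entry per block of 2 records. The sample keys, `BLOCK` and `indexed_lookup` are hypothetical.

```python
from bisect import bisect_right

# Data "file": records stored in key order.
data_file = [(1001, "rec1001"), (1004, "rec1004"), (1009, "rec1009"),
             (1012, "rec1012"), (1020, "rec1020"), (1025, "rec1025")]

BLOCK = 2  # assumed number of records covered by one index entry
# Single-level index: (key, pointer) pairs, pointer = position in the data file.
index = [(data_file[i][0], i) for i in range(0, len(data_file), BLOCK)]
index_keys = [k for k, _ in index]

def indexed_lookup(key):
    pos = bisect_right(index_keys, key) - 1  # last index entry with key <= search key
    if pos < 0:
        return None
    start = index[pos][1]
    for k, rec in data_file[start:start + BLOCK]:  # short sequential scan
        if k == key:
            return rec
    return None

print(indexed_lookup(1009))  # rec1009
```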

In the indexed organization, an index file contains records ordered by record key.

The record key uniquely identifies the record and determines the sequence in which it is accessed with respect to other records.

In hashed file organization, a hash function applied to the key gives the address of the record.

Direct files make use of hashing on the key value.
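A sketch of a hashed file with fixed-size records. It assumes a record layout of a 4-byte integer key plus a 16-byte name, 10 slots, positive keys, and the 0-based hash key % 10 to choose the slot; an in-memory `io.BytesIO` stands in for the disk file. Collision handling is left out, and all names are hypothetical.

```python
import io
import struct

RECORD = struct.Struct("<i16s")  # assumed layout: 4-byte key + 16-byte name
SLOTS = 10

f = io.BytesIO(bytes(RECORD.size * SLOTS))  # stand-in for a disk file, all zeros

def slot_offset(key):
    # The hash of the key gives the slot; the slot gives the byte offset.
    return (key % SLOTS) * RECORD.size

def write_record(key, name):
    f.seek(slot_offset(key))
    f.write(RECORD.pack(key, name.encode()))

def read_record(key):
    f.seek(slot_offset(key))
    k, name = RECORD.unpack(f.read(RECORD.size))
    # A slot holding a different key (or nothing) means "not found" here.
    return name.rstrip(b"\0").decode() if k == key else None

write_record(23, "rec23")
print(read_record(23))  # rec23
```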