Part II Chapter 8 Hashing

Upload: annabelle-snellgrove

Post on 15-Dec-2015

Page 1: Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree

Part II

Chapter 8 Hashing

Page 2:

Introduction

Consider a dictionary (symbol table) on which we may perform insertion, searching, and deletion.

Structures compared: array (sorted / not sorted), linked list, tree (unbalanced, balanced).

Insertion: O(n) / O(1), O(h), O(h), O(log_k n)
Searching: O(log n) / O(1), O(h), O(h), O(log_k n)
Deletion: O(n) / O(1), O(h), O(h), O(log_k n)

Is it possible to perform these operations in O(1)?

Page 3:

Introduction

If we can find a mapping from a key to an index, then we can locate a record quickly according to its key by random access.

[Figure: records S1, S2, S3, … stored at indices 0, 1, 2, …]

Page 4:

Introduction

This mapping can be illustrated as follows:

[Figure: Key → h → i]

Hashing: define a function h so that h(Key) = i, where h is called a hash function.

Two kinds:
• Static hashing
• Dynamic hashing

Page 5:

8.2 Static Hashing

Page 6:

Definition

In static hashing, identifiers (keys) are stored in a fixed-size table called a hash table.

Bucket: each bucket has its own address and is capable of holding a key.

[Figure: a hash table with buckets Bucket 0, Bucket 1, Bucket 2, …, Bucket n, each with slot 1 and slot 2. The hash function h maps an identifier x to a bucket address h(x).]

Page 7:

Definition

Slot: each bucket may consist of s slots to hold synonyms. Identifiers i1 and i2 are synonyms if h(i1) = h(i2).

Distinct synonyms enter the same bucket as long as the bucket has slots available.

Page 8:

Example

Number of buckets: 26 (Bucket 0 through Bucket 25 in the figure).
Number of slots for each bucket: 2 (slot 1 and slot 2 in the figure).

Define the hash function f(x) = i, where i is the alphabetical order of the initial letter of x, counting from 0 (A → 0).

• A and A2 are synonyms.
• GA and GB are synonyms.
• If “Doll” enters, it will be put at bucket _______ (according to the hash function).

[Figure: a hash table with Bucket 0 to Bucket 25 and two slots per bucket. Bucket 0 holds A and A2, Bucket 3 holds D, and GA and GB share one bucket.]
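The first-letter hash function of this example can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the name `first_letter_hash` and the decision to ignore letter case are assumptions.

```python
def first_letter_hash(identifier: str) -> int:
    """Map an identifier to the alphabetical order of its first letter (A -> 0, ..., Z -> 25)."""
    first = identifier[0].upper()  # assumption: case is ignored
    if not "A" <= first <= "Z":
        raise ValueError("identifier must start with a letter")
    return ord(first) - ord("A")


# Identifiers with the same first letter are synonyms: they get the same bucket.
print(first_letter_hash("A"), first_letter_hash("A2"))   # 0 0
print(first_letter_hash("GA"), first_letter_hash("GB"))  # 6 6
```

Because only the first character is inspected, any two identifiers that share an initial letter land in the same bucket.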

Page 9:

Overflow and Collision

• Overflow occurs when a new identifier is mapped into a full bucket.
• Collision occurs when two non-identical identifiers are hashed into the same bucket.
• If the number of slots per bucket is 1, then overflow and collision occur simultaneously.

[Figure: Bucket 0 holds A and A2 in slot 1 and slot 2; Bucket 1 and Bucket 2 are empty.]

If A3 enters bucket 0, A3 collides with A and A2. The bucket overflows as well.
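The difference between a collision and an overflow can be sketched with a small table of two-slot buckets. This is an illustrative sketch only: the `insert` helper, its return value, and the use of Python lists as buckets are assumptions, and the hash function is the first-letter function from the example.

```python
SLOTS_PER_BUCKET = 2  # two slots per bucket, as in the figure


def first_letter_hash(identifier):
    """Alphabetical order of the first letter (A -> 0)."""
    return ord(identifier[0].upper()) - ord("A")


def insert(table, identifier):
    """Try to insert identifier; return the pair (collided, overflowed)."""
    bucket = table[first_letter_hash(identifier)]
    if identifier in bucket:          # already stored, nothing to do
        return False, False
    collided = len(bucket) > 0        # a different identifier is already there
    overflowed = len(bucket) >= SLOTS_PER_BUCKET  # no free slot left
    if not overflowed:
        bucket.append(identifier)
    return collided, overflowed


table = [[] for _ in range(26)]
for name in ["A", "A2", "A3"]:
    print(name, insert(table, name))
# A (False, False)
# A2 (True, False)
# A3 (True, True)
```

With `SLOTS_PER_BUCKET = 1`, the second identifier in a bucket would report both a collision and an overflow at once.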

Page 10:

8.2.2 Hash Functions

Ideally, we expect to find a hash function that is one-to-one and easy to compute.

The hash function f(x), where f(x) = i and i is the alphabetical order of the initial letter of x, can result in a lot of collisions because it considers only the initial character.

Key point: use as many characters of the identifier as possible.

Page 11:

Common Approaches

• Division
• Mid-square
• Folding
• Digit analysis

Page 12:

Division

The most widely used hash function. The key k is divided by some number D, and the remainder is used as the bucket address:

h(k) = k % D

Since the bucket addresses range from 0 to b-1 when there are b buckets, D is usually selected as the number of buckets.
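A minimal Python sketch of the division method, assuming integer keys; the function name `division_hash` and the sample keys are illustrative choices, not from the slides.

```python
def division_hash(key: int, divisor: int) -> int:
    """Bucket address = remainder of key divided by the divisor D."""
    return key % divisor


b = 17  # assumed number of buckets, used as the divisor D
addresses = [division_hash(k, b) for k in (6, 12, 34, 29)]
print(addresses)  # [6, 12, 0, 12]

# Every address falls in the range 0 .. b-1.
assert all(0 <= a < b for a in addresses)
```

Keys 12 and 29 receive the same address, so they are synonyms under this divisor.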

Page 13:

Selecting The Divisor

When the divisor is an even number, odd integers hash into odd home buckets and even integers into even home buckets.

20%14 = 6, 30%14 = 2, 8%14 = 8
15%14 = 1, 3%14 = 3, 23%14 = 9

When the divisor is an odd number, odd (even) integers may hash into any home bucket.

20%15 = 5, 30%15 = 0, 8%15 = 8
15%15 = 0, 3%15 = 3, 23%15 = 8

• A bias in the keys toward odd or even values does not result in a bias toward either the odd or even home buckets.
• There is a better chance of uniformly distributed home buckets.

So do not use an even divisor.
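The parity effect can be checked directly with the keys from this slide. The helper name `home_buckets` is an assumption made for this sketch.

```python
def home_buckets(keys, divisor):
    """Home bucket of each key under the division method."""
    return [k % divisor for k in keys]


even_keys = [20, 30, 8]
odd_keys = [15, 3, 23]

# Even divisor: the parity of the key is preserved in the bucket address.
print(home_buckets(even_keys, 14))  # [6, 2, 8]
print(home_buckets(odd_keys, 14))   # [1, 3, 9]

# Odd divisor: even and odd keys can reach buckets of either parity.
print(home_buckets(even_keys, 15))  # [5, 0, 8]
print(home_buckets(odd_keys, 15))   # [0, 3, 8]
```

With divisor 14, a key set that is mostly even would leave the odd buckets nearly unused; with divisor 15 that bias does not carry over.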

Page 14:

Selecting The Divisor

A similar biased distribution of home buckets is seen, in practice, when the divisor b is a multiple of small prime numbers such as 3, 5, 7, …

The effect of each prime divisor p of b decreases as p gets larger.

Ideally, choose b so that it is a prime number. Alternatively, choose b so that it has no prime factor smaller than 20.

Page 15:

Mid-square

Square the key, then use an appropriate number of bits from the middle of the square as the bucket address.

Example: suppose a character is represented in 6 bits and the number of buckets is 2^r.

Identifier A1: A = 01 and 1 = 34 (octal), so the key is 000001 011100 (binary) = 92.

92 × 92 = 8464 = 10000100010000 (binary)

The middle r bits of the square are used as the bucket address.

Page 16:

Mid-square

Example: key = 113586 and m = 10000, so 9999 is the largest bucket address.

Squaring the key gives

113586^2 = 12901779396

Taking the four middle digits, h(x) = 1779.
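A decimal version of the mid-square method can be sketched as below. The slides do not say how to pick the middle when the leftover digits cannot be split evenly; this sketch assumes the smaller half is dropped from the low-order end, which reproduces the slide's result. The name `mid_square_hash` is also an assumption.

```python
def mid_square_hash(key: int, digits: int) -> int:
    """Square the key and return `digits` decimal digits from the middle of the square."""
    square = key * key
    total = len(str(square))
    # Assumption: drop floor((total - digits) / 2) low-order digits,
    # then keep the next `digits` digits.
    drop = max((total - digits) // 2, 0)
    return (square // 10 ** drop) % 10 ** digits


print(113586 * 113586)             # 12901779396
print(mid_square_hash(113586, 4))  # 1779
```

The result always lies between 0 and 10^digits - 1, which matches a table with m = 10000 buckets when `digits` is 4.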

Page 17:

Folding

The key k is partitioned into several parts, all of the same length except possibly the last. These parts are then added together to obtain the hash address of k.

Two schemes:
• Shift folding
• Folding at the boundaries

Example: k = 12320324111220 is partitioned into P1 = 123, P2 = 203, P3 = 241, P4 = 112, P5 = 20.

Page 18:

Folding

Shift folding: the parts are added as they are.

  P1    123
  P2    203
  P3    241
  P4    112
  P5     20
        ---
        699

Folding at the boundaries: every other part (P2, P4) is reversed before adding.

  P1    123
  P2    302
  P3    241
  P4    211
  P5     20
        ---
        897
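Both folding schemes can be sketched in a few lines of Python. The key is handled as a digit string, and the helper names `split_parts`, `shift_fold`, and `boundary_fold` are assumptions made for illustration.

```python
def split_parts(digits, width):
    """Partition a digit string into parts of `width` digits (the last may be shorter)."""
    return [digits[i:i + width] for i in range(0, len(digits), width)]


def shift_fold(digits, width):
    """Shift folding: add the parts as they are."""
    return sum(int(part) for part in split_parts(digits, width))


def boundary_fold(digits, width):
    """Folding at the boundaries: reverse every second part (P2, P4, ...) before adding."""
    total = 0
    for index, part in enumerate(split_parts(digits, width)):
        total += int(part[::-1]) if index % 2 == 1 else int(part)
    return total


key = "12320324111220"
print(split_parts(key, 3))    # ['123', '203', '241', '112', '20']
print(shift_fold(key, 3))     # 699
print(boundary_fold(key, 3))  # 897
```

In practice the sum may exceed the table size, so a final reduction (for example, a remainder by the number of buckets) would usually follow; the slides do not specify that step.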

Page 19:

Overflow Handling

An overflow occurs when the home bucket for a new pair (key, element) is full.

We may handle overflows by:

• Searching the hash table in some systematic fashion for a bucket that is not full.
  • Linear probing (linear open addressing).
  • Quadratic probing.
  • Rehashing.
• Eliminating overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket.
  • Array linear list.
  • Chain.

Page 20:

Linear Probing

Also called linear open addressing. Search the buckets one by one until an empty slot is found.

Procedure: suppose b denotes the number of buckets.

1. Compute h(k).
2. Examine the hash table buckets in the order ht[h(k)], ht[(h(k)+1)%b], …, ht[(h(k)+j)%b] until one of the following happens:
   • ht[(h(k)+j)%b] has a pair whose key is k; k is found.
   • ht[(h(k)+j)%b] is empty; k is not in the table.
   • The search returns to ht[h(k)]; the table is full.
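The search procedure above, together with a matching insertion routine, might look like the following sketch. It makes several assumptions not stated on this slide: the hash function is h(k) = k % b, each bucket has a single slot, only keys (not key/element pairs) are stored, and `None` marks an empty bucket. The function names are hypothetical.

```python
EMPTY = None  # marker for an unused bucket


def probe_search(ht, key):
    """Return the index of key in ht, or -1 if key is not in the table."""
    b = len(ht)
    home = key % b                # assumed hash function: h(k) = k % b
    for j in range(b):
        i = (home + j) % b
        if ht[i] is EMPTY:        # empty bucket reached: key is absent
            return -1
        if ht[i] == key:          # key found
            return i
    return -1                     # back at the home bucket: table is full


def probe_insert(ht, key):
    """Insert key at the first free bucket at or after its home bucket."""
    b = len(ht)
    home = key % b
    for j in range(b):
        i = (home + j) % b
        if ht[i] is EMPTY or ht[i] == key:
            ht[i] = key
            return i
    raise OverflowError("hash table is full")


ht = [EMPTY] * 17
for k in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45]:
    probe_insert(ht, k)
print(ht[:3])                # [34, 0, 45]
print(probe_search(ht, 45))  # 2
```

Key 45 has home bucket 11 but ends up in bucket 2, because buckets 11 through 16, 0, and 1 are already occupied when it arrives.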

Page 21:

Linear Probing

divisor = b (number of buckets) = 17.
Bucket address = key % 17.

• Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45

[Figure: hash table ht[0..16] after the insertions: [0] 34, [1] 0, [2] 45, [6] 6, [7] 23, [8] 7, [11] 28, [12] 12, [13] 29, [14] 11, [15] 30, [16] 33; the other buckets are empty.]

Page 22:

Linear Probing

[Figure: the same hash table ht[0..16] holding the keys 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45.]

Consider: when 51 enters, how many comparisons are required?

Linear open addressing tends to create “clusters”. These clusters become larger as more synonyms enter.

Page 23:

Quadratic Probing

Suppose i is used as the increment. When overflow occurs, the search is carried out by examining h(x), (h(x)+i^2)%b, and (h(x)-i^2)%b for 1 ≤ i ≤ (b-1)/2, where b is a prime number of the form 4j+3.

For example, b = 3, 7, 11, …, 43, 59, …
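The probe order described here can be sketched as a Python generator. The name `quadratic_probe_sequence` is an assumption; the sketch relies on Python's `%` returning a non-negative result for a positive modulus, so h(x) - i^2 needs no special handling.

```python
def quadratic_probe_sequence(home, b):
    """Yield h, (h + i^2) % b, (h - i^2) % b for i = 1 .. (b - 1) / 2."""
    yield home % b
    for i in range(1, (b - 1) // 2 + 1):
        yield (home + i * i) % b
        yield (home - i * i) % b


print(list(quadratic_probe_sequence(0, 7)))  # [0, 1, 6, 4, 3, 2, 5]
```

For b = 7 the sequence visits every bucket exactly once, which is the reason for choosing a prime of the form 4j+3.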

Page 24:

Rehashing

If overflow occurs at h_i(x), then try h_{i+1}(x).

Use a series of hash functions h_1, h_2, …, h_m to find an empty bucket.

[Figure: x is passed through h_1, h_2, …, h_m in turn, yielding h_m(x).]
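Rehashing can be sketched as trying each hash function in turn until an empty bucket turns up. The three hash functions below are hypothetical examples chosen only for illustration, as are the name `rehash_insert`, the table size of 11, and the single-slot buckets marked empty with `None`.

```python
def rehash_insert(ht, key, hash_functions):
    """Try h_1, h_2, ..., h_m in order; store key in the first empty bucket found."""
    for h in hash_functions:
        i = h(key) % len(ht)
        if ht[i] is None:
            ht[i] = key
            return i
    raise OverflowError("all hash functions lead to occupied buckets")


# Hypothetical series of hash functions for a table of 11 buckets.
hash_functions = [
    lambda k: k % 11,
    lambda k: (k // 11) % 11,
    lambda k: (7 * k + 3) % 11,
]

ht = [None] * 11
print(rehash_insert(ht, 22, hash_functions))  # 0  (h_1 succeeds)
print(rehash_insert(ht, 33, hash_functions))  # 3  (h_1 overflows, h_2 succeeds)
print(rehash_insert(ht, 44, hash_functions))  # 4
```

If every function in the series maps the key to an occupied bucket, this sketch raises an error; a real table would need some further fallback.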

Page 25:

Chaining

[Figure: a chained hash table ht[0..16] for the keys 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45; each bucket points to a chain of the keys that hash to it. Bucket [0] chains 34 and 0, [6] chains 6 and 23, [7] holds 7, [11] chains 28, 11, and 45, [12] chains 12 and 29, [13] holds 30, and [16] holds 33.]

Disadvantage of linear probing: identifiers with different hash values are compared.

Use a linked list to connect the identifiers with the same hash value and to increase the capacity of a bucket.
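A minimal chaining sketch follows. It assumes the division hash key % 17 from the linear probing example and uses Python lists as stand-ins for linked lists; the names `chain_insert` and `chain_search` are hypothetical.

```python
def chain_insert(ht, key):
    """Append key to the chain of its home bucket (no duplicates)."""
    chain = ht[key % len(ht)]
    if key not in chain:
        chain.append(key)


def chain_search(ht, key):
    """Only keys with the same hash value are compared."""
    return key in ht[key % len(ht)]


ht = [[] for _ in range(17)]  # one (initially empty) chain per bucket
for k in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45]:
    chain_insert(ht, k)

print(ht[11])                # [28, 11, 45]
print(chain_search(ht, 45))  # True
print(chain_search(ht, 51))  # False
```

Searching for 45 examines only the chain of bucket 11, whereas linear probing on the same keys would also compare against keys whose home buckets differ.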