Complex Hashing & Chaining

Post on 24-Feb-2016



Complex Hashing

std::hash Functors

• C++11 STL includes hash functors
– Instantiate and use as a function:
– One-line version:
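The code for these two bullets did not survive in the transcript. A minimal sketch of what they likely refer to, assuming a `std::string` key (the helper names `hashName` and `hashNameOneLine` are hypothetical):

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Two-step version: instantiate the functor, then call it like a function.
std::size_t hashName(const std::string& name) {
    std::hash<std::string> hasher;  // functor object
    return hasher(name);            // operator() computes the hash code
}

// One-line version: build a temporary functor and call it immediately.
std::size_t hashNameOneLine(const std::string& name) {
    return std::hash<std::string>{}(name);
}
```

Both helpers return the same value for the same string within one run of a program; the actual number depends on the standard library implementation.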

Other Types

• How do we hash:
– Point?
– Employee?
– BitmapImage?

Other Types

• Cover as many bits as possible
• Combine all values that vary
– "John Smith" K100203 vs "John Smith" K923424

Bitwise XOR

• Bitwise XOR: ^
– Combines binary values, preserves entropy

0101 ^ 1111 = 1010
0101 ^ 0000 = 0101
0101 ^ 1011 = 1110

Other Types

• Try to make the lowest bits most random
– Example date: 2013/05/28
– day << 20 ^ month << 10 ^ year (the year, which changes least, lands in the low bits)
– year << 20 ^ month << 10 ^ day (the day, which changes most, lands in the low bits)
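To see why the order of the shifts matters, here is a small sketch of both date formulas. The `Date` struct and the function names are assumptions made for illustration; in C++ `<<` binds tighter than `^`, so the parentheses only make the grouping explicit.

```cpp
#include <cstdint>

// Hypothetical date type; the fields are assumed to hold small non-negative values.
struct Date {
    std::uint32_t year;
    std::uint32_t month;
    std::uint32_t day;
};

// Day in the high bits: the low bits come only from the year.
std::uint32_t hashDateDayHigh(const Date& d) {
    return (d.day << 20) ^ (d.month << 10) ^ d.year;
}

// Day in the low bits: consecutive days differ in the lowest bits.
std::uint32_t hashDateDayLow(const Date& d) {
    return (d.year << 20) ^ (d.month << 10) ^ d.day;
}
```

With a table of 16 buckets (index = hash % 16), 2013/05/28 and 2013/05/29 share a bucket under `hashDateDayHigh` but land in different buckets under `hashDateDayLow`.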

Other Types

• Point:
– Use shift to cover greater range
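The slide's own code is missing from the transcript. One way this could look, assuming a `Point` with two 32-bit coordinates and an arbitrarily chosen shift of 16 bits (both are assumptions, not the author's values):

```cpp
#include <cstdint>

// Hypothetical 2D point with integer coordinates.
struct Point {
    std::int32_t x;
    std::int32_t y;
};

// Shift x before XOR-ing so that (1, 2) and (2, 1) no longer collide,
// which they would with a plain x ^ y.
std::uint64_t hashPoint(const Point& p) {
    std::uint64_t ux = static_cast<std::uint32_t>(p.x);  // reinterpret as unsigned bits
    std::uint64_t uy = static_cast<std::uint32_t>(p.y);
    return (ux << 16) ^ uy;
}
```

The shift spreads the two coordinates over more of the result's bits; small coordinates (below 65536) occupy disjoint bit ranges and cannot cancel each other out.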

Other Types

• Person: combine hashes of parts
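A sketch of combining the hashes of the parts, assuming a `Person` made of three strings (the field names and the shift amounts are illustrative choices, not taken from the slides):

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical record: every field that can vary takes part in the hash.
struct Person {
    std::string firstName;
    std::string lastName;
    std::string id;
};

std::size_t hashPerson(const Person& p) {
    std::hash<std::string> h;
    // Shift each part by a different amount so equal strings in different
    // fields do not simply cancel out under XOR.
    return h(p.firstName) ^ (h(p.lastName) << 1) ^ (h(p.id) << 2);
}
```

Because the ID is included, "John Smith" K100203 and "John Smith" K923424 are hashed from different inputs, although no hash function can rule out collisions entirely.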

Hashing Danger

• Person p1: "John Smith"
• Say the hash code for John Smith is 17…

Index:  0   1   2   3   4
Value:  .   .   p1  .   .

p1.firstName = "Bob"

hash(p1) just changed: a lookup won't find p1!

Hashing Danger

• NEVER modify something being used as a hashed value in a hash table!
– Remove, modify, reinsert
or
– Use immutable values for hashing
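The "remove, modify, reinsert" rule can be sketched with `std::unordered_set`, which already treats its elements as read-only for exactly this reason. `Person`, `PersonHash`, and `renameFirst` are hypothetical names used only for this example:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

struct Person {
    std::string firstName;
    std::string lastName;
    bool operator==(const Person& o) const {
        return firstName == o.firstName && lastName == o.lastName;
    }
};

struct PersonHash {
    std::size_t operator()(const Person& p) const {
        std::hash<std::string> h;
        return h(p.firstName) ^ (h(p.lastName) << 1);
    }
};

using PersonSet = std::unordered_set<Person, PersonHash>;

// Remove, modify, reinsert. Returns false if oldValue is not in the set.
bool renameFirst(PersonSet& people, const Person& oldValue,
                 const std::string& newFirst) {
    auto it = people.find(oldValue);
    if (it == people.end()) return false;
    Person updated = *it;          // copy out while the stored hash is still valid
    people.erase(it);              // remove under the old hash
    updated.firstName = newFirst;  // modify the copy
    people.insert(updated);        // reinsert under the new hash
    return true;
}
```

After `renameFirst(people, {"John", "Smith"}, "Bob")`, a lookup for Bob Smith succeeds and a lookup for John Smith fails, so the table never holds an element filed under a stale hash.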

Probing Alternatives

Probing Review

• Linear Probing Issues:
– Clusters

Index:  0   1   2   3   4   5   6   7   8   9
Value:  .   .   12  22  32  .   .   .   .   .

Quadratic Probing

• Quadratic Probing:
– The ith attempt to find a new location shifts i² slots from the original location

Start: 12 sits in slot 2

Index:  0   1   2   3   4   5   6   7   8   9
Value:  .   .   12  .   .   .   .   .   .   .

Insert 22 – 1st extra attempt goes 1 slot over (slot 3)

Index:  0   1   2   3   4   5   6   7   8   9
Value:  .   .   12  22  .   .   .   .   .   .

Insert 32 – 1st extra attempt goes 1 slot over (slot 3, occupied)

Insert 32 – 2nd extra attempt goes 4 slots over (slot 6)

Index:  0   1   2   3   4   5   6   7   8   9
Value:  .   .   12  22  .   .   32  .   .   .

Insert 2 – 3rd extra attempt goes 9 slots over, wraps around (slot 1)

Index:  0   1   2   3   4   5   6   7   8   9
Value:  .   2   12  22  .   .   32  .   .   .
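The walk-through above can be reproduced with a short sketch. It assumes non-negative integer keys, a home slot of key mod table size, and -1 as the marker for an empty slot; `EMPTY` and `insertQuadratic` are names invented for this example.

```cpp
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // assumed marker for an unused slot

// Tries slots home, home + 1, home + 4, home + 9, ... (mod table size).
// Returns the slot used, or -1 if no free slot was found in size() attempts.
int insertQuadratic(std::vector<int>& table, int key) {
    const std::size_t n = table.size();
    const std::size_t home = static_cast<std::size_t>(key) % n;  // key >= 0 assumed
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t idx = (home + i * i) % n;  // i-th attempt shifts i*i slots
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return static_cast<int>(idx);
        }
    }
    return -1;  // the probe sequence may skip free slots entirely
}
```

With `std::vector<int> table(10, EMPTY)`, inserting 12, 22, 32, and 2 in that order places them in slots 2, 3, 6, and 1, matching the tables above.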

Quadratic Probing

• Quadratic Probing:
– Reduces clustering
– May not visit every index!

• Other probing alternatives:
– Multiple hashing
– Second hash to determine step size

Index:  0   1   2   3   4   5   6   7   8   9
Value:  .   2   12  22  .   .   32  .   .   .
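For the "second hash to determine step size" alternative, a minimal sketch follows. The slides give no formula, so the second hash used here, 1 + (key / n) mod (n - 1), is an assumption, as are the names `EMPTY` and `insertDoubleHash`. It also assumes non-negative keys and a prime table size of at least 2, so that every step size eventually visits every slot.

```cpp
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // assumed marker for an unused slot

// Double hashing: the first hash picks the home slot, the second picks the step.
// Returns the slot used, or -1 if the table is full.
int insertDoubleHash(std::vector<int>& table, int key) {
    const std::size_t n = table.size();                  // assumed prime and >= 2
    const std::size_t k = static_cast<std::size_t>(key); // key >= 0 assumed
    const std::size_t home = k % n;
    const std::size_t step = 1 + (k / n) % (n - 1);      // never 0
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t idx = (home + i * step) % n;
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return static_cast<int>(idx);
        }
    }
    return -1;
}
```

In a table of 11 slots, the keys 12, 23, and 34 all have home slot 1 but get step sizes 2, 3, and 4, so they end up in slots 1, 4, and 5 rather than following one shared probe sequence.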

Chaining

• Chaining (Closed Addressing): each bucket can hold multiple values

Bucket:  0   1   2        3   4   5   6   7   8   9
Values:  .   .   12, 22   .   .   .   .   .   .   .

• Implementation
– Linked List
  • Holds a few/zero items efficiently
– Array Based List
  • Better use of cache?
  • Wasted space?
– Binary Tree

Basic Set Algorithms

• Contains(x)
– Calculate right linked list based on hash(x)
– Search linked list

• Add(x)
– Calculate right linked list based on hash(x)
– Search linked list; if not there, add to list

• Remove(x)
– Calculate right linked list based on hash(x)
– Search linked list; if found, remove
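The three operations above could be sketched as a small chained set of ints. The class name `ChainedIntSet`, the fixed bucket count, and the use of the value itself as its hash are all assumptions for illustration; the bucket count must be greater than zero, and growth on a high load factor is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

class ChainedIntSet {
public:
    explicit ChainedIntSet(std::size_t bucketCount = 10) : buckets_(bucketCount) {}

    // Pick the chain for x, then search it.
    bool contains(int x) const {
        const std::list<int>& chain = buckets_[indexFor(x)];
        return std::find(chain.begin(), chain.end(), x) != chain.end();
    }

    // Pick the chain, search it, and append only if x is not there yet.
    bool add(int x) {
        if (contains(x)) return false;
        buckets_[indexFor(x)].push_back(x);
        return true;
    }

    // Pick the chain, search it, and erase x if found.
    bool remove(int x) {
        std::list<int>& chain = buckets_[indexFor(x)];
        std::list<int>::iterator it = std::find(chain.begin(), chain.end(), x);
        if (it == chain.end()) return false;
        chain.erase(it);
        return true;
    }

private:
    // The cast to unsigned keeps negative values inside the valid index range.
    std::size_t indexFor(int x) const {
        return static_cast<std::size_t>(static_cast<unsigned>(x)) % buckets_.size();
    }

    std::vector<std::list<int> > buckets_;
};
```

With the default 10 buckets, 12 and 22 land in the same chain (bucket 2), and each operation only ever walks that one chain.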

Efficiency

• Avg time proportional to load factor
• O(load factor) = O(n/k), for n items in k buckets
• If k is constant, technically O(n)
• If k grows proportionally with n, O(1)
• Hash table grows when the load factor gets too large
– Cost of all ops O(1)
– Insert is amortized O(1)
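The growth rule can be observed on `std::unordered_set`, which rehashes into more buckets whenever an insert would push the load factor past `max_load_factor()`. The `LoadReport` struct and `fillAndReport` function below are hypothetical helpers for this sketch:

```cpp
#include <cstddef>
#include <unordered_set>

// Snapshot of a table's state after filling it.
struct LoadReport {
    std::size_t size;      // n: number of items
    std::size_t buckets;   // k: number of buckets
    float loadFactor;      // n / k
    float maxLoadFactor;   // threshold that triggers growth
};

// Inserts the values 0 .. n-1 and reports the resulting load factor.
LoadReport fillAndReport(std::size_t n) {
    std::unordered_set<int> s;
    for (std::size_t i = 0; i < n; ++i) {
        s.insert(static_cast<int>(i));
    }
    LoadReport r = { s.size(), s.bucket_count(), s.load_factor(), s.max_load_factor() };
    return r;
}
```

However many items are inserted, the reported load factor stays at or below the maximum, which is what keeps the average chain length, and so the average cost per operation, bounded by a constant.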

Real World

• Cache use is often the determining factor
• Linear probing produces clusters, but a cluster sits in adjacent slots, often on the same cache line

But

• No natural ordering

              Tree       HashTable
Ordering:     Ordered    No order
Common ops:   O(log N)   O(1)

Ordered - O(1)

• Space vs Time trade-offs
– Hybrid/Duplicative representations

HashMap

• Map
– Key/Value pairs

John Smith → 521-1234

• HashMap
– Identity determined by key
  • Only hash the key
– Value stored with key in table
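The phone-book pairing above maps directly onto `std::unordered_map`, where only the key is hashed and the value is stored alongside it. The alias `PhoneBook` and the helpers `makePhoneBook` and `lookup` are hypothetical names for this sketch:

```cpp
#include <string>
#include <unordered_map>

// Key: name (hashed). Value: phone number (stored with the key, never hashed).
typedef std::unordered_map<std::string, std::string> PhoneBook;

PhoneBook makePhoneBook() {
    PhoneBook book;
    book["John Smith"] = "521-1234";
    return book;
}

// Returns the number stored for name, or an empty string if name is absent.
std::string lookup(const PhoneBook& book, const std::string& name) {
    PhoneBook::const_iterator it = book.find(name);
    return it == book.end() ? std::string() : it->second;
}
```

Assigning a new number to "John Smith" replaces the stored value without adding a second entry, since identity is determined by the key alone.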
