lecture 12: collisions


Upload: shiloh

Post on 22-Feb-2016



LECTURE 12: COLLISIONS

CSC 213 – Large Scale Programming


Today’s Goal

Review when, where, & why we use Maps
  - Why Sequence-based approach causes problems
  - How hash can help solve these problems
  - What is inappropriate and incorrect about hash jokes
Discover hash’s problems & what must be done
  - What would happen if keys hashed to same index
  - Ways of handling situation so that hash still works
  - To remove data, using null may not be best option
Dark secrets of hashing, exposed at lecture’s end

Map Performance

In many situations, speed can be a matter of life-or-death
  - 911 operators immediately need addresses
  - Google’s search performance measured in TB/s
  - O(log n) time is too slow for these uses
Would love to use arrays
  - Convert key to int with hash function
  - With result of hash, have index in table to examine
  - put, remove & get then take only O(1) time
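A minimal Java sketch of "convert key to int, then use the result as an index". It assumes Java's built-in `hashCode()` as the hash function and `Math.floorMod` to fold the result into the table; `HashIndex` and `indexFor` are hypothetical names, not from the lecture.

```java
import java.util.Objects;

class HashIndex {
    // Turn any key into a table index in [0, capacity).
    // Objects.hashCode returns 0 for null and key.hashCode() otherwise;
    // Math.floorMod keeps the index non-negative even when hashCode() is negative.
    static int indexFor(Object key, int capacity) {
        return Math.floorMod(Objects.hashCode(key), capacity);
    }

    public static void main(String[] args) {
        String[] names = {"Jay Doe", "Bob Doe", "Jill Roe", "Rhi Smith"};
        for (String name : names) {
            System.out.println(name + " -> index " + indexFor(name, 10000));
        }
    }
}
```

Equal keys always produce equal indices, so `get` can jump straight to the slot that `put` used; whether different keys land on different slots depends entirely on the hash.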

Hash Table

Entrys (array indices 0 to 9999)

Index   Entry (key, value)
0       null
1       (0256120001, “Jay Doe”)
2       (9811010002, “Bob Doe”)
3       null
4       (4512290004, “Jill Roe”)
⁞       ⁞
9997    null
9998    (2007519998, “Rhi Smith”)
9999    null

Array locations either: null, reference to Entry, or marker value*
Table will contain gaps; better when spread out
Hash key to index; always start with hash


Ideal World

Each key hashed to a unique index: hash and done, Entry is there

And then…

You wake up

Collisions

Occurs when 2 keys hash to same index
Ideal hash spreads keys out evenly across table
  - As nice side effect, this limits collisions
  - Small table size also important, since RAM is limited
Unfortunately, no such thing as ideal hash
  - Must handle collisions to get O(1) efficiency
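For instance, with the hash function h(x) = x mod 13 used in the probing examples later in this lecture, two of those example keys already collide:

```latex
h(44) = 44 - 3 \cdot 13 = 5, \qquad h(31) = 31 - 2 \cdot 13 = 5
```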

Bad Hash

Perfect hash does not exist
  - Cannot know all keys beforehand
  - Keys may cluster around a few indices
  - Or all keys may hash to the same index
Handling bad hash is a necessity
  - Even when given an Entry, always check its key
  - Store multiple Entrys with the same hash
(Shot of adrenaline restarts heart)

Bucket Arrays

Make hash table an array of linked-list Nodes
  - First Node aliased by the array location
Whenever we have a collision, we “chain” Entrys
  - Create new Node to store the Entry
  - The linked list will have the new Node at its front

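A minimal Java sketch of a bucket array with chaining. `ChainedMap`, `Node`, and `chainLength` are hypothetical names, and `hashCode()` with `Math.floorMod` stands in for the hash function. Note that `get` still compares keys with `equals` inside the right bucket, following the advice to always check the key.

```java
class ChainedMap<K, V> {
    // Each Node holds one Entry; next links Entrys whose keys share a bucket.
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Each array location aliases the first Node of its chain (null if empty).
    private final Node<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedMap(int capacity) {
        buckets = (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    V get(K key) {
        // Same bucket does not mean same key, so compare keys while walking the chain.
        for (Node<K, V> n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { // key already present: replace its value
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node<>(key, value, buckets[i]); // new Node goes at the front
        return null;
    }

    int chainLength(int index) {
        int count = 0;
        for (Node<K, V> n = buckets[index]; n != null; n = n.next) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ChainedMap<Integer, String> map = new ChainedMap<>(13);
        map.put(44, "forty-four");
        map.put(31, "thirty-one"); // 44 % 13 == 31 % 13 == 5, so both chain in bucket 5
        System.out.println(map.get(44) + ", " + map.get(31) + ", chain length " + map.chainLength(5));
    }
}
```

Adding at the front makes each insert O(1) once the key is known to be new; the cost moves into walking the chain on `get`.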


Bucket Arrays

But what if we have a really bad hash, one that hashes to the same index in every situation?

All Entrys now found in a single linked list
  - O(n) execution times would now be required
(Also get bad case of the munchies)
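To see how damaging a constant hash is, here is a small Java sketch with a hypothetical `BadKey` class whose `hashCode()` always returns 42. Every key maps to the same bucket, so a chained table would keep all n Entrys in one list and each lookup could walk up to n Nodes.

```java
import java.util.HashSet;
import java.util.Set;

class BadHashDemo {
    // Hypothetical key type: equals() looks at the name, but hashCode() ignores it.
    static final class BadKey {
        final String name;

        BadKey(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return 42; // legal (equal keys get equal hashes) but terrible
        }
    }

    // Count how many distinct bucket indices keyCount different keys land in.
    static int bucketsUsed(int keyCount, int capacity) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < keyCount; i++) {
            used.add(Math.floorMod(new BadKey("key" + i).hashCode(), capacity));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println(bucketsUsed(1000, 101)); // prints 1: one chain would hold all 1000 Entrys
    }
}
```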

Collisions

Normally, table holds one Entry per index
  - Need to be smarter when keys collide
Efficiency is critical (not merely important)
  - If we do not care, use Sequence-based approach
Several common schemes used to provide speed
  - Each of these schemes has strengths & weaknesses
  - Silver bullets do not exist in CSC; must balance needs
  - If all-powerful answers desired, try Religious Studies

Linear Probing

Musical chairs uses this algorithm
  - At index where key hashed, examine Entry
  - If that index is taken, move to the next one, circling through the array until an empty index is found
Algorithm is very simple
  - But creates clusters of Entrys
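A minimal Java sketch of linear probing on a table of `Integer` keys, using the lecture's h(x) = x mod table size; `LinearProbe` and `insert` are hypothetical names. The demo assumes 15, 18, 32 and 76 already sit at their home indices (2, 5, 6 and 11), an assumed starting layout for the example that follows.

```java
import java.util.Arrays;

class LinearProbe {
    // Insert key using h(x) = x mod table.length, then linear probing.
    // Returns the index used, or -1 if every slot is already occupied.
    static int insert(Integer[] table, int key) {
        int m = table.length;
        int home = Math.floorMod(key, m);
        for (int j = 0; j < m; j++) {
            int i = (home + j) % m; // next index, circling back to 0 after the end
            if (table[i] == null) {
                table[i] = key;
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[13];
        // Assumed starting contents: 15, 18, 32, 76, each landing at its home index.
        for (int k : new int[] {15, 18, 32, 76}) {
            insert(table, k);
        }
        for (int k : new int[] {44, 20, 22, 31}) {
            System.out.println(k + " -> index " + insert(table, k));
        }
        System.out.println(Arrays.toString(table));
    }
}
```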

Linear Probe Example

h(x) = x mod 13
Now add: 44 (h(44) = 5), 20 (h(20) = 7), 22 (h(22) = 9), 31 (h(31) = 5)

[Figure: table of size 13 (indices 0 to 12) holding 15, 18, 44, 20, 32, 22, 31 and 76]
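A worked trace of the four insertions, assuming 15, 18, 32 and 76 start at their home indices 2, 5, 6 and 11 (this starting layout is an assumption, not taken from the slide). Each arrow is one probe; the occupant of a taken index is shown in parentheses:

```latex
\begin{aligned}
44:&\quad 5\,(18) \to 6\,(32) \to 7 \\
20:&\quad 7\,(44) \to 8 \\
22:&\quad 9 \\
31:&\quad 5\,(18) \to 6\,(32) \to 7\,(44) \to 8\,(20) \to 9\,(22) \to 10
\end{aligned}
```

Under that assumption, indices 5 through 11 end up as one solid cluster, which is why 31 needs six probes.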


Probing Reaction

Oh, ****. Adding to hash table still O(n)

Quadratic Probe

Avoids primary clustering problems
  - But does create secondary clustering (no one cares)
Quadratic probe still simple (like linear probe)
  - Examine Entry at index k, where key is hashed
  - Then check (k + j²) % length: k+1, k+4, k+9, k+16, …
  - Continue probing until an unused array slot is found
Guaranteed to work when:
  - Table size is a prime number, so probes wrap around and the first half of them land on distinct indices
  - Table is under 50% full, so many open slots exist
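A minimal Java sketch of the quadratic probe, with hypothetical names and the same assumed starting contents (15, 18, 32 and 76 at their home indices). Unlike the linear version, a -1 here does not prove the table is full: quadratic probes can revisit indices, which is why the prime-size and under-50%-full conditions matter.

```java
class QuadraticProbe {
    // Insert key using h(x) = x mod table.length, then quadratic probing:
    // indices k, k+1, k+4, k+9, ... (all mod table length).
    // Returns the index used, or -1 if no empty slot turned up within table.length probes.
    static int insert(Integer[] table, int key) {
        int m = table.length;
        int k = Math.floorMod(key, m);
        for (long j = 0; j < m; j++) {
            int i = (int) ((k + j * j) % m); // long arithmetic avoids overflow in j * j
            if (table[i] == null) {
                table[i] = key;
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[13]; // 13 is prime, as the guarantee requires
        for (int k : new int[] {15, 18, 32, 76}) {
            insert(table, k); // assumed starting contents at home indices 2, 5, 6, 11
        }
        for (int k : new int[] {44, 20, 22, 31}) {
            System.out.println(k + " -> index " + insert(table, k));
        }
    }
}
```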

Quadratic Probe Example

[Figure: table of size 13 (indices 0 to 12) holding 31, 15, 18, 44, 20, 32, 22 and 76]

h(x) = x mod 13
Now add: 44 (h(44) = 5), 20 (h(20) = 7), 22 (h(22) = 9), 31 (h(31) = 5)
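A worked trace under the same assumed starting layout (15, 18, 32 and 76 at indices 2, 5, 6 and 11). Offsets from the home index grow as 1, 4, 9, …:

```latex
\begin{aligned}
44:&\quad 5\,(18) \to 5{+}1 = 6\,(32) \to 5{+}4 = 9 \\
20:&\quad 7 \\
22:&\quad 9\,(44) \to 9{+}1 = 10 \\
31:&\quad 5\,(18) \to 6\,(32) \to 9\,(44) \to (5{+}9) \bmod 13 = 1
\end{aligned}
```

Instead of crawling to the end of the cluster, 31 jumps past it and wraps around to index 1.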


Quadratic Probing Reaction

Darn it to heck. Adding to hash table still O(n)

Double Hashing

Solve bad hash with even more hash
  - Use 2nd hash function very different from first
  - 2nd hash function not allowed to return zero
Re-hash key using 2nd function after the collision
  - Check index equal to sum of two hash functions (mod table size)
  - Re-add 2nd hash to this sum to continue probing
Guaranteed to work when:
  - Table size is a prime number larger than any 2nd-hash value, so probes wrap around and reach every index
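A minimal Java sketch of double hashing with the example's functions h(x) = x mod 13 and h2(x) = 5 - (x mod 5); class and method names are hypothetical and the starting contents are the same assumption as before. Because 13 is prime and every step h2(x) lies between 1 and 5, the 13 probes reach all 13 indices, so -1 really does mean the table is full.

```java
class DoubleHash {
    // Second hash from the lecture's example: h2(x) = 5 - (x mod 5), always in 1..5, never 0.
    static int h2(int key) {
        return 5 - Math.floorMod(key, 5);
    }

    // Probe h(x), h(x) + h2(x), h(x) + 2*h2(x), ... (mod table length), with h(x) = x mod table length.
    // Returns the index used, or -1 if every probe found an occupied slot.
    static int insert(Integer[] table, int key) {
        int m = table.length;
        int k = Math.floorMod(key, m);
        int step = h2(key);
        for (long j = 0; j < m; j++) {
            int i = (int) ((k + j * step) % m);
            if (table[i] == null) {
                table[i] = key;
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[13];
        for (int k : new int[] {15, 18, 32, 76}) {
            insert(table, k); // assumed starting contents at home indices 2, 5, 6, 11
        }
        for (int k : new int[] {44, 20, 22, 31}) {
            System.out.println(k + " -> index " + insert(table, k));
        }
    }
}
```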

Double Hash Example

[Figure: table of size 13 (indices 0 to 12) holding 31, 15, 18, 44, 20, 32, 22 and 76]

h(x) = x mod 13, h2(x) = 5 - (x mod 5)
Now add: 44 (h(44) = 5), 20 (h(20) = 7), 22 (h(22) = 9), 31 (h(31) = 5)
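A worked trace under the same assumed starting layout (15, 18, 32 and 76 at indices 2, 5, 6 and 11). Each key steps by its own h2 value:

```latex
\begin{aligned}
44:&\quad h_2(44) = 5 - 4 = 1:\quad 5\,(18) \to 6\,(32) \to 7 \\
20:&\quad h_2(20) = 5 - 0 = 5:\quad 7\,(44) \to 12 \\
22:&\quad 9 \\
31:&\quad h_2(31) = 5 - 1 = 4:\quad 5\,(18) \to 9\,(22) \to 13 \bmod 13 = 0
\end{aligned}
```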



Double Probing Reaction

Sweet! Double hashing keeps put O(n)

Probing and Searching

Search starts at index where key hashed
  - If the Entry is not there (or, for put, the index is taken), the array must keep being probed
  - Stop only at a usable index
May need to probe every index!
  - Searching takes O(n) even with hash
May need to reallocate & rehash table
  - Worst case O(n) put even with perfect hash
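A minimal Java sketch of searching a linear-probing table, where `null` marks a never-used slot; `ProbeSearch.find` is a hypothetical name and the table contents reuse the assumed layout from the linear-probe example. The loop stops at the first `null` or after checking all n indices, the O(n) worst case. The demo's last line also previews the removal problem: once `null` replaces 44, the search for 31 stops early.

```java
class ProbeSearch {
    // Linear-probing search using h(x) = x mod table.length.
    // Returns the index holding key, or -1 if the key is not found.
    static int find(Integer[] table, int key) {
        int m = table.length;
        int home = Math.floorMod(key, m);
        for (int j = 0; j < m; j++) {
            int i = (home + j) % m;
            if (table[i] == null) {
                return -1; // empty slot: key would have been placed here, so stop
            }
            if (table[i] == key) { // unboxes the Integer and compares values
                return i;
            }
        }
        return -1; // probed every index: the O(n) worst case
    }

    public static void main(String[] args) {
        // Table after the linear-probe example (assumed layout).
        Integer[] table = {null, null, 15, null, null, 18, 32, 44, 20, 22, 31, 76, null};
        System.out.println(find(table, 31)); // 10
        table[7] = null;                      // "remove(44)" by writing null
        System.out.println(find(table, 31)); // -1, although 31 is still at index 10
    }
}
```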

Post-Removal Operations

What happens when we remove an Entry?
  - Set index to null in most structures
Consider if we call remove(44)

[Figure: table (indices 0 to 12) holding 15, 18, 44, 20, 32, 22, 31 and 76]


Post-Removal Operations

What happens when we remove an Entry?
  - Set index to null in most structures
Consider if we call remove(44), then get(31): what would happen?
  - First check index it is hashed to
  - Check first probe index…
  - & stop at null, so get(31) fails even though 31 is still in the table

[Figure: same table after remove(44), holding 15, 18, 20, 32, 22, 31 and 76]

*Marker Value Explained

Mark cleared indices in hash table
  - Since a collision could have happened, search continues past a marker
  - Index can be used to store a new Entry
Ways to show that an array index is clear
  - Entry with null key could be used, if one is careful
  - Could try to make a key which is never used
  - Use a static final field of type Entry
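A minimal Java sketch of the third option, a `static final` marker Entry, in a linear-probing table with `int` keys; `ProbingMap`, `Entry`, and `AVAILABLE` are hypothetical names. Searches skip over the marker instead of stopping, and `put` reuses a marked index only after confirming the key is not already stored further along its probe path.

```java
class ProbingMap {
    static final class Entry {
        final Integer key;
        String value;

        Entry(Integer key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Marker value: one shared Entry meaning "cleared; keep probing past me".
    private static final Entry AVAILABLE = new Entry(null, null);

    private final Entry[] table;

    ProbingMap(int capacity) {
        table = new Entry[capacity];
    }

    // j-th linear probe index for key, with h(x) = x mod table.length.
    private int slot(int key, int j) {
        return (Math.floorMod(key, table.length) + j) % table.length;
    }

    // Index holding key, or -1. Markers are skipped; only null ends the search.
    private int indexOf(int key) {
        for (int j = 0; j < table.length; j++) {
            Entry e = table[slot(key, j)];
            if (e == null) {
                return -1;
            }
            if (e != AVAILABLE && e.key == key) {
                return slot(key, j);
            }
        }
        return -1;
    }

    String get(int key) {
        int i = indexOf(key);
        return i < 0 ? null : table[i].value;
    }

    void put(int key, String value) {
        int i = indexOf(key);
        if (i >= 0) {
            table[i].value = value; // key already present: replace value
            return;
        }
        // Key absent: the first null or marker index on its probe path is safe to use.
        for (int j = 0; j < table.length; j++) {
            int k = slot(key, j);
            if (table[k] == null || table[k] == AVAILABLE) {
                table[k] = new Entry(key, value);
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    String remove(int key) {
        int i = indexOf(key);
        if (i < 0) {
            return null;
        }
        String old = table[i].value;
        table[i] = AVAILABLE; // a marker, not null, so later searches keep probing
        return old;
    }

    public static void main(String[] args) {
        ProbingMap map = new ProbingMap(13);
        for (int k : new int[] {15, 18, 32, 76, 44, 20, 22, 31}) {
            map.put(k, "value " + k);
        }
        map.remove(44);
        System.out.println(map.get(31)); // still found: the search skips the marker
    }
}
```

Using one shared `AVAILABLE` object lets the code test for the marker with `==`, so no real key value has to be reserved.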

Why Use Hash Table & Probes?

Hash tables can require O(n) time in the worst case
  - But provide O(1) time if you are really good
Ultimately depends on hash function used
  - Choose wisely and be rich

Before Next Lecture…

Get updated lab project into SVN directory
  - No need to e-mail; I will collect directories at 5PM
Finish working on week #4 assignment
  - Due at usual time tomorrow afternoon/evening
Start thinking of your design for the project
  - Preliminary copy of this design due Friday
Read sections 9.3 - 9.3.1 & 9.3.3 of the book
  - What should we do if many values for 1 key?