Introduction to Hash Tables

Upload: suzanna-lloyd

Post on 20-Jan-2016




Arrays: sequential access good, direct access bad

Consider this array of Student records:

[0]   [1]    [2]  [3]      [4]   [5]   [6]  [7]    [8]     [9]
Bing  David  Ina  Abhinav  Erik  Hyun  Jim  Fiona  Gheeta  Chelsea

Remember! The array elements just hold references to the objects, not the objects themselves!

I can easily loop through all the student records by using a for loop. But if I want to access Jim’s record only, I have to start at 0 and loop through the array until I find it. With a big array this could be rather inefficient. Is there a better way?
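The sequential search described on this slide might look roughly like the sketch below. The `Student` class and its single `name` field are assumptions made for illustration; the slides do not show how a student record is defined.

```java
// Hypothetical Student record; the slides do not define its fields.
class Student {
    final String name;

    Student(String name) {
        this.name = name;
    }
}

class SequentialSearchDemo {
    // Starts at index 0 and walks the array until a student with the
    // given name is found. Returns that index, or -1 if nobody matches.
    static int indexOf(Student[] students, String name) {
        for (int i = 0; i < students.length; i++) {
            // Each element is a reference, so it may be null.
            if (students[i] != null && students[i].name.equals(name)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] names = {"Bing", "David", "Ina", "Abhinav", "Erik",
                          "Hyun", "Jim", "Fiona", "Gheeta", "Chelsea"};
        Student[] students = new Student[names.length];
        for (int i = 0; i < names.length; i++) {
            students[i] = new Student(names[i]);
        }
        // Seven elements are examined before Jim is found at index 6.
        System.out.println(indexOf(students, "Jim"));
    }
}
```

In the worst case the loop examines every element, which is why the cost grows with the size of the array.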


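Hash tables: sequential access bad, direct access good

[Diagram: Jim’s student ID no. is passed to the Hashing Function, which returns “6”, so Jim’s record is found at index [6] of an array with slots [0] to [9] holding Bing, David, Ina, Abhinav, Erik, Hyun, Jim, Fiona, Gheeta and Chelsea.]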

The student records are stored in an array. The position in the array at which a particular student’s record is held is determined by the hashing function.

The hashing function takes some value, e.g. a name, or, as here, a student id number, and translates it into an array index. So if we want to find Jim’s record we just give his id number to the hashing function and it tells us where his record is located. We don’t need to search through the records. This is direct access.
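As a minimal sketch of direct access, the class below stores and retrieves a record by asking a hashing function for the array index. The class name, the use of the remainder operator as the hashing function, and the ID 1046 for Jim are all assumptions for illustration. Collisions are deliberately ignored here.

```java
class DirectAccessTable {
    // Each slot holds one student name, or null if the slot is empty.
    private final String[] slots;

    DirectAccessTable(int size) {
        slots = new String[size];
    }

    // Assumed hashing function: the remainder of the ID divided by the
    // table size. Assumes student IDs are non-negative.
    int hash(int studentId) {
        return studentId % slots.length;
    }

    // Stores the name at the index chosen by the hashing function.
    // No collision handling: a later record in the same slot overwrites it.
    void put(int studentId, String name) {
        slots[hash(studentId)] = name;
    }

    // Goes straight to the slot chosen by the hashing function; no loop.
    String get(int studentId) {
        return slots[hash(studentId)];
    }

    public static void main(String[] args) {
        DirectAccessTable table = new DirectAccessTable(10);
        table.put(1046, "Jim");              // 1046 is a made-up ID
        System.out.println(table.hash(1046)); // index chosen for Jim
        System.out.println(table.get(1046));  // Jim's record, no search
    }
}
```

The lookup does the same amount of work however many records the array holds, which is what the slide means by direct access.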


Collisions

What happens if the hashing function gives the same array index for two different students?

This happens and it is called a collision. There are a number of ways of dealing with collisions, the details of which you don’t need to know. But what you do need to know is that the performance of hash tables degrades over time because of multiple collisions.

[Diagram: Hiro’s student ID no. is passed to the Hashing Function, which also returns “6”. Index [6] already holds Jim’s record, so this is a collision.]


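Collisions

[Animation, using an array with slots [0] to [9]:]

1. Erik’s student ID no. goes through the Hashing Function, which returns “4”. Erik goes into index [4].
2. David’s student ID no. goes through the Hashing Function, which returns “1”. David goes into index [1].
3. Hyun’s student ID no. goes through the Hashing Function, which returns “4”. Collision! Hyun goes into the next available index.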

If there had already been a lot of records in the array when the collision happened, Hyun may have been pushed a long way down the array.


Later, when we try to access Hyun’s record, the hashing function still gives us 4 as the place to find him. But he’s not there! So we have to do a sequential search from index number 4, through the array, to find him. This is the reason that hash table performance degrades over time.
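The "next available index" strategy above is commonly called linear probing. The sketch below is one possible way to write it, not necessarily how the slides' author would. The class name, the student IDs 104, 101 and 214, and the choice to wrap around to index 0 at the end of the array are assumptions for illustration. Deleting records is not supported.

```java
class LinearProbingTable {
    private final int[] ids;        // student ID stored in each slot
    private final String[] names;   // student name; null marks an empty slot

    LinearProbingTable(int size) {
        ids = new int[size];
        names = new String[size];
    }

    // Assumed hashing function; assumes non-negative IDs.
    private int hash(int id) {
        return id % ids.length;
    }

    // Places the record at the hashed index or, on a collision, at the
    // next free index (wrapping around). Returns the index used, or -1
    // if the table is full.
    int put(int id, String name) {
        int start = hash(id);
        for (int step = 0; step < ids.length; step++) {
            int i = (start + step) % ids.length;
            if (names[i] == null || ids[i] == id) {
                ids[i] = id;
                names[i] = name;
                return i;
            }
        }
        return -1;
    }

    // Starts at the hashed index and searches sequentially from there.
    // An empty slot means the ID was never stored.
    String get(int id) {
        int start = hash(id);
        for (int step = 0; step < ids.length; step++) {
            int i = (start + step) % ids.length;
            if (names[i] == null) {
                return null;
            }
            if (ids[i] == id) {
                return names[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(10);
        System.out.println(table.put(104, "Erik"));  // hashes to 4
        System.out.println(table.put(101, "David")); // hashes to 1
        System.out.println(table.put(214, "Hyun"));  // hashes to 4, collides
        System.out.println(table.get(214));          // found by probing from 4
    }
}
```

The more slots are already occupied, the longer the probing loops in `put` and `get` run, which is the degradation the slide describes.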


The Hashing Algorithm

The simplest way to translate the Student ID into an array index is to use the modulo operator (% in Java). The modulo operator returns the remainder of a division operation, for example 11 % 4 = 3.

Question: If we have an array of 10 elements, what do we need to mod our Student IDs by to be sure of getting some value from 0 to 9 (the valid indices of the array)?

Answer: 10

Question: Let’s say we have an array of size N. Now what do we need to mod our Student IDs by?

Answer: N
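A short sketch of the arithmetic: taking an ID modulo the array length N always yields an index from 0 to N - 1, which is exactly the range of valid indices. The method name `indexFor` and the sample IDs are made up for illustration. `Math.floorMod` is used instead of a bare `%` only so that a negative ID would still map to a valid index; for non-negative IDs the two give the same result.

```java
class ModuloIndexDemo {
    // Maps a student ID to an index in the range 0 .. arraySize - 1.
    static int indexFor(int studentId, int arraySize) {
        return Math.floorMod(studentId, arraySize);
    }

    public static void main(String[] args) {
        // The example from the slide: the remainder of 11 divided by 4.
        System.out.println(11 % 4);

        int arraySize = 10;
        int[] sampleIds = {1046, 20873, 7, 990}; // made-up student IDs
        for (int id : sampleIds) {
            System.out.println(id + " -> index " + indexFor(id, arraySize));
        }
    }
}
```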


What happens if we don’t have a numerical Student ID to use? Say we only have their name? Well we just convert the string into some numerical value using one of several methods. MD5 is a common method; you give it text, it gives you a 128-bit number. The important thing is that we get an even distribution of entries into the array to minimize collisions.

MD5 is also used to verify copies of documents because even if one character has changed during the copying, the number that MD5 returns will be totally different.
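As a rough illustration of turning a name into an array index, the sketch below computes the MD5 digest with Java's standard `MessageDigest` class, treats the 16 bytes as a non-negative 128-bit number, and takes it modulo the array size. The class and method names, and the choice of UTF-8 to turn the text into bytes, are assumptions for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class NameHashDemo {
    // Returns the MD5 digest of the text as a non-negative 128-bit number.
    static BigInteger md5(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 = treat bytes as positive
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Maps a name to an index in the range 0 .. arraySize - 1.
    static int indexFor(String name, int arraySize) {
        return md5(name).mod(BigInteger.valueOf(arraySize)).intValue();
    }

    public static void main(String[] args) {
        // A one-character change gives a completely different number.
        System.out.println(md5("Jim").toString(16));
        System.out.println(md5("Jin").toString(16));
        System.out.println(indexFor("Jim", 10));
    }
}
```

The same name always produces the same index, so the record can be found again later by hashing the name a second time.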
