can’t provide fast insertion/removal and fast lookup at the same time vectors, linked lists,...

Can’t provide fast insertion/removal and fast lookup at the same time

Vectors, Linked Lists, Stacks, Queues, Deques

Data Structures - CSCI 102

Data Structure Limitations

Provide consistently fast operations, but must maintain an internal ordering

Binary Search Trees, Heaps

What if we didn’t care about the ordering of the elements at all?

How can we further improve the performance of lookup, add & removal?

Each value in the table has a unique key

For operations where we only care about fast add/remove/search, not fast traversal, we create a table structure to optimize for fast lookup

Lookup Tables

The key is used as a short identifier to look up an entire value in the table

Your student ID is used to look up your student record (e.g. name, GPA, etc.)

Example

Search(key): See if a particular value identified by key is in the table

What kind of operations do we need to perform on a lookuptable?

Lookup Tables

Insert(key,value): Insert a new value identified by key into the table

Remove(key): Remove the value identified by key from the table

We don’t care as much about traversal (visiting all elements) in this scenario

Let’s assume ID is a unique integer

We want to keep a directory of all the students at USC and be able to look them up by their student ID

Sample Object

struct Student {
    string name;
    double gpa;
    int id;
};

Direct Address Table

If we can guarantee that student IDs will always range from 0 to N-1 (e.g. 0 to 4999), we could just store them in an array of size N:

    Student data[5000];

Then when we want to grab a particular student, we know the student with ID k is at index k:

    int id = 3285;
    Student s = data[id];

[Figure: an array indexed by Student ID; each used slot holds a Student object (John Doe, 3.20; Jane Doe, 2.62; Some Guy)]

Direct Addressing

Maps keys directly to the indexes in an array

Unused array indexes need to be marked (generally use NULL)

Operations are fast: O(1) worst case
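As a rough illustration of the points above, here is a minimal sketch of a direct address table that marks unused indexes with a null pointer. The class name DirectAddressTable, the constant MAX_STUDENTS, and the method names are assumptions made for this example, not something the slides define.

```cpp
#include <cassert>
#include <string>

struct Student {
    std::string name;
    double gpa;
    int id;
};

// Hypothetical capacity: IDs are assumed to fall in 0..MAX_STUDENTS-1.
const int MAX_STUDENTS = 5000;

class DirectAddressTable {
public:
    DirectAddressTable() {
        // Every index starts out unused, marked with a null pointer.
        for (int i = 0; i < MAX_STUDENTS; ++i) slots[i] = nullptr;
    }
    ~DirectAddressTable() {
        for (int i = 0; i < MAX_STUDENTS; ++i) delete slots[i];
    }
    // Copying is disabled to keep ownership of the records simple.
    DirectAddressTable(const DirectAddressTable&) = delete;
    DirectAddressTable& operator=(const DirectAddressTable&) = delete;

    // O(1): the key is the array index. Replaces any existing record.
    void insert(const Student& s) {
        assert(s.id >= 0 && s.id < MAX_STUDENTS);
        delete slots[s.id];
        slots[s.id] = new Student(s);
    }

    // O(1): returns nullptr when no record is stored under this ID.
    const Student* search(int id) const {
        assert(id >= 0 && id < MAX_STUDENTS);
        return slots[id];
    }

    // O(1): frees the record and marks the index unused again.
    void remove(int id) {
        assert(id >= 0 && id < MAX_STUDENTS);
        delete slots[id];
        slots[id] = nullptr;
    }

private:
    Student* slots[MAX_STUDENTS];
};
```

Every operation touches exactly one array slot, which is where the O(1) worst case comes from; the cost is that the array is as large as the key range no matter how few students are stored.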

Direct Addressing Issues

Key Restrictions: Keys must fall into a nice, uniform range. Keys must be numeric.

Array Size: If there are N possible keys, then data[] must be of size N. Our array could get HUGE. What if we’re only using a small number of keys? Tons of space is wasted.

How can we get around these limitations?

Hash Functions

A function that maps key values to array indexes

Input records all have a unique key

The hash function maps key to an array index

Records are stored at data[hash(key)]

Ideally every unique key also has a unique hash(key)

Direct Addressing essentially uses a hash function that does nothing

int directAddressHash(int studentId) {
    return studentId;
}

Hash Tables

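[Figure: Student IDs (keys) pass through the hash function; the results labeled hash(0), hash(2), and hash(4) select the array slots that hold the Student objects (Name, GPA, ID) for John Doe, Jane Doe, and Some Guy]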

How can we avoid having to make our array gigantic to hold all possible keys?

Hash Functions

Hash Tables

Simple solution: use modular arithmetic

Size of the backing array is no longer dependent on the number of unique keys

int modularHash(int studentId) {
    return studentId % ARRAY_SIZE;
}

For comparison, recall direct addressing:

int directAddressHash(int studentId) {
    return studentId;
}
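To make the modular approach concrete, the sketch below fills in the one piece the slide leaves open, the value of ARRAY_SIZE. The value 101 is an assumption picked only for illustration, and the non-negative check is an added safeguard rather than part of the original function.

```cpp
#include <cassert>

// Hypothetical table size chosen for illustration; not from the slides.
const int ARRAY_SIZE = 101;

int modularHash(int studentId) {
    // Assumes non-negative IDs: in C++, % on a negative int can yield
    // a negative result, which would not be a valid array index.
    assert(studentId >= 0);
    return studentId % ARRAY_SIZE;
}
```

With this table size, IDs 3285 and 53 both map to index 53, so shrinking the array means two different keys can land on the same slot.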

Fast: Hashing is supposed to be faster than a binary search tree, so hash(key) needs to be O(1)

What makes a good hash function?

Hash Functions

Deterministic: If we have a key K, then hash(K) must always give the same result

Uniform distribution: The hash function should uniformly distribute keys across all of the available indexes in the storage array

Making a good hash function is hard

Making a hash function

Map your data into the set of natural numbers, N = {0, 1, 2, ...}

For strings, use things like ASCII letter codes

Hash Functions

Prime numbers are your friend: Prime table sizes tend to yield better results

Handle variants of the same pattern: e.g. make sure "get" and "gets" hash differently

Try to be independent of any patterns that may exist in the data

You won’t usually have to write your own, but you should know what the default hash function does
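One common way to follow the advice above for strings is a polynomial hash over the character codes, reduced modulo a prime table size. The sketch below is one such scheme, not the function any particular library uses; the name stringHash, the multiplier 31, and the table size 101 used in the note afterwards are all assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: polynomial hash over the character codes of key.
// tableSize must be > 0; the slides suggest choosing a prime.
unsigned int stringHash(const std::string& key, unsigned int tableSize) {
    assert(tableSize > 0);
    unsigned long long h = 0;
    for (unsigned char c : key) {
        // Multiplying by 31 before adding each code makes the result depend
        // on character position, so "get" and "gets" usually differ.
        h = (h * 31 + c) % tableSize;
    }
    return static_cast<unsigned int>(h);
}
```

With a table size of 101, "get" and "gets" land on different indexes. That is typical for this scheme but not guaranteed for every pair of strings, since distinct keys can still collide.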

Hash Tables do not maintain any ordering of their internal elements

Hashing Issues

Hash Tables

Creating a perfect hash function is almost impossible

When two distinct keys generate the same hash value, it’s called a collision

Collisions

hash(K1) == hash(K2)

Collision Handling

Open Addressing

If we try to insert a new element and there’s a collision, keep probing the hash table until we find a vacant space

All data is stored directly in the hash table. No extra data structures are needed.

Probing

If a collision occurs, use a deterministic algorithm to calculate the next array index to check (based on the initial hash result)

Open Addressing (Linear Probing)

[Figure sequence: inserting three Student records (name, GPA, ID) into a hash table using linear probing]

Start with an empty Hash Table

Insert "John Doe" with ID = 123: hash(123) = 1, data[1] is empty, no collision, store it there

Hash Table contains one item: John Doe (2.8, 123)

Insert "Jane Doe" with ID = 202: hash(202) = 3, data[3] is empty, no collision, store it there

Hash Table contains two items: John Doe (2.8, 123) and Jane Doe (3.4, 202)

Insert "Some Guy" with ID = 401: hash(401) = 1, data[1] is non-empty, collision! hash(401)+1 = 2, data[2] is empty, no collision, store it there

Hash Table contains three items: John Doe (2.8, 123) at data[1], Some Guy (3.5, 401) at data[2], Jane Doe (3.4, 202) at data[3]
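The insertion walk-through above can be sketched in code roughly as follows. The slides do not say which hash function produces hash(123) = 1, hash(202) = 3, and hash(401) = 1, so toyHash below is a stand-in that simply reproduces those values; the class name LinearProbingTable and the table size of 5 used in the note afterwards are likewise assumptions. Removal is left out of this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Student {
    std::string name;
    double gpa;
    int id;
};

// Stand-in hash that reproduces the values from the walk-through.
// Purely illustrative: the real function is not given in the slides.
int toyHash(int id) {
    switch (id) {
        case 123: return 1;
        case 202: return 3;
        case 401: return 1;
        default:  return id;  // the table reduces this modulo its size
    }
}

class LinearProbingTable {
public:
    explicit LinearProbingTable(int size) : slots(size), used(size, false) {
        assert(size > 0);
    }

    // Returns the index where s was stored, or -1 if the table is full.
    // Assumes non-negative IDs.
    int insert(const Student& s) {
        const int n = static_cast<int>(slots.size());
        const int start = toyHash(s.id) % n;
        for (int step = 0; step < n; ++step) {
            const int i = (start + step) % n;  // probe start, start+1, ...
            if (!used[i] || slots[i].id == s.id) {
                slots[i] = s;
                used[i] = true;
                return i;
            }
        }
        return -1;
    }

    // Follows the same probe sequence; a vacant slot means "not present".
    const Student* search(int id) const {
        const int n = static_cast<int>(slots.size());
        const int start = toyHash(id) % n;
        for (int step = 0; step < n; ++step) {
            const int i = (start + step) % n;
            if (!used[i]) return nullptr;
            if (slots[i].id == id) return &slots[i];
        }
        return nullptr;
    }

private:
    std::vector<Student> slots;
    std::vector<bool> used;
};
```

On a table of size 5, inserting John Doe, Jane Doe, and Some Guy in that order stores them at indexes 1, 3, and 2, matching the figures: the third insert collides at data[1] and moves on to data[2].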

Open Addressing (Linear Probing)

What is the Big O of each of these operations?

Search(key), Insert(key,value), Remove(key)

Average: O(1), Worst Case: O(N)

Operations depend on the table’s load factor ("utilization")

How big is the table? How many slots are taken already?

load factor = (# of elements) / (size of array)

Collision Handling

Chaining

Each slot in the Hash Table can now contain a list of elements instead of a single element

When multiple items hash to the same slot, they are placed in the list at that slot

This requires the overhead of an extra list for each slot that contains one or more elements

Chaining

[Figure sequence: the same insertions using chaining, where each slot holds a list of Student records]

Hash Table contains two items: John Doe (2.8, 123) in the list at data[1] and Jane Doe (3.4, 202)

Insert "Some Guy" with ID = 401: hash(401) = 1, data[1] is non-empty, collision! Chaining says to add the new entry to the list at data[1]

Insert Some Guy in the list at data[1]

Hash Table contains three items: John Doe (2.8, 123) and Some Guy (3.5, 401) share the list at data[1]; Jane Doe (3.4, 202) is in its own slot

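Chaining

What is the Big O of each of these operations?

Search(key), Insert(key,value), Remove(key)

Search and Remove: Average O(1), Worst Case O(N) (every key lands in the same chain)

Insert: O(1) when the new entry is simply added to the front of the chain

Operations depend on the average length of a chain (except for insert)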
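A minimal sketch of chaining, under a few assumptions: each slot is a std::list, the hash is a plain modulo of a non-negative ID, and the caller never inserts the same ID twice (which is what lets insert skip the scan and stay O(1)). The class name ChainedTable and its method names are made up for this example.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

struct Student {
    std::string name;
    double gpa;
    int id;
};

class ChainedTable {
public:
    explicit ChainedTable(int size) : buckets(size) {
        assert(size > 0);
    }

    // O(1): push onto the front of the chain, no scan needed.
    // Assumes the same id is never inserted twice.
    void insert(const Student& s) {
        buckets[index(s.id)].push_front(s);
    }

    // Cost is proportional to the length of one chain.
    const Student* search(int id) const {
        const std::list<Student>& chain = buckets[index(id)];
        for (const Student& s : chain) {
            if (s.id == id) return &s;
        }
        return nullptr;
    }

    // Returns true if a record was found and erased.
    bool remove(int id) {
        std::list<Student>& chain = buckets[index(id)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->id == id) {
                chain.erase(it);
                return true;
            }
        }
        return false;
    }

private:
    // Hypothetical modular hash; assumes id >= 0.
    int index(int id) const {
        return id % static_cast<int>(buckets.size());
    }

    std::vector<std::list<Student>> buckets;
};
```

With a deliberately tiny table of two slots, IDs 123 and 401 both hash to slot 1 and share a chain, while 202 sits alone in slot 0. If every key ended up in one chain, search and remove would have to walk all N records.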

Collision Handling

The Problem

If a malicious user knows what hash function you’re using, they can intentionally cause your worst-case behavior

Universal Hashing

When the Hash Table is created, randomly choose a hash function independent of the keys that are going to be stored

No single input gives worst-case behavior (just like randomized Quicksort)
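The slides do not name a specific family of hash functions. One well-known choice is h(k) = ((a*k + b) mod p) mod m, with p prime and a, b picked at random when the table is created. The sketch below assumes that family, keys smaller than p = 2^31 - 1, and a caller-supplied random engine; the class name UniversalHash is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// A prime larger than every key we intend to hash (2^31 - 1).
// Keys are assumed to lie in 0..PRIME-1.
const std::uint64_t PRIME = 2147483647ULL;

class UniversalHash {
public:
    // Picks a and b at random once, when the table is created.
    UniversalHash(std::uint64_t tableSize, std::mt19937_64& rng)
        : m(tableSize) {
        assert(tableSize > 0);
        std::uniform_int_distribution<std::uint64_t> pickA(1, PRIME - 1);
        std::uniform_int_distribution<std::uint64_t> pickB(0, PRIME - 1);
        a = pickA(rng);
        b = pickB(rng);
    }

    // Deterministic after construction: same key, same index.
    // a and key are both below 2^31, so a * key + b fits in 64 bits.
    std::uint64_t operator()(std::uint32_t key) const {
        assert(key < PRIME);
        return ((a * key + b) % PRIME) % m;
    }

private:
    std::uint64_t a;
    std::uint64_t b;
    std::uint64_t m;
};
```

Because a and b are unknown to whoever supplies the keys, an input crafted to collide under one choice is unlikely to collide under another. In practice the engine should be seeded unpredictably, for example from std::random_device, rather than with a fixed seed.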

Collision Handling

Multi-Level Hashing

Like chaining, but each element in the hash table holds another hash table with a different hash function

By hashing multiple times, we can greatly decrease the odds of a collision

Perfect Hashing

If the set of possible keys is static (never changes), we can develop a perfect multi-level hash to give O(1) worst case performance

e.g. The reserved keywords in a programming language are a static set of keys
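One way to build a two-level table for a static key set is sketched below. It is an assumption about how such a structure could look, not the construction the slides have in mind: a top-level hash splits the keys into groups, and each group gets its own second-level table of size (group size)^2, with its hash parameters redrawn until the group has no collisions. All names are hypothetical, keys are assumed distinct and smaller than 2^31 - 1, and the hash family is h(k) = ((a*k + b) mod p) mod m.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

const std::uint64_t PRIME = 2147483647ULL;  // 2^31 - 1; keys must be smaller

struct HashParams {
    std::uint64_t a;
    std::uint64_t b;
};

std::uint64_t applyHash(const HashParams& h, std::uint32_t key, std::uint64_t m) {
    return ((h.a * key + h.b) % PRIME) % m;
}

class StaticTwoLevelSet {
public:
    // keys must be distinct; duplicates would make the retry loop spin forever.
    StaticTwoLevelSet(const std::vector<std::uint32_t>& keys, std::mt19937_64& rng) {
        std::uniform_int_distribution<std::uint64_t> pickA(1, PRIME - 1);
        std::uniform_int_distribution<std::uint64_t> pickB(0, PRIME - 1);
        const std::size_t n = keys.empty() ? 1 : keys.size();
        top = HashParams{pickA(rng), pickB(rng)};

        // Level 1: group the keys by the top-level hash.
        std::vector<std::vector<std::uint32_t>> groups(n);
        for (std::uint32_t k : keys) {
            assert(k < PRIME);
            groups[applyHash(top, k, n)].push_back(k);
        }

        // Level 2: per group, redraw the hash until no two keys collide.
        buckets.resize(n);
        for (std::size_t i = 0; i < n; ++i) {
            Bucket& bk = buckets[i];
            const std::size_t size = groups[i].size() * groups[i].size();
            bool ok = groups[i].empty();
            while (!ok) {
                bk.params = HashParams{pickA(rng), pickB(rng)};
                bk.used.assign(size, false);
                bk.slots.assign(size, 0);
                ok = true;
                for (std::uint32_t k : groups[i]) {
                    const std::size_t idx = applyHash(bk.params, k, size);
                    if (bk.used[idx]) { ok = false; break; }
                    bk.used[idx] = true;
                    bk.slots[idx] = k;
                }
            }
        }
    }

    // Two hash evaluations and one comparison: O(1) in the worst case.
    bool contains(std::uint32_t key) const {
        const Bucket& bk = buckets[applyHash(top, key, buckets.size())];
        if (bk.slots.empty()) return false;
        const std::size_t idx = applyHash(bk.params, key, bk.slots.size());
        return bk.used[idx] && bk.slots[idx] == key;
    }

private:
    struct Bucket {
        HashParams params{1, 0};
        std::vector<std::uint32_t> slots;
        std::vector<bool> used;
    };
    HashParams top{1, 0};
    std::vector<Bucket> buckets;
};
```

All the retrying happens once, at construction time, which is only workable because the key set never changes. Lookups afterwards never probe or walk a chain.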

Hash Tables generally do provide a way for you to retrieve a list of the known keys

Just keep in mind there is no guaranteed ordering of the keys

Other Notes

Hash Tables

C++ had no built-in hash table before C++11; std::unordered_map was added to the standard library in C++11

Google Sparse Hash provides C++ hash tables

Boost C++ Libraries provides hash tables: http://www.boost.org/
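Assuming a C++11 or later compiler, std::unordered_map offers the three lookup-table operations directly: emplace or insert to add, find to search, and erase to remove. The sketch below adds a small hypothetical helper, sortedKeys, that collects the keys and sorts them itself, because the container makes no promise about iteration order.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

struct Student {
    std::string name;
    double gpa;
    int id;
};

// Hypothetical helper: gather every key in the table. The iteration order
// of std::unordered_map is unspecified, so we sort before returning.
std::vector<int> sortedKeys(const std::unordered_map<int, Student>& table) {
    std::vector<int> keys;
    keys.reserve(table.size());
    for (const auto& entry : table) {
        keys.push_back(entry.first);
    }
    std::sort(keys.begin(), keys.end());
    return keys;
}
```

A typical use would be table.emplace(123, Student{"John Doe", 2.8, 123}) to insert, table.find(123) to search, and table.erase(123) to remove, with sortedKeys(table) called only when an ordered listing is actually needed.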
