Download - hashing - cs.unc.edu
Hashing
Dynamic Dictionaries
Operations:
• create
• insert
• find
• remove
• max/min
• write out in sorted order
Only defined for object classes that are Comparable
Hash tables
Operations:
• create
• insert
• find
• remove
(max/min and write out in sorted order are not supported)
Only defined for object classes that have equals defined (they need not be Comparable)
Java-specific: from the Java documentation
Hash tables – implementation
• Have a table (an array) of a fixed tableSize
• A hash function determines where in this table each item should be stored
item → hash(item) [a non-negative integer] → hash(item) % tableSize
Example: 2174 % 10 = 4
THE DESIGN QUESTIONS
1. Choosing tableSize
2. Choosing a hash function
3. What to do when a collision occurs
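The mapping from an item's hash value to a table cell can be sketched in Java as below. The class and method names (HashIndexDemo, indexFor) are illustrative and not from the slides. Math.floorMod is used instead of % because Java's hashCode() may return a negative value, which is an added assumption about where the hash value comes from.

```java
class HashIndexDemo {
    // Maps a (possibly negative) hash value to a cell in [0, tableSize).
    static int indexFor(int hash, int tableSize) {
        return Math.floorMod(hash, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(2174, 10));               // the slide's example, 2174 % 10
        System.out.println(indexFor("weiss".hashCode(), 10)); // any object's hashCode() works
        System.out.println(indexFor(-7, 10));                 // floorMod keeps the index non-negative
    }
}
```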
Hash tables – tableSize
• Should depend on the (maximum) number of values to be stored
• Let λ = [number of values stored] / tableSize
  • λ is the load factor of the hash table
• Restrict λ to be at most 1 (or ½)
• Require tableSize to be a prime number
  • to “randomize” away any patterns that may arise in the hash function values
• The prime should be of the form (4k+3) [for reasons to be detailed later]
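One way to pick a table size that satisfies these rules is sketched below. The helper names (isPrime, nextTableSize, tableSizeFor) are hypothetical, and simple trial division is assumed to be fast enough for table-sized numbers.

```java
class TableSizeDemo {
    // Trial-division primality test.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Smallest prime p >= minSize with p % 4 == 3, i.e. of the form 4k+3.
    static int nextTableSize(int minSize) {
        int p = Math.max(minSize, 3);
        while (!(p % 4 == 3 && isPrime(p))) {
            p++;
        }
        return p;
    }

    // Smallest acceptable size for n items when the load factor must stay <= maxLoad.
    static int tableSizeFor(int n, double maxLoad) {
        return nextTableSize((int) Math.ceil(n / maxLoad));
    }

    public static void main(String[] args) {
        System.out.println(nextTableSize(10));
        System.out.println(tableSizeFor(100, 0.5));
    }
}
```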
Hash tables – the hash function
If the objects to be stored have integer keys (e.g., student IDs), hash(k) = k is generally OK, unless the keys have “patterns”
Otherwise, some “randomized” way to obtain an integer
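For non-integer keys such as strings, one common way to obtain an integer is a polynomial hash evaluated with Horner's rule. The sketch below is an illustration, not the slides' own function. The multiplier 31 is an assumption, chosen because it matches the formula documented for Java's String.hashCode().

```java
class StringHashDemo {
    // Polynomial hash of the characters, reduced to a cell in [0, tableSize).
    static int hash(String key, int tableSize) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = 31 * h + key.charAt(i);   // int overflow simply wraps around
        }
        return Math.floorMod(h, tableSize); // h may be negative after wrapping
    }

    public static void main(String[] args) {
        System.out.println(hash("ab", 11));
        System.out.println(hash("weiss", 11));
    }
}
```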
Java-specific
• Every class has a default hashCode() method that returns an integer
• May be (should be) overridden
• Required properties
  - consistent with the class’s equals() method
  - need not be consistent across different runs of the program
  - different objects may return the same value!
From the Java 1.5.0 documentation
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Object.html#hashCode%28%29
Hash tables – collision resolution
The universe of possible items is usually far greater than tableSize
Collision: when multiple items hash onto the same location (aka cell or bucket)
Collision resolution strategies specify what to do in case of collision
1. Chaining (closed addressing)
2. Probing (open addressing)
   a. Linear probing
   b. Quadratic probing
   c. Double Hashing
   d. Perfect Hashing
   e. Cuckoo Hashing
Hash tables – collision resolution: chaining
Maintain a linked list at each cell/bucket
(The hash table is an array of linked lists)
Insert: at front of list
- if the pre-condition is “not already in list,” then faster
- in any case, later-inserted items are often accessed more frequently (the LRU principle)
Example: Insert 0², 1², 2², …, 9² into an initially empty hash table with tableSize = 10
[Note: bad choice of tableSize – only to make the example easier!!]
Find and Remove: obvious implementations
Worst-case run-time: Θ(N) per operation (all elements in the same list)
Average case: O(1 + λ) per operation, where the load factor λ = [number of items stored] / tableSize
Design rule: for chaining, keep λ ≤ 1
If λ becomes greater than 1, rehash (later)
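A minimal sketch of chaining in Java, assuming integer keys and hash(k) = k. The class name ChainingDemo and its methods are illustrative. The sample keys in main are the first ten squares, which is a reading of the slide's example values and should be treated as an assumption.

```java
import java.util.LinkedList;

class ChainingDemo {
    private final LinkedList<Integer>[] table;   // one linked list per cell
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainingDemo(int tableSize) {
        table = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) {
            table[i] = new LinkedList<>();
        }
    }

    private LinkedList<Integer> bucket(int key) {
        return table[Math.floorMod(key, table.length)];
    }

    boolean find(int key) {
        return bucket(key).contains(key);
    }

    // Inserts at the front of the list; the find() call could be skipped
    // when the caller guarantees the key is not already present.
    void insert(int key) {
        if (find(key)) return;
        bucket(key).addFirst(key);
        size++;
    }

    boolean remove(int key) {
        boolean removed = bucket(key).remove((Integer) key);
        if (removed) size--;
        return removed;
    }

    double loadFactor() {
        return (double) size / table.length;
    }

    String chain(int cell) {
        return table[cell].toString();
    }

    public static void main(String[] args) {
        ChainingDemo t = new ChainingDemo(10);
        for (int i = 0; i <= 9; i++) t.insert(i * i);
        for (int c = 0; c < 10; c++) System.out.println(c + ": " + t.chain(c));
    }
}
```

With these keys, cells 1, 4, 6 and 9 each hold two items, with the later-inserted item first.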
Hash tables – collision resolution: probing
In case of collision, try alternative locations until an empty cell is found
• [Open address]
Probe sequence: h_0(x), h_1(x), h_2(x), …, with h_i(x) = [hash(x) + f(i)] % tableSize
The function f(i) is different for the different probing methods
Avoids the use of dynamic memory
Hash tables – collision resolution: linear probing
f(i) is a linear function of i – typically, f(i) = i
Example: insert 89, 18, 49, 58, and 69 into a table of size 10, using linear probing
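The example insertion sequence can be traced with the sketch below, which assumes hash(x) = x and f(i) = i. The class name LinearProbingDemo and the use of null for an empty cell are illustrative choices.

```java
import java.util.Arrays;

class LinearProbingDemo {
    // Probes home, home+1, home+2, ... (mod tableSize) until an empty cell is found.
    // Returns the cell used, or -1 if the table is full.
    static int insert(Integer[] table, int key) {
        int home = Math.floorMod(key, table.length);
        for (int i = 0; i < table.length; i++) {
            int cell = (home + i) % table.length;
            if (table[cell] == null) {
                table[cell] = key;
                return cell;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69}) {
            insert(table, key);
        }
        // prints [49, 58, 69, null, null, null, null, null, 18, 89]
        System.out.println(Arrays.toString(table));
    }
}
```

Here 49, 58 and 69 all end up in the block of cells right after 89, which is the clustering behaviour linear probing is known for.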
Hash tables - review
Supports the basic dynamic dictionary ops: insert, find, remove
Does not need class to be Comparable
Three design decisions: tableSize, hash function, collision resolution
Table size
- a prime of the form (4k+3), keeping load factor constraints in mind
Hash function
- should “randomize” the items
- Java’s hashCode() method
Collision resolution: chaining
Collision resolution: probing (open addressing) – linear probing
- The clustering problem
Hash tables - clustering
Two causes of clustering:
- multiple keys hash onto the same location (secondary clustering)
- multiple keys hash onto the same cluster (primary clustering)
Secondary clustering is caused by the hash function; primary, by the choice of probe sequence
Number of probes per operation increases with load factor
Hash tables – collision resolution: quadratic probing
f(i) is a quadratic function of i (e.g., f(i) = i²)
Example: insert 89, 18, 49, 58, and 69 into a table of size 10, using quadratic probing
Two causes of clustering:
- multiple keys hash onto the same location (secondary clustering)
- multiple keys hash onto the same cluster (primary clustering)
Which one does quadratic probing solve?
- primary clustering
Efficient implementation of i² → (i+1)²: compute (i+1) and (2i+1) in parallel, and then add i² and (2i+1)
Choosing tableSize:
- prime: at least half the table gets probed
- prime of the form (4k+3) and probe sequence is ±i²: entire table gets probed
Remove: lazy delete must be used
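The quadratic probing example and the incremental computation of i² can be sketched together as follows, assuming hash(x) = x and f(i) = i². The class name QuadraticProbingDemo is illustrative. The probe budget of tableSize attempts is an arbitrary cutoff, not something the slides specify.

```java
import java.util.Arrays;

class QuadraticProbingDemo {
    // Probes home, home+1, home+4, home+9, ... (mod tableSize).
    // Returns the cell used, or -1 if no empty cell was found within the probe budget.
    static int insert(Integer[] table, int key) {
        int home = Math.floorMod(key, table.length);
        int offset = 0;                        // holds i*i for the current i
        for (int i = 0; i < table.length; i++) {
            int cell = (home + offset) % table.length;
            if (table[cell] == null) {
                table[cell] = key;
                return cell;
            }
            offset += 2 * i + 1;               // (i+1)^2 = i^2 + (2i+1), no multiplication of i by itself
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69}) {
            insert(table, key);
        }
        // prints [49, null, 58, 69, null, null, null, null, 18, 89]
        System.out.println(Arrays.toString(table));
    }
}
```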
Hash tables – collision resolution: double hashing
To get rid of secondary clustering
Use two hash functions: hash1(.) and hash2(.)
Probe sequence “step” size is hash2(.), i.e., f(i) = i · hash2(x)
- [Unlikely that distinct items agree on both hash1(.) and hash2(.)]
hash2(.) must never evaluate to zero!
A common (good) choice: R – (x mod R), for R a prime smaller than tableSize
Example: insert 89, 18, 49, 58, and 69 into a table of size 10, using double hashing with hash2(x) = 7 – (x mod 7)
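The double hashing example can be traced with the sketch below, which assumes hash1(x) = x and the slide's hash2(x) = 7 – (x mod 7). The class name DoubleHashingDemo and the constant name R are illustrative.

```java
import java.util.Arrays;

class DoubleHashingDemo {
    static final int R = 7;   // a prime smaller than the table size of 10

    // Step size in the range 1..R, so it is never zero.
    static int hash2(int key) {
        return R - Math.floorMod(key, R);
    }

    // Probes home, home+step, home+2*step, ... (mod tableSize).
    // Returns the cell used, or -1 if no empty cell was found within the probe budget.
    static int insert(Integer[] table, int key) {
        int home = Math.floorMod(key, table.length);
        int step = hash2(key);
        for (int i = 0; i < table.length; i++) {
            int cell = (home + i * step) % table.length;
            if (table[cell] == null) {
                table[cell] = key;
                return cell;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69}) {
            insert(table, key);
        }
        // prints [69, null, null, 58, null, null, 49, null, 18, 89]
        System.out.println(Arrays.toString(table));
    }
}
```

The colliding keys 49, 58 and 69 get step sizes 7, 5 and 1 respectively, so they scatter instead of following one shared probe sequence.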
Hash tables – collision resolution: Cuckoo hashing
Goal: constant-time O(1) find in the worst case
Example application: network routing tables
[remove also takes O(1) time]
Insert has worst-case Θ(N) run-time
Keep two hash tables, and use two different hash functions
Example: TABLE 1 and TABLE 2, each with cells 0 to 4; items A to F with their hash values
A: hash1(A) = 0, hash2(A) = 2
B: hash1(B) = 0, hash2(B) = 0
C: hash1(C) = 1, hash2(C) = 4
D: hash1(D) = 1, hash2(D) = 0
E: hash1(E) = 3, hash2(E) = 2
F: hash1(F) = 3, hash2(F) = 4
Hash tables – collision resolution: Cuckoo hashing
Insert
- Insert into Table 1, using hash1
- If cell is already occupied
  - bump the occupying item into the other table (using the appropriate hash function)
  - Repeat
- Rehash after k repetitions
Each table should be more than half empty
- Stronger condition than load factor ≤ ½
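The insert procedure can be sketched as below, using the hash values listed for items A to F. Several things here are assumptions: the items are inserted in the order A to F, the new item always takes the cell and the previous occupant is the one bumped, the hash values are looked up from a map (the define method) instead of being computed, and the bump limit MAX_BUMPS stands in for the slides' k. Rehashing itself is not implemented, so the sketch throws when the limit is hit.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class CuckooDemo {
    static final int MAX_BUMPS = 10;                 // stands in for "k"; value is arbitrary
    final String[] table1 = new String[5];
    final String[] table2 = new String[5];
    private final Map<String, int[]> hashes = new HashMap<>();

    // Registers (hash1, hash2) for a key; a stand-in for two real hash functions.
    void define(String key, int h1, int h2) {
        hashes.put(key, new int[] {h1, h2});
    }

    // Worst-case O(1): a key can only be in one of two cells.
    boolean find(String key) {
        int[] h = hashes.get(key);
        return h != null && (key.equals(table1[h[0]]) || key.equals(table2[h[1]]));
    }

    // Assumes the key is not already present.
    void insert(String key) {
        String current = key;
        boolean useFirst = true;
        for (int bumps = 0; bumps <= MAX_BUMPS; bumps++) {
            String[] table = useFirst ? table1 : table2;
            int cell = hashes.get(current)[useFirst ? 0 : 1];
            String evicted = table[cell];
            table[cell] = current;                   // the incoming item takes the cell
            if (evicted == null) return;
            current = evicted;                       // the old occupant moves to the other table
            useFirst = !useFirst;
        }
        throw new IllegalStateException("rehash needed; displaced item: " + current);
    }

    static CuckooDemo slideExample() {
        CuckooDemo c = new CuckooDemo();
        c.define("A", 0, 2); c.define("B", 0, 0); c.define("C", 1, 4);
        c.define("D", 1, 0); c.define("E", 3, 2); c.define("F", 3, 4);
        for (String k : new String[] {"A", "B", "C", "D", "E", "F"}) c.insert(k);
        return c;
    }

    public static void main(String[] args) {
        CuckooDemo c = slideExample();
        System.out.println(Arrays.toString(c.table1));   // prints [A, D, null, F, null]
        System.out.println(Arrays.toString(c.table2));   // prints [B, null, E, null, C]
    }
}
```

Inserting F shows the chain of bumps: F displaces E, E displaces A in Table 2, A displaces B in Table 1, and B finally lands in an empty cell of Table 2.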
Rehashing
When load factor becomes too large…
- (Approximately) double tableSize
- Scan old table, inserting each non-deleted item into the new table
Worst-case time?
- O(N²)
Average-case: O(N)
Amortized analysis
- Average cost per insert, over a sequence of repeated re-hashings
[Not great for interactive applications…]
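A minimal sketch of rehashing for a linear-probing table, under several assumptions: keys are distinct integers, hash(k) = k, the initial size is 11, the load factor is kept at or below ½, and the new size is the next prime at or above twice the old size. The class name RehashDemo is illustrative, and the 4k+3 form is not enforced here.

```java
class RehashDemo {
    private Integer[] table = new Integer[11];
    private int size = 0;

    int capacity() {
        return table.length;
    }

    // Assumes keys are distinct.
    void insert(int key) {
        if (2 * (size + 1) > table.length) {
            rehash();                               // keep the load factor <= 1/2
        }
        place(table, key);
        size++;
    }

    boolean find(int key) {
        int cell = Math.floorMod(key, table.length);
        while (table[cell] != null) {
            if (table[cell] == key) return true;
            cell = (cell + 1) % table.length;
        }
        return false;
    }

    // Linear probing into a table that is known to have an empty cell.
    private static void place(Integer[] t, int key) {
        int cell = Math.floorMod(key, t.length);
        while (t[cell] != null) {
            cell = (cell + 1) % t.length;
        }
        t[cell] = key;
    }

    // Scans the old table and re-inserts every item into a table about twice as large.
    private void rehash() {
        Integer[] old = table;
        table = new Integer[nextPrime(2 * old.length)];
        for (Integer k : old) {
            if (k != null) place(table, k);
        }
    }

    // Smallest prime >= n, by trial division.
    static int nextPrime(int n) {
        for (int p = Math.max(n, 2); ; p++) {
            boolean prime = true;
            for (int d = 2; d * d <= p; d++) {
                if (p % d == 0) { prime = false; break; }
            }
            if (prime) return p;
        }
    }

    public static void main(String[] args) {
        RehashDemo t = new RehashDemo();
        for (int i = 0; i < 20; i++) t.insert(7 * i);
        System.out.println(t.capacity());
    }
}
```

Each rehash costs time proportional to the table size, but it happens only after the table has roughly doubled, which is the basis of the amortized argument.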
Hash tables - review
Supports the basic dynamic dictionary ops: insert, find, remove
Three design decisions: tableSize, hash function, collision resolution
Table size: a prime of the form (4k+3), keeping load factor constraints in mind
Hash function
- Java’s hashCode() method
- item goes to hash(item) % tableSize
Collision: multiple items at the same location
Collision resolution: chaining
Collision resolution: probing (open addressing)
- Linear probing
- Quadratic probing
- Double Hashing
- Cuckoo Hashing
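Java-specific – hashCode() and equals()
public class Employee {
    String name;
    int id;

    public Employee(String n, int i) { name = n; id = i; }

    // Takes Object so that it overrides Object.equals() instead of overloading it
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Employee)) return false;
        Employee e = (Employee) o;
        // Compare the strings by content, not by reference
        return name.equals(e.name) && id == e.id;
    }

    // Kept consistent with equals(): equal employees return equal hash codes
    @Override
    public int hashCode() { return 31 * name.hashCode() + id; }
}
……
public static void main(String[] args) {
    Employee e1 = new Employee("weiss", 001);
    Employee e2 = new Employee("weiss", 001);
    System.out.println(e1.hashCode() + ", " + e2.hashCode());  // two equal values
    System.out.println(e1 == e2);        // false: two distinct objects
    System.out.println(e1.equals(e2));   // true: same name and id
    // Variant: with "Employee e2 = e1;" both names refer to one object, so e1 == e2 is also true
}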
Hash tables – collision resolution: linear probing
f(i) can be any linear function (a*i + b)
If gcd(a, tableSize) = 1, then linear probing will probe the entire table
Primary clustering: blocks of occupied cells start forming even in a relatively empty table
[Figure: any item hashing here… grows the cluster by one]
[Figure: any item hashing here… merges the two clusters]
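Hash tables – linear probing: remove
Example (table with cells 0 to 9):
insert A; hash(A) = 4
insert B; hash(B) = 5
insert C; hash(C) = 4
remove B
find C
(C lands in cell 6 after probing cells 4 and 5; if B's cell were simply emptied, find C would stop at cell 5 and fail)
Remove must be implemented as lazy delete!!
- Load factor computed including lazy-deleted items
- In inserts, may “reclaim” lazy-deleted cells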
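Lazy deletion can be sketched as a per-cell state flag, as below. The integer keys 4, 5 and 14 are stand-ins for A, B and C (chosen so that they hash to cells 4, 5 and 4 with hash(k) = k % 10). The class name LazyDeleteDemo and the State enum are illustrative. Keys are assumed distinct, and the load factor bookkeeping is left out.

```java
import java.util.Arrays;

class LazyDeleteDemo {
    private enum State { EMPTY, ACTIVE, DELETED }

    private final int[] keys = new int[10];
    private final State[] state = new State[10];

    LazyDeleteDemo() {
        Arrays.fill(state, State.EMPTY);
    }

    private int home(int key) {
        return Math.floorMod(key, keys.length);
    }

    // Linear probing; a lazily deleted cell may be reclaimed by a new key.
    void insert(int key) {
        for (int i = 0; i < keys.length; i++) {
            int cell = (home(key) + i) % keys.length;
            if (state[cell] != State.ACTIVE) {
                keys[cell] = key;
                state[cell] = State.ACTIVE;
                return;
            }
        }
        throw new IllegalStateException("table full");
    }

    // Returns the cell holding key, or -1. Probing continues past DELETED cells
    // and stops only at a cell that was never used.
    int find(int key) {
        for (int i = 0; i < keys.length; i++) {
            int cell = (home(key) + i) % keys.length;
            if (state[cell] == State.EMPTY) return -1;
            if (state[cell] == State.ACTIVE && keys[cell] == key) return cell;
        }
        return -1;
    }

    // Marks the cell as deleted instead of emptying it.
    void remove(int key) {
        int cell = find(key);
        if (cell >= 0) state[cell] = State.DELETED;
    }

    public static void main(String[] args) {
        LazyDeleteDemo t = new LazyDeleteDemo();
        t.insert(4);    // A
        t.insert(5);    // B
        t.insert(14);   // C, collides with A and B, lands in cell 6
        t.remove(5);    // remove B
        System.out.println(t.find(14));   // prints 6: C is still reachable
    }
}
```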