CSC 172 DATA STRUCTURES
SETS and HASHING
Unadvertised in-store special: SETS! In Java, see Weiss 4.8
Simple idea: Characteristic Vector
HASHING... the main event.
Representation of Sets
List: simple, O(n) dictionary operations
Binary Search Trees: O(log n) average time; support range queries and sorting
Characteristic Vector: O(1) dictionary ops, but limited to small sets
Hash Table: O(1) average for dictionary ops; tricky to expand, no range queries
Characteristic Vectors
Boolean strings whose positions correspond to the members of some fixed “universal” set
A “1” in a location means that the element is in the set
A “0” means that it is not
MUSIC THEORY
A chord is a set of notes played at the same time.
Represented by a 12-bit vector called a “pitch class”
{B,A#,A,G#,G,F#,F,E,D#,D,C#,C}
000010010001 represents C major
000010001001 represents C minor
Rotation is “transposition”
Bit reversal is “inversion”
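As a rough illustration of "rotation is transposition", the sketch below stores a chord in the low 12 bits of an `int`, with C as bit 0 and B as bit 11 to match the bit strings above. The class and method names are hypothetical, not from the course materials.

```java
// Sketch: chords as 12-bit characteristic vectors (bit 0 = C, bit 11 = B).
final class PitchClassSet {
    static final int MASK = 0xFFF; // keep only the low 12 bits

    // Transpose up by a number of semitones: a 12-bit left rotation.
    static int transpose(int chord, int semitones) {
        int k = ((semitones % 12) + 12) % 12; // normalize to 0..11
        return ((chord << k) | (chord >>> (12 - k))) & MASK;
    }

    // Render as a 12-character bit string, B on the left and C on the right.
    static String toBits(int chord) {
        String s = Integer.toBinaryString(chord & MASK);
        return String.format("%12s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        int cMajor = 0b000010010001;                      // C, E, G
        System.out.println(toBits(transpose(cMajor, 2))); // D, F#, A: 001001000100
    }
}
```

Transposing by 12 semitones rotates all the way around and returns the original chord.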
UNIX file privileges
{user, group, others} x {read, write, execute}
9 possible privileges
Type “ls -l” on UNIX
    total 142
    -rw-rw-r-- 1 pawlicki none    76 Jun 20 2000 PKG416.desc
    -rw-rw-r-- 1 pawlicki none 28906 Jun 20 2000 PKG416.pdf
    -rw-rw-r-- 1 pawlicki none  1849 Jun 20 2000 let.1
    -rw-rw-r-- 1 pawlicki none     0 Apr  2 13:03 out
    -rw-rw-r-- 1 pawlicki none 39891 Jun 20 2000 stapp.uu
UNIX files
The order is rwx for each of user (owner), group, and others
So, a protection mode of 110100000 means that the owner may read and write (but not execute), the group can read only and others cannot even read
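A small sketch of reading such a protection mode as a characteristic vector, assuming the nine bits are held in an `int` with the owner's read bit highest. The class and method names are made up for illustration.

```java
// Sketch: a 9-bit protection mode as a characteristic vector.
final class FileMode {
    // Render the mode as the rwxrwxrwx string shown by ls -l.
    static String describe(int mode) {
        String flags = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 9; i++) {
            int bit = 1 << (8 - i); // leftmost flag is the highest bit
            sb.append((mode & bit) != 0 ? flags.charAt(i) : '-');
        }
        return sb.toString();
    }

    // Membership test on a single bit: may the owner write?
    static boolean ownerCanWrite(int mode) {
        return (mode & 0b010000000) != 0;
    }

    public static void main(String[] args) {
        System.out.println(describe(0b110100000)); // rw-r-----
    }
}
```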
GAMBLING
A deck has 52 cards {2C,2H,2S,2D,3C, .... KD,AC,AH,AS,AD}
Represent a “hand” as a vector of 52 bits
0000000000000000000000000000000000000000000000000101 is a pair of aces
In “Texas Hold'em” everyone gets two “hole” cards and shares 5 “board” cards
We can use bitwise & to find “hands”
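One way the bitwise & idea might look in code, as a sketch only. The bit layout (bit index = 4 × rank + suit, with 2C at bit 0) is an assumption chosen for convenience and is not necessarily the layout used on the slide; all names are hypothetical.

```java
// Sketch: a set of cards as a 52-bit characteristic vector in a long.
final class Hands {
    // rank 0..12 stands for 2..A; suit 0..3 stands for C, H, S, D (assumed layout).
    static long card(int rank, int suit) {
        return 1L << (4 * rank + suit);
    }

    // True if the cards include at least two of the given rank.
    static boolean hasPairOf(long cards, int rank) {
        long rankMask = 0xFL << (4 * rank); // the four cards of that rank
        return Long.bitCount(cards & rankMask) >= 2;
    }

    // True if any rank appears at least twice.
    static boolean hasAnyPair(long cards) {
        for (int rank = 0; rank < 13; rank++) {
            if (hasPairOf(cards, rank)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long hole = card(12, 1) | card(12, 3);   // AH and AD
        long board = card(0, 0) | card(5, 2);    // two unrelated cards
        System.out.println(hasPairOf(hole | board, 12)); // true
    }
}
```

Combining hole and board cards is a bitwise OR (set union), and each pattern test is a mask followed by a bit count.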
CV advantages
If the universal set is small, sets can be represented by bits packed 32 to a word
Insert, delete, and lookup are O(1) on the proper bit
Union, intersection, difference are implemented on a word-by-word basis
O(m) where m is the size of the universal set
Small constant factor (1/32)
Fast machine operations
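The word-by-word operations can be sketched as follows, assuming 32-bit `int` words and two sets built over the same universe (so the arrays have equal length). The names are hypothetical; the Java standard library offers `java.util.BitSet` for real use.

```java
// Sketch: characteristic vectors packed 32 bits to a word.
final class PackedSets {
    static int[] empty(int universeSize) {
        return new int[(universeSize + 31) / 32];
    }

    // O(1) dictionary operations: pick the word, then the bit within it.
    static void insert(int[] set, int x) { set[x >>> 5] |= 1 << (x & 31); }
    static void delete(int[] set, int x) { set[x >>> 5] &= ~(1 << (x & 31)); }
    static boolean contains(int[] set, int x) {
        return (set[x >>> 5] & (1 << (x & 31))) != 0;
    }

    // Set operations: one machine operation per word.
    static int[] union(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int w = 0; w < a.length; w++) r[w] = a[w] | b[w];
        return r;
    }
    static int[] intersection(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int w = 0; w < a.length; w++) r[w] = a[w] & b[w];
        return r;
    }
    static int[] difference(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int w = 0; w < a.length; w++) r[w] = a[w] & ~b[w];
        return r;
    }
}
```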
Hashing
A cool way to get from an element x to the place where x can be found
An array [0..B-1] of buckets
Each bucket contains a list of set elements
B = number of buckets
A hash function that takes potential set elements and quickly produces a “random” integer in [0..B-1]
Example
If the set elements are integers, then the simplest (and usually best) hash function is h(x) = x % B. A variant, h(x) = B - (x % B), is never 0.
Suppose B = 6 and we wish to store the integers {70, 53, 99, 94, 83, 76, 64, 30}
They belong in the buckets 4, 5, 3, 4, 5, 4, 4, and 0
Note: if B = 7, the buckets are 0, 4, 1, 3, 6, 6, 1, 2
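The bucket numbers above can be checked with a few lines of Java. This sketch uses `Math.floorMod` rather than `%` so that negative keys would also land in [0..B-1]; that choice, and the class name, are assumptions beyond what the slide states.

```java
import java.util.Arrays;

// Sketch: the modular hash function applied to the example keys.
final class ModHash {
    static int h(int x, int B) {
        return Math.floorMod(x, B); // same as x % B for non-negative x
    }

    static int[] buckets(int[] keys, int B) {
        int[] out = new int[keys.length];
        for (int i = 0; i < keys.length; i++) out[i] = h(keys[i], B);
        return out;
    }

    public static void main(String[] args) {
        int[] keys = {70, 53, 99, 94, 83, 76, 64, 30};
        System.out.println(Arrays.toString(buckets(keys, 6))); // [4, 5, 3, 4, 5, 4, 4, 0]
        System.out.println(Arrays.toString(buckets(keys, 7))); // [0, 4, 1, 3, 6, 6, 1, 2]
    }
}
```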
Pitfalls of Hash Function Selection
We want to get a uniform distribution of elements into buckets
Beware of data patterns that cause non-uniform distribution
Example
If integers were all even, then B = 6 would cause only buckets 0,2, and 4 to fill
If we hashed words in the UNIX dictionary into 10 buckets by length of word then 20% go into bucket 7
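The even-integer pitfall is easy to reproduce. The sketch below counts how many keys fall into each bucket under h(x) = x % B; the key range (even numbers 0 to 58) and the names are arbitrary choices for illustration.

```java
import java.util.Arrays;

// Sketch: bucket occupancy for a patterned key set.
final class BucketSpread {
    // Count how many keys land in each of B buckets.
    static int[] histogram(int[] keys, int B) {
        int[] counts = new int[B];
        for (int x : keys) counts[Math.floorMod(x, B)]++;
        return counts;
    }

    // The even numbers 0, 2, ..., 2 * (n - 1).
    static int[] evens(int n) {
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) keys[i] = 2 * i;
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = evens(30);
        System.out.println(Arrays.toString(histogram(keys, 6))); // [10, 0, 10, 0, 10, 0]
        System.out.println(Arrays.toString(histogram(keys, 7))); // every bucket is used
    }
}
```

With B = 6 the odd-numbered buckets stay empty, while B = 7 (which shares no factor with 2) spreads the same keys over all buckets.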
Dictionary Operations
Lookup
Go to the head of bucket h(x)
Search the bucket list; if x is in the set, it is in that bucket
Insertion: append to the bucket list if not found
Delete: list deletion from the bucket list
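A minimal sketch of these three operations for integer keys, assuming h(x) = x mod B and a fixed number of buckets. The class is illustrative only; it is not the implementation from Weiss or from the course.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch: a hash set with one linked list per bucket.
final class ChainedSet {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();
    private final int B;

    ChainedSet(int numBuckets) {
        B = numBuckets;
        for (int i = 0; i < B; i++) buckets.add(new LinkedList<>());
    }

    // Go to the head of bucket h(x).
    private LinkedList<Integer> bucketFor(int x) {
        return buckets.get(Math.floorMod(x, B));
    }

    // Lookup: search only the one bucket where x could be.
    boolean lookup(int x) {
        return bucketFor(x).contains(x);
    }

    // Insertion: append if not found. Returns false if x was already present.
    boolean insert(int x) {
        LinkedList<Integer> list = bucketFor(x);
        if (list.contains(x)) return false;
        list.addLast(x);
        return true;
    }

    // Delete: ordinary list deletion. Returns false if x was not present.
    boolean delete(int x) {
        return bucketFor(x).remove(Integer.valueOf(x));
    }
}
```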
Analysis
If we pick B to be near n, the number of elements in the set, then the average list is O(1) long
Thus, dictionary ops take O(1) time
Worst case: all elements go into one bucket, O(n)
Managing Hash Table Size
If n gets as high as 2B, create a new hash table with 2B buckets
“Rehash” every element into the new table
O(n) time total
There were at least n/2 inserts since the last “rehash”, because the table held at most n/2 elements right after it
Those inserts already took time proportional to n
Thus, we “amortize” the cost of rehashing over the inserts since the last rehash
Constant factor, at worst
So, even with rehashing we get O(1) time ops
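The doubling rule can be sketched on top of a chained table as below. The threshold n ≥ 2B and the doubling come from the slide; the class name, the `rehashCount` counter, and the use of `ArrayList` buckets are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a chained hash set that doubles its bucket array when n reaches 2B.
final class GrowingChainedSet {
    private List<List<Integer>> buckets;
    private int n = 0;   // number of stored elements
    int rehashCount = 0; // exposed only to observe the behavior

    GrowingChainedSet(int initialBuckets) {
        buckets = newTable(initialBuckets);
    }

    private static List<List<Integer>> newTable(int B) {
        List<List<Integer>> t = new ArrayList<>(B);
        for (int i = 0; i < B; i++) t.add(new ArrayList<>());
        return t;
    }

    private static List<Integer> bucketOf(List<List<Integer>> table, int x) {
        return table.get(Math.floorMod(x, table.size()));
    }

    boolean contains(int x) {
        return bucketOf(buckets, x).contains(x);
    }

    boolean insert(int x) {
        if (contains(x)) return false;
        bucketOf(buckets, x).add(x);
        n++;
        if (n >= 2 * buckets.size()) rehash(); // n got as high as 2B
        return true;
    }

    // O(n): every element is hashed again into a table twice the size.
    private void rehash() {
        List<List<Integer>> bigger = newTable(2 * buckets.size());
        for (List<Integer> bucket : buckets) {
            for (int x : bucket) bucketOf(bigger, x).add(x);
        }
        buckets = bigger;
        rehashCount++;
    }

    int bucketCount() { return buckets.size(); }
    int size() { return n; }
}
```

Starting from 4 buckets, inserting 100 distinct keys triggers rehashes at n = 8, 16, 32, and 64, so only four O(n) passes are spread over the 100 inserts.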
Collisions
A collision occurs when two values in the set hash to the same bucket
There are several ways to deal with this
Chaining (using a linked list or some secondary structure)
Open Addressing
  Double hashing
  Linear Probing
Chaining
Buckets for the earlier example with B = 7 (bucket: chained elements)
0: 70
1: 99, 64
2: 30
3: 94
4: 53
5: (empty)
6: 83, 76
Very efficient time-wise
Other approaches use less space
Open Addressing
When a collision occurs, if the table is not full, find an available space
Linear Probing
Quadratic Probing
Double Hashing
Linear Probing
If the current location is occupied, try the next table location
    LinearProbingInsert(K) {
        if (table is full) error;
        probe = h(K);
        while (table[probe] is occupied)
            probe = (probe + 1) % M;
        table[probe] = K;
    }
Walk along table until an empty spot is found
Uses less memory than chaining (no links)
Takes more time than chaining (long walks)
Deleting is a pain (mark a slot as having been deleted)
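A runnable sketch of the insert pseudocode, assuming integer keys, h(K) = K mod M, and `null` as the empty-slot marker. Like the pseudocode it does not check for duplicates, and deletion (which would need a "deleted" marker) is left out. The class name is hypothetical.

```java
// Sketch: open addressing with linear probing.
final class LinearProbingTable {
    private final Integer[] table; // null marks an empty slot
    private int count = 0;

    LinearProbingTable(int M) {
        table = new Integer[M];
    }

    // Inserts the key and returns the slot where it was stored.
    int insert(int key) {
        if (count == table.length) throw new IllegalStateException("table is full");
        int M = table.length;
        int probe = Math.floorMod(key, M);
        while (table[probe] != null) {
            probe = (probe + 1) % M; // walk to the next location, wrapping around
        }
        table[probe] = key;
        count++;
        return probe;
    }

    // Follows the same walk; an empty slot means the key is absent.
    boolean contains(int key) {
        int M = table.length;
        int probe = Math.floorMod(key, M);
        for (int i = 0; i < M && table[probe] != null; i++) {
            if (table[probe] == key) return true;
            probe = (probe + 1) % M;
        }
        return false;
    }
}
```

With M = 13 and the keys 18, 41, 22, 59, 32, 31, 73 inserted in that order, the keys end up in slots 5, 2, 9, 7, 6, 8, and 10: key 31 walks past slots 5, 6, and 7, and key 73 walks past slots 8 and 9.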
Linear Probing
h(K) = K % 13
Table slots: 0 to 12
Insert: 18, 41, 22, 59, 32, 31, 73
h(K) : 5, 2, 9, 7, 6, 5, 8

| Key | h(K) | Slots probed | Stored in slot |
|---|---|---|---|
| 18 | 5 | 5 | 5 |
| 41 | 2 | 2 | 2 |
| 22 | 9 | 9 | 9 |
| 59 | 7 | 7 | 7 |
| 32 | 6 | 6 | 6 |
| 31 | 5 | 5, 6, 7, 8 | 8 |
| 73 | 8 | 8, 9, 10 | 10 |

Final table (slot: key): 2: 41, 5: 18, 6: 32, 7: 59, 8: 31, 9: 22, 10: 73
Double Hashing
If the current location is occupied, try another table location
Use two hash functions
If M is prime, eventually will examine every location
    DoubleHashInsert(K) {
        if (table is full) error;
        probe = h1(K);
        offset = h2(K);
        while (table[probe] is occupied)
            probe = (probe + offset) % M;
        table[probe] = K;
    }
Many of the same (dis)advantages as linear probing
Distributes keys more evenly than linear probing
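A runnable sketch of the double-hashing insert, assuming integer keys, h1(K) = K mod M, and the second hash 8 - K % 8 that this deck uses in its example. The table size is assumed to be a prime larger than 8, so the offset (1 to 8) is never a multiple of M and the probe sequence reaches every slot. The class name is hypothetical.

```java
// Sketch: open addressing with double hashing.
final class DoubleHashingTable {
    private final Integer[] table; // null marks an empty slot
    private int count = 0;

    // Assumption: primeSize is prime and greater than 8.
    DoubleHashingTable(int primeSize) {
        table = new Integer[primeSize];
    }

    private int h1(int key) {
        return Math.floorMod(key, table.length);
    }

    // Step size in 1..8; it must never be 0 or the probe would not move.
    private static int h2(int key) {
        return 8 - Math.floorMod(key, 8);
    }

    // Inserts the key and returns the slot where it was stored.
    int insert(int key) {
        if (count == table.length) throw new IllegalStateException("table is full");
        int M = table.length;
        int probe = h1(key);
        int offset = h2(key);
        while (table[probe] != null) {
            probe = (probe + offset) % M; // jump by the key-specific offset
        }
        table[probe] = key;
        count++;
        return probe;
    }
}
```

With M = 13 and the keys 18, 41, 22, 59, 32, 31, 73, the slots used are 5, 2, 9, 7, 6, 8, and 3. Key 73 starts at slot 8 and steps by 7 through slots 2 and 9 before landing in slot 3, instead of piling up next to its neighbors.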
Quadratic Probing
Don't step by 1 each time. Add i^2 to the h(x) hashed location (mod B of course) for i = 1,2,...
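The probe sequence h(x), h(x) + 1, h(x) + 4, h(x) + 9, ... can be sketched as follows, assuming integer keys, h(x) = x mod B, and `null` as the empty-slot marker. The names are hypothetical. Unlike linear probing, this sequence does not necessarily visit every slot, so the sketch gives up after B probes; the usual guarantee is that a free slot is found when B is prime and the table is less than half full.

```java
// Sketch: open addressing with quadratic probing.
final class QuadraticProbe {
    // Slot examined on the i-th probe (i = 0 is the home slot).
    static int probe(int home, int i, int B) {
        return (int) ((home + (long) i * i) % B);
    }

    // Stores the key and returns its slot, or -1 if no free slot was found.
    static int insert(Integer[] table, int key) {
        int B = table.length;
        int home = Math.floorMod(key, B);
        for (int i = 0; i < B; i++) {
            int slot = probe(home, i, B);
            if (table[slot] == null) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }
}
```

For B = 13 and the keys 18, 41, 22, 59, 32, 31, 73, key 31 finds slots 5, 6, and 9 taken and settles in slot 1 (5 + 9 = 14, and 14 mod 13 = 1), which leaves slot 8 free for key 73.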
Double Hashing
h1(K) = K % 13
h2(K) = 8 - K % 8
Table slots: 0 to 12
Insert: 18, 41, 22, 59, 32, 31, 73
h1(K) : 5, 2, 9, 7, 6, 5, 8
h2(K) : 6, 7, 2, 5, 8, 1, 7

| Key | h1(K) | h2(K) | Slots probed | Stored in slot |
|---|---|---|---|---|
| 18 | 5 | 6 | 5 | 5 |
| 41 | 2 | 7 | 2 | 2 |
| 22 | 9 | 2 | 9 | 9 |
| 59 | 7 | 5 | 7 | 7 |
| 32 | 6 | 8 | 6 | 6 |
| 31 | 5 | 1 | 5, 6, 7, 8 | 8 |
| 73 | 8 | 7 | 8, 2, 9, 3 | 3 |

Final table (slot: key): 2: 41, 3: 73, 5: 18, 6: 32, 7: 59, 8: 31, 9: 22
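Theoretical Results
Expected number of probes, where α is the load factor (elements stored divided by table size)

| | Found | Not Found |
|---|---|---|
| Chaining | $1 + \frac{\alpha}{2}$ | $1 + \alpha$ |
| Linear Probing | $\frac{1}{2} + \frac{1}{2(1-\alpha)}$ | $\frac{1}{2} + \frac{1}{2(1-\alpha)^2}$ |
| Double Hashing | $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$ | $\frac{1}{1-\alpha}$ |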
Expected Probes
[Figure: expected number of probes for Linear Probing, Double Hashing, and Chaining; axis ticks at 0.5 and 1.0]
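To get a feel for the numbers behind this comparison, the standard expected-probe formulas for the three schemes can be evaluated directly. In this sketch `a` stands for the load factor α (elements stored divided by table size); the class and method names are hypothetical.

```java
// Sketch: expected probe counts as functions of the load factor a.
final class ExpectedProbes {
    static double chainingFound(double a)    { return 1 + a / 2; }
    static double chainingNotFound(double a) { return 1 + a; }

    static double linearFound(double a)      { return 0.5 + 1 / (2 * (1 - a)); }
    static double linearNotFound(double a)   { return 0.5 + 1 / (2 * (1 - a) * (1 - a)); }

    static double doubleFound(double a)      { return Math.log(1 / (1 - a)) / a; }
    static double doubleNotFound(double a)   { return 1 / (1 - a); }

    public static void main(String[] args) {
        for (double a : new double[]{0.5, 0.9}) {
            System.out.printf("a=%.1f  linear %.2f/%.2f  double %.2f/%.2f  chaining %.2f/%.2f%n",
                a, linearFound(a), linearNotFound(a),
                doubleFound(a), doubleNotFound(a),
                chainingFound(a), chainingNotFound(a));
        }
    }
}
```

At α = 0.5 an unsuccessful search costs about 2.5 probes with linear probing, 2 with double hashing, and 1.5 with chaining. The open-addressing formulas grow without bound as α approaches 1, while the chaining formulas stay linear in α.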