Page 1: Hashing Rehashed Paul M. Dorfman, Independent Consultant

Hashing Rehashed

Paul M. Dorfman, Independent Consultant

Gregg P. Snell, Data Savant Consulting

SUGI 27, Paper 12

Orlando, FL

Page 2: Hashing Rehashed Paul M. Dorfman, Independent Consultant

KEY-INDEXING

MYLIB.SUVBYZIP

ZIP    CITY             SUV
66216  Shawnee           67
66216  Shawnee-Mission   67
32258  Jacksonville      88
27513  Cary             214

ARRAY SUV(0:99999)

SUV(00000)=.
SUV(00001)=.
SUV(27513)=.
SUV(32258)=.
SUV(66216)=.
SUV(99999)=.

data TESTLIB.CUSTOMER;
  ** load suv counts into the array;
  array suv(0:99999) _temporary_;
  do until(eof1); /* source not sorted */
    set MYLIB.SUVBYZIP(keep=zipcode suv_count) end=eof1;
    if suv(zipcode) = . then
      suv(zipcode) = suv_count;
  end;

Hashing Rehashed SUGI 27, Paper 12 Dorfman and Snell

Page 3: Hashing Rehashed Paul M. Dorfman, Independent Consultant

KEY-INDEXING

ARRAY SUV(0:99999)

SUV(00000)=.
SUV(00001)=.
SUV(27513)=.
SUV(32258)=.
SUV(66216)=67
SUV(99999)=.

Page 6: Hashing Rehashed Paul M. Dorfman, Independent Consultant

KEY-INDEXING

ARRAY SUV(0:99999)

SUV(00000)=.
SUV(00001)=.
SUV(27513)=.
SUV(32258)=88
SUV(66216)=67
SUV(99999)=.

Page 8: Hashing Rehashed Paul M. Dorfman, Independent Consultant

KEY-INDEXING

ARRAY SUV(0:99999)

SUV(00000)=.
SUV(00001)=.
SUV(27513)=214
SUV(32258)=88
SUV(66216)=67
SUV(99999)=.

Page 10: Hashing Rehashed Paul M. Dorfman, Independent Consultant

KEY-INDEXING

data TESTLIB.CUSTOMER;
  ** load suv counts into the array;
  array suv(0:99999) _temporary_;
  do until(eof1); /* source not sorted */
    set MYLIB.SUVBYZIP(keep=zipcode suv_count) end=eof1;
    if suv(zipcode) = . then
      suv(zipcode) = suv_count;
  end;
  ** add suv_count to master data set;
  do until(eof2);
    set PRODLIB.CUSTOMER end=eof2;
    /* assign count by directly addressing the array */
    suv_count=suv(zip5);
    /* be sure to drop unwanted fields introduced during table load */
    drop zipcode;
    output;
  end;
run;
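
The key-indexing step can be tried end to end on miniature data. The sketch below is only an illustration: the WORK data sets suvbyzip, customer and customer_suv, the variable cust_id, and the three customer rows are hypothetical stand-ins for MYLIB.SUVBYZIP and PRODLIB.CUSTOMER, and the explicit stop statement is an addition that ends the step after one pass.

```sas
/* Hypothetical miniature copy of the lookup table from the slide */
data suvbyzip;
  input zipcode suv_count;
  datalines;
66216 67
66216 67
32258 88
27513 214
;
run;

/* Hypothetical master data; cust_id is an invented variable */
data customer;
  input cust_id zip5;
  datalines;
1 27513
2 66216
3 10001
;
run;

data customer_suv;
  array suv(0:99999) _temporary_;
  /* load: the ZIP code itself is the array index */
  do until(eof1);
    set suvbyzip(keep=zipcode suv_count) end=eof1;
    if suv(zipcode) = . then suv(zipcode) = suv_count;
  end;
  /* lookup: one direct array reference per master record */
  do until(eof2);
    set customer end=eof2;
    suv_count = suv(zip5); /* stays missing when the ZIP was never loaded */
    drop zipcode;
    output;
  end;
  stop;
run;

proc print data=customer_suv noobs;
run;
```

With these rows, customer 1 should receive 214, customer 2 should receive 67, and customer 3 (a ZIP code absent from the lookup table) should keep a missing suv_count.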

Page 11: Hashing Rehashed Paul M. Dorfman, Independent Consultant
Page 12: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: LINEAR PROBING

KEY  HASH_ADDR  HASH_TABLE
185  04         (00)=.
971  10         (01)=260
400  11         (02)=.
260  01         (03)=.
922  13         (04)=185
970  09         (05)=.
543  11         (06)=.
                (07)=.
                (08)=.
                (09)=970
                (10)=971
                (11)=400
                (12)=.
                (13)=922
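
The HASH_ADDR column is consistent with the address formula mod(key,&hash_size)+1 that appears in the code on the other slides, taken with a table size of 13. The slides never state the value of hash_size, so 13 is an assumption inferred from the column; a minimal sketch that recomputes the addresses under that assumption:

```sas
%let hash_size = 13; /* assumption: inferred from the HASH_ADDR column */

data _null_;
  input key;
  /* division-remainder hashing, shifted into the range 1..&hash_size */
  hash_addr = mod(key, &hash_size) + 1;
  put key= z3. hash_addr= z2.;
  datalines;
185
971
400
260
922
970
543
;
run;
```

Under that assumption the log should show addresses 04, 10, 11, 01, 13, 09 and 11, so keys 400 and 543 compete for slot 11.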

 

 

Page 13: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: LINEAR PROBING

COLLISION

data _null_;
  array hash_table(0:&hash_size) _temporary_;
  do until(eof1);
    set small end=eof1;
    /* probe downward from the hash address until an empty
       slot or the key itself is found */
    do hash_addr=mod(key,&hash_size)+1 by -1
       until(hash_table(hash_addr)=. or hash_table(hash_addr)=key);
      /* past slot 0: wrap around to the top of the table */
      if hash_addr < 0 then hash_addr=&hash_size;
    end;
    hash_table(hash_addr) = key;
  end;
  /* all done, write results to the log */
  put 'hash_table';
  do i=0 to &hash_size;
    put '(' i z2. ')=' hash_table(i) z3.;
  end;
run;
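
Searching a linear-probing table retraces the probe sequence used for loading: an empty slot means the key is absent. The sketch below is an illustration, not the authors' code. It assumes hash_size is 13, builds a data set small from the seven keys on the slide, wraps the probe around to slot &hash_size (this sketch's choice, so that every slot stays reachable), and uses 111 as a hypothetical key that was never loaded.

```sas
%let hash_size = 13; /* assumption: inferred from the HASH_ADDR column */

data small; /* the seven keys from the slide */
  input key @@;
  datalines;
185 971 400 260 922 970 543
;
run;

data _null_;
  array hash_table(0:&hash_size) _temporary_;

  /* load the table with downward linear probing */
  do until(eof1);
    set small end=eof1;
    do hash_addr = mod(key, &hash_size) + 1 by -1
       until(hash_table(hash_addr) = . or hash_table(hash_addr) = key);
      if hash_addr < 0 then hash_addr = &hash_size;
    end;
    hash_table(hash_addr) = key;
  end;

  /* search: probe the same way; stopping on an empty slot means not found */
  do search_key = 543, 400, 111; /* 111 is a hypothetical absent key */
    do hash_addr = mod(search_key, &hash_size) + 1 by -1
       until(hash_table(hash_addr) = . or hash_table(hash_addr) = search_key);
      if hash_addr < 0 then hash_addr = &hash_size;
    end;
    found = (hash_table(hash_addr) = search_key);
    put search_key= hash_addr= found=;
  end;
  stop;
run;
```

For these keys, 543 should be found in slot 8 after probing slots 11, 10 and 9; 400 should be found at once in slot 11; and the search for 111 should stop on the empty slot 7 with found=0.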

Page 14: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: LINEAR PROBING

COLLISION

KEY  HASH_ADDR  HASH_TABLE
185  04         (00)=.
971  10         (01)=260
400  11         (02)=.
260  01         (03)=.
922  13         (04)=185
970  09         (05)=.
543  11         (06)=.
                (07)=.
                (08)=543
                (09)=970
                (10)=971
                (11)=400
                (12)=.
                (13)=922

Page 15: Hashing Rehashed Paul M. Dorfman, Independent Consultant
Page 16: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

KEY  HASH_ADDR  LINK_TO_NEXT  HASH_TABLE
185  04         (00)=.        (00)=.
971  10         (01)=0        (01)=260
400  11         (02)=.        (02)=.
260  01         (03)=.        (03)=.
922  13         (04)=0        (04)=185
970  09         (05)=.        (05)=.
543  11         (06)=.        (06)=.
                (07)=.        (07)=.
                (08)=.        (08)=.
                (09)=.        (09)=.
                (10)=0        (10)=971
                (11)=0        (11)=400
                (12)=.        (12)=.
                (13)=0        (13)=922

Page 17: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 1, Step 2

/* STEP 1 (hash the key) */
hash_addr = mod(key,&hash_size)+1;

/* STEP 2 (collision?) */
if link_to_next(hash_addr) > . then do;
  /* STEP 2.a, 2.b, 2.c */
end;

/* STEP 3 (mark link as occupied) */
link_to_next(hash_addr) = 0;

/* STEP 4 (store the key) */
hash_table(hash_addr) = key;

Page 18: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 1, Step 2, Step 3

KEY  HASH_ADDR  LINK_TO_NEXT  HASH_TABLE
185  04         (00)=.        (00)=.
971  10         (01)=0        (01)=260
400  11         (02)=.        (02)=.
260  01         (03)=.        (03)=.
922  13         (04)=0        (04)=185
970  09         (05)=.        (05)=.
543  11         (06)=.        (06)=.
                (07)=.        (07)=.
                (08)=.        (08)=.
                (09)=0        (09)=.
                (10)=0        (10)=971
                (11)=0        (11)=400
                (12)=.        (12)=.
                (13)=0        (13)=922

Page 19: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 1, Step 2, Step 3, Step 4

KEY  HASH_ADDR  LINK_TO_NEXT  HASH_TABLE
185  04         (00)=.        (00)=.
971  10         (01)=0        (01)=260
400  11         (02)=.        (02)=.
260  01         (03)=.        (03)=.
922  13         (04)=0        (04)=185
970  09         (05)=.        (05)=.
543  11         (06)=.        (06)=.
                (07)=.        (07)=.
                (08)=.        (08)=.
                (09)=0        (09)=970
                (10)=0        (10)=971
                (11)=0        (11)=400
                (12)=.        (12)=.
                (13)=0        (13)=922

Page 20: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

COLLISION

Step 1, Step 2

Page 21: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

COLLISION

Step 1, Step 2, Step 2a

0 ?

/* STEP 2.a (check for duplicates) */
traverse:
if key = hash_table(hash_addr)
  then found=1;
else if link_to_next(hash_addr) ne 0
  then do;
    hash_addr=link_to_next(hash_addr);
    goto traverse;
  end;

Page 22: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

COLLISION

Step 1, Step 2, Step 2a, Step 2b, Step 2c

/* STEP 2.b (follow next_key to empty node) */
do next_key = &hash_size by -1
   until(link_to_next(next_key)=.);
end;

/* STEP 2.c (change original link from 0 to next) */
link_to_next(hash_addr)=next_key;
hash_addr = next_key;

Page 23: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 1, Step 2, Step 2a, Step 2b, Step 2c

KEY  HASH_ADDR  LINK_TO_NEXT  HASH_TABLE
185  04         (00)=.        (00)=.
971  10         (01)=0        (01)=260
400  11         (02)=.        (02)=.
260  01         (03)=.        (03)=.
922  13         (04)=0        (04)=185
970  09         (05)=.        (05)=.
543  11         (06)=.        (06)=.
                (07)=.        (07)=.
                (08)=.        (08)=.
                (09)=0        (09)=970
                (10)=0        (10)=971
                (11)=12       (11)=400
                (12)=.        (12)=.
                (13)=0        (13)=922

Page 24: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 1, Step 2, Step 2a, Step 2b, Step 2c, Step 3

Page 25: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 1, Step 2, Step 2a, Step 2b, Step 2c, Step 3

KEY  HASH_ADDR  LINK_TO_NEXT  HASH_TABLE
185  04         (00)=.        (00)=.
971  10         (01)=0        (01)=260
400  11         (02)=.        (02)=.
260  01         (03)=.        (03)=.
922  13         (04)=0        (04)=185
970  09         (05)=.        (05)=.
543  11         (06)=.        (06)=.
                (07)=.        (07)=.
                (08)=.        (08)=.
                (09)=0        (09)=970
                (10)=0        (10)=971
                (11)=12       (11)=400
                (12)=0        (12)=.
                (13)=0        (13)=922

Page 26: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 1, Step 2, Step 2a, Step 2b, Step 2c, Step 3, Step 4

KEY  HASH_ADDR  LINK_TO_NEXT  HASH_TABLE
185  04         (00)=.        (00)=.
971  10         (01)=0        (01)=260
400  11         (02)=.        (02)=.
260  01         (03)=.        (03)=.
922  13         (04)=0        (04)=185
970  09         (05)=.        (05)=.
543  11         (06)=.        (06)=.
                (07)=.        (07)=.
                (08)=.        (08)=.
                (09)=0        (09)=970
                (10)=0        (10)=971
                (11)=12       (11)=400
                (12)=0        (12)=543
                (13)=0        (13)=922

Page 27: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

COLLISION

Step 1, Step 2

KEY  HASH_ADDR  LINK_TO_NEXT  HASH_TABLE
185  04         (00)=.        (00)=.
971  10         (01)=0        (01)=260
400  11         (02)=.        (02)=.
260  01         (03)=.        (03)=.
922  13         (04)=0        (04)=185
970  09         (05)=.        (05)=.
543  11         (06)=.        (06)=.
532  13         (07)=.        (07)=.
                (08)=.        (08)=.
                (09)=0        (09)=970
                (10)=0        (10)=971
                (11)=12       (11)=400
                (12)=0        (12)=543
                (13)=0        (13)=922

Page 28: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

COLLISION

Step 1, Step 2, Step 2a

0 ?

Page 29: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

COLLISION

Step 1, Step 2, Step 2a, Step 2b, Step 2c

Page 30: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 1, Step 2, Step 2a, Step 2b, Step 2c

KEY  HASH_ADDR  LINK_TO_NEXT  HASH_TABLE
185  04         (00)=.        (00)=.
971  10         (01)=0        (01)=260
400  11         (02)=.        (02)=.
260  01         (03)=.        (03)=.
922  13         (04)=0        (04)=185
970  09         (05)=.        (05)=.
543  11         (06)=.        (06)=.
532  13         (07)=.        (07)=.
                (08)=.        (08)=.
                (09)=0        (09)=970
                (10)=0        (10)=971
                (11)=12       (11)=400
                (12)=0        (12)=543
                (13)=8        (13)=922

Page 31: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 1, Step 2, Step 2a, Step 2b, Step 2c, Step 3

Page 32: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 1, Step 2, Step 2a, Step 2b, Step 2c, Step 3

KEY  HASH_ADDR  LINK_TO_NEXT  HASH_TABLE
185  04         (00)=.        (00)=.
971  10         (01)=0        (01)=260
400  11         (02)=.        (02)=.
260  01         (03)=.        (03)=.
922  13         (04)=0        (04)=185
970  09         (05)=.        (05)=.
543  11         (06)=.        (06)=.
532  13         (07)=.        (07)=.
                (08)=0        (08)=.
                (09)=0        (09)=970
                (10)=0        (10)=971
                (11)=12       (11)=400
                (12)=0        (12)=543
                (13)=8        (13)=922

Page 33: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 1, Step 2, Step 2a, Step 2b, Step 2c, Step 3

KEY  HASH_ADDR  LINK_TO_NEXT  HASH_TABLE
185  04         (00)=.        (00)=.
971  10         (01)=0        (01)=260
400  11         (02)=.        (02)=.
260  01         (03)=.        (03)=.
922  13         (04)=0        (04)=185
970  09         (05)=.        (05)=.
543  11         (06)=.        (06)=.
532  13         (07)=.        (07)=.
                (08)=0        (08)=532
                (09)=0        (09)=970
                (10)=0        (10)=971
                (11)=12       (11)=400
                (12)=0        (12)=543
                (13)=8        (13)=922
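
The step fragments from the walkthrough can be assembled into a single load step. The sketch below is one possible reading of the slides rather than the authors' program. It assumes hash_size is 13 (inferred from the HASH_ADDR column), reads a hypothetical data set small holding the eight keys, nests steps 2.a to 2.c inside the STEP 2 block, replaces the goto traversal with a do while loop, and skips steps 3 and 4 when the key is already stored. It does not guard against a full table.

```sas
%let hash_size = 13; /* assumption: inferred from the HASH_ADDR column */

data small; /* the eight keys used in the walkthrough */
  input key @@;
  datalines;
185 971 400 260 922 970 543 532
;
run;

data _null_;
  array hash_table  (0:&hash_size) _temporary_;
  array link_to_next(0:&hash_size) _temporary_;

  do until(eof1);
    set small end=eof1;
    found = 0;

    /* STEP 1 (hash the key) */
    hash_addr = mod(key, &hash_size) + 1;

    /* STEP 2 (collision?) */
    if link_to_next(hash_addr) > . then do;
      /* STEP 2.a: walk the chain until the key or the chain end (link 0) */
      do while(hash_table(hash_addr) ne key and link_to_next(hash_addr) ne 0);
        hash_addr = link_to_next(hash_addr);
      end;
      if hash_table(hash_addr) = key then found = 1;
      else do;
        /* STEP 2.b: scan down from the top for a free node */
        do next_key = &hash_size by -1 until(link_to_next(next_key) = .);
        end;
        /* STEP 2.c: point the chain end at the free node and move there */
        link_to_next(hash_addr) = next_key;
        hash_addr = next_key;
      end;
    end;

    if not found then do;
      /* STEP 3 (mark link as occupied) */
      link_to_next(hash_addr) = 0;
      /* STEP 4 (store the key) */
      hash_table(hash_addr) = key;
    end;
  end;

  /* write both arrays to the log: address, link, key */
  do i = 0 to &hash_size;
    put i z2. link_to_next(i) 6. hash_table(i) 6.;
  end;
  stop;
run;
```

For the eight keys, the logged arrays should agree with the final state of the walkthrough: link_to_next(11)=12 and hash_table(12)=543 for key 543, and link_to_next(13)=8 and hash_table(08)=532 for key 532.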

Page 34: Hashing Rehashed Paul M. Dorfman, Independent Consultant
Page 35: Hashing Rehashed Paul M. Dorfman, Independent Consultant
Page 36: Hashing Rehashed Paul M. Dorfman, Independent Consultant

BENCHMARKING

[Chart: Run Time (Seconds), axis 0 to 100, for 100,000, 300,000 and 500,000 observations. Methods compared: Key-Indexing, Bitmapping, Coalesced Chaining-05, Coalesced Chaining-08, Double Hashing-05, Double Hashing-08, Sqxjhsh, Format, Merge]

[Chart: Memory (MB), axis 0 to 70, for 100,000, 300,000 and 500,000 observations. Methods compared: Key-Indexing, Bitmapping, Coalesced Chaining-05, Coalesced Chaining-08, Double Hashing-05, Double Hashing-08, Sqxjhsh, Format, Merge]