Symmetric and Asymmetric Algorithms
TRANSCRIPT
-
8/7/2019 Symmetric and Asymmetric Algorithms
1/30
Symmetric algorithms:
IDEA encryption algorithm
IDEA stands for International Data Encryption Algorithm. IDEA is a symmetric encryption
algorithm that was developed by Dr. X. Lai and Prof. J. Massey to replace the DES standard. Unlike DES, it uses a 128-bit key. This key length makes it computationally infeasible to break by simply
trying every key. It has been one of the best publicly known algorithms for some time. It has
been around now for several years, and no practical attacks on it have been published despite numerous attempts to analyze it.
IDEA is resistant to both linear and differential cryptanalysis.
SEED
SEED is a block cipher developed by the Korea Information Security Agency in 1998. Both
the block and key size of SEED are 128 bits, and it has a Feistel network structure which is iterated 16 times. It has been designed to resist differential and linear cryptanalysis as well as
related-key attacks. SEED uses two 8x8 S-boxes and mixes the XOR operation with modular
addition. SEED has been adopted as an ISO/IEC standard (ISO/IEC 18033-3), an IETF RFC (RFC 4269), as well as an industrial association standard of Korea (TTAS.KO-12.0004/0025).
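The 16-round Feistel structure mentioned above can be illustrated with a short sketch. The round function f below is a placeholder (a simple add/XOR/rotate mix), not SEED's actual G-function and S-boxes; the point is only how a Feistel network encrypts, and then decrypts by running the same rounds with the subkey order reversed:

```java
public class FeistelSketch {
    // Placeholder round function: NOT SEED's real round function. SEED
    // actually mixes two 8x8 S-boxes with XOR and modular addition.
    static long f(long half, long subkey) {
        return (half + subkey) ^ Long.rotateLeft(half, 13);
    }

    // Encryption: 16 rounds of (L, R) -> (R, L ^ f(R, k_i)).
    static long[] encrypt(long left, long right, long[] subkeys) {
        for (long k : subkeys) {
            long next = left ^ f(right, k);
            left = right;
            right = next;
        }
        return new long[]{left, right};
    }

    // Decryption: the same rounds applied with the subkeys in reverse order.
    static long[] decrypt(long left, long right, long[] subkeys) {
        for (int i = subkeys.length - 1; i >= 0; i--) {
            long prev = right ^ f(left, subkeys[i]);
            right = left;
            left = prev;
        }
        return new long[]{left, right};
    }

    public static void main(String[] args) {
        long[] subkeys = new long[16];                       // 16 rounds, as in SEED
        for (int i = 0; i < 16; i++) subkeys[i] = 0x9E3779B97F4A7C15L * (i + 1);

        long[] ct = encrypt(0x0123456789ABCDEFL, 0xFEDCBA9876543210L, subkeys);
        long[] pt = decrypt(ct[0], ct[1], subkeys);
        // Decryption recovers the original two halves.
        System.out.println(pt[0] == 0x0123456789ABCDEFL && pt[1] == 0xFEDCBA9876543210L);
    }
}
```

Note how the Feistel construction makes the cipher invertible even though f itself need not be invertible; this is the design property that lets SEED use the same structure in both directions.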
Asymmetric algorithms:
ElGamal
ElGamal is a public-key cipher: an asymmetric encryption algorithm for public-key
cryptography which is based on the Diffie-Hellman key agreement. ElGamal is the predecessor of DSA.
ECDSA
Elliptic Curve DSA (ECDSA) is a variant of the Digital Signature Algorithm (DSA) which operates on elliptic curve groups. As with elliptic curve cryptography in general, the bit size of
the public key believed to be needed for ECDSA is about twice the size of the security level, in bits.
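This roughly two-to-one relationship can be observed directly on a standard JDK, which ships an EC key pair generator; the curve name secp256r1 (NIST P-256, commonly taken to give about a 128-bit security level) is the only assumption here:

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.interfaces.ECPublicKey;
import java.security.spec.ECGenParameterSpec;

public class EcdsaKeySize {
    public static void main(String[] args) throws Exception {
        // Generate an EC key pair on the NIST P-256 curve (secp256r1).
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("EC");
        kpg.initialize(new ECGenParameterSpec("secp256r1"));
        KeyPair kp = kpg.generateKeyPair();

        // The field size of the curve is the bit size of the public key coordinates.
        ECPublicKey pub = (ECPublicKey) kp.getPublic();
        int fieldBits = pub.getParams().getCurve().getField().getFieldSize();

        System.out.println("Public key field size: " + fieldBits + " bits");   // 256
        System.out.println("Approximate security level: " + fieldBits / 2 + " bits"); // 128
    }
}
```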
XTR
XTR is an algorithm for public-key encryption. XTR is a novel method that makes
use of traces to represent and calculate powers of elements of a subgroup of a finite field. It is based on the primitive underlying the very first public-key cryptosystem, the Diffie-Hellman key
agreement protocol.
From a security point of view, XTR relies on the difficulty of solving discrete-logarithm-related
problems in the multiplicative group of a finite field. Some advantages of XTR are its fast
key generation (much faster than RSA), small key sizes (much smaller than RSA, comparable with ECC for current security settings), and speed (overall comparable with ECC for current
security settings).
Differences between symmetric and asymmetric encryption algorithms
Symmetric encryption algorithms encrypt and decrypt with the same key. The main advantages of
symmetric encryption algorithms are their security and high speed. Asymmetric encryption
algorithms encrypt and decrypt with different keys: data is encrypted with a public key and
decrypted with a private key. Asymmetric encryption algorithms (also known as public-key
algorithms) need at least a 3,000-bit key to achieve the same level of security as a 128-bit
symmetric algorithm. Asymmetric algorithms are much slower, and it is impractical to use
them to encrypt large amounts of data. Generally, symmetric encryption algorithms are much faster to execute on a computer than asymmetric ones. In practice they are often used together, so
that a public-key algorithm is used to encrypt a randomly generated encryption key, and the random key is used to encrypt the actual message using a symmetric algorithm. This is
sometimes called hybrid encryption.
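The hybrid scheme just described can be sketched with the standard JCE APIs: a random AES session key encrypts the message, and an RSA public key encrypts only that small session key. This is a minimal illustration (default cipher modes, no authentication), not a hardened design:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Arrays;

public class HybridEncryptionSketch {
    public static void main(String[] args) throws Exception {
        byte[] message = "Attack at dawn".getBytes("UTF-8");

        // Recipient's long-term asymmetric key pair.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair rsa = kpg.generateKeyPair();

        // 1. A random symmetric session key encrypts the bulk data (fast).
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey session = kg.generateKey();
        Cipher aes = Cipher.getInstance("AES");
        aes.init(Cipher.ENCRYPT_MODE, session);
        byte[] ciphertext = aes.doFinal(message);

        // 2. The public-key algorithm encrypts only the small session key.
        Cipher rsaCipher = Cipher.getInstance("RSA");
        rsaCipher.init(Cipher.ENCRYPT_MODE, rsa.getPublic());
        byte[] wrappedKey = rsaCipher.doFinal(session.getEncoded());

        // Recipient: unwrap the session key with the private key, then decrypt.
        rsaCipher.init(Cipher.DECRYPT_MODE, rsa.getPrivate());
        SecretKey recovered = new SecretKeySpec(rsaCipher.doFinal(wrappedKey), "AES");
        aes.init(Cipher.DECRYPT_MODE, recovered);
        byte[] plaintext = aes.doFinal(ciphertext);

        System.out.println(Arrays.equals(message, plaintext)); // true
    }
}
```

The expensive RSA operation runs once over 16 bytes regardless of the message size, which is exactly why the hybrid arrangement is practical for large data.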
Strength of Encryption Algorithms
Strong encryption algorithms should always be designed so that they are as difficult to break as
possible. In theory, any encryption algorithm with a key can be broken by trying all possible keys in sequence. If using brute force to try all keys is the only option, the required computing
power increases exponentially with the length of the key. A 32-bit key takes 2^32 (about 10^9)
steps. This is something anyone can do on a home computer. An encryption algorithm with
56-bit keys, such as DES, requires a substantial effort, but using massive distributed systems
requires only hours of computing. In 1999, a brute-force search using a specially designed
supercomputer and a worldwide network of nearly 100,000 PCs on the Internet found a DES
key in 22 hours and 15 minutes. It is currently believed that keys with at least 128 bits (as in
AES, for example) will be sufficient against brute-force attacks into the foreseeable future.
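The exponential growth of the brute-force workload is easy to make concrete with a few lines of Java:

```java
import java.math.BigInteger;

public class KeySpace {
    public static void main(String[] args) {
        // Brute-force work grows exponentially with key length: 2^n keys.
        int[] keyBits = {32, 56, 128};
        for (int n : keyBits) {
            BigInteger keys = BigInteger.valueOf(2).pow(n);
            // Decimal digit count - 1 gives the order of magnitude.
            System.out.println(n + "-bit key: " + keys + " possible keys (~10^"
                    + (keys.toString().length() - 1) + ")");
        }
    }
}
```

Running this shows 2^32 is about 10^9 (a home computer's job), 2^56 about 10^16 (the DES searches above), and 2^128 about 10^38, which is why 128-bit keys are considered out of reach for exhaustive search.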
However, key length is not the only relevant issue. Many encryption algorithms can be broken
without trying all possible keys. In general, it is very difficult to design ciphers that could not be
broken more effectively using some other method.
The keys used in public-key encryption algorithms are usually much longer than those used in
symmetric encryption algorithms. This is caused by the extra structure that is available to the
cryptanalyst. There the problem is not that of guessing the right key, but of deriving the matching
private key from the public key. In the case of the RSA encryption algorithm, this could be done by
factoring a large integer that has two large prime factors. In the case of some other
cryptosystems, it is equivalent to computing the discrete logarithm modulo a large integer (which
is believed to be roughly comparable to factoring when the modulus is a large prime).
How to decrypt files
1. Open the Encryption and Decryption Pro main window.
2. Click the "Tools" button and select "Decryption".
3. Set the radio button next to "File".
4. Select the file you want to decrypt in the "File you want to decrypt" field.
5. In the "Save decrypted file to ..." field, select where you want to save the decrypted file. You can
save the decrypted file with the same name or you can change its name.
6. Click the "Decrypt" button.
7. Enter the password for decryption and confirm it. Note: it is the same password that was used for file encryption.
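The internal format used by Encryption and Decryption Pro is not documented here, so the following is only a generic sketch of the password-based pattern that step 7 implies: the same password, run through a key-derivation function, yields the same key for both encryption and decryption. The salt, iteration count, and AES/CBC choice are illustrative assumptions:

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;
import java.util.Arrays;

public class PasswordCryptoSketch {
    // Derive an AES key from the password; the same password (plus the stored
    // salt) must be supplied for decryption, as the tool's step 7 notes.
    static SecretKeySpec deriveKey(char[] password, byte[] salt) throws Exception {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 65536, 128);
        byte[] keyBytes = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        return new SecretKeySpec(keyBytes, "AES");
    }

    static byte[] crypt(int mode, char[] password, byte[] salt, byte[] iv, byte[] data)
            throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(mode, deriveKey(password, salt), new IvParameterSpec(iv));
        return cipher.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        byte[] salt = new byte[16], iv = new byte[16];
        new SecureRandom().nextBytes(salt);   // salt and IV would be stored with the file
        new SecureRandom().nextBytes(iv);

        byte[] file = "secret report".getBytes("UTF-8");
        byte[] enc = crypt(Cipher.ENCRYPT_MODE, "p4ssw0rd".toCharArray(), salt, iv, file);
        byte[] dec = crypt(Cipher.DECRYPT_MODE, "p4ssw0rd".toCharArray(), salt, iv, enc);
        System.out.println(Arrays.equals(file, dec)); // true
    }
}
```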
/*
 Shuffle elements of Java ArrayList example.
 This Java example shows how to shuffle elements of a Java ArrayList
 object using the shuffle method of the Collections class.
*/

import java.util.ArrayList;
import java.util.Collections;

public class ShuffleElementsOfArrayListExample {

    public static void main(String[] args) {

        // create an ArrayList object
        ArrayList arrayList = new ArrayList();

        // add elements to the ArrayList
        arrayList.add("A");
        arrayList.add("B");
        arrayList.add("C");
        arrayList.add("D");
        arrayList.add("E");

        System.out.println("Before shuffling, ArrayList contains : " + arrayList);

        /*
         To shuffle elements of a Java ArrayList, use the
         static void shuffle(List list) method of the Collections class.
        */

        Collections.shuffle(arrayList);

        System.out.println("After shuffling, ArrayList contains : " + arrayList);
    }
}

/*
 Output would be:
 Before shuffling, ArrayList contains : [A, B, C, D, E]
 After shuffling, ArrayList contains : [C, A, D, E, B]
*/
Abstract: This article describes how to implement the IDEA
symmetric encryption algorithm in a Java environment. With the growing popularity of
e-commerce and e-government, encryption technology is widely used, and the
security requirements placed on it are very high. The Java environment
offers many advantages for implementing IDEA encryption, because Java is an
object-oriented programming language and, thanks to its platform independence,
has been widely used for Internet development.
Keywords: IDEA (International Data Encryption Algorithm), JCA, JCE, key reliability and
independence
With the rapid development of the Internet and the overwhelming wave of e-commerce,
daily work and data are transmitted online, which greatly improves efficiency,
reduces costs, and produces good results. However, the Internet's network
protocols themselves have important security problems: IP packets carry no inherent
security features, so it is easy to forge a packet's IP address, modify its
contents, replay previous packets, and intercept and view packet contents in
transit. Information transferred online therefore carries a huge security risk,
and the security of e-commerce has become increasingly prominent.
Encryption is the most important e-commerce security technology. The encryption
chosen directly affects the information security level of e-commerce activities,
and the main security problems of an e-commerce system can be solved by
encryption. Confidentiality of data is achieved by encrypting the data with
different encryption algorithms.
As far as China is concerned, although a great deal of foreign equipment can be
imported, encryption devices cannot simply be imported: network security involves
confidential information related to national security, so such devices must be
developed domestically.
There are many encryption algorithms in current international use. Among them, DES (Data
Encryption Standard) was the first and most widely used symmetric block
encryption algorithm. DES encrypts 64-bit plaintext with a 56-bit key and outputs
64-bit ciphertext; a 56-bit key gives 2^56 possible keys, but DES keys have
repeatedly been cracked by brute-force attacks. In 1998, the Electronic
Frontier Foundation (EFF) cracked a DES key in 56 hours with a dedicated computer
built for $250,000, and in 1999 the EFF completed the break in 22 hours. The DES
algorithm was thus badly hit, and its security came under serious
threat. Given the Java language's strengths in security and network processing, this paper
introduces the use of the IDEA (International Data Encryption Algorithm) encryption
algorithm in the Java environment to secure data transmission.
First, the IDEA data encryption algorithm
The IDEA data encryption algorithm was jointly proposed in 1990 by the Chinese
scholar Dr. Xuejia Lai and the famous cryptographer James L. Massey. Its plaintext
and ciphertext blocks are both 64 bits, but the key is 128 bits long. IDEA is
implemented as an iterated block cipher with a 128-bit key and 8 rounds. It provides more
security than DES, but when selecting a key for IDEA, those known
as "weak keys" should be excluded. DES has only four weak keys and 12 semi-weak keys, while IDEA has a
considerable number of weak keys: 2^51 of them. However, since the total number
of keys is very large (2^128), the chance of picking a weak key is only about 1 in
2^77. IDEA is considered to be extremely safe. With a 128-bit key, the number of
trials required for a brute-force attack is vastly greater than for
DES, even allowing for weak-key tests. Moreover, it has also shown resistance to
specialized forms of attack.
Second, the Java cryptography system and the Java Cryptography Extension
Java is an object-oriented programming language developed by Sun; because of its
platform independence it is widely used for Internet development. The Java Cryptography
Architecture (JCA) and the Java Cryptography Extension (JCE) are designed to provide
APIs for cryptographic functions in Java. They both use the
factory-method pattern to create classes, and then delegate the actual cryptographic work
to the underlying engine of a specified provider; the engines are exposed
through service provider interfaces. In Java, data encryption and decryption are
achieved using the built-in JCE (Java Cryptography Extension). JDK 1.1 introduced
a flexible, provider-based application programming interface for digital
signatures and message digests. The Java Cryptography Architecture supports
provider interoperability, with both hardware and software implementations.
Its design follows two principles: (1) algorithm independence and reliability, and (2)
implementation independence and interoperability. Algorithm independence is
achieved by defining engine (service) classes for cryptographic services: users only need to
understand the concepts of the cryptographic algorithms, without having to care about
how those concepts are implemented. Implementation independence and interoperability
are achieved through cryptographic service providers. A cryptographic service provider is
one or more packages implementing one or more cryptographic services. Software
developers package their implementations of the various algorithms, written to
certain interfaces, as a provider, and users can install different providers. To
install and configure a provider, place the ZIP or JAR files containing it on the
CLASSPATH, then edit the Java security properties file to register the
provider. Sun's version of the Java Runtime Environment supplies a default provider, SUN.
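The factory-method pattern described above is visible directly in the API: getInstance() searches the installed providers for the named algorithm and returns an engine object backed by that provider's implementation. A minimal sketch using the built-in AES engine (IDEA itself is not shipped with the default Sun providers, so a third-party provider would be required for it):

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class JcaFactorySketch {
    public static void main(String[] args) throws Exception {
        // Factory method: the JCA searches the installed providers for an
        // implementation of the named algorithm and returns its engine class.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        Cipher cipher = Cipher.getInstance("AES");
        // The actual work is delegated to whichever provider supplied the engine.
        System.out.println("Engine supplied by provider: "
                + cipher.getProvider().getName());

        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] ct = cipher.doFinal("hello".getBytes("UTF-8"));

        cipher.init(Cipher.DECRYPT_MODE, key);
        System.out.println(new String(cipher.doFinal(ct), "UTF-8")); // hello
    }
}
```

Swapping the provider (or the algorithm name) requires no change to this calling code, which is exactly the implementation independence the JCA design aims for.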
Third, implementation in the Java environment
1. Implementation of the encryption process
void idea_enc(int data11[], /* first address of the 64-bit data to be encrypted */
              int key1[]) {
    int i;
    int tmp, x;
    int zz[] = new int[6];
    for (i = 0; i
    for (int j = 0, box = i; j
Because of Java's platform independence and its very extensive application on the
Internet, IDEA-based encryption of transmitted data can be implemented across
different platforms, and it achieves simplicity, security, and other advantages.
/*
 Java Hashtable example.
 This Java Hashtable example describes the basic operations performed on the Hashtable.
*/

import java.util.Hashtable;
import java.util.Enumeration;

public class HashtableExample {

    public static void main(String args[]) {

        // constructs a new empty hashtable with default initial capacity
        Hashtable hashtable = new Hashtable();

        /*
         To specify an initial capacity, use the following constructor:
         Hashtable hashtable = new Hashtable(100);

         To create a hashtable from a map, use the following constructor:
         Hashtable hashtable = new Hashtable(Map myMap);
        */

        hashtable.put("One", new Integer(1)); // adding value to Hashtable
        hashtable.put("Two", new Integer(2));
        hashtable.put("Three", new Integer(3));

        /*
         IMPORTANT: We CAN NOT add primitives to the Hashtable. We have to wrap
         them into one of the wrapper classes before adding.
        */

        /*
         To copy all key-value pairs from a Map to the Hashtable, use the putAll method.

         Signature of the putAll method is:
         void putAll(Map m)
        */

        // get the number of keys present in the hashtable
        System.out.println("Hashtable contains " + hashtable.size() + " key value pairs.");

        /*
         To check whether the Hashtable is empty or not, use the isEmpty() method.
         isEmpty() returns true if the Hashtable is empty, otherwise false.
        */

        /*
         Finding a particular value in the Hashtable:

         Hashtable's contains method returns a boolean depending upon the presence
         of the value in the given hashtable.

         Signature of the contains method is:
         boolean contains(Object value)
        */

        if (hashtable.contains(new Integer(1))) {
            System.out.println("Hashtable contains 1 as value");
        } else {
            System.out.println("Hashtable does not contain 1 as value");
        }

        /*
         Finding a particular key in the Hashtable:

         Hashtable's containsKey method returns a boolean depending upon the presence
         of the key in the given hashtable.

         Signature of the method is:
         boolean containsKey(Object key)
        */

        if (hashtable.containsKey("One")) {
            System.out.println("Hashtable contains One as key");
        } else {
            System.out.println("Hashtable does not contain One as key");
        }

        /*
         Use the get method of Hashtable to get the value mapped to a particular key.

         Signature of the get method is:
         Object get(Object key)
        */

        Integer one = (Integer) hashtable.get("One");

        System.out.println("Value mapped with key \"One\" is " + one);

        /*
         IMPORTANT: the get method returns an Object, so we need to downcast it.
        */

        /*
         To get all keys stored in the Hashtable, use the keys method.

         Signature of the keys method is:
         Enumeration keys()

         To get a Set of the keys, use the keySet() method instead:
         Set keySet()
        */

        System.out.println("Retrieving all keys from the Hashtable");

        Enumeration e = hashtable.keys();

        while (e.hasMoreElements()) {
            System.out.println(e.nextElement());
        }

        /*
         To get all values stored in the Hashtable, use the elements method.

         Signature of the elements method is:
         Enumeration elements()

         To get a Set of the entries, use the entrySet() method instead:
         Set entrySet()
        */

        System.out.println("Retrieving all values from the Hashtable");

        e = hashtable.elements();

        while (e.hasMoreElements()) {
            System.out.println(e.nextElement());
        }

        /*
         To remove a particular key-value pair from the Hashtable, use the remove method.

         Signature of the remove method is:
         Object remove(Object key)

         This method returns the value that was mapped to the given key, or null if
         the mapping is not found.
        */

        System.out.println(hashtable.remove("One") + " is removed from the Hashtable.");
    }
}

/*
 OUTPUT of the above given Java Hashtable Example would be:

 Hashtable contains 3 key value pairs.
 Hashtable contains 1 as value
 Hashtable contains One as key
 Value mapped with key "One" is 1
 Retrieving all keys from the Hashtable
 One
 Three
 Two
 Retrieving all values from the Hashtable
 1
 3
 2
 1 is removed from the Hashtable.
*/
Hashtable was part of the original java.util and is a concrete implementation of a
Dictionary. However, Java 2 reengineered Hashtable so that it also implements the Map
interface. Thus, Hashtable is now integrated into the collections framework. It is similar
to HashMap, but is synchronized.

Like HashMap, Hashtable stores key/value pairs in a hash table. When using a
Hashtable, you specify an object that is used as a key, and the value that you want linked
to that key. The key is then hashed, and the resulting hash code is used as the index at
which the value is stored within the table.

A hash table can only store objects that override the hashCode() and equals()
methods that are defined by Object. The hashCode() method must compute and return
the hash code for the object. Of course, equals() compares two objects. Fortunately,
many of Java's built-in classes already implement the hashCode() method. For example,
the most common type of Hashtable uses a String object as the key. String implements
both hashCode() and equals().
The Hashtable constructors are shown here:

Hashtable()
Hashtable(int size)
Hashtable(int size, float fillRatio)
Hashtable(Map m)

The first version is the default constructor. The second version creates a hash
table that has an initial size specified by size. The third version creates a hash table that
has an initial size specified by size and a fill ratio specified by fillRatio. This ratio must
be between 0.0 and 1.0, and it determines how full the hash table can be before it is
resized upward. Specifically, when the number of elements is greater than the capacity of
the hash table multiplied by its fill ratio, the hash table is expanded. If you do not specify
a fill ratio, then 0.75 is used. Finally, the fourth version creates a hash table that is
initialized with the elements in m. The capacity of the hash table is set to twice the
number of elements in m. The default load factor of 0.75 is used. The fourth constructor
was added by Java 2.
The following example uses a Hashtable to store the names of bank depositors and their
current balances:

// Demonstrate a Hashtable
import java.util.*;

class HTDemo {
    public static void main(String args[]) {
        Hashtable balance = new Hashtable();
        Enumeration names;
        String str;
        double bal;

        balance.put("John Doe", new Double(3434.34));
        balance.put("Tom Smith", new Double(123.22));
        balance.put("Jane Baker", new Double(1378.00));
        balance.put("Todd Hall", new Double(99.22));
        balance.put("Ralph Smith", new Double(-19.08));

        // Show all balances in hash table.
        names = balance.keys();
        while (names.hasMoreElements()) {
            str = (String) names.nextElement();
            System.out.println(str + ": " + balance.get(str));
        }

        System.out.println();

        // Deposit 1,000 into John Doe's account
        bal = ((Double) balance.get("John Doe")).doubleValue();
        balance.put("John Doe", new Double(bal + 1000));
        System.out.println("John Doe's new balance: " + balance.get("John Doe"));
    }
}
The output from this program is shown here:

Ralph Smith: -19.08
Tom Smith: 123.22
John Doe: 3434.34
Todd Hall: 99.22
Jane Baker: 1378.0

John Doe's new balance: 4434.34
One important point: like the map classes, Hashtable does not directly support
iterators. Thus, the preceding program uses an enumeration to display the contents of
balance. However, you can obtain set-views of the hash table, which permits the use of
iterators. To do so, you simply use one of the collection-view methods defined by Map,
such as entrySet() or keySet(). For example, you can obtain a set-view of the keys and
iterate through them. Here is a reworked version of the program that shows this
technique:
// Use iterators with a Hashtable.
import java.util.*;

class HTDemo2 {
    public static void main(String args[]) {
        Hashtable balance = new Hashtable();
        String str;
        double bal;

        balance.put("John Doe", new Double(3434.34));
        balance.put("Tom Smith", new Double(123.22));
        balance.put("Jane Baker", new Double(1378.00));
        balance.put("Todd Hall", new Double(99.22));
        balance.put("Ralph Smith", new Double(-19.08));

        // show all balances in hashtable
        Set set = balance.keySet(); // get set-view of keys

        // get iterator
        Iterator itr = set.iterator();
        while (itr.hasNext()) {
            str = (String) itr.next();
            System.out.println(str + ": " + balance.get(str));
        }

        System.out.println();

        // Deposit 1,000 into John Doe's account
        bal = ((Double) balance.get("John Doe")).doubleValue();
        balance.put("John Doe", new Double(bal + 1000));
        System.out.println("John Doe's new balance: " + balance.get("John Doe"));
    }
}
Hash table
From Wikipedia, the free encyclopedia
A small phone book as a hash table.
In computer science, a hash table or hash map is a data structure that uses a hash function to
map identifying values, known as keys (e.g., a person's name), to their associated values (e.g.,
their telephone number). Thus, a hash table implements an associative array. The hash function
is used to transform the key into the index (the hash) of an array element (the slot or bucket)
where the corresponding value is to be sought.

Ideally, the hash function should map each possible key to a unique slot index, but this ideal is
rarely achievable in practice (unless the hash keys are fixed; i.e. new entries are never added to
the table after it is created). Instead, most hash table designs assume that hash collisions
(different keys that map to the same hash value) will occur and must be accommodated in some
way.

In a well-dimensioned hash table, the average cost (number of instructions) for each lookup is
independent of the number of elements stored in the table. Many hash table designs also allow
arbitrary insertions and deletions of key-value pairs, at constant average (indeed, amortized[1])
cost per operation.[2][3]

In many situations, hash tables turn out to be more efficient than search trees or any other table
lookup structure. For this reason, they are widely used in many kinds of computer software,
particularly for associative arrays, database indexing, caches, and sets.
Contents

1 Hash function
  1.1 Choosing a good hash function
  1.2 Perfect hash function
2 Collision resolution
  2.1 Load factor
  2.2 Separate chaining
    2.2.1 Separate chaining with list heads
    2.2.2 Separate chaining with other structures
  2.3 Open addressing
  2.4 Coalesced hashing
  2.5 Robin Hood hashing
  2.6 Cuckoo hashing
  2.7 Hopscotch hashing
3 Dynamic resizing
  3.1 Resizing by copying all entries
  3.2 Incremental resizing
  3.3 Monotonic keys
  3.4 Other solutions
4 Performance analysis
5 Features
  5.1 Advantages
  5.2 Drawbacks
6 Uses
  6.1 Associative arrays
  6.2 Database indexing
  6.3 Caches
  6.4 Sets
  6.5 Object representation
  6.6 Unique data representation
  6.7 String interning
7 Implementations
  7.1 In programming languages
  7.2 Independent packages
8 History
9 See also
  9.1 Related data structures
10 References
11 Further reading
12 External links
Hash function
Main article: Hash function
8/7/2019 Symmetric and Aysmmetric Algorithms
17/30
At the heart of the hash table algorithm is a simple array of items; this array is often simply called the hash table. Hash table algorithms calculate an index from the data item's key and use this index
to place the data into the array. The implementation of this calculation is the hash function, f:
index = f(key, arrayLength)
The hash function calculates an index within the array from the data key. arrayLength is the
size of the array. For assembly language or other low-level programs, a trivial hash function can
often create an index with just one or two inline machine instructions.
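As an illustration of the formula above, here is a minimal sketch of such a hash function in Python; the 31-based byte-mixing scheme is our own choice, not something the article prescribes:

```python
# A toy hash function in the shape index = f(key, arrayLength); the mixing
# scheme (a 31-based polynomial over the key's bytes) is illustrative only.

def bucket_index(key, array_length):
    h = 0
    for byte in key.encode("utf-8"):
        h = (h * 31 + byte) & 0xFFFFFFFF     # mix each byte into a 32-bit value
    return h % array_length                  # reduce to a valid array index

print(bucket_index("John Smith", 16))        # some fixed value in range(16)
```

The final modulo reduction is what ties the hash value to the current array length.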
Choosing a good hash function
A good hash function and implementation algorithm are essential for good hash table
performance, but may be difficult to achieve. Poor hashing usually degrades hash table
performance by a constant factor, but hashing is often only a small part of the overall computation.
A basic requirement is that the function should provide a uniform distribution of hash values. A
non-uniform distribution increases the number of collisions, and the cost of resolving them.
Uniformity is sometimes difficult to ensure by design, but may be evaluated empirically using statistical tests, e.g., a chi-squared test.[4]
The distribution needs to be uniform only for table sizes s that occur in the application. In
particular, if one uses dynamic resizing with exact doubling and halving of s, the hash function
needs to be uniform only when s is a power of two. On the other hand, some hashing algorithms provide uniform hashes only when s is a prime number.[5]
For open addressing schemes, the hash function should also avoid clustering, the mapping of two or more keys to consecutive slots. Such clustering may cause the lookup cost to skyrocket, even if the load factor is low and collisions are infrequent. The popular multiplicative hash[2] is
claimed to have particularly poor clustering behavior.[5]
Cryptographic hash functions are believed to provide good hash functions for any table size s,
either by modulo reduction or by bit masking. They may also be appropriate if there is a risk of malicious users trying to sabotage a network service by submitting requests designed to generate
a large number of collisions in the server's hash tables. However, these presumed
qualities are hardly worth their much larger computational cost and algorithmic complexity, and
the risk of sabotage can be avoided by cheaper methods (such as applying a secret salt to the
data, or using a universal hash function).
Some authors claim that good hash functions should have the avalanche effect; that is, a single-
bit change in the input key should affect, on average, half the bits in the output. Some popular hash functions do not have this property.[4]
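The avalanche property can be checked empirically. A sketch, using SHA-256 purely as an example of a hash with good mixing (the choice is ours, not the article's):

```python
import hashlib

# Estimate avalanche behavior: flip each input bit in turn and count the
# output bits that change; a well-mixed hash flips about half of them.

def avalanche_fraction(data):
    base = int.from_bytes(hashlib.sha256(data).digest(), "big")
    flipped_bits = 0
    input_bits = len(data) * 8
    for i in range(input_bits):
        mutated = bytearray(data)
        mutated[i // 8] ^= 1 << (i % 8)      # flip exactly one input bit
        h = int.from_bytes(hashlib.sha256(bytes(mutated)).digest(), "big")
        flipped_bits += bin(base ^ h).count("1")
    return flipped_bits / (input_bits * 256)  # SHA-256 outputs 256 bits

print(avalanche_fraction(b"key"))            # close to 0.5
```

Running the same measurement against a weak hash (e.g. the toy polynomial above) would show a fraction far from 0.5.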
Perfect hash function
If all keys are known ahead of time, a perfect hash function can be used to create a perfect hash table that has no collisions. If minimal perfect hashing is used, every location in the hash table
can be used as well.
Perfect hashing allows for constant time lookups in the worst case. This is in contrast to most
chaining and open addressing methods, where the time for lookup is low on average, but may be very large (proportional to the number of entries) for some sets of keys.
Collision resolution
Hash collisions are practically unavoidable when hashing a random subset of a large set of
possible keys. For example, if 2,500 keys are hashed into a million buckets, even with a perfectly
uniform random distribution, according to the birthday problem there is a 95% chance of at least two of the keys being hashed to the same slot.
Therefore, most hash table implementations have some collision resolution strategy to handle
such events. Some common strategies are described below. All these methods require that the keys (or pointers to them) be stored in the table, together with the associated values.
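The 95% figure above can be reproduced with the standard birthday-problem approximation; the exponential formula below is the usual approximation, not something stated in the article:

```python
import math

# Birthday-problem approximation: P(at least one collision) when k uniform
# keys fall into N buckets is about 1 - exp(-k*(k-1) / (2*N)).

def collision_probability(keys, buckets):
    return 1.0 - math.exp(-keys * (keys - 1) / (2.0 * buckets))

print(round(collision_probability(2500, 1_000_000), 2))  # 0.96
```

For 2,500 keys and a million buckets this gives about 0.956, i.e. roughly the 95% chance quoted above.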
Load factor
The performance of most collision resolution methods does not depend directly on the number n
of stored entries, but depends strongly on the table's load factor, the ratio n/s between n and the
size s of its bucket array. Sometimes this is referred to as the fill factor, as it represents the
portion of the s buckets in the structure that are filled with one of the n stored entries. With a
good hash function, the average lookup cost is nearly constant as the load factor increases from 0 up to 0.7 (about 2/3 full) or so. Beyond that point, the probability of collisions and the
cost of handling them increases.
On the other hand, as the load factor approaches zero, the size of the hash table increases with little improvement in the search cost, and memory is wasted.
Separate chaining
Hash collision resolved by separate chaining.
In the strategy known as separate chaining, direct chaining, or simply chaining, each slot of the
bucket array is a pointer to a linked list that contains the key-value pairs that hashed to the same
location. Lookup requires scanning the list for an entry with the given key. Insertion requires
adding a new entry record to either end of the list belonging to the hashed slot. Deletion requires searching the list and removing the element. (The technique is also called open hashing or closed
addressing, which should not be confused with 'open addressing' or 'closed hashing'.)
Chained hash tables with linked lists are popular because they require only basic data structures
with simple algorithms, and can use simple hash functions that are unsuitable for other methods.
The cost of a table operation is that of scanning the entries of the selected bucket for the desired
key. If the distribution of keys is sufficiently uniform, the average cost of a lookup depends only
on the average number of keys per bucket, that is, on the load factor.
Chained hash tables remain effective even when the number of table entries n is much higher than the number of slots. Their performance degrades more gracefully (linearly) with the load
factor. For example, a chained hash table with 1000 slots and 10,000 stored keys (load factor 10) is five to ten times slower than a 10,000-slot table (load factor 1), but still 1000 times faster than a plain sequential list, and possibly even faster than a balanced search tree.
For separate chaining, the worst-case scenario is when all entries are inserted into the same
bucket, in which case the hash table is ineffective and the cost is that of searching the bucket data
structure. If the latter is a linear list, the lookup procedure may have to scan all its entries, so the worst-case cost is proportional to the number n of entries in the table.
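The lookup, insertion, and deletion operations described above can be sketched as a toy implementation (ours, not the article's):

```python
# An illustrative separate-chaining table: each slot holds a Python list
# (the "chain") of (key, value) pairs.

class ChainedHashTable:
    def __init__(self, slots=8):
        self.buckets = [[] for _ in range(slots)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                     # key present: replace its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))          # otherwise extend the chain

    def lookup(self, key):
        for k, v in self._bucket(key):       # scan the chain for the key
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        bucket = self._bucket(key)
        bucket[:] = [(k, v) for k, v in bucket if k != key]

table = ChainedHashTable()
table.insert("John Smith", "521-1234")
table.insert("Sandra Dee", "521-9655")
print(table.lookup("Sandra Dee"))            # 521-9655
```

Note that the number of stored entries may freely exceed the number of slots; only the chains grow.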
The bucket chains are often implemented as ordered lists, sorted by the key field; this choice
approximately halves the average cost of unsuccessful lookups, compared to an unordered
list. However, if some keys are much more likely to come up than others, an unordered list with a move-to-front heuristic may be more effective. More sophisticated data structures, such
as balanced search trees, are worth considering only if the load factor is large (about 10 or more),
or if the hash distribution is likely to be very non-uniform, or if one must guarantee good performance even in the worst case. However, using a larger table and/or a better hash function
may be even more effective in those cases.
Chained hash tables also inherit the disadvantages of linked lists. When storing small keys and
values, the space overhead of the next pointer in each entry record can be significant. An
additional disadvantage is that traversing a linked list has poor cache performance, making the
processor cache ineffective.
Separate chaining with list heads
Hash collision by separate chaining with head records in the bucket array.
Some chaining implementations store the first record of each chain in the slot array itself.[3] The purpose is to increase the cache efficiency of hash table access. To save memory space, such hash
tables often have about as many slots as stored entries, meaning that many slots have two or
more entries.
Separate chaining with other structures
Instead of a list, one can use any other data structure that supports the required operations. For example, by using a self-balancing tree, the theoretical worst-case time of common hash table
operations (insertion, deletion, lookup) can be brought down to O(log n) rather than O(n).
However, this approach is only worth the trouble and extra memory cost if long delays must be
avoided at all costs (e.g. in a real-time application), or if one expects to have many entries hashed
to the same slot (e.g. if one expects extremely non-uniform or even malicious key distributions).
The variant called array hash table uses a dynamic array to store all the entries that hash to the same slot.[6][7][8] Each newly inserted entry gets appended to the end of the dynamic array that is
assigned to the slot. The dynamic array is resized in an exact-fit manner, meaning it is grown only by as many bytes as needed. Alternative techniques such as growing the array by block
sizes or pages were found to improve insertion performance, but at a cost in space. This variation makes more efficient use of CPU caching and the translation lookaside buffer (TLB), because
slot entries are stored in sequential memory positions. It also dispenses with the next pointers
that are required by linked lists, which saves space. Despite frequent array resizing, space overheads incurred by the operating system, such as memory fragmentation, were found to be small.
An elaboration on this approach is the so-called dynamic perfect hashing,[9] where a bucket that
contains k entries is organized as a perfect hash table with k² slots. While it uses more memory
(n² slots for n entries, in the worst case), this variant has guaranteed constant worst-case lookup
time, and low amortized time for insertion.
Open addressing
Hash collision resolved by open addressing with linear probing (interval=1). Note
that "Ted Baker" has a unique hash, but nevertheless collided with "Sandra Dee"
that had previously collided with "John Smith".
In another strategy, called open addressing, all entry records are stored in the bucket array itself.
When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot
and proceeding in some probe sequence, until an unoccupied slot is found. When searching for
an entry, the buckets are scanned in the same sequence, until either the target record is found, or
an unused array slot is found, which indicates that there is no such key in the table.[10] The name"open addressing" refers to the fact that the location ("address") of the item is not determined by
its hash value. (This method is also called closed hashing; it should not be confused with "open
hashing" or "closed addressing" that usually mean separate chaining.)
Well-known probe sequences include:
Linear probing, in which the interval between probes is fixed (usually 1)
Quadratic probing, in which the interval between probes is increased by adding the successive outputs of a quadratic polynomial to the starting value given by the original hash computation
Double hashing, in which the interval between probes is computed by another hash function
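The three probe sequences can be sketched as simple generators (illustrative; the function names and the fixed table size are our assumptions). Note how quadratic probing fails to visit every slot of a power-of-two table:

```python
# Illustrative probe-sequence generators for a table of `size` slots,
# starting from hash value h (h2 is a second hash value for double hashing).

def linear_probes(h, size, interval=1):
    return [(h + i * interval) % size for i in range(size)]

def quadratic_probes(h, size):
    return [(h + i * i) % size for i in range(size)]

def double_hash_probes(h, h2, size):
    return [(h + i * h2) % size for i in range(size)]

print(linear_probes(3, 8))          # [3, 4, 5, 6, 7, 0, 1, 2]
print(quadratic_probes(3, 8))       # [3, 4, 7, 4, 3, 4, 7, 4]
print(double_hash_probes(3, 5, 8))  # [3, 0, 5, 2, 7, 4, 1, 6]
```

Linear probing visits every slot; the quadratic sequence here cycles through only {3, 4, 7}, which is why quadratic probing needs a carefully chosen table size in practice.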
A drawback of all these open addressing schemes is that the number of stored entries cannot
exceed the number of slots in the bucket array. In fact, even with good hash functions, their performance seriously degrades when the load factor grows beyond 0.7 or so. For many
applications, these restrictions mandate the use of dynamic resizing, with its attendant costs.
Open addressing schemes also put more stringent requirements on the hash function: besides distributing the keys more uniformly over the buckets, the function must also minimize the
clustering of hash values that are consecutive in the probe order. Even experienced programmers
may find such clustering hard to avoid.
Open addressing only saves memory if the entries are small (less than 4 times the size of a pointer) and the load factor is not too small. If the load factor is close to zero (that is, there are
far more buckets than stored entries), open addressing is wasteful even if each entry is just two words.
This graph compares the average number of cache misses required to lookup
elements in tables with chaining and linear probing. As the table passes the 80%-
full mark, linear probing's performance drastically degrades.
Open addressing avoids the time overhead of allocating each new entry record, and can be
implemented even in the absence of a memory allocator. It also avoids the extra indirection required to access the first entry of each bucket (that is, usually the only one). It also has better locality of reference, particularly with linear probing. With small record sizes, these factors can
yield better performance than chaining, particularly for lookups.
Hash tables with open addressing are also easier to serialize, because they do not use pointers.
On the other hand, normal open addressing is a poor choice for large elements, because these
elements fill entire CPU cache lines (negating the cache advantage), and a large amount of space is wasted on large empty table slots. If the open addressing table only stores references to
elements (external storage), it uses space comparable to chaining even for large records but loses
its speed advantage.
Generally speaking, open addressing is better used for hash tables with small records that can be stored within the table (internal storage) and fit in a cache line. They are particularly suitable for
elements of one word or less. If the table is expected to have a high load factor, the records are
large, or the data is variable-sized, chained hash tables often perform as well or better.
Ultimately, used sensibly, any kind of hash table algorithm is usually fast enough; and the
percentage of a calculation spent in hash table code is low. Memory usage is rarely considered
excessive. Therefore, in most cases the differences between these algorithms are marginal, and
other considerations typically come into play.
Coalesced hashing
A hybrid of chaining and open addressing, coalesced hashing links together chains of nodes
within the table itself.[10] Like open addressing, it achieves space usage and (somewhat diminished) cache advantages over chaining. Like chaining, it does not exhibit clustering effects;
in fact, the table can be efficiently filled to a high density. Unlike chaining, it cannot have more
elements than table slots.
Robin Hood hashing
One interesting variation on double-hashing collision resolution is Robin Hood hashing.[11] The idea is that a new key may displace a key already inserted if its probe count is larger than that of the key at the current position. The net effect of this is that it reduces worst-case search times in
the table. This is similar to Knuth's ordered hash tables except that the criterion for bumping a
key does not depend on a direct relationship between the keys. Since both the worst case and the variation in the number of probes are reduced dramatically, an interesting variation is to probe the
table starting at the expected successful probe value and then expand from that position in both
directions.[12] External Robin Hashing is an extension of this algorithm where the table is stored
in an external file and each table position corresponds to a fixed-sized page or bucket with B
records.[13]
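The displacement rule can be sketched as follows; this is our illustration using linear probing for simplicity (the article discusses the idea as a double-hashing variant), with small integer keys so the probe distances are easy to trace:

```python
# Illustrative Robin Hood insertion (assumptions: linear probing, open slots
# are None, and the table never fills). A "rich" resident (small probe count)
# yields its slot to a "poor" newcomer (large probe count).

def robin_hood_insert(table, key, value):
    size = len(table)
    entry = (key, value)
    dist = 0                                  # probe count of the entry in hand
    idx = hash(entry[0]) % size
    while True:
        slot = table[idx]
        if slot is None:
            table[idx] = entry
            return
        resident_dist = (idx - hash(slot[0])) % size
        if resident_dist < dist:              # resident is richer: swap them
            table[idx], entry = entry, slot
            dist = resident_dist
        idx = (idx + 1) % size
        dist += 1

table = [None] * 8
for k in (0, 8, 16):                          # all three keys hash to slot 0
    robin_hood_insert(table, k, str(k))
print([e[0] if e else None for e in table])   # [0, 8, 16, None, None, None, None, None]
```

The displaced entry carries its own probe count forward, which is what keeps the variance of probe lengths low.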
Cuckoo hashing
Another alternative open-addressing solution is cuckoo hashing, which ensures constant lookup time in the worst case, and constant amortized time for insertions and deletions. It uses two or
more hash functions, which means any key/value pair could be in two or more locations. For lookup, the first hash function is used; if the key/value is not found, then the second hash
function is used, and so on. If a collision happens during insertion, then the key is re-hashed with
the second hash function to map it to another bucket. If all hash functions are used and there is still a collision, then the key it collided with is removed to make space for the new key, and the
old key is re-hashed with one of the other hash functions, which maps it to another bucket. If that
location also results in a collision, then the process repeats until there is no collision or the process traverses all the buckets, at which point the table is resized. By combining multiple hash
functions with multiple cells per bucket, very high space utilisation can be achieved.
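Cuckoo insertion can be sketched with two tables; the hash functions h1 and h2, the table size, and the eviction limit below are our assumptions, not part of the article:

```python
# Illustrative two-table cuckoo hashing for integer keys. Lookup checks
# exactly two slots, so it is constant time in the worst case.

SIZE = 11

def h1(key):
    return hash(key) % SIZE

def h2(key):
    return (hash(key) // SIZE) % SIZE

def cuckoo_insert(t1, t2, key, max_kicks=32):
    for _ in range(max_kicks):
        i = h1(key)
        if t1[i] is None:
            t1[i] = key
            return True
        t1[i], key = key, t1[i]          # evict the table-1 occupant
        j = h2(key)
        if t2[j] is None:
            t2[j] = key
            return True
        t2[j], key = key, t2[j]          # evict the table-2 occupant, loop again
    return False                         # a real table would resize/rehash here

def cuckoo_lookup(t1, t2, key):
    return t1[h1(key)] == key or t2[h2(key)] == key

t1, t2 = [None] * SIZE, [None] * SIZE
for key in range(6):
    cuckoo_insert(t1, t2, key)
print(cuckoo_lookup(t1, t2, 3), cuckoo_lookup(t1, t2, 42))  # True False
```

A failed insertion (the `False` path) is the cue to grow the tables and rehash, as the text describes.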
Hopscotch hashing
Another alternative open-addressing solution is hopscotch hashing,[14] which combines the
approaches of cuckoo hashing and linear probing, yet seems in general to avoid their limitations.
In particular it works well even when the load factor grows beyond 0.9. The algorithm is well suited for implementing a resizable concurrent hash table.
The hopscotch hashing algorithm works by defining a neighborhood of buckets near the original
hashed bucket, where a given entry is always found. Thus, search is limited to the number of
entries in this neighborhood, which is logarithmic in the worst case, constant on average, and
with proper alignment of the neighborhood typically requires one cache miss. When inserting an entry, one first attempts to add it to a bucket in the neighborhood. However, if all buckets in this
neighborhood are occupied, the algorithm traverses buckets in sequence until an open slot (an
unoccupied bucket) is found (as in linear probing). At that point, since the empty bucket is outside the neighborhood, items are repeatedly displaced in a sequence of hops. (This is similar
to cuckoo hashing, but with the difference that in this case the empty slot is being moved into the
neighborhood, instead of items being moved out with the hope of eventually finding an empty slot.) Each hop brings the open slot closer to the original neighborhood, without invalidating the
neighborhood property of any of the buckets along the way. In the end, the open slot has been
moved into the neighborhood, and the entry being inserted can be added to it.
Dynamic resizing
To keep the load factor under a certain limit, e.g. under 3/4, many table implementations expand the table when items are inserted. For example, in Java's HashMap class the default load factor
threshold for table expansion is 0.75. Since buckets are usually implemented on top of a dynamic
array and any constant proportion for resizing greater than 1 will keep the load factor under the
desired limit, the exact choice of the constant is determined by the same space-time tradeoff as
for dynamic arrays.
Resizing is accompanied by a full or incremental table rehash whereby existing items are mapped to new bucket locations.
To limit the proportion of memory wasted due to empty buckets, some implementations also
shrink the size of the table, followed by a rehash, when items are deleted. From the point of
view of space-time tradeoffs, this operation is similar to the deallocation in dynamic arrays.
Resizing by copying all entries
A common approach is to automatically trigger a complete resizing when the load factor exceeds
some threshold r_max. Then a new larger table is allocated, all the entries of the old table are removed and inserted into this new table, and the old table is returned to the free storage pool.
Symmetrically, when the load factor falls below a second threshold r_min, all entries are moved to
a new smaller table.
If the table size increases or decreases by a fixed percentage at each expansion, the total cost of these resizings, amortized over all insert and delete operations, is still a constant, independent of
the number of entries n and of the number m of operations performed.
For example, consider a table that was created with the minimum possible size and is doubled
each time the load ratio exceeds some threshold. If m elements are inserted into that table, the total number of extra re-insertions that occur in all dynamic resizings of the table is at most m - 1.
In other words, dynamic resizing roughly doubles the cost of each insert or delete operation.
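Resizing by copying every entry can be sketched as follows; the 0.75 threshold and exact doubling are the common choices mentioned above, but the class itself is our illustration (shrinking is omitted):

```python
# Sketch of resize-by-copying with chaining buckets: when inserting would
# push the load factor over R_MAX, every entry is re-inserted into a table
# twice the size.

R_MAX = 0.75

class ResizingTable:
    def __init__(self):
        self.slots = 4
        self.buckets = [[] for _ in range(self.slots)]
        self.count = 0

    def insert(self, key, value):
        if (self.count + 1) / self.slots > R_MAX:
            self._grow()                 # allocate a larger table first
        self.buckets[hash(key) % self.slots].append((key, value))
        self.count += 1

    def _grow(self):
        old_entries = [e for b in self.buckets for e in b]
        self.slots *= 2                  # exact doubling
        self.buckets = [[] for _ in range(self.slots)]
        for k, v in old_entries:         # re-insert every old entry
            self.buckets[hash(k) % self.slots].append((k, v))

table = ResizingTable()
for i in range(100):
    table.insert(i, i)
print(table.slots, table.count)          # 256 100
```

Across the 100 inserts the table doubles six times (4 → 256 slots), yet the total number of re-insertions stays below m - 1 per the amortized bound above.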
Incremental resizing
Some hash table implementations, notably in real-time systems, cannot pay the price of enlarging
the hash table all at once, because it may interrupt time-critical operations. If one cannot avoid
dynamic resizing, a solution is to perform the resizing gradually:
During the resize, allocate the new hash table, but keep the old table unchanged.
In each lookup or delete operation, check both tables.
Perform insertion operations only in the new table. At each insertion, also move r elements from the old table to the new table.
When all elements are removed from the old table, deallocate it.
To ensure that the old table is completely copied over before the new table itself needs to be
enlarged, it is necessary to increase the size of the table by a factor of at least (r + 1)/r during
resizing.
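The gradual scheme above can be sketched as follows. This is an illustrative toy, not a production design; the migration scan is deliberately simple, and all names are made up:

```python
# Illustrative sketch of incremental resizing: while a resize is in
# progress, each insert into the new table also migrates up to r
# entries from the old table, and lookups consult both tables.
class IncrementalTable:
    def __init__(self, size=4, r=2):
        self.r = r
        self.new = [[] for _ in range(size)]
        self.old = None          # non-None only while a resize is underway

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def start_resize(self):
        # Doubling satisfies the (r + 1)/r growth condition for r >= 1,
        # so the old table drains before the new one must grow again.
        self.old = self.new
        self.new = [[] for _ in range(len(self.old) * 2)]

    def insert(self, key, value):
        self._bucket(self.new, key).append((key, value))
        if self.old is not None:
            self._migrate(self.r)

    def _migrate(self, n):
        # Move up to n entries from the old table into the new one.
        while n > 0 and self.old is not None:
            moved = False
            for bucket in self.old:
                if bucket:
                    k, v = bucket.pop()
                    self._bucket(self.new, k).append((k, v))
                    n -= 1
                    moved = True
                    break
            if not moved:        # old table fully drained: deallocate it
                self.old = None

    def lookup(self, key):
        tables = [self.new] + ([self.old] if self.old is not None else [])
        for table in tables:
            for k, v in self._bucket(table, key):
                if k == key:
                    return v
        return None
```

Each insert pays only a small bounded extra cost, which is why this approach suits systems that cannot tolerate one long resizing pause.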
http://en.wikipedia.org/wiki/Space-time_tradeoffhttp://en.wikipedia.org/wiki/Space-time_tradeoffhttp://en.wikipedia.org/wiki/Dynamic_arrayhttp://en.wikipedia.org/wiki/Space-time_tradeoffhttp://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=15http://en.wikipedia.org/wiki/Dynamic_memory_allocationhttp://en.wikipedia.org/wiki/Dynamic_memory_allocationhttp://en.wikipedia.org/wiki/Amortized_analysishttp://en.wikipedia.org/wiki/Amortized_analysishttp://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=16http://en.wikipedia.org/wiki/Real-time_systemhttp://en.wikipedia.org/wiki/Real-time_systemhttp://en.wikipedia.org/wiki/Space-time_tradeoffhttp://en.wikipedia.org/wiki/Dynamic_arrayhttp://en.wikipedia.org/wiki/Space-time_tradeoffhttp://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=15http://en.wikipedia.org/wiki/Dynamic_memory_allocationhttp://en.wikipedia.org/wiki/Amortized_analysishttp://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=16http://en.wikipedia.org/wiki/Real-time_system -
Monotonic keys
If it is known that key values will always increase monotonically, then a variation of consistent hashing can be achieved by keeping a list of the single most recent key value at each hash table
resize operation. Upon lookup, keys that fall in the ranges defined by these list entries are
directed to the appropriate hash function (and indeed hash table), both of which can be different for each range. Since it is common to grow the overall number of entries by doubling, there will
be only O(lg(N)) ranges to check, and the binary search time for the redirection would be
O(lg(lg(N))).
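A sketch of this boundary-list idea, using Python dicts as stand-ins for the per-range hash tables (the class and method names are illustrative):

```python
import bisect

# Sketch of the monotonic-key scheme: each resize records the most
# recent key seen, and lookups binary-search those boundaries to pick
# the table (and, in a full design, the hash function) for that range.
class MonotonicHashing:
    def __init__(self):
        self.boundaries = [0]     # first range starts at key 0
        self.tables = [{}]        # one stand-in table per range

    def resize(self, last_key_seen):
        # Record the most recent key; later keys go to a fresh table.
        self.boundaries.append(last_key_seen + 1)
        self.tables.append({})

    def _table_for(self, key):
        i = bisect.bisect_right(self.boundaries, key) - 1
        return self.tables[i]

    def insert(self, key, value):
        self._table_for(key)[key] = value

    def lookup(self, key):
        return self._table_for(key).get(key)
```

With table counts growing by doubling, `boundaries` stays short (O(lg N) entries), so the `bisect` step is cheap, matching the O(lg(lg(N))) redirection cost quoted above.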
Other solutions
Linear hashing[15] is a hash table algorithm that permits incremental hash table expansion. It is
implemented using a single hash table, but with two possible look-up functions.
Another way to decrease the cost of table resizing is to choose a hash function in such a way that
the hashes of most values do not change when the table is resized. This approach, called consistent hashing, is prevalent in disk-based and distributed hashes, where rehashing is
prohibitively costly.
Performance analysis
In the simplest model, the hash function is completely unspecified and the table does not resize. For the best possible choice of hash function, a table of size n with open addressing has no
collisions and holds up to n elements, with a single comparison for successful lookup, and a table
of size n with chaining and k keys has the minimum max(0, k - n) collisions and O(1 + k/n) comparisons for lookup. For the worst choice of hash function, every insertion causes a collision,
and hash tables degenerate to linear search, with Θ(k) amortized comparisons per insertion and
up to k comparisons for a successful lookup.
Adding rehashing to this model is straightforward. As in a dynamic array, geometric resizing by a factor of b implies that only k/b^i keys are inserted i or more times, so that the total number of
insertions is bounded above by bk/(b - 1), which is O(k). By using rehashing to maintain k < n,
tables using both chaining and open addressing can have unlimited elements and perform successful lookup in a single comparison for the best choice of hash function.
In more realistic models, the hash function is a random variable over a probability distribution of
hash functions, and performance is computed on average over the choice of hash function. When
this distribution is uniform, the assumption is called "simple uniform hashing" and it can be shown that hashing with chaining requires Θ(1 + k/n) comparisons on average for an
unsuccessful lookup, and hashing with open addressing requires Θ(1/(1 - k/n)).[16] Both these
bounds are constant if we maintain k/n < c using table resizing, where c is a fixed constant less than 1.
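The chaining bound can be checked with a toy calculation. Small Python integers hash to themselves, so consecutive keys spread perfectly evenly and the average unsuccessful-lookup cost comes out to exactly 1 + k/n (the function name is illustrative):

```python
# Toy check of the chaining bound: with k keys spread uniformly over n
# buckets, an unsuccessful lookup probes one bucket and then compares
# against every chained entry, costing 1 + k/n comparisons on average.
def average_unsuccessful_cost(k, n):
    buckets = [0] * n
    for key in range(k):           # hash(key) == key for small ints
        buckets[key % n] += 1
    # One bucket probe plus one comparison per entry in the chain.
    return sum(1 + length for length in buckets) / n

cost = average_unsuccessful_cost(k=1000, n=250)   # load factor k/n = 4
```

Keeping k/n below a fixed constant via resizing is exactly what keeps this average bounded.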
http://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=17http://en.wikipedia.org/wiki/Consistent_hashinghttp://en.wikipedia.org/wiki/Consistent_hashinghttp://en.wikipedia.org/wiki/Consistent_hashinghttp://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=18http://en.wikipedia.org/wiki/Linear_hashinghttp://en.wikipedia.org/wiki/Hash_table#cite_note-14http://en.wikipedia.org/wiki/Hash_table#cite_note-14http://en.wikipedia.org/wiki/Consistent_hashinghttp://en.wikipedia.org/wiki/Consistent_hashinghttp://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=19http://en.wikipedia.org/wiki/Dynamic_arrayhttp://en.wikipedia.org/wiki/Dynamic_arrayhttp://en.wikipedia.org/wiki/Random_variablehttp://en.wikipedia.org/wiki/Uniform_distribution_(discrete)http://en.wikipedia.org/wiki/Uniform_distribution_(discrete)http://en.wikipedia.org/wiki/Hash_table#cite_note-15http://en.wikipedia.org/wiki/Hash_table#cite_note-15http://en.wikipedia.org/wiki/Hash_table#cite_note-15http://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=17http://en.wikipedia.org/wiki/Consistent_hashinghttp://en.wikipedia.org/wiki/Consistent_hashinghttp://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=18http://en.wikipedia.org/wiki/Linear_hashinghttp://en.wikipedia.org/wiki/Hash_table#cite_note-14http://en.wikipedia.org/wiki/Consistent_hashinghttp://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=19http://en.wikipedia.org/wiki/Dynamic_arrayhttp://en.wikipedia.org/wiki/Random_variablehttp://en.wikipedia.org/wiki/Uniform_distribution_(discrete)http://en.wikipedia.org/wiki/Hash_table#cite_note-15 -
Features

Advantages
The main advantage of hash tables over other table data structures is speed. This advantage is
more apparent when the number of entries is large (thousands or more). Hash tables areparticularly efficient when the maximum number of entries can be predicted in advance, so that
the bucket array can be allocated once with the optimum size and never resized.
If the set of key-value pairs is fixed and known ahead of time (so insertions and deletions are not
allowed), one may reduce the average lookup cost by a careful choice of the hash function,bucket table size, and internal data structures. In particular, one may be able to devise a hash
function that is collision-free, or even perfect (see below). In this case the keys need not be
stored in the table.
Drawbacks
Hash tables can be more difficult to implement than self-balancing binary search trees.
Choosing an effective hash function for a specific application is more an art than a science. Inopen-addressed hash tables it is fairly easy to create a poor hash function.
Although operations on a hash table take constant time on average, the cost of a good hash
function can be significantly higher than the inner loop of the lookup algorithm for a sequential
list or search tree. Thus hash tables are not effective when the number of entries is very small.(However, in some cases the high cost of computing the hash function can be mitigated by
saving the hash value together with the key.)
For certain string processing applications, such as spell-checking, hash tables may be less efficient than tries, finite automata, or Judy arrays. Also, if each key is represented by a small
enough number of bits, then, instead of a hash table, one may use the key directly as the index
into an array of values. Note that there are no collisions in this case.
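The direct-indexing case is trivial to sketch: with single-byte keys, the key itself selects the slot, so there is no hash function and no collision (the class name is illustrative):

```python
# When keys fit in one byte, the key indexes an array of values
# directly: no hash function, no collisions, one array access per op.
class ByteKeyedTable:
    def __init__(self):
        self.values = [None] * 256   # one slot per possible key

    def insert(self, key, value):
        assert 0 <= key < 256
        self.values[key] = value

    def lookup(self, key):
        return self.values[key]
```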
The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order. Therefore, there is no efficient way to locate an entry whose
key is nearest to a given key. Listing all n entries in some specific order generally requires a
separate sorting step, whose cost is proportional to log(n) per entry. In comparison, ordered
search trees have lookup and insertion cost proportional to log(n), but allow finding the nearest key at about the same cost, and ordered enumeration of all entries at constant cost per entry.
If the keys are not stored (because the hash function is collision-free), there may be no easy way
to enumerate the keys that are present in the table at any given moment.
Although the average cost per operation is constant and fairly small, the cost of a single operation may be quite high. In particular, if the hash table uses dynamic resizing, an insertion or
deletion operation may occasionally take time proportional to the number of entries. This may be
a serious drawback in real-time or interactive applications.
http://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=20http://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=21http://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=22http://en.wikipedia.org/wiki/Self-balancing_binary_search_treehttp://en.wikipedia.org/wiki/Self-balancing_binary_search_treehttp://en.wikipedia.org/wiki/Wikipedia:Citation_neededhttp://en.wikipedia.org/wiki/Spell_checkerhttp://en.wikipedia.org/wiki/Triehttp://en.wikipedia.org/wiki/Triehttp://en.wikipedia.org/wiki/Finite_automatahttp://en.wikipedia.org/wiki/Judy_arrayhttp://en.wikipedia.org/wiki/Judy_arrayhttp://en.wikipedia.org/wiki/Hash_table#Dynamic_resizinghttp://en.wikipedia.org/wiki/Hash_table#Dynamic_resizinghttp://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=20http://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=21http://en.wikipedia.org/w/index.php?title=Hash_table&action=edit§ion=22http://en.wikipedia.org/wiki/Self-balancing_binary_search_treehttp://en.wikipedia.org/wiki/Wikipedia:Citation_neededhttp://en.wikipedia.org/wiki/Spell_checkerhttp://en.wikipedia.org/wiki/Triehttp://en.wikipedia.org/wiki/Finite_automatahttp://en.wikipedia.org/wiki/Judy_arrayhttp://en.wikipedia.org/wiki/Hash_table#Dynamic_resizing -
Hash tables in general exhibit poor locality of reference; that is, the data to be accessed is
distributed seemingly at random in memory. Because hash tables cause access patterns that jump
around, this can trigger microprocessor cache misses that cause long delays. Compact data structures such as arrays searched with linear search may be faster if the table is relatively small
and keys are integers or other short strings. According to Moore's Law, cache sizes are growing
exponentially, so what is considered "small" may be increasing. The optimal performance point varies from system to system.
Hash tables become quite inefficient when there are many collisions. While extremely uneven
hash distributions are extremely unlikely to arise by chance, a malicious adversary with
knowledge of the hash function may be able to supply information to a hash that creates worst-case behavior by causing excessive collisions, resulting in very poor performance (e.g., a denial
of service attack). In critical applications, either universal hashing can be used or a data structure
with better worst-case guarantees may be preferable.[17]
Uses

Associative arrays
Hash tables are commonly used to implement many types of in-memory tables. They are used to
implement associative arrays (arrays whose indices are arbitrary strings or other complicated
objects), especially in interpreted programming languages like AWK, Perl, and PHP.
Database indexing
Hash tables may also be used for disk-based persistent data structures and database indices (such
as dbm), although balanced trees are more popular in these applications.
Caches
Hash tables can be used to implement caches, auxiliary data tables that are used to speed up the
access to data that is primarily stored in slower media. In this application, hash collisions can be handled by discarding one of the two colliding entries, usually the one that is currently stored in
the table.
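The discard-on-collision policy above can be sketched as a tiny direct-mapped cache (the class name and fixed size are illustrative):

```python
# Minimal cache that resolves collisions by discarding: each key hashes
# to exactly one slot, and a colliding insertion overwrites whatever
# entry was stored there before.
class DiscardingCache:
    def __init__(self, size=8):
        self.slots = [None] * size

    def put(self, key, value):
        self.slots[hash(key) % len(self.slots)] = (key, value)

    def get(self, key):
        entry = self.slots[hash(key) % len(self.slots)]
        if entry is not None and entry[0] == key:
            return entry[1]
        return None        # miss: never cached, or overwritten since
```

A miss is acceptable here because the authoritative copy of the data still lives in the slower backing store; the cache only has to be fast and consistent, not complete.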
Sets
Besides recovering the entry that has a given key, many hash table implementations can also tellwhether such an entry exists or not.
Those structures can therefore be used to implement a set data structure, which merely recordswhether a given key belongs to a specified set of keys. In this case, the structure can be
simplified by eliminating all parts that have to do with the entry values. Hashing can be used to
implement both static and dynamic sets.
Object representation
Several dynamic languages, such as Perl, Python, JavaScript, and Ruby, use hash tables to implement objects. In this representation, the keys are the names of the members and methods of
the object, and the values are pointers to the corresponding member or method.
Unique data representation
Hash tables can be used by some programs to avoid creating multiple character strings with the
same contents. For that purpose, all strings in use by the program are stored in a single hash
table, which is checked whenever a new string has to be created. This technique was introduced
in Lisp interpreters under the name hash consing, and can be used with many other kinds of data
(expression trees in a symbolic algebra system, records in a database, files in a file system,
binary decision diagrams, etc.)
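The hash-consing idea can be sketched in a few lines for expression trees: a single table maps contents to one canonical object, so structurally equal trees share memory (names and representation are illustrative):

```python
# Sketch of hash consing: a pool maps node contents to one canonical
# object, so building the same tree twice yields the same object.
_pool = {}

def cons(op, left, right):
    # left/right are leaf strings or tuples previously returned by
    # cons(), so tuple equality here is structural equality of subtrees.
    key = (op, left, right)
    if key not in _pool:
        _pool[key] = key
    return _pool[key]

a = cons("+", "x", "y")
b = cons("+", "x", "y")
shared = a is b            # both names refer to one pooled tuple
```

Besides saving memory, this makes structural equality of whole trees as cheap as a pointer comparison, which is the property binary decision diagram packages rely on.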
String interning
Main article: String interning
Implementations

In programming languages
Many programming languages provide hash table functionality, either as built-in associative
arrays or as standard library modules. In C++0x, for example, the hash_map and unordered_map
classes provide hash tables for keys and values of arbitrary type.
In PHP 5, the Zend 2 engine uses one of the hash functions from Daniel J. Bernstein to generatethe hash values used in managing the mappings of data pointers stored in a HashTable. In the
PHP source code, it is labelled as "DJBX33A" (Daniel J. Bernstein, Times 33 with Addition).
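As a sketch, Bernstein's "times 33 with addition" scheme folds each byte into a running value seeded at 5381; the 32-bit mask below stands in for the unsigned overflow a C implementation would get for free (this is an illustrative re-implementation, not PHP's source):

```python
# Sketch of Bernstein's "times 33 with addition" string hash (DJBX33A):
# start from 5381 and fold each byte in as h = h * 33 + byte, keeping
# the value in 32 bits as C's unsigned arithmetic would.
def djbx33a(s: bytes) -> int:
    h = 5381
    for byte in s:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h
```

Multiplying by 33 is cheap (a shift plus an add), which is part of why this family of hashes is popular for short string keys.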
Python's built-in hash table implementation, in the form of the dict type, as well as Perl's hash
type (%) are highly optimized as they are used internally to implement namespaces.
Independent packages
Google SparseHash: The Google SparseHash project contains several C++ hash-map implementations in use at Google, with different performance
characteristics, including an implementation that optimizes for memory use and one that optimizes for speed. The memory-optimized one is extremely memory-efficient, with only 2 bits/entry of overhead.
SunriseDD: An open source C library for hash table storage of arbitrary data objects with lock-free lookups, built-in reference counting, and guaranteed order iteration. The library can participate in external reference counting systems or use its own built-in reference counting. It comes with a variety of