Symmetric and Asymmetric Algorithms

Upload: strbrry

Post on 08-Apr-2018


8/7/2019 Symmetric and Asymmetric Algorithms

    Symmetric algorithms:

    IDEA encryption algorithm

IDEA stands for International Data Encryption Algorithm. IDEA is a symmetric encryption algorithm that was developed by Dr. X. Lai and Prof. J. Massey to replace the DES standard. Unlike DES, though, it uses a 128-bit key. This key length makes it computationally infeasible to break by simply trying every key. It has been one of the best publicly known algorithms for some time. It has been around now for several years, and no practical attacks on it have been published despite numerous attempts to analyze it.

IDEA is resistant to both linear and differential cryptanalysis.

    SEED

SEED is a block cipher developed by the Korea Information Security Agency since 1998. Both the block and key size of SEED are 128 bits, and it has a Feistel network structure which is iterated 16 times. It has been designed to resist differential and linear cryptanalysis as well as related-key attacks. SEED uses two 8x8 S-boxes and mixes the XOR operation with modular addition. SEED has been adopted as an ISO/IEC standard (ISO/IEC 18033-3), an IETF RFC (RFC 4269), and an industrial association standard of Korea (TTAS.KO-12.0004/0025).

    Asymmetric algorithms:

    ElGamal

ElGamal is a public-key cipher: an asymmetric encryption algorithm for public-key cryptography based on the Diffie-Hellman key agreement. ElGamal is the predecessor of DSA.
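Textbook ElGamal can be sketched in a few lines with BigInteger. This is an illustrative toy (no padding, modest parameter sizes); real deployments use much larger, carefully chosen parameters:

```java
import java.math.BigInteger;
import java.security.SecureRandom;

public class ElGamalSketch {
    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        // Public parameters: a prime p and a base g (small here for the demo).
        BigInteger p = BigInteger.probablePrime(256, rnd);
        BigInteger g = BigInteger.valueOf(2);

        // Key generation: private x, public y = g^x mod p.
        BigInteger x = new BigInteger(255, rnd);
        BigInteger y = g.modPow(x, p);

        // Encryption of message m: pick an ephemeral k, send (c1, c2).
        BigInteger m = BigInteger.valueOf(42);
        BigInteger k = new BigInteger(255, rnd);
        BigInteger c1 = g.modPow(k, p);                    // c1 = g^k
        BigInteger c2 = m.multiply(y.modPow(k, p)).mod(p); // c2 = m * y^k

        // Decryption: m = c2 * (c1^x)^(-1) mod p, since c1^x = g^(kx) = y^k.
        BigInteger s = c1.modPow(x, p);
        BigInteger recovered = c2.multiply(s.modInverse(p)).mod(p);
        System.out.println(recovered); // 42
    }
}
```

The Diffie-Hellman heritage is visible in the decryption step: sender and receiver both compute the shared value g^(kx), one as y^k and the other as c1^x.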

    ECDSA

Elliptic Curve DSA (ECDSA) is a variant of the Digital Signature Algorithm (DSA) which operates on elliptic curve groups. As with elliptic curve cryptography in general, the bit size of the public key believed to be needed for ECDSA is about twice the security level, in bits.


    XTR

XTR is an algorithm for public-key encryption. XTR is a novel method that makes use of traces to represent and calculate powers of elements of a subgroup of a finite field. It is based on the primitive underlying the very first public-key cryptosystem, the Diffie-Hellman key agreement protocol.

From a security point of view, XTR relies on the difficulty of solving discrete logarithm related problems in the multiplicative group of a finite field. Some advantages of XTR are its fast key generation (much faster than RSA), small key sizes (much smaller than RSA, comparable with ECC for current security settings), and speed (overall comparable with ECC for current security settings).

    Differences between symmetric and asymmetric encryption algorithms

Symmetric encryption algorithms encrypt and decrypt with the same key. The main advantages of symmetric encryption algorithms are their security and high speed. Asymmetric encryption algorithms encrypt and decrypt with different keys: data is encrypted with a public key and decrypted with a private key. Asymmetric encryption algorithms (also known as public-key algorithms) need at least a 3,000-bit key to achieve the same level of security as a 128-bit symmetric algorithm. Asymmetric algorithms are comparatively slow, and it is impractical to use them to encrypt large amounts of data. Generally, symmetric encryption algorithms are much faster to execute on a computer than asymmetric ones. In practice they are often used together, so that a public-key algorithm is used to encrypt a randomly generated encryption key, and the random key is used to encrypt the actual message using a symmetric algorithm. This is sometimes called hybrid encryption.
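The hybrid scheme described above can be sketched with the JDK's own javax.crypto API. This is a minimal sketch: a production design would use OAEP padding for RSA and an authenticated mode such as AES/GCM rather than the defaults used here for brevity:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;

public class HybridDemo {
    public static void main(String[] args) throws Exception {
        // Recipient's RSA key pair; the public key will wrap the session key.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair pair = kpg.generateKeyPair();

        // 1. A random 128-bit AES session key encrypts the bulk data (fast).
        SecretKey session = KeyGenerator.getInstance("AES").generateKey();
        Cipher aes = Cipher.getInstance("AES");
        aes.init(Cipher.ENCRYPT_MODE, session);
        byte[] ciphertext =
            aes.doFinal("the actual message".getBytes(StandardCharsets.UTF_8));

        // 2. RSA encrypts only the short session key (slow, but tiny input).
        Cipher rsa = Cipher.getInstance("RSA");
        rsa.init(Cipher.ENCRYPT_MODE, pair.getPublic());
        byte[] wrappedKey = rsa.doFinal(session.getEncoded());

        // Recipient: unwrap the session key with the private key, then decrypt.
        rsa.init(Cipher.DECRYPT_MODE, pair.getPrivate());
        SecretKey unwrapped = new SecretKeySpec(rsa.doFinal(wrappedKey), "AES");
        aes.init(Cipher.DECRYPT_MODE, unwrapped);
        System.out.println(
            new String(aes.doFinal(ciphertext), StandardCharsets.UTF_8));
    }
}
```

The expensive RSA operation runs once, on 16 bytes, no matter how large the message is; the message itself only ever touches the fast symmetric cipher.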

    Strength of Encryption Algorithms

Strong encryption algorithms should always be designed so that they are as difficult to break as possible. In theory, any encryption algorithm with a key can be broken by trying all possible keys in sequence. If using brute force to try all keys is the only option, the required computing power increases exponentially with the length of the key. A 32-bit key takes 2^32 (about 10^9) steps; this is something anyone can do on a home computer. An encryption algorithm with 56-bit keys, such as DES, requires a substantial effort, but using massive distributed systems requires only hours of computing. In 1999, a brute-force search using a specially designed supercomputer and a worldwide network of nearly 100,000 PCs on the Internet found a DES


key in 22 hours and 15 minutes. It is currently believed that keys with at least 128 bits (as in AES, for example) will be sufficient against brute-force attacks into the foreseeable future.

However, key length is not the only relevant issue. Many encryption algorithms can be broken without trying all possible keys. In general, it is very difficult to design ciphers that could not be broken more effectively by some other method.
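The exponential growth described above is easy to make concrete. The 10^9 keys-per-second rate below is a hypothetical figure chosen for illustration:

```java
import java.math.BigInteger;

public class KeySearchCost {
    public static void main(String[] args) {
        // Worst-case brute-force trials double with every extra key bit.
        long trialsPerSecond = 1_000_000_000L; // hypothetical 10^9 keys/s
        for (int bits : new int[] { 32, 56, 128 }) {
            BigInteger trials = BigInteger.valueOf(2).pow(bits);
            BigInteger seconds =
                trials.divide(BigInteger.valueOf(trialsPerSecond));
            System.out.println(bits + "-bit key: " + trials
                + " keys, about " + seconds + " s at 10^9 keys/s");
        }
    }
}
```

At that rate a 32-bit key falls in seconds, a 56-bit key in a few years on one machine (hence the distributed and special-purpose attacks on DES), while a 128-bit key needs on the order of 10^29 seconds, vastly longer than the age of the universe.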

The keys used in public-key encryption algorithms are usually much longer than those used in symmetric encryption algorithms. This is caused by the extra structure that is available to the cryptanalyst: there the problem is not that of guessing the right key, but of deriving the matching private key from the public key. In the case of the RSA encryption algorithm, this could be done by factoring a large integer that has two large prime factors. In the case of some other cryptosystems, it is equivalent to computing the discrete logarithm modulo a large integer (which is believed to be roughly comparable to factoring when the modulus is a large prime number).

    How to decrypt files

1. Open the Encryption and Decryption Pro main window.

2. Click the "Tools" button and select "Decryption".

3. Set the radio button next to "File".

4. Select the file you want to decrypt in the "File you want to decrypt" field.

5. In the "Save decrypted file to ..." field, select where you want to save the decrypted file. You can save the decrypted file under the same name, or you can change its name.

6. Click the "Decrypt" button.

7. Enter the password for decryption and confirm it. Note: it is the same password that was used for file encryption.
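The GUI steps above follow the standard password-based pattern: derive a key from the password, then decrypt with it. Here is a minimal JDK sketch of that pattern; the parameters (salt, IV, iteration count) are illustrative assumptions, not the actual file format used by Encryption and Decryption Pro:

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

public class PasswordCrypto {
    // Derive an AES key from a password with PBKDF2.
    static SecretKeySpec keyFromPassword(char[] password, byte[] salt)
            throws Exception {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 65_536, 128);
        byte[] keyBytes = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        return new SecretKeySpec(keyBytes, "AES");
    }

    public static void main(String[] args) throws Exception {
        byte[] salt = new byte[16]; // fixed zeros for the demo only;
        byte[] iv = new byte[16];   // real tools store random salt/IV per file
        char[] password = "same password used for encryption".toCharArray();

        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, keyFromPassword(password, salt),
               new IvParameterSpec(iv));
        byte[] ct = c.doFinal("file contents".getBytes(StandardCharsets.UTF_8));

        // Decryption succeeds only with the same password (and salt/IV).
        c.init(Cipher.DECRYPT_MODE, keyFromPassword(password, salt),
               new IvParameterSpec(iv));
        System.out.println(new String(c.doFinal(ct), StandardCharsets.UTF_8));
    }
}
```

This is why step 7 insists on the original password: a different password derives a different key, and decryption fails with a padding error rather than producing the file.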


/*
Shuffle elements of Java ArrayList example.
This Java example shows how to shuffle elements of a Java ArrayList
object using the shuffle method of the Collections class.
*/

import java.util.ArrayList;
import java.util.Collections;

public class ShuffleElementsOfArrayListExample {

    public static void main(String[] args) {

        // create an ArrayList object
        ArrayList<String> arrayList = new ArrayList<>();

        // Add elements to the ArrayList
        arrayList.add("A");
        arrayList.add("B");
        arrayList.add("C");
        arrayList.add("D");
        arrayList.add("E");

        System.out.println("Before shuffling, ArrayList contains : " + arrayList);

        /*
        To shuffle elements of a Java ArrayList use the
        static void shuffle(List list) method of the Collections class.
        */
        Collections.shuffle(arrayList);

        System.out.println("After shuffling, ArrayList contains : " + arrayList);
    }
}

/*
Output would be
Before shuffling, ArrayList contains : [A, B, C, D, E]
After shuffling, ArrayList contains : [C, A, D, E, B]
*/

Abstract: This article describes how to implement the IDEA symmetric encryption algorithm in a Java environment. With the popularity of e-commerce and e-government, encryption technology is widely used, and the requirements placed on security encryption technology are very high. Implementing IDEA encryption in a Java environment has many advantages, because Java is an object-oriented programming language and, because of its performance and platform independence, has been widely used for Internet development.

Keywords: IDEA (International Data Encryption Algorithm), JCA, JCE, reliability, key independence

With the rapid development of the Internet and the overwhelming wave of e-commerce, daily work and data are transmitted online, greatly improving efficiency, reducing costs and producing good results. However, the Internet's network protocols themselves have important security problems (an IP packet does not carry any inherent security features; it is easy to forge a packet's IP address, modify its contents, replay previous packets, or intercept and view packet contents in transit), so information transferred online carries a huge security risk, and the security of e-commerce has become increasingly prominent.

Encryption is the most important e-commerce security technology; the choice of encryption directly affects the information security level of an e-commerce system, and the main security problems can be solved by encryption. Confidentiality of data is achieved by encrypting the data with different encryption algorithms.

As far as China is concerned, although much foreign equipment can be imported, encryption devices cannot simply be imported: since network security relates to the national security of confidential information, it is necessary to develop them domestically.

There are many current international encryption algorithms. DES (Data Encryption Standard) was the first and most widely used symmetric block encryption algorithm: DES encrypts 64-bit plaintext blocks with a 56-bit key and outputs 64-bit ciphertext. A 56-bit key allows 2^56 possible keys, but DES keys have historically been cracked by brute-force attacks: in 1998, the Electronic Frontier Foundation (EFF) built a $250,000 dedicated computer that cracked a DES key in 56 hours, and in 1999 the EFF completed the same task in 22 hours, so the DES algorithm has been badly hit and its security is under serious threat. Given the Java language's strengths in security and network processing, this paper introduces the use of the IDEA (International Data Encryption Algorithm) encryption algorithm in the Java environment to secure data transmission.

1. The IDEA data encryption algorithm

The IDEA data encryption algorithm was jointly proposed in 1990 by the Chinese scholar Dr. Xuejia Lai and the famous cryptographer James L. Massey. Its plaintext and ciphertext blocks are both 64 bits, but the key is 128 bits long. IDEA is implemented as an iterated block cipher with a 128-bit key and 8 rounds. It provides more security than DES, but when selecting keys for IDEA one should exclude the keys known as "weak keys". DES has only four weak keys and 12 semi-weak keys, while IDEA has a considerable number of weak keys: 2^51 of them. However, since the total number of keys is very large (2^128), the chance of choosing a weak key is only about one in 2^77. IDEA is considered to be extremely safe. With a 128-bit key, the number of trials required for a brute-force attack is significantly increased compared with DES, even allowing for weak-key tests. Moreover, IDEA has also shown particular resistance to specialized forms of attack.

2. The Java cryptography system and the Java Cryptography Extension

Java is an object-oriented programming language developed by Sun, and because of its platform independence it is widely used for Internet development. The Java Cryptography Architecture (JCA) and the Java Cryptography Extension (JCE) are designed to provide Java APIs for cryptographic functions. They both use factory methods to create class instances, and then delegate the actual cryptographic work to an underlying engine supplied by the specified provider; the engine classes expose service provider interfaces. In Java, data encryption/decryption is performed using the built-in JCE (Java Cryptography Extension). The Java development tools for digital signatures and message digests, including encryption, introduce a flexible, provider-based application programming interface. The Java Cryptography Architecture supports provider interoperability, as well as both hardware and software implementations.

Java cryptography design follows two principles: (1) algorithm independence and reliability; (2) implementation independence and interoperability. Algorithm independence is achieved by defining service classes for cryptographic functions: users only need to understand the concepts of the cryptographic algorithms, without having to care about how these concepts are implemented. Implementation independence and interoperability are achieved via cryptographic service providers. A cryptographic service provider is one or more packages that implement one or more cryptographic services. Software developers implement the various algorithms against the defined interfaces and package them as a provider, and users can install different providers. To install and configure a provider, place its ZIP or JAR file on the CLASSPATH, then edit the Java security properties file to register the provider definition. Sun's version of the Java Runtime Environment supplies a default provider, named SUN.

3. Implementation in the Java environment

    1. Implementation of the encryption process

void idea_enc(int data11[], /* address of the 64-bit data block to encrypt */
              int key1[]) {
    int i;
    int tmp, x;
    int zz[] = new int[6];
    for (i = 0; i /* ... */
        for (int j = 0, box = i; j /* ... */

Because of Java's platform independence, its application on the Internet is very extensive. Java-based IDEA data encryption for transmission can be implemented on different platforms, and achieves simplicity, security and other advantages.

/*
Java Hashtable example.
This Java Hashtable example describes the basic operations performed on the Hashtable.
*/

import java.util.Hashtable;
import java.util.Enumeration;

public class HashtableExample {

    public static void main(String args[]) {

        // constructs a new empty hashtable with default initial capacity
        Hashtable hashtable = new Hashtable();

        /*
        To specify an initial capacity, use the following constructor:
        Hashtable hashtable = new Hashtable(100);

        To create a hashtable from a map, use the following constructor:
        Hashtable hashtable = new Hashtable(Map myMap);
        */

        hashtable.put("One", new Integer(1)); // adding value to Hashtable
        hashtable.put("Two", new Integer(2));
        hashtable.put("Three", new Integer(3));

        /*
        IMPORTANT: We CAN NOT add primitives to the Hashtable. We have to wrap
        them into one of the wrapper classes before adding.
        */

        /*
        To copy all key-value pairs from a Map to a Hashtable, use the putAll method.

        Signature of the putAll method is:
        void putAll(Map m)
        */

        // get number of keys present in the hashtable
        System.out.println("Hashtable contains " + hashtable.size() + " key value pairs.");

        /*
        To check whether the Hashtable is empty or not, use the isEmpty() method.
        isEmpty() returns true if the Hashtable is empty, otherwise false.
        */

        /*
        Finding a particular value in the Hashtable:

        Hashtable's contains method returns a boolean depending upon the
        presence of the value in the given hashtable.

        Signature of the contains method is:
        boolean contains(Object value)
        */

        if (hashtable.contains(new Integer(1))) {
            System.out.println("Hashtable contains 1 as value");
        } else {
            System.out.println("Hashtable does not contain 1 as value");
        }

        /*
        Finding a particular key in the Hashtable:

        Hashtable's containsKey method returns a boolean depending upon the
        presence of the key in the given hashtable.

        Signature of the method is:
        boolean containsKey(Object key)
        */

        if (hashtable.containsKey("One")) {
            System.out.println("Hashtable contains One as key");
        } else {
            System.out.println("Hashtable does not contain One as key");
        }

        /*
        Use the get method of Hashtable to get the value mapped to a particular key.

        Signature of the get method is:
        Object get(Object key)
        */

        Integer one = (Integer) hashtable.get("One");

        System.out.println("Value mapped with key \"One\" is " + one);

        /*
        IMPORTANT: the get method returns an Object, so we need to downcast it.
        */

        /*
        To get all keys stored in the Hashtable, use the keys method.

        Signature of the keys method is:
        Enumeration keys()

        To get a Set of the keys, use the keySet() method instead:
        Set keySet()
        */

        System.out.println("Retrieving all keys from the Hashtable");

        Enumeration e = hashtable.keys();

        while (e.hasMoreElements()) {
            System.out.println(e.nextElement());
        }

        /*
        To get all values stored in the Hashtable, use the elements method.

        Signature of the elements method is:
        Enumeration elements()

        To get a Set of the values, use the entrySet() method instead:
        Set entrySet()
        */

        System.out.println("Retrieving all values from the Hashtable");

        e = hashtable.elements();

        while (e.hasMoreElements()) {
            System.out.println(e.nextElement());
        }

        /*
        To remove a particular key-value pair from the Hashtable, use the remove method.

        Signature of the remove method is:
        Object remove(Object key)

        This method returns the value that was mapped to the given key, or null
        if the mapping was not found.
        */

        System.out.println(hashtable.remove("One") + " is removed from the Hashtable.");
    }
}

/*
OUTPUT of the above given Java Hashtable Example would be:

Hashtable contains 3 key value pairs.
Hashtable contains 1 as value
Hashtable contains One as key
Value mapped with key "One" is 1
Retrieving all keys from the Hashtable
One
Three
Two
Retrieving all values from the Hashtable
1
3
2
1 is removed from the Hashtable.
*/

Hashtable was part of the original java.util and is a concrete implementation of a Dictionary. However, Java 2 reengineered Hashtable so that it also implements the Map interface. Thus, Hashtable is now integrated into the collections framework. It is similar to HashMap, but is synchronized.

Like HashMap, Hashtable stores key/value pairs in a hash table. When using a Hashtable, you specify an object that is used as a key, and the value that you want linked to that key. The key is then hashed, and the resulting hash code is used as the index at which the value is stored within the table.

A hash table can only store objects that override the hashCode() and equals() methods that are defined by Object. The hashCode() method must compute and return the hash code for the object. Of course, equals() compares two objects. Fortunately, many of Java's built-in classes already implement the hashCode() method. For example, the most common type of Hashtable uses a String object as the key. String implements both hashCode() and equals().


The Hashtable constructors are shown here:

Hashtable()
Hashtable(int size)
Hashtable(int size, float fillRatio)
Hashtable(Map m)

The first version is the default constructor. The second version creates a hash table that has an initial size specified by size. The third version creates a hash table that has an initial size specified by size and a fill ratio specified by fillRatio. This ratio must be between 0.0 and 1.0, and it determines how full the hash table can be before it is resized upward. Specifically, when the number of elements is greater than the capacity of the hash table multiplied by its fill ratio, the hash table is expanded. If you do not specify a fill ratio, then 0.75 is used. Finally, the fourth version creates a hash table that is initialized with the elements in m. The capacity of the hash table is set to twice the number of elements in m. The default load factor of 0.75 is used. The fourth constructor was added by Java 2.

The following example uses a Hashtable to store the names of bank depositors and their current balances:

// Demonstrate a Hashtable
import java.util.*;

class HTDemo {
    public static void main(String args[]) {
        Hashtable balance = new Hashtable();
        Enumeration names;
        String str;
        double bal;

        balance.put("John Doe", new Double(3434.34));
        balance.put("Tom Smith", new Double(123.22));
        balance.put("Jane Baker", new Double(1378.00));
        balance.put("Todd Hall", new Double(99.22));
        balance.put("Ralph Smith", new Double(-19.08));

        // Show all balances in hash table.
        names = balance.keys();
        while (names.hasMoreElements()) {
            str = (String) names.nextElement();
            System.out.println(str + ": " + balance.get(str));
        }
        System.out.println();

        // Deposit 1,000 into John Doe's account
        bal = ((Double) balance.get("John Doe")).doubleValue();
        balance.put("John Doe", new Double(bal + 1000));
        System.out.println("John Doe's new balance: " + balance.get("John Doe"));
    }
}


The output from this program is shown here:

Ralph Smith: -19.08
Tom Smith: 123.22
John Doe: 3434.34
Todd Hall: 99.22
Jane Baker: 1378.0

John Doe's new balance: 4434.34

One important point: like the map classes, Hashtable does not directly support iterators. Thus, the preceding program uses an enumeration to display the contents of balance. However, you can obtain set-views of the hash table, which permits the use of iterators. To do so, you simply use one of the collection-view methods defined by Map, such as entrySet() or keySet(). For example, you can obtain a set-view of the keys and iterate through them. Here is a reworked version of the program that shows this technique:

// Use iterators with a Hashtable.
import java.util.*;

class HTDemo2 {
    public static void main(String args[]) {
        Hashtable balance = new Hashtable();
        String str;
        double bal;

        balance.put("John Doe", new Double(3434.34));
        balance.put("Tom Smith", new Double(123.22));
        balance.put("Jane Baker", new Double(1378.00));
        balance.put("Todd Hall", new Double(99.22));
        balance.put("Ralph Smith", new Double(-19.08));

        // show all balances in hashtable
        Set set = balance.keySet(); // get set-view of keys

        // get iterator
        Iterator itr = set.iterator();
        while (itr.hasNext()) {
            str = (String) itr.next();
            System.out.println(str + ": " + balance.get(str));
        }
        System.out.println();

        // Deposit 1,000 into John Doe's account
        bal = ((Double) balance.get("John Doe")).doubleValue();
        balance.put("John Doe", new Double(bal + 1000));
        System.out.println("John Doe's new balance: " + balance.get("John Doe"));
    }
}


    Hash table

    From Wikipedia, the free encyclopedia


    Not to be confused with Hash list or Hash tree.

"Unordered map" redirects here. For the proposed C++ class, see unordered_map (C++).

Figure: A small phone book as a hash table.

In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought.

Ideally, the hash function should map each possible key to a unique slot index, but this ideal is rarely achievable in practice (unless the hash keys are fixed; i.e. new entries are never added to the table after it is created). Instead, most hash table designs assume that hash collisions (different keys that map to the same hash value) will occur and must be accommodated in some way.
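In Java, such collisions are easy to demonstrate: the strings "Aa" and "BB" have the same hashCode(), yet a hash map still keeps them separate by also comparing keys with equals():

```java
import java.util.HashMap;

public class CollisionDemo {
    public static void main(String[] args) {
        // Distinct keys can share a hash value; the table must handle this.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        // Both keys land in the same bucket, but lookups still work,
        // because the bucket entries are disambiguated via equals().
        HashMap<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa") + " " + map.get("BB")); // 1 2
    }
}
```

The collision follows from String's hash formula (31 * h + c per character): 'A' * 31 + 'a' = 'B' * 31 + 'B' = 2112.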

In a well-dimensioned hash table, the average cost (number of instructions) for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key-value pairs, at constant average (indeed, amortized[1]) cost per operation.[2][3]

In many situations, hash tables turn out to be more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.


Contents

1 Hash function
  1.1 Choosing a good hash function
  1.2 Perfect hash function
2 Collision resolution
  2.1 Load factor
  2.2 Separate chaining
    2.2.1 Separate chaining with list heads
    2.2.2 Separate chaining with other structures
  2.3 Open addressing
  2.4 Coalesced hashing
  2.5 Robin Hood hashing
  2.6 Cuckoo hashing
  2.7 Hopscotch hashing
3 Dynamic resizing
  3.1 Resizing by copying all entries
  3.2 Incremental resizing
  3.3 Monotonic keys
  3.4 Other solutions
4 Performance analysis
5 Features
  5.1 Advantages
  5.2 Drawbacks
6 Uses
  6.1 Associative arrays
  6.2 Database indexing
  6.3 Caches
  6.4 Sets
  6.5 Object representation
  6.6 Unique data representation
  6.7 String interning
7 Implementations
  7.1 In programming languages
  7.2 Independent packages
8 History
9 See also
  9.1 Related data structures
10 References
11 Further reading
12 External links

Hash function

    Main article: Hash function

esentationhttp://en.wikipedia.org/wiki/Hash_table#Unique_data_representationhttp://en.wikipedia.org/wiki/Hash_table#String_interninghttp://en.wikipedia.org/wiki/Hash_table#Implementationshttp://en.wikipedia.org/wiki/Hash_table#In_programming_languageshttp://en.wikipedia.org/wiki/Hash_table#Independent_packageshttp://en.wikipedia.org/wiki/Hash_table#Historyhttp://en.wikipedia.org/wiki/Hash_table#See_alsohttp://en.wikipedia.org/wiki/Hash_table#Related_data_structureshttp://en.wikipedia.org/wiki/Hash_table#Referenceshttp://en.wikipedia.org/wiki/Hash_table#Further_readinghttp://en.wikipedia.org/wiki/Hash_table#External_linkshttp://en.wikipedia.org/w/index.php?title=Hash_table&action=edit&section=1http://en.wikipedia.org/wiki/Hash_function
  • 8/7/2019 Symmetric and Aysmmetric Algorithms


    At the heart of the hash table algorithm is a simple array of items; this is often simply called the hash table. Hash table algorithms calculate an index from the data item's key and use this index

    to place the data into the array. The implementation of this calculation is the hash function, f:

    index = f(key, arrayLength)

    The hash function calculates an index within the array from the data key. arrayLength is the

    size of the array. For assembly language or other low-level programs, a trivial hash function can

    often create an index with just one or two inline machine instructions.
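As a concrete sketch of `index = f(key, arrayLength)`, here is a minimal modulo-based hash function in Python; the function name and the polynomial mixing constant are illustrative, not taken from the text:

```python
def bucket_index(key: str, array_length: int) -> int:
    """Map a string key to an index in [0, array_length)."""
    h = 0
    for ch in key:
        # A simple polynomial rolling hash; real tables use stronger mixing.
        h = (h * 31 + ord(ch)) % (2 ** 32)
    return h % array_length

index = bucket_index("John Smith", 16)
assert 0 <= index < 16
```

The final modulo reduction is what turns an arbitrarily large hash value into a valid array index.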

    Choosing a good hash function

    A good hash function and implementation algorithm are essential for good hash table

    performance, but may be difficult to achieve. Poor hashing usually degrades hash table

    performance by a constant factor, but hashing is often only a small part of the overall computation.

    A basic requirement is that the function should provide a uniform distribution of hash values. A

    non-uniform distribution increases the number of collisions, and the cost of resolving them.

    Uniformity is sometimes difficult to ensure by design, but may be evaluated empirically usingstatistical tests, e.g., a chi-squared test.[4]

    The distribution needs to be uniform only for table sizes s that occur in the application. In

    particular, if one uses dynamic resizing with exact doubling and halving of s, the hash function

    needs to be uniform only when s is a power of two. On the other hand, some hashing algorithms provide uniform hashes only when s is a prime number.[5]

    For open addressing schemes, the hash function should also avoid clustering, the mapping of two or more keys to consecutive slots. Such clustering may cause the lookup cost to skyrocket, even if the load factor is low and collisions are infrequent. The popular multiplicative hash[2] is

    claimed to have particularly poor clustering behavior.[5]

    Cryptographic hash functions are believed to provide good hash functions for any table size s,

    either by modulo reduction or by bit masking. They may also be appropriate if there is a risk of malicious users trying to sabotage a network service by submitting requests designed to generate

    a large number of collisions in the server's hash tables. However, these presumed

    qualities are hardly worth their much larger computational cost and algorithmic complexity, and

    the risk of sabotage can be avoided by cheaper methods (such as applying a secret salt to the

    data, or using a universal hash function).

    Some authors claim that good hash functions should have the avalanche effect; that is, a single-bit change in the input key should affect, on average, half the bits in the output. Some popular hash functions do not have this property.[4]


    Perfect hash function

    If all keys are known ahead of time, a perfect hash function can be used to create a perfect hash table that has no collisions. If minimal perfect hashing is used, every location in the hash table

    can be used as well.

    Perfect hashing allows for constant time lookups in the worst case. This is in contrast to most

    chaining and open addressing methods, where the time for lookup is low on average, but may bevery large (proportional to the number of entries) for some sets of keys.

    Collision resolution

    Hash collisions are practically unavoidable when hashing a random subset of a large set of

    possible keys. For example, if 2,500 keys are hashed into a million buckets, even with a perfectly

    uniform random distribution, according to the birthday problem there is a 95% chance of at least two of the keys being hashed to the same slot.
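The 95% figure can be checked with the standard birthday-problem approximation P ≈ 1 − exp(−k(k−1)/(2n)) for k keys and n buckets; a quick sketch:

```python
import math

def collision_probability(keys: int, buckets: int) -> float:
    """Approximate probability that at least two keys land in the same bucket."""
    return 1.0 - math.exp(-keys * (keys - 1) / (2.0 * buckets))

p = collision_probability(2500, 1_000_000)
print(round(p, 3))  # about 0.96, i.e. roughly the 95% chance quoted above
```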

    Therefore, most hash table implementations have some collision resolution strategy to handle

    such events. Some common strategies are described below. All these methods require that thekeys (or pointers to them) be stored in the table, together with the associated values.

    Load factor

    The performance of most collision resolution methods does not depend directly on the number n

    of stored entries, but depends strongly on the table's load factor, the ratio n/s between n and the

    size s of its bucket array. Sometimes this is referred to as the fill factor, as it represents the

    portion of the s buckets in the structure that are filled with one of the n stored entries. With a

    good hash function, the average lookup cost is nearly constant as the load factor increases from 0 up to 0.7 (about 2/3 full) or so. Beyond that point, the probability of collisions and the cost of handling them increases.

    On the other hand, as the load factor approaches zero, the size of the hash table increases with little improvement in the search cost, and memory is wasted.
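The load factor is just the ratio n/s described above; a tiny illustration:

```python
def load_factor(num_entries: int, num_buckets: int) -> float:
    """n/s: the average number of stored entries per bucket."""
    return num_entries / num_buckets

# 7000 entries spread over 10000 buckets sits right at the 0.7 mark,
# the point beyond which collision-handling costs start to climb.
assert load_factor(7000, 10000) == 0.7
```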


    Separate chaining

    Hash collision resolved by separate chaining.

    In the strategy known as separate chaining, direct chaining, or simply chaining, each slot of the

    bucket array is a pointer to a linked list that contains the key-value pairs that hashed to the same

    location. Lookup requires scanning the list for an entry with the given key. Insertion requires

    adding a new entry record to either end of the list belonging to the hashed slot. Deletion requires searching the list and removing the element. (The technique is also called open hashing or closed

    addressing, which should not be confused with 'open addressing' or 'closed hashing'.)

    Chained hash tables with linked lists are popular because they require only basic data structures

    with simple algorithms, and can use simple hash functions that are unsuitable for other methods.

    The cost of a table operation is that of scanning the entries of the selected bucket for the desired

    key. If the distribution of keys is sufficiently uniform, the average cost of a lookup depends only

    on the average number of keys per bucket; that is, on the load factor.

    Chained hash tables remain effective even when the number of table entries n is much higher than the number of slots. Their performance degrades more gracefully (linearly) with the load

    factor. For example, a chained hash table with 1000 slots and 10,000 stored keys (load factor 10)is five to ten times slower than a 10,000-slot table (load factor 1); but still 1000 times faster thana plain sequential list, and possibly even faster than a balanced search tree.

    For separate-chaining, the worst-case scenario is when all entries were inserted into the same

    bucket, in which case the hash table is ineffective and the cost is that of searching the bucket data

    structure. If the latter is a linear list, the lookup procedure may have to scan all its entries; so the worst-case cost is proportional to the number n of entries in the table.
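A minimal separate-chaining table in Python, using plain lists as the chains (illustrative only; class and method names are mine, and Python's built-in dict actually uses open addressing):

```python
class ChainedHashTable:
    """Each bucket holds a list of (key, value) pairs hashed to that slot."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # otherwise append to the chain

    def get(self, key):
        for k, v in self._bucket(key):    # lookup = scan the selected chain
            if k == key:
                return v
        raise KeyError(key)

t = ChainedHashTable()
t.put("Lisa Smith", "521-8976")
t.put("Lisa Smith", "521-0000")           # update in place
assert t.get("Lisa Smith") == "521-0000"
```

Note that the cost of `get` is exactly the chain scan the text describes: constant on average at low load factors, O(n) if every key lands in one bucket.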


    The bucket chains are often implemented as ordered lists, sorted by the key field; this choice

    approximately halves the average cost of unsuccessful lookups, compared to an unordered

    list. However, if some keys are much more likely to come up than others, an unordered list with a move-to-front heuristic may be more effective. More sophisticated data structures, such

    as balanced search trees, are worth considering only if the load factor is large (about 10 or more),

    or if the hash distribution is likely to be very non-uniform, or if one must guarantee good performance even in the worst case. However, using a larger table and/or a better hash function

    may be even more effective in those cases.

    Chained hash tables also inherit the disadvantages of linked lists. When storing small keys and

    values, the space overhead of the next pointer in each entry record can be significant. An

    additional disadvantage is that traversing a linked list has poor cache performance, making the

    processor cache ineffective.

    Separate chaining with list heads

    Hash collision by separate chaining with head records in the bucket array.

    Some chaining implementations store the first record of each chain in the slot array itself.[3] The purpose is to increase cache efficiency of hash table access. To save memory space, such hash

    tables often have about as many slots as stored entries, meaning that many slots have two or

    more entries.

    Separate chaining with other structures

    Instead of a list, one can use any other data structure that supports the required operations. For example, by using a self-balancing tree, the theoretical worst-case time of common hash table

    operations (insertion, deletion, lookup) can be brought down to O(log n) rather than O(n).

    However, this approach is only worth the trouble and extra memory cost if long delays must be


    avoided at all costs (e.g. in a real-time application), or if one expects to have many entries hashed

    to the same slot (e.g. if one expects extremely non-uniform or even malicious key distributions).

    The variant called array hash table uses a dynamic array to store all the entries that hash to the same slot.[6][7][8] Each newly inserted entry gets appended to the end of the dynamic array that is

    assigned to the slot. The dynamic array is resized in an exact-fit manner, meaning it is grown only by as many bytes as needed. Alternative techniques such as growing the array by block

    sizes or pages were found to improve insertion performance, but at a cost in space. This variation makes more efficient use of CPU caching and the translation lookaside buffer (TLB), because

    slot entries are stored in sequential memory positions. It also dispenses with the next pointers

    that are required by linked lists, which saves space. Despite frequent array resizing, space overheads incurred by the operating system, such as memory fragmentation, were found to be small.

    An elaboration on this approach is the so-called dynamic perfect hashing,[9] where a bucket that

    contains k entries is organized as a perfect hash table with k² slots. While it uses more memory

    (n² slots for n entries, in the worst case), this variant has guaranteed constant worst-case lookup

    time, and low amortized time for insertion.

    Open addressing

    Hash collision resolved by open addressing with linear probing (interval=1). Note

    that "Ted Baker" has a unique hash, but nevertheless collided with "Sandra Dee"

    that had previously collided with "John Smith".

    In another strategy, called open addressing, all entry records are stored in the bucket array itself.

    When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot


    and proceeding in some probe sequence, until an unoccupied slot is found. When searching for

    an entry, the buckets are scanned in the same sequence, until either the target record is found, or

    an unused array slot is found, which indicates that there is no such key in the table.[10] The name"open addressing" refers to the fact that the location ("address") of the item is not determined by

    its hash value. (This method is also called closed hashing; it should not be confused with "open

    hashing" or "closed addressing" that usually mean separate chaining.)

    Well-known probe sequences include:

    Linear probing, in which the interval between probes is fixed (usually 1)

    Quadratic probing, in which the interval between probes is increased by adding the successive outputs of a quadratic polynomial to the starting value given by the original hash computation

    Double hashing, in which the interval between probes is computed by another hash function
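A sketch of insertion and lookup under linear probing (interval 1); the class is illustrative, and deletion is deliberately omitted because open addressing needs tombstones to delete correctly:

```python
class LinearProbingTable:
    _EMPTY = object()   # sentinel marking an unused slot

    def __init__(self, num_slots: int = 16):
        self.keys = [self._EMPTY] * num_slots
        self.values = [None] * num_slots

    def put(self, key, value):
        n = len(self.keys)
        i = hash(key) % n
        for _ in range(n):                      # probe at most n slots
            if self.keys[i] is self._EMPTY or self.keys[i] == key:
                self.keys[i], self.values[i] = key, value
                return
            i = (i + 1) % n                     # linear probe: try the next slot
        raise RuntimeError("table full")        # entries cannot exceed slots

    def get(self, key):
        n = len(self.keys)
        i = hash(key) % n
        for _ in range(n):
            if self.keys[i] is self._EMPTY:     # empty slot ends the probe:
                raise KeyError(key)             # the key is not in the table
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) % n
        raise KeyError(key)

t = LinearProbingTable(num_slots=4)
t.put("a", 1)
t.put("b", 2)
assert t.get("a") == 1 and t.get("b") == 2
```

The `RuntimeError` branch makes concrete the drawback noted below: an open-addressing table can never hold more entries than it has slots.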

    A drawback of all these open addressing schemes is that the number of stored entries cannot

    exceed the number of slots in the bucket array. In fact, even with good hash functions, their performance seriously degrades when the load factor grows beyond 0.7 or so. For many

    applications, these restrictions mandate the use of dynamic resizing, with its attendant costs.

    Open addressing schemes also put more stringent requirements on the hash function: besides distributing the keys more uniformly over the buckets, the function must also minimize the

    clustering of hash values that are consecutive in the probe order. Even experienced programmers

    may find such clustering hard to avoid.

    Open addressing only saves memory if the entries are small (less than 4 times the size of a pointer) and the load factor is not too small. If the load factor is close to zero (that is, there are

    far more buckets than stored entries), open addressing is wasteful even if each entry is just two words.


    This graph compares the average number of cache misses required to look up elements in tables with chaining and linear probing. As the table passes the 80%-full mark, linear probing's performance drastically degrades.

    Open addressing avoids the time overhead of allocating each new entry record, and can be

    implemented even in the absence of a memory allocator. It also avoids the extra indirection required to access the first entry of each bucket (that is, usually the only one). It also has better locality of reference, particularly with linear probing. With small record sizes, these factors can

    yield better performance than chaining, particularly for lookups.

    Hash tables with open addressing are also easier to serialize, because they do not use pointers.

    On the other hand, normal open addressing is a poor choice for large elements, because these

    elements fill entire CPU cache lines (negating the cache advantage), and a large amount of space is wasted on large empty table slots. If the open addressing table only stores references to

    elements (external storage), it uses space comparable to chaining even for large records but loses

    its speed advantage.

    Generally speaking, open addressing is better used for hash tables with small records that can be stored within the table (internal storage) and fit in a cache line. They are particularly suitable for

    elements of one word or less. If the table is expected to have a high load factor, the records are

    large, or the data is variable-sized, chained hash tables often perform as well or better.

    Ultimately, used sensibly, any kind of hash table algorithm is usually fast enough; and the

    percentage of a calculation spent in hash table code is low. Memory usage is rarely considered

    excessive. Therefore, in most cases the differences between these algorithms are marginal, and

    other considerations typically come into play.

    Coalesced hashing

    A hybrid of chaining and open addressing, coalesced hashing links together chains of nodes

    within the table itself.[10] Like open addressing, it achieves space usage and (somewhat diminished) cache advantages over chaining. Like chaining, it does not exhibit clustering effects;

    in fact, the table can be efficiently filled to a high density. Unlike chaining, it cannot have more

    elements than table slots.

    Robin Hood hashing

    One interesting variation on double-hashing collision resolution is Robin Hood hashing.[11] The idea is that a new key may displace a key already inserted, if its probe count is larger than that of the key at the current position. The net effect of this is that it reduces worst case search times in

    the table. This is similar to Knuth's ordered hash tables except that the criterion for bumping a

    key does not depend on a direct relationship between the keys. Since both the worst case and the variation in the number of probes is reduced dramatically, an interesting variation is to probe the

    table starting at the expected successful probe value and then expand from that position in both

    directions.[12] External Robin Hashing is an extension of this algorithm where the table is stored


    in an external file and each table position corresponds to a fixed-sized page or bucket with B records.[13]

    Cuckoo hashing

    Another alternative open-addressing solution is cuckoo hashing, which ensures constant lookuptime in the worst case, and constant amortized time for insertions and deletions. It uses two or

    more hash functions, which means any key/value pair could be in two or more locations. For lookup, the first hash function is used; if the key/value is not found, then the second hash

    function is used, and so on. If a collision happens during insertion, then the key is re-hashed with

    the second hash function to map it to another bucket. If all hash functions are used and there is still a collision, then the key it collided with is removed to make space for the new key, and the

    old key is re-hashed with one of the other hash functions, which maps it to another bucket. If that

    location also results in a collision, then the process repeats until there is no collision or the process traverses all the buckets, at which point the table is resized. By combining multiple hash

    functions with multiple cells per bucket, very high space utilisation can be achieved.
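The displacement loop can be sketched with two tables and two hash functions (a minimal illustration assuming distinct keys; the class name, the seed constants, and the fixed kick limit are all my own choices, and a real implementation would resize on failure instead of raising):

```python
class CuckooHashTable:
    """Two slot arrays, two hash functions; each key lives in one of two slots."""

    def __init__(self, num_slots: int = 16):
        self.n = num_slots
        self.tables = [[None] * num_slots, [None] * num_slots]
        self.seeds = (0x9E3779B9, 0x85EBCA6B)   # arbitrary mixing constants

    def _slot(self, which: int, key) -> int:
        return hash((self.seeds[which], key)) % self.n

    def get(self, key):
        for which in (0, 1):                    # worst case: exactly two probes
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        raise KeyError(key)

    def put(self, key, value, max_kicks: int = 32):
        entry = (key, value)
        which = 0
        for _ in range(max_kicks):
            i = self._slot(which, entry[0])
            # Place the entry; whatever occupied the slot is displaced.
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:                   # displaced nothing: done
                return
            which = 1 - which                   # re-home the evicted entry
        raise RuntimeError("eviction cycle; a real table would resize here")

t = CuckooHashTable()
t.put("x", 1)
t.put("y", 2)
assert t.get("x") == 1 and t.get("y") == 2
```

The two-probe `get` is what gives cuckoo hashing its constant worst-case lookup time, regardless of how many evictions insertion performed.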

    Hopscotch hashing

    Another alternative open-addressing solution is hopscotch hashing,[14] which combines the

    approaches of cuckoo hashing and linear probing, yet seems in general to avoid their limitations.

    In particular it works well even when the load factor grows beyond 0.9. The algorithm is well suited for implementing a resizable concurrent hash table.

    The hopscotch hashing algorithm works by defining a neighborhood of buckets near the original

    hashed bucket, where a given entry is always found. Thus, search is limited to the number of

    entries in this neighborhood, which is logarithmic in the worst case, constant on average, and

    with proper alignment of the neighborhood typically requires one cache miss. When inserting an entry, one first attempts to add it to a bucket in the neighborhood. However, if all buckets in this

    neighborhood are occupied, the algorithm traverses buckets in sequence until an open slot (an

    unoccupied bucket) is found (as in linear probing). At that point, since the empty bucket is outside the neighborhood, items are repeatedly displaced in a sequence of hops. (This is similar

    to cuckoo hashing, but with the difference that in this case the empty slot is being moved into the

    neighborhood, instead of items being moved out with the hope of eventually finding an empty slot.) Each hop brings the open slot closer to the original neighborhood, without invalidating the

    neighborhood property of any of the buckets along the way. In the end, the open slot has been

    moved into the neighborhood, and the entry being inserted can be added to it.

    Dynamic resizing

    To keep the load factor under a certain limit, e.g. under 3/4, many table implementations expand the table when items are inserted. For example, in Java's HashMap class the default load factor

    threshold for table expansion is 0.75. Since buckets are usually implemented on top of a dynamic

    array and any constant proportion for resizing greater than 1 will keep the load factor under the

  • 8/7/2019 Symmetric and Aysmmetric Algorithms

    25/30

    desired limit, the exact choice of the constant is determined by the samespace-time tradeoffas

    fordynamic arrays.
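As a concrete sketch of this policy (the class and method names below are illustrative, not any particular library's API), a chained hash table can double its bucket array whenever an insertion would push the load factor past 0.75:

```python
# Minimal chained hash table that doubles its bucket array when the
# load factor (entries / buckets) would exceed 0.75, mirroring the
# Java HashMap default mentioned above. Illustrative sketch only.

class ChainedHashTable:
    LOAD_FACTOR_LIMIT = 0.75

    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0

    def _index(self, key, capacity=None):
        return hash(key) % (capacity or len(self.buckets))

    def put(self, key, value):
        if (self.size + 1) / len(self.buckets) > self.LOAD_FACTOR_LIMIT:
            self._resize(2 * len(self.buckets))   # keep load factor bounded
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)          # update existing key
                return
        bucket.append((key, value))
        self.size += 1

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def _resize(self, new_capacity):
        # Full rehash: every existing entry is mapped to a new bucket.
        new_buckets = [[] for _ in range(new_capacity)]
        for bucket in self.buckets:
            for k, v in bucket:
                new_buckets[self._index(k, new_capacity)].append((k, v))
        self.buckets = new_buckets
```

After any sequence of insertions the load factor stays at or below the 0.75 limit, at the cost of occasional full rehashes.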

Resizing is accompanied by a full or incremental table rehash whereby existing items are mapped to new bucket locations.

To limit the proportion of memory wasted due to empty buckets, some implementations also shrink the size of the table - followed by a rehash - when items are deleted. From the point of view of space-time tradeoffs, this operation is similar to the deallocation in dynamic arrays.

Resizing by copying all entries

A common approach is to automatically trigger a complete resizing when the load factor exceeds some threshold rmax. Then a new larger table is allocated, all the entries of the old table are removed and inserted into this new table, and the old table is returned to the free storage pool. Symmetrically, when the load factor falls below a second threshold rmin, all entries are moved to a new smaller table.

If the table size increases or decreases by a fixed percentage at each expansion, the total cost of these resizings, amortized over all insert and delete operations, is still a constant, independent of the number of entries n and of the number m of operations performed.

For example, consider a table that was created with the minimum possible size and is doubled each time the load ratio exceeds some threshold. If m elements are inserted into that table, the total number of extra re-insertions that occur in all dynamic resizings of the table is at most m - 1. In other words, dynamic resizing roughly doubles the cost of each insert or delete operation.
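The doubling argument can be checked with a small counting sketch (illustrative code, not a real table: it only tracks sizes, doubling whenever the table fills). When m is a power of two the copies come to exactly m - 1, and for other m they stay within a small constant factor of m, which is the amortized-constant point:

```python
# Count the extra re-insertions caused by doubling a table that starts
# at the minimum size 1: each resize copies every entry present at the
# time into the new, twice-as-large table.

def count_reinsertions(m):
    capacity, size, reinsertions = 1, 0, 0
    for _ in range(m):
        if size == capacity:            # table is full: double and rehash
            reinsertions += size        # every existing entry is copied
            capacity *= 2
        size += 1
    return reinsertions
```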

Incremental resizing

    Some hash table implementations, notably in real-time systems, cannot pay the price of enlarging

    the hash table all at once, because it may interrupt time-critical operations. If one cannot avoid

    dynamic resizing, a solution is to perform the resizing gradually:

During the resize, allocate the new hash table, but keep the old table unchanged.

    In each lookup or delete operation, check both tables. Perform insertion operations only in the new table. At each insertion also move r elements from the old table to the new table.

    When all elements are removed from the old table, deallocate it.

To ensure that the old table is completely copied over before the new table itself needs to be enlarged, it is necessary to increase the size of the table by a factor of at least (r + 1)/r during resizing.
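The procedure above, with r = 1 and a growth factor of 2 (which satisfies the (r + 1)/r condition), might be sketched as follows. The Python dicts here merely stand in for the two bucket arrays; the migration protocol, not the storage, is the point, and all names are illustrative:

```python
# Incremental resizing sketch: insert only into the new table, move one
# entry from the old table per insertion, and consult both on lookup.

class IncrementalHashTable:
    def __init__(self, capacity=8):
        self.table = {}         # the new (active) table
        self.old = None         # non-empty only while a resize is underway
        self.capacity = capacity

    def put(self, key, value):
        if self.old is None and len(self.table) >= self.capacity:
            # Start a gradual resize: keep the old table unchanged for now.
            self.old, self.table = self.table, {}
            self.capacity *= 2
        self.table[key] = value            # insert only into the new table
        if self.old:
            k, v = self.old.popitem()      # move r = 1 entry per insertion
            self.table.setdefault(k, v)    # don't clobber a newer value
            if not self.old:
                self.old = None            # old table drained: deallocate

    def get(self, key):
        # During a resize, a lookup must check both tables.
        if key in self.table:
            return self.table[key]
        if self.old and key in self.old:
            return self.old[key]
        raise KeyError(key)
```

Because the new table is twice the size of the old one, the old table is always fully drained before the new table fills up and triggers the next resize.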


Monotonic keys

If it is known that key values will always increase monotonically, then a variation of consistent hashing can be achieved by keeping a list of the single most recent key value at each hash table resize operation. Upon lookup, keys that fall in the ranges defined by these list entries are directed to the appropriate hash function (and indeed hash table), both of which can be different for each range. Since it is common to grow the overall number of entries by doubling, there will only be O(lg(N)) ranges to check, and binary search time for the redirection would be O(lg(lg(N))).
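A sketch of this redirection scheme (all names here are illustrative): each resize freezes the current table and records the most recent key, and a lookup binary-searches those boundaries to find the table that owns the key:

```python
import bisect

# One table per key range; boundaries[i] is the most recent key seen at
# the i-th resize. Keys at or below a boundary belong to the older table.

class MonotonicKeyTables:
    def __init__(self):
        self.boundaries = []   # sorted, since keys increase monotonically
        self.tables = [{}]     # one table (and, in principle, one hash
                               # function) per range

    def resize(self, last_key_seen):
        # Freeze the current range and start a new table for larger keys.
        self.boundaries.append(last_key_seen)
        self.tables.append({})

    def _table_for(self, key):
        # Binary search over the O(lg N) ranges to pick the owning table.
        return self.tables[bisect.bisect_left(self.boundaries, key)]

    def put(self, key, value):
        self._table_for(key)[key] = value

    def get(self, key):
        return self._table_for(key)[key]
```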

Other solutions

Linear hashing [15] is a hash table algorithm that permits incremental hash table expansion. It is implemented using a single hash table, but with two possible look-up functions.

Another way to decrease the cost of table resizing is to choose a hash function in such a way that the hashes of most values do not change when the table is resized. This approach, called consistent hashing, is prevalent in disk-based and distributed hashes, where rehashing is prohibitively costly.

Performance analysis

In the simplest model, the hash function is completely unspecified and the table does not resize. For the best possible choice of hash function, a table of size n with open addressing has no collisions and holds up to n elements, with a single comparison for successful lookup, and a table of size n with chaining and k keys has the minimum max(0, k - n) collisions and O(1 + k/n) comparisons for lookup. For the worst choice of hash function, every insertion causes a collision, and hash tables degenerate to linear search, with Θ(k) amortized comparisons per insertion and up to k comparisons for a successful lookup.

Adding rehashing to this model is straightforward. As in a dynamic array, geometric resizing by a factor of b implies that only k/b^i keys are inserted i or more times, so that the total number of insertions is bounded above by bk/(b - 1), which is O(k). By using rehashing to maintain k < n, tables using both chaining and open addressing can have unlimited elements and perform successful lookup in a single comparison for the best choice of hash function.

In more realistic models, the hash function is a random variable over a probability distribution of hash functions, and performance is computed on average over the choice of hash function. When this distribution is uniform, the assumption is called "simple uniform hashing" and it can be shown that hashing with chaining requires Θ(1 + k/n) comparisons on average for an unsuccessful lookup, and hashing with open addressing requires Θ(1/(1 - k/n)).[16] Both these bounds are constant, if we maintain k/n < c using table resizing, where c is a fixed constant less than 1.
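The Θ(1 + k/n) chaining bound has a simple deterministic core: an unsuccessful lookup scans one whole chain, and the chain lengths average to exactly k/n entries no matter how the hash function distributes the keys. A quick check of that averaging identity:

```python
# Build n = 64 chained buckets holding k = 200 keys. The average chain
# length - and hence the average unsuccessful-lookup cost measured in
# chain entries - is exactly k/n, whatever the hash function does.

def average_chain_length(buckets):
    return sum(len(b) for b in buckets) / len(buckets)

n, k = 64, 200
buckets = [[] for _ in range(n)]
for key in range(k):
    buckets[hash(key) % n].append(key)
```

The distribution of the k keys over buckets affects the variance of the lookup cost, not its average; that is why the simple-uniform-hashing analysis gives a constant bound once k/n is bounded.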


Features

Advantages

The main advantage of hash tables over other table data structures is speed. This advantage is more apparent when the number of entries is large (thousands or more). Hash tables are particularly efficient when the maximum number of entries can be predicted in advance, so that the bucket array can be allocated once with the optimum size and never resized.

If the set of key-value pairs is fixed and known ahead of time (so insertions and deletions are not allowed), one may reduce the average lookup cost by a careful choice of the hash function, bucket table size, and internal data structures. In particular, one may be able to devise a hash function that is collision-free, or even perfect (see below). In this case the keys need not be stored in the table.

Drawbacks

Hash tables can be more difficult to implement than self-balancing binary search trees.

    Choosing an effective hash function for a specific application is more an art than a science. Inopen-addressed hash tables it is fairly easy to create a poor hash function.

Although operations on a hash table take constant time on average, the cost of a good hash function can be significantly higher than the inner loop of the lookup algorithm for a sequential list or search tree. Thus hash tables are not effective when the number of entries is very small. (However, in some cases the high cost of computing the hash function can be mitigated by saving the hash value together with the key.)

For certain string processing applications, such as spell-checking, hash tables may be less efficient than tries, finite automata, or Judy arrays. Also, if each key is represented by a small enough number of bits, then, instead of a hash table, one may use the key directly as the index into an array of values. Note that there are no collisions in this case.
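For example, with 8-bit keys the key can serve directly as the array index (direct addressing), with no hash function and no possibility of collision:

```python
# Direct addressing: allocate one slot per possible key value, then use
# the key itself as the index. Works only when the key range is small.

def make_direct_table(key_bits):
    return [None] * (1 << key_bits)   # one slot per possible key

table = make_direct_table(8)          # keys are 8-bit integers, 0..255
table[65] = "A"
table[66] = "B"
```

The trade is memory for certainty: the table always has 2^bits slots, however few keys are actually present.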

The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order. Therefore, there is no efficient way to locate an entry whose key is nearest to a given key. Listing all n entries in some specific order generally requires a separate sorting step, whose cost is proportional to log(n) per entry. In comparison, ordered search trees have lookup and insertion cost proportional to log(n), but allow finding the nearest key at about the same cost, and ordered enumeration of all entries at constant cost per entry.

    If the keys are not stored (because the hash function is collision-free), there may be no easy way

    to enumerate the keys that are present in the table at any given moment.

Although the average cost per operation is constant and fairly small, the cost of a single operation may be quite high. In particular, if the hash table uses dynamic resizing, an insertion or deletion operation may occasionally take time proportional to the number of entries. This may be a serious drawback in real-time or interactive applications.


Hash tables in general exhibit poor locality of reference: that is, the data to be accessed is distributed seemingly at random in memory. Because hash tables cause access patterns that jump around, this can trigger microprocessor cache misses that cause long delays. Compact data structures such as arrays searched with linear search may be faster, if the table is relatively small and keys are integers or other short strings. According to Moore's Law, cache sizes are growing exponentially and so what is considered "small" may be increasing. The optimal performance point varies from system to system.

Hash tables become quite inefficient when there are many collisions. While extremely uneven hash distributions are extremely unlikely to arise by chance, a malicious adversary with knowledge of the hash function may be able to supply information to a hash that creates worst-case behavior by causing excessive collisions, resulting in very poor performance (e.g., a denial of service attack). In critical applications, either universal hashing can be used or a data structure with better worst-case guarantees may be preferable.[17]

Uses

Associative arrays

Hash tables are commonly used to implement many types of in-memory tables. They are used to implement associative arrays (arrays whose indices are arbitrary strings or other complicated objects), especially in interpreted programming languages like AWK, Perl, and PHP.

Database indexing

Hash tables may also be used for disk-based persistent data structures and database indices (such as dbm) although balanced trees are more popular in these applications.

Caches

Hash tables can be used to implement caches, auxiliary data tables that are used to speed up the access to data that is primarily stored in slower media. In this application, hash collisions can be handled by discarding one of the two colliding entries - usually the one that is currently stored in the table.
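A minimal sketch of such a cache (class and method names are illustrative): each slot holds at most one entry, and a colliding insertion simply evicts the previous occupant:

```python
# Collision-discarding cache: a fixed array of slots, one entry per
# slot. On collision the old entry is overwritten; a mismatched key on
# lookup is treated as a miss.

class DiscardingCache:
    def __init__(self, n_slots=1024):
        self.slots = [None] * n_slots

    def put(self, key, value):
        # The previous occupant of the slot, if any, is discarded.
        self.slots[hash(key) % len(self.slots)] = (key, value)

    def get(self, key):
        entry = self.slots[hash(key) % len(self.slots)]
        if entry is not None and entry[0] == key:
            return entry[1]
        return None   # miss: the caller refetches from the slower medium
```

Discarding is acceptable here precisely because a cache miss is never an error, only a cost.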

Sets

Besides recovering the entry that has a given key, many hash table implementations can also tell whether such an entry exists or not.

Those structures can therefore be used to implement a set data structure, which merely records whether a given key belongs to a specified set of keys. In this case, the structure can be simplified by eliminating all parts that have to do with the entry values. Hashing can be used to implement both static and dynamic sets.
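Stripping the value parts out of a chained hash table leaves exactly such a set structure (a minimal sketch; all names are illustrative):

```python
# A hash set is a hash table whose buckets store keys only: the table
# answers membership queries rather than key-to-value lookups.

class HashSet:
    def __init__(self, n_buckets=16):
        self.buckets = [[] for _ in range(n_buckets)]

    def add(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        if key not in bucket:      # sets hold each key at most once
            bucket.append(key)

    def __contains__(self, key):
        return key in self.buckets[hash(key) % len(self.buckets)]
```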


Object representation

Several dynamic languages, such as Perl, Python, JavaScript, and Ruby, use hash tables to implement objects. In this representation, the keys are the names of the members and methods of the object, and the values are pointers to the corresponding member or method.
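Python makes this representation directly observable: an ordinary object's members live in a per-instance hash table, its __dict__, mapping member names to values:

```python
# A plain Python object stores its members in a dict (a hash table)
# keyed by member name.

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(2, 3)
print(p.__dict__)        # the member table: {'x': 2, 'y': 3}
p.__dict__["z"] = 5      # inserting a key adds a member...
print(p.z)               # ...which is then reachable as an attribute
```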

Unique data representation

Hash tables can be used by some programs to avoid creating multiple character strings with the same contents. For that purpose, all strings in use by the program are stored in a single hash table, which is checked whenever a new string has to be created. This technique was introduced in Lisp interpreters under the name hash consing, and can be used with many other kinds of data (expression trees in a symbolic algebra system, records in a database, files in a file system, binary decision diagrams, etc.)

String interning

    Main article: String interning

Implementations

In programming languages

Many programming languages provide hash table functionality, either as built-in associative arrays or as standard library modules. In C++0x, for example, the hash_map and unordered_map classes provide hash tables for keys and values of arbitrary type.

In PHP 5, the Zend 2 engine uses one of the hash functions from Daniel J. Bernstein to generate the hash values used in managing the mappings of data pointers stored in a HashTable. In the PHP source code, it is labelled as "DJBX33A" (Daniel J. Bernstein, Times 33 with Addition).
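The "times 33 with addition" rule is Bernstein's classic string hash: start from 5381 and fold in each byte with h = h * 33 + byte. A Python sketch, truncated to 32 bits the way a C unsigned int would wrap (PHP's actual implementation differs in details such as word size):

```python
# Bernstein's times-33-with-addition string hash (the scheme PHP's
# "DJBX33A" label refers to), kept to 32 bits for illustration.

def djbx33a(data: bytes) -> int:
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & 0xFFFFFFFF   # multiply by 33, add the byte
    return h
```

The multiplier 33 is cheap (a shift and an add) yet mixes bytes well enough for typical identifier-like keys, which is why the scheme persists in string-heavy interpreters.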

Python's built-in hash table implementation, in the form of the dict type, and Perl's hash type (%) are highly optimized as they are used internally to implement namespaces.

Independent packages

Google SparseHash: The Google SparseHash project contains several C++ hash-map implementations in use at Google, with different performance characteristics, including an implementation that optimizes for memory use and one that optimizes for speed. The memory-optimized one is extremely memory-efficient with only 2 bits/entry of overhead.

SunriseDD: An open source C library for hash table storage of arbitrary data objects with lock-free lookups, built-in reference counting and guaranteed order iteration. The library can participate in external reference counting systems or use its own built-in reference counting. It comes with a variety of
