Data Structures Using C: 9.3.1 Building a Hash Table

9.3.1 Building a Hash Table

This section deals with a scaled-down version of the social security example. There are eighteen records, each with a key value that is a three-digit number. The records and their key values will be given in some order, normally unpredictable. We are to build a table, implemented in an array, in which the eighteen key values will be stored, so that searching can be done efficiently. One way to do this is to build a special kind of table?/FONT>a hash table. This table will then be searched for given key values. To build a hash table requires a hashing function and a collision resolution policy. A hashing function is a method of calculating an array or table address for any key. To be useful, the address must be quickly calculable.

The hash function should have other properties, as will become evident. The hash function used here assigns to any key the remainder after division of the key by the table size. If the table size is m and the key value is k, then k mod m denotes this remainder. The starting table or array size is 23.* The eighteen key values and their order are given in Table 9.2, along with the hashing function values assigned to each.

* With division-type hashing functions, the divisor, or table size, is usually a prime number greater than the number of entries to be placed in the table. We have taken this prime to be 23.

Table 9.2 Hash Addresses

               Hashing                Hashing

Key Value      Address  Key Value     Address

---------------------------------------------

   019     -->     19       468     -->    08

   392     -->     01       814     -->    09

   179     -->     18       720     -->    07

   359     -->     14       260     -->    07

   663     -->     19       802     -->    20

   262     -->     09       364     -->    19

   639     -->     18       976     -->    10

   321     -->     22       774     -->    15

   097     -->     05       566     -->    14

The hashing address for 019 is found, for example, by dividing 019 by the table size 23, to obtain 019 = 0 ?/FONT> 23 + 19. The remainder, 19, is the assigned hashing address. As each key value is input, its hashing address is calculated, and the key value is placed into that element of the table. After the first four key values are entered, the table is as shown in Figure 9.3(a).

When the fifth key value, 663, is input, its hashing address is 19, but element 19 of the table is already occupied by another key value. This event is called a collision. The collision resolution policy is the method used to determine where to store a value that has undergone collision. The most straightforward policy for resolving the conflict is called linear probing. It entails proceeding through the table from the collision element and placing 663, the colliding entry, into the first unoccupied element found. The hash table is assumed to have been initialized, so that each element was "unoccupied" or "empty." If the search for an empty element reaches the end of the table, it circles to the top of the table. In this case, 663 is entered at element 20. The final table is shown in Figure 9.3(b). A total of eight initial collisions occurred in building it.

Figure 9.3 Hash Table (a) after Four Entries and (b) Finally

Each time a location of the table is accessed in an attempt to insert a key value, one probe has been made. To enter the first key value required one probe; to enter the fifth key value required two probes. To enter keys 1 through 15 required, respectively, 1, 1, 1, 1, 2, 1, 4, 1, 1, 1, 2, 1, 5, 4, 7, 3, 1, 3 probes. You can build the table and confirm these probe numbers.