10.7.2 Random Access

We still have hash tables in our arsenal. Using hash tables in secondary storage requires some accommodation to its access capabilities. Buckets, an extension of the chaining policy for collision resolution with internal memory, are appropriate. The idea is to create a hash table based on key values. The hash table will contain pointers to lists. Each list, referred to as a bucket, contains all records that hash to the same hash table address. The lists are stored in blocks on the disk, each block holding a fixed number of records and a pointer to the next block containing other records on the list. When a key is inserted into the hash table, the program determines its hash address and then inserts the key into the bucket pointed to by that address. The key may be inserted into the bucket by traversing the list of linked blocks until a block is encountered with room for the record. If no such block is found, a new block is appended to the list, and the record is inserted in that block. Care must be taken to avoid wasting seek time (and latency time) when these lists are being traversed. This means that the blocks of a bucket should be on the same cylinder. Searching for a key is similar to making an insertion. Deletion involves the usual deletion from a list, with care taken to release unneeded blocks when possible. When small enough, the hash table itself is kept in internal memory to avoid one disk access.

In practice, such hash tables can be devised so that one or two disk accesses are required for either successful or unsuccessful searches. This is accomplished by selecting a good hash function, an appropriate fixed number of records per block, and an appropriate number of buckets. The usage factor of such a hash table is given by n/bk, where n is the number of records stored, b is the number of buckets, and k the fixed number of records per bucket. If the performance of the table degrades as insertions are made, the table may be reorganized.

Other interesting and important external hashing methods are presented in Larson [1976], Litwin [1960], and Fagin, Nievergelt, Pippenger, and Strong [1979].