10.1: The File Data Structure

A file is a collection of records. This is a loose definition of a file. Almost all of the data structures discussed in the preceding chapters satisfy it. The term file, however, is usually reserved for large collections of information stored on devices outside the computer's internal memory. It usually implies that the records are stored in secondary storage in the computer's external memory, on tapes or disks. As a result, the ways in which the file must be organized so that operations on it can be carried out efficiently are dependent on the characteristics of the secondary storage devices used to implement the file. The basic operations on a file are to insert and delete records, process or update records, and search for or retrieve records. These operations are the same as the basic operations on arrays, lists, trees, list-structures, and more complex lists, which are stored in the computer's internal memory. In designing algorithms that use files stored in external memory, programmers must weigh the trade-offs between the time and the storage required to support these operations. They must do this just as they would for data structures stored in internal memory. Time spent in organizing information results in faster operations, but this efficiency must be weighed against the increased care required to nurture and maintain the better-organized data structure. For example, balanced trees are faster to use in the worst case but harder to grow. Similar effects pertain to file manipulations.