Data Structures Using C: 10.2.2 Filters

10.2.2 Filters

Frequently, a solution to a problem is readily found by starting with one or more files of data and repeatedly applying available programs to these files. These programs, in turn, produce output data in the form of files, which then become the input files for other programs. The basic idea behind this kind of solution is a filter: a program whose input is one file and whose output is another file. A solution to a problem may often be found by starting with an input file and applying a sequence of filters. Each filter takes as its input the output file of the preceding filter. System commands allow this composition of filters, so that complex programs can be generated with few commands.
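The composition of filters can be sketched in C. The following is a minimal illustration, not a system facility: the type filter_fn, the function apply_filters, and the two sample filters are hypothetical names introduced here, and the intermediate files are assumed to be temporary files created with the standard library function tmpfile.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* A filter reads one file and writes another; it returns 0 on success. */
typedef int (*filter_fn)(FILE *in, FILE *out);

/* Hypothetical filter: copies its input, converting letters to upper case. */
int to_upper_filter(FILE *in, FILE *out)
{
    int c;

    while ((c = fgetc(in)) != EOF)
        if (fputc(toupper(c), out) == EOF)
            return -1;
    return ferror(in) ? -1 : 0;
}

/* Hypothetical filter: copies its input, collapsing runs of spaces to one. */
int squeeze_spaces_filter(FILE *in, FILE *out)
{
    int c, prev = EOF;

    while ((c = fgetc(in)) != EOF) {
        if (!(c == ' ' && prev == ' ') && fputc(c, out) == EOF)
            return -1;
        prev = c;
    }
    return ferror(in) ? -1 : 0;
}

/* Apply the filters in order. Each filter reads the file written by the
   preceding one; the last filter writes to out. Returns 0 on success. */
int apply_filters(const filter_fn *filters, size_t count, FILE *in, FILE *out)
{
    FILE *current = in;
    size_t i;

    assert(filters != NULL && count > 0);

    for (i = 0; i < count; i++) {
        int last = (i + 1 == count);
        FILE *next = last ? out : tmpfile();
        int status = (next != NULL) ? filters[i](current, next) : -1;

        if (current != in)
            fclose(current);       /* intermediate file is no longer needed */
        if (status != 0) {
            if (next != NULL && !last)
                fclose(next);
            return -1;
        }
        if (!last)
            rewind(next);          /* the next filter reads from the start */
        current = next;
    }
    return 0;
}
```

Given an array holding squeeze_spaces_filter followed by to_upper_filter, a call to apply_filters runs the first filter into a temporary file and the second from that file into the final output. The caller remains responsible for opening and closing the original input and output files.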

Example 10.1

Suppose an input file contained the roll scores for the bowling problem of Chapter 1. The problem is to structure the solution so that processing this file produces an output file of statistics.

Suppose gamescores is a filter that takes the input file and produces an output file of game scores. Once the game scores are available, a statistical package in the computer's operating system library might be used to process the game scores as an input file (as data), producing appropriate statistics. Applying this program, or filter, to the game scores file would result in an output file containing the required information. This file could be stored for future use or processed by a printing routine, producing printed output directly.

The two filters, gamescores and statistics, applied sequentially, produce the desired output file. Note that this solution is written so that the calculation of the game scores is done first, independently of the calculations producing the statistics. In the solution of Chapter 1 these tasks were intertwined.
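To make the independence of the second step concrete, here is a rough sketch of what a statistics filter might look like in C. It rests on assumptions not stated in the text: the game scores file is taken to hold one integer score per game separated by whitespace, and the "required" statistics are taken to be the number of games, the mean, and the low and high scores. The name statistics_filter and the output layout are likewise illustrative. The gamescores filter is not shown, since it depends on the roll file format of Chapter 1.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical statistics filter. Reads whitespace-separated integer game
   scores from in and writes summary statistics to out. Reading stops at
   end of file or at the first token that is not an integer.
   Returns 0 on success, -1 if no scores were read or output failed. */
int statistics_filter(FILE *in, FILE *out)
{
    int score, count = 0, low = 0, high = 0;
    long total = 0;

    assert(in != NULL && out != NULL);

    while (fscanf(in, "%d", &score) == 1) {
        if (count == 0 || score < low)
            low = score;
        if (count == 0 || score > high)
            high = score;
        total += score;
        count++;
    }
    if (count == 0)
        return -1;                 /* nothing to summarize */

    fprintf(out, "games %d\nmean %.2f\nlow %d\nhigh %d\n",
            count, (double)total / count, low, high);
    return ferror(out) ? -1 : 0;
}
```

Nothing in this function knows how the game scores were computed. Any program that produces a file in the assumed format could precede it, which is the separation of tasks the example describes.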

The idea behind filters is to allow the construction of complex programs by sequencing simpler but powerful programs. The next example illustrates this more forcefully.

Example 10.2

Use filters to produce a simplified index for a book. The index should consist of the important words of the book in alphabetical order and the pages on which they appear.

One way to achieve a solution is to apply the following filters in sequence.

1. Word-count references

2. Truncate

3. Sort

The first filter, word-count references, takes a file consisting of the book text (organized as a sequence of pages) and produces an output file of records. Each record contains a word or sequence of words, the number of pages on which it appears, and the pages on which it appears. Truncate takes a file of records as its input and outputs a file that is identical to its input file, except that all records with a designated key field value greater than n are deleted. Here the key field contains the number of pages on which the word or sequence appears. A reasonable value for n might be 10. This should eliminate frequently occurring but irrelevant words such as "the," "a," "and," and so on. Sort takes an input file of records and creates an output file in which the records appear in sorted order by designated key field value, in this case the word field. The sequencing of these three relatively simple filters provides a quick and effective solution.
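The truncate and sort filters are simple enough to sketch in C. The version below is only an illustration under several assumptions: each record is one line of text of the form "word count page page ...", terminated by a newline; a word contains no spaces; and the limits MAX_LINE and MAX_RECORDS are arbitrary. The function names are hypothetical, and the word-count references filter is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE    256            /* assumed longest record, in characters */
#define MAX_RECORDS 1000           /* assumed largest number of records */

/* Truncate: copy every record whose page count (second field) is at most n.
   Returns 0 on success, -1 on a malformed record or an I/O error. */
int truncate_filter(FILE *in, FILE *out, int n)
{
    char line[MAX_LINE], word[MAX_LINE];
    int count;

    assert(n >= 0);

    while (fgets(line, sizeof line, in) != NULL) {
        if (sscanf(line, "%255s %d", word, &count) != 2)
            return -1;
        if (count <= n && fputs(line, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}

/* Order two records by their word field (the text before the first space). */
static int compare_records(const void *a, const void *b)
{
    const char *ra = a, *rb = b;
    size_t la = strcspn(ra, " "), lb = strcspn(rb, " ");
    int c = strncmp(ra, rb, la < lb ? la : lb);

    if (c != 0)
        return c;
    return (la > lb) - (la < lb);  /* the shorter word sorts first */
}

/* Sort: read all records, then write them in order of the word field.
   Returns 0 on success, -1 on too many records or an I/O error. */
int sort_filter(FILE *in, FILE *out)
{
    static char records[MAX_RECORDS][MAX_LINE];
    char line[MAX_LINE];
    size_t count = 0, i;

    while (fgets(line, sizeof line, in) != NULL) {
        if (count == MAX_RECORDS)
            return -1;
        strcpy(records[count++], line);
    }
    if (ferror(in))
        return -1;

    qsort(records, count, sizeof records[0], compare_records);

    for (i = 0; i < count; i++)
        if (fputs(records[i], out) == EOF)
            return -1;
    return 0;
}
```

Applying truncate_filter with n set to 10 and then sort_filter to its output gives the last two steps of the sequence. Because sort_filter must see every record before it can write any, it holds the whole file in memory; a file too large for that would call for an external sorting method instead.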

Using these system capabilities to store and access files of data or programs provides a framework in which to approach the entire task of problem solving. In effect, a computer installation provides a programming environment that determines the tools available for program construction. Such tools include text editors, data base management systems, file management utilities, ways to create a new program from existing programs already stored as files, and debugging aids. The enhancement of such programming environments is currently one of the major research and development areas of computer science. The command languages, also known as job control languages, provided by operating systems are used to operate on general files.