Data Structures and Algorithms: Chapter 2: Basic Abstract Data Types

Let us now write the five queue commands using this representation for a queue. Formally, queues are defined by:

The commands appear in Fig. 2.22. The function addone(i) adds one to position i in the circular sense.
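As a rough illustration of the circular representation, the following sketch shows one way the declaration, addone, and the five commands might be written. The constant maxlength, the field names front and rear, the command names, and the error handling with halt are assumptions made for this sketch rather than a reproduction of Fig. 2.22.

```pascal
program CircularQueueSketch;
{ Sketch only: maxlength, the field names and the command names are assumed. }
const
  maxlength = 5;              { the array can hold at most maxlength - 1 elements }
type
  elementtype = integer;
  QUEUE = record
    elements: array[1..maxlength] of elementtype;
    front, rear: integer
  end;

function addone(i: integer): integer;
{ the position after i, wrapping from maxlength back to 1 }
begin
  addone := (i mod maxlength) + 1
end;

procedure MAKENULL(var Q: QUEUE);
begin
  Q.front := 1;
  Q.rear := maxlength         { rear one position behind front means "empty" }
end;

function EMPTY(var Q: QUEUE): boolean;
begin
  EMPTY := addone(Q.rear) = Q.front
end;

function FRONT(var Q: QUEUE): elementtype;
begin
  if EMPTY(Q) then
  begin
    writeln('queue is empty');
    halt(1)
  end;
  FRONT := Q.elements[Q.front]
end;

procedure ENQUEUE(x: elementtype; var Q: QUEUE);
begin
  if addone(addone(Q.rear)) = Q.front then
  begin
    writeln('queue is full');
    halt(1)
  end;
  Q.rear := addone(Q.rear);
  Q.elements[Q.rear] := x
end;

procedure DEQUEUE(var Q: QUEUE);
begin
  if EMPTY(Q) then
  begin
    writeln('queue is empty');
    halt(1)
  end;
  Q.front := addone(Q.front)
end;

var
  Q: QUEUE;
  k: integer;
begin
  MAKENULL(Q);
  for k := 1 to 4 do
    ENQUEUE(10 * k, Q);       { 10, 20, 30, 40 occupy positions 1 through 4 }
  DEQUEUE(Q);
  DEQUEUE(Q);
  ENQUEUE(50, Q);             { stored at position 5 }
  ENQUEUE(60, Q);             { rear wraps around to position 1 }
  while not EMPTY(Q) do
  begin
    writeln(FRONT(Q));        { prints 30, 40, 50, 60, one per line }
    DEQUEUE(Q)
  end
end.
```

Keeping one array position unused is what lets the test addone(rear) = front distinguish an empty queue from a full one.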

2.5 Mappings

A mapping or associative store is a function from elements of one type, called the domain type, to elements of another (possibly the same) type, called the range type. We express the fact that the mapping M associates element r of range type rangetype with element d of domain type domaintype by M(d) = r.

Certain mappings such as square(i) = i² can be implemented easily as a Pascal function by giving an arithmetic expression or other simple means for calculating M(d) from d. However, for many mappings there is no apparent way to describe M(d) other than to store for each d the value of M(d). For example, to implement a payroll function that associates with each employee a weekly salary seems to require that we store the current salary for each employee. In the remainder of this section we describe a method of implementing functions such as the "payroll" function.

Let us consider what operations we might wish to perform on a mapping M. Given an element d of some domain type, we may wish to obtain M(d) or know whether M(d) is defined (i.e., whether d is currently in the domain of M). Or we may wish to enter new elements into the current domain of M and state their associated range values. Alternatively, we might wish to change the value of M(d). We also need a way to initialize a mapping to the null mapping, the mapping whose domain is empty. These operations are summarized by the following three commands.

Array Implementation of Mappings

Often the domain type of a mapping will be an elementary type that can be used as the index type of an array. In Pascal, the index types include all the finite subranges of integers, like 1..100 or 17..23, the type char and subranges of char like 'A'..'Z', and enumerated types like (north, east, south, west). For example, a cipher-breaking program might keep a mapping crypt, with 'A'..'Z' as both its domain type and its range type, such that crypt(plaintext) is the letter currently guessed to stand for the letter plaintext.

Such mappings can be implemented simply by arrays, provided there is some range type value that can stand for "undefined." For example, the above mapping crypt might be defined to have range type char, rather than 'A'..'Z', and '?' could be used to denote "undefined."

Suppose the domain and range types are domaintype and rangetype, and domaintype is a basic Pascal type. Then we can define the type MAPPING (strictly speaking, mapping from domaintype to rangetype) by the declaration
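One way such a declaration and its operations might look is sketched below for the crypt example. The command names MAKENULL, ASSIGN, and COMPUTE, the constant undefined, and the choice of '?' as its value are assumptions made for this sketch; firstvalue and lastvalue are taken to be the bounds of domaintype.

```pascal
program ArrayMappingSketch;
{ Sketch only: the command names and the constant "undefined" are assumed. }
const
  firstvalue = 'A';
  lastvalue = 'Z';
  undefined = '?';            { range value standing for "no value assigned" }
type
  domaintype = firstvalue..lastvalue;
  rangetype = char;
  MAPPING = array[domaintype] of rangetype;

procedure MAKENULL(var M: MAPPING);
{ make M the null mapping: every domain element is undefined }
var
  i: domaintype;
begin
  for i := firstvalue to lastvalue do
    M[i] := undefined
end;

procedure ASSIGN(var M: MAPPING; d: domaintype; r: rangetype);
{ define M(d) to be r, whether or not M(d) was defined before }
begin
  M[d] := r
end;

function COMPUTE(var M: MAPPING; d: domaintype; var r: rangetype): boolean;
{ if M(d) is defined, set r to M(d) and return true; otherwise return false }
begin
  if M[d] = undefined then
    COMPUTE := false
  else
  begin
    r := M[d];
    COMPUTE := true
  end
end;

var
  crypt: MAPPING;
  guess: rangetype;
begin
  MAKENULL(crypt);
  ASSIGN(crypt, 'E', 'X');
  if COMPUTE(crypt, 'E', guess) then
    writeln('E -> ', guess);
  if not COMPUTE(crypt, 'Q', guess) then
    writeln('Q is undefined')
end.
```

Every operation except MAKENULL takes constant time here, at the cost of reserving one array cell for each possible domain value.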

List Implementations of Mappings

There are many possible implementations of mappings with finite domains. For example, hash tables are an excellent choice in many situations, but one whose discussion we shall defer to Chapter 4. Any mapping with a finite domain can be represented by the list of pairs (d₁, r₁), (d₂, r₂), . . . , (d_k, r_k), where d₁, d₂, . . . , d_k are all the current members of the domain, and r_i is the value that the mapping associates with d_i, for i = 1, 2, . . . , k. We can then use any implementation of lists we choose for this list of pairs.

To be precise, the abstract data type MAPPING can be implemented by lists of elementtype, if we define
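The list-of-pairs idea can be sketched as follows for the payroll example. To keep the sketch self-contained, a plain singly linked list stands in for the LIST type; the field names domain and range, the helper locate, and the command names MAKENULL, ASSIGN, and COMPUTE are assumptions made for illustration.

```pascal
program ListMappingSketch;
{ Sketch only: a simple linked list replaces the LIST ADT; names are assumed. }
type
  domaintype = integer;       { for instance, an employee number }
  rangetype = real;           { for instance, a weekly salary }
  elementtype = record
    domain: domaintype;
    range: rangetype
  end;
  cellptr = ^celltype;
  celltype = record
    element: elementtype;
    next: cellptr
  end;
  MAPPING = cellptr;          { nil represents the null mapping }

procedure MAKENULL(var M: MAPPING);
begin
  M := nil
end;

function locate(M: MAPPING; d: domaintype): cellptr;
{ return the cell whose pair has domain element d, or nil if there is none }
var
  p: cellptr;
  found: boolean;
begin
  p := M;
  found := false;
  while (p <> nil) and not found do
    if p^.element.domain = d then
      found := true
    else
      p := p^.next;
  locate := p
end;

procedure ASSIGN(var M: MAPPING; d: domaintype; r: rangetype);
var
  p: cellptr;
begin
  p := locate(M, d);
  if p <> nil then
    p^.element.range := r     { d is already in the domain: change its value }
  else
  begin
    new(p);                   { otherwise add the pair (d, r) at the front }
    p^.element.domain := d;
    p^.element.range := r;
    p^.next := M;
    M := p
  end
end;

function COMPUTE(M: MAPPING; d: domaintype; var r: rangetype): boolean;
var
  p: cellptr;
begin
  p := locate(M, d);
  if p <> nil then
    r := p^.element.range;
  COMPUTE := p <> nil
end;

var
  payroll: MAPPING;
  salary: rangetype;
begin
  MAKENULL(payroll);
  ASSIGN(payroll, 1001, 450.0);
  ASSIGN(payroll, 1002, 520.0);
  ASSIGN(payroll, 1001, 475.0);           { replaces the earlier value }
  if COMPUTE(payroll, 1001, salary) then
    writeln('1001 earns ', salary:0:2);
  if not COMPUTE(payroll, 1003, salary) then
    writeln('1003 is not in the domain')
end.
```

Both ASSIGN and COMPUTE search the list, so each takes time proportional to the current size of the domain.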

2.6 Stacks and Recursive Procedures

One important application of stacks is in the implementation of recursive procedures in programming languages. The run-time organization for a programming language is the set of data structures used to represent the values of the program variables during program execution. Every language that, like Pascal, allows recursive procedures uses a stack of activation records to record the values for all the variables belonging to each active procedure of a program. When a procedure P is called, a new activation record for P is placed on the stack, regardless of whether there is already another activation record for P on the stack. When P returns, its activation record must be on top of the stack, since P cannot return until all procedures it has called have returned to P. Thus, we may pop the activation record for this call of P to cause control to return to the point at which P was called (that point, known as the return address, was placed in P's activation record when the call to P was made).

Recursion simplifies the structure of many programs. In some languages, however, procedure calls are much more costly than assignment statements, so a program may run faster by a large constant factor if we eliminate recursive procedure calls from it. We do not advocate that recursion or other procedure calls be eliminated habitually; most often the structural simplicity is well worth the running time. However, in the most frequently executed portions of programs, we may wish to eliminate recursion, and it is the purpose of this discussion to illustrate how recursive procedures can be converted to nonrecursive ones by the introduction of a user-defined stack.

Example 2.3. Let us consider recursive and nonrecursive solutions to a simplified version of the classic knapsack problem in which we are given a target t and a collection of positive integer weights w₁, w₂, . . . , w_n. We are asked to determine whether there is some selection from among the weights that totals exactly t. For example, if t = 10, and the weights are 7, 5, 4, 4, and 1, we could select the second, third, and fifth weights, since 5 + 4 + 1 = 10.

The image that justifies the name "knapsack problem" is that we wish to carry on our back no more than t pounds, and we have a choice of items with given weights to carry. We presumably find the items' utility to be proportional to their weight,† so we wish to pack our knapsack as closely to the target weight as we can.

In Fig. 2.25 we see a function knapsack that operates on an array

weights : array [1..n] of integer.

A call to knapsack(s, i) determines whether there is a collection of the elements in weights[i] through weights[n] that sums to exactly s, and prints these weights if so. The first thing knapsack does is determine if it can respond immediately. Specifically, if s = 0, then the empty set of weights is a solution. If s < 0, there can be no solution, and if s > 0 and i > n, then we are out of weights to consider and therefore cannot find a sum equal to s.

If none of these cases applies, then we simply call knapsack(s - w_i, i + 1) to see if there is a solution that includes w_i. If there is such a solution, then the total problem is solved, and the solution includes w_i, so we print it. If there is no solution, then we call knapsack(s, i + 1) to see if there is a solution that does not use w_i.
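The recursive strategy just described can be sketched as a complete program. This is an illustration under assumptions, not a reproduction of Fig. 2.25: the parameter names target and candidate play the roles of s and i, the value n = 5 and the sample weights come from Example 2.3, and the result is returned by assigning to the function name.

```pascal
program KnapsackRecursive;
{ Sketch only: n, the sample weights and the main program are assumed. }
const
  n = 5;
var
  weights: array[1..n] of integer;

function knapsack(target, candidate: integer): boolean;
{ is there a selection from weights[candidate..n] that totals target? }
begin
  if target = 0 then
    knapsack := true                      { the empty selection works }
  else if (target < 0) or (candidate > n) then
    knapsack := false                     { overshot, or out of weights }
  else if knapsack(target - weights[candidate], candidate + 1) then
  begin
    { the call above is the one the text refers to as line (5) }
    writeln(weights[candidate]);          { the solution includes this weight }
    knapsack := true
  end
  else
    { the call the text refers to as line (8): try without this weight }
    knapsack := knapsack(target, candidate + 1)
end;

begin
  weights[1] := 7;
  weights[2] := 5;
  weights[3] := 4;
  weights[4] := 4;
  weights[5] := 1;
  if knapsack(10, 1) then
    writeln('solution found')
  else
    writeln('no solution')
end.
```

For the target 10 this prints the weights 1, 4, and 5, in that order, because each weight is printed as the successful calls return, followed by the line "solution found".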

Elimination of Tail Recursion

Often, we can mechanically eliminate the last call a procedure makes to itself. If a procedure P(x) has, as its last step, a call to P(y), then we can replace the call to P(y) by an assignment x := y, followed by a jump to the beginning of the code for P. Here, y could be an expression, but x must be a parameter passed by value, so its value is stored in a location private to this call to P. P could have more than one parameter, of course, and if so, they are each treated exactly as x and y above.

This change works because rerunning P with the new value of x has exactly the same effect as calling P(y) and then returning from that call.

Notice that it is of no consequence that some of P's local variables already have values the second time around. P could not legitimately use any of those values, for had we called P(y) as originally intended, the values used would have been undefined.
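A small sketch may make the transformation concrete. The procedure printfrom, the array a, and the constant n are hypothetical names introduced only for this illustration, and a while loop stands in for the assignment followed by a jump to the beginning of the procedure.

```pascal
program TailRecursionSketch;
{ Illustration only: the array a, its size n and both procedures are assumed. }
const
  n = 4;
var
  a: array[1..n] of integer;
  k: integer;

{ Recursive form: the last step of printfrom(i) is the call printfrom(i + 1). }
procedure printfrom(i: integer);
begin
  if i <= n then
  begin
    writeln(a[i]);
    printfrom(i + 1)
  end
end;

{ Tail call removed: assign i := i + 1 and return to the start of the body.
  The while loop plays the role of the jump. }
procedure printfromloop(i: integer);
begin
  while i <= n do
  begin
    writeln(a[i]);
    i := i + 1          { i is a value parameter, private to this call }
  end
end;

begin
  for k := 1 to n do
    a[k] := k * k;
  printfrom(2);         { prints a[2], a[3], a[4] }
  printfromloop(2)      { prints the same three values }
end.
```

Because i is passed by value, overwriting it inside printfromloop cannot affect the caller, which is the condition the text places on the parameter x.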

Another variant of tail recursion is illustrated by Fig. 2.25, where the last step of the function knapsack just returns the result of calling itself with other parameters. In such a situation, again provided the parameters are passed by value (or by reference if the same parameter is passed to the call), we can replace the call by assignments to the parameters and a jump to the beginning of the function. In the case of Fig. 2.25, we can replace line (8) by
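The replacement can be sketched as follows. Since the final call passes target unchanged, only candidate needs a new value; in this sketch a while loop takes the place of the jump to the beginning of the function, and the local variables done and answer are assumptions introduced to carry the result out of the loop.

```pascal
program KnapsackTailCallRemoved;
{ Sketch only: a loop replaces the jump; done and answer are assumed helpers. }
const
  n = 5;
var
  weights: array[1..n] of integer;

function knapsack(target, candidate: integer): boolean;
var
  done, answer: boolean;
begin
  done := false;
  answer := false;
  while not done do
    if target = 0 then
    begin
      answer := true;
      done := true
    end
    else if (target < 0) or (candidate > n) then
    begin
      answer := false;
      done := true
    end
    else if knapsack(target - weights[candidate], candidate + 1) then
    begin
      { this recursive call is not a tail call, so it remains }
      writeln(weights[candidate]);
      answer := true;
      done := true
    end
    else
      { replaces the final call knapsack(target, candidate + 1):
        assign the new parameter value and go around the loop again }
      candidate := candidate + 1;
  knapsack := answer
end;

begin
  weights[1] := 7;
  weights[2] := 5;
  weights[3] := 4;
  weights[4] := 4;
  weights[5] := 1;
  if knapsack(10, 1) then
    writeln('solution found')
  else
    writeln('no solution')
end.
```

The output is the same as for the fully recursive version; only the depth of the activation-record stack changes, since excluding a weight no longer costs a call.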

Complete Recursion Elimination

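The tail recursion elimination procedure removes recursion completely only when the recursive call is at the end of the procedure, and the call has the correct form. There is a more general approach that converts any recursive procedure (or function) into a nonrecursive one, but this approach introduces a user-defined stack. In general, a cell of this stack will hold the current values of the procedure's parameters and local variables, together with an indication of the return address, that is, the place to which control returns when the current invocation is done.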

In the case of the function knapsack, we can do something simpler. First, observe that whenever we make a call (push a record onto the stack), candidate increases by 1. Thus, we can keep candidate as a global variable, incrementing it by one every time we push the stack and decreasing it by one when we pop.

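A second simplification we can make is to keep a modified "return address" on the stack. Strictly speaking, the return address for this function is either a place in some other procedure that calls knapsack, or the call at line (5), or the call at line (8). We can represent these three conditions by a "status," which has one of three values: none, indicating that the call comes from outside the function knapsack; included, indicating the call at line (5), which includes weights[candidate] in the solution; or excluded, indicating the call at line (8), which excludes weights[candidate].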

If we store this status symbol as the return address, then we can treat target as a global variable. When changing from status none to included, we subtract weights[candidate] from target, and we add it back in when changing from status included to excluded. To help represent the effect of the knapsack's return indicating whether a solution has been found, we use a global winflag. Once set to true, winflag remains true and causes the stack to be popped and those weights with status included to be printed. With these modifications, we can declare our stack to be a list of statuses, by
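One possible realization of these ideas is sketched below. It is an illustration under assumptions rather than the text's own figure: the stack is a plain array S indexed by candidate, which works because the height of the stack always equals candidate, and the constant maxdepth, the sample weights, and the final "no solution" message are introduced only for this sketch.

```pascal
program KnapsackWithStack;
{ Sketch only: an array indexed by candidate serves as the stack of statuses. }
const
  n = 5;
  maxdepth = n + 1;           { candidate can reach n + 1 }
type
  statuses = (none, included, excluded);
var
  weights: array[1..n] of integer;
  S: array[1..maxdepth] of statuses;

procedure knapsack(target: integer);
var
  candidate: integer;         { also the current height of the stack }
  winflag: boolean;
begin
  candidate := 1;
  S[1] := none;               { push none: start by considering weights[1] }
  winflag := false;
  repeat
    if winflag then
    begin
      { unwinding after success: print the weights marked included }
      if S[candidate] = included then
        writeln(weights[candidate]);
      candidate := candidate - 1          { pop }
    end
    else if S[candidate] = none then
    begin
      if target = 0 then
      begin
        winflag := true;                  { solution found }
        candidate := candidate - 1        { pop }
      end
      else if (target < 0) or (candidate > n) then
        candidate := candidate - 1        { pop: no solution this way }
      else
      begin
        target := target - weights[candidate];   { none becomes included }
        S[candidate] := included;
        candidate := candidate + 1;
        S[candidate] := none              { push, like the call at line (5) }
      end
    end
    else if S[candidate] = included then
    begin
      target := target + weights[candidate];     { included becomes excluded }
      S[candidate] := excluded;
      candidate := candidate + 1;
      S[candidate] := none                { push, like the call at line (8) }
    end
    else
      candidate := candidate - 1          { both choices failed: pop }
  until candidate = 0;
  if not winflag then
    writeln('no solution')
end;

begin
  weights[1] := 7;
  weights[2] := 5;
  weights[3] := 4;
  weights[4] := 4;
  weights[5] := 1;
  knapsack(10)                { prints 1, 4 and 5, one per line }
end.
```

Note that target is restored exactly when a status changes from included to excluded, so no copy of it ever needs to be saved on the stack.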

Exercises

2.1	Write a program to print the elements on a list. Throughout these exercises use list operations to implement your programs.
2.2	Write programs to insert, delete, and locate an element on a sorted list using array, pointer, and cursor implementations of lists. What is the running time of each of your programs?
2.3	Write a program to merge (a) two sorted lists, (b) n sorted lists.
2.4	Write a program to concatenate a list of lists.
2.5	Suppose we wish to manipulate polynomials of the form p(x) = c₁x^e₁ + c₂x^e₂ + . . . + c_nx^e_n, where e₁ > e₂ > . . . > e_n ≥ 0. Such a polynomial can be represented by a linked list in which each cell has three fields: one for the coefficient c_i, one for the exponent e_i, and one for the pointer to the next cell. Write a program to differentiate polynomials represented in this manner.
2.6	Write programs to add and multiply polynomials of the form in Exercise 2.5. What is the running time of your programs as a function of the number of terms?
*2.7	Suppose we declare cells by
        type
            celltype = record
                bit: 0..1;
                next: ^celltype
            end;
        A binary number b₁b₂ . . . b_n, where each b_i is 0 or 1, has numerical value ∑_{i=1}^{n} b_i 2^{n-i}. This number can be represented by the list b₁, b₂, . . . , b_n. That list, in turn, can be represented as a linked list of cells of type celltype. Write a procedure increment(bnumber) that adds one to the binary number pointed to by bnumber. Hint: Make increment recursive.
2.8	Write a procedure to interchange the elements at positions p and NEXT(p) in a singly linked list.
*2.9	The following procedure was intended to remove all occurrences of element x from list L. Explain why it doesn't always work and suggest a way to repair the procedure so it performs its intended task.
        procedure delete ( x: elementtype; var L: LIST );
            var
                p: position;
            begin
                p := FIRST(L);
                while p <> END(L) do begin
                    if RETRIEVE(p, L) = x then
                        DELETE(p, L);
                    p := NEXT(p, L)
                end
            end; { delete }
2.10	We wish to store a list in an array A whose cells consist of two fields, data to store an element and position to give the (integer) position of the element. An integer last indicates that A[1] through A[last] are used to hold the list. The type LIST can be defined by
        type
            LIST = record
                last: integer;
                elements: array[1..maxlength] of record
                    data: elementtype;
                    position: integer
                end
            end;
        Write a procedure DELETE(p, L) to remove the element at position p. Include all necessary error checks.
2.11	Suppose L is a LIST and p, q, and r are positions. As a function of n, the length of list L, determine how many times the functions FIRST, END, and NEXT are executed by the following program.
        p := FIRST(L);
        while p <> END(L) do begin
            q := p;
            while q <> END(L) do begin
                q := NEXT(q, L);
                r := FIRST(L);
                while r <> q do
                    r := NEXT(r, L)
            end;
            p := NEXT(p, L)
        end;
2.12	Rewrite the code for the LIST operations assuming a linked list representation, but without a header cell. Assume true pointers are used and position 1 is represented by nil.
2.13	Add the necessary error checks in the procedure of Fig. 2.12.
2.14	Another array representation of lists is to insert as in Section 2.2, but when deleting, simply replace the deleted element by a special value "deleted," which we assume does not appear on lists otherwise. Rewrite the list operations to implement this strategy. What are the advantages and disadvantages of the approach compared with our original array representation of lists?
2.15	Suppose we wish to use an extra bit in queue records to indicate whether a queue is empty. Modify the declarations and operations for a circular queue to accommodate this feature. Would you expect the change to be worthwhile?
2.16	A dequeue (double-ended queue) is a list from which elements can be inserted or deleted at either end. Develop array, pointer, and cursor implementations for a dequeue.
2.17	Define an ADT to support the operations ENQUEUE, DEQUEUE, and ONQUEUE. ONQUEUE(x) is a function returning true or false depending on whether x is or is not on the queue.
2.18	How would one implement a queue if the elements that are to be placed on the queue are arbitrary length strings? How long does it take to enqueue a string?
2.19	Another possible linked-list implementation of queues is to use no header cell, and let front point directly to the first cell. If the queue is empty, let front = rear = nil. Implement the queue operations for this representation. How does this implementation compare with the list implementation given for queues in Section 2.4 in terms of speed, space utilization, and conciseness of the code?
2.20	A variant of the circular queue records the position of the front element and the length of the queue. Is it necessary in this implementation to limit the length of a queue to maxlength - 1? Write the five queue operations for this implementation. Compare this implementation with the circular queue implementation of Section 2.4.
2.21	It is possible to keep two stacks in a single array, if one grows from position 1 of the array, and the other grows from the last position. Write a procedure PUSH(x, S) that pushes element x onto stack S, where S is one or the other of these two stacks. Include all necessary error checks in your procedure.
2.22	We can store k stacks in a single array if we use the data structure suggested in Fig. 2.27, for the case k = 3. We push and pop from each stack as suggested in connection with Fig. 2.17 in Section 2.3. However, if pushing onto stack i causes TOP(i) to equal BOTTOM(i-1), we first move all the stacks so that there is an appropriate size gap between each adjacent pair of stacks. For example, we might make the gaps above all stacks equal, or we might make the gap above stack i proportional to the current size of stack i (on the theory that larger stacks are likely to grow sooner, and we want to postpone as long as possible the next reorganization).
        a. On the assumption that there is a procedure reorganize to call when stacks collide, write code for the five stack operations.
        b. On the assumption that there is a procedure makenewtops that computes newtop[i], the "appropriate" position for the top of stack i, for 1 ≤ i ≤ k, write the procedure reorganize. Hint. Note that stack i could move up or down, and it is necessary to move stack i before stack j if the new position of stack j overlaps the old position of stack i. Consider stacks 1, 2, . . . , k in order, but keep a stack of "goals," each goal being to move a particular stack. If on considering stack i, we can move it safely, do so, and then reconsider the stack whose number is on top of the goal stack. If we cannot safely move stack i, push i onto the goal stack.
        c. What is an appropriate implementation for the goal stack in (b)? Do we really need to keep it as a list of integers, or will a more succinct representation do?
        d. Implement makenewtops in such a way that space above each stack is proportional to the current size of that stack.
        e. What modifications of Fig. 2.27 are needed to make the implementation work for queues? For general lists?
2.23	Modify the implementations of POP and ENQUEUE in Sections 2.3 and 2.4 to return the element removed from the stack or queue. What modifications must be made if the element type is not a type that can be returned by a function?
2.24	Use a stack to eliminate recursion from the following procedures.
        a.
            function comb ( n, m: integer ): integer;
                { computes the binomial coefficient "n choose m," assuming 0 ≤ m ≤ n and n ≥ 1 }
                begin
                    if (n = 1) or (m = 0) or (m = n) then
                        return (1)
                    else
                        return (comb(n-1, m) + comb(n-1, m-1))
                end; { comb }
        b.
            procedure reverse ( var L: LIST );
                { reverse list L }
                var
                    x: elementtype;
                begin
                    if not EMPTY(L) then begin
                        x := RETRIEVE(FIRST(L), L);
                        DELETE(FIRST(L), L);
                        reverse(L);
                        INSERT(x, END(L), L)
                    end
                end; { reverse }
        Fig. 2.27. Many stacks in one array.
*2.25	Can we eliminate the tail recursion from the programs in Exercise 2.24? If so, do it.

Bibliographic Notes

Knuth [1968] contains additional information on the implementation of lists, stacks, and queues. A number of programming languages, such as LISP and SNOBOL, support lists and strings in a convenient manner. See Sammet [1969], Nicholls [1975], Pratt [1975], or Wexelblat [1981] for a history and description of many of these languages.

† Strictly speaking, the type is "LIST of elementtype." However, the implementations of lists we propose do not depend on what elementtype is; indeed, it is that independence that justifies the importance we place on the list concept. We shall use "LIST" rather than "LIST of elementtype," and similarly treat other ADT's that depend on the types of elements.

† In this case, if we eliminate records that are "the same" we might wish to check that the names and addresses are also equal; if the account numbers are equal but the other fields are not, two people may have inadvertently gotten the same account number. More likely, however, is that the same subscriber appears on the list more than once with distinct account numbers and slightly different names and/or addresses. In such cases, it is difficult to eliminate all duplicates.

† Even though L is not modified, we pass L by reference because frequently it will be a big structure and we don't want to waste time copying it.

† Making the header a complete cell simplifies the implementation of list operations in Pascal. We can use pointers for headers if we are willing to implement our operations so they do insertions and deletions at the beginning of a list in a special way. See the discussion under cursor-based implementation of lists in this section.

† Of course, there are many situations where we would like p to continue to represent the position of c.

† Incidentally, it is common practice to make the header of a doubly linked list be a cell that effectively "completes the circle." That is, the header's previous field points to the last cell and next points to the first cell. In this manner, we need not check for nil pointers in Fig. 2.14.

† Note that "consecutive" must be taken in circular sense. That is, a queue of length four could occupy the last two and first two positions of the array, for example.

† For example, firstvalue = 'A' and lastvalue = 'Z' if domaintype is 'A'..'Z'.

† In the "real" knapsack problem, we are given utility values as well as weights and are asked to maximize the utility of the items carried, subject to a weight constraint.
