Chapter 16

Chapter 16. THE `string` CLASS AND THE STANDARD TEMPLATE LIBRARY

You will learn about the following in this chapter:

The string Class
The auto_ptr Class
The Standard Template Library
Generic Programming
Function Objects (aka Functors)
Algorithms
Other Libraries
Summary
Review Questions
Programming Exercises

By now you are familiar with the C++ goal of reusable code. One of the big payoffs is when you can reuse code written by others. That's where class libraries come in. There are many commercially available C++ class libraries, and there also are the libraries that come as part of the C++ package. For example, you've been using the input/output classes supported by the ostream header file. This chapter will look at other reusable code available for your programming pleasure. First, the chapter examines the string class, which simplifies programming with strings. Then it looks at auto_ptr, a "smart pointer" template class that makes managing dynamic memory a bit easier. Finally, it looks at the Standard Template Library (or STL), a collection of useful templates for handling various kinds of container objects. STL exemplifies a recent programming paradigm called generic programming.

The `string` Class

Many programming applications need to process strings. C provides some support with its string.h (cstring in C++) family of string functions, and many early C++ implementations provided home-grown classes to handle strings. Chapter 12, "Classes and Dynamic Memory Allocation," introduced a modest String class to illustrate some aspects of class design. ANSI/ISO C++ itself provides a more powerful version called the string class. It is supported by the string header file.(Note that the string.h and cstring header files support the C library string functions for C-style strings, not the string class.) The key to using a class is knowing its public interface, and the string class has an extensive set of methods, including several constructors, overloaded operators for assigning strings, concatenating strings, comparing strings, and accessing individual elements, as well as utilities for finding characters and substrings in a string, and more. In short, the string class has lots of stuff.

Constructing a String

Let's begin with looking at the string constructors. After all, one of the most important things to know about a class is what your options are when creating objects of that class. Listing 16.1 uses all six of the string constructors (labeled ctor, the traditional C++ abbreviation for constructor). Table 16.1 briefly describes the constructors in the order used in the program. The constructor representations are simplified in that they conceal the fact that string really is a typedef for a template specialization basic_string<char> and that they omit an optional argument relating to memory management. (This aspect is discussed later this chapter and in Appendix F, "The String Template Class.") The type size_type is an implementation-dependent integral type defined in the string header file. The class defines string::npos as the maximum possible length of the string. Typically, this would equal the maximum value of an unsigned int. Also, the table uses the common abbreviation NBTS for null-byte-terminated string, that is, the traditional C string, which is terminated with a null character.

Table 16.1. `string` Class Constructors
Constructor	Description
`string(const char *` `s)`	Initializes `string` object to NBTS pointed to by `s`.
`string(size_type` `n, char` `c)`	Creates a `string` object of `n` elements, each initialized to the character `c`.each initialized to the character
string(const string & str, string size_type pos = 0, size_type n = npos)	Initializes `string` object to the object `str`, starting at position `pos` in `str` and going to end of `str` or using `n` characters, whichever comes first.
`string()`	Creates a default `string` object of 0 size.
`string(const char *` `s, size_type` `n)`	Initializes `string` object to NBTS pointed to by `s` and continues for `n` characters even if that exceeds the size of the NBTS.
template<class Iter> string(Iter begin, Iter end)	Initializes `string` object to the values in the range [`begin`, `end`), where `begin` and `end` act like pointers and specify locations; the range includes `begin` and is up to but not including `end`.

The program also uses the overloaded += operator, which appends one string to another, the overloaded = operator for assigning one string to another, the overloaded << operator for displaying a string object, and the overloaded [] operator for accessing an individual character in a string.

Listing 16.1 `str1.cpp`

// str1.cpp -- introducing the string class
#include <iostream>
#include <string>
using namespace std;
// using string constructors

int main()
{
    string one("Lottery Winner!");    // ctor #1
    cout << one << endl;              // overloaded <<
    string two(20, '$');              // ctor #2
    cout << two << endl;
    string three(one);                // ctor #3
    cout << three << endl;
    one += " Oops!";                  // overloaded +=
    cout << one << endl;
    two = "Sorry! That was ";
    three[0] = 'P';
    string four;                      // ctor #4
    four = two + three;               // overloaded +, =
    cout << four << endl;
    char alls[] = "All's well that ends well";
    string five(alls,20);             // ctor #5
    cout << five << "!\n";
    string six(alls+6, alls + 10);    // ctor #6
    cout << six  << ", ";
    string seven(&five[6], &five[10]);// ctor #6 again
    cout << seven << "...\n";

    return 0;
}

Compatibility Note

graphics/hands.gif

Some older string implementations do not support ctor #6.

Here is the program's output:

Lottery Winner!
$$$$$$$$$$$$$$$$$$$$
Lottery Winner!
Lottery Winner! Oops!
Sorry! That was Pottery Winner!
All's well that ends!
well, well...

Program Notes

The start of the program illustrates that you can initialize a string object to a regular C-style string and display it using the overloaded << operator:

string one("Lottery Winner!");    // ctor #1
cout << one << endl;              // overloaded <<

The next constructor initializes the string object two to a string consisting of 20 $ characters:

string two(20, '$');              // ctor #2

The copy constructor initializes the string object three to the string object one:

string three(one);                // ctor #3

The overloaded += operator appends the string " Oops!" to the string one:

one += " Oops!";                  // overloaded +=

This particular example appends a C-style string to a string object. However, the += operator is multiply overloaded so that you also can append string objects and single characters:

one += two;   // append a string object (not in program)
one += '!';   // append a type char value (not in program)

Similarly, the = operator is overloaded so that you can assign a string object to a string object, a C-style string to a string object, or a simple char value to a string object:

two = "Sorry! That was "; // assign a C-style string
two = one;                // assign a string object (not in program)
two = '?';                // assign a char value (not in program)

Overloading the [] operator, as the String example in Chapter 13, "Class Inheritance," did, permits access to individual characters in a string object by using array notation:

three[0] = 'P';

A default constructor creates an empty string that later can be given a value:

string four;                      // ctor #4
four = two + three;               // overloaded +, =

The second line uses the overloaded + operator to create a temporary string object which is then assigned, using the overloaded = operator, to the four object. As you might expect, the + operator concatenates its two operands into a single string object. The operator is multiply overloaded so the second operand can be a string object or a C-style string or a char value.

The fifth constructor takes a C-style string and an integer as arguments, with the integer indicating how many characters to copy:

char alls[] = "All's well that ends well";
string five(alls,20);             // ctor #5

Here, as the output shows, just the first 20 characters ("All's well that ends") are used to initialize the five object. As Table 16.1 notes, if the character count exceeds the length of the C-style string, the requested number of characters are still copied. So replacing 20 with 40 in the above example would result in 15 junk characters being copied at the end of five. (That is, the constructor would interpret the contents in memory following the string "All's well that ends well" as character codes.)

The sixth constructor has a template argument:

template<class Iter> string(Iter begin, Iter end);

The intent is that begin and end act like pointers pointing to two locations in memory. (In general, begin and end can be iterators, generalizations of pointers extensively used in the STL.) The constructor then uses the values between the locations pointed to by begin and end to initialize the string object it constructs. The notation [begin,end), borrowed from mathematics, means the range includes begin but doesn't include end. That is, end points to a location one past the last value to be used. Consider the following statement:

string six(alls+6, alls + 10);    // ctor #6

Because the name of an array is a pointer, both alls + 6 and alls + 10 are type char *, so the template is used with Iter replaced by type char *. The first argument points to the first w in the alls array, and the second argument points to the space following the first well. Thus, six is initialized to the string "well". Figure 16.1 shows how the constructor works.

Figure 16.1. `string` constructor using a range.

graphics/16fig01.gif

Now suppose you want to use this constructor to initialize an object to part of another string object, say, the object five. The following does not work:

string seven(five + 6, five + 10);

The reason is that the name of an object, unlike the name of an array, is not treated as the address of an object, hence five is not a pointer and five + 6 is meaningless. However, five[6] is a char value, so &five[6] is an address and can be used as an argument to the constructor:

string seven(&five[6], &five[10]);// ctor #6 again

`string` Class Input

Another useful thing to know about a class is what input options are available. For C-style strings, recall, you have three options:

char info[100];
cin >> info;             // read a word
cin.getline(info, 100);  // read a line, discard \n
cin.get(info, 100);      // read a line, leave \n in queue

So what are the input options for a string object? First, the string class overloads the >> operator, much as the String class did in Chapter 12. Because the first operand is not a string object, the overloaded >> operator function is not a class method. Instead, it is a general function taking an istream object as its first argument and a string object as its second argument. To be consistent with how the >> operator treats C-strings, the string version also reads a single word, terminating input upon reaching a whitespace character, detecting end of file, or reaching the maximum allowable number of characters that can be stored in a string object. The function works by first erasing the contents of the target string and then reading and appending one character at a time. As long as the input sequence is shorter than the maximum allowable number of characters for a string object, the operator>>(istream &, string &) function automatically dimensions the string object to fit the input string. The upshot is that you can use >> with string objects just as you do with C-style strings, but without worrying about overrunning the array size:

char fname[10];
cin >> fname;      // could be a problem if input size > 9 characters
string lname;
cin >> lname;      // can read a very, very long word

Providing a getline() equivalent is a bit less transparent because getline(), unlike operator>>(), can't be used with operator notation. Instead, it uses membership notation:

cin.getline(fname, 10);

To get the same syntax to work with a string object would require adding a new member function to the istream class, which would not be wise. Instead, the string library defines a non-member function getline() that takes an istream object as its first argument and a string object as its second argument. Thus, it's used as follows to read a line of input into a string object:

string fullName;
getline(cin, fullName);  // instead of cin.getline(fname, 10);

The getline() function first erases the contents of the destination string and then reads one character at a time from the input queue, appending it to the string. This continues until the function reaches the end of the line, detects the end of file, or reaches maximum capacity of a string object. The newline character, if detected, is read but not stored in the string. Note that unlike the istream version, the string version doesn't have a size parameter indicating the maximum number of characters to read. That's because it dimensions the string object automatically to fit the input. Listing 16.2 illustrates using the two input options.

Listing 16.2 `str2.cpp`

// str2.cpp -- string input
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string word;
    cout << "Enter a line: ";
    cin >> word;
    while (cin.get() != '\n')
        continue;
    cout << word << " is all I wanted.\n";

    string line;
    cout << "Enter a line (really!): ";
    getline(cin, line);
    cout << "Line: " << line << endl;
    return 0;
}

Compatibility Note

graphics/hands.gif

Microsoft Visual C++ 5.0 and 6.0 have a bug in the getline() implementation that causes output following getline() not to appear until something is entered again.

Here is a sample run:

Enter a line: Time and tide wait for no one.
Time is all I wanted.
Enter a line (really!): All things come to he who waits.
Line: All things come to he who waits.

Working with Strings

So far, you've learned that you can create string objects in a variety of ways, display the contents of a string object, read data into a string object, append to a string object, assign to a string object, and concatenate two string objects. What else can you do?

You can compare strings. All six relational operators are overloaded for string objects, with one object being considered less than another if it occurs earlier in the machine collating sequence. If the machine collating sequence is the ASCII code, that implies that digits are less than uppercase characters and uppercase characters are less than lowercase characters. Each relational operator is overloaded three ways so that you can compare a string object with another string object, compare a string object with a C-style string, and compare a C-style string with a string object:

string snake1("cobra");
string snake2("coral");
char snake3[20] = "anaconda";
if (snake1 < snake 2)           // operator<(const string &, const string &)
    ...
if (snake1 == snake3)           // operator==(const string &, const char *)
    ...
if (snake3 != snake2)           // operator!=(const char *, const string &)
    ...

You can determine the size of a string. Both the size() and length() member functions return the number of characters in a string:

if (snake1.length() == snake2.size())
    cout << "Both strings have the same length.\n"

Why two functions that do the same thing? The length() member comes from earlier versions of the string class, while size() was added for STL compatibility.

You can search a string for a given substring or character in a variety of ways. Table 16.2 provides a short description of four variations of a find() method. Recall that string::npos is the maximum possible number of characters in a string, typically the largest unsigned int or unsigned long value.

Table 16.2. The Overloaded `find()` Method
Method Prototype	Description
size_type find(const string & str, size_type pos = 0) const	Finds the first occurrence of the substring `str`, starting the search at location `pos` in the invoking string. Returns the index of the first character of the substring if found, and returns `string::npos` otherwise.
size_type find(const char * s, size_type pos = 0) const	Finds the first occurrence of the substring `s`, starting the search at location `pos` in the invoking string. Returns the index of the first character of the substring if found, and returns `string::npos` otherwise.
size_type find(const char * s, size_type pos = 0, size_type n)	Finds the first occurrence of the substring consisting of the first `n` characters in `s`, starting the search at location `pos` in the invoking string. Returns the index of the first character of the substring if found, and returns `string::npos` otherwise.
size_type find(char ch, size_type pos = 0) const	Finds the first occurrence of the character `ch`, starting the search at location `pos` in the invoking string. Returns the index of the character if found, and returns `string::npos` otherwise.

The library also provides the related methods rfind(), find_first_of(), find_last_of(), find_first_not_of(), and find_last_not_of(), each with the same set of overloaded function signatures as the find() method. The rfind() finds the last occurrence of a substring or character. The find_first_of() finds the first occurrence in the invoking string of any of the characters in the argument. For example, the statement:

int where = snake1.find_first_of("hark");

would return the location of the r in "cobra" (i.e., the index 3) because that's the first occurrence of any of the letters in "hark" in "cobra". The find_last_of() method works the same, except it finds the last occurrence. Thus, the statement:

int where = snake1.last_first_of("hark");

would return the location of the a in "cobra". The find_first_not_of() method finds the first character in the invoking string that is not a character in the argument. So

int where = snake1.find_first_not_of("hark");

would return the location of the c in cobra, for c is not found in hark. We leave the description of find_last_not_of() as an exercise for the reader.

There are many more methods, but these are enough to put together a sample program that's a graphically impaired version of the word game Hangman. It stores a list of words in an array of string objects, picks a word at random, and lets you guess letters in the word. Six wrong guesses, and you lose. The program uses the find() function to check your guesses and the += operator to build a string object to keep track of your wrong guesses. To keep track of your good guesses, the program creates a word the same length as the mystery word but consisting of hyphens. The hyphens are then replaced by correct guesses. Listing 16.3 shows the program.

Listing 16.3 `str3.cpp`

// str3.cpp -- some string methods
#include <iostream>
#include <string>
#include <cstdlib>
#include <ctime>
#include <cctype>
using namespace std;
const int NUM = 26;
const string wordlist[NUM] = {"apiary", "beetle", "cereal",
    "danger", "ensign", "florid", "garage", "health", "insult",
    "jackal", "keeper", "loaner", "manage", "nonce", "onset",
    "plaid", "quilt", "remote", "stolid", "train", "useful",
    "valid", "whence", "xenon", "yearn", "zippy"};

int main()
{
    srand(time(0));
    char play;
    cout << "Will you play a word game? <y/n> ";
    cin >> play;
    play = tolower(play);
    while (play == 'y')
    {
        string target = wordlist[rand() % NUM];
        int length = target.length();
        string attempt(length, '-');
        string badchars;
        int guesses = 6;
        cout << "Guess my secret word. It has " << length
            << " letters, and you guess\n"
            << "one letter at a time. You get " << guesses
            << " wrong guesses.\n";
        cout << "Your word: " << attempt << endl;
        while (guesses > 0 && attempt != target)
        {
            char letter;
            cout << "Guess a letter: ";
            cin >> letter;
            if (badchars.find(letter) != string::npos
                || attempt.find(letter) != string::npos)
            {
                cout << "You already guessed that. Try again.\n";
                    continue;
            }
            int loc = target.find(letter);
            if (loc == string::npos)
            {
                cout << "Oh, bad guess!\n";
                --guesses;
                badchars += letter; // add to string
            }
            else
            {
                cout << "Good guess!\n";
                attempt[loc]=letter;
                // check if letter appears again
                loc = target.find(letter, loc + 1);
                while (loc != string::npos)
                {
                    attempt[loc]=letter;
                    loc = target.find(letter, loc + 1);
                }
           }
            cout << "Your word: " << attempt << endl;
            if (attempt != target)
            {
                if (badchars.length() > 0)
                    cout << "Bad choices: " << badchars << endl;
                cout << guesses << " bad guesses left\n";
            }
        }
        if (guesses > 0)
            cout << "That's right!\n";
        else
            cout << "Sorry, the word is " << target << ".\n";
        cout << "Will you play another? <y/n> ";
        cin >> play;
        play = tolower(play);
    }

    cout << "Bye\n";

    return 0;
}

Here's a sample run:

Will you play a word game? <y/n> y
Guess my secret word. It has 6 letters, and you guess
one letter at a time. You get 6 wrong guesses.
Your word: ------
Guess a letter: e
Oh, bad guess!
Your word: ------
Bad choices: e
5 bad guesses left
Guess a letter: a
Good guess!
Your word: a--a--
Bad choices: e
5 bad guesses left
Guess a letter: t
Oh, bad guess!
Your word: a--a--
Bad choices: et
4 bad guesses left
Guess a letter: r
Good guess!
Your word: a--ar-
Bad choices: et
4 bad guesses left
Guess a letter: y
Good guess!
Your word: a--ary
Bad choices: et
4 bad guesses left
Guess a letter: i
Good guess!
Your word: a-iary
Bad choices: et
4 bad guesses left
Guess a letter: p
Good guess!
Your word: apiary
That's right!
Will you play another? <y/n> n
Bye

Program Notes

The fact that the relational operators are overloaded lets you treat strings in the same fashion you would treat numeric variables:

while (guesses > 0 && attempt != target)

This is easier to follow than, say, using strcmp() with C-style strings.

The program uses find() to check if a character has been selected earlier; if it has been selected, it will be found in either the badchars string (bad guesses) or in the attempt string (good guesses):

if (badchars.find(letter) != string::npos
    || attempt.find(letter) != string::npos)

The npos variable is a static member of the string class. Its value, recall, is the maximum allowable number of characters for a string object. Therefore, because indexing begins at 0, it is 1 greater than the largest possible index and can be used to indicate failure to find a character or a string.

The program makes use of the fact that one of the overloaded versions of the += operator lets you append individual characters to a string:

badchars += letter;  // append a char to a string object

The heart of the program begins by checking if the chosen letter is in the mystery word:

int loc = target.find(letter);

If loc is a valid value, the letter can be placed in the corresponding location in the answer string:

attempt[loc]=letter;

However, a given letter may occur more than once in the mystery word, so the program has to keep checking. Here the program uses the optional second argument to find(), which lets you specify a starting place in the string from which to begin the search. Because the letter was found at location loc, the next search should begin at loc + 1. A while loop keeps the search going until no more occurrences of that character are found. Note that find() indicates failure if loc is after the end of the string:

// check if letter appears again
loc = target.find(letter, loc + 1);
while (loc != string::npos)
{
    attempt[loc]=letter;
    loc = target.find(letter, loc + 1);
}

Real World Note: Overloading C Functions to Use string Objects

graphics/common.gif

You can use the overloaded == operator to compare string objects. However, the case-sensitive nature in which the == operator performs its equality comparison can be a problem in some cases. Often, two strings need to be compared for equality without respect to their case. For example, a program may compare input from a user with a constant value, and the user may not use the same case. Consider the following sample:

#include <string>       //  string object

...
string strA;
cin >> strA;  // assume user enters Maplesyrup
string strB = "mapleSyrup";  // stored constand

if( strA == strB )
{
  cout << "The strings are equal.\n";
}
else
{
  cout << "The strings are not equal.\n";
}

Because 'M' is different from 'm' and 's' is different from 'S', the output is this:

The strings are not equal.

What if your program needed to perform a case-insensitive comparison on strA and strB? Many C libraries provide a stricmp() or _stricmp() function that does a case-insensitive test. (However, this function isn't listed in the C standard, so it's not universally available.) By creating your own overloaded version of this function, you can cobble together a simple workaround.

#include <string.h>  // for stricmp() on many systems
#include <string>    // string object

string strA;
cin >> strA;  // assume user enters Maplesyrup
string strB = "mapleSyrup";  // stored constant
inline bool stricmp( const std::string& strA,
                     const std::string& strB )  // overloaded function
{
  return stricmp( strA.c_str(), strB.c_str() ) == 0;    // C function
}

bool bStringsAreEqual = stricmp( strA, strB );

Using simplified syntax, you can now compare two strings for equality without regard to case.

What Else?

The string library supplies many other facilities. There are functions for erasing part or all of a string, for replacing part or all of one string with part or all of another string, for inserting material into a string or removing material from a string, for comparing part or all of one string with part or all of another string, and for extracting a substring from a string. There's a function for copying part of one string to another string, and a function for swapping the contents of two strings. Most of these functions are overloaded so that they can work with C-style strings as well as with string objects. Appendix F describes the string library function briefly.

This section has treated the string class as if it were based on the char type. In fact, as mentioned earlier, the string library really is based on a template class:

template<class charT, class traits = char _traits<charT>,
        class Allocator = allocator<charT> >
basic_string {...};

The class includes the following two typedefs:

typedef basic_string<char> string;
typedef basic_string<wchar_t> wstring;

This allows you to use strings based on the wchar_t as well as the char type. You even could develop some sort of character-like class and use the basic_string class template with it, providing your class met certain requirements. The traits class is a class that describes specific facts about the chosen character type, such as how to compare values. There are predefined specializations of the char_traits template for the char and wchar_t types, and these are the default values for traits. The Allocator class represents a class to manage memory allocation. There are predefined specializations of the allocator template for the char and wchar_t types, and these are the defaults. They use new and delete in the usual fashion, but you could reserve a chunk of memory and provide your own allocation methods.

The `auto_ptr` Class

The auto_ptr class is a template class for managing the use of dynamic memory allocation. Let's take a look at what might be needed and how it can be accomplished. Consider the following function:

void remodel(string & str)
{
    string * ps = new string(str);
    ...
    str = ps;
    return;
}

You probably see its flaw. Each time the function is called, it allocates memory from the heap but never returns it, creating a memory leak. You also know the solution—just remember to free the allocated memory by adding the following statement just before the return statement:

delete ps;

However, a solution involving the phrase "just remember to" seldom is the best solution. Sometimes you won't remember. Or you will remember but accidentally remove or comment out the code. And even if you do remember, there still can be problems. Consider the following variation:

void remodel(string & str)
{
    string * ps = new string(str);
    ...
    if (weird_thing())
        throw exception();
    str = *ps;
    delete ps;
    return;
}

If the exception is thrown, the delete statement isn't reached, and again there is a memory leak.

You can fix that oversight, as illustrated in Chapter 14, "Reusing Code in C++," but it would be nice if there were a neater solution. Let's think about what is needed. When a function like remodel() terminates, either normally or by throwing an exception, local variables are removed from the stack memory—so the memory occupied by the pointer ps is freed. What would be nice is if the memory pointed to by ps were also freed. That means that you would want the program to take an additional action when ps expires. That extra service is not provided for basic types, but it can be provided for classes via the destructor mechanism. Thus, the problem with ps is that it is just an ordinary pointer and not a class object. If it were an object, you could have its destructor delete the pointed-to memory when the object expires. And that is the idea behind auto_ptr.

Using an `auto_ptr`

The auto_ptr template defines a pointer-like object intended to be assigned an address obtained (directly or indirectly) by new. When an auto_ptr object expires, its destructor uses delete to free the memory. Thus, if you assign an address returned by new to an auto_ptr object, you don't have to remember to free the memory later; it will be freed automatically when the auto_ptr object expires. Figure 16.2 illustrates the behavioral difference between an auto_ptr and a regular pointer.

Figure 16.2. Regular pointer versus `auto_ptr`.

graphics/16fig02.gif

To create an auto_ptr object, include the memory header file, which includes the auto_ptr template. Then use the usual template syntax to instantiate the kind of pointer you require. The template includes the following:

template<class X> class auto_ptr {
public:
    explicit auto_ptr(X* p =0) throw();
...};

(The throw() notation, recall, means this constructor doesn't throw an exception.) Thus, asking for an auto_ptr of type X gives you an auto_ptr pointing to type X:

auto_ptr<double> pd(new double);  // an auto_ptr to double
                                  // (use in place of double *)
auto_ptr<string> ps(new string);  // an auto_ptr to string
                                  // (use in place of string *)

Here new double is a pointer returned by new to a newly allocated chunk of memory. It is the argument to the auto_ptr<double> constructor; that is, it is the actual argument corresponding to the formal parameter p in the prototype. Similarly, new string also is an actual argument for a constructor.

Thus, to convert the remodel() function, you would follow these three steps:

Include the memory header file.
Replace the pointer-to-string with an auto_ptr to string.
Remove the delete statement.

Here's the function with those changes made:

#include <memory>
void remodel(string & str)
{
    auto_ptr<string> ps (new string(str));
    ...
    if (weird_thing())
        throw exception();
    str = *ps;
    // delete ps;  NO LONGER NEEDED
    return;
}

Note that auto_ptr constructor is explicit, meaning there is no implicit type cast from a pointer to an auto_ptr:

auto_ptr<double> pd;
double *p_reg = new double;
pd = p_reg;                   // not allowed (implicit conversion)
pd = auto_ptr<double>(p_reg); // allowed (explicit conversion
auto_ptr<double> pauto = pd;  // not allowed (implicit conversion)
auto_ptr<double> pauto(pd);   // allowed (explicit conversion

The auto_ptr is an example of a smart pointer, an object that acts like a pointer, but with additional features. The auto_ptr class is defined so that, in most respects, it acts like a regular pointer. For example, given that ps is an auto_ptr, you can dereference it (*ps), increment it (++ps), use it to access structure members (ps->puffIndex), and assign it to a regular pointer that points to the same type. You also can assign one auto_ptr to another of the same type, but that raises an issue that the next section will face.

The template allows you to initialize an auto_ptr object to an ordinary pointer via a constructor:

`auto_ptr` Considerations

The auto_ptr is not a panacea. For example, consider the following code:

auto_ptr<int> pi(new int [200]);   // NO!

Remember, you must pair delete with new and delete [] with new []. The auto_ptr template uses delete, not delete [], so it only can be used with new, not new []. There is no auto_ptr equivalent for use with dynamic arrays. You could copy the auto_ptr template from the memory header file, rename it auto_arr_ptr, and convert the copy to use delete [] instead of delete. In that case, you would want to add support for the [] operator.

What about this?

string vacation("I wandered lonely as a cloud.");
auto_ptr<string> pvac(&vacation);  // NO!

This would apply the delete operator to non-heap memory, which is wrong.

Caution

graphics/tnt.gif

Use an auto_ptr object only for memory allocated by new, not for memory allocated by new [] or by simply declaring a variable.

Now consider assignment:

auto_ptr<string> ps (new string("I reigned lonely as a cloud."));
auto_ptr<string> vocation;
vocation = ps;

What should the assignment statement accomplish? If ps and vocation were ordinary pointers, the result would be two pointers pointing to the same string object. That is not acceptable here, for then the program would wind up attempting to delete the same object twice, once when ps expires, and once when vocation expires. There are ways to avoid this problem:

Define the assignment operator so that it makes a deep copy. This results in two pointers pointing to two distinct objects, one of which is a copy of the other.
Institute the concept of ownership, with only one smart pointer allowed to own a particular object. Only if the smart pointer owns the object will its constructor delete the object. Then have assignment transfer ownership. This is the strategy used for auto_ptr.
Create an even smarter pointer that keeps track of how many smart pointers refer to a particular object. This is called reference counting. Assignment, for example, would increase the count by one, and the expiration of a pointer would decrease the count by one. Only when the final pointer expires would delete be invoked.

The same strategies, of course, would also be applied to the copy constructors.

Each approach has its uses. Here's a situation, for example, that may not work properly using auto_ptr objects:

auto_ptr<string> films[5] =
{
    auto_ptr<string> (new string("Fowl Balls")),
    auto_ptr<string> (new string("Duck Walks")),
    auto_ptr<string> (new string("Chicken Runs")),
    auto_ptr<string> (new string("Turkey Errors")),
    auto_ptr<string> (new string("Goose Eggs"))
};
auto_ptr<string> pwin(films[2]);
int i;
cout << "The nominees for best avian baseball film are\n";
for (i = 0; i < 5; i++)
    cout << *films[i] << endl;
cout << "The winner is " << *pwin << "!\n";

The problem is that transferring ownership from films[2] to pwin may cause films[2] to no longer refer to the string. That is, after an auto_ptr object gives up ownership, it may no longer be usable. Whether it's usable or not is an implementation choice.

Smart Pointers

graphics/common.gif

The C++ library auto_ptr is an example of a smart pointer. A smart pointer is a class designed so that objects of that class have pointer-like properties. For example, a smart pointer can store the address of memory allocated by new and can be dereferenced. Because a smart pointer is a class object, it can modify and augment the behavior of a simple pointer. For instance, a smart pointer can institute reference counting. This allows several objects to share a single representation of a value tracked by a smart pointer. When the number of objects using the value drops to zero, the smart pointer can then delete the value. Smart pointers can allow for more efficient use of memory and help prevent memory leaks, but they do require the user to become familiar with new programming techniques.

The Standard Template Library

The Standard Template Library, or STL, provides a collection of templates representing containers, iterators, function objects, and algorithms. A container is a unit, like an array, that can hold several values. STL containers are homogeneous, that is, they hold values all of the same kind. Algorithms are recipes for accomplishing particular tasks, such as sorting an array or finding a particular value in a list. Iterators are objects that let you move through a container much as pointers let you move through an array; they are generalizations of pointers. Function objects are objects that act like functions; they can be class objects or function pointers (which includes function names because a function name acts as a pointer). The STL lets you construct a variety of containers, including arrays, queues, and lists, and lets you perform a variety of operations, including searching, sorting, and randomizing.

Alex Stepanov and Meng Lee developed STL at Hewlett-Laboratories, releasing the implementation in 1994. The ISO/ANSI C++ committee voted to incorporate it as a part of the C++ standard. The STL is not an example of object-oriented programming. Instead, it represents a different programming paradigm called generic programming. This makes STL interesting both in terms of what it does and in its approach. There's too much information about the STL to present in a single chapter, so we'll look at some representative examples and examine the spirit of the approach. We'll begin by looking at a few specific examples. Then, once you have a hands-on appreciation for containers, iterators, and algorithms, we'll look at the underlying design philosophy, and then take an overview of whole STL. Appendix G, "The STL Methods and Functions," summarizes the various STL methods and functions.

The `vector` Template Class

In computing, the term vector corresponds to an array rather than to the mathematical vector discussed in Chapter 11, "Working with Classes." (Mathematically, an N-dimensional mathematical vector can be represented by a set of N components, so, in that aspect, a mathematical vector is like an N-dimensional array. However, a mathematical vector has additional properties, such as inner and outer products, that a computer vector doesn't necessarily have.) A computing-style vector holds a set of like values that can be accessed randomly. That is, you can use, say, an index to directly access the tenth element of a vector without having to access the preceding nine elements first. So a vector class would provide operations similar to that of the ArrayDb class of Chapter 14. That is, you could create a vector object, assign one vector object to another, and use the [] operator to access vector elements. To make the class generic, make it a template class. That's what the STL does, defining a vector template in the vector (formerly vector.h) header file.

To create a vector template object, you use the usual <type> notation to indicate the type to be used. Also, the vector template uses dynamic memory allocation, and you can use an initialization argument to indicate how many vector elements you want:

#include vector
using namespace std;
vector<int> ratings(5);       // a vector of 5 ints
int n;
cin >> n;
vector<double> scores(n);     // a vector of n doubles

After you create a vector object, operator overloading for [] makes it possible to use the usual array notation for accessing individual elements:

ratings[0] = 9;
for (int i = 0; i < n; i++)
    cout << scores[i] << endl;

Allocators Again

graphics/common.gif

Like the string class, the various STL container templates take an optional template argument specifying what allocator object to use to manage memory. For example, the vector template begins like this:

template <class T, class Allocator = allocator<T> >
    class vector {...

If you omit a value for this template argument, the container template uses the allocator<T>s by default. This class uses new and delete in the standard ways.

Listing 16.4 uses this class in an undemanding application. This particular program creates two vector objects, one an int specialization and one a string specialization; each has five elements.

Listing 16.4 `vect1.cpp`

// vect1.cpp -- introducing the vector template
#include <iostream>
#include <string>
#include <vector>
using namespace std;

const int NUM = 5;
int main()
{
    vector<int> ratings(NUM);
    vector<string> titles(NUM);
    cout << "You will do exactly as told. You will enter\n"
         << NUM << " book titles and your ratings (0-10).\n";
    int i;
    for (i = 0; i < NUM; i++)
    {
        cout << "Enter title #" << i + 1 << ": ";
        getline(cin,titles[i]);
        cout << "Enter your rating (0-10): ";
        cin >> ratings[i];
        cin.get();
    }
    cout << "Thank you. You entered the following:\n"
          << "Rating\tBook\n";
    for (i = 0; i < NUM; i++)
    {
        cout << ratings[i] << "\t" << titles[i] << endl;
    }

    return 0;
}

Compatibility Note

graphics/hands.gif

Older implementations use vector.h instead of the vector header file. Although the order of include files shouldn't matter, some older versions of g++ require the string header file to appear before STL header files. The Microsoft Visual C++ 6.0 getline() has a bug that messes up synchronization of input and output.

Here's a sample run:

You will do exactly as told. You will enter
5 book titles and your ratings (0-10).
Enter title #1: The Cat Who Knew C++
Enter your rating (0-10): 6
Enter title #2: Felonious Felines
Enter your rating (0-10): 4
Enter title #3: Warlords of Wonk
Enter your rating (0-10): 3
Enter title #4: Don't Touch That Metaphor
Enter your rating (0-10): 5
Enter title #5: Panic Oriented Programming
Enter your rating (0-10): 8
Thank you. You entered the following:
Rating  Book
6       The Cat Who Knew C++
4       Felonious Felines
3       Warlords of Wonk
5       Don't Touch That Metaphor
8       Panic Oriented Programming

All this program does is use the vector template as a convenient way to create a dynamically allocated array. Let's see an example that uses more of the class methods.

Things to Do to Your Vectors

What else can the vector template do for you? All the STL containers provide certain basic methods, including size(), which returns the number of elements in a container, swap(), which exchanges the contents of two containers, begin(), which returns an iterator referring to the first element in a container, and end(), which returns an iterator representing past-the-end for the container.

What's an iterator? It's a generalization of a pointer. In fact, it can be a pointer. Or it can be an object for which pointer-like operations such as dereferencing (operator*()) and incrementing (operator++()) have been defined. As you'll see later, generalizing pointers to iterators allows the STL to provide a uniform interface for a variety of container classes, including ones for which simple pointers wouldn't work. Each container class defines a suitable iterator. The type name for this iterator is a class scope typedef called iterator. For example, to declare an iterator for a type double specialization of vector, you would do this:

vector<double>::iterator pd;  // pd an iterator

Suppose scores is a vector<double> object:

vector<double> scores;

Then you can use the iterator pd do things like the following:

pd = scores.begin();  // have pd point to the first element
*pd = 22.3;           // dereference pd and assign value to first element
++pd;                 // make pd point to the next element

As you can see, an iterator behaves like a pointer.

What's past-the-end? It is an iterator referring to an element one past the last element in a container. The idea is similar to the null character being one element past the last actual character in a C-style string, except that null character is the value in the element and past-the-end is a pointer (or iterator) to the element. The end() member function identifies the past-the-end location. If you set an iterator to the first element in a container and keep incrementing it, eventually it will reach past-the-end, and you will have traversed the entire contents of the container. Thus, if scores and pd are defined as above, you can display the contents with this code:

for (pd = scores.begin(); pd != scores.end(); pd++)
    cout << *pd << endl;;

All containers have the methods just discussed. The vector template class also has some methods that only some STL containers have. One handy method, called push_back() adds an element to the end of a vector. While doing so, it attends to memory management so that the vector size increases to accommodate added members. This means you can write code like the following:

vector<double> scores;  // create an empty vector
double temp;
while (cin >> temp && temp >= 0)
    scores.push_back(temp);
cout << "You entered " << scores.size() << " scores.\n";

Each loop cycle adds one more element to the scores vector. You don't have to pick the number of element when you write the program or when you run the program. As long as the program has access to sufficient memory, it will expand the size of scores as necessary.

There is an erase() method that removes a given range of a vector. It takes two iterator arguments defining the range to be removed. It's important to understand how the STL defines ranges using two iterators. The first iterator refers to the beginning of the range, and the second iterator is one beyond the end of the range. For example:

scores.erase(scores.begin(), scores.begin() + 2);

erases the first and second elements, that is, those referred to by begin() and begin() + 1. (Because vector provides random access, operations like begin() + 2 are defined for the vector class iterators.) If it1 and it2 are two iterators, the STL literature uses the notation [p1,p2) to indicate a range starting with p1 and going up to, but not including, p2. Thus, the range [begin(), end()) encompasses the entire contents of a collection (see Figure 16.3). Also, the range [p1,p1) is an empty range. The [) notation is not part of C++, so it doesn't appear in code; it just appears in documentation.

Figure 16.3. The STL range concept.

graphics/16fig03.gif

Remember

A range [it1, it2) is specified by two iterators it1 and it2, and it runs from it1 up to, but not including, it2.

An insert() method complements erase(). It takes three iterator arguments. The first gives the position ahead of which new elements are to be inserted. The second and third iterator parameters define the range to be inserted. This range typically is part of another object. For example, the code

vector<int> old;
vector<int> new;
...
old.insert(old.begin(), new.begin() + 1, new.end());

inserts all but the first element of the new vector ahead of the first element of the old vector. Incidentally, this is a case where having a past-the-end element is handy, for it makes it simple to append something to the end of a vector:

old.insert(old.end(), new.begin() + 1, new.end());

Here the new material is inserted ahead of old.end(), meaning it's placed after the last element in the vector.

Listing 16.5 illustrates the use of size(), begin(), end(), push_back(), erase(), and insert(). To simplify data-handling, the rating and title components of Listing 16.4 are incorporated into a single Review structure, and FillReview() and ShowReview() functions provide input and output facilities for Review objects.

Listing 16.5 `vect2.cpp`

// vect2.cpp -- methods and iterators
#include <iostream>
#include <string>
#include <vector>
using namespace std;

struct Review {
    string title;
    int rating;
};
bool FillReview(Review & rr);
void ShowReview(const Review & rr);
int main()
{
    vector<Review> books;
    Review temp;
    while (FillReview(temp))
        books.push_back(temp);
    cout << "Thank you. You entered the following:\n"
          << "Rating\tBook\n";
    int num = books.size();
    for (int i = 0; i < num; i++)
        ShowReview(books[i]);
    cout << "Reprising:\n"
          << "Rating\tBook\n";
    vector<Review>::iterator pr;
    for (pr = books.begin(); pr != books.end(); pr++)
        ShowReview(*pr);
    vector <Review> oldlist(books);     // copy constructor used
    if (num > 3)
    {
        // remove 2 items
        books.erase(books.begin() + 1, books.begin() + 3);
        cout << "After erasure:\n";
        for (pr = books.begin(); pr != books.end(); pr++)
            ShowReview(*pr);
        // insert 1 item
        books.insert(books.begin(), oldlist.begin() + 1,
                     oldlist.begin() + 2);
        cout << "After insertion:\n";
        for (pr = books.begin(); pr != books.end(); pr++)
            ShowReview(*pr);
    }
    books.swap(oldlist);
    cout << "Swapping oldlist with books:\n";
    for (pr = books.begin(); pr != books.end(); pr++)
        ShowReview(*pr);
    return 0;
}
bool FillReview(Review & rr)
{
    cout << "Enter book title (quit to quit): ";
    getline(cin,rr.title);
    if (rr.title == "quit")
        return false;
    cout << "Enter book rating: ";
    cin >> rr.rating;
    if (!cin)
        return false;
    cin.get();
    return true;
}

void ShowReview(const Review & rr)
{
    cout << rr.rating << "\t" << rr.title << endl;
}

Compatibility Note

graphics/hands.gif

Older implementations use vector.h instead of the vector header file. Although the order of include files shouldn't matter, some older versions of g++ require the string header file to appear before STL header files. Microsoft Visual C++ 6.0 has a bug in its getline() implementation such that the following output doesn't appear until something is entered again. (In this example, the bug requires you hit Enter twice after entering a title.)

Here is a sample program run:

Enter book title (quit to quit): The Cat Who Knew Vectors
Enter book rating: 5
Enter book title (quit to quit): Candid Canines
Enter book rating: 7
Enter book title (quit to quit): Warriors of Wonk
Enter book rating: 4
Enter book title (quit to quit): Quantum Manners
Enter book rating: 8
Enter book title (quit to quit): quit
Thank you. You entered the following:
Rating  Book
5       The Cat Who Knew Vectors
7       Candid Canines
4       Warriors of Wonk
8       Quantum Manners
Reprising:
Rating  Book
5       The Cat Who Knew Vectors
7       Candid Canines
4       Warriors of Wonk
8       Quantum Manners
After erasure:
5       The Cat Who Knew Vectors
8       Quantum Manners
After insertion:
7       Candid Canines
5       The Cat Who Knew Vectors
8       Quantum Manners
Swapping oldlist with books:
5       The Cat Who Knew Vectors
7       Candid Canines
4       Warriors of Wonk
8       Quantum Manners

More Things to Do to Your Vectors

There are many things programmers commonly do with arrays, such as search them, sort them, randomize the order, and so on. Does the vector template class have methods for these common operations? No! The STL takes a broader view, defining non-member functions for these operations. Thus, instead of defining a separate find() member function for each container class, it defines a single find() non-member function that can be used for all container classes. This design philosophy saves a lot of duplicate work. For example, suppose you had 8 container classes and 10 operations to support. If each class had its own member function, you'd need 8*10 or 80 separate member function definitions. But with the STL approach, you'd need just 10 separate non-member function definitions. And if you defined a new container class, providing you followed the proper guidelines, it, too, could use the existing 10 non-member functions to find, sort, and so on.

Let's examine three representative STL functions: for_each(), random_shuffle(), and sort(). The for_each() function can be used with any container class. It takes three arguments. The first two are iterators defining a range in the container, and the last is a pointer to a function. (More generally, the last argument is a function object; you'll learn about them presently.) The for_each() function then applies the pointed-to function to each container element in the range. The pointed-to function must not alter the value of the container elements. You can use the for_each() function instead of a for loop. For example, you can replace the code:

vector<Review>::iterator pr;
for (pr = books.begin(); pr != books.end(); pr++)
    ShowReview(*pr);

with the following:

for_each(books.begin(), books.end(), ShowReview);

This enables you to avoid dirtying your hands (and code) with explicit use of iterator variables.

The random_shuffle() function takes two iterators specifying a range and rearranges the elements in that range in random order. For example, the statement

random_shuffle(books.begin(), books.end());

randomly rearranges the order of all the elements in the books vector. Unlike for_each, which works with any container class, this function requires that the container class allow random access, which the vector class does.

The sort() function, too, requires that the container support random access. It comes in two versions. The first version takes two iterators defining a range, and it sorts that range using the < operator defined for the type element stored in the container. For example,

vector<int> coolstuff;
...
sort(coolstuff.begin(), coolstuff.end());

sorts the contents of coolstuff in ascending order, using the built-in < operator to compare values.

If the container elements are user-defined objects, then there has to be an operator<() function defined that works with that type object in order to use sort(). For example, you could sort a vector containing Review objects if you provided either a Review member function or a non-member function for operator<(). Because Review is a structure, its members are public, and a non-member function like this would serve:

bool operator<(const Review & r1, const Review & r2)
{
    if (r1.title < r2.title)
        return true;
    else if (r1.title == r2.title && r1.rating < r2.rating)
        return true;
    else
        return false;
}

With a function like this in place, you then could sort a vector of Review objects (such as books):

sort(books.begin(), books.end());

This version of the operator<() function sorts in lexicographic order of the title members. If two objects have the same title members, they then are sorted in ratings order. But suppose you want to sort in decreasing order or in order of ratings instead of titles? Then you can use the second form of sort(). It takes three arguments. The first two, again, are iterators indicating the range. The final argument is a pointer to function (more generally, a function object) to be used instead of operator<() for making the comparison. The return value should be convertible to bool, with false meaning the two arguments are in the wrong order. Here's an example of such a function:

bool WorseThan(const Review & r1, const Review & r2)
{
    if (r1.rating < r2.rating)
        return true;
    else
        return false;
}

With this function in place, you can use the following statement to sort the books vector of Review objects in order of increasing rating values:

sort(books.begin(), books.end(), WorseThan);

Note that the WorseThan() function does a less complete job than operator<() of ordering Review objects. If two objects have the same title member, the operator<() function sorts using the rating member. But if two objects have the same rating member, WorseThan() treats them as equivalent. The first kind of ordering is called total ordering, and the second kind is called strict weak ordering. With total ordering, if both a < b and b < a are false, then a and b must be identical. With strict weak ordering, that's not so. They might be identical, or they might just have one aspect that is the same, such as the rating member in the WorseThan() example. So instead of saying the two objects are identical, the best you can say for strict weak ordering is that they are equivalent.

Listing 16.6 illustrates the use of these STL functions.

Listing 16.6 `vect3.cpp`

// vect3.cpp -- using STL functions
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;

struct Review {
    string title;
    int rating;
};

bool operator<(const Review & r1, const Review & r2);
bool worseThan(const Review & r1, const Review & r2);
bool FillReview(Review & rr);
void ShowReview(const Review & rr);
int main()
{
    vector<Review> books;
    Review temp;
    while (FillReview(temp))
        books.push_back(temp);
    cout << "Thank you. You entered the following "
         << books.size() << " ratings:\n"
          << "Rating\tBook\n";
    for_each(books.begin(), books.end(), ShowReview);

    sort(books.begin(), books.end());
    cout << "Sorted by title:\nRating\tBook\n";
    for_each(books.begin(), books.end(), ShowReview);
    sort(books.begin(), books.end(), worseThan);
    cout << "Sorted by rating:\nRating\tBook\n";
    for_each(books.begin(), books.end(), ShowReview);

    random_shuffle(books.begin(), books.end());
    cout << "After shuffling:\nRating\tBook\n";
    for_each(books.begin(), books.end(), ShowReview);
    cout << "Bye.\n";
    return 0;
}

bool operator<(const Review & r1, const Review & r2)
{
    if (r1.title < r2.title)
        return true;
    else if (r1.title == r2.title && r1.rating < r2.rating)
        return true;
    else
        return false;
}

bool worseThan(const Review & r1, const Review & r2)
{
    if (r1.rating < r2.rating)
        return true;
    else
        return false;
}

bool FillReview(Review & rr)
{
    cout << "Enter book title (quit to quit): ";
    getline(cin,rr.title);
    if (rr.title == "quit")
        return false;
    cout << "Enter book rating: ";
    cin >> rr.rating;
    if (!cin)
        return false;
    cin.get();
    return true;
}

void ShowReview(const Review & rr)
{
    cout << rr.rating << "\t" << rr.title << endl;
}

Compatibility Notes

graphics/hands.gif

Older implementations use vector.h instead of the vector header file and algo.h instead of the algorithm header file. Although the order of include files shouldn't matter, g++ 2.7.1 required the string header file to appear before STL header files. The Microsoft Visual C++ 5.0 getline() has a bug that delays the next output line appearing until after the next input. Also, Microsoft Visual C++ 5.0 requires you to define operator==() in addition to operator<(). The Borland C++Builder 1.0 getline() requires an explicit delimiter argument.

Here's a sample run:

Enter book title (quit to quit): The Cat Who Can Teach You Weight Loss
Enter book rating: 8
Enter book title (quit to quit): The Dogs of Dharma
Enter book rating: 6
Enter book title (quit to quit): The Wimps of Wonk
Enter book rating: 3
Enter book title (quit to quit): Farewell and Delete
Enter book rating: 7
Enter book title (quit to quit): quit
Thank you. You entered the following 4 ratings:
Rating  Book
8       The Cat Who Can Teach You Weight Loss
6       The Dogs of Dharma
3       The Wimps of Wonk
7       Farewell and Delete
Sorted by title:
Rating  Book
7       Farewell and Delete
8       The Cat Who Can Teach You Weight Loss
6       The Dogs of Dharma
3       The Wimps of Wonk
Sorted by rating:
Rating  Book
3       The Wimps of Wonk
6       The Dogs of Dharma
7       Farewell and Delete
8       The Cat Who Can Teach You Weight Loss
After shuffling:
Rating  Book
7       Farewell and Delete
3       The Wimps of Wonk
6       The Dogs of Dharma
8       The Cat Who Can Teach You Weight Loss
Bye.

Generic Programming

Now that you have some experience using the STL, let's look at the underlying philosophy. The STL is an example of generic programming. Object-oriented programming concentrates on the data aspect of programming, while generic programming concentrates on algorithms. The main things the two approaches have in common are abstraction and the creation of reusable code, but the philosophies are quite different

A goal of generic programming is to write code that is independent of data types. Templates are the C++ tools for doing generic programs. Templates, of course, let you define a function or class in terms of a generic type. The STL goes further by providing a generic representation of algorithms. Templates make this possible, but not without the added element of careful and conscious design. To see how this mixture of templates and design works, let's see why iterators are needed.

Why Iterators?

Understanding iterators is perhaps the key to understanding the STL. Just as templates make the algorithms independent of the type of data stored, iterators make the algorithms independent of the type of container used. Thus, they are an essential component of the STL's generic approach.

To see why iterators are needed, let's look at how you might implement a find function for two different data representations and then see how you could generalize the approach. First, consider a function that searches an ordinary array of double for a particular value. You could write the function like this:

double * find_ar(double * ar, int n, const double & val)
{
    for (int i = 0; i < n; i++)
        if (ar[i] == val)
            return &ar[i];
    return 0;
}

If the function finds the value in the array, it returns the address in the array where the value is found; otherwise it returns the null pointer. It uses subscript notation to move through the array. You could use a template to generalize to arrays of any type having an == operator. Nonetheless, this algorithm is still tied to one particular data structure—the array.

So let's look at searching another kind of data structure, the linked list. (Chapter 12 used a linked list to implement a Queue class.) The list consists of linked Node structures:

struct Node
{
    double item;
    Node * p_next;
};

Suppose you have a pointer pointing to the first node in the list. The p_next pointer in each node points to the next node, and the p_next pointer for the last node in the list is set to 0. You could write a find_ll() function this way:

Node* find_ll(Node * head, const double & val)
{
    Node * start;
    for (start = head; start!= 0; start = start->next)
        if (start->item == val)
            return start;
    return 0;
}

Again, you could use a template to generalize this to lists of any data type supporting the == operator. Nonetheless, this algorithm is still tied to one particular data structure—the linked list.

If you consider details of implementation, the two find functions use different algorithms: One uses array indexing to move through a list of items, the other resets start to start->next. But broadly, the two algorithms are the same: Compare the value with each value in the container in sequence until you find a match.

The goal of generic programming would be to have a single find function that would work with arrays or linked lists or any other container type. That is, not only should the function be independent of the data type stored in the container, it should be independent of the data structure of the container itself. Templates provide a generic representation for the data type stored in a container. What's needed is a generic representation of the process of moving through the values in a container. The iterator is that generalized representation.

What properties should an iterator have in order to implement a find function? Here's a short list:

You should be able to dereference an iterator in order to access the value to which it refers. That is, if p is an iterator, *p should be defined.
You should be able to assign one iterator to another. That is, if p and q are iterators, the expression p = q should be defined.
You should be able to compare one iterator to another for equality. That is, if p and q are iterators, the expressions p == q and p != q should be defined.
You should be able to move an iterator through all the elements of a container. This can be satisfied by defining ++p and p++ for an iterator p.

There are more things an iterator could do, but nothing more it need do, at least, not for the purposes of a find function. Actually, the STL defines several levels of iterators of increasing capabilities, and we'll return to that matter later. Note, by the way, that an ordinary pointer meets the requirements of an iterator. Hence, you can rewrite the find_arr() function like this:

typedef double * iterator;
iterator find_ar(iterator ar, int n, const double & val)
{
    for (int i = 0; i < n; i++, ar++)
        if (*ar == val)
            return ar;
    return 0;
}

Then you can alter the function parameter list so that it takes a pointer to the beginning of the array and a pointer to one past the end of the array as arguments to indicate a range. (Listing 7.8 did something similar.) And the function can return the end pointer as a sign the value was not found.

typedef double * iterator;
iterator find_ar(iterator begin, iterator end, const double & val)
{
    iterator ar;
    for (ar = begin; ar += end; i++, ar++)
        if (*ar == val)
            return ar;
    return end;   // indicates val not found
}

For the find_ll() function, you can define an iterator class that defines the * and ++ operators:

struct Node
{
    double item;
    Node * p_next;
};

class iterator
{
    Node * pt;
public:
    iterator() : pt(0) {}
    iterator (Node * pn) : pt(pn) {}
    double operator*() { return pt->item;}
    iterator& operator++()     // for ++it
    {
        pt = pt->next;
        return *this;
    }
    iterator operator++(int)  // for it++
    {
        iterator tmp = *this;
        pt = pt->next;
        return tmp;
    }
// ... operator==(), operator!=(), etc.
};

(To distinguish between the prefix and postfix versions of the ++ operator, C++ adopted the convention of letting operator++() be the prefix version and operator++(int) be the postsuffix version; the argument is never used, hence needn't be given a name.)

The main point here is not how, in detail, to define the iterator class, but that with such a class, the second find function can be written like this:

iterator find_ll(iterator head, const double & val)
{
    iterator start;
    for (start = head; start!= 0; ++start)
        if (*start == val)
            return start;
    return 0;
}

This is very nearly the same as find_ar(). The point of difference is in how the two functions determine if they've reached the end of the values being searched. The find_ar() function uses an iterator to one-past-the-end, while find_ll() uses a null value stored in the final node. Remove that difference, and you can make the two functions identical. For example, you could require that the linked list have one additional element after the last official element. That is, have both the array and the linked list have a past-the-end element, and end the search when the iterator reaches the past-the-end position. Then find_ar() and find_ll() will have the same way of detecting the end of data and become identical algorithms. Note that requiring a past-the-end element moves from making requirements upon iterators to making requirements upon the container class.

The STL follows the approach just outlined. First, each container class (vector, list, deque, and so on) defines an iterator type appropriate to the class. For one class, the iterator might be a pointer; for another, it might be an object. Whatever the implementation, the iterator will provide the needed operations, such as * and ++. (Some classes may need more operations than others.) Next, each container class will have a past-the-end marker, which is the value assigned to an iterator when it has been incremented one past the last value in the container. Have begin() and end() methods that return iterators to the first element in a container and to the past-the-end position. Have the ++ operation take an iterator from the first element to past-the-end, visiting every container element en route.

To use a container class, you don't need to know how its iterators are implemented nor how past-the-end is implemented. It's enough to know it does have iterators, that begin() returns an iterator to the first element and that end() returns an iterator to past-the-end. For example, suppose you want to print the values in a vector<double> object. Then you can do this:

vector<double>::iterator pr;
for (pr = scores.begin(); pr != scores.end(); pr++)
    cout << *pr << endl;

Here the line

vector<double>::iterator pr;

identifies pr as the iterator type defined for the vector<double> class. If you used the list<double> class template instead to store scores, you could use this code:

list<double>::iterator pr;
for (pr = scores.begin(); pr != scores.end(); pr++)
    cout << *pr << endl;

The only change is in the type declared for pr. Thus, by having each class define appropriate iterators and designing the classes in a uniform fashion, the STL lets you write the same code for containers having quite dissimilar internal representations.

Actually, as a matter of style, it's better to avoid using the iterators directly; instead, if possible, use an STL function, such as for_each(), that takes care of the details for you.

Let's summarize the STL approach. Start with an algorithm for processing a container. Express it in as general terms as possible, making it independent of data type and container type. To make the general algorithm work with specific cases, define iterators that meet the needs of the algorithm and place requirements on the container design. That is, basic iterator properties and container properties stem from requirements placed upon them by the algorithm.

Kinds of Iterators

Different algorithms have different requirements for iterators. For example, a find algorithm needs the ++ operator to be defined so the iterator can step through the entire container. It needs read access to data but not write access. (It just looks at data and doesn't change it.) A sort algorithm, however, requires random access so that it can swap the two non-adjacent elements. If iter is an iterator, you can get random access by defining the + operator so that you can use expressions like iter + 10. Also, a sort algorithm needs to be able both to read and to write data.

The STL defines five kinds of iterators and describes its algorithms in terms of which kinds of iterators it needs. The five kinds are the input iterator, output iterator, forward iterator, bidirectional iterator, and random access iterator. For example, the find() prototype looks like this:

template<class InputIterator, class T>
InputIterator find(InputIterator first, InputIterator last, const T& value);

This tells you that this algorithm requires an input iterator. Similarly, the prototype

template<class RandomAccessIterator>
void sort(RandomAccessIterator first, RandomAccessIterator last);

tells you that the sort algorithm requires a random access iterator.

All five kinds of iterators can be dereferenced (that is, the * operator is defined for them) and can be compared for equality (using the == operator, possibly overloaded) and inequality (using the != operator, possibly overloaded.) If two iterators test as equal, then dereferencing one should produce the same value as dereferencing the second. That is, if:

iter1 == iter2

is true, then the following also is true:

*iter1 == *iter2

Of course these properties hold true for built-in operators and pointers, so these requirements are guides for what you must do when overloading these operators for an iterator class. Now let's look at other iterator properties.

Input Iterator

The term "input" is used from the viewpoint of a program. That is, information going from the container to the program is considered input, just as information from a keyboard to the program is considered input. So an input iterator is one that can be used by a program to read values from a container. In particular, dereferencing an input iterator must allow a program to read a value from a container, but it needn't allow a program to alter that value. So algorithms that require an input iterator are algorithms that don't change values held in a container.

An input iterator has to allow you to access all the values in a container. It does so by supporting the ++ operator, both in prefix and suffix form. If you set an input operator to the first element in a container and increment it until reaching past-the-end, it will point to every container item once en route. Incidentally, there is no guarantee that traversing a container a second time with an input iterator will move through the values in the same order. Also, after an input iterator has been incremented, there is no guarantee that its prior value can still be dereferenced. Any algorithm based on an input iterator, then, should be a single-pass algorithm that doesn't rely upon iterator values from a previous pass or upon earlier iterator values from the same pass.

Note that an input iterator is a one-way iterator; it can increment, but it can't back up.

Output Iterator

Here the term "output" indicates that the iterator is used for transferring information from a program to a container. The output iterator is similar to the input iterator, except that dereferencing is guaranteed to allow a program to alter a container value, but not to read it. If the ability to write without reading seems strange, keep in mind that property also applies to output sent to your display; cout can modify the stream of characters sent to the display, but it can't read what's on the screen. The STL is general enough that its containers can represent output devices, so you can run into the same situation with containers. Also, if an algorithm modifies the contents of a container (for example, by generating new values to be stored) without reading the contents, there's no reason to require that it use an iterator that can read the contents.

In short, you can use an input iterator for single-pass, read-only algorithms and an output operator for single-pass, write-only algorithms.

Forward Iterator

Like the input and output iterators, the forward iterator uses only the ++ operators for navigating through a container. So it can only go forward through a container one element at a time. However, unlike input and output iterators, it always goes through a sequence of values in the same order. Also, after you increment a forward iterator, you still can dereference the prior iterator value, if you've saved it, and get the same value. These properties make multiple-pass algorithms possible.

A forward iterator can allow you to both read and modify data, or it can allow you just to read it:

int * pirw;       // read-write iterator
const int * pir;  // read-only iterator

Bidirectional Iterator

Suppose you have an algorithm that needs to be able to traverse a container in both directions? For example, a reverse function could swap the first and last elements, increment the pointer to the first element, decrement the pointer to a second element, and repeat the process. A bidirectional iterator has all the features of a forward iterator and adds support for the two decrement operators (prefix and postfix).

Random Access Iterator

Some algorithms, such as sort and binary search, require the ability to jump directly to an arbitrary element of a container. This is termed random access and requires a random access iterator. It has all the features of a bidirectional iterator, plus it adds operations (like pointer addition) supporting random access and relational operators for ordering the elements. Table 16.3 lists the operations a random access iterator has beyond those of the bidirectional iterator. In this table, X represents a random iterator type, T represents the type pointed to, a and b are iterator values, n is an integer, and r is random iterator variable or reference.

Table 16.3. Random Access Iterator Operations
Expression	Comments
`a + n`	Points to the `n`th element after the one `a` points to
`n + a`	Same as `a + n`
`a - n`	Points to the `n`th element before the one `a` points to
`r += n`	Equivalent to `r = r + n`
`r -= n`	Equivalent to `r = r - n`
`a[n]`	Equivalent to `*(a + n)`
`b - a`	The value of `n` such that `b = a + n`
`a < b`	True of `b - a > 0`
`a > b`	True if `b < a`
`a >= b`	True if `!(a < b)`
`a <= b`	True if `!(a > b)`

Expressions like a + n are valid only if both a and a + n lie within the range of the container (including past-the-end).

Iterator Hierarchy

You probably noticed that the iterator kinds form a hierarchy. A forward iterator has all the capabilities of an input iterator and of an output iterator plus its own capabilities. A bidirectional iterator has all the capabilities of a forward iterator plus its own capabilities. And a random access iterator has all the capabilities of a forward iterator plus its own capabilities. Table 16.4 summarizes the main iterator capabilities. In it, i is an iterator, n is an integer.

Table 16.4. Iterator Capabilities
Iterator Capability	Input	Output	Forward	Bidirectional	Random Access
Dereferencing read	yes	no	yes	yes	yes
Dereferencing write	no	yes	yes	yes	yes
Fixed and repeatable order	no	no	yes	yes	yes
++i i++	yes	yes	yes	yes	yes
--i i--	no	no	no	yes	yes
`i[n]`	no	no	no	no	yes
`i + n`	no	no	no	no	yes
`i - n`	no	no	no	no	yes
`i += n`	no	no	no	no	yes
`i -=n`	no	no	no	no	yes

An algorithm written in terms of a particular kind of iterator can use that kind of iterator or any other iterator that has the required capabilities. So a container with, say, a random access iterator can use an algorithm written for an input iterator.

Why all these different kinds of iterators? The idea is to write an algorithm using the iterator with the fewest requirements possible, allowing it to be used with the largest range of containers. Thus, the find() function, by using a lowly input iterator, can be used with any container containing readable values. The sort() function, however, by requiring a random access iterator, can be used just with containers that support that kind of iterator.

Note that the various iterator kinds are not defined types; rather, they are conceptual characterizations. As mentioned earlier, each container class defines a class scope typedef name called iterator. So the vector<int> class has iterators of type vector<int>::iterator. But the documentation for this class would tell you that vector iterators are random access iterators. That, in turn, allows you to use algorithms based upon any iterator type because a random access iterator has all the iterator capabilities. Similarly, a list<int> class has iterators of type list<int>::iterator. The STL implements a doubly linked list, so it uses a bidirectional iterator. Thus, it can't use algorithms based on random access iterators, but it can use algorithms based on less demanding iterators.

Concepts, Refinements, and Models

The STL has several features, such as kinds of iterators, that aren't expressible in the C++ language. That is, although you can design, say, a class having the properties of a forward iterator, you can't have the compiler restrict an algorithm to only using that class. The reason is that the forward iterator is a set of requirements, not a type. The requirements could be satisfied by an iterator class you've designed, but it also can be satisfied by an ordinary pointer. An STL algorithm works with any iterator implementation that meets its requirements. STL literature uses the word concept to describe a set of requirements. Thus, there is an input iterator concept, a forward iterator concept, and so on. By the way, if you do need iterators for, say, a container class you're designing, the STL does include iterator templates for the standard varieties.

Concepts can have an inheritance-like relationship. For example, a bidirectional iterator inherits the capabilities of a forward iterator. However, you can't apply the C++ inheritance mechanism to iterators. For example, you might implement a forward iterator as a class and a bidirectional iterator as a regular pointer. So, in terms of the C++ language, this particular bidirectional iterator, being a built-in type, couldn't be derived from a class. Conceptually, however, it does inherit. Some STL literature uses the term refinement to indicate this conceptual inheritance. Thus, a bidirectional iterator is a refinement of the forward iterator concept.

A particular implementation of a concept is termed a model. Thus, an ordinary pointer-to-int is a model of the concept random access iterator. It's also a model of forward iterator, for it satisfies all the requirements of that concept.

The Pointer As Iterator

Iterators are generalizations of pointers, and a pointer satisfies all the iterator requirements. Iterators form the interface for STL algorithms, and pointers are iterators, so STL algorithms can use pointers to operate upon non-STL containers. For example, you can use STL algorithms with arrays. Suppose Receipts is an array of double values, and you would like to sort in ascending order:

const int SIZE = 100;
double Receipts[SIZE];

The STL sort() function, recall, takes as arguments an iterator pointing to the first element in a container and an iterator pointing to past-the-end. Well, &Receipts[0] (or just Receipts) is the address of the first element, and &Receipts[SIZE] (or just Receipts + SIZE) is the address of the element following the last element in the array. Thus, the function call:

sort(Receipts, Receipts + SIZE);

sorts the array. C++, by the way, does guarantee that the expression Receipts + n is defined as long as the result lies in the array or one past the end.

Thus, the fact that pointers are iterators and that algorithms are iterator-based makes it possible to apply STL algorithms to ordinary arrays. Similarly, you can apply STL algorithms to data forms of your own design, providing you supply suitable iterators (which may be pointers or objects) and past-the-end indicators.

`copy()`, `ostream_iterator`, and `istream_iterator`

The STL provides some predefined iterators. To see why, let's establish some background. There is an algorithm (copy()) for copying data from one container to another. This algorithm is expressed in terms of iterators, so it can copy from one kind of container to another or even from or to an array, because you can use pointers into an array as iterators. For example, the following copies an array into a vector:

int casts[10] = {6, 7, 2, 9 ,4 , 11, 8, 7, 10, 5};
vector<int> dice[10];
copy(casts, casts + 10, dice.begin());   // copy array to vector

The first two iterator arguments to copy() represent a range to be copied, and the final iterator argument represents the location to which the first item is copied. The first two arguments must be input iterators (or better), and the final argument must be an output iterator (or better). The copy() function overwrites existing data in the destination container, and the container has to be large enough to hold the copied elements. So you can't use copy() to place data in an empty vector, at least not without resorting to a trick to be revealed later.

Now suppose you wanted to copy information to the display. You could use copy() providing there was an iterator representing the output stream. The STL provides such an iterator with the ostream_iterator template. Using STL terminology, this template is a model of the output iterator concept. It is also an example of an adapter, a class or function that converts some other interface to an interface used by the STL. You can create an iterator of this kind by including the iterator (formerly iterator.h) header file and making a declaration:

#include <iterator>
...
ostream_iterator<int, char> out_iter(cout, " ");

The out_iter iterator now becomes an interface allowing you to use cout to display information. The first template argument (int, in this case) indicates the data type being sent to the output stream. The second template argument (char, in this case) indicates the character type used by the output stream. (Another possible value would be wchar_t.) The first constructor argument (cout, in this case), identifies the output stream being used. It also could be a stream used for file output, as discussed in Chapter 17, "Input, Output, and Files." The final character string argument is a separator to be displayed after each item sent to the output stream.

Caution

graphics/tnt.gif

Older implementations use just the first template argument for the ostream_iterator:

ostream_iterator<int> out_iter(cout, " "); // older implementation

You could use the iterator like this:

*out_iter++ = 15;   // works like cout << 15 << " ";

For a regular pointer, this would mean assigning the value 15 to the pointed-to location, and then incrementing the pointer. For this ostream_iterator, however, the statement means send 15 and then a string consisting of a space to the output stream managed by cout. Then get ready for the next output operation. You can use the iterator with copy() as follows:

copy(dice.begin(), dice.end(), out_iter);    // copy vector to output stream

This would mean to copy the entire range of the dice container to the output stream, that is, to display the contents of the container.

Or, you can skip creating a named iterator and construct an anonymous iterator instead. That is, you can use the adapter like this:

copy(dice.begin(), dice.end(), ostream_iterator<int, char>(cout, " ") );

Similarly, the iterator header file defines an istream_iterator template for adapting istream input to the iterator interface. It is a model of input iterator. You can use two istream_iterator objects to define an input range for copy():

copy(istream_iterator<int, char>(cin),
     istream_iterator<int, char>(), dice.begin());

Like ostream_iterator, istream_iterator uses two template arguments. The first indicates the data type to be read, and the second indicates the character type used by the input stream. Using a constructor argument of cin means to use the input stream managed by cin. Omitting the constructor argument indicates input failure, so the previous code means to read from the input stream until end-of-file, type mismatch, or some other input failure.

Other Useful Iterators

The iterator header file provides some other special-purpose predefined iterator types in addition to ostream_iterator and istream_iterator. They are reverse_iterator, back_insert_iterator, front_insert_iterator, and insert_iterator.

Let's start with seeing what a reverse iterator does. In essence, incrementing a reverse iterator causes it to decrement. Why not just decrement a regular iterator? The main reason is to simplify using existing functions. Suppose you want to display the contents of the dice container. As you just saw, you can use copy() and an ostream_iterator to copy the contents to the output stream:

ostream_iterator<int, char> out_iter(cout, " ");
copy(dice.begin(), dice.end(), out_iter);  // display in forward order

Now suppose you want to print the contents in reverse order. (Perhaps you are performing time-reversal studies.) There are several approaches that don't work, but rather than wallow in them, let's go to one that does. The vector class has a member function called rbegin() that returns a reverse_iterator pointing to past-the-end and a member rend() that returns a reverse_iterator pointing to the first element. Because incrementing a reverse_iterator makes it decrement, you can use the statement:

copy(dice.rbegin(), dice.rend(), out_iter); // display in reverse order

to display the contents backward. You don't even have to declare a reverse_iterator.

Remember

Both rbegin() and end() return the same value (past-the-end), but as a different type (reverse_iterator versus iterator). Similarly, both rend() and begin() return the same value (an iterator to the first element), but as a different type.

There is a special compensation reverse pointers have to make. Suppose rp is a reverse pointer initialized to dice.rbegin(). What should *rp be? Since rbegin() returns past-the-end, you shouldn't try to dereference that address. Similarly, if rend() is really the location of the first element, copy() would stop one location earlier because the end of the range is not in a range. Reverse pointers solve both problems by decrementing first, then dereferencing. That is, *rp dereferences the iterator value immediately preceding the current value of *rp. If rp points to position six, *rp is the value of position five, and so on. Listing 16.7 illustrates using copy(), an istream iterator and a reverse iterator.

Listing 16.7 `copyit.cpp`

// copyit.cpp -- copy() and iterators
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;

int main()
{
    int casts[10] = {6, 7, 2, 9 ,4 , 11, 8, 7, 10, 5};
    vector<int> dice(10);
    // copy from array to vector
    copy(casts, casts + 10, dice.begin());
    cout << "Let the dice be cast!\n";
    // create an ostream iterator
    ostream_iterator<int, char> out_iter(cout, " ");
    // copy from vector to output
    copy(dice.begin(), dice.end(), out_iter);
    cout << endl;
    cout <<"Implicit use of reverse iterator.\n";
    copy(dice.rbegin(), dice.rend(), out_iter);
    cout << endl;
    cout <<"Explicit use of reverse iterator.\n";
    vector<int>::reverse_iterator ri;
    for (ri = dice.rbegin(); ri != dice.rend(); ++ri)
        cout << *ri << ' ';
    cout << endl;

    return 0;
}

Compatibility Notes

graphics/hands.gif

Older implementations may use the iterator.h and vector.h header files instead. Also, some implementations use ostream_iterator<int> instead of ostream_iterator<int, char>.

Here is the output:

Let the dice be cast!
6 7 2 9 4 11 8 7 10 5
Implicit use of reverse iterator.
5 10 7 8 11 4 9 2 7 6
Explicit use of reverse iterator.
5 10 7 8 11 4 9 2 7 6

If you have the choice of explicitly declaring iterators or using STL functions to handle the matter internally, for example, by passing an rbegin() return value to a function, take the latter course. It's one less thing to do and one less opportunity to experience human fallibility.

The other three iterators (back_insert_iterator, front_insert_iterator, and insert_iterator) also increase the generality of the STL algorithms. Many STL functions are like copy() in that they send their results to a location indicated by an output iterator. Recall that

copy(casts, casts + 10, dice.begin());

copies values to the location beginning at dice.begin(). These values overwrite the prior contents in dice, and the function assumes that dice has enough room to hold the values. That is, copy() does not automatically adjust the size of the destination to fit the information sent to it. Listing 16.7 took care of that situation by declaring dice to have 10 elements, but suppose you don't know in advance how big dice should be? Or suppose you want to add elements to dice rather than overwrite existing ones?

The three insert iterators solve these problems by converting the copying process to an insertion process. Insertion adds new elements without overwriting existing data and uses automatic memory allocation to ensure the new information fits. A back_insert_iterator inserts items at the end of the container, while a front_insert_iterator inserts items at the front. Finally, the insert_iterator inserts items in front of the location specified as an argument to the insert_iterator constructor. All three are models of the output container concept.

There are restrictions. A back insertion iterator can be used only with container types that allow rapid insertion at the end. (Rapid means a constant time algorithm; the section on containers discusses this concept further.) The vector class qualifies. A front insertion iterator can be used only with container types allowing constant time insertion at the beginning. Here the vector class doesn't qualify but the queue class does. The insertion iterator doesn't have these restrictions. Thus, you can use it to insert material at the front of a vector. However, a front insertion iterator will do so faster for those container types that support it.

Tip

graphics/bulb.gif

You can use an insert iterator to convert an algorithm that copies data to one that inserts data.

These iterators take the container type as a template argument and the actual container identifier as a constructor argument. That is, to create a back insertion iterator for a vector<int> container called dice, you do this:

back_insert_iterator<vector<int> > back_iter(dice);

Declaring a front insertion iterator has the same form. An insertion iterator declaration has an additional constructor argument to identify the insertion location:

insert_iterator<vector<int> > insert_iter(dice, dice.begin() );

Listing 16.8 illustrates using two of these iterators.

Listing 16.8 `inserts.cpp`

// inserts.cpp -- copy() and insert iterators
#include <iostream>
#include <string>
#include <iterator>
#include <vector>
using namespace std;

int main()
{
    string s1[4] = {"fine", "fish", "fashion", "fate"};
    string s2[2] = {"busy", "bats"};
    string s3[2] = {"silly", "singers"};
    vector<string> words(4);
    copy(s1, s1 + 4, words.begin());
    ostream_iterator<string, char> out(cout, " ");
    copy (words.begin(), words.end(), out);
    cout << endl;

// construct anonymous back_insert_iterator object
    copy(s2, s2 + 2, back_insert_iterator<vector<string> >(words));
    copy (words.begin(), words.end(), out);
    cout << endl;

// construct anonymous insert_iterator object
    copy(s3, s3 + 2, insert_iterator<vector<string> >(words, words.begin()));
    copy (words.begin(), words.end(), out);
    cout << endl;

    return 0;
}

Compatibility Note

graphics/hands.gif

Older compilers may use list.h and iterator.h. Also, some compilers may use ostream_iterator<int> instead of ostream_iterator<int,char>.

Here is the output:

fine fish fashion fate
fine fish fashion fate busy bats
silly singers fine fish fashion fate busy bats

If you're feeling overwhelmed by all the iterator varieties, keep in mind that using them will make them familiar. Also keep in mind that these predefined iterators expand the generality of the STL algorithms. Thus, not only can copy() copy information from one container to another, it can copy information from a container to the output stream and from the input stream to a container. And you also can use copy() to insert material into another container. So you wind up with a single function doing the work of many. And because copy() is just one of several STL functions that use an output iterator, these predefined iterators multiply the capabilities of those functions, too.

Kinds of Containers

The STL has both container concepts and container types. The concepts are general categories with names like container, sequence container, associative container, and so on. The container types are templates you can use to create specific container objects. The eleven container types are deque, list, queue, priority_queue, stack, vector, map, multimap, set, multiset, and bitset. (This chapter won't discuss bitset, which is a container for dealing with data at the bit level.) Because the concepts categorize the types, let's start with them.

The Container Concept

There is no type corresponding to the basic container concept, but the concept describes elements common to all the container classes. It's sort of a conceptual abstract base class?conceptual because the container classes don't actually use the inheritance mechanism. Or putting it another way, the container concept lays down a set of requirements that all STL container classes must satisfy.

A container is an object that stores other objects, which are all of a single type. The stored objects may be objects in the OOP sense, or they may be values of built-in types. Data stored in a container are owned by the container. That means when a container expires, so do the data stored in the container. (However, if the data are pointers, the pointed-to data does not necessarily expire.)

You can't store just any kind of object in a container. In particular, the type has to be copy constructable and assignable. Basic types satisfy these requirements as do class types unless the class definition makes one or both of the copy constructor and the assignment operator private or protected.

The basic container doesn't guarantee that its elements are stored in any particular order or that the order doesn't change, but refinements to the concept may add such guarantees. All containers do provide certain features and operations. Table 16.5 summarizes several of these common features. In the table, X represents a container type, such as vector, T represents the type of object stored in the container, a and b represent values of type X, and u represents an identifier of type X.

Table 16.5. Some Basic Container Properties
Expression	Return type	Comment	Complexity
`X::iterator`	Iterator type pointing to `T`	Any iterator category except output iterator	Compile time
`X::value_type`	`T`	The type for `T`	Compile time
`X u;`		Creates 0-size container called `u`	Constant
`X();`		Creates 0-size anonymous container	Constant
`X u(a);`		Copy constructor	Linear
`X u = a;`		Same effect as `X u(a);`
`(&a)->~X();`	`void`	Apply destructor to every element of a container	Linear
`a.begin()`	iterator	Returns an iterator referring to first element of container	Constant
`a.end()`	iterator	Returns an iterator that is a past-the-end value	Constant
`a.size()`	unsigned integral type	Number of elements, equal to `a.end() - a.begin()`	Constant
`a.swap(b)`	`void`	Swap contents of `a` and `b`	Constant
`a == b`	convertible to `bool`	True if `a` and `b` have the same size and each element in `a` is equivalent to (`==` is true) the corresponding element in `b`	Linear
`a != b`	convertible to `bool`	same as `!(a == b)`	Linear

The complexity column describes the time needed to perform an operator. This table lists three possibilities, which, from fastest to slowest, are as follows:

compile time
constant time
linear time

If the complexity is compile time, the action is performed during compilation and uses no execution time. A constant complexity means the operation takes place during runtime but doesn't depend upon the number of elements in an object. A linear complexity means the time is proportional to the number of elements. Thus, if a and b are containers, a == b has linear complexity because the == operation may have to be applied to each element of the container. Actually, that is a worst-case scenario. If two containers have different sizes, no individual comparisons need be made.

Constant Time and Linear Time Complexity

graphics/common.gif

Consider a long narrow box filled with large packages arranged in a line, and suppose the box is open at just one end. Suppose my task is to unload the package at the open end. This is a constant time task. Whether there are 10 packages or a 1000 packages behind the one at the end makes no difference.

Now suppose my task is to fetch the package at the closed end of the box. This is a linear time task. If there are 10 packages altogether, I have to unload 10 packages to get the one at the closed end. If there are 100 packages, I have to unload 100 packages at the end. Assuming I'm a tireless worker who can move only one package at a time, that will take ten times longer.

Now suppose my task is to fetch an arbitrary package. It might happen that the package I'm supposed to get is the first one at hand. However, on the average, the number of packages I have to move is still proportional to the number of packages in the container, so the task still has linear time complexity.

Replacing the long narrow box with a similar box having open sides would change the task to constant time complexity, for then I could move directly to the desired package and remove it without moving the others.

The idea of time complexity describes the effect of container size on execution time but ignores other factors. If a superhero can unload packages from a box with one open end 1000 times faster than me, the task as executed by her still has linear time complexity. In this case, her linear time performance with a closed box (open end) would be faster than my constant time performance with an open box as long as the boxes didn't have too many packages.

Complexity requirements are characteristic of the STL. While the details of an implementation may be hidden, the performance specifications should be public so that you know the computing cost of doing a particular operation.

Sequences

You can refine the basic container concept by adding requirements. The sequence is an important refinement, for six of the STL container types (deque, list, queue, priority_queue, stack, and vector) are sequences. (Recall that a queue allows elements to be added at the rear end and removed from the front. A double-ended queue, represented by deque, allows addition and removal at both ends.) The sequence concept adds the requirement that the iterator be at least a forward iterator. This, in turn, guarantees that the elements are arranged in a definite order that doesn't change from one cycle of iteration to the next.

The sequence also requires that its elements be arranged in strict linear order. That is, there is a first element, a last element, and each element but the first and last has exactly one element immediately ahead of it and one element immediately after it. An array and a linked list are examples of a sequence, while a branching structure (in which each node points to two daughter nodes) would not be.

Because elements in sequence have a definite order, operations such as inserting values at a particular location and erasing a particular range become possible. Table 16.6 lists these and other operations required of a sequence. The table uses the same notation as Table 16.5, with the addition of t representing a value of type T, that is, the type of value stored in the container, of n, an integer, and of p, q, i and j, representing iterators.

Table 16.6. Sequence Requirements
Expression	Return Type	Comments
`X a(n,t);`		Declares a sequence `a` of `n` copies of value `t`
`X(n, t)`		Creates an anonymous sequence of `n` copies of value `t`
`X a(i, j)`		Declares a sequence `a` initialized to the contents of range [`i`,`j`)
`X(i, j)`		Creates an anonymous sequence initialized to the contents of range [`i`,`j`)
`a.insert(p,t)`	`iterator`	Inserts a copy of `t` before `p`
`a.insert(p,n,t)`	`void`	Inserts `n` copies of `t` before `p`
`a.insert(p,i,j)`	`void`	Insert copies of elements in the range [`i`,`j`) before `p`
`a.erase(p)`	`iterator`	Erases the element pointed to by `p`
`a.erase(p,q)`	`iterator`	Erases the elements in the range [`p`,`q`)
`a.clear()`	`void`	Same as `erase(begin(), end())`

Because the deque, list, queue, stack, and vector template classes all are models of the sequence concept, they all support the operators of Table 16.6. In addition there are operations that are available to some of these five models. When allowed, they have constant time complexity. Table 16.7 lists these additional operations.

Table 16.7. Optional Sequence Requirements
Expression	Return Type	Meaning	Container
`a.front()`	`T&`	`*a.begin()`	`vector`, `list`, `deque`
`a.back()`	`T&`	`*--a.end()`	`vector`, `list`, `deque`
`a.push_front(t)`	`void`	`a.insert(a.begin(), t)`	`list`, `deque`
`a.push_back(t)`	`void`	`a.insert(a.end(), t)`	`vector`, `list`, `deque`
`a.pop_front(t)`	`void`	`a.erase(a.begin())`	`list`, `deque`
`a.pop_back(t)`	`void`	`a.erase(--a.end())`	`vector`, `list`, `deque`
`a[n]`	`T&`	`*(a.begin() + n)`	`vector`, `deque`
`a.at(n)`	`T&`	`*(a.begin() + n)`	`vector`, `deque`

Table 16.7 does merit a comment or two. First, you'll notice that a[n] and a.at(n) both return a reference to the nth element (numbering from 0) in a container. The difference is that a.at(n) does bounds checking and throws an out_of_range exception if n is outside the valid range for the container. Next, you might wonder why, say, push_front() is defined for list and deque and not vector. Suppose you want to insert a new value at the front of a vector of 100 elements. To make room, you have to move element 99 to position 100, and then move element 98 to position 99, and so on. This is an operation with linear time complexity because moving 100 elements would take 100 times as long as moving a single element. But the operations in Table 16.7 are supposed to be implemented only if they can be performed with constant time complexity. The design for lists and double-ended queues, however, allows an element to be added to the front without moving the other elements to new locations, so they can implement push_front() with constant time complexity. Figure 16.4 illustrates push_front() and push_back().

Figure 16.4. `push_front()` and `push_back()`.

graphics/16fig04.gif

Let's take a closer look at the six sequence container types.

`vector`

You've already seen several examples using the vector template, which is declared in the vector header file. In brief, vector is a class representation of an array. The class provides automatic memory management that allows the size of a vector object to vary dynamically, growing and shrinking as elements are added or removed. It provides random access to elements. Elements can be added to or removed from the end in constant time, but insertion and removal from the beginning and the middle are linear-time operations.

In addition to being a sequence, a vector also is a model of the reversible container concept. This adds two more class methods: rbegin() returns an iterator to the first element of the reversed sequence, and rend() returns a past-the-end iterator for the reversed sequence. So if dice is a vector<int> container and Show(int) is a function that displays an integer, the following code will first display the contents of dice in forward order and then in reverse order:

for_each(dice.begin(), dice.end(), Show);    // display in order
cout << endl;
for_each(dice.rbegin(), dice.rend(), Show);  // display in reversed order
cout << endl;

The iterator returned by the two methods is of a class scope type reverse_iterator. Incrementing such an iterator, recall, causes it to move through a reversible container in reverse order.

The vector template class is the simplest of the sequence types and is considered the type that should be used by default unless the program requirements are better satisfied by the particular virtues of the other types.

`deque`

The deque template class (declared in the deque header file) represents a double-ended queue, a type often called a deque (sounds like deck), for short. As implemented in the STL, it's a lot like a vector, supporting random access. The main difference is that inserting and removing items from the beginning of a deque object are constant-time operations instead of being linear-time operations the way they are for vector. So if most operations take place at the beginning and ends of a sequence, consider using a deque data structure.

The goal of constant-time insertion and removal at both ends of a deque make the design of a deque more complex than that of a vector. Thus, while both offer random access to elements and linear-time insertion and removal from the middle of a sequence, the vector should allow faster execution of these operations.

`list`

The list template class (declared in the list header file) represents a doubly linked list. Each element, other than the first and last, is linked to the item before it and the item following it, implying that a list can be traversed in both directions. The crucial difference between list and vector is that list provides for constant-time insertion and removal of elements at any location in the list. (The vector template, recall, provides linear-time insertion and removal except at the end, where it provides constant-time insertion and removal.) Thus, vector emphasizes rapid access via random access, while a list emphasizes rapid insertion and removal of elements.

Like vector, list is a reversible container. Unlike vector, list does not support array notation and random access. Unlike a vector iterator, a list iterator remains pointing to the same element even after items are inserted into or removed from a container. Let's clarify this statement. Suppose you have an iterator pointing to the fifth element of a vector container. Then suppose you insert an element at the beginning of the container. All the other elements have to be moved to make room, so after the insertion, the fifth element now contains the value that used to be in the fourth element. Thus, the iterator points to the same location but different data. Inserting a new element into a list, however, doesn't move the existing elements; it just alters the link information. An iterator pointing to a certain item still points to the same item, but it may be linked to different items than before.

The list template class has some list-oriented member functions in addition to those that come with sequences and reversible containers. Table 16.8 lists many of them. (For a complete list of STL methods and functions, see Appendix G.) The Alloc template parameter is one you normally don't have to worry about because it has a default value.

Table 16.8. Some `list` Member Functions
Function	Description
`void merge(list<T, Alloc>& x)`	Merges list `x` with the invoking list. Both lists must be sorted. The resulting sorted list is in the invoking list, and `x` is left empty. This function has linear time complexity.
`void remove(const T & val)`	Removes all instances of `val` from the list. This function has linear time complexity.
`void sort()`	Sorts the list using the `<` operator; the complexity is `N` log `N` for `N` elements.
void splice(iterator pos, list<T, Alloc> x)	Inserts the contents of list `x` in front of position `pos`, and `x` is left empty. This function has constant-time complexity.
`void unique()`	Collapses each consecutive group of equal elements to a single element. This function has linear-time complexity.

Listing 16.9 illustrates these methods along with the insert() method.

Listing 16.9 `list.cpp`

// list.cpp -- using a list
#include <iostream>
#include <list>
#include <iterator>
using namespace std;

int main()
{
    list<int> one(5, 2); // list of 5 2s
    int stuff[5] = {1,2,4,8, 6};
    list<int> two;
    two.insert(two.begin(),stuff, stuff + 5 );
    int more[6] = {6, 4, 2, 4, 6, 5};
    list<int> three(two);
    three.insert(three.end(), more, more + 6);

    cout << "List one: ";
    ostream_iterator<int,char> out(cout, " ");
    copy(one.begin(), one.end(), out);
    cout << endl << "List two: ";
    copy(two.begin(), two.end(), out);
    cout << endl << "List three: ";
    copy(three.begin(), three.end(), out);
    three.remove(2);
    cout << endl << "List three minus 2s: ";
    copy(three.begin(), three.end(), out);
    three.splice(three.begin(), one);
    cout << endl << "List three after splice: ";
    copy(three.begin(), three.end(), out);
    cout << endl << "List one: ";
    copy(one.begin(), one.end(), out);
    three.unique();
    cout << endl << "List three after unique: ";
    copy(three.begin(), three.end(), out);
    three.sort();
    three.unique();
    cout << endl << "List three after sort & unique: ";
    copy(three.begin(), three.end(), out);
    two.sort();
    three.merge(two);
    cout << endl << "Sorted two merged into three: ";
    copy(three.begin(), three.end(), out);
    cout << endl;

    return 0;
}

Compatibility Note

graphics/hands.gif

Older versions may use list.h and iterator.h. Also, older versions may use ostream_iterator<int> instead of ostream_iterator<int,char>.

Here is the output:

List one: 2 2 2 2 2
List two: 1 2 4 8 6
List three: 1 2 4 8 6 6 4 2 4 6 5
List three minus 2s: 1 4 8 6 6 4 4 6 5
List three after splice: 2 2 2 2 2 1 4 8 6 6 4 4 6 5
List one:
List three after unique: 2 1 4 8 6 4 6 5
List three after sort & unique: 1 2 4 5 6 8
Sorted two merged into three: 1 1 2 2 4 4 5 6 6 8 8

Program Notes

The program uses the technique discussed earlier for using the general STL copy() function and an ostream_iterator object to display the contents of a container.

The main difference between insert() and splice() is that insert() inserts a copy of the original range into the destination, while splice() moves the original range into the destination. Thus, after the contents of one are spliced to three, one is left empty. (The splice() method has additional prototypes for moving single elements and a range of elements.) The splice() method leaves iterators valid. That is, if you set a particular iterator to point to an element in one, that iterator would still point to the same element after splice() relocated it in three.

Notice that unique() only reduces adjacent equal values to a single value. After the program executes three.unique(), three still contains two 4s and two 6s that weren't adjacent. But applying sort() and then unique() does limit each value to a single appearance.

There is a non-member sort() function (Listing 16.6), but it requires random access iterators. Because the trade-off for rapid insertion was giving up random access, you can't use the non-member sort() with a list. Therefore, the class includes a member version that works within the restrictions of the class.

The `list` Toolbox

The list methods form a handy toolbox. Suppose, for example, that you have two mailing lists to organize. You could sort each list, merge them, and then use unique() to remove multiple entries.

The sort(), merge(), and unique() methods also each have a version accepting an additional argument to specify an alternative function to be used for comparing elements. Similarly, the remove() method has a version with an additional argument specifying a function used to determine whether or not an element is removed. These arguments are examples of predicate functions, a topic to which we'll return later.

`queue`

The queue template class (declared in the queue (formerly queue.h header file) is an adapter class. Recall that the ostream_iterator template is an adapter that allows an output stream to use the iterator interface. Similarly, the queue template allows an underlying class (deque, by default) to exhibit the typical queue interface.

The queue template is more restrictive than deque. Not only doesn't it permit random access to elements of a queue, the queue class doesn't even allow you to iterate through a queue. Instead, it limits you to the basic operations that define a queue. You can add an element to the rear of queue, remove an element from the front of a queue, view the values of the front and rear elements, check the number of elements, and test to see if the queue is empty. Table 16.9 lists these operations.

Table 16.9. `queue` Operations
Method	Description
`bool empty() const`	Returns `true` if the queue is empty, and `false` otherwise.
`size_type size() const`	Returns the number of elements in the queue.
`T& front()`	Returns a reference to the element at the front of the queue.
`T& back()`	Returns a reference to the element at the back of the queue.
`void push(const T& x)`	Inserts `x` at the back of the queue.
`void pop()`	Removes the element at the front of the queue.

Note that pop() is a data removal method, not a data retrieval method. If you want to use a value from a queue, first use front() to retrieve the value, and then pop() to remove it from the queue.

`priority_queue`

The priority_queue template class (also declared in the queue header file) is another adapter class. It supports the same operations as queue. The main difference is that the largest item gets moved to the front of the queue. (Life is not always fair, and neither are queues.) An internal difference is that the default underlying class is vector. You can alter the comparison used to determine what gets to the head of the queue by providing an optional constructor argument:

priority_queue<int> pq1;                // default version
priority_queue<int> pq2(greater<int>);  // use greater<int> to order

The greater<>() function is a predefined function object discussed later in this chapter.

`stack`

Like queue, stack (declared in the stack—formerly stack.h—header file) is an adapter class. It gives an underlying class (vector, by default) the typical stack interface.

The stack template is more restrictive than vector. Not only doesn't it permit random access to elements of a stack, the stack class doesn't even allow you to iterate through a stack. Instead, it limits you to the basic operations that define a stack. You can push a value onto the top of a stack, pop an element from the top of a stack, view the value at the top of the stack, check the number of elements, and test to see if the stack is empty. Table 16.10 lists these operations.

Table 16.10. `stack` Operations
Method	Description
`bool empty() const`	Returns `true` if the stack is empty, and `false` otherwise.
`size_type size() const`	Returns the number of elements in the stack.
`T& top()`	Returns a reference to the element at the top of the stack.
`void push(const T& x)`	Inserts `x` at the top of the stack.
`void pop()`	Removes the element at the top of the stack.

Much as with queue, if you want to use a value from a stack, first use top() to retrieve the value, then use pop() to remove it from the queue.

Associative Containers

The associative container is another refinement of the container concept. An associative container associates a value with a key and uses the key to find the value. For example, the values could be structures representing employee information, such as name, address, office number, home and work phones, health plan, and so on, and the key could be a unique employee number. To fetch the employee information, a program would use the key to locate the employee structure. Recall that for a container X, in general, the expression X::value_type indicates the type of value stored in the container. For an associative container, the expression X::key_type indicates the type used for the key.

The strength of an associative container is that it provides rapid access to its elements. Like a sequence, an associative container allows you to insert new elements; however, you can't specify a particular location for the inserted elements. The reason is that an associative container usually has a particular algorithm for determining where to place data so that it can retrieve information quickly.

The STL provides four associative containers: set, multiset, map, and multimap. The first two types are defined in the set header file (formerly separately in set.h and multiset.h), and the second two types are defined in the map header file (formerly separately in map.h and multimap.h).

The simplest of the bunch is set; the value type is the same as the key type, and the keys are unique, meaning there is no more than one instance of a key in a set. Indeed, for set, the value is the key. The multiset type is like the set type except that it can have more than one value with the same key. For example, if the key and value type are int, a multiset object could hold, say 1,2,2,2,3,5,7,7.

For the map type, the value type is different from the key type, and the keys are unique, with only one value per key. The multimap type is similar to map, except one key can be associated with multiple values.

There's too much information about these types to cover in this chapter (but Appendix G does list the methods), so let's just look at a simple example using set and a simple example using multimap.

A `set` Example

The STL set models several concepts. It is an associative set, it is reversible, it is sorted, and the keys are unique, so it can hold no more than one of any given value. Like vector and list, set uses a template parameter to provide the type stored:

set<string> A;  // a set of string objects

An optional second template argument can be used to indicate a comparison function or object to be used to order the key. By default, the less<> template (discussed later) is used. Older implementations may not provide a default value and thus require an explicit template parameter:

set<string, less<string> > A;  // older implementation

Consider the following code:

const int N = 6;
string s1[N] = {"buffoon", "thinkers", "for", "heavy", "can", "for"};
set<string> A(s1, s1 + N);       // initialize set A using a range from array
ostream_iterator<string, char> out(cout, " ");
copy(A.begin(), A.end(), out);

Like other containers, set has a constructor (see Table 16.6) that takes a range of iterators as arguments. This provides a simple way to initialize a set to the contents of an array. Remember, the last element of a range is one past the end, and s1 + N points to one position past the end of array s1. The output for this code fragment illustrates that keys are unique (the string "for" appears twice in the array but once in the set) and that the set is sorted:

buffoon can for heavy thinkers

Mathematics defines some standard operations for sets. The union of two sets is a set that combines the contents of the two sets. If a particular value is common to both sets, it appears just once in the union because of the unique key feature. The intersection of two sets is a set consisting of those elements common to both sets. The difference between two sets is the first set minus the elements common to both sets.

The STL provides algorithms supporting these operations. They are general functions rather than methods, so they aren't restricted to set objects. However, all set objects automatically satisfy the precondition for using these algorithms, namely, that the container be sorted. The set_union() function takes five iterators as arguments. The first two define a range in one set, the second two a range in a second set, and the final iterator is an output iterator identifying a location to copy the resultant set. For example, to display the union of sets A and B, you can do this:

set_union(A.begin(), A.end(), B.begin(), B.end(),
           ostream_iterator<string, char> out(cout, " "));

Suppose you want to place the result into a set C instead of displaying it. Then you would want the last argument to be an iterator into C. The obvious choice is C.begin(), but it doesn't work for two reasons. The first reason is that associative sets treat keys as constant values, so the iterator returned by C.begin() is a constant iterator and can't be used as an output iterator. The second reason not to use C.begin() directly is that set_union(), like copy(), overwrites existing data in a container and requires the container to have sufficient space to hold the new information. C, being empty, does not satisfy that requirement. But the insert_iterator template discussed earlier solves both problems. Earlier you saw that it converts copying to insertion. Also, it models the output iterator concept, so you can use it to write to a container. So you can construct an anonymous insert_iterator to copy information to C. The constructor, recall, takes the name of the container and an iterator as arguments:

set_union(A.begin(), A.end(), B.begin(), B.end(),
           insert_iterator<set<string> >(C, C.begin()));

The set_intersection() and set_difference() functions find the set intersection and set difference of two sets, and they have the same interface as set_union().

Two useful set methods are lower_bound() and upper_bound(). The lower_bound() method takes a key as its argument and returns an iterator pointing to the first member of the set that is not less than the key argument. Similarly, the upper_bound() method takes a key as its argument and returns an iterator pointing to the first member of the set that is greater than the key argument. For example, if you had a set of strings, you could use these methods to identify a range encompassing all strings from "b" up to "f" in the set.

Because sorting determines where additions to a set go, the class has insertion methods that just specify the material to be inserted without specifying a position. If A and B are sets of strings, for example, you can do this:

string s("tennis");
A.insert(s);                    // insert a value
B.insert(A.begin(), A.end());   // insert a range

Listing 16.10 illustrates these uses of sets.

Listing 16.10 `setops.cpp`

// setops.cpp -- some set operations
#include <iostream>
#include <string>
#include <set>
#include <algorithm>
#include <iterator>
using namespace std;

int main()
{
    const int N = 6;
    string s1[N] = {"buffoon", "thinkers", "for", "heavy", "can", "for"};
    string s2[N] = {"metal", "any", "food", "elegant", "deliver","for"};

    set<string> A(s1, s1 + N);
    set<string> B(s2, s2 + N);

    ostream_iterator<string, char> out(cout, " ");
    cout << "Set A: ";
    copy(A.begin(), A.end(), out);
    cout << endl;
    cout << "Set B: ";
    copy(B.begin(), B.end(), out);
    cout << endl;

    cout << "Union of A and B:\n";
    set_union(A.begin(), A.end(), B.begin(), B.end(), out);
    cout << endl;

    cout << "Intersection of A and B:\n";
    set_intersection(A.begin(), A.end(), B.begin(), B.end(), out);
    cout << endl;

    cout << "Difference of A and B:\n";
    set_difference(A.begin(), A.end(), B.begin(), B.end(), out);
    cout << endl;

    set<string> C;
    cout << "Set C:\n";
    set_union(A.begin(), A.end(), B.begin(), B.end(),
        insert_iterator<set<string> >(C, C.begin()));
    copy(C.begin(), C.end(), out);
    cout << endl;

    string s3("grungy");
    C.insert(s3);
    cout << "Set C after insertion:\n";
    copy(C.begin(), C.end(),out);
    cout << endl;

    cout << "Showing a range:\n";
    copy(C.lower_bound("ghost"),C.upper_bound("spook"), out);
    cout << endl;

    return 0;
}

Compatibility Note

graphics/hands.gif

Older implementations may use set.h, iterator.h, and algo.h. Older implementations may require less<string> as a second template argument for set. Also, older versions may use ostream_iterator<string> instead of ostream_iterator<string,char>.

Here is the output:

Set A: buffoon can for heavy thinkers
Set B: any deliver elegant food for metal
Union of A and B:
any buffoon can deliver elegant food for heavy metal thinkers
Intersection of A and B:
for
Difference of A and B:
buffoon can heavy thinkers
Set C:
any buffoon can deliver elegant food for heavy metal thinkers
Set C after insertion:
any buffoon can deliver elegant food for grungy heavy metal thinkers
Showing a range:
grungy heavy metal

A `multimap` Example

Like set, multimap is a reversible, sorted, associative container. However, the key type is different from the value type, and a multimap object can have more than one value associated with a particular key.

The basic multimap declaration specifies the key type and the type of value stored as template arguments. For example, the following declaration creates a multimap object using int as the key type and string as the type of value stored:

multimap<int,string> codes;

An optional third template argument can be used to indicate a comparison function or object to be used to order the key. By default, the less<> template (discussed later) is used with the key type as its parameter. Older implementations may require this template parameter explicitly.

To keep information together, the actual value type combines the key type and the data type into a single pair. To do this, the STL uses a pair<class T, class U> template class for storing two kinds of values in a single object. If keytype is the key type and datatype is the type of the stored data, then the value type is pair<const keytype, datatype>. For example, the value type for the codes object declared earlier is pair<const int, string>.

Suppose, for example, you wanted to store city names using the area code as a key. This happens to fit the codes declaration, which uses an int for a key and a string as a data type. One approach is to create a pair and then insert it:

pair<const int, string> item(213, "Los Angeles");
codes.insert(item);

Or you can create an anonymous pair object and insert it in a single statement:

codes.insert(pair<const int, string> (213, "Los Angeles"));

Because items are sorted by key, there's no need to identify an insertion location.

Given a pair object, you can access the two components by using the first and second members:

pair<const int, string> item(213, "Los Angeles");
cout << item.first << ' ' << item.second << endl;

What about getting information about a multimap object? The count() member function takes a key as its argument and returns the number of items having that key. There are lower_bound() and upper_bound() member functions that take a key and work as they did for set. There's an equal_range() member function that takes a key as its argument and returns iterators representing the range matching that key. In order to return two values, the method packages them into a pair object, this time with both template arguments being the iterator type. For example, the following would print a list of cities in the codes object with area code 718:

pair<multimap<KeyType, string>::iterator,
     multimap<KeyType, string>::iterator> range
                         = codes.equal_range(718);
cout << "Cities with area code 718:\n";
for (it = range.first; it != range.second; ++it)
    cout <<  (*it).second    << endl;

Listing 16.11 demonstrates most of these techniques. It also uses typedef to simplify some of the code writing.

Listing 16.11 `multmap.cpp`

// multmap.cpp -- use a multimap
#include <iostream>
using namespace std;
#include <string>
#include <map>
#include <algorithm>

typedef int KeyType;
typedef pair<const KeyType, string> Pair;
typedef multimap<KeyType, string> MapCode;

int main()
{
    MapCode codes;

    codes.insert(Pair(415, "San Francisco"));
    codes.insert(Pair(510, "Oakland"));
    codes.insert(Pair(718, "Brooklyn"));
    codes.insert(Pair(718, "Staten Island"));
    codes.insert(Pair(415, "San Rafael"));
    codes.insert(Pair(510, "Berkeley"));

    cout << "Number of cities with area code 415: "
         << codes.count(415) << endl;
    cout << "Number of cities with area code 718: "
         << codes.count(718) << endl;
    cout << "Number of cities with area code 510: "
         << codes.count(510) << endl;
    cout << "Area Code   City\n";
    MapCode::iterator it;
    for (it = codes.begin(); it != codes.end(); ++it)
        cout << "    " << (*it).first << "     "
            << (*it).second    << endl;

    pair<MapCode::iterator, MapCode::iterator> range
         = codes.equal_range(718);
    cout << "Cities with area code 718:\n";
    for (it = range.first; it != range.second; ++it)
        cout <<  (*it).second    << endl;

    return 0;
}

Compatibility Note

graphics/hands.gif

Older implementations may use multimap.h and algo.h. Older implementations may require less<Pair> as a third template argument for multimap. Also, older versions may use ostream_iterator<string> instead of ostream_iterator<string,char>. Borland's C++Builder 1.0 wants the const omitted from the Pair typedef.

Here is the output:

Number of cities with area code 415: 2
Number of cities with area code 718: 2
Number of cities with area code 510: 2
Area Code   City
    415     San Francisco
    415     San Rafael
    510     Oakland
    510     Berkeley
    718     Brooklyn
    718     Staten Island
Cities with area code 718:
Brooklyn
Staten Island

Function Objects (aka Functors)

Many STL algorithms use function objects, also known as functors. A functor is any object that can be used with () in the manner of a function. This includes normal function names, pointers to functions, and class objects for which the () operator is overloaded, that is, classes for which the peculiar-looking function operator()() is defined. For example, you could define a class like this:

class Linear
{
private:
    double slope;
    double y0;
public:
    Linear(double _sl = 1, double _y = 0)
        : slope(_sl), y0(_y) {}
    double operator()(double x) {return y0 + slope * x; }
};

The overloaded () operator then allows you to use Linear objects like functions:

Linear f1;
Linear f2(2.5, 10.0);
double y1 = f1(12.5);   // rhs is f1.operator()(12.5)
double y2 = f2(0.4);

Remember the for_each function? It applied a specified function to each member of a range:

for_each(books.begin(), books.end(), ShowReview);

In general, the third argument could be a functor, not just a regular function. Actually, this raises a question. How do you declare the third argument? You can't declare it as a function pointer, for a function pointer specifies the argument type. Because a container can contain about any type, you don't know in advance what particular argument type should be used. The STL solves that problem by using templates. The for_each prototype looks like this:

template<class InputIterator, class Function>
Function for_each(InputIterator first, InputIterator last, Function f);

The ShowReview() prototype was this:

void ShowReview(const Review &);

This makes the identifier ShowReview have the type void (*)(const Review &), so that is the type assigned to the template argument Function. With a different function call, the Function argument could represent a class type having an overloaded () operator. Ultimately, the for_each() code will have an expression using f(...). In the ShowReview() example, f is a pointer to a function, and f(...) invokes the function. If the final for_each() argument is an object, then f(...) becomes the object invoking its overloaded () operator.

Functor Concepts

Just as the STL defines concepts for containers and iterators, it defines functor concepts:

A generator is a functor that can be called with no arguments.
A unary function is a functor that can be called with one argument.
A binary function is a functor that can be called with two arguments.

For example, the functor supplied to for_each() should be a unary function because it is applied to one container element at a time.

Of course, these concepts come with refinements:

A unary function that returns a bool value is a predicate.
A binary function that returns a bool value is a binary predicate.

Several STL functions require predicate or binary predicate arguments. For example, Listing 16.6 uses a version of sort() that took a binary predicate as its third argument:

bool WorseThan(const Review & r1, const Review & r2);
...
sort(books.begin(), books.end(), WorseThan);

The list template has a remove_if() member that takes a predicate as an argument. It applies the predicate to each member in the indicated range, removing those elements for which the predicate returns true. For example, the following code would remove all elements greater than 100 from the list three:

bool tooBig(int n){ return n > 100; }
list<int> scores;
...
scores.remove_if(tooBig);

Incidentally, this last example shows where a class functor might be useful. Suppose you wanted to remove every value greater than 200 from a second list. It would be nice if you could pass the cut-off value to tooBig() as a second argument so you could use the function with different values, but a predicate can have but one argument. If, however, you design a TooBig class, you can use class members instead of function arguments to convey additional information:

template<class T>
class TooBig
{
private:
    T cutoff;
public:
    TooBig(const T & t) : cutoff(t) {}
    bool operator()(const T & v) { return v > cutoff; }
};

Here one value (v) is passed as a function argument while the second argument (cutoff) is set by the class constructor. Given this definition, you can initialize different TooBig objects to different cut-off values:

TooBig<int> f100(100);
list<int> froobies;
list<int> scores;
...
froobies.remove_if(f100);               // use a named function object
scores.remove_if(TooBig<int>(200));     // construct a function object

Compatibility Note

graphics/hands.gif

The remove_if() method is a template method of a template class. Template methods are a recent extension to C++ template facilities (mainly because the STL needs them), and most compilers at the time of this writing haven't implemented them yet. However, there also is a non-member remove_if() function that takes a range (two iterators) and a predicate as arguments.

Suppose that you already had a template function with two arguments:

template <class T>
bool tooBig(const T & val, const T & lim)
{
    return val > lim;
}

You can use a class to convert it to a one-argument function object:

template<class T>
class TooBig2
{
private:
    T cutoff;
public:
    TooBig2(const T & t) : cutoff(t) {}
    bool operator()(const T & v) { return tooBig<T>(v, cutoff); }
};

That is, you can do the following:

TooBig2<int> tB100(100);
int x;
cin >> x;
if (tB100(x))   // same as if (tooBig(x,100))
    ...

So the call tB100(x) is the same as tooBig(x,100), but the two-argument function is converted to a one-argument function object, with the second argument being used to construct the function object. In short, the class functor TooBig2 is a function adapter, adapting a function to meet a different interface.

Predefined Functors

The STL defines several elementary function objects. They perform actions such as adding two values, comparing two values for equality, and so on. They are provided to help support STL functions that take functions as arguments. For example, consider the transform()function. It has two versions. The first version takes four arguments. The first two are iterators specifying a range in a container. (By now you must be familiar with that approach.) The third is an iterator specifying where to copy the result. The final is a functor which is applied to each element in the range to produce each new element in the result. For example, consider the following:

const int LIM = 5;
double arr1[LIM] = {36, 39, 42, 45, 48};
vector<double> gr8(arr1, arr1 + LIM);
ostream_iterator<double, char> out(cout, " ");
transform(gr8.begin(), gr8.end(), out, sqrt);

This code would calculate the square root of each element and send the resulting values to the output stream. The destination iterator can be in the original range. For example, replacing out in the above example with gr8.begin() would copy the new values over the old values. Clearly, the functor used must be one that works with a single argument.

The second version uses a function that takes two arguments, applying it to one element from each of two ranges. It takes an additional argument, which comes third in order, identifying the start of the second range. For example, if m8 were a second vector<double> object and if mean(double, double) returned the mean of two values, the following would output the average of each pair of values from gr8 and m8:

transform(gr8.begin(), gr8.end(), m8.begin(), out, mean);

Now suppose you want to add the two arrays. You can't use + as an argument because, for type double, + is a built-in operator, not a function. You could define a function to add two numbers and use it:

double add(double x, double y) { return x + y; }
...
transform(gr8.begin(), gr8.end(), m8.begin(), out, add);

But then you'd have to define a separate function for each type. It would be better to define a template, except you don't have to, because the STL already has. The functional (formerly function.h) header defines several template class function objects including one called plus<>().

Using the plus<> class for ordinary addition is possible, if awkward:

#include <functional>
...
plus<double> add;  // create a plus<double> object
double y = add(2.2, 3.4); // using plus<double>::operator()()

But it makes it easy to provide a function object as an argument:

transform(gr8.begin(), gr8.end(), m8.begin(), out, plus<double>() );

Here, rather than create a named object, the code uses the plus<double> constructor to construct a function object to do the adding. (The parentheses indicate calling the default constructor; what's passed to transform() is the constructed function object.)

The STL provides function object equivalents for all the built-in arithmetic, relational, and logical operators. Table 16.11 shows the names for these functor equivalents. They can be used with the C++ built-in types or with any user-defined type that overloads the corresponding operator.

Caution

graphics/tnt.gif

Older implementations use the name times instead of multiplies.

Table 16.11. Operators and Function Object Equivalents
Operator	Function Object Equivalent
`+`	`plus`
`-`	`minus`
`*`	`multiplies`
`/`	`divides`
`%`	`modulus`
`-`	`negate`
`==`	`equal_to`
`!=`	`not_equal_to`
`>`	`greater`
`<`	`less`
`>=`	`greater_equal`
`<=`	`less_equal`
`&&`	`logical_and`
`\|\|`	`logical_or`
`!`	`logical_not`

Adaptable Functors and Function Adapters

The predefined functors of Table 16.11 are all adaptable. Actually, the STL has five related concepts: the adaptable generator, the adaptable unary function, the adaptable binary function, the adaptable predicate, and the adaptable binary predicate.

What makes a functor object adaptable is that it carries typedef members identifying its argument types and return type. The members are called result_type, first_argument_type, and second_argument_type, and they represent what they sound like. For example, the return type of a plus<int> object is identified as plus<int>::result_type, and this would be a typedef for int.

The significance of a function object being adaptable is that it then can be used by function adapter objects which assume the existence of these typedef members. For example, a function with an argument that is an adaptable functor can use the result_type member to declare a variable that matches the function's return type.

Indeed, the STL provides function adapter classes that use these facilities. For example, suppose you want to multiply each element of the vector gr8 by 2.5. That calls for using the transform() version with a unary function argument, like the

transform(gr8.begin(), gr8.end(), out, sqrt);

example shown earlier. The multiplies() functor can do the multiplication, but it's a binary function. So you need a function adapter that converts a functor with two arguments to one with one argument. The TooBig2 example earlier showed one way, but the STL has automated the process with the binder1st and binder2nd classes, which convert adaptable binary functions to adaptable unary functions.

Let's look at binder1st. Suppose you have an adaptable binary function object f2(). You can create a binder1st object that binds a particular value, call it val, to be used as the first argument to f2():

binder1st(f2, val) f1;

Then, invoking f1(x) with its single argument returns the same value as invoking f2() with val as its first argument and f1()'s argument as its second argument. That is, f1(x) is equivalent to f2(val, x) except that it is a unary function instead of a binary function. The f2() function has been adapted. Again, this is possible only if f2() is an adaptable function.

This might seem a bit awkward. However, the STL provides the bind1st() function to simplify using the binder1st class. You give it the function name and value used to construct a binder1st object, and it returns an object of that type. For example, let's convert the binary function multiplies() to a unary function that multiplies its argument by 2.5. Just do this:

bind1st(multiplies<double>(), 2.5)

Thus, the solution to multiplying every element in gr8 by 2.5 and displaying the results is this:

transform(gr8.begin(), gr8.end(), out,
        bind1st(multiplies<double>(), 2.5));

The binder2nd class is similar, except that it assigns the constant to the second argument instead of the first. It has a helper function called bind2nd that works analogously to bind1st.

Tip

graphics/bulb.gif

If an STL function calls for a unary function and you have an adaptable binary function that does the right thing, you can use bind1st() or bind2nd() to adapt the binary function to a unary interface.

Listing 16.12 incorporates some of the recent examples into a short program.

Listing 16.12 `funadap.cpp`

// funadap.cpp -- using function adapters
#include <iostream>
using namespace std;
#include <vector>
#include <iterator>
#include <algorithm>
#include <functional>

void Show(double);
const int LIM = 5;
int main()
{
    double arr1[LIM] = {36, 39, 42, 45, 48};
    double arr2[LIM] = {25, 27, 29, 31, 33};
    vector<double> gr8(arr1, arr1 + LIM);
    vector<double> m8(arr2, arr2 + LIM);

    cout << "gr8:\t";
    for_each(gr8.begin(), gr8.end(), Show);
    cout << endl;
    cout << "m8: \t";
    for_each(m8.begin(), m8.end(), Show);
    cout << endl;

    vector<double> sum(LIM);
    transform(gr8.begin(), gr8.end(), m8.begin(), sum.begin(),
              plus<double>());
    cout << "sum:\t";
    for_each(sum.begin(), sum.end(), Show);
    cout << endl;

    vector<double> prod(LIM);
    transform(gr8.begin(), gr8.end(), prod.begin(),
              bind1st(multiplies<double>(), 2.5));
    cout << "prod:\t";
    for_each(prod.begin(), prod.end(), Show);
    cout << endl;

    return 0;
}

void Show(double v)
{
    cout << v << ' ';
}

Compatibility Note

graphics/hands.gif

Older implementations may use vector.h, iterator.h, algo.h, and function.h. Older implementations may use times instead of multiplies.

Here is the output:

gr8:    36 39 42 45 48
m8:     25 27 29 31 33
sum:    61 66 71 76 81
prod:   90 97.5 105 112.5 120

Algorithms

The STL contains many non-member functions for working with containers. You've seen a few of them already: sort(), copy(), find(), for_each(), random_shuffle(), set_union(), set_intersection(), set_difference(), and transform(). You've probably noticed they feature the same overall design, using iterators to identify data ranges to be processed and to identify where results are to go. Some also take a function object argument to be used as part of the data processing.

There are two main generic components to the algorithm function designs. First, they use templates to provide generic types. Second, they use iterators to provide a generic representation for accessing data in a container. Thus, the copy() function can work with a container holding type double values in an array, with a container holding string values in a linked list, or with a container storing user-defined objects in a tree structure, such as used by set. Because pointers are a special case of iterators, STL functions such as copy() can be used with ordinary arrays.

The uniform container design allows there to be meaningful relations between containers of different kinds. For example, you can use copy() to copy values from an ordinary array to a vector object, from a vector object to a list object, and from a list object to a set object. You can use == to compare different kinds of containers, for example, deque and vector. This is possible because the overloaded == operator for containers uses iterators to compare contents, so a deque object and a vector object test as equal if they have the same content in the same order.

Algorithm Groups

The STL divides the algorithm library into four groups:

Non-modifying sequence operations
Mutating sequence operations
Sorting and related operations
Generalized numeric operations

The first three groups are described in the algorithm (formerly algo.h) header file, while the fourth group, being specifically oriented towards numeric data, gets its own header file, called numeric. (Formerly, they, too, were in algol.h.)

Non-modifying sequence operations operate on each element in a range. These operations leave a container unchanged. For example, find() and for_each() belong to this category.

Mutating sequence operations also operate on each element in a range. As the name suggests, however, they can change the contents of a container. The change could be in values or in the order in which the values are stored. For example, transform(), random_shuffle(), and copy() fall into this category.

Sorting and related operations include several sorting functions (including sort()) and a variety of other functions, including the set operations.

The numeric operations include functions to sum the contents of a range, calculate the inner product of two containers, calculate partial sums, and calculate adjacent differences. Typically, these are operations characteristic of arrays, so vector is the container most likely to be used with them.

Appendix G provides a complete summary of these functions.

General Properties

As you've seen again and again, STL functions work with iterators and iterator ranges. The function prototype indicates the assumptions made about the iterators. For example, the copy() function has this prototype:

template<class InputIterator, class OutputIterator>
OutputIterator copy(InputIterator first, InputIterator last,
                     OutputIterator result);

Because the identifiers InputIterator and OutputIterator are template parameters, they just as easily could have been T and U. However, the STL documentation uses the template parameter names to indicate the concept that the parameter models. So this declaration tells us that the range parameters must be input iterators or better and that the iterator indicating where the result goes must be an output parameter or better.

One way of classifying algorithms is on the basis of where the result of the algorithm is placed. Some algorithms do their work in place, others create copies. For example, when the sort() function is finished, the result occupies the same location that the original data did. So sort() is an in-place algorithm. The copy() function, however, sends the result of its work to another location, so it is a copying algorithm. The transform() function can do both. Like copy(), it uses an output iterator to indicate where the results go. Unlike copy(), transform() allows the output iterator to point to a location in the input range, so it can copy the transformed values over the original values.

Some algorithms come in two versions: an in-place version and a copying version. The STL convention is to append _copy to the name of the copying version. The latter version will take an additional output iterator parameter to specify the location to which to copy the outcome. For example, there is a replace() function with this prototype:

template<class ForwardIterator, class T>
void replace(ForwardIterator first, ForwardIterator last,
              const T& old_value, const T& new_value);

It replaces each instance of old_value with new_value. This occurs in place. Because this algorithm both reads from and writes to container elements, the iterator type has to be ForwardIterator or better. The copying version has this prototype:

template<class InputIterator, class OutputIterator, class T>
OutputIterator replace_copy(InputIterator first, InputIterator last,
              OutputIterator result,
              const T& old_value, const T& new_value);

This time the resulting data is copied to a new location given by result, so the read-only input iterator is sufficient for specifying the range.

Note that replace_copy() has an OutputIterator return type. The convention for copying algorithms is that they return an iterator pointing to the location one past that last value copied.

Another common variation is that some functions have a version that performs an action conditionally, depending upon the result of applying a function to a container element. These versions typically append _if to the function name. For example, replace_if() replaces an old value with a new value if applying a function to the old value returns a value of true. Here's the prototype:

template<class ForwardIterator, class Predicate class T>
void replace_if(ForwardIterator first, ForwardIterator last,
              Predicate pred, const T& new_value);

A predicate, recall, is the name of a unary function returning a bool value. There's also a version called replace_copy_if(). You probably can figure out what it does and what its prototype is like.

As with InputIterator, Predicate is a template parameter name and could just as easily be called T or U. However, the STL chooses to use Predicate to remind the user that the actual argument should be a model of the Predicate concept. Similarly, the STL uses terms like Generator and BinaryPredicate to identify arguments that should model other function object concepts.

The STL and the `string` Class

The string class, although not part of the STL, is designed with the STL in mind. For example, it has begin(), end(), rbegin(), and rend() members. Thus, it can use the STL interface. Listing 16.13 uses the STL to show all the permutations you can form from the letters in a word. A permutation is a rearrangement of the order of the elements in a container. The next_permutation() algorithm transforms the contents of a range to the next permutation; in the case of a string, the permutations are arranged in increasing alphabetical order. The algorithm returns true if it succeeds and false if the range already is in the final sequence. To get all the permutations of a range, you should start with the elements in the earliest possible order, and the program uses the STL sort() algorithm for that purpose.

Listing 16.13 `strgstl.cpp`

// strgstl.cpp -- applying the STL to a string
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;

int main()
{
    string letters;

    cout << "Enter the letter grouping (quit to quit): ";
    while (cin >> letters && letters != "quit")
    {
        cout << "Permutations of " << letters << endl;
        sort(letters.begin(), letters.end());
        cout << letters << endl;
        while (next_permutation(letters.begin(), letters.end()))
            cout << letters << endl;
        cout << "Enter next sequence (quit to quit): ";
    }
    cout << "Done.\n";

    return 0;
}

Here's a sample run:

Enter the letter grouping (quit to quit): wed
Permutations of wed
dew
dwe
edw
ewd
wde
wed
Enter next sequence (quit to quit): wee
Permutations of wee
eew
ewe
wee
Enter next sequence (quit to quit): quit
Done.

Note that the next_permutation() algorithm automatically provides only unique permutations, which is why the output shows more permutations for the word "wed" than for the word "wee", which has duplicate letters.

Functions Versus Container Methods

Sometimes you have a choice between using an STL method and an STL function. Usually, the method is the better choice. First, it should be better optimized for a particular container. Second, being a member function, it can use a template class's memory management facilities and resize a container when needed.

Suppose, for example, that you have a list of numbers and you want to remove all instances of a certain value, say 4, from the list. If la is a list<int> object, you can use the list remove() method:

la.remove(4);  // remove all 4s from the list

After this method call, all elements with the value 4 are removed from the list, and the list is automatically resized.

There also is an STL algorithm called remove() (see Appendix G). Instead of being invoked by an object, it takes range arguments. So, if lb is a list<int> object, a call to the function could look like this:

remove(lb.begin(), lb.end(), 4);

However, because this remove() is not a member, it can't adjust the size of the list. Instead, it makes sure all the non-removed items are at the beginning of the list, and it returns an iterator to the new past-the-end value. You then can use this iterator to fix the list size. For example, you can use the list erase() method to remove a range describing the part of the list that no longer is needed. Listing 16.14 shows how this process works.

Listing 16.14 `listrmv.cpp`

// listrmv.cpp -- applying the STL to a string
#include <iostream>
#include <list>
#include <algorithm>
using namespace std;

void Show(int);
const int LIM = 10;
int main()
{
    int ar[LIM] = {4, 5, 4, 2, 2, 3, 4, 8, 1, 4};
    list<int> la(ar, ar + LIM);
    list<int> lb(la);

    cout << "Original list contents:\n\t";
    for_each(la.begin(), la.end(), Show);
    cout << endl;
    la.remove(4);
    cout << "After using the remove() method:\n";
    cout << "la:\t";
    for_each(la.begin(), la.end(), Show);
    cout << endl;
    list<int>::iterator last;
    last = remove(lb.begin(), lb.end(), 4);
    cout << "After using the remove() function:\n";
    cout << "lb:\t";
    for_each(lb.begin(), lb.end(), Show);
    cout << endl;
    lb.erase(last, lb.end());
    cout << "After using the erase() method:\n";
    cout << "lb:\t";
    for_each(lb.begin(), lb.end(), Show);
    cout << endl;

    return 0;
}

void Show(int v)
{
    cout << v << ' ';
}

Here's the output:

Original list contents:
    4 5 4 2 2 3 4 8 1 4
After using the remove() method:
la: 5 2 2 3 8 1
After using the remove() function:
lb: 5 2 2 3 8 1 4 8 1 4
After using the erase() method:
lb: 5 2 2 3 8 1

As you can see, the remove() method reduced the list la from 10 elements to 6 elements. However, list lb still contained 10 elements after the remove() function was applied to it.

Although the methods are usually better suited, the non-method functions are more general. As you've seen, you can use them on arrays and string objects as well as STL containers, and you can use them with mixed container types, for example, saving data from a vector container in a list or a set.

Using the STL

The STL is a library whose parts are designed to work together. The STL components are tools, but they also are building blocks to create other tools. Let's illustrate that with an example. Suppose you want to write a program that lets the user enter words. At the end, you'd like a record of the words as they were entered, an alphabetical list of the words used (capitalization differences ignored), and a record of how many times each word was entered. To keep things simple, assume the input contains no numbers or punctuation.

Entering and saving the list of words is simple enough. Following the example of Listing 16.5, you can create a vector<string> object and use push_back() to add input words to the vector:

vector<string> words;
string input;
while (cin >> input && input != "quit")
    words.push_back(input);

What about getting the alphabetic word list? You can use sort() followed by unique(), but that approach overwrites the original data because sort() is an in-place algorithm. There is an easier way that avoids this problem. Create a set<string> object, and copy (using an insert iterator) the words from the vector to the set. A set automatically sorts its contents, taking the place of calling sort(), and a set only allows one copy of a key, so that takes the place of calling unique(). Wait! The specification called for ignoring the case differences. One way to handle that is to use transform() instead of copy() to copy data from the vector to the set. For the transformation function, use one that converts a string to lowercase.

set<string> wordset;
transform(words.begin(), words.end(),
    insert_iterator<set<string> > (wordset, wordset.begin()), ToLower);

The ToLower() function is easy to write. Just use transform() to apply the tolower() function to each element in the string, using the string both as source and destination. Remember, string objects, too, can use the STL functions. Passing and returning the string as a reference means the algorithm works on the original string without having to make copies.

string & ToLower(string & st)
{
    transform(st.begin(), st.end(), st.begin(), tolower);
    return st;
}

One possible problem is that the tolower() function is defined as int tolower(int), and some compilers want the function to match the element type, which is char. One solution is to replace tolower with toLower and to provide the following definition:

char toLower(char ch) { return tolower(ch); }

To get the number of times each word appeared in the input, you can use the count() function. It takes a range and a value as arguments and returns the number of times the value appears in the range. You can use the vector object to provide the range and the set object to provide the list of words to count. That is, for each word in the set, count how many times it appears in the vector. To keep the resulting count associated with the correct word, store the word and the count as a pair<const string, int> object in a map object. The word will be the key (just one copy), and the count will be the value. This can be done in a single loop:

map<string, int> wordmap;
set<string>::iterator  si;
for (si = wordset.begin(); si != wordset.end(); si++)
    wordmap.insert(pair<string, int>(*si, count(words.begin(),
    words.end(), *si)));

Caution

graphics/tnt.gif

Older STL implementations declare count() as type void. Instead of using a return value, you provide a fourth argument passed as a reference, and the number of items is added to that argument:

int ct = 0;
count(words.begin(), words.end(), *si), ct));    count added to ct

The map class has an interesting feature—you can use array notation with keys serving as indices to access the stored values. For example, wordmap["the"] would represent the value associated with the key "the", which in this case is the number of occurrences of the string "the". Because the wordset container holds all the keys used by wordmap, you can use the following code as an alternative and more attractive way of storing results:

for (si = wordset.begin(); si != wordset.end(); si++)
    wordmap[*si] = count(words.begin(), words.end(), *si);

Because si points to a string in the wordset container, *si is a string and can serve as a key for wordmap. This code places both keys and values into the wordmap map.

Similarly, you can use the array notation to report results:

for (si = wordset.begin(); si != wordset.end(); si++)
    cout << *si << ": " << wordmap[*si] << endl;

If a key is invalid, the corresponding value is 0.

Listing 16.15 puts these ideas together and includes code to display the contents of the three containers (a vector with the input, a set with a word list, and a map with a word count).

Listing 16.15 `usealgo.cpp`

//usealgo.cpp
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <map>
#include <iterator>
#include <algorithm>
#include <cctype>
using namespace std;

char toLower(char ch) { return tolower(ch); }
string & ToLower(string & st);
void display(const string & s);

int main()
{
    vector<string> words;
    cout << "Enter words (enter quit to quit):\n";
    string input;
    while (cin >> input && input != "quit")
        words.push_back(input);

    cout << "You entered the following words:\n";
    for_each(words.begin(), words.end(), display);
    cout << endl;

    // place words in set, converting to lowercase
    set<string> wordset;
    transform(words.begin(), words.end(),
        insert_iterator<set<string> > (wordset, wordset.begin()),
        ToLower);
    cout << "\nAlphabetic list of words:\n";
    for_each(wordset.begin(), wordset.end(), display);
    cout << endl;

    // place word and frequency in map
    map<string, int> wordmap;
    set<string>::iterator si;
    for (si = wordset.begin(); si != wordset.end(); si++)
        wordmap[*si] = count(words.begin(), words.end(), *si);

    // display map contents
    cout << "\nWord frequency:\n";
    for (si = wordset.begin(); si != wordset.end(); si++)
        cout << *si << ": " << wordmap[*si] << endl;

    return 0;
}

string & ToLower(string & st)
{
    transform(st.begin(), st.end(), st.begin(), toLower);
    return st;
}

void display(const string & s)
{
    cout << s << " ";
}

Compatibility Note

graphics/hands.gif

Older implementations may use vector.h, set.h, map.h, iterator.h, algo.h, and ctype.h. Older implementations may require the set and map templates to use an additional less<string> template parameter. Older versions use the type void count() function mentioned earlier.

Here is a sample run:

Enter words (enter quit to quit):
The dog saw the cat and thought the cat fat
The cat thought the cat perfect
quit
You entered the following words:
The dog saw the cat and thought the cat fat The cat thought the cat perfect

Alphabetic list of words:
and cat dog fat perfect saw that the thought

Word frequency:
and: 1
cat: 4
dog: 1
fat: 1
perfect: 1
saw: 1
that: 1
the: 4
thought: 2

The moral here is that your attitude when using the STL should be, how much code can I avoid writing myself? STL's generic and flexible design should save you lots of work. Also, the STL designers are algorithm people very much concerned with efficiency. So the algorithms are well-chosen and inline.

Other Libraries

C++ provides some other class libraries that are more specialized than the examples covered in this chapter. The complex header file provides a complex class template for complex numbers, with specializations for float, long, and long double. The class provides standard complex number operations along with standard functions that can be used with complex numbers.

The valarray header file provides a valarray template class. This class template is designed to represent numeric arrays and provides support for a variety of numeric array operations, such as adding the contents of one array to another, applying math functions to each element of an array, and applying linear algebra operations to arrays.

Summary

C++ provides a powerful set of libraries that provide solutions to many common programming problems and the tools to simplify many more problems. The string class provides a convenient means to handle strings as objects. The class provides automatic memory management and a host of methods and functions for working with strings. For example, they allow you to concatenate strings, insert one string into another, reverse a string, search a string for characters or substrings, and perform input and output operations.

The auto_ptr template makes it easier to manage memory allocated by new. If you use an auto_ptr object instead of a regular pointer to hold the address returned by new, you don't have to remember to use the delete operator later. When the auto_ptr object expires, its destructor will call the delete operator automatically.

The Standard Template Library, or STL, is a collection of container class templates, iterator class templates, function object templates, and algorithm function templates that feature a unified design based on generic programming principles. The algorithms use templates to make them generic in terms of type of stored object and an iterator interface to make them generic in terms of the type of container. Iterators are generalizations of pointers.

The STL uses the term "concept" to denote a set of requirements. For example, the concept of forward iterator includes the requirements that a forward iterator object can be dereferenced for reading and writing, and that it can be incremented. Actual implementations of the concept are said to "model" the concept. For example, the forward iterator concept could be modeled by an ordinary pointer or by an object designed to navigate a linked list. Concepts based on other concepts are termed "refinements". For example, the bidirectional iterator is refinement of the forward iterator concept.

Container classes, such as vector and set, are models of container concepts, such as container, sequence, and associative container. The STL defines several container class templates: vector, deque, list, set, multiset, map, multimap, and bitset. It also defines adapter class templates queue, priority_queue, and stack; these classes adapt an underlying container class to give it the characteristic interface suggested by the adapter class template name. Thus, stack, although based, in default, on vector, allows insertion and removal only at the top of the stack.

Some algorithms are expressed as container class methods, but the bulk are expressed as general, non-member functions. This is made possible by using iterators as the interface between containers and algorithms. One advantage to this approach is that there need be just one for_each() or copy() function, and so on, instead of a separate version for each container. A second advantage is that STL algorithms can be used with non-STL containers, such as ordinary arrays, string objects, and any classes you design consistent with the STL iterator and container idiom.

Both containers and algorithms are characterized by the type of iterator they provide or need. You should check that the container features an iterator concept that supports the algorithm's needs. For example, the for_each() algorithm uses an input iterator, whose minimal requirements are met by all the STL container class types. But sort() requires random access iterators, which not all container classes support. A container class may offer a specialized method as an option if it doesn't meet the requirements for a particular algorithm. For example, the list class has a sort() method based on bidirectional iterators, so it can use that method instead of the general function.

The STL also provides function objects, or functors, that are classes for which the () operator is overloaded; that is, for which the operator()() method is defined. Objects of such classes can be invoked using function notation but can carry additional information. Adaptable function objects, for example, have typedef statements identifying the argument types and the return value type for the function object. This information can be used by other components, such as function adapters.

The STL, by representing common container types and providing a variety of common operations implemented with efficient algorithms, all done in a generic manner, provides an excellent source of reusable code. You may be able to solve a programming problem directly with the STL tools, or you may be able to use them as building blocks to construct the solution you need.

The complex and valarray template classes support numerical operations for complex numbers and arrays.

Review Questions

1:	Consider the following class declaration: class RQ1 { private: char * st; // points to C-style string public: RQ1() { st = new char [1]; strcpy(st,""); } RQ1(const char * s) {st = new char [strlen(s) + 1]; strcpy(st, s); } RQ1(const RQ1 & rq) {st = new char [strlen(rq.st) + 1]; strcpy(st, rq.st); } ~RQ1() {delete [] st}; RQ & operator=(const RQ & rq); // more stuff }; Convert this to a declaration using a `string` object instead. What methods no longer need an explicit definition?
2:	Name at least two advantages `string` objects have over C-style strings in terms of ease-of-use.
3:	Write a function that takes a reference to a `string` as an argument and which converts the `string` to all uppercase.
4:	Which of the following are not examples of correct usage (conceptually or syntactically) of `auto_ptr`? (Assume the needed header files have been included.) auto_ptr<int> pia(new int[20]); auto_ptr<str> (new string); int rigue = 7; auto_ptr<int>pr(&rigue); auto_ptr dbl (new double);
5:	If you could make the mechanical equivalent of a stack that held golf clubs instead of numbers, why would it (conceptually) be a bad golf bag?
6:	Why would a `set` container be a poor choice for storing a hole-by-hole record of your golf scores?
7:	Because a pointer is an iterator, why didn't the STL designers simply use pointers instead of iterators?
8:	Why didn't the STL designers simply define a base iterator class, use inheritance to derive classes for the other iterator types, and express the algorithms in terms of those iterator classes?
9:	Give at least three examples of convenience advantages that a `vector` object has over an ordinary array.
10:	If Listing 16.6 were implemented with `list` instead of `vector`, what parts of the program would become invalid? Could the invalid part be fixed easily? If so, how?

Programming Exercises

A palindrome is a string that is the same backwards as it is forwards. For example, "tot" and "otto" are rather short palindromes. Write a program that lets a user enter a string and which passes a reference to the string to a bool function. The function should return true if the string is a palindrome and false otherwise. At this point, don't worry about complications such as capitalization, spaces, and punctuation. That is, this simple version will reject "Otto" and "Madam, I'm Adam". Feel free to scan the list of string methods in Appendix F for methods to simplify the task.

Do the same problem as given in programming exercise 1, but do worry about complications such as capitalization, spaces, and punctuation. That is, "Madam, I'm Adam" should test as a palindrome. For example, the testing function could reduce the string to "madamimadam" and then test if the reverse is the same. Don't forget the useful cctype library. You may find an STL function or two useful, although not necessary.

You must write a function with an old-style interface. It has this prototype:

int reduce(long ar[], int n);

The arguments are the name of an array and the number of elements in the array. The function sorts an array, removes duplicate values, and returns a value equal to the number of elements in the reduced array. Write the function using STL functions. (If you decide to use the general unique() function, note that it returns the end of the resulting range.) Test the function in a short program.

Do the same problem as described in programming exercise 3, except make it a template function:

template <class T>
int reduce(T ar[], int n);

Test the function in a short program using both a long instantiation and a string instantiation.

Redo the example shown in Listing 12.13 using the STL queue template class instead of the Queue class of Chapter 12.

A common game is the lottery card. The card has numbered spots of which a certain number are selected at random. Write a Lotto() function that takes two arguments. The first is the number of spots on a lottery card and the second the number of spots selected at random. The function returns a vector<int> object containing, in sorted order, the numbers selected at random. For example, you could use the function as follows:

vector<int> winners;
winners = Lotto(51,6);

This would assign to winners a vector containing six numbers selected randomly from the range 1 through 51. Note that simply using rand() doesn't quite do the job because it may produce duplicate values. Suggestion: Have the function create a vector containing all the possible values, use random_shuffle(), and then use the beginning of the shuffled vector to obtain the values. Also write a short program that lets you test the function.

Mat and Pat want to invite their friends to a party. They ask you to write a program that does the following:

Allows Mat to enter a list of his friends' names. The names are stored in a container and then displayed in sorted order.

Allows Pat to enter a list of her friends' names. The names are stored in a second container and then displayed in sorted order.

Creates a third container that merges the two lists, eliminating duplicates, and displays the contents of this container.

CONTENTS

Chapter 16. THE string CLASS AND THE STANDARD TEMPLATE LIBRARY

The string Class

Constructing a String

Table 16.1. string Class Constructors

Listing 16.1 str1.cpp

Program Notes

Figure 16.1. string constructor using a range.

string Class Input

Listing 16.2 str2.cpp

Working with Strings

Table 16.2. The Overloaded find() Method

Listing 16.3 str3.cpp

Program Notes

What Else?

The auto_ptr Class

Using an auto_ptr

Figure 16.2. Regular pointer versus auto_ptr.

auto_ptr Considerations

The Standard Template Library

The vector Template Class

Listing 16.4 vect1.cpp

Things to Do to Your Vectors

Figure 16.3. The STL range concept.

Listing 16.5 vect2.cpp

More Things to Do to Your Vectors

Listing 16.6 vect3.cpp

Generic Programming

Why Iterators?

Kinds of Iterators

Input Iterator

Output Iterator

Forward Iterator

Bidirectional Iterator

Random Access Iterator

Table 16.3. Random Access Iterator Operations

Iterator Hierarchy

Table 16.4. Iterator Capabilities

Concepts, Refinements, and Models

The Pointer As Iterator

copy(), ostream_iterator, and istream_iterator

Other Useful Iterators

Listing 16.7 copyit.cpp

Listing 16.8 inserts.cpp

Kinds of Containers

The Container Concept

Table 16.5. Some Basic Container Properties

Sequences

Table 16.6. Sequence Requirements

Table 16.7. Optional Sequence Requirements

Figure 16.4. push_front() and push_back().

vector

deque

list

Table 16.8. Some list Member Functions

Listing 16.9 list.cpp

Program Notes

The list Toolbox

queue

Table 16.9. queue Operations

priority_queue

stack

Table 16.10. stack Operations

Associative Containers

A set Example

Listing 16.10 setops.cpp

A multimap Example

Listing 16.11 multmap.cpp

Function Objects (aka Functors)

Functor Concepts

Predefined Functors

Table 16.11. Operators and Function Object Equivalents

Adaptable Functors and Function Adapters

Listing 16.12 funadap.cpp

Algorithms

Algorithm Groups

General Properties

The STL and the string Class

Listing 16.13 strgstl.cpp

Functions Versus Container Methods

Listing 16.14 listrmv.cpp

Chapter 16. THE `string` CLASS AND THE STANDARD TEMPLATE LIBRARY

The `string` Class

Table 16.1. `string` Class Constructors

Listing 16.1 `str1.cpp`

Figure 16.1. `string` constructor using a range.

`string` Class Input

Listing 16.2 `str2.cpp`

Table 16.2. The Overloaded `find()` Method

Listing 16.3 `str3.cpp`

The `auto_ptr` Class

Using an `auto_ptr`

Figure 16.2. Regular pointer versus `auto_ptr`.

`auto_ptr` Considerations

The `vector` Template Class

Listing 16.4 `vect1.cpp`

Listing 16.5 `vect2.cpp`

Listing 16.6 `vect3.cpp`

`copy()`, `ostream_iterator`, and `istream_iterator`

Listing 16.7 `copyit.cpp`

Listing 16.8 `inserts.cpp`

Figure 16.4. `push_front()` and `push_back()`.

`vector`

`deque`

`list`

Table 16.8. Some `list` Member Functions

Listing 16.9 `list.cpp`

The `list` Toolbox

`queue`

Table 16.9. `queue` Operations

`priority_queue`

`stack`

Table 16.10. `stack` Operations

A `set` Example

Listing 16.10 `setops.cpp`

A `multimap` Example

Listing 16.11 `multmap.cpp`

Listing 16.12 `funadap.cpp`

The STL and the `string` Class

Listing 16.13 `strgstl.cpp`

Listing 16.14 `listrmv.cpp`

Listing 16.15 `usealgo.cpp`