Sharing

Code Review 3: Sharing

Have you noticed how we oscillate between global and local perspective in our code reviews? Whenever we talk about sharing, we take the more global stance. Whenever we talk about data or implementation hiding, we take a local stance. It's time to swing the pendulum towards sharing.

Isolating Global Program Parameters

From the global perspective some of the constants that we've been so busily burying inside local scopes are actually tunable parameters of our program. What if somebody (say, our client) wants to increase the maximum length of symbolic names? Or what if he wants to be able to add more entries to the symbol table? Or desperately needs to increase the size of the input buffer? We'll have to dig deep into our classes and functions to find the appropriate constants.

Fortunately, there are several ways to make such fine tuning less cumbersome. I'll only show the most basic one-collecting all tunable constants in a single file params.h. Here's the contents of this file

const int maxBuf = 100; // size of input buffer const int maxSymbols = 40; // size of symbol table const int maxSymLen = 80; // max length of symbol name

There will always be some tunable parameters in your program whose change will require recompilation. Their place is in params.h.

There usually is another set of parameters that should be tunable by the user (or the administrator). These are called user preferences and are stored persistently in some convenient location. Even these parameters have their default values whose place is, you guessed it, in params.h.

Testing Boundary Conditions

We know that our program has built-in limitations: see params.h. Obviously we want to make these limitations as unobtrusive as possible. We generously allocate space for symbols, characters in buffers, etc., so that in day-to-day operation the user doesn't hit these limits. Unfortunately, that also means that we, the programmers, are unlikely to hit these limitations in our testing. And rest assured--whatever boundaries we don't hit, our users will!

So let's make a little experiment. Let's edit the file params.h and set all the limits to some relatively low values. We might even keep two sets of parameters--one for regular operation and one for testing. I used conditional compilation directives to switch between the two sets. Changing 0 to 1 in the #if directive will switch me back to the regular set.

#if 0 const int maxBuf = 100; // size of input buffer const int maxSymbols = 40; // size of symbol table const int maxSymLen = 80; // max length of symbol name #else const int maxBuf = 8; // size of input buffer const int maxSymbols = 5; // size of symbol table const int maxSymLen = 4; // max length of symbol name #endif

After recompiling the whole program, I ran it and immediately hit an assertion in the constructor of the Function::Table. Quick inspection showed that the program was trying to cram 14 symbols for built-in functions into our symbol table, whose maximum capacity was set to 5. That's good! Obviously the value of 5 for the size of the symbol table was too small, but we were able to catch it immediately.

Is an assertion good enough protection from this kind of a problem? In general, yes. This problem might only arise when somebody changes the value of maxSymLen or ads more built-in functions. We are safe in that case, because:

After every code change the program will be re-built in the debugging version (with the assertions turned on) and run at least once before it's released to others.

By "others" I don't mean customers--I mean other programmers on the same team. Before releasing a program to customers, much stricter testing is required.

So let's now set the size of the symbol table to 14--just enough for built-in functions--and try to enter an expression that introduces a new symbolic variable. Kaboom! We get a general protection fault!

What happened? In Parser::Factor we correctly discovered a problem, even printed out "Error: Too many variables," and then returned a null pointer. Well, what else were we supposed to do?! Unfortunately, the caller of Parser::Factor never expected a null pointer. Back to the drawing board!

Let's add checks for a null pointer to all the callers of Parser::Factor (as well as the callers of Parser::Term, who can propagate the null obtained from Parser::Factor).

One more test--this time everything works fine; that is until we exit the program. Remember, all this time we are running a debug build of our program under the debugger. A good debug runtime will do a heap check when your program frees memory. A heap check should discover such problems as buffer overflows, double deletions, etc. And indeed it does! The destructor of Store reports an overwritten memory block. A little more sleuthing and the culprit becomes obvious. Look at this code in the constructor of Store:

int id = symTab.ForceAdd ("e"); SetValue (id, exp (1)); cerr << "pi = " << 2 * acos (0.0) << endl; id = symTab.ForceAdd ("pi"); SetValue (id, 2 * acos (0.0));

When our symbol table overflows, it returns -1. SymbolTable::SetValue is called with -1 and happily obliges.

void SetValue (int id, double val) { assert (id < _size); _cell [id] = val; _status [id] = stInit; }

We were smart enough to assert that id be less than the size of the table, but not smart enough to guard against a negative value of id.

Okay, let's change that and try again. This time we hit the assertion immediately. Great! One more test with maxSymbols = 16 and we're in the open.

Now let's test the restriction on the size of the input buffer. Let's input a large expression, something like one with ten zeros. The program works, but not exactly as the user would expect. It first prints the incorrect result, then exits.

First of all, why does it exit? It turns out that, since it didn't retrieve all the characters from the input on the first try, the next time it calls cin.getline it returns immediately with an empty buffer (I'm not sure why it's empty). But that's not our biggest problem. A program that quietly returns incorrect results is much worse than the one that crashes. Incorrect information is worse than no information at all. We have to fix it immediately.

char buf [maxBuf+1]; // ... cin.getline (buf, maxBuf+1); if (strlen (buf) == maxBuf) { cerr << "Error: Input buffer overflow\n"; status = stError; break; }

Notice what I have done. I increased the size of the buffer by one, so that I could grab one character more than the self-imposed limit. Now I can detect if my greedy request was satisfied and, if so, I know that the input was larger than maxBuf and I bail out.

Now for the last test--maximum symbol length. Before we try that, we have to increase maxSymbols, so that we can enter a few test symbol into our program. What we discover is that, although the program works, it quietly lies to the user. Try inputing and expression toolong = 6 and then displaying the value of another variable tool. Instead of complaining about the use of unitialized variable, the program quietly treats both names as representing the same variable. The least we can do is to display a warning in Scanner::Accept.

if (_lenSymbol > maxSymLen) { cerr << "Warning: Variable name truncated\n"; _lenSymbol = maxSymLen; }

Scary, isn't it? Why was this program so buggy? Was this the norm or an exception? The truth is that, no matter how good you are, your programs will contain errors. The compiler will catch most of the simple errors--typos, type mismatches, wrong number of arguments, etc. Many errors will become obvious after some elementary testing. The rest are the "hard bugs".

There are many techniques to eradicate hard bugs. One is testing--in particular you can make testing easier by instrumenting your program--writing code whose only purpose is to catch bugs. We've done something like that by instrumenting the file params.h. I'll show you more testing and instrumentation techniques later.

There is also the matter of attitude. I wrote this program without much concern for error checking and exceptional conditions. Normally I would be much more careful.

Finally, there is the matter of policy. Dealing with exceptional conditions is not something you do on a whim. You have to develop some rules. We'll talk about it some more.

Templates

We have put the class List together with its sequencer in a standalone file with code reuse in mind. Any time in the future we need a linked list of integers we can just include this well tested and solid code. Did I just say "a linked list of integers"? What if we need a list of unsigned longs or doubles or even Stars for that matter? We need a class that is parametrized by a type. Enter templates.

To turn List into a template, you need to preface its class definition with

template <class T>

and replace ints with Ts. You are parametrizing the definition of List with the parameter T that stands for any type. Once you have the template defined, you can instantiate it by substituting any type for T--it doesn't even have to be a class. For instance,

List<int> myList;

will define a List of integers called myList. But you are also free to declare other lists, such as a list of unsigned longs, List<unsigned long>, a list of doubles, List<double> or a list of Stars, List<Star>.

Here's the definition of the List template.

template<class T> class List { public: List (); ~List (); void Add (T value); private: class Link { public: Link (Link * pNext, T value) : _pNext (pNext), _value (value) {} Link * Next () const { return _pNext; } T GetValue () const { return _value; } private: Link * _pNext; T _value; }; public: class Seq { public: Seq (List const & list) : _pLink (list.GetHead ()) {} bool AtEnd () const { return _pLink == 0; } void Advance () { _pLink = _pLink->Next (); } T GetValue () const { return _pLink->GetValue (); } private: Link const * _pLink; // current link }; friend Seq; private: Link const * GetHead () const { return _pHead; } Link * _pHead; };

To be more general, I've changed the name of the method that retrieves the stored value to GetValue.

For technical reasons, today's compilers require all template code to be visible in the place of template instantiation (this requirement might change in future compilers). What that means is that we have to put all the method implementations in the same header file as the template class definition. It doesn't make these methods inline (unless they are declared as such) nor does it imply that their code will be instantiated multiple times (unless you have a very cheap compiler).

Here's how you define the constructor--notice that only the first List, the one before the double colon, is followed by <T>.

template <class T> List<T>::List () : _pHead (0) {}

Same with the destructor:

template <class T> List<T>::~List () { // free linked list while (_pHead != 0) { Link * pLink = _pHead; _pHead = _pHead->Next (); delete pLink; } }

Other methods follow the same pattern.

template <class T> void List<T>::Add (T val) { Link * pLink = new Link (_pHead, val); _pHead = pLink; }

The second part of "templatization" is to substitute all the uses of List of integers with List<int>

class HTable { public: explicit HTable (int size): _size(size) { _aList = new List<int> [size]; } ~HTable () { delete []_aList; } void Add (char const * str, int id); public: class Seq: public List<int>::Seq { public: Seq (HTable const & htab, char const * str) : List<int>::Seq (htab.Find (str)) {} }; friend Seq; private: List<int> const & Find (char const * str) const; int hash (char const * str) const; List<int> * _aList; int _size; };

To summarize: A class template is a definitions of a class that is parametrized by one or more types. The type parameters are listed within angle brackets after the keyword template.

template <class type1, class type2 ...>

To instantiate the template it is enough to use it, with the type parameters substituted by actual types (either built-in or user-defined).