
Gotcha #17: Maximal Munch Problems

What do you do when faced with an expression like this?



++++p->*mp 


Have you ever had occasion to deal with the "Sergeant operator"?



template <typename T>
class R {
   // . . .
   friend ostream &operator <<< // a sergeant operator?
       T >( ostream &, const R & );
};


Have you ever wondered whether the following expression is legal?



a+++++b 


Welcome to the world of maximal munch. In one of the early stages of C++ translation, the portion of the compiler that performs "lexical analysis" has the task of breaking up the input stream into individual "words," or tokens. When faced with a sequence of characters like ->*, the lexical analyzer might reasonably identify three tokens (-, >, and *), two tokens (-> and *), or a single token (->*). To avoid this ambiguous state of affairs, the lexical analyzer always identifies the longest token possible, consuming as many characters as it legally can: maximal munch.
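The longest-match rule can be sketched in a few lines. What follows is a toy illustration, not how any real compiler is written: the function name munch and the small operator table are assumptions made for this example, and only identifiers, digits, and a handful of operators are recognized.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits src into tokens with a longest-match rule.
// Only identifiers/numbers and an illustrative subset of operators are known.
std::vector<std::string> munch(const std::string &src) {
    // Illustrative operator subset. Order does not matter, because the
    // longest candidate that matches at the current position is selected.
    static const char *const ops[] = {
        "->*", "->", "-", "++", "+", "<<", "<", ">>", ">", "*=", "*", "="
    };
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {          // whitespace only separates tokens
            ++i;
            continue;
        }
        if (std::isalnum(c) || c == '_') {
            // Consume the whole run of identifier/number characters.
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) ||
                    src[j] == '_'))
                ++j;
            tokens.push_back(src.substr(i, j - i));
            i = j;
            continue;
        }
        // Maximal munch: keep the longest operator that matches here.
        std::string best;
        for (const char *op : ops) {
            const std::string candidate(op);
            if (candidate.size() > best.size() &&
                src.compare(i, candidate.size(), candidate) == 0)
                best = candidate;
        }
        if (best.empty())               // unknown character: pass it through
            best = std::string(1, src[i]);
        tokens.push_back(best);
        i += best.size();
    }
    return tokens;
}
```

With this sketch, munch("++++p->*mp") yields the five tokens ++ ++ p ->* mp, munch("a+++++b") yields a ++ ++ + b, and the "sergeant operator" <<< comes out as << followed by <.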

The expression a+++++b is illegal because it's tokenized as a ++ ++ + b, and it's illegal to post-increment an rvalue of built-in type like a++. If you had wanted to post-increment a and add the result to a pre-incremented b, you'd have to introduce at least one space: a+++ ++b. If you have any regard for the readers of your code, you'll spring for another space, even though it's not strictly necessary: a++ + ++b. No one would criticize the addition of a few parentheses, either: (a++) + (++b).
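A small sketch of the legal spellings, assuming int operands; the function names sumPostPre and sumMunched are invented for this illustration.

```cpp
// Hypothetical helper: post-increments a, pre-increments b, and returns
// the sum of the old value of a and the new value of b.
int sumPostPre(int &a, int &b) {
    return a++ + ++b;   // tokens: a ++ + ++ b
}

// Hypothetical helper: with no spaces at all, a+++b is still legal,
// because maximal munch tokenizes it as a ++ + b, i.e. (a++) + b.
int sumMunched(int &a, int b) {
    return a+++b;
}
```

Starting from a == 1 and b == 2, sumPostPre(a, b) returns 4 and leaves a == 2 and b == 3. Starting again from a == 1, sumMunched(a, 2) returns 3 and leaves a == 2.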

Maximal munch solves many more problems than it causes, but in two common situations, it's an annoyance. The first is in the instantiation of templates with arguments that are themselves instantiated templates. For example, using the standard library, one might want to declare a list of vectors of strings:



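list<vector<string>> lovos; // error before C++11!


Before C++11, the two adjacent closing angle brackets in the instantiation are interpreted as a right-shift operator, and we'll get a syntax error. (C++11 and later carve out an exception to maximal munch here and treat the >> as two closing brackets.) In the older dialects, whitespace is required: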



list< vector<string> > lovos; 
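As a minimal sketch, the spaced form can be used like this. The helper name makeLovos and the strings stored are assumptions made for the example; the std:: qualifications replace the using-declarations that the fragments above presuppose.

```cpp
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: builds a list of vectors of strings. The space
// between the closing angle brackets keeps them from being read as >>.
std::list< std::vector<std::string> > makeLovos() {
    std::vector<std::string> row;
    row.push_back("maximal");
    row.push_back("munch");

    std::list< std::vector<std::string> > lovos;
    lovos.push_back(row);   // one vector holding two strings
    return lovos;
}
```

Calling makeLovos() returns a list with one element, a vector containing the two strings.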


The second situation involves default argument initializers for unnamed pointer formal arguments:



void process( const char *= 0 ); // error! 


This declaration is attempting to use the *= assignment operator in a formal argument declaration. Syntax error. This problem comes under the "wages of sin" category, in that it wouldn't have happened if the author of the code had given the formal argument a name. Not only is such a name some of the best documentation one can provide, its presence would have made the maximal munch problem impossible:



void process( const char *processId = 0 ); 
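A short sketch of the named form in use. The function processOrDefault, its return type, and the "<none>" placeholder are assumptions made so the default can be observed; the original shows only a void declaration. The last declaration shows that a single space would also have separated * from =, although it leaves the argument undocumented.

```cpp
#include <string>

// Named formal argument with a null default, in the style of the
// corrected declaration above.
std::string processOrDefault(const char *processId = 0);

// Hypothetical body: reports which id was received.
std::string processOrDefault(const char *processId) {
    return processId ? std::string(processId) : std::string("<none>");
}

// Legal but unnamed alternative: the space keeps * and = as separate
// tokens, so this declaration compiles. (Declared only, never called.)
void processUnnamed(const char * = 0);
```

Here processOrDefault() returns "<none>", and processOrDefault("p1") returns "p1".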

