Imperfect C++ Practical Solutions for Real-Life Programming By Matthew Wilson
	Chapter 4. Data Encapsulation and Value Types

4.3. A Taxonomy of Value Types

In [Stro1997] Bjarne Stroustrup defines value semantics, as opposed to pointer semantics, as being the independence of copied entities. This is a great foundation, but we need more, I think.

One of the Imperfect C++ reviewers, Eugene Gershnik, has a language-independent definition of value types. A type is a value type if:

Instances can be created as, or later made to be, copies of another instance.
Each instance has a separate identity. Any change to one instance does not result in a change to another.
Instances cannot polymorphically substitute or be substituted by instances of another type at run time.
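The three criteria can be checked against a concrete type. What follows is a minimal sketch, assuming a hypothetical Point structure that is not part of the original text. The compile-time checks are only an approximation of the definition: std::is_polymorphic reports whether a type has virtual functions, which is one way, though not the only way, of reading the third criterion.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type, used only for illustration.
struct Point
{
    int x;
    int y;
};

// Criterion 1: instances can be created as, or later made to be, copies.
static_assert(std::is_copy_constructible<Point>::value, "copy-constructible");
static_assert(std::is_copy_assignable<Point>::value, "copy-assignable");

// Criterion 3 (approximation): no virtual functions, so no run-time
// polymorphic substitution through Point.
static_assert(!std::is_polymorphic<Point>::value, "not polymorphic");

// Criterion 2: a change to one instance does not change another.
inline void demonstrate_independence()
{
    Point a = { 1, 2 };
    Point b = a;        // created as a copy
    Point c = { 0, 0 };
    c = a;              // later made to be a copy

    b.x = 10;           // change only b

    assert(a.x == 1 && a.y == 2);   // a is untouched
    assert(c.x == 1 && c.y == 2);   // so is c
    assert(b.x == 10 && b.y == 2);
}
```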

This is an appealing definition, but it is very broad: too broad for my tastes. We'll refine this later in the chapter.

One way to look at value types is whether, and by how much, they behave in sensible ways. For example, what should I expect given the following expressions?



String      str1("Original String");
String      str2("Imperfect");
String      str3("C++");
char const  *cs1 = str1.c_str();

str1 = str2 + " " + str3; // 1
if(!str3) { . . . }       // 2
str2.Empty();             // 3
++str1;                   // 4

I would say that expression 1 would concatenate str2, " " and str3, in that order, placing the result into str1, either overwriting, extending, or replacing the storage used to represent "Original String" when str1 was constructed.^[2] I would also say that at the point of completion of expression 1 the pointer cs1 is no longer valid, and cannot be used subsequently without undefined behavior. (Of course, if String::c_str() was temporary [see section 16.2], this wouldn't be a problem, since the assignment would not be allowed.)

^[2] Although it's convenient, using the + operator for strings is a misuse. Addition is an arithmetic operation, and applying it to character strings is the first step down a long scary road (see Appendix B). Notwithstanding these misgivings, I still sell out my principles for glory in Chapter 25.
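The invalidation of cs1 by expression 1 can be worked around by copying the characters before the assignment takes place. The sketch below assumes std::string as a stand-in for the String class of the example, and the function name snapshot_then_assign is hypothetical. It does not dereference cs1 after the assignment, since that is exactly the undefined behavior described above.

```cpp
#include <cassert>
#include <string>

// std::string stands in for the section's String class (an assumption).
inline std::string snapshot_then_assign(std::string&       str1,
                                        std::string const& str2,
                                        std::string const& str3)
{
    char const* cs1 = str1.c_str();   // points into str1's current storage
    std::string saved(cs1);           // copy the characters while cs1 is valid

    str1 = str2 + " " + str3;         // expression 1: cs1 must not be used after this

    return saved;                     // the independent copy is unaffected
}
```

The copy costs an allocation, but it has value semantics: nothing that later happens to str1 can change it.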

Expression 2 would likely be interpreted to mean if str3 is "not" then the contents of the block should be executed. Note that what it means to be "not" is up for debate: It may mean no contents, or that the contents contain the empty string "", or both. It could even mean that the string contains "false"! Such an expression is ambiguous, and ambiguity is the enemy of both correctness and maintainability. Sadly, I've seen this very thing in production code.
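One way to remove the ambiguity is to replace the negation operator with a named predicate for each meaning, so that the caller has to say which one is intended. This is a minimal sketch, assuming std::string in place of String. The function names is_empty and spells_false are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical predicates; each name states one of the meanings that
// "!str3" might have had.
inline bool is_empty(std::string const& s)
{
    return s.empty();           // "not" as in: no contents
}

inline bool spells_false(std::string const& s)
{
    return s == "false";        // "not" as in: contains the text "false"
}

inline void usage()
{
    std::string str3("C++");

    assert(!is_empty(str3));        // reads unambiguously at the call site
    assert(!spells_false(str3));
}
```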

The third expression could mean: empty str2 of its contents. However, it could also mean: return a value indicating whether or not str2 is empty. Given the choice, I would always go for the former. (Types and variables are nouns; methods and functions are verbs.) Alas, the standard library disagrees, and it can be hard to disagree with the standard library.^[3]

Expression 4 is meaningless. I cannot think of a sensible way in which a string in C++ can be incremented.^[4] (See Appendix B to see evidence of a time long, long ago in a galaxy far, far away when this was not the case.)

^[3] With the STLSoft libraries I've had to swallow my principles and go with the flow: lowercase, underscores, empty(), etc.

^[4] In Perl, and some other scripting languages, a string may be incremented by making a best interpretation of it as a numeric type, then incrementing that value, and then converting back to a string. That's fine for Perl, but I don't think that's a good thing for C++ code to be doing.
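For comparison, this is how the standard library divides the two readings of expression 3: std::string::empty() is the query, and std::string::clear() is the action. The wrapper function below is hypothetical and exists only to hold the example.

```cpp
#include <cassert>
#include <string>

inline void query_then_clear()
{
    std::string str2("Imperfect");

    assert(!str2.empty());          // empty() only reports; nothing changes
    assert(str2 == "Imperfect");

    str2.clear();                   // clear() discards the contents
    assert(str2.empty());
}
```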

For built-in types, expected behavior is easy, as it is already prescribed and inviolable. It is our responsibility, therefore, to ensure that our types operate as expected with the operators for which they are defined. If you write an extended-precision integer type whose operator -=() performs modulus division, you'll be hunted down. Types intended to be treated as values should, as much as is possible, behave "as the ints do" [Meye1996].
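To make "as the ints do" concrete, here is a minimal sketch of a numeric wrapper whose operator -=() subtracts, returns the left-hand operand, and agrees with binary minus. The Integer class is hypothetical and holds a plain long; a real extended-precision type would have a different representation.

```cpp
#include <cassert>

// Hypothetical wrapper, for illustration only.
class Integer
{
public:
    explicit Integer(long value) : m_value(value) {}

    // Subtracts, and yields the left operand, as the built-in -= does.
    Integer& operator -=(Integer const& rhs)
    {
        m_value -= rhs.m_value;
        return *this;
    }

    long value() const { return m_value; }

private:
    long m_value;
};

// Binary minus is written in terms of -=, so the two cannot disagree.
inline Integer operator -(Integer lhs, Integer const& rhs)
{
    lhs -= rhs;
    return lhs;
}

inline void behaves_as_int()
{
    int     i = 10;
    Integer n(10);

    i -= 3;
    n -= Integer(3);

    assert(n.value() == i);     // same result as the built-in type
}
```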

In the remainder of the chapter, we investigate what I see as a spectrum of value type concepts. I suggest there are four levels:

Open Types: plain structures + API functions
Encapsulated Types: partially or fully encapsulated class types, manipulated via methods
Value Types: fully encapsulated class types, including assignment and (in)equality operators
Arithmetic Value Types: for numerics, and includes all arithmetic operators

Depending on your point of view, they are all value types, or only the last two. Whichever way you look at it, however, they're in there because they represent recognized steps in the spectrum and are used in the real world.
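The four levels might be sketched as follows. This is only an outline under my own assumptions: every name in it is hypothetical, the length-in-millimetres example is not taken from the book, and the arithmetic type shows just two operators where a full one would supply them all.

```cpp
#include <cassert>

// 1. Open type: a plain structure plus API functions.
struct OpenLength
{
    long millimetres;
};

inline void OpenLength_Add(OpenLength* len, long mm)
{
    len->millimetres += mm;
}

// 2. Encapsulated type: state is private and manipulated via methods.
class EncapsulatedLength
{
public:
    explicit EncapsulatedLength(long mm) : m_mm(mm) {}
    void add(long mm) { m_mm += mm; }
    long millimetres() const { return m_mm; }
private:
    long m_mm;
};

// 3. Value type: fully encapsulated, with assignment (compiler-generated
//    here) and (in)equality operators.
class ValueLength
{
public:
    explicit ValueLength(long mm) : m_mm(mm) {}
    long millimetres() const { return m_mm; }
private:
    long m_mm;
};

inline bool operator ==(ValueLength const& lhs, ValueLength const& rhs)
{
    return lhs.millimetres() == rhs.millimetres();
}

inline bool operator !=(ValueLength const& lhs, ValueLength const& rhs)
{
    return !(lhs == rhs);
}

// 4. Arithmetic value type: a numeric that also has arithmetic operators.
class ArithmeticLength
{
public:
    explicit ArithmeticLength(long mm) : m_mm(mm) {}
    ArithmeticLength& operator +=(ArithmeticLength const& rhs)
    {
        m_mm += rhs.m_mm;
        return *this;
    }
    long millimetres() const { return m_mm; }
private:
    long m_mm;
};

inline ArithmeticLength operator +(ArithmeticLength lhs, ArithmeticLength const& rhs)
{
    lhs += rhs;
    return lhs;
}

inline bool operator ==(ArithmeticLength const& lhs, ArithmeticLength const& rhs)
{
    return lhs.millimetres() == rhs.millimetres();
}
```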