Imperfect C++ Practical Solutions for Real-Life Programming By Matthew Wilson
	Chapter 12. Optimization

12.2. Return Value Optimization

The Return Value Optimization (RVO) is widely documented [Dewh2003, Meye1996], so I'm not going to dwell on it too deeply. Basically, if the return value from a function is a call to the constructor of the returned type, the compiler is able to optimize away the notional intermediate instance and construct directly into the instance that will receive the return value in the client code. In the following example, the temporary inside CreateString() is not created, and the construction occurs directly into the memory occupied by s1:



String CreateString(char const *s)
{
  return String((NULL == s) ? "" : s);
}

String  s1 = CreateString("Initialization via Assignment syntax");

As long as the return statement is an explicit call to a constructor, you can pretty well rely on any modern compiler applying this optimization. Given the following client code, the results for several modern compilers can be seen in Table 12.1, which shows the number of object constructions for each case.



CreateString("1");                // case 1
String  s2 = CreateString("2");   // case 2
String  s3(CreateString("3"));    // case 3

Table 12.1. The values represent the number of instances created for each case.

Compiler             Case 1   Case 2   Case 3
Borland 5.6             1        1        2
CodeWarrior 8           1        1        1
Comeau 4.3.0.1          1        1        2
Digital Mars 8.38       1        1        1
GCC 3.2                 1        1        1
Intel 7.0               1        1        2
Visual C++ 7.1          1        1        1
Watcom 12.0             1        1        1
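
If you want to repeat the measurement on your own compiler, something along the following lines should do. This is only a sketch: the String class here is a hypothetical stand-in that counts its constructions in static members (the names s_constructions, s_copies, and CountInstances() are my own inventions for illustration), rather than the tracing class used to produce the table.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the chapter's String class. It counts every
// instance created, and separately those created by the copy constructor.
class String
{
public:
    String(char const *s) : m_value(s) { ++s_constructions; }
    String(String const &rhs) : m_value(rhs.m_value)
    {
        ++s_constructions;
        ++s_copies;
    }

    static int s_constructions; // all instances created
    static int s_copies;        // instances created by copying

private:
    std::string m_value;
};

int String::s_constructions = 0;
int String::s_copies = 0;

String CreateString(char const *s)
{
    return String((NULL == s) ? "" : s); // return value is a constructor call
}

// Runs one of the three cases from the text and reports how many
// instances were created while doing so.
int CountInstances(int caseNumber)
{
    String::s_constructions = 0;
    String::s_copies = 0;

    switch (caseNumber)
    {
    case 1: { CreateString("1"); break; }
    case 2: { String s2 = CreateString("2"); break; }
    case 3: { String s3(CreateString("3")); break; }
    }
    return String::s_constructions;
}
```

Calling CountInstances() with 1, 2, and 3 from main() and printing the results gives one row of the table for whatever compiler and settings you are using. A value of 1 means the optimization was applied; anything higher means at least one copy was made.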

However, there are a couple of weird little nuances in some implementations. If you use the assignment syntax, as shown in case 2 in the table, then the copy constructor must be available in the calling context, even though it's not used. The standard (C++-98: 3.2;2) says "a copy constructor is [considered] even if it is elided by the implementation." So, if your class has hidden the copy constructor (see section 2.2.3), or it cannot be generated by the compiler (see section 2.2.1), you will not be able to use RVO.

Well, that's the theory anyway. If the copy constructor of String is explicit, then only CodeWarrior, Comeau, GCC, and Visual C++ (7.1) pick it up; the others optimize it out anyway. If it is declared private, all the compilers except Digital Mars pick that up. In either case, applying the optimization in the absence of an accessible copy constructor is illegal and should not be relied upon.

If you write your code using the function call syntax, you still need the accessible copy constructor, even though, again, it is not used.



String s2(CreateString("Init via function call syntax"));

In this guise, the compilers are still prone to break the law. The same effects are seen when the copy constructor is declared private or explicit, as with the assignment syntax. But the strange thing is that several compilers are less able to apply the optimization than with the assignment syntax. As you can see from Table 12.1, Borland 5.6, Comeau 4.3.0.1, and Intel 7.0 all fail to apply RVO in the function call syntax case, whereas all compilers examined applied RVO in the case of the assignment syntax. I can only surmise that their testing has focused on the assignment syntax.
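
A note for readers on later compilers: the rule quoted above is the C++98 one. From C++17 onward, an object initialized from a function call whose return statement is a constructor call is constructed directly, and the copy constructor is not considered at all, so both syntaxes are well-formed even for a class that cannot be copied. The following is a minimal sketch of that difference, assuming a hypothetical Handle class with a hidden copy constructor; the __cplusplus check is there only so that the same file still compiles under the earlier standards, where a conforming compiler rejects the by-value return.

```cpp
// Hypothetical class whose copy constructor is hidden: declared private
// and never defined.
class Handle
{
public:
    explicit Handle(int id) : m_id(id) {}
    int id() const { return m_id; }

private:
    Handle(Handle const &);
    Handle &operator =(Handle const &);

    int m_id;
};

#if __cplusplus >= 201703L

// C++17 and later: the returned constructor call initializes the caller's
// object directly, so the inaccessible copy constructor is never considered.
Handle MakeHandle(int id)
{
    return Handle(id);
}

int IdViaAssignmentSyntax()
{
    Handle h = MakeHandle(7);
    return h.id();
}

int IdViaFunctionCallSyntax()
{
    Handle h(MakeHandle(7));
    return h.id();
}

#else

// Before C++17 a conforming compiler rejects MakeHandle() and both
// initializations above, because the copy constructor must be accessible
// even when the copy is elided. This fallback avoids returning by value.
int IdViaAssignmentSyntax()
{
    Handle h(7);
    return h.id();
}

int IdViaFunctionCallSyntax()
{
    Handle h(7);
    return h.id();
}

#endif
```

None of this changes the advice for code that must build with the compilers in Table 12.1: there, the accessible copy constructor is still required.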

12.2.1 Named Return Value Optimization

The Named Return Value Optimization (NRVO) is a slight modification of the RVO that's almost as widely supported and just as easy to comprehend. Sometimes we might want to manipulate a variable before returning it in a way that is not commensurate with a reasonable class interface. The standard example is in the implementation of addition operators. The last thing we'd want to do is have our class provide a constructor to facilitate concatenation, as in:



String operator +(String const &lhs, String const &rhs)
{
  return String(lhs, rhs);
}

We might just be able to live with it^[1] for strings, since there are no other binary operations returning evaluated results for String. But consider how you would work with such a strategy for numeric types. What about subtraction, multiplication, division? And don't say that you can have a third parameter to the constructor defining the operation, or I'll tell the publisher to leave the second half of the book blank and charge you double!

^[1] I couldn't live with it. It is daft, ugly, and just plain wrong.

Anyway, the canonical implementation of a string concatenation is as a free function, implemented in terms of the += instance method on a copy of the first argument:



String operator +(String const &lhs, String const &rhs)
{
  String result(lhs);

  result += rhs;

  return result;
}

But since the return value is no longer a call to a constructor, we lose the facility of RVO. Thankfully, this is where NRVO steps in. It basically does RVO for named instances. As long as the compiler can deduce that the return values from all possible code paths refer to the same variable, it can perform the construction, and all subsequent manipulation, of the named return value directly in the return context.
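
The "all possible code paths" condition is easiest to see side by side. The following is a rough sketch, using a hypothetical Tracked class that counts its copy constructions (all of the names here are made up for illustration). In the first function every return statement names the same local, so the compiler is free to build it in the caller's storage. In the second, two locals are alive at each return and they cannot both occupy that storage, so one of them has to be copied.

```cpp
// Hypothetical instrumented type: counts copy constructions only.
class Tracked
{
public:
    explicit Tracked(int v) : m_value(v) {}
    Tracked(Tracked const &rhs) : m_value(rhs.m_value) { ++s_copies; }

    void add(int n) { m_value += n; }
    int value() const { return m_value; }

    static int s_copies;

private:
    int m_value;
};

int Tracked::s_copies = 0;

// Every return statement names the same local variable, so a compiler
// may construct 'result' directly in the caller's storage (NRVO).
Tracked SameVariable(bool extra)
{
    Tracked result(1);
    if (extra)
    {
        result.add(10);
        return result;
    }
    return result;
}

// The return statements name different locals that are alive at the same
// time, so whichever one is returned must be copied into the return value.
Tracked DifferentVariables(bool first)
{
    Tracked a(1);
    Tracked b(2);
    if (first)
    {
        return a;
    }
    return b;
}

// Number of copies made when initializing from SameVariable().
int CopiesFromSameVariable()
{
    Tracked::s_copies = 0;
    Tracked t = SameVariable(true);
    (void)t;
    return Tracked::s_copies;
}

// Number of copies made when initializing from DifferentVariables().
int CopiesFromDifferentVariables()
{
    Tracked::s_copies = 0;
    Tracked t = DifferentVariables(true);
    (void)t;
    return Tracked::s_copies;
}
```

On a compiler that applies NRVO, CopiesFromSameVariable() should report fewer copies than CopiesFromDifferentVariables(); on one that does not, the two may well come out the same.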

Just as with RVO, there is varying support for NRVO throughout the range of compilers. If we take the same client code from the RVO case, and just change CreateString() a little, we can test out our compilers for NRVO.



String CreateString(char const * s1, char const * s2)
{
  String result(s1);

  result += s2;

  return result;
}

The results are quite interesting. Table 12.2 shows the number of objects constructed in each of the three cases.

Table 12.2. The values represent the number of instances created for each case.

Compiler             Case 1   Case 2   Case 3
Borland 5.6             2        2        3
CodeWarrior 8           2        2        2
Comeau 4.3.0.1          1        1        2
Digital Mars 8.38       1        1        1
GCC 3.2                 1        1        1
Intel 7.0               2        2        2
Visual C++ 7.1          2        2        2
Watcom 12.0             1        1        1

Of course, for both optimizations, it is possible that we are skewing the results by placing a printf() statement in to trace the execution. However, even if this is the case, there are two reasons why this does not concern us. First, these optimizations are not there to elide code with no side effects. In the real world, most code where we will care about these optimizations will have such non-zero execution costs. That's the whole point. It wouldn't be a terribly useful optimization otherwise, would it? Second, several of the compilers fully employ the optimizations in the tested form in several cases, and there are no reports of erroneous installations buzzing round the C++ development world.

Notwithstanding the legality issues, there are ramifications for the semantics of your software, not just its performance. If you develop software using a compiler that successfully employs these optimizations in all guises, and your application employs some kind of instance tracking mechanism as a metric on the process's health, you could be in for some unpleasant moments if you port to another compiler, until you realize what's going on.