
Imperfect C++: Practical Solutions for Real-Life Programming
By Matthew Wilson
Chapter 12.  Optimization


12.5. Preventing Optimization

As programmers we spend so much time and effort on facilitating optimization that it can seem strange to want to prevent it. Why would we want to do such a thing?

There are two reasons. First, you may have global optimization settings for your build that are not appropriate for a given compilation unit, or even for a given block of code within a compilation unit. A good example of this is when you are optimizing for space, but you have a particular block of code that you wish to be optimized for speed.

The other reason for disabling optimizations is that, in order to increase the efficiency of your code, you sometimes have to disable certain optimizations so that you can actually measure that efficiency. It sounds kooky, of course, but some modern compilers are getting so good at optimization that they can hinder your efforts to improve the efficiency of your own code.

When a whole compilation unit is to be optimized differently, you can simply specify different settings for your compiler(s) in the makefile/project files. However, when the specific settings are required on a subcompilation unit basis, you will have to resort to compiler-specific optimization features. For example, the following code prevents the function slow() from being optimized for speed under Intel and Visual C++ irrespective of the optimization settings defined for the compilation unit.



// functions.cpp

. . .

#pragma optimize("gt", off)
void slow()
{
  for(int i = 0; i < std::numeric_limits<int>::max(); ++i)
  {}
}
#pragma optimize("", on)

. . .
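
Other compilers provide analogous, equally compiler-specific, controls. As a sketch only, and assuming GCC 4.4 or later (where the optimize pragmas exist), the same effect might be achieved as follows:

#include <limits>

#pragma GCC push_options
#pragma GCC optimize("O0") // compile slow() without optimization

void slow()
{
  for(int i = 0; i < std::numeric_limits<int>::max(); ++i)
  {}
}

#pragma GCC pop_options // restore the settings in force for the rest of the unit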



Perversely, some compilers are so good at optimization that they are, in a sense, too good. A program such as the following will be optimized away to nothing by several compilers.



int main()
{
  for(int i = 0; i < std::numeric_limits<int>::max(); ++i)
  {}

  return 0;
}



Nothing terribly surprising there, apart from the fact that a number of compilers don't actually optimize this one. Of course, in practice you'd be unlikely to be interested in profiling such empty loops. However, you might be attempting to measure the performance of some inline functions, and be contrasting the cost of your functions with equivalent ones that have no actual implementation, as in Listing 12.1.

Listing 12.1.


template <typename T>
inline T const &func1(T const &t)
{
  . . . // Actual manipulation of t
}

template <typename T>
inline T const &func2(T const &t)
{
  return t; // Stub function. Just return t
}

int main()
{
  performance_counter counter;

  counter.start();
  for(int i = 0; . . .)
  {
    func1(i);
  }
  counter.stop();
  cout << "func1: " << counter.get_millisecond() << endl;

  counter.start();
  for(int i = 0; . . .)
  {
    func2(i);
  }
  counter.stop();
  cout << "func2: " << counter.get_millisecond() << endl;

  return 0;
}



If you run loops to compare the costs of the two functions, some compilers are able to deduce that the second form does nothing, and actually elide the entire loop for the func2() case. CodeWarrior 8 and Visual C++ 7.1 can both do this.

We've already seen that volatile is not useful in multithreading, since the standard does not say anything about threading at all, and it is implementation defined whether a given implementation will respect volatile with respect to threading. However, volatile can be useful in regard to preventing optimization when it's not wanted. Indeed, the standard (C++-98: 7.1.5.1;8) says that "volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation."

So where do we apply it? Well, you might think that you could change the definition of func2() to make its argument type and return type T const volatile &, as in the sketch below. That actually works for CodeWarrior 8, but Visual C++ 7.1 is just so modern that it can't be fooled so easily.
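
A minimal sketch of that volatile-qualified variant (it is not part of Listing 12.1) would look like this:

template <typename T>
inline T const volatile &func2(T const volatile &t)
{
  return t; // Stub function. Just return t
}

The answer is instead to apply volatile to the loop indexer i, in which case all the compilers I've tried respect this code.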



for(int volatile i = 0; . . .)
{}



Notwithstanding this, I think we're very much in implementation-defined territory here. I guess we'd be surprised if an implementation didn't respect volatile given the wording in the standard, but it's always possible. On a modern operating system that uses virtual memory, the only way a volatile variable of such limited scope as our loop indexers could be modified would be via some incredibly clever injection of code from another process. It's certainly conceivable, therefore, for an implementation to just optimize away volatile, when it's very sure that it can: the keyword is only a hint, after all.

If you're feeling really paranoid, a very safe option is to call an external system function from within your loop. Since the compiler cannot, in principle, know the internal workings of a system function at compile/link time, it cannot optimize the call away. A simple choice would be to call time(). The downside is that the cost of such calls may not be constant, so you may be skewing your results. The way around this is to use a local static, as in Listing 12.2, so that the system call is made only once but the compiler still cannot optimize it all away.

Listing 12.2.


time_t inhibit_optimization()
{
  static time_t t = time(NULL);

  return t;
}

int main()
{
  int                 ret;
  performance_counter counter;

  counter.start();
  for(int i = 0; . . .)
  {
    . . .
    ret = static_cast<int>(inhibit_optimization());
  }
  . . .

  return ret;
}



Anyway, I'm sure you get the idea. volatile appears to be able to suppress optimization, but if you want to be really sure, you need to have the compiler think that something is variable, even though you know it's constant. Combining the two techniques, as in the sketch below, is about as safe as it gets.
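
This is a minimal sketch only, reusing inhibit_optimization() from Listing 12.2 and the assumed performance_counter timing class from the earlier listings; the loop bound is arbitrary:

#include <ctime>

time_t inhibit_optimization(); // as defined in Listing 12.2

int main()
{
  int                 ret = 0;
  performance_counter counter;

  counter.start();
  for(int volatile i = 0; i < 1000000; ++i) // volatile indexer: the loop cannot be elided
  {
    ret = static_cast<int>(inhibit_optimization()); // opaque call: the body cannot be elided
  }
  counter.stop();

  return ret; // returning ret keeps the result "used"
}

Whether the volatile indexer or the opaque call alone would suffice depends on the compiler; using both costs little and removes the doubt.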

