Imperfect C++: Practical Solutions for Real-Life Programming, by Matthew Wilson

	Chapter 7. ABI

7.3. C++ ABI Requirements

There are several aspects of C++ that complicate the challenge of a unified ABI far beyond that seen for C. C++ has classes that, although they share much in common with structures, are more complex, since a particular class may have one or more base classes.

C++ provides runtime polymorphism via the virtual function mechanism. Although compilers use a common mechanism (virtual function tables), many of them have mutually incompatible schemes. C++ compilers also employ much more complex, and proprietary, symbol-naming schemes, rendering it almost impossible to call a C++ binary component built with one compiler from C++ client code built with another.

On top of the symbol naming issues of C, C++ compilers use name mangling in order to provide for the overloading of functions and the policing of type-safety during linking.

Because C++ supports static objects, the language must support the construction and destruction of global and function-local static objects. We'll see in Chapter 11 that different compilers employ different schemes, resulting in different initialization orders for global objects. Clearly this is another impediment to a C++ ABI.

C++ also has exception handling, run time type information (RTTI), and various C++ standard language functions, all of which must have the same, or compatible, formats, and the run time behavior must be consistent.

7.3.1 Object Layout

In C++, all the C structure layout issues (see section 7.2.1) still hold, but the matter is much complicated by the layout of inherited classes, templates, and virtual inheritance [Lipp1996]. The layout of inherited classes has much in common with the packing issues that are seen in C with nested structures; virtual inheritance is another of those implementation-defined (C++-98: 9.2;12) issues that adds to this complex picture.

The way in which templates are implemented also has an effect. Sections 12.3 and 12.4 cover aspects in which templates may affect object layout through inheritance.
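The padding and base-class issues described above can be observed directly. The following is a minimal sketch rather than anything taken from the text: the types Base and Derived and both helper functions are hypothetical, and the values reported are implementation-defined, which is precisely the point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical types, for illustration only.
struct Base
{
  char  c;  // most compilers insert padding after this member...
  int   i;  // ...so that i is suitably aligned
};

struct Derived
  : public Base
{
  char  d;  // how much Derived grows by depends on the compiler's packing rules
};

// Bytes in Base that are not occupied by any member (that is, padding).
std::size_t padding_in_base()
{
  assert(sizeof(Base) >= sizeof(char) + sizeof(int));

  return sizeof(Base) - (sizeof(char) + sizeof(int));
}

// Prints the layout decisions made by the compiler in use.
void report_layout()
{
  std::printf("sizeof(Base)      = %lu\n", static_cast<unsigned long>(sizeof(Base)));
  std::printf("offsetof(Base, i) = %lu\n", static_cast<unsigned long>(offsetof(Base, i)));
  std::printf("padding in Base   = %lu\n", static_cast<unsigned long>(padding_in_base()));
  std::printf("sizeof(Derived)   = %lu\n", static_cast<unsigned long>(sizeof(Derived)));
}
```

Calling report_layout() from programs built with two different compilers, or with one compiler under different packing settings, and comparing the output is a quick check of whether they agree on layout for these particular types.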

7.3.2 Virtual Functions

The C++ standard describes the effects of the virtual function mechanism, upon which C++ run time polymorphism is based, but does not prescribe the way in which the mechanism should be implemented. It is the case that all modern commercial compilers use the virtual function table mechanism [Stro1994], whereby an instance of a class defining virtual functions (or which inherits from such a class) contains a hidden pointer—the vptr—to a table of function pointers—the vtable—which refer to the virtual functions. This commonality gives some cause for optimism, but different vendors employ different conventions. This issue is explored in detail in Chapter 8.

As well as the format of the vtable, compilers must also agree on when objects of a given class have vtables, or when they "reuse" those of a base class.
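The hidden vptr can be detected, if not inspected, by comparing the sizes of two otherwise identical classes. This is a sketch under the assumption of a vptr-based implementation; the types Plain and Polymorphic and the function hidden_overhead() are hypothetical names, and the standard guarantees no particular figure.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types, for illustration only.
struct Plain
{
  int value;

  int get() const { return value; }
};

struct Polymorphic
{
  int value;

  virtual int get() const { return value; }
  virtual ~Polymorphic() {}
};

// Extra bytes an instance carries once its class has virtual functions.
// With a vptr-based implementation this is typically the size of one
// pointer plus any alignment padding.
std::size_t hidden_overhead()
{
  assert(sizeof(Polymorphic) >= sizeof(Plain));

  return sizeof(Polymorphic) - sizeof(Plain);
}
```

The size difference says nothing about where in the object the vptr lives, nor how the vtable it points to is arranged, and those are exactly the details on which vendors can differ.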

7.3.3 Calling Conventions and Mangling

One of the most obvious of the flies in the ABI soup for C++ is name mangling, which is a must to support one of C++'s fundamental mechanisms: overloading. Consider the code in Listing 7.1.

Listing 7.1.

class Mangle
{
public:
  void func(int i);
  void func(char const *);
};

int main()
{
  Mangle  mangle;

  mangle.func(10);
  mangle.func("Hello");

  return 0;
}

I've deliberately omitted definitions for Mangle's methods, so that we can sneak a look at the mangling. If you build this with Visual C++ 6.0, you will get no compilation errors, but the linker will report:

error: unresolved "void Mangle::func(char const *)" (?func@Mangle@@QAEXPBD@Z)
error: unresolved "void Mangle::func(int)"          (?func@Mangle@@QAEXH@Z)

The funny squiggles are there to support overloading. In C, there can only be a single definition of a given function within an executable, so the binary name, known as the symbol name, is given as a simple variant of the function name on a system specific basis. But in C++, functions (free functions, class methods, and instance methods) can be overloaded. If several functions with the same name can be defined, there has to be a way of providing a distinct symbol name to each overload. The same applies to same-named methods in different classes, or within same-named classes within different namespaces. The result is that all compilers perform what is called mangling, a very apt term, since moderately readable function names are mangled into indecipherable spaghetti in order to guarantee uniqueness.

Since C++ piggybacks on top of operating system APIs—which are written for handling C-compatible APIs—for loading symbols, the C++ symbol identifiers need to be converted to a single-name form. Without such a scheme, the linker would not know which overload to link to calls in client code, and the program would not function. It's conceivable that in a complete integrated environment built for C++, the linker would understand overloading and the squiggles would not be necessary. However, there'd still need to be some form of encoding, and you need look no further than the Java Native Interface (JNI) [Lian1999] to see how such things, however "pure" in implementation, are still pretty unreadable. In any case, an integrated C++ operating environment would require a C++ loader, so we're almost in circular argument territory.

But since we've got linkers to handle that, why should we care? Mangling schemes are perfectly reasonable in and of themselves, so what's the problem? It's very simple. Different compiler vendors use different mangling schemes, so it is impractical to dynamically load and call into C++ libraries built by one compiler from code built by another. We see more of this in section 9.1.
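To make the incompatibility concrete, here is the class from Listing 7.1 again, with hypothetical method definitions supplied so that it links. The Visual C++ 6.0 symbol names in the comments are those reported by the linker above. The second set does not come from the text: they are the names that the Itanium C++ ABI scheme, as used by GCC 3 and later and by Clang, yields for the same two signatures. The member last_called is an assumption, added only so that the selected overload can be observed.

```cpp
#include <cassert>

class Mangle
{
public:
  Mangle() : last_called('\0') {}

  // Visual C++ 6.0:   ?func@Mangle@@QAEXH@Z
  // Itanium C++ ABI:  _ZN6Mangle4funcEi
  void func(int i);

  // Visual C++ 6.0:   ?func@Mangle@@QAEXPBD@Z
  // Itanium C++ ABI:  _ZN6Mangle4funcEPKc
  void func(char const *s);

  char last_called; // 'i' or 's'; hypothetical, for observation only
};

// Definitions that Listing 7.1 deliberately left out.
void Mangle::func(int)
{
  last_called = 'i';
}

void Mangle::func(char const *s)
{
  assert(s != 0);

  last_called = 's';
}
```

A client compiled to look for one spelling will never find a library that exports only the other, even though both spellings name the same function. On GCC-like toolchains, running nm over the object file should list the second pair.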

7.3.4 Static Linking

If we look now at static linking compatibilities between a simple C++ library and a simple client program, produced with different Win32 compilers,^[4] we see (in Table 7.3) that the picture is even bleaker than it is with C.

^[4] The Intel compiler now has static C++ linking capability with GCC on Linux.

Table 7.3. Static linking compatibility (C++ code) on Win32; # represents a compatible combination.
Compiler Making the Library
Compiler Using the Library
Borland
CodeWarrior
Digital Mars
GCC
Intel
Visual C++
Borland
#

CodeWarrior

#

Digital Mars

#

GCC

#

Intel

#

#
#
Visual C++

#

#
#

7.3.5 Dynamic Linking

We've seen that clashes between mangling schemes in static linking can be avoided by providing compiler-specific variants of our libraries, against which clients can build their executables. But this is not a solution for the dynamic linking case. Dynamic libraries are potentially shared at run time between several executables, which may or may not have been built with different compilers. If the symbol names in the dynamic library are mangled with a convention not understood by the compiler used to create another process, that process will not load.

Once again, in practice several Win32 compiler vendors use compatible schemes, including mangling conventions, as can be seen in Table 7.4.^[5]

^[5] The Intel compiler now has dynamic C++ linking capability with GCC on Linux.

Table 7.4. Dynamic linking compatibility (C++) on Win32; # represents a compatible combination.

                              Compiler Making the Library
Compiler Using the Library    Borland   CodeWarrior   Digital Mars   GCC   Intel   Visual C++
Borland                          #           -             -          -      -         -
CodeWarrior                      -           #             #          -      #         #
Digital Mars                     -           #             #          -      #         #
GCC                              -           -             -          #      -         -
Intel                            -           #             #          -      #         #
Visual C++                       -           #             #          -      #         #

It's clear that there is a greater degree of compatibility with dynamic libraries than is the case with static linking. However, it should be equally clear that there remains a significant degree of incompatibility. This is clear proof that C++ does not have a respectable ABI to speak of on the Win32 platform.

Once again, we might consider producing several dynamic libraries, each with the appropriately mangled names. We could envisage shipping compiler-specific versions of our dynamic libraries: libmystuff_gcc.so; mystuff_cw_dmc_intel_vc.dll. (We'd still have to make more import libraries, corresponding to the static linking compatibilities shown in the previous section, but that's a minor issue.)

However, there are several problems with this approach. First, it defeats both purposes of DLLs, in that more libraries will be resident on disk and in memory, and updates to the library must be consistently built and installed in all forms. Also, some dynamic libraries act as more than just code repositories. Dynamic libraries are allowed to contain static data (and we touch more on the ramifications of this in the next chapter), which can act to provide program logic, for example, a custom memory manager or a socket pool. If several compiler-specific versions of the same logical library exist, things could get very hairy.

So although it is feasible to get around the dynamic-library C++ ABI problem on a limited scale by supplying multiple compiler-specific dynamic libraries, it is fraught with problems, and I am not aware of any systems that actually do this. It seems pretty clear that to achieve portability within a given operating system we have to stick to C linkage. The significant downside is that this rules out mangled names, so we cannot export/import overloaded functions, or any class methods.
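As a rough sketch of what sticking to C linkage entails, the overloads have to be flattened into distinctly named functions, and any class has to hide behind an opaque handle. Every name here (mystuff_counter and its functions) is hypothetical, and this is an illustration of the constraint, not the technique the text goes on to present.

```cpp
#include <cassert>
#include <cstring>
#include <new>

extern "C"
{
  // Opaque handle: clients see only a pointer, never the layout.
  struct mystuff_counter;

  mystuff_counter *mystuff_counter_create(void);
  void             mystuff_counter_destroy(mystuff_counter *counter);

  // One unmangled name per overload we would have liked to export.
  void mystuff_counter_add_int(mystuff_counter *counter, int n);
  void mystuff_counter_add_str(mystuff_counter *counter, char const *s);

  int  mystuff_counter_total(mystuff_counter const *counter);
}

// Library-side definition; it remains free to use overloading internally.
struct mystuff_counter
{
  mystuff_counter() : total(0) {}

  void add(int n)         { total += n; }
  void add(char const *s) { total += static_cast<int>(std::strlen(s)); }

  int total;
};

extern "C" mystuff_counter *mystuff_counter_create(void)
{
  // nothrow, because an exception must not escape through a C interface;
  // failure is reported as a null handle instead
  return new (std::nothrow) mystuff_counter();
}

extern "C" void mystuff_counter_destroy(mystuff_counter *counter)
{
  delete counter;
}

extern "C" void mystuff_counter_add_int(mystuff_counter *counter, int n)
{
  assert(counter != 0);
  counter->add(n);
}

extern "C" void mystuff_counter_add_str(mystuff_counter *counter, char const *s)
{
  assert(counter != 0 && s != 0);
  counter->add(s);
}

extern "C" int mystuff_counter_total(mystuff_counter const *counter)
{
  assert(counter != 0);
  return counter->total;
}
```

The exported symbols are now plain C names that any compiler on the platform can resolve, at the cost of the client writing mystuff_counter_add_int(c, 10) and mystuff_counter_add_str(c, "Hello") where it would have preferred two calls to an overloaded add().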

So far, we seem to be agreeing with the portents of doom outlined by my friend. Thankfully, there's some help at hand; we just have to get a little retro.