Imperfect C++ Practical Solutions for Real-Life Programming By Matthew Wilson

	Chapter 7. ABI

7.2. C ABI Requirements

Since C++ is an almost complete superset of C, any C++ ABI would encompass a C ABI. A natural first step in examining C++ ABI issues, therefore, is to look at C-specific issues.

C is a much simpler language than C++, in line with the spirit of C (see Prologue). It has no classes, objects, virtual functions, exception handling, run-time type information, or templates.

However, several of the issues identified above are relevant to C. C has structures, which compilers are free to align according to their own criteria, unless we intervene with our own packing dictates via compiler-specific options and pragmas (see section 8.2).

Even the way functions are called at run time and named at link time can vary. Every C function is compiled to use one of the calling conventions available on its platform, and, depending on the platform, those conventions may or may not be compatible between compilers. Similarly, the names of compiled functions (their symbol names) may vary between compilers. Both issues vary far more among compilers on the Win32 operating system than on, say, UNIX operating systems, which is one of the reasons why I've chosen to focus on Win32 compilers in this chapter. Those of you from exclusively UNIX backgrounds can probably breathe a sigh of relief that these problems are less extensive on your operating system of choice. Hopefully you'll spare a moment to have a care for the poor Win32 programmer trying to achieve C++ (and C) binary interoperability.

Although vastly simpler than C++ in this respect, C does have issues with its language support, such as the reentrant semantics of atexit() [Alex2001]. A C ABI therefore also depends on compilers sharing a common interpretation of such library functionality.

7.2.1 Structure Layout

The layout issues in C are pretty straightforward to understand. Consider the following structure:

struct S
{
  long  l;
  int   i;
  short s;
  char  ch;
};

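The size of this structure is entirely implementation defined. In practice it depends on a given compiler's packing conventions, which invariably operate on a simple algorithm: in the common scheme, each member is placed at an offset that is a multiple of the smaller of its natural alignment and the packing alignment, and the whole structure is padded to a multiple of the largest such alignment. Assuming sizes of 8, 4, 2, and 1 byte(s) respectively for long, int, short, and char, if the structure is packed to an alignment of 1 byte, then the four members are laid out contiguously and the size of S is 15 bytes. Because the members are declared in decreasing order of size, they also fall on suitable boundaries at alignments of 2, 4, or 8 bytes, so in each of those cases the only padding is a single trailing byte and the size is 16. Had the char been declared first, the differences would be larger, since padding would then be needed after it as well.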
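To see these numbers for yourself, the following sketch uses the #pragma pack(push, n) / #pragma pack(pop) form, which Visual C++, GCC, and Clang all accept (other compilers may spell it differently; see section 8.2). It assumes fixed-width types in place of long, int, short, and char, so that the 8/4/2/1-byte sizes assumed above hold regardless of platform; the structure names are made up for illustration. The T structures reorder the members so that a char comes first.

```c
#include <stddef.h>  /* offsetof, for inspecting member positions */
#include <stdint.h>

/* S as above, with fixed-width types standing in for the assumed
   8/4/2/1-byte long, int, short, and char. */
#pragma pack(push, 1)
struct S_pack1 { int64_t l; int32_t i; int16_t s; char ch; };
#pragma pack(pop)

#pragma pack(push, 2)
struct S_pack2 { int64_t l; int32_t i; int16_t s; char ch; };
#pragma pack(pop)

#pragma pack(push, 4)
struct S_pack4 { int64_t l; int32_t i; int16_t s; char ch; };
#pragma pack(pop)

#pragma pack(push, 8)
struct S_pack8 { int64_t l; int32_t i; int16_t s; char ch; };
#pragma pack(pop)

/* The same members with the char first: now padding is needed inside
   the structure, so the packing alignment matters much more. */
#pragma pack(push, 1)
struct T_pack1 { char ch; int64_t l; int32_t i; int16_t s; };
#pragma pack(pop)

#pragma pack(push, 4)
struct T_pack4 { char ch; int64_t l; int32_t i; int16_t s; };
#pragma pack(pop)
```

On typical x86 and x86-64 compilers, sizeof gives 15 for S_pack1 and 16 for the other S variants, while T grows from 15 bytes to 20, with offsetof(struct T_pack4, l) moving from 1 to 4.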

If two compilers use different packing alignments then the binary components that they build will not be binary compatible. In fact, packing alignment can generally be set by a compiler option, so it's quite possible to build different binary components of a single application by a single compiler and still have incompatibilities.
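The following sketch simulates two components that have compiled the same declaration with different packing settings: one writes a record through its view of the layout, and the other reads the bytes back through its own. The record type and function names are hypothetical, and in a real application the two views would live in separately built binaries rather than side by side in one file.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record, as seen by a component built with 1-byte packing. */
#pragma pack(push, 1)
struct Record_packed { char tag; int32_t value; };   /* value at offset 1 */
#pragma pack(pop)

/* The same declaration, as seen by a component built with default packing. */
struct Record_default { char tag; int32_t value; };  /* value at offset 4 */

/* "Component A" fills in a record and hands over its raw bytes. */
size_t write_record(unsigned char *buf, char tag, int32_t value)
{
    struct Record_packed r;
    r.tag = tag;
    r.value = value;
    memcpy(buf, &r, sizeof r);
    return sizeof r;
}

/* "Component B" interprets the same bytes using its own layout.
   Reads sizeof(struct Record_default) bytes, so buf must be at least
   that large. */
int32_t read_value(const unsigned char *buf)
{
    struct Record_default r;
    memcpy(&r, buf, sizeof r);
    return r.value;  /* taken from offset 4, not offset 1 */
}
```

With a zero-filled buffer of at least 8 bytes, read_value() returns a number assembled from the wrong bytes rather than the value passed to write_record(); nothing fails at compile or link time.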

7.2.2 Calling Conventions, Symbol Names, and Object File Formats

Where compilers have, or are persuaded to use, the same structure packing alignment, the next problem comes when linking binary components together. This comes down to two issues: symbol names and calling conventions.

In the early days of C on UNIX [Lind1994], all compilers used the same calling convention, known as the C calling convention, whereby arguments are pushed onto the stack from right to left, and the caller cleans up the stack.^[3] More recently, other compilers, especially on the Windows platforms, have provided other conventions, known as stdcall (standard call), pascal, and fastcall, among others. In such environments, the C calling convention is known as cdecl. If two compilers use different calling conventions, then the binary components they generate will not be compatible.

^[3] This is how variable argument list functions—int printf(char const *, ...);—are supported, because the calling context "knows" how many arguments have been passed, whereas the function itself would have no chance in the general case.
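A common defensive measure is to state the calling convention explicitly in a library's public header rather than relying on each compiler's default. The sketch below does this with a hypothetical MYSTUFF_CALL macro, assuming that __cdecl is accepted by each Win32 compiler in question (Visual C++ and MinGW GCC certainly accept it) and that other platforms have only the one C convention. It also shows, via a made-up sum_ints(), why a variadic function depends on the caller-clears scheme: the callee learns the argument count only from what the caller tells it.

```c
#include <stdarg.h>

/* Hypothetical macro: name the calling convention in the public header
   so every compiler building against it agrees. On non-Win32 platforms
   (assumed to have a single C convention) it expands to nothing. */
#if defined(_WIN32)
#  define MYSTUFF_CALL __cdecl
#else
#  define MYSTUFF_CALL
#endif

/* A variadic function must use the caller-clears (cdecl) convention:
   only the caller knows how many arguments it passed, so the callee
   relies on 'count' to know how many to read. */
int MYSTUFF_CALL sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int n;

    va_start(args, count);
    for (n = 0; n < count; ++n)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```

For example, sum_ints(3, 1, 2, 3) returns 6; the caller pushed the arguments and the caller removes them.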

A related issue is that of the names given to symbols in the object files, the binary files produced by compiling the compilation units (see Prologue 0.4). Symbol names are the names given in object files to functions and variables with external linkage, so that when the program is linked the various symbols can be located and linked together. Classically, the format is simply to use the function name itself [Kern1999]. Many compiler vendors define their own symbol naming schemes for C++, and this is one reason why C++ compatibility is poor (see section 7.3.3). Matters are further complicated by the fact that some compilers give symbols different names depending on the threading model used for a given build.
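When a C library is also to be used from C++, the usual way to keep the C++ compiler's naming scheme out of the picture is to wrap the declarations in an extern "C" block, guarded so that C compilers never see it. A minimal sketch, with a hypothetical header name and function:

```c
/* mystuff.h (hypothetical). Under C++, extern "C" gives these
   declarations C linkage, and hence the plain C symbol name, instead of
   the compiler's own C++ naming scheme. A C compiler never defines
   __cplusplus, so it sees ordinary declarations. */
#ifndef MYSTUFF_H
#define MYSTUFF_H

#ifdef __cplusplus
extern "C" {
#endif

int mystuff_version(void);

#ifdef __cplusplus
}
#endif

#endif /* MYSTUFF_H */

/* mystuff.c: compiled as C, so the symbol is based on the plain name
   "mystuff_version" (possibly with a leading underscore, depending on
   the platform's convention). */
int mystuff_version(void)
{
    return 1;
}
```

Both a C and a C++ client including this header then refer to the same symbol.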

Finally, object files may have different formats, which is another source of incompatibility on the Win32 platform. Several different formats exist, such as OMF and COFF, and they are generally incompatible, although most compilers support more than one format.

7.2.3 Static Linking

In practice, some compilers are able to share the same static libraries. Table 7.1 shows the compatibilities in the building and use of simple C static libraries for some popular Win32 compilers. As you can see, there are a few combinations in which compatibility is achieved, but the overall picture is not terribly encouraging. I've generally gone along with the default code generation options for each compiler in gathering this information. There do exist a few additional tailoring measures, but there is certainly no way to ensure full interoperability.

Table 7.1. Static linking compatibility (C code) on Win32; # represents a compatible combination.
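Rows: compiler making the library; columns: compiler using the library.

| Made by      | Borland | CodeWarrior | Digital Mars | GCC | Intel | Visual C++ |
|--------------|---------|-------------|--------------|-----|-------|------------|
| Borland      | #       |             |              |     |       |            |
| CodeWarrior  |         | #           |              |     |       |            |
| Digital Mars |         |             | #            |     |       |            |
| GCC          |         | #           |              | #   | #     | #          |
| Intel        |         | #           |              | #   | #     | #          |
| Visual C++   |         | #           |              | #   | #     | #          |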

Clearly, the lack of an ABI prevents the provision of a single statically linked version of your library on the Win32 platform. Similar incompatibilities exist on other platforms, although the Itanium ABI standardization means that GCC and Intel can cooperate on Linux. There is a prosaic solution to the problem: for each compiler your clients may wish to use, you need access to that compiler, or a compatible one, and you produce target libraries aimed at it. Thus, you might create mystuff_gcc.a (mystuff_gcc.lib for Win32), mystuff_cw.a (mystuff_cw.lib for Win32), and so on. The practical impediment is that you are unlikely to have access to all the compilers you may need, especially if you want to support several operating systems. Even if you do, maintaining all the makefiles/project files is an odious task, more like hard labor than software engineering. It may be considered reasonable for small projects, but it's unacceptable for any large project: you can't imagine writing operating systems in this way!
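Once you do produce per-compiler builds, a header can select the right file name automatically from each compiler's predefined identification macro (__BORLANDC__, __MWERKS__, __DMC__, __GNUC__, _MSC_VER, __INTEL_COMPILER). The sketch below assumes the suffixes shown; apart from _gcc and _cw they are my own invention, as is the mystuff_library_name() function.

```c
/* Pick a per-compiler suffix. Intel is tested first because it also
   defines _MSC_VER (on Windows) or __GNUC__ (elsewhere). */
#if defined(__INTEL_COMPILER)
#  define MYSTUFF_COMPILER_TAG "icl"
#elif defined(__BORLANDC__)
#  define MYSTUFF_COMPILER_TAG "bc"
#elif defined(__MWERKS__)
#  define MYSTUFF_COMPILER_TAG "cw"
#elif defined(__DMC__)
#  define MYSTUFF_COMPILER_TAG "dm"
#elif defined(__GNUC__)
#  define MYSTUFF_COMPILER_TAG "gcc"
#elif defined(_MSC_VER)
#  define MYSTUFF_COMPILER_TAG "vc"
#else
#  error "No mystuff library has been built for this compiler"
#endif

#if defined(_WIN32)
#  define MYSTUFF_LIB_EXT ".lib"
#else
#  define MYSTUFF_LIB_EXT ".a"
#endif

/* Adjacent string literals concatenate, e.g. "mystuff_gcc.a". */
#define MYSTUFF_LIB_NAME "mystuff_" MYSTUFF_COMPILER_TAG MYSTUFF_LIB_EXT

const char *mystuff_library_name(void)
{
    return MYSTUFF_LIB_NAME;
}
```

With Visual C++, for example, the header could go on to say #pragma comment(lib, MYSTUFF_LIB_NAME) so that clients link the matching library without editing their project settings. It does nothing, of course, to reduce the number of builds you must maintain.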

7.2.4 Dynamic Linking

Modern operating systems, and many modern applications, make use of a technique known as dynamic linking, which we look at in detail in Chapter 9. In this case, you link against the library in a similar way, but the code is not copied into the finished executable. Rather, entry points are recorded, and when an executable is loaded by the operating system, the dynamic libraries on which it depends are also loaded and the entry points altered to point into the actual code within the library as it now resides within the new process's address space. On Win32 systems, the creation of a dynamic library is usually accompanied by the generation of a small static library, known as an import library, which contains the code that the application will use to fix up addresses when the dynamic library is loaded. Executables are linked against such import libraries in the same way as they are with normal (static) libraries.

From the library side, the functions that are made available for dynamic linking are known as export functions. Depending on the compiler and/or operating system, either all the functions in the library are exported, or only those that you explicitly mark in some way, for example, by selection via a mapfile on Solaris or by using __declspec(dllexport) modifiers with Win32 compilers.
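On Win32, the usual arrangement is a single header macro that expands to __declspec(dllexport) while the DLL itself is being built and to __declspec(dllimport) for its clients. The sketch below follows that pattern with hypothetical MYSTUFF_API and MYSTUFF_BUILDING_DLL names and, as an assumption beyond the text, maps the macro to GCC's visibility("default") attribute on other platforms, which matters only when a library is built with -fvisibility=hidden.

```c
/* Normally supplied on the command line when building the library
   itself; defined here so that this single file plays that role. */
#define MYSTUFF_BUILDING_DLL

#if defined(_WIN32)
#  if defined(MYSTUFF_BUILDING_DLL)
#    define MYSTUFF_API __declspec(dllexport)  /* building the DLL */
#  else
#    define MYSTUFF_API __declspec(dllimport)  /* client code */
#  endif
#elif defined(__GNUC__)
#  define MYSTUFF_API __attribute__((visibility("default")))
#else
#  define MYSTUFF_API
#endif

/* Public header declaration. */
MYSTUFF_API int mystuff_add(int a, int b);

/* Library implementation. */
MYSTUFF_API int mystuff_add(int a, int b)
{
    return a + b;
}
```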

An advantage of dynamic linking is that demands on disk space and operating system working sets are reduced, because there aren't duplicated blocks of code spread throughout several executable files or in several concurrently executing processes [Rich1997]. It also means that bug fixes and enhancements can be deployed without requiring any rebuilds of dependent executables. Indeed, where the libraries are part of the operating system, such updates can be done without the program vendors, or even the users, being aware of it, as a side effect to the installation or update of other software or to the operating system itself.

Naturally, there are downsides to using shared libraries, most notably the so-called "DLL Hell" [Rich2002], whereby updated versions of dynamic libraries can, if they contain bugs, break previously well-functioning (and often essential) programs. The other side of DLL Hell is the all-too-common problem of fixes to libraries breaking programs that depended on the bugs. Naming no names, you understand. Notwithstanding these very real problems, the advantages generally outweigh such concerns, and it is hard to conceive of a move back to pure static linking.

The impact of dynamic linking on compatibility is significant, as can be seen in Table 7.2. Each client program for a given compiler was built against the import library produced by that compiler, and compatibility was then tested by swapping in the DLLs built with the other compilers and attempting execution.

Table 7.2. Dynamic linking compatibility (C code, cdecl) on Win32; # represents a compatible combination.
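Rows: compiler making the library; columns: compiler using the library.

| Made by      | Borland | CodeWarrior | Digital Mars | GCC | Intel | Visual C++ |
|--------------|---------|-------------|--------------|-----|-------|------------|
| Borland      | #       | #           | #            | #   | #     | #          |
| CodeWarrior  | #*      | #           | #            | #   | #     | #          |
| Digital Mars | #*      | #           | #            | #   | #     | #          |
| GCC          | #*      | #           | #            | #   | #     | #          |
| Intel        | #*      | #           | #            | #   | #     | #          |
| Visual C++   | #*      | #           | #            | #   | #     | #          |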

There is almost perfect compatibility here, apart from Borland using other compilers' libraries (marked *). This is because, in dynamic libraries, Borland prefixes any symbols using the C calling convention with a leading underscore, as the Win32 convention requires for static linking. Most other Win32 compilers omit the underscore, probably because Microsoft's Visual C++ does so. (Working around this with Borland can be a bit of a pain, hence the *. I'll leave it to the reader to delve into this subject. Hint: check out the -u- option, and the IMPLIB and IMPDEF tools.) On UNIX systems these prefix naming issues do not occur.
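When loading a DLL explicitly, you can sidestep the underscore mismatch by trying both spellings. This is only a sketch of that idea, not the IMPLIB/IMPDEF route hinted at above: lookup_fn is a hypothetical callback that in practice would wrap GetProcAddress() (converting its FARPROC result) or dlsym(), and borland_style_lookup() is a stand-in export table used purely for demonstration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup callback over an already-loaded library. */
typedef void *(*lookup_fn)(const char *name);

/* Try the undecorated C name first (the Visual C++ style for cdecl
   exports), then the underscore-prefixed form that Borland uses. */
void *find_c_symbol(lookup_fn lookup, const char *name)
{
    char decorated[256];
    void *p = lookup(name);
    int n;

    if (p != NULL)
        return p;
    n = snprintf(decorated, sizeof decorated, "_%s", name);
    if (n < 0 || (size_t)n >= sizeof decorated)
        return NULL;  /* encoding error, or name too long */
    return lookup(decorated);
}

/* Demonstration only: pretends to be a Borland-built DLL that exports
   its function as "_mystuff_version". */
static int dummy_target;
static void *borland_style_lookup(const char *name)
{
    return strcmp(name, "_mystuff_version") == 0 ? (void *)&dummy_target : NULL;
}
```

Here find_c_symbol(borland_style_lookup, "mystuff_version") fails on the plain name and succeeds on the decorated one.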

If you think about it, this full compatibility makes perfect sense. Whether you're on Win32 or on Solaris, you have to be able to interface to dynamic system libraries. If you weren't able to do so, then your compiler would not be able to generate code for that system, and there'd be little point in producing it for that environment.