Chapter 15. Floating-Point

God created the integers, all else is the work of man.

—Leopold Kronecker

Operating on floating-point numbers with integer arithmetic and logical instructions is often a messy proposition. This is particularly true for the rules and formats of the IEEE Standard for Binary Floating-Point Arithmetic, IEEE Std. 754-1985, commonly known as "IEEE arithmetic." It has NaNs (not a number) and infinities, which are special cases for almost all operations. It has plus and minus zero, which must compare equal to one another. It has a fourth comparison result, "unordered," in addition to less than, equal, and greater than. The most significant bit of the fraction is not explicitly present in "normal" numbers, but it is in "denormalized" or "subnormal" numbers. The fraction is in signed-true (sign-magnitude) form and the exponent is in biased form, whereas integers are now almost universally in two's-complement form. There are of course reasons for all this, but it results in programs that are full of compares and branches, and that are a challenge to implement efficiently.
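
To make these special cases concrete, here is a minimal sketch in C that picks a single-precision number apart and compares two of them using only integer operations. It assumes that `float` is the 32-bit IEEE single format (1 sign bit, 8 exponent bits biased by 127, 23 fraction bits). The names `float_bits`, `fp_sign`, `fp_biased_exp`, `fp_fraction`, `fp_significand`, and `fp_compare` are made up for this illustration and do not come from the standard or from any library.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: float is the 32-bit IEEE single format. */

/* Reinterpret the bits of a float as an unsigned integer.
   memcpy avoids the aliasing problems of a pointer cast. */
uint32_t float_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

unsigned fp_sign(float x)       { return float_bits(x) >> 31; }
unsigned fp_biased_exp(float x) { return (float_bits(x) >> 23) & 0xFF; }
uint32_t fp_fraction(float x)   { return float_bits(x) & 0x7FFFFF; }

/* Significand with the hidden bit restored. Only normal numbers
   (biased exponent 1..254) have the implicit leading 1; zeros and
   subnormals (exponent 0) do not, and exponent 255 marks an
   infinity or NaN, for which the fraction is returned unchanged. */
uint32_t fp_significand(float x) {
    unsigned e = fp_biased_exp(x);
    uint32_t f = fp_fraction(x);
    return (e != 0 && e != 255) ? (f | 0x800000) : f;
}

enum { FP_LT = -1, FP_EQ = 0, FP_GT = 1, FP_UN = 2 };

/* Four-way comparison done with integer operations only. */
int fp_compare(float a, float b) {
    uint32_t ua = float_bits(a), ub = float_bits(b);

    /* A NaN has all exponent bits set and a nonzero fraction;
       it is unordered with respect to everything, itself included. */
    if ((ua & 0x7FFFFFFF) > 0x7F800000 ||
        (ub & 0x7FFFFFFF) > 0x7F800000) return FP_UN;

    /* +0 and -0 differ in the sign bit but must compare equal. */
    if (((ua | ub) & 0x7FFFFFFF) == 0) return FP_EQ;

    /* Map sign-magnitude to an unsigned key that increases with the
       value: complement negative numbers, set the top bit of
       nonnegative ones. */
    uint32_t ka = (ua & 0x80000000) ? ~ua : (ua | 0x80000000);
    uint32_t kb = (ub & 0x80000000) ? ~ub : (ub | 0x80000000);

    if (ka < kb) return FP_LT;
    if (ka > kb) return FP_GT;
    return FP_EQ;
}
```

Even this small comparison needs a test for NaNs, a test for the two zeros, and a sign-dependent transformation before an ordinary integer compare can be used, which is the sort of branching the paragraph above describes.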

We assume the reader has some familiarity with the IEEE standard, and thus summarize it here only very briefly.