3-2 Rounding Up/Down to the Next Power of 2

We define two functions that are similar to floor and ceiling, but which are directed roundings to the closest integral power of 2, rather than to the closest integer. Mathematically, they are defined by

graphics/03icon06.gif

The initial letters of the function names are intended to suggest "floor" and "ceiling." Thus, flp2(x) is the greatest power of 2 that is x, and clp2(x) is the least power of 2 that is x. These definitions make sense even when x is not an integer (e.g., flp2(0.1) = 0.0625). The functions satisfy several relations analogous to those involving floor and ceiling, such as those shown below, where n is an integer.

graphics/03icon07.gif

Computationally, we deal only with the case in which x is an integer, and we take it to be unsigned, so the functions are well defined for all x. We require the value computed to be the arithmetically correct value modulo 2³² (that is, we take clp2(x) to be 0 for x > 2³¹). The functions are tabulated below for a few values of x.

graphics/03icon08.gif

Functions flp2 and clp2 are connected by the relations shown below. These can be used to compute one from the other, subject to the indicated restrictions.

graphics/03icon09.gif

The round-up and round-down functions can be computed quite easily with the number of leading zeros instruction, as shown below. However, for these relations to hold for x = 0 and x > 2³¹, the computer must have its shift instructions defined to produce 0 for shift amounts of -1, 32, and 63. Many machines (e.g., PowerPC) have "mod 64" shifts, which do this. In the case of -1, it is adequate if the machine shifts in the opposite direction (that is, a shift left of -1 becomes a shift right of 1).

graphics/03icon10.gif

Rounding Down

Figure 3-1 illustrates a branch-free algorithm that might be useful if number of leading zeros is not available. This algorithm is based on right-propagating the leftmost 1-bit, and executes in 12 instructions.

Figure 3-1 Greatest power of 2 less than or equal to x, branch-free.

unsigned flp2(unsigned x) {

   x = x | (x >> 1);

   x = x | (x >> 2);

   x = x | (x >> 4);

   x = x | (x >> 8);

   x = x | (x >>16);

   return x - (x >> 1);

Figure 3-2 shows two simple loops that compute the same function. All variables are unsigned integers. The loop on the right keeps turning off the rightmost 1-bit of x until x = 0, and then returns the previous value of x.

Figure 3-2 Greatest power of 2 less than or equal to x, simple loops.

y = 0x80000000;           do {

while (y > x)              ?y = x;

   y = y >> 1;               x = x & (x - 1);

return                  ?} while(x != 0);

                          return y;

The loop on the left executes in 4nlz(x) + 3 instructions. The loop on the right, for x 0, executes in 4pop(x) instructions, ^[1] if the comparison to 0 is zero-cost.

^[1] pop(x) is the number of 1-bits in x.

Rounding Up

The right-propagation trick yields a good algorithm for rounding up to the next power of 2. This algorithm, shown in Figure 3-3, is branch-free and runs in 12 instructions.

Figure 3-3 Least power of 2 greater than or equal to x.

unsigned clp2(unsigned x) {

   x = x - 1;

   x = x | (x >> 1);

   x = x | (x >> 2);

   x = x | (x >> 4);

   x = x | (x >> 8);

   x = x | (x >>16);

   return x + 1;

An attempt to compute this with the obvious loop does not work out very well:

y = 1;

while (y < x)     // Unsigned comparison.

   y = 2*y;

return y;

This code returns 1 for x = 0, which is probably not what you want, loops forever for x 2³¹, and executes in 4n +3 instructions, where n is the power of 2 of the returned integer. Thus, it is slower than the branch-free code, in terms of instructions executed, for n 3 (x 8).