We define two functions that are similar to floor and ceiling, but which are directed roundings to the closest integral power of 2, rather than to the closest integer. Mathematically, they are defined by
The initial letters of the function names are
intended to suggest "floor" and "ceiling." Thus, flp2(x) is the greatest power of 2 that is x, and clp2(x) is the least power of 2 that is
x. These
definitions make sense even when x is not an
integer (e.g., flp2(0.1) = 0.0625). The functions satisfy several relations
analogous to those involving floor and ceiling, such as those shown below,
where n is an integer.
Computationally, we deal only with the case in which x is an integer, and we take it to be unsigned, so the functions are well defined for all x. We require the value computed to be the arithmetically correct value modulo 232 (that is, we take clp2(x) to be 0 for x > 231). The functions are tabulated below for a few values of x.
Functions flp2 and clp2 are connected by the relations shown below. These can be used to compute one from the other, subject to the indicated restrictions.
The round-up and round-down functions can be computed quite easily with the number of leading zeros instruction, as shown below. However, for these relations to hold for x = 0 and x > 231, the computer must have its shift instructions defined to produce 0 for shift amounts of -1, 32, and 63. Many machines (e.g., PowerPC) have "mod 64" shifts, which do this. In the case of -1, it is adequate if the machine shifts in the opposite direction (that is, a shift left of -1 becomes a shift right of 1).
Figure 3-1 illustrates a branch-free algorithm that might be useful if number of leading zeros is not available. This algorithm is based on right-propagating the leftmost 1-bit, and executes in 12 instructions.
unsigned flp2(unsigned x) {
牋 x = x | (x >> 1);
牋爔 = x | (x >> 2);
牋爔 = x | (x >> 4);
牋爔 = x | (x >> 8);
牋爔 = x | (x >>16);
牋爎eturn x - (x >> 1);
}
Figure 3-2 shows two simple loops that compute the same function. All variables are unsigned integers. The loop on the right keeps turning off the rightmost 1-bit of x until x = 0, and then returns the previous value of x.
y = 0x80000000;牋牋牋牋牋 do {
while (y > x)牋牋牋牋牋牋牋?y = x;
牋爕 = y >> 1;牋牋牋牋牋牋牋 x = x & (x - 1);
return牋牋牋牋牋牋牋牋牋?} while(x != 0);
牋牋牋牋牋牋牋牋牋牋牋牋牋return y;
The loop on the left executes in 4nlz(x) + 3 instructions. The loop on the right, for
x 0, executes in 4pop(x) instructions, [1] if the comparison to 0 is
zero-cost.
[1] pop(x) is the number of 1-bits in x.
The right-propagation trick yields a good algorithm for rounding up to the next power of 2. This algorithm, shown in Figure 3-3, is branch-free and runs in 12 instructions.
unsigned clp2(unsigned x) {
牋 x = x - 1;
牋爔 = x | (x >> 1);
牋爔 = x | (x >> 2);
牋爔 = x | (x >> 4);
牋爔 = x | (x >> 8);
牋爔 = x | (x >>16);
牋爎eturn x + 1;
}
An attempt to compute this with the obvious loop does not work out very well:
y = 1;
while (y < x)牋牋 // Unsigned comparison.
牋爕 = 2*y;
return y;
This code returns 1 for x = 0, which is probably not what you want, loops
forever for x 231, and
executes in 4n +3 instructions, where n is the power of 2 of the returned integer. Thus, it
is slower than the branch-free code, in terms of instructions executed, for n
3 (x
8).