10-15 Exact Division by Constants

By "exact division," we mean division in which it is known beforehand, somehow, that the remainder is 0. Although this situation is not common, it does arise, for example, when subtracting two pointers in the C language. In C, the result of p - q, where p and q are pointers, is well defined and portable only if p and q point to objects in the same array [H&S, sec. 7.6.2]. If the array element size is s, the object code for the difference p - q computes (p - q)/s.

The material in this section was motivated by [GM, sec. 9].

The method to be given applies to both signed and unsigned exact division, and is based on the following theorem.

Theorem MI. If a and m are relatively prime integers, then there exists an integer a? 1 a?/span> < m, such that

graphics/10icon155.gif

 

Thus a?/span> is a multiplicative inverse of a, modulo m. There are several ways to prove this theorem; three proofs are given in [NZM, 52]. The proof below requires only a very basic familiarity with congruences.

Proof. We will prove something a little more general than the theorem. If a and m are relatively prime (and hence nonzero), then as x ranges over all m distinct values modulo m, ax takes on all m distinct values modulo m. For example, if a = 3 and m = 8, then as x ranges from 0 to 7, ax = 0, 3, 6, 9, 12, 15, 18, 21 or, reduced modulo 8, ax = 0, 3, 6, 1, 4, 7, 2, 5. Observe that all values from 0 to 7 are present in the last sequence.

To see this in general, assume that it is not true. Then there exist distinct integers that map to the same value when multiplied by a; that is, there exist x and y, with x y (mod m), such that

graphics/10icon156.gif

 

But then there exists an integer k such that

graphics/10icon157.gif

 

Because a has no factor in common with m, it must be that x - y is a multiple of m; that is,

graphics/10icon158.gif

 

This contradicts the hypothesis.

Now, because ax takes on all m distinct values modulo m, as x ranges over the m values, it must take on the value 1 for some x.

The proof shows that there is only one value (modulo m) of x such that ax 1 (mod m)梩hat is, the multiplicative inverse is unique, apart from additive multiples of m. It also shows that there is a unique (modulo m) integer x such that ax b (mod m)where b is any integer.

As an example, consider the case m = 16. Then graphics/10icon159.gifbecause 3?1 = 33 1 (mod 16). We could just as well take graphics/10icon160.gifbecause 3?-5) = -15 1 (mod 16). Similarly, graphics/10icon161.gifbecause (-3)? = -15 1 (mod 16).

These observations are important because they show that the concepts apply to both signed and unsigned numbers. If we are working in the domain of unsigned integers on a 4-bit machine, we take graphics/10icon162.gifIn the domain of signed integers, we take graphics/10icon163.gifBut 11 and -5 have the same representation in two's-complement (because they differ by 16), so the same computer word contents can serve in both domains as the multiplicative inverse.

The theorem applies directly to the problem of division (signed and unsigned) by an odd integer d on a W-bit computer. Because any odd integer is relatively prime to 2W, the theorem says that if d is odd, there exists an integer d?/span> (unique in the range 0 to 2W - 1 or in the range -2W - 1 to 2W - 1 - 1) such that

graphics/10icon164.gif

 

Hence for any integer n that is a multiple of d,

graphics/10icon165.gif

 

In other words, n/d can be calculated by multiplying n by d?/span>, and retaining only the rightmost W bits of the product.

If the divisor d is even, let d = do ?2k, where do is odd and k 1. Then, simply shift n right k positions (shifting out 0's), and then multiply by graphics/icon01.gif(the shift could be done after the multiplication as well).

Below is the code for division of n by 7, where n is a multiple of 7. This code gives the correct result whether it is considered to be signed or unsigned division.

li牋?M,0xB6DB6DB7?Mult. inverse, (5*2**32 + 1)/7. 
mul牋 q,M,n牋牋牋牋 q = n/7. 

Computing the Multiplicative Inverse by the Euclidean Algorithm

How can we compute the multiplicative inverse? The standard method is by means of the "extended Euclidean algorithm." This is briefly discussed below as it applies to our problem, and the interested reader is referred to [NZM, 13] and to [Knu2, sec. 4.5.2] for a more complete discussion.

Given an odd divisor d, we wish to solve for x

graphics/10icon166.gif

 

where, in our application, m = 2W and W is the word size of the machine. This will be accomplished if we can solve for integers x and y (positive, negative, or 0) the equation

graphics/10icon167.gif

 

Toward this end, first make d positive by adding a sufficient number of multiples of m to it. (d and d + km have the same multiplicative inverse.) Second, write the following equations (in which d, m > 0):

graphics/10icon168.gif

 

If d = 1, we are done, because (ii) shows that x = 1. Otherwise, compute

graphics/10icon169.gif

 

Third, multiply Equation (ii) by q and subtract it from (i). This gives

graphics/10icon170.gif

 

This equation holds because we have simply multiplied one equation by a constant and subtracted it from another. If rem(m - d, d) = 1, we are done; this last equation is the solution and x = - 1 - q.

Repeat this process on the last two equations, obtaining a fourth, and continue until the right-hand side of the equation is 1. The multiplier of d, reduced modulo m, is then the desired inverse of d.

Incidentally, if m - d < d, so that the first quotient is 0, then the third row will be a copy of the first, so that the second quotient will be nonzero. Furthermore, most texts start with the first row being

graphics/10icon171.gif

 

but in our application m = 2W is not representable in the machine.

The process is best illustrated by an example. Let m = 256 and d = 7. Then the calculation proceeds as follows. To get the third row, note that graphics/10icon172.gif

7(-1) + 256( 1) = 249 
7( 1) + 256( 0) = 7 
7(-36) + 256( 1) = 4 
7( 37) + 256(-1) = 3 
7(-73) + 256( 2) = 1 

Thus, the multiplicative inverse of 7, modulo 256, is -73 or, expressed in the range 0 to 255, is 183. Check: 7?83 = 1281 1 (mod 256).

From the third row on, the integers in the right-hand column are all remainders with respect to the number above it as a divisor (d being the dividend), so they form a sequence of strictly decreasing nonnegative integers. Therefore, the sequence must end in 0 (as the above would if carried one more step). Furthermore, the value just before the 0 must be 1, for the following reason. Suppose the sequence ends in b followed by 0, with b 1. Then, the integer preceding the b must be a multiple of b, let's say k1b, for the next remainder to be 0. The integer preceding k1b must be of the form k1k2b + b, for the next remainder to be b. Continuing up the sequence, every number must be a multiple of b, including the first two (in the positions of the 249 and the 7 in the above example). But this is impossible, because the first two integers are m - d and d, which are relatively prime.

This constitutes an informal proof that the above process terminates, with a value of 1 in the right-hand column, and hence it finds the multiplicative inverse of d.

To carry this out on a computer, first note that if d < 0 we should add 2W to it. But with two's-complement arithmetic it is not necessary to actually do anything here; simply interpret d as an unsigned number regardless of how the application interprets it.

The computation of q must use unsigned division.

Observe that the calculations can be done modulo m, because this does not change the right-hand column (these values are in the range 0 to m - 1 anyway). This is important, because it enables the calculations to be done in "single precision," using the computer's modulo-2W unsigned arithmetic.

Most of the quantities in the table need not be represented. The column of multiples of 256 need not be represented, because in solving dx + my = 1, we do not need the value of y. There is no need to represent d in the first column. Reduced to its bare essentials, then, the calculation of the above example is carried out as follows:

255?249 
牋1牋?7 
220牋?4 
?7牋?3 
183牋?1 

A C program for performing this computation is shown in Figure 10-4.

Figure 10-4 Multiplicative inverse modulo 232 by the Euclidean algorithm.
unsigned mulinv(unsigned d) {牋牋牋牋牋 // d must be odd. 
牋爑nsigned x1, v1, x2, v2, x3, v3, q; 
 
牋 x1 = 0xFFFFFFFF;牋牋 v1 = -d; 
牋爔2 = 1;牋牋牋牋牋牋?v2 = d; 
牋爓hile (v2 > 1) {
牋 牋爍 = v1/v2; 
牋牋牋x3 = x1 - q*x2;牋 v3 = v1 - q*v2; 
牋牋牋x1 = x2;牋牋牋牋?v1 = v2; 
牋牋牋x2 = x3;牋牋牋牋?v2 = v3; 
牋爙 
牋爎eturn(x2); 
} 

The reason the loop continuation condition is (v2 > 1) rather than the more natural (v2 != 1) is that if the latter condition were used, the loop would never terminate if the program were invoked with an even argument. It is best that programs not loop forever even if misused. (If the argument d is even, v2 never takes on the value 1, but it does become 0.)

What does the program compute if given an even argument? As written, it computes a number x such that dx 0 (mod 232), which is probably not useful. However, with the minor modification of changing the loop continuation condition to (v2 != 0) and returning x1 rather than x2, it computes a number x such that dx g (mod 232) where g is the greatest common divisor of d and 232梩hat is, the greatest power of 2 that divides d. The modified program still computes the multiplicative inverse of d for d odd, but it requires one more iteration than the unmodified program.

As for the number of iterations (divisions) required by the above program, for d odd and less than 20, it requires a maximum of 3 and an average of 1.7. For d in the neighborhood of 1000, it requires a maximum of 11 and an average of about 6.

Computing the Multiplicative Inverse by Newton's Method

It is well known that, over the real numbers, 1/d, for d 0, can be calculated to ever increasing accuracy by iteratively evaluating

Equation 31

graphics/10icon173.gif

 

provided the initial estimate x0 is sufficiently close to 1/d. The number of digits of accuracy approximately doubles with each iteration.

It is not so well known that this same formula can be used to find the multiplicative inverse in the domain of modular arithmetic on integers! For example, to find the multiplicative inverse of 3, modulo 256, start with x0 = 1 (any odd number will do). Then,

graphics/10icon174.gif

 

The iteration has reached a fixed point modulo 256, so -85, or 171, is the multiplicative inverse of 3 (modulo 256). All calculations can be done modulo 256.

Why does this work? Because if xn satisfies

graphics/10icon175.gif

 

and if xn + 1 is defined by (31), then

graphics/10icon176.gif

 

To see this, let dxn = 1 + km. Then

graphics/10icon177.gif

 

In our application, m is a power of 2, say 2N. In this case, if

graphics/10icon178.gif

 

In a sense, if xn is regarded as a sort of approximation to d?/span>, then each iteration of (31) doubles the number of bits of "accuracy" of the approximation.

It happens that, modulo 8, the multiplicative inverse of any (odd) number d is d itself. Thus, taking x0 = d is a reasonable and simple initial guess at d?/span>. Then, (31) will give values of x1, x2, ? such that

graphics/10icon179.gif

 

Thus, four iterations suffice to find the multiplicative inverse modulo 232 (if x 1 (mod 248) then x 1 (mod 2n) for n 48). This leads to the C program in Figure 10-5, in which all computations are done modulo 232.

Figure 10-5 Multiplicative inverse modulo 232 by Newton's method.
unsigned mulinv(unsigned d) {牋牋牋 // d must be odd. 
牋爑nsigned xn, t; 
 
牋 xn = d; 
loop: t = d*xn; 
牋牋牋if (t == 1) return xn; 
牋牋牋xn = xn*(2 - t); 
牋牋牋goto loop; 
} 

For about half the values of d, this program takes 4 1/2 iterations, or nine multiplications. For the other half (those for which the initial value of xn is "correct to 4 bits"梩hat is, d2 1 (mod 16)), it takes seven or fewer, usually seven, multiplications. Thus, it takes about eight multiplications on average.

A variation is to simply execute the loop four times, regardless of d, perhaps "strung out" to eliminate the loop control (eight multiplications). Another variation is to somehow make the initial estimate x0 "correct to 4 bits" (that is, find x0 that satisfies dx0 1 (mod 16)). Then, only three loop iterations are required. Some ways to set the initial estimate are

graphics/10icon180.gif

 

Here, the multiplication by 2 is a left shift, and the computations are done modulo 232 (ignoring overflow). Because the second formula uses a multiplication, it saves only one.

This concern about execution time is of course totally unimportant for the compiler application. For that application, the routine would be so seldom used that it should be coded for minimum space. But there may be applications in which it is desirable to compute the multiplicative inverse quickly.

Sample Multiplicative Inverses

We conclude this section with a listing of some multiplicative inverses in Table 10-3.

Table 10-3. Sample Multiplicative Inverses

d

d?/span>

(dec)

mod 16 (dec)

mod 232 (hex)

mod 264 (hex)

-7

-7

49249249

9249249249249249

-5

3

33333333

3333333333333333

-3

5

55555555

5555555555555555

-1

-1

FFFFFFFF

FFFFFFFFFFFFFFFF

1

1

1

1

3

11

AAAAAAAB

AAAAAAAAAAAAAAAB

5

13

CCCCCCCD

CCCCCCCCCCCCCCCD

7

7

B6DB6DB7

6DB6DB6DB6DB6DB7

9

9

38E38E39

8E38E38E38E38E39

11

3

BA2E8BA3

2E8BA2E8BA2E8BA3

13

5

C4EC4EC5

4EC4EC4EC4EC4EC5

15

15

EEEEEEEF

EEEEEEEEEEEEEEEF

25

 

C28F5C29

8F5C28F5C28F5C29

125

 

26E978D5

1CAC083126E978D5

625

 

3AFB7E91

D288CE703AFB7E91

You may notice that in several cases (d = 3, 5, 9, 11), the multiplicative inverse of d is the same as the magic number for unsigned division by d (see Section 10-14, "Sample Magic Numbers," on page 189). This is more or less a coincidence. It happens that for these numbers, the magic number M is equal to the multiplier m, and these are of the form (2p + 1)/d, with p 32. In this case, notice that

graphics/10icon181.gif

 

so that M d?/span> (mod 232).