Floating-Point Numbers Aren't Real
To illustrate, assign 2147483647 (the largest signed 32-bit integer) to a 32-bit float variable (x, say), and print it. You'll see 2147483648. Now print x - 64. Still 2147483648. Now print x - 65
and you’ll get 2147483520! Why? Because the spacing between adjacent floats in that range is 128, and floating-point operations round to the nearest floating-point number.
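Here's a minimal C sketch of the experiment (each result is stored back into a float variable, since some compilers otherwise evaluate float expressions in higher precision):

    #include <stdio.h>

    int main(void) {
        float x = 2147483647.0f; /* rounds to 2147483648, the nearest float */
        float a = x - 64;        /* assignment forces rounding back to float */
        float b = x - 65;
        printf("x      = %.0f\n", x); /* 2147483648 */
        printf("x - 64 = %.0f\n", a); /* still 2147483648 */
        printf("x - 65 = %.0f\n", b); /* 2147483520 */
        return 0;
    }

Note that x - 64 lands exactly halfway between the two neighboring floats, and round-to-nearest-even sends it back up to 2147483648, while x - 65 falls past the midpoint and rounds down.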
Knowing the spacing in the neighborhood of a floating-point number can help you avoid classic numerical blunders. For example, if you’re performing an iterative calculation, such as searching for the root of an equation, there’s no sense in asking for greater precision than the number system can give in the neighborhood of the answer. Make sure that the tolerance you request is no smaller than the spacing there; otherwise you’ll loop forever.
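One way to find that spacing, sketched here in C using the standard nextafterf function (the value 2.0e9f is just an illustrative candidate root):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Spacing between adjacent floats near a candidate root. */
        float root = 2.0e9f;
        float spacing = nextafterf(root, INFINITY) - root;
        printf("spacing near %g is %g\n", root, spacing); /* 128 here */

        /* A convergence test like |x_new - x_old| < tol can never
           succeed if tol is smaller than this spacing. */
        float tol = 1.0e-6f;
        if (tol < spacing)
            tol = spacing; /* the finest tolerance the format can resolve */
        printf("usable tolerance: %g\n", tol);
        return 0;
    }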
Smearing can occur in even more subtle ways. Suppose a library naively computes e^x by the formula 1 + x + x^2/2! + x^3/3! + …. This works fine for positive x, but consider what happens when x is a large negative number. The even-powered terms result in large positive numbers, and subtracting the magnitudes of the odd-powered terms will not even affect the result. The problem here is that the roundoff in the large, positive terms is in a digit position of much greater significance than the true answer. The answer diverges toward positive infinity! The solution here is also simple: for negative x, compute e^x = 1/e^|x|.
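To see the failure and the fix side by side, here is a C sketch; exp_naive is a hypothetical stand-in for such a library routine, and 60 terms are more than enough for the series to settle in single precision:

    #include <math.h>
    #include <stdio.h>

    /* Naive Taylor series: e^x = 1 + x + x^2/2! + x^3/3! + ... */
    static float exp_naive(float x) {
        float term = 1.0f, sum = 1.0f;
        for (int n = 1; n <= 60; ++n) {
            term *= x / n; /* next term: x^n / n! */
            sum += term;
        }
        return sum;
    }

    int main(void) {
        float x = -20.0f;
        printf("naive series:  %g\n", exp_naive(x));         /* wildly wrong */
        printf("1/e^|x| trick: %g\n", 1.0f / exp_naive(-x)); /* close to the truth */
        printf("library expf:  %g\n", expf(x));              /* ~2.06115e-09 */
        return 0;
    }

For x = -20 the largest terms are on the order of 10^7, so single-precision roundoff in them is on the order of 1, utterly swamping the true answer of about 2×10^-9; summing the series for +20 and taking the reciprocal sidesteps the cancellation entirely.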