Is there a formula for overflow?

Wed Nov 30 17:49:26 UTC 2022

On Wed, Nov 30, 2022 at 03:07:44AM +0000, thebluepandabear via Digitalmars-d-learn wrote:
> I am reading through Ali's book about D, and he gives the following
> examples for an overflow:
[...]
> The result overflows and is 1705032704.
> 
> Also for the second example, it overflows and comes up with the value
> of 4294967286:
[...]
> Also a fun one, the following produces 4294967295:
[...]
> But the book doesn't talk about why the D compiler came up with these
> results (and it also goes for division/multiplication) for the
> overflow (or maybe it did further down ?), as in it didn't talk about
> the formula it used for calculating this value.
> 
> I am curious as to what formula the D compiler uses for calculating
> 'overflowed' values, if such thing exists? :)
[...]

First, you have to understand that it's not the D compiler that imposes
some arbitrary maximum after which an integer value will overflow. To
overflow rules are more-or-less a description of how the hardware
behaves under the hood, rather than something the compiler deliberately
imposes.

The value 4294967296 is actually 2^32. Why 2^32? Because that's the
number of distinct values a 32-bit machine register can hold. In fact,
the maximum value that a 32-bit register can hold is (2^32 - 1), i.e.,
4294967295, because 0 is one of the values represented.  Now, (2^32 - 1)
is the maximum for uint, but for int, one bit is reserved for indicating
the sign of the value, so the actual maximum is (2^31 - 1), i.e.,
2147483647, which, incidentally, is the value of int.max.

As for the actual overflow behaviour, it's a simple consequence of the
2's complement representation of integers. I.e., -1 is represented not
as:
	0b1000_0000_0000_0000__0000_0000_0000_0001

but rather as:

	0b1111_1111_1111_1111__1111_1111_1111_1111 (i.e. 0xFFFF_FFFF)

The most negative value that can be represented in 32-bit 2's complement
is:

	0b1000_0000_0000_0000__0000_0000_0000_0000 (0x8000_0000)

which is -2^31 == -2147483648, which, incidentally, is int.min.

Why 2's complement?  Well, because that's what the machine implements...
but why did the engineers choose to implement it that way?  Because 2's
complement representation has the interesting property that addition and
subtraction can be done exactly as if the values were unsigned, and
you'd get the correct results back when you reinterpret them as 2's
complement.  I.e., if you add 1 to 0xFFFF_FFFF, interpreted as an
unsigned integer, you get 0x0000_0000 (there's a leading 1 on the far
left but it's in the 33rd bit, which doesn't fit in the machine
register, so it gets discarded).  If you reinterpret 0xFFFF_FFFF in 2's
complement as -1, then you have the nice result that (-1) + 1 == 0 (the
+ here is unsigned addition).

One consequence of this is that negation is implemented as:

	-x == ~(x-1)

As a consequence of the 2's-complement representation of integers, and
the way arithmetic operations are implemented, the overflow rules that
you read about just fall out of the rules as natural consequences:

	0x7FFF_FFFF + 1 == 0x8000_0000

i.e., interpreted as 2's complement:

	2147483647 + 1 == -2147483648

Negating an unsigned number is equivalent to doing ~(x-1), so:

	cast(uint) -1 == 0xFFFF_FFFF == 2^32 - 1 == 4294967295

Adding 3_000_000_000 to 3_000_000_000 (in hexadecimal, that's
0xB2D05E00) gives you:

	0xB2D05E00 + 0xB2D05E00 == 1_65A0BC00

But that leading 1 is in the 33rd bit, which doesn't fit into a 32-bit
register (i.e., it overflows). If you discard it, you get 0x65A0BC00,
which in decimal is 1705032704.

So you see, none of this is D's own idiosyncratic rules; it's merely a
reflection of how the machine implements 32-bit integer values.
(Analogous results can be said for 64-bit values.)

T

-- 
If you want to solve a problem, you need to address its root cause, not just its symptoms. Otherwise it's like treating cancer with Tylenol...