A strange div bug on Linux x86_64, (both dmd & ldc2): long -5000 / size_t 2 = 9223372036854773308
Adam D. Ruppe
destructionator at gmail.com
Fri Aug 14 13:05:28 UTC 2020
On Friday, 14 August 2020 at 09:32:58 UTC, FeepingCreature wrote:
> (*Why* is multiplication fine? I have no idea... but it works
> in spot testing.)
mul (32 bit * 32 bit) and imul (32 bit * 32 bit) give the same
result in the lower 32 bits; only the upper half of the result
differs.
The algorithm in binary looks kinda like a sign extension
followed by a series of shifts and adds:
https://en.wikipedia.org/wiki/Multiplication_algorithm#Binary_or_Peasant_multiplication
The signed product is the sum of left-shifted values after sign
extension. We know two's complement addition produces the same
bits regardless of how we interpret the sign after extension, and
left shifts only naturally start to become an issue in the high
word. Thus you get the same deal for a 32 bit result (int = int *
int), but you can expect differences in a 64 bit result (long =
int * int).
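The shift-and-add idea can be sketched in C like this; `shift_add_mul` is a made-up helper for illustration, not how the CPU literally implements mul/imul:

```c
#include <stdint.h>

// Shift-and-add ("peasant") multiplication on 8 bit values widened
// to 16 bits. sign_extend = 1 models imul-style extension,
// sign_extend = 0 models mul-style zero extension.
static uint16_t shift_add_mul(uint8_t a, uint8_t b, int sign_extend) {
    uint16_t lhs = sign_extend ? (uint16_t)(int16_t)(int8_t)a : (uint16_t)a;
    uint16_t rhs = sign_extend ? (uint16_t)(int16_t)(int8_t)b : (uint16_t)b;
    uint16_t sum = 0;
    while (lhs) {
        if (lhs & 1)   // low bit of lhs set: add the current rhs
            sum += rhs;
        lhs >>= 1;     // halve the lhs (shr)...
        rhs <<= 1;     // ...and double the rhs (shl)
    }
    return sum;       // wraps naturally mod 2^16
}
```

With 0xFE and 0x03 it gives 0xFFFA (-6) sign extended and 762 = 0x02FA zero extended: the same low byte 0xFA, different high bytes.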
Let's first demo it with a 4 bit example. Say -2 * 3.
-2 = 1110 (2 = 0010, then flip the bits + 1: 1101 + 1 = 1110)
3 = 0011
I'll do the long form in 8 bit, so first we need to sign extend
them, which is just duplicating the msb all the way left:
shr --- shl
11111110 * 00000011
-----------
01111111 * 00000110
00111111 * 00001100
00011111 * 00011000
00001111 * 00110000
00000111 * 01100000
00000011 * 11000000
00000001 * 10000000 (c)
Now we sum all the rhs values where the lhs low bit is set...
which happens to be every row below the line here (the original
lhs, 11111110, has a low bit of 0, so the unshifted rhs is
skipped):
11111010 (c)
That's obviously negative (msb set); flip the bits and add one to
get our result back out: -6. Whether we chop off or keep those
high bits makes no difference.
That was an imul since I sign extended. Now, let's do mul, the
unsigned one. Same deal except we just pad left with zero:
00001110 * 00000011 (unsigned would be 14 * 3)
-------------------
00000111 * 00000110
00000011 * 00001100
00000001 * 00011000
Sum: 00101010 (unsigned is 42)
Well, those lower bits look the same... 1010, though here we
interpret that as decimal 10 instead of -6. But they're the same
bits, so if the compiler cast back to signed we'd never know the
difference. Those upper bits, though... oh my, zeroes instead of
ones, a positive number.
With positive values, sign extension and zero extension are the
same thing - all zeroes. And since 0 * x = 0 for all x, it is
all discarded once it shifts into the target bits.
But with negative values, the sign extension gives us a bunch of
ones on the lhs to shift in. The rhs doesn't really care - it
gets a bunch of zeroes shifted in on the right, so it ignores
them. But those ones on the left change the high word: they keep
the rhs getting added in up there.
And if you want to do a test program with 32 bit numbers, of
course you will see this same result: the same result as int,
discarding those upper 32 bits, but a different result when
assigned to long or ulong, since now the initial sign extension
led to different values up there. But since C and D will both
happily discard those without an explicit cast, you might never
even know.
sorry if this was a bit wordy, if i had the time, i would edit it
down more