A strange div bug on Linux x86_64, (both dmd & ldc2): long -5000 / size_t 2 = 9223372036854773308

Adam D. Ruppe destructionator at gmail.com
Fri Aug 14 13:05:28 UTC 2020


On Friday, 14 August 2020 at 09:32:58 UTC, FeepingCreature wrote:
> (*Why* is multiplication fine? I have no idea... but it works 
> in spot testing.)

mul (32 bit * 32 bit) and imul (32 bit * 32 bit) give the same 
lower 32 bits of the product. Only the upper half shows the 
difference.

The algorithm in binary looks kinda like a sign extension 
followed by a series of shifts and adds: 
https://en.wikipedia.org/wiki/Multiplication_algorithm#Binary_or_Peasant_multiplication

The signed product is the sum of left-shifted values after sign 
extension. Two's complement addition gives the same low bits 
regardless of sign, and the left shifts only start to matter in 
the high word. So you get the same deal for a 32 bit result 
(int = int * int), but you can expect differences in a 64 bit 
result (long = int * int).

Let's first demo it with a 4 bit example. Say -2 * 3.

-2 = 1110 (2 = 0010, then flip the bits + 1: 1101 + 1 = 1110)
  3 = 0011

I'll do the long form in 8 bit, so first, we need to sign extend 
them, which is just duplicating the msb all the way left:

shr     ---     shl
11111110 * 00000011
-----------
01111111 * 00000110
00111111 * 00001100
00011111 * 00011000
00001111 * 00110000
00000111 * 01100000
00000011 * 11000000
00000001 * 10000000 (c)

Now we sum the rhs values for each row where the lhs low bit is 
set... which happens to be all of the shifted rows here:


11111010 (c)

That's obviously negative (high bit set); flip the bits and add 
one and we get 6 back out, so the result is -6. Whether we chop 
off or keep those high bits, no difference.

That was an imul since I sign extended. Now, let's do mul, the 
unsigned one. Same deal except we just pad left with zero:

00001110 * 00000011 (unsigned would be 14 * 3)
-------------------
00000111 * 00000110
00000011 * 00001100
00000001 * 00011000

Sum:       00101010 (unsigned is 42)


Well, those lower bits look the same... 1010, though here we 
interpret them as decimal 10 instead of -6. Same bits, though, so 
if the compiler cast back to signed we'd never know the 
difference. But those upper bits... oh my, zeroes instead of 
ones: a positive number.

With positive values, sign extension and zero extension are the 
same thing - all zeroes. And since 0 * x = 0 for all x, those 
zeroes contribute nothing even once they shift up into the high 
word.

But with negative values, the sign extension gives the lhs a 
bunch of ones to shift in. The rhs doesn't really care - it just 
gets zeroes shifted in on the right. But those ones on the lhs 
keep its low bit set for more iterations, so extra shifted rhs 
values get added into the high word.


And if you wanna do a test program with 32 bit numbers, of course 
you will see this same result: the same lower 32 bits as int, but 
a different value when assigned to long or ulong, since the 
initial sign extension led to different values up there. And 
since C and D will both happily discard those high bits without 
an explicit cast, you might never even know.


sorry if this was a bit wordy, if i had the time, i would edit it 
down more

