Comparing D vs C++ (weird behaviour of C++)

Tue Jul 24 20:59:22 UTC 2018

On Tuesday, 24 July 2018 at 19:24:05 UTC, Ecstatic Coder wrote:
> On Tuesday, 24 July 2018 at 15:08:35 UTC, Patrick Schluter 
> wrote:
>> On Tuesday, 24 July 2018 at 14:41:17 UTC, Ecstatic Coder wrote:
>>> On Tuesday, 24 July 2018 at 14:08:26 UTC, Daniel Kozak wrote:
>>>> I am not a C++ expert, so this seems weird to me:
>>>>
>>>> #include <iostream>
>>>> #include <string>
>>>>
>>>> using namespace std;
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>> 	char c = 0xFF;
>>>> 	std::string sData = {c,c,c,c};
>>>> 	unsigned int i = (((((sData[0]&0xFF)*256
>>>> 					+ (sData[1]&0xFF))*256)
>>>> 					+ (sData[2]&0xFF))*256
>>>> 					+ (sData[3]&0xFF));
>>>> 					
>>>> 	if (i != 0xFFFFFFFF) { // it is true why?
>>>> 		// this print 18446744073709551615 wow
>>>> 		std::cout << "WTF: " << i  << std::endl;
>>>> 	}	    	
>>>> 	return 0;
>>>> }
>>>>
>>>> compiled with:
>>>> g++ -O2 -Wall  -o "test" "test.cxx"
>>>> when compiled with -O0 it works as expected
>>>>
>>>> Vs. D:
>>>>
>>>> import std.stdio;
>>>>
>>>> void main(string[] args)
>>>> {
>>>> 	char c = 0xFF;
>>>> 	string sData = [c,c,c,c];
>>>> 	uint i = (((((sData[0]&0xFF)*256
>>>> 					+ (sData[1]&0xFF))*256)
>>>> 					+ (sData[2]&0xFF))*256
>>>> 					+ (sData[3]&0xFF));
>>>> 	if (i != 0xFFFFFFFF) { // is false - make sense
>>>> 		writefln("WTF: %d", i);
>>>> 	}			
>>>> }
>>>>
>>>> compiled with:
>>>> dmd -release -inline -boundscheck=off -w -of"test" "test.d"
>>>>
>>>> So is it a code gen bug on the C++ side, or is there 
>>>> something wrong with that code?
>>>
>>> As the C++ char are signed by default, when you accumulate 
>>> several shifted 8 bit -1 into a char result and then store it 
>>> in a 64 bit unsigned buffer, you get -1 in 64 bits : 
>>> 18446744073709551615.
>>
>> That's not exactly what happens here. There's no 64 bit buffer.
>
> Sure about that ? ;)

Yes, there are no "buffers", only registers and a place on the 
stack for the variable i.

As said, it's undefined behaviour, so anything goes. I just checked 
on godbolt what code is generated: https://godbolt.org/g/wxqfmM
So with -O0 this happens:
Lines 41 to 77 hold the instructions that perform the calculation.
At line 78, mov DWORD PTR [rbp-40], eax writes the 32-bit result 
to the stack slot reserved for i.
At line 85, mov eax, DWORD PTR [rbp-40] reloads that value into 
eax. A write to a 32-bit register zeroes the upper 32 bits of 
RAX, so RAX contains 0x0000_0000_FFFF_FFFF.

In the -O2 version it's even simpler. The calculation is done at 
compile time and the end result, -1, is passed directly to the 
output. The test is removed as well. Everything happens in the 
compiler.