Integer conversions too pedantic in 64-bit

Steven Schveighoffer schveiguy at yahoo.com
Tue Feb 15 12:57:20 PST 2011


On Tue, 15 Feb 2011 14:15:06 -0500, Rainer Schuetze <r.sagitario at gmx.de>  
wrote:

>
> I think David has raised a good point here that seems to have been lost  
> in the discussion about naming.
>
> Please note that the C type for a machine-word integer was usually
> called "int". The C standard only specifies a minimum bit size for the
> different types (see for example
> http://www.ericgiguere.com/articles/ansi-c-summary.html). Most current
> C++ implementations have identical "int" sizes, but "long" now differs
> between platforms. This approach has failed and has caused many
> headaches when porting software from one platform to another. D has
> recognized this and has explicitly defined the bit size of the various
> integer types. That's good!
>
> Now, with size_t the distinction between platforms creeps back into the
> language. It is everywhere across Phobos, be it as the length of ranges
> or the size of containers. This can get viral, as everything that
> touches these values might have to stick to size_t. Is this really
> desired?

Do you really want portable code?  The thing is, size_t is specifically
defined to be *the word size*, whereas C defines int only fuzzily
("at least 16 bits, and recommended to be equivalent to the natural size
of the machine").  size_t is *guaranteed* to be the same size on the
same platform, even across different compilers.

In addition, size_t isn't actually defined by the compiler; it's declared
in the runtime library (druntime's object.d).  So the library controls
the size of size_t, not the compiler.  This should make it extremely
portable.
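
For illustration, this is roughly how object.d declares it (a sketch;
the exact declaration has varied between druntime versions):

// size_t is just an alias for the type of a .sizeof expression,
// which matches the target's pointer width
alias typeof(int.sizeof) size_t;

// so this holds on any platform, with any compiler:
static assert(size_t.sizeof == (void*).sizeof);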

> Consider saving an array to disk, trying to read it on another platform.  
> How many bits should be written for the size of that array?

It depends on the protocol or file format definition.  It should be  
irrelevant what platform/architecture you are on.  Any format or protocol  
worth its salt will define what size integers you should store.

Then you need a protocol implementation that converts between the native  
size and the stored size.
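
For example, here's a sketch in D of reading and writing the length for
a hypothetical format that always stores it as 64 bits, whatever size_t
is on the host (byte order handling omitted; see below):

import std.stdio;

void writeLength(File f, size_t len)
{
    ulong stored = len;             // widen to the on-disk size
    f.rawWrite((&stored)[0 .. 1]);
}

size_t readLength(File f)
{
    ulong stored;
    f.rawRead((&stored)[0 .. 1]);
    assert(stored <= size_t.max);   // might not fit on a 32-bit host
    return cast(size_t) stored;
}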

This is just like network endianness vs. host endianness.  You always use  
htonl and ntohl even if your platform has the same endianness as the  
network, because you want your code to be portable.  Not using them is a  
no-no even if it works fine on your big-endian system.
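
A D analogue of htonl might look like this (a sketch; the function names
are made up, but core.bitop.bswap is real):

import core.bitop : bswap;

// convert a 32-bit value between host and network (big-endian) order;
// a no-op on big-endian hosts, a byte swap on little-endian ones
uint hostToNet(uint x)
{
    version (BigEndian)
        return x;
    else
        return bswap(x);
}

alias hostToNet netToHost;  // the swap is its own inverse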

> I don't have a perfect solution, but maybe built-in arrays could be
> limited to 2^^32-1 elements (or maybe 2^^31-1 to get rid of endless
> signed/unsigned conversions), so the normal type to be used would still
> be "int". Ranges should adopt the size types of the underlying objects.

No, this is too limiting.  If I have 64GB of memory (not out of the  
question), and I want to have a 5GB array, I think I should be allowed  
to.  This is one of the main reasons to go to 64-bit in the first place.
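
With a 64-bit size_t that works out of the box.  A sketch, assuming a
64-bit build and enough RAM:

enum ulong fiveGB = 5UL * 1024 * 1024 * 1024;
static assert(fiveGB > uint.max);   // too big for a 32-bit length

auto buf = new ubyte[](cast(size_t) fiveGB);
assert(buf.length == fiveGB);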

-Steve
