size_t + ptrdiff_t

Sun Feb 19 11:21:04 PST 2012

On 02/19/2012 07:27 PM, Manu wrote:
> On 19 February 2012 20:07, Timon Gehr <timon.gehr at gmx.ch> wrote:
>
>     On 02/19/2012 03:59 PM, Manu wrote:
>
>         Okay, so it came up a couple of times, but the question is,
>         what are we going to do about it?
>
>         size_t and ptrdiff_t are incomplete, and represent
>         non-complementary signed/unsigned halves of the requirement.
>         There are TWO types needed: register size and pointer size.
>         Currently, these are assumed to be the same, which is a false
>         assumption.
>
>         I propose size_t + ssize_t should both exist, and represent the
>         native integer size. Something like ptr_t and ptrdiff_t should
>         also exist, and represent the size of the pointer.
>
>         Personally, I don't like the _t notation at all. It doesn't fit
>         the rest of the D types, but it's established, so I don't expect
>         it can change. But we do need the 2 missing types.
>
>         There is also the problem that there is lots of code written
>         using the incorrect types. Some time needs to be taken to correct
>         Phobos too, I guess.
>
>
>     Currently, size_t is defined to be what you call ptr_t, ptrdiff_t is
>     present, and what you call size_t/ssize_t does not exist. Under
>     which circumstances is it important to have a distinct type that
>     denotes the register size? What kind of code requires such a type?
>     It is unportable.
>
>

Note that I agree that getting the terminology straight would be an 
overall improvement.

> It is just as unportable as size_t itself.

Currently, size_t is typeof(array.length). This is portable, and is 
basically the only place size_t commonly occurs in D code.
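
To illustrate, here is a minimal sketch of that usage. The function name 
`sum` is hypothetical, and the static asserts assume a target where size_t 
and pointers have the same width, which is the case on the platforms D 
currently supports as far as I know.

```d
import std.stdio : writeln;

// Sum the elements by index; the index has the same type as .length.
int sum(const int[] values)
{
    int total = 0;
    foreach (size_t i; 0 .. values.length)
        total += values[i];
    return total;
}

void main()
{
    int[] array = [1, 2, 3];

    // The length of an array is a size_t, whatever its width on this target.
    static assert(is(typeof(array.length) == size_t));

    // Assumption: size_t matches the pointer width, and ptrdiff_t is its
    // signed counterpart of the same size.
    static assert(size_t.sizeof == (void*).sizeof);
    static assert(ptrdiff_t.sizeof == size_t.sizeof);

    writeln(sum(array));
}
```

Code written this way never needs to know how wide size_t actually is.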

> The reason you need it is to improve portability, otherwise people need to create arbitrary
> version mess, which will inevitably be incorrect.
> Anything from calling convention code, structure layout/packing, copying
> memory, basically optimising for 64bits at all... I can imagine static
> branches on the width of that type to select different paths.

That is not a convincing use case. Within each static branch you already 
know exactly what the width is.
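
A rough sketch of the point: once a static branch has been selected, a 
fixed-size type can be named directly inside it. The alias `Word` is 
hypothetical, and the width of size_t merely stands in here for whatever 
condition the branch would really test.

```d
import std.stdio : writeln;

static if (size_t.sizeof == 8)
{
    // Inside this branch the width is known to be 64 bits,
    // so the fixed-size type can simply be written out.
    alias Word = ulong;
}
else static if (size_t.sizeof == 4)
{
    // Likewise, here the width is known to be 32 bits.
    alias Word = uint;
}
else
{
    static assert(false, "this sketch only handles 32- and 64-bit targets");
}

void main()
{
    writeln(Word.stringof, " is ", Word.sizeof * 8, " bits wide");
}
```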

> Even just basic efficiency, using 32bit ints on many 64bit machines
> requires extra sign-extend opcodes after every single load... total waste
> of cpu time.
>

Using 64bit ints everywhere to represent 32bit ints won't make your 
program go faster. Cache lines fill up faster when the data contains 
large amounts of unnecessary padding. Furthermore, the compiler should 
be able to eliminate unneeded sign-extend operations. Anyway, extra 
sign-extend opcodes are not worth caring about if you get up to twice 
the number of conflict cache misses.
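
The space cost is easy to make concrete. In the sketch below the struct 
names are hypothetical, and the 64-byte cache line mentioned in the 
comments is an assumption about typical hardware, not something D 
guarantees.

```d
import std.stdio : writeln;

// Sixteen 32-bit values: 64 bytes, i.e. one cache line on hardware
// with 64-byte lines (an assumption about the target).
struct Narrow
{
    int[16] values;
}

// The same sixteen values stored as 64-bit integers: 128 bytes,
// so the same data now spans twice as much memory.
struct Wide
{
    long[16] values;
}

static assert(Narrow.sizeof == 64);
static assert(Wide.sizeof == 128);

void main()
{
    writeln("Narrow: ", Narrow.sizeof, " bytes, Wide: ", Wide.sizeof, " bytes");
}
```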

> Currently, if you're running a 64bit system with 32bit pointers, there
> is absolutely nothing that exists at compile time to tell you you're
> running a 64bit system,

Isn't there some version identifier for this? If there is not, one could 
be introduced trivially, and that must be done.
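
As a sketch of what such a check could look like: X86_64 and D_LP64 are 
version identifiers that the compiler predefines as far as I know, but 
whether they cover every target with 64-bit registers and 32-bit pointers 
is an assumption. The enum names are hypothetical.

```d
import std.stdio : writeln;

// D_LP64 is set when pointers are 64 bits wide.
version (D_LP64)
{
    enum pointerBits = 64;
}
else
{
    enum pointerBits = 32; // assumption: only 32- and 64-bit targets matter here
}

// X86_64 describes the CPU architecture rather than the pointer width.
version (X86_64)
{
    enum wideRegisters = true;
}
else
{
    enum wideRegisters = false; // assumption: other 64-bit CPUs are ignored in this sketch
}

void main()
{
    writeln("pointer bits: ", pointerBits,
            ", 64-bit x86 registers: ", wideRegisters);
}
```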

> or to declare a variable of the machine's native
> type, which you're crazy if you say is not important information.

What do you do with the machine's native type other than checking its 
size in a static if declaration? If you don't, then the code is 
unportable, and using the proper fixed size types would make it 
portable. If you do, then you could have checked a built-in version 
instead. What you effectively want for optimization is the most 
efficient type that is at least a certain number of bits wide. And even 
then, it is a moot point, because storing such variables in memory will 
add unnecessary padding to your data structures.
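
For comparison, the C99 "fast" integer types express exactly this idea, 
and druntime exposes them in core.stdc.stdint. The sketch below (the 
function name `sumFast` is hypothetical) uses such a type only for a local 
accumulator, while the data in memory keeps its fixed 32-bit layout. 
Whether the wider local is actually faster depends on the target and is 
not measured here.

```d
import core.stdc.stdint : int_fast32_t;
import std.stdio : writeln;

// The "fast" type is at least 32 bits wide, possibly more.
static assert(int_fast32_t.sizeof >= 4);

int sumFast(const int[] values)
{
    // Local accumulator: lives in a register, so its width
    // adds no padding to any data structure.
    int_fast32_t total = 0;
    foreach (v; values)
        total += v;

    // The stored result goes back to a fixed 32-bit type.
    return cast(int) total;
}

void main()
{
    writeln(sumFast([10, 20, 30]));
}
```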

> What's the point of a 64bit machine, if you treat it exactly like a 32bit
> machine in every aspect?

There is none.