Why is size_t unsigned?

H. S. Teoh hsteoh at quickfur.ath.cx
Sun Jul 21 23:07:19 PDT 2013


On Mon, Jul 22, 2013 at 06:43:47AM +0200, JS wrote:
> On Monday, 22 July 2013 at 04:31:12 UTC, H. S. Teoh wrote:
> >On Mon, Jul 22, 2013 at 05:47:34AM +0200, JS wrote:
[...]
> >>There seems to be no real good reason why size_t is unsigned...
> >[...]
> >
> >The reason is because it must span the range of CPU-addressable
> >memory addresses. Note that due to way virtual memory works, that may
> >have nothing to do with the actual size of your data (e.g. on Linux,
> >it's possible to allocate more memory than you actually have, as long
> >as you don't actually use it all -- the kernel simply maps the
> >addresses in your page tables into a single zeroed-out page, and
> >marks it as copy-on-write, so you can actually have an array bigger
> >than available memory as long as most of the elements are binary
> >zeroes (though I don't know if druntime currently actually supports
> >such a thing)).
> >
> >
> >T
> 
> but a size has nothing to do with an address.

Size is the absolute difference between two addresses.  So it must be
able to represent up to diff(0, maxAddress).

Besides, the whole thing about size being unsigned is because negative
size makes no sense.

Basically, you have to know that size_t is unsigned, and so you should
be aware of the pitfalls of underflow.


> Sure in x86 we may need to allocate 3GB of data and this would require
> size_t > 2^31 ==> it must be unsigned. But strings really don't need
> to have an unsigned length. If you really need a string of length >
> size_t/2 then have the string type implement a different length
> property.

It would add too much complication to have some types use unsigned size
and others use signed size.


[...]
> this way, for 99.99999999% of the cases where strings are actually <
> 1/2 size_t, one doesn't have to waste cycles doing extra comparing
> or typing extra code... or better, spending hours looking for some
> obscure bug because one compared an int to a uint and no warning was
> thrown.

The real issue here is not whether size_t is signed or unsigned, but the
implicit conversion between them.  This, arguably, is a flaw in the
language design.  Bearophile has been clamoring for a long time about
not allowing implicit signed/unsigned conversion. If you search in
bugzilla you should find the issues he filed for this. :)

Once implicit conversion between signed/unsigned is removed, the root
problem disappears -- mistakes like (i < array.length-1) where i is an
int will cause a compile error (comparing signed with unsigned). In the
cases where you actually want wraparound behaviour, an explicit cast
will be required, which is self-documenting and makes the programmer
aware of the potential pitfalls.


> Alternatively,
> 
> for(int i = 0; i < s.length - 1; i++) could at least check for
> underflow on the cmp and break the loop.

If you're bent on subtracting array lengths, do this:

	assert(s.length <= int.max);
	int len = cast(int)s.length;
	for (int i = 0; i < len - 1; i++) {
		...
	}

The optimizer should be able to reduce len to the same code it generates
when you write s.length directly in the loop condition. The cast incurs
no runtime penalty, because the 2's complement representations of signed
and unsigned numbers are identical when the values concerned are
non-negative.

This way, you make the intent of the code clear, and force it to fail if
your assumptions didn't hold. Self-documenting code is always a good
thing.


T

-- 
Live a century, learn a century; you will still die a fool. (Russian proverb)


More information about the Digitalmars-d-learn mailing list