Signed word lengths and indexes

Mon Jun 14 15:43:32 PDT 2010

On 14/06/2010 21:52, bearophile wrote:
> I have found a Reddit discussion a few days old:
> http://www.reddit.com/r/programming/comments/cdwz5/the_perils_of_unsigned_iteration_in_cc/
>
> It contains this, which I quote (I have no idea if it's true), plus
> follow-ups:
>
>> At Google using uints of all kinds for anything other than bitmasks
>> or other inherently bit-y, non computable things is strongly
>> discouraged. This includes things like array sizes, and the
>> warnings for conversion of size_t to int are disabled. I think it's
>> a good call.<
>
> I have expressed similar ideas here:
> http://d.puremagic.com/issues/show_bug.cgi?id=3843
>
> Unless someone explains to me why I am wrong, I will keep thinking that
> using unsigned words to represent lengths and indexes, as D does, is
> wrong and unsafe, and using signed words (I think C# uses ints for
> that purpose) in D is a better design choice.

Well, for a start, with signed lengths you lose half your addressable memory.
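
To put a number on that, here is a tiny C sketch assuming a 32-bit word (the function names are made up for illustration):

```c
#include <stdint.h>

/* Largest length a 32-bit signed word can hold. */
uint64_t max_length_signed32(void) { return (uint64_t)INT32_MAX; }

/* Largest length a 32-bit unsigned word can hold. */
uint64_t max_length_unsigned32(void) { return (uint64_t)UINT32_MAX; }
```

The unsigned maximum is twice the signed one plus one, so a signed length can describe only about half of a 32-bit address space.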

Unsigned numbers are only a problem if you don't understand how they
work, but that goes for just about everything else as well.

Personally I hate the use of signed numbers as array indices; it's
moronic and demonstrates the writer's lack of understanding. It's very
rare to actually want to index an array with a negative number.
The last time I did that was years ago when writing in assembler, and
that was an optimisation hack to squeeze maximum performance out of my
code.

cf.

// signed index: both bounds have to be checked
Item getItem(int indx) {
   if(indx >= 0 && indx < _arr.length)
     return _arr[indx];
   throw new Error(...);
}

vs.

// cleaner, no?
Item getItem(uint indx) {
   if(indx < _arr.length)
     return _arr[indx];
   throw new Error(...);
}
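
The single comparison is enough because a negative value, once converted to unsigned, becomes a very large number that fails the upper-bound test. A minimal sketch in C; in_bounds_signed and in_bounds_unsigned are names made up for illustration, not part of any library:

```c
#include <stddef.h>

/* Signed index: both ends of the range must be tested. */
int in_bounds_signed(int indx, size_t length) {
    return indx >= 0 && (size_t)indx < length;
}

/* Unsigned index: a would-be negative value has already wrapped to a
   very large number, so one comparison covers both ends. */
int in_bounds_unsigned(unsigned indx, size_t length) {
    return indx < length;
}
```

Passing (unsigned)-1 to the second function hands it UINT_MAX, which is rejected unless the array really is that long.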

And backwards iteration:

for(int i = end - 1; i >= 0; --i)
   ...

vs.

// ends when --i wraps 0 round to uint.max, which fails i < length
for(uint i = end - 1; i < length; --i)
   ...

OK, about the same, but I find the second clearer: the
i < length clearly indicates iteration over the whole array.
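
For anyone who wants to try the unsigned form, here is a small C sketch. copy_backwards is a hypothetical helper, and it assumes dst has room for length elements:

```c
#include <stddef.h>

/* Copies src into dst in reverse order and returns the number of
   elements copied. When i is 0, --i wraps to SIZE_MAX, which fails
   i < length and ends the loop. For length == 0 the start value
   length - 1 has already wrapped, so the body never runs. */
size_t copy_backwards(const int *src, int *dst, size_t length) {
    size_t n = 0;
    for (size_t i = length - 1; i < length; --i)
        dst[n++] = src[i];
    return n;
}
```

The same condition handles both the normal exit and the empty array, which is why no separate i >= 0 test is needed.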

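And that second wrong bit of code on the blog is only rescued by
signed numbers if every operand is signed:

int len = strlen(some_c_str);
// say some_c_str is empty so len = 0
int i;
for (i = 0; i < len - 1; ++i) {
   // len - 1 == -1 and 0 < -1 is false,
   // so the body never runs
}

Write the condition as i < strlen(some_c_str) - 1 instead and the
subtraction is done in unsigned arithmetic, the int i is converted to
unsigned for the comparison, and the loop runs far past the end of
the string again.

Using 'int's doesn't magically fix it. Wrong code is just wrong.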
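
Here is a small C sketch for checking how a bound of len - 1 behaves with a signed and with an unsigned length. The function names and the cap parameter are made up for illustration; the cap is only there so that the unsigned case terminates:

```c
#include <stddef.h>

/* Counts how often the body of `for (i = 0; i < len - 1; ++i)` runs
   with a signed length, stopping early once `cap` is reached. */
size_t iterations_signed(int len, size_t cap) {
    size_t n = 0;
    for (int i = 0; i < len - 1 && n < cap; ++i)
        ++n;
    return n;
}

/* The same loop with an unsigned length: for len == 0 the bound
   len - 1 wraps to SIZE_MAX. */
size_t iterations_unsigned(size_t len, size_t cap) {
    size_t n = 0;
    for (size_t i = 0; i < len - 1 && n < cap; ++i)
        ++n;
    return n;
}
```

For an empty string the signed version runs zero times and the unsigned version runs until it hits the cap; for any non-empty string the two agree.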

I do think that allowing uncast assignments between signed and unsigned
is a problem though; that's where most of the bugs I've come across
crop up. I think D should simply disallow implicit mixing of
signedness.
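
To illustrate the kind of bug I mean, here it is in C, which performs the same sort of implicit conversion. less_mixed and less_checked are hypothetical names for this sketch:

```c
/* Mixed comparison as C performs it implicitly: the int is converted
   to unsigned first, so -1 becomes UINT_MAX. */
int less_mixed(int a, unsigned b) {
    return a < b;
}

/* The same comparison with the sign handled explicitly. */
int less_checked(int a, unsigned b) {
    return a < 0 || (unsigned)a < b;
}
```

less_mixed(-1, 1u) reports that -1 is not less than 1, while less_checked(-1, 1u) gives the answer most people expect. Requiring an explicit cast at least makes the author look at the conversion.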

Hasn't that been discussed before? (I'm not referring to the recent post 
in d.learn) It seems familiar.

-- 
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk