char[] annoyance...

Tue Apr 11 02:45:58 PDT 2006

Regan Heath wrote:
> On Mon, 10 Apr 2006 16:28:00 +0000 (UTC), Kevin Bealer 
> <Kevin_member at pathlink.com> wrote:
>> In article <ops7rncsol23k2f5 at nrage.netwin.co.nz>, Regan Heath says...
>>>
>>> Take this code:
>>>
>>> void main()
>>> {
>>>     //..open a file, read line for line, on each line:
>>>
>>>     for(int i = 0; i < line.length-2; i++) {
>>>         if (line[i..i+2] != "||") continue;
>>>         //..etc..
>>>     }
>>> }
>>>
>>>
>>> There is a subtle bug. On all lines with a length of 0 or 1 it will give
>>> the following error:
>>>
>>> Error: ArrayBoundsError line_length.d(6)
>>>
>>>
>>> The problem is of course the statement "i < line.length-2". 
>>> line.length is
>>> unsigned, and when you - 2 from an unsigned value.. well lets just say
>>> that it's bigger than the actual length of the line - 2.
>>>
>>>
>>> Of course there are plently of other ways to code this, perhaps using
>>> foreach, but that's not the point. The point is that this code  _can_ be
>>> written and on the surface looks fine. Not even -w (warnings) spots the
>>> signed/unsigned problem. At the very least can we get a warning for 
>>> this?
>>>
>>> Regan
>>
>> I see the "gotcha" here, and whether conversions should be done is an
>> interesting question.  But I wanted to propose a slightly different 
>> solution.
>>
>>> for(int i = 0; i+2 < line.length; i++) {
>>
>> Since you are reference "i+2" in the expression, that is really the 
>> value what you need to be in the [0..length) range.
> 
> You're probably right.. now, if only I could train my brain to think of 
> it that way round :)
> 
>> More generally, always do the arithmetic with the signed variables in 
>> cases like this if there is a possibility of wraparound.
> 
> That's good advice, however you have to realise there is a possibility 
> of wraparound, something I didn't do (or I wouldn't have had the problem 
> in the first place) :(
> 
>> But if the automatic conversions were specified to do uint->int that 
>> would also be fine with me.
> 
> But wouldn't that introduce a bug here:
>   int a = int.max-1;
>   uint b = int.max+1;
>   assert(a < b);
> 
> wouldn't b be promoted to a signed value of <some negative number>?
> 
>> One can also imagine a language with the following inequalities:
>>
>> +>  unsigned greater-than
>> +<  unsigned less than
>> ->  signed greater than
>> -<  signed less than
>> +>= unsigned greater-or-equal
>> etc
>>
>> This would not replace the current >, <, but would essentially just be 
>> shorthand for existing expressions:
>>
>> (A +> B) is the same as (cast(uint)A > cast(uint)B)
>>
>> .. and so on, except that uint, ulong, ushort, etc would be chosen as 
>> needed.
>>
>> Is this worth doing?  Maybe - it's not a big deal to me, but the 
>> signed/unsigned question does represent a common gotcha in certain 
>> expressions.
> 
> It's a good idea, but, same problem as before; you have to realise 
> length is signed and you have to realise there is a chance for 
> wraparound. A warning about the signed/unsigned comparrison is the very 
> least we should do. I would be tempted to even make it an outright error 
> requiring a cast (or one of these new operators above) to handle.
> 
> Regan

Wouldn't it be somewhat the best, if signed/unsigned comparison actually 
worked correctly in mathematical sense?

int a;
uint b;

(a<b)  becomes (a<0 || cast(uint)a < b)
(a==b) becomes (a>=0 && cast(uint)a == b)
(a>b)  becomes (a>=0 && cast(uint)a > b)

Unfortunately, that doesn't solve the original problem

if (a < b.length-2)

What would solve it is making .length signed. Even though it should 
puristically be unsigned, and losing one bit could theoretically be a 
problem (because you couldn't allocate new byte[more than 2GB], the 
reality is that a 32-bit app can't allocate more than 2GB of RAM anyway 
(in Windows at least), and for 64-bit apps, losing that one bit is 
totally insignificant, unless someone expects a computer with 9 exabytes 
of ram anytime soon :)

And the only place really needing change (and a trivial one) would be 
the array resizing code..

xs0