char[] annoyance...

Sun Apr 9 20:36:18 PDT 2006

On Mon, 10 Apr 2006 12:17:30 +1000, Derek Parnell <derek at psych.ward> wrote:
> On Mon, 10 Apr 2006 14:04:35 +1200, Regan Heath wrote:
>
>> On Mon, 10 Apr 2006 11:58:09 +1000, Derek Parnell <derek at psych.ward> wrote:
>>> On Mon, 10 Apr 2006 12:23:06 +1200, Regan Heath wrote:
>>>
>>>> Take this code:
>>>>
>>>> void main()
>>>> {
>>>> 	//..open a file, read line for line, on each line:
>>>>
>>>> 	for(int i = 0; i < line.length-2; i++) {
>>>> 		if (line[i..i+2] != "||") continue;
>>>> 		//..etc..
>>>> 	}
>>>> }
>>>>
>>>> There is a subtle bug. On all lines with a length of 0 or 1 it will give
>>>> the following error:
>>>>
>>>> Error: ArrayBoundsError line_length.d(6)
>>>>
>>>> The problem is of course the statement "i < line.length-2". line.length
>>>> is unsigned, and when you subtract 2 from an unsigned value.. well, let's
>>>> just say that it's bigger than the actual length of the line - 2.
>>>>
>>>> Of course there are plenty of other ways to code this, perhaps using
>>>> foreach, but that's not the point. The point is that this code _can_ be
>>>> written and on the surface looks fine. Not even -w (warnings) spots the
>>>> signed/unsigned problem. At the very least can we get a warning for
>>>> this?
>>>
>>> I too have tripped up on this 'bug' and it is very annoying and
>>> surprising.
>>>
>>> However your approach to the problem might need to change... as you
>>> state - "On all lines with a length of 0 or 1 it will give the following
>>> error..." - and this is because the test that you are performing is only
>>> applicable to lines with two or more characters in them ... so make that
>>> condition a part of the algorithm ...
>>>
>>>   if (line.length >= 2) {
>>>     for(int i = 0; i < line.length-2; i++) {
>>>       if (line[i..i+2] != "||") continue;
>>>       //..etc..
>>>     }
>>>   }
>>>   else {
>>>      // do something for short lines...
>>>   }
>>
>> Thanks Derek, that's exactly what I did.
>
> Of course; that was obvious. No put-down was implied ;-)

and none was taken. :)

>> The point of this post isn't to get help with one specific problem but
>> rather to ask whether there is a solution to the underlying problem
>> behaviour. I suspect this behaviour will be the source of many bugs to
>> come and I wonder if there is a way to avoid them.
>
> The point of my reply was that the "solution to the underlying problem
> behaviour" is to properly express (and thus document) the algorithm  
> rather than rely on side-effects of the implemented language.

You're right, in that this bug is caused by a side-effect of the language.  
I disagree that the algorithm is/was incompletely/incorrectly expressed  
(more on this below).

WRT the side-effect, my question is: do we attempt to prevent future bugs
of this sort, or do we ignore it?

Off the top of my head, we can:
a - avoid the side-effect i.e. make length signed.
b - attempt to catch it, and others like it. i.e. add a signed/unsigned  
warning.
c - have a type with the range of an unsigned type and a lower bound of 0  
(no underflow)
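
To make options (a) and (c) a bit more concrete, here is a rough sketch of
how each could be emulated by hand. It assumes a current D compiler;
satSub, countSigned and countSaturating are made-up names for illustration
(not anything in the language or Phobos), and the loops only count their
iterations rather than doing any real work.

```d
// Hypothetical helper for option (c): subtraction with a lower bound of 0.
size_t satSub(size_t a, size_t b)
{
    return a > b ? a - b : 0;
}

// Option (a), emulated by hand: convert the length to a signed type
// before subtracting, so a short line gives a negative bound.
size_t countSigned(const(char)[] line)
{
    size_t n = 0;
    for (long i = 0; i < cast(long) line.length - 2; i++)
        n++;
    return n;
}

// Option (c), emulated by hand: the bound is clamped at 0 instead of wrapping.
size_t countSaturating(const(char)[] line)
{
    size_t n = 0;
    for (size_t i = 0; i < satSub(line.length, 2); i++)
        n++;
    return n;
}
```

With either version the loop body simply never runs for lines of length
0, 1 or 2, and no separate length check is needed.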

> In this case, as an example of such algorithm documentation, if one
> explicitly states that the search only applies to lines of two or more
> characters, it helps the reader of the code understand your intentions.
> Without that, the code phrase (i < line.length-2) might not trigger the
> reader's thoughts about handling short lines.

I agree with the general statement; you should express the algorithm in a  
clear fashion.

In fact, that is why I think we need to fix this. This side-effect results
in a less clear/concise expression of the actual algorithm. It results in
a superfluous check for a non-existent special case. Allow me to
elaborate..

From a purely algorithmic perspective there is nothing special about short
lines in this case; the same rule applies as to any other line: "the index
'i' starts at 0 and must always be less than the length of the line - 2".
I believe this is a complete and correct expression of the algorithm
itself.
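
For what it's worth, the same rule can be written with the 2 moved to the
other side of the comparison, which keeps a single rule for all lines and
never subtracts from the unsigned length. This is only a sketch, assuming a
current D compiler; findPairs is a hypothetical name, and the bound is kept
exactly as stated above (whether the final two characters should also be
examined, i.e. "i + 2 <= line.length", is a separate question).

```d
// Hypothetical helper: returns every index i where line[i .. i + 2] is "||",
// using the rule "i < length - 2" rearranged as "i + 2 < length".
size_t[] findPairs(const(char)[] line)
{
    size_t[] hits;
    for (size_t i = 0; i + 2 < line.length; i++)
    {
        if (line[i .. i + 2] != "||") continue;
        hits ~= i;
    }
    return hits;
}
```

Lines of length 0 or 1 fall out of the loop condition on the first test,
just like any other line that is too short to contain a match.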

However, a side-effect of the representation (code/language) is creating a  
special case for short lines which does not logically exist. A special  
case which is not obvious at the time the code is written, resulting in a  
bug. A special case which complicates the representation of the algorithm  
(the code) by requiring we check for short lines for no gain and some loss  
in terms of performance (negligible) and readability.
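
A small sketch of where the phantom special case comes from, assuming a
current D compiler in which .length has the unsigned type size_t
(loopLimit is a made-up name for illustration):

```d
// Hypothetical helper: the value the loop condition really compares 'i' against.
size_t loopLimit(const(char)[] line)
{
    // line.length is unsigned, so for short lines this wraps around
    // instead of going negative.
    return line.length - 2;
}

unittest
{
    assert(loopLimit("abcd") == 2);           // long enough: the bound is as expected
    assert(loopLimit("x") == size_t.max);     // 1 - 2 wraps to the largest value
    assert(loopLimit("") == size_t.max - 1);  // 0 - 2 wraps as well
}
```

So for a line of length 0 or 1 the condition "i < line.length-2" is true for
i = 0, and the slice inside the loop is what finally trips the bounds check.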

Of course you can argue that the limitations of the representation
(code/language) are always going to affect how we can express any given
algorithm, and this is true. Ideally, however, we should be able to express
an algorithm as precisely as possible without the risk of a side-effect
causing it to fail.

At the very least, in this case, we can make the programmer aware of the  
side effect where it applies.

Regan