string is rarely useful as a function argument

Chad J chadjoan at __spam.is.bad__gmail.com
Sun Jan 1 17:21:54 PST 2012


On 01/01/2012 06:36 PM, Timon Gehr wrote:
> On 01/02/2012 12:16 AM, Chad J wrote:
>> On 01/01/2012 02:25 PM, Timon Gehr wrote:
>>> On 01/01/2012 08:01 PM, Chad J wrote:
>>>> On 01/01/2012 10:39 AM, Timon Gehr wrote:
>>>>> On 01/01/2012 04:13 PM, Chad J wrote:
>>>>>> On 01/01/2012 07:59 AM, Timon Gehr wrote:
>>>>>>> On 01/01/2012 05:53 AM, Chad J wrote:
>>>>>>>>
>>>>>>>> If you haven't been educated about unicode or how D handles it, you
>>>>>>>> might write this:
>>>>>>>>
>>>>>>>> char[] str;
>>>>>>>> ... load str ...
>>>>>>>> for ( int i = 0; i < str.length; i++ )
>>>>>>>> {
>>>>>>>>         font.render(str[i]); // Ewww.
>>>>>>>>         ...
>>>>>>>> }
>>>>>>>>
>>>>>>>
>>>>>>> That actually looks like a bug that might happen in real world code.
>>>>>>> What is the signature of font.render?
>>>>>>
>>>>>> In my mind it's defined something like this:
>>>>>>
>>>>>> class Font
>>>>>> {
>>>>>>     ...
>>>>>>
>>>>>>        /** Render the given code point at
>>>>>>            the current (x,y) cursor position. */
>>>>>>        void render( dchar c )
>>>>>>        {
>>>>>>            ...
>>>>>>        }
>>>>>> }
>>>>>>
>>>>>> (Of course I don't know minute details like where the "cursor
>>>>>> position" comes from, but I figure it doesn't matter.)
>>>>>>
>>>>>> I probably wrote some code like that loop a very long time ago, but I
>>>>>> probably don't have that code around anymore, or at least not easily
>>>>>> findable.
>>>>>
>>>>> I think the main issue here is that char implicitly converts to dchar:
>>>>> This is an implicit reinterpret-cast that is nonsensical if the
>>>>> character is outside the ascii-range.
>>>>
>>>> I agree.
>>>>
>>>> Perhaps the compiler should insert a check on the 8th bit in cases like
>>>> these?
>>>>
>>>> I suppose it's possible someone could declare a bunch of individual
>>>> char's and then start manipulating code units that way, and such an 8th
>>>> bit check could thwart those manipulations, but I would also counter
>>>> that such low manipulations should be done on ubyte's instead.
>>>>
>>>> I don't know how much this would help though.  Seems like too little,
>>>> too late.
>>>
>>> I think the conversion char -> dchar should just require an explicit
>>> cast. The runtime check is better left to std.conv.to.
>>>
>>
>> What of valid transfers of ASCII characters into dchar?
>>
>> Normally this is a widening operation, so I can see how it is
>> permissible.
>>
>>>>
>>>> The bigger problem is that a char is being taken from a char[] and
>>>> thereby loses its context as (potentially) being part of a larger
>>>> codepoint.
>>>
>>> If it is part of a larger code point, then it has its highest bit set.
>>> Any individual char that has its highest bit set does not carry a
>>> character on its own. If it is not set, then it is a single ASCII
>>> character.
>>
>> See above.
>>
>>
>> I think that assigning from a char[i] to another char[j] is probably
>> safe.  Similarly for slicing.  These calculations tend to occur, I
>> suspect, when the text is well-anchored.  I believe your balanced
>> parentheses example falls into this category:
>> (repasted for reader convenience)
>>
>> void main(){
>>      string s = readln();
>>      int nest = 0;
>>      foreach(x;s){ // iterates by code unit
>>          if(x=='(') nest++;
>>          else if(x==')' && --nest<0) goto unbalanced;
>>      }
>>      if(!nest){
>>          writeln("balanced parentheses");
>>          return;
>>      }
>> unbalanced:
>>      writeln("unbalanced parentheses");
>> }
>>
>> With these observations in hand, I would consider the safety of
>> operations to go like this:
>>
>> char[i] = char[j];           // (Reasonably) Safe
>> char[i1..i2] = char[j1..j2]; // (Reasonably) Safe
>> char = char;                 // Safe
>> dchar = char;                // Safe.  Widening.
>> char = char[i];              // Not safe.  Should error.
>> dchar = char[i];             // Not safe.  Should error. (Corollary)
>> dchar = dchar[i];            // Safe.
>> char = char[i1..i2];         // Nonsensical; already an error.
> 
> That is an interesting point of view. Your proposal would therefore be
> to constrain char to the ASCII range except if it is embedded in an
> array? It would break the balanced parentheses example.

I just ran the example and, wow, x didn't type-infer to dchar like I
expected it to.  I thought the comment might be wrong, but no, it is
correct: x type-infers to char.
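
As an aside, the implicit char -> dchar reinterpretation Timon described
is easy to observe directly.  A minimal sketch (the non-ASCII literal is
just for illustration):

```d
import std.stdio;

void main()
{
    string s = "é";        // one code point, two UTF-8 code units: 0xC3 0xA9
    char u = s[0];         // a lone code unit, not a full character
    dchar d = u;           // compiles: implicit char -> dchar conversion
    writeln(cast(uint) d); // 195 (0xC3), NOT the code point U+00E9 (233)
}
```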

I expected it to behave more like the old days before type inference
showed up everywhere:

void main(){
     string s = readln();
     int nest = 0;
     foreach(dchar x;s){ // iterates by code POINT; notice the dchar.
         if(x=='(') nest++;
         else if(x==')' && --nest<0) goto unbalanced;
     }
     if(!nest){
         writeln("balanced parentheses");
         return;
     }
unbalanced:
     writeln("unbalanced parentheses");
}

This version isn't broken.  If the type inference were changed to dchar,
the other version wouldn't be broken either.  That change could break
other code, though.  Bummer.
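
To make the two iteration modes concrete, here is a small sketch (the
string literal is just for illustration):

```d
import std.stdio;

void main()
{
    string s = "héllo"; // 5 code points, 6 UTF-8 code units ('é' takes two)

    size_t units = 0;
    foreach (c; s)       // loop variable infers to char: iterates code units
        units++;

    size_t points = 0;
    foreach (dchar c; s) // explicit dchar: decodes to code points
        points++;

    writeln(units);  // 6
    writeln(points); // 5
}
```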

