string is rarely useful as a function argument
Chad J
chadjoan at __spam.is.bad__gmail.com
Sun Jan 1 17:21:54 PST 2012
On 01/01/2012 06:36 PM, Timon Gehr wrote:
> On 01/02/2012 12:16 AM, Chad J wrote:
>> On 01/01/2012 02:25 PM, Timon Gehr wrote:
>>> On 01/01/2012 08:01 PM, Chad J wrote:
>>>> On 01/01/2012 10:39 AM, Timon Gehr wrote:
>>>>> On 01/01/2012 04:13 PM, Chad J wrote:
>>>>>> On 01/01/2012 07:59 AM, Timon Gehr wrote:
>>>>>>> On 01/01/2012 05:53 AM, Chad J wrote:
>>>>>>>>
>>>>>>>> If you haven't been educated about unicode or how D handles it, you
>>>>>>>> might write this:
>>>>>>>>
>>>>>>>> char[] str;
>>>>>>>> ... load str ...
>>>>>>>> for ( int i = 0; i < str.length; i++ )
>>>>>>>> {
>>>>>>>>     font.render(str[i]); // Ewww.
>>>>>>>>     ...
>>>>>>>> }
>>>>>>>>
>>>>>>>
>>>>>>> That actually looks like a bug that might happen in real world code.
>>>>>>> What is the signature of font.render?
>>>>>>
>>>>>> In my mind it's defined something like this:
>>>>>>
>>>>>> class Font
>>>>>> {
>>>>>>     ...
>>>>>>
>>>>>>     /** Render the given code point at
>>>>>>         the current (x,y) cursor position. */
>>>>>>     void render( dchar c )
>>>>>>     {
>>>>>>         ...
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> (Of course I don't know minute details like where the "cursor
>>>>>> position" comes from, but I figure it doesn't matter.)
>>>>>>
>>>>>> I probably wrote some code like that loop a very long time ago, but I
>>>>>> probably don't have that code around anymore, or at least not easily
>>>>>> findable.
>>>>>
>>>>> I think the main issue here is that char implicitly converts to dchar:
>>>>> This is an implicit reinterpret-cast that is nonsensical if the
>>>>> character is outside the ascii-range.
>>>>
>>>> I agree.
>>>>
>>>> Perhaps the compiler should insert a check on the 8th bit in cases like
>>>> these?
>>>>
>>>> I suppose it's possible someone could declare a bunch of individual
>>>> char's and then start manipulating code units that way, and such an 8th
>>>> bit check could thwart those manipulations, but I would also counter
>>>> that such low manipulations should be done on ubyte's instead.
>>>>
>>>> I don't know how much this would help though. Seems like too little,
>>>> too late.
>>>
>>> I think the conversion char -> dchar should just require an explicit
>>> cast. The runtime check is better left to std.conv.to;
>>>
>>
>> What of valid transfers of ASCII characters into dchar?
>>
>> Normally this is a widening operation, so I can see how it is
>> permissible.
>>
>>>>
>>>> The bigger problem is that a char is being taken from a char[] and
>>>> thereby loses its context as (potentially) being part of a larger
>>>> codepoint.
>>>
>>> If it is part of a larger code point, then it has its highest bit set.
>>> Any individual char that has its highest bit set does not carry a
>>> character on its own. If it is not set, then it is a single ASCII
>>> character.
>>
>> See above.
>>
>>
>> I think that assigning from a char[i] to another char[j] is probably
>> safe. Similarly for slicing. These calculations tend to occur, I
>> suspect, when the text is well-anchored. I believe your balanced
>> parentheses example falls into this category:
>> (repasted for reader convenience)
>>
>> void main(){
>>     string s = readln();
>>     int nest = 0;
>>     foreach(x;s){ // iterates by code unit
>>         if(x=='(') nest++;
>>         else if(x==')'&& --nest<0) goto unbalanced;
>>     }
>>     if(!nest){
>>         writeln("balanced parentheses");
>>         return;
>>     }
>> unbalanced:
>>     writeln("unbalanced parentheses");
>> }
>>
>> With these observations in hand, I would consider the safety of
>> operations to go like this:
>>
>> char[i] = char[j];           // (Reasonably) Safe
>> char[i1..i2] = char[j1..j2]; // (Reasonably) Safe
>> char = char;                 // Safe
>> dchar = char;                // Safe. Widening.
>> char = char[i];              // Not safe. Should error.
>> dchar = char[i];             // Not safe. Should error. (Corollary)
>> dchar = dchar[i];            // Safe.
>> char = char[i1..i2];         // Nonsensical; already an error.
>
> That is an interesting point of view. Your proposal would therefore be
> to constrain char to the ASCII range except if it is embedded in an
> array? It would break the balanced parentheses example.

I just ran the example and wow, x didn't type-infer to dchar like I
expected it to. I thought the comment might be wrong, but no, it is
correct: x type-infers to char.
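
For anyone who wants to check the inference without a debugger, here is a minimal sketch. The helper names countInferred and countDecoded are made up for illustration, and the static assert is just one way to pin down the inferred type; it is not taken from the original example.

```d
import std.stdio : writeln;
import std.traits : Unqual;

// Hypothetical helper: counts loop steps when the element type is inferred.
size_t countInferred(string s)
{
    size_t n = 0;
    foreach (x; s)
    {
        // The inferred type is the array's element type: a (qualified) char.
        static assert(is(Unqual!(typeof(x)) == char));
        n++;
    }
    return n;
}

// Hypothetical helper: asking for dchar makes foreach decode the UTF-8.
size_t countDecoded(string s)
{
    size_t n = 0;
    foreach (dchar x; s)
        n++;
    return n;
}

void main()
{
    string s = "(\u00E9)"; // U+00E9 takes two UTF-8 code units
    writeln(countInferred(s), " code units, ", countDecoded(s), " code points");
}
```

With that input the inferred loop takes 4 steps and the dchar loop takes 3, so the difference only shows up once the text leaves the ASCII range.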

I expected it to behave more like the old days before type inference
showed up everywhere:

import std.stdio; // for readln and writeln

void main(){
    string s = readln();
    int nest = 0;
    foreach(dchar x;s){ // iterates by code POINT; notice the dchar.
        if(x=='(') nest++;
        else if(x==')'&& --nest<0) goto unbalanced;
    }
    if(!nest){
        writeln("balanced parentheses");
        return;
    }
unbalanced:
    writeln("unbalanced parentheses");
}

This version wouldn't be broken by those rules. If foreach inferred
dchar when iterating over a string, the other version wouldn't be
broken either. Changing the inference could break other things,
though. Bummer.
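
As one guess at what "other things" might look like, here is a sketch of a loop whose result depends on visiting code units. The byte-wise checksum and the names sumUnits and sumPoints are hypothetical, chosen only to make the difference visible; real code affected by such a change could look quite different.

```d
import std.stdio : writeln;

// Hypothetical byte-wise checksum: relies on foreach visiting code units.
uint sumUnits(string s)
{
    uint sum = 0;
    foreach (x; s) // x is inferred as a char, so this sums UTF-8 bytes
        sum += x;
    return sum;
}

// The same loop written as it would behave if x were a dchar.
uint sumPoints(string s)
{
    uint sum = 0;
    foreach (dchar x; s) // decodes, so this sums code point values
        sum += x;
    return sum;
}

void main()
{
    string s = "\u00E9"; // encoded as the two bytes 0xC3 0xA9
    writeln(sumUnits(s));  // sum of 0xC3 and 0xA9
    writeln(sumPoints(s)); // the single code point 0xE9
}
```

Both functions still compile, but they disagree on any non-ASCII input, so a change in inference would alter this kind of code silently instead of producing an error.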