string is rarely useful as a function argument
Chad J
chadjoan at __spam.is.bad__gmail.com
Sun Jan 1 15:16:49 PST 2012
On 01/01/2012 02:25 PM, Timon Gehr wrote:
> On 01/01/2012 08:01 PM, Chad J wrote:
>> On 01/01/2012 10:39 AM, Timon Gehr wrote:
>>> On 01/01/2012 04:13 PM, Chad J wrote:
>>>> On 01/01/2012 07:59 AM, Timon Gehr wrote:
>>>>> On 01/01/2012 05:53 AM, Chad J wrote:
>>>>>>
>>>>>> If you haven't been educated about Unicode or how D handles it, you
>>>>>> might write this:
>>>>>>
>>>>>> char[] str;
>>>>>> ... load str ...
>>>>>> for ( int i = 0; i < str.length; i++ )
>>>>>> {
>>>>>> font.render(str[i]); // Ewww.
>>>>>> ...
>>>>>> }
>>>>>>
>>>>>
>>>>> That actually looks like a bug that might happen in real world code.
>>>>> What is the signature of font.render?
>>>>
>>>> In my mind it's defined something like this:
>>>>
>>>> class Font
>>>> {
>>>> ...
>>>>
>>>> /** Render the given code point at
>>>> the current (x,y) cursor position. */
>>>> void render( dchar c )
>>>> {
>>>> ...
>>>> }
>>>> }
>>>>
>>>> (Of course I don't know minute details like where the "cursor position"
>>>> comes from, but I figure it doesn't matter.)
>>>>
>>>> I probably wrote some code like that loop a very long time ago, but I
>>>> probably don't have that code around anymore, or at least not easily
>>>> findable.
>>>
>>> I think the main issue here is that char implicitly converts to dchar:
>>> This is an implicit reinterpret-cast that is nonsensical if the
>>> character is outside the ASCII range.
>>
>> I agree.
>>
>> Perhaps the compiler should insert a check on the 8th bit in cases like
>> these?
>>
>> I suppose it's possible someone could declare a bunch of individual
>> chars and then start manipulating code units that way, and such an 8th
>> bit check could thwart those manipulations, but I would also counter
>> that such low-level manipulations should be done on ubytes instead.
>>
>> I don't know how much this would help though. Seems like too little,
>> too late.
>
> I think the conversion char -> dchar should just require an explicit
> cast. The runtime check is better left to std.conv.to.
>
What about valid transfers of ASCII characters into dchar?
Normally this is a widening operation, so I can see why it is permitted.
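A small sketch of the distinction, assuming a D2 compiler with Phobos; the helper widen is hypothetical and exists only to show that the conversion compiles without a cast. An ASCII code unit keeps its meaning when widened to dchar, while the lead byte of a multi-byte sequence silently becomes an unrelated code point:

```d
import std.stdio;

// Hypothetical helper: compiles without a cast because char converts
// implicitly to dchar. The bits are copied, not decoded.
dchar widen(char c)
{
    return c;
}

void main()
{
    string s = "a\u00E9";             // 'a' is one code unit; U+00E9 is two: 0xC3 0xA9
    assert(s.length == 3);

    assert(widen(s[0]) == 'a');       // ASCII: widening preserves the character
    assert(widen(s[1]) == '\u00C3');  // lead byte 0xC3 becomes U+00C3...
    assert(widen(s[1]) != '\u00E9');  // ...not the intended U+00E9
    writeln("ok");
}
```

Both assignments are the same operation as far as the compiler is concerned; only the value of the code unit decides whether the result means anything.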
>>
>> The bigger problem is that a char is being taken from a char[] and
>> thereby loses its context as (potentially) being part of a larger
>> codepoint.
>
> If it is part of a larger code point, then it has its highest bit set.
> Any individual char that has its highest bit set does not carry a
> character on its own. If it is not set, then it is a single ASCII
> character.
See above.
I think that assigning from a char[i] to another char[j] is probably
safe, and similarly for slicing. These calculations tend to occur, I
suspect, when the text is well-anchored. I believe your
balanced-parentheses example falls into this category
(repasted for the reader's convenience):
import std.stdio;

void main(){
    string s = readln();
    int nest = 0;
    foreach(x;s){ // iterates by code unit
        if(x=='(') nest++;
        else if(x==')' && --nest<0) goto unbalanced;
    }
    if(!nest){
        writeln("balanced parentheses");
        return;
    }
unbalanced:
    writeln("unbalanced parentheses");
}
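Iterating by code unit is safe here because every code unit of a multi-byte UTF-8 sequence has its high bit set (0x80 or above), so none of them can compare equal to '(' (0x28) or ')' (0x29). A sketch of the same logic factored into a function so it can be exercised on non-ASCII input; the name isBalanced is hypothetical, not a Phobos function:

```d
import std.stdio;

// Hypothetical helper: same logic as the main() above, iterating by
// UTF-8 code unit (char) rather than by decoded code point.
bool isBalanced(string s)
{
    int nest = 0;
    foreach (char x; s)
    {
        if (x == '(') nest++;
        else if (x == ')' && --nest < 0) return false;
    }
    return nest == 0;
}

void main()
{
    // U+00E9 is two code units, 0xC3 0xA9; neither can equal '(' or ')'.
    assert(isBalanced("(caf\u00E9)"));
    assert(!isBalanced(")\u00E9("));
    writeln("ok");
}
```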
With these observations in hand, I would rate the safety of the
operations like this (char and dchar stand for variables of those
types, and char[i] for an element of a char array):
char[i] = char[j];            // (Reasonably) safe
char[i1..i2] = char[j1..j2];  // (Reasonably) safe
char = char;                  // Safe
dchar = char;                 // Safe. Widening.
char = char[i];               // Not safe. Should error.
dchar = char[i];              // Not safe. Should error. (Corollary)
dchar = dchar[i];             // Safe.
char = char[i1..i2];          // Nonsensical; already an error.
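For the case marked unsafe (dchar = char[i]), the usual alternative is to decode rather than widen. A minimal sketch, assuming the hypothetical helper codePoints: declaring the foreach loop variable as dchar makes D decode UTF-8 into whole code points instead of handing out raw code units.

```d
import std.stdio;

// Hypothetical helper: returns the code points of a UTF-8 string.
// Declaring the loop variable as dchar makes foreach decode each
// multi-byte sequence instead of yielding individual code units.
dchar[] codePoints(string s)
{
    dchar[] result;
    foreach (dchar c; s)
        result ~= c;
    return result;
}

void main()
{
    string s = "a\u00E9";                 // 'a' plus U+00E9
    assert(s.length == 3);                // three UTF-8 code units...
    assert(codePoints(s) == "a\u00E9"d);  // ...but only two code points

    foreach (dchar c; s)
        writefln("U+%04X", cast(uint) c);
}
```

Applied to the font example at the top of the thread, the same idea would be foreach (dchar c; str) font.render(c); so that render always receives a complete code point.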