string is rarely useful as a function argument

Timon Gehr timon.gehr at gmx.ch
Sun Jan 1 11:25:39 PST 2012


On 01/01/2012 08:01 PM, Chad J wrote:
> On 01/01/2012 10:39 AM, Timon Gehr wrote:
>> On 01/01/2012 04:13 PM, Chad J wrote:
>>> On 01/01/2012 07:59 AM, Timon Gehr wrote:
>>>> On 01/01/2012 05:53 AM, Chad J wrote:
>>>>>
>>>>> If you haven't been educated about unicode or how D handles it, you
>>>>> might write this:
>>>>>
>>>>> char[] str;
>>>>> ... load str ...
>>>>> for ( int i = 0; i < str.length; i++ )
>>>>> {
>>>>>        font.render(str[i]); // Ewww.
>>>>>        ...
>>>>> }
>>>>>
>>>>
>>>> That actually looks like a bug that might happen in real world code.
>>>> What is the signature of font.render?
>>>
>>> In my mind it's defined something like this:
>>>
>>> class Font
>>> {
>>>    ...
>>>
>>>       /** Render the given code point at
>>>           the current (x,y) cursor position. */
>>>       void render( dchar c )
>>>       {
>>>           ...
>>>       }
>>> }
>>>
>>> (Of course I don't know minute details like where the "cursor position"
>>> comes from, but I figure it doesn't matter.)
>>>
>>> I probably wrote some code like that loop a very long time ago, but I
>>> probably don't have that code around anymore, or at least not easily
>>> findable.
>>
>> I think the main issue here is that char implicitly converts to dchar:
>> This is an implicit reinterpret-cast that is nonsensical if the
>> character is outside the ascii-range.
>
> I agree.
>
> Perhaps the compiler should insert a check on the 8th bit in cases like
> these?
>
> I suppose it's possible someone could declare a bunch of individual
> char's and then start manipulating code units that way, and such an 8th
> bit check could thwart those manipulations, but I would also counter
> that such low manipulations should be done on ubyte's instead.
>
> I don't know how much this would help though.  Seems like too little,
> too late.

I think the conversion char -> dchar should just require an explicit 
cast. The runtime check is better left to std.conv.to.
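To make the problem concrete: the implicit conversion silently reinterprets a lone code unit as a code point, while a checked decode recovers the real character. A small sketch (example string and use of std.utf.decode are mine, not from the original post):

```d
import std.stdio : writeln;
import std.utf : decode;

void main()
{
    string s = "é";        // encoded in UTF-8 as two code units: 0xC3 0xA9
    char c = s[0];         // a single code unit, not a complete character

    dchar d = c;           // compiles today: an implicit reinterpret-cast
    writeln(cast(uint) d); // 195 (U+00C3), not 233 (U+00E9 'é')

    // A checked decode consumes both code units and yields the code point:
    size_t i = 0;
    dchar ok = decode(s, i);
    writeln(cast(uint) ok); // 233

    // Iterating by dchar decodes correctly as well:
    foreach (dchar ch; s)
        writeln(cast(uint) ch); // 233
}
```

With an explicit cast required, the `dchar d = c;` line above would fail to compile, and the caller would have to state whether a reinterpret-cast or a checked decode was intended.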

>
> The bigger problem is that a char is being taken from a char[] and
> thereby loses its context as (potentially) being part of a larger
> codepoint.

If it is part of a larger code point, then it has its highest bit set. 
Any individual char that has its highest bit set does not carry a 
character on its own. If it is not set, then it is a single ASCII character.
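The leading-bit patterns of UTF-8 make this classification mechanical: high bit clear means ASCII, `11xxxxxx` is the lead byte of a multi-unit sequence, and `10xxxxxx` is a continuation byte. A quick sketch (example string mine):

```d
import std.stdio : writefln;

void main()
{
    string s = "aé"; // 'a' is one code unit; 'é' is a two-code-unit sequence

    foreach (i, char c; s) // iterating with char visits code units, not characters
    {
        if ((c & 0x80) == 0)
            writefln("byte %s: 0x%02X  ASCII character on its own", i, c);
        else if ((c & 0xC0) == 0xC0)
            writefln("byte %s: 0x%02X  lead byte of a multi-unit sequence", i, c);
        else
            writefln("byte %s: 0x%02X  continuation byte", i, c);
    }
}
```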


More information about the Digitalmars-d mailing list