string is rarely useful as a function argument
Timon Gehr
timon.gehr at gmx.ch
Sun Jan 1 11:25:39 PST 2012
On 01/01/2012 08:01 PM, Chad J wrote:
> On 01/01/2012 10:39 AM, Timon Gehr wrote:
>> On 01/01/2012 04:13 PM, Chad J wrote:
>>> On 01/01/2012 07:59 AM, Timon Gehr wrote:
>>>> On 01/01/2012 05:53 AM, Chad J wrote:
>>>>>
>>>>> If you haven't been educated about unicode or how D handles it, you
>>>>> might write this:
>>>>>
>>>>> char[] str;
>>>>> ... load str ...
>>>>> for ( int i = 0; i < str.length; i++ )
>>>>> {
>>>>>     font.render(str[i]); // Ewww.
>>>>>     ...
>>>>> }
>>>>>
>>>>
>>>> That actually looks like a bug that might happen in real world code.
>>>> What is the signature of font.render?
>>>
>>> In my mind it's defined something like this:
>>>
>>> class Font
>>> {
>>>     ...
>>>
>>>     /** Render the given code point at
>>>         the current (x,y) cursor position. */
>>>     void render( dchar c )
>>>     {
>>>         ...
>>>     }
>>> }
>>>
>>> (Of course I don't know minute details like where the "cursor position"
>>> comes from, but I figure it doesn't matter.)
>>>
>>> I probably wrote some code like that loop a very long time ago, but I
>>> probably don't have that code around anymore, or at least not easily
>>> findable.
>>
>> I think the main issue here is that char implicitly converts to dchar:
>> this is an implicit reinterpret cast that is nonsensical if the
>> character is outside the ASCII range.
>
> I agree.
>
> Perhaps the compiler should insert a check on the 8th bit in cases like
> these?
>
> I suppose it's possible someone could declare a bunch of individual
> chars and then start manipulating code units that way, and such an 8th
> bit check could thwart those manipulations, but I would also counter
> that such low-level manipulations should be done on ubytes instead.
>
> I don't know how much this would help though. Seems like too little,
> too late.
I think the conversion char -> dchar should just require an explicit
cast. The runtime check is better left to std.conv.to.
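To illustrate the hazard being discussed, here is a minimal sketch (my own
example, not from the original code): the implicit char -> dchar conversion
silently widens a single UTF-8 code unit, while Phobos's std.utf.decode
consumes the whole sequence and yields the actual code point.

```d
import std.utf : decode;

void main()
{
    string str = "é"; // encoded in UTF-8 as two code units: 0xC3, 0xA9

    // Today's implicit conversion: the first code unit is silently widened.
    dchar wrong = str[0];
    assert(wrong == 0x00C3);   // 'Ã', not 'é' (U+00E9)

    // Correct: decode the code-unit sequence into a code point.
    size_t index = 0;
    dchar right = decode(str, index);
    assert(right == 0x00E9);   // 'é'
    assert(index == 2);        // both code units were consumed

    // foreach over dchar performs the same decoding implicitly.
    foreach (dchar d; str)
        assert(d == 0x00E9);
}
```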
>
> The bigger problem is that a char is being taken from a char[] and
> thereby loses its context as (potentially) being part of a larger
> codepoint.
If it is part of a larger code point, then it has its highest bit set.
Any individual char that has its highest bit set does not carry a
character on its own. If it is not set, then it is a single ASCII character.
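In UTF-8 terms the high bits make this check mechanical; a sketch (the bit
patterns come from the UTF-8 encoding rules, not from this thread, and
`classify` is a hypothetical helper name):

```d
// Classify a single UTF-8 code unit by its high bits.
string classify(char c)
{
    if ((c & 0x80) == 0)
        return "ascii";         // 0xxxxxxx: a complete character on its own
    else if ((c & 0xC0) == 0xC0)
        return "lead";          // 11xxxxxx: starts a multi-unit sequence
    else
        return "continuation";  // 10xxxxxx: carries no character on its own
}

void main()
{
    string str = "aé"; // 'a', then 'é' as the units 0xC3 0xA9
    assert(str.length == 3);
    assert(classify(str[0]) == "ascii");
    assert(classify(str[1]) == "lead");
    assert(classify(str[2]) == "continuation");
}
```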