Making all strings UTF ranges has some risk of WTF

Thu Feb 4 03:25:50 PST 2010

On 04.02.2010 at 04:05, grauzone <none at example.net> wrote:

> Andrei Alexandrescu wrote:
>> What can be done about that? I see a number of solutions:
>>  (a) Do not operate the change at all.
>>  (b) Operate the change and mention that in range algorithms you should  
>> check hasLength and only then use "length" under the assumption that it  
>> really means "elements count".
>>  (c) Deprecate the name .length for UTF-8 and UTF-16 strings, and  
>> define a different name for that. Any other name (codeUnits, codes  
>> etc.) would do. The entire point is to not make algorithms believe  
>> strings have a .length property.
>>  (d) Have std.range define a distinct property called e.g. "count" and  
>> then specialize it appropriately. Then change all references to .length  
>> in std.algorithm and elsewhere to .count.
>>  What would you do? Any ideas are welcome.
>

Definitely against (c)+(d).

> Change the type of string literals from char[] (or whatever the string  
> type is in D2) to a wrapper struct defined in object.d:
>
> struct string {
>      char[] raw;
> }
>

That sounds like a really reasonable approach to me.
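
To make the idea concrete, here is a minimal sketch of how such a wrapper might behave. It is only an illustration under assumptions: the name String (chosen to avoid clashing with the existing string alias), the count member, and the use of std.range.walkLength are mine, not part of the proposal quoted above.

```d
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical wrapper type; name and members are assumptions for illustration.
struct String
{
    immutable(char)[] raw;   // the underlying UTF-8 code units

    // Number of code points, found by decoding the whole string (O(n)).
    size_t count() const
    {
        return raw.walkLength;
    }
}

void main()
{
    auto s = String("h\u00E9llo");   // U+00E9 occupies two UTF-8 code units

    writeln(s.raw.length);   // code units: 6
    writeln(s.count);        // code points: 5
}
```

Because the wrapper itself exposes no .length, a generic algorithm that checks hasLength would not mistake the code-unit count for the element count; code that really wants the code units has to ask for .raw explicitly.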