Making all strings UTF ranges has some risk of WTF

Robert Jacques sandford at jhu.edu
Wed Feb 3 18:28:59 PST 2010


On Wed, 03 Feb 2010 21:00:21 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:
> (a) Do not make the change at all.
>
> (b) Make the change, and note that range algorithms should check
> hasLength and only then use "length", on the assumption that it
> really means "element count".
>
> (c) Deprecate the name .length for UTF-8 and UTF-16 strings, and define  
> a different name for that. Any other name (codeUnits, codes etc.) would  
> do. The entire point is to not make algorithms believe strings have a  
> .length property.
>
> (d) Have std.range define a distinct property called e.g. "count" and  
> then specialize it appropriately. Then change all references to .length  
> in std.algorithm and elsewhere to .count.
>
> What would you do? Any ideas are welcome.

I like (b) and (d), with a slight preference for (d). I think the benefits of
strings being encoding-correct and usable with std.algorithm outweigh the
disadvantages. And making char[] different from T[] is going to play havoc
with templated algorithms. Another alternative is to remove the char types
from the language and implement them as library ranges.
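To make option (d) concrete, here is a minimal sketch in present-day D. It uses Phobos' std.range and std.traits in their current form, which postdate this 2010 message. The name `elementCount` is hypothetical and stands in for the proposed "count" property; it was chosen so it does not clash with the existing `count` in std.algorithm. The function returns `.length` when a range's length already equals its element count. Narrow strings (char[]/wchar[]) are instead walked, so the result counts code points rather than UTF-8/UTF-16 code units.

```d
import std.range : hasLength, isInputRange, walkLength;
import std.traits : isNarrowString;

/// Hypothetical "element count" in the spirit of option (d).
/// Ranges whose .length equals their element count report it directly.
/// Narrow strings (char[]/wchar[]) are iterated so the result counts
/// decoded code points, not code units.
size_t elementCount(R)(R r)
    if (isInputRange!R)
{
    static if (hasLength!R && !isNarrowString!R)
        return r.length;       // O(1): .length already means element count
    else
        return walkLength(r);  // O(n): visits (and for strings, decodes) every element
}

void main()
{
    string s = "héllo";            // 'é' occupies two UTF-8 code units
    assert(s.length == 6);         // .length counts code units
    assert(elementCount(s) == 5);  // elementCount counts code points

    int[] a = [1, 2, 3];
    assert(elementCount(a) == 3);  // for ordinary arrays the two agree
}
```

The else branch shows the cost of this design. For narrow strings the count is O(n), so generic algorithms cannot assume it is as cheap as array .length.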



More information about the Digitalmars-d mailing list