Making all strings UTF ranges has some risk of WTF
Robert Jacques
sandford at jhu.edu
Wed Feb 3 18:28:59 PST 2010
On Wed, 03 Feb 2010 21:00:21 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:
> (a) Do not operate the change at all.
>
> (b) Operate the change and mention that in range algorithms you should
> check hasLength and only then use "length" under the assumption that it
> really means "elements count".
>
> (c) Deprecate the name .length for UTF-8 and UTF-16 strings, and define
> a different name for that. Any other name (codeUnits, codes etc.) would
> do. The entire point is to not make algorithms believe strings have a
> .length property.
>
> (d) Have std.range define a distinct property called e.g. "count" and
> then specialize it appropriately. Then change all references to .length
> in std.algorithm and elsewhere to .count.
>
> What would you do? Any ideas are welcome.
I like (b) and (d), with a slight preference for (d). I think the benefits
of strings being encoding-correct and usable with std.algorithm outweigh
the disadvantages. And making char[] different from T[] is going to play
havoc with templated algorithms. Another alternative is to remove the char
types from the language and implement them as library ranges.
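
To make the difference between (b) and (d) concrete, here is a rough sketch of what a specialized "count" might look like. The name elementCount is hypothetical, chosen only for illustration; it is not a std.range API, and the real design could differ. It assumes std.utf.count, std.range.walkLength, std.range.hasLength and std.traits.isNarrowString are available as in current Phobos.

```d
import std.range : hasLength, walkLength;
import std.traits : isNarrowString;
import std.utf : count;

// Hypothetical helper in the spirit of option (d): "number of elements"
// for any range. For UTF-8 and UTF-16 strings the elements are taken to
// be code points, not code units.
size_t elementCount(R)(R r)
{
    static if (isNarrowString!R)
        return count(r);       // decode and count code points, O(n)
    else static if (hasLength!R)
        return r.length;       // option (b): trust .length only behind hasLength
    else
        return r.walkLength;   // no .length at all: walk the range
}

void main()
{
    string s = "h\u00E9llo";               // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);                 // .length counts code units
    assert(elementCount(s) == 5);          // code points
    assert(elementCount("h\u00E9llo"w) == 5);
    assert(elementCount("h\u00E9llo"d) == 5);
    assert(elementCount([1, 2, 3]) == 3);  // ordinary T[] is unaffected
}
```

With (b), every generic algorithm has to repeat the hasLength check itself; with (d), the check lives in one place and algorithms simply ask for the count.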