Making all strings UTF ranges has some risk of WTF
Chad J
chadjoan at __spam.is.bad__gmail.com
Wed Feb 3 18:50:58 PST 2010
Andrei Alexandrescu wrote:
> ...
>
> What can be done about that? I see a number of solutions:
>
> (a) Do not operate the change at all.
>
> (b) Operate the change and mention that in range algorithms you should
> check hasLength and only then use "length" under the assumption that it
> really means "elements count".
>
> (c) Deprecate the name .length for UTF-8 and UTF-16 strings, and define
> a different name for that. Any other name (codeUnits, codes etc.) would
> do. The entire point is to not make algorithms believe strings have a
> .length property.
>
> (d) Have std.range define a distinct property called e.g. "count" and
> then specialize it appropriately. Then change all references to .length
> in std.algorithm and elsewhere to .count.
>
> What would you do? Any ideas are welcome.
>
>
> Andrei
I'm leaning towards (c) here.
To me the .length on char[] and wchar[] are kinda like doing this:
struct SomePOD
{
    int a, b;
    double y;
}
SomePOD pod;
auto len = pod.length;
assert(len == 16); // true.
I'll admit it's not a perfect analogy. What I'm playing on here is that
the .length on char[] and wchar[] returns the /size of/ the string in
code units (bytes for char[], 16-bit units for wchar[]) rather than the
/length/ of the string in number of (well-formed) characters.
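A small sketch of that distinction, not taken from the thread. It assumes a Phobos in which std.range.walkLength decodes strings while walking them, so it counts code points (which is still not quite "characters" once combining marks are involved). The sample strings are made up for illustration.

```d
import std.range : walkLength;

void main()
{
    // U+00E9 (e with acute accent) needs two code units in UTF-8.
    string s8 = "h\u00E9llo";
    assert(s8.length == 6);      // code units (bytes here)
    assert(s8.walkLength == 5);  // code points

    // U+1D11E (musical G clef) lies outside the BMP, so UTF-16
    // encodes it as a surrogate pair: two code units, four bytes.
    wstring s16 = "\U0001D11E"w;
    assert(s16.length == 2);     // code units, not bytes
    assert(s16.walkLength == 1); // one code point
}
```

Neither .length value matches the number of characters a reader would count.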
Unfortunately .sizeof is supposed to return the size of the string's
reference (8 bytes on 32-bit x86: a pointer plus a length) and not the
size of the string, IIRC. So that's taken.
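A quick check of the .sizeof behaviour, as a sketch with made-up variable names. It relies on a D dynamic array being a length plus a pointer, so the expected size is written in terms of size_t rather than hard-coded to 8.

```d
void main()
{
    string small = "a";
    string large = "a much longer string than the first one";

    // .sizeof describes the slice itself, never the data it points to,
    // so both strings report the same value.
    assert(small.sizeof == large.sizeof);

    // Length plus pointer: 8 bytes on a 32-bit target, 16 on a 64-bit one.
    assert(small.sizeof == 2 * size_t.sizeof);
    static assert((char[]).sizeof == size_t.sizeof + (void*).sizeof);
}
```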
So perhaps a .bytes or .nbytes property. Maybe make it work for arrays
of structs and things like that too. A tuple (or any container) of
non-homogeneous elements could probably benefit from this property as well.
Given such a property being available, I wouldn't miss .length at all.
It's quite misleading.
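For illustration only, here is roughly what such a property could look like if written today as a free function. The name nbytes is the hypothetical one from this post; it is not part of the language or Phobos, and this sketch only covers arrays, not tuples or other containers.

```d
struct SomePOD
{
    int a, b;
    double y;
}

// Hypothetical helper: total size in bytes of the array's elements.
size_t nbytes(T)(T[] arr)
{
    return arr.length * T.sizeof;
}

void main()
{
    string  s8  = "h\u00E9llo";   // 6 UTF-8 code units of 1 byte each
    wstring s16 = "h\u00E9llo"w;  // 5 UTF-16 code units of 2 bytes each
    auto pods = new SomePOD[3];   // 3 elements of 16 bytes each

    // Property-style calls work because D lets a free function be
    // called as if it were a member of its first argument.
    assert(s8.nbytes == 6);
    assert(s16.nbytes == 10);
    assert(pods.nbytes == 48);
}
```

The same five-character text gives different byte counts in UTF-8 and UTF-16, which is the information .length currently mixes up with "number of elements".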