Inconsistency
nickles
ben at world-of-ben.de
Sun Oct 13 07:14:12 PDT 2013
OK, I understand that "length" is - obviously - used in analogy
to any array's length value.
Still, this seems inconsistent. D makes a point of
implementing "char"s as UTF-8, which means that a "char" in D can
be anywhere between 1 and 4 bytes long for an arbitrary Unicode
code point. Shouldn't this (i.e. the character's length) then be
the "unit of measurement" for "char"s - like e.g. the size of the
underlying struct in an array of "struct"s? The story continues
with indexing "string"s: in a consistent implementation, shouldn't
writeln("säд"[2])
return "д" instead of a lone UTF-8 continuation byte (here the
second byte of "ä")?
Btw., how do YOU implement this for "string"? (For "dstring" it
works, logically; for "wstring" the same problem arises for code
points above U+FFFF.)
Also, I understand that there is the std.utf.count() function,
which returns the length that I was looking for. However, why -
if D is so UTF-8-centric - isn't this function implemented in the
core, like ".length"?
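
To make the difference concrete, here is a minimal sketch of what I
mean. It assumes the source file is saved as UTF-8 and that "ä" is the
precomposed character U+00E4; walkLength from std.range is only one
possible alternative to std.utf.count, not something the question
depends on.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.utf : count;

void main()
{
    // UTF-8: 's' takes 1 byte, 'ä' (U+00E4) and 'д' (U+0434) take 2 bytes each.
    string s = "säд";

    writeln(s.length);          // UTF-8 code units (bytes): 5
    writeln(s.count);           // code points, via std.utf.count: 3
    writeln(s.walkLength);      // code points again, by iterating the decoded range: 3

    // Indexing a string yields a single code unit, not a character.
    // s[2] is the second byte of 'ä' (0xA4), printed here as a number.
    writeln(cast(ubyte) s[2]);  // 164

    // UTF-32: one code unit per code point, so indexing works per character.
    dstring d = "säд"d;
    writeln(d.length);          // 3
    writeln(d[2]);              // д
}
```

So ".length" and indexing count code units, and only the library
functions (or a "dstring") count code points.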