Inconsistency

nickles ben at world-of-ben.de
Sun Oct 13 07:14:12 PDT 2013


OK, I understand that "length" is, obviously, used in analogy 
to any array's length value.

Still, this seems to be inconsistent. D elaborates on 
implementing "char"s as UTF-8 which means that a "char" in D can 
be of any length between 1 and 4 bytes for an arbitrary Unicode 
code point. Shouldn't then this (i.e. the character's length) be 
the "unit of measurement" for "char"s - like e.g. the size of the 
underlying struct in an array of "struct"s? The story continues 
with indexing "string"s: In a consistent implementation, shouldn't

    writeln("säд"[2])

return "д" instead of a trailing UTF-8 code unit (index 2 is the 
second byte of "ä")?
Btw. how do YOU implement this for "string" (for "dstring" it 
works, logically; for "wstring" the same problem arises for code 
points above U+FFFF, which need surrogate pairs)?
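
To make the question concrete, here is a minimal sketch of what indexing and ".length" report for the three string types, assuming a current Phobos. The helper codePointAt is a hypothetical name introduced only for illustration; it is not part of the language or the standard library, and it has to decode from the start of the string, so it is O(n) rather than O(1).

```d
import std.range;
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): returns the n-th code point
// of a UTF-8 string by decoding from the front, which takes O(n) time.
dchar codePointAt(string s, size_t n)
{
    return s.dropExactly(n).front;
}

void main()
{
    string  s = "säд";   // UTF-8:  's' is 1 byte, 'ä' and 'д' are 2 bytes each
    wstring w = "säд"w;  // UTF-16: one code unit per character here
    dstring d = "säд"d;  // UTF-32: always one code unit per code point

    // .length counts code units, not code points.
    assert(s.length == 5);
    assert(w.length == 3);
    assert(d.length == 3);

    // Indexing a string yields a single code unit:
    // s[2] is the second byte of 'ä' (0xC3 0xA4), not 'д'.
    assert(s[2] == 0xA4);

    // Indexing a dstring yields a whole code point.
    assert(d[2] == 'д');

    // Decoding from the front reaches the third code point.
    assert(codePointAt(s, 2) == 'д');
    writeln(codePointAt(s, 2));
}
```

The sketch only shows the observable behaviour; it does not claim that this is how an index-by-code-point operator would or should be implemented.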

Also, I understand that there is the std.utf.count() function, 
which returns the length that I was searching for. However, why - 
if D is so UTF-8-centric - isn't this function implemented in the 
core like ".length"?
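
For comparison, a short sketch of the ways I know of to get the number of code points rather than code units, assuming a current Phobos. The wrapper codePointCount is a hypothetical name used only so the example has something to call; std.utf.count and std.range.walkLength are the actual library functions.

```d
import std.range : walkLength;
import std.utf : count;

// Hypothetical wrapper (illustration only): number of code points in s.
size_t codePointCount(string s)
{
    return count(s); // std.utf.count walks the string, so this is O(n)
}

void main()
{
    string s = "säд";

    assert(s.length == 5);          // code units (bytes)
    assert(codePointCount(s) == 3); // code points via std.utf.count
    assert(s.walkLength == 3);      // same result via range iteration

    // foreach with an explicit dchar loop variable decodes on the fly.
    size_t n;
    foreach (dchar c; s)
        ++n;
    assert(n == 3);
}
```

All three variants have to scan the whole string, unlike ".length", which is a stored value.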


