Inconsistency

Maxim Fomin maxim at maxim-fomin.ru
Sun Oct 13 10:22:19 PDT 2013


On Sunday, 13 October 2013 at 14:14:14 UTC, nickles wrote:
> Ok, I understand, that "length" is - obviously - used in 
> analogy to any array's length value.
>
> Still, this seems to be inconsistent. D elaborates on 
> implementing "char"s as UTF-8 which means that a "char" in D 
> can be of any length between 1 and 4 bytes for an arbitrary 
> Unicode code point. Shouldn't then this (i.e. the character's 
> length) be the "unit of measurement" for "char"s - like e.g. 
> the size of the underlying struct in an array of "struct"s? The 
> story continues with indexing "string"s: In a consistent 
> implementation, shouldn't
>
>    writeln("säд"[2])
>
> return "д" instead of the trailing surrogate of this cyrillic 
> letter?

This is impossible given the current design. At runtime the 
string "säд" is viewed as struct { void *ptr; size_t length; }; 
ptr points to memory holding at least five bytes and length has 
the value 5, so indexing addresses individual bytes rather than 
characters. Druntime hasn't taken a UTF course.
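
To make the layout concrete, here is a minimal sketch of what the runtime sees. It assumes the source file is saved as UTF-8, and the variable name s is only for illustration:

```d
import std.stdio;

void main()
{
    // UTF-8 layout: 's' takes 1 byte, 'ä' takes 2 bytes, 'д' takes 2 bytes.
    string s = "säд";

    // length counts UTF-8 code units (bytes), not characters.
    assert(s.length == 5);

    // Indexing returns one code unit; index 2 is the second byte of 'ä'.
    assert(s[2] == 0xA4);

    // Dump every code unit in hex to show what ptr actually points to.
    foreach (char c; s)
        writef("%02X ", cast(ubyte) c);
    writeln(); // prints: 73 C3 A4 D0 B4
}
```

So with this layout, index 2 lands in the middle of 'ä', and 'д' only starts at index 3.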

One option would be to add support in druntime so that it handles 
such strings correctly, or to implement a separate string type 
which does not default to char[]. But of course the easiest way 
is to convince everybody that everything is OK and to advise 
using some library function which does the job correctly, 
essentially implying that the language does the job wrong (pardon 
me, some D skepticism: the deeper I am in it, the more critically 
I view it).
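
For reference, the library route could look roughly like the sketch below. It assumes Phobos' std.range and std.conv behave as I remember them (narrow strings are decoded to code points when treated as ranges), so take it as an illustration rather than the recommended approach:

```d
import std.conv : to;
import std.range;

void main()
{
    string s = "säд";

    // walkLength iterates the string as a range of code points.
    assert(s.walkLength == 3);

    // Skip two code points, then look at the next one.
    assert(s.drop(2).front == 'д');

    // Converting to dstring (UTF-32) gives one code unit per code point,
    // so plain length and indexing behave the way the question expects.
    dstring d = s.to!dstring;
    assert(d.length == 3);
    assert(d[2] == 'д');
}
```

Note that both drop and the dstring conversion walk the string from the start, so neither gives constant-time indexing on the original char[] data.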