Inconsistency

monarch_dodra monarchdodra at gmail.com
Sun Oct 13 12:53:08 PDT 2013


On Sunday, 13 October 2013 at 14:14:14 UTC, nickles wrote:
> Ok, I understand, that "length" is - obviously - used in 
> analogy to any array's length value.
>
> Still, this seems to be inconsistent. D elaborates on 
> implementing "char"s as UTF-8 which means that a "char" in D 
> can be of any length between 1 and 4 bytes for an arbitrary 
> Unicode code point. Shouldn't then this (i.e. the character's 
> length) be the "unit of measurement" for "char"s - like e.g. 
> the size of the underlying struct in an array of "struct"s? The 
> story continues with indexing "string"s: In a consistent 
> implementation, shouldn't
>
>    writeln("säд"[2])
>
> return "д" instead of the trailing surrogate of this cyrillic 
> letter?

I think the root misunderstanding is that you think that a string 
is random access.

A string *isn't* random access. Strings are stored *inside* an 
array (of UTF-8 code units), but unless you know *exactly* what 
you are doing, you shouldn't index, slice or take the length of 
a string.
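
To make the difference concrete, here is a minimal sketch. The 
helper names codeUnitCount and codePointCount are made up for 
this example, and the literal is written with escapes on the 
assumption that "ä" is the precomposed U+00E4. It shows that 
.length and indexing work on UTF-8 code units, while walkLength 
from std.range counts decoded code points.

```d
import std.range;

// Hypothetical helpers, named for illustration only.
size_t codeUnitCount(string s)  { return s.length; }      // UTF-8 code units (bytes)
size_t codePointCount(string s) { return s.walkLength; }  // decoded code points, O(N)

void main()
{
    // "säд" written with escapes so the encoding is unambiguous.
    string s = "s\u00E4\u0434";

    assert(codeUnitCount(s) == 5);   // 1 + 2 + 2 bytes
    assert(codePointCount(s) == 3);

    // Indexing yields a single code unit: s[2] is the second byte of 'ä'.
    assert(s[2] == 0xA4);
}
```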

A string should be handled like a bidirectional range.

Once you've understood that, it becomes much simpler.
You want the first character? front.
You want to skip the first character? popFront.

You want an arbitrary character in O(N) time?
myString.dropExactly(N).front;
You want an arbitrary character in O(1) time?
You can't.
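
The operations above can be sketched as follows. This is an 
illustration under stated assumptions, not a definitive recipe: 
nthCodePoint is a hypothetical helper name, it relies on 
dropExactly from std.range, and it assumes the string holds more 
than n code points.

```d
import std.range;

// Hypothetical helper: the n-th code point, found by decoding from the start (O(n)).
dchar nthCodePoint(string s, size_t n)
{
    return s.dropExactly(n).front;
}

void main()
{
    string s = "s\u00E4\u0434";   // "säд"

    assert(s.front == 's');       // first character

    auto rest = s;
    rest.popFront();              // skip the first character
    assert(rest.front == '\u00E4');

    assert(nthCodePoint(s, 2) == '\u0434');   // 'д'
}
```

Each call walks the string from the beginning, so looking up many 
positions this way costs O(N) per lookup.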

