Inconsistency
Dicebot
public at dicebot.lv
Sun Oct 13 10:01:14 PDT 2013
On Sunday, 13 October 2013 at 16:31:58 UTC, nickles wrote:
> Well that's a point; on the other hand, D is constantly
> creating and throwing away new strings, so this isn't quite an
> argument. The current solution puts the programmer in charge of
> dealing with UTF-x, where a more consistent implementation
> would put the burden on the implementors of the libraries/core,
> i.e. the ones who usually have a better understanding of
> Unicode than the average programmer.
Ironically, the reason is consistency. `string` is just
`immutable(char)[]`, and it follows the usual array behavior
rules. Making array element assignment potentially allocate is
hardly a good option.
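A minimal sketch of why in-place element assignment cannot work for UTF-8 text. `withFirstCodePoint` is a hypothetical helper, not a Phobos function: it "replaces" the first code point of a string, and because the new code point may need a different number of UTF-8 code units than the old one, it has to build (allocate) a new string rather than write into the existing array. The sketch assumes a current Phobos (`std.utf.stride`, `std.conv.to`).

```d
import std.conv : to;
import std.utf : stride;

// Hypothetical helper: return a copy of s whose first code point is c.
// The replacement may be wider or narrower in UTF-8 than the original,
// so the result is a newly allocated string, not an in-place update.
string withFirstCodePoint(string s, dchar c)
{
    assert(s.length > 0, "s must not be empty");
    size_t width = s.stride(0);          // code units of the first code point
    return c.to!string ~ s[width .. $];  // concatenation allocates
}

void main()
{
    string s = "hello";

    // string is an ordinary array of immutable UTF-8 code units.
    static assert(is(string == immutable(char)[]));
    // Its elements are immutable, so element assignment does not compile.
    static assert(!__traits(compiles, s[0] = 'H'));

    string t = withFirstCodePoint(s, 'é');
    assert(s == "hello");    // the original array is untouched
    assert(t == "éello");
    assert(t.length == 6);   // 'é' needs 2 code units, 'h' needed 1
}
```

The length change from 5 to 6 code units is the point: an assignment like `s[0] = 'é'` would have to resize the array, which is exactly the hidden allocation the array rules avoid.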
> So, how do you guys handle UTF-8 strings in D? What are your
> solutions to the problems described? Does it all come down to
> converting "string"s and "wstring"s to "dstring"s, manipulating
> them, and re-convert them to "string"s? Btw, what would this
> mean in terms of speed?
If single-element access is needed, `str.front` yields a decoded
`dchar`. Alternatively, use a simple `foreach (dchar d; str)`; at
least it does not hide the fact that decoding is an O(n) operation.
Because `str.front` yields a `dchar`, most `std.algorithm` and
`std.range` utilities also work correctly on default UTF-8 strings.
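A short sketch of both approaches on a string containing a non-ASCII character, assuming a current Phobos where `front` and `walkLength` live in `std.range.primitives` (that module postdates this 2013 message). `codePointCount` is a hypothetical helper used only to show the decoding `foreach`.

```d
import std.algorithm : canFind;
import std.array : array;
import std.range : retro;
import std.range.primitives : front, walkLength;

// Hypothetical helper: count code points with a decoding foreach.
// The loop visits every code unit, so the O(n) cost is explicit.
size_t codePointCount(string s)
{
    size_t n = 0;
    foreach (dchar d; s) // a dchar loop variable makes foreach decode UTF-8
        ++n;
    return n;
}

void main()
{
    string s = "naïve"; // 5 code points, 6 UTF-8 code units

    // front on a string decodes the first code point and yields a dchar.
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == 'n');

    assert(codePointCount(s) == 5);

    // Range utilities operate on decoded code points, not code units.
    assert(s.walkLength == 5);
    dchar target = 'ï';
    assert(s.canFind(target));
    assert(s.retro.array == "evïan"d); // reversal keeps 'ï' intact
}
```

Reversing with `retro` is a good check: a code-unit reversal would split the two bytes of 'ï' and produce invalid UTF-8, while the range-based version yields the expected characters.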
Slicing and `.length` are probably the only operations that do not
respect the UTF-8 encoding, because they behave exactly as they do for
all arrays: they work in code units (bytes), not code points.
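A minimal sketch of the code-unit behavior of `.length` and slicing, and one way to slice on a code-point boundary instead. `firstCodePoints` is a hypothetical helper built on `std.utf.toUTFindex`, which converts a code-point index into a code-unit index; `std.utf.validate` is used to show that a naive slice can produce invalid UTF-8.

```d
import std.exception : assertThrown;
import std.range.primitives : walkLength;
import std.utf : toUTFindex, UTFException, validate;

// Hypothetical helper: the first n code points of s. toUTFindex maps the
// code-point count n to a code-unit index, so the slice stays valid UTF-8.
string firstCodePoints(string s, size_t n)
{
    return s[0 .. s.toUTFindex(n)];
}

void main()
{
    string s = "héllo";

    // .length counts UTF-8 code units, exactly as for any other array.
    assert(s.length == 6);
    assert(s.walkLength == 5); // decoded code points

    // Slicing also indexes code units: s[0 .. 2] cuts 'é' in half.
    assertThrown!UTFException(validate(s[0 .. 2]));

    // Slicing at a code-point boundary keeps the result valid.
    assert(firstCodePoints(s, 2) == "hé");
    validate(firstCodePoints(s, 2));
}
```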