Inconsistency

Dicebot public at dicebot.lv
Sun Oct 13 10:01:14 PDT 2013


On Sunday, 13 October 2013 at 16:31:58 UTC, nickles wrote:
> Well that's a point; on the other hand, D is constantly 
> creating and throwing away new strings, so this isn't quite an 
> argument. The current solution puts the programmer in charge of 
> dealing with UTF-x, where a more consistent implementation 
> would put the burden on the implementors of the libraries/core, 
> i.e. the ones who usually have a better understanding of 
> Unicode than the average programmer.

Ironically, the reason is consistency. `string` is just 
`immutable(char)[]` and it conforms to the usual array behavior 
rules. Saying that assigning a value to an array element may 
allocate is hardly a good option.
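
A minimal sketch of what "just an array" means in practice; the variable names are only for illustration, and the source file is assumed to be saved as UTF-8:

```d
void main()
{
    // `string` is an alias, not a separate type.
    static assert(is(string == immutable(char)[]));

    string s = "héllo";        // 'é' takes two UTF-8 code units
    assert(s.length == 6);     // .length counts code units, as for any array

    // s[0] = 'H';             // would not compile: elements are immutable

    char[] m = s.dup;          // explicit, visible allocation of a mutable copy
    m[0] = 'H';                // plain element write: one code unit, no allocation
    assert(m == "Héllo");
}
```

If element assignment had to work per code point, writing a wider character into `m[0]` would have to resize the array, which is the hidden allocation mentioned above.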

> So, how do you guys handle UTF-8 strings in D? What are your 
> solutions to the problems described? Does it all come down to 
> converting "string"s and "wstring"s to "dstring"s, manipulating 
> them, and re-convert them to "string"s? Btw, what would this 
> mean in terms of speed?

If single element access is needed, `str.front` yields a decoded 
`dchar`. Or use a simple `foreach (dchar d; str)`: at least it won't 
hide the fact that it is an O(n) operation. As `str.front` yields 
`dchar`, most `std.algorithm` and `std.range` utilities will also 
work correctly on default UTF-8 strings.
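
A short sketch of both approaches, assuming a UTF-8 source file; the sample string is made up for illustration:

```d
import std.algorithm : canFind, count;
import std.range : front, walkLength;

void main()
{
    string str = "äbc";                 // 'ä' is two code units, so str.length is 4

    // front decodes the first code point instead of returning a raw char.
    static assert(is(typeof(str.front) == dchar));
    assert(str.front == 'ä');

    // Decoding foreach: visibly a linear walk over the string.
    size_t n = 0;
    foreach (dchar d; str)
        ++n;
    assert(n == 3);

    // Range-based utilities see the string as a range of dchar.
    assert(str.walkLength == 3);        // code points, O(n)
    assert(str.count('b') == 1);
    assert(str.canFind('ä'));
}
```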

Slicing / .length are probably the only operations that do not 
respect UTF-8 encoding (because they are exactly the same for all 
arrays).
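
To illustrate, a hedged sketch of how slicing and `.length` behave on a non-ASCII string. Using `std.utf.stride` to find a safe slice boundary is one possible approach, not the only one:

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : stride, UTFException, validate;

void main()
{
    string s = "äb";
    assert(s.length == 3);              // code units
    assert(s.walkLength == 2);          // code points

    // Slicing works on code units, so it can cut a character in half.
    string broken = s[0 .. 1];
    assertThrown!UTFException(validate(broken));

    // stride gives the width of the sequence starting at an index,
    // which yields a boundary that is safe to slice at.
    size_t n = stride(s, 0);
    assert(s[0 .. n] == "ä");
    assert(s[n .. $] == "b");
}
```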

