Making all strings UTF ranges has some risk of WTF

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Feb 4 09:19:42 PST 2010


bearophile wrote:
> Simen kjaeraas:
>> Of the above, I feel (b) is the correct solution, and I understand
>> it has already been implemented in svn.
> 
> Yes, I presume he was mostly looking for a justification of his ideas
> he has already accepted and even partially implemented :-)

I am ready to throw away the implementation as soon as a better idea 
comes around. As at other times, I made the change to see how things 
feel with the new approach.

Generally, the new state of affairs feels like a solid improvement. 
One recurring problem has been that some code assumed that 
ElementType!SomeString has the width of one code unit. That 
assumption is no longer true (the element type is now dchar), so I had 
to change such code to use typeof(SomeString.init[0]). I'll probably 
abstract that as CodeUnit!SomeString in std.traits.
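
A minimal D sketch of the distinction, assuming a Phobos in which narrow strings iterate as ranges of dchar (the behavior described here, which std.range.primitives still has by default). CodeUnit is a hypothetical template named after the idea above, not an existing Phobos symbol; present-day Phobos offers a comparable trait, ElementEncodingType.

```d
import std.range.primitives : ElementEncodingType, ElementType, walkLength;

// Hypothetical trait named after the CodeUnit!SomeString idea; it is not a
// Phobos symbol. Indexing a string yields one code unit, so the type of
// S.init[0] is the code-unit type (typeof does not evaluate the index).
template CodeUnit(S)
{
    alias CodeUnit = typeof(S.init[0]);
}

// Iterating a narrow string with front/popFront yields whole code points...
static assert(is(ElementType!string == dchar));
static assert(is(ElementType!wstring == dchar));

// ...while indexing still yields single code units.
static assert(is(CodeUnit!string == immutable(char)));
static assert(is(CodeUnit!wstring == immutable(wchar)));

// Present-day Phobos spells the code-unit trait ElementEncodingType.
static assert(is(ElementEncodingType!string == immutable(char)));

void main()
{
    string s = "h\u00E9llo";   // U+00E9 needs two UTF-8 code units
    assert(s.length == 6);     // length counts code units
    assert(s.walkLength == 5); // walkLength counts code points
}
```

Code that sized buffers or counted elements via ElementType silently mixes these two units, which is the kind of breakage described above.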

I also found some bugs; for example, the Levenshtein distance was 
wrong for non-ASCII text because it operated on code units rather than 
characters. The fix, using front and popFront, was very simple.
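
To illustrate the kind of bug described, here is a hedged sketch, not the actual Phobos fix: a generic two-row edit-distance routine, plus a hypothetical codePoints helper that walks a string with front and popFront so the comparison happens on whole characters. The names distance, codePoints, and levenshtein are illustrative only.

```d
import std.algorithm.comparison : min;
import std.range.primitives : empty, front, popFront;

// Hypothetical helper: collect the code points of a UTF-8 string by walking
// it with the range primitives. front decodes one whole dchar; popFront
// skips every code unit that encodes it.
dchar[] codePoints(string s)
{
    dchar[] result;
    for (; !s.empty; s.popFront())
        result ~= s.front;
    return result;
}

// Two-row dynamic-programming edit distance over any indexable sequence.
// Elements are compared with ==, so the element type decides the unit.
size_t distance(R)(R x, R y)
{
    auto prev = new size_t[y.length + 1];
    auto curr = new size_t[y.length + 1];
    foreach (j; 0 .. prev.length)
        prev[j] = j;               // empty prefix of x vs. first j of y
    foreach (i; 0 .. x.length)
    {
        curr[0] = i + 1;           // delete the first i + 1 elements of x
        foreach (j; 0 .. y.length)
        {
            immutable size_t cost = x[i] == y[j] ? 0 : 1;
            curr[j + 1] = min(prev[j + 1] + 1,  // deletion
                              curr[j] + 1,      // insertion
                              prev[j] + cost);  // substitution or match
        }
        auto tmp = prev;           // the current row becomes the previous one
        prev = curr;
        curr = tmp;
    }
    return prev[y.length];
}

// Character-level distance: compare code points rather than code units.
size_t levenshtein(string a, string b)
{
    return distance(codePoints(a), codePoints(b));
}

void main()
{
    // Indexing a string compares UTF-8 code units: the two units of U+00E9
    // against 'e' cost a substitution plus a deletion.
    assert(distance("caf\u00E9", "cafe") == 2);
    // Comparing code points gives the expected single substitution.
    assert(levenshtein("caf\u00E9", "cafe") == 1);
}
```

Decoding up front keeps the inner loop simple; an implementation could instead walk both strings lazily with front and popFront, at the cost of more bookkeeping.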

Regarding defining an entirely new struct for strings, I think that's a 
sensible approach. With the new operators in tow, a UString (universal 
string) type that traffics in dchar and treats the representation as a 
detail would be easy to implement. It could even have mutable elements 
at dchar granularity. My feeling, however, is that at this point too 
much toothpaste is out of the tube for that to happen in D2. That cost 
would be worth paying if current strings were unbearable, but I think 
they're working very well.
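
A rough sketch of what such a UString might look like, under loudly stated assumptions: the name is only the one floated above, storage is assumed to be UTF-8, and indexing is O(n) because it walks code points. It shows both properties mentioned: callers see only dchar, and assignment works at dchar granularity even when the replacement needs a different number of code units.

```d
import std.range.primitives : front, walkLength;
import std.utf : encode, stride;

// Hypothetical "universal string" sketch; not a Phobos type. Callers only
// ever see dchar elements, and the UTF-8 buffer stays a private detail.
struct UString
{
    private char[] data;

    this(string s) { data = s.dup; }

    // Number of code points (not code units); O(n).
    size_t length() const { return walkLength(data[]); }

    // Byte offset of the i-th code point, found by walking code points; O(n).
    private size_t offsetOf(size_t i) const
    {
        size_t off = 0;
        foreach (_; 0 .. i)
            off += stride(data, off);
        return off;
    }

    // Read access at dchar granularity.
    dchar opIndex(size_t i) const
    {
        return data[offsetOf(i) .. $].front;
    }

    // Write access at dchar granularity: the replacement may need more or
    // fewer code units than the original, so the buffer is re-spliced.
    void opIndexAssign(dchar c, size_t i)
    {
        immutable off = offsetOf(i);
        immutable oldWidth = stride(data, off);
        char[4] buf;
        immutable newWidth = encode(buf, c);
        data = data[0 .. off] ~ buf[0 .. newWidth] ~ data[off + oldWidth .. $];
    }

    string toString() const { return data.idup; }
}

void main()
{
    auto u = UString("caf\u00E9");
    assert(u.length == 4);           // four code points in five code units
    assert(u[3] == '\u00E9');
    u[3] = 'e';                      // two code units become one
    assert(u.toString() == "cafe");
    u[0] = '\u00C7';                 // one code unit becomes two
    assert(u.toString() == "\u00C7afe");
}
```

The linear-time indexing is the price of hiding the representation in this sketch; a fuller design would likely cache offsets or avoid promising cheap random access.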


Andrei



More information about the Digitalmars-d mailing list