dchar unicode phobos

Sean Kelly sean at f4.ca
Wed Jun 7 11:53:58 PDT 2006


Oskar Linde wrote:
> Sean Kelly skrev:
> 
>> Eventually however, I plan to add split, join, etc.  These will 
>> probably all assume fixed-width elements, with improved support for 
>> char and wchar strings in a std.string module, as supporting variable 
>> width encoding will slow down the algorithms.
> 
> It sounds reasonable to avoid any variable length awareness in 
> std.array, but I don't really see how supporting that will make split or 
> join any slower. For instance
> 
> (char[]).split(char)
> (char[]).split(char[])
> (char[]).split(bool delegate(char))
> 
> Aren't affected by variable length encodings.

The most obvious performance issue with variable-width encodings is in 
searching and matching routines, and most routines in std.array 
ultimately rely on searching and matching in some form.  However, I 
wasn't going to go so far as to support type conversion for things like:

     size_t find( char[] str, dchar elem );

Leaving those mixed-width overloads out does help a bit.
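
For illustration only, here is a rough sketch of what such a mixed-width overload would have to do: encode the dchar to UTF-8 first, then search for the resulting bytes. It is written in current D syntax rather than the 2006 dialect. The name findChar and the ptrdiff_t return value with -1 for "not found" are assumptions made for this example, not anything from Phobos or from the planned std.array.

```d
import std.utf : encode;

// Hypothetical helper (not a Phobos function): find a dchar inside a
// UTF-8 string by encoding it first and then comparing raw bytes.
ptrdiff_t findChar(const(char)[] str, dchar elem)
{
    char[4] buf;
    immutable len = encode(buf, elem);  // the extra conversion step
    const(char)[] needle = buf[0 .. len];

    if (needle.length > str.length)
        return -1;

    // Plain byte-wise scan over every possible start offset.
    foreach (i; 0 .. str.length - needle.length + 1)
        if (str[i .. i + needle.length] == needle)
            return cast(ptrdiff_t) i;
    return -1;
}

void main()
{
    // Byte offsets in "blåbär": b=0, l=1, å=2..3, b=4, ä=5..6, r=7
    assert(findChar("blåbär", 'å') == 2);
    assert(findChar("blåbär", 'ä') == 5);
    assert(findChar("blåbär", 'x') == -1);
}
```

The per-call encode step is the kind of cost that the same-type overloads never pay.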

> Only:
> 
> (char[]).split(dchar)
> (char[]).split(bool delegate(dchar))
> 
> are, (by using a dchar foreach over a char[]), but here, the user is 
> explicit about wanting a multi byte implementation. Putting the 
> implementation of the last two versions in std.string gives a neat 
> std.string/std.array separation, but risk confusing the user:
> 
> - Why would "abc".split('a') be in std.array while "abc".split('å') 
> requires std.string?

I had initially thought that std.utf.stride would be required to avoid 
false matches for search routines but have since been told otherwise, so 
there may be no reason for the specialized std.string functions I'd 
mentioned.  I forgot about this bit while writing my last post :-)
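
A small sketch of why stride may not be needed, assuming well-formed UTF-8 in both the text and the search string: lead bytes and continuation bytes occupy disjoint ranges, so a byte-wise match of a complete encoded sequence can only begin on a code point boundary. The helper names findBytes and findWithStride are made up for this example, and the code uses current D syntax rather than the 2006 dialect.

```d
import std.stdio : writeln;
import std.utf : stride;

// Hypothetical helper: byte-wise search that knows nothing about UTF-8.
ptrdiff_t findBytes(const(char)[] haystack, const(char)[] needle)
{
    if (needle.length > haystack.length)
        return -1;
    foreach (i; 0 .. haystack.length - needle.length + 1)
        if (haystack[i .. i + needle.length] == needle)
            return cast(ptrdiff_t) i;
    return -1;
}

// Hypothetical helper: same search, but only tries code point
// boundaries, stepping with std.utf.stride.
ptrdiff_t findWithStride(const(char)[] haystack, const(char)[] needle)
{
    size_t i = 0;
    while (i + needle.length <= haystack.length)
    {
        if (haystack[i .. i + needle.length] == needle)
            return cast(ptrdiff_t) i;
        i += stride(haystack, i);
    }
    return -1;
}

void main()
{
    string text = "smörgåsbord på bordet";
    foreach (needle; ["å", "ö", "bord", "på", "x"])
    {
        immutable a = findBytes(text, needle);
        immutable b = findWithStride(text, needle);

        // Both searches agree on the first match (or on no match).
        assert(a == b);

        // A byte-wise hit never starts on a continuation byte (10xxxxxx).
        if (a >= 0)
            assert((text[cast(size_t) a] & 0xC0) != 0x80);
    }
    writeln("byte-wise and stride-based searches agree");
}
```

This only holds when the needle is itself a complete, valid UTF-8 sequence; searching for a lone continuation byte would of course match mid-character.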


Sean


