std.string.reverse() for mutable array of chars
Jonathan M Davis
jmdavisProg at gmx.com
Fri Dec 9 03:05:19 PST 2011
On Friday, December 09, 2011 05:58:40 bearophile wrote:
> Jonathan M Davis:
> > And as I explained in bug #7085, reverse's behavior with regard to
> > dchar[] is completely correct. It's reversing the code points, _not_
> > the graphemes.
> OK. Maybe I will open a differently worded enhancement request, for a
> grapheme-aware std.string.
> > If you want to reverse a char[], then cast it to ubyte[] and reverse
> > that. If you want to reverse a wchar[], then cast it to ushort[] and
> > reverse that. In Phobos, strings are ranges of dchar, so reverse is
> > going to reverse code points. If you want it to reverse code units
> > instead, then you just use the appropriate cast. There's no reason to
> > have it reverse the code units and completely mess up Unicode strings.
>
> I am not interested in reversing code units. Sorry if my post has led to
> this wrong idea. For this specific problem I am not going to cast to
> ubyte[] or ushort[] because it gives very wrong results.
>
> It's possible to write a "correct" reverse (one that doesn't take
> graphemes into account) without casts, keeping the array as char[] or
> wchar[]: reverse the code units, then reverse the code units of each
> variable-length code point. This is what I was asking for from an
> in-place reverse().
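
For reference, the two-pass approach described above can be sketched roughly
as follows for char[]. This is only an illustration: reverseCodePoints is a
hypothetical helper, not a Phobos function, and it assumes the array holds
valid UTF-8. It still reverses code points, not graphemes.

```d
import std.algorithm : reverse;

// Hypothetical helper (not part of Phobos): reverses the code points of a
// UTF-8 array in place. Assumes s holds valid UTF-8.
void reverseCodePoints(char[] s)
{
    // View the same memory as raw code units so that reverse() swaps
    // individual bytes instead of decoding to dchar.
    auto bytes = cast(ubyte[]) s;

    // Pass 1: reverse all code units. Every multi-byte sequence now has
    // its continuation bytes first and its lead byte last.
    reverse(bytes);

    // Pass 2: put the bytes of each sequence back in their original order.
    size_t start = 0;
    while (start < bytes.length)
    {
        size_t end = start;
        // Continuation bytes have the bit pattern 10xxxxxx; skip them
        // until the lead byte (or an ASCII byte) is reached.
        while (end + 1 < bytes.length && (bytes[end] & 0xC0) == 0x80)
            ++end;
        reverse(bytes[start .. end + 1]);
        start = end + 1;
    }
}

unittest
{
    char[] s = "h\u00E9llo\u2192".dup;   // 'é' is 2 bytes, '→' is 3 bytes
    reverseCodePoints(s);
    assert(s == "\u2192oll\u00E9h");
}
```

The two passes can also be done in the opposite order (fix up each sequence
first, then reverse the whole array); the result is the same.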
I don't expect that std.string will _ever_ be grapheme-aware or that strings
will be processed by default as ranges of graphemes. That would be far too
expensive in terms of performance. Rather, we're likely to have a wrapper
and/or separate range type which handles graphemes. Then if you want the extra
correctness and are willing to pay the cost, you use that. As I understand it,
std.regex does have the beginnings of such support, but we still need a range
type of some variety (probably in std.utf) which fully supports graphemes.
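
As a rough sketch of what using such an opt-in range could look like, the
following assumes a Phobos release whose std.uni provides byGrapheme and
byCodePoint. That is an assumption on top of what is discussed here, and
reverseGraphemes is a hypothetical helper name. It allocates an array of
graphemes, which is the kind of cost you would only pay when you need the
extra correctness.

```d
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme;

// Hypothetical helper: returns a new string with the graphemes of s in
// reverse order. Assumes std.uni offers byGrapheme and byCodePoint.
string reverseGraphemes(string s)
{
    return s.byGrapheme   // split into grapheme clusters
            .array        // materialize so the sequence can be walked backwards
            .retro        // iterate the graphemes in reverse
            .byCodePoint  // flatten back into a range of dchar
            .text;        // encode as a string
}

unittest
{
    // 'e' followed by a combining diaeresis (U+0308) stays together.
    assert(reverseGraphemes("noe\u0308l") == "le\u0308on");
}
```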
- Jonathan M Davis