string types: const(char)[] and cstring

Tue May 29 17:47:31 PDT 2007

Marcin Kuszczak Wrote:
> Regan Heath wrote:
> 
> > Marcin Kuszczak Wrote:
> >> Regan Heath wrote:
> >> 
> >> > The default language/library support can reverse utf8 and 16 but it's
> >> > not ideal, eg.  convert to utf32, reverse, convert back. ;)
> >> > 
> >> > Regan
> >> 
> >> I am not sure what do you mean with this sentence...
> >> 
> >> dstring implementation doesn't do things according to your description,
> >> so it's definitely not a case here...
> > 
> > I'm lost, what is "dstring"?
> > 
> > All I meant was that using std.utf you can say:
> > 
> > char[] text = "<characters which take more than 1 char to represent>";
> > 
> > text = toUTF8(toUTF32(text).reverse);
> > 
> > and the result will be a correctly reversed UTF8 string.  Or am I missing
> > something?
> > 
> > Regan Heath
> 
> dstring is implementation of string struct by Chris Miller which takes care
> about slicing utf8 sequences and is compatible with char[], wchar[] and
> dchar[]. I mentioned it because I think that it's better when foreach know
> nothing about slicing utf8 sequence (opposite to way it is implemented
> currently). It should be responsibility of string class (like e.g. dstring)
> with proper opApply method. Because my previous e-mail was in context of
> dstring, I haven't understood what did you mean... 'reverse' and 'sort'
> could be also implemented in such class in a way which will cope properly
> with utf8 sequences...

Ahh, thanks, that clears up the confusion I had.  Yes, a string class/struct could definitely handle the codepoint issue.  It would also handle it better than the method I suggested, which is a brute force method resting on an assumption that may prove to be false.  (I suspect toUTF32 converts UTF8 and UTF16 to non-compound UTF32 in all cases, but I could be wrong.)
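
For reference, here is a minimal sketch of the round trip described above.  It is written against present-day Phobos, which is an assumption on my part: std.algorithm's reverse stands in for the built-in .reverse array property used earlier, and the function name reverseByCodePoint is made up for illustration.  It reverses whole code points, so multi-byte UTF8 sequences survive intact.  It does not merge a base character with a following combining mark, so the assumption about "non-compound" output is worth testing rather than trusting.

```d
import std.algorithm.mutation : reverse;
import std.utf : toUTF8, toUTF32;

// Hypothetical helper: reverse a UTF-8 string one code point at a time.
string reverseByCodePoint(const(char)[] text)
{
    // Decode to UTF-32 so that every array element is one whole code point.
    // toUTF32 returns an immutable dstring, so take a mutable copy.
    dchar[] points = toUTF32(text).dup;

    // In-place reversal of the code point array.
    reverse(points);

    // Re-encode as UTF-8.
    return toUTF8(points);
}
```

With a precomposed character, reverseByCodePoint("h\u00E9llo") gives "oll\u00E9h".  With a combining sequence such as "e\u0301x" the result is "x\u0301e": the accent now follows the x instead of the e, which is exactly the case a smarter string type would need to cover.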

But to respond to your original point (which I didn't address earlier, sorry) I have no problem with the foreach behaviour:

char[] text = "<compound characters>";
foreach(dchar c; text) { .. }

because, I suspect, the code which handles this is in std.utf (toUTF32) already.  You seem to want to move the behaviour to a string class, but why can't it exist in both places?
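
To make "both places" concrete, here is a rough sketch, assuming present-day Phobos.  The struct Utf8Text and the three counting functions are hypothetical names invented for this example; this is not Chris Miller's dstring.  The built-in foreach decodes when the loop variable is declared dchar, and the struct gets the same effect from an opApply that calls std.utf.decode.

```d
import std.utf : decode;

// Hypothetical wrapper type; a stand-in for a real string struct.
struct Utf8Text
{
    const(char)[] data;

    // Lets foreach iterate over decoded code points.
    int opApply(scope int delegate(dchar) dg)
    {
        const(char)[] s = data;
        size_t i = 0;
        while (i < s.length)
        {
            // decode reads one UTF-8 sequence and advances i past it.
            dchar c = decode(s, i);
            if (auto result = dg(c))
                return result; // the loop body asked to stop (break/return)
        }
        return 0;
    }
}

// Counts raw UTF-8 code units: no decoding happens with a char loop variable.
size_t countCodeUnits(const(char)[] text)
{
    size_t n;
    foreach (char c; text)
        ++n;
    return n;
}

// Counts code points using the decoding built into foreach.
size_t countWithBuiltin(const(char)[] text)
{
    size_t n;
    foreach (dchar c; text)
        ++n;
    return n;
}

// Counts code points using the struct's opApply instead.
size_t countWithStruct(const(char)[] text)
{
    auto wrapped = Utf8Text(text);
    size_t n;
    foreach (dchar c; wrapped)
        ++n;
    return n;
}
```

For "h\u00E9llo" the code unit count is 6 and both code point counts are 5, so the two approaches agree and can coexist.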

I guess the problem you might have with it is that it effectively says to someone implementing a D compiler:  you need to handle conversions between UTF8, UTF16 and UTF32 and (assuming I am correct about toUTF32) you need to convert UTF8 and UTF16 to non-compound UTF32.

That might make it harder for someone to implement a D compiler.  I don't know.

Regan Heath