Proposal for fixing dchar ranges

Chris Williams yoreanon-chrisw at yahoo.co.jp
Mon Mar 10 14:51:37 PDT 2014


On Monday, 10 March 2014 at 18:13:14 UTC, Steven Schveighoffer 
wrote:
> Indexing is rarely a feature one needs or should use, 
> especially with encoded strings.

If I were writing something like a chat or terminal window, I 
would want to be able to jump to chunks of text based on some 
sort of buffer length, then search for actual character 
boundaries. Similarly, if I were indexing text, I wouldn't care 
what the underlying data is, just whether any particular 
sequence of n bytes has been seen together in some document. 
For the latter case, I don't need to be able to interpret the 
data as text while indexing, but once I perform an actual search 
and want to jump the user to that line in the file, being able 
to take a byte offset that I had stored in the index and convert 
that to a textual position would be good.
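
To make the "jump by byte offset, then find a character boundary" 
idea concrete, here is a rough sketch that assumes the buffer is 
UTF-8. The function name alignToCodePoint is made up for 
illustration and is not something in Phobos.

```d
import std.stdio : writeln;

/// Hypothetical helper: returns the largest index <= offset at which
/// a UTF-8 sequence starts, or buf.length if offset is past the end.
size_t alignToCodePoint(const(char)[] buf, size_t offset)
    pure nothrow @nogc @safe
{
    if (offset >= buf.length)
        return buf.length;
    // UTF-8 continuation bytes all have the bit pattern 10xxxxxx,
    // so step backwards until we are no longer on one.
    while (offset > 0 && (buf[offset] & 0xC0) == 0x80)
        --offset;
    return offset;
}

void main()
{
    string text = "naïve café";   // 'ï' and 'é' take two bytes each
    // Offset 3 lands in the middle of 'ï', which starts at index 2.
    writeln(alignToCodePoint(text, 3));
    // The slice from an aligned offset is valid text again.
    writeln(text[alignToCodePoint(text, 3) .. $]);
}
```

This only finds code-point boundaries. Grapheme boundaries would 
need something on top of it, since a combining mark is a separate 
code point from the character it modifies.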

I do think that D should have something like

alias String8 = UTF!char;
alias String16 = UTF!wchar;
alias String32 = UTF!dchar;

And that those sit on top of an underlying immutable(xchar)[] 
buffer, providing variants of things like foreach and length 
based on code-point or grapheme boundaries. But I don't think 
there's any value in reinterpreting "string". Not being a struct 
or an object, it doesn't have the extensibility to be useful for 
all the variations of access that working with Unicode and the 
underlying bytes warrants.
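
As a very rough sketch of what I mean by UTF!char and friends 
(nothing like this exists in Phobos as far as I know, and the 
member names codePointLength, graphemeLength, codePoints and 
graphemes are placeholders I picked for the example), a thin 
struct over the raw buffer could look like this. It leans on 
std.utf.byDchar and std.uni.byGrapheme to do the actual decoding.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byDchar;

/// Hypothetical wrapper: keeps the raw code units as they are and
/// offers views of them at different levels.
struct UTF(C)
    if (is(C == char) || is(C == wchar) || is(C == dchar))
{
    immutable(C)[] data;   // underlying buffer, never reinterpreted

    /// Length in code units (bytes for char).
    @property size_t length() { return data.length; }

    /// Length in code points; walks the whole buffer.
    @property size_t codePointLength() { return data.byDchar.walkLength; }

    /// Length in graphemes; walks the whole buffer.
    @property size_t graphemeLength() { return data.byGrapheme.walkLength; }

    /// Range of dchar, for foreach over code points.
    auto codePoints() { return data.byDchar; }

    /// Range of std.uni.Grapheme, for foreach over graphemes.
    auto graphemes() { return data.byGrapheme; }
}

alias String8  = UTF!char;
alias String16 = UTF!wchar;
alias String32 = UTF!dchar;

void main()
{
    // 'e' followed by U+0301 (combining acute accent).
    auto s = String8("e\u0301");
    assert(s.length == 3);            // 1 byte + 2 bytes in UTF-8
    assert(s.codePointLength == 2);   // two code points
    assert(s.graphemeLength == 1);    // one user-visible character

    size_t n;
    foreach (dchar c; s.codePoints)
        ++n;
    assert(n == 2);
}
```

Being a struct, it can grow whatever other access methods turn 
out to be needed, and the plain immutable(xchar)[] is still 
there in data for code that only cares about the bytes.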

