[phobos] UTF-8 string slicing

Sun Aug 21 21:00:26 PDT 2011

On 08/19/2011 08:40 PM, Jonathan M Davis wrote:
> On Friday, August 19, 2011 19:58:34 Benjamin Shropshire wrote:
>> On 08/18/2011 02:21 AM, unDEFER wrote:
>>> Hello!
>>>
>>> The D language specification says that it supports UTF-8 strings, but I
>>> can't find how to slice a UTF-8 string by character index rather than by
>>> byte offset. Why is there no simple slice function in std.utf, like the
>>> attached code?
>> BTW: your code is flawed. Feed it some of the stuff near the end of this
>> post and it will fail:
>>
>> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
>>
>> tl;dr: your code doesn't slice on characters but on something called (IIRC)
>> code points. If you start worrying about diacritics (and many end users
>> will want you to), you need to do a bunch more processing.
>>
>> http://en.wikipedia.org/wiki/Diacritic
> His code works as well as slicing a dstring does - save for the efficiency
> issues. There is no way in Phobos at present to deal with graphemes. All of
> the string processing in Phobos deals with code points. For the most part,
> this works great, but it is true that it isn't complete. I expect that we'll
> get grapheme support eventually (I believe that Dmitry has done some work on a
> grapheme range for the updates that he's been doing to std.regex for GSoC, so
> we may get it from there). But for now, none of the string processing in D
> worries about graphemes - just code points.

My thought on that subject is: I can see good reason to index on proper
characters (get the 4th char in the word), good reason to index to a
character (or sometimes a code point) near some byte position, and there
are clearly good reasons to iterate through code points. But I don't see
much value in asking for a random Nth code point: anything it gets you can
be had via something that has fewer problems and/or is cheaper.
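
To make that concrete, here is a rough sketch (not the attached code from
the original post, which isn't reproduced here). It assumes std.utf.toUTFindex
and std.range.walkLength; the helper names sliceCodePoints and codePointStart
are made up for illustration. It shows that slicing by code point index costs
a walk from the start of the string and can still split an accent off its
base letter, while snapping a byte position to a code point boundary is cheap.

```d
import std.range : walkLength;
import std.utf : toUTFindex;

// Hypothetical helper: slice by code point indices [from, to).
// Each toUTFindex call walks the string from the start, so this is O(n).
string sliceCodePoints(string s, size_t from, size_t to)
{
    return s[s.toUTFindex(from) .. s.toUTFindex(to)];
}

// Hypothetical helper: move a byte offset back to the start of the code
// point that contains it. UTF-8 continuation bytes look like 10xxxxxx.
size_t codePointStart(string s, size_t i)
{
    while (i > 0 && i < s.length && (s[i] & 0xC0) == 0x80)
        --i;
    return i;
}

void main()
{
    // 'e' + U+0301 COMBINING ACUTE ACCENT: one visible character,
    // two code points, three UTF-8 code units.
    string s = "e\u0301tude";

    assert(s.length == 7);      // code units (bytes)
    assert(s.walkLength == 6);  // code points

    // Taking "the first code point" splits the accent off its base letter.
    assert(sliceCodePoints(s, 0, 1) == "e");
    assert(sliceCodePoints(s, 0, 2) == "e\u0301");

    // Byte 2 is in the middle of U+0301 (bytes 1 and 2); back up to byte 1.
    assert(codePointStart(s, 2) == 1);
    assert(codePointStart(s, 3) == 3);
}
```

Getting the slice that a user would actually call "the first character"
needs grapheme-aware processing on top of this, which is the part Phobos
doesn't have yet.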

> - Jonathan M Davis