[phobos] UTF-8 string slicing

unDEFER undefer at gmail.com
Fri Aug 19 10:24:41 PDT 2011


Maybe it is so.
We have 3 ways to slice a UTF-8 string by character index:

string substr = str[str.toUTFindex(from) .. str.toUTFindex(to)];
// toUTFindex, not toUCSindex as Sean Kelly wrote
or
string substr = toUTF8(array(takeExactly(drop(str, from), to - from)));
or
string substr = toUTF8(toUTF32(str)[from..to]);
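
To compare the three variants side by side, here is a minimal sketch. The
function names sliceByIndex, sliceByRange and sliceByUTF32 are made up for
this example and are not part of Phobos. I also assume that the element
count in the second variant is meant to be "to - from", and that from and
to are code point indices with from <= to.

```d
import std.array : array;
import std.range : drop, takeExactly;
import std.utf : toUTF8, toUTF32, toUTFindex;

// Hypothetical helper: convert both code point indices to byte offsets,
// then take an ordinary (byte-level) slice. No new string is allocated.
string sliceByIndex(string str, size_t from, size_t to)
{
    return str[str.toUTFindex(from) .. str.toUTFindex(to)];
}

// Hypothetical helper: walk the string as a range of dchar, skip `from`
// code points, take `to - from` of them, and re-encode as UTF-8.
string sliceByRange(string str, size_t from, size_t to)
{
    return toUTF8(array(takeExactly(drop(str, from), to - from)));
}

// Hypothetical helper: convert the whole string to UTF-32, slice it by
// index, and convert the piece back to UTF-8.
string sliceByUTF32(string str, size_t from, size_t to)
{
    return toUTF8(toUTF32(str)[from .. to]);
}

void main()
{
    string str = "привет, мир"; // 11 code points, 20 bytes in UTF-8

    // All three variants select code points 2 .. 6.
    assert(sliceByIndex(str, 2, 6) == "ивет");
    assert(sliceByRange(str, 2, 6) == "ивет");
    assert(sliceByUTF32(str, 2, 6) == "ивет");

    // The plain slice works on bytes: bytes 2 .. 6 are a different piece.
    assert(str[2 .. 6] == "ри");
}
```

The first variant only computes two byte offsets and returns a slice of
the original string, while the other two build a new string.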

But in any case, the documentation must be clearer on this point. I have
been studying the D language documentation for 3 days, and I still don't
understand from it what "UTF-8 support" means.
Right now I can't tell how to distinguish the functions that work with
strings at the level of UTF-8 characters from the functions that work at
the level of bytes.
The fact that the following code
----
writeln( arr.length );
arr.popFront();
writeln( arr.length );
----
prints 10 and then 9 for any other array, but may print 10 and then 8 or
less for UTF-8 and UTF-16 strings, seems too confusing to me.
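
To make this concrete, here is a small sketch. The helper
lengthDropAfterPopFront is a made-up name for this example, not a Phobos
function. It assumes a non-empty array and relies on the behaviour that
popFront decodes a whole code point for char and wchar strings, while
.length counts code units.

```d
import std.range : popFront, walkLength;

// Hypothetical helper: by how much does .length shrink after one
// popFront()? The slice is passed by value, so the caller's array is
// not modified. The array must not be empty.
size_t lengthDropAfterPopFront(T)(T[] arr)
{
    immutable before = arr.length;
    arr.popFront();
    return before - arr.length;
}

void main()
{
    // Ordinary arrays lose exactly one element.
    assert(lengthDropAfterPopFront([1, 2, 3]) == 1);

    // ASCII text: one code point is one byte.
    assert(lengthDropAfterPopFront("abc") == 1);

    // Each of these characters takes 3 bytes in UTF-8,
    // so .length drops by 3 after a single popFront().
    assert(lengthDropAfterPopFront("日本語") == 3);

    // The same text as UTF-16 and UTF-32: one code unit per character.
    assert(lengthDropAfterPopFront("日本語"w) == 1);
    assert(lengthDropAfterPopFront("日本語"d) == 1);

    // .length counts code units, walkLength counts code points.
    assert("日本語".length == 9);
    assert("日本語".walkLength == 3);
}
```

So for strings, .length answers "how many bytes (code units)" and
walkLength answers "how many characters (code points)".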

On Fri, 19 Aug 2011 18:38:21 +0400, SHOO <zan77137 at nifty.com> wrote:

> I agree. A special syntax is unnecessary.
> I usually work with Japanese text, and slicing UTF-8 strings has not
> been a problem for me.
> When slicing by character really is necessary, it is effective to use
> dstring (UTF-32).
>
> When I slice a UTF-8 string that contains multi-byte characters,
> the delimiter is an ASCII character in most cases.
> Otherwise it is rather specialized processing (e.g. regex), so I do not
> think a special syntax is needed.
>
> 2011/8/19 Walter Bright <walter at digitalmars.com>:
>>
>>
>> Sean Kelly wrote:
>>>
>>> I need to do this from time to time, but I generally just do something
>>> like:
>>>
>>> buf[0 .. buf.toUCSindex(n)]
>>>
>>> A shorthand might be nice though, I suppose.
>>>
>>>
>>
>> Somewhat surprisingly, such a function is rarely needed (I've never
>> needed it in working with UTF8)
>> and so I don't think a special syntax for it is justified.
>> _______________________________________________
>> phobos mailing list
>> phobos at puremagic.com
>> http://lists.puremagic.com/mailman/listinfo/phobos

-- 
registered Linux user #360474
Don't worry, I can read OpenOffice.org

