[phobos] UTF-8 string slicing
Walter Bright
walter at digitalmars.com
Sat Aug 20 07:59:19 PDT 2011
unDEFER wrote:
> On Sat, 20 Aug 2011 06:49:33 +0400, Walter Bright
> <walter at digitalmars.com> wrote:
>
>> There isn't any getting away from understanding that UTF-8 is a
>> multi-byte encoding.
>
> If it is so, then arr.popFront() must break UTF-8 strings ;-)
>
>> If you want to use an encoding with a 1:1 correspondence between
>> indices and characters, use dchar encoding.
>
> For me, using four times as much memory for ASCII seems too wasteful, sorry.
Exactly - all I'm saying is that if you want the benefits of UTF-8 - low
memory consumption *and* high speed processing - you have to be cognizant
of its underlying storage scheme. To get a higher level of "I don't care
how it is stored, I just want to pretend it's an array of Unicode
characters", you'll have to give up speed, low memory consumption, or
both.
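
As an illustration of that trade-off, here is a minimal sketch in D. The
helper names utf8Bytes and utf32Bytes are hypothetical, introduced only
for this example. It assumes the Phobos range primitives for strings
(front, popFront, walkLength, reachable through std.range), which decode
UTF-8 and so step over whole code points, while .length and slicing
count code units (bytes). The source file is assumed to be saved as
UTF-8.

```d
import std.range; // front, popFront and walkLength for strings

// Hypothetical helper: bytes used to store s as UTF-8 (string).
size_t utf8Bytes(string s)
{
    return s.length * char.sizeof;
}

// Hypothetical helper: bytes the same text would use as UTF-32 (dstring).
size_t utf32Bytes(string s)
{
    return s.walkLength * dchar.sizeof;
}

void main()
{
    string s = "héllo";          // 'é' takes two UTF-8 code units

    assert(s.length == 6);       // .length counts code units, not characters
    assert(s.walkLength == 5);   // a decoding walk counts code points
    assert(utf8Bytes(s) == 6);
    assert(utf32Bytes(s) == 20); // four bytes per code point

    // For pure ASCII, UTF-32 costs exactly four times as much.
    assert(utf32Bytes("hello") == 4 * utf8Bytes("hello"));

    // popFront decodes, so it never stops in the middle of a code point.
    string r = s;
    r.popFront();                // drops 'h' (one byte)
    assert(r.front == 'é');
    r.popFront();                // drops 'é' (two bytes)
    assert(r == "llo");

    // Slicing takes byte indices; the caller must pick valid boundaries.
    assert(s[1 .. 3] == "é");
}
```

The slice at the end is valid only because indices 1 and 3 happen to fall
on code point boundaries; a slice such as s[1 .. 2] would cut 'é' in
half, which is the storage detail one has to stay aware of.
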
>
> Walter, I really like your creation. It is great. Thank you very much
> for it!
> I really believe that there are no bugs, only undocumented features ;-)
> I just want to say that the documentation does not give enough
> information. The std.range and std.array documentation says nothing
> about their behaviour on UTF-8 strings.
> I already read the source code to find out what each function really
> does. Open Source is really great :-)
>
I agree, open source can make up for gaps in the documentation.