[phobos] UTF-8 string slicing

Walter Bright walter at digitalmars.com
Sat Aug 20 07:59:19 PDT 2011



unDEFER wrote:
> On Sat, 20 Aug 2011 06:49:33 +0400, Walter Bright 
> <walter at digitalmars.com> wrote:
>
>> There isn't any getting away from understanding that UTF-8 is a 
>> multi-byte encoding.
>
> If it is so, then arr.popFront() must break UTF-8 strings ;-)
>
>> If you want to use an encoding with a 1:1 correspondence between 
>> indices and characters, use dchar encoding.
>
> To me, using 4 times more memory for ASCII seems too wasteful, sorry.

Exactly. All I'm saying is that if you want the benefits of UTF-8 (low 
memory consumption *and* high speed processing), you have to be cognizant 
of its underlying storage scheme. To get the higher-level view of "I 
don't care how it is stored, I just want to pretend it's an array of 
Unicode characters", you'll have to give up speed, low memory 
consumption, or both.
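
A minimal sketch of that trade-off, assuming a current Phobos in which 
the array range primitives live in std.range.primitives (at the time of 
this post they were in std.array). The sample string is made up for 
illustration:

```d
import std.range.primitives : front, popFront, walkLength;
import std.utf : stride;

void main()
{
    // U+00E9 takes 2 bytes in UTF-8, so these 4 characters occupy 5 code units.
    string s = "h\u00E9ll";
    assert(s.length == 5);       // .length counts UTF-8 code units (bytes)
    assert(s.walkLength == 4);   // walking the range counts code points

    // Indexing and slicing operate on code units, so slice on sequence boundaries.
    size_t n = stride(s, 1);     // byte length of the sequence starting at index 1
    assert(n == 2);
    assert(s[1 .. 1 + n] == "\u00E9");

    // popFront on a string decodes, so it steps over a whole sequence at once.
    string r = s;
    r.popFront();                // drops 'h' (1 byte)
    assert(r.front == '\u00E9');
    r.popFront();                // drops both bytes of U+00E9
    assert(r == "ll");

    // dstring gives 1:1 indexing, at 4 bytes per character.
    dstring d = "h\u00E9ll"d;
    assert(d.length == 4);
    assert(d[1] == '\u00E9');
    assert(d.length * dchar.sizeof == 16);
}
```

The string version stores the text in 5 bytes but needs stride or popFront 
to find character boundaries; the dstring version indexes directly but 
uses 16 bytes for the same text.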

>
> Walter, I really like your creation very much. It is great. Thank you 
> very much for it!
> I really believe that there are no bugs, only undocumented features ;-)
> I just want to say that the documentation does not give enough 
> information. The std.range and std.array documentation doesn't say 
> anything about their behaviour on UTF-8 strings.
> I already read the source code to find out what each function really does. 
> Open Source is really great :-)
>

I agree, open source can make up for gaps in the documentation.

