Operating with substrings in strings

Oskar Linde olREM at OVEnada.kth.se
Fri Aug 18 15:29:36 PDT 2006


Frank Benoit wrote:

> 
>> Slicing:
>> 
>> char[] h = "hello";
>> char[] sub = h[1..3]; // Slice the string "hello"
>> writefln(sub); // Prints "el"
>> 
>> http://digitalmars.com/d/arrays.html#slicing
>> 
> 
> I do not know much about UTF-8, and I am often not sure whether I do
> string processing right. Can someone enlighten me?
> 
> If I have
> char[] str = ... some multibyte UTF-8 chars;
> 
> What does str.length give me? The number of bytes, or the number of
> characters, found by checking which characters are multi-byte?

The number of bytes (UTF-8 code units), not the number of characters.
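
To illustrate the difference, here is a minimal sketch, assuming a current D2 compiler (hence `string` rather than `char[]` for the literal). The helper name countCodePoints is made up for this example; it relies only on the language feature that a foreach over a char array with a dchar loop variable decodes one code point per iteration.

```d
import std.stdio : writeln;

// Hypothetical helper (not a library function): counts code points by
// letting foreach decode the UTF-8 sequence into dchar values.
size_t countCodePoints(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s)   // one iteration per decoded code point
        ++n;
    return n;
}

void main()
{
    string str = "h\u00E9llo";        // U+00E9 takes two bytes in UTF-8
    writeln(str.length);              // prints 6: bytes (code units)
    writeln(countCodePoints(str));    // prints 5: code points
}
```

Note that the count is a linear scan, while .length is a stored value.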

> 
> If I do some slicing (str[3..4]), do the indices slice at these byte
> positions, so that I risk destroying the string, or does it look at
> the characters to find the start of the third UTF-8 character?

The indices are byte positions. And you are correct: you risk splitting in
the middle of a UTF-8 code sequence, which makes the string invalid.
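
One possible way to slice on character boundaries instead is sketched below, assuming a current D2 compiler and Phobos, where std.utf provides toUTFindex (code point index to byte index) and validate (throws UTFException on malformed input). The helpers sliceCodePoints and isValidUtf8 are names invented for this example, not library functions.

```d
import std.stdio : writeln;
import std.utf : toUTFindex, validate, UTFException;

// Hypothetical helper: slice by code point positions instead of byte
// positions, translating each index with std.utf.toUTFindex.
const(char)[] sliceCodePoints(const(char)[] s, size_t from, size_t to)
{
    return s[toUTFindex(s, from) .. toUTFindex(s, to)];
}

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);   // throws UTFException on a malformed sequence
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    string str = "h\u00E9llo";  // bytes: 'h', 0xC3, 0xA9, 'l', 'l', 'o'

    // Byte slice that cuts the two-byte sequence in half.
    writeln(isValidUtf8(str[1 .. 2]));                 // prints false

    // Code point slice covering the same character.
    writeln(isValidUtf8(sliceCodePoints(str, 1, 2)));  // prints true
}
```

Each index translation scans the string from the start, so this costs more than a plain byte slice.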

> 
> Or did I miss something completely?

Not as far as I can tell. :)

/Oskar


