Operating with substrings in strings

Derek Parnell derek at psyc.ward
Fri Aug 18 18:17:24 PDT 2006


On Fri, 18 Aug 2006 22:03:49 +0200, Frank Benoit wrote:

>> Slicing:
>> 
>> char[] h = "hello";
>> char[] sub = h[1..3]; // Slice the string "hello"
>> writefln(sub); // Prints "el"
>> 
>> http://digitalmars.com/d/arrays.html#slicing
>> 
> 
> I do not know much about UTF8. And I am often not sure if I do string
> processing right. Can someone enlighten me?
> 
> If I have
> char[] str = ... some multibyte utf8 chars;
> 
> What does str.length give me? The number of bytes, or the number of
> characters, found by checking which characters are multi-byte?

The number of bytes (UTF-8 code units), not the number of characters.
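
A small sketch of the difference. The sample string is my own, and it
assumes the source file is saved as UTF-8:

  import std.utf;

  void main()
  {
      char[] s = "héllo".dup;          // 'é' takes two bytes in UTF-8
      assert(s.length == 6);           // bytes (UTF-8 code units)
      assert(toUTF32(s).length == 5);  // characters (code points)
  }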
 
> If I do some slicing (str[3..4]), do the indices refer to byte
> positions, so that I risk corrupting the string, or does it look
> at the characters to find the start of the third UTF-8 character?
> 
> Or did I miss something completely?

No, you didn't. Slice indices are byte offsets, so the slicing above is
only guaranteed to fall on character boundaries if the variable contains
nothing but ASCII text. If it doesn't, then you will have to use more
sophisticated methods.

For example:

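  import std.utf;   // toUTF8, toUTF32

  char[] text;      // UTF-8 input
  char[] subtext;

  // Convert to UTF-32 so each array element is one character, slice by
  // character index, then convert the result back to UTF-8.
  subtext = toUTF8(toUTF32(text)[1..3]);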
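
If converting the whole string to UTF-32 is too costly, another option
is to walk the UTF-8 data with std.utf.stride and slice at the byte
offsets it finds. This is only a rough sketch: sliceChars is a name I
made up, not a library function, and it assumes the input is valid
UTF-8.

  import std.utf;

  // Hypothetical helper: returns characters [from, to) of s as a slice
  // of the original UTF-8 data, without copying.
  char[] sliceChars(char[] s, size_t from, size_t to)
  {
      size_t i = 0;   // byte offset into s
      size_t n = 0;   // characters passed so far

      // Skip the first 'from' characters, one whole sequence at a time.
      for (; n < from && i < s.length; n++)
          i += stride(s, i);
      size_t start = i;

      // Advance to the end of character number 'to'.
      for (; n < to && i < s.length; n++)
          i += stride(s, i);

      return s[start .. i];
  }

With text holding "héllo", sliceChars(text, 1, 3) should give "él",
the same as the toUTF32 version above. Both count code points, so
a letter followed by a combining accent still counts as two.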


-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"


