To Walter, about char[] initialization by FF

Oskar Linde oskar.lindeREM at OVEgmail.com
Mon Jul 31 02:50:29 PDT 2006


Serg Kovrov wrote:
> Maybe I missed the point here, correct me if I misunderstood.

You have understood correctly.

> This is how I see the problem with char[] as utf-8 *string*. The length 
> of array of chars is not always count of characters, but rather size of 
> array in bytes. Which makes no sense for me. For that purpose I would 
> like to see separate properties.

Having char[].length return something other than the actual number of 
char units would break its array semantics.

> For example,
> char[] str = "тест";
> word "test" in russian - 4 cyrillic characters, would give you 
> str.length 8, which make no use of this length property if you not sure 
> that string is latin characters only.

It is actually not very often that you need to count the number of 
characters as opposed to the number of (UTF-8) code units. Counting 
characters is also relatively expensive, since it means decoding the 
whole string (O(n)) instead of reading a stored length. The ordinary 
operations (searching, slicing, concatenation, etc.) all operate on 
code units rather than characters.
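To make that concrete, here is a rough sketch of those operations on the example string. It assumes a current D compiler, so it uses string (immutable(char)[]) instead of the char[] from this thread, and unitOffset is just an illustrative name, not a library function:

```d
import std.string : indexOf;

// Hypothetical helper for illustration: offset of `needle` in `haystack`.
ptrdiff_t unitOffset(string haystack, string needle) {
    return haystack.indexOf(needle);   // the index is in UTF-8 code units
}

void main() {
    string s = "тест";                 // 4 Cyrillic letters, 2 code units each
    assert(s.length == 8);             // .length counts code units
    assert(s[2 .. 6] == "ес");         // slice bounds are code-unit offsets
    assert(unitOffset(s, "ст") == 4);  // search returns a code-unit offset
    assert((s ~ "!").length == 9);     // concatenation appends code units
}
```

The offset returned by the search can be fed straight into a slice, which is why none of these operations ever needs the character count.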

It is easy to implement your own character count though:

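size_t count(char[] arr) {
	size_t n = 0;
	// Iterating with a dchar loop variable decodes the UTF-8
	// code units into whole code points, one per iteration.
	foreach (dchar c; arr)
		n++;
	return n;
}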

assert("тест".count() == 4);

Also note that:

assert("тест"d.length == 4);
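
The d suffix makes the literal UTF-32, where every code point is a single code unit. A small sketch of how the code-unit count depends on the encoding, assuming a current D compiler (utf16Units and utf32Units are hypothetical helper names):

```d
import std.conv : to;

// Hypothetical helpers: transcode a UTF-8 string and count the code units.
size_t utf16Units(string s) { return s.to!wstring.length; }
size_t utf32Units(string s) { return s.to!dstring.length; }

void main() {
    // Cyrillic letters: 2 units each in UTF-8, 1 each in UTF-16 and UTF-32.
    assert("тест".length == 8);
    assert(utf16Units("тест") == 4);
    assert(utf32Units("тест") == 4);

    // U+1D11E lies outside the BMP: 4 UTF-8 units, a UTF-16 surrogate
    // pair, and still a single UTF-32 unit.
    assert("𝄞".length == 4);
    assert(utf16Units("𝄞") == 2);
    assert(utf32Units("𝄞") == 1);
}
```

Only the dchar form guarantees that length equals the number of code points, and that costs four bytes per element.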

/Oskar




