To Walter, about char[] initialization by FF

Thomas Kuehne thomas-dloop at kuehne.cn
Mon Jul 31 12:33:35 PDT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Oskar Linde wrote on 2006-07-31:
> Serg Kovrov wrote:

>> For example,
>> char[] str = "тест";
>> the word "test" in Russian, 4 Cyrillic characters, would give you a
>> str.length of 8, which makes the length property useless unless you are
>> sure that the string contains Latin characters only.
>
> It is actually not very often that you need to count the number of 
> characters as opposed to the number of (UTF-8) code units. Counting the 
> number of characters is also a rather expensive operation. All the 
> ordinary operations (searching, slicing, concatenation, sub-string 
> search, etc) operate on code units rather than characters.
>
> It is easy to implement your own character count though:
>
> size_t count(char[] arr) {
> 	size_t n = 0;
> 	foreach(dchar c; arr)	// decodes the UTF-8 code units into code points
> 		n++;
> 	return n;
> }
>
> assert("тест".count() == 4);
>
> Also note that:
>
> assert("тест"d.length == 4);

I hate to be pedantic, but dchar[] can only be used to count the code
points, not the characters. A "character" can be composed of more than
one code point/dchar. This feature is frequently used for accents, marks
and some Asian scripts.
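
To make the distinction concrete, here is a minimal sketch that counts the
same text three ways: code units, code points, and user-perceived
characters (grapheme clusters). It assumes a present-day D compiler with
Phobos' std.uni.byGrapheme and std.range.walkLength, which did not exist
when this thread was written; the helper names countCodePoints and
countGraphemes are made up for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// Counts Unicode code points by decoding the UTF-8 input.
/// (Hypothetical helper name, not part of Phobos.)
size_t countCodePoints(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s)    // foreach decodes code units into dchars
        n++;
    return n;
}

/// Counts grapheme clusters, i.e. user-perceived characters.
/// (Hypothetical helper name, not part of Phobos.)
size_t countGraphemes(const(char)[] s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as one character
    string accented = "e\u0301";
    assert(accented.length == 3);            // UTF-8 code units
    assert(countCodePoints(accented) == 2);  // code points
    assert(countGraphemes(accented) == 1);   // characters

    // Cyrillic "test": every character is a single code point
    string word = "тест";
    assert(word.length == 8);
    assert(countCodePoints(word) == 4);
    assert(countGraphemes(word) == 4);
}
```

For the Cyrillic word the code point count and the character count agree;
they only diverge once combining marks are involved.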

- -> http://www.unicode.org

Thomas


-----BEGIN PGP SIGNATURE-----

iD8DBQFEzmhrLK5blCcjpWoRAnJhAJ0VKD2sD++PkR0hnFfGIAgFxn8OGgCeLg0Y
mp2vyHbFrwExwr3h6/etjWc=
=9RLJ
-----END PGP SIGNATURE-----



More information about the Digitalmars-d mailing list