To Walter, about char[] initialization by FF

Andrew Fedoniouk news at terrainformatica.com
Mon Jul 31 14:33:13 PDT 2006


"Thomas Kuehne" <thomas-dloop at kuehne.cn> wrote in message 
news:ls52q3-3o8.ln1 at birke.kuehne.cn...
>
> Oskar Linde wrote on 2006-07-31:
>> Serg Kovrov wrote:
>
>>> For example,
>>> char[] str = "тест";
>>> (the word "test" in Russian, 4 Cyrillic characters) gives you
>>> str.length == 8, which makes the length property useless unless you
>>> are sure the string contains only Latin characters.
>>
>> It is actually not very often that you need to count the number of
>> characters as opposed to the number of (UTF-8) code units. Counting the
>> number of characters is also a rather expensive operation. All the
>> ordinary operations (searching, slicing, concatenation, sub-string
>> search, etc) operate on code units rather than characters.
>>
>> It is easy to implement your own character count though:
>>
>> size_t count(char[] arr) {
>>     size_t c = 0;
>>     foreach (dchar ch; arr)   // foreach decodes UTF-8 into code points
>>         c++;
>>     return c;
>> }
>>
>> assert("тест".count() == 4);
>>
>> Also note that:
>>
>> assert("тест"d.length == 4);
>
> I hate to be pedantic, but dchar[] can only be used to count code
> points - not characters. A "character" can be composed of more than
> one code point/dchar. This feature is frequently used for accents, marks
> and some Asian scripts.
>
> -> http://www.unicode.org
>
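To make the distinction above concrete, here is a small sketch in present-day D. It assumes a D2 compiler and Phobos, not the 2006 D1 toolchain this thread used. It counts the same strings three ways: UTF-8 code units (length), code points (std.range.walkLength, which decodes a string into dchars), and user-perceived characters (std.uni.byGrapheme). The helper names codeUnits, codePoints and graphemes are hypothetical, chosen only for illustration.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helpers; the names are illustrative, not from the thread.

/// UTF-8 code units: what char[].length reports.
size_t codeUnits(string s) { return s.length; }

/// Code points: walkLength decodes a narrow string into dchars.
size_t codePoints(string s) { return s.walkLength; }

/// User-perceived characters (extended grapheme clusters).
size_t graphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    string word = "тест";   // 4 Cyrillic letters, 2 bytes each in UTF-8
    string a = "A\u0308";   // 'A' followed by U+0308 COMBINING DIAERESIS

    writeln(codeUnits(word), " ", codePoints(word), " ", graphemes(word)); // 8 4 4
    writeln(codeUnits(a), " ", codePoints(a), " ", graphemes(a));          // 3 2 1
}
```

The decomposed "A" + U+0308 reads as a single character, yet it is two code points and three UTF-8 bytes, so neither char[].length nor dchar[].length counts it as one.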


Right, Thomas,

the umlaut can exist as a separate (combining) code point,
so A-with-umlaut can be represented by two code points.
But as far as I remember, the intention was and still is
for Unicode to also include all precomposed forms like "A-with-umlaut",
so you can always "compress" multi-code-point forms into
their single-code-point counterparts.

This way "тест"d.length == 4 will be true -
it just depends on your text parser.

Andrew.
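The "compression" described above is what Unicode calls normalization form C (NFC). A minimal sketch, again assuming a present-day D2 compiler and Phobos' std.uni.normalize rather than the D1 library of the time. The second case is included because NFC composes a sequence only when Unicode defines a precomposed character for it; 'q' followed by U+0308 has none.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : normalize, NormalizationForm;

void main()
{
    // 'A' + U+0308 COMBINING DIAERESIS: two code points.
    string decomposed = "A\u0308";

    // NFC replaces the pair with the precomposed U+00C4 (A with diaeresis).
    string composed = normalize!(NormalizationForm.NFC)(decomposed);
    writeln(decomposed.walkLength, " -> ", composed.walkLength); // 2 -> 1
    assert(composed == "\u00C4");

    // No precomposed 'q' with diaeresis exists, so NFC leaves two code points.
    string noPrecomposed = normalize!(NormalizationForm.NFC)("q\u0308");
    writeln(noPrecomposed.walkLength); // 2
}
```

Whether dchar[].length matches the visible character count after normalization therefore depends on the text being processed.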



> Thomas
>
>





More information about the Digitalmars-d mailing list