The length of strings vs. # of chars vs. sizeof

Sun Nov 1 14:07:18 PST 2009

Rainer Deyke wrote:
> Charles Hixson wrote:
>> I've read and re-read the documentation, but I can't decide whether a
>> UTF-8 character that takes multiple bytes to express counts as one or
>> multiple values in length and sizeof.  Sizeof seems to presume that all
>> entries are the same length, but otherwise it seems to be the property I
>> need.  (I suppose that I could just enter a string that I know is
>> multi-byte chars, but it sure would be better if I could find out from
>> the documentation.)  I'm pretty certain that it just counts as one
>> character for indexing, so length would almost need to also count the
>> number of characters rather than bytes.
> 
> Strings are just arrays of code units.  Their length is the number of
> elements (i.e. code units) they contain, just like other arrays.  A code
> point may comprise multiple code units, and a logical character may
> comprise multiple code points.  The latter is true even with dchar/utf-32.
> 
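	So, in UTF-8, length is the number of bytes in the string, and
sizeof is 8 on 32-bit systems (16 on 64-bit ones). The size never depends
on the contents, because .sizeof of a dynamic array is the size of its
length-plus-pointer header, not of the characters it refers to.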

		Jerome
-- 
mailto:jeberger at free.fr
http://jeberger.free.fr
Jabber: jeberger at jabber.fr
