The length of strings vs. # of chars vs. sizeof
Rainer Deyke
rainerd at eldwood.com
Sun Nov 1 12:12:10 PST 2009
Charles Hixson wrote:
> I've read and re-read the documentation, but I can't decide whether a
> UTF-8 character that takes multiple bytes to express counts as one or
> multiple values in length and sizeof. Sizeof seems to presume that all
> entries are the same length, but otherwise it seems to be the property I
> need. (I suppose that I could just enter a string that I know is
> multi-byte chars, but it sure would be better if I could find out from
> the documentation.) I'm pretty certain that it just counts as one
> character for indexing, so length would almost need to also count the
> number of characters rather than bytes.
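Strings are just arrays of code units: UTF-8 code units for char[],
UTF-16 for wchar[], UTF-32 for dchar[]. Their length is the number of
elements (i.e. code units) they contain, just like other arrays, and
indexing likewise selects a single code unit, not a character. A code
point may comprise multiple code units, and a logical character may
comprise multiple code points (a base letter followed by a combining
accent, for example). The latter is true even with dchar/UTF-32.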
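
To make the three counts concrete, here is a minimal sketch, assuming a
reasonably recent D2 compiler and Phobos (std.utf.count and
std.uni.byGrapheme; the latter did not exist in older releases). The
helper names codeUnits, codePoints and graphemes are made up for this
example and are not library functions.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : count;

// Code units: the plain array length (bytes for UTF-8).
size_t codeUnits(const(char)[] s) { return s.length; }

// Code points: decode the UTF-8 sequence and count the results.
size_t codePoints(const(char)[] s) { return count(s); }

// Logical characters, approximated here as Unicode grapheme clusters.
size_t graphemes(const(char)[] s) { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string s = "e\u0301";

    writeln(codeUnits(s));   // 3: 'e' is one byte, U+0301 is two
    writeln(codePoints(s));  // 2: 'e' and the combining accent
    writeln(graphemes(s));   // 1: a single logical character

    // Even as UTF-32 the length is 2, because there are two code points.
    dstring d = "e\u0301"d;
    writeln(d.length);       // 2

    // .sizeof on a dynamic array is the size of the array reference
    // (pointer plus length), independent of the string's contents.
    writeln(string.sizeof == 2 * size_t.sizeof);  // true
}
```

For the size of the string data in bytes, s.length * char.sizeof is the
quantity to use rather than s.sizeof.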
--
Rainer Deyke - rainerd at eldwood.com