The length of strings vs. # of chars vs. sizeof

Rainer Deyke rainerd at eldwood.com
Sun Nov 1 19:08:53 PST 2009


Jesse Phillips wrote:
> I believe the documentation you are looking for is:
> 
> http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD
> 
> It is more about understanding UTF than it is about learning strings.

One thing that page fails to mention is that D has no awareness of
anything higher-level than code points.  In particular:
  - A dchar holds a code point, not a logical character.
  - D has no awareness, at the language level, of canonical forms or of
precomposed versus decomposed characters.  Some characters can be
represented as either one or two code points, and D does not know that
these are supposed to represent the same character.
  - Although D stops you from outputting an incomplete code point, it
does not stop you from outputting an incomplete logical character.
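
A small sketch of the three points above, assuming a current D2 compiler
and Phobos.  The helper countCodePoints is a name made up for this
example, and std.utf.validate is only a stand-in for whatever check
rejects an incomplete code point; the example uses e-acute, which can be
written either as U+00E9 or as 'e' followed by U+0301.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical helper: count code points by letting foreach decode
// the UTF-8 string into dchars.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string precomposed = "\u00E9";   // e-acute as a single code point
    string decomposed  = "e\u0301";  // 'e' plus combining acute accent

    // .length counts UTF-8 code units, not code points or characters.
    assert(precomposed.length == 2);
    assert(decomposed.length == 3);

    // Decoding to dchar yields code points, not logical characters.
    assert(countCodePoints(precomposed) == 1);
    assert(countCodePoints(decomposed) == 2);

    // The language compares code units, so the two spellings differ.
    assert(precomposed != decomposed);

    // Cutting between 'e' and its accent leaves valid UTF-8, so
    // nothing flags the incomplete logical character.
    string half = decomposed[0 .. 1];
    validate(half);
    assert(half == "e");

    // Cutting inside a code point is what validation rejects.
    assertThrown!UTFException(validate(precomposed[0 .. 1]));
}
```

Both strings would normally render as the same character on screen,
which is exactly the equivalence the language does not see.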

Also, some D library functions only work on the ASCII subset of UTF-8.
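
As one illustration, and not necessarily the functions meant above, here
is how the ASCII-only and Unicode-aware case conversions differ in
current Phobos (std.ascii and std.uni; the library at the time of this
post was organized differently).  The renamed imports asciiToUpper and
uniToUpper are just local aliases chosen for this example.

```d
import std.ascii : asciiToUpper = toUpper;
import std.uni : uniToUpper = toUpper;

void main()
{
    dchar eAcute = '\u00E9';  // lowercase e-acute, outside ASCII

    // The ASCII version handles a-z ...
    assert(asciiToUpper('a') == 'A');

    // ... and returns anything outside ASCII unchanged.
    assert(asciiToUpper(eAcute) == eAcute);

    // The Unicode-aware version maps it to U+00C9.
    assert(uniToUpper(eAcute) == '\u00C9');
}
```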


-- 
Rainer Deyke - rainerd at eldwood.com