Beginner not getting "string"

Nick nick at example.com
Sun Aug 29 10:44:03 PDT 2010


Reading Andrei's book and something seems amiss:

1. A char in D is a code *unit*, not a code point. Since code units are 
the building blocks of a particular encoding, I would have expected the 
type for a code unit to be byte or something similar, as far from code 
points as possible. In my mind, Unicode characters, aka chars, are code 
points.
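(The distinction shows up in any language; here is a sketch in Python, 
where the same text has a different length counted in code points than 
counted in UTF-8 code units:)

```python
# A string containing one non-ASCII character.
s = "héllo"

# Code points: what a human would call "characters" here.
print(len(s))            # 5 code points

# UTF-8 code units: 'é' encodes as two bytes (0xC3 0xA9).
utf8 = s.encode("utf-8")
print(len(utf8))         # 6 code units
```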

2. Thus a string in D is an array of code *units*, although in Unicode a 
string is really a sequence of code *points*.
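(To see why the difference matters, here is a Python sketch using a 
bytes object to stand in for an array of UTF-8 code units: indexing into 
the units can land in the middle of a character:)

```python
s = "héllo"
units = s.encode("utf-8")  # the string as an array of UTF-8 code units

# Indexing by code unit lands on half of 'é', not a character:
print(units[1])            # 195 (0xC3), the first byte of 'é'

# Indexing by code point gives the character itself:
print(s[1])                # 'é'
```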

3. Iterating over a string in D is wrong by default: it iterates over 
code units instead of characters (code points). Even worse, the error 
does not surface until you put some non-ASCII text in there.
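(Again a Python sketch of the general phenomenon, not D code: iterating 
the code points of "é" yields one character, while iterating its UTF-8 
code units yields two elements, neither of which is a character:)

```python
s = "é"

# Iterating by code point: one character, as expected.
print([c for c in s])           # ['é']

# Iterating by UTF-8 code unit: two raw bytes.
print(list(s.encode("utf-8")))  # [195, 169]
```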

4. All string-processing calls (like sort, toupper, split and such) are 
by default wrong on non-ASCII strings. Wrong without any error, warning 
or anything.
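(A Python sketch of how a case-mapping that works on code units silently 
goes wrong: uppercasing the raw UTF-8 bytes only touches ASCII letters, 
so 'é' stays lowercase, while uppercasing the code points handles it:)

```python
s = "héllo"

# Case-mapping the code points handles 'é' correctly:
print(s.upper())  # 'HÉLLO'

# Case-mapping the raw code units only affects ASCII bytes,
# so 'é' survives untouched -- silently wrong, no error raised:
print(s.encode("utf-8").upper().decode("utf-8"))  # 'HéLLO'
```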

So I guess my question is why, in a language with the power and 
expressiveness of D, in our day and age, would one choose such an 
exposed, fragile implementation of string that ensures that the default 
code one writes for text manipulation is most likely wrong?

I18N is one of the first things I judge a new language by and so far D 
is... puzzling.

I don't know much about D, so I am probably just not getting it, but can 
you please point me to some rationale behind these string design decisions?

Thanks!
Nick


More information about the Digitalmars-d mailing list