Beginner not getting "string"
Nick
nick at example.com
Sun Aug 29 10:44:03 PDT 2010
Reading Andrei's book and something seems amiss:
1. A char in D is a code *unit*, not a code point. Since code units exist only to serve a particular encoding, I would have expected the type for a code unit to be byte or something similar, as far from code points as possible. In my mind, Unicode characters, aka chars, are code points.
2. Thus a string in D is an array of code *units*, although in Unicode a string is really a sequence of code points.
3. Iterating over a string in D is wrong by default: it iterates over code units instead of characters (code points). Even worse, the error does not show up until you put some non-ASCII text in there.
4. All string-processing calls (like sort, toupper, split and such) are by default wrong on non-ASCII strings. Wrong without any error, warning or anything.
So I guess my question is why, in a language with the power and
expressiveness of D, in our day and age, would one choose such an
exposed, fragile implementation of string that ensures that the default
code one writes for text manipulation is most likely wrong?
I18N is one of the first things I judge a new language by, and so far D is... puzzling.
I don't know much about D, so I am probably just not getting it, but can you please point me to some rationale behind these string design decisions?
Thanks!
Nick