To Walter, about char[] initialization by FF
Unknown W. Brackets
unknown at simplemachines.org
Sat Jul 29 21:33:44 PDT 2006
It really sounds to me like you're looking for UCS-2, then (as used in
JavaScript, for example). For that, length calculation (which is what I
presume you mean) is inexpensive.
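To make the cost difference concrete, here is a minimal sketch in
present-day D (D2 with Phobos, which differs from the 2006 compiler).
The helper names codePointsUtf8 and charsUcs2 are made up for this
example, and the UCS-2 case assumes the text contains no surrogate
pairs:

```d
import std.utf : count;

// Code points in UTF-8 text: the string has to be walked, O(n).
size_t codePointsUtf8(const(char)[] s)
{
    return count(s);
}

// Characters in UCS-2 text: one code unit per character, so the
// array length is the answer, O(1). Assumes no surrogate pairs.
size_t charsUcs2(const(wchar)[] s)
{
    return s.length;
}

void main()
{
    string  u8  = "na\u00EFve";   // U+00EF takes two UTF-8 code units
    wstring u16 = "na\u00EFve"w;  // every character is in the BMP

    assert(u8.length == 6);           // code units, not characters
    assert(codePointsUtf8(u8) == 5);  // needs a decoding pass
    assert(charsUcs2(u16) == 5);      // plain .length
}
```

The .length of a char[] counts UTF-8 code units, so it only matches the
character count for pure ASCII text.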
As to your assertion below, I disagree. What I think you meant was:
"char[] is not designed for effective multi-byte text processing."
I will agree that wchar[] would be much better in that case, and even
that limiting it to UCS-2 (which is, afaik, a subset of UTF-16) would
probably make things significantly easier to work with.
Nonetheless, I was only commenting on how D is currently designed and
implemented. Perhaps there was some misunderstanding here.
Even so, I don't see how initializing it to FF causes any problem. I
think everyone understands that char[] is meant to hold UTF-8, and if
you don't like that or don't want to use it, there are other methods
available to you (heh, you can even use UTF-32!)
I don't see that the initialization of these variables will cause anyone
any problems. The only time I want such a variable initialized to 0 is
when I use a numeric type, not a character type (and even then, I try to
write = 0 explicitly anyway).
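For reference, a small sketch of what the FF default looks like in
practice, again in present-day D rather than the 2006 compiler. The
helper isValidUtf8 is a name invented for this example; validate and
UTFException come from std.utf. The byte FF never occurs in well-formed
UTF-8, so a buffer that was never filled in is easy to detect:

```d
import std.utf : validate, UTFException;

// True if std.utf.validate accepts the text as well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    char c;                       // no explicit initializer
    assert(c == 0xFF);            // same value as char.init

    ushort n;                     // numeric types still start at 0
    assert(n == 0);

    char[4] buf;                  // every element starts as 0xFF
    assert(!isValidUtf8(buf[]));  // forgotten initialization shows up

    buf[] = 'a';                  // once real text is stored...
    assert(isValidUtf8(buf[]));   // ...the buffer validates
}
```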
It seems like what you may want to do is simply this:
typedef ushort ucs2_t = 0;
And use that type. Mission accomplished. Or, if you want to use several
different encodings, I humbly suggest:
typedef ubyte latin1_t = 0;
typedef ushort ucs2_t = 0;
typedef ubyte koi8r_t = 0;
typedef ubyte big5_t = 0;
And so on, so on, so on...
-[Unknown]
> So statement: "char[] in D supposed to hold only UTF-8 encoded text"
> immediately leads us to "D is not designed for effective text processing".
>
> Is this logic clear?