To Walter, about char[] initialization by FF

Unknown W. Brackets unknown at simplemachines.org
Sat Jul 29 21:33:44 PDT 2006


It really sounds to me like you're looking for UCS-2, then (as used in 
JavaScript, for example).  For that, length calculation (which is what I 
presume you mean) is inexpensive.
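
To make the cost difference concrete, here is a minimal sketch.  It assumes 
a current D compiler and Phobos' std.utf.count; the helper names 
codePointsUtf8 and charsUcs2 are made up for illustration.  It also assumes 
the wchar[] data really is restricted to UCS-2 (no surrogate pairs), which 
the language itself does not enforce.

```d
import std.utf : count;

// Characters in UTF-8 text: the whole array has to be walked,
// because one character may occupy one to four bytes.  O(n).
size_t codePointsUtf8(const(char)[] s)
{
    return count(s);
}

// Characters in UCS-2 text: every character is exactly one 16-bit
// unit, so the answer is just the array length.  O(1).
// Assumption: the data contains no surrogate pairs.
size_t charsUcs2(const(wchar)[] s)
{
    return s.length;
}

void main()
{
    string  u8  = "na\u00EFve";   // U+00EF takes two bytes in UTF-8
    wstring u16 = "na\u00EFve"w;  // U+00EF is a single 16-bit unit

    assert(u8.length == 6);             // bytes, not characters
    assert(codePointsUtf8(u8) == 5);    // characters, found by walking
    assert(charsUcs2(u16) == 5);        // characters, read off the length
}
```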

As to your assertion below, I disagree.  What I think you meant was:

"char[] is not designed for effective multi-byte text processing."

I will agree that wchar[] would be much better in that case, and even 
that limiting it to UCS-2 (which is, afaik, a subset of UTF-16) would 
probably make things significantly easier to work with.

Nonetheless, I was only commenting on how D is currently designed and 
implemented.  Perhaps there was some misunderstanding here.

Even so, I don't see how initializing it to FF causes any problem.  I 
think everyone understands that char[] is meant to hold UTF-8, and if 
you don't like that or don't want to use it, there are other methods 
available to you (heh, you can even use UTF-32!)
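
For reference, a small sketch of what the FF default looks like in 
practice.  It assumes a current D compiler and Phobos' std.utf.validate; 
isValidUtf8 is a hypothetical helper written for this example.  The byte FF 
never appears in well-formed UTF-8, so a buffer that was never filled in 
fails validation instead of quietly looking like text.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: true if the buffer is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    char c;                        // no explicit initializer
    assert(c == 0xFF);             // char defaults to FF
    assert(char.init == 0xFF);

    auto buf = new char[4];        // four default-initialized chars
    assert(!isValidUtf8(buf));     // FF bytes are not valid UTF-8

    buf[] = 'a';                   // fill with real data
    assert(isValidUtf8(buf));

    int n;                         // numeric types default to 0
    assert(n == 0);
}
```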

I don't see that the initialization of these variables will cause anyone 
any problems.  The only time I want such a variable initialized to 0 is 
when I use a numeric type, not a character type (and then, I try to use 
= 0 anyway.)

It seems like what you may want to do is simply this:

typedef ushort ucs2_t = 0;

And use that type.  Mission accomplished.  Or, use various different 
encodings - in which case I humbly suggest:

typedef ubyte latin1_t = 0;
typedef ushort ucs2_t = 0;
typedef ubyte koi8r_t = 0;
typedef ubyte big5_t = 0;

And so on, so on, so on...

-[Unknown]


> So statement: "char[] in D supposed to hold only UTF-8 encoded text"
> immediately leads us to "D is not designed for effective text processing".
> 
> Is this logic clear?


