First Impressions
Anders F Björklund
afb at algonet.se
Sun Oct 1 15:30:57 PDT 2006
BCS wrote:
> ubyte is an 8 bit unsigned number not a character encoding.
Right, I actually meant ubyte[], but void[] might have been
more accurate for representing any (even non-UTF) encoding.
(I used ubyte[] in my mapping functions, since they only
handled legacy 8-bit encodings like "cp1252" or "macroman".)
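
To illustrate what such a mapping does, here is a minimal sketch in
Python rather than D. The helper name legacy_to_text is made up for
this example, and it leans on Python's built-in "cp1252" and
"mac_roman" codecs instead of hand-written tables. It only shows that
the same raw byte means different characters in different 8-bit
encodings, which is why the raw data is better seen as plain bytes
than as characters.

```python
# Hypothetical helper (not from the original post): map bytes in a
# legacy 8-bit encoding to Unicode text using Python's built-in codecs.
def legacy_to_text(raw: bytes, encoding: str) -> str:
    return raw.decode(encoding)


raw = bytes([0x80])  # one byte outside the ASCII range

as_cp1252 = legacy_to_text(raw, "cp1252")       # U+20AC EURO SIGN
as_macroman = legacy_to_text(raw, "mac_roman")  # U+00C4 A WITH DIAERESIS

# Same byte, two different characters, and neither one is a single
# code unit once it has been converted to UTF-8.
print(hex(ord(as_cp1252)), len(as_cp1252.encode("utf-8")))
print(hex(ord(as_macroman)), len(as_macroman.encode("utf-8")))
```
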
Re-reading your post, it seems to me that you were talking more
about having an alias for the UTF type most suitable for the OS?
I guess UTF-8 would be a good choice if the operating system
doesn't use Unicode, since then every string has to be converted
through lookup tables anyway.
Otherwise the existing "wchar_t" isn't bad for such a UTF type:
it is UTF-16 on Windows and usually UTF-32 on Unix (Linux, Darwin, ...).
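
A quick way to check what "wchar_t" is on a given machine, sketched
in Python via ctypes rather than in D or C. The names wchar_bytes and
utf_name are just for this example, and the size only tells you the
width of the type; that a 2-byte wchar_t holds UTF-16 and a 4-byte one
holds UTF-32 is the usual convention, not something the size proves.

```python
import ctypes

# Width of the platform's C wchar_t, in bytes.
wchar_bytes = ctypes.sizeof(ctypes.c_wchar)

# Conventional interpretation (an assumption, see the note above):
# 2 bytes -> UTF-16 code units, 4 bytes -> UTF-32 code units.
utf_name = {2: "UTF-16", 4: "UTF-32"}.get(wchar_bytes, "unknown")

print(wchar_bytes, utf_name)
```
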
>> All ASCII characters are valid UTF-8 code units, so it's OK.
>
> But UTF-8 is not ASCII.
So you would like a char "type" that would only take ASCII?
I guess that is *one* way of dealing with it. You could also
have a wchar type that wouldn't accept surrogates (BMP only).
Then it would be OK to index them by code unit / character
(since each allowed character would fit into one code unit).
Sounds a little like signed vs. unsigned integers, actually?
Then again, five character types would be even worse than
the three we have now.
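
A small sketch of the "one character per code unit" idea, again in
Python rather than D. All four function names are made up for this
example. It counts UTF-8 and UTF-16 code units and checks the two
restrictions discussed above (ASCII only, BMP only); indexing by code
unit is the same as indexing by character only when the matching
restriction holds.

```python
def utf8_units(text: str) -> int:
    # Number of 8-bit code units in the UTF-8 form.
    return len(text.encode("utf-8"))


def utf16_units(text: str) -> int:
    # Number of 16-bit code units in the UTF-16 form (no BOM).
    return len(text.encode("utf-16-le")) // 2


def is_ascii_only(text: str) -> bool:
    # Restriction for a hypothetical ASCII-only char type.
    return all(ord(c) < 0x80 for c in text)


def is_bmp_only(text: str) -> bool:
    # Restriction for a hypothetical surrogate-free wchar type.
    return all(ord(c) <= 0xFFFF for c in text)


samples = ["abc", "\u00e5", "\U0001d11e"]  # ASCII, Latin-1 letter, non-BMP
for s in samples:
    print(len(s), utf8_units(s), utf16_units(s),
          is_ascii_only(s), is_bmp_only(s))
```

For ASCII-only text the UTF-8 unit count equals the character count,
and for BMP-only text the UTF-16 unit count does; outside those
limits the counts drift apart.
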
--anders