To Walter, about char[] initialization by FF
Walter Bright
newshound at digitalmars.com
Sat Jul 29 19:16:12 PDT 2006
Andrew Fedoniouk wrote:
>> In D, char[] is a UTF-8 sequence. It's well defined, and therefore
>> portable. It supports every human language.
>
> What does "UTF-8 ... supports ... every human language" mean?
>
> It allows one to encode them, yes.
We both know what UTF-8 is and does.
> But runtime support means quite a different thing,
> and I am pretty sure you know what I mean here.
I'm sure there are bugs in the library UTF-8 support. But they are bugs,
are fixable, and not fundamental problems. As you find any, please post
them to bugzilla.
> In Java, as we know, UTF-8 is used to represent
> string literals inside .class files, but once loaded they
> become vectors of Java chars: Unicode BMP code points
> (ushort). And this serves almost all character cases.
> There are exceptions: it is not trivial to process
> single-byte encoded data efficiently there; you need
> to rewrite the whole set of functions to handle this.
>
> Please don't think that UTF-8 is a panacea.
I don't. But it's way better than C/C++, because you can rely on it and
your code will work with different languages out of the box.
> For example in China they use GB2312 encoding
> to represent almost 7000 Chinese characters in active use now.
> This is strictly a 2-byte encoding, and
> don't even try to ask them to switch to UTF-8
> (3 bytes as a rule). This will increase their internet
> traffic by 1/3.
>
> The same applies to Europe. E.g. in Russia
> there are 32 characters in the alphabet, and a
> one-byte encoding is enough for
> English/Russian text. It makes no sense
> to send two bytes over the wire (Russian in UTF-8)
> instead of one for sites like lib.ru.
>
> Sorry, but people there are paying for each byte
> downloaded from the Internet. This applies
> to almost all countries except the US and Canada.
If one needs to use a custom encoding, use ubyte[] or ushort[]. If one
needs to be universal, use char[], wchar[], or dchar[]. And for what
it's worth, D isn't a web transmission protocol. I don't see any problem
with a D program converting its input from Format X to UTF for internal
processing, and then converting its output back to X or Y or Z.
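
A minimal sketch of that convert-in, process, convert-out pattern follows. It is written in Python rather than D purely for illustration, and the function name `convert`, the sample strings, and the choice of uppercasing as the "processing" step are all assumptions, not anything from this thread. The codec names (`koi8_r`, `gb2312`, `utf-8`) are standard Python codecs.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from encoding `src`, process as Unicode, re-encode as `dst`.

    Hypothetical helper: the name and the uppercase step are illustrative only.
    """
    text = data.decode(src)   # Format X -> Unicode for internal processing
    text = text.upper()       # stand-in for whatever the program really does
    return text.encode(dst)   # Unicode -> Format X, Y or Z on output


# Russian text arriving in a one-byte encoding (KOI8-R).
koi8_input = "привет".encode("koi8_r")

# Same processing, two different output formats.
utf8_output = convert(koi8_input, "koi8_r", "utf-8")
koi8_output = convert(koi8_input, "koi8_r", "koi8_r")

# Chinese text in the two-byte GB2312 encoding, for size comparison.
gb_input = "中文".encode("gb2312")
gb_as_utf8 = gb_input.decode("gb2312").encode("utf-8")

print(len(koi8_input), len(utf8_output), len(koi8_output))
print(len(gb_input), len(gb_as_utf8))
```

The wire format stays whatever the sender and receiver agree on; only the in-memory representation is Unicode. The printed lengths show the per-character size difference between the legacy encodings and UTF-8 for these two sample strings.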