To Walter, about char[] initialization by FF

Andrew Fedoniouk news at terrainformatica.com
Sat Jul 29 18:43:59 PDT 2006


"Walter Bright" <newshound at digitalmars.com> wrote in message 
news:eagut9$2l96$1 at digitaldaemon.com...
> Andrew Fedoniouk wrote:
>> I will ask again:
>>
>> What:
>> char c = 'a';
>> means for you?
>> And following in C/C++:
>>
>> #pragma(encoding,"KOI-8R")
>>
>> char c = '?';
>>
>> ?
>
> Pragmas are implementation defined behavior in C and C++, meaning they are 
> unportable and rather useless. Not only that, char's themselves are 
> implementation defined, and so it is very difficult to write portable code 
> that deals with anything other than a-zA-Z0-9 and a few other characters.
>
> In D, char[] is a UTF-8 sequence. It's well defined, and therefore 
> portable. It supports every human language.

What does "UTF-8 ... supports ... every human language" mean?

It allows every language to be encoded, yes.

But runtime support means quite a different thing,
and I am pretty sure you know what I mean here.
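
A minimal sketch of that distinction, in Python purely for
illustration (the sample word and the variable names are my own
assumptions, nothing from D or its runtime): encoding to UTF-8
always works, but byte-level indexing and slicing no longer line
up with characters.

```python
# "naive" with a diaeresis on the i, written with an escape so the
# source file's own encoding does not matter.
word = "na\u00efve"
utf8 = word.encode("utf-8")

char_count = len(word)   # number of code points
byte_count = len(utf8)   # number of bytes; the accented letter takes two

# Indexing bytes gives a fragment of a character, not a character.
third_char = word[2]     # the accented letter
third_byte = utf8[2]     # only the lead byte of its two-byte sequence

# Cutting the byte sequence at an arbitrary offset can split a character.
try:
    utf8[:3].decode("utf-8")
    split_is_valid = True
except UnicodeDecodeError:
    split_is_valid = False
```

Any routine that works on the raw bytes has to know where the
character boundaries are; that is the runtime cost I mean.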

In Java, as we know, UTF-8 is used to represent
string literals inside .class files, but once loaded they
become vectors of Java chars: 16-bit values (ushort) holding
Unicode BMP code points. This serves almost all cases.
One exception: it is not trivial to process single-byte
encoded data efficiently there; you need
to rewrite the whole set of functions to handle it.
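
Roughly what that loading step amounts to, sketched in Python
rather than Java. Assumptions: standard UTF-8 stands in for the
class-file encoding, and utf16_units is a hypothetical helper
name of mine, not a JVM API.

```python
import struct

def utf16_units(text):
    """Return the 16-bit code units a UTF-16 string would hold."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

# Stored form: a UTF-8 byte sequence (three Cyrillic letters, six bytes).
stored = "\u043c\u0438\u0440".encode("utf-8")

# Loaded form: one 16-bit unit per BMP character.
loaded = utf16_units(stored.decode("utf-8"))

# A character outside the BMP (U+1D11E, musical G clef) does not fit
# in one unit and becomes a surrogate pair.
clef = utf16_units("\U0001D11E")
```

Here loaded holds three units, one per letter, while clef holds
two units for a single character, which is why I say "almost all"
cases.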

Please don't think that UTF-8 is a panacea.

For example, in China they use the GB2312 encoding
to represent almost 7000 Chinese characters in active use now.
It uses strictly two bytes per Chinese character, so
don't even try to ask them to switch to UTF-8
(three bytes as a rule). That would increase their Internet
traffic by half.
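
The arithmetic can be checked with a small Python sketch. The
two-character sample and the encoded_sizes helper are my own
illustration; real pages mix in ASCII markup, so the overall
growth would be smaller than for pure Chinese text.

```python
def encoded_sizes(text, legacy_codec):
    """Return (bytes in the legacy codec, bytes in UTF-8) for text."""
    return len(text.encode(legacy_codec)), len(text.encode("utf-8"))

# Two common Chinese characters (U+4E2D, U+6587), written as escapes.
sample = "\u4e2d\u6587"

gb_size, utf8_size = encoded_sizes(sample, "gb2312")

# Growth factor when switching this text from GB2312 to UTF-8.
growth = utf8_size / gb_size
```

For this sample the text goes from two to three bytes per
character: half again as many bytes, or put the other way, a
third of the UTF-8 bytes are extra.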

The same applies to Europe. E.g. in Russia
there are 33 letters in the alphabet, and a
one-byte encoding is quite enough for
English/Russian text. It makes no sense
to send two bytes over the wire (Russian in UTF-8)
instead of one for sites like lib.ru.
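
A quick Python check of the one-byte versus two-byte point, using
KOI8-R as the single-byte encoding. The mixed English/Russian
sample string is my own assumption, not text from any real site.

```python
# "Hello, " in ASCII followed by three Cyrillic letters (escapes).
sample = "Hello, \u043c\u0438\u0440"

koi8 = sample.encode("koi8_r")   # one byte for every character
utf8 = sample.encode("utf-8")    # one byte for ASCII, two for Cyrillic

koi8_size = len(koi8)
utf8_size = len(utf8)

# The single-byte form still round-trips to the same text.
round_trip_ok = koi8.decode("koi8_r") == sample
```

The ASCII part costs the same either way; only the Cyrillic part
doubles, so the penalty grows with the share of Russian text.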

Sorry, but people there pay for each byte
downloaded from the Internet. This applies
to almost all countries except the US and Canada.

Andrew Fedoniouk.
http://terrainformatica.com




