Fix Phobos dependencies on autodecoding

Patrick Schluter Patrick.Schluter at bbox.fr
Fri Aug 16 10:32:06 UTC 2019


On Friday, 16 August 2019 at 09:34:21 UTC, Walter Bright wrote:
> On 8/16/2019 2:20 AM, Patrick Schluter wrote:
>> Sorry, no it didn't work in reality before Unicode. Multi 
>> language system were a mess.
>
> I have several older books that move facilely between multiple 
> languages. It's not a mess.
>
> Since the reader can figure all this out without invisible 
> semantic information in the glyphs, that invisible information 
> is not necessary.

Unicode's purpose is not limited to the output at the end of the 
processing chain. It's the whole processing chain that is the 
point.

>
> Once you print/display the Unicode string, all that semantic 
> information is gone. It is not needed.

As said, printing is only a minor part of language processing. To 
give an example from the EU again, and just to illustrate, we 
have exactly three laser printers (one of them a photocopier) on 
each floor of our offices. You may say: oh, you're the IT guys, 
you don't need to print that much. To which I respond: half of 
the floor is occupied by the English translation unit, and while 
they do use the printers more than we do, printing is not a 
significant part of their workflow.

>
>
>> Unicode works much, much better than anything that existed 
>> before. The issue is that not a lot of people work in a 
>> multi-language environment and don't have a clue of the unholy 
>> mess it was before.
>
> Actually, I do. Zortech C++ supported multiple code pages, 
> multiple multibyte encodings, and had error messages in 4 
> languages.

Each string was in its own language. We have to deal with texts 
that mix languages: sentences in Bulgarian with an office 
address in Greece, embedded in an XML file. Codepages don't work 
in that case, or you have to introduce an escaping scheme much 
more brittle and annoying than UTF-8 or UTF-16 encoding.
The European Parliament's session logs are what is called panaché 
documents, i.e. the transcripts are in the native language of 
each intervening MEP. So completely mixed documents.
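
To make the codepage point concrete, here is a minimal Python sketch. 
The sample sentence is invented for illustration, and cp1251/cp1253 
are simply the Windows Cyrillic and Greek codepages picked as 
examples, not necessarily what any real system used. It only checks 
whether one mixed string can be stored in a single encoding.

```python
# Hypothetical sample: a Bulgarian sentence containing a Greek address.
text = "Офисът се намира на адрес: Οδός Αθηνάς 10, Αθήνα"


def fits_codepage(s: str, codec: str) -> bool:
    """Return True if every character of s is representable in codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# cp1251 covers Cyrillic but has no Greek letters.
cyrillic_ok = fits_codepage(text, "cp1251")
# cp1253 covers Greek but has no Cyrillic letters.
greek_ok = fits_codepage(text, "cp1253")
# UTF-8 can represent both scripts in the same string.
utf8_ok = fits_codepage(text, "utf-8")
# Encoding to UTF-8 and decoding again returns the original text.
roundtrip = text.encode("utf-8").decode("utf-8") == text
```

Neither single codepage can hold the whole sentence, so a 
codepage-based system would need some out-of-band switch or escape 
between the two parts, while the UTF-8 version needs nothing extra.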

>
> Unicode, in its original vision, solved those problems.

Unicode is not perfect, and indeed the emoji business is crap, 
but Unicode is better than what was used before.
And to insist again, Unicode is mostly about "DATA PROCESSING". 
Sometimes that produces human-readable output, but that is 
only one part of its purpose.

