writefln and ASCII

Wed Sep 13 07:55:42 PDT 2006

Steve Horne wrote:
> On Tue, 12 Sep 2006 15:03:20 +0300, Serg Kovrov <kovrov at no.spam>
> wrote:
> 
>> How do I writefln a string from ASCII file contained illegal UTF-8 
>> characters, but legal as ASCII? For example ndash symbol - ASCII 0x96).
> 
> Just to add some angry ranting to what has already been said...

I can understand your frustration. I felt the same way you did for a while. What 
changed my mind was realizing that Unicode has some great features.

Unicode threads do have a tendency to be rather long, so here is my short 
contribution up front. UTF-8 is great if you can be fairly sure you will only be 
using ASCII data. UTF-16 is great for almost every writing system that is 
currently used on the planet Earth.
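
A rough way to see that trade-off is to count bytes. This is only a sketch, 
written in Python rather than D so it is easy to try; the helper name 
encoded_sizes is made up for illustration, and the sample strings are taken 
from this message.

```python
def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for text."""
    # "utf-16-le" is used so that no 2-byte byte order mark is counted.
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for sample in ("Unicode is not so bad", "不是那么坏", "δεν είναι τόσο κακό"):
    print(sample, encoded_sizes(sample))
```

The ASCII-only sample takes 21 bytes in UTF-8 and 42 in UTF-16, while the five 
Chinese characters take 15 bytes in UTF-8 and 10 in UTF-16.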

> 
> Then, there was a whole bunch of codepages - different character sets
> for different countries. These exploited characters 128 to 255, but
> each codepage defined the characters differently. Some codepages had
> multi-byte characters.

   Unicode is not so bad
   Unicode 不是那么坏
   Unicode δεν είναι τόσο κακό
   Unicode はあまり悪くない
   Unicode не настолько плох

As I write this message I see English, Chinese, Greek, Japanese and Russian 
characters displayed. My preferred text editor (TextPad) uses codepages and 
makes me pick just one of Chinese, Greek, Japanese or Russian to display. With 
Unicode it is possible to read and write all of the above in the same document.
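
To make the codepage problem concrete, here is a small sketch (Python again, 
not D; the function names and the list of candidate codepages are my own 
choices for illustration). It shows one byte meaning three different letters 
depending on the codepage, and then the conversion the original question 
needs, under the assumption that the file is really Windows-1252. Byte 0x96 is 
outside 7-bit ASCII; in Windows-1252 it is the en dash, and on its own it is 
not valid UTF-8.

```python
# Assumed candidates: Western European, Cyrillic and Greek Windows codepages.
CANDIDATE_CODEPAGES = ("cp1252", "cp1251", "cp1253")


def decode_under_each(raw):
    """Decode the same bytes under every candidate codepage."""
    return {name: raw.decode(name) for name in CANDIDATE_CODEPAGES}


def legacy_to_utf8(raw, codepage):
    """Re-encode legacy bytes as UTF-8 once the codepage is known."""
    return raw.decode(codepage).encode("utf-8")


# One byte, three different letters: nothing in the byte says which is meant.
print(decode_under_each(b"\xe4"))

# The en dash from the original question, assuming the file is Windows-1252.
print(legacy_to_utf8(b"\x96", "cp1252"))
```

Every decode above succeeds without an error, which is exactly why the 
codepage has to be known or guessed from outside the data.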

If you think Unicode is overly complex, then perhaps you should have a go at 
writing some code to display this message correctly using codepages. You will 
need to identify codepage boundaries, and then you can probably use frequency 
tables to identify the codepage used between each pair of boundaries. You might 
want to mitigate the high error rates by also checking dictionaries appropriate 
for each codepage. Of course, dictionaries only go so far, so you might also 
need to know how each language and its dialects vary their words.