writefln and ASCII
nobody
nobody at mailinator.com
Wed Sep 13 07:55:42 PDT 2006
Steve Horne wrote:
> On Tue, 12 Sep 2006 15:03:20 +0300, Serg Kovrov <kovrov at no.spam>
> wrote:
>
>> How do I writefln a string from ASCII file contained illegal UTF-8
>> characters, but legal as ASCII? For example ndash symbol - ASCII 0x96).
>
> Just to add some angry ranting to what has already been said...
I can understand your frustration. I felt the same way you did for a while. What
changed my mind was realizing that Unicode has some great features.
Unicode threads tend to run rather long, so here is my short contribution up
front. UTF-8 is great if you can be fairly sure you will only be using ASCII
data, because every ASCII character still takes a single byte. UTF-16 is great
for almost every writing system currently in use on the planet Earth.
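To put rough numbers on that trade-off, here is a minimal sketch in Python (chosen only for brevity; the helper name encoded_sizes is made up for this illustration). It counts the bytes each encoding needs for a few short phrases. "utf-16-le" is used so that no byte order mark is included in the count.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for text."""
    # "utf-16-le" avoids counting the 2-byte byte order mark.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical sample phrases, used only for illustration.
samples = [
    "Unicode is not so bad",  # ASCII only
    "不是那么坏",              # Chinese
    "не настолько плох",      # Russian
]

for sample in samples:
    utf8_size, utf16_size = encoded_sizes(sample)
    print(f"{sample!r}: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

ASCII text is half the size in UTF-8, while the Chinese characters take three bytes each in UTF-8 against two in UTF-16.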
>
> Then, there was a whole bunch of codepages - different character sets
> for different countries. These exploited characters 128 to 255, but
> each codepage defined the characters differently. Some codepages had
> multi-byte characters.
Unicode is not so bad
Unicode 不是那么坏
Unicode δεν είναι τόσο κακό
Unicode はあまり悪くない
Unicode не настолько плох
As I write this message I see English, Chinese, Greek, Japanese and Russian
characters displayed. My preferred text editor (TextPad) uses codepages and
wants me to pick just one of Chinese, Greek, Japanese or Russian to display.
With Unicode it is possible to read and write all of the above.
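The same point can be checked mechanically. The sketch below, in Python, tries to encode the sample lines with a handful of legacy codecs and with UTF-8. The codec names are Python's, picked by me as plausible stand-ins for each language's codepage; they are not necessarily the ones TextPad offers, and the helper can_encode is made up for this illustration.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


lines = {
    "Chinese": "Unicode 不是那么坏",
    "Greek": "Unicode δεν είναι τόσο κακό",
    "Japanese": "Unicode はあまり悪くない",
    "Russian": "Unicode не настолько плох",
}
message = "\n".join(lines.values())

# Assumed stand-ins for the per-language codepages.
legacy_codecs = ["gbk", "cp1253", "shift_jis", "cp1251"]

for codec in legacy_codecs + ["utf-8"]:
    print(f"{codec}: whole message encodable = {can_encode(message, codec)}")
```

Each line on its own fits the codepage meant for its language, but the single-byte Greek and Russian codepages have no room for the Chinese or Japanese characters, whereas UTF-8 takes the whole message.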
If you think Unicode is overly complex then perhaps you should have a go at
writing some code to display this message correctly using codepages. You will
need to identify codepage boundaries and then you can probably use frequency
tables to identify the codepage used within each boundary. You might want to
mitigate the high error rates by also checking dictionaries appropriate for each
codepage. Of course dictionaries only go so far, so you might also need to know
how each language and its dialects vary their words.
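A small sketch of why guessing is needed at all, again in Python with a made-up helper name (decode_with_each): the bytes of a Russian phrase stored in cp1251 also decode without any error as cp1252 or cp1253, only into the wrong characters. Nothing in the bytes themselves says which codepage was meant, which is where the frequency tables and dictionaries would come in.

```python
def decode_with_each(raw, codecs):
    """Map each codec name to the decoded text, or None if decoding fails."""
    results = {}
    for codec in codecs:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            results[codec] = None
    return results


# Bytes as they would sit in a file saved with the Russian codepage.
raw = "не настолько плох".encode("cp1251")
candidates = ["cp1251", "cp1252", "cp1253"]

for codec, text in decode_with_each(raw, candidates).items():
    print(f"{codec}: {text!r}")

# The byte from the original question: invalid as UTF-8, an en dash in cp1252.
print(decode_with_each(bytes([0x96]), ["utf-8", "cp1252"]))
```

A real detector would have to score each candidate decoding somehow; this sketch only shows that a successful decode is not evidence of the right codepage.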