writefln and ASCII
nobody
nobody at mailinator.com
Wed Sep 13 11:17:13 PDT 2006
Steve Horne wrote:
> On Wed, 13 Sep 2006 10:55:42 -0400, nobody <nobody at mailinator.com>
> wrote:
>
>> When I wrote this message I see English, Chinese, Greek, Japanese and Russian
>> characters displayed.
>
> Obviously, yes. I just think Unicode could have been simpler.
I think they kept it as simple as was reasonably possible. Once you admit a need
to use more than a single byte to represent an entity then any solution is going
to have the same complications.
They really did need to remain backwards compatible with ASCII while also
allowing the bulk of non-ASCII to be represented as 2 bytes.
UTF-8 is free of endian ambiguity and is fully compatible with ASCII data but
might use as many as 4 bytes to represent a single Unicode code point. UTF-16
represents the bulk of code points actually used in the world with only 2 bytes
but as with any encoding built from multi-byte units it has to address endian
ambiguities.
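
To make the size and byte-order tradeoff concrete, here is a minimal sketch in
Python (chosen only for illustration, it is not D). The helper name
encoded_sizes and the sample characters are my own assumptions; the codec names
"utf-8", "utf-16-be" and "utf-16-le" are standard Python codecs.

```python
# Sketch: compare how many bytes UTF-8 and UTF-16 need for one code point.
# encoded_sizes is a hypothetical helper name, used only for this example.

def encoded_sizes(ch):
    """Return (utf8_len, utf16_len) in bytes for a single character."""
    # "utf-16-be" is used so that no byte order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

# Latin, accented Latin, Greek, Chinese, and one code point beyond U+FFFF.
samples = ["A", "\u00e9", "\u03a9", "\u4e2d", "\U0001F600"]

for ch in samples:
    u8, u16 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {u8} bytes, UTF-16 {u16} bytes")

# Endianness: the same code point has two possible UTF-16 byte orders,
# while UTF-8 has exactly one byte sequence.
omega_be = "\u03a9".encode("utf-16-be")
omega_le = "\u03a9".encode("utf-16-le")
omega_u8 = "\u03a9".encode("utf-8")
```

ASCII stays at one byte in UTF-8, most other scripts cost two or three, and only
code points above U+FFFF need four bytes in either encoding.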
>
>> You will
>> need to identify codepage boundaries and then you can probably use frequency
>> tables to identify the codepage used within each boundary.
>
> Metadata. When your document cannot be represented as a simple text
> file, use something else.
>
It is my opinion that if you need metadata in addition to textual data then your
method of representing textual data is inadequate.
I expect that to freely mix data from any codepage you would use something
like an escape code. If you were really sly you would probably use ASCII as
the default code page and then let a set high bit act as an escape code --
which is exactly how UTF-8 starts out. How would you imagine filling out the
rest?
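
For reference, this is roughly how UTF-8 itself fills out the rest. It is a
hand-written sketch in Python, not library code, and the function name
utf8_encode is my own. A clear high bit means plain ASCII. A set high bit
starts a multi-byte sequence whose lead byte says how many continuation bytes
follow.

```python
# Sketch: encode one code point to UTF-8 by hand to show the bit layout.
# utf8_encode is a hypothetical name; real code would call str.encode("utf-8").

def utf8_encode(cp):
    """Encode one Unicode scalar value (an int) as UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        # 0xxxxxxx: high bit clear, identical to ASCII
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

Every continuation byte starts with the bits 10, so a decoder that lands in the
middle of a sequence can tell and resynchronize at the next lead byte.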