writefln and ASCII

nobody nobody at mailinator.com
Wed Sep 13 11:17:13 PDT 2006


Steve Horne wrote:
> On Wed, 13 Sep 2006 10:55:42 -0400, nobody <nobody at mailinator.com>
> wrote:
> 
>> When I wrote this message I see English, Chinese, Greek, Japanese and Russian 
>> characters displayed.
> 
> Obviously, yes. I just think Unicode could have been simpler.

I think they kept it as simple as was reasonably possible. Once you admit a need 
to use more than a single byte to represent an entity then any solution is going 
to have the same complications.

They really did need to remain backwards compatible with ASCII while also 
allowing the bulk of non-ASCII to be represented as 2 bytes.

UTF-8 is free of endian ambiguity and is fully compatible with ASCII data but 
might use as many as 4 bytes to represent a single Unicode code point. UTF-16 
represents the bulk of code points actually used in the world with only 2 bytes 
but, as with any data using more than one byte, it has to address endian ambiguities.
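
To make the size and byte-order trade-off concrete, here is a small sketch in 
Python (not D, purely for illustration). The sample characters and the helper 
name encoded_lengths are my own choices. It reports how many bytes each encoding 
needs per code point and shows the two possible byte orders of one UTF-16 unit.

```python
# Illustrative sketch: bytes needed per code point in UTF-8 and UTF-16.
# The sample characters and the helper name are assumptions for this example.

def encoded_lengths(ch):
    """Return (utf8_len, utf16_len) in bytes for a single character."""
    # "utf-16-le" is used so that no byte order mark is counted.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

samples = [
    "A",           # U+0041, ASCII
    "\u00e9",      # U+00E9, Latin small e with acute
    "\u03a9",      # U+03A9, Greek capital omega
    "\u65e5",      # U+65E5, a CJK ideograph
    "\U0001F600",  # U+1F600, outside the Basic Multilingual Plane
]

for ch in samples:
    u8, u16 = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}  utf-8: {u8} bytes  utf-16: {u16} bytes")

# The same 16-bit code unit can be serialized in two byte orders,
# which is the endian ambiguity UTF-16 has to deal with.
big = "\u03a9".encode("utf-16-be")     # bytes 03 A9
little = "\u03a9".encode("utf-16-le")  # bytes A9 03
```

ASCII stays at one byte in UTF-8, while code points outside the Basic 
Multilingual Plane take four bytes in both encodings.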

> 
>> You will 
>> need to identify codepage boundaries and then you can probably use frequency 
>> tables to identify the codepage used within each boundary.
> 
> Metadata. When your document cannot be represented as a simple text
> file, use something else.
> 

It is my opinion that if you need metadata in addition to textual data then your 
method of representing textual data is inadequate.

To freely mix data from any codepage you would almost certainly end up using 
something like an escape code. If you were really sly you would probably use 
ASCII as the default code page and then let the highest bit being set represent an 
escape code -- which is exactly how UTF-8 starts out. How would you imagine 
filling out the rest?
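
For reference, here is roughly how UTF-8 itself fills out the rest, sketched in 
Python (not D, purely for illustration). The function name utf8_encode is 
hypothetical, and the sketch does not reject surrogate code points as a strict 
encoder would. A clear high bit means plain ASCII; a set high bit marks either a 
lead byte, whose leading 1 bits give the sequence length, or a continuation byte 
of the form 10xxxxxx.

```python
# Illustrative sketch of the UTF-8 bit layout (as defined in RFC 3629).
# utf8_encode is a hypothetical helper; surrogates are not rejected here.

def utf8_encode(cp):
    """Encode one Unicode code point (an int) into UTF-8 bytes by hand."""
    if cp < 0:
        raise ValueError("code point out of range")
    if cp < 0x80:
        # 0xxxxxxx: high bit clear, identical to ASCII
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:
        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of range")

# Compare the hand-rolled result with Python's built-in encoder.
for cp in (0x41, 0xE9, 0x3A9, 0x65E5, 0x1F600):
    print(f"U+{cp:04X} ->", utf8_encode(cp).hex(" "))
```

Because continuation bytes always start with the bits 10, a reader can 
resynchronize from any position in the stream, and no byte of a multi-byte 
sequence can be mistaken for an ASCII character.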


