writefln and ASCII

Thu Sep 14 12:40:33 PDT 2006

Steve Horne wrote:
> On Wed, 13 Sep 2006 14:17:13 -0400, nobody <nobody at mailinator.com>
> wrote:
> 
>>> Metadata. When your document cannot be represented as a simple text
>>> file, use something else.
>>>
>> It is my opinion that if you need metadata in addition to textual data then your 
>> method of representing textual data is inadequate.
> 
> Ah. So you believe that HTML and XML are garbage, then. Along with all
> binary word-processor document files.
> 
> But then, Unicode is inadequate also. You need additional metadata for
> anything beyond the simplest text. Unicode gives you a huge selection
> of characters, but it can't specify paragraph styles etc.

I certainly believe that using HTML or XML to store plain text data is garbage. 
I recognize the possibility of expanding on plain text data with HTML or 
whatever -- but only when it is appropriate. Surely you would acknowledge that a 
Turing complete "encoding" for plain text is overkill?

> 
>> I am certain that to freely mix data from any codepage you would probably use 
>> something like an escape code.
> 
> That would be the most cryptically compressed form of metadata, I
> suppose. But why compress the metadata at the expense of the character
> data?
> 
> Switching languages and codepages is a relatively rare thing. Most
> documents don't do it at all. Even those that do are hardly likely to
> switch every other character.
>
> By the Huffman compression principle of representing the most frequent
> things with the smallest codes, the logical thing to do is to have
> single byte characters as much as possible and use a multibyte
> sequence - a tag - to select codepages.

That is indeed an elegant text encoding scheme. It took me some time to find any 
problems with it. The biggest disadvantage is that I am pretty sure you would 
need to heavily rewrite RegExp engines to get them to work across codepages, 
because the meaning of each byte would depend on the most recent codepage tag. 
Unicode lets you ask questions like isalpha about any single codepoint in 
isolation, which means RegExp engines need only (relatively) minor modifications 
to work with Unicode.
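
To make the difference concrete, here is a minimal Python sketch. The tag format 
(an ESC byte followed by a one-byte codepage id), the CODEPAGES table and the 
decode_tagged helper are all assumptions made up for illustration, not an 
existing standard or anyone's actual proposal.

```python
import re

# Hypothetical tag format (an assumption, not a real standard):
# the byte 0x1B followed by one codepage-id byte switches the active codepage.
ESC = 0x1B
CODEPAGES = {0: "latin-1", 1: "iso8859_7"}  # 0 = Latin-1, 1 = Greek


def decode_tagged(data: bytes) -> str:
    """Decode single-byte text with inline codepage tags into a Unicode string."""
    out = []
    codec = CODEPAGES[0]  # assume Latin-1 until a tag says otherwise
    i = 0
    while i < len(data):
        if data[i] == ESC:
            codec = CODEPAGES[data[i + 1]]  # tag: switch the active codepage
            i += 2
        else:
            out.append(data[i:i + 1].decode(codec))  # one byte, one character
            i += 1
    return "".join(out)


# In the tagged scheme a lone byte has no fixed meaning: 0xD7 is a
# multiplication sign in Latin-1 but a capital chi in ISO 8859-7.
byte = bytes([0xD7])
as_latin = byte.decode("latin-1")
as_greek = byte.decode("iso8859_7")
latin_is_alpha = as_latin.isalpha()
greek_is_alpha = as_greek.isalpha()

# Once everything is Unicode, each codepoint can be classified on its own,
# so an ordinary regex works across scripts without tracking any tag state.
tagged = b"abc " + bytes([ESC, 1]) + "αβγ".encode("iso8859_7") + bytes([ESC, 0]) + b" 123"
text = decode_tagged(tagged)
words = re.findall(r"\w+", text)
```

A matcher working directly on the tagged bytes would have to carry the active 
codepage along with every position it backtracks to; after decoding, the 
question "is this alphabetic?" needs nothing but the codepoint itself.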

Now to close by making heavy use of the "look, shiny stuff!" fallacy. Here are 
some things you should be able to read and write in newsgroups because of Unicode:

   ∫ (a+b) dx = ∫ a dx + ∫ b dx = (a+b)x + C

   ∃x ∀y (x ∉ y)

   ∇ × E = - ∂B/∂t
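
For the curious, each of those symbols is an ordinary codepoint with its own 
name. A small Python sketch, using only the standard unicodedata module, lists 
them along with their UTF-8 lengths; the choice of symbols is simply taken from 
the lines above.

```python
import unicodedata

# Symbols taken from the formulas above.
symbols = "∫∃∀∉∇∂×"

# Map each symbol to its official Unicode character name.
names = {ch: unicodedata.name(ch) for ch in symbols}

for ch in symbols:
    utf8_len = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}  {names[ch]}  ({utf8_len} bytes in UTF-8)")
```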