To Walter, about char[] initialization by FF

Sun Jul 30 09:38:08 PDT 2006

It's true that in HTML, attribute names were limited to a subset of
the characters available for use in the document: namely, as mentioned,
alpha-type characters (/[A-Za-z][A-Za-z0-9\.\-]*/).  You couldn't even
use accented chars.
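
To make that concrete, here is a rough Python sketch of the check.  It
assumes the pattern quoted above is the whole rule, which is a
simplification on my part; the helper name is_html_attr_name is made up
for illustration.

```python
import re

# The pattern from the paragraph above: one ASCII letter, then any
# number of ASCII letters, digits, '.' or '-'. Assumed to be the whole
# rule for the purposes of this sketch.
ATTR_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.\-]*")


def is_html_attr_name(name: str) -> bool:
    """Return True if the entire string matches the pattern above."""
    return ATTR_NAME.fullmatch(name) is not None


print(is_html_attr_name("http-equiv"))   # plain ASCII name
print(is_html_attr_name("caf\u00e9"))    # name with an accented char
```

The first call is accepted and the second is rejected, because the
accented character falls outside the ASCII ranges in the pattern.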

However, in the case of HTML, you were required to use specific
(English) attribute names anyway for the document to validate, so it's
really not a significant limitation.  Few people used SGML for anything
else.

XML allows Unicode in attribute and element names, as well as in PIs,
CDATA, PCDATA, etc.  And, of course, it allows you to reference any
Unicode code point (e.g. &#1234;).
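
As a quick illustration of what a reference like that resolves to, here
is a small Python sketch.  It leans on the standard library's
html.unescape, which is only a convenient stand-in for what an XML
parser would do; the variable names are mine.

```python
import html

# A decimal numeric character reference, as in the example above.
ref = "&#1234;"

# html.unescape resolves numeric references to the character itself.
ch = html.unescape(ref)

print(hex(ord(ch)))             # the code point the reference names
print(len(ch.encode("utf-8")))  # how many bytes it takes in UTF-8
```

The reference names a single code point (decimal 1234), even though
that code point takes more than one byte once it is stored as UTF-8.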

We could also talk about the limitations of horse-drawn carriages, and
how they can only go a certain speed.  Nonetheless, we have cars now,
so I'm not terribly worried about HTML's technical limitations anymore.

-[Unknown]

>> Consider this: attribute names in html (sgml) represented by
>> ascii codes only - you don't need utf-8 processing to deal with them 
>> at all.
>> You also cannot use utf-8 for storing attribute values generally 
>> speaking.
>> Attribute values participate in CSS selector analysis and some selectors
>> require char by char (char as a code point and not a D char) access.
> 
> I'd be surprised at that, since UTF-8 is a documented, supported HTML 
> page encoding method. But if UTF-8 doesn't work for you, you can use 
> wchar (UTF-16) or dchar (UTF-32), or ubyte (for anything else).
>
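
For reference, the "code point versus char" distinction in the quoted
text can be sketched in Python like this.  The function name
unit_counts and the sample string are made up for illustration; the
three encodings stand in for D's char, wchar and dchar arrays.

```python
def unit_counts(text: str) -> dict:
    """Count code points and code units for one string."""
    return {
        # Python strings index by code point.
        "code_points": len(text),
        # UTF-8 code units are single bytes (like D's char).
        "utf8_units": len(text.encode("utf-8")),
        # UTF-16 code units are 2 bytes each (like D's wchar).
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # UTF-32 code units are 4 bytes each (like D's dchar).
        "utf32_units": len(text.encode("utf-32-le")) // 4,
    }


# 'a', e-acute, euro sign, and a musical symbol outside the BMP.
counts = unit_counts("a\u00e9\u20ac\U0001D11E")
print(counts)
```

Only the UTF-32 count matches the number of code points here, which is
why code that needs char-by-char access by code point either decodes
UTF-8 as it goes or works on a UTF-32 copy.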