TDPL reaches Thermopylae level

Fri Oct 30 14:25:03 PDT 2009

Andrei Alexandrescu Wrote:

> Lars T. Kyllingstad wrote:
> > Nick Sabalausky wrote:
> >> "Chris Nicholson-Sauls" <ibisbasenji at gmail.com> wrote in message 
> >> news:hcctuf$140a$1 at digitalmars.com...
> >>> Granted LTR is common enough to be expectable and acceptable.  To be 
> >>> perfectly honest, I don't believe I have *ever* even used 
> >>> wchar/wstring. Char/string gosh yes; dchar/dstring quite a bit as 
> >>> well, where I need the simplicity; but I've yet to feel much need for 
> >>> the "weirdo" middle child of UTF.
> >>>
> >>
> >> Given that just about anything outside of D (at least as far as I've 
> >> seen) that attempts to use unicode does so with UTF-16 (or just uses 
> >> UCS-2 and pretends that's UTF-16...), wchar and wstring are great for 
> >> dealing with that. For instance, my Goldie engine for GOLD currently 
> >> uses wchar in a number of places because GOLD's .cfg format stores 
> >> text in...well, presumably UTF-16 (I haven't tested to see if it's 
> >> really UCS-2). But yea, as long as you're not dealing with anything 
> >> that's already in UTF-16 or that expects it, then it does seem to be 
> >> somewhat questionable. 
> > 
> > I think this says it all:
> > 
> > http://en.wikipedia.org/wiki/Utf-16#Use_in_major_operating_systems_and_environments 
> > 
> > 
> > -Lars :)
> 
> Yep, there was a frenzy when UCS-2 came about: everybody thought two 
> bytes will be enough for everyone. So UCS-2 was widely adopted - who 
> wouldn't love to have constant character width? Then, the UTF-16 
> surrogate business came about, and the only logical step they could take 
> was to migrate to UTF-16, which was upward compatible to UCS-2. I 
> personally think UTF-8 is a better overall design though.
> 
> Andrei

"I personally think UTF-8 is a better overall design though."

Unicode Technical Note #12 from the Unicode Consortium apparently disagrees,
recommending UTF-16 for Processing.

http://unicode.org/notes/tn12/

The major claim in the TN is that Unicode is optimized for UTF-16.  The rest of
the argument reads like a VHS-versus-Beta argument (everyone is already using it, i.e. UTF-16).

So who's right?  My personal view is that whilst they are the *Unicode Consortium*,
I have great difficulty in accepting UTF-16 as the one-and-holy encoding.
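
For anyone weighing the trade-off, here is a minimal sketch (in Python, purely
for illustration; the helper name code_units is made up for this example) that
counts how many code units a single code point needs in UTF-8, UTF-16 and
UTF-32, which is what D's char, wchar and dchar hold respectively.

```python
# Illustrative sketch only: count code units per encoding for one string.
# "code_units" is a hypothetical helper, not part of any library.
def code_units(text: str) -> dict:
    return {
        # UTF-8 code units are 1 byte each
        "utf8": len(text.encode("utf-8")),
        # UTF-16 code units are 2 bytes each (explicit endianness avoids a BOM)
        "utf16": len(text.encode("utf-16-le")) // 2,
        # UTF-32 code units are 4 bytes each
        "utf32": len(text.encode("utf-32-le")) // 4,
    }


if __name__ == "__main__":
    # One ASCII letter, two BMP characters, and one character outside the BMP.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
        print(f"U+{ord(ch):04X}", code_units(ch))
```

Anything outside the Basic Multilingual Plane, such as U+1D11E, needs a
surrogate pair in UTF-16, so neither UTF-8 nor UTF-16 gives constant width;
only UTF-32 does.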

FWIW, there was a subthread during a discussion about the ordained features of 
programming languages on LtU a while back.

http://lambda-the-ultimate.org/node/3166#comment-46233
What Are The Resolved Debates in General Purpose Language Design?

It's a long discussion, so it's easier to search for UTF or Unicode on the page if you're interested.

cheers
Justin Johansson