TDPL reaches Thermopylae level
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Fri Oct 30 20:22:31 PDT 2009
Justin Johansson wrote:
> Andrei Alexandrescu Wrote:
>
>> Lars T. Kyllingstad wrote:
>>> Nick Sabalausky wrote:
>>>> "Chris Nicholson-Sauls" <ibisbasenji at gmail.com> wrote in message
>>>> news:hcctuf$140a$1 at digitalmars.com...
>>>>> Granted LTR is common enough to be expectable and acceptable. To be
>>>>> perfectly honest, I don't believe I have *ever* even used
>>>>> wchar/wstring. Char/string gosh yes; dchar/dstring quite a bit as
>>>>> well, where I need the simplicity; but I've yet to feel much need for
>>>>> the "weirdo" middle child of UTF.
>>>>>
>>>> Given that just about anything outside of D (at least as far as I've
>>>> seen) that attempts to use unicode does so with UTF-16 (or just uses
>>>> UCS-2 and pretends that's UTF-16...), wchar and wstring are great for
>>>> dealing with that. For instance, my Goldie engine for GOLD currently
>>>> uses wchar in a number of places because GOLD's .cfg format stores
>>>> text in...well, presumably UTF-16 (I haven't tested to see if it's
>>>> really UCS-2). But yea, as long as you're not dealing with anything
>>>> that's already in UTF-16 or that expects it, then it does seem to be
>>>> somewhat questionable.
>>> I think this says it all:
>>>
>>> http://en.wikipedia.org/wiki/Utf-16#Use_in_major_operating_systems_and_environments
>>>
>>>
>>> -Lars :)
>> Yep, there was a frenzy when UCS-2 came about: everybody thought two
>> bytes will be enough for everyone. So UCS-2 was widely adopted - who
>> wouldn't love to have constant character width? Then, the UTF-16
>> surrogate business came about, and the only logical step they could take
>> was to migrate to UTF-16, which was upward compatible to UCS-2. I
>> personally think UTF-8 is a better overall design though.
>>
>> Andrei
>
> "I personally think UTF-8 is a better overall design though."
>
> Unicode Technical Note #12 by The Unicode Consortium apparently disagrees,
> recommending UTF-16 for Processing.
>
> http://unicode.org/notes/tn12/
>
> The major claim in the TN is that Unicode is optimized for UTF-16. The rest of
> the argument looks like a VHS (everyone is using it i.e. UTF-16) versus Beta argument.
>
> So who's right? My personal view is that whilst they are the *Unicode Consortium*,
> I have great difficulty in accepting UTF-16 as the one-and-holy encoding.
>
> FWIW, there was a subthread during a discussion about the ordained features of
> programming languages on LtU a while back.
>
> http://lambda-the-ultimate.org/node/3166#comment-46233
> What Are The Resolved Debates in General Purpose Language Design?
>
> It's a long discussion, so it's easier to search for UTF or Unicode on the page if you're interested.
>
> cheers
> Justin Johansson
Thanks for the pointers. One of the reasons I like the design of UTF-8
is its generality: it's a variable-length code for any number of up to
31 bits. In contrast, UTF-16 relies on specific dead zones (the
surrogate ranges) inside the assigned space. But the authors of the
unicode.org article do make a few good points, such as there not being
any invalid UTF-16 symbol. But then that can actually be seen as a
strength of UTF-8: binary files that happen to be valid UTF-8 are
statistically so scarce that UTF-8 offers a very solid method of
checking whether a file is UTF-8 or something else.
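
To make both points concrete, here is a minimal Python sketch (not D, and
not taken from any library). The helper names encode_31bit and
looks_like_utf8 are hypothetical. The first follows the original UTF-8
layout of up to six bytes for 31-bit values; today's UTF-8 is restricted
to four bytes and code points up to U+10FFFF. The second simply treats a
strict decode as the "is this UTF-8?" test, which is a heuristic: pure
ASCII input passes too.

```python
def encode_31bit(value: int) -> bytes:
    """Encode an integer of up to 31 bits using the original UTF-8 layout.

    Hypothetical helper for illustration only; real code should use
    str.encode("utf-8"), which follows the modern four-byte limit.
    """
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value does not fit in 31 bits")
    if value < 0x80:
        return bytes([value])  # single byte, identical to ASCII

    # (largest value, lead-byte marker) for sequences of 2..6 bytes
    layouts = [
        (0x7FF, 0xC0),
        (0xFFFF, 0xE0),
        (0x1FFFFF, 0xF0),
        (0x3FFFFFF, 0xF8),
        (0x7FFFFFFF, 0xFC),
    ]
    for length, (limit, lead) in enumerate(layouts, start=2):
        if value <= limit:
            tail = []
            for _ in range(length - 1):
                # each continuation byte carries 6 payload bits: 10xxxxxx
                tail.append(0x80 | (value & 0x3F))
                value >>= 6
            # the remaining high bits go into the lead byte
            return bytes([lead | value] + tail[::-1])
    raise AssertionError("unreachable: range was checked above")


def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, encode_31bit(ord("€")) gives the same three bytes as
"€".encode("utf-8"), while the Latin-1 bytes for "héllo" fail the
looks_like_utf8 check because 0xE9 announces a multi-byte sequence that
never follows.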
Andrei