Why UTF-8/16 character encodings?

Walter Bright newshound2 at digitalmars.com
Sat May 25 12:30:22 PDT 2013


On 5/25/2013 5:43 AM, Andrei Alexandrescu wrote:
> On 5/25/13 3:33 AM, Joakim wrote:
>> On Saturday, 25 May 2013 at 01:58:41 UTC, Walter Bright wrote:
>>> This is more a problem with the algorithms taking the easy way than a
>>> problem with UTF-8. You can do all the string algorithms, including
>>> regex, by working with the UTF-8 directly rather than converting to
>>> UTF-32. Then the algorithms work at full speed.
>> I call BS on this. There's no way working on a variable-width encoding
>> can be as "full speed" as a constant-width encoding. Perhaps you mean
>> that the slowdown is minimal, but I doubt that also.
>
> You mentioned this a couple of times, and I wonder what makes you so sure. On
> contemporary architectures small is fast and large is slow; betting on replacing
> larger data with more computation is quite often a win.
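
For the record, here's roughly what "working with the UTF-8 directly" can look
like. This is only an illustrative sketch (findAscii is a made-up helper, not
anything in Phobos): searching for an ASCII needle can compare raw bytes and
never decode a single code point, because every byte of a multi-byte UTF-8
sequence has its high bit set and so can never be mistaken for a 7-bit ASCII
byte.

import std.stdio;

// Hypothetical helper, not part of Phobos: locate an ASCII-only needle
// inside a UTF-8 haystack by comparing raw bytes, with no decoding step.
// Safe because no byte of a multi-byte UTF-8 sequence overlaps the ASCII
// range, so a bytewise match is also a code-point-aligned match.
size_t findAscii(string haystack, string needle)
{
    if (needle.length == 0 || needle.length > haystack.length)
        return size_t.max;
    foreach (i; 0 .. haystack.length - needle.length + 1)
    {
        if (haystack[i .. i + needle.length] == needle)
            return i;           // byte offset of the first match
    }
    return size_t.max;          // no match
}

void main()
{
    string s = "naïve café";        // UTF-8 source text
    writeln(findAscii(s, "caf"));   // prints 7: bytewise match, no decoding
}

The same non-overlap property is what lets regex engines and memchr-style
scans stay at the byte level for the common ASCII cases.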

On the other hand, Joakim even admits his single-byte encoding is
variable-length; otherwise he would simply be dismissing the rarely used (!)
Chinese, Japanese, and Korean languages, as well as any text that contains
words from more than one language.

I suspect he's trolling us, and quite successfully.


