The Case Against Autodecode
Joakim via Digitalmars-d
digitalmars-d at puremagic.com
Wed Jun 1 06:57:27 PDT 2016
On Wednesday, 1 June 2016 at 10:04:42 UTC, Marc Schütz wrote:
> On Tuesday, 31 May 2016 at 16:29:33 UTC, Joakim wrote:
>> UTF-8 is an antiquated hack that needs to be eradicated. It
>> forces all languages other than English to be twice as long,
>> for no good reason. Have fun with that when you're downloading
>> text on a 2G connection in the developing world.
>
> I assume you're talking about the web here. In this case, plain
> text makes up only a minor part of the entire traffic, the
> majority of which is images (binary data), javascript and
> stylesheets (almost pure ASCII), and HTML markup (ditto). It's
> likely not significant even without taking compression into
> account, which is ubiquitous.
No, I explicitly said not the web in a subsequent post. The
ignorance here of what 2G speeds are like is mind-boggling.
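
For anyone who wants to check the size argument directly, here is a minimal Python sketch (not from the thread). It compares the UTF-8 byte length of a string with its length in a legacy single-byte encoding. The helper name encoded_sizes and the sample strings are illustrative assumptions, and KOI8-R and Latin-1 only stand in for the kind of single-byte encoding being proposed, not the proposal itself.

```python
def encoded_sizes(text, legacy_codec):
    """Return (utf8_bytes, legacy_bytes) for the same text.

    legacy_codec is any single-byte codec Python ships with,
    e.g. "koi8_r" for Russian or "latin_1" for Western European text.
    """
    return len(text.encode("utf-8")), len(text.encode(legacy_codec))


if __name__ == "__main__":
    # Hypothetical sample strings, chosen only for illustration.
    samples = [
        ("Hello", "ascii"),     # plain English
        ("Привет", "koi8_r"),   # Russian, Cyrillic script
        ("Grüße", "latin_1"),   # German, mostly ASCII letters
    ]
    for text, codec in samples:
        utf8_len, legacy_len = encoded_sizes(text, codec)
        print(f"{text}: utf-8={utf8_len} bytes, {codec}={legacy_len} bytes")
```

The ratio depends on the script: all-Cyrillic text doubles in size, while text that is mostly ASCII letters grows only slightly.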
>> It is unnecessarily inefficient, which is precisely why
>> auto-decoding is a problem.
>
> No, inefficiency is the least of the problems with
> auto-decoding.
Right... that's why this 200-post thread was spawned with that as
the main reason.
>> It is only a matter of time till UTF-8 is ditched.
>
> This is ridiculous, even if your other claims were true.
The UTF-8 encoding is what's ridiculous.
>>
>> D devs should lead the way in getting rid of the UTF-8
>> encoding, not bickering about how to make it more palatable.
>> I suggested a single-byte encoding for most languages, with
>> double-byte for the ones which wouldn't fit in a byte. Use
>> some kind of header or other metadata to combine strings of
>> different languages, _rather than encoding the language into
>> every character!_
>
> I think I remember that post, and - sorry to be so blunt - it
> was one of the worst things I've ever seen proposed regarding
> text encoding.
Well, when you _like_ a ludicrous encoding like UTF-8, not sure
your opinion matters.
>>
>> The common string-handling use case, by far, is strings with
>> only one language, with a distant second some substrings in a
>> second language, yet here we are putting the overhead into
>> every character to allow inserting characters from an
>> arbitrary language! This is madness.
>
> No. The common string-handling use case is code that is unaware
> which script (not language, btw) your text is in.
Lol, this may be the dumbest argument put forth yet.
I don't think anyone here even understands what a good encoding
is and what it's for, which is why there's no point in debating
this.