[OT] Effect of UTF-8 on 2G connections
Wyatt via Digitalmars-d
digitalmars-d at puremagic.com
Wed Jun 1 11:30:25 PDT 2016
On Wednesday, 1 June 2016 at 16:45:04 UTC, Joakim wrote:
> On Wednesday, 1 June 2016 at 15:02:33 UTC, Wyatt wrote:
>> It's not hard. I think a lot of us remember when a 14.4 modem
>> was cutting-edge.
>
> Well, then apparently you're unaware of how bloated web pages
> are nowadays. It used to take me minutes to download popular
> web pages _back then_ at _top speed_, and those pages were a
> _lot_ smaller.
It's telling that you think the encoding of the text is anything
but the tiniest fraction of the problem. You should look at
where the actual weight of a "modern" web page comes from.
>> Codepages and incompatible encodings were terrible then, too.
>>
>> Never again.
>
> This only shows you probably don't know the difference between
> an encoding and a code page,
"I suggested a single-byte encoding for most languages, with
double-byte for the ones which wouldn't fit in a byte. Use some
kind of header or other metadata to combine strings of different
languages, _rather than encoding the language into every
character!_"
Yeah, that? That's codepages. And your exact proposal to put
encodings in the header was ALSO tried around the time that
Unicode was getting hashed out. It sucked. A lot. (Not as bad
as storing it in the directory metadata, though.)
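To make the comparison concrete, here is a minimal Python sketch (not from the thread; the run-tag format is hypothetical) of what a per-run "header scheme" looks like next to plain UTF-8:

```python
# Hypothetical "header scheme": each run of text is tagged with the
# legacy codepage it was encoded with.  This is exactly the codepage
# model: the bytes are meaningless without the out-of-band tags.
runs = [
    ("cp1251", "Привет".encode("cp1251")),       # Russian, 1 byte/char
    ("shift_jis", "日本語".encode("shift_jis")),  # Japanese, 2 bytes/char
]

# Reassembly must consult every tag; slicing or searching the raw byte
# stream without them is ambiguous, because the same byte values mean
# different characters under different codepages.
decoded = "".join(data.decode(enc) for enc, data in runs)
assert decoded == "Привет日本語"

# UTF-8 needs no tags: the character identity is recoverable from the
# bytes alone, in any mix of scripts, in one stream.
utf8 = "Привет日本語".encode("utf-8")
assert utf8.decode("utf-8") == "Привет日本語"
```

The cost of the tagless property is the extra bytes per non-ASCII character that the thread is arguing about.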
>>> Well, when you _like_ a ludicrous encoding like UTF-8, not
>>> sure your opinion matters.
>>
>> It _is_ kind of ludicrous, isn't it? But it really is the
>> least-bad option for the most text. Sorry, bub.
>
> I think we can do a lot better.
Maybe. But no one's done it yet.
> The vast majority of software is written for _one_ language,
> the local one. You may think otherwise because the software
> that sells the most and makes the most money is
> internationalized software like Windows or iOS, because it can
> be resold into many markets. But as a percentage of lines of
> code written, such international code is almost nothing.
I'm surprised you think this even matters after talking about web
pages. The browser is your most common string processing
situation. Nothing else even comes close.
> largely ignoring the possibilities of the header scheme I
> suggested.
"Possibilities" that were considered and discarded decades ago by
people with way better credentials. The era of single-byte
encodings is gone, it won't come back, and good riddance to bad
rubbish.
> I could call that "trolling" by all of you, :) but I'll instead
> call it what it likely is, reactionary thinking, and move on.
It's not trolling to call you out for clearly not doing your
homework.
> I don't think you understand: _you_ are the special case.
Oh, I understand perfectly. _We_ (whoever "we" are) can handle
any sequence of glyphs and combining characters (correctly-formed
or not) in any language at any time, so we're the special case...?
Yeah, it sounds funny to me, too.
> The 5 billion people outside the US and EU are _not the special
> case_.
Fortunately, it works for them too.
> The problem is all the rest, and those just below who cannot
> afford it at all, in part because the tech is not as efficient
> as it could be yet. Ditching UTF-8 will be one way to make it
> more efficient.
All right, now you've found the special case: the one where the
generic, unambiguous encoding may need to be lowered to something
else, because for people under _current_ network constraints that
encoding is suboptimal.
I fully acknowledge it's a couple billion people and that's
nothing to sneeze at, but I also see that it's a situation that
will become less relevant over time.
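A minimal Python sketch (mine, not from the thread) of the size argument for that constrained case: raw UTF-8 Cyrillic really is about twice the size of the cp1251 codepage, but gzip, which any web stack of that era already applies, erases most of the gap for repetitive text:

```python
import gzip

# Russian sample: each Cyrillic letter is 2 bytes in UTF-8 but
# 1 byte in the legacy cp1251 codepage.
text = "Привет, мир! " * 100

utf8 = text.encode("utf-8")
cp1251 = text.encode("cp1251")

# Raw sizes: UTF-8 carries a real overhead for non-ASCII scripts.
print(len(utf8), len(cp1251))          # 2200 vs 1300 bytes

# After gzip (the transfer encoding browsers negotiate anyway),
# both shrink drastically and end up close in size.
print(len(gzip.compress(utf8)), len(gzip.compress(cp1251)))
```

The exact compressed numbers depend on the text, but the point stands: on a compressed transport, the UTF-8 penalty is far smaller than the raw byte counts suggest.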
-Wyatt