Dicebot on leaving D: It is anarchy driven development in all its glory.

H. S. Teoh hsteoh at quickfur.ath.cx
Wed Sep 5 22:00:27 UTC 2018


On Wed, Sep 05, 2018 at 09:33:27PM +0000, aliak via Digitalmars-d wrote:
[...]
> The dstring is only ok because the 2 code units fit in a dchar right?
> But all the other ones are as expected right?

And dstring will be wrong once you have non-precomposed diacritics and
other composing sequences.
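
A minimal sketch of why, in Python rather than D and using an example
string of my own (not the one from the earlier message): the same
visible "é" can be one precomposed code point or a base letter plus a
combining accent.  The three counts correspond to what .length would
report for a D string (UTF-8), wstring (UTF-16) and dstring (UTF-32).
The helper name unit_counts is hypothetical.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point (U+00E9)
decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT


def unit_counts(s):
    """Return (UTF-8, UTF-16, UTF-32) code unit counts for s."""
    return (
        len(s.encode("utf-8")),
        len(s.encode("utf-16-le")) // 2,
        len(s.encode("utf-32-le")) // 4,
    )


print(unit_counts(precomposed))  # (2, 1, 1)
print(unit_counts(decomposed))   # (3, 2, 2)

# NFC normalization recombines this particular sequence, but not every
# combining sequence has a precomposed form, so normalizing is not a
# general fix.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Both strings display as one character, yet even the UTF-32 count is 2
for the decomposed form.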


> Seriously... why is it not graphemes by default for correctness
> whyyyyyyy!

Because grapheme decoding is SLOW, and most of the time you don't even
need it anyway.  SLOW as in, it will easily add a factor of 3-5 (if not
worse!) to your string processing time, which will make your
natively-compiled D code the laughing stock of interpreted languages
like Python.  It will make autodecoding look like an optimization(!).

Grapheme decoding is really only necessary when (1) you're typesetting a
Unicode string, or (2) you're counting the number of visual characters
taken up by the string.  Even in the second case, grapheme counting may
not give you what you want, thanks to double-width characters,
zero-width characters, etc., though it can form the basis of correct
counting code.
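
To illustrate the width caveat, here is a rough sketch in Python (not
D).  It is an approximation under stated assumptions, not a correct
terminal-width algorithm: it treats East Asian Wide/Fullwidth code
points as two columns and combining marks and format characters as zero
columns, and it does no grapheme segmentation at all.  The function
name approx_columns is hypothetical.

```python
import unicodedata


def approx_columns(s):
    """Crude estimate of the display columns a string occupies."""
    cols = 0
    for ch in s:
        # Nonspacing marks, enclosing marks and format characters
        # (e.g. U+200B ZERO WIDTH SPACE) take no column of their own.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian Wide and Fullwidth characters take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            cols += 2
        else:
            cols += 1
    return cols


print(approx_columns("abc"))                 # 3
print(approx_columns("\u65e5\u672c\u8a9e"))  # 6: three wide characters
print(approx_columns("a\u200bb"))            # 2: zero-width space ignored
print(approx_columns("e\u0301"))             # 1: combining accent ignored
```

The second and third strings each have three code points, yet they
occupy six and two columns respectively, so neither a code point count
nor a grapheme count alone gives the width.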

For all other cases, you really don't need grapheme decoding, and being
forced to iterate over graphemes when unnecessary will add a horrible
overhead, worse than autodecoding does today.
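
One example of such a case, sketched in Python with strings of my own
choosing: substring search.  UTF-8 is self-synchronizing, so searching
the raw bytes for a validly encoded needle can only match at a code
point boundary, and no decoding is needed.  This assumes both strings
use the same normalization form; a byte search will not match a
precomposed needle against a decomposed haystack.

```python
haystack = "na\u00efve caf\u00e9"   # "naïve café"
needle = "caf\u00e9"                # "café"

hay_bytes = haystack.encode("utf-8")
needle_bytes = needle.encode("utf-8")

# Plain byte-level search: no code point or grapheme decoding.
byte_index = hay_bytes.find(needle_bytes)
print(byte_index)  # 7 (the ï takes two bytes)

# The match begins on a code point boundary, so the prefix is valid UTF-8.
prefix = hay_bytes[:byte_index].decode("utf-8")
print(prefix == "na\u00efve ")  # True
```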

//

Seriously, people need to get over the fantasy that they can just use
Unicode without understanding how Unicode works.  Most of the time, you
can get the illusion that it's working, but 99% of the time the code is
actually wrong and will do the wrong thing when given an unexpected (but
still valid) Unicode string.  You can't drive without a license, and
even if you try anyway, the chances of ending up in a nasty accident are
pretty high.  People *need* to learn how to use Unicode properly before
complaining about why this or that doesn't work the way they thought it
should work.


T

-- 
Gone Chopin. Bach in a minuet.