Why not flag away the mistakes of the past?
Guillaume Piolat
notthat at email.com
Fri Mar 9 11:58:54 UTC 2018
On Thursday, 8 March 2018 at 17:35:11 UTC, H. S. Teoh wrote:
> Yeah, the only reason autodecoding survived in the beginning
> was because Andrei (wrongly) thought that a Unicode code point
> was equivalent to a grapheme. If that had been the case, the
> cost associated with auto-decoding may have been justifiable.
> Unfortunately, that is not the case, which greatly diminishes
> most of the advantages that autodecoding was meant to have. So
> it ended up being something that incurred a significant
> performance hit, yet did not offer the advantages it was
> supposed to. To fully live up to Andrei's original vision, it
> would have to include grapheme segmentation as well.
> Unfortunately, graphemes are of arbitrary length and cannot in
> general fit in a single dchar (or any fixed-size type), and
> grapheme segmentation is extremely costly to compute, so doing
> it by default would kill D's string manipulation performance.
I remember it a bit differently from the last time this was
discussed:
- removing auto-decoding would break a lot of code; it's used
in lots of places
- the performance loss can be mitigated with .byCodeUnit every
time
- Andrei correctly advocated against breakage
Personally I do use auto-decoding, often iterating by code
point, and use it for fonts and parsers. It's correct for a
large subset of languages. You gave us a feature and now we are
using it ;)
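For readers following along, the three iteration levels being argued about can be seen side by side. A minimal sketch using Phobos (std.utf.byCodeUnit, std.uni.byGrapheme, std.range.walkLength); the string "e\u0301" ('e' plus a combining acute accent) is one grapheme, but not one code point or one code unit:

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // 'e' + U+0301 (combining acute accent): renders as a single "é".
    string s = "e\u0301";

    // UTF-8 code units: 'e' is 1 byte, U+0301 encodes as 2 bytes.
    writeln(s.byCodeUnit.walkLength);  // 3

    // Auto-decoding: iterating a string yields dchar code points.
    writeln(s.walkLength);             // 2

    // Grapheme segmentation: what a user perceives as one character.
    writeln(s.byGrapheme.walkLength);  // 1
}
```

This is the gap the quoted post describes: auto-decoding stops at the middle level, which is correct for languages whose text is one code point per grapheme, but not in general.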
More information about the Digitalmars-d
mailing list