Why not flag away the mistakes of the past?

Guillaume Piolat notthat at email.com
Fri Mar 9 11:58:54 UTC 2018


On Thursday, 8 March 2018 at 17:35:11 UTC, H. S. Teoh wrote:
> Yeah, the only reason autodecoding survived in the beginning 
> was because Andrei (wrongly) thought that a Unicode code point 
> was equivalent to a grapheme.  If that had been the case, the 
> cost associated with auto-decoding may have been justifiable.  
> Unfortunately, that is not the case, which greatly diminishes 
> most of the advantages that autodecoding was meant to have.  So 
> it ended up being something that incurred a significant 
> performance hit, yet did not offer the advantages it was 
> supposed to.  To fully live up to Andrei's original vision, it 
> would have to include grapheme segmentation as well.  
> Unfortunately, graphemes are of arbitrary length and cannot in 
> general fit in a single dchar (or any fixed-size type), and 
> grapheme segmentation is extremely costly to compute, so doing 
> it by default would kill D's string manipulation performance.


I remember something a bit different last time it was discussed:

  - removing auto-decoding would break a lot of code; it's used 
in lots of places
  - the performance loss could be mitigated with .byCodeUnit 
where needed
  - Andrei correctly advocated against breakage
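
The .byCodeUnit mitigation is easy to see in a small sketch 
(standard Phobos: walkLength from std.range, byCodeUnit from 
std.utf; the string literal is just an illustration):

```d
import std.range : walkLength;
import std.utf : byCodeUnit;

void main()
{
    string s = "héllo"; // 6 UTF-8 code units, 5 code points

    // Range primitives on string auto-decode, so range algorithms
    // see dchar code points and pay the decode cost:
    assert(s.walkLength == 5);

    // byCodeUnit wraps the string so algorithms see raw UTF-8
    // code units instead, skipping the per-element decode:
    assert(s.byCodeUnit.walkLength == 6);

    // .length was never decoded in the first place:
    assert(s.length == 6);
}
```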

Personally I do use auto-decoding, often iterating by code 
point, and I rely on it for fonts and parsers. It's correct for 
a large subset of languages. You gave us a feature and now we 
are using it ;)

