The Case Against Autodecode

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Fri May 13 15:09:52 PDT 2016


On Fri, May 13, 2016 at 09:26:40PM +0200, Marco Leise via Digitalmars-d wrote:
> On Fri, 13 May 2016 10:49:24 +0000,
> Marc Schütz <schuetzm at gmx.net> wrote:
> 
> > In fact, even most European languages are affected if NFD 
> > normalization is used, which is the default on MacOS X.
> > 
> > And this is actually the main problem with it: It was introduced 
> > to make unicode string handling correct. Well, it doesn't, 
> > therefore it has no justification.
> 
> +1 for leaning back and contemplating exactly what auto-decode
> was aiming for and how it missed that goal.
> 
> You'll see that an ö may still be cut between the o and the ¨.
> Hangul symbols are composed of pieces that go in different
> corners. Those would also be split up by auto-decode.
> 
> Can we handle real world text AT ALL? Are graphemes good
> enough to find the column in a fixed width display of some
> string (e.g. line+column or an error)? No, there may still be
> full-width characters in there that take 2 columns. :p
[...]

A simple lookup table of display widths (one or two columns per
character) ought to fix the full-width problem. Preferably it would live
in std.uni so that it doesn't get reinvented by every other project.
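
To make the idea concrete, here is a minimal sketch of such a width
lookup. It is written in Python only because its standard unicodedata
module already exposes the East Asian Width property; it is not a std.uni
API, and the function name display_width is hypothetical. It assumes that
Wide and Fullwidth characters take two columns, that combining marks take
none, and that everything else (including Ambiguous-width characters)
takes one.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate how many fixed-width columns `text` occupies.

    Assumption for this sketch: East Asian Wide ("W") and Fullwidth ("F")
    code points take 2 columns, combining marks take 0, all others take 1.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Nonzero canonical combining class, e.g. U+0308 COMBINING
            # DIAERESIS: it stacks on the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


if __name__ == "__main__":
    print(display_width("o\u0308"))  # NFD form of "ö"
    print(display_width("日本語"))
```

This deliberately ignores control characters, zero-width format
characters, and conjoining Hangul jamo, all of which a real table would
have to account for.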


T

-- 
Don't modify spaghetti code unless you can eat the consequences.

