The Case For Autodecode

ag0aep6g via Digitalmars-d digitalmars-d at puremagic.com
Fri Jun 3 04:24:40 PDT 2016


This is mostly me trying to make sense of the discussion.

So everyone hates autodecoding. But Andrei seems to hate it a good bit 
less than everyone else. As far as I could follow, he has one reason for 
that, which might not be clear to everyone:

char converts implicitly to dchar, so the compiler lets you search for a 
dchar in a range of chars. But that compares the code point against 
individual UTF-8 code units, which gives nonsensical results. For 
example, you won't find 'ö' in "ö".byChar, but you will find '¶' in 
there ('¶' is U+00B6, 'ö' is U+00F6, and 'ö' is encoded as 0xC3 0xB6 in 
UTF-8).
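
To make the example concrete, here is a small sketch of what I mean. It assumes the source file is saved as UTF-8 and uses canFind from std.algorithm.searching as the search (my choice for illustration; any element search that compares with == should behave the same way). The last two asserts show the autodecoded string for contrast.

```d
import std.algorithm.searching : canFind;
import std.utf : byChar;

void main()
{
    // Range of code units: "ö" is the two UTF-8 bytes 0xC3 0xB6.
    // 'ö' (U+00F6) equals neither byte, so it is not found.
    assert(!"ö".byChar.canFind('ö'));
    // '¶' (U+00B6) happens to equal the second byte, so it is "found".
    assert("ö".byChar.canFind('¶'));

    // Autodecoded string: the search works on code points instead.
    assert("ö".canFind('ö'));
    assert(!"ö".canFind('¶'));
}
```

Both byChar searches compile without complaint, which is the whole problem.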

The same does not happen when searching for a grapheme in a range of 
code points, because you just can't do that accidentally. dchar does not 
implicitly convert to std.uni.Grapheme.
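
The difference between the two cases can be checked at compile time. This is only a sketch of the type relationships, using is() expressions; it says nothing about what a search over graphemes should look like.

```d
import std.uni : Grapheme;

// A code unit widens to a code point silently ...
static assert(is(char : dchar));
// ... but a code point does not silently become a grapheme.
static assert(!is(dchar : Grapheme));

void main() {}
```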

So autodecoding shields the user from one surprising aspect of narrow 
strings, and indeed this one kind of problem does not exist with code 
points.

So:
code units - a lot of surprises
code points - a lot of surprises minus one

I don't think this makes autodecoding actually desirable, but I do think 
it prevents a mistake that could otherwise be common.

The issue could also be avoided by making char not convert implicitly to 
dchar. I would like that, but it would of course be another substantial 
breaking change.

At Andrei: Apologies if I'm misrepresenting your position. If you have 
other arguments in favor of autodecoding, they haven't gotten through to me.

At everyone: Apologies if I'm just stating the obvious here. I needed 
this pointed out, and it happened in the depths of the other thread. So 
maybe this is an aspect others haven't considered either.

Finally, this is not the only argument in favor of *keeping* 
autodecoding, of course. Not wanting to break user code is the big one 
there, I guess.

