Autodecode?

Sun Aug 16 21:30:50 UTC 2020

On Sunday, 16 August 2020 at 20:53:41 UTC, JN wrote:
> Related to this thread: 
> https://forum.dlang.org/post/xtjzhkvszdiwvrmryubq@forum.dlang.org
>
> I don't want to hijack it with my newbie questions. What is 
> autodecode and why is it such a big deal? From what I've seen 
> it's related to handling Unicode characters? And D has the 
> wrong defaults?

For built-in arrays, the range primitives (empty, front, 
popFront, etc.) are implemented as free functions in the 
standard-library module `std.range.primitives`. [1]

For most arrays, these work the way you'd expect: empty checks if 
the array is empty, front returns `array[0]`, and popFront does 
`array = array[1..$]`.
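
As a rough illustration of the ordinary case (a minimal sketch; the `int[]` and its values are just an example I picked):

```d
import std.range.primitives : empty, front, popFront;

void main()
{
    int[] array = [10, 20, 30];

    assert(!array.empty);
    assert(array.front == 10);   // same as array[0]

    array.popFront();            // same as array = array[1 .. $]
    assert(array == [20, 30]);
    assert(array.front == 20);
}
```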

But for char[] and wchar[] specifically (including their const 
and immutable variants, such as `string` and `wstring`), `front` 
and `popFront` work differently. They treat the arrays as UTF-8 
or UTF-16 encoded Unicode strings: `front` returns, and 
`popFront` removes, the first *code point* instead of the first 
*code unit*. In other words, they "automatically decode" the 
array.
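
Here is a small sketch of the difference, assuming a char[] that holds "é" (U+00E9, two UTF-8 code units) followed by "a". The sample string is my own choice:

```d
import std.range.primitives : front, popFront;

void main()
{
    char[] s = "\u00E9a".dup;    // "éa" encoded as UTF-8

    // Indexing and .length work on code units.
    assert(s.length == 3);       // 2 code units for é, 1 for a
    assert(s[0] == 0xC3);        // first byte of é, not a whole character
    static assert(is(typeof(s[0]) == char));

    // front decodes and returns a code point (dchar).
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == '\u00E9');

    // popFront drops the whole code point, i.e. two code units here.
    s.popFront();
    assert(s == "a");
}
```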

This has a number of annoying consequences. New users get 
mysterious template errors in the middle of range pipelines 
complaining about a mismatch between `dchar` (the type of a code 
point) and `char` (the type of a code unit). Generic code that 
deals with arrays has to add special cases for char[] and 
wchar[]. Strings don't work correctly in betterC, which has no 
exception support, because Unicode decoding can throw an 
exception. [2] If you search the forums, you'll find plenty more 
complaints.
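
To make the first two complaints concrete, here is a sketch of what generic code sees. `firstStored` is a hypothetical helper I made up for illustration, not anything from Phobos; the traits are the real ones from `std.range.primitives` and `std.traits`:

```d
import std.range.primitives : ElementType, front, hasLength,
                              isRandomAccessRange;
import std.traits : isNarrowString;

// Through the range interface, char[] and wchar[] have dchar elements.
static assert(is(ElementType!(int[]) == int));
static assert(is(ElementType!(char[]) == dchar));    // not char
static assert(is(ElementType!(wchar[]) == dchar));   // not wchar

// They also stop counting as random-access ranges with a length.
static assert(hasLength!(int[]) && isRandomAccessRange!(int[]));
static assert(!hasLength!(char[]) && !isRandomAccessRange!(char[]));

// Hypothetical helper: return the first stored element of any array.
T firstStored(T)(T[] arr)
{
    static if (isNarrowString!(T[]))
        return arr[0];       // special case: skip autodecoding
    else
        return arr.front;
}

void main()
{
    assert(firstStored([1, 2, 3]) == 1);
    assert(firstStored("\u00E9".dup) == 0xC3);   // first code unit
}
```

Without the `static if`, the char[] instantiation would try to return a `dchar` from a function declared to return `char`, which is the kind of mismatch that shows up in those template errors.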

The intent behind autodecoding was to help programmers avoid 
common Unicode-related errors by doing "the right thing" by 
default. The problem is that (a) decoding to code points isn't 
always the right thing, and (b) autodecoding ended up causing a 
bunch of additional problems of its own.
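
One common illustration of (a), which is my example rather than anything specific from the linked thread: a single visible character can be made of several code points, so counting code points still gives a surprising answer. A sketch using `byGrapheme` from `std.uni`:

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: displays as one "é".
    string s = "e\u0301";

    assert(s.length == 3);                  // UTF-8 code units
    assert(s.walkLength == 2);              // code points (autodecoded)
    assert(s.byGrapheme.walkLength == 1);   // user-perceived characters
}
```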

[1] http://dpldocs.info/experimental-docs/std.range.primitives.html
[2] https://issues.dlang.org/show_bug.cgi?id=20139