Autodecode?
Paul Backus
snarwin at gmail.com
Sun Aug 16 21:30:50 UTC 2020
On Sunday, 16 August 2020 at 20:53:41 UTC, JN wrote:
> Related to this thread:
> https://forum.dlang.org/post/xtjzhkvszdiwvrmryubq@forum.dlang.org
>
> I don't want to hijack it with my newbie questions. What is
> autodecode and why is it such a big deal? From what I've seen
> it's related to handling Unicode characters? And D has the
> wrong defaults?

For built-in arrays, the range primitives (empty, front,
popFront, etc.) are implemented as free functions in the
standard-library module `std.range.primitives`. [1]

For most arrays, these work the way you'd expect: empty checks if
the array is empty, front returns `array[0]`, and popFront does
`array = array[1..$]`.
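
As a rough illustration of the ordinary case, here is a minimal sketch that walks an `int[]` using only the three primitives. The helper name `sumViaPrimitives` is made up for this example; only `empty`, `front` and `popFront` come from `std.range.primitives`.

```d
import std.range.primitives : empty, front, popFront;

// Hypothetical helper: sums an int[] using only the range primitives.
int sumViaPrimitives(int[] array)
{
    int total = 0;
    while (!array.empty)       // false once array.length == 0
    {
        total += array.front;  // same as array[0]
        array.popFront();      // same as array = array[1 .. $]
    }
    return total;
}

void main()
{
    assert(sumViaPrimitives([10, 20, 30]) == 60);
    assert(sumViaPrimitives(null) == 0);  // an empty array sums to 0
}
```

Since `popFront` only reslices the local copy of the slice, the caller's array is left untouched.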

But for char[] and wchar[] specifically (which includes `string`
and `wstring`), `front` and `popFront` work differently. They
treat the arrays as UTF-8 or UTF-16 encoded Unicode strings, and
return/pop the first *code point* instead of the first *code
unit*. In other words, they "automatically decode" the array.
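
The difference is easiest to see on a string containing a non-ASCII character. The following is a small sketch, assuming the autodecoding behavior described above; the sample string is arbitrary and is written with `\u` escapes so the result does not depend on the source file's encoding.

```d
import std.range.primitives : front, popFront, walkLength;

void main()
{
    // U+00E9 (e with acute accent) takes two code units in UTF-8.
    string s = "h\u00E9llo";

    // Indexing and .length operate on code units (char).
    static assert(is(typeof(s[0]) == immutable(char)));
    assert(s.length == 6);

    // front decodes and yields a code point (dchar).
    auto first = s.front;
    static assert(is(typeof(first) == dchar));
    assert(first == 'h');

    // Walking the string as a range counts code points.
    assert(s.walkLength == 5);

    s.popFront();                // drops 'h' (one code unit)
    assert(s.front == '\u00E9');
    s.popFront();                // drops U+00E9 (two code units at once)
    assert(s == "llo");
}
```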

This has a number of annoying consequences. New users get
mysterious template errors in the middle of range pipelines
complaining about a mismatch between `dchar` (the type of a code
point) and `char` (the type of a code unit). Generic code that
deals with arrays has to add special cases for char[] and
wchar[]. Strings don't work correctly in betterC because Unicode
decoding can throw an exception. [2] If you search the forums,
you'll find plenty more complaints.
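
To make the `dchar`/`char` mismatch concrete, here is a sketch of where it comes from and of two tools generic code commonly reaches for. It is an illustration, not a complete list of workarounds: `isNarrowString` (from `std.traits`) detects the special case, and `byCodeUnit` (from `std.utf`) opts out of decoding.

```d
import std.array : array;
import std.range.primitives : ElementType;
import std.traits : isNarrowString;
import std.utf : byCodeUnit;

void main()
{
    // As a range, a string yields dchar even though it stores char.
    static assert(is(ElementType!string == dchar));
    static assert(is(ElementType!(int[]) == int));

    // So collecting a string back into an array gives dchar[], and
    // assigning the result to a char[] does not compile.
    auto decoded = "abc".array;
    static assert(is(typeof(decoded) == dchar[]));
    static assert(!__traits(compiles, { char[] c = "abc".array; }));

    // Generic code can detect the special case with isNarrowString...
    static assert(isNarrowString!string && isNarrowString!(wchar[]));
    static assert(!isNarrowString!(dchar[]));

    // ...or opt out of decoding by iterating code units explicitly.
    auto units = "abc".byCodeUnit;
    static assert(is(ElementType!(typeof(units)) == immutable(char)));
}
```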

The intent behind autodecoding was to help programmers avoid
common Unicode-related errors by doing "the right thing" by
default. The problem is that (a) decoding to code points isn't
always the right thing, and (b) autodecoding ended up causing a
bunch of additional problems of its own.
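
One case where code points are not "the right thing" is text with combining characters, where a single user-perceived character spans several code points. A minimal sketch, using `byGrapheme` from `std.uni` and an arbitrary sample string:

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as a
    // single accented letter, but stored as two code points.
    string s = "e\u0301";

    assert(s.byCodeUnit.walkLength == 3); // code units: 1 + 2 in UTF-8
    assert(s.walkLength == 2);            // code points (autodecoded)
    assert(s.byGrapheme.walkLength == 1); // user-perceived characters
}
```

Which of the three counts is correct depends on the task, which is why a single built-in default is hard to get right.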

[1] http://dpldocs.info/experimental-docs/std.range.primitives.html
[2] https://issues.dlang.org/show_bug.cgi?id=20139