Why does enumerate over range return dchar, when ranging without returns char?

Jonathan M Davis newsgroup.d at jmdavisprog.com
Thu May 3 10:09:32 UTC 2018


On Thursday, May 03, 2018 22:00:04 rikki cattermole via Digitalmars-d-learn 
wrote:
> On 03/05/2018 9:50 PM, ag0aep6g wrote:
> > On 05/03/2018 07:56 AM, rikki cattermole wrote:
> >>> ```
> >>> import std.stdio;
> >>> import std.range : enumerate;
> >>>
> >>> void main()
> >>> {
> >>>      char[] s = ['a','b','c'];
> >>>
> >>>      char[3] x;
> >>>      auto i = 0;
> >>>      foreach(c; s) {
> >>>          x[i] = c;
> >>>          i++;
> >>>      }
> >>>
> >>>      writeln(x);
> >>> }
> >>> ```
> >>> Above works without cast.
> >>>
> >>> ```
> >>> import std.stdio;
> >>> import std.range : enumerate;
> >>>
> >>> void main()
> >>>      {
> >>>      char[] s = ['a','b','c'];
> >>>
> >>>      char[3] x;
> >>>      foreach(i, c; enumerate(s)) {
> >>>          x[i] = c;
> >>>          i++;
> >>>      }
> >>>
> >>>      writeln(x);
> >>> }
> >>> ```
> >
> > [...]
> >
> >> The first example uses auto-decoding (UTF-8 codepoints into a single
> >> UTF-32 one). This is considered a bad thing. But the compiler can
> >> disable it and leave it as UTF-8 code point upon request.
> >
> > The first example (foreach over a char[]) doesn't do any decoding. UTF-8
> > stays UTF-8.
> >
> > Also, a `char` is a UTF-8 code *unit*, not a code *point*.
> >
> >> The second example returns a Voldemort type (means no-name) which
> >> happens to be an input range. Where it can't disable anything and has
> >> been told that it is returning a dchar. See[0] as to where this gets
> >> decoded.
> >
> > This is auto decoding.
> >
> >> Writing two small functions to replace it (and popFront), will
> >> override this behavior.
> >
> > This sounds like you can disable auto decoding by providing your own
> > range primitives in your own module. That doesn't work, because Phobos
> > would still use the ones from std.range.primitives.
>
> Hmm, I swear this used to work.
>
> Oh well, easy fix:
>
> import std.algorithm;
>
> struct Wrapper {
>      char[] input;
>      alias input this;
>
>      @property char front() { return input[0]; }
>      @property bool empty() {return input.length == 0;}
>      void popFront() { input = input[1 .. $]; }
> }
>
> void main() {
>      char[] text = ['1', '2', '3'];
>
>      foreach(c; Wrapper(text).filter!(a => a != '\0')) {
>          pragma(msg, typeof(c));
>      }
> }

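The standard way to get around auto-decoding is std.utf.byCodeUnit, which
wraps a string in a range whose elements are code units (char for a char[])
rather than decoded dchars.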
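
A minimal sketch of how this might be applied to the enumerate example from
the original question. The helper name copyCodeUnits is hypothetical and only
there for illustration; the sketch assumes a Phobos version that provides
std.utf.byCodeUnit.

```d
import std.range : enumerate;
import std.stdio : writeln;
import std.utf : byCodeUnit;

// Hypothetical helper (not from the thread): copies s one code unit at a time.
char[] copyCodeUnits(char[] s)
{
    auto x = new char[](s.length);

    // byCodeUnit yields char elements, so enumerate produces
    // (index, char) pairs and no cast from dchar is needed.
    foreach (i, c; s.byCodeUnit.enumerate)
    {
        static assert(is(typeof(c) == char));
        x[i] = c;
    }
    return x;
}

void main()
{
    char[] s = ['a', 'b', 'c'];
    writeln(copyCodeUnits(s));
}
```

The static assert only documents the element type at compile time; it can be
dropped once the behavior is clear.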

- Jonathan M Davis


