Input ranges

via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sat Apr 18 15:01:54 PDT 2015


It seems that input ranges without any internal indirection do
not work well with algorithms. The D community seems to be aware
of this, but I was not. Here is my story on the topic so far:

Recently, I learned that I did not understand input ranges well
at all; I had totally misjudged how useful std.range.refRange is
for input ranges:
https://github.com/D-Programming-Language/phobos/pull/3123

At this point some experiments might be in order. (using 2.067.0)

Input ranges from std.stdio are used for reading files. So
assuming we create a file

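     import std.algorithm, std.conv, std.range, std.stdio;

     // Write the digits 0 to 4, each repeated on four consecutive lines.
     auto f = File("test.txt", "w");
     f.writeln(iota(5).map!(a => repeat(to!string(a), 4))
                      .joiner.joiner("\n"));
     f.close();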

We should be able to groupBy (chunkBy) its lines:

     writeln(File("test.txt").byLine.groupBy!((a,b) => a == b));

The result is just one group, that is, all lines are considered
equal:

     [["0", "0", "0", "0", "1", "1", "1", "1", "2", "2", "2", "2",
       "3", "3", "3", "3", "4", "4", "4", "4"]]

Alas, byLine reuses the same buffer for each line and thus
groupBy keeps comparing each line with itself. There is a version
of byLine that makes copies:

     writeln(File("test.txt").byLineCopy.groupBy!((a,b) => a == b));

Indeed, the result is as expected:

     [["0", "0", "0", "0"], ["1", "1", "1", "1"], ["2", "2", "2", "2"],
      ["3", "3", "3", "3"], ["4", "4", "4", "4"]]

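The buffer reuse can be made visible without groupBy. The following
is only a minimal sketch and not part of the experiments above: the
scratch file name byline_demo.txt, the helper writeDemoFile and the
variable names are made up for illustration. It stores each front of
byLine once as-is and once duplicated with idup. Only the duplicated
lines are guaranteed to keep their contents; what the aliased slices
show after the loop depends on how the implementation manages its
buffer.

```d
import std.file : remove;
import std.stdio : File, writeln;

// Hypothetical helper: writes three distinct one-character lines.
void writeDemoFile(string name)
{
    auto f = File(name, "w");
    f.writeln("a");
    f.writeln("b");
    f.writeln("c");
    f.close();
}

void main()
{
    enum name = "byline_demo.txt"; // hypothetical scratch file
    writeDemoFile(name);
    scope (exit) remove(name);

    char[][] aliased; // slices that may all point into byLine's buffer
    string[] copied;  // independent copies of each line

    foreach (line; File(name).byLine)
    {
        aliased ~= line;     // keeps a slice of the reused buffer
        copied ~= line.idup; // duplicates the characters
    }

    // The copies are stable regardless of buffer reuse.
    assert(copied == ["a", "b", "c"]);

    writeln(aliased); // contents are not guaranteed after iteration
    writeln(copied);
}
```

Copying every front by hand like this is essentially what byLineCopy
does on behalf of the caller.
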
A final test with the undocumented byRecord method (the mapping
after groupBy is for beauty only and does not change the result):

     writeln(File("test.txt")
             .byRecord!string("%s")
             .groupBy!((a,b) => a == b)
             .map!(map!(a => a[0])));

Here, the result is most peculiar:

     [["0", "0", "0", "0"], ["1", "1", "1"], ["2", "2", "2"],
      ["3", "3", "3"], ["4", "4", "4"]]

Is byRecord broken? (It is undocumented after all.) In a way,
yes, because it holds its state without any indirection: the
tuple of current fields is a plain member of the ByRecord struct,
so every copy of the range carries its own copy of that tuple.
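
To illustrate how such a range can get out of sync, here is a
minimal sketch. It is not the Phobos implementation: CachedFront is
a hypothetical stand-in in which an int[] reached through a pointer
plays the role of the shared file, while the current element is
cached by value, as I assume is roughly the situation in ByRecord.

```d
// Hypothetical model of a file-backed input range: the underlying
// "stream" is shared between copies, the cached front is not.
struct CachedFront
{
    int[]* stream;  // shared by all copies, like a file handle
    int current;    // cached front, duplicated on every copy
    bool exhausted;

    this(int[]* s)
    {
        stream = s;
        popFront(); // prime the cache with the first element
    }

    @property bool empty() const { return exhausted; }
    @property int front() const { return current; }

    void popFront()
    {
        if ((*stream).length == 0)
        {
            exhausted = true;
            return;
        }
        current = (*stream)[0];      // read the next element
        *stream = (*stream)[1 .. $]; // advance the shared stream
    }
}

void main()
{
    auto data = [0, 0, 1, 1];
    auto original = CachedFront(&data);
    auto copy = original; // copies the cache, shares the stream

    copy.popFront(); // consumes the second 0 from the shared stream
    copy.popFront(); // consumes the first 1
    assert(copy.front == 1);

    assert(original.front == 0); // stale cache, unaware of the pops
    original.popFront();
    assert(original.front == 1); // the elements in between were skipped
    assert(data.length == 0);
}
```

An algorithm that copies the range internally and advances the copy
leaves the original in exactly this kind of inconsistent state.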

In contrast, the ByLineCopy struct is just a wrapper around a
ref-counted ByLineCopyImpl struct, with a simple note:

         /* Ref-counting stops the source range's ByLineCopyImpl
          * from getting out of sync after the range is copied, e.g.
          * when accessing range.front, then using std.range.take,
          * then accessing range.front again. */
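
The front / take / front sequence from that note can be tried on a
plain value-type range. This is a minimal sketch under my own
assumptions: Counter is a hypothetical input range with no
indirection at all, and std.range.refRange stands in for the
indirection that ByLineCopy obtains through ref-counting.

```d
import std.algorithm : equal;
import std.range : refRange, take;

// Hypothetical input range whose whole state lives in the struct.
struct Counter
{
    int current;
    int limit;

    @property bool empty() const { return current >= limit; }
    @property int front() const { return current; }
    void popFront() { ++current; }
}

void main()
{
    auto r = Counter(0, 10);
    assert(r.front == 0);

    // take stores a copy of r, so consuming it leaves r untouched.
    assert(r.take(3).equal([0, 1, 2]));
    assert(r.front == 0);

    // Through refRange, take advances r itself.
    assert(refRange(&r).take(3).equal([0, 1, 2]));
    assert(r.front == 3);
}
```

With a pure value type the original simply does not advance. With a
range that shares part of its state, such as a file handle, the copy
and the original can disagree about what the current front is.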

I am uncomfortable at this point. Simple and efficient input
ranges fail in unexpected ways. Internal indirections make all
the difference. It feels like input ranges are hiding something
that should not be hidden.

What am I missing?

