Input ranges

anonymous via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Apr 19 04:33:25 PDT 2015


On Saturday, 18 April 2015 at 22:01:56 UTC, Ulrich Küttler wrote:
> Input ranges from std.stdio are used for reading files. So
> assuming we create a file
>
>     auto f = File("test.txt", "w");
>     f.writeln(iota(5).map!(a => repeat(to!string(a), 4)).joiner.joiner("\n"));
>     f.close();
>
> We should be able to groupBy (chunkBy) its lines:
>
>     writeln(File("test.txt").byLine.groupBy!((a,b) => a == b));
>
> The result is just one group, that is, all lines are considered
> equal:
>
>     [["0", "0", "0", "0", "1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3", "4", "4", "4", "4"]]
>
> Alas, byLine reuses the same buffer for each line and thus
> groupBy keeps comparing each line with itself. There is a
> version of byLine that makes copies:
>
>     writeln(File("test.txt").byLineCopy.groupBy!((a,b) => a == b));
>
> Indeed, the result is as expected:
>
>     [["0", "0", "0", "0"], ["1", "1", "1", "1"], ["2", "2", "2", "2"], ["3", "3", "3", "3"], ["4", "4", "4", "4"]]

Yeah, byLine is dangerous. byLineCopy should probably have been 
the default. Maybe we should rename byLine to byLineNoCopy (doing 
the proper deprecation dance, of course).
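
A minimal sketch of the safe pattern, assuming a scratch file in the
current directory. The file name and the helper `readLinesSafely` are
made up for illustration; `byLine`, `byLineCopy` and `idup` are the
real Phobos/druntime names. The point is only that each line must be
copied before the range advances if it is to be kept.

```d
import std.algorithm : equal, map;
import std.array : array;
import std.file : remove, write;
import std.stdio : File;

/// Hypothetical helper: reads all lines of `path`, copying each one
/// before byLine advances to the next line.
string[] readLinesSafely(string path)
{
    // byLine yields a char[] slice of an internal buffer that a later
    // popFront may overwrite, so take an immutable copy of every line.
    return File(path).byLine.map!(line => line.idup).array;
}

void main()
{
    enum path = "byline_demo.txt"; // hypothetical scratch file
    write(path, "0\n0\n1\n1\n");
    scope (exit) remove(path);

    assert(readLinesSafely(path) == ["0", "0", "1", "1"]);

    // byLineCopy does the copying itself and yields strings.
    assert(File(path).byLineCopy.equal(["0", "0", "1", "1"]));
}
```

Copying costs one allocation per line, which is presumably why the
buffer-reusing variant exists in the first place.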

> A final test with the undocumented byRecord method (the mapping
> after groupBy is for beauty only and does not change the 
> result):
>
>     writeln(File("test.txt")
>             .byRecord!string("%s")
>             .groupBy!((a,b) => a == b)
>             .map!(map!(a => a[0])));
>
> Here, the result is most peculiar:
>
>     [["0", "0", "0", "0"], ["1", "1", "1"], ["2", "2", "2"], ["3", "3", "3"], ["4", "4", "4"]]
>
> Is byRecord broken? (It is undocumented after all.) In a way,
> because it does not contain any indirection. The current fields
> tuple is a simple member of the ByRecord struct.
>
> In contrast, the ByLineCopy struct is just a wrapper to a ref
> counted ByLineCopyImpl struct with a simple note:
>
>         /* Ref-counting stops the source range's ByLineCopyImpl
>          * from getting out of sync after the range is copied, e.g.
>          * when accessing range.front, then using std.range.take,
>          * then accessing range.front again. */
>
> I am uncomfortable at this point. Simple and efficient input
> ranges fail in unexpected ways. Internal indirections make all
> the difference. It feels like input ranges are hiding something
> that should not be hidden.
>
> What am I missing?

I guess the problem is the mix of value and reference semantics. 
ByRecord's `current` is a value, but its `file` has reference 
semantics. So, a copy of a ByRecord affects one part of the 
original but not the other.
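
To make that concrete, here is a toy range with the same shape. It is
not ByRecord's actual code: `Source` and `Cached` are hypothetical
stand-ins, with a class playing the role of the file and an `int`
playing the role of the cached record.

```d
import std.range.primitives : isInputRange;

// Stand-in for a File: a class, so all copies of a handle share state.
final class Source
{
    int[] data;
    this(int[] data) { this.data = data; }
}

// Hypothetical range shaped like ByRecord: a shared source plus a
// by-value cache of the current element.
struct Cached
{
    Source source; // reference semantics
    int current;   // value semantics
    bool empty;

    this(Source source)
    {
        this.source = source;
        popFront(); // prime the cache with the first element
    }

    @property int front() const { return current; }

    void popFront()
    {
        if (source.data.length == 0) { empty = true; return; }
        current = source.data[0];
        source.data = source.data[1 .. $];
    }
}

static assert(isInputRange!Cached);

void main()
{
    auto original = Cached(new Source([0, 1, 2, 3]));
    auto copy = original;        // copies `current`, shares `source`

    copy.popFront();             // takes 1 from the shared source
    copy.popFront();             // takes 2

    assert(copy.front == 2);
    assert(original.front == 0); // stale cached value
    original.popFront();
    assert(original.front == 3); // `original` never saw 1 and 2
}
```

Any algorithm that copies such a range internally and advances the
copy can leave the original with a stale front and skipped elements.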

Maybe copying should be `@disable`d for such ranges/structs. Then 
you couldn't pass it by value to groupBy. Instead you would have 
to use something like (the fixed version of) refRange, which 
works properly.
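
A rough sketch of both ideas, using toy types rather than anything
from std.stdio. `Numbers` and `NoCopy` are hypothetical names;
`std.range.refRange` and `@disable this(this)` are the real features.
This only shows the mechanics and says nothing about whether every
Phobos algorithm accepts a non-copyable range.

```d
import std.algorithm : equal;
import std.range : refRange, take;

// Toy input range counting from `front` up to (excluding) `limit`.
struct Numbers
{
    int front;
    int limit;
    @property bool empty() const { return front >= limit; }
    void popFront() { ++front; }
}

// A struct whose postblit is disabled cannot be copied.
struct NoCopy
{
    int value;
    @disable this(this);
}

void main()
{
    auto numbers = Numbers(0, 6);

    // refRange wraps a pointer, so consuming through the wrapper
    // advances `numbers` itself rather than a private copy.
    assert(refRange(&numbers).take(2).equal([0, 1]));
    assert(numbers.front == 2);

    assert(refRange(&numbers).equal([2, 3, 4, 5]));
    assert(numbers.empty);

    // Copying a NoCopy lvalue is rejected at compile time.
    static assert(!__traits(compiles, { NoCopy x; NoCopy y = x; }));
}
```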

