Input ranges
anonymous via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Apr 19 04:33:25 PDT 2015
On Saturday, 18 April 2015 at 22:01:56 UTC, Ulrich Küttler wrote:
> Input ranges from std.stdio are used for reading files. So
> assuming we create a file
>
> auto f = File("test.txt", "w");
> f.writeln(iota(5).map!(a => repeat(to!string(a), 4)).joiner.joiner("\n"));
> f.close();
>
> We should be able to groupBy (chunkBy) its lines:
>
> writeln(File("test.txt").byLine.groupBy!((a,b) => a == b));
>
> The result is just one group, that is all lines are considered
> equal:
>
> [["0", "0", "0", "0", "1", "1", "1", "1", "2", "2", "2",
> "2", "3", "3", "3", "3", "4", "4", "4", "4"]]
>
> Alas, byLine reuses the same buffer for each line and thus
> groupBy keeps comparing each line with itself. There is a
> version of byLine that makes copies:
>
> writeln(File("test.txt").byLineCopy.groupBy!((a,b) => a == b));
>
> Indeed, the result is as expected:
>
> [["0", "0", "0", "0"], ["1", "1", "1", "1"], ["2", "2",
> "2", "2"], ["3", "3", "3", "3"], ["4", "4", "4", "4"]]
Yeah, byLine is dangerous. byLineCopy should probably have been
the default. Maybe we should rename byLine to byLineNoCopy (doing
the proper deprecation dance, of course).
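To make the buffer reuse concrete, here is a minimal sketch. The file name and the helper names `copiedLines` and `viaByLineCopy` are made up for illustration. It stores the slices handed out by byLine directly, then compares that with copying each line first. What the first writeln prints depends on how the buffer happens to be reused, so I am not claiming any particular output for it.

```d
import std.algorithm : map;
import std.array : array;
import std.file : remove, write;
import std.stdio : File, writeln;

// Hypothetical helper: copy each line out of byLine's reused buffer.
string[] copiedLines(string path)
{
    return File(path).byLine.map!(line => line.idup).array;
}

// Hypothetical helper: let byLineCopy allocate a fresh string per line.
string[] viaByLineCopy(string path)
{
    return File(path).byLineCopy.array;
}

void main()
{
    enum path = "byline_demo.txt"; // scratch file, name is an assumption
    write(path, "0\n0\n1\n");
    scope (exit) remove(path);

    // These slices may all refer to the same buffer, so earlier
    // lines can appear overwritten by later ones.
    writeln(File(path).byLine.array);

    // Both of these hold independent copies of every line.
    writeln(copiedLines(path));
    writeln(viaByLineCopy(path));
}
```

The `idup` variant and byLineCopy should give the same lines; byLineCopy just saves you from remembering the copy.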
> A final test with the undocumented byRecord method (the mapping
> after groupBy is for beauty only and does not change the
> result):
>
> writeln(File("test.txt")
> .byRecord!string("%s")
> .groupBy!((a,b) => a == b)
> .map!(map!(a => a[0])));
>
> Here, the result is most peculiar:
>
> [["0", "0", "0", "0"], ["1", "1", "1"], ["2", "2", "2"],
> ["3", "3", "3"], ["4", "4", "4"]]
>
> Is byRecord broken? (It is undocumented after all.) In a way,
> because it does not contain any indirection. The current fields
> tuple is a simple member of the ByRecord struct.
>
> In contrast, the ByLineCopy struct is just a wrapper to a ref
> counted ByLineCopyImpl struct with a simple note:
>
> /* Ref-counting stops the source range's ByLineCopyImpl
> * from getting out of sync after the range is copied, e.g.
> * when accessing range.front, then using std.range.take,
> * then accessing range.front again. */
>
> I am uncomfortable at this point. Simple and efficient input
> ranges fail in unexpected ways. Internal indirections make all
> the difference. It feels like input ranges are hiding something
> that should not be hidden.
>
> What am I missing?
I guess the problem is the mix of value and reference semantics.
ByRecord's `current` is a value, but its `file` has reference
semantics. So, advancing a copy of a ByRecord moves the file
position that the original shares, but leaves the original's
`current` untouched.
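A small sketch of that mix, without any file involved. `Source`, `Mixed` and `observe` are hypothetical names, not anything from Phobos. `Source` is a class standing in for the file, and `Mixed` imitates the layout I described for ByRecord, which is an assumption about the layout rather than its actual code.

```d
import std.array : array;
import std.range : take;
import std.stdio : writeln;

// Hypothetical stand-in for File: a class, so every copy of the range shares it.
final class Source
{
    int next; // next value to hand out
}

// Hypothetical range: `current` is a plain value member,
// `src` has reference semantics.
struct Mixed
{
    Source src;
    int current;

    this(Source s)
    {
        src = s;
        popFront(); // prime `current`, as a file-backed range would
    }

    enum bool empty = false; // infinite, to keep the sketch short
    @property int front() const { return current; }
    void popFront() { current = src.next++; }
}

// Returns what a copy sees, followed by what the original sees afterwards.
int[] observe()
{
    auto r = Mixed(new Source);
    auto viaCopy = r.take(2).array; // take stores a copy of r
    auto stale = r.front;           // the original's `current` never moved
    r.popFront();                   // but the shared source did
    return viaCopy ~ [stale, r.front];
}

void main()
{
    writeln(observe()); // [0, 1, 0, 3]
}
```

The original repeats 0 and never sees 2, which is the same kind of damage as the short groups in the byRecord output.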
Maybe copying should be `@disable`d for such ranges/structs. Then
you couldn't pass it by value to groupBy. Instead you would have
to use something like (the fixed version of) refRange, which
works properly.
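Roughly what I have in mind, as a sketch only. `Source`, `Mixed`, `NoCopy` and `observeViaRef` are hypothetical names, and I am assuming a Phobos release in which refRange has the fix mentioned above. The first part shows that `@disable this(this)` rejects copies at compile time; the second passes a range through std.range.refRange so that the consumer advances the original instead of a copy.

```d
import std.array : array;
import std.range : refRange, take;
import std.stdio : writeln;

// Hypothetical stand-in for File: shared by reference.
final class Source
{
    int next;
}

// Hypothetical range mixing a value member with a reference member.
struct Mixed
{
    Source src;
    int current;

    this(Source s)
    {
        src = s;
        popFront();
    }

    enum bool empty = false; // infinite, to keep the sketch short
    @property int front() const { return current; }
    void popFront() { current = src.next++; }
}

// Hypothetical struct with copying disabled.
struct NoCopy
{
    int current;
    @disable this(this);
}

// Copying a NoCopy does not compile.
static assert(!__traits(compiles, { NoCopy a; NoCopy b = a; }));

// Consumes two elements through a RefRange, then looks at the original.
int[] observeViaRef()
{
    auto r = Mixed(new Source);
    auto firstTwo = refRange(&r).take(2).array; // advances r itself
    return firstTwo ~ [r.front];
}

void main()
{
    writeln(observeViaRef()); // [0, 1, 2]
}
```

Here the original continues with 2 after the first two elements are taken, so nothing is repeated or lost.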