problem with byLine

Tue May 15 08:11:07 PDT 2012

On Tue, May 15, 2012 at 04:42:39PM +0200, dcoder wrote:
> On Monday, 14 May 2012 at 09:00:14 UTC, simendsjo wrote:
[...]
> >I believe byLine reuses the internal buffer. Try duping the lines:
> 
> 
> 
> >  auto i = f.byLine().map!"a.idup"().array();
> 
> 
> Can someone please explain to me the last line?
> 
> I'm trying to learn D, by playing with code and reading this forum.
> I'm a slow learner.  :)
> 
> Anyways, I looked at std.stdio code and noticed that byLine returns
> a struct ByLine, but where does the .map come from?  Thanks!

map is one of the generic algorithms from std.algorithm. It "maps" the
expression "a.idup" to each element returned by byLine().

I'll try to explain slowly.

First, f.byLine() returns a range of lines, that is, a struct that has
the members .front, .popFront(), and .empty. Anything that implements
these three methods can be treated as a sort of generic "array", which
we call a "range". By not requiring a specific type for map (and other
algorithms in std.algorithm), we allow any kind of concrete type to be
used without needing any explicit conversions. In a nutshell, the struct
that f.byLine() returns is, abstractly speaking, a range of lines that
you can iterate over.
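
To make the range idea concrete, here is a minimal sketch. Countdown and
countdownDoubled are made-up names for illustration only; nothing like
them exists in Phobos. The point is merely that map (and foreach)
accept any struct with those three members:

```d
import std.algorithm : map;
import std.array : array;
import std.stdio : writeln;

// Hypothetical example type: counts down from `current` to 1.
struct Countdown
{
    int current;

    // True when there is nothing left to iterate over.
    @property bool empty() const { return current <= 0; }

    // The element at the head of the range.
    @property int front() const { return current; }

    // Advance to the next element.
    void popFront() { --current; }
}

int[] countdownDoubled(int start)
{
    // map accepts Countdown because it has empty, front and popFront.
    return array(map!"a * 2"(Countdown(start)));
}

void main()
{
    // foreach works on any range, too.
    foreach (n; Countdown(3))
        writeln(n);                   // prints 3, 2, 1 on separate lines

    writeln(countdownDoubled(3));     // prints [6, 4, 2]
}
```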

Second, .map!"a.idup"() makes use of Uniform Function Call Syntax
(UFCS), a neat feature of D: if you use member invocation syntax
obj.memb(x,y,z), but memb isn't a member of obj, the compiler quietly
rewrites the call to memb(obj,x,y,z). So, the line up to the .map call
is actually treated by the compiler as:

	map!"a.idup"(f.byLine())

that is, it takes the expression "a.idup" and applies it to each of the
lines in f.byLine(), substituting "a" with each respective line. This,
in effect, creates another range of lines, which is the range resulting
from calling .idup on each line returned by f.byLine(). In other words,
this makes a copy of each line returned by f.byLine().
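
If you want to convince yourself that the two spellings are the same
call, a small sketch along these lines should do. The functions shout
and sameResult are invented for this example, and I use an in-memory
array of lines instead of a file to keep it short:

```d
import std.algorithm : equal, map;
import std.stdio : writeln;

// Hypothetical free function; string has no member called `shout`.
string shout(string s, string suffix)
{
    return s ~ suffix;
}

// Compares the member-style call with the plain function call.
bool sameResult(char[][] lines)
{
    auto viaUfcs = lines.map!"a.idup"();   // member invocation syntax
    auto direct  = map!"a.idup"(lines);    // what the compiler rewrites it to
    return equal(viaUfcs, direct);
}

void main()
{
    writeln("hello".shout("!"));     // prints hello!
    writeln(shout("hello", "!"));    // prints hello!

    writeln(sameResult(["ab".dup, "cd".dup]));   // prints true
}
```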

Finally, the .array() call at the end turns the range returned by map()
back into an array. This is needed because, just as map() takes a range
as input (remember, a range is anything with the members .front,
.popFront(), and .empty), it also returns a range. What it returns is an
internal object that implements the range methods .front, .popFront(),
and .empty, and which iterates over each element of the result. This
internal object is, in general, not the same as an actual array, so to
get an array out of it, we need to explicitly make an array from it
using .array().
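
A quick way to see the difference between the lazy range and a real
array is a sketch like the following (squares is just a name I picked
for the example; array is imported from std.array):

```d
import std.algorithm : map;
import std.array : array;
import std.stdio : writeln;

int[] squares(int[] input)
{
    // map returns a lazy range object, not an int[].
    auto lazySquares = map!"a * a"(input);
    static assert(!is(typeof(lazySquares) == int[]));

    // array walks the range once and stores every element in a new array.
    return array(lazySquares);
}

void main()
{
    writeln(squares([1, 2, 3]));   // prints [1, 4, 9]
}
```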

Here, again, UFCS is exploited: std.array defines a function called
array(R), which takes a single parameter, a range, and turns it into an
array. Since the range returned by map() doesn't have a member called
"array", when you write f.byLine().map!"a.idup"().array(), it gets
translated into:

	array(map!"a.idup"(f.byLine()))

which is how you'd write this in traditional function-call syntax. UFCS,
however, lets you write things in the order they happen, which some find
to be more readable:

	f.byLine().map!"a.idup"().array()

means "take f, get a range of its lines, map the expression "a.idup" to
the lines, then make an array out of that".

The key to this line is the map!"a.idup", which makes a copy of each
line returned by f.byLine(). This is necessary because byLine() doesn't
allocate a new array for each line; it _reuses_ an internal buffer to
store each line it reads from f. So if you didn't duplicate each line,
by the time the next line is read, the previous line has been
overwritten, and you'd get garbage data. Using map!"a.idup" essentially
means "call .idup on every line returned", thereby avoiding this
problem.
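
If you'd like to see the effect for yourself, something like the sketch
below should show it. The file name byline_demo.txt and the helper
readLinesCopied are my own inventions for the demonstration. What the
uncopied version prints is not something I'd rely on, since it depends
on how the buffer happens to get reused:

```d
import std.algorithm : map;
import std.array : array;
import std.file : remove, write;
import std.stdio : File, writeln;

// Reads all lines, copying each one so it survives the next read.
string[] readLinesCopied(string path)
{
    auto f = File(path, "r");
    return f.byLine().map!"a.idup"().array();
}

void main()
{
    // Hypothetical scratch file, used only for this demonstration.
    enum path = "byline_demo.txt";
    write(path, "one\ntwo\nthree\n");
    scope (exit) remove(path);

    // Without the copy, the stored slices may all alias the same reused
    // buffer, so their contents are unreliable after later reads.
    auto f = File(path, "r");
    char[][] aliased = f.byLine().array();
    writeln(aliased);

    // With the copy, each line is an independent immutable string.
    writeln(readLinesCopied(path));   // prints ["one", "two", "three"]
}
```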

Hope this helps.

T

-- 
This is not a sentence.