lazy thoughts

Mon Jan 12 12:44:22 PST 2009

On Tue, Jan 13, 2009 at 2:05 AM, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:
> (I originally emailed this to Walter and a couple others, but I thought it
> might be better to gather insights from the community.)
>
> I'm gearing up for changing std.algorithm, and I am thinking of making the
> major change of favoring lazy evaluation throughout. For example, consider
> the map function. Right now map allocates and returns a new vector, for
> example:
>
> int[] arr = [ 1, 2, 3, 4 ];
> auto squares = map!("a * a")(arr);
> assert(squares == [ 1, 4, 9, 16 ]);
>
> What happens is unfortunate because (1) all evaluation is done upfront even
> if it is only partially needed, (2) allocation is done every call, (3) the
> whole scheme won't work with unbounded inputs, e.g. generators or even large
> files.
>
> So now that we have nice range support in the language, I'm thinking very
> seriously of switching full-bore to lazy semantics. In that case map returns
> a range - a little struct that saves the input and trickles out results one
> at a time.
>
> One nice thing about lazy is that lazy includes eager. If you actually want
> to "eagerize" map, you just call eager against the returned range:
>
> int[] arr = [ 1, 2, 3, 4 ];
> auto squares = eager(map!("a * a")(arr));
> assert(squares == [ 1, 4, 9, 16 ]);
>
> The real advantage comes when you actually exploit the laziness, e.g.:
>
> int[] arr = [ 1, 2, 3, 4 ];
> auto squares = map!("a * a")(arr);
> foreach (x; squares) {
>    writeln(x);
> }
>
> Look ma, no allocation!
>
> I just wanted to gauge your opinion on this. It is a major departure from
> the STL, and I think in the right direction. It is a departure nonetheless
> and some STL users might feel uncomfortable.
>
> Also, lazy evaluation has the risk of getting confusing as there's a lot of
> escaping. Consider:
>
> int[] arr = [ 1, 2, 3, 4 ];
> auto squares = map!("a * a")(arr);
> arr[] = [ 5, 6, 7, 8 ];
>
> Now iterating squares will see different numbers than the original ones.
>
> Please let me know what you think!

I think having such functions is great.  I'm not so sure about
removing the others and replacing them wholesale with these.  In
Python they have had two versions of most functions, like range vs
xrange.  The former returns an array with the result, the latter
returns a lazy iterable.  In the Python 3.0 they have done away with
most of the x__ versions and made those default.  So you might say
A-ha! the array versions weren't necessary!  But the difference is
that Python cares more about simplicity than performance, so a little
loss in speed in exchange for only-one-way-to-do-it is acceptable
there.

But I'm just guessing there will be some penalty for doing everything
lazily in the case where you know you want a full array.  More data is
needed on that I think.

But even assuming it's is a free lunch, I'd want a better way make an
array than putting .eager on everything.  First off, that's not even
remotely a verb (like .eval() or .exec()), nor is it a noun that
clearly expresses the type it will become (ala array() or toArray()).
Second it's just a lot of typing for something that could be a pretty
common case, still.   I'd be happier with a one letter difference,
like Python had.  But making the defaults the other way would be fine
by me,   map -> lazy map   emap -> eager map.

--bb