is std.algorithm.joiner lazy?
Puming via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Thu Apr 7 17:30:05 PDT 2016
On Thursday, 7 April 2016 at 18:15:07 UTC, Jonathan M Davis wrote:
> On Thursday, April 07, 2016 08:47:15 Puming via
> Digitalmars-d-learn wrote:
>> On Thursday, 7 April 2016 at 08:27:23 UTC, Edwin van Leeuwen
>> wrote:
>> > On Thursday, 7 April 2016 at 08:17:38 UTC, Puming wrote:
>> >> On Thursday, 7 April 2016 at 08:07:12 UTC, Edwin van
>> >> Leeuwen wrote:
>> >>
>> >> OK. Even if it consumes the first two elements, then why
>> >> does it have to consume them AGAIN when actually used? If
>> >> the function mkarray has side effects, it could lead to
>> >> problems.
>> >
>> > After some testing it seems to get each element twice, calls
>> > front on the MapResult twice, on each element. The first two
>> > mkarray are both for first element, the second two for the
>> > second. You can solve this by caching the front call with:
>> >
>> > xs.map!(x=>mkarray(x)).cache.joiner;
>>
>> Thanks! I added more elements to xs and checked that you are
>> right.
>>
>> So EVERY element is accessed twice with joiner. Better add
>> that to the docs, and note the use of cache.
>
> I would note that in general, it's not uncommon for an
> algorithm to access front multiple times. So, this really isn't
> a joiner-specific issue. If anything, it's map that should get
> a note in its docs, not joiner. You really should just expect
> front to be called multiple times. So, if that's a problem, use
> cache. But joiner is not doing anything abnormal.
But the joiner docs say that joiner is lazy, and accessing front
multiple times is not true laziness. I think it would be better to
note that right after the lazy part: "joiner is lazy, but it will
access the front twice".
If many other lazy functions behave like this, I suggest giving
that behavior a new name, like 'semi-lazy', to be more accurate.
Maybe it's my fault: I didn't know what cache does until Edwin
told me.
So the solution exists, it is just not easy for newbies to find,
because there is no direct link between these functions.
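To make the front-vs-cache behavior concrete, here is a minimal sketch. The mkarray below is a hypothetical stand-in for the one discussed in this thread (its real body is not shown here), and the counter is only there for illustration. It shows that every call to front on a plain map result re-runs the mapped function, while cache evaluates front once and then returns the stored value. How many times joiner itself calls front may depend on the Phobos version, so the last count is printed rather than assumed.

```d
import std.algorithm : cache, equal, joiner, map;
import std.stdio : writeln;

int calls; // counts how often mkarray runs (illustration only)

// Hypothetical stand-in for the mkarray discussed in this thread.
int[] mkarray(int x)
{
    ++calls;
    return [x, x];
}

// Calls front twice on a plain map result and reports how often mkarray ran.
int callsForPlainMap()
{
    calls = 0;
    auto r = [1, 2, 3].map!(x => mkarray(x));
    auto a = r.front; // runs mkarray
    auto b = r.front; // runs mkarray again
    return calls;
}

// Same, but with cache inserted after map.
int callsForCachedMap()
{
    calls = 0;
    // cache evaluates front once on construction and stores the result.
    auto r = [1, 2, 3].map!(x => mkarray(x)).cache;
    auto a = r.front; // returns the stored value
    auto b = r.front; // returns the stored value
    return calls;
}

void main()
{
    writeln("plain map:  ", callsForPlainMap());
    writeln("cached map: ", callsForCachedMap());

    calls = 0;
    auto joined = [1, 2, 3].map!(x => mkarray(x)).cache.joiner;
    assert(joined.equal([1, 1, 2, 2, 3, 3]));
    writeln("mkarray calls with cache.joiner: ", calls);
}
```

Note that cache is itself eager about the current front: it computes it on construction and on each popFront, so it trades "evaluate on demand" for "evaluate once per element".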
>
> And it's not even the case that it necessarily makes sense to
> make a rule of thumb that ranges should copy front instead of
> calling it multiple times, because if front returns by ref,
> calling front multiple times is likely to be cheaper, and
> while we don't properly support non-copyable types (like
> UniquePtr) with ranges right now, we really should, so if
> anything, it becomes the case that algorithms should favor
> calling front multiple times over copying its value.
Indeed. I agree that copying is not good, but multiple access is
something to note. When I want laziness, it is usually because I'm
reading files, so accessing each element twice is not acceptable.
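For the file-reading case, a single level of map, cache and joiner might look like the sketch below. The names loadLines and allLines and the temporary file names are hypothetical, chosen only for this example; it assumes each file is small enough to read whole with readText, and it does not cover composing the pipeline several times.

```d
import std.algorithm : cache, equal, joiner, map;
import std.file : readText, remove, tempDir, write;
import std.path : buildPath;
import std.string : splitLines;

// Hypothetical helper: reads a whole file and splits it into lines.
string[] loadLines(string path)
{
    return readText(path).splitLines;
}

// Yields the lines of all files in order. cache stores the result of
// loadLines for the current file, so repeated front calls made by
// joiner reuse it instead of reading the file again.
auto allLines(string[] paths)
{
    return paths.map!(p => loadLines(p)).cache.joiner;
}

void main()
{
    // Two small temporary files, created only for this demonstration.
    auto a = buildPath(tempDir, "joiner_demo_a.txt");
    auto b = buildPath(tempDir, "joiner_demo_b.txt");
    write(a, "one\ntwo\n");
    write(b, "three\n");
    scope (exit) { remove(a); remove(b); }

    assert(allLines([a, b]).equal(["one", "two", "three"]));
}
```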
>
> So, there are pros and cons involved with copying front vs
> calling it multiple times, and I think that both approaches are
> pretty common at this point. So, given how frequently it
> makes sense for map to allocate (e.g. to!string(a)), map should
> probably have a note about cache, but overall, it's just
> something that you need to be aware of. Regardless, I don't
> think that it makes sense to put anything in joiner's docs
> about it.
There is another problem: map, cache, and joiner don't work when
composed multiple times. I've submitted a bug,
https://issues.dlang.org/show_bug.cgi?id=15891. Can you confirm?
Because of this, I now have to either read a file multiple times
(using only joiner), eagerly retrieve the data into an array (which
is too big), or fall back to an imperative way of manually
accessing each file. They are all bad.
>
> - Jonathan M Davis