is std.algorithm.joiner lazy?

Puming via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Apr 7 17:30:05 PDT 2016


On Thursday, 7 April 2016 at 18:15:07 UTC, Jonathan M Davis wrote:
> On Thursday, April 07, 2016 08:47:15 Puming via 
> Digitalmars-d-learn wrote:
>> On Thursday, 7 April 2016 at 08:27:23 UTC, Edwin van Leeuwen 
>> wrote:
>> > On Thursday, 7 April 2016 at 08:17:38 UTC, Puming wrote:
>> >> On Thursday, 7 April 2016 at 08:07:12 UTC, Edwin van 
>> >> Leeuwen wrote:
>> >>
>> >> OK. Even if it consumes the first two elements, then why 
>> >> does it have to consume them AGAIN when actually used? If 
>> >> the function mkarray has side effects, it could lead to 
>> >> problems.
>> >
>> > After some testing it seems to get each element twice, calls 
>> > front on the MapResult twice, on each element. The first two 
>> > mkarray are both for first element, the second two for the 
>> > second. You can solve this by caching the front call with:
>> >
>> > xs.map!(x=>mkarray(x)).cache.joiner;
>>
>> Thanks! I added more elements to xs and checked that you are 
>> right.
>>
>> So EVERY element is accessed twice with joiner. Better add 
>> that to the docs, and note the use of cache.
>
> I would note that in general, it's not uncommon for an 
> algorithm to access front multiple times. So, this really isn't 
> a joiner-specific issue. If anything, it's map that should get 
> a note in its docs, not joiner. You really should just expect 
> front to be called multiple times. So, if that's a problem, use 
> cache. But joiner is not doing anything abnormal.

But the joiner docs say that joiner is lazy, and accessing front 
multiple times is not true laziness. I think it would be better 
to note that right after the lazy part: "joiner is lazy, but it 
will access the front twice".

If many other lazy functions behave like this, I suggest giving 
this behavior a new name, like 'semi-lazy', to be more accurate.

Maybe it's my fault; I didn't know what cache does before Edwin 
told me. So there is a solution, it is just not easy for newbies 
to find because there is no direct link between these functions.
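
For anyone else who hits this, here is a minimal sketch of the 
effect being discussed. It assumes a side-effecting mkarray like 
the one from earlier in the thread; the counter `calls` is a name 
I made up for illustration. It only shows that map re-runs the 
function on every front call and that cache stores the result. How 
many times joiner itself calls front may depend on the Phobos 
version, so the sketch does not assert that.

```d
import std.algorithm : cache, joiner, map;
import std.array : array;

int calls; // hypothetical counter, only here to observe side effects

int[] mkarray(int x)
{
    ++calls;        // the side effect we want to count
    return [x, x];
}

void main()
{
    auto xs = [1, 2, 3];

    // map does not store anything: each front call runs mkarray again.
    calls = 0;
    auto mapped = xs.map!(x => mkarray(x));
    auto a = mapped.front;
    auto b = mapped.front;
    assert(calls == 2);

    // cache keeps the current front, so repeated access is free.
    calls = 0;
    auto cached = xs.map!(x => mkarray(x)).cache;
    auto c = cached.front;
    auto d = cached.front;
    assert(calls == 1);

    // The flattened result is the same either way.
    auto flat = xs.map!(x => mkarray(x)).cache.joiner.array;
    assert(flat == [1, 1, 2, 2, 3, 3]);
}
```

So the trade-off is one stored element per cache in exchange for 
running the mapped function once per element.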

>
> And it's not even the case that it necessarily makes sense to 
> make a rule of thumb that ranges should copy front instead of 
> calling it multiple times, because if front returns by ref, 
> calling front multiple times is likely to be cheaper, and 
> while we don't properly support non-copyable types (like 
> UniquePtr) with ranges right now, we really should, so if 
> anything, it becomes the case that algorithms should favor 
> calling front multiple times over copying its value.

Indeed. I think copying is not good. But multiple access is a 
thing to note. When I want laziness, it is usually because I'm 
reading files, so accessing twice is not acceptable.
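
To make the file case concrete, here is a rough sketch of what I 
mean, under some assumptions: the helper linesOf, the counter 
`reads`, and the temp file names are all made up for the example. 
It also shows one thing worth knowing about cache: it evaluates 
front eagerly when it is constructed, so the first file is read 
before anything is consumed.

```d
import std.algorithm : cache, joiner, map;
import std.array : array;
import std.file : readText, remove, tempDir, write;
import std.path : buildPath;
import std.string : splitLines;

int reads; // hypothetical counter of how often a file is opened

string[] linesOf(string path)
{
    ++reads;
    return path.readText.splitLines;
}

void main()
{
    // Two small throwaway files (names are arbitrary).
    auto paths = [buildPath(tempDir, "joiner_demo_a.txt"),
                  buildPath(tempDir, "joiner_demo_b.txt")];
    write(paths[0], "a1\na2\n");
    write(paths[1], "b1\n");
    scope (exit)
    {
        foreach (p; paths)
            remove(p);
    }

    reads = 0;
    auto cached = paths.map!(p => linesOf(p)).cache;
    // cache has already evaluated the first front at this point.
    assert(reads == 1);

    // One level of map + cache + joiner gives the flattened lines.
    auto all = cached.joiner.array;
    assert(all == ["a1", "a2", "b1"]);
}
```

This covers only a single level of composition; the nested case 
is a separate matter.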

>
> So, there are pros and cons involved with copying front vs 
> calling it multiple times, and I think that both approaches are 
> pretty common at this point. So, given how frequently it 
> makes sense for map to allocate (e.g. to!string(a)), map should 
> probably have a note about cache, but overall, it's just 
> something that you need to be aware of. Regardless, I don't 
> think that it makes sense to put anything in joiner's docs 
> about it.

There is another problem: map, cache, and joiner don't work when 
composed multiple times. I've submitted a bug, 
https://issues.dlang.org/show_bug.cgi?id=15891, can you confirm?

Because of this, I now have to read a file multiple times (using 
only joiner), or eagerly retrieve the data into an array (which 
is too big), or fall back to the imperative way of manually 
accessing each file. They are all bad.
>
> - Jonathan M Davis



