Transience of .front in input vs. forward ranges

deadalnix deadalnix at gmail.com
Mon Nov 5 07:19:15 PST 2012


On 04/11/2012 23:10, H. S. Teoh wrote:
> On Sun, Nov 04, 2012 at 12:38:06PM -0500, Andrei Alexandrescu wrote:
>> On 11/4/12 12:26 PM, deadalnix wrote:
>>> I think it fits D's philosophy nicely, in the sense that it provides
>>> a safe, easy-to-use interface while offering a backdoor when this
>>> isn't enough.
>>
>> It doesn't fit the (admittedly difficult to fulfill) desideratum that
>> the obvious code is safe and fast. And the obvious code uses byLine,
>> not byLine.transient.
>
> Actually, it does. The proposal was to modify byLine so that it returns
> strings instead of a reused char[] buffer. The current version of byLine
> that has transient front will be moved into a .transient property. This
> has the following results:
>
> - No existing code breaks: it just becomes a tad slower from
>    duplicating each line. Correctness is not broken.
>
> - Code that wants to be fast can call .transient at the end of the
>    construction, e.g., stdin.byLine().map!myFunc.transient. Assuming
>    that map propagates .transient (which also implies it's safe to use
>    with transient ranges), this will return an optimized range that
>    reuses the transient buffer safely.
>
> - Code that calls .transient on non-transient ranges gets a no-op:
>    [1,2,3].map!myFunc.transient is equivalent to [1,2,3].map!myFunc,
>    because non-transient ranges just alias .transient to themselves (this
>    can be done via UFCS so no actual change needs to be made to existing
>    non-transient ranges).
>
> IOW, code is always 100% safe by default. You can optionally use
> .transient with certain ranges if you'd like for it to be faster. Using
> .transient where it isn't supported simply defaults to the usual safe
> behaviour (no performance increase, but no safety bugs either).
>
> Now, this can be taken one step further. User code need not even know
> what .transient is. As deadalnix said, the insight is that it is only
> invoked by range *consumers*. For example, one could in theory make
> canFind() transient-safe, so the user would write this:
>
> 	auto n = stdin.byLine().map!myFunc.canFind(myNeedle);
>
> and canFind uses the .transient range to do the finding. Assuming
> map!myFunc correctly supports .transient, this request for the faster
> range automatically propagates back to byLine(), and so the code is fast
> *without the user even asking for .transient*.
>
> And if the range being scanned is non-transient, then it behaves
> normally as it does today. For example, if map!myFunc does *not* support
> .transient, then when canFind asks for .transient, UFCS takes over and
> it just gets the (non-transient) mapped range back. Which in turn asks
> for the *non-transient* version of byLine(), so in no case will there be
> any safety breakage.
>
> I think this solution is the most elegant so far, in the sense that:
>
> (1) Existing code doesn't break;
> (2) Existing code doesn't even need to change, or can be slowly
> optimized to use .transient on a case-by-case basis without any code
> breakage;
> (3) The user doesn't even need to know what .transient is to reap the
> benefits, where it's supported;
> (4) The algorithm writer decides whether or not .transient is supported,
> guaranteeing that there will be no hidden bugs caused by assuming .front
> is persistent and then getting a transient range passed to it. So it's
> the algorithm that declares whether or not it supports .transient.
>
>
>> Back to a simpler solution: what's wrong with adding alternative
>> APIs for certain input ranges? We have byLine, byChunk,
>> byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync
>> and let the documentation explain the differences.
> [...]
>
> This is less elegant than deadalnix's proposal, because there is a
> certain risk associated with using .transient ranges. Now the onus is on
> the user to make sure he doesn't pass a transient range to an algorithm
> that can't handle it.
>
> With deadalnix's proposal, it's the *algorithm* that decides whether or
> not it knows how to handle .transient properly. The user doesn't have to
> know, and can still reap the benefits. An algorithm that doesn't know
> how to deal with transient ranges simply does nothing, and UFCS takes
> over to provide the default .transient, which just returns the original
> non-transient range.
>

I think I'll hire you when I need to promote my ideas :D This is
explained much better than I managed myself, and it is exactly what I
had in mind.
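
For anyone following along, here is a minimal sketch of the scheme as summarised above. Everything in it is illustrative: Lines, TransientLines and containsLine are made-up names, an array of strings stands in for a file, and none of this is existing Phobos API. It only aims to show the three pieces: a range that is safe by default, an opt-in .transient member that reuses a buffer, and a UFCS fallback so that a consumer can always ask for .transient.

```d
import std.range.primitives : isInputRange;

/// Hypothetical UFCS fallback: a range without its own .transient member
/// simply gets itself back, so asking for .transient is always safe.
auto transient(R)(R r) if (isInputRange!R)
{
    return r;
}

/// Stand-in for the proposed safe byLine: every front is an immutable
/// string that stays valid after popFront.
struct Lines
{
    private string[] src; // pretend this is a file

    bool empty() { return src.length == 0; }
    string front() { return src[0]; }
    void popFront() { src = src[1 .. $]; }

    /// Opt-in fast path: hands out the buffer-reusing variant.
    auto transient() { return TransientLines(src); }
}

/// Stand-in for today's byLine: front is a slice of a reused buffer and
/// is overwritten once the next line is loaded.
struct TransientLines
{
    private string[] src;
    private char[] store; // reused across lines
    private size_t len;
    private bool primed;

    bool empty() { return src.length == 0; }

    const(char)[] front()
    {
        if (!primed)
        {
            immutable n = src[0].length;
            if (store.length < n)
                store.length = n;
            store[0 .. n] = src[0][]; // copy the line into the shared buffer
            len = n;
            primed = true;
        }
        return store[0 .. len];
    }

    void popFront() { src = src[1 .. $]; primed = false; }
}

/// A consumer that never keeps front around, so it can request .transient
/// itself. The caller does not need to know that .transient exists.
bool containsLine(R)(R r, const(char)[] needle) if (isInputRange!R)
{
    foreach (line; r.transient) // member if present, UFCS fallback otherwise
        if (line == needle)
            return true;
    return false;
}

void main()
{
    auto lines = ["alpha", "beta", "gamma"];

    // The consumer picks the fast path on its own.
    assert(Lines(lines).containsLine("beta"));
    assert(!Lines(lines).containsLine("delta"));

    // A plain array has no .transient member: the fallback returns it as is.
    assert(lines.containsLine("gamma"));

    // Default range: storing front is safe.
    string[] kept;
    foreach (l; Lines(lines))
        kept ~= l;
    assert(kept == lines);

    // Transient range: a stored front is clobbered by the next line.
    auto t = Lines(lines).transient;
    auto first = t.front;
    t.popFront();
    assert(t.front == "beta");
    assert(first != "alpha");
}
```

The last few lines show why the transient variant has to stay opt-in: a consumer that holds on to front across popFront would silently see different data, which is exactly the kind of bug the safe default avoids.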

