Transience of .front in input vs. forward ranges

H. S. Teoh hsteoh at quickfur.ath.cx
Sun Nov 4 14:10:11 PST 2012


On Sun, Nov 04, 2012 at 12:38:06PM -0500, Andrei Alexandrescu wrote:
> On 11/4/12 12:26 PM, deadalnix wrote:
> >I think it fits D's philosophy nicely, in the sense that it provides
> >a safe, easy-to-use interface with a backdoor for when this isn't
> >enough.
> 
> It doesn't fit the (admittedly difficult to fulfill) desideratum that
> the obvious code is safe and fast. And the obvious code uses byLine,
> not byLine.transient.

Actually, it does. The proposal was to modify byLine so that it returns
strings instead of a reused char[] buffer. The current version of byLine
that has transient front will be moved into a .transient property. This
has the following results:

- No existing code breaks: it just becomes a tad slower from
  duplicating each line. Correctness is preserved.

- Code that wants to be fast can call .transient at the end of the
  construction, e.g., stdin.byLine().map!myFunc.transient. Assuming
  that map propagates .transient (which also implies it's safe to use
  with transient ranges), this will return an optimized range that
  reuses the transient buffer safely.

- Code that calls .transient on non-transient ranges gets a no-op:
  [1,2,3].map!myFunc.transient is equivalent to [1,2,3].map!myFunc,
  because non-transient ranges just alias .transient to themselves (this
  can be done via UFCS so no actual change needs to be made to existing
  non-transient ranges).
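The no-op case in the last point can be sketched roughly as follows.
This is only an illustration of the idea: the free function `transient`
is the name assumed by this proposal, not something that exists in
Phobos, and the constraint on it is my own guess at what would be
reasonable.

```d
import std.algorithm : equal, map;
import std.range : isInputRange;

// Hypothetical fallback (not part of Phobos): any range that does not
// define its own .transient member resolves r.transient to this free
// function via UFCS and simply gets itself back.
auto transient(R)(R r) if (isInputRange!R)
{
    return r;
}

void main()
{
    // A mapped array has no .transient member, so the call is a no-op.
    auto r = [1, 2, 3].map!(x => x * 2);
    assert(r.transient.equal(r));

    // Same for a plain array.
    assert([1, 2, 3].transient == [1, 2, 3]);
}
```

Member lookup is tried before UFCS, so a range that does define its own
.transient would never reach this fallback.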

IOW, code is always 100% safe by default. You can optionally use
.transient with certain ranges if you'd like for it to be faster. Using
.transient where it isn't supported simply defaults to the usual safe
behaviour (no performance increase, but no safety bugs either).
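To illustrate "safe by default, fast on request", here is a minimal
sketch of a line range split in that way. It is not the real byLine:
`SafeLines` and `TransientLines` are made-up names, and an in-memory
array of strings stands in for the file so that the example is
self-contained.

```d
// Fast variant: front is a slice of one reused buffer, so it is only
// valid until the next popFront.
struct TransientLines
{
    private const(string)[] src;
    private char[] store;   // the reused buffer
    private size_t len;     // length of the current line inside store

    this(const(string)[] lines) { src = lines; fill(); }

    // Copy the current line into the reused buffer.
    private void fill()
    {
        if (src.length == 0) return;
        immutable n = src[0].length;
        if (store.length < n) store.length = n;
        store[0 .. n] = src[0][];
        len = n;
    }

    @property bool empty() const { return src.length == 0; }
    @property const(char)[] front() const { return store[0 .. len]; }
    void popFront() { src = src[1 .. $]; fill(); }
}

// Safe default: front is a fresh immutable copy of each line.
struct SafeLines
{
    private TransientLines impl;

    this(const(string)[] lines) { impl = TransientLines(lines); }

    @property bool empty() const { return impl.empty; }
    @property string front() const { return impl.front.idup; }
    void popFront() { impl.popFront(); }

    // The opt-in backdoor: hand out the buffer-reusing variant.
    TransientLines transient() { return impl; }
}

void main()
{
    auto lines = ["ab", "cd"];
    const(char)[][] kept;

    // Default: holding on to each front is fine.
    foreach (l; SafeLines(lines)) kept ~= l;
    assert(kept[0] == "ab" && kept[1] == "cd");

    // Opt-in: the first slice we kept now shows the second line,
    // because both slices point into the same buffer.
    kept = null;
    foreach (l; SafeLines(lines).transient) kept ~= l;
    assert(kept[0] == "cd" && kept[1] == "cd");
}
```

The second loop is exactly the kind of code that must not be handed a
transient range, which is why the fast variant is opt-in.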

Now, this can be taken one step further. User code need not even know
what .transient is. As deadalnix said, the insight is that it is only
invoked by range *consumers*. For example, one could in theory make
canFind() transient-safe, so the user would write this:

	auto n = stdin.byLine().map!myFunc.canFind(myNeedle);

and canFind uses the .transient range to do the finding. Assuming
map!myFunc correctly supports .transient, this request for the faster
range automatically propagates back to byLine(), and so the code is fast
*without the user even asking for .transient*.

And if the range being scanned is non-transient, then it behaves
normally as it does today. For example, if map!myFunc does *not* support
.transient, then when canFind asks for .transient, UFCS takes over and
it just gets the (non-transient) mapped range back, which in turn reads
from the *non-transient* version of byLine(). So in no case will there
be any safety breakage.
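Here is a rough sketch of how that propagation could look. All names in
it (`Lines`, `Mapped`, `mapped`, `hasElement`, `len`, `fastRequests`)
are hypothetical stand-ins for byLine, map and canFind, and the counter
exists only so the example can show which path was taken.

```d
import std.range.primitives;

int fastRequests;   // demo only: counts requests that reach the source

// Assumed fallback: ranges without a .transient member return themselves.
auto transient(R)(R r) if (isInputRange!R) { return r; }

// Stand-in for byLine. A real version would hand out a buffer-reusing
// range from transient(); here it only records the request.
struct Lines
{
    string[] src;
    @property bool empty() { return src.length == 0; }
    @property string front() { return src[0]; }
    void popFront() { src = src[1 .. $]; }
    Lines transient() { ++fastRequests; return this; }
}

// Stand-in for map: it never keeps front around, so it can forward the
// request for the fast variant to whatever range it wraps.
struct Mapped(alias fun, R)
{
    R src;
    @property bool empty() { return src.empty; }
    @property auto front() { return fun(src.front); }
    void popFront() { src.popFront(); }
    auto transient() { return mapped!fun(src.transient); }
}

auto mapped(alias fun, R)(R r) { return Mapped!(fun, R)(r); }

// Stand-in for canFind: a consumer that only looks at the current front,
// so it asks for .transient itself. The caller never mentions it.
bool hasElement(R, E)(R r, E needle) if (isInputRange!R)
{
    foreach (e; r.transient)
        if (e == needle) return true;
    return false;
}

size_t len(string s) { return s.length; }

void main()
{
    // The request travels consumer -> Mapped -> Lines.
    assert(Lines(["a", "bb"]).mapped!len.hasElement(2));
    assert(fastRequests == 1);

    // A plain array has no .transient member; the fallback is a no-op.
    assert(["a", "bb"].mapped!len.hasElement(2));
    assert(fastRequests == 1);
}
```

Note that the calling code in main is identical in both cases; whether
the fast path is taken is decided entirely by the ranges and the
algorithm involved.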

I think this solution is the most elegant so far, in the sense that:

(1) Existing code doesn't break;
(2) Existing code doesn't even need to change, or can be slowly
optimized to use .transient on a case-by-case basis without any code
breakage;
(3) The user doesn't even need to know what .transient is to reap the
benefits, where it's supported;
(4) The algorithm writer decides whether or not .transient is supported,
guaranteeing that there will be no hidden bugs caused by assuming .front
is persistent and then getting a transient range passed to it. So it's
the algorithm that declares whether or not it supports .transient.


> Back to a simpler solution: what's wrong with adding alternative
> APIs for certain input ranges? We have byLine, byChunk,
> byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync
> and let the documentation explain the differences.
[...]

This is less elegant than deadalnix's proposal, because there is a
certain risk associated with using transient ranges. Now the onus is on
the user to make sure a transient range is never passed to an algorithm
that can't handle it.

With deadalnix's proposal, it's the *algorithm* that decides whether or
not it knows how to handle .transient properly. The user doesn't have to
know, and can still reap the benefits. An algorithm that doesn't know
how to deal with transient ranges simply does nothing, and UFCS takes
over to provide the default .transient, which just returns the original
non-transient range.


T

-- 
The state pretends to pay us a salary, and we pretend to work.

