Transient ranges

Mon May 30 06:26:35 PDT 2016

On 5/30/16 12:17 AM, Jonathan M Davis via Digitalmars-d wrote:
> On Sunday, May 29, 2016 13:36:24 Steven Schveighoffer via Digitalmars-d wrote:
>> On 5/27/16 9:48 PM, Jonathan M Davis via Digitalmars-d wrote:
>>> On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
>>>> So what about the convention to explicitly declare a
>>>> `.transient` enum member on a range, if the front element value
>>>> can change?
>>>
>>> Honestly, I don't think that supporting transient ranges is worth it.
>>> Every
>>> single range-based function would have to either test that the "transient"
>>> enum wasn't there or take transient ranges into account, and
>>> realistically,
>>> that isn't going to happen. For better or worse, we do have byLine in
>>> std.stdio, which has a transient front, but aside from the performance
>>> benefits, it's been a disaster.
>>
>> Wholly disagree. If we didn't cache the element, D would be a
>> laughingstock of performance-minded tests.
>
> Having byLine not copy its buffer is fine. Having it be a range is not.
> Algorithms in general just do not play well with that behavior, and I don't
> think that it's reasonable to expect them to.

I disagree. Most algorithms in std.algorithm are fine with transient ranges.

>
>>> It's way too error-prone. We now have
>>> byLineCopy to combat that, but of course, byLine is the more obvious
>>> function and thus more likely to be used (plus it's been around longer),
>>> so
>>> a _lot_ of code is going to end up using it, and a good chunk of that code
>>> really should be using byLineCopy.
>>
>> There's nothing actually wrong with using byLine, and copying on demand.
>> Why such a negative connotation?
>
> Because it does not play nicely with ranges, and aside from a few rare
> ranges like byLine that have to deal directly with I/O, transience isn't
> even useful. Having an efficient solution that plays nicely with I/O is
> definitely important, but it doesn't need to be a range, especially when it
> complicates ranges in general. byLine doesn't even work with
> std.array.array, and if even that doesn't work, I don't see how a range
> could be considered well-behaved.

Here is how I think about it: the front element is valid and stable 
until you call popFront. After that, anything goes for the old front.

This is entirely reasonable, and it fits many, many algorithms. D isn't
a functional-only language; mutation is valid.
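
To make that contract concrete, here is a minimal sketch. ReusedBuffer is a hypothetical range invented for this example (it is not in Phobos, and it is not how byLine is implemented); it just overwrites one buffer on every popFront, which is the behavior under discussion. It shows that consuming front before popFront works, that copying on demand works, and that holding on to front across popFront does not.

```d
import std.algorithm.iteration : map, sum;
import std.array : array;
import std.conv : to;

// Hypothetical range (not part of Phobos): every element is written into
// the same heap buffer, so front is only valid until the next popFront.
struct ReusedBuffer
{
    private string[] lines; // remaining source lines
    private char[] buf;     // single buffer shared by all elements
    private size_t len;     // length of the current element

    this(string[] lines)
    {
        this.lines = lines;
        size_t cap = 0;
        foreach (l; lines)
            if (l.length > cap) cap = l.length;
        buf = new char[](cap); // big enough for the longest line
        fill();
    }

    @property bool empty() const { return lines.length == 0; }
    @property char[] front() { return buf[0 .. len]; }
    void popFront() { lines = lines[1 .. $]; fill(); }

    // Overwrite the shared buffer with the next line, if there is one.
    private void fill()
    {
        if (lines.length == 0) return;
        len = lines[0].length;
        buf[0 .. len] = lines[0][];
    }
}

void main()
{
    // Each front is fully consumed before popFront: fine.
    auto total = ReusedBuffer(["10", "20", "30"]).map!(a => a.to!int).sum;
    assert(total == 60);

    // Copying on demand: also fine.
    auto copied = ReusedBuffer(["10", "20", "30"]).map!(a => a.idup).array;
    assert(copied == ["10", "20", "30"]);

    // Keeping front across popFront: every slice aliases the same buffer,
    // so all of them show whatever was written last.
    auto kept = ReusedBuffer(["10", "20", "30"]).array;
    assert(kept.length == 3);
    assert(kept[0] == "30" && kept[1] == "30" && kept[2] == "30");
}
```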

>
>>> I'm of the opinion that if you want a transient front, you should just use
>>> opApply and skip ranges entirely.
>>
>> So you want to make this code invalid? Why?
>>
>> foreach(i; map!(a => a.to!int)(stdin.byLine))
>> {
>>     // process each integer
>>     ...
>> }
>>
>> You want to make me copy each line to a heap-allocated string so I can
>> parse it?!!
>
> If it's a range, then it can be passed around to other algorithms with
> impunity, and almost nothing is written with the idea that a range's front
> is transient.

Algorithms from std.algorithm.searching that are fine with byLine (I'm
not going to go through all of the submodules):

all, any, balancedParens, count, countUntil, find, canFind, findAmong 
(with first range being transient), skipOver (second parameter 
transient), startsWith, until.

Algorithms which would compile with transient ranges, but not work
correctly: minCount, maxCount

Now, if minCount and maxCount could introspect transience, they could
.dup the elements to make sure they were returned properly.
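
As a rough sketch of what that could look like: minCountCopying and keep below are hypothetical names made up for this example, not Phobos functions, and using std.traits.hasAliasing as the "might be transient" test is my assumption, not an existing convention. The idea is only that the algorithm copies the element it intends to hold across popFront.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.traits : hasAliasing;
import std.typecons : tuple;

// Hypothetical helper: copy the element only if it could alias a buffer
// that the range is free to overwrite, and only if it supports .dup.
auto keep(E)(E e)
{
    static if (hasAliasing!E && is(typeof(e.dup)))
        return e.dup;
    else
        return e;
}

// Hypothetical variant of minCount (not in Phobos): returns the smallest
// element and how many times it occurs, holding a copy of the element
// rather than a reference into the range.
auto minCountCopying(R)(R r)
    if (isInputRange!R)
{
    assert(!r.empty, "range must not be empty");
    auto best = keep(r.front);
    size_t n = 1;
    for (r.popFront(); !r.empty; r.popFront())
    {
        if (r.front < best)
        {
            best = keep(r.front); // new minimum: take a fresh copy
            n = 1;
        }
        else if (r.front == best)
            ++n;
    }
    return tuple(best, n);
}

void main()
{
    char[][] words = ["pear".dup, "apple".dup, "apple".dup];
    auto res = minCountCopying(words);
    assert(res[0] == "apple" && res[1] == 2);

    // Clobber the source afterwards; the returned element is a copy,
    // so it is unaffected.
    words[1][0] = 'X';
    assert(res[0] == "apple");
}
```

Value-type elements (and immutable ones such as string) take the no-copy branch, so the extra cost is only paid where aliasing is possible.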

> There's no way to check for transience, and I don't think
> that it's even vaguely worth adding yet another range primitive that has to
> be checked for everywhere just for this case. Transience does _not_ play
> nicely with algorithms in general.

I think your understanding of this is incorrect. A range is transient by 
default if the type of the element allows modification via reference. It 
doesn't have to be checked everywhere, because most algorithms are fine 
with or without transience.
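
For the few algorithms that do care, a conservative compile-time check along these lines seems possible. mayBeTransient is a hypothetical name for this sketch (nothing like it exists in Phobos as far as this post is concerned), and it is deliberately pessimistic: it says "could be transient", not "is transient".

```d
import std.range.primitives : ElementType;
import std.stdio : File;
import std.traits : hasAliasing;

// Hypothetical trait: an element that carries mutable indirections might
// be overwritten by the range after popFront, so assume it can be.
enum mayBeTransient(R) = hasAliasing!(ElementType!R);

static assert(!mayBeTransient!(int[]));      // value-type elements
static assert(!mayBeTransient!(string[]));   // immutable data cannot change
static assert( mayBeTransient!(char[][]));   // mutable slices could be reused
static assert( mayBeTransient!(typeof(File.init.byLine())));     // char[] front
static assert(!mayBeTransient!(typeof(File.init.byLineCopy()))); // string front

void main() {}
```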

> Using opApply doesn't completely solve the problem (since the buffer could
> still escape - we'd need some kind of scope attribute or wrapper to fix that
> problem), but it makes it so that you can't pass such a range around and
> run into problems with all of the algorithms that don't play nicely with it.
> So, instead, you end up with code that looks something like
>
> foreach(line; stdin.byLine())
> {
>     auto i = line.to!int();
>     ...
> }
>
> And yes, it's slightly longer, but it prevents a whole class of bugs by not
> having it be a range with a transient front.

Sure, as long as we're adding new newsgroups, let's add one that's titled
"why can't I use byLine as a range", as this will be a popular topic.

>
>>> Allowing for front to be transient -
>>> whether you can check for it or not - simply is not worth the extra
>>> complications. I'd love it if we deprecated byLine's range functions, and
>>> made it use opApply instead and just declare transient ranges to be
>>> completely unsupported. If you want to write your code to have a transient
>>> front, you can obviously take that risk, but you're on your own.
>>
>> There is no way to disallow front from being transient. In fact, it
>> should be assumed that it is the default unless it's wholly a value-type.
>
> Pretty much no range-based code is written with the idea that front is
> transient.

Provably false, see above.

-Steve