Should we add drop to Phobos?

Tue Aug 16 16:19:35 PDT 2011

On 17.08.2011 2:35, Jonathan M Davis wrote:
> On Tuesday, August 16, 2011 15:10 Dmitry Olshansky wrote:
>> On 17.08.2011 1:41, Jonathan M Davis wrote:
>>> On Tuesday, August 16, 2011 13:32 Dmitry Olshansky wrote:
>>>> On 16.08.2011 23:37, Jonathan M Davis wrote:
>>>>> On Tuesday, August 16, 2011 14:27:18 Dmitry Olshansky wrote:
>>>>>> On 16.08.2011 6:33, Jonathan M Davis wrote:
>>>>>>> On Tuesday, August 16, 2011 04:26:48 Martin Nowak wrote:
>>>>>>>> I personally like immutable methods way better than a ref changing
>>>>>>>> function like popFrontN.
>>>>>>>> As you pointed out writing str.find(";").drop(2).find(";").drop(2)
>>>>>>>> is cleaner than writing this with popFrontN.
>>>>>>>>
>>>>>>>> Drop will have issues for input ranges.
>>>>>>> No more than many range-based functions do. Yes, it'll alter the
>>>>>>> input, that's to be expected with any range-based function that you
>>>>>>> pass an input range to. drop could be made to take only a forward
>>>>>>> range, but that seems unnecessarily limiting. You just have to be
>>>>>>> aware of the fact that range-based functions always remove elements
>>>>>>> from input ranges when they process them - either that or they don't
>>>>>>> work with input ranges.
>>>>>>>
>>>>>>>> Adding a ref count parameter overload to let you know how many
>>>>>>>> elements
>>>>>>>> were dropped/not dropped seems too messy. No idea for that one.
>>>>>>> It's a tradeoff. And if you really want drop, you probably don't care
>>>>>>> anyway, since what you'd be doing would be chaining functions. If you
>>>>>>> really care, just use popFrontN.
>>>>>>>
>>>>>>>> The documentation should clearly state that this offers similar
>>>>>>>> functionality to popFrontN but it's purpose
>>>>>>>> is to enable a different syntax so that people don't get completely
>>>>>>>> confused.
>>>>>>> It does. This is the ddoc comment that I have for it:
>>>>>>>
>>>>>>> /++
>>>>>>>
>>>>>>> Pops $(D n) elements off of the given range and returns it. If
>>>>>>> the length of the given range is less than $(D n), then an
>>>>>>> empty range is returned. The original range is unaltered as
>>>>>>> long as it's a value type.
>>>>>> Shouldn't this be : "as long as it's not an Input range", for
>>>>>> everything else there is this clanky .save()
>>>>> No, I meant what it says. drop doesn't call save, so it'll work with
>>>>> input ranges. However, if you use a range which is a class with it,
>>>>> then it's going to alter the original range.
>>>> I was arguing that Input range is always altered no matter if it's
>>>> struct or class or whatnot. So if it's used on forward range it should
>>>> really call .save (worth a single static if ?) I think, thus avoiding
>>>> side effects even if it's class.
>>>> For input ranges it becomes equivalent of (r.popFrontN(n), r); with all
>>>> side effects as you described.
>>>> Seriously, "value type input ranges" are forward ranges, isn't it? (at
>>>> least they should be)
>>> There's a definite argument for static if-ing forward ranges like that.
>>> And perhaps we should do that in general in Phobos (certainly, I don't
>>> think that we always handle reference-type ranges properly right now,
>>> and the best way to handle that is up for some debate). But the point
>>> was that if you pass a class which is a forward range, it's not going to
>>> be saved, whereas a struct which was a value type would be.
>> Yes, that's the problem save was created to solve, albeit this solution
>> is error prone.
>> I would love to hear of a better solution if there is one.
> Yes and no. save was created so that you have a way to save the state of range
> which is a class so that it can be a forward range. As such, when you're
> looking to save the state of a range, you need to use save to do it -
> otherwise your code is broken with regards to classes. The problem is what
> happens when you pass a range to a function. For structs which are value types
> (which they usually are) and for arrays, this results that you saved the range
> doing that, but with a class, you didn't. So, while classes then work
> correctly whenever you explicitly save, they don't work the same when you get
> inadvertent saves like that. So, in order to have identical behavior, you need
> to enact a policy of saving all forward ranges at the beginning of any range-
> based function, and we haven't done that. So, while class forward ranges work,
> they don't work in the same way that struct or array forward ranges do.
>
>>> In general, the problem is a bit hairy, given that structs are usually
>>> value types (but not always) and classes never are - and most of the
>>> time we're dealing with ranges that are either structs or arrays - most
>>> of which are value types in the sense that if you pass them to a
>>> function without ref, altering them won't alter the original (though it
>>> can generally alter the elements in the range and thus affect the
>>> original). And very few ranges that we deal with aren't forward ranges.
>>> So, I'm pretty sure that we're not handling all of these cases correctly
>>> in Phobos. It's been bothering me of late, and I've been debating how we
>>> should best be going about it.
>> I thought there was consensus that .save is to be called, but indeed it
>> haven't been checked/enforced/enacted.
> There was a consensus that save needs to be called when explicitly saving the
> state of a range - we decided not to rely on a range's state being copied when
> you copied the range. However, I'm not aware of the issue of
> automatic/inadvertent copies when passing to functions being discussed. It may
> have been, but if so, I don't remember it, and it's definitely not the case
> that such a policy has been applied. AFAIK, there isn't a single function in
> Phobos which saves a range at the beginning of its body to make sure that
> value-type and reference-type forward ranges have identical behavior for that
> function. And I'm thinking that we need to enact such a policy.

I see, it's wasn't obvious from the beginning.

>>> I think that in the general case, right now, if you pass a class range to
>>> a function without calling save, it's going to alter the original, and I
>>> suspect that there are a few places that we assume that a range isn't
>>> going to be altered by a function we pass it to (since it usually
>>> doesn't as we're usually using forward ranges) and that there's code
>>> which is supposed to work with input ranges, but actually doesn't. So,
>>> it's on my todo list to create tests for std.range and std.algorithm
>>> which make sure that functions handle input ranges and class ranges
>>> properly.
>>> That still leaves the question open as to how best handle cases where a
>>> forward range which is a struct is automatically saved (since it's copied
>>> when it's passed to a function) but where a class isn't automatically
>>> saved (since it's a reference type). We should probably just make it so
>>> that forward ranges always have save called on them at the top of a
>>> range-based function, even if the function works with input ranges -
>>> then it would be consistent across all types of forward ranges. At
>>> least, that's the best that I can think of, so I should probably just
>>> change drop to work that way, which would make for a cleaner explanation
>>> in the documentation, if nothing else.
>> I think forward ranges that are value types should just have ref save{
>> return this; } that in the end should entail 0 copies(?).
>> Creating a simple function that does return move(x.save) on ForwardRange
>> and move(x) on others might be good idea to abstract this discrepancy away.

While I agree on all of the above, I'm obviously not making myself clear 
about avoiding copies.

> That would copy. Returning ref allows you to chain functions that take ref,
> and it allows you to alter the variable which is returned as long as you do it
> in a single expression without assigning it to anything.
>
> Besides, that doesn't help in this case anyway. The problem is that save
> _isn't_ being called but a copy is still happening in the case of value-type
> ranges and arrays, whereas it isn't happening with reference-type ranges.
>

Let's deal with my strange "0 copies" point, in fact, it's about 
ensuring exactly 1 copy is made:

void doStuff(R)(R range)//here struct is copied when passed, class is 
passed by ref and not copied
{
     auto r  = range.save(); //here struct should avoid copy by 
returning ref, while class object  finally does create a copy
///    ... work with r
}

replacing range.save to simply range on non-forward ranges.

-- 
Dmitry Olshansky