isInputRange is false for non-copyable types

Shachar Shemesh shachar at weka.io
Tue Apr 17 03:39:19 UTC 2018


This is going nowhere.

Let's flesh this out face to face at dconf.

Shachar

On 16/04/18 12:01, Jonathan M Davis wrote:
> On Monday, April 16, 2018 07:15:36 Shachar Shemesh via Digitalmars-d wrote:
>> On 16/04/18 01:28, Jonathan M Davis wrote:
>>> The fact that foreach copies the range that it's given makes using ref
>>> with anything other than arrays a bit of an iffy proposition. It will
>>> compile regardless of whether front returns by ref or not (which
>>> arguably is a bug), but it will only affect the original elements if
>>> copying the range doesn't copy the elements
>>
>> You keep referring to ranges that contain the data, but I am racking my
>> mind and failing to find examples.
> 
> Pretty much any generative range does. e.g. iota contains data, and each
> element is then calculated from the current value. The same goes for stuff
> like random number generators. It's quite common for ranges to have popFront
> calculate the next front and store it as a member rather than have front do
> the calculation, since then multiple calls to front don't incur the cost of
> whatever has to be done to get the next element, and they can just return
> the same value each time. In fact, that's generally considered to be best
> practice. It's true that most ranges do not contain every element directly
> in a struct, but some do (e.g. std.range.only does, which is a way to
> generate a range from a list of elements without allocating anything on the
> heap), and many ranges at least store the value for front. So, I'd say that
> a fairly large percentage of ranges store at least one value directly in the
> range object.
> 
>> Plus, as I've said before, if ranges are expected to contain data,
>> copying them seems like an incredibly bad idea. The whole idea behind
>> casually copying the range about is that it's cheap to do.
> 
> Yes, it's generally assumed that it's cheap to copy a range, but it's also
> generally assumed by D code in general that copying is cheap. Relatively few
> functions in Phobos accept arguments by ref. Some do accept by auto ref -
> either to forward the refness or to avoid a copy - but since passing by ref
> doesn't work with rvalues, most code in Phobos doesn't do so unless it's
> intended to mutate the argument and give the caller access to the result. I
> would strongly suggest that anyone who is looking to put an object that is
> expensive to copy in a range consider making it so that that object is on
> the heap rather than directly in the range.
> 
>>> But as it stands, using ref with foreach in generic code is not going to
>>> go well. If anything, it's actually a code smell.
>>
>> You are conflating ref with owning data, which I find unsubstantiated.
> 
> My point is that if you have
> 
> foreach(ref e; range)
> {
> }
> 
> then unless range is a dynamic array, the odds are very high that it's a
> bug, because then you risk problems like
> 
> foreach(ref e; range)
> {
>      e = foo;
> }
> 
> Outside of dynamic arrays, that assignment is usually pointless. ref implies
> that it's going to actually affect the range that you passed to foreach, but
> in most cases, it won't. Because the range is copied, in order for ref to
> affect elements of the range that is passed to foreach, the range must be a
> reference type or must act like a dynamic array in the sense that each copy
> refers to exactly the same objects - which is true for some ranges (e.g.
> those over a container) - but for many ranges, it isn't (e.g. generative
> ranges aren't usually backed by anything). Also, the front of most ranges
> does not return by ref, which would be required for the ref on foreach to
> actually allow you to mutate the element. As such, I'd argue that the fact
> that ref is allowed on foreach for all ranges is a bug. At minimum, it
> should require that front return by ref, and even then, depending on the
> range, it won't necessary work to assign to the element and have it affect
> the range that was copied when it was passed to foreach. So, while it will
> work in _some_ cases to pass a range to foreach and then use ref, in
> general, I would consider it a code smell, because the odds are high that it
> does not do what the programmer who wrote it assumed that it would do.
> 
>>> And if a templated function fails to compile if it's given a particular
>>> argument, and that failure is not due to the template constraint, that
>>> usually means that the template constraint is incomplete.
>>
>> I do not accept this argument outright. In fact, our experience at Weka
>> has drove us to the exact opposite direction.
>>
>> We deliberately set up functions to include invalid cases and then
>> either let them fail because of lacking constraints, or even set up a
>> static assert to fail the function generation rather than fail matching.
>>
>> We found that the error messages produced by the alternative are simply
>> too difficult to parse out and figure what went wrong.
> 
> For better or for worse, it's generally considered a bug in Phobos if it's
> possible to write code that gets past the template constraint and then does
> not compile. There are reasons why that can be a problematic approach, but
> that is the approach that we've taken thus far. As such, if isInputRange
> started allowing non-copyable elements, it would be expected that all
> range-based functions in Phobos would then have to either require
> copyability in their template constraint or actually work with non-copyable
> types.
> 
>>> it would mean that _every_ range-based function then has to worry about
>>> non-copyable elements
>>
>> No, just those in libraries. The same way they have to worry about big O
>> complexity, type safety, @nogc and a bunch of other things.
>>
>> When you write a library, you have to think about your users. This
>> doesn't change anything about that.
> 
> It changes it by adding yet another complication to worry about when
> implementing ranges and range-based functions that no one implementing
> range-based code currently has to worry about. And writing bug-free generic,
> range-based code can already be surprisingly difficult without adding yet
> another complication into the mix.
> 
>>> Not all algorithms can call front just once without
>>> copying it
>>
>> Huh? Where did that come from?
>>
>> Phobos already assumes that calling front needs to be an O(1) operation.
>> You can call front as many times as you like. Why would you assume
>> otherwise?
> 
> Sure, you can call front as many times as you want. But in some cases, it's
> more expensive to call front multiple times than it is to call it once and
> store the result. Also, while the result of front may be equivalent every
> time, it may not be exactly the same object every time. For instance,
> 
> auto r2 = range.map!(a => to!string())();
> 
> is going to to result in a range that allocates a string every time that
> front is called. This sort of thing is not an uncommon thing to do with map,
> and while many ranges are implemented to do most of their work in popFront
> (thereby making calling front cheap), there's no requirement that they do
> so. So, there are ranges that do their work in front rather than popFront,
> and in the case of map, it actually has to be done in front, because map can
> be random-access, and using popFront to calculate the value only works for
> front, not arbitrary elements. In such cases, calling front multiple times
> is definitely worse than just calling it once and storing the result. I
> think that pretty much the only time that it's better to call front multiple
> times rathre than calling it once and storing the result is when it returns
> by ref, because if it doesn't return by ref, you're either going to get a
> copy every time you call front, or the range is going to be creating a new
> object every time you call front. In those cases, storing the result of
> front and reusing it would be cheaper. And since most ranges do not return
> by ref, most ranges are not avoiding copies or any expensive calculations by
> calling front multiple times. If anything, they're incurring them.
> 
> Now, arguably, in the cases where a generic function needs to access front
> multiple times, it would be best if it tested front for refness and then
> used static ifs so that it stored the result of front when it didn't return
> by ref and called it multiple times when it did, but that would get ugly
> fast, and I'm not aware of any Phobos code that currently does that.
> 
>>> The result of all of this is that in generic code, you have no clue
>>> whatsoever what the semantics are of copying a range around, because it
>>> depends entirely on the copying semantics of whatever range that code is
>>> currently being instantiated with. As such, generic code basically has
>>> to
>>> assume that once you copy a range, you can't mutate the original (since
>>> it may or may not be affected by mutating the copy and may or may not
>>> be cleanly affected by mutating the copy),
>>
>> Jonathan, I'm sorry, but what you are describing is called "a bug".
>>
>> Input ranges provide no guarantees about copying the range itself. If
>> you do it and expect *anything* (which it seems you do), you have a bug.
>> If you need to copy the range itself, you absolutely need to require a
>> forward range.
>>
>> A forward range with ref front absolutely promises that both iterations
>> reference the same elements.
>>
>> I find myself sincerely hoping that you are making these examples up in
>> order to win this discussion turned into an argument for some reason,
>> instead of the way phobos is actually written, because if you're right
>> phobos is *broken*.
>>
>> The good news is that if Phobos is broken, this is a great way to flush
>> out the places it's broken. Let's add UTs calling all Phobos functions
>> accepting input ranges (as opposed to forward ranges) with non-copyable
>> ranges (i.e. - the range itself is non-copyable) and fix the compilation
>> errors. This shouldn't take more than a few days to do.
> 
> I'm not making anything up. Very little range-based code passes ranges by
> reference. They're copied by value all the time, even if they're not forward
> ranges. foreach itself copies a range before iterating over it. Yes, if you
> want to iterate over two independent copies, you need a forward range, and
> you need to call save, but ranges are copied constantly. It's just that it's
> then a bug to do anything with the original after the copy has been made. As
> it stands, a range that cannot be copied is going to work with almost
> nothing in Phobos and would be borderline useless. And adding ref everywhere
> to range-based functions to change that would destroy our ability to chain
> function calls with ranges. Also, a large percentage of range-based
> functions return a range that wraps another range, and that means copying
> the range that's passed in. It simply wouldn't work with many functions to
> accept a range by ref without then copying the range inside the function
> anyway. The assumption that input ranges are copyable is built into almost
> everything range-based in all of Phobos. The rare exceptions are functions
> like std.conv.parse which are specifically intended to mutate the range
> that's passed in rather than copying it.
> 
>>>> And allowing ranges with uncopyable members would turn this potential
>>>> deadly run-time bugs into easy to find compile time errors. Sound like
>>>> a
>>>> great reason to allow it, to me.
>>>
>>> Ranges have to be copyable.
>>
>> And why would a range iterating over uncopyable objects not be copyable?
>> How are the two related?
> 
> Because many ranges do stuff like store the value of front after calculating
> it with popFront or store some set of values to calculate new values. No
> such range would work with non-copyable element types even if isInputRange
> allowed them. Now, that doesn't mean that _some_ ranges couldn't be made to
> work with non-copyable element types, but the fact that ranges must be
> copyable means that any range that worked with non-copyable element types
> could not be a struct that stored any of its elements as a direct member.
> 
> - Jonathan M Davis
> 



More information about the Digitalmars-d mailing list