should pure functions accept/deal with shared data?

Thu Jun 7 14:36:45 PDT 2012

On 06/07/12 21:55, Steven Schveighoffer wrote:
> On Thu, 07 Jun 2012 15:16:20 -0400, Artur Skawina <art.08.09 at gmail.com> wrote:
> 
>> On 06/07/12 20:29, Steven Schveighoffer wrote:
> 
>>> I'm not proposing disallowing mutable references, just shared references.
>>
>> I know, but if a D function marked as "pure" takes a mutable ref (which a shared
>> one has to be assumed to be), it won't be treated as really pure for optimization
>> purposes (yes, I'm deliberately trying to avoid "strong" and "weak").
> 
> However, a mutable pure function can be *inside* an optimizable pure function, and the optimizable function can still be optimized.
> 
> A PAS function (pure accepting shared), however, devolves to a mutable pure function.  That is, there is zero advantage of having a pure function take shared vs. simply mutable TLS.
> 
> There is only one reason to mark a function that does not take all immutable or value type arguments as pure -- so it can be called inside a strong-pure function.  Otherwise, it's just a normal function, and even marked as pure will not be optimized.  You gain nothing else by marking it pure.
> 
> So let's look at two cases.  I'll re-state my example, in terms of two overloads, one which takes shared int and one which takes just int (both of which do the right thing):
> 
> void inc(ref int t) pure
> {
>   ++t;
> }
> 
> void inc(ref shared(int) t) pure
> {
>   atomicOp!"+="(t, 1);
> }
> 
> Now, let's define a strong-pure function that uses inc:
> 
> int slowAdd(int x, int y) pure
> {
>    while(y--) inc(x);
>    return x;
> }
> 
> I think we can both agree that inc *cannot* be optimized away, and that we agree slowAdd is *fully pure*.  That is, slowAdd *can* be optimized away, even though its call to inc cannot.
> 
> Now, what about a strong-pure function using the second (shared) form?  A strong-pure function has to have all parameters (and return types) that are immutable or implicitly convertible to immutable.
> 
> I'll re-define slowAdd:
> 
> int slowAddShared(int x, int y) pure
> {
>    shared int sx = x;
>    while(y--) inc(sx);
>    return sx;
> }
> 
> We can agree for the same reason the original slowAdd is strong-pure, slowAddShared is strong-pure.
> 
> But what do we gain by being able to declare sx shared?  We can't return it as shared, or slowAddShared becomes weak-pure. 

Actually, *value* return types shouldn't prevent the function from being pure. But
there is not much point in returning them as shared, other than to avoid explicit
casts, something that would be better solved with some kind of 'unique' class.

> We can't share it while inside slowAddShared, because we have no outlet for it, and we cannot access global variables.  In essence, marking sx as shared does *nothing*.  In fact, it does worse than nothing -- we now have to contend with shared for data that actually is *provably* unshared.  In other words, we are wasting cycles doing atomic operations instead of straight ops on a shared type.  Not only that, but because there are no outlets, declaring *any* data as shared while inside a strong-pure function is useless, no matter how we define any PAS functions.
> 
> So if shared is useless inside a strong-pure function, and the only point in marking a non-pure-optimizable function as pure is so it can be called within a strong-pure function, then pure is useless as an attribute on a function that accepts or returns shared data.  *Every case* where you use such a function inside a strong-pure function is incorrect.

We clearly agree completely; this is exactly what I'm saying in the paragraph you
quoted below. What I'm *also* saying is that the 'incorrectness' of it is harmless
in practice - so I'm not sure that it should be forbidden, and handled specially
(which would be necessary in the inferred-purity cases).

>> And any caller
>> will have to obtain this shared ref either from a mutable argument or global state.
>> Hence that "pure" function with shared inputs will *never* actually be pure.
>> So I'm wondering what would be the gain from banning shared in weakly pure functions
> 
> What is to gain is clarity, and more control over parameter types in generic code.
> 
> If shared is banned, then:
> 
> void inc(T)(ref T t) pure { ++t; }
> 
> *always* does the right thing.  As the author of inc, I am done.  I don't need template constraints or documentation, or anything else, and I don't need to worry about users abusing my function.  The compiler will enforce nobody uses this on shared data, which would require an atomic operation.

Having a type that allows operators that are either illegal or wrongly implemented
is not a problem specific to pure functions. My argument is that 'shared int' as
a type is worthless and should never appear in real code. The 'Atomic!int' example
was not made up - it is a real template used in my code that only allows legal
operations. That first 'inc' example would end up using

      pragma(attribute, always_inline) void opOpAssign(string op:"+")(size_t n) {
         asm { "lock add"~opsuffix~" %1, %0 #ATOMIC_ADD" : "+m" data : "ir" n*unitsize ; }
      }

and work correctly. I don't think it makes sense to worry about using built-in
types marked as shared directly, since that is not likely to do the right thing;
in fact using shared(T) should probably be forbidden for every T that cannot
guarantee that every operation on it is correct and always safe.
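
For illustration only, here is a minimal sketch of that kind of wrapper, written
against druntime's core.atomic instead of inline asm. The struct name, the
'load' method and the choice of allowed operators are assumptions made for this
example; it is not the actual Atomic template referred to above.

```d
import core.atomic : atomicLoad, atomicOp;

// Hypothetical wrapper: exposes only operations that are done atomically.
struct Atomic(T)
{
    private T data;

    // Atomic read-modify-write; only "+=" and "-=" are accepted here,
    // any other operator fails the template constraint at compile time.
    T opOpAssign(string op)(T n) shared
        if (op == "+" || op == "-")
    {
        return atomicOp!(op ~ "=")(data, n);
    }

    // Atomic read of the current value.
    T load() shared
    {
        return atomicLoad(data);
    }
}
```

With something like this, 'shared Atomic!int counter; counter += 5;' goes
through the atomic path, while an operator the wrapper does not define (say
'counter *= 2') is rejected by the compiler rather than silently doing a
non-atomic update.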

(oh, and that opOpAssign is intentionally not marked as pure, but I should probably
check what the compiler does; when I wrote it, I was assuming that the shared 'this',
shared 'data' and lack of outputs would make it do the right thing)

>> (Ugh, you made me use that word after all ;) ).
> 
> I did nothing of the sort :)
> 
>> AFAICT you're proposing to forbid something which currently is a NOOP.
> 
> It's not a NOOP, marking something as shared means you need special handling.  You can't call most functions or methods with shared data.  And if you do handle shared data, it's not just "the same" as unshared data -- you need to contend with data races, memory barriers, etc.  Just because it's marked shared doesn't mean everything about it is handled.

Exactly, see above. That's why you never access "raw" shared data - you always wrap it.
("access" meaning read and/or write, passing refs around is fine)
Problem solved.

>> And the change
>> could have consequences for templated functions or lambdas, where "pure" is inferred.
> 
> I would label those as *helpful* and *positive* consequences ;)

Are you saying that 

   auto f(T)(T v) { return v+v; } 

should be inferred as impure when used with a shared(T), but (weakly) pure
otherwise?
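
The unshared half of that question can be checked directly. A small sketch,
where 'twice' is a hypothetical caller added for this example: a pure function
may only call pure functions, so it compiles only if purity is inferred for the
f!int instantiation. How a shared(T) instantiation would or should be treated
is the open question and is deliberately not shown.

```d
// Template functions get their attributes inferred per instantiation.
auto f(T)(T v) { return v + v; }

// Hypothetical caller: accepted only because f!int is inferred pure.
int twice(int x) pure
{
    return f(x);
}
```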

artur