should pure functions accept/deal with shared data?

Thu Jun 7 15:42:53 PDT 2012

On Thu, 07 Jun 2012 17:36:45 -0400, Artur Skawina <art.08.09 at gmail.com>  
wrote:

> On 06/07/12 21:55, Steven Schveighoffer wrote:
>> On Thu, 07 Jun 2012 15:16:20 -0400, Artur Skawina <art.08.09 at gmail.com>  
>> wrote:
>>
>>> On 06/07/12 20:29, Steven Schveighoffer wrote:
>>
>>>> I'm not proposing disallowing mutable references, just shared  
>>>> references.
>>>
>>> I know, but if a D function marked as "pure" takes a mutable ref  
>>> (which a shared
>>> one has to be assumed to be), it won't be treated as really pure for  
>>> optimization
>>> purposes (yes, i'm deliberately trying to avoid "strong" and "weak").
>>
>> However, a mutable pure function can be *inside* an optimizable pure  
>> function, and the optimizable function can still be optimized.
>>
>> A PAS function (pure accepting shared), however, devolves to a mutable  
>> pure function.  That is, there is zero advantage of having a pure  
>> function take shared vs. simply mutable TLS.
>>
>> There is only one reason to mark a function that does not take all  
>> immutable or value type arguments as pure -- so it can be called inside  
>> a strong-pure function.  Otherwise, it's just a normal function, and  
>> even marked as pure will not be optimized.  You gain nothing else by  
>> marking it pure.
>>
>> So let's look at two cases.  I'll re-state my example, in terms of two  
>> overloads, one which takes shared int and one which takes just int  
>> (both of which do the right thing):
>>
>> void inc(ref int t) pure
>> {
>>   ++t;
>> }
>>
>> void inc(ref shared(int) t) pure
>> {
>>   atomicOp!"+="(t, 1);
>> }
>>
>> Now, let's define a strong-pure function that uses inc:
>>
>> int slowAdd(int x, int y) pure
>> {
>>    while(y--) inc(x);
>>    return x;
>> }
>>
>> I think we can both agree that inc *cannot* be optimized away, and that  
>> we agree slowAdd is *fully pure*.  That is, slowAdd *can* be optimized  
>> away, even though its call to inc cannot.
>>
>> Now, what about a strong-pure function using the second (shared) form?   
>> A strong pure function has to have all parameters (and return types)  
>> that are immutable or implicitly convertible to immutable.
>>
>> I'll re-define slowAdd:
>>
>> int slowAddShared(int x, int y) pure
>> {
>>    shared int sx = x;
>>    while(y--) inc(sx);
>>    return sx;
>> }
>>
>> We can agree for the same reason the original slowAdd is strong-pure,  
>> slowAddShared is strong-pure.
>>
>> But what do we gain by being able to declare sx shared?  We can't  
>> return it as shared, or slowAddShared becomes weak-pure.
>
> Actually, *value* return types shouldn't prevent the function from being  
> pure. But
> there is not much point in returning them as shared, other than to avoid  
> explicit
> casts, something that would be better solved with some kind of 'unique'  
> class.

Right, what I meant was, returning a shared reference.  For example, if a  
pure function allocated memory and returned it as a shared pointer, that  
would make it non-optimizable pure (weak pure).

>> We can't share it while inside slowAddShared, because we have no outlet  
>> for it, and we cannot access global variables.  In essence, marking sx  
>> as shared does *nothing*.  In fact, it does worse than nothing -- we  
>> now have to contend with shared for data that actually is *provably*  
>> unshared.  In other words, we are wasting cycles doing atomic  
>> operations instead of straight ops on a shared type.  Not only that,  
>> but because there are no outlets, declaring *any* data as shared while  
>> inside a strong-pure function is useless, no matter how we define any  
>> PAS functions.
>>
>> So if shared is useless inside a strong-pure function, and the only  
>> point in marking a non-pure-optimizable function as pure is so it can  
>> be called within a strong-pure function, then pure is useless as an  
>> attribute on a function that accepts or returns shared data.  *Every  
>> case* where you use such a function inside a strong-pure function is  
>> incorrect.
>
> We clearly agree completely; this is exactly what I'm saying in the  
> paragraph you
> quoted below. What i'm *also* saying is that the 'incorrectness' of it  
> is harmless
> in practice - so I'm not sure that it should be forbidden, and handled  
> specially
> (which would be necessary in the inferred-purity cases).

I have given you an example of where it is harmful.  There is benefit in  
being able to say "since I marked this function pure, I know I don't have  
to deal with threading."  It allows you to eliminate possible  
multi-threading mistakes from whole swaths of code, especially generic code  
which accepts a myriad of types.

You know there are a ton of generic functions in phobos that don't check  
*at all* whether shared data is being given to them?  Simply marking them  
pure (which should be viable for most functions) would eliminate that  
worry.
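
As a rough sketch of what a generic author has to do today to get the same guarantee by hand, the template constraint below rejects shared arguments explicitly. The constraint is an illustration of the workaround, not something Phobos is known to do; under the proposed rule the `pure` attribute alone would have this effect.

```d
// Sketch: without a language-level ban, a generic function must opt out of
// shared arguments itself, here via a template constraint (an assumption
// for illustration).
void inc(T)(ref T t) pure
    if (!is(T == shared))
{
    ++t;
}

void main()
{
    int local = 1;
    inc(local);                 // thread-local data: accepted
    assert(local == 2);

    shared int s;
    // T would be deduced as shared(int), so the constraint rejects the call.
    static assert(!__traits(compiles, inc(s)));
}
```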

>>> And any caller
>>> will have to obtain this shared ref either from a mutable argument or  
>>> global state.
>>> Hence that "pure" function with shared inputs will *never* actually be  
>>> pure.
>>> So I'm wondering what would be the gain from banning shared in weakly  
>>> pure functions
>>
>> What is to gain is clarity, and more control over parameter types in  
>> generic code.
>>
>> If shared is banned, then:
>>
>> void inc(T)(ref T t) pure { ++t; }
>>
>> *always* does the right thing.  As the author of inc, I am done.  I  
>> don't need template constraints or documentation, or anything else, and  
>> I don't need to worry about users abusing my function.  The compiler  
>> will enforce nobody uses this on shared data, which would require an  
>> atomic operation.
>
> Having a type that allows operators that are either illegal or wrongly  
> implemented
> is not a problem specific to pure functions. My argument is that 'shared  
> int' as
> a type is worthless and should never appear in real code. The  
> 'Atomic!int' example
> was not made up - it is a real template used in my code that only allows  
> legal
> operations. That first 'inc' example would end up using
>
>       pragma(attribute, always_inline)
>       void opOpAssign(string op:"+")(size_t n) {
>          asm { "lock add"~opsuffix~" %1, %0 #ATOMIC_ADD" : "+m" data : "ir" n*unitsize ; }
>       }
>
> and work correctly. I don't think it makes sense to worry about using  
> built-in
> types marked as shared directly, that is not likely to do the right  
> thing; in fact
> using shared(T) should probably be forbidden for every T that can not  
> guarantee
> every operation on it to be correct and always safe.

I would be in favor of that.  Right now the huge benefit of shared is what  
you can assume on stuff that's *not* marked as shared.  Using actual  
shared types is very cumbersome and difficult to understand.

Giving shared more useful and robust meaning would be a huge benefit.

> (oh, and that opOpAssign is intentionally not marked as pure, but I  
> should probably
> check what the compiler does; when i wrote it, I was assuming that the  
> shared 'this',
> shared 'data' and lack of outputs would make it do the right thing)

Marking it as pure is like putting static on a class.  It will achieve  
nothing.

>>> AFAICT you're proposing to forbid something which currently is a NOOP.
>>
>> It's not a NOOP, marking something as shared means you need special  
>> handling.  You can't call most functions or methods with shared data.   
>> And if you do handle shared data, it's not just "the same" as unshared  
>> data -- you need to contend with data races, memory barriers, etc.   
>> Just because it's marked shared doesn't mean everything about it is  
>> handled.
>
> Exactly, see above. That's why you never access "raw" shared data - you  
> always wrap it.
> ("access" meaning read and/or write, passing refs around is fine)
> Problem solved.

Let's not forget the main benefit of pure -- to allow optimization.   
Marking something as optimizable that *can never be* optimized or be a  
part of *any* optimizable function serves no purpose.

Let's not forget a secondary benefit of pure -- dispatchability (probably  
a better term for this).  If I know there's no shared data involved, I can  
dispatch a pure function to another worker thread without worry of races,  
especially a strong-pure function, but it's quite easy to prove validity  
for a weak-pure function.

If shared is involved, the second aspect goes out the window.
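
A minimal sketch of the dispatch case, assuming `std.parallelism` as the worker mechanism (the thread does not name one). The strong-pure `slowAdd` takes and returns only values, so handing it to another thread cannot introduce a race.

```d
import std.parallelism : task, taskPool;

// Strong-pure: value parameters, value result, no global state.
int slowAdd(int x, int y) pure
{
    while (y--) ++x;
    return x;
}

void main()
{
    // Run the call on a worker thread; nothing it touches is visible
    // to any other thread.
    auto t = task!slowAdd(2, 3);
    taskPool.put(t);
    assert(t.yieldForce == 5);
}
```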

>>> And the change
>>> could have consequences for templated functions or lambdas, where  
>>> "pure" is inferred.
>>
>> I would label those as *helpful* and *positive* consequences ;)
>
> Are you saying that
>
>    auto f(T)(T v) { return v+v; }
>
> should be inferred as impure when used with a shared(T), but (weakly)  
> pure
> otherwise?

You are saying two different things here...
f's purity depends on the expression (v + v)'s purity.  And the level of  
purity (weak or strong) depends on the level of (v + v)'s purity.  If v +  
v is strong-pure (such as int + int), then f is strong-pure.  If v + v is  
weak-pure, f is weak pure.  If v + v is not pure, then f is not pure.   
That is how it works today.
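
To make the inference concrete, here is a small sketch. `Impure` and `counter` are hypothetical names invented for the example; the point is only that the same template comes out pure or impure depending on what `v + v` does for the given T.

```d
auto f(T)(T v) { return v + v; }

int counter; // thread-local global, hypothetical

// Hypothetical type whose "+" touches global state and so cannot be pure.
struct Impure
{
    int x;
    Impure opBinary(string op : "+")(Impure rhs)
    {
        ++counter;
        return Impure(x + rhs.x);
    }
}

// Compiles only because f!int is inferred pure.
int viaInt(int v) pure { return f(v); }

void main()
{
    assert(viaInt(21) == 42);

    // f!Impure is callable, but not from pure code.
    static assert( __traits(compiles, () => f(Impure(1))));
    static assert(!__traits(compiles, () pure => f(Impure(1))));
}
```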

What I'm saying is, shared just shouldn't be allowed to be any part of  
pure.  So if T is defined as shared int, even though it actually makes no  
sense whatsoever for your example, f will be impure.

That's another aspect of shared that needs to be addressed -- type  
inference for shared expressions.

For instance:

shared int x, y;

auto z = x + y;

What type should z be?  Right now it's shared, but that makes *no* sense,  
because z is not shared until you share it.  Why should auto opt-in to  
something it doesn't have to?

Likewise with IFTI, f(x) should probably equate to f!int(x) (in which case  
it *would* be pure)
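
For comparison, a sketch of how to get the unshared result explicitly, assuming `core.atomic.atomicLoad` from druntime. Loading each operand yields a plain thread-local value, so `auto` infers `int` and IFTI picks `f!int`. The template `f` here is the one from the quoted example.

```d
import core.atomic : atomicLoad;

auto f(T)(T v) { return v + v; }

shared int x = 1, y = 2;

void main()
{
    // Each load produces an unshared copy; the sum is an ordinary int.
    auto z = atomicLoad(x) + atomicLoad(y);
    static assert(is(typeof(z) == int));
    assert(z == 3);

    // The argument is a plain int, so this instantiates f!int.
    auto r = f(atomicLoad(x));
    static assert(is(typeof(r) == int));
    assert(r == 2);
}
```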

-Steve