should pure functions accept/deal with shared data?

Thu Jun 7 12:55:31 PDT 2012

On Thu, 07 Jun 2012 15:16:20 -0400, Artur Skawina <art.08.09 at gmail.com>  
wrote:

> On 06/07/12 20:29, Steven Schveighoffer wrote:

>> I'm not proposing disallowing mutable references, just shared references.
>
> I know, but if a D function marked as "pure" takes a mutable ref (which a
> shared one has to be assumed to be), it won't be treated as really pure
> for optimization purposes (yes, I'm deliberately trying to avoid "strong"
> and "weak").

However, a mutable pure function can be called *inside* an optimizable pure
function, and the optimizable function can still be optimized.

A PAS function (pure accepting shared), however, devolves to a mutable
pure function. That is, there is zero advantage to having a pure function
take shared data instead of simply mutable thread-local (TLS) data.
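To make the PAS point concrete, here is a minimal D sketch; `peek`, `counter` and `observe` are hypothetical names made up for illustration. `peek` is marked pure but reads through a `ref shared int`, so two calls with the same argument can legitimately return different values, and nothing about the pure attribute lets a caller cache the first result.

```d
import core.atomic : atomicLoad, atomicOp;

// Hypothetical PAS function: pure, but its parameter refers to shared data.
int peek(ref shared int p) pure nothrow @nogc @safe
{
    return atomicLoad(p);
}

shared int counter; // hypothetical module-level shared state

// Not pure: it touches the global `counter` directly.
int observe()
{
    int before = peek(counter);
    atomicOp!"+="(counter, 1); // another thread could do this between two calls
    int after = peek(counter);
    return after - before;     // same argument to peek, different results
}
```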

There is only one reason to mark a function as pure when its arguments are
not all immutable or value types: so it can be called inside a strong-pure
function. Otherwise, it's just a normal function, and even when marked as
pure it will not be optimized. You gain nothing else by marking it pure.
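A small sketch of that single benefit, with hypothetical names (`bump`, `bumpImpure`, `plusTwo`): the weakly pure `bump` can be called from the strongly pure `plusTwo`, while an identical function lacking the attribute is rejected in a pure context. The `__traits(compiles, ...)` checks let the compiler confirm both halves of that claim.

```d
// Hypothetical weakly pure helper: it mutates its argument.
void bump(ref int x) pure nothrow @nogc @safe { ++x; }

// Same body, not marked pure (a non-template function, so nothing is inferred).
void bumpImpure(ref int x) nothrow @nogc @safe { ++x; }

// Strongly pure: value-type parameter and result only.
int plusTwo(int x) pure nothrow @nogc @safe
{
    bump(x); // allowed: bump is pure, and x is a local copy
    bump(x);
    return x;
}

// A pure caller may use bump, but not bumpImpure.
static assert(__traits(compiles, (int v) pure { bump(v); }));
static assert(!__traits(compiles, (int v) pure { bumpImpure(v); }));
```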

So let's look at two cases.  I'll re-state my example, in terms of two  
overloads, one which takes shared int and one which takes just int (both  
of which do the right thing):

import core.atomic;

void inc(ref int t) pure
{
   ++t;
}

void inc(ref shared(int) t) pure
{
   atomicOp!"+="(t, 1);
}

Now, let's define a strong-pure function that uses inc:

int slowAdd(int x, int y) pure
{
    while(y--) inc(x);
    return x;
}

I think we can both agree that inc *cannot* be optimized away, and that
slowAdd is *fully pure*. That is, slowAdd *can* be optimized away, even
though its calls to inc cannot.

Now, what about a strong-pure function using the second (shared) form? A
strong-pure function has to have all parameters (and its return type)
immutable or implicitly convertible to immutable.
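One practical payoff of that rule is D's implicit conversion of a pure function's result to immutable. A sketch with hypothetical functions: `makeRange` takes only a value type, so the compiler can treat its result as unique and let it convert to immutable; `firstHalf` takes a mutable array, so its result may alias the caller's data and the conversion is refused.

```d
// Hypothetical strongly pure function: only a value-type parameter.
int[] makeRange(int n) pure nothrow @safe
{
    int[] a;
    foreach (i; 0 .. n)
        a ~= i;
    return a;
}

// Hypothetical weakly pure function: the result may alias the argument.
int[] firstHalf(int[] a) pure nothrow @safe
{
    return a[0 .. $ / 2];
}

// A strongly pure result converts to immutable; a possibly aliased one does not.
static assert(__traits(compiles, (int n) { immutable(int)[] r = makeRange(n); }));
static assert(!__traits(compiles, (int[] m) { immutable(int)[] h = firstHalf(m); }));
```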

I'll re-define slowAdd:

int slowAddShared(int x, int y) pure
{
    shared int sx = x;
    while(y--) inc(sx);
    return sx;
}

For the same reason the original slowAdd is strong-pure, we can agree that
slowAddShared is strong-pure.

But what do we gain by being able to declare sx shared? We can't return
it as shared, or slowAddShared becomes weak-pure. We can't share it while
inside slowAddShared, because we have no outlet for it, and we cannot
access global variables. In essence, marking sx as shared does
*nothing*. In fact, it does worse than nothing: we now have to contend
with shared for data that is *provably* unshared. In other words, we are
wasting cycles on atomic operations where plain operations would do. Not
only that, but because there are no outlets, declaring *any* data as
shared inside a strong-pure function is useless, no matter how we define
any PAS functions.
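A self-contained version of both functions shows the shared variant computing exactly the same result while paying for an atomic add on every iteration. Reading `sx` back with `atomicLoad` is an assumption about how one would write it under stricter shared checking. The last check, using a hypothetical module-level `sink`, illustrates the missing outlet: a pure function may not touch mutable global state, shared or not.

```d
import core.atomic : atomicLoad, atomicOp;

void inc(ref int t) pure nothrow @nogc @safe { ++t; }
void inc(ref shared(int) t) pure nothrow @nogc @safe { atomicOp!"+="(t, 1); }

int slowAdd(int x, int y) pure nothrow @nogc @safe
{
    while (y--) inc(x);    // plain increments on a local copy
    return x;
}

int slowAddShared(int x, int y) pure nothrow @nogc @safe
{
    shared int sx = x;     // shared, but nothing else can ever see it
    while (y--) inc(sx);   // an atomic add per iteration, for no benefit
    return atomicLoad(sx);
}

shared int sink; // hypothetical would-be outlet

// A pure function cannot reach module-level mutable data, shared or not.
static assert(!__traits(compiles, (int v) pure { atomicOp!"+="(sink, v); }));
```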

So if shared is useless inside a strong-pure function, and the only point  
in marking a non-pure-optimizable function as pure is so it can be called  
within a strong-pure function, then pure is useless as an attribute on a  
function that accepts or returns shared data.  *Every case* where you use  
such a function inside a strong-pure function is incorrect.

But functions accepting *mutable* data *are* useful, because they allow us
to modularize pure functions. For example, sort can be (and should be)
pure. Instead of implementing a functional-style sort, or manually
sorting data inside a strong-pure function, we can simply call sort, and
it acts as a component of a strong-pure function, fully optimizable based
on pure optimization rules.
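As a sketch of that modular style (`sortedCopy` is a hypothetical name; it uses Phobos `std.algorithm.sorting.sort`, whose attributes are inferred for `int[]`): the strongly pure wrapper takes immutable input, sorts a private copy in place through a weakly pure call, and its result can be stored straight into an immutable array.

```d
import std.algorithm.sorting : sort;

// Hypothetical strongly pure wrapper: immutable input, fresh mutable output.
int[] sortedCopy(immutable(int)[] input) pure @safe
{
    int[] buf = input.dup; // private, mutable copy
    buf.sort();            // sort mutates buf in place: a weakly pure step
    return buf;
}
```

Because `sortedCopy` is strongly pure, a caller can write `immutable(int)[] r = sortedCopy([3, 1, 2]);` without a cast.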

> And any caller will have to obtain this shared ref either from a mutable
> argument or global state. Hence that "pure" function with shared inputs
> will *never* actually be pure. So I'm wondering what would be the gain
> from banning shared in weakly pure functions

What is to gain is clarity, and more control over parameter types in  
generic code.

If shared is banned, then:

void inc(T)(ref T t) pure { ++t; }

*always* does the right thing. As the author of inc, I am done. I don't
need template constraints, documentation, or anything else, and I don't
need to worry about users abusing my function. The compiler will ensure
that nobody uses this on shared data, which would require an atomic
operation.
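Under today's rules, by contrast, the author of that generic helper has to exclude shared by hand. A sketch of that extra work using a template constraint (`g` is a hypothetical module-level variable used only for the check). Recent compilers may also reject `++` on a shared int by themselves, but the constraint states the intent explicitly and fails at the call site.

```d
// Generic pure helper that opts out of shared explicitly.
void inc(T)(ref T t) pure if (!is(T == shared))
{
    ++t;
}

shared int g; // hypothetical shared variable for the negative check

static assert(__traits(compiles, { int v; inc(v); }));
static assert(!__traits(compiles, { inc(g); })); // rejected by the constraint
```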

> (Ugh, you made me use that word after all ;) ).

I did nothing of the sort :)

> AFAICT you're proposing to forbid something which currently is a NOOP.

It's not a NOOP. Marking something as shared means it needs special
handling. You can't call most functions or methods with shared data. And
if you do handle shared data, it's not just "the same" as unshared data:
you need to contend with data races, memory barriers, etc. Just because
it's marked shared doesn't mean everything about it is handled.
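For a sense of what that special handling looks like, here is a sketch using `core.thread` and `core.atomic` (`hits` and `countWithThreads` are hypothetical names). Several threads bump one shared counter; each increment goes through `atomicOp`, which performs the read-modify-write as one indivisible step, whereas a plain load, add and store from several threads could lose updates.

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.thread : Thread;

shared int hits; // hypothetical counter visible to every thread

// Starts nThreads threads that each add perThread to hits,
// waits for all of them, and returns the final count.
int countWithThreads(int nThreads, int perThread)
{
    atomicStore(hits, 0);
    Thread[] threads;
    foreach (i; 0 .. nThreads)
    {
        auto t = new Thread({
            foreach (j; 0 .. perThread)
                atomicOp!"+="(hits, 1); // atomic read-modify-write
        });
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join();
    return atomicLoad(hits);
}
```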

> And the change could have consequences for templated functions or
> lambdas, where "pure" is inferred.

I would label those as *helpful* and *positive* consequences ;)
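For reference, this is how inference behaves today, as a sketch with hypothetical templates `incPlain` and `incAtomic` and a hypothetical helper `isPure`: neither template is annotated, yet both instantiations come out pure, including the one that takes `shared int`. Under the proposed ban the second check would presumably fail, which is the consequence being discussed; only current behavior is checked here.

```d
import core.atomic : atomicOp;
import std.traits : FunctionAttribute, functionAttributes;

// Neither template is marked pure; attributes are inferred per instance.
void incPlain(T)(ref T t) { ++t; }
void incAtomic(T)(ref T t) { atomicOp!"+="(t, 1); }

// Hypothetical helper: true if the function symbol carries `pure`.
enum isPure(alias f) = (functionAttributes!f & FunctionAttribute.pure_) != 0;

static assert(isPure!(incPlain!int));
static assert(isPure!(incAtomic!(shared int)));
```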

-Steve