should pure functions accept/deal with shared data?
Steven Schveighoffer
schveiguy at yahoo.com
Thu Jun 7 15:42:53 PDT 2012
On Thu, 07 Jun 2012 17:36:45 -0400, Artur Skawina <art.08.09 at gmail.com>
wrote:
> On 06/07/12 21:55, Steven Schveighoffer wrote:
>> On Thu, 07 Jun 2012 15:16:20 -0400, Artur Skawina <art.08.09 at gmail.com>
>> wrote:
>>
>>> On 06/07/12 20:29, Steven Schveighoffer wrote:
>>
>>>> I'm not proposing disallowing mutable references, just shared
>>>> references.
>>>
>>> I know, but if a D function marked as "pure" takes a mutable ref
>>> (which a shared one has to be assumed to be), it won't be treated as
>>> really pure for optimization purposes (yes, I'm deliberately trying
>>> to avoid "strong" and "weak").
>>
>> However, a mutable pure function can be *inside* an optimizable pure
>> function, and the optimizable function can still be optimized.
>>
>> A PAS function (pure accepting shared), however, devolves to a mutable
>> pure function. That is, there is zero advantage of having a pure
>> function take shared vs. simply mutable TLS.
>>
>> There is only one reason to mark a function that does not take all
>> immutable or value type arguments as pure -- so it can be called inside
>> a strong-pure function. Otherwise, it's just a normal function, and
>> even marked as pure will not be optimized. You gain nothing else by
>> marking it pure.
>>
>> So let's look at two cases. I'll re-state my example, in terms of two
>> overloads, one which takes shared int and one which takes just int
>> (both of which do the right thing):
>>
>> void inc(ref int t) pure
>> {
>> ++t;
>> }
>>
>> void inc(ref shared(int) t) pure
>> {
>> atomicOp!"+="(t, 1);
>> }
>>
>> Now, let's define a strong-pure function that uses inc:
>>
>> int slowAdd(int x, int y) pure
>> {
>> while(y--) inc(x);
>> return x;
>> }
>>
>> I think we can both agree that inc *cannot* be optimized away, and that
>> we agree slowAdd is *fully pure*. That is, slowAdd *can* be optimized
>> away, even though its call to inc cannot.
>>
>> Now, what about a strong-pure function using the second (shared) form?
>> A strong-pure function has to have all parameters (and return types)
>> that are immutable or implicitly convertible to immutable.
>>
>> I'll re-define slowAdd:
>>
>> int slowAddShared(int x, int y) pure
>> {
>> shared int sx = x;
>> while(y--) inc(sx);
>> return sx;
>> }
>>
>> We can agree that, for the same reason the original slowAdd is
>> strong-pure, slowAddShared is strong-pure.
>>
>> But what do we gain by being able to declare sx shared? We can't
>> return it as shared, or slowAddShared becomes weak-pure.
>
> Actually, *value* return types shouldn't prevent the function from being
> pure. But there is not much point in returning them as shared, other than
> to avoid explicit casts, something that would be better solved with some
> kind of 'unique' class.
Right, what I meant was, returning a shared reference. For example, if a
pure function allocated memory and returned it as a shared pointer, that
would make it non-optimizable pure (weak pure).
>> We can't share it while inside slowAddShared, because we have no outlet
>> for it, and we cannot access global variables. In essence, marking sx
>> as shared does *nothing*. In fact, it does worse than nothing -- we
>> now have to contend with shared for data that actually is *provably*
>> unshared. In other words, we are wasting cycles doing atomic
>> operations instead of straight ops on a shared type. Not only that,
>> but because there are no outlets, declaring *any* data as shared while
>> inside a strong-pure function is useless, no matter how we define any
>> PAS functions.
>>
>> So if shared is useless inside a strong-pure function, and the only
>> point in marking a non-pure-optimizable function as pure is so it can
>> be called within a strong-pure function, then pure is useless as an
>> attribute on a function that accepts or returns shared data. *Every
>> case* where you use such a function inside a strong-pure function is
>> incorrect.
>
> We clearly agree completely; this is exactly what I'm saying in the
> paragraph you quoted below. What I'm *also* saying is that the
> 'incorrectness' of it is harmless in practice - so I'm not sure that it
> should be forbidden, and handled specially (which would be necessary in
> the inferred-purity cases).
I have given you an example of where it is harmful. There is benefit in
being able to say "since I marked this function pure, I know I don't have
to deal with threading." It allows you to eliminate possible
multi-threading mistakes from whole swaths of code, especially generic code
which accepts a myriad of types.

You know there are a ton of generic functions in phobos that don't check
*at all* whether shared data is being given to them? Simply marking them
pure (which should be viable for most functions) would eliminate that
worry.
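
For comparison, here is a rough sketch of what a generic author has to write today to get the same guarantee by hand. The constraint is only an illustration, and this version of inc is hypothetical; it is not how any particular phobos function is written.

```d
// Hypothetical generic increment that rejects shared arguments by hand.
// Under the proposed rule, "pure" alone would make the constraint redundant.
void inc(T)(ref T t) pure
if (!is(T == shared))
{
    ++t;
}

// Thread-local data is accepted...
static assert(__traits(compiles, inc!int));
// ...while a shared instantiation fails the constraint.
static assert(!__traits(compiles, inc!(shared(int))));
```

Every generic function that mutates its argument would need a constraint like this one; forgetting it silently allows a non-atomic increment on shared data.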
>>> And any caller
>>> will have to obtain this shared ref either from a mutable argument or
>>> global state.
>>> Hence that "pure" function with shared inputs will *never* actually be
>>> pure.
>>> So I'm wondering what would be the gain from banning shared in weakly
>>> pure functions
>>
>> What is to gain is clarity, and more control over parameter types in
>> generic code.
>>
>> If shared is banned, then:
>>
>> void inc(T)(ref T t) pure { ++t; }
>>
>> *always* does the right thing. As the author of inc, I am done. I
>> don't need template constraints or documentation, or anything else, and
>> I don't need to worry about users abusing my function. The compiler
>> will enforce nobody uses this on shared data, which would require an
>> atomic operation.
>
> Having a type that allows operators that are either illegal or wrongly
> implemented
> is not a problem specific to pure functions. My argument is that 'shared
> int' as
> a type is worthless and should never appear in real code. The
> 'Atomic!int' example
> was not made up - it is a real template used in my code that only allows
> legal
> operations. That first 'inc' example would end up using
>
> pragma(attribute, always_inline) void opOpAssign(string
> op:"+")(size_t n) {
> asm { "lock add"~opsuffix~" %1, %0 #ATOMIC_ADD" : "+m" data :
> "ir" n*unitsize ; }
> }
>
> and work correctly. I don't think it makes sense to worry about using
> built-in
> types marked as shared directly, that is not likely to do the right
> thing; in fact
> using shared(T) should probably be forbidden for every T that can not
> guarantee
> every operation on it to be correct and always safe.
I would be in favor of that. Right now the huge benefit of shared is what
you can assume on stuff that's *not* marked as shared. Using actual
shared types is very cumbersome and difficult to understand.
Giving shared more useful and robust meaning would be a huge benefit.
> (oh, and that opOpAssign is intentionally not marked as pure, but I
> should probably check what the compiler does; when I wrote it, I was
> assuming that the shared 'this', shared 'data' and lack of outputs would
> make it do the right thing)
Marking it as pure is like putting static on a class. It will achieve
nothing.
>>> AFAICT you're proposing to forbid something which currently is a NOOP.
>>
>> It's not a NOOP, marking something as shared means you need special
>> handling. You can't call most functions or methods with shared data.
>> And if you do handle shared data, it's not just "the same" as unshared
>> data -- you need to contend with data races, memory barriers, etc.
>> Just because it's marked shared doesn't mean everything about it is
>> handled.
>
> Exactly, see above. That's why you never access "raw" shared data - you
> always wrap it.
> ("access" meaning read and/or write, passing refs around is fine)
> Problem solved.
Let's not forget the main benefit of pure -- to allow optimization.
Marking something as optimizable that *can never be* optimized or be a
part of *any* optimizable function serves no purpose.
Let's not forget a secondary benefit of pure -- dispatchability (there is
probably a better term for this). If I know there's no shared data
involved, I can dispatch a pure function to another worker thread without
worrying about races. That holds especially for a strong-pure function, but
it's also quite easy to prove validity for a weak-pure function.
If shared is involved, the second aspect goes out the window.
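
A minimal sketch of that kind of dispatch, assuming std.parallelism is used for the worker pool. The helper name addOnWorker is hypothetical, and slowAdd is simplified here to increment in place rather than call inc.

```d
import std.parallelism : task, taskPool;

// Strong-pure: value-type parameters only, no access to global state.
int slowAdd(int x, int y) pure
{
    while (y--) ++x;
    return x;
}

// Hypothetical helper: run slowAdd on a pool thread and wait for the result.
// Nothing slowAdd touches is visible to any other thread, so no
// synchronization is needed beyond collecting the return value.
int addOnWorker(int x, int y)
{
    auto t = task!slowAdd(x, y); // heap-allocated task object
    taskPool.put(t);             // hand it to the global worker pool
    return t.yieldForce;         // wait for completion and fetch the result
}
```

If slowAdd could accept shared data, the caller could no longer reason this way from the signature alone.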
>>> And the change
>>> could have consequences for templated functions or lambdas, where
>>> "pure" is inferred.
>>
>> I would label those as *helpful* and *positive* consequences ;)
>
> Are you saying that
>
> auto f(T)(T v) { return v+v; }
>
> should be inferred as impure when used with a shared(T), but (weakly)
> pure
> otherwise?
You are saying two different things here...
f's purity depends on the expression (v + v)'s purity. And the level of
purity (weak or strong) depends on the level of (v + v)'s purity. IF v +
v is strong-pure (such as int + int), then f is strong-pure. If v + v is
weak-pure, f is weak pure. If v + v is not pure, then f is not pure.
That is how it works today.
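
A small sketch of that inference, under the assumption that attribute inference behaves as described above. The Impure type and the twice function are made up for the illustration.

```d
// Purity of a template function is inferred per instantiation.
auto f(T)(T v) { return v + v; }

// Hypothetical type whose "+" touches thread-local global state,
// so its opBinary cannot be inferred pure.
struct Impure
{
    int n;
    static int counter;

    Impure opBinary(string op : "+")(Impure rhs)
    {
        ++counter; // global mutation makes this instantiation impure
        return Impure(n + rhs.n);
    }
}

// f!int is inferred pure, so it may be called from a pure function.
int twice(int x) pure
{
    return f(x);
}

// f!Impure is inferred impure, so a pure caller is rejected.
static assert(!__traits(compiles, () pure { return f(Impure(1)); }));
```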
What I'm saying is, shared just shouldn't be allowed to be any part of
pure. So if T is defined as shared int, even though it actually makes no
sense whatsoever for your example, f will be impure.
That's another aspect of shared that needs to be addressed -- type
inference for shared expressions.
For instance:
shared int x, y;
auto z = x + y;
What type should z be? Right now it's shared, but that makes *no* sense,
because z is not shared until you share it. Why should auto opt-in to
something it doesn't have to?
Likewise with IFTI, f(x) should probably equate to f!int(x) (in which case
it *would* be pure)
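
Until inference works that way, the unshared result has to be requested explicitly. A minimal sketch, assuming core.atomic's atomicLoad is used for the reads; the function name sumUnshared is hypothetical.

```d
import core.atomic : atomicLoad;

shared int x, y;

// Read both shared values atomically and combine them into a plain int.
// The local z is thread-local data: nothing else can see it.
int sumUnshared()
{
    auto z = atomicLoad(x) + atomicLoad(y);
    static assert(is(typeof(z) == int)); // z is not shared
    return z;
}
```

Note that the two loads are individually atomic, not atomic as a pair.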
-Steve