Forcing weak-pure

Tue Mar 25 16:30:04 PDT 2014

On 03/25/14 21:51, Steven Schveighoffer wrote:
> On Tue, 25 Mar 2014 14:49:27 -0400, Artur Skawina <art.08.09 at gmail.com> wrote:
> 
>> On 03/25/14 14:30, Steven Schveighoffer wrote:
>>> [...] functions like GC.setAttr and assumeSafeAppend cannot be marked pure. For example:
>>>
>>> auto str = "hello".idup;
>>> str = str[0..1];
>>> str.assumeSafeAppend();
>>> str ~= "iya";
>>>
>>> The compiler could rationally elide the call to assumeSafeAppend if it is pure. We are not using the return value, and the only parameter is immutable. Since pure functions technically have no side effects, this call can be eliminated. A recent compiler change made such calls warnings (not using result of strong-pure function). But assumeSafeAppend really should be weak-pure, because it does have an effect. In essence, you are technically passing to assumeSafeAppend a pointer to the block that contains the slice, not the slice itself. And that block is mutable.
>>>
>>> GC.setAttr has similar issues.
>>>
>>> How can we force these to be weak-pure?
>>
>> Functions returning 'void' and w/o mutable args cannot be logically pure,
>> as long as they aren't no-ops, obviously. While this property could be
>> used to render them "weak-pure" in d-speak, this (or any other approach
>> to marking them as such) would not be enough...
>>
>>    // assuming 'assumeSafeAppend()' is "weak-pure":
>>
>>    string f(string s) pure { s.assumeSafeAppend(); s ~= "d"; return s; }
>>    string a = "abc".idup, b = f(a[0..2]);
> 
> This could not be elided, because you are using the return value. It would have to be called at least once.
> 
> However, I am OK with it being elided for a second call with the same input. In other words, this strong pure function does not have the same issues.

It's ok to treat allocator and factory functions as pure, because those really
are logically pure, i.e. they can't affect any /visible/ state and they return
results that are unique.
'assumeSafeAppend()' does have side effects - it can lead to corruption of other,
not directly related, data. The only thing that prevents this is a declaration
by the programmer -- "Trust me, I own this data and there are no other live aliases".
The problem is that this declaration would now happen /implicitly when calling
a (strongly) pure function/.
Note that the 'a' string in the above example might no longer be "abc" after 'f()'
runs, because the append inside 'f()' can overwrite the 'c' in place.
The "pure" concept (re-)definition in D is bad enough even without having to worry
about supposedly pure functions stomping on unrelated data.
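
A minimal sketch of that stomping, for illustration only. It uses the same 'f()'
as above but without the 'pure' attribute, so that it compiles whether or not
assumeSafeAppend() is pure in a given druntime. Whether 'a' really changes
depends on the runtime extending the block in place, which is an assumption
about the GC's behaviour and not something the language promises.

```d
import std.stdio : writeln;

// Same shape as f() in the example above, minus 'pure'.
string f(string s)
{
    s.assumeSafeAppend(); // claim: nothing else uses the block past s
    s ~= "d";             // may now extend in place instead of reallocating
    return s;
}

void main()
{
    string a = "abc".idup;   // GC-allocated block holding "abc"
    string b = f(a[0 .. 2]); // slice "ab" aliases the same block
    writeln(b);              // "abd"
    writeln(a);              // still "abc" only if the append reallocated
}
```

If the append does happen in place, the supposedly immutable 'a' has been
modified by a call that never received 'a' itself, only a slice of it.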

> But I get what you are saying -- just wrap it again, and we're back to the same issue. It is a systematic problem, because the concept is that you are not passing mutable references, even though you are then using those references to mark global data.
> 
> There is no really good answer I suppose. The huge downside of not marking assumeSafeAppend as weak-pure is that one cannot use assumeSafeAppend or set attributes inside pure functions that deal with arrays with immutable data. For example, the Appender type cannot be marked pure for immutable data.

If there was a version of assumeSafeAppend() marked as pure, then that one could
be used for such cases, where no aliases are allowed to escape before an ownership
transfer happens. It's much easier to audit a contained implementation (e.g. Appender)
than to check the whole program and every caller of every function which relies on
a potentially unsafe assumption. If 'assumeSafeAppend()' isn't pure /by default/
then D's purity guarantees are at least a little bit stronger.
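
A rough sketch of what such a contained use could look like. Everything here is
hypothetical: 'OwnedBuilder' and 'markAppendable' are made-up names, and this is
not how Appender is implemented. 'assumePure' is the cast idiom shown in the
std.traits documentation for SetFunctionAttributes. The buffer is kept mutable
and private, so no alias can escape before 'release()' transfers ownership.

```d
import std.traits : FunctionAttribute, SetFunctionAttributes,
    functionAttributes, functionLinkage, isDelegate, isFunctionPointer;

// Casts a function pointer to the same type plus 'pure'.
// The caller vouches that this is actually acceptable.
auto assumePure(T)(T t)
if (isFunctionPointer!T || isDelegate!T)
{
    enum attrs = functionAttributes!T | FunctionAttribute.pure_;
    return cast(SetFunctionAttributes!(T, functionLinkage!T, attrs)) t;
}

// Hypothetical helper, deliberately not marked pure.
void markAppendable(char[] s) { s.assumeSafeAppend(); }

// Hypothetical builder; 'buf' is never exposed while it is still owned.
struct OwnedBuilder
{
    private char[] buf;

    void put(const(char)[] s) pure { buf ~= s; }

    // Drop the contents but keep the block for reuse.
    void clear() pure @trusted
    {
        if (buf is null) return;
        buf = buf[0 .. 0];
        assumePure(&markAppendable)(buf); // the only unsafe assumption
    }

    // Ownership transfer: hand out the data and forget the block.
    string release() pure @trusted
    {
        auto result = cast(string) buf;
        buf = null;
        return result;
    }
}

void main()
{
    OwnedBuilder b;
    b.put("abc");
    b.clear();
    b.put("xy");
    assert(b.release() == "xy");
}
```

The unsafe assumption is then confined to the two @trusted members, which is
the part that would need auditing.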

Or maybe such a change (marking unsafe code as pure and hoping that the programmer
knows exactly what (s)he's doing) wouldn't really make the situation significantly
worse than it already is - I'm not sure...

artur