purity and memory allocations/pointers

Sat Aug 3 22:37:46 PDT 2013

On Sun, Aug 04, 2013 at 01:15:24AM +0200, John Colvin wrote:
[...]
> Is there anywhere formal defining D's pure (weak vs strong etc.)? A
> page in the wiki perhaps?
> 
> Imagine someone new coming to D and being confused by what our
> purity system is. It would suck to only be able to give an ad-hoc
> answer or link them to a previous discussion.
> 
> I would offer but I don't really understand it myself.

It's really quite simple once you understand the motivation behind it.

Pure (aka "strong" purity) means the return value of the function only
depends on its arguments, not on any hidden state, not on any global
state, and it has no side-effects. So no static variables, no globals,
no side-effects. This is purity as generally understood in functional
languages.

Now, traditionally, (strong) purity is interpreted to mean that the
function cannot use anything with side effects, like assignment
statements and mutable variables, since easiest way to ensure that
function as a whole has no side-effects is to build it out of primitives
that also has no side-effects.

The first insight in D, however, is that as long as the function's
return value only ever depends on its arguments, and it cannot access
any global variables (or cause side-effects like I/O), then it cannot be
externally distinguished from a genuinely pure function (in the Haskell
sense of pure). The fact that the function may be using mutable
variables and assignment statements is invisible to the outside world,
so for all intents and purposes, it might as well be Haskell-pure. This
leads to the convenience in D that it's OK to use mutable variables and
assignment statements, etc., that are traditionally prohibited in pure
languages, inside a D pure function, as long as the outside world can't
tell the difference. So we don't have to go full-blown functional
programming inside the function; we can still use loops and arrays and
stuff. This makes pure functions much more convenient to write, yet
without sacrificing the advantages of true purity.

The second insight in D, is that suppose you have a function, let's call
it W, that (1) has no internal state, (2) doesn't directly access global
variables, and (3) can affect the outside world but only through its
arguments (e.g., a pointer, a reference to mutable variables, etc.).
Now, clearly, being able to modify external state through its arguments
violates the traditional sense of purity. But here's where the insight
comes in: suppose we have a strongly pure function as described in the
previous paragraph, let's call it F. It has no side-effects, no hidden
state, no accessing of global variables, but does have some local
variables it uses to calculate its return value. The question is, will F
remain strongly pure if it calls W?

So let's think about the ramifications of calling W. Since W is able to
modify the outside world through its arguments, it seems that F can no
longer be pure, since any side-effects caused by W will also implicate
F. But if you think about it, the worst that F can do is to pass a
reference to a local variable to W. It can't pass a reference to a
global variable because it doesn't have any access to a global variable!
So whatever side-effect W may have, that side-effect is confined inside
the scope of F, and continues to be invisible to the world outside of F.
We can therefore conclude that F is *still* strongly pure!

Now, the story would be different if W could access global variables --
then its side-effects would be visible outside of F, and so F can no
longer be pure. So W still has some restrictions on it, it's just that
it's a little more lax than the restrictions on F.

So here is where the idea of weak purity comes in. We say that W is
"weakly pure", because it violates the traditional definition of purity
(can't mutate outside state through reference arguments), but it is
still restricted enough that when it's called from a strongly pure
function, all of its side-effects are confined inside the local scope of
the strongly pure function, and thereby are invisible to the world
outside of the strongly pure function. IOW, a weakly pure function is a
function that can be called by a strongly pure function without
violating the strong purity of the latter.

Finally, we make the observation that the distinction between a
strongly-pure and weakly-pure function is completely determined by its
argument types. A weakly-pure function is still not allowed to access
global variables, for example; the only way it can mutate the outside
world is if at least one of its arguments is mutable, or contains a
reference to some outside state. If all of its arguments are immutable
and contain no references, then the function can't actually have any
side-effects on the outside world at all, and so it is, in fact,
strongly pure. Thus, this allows us to use the "pure" keyword on *both*
strongly-pure and weakly-pure functions: the weakly pure functions can
be identified by the fact that one or more of their arguments may
contain mutable references; the strongly pure functions are the ones
whose arguments are immutable (or implicitly castable to immutable, like
ints -- since modifying an int argument has no effect on the outside
world).

Thus, we arrive at our current definition of purity in D.

To end, I'd like to mention the reason we want to go through all this
tedium to define strongly pure vs. weakly pure: the key insight is that
even though weakly pure functions are *not* pure in the traditional
sense, they *can* be used as building blocks to build strongly pure
functions! This greatly increases the implementation possibilities of
strongly pure functions. Not only can we use local variables and
assignments in imperative code inside a strongly pure function, we can
also be assured that calling weakly-pure functions, which have
side-effects, will still maintain our purity! The notion of weak purity
assures us that none of the side-effects we use inside the strongly-pure
function will ever "leak out" to the outside world. Therefore, we
continue to enjoy the benefits of purity-based optimizations, like
caching of return values, merging identical subexpressions, and many of
the other optimizations that are traditionally restricted only to
functional languages.

T

-- 
"640K ought to be enough" -- Bill G., 1984. "The Internet is not a primary goal for PC usage" -- Bill G., 1995. "Linux has no impact on Microsoft's strategy" -- Bill G., 1999.