purity question

H. S. Teoh via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue May 30 11:51:41 PDT 2017


On Tue, May 30, 2017 at 11:10:19AM -0700, Jonathan M Davis via Digitalmars-d-learn wrote:
[...]
> Yeah, basically, D's pure was originally what is now sometimes called
> "strongly pure," which is quite close to functionally pure in that the
> same input results in the same output (it still allows alocating
> memory and returning it though, so you can have equal but different
> objects if the same function is called multiple times with the same
> arguments in different contexts where the compiler doesn't memoize
> it). However, pure was so restrictive as to be borderline useless,
> because it could only pass types that were immutable or implicitly
> convertible to immutable. So, it was expanded to include any function
> which could not access global, mutable state, because such functions
> can be called from a "strongly" pure function without violating the
> strongly pure function's guarantees - and thus "weakly" pure functions
> were born, making it so that D's pure is really more like @noglobal
> than functionally pure. It's a critical building block in functional
> purity, and strongly pure functions still get all of the benefits that
> they got before (and now they can actually be useful, because they can
> call many more functions), but _most_ pure functions are weakly pure
> and thus don't get the full benefits of functional purity - and that's
> why some folks like Ketmar tend to get annoyed with D's pure. But the
> way it is is actually quite useful even if it's initially confusing.
> And even when there are no strongly pure functions in your program,
> knowing that a function cannot access global variables except through
> its arguments means a lot in and of itself.
[...]

In retrospect, we could have named "weakly pure" something like
@noglobal, and "strongly pure" as pure.

But be that as it may, I think D's pure system is actually extremely
ingenious.  Look at it this way: in the classical sense of functional
purity, which is what you get in (pure) functional languages, it can get
quite difficult to implement the functionality you want, because the
language enforces that every primitive you use in your implementation
must be pure.  The reasoning is that if you're only allowed to use pure
primitives, then the resulting function is guaranteed to be pure.  It's
nice and simple. However, it's also rather cumbersome, especially in the
context of an imperative language like D.

For example, in a functional language you cannot assign new values to
variables, and you cannot write for-loops, because the loop index is not
allowed to mutate. You cannot mutate anything, because mutation makes
the code impure. So instead of straightforward loops, you need to resort
to things like tail recursion; instead of mutation, you need to use
monads, and so on.  I'm not saying this is a bad thing, but it's just
cumbersome because you, the programmer, cannot simply implement a
function with a straightforward algorithm, but you have to work harder
to express the algorithm in functional terms using recursion and other
functional (pure) constructs.

The first insight in D's purity system is the observation that, given
some function f(), from the caller's POV all they care about is that (1)
the function always returns the same value given the same arguments, and
(2) there are no visible side-effects.  Point (2) is where the insight
lies: in a sense, it *doesn't matter* if f() does all kinds of impure
stuff in order to compute its return value, as long as the outside world
cannot see it.  It may be internally impure, but externally, as far as
the outside world is concerned, it's pure, because the caller can't tell
the difference.  So D allows things like variables, mutation, for-loops,
and all kinds of stuff inside (strongly) pure functions -- as long as
no global state is touched, and as long as the function arguments aren't
modified (from the caller's POV), f() is pure.  In other words, as long
as f() keeps its dirty laundry to itself and leaves the outside world
untouched, it is externally pure, even if it's internally impure.

The next insight is this: suppose f() calls g(), and g() is impure.  In
the traditional functional language purity system, this is outright
illegal, because a pure function, by definition, cannot call an impure
function, otherwise it is itself impure.  However, suppose g() does not
modify any global state.  It *may* modify stuff through its arguments --
which makes it impure.  But if f() is not allowed to modify its
arguments and has no global state, then at worst, it can only pass its
internal, non-global state to g(). Therefore, it is impossible for g()
to reach global state through its arguments. Thus, f()'s (external)
purity is preserved.

Note the fine distinction here, that g()'s external purity is keyed on
being called from inside a strongly-pure function f(). If f() is impure,
then there is no guarantee that g() may modify global state (because f()
may pass a reference to a global variable to g(), and g() modifies that
global via the reference).

In D terminology, we say g() is "weakly pure" -- its purity is dependent
on the context in which it's called.  We say f() is "strongly pure",
because it is (externally) pure in the traditional functional sense that
its return value is cacheable and it has no (observable) side-effects.

(This similar to nothrow functions being allowed to throw exceptions
internally -- as long as the exceptions are caught before the function
returns.  Internally, the function does throw exceptions; but externally
it's nothrow because the caller cannot ever see said exceptions.)

//

Why go through all of this complicated reasoning?  Because it expands
the horizons of what a strongly-pure function can do, *while still
remaining externally pure*.  A strongly pure function is now allowed to
call a weakly-pure (i.e., @noglobal) function, rather than being
restricted to only calling other strongly-pure functions.  Allowing more
functions to be callable from inside a strongly-pure function means we
can do much more inside a strongly-pure function, without breaking its
(external) purity. So we can still enjoy the benefits of strong
(classical) purity, like cacheability, etc., without having our hands
tied in the implementation of the function.

It's true that the `pure` keyword can be misleading for newcomers,
because unfortunately, due to historical reasons, `pure` is used for
both "strongly pure" and "weakly pure" functions.  In fact, in the
current compiler implementation, `pure` only means "weakly pure"; it's
just that the compiler only applies optimizations afforded by strong
purity when the function arguments are immutable or implicitly
convertible to immutable.  If we could have redesigned the language, we
could have called it by a less confusing name, like @noglobal, with
`pure` reserved for strongly-pure functions.


T

-- 
An elephant: A mouse built to government specifications. -- Robert Heinlein


More information about the Digitalmars-d-learn mailing list