What is pure and what is not pure
Georg Wrede
georg at nospam.org
Fri Apr 4 02:15:09 PDT 2008
There's a lot of discussing going on here, and everybody has an opinion.
It might help if we can agree on some basics.
To be pure, a function has to:
- Always return the same result value given the same argument value(s).
- The function result value cannot depend on any hidden information or
state that may change as program execution proceeds.
- The function can not depend on any external input from I/O devices.
- The function can not cause any side effect observable outside of
itself (except of course, returning a value).
- The function cannot cause any output to any IO device.
Having said that, the following is also true:
- The purity of a function is /only/ dependent on what it actually
does, not on any semantics or syntax.
This last one means, that a function may be pure even if it is not
explicitly meant to be by the programmer. What counts is whether it
actually fulfills the above requirements in reality. The reverse is also
true, a function intended and even decorated as pure, may not actually
be pure.
Any syntactic devices a language associates with pure functions (e.g.
such as decorations around the function definition) serve only to help
the compiler and/or the programmer.
- They may help the compiler try to enforce purity
- They may help the compiler generate more optimal code
- They may help the programmer remember his own intentions
But,
- They *don't necessarily define* whether the function is pure or not.
Further,
- If a pure function calls an impure function, then it becomes impure.
Put another way: a function is pure only if it itself is pure and every
function it calls is pure. (This is especially important with methods of
objects received as arguments, that our function may call. Important,
because it seems to be hard to notice, or grasp, unless explicitly
stated. Similarly, if our function itself is a method of an object, it
may not change the object's state, or let that state influence its output.)
From all of the above also follow some interesting properties.
Understanding (or at the very least, believing) these, will make it
easier to follow the discourse in this newsgroup.
Pure functions, or the use of them in a program, do not need constant
(in any form) to be part of the language. While this is true, it of
course demands discipline and acuteness from the D programmer. (This is
about the same thing as writing object oriented programs in plain C. An
example would be the Mars Pathfinder rover firmware.)
The compiler has to "know" if a function is pure before it can try to
generate more optimal code with pureness optimizations. There are
essentially two ways to do this. Either the programmer tells the
compiler that the function is pure (e.g. by decorating the function, or
by compiler directives (not in D), or some other means), or the compiler
can try to figure it out for itself. The latter is not always possible.
The language creator can try to help the programmer, mainly by raising
barriers against actions that would make the function impure, /but these
actions only serve as aids/, since an idiot programmer could still
invent ways to circumvent these. (Which is (sort of) OK, since D is a
Systems Programming Language, after all.) Such barriers include, among
other things:
- Not letting the function write to any entities outside of its own
internal scope. These include globals, the surrounding scope, arguments,
members of structs or objects passed to it, or accessible via them (no
matter what the method of access is), directly or even indirectly.
- Not letting the function read from any entities outside its own
internal scope, unless it's guaranteed that they don't change during the
run of the program. This is usually understood as being constants (or
whatever we today call them), but may also be the result of a
command-line parameter, etc., as long as we know they won't change
during the run of the program. (Now, here's a killer: of course a pure
function can read globals, variables, I/O, or whatever /that may
change/, but to stay Pure, it has to /not/ let those values have any
influence on its output. In other words, if it reads e.g. a global or
regular variable, it has to store it in a dummy variable, and never use
that information so that it affects the function's own output! From this
follows that it has no use for such a value, and from this follows that
such an access is superfluous. Technically, however, the function still
retains its purity. Because such an action can not /even in theory/
serve any purpose, it is just as well that such read access is forbidden.)
On the face of it, the fact that we want to give pure functions also
arguments that are passed by reference (and not only value type data
(POD)), would seem to create a problem with the statement:
- Always return the same result value given the same argument value(s).
Consider the bit pattern the function receives as argument. Say we give
an object instance to the pure function. It receives the reference as
the bit pattern. But the contents of the object may change between
invocations of our pure function, and therefore result in different output.
This simply means that bit pattern as a measure of "sameness" of input,
is invalid. We have to consider everything the function receives via
this argument (directly and indirectly) during the course of its
invocation, as constituting the "input". When /all/ of that is "same",
then, and only then, has the function the obligation to return the same
value as previously.
So, in "given the same argument value(s)", the "value(s)" has to be
interpreted as /all of what/ the function examines in or via such value(s).
We may have a function that is pure in one context and impure in another
context. A trivial example would be a sort function that, given two
arguments sorts the first and outputs into the second argument, and
given one argument returns a sorted copy of it.
This is legal, and unambiguous. However, it might be laborious for the
compiler to know the difference (when it tries to optimize using purity
as a guideline -- but when not (as when there's a compiler option in
effect, forbidding optimizing), even this shouldn't be a problem), and
it certainly would be impractical to invent a way of manually informing
the compiler of the function's purity/non-purity in this case. In the
interest of not unduely complicating things, I suggest (if one ever
really has to do this, at all) to use overloading, that is, two
functions with the same name, one being pure and the other one not.
I hope this wraps it up. Now, if only somebody would write a similar
post about const, we'd all be a lot better off. :-)
(PS, it might be possible later (D3?) to relax on some barriers. One
thing could be that objects passed as arguments that get some of their
methods run, might be allowed to change their private, internal state as
side effects. In theory, this could be redefined as not violating the
purity of our function, since such a state change is "that object's own
private business". (We are, after all, treading new ground here.)
However, pure functions are introduced in D for the purpose of
optimizing code, and some special object's internal state is a fringe
issue in comparison. Once we feel confident with pure functions, we may
find other corner cases, too, to maybe reconsider.
More information about the Digitalmars-d
mailing list