What is pure and what is not pure

Fri Apr 4 02:15:09 PDT 2008

There's a lot of discussing going on here, and everybody has an opinion. 
It might help if we can agree on some basics.

To be pure, a function has to:

  - Always return the same result value given the same argument value(s).

  - The function result value cannot depend on any hidden information or 
state that may change as program execution proceeds.

  - The function can not depend on any external input from I/O devices.

  - The function can not cause any side effect observable outside of 
itself (except of course, returning a value).

  - The function cannot cause any output to any IO device.

Having said that, the following is also true:

  - The purity of a function is /only/ dependent on what it actually 
does, not on any semantics or syntax.

This last one means, that a function may be pure even if it is not 
explicitly meant to be by the programmer. What counts is whether it 
actually fulfills the above requirements in reality. The reverse is also 
true, a function intended and even decorated as pure, may not actually 
be pure.

Any syntactic devices a language associates with pure functions (e.g. 
such as decorations around the function definition) serve only to help 
the compiler and/or the programmer.

  - They may help the compiler try to enforce purity
  - They may help the compiler generate more optimal code
  - They may help the programmer remember his own intentions

But,

  - They *don't necessarily define* whether the function is pure or not.

Further,

  - If a pure function calls an impure function, then it becomes impure.

Put another way: a function is pure only if it itself is pure and every 
function it calls is pure. (This is especially important with methods of 
objects received as arguments, that our function may call. Important, 
because it seems to be hard to notice, or grasp, unless explicitly 
stated. Similarly, if our function itself is a method of an object, it 
may not change the object's state, or let that state influence its output.)

 From all of the above also follow some interesting properties. 
Understanding (or at the very least, believing) these, will make it 
easier to follow the discourse in this newsgroup.

Pure functions, or the use of them in a program, do not need constant 
(in any form) to be part of the language. While this is true, it of 
course demands discipline and acuteness from the D programmer. (This is 
about the same thing as writing object oriented programs in plain C. An 
example would be the Mars Pathfinder rover firmware.)

The compiler has to "know" if a function is pure before it can try to 
generate more optimal code with pureness optimizations. There are 
essentially two ways to do this. Either the programmer tells the 
compiler that the function is pure (e.g. by decorating the function, or 
by compiler directives (not in D), or some other means), or the compiler 
can try to figure it out for itself. The latter is not always possible.

The language creator can try to help the programmer, mainly by raising 
barriers against actions that would make the function impure, /but these 
actions only serve as aids/, since an idiot programmer could still 
invent ways to circumvent these. (Which is (sort of) OK, since D is a 
Systems Programming Language, after all.) Such barriers include, among 
other things:

  - Not letting the function write to any entities outside of its own 
internal scope. These include globals, the surrounding scope, arguments, 
members of structs or objects passed to it, or accessible via them (no 
matter what the method of access is), directly or even indirectly.

  - Not letting the function read from any entities outside its own 
internal scope, unless it's guaranteed that they don't change during the 
run of the program. This is usually understood as being constants (or 
whatever we today call them), but may also be the result of a 
command-line parameter, etc., as long as we know they won't change 
during the run of the program. (Now, here's a killer: of course a pure 
function can read globals, variables, I/O, or whatever /that may 
change/, but to stay Pure, it has to /not/ let those values have any 
influence on its output. In other words, if it reads e.g. a global or 
regular variable, it has to store it in a dummy variable, and never use 
that information so that it affects the function's own output! From this 
follows that it has no use for such a value, and from this follows that 
such an access is superfluous. Technically, however, the function still 
retains its purity. Because such an action can not /even in theory/ 
serve any purpose, it is just as well that such read access is forbidden.)

On the face of it, the fact that we want to give pure functions also 
arguments that are passed by reference (and not only value type data 
(POD)), would seem to create a problem with the statement:

  - Always return the same result value given the same argument value(s).

Consider the bit pattern the function receives as argument. Say we give 
an object instance to the pure function. It receives the reference as 
the bit pattern. But the contents of the object may change between 
invocations of our pure function, and therefore result in different output.

This simply means that bit pattern as a measure of "sameness" of input, 
is invalid. We have to consider everything the function receives via 
this argument (directly and indirectly) during the course of its 
invocation, as constituting the "input". When /all/ of that is "same", 
then, and only then, has the function the obligation to return the same 
value as previously.

So, in "given the same argument value(s)", the "value(s)" has to be 
interpreted as /all of what/ the function examines in or via such value(s).

We may have a function that is pure in one context and impure in another 
context. A trivial example would be a sort function that, given two 
arguments sorts the first and outputs into the second argument, and 
given one argument returns a sorted copy of it.

This is legal, and unambiguous. However, it might be laborious for the 
compiler to know the difference (when it tries to optimize using purity 
as a guideline -- but when not (as when there's a compiler option in 
effect, forbidding optimizing), even this shouldn't be a problem), and 
it certainly would be impractical to invent a way of manually informing 
the compiler of the function's purity/non-purity in this case. In the 
interest of not unduely complicating things, I suggest (if one ever 
really has to do this, at all) to use overloading, that is, two 
functions with the same name, one being pure and the other one not.

I hope this wraps it up. Now, if only somebody would write a similar 
post about const, we'd all be a lot better off. :-)

(PS, it might be possible later (D3?) to relax on some barriers. One 
thing could be that objects passed as arguments that get some of their 
methods run, might be allowed to change their private, internal state as 
side effects. In theory, this could be redefined as not violating the 
purity of our function, since such a state change is "that object's own 
private business". (We are, after all, treading new ground here.) 
However, pure functions are introduced in D for the purpose of 
optimizing code, and some special object's internal state is a fringe 
issue in comparison. Once we feel confident with pure functions, we may 
find other corner cases, too, to maybe reconsider.