pure or not pure?
Georg Wrede
georg at nospam.org
Thu Apr 10 10:34:42 PDT 2008
Steven Schveighoffer wrote:
> "Georg Wrede" wrote
>>Steven Schveighoffer wrote:
>>>"Janice Caron" wrote
>>>
>>>So how is new char[] a pure function? And why can't I 'wrap' new? For
>>>instance, if I have a function:
>>>
>>>int[] createIncreasingSequence(int s, int e, int step){...}
>>>
>>>which creates an array of ints that start at s and end at e, increasing
>>>by step each time, why can't I make that pure? i.e. I want to specify my
>>>own way to initialize an array. If I do that in 10 different pure
>>>functions, you're saying I have to re-implement that code in each one?
>>
>>The return value must be immutable if you're going to use it several
>>times.
>
> Agreed. But I don't want to use it several times, I want to use it once :)
>
> I want it to act like new does. For example, if I have
>
> char[] c = new char[5];
>
> in two different pure functions, new better not return the same exact memory
> space for both! It should be called every time. But I can't do this with a
> custom function because pure functions can only call other pure functions.
> Basically, I'm asking if there will be a way to move a common piece of a
> function that allocates and initializes data out of a pure function and into
> another.
>
> It would be cool if the compiler did not perform special optimizations (such
> as memoization) on pure functions that return mutable heap data, but still
> enforced the rules of purity.
>
>>And if you want to cache the return values outside of the function (as in
>>only returning a new array when new parameters are used, and otherwise
>>just a reference to a previously returned one), then I guess I don't have
>>an opinion. :-?
>>
>>>This was misleading on my part. I should have asked instead of these two
>>>questions, if substr is allowed to be pure, can it be called from a
>>>non-pure function? In the context of being called from a pure function,
>>>src is unique, because pure has to have unique data. But in the context
>>>of calling from a non-pure function, it might not be...
>>
>>Ultimately, any function is called from main(). Which is a non-pure
>>function.
I'll have to take that back. Of course main can be a pure function. Take
the unix sort function for an example. It receives data (which by the
way, is /assumed/ immutable, that is, the input file is assumed not to
change while sort is running) and _without_ otherwise changing its
environment (read, files on the file system), returns a value (the
sorted input).
Actually a lot of unix functionality is based on this. (Almost (just so
I'm not caught lying :-) )) all the unix commands that take stdin as
input and stdout as output, can be considered pure functions. (That most
of them also output to stderr doesn't count in this discussion. :-) Nor
the fact that they can change files (like the tee command) if the user
wants. Also, they're excellent examples of functions that can be pure or
not pure, depending on the invocation.)
> This is a special case of pure function. Let's say I have a pure function:
>
> pure invariant(char)[] f()
> {
> char[] c = new char[15];
> g(c);
> return assumeUnique!(c);
> }
>
> Now, if g is a pure function that takes a char[] argument and accesses the
> data referenced by the argument, it could be OK to call it from another pure
> function because we could assume that the pure function can only use heap
> data that is 'local' to the function. But it might be invalid to call such
> a pure function from a non-pure function because data held by a non-pure
> function is not always considered 'local'. On the other hand, if a pure
> function can hold pointers to heap data that is shared, but just can't use
> it, then a pure function that takes mutable heap data might not be able to
> use it, so a function like g() isn't possible.
I'd say that
- you can call a pure function from a non-pure function, anytime
- you can call a non-pure function from a pure function, BUT THAT
makes your pure function lose its purity. It'll still work, however.
(And of course, the compiler could be polite enough to remind you of
losing purity -- if the compiler can see that you're actually trying to
have a pure function, as opposed to the function being pure just by
happenstance.)
For a program none of this matters. I'd say that you can mix freely
whatever you like. BUT, if you wish the compiler has the possibility to
do serious optimizing, the you better not soil your pure functions.
I'd say, that is your only punishment for frivolous mixing. The data
doesn't get corrupted, and the sky doesn't turn checkered above you.
> If pure functions only allow invariant heap parameters and invariant heap
> returns, then this whole discussion becomes moot, but if they accept
> non-invariant heap parameters, or allow non-invariant heap returns, I want
> to know what the rules for those are. It seems to me that only allowing
> invariant heap data is a severe limitation that will leave D functional
> programming crippled in the face of an actual FP language.
One of the dreamed benefits of deciding that _D_ pure functions can only
access invariant data is to ensure that the compiler can do memoisation
when it wants to. That is, then the programmer doesn't have to do
memoisation by hand. This lessens bugs, eases coding, and gives the
compiler potentially big possibilities of very serious optimizing. (Not
that DMD probably will achieve such in some time, but still. After all,
we're designing a laguage here, too, which should make it _possible_ to
write such compilers.)
If we define
- pure functions can only receive immutable data as arguments
- pure functions can only access immutable other data
- pure functions can only return immutable data
- pure functions can't change or output other than the return "value"
- pure functions can only use other pure functions
and then we define
- "return value" can also be a reference to immutable heap data
- pure functions can use the heap as their local storage
and we decide that the last one means that pure functions can create and
manipulate mutable data on the heap, but only so that nobody else has
access to this data (including previous, later, or concurrent instances
of the same function),
then we should be half-way to the goals A/W are trying to reach.
---
Now, why all this heap stuff?
Simply because even Real Functional Languages, that make it look like
the stack is the only place in the world, can actually use the heap all
they want, only they do it behind the back of the programmer. It's not
about *literally* using the stack only, it's about the metaphor.
This metaphor is dictated by the fact that pure functionality means that
stuff is passed via return values only.
Now, in D (which is a multi-paradigm, practical language, for Systems
Programming), we don't want to restrict ourselves any more by purity
than we'd want to do that with OO, for example.
Before C++ was invented, we had languages that were OO-only, and
languages that were non-OO. (Even Tiobe still uses this distinction, and
I always wonder to which category they've put D.) The reason for this
either-or was simply that people were new to OO, and hadn't yet figured
out how (or whether) they can mix OO and procedural.
To take another example, when I got a VIC-20 in the early eighties, it
only had a Microsoft Basic interpreter. You couldn't do structured
programming with it simply because the facilities were missing in the
language. This made the programming effort more than exponential into
the number of lines written. Today, I only do structural programming (as
in versus unstructural), and admire Walter, who still mixes structured
and unstructured, quite fluently. (Just see Phobos source, especially
the C files.)
Now, with D we are charting new ground here. We are (arguably) the first
to make a serious effort of mixing functional (as in pure) with
procedural programming.
And this is where the heap stuff comes along. It would be impractical
for D to pass everything via the stack. First of all, non-trivial
progams would potentially need huge stack spaces, and in many
environments the stack space allocated for a program is given at startup
time. So, either we need to allocate stack just-in-case, or risk running
out of stack in mid-run. Wasteful, and impolite, especially in a
multiuser environment.
But to achieve this, we, at this point, have to decide that both input
to pure functions and output (i.e. the return "value") have to be
invariant. Only this lets us get away with only "pretending" to use the
stack, while in essence we pass references to heap stuff via the stack.
---
Later, when we're more at ease with mixing functional with
non-functional, we might well find ways to ease these restrictions. But
that might be several years after we've actually started using this in
real projects.
PS, oh Bob, I wish Walter would comment on this post, in any way.
Suppose my understanding on this differs from that of A/W, then I'd only
be misleading everybody here.
More information about the Digitalmars-d
mailing list