pure or not pure?

Thu Apr 10 09:08:26 PDT 2008

"Janice Caron" wrote
> On 10/04/2008, Steven Schveighoffer wrote:
>> So your solution is, create a duplicate array that is never used again,
>
> If you need to modify it, then it needs to be mutable, so a heap
> allocation is unavoidable in any case.

Yes, but this is 2x heap allocation.  For no reason.

>
>>  thereby increasing the runtime
>
> No, /decreasing/ the runtime, because the construction of the
> invariant array only has to happen ONCE. (Conceptually, it happens
> every time you call the function, but that's the beauty of functional
> programming. The compiler is able to cache things and reuse them).

Please re-read the example.  If I only ever call createIncreasingSequence 
with a set of arguments once, then it's memoizing on data that's never used 
again.  My supposition is that I have several functions that have similar 
code that I parameterize into a pure function.

Yes, f, and g can be cached, but that doesn't mean createIncreasingSequence 
needs to be cached.  What I'm trying to do (which is a common programming 
practice) is take common pieces of functions and use a separate function to 
do that piece.  My common piece happens to allocate data.

In other words, if I have pure functions:

f(x)
{
   statement1;
   statement2;
}

g(x)
{
   statement1;
   statement2;
}

Then I should be able to do

h(x)
{
    statement1;
    statement2;
}

f(x)
{
   h(x);
}

g(x)
{
   h(x);
}

No matter what statement 1 or 2 is.  If it happens to include memory 
allocation, and that is allowed in a pure function, then I should be able to 
wrap that memory allocation in another function.  If the compiler can't 
optimize on it inside the pure function, then it can't optimize it in the 
sub function, but who cares at that point?

>
>
>> Of course functional programming languages have mutable data.
>
> Haskell doesn't.
>
>>  They just
>>  don't have SHARED data.
>
> Haskell has no mutable data at all, only constants. In Haskell, you cannot 
> do
>
>    x = x + 1
>
> because x, once assigned, keeps its value until it goes out of scope.
> Now, I'm not saying that D needs to be that extreme. I'm only saying
> that to get the benefits of FP, you have to hand over a certain amount
> of control to the compiler, to the language. You no longer get to
> choose the order of evaluation. You cannot be certain what will and
> what won't be caller. You are no longer in /any/ position to estimate
> execution time on the basis of what will or will not be called, or
> when, or in what order.

I've been reading a bit about FP, and I see now that you are correct that 
mutable data isn't allowed.  But let's not forget that D is not a functional 
language, so a loop like:

for(int i = 0; i < n; i++)
   statement;

Is legal inside a D pure function, whereas the same code would be achieved 
through tail-recursion in a functional language.  Having mutable return 
values and arguments may not prevent the compiler from considering the 
function pure (then again, it may, that is my question).  But certainly, 
comparing D to Haskell doesn't prove either case.

I don't think the intent of bolting functional programming optimization 
opportunities on top of D is going to mean that during pure functions we 
will have to use a fully FP style.

>>  So let me restate: not having the ability to pass around UNIQUE mutable 
>> heap
>> data to and from pure functions is going to limit severely the usefulness 
>> of
>>  pure functions.
>
> I disagree - partly, because the compiler cannot statically verify
> uniqueness, so you would immediately be compromising the ability of FP
> to strongly typecheck everything, thereby robbing the compiler of the
> guarantees that it needs to ensure fast and crash-free
> multiprocessing. But mostly because FP requires a different way of
> thinking about things, and trying to force it to accomodate non-FP
> paradigms because they sit well with the way you think about problems
> just isn't playing the game. You have to trust that the compiler a
> little more. It can optimise better - you just have to keep your
> fingers crossed that it actually will!

OK, sure, so that means you believe:

char[] c = new char[5];

is invalid inside a pure function because the compiler cannot verify 
uniqueness of mutable heap data, right?  But the data returned IS unique. 
Couldn't there be a way to specify that the data is unique, or at least 
local?

I consider having to dup data just so I can use it to be wasteful, no matter 
what paradigm I'm using.

-Steve