If you could make any changes to D, what would they look like?

Fri Oct 29 21:16:49 UTC 2021

On Fri, Oct 29, 2021 at 04:14:35PM +0000, Paul Backus via Digitalmars-d wrote:
> On Friday, 29 October 2021 at 16:08:10 UTC, H. S. Teoh wrote:
> > > This is not the case with allocation/free, which are, by
> > > definition, dependent on a global state (even if only thread
> > > local).
> > 
> > Yeah, pureFree makes no sense at all. It's a disaster waiting to
> > happen.
> 
> I think the original sin here is allowing GC allocation (`new`, `~=`,
> closures) to be `pure`, for "pragmatic" reasons.
> 
> Once you've done that, it's not hard to justify adding `pureMalloc`
> too. And once you have that, why not `pureFree`? It's just a little
> white lie; surely nobody will get hurt.
> 
> Of course the end result is that `pure` ends up being basically
> useless for anything beyond linting, and can't be fixed without
> breaking lots of existing code.

I think the real root problem is mixing incompatible levels of
abstraction.

At some level of abstraction, one could argue that GC allocation (or
memory allocation in general) is an intrinsic feature of the layer of
abstraction you're working with: a bunch of functions that do
computations with arrays can be considered pure if the implementation of
said arrays is abstracted away by the language, and these functions use
only the array primitives given to them by the abstraction, i.e., they
don't allocate or free memory directly, so they do not directly observe
the external effects of allocation.  Think of a program in a functional
language, for example. The implementation is definitely changing global
state -- doing I/O, allocating/freeing memory, etc. But at the
abstraction level of the functional language itself, these implementation
details are hidden away and one can meaningfully speak of the purity of
functions written in that language. One may legally optimize code based
on the abstracted semantics, because the semantics at the higher level
are preserved in spite of the low-level implementation details being
changed.
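
As a rough illustration of code that stays inside the array abstraction,
here is a minimal sketch. The function name `squares` is made up for
this example and is not taken from any library. It allocates through
`~=`, but only via the array primitive, never through an explicit
allocator call:

```d
// Hypothetical example: the only "allocation" here is the built-in
// array append. The function never calls an allocator directly, so at
// the level of the array abstraction its result depends only on `xs`.
int[] squares(immutable(int)[] xs) pure nothrow @safe
{
    int[] result;
    foreach (x; xs)
        result ~= x * x;   // GC allocation hidden behind the primitive
    return result;
}

void main() @safe
{
    immutable(int)[] input = [1, 2, 3];
    assert(squares(input) == [1, 4, 9]);
    // Same input, same contents: the property `pure` is meant to give.
    assert(squares(input) == squares(input));
}
```

The asserts compare contents only. Whether two calls return the same
block of memory is exactly the kind of lower-level detail the
abstraction is supposed to hide.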

The problems come, however, when you have code that operates *both* at
the abstract level *and* deals with the low-level implementation at the
same time.  Suddenly, there is no longer a clear separation between code
in the higher-level abstraction and the lower-level implementation where
you have to deal with dirty details like allocating and freeing memory.
So the assumptions that the higher-level abstraction provides may no
longer hold, and that's where you begin to run into trouble.
Optimizations based on guarantees provided by the higher-level
abstraction become invalidated by lower-level code that breaks these
assumptions (because it operates outside of the confines of the
higher-level abstraction).

This is why array manipulation in a D pure function is in some sense
permissible, under certain assumptions, but things like pureFree do not
make sense, because they clearly mix incompatible levels of abstraction
in a way that will inevitably lead to problems.  If we were to permit
array allocations in pure code, then we must necessarily also commit to
not going outside the confines of that level of abstraction -- i.e., we
are not allowed to use memory allocation primitives that said array
operations are based on. As soon as this is violated, the whole thing
comes crashing down, because your program now has some operations that
are outside the abstraction assumed by the optimizations based on
`pure`.  Meaning that these optimizations now may be invalid.
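
To make the mismatch concrete, here is a minimal sketch, assuming the
`pureMalloc` and `pureFree` functions from druntime's `core.memory`.
The wrapper `release` is hypothetical and exists only for illustration:

```d
import core.memory : pureMalloc, pureFree;

// Hypothetical wrapper. By its signature it is strongly pure: the only
// parameter has no mutable indirections, and nothing is returned. Under
// the `pure` abstraction a call to it has no observable result, so an
// optimizer reasoning at that level could in principle treat the call
// as dead code. Its whole purpose, however, is to change the state of
// the heap.
void release(immutable(int)* p) pure nothrow @nogc @system
{
    pureFree(cast(void*) p);
}

void main()
{
    auto raw = cast(int*) pureMalloc(int.sizeof);
    assert(raw !is null);
    *raw = 42;

    auto p = cast(immutable(int)*) raw;
    assert(*p == 42);

    // If this call were elided, the block would simply leak; if it
    // runs, `p` dangles afterwards. Neither outcome is visible to the
    // abstraction that `pure` optimizations assume.
    release(p);
}
```

Whether a particular compiler actually drops the call is beside the
point; the sketch only shows that the type system accepts code whose
intent lies outside what `pure` describes.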

The situation is similar to `immutable`. If you're operating at the GC
level, there is strictly speaking no such thing as immutable, because
the GC code casts untyped memory into immutable and vice versa, so that
the same block of memory may be immutable at one point in time but
become mutable when it's later collected and reallocated to mutable
data.  But this does not mean we're not allowed to optimize based on
immutable; if it did, we might as well throw const and immutable to the
winds.  Instead, we declare GC code as @system, with
the GC interface @trusted, i.e., the GC operates outside of the confines
of immutability, but we trust it to do its job properly so that when we
return to the higher-level abstraction, all our previous assumptions
about immutable continue to hold.
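
The same pattern can be sketched at a smaller scale. The function
`makeTable` below is a made-up example, not druntime code: it works on
mutable memory internally, and its `@trusted` interface is a promise
that the immutable view it hands back really will never change:

```d
// Hypothetical example: mutable while under construction, immutable
// once published. The cast is sound only because no mutable reference
// to `buf` escapes; that promise is what @trusted stands for here.
immutable(int)[] makeTable(size_t n) @trusted pure nothrow
{
    auto buf = new int[](n);
    foreach (i, ref x; buf)
        x = cast(int)(i * i);
    return cast(immutable(int)[]) buf;
}

void main() @safe
{
    auto t = makeTable(4);
    assert(t == [0, 1, 4, 9]);
    // Back in the higher-level abstraction, writes are rejected.
    static assert(!__traits(compiles, t[0] = 1));
}
```

Callers in @safe code never see the mutable phase, so their
assumptions about immutable continue to hold.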

So for pure, it's the same thing. For something to be pure you must have
a well-defined set of abstractions based on which the optimizer is
allowed to make certain transformations to your code.  You must adhere
to the restrictions imposed by this abstraction -- which is what the
`pure` qualifier is ostensibly for -- otherwise you end up in UB
territory, just like the situation with casting away immutable.  The
only sane way to maintain D's purity system is that code marked pure
cannot contain anything that violates the assumptions we have imposed on
pure.  Otherwise we're in de facto UB territory even if the spec's
definition of UB doesn't specifically state this case.

Long story short, pureFree makes no sense because its very intent is to
make a visible change to the global state of memory -- clearly at a much
lower level of abstraction than `pure` is intended to operate at, and
clearly outside the `pure` abstraction.  In fact, I'd say that
*anything* that explicitly allocates/deallocates memory ought to be
prohibited from being marked `pure`.  Array operations are OK if we view
them as intrinsic, opaque operations that the pure abstraction grants
us. But anything that explicitly deals with memory allocation is clearly
an operation outside the `pure` abstraction, so allowing it to be marked
`pure` will inevitably break assumptions and land us in trouble.

T

-- 
Meat: euphemism for dead animal. -- Flora