`@safe` by default. What about `@pure` and `immutable` by default?

Thu Apr 18 20:30:17 UTC 2019

On Thursday, April 18, 2019 12:33:33 PM MDT Eugene Wissner via Digitalmars-d 
wrote:
> On Thursday, 18 April 2019 at 14:52:53 UTC, Jonathan M Davis
> wrote:
> > pure means that the function does not access mutable, global
> > state except via the function's arguments. That's it. And the
> > compiler _is_ able to do stuff with that and does so right now.
>
> That the compiler actually does something with pure is new to me.
> So I was wrong here.
>
> > It will elide function calls (though under such restricted
> > circumstances that it's not really worth it), but the bigger
> > gains come from how it helps the type system. For instance,
> >
> > Foo* foo(int i) pure { return new Foo(i); }
> >
> > immutable f = foo(42);
>
> I'm not really convinced this pattern is useful, but fine, there
> may be different use cases I'm not familiar with. But this
> doesn't even work under similar circumstances:
>
> int* foo() pure
> {
>      return new int(5);
> }
>
> immutable f = foo();
>
> gives: cannot use non-constant CTFE pointer in an initializer
> `&[5][0]`

I take it that you were trying to initialize a variable that has to be
initialized during CTFE? CTFE doesn't particularly like casts, and it
couldn't even have pointers transfer from compile-time to runtime until
fairly recently. So, the code generation probably inserts a cast that CTFE
doesn't like. It works at runtime, and it should probably be made to work at
compile-time. It's also more useful with far more complex pieces of code -
like if you were initializing an immutable AA with a bunch of values or some
other piece of data that required more than a constructor. It can be done
without pure, but that then requires a cast, and it's up to the programmer
to make sure that the data is unique, and it's safe to cast it to immutable,
whereas with pure, the compiler can verify that for you.
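A minimal sketch of the runtime case, for illustration only. The names Foo,
makeFoo, and buildTable are hypothetical and not from the code quoted above.
It assumes the initializations happen inside a function body rather than
during CTFE:

```d
struct Foo
{
    int value;
    this(int v) pure { value = v; }
}

// Pure factory: the only argument is an int, so nothing else can
// refer to the returned object and the result is known to be unique.
Foo* makeFoo(int i) pure
{
    return new Foo(i);
}

// Builds a table in several steps without touching any globals.
int[string] buildTable() pure
{
    int[string] table;
    table["one"] = 1;
    table["two"] = 2;
    return table;
}

void main()
{
    // No cast is needed; the conversion to immutable is implicit.
    immutable(Foo)* f = makeFoo(42);
    immutable int[string] table = buildTable();

    assert(f.value == 42);
    assert(table["two"] == 2);
}
```

Without pure on makeFoo and buildTable, the same initializations would need
an explicit cast(immutable), with the uniqueness guarantee left to the
programmer.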

> > Something like func(42) * func(42) will result in a call being
> > elided if func is pure, but even splitting it up onto two lines
> > kills that. e.g.
> >
> > auto f = func(42);
> > f = func(42) * f;
>
> I've just tested it and, unless I'm missing something, no call
> elision is done. GDC actually eliminates the call if it has the
> source code, but GDC does it whether the function is pure or not.
> I also don't see how it may be possible.

It's my understanding that calling a strongly pure function multiple times
with the same arguments within the same expression will result in the
compiler eliding every call after the first (though I haven't tested it
recently). For a function to be strongly pure, its parameters must be
immutable (or implicitly convertible to immutable, as is the case with
ints); otherwise, the compiler can't do it. So, _very_ few functions will
qualify, and how often does anyone call the same function multiple times
with the same arguments in a single expression? For it to really be useful,
you'd have to be doing a lot of math code with pure functions or something
else that involved a lot of immutable variables (which most code doesn't
have) where it made sense to call the same function multiple times in the
same expression (which most code doesn't do). It would be more useful with
code flow analysis, because then you'd potentially get call elision across
an entire function, but given Walter's stance on code flow analysis, I doubt
that it's ever happening. In general, while it is my understanding that the
compiler will elide multiple, identical calls to a strongly pure function
within a single expression, that just isn't very useful in practice - which
is part of why the whole idea that pure is there for actual, functional
purity is kind of bogus. It's also why pure was expanded beyond strongly
pure, because strongly pure functions don't happen often without weakly pure
function helpers, and even then, they don't happen often. In reality, the
biggest benefits to pure probably come from constructing immutable objects,
with the secondary benefit being that you know that a section of code can't
access any globals except through function arguments.
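To make the strong/weak distinction concrete, here is a small sketch. The
functions fill and sumOfFilled are hypothetical examples, and whether a given
compiler actually elides the repeated call is not something this code can
show or guarantee:

```d
// Weakly pure: it mutates data through its mutable argument,
// but it cannot read or write any global state.
void fill(int[] buf, int v) pure
{
    foreach (ref e; buf)
        e = v;
}

// Strongly pure: every parameter is implicitly convertible to
// immutable, so the result depends only on the argument values.
// It still relies on the weakly pure helper internally.
int sumOfFilled(size_t n, int v) pure
{
    auto buf = new int[](n);
    fill(buf, v);

    int total = 0;
    foreach (e; buf)
        total += e;
    return total;
}

void main()
{
    // Two identical calls in one expression: the only situation in
    // which the second call is a candidate for elision.
    auto r = sumOfFilled(3, 2) * sumOfFilled(3, 2);
    assert(r == 36);
}
```

Had fill been required to be strongly pure as well, it could not have taken
a mutable int[] at all, which is why weakly pure helpers matter.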

> size_t foo() pure
> {
>      return cast(size_t) new int(5);
> }
>
> It is a perfectly valid pure function, that doesn't depend on any
> global state, doesn't have arguments, without casting away the
> impurity, but it returns different values every time.

Well, you found a loophole then. The fact that you can allocate in a pure
function is extremely useful, but it does come with the caveat that even
though the allocated objects will always have the same value, they won't be
the exact same object. So, by casting to get the pointer value, you can
indeed cheat it. I'm not sure how possible it is to have the compiler
prevent it, but it's also not something that's likely to be a problem in
practice. It _is_ a loophole though brought on by one of the aspects of pure
that was made more lax in order to increase its usefulness.
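A short sketch of the "same value, different object" point. The function
fresh is a hypothetical variant of the quoted example that returns the
pointer itself rather than its address as an integer:

```d
// Pure, no arguments, no globals: each call allocates a new int.
int* fresh() pure
{
    return new int(5);
}

void main()
{
    int* p = fresh();
    int* q = fresh();

    // The values are always equal...
    assert(*p == *q);

    // ...but the two live allocations are distinct objects, so their
    // addresses differ. Casting an address to size_t inside the pure
    // function is what exposes that difference as a return value.
    assert(p !is q);
    assert(cast(size_t) p != cast(size_t) q);
}
```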

> > I don't get this. pure is fairly well understood.
>
> Thinking of the last discussion about pure, just before
> pureMalloc was introduced, I got a different feeling, but well,
> it kind of does, what the specification says.

The problem with pureMalloc is that you're trying to emulate what happens
when allocating memory via new, which technically violates not accessing
mutable, global state. It's just that it was decided that the way that it
did it with new was acceptable, since mutating the GC bookkeeping wasn't
really part of the program's logical state (though that does lead to the
loophole you mentioned above). pureMalloc is then having the programmer do
something similar without the compiler's help and without the GC cleaning up
after it (so, it needs a corresponding free call). Call elision in
particular is deadly, and while it's not going to happen often, having it
happen in a way that would result in memory being freed twice or not freed
at all would be a big problem. So, the whole pureMalloc thing is a bit of a
mess. Certainly, it's not dealing with pure in any kind of normal manner,
and it's trying to convince the compiler that something can be considered
pure without the compiler then having problems due to the fact that it isn't
actually pure. As far as just using pure goes and what that does, it's well
understood overall. It's trying to trick the compiler where things get
messy.
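
As a rough sketch of what that pairing looks like, assuming the pureMalloc
and pureFree functions from core.memory (sumOfSquares is a hypothetical
example, and n is assumed to be non-negative):

```d
import core.memory : pureFree, pureMalloc;

int sumOfSquares(int n) pure nothrow @nogc
{
    // Scratch buffer allocated without the GC; nothing cleans it up
    // automatically, so the free below is mandatory.
    auto buf = cast(int*) pureMalloc(n * int.sizeof);
    if (buf is null)
        return 0;
    scope (exit) pureFree(buf);

    foreach (i; 0 .. n)
        buf[i] = i * i;

    int total = 0;
    foreach (i; 0 .. n)
        total += buf[i];
    return total;
}

void main()
{
    assert(sumOfSquares(4) == 14); // 0 + 1 + 4 + 9
}
```

Keeping the allocation and the free inside one function body means the
buffer never escapes, which sidesteps the double-free or leak that elided
calls to separate allocate and release functions could cause.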

> I also don't find Haskell's purity "insane", but actually very
> useful and solid, so I might be biased towards "strong purity"
> or "no purity at all".

I programmed in Haskell a fair bit in college. I think that it was a good
experience, because it greatly increased my ability to write functional
code, and it greatly increased how good I was with stuff like recursion.
That being said, I think that it's an insane way to program in practice
(monads being a prime example of some of what happens when you go down that
route). The fact that D is multiparadigm means that you can use such idioms
where they make a lot of sense (e.g. a lot of range code tends to be fairly
functional in nature), but it's not forced on you. I really don't understand
anyone who would _want_ to program in Haskell (or any language like it) as
much more than a learning experience.

Regardless, D's pure really doesn't have much to do with functional purity
at this point, even if that was why it was originally put in the language.
Having it be @noglobal would be far more accurate, though it _can_ be used
to have actual, functional purity in some cases. It's useful to have, but if
we were going to ditch one of the function attributes, I'd probably put it
near the top of the list. I don't want to lose it, but it tends to be truly
useful in a rather limited number of circumstances, and I would hate to see
it forced on code in general.

- Jonathan M Davis