[Issue 8185] Pure functions and pointers

d-bugmail at puremagic.com d-bugmail at puremagic.com
Mon Jun 4 09:06:42 PDT 2012


http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #38 from art.08.09 at gmail.com 2012-06-04 09:08:38 PDT ---
(In reply to comment #23)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > (In reply to comment #12)
> > > > (In reply to comment #11)
> > > > > Pointers may only access their own memory blocks, therefore exactly those
> > > > > blocks participate in argument value and return value.
> > > > 
> > > > What does 'their own memory block' mean?
> > > 
> > > The allocated memory block it points into.
> > 
> > But, as the bounds are unknown to the compiler, it does not have this
> > information, it has to assume everything is reachable via the pointer.
> 
> 1. It does not need the information. Dereferencing a pointer outside the valid
>    bounds results in undefined behavior. Therefore the compiler can just ignore
>    the possibility.

The problem is there are no "valid bounds". Unless you'd like to declare
   (char* p) {return p[1];}
as invalid, which as you yourself say is restrictive (but IMO acceptable for
pure functions, at least the ones that are automatically inferred as pure).

> 2. It can gain some information at the call site. Eg:
> 
> int foo(const(int)* y)pure;
> 
> void main(){
>     int* x = new int;
>     int* y = new int;
>     auto a = foo(x);
>     auto b = foo(y);
>     auto c = foo(x);
> 
>     assert(a == c);
> }

According to certain replies in this report, that assertion could fail. :) 

But I get what you're saying - now consider this foo() definition instead:

   int foo()(const(int)* y) {
      int r;
      foreach (i; 0..size_t.max)
         r += y[i];
      return r;
   }

   /* same main () */

The compiler will treat foo() as pure, so if it were able to act on the
a==c assumption above, it could also do the same here. And now it would be
completely wrong - the function doesn't even try to pretend that it's pure, yet
it will be inferred as if it were and there's no (clean) way to prevent that.
If the compiler optimizes based on a==c, it will miscompile the program.
This is why restrictions on what is accessed via a pointer in a pure
function are necessary. Note it only matters for templates/literals/lambdas, ie
the cases where purity is inferred; the programmer can always add the purity
tag when he knows it is (logically) safe (eg most C string functions).
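
To make the inference point concrete, here is a minimal sketch (sum2 and
caller are made-up names, not from this report). The template is never marked
pure, yet an explicitly pure function can call it, which only compiles if the
compiler inferred purity for it, even though it indexes past the pointed-to
element:

```d
// Hypothetical template: no 'pure' written anywhere on it.
int sum2()(const(int)* y)
{
    // Reads y[1], i.e. one element past what the pointer nominally refers to.
    return y[0] + y[1];
}

// Explicitly pure caller: this only compiles if sum2 was inferred pure.
int caller(const(int)* p) pure
{
    return sum2(p);
}

void main()
{
    int[2] buf = [1, 2];
    // buf.ptr points at two valid ints, so both reads are in bounds here.
    assert(caller(buf.ptr) == 3);
}
```

Nothing in the signature tells the caller (or the optimizer) how much memory
behind the pointer the result depends on.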

And yes, my example code doesn't make sense as-is, but it only serves to
illustrate the problem, there are sane implementations of foo(T*p) which under
the right conditions will have the same issues.

BTW, is my foo() above @safe? According to the compiler here - it is.


> 3. Aliasing is the classic optimization killer even without 'pure'.

Yes. Maybe it's a good thing that D doesn't attempt to define it, given the
amount of confusion something like "pure" causes...


> 4. Invalid use of pointers can break every other aspect of the type system.
>    Why single out 'pure' ?

It has nothing to do with "invalid use of pointers", unless, again, p[1] is
deemed invalid.


> > This is
> > why i suggested above that only dereferencing a pointer should be allowed in
> > pure functions.
> > 
> 
> This is too restrictive.

What else do you want to be able to do with a pointer in a pure function?
Dereferencing it and working with the value itself should work, anything else?
Note that you should be able to explicitly tell the compiler to assume
something is pure even when the code accesses more than just the pointed-to
element.
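
A sketch of what "explicitly telling the compiler" could look like, assuming
the C library's strlen is available at link time (the prototype below is
written by hand for illustration, and len is a made-up wrapper). The function
reads well past the first pointed-to char, but the programmer vouches for it:

```d
// Hand-written prototype: the programmer asserts that strlen is (logically)
// pure, even though it walks an unknown number of chars behind the pointer.
extern (C) size_t strlen(const(char)* s) pure nothrow;

// A pure D function may call it because of the explicit tag above.
size_t len(const(char)* s) pure
{
    return strlen(s);
}

void main()
{
    // D string literals are zero-terminated and convert to const(char)*.
    assert(len("hello") == 5);
}
```

Here the purity is a promise made by a human, so any wrong assumption is the
human's responsibility, not something the compiler decided on its own.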


> > And one way to make it work is to forbid dereferencing pointers and require fat
> > ones. Then the bounds would be known.
> 
> The bounds are usually known only at runtime.
> The compiler does not have more to work with.
> From the compiler's point of view, an array access out of bounds
> and an invalid pointer dereference are very similar.

Having well defined aliasing rules would help, yes, but I think that's beyond
the scope of this bug.


> > > > and, if the access isn't restricted somehow, makes the
> > > > function dependent on global memory state.
> > > 
> > > ? A function independent of memory state is useless.
> > 
> > int n(int i) {return i+42;}
> > 
> 
> Where do you store the parameter 'i' if not in some memory location?

I said "global memory state". The parameters are *local* state, just like
variables - they can not escape (you can't return their address) and the values
depend only on function inputs. Arguments containing references can be seen as
part of the global state, but those are explicitly defined as inputs that the
function depends on. And that definition wrt pointers is exactly what this
bug is about.
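
The local vs global distinction can be sketched like this (counter and
viaPointer are made-up names for illustration). A pure function cannot name a
mutable global directly, but it can read the very same memory once a pointer
to it is passed in as an argument:

```d
int counter; // mutable module-level state

// Depends only on its by-value parameter, ie purely local state.
int n(int i) pure { return i + 42; }

// Depends on *p, which is reached through an explicit input.
int viaPointer(const(int)* p) pure { return *p + 42; }

// Rejected by the compiler if uncommented: a pure function may not read
// mutable static data directly.
// int bad() pure { return counter; }

void main()
{
    assert(n(0) == 42);

    counter = 1;
    assert(viaPointer(&counter) == 43);

    // Same pointer value, different result: the pointed-to memory is part
    // of the function's input, not just the pointer itself.
    counter = 2;
    assert(viaPointer(&counter) == 44);
}
```

How much memory beyond *p counts as part of that input is the open question.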


> > > f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler
> > > is not allowed to perform optimizations that change defined program behavior.
> > 
> > f4 isn't pure, by any definition - it depends on (or in this example modifies)
> > state, which the caller may not even consider reachable.
> 
> Then it is the caller's fault. What is considered reachable is well-defined,
> and f4 must document its valid inputs.

f4() takes a pointer; AFAICT you've said above that it should be able to do
more than just dereference it. So what exactly is considered reachable? 


> > The compiler can
> > assume that a pure function does not access any mutable state other than what
> > can be directly or indirectly reached via the arguments -- that is what
> > function purity is all about. If the compiler has to assume that a pure
> > function that takes a pointer argument can read or modify everything, the
> > "pure" tag becomes worthless.
> 
> No pointer _argument_ necessary.
> 
> int foo()pure{
>     enum int* everything = cast(int*)...;
>     return *everything;
> }
> 
> As I already pointed out, unsafe language features can be used to subvert the

p[i] can be just as dangerous as the cast. The question is - can the compiler
treat a function containing these constructs as still pure? If the programmer
says so, it's fine - purity by convention works.


> type system. If pure functions should be restricted to the safe subset, they
> can be marked @safe, or compiled with the -safe compiler switch.

   int foo()(int* y) @safe {
      int r;
      foreach (i; 0..size_t.max)
         r += y[i]++;
      return r;
   }

But it's not related to this bug.


> > And what's worse, it allows other "truly" pure
> > function to call our immoral one. 
> > 
> 
> Nothing wrong with that.

It is wrong - if a pure function can be optimized out and it calls another one
that has side effects. Again, the case when a human incorrectly tags a function
is not really the problem, it's when the compiler does that behind the
programmer's back.
