[Issue 8185] Pure functions and pointers

Mon Jun 4 03:37:54 PDT 2012

http://d.puremagic.com/issues/show_bug.cgi?id=8185

--- Comment #29 from timon.gehr at gmx.ch 2012-06-04 03:39:52 PDT ---
(In reply to comment #25)
> I am partly playing Devil's advocate here, but:
> 
> (In reply to comment #23)
> > > This is
> > > why i suggested above that only dereferencing a pointer should be allowed in
> > > pure functions.
> > > 
> > This is too restrictive.
> 
> Why?

Because safety is an orthogonal concern; e.g., strlen is a pure function.
By the same reasoning, all unsafe features could be banned in all parts
of the code, not just in pure functions.
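
To make the strlen point concrete, here is a minimal sketch of a strlen-like
function in D. The name myStrlen is made up for this example. It is accepted as
pure although nothing bounds the loop except the terminating zero, so it is
unsafe but touches no mutable global state:

```d
// Hypothetical strlen-like function: pure, but not memory safe.
// The caller must guarantee that s points to a zero-terminated string.
size_t myStrlen(const(char)* s) pure nothrow
{
    size_t n = 0;
    while (s[n] != 0) // unchecked pointer indexing, allowed in @system code
        ++n;
    return n;
}
```

D string literals are zero-terminated, so myStrlen("hello".ptr) is a valid call.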

> 
> > > And one way to make it work is to forbid dereferencing pointers and require fat
> > > ones. Then the bounds would be known.
> > 
> > The bounds are usually known only at runtime.
> > The compiler does not have more to work with.
> > From the compiler's point of view, an array access out of bounds
> > and an invalid pointer dereference are very similar.
> 
> There is an important semantic difference between these two – a slice is a
> bounded region of memory, whereas a pointer per se just represents a reference
> to a single value.

Yes, 'per se'. Effectively, it references all memory in the same allocated
memory block. (This is also the view taken by the GC.)

> ---
> int foo(int* p) pure {
>   return *(p - 1); // Is this legal?
> }
> 

Whether it is legal depends on whether or not *(p-1) is part of the same memory
block. A conservative analysis (as is done in @safe code) would have to flag
the access as illegal.

> auto a = new int[10];
> foo(a.ptr + 1);
> ---

a.ptr is a pointer. The arithmetic is flagged as illegal in @safe code even
though it is safe. What do the examples show?
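
For illustration, a sketch of the contrast (both function names are
hypothetical). The slice version can be @safe because the index is
bounds-checked at runtime; the pointer version has to stay @system, and whether
a call is valid depends entirely on what the caller passes:

```d
// Slice version: the bounds travel with the argument and are checked at runtime.
int prevElem(const(int)[] a, size_t i) pure @safe
{
    return a[i - 1];
}

// Pointer version: only valid if p - 1 lies in the same memory block as p.
// Pointer arithmetic like this is rejected in @safe code.
int prevViaPointer(const(int)* p) pure @system
{
    return *(p - 1);
}
```

Both are accepted as pure; only the first is accepted as @safe.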

> 
> > > > ? A function independent of memory state is useless.
> > > 
> > > int n(int i) {return i+42;}
> > Where do you store the parameter 'i' if not in some memory location?
> 
> In a register, but that's besides the point

Indeed, because a register is just memory after all.

> – which is that the type of i, int,
> makes it clear that n depends on exactly four bytes of memory. In »struct Node
> { Node* next; } void foo(Node* n) pure;«, on the other hand, following your
> interpretation foo() might depend on an almost arbitrarily large amount of
> memory (consider e.g. uninitialized memory in the area between a heap-allocated
> Node instance and the end of the block where it resides, which, if interpreted
> as Node instance(s), might have »false pointers« to other memory blocks, etc.).
> 

The language does not define such a thing. Accessing this area therefore
results in undefined behavior.

> > > > f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler
> > > > is not allowed to perform optimizations that change defined program behavior.
> > > 
> > > f4 isn't pure, by any definition - it depends on (or in this example modifies)
> > > state, which the caller may not even consider reachable.
> > 
> > Then it is the caller's fault. What is considered reachable is well-defined […]
> 
> Is it? Could you please repeat the definition then,

It is written down in the C standard. There is no formal specification for D.

> and point out how this is
> clear from the definition of purity according to the spec,

This would not be defined in the pages about purity, but rather in the pages
about pointer arithmetic, which are missing, presumably because they would be
the same as in C.

> »Pure functions are
> functions that produce the same result for the same arguments«.
> 

This is not a definition of the 'pure' keyword. It relies on informal terms
such as 'the same' and does not require annotation of a function. Therefore the
sentence should be dropped from the documentation.

If a function is marked with 'pure', then it may not reference mutable free
variables.
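
A minimal sketch of that rule, with names invented for this example:

```d
int counter;                // mutable module-level variable
immutable int offset = 42;  // immutable module-level variable

// Accepted: the only free variable referenced is immutable.
int addOffset(int x) pure
{
    return x + offset;
}

// Rejected by the compiler if uncommented, because a pure function
// may not read or write the mutable variable counter:
// int addCounter(int x) pure { return x + counter; }
```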

> > and f4 must document its valid inputs.
> ---
> /// Passing anything other than `false` is illegal.
> int g_state;
> void foo(bool neverTrue) pure {
>    if (neverTrue) g_state = 42;
> }
> ---
> 
> Should this be allowed to be pure? Well, if strlen is, then ostensibly yes, but

No, because it is trivial to devise an equivalent implementation that does not
require the compiler to read documentation comments:

int g_state;
void foo(bool neverTrue) pure in{assert(!neverTrue);} body { }

The same does not hold for 'strlen', therefore the analogy immediately breaks
down.

> isn't this too permissive of an interpretation, as the type system can't
> actually guarantee it? Shouldn't rather a cast to pure at the _call site_ be
> required if called with known good values, just as in other cases where the type
> system can't prove a certain invariant, but the programmer can?

The type system of an unsafe language cannot prove _any_ invariants, because
unsafe operations may result in undefined behavior. This does not imply that we
should drop the entire type system.

> Purity by convention works just fine without the pure keyword as well…

This is not only about purity by convention; it is about memory safety by
convention. In @safe code, all the concerns raised immediately disappear.
