pure functions without invariant (a get-rid-of-const proposal)

hasen hasan.aljudy at gmail.com
Sun Mar 8 04:38:09 PDT 2009


The line of thinking for introducing const/invariant seems to go like this:

- D needs to be suitable for parallelism
- let's introduce pure functions
- pure functions need to accept immutable data only
- let's introduce const!! (invariant)

OK, I can agree with most of this, except for the last part! It's not 
necessary that invariant is needed to introduce pure functions into D. 
There must be better ways.

First of all, one could argue, why? what's wrong with invariant?

Well, there are a number of things that are wrong with it:

First of all, immutable data is a concept, implementing it as 
"invariant" seems kind of hackish. By definition, immutable data doesn't 
change. Forcing invariant on top of a mutable type is like saying "This 
data would usually change, but please, for this instance, make it not 
un-changeable!"
This is a HACK. Immutable types should have immutability be a 
fundamental part of the type, not some kind of a glue on top of it; a 
flag that's set on or off!

For instance, it doesn't make sense to have an invariant pointer to 
invariant pointer to invariant data. If the data is invariant, then why 
access it through pointers at all?

When data is immutable, then all you need is the data! You shouldn't 
need to know where is it stored.

Invariant pointers just don't make sense!

Another thing to consider. When you have an object (in the oop sense of 
the term), and you mark it as immutable, what's the point? The whole 
point of objects is to encapsulate state that could change. If the 
object's state doesn't change at all, then it becomes much less 
interesting as an object. All you need then is the data in that object, 
and not the object itself.

Invariant (oop) objects don't make sense.

You're trying to glue completely different things together and hope that 
they stick.

I see this as similar to C++ trying to be "Object Oriented" without 
automatic memory management, and it just doesn't make sense! They put 
high level concepts in a low level language with no consideration to how 
people actually need to use OOP. (They can argue all day long about how 
they have inheritance and polymorphism!)

Similarly for D2, are we considering how people actually need to use 
immutable data in their code?

Wouldn't it make much more sense, to have immutable data-types (such as 
python tuples, and python strings), and remove the "invariant" concept 
all together?

Let's step back and look at what we're trying to achieve again, and 
think of a different way to implement it.

We want to have pure functions, but those have some restrictions!
A pure function's result can only depend on its input parameters (so 
cannot have side effects)
- doesn't read global state
- doesn't perform I/O operations
- doesn't change input parameters (if they are passed by reference, that is)

It's clear that we need immutable types, so, just introduce immutable 
types, but would they be like?
I suggest looking at python.

Consider for instance, python strings (let's call them python-style 
strings from here on):
     -immutable type
     -no `a[b] = c` operation defined on string a
     -`a + b` returns a completely new string
     -not implemented in terms of an underlying `char` type

We can use it as an inspiration for the string type in D2 (excluding the 
"no underlying char type" part)

We can also extend the concept and introduce immutable lists
     -similar to python-style strings (see above)
     -python-style string can be implemented as an immutable list
         - while we're at it, string slice and index operations should 
operate on characters, not bytes!
             - I actually prefer that string is an actual type, not just 
an alias
     -the elements of the list must also be immutable.
         - immutable list to mutable data doesn't make sense (wouldn't 
be used or needed in practice)

Immutable lists don't need an `invariant` keyword, they can be defined 
by i[ ] instead of [ ] (where `i[` is a proper token, like `!(` and 
`r"`). This ties strongly to the idea that immutability is a fundamental 
part of the type, not just some kind of annotation.

The other immutable type we need is a clustering of data, similar to 
tuples in python.
I'm proposing here "immutable struct objects"
   - based on struct
   - internal state is initialized in constructor, never changed anywhere
   - have reference semantics (no value semantics at all, similar to 
python strings)
   - all operations simply return a new object
   - fields can only be of immutable types or native types with value 
semantics (no pointers)

for a practical example, consider a 3d vector:

istruct vector //istruce is an immutable struct type
{
     real x = 0; //ok, initialization behavior
     real z = 0;
     real y = 0;
     this( real px, real py, real pz )
     {
         //constructor can initialize fields
         x = px;
         y = px;
         z = pz;
     }

     real length() { return sqrt( x*x + y*y + z*z ); } //doesn't change 
any state (doesn't need to, anyway)

     vector normalized()
     {
         real len = length();
         return vector( x/len, y/len, z/len ); //returns a new object
         //no need for "new", as immutable structs have no value 
semantics, only reference semantics
         //so "new" is implicit
     }

     vecor opAdd( vector other )
     {
         return vector( x + other.x, y + other.y, z + other.z ); 
//returns a new object
     }

     //..etc..
}
//usage example
auto a = vector(1,2,3);
auto b = vector(5,4,3);
auto c = a.cross(b); //neither a nor b is changed

Why should you consider this proposal?

Well, for one thing, this is how things work in functional programming 
(AFAIK), so this approach sticks to the spirit of functional languages. 
What I mean is, this how data types are like! (I can only speak about 
Haskell here)

Also, This approach is more natural.
Immutable types don't have a state to change by their nature, where as 
if you annotate some type as invariant, you're saying that this object 
could potentially change, but shouldn't for this instance.
This creates confusion, consider when a type that's mutable by 
definition, suddenly needs to be passed to pure functions, and you're in 
for trouble! Welcome to the invariant virus!
If you need this object to be immutable, then why base it on a mutable 
type at all?

The suggested approach presented here removes much complication from the 
type system!
I mean, just what the hell is `invariant(char*)** p`??? why would anyone 
pass such a thing to a pure function?

There's an issue I want to brush on, that is:

Reading global state

Pure function cannot depend on global state, but probably one exception 
can be allowed:
     - if the global state is of a native or immutable type
     - if it's a constant
     - it can't be initialized by an impure function
         Consider if the variable is initialized to `now()`? If a pure 
function reads this variable,
         then it's getting another input, which varies depending on when 
the program was run.
      - corollary: pure functions cannot use:
         - global pointers
         - global (normal/oop) objects
Essentially, this global data is not a state at all, it's simply an 
alias for a value.
e.g. mathematical PI

Now, it's easier to define (and check) pure functions:

A pure function needs to satisfy the following:
- All parameters are either of native types passed by value, or of 
immutable type (passed by reference)
- doesn't read global state unless const (as stated above)
- doesn't perform IO

There are things that pure functions can do (I think, but someone 
correct me if I'm wrong please!)

Pure functions can create and modify local state (as long as it's local 
to the function)
This entails that pure function can call impure functions if:
     - the impure function doesn't read or change global state in an 
impure-way (again, as stated above)
     - the impure function doesn't perform I/O

Let's call such functions "semi-pure" (def: impure functions that are 
callable from within pure functions)


This applies to all kinds of expressions, as they can all be reduced to 
functions. For instance, newing objects can be reduced to calling their 
constructors, which are functions that can be checked for 
pureness/semi-pureness.


When the coder marks a function as "pure", the compiler must check that 
it actually is pure.
I think the compiler can do this, in a way similar to how it can detect 
which functions can have compile time function execution (CTFE)

Probably the procedure can be something like this:
   - functions can be marked (by the compiler) as one of four: pure, 
semi-pure, impure, unknown
   - At first, all functions are marked unknown
   - Three passes:
     Pass1: detect impure functions:
       - functions that read global state in an impure way
       - functions that perform I/O
       - functions that call other impure functions (I'm not sure how 
simple or hard this can be)
     Pass2: detect semi-pure functions:
       - unknown functions that accept parameters of mutable types
         - please note: only unknown functions are checked at this pass
     Pass3: check pure functions:
       - check functions marked by the programmer as "pure"
         - are they already marked impure or semi-pure? issue an error
         - do they call impure functions? issue an error
         - if they call semi-pure functions, it's ok.
       - if you're here, you passed the pureness test!

There are probably some quirks here and there about what constitutes a 
proper semi-pure function,
but I'm sure it can essentially be worked out, without resorting to 
`invariant`.



More information about the Digitalmars-d mailing list