pure functions without invariant (a get-rid-of-const proposal)
hasen
hasan.aljudy at gmail.com
Sun Mar 8 04:38:09 PDT 2009
The line of thinking for introducing const/invariant seems to go like this:
- D needs to be suitable for parallelism
- let's introduce pure functions
- pure functions need to accept immutable data only
- let's introduce const!! (invariant)
OK, I can agree with most of this, except for the last part! It's not
necessary that invariant is needed to introduce pure functions into D.
There must be better ways.
First of all, one could argue, why? what's wrong with invariant?
Well, there are a number of things that are wrong with it:
First of all, immutable data is a concept, implementing it as
"invariant" seems kind of hackish. By definition, immutable data doesn't
change. Forcing invariant on top of a mutable type is like saying "This
data would usually change, but please, for this instance, make it not
un-changeable!"
This is a HACK. Immutable types should have immutability be a
fundamental part of the type, not some kind of a glue on top of it; a
flag that's set on or off!
For instance, it doesn't make sense to have an invariant pointer to
invariant pointer to invariant data. If the data is invariant, then why
access it through pointers at all?
When data is immutable, then all you need is the data! You shouldn't
need to know where is it stored.
Invariant pointers just don't make sense!
Another thing to consider. When you have an object (in the oop sense of
the term), and you mark it as immutable, what's the point? The whole
point of objects is to encapsulate state that could change. If the
object's state doesn't change at all, then it becomes much less
interesting as an object. All you need then is the data in that object,
and not the object itself.
Invariant (oop) objects don't make sense.
You're trying to glue completely different things together and hope that
they stick.
I see this as similar to C++ trying to be "Object Oriented" without
automatic memory management, and it just doesn't make sense! They put
high level concepts in a low level language with no consideration to how
people actually need to use OOP. (They can argue all day long about how
they have inheritance and polymorphism!)
Similarly for D2, are we considering how people actually need to use
immutable data in their code?
Wouldn't it make much more sense, to have immutable data-types (such as
python tuples, and python strings), and remove the "invariant" concept
all together?
Let's step back and look at what we're trying to achieve again, and
think of a different way to implement it.
We want to have pure functions, but those have some restrictions!
A pure function's result can only depend on its input parameters (so
cannot have side effects)
- doesn't read global state
- doesn't perform I/O operations
- doesn't change input parameters (if they are passed by reference, that is)
It's clear that we need immutable types, so, just introduce immutable
types, but would they be like?
I suggest looking at python.
Consider for instance, python strings (let's call them python-style
strings from here on):
-immutable type
-no `a[b] = c` operation defined on string a
-`a + b` returns a completely new string
-not implemented in terms of an underlying `char` type
We can use it as an inspiration for the string type in D2 (excluding the
"no underlying char type" part)
We can also extend the concept and introduce immutable lists
-similar to python-style strings (see above)
-python-style string can be implemented as an immutable list
- while we're at it, string slice and index operations should
operate on characters, not bytes!
- I actually prefer that string is an actual type, not just
an alias
-the elements of the list must also be immutable.
- immutable list to mutable data doesn't make sense (wouldn't
be used or needed in practice)
Immutable lists don't need an `invariant` keyword, they can be defined
by i[ ] instead of [ ] (where `i[` is a proper token, like `!(` and
`r"`). This ties strongly to the idea that immutability is a fundamental
part of the type, not just some kind of annotation.
The other immutable type we need is a clustering of data, similar to
tuples in python.
I'm proposing here "immutable struct objects"
- based on struct
- internal state is initialized in constructor, never changed anywhere
- have reference semantics (no value semantics at all, similar to
python strings)
- all operations simply return a new object
- fields can only be of immutable types or native types with value
semantics (no pointers)
for a practical example, consider a 3d vector:
istruct vector //istruce is an immutable struct type
{
real x = 0; //ok, initialization behavior
real z = 0;
real y = 0;
this( real px, real py, real pz )
{
//constructor can initialize fields
x = px;
y = px;
z = pz;
}
real length() { return sqrt( x*x + y*y + z*z ); } //doesn't change
any state (doesn't need to, anyway)
vector normalized()
{
real len = length();
return vector( x/len, y/len, z/len ); //returns a new object
//no need for "new", as immutable structs have no value
semantics, only reference semantics
//so "new" is implicit
}
vecor opAdd( vector other )
{
return vector( x + other.x, y + other.y, z + other.z );
//returns a new object
}
//..etc..
}
//usage example
auto a = vector(1,2,3);
auto b = vector(5,4,3);
auto c = a.cross(b); //neither a nor b is changed
Why should you consider this proposal?
Well, for one thing, this is how things work in functional programming
(AFAIK), so this approach sticks to the spirit of functional languages.
What I mean is, this how data types are like! (I can only speak about
Haskell here)
Also, This approach is more natural.
Immutable types don't have a state to change by their nature, where as
if you annotate some type as invariant, you're saying that this object
could potentially change, but shouldn't for this instance.
This creates confusion, consider when a type that's mutable by
definition, suddenly needs to be passed to pure functions, and you're in
for trouble! Welcome to the invariant virus!
If you need this object to be immutable, then why base it on a mutable
type at all?
The suggested approach presented here removes much complication from the
type system!
I mean, just what the hell is `invariant(char*)** p`??? why would anyone
pass such a thing to a pure function?
There's an issue I want to brush on, that is:
Reading global state
Pure function cannot depend on global state, but probably one exception
can be allowed:
- if the global state is of a native or immutable type
- if it's a constant
- it can't be initialized by an impure function
Consider if the variable is initialized to `now()`? If a pure
function reads this variable,
then it's getting another input, which varies depending on when
the program was run.
- corollary: pure functions cannot use:
- global pointers
- global (normal/oop) objects
Essentially, this global data is not a state at all, it's simply an
alias for a value.
e.g. mathematical PI
Now, it's easier to define (and check) pure functions:
A pure function needs to satisfy the following:
- All parameters are either of native types passed by value, or of
immutable type (passed by reference)
- doesn't read global state unless const (as stated above)
- doesn't perform IO
There are things that pure functions can do (I think, but someone
correct me if I'm wrong please!)
Pure functions can create and modify local state (as long as it's local
to the function)
This entails that pure function can call impure functions if:
- the impure function doesn't read or change global state in an
impure-way (again, as stated above)
- the impure function doesn't perform I/O
Let's call such functions "semi-pure" (def: impure functions that are
callable from within pure functions)
This applies to all kinds of expressions, as they can all be reduced to
functions. For instance, newing objects can be reduced to calling their
constructors, which are functions that can be checked for
pureness/semi-pureness.
When the coder marks a function as "pure", the compiler must check that
it actually is pure.
I think the compiler can do this, in a way similar to how it can detect
which functions can have compile time function execution (CTFE)
Probably the procedure can be something like this:
- functions can be marked (by the compiler) as one of four: pure,
semi-pure, impure, unknown
- At first, all functions are marked unknown
- Three passes:
Pass1: detect impure functions:
- functions that read global state in an impure way
- functions that perform I/O
- functions that call other impure functions (I'm not sure how
simple or hard this can be)
Pass2: detect semi-pure functions:
- unknown functions that accept parameters of mutable types
- please note: only unknown functions are checked at this pass
Pass3: check pure functions:
- check functions marked by the programmer as "pure"
- are they already marked impure or semi-pure? issue an error
- do they call impure functions? issue an error
- if they call semi-pure functions, it's ok.
- if you're here, you passed the pureness test!
There are probably some quirks here and there about what constitutes a
proper semi-pure function,
but I'm sure it can essentially be worked out, without resorting to
`invariant`.
More information about the Digitalmars-d
mailing list