Auto-invariance (WAS: const )

Fri Mar 28 12:58:55 PDT 2008

== Quote from Benji Smith (benji at benjismith.net)'s article
> Sean Kelly wrote:
> > I think the need for const vs. invariant in D 2.0 is that the compiler isn't
> > smart enough to know when the referenced data should actually never
> > change, and because a bunch of interesting optimizations are available
> > when the data really doesn't ever change, we have a keyword to hint to
> > the compiler that this is true.
> If the compiler can't actually detect mutation of const values, then how
> will it enforce const correctness? Is the 'const' keyword just a fancy
> compiler-enforced form of documentation?

The compiler can locally, but what if it doesn't have access to the entire
program source when compiling a module?  For example, let's say I create
a library function like so:

    void doSomething( const(char)[] buf, void delegate() dg )
    {
        writefln( buf ); // A
        dg();
        writefln( buf ); // B
    }

If the compiler knows that the data referenced by buf will not change for
the entire length of the call to doSomething() then it can cache information
about buf in registers, etc.  However, doSomething() contains an opaque
call to external code, which could in theory modify the contents of buf.
Consider:

    char[] inp = "hello world";
    doSomething( inp, () { inp[0] = 'j'; } );

So because there is an opaque call inside doSomething(), the compiler must
assume that the contents of buf may change across that call, and therefore
may only assume that buf is invariant from the beginning of doSomething()
up to the call to dg(), then from the call of dg() to the end of doSomething().

What I don't like about 'invariant' is that while the label is in the correct or
at least the easiest place for the compiler, its location means code duplication
for the programmer if she wants to have her function work for both constant
and non-constant data.  This is fine for passing shared data between threads:

class MyClass
{
    private string m_s;
    void setString( string s ) { m_s = s; }
    void setString( char[] s ) { m_s = s.idup; }
}

as it avoids the need to perform a copy, or more generally, to acquire a mutex.
However, I think the far more common case will be that library programmers
will want to create algorithms that work on any kind of string and have the
compiler simply optimize the code ideally for each situation:

size_t find(T)( in T[] buf, T pat )
{
    foreach( i, e; buf )
        if( e == pat )
            return i;
    return buf.length;
}

Here, the need to create separate, identical implementations for the same
algorithm that vary only by the constancy of buf simply so the compiler can
optimize differently for each aspect is horrible.  I would much rather have
the compiler invisibly generate the different permutations for me and do
something fancy with the name mangling to sort out all out invisibly.  So:

"abc".find( 'b' ); // calls the "invariant(char)[]" permutation
char[] buf; buf.find( 'b' ); // calls the "const(char)[]" permutation

Of course, this means that the compiler will silently be generating more
code but so what?  A decent linker could throw out the unused routines
anyway.  I haven't given this idea much thought, but on the surface it seems
like it could do away with the need for 'invariant' almost entirely.

Sean