Proposal for scoped const contracts

Mon Mar 24 09:58:02 PDT 2008

This idea has come from the discussion on the const debacle thread.

It is basically an idea for scoped const.  The main goal is to let one 
specify that a function does not modify an argument, without affecting the 
constness of the input.

The main problem to solve is this: I have a function that takes an argument 
and returns a subset of that argument.  The easiest function to help explain 
the problem is strchr.  Please please do NOT tell me that my design is 
fundamentally unsound because you can return a range or pair, and then slice 
the original arg based on that pair.  There are other examples that cannot 
be solved this way; this is just the easiest one to explain with.  Everyone 
who uses C should know about strchr:

char *strchr(char const *source, char const *pattern);

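The result of strchr is meant to be a pointer into source where pattern 
exists.  (Strictly speaking, the C function with this signature is strstr; 
strchr takes a single character.  The const problem is the same for both.)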

Note that this is not even close to const-correct in C, because if you pass 
in a const source, the const is inherently cast away.

So let's move to the D version, which I'll specify with const to begin with:

const(char)[] strchr(const(char)[] source, const(char)[] pattern);

Note that const(char)[] MUST be the return type, because otherwise we 
cannot return a slice into source.  So far so good, but now, if I am using 
strchr to search for a pattern in a mutable string, and then I want to 
MODIFY the original string, I must cast away const, because the return value 
is const.  OK, so you might say let's add an overload (or templatize 
strchr):

char[] strchr(char[] source, const(char)[] pattern);

This compiles and works, but the signature no longer says that source will 
not be modified.  Therefore, the compiler is not able to take advantage of 
optimizations, and the caller has no guarantee that the source array will 
be untouched.
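
To make the trade-off concrete, here is a minimal sketch of the const-only 
version in use.  The function body is only an illustrative substring search 
(nothing above specifies one), and the variable names in main are made up 
for the example:

```d
// Const-only version: promises not to modify source, but the result is
// const as well.  The search loop is an assumed, illustrative implementation.
const(char)[] strchr(const(char)[] source, const(char)[] pattern)
{
    for (size_t i = 0; i + pattern.length <= source.length; i++)
    {
        if (source[i .. i + pattern.length] == pattern)
            return source[i .. $];   // slice into source, starting at the match
    }
    return null;                     // no match
}

void main()
{
    char[] text = "hello world".dup;            // mutable at the call site
    const(char)[] hit = strchr(text, "world");
    // hit[0] = 'W';                            // rejected: hit is const
    char[] writable = cast(char[]) hit;         // the cast we want to avoid
    writable[0] = 'W';
    assert(text == "hello World");              // text was modified through the slice
}
```

The cast is safe here only because the caller happens to know that text was 
mutable to begin with; nothing in the signature lets the compiler check that.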

So, how do we specify this?  I propose that a keyword be used to specify 
"scoped const", which basically means "this variable is const within this 
function, but reverts to its original const-ness when returned".  Let's call 
it foo (as a generic name for now):

foo(char)[] strchr(foo(char)[] source, const(char)[] pattern);

Note that foo only specifies source and not pattern because we are not 
returning anything from pattern, so it can be fully const.

What does this mean?  foo(char)[] source is not modifiable within strchr, 
but is implicitly castable to the type of the argument at the call site.  So 
if we call strchr with a char[], foo(char)[] is essentially an alias to 
const(char)[] while inside strchr, but upon return is implicitly castable 
back to char[].  This does not violate any const contracts because the 
argument was mutable to begin with.  If we call strchr with a const(char)[], 
foo(char)[] cannot be implicitly cast to char[] because the call site 
version was not mutable, and implicitly removing const would violate const 
rules.  These rules can easily be checked by the compiler at the call site, 
and so the function source does not need to be available.
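
The call-site rule can be sketched as follows.  This is not compilable: foo 
is only the placeholder keyword used in this post, and caller and its 
parameter names are hypothetical:

```d
// Hypothetical syntax: 'foo' is the placeholder keyword from this post,
// so this does not compile with any existing compiler.
foo(char)[] strchr(foo(char)[] source, const(char)[] pattern);

void caller(char[] m, const(char)[] c, invariant(char)[] i)
{
    char[]            r1 = strchr(m, "x");   // OK: argument was mutable
    const(char)[]     r2 = strchr(c, "x");   // OK: argument was const
    invariant(char)[] r3 = strchr(i, "x");   // OK: argument was invariant
    const(char)[]     r4 = strchr(m, "x");   // OK: adding const is always allowed
    // char[]         r5 = strchr(c, "x");   // rejected: would remove const
}
```

Every one of these checks uses only the declaration of strchr and the types 
at the call, which is why the function body does not need to be visible.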

So why must we have a keyword specification?  Because of the expressive 
nature of const types, you must be able to match exactly where the const 
comes into play.  For example, const(char)* is different than const(char*), 
and so foo must be just as expressive.  And in addition, the type returned 
may not be exactly the parameter passed in, but the const-ness must be 
upheld.

For example, what if the argument was a class, and the return type was 
unrelated:

foo(membertype) getMember(foo(classtype) ct) { return ct.member; }

Note that if member is a function, it must also be foo, or else the contract 
could be violated.

You should be able to declare intermediate variables of type foo(x):

foo(membertype) m = ct.member;

What if there are multiple arguments, and the result may come from any of 
them:

foo(T) min(T)(foo(T) val1, foo(T) val2);

What if one calls min with one mutable and one invariant argument?  The 
answer is that foo should map to the least common denominator.  If all of 
the foo arguments have the same constness (all invariant, all const, or all 
mutable), then the resulting foo is that same constness.  If any of them 
differ, the resulting foo must be const to uphold const-correctness.

In any case, val1 and val2 are const for the body of the function.
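
As a sketch of how that resolution would look at the call site (again not 
compilable, since foo is a placeholder; Node is a hypothetical class assumed 
to be comparable, and caller is made up for the example):

```d
// Hypothetical syntax: 'foo' is the placeholder keyword from this post.
foo(T) min(T)(foo(T) val1, foo(T) val2);

class Node { int key; }   // hypothetical type, assumed to support comparison

void caller(Node m1, Node m2, const(Node) c1,
            invariant(Node) i1, invariant(Node) i2)
{
    Node            a = min!(Node)(m1, m2);   // all mutable   -> foo is mutable
    invariant(Node) b = min!(Node)(i1, i2);   // all invariant -> foo is invariant
    const(Node)     c = min!(Node)(m1, i1);   // mixed         -> foo is const
    const(Node)     d = min!(Node)(m1, c1);   // mixed         -> foo is const
    // Node         e = min!(Node)(m1, i1);   // rejected: the result might be i1
}
```

All five calls would use the same compiled min; only the type given to the 
result differs.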

There are other benefits.  For example, to implement a min function that 
allows a mutable return for mutable arguments, you must define min as a 
template, which can generate up to 6 variations (for all the different 
argument const types), but with the foo notation, the function generated is 
always identical.  The only check for const-correctness is at the call site.

Note that this idea is very similar to Janice's idea of:

K(T) f(const K, T)(K(T) t);

The differences are:
  - This idea does not require a different template instantiation for 
identical code, and in fact is not a template, so it does not require source 
or generate bloat.
  - This idea ensures that the argument remains const inside the function 
even if the argument at the call site is mutable.  It enforces the contract 
that the caller is making that the argument will never be modified inside 
the function.

------------------- PROPOSAL FOR KEYWORD -------------------

That is my general proposal for scoped const, and as an orthogonal 
suggestion, which should by no means take away from my above proposal, I 
suggest we use the argument keywords 'in' and 'out' to specify foo:

out(char)[] strchr(in(char)[] source, const(char)[] pattern);

So arguments are implicitly castable to 'in', no matter if they are mutable, 
const, or invariant.
'in' types are implicitly castable to 'out' types.
'in' arguments cannot be modified inside the function (i.e. they are 
essentially const, but with the additional specification that they can be 
cast to 'out').
'out' is an alias for the constness at the call site defined by the 
following rules:
    -  if all of the 'in' parameters are of one constancy, (i.e. all are 
mutable, all are invariant, or all are const), then out is defined to be the 
same constancy.
    -  if there are two different constancy values for 'in', then 'out' is 
defined to be const.
These type declarations are made at the call site, not inside the function. 
The function is compiled the same for all versions of 'in' and 'out'.
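
Seen from inside the function, the rules above would look roughly like this. 
It is a sketch only: 'in(...)' and 'out(...)' as type constructors exist 
only in this proposal, the search loop is an assumed implementation, and 
caller is a hypothetical example:

```d
// Not valid D: 'in(...)' and 'out(...)' are the proposed spellings of foo.
out(char)[] strchr(in(char)[] source, const(char)[] pattern)
{
    // source[0] = 'x';              // rejected: 'in' data is read-only here
    for (size_t i = 0; i + pattern.length <= source.length; i++)
    {
        if (source[i .. i + pattern.length] == pattern)
            return source[i .. $];   // 'in' converts implicitly to 'out'
    }
    return null;
}

void caller(char[] buf)
{
    // One 'in' parameter, mutable at this call, so 'out' means mutable.
    char[] hit = strchr(buf, "b");
    if (hit.length > 0)
        hit[0] = 'B';                // no cast needed
}
```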

And for functions that are members of a class:

in out(T) func() {...} // essentially, in(this)
or
out(T) func() in {...}

Rationale:  I think in and out are pretty much defunct keywords in this 
context (out replaced by ref, in replaced by const), and so are fair game 
for this syntax.  They are also very good English descriptions of what I am 
trying to do.

-Steve