Do we really need const?

Tue Sep 18 00:21:31 PDT 2007

On 9/18/07, Walter Bright <newshound1 at digitalmars.com> wrote:
> Robert Fraser wrote:
> > I'd like to pose a question to those who have used C++'s const: do
> > you feel that it has saved more time by preventing bugs than it has
> > taken by being forced to type it all the time, and the time spent
> > when it has to be removed all throughout a hierarchy, as inevitably
> > has to happen at least once? That is, const-correctness is a time
> > investment, so do you feel that investment has paid off for you?
>
> At the upcoming http://www.astoriaseminar.com, I think I'll do some
> asking around on this issue. There'll be a lot of C++ diehards there.

The issues in C++ aren't necessarily the same as the issues in D,
however. Perhaps the most significant thing is that in C++, all
classes are passed by value by default. That means that function
parameters are essentially const by default, because the function gets
its own copy of the object rather than referencing the original. To
pass by pointer or reference, you must explicitly code it to do that,
and only then does const start to creep in. This can lead to some very
awkward coding practices, e.g.

void f(std::string const & s);

...which basically does exactly the same job as

void f(std::string s);

Obviously you don't need "const" in the latter case because you're
passing by value, but as soon as you start passing by reference (for
efficiency - it avoids unnecessary copy constructor and destructor
calls) you suddenly start to need const.
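
To make the copy cost concrete, here is a minimal C++ sketch that counts copy-constructor calls for the two signatures. Tracked, by_value, by_const_ref and demo_copy_counts are hypothetical names made up for this illustration; they are not from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper type: counts how often it is copy-constructed.
struct Tracked {
    static int copies;
    std::string text;
    explicit Tracked(std::string t) : text(std::move(t)) {}
    Tracked(const Tracked& other) : text(other.text) { ++copies; }
};
int Tracked::copies = 0;

// By value: the callee works on its own copy and may modify it freely.
std::size_t by_value(Tracked t) {
    t.text += "!";   // changes only the local copy
    return t.text.size();
}

// By const reference: no copy is made, and the callee may not modify t.
std::size_t by_const_ref(const Tracked& t) {
    // t.text += "!";   // would not compile: t is const
    return t.text.size();
}

// Call this from main() to check the copy counts for each signature.
void demo_copy_counts() {
    Tracked original("hello");

    Tracked::copies = 0;
    by_value(original);
    assert(Tracked::copies == 1);   // one copy made for the parameter

    Tracked::copies = 0;
    by_const_ref(original);
    assert(Tracked::copies == 0);   // no copy at all

    assert(original.text == "hello");   // the caller's object is untouched
}
```

Both functions leave the caller's object alone; the only difference the caller can observe is the extra copy.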

In D, things are a bit different, because (as in Java) classes are
passed by reference, and that means that the const keyword is going to
be needed a lot more.

An alternative approach might be to have reference types passed to
functions as const by default, requiring the function author to
explicitly state (by means of the ref keyword) that the object is
mutable. This would mean that classes would then have exactly the same
semantics as structs. e.g.

struct S;
class C;
void f(S s); /* f gets a copy of s, so cannot modify the original */
void f(ref S s); /* s passed by reference, so f can modify it */
void f(C c); /* c passed by const reference, so f cannot modify it */
void f(ref C c); /* c passed by reference-to-mutable, so f can modify it */

Of course, this brings us back to the head/tail const distinction. In
the above examples, we are only concerned with the constness of the
object's members. In the fourth example, we arrive at a situation
which /can never happen in C++/, because in C++, references are
/always/ head-const. That is:

void f(C & c)
{
    c = new C(); /* ERROR - c is a reference */
}

will not compile, because even though c was not declared as const, it
is nonetheless head-const /because it is a reference/.
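
A small sketch of what head-const means for a C++ reference, assuming a trivial struct C with one int member. The function name assignment_reseats is hypothetical. Assigning through the reference overwrites the caller's object; as far as I know there is no syntax at all for making the reference refer to a different object.

```cpp
#include <cassert>

struct C { int x; };

// Assigns through the reference and reports whether the reference now
// refers to a different object. It never does.
bool assignment_reseats(C& c, const C& other) {
    const C* before = &c;
    c = other;               // copies other INTO the object c refers to
    return &c != before;     // always false: c is still bound to the same object
}

// Call this from main() to check the behaviour.
void demo_reference_binding() {
    C a{1};
    C b{2};
    assert(!assignment_reseats(a, b));
    assert(a.x == 2);        // the caller's object was overwritten
    assert(b.x == 2);        // the source object is unchanged
}
```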

This leads me to my second thought (I have more...), which is the
notion that all function parameters should be head-const, not just by
default, but absolutely. This would support all of the preceding
argument, but it would also mean that, for example

void f(int n)
{
    ++n; /* Error */
}

would no longer compile. That's not the end of the world, because n is
local anyway. The only change the programmer would need to make to their
code is to make a local copy of n, like this:

void f(int n0)
{
    int n = n0; /* local copy - may modify */
    ++n; /* OK */
}
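
For comparison, C++ already lets you opt in to this per parameter with a top-level const. The following is only a sketch of the local-copy idiom in today's C++, not the proposed D behaviour; the name increment_copy is made up.

```cpp
#include <cassert>

// n0 is head-const: the parameter itself cannot be changed in the body.
int increment_copy(const int n0) {
    // ++n0;        // would not compile: n0 is const
    int n = n0;     // local copy - may modify
    ++n;            // OK
    return n;
}

// Call this from main() to check that the caller's variable is unaffected.
void demo_head_const_parameter() {
    int value = 5;
    assert(increment_copy(value) == 6);
    assert(value == 5);
}
```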

which brings me to my third and final observation, which is that this
scheme needs one final "fix" before it becomes usable, because, as
I've described it above, structs would be passed by value, and yet the
function would not be able to modify them. And obviously that's bad.
So here's the final trick to make it all hunky dory.

For value types, such as structs or ints, we (that is, the compiler)
divide them into two categories: Category A consists of all primitive
types, and all structs which are less than some threshold size (say,
16 bytes). Category B consists of all remaining structs. In summary,
category = (T.sizeof < 16) ? "A" : "B".

Category A objects are passed by value.

Category B objects are passed by reference.
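
The classification can be approximated in C++ at compile time. This is a rough sketch under my own assumptions: kThreshold, PassByValue and Param are hypothetical names, "primitive" is approximated by std::is_arithmetic, and 16 bytes is just the illustrative threshold from above. In the actual proposal the compiler would do this silently.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

constexpr std::size_t kThreshold = 16;   // illustrative threshold only

// Category A (pass by value): primitives and structs smaller than the threshold.
// Category B (pass by reference): everything else.
template <typename T>
struct PassByValue
    : std::integral_constant<bool,
          std::is_arithmetic<T>::value || (sizeof(T) < kThreshold)> {};

// The parameter type the compiler would pick for a read-only argument.
template <typename T>
using Param = typename std::conditional<PassByValue<T>::value,
                                        const T, const T&>::type;

struct SmallStruct { int x; int y; };
struct BigStruct   { int x; int y[100]; };

static_assert(std::is_same<Param<SmallStruct>, const SmallStruct>::value,
              "small structs are passed by value");
static_assert(std::is_same<Param<BigStruct>, const BigStruct&>::value,
              "big structs are passed by const reference");

// Both functions read the same way; only the hidden passing mode differs.
int get_x_small(Param<SmallStruct> s) { return s.x; }
int get_x_big(Param<BigStruct> s)     { return s.x; }

// Call this from main() to check the classification at run time as well.
void demo_categories() {
    assert(PassByValue<int>::value);
    assert(PassByValue<SmallStruct>::value);
    assert(!PassByValue<BigStruct>::value);
}
```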

Some examples would help explain:

---------------------------

struct SmallStruct
{
    int x;
    int y;
}

SmallStruct s;
f(s);

void f(SmallStruct s) /* s is passed by value and is head-const */
{
    s.x = 3; /* Error - s is head-const */
    SmallStruct s2 = s;
    s2.x = 3; /* OK */
}

void g(ref SmallStruct s) /* s is passed by reference and is head-const */
{
    s.x = 3; /* OK */
}

---------------------------

struct BigStruct
{
    int x;
    int[100] y;
}

BigStruct s;
f(s);

void f(BigStruct s) /* s is passed by reference and is head-const and
tail-const */
{
    s.x = 3; /* Error - s is tail-const */
    BigStruct s2 = s;
    s2.x = 3; /* OK */
}

void g(ref BigStruct s) /* s is passed by reference and is head-const */
{
    s.x = 3; /* OK */
}

---------------------------

class MyClass
{
    int x;
    int y;
}

MyClass s;
f(s);

void f(MyClass s) /* s is already a reference, which is passed by copy,
but we consider it head-const and tail-const */
{
    s = new MyClass; /* Error - s is head-const */
    s.x = 3; /* Error - s is tail-const */
    MyClass s2 = s.dup;
    s2.x = 3; /* OK */
}

void g(ref MyClass s) /* s is already a reference, which is passed by
copy, but we consider it head-const */
{
    s = new MyClass; /* Error - s is head-const */
    s.x = 3; /* OK */
}

---------------------------

The important thing to observe in these examples is that everything
works the same - the semantics are identical at both the caller site
and the callee site.

The second important thing to observe is that the word "const" is
completely absent from these examples. If you have const-by-default,
you don't need it. Instead, you work around head-constness by making a
local copy, and you override tail-constness by using the "ref"
keyword.
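
For readers who think in C++, here is roughly what the two MyClass signatures would correspond to today. This is only an analogy under my own assumptions: MyClass is written as a plain struct, and the proposed f(MyClass s) and g(ref MyClass s) are mapped onto const MyClass& and MyClass& respectively.

```cpp
#include <cassert>

struct MyClass { int x; int y; };

// Rough analogue of the proposed f(MyClass s): read-only view of the caller's object.
int f(const MyClass& s) {
    // s.x = 3;        // would not compile: s is const here
    MyClass s2 = s;    // local copy works around the constness
    s2.x = 3;          // OK: only the copy changes
    return s2.x + s.x;
}

// Rough analogue of the proposed g(ref MyClass s): the callee may modify the original.
void g(MyClass& s) {
    s.x = 3;           // OK: visible to the caller
}

// Call this from main() to check which call affects the caller's object.
void demo_const_by_default() {
    MyClass m{1, 2};
    assert(f(m) == 4);   // 3 from the copy plus 1 from the original
    assert(m.x == 1);    // f did not touch the original
    g(m);
    assert(m.x == 3);    // g did
}
```

The difference from the proposal is that C++ makes you spell out the const in f, whereas under const-by-default only g would need an annotation.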

Plus - you get built-in efficiency for passing large structs to functions.

I believe that this will help programmers to write code quickly
without having to remember to write "const" all over the place. If
they need to modify the original, the compiler will remind them to
throw in a "ref" keyword to make it explicit. Everyone wins. It's easy
to write code, and the compiler gets to do its checking.