Interesting Research Paper on Constructors in OO Languages

Mon Jul 15 15:27:40 PDT 2013

On Mon, Jul 15, 2013 at 09:06:38PM +0200, Meta wrote:
> I saw an interesting post on Hacker News about constructors in OO
> languages. Apparently they are a real stumbling block for some
> programmers, which was quite a surprise to me. I think this might be
> relevant to a discussion about named parameters and whether we
> should ditch constructors for another kind of construct.
> 
> Link to the newsgroup post, the link to the paper is near the top:
> http://erlang.org/pipermail/erlang-questions/2012-March/065519.html

Thanks for the link; this touches on one of my pet peeves about OO
libraries: constructors.

I consider myself to be a "systematic" programmer (according to the
definition in the paper); I can work equally well with ctors with
arguments vs. create-set-call objects. But I find that mandatory ctors
with arguments are a pain to work with, *both* to write and to use.

On the usability side, there's the mental workload of having to remember
which order the arguments appear in (or look it up in the IDE, or
whatever -- the point is that I can't just type the ctor call straight
from my head). Then there's the problem of needing to create objects
required by the ctor before you can call the ctor. In some cases, this
can be inconvenient -- I always have to remember to set up and create
other objects before I can create this one, because its ctor requires
said objects as arguments. Then there's the lack of flexibility: no
matter what you do, it seems that anything that requires more than a
single ctor argument inevitably becomes either (1) too complex,
requiring too many arguments, and therefore very difficult to use, or
(2) too simplistic, and therefore unable to do some things that I may
want to do (e.g. some fields are default-initialized with no way to
specify the initial values of the fields, 'cos otherwise the ctor would
have too many arguments). No matter what you do, it seems almost
impossible to come up with an ideal ctor except in trivial cases where
it requires only 1 argument or is a default ctor.

On the writability side, one of my pet peeves is base class ctors that
require multiple arguments. Every level of inheritance inevitably adds
more arguments each time, and by the time you're 5-6 levels down the
class hierarchy, your ctor calls just have an unmanageable number of
parameters. Not to mention the violation of DRY by requiring much
redundant typing just to pass arguments from the derived class's ctor
up the class hierarchy. Tons of bugs to be had everywhere, given the
amount of repeated typing needed.

In the simplest cases, of course, these aren't big issues, but this kind
of ctor design is clearly not scalable.
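
To make the forwarding problem concrete, here is a minimal sketch of a three-level hierarchy with positional ctor arguments. The class and member names (Widget, Button, ToggleButton) are hypothetical, chosen only for illustration; they don't come from the paper or from any real library.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical hierarchy; every name here is illustrative.
class Widget {
public:
    Widget(int x, int y) : x_(x), y_(y) {}
    int x() const { return x_; }
    int y() const { return y_; }
private:
    int x_, y_;
};

class Button : public Widget {
public:
    // Has to repeat Widget's parameters just to forward them.
    Button(int x, int y, std::string label)
        : Widget(x, y), label_(std::move(label)) {}
    const std::string& label() const { return label_; }
private:
    std::string label_;
};

class ToggleButton : public Button {
public:
    // Repeats Widget's and Button's parameters yet again. Accidentally
    // writing Button(y, x, ...) here would still compile.
    ToggleButton(int x, int y, std::string label, bool on)
        : Button(x, y, std::move(label)), on_(on) {}
    bool on() const { return on_; }
private:
    bool on_;
};
```

Adding one parameter to Widget's ctor in this sketch would mean touching both derived ctors as well, and every further level of inheritance makes that worse.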

OTOH, the create-set-call pattern isn't a panacea either. One of the
biggest problems with this pattern is that you can't guarantee your
objects are in a consistent state at all times. This is very bad,
because all your methods will have to check if some value has been set
yet before using it. This adds a lot of complexity that could've been
avoided had everything been set at ctor-time. This also makes class
invariants needlessly complex. Moreover, I've seen many classes in this
category exhibit undefined behaviour if you call a value-setting method
after you start using the object. Too many classes falsely assume that
you will always call set methods and then "use" methods in that order.
If you call a set method after calling a "use" method, you're quite
likely to run into bugs in the class, e.g. part of the object's state
doesn't reflect the new value you set, because the "use" methods were
written with the assumption that when they were called the first time,
the values you set earlier won't change thereafter.
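
A small sketch of the set-after-use failure mode described above, assuming a hypothetical Rect class whose "use" method caches its result on the first call. The names are made up for the example.

```cpp
#include <cassert>

// Hypothetical create-set-call class; names are illustrative.
class Rect {
public:
    void setWidth(int w)  { width_ = w; }
    void setHeight(int h) { height_ = h; }

    // "Use" method: computes the area once and caches it, silently
    // assuming that no setter is called afterwards.
    int area() {
        if (!areaCached_) {
            area_ = width_ * height_;
            areaCached_ = true;
        }
        return area_;
    }

private:
    int width_ = 0, height_ = 0;
    int area_ = 0;
    bool areaCached_ = false;
};

// Returns the area reported after a setter is called post-use.
int staleAreaDemo() {
    Rect r;
    r.setWidth(2);
    r.setHeight(3);
    r.area();         // first use: caches 2 * 3
    r.setWidth(10);   // set after use: the cache is not invalidated
    return r.area();  // returns the cached value, not 10 * 3
}
```

Here staleAreaDemo() returns 6 rather than 30, because the setter never invalidates the cache. The fix is easy in a toy like this, but the same assumption tends to be scattered across many methods in real classes.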

I've always found Perl's approach a more balanced way to tackle this
problem (even though Perl's OO system as a whole suffers from other,
shall we say, idiosyncrasies). In Perl, objects start out as arbitrary
key-value pairs, and nothing differentiates them from a regular AA until
you call the 'bless' built-in function on them, at which point they
become "officially" a member of some particular class. This neatly
sidesteps the whole ctor mess: you can initialize the initial AA with
whatever values you want, in whatever order you want. When you've finally
"kicked it into shape", as the cited paper puts it, you "promote" that
set of key-value pairs into an "official" member of the class, and
thereafter, you can't simply modify fields anymore except through class
methods. This means you now have the possibility of enforcing invariants
on the object without crippling the flexibility of constructing it.
(Well, OK, in Perl, this last bit isn't necessarily true, but in an
ideal implementation of this initialize-bless-use approach, the object's
fields would become non-public after being blessed and can only be
updated by "official" object methods.)

In the spirit of this approach, I've written some C++ code in the past
that looked something like this:

	class BaseClass {
	public:
		// Encapsulate ctor arguments
		struct Args {
			int baseparm1, baseparm2;
		};
		BaseClass(Args args) {
			// initialize object based on fields in
			// BaseClass::Args.
		}
	};

	class MyClass : public BaseClass {
	public:
		// Encapsulate ctor arguments
		struct Args : BaseClass::Args {
			int parm1, parm2;
		};

		MyClass(Args args) : BaseClass(args) {
			// initialize object based on fields in args
		}
	};

Basically, the Args structs let the user set up whatever values they
want to, in whatever order they wish, then they are "blessed" into real
class instances by the ctor. Encapsulating ctor arguments in these
structs alleviates the problem of proliferating ctor arguments as the
class hierarchy grows: each derived class simply hands off the Args
struct (which is itself in a hierarchy that parallels that of the
classes) to the base class ctor. All ctors in the class hierarchy need
only a single (polymorphic) argument.

This approach also localizes the changes required when you modify base
class arguments -- in the old way of having multiple ctor arguments,
adding or changing arguments to the base class ctor requires you to
update every single derived class ctor accordingly -- very bad. But
here, adding a new field to BaseClass::Args requires zero changes to all
derived classes, which is a Good Thing(tm).
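
For anyone who wants to try the pattern, here is a self-contained variant of the sketch above with the ctor bodies filled in. The stored members (sum_, product_) and the makeExample() helper are hypothetical additions for illustration, and passing Args by const reference instead of by value is my own choice here, not part of the original code.

```cpp
#include <cassert>

class BaseClass {
public:
    // Encapsulated ctor arguments, with defaults (field meanings are made up).
    struct Args {
        int baseparm1 = 0;
        int baseparm2 = 0;
    };

    explicit BaseClass(const Args& args)
        : sum_(args.baseparm1 + args.baseparm2) {}

    int baseSum() const { return sum_; }

private:
    int sum_;
};

class MyClass : public BaseClass {
public:
    // Derived Args inherits all base fields.
    struct Args : BaseClass::Args {
        int parm1 = 0;
        int parm2 = 0;
    };

    // A MyClass::Args binds to const BaseClass::Args&, so the whole
    // struct is handed to the base ctor without naming any base field.
    explicit MyClass(const Args& args)
        : BaseClass(args), product_(args.parm1 * args.parm2) {}

    int product() const { return product_; }

private:
    int product_;
};

// Hypothetical helper: fields are set by name, in arbitrary order.
MyClass makeExample() {
    MyClass::Args args;
    args.parm2 = 5;
    args.baseparm1 = 1;
    args.parm1 = 4;
    args.baseparm2 = 2;
    return MyClass(args);
}
```

Adding, say, a baseparm3 field to BaseClass::Args in this sketch would require changes only inside BaseClass; MyClass's ctor would compile unchanged.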

In some cases, if the class is relatively simple, the private members of
the class can simply be themselves an instance of the Args struct, so
the ctor could be nothing more than just:

	MyClass(Args args) : BaseClass(args), myArgs(args) {}

which gets rid of that silly baroque dance of naming ctor arguments as
_a, _b, _c, then writing in the ctor body a=_a, b=_b, c=_c (which can be
rather error prone if you mistype a _ somewhere or forget to assign one
of the members). Since the private copy of Args is not accessible from
outside, class methods can use the values freely without having to worry
about inconsistent states -- the ctor can check class invariants before
creating the class object, ensuring that the internal copy of Args is in
a consistent state.
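
As a rough illustration of checking invariants at "bless" time, the following sketch keeps a private copy of Args and validates it before the object comes into existence. The Range class, its fields, and the choice of throwing std::invalid_argument are all assumptions made for the example.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class whose only state is a private copy of its Args.
class Range {
public:
    struct Args {
        int lo = 0;
        int hi = 10;
    };

    // The invariant (lo <= hi) is checked before args_ is initialized,
    // so no Range object ever exists in an inconsistent state.
    explicit Range(const Args& args) : args_(validated(args)) {}

    // Methods can rely on the invariant without re-checking it.
    int length() const { return args_.hi - args_.lo; }

private:
    static const Args& validated(const Args& a) {
        if (a.lo > a.hi)
            throw std::invalid_argument("Range: lo must not exceed hi");
        return a;
    }

    Args args_;  // private copy; callers cannot modify it afterwards
};

// Hypothetical helper: true if constructing a Range with these bounds fails.
bool rejects(int lo, int hi) {
    Range::Args a;
    a.lo = lo;
    a.hi = hi;
    try {
        Range r(a);
        (void)r;
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```

The caller's own Args instance stays mutable, but changing it after construction has no effect on the object, since the ctor took a copy.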

The Args structs themselves, of course, can have ctors that set up sane
default values for each field, so that lazy users can simply call:

	MyClass *obj = new MyClass(MyClass::Args());

and get a working, consistent class object with default settings. This
way of setting default values also lets the user only change fields that
they don't want to use default values for, rather than be constricted by
the order of ctor default arguments: if you're unlucky enough to need a
non-default value in a later parameter, you're forced to repeat the
default values for everything that comes before it.
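
The difference from default ctor arguments can be sketched like this, using a hypothetical Window class. The field names and default values are invented for the example, and default member initializers (C++11) stand in for an explicit Args ctor.

```cpp
#include <cassert>

// Hypothetical class; fields and defaults are illustrative only.
class Window {
public:
    struct Args {
        int width = 640;
        int height = 480;
        bool resizable = true;
        int borderPx = 1;
    };

    explicit Window(const Args& args) : args_(args) {}

    int outerWidth() const { return args_.width + 2 * args_.borderPx; }
    bool resizable() const { return args_.resizable; }

private:
    Args args_;
};

// Overrides only the last field. With positional default arguments the
// equivalent call would have to spell out Window(640, 480, true, 5).
Window makeThickBorderWindow() {
    Window::Args args;
    args.borderPx = 5;
    return Window(args);
}
```

If the default width later changes inside Args, makeThickBorderWindow() picks it up automatically, whereas a call site that repeated 640 positionally would not.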

In D, this approach isn't quite as nice, because D structs don't have
inheritance, so you can't simply pass Args from derived class to base
class. You'd have to explicitly do something like:

	class BaseClass {
	public:
		struct Args { ...  }
		this(Args args) { ... }
	}

	class MyClass : BaseClass {
	public:
		struct Args {
			BaseClass.Args base;	// <-- explicit inclusion of BaseClass.Args
			...
		}
		this(Args args) {
			super(args.base);	// <-- more verbose than just super(args);
			...
		}
	}

Initializing the args also isn't as nice, since user code will have to
know exactly which fields are in .base and which aren't. You can't just
write, like in C++:

	// C++
	MyClass::Args args;
	args.basefield1 = 123;
	args.field2 = 321;

you'd have to write, in D:

	// D
	MyClass.Args args;
	args.base.basefield1 = 123;
	args.field2 = 321;

which isn't as nice in terms of encapsulation, since ideally user code
shouldn't need to care about the exact boundaries between base class and
derived class.

I haven't really thought about how this might be made nicer in D,
though.

T

-- 
I am Ohm of Borg. Resistance is voltage over current.