Interesting Research Paper on Constructors in OO Languages

Mon Jul 15 18:54:27 PDT 2013

On Monday, 15 July 2013 at 22:29:14 UTC, H. S. Teoh wrote:
> I consider myself to be a "systematic" programmer (according to the
> definition in the paper); I can work equally well with ctors with
> arguments vs. create-set-call objects. But I find that mandatory ctors
> with arguments are a pain to work with, *both* to write and to use.

I also find constructors with multiple arguments a pain to use. 
They get difficult to maintain as your project grows. One of my 
pet projects has a very shallow class hierarchy, but the 
constructors of each object down the tree have many arguments, 
with descendants adding on even more. It gets to be a real 
headache when you have more than 3 constructors per class to deal 
with base class overloads, multiple arguments, etc.

> On the usability side, there's the mental workload of having to
> remember which order the arguments appear in (or look it up in the IDE,
> or whatever -- the point is that I can't just type the ctor call
> straight from my head). Then there's the problem of needing to create
> objects required by the ctor before you can call the ctor. In some
> cases, this can be inconvenient -- I always have to remember to set up
> and create other objects before I can create this one, because its ctor
> requires said objects as arguments. Then there's the lack of
> flexibility: no matter what you do, it seems that anything that
> requires more than a single ctor argument inevitably becomes either (1)
> too complex, requiring too many arguments, and therefore very difficult
> to use, or (2) too simplistic, and therefore unable to do some things
> that I may want to do (e.g. some fields are default-initialized with no
> way to specify their initial values, 'cos otherwise the ctor would have
> too many arguments). No matter what you do, it seems almost impossible
> to come up with an ideal ctor except in trivial cases where it requires
> only 1 argument or is a default ctor.

Having to create other objects to pass to a constructor is 
particularly painful. You'd better pray that they have trivial 
constructors, or else things can get hairy really fast. Multiple 
nested constructors can also create a large amount of code bloat. 
Once the argument list grows large enough, I generally put each
argument on its own line to make it clear what I'm calling it
with. This has the unfortunate side effect of making the call span
multiple lines. In my opinion, a constructor call spanning more
than 10 lines is an unsightly abomination.

> On the writability side, one of my pet peeves is base class ctors that
> require multiple arguments. Every level of inheritance inevitably adds
> more arguments, and by the time you're 5-6 levels down the class
> hierarchy, your ctor calls just have an unmanageable number of
> parameters. Not to mention the violation of DRY by requiring much
> redundant typing just to pass arguments from the derived class's ctor
> up the class hierarchy. Tons of bugs to be had everywhere, given the
> amount of repeated typing needed.
>
> In the simplest cases, of course, these aren't big issues, but this
> kind of ctor design is clearly not scalable.
>
> OTOH, the create-set-call pattern isn't a panacea either. One of the
> biggest problems with this pattern is that you can't guarantee your
> objects are in a consistent state at all times. This is very bad,
> because all your methods will have to check whether some value has
> been set yet before using it. This adds a lot of complexity that
> could've been avoided had everything been set at ctor-time. This also
> makes class invariants needlessly complex. Moreover, I've seen many
> classes in this category exhibit undefined behaviour if you call a
> value-setting method after you start using the object. Too many classes
> falsely assume that you will always call set methods and then "use"
> methods in that order. If you call a set method after calling a "use"
> method, you're quite likely to run into bugs in the class, e.g. part of
> the object's state doesn't reflect the new value you set, because the
> "use" methods were written with the assumption that when they were
> called the first time, the values you set earlier wouldn't change
> thereafter.
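
A rough C++ sketch of that set-after-use hazard. Greeter, setName() and
greet() are hypothetical names made up for this illustration, and
caching is just one of many ways such a class can go stale:

```cpp
#include <string>

// Hypothetical create-set-call class (all names made up for illustration).
// It silently assumes setName() is only ever called before greet().
class Greeter {
public:
    // "Set" method: callable at any time, in any order.
    void setName(const std::string& name) { name_ = name; }

    // "Use" method: builds the greeting on the first call and caches it,
    // assuming name_ never changes after that.
    const std::string& greet() {
        if (!cached_) {
            greeting_ = "Hello, " + name_;
            cached_ = true;
        }
        return greeting_;
    }

private:
    std::string name_;      // empty until setName() is called
    std::string greeting_;  // cached result of the first greet()
    bool cached_ = false;
};
```

Calling setName("Bob") after greet() has no visible effect, since greet()
keeps returning the cached "Hello, Alice"; and calling greet() before
setName() quietly yields "Hello, ", which is exactly the half-initialized
state a constructor would have ruled out.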

I've found that a good way to keep constructors manageable is to 
use the builder pattern. Create a builder object that has its 
fields set by the programmer, which is then passed to the 'real' 
object for construction. You can provide default arguments, 
optional arguments, etc. Combine this with a fluent interface and
I think it looks a lot better. Of course, this has the 
disadvantage of requiring a *lot* of boilerplate, but I think 
this could be okay in D, as a builder class is exactly the kind 
of thing that can be automatically generated.
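
A minimal C++ sketch of a builder with a fluent interface, for
illustration only: Window, Builder and its setters are hypothetical
names, and the default values are arbitrary:

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical class with several optional settings (names made up).
class Window {
public:
    class Builder;  // fluent builder, defined below

    const std::string& title() const { return title_; }
    int width() const { return width_; }
    int height() const { return height_; }
    bool resizable() const { return resizable_; }

private:
    // Private: only the nested Builder (which has access) creates Windows.
    Window(std::string title, int width, int height, bool resizable)
        : title_(std::move(title)), width_(width), height_(height),
          resizable_(resizable) {}

    std::string title_;
    int width_;
    int height_;
    bool resizable_;
};

class Window::Builder {
public:
    // Each setter returns *this so calls can be chained (fluent interface).
    Builder& title(std::string t) { title_ = std::move(t); return *this; }
    Builder& size(int w, int h) { width_ = w; height_ = h; return *this; }
    Builder& resizable(bool r) { resizable_ = r; return *this; }

    // The real object is created only here, once every field has its
    // final value, so validation happens in one place.
    Window build() const {
        if (width_ <= 0 || height_ <= 0)
            throw std::invalid_argument("window size must be positive");
        return Window(title_, width_, height_, resizable_);
    }

private:
    // Defaults for anything the caller doesn't set explicitly.
    std::string title_ = "Untitled";
    int width_ = 640;
    int height_ = 480;
    bool resizable_ = true;
};

// Usage:
//   Window w = Window::Builder().title("Editor").size(800, 600).build();
```

Even in this small sketch the boilerplate is visible: every field is
declared twice, once in Window and once in its Builder, which is the part
that looks like a good candidate for generating automatically.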

> I've always found Perl's approach a more balanced way to tackle this
> problem (even though Perl's OO system as a whole suffers from other,
> shall we say, idiosyncrasies). In Perl, objects start out as arbitrary
> key-value pairs, and nothing differentiates them from a regular AA
> until you call the 'bless' built-in function on them, at which point
> they become "officially" a member of some particular class. This neatly
> sidesteps the whole ctor mess: you can populate the initial AA with
> whatever values you want, in whatever order you want. When you've
> finally "kicked it into shape", as the cited paper puts it, you
> "promote" that set of key-value pairs into an "official" member of the
> class, and thereafter, you can't simply modify fields anymore except
> through class methods. This means you now have the possibility of
> enforcing invariants on the object without crippling the flexibility of
> constructing it. (Well, OK, in Perl, this last bit isn't necessarily
> true, but in an ideal implementation of this initialize-bless-use
> approach, the object's fields would become non-public after being
> blessed and could only be updated by "official" object methods.)
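
A rough C++ analogue of initialize-bless-use, as a sketch only: Draft,
Rect and bless() are made-up names, and a std::map<std::string, int>
stands in for Perl's hash:

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Stage 1 (hypothetical): an untyped bag of key-value pairs that can be
// filled in any order, like the initial AA before bless.
using Draft = std::map<std::string, int>;

class Rect {
public:
    // Stage 2, "bless": promote a finished draft into a real Rect.
    // Invariants are checked once here; afterwards the fields are private
    // and can only be reached through Rect's own methods.
    static Rect bless(const Draft& d) {
        int w = d.at("width");   // throws std::out_of_range if missing
        int h = d.at("height");
        if (w <= 0 || h <= 0)
            throw std::invalid_argument("dimensions must be positive");
        return Rect(w, h);
    }

    int area() const { return width_ * height_; }

private:
    Rect(int w, int h) : width_(w), height_(h) {}
    int width_;
    int height_;
};

// Usage:
//   Draft d;
//   d["height"] = 4;          // any order
//   d["width"] = 3;
//   Rect r = Rect::bless(d);
```

As in the Perl case, nothing stops you from filling in the Draft in
whatever order you like; the difference is that once blessed, the Rect's
fields are no longer reachable except through its methods.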
>
> In the spirit of this approach, I've written some C++ code in the past
> that looked something like this:
>
> 	class BaseClass {
> 	public:
> 		// Encapsulate ctor arguments
> 		struct Args {
> 			int baseparm1, baseparm2;
> 		};
> 		BaseClass(Args args) {
> 			// initialize object based on fields in
> 			// BaseClass::Args.
> 		}
> 	};
>
> 	class MyClass : public BaseClass {
> 	public:
> 		// Encapsulate ctor arguments
> 		struct Args : BaseClass::Args {
> 			int parm1, parm2;
> 		};
>
> 		MyClass(Args args) : BaseClass(args) {
> 			// initialize object based on fields in args
> 		}
> 	};
>
> Basically, the Args structs let the user set up whatever values they
> want to, in whatever order they wish, then they are "blessed" into
> real class instances by the ctor. Encapsulating ctor arguments in these
> structs alleviates the problem of proliferating ctor arguments as the
> class hierarchy grows: each derived class simply hands off the Args
> struct (which is itself in a hierarchy that parallels that of the
> classes) to the base class ctor. All ctors in the class hierarchy need
> only a single (polymorphic) argument.
>
> This approach also localizes the changes required when you modify base
> class arguments -- in the old way of having multiple ctor arguments,
> adding or changing arguments to the base class ctor requires you to
> update every single derived class ctor accordingly -- very bad. But
> here, adding a new field to BaseClass::Args requires no changes to any
> derived class, which is a Good Thing(tm).
>
> In some cases, if the class is relatively simple, its private members
> can simply be an instance of the Args struct, so the ctor could be
> nothing more than just:
>
> 	MyClass(Args args) : BaseClass(args), myArgs(args) {}
>
> which gets rid of that silly baroque dance of naming ctor arguments as
> _a, _b, _c, then writing a=_a, b=_b, c=_c in the ctor body (which can
> be rather error-prone if you mistype a _ somewhere or forget to assign
> one of the members). Since the private copy of Args is not accessible
> from outside, class methods can use the values freely without having
> to worry about inconsistent states -- the ctor can check class
> invariants before the object is ever used, ensuring that the internal
> copy of Args is in a consistent state.
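
A fleshed-out, compilable C++ sketch of the private-Args-copy idea. It
reuses the field names from the snippet above, but the invariant
(baseparm1 <= baseparm2) and the accessors baseSpan() and sum() are
assumptions added purely for illustration:

```cpp
#include <stdexcept>

class BaseClass {
public:
    struct Args {
        int baseparm1 = 0;  // defaults, so Args() is usable as-is
        int baseparm2 = 0;
    };

    explicit BaseClass(const Args& args) : baseArgs_(args) {
        // Hypothetical invariant, checked once before the object is used.
        if (baseArgs_.baseparm1 > baseArgs_.baseparm2)
            throw std::invalid_argument("baseparm1 must not exceed baseparm2");
    }

    int baseSpan() const { return baseArgs_.baseparm2 - baseArgs_.baseparm1; }

private:
    Args baseArgs_;  // private copy: cannot be changed from outside
};

class MyClass : public BaseClass {
public:
    struct Args : BaseClass::Args {
        int parm1 = 0;
        int parm2 = 0;
    };

    // args binds to const BaseClass::Args& for the base ctor, which keeps
    // its own copy of just the base fields.
    explicit MyClass(const Args& args) : BaseClass(args), myArgs_(args) {}

    int sum() const { return myArgs_.parm1 + myArgs_.parm2; }

private:
    Args myArgs_;
};
```

Since BaseClass and MyClass only ever see a fully formed Args, neither
class needs "has this been set yet?" checks in its methods.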
>
> The Args structs themselves, of course, can have ctors that set up sane
> default values for each field, so that lazy users can simply call:
>
> 	MyClass *obj = new MyClass(MyClass::Args());
>
> and get a working, consistent class object with default settings. This
> way of setting default values also lets the user change only the fields
> they don't want default values for, rather than be constricted by the
> order of ctor default arguments: if you're unlucky enough to need a
> non-default value in a later parameter, you're forced to repeat the
> default values for everything that comes before it.
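
To illustrate the ordering problem with default arguments, a small C++
comparison. describePositional, ConnectArgs and describe are
hypothetical names, and the defaults sit in default member initializers
rather than in an explicit Args ctor, a swap made for brevity:

```cpp
#include <string>

// Positional defaults: to change only retries, a caller must restate the
// defaults of every parameter that comes before it.
std::string describePositional(const std::string& host = "localhost",
                               int port = 80, int timeoutMs = 1000,
                               int retries = 3) {
    return host + ":" + std::to_string(port) + " timeout=" +
           std::to_string(timeoutMs) + " retries=" + std::to_string(retries);
}

// Args struct: defaults live in the struct, and callers override only the
// fields they care about, in any order.
struct ConnectArgs {
    std::string host = "localhost";
    int port = 80;
    int timeoutMs = 1000;
    int retries = 3;
};

std::string describe(const ConnectArgs& a) {
    return describePositional(a.host, a.port, a.timeoutMs, a.retries);
}

// Usage:
//   describePositional("localhost", 80, 1000, 5);  // three defaults repeated
//
//   ConnectArgs args;
//   args.retries = 5;                              // nothing else restated
//   describe(args);
```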
>
> In D, this approach isn't quite as nice, because D structs don't have
> inheritance, so you can't simply pass Args from derived class to base
> class. You'd have to explicitly do something like:
>
> 	class BaseClass {
> 	public:
> 		struct Args { ...  }
> 		this(Args args) { ... }
> 	}
>
> 	class MyClass : BaseClass {
> 	public:
> 		struct Args {
> 			BaseClass.Args base;	// <-- explicit inclusion of BaseClass.Args
> 			...
> 		}
> 		this(Args args) {
> 			super(args.base);	// <-- more verbose than just super(args);
> 			...
> 		}
> 	}
>
> Initializing the args also isn't as nice, since user code will have to
> know exactly which fields are in .base and which aren't. You can't just
> write, like in C++:
>
> 	// C++
> 	MyClass::Args args;
> 	args.baseparm1 = 123;
> 	args.parm2 = 321;
>
> you'd have to write, in D:
>
> 	// D
> 	MyClass.Args args;
> 	args.base.baseparm1 = 123;
> 	args.parm2 = 321;
>
> which isn't as nice in terms of encapsulation, since ideally user code
> shouldn't need to care about the exact boundaries between base class
> and derived class.
>
> I haven't really thought about how this might be made nicer in D,
> though.
>
>
> T

See above, this is basically the builder pattern. It's a neat 
trick, giving your args objects a class hierarchy of their own. I 
think that one drawback of that, however, is that now you have to 
maintain *two* class hierarchies. Have you found this to be a 
problem in practice?

As an aside, you could probably simulate the inheritance of the 
args objects in D either with alias this or even opDispatch. 
Still, this means that you need to nest the structs within 
each other, and this could get silly after 2-3 "generations" of
args objects.