Interesting Research Paper on Constructors in OO Languages

Thu Jul 18 11:00:44 PDT 2013

On Thu, Jul 18, 2013 at 10:13:58AM +0100, Regan Heath wrote:
> On Wed, 17 Jul 2013 18:58:53 +0100, H. S. Teoh
> <hsteoh at quickfur.ath.cx> wrote:
[...]
> >I guess my point was that if we boil this down to the essentials,
> >it's basically the same idea as a builder pattern, just implemented
> >slightly differently. In the builder pattern, a separate object (or
> >struct, or whatever) is used to encapsulate the state of the object
> >that we'd like it to be in, which we then pass to the ctor to create
> >the object in that state. The idea is the same, though: set up a
> >bunch of values representing the desired initial state of the object,
> >then, to borrow Perl's terminology, "bless" it into a full-fledged
> >class instance.
> 
> It achieves the same ends, but does it differently.  My idea requires
> compiler support (which makes it unlikely to happen) and doesn't
> require separate objects (which I think is a big plus).

Why would requiring separate objects be a problem?

[...]
> Thanks for the description of your idea.
> 
> As I understand it, in your approach all the mandatory parameters
> for all classes in the hierarchy are /always/ passed to the final
> child constructor.  In my idea a constructor in the hierarchy could
> choose to set some of the mandatory members of its parents, and the
> compiler would detect that and would not require the initialisation
> block to contain those members.

In my case, the derived class ctor could manually set some of the fields
in Args before handing it to the superclass. Of course, that's not
ideal: if user code has already set those fields, they get silently
overridden.
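To make the override problem concrete, here is a minimal D sketch. It assumes the derived class's argument struct simply embeds the parent's one; the names `PersonArgs`, `StudentArgs`, `Person` and `Student` are hypothetical and chosen only for illustration.

```d
// Hypothetical argument structs: the derived one embeds the parent's.
struct PersonArgs
{
    string name;
    int age;
}

struct StudentArgs
{
    PersonArgs base;    // arguments destined for the Person ctor
    string school;
}

class Person
{
    string name;
    int age;

    this(PersonArgs args)
    {
        name = args.name;
        age = args.age;
    }
}

class Student : Person
{
    string school;

    this(StudentArgs args)
    {
        // The derived ctor fills in a parent field itself before handing
        // the parent's arguments up. Whatever the caller stored in
        // args.base.age is silently replaced, with no diagnostic.
        args.base.age = 12;
        super(args.base);
        school = args.school;
    }
}

unittest
{
    StudentArgs args;
    args.base.name = "test1";
    args.base.age = 30;             // the caller's value...
    args.school = "D Burg High School";

    auto s = new Student(args);
    assert(s.age == 12);            // ...has been overridden
}
```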

> Also, in your approach there isn't currently any enforcement that
> the user sets all the mandatory parameters of Args, and this is
> kinda the main issue my idea solves.

True. One workaround is to use Nullable and check that in the ctor. But
I suppose it's not as great as a compile-time check.
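A rough sketch of the Nullable workaround, assuming a class `C` with a nested `Args` struct whose mandatory fields are wrapped in `std.typecons.Nullable` (the field names are made up). Note that a missing field is only caught at run time, when the ctor executes.

```d
import std.exception : enforce;
import std.typecons : Nullable;

class C
{
    // Mandatory fields are Nullable so "never set" can be told apart
    // from "set to the default value".
    struct Args
    {
        Nullable!string name;
        Nullable!int age;
        string school;      // optional; "" is an acceptable default
    }

    string name;
    int age;
    string school;

    this(Args args)
    {
        // Run-time (not compile-time) check of the mandatory fields.
        enforce(!args.name.isNull, "C.Args.name is mandatory");
        enforce(!args.age.isNull, "C.Args.age is mandatory");
        name = args.name.get;
        age = args.age.get;
        school = args.school;
    }
}

unittest
{
    import std.exception : assertThrown;

    C.Args args;
    args.name = "test1";
    assertThrown(new C(args));  // age was never set

    args.age = 12;
    auto obj = new C(args);
    assert(obj.age == 12);
}
```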

> >One thing about your implementation that I found limiting was that
> >you *have* to declare all required fields on-the-spot before the
> >compiler will let your 'new' call pass, so if you have to create 5
> >similar instances of the class, you have to copy-n-paste most of the
> >set-method calls:
> >
> >	auto obj1 = new C() {
> >		name = "test1",
> >		age = 12,
> >		school = "D Burg High School"
> >	};
> >
> >[...]
> >
> >Whereas using my approach, you can simply reuse the Args struct
> >several times:
> >
> >	C.Args args;
> >	args.name = "test1";
> >	args.age = 12;
> >	args.school = "D Burg High School";
> >	auto obj1 = new C(args);
> >
> >	args.name = "test2";
> >	auto obj2 = new C(args);
> >
> >	args.name = "test3";
> >	auto obj3 = new C(args);
> >
> >	... // etc.
> 
> Or.. you use a mixin, or better still you add a copy-constructor or
> .dup method to your class to duplicate it :)

But then you end up with the problem of needing to call set methods
after the .dup, which may complicate things if the set methods need to
do non-trivial initialization of internal structures (caches or internal
representations, etc.). Whereas if you hadn't needed to .dup, you could
have gotten by without writing any set methods for your class, but now
you have to.

[...]
> In my case you can call different functions in the initialisation
> block, e.g.
> 
> void defineObject(C c)
> {
>   c.school = "...";
> }
> 
> C c = new C() {
>   defineObject()
> }
> 
> :)

So the compiler has to recursively traverse function calls in the
initialization block in order to check that all required fields are set?
That could entail some implementation issues, if said function calls
can be arbitrarily complex. (If you have complex control logic in said
functions, the compiler can't in general determine whether the paths
containing assignments to the object's fields will be taken, since that
would be equivalent to the halting problem. Worse, the compiler would
have to track aliases of the object being set, in order to know which
assignment statements are setting fields in the object, and which are
just computations on the side.)

Furthermore, what if defineObject tries to do something with C other
than setting up fields? The object would be in an illegal state since it
hasn't been fully constructed yet.
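For illustration, two hypothetical helpers in plain D (not the proposed syntax) where these problems show up: in the first, whether `school` gets assigned depends on run-time data; in the second, the assignment goes through another reference to the same object. A checker would have to reason about both cases to accept or reject an initialisation block that calls them.

```d
class C
{
    string name;
    string school;
}

// Whether c.school is assigned depends on a value only known at run
// time, so no compile-time check can tell if this call "sets school".
void defineObject(C c, string[] cmdline)
{
    if (cmdline.length > 1)
        c.school = cmdline[1];
}

// The assignment goes through a copy of the reference stored in an
// array; a checker would need alias tracking to connect it back to c.
void defineViaAlias(C c)
{
    C[] holder = [c];
    holder[0].school = "D Burg High School";
}
```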

> >>I think another interesting idea is using the builder pattern with
> >>create-set-call objects.
> >>
> >>For example, a builder template class could inspect the object for
> >>UDAs indicating a data member which is required during
> >>initialisation.  It would contain a bool[] to flag each member as
> >>not/initialised and expose a setMember() method which would call the
> >>underlying object setMember() and return a reference to itself.
> >>
> >>At some point, these setMember() methods would want to return another
> >>template class which contained just a build() member.  I'm not sure
> >>how/if this is possible in D.
> >[...]
> >
> >Hmm, this is an interesting idea indeed. I think it may be possible to
> >implement in the current language.
> 
> The issue I think is the step where you want to mutate the return
> type from the type with setX members to the type with build().

I'm not sure I understand that sentence. Could you rephrase it?
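If the intended step is that the setters return a builder of a *different type* once every mandatory field is set, that much can already be expressed in current D by recording the "which fields are set" flags as template parameters, so they live in the type instead of a run-time `bool[]`. A minimal sketch with two hard-coded, hypothetical fields (`name`, `age`); discovering the mandatory fields via UDAs is left out for brevity.

```d
class Person
{
    string name;
    int age;

    private this(string name, int age)
    {
        this.name = name;
        this.age = age;
    }
}

// Hypothetical typestate builder: the template parameters record, in
// the type itself, which mandatory fields have been set so far.
struct PersonBuilder(bool hasName = false, bool hasAge = false)
{
    private string name;
    private int age;

    // Each setter returns a builder of a (possibly) different type with
    // the corresponding flag switched on.
    PersonBuilder!(true, hasAge) setName(string n)
    {
        return PersonBuilder!(true, hasAge)(n, age);
    }

    PersonBuilder!(hasName, true) setAge(int a)
    {
        return PersonBuilder!(hasName, true)(name, a);
    }

    // build() only exists on the type where every flag is true, so
    // calling it too early is a compile-time error.
    static if (hasName && hasAge)
    {
        Person build()
        {
            return new Person(name, age);
        }
    }
}

unittest
{
    auto p = PersonBuilder!()().setName("test1").setAge(12).build();
    assert(p.name == "test1" && p.age == 12);

    // age was never set, so there is no build() to call:
    static assert(!__traits(compiles, PersonBuilder!()().setName("x").build()));
}
```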

> >Maybe we can make use of UDAs to indicate which fields are mandatory
> 
> That was what I was thinking.
> 
> >[...]
> >Just a rough idea, haven't actually tried to compile this code yet.
> 
> Worth a go, it doesn't require compiler support like my idea so it's
> far more likely you'll get something at the end of it.. I can just
> sit on my hands and/or try to promote my idea.
> 
> I still prefer my idea :P.  I think it's cleaner and simpler, in
> part because it requires compiler support and that hides the
> gory details, but also because create-set-call is a simpler style in
> itself.  Provided the weaknesses of create-set-call can be addressed
> I might be tempted to use that style.
[...]

One thing I like about your idea is that you can reuse the same chunk of
memory that the eventual object is going to sit in. With my approach,
the ctors still have to copy the struct fields into the object fields,
so there is some overhead there. (Having said that though, that overhead
shouldn't be anything worse than the ctor-with-arguments calls it
replaces; you're basically just abstracting away the ctor parameters on
the stack into a struct. In machine code it's pretty much equivalent.)

Requiring compiler support, though, as you said, makes your idea less
likely to actually happen. I still see it as essentially equivalent to
my approach; the syntax is different and the usage pattern differs, but
at the end of the day, it amounts to the same thing: basically your
objects have two phases, a post-creation, pre-usage stage where you set
things up, and a post-setup stage where you actually start using them.

Anyway, now that I'm thinking about this problem again, I'd like to take
a step back and consider if any other good approaches may exist to
tackle this issue. I'm thinking of the general case where the
initialization of an object may be arbitrarily complex, such that
neither a struct of ctor arguments nor an initialization block may be
sufficient.

The problem with the struct approach is, what if you need a complex
setup process, say constructing a graph with complex interconnections
between nodes? In order to express such a thing, you have to essentially
already create the object before you can pass the struct to the ctor,
which kinda defeats the purpose. Similarly, your approach of an
initialization block suffers from the limitation that the initialization
is confined to that block, and you can't allow arbitrary code in that
block (otherwise you could end up using an object that hasn't been fully
constructed yet -- like the defineObject problem I pointed out above).

Keeping in mind the create-set-call pattern and Perl's approach of
"blessing" an object into a full-fledged class instance, I wonder if a
more radical approach might be to have the language acknowledge that
objects have two phases, a preinitialized state, and a fully-initialized
state. These two would have distinct types *in the type system*, such
that you cannot, for example, call post-init methods on a
pre-initialization object, and you can't call an init method on a
post-initialization object. The ctor would be the unique transition
point which takes a preinitialized object, verifies compliance with
class invariants, and returns a post-initialization object.

In pseudo-code, this might look something like this:

	class MyClass {
	public:
		@preinit void setName(string name);
		@preinit void setAge(int age);

		this() {
			if (!validateFields())
				throw new Exception(...);
		}

		// The following are "normal" methods that cannot be
		// called in a preinit state.
		void computeStatistics();
		void dotDotDotMagic();
	}

	void main() {
		auto obj = new MyClass();
		static assert(is(typeof(obj) == MyClass.preinit));
		/* MyClass.preinit is a special type indicating that the
		 * object isn't fully initialized yet */

		// Compile error: cannot call non-@preinit method on
		// @preinit object.
		//obj.computeStatistics();

		obj.setName(...);	// OK
		obj.setAge(...);	// OK

		// Transition object to full-fledged state
		obj.this();		// not sure about this syntax yet

		static assert(is(typeof(obj) == MyClass));
		/* Now obj is a full-fledged member of the class */

		// Compile error: can't call @preinit method on
		// non-preinit object
		//obj.setName(...);

		obj.computeStatistics();	// OK
	}

MyClass.preinit would be a separate type in the type system, so that you
can pass it around without any risk that someone will try to perform
illegal operations on it before it's fully initialized:

	void doSetup(MyClass.preinit obj) {
		obj.setName(...);		// OK
		//obj.computeStatistics();	// compile error
	}
	void main() {
		auto obj = new MyClass();
		doSetup(obj);		// OK
		obj.this();		// "promote" to full-fledged object

		// Illegal: can't implicitly convert MyClass into
		// MyClass.preinit.
		//doSetup(obj);

		obj.computeStatistics(); // OK
	}

Maybe "obj.this()" is not a good syntax, perhaps "obj.promote()"?
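Part of this can be approximated in today's D by making the preinit state a separate type, with `promote()` as the only way to obtain a `MyClass`. In the sketch below, `MyClass.Preinit` is a hypothetical nested struct standing in for the proposed `MyClass.preinit`; the fields and validation rules are invented for illustration.

```d
import std.conv : to;
import std.exception : enforce;

class MyClass
{
    // Stand-in for MyClass.preinit: it has the setters and nothing
    // else, so "normal" methods can't be called on it.
    static struct Preinit
    {
        private string name;
        private int age = -1;   // -1 means "not set yet"

        void setName(string name) { this.name = name; }
        void setAge(int age) { this.age = age; }

        // The single transition point: validate, then return a fully
        // initialised object. MyClass has no setters to call afterwards.
        MyClass promote()
        {
            enforce(name.length > 0, "name was not set");
            enforce(age >= 0, "age was not set");
            return new MyClass(name, age);
        }
    }

    private string name;
    private int age;

    // Private: outside this module, promote() is the only way in.
    private this(string name, int age)
    {
        this.name = name;
        this.age = age;
    }

    // A "normal" method; Preinit has no such member.
    string computeStatistics()
    {
        return name ~ " is " ~ age.to!string;
    }
}

// Accepts only the preinit type, so it can't touch a finished object.
void doSetup(ref MyClass.Preinit obj)
{
    obj.setName("test1");
    obj.setAge(12);
}

unittest
{
    MyClass.Preinit pre;
    doSetup(pre);
    static assert(!__traits(compiles, pre.computeStatistics()));

    MyClass obj = pre.promote();
    static assert(!__traits(compiles, obj.setName("x")));
    static assert(!__traits(compiles, doSetup(obj)));
    assert(obj.computeStatistics() == "test1 is 12");
}
```

Unlike the proposal, this is not an in-place transition: the fields are copied into a new object, and the `Preinit` value survives `promote()` and could be promoted again. Because `Preinit` is a struct, `doSetup` also has to take it by `ref`.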

In any case, this is a rather radical idea which requires language
support; I'm not sure how practical it is. :)

T

-- 
"Uhh, I'm still not here." -- KD, while "away" on ICQ.