Interesting Research Paper on Constructors in OO Languages

Wed Jul 17 10:58:53 PDT 2013

On Wed, Jul 17, 2013 at 11:00:38AM +0100, Regan Heath wrote:
> On Tue, 16 Jul 2013 23:01:57 +0100, H. S. Teoh
> <hsteoh at quickfur.ath.cx> wrote:
> >On Tue, Jul 16, 2013 at 06:17:48PM +0100, Regan Heath wrote:
[...]
> >>>>class Foo
> >>>>{
> >>>>  string name;
> >>>>  int age;
> >>>>
> >>>>  invariant
> >>>>  {
> >>>>    assert(name != null);
> >>>>    assert(age > 0);
> >>>>  }
> >>>>
> >>>>  property string Name...
> >>>>  property int Age...
> >>>>}
> >>>>
> >>>>void main()
> >>>>{
> >>>>  Foo f = new Foo() {
> >>>>    Name = "test",    // calls property Name setter
> >>>>    Age = 12          // calls property Age setter
> >>>>  };
> >>>>}
> >
> >Maybe I'm missing something obvious, but isn't this essentially the
> >same thing as having named ctor parameters?
> 
> Yes, if we're comparing this to ctors with named parameters.  I
> wasn't doing that however, I was asking this Q:
> 
> "Or, perhaps another way to ask a similar Q is.. can the compiler
> statically verify that a create-set-call style object has been
> initialised, or rather that an attempt has at least been made to
> initialise all the required parts."
> 
> Emphasis on "create-set-call" :)  The weakness to create-set-call
> style is the desire for a valid object as soon as an attempt can be
> made to use it.  Which implies the need for some sort of enforcement
> of initialisation and as I mentioned in my first post the issue of
> preventing this initialisation being spread out, or intermingled with
> others and thus making the semantics of it harder to see.

Ah, I see. So basically, you need some kind of enforcement of a
two-state object, pre-initialization and post-initialization. That is,
the ctor is empty, so you allocate the object first, then set some
values into it, then it "officially" becomes a full-fledged instance of
the class. To prevent problems with consistency, a sharp transition
between setting values and using the object is enforced. Am I right?

I guess my point was that if we boil this down to the essentials, it's
basically the same idea as a builder pattern, just implemented slightly
differently. In the builder pattern, a separate object (or struct, or
whatever) is used to encapsulate the state of the object that we'd like
it to be in, which we then pass to the ctor to create the object in that
state. The idea is the same, though: set up a bunch of values
representing the desired initial state of the object, then, to borrow
Perl's terminology, "bless" it into a full-fledged class instance.

> My idea here attempted to solve those issues with create-set-call only.

Fair enough. I guess my approach was from the angle of trying to address
the problem from the confines of the current language. So, same idea,
different implementation. :)

[...]
> >>The idea was to /use/ the code in the invariant to determine which
> >>member fields should be set during the initialisation statement and
> >>then statically verify that a call was made to some member function
> >>to set them.  The actual values set aren't important, just that some
> >>attempt has been made to set them.  That's about the limit of what I
> >>think you could do statically, in the general case.
> >[...]
> >
> >This still doesn't address the issue of ctor argument proliferation,
> >though
> 
> It wasn't supposed to :)  create-set-call ctors have no arguments.

True. But if the ctor call requires a code block that initializes
mandatory initial values, then isn't it essentially the same thing as
ctors that have arguments? If the class hierarchy is deep, and base
classes have mandatory fields to be set, then you still have the same
problem, just in a different manifestation.

> >if each level of the class hierarchy adds 1-2 additional parameters,
> >you still need to write tons of boilerplate in your derived classes
> >to percolate those additional parameters up the inheritance tree.
> 
> In the create-set-call style additional required 'arguments' would
> appear as setter member functions whose underlying data member is
> verified in the invariant and would therefore be enforced by the
> syntax I detailed.

What happens when base classes also have required setter member
functions that you must call?

> >Now imagine if at some point you need to change some base class ctor
> >parameters. Now instead of making a single change to the base class,
> >you have to update every single derived class to make the same change
> >to every ctor, so that the new version of the parameter (or new
> >parameter) is properly percolated up the inheritance tree.
> 
> This is one reason why create-set-call might be desirable, no ctor
> arguments, no problem.

Right.

> So, to take my idea a little further - WRT class inheritance.  The
> compiler, for a derived class, would need to inspect the invariants
> of all classes involved (these are and-ed already), inspect the
> constructors of the derived classes (for calls to initialise
> members), and the initialisation block I described and verify
> statically that an attempt was made to initialise all the members
> which appear in all the invariants.

I see. So basically the user still has to set up all required values
before you can use the object, the advantage being that you don't have
to manually percolate these values up the inheritance tree in the ctors.

It seems to be essentially the same thing as my approach, just
implemented differently. :) In my approach, ctor arguments are
encapsulated inside a struct, currently called Args by convention. So if
you have, say, a class hierarchy where class B inherits from class A,
and A.this() has 5 parameters and B.this() adds another 5 parameters,
then B.Args would have 10 fields. To create an instance of B, the user
would do this:

	B.Args args;
	args.field1 = 10;
	args.field2 = 20;
	...
	auto obj = new B(args);

So in a sense, this isn't that much different from your approach, in
that the user sets a bunch of values desired for the initial state of
the object, then gets a full-fledged object out of it at the end.

In my case, all ctors in the class hierarchy would take a single struct
argument encapsulating all ctor arguments for that class (including
arguments to its respective base class ctors, etc.). So ctors would look
like this:

	class B : A {
		struct Args { ... }
		this(Args args) {
			super(...);
			... // set up object based on values in args
		}
	}

The trick here, then, is that call to super(...). The naïve way of doing
this is to (manually) include base class ctor arguments as part of
B.Args, then in B's ctor, we collect those arguments together in A.Args,
and hand that over to A's ctor. But we can do better. Since A.Args is
already defined, there's no need to duplicate all those fields in
B.Args; we can simply do this:

	class B : A {
		struct Args {
			A.Args baseClassArgs;
			... // fields specific to B
		}
		this(Args args) {
			super(args.baseClassArgs);
			...
		}
	}

This is ugly, though, 'cos now user code has to know about
B.Args.baseClassArgs:

	B.Args args;
	args.baseClassArgs.baseClassParm1 = 123;
	args.derivedClassParm1 = 234;
	...
	auto obj = new B(args);

So the next step is to use alias this to make .baseClassArgs transparent
to user code:

	class B : A {
		struct Args {
			A.Args baseClassArgs;
			alias baseClassArgs this; // <--- N.B.
			... // fields specific to B
		}
		this(Args args) {
			// Nice side-effect of alias this: we can pass
			// args to super without needing to explicitly
			// name .baseClassArgs.
			super(args);
			...
		}
	}

	// Now user code doesn't need to know about .baseClassArgs:
	B.Args args;
	args.baseClassParm1 = 123;
	args.derivedClassParm1 = 234;
	...
	auto obj = new B(args);
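
For reference, here is a minimal self-contained sketch of the alias this step, assuming one int parameter per class. The members baseValue and derivedValue are hypothetical, added only so there is something to check; everything else follows the snippets above.

```d
class A {
    struct Args {
        int baseClassParm1;
    }
    int baseValue;                  // hypothetical member, for illustration
    this(Args args) {
        baseValue = args.baseClassParm1;
    }
}

class B : A {
    struct Args {
        A.Args baseClassArgs;
        alias baseClassArgs this;   // forwards member lookup and conversion to A.Args
        int derivedClassParm1;
    }
    int derivedValue;               // hypothetical member, for illustration
    this(Args args) {
        super(args);                // B.Args converts to A.Args via alias this
        derivedValue = args.derivedClassParm1;
    }
}

void main() {
    B.Args args;
    args.baseClassParm1 = 123;      // resolves to args.baseClassArgs.baseClassParm1
    args.derivedClassParm1 = 234;
    auto obj = new B(args);
    assert(obj.baseValue == 123);
    assert(obj.derivedValue == 234);
    assert(args.baseClassArgs.baseClassParm1 == 123);
}
```

The point to notice is that alias this does double duty: it makes the base-class fields look like direct members of B.Args, and it lets B.Args be passed where A.Args is expected.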

This is starting to look pretty good. The next problem is that having to
type A.Args baseClassArgs each time is a lot of boilerplate, and could
be error-prone. For example, we might accidentally write C.Args instead
of A.Args:

	class B : A {
		struct Args {
			C.Args baseClassArgs; // <--- oops!
			alias baseClassArgs this;
			...
		}
		...
	}

So the next step is to make the type of baseClassArgs automatically
inferred, so that no matter how we move B around in the class hierarchy,
it will always be correct:

	class B : A {
		struct Args {
			typeof(super).Args baseClassArgs; // ah, much better!
			alias baseClassArgs this;
			...
		}
		this(Args args) {
			super(args);
			...
		}
	}

This is good, because now, the declaration of B.Args is independent of
whatever base class B has. Similarly, thanks to the alias this
introduced earlier, the call to super(...) is always written
super(args), without any explicit reference to the specific base class.
DRY is good. Of course, this is still a lot of boilerplate: you have to
keep typing out the first 3 lines of the declaration of Args, in every
derived class. But now that we've made this declaration independent of
an explicit base class name, we can factor it into a mixin:

	mixin template CtorArgs(string fields) {
		struct Args {
			typeof(super).Args baseClassArgs;
			alias baseClassArgs this;
			mixin(fields);
		}
	}

	class B : A {
		mixin CtorArgs!(q{
			int derivedParm1;
			int derivedParm2;
			...
		});
		this(Args args) {
			super(args);
			...
		}
	}

Now we can simply use CtorArgs!(...) in each derived class to
automatically declare the Args struct correctly. The boilerplate is now
minimal. Things continue to work even if we move B around in the class
hierarchy. Say we want to derive B from C instead of A; then we'd simply
write:

	class B : C {	// <-- this is the only line that's different!
		mixin CtorArgs!(q{
			int derivedParm1;
			int derivedParm2;
			...
		});
		this(Args args) {
			super(args);
			...
		}
	}

Finally, we add a little detail to our mixin so that we can use it for
the root of the class hierarchy as well. Right now, we still have to
explicitly declare A.Args (assuming A is the root of our hierarchy),
which is bad, because you may accidentally call it something that
doesn't match what CtorArgs expects. We'd like to be able to
consistently use CtorArgs even in the root base class, so that if we
ever need to re-root the hierarchy, things will continue to Just Work.
So we revise CtorArgs thus:

	mixin template CtorArgs(string fields) {
		struct Args {
			static if (!is(typeof(super)==Object)) {
				typeof(super).Args baseClassArgs;
				alias baseClassArgs this;
			}
			mixin(fields);
		}
	}

Basically, the static if just omits the whole baseClassArgs and alias
this deal when the class derives directly from Object ('cos the root of
the hierarchy has no superclass that also has an Args struct). So now we
can write:

	class A {
		mixin CtorArgs!(q{ /* ctor fields here */ });
		...
	}

And if we ever re-root the hierarchy, we can simply write:

	class A : B {	// <--- this is the only line that changes
		mixin CtorArgs!(q{ /* ctor fields here */ });
		...
	}
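
Putting the pieces together, a complete two-level hierarchy might look like the following sketch. The field names (baseParm1, derivedParm1) and the members baseValue and derivedValue are placeholders chosen for illustration; the CtorArgs template is the one given above.

```d
mixin template CtorArgs(string fields) {
    struct Args {
        // Only classes with a user-defined base get the embedded base Args.
        static if (!is(typeof(super) == Object)) {
            typeof(super).Args baseClassArgs;
            alias baseClassArgs this;
        }
        mixin(fields);
    }
}

class A {
    mixin CtorArgs!(q{ int baseParm1; });
    int baseValue;                  // hypothetical member
    this(Args args) {
        baseValue = args.baseParm1;
    }
}

class B : A {
    mixin CtorArgs!(q{ int derivedParm1; });
    int derivedValue;               // hypothetical member
    this(Args args) {
        super(args);                // implicit conversion to A.Args
        derivedValue = args.derivedParm1;
    }
}

void main() {
    B.Args args;
    args.baseParm1 = 10;
    args.derivedParm1 = 20;
    auto obj = new B(args);
    assert(obj.baseValue == 10);
    assert(obj.derivedValue == 20);

    // The root's Args has no baseClassArgs; the derived one embeds A.Args.
    static assert(!__traits(hasMember, A.Args, "baseClassArgs"));
    static assert(is(typeof(B.Args.baseClassArgs) == A.Args));
}
```

Note that the static if tests for Object specifically, so this sketch assumes every non-root class in the hierarchy also uses CtorArgs.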

> >I think my approach of using builder structs with a parallel
> >inheritance tree is still better
> 
> It may be, it certainly looked quite neat but I haven't had a
> detailed look at it TBH.  I think you've misunderstood my idea
> however, or rather, the issues it was intended to solve :)  Perhaps
> my idea is too limiting for you?  I could certainly understand that
> point of view.

Well, I think our approaches are essentially the same thing, just
implemented differently. :)

One thing about your implementation that I found limiting was that you
*have* to declare all required fields on-the-spot before the compiler
will let your 'new' call pass, so if you have to create 5 similar
instances of the class, you have to copy-n-paste most of the set-method
calls:

	auto obj1 = new C() {
		name = "test1",
		age = 12,
		school = "D Burg High School"
	};

	auto obj2 = new C() {
		name = "test2",
		age = 12,
		school = "D Burg High School"
	};

	auto obj3 = new C() {
		name = "test3",
		age = 12,
		school = "D Burg High School"
	};

	auto obj4 = new C() {
		name = "test4",
		age = 12,
		school = "D Burg High School"
	};

	auto obj5 = new C() {
		name = "test5",
		age = 12,
		school = "D Burg High School"
	};

Whereas using my approach, you can simply reuse the Args struct several
times:

	C.Args args;
	args.name = "test1";
	args.age = 12;
	args.school = "D Burg High School";
	auto obj1 = new C(args);

	args.name = "test2";
	auto obj2 = new C(args);

	args.name = "test3";
	auto obj3 = new C(args);

	... // etc.

You can also have different functions set up different parts of C.Args:

	C createObject(C.Args args) {
		// N.B. only need to set a subset of fields
		args.school = "D Burg High School";
		return new C(args);
	}

	void main() {
		C.Args args;
		args.name = "test1";
		args.age = 12;		// partially set up Args
		auto obj = createObject(args); // createObject fills out rest of the fields.
		...

		args.name = "test2";	// modify a few parameters
		auto obj2 = createObject(args); // createObject doesn't need to know about this change
	}

This is nice if there are a lot of parameters and you don't want to
collect the setting up of all of them in one place.

> I think another interesting idea is using the builder pattern with
> create-set-call objects.
> 
> For example, a builder template class could inspect the object for
> UDAs indicating a data member which is required during
> initialisation.  It would contain a bool[] to flag each member as
> not/initialised and expose a setMember() method which would call the
> underlying object setMember() and return a reference to itself.
> 
> At some point, these setMember() methods would want to return another
> template class which contained just a build() member.  I'm not sure
> how/if this is possible in D.
[...]

Hmm, this is an interesting idea indeed. I think it may be possible to
implement in the current language. It would solve the problem of
mandatory fields, which is currently a main weakness of my approach (the
user can neglect to set up a field in Args, and there's no way to enforce
that those fields *must* be set -- you could provide sane defaults in
the declaration of Args, but if some fields have no sane default value,
then you're out of luck). One approach is to use Nullable for mandatory
fields (or equivalently, use bool[] as you suggest), then the ctors will
throw an exception if a required field hasn't been set yet. Which isn't
a bad solution, since ctors in theory *should* vet their input values
before creating an instance of the class anyway. But it does require
some amount of boilerplate.
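
To make the boilerplate concrete, here is a hand-written sketch of the Nullable approach, assuming a class C with two mandatory fields and one optional one. The field names and the default value "unspecified" are illustrative only; Nullable, enforce and assertThrown are the Phobos ones.

```d
import std.exception : assertThrown, enforce;
import std.typecons : Nullable;

class C {
    struct Args {
        Nullable!string name;            // mandatory, no sane default
        Nullable!int age;                // mandatory, no sane default
        string school = "unspecified";   // optional, has a default
    }

    string name;
    int age;
    string school;

    this(Args args) {
        // Vet the input before the instance becomes usable.
        enforce(!args.name.isNull, "C.Args.name was not set");
        enforce(!args.age.isNull, "C.Args.age was not set");
        name = args.name.get;
        age = args.age.get;
        school = args.school;
    }
}

void main() {
    C.Args args;
    args.name = "test1";
    assertThrown(new C(args));   // age is still unset

    args.age = 12;
    auto obj = new C(args);
    assert(obj.name == "test1" && obj.age == 12);
    assert(obj.school == "unspecified");
}
```

Every mandatory field costs one enforce line per ctor, which is the part one would want to generate automatically.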

Maybe we can make use of UDAs to indicate which fields are mandatory,
then have a template (or mixin template) that uses compile-time reflection to
generate the code that verifies that these fields have indeed been set.
Maybe something like:

	import std.traits : hasUDA;
	import std.typecons : Nullable;

	struct RequiredAttr {}

	// Warning: have not tried to compile this yet
	void checkCtorArgs(Args)(ref Args args) {
		// .tupleof visits only the fields declared directly in Args
		foreach (i, ref field; args.tupleof) {
			static if (hasUDA!(Args.tupleof[i], RequiredAttr)) {
				// Mandatory fields are Nullable!T; unset means isNull
				if (field.isNull)
					throw new Exception("...");
			}
		}
	}

	class B : A {
		mixin CtorArgs!(q{
			int myfield1;	// this one is optional
			@(RequiredAttr) Nullable!int myfield2; // this one is mandatory
		});
		this(Args args) {
			checkCtorArgs(args);
				// throws if any mandatory fields aren't set
			...
		}
	}

Just a rough idea, haven't actually tried to compile this code yet.

On second thoughts, maybe we could just check for an instantiation of
Nullable instead of using a UDA, since if you forget to use a nullable
value (like int instead of Nullable!int), this code wouldn't work.
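
A sketch of that variant, assuming every Nullable field is to be treated as mandatory. The function name checkNullableFields and the struct DemoArgs are made up for this example; isInstanceOf comes from std.traits.

```d
import std.exception : assertThrown;
import std.traits : isInstanceOf;
import std.typecons : Nullable;

// Throws if any field whose type is an instantiation of Nullable is unset.
void checkNullableFields(Args)(ref Args args) {
    foreach (i, ref field; args.tupleof) {
        static if (isInstanceOf!(Nullable, typeof(field))) {
            if (field.isNull)
                throw new Exception("required field '"
                    ~ __traits(identifier, Args.tupleof[i])
                    ~ "' was not set");
        }
    }
}

struct DemoArgs {               // hypothetical Args struct
    int optionalField;          // plain type: never checked
    Nullable!int requiredField; // Nullable: must be set before use
}

void main() {
    DemoArgs args;
    assertThrown(checkNullableFields(args));

    args.requiredField = 42;
    checkNullableFields(args);  // passes now
    assert(args.requiredField.get == 42);
}
```

The trade-off is that Nullable can then no longer be used for a field that is genuinely optional.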

Or maybe enhance the CtorArgs template to automatically substitute
Nullable!T when it sees a field of type T that's marked with
@(RequiredAttr). Or maybe your bool[] idea is better, since it avoids
the dependency on Nullable.
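
One way to take the bool[] idea further is to lift the flags into template parameters, so that a setter returns a different type and build() only exists once everything required has been set. The following is only a sketch under that assumption; Person, PersonBuilder and their fields are hypothetical names, not part of either proposal.

```d
class Person {
    string name;
    int age;
    private this(string name, int age) {
        this.name = name;
        this.age = age;
    }
}

// The two bools record, at compile time, which fields have been set.
struct PersonBuilder(bool hasName = false, bool hasAge = false) {
    private string name_;
    private int age_;

    // Each setter returns a builder type with the matching flag turned on.
    auto name(string value) {
        return PersonBuilder!(true, hasAge)(value, age_);
    }
    auto age(int value) {
        return PersonBuilder!(hasName, true)(name_, value);
    }

    // build() is only declared when every required field has been set.
    static if (hasName && hasAge) {
        Person build() {
            return new Person(name_, age_);
        }
    }
}

void main() {
    auto p = PersonBuilder!()().name("test").age(12).build();
    assert(p.name == "test" && p.age == 12);

    // Forgetting a required field is rejected at compile time.
    static assert(!__traits(compiles, PersonBuilder!()().name("test").build()));
    static assert(!__traits(compiles, PersonBuilder!()().build()));
}
```

This gives static checking without Nullable, at the cost of one template parameter per mandatory field.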

In any case, this is an interesting direction to look into.

T

-- 
The slower you go, the farther you'll get. (Russian proverb)