Interesting Research Paper on Constructors in OO Languages

Regan Heath regan at netmail.co.nz
Fri Jul 19 03:31:53 PDT 2013


On Thu, 18 Jul 2013 19:00:44 +0100, H. S. Teoh <hsteoh at quickfur.ath.cx>  
wrote:

> On Thu, Jul 18, 2013 at 10:13:58AM +0100, Regan Heath wrote:
>> On Wed, 17 Jul 2013 18:58:53 +0100, H. S. Teoh
>> <hsteoh at quickfur.ath.cx> wrote:
> [...]
>> >I guess my point was that if we boil this down to the essentials,
>> >it's basically the same idea as a builder pattern, just implemented
>> >slightly differently. In the builder pattern, a separate object (or
>> >struct, or whatever) is used to encapsulate the state of the object
>> >that we'd like it to be in, which we then pass to the ctor to create
>> >the object in that state. The idea is the same, though: set up a
>> >bunch of values representing the desired initial state of the object,
>> >then, to borrow Perl's terminology, "bless" it into a full-fledged
>> >class instance.
>>
>> It achieves the same ends, but does it differently.  My idea requires
>> compiler support (which makes it unlikely to happen) and doesn't
>> require separate objects (which I think is a big plus).
>
> Why would requiring separate objects be a problem?

It's not a problem, it's just better not to, if at all possible. K.I.S.S.  
:)

> In my case, the derived class ctor could manually set some of the fields
> in Args before handing to the superclass. Of course, it's not as ideal,
> since if user code already sets said fields, then they get silently
> overridden.

That's the problem I was imagining.

>> Also, in your approach there isn't currently any enforcement that
>> the user sets all the mandatory parameters of Args, and this is
>> kinda the main issue my idea solves.
>
> True. One workaround is to use Nullable and check that in the ctor. But
> I suppose it's not as great as a compile-time check.

Yeah, I was angling for a static/compile time check, if at all possible.
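
For reference, the Nullable workaround could look roughly like the sketch below. It assumes an Args struct with the fields from the example further down; which fields count as mandatory is my own assumption for illustration. Note the check only fires at run time, inside the ctor.

```d
import std.typecons : Nullable;
import std.exception : enforce, assertThrown;

class C
{
    // Hypothetical argument struct; field names follow the Args example in this thread.
    static struct Args
    {
        Nullable!string name;   // assumed mandatory
        Nullable!int age;       // assumed mandatory
        string school;          // assumed optional
    }

    string name;
    int age;
    string school;

    this(Args args)
    {
        // Runtime check only: a missing mandatory field is caught here,
        // not by the compiler.
        enforce(!args.name.isNull, "Args.name is mandatory");
        enforce(!args.age.isNull, "Args.age is mandatory");
        name = args.name.get;
        age = args.age.get;
        school = args.school;
    }
}

unittest
{
    C.Args args;
    args.name = "test1";
    args.age = 12;
    auto obj = new C(args);
    assert(obj.name == "test1" && obj.age == 12);

    C.Args incomplete;
    incomplete.name = "test2";          // age never set
    assertThrown(new C(incomplete));    // only detected when the ctor runs
}
```

The unittest block can be run with something like `dmd -unittest -main -run file.d`.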

>> >Whereas using my approach, you can simply reuse the Args struct
>> >several times:
>> >
>> >	C.Args args;
>> >	args.name = "test1";
>> >	args.age = 12;
>> >	args.school = "D Burg High School";
>> >	auto obj1 = new C(args);
>> >
>> >	args.name = "test2";
>> >	auto obj2 = new C(args);
>> >
>> >	args.name = "test3";
>> >	auto obj3 = new C(args);
>> >
>> >	... // etc.
>>
>> Or.. you use a mixin, or better still you add a copy-constructor or
>> .dup method to your class to duplicate it :)
>
> But then you end up with the problem of needing to call set methods
> after the .dup

Which is no different from setting args.name beforehand: it's the same
number of assignments.  In the example above it's N+1 assignments: N for
the args or dup'ed members, and 1 more for 'name', before or after the
construction.
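
A minimal sketch of the .dup variant, assuming a create-set-call class C with public members matching the Args example (the dup() method itself is hypothetical):

```d
class C
{
    string name;
    int age;
    string school;

    // Hypothetical .dup: copies every member into a fresh instance.
    C dup()
    {
        auto c = new C;
        c.name = name;
        c.age = age;
        c.school = school;
        return c;
    }
}

unittest
{
    auto obj1 = new C;
    obj1.name = "test1";
    obj1.age = 12;
    obj1.school = "D Burg High School";

    // One extra assignment per copy, made after construction
    // instead of before it.
    auto obj2 = obj1.dup;
    obj2.name = "test2";

    assert(obj2.age == 12 && obj2.school == "D Burg High School");
    assert(obj1.name == "test1");   // the original is untouched
}
```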

> which may complicate things if the set methods need to
> do non-trivial initialization of internal structures (caches or internal
> representations, etc.).

Ahh, yes, and in this case you'd want to use the idea below, where you  
call a method to set the common parts and manually set the differences.

> Whereas if you hadn't needed to .dup, you could
> have gotten by without writing any set methods for your class, but now
> you have to.

create-set-call <- 'set' is kinda an integral part of the whole thing :P

> [...]
>> In my case you can call different functions in the initialisation
>> block, e.g.
>>
>> void defineObject(C c)
>> {
>>   c.school = "...";
>> }
>>
>> C c = new C() {
>>   defineObject()
>> }
>>
>> :)
>
> So the compiler has to recursively traverse function calls in the
> initialization block in order to check that all required fields are set?

Yes.  This was an off the cuff idea, but it /is/ a natural extension of  
the idea for the compiler to traverse the setters called inside the  
initialisation block, and ctors in the hierarchy, etc.

> That could entail some implementation issues, if said function
> calls can be arbitrarily complex. (If you have complex control logic in
> said functions, the compiler can't in general determine whether or not
> some paths containing assignment statements to the object's fields
> will be taken, since that would be equivalent to the halting problem.

All true.  The compiler has a couple of options to (re)solve these issues:
1. It could simply baulk at the complexity and report an error.
2. It could take the safe route and assume that any member whose
assignment it cannot verify is uninitialised, forcing manual init.

In fact, erroring at complexity might make for better code in many ways.   
You would have to perform your complex initialisation beforehand, store  
the result in a variable, and then construct/initblock your object.

It does limit your choice of style, but create-set-call already does
that.. and I'm not immediately against style limitations, assuming they
actually result in better code.

> Worse, the compiler would have to track aliases of the object being set,
> in order to know which assignment statements are setting fields in the
> object, and which are just computations on the side.)

No, aliasing would simply be ignored.  In fact, calling a setter on  
another object in an initblock should probably be an error.  Part of the  
whole "don't mix initialisation" goal I started with.  It does require  
strict properties.

> Furthermore, what if defineObject tries to do something with C other
> than setting up fields? The object would be in an illegal state since it
> hasn't been fully constructed yet.

That's an error.  This is why in my initial post I stated that we'd need
explicit/well defined properties.  All you would be allowed to call in an
initialisation block, on the object being initialised, are setter
properties.. and possibly methods or free functions which only call
setter properties.

>> >>I think another interesting idea is using the builder pattern with
>> >>create-set-call objects.
>> >>
>> >>For example, a builder template class could inspect the object for
>> >>UDA's indicating a data member which is required during
>> >>initialisation.  It would contain a bool[] to flag each member as
>> >>not/initialised and expose a setMember() method which would call the
>> >>underlying object setMember() and return a reference to itself.
>> >>
>> >>At some point, these setMember() methods would want to return another
>> >>template class which contained just a build() member.  I'm not sure
>> >>how/if this is possible in D.
>> >[...]
>> >
>> >Hmm, this is an interesting idea indeed. I think it may be possible to
>> >implement in the current language.
>>
>> The issue I think is the step where you want to mutate the return
>> type from the type with setX members to the type with build().
>
> I'm not sure I understand that sentence. Could you rephrase it?

I am imagining using a template to create a type which wraps the original
object.  The created type would expose setter properties for all the
mandatory members, and nothing else.  The user would call these setters,
using UFCS/chain style.  However, only after all the mandatory properties
have been set do we want to expose an additional member called build(),
which returns the constructed/initialised object.

So, an example:

class Foo {...}

auto f = Builder!(Foo)().setName("Regan").setAge(33).build();

The type of the object returned from the Builder!(Foo) is our first  
created type, which exposes setName() and setAge(), however the type  
returned from setAge (or whichever member assignment is done last) is the  
second created type, which either has all the set.. members plus build()  
or only build().  The build() method returns a Foo.

So, the type of 'f' above is Foo.

The goal here is to make build() statically available when Foo is  
completely initialised and not before.  Of course we could simplify all  
this by making it available immediately and throwing if some members are  
uninitialised - but that is a runtime check and I was angling for a  
compile time one.
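
A rough sketch of how the compile-time variant might be hand-written for one class. This is not the generic UDA-driven Builder!(Foo) template described above; FooBuilder, Field and fooBuilder() are names I've made up, and treating both members of Foo as mandatory is an assumption. The set of already-initialised members is carried in a template parameter, and build() is only declared when that set is complete.

```d
class Foo
{
    string name;   // treated as mandatory in this sketch
    int age;       // treated as mandatory in this sketch
}

// Bit flags recording which mandatory members have been set so far.
enum Field : uint { none = 0, name = 1, age = 2, all = name | age }

// Hypothetical hand-written builder; 'done' is part of the type.
struct FooBuilder(uint done = Field.none)
{
    private Foo obj;

    auto setName(string n)
    {
        obj.name = n;
        return FooBuilder!(done | Field.name)(obj);
    }

    auto setAge(int a)
    {
        obj.age = a;
        return FooBuilder!(done | Field.age)(obj);
    }

    // build() only exists once every mandatory member has been set.
    static if (done == Field.all)
    {
        Foo build() { return obj; }
    }
}

auto fooBuilder() { return FooBuilder!()(new Foo); }

unittest
{
    auto f = fooBuilder().setName("Regan").setAge(33).build();
    assert(f.name == "Regan" && f.age == 33);

    // Leaving out a mandatory member is rejected at compile time.
    static assert(!__traits(compiles, fooBuilder().setName("Regan").build()));
}
```

Each distinct combination of set members is its own template instantiation, so the number of generated types grows with the number of mandatory members.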

If you wanted to enforce a specific init ordering you could even produce a  
separate type containing only the next member to init, and from each  
setter return the next type in sequence - like a type state machine :p
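
The ordered version is simpler to sketch by hand, since each stage is just a plain struct exposing the one setter that is legal next. The stage names (NeedName, NeedAge, Ready) and startFoo() are invented for illustration, as is the choice of name-then-age as the required order.

```d
class Foo
{
    string name;
    int age;
}

// Each stage exposes only the next legal step.
struct NeedName
{
    private Foo obj;
    NeedAge setName(string n) { obj.name = n; return NeedAge(obj); }
}

struct NeedAge
{
    private Foo obj;
    Ready setAge(int a) { obj.age = a; return Ready(obj); }
}

struct Ready
{
    private Foo obj;
    Foo build() { return obj; }
}

NeedName startFoo() { return NeedName(new Foo); }

unittest
{
    auto f = startFoo().setName("Regan").setAge(33).build();
    assert(f.name == "Regan" && f.age == 33);

    // Out-of-order or incomplete initialisation does not compile.
    static assert(!__traits(compiles, startFoo().setAge(33)));
    static assert(!__traits(compiles, startFoo().setName("Regan").build()));
}
```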

The template bloat however..

> The problem with the struct approach is, what if you need a complex
> setup process, say constructing a graph with complex interconnections
> between nodes? In order to express such a thing, you have to essentially
> already create the object before you can pass the struct to the ctor,
> which kinda defeats the purpose. Similarly, your approach of an
> initialization block suffers from the limitation that the initialization
> is confined to that block, and you can't allow arbitrary code in that
> block (otherwise you could end up using an object that hasn't been fully
> constructed yet -- like the defineObject problem I pointed out above).

Yes, neither idea works for all possible use-cases.  Yours is naturally  
broader and less limiting because I was starting from a limited  
create-set-call style and imposing further limitation on how it can be  
used.

> Keeping in mind the create-set-call pattern and Perl's approach of
> "blessing" an object into a full-fledged class instance, I wonder if a
> more radical approach might be to have the language acknowledge that
> objects have two phases, a preinitialized state, and a fully-initialized
> state. These two would have distinct types *in the type system*, such
> that you cannot, for example, call post-init methods on a
> pre-initialization object, and you can't call an init method on a
> post-initialization object.

That is essentially the same idea as the builder template solution I talk  
about above :)

> The ctor would be the unique transition
> point which takes a preinitialized object, verifies compliance with
> class invariants, and returns a post-initialization object.

AKA build() above :)

R

