Cloning in D

Mon Sep 6 19:16:21 PDT 2010

On 2010-09-06 20:55:16 -0400, dsimcha <dsimcha at yahoo.com> said:

> == Quote from Michel Fortin (michel.fortin at michelf.com)'s article
>> I'm under the impression that a too permissive generic implementation
>> of cloning is going to break things in various scenarios.
> 
> In general you raise some very good issues, but IMHO the right way to do
> cloning is to have permissive generic cloning that works in the 90% of cases
> and can be easily overridden in the 10% of cases, not to require writing tons
> of boilerplate in the 90% of cases just to make sure it doesn't do the wrong
> thing by default in the 10% of cases.

To me, automatic cloning of everything (physical cloning in your 
parlance) looks more like a 50/50 works/doesn't-work ratio. I can only 
guess, but I'm probably used to different use cases than you are.

> A second point is that the thing that brought this whole cloning issue to my
> mind was making std.concurrency's message passing model less obtuse.  Right
> now it's hard to use for non-trivial things because there's no safe way to
> pass complex state between threads.  If we start allowing all kinds of
> exceptions to the "clone the **entire** object graph" rule, cloning will
> rapidly become useless for safely passing complex object graphs between
> threads.

This I agree with. I'm not arguing against automatic cloning per se, 
I'm just trying to show cases where it doesn't work well.

Personally, I'm rather skeptical that we can make it safe and efficient 
at the same time without better support from the language, something 
akin to the mythical "unique" type modifier representing a reference 
with no aliasing.

>> What if your
>> object or structure is part of a huge hierarchy where things contain
>> pointers to their parent (and indirectly to the whole hierarchy), will
>> the whole hierarchy be cloned?
> 
> Isn't that kind of the point?

Well, that depends. If you send each leaf of a tree as a message to 
various threads, presumably to perform something concurrently with the 
data in that leaf, then you may want only the leaf to be copied. You 
may not want every parent down to the root and then up to every other 
leaf to be copied along with each message just because the leaf you 
send has a pointer to its parent.

In fact, it depends on the situation. If what you want to do with the 
leaf in the other thread requires the leaf to know its parent and 
everything else, then sure you need to copy the whole hierarchy. But 
otherwise it's a horrible waste of memory and CPU to clone the whole 
object graph for each message, even though it won't affect the 
program's correctness.
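
To put a number on that waste, here is a minimal sketch. It is written in 
Python rather than D, only because Python's standard copy.deepcopy already 
behaves like the permissive generic clone being discussed. The Node class 
and the reachable helper are hypothetical names made up for this 
illustration, not part of any proposal.

```python
import copy

class Node:
    """Hypothetical tree node that keeps a back-pointer to its parent."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def reachable(node):
    """Count every node reachable from `node` via parent and child links."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if id(n) in seen:
            continue
        seen.add(id(n))
        stack.extend(n.children)
        if n.parent is not None:
            stack.append(n.parent)
    return len(seen)

root = Node("root")
leaves = [Node(f"leaf{i}", root) for i in range(100)]

# The generic deep copy follows the parent pointer, so cloning a single
# leaf also clones the root and the 99 sibling leaves.
clone = copy.deepcopy(leaves[0])
print(reachable(clone))  # 101
```

Sending each of the 100 leaves this way would copy the whole tree 100 
times, which is correct but wasteful when the receiver only needs the 
leaf's own data.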

And it's basically the same thing with observers. If your observer is a 
controller in charge of updating a window when something changes, you 
don't want to clone the observer, then clone the window and everything 
in it just because you're sending some piece of data to another thread. 
Perhaps the program architecture is just wrong, or perhaps that 
observer is a synchronized class capable of handling function calls 
from multiple threads so it doesn't really need to be copied.
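
As a sketch of what "not cloning the observer" could look like, the 
example below uses Python instead of D, since Python's copy module has a 
per-class override hook (__deepcopy__) for its generic deep copy. The 
Window and Model classes are hypothetical, and the choice to start the 
clone with no observers is an assumption made for the example; sharing 
the observers instead would be equally plausible for a synchronized 
observer.

```python
import copy

class Window:
    """Stand-in for a heavyweight GUI object that should not be cloned."""

class Model:
    """Hypothetical data object that notifies a list of observers."""
    def __init__(self, data, observers):
        self.data = data
        self.observers = observers

    def __deepcopy__(self, memo):
        # Assumption: the model owns its data but not its observers.
        clone = Model(None, [])          # the clone starts with no observers
        memo[id(self)] = clone           # register early so cycles resolve
        clone.data = copy.deepcopy(self.data, memo)
        return clone

window = Window()
original = Model({"values": [1, 2, 3]}, [window])

# deepcopy calls Model.__deepcopy__, so the window is never visited.
clone = copy.deepcopy(original)
```

The data is fully independent in the clone, while the observer list of 
the original is left untouched.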

>> What happens if your object or structure
>> maintains a reference to a singleton, will we get two instances of a
>> singleton?
> 
> Very good point.  I guess the reasonable use case for holding a reference to
> a singleton (instead of just using the globally accessible one) would be if
> it's polymorphic with some other object type?  If you're using message
> passing concurrency, most of your mutable singletons are probably
> thread-local, and what you probably really want to do is use the
> thread-local singleton of the thread you're passing to.

What intrigues me is how such a mechanism would work... although in my 
mind it's probably not even worth supporting at all, singletons be 
damned!

>> My understanding is that a data structure containing a pointer cannot
>> be cloned safely unless it contains some specific code to perform the
>> cloning. That's because the type system can't tell you which pointers
>> point to things owned by the struct/class and which ones need to be
>> discarded when cloning (such as a list of observers, or the parents of
>> a hierarchy).
> 
> This discussion is making me think we really need two kinds of cloning:
> Physical cloning would clone the entire object graph no matter what, such
> that the cloned object could be safely passed to another thread via
> std.concurrency and be given a unique type.  Logical cloning would be more
> like what you describe.  In general, this discussion has been incredibly
> useful because I had previously only considered physical cloning.

This is an interesting and valid observation. But I think you need to 
leave a door open to customization of the "physical cloning" case too. 
The ability to avoid cloning unnecessary data is as necessary as the 
ability to easily copy an entire object graph.
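
One possible shape for such a door, sketched in Python rather than D 
because its copy.deepcopy offers a ready-made generic clone to hook into: 
the clone stays generic, and a type only lists the fields it does not 
own. The Cloneable mixin, its `unowned` attribute, and the Leaf and Root 
classes are all hypothetical names invented for this sketch.

```python
import copy

class Cloneable:
    """Hypothetical mixin: deep-copies every field except those in `unowned`."""
    unowned = ()  # names of fields the clone must not follow

    def __deepcopy__(self, memo):
        clone = self.__class__.__new__(self.__class__)
        memo[id(self)] = clone
        for name, value in vars(self).items():
            if name in self.unowned:
                setattr(clone, name, None)   # drop the link, do not follow it
            else:
                setattr(clone, name, copy.deepcopy(value, memo))
        return clone

class Root(Cloneable):
    def __init__(self):
        self.leaves = []

class Leaf(Cloneable):
    unowned = ("parent",)   # the only line of per-type customization

    def __init__(self, payload, parent):
        self.payload = payload
        self.parent = parent

root = Root()
leaf = Leaf([1, 2, 3], root)
root.leaves.append(leaf)

leaf_clone = copy.deepcopy(leaf)   # copies the leaf and its payload only
root_clone = copy.deepcopy(root)   # copies the whole graph from the root
```

A limitation of this particular sketch is that the cloned leaves inside 
root_clone also lose their parent link instead of pointing at the cloned 
root; fixing that up would need extra code, which is the kind of 
per-type decision a generic clone cannot make by itself.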

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/