std.container.RedBlackTree versus C++ std::set

Thu Feb 14 04:22:59 PST 2013

First, thank you all for the replies!

Jonathan M Davis wrote:
> Also, it could make a big difference if you use gdc or ldc 
> rather than dmd.

Thank you for the suggestion, I'll check that.  However, I don't 
expect the GCC optimizer to reduce the number of calls in this 
specific test, since it did not kick in in the more obvious case 
as I mentioned.

Right now, I am more concerned with the performance of the 
front-end (RedBlackTree): if it currently can't be tuned to 
perform fast enough, I'll just know that I should implement my 
own version of a binary search tree container for my needs.

Rob T wrote:
> You can check if disabling the GC just before the insert process
> improves the performance. You may see 3x performance 
> improvement.
> Disabling is safe provided you re-enable, this can be done
> reliably with scope(exit) or something similar.

Hmm, it indeed performed faster, though not 3x in my setup.  
However, the number of copy constructor calls stayed the same.

Steven Schveighoffer wrote:
> I find the number of postblit calls excessive too.
> I will have to look into why that happens.
> I can say that once an element is allocated, it is not moved or 
> copied.

Could it be easily enforced in the library?  Something like this: 
when accessing data only for internal use in the data structure, 
use an interface that disables assignment?  Can such cases be 
distinguished from the ones when the library user actively wants 
to copy data?

Steven Schveighoffer wrote:
> I will note that std.container.RedBlackTree is a port of 
> dcollections'
> RBTree implementation.

Thank you for pointing that out.  Perhaps I'll have to try the 
test with that implementation, too.

monarch_dodra wrote:
> Keep in mind that C++ and D have very different philosophies
> regarding copy construction.
> ...
> The conclusion is that the comparison is not fair: D's pass by
> value is not *very* different from C++'s pass by ref (in amount
> of data copied).

Well, perhaps I didn't make myself clear enough.  The comparison 
I posted is not intended to be a fair comparison of languages in 
the general case!  It is just my use case stripped to a minimal 
example that still shows the number of calls correctly.  In the 
actual use case, I have something like

-----
struct element
{
	long value;
	int [] one;
	int [] two;

	this (this)
	{
		one = one.dup;
		two = two.dup;
	}
}
-----

Sure I could think of some other way to represent the data I need 
as an object, this one just seems the most intuitive.

monarch_dodra wrote:
> If you *do* want a fair-er comparison, then I'd suggest you
> implement a ".dup" on your object, and see how many times THAT
> gets called. Guess what: 0.

Right, but it's the copy constructor that gets called when I copy 
a struct, and in these cases, I actually want it to copy the 
arrays as well.  Otherwise, the element object may end up in a 
bad state.

Do you imply that I should implement dup for every struct I 
create and then leave the copy constructor empty?  As the library 
makes use of the copy constructor and not dup, I fear it would be 
an incorrect design, since the library will make broken copies of 
my struct and pass them around.

monarch_dodra wrote:
> I'm not saying D's approach is perfect, just that D's library is
> optimized for D-like types.

and Rob T wrote:
> I agree. There are cases where structs make a lot of sense,
> usually when they are very simple simple and contain no pointers
> or references, otherwise structs should be avoided in favor of
> classes to avoid doing copy/move constructors and to avoid
> concerns over performance optimizations. With classes, only
> certain points in your code require that a duplicate copy be 
> made
> of the class instance, the majority of code need only to pass
> around a reference which is the default behavior - easy and 
> fast!

So, you suggest passing things by reference once they grow beyond 
several bytes in size?  Wouldn't that mean having to constantly 
pay attention to some counter-intuitive code when I actually want 
my objects to behave like values, not references (e.g., matrices 
or other complex math objects)?  In my example case with arrays 
one and two, I do want my whole object to obey value semantics.

Once again, thank you all for attention.

-----
Ivan Kazmenko.