std.container.RedBlackTree versus C++ std::set

Ivan Kazmenko gassa at mail.ru
Wed Feb 13 15:22:02 PST 2013


Hi!

I'm learning to use D collections properly, and I'm looking for a 
sorted data structure with logarithmic access time (i.e., a 
binary search tree will do, but a hash table would not help). As 
far as I can see, std.container.RedBlackTree is exactly what I 
need. However, I am not sure if I use it as intended as its 
performance seems inferior to a C++ STL solution (e.g., std::set).

To be more specific, right now I wonder what is the best (or 
intended) way to store an object in the RedBlackTree: should it 
be a class reference, or a struct (passed by value), or something 
quirkier like an integer pointing into an array or a simple 
pointer. The rest of my program suggested to use structs, but the 
whole thing turned out to be rather slow, and the profiler told 
me that these structs are being copied around much more than I 
anticipated.

And so I wrote a minimalistic test program to check the number of 
copy (postblit) constructor calls. Here is the D version:

-----
import std.container;
import std.stdio;

immutable int LIMIT = 100000;

struct element
{
	static int postblit_counter;

	long x;

	int opCmp (ref element other)
	{
		return (x > other.x) - (x < other.x);
	}

	this (long nx)
	{
		x = nx;
	}

	this (ref this)
	{
		assert (x == x);
		postblit_counter++;
	}
}

alias RedBlackTree !(element) container;

void main ()
{
	auto f = new container ();
	element.postblit_counter = 0;
	foreach (i; 0..LIMIT)
	{
		f.insert (element (i));
	}
	writefln ("%s", element.postblit_counter);
}
-----

And now here is a C++ equivalent:

-----
#include <cstdio>
#include <set>
#include <stdint.h>

const int LIMIT = 100000;

using namespace std;

struct element
{
	static int postblit_counter;

	int64_t x;

	bool operator < (const element other) const
	{
		return (x < other.x);
	}

	element (int64_t nx)
	{
		x = nx;
	}

	element (const element & other)
	{
		postblit_counter++;
	}
};

int element::postblit_counter;

typedef set <element> container;

int main (void)
{
	container f;
	element::postblit_counter = 0;
	for (int i = 0; i < LIMIT; i++)
	{
		f.insert (element (i));
	}
	printf ("%d\n", element::postblit_counter);
	return 0;
}
-----

And the results are:
D2 (DMD 2.059, -O):             11,389,556
C++ (MinGW GCC 4.7.2, -O2):      3,072,387

As you can see, in order to insert 100,000 elements, D needs a 
few times more copy constructor calls than C++. However, as far 
as I know, the internal structure used by C++ std::set is the 
very same red-black tree! Am I doing something wrong? And if not, 
what is this cost paid for, are there any features that 
RedBlackTree possesses and STL set doesn't?

Personally, I don't see why at all we should call the copy 
constructor more than once per element. I mean, if we intend to 
build a generic data structure, we sure need an internal node 
object with some extra bytes (internal references and counters) 
per each element, at least in the case of a red-black tree. So 
why don't we just bind each element to that internal node once 
and for all, and then, as long as the node is in the structure, 
use the data contained in it only by reference? What do we 
achieve if we choose to use it by value somewhere?

And if the intended design is "use class references", will that 
create an overhead for garbage collector later on?

...well, thank you for reading it to this point.

-----
Ivan Kazmenko.


More information about the Digitalmars-d-learn mailing list