enforce()?

Steven Schveighoffer schveiguy at yahoo.com
Mon Jul 19 11:50:11 PDT 2010


On Mon, 19 Jul 2010 13:47:38 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> On 07/19/2010 12:23 PM, Steven Schveighoffer wrote:
>> On Mon, 19 Jul 2010 12:21:36 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>
>>> By the way, I'm still eagerly waiting for your red-black tree
>>> implementation.
>>
>> Sorry for the delay, I've been very busy at work, and I wanted to slip
>> in a couple druntime fixes for array appending.
>>
>> All that is left really is the unit testing, and making the docs more
>> phobos-ish.
>>
>>> I think it would be pure awesomeness if you massaged the red/black bit
>>> inside one of the pointers. I figured out a way of doing that without
>>> throwing off the garbage collector:
>>
>> Yes, that works (BTW, you don't need the union, I hate unions :), just
>> substitute _bits for _left everywhere, I think it would even work with a
>> moving GC).
>
> Walter told me that union is instrumental to keeping the compiler in the  
> know about such shenanigans. What does your idea look like? You mean  
> keeping a possibly misaligned RBTreeNode pointer and manipulating that?  
> I think that's a bit worse than unions because it transforms a sure  
> thing into a maybe works thing.

I don't pretend to know what ominous problems Walter knows about regarding  
the compiler's view, but here is what I'm thinking:

If a pointer points to the beginning of a node, and a node has at least  
one pointer in it (which it must, since it's a tree), then pointing one  
byte into the node means you're still pointing at the same block, making  
sure the GC doesn't collect.

Really, the generated code will be exactly the same as your solution, but  
it's less of a misuse of the type system IMO (believe it or not).  I'd  
rather use casts when you are trying to use something that's typed as one  
thing as something else.  When using unions, I usually expect only one  
member of the union to be valid at any one time.

And wouldn't a union be more egregious with the upcoming mostly-precise  
scanner?

>
>> But I don't know how important it is to save that extra 4 bytes/node. A
>> redblack node already has 3 pointers in it, the flag puts it to 16 bytes
>> instead of overhead instead of 12. It certainly can be an implementation
>> choice.
>>
>> I can look at left-leaning trees (I've had it on my todo list for
>> dcollections too).
>
> Sounds great. If the payload is one word, on a 32-bit system we'd have  
> 20 bytes per node. I seem to recall the current GC can allocate 16 bytes  
> and then 32 bytes and then 48 bytes, so with the embedded bit we're  
> looking at halving the total allocated size. Not too shoddy!

Not quite :)  There is one byte for padding because (insert gasp-inspiring  
music accent) all struct heap allocations are allocated through newArray  
with a size of 1.  I discovered this when working on the array append  
patch.

So even a 16-byte struct requires a 32-byte block.

> Then the relative overhead of that extra word is not felt up until a  
> payload of 20 bytes, at which point again it jumps to 33%.

Most of this is mitigated if you have a custom allocator that allocates an  
array of nodes at once (what I do in dcollections).  As a simple  
implementation, you could allocate enough nodes to be under a certain  
threshold of wasted space.

> I wonder what things look like (alignment, granularity) for the 64-bit  
> implementation.

They must be 8-byte aligned, and have 3 8-byte pointers, so that means at  
least 24 bytes.  If you store an int, then it will still fit in the  
32-byte block.  I don't know what's planned as the minimum size for 64-bit  
GC.

-Steve


More information about the Digitalmars-d mailing list