Threading and the Garbage handler mess.

Sean Kelly sean at invisibleduck.org
Sat Sep 6 08:52:57 PDT 2008


Alan Knowles wrote:
...
> 
> The problems..
> -------------------------------------------------
> A) CROSS-THREAD FREES ARE EXTREMELY DANGEROUS
> 
>  The current GC implementations free across threads both explicitly with 
> delete, and implicitly with genCollect(), which is just downright dangerous.
> 
> Consider this example - the const char gotcha
> 
> .......................................
> extern (C) {
>     void* mylib_new();
>     void mylib_dosomething(void* x, const char* z);
> }
> ... d code ...
> void* mylib;   // set elsewhere with mylib = mylib_new();
> void risky_function(char[] a) {
>     mylib_dosomething(mylib, a.toStringz());
> }
> .......................................
> 
> 
> In the above code I've left const char* in there, as that's what's in 
> the C headers, but is not actually in the D code.
> 
> What is critical about this code is that while mylib* x is active, it 
> has been passed 'char* z' from D, as in the risky_function() example. The 
> garbage collector can only look at memory it knows about to find 
> pointers to 'char* z', so as far as it is concerned z is ready to be 
> collected... This problem can easily crop up with the libc functions that 
> expect const char*'s and store data which another function may query 
> later, only to find it has been overwritten with something completely 
> different (imagine function pointers!!)

There are very few C standard library routines which store data that 
other functions reference, largely because such routines aren't 
thread-safe.  So I don't think this is actually an issue in practice.  But more 
generally, you're right.  If you're using an external library that 
stores a reference to GCed data and you intend to discard your own 
reference to that data then you have to tell the GC not to collect it.
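
As a rough sketch of what "telling the GC" can look like, assuming 
druntime's core.memory API (GC.addRoot / GC.removeRoot).  MyLib, 
mylibNew, setName and clearName are hypothetical stand-ins for an 
external library, not anything from the original example:

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical stand-in for a C library object. It lives in malloc'd
// memory, which the GC does not scan for pointers.
struct MyLib
{
    const(char)* name;
}

MyLib* mylibNew()
{
    auto lib = cast(MyLib*) malloc(MyLib.sizeof);
    assert(lib !is null, "malloc failed");
    lib.name = null;
    return lib;
}

// Drop the library's reference and let the GC reclaim the string again.
void clearName(MyLib* lib)
{
    if (lib.name !is null)
    {
        GC.removeRoot(lib.name);
        lib.name = null;
    }
}

// Hand a D string to the library and register it as a GC root, so it
// stays alive even though the only remaining pointer is in malloc'd memory.
void setName(MyLib* lib, string s)
{
    clearName(lib);
    auto z = toStringz(s);
    GC.addRoot(z);
    lib.name = z;
}
```

The root has to be removed again once the library no longer holds the 
pointer, otherwise the string is never collected.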

> -------------------------------------------------
> B) NOT RUNNING genCollect() WILL JUST KILL MEMORY
> Given the example above, you may think that turning off the GC, not 
> running genCollect(), and then trying to free memory yourself as you go 
> might be a better idea.
> 
> Unfortunately, D generally depends on and expects you to be running the 
> GC! A really good example is the toStringz() code.
> 
> What toStringz does is this:
> char* toStringz(char[] s) {
>     char[] copy = new char[s.length + 1];
>     copy[0 .. s.length] = s;
>     copy[s.length] = 0; // terminate with \0
>     return copy.ptr;
> }
> 
> what char[] looks like under gdb is
> struct { length = 123; ptr = "xxxxxxxxxx" }
> 
> The above code is probably doing a malloc() for the struct and a 
> malloc() for the ptr data, but leaves the struct hanging, expecting it to 
> be garbage collected, and returns the ptr, which is then sent on to an 
> external method and might be expected to stay constant. ***Well, not 100% 
> sure about that behaviour; the struct may not be malloced, but just 
> allocated on the stack..***

You're right.  String operations in D necessarily assume the presence of 
a GC.  Concatenation, appending, etc, all allocate new memory and leave 
the old memory untouched... assuming that it may still be referenced by 
something.
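
A small sketch of that behaviour (the function and variable names are 
made up for illustration).  Concatenation with ~ always produces a new 
array, so anything still pointing at the old memory sees it unchanged:

```d
// Returns true if concatenation allocated new memory and left the
// original array untouched.
bool concatLeavesOriginal()
{
    char[] original = "abc".dup;     // GC-allocated, mutable copy
    char[] other = original;         // second reference to the same memory
    char[] suffix = "def".dup;

    char[] joined = original ~ suffix; // allocates a fresh array
    joined[0] = 'X';                   // only the new array is modified

    return other == "abc"
        && joined == "Xbcdef"
        && joined.ptr !is original.ptr;
}
```

Nothing here ever frees the memory behind "original"; it is left for 
the GC to reclaim once no reference remains.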

> -------------------------------------------------
> POSSIBLE SOLUTION?
> -------------------------------------------------
> 
> What I've hacked into the GC so far:
> 
> - getIdx() for std.thread.Thread, exposing the private thread.idx. ??? 
> How reliable is this ??? - should be OK as long as you never delete a 
> thread....
> 
> - A GC-log-like array recording which thread owns each bit of 
> malloc()'d memory.
> 
> - Checks in the genCollect code to ensure that it does not try to free 
> other threads' memory.
> 
> - Warnings in free() when the program does exactly that..

What about transferring data between threads?  That aside, I think the 
new "shared" semantics in D2 address this fairly well.
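
A minimal sketch of the idea, assuming current D2 with core.thread and 
core.atomic (the variable and function names are invented for the 
example).  Module-level data is thread-local unless it is marked 
shared, so the type tells you which data can be seen by another thread:

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.thread : Thread;

int perThread;                 // thread-local: every thread gets its own copy
shared int crossThread;        // one instance, visible to all threads
shared int seenByWorker = -1;  // records what the worker saw in perThread

void runSharedDemo()
{
    perThread = 42;            // changes only the calling thread's copy

    auto worker = new Thread({
        // The worker reads its own, freshly initialised copy of perThread.
        atomicStore(seenByWorker, perThread);
        // Shared data has to be accessed through atomic operations
        // (or under a lock).
        atomicOp!"+="(crossThread, 1);
    });
    worker.start();
    worker.join();
}
```

After runSharedDemo() returns, the caller's perThread is still 42 while 
the worker observed its own zero-initialised copy.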

> ONGOING IDEAS
> -------------------------------------------------
> - One gcx pool for each thread.

This needs language support to work well, since some restrictions must 
be placed on the use of static data if GC scanning of all threads is to 
be avoided.


Sean


