Threading and the Garbage handler mess.

Alan Knowles alan at akbkhome.com
Sat Sep 6 08:10:48 PDT 2008


If anyone noticed, this week I posted a few questions, and then a few 
bits of code, relating to threading and the GC.

Between tearing my hair out and generally verging on total madness, this 
week I'm beginning to get to grips with the GC in a threaded environment.

Basically this statement is true for both Phobos and Tango (although 
I've examined the Phobos GC far more than Tango's):

-------------------------------------------------
"THE GC AS DELIVERED IS COMPLETELY BROKEN FOR REAL THREADED APPLICATIONS, 
and you should not try to write threaded applications without expecting 
to do a large amount of hacking on the GC"
-------------------------------------------------

Background  - HOW THE GC WORKS:
-------------------------------------------------
Let's first explain the basic ideas behind the garbage collector:

Since most things in D land are malloc'd via the GC, the GC knows all 
about the memory you have allocated. It can also look at the stack 
(currently running memory, e.g. the ints/pointers that you have created 
in your methods).

Basically what it does is this:

- loop through all the memory it knows about, and see if any of it 
points to any of the memory it has allocated.

- if the memory it has allocated is not referenced by the memory it 
knows about, then it's dead meat, and can be returned to the free list 
so it can be made available to any 'new' malloc() call.

There is quite a bit more in there, like a really nice design of 
buckets and the linked lists that fill them, enabling quick allocation 
depending on what size of memory is requested...

Anyway I hope you get the point..
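
To make the two steps above concrete, here is a toy mark-and-sweep 
sketch. It is NOT the Phobos or Tango collector: the "heap" is just an 
array, "pointers" are array indices, and the names Block and collect are 
made up for illustration. It assumes a modern D2 compiler.

```d
import std.stdio;

// Toy model only: blocks are array slots, "pointers" are slot indices.
struct Block {
    size_t[] refs;   // indices of other blocks this block points to
    bool marked;     // set during the mark phase
    bool free;       // true once the block is back on the free list
}

// Mark everything reachable from the roots, then sweep the rest.
// Returns the number of blocks freed by this run.
size_t collect(Block[] heap, size_t[] roots) {
    foreach (ref b; heap) b.marked = false;

    size_t[] work = roots.dup;           // explicit work list, no recursion
    while (work.length) {
        size_t i = work[$ - 1];
        work = work[0 .. $ - 1];
        if (heap[i].free || heap[i].marked) continue;
        heap[i].marked = true;
        work ~= heap[i].refs;            // follow the block's pointers
    }

    size_t freed = 0;
    foreach (ref b; heap) {
        if (!b.free && !b.marked) {      // nobody points here: dead meat
            b.free = true;
            b.refs = null;
            ++freed;
        }
    }
    return freed;
}

void main() {
    auto heap = new Block[4];
    heap[0].refs = [1];     // 0 -> 1
    heap[2].refs = [3];     // 2 -> 3, but nothing points to 2
    size_t[] roots = [0];   // the "stack" only references block 0

    writeln(collect(heap, roots));   // prints 2 (blocks 2 and 3)
}
```

The point to take away is that anything the collector cannot reach from 
memory it scans looks dead to it, whether or not something outside that 
memory still holds the pointer.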




The problems..
-------------------------------------------------
A) CROSS THREAD FREE's ARE EXTREMELY DANGEROUS

  The current GC implementations free across threads, both explicitly 
with delete and implicitly with genCollect(), which is just downright 
dangerous.

Consider this example - the const char* gotcha

.......................................
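extern (C) {
	void* mylib_new();
	void mylib_dosomething(void* x, const char* z);
}
... d code ...
void* mylib;	// handle obtained once from mylib_new()

void risky_function(char[] a) {
	// the C side may keep this pointer; the GC cannot see that copy
	mylib_dosomething(mylib, a.toStringz());
}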
.......................................


In the above code I've left const char* in there, as that's what's in 
the C headers, but is not actually in the D code.

What is critical about this code is that the mylib handle stays active 
after it has been passed 'char* z' from D, as in risky_function(). The 
garbage collector can only look for pointers to z in memory it knows 
about, and the copy held inside the C library is not part of that, so as 
far as the GC is concerned z is ready to be collected... This problem 
can easily crop up with the libc functions that expect const char*'s and 
store the data, which another call may query later and find has been 
overwritten with something completely different (imagine function 
pointers!!)

The only workaround for this is to addRoot() and store the return value 
of any 'toStringz()' call which is delivered into a const char* C call.. 
- even this may not help, as you can accidentally delete these objects..
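
A minimal sketch of that workaround, assuming a modern D2 toolchain where 
the calls are GC.addRoot / GC.removeRoot in core.memory (the 2008 Phobos 
API discussed here lives in std.gc instead). mylib_store, pinForC and 
unpinFromC are hypothetical names, and the "library" is faked in D so the 
example is self-contained.

```d
import core.memory : GC;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical stand-in for a C library that keeps the pointer it is
// given. A real library would keep it in its own malloc'd memory, which
// the GC does not scan.
__gshared const(char)* stored;
extern (C) void mylib_store(const(char)* z) { stored = z; }

// Pin a D string for as long as the C side holds on to it.
const(char)* pinForC(const(char)[] s) {
    auto z = toStringz(s);        // GC-allocated, zero-terminated copy
    GC.addRoot(cast(void*) z);    // the GC now treats the copy as live
    return z;
}

// Call once the C side has let go of the pointer.
void unpinFromC(const(char)* z) {
    GC.removeRoot(cast(void*) z);
}

void main() {
    auto z = pinForC("hello".dup);
    mylib_store(z);
    GC.collect();                 // a collection here must not reclaim z
    assert(strlen(stored) == 5);
    unpinFromC(z);                // from here on the GC may collect it
}
```

Note that rooting only protects against the collector; an explicit delete 
on the same pointer would still pull the rug out from under the C side.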

Normally all of this is not a common problem in a non-threaded 
application, as you would probably just run the genCollect() after you 
have finished dealing with mylib() - and everything will get cleaned up 
nicely.

Unfortunately, with a multi-threaded application, if you run 
genCollect() at the wrong time (e.g. in another thread), you will 
destroy the char* that was supposed to be constant...


-------------------------------------------------
B) NOT RUNNING genCollect() WILL JUST KILL MEMORY
Given the example above, you may think that turning off the GC and not 
running genCollect(), then trying to free memory yourself as you go, 
might be a better idea.

Unfortunately D is generally dependent on, and expects you to be 
running, the GC! A really good example is the toStringz() code.

What toStringz() does is this:
char* toStringz(char[] s) {
	char[] copy = new char[s.length+1];
	copy[0..s.length] = s;
	copy[s.length] =0; // pad end with \0
	return copy.ptr;
}

What char[] looks like under gdb is
struct { length = 123; ptr = "xxxxxxxxxx" }

The above code is probably doing a malloc() for the struct and a 
malloc() for the ptr data, but leaves the struct hanging, expecting it 
to be garbage collected, and returns the ptr, which is then sent on to 
an external method and might be expected to be constant. ***Well, I'm 
not 100% sure about that behaviour; the struct may not be malloc'd, but 
just allocated on the stack..***

There are many other examples in D of code where trying to manage and 
clear memory explicitly is difficult or near impossible (associative 
arrays for example) - and the expectation is that the garbage collector 
be used to sort it out..

The reality is that it's extremely difficult to be sure that you are not 
leaking memory. There are no built-in tools currently for testing this 
(there is code on dsource - I've not tested it yet), and it kind of 
goes against the principle of D, where you can for the most part throw 
away all that memory allocation code and concentrate on getting things 
done..

-------------------------------------------------
POSSIBLE SOLUTION?
-------------------------------------------------

What I've hacked into the GC so far:

- getIdx() for std.thread.Thread, exposing the private thread.idx 
(??? how reliable is this ??? - should be OK as long as you never delete 
a thread....)

- A GC-log-like array recording which thread owns each bit of 
malloc()'d memory.

- Checks in the genCollect code to ensure that it does not try to free 
another thread's memory.

- Warnings in free() when the program does exactly that..

- Information in LOGGING so that a malloc can be traced to a line of 
code (class/method etc. - backtrace_symbols does not appear to return 
line numbers...) - see recent post
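
A rough sketch of the ownership idea from the list above, written as 
ordinary user code rather than as the actual GC patch. The names owner, 
noteAlloc and mayFree are invented for illustration, it uses the Thread 
object itself rather than thread.idx, and it assumes modern D2 
(core.thread, core.sync.mutex).

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical ownership log: block address -> thread that allocated it.
__gshared Thread[void*] owner;
__gshared Mutex ownerLock;

shared static this() { ownerLock = new Mutex; }

// Record the calling thread as the owner of p.
void noteAlloc(void* p) {
    ownerLock.lock();
    scope (exit) ownerLock.unlock();
    owner[p] = Thread.getThis();
}

// True only if the calling thread is the one that allocated p.
bool mayFree(void* p) {
    ownerLock.lock();
    scope (exit) ownerLock.unlock();
    auto o = p in owner;
    return o !is null && *o is Thread.getThis();
}

void main() {
    auto p = new int;
    noteAlloc(p);
    assert(mayFree(p));            // the allocating thread may free it

    bool otherMayFree = true;
    auto t = new Thread({ otherMayFree = mayFree(p); });
    t.start();
    t.join();
    assert(!otherMayFree);         // another thread is refused
}
```

A collector or free() that consults something like mayFree() before 
releasing a block would skip, or warn about, blocks owned by other 
threads; the lock on every lookup is part of why this approach is slow.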


At this point running a threaded server under heavy load is a lot more 
stable than before. However, as you can understand by looking at the 
descriptions above, it still has some serious issues.

- Searching all the threads' memory each time it wants to tidy memory a 
little is horribly inefficient..

- There are still issues with the const char* that should be checked for..

ONGOING IDEAS
-------------------------------------------------
- One gcx pool for each thread.

This seems like the most sensible solution.. - it would mean that the 
genCollect() run on a specific thread would only look at memory related 
to that thread.
** Implications
  + memory allocated in one thread and used in another could not be 
freed by that thread. -- need to think more about this.
  ++ if you pass a complex structure when starting a thread, you will 
have to flag the root in the parent thread to ensure that it's not 
freed by the starting thread.
  ++ the child thread will then need a way to flag that it's done, so 
the starting thread knows it can free it...
  ++ making changes to an object in the child thread may make memory 
allocation problematic.... -> the child thread does not know about the 
parent?!?!



- rooting the results from toStringz() - and warning users!!!!!
- using strncpy and removing the need for the struct?? (not sure if it 
makes much difference - it's based on the struct assumption made 
previously)

Suggested code for toStringz() = something like:

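char* toStringz(char[] s) {
	// allocate outside the GC; D needs the explicit cast from void*
	char* c = cast(char*) malloc(s.length + 1);
	strncpy(c, s.ptr, s.length);
	c[s.length] = 0;	// terminate with \0
	gc.addRoot(c);
	return c;
}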

free() should check the roots to see if the pointer is registered there, 
to prevent it being free'd by accident (genCollect already does), with 
an additional method rootFree() which basically does removeRoot() + 
free()...
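
A sketch of how that guard could behave, using a plain table in place of 
the GC's real root list. rooted, markRooted, guardedFree and this 
rootFree are hypothetical helpers, not existing GC calls, and the table 
is not thread-safe as written. Assumes modern D2 with core.stdc.stdlib.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical registry of pointers that have been handed to C code;
// stands in for the GC's own root list.
__gshared bool[void*] rooted;

void markRooted(void* p) { rooted[p] = true; }

// Plain free: refuses to release a pointer that is still rooted.
bool guardedFree(void* p) {
    if (p in rooted) return false;   // a real version would also warn
    free(p);
    return true;
}

// Explicit release for rooted pointers: unregister first, then free.
void rootFree(void* p) {
    rooted.remove(p);
    free(p);
}

void main() {
    auto p = malloc(16);
    markRooted(p);
    assert(!guardedFree(p));   // still rooted, so it is left alone
    rootFree(p);               // the only way to actually release it
    assert(p !in rooted);
}
```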


Anyway, I'd be interested in feedback.
Regards
Alan





