General Problems for GC'ed Applications?

Unknown W. Brackets unknown at simplemachines.org
Sun Jul 23 01:58:38 PDT 2006


Personally, I see it as good coding practice in D to use delete on 
those things you know you need or want to delete immediately.  Just 
don't go nuts about the things you aren't sure of.
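
For example, here is a minimal sketch of what I mean (Buffer and 
process are made-up names for illustration; delete is D's built-in 
operator for immediate deallocation):

class Buffer
{
    ubyte[] data;
    this(size_t n) { data = new ubyte[n]; }
}

void process()
{
    auto buf = new Buffer(4096);
    scope (exit) delete buf;  // lifetime is certain, so free it now

    // ... use buf.data ...
}
// Anything whose lifetime is genuinely unclear is simply left
// for the collector.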

Typically, in a well-programmed piece of software, the number of 
allocations you can't be entirely sure about deleting will be much 
smaller than the number you are sure about.  Without a GC, this small 
percentage causes a huge amount of headache.  With one, you can ignore 
those cases and worry only about the easily-determined ones.

Throwing caution to the wind and using the GC for everything isn't, in 
my opinion, a good idea.  If anyone disagrees with me, I'd like to hear 
it, but I know of no reason why you'd want to do that.  I realize many 
people do throw caution to the wind, but many people also drive drunk. 
That doesn't mean anyone necessarily recommends it.

#1 does not seem to be a severe problem to me.  Memory usage will be 
higher, but that's exactly why the OS's cache and buffer space is 
flexible.

#2 would be a problem for any application.  I do not estimate that my 
application would use even 50% more memory if I did not track down 
every allocation but deleted only when I knew for sure.

As such, I think it would be difficult to show a piece of well-written 
software that runs into this problem with the GC but not without it. 
IMHO, if a program requires so much memory that swapping starts 
happening frequently, that's a bug, a design flaw, or a fact of life. 
It's not the garbage collector.

Yes, a collect will cause swapping - if you have that much memory in 
use.  Ideally, collects won't happen often (they can't just happen 
whenever anyway; they happen when you use up milestones of memory), and 
you can disable/enable the GC and run collects manually when it makes 
the most sense for your software, as sketched below.
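
For instance (a sketch only - workRemains and doWork are hypothetical 
stubs, and the GC calls are the disable/enable/fullCollect functions 
from Phobos' std.gc module):

import std.gc;

// Hypothetical main loop: keep collections out of the hot path and
// trigger them at boundaries we choose.
void mainLoop()
{
    while (workRemains())
    {
        std.gc.disable();      // no collection in the middle of a task
        doWork();
        std.gc.enable();

        std.gc.fullCollect();  // collect between tasks, where a pause
                               // hurts the least
    }
}

bool workRemains() { return false; }  // stubs so the sketch is
void doWork() {}                      // self-contained

The point is just that the collection happens at a boundary you pick, 
not in the middle of latency-sensitive work.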

Failing that, software which is known to use a large amount of memory 
may need to use manual memory management.  Such software will likely 
perform poorly anyway.

#3 depends on the above cases being true.  Some competition may indeed 
happen, but I think it would take more analysis to see how strongly 
this would affect things.

Nor is this a new problem; programs will always compete for main 
memory.  #3 is only a problem when you take the higher memory usage 
into account, and it is negligible if the memory usage is not much 
higher.

I would say these problems are heavily dependent on application design, 
code quality, and purpose.  In other words, these are more problems for 
the programmer than for the garbage collector.  But this is only my opinion.

As for the risks: (a) I see as the OS's problem, where #1 is an issue; 
(b) I see as a problem regardless of garbage collection; and with (c) 
I agree - as mentioned above, these should be balanced, and a "leak" 
should be left for the GC only when deleting it manually is of no 
benefit.

I'm afraid I'm not terribly familiar with the dining philosophers 
problem, but again, I think this is a problem only somewhat aggravated 
by garbage collection.

Most of your post seems to be wholly concerned with applications that 
use at least the exact figure of Too Much Memory (tm).  While I realize 
there are several special cases where such usage is necessary or 
acceptable, I am at a loss to think of any general or practical 
reasons, aside from poor code quality... or database systems.

The software on my home computer which uses the most memory uses 
nothing even close to the amount of system memory I have.  Indeed, the 
total memory use on my machine with several programs running is still 
less than what typically shipped with machines even a few years ago 
(I'm putting that number at 512 megabytes).

The servers I manage have fairly standard amounts of RAM (let's say 2 
gigabytes on average).  Typically, even in periods of high traffic, 
they do not use much swap.  In fact, Apache is garbage collected 
(via pools/subpools, that is) and doesn't seem to be a problem at all. 
PHP, which is used on many of these servers, is not garbage collected 
(it uses traditional memory management) and tends to hog memory just a 
bit.

MySQL and other database systems obviously take the largest chunk.  For 
such systems, you don't want any of your data paged, ever.  You 
typically have large, static cache areas which you don't even want 
garbage collected, and you never realloc/free until the end of the 
process.  These areas would not be garbage collected and the data in 
them would not be scanned by the garbage collector.
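
In D, a sketch of such an area might allocate straight from the C 
heap, so the collector neither frees nor scans it (StaticCache and the 
helper functions here are made-up names):

import std.c.stdlib;  // C runtime malloc/free

struct StaticCache
{
    ubyte* data;
    size_t size;
}

// A big, fixed cache the GC will never scan or free.
StaticCache createCache(size_t bytes)
{
    StaticCache c;
    c.data = cast(ubyte*) malloc(bytes);  // invisible to the collector
    c.size = bytes;
    return c;
}

// Released exactly once, at process shutdown.
void destroyCache(StaticCache* c)
{
    free(c.data);
    c.data = null;
}

// Note: if a region like this stored pointers into GC-managed memory,
// it would have to be registered via std.gc.addRange; plain row/page
// data does not.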

In fact, you'd normally want your database on a dedicated machine with 
extra helpings of memory for these reasons.  Whether or not it was 
garbage collected wouldn't affect whether it was suitable only as a 
single instance on a single machine.  As above, this is a matter of the 
software, not of the GC.

So my suggestion is that you look at the limitations of your software, 
design, and development team, and make your decisions from there.  A 
sweeping statement that garbage collection causes a dining philosophers 
problem just doesn't seem correct to me.

Thanks,
-[Unknown]


> I see three problems:
> 
> 1) The typical behaviour of a GC'ed application is to acquire more 
> and more main memory without actually needing it.  Hence every GC'ed 
> application forces the OS to diminish the size of the system cache 
> held in main memory until the GC of the application kicks in.
> 
> 2) If the available main memory is insufficient for the true memory 
> requirements of the application and the OS provides virtual memory 
> by swapping out to secondary storage, every run of the GC forces 
> the OS to slowly swap back all data for this application from 
> secondary storage, and runs of the GC occur frequently, because main 
> memory is tight.
> 
> 3) If there is more than one GC'ed application running, those 
> applications compete for the available main memory.
> 
> 
> I see four risks:
> 
> a) from 1: The overall reaction of the system gets slower in favor 
> of the GC'ed application.
> 
> b) from 2: Projects decomposed into several subtasks may face 
> severe runtime problems when integrating the independently and 
> successfully tested modules.
> 
> c) from 2 and b: The reduction of man time in the development and 
> maintenance phases from not being forced to avoid memory leaks may 
> be outweighed by an increase of machine time by a factor of 50 or 
> more.
> 
> d) from 1 and 3: A more complicated version of the dining 
> philosophers problem is introduced. In this version every 
> philosopher is allowed to rush around the table and grab all unused 
> forks and declare them used, before he starts to eat---and nobody 
> can force him to put them back on the table.
> 
> 
> Conclusion:
> 
> I know that solving the original dining philosophers problem took 
> several years, and I do not see any awareness of this more 
> complicated version that arises from using a GC.
> Risks c) and d) are true killers.
> Therefore GC'ed applications currently seem to be suitable only if 
> they are running single instance on a machine well equipped with 
> main memory and no other GC'ed applications are used.
> To assure that these conditions hold, the GC should maintain 
> statistics on the duration of its runs and the frequency of calls. 
> This would allow the GC to throw an "Almost out of memory" exception.


