O(N) Garbage collection?

Sat Feb 19 12:04:58 PST 2011

Sat, 19 Feb 2011 14:32:27 -0500, dsimcha wrote:

> On 2/19/2011 12:50 PM, Ulrik Mikaelsson wrote:
>> Just a thought; I guess the references to the non-GC-scanned strings
>> are held in GC-scanned memory, right? Are the number of such references
>> also increased linearly?
> 
> Well, first of all, the benchmark I posted seems to indicate otherwise.
>   Second of all, I was running this program before on yeast DNA and it
> was ridiculously fast.  Then I tried to do the same thing on human DNA
> and it became slow as molasses.  Roughly speaking, w/o getting into the
> biology much, I've got one string for each gene.  Yeast have about 1/3
> as many genes as humans, but the genes are on average about 100 times
> smaller.  Therefore, the difference should be at most a small constant
> factor and in actuality it's a huge constant factor.
> 
> Note:  I know I could make the program in question a lot more space
> efficient, and that's what I ended up doing.  It works now.  It's just
> that it was originally written for yeast, where space efficiency is
> obviously not a concern, and I would have liked to just try a one-off
> calculation on the human genome without having to rewrite portions of
> it.

Probably one reason for this behavior is the lack of testing. My desktop 
only has 24 GB of DDR3. I have another machine with 16 GB of DDR2, but 
don't know how to combine the address spaces via clustering. This would 
also horribly drag down GC performance. Even JVM is badly tuned for 
larger systems, they might use the Azul Java runtimes instead..