The problem with the D GC

Mon Jan 8 23:20:55 PST 2007

%u wrote:
> == Quote from Bill Baxter (dnewsgroup at billbaxter.com)'s article
>> Here's a slightly less contrived version of Oskar's gc test.
>> import std.math;
>> import std.random;
>> import std.stdio;
>> void main() {
>>      // The real memory use, ~40 MB
>>      double[] data;
>>      data.length = 5_000_000;
>>      foreach(i, inout x; data) {
>>          x = sin(cast(double)i/data.length);
>>          //x = 1;
>>      }
>>      int count = 0;
>>      int gcount = 0;
>>      while(1) {
>>          // simulate reading a few kb of data
>>          double[] incoming;
>>          incoming.length = 1000 + rand() % 5000;
>>          foreach(i, inout x; incoming) {
>>              x = sin(cast(double)i/incoming.length);
>>              //x = 5;
>>          }
>>          // do something with the data...
>>          // print status message every so often
>>          count += incoming.length;
>>          if (count > 1_000_000) {
>>              count = 0;
>>              gcount++;
>>              writefln("%s processed", gcount);
>>          }
>>      }
>> }
>> This one uses doubles instead of uints and the data is the sin of some
>> number.  These are _very_ realistic values for numeric data to have.
>> The same effect can be seen.  Instead of hovering around 40MB, the
>> memory use grows and grows and performance slows and slows.
>> This seems to be a very big issue.  The GC seems to be pretty much
>> useless right now if you're going to have a lot of floating point data
>> in your app.
>> --bb
>> Oskar Linde wrote:
>>> After having fought a while with D programs with runaway memory leaks,
>>> I've unfortunately had to come to the conclusion that the D GC is not
>>> ready for production use. The problem is what I'd call "spurious
>>> pointers". That is, random data (strings, numbers, image data, audio or
>>> whatever) appearing to the GC to be full of pointers to all over the
>>> memory space.
>>>
>>> Consider this simple program. It is designed to have a memory footprint
>>> of about 20 MB and then continuously process data.
>>>
> 
> Agreed. This needs to be changed. Is the GC in the Tango
> library any better?

It's a modified version of the DMD GC.  The "don't scan blocks 
containing elements smaller than pointer size" feature is built-in, and 
there is user-level control of that behavior on a per-block basis, among 
other things.  But it's still the same old mark/sweep GC at heart.
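
To make the per-block control concrete, here is a rough sketch of marking a
block of doubles as "contains no pointers" so the collector skips it during
the mark phase.  This is an assumption-laden illustration, not the Tango code
itself: it uses the core.memory interface from present-day druntime
(GC.malloc, GC.BlkAttr.NO_SCAN, GC.getAttr, GC.setAttr, GC.clrAttr), and the
names in the 2007 Tango runtime may differ.  allocUnscanned is a hypothetical
helper made up for this example.

```d
import core.memory : GC;
import std.math : sin;
import std.stdio : writefln;

// Hypothetical helper: allocate n doubles in a GC block flagged NO_SCAN,
// so the collector never interprets the contents as pointers.
// The memory is uninitialized, exactly as GC.malloc returns it.
double[] allocUnscanned(size_t n)
{
    auto p = cast(double*) GC.malloc(n * double.sizeof, GC.BlkAttr.NO_SCAN);
    return p[0 .. n];
}

void main()
{
    // Same ~40 MB of numeric data as in the test above.
    auto data = allocUnscanned(5_000_000);
    foreach (i, ref x; data)
        x = sin(cast(double) i / data.length);

    // The attribute can be queried per block...
    assert(GC.getAttr(data.ptr) & GC.BlkAttr.NO_SCAN);

    // ...and switched off and on again at user level.
    GC.clrAttr(data.ptr, GC.BlkAttr.NO_SCAN);
    assert(!(GC.getAttr(data.ptr) & GC.BlkAttr.NO_SCAN));
    GC.setAttr(data.ptr, GC.BlkAttr.NO_SCAN);

    writefln("%s doubles held in an unscanned block", data.length);
}
```

Note that on a 32-bit target a double is 8 bytes, which is not smaller than a
4-byte pointer, so a rule based purely on element size would not by itself
cover the double[] arrays in Bill's test.  Marking those blocks explicitly, as
sketched here, is where the per-block control would come in.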

Sean