deprecated delete and manual memory management
Steven Schveighoffer
schveiguy at yahoo.com
Fri Apr 29 06:03:12 PDT 2011
On Thu, 28 Apr 2011 16:14:48 -0400, Timon Gehr <timon.gehr at gmx.ch> wrote:
> Steven Schveighoffer wrote:
>> I would need to see what your benchmark is before I'd say it was
>> conclusive.
>
> Ok. The version with delete is GC_hints (delete does not give many semantic
> guarantees, ergo it is a hint); the version without delete is GC. It is also
> attached for convenience.
>
> This benchmark tests quite high loads on the GC heap, so for usual
> applications the GC won't perform all that badly. But then again, usual
> applications don't come with a memory allocation bottleneck.
>
> @Andrei: The benchmark also includes a comparison to malloc/free (version c_heap).
>
> Timon
>
>
> /* Benchmark GC vs GC & hints vs freelist vs C Heap
>  * My measurements on Intel Core Duo 2x2.4 GHz on Ubuntu 10.04 (lucid) Amd64
>  * version=GC:       1m45s
>  * version=GC_hints: ~6.8s
>  * //using custom allocators:
>  * version=freelist: ~3.7s
>  * version=c_heap:   ~4.5s
>  */
>
> import std.stdio;
> import std.c.stdlib;
> import std.c.string;
> import std.algorithm:swap;
>
> version=GC;//default
> //version=GC_hints;
> //version=freelist;
> //version=c_heap;
>
> // every variant that calls delete explicitly
> version(freelist) version=calldel;
> version(GC_hints) version=calldel;
> version(c_heap) version=calldel;
>
> class foo{
>     char[1020] data=void;
>     static foo freelist;
>     foo next;
>     version(freelist){
>         // class allocator: recycle dead instances through a freelist
>         new(size_t s){
>             if(freelist !is null){
>                 void *r=cast(void*)freelist;
>                 freelist=freelist.next;
>                 return r;
>             }
>             return cast(void*)new char[s];//could use malloc(s); effects are similar
>         }
>         delete(void *p){
>             if(p){
>                 foo oldfl=freelist;
>                 freelist=cast(foo)p;
>                 freelist.next=oldfl;
>             }
>         }
>     }
>     version(c_heap){
>         // class allocator backed directly by the C heap
>         new(size_t s){return malloc(s);}
>         delete(void *p){if(p) free(p);}
>     }
> }
>
> foo[100000] a;
>
> void main(){
>     srand(100);
>     int top;
>     int dir;//0: expected growing, 1: expected shrinking
>     foreach(t;0..10000000){
>         if(!top){
>             a[top++]=new foo;
>             dir=0;
>         }else if(top==100000){
>             top--;
>             version(calldel) delete a[top];
>             dir=1;
>         }else if(dir^!(rand()%3)){
>             top--;
>             version(calldel) delete a[top];
>         }else a[top++]=new foo;
>         if(!rand()%100) for(int i=0;i<100;i++) swap(a[0],a[rand()%top]);//mess around (note: ! binds before %, so this fires only when rand()==0)
>     }
> }
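(For anyone rerunning this: pick a variant by editing the version lines at the
top, e.g. comment out version=GC; and uncomment version=GC_hints;, then build
with optimizations, something like

    dmd -O -release -inline bench.d

where bench.d is whatever name the file was saved under.)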
I'll point out a couple of issues here:

1. You are never removing elements from the array when version(calldel) is
not enabled. That is, the GC might run, but it cannot collect any objects
when you just do top--, because the array still holds references to them.
Although this might not make a huge difference, it is not an equivalent
comparison. I'd suggest adding else clauses to the version(calldel)
statements that set a[top] to null (see the sketch after this list).
2. The GC is conservative, and you are allocating swaths of 1K objects.
It's quite possible that you are running into false-pointer issues, where
word-sized values elsewhere happen to look like pointers into these blocks
and keep them alive. This might be helped by a more precise GC (there is a
patch for one in bugzilla). David Simcha has also checked improvements for
allocation of large objects into github. These may also help with your
benchmark.
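For the first issue, here is roughly what I mean; a version condition takes
an else clause, so the change to the shrinking branches is small (the
top==100000 branch needs the same treatment):

    }else if(dir^!(rand()%3)){
        top--;
        version(calldel) delete a[top];
        else a[top]=null; // drop the reference so a collection can reclaim the object
    }else a[top++]=new foo;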
I too have seen the GC perform horribly in situations like this, where you
are allocating lots and lots of data. All you need to do to see it is append
to an array enough times; you will see huge pauses while the GC runs.
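A trivial way to reproduce that effect (just a sketch; the element count is
arbitrary):

    import std.stdio;

    void main(){
        int[] arr;
        foreach(i;0..10_000_000)
            arr~=i; // appends reallocate through the GC; watch for long pauses as it grows
        writeln(arr.length);
    }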
These are situations where delete helps because of the conservative
implementation of the GC. Yes, it is a valid point, but it doesn't prove
that delete is better against *any* GC. What we need to do is improve the
GC.
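One mitigation that works with the conservative GC we have today is to mark
pointer-free allocations so they are never scanned. A minimal sketch using
core.memory (this assumes the 1020-byte payload is allocated separately from
foo, since foo itself holds a next pointer and must still be scanned):

    import core.memory;

    // allocate a pointer-free payload; NO_SCAN tells the collector not to
    // scan these bytes for values that merely look like pointers
    void* payload = GC.malloc(1020, GC.BlkAttr.NO_SCAN);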
I tried setting a[top] to null in the GC version. For comparison, on my
system your original GC test took 2m42s. With a[top] set to null, the run
takes 2m36s, so the time does not decrease significantly.
Some more data:
I added the following line to the foreach loop:
if(!(t % 10000)) writeln(t, "\t", top);
There is some interesting behavior, particularly in the GC and GC_hints
versions. Your code obviously goes through cycles in the array, first growing
up to 100,000, then shrinking back down to 0, then repeating the cycle. There
is some noise from the rand % 3, but it is not enough to make the growth and
shrinking truly random.

What is interesting in this output is that in both the GC_hints and GC
versions, the initial growth to 100,000 is somewhat slow, with the GC_hints
version being a bit faster. After that, however, the difference is drastic:
the GC_hints version shrinks very quickly, while the GC version continues at
its slow pace. This is mildly surprising, because setting the top element to
null should be much faster than delete, and both versions are using the GC to
allocate new nodes.

What I find really interesting is that the subsequent growth back up to
100,000 is still slow in the GC version, but still very speedy in the
GC_hints version. Given that both are using the GC for allocation, I would
expect even footing. Something is definitely different between calling delete
and running a collection cycle. The GC version may leave more objects
allocated than the GC_hints version, but it should speed up as it collects
more object blocks to reuse. It's almost as if it's not reusing memory, but
my system isn't running out of memory, so I'm not sure exactly what's going
on.
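One experiment that might narrow it down (a sketch of the idea, not something
I've run): force a full collection each time top returns to 0, and see
whether the next growth phase speeds up. That would distinguish "the
collector isn't finding the garbage" from "the allocator isn't reusing it":

    import core.memory;

    // at the point where top wraps back to 0:
    GC.collect();  // force a full mark/sweep pass
    GC.minimize(); // return any freed pools to the OS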
This clearly needs some investigation; thank you for sharing your benchmark.
-Steve