Why are void[] contents marked as having pointers?

Wed Jun 3 18:14:49 PDT 2009

Walter Bright Wrote:

> Vladimir Panteleev wrote:
> > I don't know why it was decided to mark the contents of void[] as
> > "might have pointers". It makes no sense! Consider:
> 
> [...]
> 
> > 3) It's very rare in practice that the only pointer to your
> > object (which you still plan to access later) to be stored in a
> > void[]-allocated array!
> 
> Rare or common, it still would be a nasty bug lurking to catch someone. 
> The default behavior in D should be to be correct code. Doing 
> potentially unsafe things to improve performance should require extra 
> effort - in this case it would be either using the gc function to mark 
> the memory as not containing pointers, or storing them as ubyte[] instead.

As quite a newbie, I can sum up what I understood as follows:

1. The idea of void[] is that you can put anything in it without casting. 
2. Because of this, you might put pointers in a void[].
3. Since you have "legitimately" stored pointers, and we don't want to have the GC throw away something that we still have valid pointers for, we have to have the GC scan over void[] arrays for possible hits.

4. This pretty much means that any "big"(*) D program cannot afford to put uniformly distributed data in a void[] array: some of those values will look like valid heap addresses to the GC, so it will stop working correctly - it will not dispose of stuff that you don't need any more.
(*) where "big" means a program that creates and destroys a lot of objects.

So, currently if you want to use void[] to store non-pointers, you need to use the gc function to mark the memory as not containing pointers.
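
As a rough sketch of what that marking could look like: this assumes today's core.memory module (GC.malloc, GC.setAttr, GC.getAttr, GC.addrOf and GC.BlkAttr.NO_SCAN), which may not be the function meant in the quote above. The helper names allocNoScan, markNoScan and isNoScan are made up for illustration.

```d
import core.memory : GC;

// Hypothetical helper: allocate `n` bytes that the collector will not scan.
void[] allocNoScan(size_t n)
{
    return GC.malloc(n, GC.BlkAttr.NO_SCAN)[0 .. n];
}

// Hypothetical helper: flag an existing GC block as pointer-free.
// Only safe if the block really holds no pointers into GC memory.
void markNoScan(void[] buf)
{
    GC.setAttr(GC.addrOf(buf.ptr), GC.BlkAttr.NO_SCAN);
}

// Hypothetical helper: does the block holding `buf` have NO_SCAN set?
bool isNoScan(const void[] buf)
{
    return (GC.getAttr(GC.addrOf(buf.ptr)) & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    // Allocated as pointer-free from the start.
    void[] a = allocNoScan(64);
    assert(isNoScan(a));

    // Stand-in for a void[] whose block was allocated as scannable
    // (GC.malloc with no attributes), then marked afterwards.
    void[] b = GC.malloc(64)[0 .. 64];
    assert(!isNoScan(b));
    markNoScan(b);
    assert(isNoScan(b));
}
```

Once a block is marked this way, the GC no longer treats it as a root for anything, so storing the only pointer to an object in it brings back exactly the bug described in the quote.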

A comment and a question. I agree that suddenly losing data because you stored a pointer in a void[] is worse than the GC not working well. However, since GC in D is so automatic, almost any use of void[] to store non-pointer data in such a program can cause massive memory leaks and eventual program failure.

I can see four solutions...

First, do not allow non-pointers to be stored in void[]. So non-pointers are stored in ubyte[], pointers in void[]. This kind of loses the main point of using void[].

Second, void[] is not scanned by the GC, but you can mark it to be. This can cause bugs if you store a pointer in a void[] and later retrieve it, but don't mark the array correctly.

Third, void[] is scanned by the GC, but you can mark it not to be. This can cause memory leaks if you store complex data in a void[] in a big program and don't handle GC marking correctly.

Fourth - somewhat more complex. Since the compiler knows exactly when a pointer is stored in a void[] and when not, it would be possible to have the compiler handle the marking all by itself, as long as the property of having to be scanned by the GC is contagious: once a variable has it, any other variable that touches it gets the property too.
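
For the first option, here is a minimal sketch of keeping plain data in a ubyte[] and only viewing it as void[] where an API wants one. It assumes the core.memory module of current D; isNoScan is a made-up helper, not a library function.

```d
import core.memory : GC;

// Hypothetical helper: true if the GC block containing `p` is flagged NO_SCAN.
bool isNoScan(const void* p)
{
    return (GC.getAttr(GC.addrOf(p)) & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    // Plain data lives in a ubyte[], which the runtime allocates as pointer-free.
    ubyte[] data = new ubyte[](64);
    assert(isNoScan(data.ptr));

    // Any T[] converts implicitly to void[]. The view shares the same block,
    // so the block keeps its NO_SCAN flag.
    void[] view = data;
    assert(view.length == data.length);
    assert(isNoScan(view.ptr));
}
```

The scanning flag belongs to the allocated block, not to the static type of the slice, which is why the void[] view above is still not scanned.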

Of these four solutions, the last three can still cause bugs if one stores both pointers and data in the same void[] array, no matter how the memory is marked, unless one does that marking on a very fine scale (is that possible?).

My conclusion from all this is either "don't use void[]", or "only use void[] to store pointers" if you don't want bugs in a valid program.