Why are void[] contents marked as having pointers?

Christopher Wright dhasenan at gmail.com
Wed Jun 3 19:32:58 PDT 2009


MLT wrote:
> Walter Bright Wrote:
> 
>> Vladimir Panteleev wrote:
>>> I don't know why it was decided to mark the contents of void[] as
>>> "might have pointers". It makes no sense! Consider:
>> [...]
>>
>>> 3) It's very rare in practice that the only pointer to your
>>> object (which you still plan to access later) to be stored in a
>>> void[]-allocated array!
>> Rare or common, it still would be a nasty bug lurking to catch someone. 
>> The default behavior in D should be to be correct code. Doing 
>> potentially unsafe things to improve performance should require extra 
>> effort - in this case it would be either using the gc function to mark 
>> the memory as not containing pointers, or storing them as ubyte[] instead.
> 
> As quite a newbie, I can sum up what I understood as follows:
> 
> 1. The idea of void[] is that you can put anything in it without casting. 
> 2. Because of this, you might put pointers in a void[].
> 3. Since you have "legitimately" stored pointers, and we don't want to have the GC throw away something that we still have valid pointers for, we have to have the GC scan over void[] arrays for possible hits.
> 
> 4. This pretty much means that any "big"(*) D program can not afford to put uniformly distributed data in a void[] array, because the GC will stop working correctly - it will not dispose of stuff that you don't need any more.
> (*) where "big" means a program that creates and destroys a lot of objects.
> 
> So, currently if you want to use void[] to store non-pointers, you need to use the gc function to mark the memory as not containing pointers.
> 
> A comment and a question. I agree that suddenly losing data because you stored a pointer in a void[] is worse than GC not working well. However, since GC in D is so automatic, almost any use of void[] to store non-pointer data will cause massive memory leaks and eventual program failure. 

First, this is no problem if you are merely aliasing an existing array. 
In order for it to be an issue, you must copy from some array to a 
void[] -- for instance, appending to an existing void[], or .dup'ing a 
void[] alias. (While a GC could work around the latter case, it would be 
unsafe -- you can append something with pointers to a void[] copy of an 
int[].)
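
To make the alias/allocate distinction concrete, here is a minimal sketch. 
It assumes a D2 runtime with core.memory (not the API names used elsewhere 
in this thread), and isScanned is a hypothetical helper of mine. The scan 
flag belongs to the memory block, not to the static type of the slice, so I 
would expect "false" for the alias and "true" for the fresh allocation.

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical helper: true if the GC block containing p is scanned for
// pointers. Memory the GC does not own reports no attributes, so it also
// comes back as "scanned" here.
bool isScanned(const(void)* p)
{
    // getAttr wants the block's base address, so resolve interior pointers first.
    void* base = GC.addrOf(cast(void*) p);
    return (GC.getAttr(base) & GC.BlkAttr.NO_SCAN) == 0;
}

void main()
{
    int[] ints = new int[](16);     // allocated as int[]: block is marked NO_SCAN
    void[] view = ints;             // alias only: no allocation, same block
    void[] fresh = new void[](64);  // allocated as void[]: block may hold pointers

    writeln("alias of int[] scanned: ", isScanned(view.ptr));
    writeln("new void[] scanned:     ", isScanned(fresh.ptr));
}
```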

> I can see 4 solutions...
> 
> First, to not allow non-pointers to be stored in void[]. So non-pointers are stored in ubyte[], pointers in void[]. Kinda loses the main point of using void[].
> 
> Second, void[] is not scanned by GC, but you can mark it to be. This can cause bugs if you store a pointer in void[], and later retrieve it, but don't mark correctly.

This is an unsafe option.

> Third, void[] is scanned by GC, but you can mark it not to be. This can cause memory leaks if you store complex data in void[] in a big program, and don't handle GC marking correctly.

This is already available. If you know your array doesn't have pointers, 
you can call GC.hasNoPointers(array.ptr).

This is a safe option.
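
For anyone on a D2 runtime, a rough equivalent looks like the sketch below. 
It assumes core.memory's GC.malloc, GC.setAttr and GC.BlkAttr.NO_SCAN rather 
than the function named above; allocNoScan and markNoPointers are 
hypothetical helper names. Marking an existing block is only safe if nothing 
in it is the sole reference to live data.

```d
import core.memory : GC;

// Hypothetical helper: allocate a void[] buffer the collector will not scan.
void[] allocNoScan(size_t nbytes)
{
    void* p = GC.malloc(nbytes, GC.BlkAttr.NO_SCAN);
    return p[0 .. nbytes];
}

// Hypothetical helper: mark an existing GC allocation as pointer-free.
void markNoPointers(void[] data)
{
    // setAttr ignores interior pointers, so look up the block base first.
    void* base = GC.addrOf(data.ptr);
    if (base !is null)
        GC.setAttr(base, GC.BlkAttr.NO_SCAN);
}

void main()
{
    void[] a = allocNoScan(4096);   // never scanned
    void[] b = new void[](4096);    // scanned by default...
    markNoPointers(b);              // ...until we say otherwise

    assert((GC.getAttr(GC.addrOf(a.ptr)) & GC.BlkAttr.NO_SCAN) != 0);
    assert((GC.getAttr(GC.addrOf(b.ptr)) & GC.BlkAttr.NO_SCAN) != 0);
}
```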

> Fourth - somewhat more complex. Since the compiler knows exactly when a pointer is stored in a void[] and when not, it would be possible to have the compiler handle it all by itself, as long as the property of having to be scanned by GC is dirty - once a variable has it, any other that touches that variable gets the property.

This isn't really the case unless you get some really invasive whole 
program analysis (not available with D's compilation model, or if you 
want to interact with code written in other languages, or if you want to 
do runtime dynamic linking) or a really invasive runtime (think of 
calling a method every time you access an array).

In point of fact, that's not going to be enough. You need to call the 
runtime with every assignment, since you might be passing individual 
ubytes around when they're part of a pointer and reassembling them 
somewhere else.
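
A small sketch of what I mean by reassembling; roundTripThroughBytes is a 
made-up name for illustration. Every individual store below moves a plain 
ubyte, so nothing in the types tells the compiler that a pointer is in 
transit.

```d
import std.stdio : writeln;

// Copy a pointer's bytes one at a time through ubyte values and rebuild it.
int* roundTripThroughBytes(int* original)
{
    enum n = (int*).sizeof;
    ubyte[n] pieces;                            // statically, just bytes
    auto src = cast(const(ubyte)*) &original;
    foreach (i; 0 .. n)
        pieces[i] = src[i];                     // each store is a plain ubyte

    int* rebuilt;
    auto dst = cast(ubyte*) &rebuilt;
    foreach (i; 0 .. n)
        dst[i] = pieces[i];                     // the pointer is whole again
    return rebuilt;
}

void main()
{
    int* p = new int;
    *p = 42;
    int* q = roundTripThroughBytes(p);
    writeln(q is p, " ", *q);                   // prints "true 42"
}
```

In this sketch the pieces sit on the stack, which the collector scans 
anyway. Put them in a heap block marked as having no pointers and the 
object is only kept alive by whatever else happens to reference it.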

> Of these four solutions, the last 3 can still cause bugs if one stores both pointers and data in the same void[] array, no matter how the memory is marked, unless one does that marking on a very fine scale (is that possible?)

struct S
{
	int i;
	int* j;
}

You're screwed.
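
Spelled out as a sketch, assuming D2's std.traits.hasIndirections and the 
hasPointers bit in TypeInfo.flags: "has pointers" is a single bit for the 
whole type, so an array of S gets scanned as a whole. Scan it and every i is 
treated as a possible pointer; don't scan it and whatever j points to can be 
collected.

```d
import std.traits : hasIndirections;

struct S
{
    int i;   // plain data, but it sits in a scanned block
    int* j;  // the only real pointer in the struct
}

// The type as a whole counts as "has pointers", although only j is one.
static assert(hasIndirections!S);
static assert(!hasIndirections!int);

void main()
{
    // The run-time type info carries the same single bit (bit 0 of flags).
    assert((typeid(S).flags & 1) != 0);
    assert((typeid(int).flags & 1) == 0);
}
```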

> My conclusion from all this is either "don't use void[]", or "only use void[] to store pointers" if you don't want bugs in a valid program.

Not bugs, but potential performance issues. And the advice should be 
"don't allocate void[]", to split hairs.


