Why are void[] contents marked as having pointers?
Vladimir Panteleev
thecybershadow at gmail.com
Sun May 31 11:45:23 PDT 2009
I just went through a ~15000-line project and replaced most occurrences of void[]. Now the project is an ugly mess of void[], ubyte[] and casts, but at least it doesn't leak memory like crazy any more.
I don't know why it was decided to mark the contents of void[] as "might have pointers". It makes no sense! Consider:
1) void[] has this wonderful, magical property that any array type implicitly casts to void[]. This makes it wonderful to use in libraries and functions that manipulate data with no regard to what it actually contains. Network libraries, compression libraries, etc. - just about anywhere you'd use a void* and a length in C++, a void[] is just as appropriate.
2) Even though void[] is "typeless", you can still operate on it - namely, slice and concatenate it. Pass a void[] to a network send() function - how much did you send? Half the buffer? No problem, slice it away and store the rest - and no casts.
3) It's very rare in practice for the only pointer to your object (which you still plan to access later) to be stored in a void[]-allocated array! Remember, the properties of a memory region are determined when the memory is allocated, so casting an array of structures to a void[] will not lose you that reference. For this to happen, you'd need to move your pointer into a void[]-allocated array (which you'd have to allocate explicitly or create by, for example, concatenating your reference to a void[]), and then drop the reference to your original structure.
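To make point 3 concrete, here is a minimal sketch of the difference between viewing existing memory as void[] and allocating new memory as void[]. The Node type and the variable names are made up for illustration, and the comments describe the behaviour as the post explains it, not something this snippet measures.

```d
// Hypothetical type that really does contain a pointer.
struct Node { Node* next; }

void main()
{
    Node[] nodes = new Node[4];   // block allocated as Node[]
    void[] raw = nodes;           // implicit cast: same block, only the static type changes

    // The view shares the original memory; nothing was reallocated.
    assert(raw.ptr is nodes.ptr);
    assert(raw.length == 4 * Node.sizeof);

    void[] copy;
    copy ~= raw;                  // this allocates a NEW block, typed as void[]
    assert(copy.ptr !is raw.ptr);
}
```

Only the second case, where the pointers are copied into freshly allocated void[] memory and the original is then dropped, relies on void[] being scanned.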
Here's a simple naive implementation of a buffer:
void[] buffer;

void queue(void[] data)
{
    buffer ~= data;
}
...
queue([1,2,3][]);
queue("Hello, World!");
No casts! So simple and beautiful. However, should you use this pattern to work with larger amounts of high-entropy data, the "minefield" effect will cause the GC to stop collecting most data: random bytes that happen to look like pointers into the heap keep unrelated blocks alive. Sure, you can call std.gc.hasNoPointers, but you need to do it after every single concatenation... and it makes expressions with more than one concatenation unsafe.
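For illustration, the workaround might look roughly like this. It is a sketch against D1 Phobos (the std.gc module named above), and it assumes buffer is only ever grown by appending, so that buffer.ptr points at the start of its GC block.

```d
import std.gc;   // D1 Phobos module that provides hasNoPointers

void[] buffer;

// Append raw bytes, then re-mark the block, since the append
// may have reallocated it as a fresh pointer-scanned void[] block.
void queue(void[] data)
{
    buffer ~= data;
    if (buffer.ptr !is null)
        std.gc.hasNoPointers(buffer.ptr);
}
```

Something like a ~ b ~ c is harder to cover this way, because the intermediate result of a ~ b is a temporary void[] block that there is no opportunity to mark.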
I heard that Tango copies over the properties of arrays when they are reallocated, which helps but solves the problem only partially.
So, I ask you: is there actually code out there that depends on the way void[] works right now? I brought up this argument a year or so ago on IRC, and there were people who ferociously defended the current design with idealistic arguments ("it should work like what it sounds like, it should contain any type" or something like that), but I've yet to see a practical argument.
P.S. How come the standard library doesn't have a simple function like this?
T[] toArray(T)(ref T data) { return (&data)[0..1]; }
It happens often that I need to get a slice of memory around an object's reference (for example to pass it to a function that takes a void[] :D), and typing (&x)[0..1] every time feels like a hack.
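A usage sketch, to show what the helper would buy. Header and send() are hypothetical stand-ins for whatever takes a void[], and the parameter is written with ref, the newer spelling of the by-reference inout storage class.

```d
// One-element slice over a variable's own storage (no copy is made).
T[] toArray(T)(ref T data) { return (&data)[0..1]; }

// Hypothetical consumer that accepts untyped bytes.
size_t send(void[] data) { return data.length; }

// Hypothetical plain-data struct.
struct Header { uint magic; ushort ver; }

void main()
{
    Header h;
    h.magic = 0xCAFE;

    // Header[] converts implicitly to void[], so no cast is needed.
    assert(send(toArray(h)) == Header.sizeof);

    // The slice aliases h: writing through it changes h.
    toArray(h)[0].ver = 2;
    assert(h.ver == 2);
}
```

Since the slice points at the variable itself, it must not outlive that variable, which is the same caveat that applies to writing (&x)[0..1] by hand.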
--
Best regards,
Vladimir mailto:thecybershadow at gmail.com