Enhanced array appending
Steven Schveighoffer
schveiguy at yahoo.com
Thu Dec 24 11:58:38 PST 2009
On Thu, 24 Dec 2009 13:39:21 -0500, grauzone <none at example.net> wrote:
> Sorry about that, I didn't have a close look at the patch. I guess I was
> more thinking about Andrei's original proposal (and how I thought it may
> be implemented).
No problem.
>
> It seems you store the length field inside the array's memory block
> (instead of the cache, which just speeds up gc_query). That's awesome,
> but I'm going to complain again: now you have to keep a length field for
> *all* memory allocations, not only arrays! For most object allocations,
> this means 4 more bytes additional overhead.
Interestingly enough, the storage overhead is zero except for memory
blocks > 256 bytes. I'll explain:
A probably not-well-known piece of trivia is that D allocates 1 extra byte
for arrays. Why would it do this, you ask? Because of the GC.
Imagine that it did not. What happens when you do something like
this:
ubyte[] array = new ubyte[16]; // allocates a 16-byte block for the array
array = array[$..$];
If you look at array's ptr member, it no longer points into the
allocated block, but at the next block. Although the GC no longer needs
to keep the original block around, this pointer will now keep the next
block from being collected. In addition, if you tried appending to array,
it might start using unallocated memory!
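To make the hazard concrete, here is a small sketch of that scenario (where
exactly the GC places blocks is up to the implementation, of course):

```d
import std.stdio;

void main()
{
    ubyte[] array = new ubyte[16]; // suppose this exactly fills a 16-byte block
    auto start = array.ptr;

    array = array[$ .. $]; // an empty slice at the very end

    // Without the extra pad byte, array.ptr would now point one byte past the
    // allocated block -- i.e., into the *next* block.  The GC, seeing this
    // pointer, would keep that unrelated block alive, and an append through
    // `array` could stomp on memory the slice doesn't own.
    assert(array.length == 0);
    assert(array.ptr == start + 16);
    writeln("empty slice points ", array.ptr - start, " bytes past the start");
}
```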
So the byte at the end was already in use; I sort of commandeered it for
length storage. For blocks up to and including 256 bytes, I use the last
byte of the block as the length storage. For blocks of 512 bytes up to a
half page (2048 bytes), I use the last 2 bytes, so there is one byte of
extra overhead compared to the current implementation.
Blocks larger than that follow different rules: they are not stored in
bins, but allocated a whole page at a time. With those blocks, it is
possible to extend the block by adding more pages if they are free, so
it's not OK to store the length at the end of the block, since the end of
the block may change. So I store it at the beginning. There I use up a
full 8 bytes, and the reason is alignment: I don't know what you're
putting in the array, so I must keep the data 8-byte aligned.
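Put together, the layout described above can be sketched roughly like this
(the function name and constants are just my reading of the scheme, not the
actual runtime code):

```d
// Illustrative sketch only -- not the real druntime implementation.
enum PAGESIZE = 4096;

// Given the size of a GC block, where does the hidden "used length" live,
// and how many bytes does it occupy?
void lengthLocation(size_t blockSize, out size_t offset, out size_t bytes)
{
    if (blockSize <= 256)
    {
        // small bins: reuse the pre-existing pad byte at the very end
        offset = blockSize - 1;
        bytes = 1;
    }
    else if (blockSize <= PAGESIZE / 2)
    {
        // 512 bytes to a half page: the last 2 bytes (one extra byte of
        // overhead compared to before)
        offset = blockSize - 2;
        bytes = 2;
    }
    else
    {
        // page-sized and larger: stored at the *front*, padded to 8 bytes so
        // the array data that follows stays 8-byte aligned (these blocks can
        // grow by appending pages, so the end is not a stable location)
        offset = 0;
        bytes = 8;
    }
}
```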
For classes, I allocate the required extra data as if the class were an
array of the class data size, and then set the "ghost" length at the end
of the block to 0. If a class's data exceeds a half page, the ghost length
is at the beginning, where the vtable pointer is, so it's extremely
unlikely to accidentally match that length. Note that this makes little
difference in most cases: the block used for the class is a power of 2
anyway, so there's usually plenty of wasted space at the end.
I found out during testing that allocating a new struct is equivalent to
allocating a new array of that struct with size 1 and returning its
pointer, so that aspect is already covered.
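In other words, as I understand it, these two allocations land in the same
kind of hidden-length block (a sketch of the equivalence, not the actual
runtime code):

```d
struct S { int x; double y; }

void main()
{
    S* p = new S;            // allocating a single struct...
    S* q = (new S[1]).ptr;   // ...behaves like a one-element array
                             // allocation, so the hidden length at the
                             // end of the block is already maintained
    assert(p !is null && q !is null);
    p.x = 1;                 // both are valid, GC-tracked pointers
    q.x = 2;
    assert(p.x == 1 && q.x == 2);
}
```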
> Also, if you use GC.malloc directly, and the user tries to append to
> slices to it, your code may break. GC.malloc doesn't seem to pad the
> memory block with a length field.
Yes, this is a possible problem. However, using GC.malloc directly and
then treating the result as a normal array is probably extremely rare; at
the least, it is not valid in safe D. It probably should be called out as
a danger in GC.malloc's documentation, but I don't think it will adversely
affect most code. There will be functions that should obviate the need for
calling GC.malloc to allocate an array (specifically, allocating
uninitialized data).
> I must say that I find your argumentation strange: didn't you say adding
> an additional field to the slice struct is too much overhead?
Overhead when passing around, not overhead for storage. For example,
pushing 12 bytes onto the stack instead of 8 when calling a function with
a slice. If you want safe appends, you need to store the allocation
length somewhere, there's no way around that.
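For concreteness, the struct names below are hypothetical, but they show the
size difference that would be paid at every call site:

```d
// What a D slice looks like when passed to a function: two machine words.
struct SliceAsPassed
{
    size_t length;
    void*  ptr;
}

// A hypothetical slice that also carried a pointer to the allocation length
// would add a third word -- 12 bytes instead of 8 on a 32-bit target, paid
// on every call, copy, and store, whether or not you ever append.
struct SliceWithAllocPtr
{
    size_t  length;
    void*   ptr;
    size_t* allocLen;
}

static assert(SliceAsPassed.sizeof == 2 * size_t.sizeof);
static assert(SliceWithAllocPtr.sizeof == 3 * size_t.sizeof);
```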
> Also, a solution which keeps the pointer to the array length in the
> slice struct would still be faster. The cache lookup cost is not zero.
This is the trade-off I chose between performance when passing slices
around and using them, and performance for appending. If you focus all
your performance concerns on appending, then other areas suffer. I don't
know if what I chose is the best solution, but I can't think of any way to
have *both* speedy slice usage and near-zero append overhead. If someone
can think of a better solution, I'll be happy to incorporate it. But after
using and understanding slices while developing stuff for Tango (Tango
uses slices to get every ounce of performance!), I'm convinced that
as-fast-as-possible slice semantics for passing data around is essential.
This is where a custom type that focuses performance on appending at the
expense of other functions is good to have. I think such applications
where you need the absolute best append performance without any other
functionality are pretty rare.
> Anyway, now that semantics and performance are somewhat sane, none of
> these remaining issues are too important. Thanks Steve and Andrei!
Cool!
-Steve
More information about the Digitalmars-d
mailing list