In-place extension of arrays only for certain alignment?
acehreli at yahoo.com
Wed Aug 17 18:40:06 UTC 2022
On 8/16/22 19:33, Steven Schveighoffer wrote:
> Everything in the memory allocator is in terms of pages. A pool is a
> section of pages. The large blocks are a *multiple* of pages, whereas
> the small blocks are pages that are *divided* into same-sized chunks.
Thank you. I am appreciating this discussion a lot.
> When you want to allocate a block, if it's a half-page or less, then it
> goes into the small pool, where you can't glue 2 blocks together.
That's how it should be because we wouldn't want to work to know which
blocks are glued.
>> > The reason why your `bad` version fails is because when it must
>> > reallocate, it still is only allocating 1 element.
I will get back to this 1 element allocation below.
>> That part I still don't understand. The same block of e.g. 16 bytes
>> still has room. Why not use that remaining portion?
> It does until it doesn't. Then it needs to reallocate.
I think my problem was caused by my test data being 16 bytes and (I
think) the runtime picked memory from the 16-byte pool without any hope
of future growth into adjacent block.
Using a 16-byte block sounds like a good strategy at first because
nobody knows whether an array will get more than one element.
However, if my guess is correct (i.e. the first element of size of
16-bytes is placed on a 16-byte block), then the next allocation will
always allocate memory for the second element.
One might argue that dynamic arrays are likely to have more than a
single element, so the initial block should at least be twice the
element size. This would cut memory allocation by 1 count for all
arrays. And in my case of 1-element arrays, allocation count would be
halved. (Because I pay for every single append right now.)
Of course, I can understand that there can be applications where a large
number of arrays (e.g. a 2D array) may have zero-elements or
one-element, in which case my proposal of allocating the first element
on a e.g. 32-byte block would be wasteful.
I think such cases are rare and we incur 1 extra allocation penalty for
all arrays for that strategy.
> Maybe some ASCII art? `A` is "used by the current slice", `x` is
> "allocated, but not referenced by the array". `.` is "part of the block
> but not used (i.e. can grow into this)". `M` is "metadata"
> auto arr = new ubyte; // AAAA AAAA AA.. ...M
> arr = arr[1 .. $]; // xAAA AAAA AA.. ...M
> arr ~= 0; // xAAA AAAA AAA. ...M
> arr ~= 0; // xAAA AAAA AAAA ...M
> arr = arr[3 .. $]; // xxxx AAAA AAAA ...M
> arr ~= cast(ubyte)[1, 2, 3]; // xxxx AAAA AAAA AAAM // full!
> arr ~= 1; // AAAA AAAA AAAA ...M // reallocated!
> arr.ptr has changed
That all makes sense. I didn't think the meta data would be at the end
but I sense it's related to the "end slice", so it's a better place
> Metadata isn't always stored. Plus, it's optimized for the block size.
> For example, any block that is 256 bytes or less only needs a single
> byte to store the "used" space.
That's pretty interesting and smart.
> What is your focus? Why do you really want this "optimization" of gluing
> together items to happen?
This is about what you and I talked about in the past and something I
mentioned in my DConf lightning talk this year. I am imagining a
FIFO-like cache where elements of a source range are stored. There is a
sliding window involved. I want to drop the unused front elements
because they are expensive in 2 ways:
1) If the elements are not needed anymore, I have to move my slice
forward so that the GC collects their pages.
2) If they are not needed anymore, I don't want to even copy them to a
new block because this would be expensive, and in the case of an
infinite source range, impossible.
The question is when to apply this dropping of old front elements. When
I need to add one more element to the array, I can detect whether this
*may* allocate by the expression 'arr.length == arr.capacity' but not
really though, because the runtime may give me adjacent room without
allocation. So I can delay the "drop the front elements" computation
because there will be no actual copying at this time.
But the bigger issue is, because I drop elements my array never gets
large enough to take advantage of this optimization and there is an
allocation for every single append.
Ok, that sounds very useful. In addition to "I can detect when it *may*
allocate", I can detect whether there is adjacent free room. (I can ask
for just 1 element extension; I tested; and it works.) (I guess this
GC.extend has to grab a GC lock.)
However, for that to work, I seem to need the initial block pointer that
the GC knows about. (My sliding array's .ptr not work, so I have to save
the initial arr.ptr).
1) Although understanding the inner workings of the runtime is very
useful and core.memory has interesting functions, it feels too much
fragile work to get exactly what I want. I should manage my own memory
(likely still backed by the GC).
2) I argue that the initial allocation should be more than 1 element so
that we skip 1 allocation for most arrays and 50% for my window-of-1
sliding window case.
More information about the Digitalmars-d-learn