In-place extension of arrays only for certain alignment?
Ali Çehreli
acehreli at yahoo.com
Wed Aug 17 18:40:06 UTC 2022
On 8/16/22 19:33, Steven Schveighoffer wrote:
> Everything in the memory allocator is in terms of pages. A pool is a
> section of pages. The large blocks are a *multiple* of pages, whereas
> the small blocks are pages that are *divided* into same-sized chunks.
Thank you. I am appreciating this discussion a lot.
> When you want to allocate a block, if it's a half-page or less, then it
> goes into the small pool, where you can't glue 2 blocks together.
That's how it should be because we wouldn't want to work to know which
blocks are glued.
>> > The reason why your `bad` version fails is because when it must
>> > reallocate, it still is only allocating 1 element.
I will get back to this 1 element allocation below.
>> That part I still don't understand. The same block of e.g. 16 bytes
>> still has room. Why not use that remaining portion?
>
> It does until it doesn't. Then it needs to reallocate.
I think my problem was caused by my test data being 16 bytes and (I
think) the runtime picked memory from the 16-byte pool without any hope
of future growth into adjacent block.
Using a 16-byte block sounds like a good strategy at first because
nobody knows whether an array will get more than one element.
However, if my guess is correct (i.e. the first element of size of
16-bytes is placed on a 16-byte block), then the next allocation will
always allocate memory for the second element.
One might argue that dynamic arrays are likely to have more than a
single element, so the initial block should at least be twice the
element size. This would cut memory allocation by 1 count for all
arrays. And in my case of 1-element arrays, allocation count would be
halved. (Because I pay for every single append right now.)
Of course, I can understand that there can be applications where a large
number of arrays (e.g. a 2D array) may have zero-elements or
one-element, in which case my proposal of allocating the first element
on a e.g. 32-byte block would be wasteful.
I think such cases are rare and we incur 1 extra allocation penalty for
all arrays for that strategy.
> Maybe some ASCII art? `A` is "used by the current slice", `x` is
> "allocated, but not referenced by the array". `.` is "part of the block
> but not used (i.e. can grow into this)". `M` is "metadata"
>
> ```d
> auto arr = new ubyte[10]; // AAAA AAAA AA.. ...M
> arr = arr[1 .. $]; // xAAA AAAA AA.. ...M
> arr ~= 0; // xAAA AAAA AAA. ...M
> arr ~= 0; // xAAA AAAA AAAA ...M
> arr = arr[3 .. $]; // xxxx AAAA AAAA ...M
> arr ~= cast(ubyte[])[1, 2, 3]; // xxxx AAAA AAAA AAAM // full!
> arr ~= 1; // AAAA AAAA AAAA ...M // reallocated!
> arr.ptr has changed
> ```
That all makes sense. I didn't think the meta data would be at the end
but I sense it's related to the "end slice", so it's a better place
there. (?)
> Metadata isn't always stored. Plus, it's optimized for the block size.
> For example, any block that is 256 bytes or less only needs a single
> byte to store the "used" space.
That's pretty interesting and smart.
> What is your focus? Why do you really want this "optimization" of gluing
> together items to happen?
This is about what you and I talked about in the past and something I
mentioned in my DConf lightning talk this year. I am imagining a
FIFO-like cache where elements of a source range are stored. There is a
sliding window involved. I want to drop the unused front elements
because they are expensive in 2 ways:
1) If the elements are not needed anymore, I have to move my slice
forward so that the GC collects their pages.
2) If they are not needed anymore, I don't want to even copy them to a
new block because this would be expensive, and in the case of an
infinite source range, impossible.
The question is when to apply this dropping of old front elements. When
I need to add one more element to the array, I can detect whether this
*may* allocate by the expression 'arr.length == arr.capacity' but not
really though, because the runtime may give me adjacent room without
allocation. So I can delay the "drop the front elements" computation
because there will be no actual copying at this time.
But the bigger issue is, because I drop elements my array never gets
large enough to take advantage of this optimization and there is an
allocation for every single append.
> https://dlang.org/phobos/core_memory.html#.GC.extend
Ok, that sounds very useful. In addition to "I can detect when it *may*
allocate", I can detect whether there is adjacent free room. (I can ask
for just 1 element extension; I tested; and it works.) (I guess this
GC.extend has to grab a GC lock.)
However, for that to work, I seem to need the initial block pointer that
the GC knows about. (My sliding array's .ptr not work, so I have to save
the initial arr.ptr).
Conclusion:
1) Although understanding the inner workings of the runtime is very
useful and core.memory has interesting functions, it feels too much
fragile work to get exactly what I want. I should manage my own memory
(likely still backed by the GC).
2) I argue that the initial allocation should be more than 1 element so
that we skip 1 allocation for most arrays and 50% for my window-of-1
sliding window case.
Ali
More information about the Digitalmars-d-learn
mailing list