std.allocator needs your help
Dmitry Olshansky
dmitry.olsh at gmail.com
Tue Sep 24 01:46:20 PDT 2013
23-Sep-2013 03:49, Andrei Alexandrescu пишет:
> Hello,
>
>
> I am making good progress on the design of std.allocator, and I am
> optimistic about the way it turns out. D's introspection capabilities
> really shine through, and in places the design does feel really
> archetypal - e.g. "this is the essence of a freelist allocator". It's a
> very good feeling. The overall inspiration comes from Berger's
> HeapLayers, but D's introspection takes that pattern to a whole new level.
>
> Several details are still in flux, but at the top level it seems most
> natural to divide things in two tiers:
>
> 1. Typed allocator, i.e. every request for allocation comes with the
> exact type requested;
>
> 2. Untyped allocator - traffics exclusively in ubyte[].
>
Looks good (s/ubyte[]/void[] per current discussion).
Do you imagine Typed allocators as something more then adapters that
simplify a common pattern of allocate + emplace / destroy + deallocate?
(create!T/destroy!T)
> struct NullAllocator
> {
> enum alignment = real.alignof;
> enum size_t available = 0;
> ubyte[] allocate(size_t s) shared { return null; }
> bool expand(ref ubyte[] b, size_t minDelta, size_t maxDelta) shared
> { assert(b is null); return false; }
> bool reallocate(ref ubyte[] b, size_t) shared
> { assert(b is null); return false; }
> void deallocate(ubyte[] b) shared { assert(b is null); }
> void collect() shared { }
> void deallocateAll() shared { }
> static shared NullAllocator it;
> }
>
> Primitives:
>
First things first - can't allocator return alignment as run-time value
- a property (just like 'available' does)? The implementation contract
is that it must be O(1) vanila syscall-free function. (Read this as
consult system info exactly once, at construction).
Thinking more of it - it also may be immutable size_t? Then it gets
proper value at construction and then is never changed.
> * expand(b, minDelta, maxDelta) grows b's length by at least minDelta
> (and on a best-effort basis by at least maxDelta) and returns true, or
> does nothing and returns false. In most allocators this should be @safe.
> (One key insight is that expand() can be made @safe whereas shrink() or
> realloc() are most often not; such mini-epiphanies are very exciting
> because they put the design on a beam guide with few principles and many
> consequences.) The precondition is that b is null or has been previously
> returned by allocate(). This method is optional.
Great to see this primitive. Classic malloc-ators are so lame...
(+ WinAPI Heaps fits here)
> * deallocate(b) deallocates the memory allocated for b. b must have been
> previously allocated by the same allocator. This method is usually
> unsafe (but there are notable implementations that may offer safety,
> such as unbounded freelists.) This method is optional.
Does the following implication hold "have a deallocate" --> must be
manually managed? Otherwise how would one reliably work with it and not
leak? This brings us to traits that allocators may (should) have
a-la automatic? free-all on termination? Zeros on allocate (more common
then one would think)? etc.
> * deallocateAll() deallocates in one shot all memory previously
> allocated by this allocator. This method is optional, and when present
> is almost always unsafe (I see no meaningful @safe implementation.)
> Region allocators are notable examples of allocators that define this
> method.
Okay.. I presume region "mark" works by spiting off a subinstance of
allocator and "release" by deallocateAll().
>
> * collect() frees unused memory that had been allocated with this
> allocator. This optional method is implemented by tracing collectors and
> is usually @safe.
>
This seems hard and/or suboptimal w/o typeinfo --> typed allocators? I
guess they could be more then a simple helper.
> There are quite a few more things to define more precisely, but this
> part of the design has become quite stable. To validate the design, I've
> defined some simple allocators (Mallocator, GCAllocator, Freelist,
> StackRegion, Region etc) but not the more involved ones, such as
> coalescing heaps, buddy system etc.
The problem I see is that allocators are rooted in poor foundation -
malloc is so out of fashion (its interface is simply too thin on
guarantees), sbrk is frankly a stone-age syscall.
I would love to make a root allocator one on top of mmap/VirtualAlloc.
This leads me to my biggest problem with classical memory management -
~20 years of PCs having virtual memory support directly in the CPU, and
yet hardly anything outside of OS takes advantage of it. They
(allocators) seem simply not aware of it - none of interface implies
anything that would help user take advantage of it. I'm talking of
(potentially) large allocations that may be often resized.
(large allocations actually almost always a blind spot/fallback in
allocators)
To the point - we have address space and optionally memory committed in
there, leading us to a 3-state of an _address range_:
available, reserved, committed. The ranges of 2nd state are dirt cheap
(abundant on 64bit) and could be easily flipped to committed and back.
So I have a pattern I want to try to fit in your design. At present it
is a datastructure + allocation logic. What I would want is clean and
nice composition for ultimate reuse.
An example is a heavy-duty array (could be used for potentially large
stack). The requirements are:
a) Never reallocate on resize - in certain cases it may even be too
large to reallocate in RAM (!)
b) Never waste RAM beyond some limit (a few pages or a tiny faction of
size is fine)
c) O(1) amortized appending as in plain array and other properties of an
array -> it's a dynamic array after all
Currently I use mmap/madvise and related entities directly. It goes as
follows.
Allocate a sufficiently large - as large as "it" may get (1Mb, 1G, 10G
you name it) _address range_. Call this size a 'capacity'.
Commit some memory (optionally on first resize) - call this a
'committed'. Appending goes using up committed space as typical array
would do. Whenever it needs more it then that applies the same extending
algorithm as usual but with (committed, capacity) pair.
Ditto on resize back - (with some amortization) it asks OS to decommit
pages decreasing the commited amount.
In short - a c++ "vector" that has 2 capacities (current and absolute
maximum) with a plus that it never reallocates. What to do if/when it
hits the upper bound - is up to specific application (terminate,
fallback to something else). It has the danger of sucking up address
space on 32bits though.. but who cares :)
Now how to fit this monster into the current API...
Plainly I need a way to tell hey Mr. Allocator RESERVE me a 1Gb of RAM
but don't chew on, it please. In other words - allocate needs something
like a "guaranteed capacity of the block". By default it could be == size.
struct Allocator
{
...
//allocate size bytes with a block of with no less then capacity bytes.
void[] allocate(size_t size, size_t capacity=size);
...
}
Or maybe somehow add a reserve method explicitly...
Then extend then will easily do the second part of the job - in-place
resize (commit/decommit as required) and both hints make a lot of sense.
The pattern ideally would lower to: take a generic array, plug-in a
MemMap allocator, reserve top amount of memory at start ... profit!
--
Dmitry Olshansky
More information about the Digitalmars-d
mailing list