std.allocator needs your help
Brad Anderson
eco at gnuk.net
Tue Sep 24 09:39:50 PDT 2013
On Tuesday, 24 September 2013 at 08:46:36 UTC, Dmitry Olshansky
wrote:
> 23-Sep-2013 03:49, Andrei Alexandrescu пишет:
>> Hello,
>>
>>
>> I am making good progress on the design of std.allocator, and
>> I am
>> optimistic about the way it turns out. D's introspection
>> capabilities
>> really shine through, and in places the design does feel really
>> archetypal - e.g. "this is the essence of a freelist
>> allocator". It's a
>> very good feeling. The overall inspiration comes from Berger's
>> HeapLayers, but D's introspection takes that pattern to a
>> whole new level.
>>
>> Several details are still in flux, but at the top level it
>> seems most
>> natural to divide things in two tiers:
>>
>> 1. Typed allocator, i.e. every request for allocation comes
>> with the
>> exact type requested;
>>
>> 2. Untyped allocator - traffics exclusively in ubyte[].
>>
>
> Looks good (s/ubyte[]/void[] per current discussion).
>
> Do you imagine Typed allocators as something more then adapters
> that simplify a common pattern of allocate + emplace / destroy
> + deallocate? (create!T/destroy!T)
>
>> struct NullAllocator
>> {
>> enum alignment = real.alignof;
>> enum size_t available = 0;
>> ubyte[] allocate(size_t s) shared { return null; }
>> bool expand(ref ubyte[] b, size_t minDelta, size_t
>> maxDelta) shared
>> { assert(b is null); return false; }
>> bool reallocate(ref ubyte[] b, size_t) shared
>> { assert(b is null); return false; }
>> void deallocate(ubyte[] b) shared { assert(b is null); }
>> void collect() shared { }
>> void deallocateAll() shared { }
>> static shared NullAllocator it;
>> }
>>
>> Primitives:
>>
>
> First things first - can't allocator return alignment as
> run-time value - a property (just like 'available' does)? The
> implementation contract is that it must be O(1) vanila
> syscall-free function. (Read this as consult system info
> exactly once, at construction).
>
> Thinking more of it - it also may be immutable size_t? Then it
> gets proper value at construction and then is never changed.
>
>> * expand(b, minDelta, maxDelta) grows b's length by at least
>> minDelta
>> (and on a best-effort basis by at least maxDelta) and returns
>> true, or
>> does nothing and returns false. In most allocators this should
>> be @safe.
>> (One key insight is that expand() can be made @safe whereas
>> shrink() or
>> realloc() are most often not; such mini-epiphanies are very
>> exciting
>> because they put the design on a beam guide with few
>> principles and many
>> consequences.) The precondition is that b is null or has been
>> previously
>> returned by allocate(). This method is optional.
>
> Great to see this primitive. Classic malloc-ators are so lame...
> (+ WinAPI Heaps fits here)
>
>> * deallocate(b) deallocates the memory allocated for b. b must
>> have been
>> previously allocated by the same allocator. This method is
>> usually
>> unsafe (but there are notable implementations that may offer
>> safety,
>> such as unbounded freelists.) This method is optional.
>
> Does the following implication hold "have a deallocate" -->
> must be manually managed? Otherwise how would one reliably work
> with it and not leak? This brings us to traits that allocators
> may (should) have
> a-la automatic? free-all on termination? Zeros on allocate
> (more common then one would think)? etc.
>
>> * deallocateAll() deallocates in one shot all memory previously
>> allocated by this allocator. This method is optional, and when
>> present
>> is almost always unsafe (I see no meaningful @safe
>> implementation.)
>> Region allocators are notable examples of allocators that
>> define this
>> method.
>
> Okay.. I presume region "mark" works by spiting off a
> subinstance of allocator and "release" by deallocateAll().
>
>>
>> * collect() frees unused memory that had been allocated with
>> this
>> allocator. This optional method is implemented by tracing
>> collectors and
>> is usually @safe.
>>
>
> This seems hard and/or suboptimal w/o typeinfo --> typed
> allocators? I guess they could be more then a simple helper.
>
>> There are quite a few more things to define more precisely,
>> but this
>> part of the design has become quite stable. To validate the
>> design, I've
>> defined some simple allocators (Mallocator, GCAllocator,
>> Freelist,
>> StackRegion, Region etc) but not the more involved ones, such
>> as
>> coalescing heaps, buddy system etc.
>
> The problem I see is that allocators are rooted in poor
> foundation - malloc is so out of fashion (its interface is
> simply too thin on guarantees), sbrk is frankly a stone-age
> syscall.
>
> I would love to make a root allocator one on top of
> mmap/VirtualAlloc.
> This leads me to my biggest problem with classical memory
> management - ~20 years of PCs having virtual memory support
> directly in the CPU, and yet hardly anything outside of OS
> takes advantage of it. They (allocators) seem simply not aware
> of it - none of interface implies anything that would help user
> take advantage of it. I'm talking of (potentially) large
> allocations that may be often resized.
>
> (large allocations actually almost always a blind spot/fallback
> in allocators)
>
> To the point - we have address space and optionally memory
> committed in there, leading us to a 3-state of an _address
> range_:
> available, reserved, committed. The ranges of 2nd state are
> dirt cheap (abundant on 64bit) and could be easily flipped to
> committed and back.
>
> So I have a pattern I want to try to fit in your design. At
> present it is a datastructure + allocation logic. What I would
> want is clean and nice composition for ultimate reuse.
>
> An example is a heavy-duty array (could be used for potentially
> large stack). The requirements are:
> a) Never reallocate on resize - in certain cases it may even be
> too large to reallocate in RAM (!)
> b) Never waste RAM beyond some limit (a few pages or a tiny
> faction of size is fine)
> c) O(1) amortized appending as in plain array and other
> properties of an array -> it's a dynamic array after all
>
> Currently I use mmap/madvise and related entities directly. It
> goes as follows.
>
> Allocate a sufficiently large - as large as "it" may get (1Mb,
> 1G, 10G you name it) _address range_. Call this size a
> 'capacity'.
> Commit some memory (optionally on first resize) - call this a
> 'committed'. Appending goes using up committed space as typical
> array would do. Whenever it needs more it then that applies the
> same extending algorithm as usual but with (committed,
> capacity) pair.
> Ditto on resize back - (with some amortization) it asks OS to
> decommit pages decreasing the commited amount.
>
> In short - a c++ "vector" that has 2 capacities (current and
> absolute maximum) with a plus that it never reallocates. What
> to do if/when it hits the upper bound - is up to specific
> application (terminate, fallback to something else). It has the
> danger of sucking up address space on 32bits though.. but who
> cares :)
>
>
Somewhat related:
http://probablydance.com/2013/05/13/4gb-per-vector/
More information about the Digitalmars-d
mailing list