std.allocator needs your help

Tue Sep 24 09:39:50 PDT 2013

On Tuesday, 24 September 2013 at 08:46:36 UTC, Dmitry Olshansky 
wrote:
> 23-Sep-2013 03:49, Andrei Alexandrescu пишет:
>> Hello,
>>
>>
>> I am making good progress on the design of std.allocator, and 
>> I am
>> optimistic about the way it turns out. D's introspection 
>> capabilities
>> really shine through, and in places the design does feel really
>> archetypal - e.g. "this is the essence of a freelist 
>> allocator". It's a
>> very good feeling. The overall inspiration comes from Berger's
>> HeapLayers, but D's introspection takes that pattern to a 
>> whole new level.
>>
>> Several details are still in flux, but at the top level it 
>> seems most
>> natural to divide things in two tiers:
>>
>> 1. Typed allocator, i.e. every request for allocation comes 
>> with the
>> exact type requested;
>>
>> 2. Untyped allocator - traffics exclusively in ubyte[].
>>
>
> Looks good (s/ubyte[]/void[] per current discussion).
>
> Do you imagine Typed allocators as something more then adapters 
> that simplify a common pattern of allocate + emplace / destroy 
> + deallocate? (create!T/destroy!T)
>
>> struct NullAllocator
>> {
>>     enum alignment = real.alignof;
>>     enum size_t available = 0;
>>     ubyte[] allocate(size_t s) shared { return null; }
>>     bool expand(ref ubyte[] b, size_t minDelta, size_t 
>> maxDelta) shared
>>     { assert(b is null); return false; }
>>     bool reallocate(ref ubyte[] b, size_t) shared
>>     { assert(b is null); return false; }
>>     void deallocate(ubyte[] b) shared { assert(b is null); }
>>     void collect() shared { }
>>     void deallocateAll() shared { }
>>     static shared NullAllocator it;
>> }
>>
>> Primitives:
>>
>
> First things first - can't allocator return alignment as 
> run-time value - a property (just like 'available' does)? The 
> implementation contract is that it must be O(1) vanila 
> syscall-free function. (Read this as consult system info 
> exactly once, at construction).
>
> Thinking more of it - it also may be immutable size_t? Then it 
> gets proper value at construction and then is never changed.
>
>> * expand(b, minDelta, maxDelta) grows b's length by at least 
>> minDelta
>> (and on a best-effort basis by at least maxDelta) and returns 
>> true, or
>> does nothing and returns false. In most allocators this should 
>> be @safe.
>> (One key insight is that expand() can be made @safe whereas 
>> shrink() or
>> realloc() are most often not; such mini-epiphanies are very 
>> exciting
>> because they put the design on a beam guide with few 
>> principles and many
>> consequences.) The precondition is that b is null or has been 
>> previously
>> returned by allocate(). This method is optional.
>
> Great to see this primitive. Classic malloc-ators are so lame...
> (+ WinAPI Heaps fits here)
>
>> * deallocate(b) deallocates the memory allocated for b. b must 
>> have been
>> previously allocated by the same allocator. This method is 
>> usually
>> unsafe (but there are notable implementations that may offer 
>> safety,
>> such as unbounded freelists.) This method is optional.
>
> Does the following implication hold "have a deallocate" --> 
> must be manually managed? Otherwise how would one reliably work 
> with it and not leak? This brings us to traits that allocators 
> may (should) have
> a-la automatic? free-all on termination? Zeros on allocate 
> (more common then one would think)? etc.
>
>> * deallocateAll() deallocates in one shot all memory previously
>> allocated by this allocator. This method is optional, and when 
>> present
>> is almost always unsafe (I see no meaningful @safe 
>> implementation.)
>> Region allocators are notable examples of allocators that 
>> define this
>> method.
>
> Okay.. I presume region "mark" works by spiting off a 
> subinstance of allocator and "release" by deallocateAll().
>
>>
>> * collect() frees unused memory that had been allocated with 
>> this
>> allocator. This optional method is implemented by tracing 
>> collectors and
>> is usually @safe.
>>
>
> This seems hard and/or suboptimal w/o typeinfo --> typed 
> allocators? I guess they could be more then a simple helper.
>
>> There are quite a few more things to define more precisely, 
>> but this
>> part of the design has become quite stable. To validate the 
>> design, I've
>> defined some simple allocators (Mallocator, GCAllocator, 
>> Freelist,
>> StackRegion, Region etc) but not the more involved ones, such 
>> as
>> coalescing heaps, buddy system etc.
>
> The problem I see is that allocators are rooted in poor 
> foundation - malloc is so out of fashion (its interface is 
> simply too thin on guarantees), sbrk is frankly a stone-age 
> syscall.
>
> I would love to make a root allocator one on top of 
> mmap/VirtualAlloc.
> This leads me to my biggest problem with classical memory 
> management - ~20 years of PCs having virtual memory support 
> directly in the CPU, and yet hardly anything outside of OS 
> takes advantage of it. They (allocators) seem simply not aware 
> of it - none of interface implies anything that would help user 
> take advantage of it. I'm talking of (potentially) large 
> allocations that may be often resized.
>
> (large allocations actually almost always a blind spot/fallback 
> in allocators)
>
> To the point - we have address space and optionally memory 
> committed in there, leading us to a 3-state of an  _address 
> range_:
> available, reserved, committed. The ranges of 2nd state are 
> dirt cheap (abundant on 64bit) and could be easily flipped to 
> committed and back.
>
> So I have a pattern I want to try to fit in your design. At 
> present it is a datastructure + allocation logic. What I would 
> want is clean and nice composition for ultimate reuse.
>
> An example is a heavy-duty array (could be used for potentially 
> large stack). The requirements are:
> a) Never reallocate on resize - in certain cases it may even be 
> too large to reallocate in RAM (!)
> b) Never waste RAM beyond some limit (a few pages or a tiny 
> faction of size is fine)
> c) O(1) amortized appending as in plain array and other 
> properties of an array -> it's a dynamic array after all
>
> Currently I use mmap/madvise and related entities directly. It 
> goes as follows.
>
> Allocate a sufficiently large - as large as "it" may get (1Mb, 
> 1G, 10G you name it) _address range_. Call this size a 
> 'capacity'.
> Commit some memory (optionally on first resize) - call this a 
> 'committed'. Appending goes using up committed space as typical 
> array would do. Whenever it needs more it then that applies the 
> same extending algorithm as usual but with (committed, 
> capacity) pair.
> Ditto on resize back - (with some amortization) it asks OS to 
> decommit pages decreasing the commited amount.
>
> In short - a c++ "vector" that has 2 capacities (current and 
> absolute maximum) with a plus that it never reallocates. What 
> to do if/when it hits the upper bound - is up to specific 
> application (terminate, fallback to something else). It has the 
> danger of sucking up address space on 32bits though.. but who 
> cares :)
>
>

Somewhat related: 
http://probablydance.com/2013/05/13/4gb-per-vector/