std.allocator needs your help

Mon Sep 23 06:58:42 PDT 2013

On 9/22/13 9:03 PM, Manu wrote:
> On 23 September 2013 12:28, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org>
> wrote:
>     My design makes it very easy to experiment by allowing one to define
>     complex allocators out of a few simple building blocks. It is not a
>     general-purpose allocator, but it allows one to define any number of
>     such.
>
> Oh okay, so this isn't really intended as a system then, so much as a
> suggested API?

For some definition of "system" and "API", yes :o).

> That makes almost all my questions redundant. I'm interested in the
> system, not the API of a single allocator (although your API looks fine
> to me).
> I already have allocators I use in my own code. Naturally, they don't
> inter-operate with anything, and that's what I thought std.allocator was
> meant to address.

Great. Do you have a couple of nontrivial allocators (heap, buddy system 
etc) that could be adapted to the described API?

>     The proposed design makes it easy to create allocator objects. How
>     they are used and combined is left to the application.
>
> Is that the intended limit of std.allocator's responsibility, or will
> patterns come later?

Some higher level design will come later. I'm not sure whether or not 
you'll find it satisfying, for reasons I'll expand on below.

> Leaving the usage up to the application means we've gained nothing.
> I already have more than enough allocators which I use throughout my
> code. The problem is that they don't inter-operate, and certainly not
> with foreign code/libraries.
> This is what I hoped std.allocator would address.

Again, if you already have many allocators, please let me know if you 
can share some.

std.allocator will prescribe a standard for defining allocators, with 
which the rest of std will work, same as std.range prescribes a standard 
for defining ranges, with which std.algorithm, std.format, and other 
modules work. Clearly one could come back with "but I already have my 
own ranges that use first/done/next instead of front/empty/popFront, so 
I'm not sure what we're gaining here".
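
To make the range analogy concrete, here is a minimal sketch of what "a standard for defining allocators" could look like. It assumes an allocator is any type offering allocate(size_t) and deallocate(void[]). The names isAllocator, Mallocator, and duplicate are hypothetical and not part of any published std.allocator API.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical trait, in the spirit of std.range's isInputRange: a type
// counts as an allocator if these two calls compile.
enum isAllocator(A) = is(typeof((ref A a) {
    void[] block = a.allocate(16);
    a.deallocate(block);
}));

// One possible allocator satisfying the trait, backed by C's malloc/free.
struct Mallocator
{
    void[] allocate(size_t n)
    {
        if (n == 0) return null;
        auto p = malloc(n);
        if (p is null) return null;
        return p[0 .. n];
    }

    void deallocate(void[] b)
    {
        free(b.ptr);
    }
}

// Generic code constrains on the trait instead of naming a concrete type.
char[] duplicate(A)(ref A alloc, const(char)[] s)
    if (isAllocator!A)
{
    auto copy = cast(char[]) alloc.allocate(s.length);
    if (copy.length != s.length) return null; // allocation failed
    copy[] = s[];
    return copy;
}

static assert(isAllocator!Mallocator);
static assert(!isAllocator!int);

void main()
{
    Mallocator m;
    auto s = duplicate(m, "hello");
    assert(s == "hello");
    m.deallocate(s);
}
```

Any user-defined type with the same two members would pass the constraint, in the same way a user-defined range works with std.algorithm.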

>     An allocator instance is a variable like any other. So you use the
>     classic techniques (shared globals, thread-local globals, passing
>     around as parameter) for using the same allocator object from
>     multiple places.
>
>
> Okay, that's fine... but this sort of manual management implies that I'm
> using it explicitly. That's where it all falls down for me.

I think a disconnect here is that you think "it" where I think "them". 
It's natural for an application to use one allocator that's not provided 
by the standard library, and it's often the case that an application 
defines and uses _several_ allocators for different parts of it. Then 
the natural question arises, how to deal with these allocators, pass 
them around, etc. etc.

> Eg, I want to use a library, its allocation patterns are incompatible
> with my application; I need to provide it with an allocator.
> What now? Is every library responsible for presenting the user with a
> mechanism for providing allocators? What if the author forgets? (a
> problem I've frequently had to chase up in the past when dealing with
> 3rd party libraries)

If the author forgets and hardcodes a library to use malloc(), I have no 
way around that.

> Once a library is designed to expect a user to supply an allocator, what
> happens if the user doesn't? Fall-back logic/boilerplate exists in every
> library I guess...

The library wouldn't need to worry as there would be the notion of a 
default allocator (probably backed by the existing GC).
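
As a rough illustration of that fallback, assuming the same allocate/deallocate convention: a library routine can accept any allocator and also offer an overload that quietly uses a GC-backed default. GCAllocator, defaultAllocator, CountingAllocator, and firstSquares are hypothetical names invented for this sketch.

```d
import core.memory : GC;

// Hypothetical GC-backed allocator standing in for "the default allocator".
struct GCAllocator
{
    void[] allocate(size_t n)
    {
        if (n == 0) return null;
        return GC.malloc(n)[0 .. n];
    }

    void deallocate(void[] b)
    {
        GC.free(b.ptr);
    }
}

// Hypothetical thread-local default instance.
GCAllocator defaultAllocator;

// Library routine, general form: the caller supplies the allocator.
int[] firstSquares(A)(ref A alloc, size_t count)
{
    auto result = cast(int[]) alloc.allocate(count * int.sizeof);
    foreach (i, ref x; result)
        x = cast(int) (i * i);
    return result;
}

// Convenience overload: no allocator given, so fall back to the default.
int[] firstSquares(size_t count)
{
    return firstSquares(defaultAllocator, count);
}

// A user-side allocator that wraps the default and counts requests.
struct CountingAllocator
{
    GCAllocator parent;
    size_t calls;

    void[] allocate(size_t n) { ++calls; return parent.allocate(n); }
    void deallocate(void[] b) { parent.deallocate(b); }
}

void main()
{
    auto a = firstSquares(4);        // uses the default allocator
    assert(a == [0, 1, 4, 9]);

    CountingAllocator mine;
    auto b = firstSquares(mine, 3);  // uses the caller's allocator
    assert(b == [0, 1, 4]);
    assert(mine.calls == 1);
}
```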

> And does that mean that applications+libraries are required to ALWAYS
> allocate through given allocator objects?

Yes, they should.

> That effectively makes the new keyword redundant.

new will still be used to tap into the global shared GC. std.allocator 
will provide other means of allocating memory.

> And what about the GC?

The current global GC is unaffected for the time being.

> I can't really consider std.allocator until it presents some usage patterns.

Then you'd need to wait a little bit.

>         It wasn't clear to me from your demonstration, but 'collect()'
>         implies
>         that GC becomes allocator-aware; how does that work?
>
>
>     No, each allocator has its own means of dealing with memory. One
>     could define a tracing allocator independent of the global GC.
>
>
> I'm not sure what this means. Other than I gather that the GC and
> allocators are fundamentally separate?

Yes, they'd be distinct. Imagine an allocator that requests 4 MB from 
the GC as NO_SCAN memory, and then does its own management inside that 
block. User-level code allocates and frees e.g. strings or whatever from 
that block, without the global GC intervening.
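
A minimal sketch of such an allocator, under simplifying assumptions: it bump-allocates from a single GC block and only supports releasing everything at once, not freeing individual blocks. GCRegion and its members are hypothetical names. GC.malloc and GC.BlkAttr.NO_SCAN are the existing core.memory calls.

```d
import core.memory : GC;

// Hypothetical region allocator: takes one block from the global GC, marked
// NO_SCAN so the collector does not scan its contents, and hands out pieces.
struct GCRegion
{
    private void[] store; // the single GC-owned block
    private size_t used;  // bytes consumed so far, including padding

    this(size_t capacity)
    {
        store = GC.malloc(capacity, GC.BlkAttr.NO_SCAN)[0 .. capacity];
    }

    void[] allocate(size_t n)
    {
        enum size_t alignment = 16;
        // Round the request up so the next block starts aligned.
        immutable rounded = (n + alignment - 1) & ~(alignment - 1);
        if (n == 0 || rounded < n || rounded > store.length - used)
            return null;
        auto block = store[used .. used + n];
        used += rounded;
        return block;
    }

    // Simplification: no per-block free, the whole region is reset at once.
    void deallocateAll() { used = 0; }

    size_t bytesUsed() const { return used; }
}

void main()
{
    auto region = GCRegion(4 * 1024 * 1024); // 4 MB, as in the text
    auto s = cast(char[]) region.allocate(5);
    s[] = "hello";
    assert(s == "hello");
    assert(region.bytesUsed == 16);

    region.deallocateAll();
    assert(region.bytesUsed == 0);
}
```

Because the block is NO_SCAN, pointers stored inside it do not keep other GC objects alive, so a sketch like this only suits pointer-free data such as the strings mentioned above.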

> Is it possible to create a tracing allocator without language support?

I think it is possible.

> Does the current language insert any runtime calls to support the GC?

Aside from operator new, I don't think so.

> I want a ref-counting GC for instance to replace the existing GC, but
> it's impossible to implement one of them nicely without support from the
> language, to insert implicit inc/dec ref calls all over the place, and
> to optimise away redundant inc/dec sequences.

Unfortunately that's a chimera I had to abandon, at least at this level. 
The problem is that installing an allocator does not get to define what 
a pointer is and what a reference is. These are notions hardwired into 
the language, so the notion of turning a switch and replacing the global 
GC with a reference counting scheme is impossible at the level of a 
library API.

(As an aside, you still need tracing for collecting cycles in a 
transparent reference counting scheme, so it's not all roses.)

What I do hope to get to is to have allocators define their own pointers 
and reference types. User code that uses those will be guaranteed 
certain allocation behaviors.

> I can easily define an allocator to use in my own code if it's entirely
> up to me how I use it, but that completely defeats the purpose of this
> exercise.

It doesn't. As long as the standard prescribes ONE specific API for 
defining untyped allocators, if you define your own to satisfy that API, 
then you'll be able to use your allocator with e.g. std.container, just 
as defining your own range the way std.range requires allows you to 
tap into std.algorithm.

> Until there are standard usage patterns, practices, conventions that
> ALL code follows, we have nothing. I was hoping to hear your
> thoughts about those details.

>         It's quite an additional burden of resources and management to
>         manage
>         the individual allocations with a range allocator above what is
>         supposed
>         to be a performance critical allocator to begin with.
>
>
>     I don't understand this.
>
>
> It's irrelevant here.
> But fwiw, in relation to the prior point about block-freeing a range
> allocation;

What is a "range allocation"?

> there will be many *typed* allocations within these ranges,
> but a typical range allocator doesn't keep track of the allocations within.

Do you mean s/range/region/?

> This seems like a common problem that may or may not want to be
> addressed in std.allocator.
> If the answer is simply "your range allocator should keep track of the
> offsets of allocations, and their types", then fine. But that seems like
> boilerplate that could be automated, or maybe there is a
> different/separate system for such tracking?

If you meant region, then yes that's boilerplate that hopefully will be 
reasonably automated by std.allocator. (What I discussed so far predates 
that stage of the design.)

>         C++'s design seems reasonable in some ways, but history has
>         demonstrated
>         that it's a total failure, which is almost never actually used (I've
>         certainly never seen anyone use it).
>
>
>     Agreed. I've seen some uses of it that quite fall within the notion
>     of the proverbial exception that proves the rule.
>
>
> I think the main fail of C++'s design is that it mangles the type.
> I don't think a type should be defined by the way its memory is
> allocated, especially since that could change from application to
> application, or even call to call. For my money, that's the fundamental
> flaw in C++'s design.

This is not a flaw as much as an engineering choice with advantages and 
disadvantages on the relative merits of which reasonable people may 
disagree.

There are two _fundamental_ flaws of the C++ allocator design, in the 
sense that they are very difficult to argue in favor of and relatively 
easy to argue against:

1. Allocators are parameterized by type; instead, individual allocations 
should be parameterized by type.

2. There is no appropriate handling for allocators with state.

The proposed std.allocator design deals with (2) with care, and will 
deal with (1) when it gets to typed allocators.
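
To illustrate the two points, here is one way they might look in D. This is a sketch under assumptions, not the eventual typed-allocator design: the allocator below is untyped and carries state (a live-block counter), while the type appears only at each allocation site. CountingMallocator, make, dispose, and Point are hypothetical names. emplace is the existing std.conv function.

```d
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

// Untyped, stateful allocator: it hands out raw bytes and tracks how many
// blocks are currently live. Nothing here is parameterized by element type.
struct CountingMallocator
{
    size_t live;

    void[] allocate(size_t n)
    {
        if (n == 0) return null;
        auto p = malloc(n);
        if (p is null) return null;
        ++live;
        return p[0 .. n];
    }

    void deallocate(void[] b)
    {
        if (b.ptr is null) return;
        free(b.ptr);
        --live;
    }
}

// The individual allocation is parameterized by type, not the allocator.
T* make(T, A, Args...)(ref A alloc, auto ref Args args)
{
    auto raw = alloc.allocate(T.sizeof);
    if (raw.length != T.sizeof) return null;
    return emplace(cast(T*) raw.ptr, args);
}

// Destroys the object and returns its bytes to the same allocator instance.
void dispose(A, T)(ref A alloc, T* p)
{
    destroy(*p);
    alloc.deallocate((cast(void*) p)[0 .. T.sizeof]);
}

struct Point { int x, y; }

void main()
{
    CountingMallocator alloc;          // state lives in this instance

    auto n = make!int(alloc, 42);      // same allocator, different types
    auto p = make!Point(alloc, 3, 4);
    assert(*n == 42);
    assert(p.x == 3 && p.y == 4);
    assert(alloc.live == 2);

    dispose(alloc, n);
    dispose(alloc, p);
    assert(alloc.live == 0);
}
```

The allocator is passed by reference so that its state stays with one instance, which is the situation point (2) is about.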

> Well as an atom, as you say, it seems like a good first step.
> I can't see any obvious issues, although I don't think I quite
> understand the collect() function if it has no relation to the GC. What
> is its purpose?

At this point collect() is only implemented by the global GC. It is 
possible I'll drop it from the final design. However, it's also possible 
that collect() will be properly defined as "collect all objects 
allocated within this particular allocator that are not referred to by 
any objects also allocated within this allocator". I think that's a 
useful definition.

> If the idea is that you might implement some sort of tracking heap which
> is able to perform a collect, how is that actually practical without
> language support?

Language support would be needed for things like scanning the stack and 
the globals. But one can gainfully use a heap with semantics as 
described just above, which requires no language support.

> I had imagined going into this that, like the range interface which the
> _language_ understands and interacts with, the allocator interface would
> be the same, ie, the language would understand this API and integrate it
> with 'new', and the GC... somehow.

The D language has no idea what a range is. The notion is completely 
defined in std.range.

> If allocators are just an object like in C++ that people may or may not
> use, I don't think it'll succeed as a system. I reckon it needs deep
> language integration to be truly useful.

I guess that's to be seen.

> The key problem to solve is the friction between different libraries,
> and different moments within a single application itself.
> I feel almost like the 'current' allocator needs to be managed as some
> sort of state-machine. Passing them manually down the callstack is no
> good. And 'hard' binding objects to their allocators like C++ is no good
> either.

I think it's understood that if a library chooses its own ways to 
allocate memory, there's no way around that. The point of std.allocator 
is that it defines a common interface that user code can work with.

Andrei