std.allocator needs your help

Mon Sep 23 08:32:26 PDT 2013

On 23 September 2013 23:58, Andrei Alexandrescu <
SeeWebsiteForEmail at erdani.org> wrote:

> On 9/22/13 9:03 PM, Manu wrote:
>
>> On 23 September 2013 12:28, Andrei Alexandrescu
>> <SeeWebsiteForEmail at erdani.org> wrote:
>>
>>     My design makes it very easy to experiment by allowing one to define
>>     complex allocators out of a few simple building blocks. It is not a
>>     general-purpose allocator, but it allows one to define any number of
>>     such.
>>
>> Oh okay, so this isn't really intended as a system then, so much a
>> suggested API?
>>
>
> For some definition of "system" and "API", yes :o).
>
>
>> That makes almost all my questions redundant. I'm interested in the
>> system, not the API of a single allocator (although your API looks fine
>> to me).
>> I already have allocators I use in my own code. Naturally, they don't
>> inter-operate with anything, and that's what I thought std.allocator was
>> meant to address.
>>
>
> Great. Do you have a couple of nontrivial allocators (heap, buddy system
> etc) that could be adapted to the described API?
>

Err, not really actually. When I use custom allocators, it's for
performance, which basically implies that it IS a trivial allocator :)
The common ones I use are: stack-based mark&release, circular buffers,
pools, pool groups (collection of different sized pools)... that might be
it actually. Very simple tools for different purposes.
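
To make "very simple tools" concrete, here is a minimal sketch of the first item on that list, a stack-based mark&release allocator. It is written in C++ purely for illustration; the class name, the method names and the alignment handling are assumptions made for this sketch, not part of the proposed std.allocator API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stack-style allocator over a caller-supplied buffer.
// Offsets are aligned relative to the buffer start, so the buffer itself
// should be aligned at least as strictly as anything allocated from it.
class StackAllocator {
public:
    typedef std::size_t Marker;

    StackAllocator(void* buffer, std::size_t size)
        : base_(static_cast<unsigned char*>(buffer)), size_(size), top_(0) {}

    // Bump the top offset; returns nullptr when the buffer is exhausted.
    // `align` must be a power of two.
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t start = (top_ + align - 1) & ~(align - 1);
        if (start > size_ || bytes > size_ - start) return nullptr;
        top_ = start + bytes;
        return base_ + start;
    }

    // Remember the current top of the stack.
    Marker mark() const { return top_; }

    // Free everything allocated since `m` in a single step.
    void release(Marker m) {
        assert(m <= top_);
        top_ = m;
    }

    std::size_t used() const { return top_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t top_;
};
```

Typical use is to take a mark at the start of a frame or a job, allocate freely, and release back to the mark at the end. There is no per-allocation free at all, which is exactly why it is fast.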

>>     The proposed design makes it easy to create allocator objects. How
>>     they are used and combined is left to the application.
>>
>> Is that the intended limit of std.allocator's responsibility, or will
>> patterns come later?
>>
>
> Some higher level design will come later. I'm not sure whether or not
> you'll find it satisfying, for reasons I'll expand on below.
>
>
>> Leaving the usage up to the application means we've gained nothing.
>> I already have more than enough allocators which I use throughout my
>> code. The problem is that they don't inter-operate, and certainly not
>> with foreign code/libraries.
>> This is what I hoped std.allocator would address.
>>
>
> Again, if you already have many allocators, please let me know if you can
> share some.
>
> std.allocator will prescribe a standard for defining allocators, with
> which the rest of std will work, same as std.range prescribes a standard
> for defining ranges, with which std.algorithm, std.format, and other
> modules work. Clearly one could come back with "but I already have my own
> ranges that use first/done/next instead of front/empty/popFront, so I'm not
> sure what we're gaining here".
>

No, it's just that I'm saying std.allocator needs to do a lot more than
define a contract before I can start to consider if it solves my problems.
This is a good first step though, I'm happy to discuss this, but I think
discussion about the practical application may also reveal design details
at this level.

It's like you say, I can rename my allocator's methods to suit an agreed
standard, that'll take me 2 minutes, but it's how the rest of the universe
interacts with that API that matters, and if it effectively solves my
problems.

>>     An allocator instance is a variable like any other. So you use the
>>     classic techniques (shared globals, thread-local globals, passing
>>     around as parameter) for using the same allocator object from
>>     multiple places.
>>
>>
>> Okay, that's fine... but this sort of manual management implies that I'm
>> using it explicitly. That's where it all falls down for me.
>>
>
> I think a disconnect here is that you think "it" where I think "them".
> It's natural for an application to use one allocator that's not provided by
> the standard library, and it's often the case that an application defines
> and uses _several_ allocators for different parts of it. Then the natural
> question arises, how to deal with these allocators, pass them around, etc.
> etc.

No, I certainly understand you mean 'them', but that leads to what I'm
asking: how do these things get carried/passed around? Are they discreet,
or will they invade argument lists everywhere? Are they free to flow in/out
of libraries in a natural way?
These patterns are what will define the system as I see it.
Perhaps more importantly, where do these allocators get their memory
themselves (if they're not a bottom-level allocator)? Global override
perhaps, or should a memory source always be explicitly provided to a
non-bottom-level allocator?
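
As a sketch of the second option (a memory source that is always provided explicitly), a non-bottom-level pool could take its parent by reference at construction. This is C++ for illustration only; `Mallocator`, `Pool` and their members are hypothetical names, and the parent is assumed to return memory aligned for any ordinary type, as `malloc` does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical bottom-level allocator: goes straight to the C heap.
struct Mallocator {
    std::size_t live;  // outstanding allocations, to make the flow visible
    Mallocator() : live(0) {}
    void* allocate(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (p) ++live;
        return p;
    }
    void deallocate(void* p) {
        if (p) { std::free(p); --live; }
    }
};

// Hypothetical fixed-size pool that sources one slab from an explicit parent.
template <class Parent>
class Pool {
public:
    Pool(Parent& parent, std::size_t blockSize, std::size_t blockCount)
        : parent_(parent), slab_(nullptr), free_(nullptr) {
        // Round the block size up so every block can hold a free-list node
        // and stays maximally aligned within the slab.
        const std::size_t a = alignof(std::max_align_t);
        if (blockSize < sizeof(Node)) blockSize = sizeof(Node);
        blockSize = (blockSize + a - 1) / a * a;
        slab_ = static_cast<unsigned char*>(
            parent_.allocate(blockSize * blockCount));
        if (!slab_) return;
        // Thread the blocks onto the free list, lowest address first.
        for (std::size_t i = blockCount; i-- > 0;) {
            Node* n = new (slab_ + i * blockSize) Node;
            n->next = free_;
            free_ = n;
        }
    }

    // The slab goes back to the parent it came from.
    ~Pool() { parent_.deallocate(slab_); }

    Pool(const Pool&) = delete;
    Pool& operator=(const Pool&) = delete;

    // Pop a block off the free list; nullptr when the pool is empty.
    void* allocate() {
        if (!free_) return nullptr;
        Node* n = free_;
        free_ = n->next;
        return n;
    }

    // Push a block back onto the free list.
    void deallocate(void* p) {
        if (!p) return;
        Node* n = new (p) Node;
        n->next = free_;
        free_ = n;
    }

private:
    struct Node { Node* next; };
    Parent& parent_;
    unsigned char* slab_;
    Node* free_;
};
```

The cost of this style is visible in the constructor signature: whoever creates the pool has to have a parent in hand, which is the "invading argument lists" concern in a nutshell.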

>> Eg, I want to use a library; its allocation patterns are incompatible
>> with my application; I need to provide it with an allocator.
>> What now? Is every library responsible for presenting the user with a
>> mechanism for providing allocators? What if the author forgets? (a
>> problem I've frequently had to chase up in the past when dealing with
>> 3rd party libraries)
>>
>
> If the author forgets and hardcodes a library to use malloc(), I have no
> way around that.

Sure, but the common case is that the author will almost certainly use
keyword 'new'. How can I affect that as a 3rd party?
This would require me overriding the global allocator somehow... which you
touched on earlier.

>> Once a library is designed to expect a user to supply an allocator, what
>> happens if the user doesn't? Fall-back logic/boilerplate exists in every
>> library I guess...
>>
>
> The library wouldn't need to worry as there would be the notion of a
> default allocator (probably backed by the existing GC).

Right. So it's looking like the ability to override the global
allocator is a critical requirement.

>> And does that mean that applications+libraries are required to ALWAYS
>> allocate through given allocator objects?
>>
>
> Yes, they should.

Then we make keyword 'new' redundant?

>> That effectively makes the new keyword redundant.
>>
>
> new will still be used to tap into the global shared GC. std.allocator
> will provide other means of allocating memory.

I think the system will fail here. People will use 'new', simply because
it's a keyword. Once that's boxed in a library, I will no longer be able to
affect that inconsiderate behaviour from my application.
Again, I think this signals that a global override is necessary.

>> And what about the GC?
>>
>
> The current global GC is unaffected for the time being.
>
>
>> I can't really consider std.allocator until it presents some usage
>> patterns.
>>
>
> Then you'd need to wait a little bit.
>

Okay.

>>         It wasn't clear to me from your demonstration, but 'collect()'
>>         implies that GC becomes allocator-aware; how does that work?
>>
>>
>>     No, each allocator has its own means of dealing with memory. One
>>     could define a tracing allocator independent of the global GC.
>>
>>
>> I'm not sure what this means. Other than I gather that the GC and
>> allocators are fundamentally separate?
>>
>
> Yes, they'd be distinct. Imagine an allocator that requests 4 MB from the
> GC as NO_SCAN memory, and then does its own management inside that block.
> User-level code allocates and frees e.g. strings or whatever from that
> block, without the global GC intervening.

Yup, that's fine. But what if the GC isn't the bottom level? There's just
another allocator underneath.
What I'm saying is, the GC should *be* an allocator, not be a separate
entity.

I want to eliminate the GC from my application. Ideally, in the future, it
can be replaced with an ARC, which I have become convinced is the right
choice for my work.

>> Is it possible to create a tracing allocator without language support?
>>
>
> I think it is possible.
>
>
>> Does the current language insert any runtime calls to support the GC?
>>
>
> Aside from operator new, I don't think so.

Okay, so a flexible lowering of 'new' is all we need for now?
It will certainly need substantially more language support for ARC.

>> I want a ref-counting GC for instance to replace the existing GC, but
>> it's impossible to implement one of them nicely without support from the
>> language, to insert implicit inc/dec ref calls all over the place, and
>> to optimise away redundant inc/dec sequences.
>>
>
> Unfortunately that's a chimera I had to abandon, at least at this level.

And there's the part you said I'm not going to like? ;)

> The problem is that installing an allocator does not get to define what a
> pointer is and what a reference is.

Why not? A pointer has a type, like anything else. An ARC pointer can
theoretically have the compiler insert ARC magic.
That does imply though that the allocator affects the type, which I don't
like... I'll think on it.

> These are notions hardwired into the language, so the notion of turning a
> switch and replacing the global GC with a reference counting scheme is
> impossible at the level of a library API.
>

Indeed it is. So is this API being built upon an incomplete foundation? Is
there something missing, and can it be added later, or will this design
cement some details that might need changing in the future? (we all know
potentially breaking changes like that will never actually happen)

> (As an aside, you still need tracing for collecting cycles in a transparent
> reference counting scheme, so it's not all roses.)
>

It's true, but it's possible to explicitly control all those factors. It
remains deterministic.

> What I do hope to get to is to have allocators define their own pointers
> and reference types. User code that uses those will be guaranteed certain
> allocation behaviors.

Interesting, will this mangle the pointer type, or the object type being
pointed to? The latter is obviously not desirable. Does the former actually
work in theory?

>> I can easily define an allocator to use in my own code if it's entirely
>> up to me how I use it, but that completely defeats the purpose of this
>> exercise.
>>
>
> It doesn't. As long as the standard prescribes ONE specific API for
> defining untyped allocators, if you define your own to satisfy that API,
> then you'll be able to use your allocator with e.g. std.container, just as
> defining your own range the way std.range requires allows you to tap into
> std.algorithm.

I realise this. That's all fine.

>> Until there are standard usage patterns, practices, conventions that
>> ALL code follows, then we have nothing. I was hoping to hear your
>> thoughts about those details.
>>
>
>
>
>>         It's quite an additional burden of resources and management to
>>         manage the individual allocations with a range allocator above
>>         what is supposed to be a performance critical allocator to begin
>>         with.
>>
>>
>>     I don't understand this.
>>
>>
>> It's irrelevant here.
>> But fwiw, in relation to the prior point about block-freeing a range
>> allocation;
>>
>
> What is a "range allocation"?
>
>
>> there will be many *typed* allocations within these ranges,
>> but a typical range allocator doesn't keep track of the allocations
>> within.
>>
>
> Do you mean s/range/region/?

Yes.
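
For reference, the bookkeeping at issue (a region that remembers the type of each allocation so that a block free can still run destructors) might look roughly like the sketch below. It is C++ for illustration; `TrackedRegion`, `make`, `releaseAll` and the `Noisy` payload are all hypothetical names. The record list lives on the global heap and no attempt is made at exception safety, so treat it as a picture of the idea rather than a design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <utility>
#include <vector>

// Hypothetical region that records how to destroy each typed allocation.
class TrackedRegion {
public:
    explicit TrackedRegion(std::size_t bytes) : storage_(bytes), top_(0) {}
    ~TrackedRegion() { releaseAll(); }

    // Records hold pointers into storage_, so copying is disabled.
    TrackedRegion(const TrackedRegion&) = delete;
    TrackedRegion& operator=(const TrackedRegion&) = delete;

    // Construct a T inside the region and remember its destructor.
    // Returns nullptr when the region is full.
    template <class T, class... Args>
    T* make(Args&&... args) {
        std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(storage_.data());
        std::uintptr_t addr = (base + top_ + alignof(T) - 1) &
                              ~(std::uintptr_t(alignof(T)) - 1);
        std::size_t start = static_cast<std::size_t>(addr - base);
        if (start > storage_.size() || sizeof(T) > storage_.size() - start)
            return nullptr;
        T* obj = new (storage_.data() + start) T(std::forward<Args>(args)...);
        Record r = { obj, &destroyAs<T> };
        records_.push_back(r);
        top_ = start + sizeof(T);
        return obj;
    }

    // Block-free: run destructors in reverse order, then reset the region.
    void releaseAll() {
        while (!records_.empty()) {
            Record r = records_.back();
            records_.pop_back();
            r.destroy(r.object);
        }
        top_ = 0;
    }

    std::size_t used() const { return top_; }

private:
    struct Record { void* object; void (*destroy)(void*); };

    template <class T>
    static void destroyAs(void* p) { static_cast<T*>(p)->~T(); }

    std::vector<unsigned char> storage_;
    std::vector<Record> records_;
    std::size_t top_;
};

// Example payload: appends its id to a log when destroyed.
struct Noisy {
    std::vector<int>* log;
    int id;
    Noisy(std::vector<int>* l, int i) : log(l), id(i) {}
    ~Noisy() { log->push_back(id); }
};
```

One record per allocation is precisely the extra burden being complained about: a plain region needs only a single offset, while this one grows a side table in proportion to the number of objects.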

>> This seems like a common problem that may or may not want to be
>> addressed in std.allocator.
>> If the answer is simply "your range allocator should keep track of the
>> offsets of allocations, and their types", then fine. But that seems like
>> boilerplate that could be automated, or maybe there is a
>> different/separate system for such tracking?
>>
>
> If you meant region, then yes that's boilerplate that hopefully will be
> reasonably automated by std.allocator. (What I discussed so far predates
> that stage of the design.)
>
>>         C++'s design seems reasonable in some ways, but history has
>>         demonstrated that it's a total failure, which is almost never
>>         actually used (I've certainly never seen anyone use it).
>>
>>
>>     Agreed. I've seen some uses of it that quite fall within the notion
>>     of the proverbial exception that proves the rule.
>>
>>
>> I think the main fail of C++'s design is that it mangles the type.
>> I don't think a type should be defined by the way its memory is
>> allocated, especially since that could change from application to
>> application, or even call to call. For my money, that's the fundamental
>> flaw in C++'s design.
>>
>
> This is not a flaw as much as an engineering choice with advantages and
> disadvantages on the relative merits of which reasonable people may
> disagree.
>
> There are two _fundamental_ flaws of the C++ allocator design, in the
> sense that they are very difficult to argue in favor of and relatively easy
> to argue against:
>
> 1. Allocators are parameterized by type; instead, individual allocations
> should be parameterized by type.
>
> 2. There is no appropriate handling for allocators with state.
>
> The proposed std.allocator design deals with (2) with care, and will deal
> with (1) when it gets to typed allocators.

Fair enough. These are certainly more critical mistakes than the one I
raised.
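
To pin down what those two points mean in practice, here is a small sketch in which the allocator itself is untyped and carries its own state, and the type only appears at the individual allocation. It is C++ for illustration and is not the std.allocator API; `CountingAllocator`, `make` and `dispose` are hypothetical names, and the sketch assumes T needs no stricter alignment than `malloc` provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical untyped allocator with per-instance state.
struct CountingAllocator {
    std::size_t bytesLive;  // state belongs to this instance, not to a type
    CountingAllocator() : bytesLive(0) {}

    void* allocate(std::size_t n) {
        void* p = std::malloc(n);
        if (p) bytesLive += n;
        return p;
    }

    void deallocate(void* p, std::size_t n) {
        if (!p) return;
        std::free(p);
        bytesLive -= n;
    }
};

// The type enters at the allocation site, not in the allocator's own type.
template <class T, class Alloc, class... Args>
T* make(Alloc& a, Args&&... args) {
    void* raw = a.allocate(sizeof(T));
    if (!raw) return nullptr;
    try {
        return new (raw) T(std::forward<Args>(args)...);
    } catch (...) {
        a.deallocate(raw, sizeof(T));  // give the memory back if T's ctor throws
        throw;
    }
}

// Destroy the object, then return its memory to the same allocator.
template <class T, class Alloc>
void dispose(Alloc& a, T* obj) {
    if (!obj) return;
    obj->~T();
    a.deallocate(obj, sizeof(T));
}
```

One allocator object serves an `int` and a `double` alike, and two instances of the same allocator type keep separate counts. Neither falls out naturally when the allocator is a template parameterized on the element type.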
I'm trying to remember the details of the practical failures I ran into
trying to use C++ allocators years ago.
Eventually, experience proved to us (myself and colleagues) that it wasn't
worth the mess, and we simply pursued a more direct solution. I've heard
similar stories from friends in other companies...
I need to try and recall the specific scenarios though, they might be
interesting :/ .. (going back the better part of a decade >_<)

>> Well as an atom, as you say, it seems like a good first step.
>> I can't see any obvious issues, although I don't think I quite
>> understand the collect() function if it has no relation to the GC. What
>> is its purpose?
>>
>
> At this point collect() is only implemented by the global GC. It is
> possible I'll drop it from the final design. However, it's also possible
> that collect() will be properly defined as "collect all objects allocated
> within this particular allocator that are not referred from any objects
> also allocated within this allocator". I think that's a useful definition.

Perhaps. I'm not sure how this situation arises though. Unless you've
managed to implement your own GC inside an allocator.

>> If the idea is that you might implement some sort of tracking heap which
>> is able to perform a collect, how is that actually practical without
>> language support?
>>
>
> Language support would be needed for things like scanning the stack and
> the globals. But one can gainfully use a heap with semantics as described
> just above, which requires no language support.
>
>
>> I had imagined going into this that, like the range interface which the
>> _language_ understands and interacts with, the allocator interface would
>> be the same, ie, the language would understand this API and integrate it
>> with 'new', and the GC... somehow.
>>
>
> The D language has no idea what a range is. The notion is completely
> defined in std.range.
>
>
>> If allocators are just an object like in C++ that people may or may not
>> use, I don't think it'll succeed as a system. I reckon it needs deep
>> language integration to be truly useful.
>>
>
> I guess that's to be seen.

I think a critical detail to keep in mind is that (I suspect) people
simply won't use it if it doesn't interface with keyword 'new'.
It also complicates generic code, and makes it more difficult to retrofit
an allocator where 'new' is already in use.

>> The key problem to solve is the friction between different libraries,
>> and different moments within a single application itself.
>> I feel almost like the 'current' allocator needs to be managed as some
>> sort of state-machine. Passing them manually down the callstack is no
>> good. And 'hard' binding objects to their allocators like C++ is no good
>> either.
>>
>
> I think it's understood that if a library chooses its own ways to allocate
> memory, there's no way around that.

Are we talking here about explicit choice for sourcing memory, or just that
the library allocates through the default/GC?

This is the case where I like to distinguish a bottom-level allocator from
a high-level allocator.
A library probably wants to use some patterns for allocation of its
objects; these are high-level allocators, but where it sources its memory
from still needs to be overridable.
It's extremely common that I want to enforce that a library exist entirely
within a designated heap. It can't fall back to the global GC.

I work on platforms where memory is not unified. Different resources need
to go into different heaps.
It has happened on numerous occasions that we have been denied a useful
library simply because the author did not provide allocation hooks, and the
author was not responsive to requests... leading to my favourite scenario
of re-inventing yet another wheel (the story of my career).
It shouldn't be the case that the author has to manually account for the
possibility that someone might want to provide a heap for the library's
resources.
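
One way to picture what is being asked for is a per-thread "current allocator" that the application can override for the duration of a scope, so that library code picks it up without any allocator parameter. The following is a C++ sketch of that idea only; `IAllocator`, `currentAllocator`, `AllocatorScope` and `libraryWork` are hypothetical names, and the virtual interface is an assumption of the sketch rather than anything in the proposed design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical runtime interface for a bottom-level memory source.
struct IAllocator {
    virtual void* allocate(std::size_t bytes) = 0;
    virtual void deallocate(void* p) = 0;
    virtual ~IAllocator() {}
};

// Default source used when nobody has installed anything else.
struct HeapAllocator : IAllocator {
    void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
    void deallocate(void* p) override { std::free(p); }
};

// Stand-in for a designated heap; counts what passes through it.
struct CountingHeap : IAllocator {
    std::size_t calls;
    CountingHeap() : calls(0) {}
    void* allocate(std::size_t bytes) override {
        ++calls;
        return std::malloc(bytes);
    }
    void deallocate(void* p) override { std::free(p); }
};

// Per-thread slot holding the allocator currently in effect.
inline IAllocator*& currentSlot() {
    static HeapAllocator fallback;
    static thread_local IAllocator* current = &fallback;
    return current;
}

inline IAllocator& currentAllocator() { return *currentSlot(); }

// RAII guard: installs an allocator for this thread and restores the
// previous one when the scope ends.
class AllocatorScope {
public:
    explicit AllocatorScope(IAllocator& a) : previous_(currentSlot()) {
        currentSlot() = &a;
    }
    ~AllocatorScope() { currentSlot() = previous_; }
    AllocatorScope(const AllocatorScope&) = delete;
    AllocatorScope& operator=(const AllocatorScope&) = delete;

private:
    IAllocator* previous_;
};

// Stand-in for third-party code that never mentions an allocator parameter.
inline void libraryWork() {
    IAllocator& a = currentAllocator();
    void* p = a.allocate(64);
    a.deallocate(p);
}
```

The sketch sidesteps two hard parts. Memory that outlives the scope must still be returned to the allocator it came from, which nothing here tracks, and a library that calls 'new' directly bypasses the slot entirely unless the language lowers 'new' through it.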

This is equally true for the filesystem (a gripe I haven't really raised in
D yet).

> The point of std.allocator is that it defines a common interface that user
> code can work with.
>
>
> Andrei
>
>