std.allocator needs your help

Tue Sep 24 08:25:11 PDT 2013

On 9/23/13 11:06 PM, Manu wrote:
> On 24 September 2013 15:31, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org>
> wrote:
>
>     On 9/23/13 9:56 PM, Manu wrote:
>
>         You can't go wasting GPU memory by overallocating every block.
>
>
>     Only the larger chunk may need to be overallocated if all
>     allocations are then rounded up.
>
>
> I don't follow.
> If I want to allocate 4k aligned, then 8k will be allocated (because it
> wants to store an offset).

What do extant GPU allocators do here?

> Any smaller allocation, let's say 16 bytes, will round up to 4k. You
> can't waste precious GPU RAM like that.

That's easy, you just segregate allocations by size.
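A minimal sketch of segregating by size, assuming two hypothetical sub-allocators: small requests go to one with a 16-byte granularity, large ones to one with a 4096-byte granularity, so a 16-byte request never consumes 4 KB. `Granular` and `Segregator` are illustrative names, not an existing API, and `Granular` takes its storage from the GC, so the sketch only models how many bytes each class consumes, not real page alignment.

```d
// Hypothetical sub-allocator that rounds every request up to a fixed
// granularity and records the bytes consumed. Storage comes from the GC,
// so the returned memory is NOT actually aligned to `granularity`.
struct Granular(size_t granularity)
{
    size_t bytesHandedOut; // total bytes consumed, after rounding

    void[] allocate(size_t n)
    {
        // Round the request up to a multiple of the granularity.
        immutable rounded = (n + granularity - 1) / granularity * granularity;
        bytesHandedOut += rounded;
        return (new ubyte[rounded])[0 .. n];
    }
}

// Dispatches on size: requests up to `threshold` bytes go to Small,
// everything larger goes to Large.
struct Segregator(size_t threshold, Small, Large)
{
    Small small;
    Large large;

    void[] allocate(size_t n)
    {
        return n <= threshold ? small.allocate(n) : large.allocate(n);
    }
}
```

With `Segregator!(1024, Granular!16, Granular!4096)`, a 16-byte request costs 16 bytes in the small class, while a 5000-byte request is rounded up to 8192 bytes in the large class.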

> A minimum and a maximum (guaranteed without over-allocating) alignment
> may be useful.

What's the semantics of the minimum?

> But I think allocators need to be given the opportunity to do the best
> they can.
>
>         It's definitely important that allocators are able to receive an
>         alignment request, and are given the opportunity to fulfill a
>         dynamic alignment request without always resorting to an
>         over-allocation strategy.
>
>
>     I'd need a bit of convincing. I'm not sure everybody needs to pay
>     for a few, and it is quite possible that malloc_align suffers from
>     the same fragmentation issues as the next guy. Also, there's always
>     the possibility of leaving some bits to lower-level functions.
>
>
> What are they paying exactly? An extra arg to allocate that can probably
> be defaulted?
>    void[] allocate(size_t bytes, size_t alignTo = this.alignment) shared;

For allocating relatively small objects (say up to 32K), we're looking 
at tens of cycles, no more. An extra argument needs to be passed around 
and more importantly looked at and acted upon. At this level it's a 
serious dent in the time budget.

Part of the matter is that small objects must, in a way, be the fastest to
allocate. For larger objects, it is true to some extent that the caller
will do some work with the obtained memory, which offsets the relative
cost of allocation. (That being said, Jason Evans told me you can't
always assume the caller will do at least "a memset amount of work".)

Anyhow, it stands to reason that you don't want to pay for
alignment-related work when you never even look at alignment.

> Or is it the burden of adding the overallocation boilerplate logic to
> each allocator for simple allocators that don't want to deal with
> alignment in a conservative way?
> I imagine that could possibly be automated; the boilerplate could be
> provided as a library.
>
> void[] allocate(size_t size, size_t alignment)
> {
>    size_t allocSize =
>        std.allocator.getSizeCompensatingForAlignment(size, alignment);
>
>    void[] mem = ...; // allocation logic using allocSize
>
>    // adjusts the range, and maybe writes the offset to the prior bytes
>    return std.allocator.alignAllocation(mem, alignment);
> }
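The overallocate-and-store-offset boilerplate could look roughly like this minimal sketch. It assumes `malloc`/`free` from `core.stdc.stdlib` as the underlying allocator and a power-of-two alignment of at least `size_t.sizeof`. `getSizeCompensatingForAlignment` and `alignAllocation` reuse the hypothetical helper names from the proposal (they are not an existing std.allocator API); this `alignAllocation` takes an extra `size` parameter, and `alignedAllocate`/`alignedDeallocate` are likewise hypothetical.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical helper: bytes to request so that an `alignment`-aligned block
// of `size` bytes fits, plus room for one size_t offset stored just before it.
size_t getSizeCompensatingForAlignment(size_t size, size_t alignment)
{
    return size + alignment - 1 + size_t.sizeof;
}

// Hypothetical helper: carves an aligned `size`-byte slice out of `raw` and
// stores the distance back to raw.ptr in the size_t right before the slice.
void[] alignAllocation(void[] raw, size_t size, size_t alignment)
{
    immutable base = cast(size_t) raw.ptr;
    // Skip room for the offset, then round up to a multiple of alignment.
    immutable aligned = (base + size_t.sizeof + alignment - 1) & ~(alignment - 1);
    *(cast(size_t*) aligned - 1) = aligned - base;
    return (cast(void*) aligned)[0 .. size];
}

// Hypothetical aligned allocate on top of malloc.
void[] alignedAllocate(size_t size, size_t alignment)
{
    // Assumed: power of two, large enough that the stored offset is aligned.
    assert(alignment >= size_t.sizeof && (alignment & (alignment - 1)) == 0);
    immutable allocSize = getSizeCompensatingForAlignment(size, alignment);
    auto p = malloc(allocSize);
    if (p is null) return null;
    return alignAllocation(p[0 .. allocSize], size, alignment);
}

// Reads the stored offset to recover the pointer malloc returned.
void alignedDeallocate(void[] b)
{
    if (b.ptr is null) return;
    immutable offset = *(cast(size_t*) b.ptr - 1);
    free(cast(ubyte*) b.ptr - offset);
}
```

This is the cost Manu describes: a 4096-byte-aligned request over-asks by up to 4095 + 8 bytes, which is why the scheme hurts for scarce memory such as GPU RAM.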

One possibility I'm thinking of is to make maximum alignment a static
property of the allocator. It may be set at runtime, but a given
allocator object has one defined maximum alignment. Would that be
satisfactory to all?
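A minimal sketch of the fixed-maximum-alignment idea, assuming a hypothetical `Region` bump allocator backed by GC memory (not a std.allocator type): the alignment is chosen when the object is constructed and never changes, so `allocate` takes no alignment argument. Only the region's backing chunk is overallocated (by `alignment - 1` bytes); every request is rounded up to a multiple of `alignment`, which keeps each subsequent block aligned.

```d
// Hypothetical region allocator whose maximum alignment is fixed per object.
struct Region
{
    immutable size_t alignment; // power of two, fixed for this object
    private ubyte[] store;      // backing memory for the whole region
    private size_t start;       // offset of the first aligned byte in `store`
    private size_t used;        // bytes handed out so far, counted from `start`

    this(size_t capacity, size_t alignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        this.alignment = alignment;
        // Overallocate once, for the whole region, so its start can be aligned.
        store = new ubyte[capacity + alignment - 1];
        immutable base = cast(size_t) store.ptr;
        start = ((base + alignment - 1) & ~(alignment - 1)) - base;
    }

    void[] allocate(size_t n)
    {
        // Round up so the next block also starts on an `alignment` boundary.
        immutable rounded = (n + alignment - 1) & ~(alignment - 1);
        if (start + used + rounded > store.length) return null; // out of space
        auto p = store.ptr + start + used;
        used += rounded;
        return p[0 .. n];
    }
}
```

The trade-off is internal fragmentation: a 10-byte request in a 64-byte-aligned region consumes 64 bytes, which is why such a region pairs naturally with segregating allocations by size.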

Andrei