std.allocator needs your help

Tue Sep 24 11:02:40 PDT 2013

On Tuesday, 24 September 2013 at 17:02:18 UTC, Andrei 
Alexandrescu wrote:
> On 9/24/13 9:58 AM, Peter Alexander wrote:
>> On Tuesday, 24 September 2013 at 15:25:11 UTC, Andrei 
>> Alexandrescu wrote:
>>>> What are they paying exactly? An extra arg to allocate that can
>>>> probably be defaulted?
>>>>  void[] allocate(size_t bytes, size_t align = this.alignment) shared;
>>>
>>> For allocating relatively small objects (say up to 32K), we're
>>> looking at tens of cycles, no more. An extra argument needs to be
>>> passed around and more importantly looked at and acted upon. At
>>> this level it's a serious dent in the time budget.
>>
>> The cost of a few cycles really doesn't matter for memory
>> allocation... If you are really allocating memory so frequently
>> that those few extra cycles matter then you are probably going to
>> be memory bound anyway.
>
> It does. I'm not even going to argue this.

Sorry, but I find this insulting. Manu and I, both professional 
senior game developers with a lot of experience in performance, 
are arguing against you. I'm not saying this makes us 
automatically right, but I think it's rude to dismiss our 
concerns as not even worthy of discussion.

>> I think this is a situation where you need to justify yourself
>> with something concrete. Can you provide an example of some code
>> whose performance is significantly impacted by the addition of an
>> alignment parameter? It has to be "real code" that does something
>> useful, not just a loop that continually calls allocate.
>
> Strings.

Strings what? Just allocating lots of small strings?

Ok, I've put together a benchmark of the simplest allocator I can 
think of (pointer bump) doing *nothing* but allocating 12 bytes 
at a time and copying a pre-defined string into the allocated 
memory: http://dpaste.dzfl.pl/59636d82

On my machine, the difference between the version with alignment 
and the version without is 1%. I tried changing the allocator to 
a class so that the allocation was virtual and not inlined, and 
the difference was still only ~2%. (Yes, I verified in the 
generated code that nothing was being omitted.)
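
For anyone who cannot open the paste, here is a rough sketch of the kind of pointer-bump benchmark described above. It is not the code behind the link. The names BumpAllocator, allocate, and allocateAligned are made up for illustration, the alignment is assumed to be a power of two, and the timing uses std.datetime.stopwatch from current Phobos. Treat any numbers it prints as machine-specific.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical pointer-bump allocator, for illustration only.
struct BumpAllocator
{
    ubyte[] buffer;  // backing storage, never freed piecemeal
    size_t offset;   // index of the first unused byte

    // Variant without an alignment parameter: just bump the offset.
    void[] allocate(size_t bytes)
    {
        if (offset + bytes > buffer.length)
            return null;
        void[] result = buffer[offset .. offset + bytes];
        offset += bytes;
        return result;
    }

    // Variant with an alignment parameter (assumed to be a power of two).
    // Rounds the absolute address up to the next multiple of `alignment`.
    void[] allocateAligned(size_t bytes, size_t alignment)
    {
        immutable base = cast(size_t) buffer.ptr;
        immutable start =
            ((base + offset + alignment - 1) & ~(alignment - 1)) - base;
        if (start + bytes > buffer.length)
            return null;
        offset = start + bytes;
        return buffer[start .. start + bytes];
    }
}

void main()
{
    enum iterations = 1_000_000;
    immutable string text = "Hello, world"; // 12 bytes

    // Room for one 16-byte slot per iteration plus alignment padding.
    auto storage = new ubyte[](iterations * 16 + 16);

    // Pass 1: allocate 12 bytes at a time, no alignment argument.
    auto plain = BumpAllocator(storage);
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. iterations)
    {
        auto mem = cast(char[]) plain.allocate(text.length);
        mem[] = text[]; // copy the pre-defined string into the block
    }
    immutable plainTime = sw.peek;

    // Pass 2: same work, but every call carries an alignment argument.
    auto aligned = BumpAllocator(storage);
    sw.reset();
    foreach (i; 0 .. iterations)
    {
        auto mem = cast(char[]) aligned.allocateAligned(text.length, 16);
        mem[] = text[];
    }
    immutable alignedTime = sw.peek;

    writefln("without alignment: %s", plainTime);
    writefln("with alignment:    %s", alignedTime);
}
```

As with any micro-benchmark, it is worth checking the generated code to confirm the compiler has not elided the copies, and running each pass several times before comparing.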

In a real scenario, much more will be going on outside the 
allocator, making the overhead much less than 1%.

Please let me know if you take issue with the benchmark. I wrote 
this quickly so hopefully I have not made any mistakes.