std.allocator needs your help
Peter Alexander
peter.alexander.au at gmail.com
Tue Sep 24 11:02:40 PDT 2013
On Tuesday, 24 September 2013 at 17:02:18 UTC, Andrei Alexandrescu wrote:
> On 9/24/13 9:58 AM, Peter Alexander wrote:
>> On Tuesday, 24 September 2013 at 15:25:11 UTC, Andrei Alexandrescu wrote:
>>>> What are they paying exactly? An extra arg to allocate that can
>>>> probably be defaulted?
>>>>
>>>> void[] allocate(size_t bytes, size_t align = this.alignment) shared;
>>>
>>> For allocating relatively small objects (say up to 32K), we're
>>> looking at tens of cycles, no more. An extra argument needs to be
>>> passed around and, more importantly, looked at and acted upon. At
>>> this level it's a serious dent in the time budget.
>>
>> The cost of a few cycles really doesn't matter for memory
>> allocation... If you are really allocating memory so frequently
>> that those few extra cycles matter, then you are probably going to
>> be memory bound anyway.
>
> It does. I'm not even going to argue this.

Sorry, but I find this insulting. Manu and I are both professional,
senior game developers with a lot of experience in performance, and
we are both arguing against you. I'm not saying this makes us
automatically right, but I think it's rude to dismiss our concerns as
not even worthy of discussion.

>> I think this is a situation where you need to justify yourself
>> with something concrete. Can you provide an example of some code
>> whose performance is significantly impacted by the addition of an
>> alignment parameter? It has to be "real code" that does something
>> useful, not just a loop that continually calls allocate.
>
> Strings.

Strings what? Just allocating lots of small strings?

OK, I've put together a benchmark of the simplest allocator I can
think of (a pointer bump) doing *nothing* but allocating 12 bytes at
a time and copying a predefined string into the allocated memory:
http://dpaste.dzfl.pl/59636d82

On my machine, the difference between the version with alignment
and the version without is 1%. I tried changing the allocator to a
class so that the allocation was virtual and not inlined, and the
difference was still only ~2% (yes, I verified in the generated code
that nothing was being optimized away).

In a real scenario, much more will be going on outside the
allocator, making the overhead much less than 1%.
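A minimal D sketch of this kind of comparison follows. It is a hypothetical reconstruction, not the code at the dpaste link: `BumpAllocator`, `allocate`, `alignedAllocate`, the 8-byte default alignment and the 12-byte payload string are all assumptions chosen for illustration. Both variants bump a pointer through one preallocated buffer and copy a 12-byte string into each block. The only difference is the defaulted alignment argument and the address rounding it triggers, which is the extra work being argued about.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hypothetical pointer-bump allocator for illustration only: it is NOT the
// proposed std.allocator API and NOT the code behind the dpaste link.
struct BumpAllocator
{
    enum size_t defaultAlignment = 8; // assumed default alignment
    ubyte[] buffer;
    size_t offset;

    this(size_t capacity) { buffer = new ubyte[capacity]; }

    // Variant 1: no alignment parameter; just bump the offset.
    void[] allocate(size_t bytes)
    {
        if (buffer.length - offset < bytes) return null;
        auto block = buffer[offset .. offset + bytes];
        offset += bytes;
        return block;
    }

    // Variant 2: extra, defaulted alignment parameter (a power of two).
    // Rounding the next address up is the additional work under debate.
    void[] alignedAllocate(size_t bytes, size_t alignment = defaultAlignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        immutable base = cast(size_t) buffer.ptr;
        immutable start = ((base + offset + alignment - 1) & ~(alignment - 1)) - base;
        if (start > buffer.length || buffer.length - start < bytes) return null;
        auto block = buffer[start .. start + bytes];
        offset = start + bytes;
        return block;
    }

    void reset() { offset = 0; }
}

void main()
{
    enum string payload = "hello world!"; // 12 bytes, as in the post
    enum size_t count = 1_000_000;
    // 16 bytes per aligned 12-byte block, plus slack for the first rounding.
    auto a = BumpAllocator(count * 16 + 64);

    // Allocate and fill `count` 12-byte blocks without alignment.
    void withoutAlignment()
    {
        a.reset();
        foreach (i; 0 .. count)
        {
            auto mem = cast(char[]) a.allocate(payload.length);
            mem[] = payload[];
        }
    }

    // Same work, but every allocation goes through the aligned path.
    void withAlignment()
    {
        a.reset();
        foreach (i; 0 .. count)
        {
            auto mem = cast(char[]) a.alignedAllocate(payload.length);
            mem[] = payload[];
        }
    }

    auto times = benchmark!(withoutAlignment, withAlignment)(10);
    writefln("without alignment: %s", times[0]);
    writefln("with alignment:    %s", times[1]);
}
```

Build with optimizations (e.g. `dmd -O -release -inline` or `ldc2 -O3`) before comparing the two printed durations, and check the generated code to make sure the copies are not optimized away.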

Please let me know if you take issue with the benchmark. I wrote
this quickly, so I hope I haven't made any mistakes.