std.variant benchmark

Mon Jul 30 04:54:52 PDT 2012

On 30/07/12 13:24, Andrei Alexandrescu wrote:
> On 7/30/12 4:34 AM, Dmitry Olshansky wrote:
>> On 30-Jul-12 06:01, Andrei Alexandrescu wrote:
>>>> In fact memcpy could and should be replaced with word by word copy for
>>>> almost all of struct sizes up to ~32 bytes (as size is known in advance
>>>> for this particular function pointer i.e. handler!int).
>>>
>>> In fact memcpy should be smart enough to do all that, but apparently it
>>> doesn't.
>>>
>>
>> I'd say array ops could and should do this (since compiler has all the
>> info at compile-time). On the other hand memcpy is just one tired C
>> function aimed at fast blitting of any memory chunks.
>> (Even just call/ret pair is too much of overhead in case of int).
>
> memcpy is implemented as an intrinsic on many platforms. I'm not sure
> whether it is on dmd, but it is on dmc
> (http://www.digitalmars.com/ctg/sc.html), icc, and gcc
> (http://software.intel.com/en-us/articles/memcpy-performance/). But then
> clearly using simple word assignments wherever possible makes for a more
> robust performance profile.

It is an intrinsic on DMD, but it isn't done optimally. Mostly it just
compiles to a couple of loads plus the single instruction
rep movsd; / rep movsq;
which is perfect for medium-sized lengths when everything is aligned.
Once the length is more than a few hundred bytes, though, it should be
done as a function call (the optimal method involves cache blocking),
and for very short lengths it should be done as just a couple of loads
and stores.
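
To make the three size ranges concrete, here is a rough sketch in C of how
such a choice might be laid out. It is not what DMD does. The names
pick_copy_path and copy_by_size and both cut-off values are made up for
illustration; the real thresholds would have to be measured. Portable C
cannot emit rep movsd directly, so the middle case simply falls back to
memcpy as a stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed cut-offs, for illustration only; real values need measuring. */
#define SHORT_MAX  ((size_t)16)   /* "very short": plain loads and stores */
#define INLINE_MAX ((size_t)256)  /* "medium": inline rep movsd / rep movsq */

typedef enum { COPY_SHORT, COPY_INLINE, COPY_CALL } copy_path;

/* Choose a strategy from the length alone. */
static copy_path pick_copy_path(size_t n)
{
    if (n <= SHORT_MAX)
        return COPY_SHORT;
    if (n <= INLINE_MAX)
        return COPY_INLINE;
    return COPY_CALL;
}

/* Copy n bytes from src to dst using the strategy chosen above. */
static void *copy_by_size(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    switch (pick_copy_path(n)) {
    case COPY_SHORT: {
        /* Stand-in for a few direct loads and stores: no call, no setup. */
        unsigned char *d = dst;
        const unsigned char *s = src;
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
        break;
    }
    case COPY_INLINE:
        /* A compiler would emit rep movsd / rep movsq here;
           memcpy is only a portable placeholder. */
        memcpy(dst, src, n);
        break;
    case COPY_CALL:
        /* Long copies go to the library routine, which is free to
           use cache blocking. */
        memcpy(dst, src, n);
        break;
    }
    return dst;
}
```

When the length is a compile-time constant, as it is for handler!int, the
choice can be made once by the compiler and costs nothing at run time.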