std.variant benchmark

Sun Jul 29 11:57:18 PDT 2012

On Sunday, 29 July 2012 at 14:43:09 UTC, Dmitry Olshansky wrote:
> On 29-Jul-12 18:17, Andrei Alexandrescu wrote:
>> On 7/29/12 8:17 AM, Gor Gyolchanyan wrote:
>>> std.variant is so incredibly slow! It's practically unusable 
>>> for
>>> anything, which requires even a tiny bit of performance.
>>
>> You do realize you actually benchmark against a function that 
>> does
>> nothing, right? Clearly there are ways in which we can improve
>> std.variant to the point initialization costs assignment of 
>> two words,
>> but this benchmark doesn't help. (Incidentally I just prepared 
>> a class
>> at C++ and Beyond on benchmarking, and this benchmark makes a 
>> lot of the
>> mistakes described therein...)
>>
>>
>> Andrei
>
>
> This should be more relevant then:
>
> //fib.d
> import std.datetime, std.stdio, std.variant;
>
> auto fib(Int)()
> {
> 	Int a = 1, b = 1;
> 	for(size_t i=0; i<100; i++){
> 		Int c = a + b;
> 		a = b;
> 		b = c;
> 	}
> 	return a;	
> }
>
> void main()
> {
> 	writeln(benchmark!(fib!int, fib!long, fib!Variant)(10_000));
> }
>
>
> dmd -O -inline -release fib.d
>
> Output:
>
> [TickDuration(197), TickDuration(276), TickDuration(93370107)]
>
> I'm horrified. Who was working on std.variant enhancements? 
> Please chime in.

I thought these results were a bit strange, so I converted them 
to seconds. This gave me:

[3.73e-06, 3.721e-06, 2.97281]

One million inner loop iterations in under 4 microseconds? My 
processor's frequency isn't measured in THz, so something strange 
must be going on here. In order to find out what it was, I 
changed the code to this:

     writeln(benchmark!(fib!int, fib!long)(1000_000_000)[]
         .map!"a.nsecs() * 1.0e-9");

and used a profiler on it. The relevant part of the output is:

     0.00 :	  445969:       test   %r12d,%r12d
     0.00 :	  44596c:       je     445975 <_D3std8date
    46.67 :	  44596e:       inc    %ebx
     0.00 :	  445970:       cmp    %r12d,%ebx
     0.00 :	  445973:       jb     44596e <_D3std8date
     0.00 :	  445975:       lea    -0x18(%rbp),%rdi
     0.00 :	  445979:       callq  45a048 <_D3std8date
     0.00 :	  44597e:       mov    %rax,0x0(%r13)
     0.00 :	  445982:       lea    -0x18(%rbp),%rdi
     0.00 :	  445986:       callq  459fb4 <_D3std8date
     0.00 :	  44598b:       xor    %ebx,%ebx
     0.00 :	  44598d:       test   %r12d,%r12d
     0.00 :	  445990:       je     445999 <_D3std8date
    53.33 :	  445992:       inc    %ebx
     0.00 :	  445994:       cmp    %r12d,%ebx
     0.00 :	  445997:       jb     445992 <_D3std8date

As you can see, most of the time is spent in two loops with empty 
bodies, so your code is benchmarking Variant against nothing, too. 
Adding asm{ nop; } to fib changes the output to this:

[0.00437154, 0.00444938, 3.03917]

Which is still a huge difference.
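
For reference, a minimal sketch of the modified benchmark. It assumes the same 2012-era std.datetime API used above (benchmark returning TickDuration values). The placement of the nop at the top of fib is an assumption, since only the one-line change is described. The likely reason it helps is that dmd does not inline functions containing inline asm, so the calls are no longer reduced to empty loops.

```d
// fib_nop.d: sketch only, not the exact file that produced the numbers above.
import std.algorithm, std.datetime, std.stdio, std.variant;

auto fib(Int)()
{
    // Assumed placement: inline asm keeps dmd from inlining fib,
    // so the benchmark loop has to perform a real call each time.
    asm { nop; }

    Int a = 1, b = 1;
    for (size_t i = 0; i < 100; i++)
    {
        Int c = a + b;
        a = b;
        b = c;
    }
    return a;
}

void main()
{
    // 10_000 calls per function, each running 100 loop iterations;
    // results are converted from TickDuration to seconds.
    writeln(benchmark!(fib!int, fib!long, fib!Variant)(10_000)[]
        .map!"a.nsecs() * 1.0e-9");
}
```

Compiled with the same flags as before (dmd -O -inline -release), this should make the int and long timings reflect actual work rather than an empty loop.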