std.variant benchmark
jerro
a at a.com
Sun Jul 29 11:57:18 PDT 2012
On Sunday, 29 July 2012 at 14:43:09 UTC, Dmitry Olshansky wrote:
> On 29-Jul-12 18:17, Andrei Alexandrescu wrote:
>> On 7/29/12 8:17 AM, Gor Gyolchanyan wrote:
>>> std.variant is so incredibly slow! It's practically unusable
>>> for
>>> anything, which requires even a tiny bit of performance.
>>
>> You do realize you actually benchmark against a function that
>> does
>> nothing, right? Clearly there are ways in which we can improve
>> std.variant to the point initialization costs assignment of
>> two words,
>> but this benchmark doesn't help. (Incidentally I just prepared
>> a class
>> at C++ and Beyond on benchmarking, and this benchmark makes a
>> lot of the
>> mistakes described therein...)
>>
>>
>> Andrei
>
>
> This should be more relevant then:
>
> //fib.d
> import std.datetime, std.stdio, std.variant;
>
> auto fib(Int)()
> {
>     Int a = 1, b = 1;
>     for(size_t i=0; i<100; i++){
>         Int c = a + b;
>         a = b;
>         b = c;
>     }
>     return a;
> }
>
> void main()
> {
>     writeln(benchmark!(fib!int, fib!long, fib!Variant)(10_000));
> }
>
>
> dmd -O -inline -release fib.d
>
> Output:
>
> [TickDuration(197), TickDuration(276), TickDuration(93370107)]
>
> I'm horrified. Who was working on std.variant enhancements?
> Please chime in.
I thought these results were a bit strange, so I converted the
results to seconds. This gave me:
[3.73e-06, 3.721e-06, 2.97281]
One million inner loop iterations (10_000 calls of 100 iterations
each) in under 4 microseconds? My processor's frequency isn't
measured in THz, so something strange must be going on here. In
order to find out what it was, I changed the code to this:
writeln(benchmark!(fib!int, fib!long)(1000_000_000)[]
.map!"a.nsecs() * 1.0e-9");
and used a profiler on it. The relevant part of the output is:
0.00 : 445969: test %r12d,%r12d
0.00 : 44596c: je 445975 <_D3std8date
46.67 : 44596e: inc %ebx
0.00 : 445970: cmp %r12d,%ebx
0.00 : 445973: jb 44596e <_D3std8date
0.00 : 445975: lea -0x18(%rbp),%rdi
0.00 : 445979: callq 45a048 <_D3std8date
0.00 : 44597e: mov %rax,0x0(%r13)
0.00 : 445982: lea -0x18(%rbp),%rdi
0.00 : 445986: callq 459fb4 <_D3std8date
0.00 : 44598b: xor %ebx,%ebx
0.00 : 44598d: test %r12d,%r12d
0.00 : 445990: je 445999 <_D3std8date
53.33 : 445992: inc %ebx
0.00 : 445994: cmp %r12d,%ebx
0.00 : 445997: jb 445992 <_D3std8date
As you can see, most of the time is spent in two loops with empty
bodies: the calls to fib!int and fib!long have been optimized
away, so your code is benchmarking Variant against nothing, too.
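The sanity check on the first set of timings can be written out
explicitly. This is only a sketch: iterationsPerSecond is a
hypothetical helper introduced here for illustration, not part of
the original test.

```d
import std.stdio : writefln;

/// Loop iterations per second implied by a measured wall-clock time.
/// (Hypothetical helper, not from the original benchmark.)
double iterationsPerSecond(ulong calls, ulong loopLength, double seconds)
{
    return calls * loopLength / seconds;
}

void main()
{
    // 10_000 benchmark runs of a 100-iteration loop, timed at 3.73e-06 s
    immutable rate = iterationsPerSecond(10_000, 100, 3.73e-06);
    writefln("%.3g iterations per second", rate);
}
```

That works out to roughly 2.7e11 iterations per second, or about
270 per nanosecond, which no single core running at a few GHz can
do if the loop body is actually executed.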
Adding asm{ nop; } to fib changes the output to this:
[0.00437154, 0.00444938, 3.03917]
Which is still a huge difference.
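For reference, the modified benchmark might look roughly like the
sketch below. Several things here are assumptions: the post does
not say where in fib the nop was placed, so putting it at the top
of the function is a guess; the sketch uses benchmark from
std.datetime.stopwatch (its home in current Phobos, where it
returns Duration values) rather than the 2012 std.datetime
version; and asm { nop; } needs a compiler that accepts DMD-style
inline assembly on x86.

```d
import std.algorithm : map;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;
import std.variant : Variant;

auto fib(Int)()
{
    // Assumption: the placement of the nop in the original test is not
    // stated. The inline asm is meant to stop the compiler from
    // inlining the call and then discarding the unused work.
    asm { nop; }

    Int a = 1, b = 1;
    for (size_t i = 0; i < 100; i++)
    {
        Int c = a + b;
        a = b;
        b = c;
    }
    return a;
}

void main()
{
    // Time 10_000 calls of each instantiation and print the totals in seconds.
    auto times = benchmark!(fib!int, fib!long, fib!Variant)(10_000);
    writeln(times[].map!(d => d.total!"nsecs" * 1.0e-9));
}
```

Whether the int and long loops survive optimization still depends
on the compiler and flags, so it is worth checking the generated
code before trusting the numbers.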