std.variant benchmark

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sun Jul 29 19:01:12 PDT 2012


On 7/29/12 4:11 PM, Dmitry Olshansky wrote:
> On 29-Jul-12 22:59, Andrei Alexandrescu wrote:
>> On 7/29/12 10:43 AM, Dmitry Olshansky wrote:
>>> I'm horrified. Who was working on std.variant enhancements? Please chime
>>> in.
>>
>> I guess you just volunteered! When I looked at it this morning I noticed
>> a few signs of bit rot, e.g. opAssign returns by value and such. (Only
>> planting a "ref" there improves performance a good amount.)
>>
> Strange, but numbers stay the same for me.

Depends on what you measure.

>> Variant has a simple design with (in case of int) an int and a pointer
>> to a function. Many of its operations incur an indirect call through
>> that pointer. This makes operations slower than the time-honored design
>> of using an integral tag and switching on it, but offers in return the
>> ability to hold any type without needing to enumerate all types
>> explicitly.
>>
>
> Well obviously integers do not take lightly when they are copied around
> with memcpy like some pompous arrays that alone incurs *some* penalty.

Yah, when I replaced memcpy with simple assignment for Variant<int> I 
saw some improvements.

> In fact memcpy could and should be replaced with word by word copy for
> almost all of struct sizes up to ~32 bytes (as size is known in advance
> for this particular function pointer i.e. handler!int).

In fact memcpy should be smart enough to do all that, but apparently it 
doesn't.

> I've found something far more evil:
[snip]
> So to boot in order to get to int + int _detected_ it needs 4 calls to
> fptr(OpID.testConversion, null, &info) == 0 and acquire TypeInfo each
> time. Not counting 2 gets to obtain result. It's just madness.

Heh, this all gives me a new, humbling perspective on my earlier self 
running my mouth about other designs :o). As always, history and context 
makes one much more empathic.

Variant has a very old design and implementation. It is, in fact, the 
very first design I've ever carried in D; I'd decided, if I can't do 
that, D isn't powerful enough. The compiler was so buggy back then, 
implementation needed to take many contortions, and it was a minor 
miracle the whole house of cards stood together.

The approach was to design for usability first, performance could come 
later. So design only had to not prevent good performance from being 
improved transparently later, which I think it has achieved.

The basic approach was to use a pointer to function (instead of an 
integral) as a discriminator. The pointer to function was taken from the 
instantiation of a template function. That means Variant could work with 
any type, even types not yet defined. I'd made it a clear goal that I 
must not enumerate all types in one place, or "register" types for use 
with Variant - that would have been failure. Using a pointer to template 
function naturally associated a distinct tag with each type, 
automatically. I think it was a good idea then, and it's a good idea now.

On this data layout functionality can be built in two ways:

(a) Implement it inside the dispatcher function template, or
(b) Test against it "outside", in the template Variant code.

The first approach is more general, the second is faster. Right now 
Variant uses (a) almost exclusively. It's time to start doing (b) for 
more types.

> Plain hardcoded Type to integer switch seems so much better now.

But we can do that now.

>> We can use the pointer to function as a tag for improving performance on
>> primitive types.
>>
>
> Indeed we can though it won't be as fast as simple tag - pointers
> have quite big and sparse addresses giving us the same as:
>
> if(ptr == handler!int)
> ....
> else if(ptr == handler!uint)
> ....
> else
> ....

Yes, and what we gain is speed for select known types. As long as the 
number of those is relatively small, using simple test and branch should 
serve us well. And we get to keep the generality, too.

> Another way to go is make 2-d function table for all built-ins and all
> operations on them. It's in fact a bunch of tables:
> Type1 x Type2 for all of important basic operations. Tables are static
> so it won't be much of a problem. At least int + int then would be
> 1 memory read, 1 call, 2 (casted as needed) reads & 1 store.

I think that's not worth it, but hey, worth a try.


Thanks,

Andrei




More information about the Digitalmars-d mailing list