Help with Template Code

Frits van Bommel fvbommel at REMwOVExCAPSs.nl
Sun Apr 1 16:24:03 PDT 2007


Max Samukha wrote:
> On Sun, 01 Apr 2007 22:19:03 +0200, Frits van Bommel
> <fvbommel at REMwOVExCAPSs.nl> wrote:
[snip]
>> With optimizations it just moves mem->reg, reg->mem. It generates code 
>> bit-for-bit identical to:
>> ---
>>     static S opCall(int x_, float y_, char[] z_) {
>>         S s = void;
>>         s.x = x_;
>>         s.y = y_;
>>         s.z = z_;
>>         return s;
>>     }
>> ---
>> for the version Max posted (with =void)
>>
>> (The only difference is the mangled name; the mixin name is in there for 
>> the mixed-in version)
> When compiling on Win XP with dmd 1.010 using -O -inline -release, the
> time difference is more than 40%. The source is this:
> 
[snip]
> 
> What am I doing wrong?

For one thing, clock() isn't very precise. Also, you didn't mention
how many times you ran the test. The first run will likely take longer
because the program has to be loaded into cache, so it's best to run
it a couple of times before looking at the results.
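A minimal sketch of the "run it several times" advice, written in modern
D2 (the thread used DMD 1.010, so this will not compile as-is with a D1
compiler). It replaces clock() with std.datetime.stopwatch.StopWatch,
repeats the timed loop, and keeps the fastest run. The struct S, the
helpers timeOnce/bestOf, the `sink` variable and the iteration counts are
hypothetical stand-ins, not Max's snipped benchmark.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical struct under test, same shape as the "= void" opCall above.
struct S
{
    int x;
    float y;
    char[] z;

    static S opCall(int x_, float y_, char[] z_)
    {
        S s = void;
        s.x = x_;
        s.y = y_;
        s.z = z_;
        return s;
    }
}

// Results are accumulated here and printed at the end, which makes it much
// harder for the optimizer to throw the whole loop away as dead code.
__gshared long sink;

// Time a single pass of `iterations` constructions.
Duration timeOnce(size_t iterations)
{
    char[] text = "abc".dup;
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. iterations)
    {
        S s = S(cast(int) i, 1.0f, text);
        sink += s.x;
    }
    sw.stop();
    return sw.peek();
}

// Repeat the timed pass and keep the fastest, so one-off costs of the
// first run (loading into cache, etc.) don't skew the result.
Duration bestOf(size_t runs, size_t iterations)
{
    Duration best = Duration.max;
    foreach (_; 0 .. runs)
    {
        auto t = timeOnce(iterations);
        if (t < best)
            best = t;
    }
    return best;
}

void main()
{
    auto best = bestOf(5, 10_000_000);
    writefln("fastest of 5 runs: %s (sink = %s)", best, sink);
}
```

Building it once with `dmd -O -inline -release` and once without
`-inline` is a quick way to repeat the comparison discussed in this post.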

However, those don't seem to be the issue here.
Looking at the generated assembly, I see that while the functions compile
to exactly the same code (as I mentioned in my previous post), DMD
doesn't seem to inline the mixed-in version :(
(You can also see this without inspecting the generated code: if you
leave -inline off the command line, the two versions take the same
amount of time, at least on my computer.)

So it seems there's no way to get the mixed-in version to run at the
same speed, simply because DMD won't inline it...

Note: GDC (with -O3 -finline) doesn't seem to have this problem. In
fact, I had to add some code so it wouldn't optimize away the entire
loop :P. Even then, the generated code seems to be identical and
(unsurprisingly) runs just as fast.


P.S. I performed these tests on Linux (amd64). Another fun fact: the
GDC-compiled version ran about twice as fast as the fastest DMD-compiled
one. I think my GDC being set up to generate 64-bit code may have had
something to do with this, though, so it's not really a fair comparison
of the two compilers' optimizers. (Unless you count generating 64-bit
code for 64-bit processors as an optimization ;) )


More information about the Digitalmars-d-learn mailing list