Help with Template Code

Max Samukha samukha at voliacable.com
Mon Apr 2 04:22:44 PDT 2007


On Mon, 02 Apr 2007 01:24:03 +0200, Frits van Bommel
<fvbommel at REMwOVExCAPSs.nl> wrote:

>Max Samukha wrote:
>> On Sun, 01 Apr 2007 22:19:03 +0200, Frits van Bommel
>> <fvbommel at REMwOVExCAPSs.nl> wrote:
>[snip]
>>> With optimizations it just moves mem->reg, reg->mem. It generates code 
>>> bit-for-bit identical to:
>>> ---
>>>     static S opCall(int x_, float y_, char[] z_) {
>>> 	    S s = void;
>>> 	    s.x = x_;
>>> 	    s.y = y_;
>>> 	    s.z = z_;
>>> 	    return s;
>>>     }
>>> ---
>>> for the version Max posted (with =void)
>>>
>>> (The only difference is the mangled name; the mixin name is in there for 
>>> the mixed-in version)
>> When compiling on Win XP with dmd 1.010 using -O -inline -release, the
>> time difference is more than 40%. The source is this:
>> 
>[snip]
>> 
>> What am I doing wrong?
>
>For one thing, clock() isn't exactly accurate. Also, you didn't mention 
>how many times you ran the test (the first time will likely take longer 
>because the program has to be loaded into cache first, it's best to run 
>it a couple of times before looking at the results)

I was running it over and over again. You are right, of course. I
shouldn't have run those silly speed tests at all. The disassembly is
a D programmer's friend :).
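
For anyone who still wants a number to go with the disassembly, here is a
minimal sketch of the kind of timing loop discussed above. It is not the
snipped test source. It assumes a current D2 compiler (hence string rather
than char[] and core.stdc.time for clock()), and the names timeOnce and sink,
the iteration count and the number of runs are all made up for illustration.
It does one untimed warm-up run and then keeps the best of several runs:

```d
import core.stdc.time;          // clock(), CLOCKS_PER_SEC
import std.stdio : writefln;

struct S
{
    int x;
    float y;
    string z;

    // Hand-written factory, same shape as the opCall quoted above.
    static S opCall(int x_, float y_, string z_)
    {
        S s = void;             // skip default initialization
        s.x = x_;
        s.y = y_;
        s.z = z_;
        return s;
    }
}

// Hypothetical global sink: storing the result here keeps the optimizer
// from discarding the whole loop.
__gshared long sink;

// Times one batch of constructions and returns the elapsed seconds.
double timeOnce(int iterations)
{
    long total = 0;
    auto start = clock();
    foreach (i; 0 .. iterations)
    {
        auto s = S(i, 1.0f, "abc");
        total += s.x;
    }
    auto stop = clock();
    sink = total;
    return cast(double)(stop - start) / CLOCKS_PER_SEC;
}

void main()
{
    enum iterations = 10_000_000;   // arbitrary, pick what suits your machine

    timeOnce(iterations);           // warm-up run, result ignored

    double best = double.max;
    foreach (run; 0 .. 5)
    {
        auto t = timeOnce(iterations);
        if (t < best)
            best = t;
    }
    writefln("best of 5 runs: %.3f s", best);
}
```

Building it once with -O -inline -release and once without -inline should
show whether inlining is what makes the difference. clock() is still coarse,
so small differences between runs mean little.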
>
>However, those don't seem to be the issue here.
>Looking at the generated assembly I see that while the functions compile 
>to the exact same thing (as I mentioned in my previous post), DMD 
>doesn't seem to inline the mixed-in version :(...
>(You can also see this without inspecting the generated code: if you 
>leave off -inline from the command line the two versions take the same 
>amount of time, at least on my computer)
>
>So it would seem there's no way to get the mixed-in version to equal 
>speed simply because it won't be inlined by DMD...
>
>Note: GDC (with -O3 -finline) doesn't seem to have this problem. In 
>fact, I had to add some code so it doesn't optimize out the entire loop 
>:P. Even then, the code seems to be identical and (unsurprisingly) runs 
>just as fast.
>
>
>P.S. I performed these tests on Linux (amd64). Another fun fact: the 
>GDC-compiled version ran about twice as fast as the fastest DMD-compiled 
>one. I think that my GDC being set up to generate 64-bit code may have 
>had something to do with this though, so it's not really a fair 
>comparison of the optimizers in the compilers. (Unless you count 
>generating 64-bit code for 64-bit processors as an optimization ;) )

It seems like dmd is not going to support 64-bit processors in the
foreseeable future (stdio super-performance seems to be the priority).


More information about the Digitalmars-d-learn mailing list