Help with Template Code
Max Samukha
samukha at voliacable.com
Mon Apr 2 04:22:44 PDT 2007
On Mon, 02 Apr 2007 01:24:03 +0200, Frits van Bommel
<fvbommel at REMwOVExCAPSs.nl> wrote:
>Max Samukha wrote:
>> On Sun, 01 Apr 2007 22:19:03 +0200, Frits van Bommel
>> <fvbommel at REMwOVExCAPSs.nl> wrote:
>[snip]
>>> With optimizations it just moves mem->reg, reg->mem. It generates code
>>> bit-for-bit identical to:
>>> ---
>>> static S opCall(int x_, float y_, char[] z_) {
>>>     S s = void;
>>>     s.x = x_;
>>>     s.y = y_;
>>>     s.z = z_;
>>>     return s;
>>> }
>>> ---
>>> for the version Max posted (with =void)
>>>
>>> (The only difference is the mangled name; the mixin name is in there for
>>> the mixed-in version)
>> When compiling on Win XP with dmd 1.010 using -O -inline -release, the
>> time difference is more than 40%. The source is this:
>>
>[snip]
>>
>> What am I doing wrong?
>
>For one thing, clock() isn't exactly accurate. Also, you didn't mention
>how many times you ran the test. The first run will likely take longer
>because the program has to be loaded into cache first, so it's best to
>run it a couple of times before looking at the results.
I was running it over and over again, but you are right, of course. I
shouldn't have run those silly speed tests at all. The disassembly is
a D programmer's friend :).
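
A minimal sketch of a less noisy timing loop, following the advice quoted
above: one untimed warm-up run, then the fastest of several timed runs.
It assumes a current D2 compiler and druntime's core.time instead of
clock(), so it is not the code used for the numbers in this thread.
bestOf is a hypothetical helper name, not a library function.

```d
import core.time : Duration, MonoTime;
import std.stdio : writeln;

// Hypothetical helper: run `work` once untimed, then `runs` more times,
// and return the shortest elapsed time that was observed.
Duration bestOf(int runs, void delegate() work)
{
    work(); // warm-up run, so program loading and cold caches are not timed

    Duration best = Duration.max;
    foreach (_; 0 .. runs)
    {
        immutable start = MonoTime.currTime;
        work();
        immutable elapsed = MonoTime.currTime - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}

void main()
{
    long sink; // printed below so the loop has a visible side effect

    auto t = bestOf(5, {
        foreach (i; 0 .. 1_000_000)
            sink += i;
    });

    writeln("best of 5 runs: ", t, " (sink = ", sink, ")");
}
```

Taking the minimum rather than the mean is a design choice: it discards
runs that were disturbed by other processes. It still says nothing about
why two versions differ, which is where the disassembly comes in.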
>
>However, those don't seem to be the issue here.
>Looking at the generated assembly I see that while the functions compile
>to the exact same thing (as I mentioned in my previous post), DMD
>doesn't seem to inline the mixed-in version :(...
>(You can also see this without inspecting the generated code: if you
>leave off -inline from the command line the two versions take the same
>amount of time, at least on my computer)
>
>So it would seem there's no way to get the mixed-in version to run at
>equal speed, simply because it won't be inlined by DMD...
>
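
For anyone who wants to repeat the comparison, here is a minimal sketch
of the two variants side by side. It is written for a current D2
compiler, so it uses string instead of char[] and typeof(this) instead
of the D1 spelling, and it is not the code originally posted in this
thread. The names Plain, Mixed and FieldwiseOpCall are hypothetical.

```d
import std.stdio : writeln;

// Hand-written variant, modelled on the opCall quoted earlier in the thread.
struct Plain
{
    int x;
    float y;
    string z;

    static Plain opCall(int x_, float y_, string z_)
    {
        Plain s = void; // skip default initialisation; every field is set below
        s.x = x_;
        s.y = y_;
        s.z = z_;
        return s;
    }
}

// Hypothetical mixin: generates a static opCall that takes one argument
// per field of the enclosing struct, in declaration order.
mixin template FieldwiseOpCall()
{
    static typeof(this) opCall(typeof(typeof(this).tupleof) args)
    {
        typeof(this) s = void;
        foreach (i, arg; args) // unrolled at compile time, one assignment per field
            s.tupleof[i] = arg;
        return s;
    }
}

// Mixed-in variant with the same fields as Plain.
struct Mixed
{
    int x;
    float y;
    string z;

    mixin FieldwiseOpCall;
}

void main()
{
    auto a = Plain(1, 2.5f, "abc");
    auto b = Mixed(1, 2.5f, "abc");
    writeln(a.x, " ", a.y, " ", a.z);
    writeln(b.x, " ", b.y, " ", b.z);
}
```

Compiling this once with -O -inline -release and once without -inline,
then comparing the disassembly of the two call sites in main, should
show whether a given compiler version inlines the mixed-in opCall.
Results from a modern compiler need not match the dmd 1.010 behaviour
described above.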
>Note: GDC (with -O3 -finline) doesn't seem to have this problem. In
>fact, I had to add some code so it doesn't optimize out the entire loop
>:P. Even then, the code seems to be identical and (unsurprisingly) runs
>just as fast.
>
>
>P.S. I performed these tests on Linux (amd64). Another fun fact: the
>GDC-compiled version ran about twice as fast as the fastest DMD-compiled
>one. I think that my GDC being set up to generate 64-bit code may have
>had something to do with this though, so it's not really a fair
>comparison of the optimizers in the compilers. (Unless you count
>generating 64-bit code for 64-bit processors as an optimization ;) )
It seems like dmd is not going to support 64-bit processors in the
foreseeable future (stdio super-performance seems to be the priority).