Help with Template Code

Max Samukha samukha at voliacable.com
Sun Apr 1 15:21:19 PDT 2007


On Sun, 01 Apr 2007 22:19:03 +0200, Frits van Bommel
<fvbommel at REMwOVExCAPSs.nl> wrote:

>Jarrett Billingsley wrote:
>> "Max Samukha" <samukha at voliacable.com> wrote in message 
>> news:nmmv03h5g5mtbkn6hetbnu33ei6nnnhd67 at 4ax.com...
>>> I thought it should, too. But when tested on Windows with dmd 1.010,
>>> the tuple version is significantly slower. I'm still not sure why.
>> 
>> Ahh, looking at the disassembly it makes sense now.  What happens is that 
>> when you write:
>> 
>> foreach(i, arg; args)
>>     t.tupleof[i] = arg;
>> 
>> It gets turned into something like _this_:
>> 
>> typeof(args[0]) arg0 = args[0];
>> t.tupleof[0] = arg0;
>> typeof(args[1]) arg1 = args[1];
>> t.tupleof[1] = arg1;
>> typeof(args[2]) arg2 = args[2];
>> t.tupleof[2] = arg2;
>> 
>> Notice it copies the argument value into a temp variable, then that temp 
>> variable into the struct.  Very inefficient.
>> 
>> Unfortunately I don't know of any way to get around this..
>
>Yes, DMD does that, *unless you turn on optimizations* ;).
>Measuring performance without optimization switches is pretty much useless.
>
>With optimizations it just moves mem->reg, reg->mem. It generates code 
>bit-for-bit identical to:
>---
>    static S opCall(int x_, float y_, char[] z_) {
>        S s = void;
>        s.x = x_;
>        s.y = y_;
>        s.z = z_;
>        return s;
>    }
>---
>for the version Max posted (with =void)
>
>(The only difference is the mangled name; the mixin name is in there for 
>the mixed-in version)

When compiling on Win XP with dmd 1.010 using -O -inline -release, the
tuple version is still slower, with a time difference of more than 40%.
The source is this:

import std.stdio;
import std.c.time;

template StructCtor()
{
    static typeof(*this) opCall(typeof(typeof(*this).tupleof) args)
    {
        typeof(*this) t = void;

        foreach (i, arg; args)
            t.tupleof[i] = arg;

        return t;
    }
}


struct Bar
{
    int x;
    int y;
    int z;

    // Uncomment this mixin and comment out the opCall below to time the tuple version.
    //mixin StructCtor;

    static Bar opCall(int x, int y, int z)
    {
        Bar result = void;
        result.x = x;
        result.y = y;
        result.z = z;
        return result;
    }
}

void main()
{
    auto c = clock();
    for (int i = 0; i < 100000000; i++)
    {
        auto test = Bar(i, i, i);
    }
    writefln(clock() - c);
}

What am I doing wrong?
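
One thing that may be worth ruling out is the measurement itself: the loop above never uses `test`, and the two variants are timed in separate builds. Below is a minimal sketch, assuming the same dmd 1.x toolchain and switches, that times both constructors in one run and feeds every result into a running sum so neither loop body can be discarded. The names `Manual`, `Mixed` and `sink` are made up for illustration, and this is not known to change the numbers.

```d
import std.stdio;
import std.c.time;

// Same mixin as in the source above.
template StructCtor()
{
    static typeof(*this) opCall(typeof(typeof(*this).tupleof) args)
    {
        typeof(*this) t = void;

        foreach (i, arg; args)
            t.tupleof[i] = arg;

        return t;
    }
}

// Hypothetical name: struct with the hand-written constructor.
struct Manual
{
    int x;
    int y;
    int z;

    static Manual opCall(int x, int y, int z)
    {
        Manual result = void;
        result.x = x;
        result.y = y;
        result.z = z;
        return result;
    }
}

// Hypothetical name: same fields, constructor supplied by the mixin.
struct Mixed
{
    int x;
    int y;
    int z;

    mixin StructCtor;
}

void main()
{
    // Running sum of every field, printed at the end so the
    // constructed values are actually used.
    int sink = 0;

    auto c = clock();
    for (int i = 0; i < 100000000; i++)
    {
        auto test = Manual(i, i, i);
        sink += test.x + test.y + test.z;
    }
    writefln("manual: %s", clock() - c);

    c = clock();
    for (int i = 0; i < 100000000; i++)
    {
        auto test = Mixed(i, i, i);
        sink += test.x + test.y + test.z;
    }
    writefln("mixin:  %s", clock() - c);

    writefln("sink:   %s", sink);
}
```

Built with the same `-O -inline -release` switches, the two timings come from a single executable, which takes differences between separate builds out of the comparison.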