Compile-time optimization

Tue Jul 23 09:21:08 PDT 2013

There seems to be a lot of improvement for programming languages 
to optimize compile time aspects that are not taken into account. 
With ctfe I think such optimizations are more and more relevant 
in D.

I'll give a simple example:

Take a standard string joining function:

string join(string[] s)
{
     string ss;
     foreach(sss; s) ss ~= sss;
     return ss
}

when this string function is called at run-time with literal 
arguments it is extremely inefficient:

join(["a", "b", "c"]);

because the result can obviously be computed at compile-time by 
the compiler.

using ctfe can solve this problem but only when used in const 
results.

But there should be no reason why join(["a", "b", "c"]) can't be 
optimized at compile time so that the function call is replaced 
with "abc".

Now, think of join as defined like

string join(T...)(T t);

and we call it like

join("a", s, "b", "c", ss);

where s is string that is compile time evaluable and ss is a 
run-time string array.

the optimal join code would be something like

r = "a"~s~"bc"
foreach(sss; ss) r ~= sss;

(and which would change depending on the argument types and if 
they are compile time evaluable)

To generate such "optimal" code(one that gets the compiler to do 
as much work as it can at compile time) is somewhat convoluted in 
D. We can use string mixins to solve the problem but not easily 
because we can't pass variadic arguments to templates.

e.g.,

join(T...)(T s)
{
     mixin(tJoinH!(s))
}

fails because s is not known at compile time... even if s is 
partially known.

It would be nice if we could pass the arguments to a template 
even if they are not completely known at compile time.

e.g., join("a", s),

tJoinH can receive "a" and s but s's value, if not compile-time 
known, is undefined.

Hence, tJoinH can optimize the arguments by attempting to join 
the compile-time arguments and insert optimal code for the 
run-time arguments.

That is, if we call a function with an argument list, we should 
be able to pass that to a template without any errors and be able 
to determine the value of each argument... if the argument is 
undefined, then that becomes it's value.

Another way to see it work is with sum:

sum(1,2,x);

int sum(T...)(T i)
{
     // return 1 + 2 + i[3]; for the call above, it is optimal and 
our goal to generate
     // return mixin(tSumH!(i)); // hypothetically generates 1 + 2 
+ i[3] for the call above
     // return mixin(tSumH2!(i)); // generates i[1] + i[2] + i[3]. 
Current possible way

     // naive approach(assumes all arguments are known only at 
run-time):
     int y;
     foreach(ii; i) y += ii;
     return y;
}

I'm able to achieve this partially by passing T, rather than i, 
to the template function and the variable name, i. This works but 
may not produce optimal results(referencing the stack instead of 
the value of a literal... which prevents it from being further 
optimized).

e.g., from above, instead of return 1 + 2 + i[3]; I would 
generate return i[1] + i[2] + i[3]; Not optimal because i[1] and 
i[2] are treated as run-time entities even though in the above 
case they are not.

If D had some model to create functions that can optimize the way 
they deal with their compile-time arguments and run-time 
arguments then it would be pretty easy to create such functions.

e.g.,

string join(T...)(T t)    // Note join has a variadic parameter
{
     ctfe {
          // called when join is used in ctfe, t can be used to 
get the values directly if they exist at compile time, else their 
value is undefined.
          // can call join here with run-time entities

     }

     // runtime version goes here
}

we might have something like this

string join(T...)(T t) // easy notation to constrain T on char, 
string, string[]
{
     ctfe
     {
         string code = "string s;";
         foreach(k, val; t)
         {
             if (isDefined!val)
                 code ~= `s ~= "`~join(val)~`";`; // calls join as 
a ctfe on val.
             else
             {
                 if (type == string[])
                     code ~= `foreach(ss; t[`~k~`]) s~= sss;`;
                 } else code ~= `s ~= t[`~k~`];`;
             }
         }
         code ~= "return s;";
         mixin code;
     }

     // concatenates all arguments of t as if they were run-time 
arguments.
     string s;
     foreach(k, ss; t)
         static if (T[k].type == string[])
             foreach(sss; ss)
                 s ~= sss;
         else
             s ~= ss;
     return s;
}

I'm not saying this works but the idea is that the ctfe block is 
executed at compile time WHEN there are non-compile time 
arguments(ok, maybe ctfe isn't the best keyword). The block then 
figures out how to best deal with the non-ct arguments.

In the above code note how join is called in the ctfe block to 
handle known compile time types. This is just standard ctfe 
execution. The real "magic" happens when it is dealing with 
non-ct arguments, in which case it builds up meta code which is 
used as a mixin(special statement at end of ctfe block) that is 
inserted.

So for join("a", s); the code would execute something like

at compile time(in ctfe block)
     code = `string s;`;
     "a" is defined so
     code ~= `s ~= "~join("a")~`";`;
         join("a") is called, since "a" is known, the non-ctfe 
block is called, so we actually have
     code ~= `s ~= "a";`;
  next for s(which is undefined at compile time)
     code ~= `s ~= t[1];`;
  end
     code ~= "return s;";
so the string mixin code looks like:

     string s;
     s ~= "a";
     s ~= t[1];
     return s;

or simplified,

    return "a"~t[1];

which is an optimal join code for the join call join("a", s);

(it would make more sense to call a template in the ctfe block so 
we don't have form code strings of code strings)

Note also that having such a ctfe block does not break existing 
code structures and pre existing functions can easily be upgraded 
for optimal behavior.

Having the compiler do as much optimization as possible should 
produce a significant speedup in most applications.

When a function is called with any known compile time entities, 
it is possible for an optimization to take place. I think if D 
incorporates some mechanism in it's language to do so it will go 
a long.

In fact, all this simply demonstrates that ctfe's are not as 
powerful as they can be. Calling a ctfe'able function with just 
one run-time variable will prevent it from being ctfe'ed and 
optimized completely... even though there is still some 
possibility for optimization.

Hopefully some will read this acutely and attempt to ponder its 
potential.  I expect the immediate naysayers who can't see past 
their own noses... and those that get hung up on non-essential 
details(I said that the block doesn't have to be called ctfe... 
oh hell, you do realize we don't even need to use a block?).