Efficient idiom for fastest code

Nicholas Wilson iamthewilsonator at hotmail.com
Wed May 23 03:00:17 UTC 2018


On Wednesday, 23 May 2018 at 02:24:08 UTC, IntegratedDimensions 
wrote:
>
> Many times in expensive loops one must make decisions. 
> Decisions must be determined and the determination costs.
>
> for(int i = 0; i < N; i++)
> {
>     if(decision(i)) A; else B;
> }
>
> the if statement costs N times the cycle cost.
>
> In some cases the decision holds for continuous ranges. For 
> some 0 <= n <= N the decision is constant, but n is 
> arbitrary(determined by unknown factors at compile time).
>
> One can speed up the routine by using something akin to a 
> simplified strategy pattern where one uses 
> functions/delegates/lambdas to code a faster version without 
> the test:
>
>
> for(int i = 0; i < N; i++)
> {
>     d = (){ if(decision(i)) A; else d = () { B; };
>     d();
> }
>

assuming you meant
> auto d = (int i){ if(decision(i)) A; else d = (int i) { B; };};
> for(int i = 0; i < N; i++)
> {
>     d(i);
> }
>

> this code basically reduces to
>
> for(int i = 0; i < N; i++)
> {
>     B;
> }
>
> Once the decision fails, which we assume once it fails it 
> always fails in this particular case.
>
> Therefor, once the decision fails it kicks in the "faster" 
> code. Suppose decision is very slow.
>
> The cycle cost is then n times rather than N times.
>
> In general, such techniques can be used to speed up code, 
> sometimes significantly, but are difficult to implement using a 
> standard pattern and for more complex decision functions.
>
>
> Does anyone see any way to use some some of D's power to help 
> simplify such things but not using something like a strategy 
> pattern or complexifying the overall design(using advanced 
> techniques such as templates, mixins is not making the code 
> more complex).

If decision is pure, consider using static foreach 
(iota(N).map!decision) { ... }to unroll it at compile time. even 
if it isn't the compiler may still be able to factor out parts of 
repeated calls to decision, look at the assembly/IR to confirm 
this.

Otherwise PROFILE!!!!! (and use profile guided optimisation! this 
implies using a compiler other than DMD which if you care about 
performance you should be doing anyway)

Blindly trying to optimise is just as likely to hurt performance.
in particular don't underestimate the branch predictor. Even if 
decision is complex, if there is a pattern in evaluating 
iota(n).map!decision the branch predictor will pick up on it.

In terms of exploiting knowledge of decision a priori that the 
compiler somehow lacks you really don't have much option but to 
do it yourself.




More information about the Digitalmars-d-learn mailing list