Exact arithmetic with quadratic irrationals

Wed Apr 19 13:18:55 PDT 2017

On Wed, Apr 19, 2017 at 07:54:02PM +0000, Stanislav Blinov via Digitalmars-d wrote:
> Awesome! Congrats and thanks for sharing.
> 
> On Wednesday, 19 April 2017 at 19:32:14 UTC, H. S. Teoh wrote:
> 
> > Haha, it seems that the only roadblocks were related to the
> > implementation quality of std.numeric.gcd... nothing that a few
> > relatively-simple PRs couldn't fix.  So overall, D is still awesome.
> 
> There's another one, which is more about dmd: dmd does not inline gcd,
> which, when arguments are const, turns gcd into a double function call
> :D

If I weren't such a sucker for the bleeDing edge with dmd (I actually
compile even my serious projects with git HEAD dmd, except when
performance matters), I'd be standardizing on ldc or gdc, which have far
superior optimizers.

I consistently find that my CPU-intensive projects perform at least
20-30% worse on dmd than gdc (and I presume ldc), sometimes even as bad
as 40-50%, due to dmd's inliner giving up far too easily. I don't know
if this has been fixed yet, but the last time I checked, if you wrote
this:

	int func(int x, int y) {
		if (x<0)
			return y;
		return x;
	}

instead of:

	int func(int x, int y) {
		if (x<0)
			return y;
		else
			return x;
	}

then the dmd inliner would not inline the function.

Because of sensitivities like this, the inliner gives up far too early
in the process, thus the optimizer misses out on further optimization
opportunities that would have opened up, had the function been inlined.

The last time I checked, I also found that dmd was rather weak at loop
optimizations (and loops are very important in performance as we all
know) compared to gdc. Again, the same domino effect (or rather, the
missing thereof) applies: by failing to, for example, hoist a constant
expression out of the loop, further optimization opportunities are
missed, whereas gdc, after hoisting the expression out, would discover
that the loop can be reduced further, perhaps via a strength reduction,
and then unrolled, and then inlined inside the caller, then vectorized,
etc.. This chain of optimizations were missed because of one missed
optimization early in the process. Hence the suboptimal resulting code.

T

-- 
If blunt statements had a point, they wouldn't be blunt...