How to override opCmp when two different sorts are needed?

Wed Aug 15 04:33:36 PDT 2007

Regan Heath wrote:
> Deewiant wrote:
>> I'd imagine it's quite hard to inline a delegate or function pointer.
>> In any
>> case, the current optimizer is a joke in many ways.
> 
> True about delegates/fpointers I reckon.
> 
> I don't know why people keep hating on the optimizer.  Isn't it the same
> optimizer used in DMC, a C compiler with 25 odd years of history?  Sure,
> it's not 'optimized' for D specifically but it still holds it's own in
> shootout results in the general sense, right?
> 

It's not _bad_ per se. I understand the problem is difficult and that Walter's
doing it on his own, but I can't help but drool over the thought of an "Intel D
Compiler".

The main problems with the Digital Mars optimizer, IMO, are as follows.

(Disclaimer: I understand that languages have to cater for many different
environments and some of the following may only speed things up on my computer,
while slowing things down on others. Still, there are ample resources on the
'Net regarding the performance of practically all parts of various processors.)

- Almost no D support, as you say. For instance, slices:

array[x .. x + y]
array[x..$][0..y]
array[0..y][x..$]
(&array[x])[0..y]

The above forms are equivalent, yet generate very different code. I feel that
the middle two are clearer expressions of "take y elements starting from x" but
they result in very slow code: it's actually the last one, using a pointer,
that's the fastest.

- Can't inline functions containing assembly language. This is very annoying: if
you have a fast hand-coded micro-optimization you have to manually inline the
function everywhere you use it. Otherwise, you tend to end up creating much
slower code because the function isn't inlined.

- No support for instruction sets beyond the basic x86. (MMX, SSE, 3DNow!, etc.)

- In general, can't (or only rarely can) do stuff like loop unrolling,
elimination of branching/jumps, and "loop invariant code motion" which is the
fancy term for moving something like "x * y * z + a / b - c" from a loop which
just sets all array elements to that value.

It's really annoying to find that writing something as simple as "int tmp = x*2"
prior to a loop and using tmp instead of x*2 can sometimes result in noticeable
speedups.

Some things which might just be me expecting too much, I don't know how good
other optimizers are at this:

- Poor execution path analysis. For instance:

for (size_t i = 5; i < maximum_i; ++i) {

	for (size_t j = i; j >= 5; --j)
		// do stuff
}

Here, the inner loop tests for j >= 5. Since j = i, and i goes from 5 up to
maximum_i, the test will always be true. Hence the test could be eliminated for
the first iteration, making the inner loop equivalent to:

size_t j = i;
do {
	// stuff
} while (j-- >= 5);

Another example: assume the loop's contents are simply:

array[j-1] = array[j];

Now, two subtractions are performed. First to get j-1, and then to decrement j
in the while test. The value could be cached, which may speed things up.

(Manually optimizing code combining the above two examples lead to a measurable
speedup on a sort algorithm I was benchmarking yesterday.)

- Doesn't do algebra very well (includes integer, floating point, boolean, and
bit vector algebra). May also be related to the above. For instance:

int x = -y;
int a = x;
int b = a + y;

Here, b is set to "a + y", equivalent to "x + y", equivalent to "-y + y",
equivalent to 0. This kind of stuff has to be caught manually.

Optimizing_cpp.pdf over at http://www.agner.org/optimize/ compares the
optimizations applied by C++ compilers in section 7.2, and it includes DMC. It's
a bit unfair in that the compilers are of different ages, but I don't think
there's been that much progress in any of the compilers anyway.

> Instead of limiting it to opCall why not define an Interface 'Sortable'
> and define a compare method which takes a mode flag and "Sortable rhs"
> or similar.
> 

If you just want to sort basic types with varying orderings, wrapping them in a
class is a waste of memory and CPU.

Currently, the routine takes any callable type: something with opCall, or a
function pointer or delegate. The idea is to allow the option of
ascending/descending because those are the two most common cases and result in
major reduction of execution time.

-- 
Remove ".doesnotlike.spam" from the mail address.