__restrict, architecture intrinsics vs asm, consoles, and other stuff

Fri Sep 23 14:40:33 PDT 2011

== Quote from Walter Bright (newshound2 at digitalmars.com)'s article
> D doesn't have __restrict. I'm going to argue that it is unnecessary. AFAIK,
> __restrict is most used in writing vector operations. D, on the other hand, has
> a dedicated vector operation syntax:
>    a[] += b[] * c;
> where a[] and b[] are required to not be overlapping, hence enabling
> parallelization of the operation.

Use of __restrict is certainly not limited to your example; it's applicable basically anywhere
a pointer is dereferenced on either side of a write through any other pointer, or on either side
of a function call (since the call could potentially do anything). In those cases the value
resident from the previous dereference is invalidated and must be needlessly reloaded, unless
the pointer is explicitly marked restrict.
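A minimal C sketch of the reload problem described above. C99 spells the qualifier `restrict`; `__restrict` is the extension spelling accepted by GCC, Clang and MSVC (including in C++). The function names are hypothetical, and the two bodies are deliberately identical so that only the qualifier differs:

```c
#include <stddef.h>

/* Without restrict the compiler must assume a store to out[i] might
   modify *scale (the pointers could alias), so in general it has to
   reload *scale on every iteration. */
void scale_plain(float *out, const float *in, const float *scale, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * *scale;
}

/* With restrict the caller promises the three pointers do not alias,
   so the compiler may load *scale once and keep it in a register. */
void scale_restrict(float *restrict out, const float *restrict in,
                    const float *restrict scale, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * *scale;
}
```

Calling `scale_restrict` with overlapping pointers is undefined behaviour; the qualifier is a promise from the programmer, not something the compiler checks.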

http://cellperformance.beyond3d.com/articles/2006/05/demystifying-the-restrict-keyword.html

For RISC architectures in particular, __restrict is essential for optimising certain hot
functions without making a mess of your code (manually caching values in stack locals all over
the place), and I think I've run into cases where even that workaround wasn't enough.
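For comparison, a sketch of the stack-locals workaround in C (the name `scale_cached` is hypothetical). Copying the possibly-aliased value into a local tells the compiler that stores through `out` cannot change it:

```c
#include <stddef.h>

/* Workaround without restrict: read *scale once into a local. Stores
   through out can no longer affect s, so it can stay in a register. */
void scale_cached(float *out, const float *in, const float *scale, size_t n)
{
    const float s = *scale; /* loaded exactly once */
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * s;
}
```

Note that this changes the semantics when the pointers do overlap: with `buf = {2, 0, 0}`, `in = {3, 2, 1}` and `scale == &buf[0]`, this version produces `{6, 4, 2}`, whereas a version that re-reads `*scale` each iteration produces `{6, 12, 6}`. That difference is exactly why the compiler cannot make the transformation on its own without a no-alias guarantee.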

> D does have some intrinsics, like sin() and cos(). They tend to get added on a
> strictly as-needed basis, not a speculative one.
> D has no current intention to replace the inline assembler with intrinsics.
> As for custom intrinsics, Don Clugston wrote an amazing piece of demonstration D
> code a while back that would take a string representing a floating point
> expression, and would literally compile it (using Compile Time Function
> Execution) and produce a string literal of inline asm functions, which were then
> compiled by the inline assembler.
> So yes, it is entirely possible and practical for end users to write custom
> intrinsics.

I hadn't thought of doing that with compile-time functions; that's really nice.
I'm not sure it'll be enough to generate good code in all cases, but I'll run some
experiments and see where it goes.
The main problem with writing (even intelligently generated) inline asm rather than using
intrinsics is that, from within the C (or D) source code, you don't have enough context to know
the current state of register allocation, and so can't produce the appropriate loads/stores.
Also, the best opcodes to perform an operation may change with context. (Again, specific
examples are hard to fabricate, but they've consistently popped up for me over the years.)
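To illustrate the contrast, here is a minimal sketch using the x86 SSE intrinsics from `<xmmintrin.h>`, with a scalar fallback on targets without SSE. The helper names `madd_ps` and `madd2_4` are hypothetical. Because the intrinsics operate on values rather than fixed registers, the compiler is free to allocate registers and choose instructions across the whole expression:

```c
#if defined(__SSE__)
#include <xmmintrin.h>

/* Vector multiply-add passed and returned by value. Once inlined, the
   compiler decides which XMM registers hold a, b and c. */
static inline __m128 madd_ps(__m128 a, __m128 b, __m128 c)
{
    return _mm_add_ps(_mm_mul_ps(a, b), c);
}
#endif

/* out[i] = (a[i] * b[i] + c[i]) * b[i] + c[i]: two chained multiply-adds. */
void madd2_4(float out[4], const float a[4], const float b[4], const float c[4])
{
#if defined(__SSE__)
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_loadu_ps(c);
    /* With optimisation enabled the intermediate t normally stays in a
       register; no explicit store/reload is written by the programmer. */
    __m128 t = madd_ps(_mm_loadu_ps(a), vb, vc);
    _mm_storeu_ps(out, madd_ps(t, vb, vc));
#else
    /* Scalar fallback for targets without SSE. */
    for (int i = 0; i < 4; ++i)
        out[i] = (a[i] * b[i] + c[i]) * b[i] + c[i];
#endif
}
```

An equivalent hand-written asm block would have to commit to specific registers and to where its operands live, with no knowledge of the surrounding code.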

Also, I think someone else said that functions containing inline asm can't be inlined? Is that
correct? If so, I assume that's intended to be fixed?

> > As an extension from that, why is there no hardware vector support
> > in the language? Surely a primitive vector4 type would be a sensible
> > thing to have?
> The language supports it now (see the aforementioned vector syntax), it's just
> that the vector code gen isn't done (currently it is just implemented using loops).

Are you referring to the comment about special-casing float[4]? I can see why one might
reach for that as a solution, but it sounds like a really bad idea to me...

> > Is it possible in D currently to pass vectors to functions by value
> > in registers? Without an intrinsic vector type, it would seem
> > impossible.
> Vectors (statically dimensioned arrays) are currently passed by value (unlike C
> or C++).

Do you mean copied onto the stack, like a memcpy, or intelligently passed to the function in
the hardware vector registers?
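A sketch of the distinction being asked about, using the GCC/Clang `vector_size` extension as a stand-in for an "intrinsic vector type" (this is a compiler extension, not standard C). The names `vec4`, `float4_array`, `vec4_madd` and `array_madd` are hypothetical. Both versions have by-value semantics; the difference is that the native vector type maps onto a single hardware vector register on common ABIs, while how the array wrapper is passed is left to the ABI's rules for aggregates:

```c
/* Native 4 x float vector type (GCC/Clang extension). */
typedef float vec4 __attribute__((vector_size(16)));

/* Array wrapper: copied by value, but the ABI decides whether it travels
   in registers (possibly split) or through memory. */
typedef struct { float v[4]; } float4_array;

/* Element-wise a * b + c; operators work directly on vector types. */
static vec4 vec4_madd(vec4 a, vec4 b, vec4 c)
{
    return a * b + c;
}

/* The same operation written as an explicit loop over the array wrapper. */
static float4_array array_madd(float4_array a, float4_array b, float4_array c)
{
    float4_array r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] * b.v[i] + c.v[i];
    return r;
}
```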

> > How can I do this in a nice way in D? I'm long sick of writing
> > unsightly vector classes in C++, but fortunately using vendor
> > specific compiler intrinsics usually leads to decent code
> > generation. I can currently imagine an equally ugly (possibly worse)
> > hardware vector library in D, if it's even possible. But perhaps
> > I've missed something here?
> Your C++ vector code should be amenable to translation to D, so that effort of
> yours isn't lost, except that it'd have to be in inline asm rather than intrinsics.

But sadly, in that case it wouldn't work. Without an intrinsic hardware vector type there's
no way to pass vectors to functions in registers, and with explicit asm you tend to end up
with endless unnecessary loads and stores, and potentially a lot of redundant
shuffling/permutation. That overhead also differs radically between architectures.
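A sketch of the load/store problem, using GCC extended asm on x86-64 (with a plain C fallback elsewhere). The type `float4` and function `add4_asm` are hypothetical. Because the asm block cannot see the caller's register allocation, its operands are declared as memory (`"m"`) operands, so every call loads its inputs and stores its result:

```c
/* Hypothetical 4-float aggregate used as an asm operand. */
typedef struct { float v[4]; } float4;

/* An "intrinsic" written as inline asm: operands go through memory. */
static void add4_asm(float4 *out, const float4 *a, const float4 *b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("movups %1, %%xmm0\n\t"   /* load a from memory */
            "movups %2, %%xmm1\n\t"   /* load b from memory */
            "addps  %%xmm1, %%xmm0\n\t"
            "movups %%xmm0, %0"       /* store the result back to memory */
            : "=m"(*out)
            : "m"(*a), "m"(*b)
            : "xmm0", "xmm1");
#else
    /* Portable fallback for other targets. */
    for (int i = 0; i < 4; ++i)
        out->v[i] = a->v[i] + b->v[i];
#endif
}
```

Chaining two calls, e.g. `add4_asm(&t, &a, &b); add4_asm(&r, &t, &c);`, forces the intermediate `t` to be stored and then reloaded, a round trip that the intrinsic version would normally avoid.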
I think I also read in another post that functions containing inline asm will not be inlined?
And how well does the D compiler optimise code around inline asm blocks? Most compilers have a
lot of trouble optimising around them, and many don't even attempt to...

How does GDC compare to DMD? Does it do a good job?
I think I really need to take the weekend and do a lot of experiments.