System programming in D (Was: The God Language)
Peter Alexander
peter.alexander.au at gmail.com
Thu Jan 5 15:10:13 PST 2012
On 5/01/12 7:41 PM, Sean Kelly wrote:
> On Jan 5, 2012, at 10:02 AM, Manu wrote:
>>
>> That said, this is just one of numerous issues myself and the OP raised. I don't know why this one became the most popular for discussion... my suspicion is that it's because this is the easiest of my complaints to dismiss and shut down ;)
>
> It's also about the only language change among the issues you mentioned. Most of the others are QOI issues for compiler vendors. What I've been curious about is if you really have a need for the performance that would be granted by these features, or if this is more of an idealistic issue.
It's not idealistic. For example, in my current project, I got a 3x
perf improvement by rewriting one function as a few hundred lines of
inline asm, purely to use SIMD instructions.
This is a nuisance because:
(a) It's hard to maintain. I have to thoroughly document which registers
I'm using for what, just so I don't forget.
(b) It's difficult to optimize further. I could squeeze more out of the
inline assembly with better instruction scheduling, but scheduling
naturally scrambles the organization of the code, which makes it a
maintenance nightmare.
(c) It's not cross-platform. Luckily, x86 and x86_64 are similar enough
that I can write the code once and patch up the differences with CTFE +
string mixins (roughly along the lines of the sketch below).
I know of other parts of my code that would benefit from SIMD, but it's
too much hassle to write and maintain the inline assembly.
If we had support for
align(16) float[4] a, b;
a[] += b[]; // addps on x86
Then that would solve a lot of problems, but only for "float-like"
operations (addition, multiplication, etc.). There's no obvious existing
syntax for things like shuffles, conversions, SIMD square roots, cache
control, etc. that would map naturally to SIMD instructions.
Also, there's no way to tell the compiler whether you want to treat a
float[4] as an array or as a vector. Vectors are suited to data-parallel
execution, whereas arrays are suited to indexing. If the compiler makes
the wrong decision, you suffer heavily.
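To illustrate (just a sketch): the same float[4] often gets used both ways
in a single function, so whichever representation the compiler picks is
wrong for one of the uses.

float norm2(ref const float[4] a)
{
    float[4] sq;
    sq[] = a[] * a[];    // "vector" usage: data-parallel, ideally one mulps in an XMM register
    // "array" usage: scalar indexing, wants the elements addressable in memory
    return sq[0] + sq[1] + sq[2] + sq[3];
}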
Ideally, we'd introduce vector types, e.g. vec_float4, vec_int4,
vec_double2, etc.
These would map naturally to the CPU's vector registers and be aligned
appropriately for the target platform.
Elementary operations would match naturally and generate the code you
expect. Shuffling and other non-elementary operations would require the
use of intrinsics.
// 4 vector norms in parallel
vec_float4 xs, ys, zs, ws;
vec_float4 lengths = vec_sqrt(xs * xs + ys * ys + zs * zs + ws * ws);
On x86 w/SSE, this would ideally generate:
// assuming xs in xmm0, ys in xmm1 etc.
mulps xmm0, xmm0;
mulps xmm1, xmm1;
addps xmm0, xmm1;
mulps xmm2, xmm2;
addps xmm0, xmm2;
mulps xmm3, xmm3;
addps xmm0, xmm3;
sqrtps xmm0, xmm0;
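For the shuffles and friends, I'd imagine intrinsics taking compile-time
lane indices, something like this (names purely hypothetical):

vec_float4 v;                                    // (x, y, z, w)
vec_float4 rev   = vec_shuffle!(3, 2, 1, 0)(v);  // -> (w, z, y, x); one shufps/pshufd
vec_float4 splat = vec_shuffle!(0, 0, 0, 0)(v);  // broadcast x into all four lanes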
On platforms that don't support the vector types natively, there are two
options: (1) emit a compile error, or (2) compile anyway, replacing the
vector operations with scalar float ops.
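To be clear, by (2) I mean degrading the vector type to a plain float[4]
and doing the work one lane at a time, i.e. the norm example above would
effectively become (illustrative only):

import std.math : sqrt;

void norms(ref float[4] xs, ref float[4] ys, ref float[4] zs,
           ref float[4] ws, ref float[4] lengths)
{
    foreach (i; 0 .. 4)   // one scalar mul/add/sqrt per lane instead of the SSE versions
        lengths[i] = sqrt(xs[i]*xs[i] + ys[i]*ys[i] + zs[i]*zs[i] + ws[i]*ws[i]);
}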
I think introducing proper vector types is the only sensible way forward.