Mind the Ninja Gap
bearophile
bearophileHUGS at lycos.com
Sun Aug 11 10:42:02 PDT 2013
I have found an excellent paper, written recently, that I don't
remember being discussed here:
"Can Traditional Programming Bridge the Ninja Performance Gap for
Parallel Computing Applications?", by Nadathur Satish et. al.
(2012):
http://software.intel.com/sites/default/files/article/386514/isca-2012-paper.pdf
If that link doesn't work, this is an alternative link to a
document with a worse formatting:
http://www.intel.it/content/dam/www/public/us/en/documents/technology-briefs/intel-labs-closing-ninja-gap-paper.pdf
The "Ninja Gap" is the performance gap between mostly-numerical
code written by experts compared to the same code written in a
traditional style by lesser programmers. The Intel Labs
researchers of this paper show that such gap is widening along
the time, and they also show simple means to bridge most of this
gap using modern compilers and some standard coding practices
(with such practices the gap becomes just about 1.3X performance
difference, that is often acceptable. While a gap of 50X is not
good).
In my opinion numerical code is an important usage for the D
language, and I think D should _help_ the programmer bridge as
much of that ninja gap without resorting to actual ninja-level
coding and the use lot of inline asm. So in my opinion this is an
important paper.
Here I discussed a bit related matters, regarding the (very
interesting and probably worth partially copying) ISPC compiler:
http://forum.dlang.org/post/hwsjzlxystpymnvfxutp@forum.dlang.org
The paper presents several benchmarks, and for each one show how
to brige most of the gap using some standard means. Then at page
7 they write a summary of such strategies. Such strategies and
means should be available in Phobos (or where not possible as D
built-ins). (Example: Phobos should offer a simple data structure
to perform the change from Array-Of-Structures (AOS) to
Structure-Of-Array (SOA). I think this is not too much hard to
implement, it's not too much lines of code, and it's useful in
many situations, so I think it's a candidate for Phobos inclusion
once someone implements it.)
Regarding D itself, I think it already has most of the needed
features, but it should use them well/better. An example is
visible in the missing vectorized comparisons (an example using
the CILK plus array notation):
T tmp[S];
tmp[0:S] = A[0:S] * k - B[0:S];
if (tmp[0:S] > 5.0f) {
tmp[0:S] = tmp[0:S] * sin(B[0:S]);
}
(For me it's surprising how large firms as Intel and Microsoft
keep inventing nice ideas, implementing them writing compilers
and tools, and 99% of those things get ignored by most people and
end being forgotten. Being a technological researcher in such
firms seems a sad work).
The optimization examples shown in this paper sometimes use
algorithmic changes, that are beyond what D/Phobos is expected to
do automatically. But a language can help in other ways. This is
a simple O(n^2) loop pair for a N-body benchmark discussed in
this paper (there are approximate algorithms to solve the N-body
problem, but this benchmark uses the simple algorithm):
foreach (immutable i, const ref b1; bodies)
foreach (const ref b2; bodies)
result[i] += compute(b1, b2);
A numerics-oriented language could help the programmer write code
like this that improves the performance blocking the loop:
foreach (immutable size_t j; iota(0, bodies.length, blockSize))
foreach (immutable i, const ref b1; bodies)
foreach (const ref b2; bodies[j .. min($, j + blockSize)])
result[i] += compute(b1, b2);
So a language good for heavy numerical processing has to help the
programmer use the standard tricks presented in this paper in
simple clean ways.
Eventually I'll try to implement the AOS-to-SOA data structure.
But before being considered for inclusion in Phobos I think it
should be used for some time in real code, assuming some D
programmers are interested in writing "numeric processing"-style
code.
Bye,
bearophile
More information about the Digitalmars-d
mailing list